o3 will see you now
Welcome to Cautious Optimism, a newsletter on tech, business, and power.
📈 Trending Up: Chances of a government shutdown … student debt relief … The U.K. getting frisky with its regulatory pen … slower, better AI models … Qwen2.5 … Tidelift … o3 …
📉 Trending Down: Crypto prices after a lengthy rally … stocks … Tether, in Europe … coming for the king, missing … intelligence …
Inflation
Inflation data is out this morning, showing PCE price index gains of 0.1% from October levels, or 2.4% on a year-over-year basis. The same index sans food and energy items (‘core PCE’) was up 0.1%, or 2.8% year-over-year. CNBC reports that both figures came in “below expectations” set ahead of the release.
Yes, but: ‘Personal Income’ rose 0.3% in November, 0.1% under expectations. So, inflation was less than anticipated, but consumer health was a little under expectations.
The data does not spark joy, if joy is the anticipation of future rate cuts.
Notably, the markets recovered some of their losses today despite the mess in Washington and somewhat sticky inflation numbers, expectations be damned.
o3
Google’s been hailed recently on X and other tech-heavy platforms for a come-from-behind series of AI drops that impressed the market. Gemini 2.0, Veo 2, Imagen 3, and Gemini 2.0 Flash Thinking Experimental are pretty damned cool, I have to admit.
But OpenAI still has much to show off. Despite worries that its lead over rivals — driven partly by an earlier start — is slipping, today the Microsoft-backed AI model giant showed off o3. It’s very, very good. Here’s Maxwell Zeff and Kyle Wiggers from TechCrunch:
The model outperforms o1 by 22.8 percentage points on SWE-Bench Verified, a benchmark focused on programming tasks, and achieves a Codeforces rating — another measure of coding skills — of 2727. (A rating of 2400 places an engineer at the 99.2nd percentile.) o3 scores 96.7% on the 2024 American Invitational Mathematics Exam, missing just one question, and achieves 87.7% on GPQA Diamond, a set of graduate-level biology, physics, and chemistry questions. Finally, o3 sets a new record on EpochAI’s Frontier Math benchmark, solving 25.2% of problems; no other model exceeds 2%.
It also does well on one particular AGI benchmark, implying that there’s still progress to be made with today’s methods towards powerful, general artificial intelligence.
I am in the ‘when we reach AGI many people will miss it’ camp, because when we get a model that good, it won’t roll out everywhere at once. So, the future will be very unevenly distributed, to abuse the old saw.
Why will distro be limited? Cost. François Chollet, who helped found the ARC-AGI “benchmark to measure the efficiency of AI skill-acquisition on unknown tasks,” noted on X that o3 “scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute) and 87.5% in high-compute mode (thousands of $ per task).”
$20 per task is lethal for any product that a consumer might touch. $1000s per task is Bubonic Plague-level expensive. So, there’s more work to do. But OpenAI’s quick progress from o1 to o3 with their constituent improvements is so goddamn bullish that I cannot wait for what 2025 is going to bring. Hell yes.
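A rough back-of-envelope sketch of why that per-task price bites. Only the $20-per-task figure comes from Chollet's post quoted above; the usage level and the subscription price are hypothetical numbers picked for illustration, not figures from OpenAI.

```python
# Back-of-envelope: monthly compute cost per user at a given per-task price.
# Only the $20-per-task figure comes from the quoted ARC-AGI result;
# the usage level and subscription price below are hypothetical.

def monthly_compute_cost(cost_per_task: float, tasks_per_month: int) -> float:
    """Total compute spend for one user over a month."""
    return cost_per_task * tasks_per_month


def breakeven_tasks(subscription_price: float, cost_per_task: float) -> float:
    """Number of tasks a subscription fee covers before compute alone eats it."""
    return subscription_price / cost_per_task


LOW_COMPUTE_COST_PER_TASK = 20.0  # from the quoted low-compute result
HYPOTHETICAL_SUBSCRIPTION = 20.0  # assumed consumer price per month
HYPOTHETICAL_TASKS = 30           # assumed usage: one task per day

cost = monthly_compute_cost(LOW_COMPUTE_COST_PER_TASK, HYPOTHETICAL_TASKS)
covered = breakeven_tasks(HYPOTHETICAL_SUBSCRIPTION, LOW_COMPUTE_COST_PER_TASK)

print(f"Compute per user per month: ${cost:,.0f}")
print(f"Tasks covered by the subscription: {covered:.0f}")
```

Under those assumed numbers, a single task consumes the entire monthly fee, and a once-a-day user costs 30 times what they pay. Swap in your own guesses for usage and price; the gap stays wide until the per-task cost falls a long way.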
Power
After helping to snuff a bipartisan compromise and forcing a vote on a slimmer, less-popular spending package in the House that did not merely fail along partisan lines, but managed to rack up 38 Republican defections in the process, current American shadow president Elon Musk is keeping his powder dry abroad.
Musk has been flirting with donating a huge slug of capital to the Reform Party in the United Kingdom. Reform, previously the Brexit Party, is Euro-skeptical, as you can imagine, and its policies feature a focus on American right-wing cultural issues (concern about our trans brothers and sisters), a desire to lower taxes, and some decidedly non-American right-wing ideas like putting more money into the NHS.
Musk is also close with Italian PM Giorgia Meloni and her FdI party, which is opposed to immigration, skeptical of LGBTQ rights and abortion access, and got in trouble after “an investigative outlet published two reports showing members of the party's youth wing making fascist salutes and using racist and antisemitic language” earlier this year.
And then there was the latest tweet from last night: “Only the AfD can save Germany.” AfD stands for Alternative for Germany, and it’s one heck of a political party. After strong results in some parts of Germany, the party, often dubbed ‘far-right,’ was criticized by Jewish groups in the nation. It’s also in favor of deportations, so much so that some party leaders were caught discussing with neo-Nazis the idea of kicking out some German citizens along with others that the party views as burdensome to the nation.
You can draw any number of through-lines between the various parties that Musk supports around the world. But I don’t think that it’s off-base to say that, generally, the Musk-endorsed folks tend to be opposed to supranational organizations (the United Nations, the European Union, etc.), opposed to national media organizations like the BBC, opposed to immigration, opposed to lowering carbon emissions and dealing with climate change head-on, and opposed to many liberal social planks (trans rights, abortion).
When CO started, I added ‘power’ to its remit to give it topical room to maneuver. Now that tech folks and their wealth are using both to garner and wield political power, I suspect we’re going to have a busy year. And with a greater focus on power than I had expected.