The AI Factory model of the future
More cheap tokens mean broader access in the market to AI-powered technologies.
Welcome to Cautious Optimism, a newsletter on tech, business, and power.
The Fed drops a rate decision this afternoon. No one expects a rate cut. So, all eyes will be on commentary regarding future rate cuts. Fun times! — Alex
📈 Trending Up: German defense spending … memory holes … Claude 3.7 … domestic power grabs … hypocrisy … X’s cash needs? … crony capitalism … 100M and counting … brain programming …
📉 Trending Down: Deezer stock, after earnings … antitrust? … Bakkt …
The AI Factory model of the future
Yesterday the world sat through a multi-hour keynote from the current leader of global technology, Nvidia CEO Jensen Huang. After recently reading Tae Kim’s The Nvidia Way, I went into the speech armed with a bit of context about how Nvidia works, and, in particular, how Jensen thinks.
Given my newfound appreciation for how Nvidia approaches pace, I was pretty curious to see what the company has cooking for the rest of the year. I was expecting lots of new chips (check), but what I did not expect was an economic argument for a new form of mass goods production (check?). Essentially, Jensen laid out a model for the future economics of compute. Let’s try to get our arms around it.
We’re going to need more tokens
By my count this morning, Nvidia’s CEO said “token” 87 times during his keynote. Why did tokens get so much attention? Because they are the fundamental output of AI models. As the (I believe) synthetic female voice said during Jensen’s introduction, tokens are “the building blocks of AI.”
Nvidia believes that the number of tokens needed to build models is going to go up, just as the number of tokens generated by AI inference (actually using AI models) is going to go up. If you are even a modest AI bull, you agree; indeed, there’s a case to be made that the number of tokens global computers will need to generate in the future will prove a large multiple of what we see today.
Jensen argued that because we now have “AIs that can reason step by step by step using a technology called chain of thought,” the number of tokens we generate is going to go up by a factor of a hundred or more. “Instead of just generating one token or one word after the next, [reasoning models] generate a sequence of words that represents a step of reasoning. The amount of tokens that's generated as a result is substantially higher.”
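To make that multiplier concrete, here’s a back-of-envelope sketch in Python. The token counts are my own illustrative assumptions, not Nvidia’s figures:

```python
# Back-of-envelope sketch of the chain-of-thought token multiplier.
# All token counts below are illustrative assumptions, not Nvidia figures.

direct_answer_tokens = 50        # a one-shot reply that just states an answer
tokens_per_reasoning_step = 150  # tokens spent per intermediate reasoning step
reasoning_steps = 30             # steps the model walks through before answering

reasoning_total = reasoning_steps * tokens_per_reasoning_step + direct_answer_tokens
multiplier = reasoning_total / direct_answer_tokens

print(f"Reasoning reply: {reasoning_total:,} tokens, {multiplier:.0f}x a direct answer")
# Reasoning reply: 4,550 tokens, 91x a direct answer
```

Tweak any of those assumptions and the multiplier moves, but the point stands: reasoning models burn tokens on the way to an answer, not just on the answer itself.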
During the lengthy chat, Nvidia showed off new chips and detailed its roadmap for the next few years. I’ll let the real silicon-heads break down the company’s news for you, but the product gist is that Nvidia has a series of faster and better AI-crunching chips coming this year, next year, and the year after.
That means that not only is the company’s Blackwell line of GPUs supplanting its prior SOTA Hopper line today, boosting overall token throughput of global compute, but we can anticipate even more of the same through 2027.
If the token-bullish are correct, Nvidia and its rivals should have on offer the computing gear required to crunch a mind-numbingly large number of tokens.
Token generators are GPU datacenters, and GPU datacenters are AI factories
The expectation that token generation, both needed and actual, will grow does not hinge only on bundling AI-powered tools into existing software. Nor does it hinge on the current computing framework we all know and love.
Here’s Jensen explaining the future as he sees it:
In the future, the computer is going to generate the tokens for the software. And so the computer has become a generator of tokens, not a retrieval of files. From retrieval-based computing to generative-based computing, from the old way of doing data centers to a new way of building these infrastructure — and I call them AI factories.
They're AI factories because it has one job and one job only: Generating these incredible tokens that we then reconstitute into music, into words, into videos, into research into chemicals or proteins [or other things].
Summing quickly, training AI models today takes more tokens, AI models that can reason require more tokens, and computing itself could shift from a retrieval (go fetch) approach to one that generates tokens as it thinks.
That’s a lot of goddamn tokens. That’s tokens from the AI lab to the inference cluster to your home machine. This is where AI factories come in.
Calling a datacenter a factory feels like a stretch at first blush. What does it produce that is akin to a factory’s output? Tokens! Which have a price attached to them. Jensen noted during his speech that ChatGPT costs around “$10 per million tokens.” [Today, per its pricing page, OpenAI’s o1, a reasoning model, costs $60 per million output tokens, while o3-mini costs a mere $4.40 per million tokens generated.] That means you can run the math on how much money an AI factory (read: GPU datacenter) can earn from its computing.
The Nvidia CEO was riffing live, so I won’t hold him to the math in exact terms, but he ran some simple calculations on stage that suggested some AI factories could generate between $250,000 and $25 million in revenue per second.
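Using the prices above, here’s a minimal sketch of that revenue math; the factory-wide throughput figure is my own assumption, there purely to make the arithmetic visible:

```python
# Sketch of AI-factory revenue math using the per-token prices quoted above.
# The factory-wide throughput figure is an illustrative assumption.

prices_per_million_tokens = {  # $ per million output tokens
    "ChatGPT (per Jensen)": 10.00,
    "o1": 60.00,
    "o3-mini": 4.40,
}

tokens_per_second = 1_000_000_000  # assume 1B output tokens/sec, factory-wide

for model, price in prices_per_million_tokens.items():
    revenue = tokens_per_second / 1_000_000 * price
    print(f"{model}: ${revenue:,.0f} per second")

# ChatGPT (per Jensen): $10,000 per second
# o1: $60,000 per second
# o3-mini: $4,400 per second
```

For what it’s worth, hitting even the low end of Jensen’s range ($250,000 per second) at o1 prices would take roughly 4.2 billion output tokens per second, which gives you a sense of the scale he is imagining.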
It’s reasonable to consider AI-focused compute clusters as factories because they mass produce a common good. And with a value attached to marginal token production, we can infer their economics:
IF more tokens
And IF tokens have a discrete value
THEN AI factories will generate lots of income
The more you believe that AI will become more complex (capable) and more in demand (required), the more Jensen’s model makes sense.
Ok but he’s just talking his book, right?
Of course the CEO of Nvidia thinks that the future for Nvidia is bright.
That said, there are a few things worth noting that are taken as implicitly true in Jensen’s argument. Here’s my take:
For AI factories to exist as a general business project, and not merely the domain of a single company, AI models in the future have to be somewhat fungible. That is to say, there must be multiple models in the market that are commercially viable. If that were not the case, then AI factories would have a single possible customer, and would probably wind up owned by that one company. This is almost what OpenAI is betting on with its Project Stargate: it wants to build the best models, and own its own compute.
The good news is that today AI models are largely fungible amongst those on the leading edge. You can use models from Google or Amazon or Microsoft or Anthropic or OpenAI or Mistral; the list goes on.
For AI factories to exist as a general business project, and not merely the domain of a single company, token demand growth must always outstrip gains in AI compute efficiency. It’s not impossible to imagine a world where tech folks figure out how to make AI models more efficient faster than token demand rises, leading to a world in which the value of marginal token generation collapses. In that case, AI factories are going to look like hubristic money pits. Hence Jensen’s note that he expects AI models to reach into the trillions of parameters. Larger models require more compute, after all.
DeepSeek showed that there are some gains to be made in AI compute efficiency, but given the ongoing global race to build more compute capacity, I don’t think we need to worry here. Yet.
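Still, the shape of the race is worth seeing. Here’s a toy model, with both rates invented purely for illustration, of how factory revenue behaves as token demand compounds up while efficiency pushes per-token prices down:

```python
# Toy model of the race between token demand growth and efficiency-driven
# price declines. Both rates below are invented for illustration only.

def revenue_index(demand_growth: float, price_decline: float, years: int = 5):
    """Compound demand up and price down; revenue is indexed to 1.0 at year zero."""
    demand, price = 1.0, 1.0
    for year in range(1, years + 1):
        demand *= 1 + demand_growth  # more tokens wanted each year
        price *= 1 - price_decline   # efficiency pushes $/token down
        yield year, demand * price   # revenue = tokens sold x price per token

# Demand doubling yearly while prices fall 30%/year: revenue still compounds.
for year, rev in revenue_index(demand_growth=1.0, price_decline=0.30):
    print(f"year {year}: revenue index {rev:.2f}")
# Ends around 5.38x. Flip the rates (demand +30%/year, price -50%/year)
# and the index drops below 0.12 by year five: the hubristic money pit.
```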
The above arguments explain why Nvidia is investing so heavily around the AI world. It wants lots of competing AI models, lots of competing AI model-using startups and companies in the world, lots of new use-cases and testing, and lots of demand for ever more token generation. So, it’s helping foment that demand.
Again, fair enough. What I appreciate about the AI factory concept is not merely that it rejiggers my thinking about compute output; it also implies that synthetic intelligence will get cheaper to purchase. Factories became a big deal because they were more efficient and could produce goods more cheaply. More cheap tokens mean broader access in the market to AI-powered technologies.
Good. That could level the global playing field. And what better use of technology than that?