Yes, we took your stuff and are making a killing. Why aren't you happy when we offer you scraps?
Show me the money
I do not know whether the New York Times will prevail in its suit against OpenAI, the well-known generative AI company and Microsoft vassal. But one question that I have had, and I presume you have as well, is why negotiations between the two companies broke down. They were talking, then the Times sued. OpenAI has struck a publicly perplexed tone, saying that it thought it was still talking with the paper. Quelle surprise, a lawsuit!
New information out this morning from The Information — points to my friends over there, they’ve been doing great lately — indicates that deals that OpenAI is striking with at least some publishers are in the $1 million to $5 million range. Those figures likely help explain why the Times is suing OpenAI.
Here’s the way I see it:
OpenAI ingested a lot of the Internet to build and train its models; the Times notes in its suit that the AI company also chose, and gave extra weight to, datasets that included lots of its work. In short, OpenAI agrees with the Times that Times material is great, and is good for providing the grist that OpenAI mills down into powerful, popular, and pretty darn cool AI tools.
Now, well after the fact, OpenAI is working to cover some of its legal risk by cutting deals with publishers, offering up some revenue in exchange for what I presume is peace, and a continued ability to consume. You can think of these agreements as super-subscriber transactions, in other words. (My thinking here may change as we learn more; for now, that’s my model.)
If I were the Times, and my copyright-protected work had been consumed by a tech company worth tens of billions, supported by a tech giant worth trillions, and helping to create a multi-billion-dollar revenue stream, and I had not been paid to date, I would be incensed. You would be too, if you were in their shoes.
I hope that OpenAI’s potential deal with the Times was worth more than $5 million. That figure works out to less than 1% of New York Times revenue in Q3 2023 alone. And the Times makes a pretty grokkable case in its suit that OpenAI’s models can be used to compete with it directly — the Wirecutter section, among others — so any such potential payout is not material to the Times, while the risks that OpenAI poses to it are.
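To put that in perspective, here is a minimal back-of-envelope sketch, not anything from the suit itself; the quarterly revenue figure is my assumption (the Times reported roughly $598 million in total revenue for Q3 2023), and the $5 million is the top of the deal range The Information reported.

```python
# Back-of-envelope scale check; figures are approximate assumptions, not from the filing.
deal_value = 5_000_000               # top of the reported $1M-$5M per-publisher range
nyt_q3_2023_revenue = 598_000_000    # assumed quarterly revenue, approximate

share_of_quarter = deal_value / nyt_q3_2023_revenue
print(f"A $5M deal is {share_of_quarter:.2%} of a single quarter's revenue")
# -> roughly 0.84%, i.e. well under 1%
```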
Hence, lawsuit.
What about not getting in the way of innovation?
Lots of tech folks are fans of the ‘ask for forgiveness, not permission’ model of building companies. I am as well.
But this is that model at work. Or, more precisely, the Times’ suit against OpenAI is the natural result of asking for forgiveness instead of permission. OpenAI did ingest lots of copyright-protected material to build its early models, and, we presume, later iterations as well. OpenAI did get to raise a lot of money and sell a lot of product without paying for what it used.
Now it is being asked to make good on its actions. So the complaint that the Times is somehow being greedy, that paying for someone else’s ingested material used to train models that become massively commercially successful products runs contrary to how ‘things should work,’ is specious rot. It comes from folks wagering that they can get away with eating from others’ plates without compensating them.
Silly, and self-serving. Also defeatist in the long term; AI models are going to need access to up-to-date journalism to remain useful. Without new, fresh, accurate information, they will not improve as quickly as they otherwise could.
Paying your suppliers fairly is not anti-capitalist. And it is certainly not anti-technology, or anti-progress. It’s just plain, old-fashioned capitalism. You know, the thing that we profess to love.
Word is that Apple will offer to pay publishers for their work as it develops its AI offering. The key questions are which ones, and how many?
OpenAI tried to build a business model with all revenue and no COGS. Hmm, something doesn't seem right there.
Okay, it has hosting expenses. But when your product's value (model training) is tied to someone else's data, not paying for it is kind of like stealing.