Will paying for training data make AI a mediocre business?
And Microsoft stares down a parking ticket
Welcome back to Cautious Optimism! It’s Tuesday, June 25th, 2024.
Did you know it now takes 1.8 years for the median startup to reach its Seed round? And that’s amongst startups that do raise a Seed round. The data point is up from 1.5 years previously, per Carta data. It also takes longer to raise Series A through D rounds this year. The question for us to sort out is whether the longer wait times between rounds are due to startups enjoying longer runways thanks to cash caution, or to investor lethargy.
Hey, that sounds familiar!
While it’s entertaining to watch Perplexity tie itself into knots to explain how its software is super-smart ‘AI search’ and not merely a journalism-plagiarism-and-remixing machine, the AI training data debate is not academic. The Recording Industry Association of America’s (RIAA) recent suits against AI music startups Suno and Udio make that plain; the issue will have to be resolved.
There’s irony at play. Tech companies love to avoid sharing information about their operations and guard their IP closely. We’ve plenty of evidence of that. But when it comes to non-tech IP, the vibe from many startups is that existing intellectual property should be free to use so long as it can be used by them to make lots of money.
Media, often on the losing end of a historically contentious relationship with tech companies, has suddenly found itself in a position of leverage. Being confronted with tech companies generating hundreds of billions of dollars’ worth of new wealth without — mostly — paying for the data they used to create their models is both a chance at revenge and a shot at a long-term financial lifeline.
The situation could slip from the media’s fingers. Companies like Perplexity, when it comes to the journalism it needs to operate, or Suno and Udio, when it comes to the art they need to generate remixes, could yet scamper off with the bag. That would leave media companies in the miserable situation of having paid for the materials that companies working to replace them used to undercut their existing offerings. It’s like being forced to provision the enemy army and, when you complain, being told that ‘it’s just food, food is everywhere, why are you being so weird about me eating your food? Give me your fork. Food belongs to everyone. No, you cannot have any of mine. Why are you crying?’
A good question is why AI companies are so obstinate about paying for training data. The simple answer is that if they can get away without paying, their gross margins will be stronger, and thus, their businesses will be worth more. As venture capital backs most of the startups we’re discussing, that’s a big deal. But at the same time, the AI world could strike long-term deals with the companies they need to work with, and lawsuits could be avoided — perhaps at better terms than a legal scrap will lead to.
Startups are not very close to that perspective. Here’s how the RIAA describes Suno (court filing):
Udio (court filing):
And Perplexity (Fast Company):
There is a slight nod after that last quote from Perplexity’s CEO about how having a healthy Internet is key for AI companies so that his company has new material to pull from. True! I am glad that we are all in agreement there. I just wish that tech companies, so willing to pay up for top talent and executive comp, would also be willing to pay for the bricks they use to build their newest houses.
Quick Hits
Trending Up: Self-driving semis … the number of startups working on AI agents … the ability of Julian Assange to travel … AI chip investment … inflation in Canada (BG) … endless demands from backbone-free autocracy apologists for Ukraine to cede territory …
Trending Down: OpenAI access in China … Moderation at Meta … bundling … Cybertruck reliability … Nvidia’s ascent … our ability to handle extreme heat … crops in China, India … Argentina’s economy (BG) …
Election Season: Between UK elections on July 4th, French elections on June 30th and July 7th, and the United States voting in November, it’s a time for choosing. My fingernails are going to suffer.
So Much For SaaS: The value of the Bessemer cloud index, tracked via the WCLD ticker, is off 0.85% over the last year. Given that the companies that comprise the software index grew over that time period, we’re definitely back into multiples-compression territory. Sorry, startups.