Welcome to Cautious Optimism, a newsletter on tech, business, and power.
📈 Trending Up: Patience with Apple Intelligence … not deleting records … creatives’ will to (financially) live … TollBit, after its Series A … eVTOLs in the United States …
PhonePe: The Indian consumer payments company dropped its “inaugural annual report.” There are two things to note:
PhonePe wants you to know it’s an Indian company. Sure, Walmart owns the majority of it — observe its funding history, and recall that Walmart owns Flipkart — but PhonePe is “made in India, by Indians, for Indians,” in its own words.
PhonePe wants you to know that it’s a killer business (PAT-excluding ESOPs means profit after tax, exclusive of employee share-based compensation costs):
📉 Trending Down: Arm-Qualcomm … Android in China … Shein’s implied valuation, after growth slowed and profits fell in H1 2024 … births in Germany … South Korea’s population …
Anthropic just made the agentic AI race interesting
Recently I used ChatGPT/o1 to help find a book from my youth. The OpenAI service tried to recommend the wrong book (Ender’s Game) more times than I felt was warranted, but with a chipper willingness to keep trying, it eventually managed to dredge up the relatively obscure title: Virtual War by Gloria Skurzynski.
Current AI consumer chat tools are getting good enough to replace Google for quite a lot. It’s encouraging — we desperately need more competition in the search category.
But over in enterprise-land, it’s been less clear to me how much the genAI wave has affected, or will affect, day-to-day worker productivity. Microsoft is working to beat back the narrative that its AI features for Office are window dressing, for example.
One way that many tech folks think that AI will shake up corporate work is through the use of AI agents. We’ve covered them before at CO, but as a reminder here’s how Sierra, a startup working with agents, describes them:
Agents are autonomous software systems that can reason, make decisions, and pursue goals with creativity and flexibility, all while staying within the bounds that have been set for them. Whereas applications help you do the work, agents get the work done for you.
You can see the corporate appeal. Software is cheaper than humans. Software that can do more means companies need fewer workers. Full employment will, as always, eventually catch up, but companies like higher net margins now more than they care about sparing former employees and their families the upset.
On the agent front, Microsoft and Salesforce are making lots of noise. With their respective enterprise customer bases, this is not a shock. This week, Microsoft announced two agentic items:
Microsoft, which describes agents as “the new apps for an AI-powered world,” is clearly focused on ensuring that if agentic AI becomes a critical corporate work tool, it owns a big slice of the market. Salesforce, too, wants a piece.
It’s intellectually interesting that companies now have access to a nascent technology that could dramatically limit their need to employ humans. But what I want is something so much more:
I want a personal AI that lives on my computers, follows me from device to device, learns from me, helps me, and becomes not a digital twin, but a digital pal.
I suspect this model would be 90% help and 10% talking shit to my PC. But, hey, I talk too much.
Enter Anthropic, which announced a “computer use” API for its Claude family of AI models. Here’s how the company described its work:
Available today on the API, developers can direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking buttons, and typing text. Claude 3.5 Sonnet is the first frontier AI model to offer computer use in public beta.
How Anthropic built the computer use API is incredibly cool:
We were surprised by how rapidly Claude generalized from the computer-use training we gave it on just a few pieces of simple software, such as a calculator and a text editor (for safety reasons we did not allow the model to access the internet during training). In combination with Claude’s other skills, this training granted it the remarkable ability to turn a user’s written prompt into a sequence of logical steps and then take actions on the computer. We observed that the model would even self-correct and retry tasks when it encountered obstacles.
Why bother to train Claude to act like a human on a computer? Why not skip the entire UI layer, and just have the AI ghost chat with the silicon and ethernet ghosts, directly?
It’s the humanoid robot argument: Why build a humanoid robot when doing so requires lots of joints and balance and other issues that could be erased by choosing a different form factor? Because if you want to build a robot that is generally useful in the world, it needs to be human-shaped.
In a similar way, the digital world today is designed for human-style interaction. We see a certain amount, can read in a certain way, have hands for moving and interacting with purpose-built input devices, the list goes on. So, if you want to interact with the world today — and, let’s be honest, we’re a long way from moving past the GUI for personal computing — you need to act like a human.
Enter the computer use API.
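For the tinkerers, here is a rough sketch of the look-at-the-screen, pick-an-action, repeat loop that Anthropic's description implies. To be clear about assumptions: this is not Anthropic's API. `FakeDesktop`, `Action`, `choose_action`, and `run_agent` are all hypothetical names invented for illustration, and the hard-coded `choose_action` rules stand in for whatever a real model would decide from a real screenshot.

```python
from dataclasses import dataclass


@dataclass
class Action:
    """One UI action. Hypothetical shape, not a real API type."""
    kind: str          # "click", "type", or "done"
    target: str = ""   # which widget to click
    text: str = ""     # what to type


@dataclass
class FakeDesktop:
    """Toy stand-in for a screen: one text field and one Save button."""
    field_text: str = ""
    focused: bool = False
    saved: bool = False

    def screenshot(self) -> dict:
        # A real agent would get pixels; the toy version returns plain state.
        return {"field_text": self.field_text,
                "focused": self.focused,
                "saved": self.saved}

    def apply(self, action: Action) -> None:
        if action.kind == "click" and action.target == "text_field":
            self.focused = True
        elif action.kind == "type" and self.focused:
            self.field_text += action.text
        elif action.kind == "click" and action.target == "save_button":
            # Saving an empty field does nothing in this toy UI.
            self.saved = bool(self.field_text)


def choose_action(goal_text: str, observation: dict) -> Action:
    """Stand-in for the model: choose the next step from what is on screen."""
    if observation["saved"]:
        return Action("done")
    if observation["field_text"] == goal_text:
        return Action("click", target="save_button")
    if not observation["focused"]:
        return Action("click", target="text_field")
    return Action("type", text=goal_text)


def run_agent(goal_text: str, desktop: FakeDesktop, max_steps: int = 10) -> list:
    """Observe, act, and re-observe until the goal is met or steps run out."""
    history = []
    for _ in range(max_steps):
        observation = desktop.screenshot()
        action = choose_action(goal_text, observation)
        if action.kind == "done":
            break
        desktop.apply(action)
        history.append(action)
    return history


desktop = FakeDesktop()
steps = run_agent("hello", desktop)
print([step.kind for step in steps], desktop.saved)
```

Because the loop re-reads the screen before every step, an action that fails to land simply shows up in the next observation, which is roughly the room for retrying that the Anthropic quote describes. The `max_steps` cap is there so a confused agent cannot loop forever.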
A final thought here: What I love about Anthropic’s news is not just the funny anecdotes:
Even while we were recording demonstrations of computer use for today’s launch, we encountered some amusing errors. In one, Claude accidentally clicked to stop a long-running screen recording, causing all footage to be lost. In another, Claude suddenly took a break from our coding demo and began to peruse photos of Yellowstone National Park.
It’s the programmability:
At this stage, [computer use API] is still experimental—at times cumbersome and error-prone. We're releasing computer use early for feedback from developers, and expect the capability to improve rapidly over time.
Hell. Yes. Agentic AI was looking increasingly locked behind enterprise service agreements and modern corporate babble-speak. Now, we have something that everyone can use to tinker. This is going to be good.
In other news
CNBC reports that Stripe’s new Bridge.xyz asset had annual revenue “in the range of $10 million to $15 million” around the time of its sale for roughly $1.1 billion. That means that Stripe paid a bonkers premium for the asset in revenue-multiple terms.
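For a rough sense of scale, here is the back-of-the-envelope math on that multiple. It uses only the figures reported above (a price of about $1.1 billion and annual revenue of $10 million to $15 million); the function and variable names are mine, and the inputs are approximate, so treat the output as a ballpark.

```python
def revenue_multiple(price_usd: float, annual_revenue_usd: float) -> float:
    """Purchase price divided by annual revenue."""
    return price_usd / annual_revenue_usd


PRICE_USD = 1_100_000_000      # reported sale price, approximate
REVENUE_LOW_USD = 10_000_000   # low end of the reported revenue range
REVENUE_HIGH_USD = 15_000_000  # high end of the reported revenue range

# Lower revenue implies a higher multiple, and vice versa.
high_multiple = revenue_multiple(PRICE_USD, REVENUE_LOW_USD)
low_multiple = revenue_multiple(PRICE_USD, REVENUE_HIGH_USD)

print(f"Implied multiple: roughly {low_multiple:.0f}x to {high_multiple:.0f}x annual revenue")
```

That works out to somewhere between about 73x and 110x revenue.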
But that’s not the point: Stripe didn’t buy a company à la Salesforce looking to boost its growth rate. Instead, CNBC reports that Bridge is the most common back-end for stablecoin startups and related efforts. A growing segment of payments infra outside Stripe’s current remit? That’s a risk to its core business, and thus something to solve. With money.