Why we demoted our AI

A wall of detailed text sure is helpful but it doesn't quite compare to the magic of seeing your prompt come to life in front of your very eyes. Your browser opens, a flurry of text appears in form fields and all of a sudden it feels like your computer is alive.

All these complex actions extracted from just one simple instruction.

In case you didn't know, that's exactly how Carbonate started. You typed a broad instruction like "fill in the login form" and it would turn that into a fully functioning end-to-end test.

However, against the trend set by frontier labs like Anthropic, Google, and now OpenAI... we recently decided to demote AI from a browser-controlling wizard to a simple copycat. Instead of AI deciding what to do, we simply record what you do and recreate it, using AI only to fill in the gaps.

That doesn't sound very magical at all... and it's not.

Consistency is key

Code magic either dies a hero or lives long enough to become the villain
– Misappropriated quote from a movie franchise

Unlike general browser-controlling agents, Carbonate has a slightly more specific goal: testing web applications. The ideal end-to-end test is quick, predictable, and consistent, and none of those are qualities that AI excels at. Instead, you'll find yourself wondering... Did the AI actually do what I asked? Will it decide to do something different next time?

These are the kinds of issues our customers often hit with our original tool. Tests ended up being slow, inconsistent, and occasionally did the completely wrong thing without you even realizing. We added a bunch of clever workarounds but at the end of the day, the quality of the test still hinged on a well-crafted prompt.

At a certain point, it felt like just as much effort and skill to write the perfect prompt as to write the darn code yourself. You're essentially still using the same skills as coding, but instead of a well-defined, fast, and deterministic programming language, you're stuck using an ambiguous and inconsistent instruction language.

Potentially costly mistakes

The dangers of getting it wrong in a test generally aren't that high since tests often run in isolated environments - purposely designed to handle misbehaving and broken code.

But can you imagine asking OpenAI's Operator to book a flight to Sydney and, after a long, uncomfortable flight, you step off the plane - jet-lagged and grumpy - only to find out you're in Sydney... Canada?!

Okay, that does sound a bit contrived, but after years of real-world experience controlling browsers with AI, these are the kinds of silly mistakes you get by giving AI full autonomy.

These new tools are very impressive and incredibly cool to witness, but as mistakes are made and confidence erodes, they risk being condemned to the "too magic to trust" drawer, and people will fall back on more conventional methods.

Playing to AI's strengths

60% of the time it works every time
– Another misappropriated quote from a cult classic

It's not all doom and gloom though. There exists a place for AI to provide tremendous value and time savings.
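To make the "record it, replay it, and let AI fill in the gaps" idea a little more concrete, here is a rough sketch. It is not Carbonate's actual code: the `Step` shape, the `replay` function, and the toy dictionary standing in for a page are all made up for illustration, and `keyword_resolver` is a deliberately dumb stand-in for whatever model you might call in its place.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Step:
    """One recorded user action (hypothetical shape, not Carbonate's format)."""
    action: str        # "type" or "click"
    selector: str      # selector captured at recording time
    description: str   # human-readable label, used only by the fallback
    value: str = ""


# Hypothetical fallback signature: given a step and the selectors currently on
# the page, return a replacement selector or None. A real one might call a model.
Resolver = Callable[[Step, list[str]], Optional[str]]


def replay(steps: list[Step], page: dict[str, str], resolve: Resolver) -> list[str]:
    """Replay recorded steps against a toy page (selector -> field value).

    Returns a log of every action and every fallback, so nothing happens silently.
    """
    log = []
    for step in steps:
        selector = step.selector
        if selector not in page:
            # Only when the recording no longer matches is the fallback consulted.
            selector = resolve(step, list(page))
            if selector is None or selector not in page:
                raise LookupError(f"cannot replay step: {step.description}")
            log.append(f"fallback: {step.selector} -> {selector}")
        if step.action == "type":
            page[selector] = step.value
        log.append(f"{step.action}: {selector}")
    return log


def keyword_resolver(step: Step, candidates: list[str]) -> Optional[str]:
    """Stand-in for the AI: first selector mentioning a word from the description."""
    words = step.description.lower().split()
    for candidate in candidates:
        if any(word in candidate.lower() for word in words):
            return candidate
    return None


recorded = [
    Step("type", "#email", "email field", "ada@example.com"),
    Step("type", "#pass", "password field", "hunter2"),
    Step("click", "#submit", "submit button"),
]
# The page has changed since the recording: "#pass" was renamed.
page = {"#email": "", "#password-input": "", "#submit": ""}
log = replay(recorded, page, keyword_resolver)
```

In this sketch the fallback only gets a say when the recording stops matching the page, and every time it does, it leaves a line in the log. If it can't find anything sensible, the test fails loudly instead of guessing.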

So that's why we removed the magic from Carbonate. It might seem obvious in hindsight, but as a flurry of new browser-controlling AI tools appears on the market, we can't help but feel slightly vindicated that they're all about to learn the same lessons we did.

But don't worry, we'll be waiting at the airport with a stiff drink, a slightly smug smile, and a free trial of Carbonate to make you feel better.
