The carwash example on ChatGPT was instructive for more reasons than one.

In case you are not aware of it: when someone asked, “I want to wash my car. The car wash is 50 meters away. Should I walk or drive?”, the responses were wildly varied.

Between “It is a short distance, walk” and “You need the car at the carwash to wash it, so drive the short distance”, the sensible responses were a needle in a haystack.

For the general populace that does not understand (nor needs to) how transformers work, this sounds stupid. That is the first point to address.

Transformers are a much, much better way to generate text, especially compared to the likes of a Markov model, which goes astray after a while. Take it from someone who has been trying to tame them in the text/genomics space for a couple of decades now.
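To make the contrast concrete, here is a minimal sketch of a word-level Markov chain generator; the toy corpus and the order-1 context are illustrative assumptions, not anything from a real system.

```python
import random
from collections import defaultdict

# Minimal word-level Markov chain: learn P(next word | previous word)
# from a toy corpus, then generate by rolling a die at each step.
corpus = (
    "the car wash is fifty meters away so walk to the car wash "
    "the car needs to be at the car wash so drive the car"
).split()

transitions = defaultdict(list)
for prev, nxt in zip(corpus, corpus[1:]):
    transitions[prev].append(nxt)

def generate(start, length=12, seed=None):
    rng = random.Random(seed)
    word, out = start, [start]
    for _ in range(length):
        if word not in transitions:
            break
        word = rng.choice(transitions[word])  # a literal die roll
        out.append(word)
    return " ".join(out)

print(generate("the", seed=1))
# With one word of context, the chain happily wanders between the
# "walk" and "drive" phrasings mid-sentence; transformers attend over
# far more context, which is why they go astray far less often.
```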

But at the end of the day, it is no more than a probabilistic model, one whose answer can get worse with each roll of the die. So where the model failed, it was already on a limping foot, attempting to run an Ironman.
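That die roll is not a figure of speech. Here is a sketch of how sampling alone produces the carwash variance, with an invented next-token distribution; the probabilities are assumptions for illustration, not measured from any model.

```python
import random

# Hypothetical next-token distribution for the carwash prompt. The
# probabilities are invented for illustration, not measured anywhere.
next_token_probs = {
    "walk": 0.55,   # "it's 50 meters, just walk"
    "drive": 0.40,  # "you need the car there, so drive"
    "fly": 0.05,    # the nonsense tail every distribution carries
}

def sample(probs, rng):
    # One die roll over the distribution (inverse-CDF sampling).
    r, cumulative = rng.random(), 0.0
    for token, p in probs.items():
        cumulative += p
        if r < cumulative:
            return token
    return token  # guard against floating-point underflow

rng = random.Random(42)
print([sample(next_token_probs, rng) for _ in range(10)])
# Ten identical prompts, ten die rolls, varied answers: no reasoning,
# just sampling from a distribution.
```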

This is also why LLMs work excellently in bounded contexts like programming where decades of documentation and best practices exist — the larger the data, the lower the variation, the better the response.

That would also explain why these models have a long way to go in design work before they can do anything beyond stock UIs.

Should I Drive?

And second, there is the hullabaloo around AGI and the scale concerns the industry is trying to figure out. I’ll address these in two parts.

First, on AGI. The situation above is a crystalline example of what intelligence entails and why AGI, no matter what anyone says, is not arriving anytime soon.

The models are still playing catch-up. To paraphrase a discussion with Grok (which, along with Claude, did fare well in these tests): it acknowledged that its current avatar fared well because internal framing and continuous training helped.

A side note from an earlier life: when we were building the summary feature for crisp, a point of interest was to build a single-purpose model that sat adjacent to the context, rather than one that readjusted scores. General models that get tripped up are in a similar position.

Second, on to the concern of scale. The direction has unfortunately been (a) larger models, and (b) a voracious appetite for hoarding resources that is only fuelled by the fastest-finger-first advocacy of investments.

Andrej Karpathy says the pointlessly large models will make way for smaller, distilled models, but the question remains: when will the Overton window shift, and by the time it does, will we have battered-down RAM flooding the market?
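For readers unfamiliar with the term, distillation trains a small student model to match a large teacher’s output distribution. A minimal sketch in PyTorch, assuming the standard softened-KL formulation; the shapes and the temperature value are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature, then minimise the
    # KL divergence from teacher to student. The T**2 factor keeps
    # gradient magnitudes comparable across temperatures.
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(soft_student, soft_teacher,
                    reduction="batchmean") * temperature ** 2

# Toy usage: a large teacher's logits guide a small student.
teacher_logits = torch.randn(8, 50_000)  # e.g. a 50k-token vocabulary
student_logits = torch.randn(8, 50_000, requires_grad=True)
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow into the student only
```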