Early thoughts on AI
I’m finally giving up on my pedantry over “machine learning” and “text prediction models”; let’s just call LLMs AI. More than a year after ChatGPT launched, it may seem odd to title this “early thoughts”, but I have finally come around to the idea that something of consequence is happening.
I have to recognize that the semantics of “AI” have shifted with public use, and that these text-generating statistical engines are useful for something more than, well, generating text. Like many, I was impressed by the realism of ChatGPT while rolling my eyes at the hype as the Turing test fell. “Artificial Intelligence” is not a term I want to adopt lightly, after spending too much of my early life excited about expert systems, fuzzy logic, and childishly simple analogies between cellular automata and Minsky’s society of mind. The GPTs have been easy to dismiss as fun experiments that won’t scale to real-world usage.
I also knew too much to enjoy the magic everyone was feeling. In 1989 I learned about Markov chains; as a student of language and hobbyist computer scientist, of course my imagination was ignited. I immediately revived my “AutoSam” chatbot as an IRC bot. Nothing came of this, except I began to think of human language as statistical rather than the result of rigid grammars.
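For the curious, a word-level Markov chain is only a few lines of code. This sketch is not AutoSam’s actual implementation, just an illustration of what I mean by treating language as transition statistics:

```python
# A minimal word-level Markov chain: learn which words follow which,
# then generate text by sampling from those observed transitions.
# (An illustration, not the original AutoSam bot.)
import random
from collections import defaultdict

def train(text):
    """Map each word to the list of words observed to follow it."""
    chain = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, seed, length=10):
    """Walk the chain, sampling each next word from observed successors."""
    word, output = seed, [seed]
    for _ in range(length):
        followers = chain.get(word)
        if not followers:
            break
        word = random.choice(followers)
        output.append(word)
    return " ".join(output)

corpus = "the cat sat on the mat and the cat slept on the couch"
print(generate(train(corpus), "the"))
```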
Twenty years later, I found myself managing a product management team that worked with a lot of analytics and included data scientists. I don’t believe in managing people whose craft I can’t participate in. AlexNet was already six years old by then, so it was obvious that scaling up machine learning had potential. So I took Andrew Ng’s machine learning course from Stanford and had to learn Octave, along with more statistics than I’ll ever need, to train and test toy prediction models. The magic of models was properly dispelled, in the way only building your own can achieve.
GPT-2 was released during that course, so I paid a lot of attention to what applying neural nets to language could do. From then through ChatGPT’s release last year, I thought of the big models as technically impressive evolutions of my little Markov bot, and expected we’d see some neat advances in NLP, spam filtering, and autocorrect.
Here we are a year later, though, and I have to admit that I am quite surprised by the emergent capabilities. RAG is a simple idea, retrieve documents relevant to a query and hand them to the model as context, but it hints at how these limited black boxes might be useful. And while, after interacting with many models, I still do not believe they are thinking, I do think it’s fun to interrogate the difference between their generation of language and my own. Thinking or not, I also cannot ignore that code copilots seem to be changing the field of software development in a way I haven’t seen since the JVM.
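To show how simple the idea really is, here’s a sketch of the RAG loop. The function names and the word-overlap scoring are mine, invented for illustration; real systems use embedding similarity, and `call_llm` is a stand-in for whichever model you wire up:

```python
# A minimal sketch of retrieval-augmented generation (RAG):
# find the documents most relevant to a question, then prompt
# the model with them as grounding context.

def score(question, document):
    """Toy relevance: count of shared words (real systems use embeddings)."""
    return len(set(question.lower().split()) & set(document.lower().split()))

def retrieve(question, documents, k=2):
    """Return the k documents that best match the question."""
    return sorted(documents, key=lambda d: score(question, d), reverse=True)[:k]

def answer(question, documents, call_llm):
    """Build a prompt that grounds the model in retrieved context."""
    context = "\n".join(retrieve(question, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    return call_llm(prompt)

docs = [
    "The office closes at 6pm on weekdays.",
    "Parking passes are issued at the front desk.",
    "The cafeteria serves lunch from noon to 2pm.",
]
# No model wired up here, so the stand-in just echoes the final prompt.
print(answer("When does the office close?", docs, call_llm=lambda p: p))
```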
My thoughts are immature, but my intuition is that a new field of study is being born. This is not just another hype cycle like consumer 3D printing or blockchains, but something bigger. It may not yield sentience or consciousness in my lifetime, but these tools will be interesting building blocks for systems we’ve yet to imagine.
I posted this in December 2023 during week 2598.
For more, you should follow me on the fediverse: @hans@gerwitz.com