Abstract
In this essay, written in dialogue with the introduction to this special issue, the authors offer a critical history of the development of large language models (LLMs). The essay's goal is to explicate their functionality clearly and to illuminate the effects of their "generative" capacities, particularly the troubling divergences between how these models came into being, how they are currently developed, and how they are marketed. The evolution of LLMs, and of their deployment as chatbots, was rooted neither in the design of interactive systems nor in robust frameworks for humanlike communication or information access. Instead, LLMs, and in particular generative pretrained transformers (GPTs), arose through the steady advance of statistical proxies for predicting the plausibility of automated transcriptions and translations. Buoyed by an increasing faith in scale and "data positivism," researchers adapted these powerful models, built for the probabilistic scoring of text, to chat interaction and other "generative" applications, even though the models generate convincingly humanlike output without any means of tracking its provenance or ensuring its veracity. The authors contrast this technical trajectory with other intellectual currents in AI research that aimed to create empowering tools that help users accomplish explicit goals by augmenting their capabilities to think, act, and communicate through mechanisms that are transparent and accountable. The comparison to this "road not taken" positions the weaknesses of LLMs, chatbots, and LLM-based digital assistants, including their well-known "misalignment" with helpful and safe human use, as a reflection of developers' failure to conceptualize and pursue their ambitions for intelligent assistance as responsible to, and engaged with, a broader public.