Abstract

Companies such as OpenAI and other tech start-ups often pass on “technical debt” to consumers—that is, they roll out undertested software so that users can discover errors and problems. The breakneck pace of our current AI “arms race” implicitly encourages this practice and has resulted in consumer-facing large language models (LLMs) with persistent problems of bias and truthfulness and unclear social implications. Yet, once the models are out, they are rarely retracted. The result of passing on the technical debt of LLMs to users is a “moral hazard,” where companies are incentivized to take greater risks because they do not bear their full cost. The concepts of technical debt and moral hazards help to explain the dangers of LLMs to society and underscore the need for a critical approach to AI to balance the ledger of AI risks.

Over the December holidays in 2022, Southwest Airlines dominated the headlines, canceling thousands of flights, stranding two million passengers, and frustrating thousands of its employees. In total, the airline lost about a billion dollars—all from a failure to update its software scheduling systems (Lampert and Singh 2023). With a strategy focused relentlessly on growth, Southwest had underinvested in its technology for years. Why?

It turns out that updating legacy software systems—while necessary for long-term sustainability—is hard to prioritize because the expense shows up in quarterly reports. CEOs eyeing retirement or departure are thus tempted to pass on the responsibility for updating to the next quarter—or, better still, to the next CEO.

In start-up companies—especially in the fast-moving space of generative AI—a pressure to be first-to-market creates a similarly perverse incentive. Founders hope to succeed in the present by shipping quickly and deferring the costs of maintenance. The result of this incentive system is underinvestment in technical infrastructure, or technical debt.

Technical Debt: Move Fast and Break Things

In finance, debt is borrowing against the future—enjoying something now for a promise to pay for it later. In software engineering, the term technical debt refers to borrowing against a future of technical work—deferring cleanup or compliance in a rush to ship products or using short-term fixes instead of undertaking the major overhauls that would take a crucial system off-line. It's sometimes strategic to borrow against future work: start-ups that build for longevity or sustainability might be so focused on code reviews that other companies beat them to the punch, rendering all their long-term visions moot. As with financial debt, not all technical debt is bad. But both kinds of debt eventually need to be serviced (Sculley et al. 2015).

In the world of start-ups, fast-moving teams might cut corners on functionality, reliability, scalability, or testing with user groups. Twitter's original iteration is a canonical example: it wasn't built to scale. The developers wrote it quickly in a popular software framework (Ruby on Rails), got it running, and people started to use it. But Ruby on Rails is built for speed (“on rails”), not scale. So the early days of Twitter were marked by the friendly “fail whale,” indicating a software problem. Paradoxically, this failure was key to Twitter's early success in the marketplace. Early Twitter was janky, but it was out there and good enough to corner a new market in microblogging. Indeed, it's proved remarkably resilient. Despite its current technical, political, and marketing disasters and its rocky rebranding as “X,” it still maintains market dominance (at least as of August 2024). In the fast-paced world of software, a rushed deployment will likely accrue technical debt that the company hopes they will be around to deal with later. Move fast and break things, Mark Zuckerberg said.

“Generative AI” is also moving fast and breaking things. In the so-called arms race between Google and OpenAI (in a multibillion-dollar partnership with Microsoft), which also includes Amazon (now in a partnership with Anthropic AI), Meta, and a growing number of start-ups with or without large corporate partners, the rush to wow consumers and grow market share has pushed aside long-standing concerns about user safety, privacy, data theft, and social harms. As Inioluwa Deborah Raji et al. (2022) argue, even basic functionality can get overlooked in the climate of hype-driven AI development. Before OpenAI's release of ChatGPT, there was a tacit and cautionary agreement among companies working in generative AI: test out the models, roll them out slowly, keep them proprietary, or sell them to platforms to embed. But OpenAI violated that convention in November 2022 when they released ChatGPT to the public.

What made ChatGPT different was a friendly format and heightened accessibility: a “free” English-language interface that resembled a search engine and could be used in a conversational style that required almost no technical knowledge. And thanks in large part to the company's quiet enlistment of low-paid human labor, the chatbot was more reliable and seemingly more coherent than previous models, as well as less prone to spouting bias and toxicity (see also Lauren M. E. Goodlad and Matthew Stone's introduction to this special issue). By January 2023, ChatGPT had one hundred million users and was the most rapidly adopted software of all time (Hu 2023). Just a few months after its release, it had catapulted into the public consciousness—and many of its users’ everyday workflows.

OpenAI's strategy was to roll out ChatGPT and let its users test it in exchange for “free” access. OpenAI has argued that this “iterative deployment” strategy increases safety and curbs misuse because the company receives far more useful information from real-world user-testing than from internal “red team” testers (Fridman 2023; OpenAI 2022). This strategy has continued through the release of GPT-4 (an updated model released for paying subscribers) and GPT-4o (the free multimodal version of ChatGPT that began rolling out in May 2024). While OpenAI asserts that the data they receive from this real-world testing will improve the model, the AI horse, so to speak, is already out of the barn. To put this differently, OpenAI's plan—and its partner Microsoft's—is to pass on ChatGPT's technical debts to its users.

Public Paying of Technical Debt

In a capitalist economy, piling up technical debt is as crucial to a start-up's success as accruing monetary debt is to new ventures. To borrow another expression from software engineering: technical debt is a feature, not a bug. And technical debt is a particularly salient feature of the hypercompetitive space of generative AI. AI start-ups are moving so fast that they are necessarily deferring costs. For large language models (LLMs), some of this technical debt inheres in the motley soup of “scraped” data, the lack of testing, and the methods through which systems are trained and deployed. Because proprietary entities like OpenAI guard the secrets of how the models are trained, on whom, and with what data, and because the marketers of AI are not required to report what they know about problematic use, these debts will be invisible to the public.

Yet, unlike with Southwest Airlines, it is solely the public that is left to bear the social impact and diverse harms of rapid technological deployment. Those paying the debt now and in the foreseeable future include underpaid content moderators in the Global South; workers whose jobs are eliminated or downgraded because of automation; artists and writers whose creative work is ingested and imitated without consent; minoritized cultures and linguistic populations whose language and culture are distorted, appropriated, or ignored; educators who must grapple with student use of automated AI tools; regulators struggling to plug holes in a bursting dam of AI-generated content; and a warming planet burdened with yet another highly resource-intensive technology. And so on. The recipients of huge investment capital are generally unconcerned about these technical debts because they're unlikely to be the ones to pay for them. Reap the technical and financial benefits now! Fix things later! Or leave the scene entirely and let someone else pay the debts and clean up the mess. Put another way: privatize profits and socialize losses.

What is distinctive about LLMs, however, is that the collective and uncompensated labor of millions of people, past and present (including you and me), made this technology possible. We supply the data that fuels LLMs, largely through a variety of online services and resources that we were told were “free,” such as Wikipedia, Google Docs, and Reddit. The old adage about technology applies here: if the service is free, you're the product. Or, as Shoshana Zuboff puts it, “You're something even more degrading: an input for the real product, predictions about your future sold to the highest bidder so that this future can be altered” (quoted in Biddle 2019).1 So, here we are, in the age of surveillance capitalism: as internet users, we're paying a debt from a contract we never realized we'd signed, its terms hidden deep in click-through EULAs (end-user license agreements). Everything we've written or shared online is grist for the mill of LLMs.

LLMs Are Collateralized Debt Obligations

Debt carries risk that, if not properly accounted for, can lead to disaster. At the heart of 2008’s global financial crisis were collateralized debt obligations, or CDOs. These opaque financial products bundled poorly rated investments such as risky subprime mortgages and sold them as if they were “secure” investments.

LLMs are effectively CDOs: a bunch of mixed-quality data repackaged and then resold at volume as good data. It's information laundering. Jaron Lanier (2023) notes that the web was built for such erasure: “The Web was made to remember everything while forgetting its context.” When data is scraped, its provenance is erased. Without knowing the origin of the training data and other inputs—a problem called documentation debt—we can't see what's been used to build LLMs. We can't credit it. And we can't hold it to account. Worse, the information is overleveraged, as synthetic AI writing is being fed back into the models, effectively double-counted on the books and potentially leading to a problem dubbed “model collapse” (Shumailov et al. 2023). Underneath, LLMs are still a bad asset—and the vast scale of Big Data can't render them secure.
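To make that recursive dynamic concrete, consider a toy simulation (a minimal sketch of my own, not code or results from Shumailov et al. 2023). A trivial statistical “model” is fitted to a dataset, and each new generation is trained only on samples drawn from the previous generation's fit, standing in for synthetic text fed back into a training corpus; over many rounds the learned distribution tends to narrow and drift, a simplified analogue of the collapse the paragraph describes.

```python
# Toy illustration of recursive training on synthetic data (author's own
# sketch, not the cited paper's method). A Gaussian "model" is repeatedly
# refit to samples generated by the previous generation's fit.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=100)  # stand-in for human-written data

for generation in range(1, 301):
    mu, sigma = data.mean(), data.std()      # "train" a trivial Gaussian model
    data = rng.normal(mu, sigma, size=100)   # next corpus is entirely synthetic
    if generation % 100 == 0:
        print(f"generation {generation}: mean = {mu:+.2f}, std = {sigma:.2f}")

# The spread of the synthetic corpus tends to shrink across generations:
# the fitted model gradually forgets the tails of the original data.
```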

In 2021, Emily M. Bender et al. warned us of the dangers of scale in LLMs. Enumerating a variety of problems and harms, these researchers demonstrated how the massive scale of unfiltered training data underlies the unreliable and biased condition of LLM outputs. They recognized the CDO problem with LLMs: low-quality data simply repackaged at scale. But their lesson about scale also applies to the level of investment in LLMs. Now that generative AI is being integrated into workflows and the corporate landscape, we're told we can't turn back. Generative AI is now too big to fail. And just as in the banking crisis of 2008—which saw “Main Street” lose homes while few on “Wall Street” lost fortunes or jobs—we're yet again punishing the wrong people.

The Moral Hazards of Tech

When people are forced to pay for the risks taken by others, the result is misaligned and distorted incentives. Economists call this situation a moral hazard: “where an economic actor has an incentive to increase its exposure to risk because it does not bear the full costs of that risk.”2 Moral hazards occur, for example, when corporations insured against damages take on higher risks, banks rely on public bailouts because they're “too big to fail” (as they did in 2008 and 2023), or investors overvalue a company if they know they can cash out before it tanks. Southwest's C-suite also faced a moral hazard when considering the technical debt of its decrepit scheduling software. As Zeynep Tufekci (2022) explains, golden parachutes and quarterly earnings reports encourage privatizing short-term gains at public expense. And when company leaders aren't incentivized to fix it, technical debt accrues.

With billions already in the new tech bubble, investors in generative AI are running headlong into moral hazards. Tech leaders know that their systems are accruing technical debt: OpenAI's Sam Altman (2023) testified before Congress about how ChatGPT continues to “hallucinate,” mislead, and generate misinformation. LLMs are designed to produce probabilities rather than represent ground truth—although many users are misled by the hype that suggests they're reliable. But there's little worry for the plucky and brilliant entrepreneurs leading the charge here. As David Graeber, the author of Debt: The First 5,000 Years, noted, “The rich have always been capable of extraordinary acts of generosity and forgiveness when dealing with each other. The absolute morality of debt is meant for us lesser mortals” (Kernis 2011).

In 2008, only some banks failed, and few leaders suffered any personal consequences. When Gary C. Kelly retired as Southwest's CEO in 2022, he left with tens of millions of dollars despite the company's pandemic losses and decreases in worker pay (Tufekci 2022). OpenAI's Sam Altman is now the guest of international political leaders, and generative AI start-ups are sucking up more venture capital than all other areas combined (Temkin 2023). At the same time, automated customer service threatens to displace workers (Goldberg 2023), schools are grappling with privacy risks and academic integrity concerns (Rodríguez, Ishmael, and Adams 2023), and medical providers worry that low-quality patient summaries and faulty assessments will impair health care (Yun et al. 2023). No one in charge is paying any of the costs of generative AI. And its technical debt is accruing.

Critical AI

Generative AI exists in a world rife with moral hazards brought on by a toxic mix of techno-utopian hype and the relentless drive of fast capital. In such a world, “AI ethics”—despite much serious research—is arguably more effective as a marketing strategy than as a robust assurance against public harm.

These moral hazards are compounded by hidden debt. When no one knows how overleveraged a particular bank is, or when companies conceal the technical vulnerabilities of their systems, debts compound silently and amplify public risks. Similarly, generative AI is particularly susceptible to “hidden technical debt.” As D. Sculley et al. (2015) describe, the machine learning (ML) that underlies LLMs hides technical debt for a few reasons: its risks often arise at the level of real-world data or deployment and can't be fixed in the code; its desired results are often unspecified; and the systems are so complex that inputs are never fully independent of each other. They note that a particular risk with large ML systems occurs when other systems use an ML model's predictions as input, potentially compounding errors from the original model (Sculley et al. 2015). These secondary systems are “undeclared consumers” that amplify the problem of hidden technical debt. In the case of ChatGPT, the many apps that pull from its outputs, as well as the users of those apps, are “undeclared consumers,” compounding the model's hallucinations, misinformation, and bias through their further distribution of its output.
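As a rough illustration of how an “undeclared consumer” compounds upstream error, the sketch below (my own construction with illustrative error figures, not code from Sculley et al. 2015) has a downstream application silently treat an upstream model's noisy prediction as ground truth and layer its own transformation on top; the combined error grows, and nothing in the code records who is responsible for it.

```python
# Author's toy sketch: error compounding through an undeclared downstream
# consumer of a model's predictions. All error magnitudes are invented.
import random

random.seed(0)

def upstream_model(x):
    # Stand-in ML system: its prediction carries a small bias plus noise.
    return x + 0.1 + random.gauss(0.0, 0.2)

def undeclared_consumer(prediction):
    # Downstream app that treats the upstream prediction as truth and adds
    # its own transformation error; the dependency is never declared, so no
    # single system owns the combined error.
    return 1.05 * prediction + random.gauss(0.0, 0.1)

true_values = [random.uniform(0.0, 10.0) for _ in range(10_000)]
first_hop = [upstream_model(t) for t in true_values]
second_hop = [undeclared_consumer(p) for p in first_hop]

def mean_abs_error(preds):
    return sum(abs(p - t) for p, t in zip(preds, true_values)) / len(true_values)

print(f"error after the upstream model:      {mean_abs_error(first_hop):.3f}")
print(f"error after the undeclared consumer: {mean_abs_error(second_hop):.3f}")
```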

The technical debts of generative AI, particularly the hidden social debts that the public incurs from these systems, are central concerns for critical AI research. Although technical debt and moral hazards are inherent in the entangled financial and technical systems supporting generative AI, we can expose those debts so as to shield the public purse more effectively. What would it look like for this technical debt to accrue to the appropriate side of the ledger? For generative AI companies to pay the social and technical debts they accrue? On the cusp of a new era of technology, a critical AI perspective may help us to keep those debts in check.

Notes

1. This idea has been around for a long time, and its most recent iteration in connection to social media services may be attributed to Andrew Lewis and was amplified by Tim O'Reilly (2010). Zuboff's analysis is more subtle and devastating here, and also more directly applicable to LLMs. See O'Toole 2017; blue_beetle's profile (Andrew Lewis), MetaFilter, https://www.metafilter.com/user.mefi/15556 (accessed May 22, 2024).

2. Wikipedia, s.v. “Moral hazard,” last modified April 25, 2024, https://en.wikipedia.org/wiki/Moral_hazard.

Works Cited

Altman, Sam. 2023. “Written Testimony of Sam Altman Chief Executive Officer OpenAI before the U.S. Senate Committee on the Judiciary Subcommittee on Privacy, Technology, and the Law.” US Senate Committee on the Judiciary, May 16. https://www.judiciary.senate.gov/imo/media/doc/2023-05-16%20-%20Bio%20&%20Testimony%20-%20Altman.pdf.

Bender, Emily M., Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” In FAccT ’21: Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–23. https://doi.org/10.1145/3442188.3445922.

Biddle, Sam. 2019. “‘A Fundamentally Illegitimate Choice’: Shoshana Zuboff on the Age of Surveillance Capitalism.” Intercept, February 2. https://theintercept.com/2019/02/02/shoshana-zuboff-age-of-surveillance-capitalism/.

Fridman, Lex. 2023. “Sam Altman: OpenAI CEO on GPT-4, ChatGPT, and the Future of AI | Lex Fridman Podcast #367.” YouTube, March 25. https://www.youtube.com/watch?v=L_Guz73e6fw.

Goldberg, Emma. 2023. “‘Training My Replacement’: Inside a Call Center Worker's Battle with A.I.” New York Times, July 19. https://www.nytimes.com/2023/07/19/business/call-center-workers-battle-with-ai.html.

Hu, Krystal. 2023. “ChatGPT Sets Record for Fastest-Growing User Base—Analyst Note.” Reuters, February 2. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/.

Kernis, Jay. 2011. “David Graeber Studied Five Thousand Years of Debt.” In the Arena (blog), CNN, July 5. https://davidgraeber.org/articles/david-graeber-studied-5000-years-of-debt/.

Lampert, Allison, and Rajesh Kumar Singh. 2023. “Analysis: Southwest Network Failure Raises Concerns over System's Strength.” Reuters, April 20. https://www.reuters.com/business/aerospace-defense/southwest-network-failure-raises-concerns-over-systems-strength-2023-04-19/.

Lanier, Jaron. 2023. “There Is No A.I.” New Yorker, April 20. https://www.newyorker.com/science/annals-of-artificial-intelligence/there-is-no-ai.

OpenAI. 2022. “Lessons Learned on Language Model Safety and Misuse.” OpenAI (blog), March 3. https://openai.com/research/language-model-safety-and-misuse.

O'Reilly, Tim (@timoreilly). 2010. “Yes! RT @bryce love this quote.” Twitter, September 2, 3:17 p.m. https://twitter.com/timoreilly/status/22823381903.

O'Toole, Garson. 2017. “You're Not the Customer; You're the Product.” Quote Investigator, July 16. https://quoteinvestigator.com/2017/07/16/product/.

Raji, Inioluwa Deborah, Elizabeth Kumar, Aaron Horowitz, and Andrew D. Selbst. 2022. “The Fallacy of AI Functionality.” Preprint, last revised July 1. https://arxiv.org/pdf/2206.09511.pdf.

Rodríguez, Roberto J., Kristina Ishmael, and Bernadette Adams. 2023. Artificial Intelligence and the Future of Teaching and Learning. US Department of Education, Office of Educational Technology, May. https://www2.ed.gov/documents/ai-report/ai-report.pdf.

Sculley, D., et al. 2015. “Hidden Technical Debt in Machine Learning Systems.” In Advances in Neural Information Processing Systems, vol. 28, edited by C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett. https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf.

Shumailov, Ilia, Zakhar Shumaylov, Yiren Zhao, Yarin Gal, Nicolas Papernot, and Ross Anderson. 2023. “The Curse of Recursion: Training on Generated Data Makes Models Forget.” Preprint, last revised May 27. https://arxiv.org/abs/2305.17493.

Temkin, Marina. 2023. “In the World of Startup Valuations, There's Generative AI—and Everything Else.” PitchBook, May 22. https://pitchbook.com/news/articles/early-stage-valuations-generative-AI-compare-VC.

Tufekci, Zeynep. 2022. “The Shameful Open Secret Behind Southwest's Failure.” New York Times, December 31. https://www.nytimes.com/2022/12/31/opinion/southwest-airlines-computers.html.

Yun, Hye Sun, Iain J. Marshall, Thomas A. Trikalinos, and Byron C. Wallace. 2023. “Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews.” Preprint, last revised May 22. https://arxiv.org/abs/2305.11828.