Abstract
This article argues that it is essential to design algorithmic systems that help us think instead of thinking for us. The aim of the article is to give some sense of how broad and diverse is the design space of interactive systems that use computation to challenge writers to be stronger and more limber.
Against Proletarianization
This word worker is at wit's end. The number on their email program's desktop icon, the one that refers to the quantity of unread emails, seems only to ratchet in one direction. Each plaintive message in this dizzying bureau is overdue for a reply—and not just any reply but a thoughtful one. How will the writer offer such solicitude “at scale”? And, if they even try, when will they find time to compose memos or tend to the company's various social media accounts, all overdue for encouraging updates?
We, the viewers of this promotional video for GrammarlyGO—a new algorithmic writing assistant powered by an “LLM” (large language model)—are meant to sympathize with the exhaustion of the corporate emailsmith. The solution, this advertisement (Grammarly 2023) suggests, is to move from typing to clicking: with a few radio-buttons toggled on, GrammarlyGO generates text according to the user's specifications: “casual,” “personable,” and “empathetic.” Or, if these predefined options are insufficient, the user can take a few more moments to issue a custom prompt to be satisfied by the so-called generative AI—but here once more, the speed of the AI is its main added value. In this advertisement, the cutting edge of knowledge work seems to long for industrial efficiency, albeit with a degree of customization that is more reminiscent of flexible “just-in-time” manufacturing than base-model Fordism. As Charles Bernstein (1986: 356–57) once observed, the move to early forms of computer-assisted “word processing” meant submitting writing to industrial and bureaucratic standards of efficiency and homogeneity. The rise of GrammarlyGO (and many similar tools now coming to market) would seem to represent the perfection of this drive to submit words to processes.
Welcome to the word factory. Though it may appear only as big as your desk, its invisibly hulking machinery surrounds you. Here are the levers you must pull, and here is the rate at which you must pull them. While the quality of the linguistic product is somewhat lackluster, the unit cost is low and getting lower.
“In machinery,” Marx wrote, “knowledge appears as alien” ([1973] 1993: 695). This comment, from the Grundrisse, begins to theorize the strange new mode of “distributed” and even “artificial” intelligence that developed in actual, brick-and-whirring-steel factories. Unlike the artisan—that woefully inefficient homo faber who through generationally forged knowledge and skill can turn clay into bowl, fiber into garment—the factory worker is eventually stripped of comparable prowess. Instead, their activity is constrained by the operations of industrial machinery that surrounds them, and their knowledge and skill are limited to helping these machines. What the artisan knew—how to make something, more or less from beginning to end—is no longer the cognitive possession of any individual worker; rather, it is distributed across the machines themselves: “[U]nity exists not in the living workers, but rather in the living (active) machinery” (693). And so, at last count, the knowledge of how to make something (a bowl, a gunstock, a coat) “is absorbed into capital,” which controls and maintains these machines.
But this de-skilling of the human is unique neither to a specific moment in the development of capitalism nor to capitalism in general. Abstracting and generalizing Marx's insights in the Grundrisse, Bernard Stiegler sees that it is not just the gargantuan factory apparatus of the Manchester textile mill that threatens to leave us cognitively bereft; rather, this loss of knowledge, this proletarianization, is an inherent danger when we take up any technology, anything that promises to make our lives easier or more efficient by externalizing some knowledge that was once internal to us.
By Stiegler's (2016) reckoning, the twentieth century in particular marked the rise of media industries that have attempted to proletarianize our “sensibility”—converting many or perhaps most of us into addled, barely thinking “de-symbolized consumers” who chug whatever ripe shlock the professionals pump through the radio or the screen. For instance, the advent of recorded music has freed most of us from a felt need to learn how to create music ourselves, a loss of knowledge that in turn destroys our ability to seriously hear and seriously desire music (Stiegler 2015; see also Stiegler and Rogoff 2010). For Stiegler (2016), algorithmic media represents an even more disastrous development in the history of human cognition: the externalization of the capacity to decide.
A personal, if somewhat trivial, example of this proletarianization: where once I needed to exercise at least some savvy in order to “discover” bands (going to venues, seeking out the right publications, debating the virtues of this or that style with friends), now Spotify's recommendation algorithms serve me specific albums or artists based on my already existing listening patterns and the listening patterns of people like me. I can just sit back. More than Marx, Stiegler emphasizes the emotional, even the “spiritual” (or noetic) consequences of losing one's knowledge as it is delegated to technology. When we never learn how to make a bowl from clay, when we never learn or play an instrument (or even “discover” music), we lose opportunities to become ourselves in the context of historically meaningful activity.1 According to Stiegler's anthropology, humans who are too emptied-out of such knowledge are plagued by misery and malaise. The mind's emptiness is the soul's vacuum, and it aches.
But if all technology carries with it the risk of proletarianization, this does not mean that all technology does in fact proletarianize us. Stiegler takes seriously Plato's anxiety that literacy itself, because it allows us to externalize our knowledge, leaves us evacuated by our own forgetting. The written word is a pharmakon—the toxin that can be curative, the medicine that can be poisonous, all depending on how it is used. Writing is worth the risk, however, since we have discovered ways of using this dangerous thing therapeutically; through it we have enabled ourselves to engage in cognitively rich activities (e.g., composing a sonnet, studying history, following a philosophical proof) that more than make up for whatever is lost. This gain or profit is not merely external (as are the economic efficiencies delivered by that form of writing called double-entry bookkeeping) but internal: the sonnet, historical study, and philosophical proof strengthen our minds (see also Chloë Kitzinger's thinkpiece in this special issue) and indeed connect them to other minds. “Proletarianization,” Stiegler (2016: 28) explains, “is what constitutes an exteriorization without return, that is, without interiorization in return.” A thoughtful, careful, nontoxic use of technology is, therefore, one that strives for “interiorization in return.”2
This moment—the LLM moment—is aswirl with countervailing gusts of pessimism, hype, curiosity, confusion, and cautious optimism. Stiegler's development of the concept of proletarianization could help to calm some of these tempestuous urges while harnessing the power of others. On the one hand, proletarianization gives definition to vague feelings of anxiety or impending doom. If LLMs really were to take over the responsibility to write, what inner resources would be stripped from us? Could we even begin to tally all that would be lost in this catastrophic trade? Moreover, what would we be trading the obligation to write for? How much emptier will it feel to be a human? On the other hand, proletarianization refuses to couple nicely with nostalgia for the past or jealous attachment to the present. Instead it advises: don't be surprised if technology seems to strip you of some capacity, since it almost always does; don't cling too firmly to your current technological milieu or even the way that your mind currently works; do, however, be shrewd, making sure that whenever you yield some capacity to a technology, allowing a power that was once proper to you to be pulled beyond you, you gain a profit—some other mentally worthwhile activity, your new possession that you share with others. In this way Stiegler (2020), who sees in algorithmic media a terrible potential to evacuate the human mind, counsels experimentation: we cannot retreat but must instead explore alternative ways of making use of the very technologies that are only just revealing their toxic potential.
Suddenly disoriented by the potential of algorithms that can produce plausible student essays, some educators have decided already that LLMs are the baleful enemy that will allow students to complete coursework without engaging in any serious writing, serious thinking, themselves. Since this enemy has snuck into the classroom via computer, to expel it requires returning to precomputational methods of assessment, including handwritten “blue book” essays and even viva voce exams (Dobson 2023). Other educators do take a more sanguine view of the incursion of LLMs into humanistic writing. These models are not the enemy, or at least they need not be. Colin Milburn poses the question in terms of “human-computer partnership”: “Can we teach students to use critical methods to collaborate with AIs to make better, more robust knowledge?” (quoted in Raley and Rhee 2023).
The blue-bookers, even if their primary concern is to preclude student learning loss or cheating, would seem to be stalwart guardians against proletarianization. But the reactionary position is often a retreat: to return to paper or oral examination as the primary modes through which we see our students think would itself involve certain kinds of learning loss. The sort of writing performed on computers—though perhaps more homogenized, as Bernstein suggests—is also arguably more easily intertextual. It often emerges from a clutter of windows, saturated by all sorts of other algorithmic media: from blogs and social media to email and digital advertisements, not just from GrammarlyGO, ChatGPT, or Google's Bard. As such, a discourse that can be inked on paper in the classroom or spoken spontaneously is not equivalent to composition on a screen. In this regard, on-screen (and often online) composition entails a bargain—whether implicit or explicit—much like the bargain that took us to writing in Plato's time. The bargain posits that what is lost is worth less than what is gained.
Compared to Milburn's call for a pedagogy of “human-computer partnership,” however, the reactionary position has a distinct advantage: its simplicity. AI is cognitively toxic, and students should not avail themselves of it at all is a position that already contains its own implementation: ban the AIs forthwith. By contrast, there are many ways in which humans and AI could collaborate. One option is indeed for writers to turn themselves into managers of word factories. Writer Stephen Marche (2023), who has himself experimented extensively with AI-assisted writing, imagines a future of writing in which the human is primarily a “curator,” issuing fertile prompts to LLMs and then, with taste and judiciousness, selecting the choicest morsels of algorithmic output for compilation. (Marche is thinking here of literary writing—composing a short story or a novel—but it would not be impossible to apply this same approach to scholarly writing.) Let us call this perspective a pragmatic accelerationist one—accelerationist merely in the sense that it seems happy, even eager, to replace a great deal of human labor with algorithmic labor, and pragmatic in the sense that it imagines a way for humans to preserve their value by retreating to the activities that seem least algorithmic. One may have questions about the stability of this equilibrium: might LLMs simply become “better” at judgment too? (As the success of the recommendation algorithms, such as Spotify's, makes clear, taste is hardly incomputable.)
But, putting this question aside, the larger ones remain: Would this mode of collaboration between human and machine be good for the human writer? By delegating the nitty-gritty of writing, by becoming a manager, collector, and curator of an algorithmic word factory, would we human writers have gained more than we would have given up? Or would the return on this trade be only more proletarianization? The preceding two questions are not entirely rhetorical. Should many human writers decide to follow Marche's program, perhaps many or most of them would find that they come out ahead, unlocking some treasured new competency, some new entanglement of learning and delight. Marche hints at what this may be: to be a savvy literary prompt engineer and curator of the text that LLMs generate in response to prompts, one must cultivate deep, explicit (rather than tacit) knowledge of literary style. He draws an analogy to hip-hop—though his reasoning applies as well to other forms of sample-heavy musical genres such as house music: “You don't need to know how to play the drums, but you do need to be able to reference the entire history of beats and hooks.” Perhaps the not-so-distant future will witness the formation of communities of those who relish the texture of LLM-generated text, expertly prompting and collecting the model's most unlikely, obscure tenors and tones with the same passion as house producers rifling through a dusty and unloved bin of vinyl records. Or an alternative future: humans—even those who read—will forget how to write, and, through this forgetting, will also forget how to read, at least in the manner for which writing is the best training.3
My point here is not to dissuade the sort of experimentation that Marche imagines. After all, these are early days in the history of LLMs and our interactions with them. But, even at this point, it could be useful to have a sense of what, in the field of HCI, one would call the “design space” (MacLean et al. 1991)—the various models available to designers, mapped according to distinguishing tendencies. Constructing word factories, where one leaves the rapid and fluent generation of text to LLMs (and other algorithmic media), would occupy only one corner of this map of possibilities—and, if one's goal is to make sure that any aspect of the writer's intelligence that becomes externalized is recompensed by “interiorization in return,” it is not the most obvious direction to pursue. To point to a different quadrant of the design space, I offer this heuristic: avoid using algorithmic assistance primarily to make writing more pleasant and efficient; instead make sure from the outset that algorithmic assistance will reveal to the writer a worthy new challenge or struggle, one that could be reasonably expected to make writing in some way more difficult and slower.4 Or, more succinctly: don't build word factories, build word gyms.
Word Gyms
In my brief talk at Critical AI @ Rutgers's “AI Futures” roundtable in February 2023, I described a handful of systems—word gyms—that I have designed over the past several years (Booten 2023). For my purposes, a word gym is a digital interface that—like GrammarlyGO or, for that matter, any word processor whose squiggly red lines attend to botched spellings and subject-verb disagreements—interferes in the writing process in some way. But, while those systems try to make writing easier, these interfaces try to “make writing harder,” much the same way that the clanking “hack squat” machine at the local YMCA makes it harder to straighten one's legs at the knees. The concept of an algorithmic word gym owes something, both in form and name, to the progymnasmata of classical rhetoric: those small exercises meant to stretch and strengthen the rhetor's mind in preparation for advanced disquisition (Kennedy 2003).
Most of these word gyms are meant to interfere in the writing of poetry. The first three rely on natural language processing (NLP) techniques (which mobilize computation for research on language) but do not entail fully fledged “generative AI.” In what follows, I touch on these designs before focusing on a fourth word gym that uses an LLM, GPT-3. Since I built these word gyms for myself in response to my own sense of what sort of noetic5 “training” I most needed to improve my thinking and writing, my descriptions of them will necessarily be personal in nature.
Word Gym 1: Style Vise
The goal of Style Vise is to goad the writer toward more interesting syntax.6 In this case, syntax is operationalized crudely as mere part-of-speech sequences. In other words, Style Vise encourages the writer to use parts of speech in unusual combinations: not “The dog ran . . . ” (a determiner, a singular noun, then a verb—one of the most typical three-word part-of-speech sequences) but, perhaps, “Ran the dog . . . ”
Whenever the user inputs a sentence, Style Vise performs upon it one of the most basic techniques of NLP: part-of-speech tagging. Then, focusing only on that sequence of tags (as distinct from the meaning of the words involved), Style Vise breaks this sequence into overlapping subsequences of a certain predetermined length (for instance, the first through the sixth tag, the second through the seventh, the third through the eighth, and so on). It checks these subsequences against a large corpus of texts that has already been processed in the same way, part-of-speech tagged and broken into subsequences. As far as Style Vise is concerned, a sentence input by the user is of satisfactory quality if and only if it is composed exclusively of part-of-speech chunks that do not appear, or appear only very rarely, in this comparison corpus. The user's interaction with Style Vise takes the form of a back-and-forth via a text-based interface: the user inputs a sentence and receives a response—either a “passing” one or a critique, with the offending subsequence of words (with the “offense” corresponding to a high-usage subsequence) dismissively scratched through, in which case the user may try again. Style Vise keeps track of the number of times the user has written something with high-usage syntax, taking away “lives” (figured by a heart emoji) whenever they do:
>The drawer was holding something we often call shine, so I put my ear up to it.
The drawer was holding something we
often call shine , so I put my ear up to it .
. . . zzzzz . . .
-1
>The drawer was holding something we often consider to be shiny, so I put my ear up to it.
The drawer was holding something we often consider to be shiny ,
so I put my ear up to it .
Is that all you got?
-1
>The drawer was holding something we often consider to be shiny, so earwise I did where its abdomen was.
+1
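The core mechanic of Style Vise can be sketched in a few lines of Python. Here a toy word-to-tag lookup and a tiny three-sentence corpus stand in for the real part-of-speech tagger and large comparison corpus the system uses; everything below is my own illustration, not the system's actual code:

```python
from collections import Counter

# A toy lookup stands in for a real part-of-speech tagger (e.g., NLTK's);
# unknown words default to "NN". The mapping and corpus are illustrative.
TOY_TAGS = {
    "the": "DT", "a": "DT", "dog": "NN", "cat": "NN", "drawer": "NN",
    "ran": "VBD", "slept": "VBD", "barked": "VBD", "quickly": "RB",
}

def tag_ngrams(sentence, n=3):
    """Tag each word, then return the sentence's overlapping tag n-grams."""
    tags = [TOY_TAGS.get(w.lower().strip(".,"), "NN") for w in sentence.split()]
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]

def common_ngrams(corpus, n=3, min_count=2):
    """Tag n-grams appearing at least min_count times in a comparison corpus."""
    counts = Counter(g for s in corpus for g in tag_ngrams(s, n))
    return {g for g, c in counts.items() if c >= min_count}

def critique(sentence, banned, n=3):
    """Return the offending high-usage tag n-grams, if any."""
    return [g for g in tag_ngrams(sentence, n) if g in banned]

corpus = ["the dog ran quickly", "a cat slept quickly", "the dog barked quickly"]
banned = common_ngrams(corpus)
```

With this setup, “The drawer slept quickly” fails (it reuses the corpus's common determiner-noun-verb opening), while the inverted “Slept the drawer quickly” passes.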
Word Gym 2: Manual Style Transfer
In computer science, “style transfer” is the process of taking a text (or an image) that is characteristic of one author or genre and transforming it so it bears the stylistic markers of another—the lyrics of a Top 40 hit into the style of Chaucer, a presidential portrait into the style of anime. Manual Style Transfer is a word gym that tasks the writer with performing this sort of algorithmic transformation manually, converting a sentence of poetry into the style of Walt Whitman.
This task is a difficult one, and the user is meant to make small, iterative changes to the sentence. When the writer inputs an attempt, the underlying system calculates two values:
How similar is the input sentence to the original sentence? A vector-space model of language represents linguistic units (typically words) as vectors—lists of numbers representing coordinates in multidimensional space. Similar words (e.g., dog and wolf) will possess similar vectors. The Word Mover's Distance algorithm (Kusner et al. 2015) is then used to calculate the similarity of any two sentences based on the vectors corresponding to the words in these sentences.
How much stylistic resemblance is there between the input sentence and the style of the author? I trained a (small and rather unsophisticated) machine learning model to distinguish between Whitman and a comparison corpus made up of texts by other authors. It does so based on unigrams (single words) and bigrams (consecutive pairs of words), and it will assign a high score for “Whitmanicity” to those texts that contain unigrams or bigrams that appear in Whitman's writing more frequently than in the comparison corpus.
A web-based interface provides visual feedback to guide the writer toward a successful completion of this task. The first value—the Word Mover's Distance score between the original sentence and the user's revision of it—is plotted along the y axis, the second—Whitmanicity—along the x axis. Thus, every attempt to manually transfer the style of the original sentence into the style of Whitman is visualized as a point in two-dimensional space. It is not so difficult to lard one's sentence with obviously Whitmanic words or sequences of words, but doing so carelessly, without regard for the meaning of the original sentence, will cause the Word Mover's Distance score to balloon. Successful attempts manage to land in the gray square in the bottom right (high Whitmanicity, low semantic distance from the original sentence).
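The two values might be computed along these lines. The toy word vectors and miniature corpora below are my own assumptions, and the averaged-vector cosine distance is only a simplified stand-in for Word Mover's Distance proper, just as the frequency comparison is a stand-in for the trained classifier:

```python
import math
from collections import Counter

# Toy two-dimensional word vectors standing in for a trained vector-space
# model of language; "dog"/"wolf" and "sings"/"chants" are near-synonyms.
VECS = {
    "dog": (1.0, 0.2), "wolf": (0.9, 0.3),
    "sings": (0.1, 1.0), "chants": (0.2, 0.9),
    "the": (0.5, 0.5), "a": (0.5, 0.5),
}

def semantic_distance(s1, s2):
    """Cosine distance between averaged word vectors (a crude WMD stand-in)."""
    def avg(s):
        vs = [VECS[w] for w in s.lower().split() if w in VECS]
        return tuple(sum(dim) / len(vs) for dim in zip(*vs))
    def norm(v):
        return math.sqrt(sum(x * x for x in v))
    a, b = avg(s1), avg(s2)
    dot = sum(x * y for x, y in zip(a, b))
    return 1 - dot / (norm(a) * norm(b))

def grams(s):
    """Unigrams and bigrams of a sentence."""
    ws = s.lower().split()
    return ws + list(zip(ws, ws[1:]))

def whitmanicity(sentence, whitman_corpus, other_corpus):
    """Fraction of the sentence's unigrams and bigrams that occur more
    frequently in the 'Whitman' corpus than in the comparison corpus."""
    w = Counter(g for s in whitman_corpus for g in grams(s))
    o = Counter(g for s in other_corpus for g in grams(s))
    gs = grams(sentence)
    return sum(1 for g in gs if w[g] > o[g]) / len(gs)
```

Swapping “dog sings” for “wolf chants” keeps the semantic distance near zero while raising the Whitmanicity score, which is exactly the bottom-right corner the writer is aiming for.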
Word Gym 3: Lotus Chorus Workshop
Lotus Chorus Workshop is another word gym, a “creative-writing workshop simulator” that uses NLP techniques to offer feedback to poets in the midst of writing poetry. Far more multifarious than the previous two word gyms, this one possesses (at present) nearly twenty “critique functions,” different ways of suggesting specific changes to a sentence of poetry. For instance:
Ban Parts of Speech randomly chooses a part of speech that is currently used in the input sentence and suggests that the writer make do without it.
Wikipedia Rare Word Suggestion picks a noun from the input sentence, finds a very rare word that occurs on this noun's Wikipedia page, and encourages the writer to use this rare word.
Suggest Rhyme picks two nouns or adjectives from the input sentence and suggests that the writer replace one with a similar word that rhymes with the other.
Flip merely suggests that the writer move a word to the beginning of the sentence.
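A few such critique functions might be sketched as follows; the toy noun list, phrasings, and function names are my own illustrations rather than the system's actual code:

```python
import random

# Toy noun lookup standing in for a real NLP pipeline's part-of-speech tags.
NOUNS = {"architect", "house", "castle", "wall", "thorn"}

def ban_part_of_speech(sentence):
    pos = random.choice(["noun", "adjective", "verb"])
    return f"Try rewriting this without any {pos}s."

def flip(sentence):
    word = random.choice(sentence.split())
    return f"I'd move “{word}” to the beginning of the sentence."

def suggest_rhyme(sentence):
    nouns = [w.strip(".,") for w in sentence.lower().split()
             if w.strip(".,") in NOUNS]
    if len(nouns) < 2:
        return "Add another concrete noun first."
    a, b = random.sample(nouns, 2)
    return f"Replace “{a}” with a similar word that rhymes with “{b}”."

CRITIQUE_FUNCTIONS = [ban_part_of_speech, flip, suggest_rhyme]

def workshop_round(sentence, k=2):
    """Gather feedback from a random selection of critique functions."""
    return [f(sentence) for f in random.sample(CRITIQUE_FUNCTIONS, k)]
```

Each call to `workshop_round` assails the sentence from a random subset of critical angles, which is what produces the overload described below.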
Each of these functions is assigned to an emoji representing a participant in a creative writing workshop. Like Style Vise, Lotus Chorus Workshop takes the form of a text-user interface; the human writer offers a sentence, receives feedback from a random selection of the critique functions, revises it, and then writes a new sentence, and so on. Responding to multiple critique functions can get quite complicated:
>The architect has misplaced the favorite house, the small castle with the thorn wall.
: This needs some oomph . . . when I see the word “castle” I think of “arrowslits.”
: I'd move “misplaced” to the beginning of the sentence.
: Replace “favorite” with a similar word that rhymes with “thorn.”
>Misplaced is now the architect's most-mourned house, the castle with the thorn wall with the rose arrowslits.
: Good.
>Every new structure seems, on paper, solid and handsome, but to build it is to reveal yet another astonishing error.
: I think this line could be shifted into the future tense.
: Try getting rid of all the “r”s.
: Yes! Keep going: “ . . . yet another astonishing error, the kind . . . ”
> Every new shape-system will seem, in CAD, solid and handsome, but building it will only unveil a new astonishing misthinking, the kind that will itch until new plans claw, soothe.
: Much better.
As I have explained elsewhere (Booten 2023), Lotus Chorus Workshop, by peppering the writer with diverse and possibly even contradictory critiques that assail their text at various levels (letter, word, phrase, sentence), tries to place them into a state of cognitive overload.
An AI Word Gym: Salon des Fantômes
Now I arrive at a word gym that makes use of an LLM: Salon des Fantômes is a “literary, philosophical, and artistic salon simulator” built on top of GPT-3.
As many have noticed, LLMs tend to be passable mimics, able to generate text that, when prompted to do so, suggests the style of various literary and political characters (e.g., Shakespeare or Donald Trump). With Salon des Fantômes, I wanted to take advantage of the dramaturgical and even “heteroglossic” (Bakhtin 2008) potential of GPT-3 to allow myself to attend a salon in which I would be the only human guest, the others being a diverse cast of “characters” fabricated or coaxed from the model.
Conversation, I take it, is a social technique that can allow one to refine and evolve one's own thoughts by testing them against the ideas of others. Some conversations are intimate and, through this extended intimacy, become “deep” conversations. But there are other successful models of conversation. At a cocktail party, a huddle of perfect strangers may discuss politics or art, liberally offering bold claims, counterpoints, and good-natured barbs. It does not matter that the points one makes while holding a martini and straining to be heard over the music are not as refined as they might be, nor that the huddle evaporates before anything like a conclusion is reached. If the “depth” of this species of conversation is relatively low, the number of conversants (and thus, at least in theory, perspectives) is relatively high. And not just the total number of conversants but the combinatorial possibilities: assuming that a cocktail party has twenty guests, and assuming that the guests always huddle to talk in groups of five, there are 15,504 unique combinations possible.
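That huddle count is the binomial coefficient C(20, 5), which the Python standard library confirms:

```python
from math import comb

# Unique five-person huddles drawn from a twenty-guest cocktail party.
assert comb(20, 5) == 15504
```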
Having already suffered through a few rather dull conversations with GPT-3, I suspected that the model would make a poor intimate; perhaps, however, it could make up for its lack of quality with sheer quantity—that is, of characters or perspectives. Salon des Fantômes is essentially a wrapper around GPT-3 that instructs the model to respond to me and to itself in the voices of various characters, twenty or so in all. The salon works like this: a subset of five (or so) characters are randomly chosen, and a conversational topic or question is drawn from a prepared list. The autogenerated characters will begin to “chat” among each other, though the human user can jump in at any time.
But fabricating minimally worthy, minimally interesting interlocutors was not as easy as I had imagined. When I instructed GPT-3 to respond to me in the style of (for instance) Sigmund Freud, its responses were more evocative of the great psychoanalyst than of Simone Weil or Martin Heidegger, but only barely so; I wanted eau de Freud, but what I received from the model was diluted to an almost homeopathic concentration.7 After writing some code to manage the salon's conversation, bouncing between characters and myself, I noticed a related problem: as the characters began to speak with each other and with myself, they tended to seek agreement. Thus, a vaguely Mao-ish character and a vaguely libertarian character, instead of bludgeoning each other with the sharper edges of their respective ideologies, would make haste to find common cause, putting down their arms in favor of a milquetoast common ground. This did not bode well, since my purpose was to build an artificial arena for frenzied, zesty discourse, full of clashing voices and perspectives.
One of several tactics that I deployed against the model's stultifying détentes is what I call the random word trick. Each “character” is associated with a list of words (and phrases), typically between ten and twenty of them. So, for instance, the vaguely Freudian character's words included cathexis, disavowal, and repetition compulsion; the vaguely Maoist character's, cadre, anti-rightist, and imperialism. At each turn in the conversation, my code would randomly select one from a subset of the characters (my current conversational huddle) and instruct GPT-3 to respond in the voice of that character and to include this random word in its response. While not perfect, this trick did mitigate two problems at once: using the word cathexis made the Freudian character more Freudian, but it also made it less boring. GPT-3, driven by statistical probabilisms, wants to make obvious sense. When given a hurdle to obvious sense-making, it will strain for this plausibility—with the result that this “sense” will become a bit less obvious and more interesting.
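A sketch of how such a prompt might be assembled; the character data echoes examples given above, but the prompt wording and function names are illustrative assumptions, not the system's actual prompts:

```python
import random

# Each character carries a list of words and phrases; every turn, one is
# randomly chosen and the model is instructed to work it into the reply.
CHARACTERS = {
    "Freudian": ["cathexis", "disavowal", "repetition compulsion"],
    "Maoist": ["cadre", "anti-rightist", "imperialism"],
    "consultant": ["fill rate", "e-procurement tools", "days on hand"],
}

def build_prompt(huddle, topic, transcript):
    """Pick a speaker from the current huddle and build their prompt."""
    speaker = random.choice(huddle)
    word = random.choice(CHARACTERS[speaker])
    prompt = (
        f"Continue this salon conversation on the topic: {topic}\n"
        f"{transcript}\n"
        f"Respond in the voice of the {speaker} character, and use the "
        f"term “{word}” somewhere in your response."
    )
    return speaker, prompt
```

The constraint is the point: forcing an improbable word into the next turn pushes the model off its statistically smoothest path.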
An example of this random word trick in action: during my own weeklong use of Salon des Fantômes (see Booten 2024), the conversation at one point turned to the topic of why the night is to be preferred to the day. One bot character, an employee of “a consulting firm specializing in global supply chain solutions,” was randomly enlisted to respond to this question. The randomly generated prompt to GPT-3 included the stipulation that this particular bot character's utterance must use the term “fill rate” (chosen from a list of relevant corporate and logistical vocabulary, such as “e-procurement tools” and “days on hand”). And so, the consulting firm specialist responded like this: “From another angle, I believe that the night is to be preferred to the day for the simple reason that it allows us to optimize our fill rate. In the night, we can take advantage of the higher levels of productivity that come with fewer distractions, engage in strategic planning for our global supply chain solutions, and ensure that our fill rate is maximized.” Having experimented with different ways of getting GPT-3 to take on unique voices and perspectives, my sense is that the specificity of this character's response is in part due to the pressures exerted by the random phrase on the model. When the random words from which my code draws are all (like “fill rate” in the example above) related to a profession or some other identity, the random word trick can force the LLM to stay in character. But, since a randomly chosen word often makes little obvious sense to use in a particular step in the conversation, the LLM will, in order to make use of it, veer past verisimilitude and into absurd, off-kilter caricature.
The debate continued. Another character—“an ancient Greek philosopher who believes above all that one must care for one's soul”—offered a rebuttal to the corporate consultant: “I must challenge you to consider the spiritual implications of such an endeavor. Do we truly gain from increasing our days on hand if we are still bound by our own flawed and defective nature? I believe that what is most important is that we use our night hours for spiritual reflection, for dreaming, for pondering the bigger questions in life, and for connecting with what is most important to us.” To which the corporate consultant more or less repeated her earlier point about the night facilitating not philosophical meditation or reflection but “higher-order thought processes and strategic planning for our global supply chain solutions.” At this point, I myself interjected, trying to synthesize these two perspectives: “The day is indeed the time of logistics, the time to move things around and to meet people and get stuff done. At night, we turn inward, we slow down, we are unable to carry things as expeditiously, and so it is the spiritual time. But, if this is the case, we may wonder if humans should live close to the poles, where spiritual and logistical life may become imbalanced, especially near the summer and winter solstices.” It must be left to the reader to decide whether this thought of mine was one worth having; at the very least, however, this thought seems to have been made possible by my interactions with the bots. It is a small example of what we might call cognitive epiphytism—thoughts growing upon an (in this case, probabilistic) substrate of other thoughts and determined in part by the character of that substrate, much the same way that a lichen (or another epiphyte) grows atop another plant, but only if this plant's surface has the correct characteristics (mineral content, physical texture, moisture, etc.) (Brodo 1973).
I have described Salon des Fantômes's primary mode—free-form conversation between the human and a random subset of the AI characters—but this system also contains a “minor mode” (or a “mini-game” of sorts). Here the user is put in the position of the judge of the salon's official art exhibition. Two works of art are chosen from a list of important works of (mostly twentieth-century) art—for example, Marina Abramović’s Rhythm 0 and Maya Lin's Wave Field, or Derek Jarman's Blue and Carolee Schneemann's Meat Joy—and the user, as the ultimate judge, must decide which of the two is better and thus will receive the exhibition's top prize. Before making this weighty decision, however, the user will receive advice from several assistant judges, AI characters who will be instructed to give their opinion on which of the two works is superior. (Since GPT-3 is a text-only model, they do so based solely on hard-coded descriptions of the artworks as well as whatever knowledge about them, sometimes faulty, exists in the model itself.) I developed this “panel of judges” mode after noticing a limitation of the free-form conversation mode: due to the way that my code organizes the conversation, characters always respond to the immediately previous comment. If this allows for moments of tension or disagreement, they are only fleeting, since the conversation is always moving on. (The conversation can also, at times, wander too far from the original question.) The panel of judges mode forgoes conversation per se; rather, it merely assails or assays the objects from a variety of perspectives, leaving it up to the human writer to put these perspectives into conversation through their own, summative response.
In my use of this panel of judges mode, I often found myself disagreeing with, or wanting to disagree with, the rather banal things that my AI interlocutors had to say. For instance, at one point, the other judges and I were tasked with deciding which was superior, Henry Moore's Reclining Figure or one of Dan Flavin's neon works. One bot character criticized Flavin's sculpture for being much more “lifeless” than Moore's. Another bot character—a kind of mystic based on Simone Weil—implicitly disagreed with this judgment by describing the “divine forces” visible in Flavin's sculpture. Yet a third character referred (disparagingly) to Flavin's sculpture as “something from a sci-fi movie and not an art piece.” In forging my judgment, I felt the need to reject all of these opinions. The work is not “sci-fi” at all, I decided, but scientific, “psychosomatic” because of the powerful effect that light has upon our bodies and our minds. “Lifeless,” I argued, “is not pejorative,” since Flavin's sculpture's “very lifelessness makes us consider our life, our life force, which is governed by simple forces like attraction to certain colors, fear of fire, attraction to fire.” Perhaps a fit way to capture the utility of this interaction is catharsis. The rather boring, too-sensible pronouncements of bot interlocutors goad the flesh-and-blood intellectual to rebel.
The difference between Salon des Fantômes's two modes—free-flowing discussion and the judges panel—is one of shape: a chain of variable length with transitions from bot to human or to another bot (see fig. 2) versus a firmly constrained and hierarchical discourse unit (see fig. 3).
If you, the reader, are an educator, you have no doubt had to make analogous decisions about how to architect the classroom environment. (Should we jump right into a whole-class discussion, or should students break into “small groups” first? Do they need discussion prompts, or should I just ask them what they found interesting about the reading?) With an LLM, one has far more power to precisely design the shape or shapes of a machine-generated conversation, but the goal can be just the same: to optimize a discursive substrate for human cognition to grow upon.
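The two discourse shapes can also be contrasted in code. This is a schematic sketch under my own naming, with a stub standing in for the actual model call:

```python
import random

def generate(prompt):
    # Stub in place of a real LLM call; it merely echoes the prompt so
    # that the orchestration logic can run without a model.
    return f"[response to: {prompt}]"

def chain_mode(question, characters, turns=4, rng=random):
    """Free-form salon: each utterance responds to the immediately
    preceding one, with the next speaker chosen at random."""
    last = question
    transcript = []
    for _ in range(turns):
        speaker = rng.choice(characters)
        last = generate(f"As {speaker}, reply to: {last}")
        transcript.append((speaker, last))
    return transcript

def panel_mode(artwork_a, artwork_b, judges):
    """Panel of judges: every judge addresses the same pairing in
    parallel; no judge sees another's verdict, so synthesis is left
    to the human writer."""
    prompt = f"Which is superior, {artwork_a} or {artwork_b}?"
    return [(judge, generate(prompt)) for judge in judges]
```

In the chain, each prompt nests the previous response, so the conversation drifts; in the panel, the prompts are identical and independent, so the perspectives never meet until the human puts them into conversation.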
Considering Media As If from the Perspective of a Weightlifter
I have presented four examples of “word gyms.” Each of these gyms “assists” a human writer in some way: Style Vise, for instance, keeps track of the writer's grammar, while Salon des Fantômes generates contradictory perspectives, counterarguments to which the human writer may respond. However, this assistance also makes writing newly difficult; now the user of Style Vise must avoid common syntactic patterns, and now the user of Salon des Fantômes must respond to interlocutors who espouse (hopefully quite odd and unfamiliar) perspectives. I have done little more so far than “show-and-tell.” Since these pages appear in a special issue of Critical AI, I conclude by reflecting on what is “critical” about this sort of “critical making” (Ratto 2011)—a point that takes me back to the start of this essay.
A focus on proletarianization should encourage the critic to be a bit more hesitant to write off LLMs (or any computational media) outright. Technologies are pharmacological: their effects—whether curative, toxic, or a little of each—are not inherent but manifest in their particular uses. A pharmacological approach must, then, be an empirical one that is open to the possibility that even “faulty” technologies can be used therapeutically (just as potentially beneficial ones can be used in toxic ways), and so it should locate its critique in specific use cases and applications. An LLM such as ChatGPT is neither a word gym nor a word factory. Nor is it a brainstorming buddy, a mere spelling corrector, or the sole author of the writing to which you might affix your name. It is, with varying degrees of competence, all these things, though not all at once. The result of this manifest lability is that the burden of deciding how to use the system falls on the user, who is simultaneously positioned as a designer in charge of crafting their own user experience, whether by careful planning or on-the-fly iteration.
By setting out several examples of word gyms, I have tried to give some sense of how broad and diverse is the design space of interactive systems that use computation to challenge writers to be stronger and more limber. Even this small handful of examples contains some diversity: feedback of a single type versus of multifarious types; humanlike feedback versus numerical feedback; gamelike versus not; open versus constrained; text-based dialogues versus interactive visualization; semantic versus syntactic; aspiring-to-be fun versus intentionally boring; LLMs versus small, custom-trained language models (or versus mere NLP legerdemain). These few designs, I hasten to add, are merely points in this possibility space; I do not argue that they are better or worse than other ones nearby or distant. The field of human-computer interaction is rife with other examples worth taking seriously or at least trying out. For instance, Metaphoria is a tool that “assists” the writer by drawing connections between some metaphorical tenor and vehicle (e.g., “faith can become infected like an ear”) (Gero and Chilton 2019). I put the word assist in scare quotes because the metaphors that it offers are often catachrestic or simply bizarre, if entirely logical: a poet integrating one of its algorithm-forged metaphors into their own verse would first have to strain and stretch to figure out what obscure connection the model is encouraging them to notice. Scraft, a writing assistant that raises “thought-provoking Socratic questions” as the writer types, is an even more straightforward example of the way that LLMs can be used in a gymnastic way, adding rather than taking away pressure (Kim and Tan 2023).
From the field of (computer-assisted) literary production, K Allado-McDowell's Pharmako-AI (2020) collects the outputs of a writing practice conducted with the assistance of GPT-3: Allado-McDowell would write passages of varying length, after which the model would take over, generating more text (which the human author would then sometimes “prune”). Midway through this book, Allado-McDowell reflects, “My internal model has already, in the space of a week, absorbed some of the patterns found in GPT output” (58). This is an ambivalent reflection, especially given the “political nature of machine learning training sets,” and yet Allado-McDowell compares this form of writing to a ritualized psychedelic practice, a way of “reshaping” and opening their mind. Placing the technology and politics of LLMs beneath the pressure of critique need not be mutually exclusive with using these same LLMs to put pressure on one's own customary patterns of thinking.
Pharmako-AI represents one early and intrepid experiment with LLMs, which also means that its use of the GPT-3 model (like my own use of 3.5) seems already antiquated. Were Allado-McDowell to repeat the process of Pharmako-AI with a more recent model, such as GPT-4, perhaps this model would hold up an even more luminous mirror to their language. Its increased context window (the number of tokens that the model can take as input for the generation of the next token) and its ability to accept images as input8 make it useful for a different range of applications. On the other hand, it is not inconceivable that those “patterns found in GPT output” that Allado-McDowell found themselves absorbing would be less salient, less absorbable or bioavailable, in a state-of-the-art model. More generally, the newest or the largest language model may lack expressive affordances possessed by its predecessors (especially given the increasingly ham-fisted use of RLHF to curtail the realm of topics an LLM might expound upon [Goodlad and Baker 2023]). Even the blatant flaws of earlier models, such as GPT-3’s well-known tendency to frequently “hallucinate” spurious historical facts or academic citations, could be useful for certain exercises, such as one in which the human must try to deduce, based on their own knowledge, whether an LLM is hallucinating. While companies such as OpenAI will always be obliged to extol their most recent model as the ne plus ultra, individual users may benefit from cultivating a more nuanced as well as dispassionate sense of the differences between various models—new and old, gargantuan and merely large.9
Furthermore, insofar as it is necessary now to design for ourselves and each other algorithmic systems that help us to think instead of thinking for us, I suggest that we do not limit ourselves to the currently fashionable LLMs. ChatGPT and Google's Bard are impressive and no doubt will only become more so, but the penumbra of hype threatens to obscure other, simpler modes of computation that, despite and because of their comparative simplicity, have affordances that we have yet to fully explore (not least because something ostensibly better is now glittering for our attention).
To return to the examples I have already elaborated above: compare, for instance, Salon des Fantômes to Style Vise. The former makes use of a cutting-edge language model composed of billions of parameters; the latter does nothing more complicated than part-of-speech tagging. Next to GPT-3, part-of-speech tagging is Stone Age machine learning. But these different algorithmic media—an LLM and a very small model capable only of assigning part-of-speech tags—invite different applications. LLMs can “do voices,” so my choice to use OpenAI's model to cobble together a heteroglossic conversation was an obvious one. While LLMs no doubt can perform part-of-speech tagging, perhaps even more ably than the modest part-of-speech tagger I was using, it would seem odd, even wasteful, to use such a complicated and resource-intensive system for such a simple purpose. Furthermore, the mere possibilities of tokenizing10 and then part-of-speech tagging text (as in Style Vise), of outputting a real number that represents the confidence of a classification algorithm (as in Manual Style Transfer), or of identifying hapax legomena from Wikipedia articles and recommending that the writer use them (as in Lotus Chorus Workshop) are far from the most obvious purposes to which an LLM could be put.
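To make the contrast concrete: the sort of “Stone Age” bookkeeping that a tool like Style Vise depends on fits in a few lines of Python. The sketch below assumes part-of-speech tags supplied by any off-the-shelf tagger; here the tags are written by hand, in Penn Treebank style, so that the example is self-contained, and the blocklist of “overused” trigrams is hypothetical:

```python
from collections import Counter

def pos_trigrams(tagged_tokens):
    """Count part-of-speech trigrams in a tagged sentence; a tool in the
    spirit of Style Vise could flag trigrams a writer leans on too often."""
    tags = [tag for _token, tag in tagged_tokens]
    return Counter(zip(tags, tags[1:], tags[2:]))

# Hand-tagged sentence (Penn Treebank tags), in lieu of a real tagger.
tagged = [("The", "DT"), ("night", "NN"), ("is", "VBZ"),
          ("preferable", "JJ"), ("to", "IN"), ("the", "DT"), ("day", "NN")]

counts = pos_trigrams(tagged)
overused = {("DT", "NN", "VBZ")}  # hypothetical blocklist of stale patterns
violations = [trigram for trigram in counts if trigram in overused]
```

A writer “training” against such a constraint would have to recast the sentence, such as by fronting a modifier, until the flagged determiner-noun-verb opening disappears. No billion-parameter model is required, only counting.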
One part of being critical vis-à-vis AI is remembering that the computation of language, which did not begin with AI, does not end with it either. Part of that effort could and should still involve knowing how to code, knowing how to perform the nuts and bolts of NLP with Python, R, JavaScript, or some other language. It is through doing so that one has an opportunity and indeed an obligation to think about language differently (more rudimentarily in some ways, more formally in others) than one does when communing with an LLM that gracefully accepts the vaguest sorts of natural language commands.11
It may be the case that writing to avoid common part-of-speech sequences is better, more salutary training for poets and other writers than is using a commercial program that allegedly mimics the style of famous authors, such as Sudowrite, or even conversing with a phantom salon. This is an empirical question, and it is not one that can be settled by comparing any two systems in vacuo, based only on their technical merits or conceptual coolness.12 The only way to know would be to experiment on ourselves, so my claim that we need to build word gyms implies a pragmatic, experimental ethos of the word-gym-goer. Vigilant against the loss of the mind's agility, the builder of word gyms is more interested in the human who uses the technology than in the technology itself. At a brick-and-mortar gym, when experienced weightlifters scorn mechanically complex workout machines in favor of crude plates of rusted cast iron, they do so not because they fetishize simplicity but because they value different sorts of complexity: of their movement, of the hormetic pressures exerted by that movement at many points across the body's axes. Through trial and error, they have figured out what works for them. Likewise, and in their spirit, we might resist the hype around the current state-of-the-art (and the next, and the next) by asking the salesperson this question: But how will it make my mind's body move?
Notes
1. Stiegler elaborates this becoming as “transindividuation” (see Stiegler and Rogoff 2010).
2. For a more in-depth discussion of the relationship between proletarianization, pharmaka, and the Grundrisse, see Manche 2021.
3. Once again, I follow Stiegler (2015), for whom sample-based music represents, if ambivalently and partially, one antidote to the widespread proletarianization of musical sensibility initiated by recorded music, which enabled people to consume music without knowing how to compose it, play it, or otherwise participate in it themselves.
4. I am expanding upon a point I have made previously: we need to figure out new ways of “making writing harder” (Booten 2021b).
5. I use the term noetic rather than, for instance, mental or creative because of the capaciousness of the etymological root noos, which, as Stiegler (2016) notes from time to time, means something like “mind” but also “spirit,” not strictly “cognition.” For this reason, I call this sort of small-scale, design-based intervention into the way that algorithmic media shapes one's own mind noöhacking.
6. For a more sustained consideration of this particular gym, see Booten 2021a.
7. Fine-tuning the model on various corpora was impracticable, in part due to the sheer number of characters, in part because some of these characters were entirely fabrications of my own imagination and so no such corpora existed. I chose instead to explore the model as it was, trying to see what voices could be lured from it.
8. This would have been particularly useful for the Salon des Fantômes project. Since GPT-3.5 was text-only, I was forced to offer it my own very imperfect textual descriptions of paintings and other artworks I wanted it to discuss with me.
9. A computer-assisted writer, with hipsterish scorn, might make it known that they only write with fountain pens, only listen to vinyl, and only generate text with GPT-2.
10. Breaking text (a sequence of characters) into a list of constituent units such as words and punctuation.
11. The user may ask an LLM something like “Make this sound more like Whitman, more concise, with more variation in syntax.” When coding any such transformations (in, for instance, Python), the programmer must come up with precise operational definitions of descriptors such as “like Whitman,” “concise,” and “variation in syntax.”
12. Here I am elaborating on a previous suggestion that the empirical techniques of HCI could be used to help guide the design of writing tools that care for the mind of the writer (Booten 2021b).
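The point in the notes above about operational definitions can be made concrete. One hypothetical way to operationalize “variation in syntax” is as the Shannon entropy of the distribution of part-of-speech bigrams across sentences; the definition, the function name, and the toy tag sequences below are all my own:

```python
import math
from collections import Counter

def pos_bigram_entropy(tag_sequences):
    """One possible operational definition of 'variation in syntax':
    Shannon entropy (in bits) of the part-of-speech bigram distribution
    across sentences. Higher values mean more syntactic variety."""
    bigrams = Counter()
    for tags in tag_sequences:
        bigrams.update(zip(tags, tags[1:]))
    total = sum(bigrams.values())
    return -sum((n / total) * math.log2(n / total) for n in bigrams.values())

monotone = [["DT", "NN", "VBZ"]] * 3   # every sentence has the same shape
varied = [["DT", "NN", "VBZ"], ["VBG", "IN", "DT"], ["PRP", "MD", "VB"]]
```

Under this definition, the monotone corpus scores lower than the varied one. A different programmer might reasonably operationalize the same English phrase in an entirely different way, which is precisely the point: coding forces the vague descriptor to become a formal commitment.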