Does Not Compute: Why Machines Need a Practical Sense of Humor

A path to higher forms of humor is a well-rounded education in all the things that make us humans human.
Image Source: OpenAI’s GPT-2 (SoTA Language Model)
By: Tony Veale

Last summer, in the brief interlude between the first and second lockdowns, my wife and I slipped out to the cinema to see the new Christopher Nolan film, “Tenet.” Like “Memento” on steroids, this promised to be a head-wrecking time-travel yarn in which shadowy figures use cod science to move backward through time. The next day, I summarized our experience in an email to a fellow Nolan fan:

“BTW, we went to see ‘Tenet’ last night. Our brains are still bent out of shape. But we plan to see it again last week, so we’ll understand it eventually.”

Gmail helpfully underlined my choice of “last” in blue, to politely signal that I should, perhaps, rethink it. After all, how could a forward-looking plan take place in the past, unless I too could travel back in time? While Gmail fails to grasp my intent, it shows an impressive grasp of everyday language in spotting my conflict of times and tenses. To do this, it leverages powerful machine learning techniques that train its language model (LM) to mimic the natural rhythms of language.

We find LMs at the root of many AI technologies, from the auto-predict features of texting apps to chatbots, speech recognition, machine translation, and automated storytelling. They can be used to fill in the blanks in cloze tests, to weigh the linguistic acceptability of different wordings of the same idea, or to generate the most likely completions for a given prompt. LMs are all about form, not substance, but when the surface forms are so expressive, the deeper substance is tacitly modeled too. So, it is quite possible to let Gmail write large chunks of your emails for you (certainly the parts that are the most ritualized, or the most predictable) while you occasionally intervene with the specific details that its model cannot predict for itself.

LMs make real the fear that George Orwell expressed in a famously polemical essay about language from 1946. Orwell fretted that English was so clogged with easy clichés, inviting idioms, and stale metaphors that writers must fight against the pull of convention so as to say anything that is vivid and fresh. As he put it, “ready-made phrases … will construct your sentences for you, even think your thoughts for you.” He would no doubt be appalled by how Gmail’s smart compose feature is contributing to the IKEAfication of language, and side with those who argue that some LMs are little more than “stochastic parrots.” These models may be a rather sophisticated marriage of statistics and the cut-up method of William S. Burroughs, but recall how that method was ultimately used to shatter clichés and to jump out of, not leap headlong into, the most rutted avenues of language.

And so it is with LMs: A machine can use an LM to enforce linguistic orthodoxy or to subvert it, making predictable choices here and highly improbable ones there. It can be used to recognize unlikely sequences, like my “see it again last week,” and to suggest normative fixes, like “next week.” But it can also do the reverse, and find compelling ways to make the stale and the conventional seem fresh again.
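
The flag-and-fix step can be illustrated with a toy bigram model, a far simpler cousin of the neural LMs behind smart compose. This is a minimal sketch under stated assumptions, not Gmail’s actual method. The six-sentence corpus is invented, and add-k smoothing stands in for a trained network. The function names and the 0.1 ratio threshold are hypothetical choices made for illustration. A word is flagged when the model finds it far less likely than the best-attested word in the same slot, and that word is offered as the normative fix.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus standing in for the web-scale text a real LM learns
# from; "." tokens mark sentence boundaries.
CORPUS = (
    "we plan to see it again next week . "
    "i will see you next week . "
    "we saw it last week . "
    "they plan to visit again next week . "
    "see it again next week . "
    "we met last year ."
).split()


def train_bigrams(tokens):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        counts[prev][word] += 1
    return counts


def prob(counts, vocab_size, prev, word, k=0.1):
    """Add-k smoothed bigram probability P(word | prev)."""
    total = sum(counts[prev].values())
    return (counts[prev][word] + k) / (total + k * vocab_size)


def flag_unlikely(sentence, counts, vocab_size, ratio=0.1):
    """Return (index, word, suggestion) for each word that is far less likely
    than the model's preferred word in the same slot."""
    words = sentence.lower().split()
    flags = []
    for i in range(1, len(words)):
        prev, word = words[i - 1], words[i]
        if not counts.get(prev):
            continue  # unseen context: the toy model has no opinion
        best, _ = counts[prev].most_common(1)[0]
        if prob(counts, vocab_size, prev, word) < ratio * prob(counts, vocab_size, prev, best):
            flags.append((i, word, best))
    return flags


counts = train_bigrams(CORPUS)
vocab_size = len(set(CORPUS))
print(flag_unlikely("we plan to see it again last week", counts, vocab_size))
# → [(6, 'last', 'next')]
```

Run on the “Tenet” email, the sketch flags “last” and proposes “next.” A real LM would condition each word on the whole preceding context, such as the forward-looking “plan,” rather than on a single predecessor.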

Consider puns, the earliest form of verbal humor that children, and child-like machines, learn to master. Suppose that, feeling upbeat about your Covid-19 booster, you describe the rollout as “a jab well done.” An LM will assign a low probability to this wording, and a much higher probability to its phonetic neighbor, “a job well done,” in much the same way that Gmail predicts “next,” not “last,” as the most likely word after “plan to see it again.” Yet your pun is more easily read as a deliberate attempt at humor than as a slip, because it combines phonetic similarity (jab/job) with statistical divergence (each wording has a very different probability of occurring). Suppose a machine can also link the word “jab” to the broader context of vaccination using other statistical models. These same models would recognize a “jab” pun in a boxing context too. The machine can then confidently assert that your choice of words is both deliberate and meaningful: You did mean to say “jab,” and “jab” can be taken both at face value and as a substitute for “job.”

A machine can also apply this process in reverse. It can swap out a word in a familiar setting, such as a popular idiom, for one that sounds alike. The new word must have a much lower likelihood (according to our LM) of being seen in this setting, and it must have a solid statistical grounding in the larger context. The key to punning is recoverability: Our substitutions must be recognized for what they are, and then easily undone.
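
The recipe in this paragraph can be sketched in a few dozen lines. It combines phonetic similarity, statistical divergence, and contextual grounding, with recoverability as the final test. This is an illustrative sketch, not the author’s system. The idiom list and the word-to-context associations are hypothetical stand-ins for the large resources a real punning machine would need. The pronunciations are also hypothetical, written by hand in the style of the CMU Pronouncing Dictionary with stress marks omitted. Matching a stock idiom exactly is a crude proxy for an LM’s high-probability wording. The hypothetical `detect_pun` recovers the idiom behind a pun, and `generate_puns` runs the same tests in reverse.

```python
# Hypothetical stand-ins for the resources a real punning system would need.
# Stock idioms play the role of high-probability wordings under an LM.
IDIOMS = ["a job well done", "break the ice", "time flies"]

# Hand-written ARPAbet-style pronunciations (stress marks omitted).
PRONUNCIATIONS = {
    "job": "JH AA B",
    "jab": "JH AE B",
    "time": "T AY M",
    "thyme": "T AY M",
}

# Contexts in which each candidate pun word is statistically at home.
ASSOCIATIONS = {
    "jab": {"vaccine", "booster", "covid", "boxing", "punch"},
    "thyme": {"cooking", "herbs", "recipe"},
}


def edit_distance(a, b):
    """Levenshtein distance between two sequences (here, lists of phonemes)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]


def sounds_alike(w1, w2):
    """Phonetic similarity: distinct words at most one phoneme apart."""
    if w1 == w2 or w1 not in PRONUNCIATIONS or w2 not in PRONUNCIATIONS:
        return False
    return edit_distance(PRONUNCIATIONS[w1].split(), PRONUNCIATIONS[w2].split()) <= 1


def grounded(word, context):
    """Contextual grounding: the pun word belongs to the surrounding topic."""
    return bool(ASSOCIATIONS.get(word, set()) & context)


def detect_pun(phrase, context):
    """Return (pun_word, original_word, idiom) if phrase reads as a deliberate pun."""
    words = phrase.lower().split()
    if " ".join(words) in IDIOMS:
        return None  # the conventional, high-probability wording: no pun
    for idiom in IDIOMS:
        base = idiom.split()
        if len(base) != len(words):
            continue
        # Statistical divergence (crudely): exactly one word departs from the idiom.
        diffs = [(w, b) for w, b in zip(words, base) if w != b]
        if len(diffs) != 1:
            continue
        pun, original = diffs[0]
        if sounds_alike(pun, original) and grounded(pun, context):
            return pun, original, idiom  # recoverable: swap back to get the idiom
    return None


def generate_puns(context):
    """The reverse process: swap one idiom word for a grounded sound-alike."""
    puns = []
    for idiom in IDIOMS:
        base = idiom.split()
        for i, word in enumerate(base):
            for candidate in ASSOCIATIONS:
                if grounded(candidate, context) and sounds_alike(candidate, word):
                    puns.append(" ".join(base[:i] + [candidate] + base[i + 1:]))
    return puns


print(detect_pun("a jab well done", {"covid", "booster"}))  # → ('jab', 'job', 'a job well done')
print(detect_pun("a jab well done", {"gardening"}))         # → None
print(generate_puns({"cooking"}))                           # → ['thyme flies']
```

The same “jab” also passes the grounding test in a {"boxing"} context, as the paragraph observes. In a gardening context it fails that test and is not credited as a pun.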

So how do you go from a machine sense of punnery to a machine sense of, say, irony? Machines are competent at the former, even if mastery is elusive, yet stumble on the latter. However, a moment’s thought reveals the kinship between these two forms of humor. The two-for-one echoing in irony may be conceptual rather than phonetic, but an ironic echo should be just as recoverable. When I swapped “next” for “last” in the context of a time-travel movie, I relied on the semantic relationship between the two, and on the fact that “last” echoed the plot of “Tenet.” Irony may operate on a higher level of knowledge, of words and the world, where both the rewards and the risks are higher, but the fundamental mechanisms are the same. It is the kinds, and sources, of knowledge that differ, so an ironic machine needs a more extensive, and more expensive, education to master its raw materials.

The next obvious question: What does a machine’s sense of irony do for us, the machine’s users? The strongest case can be made for the automated recognition of irony, and of sarcasm too, since these have such dramatic effects on the perception of a user’s sentiment, whether in online reviews, social media, and emails, or in our direct interactions with the machine. To do its job properly, a machine needs to grasp our intent, but irony can do to sentiment analysis what a magnet does to a compass, making true north difficult to locate. So, for instance, if a machine is going to derive actionable insights from an online product review, it needs to know whether a positive outlook is sincere or ironic.

In an email setting, a machine sense of irony can gauge whether an incongruity is deliberate or accidental, and prompt the machine to suggest a fix if it is the latter. But its real value is not in the removal of that little blue line, but in how it allows a machine to help and advise its users. Perhaps the context is not clear enough to support irony, and needs a little more oomph to draw out the humor? Even if the irony is suitably anchored, perhaps it is not suited to its addressee, who may be an individual with no track record of wit in their own emails, or a rather large distribution list that poses a high risk of misunderstanding or accidental offense?

Just as a good friend might advise us against drunk driving and drunk dialing, a machine with enough emotional intelligence to grasp and use irony can advise us against angry emailing and snarky tweeting. Just think of the careers that have been ruined by smart-aleck tweets at 3 a.m. that turn rancid in the light of day. A tap on the shoulder or a forced timeout might well have saved the day. Saving us from our impulses may still be the best and most practical reason to give machines a sense of humor. A sense of irony, and of how a witty upending of a jaded nostrum can soften the indignity of implied criticism, can lend real weight to an intervention, and make machines a joy to use even when their target is us. To get to this point, we must lift their sense of humor out of the playground, where puns reign supreme, and into the realm of ideas, so as to turn the unearned wisdom of convention on its head.

As this playground metaphor suggests, the path from punning to higher forms of humor is a well-rounded education in all the things that make us humans human. There is a good reason why the lonely-hearted seek out partners with a GSOH (a good sense of humor) in dating profiles. Jokes are fun, but it is what they rest on — an understanding of others, a willingness to laugh at oneself, and a nimbleness with norms that present themselves as rules — that matters most to us humans.

Tony Veale is Associate Professor of Computer Science at University College Dublin, with a focus on computational creativity. He is the coauthor of “Twitterbots: Making Machines That Make Meaning” and author of “Your Wit Is My Command: Building AIs with a Sense of Humor.”

The MIT Press is a mission-driven, not-for-profit scholarly publisher. Your support helps make it possible for us to create open publishing models and produce books of superior design quality.