From Atoms to AI: The Futile Search for a “Perfect” Language

To convince us of the atomic hypothesis, according to which atoms are the building blocks of all matter, the Roman philosopher Lucretius uses the model of the alphabet. Just as there is only a limited number of letters, he proposed, there is not an infinite number of types of matter. Rather, everything we observe results from infinitely many copies of a finite number of types of elements which, driven by random movement and combining according to their specific shapes, give rise to new shapes and properties that the atoms themselves did not originally possess.

However, this credible and fascinating hypothesis leaves us with a new problem, one that Lucretius does not explicitly raise but that follows logically from it: Is there any order among the basic elements themselves, that is, an underlying structure or set of rules? And can this order manifest itself in the objects we observe?
Lucretius’s own metaphor of language, specifically the “elements” of the alphabet, offers a way into these questions. In Western alphabets, as everyone knows, letters follow a rigid order whose origins lie in a distant past, in Mesopotamia and Egypt. It is certainly no coincidence that the word alphabet itself refers precisely and solely to the relative order of the first two letters (alpha and beta). Yet there appears to be no deeper reason for any letter to precede any other. The same problem of order arises whenever we speak of primitive elements of any kind: There is no deeper reason for any one element to precede another.
Reading Lucretius and his insistence on the question of order reminded me of the words of another canonical giant, written in a relatively neglected text. I’m speaking of René Descartes, who, in a 1629 letter to Father Marin Mersenne, a renowned philosopher and mathematician, gave his opinion on the project of a mutual friend, the mathematician and polyglot Claude Hardy. The project, known from no other source, centered on a proposed artificial language that could be learned “in five or six hours.”
Descartes, without hesitation, completely and irrevocably demolishes this proposal for a series of exemplary reasons, ranging from the difficulty of devising sounds that would suit speakers of all languages to the fact that an invented language without a literature, or even an epic, of its own is practically useless (those who know J. R. R. Tolkien’s invented languages know this well). His reaction, however, goes far beyond the pars destruens, offering an alternative proposal that is quite amazing and totally unexpected: a reasoned and systematic catalog of all the infinitely many possible thoughts:
Besides, I find it possible to add another invention to this, both for composing the primitive words of this language and for their characters, in such a way that it can be taught very quickly and by means of an order, that is to say, by establishing an order among all the thoughts that can enter the human mind, just as there is a natural order established among numbers. And just as in a single day one can learn to give names to all numbers up to infinity and to write them in an unknown language, even though they would amount to an infinite number of different words, one could do the same thing with all other necessary words to express all the other things that fall within the human mind. If this order were discovered, I have no doubt that this language would soon spread throughout the world, for many people would have the desire to spend five or six days in order to make themselves understood by all men.
Before approaching the very core of this quotation, let’s consider some important collateral aspects. Descartes is not the only one who dared to dream of discovering a better, perfect, or universal language. Shortly after him, in the same century, Gottfried Wilhelm Leibniz offered his Lingua characteristica universalis, and Juan de Caramuel y Lobkowitz, bishop of Vigevano, his Leptotatos; jumping ahead to the 20th century, Louis Hjelmslev presented his Glossematics and Giuseppe Peano his Algebra de Gramatica. Alongside these ingenious efforts, however, ran the great scam of “perfect” languages, which sadly led to the delirium of ranking tongues and the attempt to anchor them in the notion of “race.”
There is, of course, a danger in searching for a perfect language. It emerged in the 19th century with the German-born linguist Max Müller’s hypothesis that a noble language could be spoken only by people of the noble race, with the term “noble” standing in for “Aryan.” He repudiated his theory later in life, but by then it was too late: This delusional idea had already taken root in countries all over the world and fed the political propaganda of governments calling for aggression against other populations, and even for extermination, as in the case of the Jewish people.
The fact that the notion of race is not biologically sustainable proved wholly insufficient to prevent the social consequences of this delirium. As humans, we naturally feel the need to name phenotypical differences shared by groups of people, such as skin color, eye shape, or average height, even if not with the collective term “race,” which nowadays has negative connotations. And even if these biological details are set aside on scientific grounds, another, much more dangerous and underestimated issue remains: cognition. Two independent ideas, neither of which is generally considered dangerous in isolation, form a powerfully explosive mixture when combined.
The first idea is the hypothesis that some languages are “better” than others. Some hold that certain languages have a lexicon that captures abstract concepts better than others do, that some operate more quickly than others, and that some have a more stable word order at the level of the sentence, while others have a free, even chaotic one. Others regard certain languages as more acoustically pleasant or harmonious than the rest.
The second idea is the hypothesis that a person perceives reality and reasons about it differently depending on what language they speak.
Together, these two hypotheses make up the most radical, dangerous, and pervasive of all racisms: the one that bears on a person’s capacity to understand, think, and love. Interestingly, Dante Alighieri, in his treatise “On Eloquence in the Vernacular” (De vulgari eloquentia), already recognized that this argument was delusional. He mocks those who think a “better” language is possible by referring to Pietramala, a small abandoned village, or perhaps a crumbling manor, in the Italian Apennines between Bologna and Firenze, home of the Tarlatis, a family rival to the Alighieris: “Pietramala is a great city indeed, the home of the greater part of the children of Adam. For whoever is guided by such an obscene reasoning as to think that the place of his birth is the most delightful spot under the sun, he may also believe that his own language, i.e., his mother tongue, excels over all others; and, as a result, he may believe that his language was also Adam’s language.”
In this passage, Dante uses the very strong, aggressive adjective “obscene” (in Latin obscenus, meaning both “indecent” and “inauspicious”) to qualify this reasoning, which is quite surprising in a lofty treatise addressed to intellectuals of all countries. This stylistic choice indirectly reveals Dante’s strong feelings, if not his outrage: Through such harsh irony and unexpected verbal crudity, he expresses his opposition to those who conflate the affective domain with the rational one. Being in love with someone doesn’t authorize the lover to argue that one’s beloved is the best person in the world; the same holds for languages.
Had Western culture listened to Dante and read him carefully (something that sadly did not happen, as even Alessandro Manzoni recognized), we might never have entertained the delirious notion of a noble language and a noble race. There is simply no language that is better than another, and certainly no perfect language.
To return to Descartes’s reflections: His comparison of a natural order of thoughts to the natural order of numbers is astonishing, not only for the force of his analysis but also for the originality with which he weaves ideas together to generate new understanding. It arguably surpasses the insights of all other such dreamers in the history of Western thought: What would be needed is “an order among all the thoughts that can enter the human mind, just as there is a natural order established among numbers.”
Some examples may clarify his intuition. No one ever doubts which number follows, say, one thousand seven hundred and twenty-two, or what that number is called; but when it comes to which word follows the word cloud, no one knows, and this is necessarily the case. We might think of the plural clouds, and then of cloudy, the adjective derived from cloud, but we think of these only because of alphabetical order: Adding another letter, s, to cloud makes clouds a successor of cloud, and cloudy is a successor of clouds because y follows s. But that pertains to the form, the signifier, the sound: When it comes to a word’s content, the signified or the meaning, we are simply not capable of placing words in an exhaustive, complete natural order.
Such was the dream expressed by Descartes, or perhaps we should say his mirage; yet it is a very important dream, for at least three reasons.
The first is that everyone would like a map of meanings that defines the borders of Babel, which would amount to a map of impossible meanings. The second is that we would like to understand how words are stored in our brain: Slips of the tongue sometimes suggest that words are arranged by form, as in alphabetical order, leading us to say bed instead of bread by mistake, but at other times they must be arranged according to meaning, because we mistakenly utter a conceptually contiguous noun, chair, instead of bed. The third is that machines able to process language automatically are now being built; I am thinking of popular “talking machines” such as ChatGPT.
These machines rely on a type of computational model based on “neural networks,” namely (very) large language models, referred to as “(v)LLMs.” They are the latest in a series of inventions that began in the mid-20th century under the now obsolete name of “cybernetics” before being renamed “artificial intelligence.” While “artificial intelligence” is a fairly opaque term, it is at least less pretentious than “neuronal” or “neural networks,” whose mechanisms remain quite mysterious in light of what we know about real neurons.
None of these attempts can compete in ambition with the Cartesian dream of a natural order among word meanings, but there is also a substantial difference: The Cartesian dream held out the opportunity to better understand the structure of language, our limits, and thus, ultimately, ourselves. Modern technology, by contrast, aims to facilitate everyday life by creating machines that can help humans perform routine tasks. Ever since the invention of the first tool, we have simply designed artificial objects to help us avoid fatigue and boredom.
One difference needs to be highlighted, though: To properly compare and contrast humans and machines, we need to understand the intrinsic limits of each. Not long ago, we thought machines were too primitive to resemble humans and that only more advanced technology would make them capable of doing what we do.
Today, talking machines have led us to overturn this perspective radically, in support of the epoch-making strides made by formal and comparative linguistics in the second half of the 20th century. The central fact is quite easy to grasp: For machines, there are no impossible languages, that is, languages built on rules that no human language uses, such as placing negation after the third word of a sentence. We need to be clear on this point: Even if a machine were to “behave” differently when processing an impossible language, as humans do, the nonbiological nature of impossible languages would still mark a difference between humans and machines. Humans exploit natural circuits, shaped by genetic instructions under evolutionary pressure, but only for possible languages; impossible languages progressively inhibit those circuits and are computed by other networks in the brain. In other words, for machines and their grammars (the vLLM-based programs), even the very notion of a “natural circuit” has no empirical equivalent.
Machines do not resemble us humans, not because they lack computational power, but because they are too powerful. In short, they do not look like us because they do not have our limits: After all, we are our limits.
In the end, what stands out after reading Lucretius, at least to me, is that his reflections on the alphabet as a model of the universe should not be dismissed as an antiquarian curiosity. The very idea of a grammar for the cosmos still stimulates us today, and it allows us, 2,000 years later, to think about our limits, about our understanding of the world and its connection with our brain, and, finally, to trace a meaningful, non-mystical, and in fact quite measurable and substantial border between humans and machines. We still have good reasons to read Lucretius and let him lead us to formulate the right questions about a very different kind of Big Bang, the big bang of language, because when we reflect on language, the data we are considering are, ultimately, ourselves.
Andrea Moro, member of the Accademia dei Lincei, is Professor of General Linguistics at the Institute for Advanced Study (IUSS) in Pavia and at the Scuola Normale Superiore in Pisa, Italy. He is the author of “Impossible Languages,” “The Secret of Pietramala,” and “Lucretius and the Bat with Blue Eyes,” from which this article is adapted.