Biological Babel: Why DNA Is Humanity’s Infinite Atlas

Jorge Luis Borges, the reclusive Argentinian librarian, author, polymath, and expert on Anglo-Saxon literature, was born in 1899. Raised in the Palermo district on the shabby northern outskirts of Buenos Aires, Borges crafted miniaturist poetry and short stories that conjured up fantastic, metaphysical worlds.

The “chief event” of his life was, by his own account, his childhood fixation with his father’s library, a room with glass-fronted shelves containing thousands of volumes. Borges was mesmerized by it and by the imaginary worlds it evoked in his mind. Lacking childhood friends, he invented two imaginary companions to accompany him on his journeys into the fantastical worlds he created.
This early preoccupation with his father’s library likely inspired Borges’s celebrated story, “The Library of Babel.” Published in 1941, as part of his collection “The Garden of Forking Paths,” it is regarded as one of the most influential works of 20th-century literature. In it, Borges explores the mathematical concept of infinity through the metaphor of a library of “indefinite and perhaps infinite” size — a space containing every book that could ever be written.
The books in this hypothetical “Borgesian library” contain every possible arrangement of 25 orthographic symbols: 22 letters plus the period, the comma, and the space. The library is thus “complete” in its coverage of every literary possibility realizable with its constrained alphabet, and it is timeless, eternal, “useless,” and “incorruptible.” Everything that can be written with these 25 symbols already pre-exists as an endless array of typographical texts, randomly distributed across the library’s infinite expanses. The library contains books on every imaginable subject, including manuals, comic books, weather reports, commentaries, and opinions, as well as biographies detailing your life and mine that span both the past and the future. It includes humanity’s complete literary canon from Lucretius to Dickens and Murakami.
But the vast majority of meaningful texts in the library remain silent, irrelevant, and untouched, gathering dust for eternity. While acknowledging that humans would be “imperfect” librarians in the Library of Babel, Borges speculated that it might nevertheless be possible to formulate “a general theory of the library” — a framework that would reveal “the formless and chaotic nature of almost all the books.”
In many respects, the Library of Babel is a perfect metaphor for capturing the staggering scope of possibilities in biology, too. The ability to synthesize genomes makes the prospect of constructing and exploring defined regions of this library of all possible genome sequences a tangible, physical reality.
Yet the challenge of formulating a predictive theory of the library of biological possibility — a framework that could help identify meaningful regions within its endless landscapes — has so far eluded us. To reach them, we will need a method for charting this mathematical jungle that can reliably predict where coherent biological territory might be located. Without such a system, the task of retrieving a coherent book from the sea of typographical nonsense would be like looking for a microscopic needle in an endless haystack.
Unlike Borges’s books, which are drafted in combinations of alphabetic letters, the language of biology utilizes a much simpler alphabet, comprising combinations of the four nucleotide bases of DNA. The philosopher Daniel Dennett formulated a version of this nucleotide sequence rendition of the Library of Babel, which he called “The Library of Mendel.” But I prefer to think of the imaginary library of all possible DNA nucleotide sequences as “Fred’s Library” — in honor of Frederick Sanger, a British biochemist who, in 1977, developed the chain-termination method, which provided us a way to precisely read any sequence of DNA.
Instead of housing literary works like Borges’s fictional library, Fred’s library contains the genetic blueprints of every organism that has ever existed — ants, stick insects, dandelions, and pterosaurs, to name but a few. It contains not only actual organisms, but hypothetical ones. Unicorns, giant humans with feet the size of a cricket bat, immortal humans, miniature humans, humans resistant to cancer, humans the size of fleas, flying pigs, singing plants, talking fruits, and photosynthesizing ants.
It remains to be seen whether the logical “truth” of these genomes — their capacity to compute viable organisms — can be discerned from their DNA sequences alone. The vast majority are false genomes, masquerading as legitimate blueprints for life. They lack the essential organizational structure necessary for life. Within this “genomeverse,” the regions encoding viable species exist as rare, occasional mathematical islands scattered across endless oceans of irrelevance. The sheer scale of possibility ensures that most will never be realized; there simply isn’t enough time, space, or matter in the universe to permit it.
To see plausible entities, sequences that compute viable living organisms, one would need to traverse vast sections of wilderness, populated by irrelevance and gibberish. Along the way, one would encounter the twisted, fragmentary imperfections of impossible creatures, the secret sea of unrealizable beasts that creep, crawl, whirr, and whizz aimlessly through the warped landscapes of mathematical implausibility.
And then, of course, there is the tedium of the endless repetitions. For much of the traveler’s journey through the library, boredom would reign supreme. Here, nonsense is the norm; the plausible, the reasonable are “an almost miraculous exception.”
Nature has devised a machine capable of exploring a tiny corner of Fred’s Library. This engine — Darwinian evolution by natural selection — has allowed life to breach the library’s outermost edges. It is a biological time machine capable of traversing the present, but it is also the generator of the past. Once a genome sequence emerges through evolution, it becomes subject to the passage of time. The history of life on Earth — encompassing all past and present species — is its improbable and magnificent legacy.
Yet there is a fundamental difference between human-made structures and biological species. Human creations arise from deliberate design, shaped by conscious intent. Nature’s complexity, in contrast, though it may appear purposeful, is the outcome of a blind and indifferent evolutionary process. Evolution operates without foresight, driven not by intention but by chance mutations and the pressures of environmental selection.
Take the Cretaceous–Paleogene extinction, for example, the most recent of the so-called “big five” mass extinction events, which occurred around 66 million years ago. It wiped out roughly three-quarters of Earth’s species, including the non-avian dinosaurs, and is thought to have been caused by the impact of an asteroid over 10 kilometers in diameter. When it crashed into the Yucatán Peninsula, the Chicxulub asteroid released an amount of energy equivalent to 10,000 times the world’s nuclear arsenal, plunging the planet into a catastrophic global winter. Without this cosmic event, the evolutionary trajectory of life on Earth, including the emergence of humans, would have been profoundly different.
Beyond chance events, the restless, insatiable human imagination, forever searching out the possibility of an improved future, has also intuited the potential for alternative worlds and different ways of being. Long before modern genetic science, Neolithic farmers pioneered selective breeding methods, crafting the first genetic time machine capable of exploring life’s future. The domestication of wild species occurred independently in several far-flung geographical regions across the world. By selecting for favorable traits in plants and animals, early agriculturalists unlocked a new frontier of genome exploration, unwittingly laying the groundwork for modern biotechnology.
One of the greatest triumphs of Neolithic selective breeding occurred some 10,000 years ago in Mexico’s Balsas River Basin, where a wild grass called teosinte was gradually transformed into maize. Through generations of selective breeding, farmers turned an unremarkable, tough-seeded, wild grass into one of humanity’s most essential staple crops. By the time European colonizers arrived in the 15th century, maize had become the dominant food production system across the Americas. Today, it accounts for roughly one-fifth of all the calories humans consume annually.
Recent studies have shown that this transformation was primarily driven by small genetic changes, often in regulatory genes — the master switches that can have outsized effects by orchestrating development. Slight adjustments to these critical control points can give rise to entirely new features. In the case of maize, these innovations reshaped teosinte’s architecture, transforming its stony-sheathed seeds into cobs with hundreds of exposed kernels. Interbreeding between different teosinte strains also played a pivotal role. By 6,000 years ago, teosinte was only partially domesticated. But once it reached the Mexican highlands, farmers crossed it with local highland strains. This infusion of new genes supercharged the genome of “second-wave” maize, endowing it with traits that completed its domestication.
The Victorians, too, took selective breeding to extravagant new heights, channeling their fascination with unnatural selection into reimagining the nature of dogs. Queen Victoria herself kept a King Charles spaniel named Dash. Yet this “passion” for dogs had a darker underside. The distinctive spots of Dalmatians, for example, introduced vulnerabilities that left the breed prone to debilitating urinary tract diseases.
Selective breeding is, however, an imperfect time machine. It can only explore adjacent genetic territories, inching forward step by tentative step. Its navigation of genetic space is performed, moreover, without maps. It has no notion of direction, of where it is heading, or of what it might find when it arrives there. To build more sophisticated time machines capable of venturing deeper into the alien worlds scattered across Fred’s Library would require further leaps in human ingenuity.
One such quantum leap occurred in 1927, when geneticist Hermann Joseph Muller enthralled a Berlin audience with the unprecedented results of his latest experiment.
Back at the University of Texas, Muller told the conference, he had bred fruit flies in a refrigerator to shield them from the intense Texan heat. Using a machine of his own design, he then irradiated them with a high dose of X-rays, which boosted their mutation rate by a hundredfold. While selective breeding merely shifted the frequencies of naturally occurring mutations, Muller was able to introduce artificial mutations directly into the genomes of species. More strikingly, the mutations proved heritable — suggesting that they had been permanently fixed in the genetic code.
Muller was awarded the 1946 Nobel Prize in Physiology or Medicine for his discovery. Yet while he had produced the prototype of a molecular time machine capable of leaping a little further into the future than selective breeding, his method remained constrained by its starting material.
Only nine years later, such constraints vanished. In 1955, biochemist Alexander Todd and his colleague A. M. Michelson chemically synthesized, for the first time, a minute fragment of artificial DNA. It was tiny, just two nucleotides long and far smaller than the tiniest genome, but they had shown it was possible to construct DNA of a predefined sequence from scratch. Most crucially, the synthetic material was indistinguishable from natural DNA.
This new capacity to chemically synthesize DNA was nothing short of revolutionary. It offered the potential to traverse the entire landscape of DNA sequence space. Life would no longer be tethered to pre-existing genetic material. Significant improvements would be required, but the essential step had been taken. Todd had established the foundations of the science of synthetic genomics.
While the ability to synthesize DNA out of thin air, figuratively speaking, has greatly expanded our access to Fred’s Library, journeys into its infinite landscapes require more than just a molecular time machine; they demand a map. Without some familiarity with the library’s geography, the odds of arriving at a meaningful destination within its mathematical expanse are infinitesimally small. What we need now is a guidebook that can identify potential destinations and highlight regions of particular interest.
In “The First Voyage Around the World,” published in 1524, the Venetian explorer Antonio Pigafetta chronicled his three-year circumnavigation as part of Ferdinand Magellan’s Spanish expedition. Of the roughly 270 men who set out, only 18 completed the voyage. Pigafetta’s detailed maps and observations charted previously unrecorded regions, among them the vastness of the Pacific and the narrow straits that would later bear Magellan’s name. Fred’s Library still awaits its own Magellan: a patient and meticulous cartographer to begin the momentous task of mapping out the contours of its landscapes.
But unlike Earth’s finite geography, the DNA sequence landscapes of Fred’s Library are infinite. The instruments required for this cartographic task are not the sextants, chronometers, and compasses used for geographical exploration, but rather AI, massive biological datasets, and the ability to construct synthetic DNA sequences at scale. Taken together, this strong iteration of artificial biological intelligence will eventually enable us to chart DNA sequence space. The modern-day Pigafetta circumnavigating Fred’s Library will set sail not in a ship, but at the helm of an AI-enabled digital computer.
The potential destinations within the metaverse of potential genomes possess a concrete reality, as tangible in their own way as the Eiffel Tower, Times Square, or the Albert Memorial. Each genome sequence in this endless expanse is, literally, a piece of informational real estate. A destination to be visited, and whose nature may be defined without physically having to go there.
Adrian Woolfson is the cofounder of Genyro, a California-based biotechnology company specializing in synthetic genome design and construction. He is the author of several books, including “On the Future of Species,” from which this article is adapted.