Just a quick note on some probably unoriginal ideas I had about categorizing biological data storage methods.
These are listed in roughly the order of their evolution and their respective capacity, but there’s nothing particularly deterministic about that progression. Some of them are, invoking the anthropic principle, more or less necessary for this blog to be written at all, but plenty of microbes get on fine with the supposedly less sophisticated subset of them.
This blog is inspired in part by the astonishing Pfizer/Moderna mRNA vaccines for COVID, which I just cannot wait to get shot into my immune system. It’s so damn cyberpunk!
DNA/RNA. The last universal ancestor used DNA to store information, most notably the instructions to make copies of itself. Its species could adapt to its environment through evolution and natural selection, which represents a very distributed and slow mechanism for reading and writing data into storage. DNA is a digital storage mechanism. Humans have about 3 billion base pairs of DNA, representing 6 gigabits of raw data. There is, however, a lot of redundancy and non-coding DNA included, and all humans share about 99.9% of their DNA. So the individual information entropy separating even unrelated individuals may be as little as 6 megabits. By comparison, a novel might have about 100 kbits of entropy.
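The arithmetic above is easy to check; a quick sketch, using only the figures from the text:

```python
# Back-of-envelope check of the genome figures above (all from the text).
base_pairs = 3_000_000_000        # approximate human genome size
bits_per_base = 2                 # four possible bases -> log2(4) = 2 bits each
raw_bits = base_pairs * bits_per_base        # 6 gigabits of raw data

# Unrelated humans share about 99.9% of their DNA, so the entropy
# separating individuals is roughly 0.1% of the raw capacity:
individual_bits = raw_bits * 0.001           # ~6 megabits
```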
Immune System. Phylogenetic bracketing indicates that the adaptive part of our immune system evolved during the Cambrian explosion, about 525 million years ago, and is shared by all vertebrates. It uses lymphocytes to recognize, attack, and remember foreign pathogens in the human body. It’s super duper complicated and I don’t understand it, but here I’m interested in its data capacity. Humans are exposed to millions of pathogens every day, and T cell receptors can “spell” at least hundreds of thousands of different shapes. Over time memory may fade, while autoimmune problems are caused by errors in the delete function. I point this out only to reinforce that, like DNA, data stored in hundreds of millions of immune lymphocytes is not necessarily data in a fully general sense, but it’s safe to estimate its overall capacity at around a gigabit of information. Unlike DNA (although employing a similar selection mechanism in miniature), an individual organism can update their immune system through infection or vaccination in as little as a couple of days, conferring greatly increased immunity for the remainder of their life. Parts of this immune system can even be transmitted to offspring through, e.g., milk. A biological data storage mechanism that can react effectively within days, compared to the countless deaths over hundreds of generations required for natural selection, is a very valuable adaptation.
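One hypothetical way to arrive at a gigabit-scale estimate. The cell and shape counts below are illustrative assumptions consistent with the text, not measurements:

```python
import math

# Assumed figures: "hundreds of millions" of memory lymphocytes, each
# effectively recording one receptor drawn from a large shape space of
# "at least hundreds of thousands" of shapes.
memory_lymphocytes = 1e8
receptor_shapes = 1e5

bits_per_cell = math.log2(receptor_shapes)         # ~17 bits to name one shape
upper_bound = memory_lymphocytes * bits_per_cell   # ~1.7 gigabits

# Many cells carry duplicate receptors, so the true entropy is lower,
# which is why "around a gigabit" is a safe overall estimate.
```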
Bee dances. While not a part of our collective human evolutionary history, bee waggle dances are so damn cool I am including them anyway. Strictly speaking the waggle dance doesn’t form any kind of persistent collective cultural memory, but it does provide some insight into the navigation and learning capacity of superficially simple insects. The waggle dance encodes information about the quality, distance, and solar bearing of flowers and does so by transferring perhaps 15-20 bits of information in a repeating action that takes a few seconds to communicate. The ability of spectator bees to successfully navigate to the pollen source varies substantially among members of the hive, with some preferring to forage at random.
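For a sense of how a 15-20 bit dance budget might be spent, here is an illustrative split. The resolution choices are my assumptions, not measured bee behavior:

```python
import math

# Hypothetical resolutions for the three quantities the waggle dance encodes.
distance_levels = 64     # ~100 m bins out to a few kilometres -> 6 bits
bearing_levels = 128     # ~3 degree solar-bearing resolution  -> 7 bits
quality_levels = 8       # coarse flower-quality grades        -> 3 bits

# Total information if each quantity is quantized independently:
dance_bits = sum(math.log2(n) for n in (distance_levels, bearing_levels, quality_levels))
# 16 bits, comfortably inside the 15-20 bit estimate
```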
Brains. While the earliest vertebrates had nervous systems, they weren’t much more sophisticated than worms and probably didn’t have particularly deep thoughts. Yet many fish (including their mammalian descendants) and even a few invertebrates (particularly cephalopods) went one step further and evolved brains to integrate sensory information, coordinate motion, plan, and learn from the environment. Despite common myths, even goldfish can remember things for much longer than four seconds! Again, I’m far from an expert on brains but I will point out that birds make do with much smaller brains than many mammals, possibly due to more efficient neural structure.
Brains are much more like conventional computers when it comes to data capacity and read/write capability, compared to the evolution of DNA or the adaptations of the immune system. It’s difficult to quantify the capacity of the human brain but some estimates put it at around 2.5 petabytes. Certainly I’ve never experienced running out of space, although it seems that perhaps retrieval is more troublesome than storage. There are known cases of patients in brain surgery responding to a stimulus by reciting, verbatim, books read half a century prior and almost certainly (except in rare cases) completely “forgotten” as measured by conventional means. Human memory is contextual, constructed, and often unreliable. And yet its storage capacity is effectively infinite.
As for reading and writing, I often feel that these are major bottlenecks. Within certain contexts, humans can absorb vast quantities of information very quickly, such as an afternoon spent learning a new skill or reading a good book. Other sorts of data, less anchored to the familiar, are much harder to retain and yet, with practice, can be remembered with astonishing precision. Experts in music or chess can often recite symphonies or games move by move after only a single viewing.
In writing, a human can speak or type intelligibly at around 100 words per minute. With the assistance of intuition, context, and other people, the data rate from people in positions of leadership can be increased incrementally. But given that the information entropy of text is about one bit per word, that’s only about 100 bits per minute, which isn’t a whole lot, especially when most people would struggle to maintain the rate for more than a few hours at a time. On occasion I write relatively prolifically, but seldom more than 10,000 words in a day, representing a limit of about 10 kbits/day.
Speech and oral history. Jumping more fully into human prehistory, modern social structures suggest that ancient kin groups or tribes employed communication to improve their odds of survival. Here the question is how much data can be reliably transmitted from generation to generation using purely oral history. As a data storage mechanism it’s relatively flexible and able to accommodate the addition of new information, but its capacity is limited by the weakest link in the chain. Who knows how many times various facts of nature were discovered by neolithic scientists only to be lost due to cultural discontinuities.
There are about 180,000 words in each of the Iliad and Odyssey, two Homeric Greek epic poems passed down orally for generations prior to being written down. We know of numerous other epic poems from this era that have not survived to the present day, though many of them were written down during the classical period and lost more recently. For comparison, the Bible has about 780,000 words.
There are Australian Aboriginal stories describing events such as volcanic eruptions that have been dated to about 13,000 years ago, so we know that oral history can transmit information over much longer periods than the comprehensible coherence interval of a language. That is, after a thousand years a language typically evolves enough to be only about 80% similar to its former self. After 13,000 years, barely 5% remains. Even though it was written just over 600 years ago, Chaucer’s “Canterbury Tales” in Middle English is nearly incomprehensible to naive reading. Shakespeare is Modern English.
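The 5% figure follows directly from compounding the 80%-per-millennium drift:

```python
# If a language retains ~80% similarity to itself per thousand years,
# similarity compounds multiplicatively, like radioactive decay:
retention_per_millennium = 0.80
millennia = 13
similarity = retention_per_millennium ** millennia   # ~0.055, barely 5%
```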
At typical rates of recitation, it takes about 2 hours to speak one of the 24 books of the Odyssey. If a prehistoric society tells stories for a couple of hours each evening on a yearly cycle, about 2.5 million words can be spoken per year. Of course, some breaks might be taken, and some stories would only be told to certain people or at certain times. The quantity of information that can be reliably preserved, however, is probably not much more than half a dozen or so epics of similar scale to the Odyssey.
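A sketch of that yearly budget, using the recitation figures from the text; the two hours of storytelling per evening is an assumption:

```python
words_per_epic = 180_000     # Odyssey word count, from the text
books_per_epic = 24
hours_per_book = 2
words_per_hour = words_per_epic / (books_per_epic * hours_per_book)   # 3,750

# Assume roughly two hours of storytelling per evening, year-round:
yearly_words = words_per_hour * 2 * 365          # ~2.7 million words per year
epics_per_year = yearly_words / words_per_epic   # ~15 full recitations' worth
```

Since each story must be repeated often enough to be learned and retained, a repertoire of half a dozen epics within a ~15-recitation yearly budget seems about right.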
Writing. Symbolic representation of words as text in a persistent medium arose independently at least four times: in Mesopotamia, Egypt, China, and Mesoamerica. Writing is useful not only because it has essentially unlimited capacity and (almost) unlimited endurance, but also because it enables all kinds of bureaucratic technology necessary for the large-scale organization of civilizations. As previously discussed, writing is produced at only about 100 words per minute, but because reading requires relatively little effort, a motivated scholar can read all day (and night, given artificial light) for years, absorbing the thoughts and ideas of hundreds of other people, even if those people are no longer alive, never spoke the same language, lived in a different culture, or were entirely imaginary.
The ancient Library of Alexandria stored between 40,000 and 400,000 scrolls at its height, around 200 BC. While undoubtedly some of them were not very interesting, as a store of information it may have contained the equivalent of as much as 100,000 books, or 10 gigabits of information. For comparison, the Library of Congress has 170 million catalogued items, including 39 million books. The text-only English-language Wikipedia is about 20 gigabytes, or 160 gigabits (uncompressed).
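Putting those figures side by side, and reusing the ~100 kbit-per-book entropy estimate from earlier:

```python
alexandria_books = 100_000      # upper-end book-equivalent from the text
bits_per_book = 100_000         # ~100 kbits of entropy per novel, from earlier
alexandria_bits = alexandria_books * bits_per_book    # 10 gigabits

wikipedia_bytes = 20 * 10**9    # ~20 GB of uncompressed English text
wikipedia_bits = wikipedia_bytes * 8                  # 160 gigabits
ratio = wikipedia_bits / alexandria_bits              # ~16 Alexandrias
```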
The Printing Press. While a dedicated scholar could probably have read the entire collection at Alexandria in a lifetime, mass consumption and dissemination of knowledge, to say nothing of preservation of classical texts, was not really possible when all books were written by hand. Ada Palmer has a great anecdote about pre-Gutenberg books in medieval Florence costing more than a house, because creating a single one took a scribe more than a year of labor. Then came the printing press in the 1400s, whose modern form, the linotype, was phased out as recently as my childhood in favor of digital methods. Within a generation, the cost of a book was determined not by a year of skilled labor but by the cost of paper and ink, an improvement of eventually five orders of magnitude. The printing press enabled the expansion of medieval university libraries from thousands of volumes to millions and freed most scholars from the day-to-day drudgery of copying text after text. Books have been written about the social changes enabled by widespread literacy. While the read/write speed for a well-resourced individual was much the same, the overall flux of knowledge through the world’s people increased by many orders of magnitude.
Computers and The Internet. While books represent a transportable, durable, and accessible form of information, they still lack several intuitive aspects of knowledge that we take for granted in thought or conversation. Beginning with the Mother of All Demos, we began to see the emergence of a new knowledge medium: digital information. Mutable, storable, infinitely replicable, transportable, flexible, and enabling hyperlinks, the internet represents a major improvement in the capacity of evolved life to manipulate data.
While earlier forms of rapid communication, including the postal service, telegraph, and radio, delivered steady improvements in both speed and data capacity, none of these media were also durable stores of information in the way that the internet is.
In 2020, read/write, download/upload, and processing speeds are all measured in units of gigabits per second. Podcasters and streamers can exploit data abundance to avoid spending cycles precompressing data for the written medium. Bleeding-edge software engineers automate as much as possible to avoid bottlenecking their algorithms with human input that, after all, can type barely 100 words per minute and must sleep 8 hours a day. In many ways, the data has taken on a life of its own, connected to our own minds by a few tenuous strings that we are cutting as quickly as we can. In this world, I can text the entire data contents of my DNA, immune system, oral history, and the Library of Alexandria to my friend in seconds. I’m sure that’s just what they want.
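Tallying the stores discussed so far against a 1 gigabit/second link, using the earlier estimates, shows the "in seconds" claim checks out:

```python
# Rough running total of the stores discussed so far (figures from earlier
# sections; the oral-history term assumes ~1 bit per word):
dna_bits = 6e9            # raw genome
immune_bits = 1e9         # immune repertoire estimate
oral_bits = 6 * 180_000   # half a dozen Odyssey-scale epics
alexandria_bits = 10e9    # Library of Alexandria equivalent

total_bits = dna_bits + immune_bits + oral_bits + alexandria_bits
seconds_at_gigabit = total_bits / 1e9   # ~17 seconds on a 1 Gbit/s link
```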
Neuralink? At this point I think it’s fair to say that the capabilities of the medium have grown beyond any meaningful constraints, and the only remaining bottleneck is factory-standard human i/o capability. What evolved on the African savanna to be adequate for sensory input is not really equal to the task of enabling the large-scale communication and coordination needed to build our grand future global cybernetic collective. Indeed, how much effort is expended in human organizations routing around the damage and limitations of our imperfectly evolved, bootstrapped communication capabilities? A lot!
The idea is that brains are capable of impressive feats of internal computation, and so are our computers of one form or another, but the eyes, ears, and typing fingers connecting them are due for an upgrade. Neuralink is the most obvious candidate here, about which Wait But Why has a comprehensive explainer. The most recent update shows promise along the path toward a generic, high-bandwidth neural interface. I think it’s fair to say that a revolutionary increase in brain-computer data rates is a matter of time, whether years or decades, and that it will be on par with the printing press in changing how humans interact with their knowledge environment.
Conclusion. I think it’s tempting to review the history of data transmission as an exercise in inevitability, when in fact there’s no guarantee that any of this would occur, given the laws of physics. However, if we see ourselves as some kind of evolved entity that derives utility from manipulating data we can understand why each of these major increases in capacity did, in fact, occur, as well as chart a course to a more integrated, connected future.