Language Birth

Karson Elmgren

Since 1960, the world has lost hundreds of languages — and gained thousands.

In October 1992, a Circassian man by the name of Tevfik Esenç passed away at the age of 88. Esenç had been, as his gravestone attested, “the last person able to speak the language they called Ubykh.” An extreme example of the remarkable languages of the Caucasus mountains, Ubykh was articulated with all of 80 different consonants, compared to English’s 24. Regarding Ubykh, linguist John Colarusso noted that “any rigorous account of human phonetic perceptual capacity will have to take into account this precious marvel.” We know something particular about the human mind in large part thanks to Ubykh. And we know about Ubykh in large part thanks to Tevfik Esenç.

Over the course of decades before his death, Esenç had worked with a rotating cast of linguists to document his marvelously exacting tongue, and much of the culture and oral history of his forebears. He contributed numerous audio recordings, some of which can be found on the internet today.

Yet Ubykh is merely one of hundreds of languages that have withered and expired in recent decades. About 3000 of the world’s 7000 or so languages are endangered. A highly disproportionate fraction of now-extinct languages have gone silent just since 1960, and most were not blessed with a Tevfik. Major global languages like English, Spanish, Mandarin and Hindi continue to accrue speakers, while smaller ones shrivel.

But the same fundamental forces that cleft Proto-Indo-European, Proto-Sino-Tibetan, or Proto-Austronesian into their thousands of offspring today are still around. Languages, after all, do not only live and die. They are also born.

Transportation and telecommunications technology have pierced languages’ protective bubbles. Where mountains and jungle used to shelter divergent dialects, roads, airstrips and satellite internet now bring in major regional, if not world, languages. But at the same time, technology has allowed for a vast proliferation of cultural niches, defined not by geographic boundaries but by academic disciplines, industries and social media platforms. The increasing specialization of the global economy has created fractals of expertise in hypercomplex domains across science, technology, humanities, skilled trades and more, and with them terms and concepts which would have been unimaginable for even recent generations of our ancestors.

Online, subcultures have proliferated and mutated from yesterday’s IRC chats and listservs to today's Discords, subreddits and obscure corners of social media platforms — all with their own local vernacular. After all, people everywhere simply cannot stop creating inside jokes and memes, letting fly slips of the tongue and typos, and trying to make “fetch” happen. Meanwhile, the internet has also enabled a prolific flourishing of a rather new kind of human linguistic activity: the intentional creation of new linguistic diversity, on a grand scale, in the form of constructed languages, also known as “conlangs.”

Perhaps it’s more accurate to say that linguistic diversity is not so much collapsing as radically transforming, with decimation on some dimensions coexisting with explosive growth on others. The losses are relatively uncontroversial, and have attracted wide attention with good reason. But the gains, I believe, are comparatively underappreciated, or even unfairly disdained, despite being of an arguably similar humanistic value.

In Auraicept na n-Éces, a medieval text written in Old Irish, the author advances an argument that, as luck would have it, the very same language in which they write was in fact cobbled together from the linguistic rubble of the Tower of Babel, and preserved pristine and unchanged since then. Although the Irish no longer assert that theirs is the Biblically perfect language, they do make a much more defensible claim to possessing a rich vocabulary of words related to traditional Irish lifestyles, including thirty-two for different kinds of fields.¹

This kind of lexical diversity is often feted as the expression of important, unique parts of human experience which are the sole province and property of some particular tongue. Any newspaper reader with a passing interest in language will certainly have heard that “the Eskimo have 50 words for snow.” And any with more than a passing interest in language will likely have learned to scoff at this claim, long debunked as the “Great Eskimo Vocabulary Hoax” — which has more recently been firmly rebunked by careful work from anthropologist Igor Krupnik, who has found Inuit languages do indeed have dozens of distinct words for snow and up to hundreds for sea ice. Of course they would — when any particular topic, whether farming or freezing, is so integral to the life of a community, sooner or later they will develop the vocabulary to speak precisely and efficiently about it. One Army Corps of Engineers geophysicist noted that some Inuit elders know “as much about snow as [he] knew after 30 years as a scientist.” Naturally, they would accumulate, refine and pass on that knowledge using the rich lexicon available to them in their heritage languages.

It’s less lauded by literary tastemakers that the tech bros have approximately five million words for “software.” This abundant vocabulary and its associated expertise are, to say the least, in no danger of dying out. In fact, it’s documented ad nauseam across the internet, with plentiful backups. But somehow it’s not only its more assured foothold in the annals of civilization that makes people view it differently. When Irish farmers and Inuit seamen talk shop, we revere their arcane eloquence. But when it’s a 27-year-old in California with a mid-6-figure income, we sneer. Generations of readers of Herman Melville’s whale of a tale have delighted in the copious sailor’s jargon, the cant of the premier American capitalist enterprise of the day. If an author today were to use the terminology of AI similarly, they would have to be mighty careful not to be called cringe.

Yet, linguistically, the relationship between aqilokoq and piegnartoq, or between reidhlean and cathairin, is of the same nature as that between hypervisor and operating system. In terms of lexical diversity, science and technology are a domain of breakneck growth. If the software industry doesn’t give you any frisson, consider linguistic contributions from mathematics such as tesseract or eigenvector. If math isn’t your jam, maybe try Umwelt or exaptation from biology, or liminality or habitus from sociology. Even many people with no background in the sciences have taken up words from cybernetics and statistics like feedback and correlation. The list goes on. And on, and on.

Now-famous psycholinguistics research has found that speakers of Russian, which has two distinct, unrelated words for light and dark blue, recognize the difference between the two colors slightly faster than speakers of English, which groups both together as simply “blue.” More evidence has recently accumulated for the idea that, speaking broadly, greater ease of putting a name to a concept can facilitate more accurate recognition of it. James Flynn, famous for studying the secular trend of increasing IQ scores over time in industrial societies, has speculated that part of the credit goes to the spread of the “scientific ethos,” and its vocabulary and taxonomies in particular.² A language, in some sense, is a set of cognitive tools. Each toolbox contains a set of components — vocabulary — which can be pieced together into exquisite semantic constructions. New components are constantly being created by human cultural ingenuity, and old ones repurposed or re-engineered.

When specialists and subcultures create jargon and slang, they are engaging in exactly the same process of cultural accretion that produced our Irish fields and Inuit ice. These communities can often be as large as or larger than the entire population of smaller languages, many of which have never had more than a few thousand speakers. And linguistic innovations are much more likely to break containment and contribute to the speech of millions when they arise in English or Mandarin than in Tuyuca. The massive Corpus of Historical American English shows, for instance, that the diversity of word types used in the news increased approximately threefold from the mid-1800s to the turn of the millennium.³ More broadly, lexical diversity has increased over time in many world languages.⁴

This leaves still to be discussed the other category of tools: grammar. Russian linguist Roman Jakobson famously summarized grammatical diversity by saying that languages “differ essentially in what they must convey and not in what they may convey.” In some languages, such as Tuyuca and Bulgarian, you must convey the source of information you share — for example, whether you saw that crotchety old neighbor shoot your dog, heard a rumor that he did, or simply reckon that it must be true. In others like Guugu Yimithirr and Tzeltal, a speaker must specify the cardinal direction of location or movement; there are no words for the relative directions “left” and “right.” That means that you have to remember, when recounting an epic fishing trip from 20 years ago, whether the sea monster pulled you over the gunwale to the west or to the east (and whether your friend reached out to grab your foot with her south or her north hand).⁵

Grammar changes too, but usually more slowly than vocabulary. Neither English nor Russian is likely to pick up evidential markers like Bulgarian, nor develop ejective consonants like Ubykh. In general, the world is likely indeed moving towards a less diverse set of grammatical structures in common use. However, the same growth in subcultures and specialized professional communities that leads to vocabulary growth can also create novel affordances that would be considered shocking and fascinating if they were part of a language’s grammar. For instance, in the forecasting community that has arisen in recent years around platforms like Metaculus, a prediction about the future must be conveyed with a numeric estimate of probability and an end date by which the proposition can definitively be judged true or false. To do otherwise would be in some sense almost ungrammatical, not quite a complete and fully formed utterance, and would elicit among the natives a twinge of discomfort similar to that caused by an unconjugated verb. If Forecasterese underwent enough sound changes and lexical divergence to pass the sniff test as being a different language from English, linguists might marvel that in a particular subjunctive mood of the future tense, to speak Forecasterese is to be forced to choose from over a hundred words for varying degrees of “likely” or “unlikely.”

Granted, this is more properly considered cultural diversity than grammatical diversity. It’s not as compact as most grammatical structures, it’s not as ingrained and automatic, and it’s not as strictly required by listeners even within the community. But there is also one area where wholly new grammars are sprouting like mushrooms after the rain.

In the early 2000s, Canadian polyglot Sonja Lang was depressed. In a flash of “sudden intuition,” she sketched the outline of a new language. An extremely simple language, in the hopes that a simple and straightforward medium of expression could help bring clarity and levity to her thinking.⁶ Soon thereafter, she published her work on the internet, and Toki Pona entered the world.

Toki Pona has only 120 words. None of them change form, in any way, in any context. The grammar is nearly the bare minimum — specified word order, and a few particles to denote a predicate, a direct object, subordination, negation. Interpreting any utterance takes a lot of context clues, or a lot of follow-up questions. This participatory experiment in parsimony has attracted thousands of learners around the world, who make up a highly active speech community on platforms such as Discord. When I asked some of these folks what they liked about Toki Pona, some pointed out that the need to ask questions to determine meaning led to better conversations, or said that speaking Toki Pona came with an assumption of politeness and good faith, because that is the intent with which the language was designed. One Toki Pona speaker told me they started learning the language because they struggled with picking out only the relevant details when trying to summarize. Learning Toki Pona, in which rendering detail is nigh impossible, helped them think more carefully about what pieces of information really mattered.

Conlangs are typically regarded as playthings for the uncoolest possible variety of nerd. J.R.R. Tolkien referred to conlanging as his “secret vice.” It would not score you points in a typical happy hour anywhere from London to Los Angeles to admit to learning a conlang, or — much worse — creating one yourself. Many people have a sense that the loss of a natural language is something poignant — at least as regrettable, perhaps, as the extinction of a species of small frog whose ecological niche would nevertheless be easily filled by one of his amphibian brethren. But nobody seems to think that constructing a language is like bioengineering a species of frog de novo as a living, hopping, ribbiting part of the human cognitive ecosystem. The more common view is that conlangs are either exercises in comically failed utopianism (Esperanto’s sad and somewhat unfair fate in public opinion), or embarrassing frivolities, like an over-zealous passion for cosplay. But I believe conlangs do represent new linguistic diversity in some significant senses. Some of them much more so than others.

John Quijada is an ambitious man. For several decades now, he has sought to create a language that is maximally unambiguous while being maximally concise. His creation has gone through several iterations now, but is generally known by the name of the first, Ithkuil. I have studied dozens of languages to a greater or lesser degree, ranging from Turkish to Classical Tibetan. I speak Mandarin Chinese and Russian fluently, in addition to my native English. Surveying the grammar of a new language at this point in my life can be a source of sublime joy, but rarely carries any sort of shock. So I am speaking from a position of some experience when I say that merely glancing at the basics of Ithkuil’s grammar fills my heart with fear.

Ithkuil does not have nouns or verbs, but rather “formatives,” which inhabit a spectrum between noun-like object and verb-like process. The five evidentials of Tuyuca are no match for Ithkuil’s exacting requirements to specify between nine different types of “Validation” denoting the evidential basis of an assertion, or between eight other types of “Illocution” indicating what other kind of speech act the sentence represents, and much more besides. Quijada provides an example, with references to the relevant grammatical categories: “merely saying (or thinking) that it’s raining outside would require a hypothetical Ithkuil speaker to consider the evidential source of the information (direct observation? hearsay? inference?) and its reliability (Validation), the pattern and timing of the raindrops (Phase), the purpose/intent of the utterance (Sanction), whether the rainfall is being considered as a gestalt versus a sequence of discrete componential events (Configuration), whether the context of the thought/utterance is descriptive, purposefully important, metaphorical, or a component of a holistic situation (Context), and so on.”

In truth, Ithkuil is more like a philosophical system translated directly into morphology. It, too, has attracted a cult following, including among a certain set of transhuman-oriented self-optimizers hoping that learning to speak this language fluently would accelerate their very thought. This project has met with limited success. It seems that the limits of human cognition are such that one cannot simply speak Ithkuil and think faster. We are apparently doomed to think at mere mortal speeds and speak Ithkuil very, very slowly. As in, 10 minutes to compose a sentence slowly. No one, not even Quijada himself, has been able to do much better.

But although Ithkuil lacks lively craic, it has not failed to inspire some insights.⁷ As explained by one enthusiast, “using Ithkuil, we can see things that exist but don’t have names, in the same way that Mendeleyev’s periodic table showed gaps where we knew elements should be that had yet to be discovered.” This is indeed a natural result of the intricate compositional structure of Ithkuil grammar. Any word, once constructed, can be inflected across any of a dozen dimensions. The word for “gawk” in Ithkuil would be a variant of the word for looking, with modifications to specify a certain degree of surprise and cultural impropriety. Those modifications could then be ported over to any other verb to indicate an action with similar connotations. Or, one could be varied to imagine a type of gawking that is entirely appropriate. Once asked to conjure an example of a wholly new concept from his language on the spot, Quijada suggested ašţal — “that chin-stroking moment you get, often accompanied by a frown on your face, when someone expresses an idea that you’ve never thought of and you have a moment of suddenly seeing possibilities you never saw before.”

The art of conlanging highlights another particular kind of value of linguistic diversity, at least for connoisseurs: the purely aesthetic. Whether or not German or Hopi or Kinyarwanda encode any useful insights or provide any unique affordances for human thought, one can at least say that they clearly each have a vibe, a flavor, a particular timbre and melody all their own. One can play Bach on any well-tempered clavier, but there is a difference between hearing his notes on the piano and the harpsichord, and it’s wonderful to have both. Part of the appeal of conlangs from Toki Pona to Ithkuil to Tolkien’s Quenya and beyond is their unique Sprachgefühl.

Aesthetics are one area where conlangers are pioneering linguistic innovation quite unlike that of prior eras. The most clichéd kind of project in conlanging is something like an Elvish lush with mellifluous liquids and sibilants, an Orkish laden with pharyngeals and glottal stops, or some keysmash-y, ostensibly alien tongue. These typically tread little terra incognita. But the more original creations can be riotously novel, like Adam Aleksic’s bird, gorilla and dolphin conlangs. Philosophical conlangs like Ithkuil or Toki Pona, besides the mere sounds of their phonology, are instances of an art form rooted in something like an aesthetics of linguistic cognition itself. Even Tolkienesque derivative creations are so popular precisely because Tolkien’s linguistic aesthetics are simply so powerfully enchanting for so many.

Since developing conlangs is a solitary activity, it’s hard to know how many of them exist. The Conlang Database lists about 1500, but Margaret Ransdell-Green, President of the Language Creation Society, tells me that a conservative total estimate would be double that — 3000 languages, pulled out of thin air by human creativity. Nothing to sneeze at, compared to the 7000 or so that come down to us from history.

Of course, there’s a natural question of how “real” this diversity is. It exists much more on paper than in practice. Precious few conlangs have any, much less many, speakers.⁸ The creation of a new conlang, even if a few people end up speaking it fluently, cannot offset the loss of languages that had been passed down between countless generations of parents and children. However, in my view, these odd artworks have a similar kind of wonder to them as the likes of Ubykh and Tuyuca, French and Mandarin. Consider: they provide affordances of expression that can be at least as intriguing and illuminating as natlangs; they bring together communities for whom a particular language has emotional resonance; they also offer opportunities for broadening scientific understanding. Studies on native speakers of Esperanto have shed light on how a language learned natively only from non-native speakers evolves. Viossa, a conlang created entirely through unplanned, emergent interaction between speakers with no shared language to start, parallels the development of creole languages from pidgins and exhibits rich variation among speakers. The fact that even John Quijada has not been able to fully master any version of his masterpiece gives us at least some vague indication of what might be the outer limits of manageable morphological complexity. Or perhaps some latter-day Cicero will come along who will extemporize elegantly in Ithkuil and prove us wrong.

While much may be gained in our times, much will certainly be lost. Although conlangs, jargon and slang provide conduits to belonging and community of a certain kind, it is of a shallower and weaker nature than the role of a heritage language for a given ethnolinguistic group. What’s worse, in many cases, languages are not just dead or dying — they have been intentionally killed.⁹ It is some silver lining that technology is also helping endangered linguistic communities around the world perceive and grasp the possibility of documentation and revitalization of their native languages.

Even if we accept the contribution, such as it is, of conlangs to linguistic diversity, we must acknowledge that natlangs — conlangers’ slang for natural languages — provide much of the most valuable grist for the mill. John Quijada describes Ithkuil as “inspired by such obscure linguistic sources as the morpho-phonology of Abkhaz verb complexes, the moods of verbs in certain American Indian languages, the aspectual system of Niger-Kordofanian languages, the nominal case systems of Basque and the Dagestanian languages, the enclitic system of Wakashan languages, the positional orientation systems of Tzeltal and Guugu Yimidhirr [sic], the Semitic triliteral root morphology,” as well as “the hearsay and possessive categories of Suzette Elgin’s Láadan language” — another pioneering philosophically-motivated conlang. Tolkien’s much-beloved Quenya and Sindarin were themselves also heavily influenced by Finnish and Welsh.

We risk losing precious gems in the lexical dimension as well. No subcultures of today are likely to reinvent the particular cognitive instruments that prior generations created. Traditional knowledge passed down over generations may or may not be encoded in certain lexical items, but it certainly exists in the heads of those who have received it. Language documentation lets us preserve Inuit elders’ understanding of the arctic environment, or Amazonian natives’ knowledge of the medicinal properties of local flora. Dictionaries and grammars are excellent, but stopping there leaves so much on the table. People are more than just their speech.

On the other hand, we should acknowledge that preserving languages is not costless. The tradeoffs are less stark than they may seem — learning and maintaining a heritage language that gives you a sense of belonging and dignity is unlikely to be a net drag on your productivity, and a third language is always easier than a second.¹⁰ And yet, even partial language barriers do impose transaction costs that have real economic consequences.¹¹ Of course we all want to preserve linguistic diversity, but both policy and personal choices become more complicated when it might ultimately be at the cost of a rural family in the developing world having a refrigerator at home, or a hospital in the neighborhood. There are clear reasons to promote a common language, and a certain degree of inevitability in what that means for smaller ones.

What has been lost cannot be regained. But what whispers still float between lips and ears can be recorded. Communities who seek to strengthen their tongues can be supported. Technology can help with both, and should be used to the utmost. With luck, some may even be able to resurrect the dead. Meanwhile, we should celebrate what we gain, and commend those who devote their precious days to fostering the flourishing of the world’s great vehicles of poetry and song, exaltation and laments, prayers and curses, epic myths and, of course, dank memes. And perhaps look forward to a day when humanity’s linguistic frontiers might extend even beyond humanity itself.

¹ And, apparently, 50 for penis, though that fact didn’t make it into the book title.
² James Flynn, What Is Intelligence? Beyond the Flynn Effect (Cambridge University Press, 2007), 29.
³ Harald Baayen et al., “The Ecclesiastes Principle in Language Change,” in The Changing English Language: Psycholinguistic Perspectives, ed. Marianne Hundt et al. (Cambridge University Press, 2017), 44.
⁴ Alexander Koplenig and Carolin Müller-Spitzer point out that although the increase in lexical diversity is correlated with population growth, the relationship is confounded in that both lexical diversity and population have increased over time. They note that the increase in lexical diversity instead “could reflect the fact that onomasiological needs increase with the complexity of modern societies. Or put differently, new ideas and new technologies need new designations in order to efficiently communicate about related concepts.” Alexander Koplenig and Carolin Müller-Spitzer, “Population Size Predicts Lexical Diversity, but so Does the Mean Sea Level – Why It Is Important to Correctly Account for the Structure of Temporal Data,” PLOS ONE, March 3, 2016.
⁵ A native speaker of Guugu Yimithirr has in fact been documented consistently recollecting the direction a boat capsized in two retellings, two years apart, of a story from decades prior. Even his hand gestures were consistently aligned, despite his facing a different direction on the two occasions. And the direction he remembered is consistent with prevailing winds in that area at that time of year. Guy Deutscher, Through the Language Glass (Picador, 2010), 174-175.
⁶ In her own words, she was inspired by how hunter-gatherers might speak, interacting directly with nature. She was apparently unaware how such groups in fact converse after untold generations interacting with nature together — with meticulous evidentials, hundreds of noun classes, thickets of verbal prefixes, and esoteric avoidance registers among other features of languages like Tuyuca and Warlpiri.
⁷ It has also been used in some music, by John Quijada’s band Kaduatán (Ithkuil for “wayfarers”).
⁸ Though many conlangers do use their conlangs actively, not every conlanger is fully fluent in even one, much less all, of their creations.
⁹ A particularly lamentable history is that of forced separations of indigenous children from their communities and placement in white foster families in a crash assimilation program in the United States, Canada and Australia. In many places, punishment and discrimination for speaking native languages continue even today.
¹⁰ So no, I do not regret spending many hundreds of hours in high school learning Swedish, the language of my great-grandparents, when I could have been learning programming instead. I think.
¹¹ Some research has suggested that fully half of the difference in economic growth between Japan and Tanzania from 1960 to 1990 was explainable by the higher ethno-linguistic diversity of Tanzania.
