Post-dictionary lexicography. An overview

Ilan Kernerman

This is a succinct update of a talk given at eLex 2017 (


  1. global digital data

The major universal trends of the last generation could be crystallized in the advent of digitization and globalization. The consequences are reflected in practically every realm of life, be it society, economy, science, culture, sports, and so forth, including the world of dictionaries and lexicography – giving rise to bleak concerns about the future of dictionaries alongside bright hopes to extend the reach of lexicography through enhanced multidisciplinarity and interoperability.

Digitally, contemporary dictionaries increasingly tend to be corpus-based and compiled using dedicated software, combining automatically generated raw entry components with refined post-editing. They are available on mobile and online, often offer a choice of titles simultaneously, and are fairly easy to customize and personalize to suit users’ needs and tastes. Lexicographic practice and resources are substantially reinforced and enriched by natural language processing and other computational methodologies, such as linked and big data and knowledge systems, transforming content into lingual settings and graphs and drifting from merely machine-readable, i.e. usable also by machines, to machine-oriented rather than, as initially, human-oriented.

Going global implies strong worldwide cooperation, standardization and systemization, uniformization and unification, as well as localization. The effects on dictionaries include a decrease in revenue for much of the private sector and the decline of traditional publishers and brand names, alongside higher and more diversified participation in the lexicographic process by the user, who shifts from passive reader to interactive player, and by a range of public bodies including national language academies, multinational networks, international associations, universities and research institutes.

These trends merge and evolve into the concept of datacization, that is “the process of transforming information resources previously accessed directly by humans into resources primarily accessed by software” (Erin McKean, 2017). As part of this transformation process, dictionaries as a vehicle for representing human language insight risk becoming redundant – cast aside, swallowed up and made virtually invisible – at the margins and mercy of up-to-date products and services that integrate their precious inner lexical substance into the heart of smarter knowledge systems, tools and applications.

  2. dictionaries

Gloomy forecasts about the destiny of dictionaries have circulated since the turn of the century, mainly with regard to their passage from print to electronic media, which initiates novel ecosystems and necessitates suitable strategies and adaptations. Yet dictionaries are not dead; so far they prosper in a golden age of enhanced accessibility and greater abundance, and only time will tell whether this is a short-lived illusion leading to a dead end.

A great deal of the dictionary’s force and charm stems from its centuries-old image as a messenger of wordly truth, and legacy dictionaries in particular are perceived as the most faithful and respected conveyors of facts about language. As such, dictionaries – whether descriptive or prescriptive, author- or corpus-based, or otherwise – serve to open our minds and promote our sense of security, reassurance and revelation in the world we know (or knew), and help us to make sense of it and to safeguard it from distortion, confusion and instability. Their innovativeness feeds on being conservative by nature, recording the past or the present at best, and their social role and moral value naturally rise in an era of post-truth fueled by alternative facts, which constitute lies.

In his essay “Politics and the English Language” (1946), George Orwell wrote: “All issues are political issues,” “When the general atmosphere is bad, language must suffer,” and “If thought corrupts language, language can also corrupt thought.” These views are no less relevant today than they were then or ever before, and correspondingly, such a search for the true reality, relevance and power of language is at the very essence of dictionaries. The high esteem and trust they enjoy make state-of-the-art lexicography well placed to serve humanity at its finest, whether in the form of dictionary products or services, or embedded in new language and knowledge solutions.

  3. lexicography

Computational linguistics is a vital driving force for the next developments in Artificial Intelligence, the Internet of Things, the Semantic Web, etc. It faces the challenge of striking a delicate balance and harmony between unequal and to some extent contradictory partners: computing, based on mathematics, with the principle that 1+1=2; and language, with its infinite variation, where one plus one often does not equal two. The two elements appear to be opposed by nature but may be leveraged to complement each other.

The supreme edge of quality lexicography stems from deep, systemic and unwavering analysis of language, deciphering and mapping its DNA, as also represented in dictionaries. That inner lexical substance is what computational linguists and other scientists, researchers and developers seek in lexicography, namely not the body of the dictionary but its spirit. Moreover, to be well represented in dictionaries, any lexicographic matter must adhere to their specific formulations and arrangements, some of which might be trivial or even detrimental for other technological implementations, e.g. abbreviations rather than full forms, typesetting or style indications, or space-saving techniques in printed media. The purer, more minute, orderly and mathematically arranged such substance is, the higher its value – hence datacization.
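
To make the point about print conventions concrete, here is a minimal sketch in Python of turning a print-style entry into an explicit record. The abbreviation table, the use of the tilde as a placeholder for the headword, and all function and field names are assumptions made for illustration; they do not describe the format of any particular dictionary.

```python
# Hypothetical table of space-saving labels and their full forms.
ABBREVIATIONS = {
    "n.": "noun",
    "v.": "verb",
    "adj.": "adjective",
    "fig.": "figurative",
}


def expand_label(label):
    """Return the full form of a print abbreviation, or the label unchanged."""
    return ABBREVIATIONS.get(label, label)


def normalize_entry(headword, pos_label, examples):
    """Build an explicit record from print-oriented entry components.

    The tilde is assumed to stand in for the headword, a common
    space-saving device in printed dictionaries.
    """
    return {
        "headword": headword,
        "pos": expand_label(pos_label),
        "examples": [example.replace("~", headword) for example in examples],
    }


record = normalize_entry("bank", "n.", ["river ~", "~ account"])
```

The resulting record carries full forms instead of abbreviations and placeholders, so software can consume it without knowing the conventions of the printed page.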

Lexicography can go far beyond enjoying the fruit of “state-of-the-art technologies and methods for automating the creation of dictionaries” (aim of eLex 2017) to reciprocally nourish those and other technologies and methods, as with word processing, search engines, machine translation, information retrieval, text mining, deep learning, and more to come. To do this it will become digitally native and in tune with other innovations and domains, e.g. interplay with phraseology and wordnets, add a graph format such as RDF (Resource Description Framework) for data linking on top of hierarchical XML (Extensible Markup Language) or JSON (JavaScript Object Notation) structures, upgrade search capacities and implementation models, and continue to adapt to changes and devise more suitable forms. Dictionaries, like lexicography, may establish a give-and-take relation with technology, not just take, in order to survive and thrive.
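
As a rough illustration of layering a graph format over a hierarchical structure, the following Python sketch flattens a JSON entry into subject-predicate-object triples of the kind RDF is built on. The entry layout, the `http://example.org/lex/` namespace and the property names are hypothetical, and the triples are kept as plain tuples rather than serialized with an RDF library; a real project would more likely adopt an established vocabulary.

```python
import json

# Hypothetical namespace, used only for this illustration.
BASE = "http://example.org/lex/"

# A small hierarchical entry; the field names are assumptions.
ENTRY_JSON = """
{
  "id": "bank_n",
  "lemma": "bank",
  "pos": "noun",
  "senses": [
    {"id": "bank_n_1", "definition": "a financial institution",
     "translations": {"fr": "banque"}},
    {"id": "bank_n_2", "definition": "the land alongside a river",
     "translations": {"fr": "rive"}}
  ]
}
"""


def entry_to_triples(entry):
    """Flatten one nested entry into (subject, predicate, object) triples."""
    subject = BASE + entry["id"]
    triples = [
        (subject, BASE + "lemma", entry["lemma"]),
        (subject, BASE + "partOfSpeech", entry["pos"]),
    ]
    for sense in entry.get("senses", []):
        sense_uri = BASE + sense["id"]
        # Each sense becomes a node of its own, linked from the entry.
        triples.append((subject, BASE + "sense", sense_uri))
        triples.append((sense_uri, BASE + "definition", sense["definition"]))
        for lang, word in sorted(sense.get("translations", {}).items()):
            # The language tag is appended in a simple word@lang form.
            triples.append((sense_uri, BASE + "translation", f"{word}@{lang}"))
    return triples


triples = entry_to_triples(json.loads(ENTRY_JSON))
```

Because every sense gets its own identifier, it can be linked to from other resources, such as a wordnet or another dictionary, which is much harder to do when a sense exists only as a position inside a nested structure.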