STATE OF LEXICOGRAPHY FALL 2018 ILAN KERNERMAN | Dictionary Society of North America

Post-dictionary lexicography. An overview

Ilan Kernerman

This is a succinct update of a talk given at eLex 2017 (https://youtu.be/yA3yg6wO5M8).

global digital data

The major universal trends of the last generation can be crystallized as the advent of digitization and globalization. Their consequences are reflected in practically every realm of life, be it society, economy, science, culture, sports, and so forth, including the world of dictionaries and lexicography, where they give rise to bleak concerns about the future of dictionaries alongside bright hopes of extending the reach of lexicography through enhanced multidisciplinarity and interoperability.

On the digital side, contemporary dictionaries increasingly tend to be corpus-based and compiled using dedicated software, combining automatically generated raw entry components with refined post-editing. They are available on mobile and online, offer a choice of titles simultaneously, are supported by extensions and add-ons, and are fairly easy to customize and personalize to suit users’ needs and tastes. Lexicographic practice and resources are substantially reinforced and enriched by natural language processing and other computational methodologies, such as linked and big data and knowledge systems, transforming content into lingual settings and graphs and drifting from being merely machine-readable, i.e. usable also by machines, to being oriented toward machines rather than, as initially, toward humans.

Going global implies strong worldwide cooperation, standardization and systemization, uniformization and unification, as well as localization. The impacts on dictionaries include a decrease in revenue for much of the private sector and a decline of publishing houses and brand names, alongside greater and more diversified involvement in lexicographic processes by users, who shift from passive readers to interactive participants, and by a range of public bodies including national language academies, multinational networks, international associations, universities and research institutes.

These trends merge and evolve into the concept of datacization, that is “the process of transforming information resources previously accessed directly by humans into resources primarily accessed by software” (Erin McKean, 2017). As part of this transformation process, dictionaries as a vehicle for representing human language insight risk becoming redundant – cast aside, swallowed up and made virtually invisible – at the margin of up-to-date products and services that incorporate their precious inner substance at the heart of smarter knowledge systems, tools and applications.

dictionaries

Gloomy forecasts about the destiny of dictionaries began circulating before the new millennium, mainly with regard to their passage from print to electronic media, which initiated novel ecosystems and required the adoption of suitable strategies. Yet dictionaries are not dead; so far they have prospered in a golden age of empowered accessibility and abundance, and only time will tell whether this is a short-lived illusion leading to a dead end.

A great deal of the dictionary’s force and charm stems from its centuries-old image as a messenger of wordly truth, and legacy dictionaries in particular are perceived as the most faithful and respected detectors and conveyors of facts about language. As such, dictionaries – whether descriptive or prescriptive, author- or corpus-based, or otherwise – serve to open our minds and promote our security, reassurance and revelation in the world we see and know, and help us to make sense of it and to safeguard it from distortion, confusion and instability. Their innovativeness thrives on being conservative by nature, recording the past and present at best, and their social role and moral value naturally rise in an era of post-truth fueled by alternative facts, which constitute lies.

In his essay “Politics and the English Language” (1946), George Orwell wrote: “All issues are political issues,” “When the general atmosphere is bad, language must suffer,” and “If thought corrupts language, language can also corrupt thought.” These views are no less relevant today than they were then or before, and correspondingly the search for the true reality, relevance and power of language is at the very essence of dictionaries. The high esteem and trust they enjoy make state-of-the-art lexicography well placed to serve humanity at its finest, whether in the form of dictionary products and services or embedded in new language, knowledge and learning solutions.

lexicography

Computational linguistics is a vital driving force for the next developments in Artificial Intelligence, the Internet of Things, the Semantic Web, and so on. It faces the challenge of striking a delicate balance and harmony between unequal and to some extent contradictory partners: computing, based on mathematics, with the principle that 1+1=2; and language, with its infinite variation, where one and one are often not equal to two. The two elements appear to be opposed by nature but can be leveraged to complement each other.

The supreme edge of quality lexicography is born of deep, systematic and unwavering analysis of language, deciphering and mapping its DNA, as traditionally represented in dictionaries. That inner lexical substance, the lexical self, is what computational linguists and other scientists, researchers and developers seek in lexicography: not the body of the dictionary but its spirit. Moreover, to be well represented in dictionaries, any lexicographic matter must adhere to their specific formulations and arrangements, some of which, though, may be trivial or even detrimental for technological integration, e.g. abbreviations rather than full forms, typesetting and style indications, or space-saving techniques in printed media. The purer, more minute, more accurate and better arranged such substance is, the higher its value – hence datacization.

Lexicography may enjoy the fruit of “state-of-the-art technologies and methods for automating the creation of dictionaries” (aim of eLex 2017) but can also reciprocally nourish those and other technologies and methods, such as word processing, search engines, machine translation, information retrieval, text mining, deep learning, personal assistants, and more to come. To do this it should become digitally native and in tune with other disciplines and domains: for example, interplay with phraseology, wordnets and digital humanities; match up graph formats such as RDF (Resource Description Framework) with hierarchical XML (Extensible Markup Language) or JSON (JavaScript Object Notation) structures; upgrade search capacities; and update implementation and dissemination models, continuing to adapt to change, rethink and reinvent itself. Dictionaries, like lexicography, can develop give-and-take relations with technology, not just take from it, in order to survive and thrive.