STATE OF LEXICOGRAPHY SPRING 2019: WIN CARUS

A lexicographic avant-garde

Win Carus

The Russian Formalist Viktor Shklovsky argues in his Theory of Prose (1929) that at any given time a literary system contains three competing and coexisting generations: the old-timers, the central trend, and the avant-garde. This note briefly presents two representatives of the lexicographic avant-garde, both of them lexicographic knowledge bases, that might suggest new directions for lexicography in general.

[V. Shklovsky, Theory of Prose, Dalkey Archive Press (1991).]

The two knowledge bases discussed here are the English WordNet (https://wordnet.princeton.edu/) and the NIH Medical Subject Headings (MeSH) (https://www.nlm.nih.gov/mesh/).

It is helpful to position these knowledge bases in the spectrum of structured lexical resources. In particular, consider the various types of knowledge bases discussed by Pieterse and Kourie (2014) as “knowledge organization systems” (KOSs):

“In our classification of KOSs we consider the inherent structure of classifications. Classes of KOSs are characterized by the progressive addition of features that enhance the capabilities offered by these KOSs. The addition of these features contributes to their increased complexity. We call these classes of KOSs lists, taxonomies, lattices, thesauri and ontologies….Lists are found at the simplest end. The addition of hierarchical relationships in taxonomies enables more advanced retrieval processes which can make use of broader and narrower terms to improve recall and precision respectively. The next class of KOSs is lattices. These are hierarchical structures encoded as formal concept lattices. This formalization allows for computations that have the potential to improve the precision and recall when information is retrieved using these computations. A further enhancement offered by thesauri is the inclusion of semantic relationships beyond hierarchical relationships. These relationships are intended to contribute to the reasoning power that is to be built into applications that use thesauri. The final enhancement extends KOSs beyond controlled vocabularies to ontologies. This enhancement entails two things: firstly, the addition of inference rules in the form of meta-relations, constraints, conditional rules or production rules, and secondly, the formalization of its content.”

The five basic KOS types are, therefore:

  1. List: “A list is a linearly organized collection that contains items and their attributes.”
  2. Taxonomy: “A taxonomy is a hierarchically organized collection that contains items and their attributes.”
  3. Lattice: “A lattice is a hierarchically organized collection that contains items and their attributes in which these items and their attributes are formally presented as a concept lattice.”
  4. Thesaurus: “A thesaurus is a collection that contains items within a selected domain. A thesaurus allows for the specification of the attributes of items as well as the definition of equivalence, hierarchical, associative and/or contrast semantic relations between its items.”
  5. Ontology: “An ontology is an electronically stored collection that comprises a thesaurus combined with a set of inference rules.”

[V. Pieterse and D.G. Kourie, “Lists, Taxonomies, Lattices, Thesauri and Ontologies: Paving a pathway through a terminological jungle”, Knowledge Organization 41 (2014), No. 3, pp. 217-229.]
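As a rough illustration of how each KOS class adds a feature to the one before it, the following Python sketch models three of the five classes (list, taxonomy, thesaurus). It is not taken from Pieterse and Kourie; the class names, method names and sample items are all hypothetical, and lattices and ontologies are left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """A KOS item with free-form attributes (enough for a 'list')."""
    name: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Taxonomy:
    """Items plus hierarchical (broader/narrower) links."""
    items: dict = field(default_factory=dict)    # name -> Item
    broader: dict = field(default_factory=dict)  # name -> set of broader names

    def add(self, name, broader=None, **attributes):
        self.items[name] = Item(name, attributes)
        if broader is not None:
            self.broader.setdefault(name, set()).add(broader)

    def narrower(self, name):
        """Names whose broader term is `name`, in alphabetical order."""
        return sorted(n for n, bs in self.broader.items() if name in bs)


@dataclass
class Thesaurus(Taxonomy):
    """A taxonomy plus non-hierarchical semantic relations."""
    related: dict = field(default_factory=dict)  # (relation, name) -> set of names

    def relate(self, relation, a, b):
        # Stored in both directions, which suits symmetric relations
        # such as antonymy or "see also".
        self.related.setdefault((relation, a), set()).add(b)
        self.related.setdefault((relation, b), set()).add(a)


# Hypothetical sample data.
kos = Thesaurus()
kos.add("vehicle")
kos.add("car", broader="vehicle", pos="noun")
kos.add("love")
kos.add("hate")
kos.relate("antonym", "love", "hate")
```

In this sketch a print dictionary would need only `Item`; the hierarchy and the extra relations are what the richer KOS classes contribute.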

Following this scheme, the classical print dictionary would be a KOS “list” of headwords (orthographic forms) and their associated attributes, typically the following:

orthographic headwords (and variants)
pronunciations (and variants)
parts of speech
inflections (and variants)
etymologies
senses (ordered in some principled way)
labels (register and domain)
usage examples
morphological derivations (including part-of-speech labels and possibly other decorations)

but without any further semantic relations.

These further semantic relations are the first obvious contribution that KOSs such as the English WordNet and NIH Medical Subject Headings (MeSH) can supply.

The English WordNet (https://wordnet.princeton.edu/) is a broad-coverage English thesaurus-like KOS of nouns, verbs, adjectives and adverbs organized into sets of cognitive synonyms (“synsets”), each expressing a distinct concept. Each synset has an associated “gloss” (brief definition). Polysemous words are represented by the participation of a given lexeme in multiple synsets. Synsets are interrelated by semantic and lexical relations (decorated links). Most of these relations are within each of the WordNet part-of-speech-based “subnets”; there are relatively few relations between subnets.
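The way polysemy falls out of synset membership can be sketched in a few lines of Python. This is a toy model, not the WordNet data format: the synset identifiers, the glosses and the helper names are invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    synset_id: str   # hypothetical identifier
    pos: str         # part of speech, i.e. which "subnet" the synset lives in
    lemmas: tuple    # the cognitive synonyms grouped in this synset
    gloss: str       # brief definition (paraphrased here, not quoted)


SYNSETS = [
    Synset("bank.n.01", "noun", ("bank",),
           "sloping land beside a body of water"),
    Synset("bank.n.02", "noun", ("bank", "banking company"),
           "a financial institution that accepts deposits"),
    Synset("car.n.01", "noun", ("car", "auto", "automobile"),
           "a motor vehicle with four wheels"),
]


def build_lemma_index(synsets):
    """Map each lemma to the ids of every synset it participates in."""
    index = defaultdict(list)
    for synset in synsets:
        for lemma in synset.lemmas:
            index[lemma].append(synset.synset_id)
    return dict(index)


LEMMA_INDEX = build_lemma_index(SYNSETS)


def is_polysemous(lemma):
    """A lemma is polysemous here if it appears in more than one synset."""
    return len(LEMMA_INDEX.get(lemma, [])) > 1
```

With this sample data, `is_polysemous("bank")` is true and `is_polysemous("auto")` is false: senses are not listed under a headword but recovered from the synsets a lexeme belongs to.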

The nine top-level WordNet noun synsets (“unique beginners”) are:

Abstraction_1
Act/Human_Action/Human_Activity
Entity/Something
Event_1
Group/Grouping
Phenomenon_1
Possession_1
Psychological_Feature
State_1

The relations between WordNet synsets are:

hypernym/hyponym: nouns; expresses an is-a relationship
Car hypernym Vehicle / Vehicle hyponym Car

instance hypernym/instance hyponym: nouns; expresses an is-a relationship involving instances
Honda instance hypernym Car / Car instance hyponym Honda

part holonym/part meronym: nouns; expresses a part-of relation
Tire part holonym Car / Car part meronym Tire

substance holonym/substance meronym: nouns; expresses a part-of relation involving constituent substances
Flour substance holonym Bread / Bread substance meronym Flour

member holonym/member meronym: nouns; expresses a part-of relation involving members of a group
Senator member holonym Senate / Senate member meronym Senator

troponym: verbs; expresses specificity (similar to hyponymy in nouns)
Walk troponym Move

entailment: verbs; if the action described by A is true, then the action described by B must necessarily be true
Snore entailment Sleep

similar/antonym: adjectives and nouns; “antonym” expresses polar opposites, while “similar” links words that are close in meaning
Love antonym Hate

pertainym: adjectives and nouns; expresses a derivational relation
Criminal pertainym Crime
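The paired relations above lend themselves to a small labelled graph in which adding one link also records its inverse. The Python sketch below reads “Car hypernym Vehicle” as “Car has the hypernym Vehicle”, following the examples in the list. The class and function names are hypothetical, and the link from Vehicle to Artifact is an extra example added here to show a two-step chain.

```python
from collections import defaultdict

# Inverse relation names, following the pairs listed above.
INVERSE = {
    "hypernym": "hyponym", "hyponym": "hypernym",
    "part holonym": "part meronym", "part meronym": "part holonym",
    "antonym": "antonym",  # symmetric
}


class RelationGraph:
    def __init__(self):
        self.edges = defaultdict(set)  # (source, relation) -> set of targets

    def add(self, source, relation, target):
        """Record a link and, where one is defined, its inverse."""
        self.edges[(source, relation)].add(target)
        inverse = INVERSE.get(relation)
        if inverse:
            self.edges[(target, inverse)].add(source)

    def targets(self, source, relation):
        return sorted(self.edges.get((source, relation), ()))

    def closure(self, source, relation):
        """Everything reachable by following `relation` repeatedly."""
        seen, stack = set(), [source]
        while stack:
            node = stack.pop()
            for nxt in self.edges.get((node, relation), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen)


graph = RelationGraph()
graph.add("Car", "hypernym", "Vehicle")
graph.add("Vehicle", "hypernym", "Artifact")  # hypothetical extra link
graph.add("Tire", "part holonym", "Car")
graph.add("Love", "antonym", "Hate")
```

Following `hypernym` repeatedly from Car collects both Vehicle and Artifact, which is the kind of traversal that a flat list of dictionary entries cannot support.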

English WordNet data can be downloaded in several formats: an application-specific format (for Windows and Unix); “stand-off” files that provide additional semantic information not found in the application; and a Prolog format (https://wordnet.princeton.edu/download/current-version).

For more information on the English WordNet, see:

Fellbaum, Christiane (2005). “WordNet and wordnets.” In: Brown, Keith et al. (eds.), Encyclopedia of Language and Linguistics, Second Edition, Oxford: Elsevier, 665-670.

The NIH Medical Subject Headings (MeSH), on the other hand, is a little lower on the KOS spectrum. It is a comprehensive, taxonomically organized controlled vocabulary for medicine. MeSH terms are used principally for categorizing, searching and organizing the biomedical literature. For example, the PubMed database and the MedlinePlus website both use MeSH codes.

MeSH differs from the English WordNet not only in being a domain-specific KOS, but also because it is organized as a “multiarchy”. All MeSH terms are placed within one or more term hierarchies. The top-level terms of these hierarchies are:

Anatomy [A]
Organisms [B]
Diseases [C]
Chemicals and Drugs [D]
Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
Psychiatry and Psychology [F]
Biological Sciences [G]
Physical Sciences [H]
Anthropology, Education, Sociology and Social Phenomena [I]
Technology and Food and Beverages [J]
Humanities [K]
Information Science [L]
Persons [M]
Health Care [N]
Publication Characteristics [V]
Geographic Locations [Z]

MeSH terms are densely decorated. Here are the ASCII records for “Physical Fitness” and “Exercise”:

*NEWRECORD
RECTYPE = D
MH = Physical Fitness
AQ = HI PH PX
ENTRY = Fitness, Physical
MN = G11.427.685
MN = I03.450.642.845.054.800
MN = N01.400.545
FX = Exercise
FX = Exercise Test
FX = Exercise Therapy
FX = Physical Endurance
MH_TH = NLM (1966)
ST = T078
MS = The ability to carry out daily tasks and perform physical activities in a highly functional state, often as a result of physical conditioning.
MR = 20180116
DA = 19990101
DC = 1
DX = 19660101
UI = D010809

*NEWRECORD
RECTYPE = D
MH = Exercise
AQ = PH PX
PRINT ENTRY = Aerobic Exercise|T040|T056|NON|NRW|UNK (19XX)|880608|abbcdef
PRINT ENTRY = Exercise, Aerobic|T040|T056|NON|NRW|UNK (19XX)|880608|abbcdef
PRINT ENTRY = Exercise, Isometric|T040|T056|NON|NRW|UNK (19XX)|880608|abbcdef
PRINT ENTRY = Exercise, Physical|T040|T056|NON|EQV|UNK (19XX)|880608|abbcdef
PRINT ENTRY = Isometric Exercise|T040|T056|NON|NRW|UNK (19XX)|880608|abbcdef
PRINT ENTRY = Physical Activity|T040|T056|NON|EQV|NLM (2003)|020128|abbcdef
ENTRY = Acute Exercise|T040|NON|NRW|NLM (2017)|160325|abcdef
ENTRY = Exercise Training|T040|NON|REL|NLM (2017)|160526|abcdef
ENTRY = Activities, Physical
ENTRY = Activity, Physical
ENTRY = Acute Exercises
ENTRY = Aerobic Exercises
ENTRY = Exercise Trainings
ENTRY = Exercise, Acute
ENTRY = Exercises
ENTRY = Exercises, Acute
ENTRY = Exercises, Aerobic
ENTRY = Exercises, Isometric
ENTRY = Exercises, Physical
ENTRY = Isometric Exercises
ENTRY = Physical Activities
ENTRY = Physical Exercise
ENTRY = Physical Exercises
ENTRY = Training, Exercise
ENTRY = Trainings, Exercise
MN = G11.427.410.698.277
MN = I03.350
FX = Exercise Movement Techniques
FX = Exercise Therapy
FX = Physical Exertion
FX = Physical Fitness
FX = Sports
MH_TH = NLM (1989)
ST = T040
ST = T056
AN = restrict to humans: for animals use PHYSICAL CONDITIONING, ANIMAL; EXERCISE THERAPY & EXERCISE TEST are also available; includes body building unless article specifies WEIGHT LIFTING; do not confuse with PHYSICAL EXERTION
PI = Exertion (1966-1988)
PI = Physical Fitness (1966-1988)
MS = Physical activity which is usually regular and done with the intention of improving or maintaining PHYSICAL FITNESS or HEALTH. Contrast with PHYSICAL EXERTION which is concerned largely with the physiologic and metabolic response to energy expenditure.
OL = search EXERTION & SPORTS 1966-74; use EXERTION to search EXERCISE, PHYSICAL 1976-88; use ISOMETRIC CONTRACTION to search EXERCISE, ISOMETRIC 1984-88
PM = 89; was see under EXERTION & SPORTS 1963-74; EXERCISE, ISOMETRIC was see under EXERTION 1977-83, was see ISOMETRIC CONTRACTION 1984-88; EXERCISE, PHYSICAL was see EXERTION 1976-88
HN = 89; was see under EXERTION & SPORTS 1963-74; EXERCISE, ISOMETRIC was see under EXERTION 1977-83, was see ISOMETRIC CONTRACTION 1984-88; EXERCISE, PHYSICAL was see EXERTION 1976-88
CATSH = CAT LIST
MR = 20160630
DA = 19880608
DC = 1
DX = 19890101
UI = D015444
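Records in this layout are straightforward to read programmatically. The following Python sketch assumes only what the two records above show: a record starts with `*NEWRECORD`, each line has the form `CODE = value`, a code may repeat, and some entry-term lines carry extra `|`-delimited metadata after the term. The function names are hypothetical, and the sample is an abridged copy of the “Physical Fitness” record.

```python
from collections import defaultdict


def parse_mesh_ascii(text):
    """Split MeSH ASCII text into records; each maps a field code to a list of values."""
    records = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "*NEWRECORD":
            current = defaultdict(list)
            records.append(current)
            continue
        if current is None or " = " not in line:
            continue  # skip anything outside a record or not shaped like "CODE = value"
        code, value = line.split(" = ", 1)
        current[code.strip()].append(value.strip())
    return [dict(record) for record in records]


def entry_terms(record):
    """Entry terms of a record, without the '|'-delimited metadata."""
    terms = record.get("PRINT ENTRY", []) + record.get("ENTRY", [])
    return [term.split("|", 1)[0] for term in terms]


# Abridged copy of the "Physical Fitness" record shown above.
SAMPLE = """\
*NEWRECORD
RECTYPE = D
MH = Physical Fitness
AQ = HI PH PX
ENTRY = Fitness, Physical
MN = G11.427.685
MN = I03.450.642.845.054.800
MN = N01.400.545
UI = D010809
"""

records = parse_mesh_ascii(SAMPLE)
```

Keeping every value in a list, even for codes that occur once, avoids special cases for repeatable fields such as MN, FX and ENTRY.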

For reference, here are the meanings of the MeSH field codes:

AN: Annotation
AQ: Allowable Topical Qualifier
CATSH: Cataloging Sub-Heading
CX: Consider Also Cross-Reference
DA: Date of Entry
DC: Descriptor Class
DE: Descriptor Entry Version
DQ: Date Qualifier Established
DS: Descriptor Short Version
DX: Date Descriptor Established
EC: Entry Combination
ENTRY: Non-Print Entry Term
FR: Frequency
FX: Forward Cross-Reference
HM: Heading Mapped-To
HN: History Note
II: Indexing Information
MH: MeSH Heading
MH_TH: MeSH Heading Thesaurus ID
MN: MeSH Tree Number
MR: Major Revision Date
MS: MeSH Scope Note
N1: Chemical Abstracts
NM: Name of Substance
NM_TH: NM Thesaurus ID
NO: Note
OL: Online Note
PA: Pharmacological Action
PI: Previous Indexing
PM: Public MeSH Note
QA: Qualifier Abbreviation
QE: Qualifier Entry Version
QS: Qualifier Short Version
QT: Qualifier Type
QX: Qualifier Cross-Reference
PRINT ENTRY: Print Entry Term
RECTYPE: Record Type
RH: Running Head
RN: Registry Number
RR: Related Registry Number/EC Number/UNII Code
SH: Subheading Qualifier Name
SO: Source
ST: Semantic Type
SY: Synonym
TN: Tree Node Allowed
UI: Unique Identifier

See MeSH XML Data Elements (https://www.nlm.nih.gov/mesh/xmlconvert_ascii.html).

These entries can be presented in a more readable format:

MeSH entry for “Exercise” (https://www.ncbi.nlm.nih.gov/mesh/68015444)

1: Exercise
Physical activity which is usually regular and done with the intention of
improving or maintaining PHYSICAL FITNESS or HEALTH. Contrast with PHYSICAL
EXERTION which is concerned largely with the physiologic and metabolic response
to energy expenditure.
Year introduced: 1989

Subheadings:
adverse effects
analysis
anatomy and histology
blood
classification
drug effects
economics
education
epidemiology
ethics
history
injuries
instrumentation
legislation and jurisprudence
metabolism
methods
mortality
organization and administration
pathology
pharmacology
physiology
psychology
standards
statistics and numerical data
therapeutic use
trends
veterinary

Tree Number(s): G11.427.410.698.277, I03.350
Entry Terms:
Exercises
Physical Activity
Activities, Physical
Activity, Physical
Physical Activities
Exercise, Physical
Exercises, Physical
Physical Exercise
Physical Exercises
Acute Exercise
Acute Exercises
Exercise, Acute
Exercises, Acute
Exercise, Isometric
Exercises, Isometric
Isometric Exercises
Isometric Exercise
Exercise, Aerobic
Aerobic Exercise
Aerobic Exercises
Exercises, Aerobic
Exercise Training
Exercise Trainings
Training, Exercise
Trainings, Exercise

Previous Indexing:
Exertion (1966-1988)
Physical Fitness (1966-1988)

See Also:
Exercise Therapy
Physical Exertion
Physical Fitness
Sports
Exercise Movement Techniques

All MeSH Categories
    Phenomena and Processes Category
        Musculoskeletal and Neural Physiological Phenomena
            Musculoskeletal Physiological Phenomena
                Movement
                    Motor Activity
                        Exercise
                            Cool-Down Exercise
                            Gymnastics
                            Muscle Stretching Exercises
                            Physical Conditioning, Animal
                            Physical Conditioning, Human
                                Circuit-Based Exercise
                                Endurance Training
                                High-Intensity Interval Training
                                Plyometric Exercise
                                Resistance Training
                            Running
                                Jogging
                            Swimming
                            Walking
                                Stair Climbing
                            Warm-Up Exercise

All MeSH Categories
    Anthropology, Education, Sociology and Social Phenomena Category
        Human Activities
            Exercise
                Cool-Down Exercise
                Gymnastics
                Muscle Stretching Exercises
                Physical Conditioning, Human
                    Circuit-Based Exercise
                    Endurance Training
                    High-Intensity Interval Training
                    Plyometric Exercise
                    Resistance Training
                Running
                    Jogging
                Swimming
                Walking
                    Stair Climbing
                Warm-Up Exercise

MeSH entry for “Physical Fitness” (https://www.ncbi.nlm.nih.gov/mesh/68010809)

1: Physical Fitness
The ability to carry out daily tasks and perform physical activities in a highly functional state, often as a result of physical conditioning.

Subheadings:
analysis
classification
drug effects
education
history
instrumentation
legislation and jurisprudence
methods
organization and administration
physiology
psychology
standards

Tree Number(s): G11.427.685, I03.450.642.845.054.800, N01.400.545
Entry Terms:
Fitness, Physical

See Also:
Exercise Test
Exercise Therapy
Physical Endurance
Exercise

All MeSH Categories
    Phenomena and Processes Category
        Musculoskeletal and Neural Physiological Phenomena
            Musculoskeletal Physiological Phenomena
                Physical Fitness
                    Cardiorespiratory Fitness

All MeSH Categories
    Anthropology, Education, Sociology and Social Phenomena Category
        Human Activities
            Leisure Activities
                Recreation
                    Sports
                        Athletic Performance
                            Physical Fitness
                                Cardiorespiratory Fitness

All MeSH Categories
    Health Care Category
        Population Characteristics
            Health
                Physical Fitness
                    Cardiorespiratory Fitness
                    Physical Functional Performance
                        Gait Analysis
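The indented displays above can be recovered from the tree numbers alone, on the assumption (consistent with the entries shown) that each dot-separated segment of a tree number adds one level and that the leading letter names the top-level category. A minimal Python sketch, with hypothetical function names:

```python
def ancestors(tree_number):
    """Tree numbers of all ancestors, nearest first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(len(parts) - 1, 0, -1)]


def category(tree_number):
    """Top-level category letter, e.g. 'G' or 'I'."""
    return tree_number[0]


def categories(tree_numbers):
    """Distinct top-level categories a term is filed under."""
    return sorted({category(t) for t in tree_numbers})


# Tree numbers copied from the two entries above.
EXERCISE = ["G11.427.410.698.277", "I03.350"]
PHYSICAL_FITNESS = ["G11.427.685", "I03.450.642.845.054.800", "N01.400.545"]
```

For “Exercise”, `ancestors("G11.427.410.698.277")` yields four tree numbers, matching the four levels between the category and the term in the first display, while `categories(PHYSICAL_FITNESS)` shows that one term sits in three hierarchies at once, which is what the “multiarchy” amounts to.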

MeSH data are available in ASCII, XML, MARC 21 and RDF formats. MeSH codes are used to classify entries in the U.S. National Library of Medicine’s MedlinePlus medical website (https://medlineplus.gov/; https://medlineplus.gov/aboutmedlineplus.html) and by PubMed (https://www.ncbi.nlm.nih.gov/pubmed/; https://www.ncbi.nlm.nih.gov/books/NBK3827/#pubmedhelp.PubMed_Quick_Start) to annotate over 29 million biomedical articles.

Example PubMed entry in MEDLINE format with MeSH Headings (MH) (https://www.ncbi.nlm.nih.gov/pubmed/19760431):

PMID- 19760431
OWN - NLM
STAT- MEDLINE
DCOM- 20100430
LR  - 20181113
IS  - 1439-6327 (Electronic)
IS  - 1439-6319 (Linking)
VI  - 108
IP  - 1
DP  - 2010 Jan
TI  - The effects of low-intensity resistance training with vascular restriction on leg muscle strength in older men.
PG  - 147-55
LID - 10.1007/s00421-009-1204-5 [doi]
AB  - The purpose of this study was to investigate and compare the effects of two types of resistance training protocols on the adaptation of skeletal muscle strength in older men. Thirty-seven healthy male subjects (50-64 years) participated in this study. Subjects were assigned to one of three groups: high-intensity (80% 1-RM) resistance training (RT80); low-intensity (20% 1-RM) resistance training with vascular restriction (VR-RT20); and a control group (CON) that performed no exercise. Subjects in both exercise groups performed three upper body (at 80% 1-RM) and two lower body exercises either with (20% 1-RM) or without (80% 1-RM) vascular restriction three times a week for 6 weeks. As expected, the RT80 and VR-RT20 groups had significantly (p < 0.01) greater strength increases in all upper body and leg press exercises compared with CON, however, absolute strength gains for the RT80 and VR-RT20 groups were similar (p > 0.05). It should be noted that the percentage increase in leg extension strength for the RT80 group was significantly greater than that for both the VR-RT20 (p < 0.05) and CON groups (p < 0.01), while the percentage increase in leg extension strength for the VR-RT20 group was significantly (p < 0.01) greater than that for the CON. The findings suggested that leg muscle strength improves with the low-load vascular restriction training and the VR-RT20 training protocol was almost as effective as the RT80 training protocol for increasing muscular strength in older men.
FAU - Karabulut, Murat
AU  - Karabulut M
AD  - Department of Health and Human Performance, University of Texas at Brownsville/Texas Southmost College, Brownsville, TX 78520, USA. murat.karabulut@utb.edu
FAU - Abe, Takashi
AU  - Abe T
FAU - Sato, Yoshiaki
AU  - Sato Y
FAU - Bemben, Michael G
AU  - Bemben MG
LA  - eng
PT  - Journal Article
DEP - 20090918
PL  - Germany
TA  - Eur J Appl Physiol
JT  - European journal of applied physiology
JID - 100954790
SB  - IM
MH  - Adaptation, Physiological/physiology
MH  - Blood Flow Velocity
MH  - Blood Pressure
MH  - Blood Vessels/physiology
MH  - Body Height/physiology
MH  - Contraindications
MH  - Exercise/physiology
MH  - Humans
MH  - Leg/physiology
MH  - Male
MH  - Middle Aged
MH  - Muscle Contraction/physiology
MH  - Muscle Strength/*physiology
MH  - Physical Endurance/physiology
MH  - *Resistance Training
MH  - Stress, Mechanical
MH  - Weight Lifting/physiology
EDAT- 2009/09/18 06:00
MHDA- 2010/05/01 06:00
CRDT- 2009/09/18 06:00
PHST- 2009/09/10 00:00 [accepted]
PHST- 2009/09/18 06:00 [entrez]
PHST- 2009/09/18 06:00 [pubmed]
PHST- 2010/05/01 06:00 [medline]
AID - 10.1007/s00421-009-1204-5 [doi]
PST - ppublish
SO  - Eur J Appl Physiol. 2010 Jan;108(1):147-55. doi: 10.1007/s00421-009-1204-5. Epub 2009 Sep 18.
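The MH lines of such a record can be pulled out and decomposed with a short script. The Python sketch below rests on two conventions visible in the record: a slash separates a heading from its subheading(s), and an asterisk marks a major topic of the article. It further assumes that headings themselves contain no slash. The function names are hypothetical, and the pattern accepts either a hyphen or an en dash after the tag, since copies of such records vary typographically.

```python
import re

# "MH", optional spaces, a hyphen or an en dash (U+2013), then the value.
MH_LINE = re.compile(r"^MH\s*[-\u2013]\s*(.+)$")


def parse_mh(value):
    """Split one MH value into (heading, subheadings, is_major_topic)."""
    heading, *subheadings = [part.strip() for part in value.split("/")]
    is_major = any(part.startswith("*") for part in [heading, *subheadings])
    return heading.lstrip("*"), [s.lstrip("*") for s in subheadings], is_major


def mesh_headings(record_text):
    """All MH lines of a MEDLINE-format record, parsed in order."""
    parsed = []
    for line in record_text.splitlines():
        match = MH_LINE.match(line.strip())
        if match:
            parsed.append(parse_mh(match.group(1).strip()))
    return parsed


# A few lines taken from the record above.
SAMPLE = """\
TI  - The effects of low-intensity resistance training ...
MH  - Exercise/physiology
MH  - Muscle Strength/*physiology
MH  - *Resistance Training
MH  - Humans
MHDA- 2010/05/01 06:00
"""

headings = mesh_headings(SAMPLE)
```

Note that the `MHDA` line is not mistaken for a heading, because the pattern requires the dash to follow `MH` directly (apart from spaces).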

The MedlinePlus “Exercise and Physical Fitness” entry is classified with the MeSH codes “Exercise” and “Physical Fitness” (https://medlineplus.gov/exerciseandphysicalfitness.html).

As can be seen from this brief overview, the English WordNet and MeSH have some interesting characteristics:

  • they offer types of lexical information (and ways of looking at lexical information) that deepen and complement the information found in standard dictionaries
  • they are formally structured
  • they are available in multiple standard computer-readable formats
  • they support and enable a range of computational functions such as text searching, semantic similarity and word-sense disambiguation
  • standard formats allow knowledge bases to link to and be incorporated in other structured data sources
  • formal structuring demands a disciplined lexicographic development and maintenance process
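To make the semantic-similarity point concrete, here is a minimal Python sketch of one simple path-based measure over a hierarchy: the fewer links between two terms, the more similar they count as being. The parent links are copied from the “Exercise” subtree shown earlier; the scoring formula, 1 / (1 + path length), is just one common choice and is not claimed to be what any particular WordNet or MeSH tool computes. The function names are hypothetical.

```python
# Child -> parent links, copied from the "Exercise" subtree shown earlier.
PARENT = {
    "Jogging": "Running",
    "Running": "Exercise",
    "Swimming": "Exercise",
    "Walking": "Exercise",
    "Stair Climbing": "Walking",
}


def path_to_root(term):
    """The term followed by its successive parents."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def path_length(a, b):
    """Number of links on the shortest path between a and b, or None if unconnected."""
    steps_from_a = {term: i for i, term in enumerate(path_to_root(a))}
    for j, term in enumerate(path_to_root(b)):
        if term in steps_from_a:  # first shared ancestor on b's path
            return steps_from_a[term] + j
    return None


def path_similarity(a, b):
    """1 / (1 + path length); 1.0 for identical terms, None if unconnected."""
    distance = path_length(a, b)
    return None if distance is None else 1 / (1 + distance)
```

Here `path_similarity("Jogging", "Running")` is 0.5 and `path_similarity("Jogging", "Swimming")` is 0.25. A term with several tree numbers, as in MeSH, would need the minimum taken over all of its positions, which this single-parent sketch does not attempt.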

Given the computational and electronic future of lexicographic resources, these attributes will benefit traditional lexicography. Future lexicographic resources must serve multiple kinds of users, both computational and human. And data that cannot be linked or shared cannot and will not be used.