Link Grammar

From OpenCog

Link Grammar is a theory for assigning structural bonds between structural elements in a general "structuralist" setting. It originated as a specific theory of natural language proposed by Sleator and Temperley in 1991, but now appears to be a far more general descriptive framework for structural relationships.

It has heavily influenced the design of Atomese and lead to a general sheaf-theoretic understanding of structure. As of 2025, much of the research effort into Atomese is to elucidate and expand the computational algorithms and properties of using such Link-Grammar-inspired sheaf-theoretic "self"-assembly methods. It seems plausible that Sections can be used to create systems that self-assemble at a critical phase transition (much like Per Bak's critical sand-piles) and can provide a foundation for basal cognition.

Overview

The Wikipedia article on Link Grammar provides a reasonable overview of the original theory as it applies to natural language. In short, it is a kind of dependency grammar that indicates how pairs of words in a sentence are related to one-another. It encodes the syntax of natural language, and, due to it's nature, it also encodes a semantics to a shallow degree. In this last sense, it provides a natural bridge to the ideas of deep syntax and semantics, as described by Meaning-Text Theory (MTT).

The basic concepts used in Link Grammar are those of "connectors" and "disjuncts". Connectors define how two words may connect to one-another, while disjuncts are a sequence of connectors, all of which must be satisfied in order to obtain a valid parse of a sentence. A "word" in Link Grammar is a lexical entry that associates to each natural-language word a collection of disjuncts. It becomes apparent that disjuncts are a kind of fine-grained "part of speech". For example, transitive verbs require both a subject and an object. Link Grammar defines connectors to subjects and objects, and associates a disjunct to verbs that has both connectors in them.

The parsing that LG does, using disjuncts to represent "jigsaw puzzle pieces" that "snap together when the tabs match" can be understood to obey the axioms that define a sheaf, in that sheaves are also assembled by "snapping together matching subsets in a coherent fashion". Loosely speaking, parsing in general can be understood in this way.

Generalization

Deeper exploration indicates that the concept of connectors, connector sequences and lexical entries are generically applicable to graphs and graph theory. Link Grammar links correspond to the edges of a graph. Connectors correspond to "half-edges": the possibility of creating a full edge by connecting together two half-edges. A connector sequence corresponds to a vertex in a graph. These basic concepts can be further shown to obey the defining axioms of a sheaf, as in sheaf theory. The page on Connectors and Sections reviews how this generalization is broadly applicable to computing and graphs.

These core concepts have been reified into AtomSpace Atom types. These are:

  • Connector corresponds to the idea of a Link Grammar connector. It generalizes the connector by comprising it of two parts: an identifier for the connection type, and a connection constraint, the SexNode.
  • The SexNode generalizes the idea of the "+" and "-" directions in a Link Grammar connector. The Sex can hold arbitrary user-defined connection constraints.
  • The ConnectorSeq generalizes the idea of a Link Grammar "disjunct". It is an ordered sequence of connectors.
  • The Section generalizes the idea of a dictionary entry in the Link Grammar dictionary. It consists of two parts: the "germ" (corresponding to the word of the lexical entry) and the associated ConnectorSeq (the associated disjunct).

Due to the typing nature of the AtomSpace, it is useful to define some subsidiary types. These aren't really central concepts, but they do provide a set of base classes from which other key Atom types are derived. These are:

  • LexicalNode - a base class for anything that can be looked up in a dictionary. It's the base class for WordNodes.
  • BondNode corresponds to the Link Grammar concept of a "link". Unfortunately, the word "link" is already heavily used in the AtomSpace, so it is renamed to "bond". This is provisional; this atom type is currently not used very much.

The details of this generalization are reviewed the github sheaf directory, and specifically as a collection of PDF's that can be found in the sheaf/docs subdirectory.

Parser

Link Grammar has an implementation, it can be found in the Link Grammar github repo. That implementation contains source code for a parser, as well as hand-crafted dictionaries for English and Russian, as well as prototypes/proofs-of-concept for another eight natural languages (Kazakh, Turkish, Persian, Arabic, Lithuanian, German, Vietnamese, Indonesian, Hebrew). The main Link Grammar web page provides more info. The intro web page provides a detailed introduction to Link Grammar, specifically.

Within OpenCog, the parser and the dictionaries can be accessed directly. The LgDictNode provides a method for looking up word expressions in a given dictionary, and importing them into the AtomSpace (as Atomese). The LgParseLink provides an interface to the parser. Given a sentence, it will use the Link Grammar parser to parse the sentence, and place the parse results into the AtomSpace as Atomese.

Generation

Given a dictionary and/or a collection of Sections, one might want to generate grammatically-valid sentences, based on the syntax rules encoded in that dictionary. There are more than a half-dozen different efforts to do this, in various states of development and disrepair. Two are in active development are:

  • The link-generator command that is available with the main Link Grammar source code (or rather, will be available in version 5.9.0, once the "generate" branch in github is merged). This command is in active development. In it's current form, it allows a sequence of wild-cards to be specified, and it will "fill in the blanks" with any words that would be grammatically valid in those locations. For example, using the input "This is a * idea", it will insert all possible adjectives for the star,
  • The opencog/generate framework, using the generalized sheaf concepts sketched above. The goal is to be able to assemble Sections into sheaves, in the general sense. So far, a proof-of-concept has been implemented. It can expand simple Link Grammar dictionaries into valid sentences. It can also expand more general graph specifications into all possible graphs that match the graph constraints.

Learning

The English and Russian dictionaries that are provided with Link Grammar are hand-crafted. It is clear that hand-crafting dictionaries is not a viable long-term strategy for AGI. Practical experience shows that the hand-crafted dictionaries are just barely suitable for use as stepping-stones or building blocks. Thus, there is a need to automatically learn grammar and syntax, in an unsupervised fashion. This is effected in the unsupervised language learning project.