Atom types

Atomese defines a wide variety of Atom and Value types. The name "Atom" was chosen because of its resemblance to the concept of the "atomic sentence" in mathematical logic. Likewise, the word "Value" is meant to evoke the idea of a truth value, or a valuation in logic. All Atoms are typed, both in the sense of a data type, and in the sense of type theory. Almost all Atomese types are abstract data types.

The two most basic types of atoms are the Node and the Link. Nodes are identified by their names or labels; that is the only property that they have. Nodes, once placed in the AtomSpace, are unique: there can only ever be one node of a given type and name. Links are identified by their contents, which are either ordered sequences or unordered sets of other atoms. Links do not have any name or label other than their contents: a link is uniquely identified by its type and its contents. Nodes and links are used to construct abstract syntax trees.
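The uniqueness rules above can be sketched with a toy model. This is not the OpenCog API: the class ToyAtomSpace and its methods are hypothetical names, and atoms are represented as plain Python tuples purely for illustration.

```python
# Toy model only: not the OpenCog API. All names here are hypothetical.
class ToyAtomSpace:
    def __init__(self):
        self._atoms = {}  # maps an identity key to the single stored atom

    def node(self, type_name, name):
        # A node is identified by its type and its name.
        key = ("node", type_name, name)
        return self._atoms.setdefault(key, key)

    def link(self, type_name, *outgoing, ordered=True):
        # A link is identified by its type and its contents.
        # Unordered links ignore the order of their members.
        contents = tuple(outgoing) if ordered else frozenset(outgoing)
        key = ("link", type_name, contents)
        return self._atoms.setdefault(key, key)

    def __len__(self):
        return len(self._atoms)


space = ToyAtomSpace()
cat_a = space.node("ConceptNode", "cat")
cat_b = space.node("ConceptNode", "cat")      # same type and name: same atom
mouse = space.node("ConceptNode", "mouse")
list_1 = space.link("ListLink", cat_a, mouse)
list_2 = space.link("ListLink", mouse, cat_a)  # different order: different atom
set_1 = space.link("SetLink", cat_a, mouse, ordered=False)
set_2 = space.link("SetLink", mouse, cat_a, ordered=False)  # same set: same atom
```

Asking twice for the same node, or for the same unordered link, hands back the one stored object; only the ordered ListLink distinguishes the two argument orders.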

Almost all atom types are executable; that is, they have a dual role. They represent graphs, hypergraphs, data structures and relationships; as representational elements, they are stored in the AtomSpace. But they can also "do something" when executed. Thus, for example,

  (PlusLink (Number 2) (Number 2))

will return

  (Number 4)

when executed. This is accomplished by backing most Atom types with C++ classes that actually "do that thing" that they are designed to do.
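As a rough illustration of this dual role, the following sketch treats an expression as plain data (nested tuples) and supplies a separate function that executes it. It is a toy under stated assumptions: the function name execute and the tuple encoding are hypothetical, and the real system dispatches to C++ classes rather than to an if-chain.

```python
# Toy model only: the real system backs Atom types with C++ classes.
def execute(atom):
    """Execute a nested-tuple expression such as ("PlusLink", a, b)."""
    kind, *args = atom
    if kind == "Number":
        return atom  # numbers evaluate to themselves
    # Execute the arguments first, then pull out their numeric payloads.
    values = [execute(arg)[1] for arg in args]
    if kind == "PlusLink":
        return ("Number", sum(values))
    if kind == "TimesLink":
        product = 1
        for v in values:
            product *= v
        return ("Number", product)
    raise ValueError(f"not executable in this toy: {kind}")


expr = ("PlusLink", ("Number", 2), ("Number", 2))  # representation: just data
result = execute(expr)                             # execution: ("Number", 4)
```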

The above example illustrates that Atomese can be shockingly verbose. This is a side effect of its representational design goal. It is not intended as a "programming language" for humans, but rather as a system for representing abstract structures that are operated on by algorithms. Thus, you would never write the above to add two numbers together; instead, you'd invoke some algorithm that eventually results in the above being produced and executed. That algorithm either presents the result to you in a nice, human-readable form, or it passes this result on to other algorithms for further processing, manipulation and rewriting.

The category of atom types lists all documented atoms currently in use; there are over 150 of them. This page provides a general introduction to some of the types in current use. It is not a complete list: new types are easily created (see the examples/type-system demo on GitHub). There are also more modern, large sparse vector-oriented representation systems in the AtomSpace (i.e. neural-net-like); these are not covered here. There are also a considerable number of domain-specific Atom types, for genetics, biology, chemistry, robotics, vision and natural language processing. Most of these are not documented on this wiki; they appear in 3rd-party packages and subsystems.

The various OpenCog books mention Atom types that are not implemented or have been implemented in a somewhat different way (or are obsolete.) The partial nature of the list given here reflects a more general point: The specific collection of core Atom types has changed as the system developed. The current core set is the result of several decades of experimentation and false starts.

Overview

Different subsystems provide different collections of Atom types. Thus, the idea of what is "basic" depends on the subsystem in question. We begin by reviewing some obsolete Atom types. These were foundational for early versions of OpenCog. Although they are gone, some shadows remain; there are some important lessons in why they are gone.

  • Early versions of OpenCog imagined that it would be used as a knowledge representation system, and thus had a rich variety of Atom types suitable for representing ontological relationships. Most of these are now gone. The section Old PLN semantics below reviews some of these uses.
  • Early versions of OpenCog implemented PLN, an inference engine for a probabilistic first-order logic. This included a rich set of Atom types resembling concepts commonly appearing in discussions of logical reasoning and inference. The Category:PLN Atom Types records these types. However, PLN is unmaintained and now obsolete, so these types are no longer directly available for immediate use.
  • Early versions of OpenCog implemented a natural language processing subsystem, performing linguistic analysis based on relatively conventional linguistic theory. These types are described in Category:NLP Atom Types. Most, but not all, of the NLP subsystem is unmaintained and obsolete.
  • The one part of the NLP subsystem that remains in active use is the Link Grammar subsystem. The reason for this is that LG provides a generic, NLP-independent subsystem that can parse arbitrary token streams, and not just natural language. It uses a small collection of custom Atom types to represent parse results.

Important classes of Atom types that are in active service are presented below.

Generic (hyper-)graphs

Generic hypergraph structures are represented in the AtomSpace using Atoms such as EdgeLink, TagNode, ListLink and ItemNode. Edges are meant to just be graph edges, and are not meant to have any deeper semantic content or interpretation.

In contexts where the graph is asserting "factual knowledge" about the physical universe (or the noosphere), the PredicateNode, ConceptNode and EvaluationLink may be used. These are meant to assert "truth" of some kind, in the sense of classical Aristotelian logic, or of predicate logic. These are commonly given some numeric truth value (using the Value mechanism) or some associated arithmetic formula for computing the truth (as was done in PLN).

Values

Representing Values. Every Atom has an associated mutable key-value database. The keys are necessarily other Atoms, but the values can be Atoms or Values. Basic Value types include the FloatValue, the StringValue and the LinkValue. These are all vectors: the FloatValue is a vector of floats; the StringValue is a vector of strings, and the LinkValue is a vector of Values. Subtypes include the QueueValue and the UnisetValue, which provide thread-safe multi-reader, multi-writer access. Types derived from StreamValue provide a way of flowing data through Atomese graphs.

The vector in a FloatValue might be used to store only one or two numbers: say, a probability and a confidence. Alternately, the vector may be large, thus providing a vector embedding for the Atom. LinkValues could be used (for example) to associate arithmetic formulas with an Atom.
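The per-Atom key-value database can be pictured with a small sketch. The class ToyAtom and its set_value/get_value methods are hypothetical names chosen for this illustration, not the AtomSpace API, and a FloatValue is stood in for by a tuple of floats.

```python
# Toy model only: names are hypothetical, not the AtomSpace API.
class ToyAtom:
    def __init__(self, type_name, name):
        self.type_name = type_name
        self.name = name
        self._values = {}  # mutable store; the atom's identity never changes

    def set_value(self, key, value):
        # Keys are themselves atoms; values are vectors (tuples here).
        self._values[key] = value

    def get_value(self, key):
        return self._values.get(key)


cat = ToyAtom("ConceptNode", "cat")
truth_key = ToyAtom("PredicateNode", "truth")
embedding_key = ToyAtom("PredicateNode", "embedding")

# A short vector: say, a probability and a confidence.
cat.set_value(truth_key, (0.9, 0.7))
# A long vector: an embedding for the atom (all zeros, as a placeholder).
cat.set_value(embedding_key, tuple(0.0 for _ in range(128)))
# Values can be overwritten without replacing the atom itself.
cat.set_value(truth_key, (0.95, 0.8))
```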

Graph rewriting

Representing graph rewriting rules, such as RuleLink. In order to do any sort of useful computation within the AtomSpace, one must have a way of spotting a particular pattern, and creating a new/different pattern. This is accomplished by using the QueryLink and the FilterLink, both derived from a base class RuleLink. The RuleLink can be thought of as a graph re-write rule, or as an if-then statement or an implication A -> B. That is, A->B means "if(A) then (B)" where A and B are hypergraphs themselves. RuleLinks take two arguments: A and B. Either can be arbitrarily complicated.
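The "if A then B" reading of a rewrite rule can be sketched in a few lines. This is a toy matcher over nested tuples, not the AtomSpace pattern engine; the function names, the "$" prefix convention for variables, and the example predicates are assumptions made for illustration.

```python
# Toy model only: a minimal match-and-rewrite over nested tuples.
def match(pattern, graph, bindings):
    """Return extended variable bindings if pattern matches graph, else None."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings:
            # A variable seen before must be grounded the same way again.
            return bindings if bindings[pattern] == graph else None
        return {**bindings, pattern: graph}
    if isinstance(pattern, tuple) and isinstance(graph, tuple):
        if len(pattern) != len(graph):
            return None
        for p, g in zip(pattern, graph):
            bindings = match(p, g, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == graph else None


def substitute(template, bindings):
    """Build the rewritten graph B by filling in the variable groundings."""
    if isinstance(template, tuple):
        return tuple(substitute(t, bindings) for t in template)
    return bindings.get(template, template)


def apply_rule(rule_a, rule_b, atoms):
    """For every atom matching A, produce the corresponding B."""
    rewritten = []
    for atom in atoms:
        found = match(rule_a, atom, {})
        if found is not None:
            rewritten.append(substitute(rule_b, found))
    return rewritten


atoms = [
    ("EvaluationLink", "young", "Alice"),
    ("EvaluationLink", "young", "Bob"),
    ("EvaluationLink", "old", "Carol"),
]
rule_a = ("EvaluationLink", "young", "$X")
rule_b = ("EvaluationLink", "energetic", "$X")
rewritten = apply_rule(rule_a, rule_b, atoms)
```

Only the two atoms that fit pattern A are rewritten; the third is left alone because "old" does not match "young".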

Queries

Representing queries that find Atoms in the AtomSpace, such as the QueryLink, MeetLink, JoinLink and DualLink. This includes specialized Atom types pertaining to query: PresentLink, AbsentLink, AlwaysLink, ChoiceLink, GroupByLink.

Variables

Representing typed variables. The VariableNode, TypedVariableLink, TypeNode and VariableList provide a mechanism for the appearance of a variable in an expression. Variables are used heavily to specify rewrite rules and query patterns.

Variable binding

Representing bound variables in function expressions. The LambdaLink and ScopeLink provide infrastructure for binding variables to expressions. The FreeLink provides infrastructure for working with unbound variables. The QuoteLink provides a way of quoting expressions, thus "hiding" the variables in them. Note that scoped links obey the common expectations from lambda calculus: alpha equivalence is explicitly maintained by the C++ classes that implement these types. The private PrenexLink is used to place terms into prenex order. There is extensive support for beta reduction. Arithmetic functions can be delta-reduced.
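Alpha equivalence and beta reduction can be illustrated with a deliberately small sketch. It assumes a lambda is a pair (variables, body) over nested tuples and ignores nested scopes and quoting entirely; all function names are hypothetical and unrelated to the C++ implementation.

```python
# Toy model only: flat scopes, no nested lambdas, no quoting.
def rename(term, mapping):
    """Replace every symbol found in mapping, leaving the rest untouched."""
    if isinstance(term, tuple):
        return tuple(rename(t, mapping) for t in term)
    return mapping.get(term, term)


def canonical(variables, body):
    """Rename bound variables by position, so that expressions differing
    only in variable names end up identical."""
    positional = {v: f"$_{i}" for i, v in enumerate(variables)}
    return rename(body, positional)


def alpha_equivalent(lam_a, lam_b):
    return (len(lam_a[0]) == len(lam_b[0])
            and canonical(*lam_a) == canonical(*lam_b))


def beta_reduce(lam, args):
    """Substitute the arguments for the bound variables in the body."""
    variables, body = lam
    return rename(body, dict(zip(variables, args)))


lam_x = (("$X",), ("PlusLink", "$X", 1))
lam_y = (("$Y",), ("PlusLink", "$Y", 1))
lam_z = (("$Z",), ("TimesLink", "$Z", 1))

same = alpha_equivalent(lam_x, lam_y)        # names differ, structure agrees
different = alpha_equivalent(lam_x, lam_z)   # the bodies differ
reduced = beta_reduce(lam_x, (2,))           # ("PlusLink", 2, 1)
```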

Flows

Representing the manipulation of flowing data streams. A primary workhorse is the FilterLink. Access to Atoms and stream contents is provided by ValueOfLink. The setter is SetValueLink, while atomic increment is provided by IncrementValueLink ("atomic" meaning "race-free in multi-threaded usage"). Stream promises (aka "futures") are provided by Values, such as FormulaStream and FutureStream.

Streams

Atoms for converting AtomSpace contents to data streams and back include specialty links such as SizeOf, IncomingOf, TypeOf, KeysOf, CollectionOf and AtomSpaceOf. Numeric streams can be accessed with FloatValueOf; the streams themselves with StreamValueOf. Conversions can be done with LinkSignatureLink.

Stream utilities

Assorted basic stream utilities, such as SplitLink and JsonSplitLink. More are provided by the https://github.com/opencog/sensory project, including Atom types for file system access, terminal access and IRC access. Access to GPUs is provided by the https://github.com/opencog/atomspace-simd project.

Arithmetic

Arithmetic expressions can be represented with PlusLink, TimesLink and GreaterThanLink. These, like most Atom types, are dual purpose: they can both represent expressions, and, when executed, perform the operation that they are representing. Since there is more to life than plus and times, various elementary functions are defined, such as SineLink, ImpulseLink, AccumulateLink, MinLink, MaxLink. These all derive from the base type of NumericFunctionLink.

Vector operations

Much of the flow through the system is represented with vectors; thus there is a collection of types for manipulating vectors, including ElementOf, DecimateLink and BoolOpLink. Columns can be extracted with FloatColumn and LinkColumn; a column/row transpose can be done with TransposeColumn.

Storage and Networking

The StorageNode and ProxyNode subsystem provides mechanisms to store the AtomSpace (or individual Atoms) to disk, or to transmit them over the network to other AtomSpaces. The ProxyNodes provide a mechanism for mirroring, sharding, load-balancing and caching, thus providing the basic building blocks for a distributed AtomSpace.

Stateful manipulations

Atoms are, by definition, immutable: they can only be created and destroyed. Yet, mutable graphs are desirable. This ability is provided by the StateLink, the GrantLink, the DefineLink and the DeleteLink. These are all thread-safe, providing guarantees against racing threads. The GrantLink provides a particularly strong guarantee.

Type subsystem

As can be seen, Atomese provides a rich collection of types; these are organized into a type system. The type constructors include the TypeNode, the SignatureLink, the ArrowLink and the DefinedTypeNode. Polymorphic types can be specified using the TypeChoice link. The type system allows for basic error checking, such as with the type checker, and more generally, it allows type-logical reasoning and inference to be performed on atom signatures. This type system resembles that found in CaML or Haskell; however a huge difference is that the Atomese type system is run-time dynamic. Properly mapping it to statically typed languages has proved to be effectively impossible. Static typing is a one-way trap-door.

Sheaves

A basic concept from a structuralist approach to representing, well, structure, is the sheaf. A section of a sheaf can be visualized as a partly assembled jigsaw puzzle: some of the jigsaw pieces are connected to others, while others have exposed, unconnected tabs. This conception of a sheaf occurs in many settings, including natural language, such as Link Grammar, and in spatial relationships (what object is next to what other object), chemistry (what chemical atom is bonded to another), biochemistry (the variable part of immunoglobulin) and economics (the inputs and outputs of a factory, the imports and exports of a nation). A collection of Atoms is provided to represent sheaves: this includes the Section, the ConnectorSeq, the Connector, the SexNode and the Bond. The ShapeLink and CrossSection are dual to the Section.

In the economics example given above, one might imagine that exports and imports can be represented with directed graphs, i.e. vertexes and edges. This is indeed the case; however, the jigsaw paradigm, of half-edges waiting to be connected into a bond, offers significant conceptual and computational advantages, having impacts on RAM usage, CPU usage and queriability.
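The jigsaw picture can be made concrete with a small sketch. The class and field names below borrow the words Section and Connector from the text, but the data layout, the "+"/"-" direction convention (borrowed from Link Grammar) and the example factories are assumptions made for illustration only.

```python
# Toy model only: the layout and the "+"/"-" convention are assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class Connector:
    label: str      # what kind of bond this tab can form
    direction: str  # "+" for an output tab, "-" for an input tab


@dataclass(frozen=True)
class Section:
    germ: str          # the item at the centre of the jigsaw piece
    connectors: tuple  # its exposed, not yet connected tabs


def can_bond(a, b):
    """Two half-edges mate when labels agree and directions are opposite."""
    return a.label == b.label and {a.direction, b.direction} == {"+", "-"}


def bonds(left, right):
    """List (germ, germ, label) for every pair of mating connectors."""
    return [(left.germ, right.germ, a.label)
            for a in left.connectors
            for b in right.connectors
            if can_bond(a, b)]


steel_mill = Section("steel mill",
                     (Connector("steel", "+"), Connector("iron ore", "-")))
car_plant = Section("car plant",
                    (Connector("steel", "-"), Connector("cars", "+")))

possible = bonds(steel_mill, car_plant)
```

Each section carries only its own half-edges; a full edge exists only once two compatible connectors have been found, which is why unconnected inputs and outputs can be stored and queried before any partner is known.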

External systems

Representing external function calls. The AtomSpace cannot live in isolation; it must interface with the external world. The GroundedSchemaNode, when used with the ExecutionOutputLink, provides a way of calling external C++, python or scheme (guile) code. The DefinedProcedureNode provides the same idea, but now, the function being called is one that has been defined in pure Atomese.

Sensorimotor systems

Function calls do not provide an adequate mechanism for interacting with external systems, and, in particular, with sensorimotor systems. For this purpose, the ObjectNode provides a way of encapsulating interactions with external entities, by sending messages to that object. Atoms for working with messages include the MessagesOfLink, the IsMessageLink, the KeysOfLink and IsKeyLink. The ObjectNode is a base type for the StorageNode and the various storage and proxy types. It is also heavily used in the https://github.com/opencog/sensory project to provide external sensorimotor interfaces.

Membrane computing

The AtomSpace itself can be thought of as one giant, mutable unordered Link. To capture this idea, the Frame provides a base class that is both a Node (thus giving it a name) and a Link (allowing it to contain Atoms). AtomSpaces can be composed into directed acyclic graphs, with child AtomSpaces "inheriting" all of the Atoms in the parent space. Overlay semantics are implemented: values on Atoms in the child space hide the values in the parent; Atoms deleted in the child hide the Atom in the parent (where it is still present). These overlays can be understood as providing a form of membrane computing, where each AtomSpace holds everything that is within a membrane. The AtomSpaceOfLink provides a way of finding out which AtomSpace an Atom belongs to.
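The overlay semantics can be sketched as follows. This toy simplifies the directed acyclic graph to a single-parent chain, and the class ToyFrame with its methods is a hypothetical stand-in, not the AtomSpace interface.

```python
# Toy model only: a single-parent chain instead of a general DAG.
class ToyFrame:
    _DELETED = object()  # marker for "deleted here, may still exist in parent"

    def __init__(self, parent=None):
        self.parent = parent
        self._local = {}  # atom -> value, or the _DELETED marker

    def set_value(self, atom, value):
        self._local[atom] = value

    def delete(self, atom):
        # Hide the atom in this frame without touching the parent.
        self._local[atom] = self._DELETED

    def lookup(self, atom):
        # The nearest frame that mentions the atom wins.
        frame = self
        while frame is not None:
            if atom in frame._local:
                found = frame._local[atom]
                return None if found is self._DELETED else found
            frame = frame.parent
        return None


base = ToyFrame()
base.set_value("cat", 0.9)

child = ToyFrame(parent=base)
inherited = child.lookup("cat")   # 0.9, seen through the child
child.set_value("cat", 0.1)
overridden = child.lookup("cat")  # 0.1, the child hides the parent's value
child.delete("cat")
hidden = child.lookup("cat")      # None, deleted in the child
still_there = base.lookup("cat")  # 0.9, the parent is unaffected
```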

Old PLN semantics

Older versions of OpenCog included PLN, a now unmaintained and obsolete subsystem for performing a kind of probabilistic first-order logic inference. Since it was meant to determine the truth of logical statements and inferences, it had a rich variety of Atom types closely resembling those one might find in predicate logic systems, or in knowledge representation systems. These atom types are now all obsolete, not so much because they were bad ideas, but rather, simply because there are no actively maintained subsystems that actually define, require or use these types.

Suppose we want to say "if (young and beautiful) then attractive". This would be:

ImplicationLink
   AndLink
      ConceptNode young
      ConceptNode beautiful
   ConceptNode attractive

Note: Two ConceptNodes that are linked by an AND represent a new ConceptNode (the intersection of the two concepts). However, the above doesn't express who it is that is young and beautiful. We really want pattern matching to allow "variable holes" in the pattern. That is, "if (X is young and X is beautiful) then X is attractive", where X can be thought of as a "hole" or a "blank" in the expression: anything that fits in this hole will provide a match to the template "X is young and X is beautiful" and thus allow the graph re-write "X is attractive" to occur. The variable X is indicated with a VariableNode:

AverageLink $X
   ImplicationLink
      AndLink
         EvaluationLink young $X
         EvaluationLink beautiful $X
      EvaluationLink attractive $X

NotLink is a unary link, so, for example, we might say

AverageLink $X
   ImplicationLink
      AndLink
         EvaluationLink young $X
         EvaluationLink beautiful $X
         NotLink
            EvaluationLink poor $X
      EvaluationLink attractive $X

ContextLink allows explicit contextualization of knowledge, which is used in PLN, e.g.

ContextLink
   ConceptNode golf
   InheritanceLink
      ObjectNode BenGoertzel
      ConceptNode incompetent

says that Ben Goertzel is incompetent in the context of golf.

(Obsolete) Basic Knowledge representation in PLN

First, a ConceptNode does not necessarily refer to a whole concept, but may refer to part of a concept -- it is essentially a "basic semantic node" whose meaning comes from its links to other Atoms. It would be more accurately, but less tersely, named "concept or concept fragment or element node." A simple example would be a ConceptNode grouping nodes that are somehow related, for example

ConceptNode "C"
InheritanceLink (ObjectNode "BW") (ConceptNode "C")
InheritanceLink (ObjectNode "BP") (ConceptNode "C")
InheritanceLink (ObjectNode "BN") (ConceptNode "C")
ReferenceLink (ObjectNode "BW") (PhraseNode "Ben's watch")
ReferenceLink (ObjectNode "BP") (PhraseNode "Ben's passport")
ReferenceLink (ObjectNode "BN") (PhraseNode "Ben's necklace")

indicates the simple ConceptNode grouping three objects owned by Ben. The above-given Atoms don't indicate the ownership relationship, they just link the three objects with textual descriptions -- they provide a syntactic relation, and not a semantic one. In this example, the ConceptNode links transparently to physical objects and English descriptions, but in general this won't be the case -- most ConceptNodes will look to the human eye like groupings of links of various types, that link to other nodes consisting of groupings of links of various types, etc.

There are Atoms referring to basic, useful mathematical objects, e.g. NumberNodes like

NumberNode 4
NumberNode 3.14
NumberNode 1 2 3 4 5 42

NumberNodes are vectors. (For most practical applications, the FloatValue provides a faster, easier and cheaper numeric vector; but Values are a more advanced topic.)

A core distinction is made between ordered links and unordered links; these are handled differently in the Atomspace software. A basic unordered link is the SetLink, which groups its arguments into a set. For instance, the ConceptNode C defined by

ConceptNode C
MemberLink A C
MemberLink B C

is equivalent to

SetLink A B

On the other hand, ListLinks are like SetLinks but ordered, and they play a fundamental role due to their relationship to predicates. Most predicates are assumed to take ordered arguments, so we may say e.g.

EvaluationLink
   PredicateNode eat
   ListLink
      ConceptNode cat
      ConceptNode mouse

to indicate that cats eat mice.

Note that by an expression like

ConceptNode cat

is meant

ConceptNode C
ReferenceLink W C
WordNode W #cat

since it's WordNodes rather than ConceptNodes that refer to words. (And note that the strength of the ReferenceLink would not be 1 in this case, because the word "cat" has multiple senses.) However, there is no harm nor formal incorrectness in the "ConceptNode cat" usage, since "cat" is just as valid a name for a ConceptNode as, say, "C."

We've already introduced above the MemberLink, which is a link joining a member to the set that contains it. Notable is that the truth value of a MemberLink is fuzzy rather than probabilistic, and that PLN is able to inter-operate fuzzy and probabilistic values.

SubsetLinks also exist, with the obvious meaning, e.g.

ConceptNode cat
ConceptNode animal
SubsetLink cat animal

Note that SubsetLink refers to a purely extensional subset relationship, and that InheritanceLink should be used for the generic "intensional + extensional" analogue of this -- more on this below. SubsetLink could more consistently (with other link types) be named ExtensionalInheritanceLink, but SubsetLink is used because it's shorter and more intuitive.

(Obsolete) Variables in Quantifiers

Variables are handled via quantifiers; the default quantifier being the AverageLink, so that the default interpretation of

ImplicationLink
  InheritanceLink $X animal
  EvaluationLink 
      PredicateNode: eat
      ListLink
          $X
          ConceptNode: food

is

AverageLink $X
   ImplicationLink
      InheritanceLink $X animal
      EvaluationLink 
         PredicateNode: eat
         ListLink
             $X
             ConceptNode: food

The AverageLink invokes an estimation of the average TruthValue of the embedded expression (in this case an ImplicationLink) over all possible values of the variable $X. If there are type restrictions regarding the variable $X, these are taken into account in conducting the averaging. ForAllLink and ExistsLink may be used in the same places as AverageLink, with uncertain truth value semantics defined in PLN theory using third-order probabilities. There is also a ScholemLink used to indicate variable dependencies for existentially quantified variables, used in cases of multiply nested existential quantifiers.
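The averaging idea can be illustrated with a plain arithmetic mean over a finite list of candidate groundings. This is only a sketch: the actual PLN estimation formulas are not reproduced here, and the function average_truth, the example truth degrees and the type restriction are all assumptions made for illustration.

```python
# Toy model only: a plain mean, not the PLN truth value formulas.
def average_truth(truth_of, candidates, type_check=None):
    """Average truth_of(x) over the candidates that pass the type check."""
    if type_check is not None:
        candidates = [x for x in candidates if type_check(x)]
    if not candidates:
        raise ValueError("no groundings to average over")
    return sum(truth_of(x) for x in candidates) / len(candidates)


# Hypothetical degrees to which "$X eats food" holds for each grounding.
eats_food = {"cat": 1.0, "dog": 1.0, "fern": 0.5, "rock": 0.0}
animals = {"cat", "dog"}

# No restriction on $X: every known grounding is averaged.
unrestricted = average_truth(eats_food.get, list(eats_food))
# $X restricted to animals: only those groundings are averaged.
restricted = average_truth(eats_food.get, list(eats_food),
                           type_check=lambda x: x in animals)
```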

EvaluationLink and MemberLink have overlapping semantics, allowing expression of the same conceptual/logical relationships in terms of predicates or sets. So, for example, the predicate specifying things $X that eat food is expressed by

EvaluationLink 
   PredicateNode: eat
   ListLink
         $X
         ConceptNode: food

It has the same semantics as the set-membership declaration of things $X that are food-eaters:

MemberLink
    ListLink
         $X
         ConceptNode: food
    ConceptNode: FoodEaters


The relation between the predicate "eat" and the concept "FoodEaters" is formally given by

ExtensionalEquivalenceLink
    ConceptNode: FoodEaters
    SatisfyingSetLink 
        PredicateNode: eat

In other words, we say that "FoodEaters" is the SatisfyingSet of the predicate "eat": it is the set of entities that satisfy the predicate "eat". Note that the truth values of MemberLink and EvaluationLink are fuzzy rather than probabilistic.
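The correspondence between a predicate and its satisfying set can be sketched with fuzzy membership degrees. The function satisfying_set and the example degrees below are hypothetical; the point is only that the degree to which the predicate holds for an entity is the same number as that entity's degree of membership in the set.

```python
# Toy model only: fuzzy degrees are plain floats in [0, 1].
def satisfying_set(predicate, universe):
    """Map each entity to its degree of membership, dropping non-members."""
    degrees = {x: predicate(x) for x in universe}
    return {x: d for x, d in degrees.items() if d > 0.0}


# Hypothetical degrees to which each entity satisfies the predicate "eat".
eat_degree = {"cat": 1.0, "fern": 0.5, "rock": 0.0}


def eats(entity):
    return eat_degree.get(entity, 0.0)


food_eaters = satisfying_set(eats, eat_degree)
```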

(Obsolete) Logical Links

There is a host of link types embodying logical relationships as defined in the PLN logic system, e.g. InheritanceLink, SubsetLink (aka ExtensionalInheritanceLink) and IntensionalInheritanceLink. There are different sorts of inheritance, e.g.

SubsetLink salmon fish
IntensionalInheritanceLink whale fish
InheritanceLink fish animal

There are SimilarityLink, ExtensionalSimilarityLink, IntensionalSimilarityLink which are symmetrical versions, e.g.

SimilarityLink shark barracuda
IntensionalSimilarityLink shark dolphin
ExtensionalSimilarityLink American obese_person

There are also higher-order versions of these links. The asymmetric versions are ImplicationLink, ExtensionalImplicationLink and IntensionalImplicationLink. The symmetric versions are EquivalenceLink, ExtensionalEquivalenceLink and IntensionalEquivalenceLink. These are used between predicates and links, e.g.

ImplicationLink
   EvaluationLink
      eat
      ListLink
         $X
         dirt
   EvaluationLink
      feel
      ListLink
         $X
         sick

or

ImplicationLink
   EvaluationLink
      eat
      ListLink
         $X
         dirt
   InheritanceLink $X sick

or

ForAllLink $X, $Y, $Z
   ExtensionalEquivalenceLink
      EquivalenceLink
          $Z
          EvaluationLink
             +
             ListLink
                 $X
                 $Y
   ExtensionalEquivalenceLink
      EquivalenceLink
          $Z
          EvaluationLink
             +
             ListLink
                 $Y
                 $X

Note, the latter is given as an extensional equivalence because it's a pure mathematical equivalence. This is not the only case of pure extensional equivalence, but it's an important one.

(Obsolete) Temporal Links

There are also temporal versions of these links, such as

  • PredictiveImplicationLink
  • PredictiveAttractionLink
  • SequentialANDLink
  • SimultaneousANDLink

which combine a logical relation between their arguments with a temporal relation between them. For instance, we might say

PredictiveImplicationLink
   PredicateNode: JumpOffCliff
   PredicateNode: Dead

or including arguments,

PredictiveImplicationLink
   EvaluationLink JumpOffCliff $X
   EvaluationLink Dead $X

The former version, without variable arguments given, shows the possibility of using higher-order logical links to join predicates without any explicit variables. By using this format exclusively, one could avoid VariableAtoms entirely, using only higher-order functions in the manner of pure functional programming formalisms like combinatory logic. However, this purely functional style has not proved convenient, so the Atomspace in practice combines functional-style representation with variable-based representation.

Temporal links often come with specific temporal quantification, e.g.

PredictiveImplicationLink <5 seconds>
   EvaluationLink JumpOffCliff $X
   EvaluationLink Dead $X

indicating that the conclusion will generally follow the premise within 5 seconds. There is a system for managing fuzzy time intervals and their interrelationships, based on a fuzzy version of Allen Interval Algebra.

SequentialANDLink is similar to PredictiveImplicationLink but its truth value is calculated differently. The truth value of

SequentialANDLink <5 seconds>
   EvaluationLink JumpOffCliff $X
   EvaluationLink Dead $X

indicates the likelihood of the sequence of events occurring in that order, with the gap lying within the specified time interval. The truth value of the PredictiveImplicationLink version indicates the likelihood of the second event, conditional on the occurrence of the first event (within the given time interval restriction).
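The difference between the two truth values is the difference between a joint frequency and a conditional one. The sketch below estimates both from a small made-up event log; it uses simple counting, not the PLN truth value formulas, and the function names and the data are hypothetical.

```python
# Toy model only: frequency counts, not the PLN truth value formulas.
def sequence_holds(t_first, t_second, max_gap):
    """True if both events happened, in order, within max_gap seconds."""
    return (t_first is not None and t_second is not None
            and 0 <= t_second - t_first <= max_gap)


def sequential_and(trials, max_gap):
    """Fraction of all trials in which the full sequence occurred."""
    hits = sum(sequence_holds(a, b, max_gap) for a, b in trials)
    return hits / len(trials)


def predictive_implication(trials, max_gap):
    """Fraction of trials containing the first event in which the second
    event followed within max_gap seconds."""
    with_first = [(a, b) for a, b in trials if a is not None]
    if not with_first:
        return 0.0
    hits = sum(sequence_holds(a, b, max_gap) for a, b in with_first)
    return hits / len(with_first)


# Each trial: (time of first event, time of second event), None if absent.
trials = [(0.0, 3.0), (1.0, 4.0), (2.0, 10.0), (None, None)]

joint = sequential_and(trials, max_gap=5.0)                # 2 of 4 trials
conditional = predictive_implication(trials, max_gap=5.0)  # 2 of 3 trials
```

The conditional estimate is higher because the trial in which the first event never happened counts against the sequence, but says nothing about what follows when it does happen.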

There are also links representing basic temporal relationships, such as BeforeLink and AfterLink. These are used to refer to specific events, e.g. if X refers to the event of Ben waking up on July 15 2012, and Y refers to the event of Ben getting out of bed on July 15 2012, then one might have

AfterLink X Y

And there are TimeNodes (representing time-stamps such as temporal moments or intervals) and AtTimeLinks, so we may e.g. say

AtTimeLink
  X
  TimeNode: 8:24AM Eastern Standard Time, July 15 2012 AD

(Obsolete) Links for Special External Data Types

Finally, there are also Atom types referring to specific types of data important to using OpenCog in specific contexts. For instance, there are Atom types referring to general natural language data types, such as

These are just three out of several dozen listed in Category:NLP Atom Types. Different projects can create their own Atom types; the AGI-Bio project defines a GeneNode and ProteinNode. The chemistry project defines a Node for each chemical element. The 3D and Robot projects define Atoms appropriate for those domains.

(Obsolete) Defining new Link Types

All of the atom types mentioned above can be thought of as custom link types created for special-purpose use. These link types are convenient, and handy, but are not really "fundamental" -- almost all of them can be (and should be) understood as "syntactic sugar" for an EvaluationLink. Thus, for example, the NLP subsystem uses a PartOfSpeechLink to tag words with parts of speech:

PartOfSpeechLink
  WordNode "automobile"
  DefinedLinguisticConceptNode "noun"

This should effectively be understood to be equivalent to

EvaluationLink
  PredicateNode "PartOfSpeech"
  ListLink
     WordNode "automobile"
     DefinedLinguisticConceptNode "noun"

Furthermore, both WordNode and DefinedLinguisticConceptNode are not "fundamental", but are in turn yet more syntactic sugar, which can be re-expressed using InheritanceLinks. So, at the fundamental level, one has:

EvaluationLink
  PredicateNode "PartOfSpeech"
  ListLink
     InheritanceLink
        ConceptNode "automobile"
        ConceptNode "Word"
     InheritanceLink
        ConceptNode "noun"
        ConceptNode "DefinedLinguisticConcept"

The difference between this last form, and the first, indicates why "syntactic sugar" is tasty.
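The three forms above are related by a mechanical expansion, which can be sketched as a small function over nested tuples. The function desugar and the tuple encoding are hypothetical; the expansion rules are the ones shown in the examples, with the ConceptNode name obtained by dropping the "Node" suffix from the type name.

```python
# Toy model only: expands the sugar shown in the examples above.
def desugar(atom):
    kind, *args = atom
    if kind == "PartOfSpeechLink":
        # A special-purpose link becomes an EvaluationLink on a predicate.
        return ("EvaluationLink",
                ("PredicateNode", "PartOfSpeech"),
                ("ListLink", *[desugar(a) for a in args]))
    if kind in ("WordNode", "DefinedLinguisticConceptNode"):
        # A special-purpose node becomes an inheritance from a concept
        # named after the node type, minus the "Node" suffix.
        (name,) = args
        return ("InheritanceLink",
                ("ConceptNode", name),
                ("ConceptNode", kind[:-len("Node")]))
    return atom


sugared = ("PartOfSpeechLink",
           ("WordNode", "automobile"),
           ("DefinedLinguisticConceptNode", "noun"))
expanded = desugar(sugared)
```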

The DefinedTypeNode can be used to give a name to anonymous types. The mechanism for converting types created with DefinedTypeNode into types accessible in C++ has not been fully specified or implemented. By contrast, the EquivalenceLink can be used to create definitions for new PredicateNodes and ConceptNodes.