Graphs and graphical databases are now accepted as a very good (the best?) way of capturing the relationship between things. Yet, many of the “things” represented graphically are actually processes. An example might be the water cycle in nature: rain collects into rivers and lakes, flows into oceans, and evaporates again. Rainwater erodes rocks, is absorbed into soils, is taken up by tree roots. This can be represented as a network graph, with nodes and links representing each of the relationships. To be useful, though, the network must be annotated with time-varying data: how much rain fell, how much was absorbed into the ground, and how much was run-off. These are the “values” that inhabit the network graph.
The OpenCog AtomSpace has long supported value annotation; it now has improved support for dynamically changing values. The goal of this blog post is to describe this a bit. But first, its important to draw a sharper distinction between graphs and values. Graphs are searchable — queryable — as typically, one wishes to find all graph patterns of a given shape, to exclude certain sub-patterns from a search, and so on. Performing such searches has a design impact on how data is represented in RAM: indexes must contain lists of nearest-neighbors, so that the graph can be rapidly walked. Parallel searches suggests that lock-free operations, via immutable structures, are best. This in turn implies that graph nodes and links are fairly heavy-weight, and that updates – adding and deleting graph nodes and links – require CPU-consuming updates to various indexes. This is not particularly different than any other database: there are always indexes provided to speed up searches of a given kind, and updating those indexes takes time. Values are different: the key idea behind Values is that they do not need to be indexed. If a rain gauge changes from 0.2 to 0.3, there is no imminent need to remove the old value from an index (a sorted, addressable ordered data structure: tree or hash table) and then place the new value into that index. If the time-changing value is sound (audio), or some video pixels, then indexing this data is not just absurd, but impossible in practice. Thus, as a database design, one is driven to think of two distinct classes of data: the graph data, and the ephemeral, changing values.
This is the distinction made in the OpenCog AtomSpace. Nodes and Links – the AtomSpace Atoms – are part of the graph. Each Atom has it’s own “little” key-value database attached to it: this holds the Values. We use a capital-V Value to distinguish the actual AtomSpace object from the general notion of values. Originally, Values were TruthValues – a binary 0 or 1, T or F, providing a crisp predicate logic valuation to each graph node or link. The obvious modern extension is to a Bayesian network – where each TruthValue becomes a probability. For uncertain inference, it is convenient to maintain a second number: the certainty. But why stop there? Values can be anything: vectors of numbers, strings of text, records of data, trees and DAGs. Anything that might need to be attached to a graph node or link, anything that seems transient, fleeting, and does not need indexing.
Fleeting implies flowing, or changing: the metaphor is that of pipes, and fluids that flow in them. The graph is the description of how the pipes are connected to one-another. Like actual plumbing, it can be a bit laborious to assemble this. The Values are like the fluid: flowing effortlessly, now that the graph has been assembled.
But what is this like, in practice? Suppose one has a relationship: if P and Q then R, and one wants to compute some numeric value: say, “if rain and dry soil then fraction absorbed into soil”. This is represented as a formula. Let’s dive right in. An arithmetic formula, expressed in Atomese, looks like this:
(Lambda (VariableList (Variable "$X") (Variable "$Y")) (Divide (Times (Variable "$X") (Plus (Variable "$Y") (Number 1))) (Number 2)))
This is painfully verbose, but you get the idea. This formula is actually a graph: this is why it is so verbose. The (Number 1) is actually a graph node, and the (Plus …) is a link encompassing the Atoms underneath it. This is stored as a graph, for several reasons. One is that one should not assume that human beings (i.e. software programmers) are writing these formulas: they may be coming out of some machine-learning algorithm. As such, it must be possible to perform abstract symbolic manipulations on the formulas (as you might do in your favorite symbolic math system, e.g. Mathematica, Maxima, Sage,…). It must also be possible to evaluate the formulas: we’ll be plugging in actual numbers into those variables, and so the ability to compile the formula down to JIT code is desirable.
Values are kept in a key-value database, one per Atom (recall: an Atom is just a graph Node, or a Link). To access a specific value, one may write:
(ValueOf (Concept "rain") (Concept "Austin"))
The idea here being that the graph node “rain” has an associated table; one of the table entries is for “Austin” (again, a graph node!), and the value there might be a FloatValue — a list (vector) of floating-point numbers. The above is a graphical representation of how to access that Value. The inverse of this operation is the setting of Values. Putting this all together, one might have an expression like this:
(SetValue (Concept "runoff") (Concept "Austin") (Lambda ...) (ValueOf (Concept "rain") (Concept "Austin")))
The (Lambda …) here is an un-named, anonymous formula. Like all good systems, formulas can be named, and so (DefineLink (DefinedSchema “runoff calculation”) (Lambda …)) is how one names such things; the name can be used wherever the formulas is expected.
That’s it! The above demonstrates how to wire up a formula, connecting it to Values held in the AtomSpace. There are two kinds of updates possible: manual, and automatic. A “manual” update requires calling cog-execute! on the particular expression to run. That expression will be evaluated, drawing on whatever Values are currently stored, and updating whatever Values are specified as output. The dynamic update is more interesting. Here, the formula is installed in such a way that any time one examines the Value, the formula is triggered, thus running and updating the Values just before they are presented. The dynamic Values are really what provides the title to this blog entry: they capture the idea of Values flowing about, through the network graph.
… But …. But, dynamic Values remain a new, experimental feature of the system. There are good reasons for this. First, it is all to easy to create an infinite loop: rain runs off, evaporates, goes into clouds and comes back down as rain again. Polling the current runoff can potentially trigger an unending weather-system simulation. Worse, there is the risk of combinatoric explosion: the water in the clouds comes not only from Austin, but also from other geographical locales, which in turn come from elsewhere still. Controlling combinatorial explosion in AI is an infamously hard problem. Providing practical tools to do so, in this context, remains an open problem. (Yes, we’re working on it.)
The above used rain as a stand-in for a generic flow process. In practical machine learning, one might instead want to use something like TensorFlow, or some other neural-net package. Indeed, such packages already include small, specific graph-specification languages, describing exactly how the neural net is to be wired up, where its inputs are, where it’s outputs go. But its a special-case. Weather simulation systems running on supercomputers also have ways of specifying what is connected to what. Then there is the Systems Biology Markup Language (SBML), which provides a way of specifying a differential equation that describes the kidney clearance rate of some blood component – or of the mitochondrial metabolism rate. The goal (or rather, the dream, so far) of the OpenCog AtomSpace value-flows subsystem is to provide a common home to all of these kinds of systemic simulation processes.