Graphs and graph databases are now widely accepted as a very good (the best?) way of capturing the relationships between things. Yet many of the “things” represented graphically are actually processes. An example might be the water cycle in nature: rain collects into rivers and lakes, flows into oceans, and evaporates again. Rainwater erodes rocks, is absorbed into soils, is taken up by tree roots. This can be represented as a network graph, with nodes and links representing each of the relationships. To be useful, though, the network must be annotated with time-varying data: how much rain fell, how much was absorbed into the ground, and how much was run-off. These are the “values” that inhabit the network graph.
The OpenCog AtomSpace has long supported value annotation; it now has improved support for dynamically changing values. The goal of this blog post is to describe this a bit. But first, it’s important to draw a sharper distinction between graphs and values. Graphs are searchable — queryable — as typically, one wishes to find all graph patterns of a given shape, to exclude certain sub-patterns from a search, and so on. Performing such searches has a design impact on how data is represented in RAM: indexes must contain lists of nearest neighbors, so that the graph can be rapidly walked. Parallel searches suggest that lock-free operations, via immutable structures, are best. This in turn implies that graph nodes and links are fairly heavy-weight, and that updates – adding and deleting graph nodes and links – require CPU-consuming updates to various indexes. This is not particularly different from any other database: there are always indexes provided to speed up searches of a given kind, and updating those indexes takes time. Values are different: the key idea behind Values is that they do not need to be indexed. If a rain gauge changes from 0.2 to 0.3, there is no imminent need to remove the old value from an index (a sorted, addressable, ordered data structure: a tree or hash table) and then place the new value into that index. If the time-changing value is sound (audio), or some video pixels, then indexing this data is not just absurd, but impossible in practice. Thus, as a database design, one is driven to think of two distinct classes of data: the graph data, and the ephemeral, changing values.
This is the distinction made in the OpenCog AtomSpace. Nodes and Links – the AtomSpace Atoms – are part of the graph. Each Atom has its own “little” key-value database attached to it: this holds the Values. We use a capital-V Value to distinguish the actual AtomSpace object from the general notion of values. Originally, Values were TruthValues – a binary 0 or 1, T or F, providing a crisp predicate-logic valuation for each graph node or link. The obvious modern extension is to a Bayesian network – where each TruthValue becomes a probability. For uncertain inference, it is convenient to maintain a second number: the certainty. But why stop there? Values can be anything: vectors of numbers, strings of text, records of data, trees and DAGs. Anything that might need to be attached to a graph node or link, anything that seems transient, fleeting, and does not need indexing.
Fleeting implies flowing, or changing: the metaphor is that of pipes, and the fluids that flow in them. The graph is the description of how the pipes are connected to one another. Like actual plumbing, it can be a bit laborious to assemble this. The Values are like the fluid: flowing effortlessly, now that the graph has been assembled.
But what is this like, in practice? Suppose one has a relationship: if P and Q then R, and one wants to compute some numeric value: say, “if rain and dry soil then fraction absorbed into soil”. This is represented as a formula. Let’s dive right in. An arithmetic formula, expressed in Atomese, looks like this:
(Lambda
   (VariableList (Variable "$X") (Variable "$Y"))
   (Divide
      (Times (Variable "$X") (Plus (Variable "$Y") (Number 1)))
      (Number 2)))
This is painfully verbose, but you get the idea. This formula is actually a graph: that is why it is so verbose. The (Number 1) is actually a graph node, and the (Plus …) is a link encompassing the Atoms underneath it. This is stored as a graph for several reasons. One is that one should not assume that human beings (i.e. software programmers) are writing these formulas: they may be coming out of some machine-learning algorithm. As such, it must be possible to perform abstract symbolic manipulations on the formulas (as you might do in your favorite symbolic math system, e.g. Mathematica, Maxima, Sage, …). It must also be possible to evaluate the formulas: we’ll be plugging actual numbers into those variables, and so the ability to compile the formula down to JIT code is desirable.
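For example — a sketch, assuming a guile shell with the AtomSpace modules loaded — the formula above can be evaluated by wrapping it in an ExecutionOutput link with a list of arguments:

```scheme
(use-modules (opencog) (opencog exec))

; Bind $X=3 and $Y=1, then evaluate: (3 * (1 + 1)) / 2.
(cog-execute!
   (ExecutionOutput
      (Lambda
         (VariableList (Variable "$X") (Variable "$Y"))
         (Divide
            (Times (Variable "$X") (Plus (Variable "$Y") (Number 1)))
            (Number 2)))
      (List (Number 3) (Number 1))))
```

The result comes back as an Atom (or Value) holding the computed number; the point is that the formula is ordinary graph data right up until the moment it is executed.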
Values are kept in a key-value database, one per Atom (recall: an Atom is just a graph Node, or a Link). To access a specific value, one may write:
(ValueOf (Concept "rain") (Concept "Austin"))
The idea here is that the graph node “rain” has an associated table; one of the table entries is for “Austin” (again, a graph node!), and the value there might be a FloatValue — a list (vector) of floating-point numbers. The above is a graphical representation of how to access that Value. The inverse of this operation is the setting of Values. Putting this all together, one might have an expression like this:
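Concretely — a sketch, assuming the stock guile API — one might store a rain-gauge reading and fetch it back, either directly from scheme or by executing the graph representation:

```scheme
(use-modules (opencog) (opencog exec))

; Attach a FloatValue to the "rain" node, under the "Austin" key.
(cog-set-value! (Concept "rain") (Concept "Austin") (FloatValue 0.3))

; Direct lookup from scheme ...
(cog-value (Concept "rain") (Concept "Austin"))

; ... or via the graph representation, executed as Atomese.
(cog-execute! (ValueOf (Concept "rain") (Concept "Austin")))
```

Both lookups return the same FloatValue; the ValueOf form is the one that can be embedded inside larger Atomese expressions.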
(SetValue (Concept "runoff") (Concept "Austin")
   (Lambda ...)
   (ValueOf (Concept "rain") (Concept "Austin")))
The (Lambda …) here is an un-named, anonymous formula. Like all good systems, formulas can be named, and so (DefineLink (DefinedSchema "runoff calculation") (Lambda …)) is how one names such things; the name can be used wherever the formula is expected.
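A minimal sketch of naming and then invoking a formula — the schema name and the Lambda body here are illustrative assumptions, not part of any standard library:

```scheme
(use-modules (opencog) (opencog exec))

; Give the formula a name. The body (half the input) is made up
; for illustration.
(DefineLink
   (DefinedSchema "runoff fraction")
   (Lambda (Variable "$R") (Times (Variable "$R") (Number 0.5))))

; The name can now stand in wherever a formula is expected.
(cog-execute!
   (ExecutionOutput
      (DefinedSchema "runoff fraction")
      (List (Number 0.3))))
```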
That’s it! The above demonstrates how to wire up a formula, connecting it to Values held in the AtomSpace. There are two kinds of updates possible: manual, and automatic. A “manual” update requires calling cog-execute! on the particular expression to run. That expression will be evaluated, drawing on whatever Values are currently stored, and updating whatever Values are specified as output. The dynamic update is more interesting. Here, the formula is installed in such a way that any time one examines the Value, the formula is triggered, thus running and updating the Values just before they are presented. The dynamic Values are really what provides the title to this blog entry: they capture the idea of Values flowing about, through the network graph.
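In guile, a “manual” update of the runoff example might look like the following sketch. The Lambda body and the 0.3 reading are illustrative assumptions; the SetValue wiring is as described above:

```scheme
(use-modules (opencog) (opencog exec))

; Some current rainfall data, as before.
(cog-set-value! (Concept "rain") (Concept "Austin") (FloatValue 0.3))

; Manually trigger the update: read the rain Value, run the formula
; over it, and store the result on the "runoff" node.
(cog-execute!
   (SetValue (Concept "runoff") (Concept "Austin")
      (Lambda (Variable "$R") (Times (Variable "$R") (Number 0.5)))
      (ValueOf (Concept "rain") (Concept "Austin"))))

; The freshly-computed Value is now in place.
(cog-value (Concept "runoff") (Concept "Austin"))
```

Each call to cog-execute! recomputes from whatever Values happen to be stored at that moment; nothing updates until the next call.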
… But … But, dynamic Values remain a new, experimental feature of the system. There are good reasons for this. First, it is all too easy to create an infinite loop: rain runs off, evaporates, goes into clouds and comes back down as rain again. Polling the current runoff can potentially trigger an unending weather-system simulation. Worse, there is the risk of combinatorial explosion: the water in the clouds comes not only from Austin, but also from other geographical locales, which in turn come from elsewhere still. Controlling combinatorial explosion in AI is an infamously hard problem. Providing practical tools to do so, in this context, remains an open problem. (Yes, we’re working on it.)
The above used rain as a stand-in for a generic flow process. In practical machine learning, one might instead want to use something like TensorFlow, or some other neural-net package. Indeed, such packages already include small, specific graph-specification languages, describing exactly how the neural net is to be wired up, where its inputs are, where its outputs go. But it’s a special case. Weather simulation systems running on supercomputers also have ways of specifying what is connected to what. Then there is the Systems Biology Markup Language (SBML), which provides a way of specifying a differential equation that describes the kidney clearance rate of some blood component – or of the mitochondrial metabolism rate. The goal (or rather, the dream, so far) of the OpenCog AtomSpace value-flows subsystem is to provide a common home to all of these kinds of systemic simulation processes.
Interesting. It’s clear such Value update mechanisms would be useful for automatic cognitive processing. For instance, at first, updating a certain Value may require an expensive search, perhaps using reasoning or such; then, over time, some meta-cognitive process may realize that the expensive search can confidently be replaced by a simpler, deterministic function, and install that as the Value update function.
Good day. I’d share a bowl of soup, but the virtual isn’t nourishing the group.
…But not graphs, or Atoms? It simultaneously says too much and too little. No clarity.
I already don’t like this approach. But maybe there’s merit here?
Okay, I’ll need those tutorials first.
If what I’m reading is correct, the internal representation is a graph, except there are separate object types for nodes and edges, and it’s not clear whether edges are ordered in some way or not.
(_obj make pottery)(specified with either the host language’s variables or a simple backpatching routine to represent non-trees) must have been too inconvenient and too non-ambiguous to use as a graph IR, I guess, but I digress.
And there is one global dictionary that’s forced onto users (I saw no mention of separate Atom namespaces).
And, graph nodes are re-executed on re-entry, which forces State-fiddling — except, the explanations’ language implies that values flow, and not through a separate State concept but directly, which in my mind sounds like “there’s no combinatorial explosion of computation, and an accidentally-misconnected reference won’t cause an infinite loop, because one node is computed only once” (like in my own programming language, but I’m biased towards that, and can only understand things through that)… Maybe there’s caching somewhere in some circumstances, but I can’t see where, and I dislike the ambiguity.
(Also, “tail-recursive” in some advanced tutorials is misused, because there’s no recursion there, just iteration. Maybe it’s implemented as a tail-recursive loop when compiled, but that’s an implementation detail.)
And why does a supposed AGI project have so many spelling errors?
And to top it all off, the basic tutorials pull in language from logic and category theory, and throw in database concepts and confidence intervals, and use this clunky graph representation. But the advanced tutorials are very straightforward and to-the-point. This is indeed shockingly different from what I’m used to.
Okay, now I’m equipped to understand this blog post.
Is that a dictionary/map/hash-table to store values, on each node? If the runtime system flattens these maps to arrays and JIT-compiles their shape accesses away like V8 in JS does, then that’s fine, though it’s still an extra pointer for every single node.
Yeah, saw that coming. Consequence of the chosen fundamental semantics. Sucks.
State already does that?? (In a way that I personally disagree with, but still.)
No, maybe, that one’s for in-expression value flow, and this one is for meta-expression value flow. In other words, a function that is given an expression (and another expression as key, for good measure) and gives its “value”.
And, of course, spending two lines on caching the value to prevent infinite loops and combinatorial explosions of re-calculation was not deemed a good idea.
OSes seem to have solved it just fine, with processes that can be stopped.
But less jokingly, it can’t be controlled with any more granularity if all we know is “do X, or Y if Z, then W”, and computations should explicitly include choices for ‘stopping’ or ‘continuing’. And where there’s a choice, you can minimize some number that’s computed from locally-available information, like a prediction of global run time. (Sounds like a good scheme for practical learnable arbitrary rewrites if information is routed perfectly, which is AGI.)
Or if you don’t want learning of that calibre, the “recompute if the last computation was at least N milliseconds ago, else return from cache” functionality can allow users to patch infinite loops manually.
…No, I’m good for now. Hope you’re good too.