When one reasons about the world, what is one actually doing? This post is about that.
OpenCog has a reasoning system, called PLN, short for “Probabilistic Logic Networks”. Its actually two things: first and foremost, its a set of “rules of inference”, which can be applied to “real world knowledge”, to deduce new “facts” about the world. There are about half a dozen of these rules, and one of them resembles the classical “Modus Ponens“, except that it assigns a probability to the outcome, based on the probabilities of the inputs. For the rest of this post, the details of PLN mostly don’t matter: if you wish, you can think of classical propositional logic, or of some kind of fuzzy logic, if you wish, or even competing systems such as NARS. Anyway, PLN applies these rules of inference to the Atoms contained in the AtomSpace, to generate new Atoms. This is a fancy way of saying that the AtomSpace is the knowledge repository in OpenCog, an that the atoms are the “facts”. Its not much more than that: its just a big jumble of facts.
I want to talk about reasoning using PLN. Now, this is NOT the way that the current opencog code base implements PLN reasoning; however, its a conceptual description of what it could (or should, or might) do.
Now, I mentioned that PLN consists of maybe a half-dozen or a dozen rules of inference. They have fancy names like “modus ponens” but we could call them just “rule MP” … or just “rule A”, “rule B”, and so on.
Suppose I start with some atomspace contents, and apply the PLN rule A. As a result of this application, we have a “possible world 1″. If, instead, we started with the same original atomspace contents as before, but applied rule B, then we would get “possible world 2″. It might also be the case that PLN rule A can be applied to some different atoms from the atomspace, in which case, we get “possible world 3″.
Each possible world consists of the triple (some subset of the atomspace, some PLN inference rule, the result of applying the PLN rule to the input). Please note that some of these possible worlds are invalid or empty: it might not be possible to apply the choosen PLN rule to the chosen subset of the atomspace. I guess we should call these “impossible worlds”. You can say that their probability is exactly zero.
Observe that the triple above is an arrow: the tail of the arrow is “some subset of the atomspace”, the head of the arrow is “the result of applying PLN rule X”, and the shaft of the arrow is given a name: its “rule X”. (In fancy-pants, peacock language, the arrows are morphisms, and the slinging together, here, are Kripke frames. But lets avoid the fancy language since its confuses things a lot. Just know that it’s there.)
Anyway — considering this process, this clearly results in a very shallow tree, with the original atomspace as the root, and each branch of the tree corresponding to possible world. Note that each possible world is a new and different atomspace: The rules of the game here are that we are NOT allowed to dump the results of the PLN inference back into the original atomspace. Instead, we MUST fork the atomspace. Thus, if we have N possible worlds, then we have N distinct atomspaces (not counting the original, starting atomspace). This is very different from what the PLN code base does today: it currently dumps its results back into the original atomspace. But, for this conceptual model, we don’t want to do that.
Now, for each possible world, we can apply the above procedure again. Naively, this is a combinatoric explosion. For the most part, each different possible world will be different than the others. They will share a lot of atoms in common, but some will be different. Note, also, that *some* of these worlds will NOT be different, but will converge, or be “confluent“, arriving at the same atomspace contents along different routes. So, although, naively, we have a highly branching tree, it should be clear that sometimes, some of the branches come back together again.
I already pointed out that some of the worlds are “impossible” i.e. have a probability of zero. These can be discarded. But wait, there’s more. Suppose that one of the possible worlds contains the statement “John Kennedy is alive” (with a very very high confidence) , while another one contains the statement “John Kennedy is dead” (with a very very high confidence). What I wish to claim is that, no matter what future PLN inferences might be made, these two worlds will never become confluent.
There is also a different effect: during inferencing (i.e. the repeated application of PLN), one might find oneself in a situation where the atoms being added to the atomspace, at each inference step, have lower and lower probability. At some point, this suggests that one should just plain quit — that particular branch is just not going anywhere. Its a dead end. A similar situation occurs when no further PLN rules can be applied. Dead end.
OK, that’s it. The above provides a very generic description of how inferencing can be performed. It doesn’t have to be PLN — it could be anything — classical logic using sequent calculus, for example. So far, everything I said is very easy-peasy, direct and straightforward. So now is where the fun starts.
First, (lets get it out of the way now) the above describes *exactly* how Link Grammar works. For “atomspace” substitute “linkage” and for “PLN rule of inference” substitute “disjunct“. That’s it. End of story. QED.
Oh, I forgot to introduce Link Grammar. It is a system for parsing natural languages, such as English. It does this by maintaining a dictionary of so-called “disjuncts”, which can be thought of “jigsaw puzzle pieces”. The act of parsing requires finding and joining together the jigsaw pieces into a coherent whole. The final result of the act of parsing is a linkage (a parse is a linkage – same thing). These jigsaw puzzle pieces are nicely illustrated in the very first paper on Link Grammar.
Notice that each distinct linkage in link-grammar is a distinct possible-world. The result of parsing is to create a list of possible worlds (linkages, aka “parses”). Now, link-grammar has a “cost system” that assigns different probabilities (different costs) to each possible world: this is “parse ranking”: some parses (linkages) are more likely than others. Note that each different parse is, in a sense, “not compatible” with every other parse. Two different parses may share common elements, but other parts will differ.
Claim: the link-grammar is a closed monoidal category, where the words are the objects, and the disjuncts are the morphisms. I don’t have the time or space to articulate this claim, so you’ll have to take it on faith, or think it through, or compare it to other papers on categorial grammar or maybe pregroup grammar. There is nice example from Bob Coecke showing the jigsaw-puzzle pieces. You can see a similar story develop in John Baez’s “Rosetta Stone” paper, although the jigsaw-pieces are less distinctly illustrated.
Theorem: the act of applying PLN, as described above, is a closed monoidal category. Proof: A “PLN rule of inference” is, abstractly, exactly the same thing as a link-grammar disjunct. The contents of the atomspace is exactly the same thing as a (partially or fully) parsed sentence. QED.
There is nothing more to this proof than that. I mean, it can fleshed it out in much greater detail, but that’s the gist of it.
Observe two very important things: (1) During the proof, I never once had to talk about modus ponens, or any of the other PLN inference rules. (2) During the proof, I never had to invoke the specific mathematical formulas that compute the PLN “TruthValues” — that compute the strength and confidence. Both of these aspects of PLN are completely and utterly irrelevant to the proof. The only thing that mattered is that PLN takes, as input, some atoms, and applies some transformation, and generates atoms. That’s it.
The above theorem is *why* I keep talking about possible worlds and kripke-blah-blah and intuitionistic logic and linear logic. Its got nothing to do with the actual PLN rules! The only thing that matters is that there are rules, that get applied in some way. The generic properties of linear logic and etc. are the generic properties of rule systems and Kripke frames. Examples of such rule systems include link-grammar, PLN, NARS, classical logic, and many more. The details of the specific rule system do NOT alter the fundamental process of rule application aka “parsing” aka “reasoning” aka “natural deduction” aka “sequent calculus”. In particular, it is a category error to confuse the details of PLN with the act of parsing: the logic that describes parsing is not PLN, and PLN dos not describe parsing: its an error to confuse the two.
What remains to be done: I believe that what I describe above, the “many-worlds hypothesis” of reasoning, can be used to create a system that is far more efficient than the current PLN backward/forward chainer. It’s not easy, though: the link-parser algorithm struggles with the combinatoric explosion, and has some deep, tricky techniques to beat it down. ECAN was invented to deal with the explosion in PLN. But there are other ways.
By the way: the act of merging the results of a PLN inference back into the original atomspace corresponds, in a very literal sense, to a “wave function collapse”. As long as you keep around multiple atomspaces, each containing partial results, you have “many worlds”, but every time you discard or merge some of these atomspaces back into one, its a “collapse”. That includes some of the truth-value merge rules that currently plague the system. To truly understand these last three sentences, you will, unfortunately, have to do a lot of studying. But I hope this blog post provides a good signpost.