When one reasons about the world, what is one actually doing? This post is about that.
OpenCog has a reasoning system, called PLN, short for “Probabilistic Logic Networks”. It’s actually two things: first and foremost, it’s a set of “rules of inference”, which can be applied to “real-world knowledge” to deduce new “facts” about the world. There are about half a dozen of these rules, and one of them resembles the classical “Modus Ponens“, except that it assigns a probability to the outcome, based on the probabilities of the inputs. For the rest of this post, the details of PLN mostly don’t matter: if you wish, you can think of classical propositional logic, or of some kind of fuzzy logic, or even of competing systems such as NARS. Anyway, PLN applies these rules of inference to the Atoms contained in the AtomSpace, to generate new Atoms. This is a fancy way of saying that the AtomSpace is the knowledge repository in OpenCog, and that the Atoms are the “facts”. It’s not much more than that: it’s just a big jumble of facts.
I want to talk about reasoning using PLN. Now, this is NOT the way that the current OpenCog code base implements PLN reasoning; rather, it’s a conceptual description of what it could (or should, or might) do.
Now, I mentioned that PLN consists of maybe a half-dozen or a dozen rules of inference. They have fancy names, like “modus ponens”, but we could call them just “rule MP” … or just “rule A”, “rule B”, and so on.
Suppose I start with some atomspace contents, and apply PLN rule A. As a result of this application, we have “possible world 1”. If, instead, we started with the same original atomspace contents as before, but applied rule B, then we would get “possible world 2”. It might also be the case that PLN rule A can be applied to some different atoms from the atomspace, in which case we get “possible world 3”.
Each possible world consists of the triple (some subset of the atomspace, some PLN inference rule, the result of applying that rule to the input). Please note that some of these possible worlds are invalid or empty: it might not be possible to apply the chosen PLN rule to the chosen subset of the atomspace. I guess we should call these “impossible worlds”. You can say that their probability is exactly zero.
Observe that the triple above is an arrow: the tail of the arrow is “some subset of the atomspace”, the head of the arrow is “the result of applying PLN rule X”, and the shaft of the arrow is given a name: it’s “rule X”. (In fancy-pants, peacock language, the arrows are morphisms, and the slinging-together of worlds by arrows forms a Kripke frame. But let’s avoid the fancy language, since it confuses things a lot. Just know that it’s there.)
Anyway — considering this process, it clearly results in a very shallow tree, with the original atomspace as the root, and each branch of the tree corresponding to a possible world. Note that each possible world is a new and different atomspace: The rules of the game here are that we are NOT allowed to dump the results of the PLN inference back into the original atomspace. Instead, we MUST fork the atomspace. Thus, if we have N possible worlds, then we have N distinct atomspaces (not counting the original, starting atomspace). This is very different from what the PLN code base does today: it currently dumps its results back into the original atomspace. But, for this conceptual model, we don’t want to do that.
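This forking process can be sketched in a few lines of Python. To be clear, this is a toy illustration, not the real AtomSpace or PLN API: atoms are plain strings, and the “rule” is an invented function that mimics a modus-ponens-like step.

```python
# Sketch: a possible world is the triple (parent atomspace, rule, result).
# Atoms are plain strings and rules are functions; both are invented
# placeholders, not the actual AtomSpace or PLN interfaces.

def rule_a(atoms):
    """Toy rule: if X and "implies:X:Y" are both present, derive Y."""
    derived = set()
    for fact in atoms:
        if fact.startswith("implies:"):
            _, x, y = fact.split(":")
            if x in atoms:
                derived.add(y)
    return derived

def fork_worlds(atomspace, rules):
    """Apply each rule to the atomspace; each application forks a new world.

    The original atomspace is never mutated -- every result is a fresh
    frozenset, so N applicable rules yield N distinct atomspaces.
    """
    worlds = []
    for name, rule in rules.items():
        derived = rule(atomspace)
        if not derived:
            continue  # an "impossible world": the rule does not apply here
        worlds.append((atomspace, name, frozenset(atomspace | derived)))
    return worlds

root = frozenset({"rain", "implies:rain:wet"})
for tail, rule_name, head in fork_worlds(root, {"rule A": rule_a}):
    # Each triple is an arrow: tail --rule--> head
    print(rule_name, sorted(head - tail))
```

Note how the fork discipline falls out of using immutable sets: `root` is untouched no matter how many rules fire, exactly as the “we MUST fork” rule above demands.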
Now, for each possible world, we can apply the above procedure again. Naively, this is a combinatoric explosion. For the most part, each possible world will be different from the others. They will share a lot of atoms in common, but some will be different. Note, also, that *some* of these worlds will NOT be different, but will converge, or be “confluent“, arriving at the same atomspace contents along different routes. So although, naively, we have a highly branching tree, it should be clear that sometimes some of the branches come back together again.
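Confluence detection can be sketched the same way: if two routes arrive at identical atomspace contents, they are the same world, so group worlds by their contents. Again, the representation (frozensets of string atoms, route labels) is invented for illustration.

```python
# Sketch: confluence detection. Worlds reached along different routes
# but with identical contents collapse into a single node that
# remembers every route leading to it.

def merge_confluent(worlds):
    """Group (route, atomspace) pairs by atomspace contents."""
    by_contents = {}
    for route, atomspace in worlds:
        by_contents.setdefault(atomspace, []).append(route)
    return by_contents

# Two different rule orders arriving at the same contents:
w1 = ("rule A then rule B", frozenset({"p", "q", "r"}))
w2 = ("rule B then rule A", frozenset({"p", "q", "r"}))
w3 = ("rule C alone",       frozenset({"p", "s"}))

merged = merge_confluent([w1, w2, w3])
print(len(merged))  # 2 distinct worlds survive out of 3 branches
```

Because frozensets are hashable, “same contents along different routes” is literally just dictionary-key equality here.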
I already pointed out that some of the worlds are “impossible” i.e. have a probability of zero. These can be discarded. But wait, there’s more. Suppose that one of the possible worlds contains the statement “John Kennedy is alive” (with a very, very high confidence), while another one contains the statement “John Kennedy is dead” (with a very, very high confidence). What I wish to claim is that, no matter what future PLN inferences might be made, these two worlds will never become confluent.
There is also a different effect: during inferencing (i.e. the repeated application of PLN), one might find oneself in a situation where the atoms being added to the atomspace, at each inference step, have lower and lower probability. At some point, this suggests that one should just plain quit — that particular branch is just not going anywhere. It’s a dead end. A similar situation occurs when no further PLN rules can be applied. Dead end.
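Both kinds of dead end fit naturally into a pruned tree search. Here is a minimal sketch, assuming (purely for illustration) that each inference step carries a probability and that branch probabilities multiply; the cutoff value and the toy rule generator are made up.

```python
# Sketch: pruning dead-end branches. A branch whose running probability
# drops below a cutoff, or to which no rule applies, is abandoned.

def explore(atomspace, prob, apply_rules, cutoff=0.05):
    """Depth-first expansion that prunes low-probability branches.

    `apply_rules(atomspace)` yields (new_atomspace, step_prob) pairs.
    Returns the surviving leaf worlds with their branch probabilities.
    """
    if prob < cutoff:
        return []                    # probability decayed too far: dead end
    children = list(apply_rules(atomspace))
    if not children:
        return [(atomspace, prob)]   # no rule applies: branch terminates here
    leaves = []
    for child, step_prob in children:
        leaves.extend(explore(child, prob * step_prob, apply_rules, cutoff))
    return leaves

def toy_rules(atomspace):
    """Toy expansion: grow the set to 3 atoms, each step at probability 0.5."""
    if len(atomspace) >= 3:
        return []
    return [(frozenset(atomspace | {f"atom{len(atomspace)}"}), 0.5)]

# With a permissive cutoff the branch survives; with a strict one it is pruned.
print(explore(frozenset({"seed"}), 1.0, toy_rules))
print(explore(frozenset({"seed"}), 1.0, toy_rules, cutoff=0.3))  # []
```

The second call abandons the branch as soon as the running product falls to 0.25, before the terminal world is ever built.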
OK, that’s it. The above provides a very generic description of how inferencing can be performed. It doesn’t have to be PLN — it could be anything — classical logic using sequent calculus, for example. So far, everything I said is very easy-peasy, direct and straightforward. So now is where the fun starts.
First (let’s get it out of the way now): the above describes *exactly* how Link Grammar works. For “atomspace”, substitute “linkage”, and for “PLN rule of inference”, substitute “disjunct“. That’s it. End of story. QED.
Oh, I forgot to introduce Link Grammar. It is a system for parsing natural languages, such as English. It does this by maintaining a dictionary of so-called “disjuncts”, which can be thought of as “jigsaw puzzle pieces”. The act of parsing requires finding and joining together the jigsaw pieces into a coherent whole. The final result of the act of parsing is a linkage (a parse is a linkage – same thing). These jigsaw puzzle pieces are nicely illustrated in the very first paper on Link Grammar.
Notice that each distinct linkage in link-grammar is a distinct possible-world. The result of parsing is to create a list of possible worlds (linkages, aka “parses”). Now, link-grammar has a “cost system” that assigns different probabilities (different costs) to each possible world: this is “parse ranking”: some parses (linkages) are more likely than others. Note that each different parse is, in a sense, “not compatible” with every other parse. Two different parses may share common elements, but other parts will differ.
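One way to picture the cost system is to treat each linkage’s total cost as an unnormalized negative log-probability, so that lower cost means higher rank. The sketch below is an assumption about how such a ranking *could* be computed, not a description of link-grammar’s actual internals; the linkage names and costs are invented.

```python
import math

# Sketch: parse ranking. Treat additive costs as negative
# log-probabilities and normalize, so lower cost => more likely.

def rank_parses(costs):
    """Map {linkage_name: cost} to normalized probabilities (softmax of -cost)."""
    weights = {name: math.exp(-c) for name, c in costs.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

ranked = rank_parses({"parse-1": 0.0, "parse-2": 1.0, "parse-3": 3.0})
best = max(ranked, key=ranked.get)
print(best)  # parse-1, the lowest-cost linkage
```

The nice property of this scheme is that costs add along a linkage while probabilities multiply, which is exactly the behavior wanted for ranking whole possible worlds built out of many pieces.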
Claim: the link-grammar is a closed monoidal category, where the words are the objects, and the disjuncts are the morphisms. I don’t have the time or space to articulate this claim, so you’ll have to take it on faith, or think it through, or compare it to other papers on categorial grammar or maybe pregroup grammar. There is a nice example from Bob Coecke showing the jigsaw-puzzle pieces. You can see a similar story develop in John Baez’s “Rosetta Stone” paper, although the jigsaw pieces are less distinctly illustrated.
Theorem: the act of applying PLN, as described above, is a closed monoidal category. Proof: A “PLN rule of inference” is, abstractly, exactly the same thing as a link-grammar disjunct. The contents of the atomspace is exactly the same thing as a (partially or fully) parsed sentence. QED.
There is nothing more to this proof than that. I mean, it could be fleshed out in much greater detail, but that’s the gist of it.
Observe two very important things: (1) During the proof, I never once had to talk about modus ponens, or any of the other PLN inference rules. (2) During the proof, I never had to invoke the specific mathematical formulas that compute the PLN “TruthValues” — that compute the strength and confidence. Both of these aspects of PLN are completely and utterly irrelevant to the proof. The only thing that mattered is that PLN takes, as input, some atoms, and applies some transformation, and generates atoms. That’s it.
The above theorem is *why* I keep talking about possible worlds and kripke-blah-blah and intuitionistic logic and linear logic. It’s got nothing to do with the actual PLN rules! The only thing that matters is that there are rules, that get applied in some way. The generic properties of linear logic and etc. are the generic properties of rule systems and Kripke frames. Examples of such rule systems include link-grammar, PLN, NARS, classical logic, and many more. The details of the specific rule system do NOT alter the fundamental process of rule application aka “parsing” aka “reasoning” aka “natural deduction” aka “sequent calculus”. In particular, it is a category error to confuse the details of PLN with the act of parsing: the logic that describes parsing is not PLN, and PLN does not describe parsing: it’s an error to confuse the two.
What remains to be done: I believe that what I describe above, the “many-worlds hypothesis” of reasoning, can be used to create a system that is far more efficient than the current PLN backward/forward chainer. It’s not easy, though: the link-parser algorithm struggles with the combinatoric explosion, and has some deep, tricky techniques to beat it down. ECAN was invented to deal with the explosion in PLN. But there are other ways.
By the way: the act of merging the results of a PLN inference back into the original atomspace corresponds, in a very literal sense, to a “wave function collapse”. As long as you keep around multiple atomspaces, each containing partial results, you have “many worlds”, but every time you discard or merge some of these atomspaces back into one, it’s a “collapse”. That includes some of the truth-value merge rules that currently plague the system. To truly understand these last three sentences, you will, unfortunately, have to do a lot of studying. But I hope this blog post provides a good signpost.
“Note that each possible world is a new and different atomspace: The rules of the game here are that we are NOT allowed to dump the results of the PLN inference back into the original atomspace. Instead, we MUST fork the atomspace. Thus, if we have N possible worlds, then we have N distinct atomspaces (not counting the original, starting atomspace). This is very different from what the PLN code base does today: it currently dumps its results back into the original atomspace. But, for this conceptual model, we don’t want to do that.”
Why don’t we want that? Could you explain this further, maybe giving some examples? If there are 20 issues I am not certain about (like: is the given politician honest or corrupt) and for each of them I have two hypotheses, then I would need 2^20 different atomspaces to hold this. Is that right? Or does this forking of atomspaces concern only the reasoning process and is temporary?
I believe we don’t want to do that because it is a destructive transaction.
Say you dream up an alternate reality where all color was purple. Now think of the many different conclusions you would have come to in this monocolor world. Now suppose you merge these fantasy conclusions back into your own, real-world knowledge. Since the (rather silly) conclusions have overwritten your previous ones (where they overlap), the sensible data is destroyed, and you’d have trouble living normally…
Rather, we want to keep both, and if some atoms are useless, we can decide that later – not at the time of generation.
Bingo! Yes, exactly! A very good illustration. Even the use of probabilities does not solve this problem — e.g. by averaging together, instead of over-writing. If you just took the average of the real-world and the purple-world, that average would still be pretty nutty in many cases. Even if you were more clever, and used a “distributional TV” everywhere (kept the entire probability distribution) you would still have this purplishness in there, along for the ride, jamming up real-world inference.
Isn’t this handled by Context Links, so that your conclusions are only true in the context of the Purple World? But since I don’t understand the theory right now, I’m not sure how Context Links relate to this. Maybe they are a kind of implementation of this, or just do something similar.
It is important, at this stage, to separate theory and implementation. Just because, in theory, there are 2^20 atomspaces, does not mean that the actual code should be implemented as stupidly as possible. In fact, there are some highly efficient implementation tricks for this, with very minimal overhead. But none of that matters, if the theory itself remains opaque, cloudy and mis-understood.
Right now, based on the kinds of regular discussions that I have, the theory remains widely mis-understood by just about everyone on the mailing list — which is a shame, because there are books, and, I’m guessing, even textbooks, on the topic. This kind of stuff is widely applied in the semiconductor industry. We can apply it here, too.
To answer your question: “Why don’t we want that?” There are two answers: (1) it changes the contents of the atomspace, and thus changes all subsequent steps. (2) (meta) it prevents the theory from being developed. If you do that, then you can’t have a theory of not-doing-that.
Why not store the forked worlds differentially, that is to say, don’t actually keep an entire new world around but only the newly generated atoms. Then why not find the duplicate atoms among the forks and store only one copy of each atom, marking it with the PLN rules that made it (all the fork worlds it is a member of). Better yet, we can then take these reduced worlds, free from redundancies, and put the atoms back in the main world – without any wave function collapse – since the atoms are marked distinctly. A simple filter of these markings can make this main world then appear as any forked world we wish to browse, and the non-merger can be programmatically invisible, especially in functional languages like Haskell. Speaking of Haskell, said atoms could be stored merely as functional applications, lazily un-evaluated until the atom is needed or interesting. This would reduce the reasoning workload and allow JIT atom generation…
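The differential scheme sketched in this comment might look something like the following (the class and world names are invented for illustration; nothing here is actual OpenCog code):

```python
# Sketch of differential world storage: each atom is stored exactly
# once, tagged with the set of worlds it belongs to; a world is
# "viewed" by filtering the shared store on those tags.

class AtomStore:
    def __init__(self):
        self.membership = {}          # atom -> set of world ids containing it

    def add(self, atom, world):
        """Record that `atom` exists in `world`, storing the atom only once."""
        self.membership.setdefault(atom, set()).add(world)

    def view(self, world):
        """Reconstruct one forked world by filtering the shared store."""
        return frozenset(a for a, ws in self.membership.items() if world in ws)

store = AtomStore()
for atom in ("p", "q"):              # atoms shared by both forks
    store.add(atom, "world-1")
    store.add(atom, "world-2")
store.add("r", "world-1")            # atom generated only in world-1

print(sorted(store.view("world-1")))  # ['p', 'q', 'r']
print(sorted(store.view("world-2")))  # ['p', 'q']
```

Shared atoms cost nothing extra per world, and no merge (no “collapse”) ever happens: the single store just presents a different filtered view per world.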
My initial goals here are to lay down the theory, and understand how things work. Once that is out of the way, then efficient implementations become clear.
The analogy of, or parallel with, an arrow made me think of a Richard Feynman lecture I watched on YouTube, in which he broke down Quantum Mechanics using arrows. I think arrows are a great tool of theoretical intuition, and they are also translatable into effective visual symbology. This was also a very digestible read from a lay point of view. Thank you for articulating something clearly packed with complexity in a relatively simple way.