OpenCog uses hypergraphs to represent knowledge. Why? I don’t think this is clearly, succinctly explained anywhere, so I will try to do so here. This is a very important point: I can’t begin to tell you how many times I went searching for some whiz-bang logic programming system, or inference engine, or theorem-prover, or some graph re-writing engine, or some probabilistic programming system, only to throw up my hands and realize that, after many wasted hours, none of them do what I want. If you’re interested in AGI, then let me assure you: they don’t do what you want, either. So, what do I want them to do, and why?
Well, let’s begin with something easy: graph re-writing systems. These days, almost everyone agrees that a great way to represent knowledge is with graphs. The structure IsA(Cat, Animal) looks like a graph with two vertexes, Cat and Animal, and a labelled edge, IsA, between them. If I also know that IsA(Binky, Cat), then, in principle, I should be able to deduce that IsA(Binky, Animal). This is a simple transitive relationship, and the act of logical deduction, for this example, is a simple graph re-write rule: if you see two IsA edges in a row, you should draw a third IsA edge between the first and the last vertex. Easy, right?
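As a concrete illustration, here is a from-scratch Python toy (not any of the systems discussed here) that applies exactly that rewrite, over and over, until no new IsA edge can be drawn:

```python
def apply_isa_transitivity(edges):
    """Whenever IsA(a, b) and IsA(b, c) are present, add IsA(a, c).

    Repeats until the edge set stops growing (the transitive closure)."""
    edges = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(edges):
            for (c, d) in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))  # draw the third IsA edge
                    changed = True
    return edges

facts = {("Binky", "Cat"), ("Cat", "Animal")}
closed = apply_isa_transitivity(facts)
# closed now also contains ("Binky", "Animal")
```

The point is not the code itself, but that the deduction step is nothing more than a pattern-match-and-add rule over the graph.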
So perhaps you’d think that all logic induction and reasoning engines have graph rewrite systems at their core, right? So you’d think. In fact, almost none of them do. And those that do, do it in some internal, ad hoc, non-public, undocumented way: there’s no API, it’s not exposed externally; it’s not an ‘official’ part of the system for you to use or tinker with.
OK, so why do I need a graph re-write system? Well, I’ve been working on a natural language parser, a so-called Viterbi decoder for Link Grammar. My initial graph is a string of words: a sentence. The vertexes are words, and the edges are arrows called “next word”. Real simple. To parse this sentence, I want to apply a certain set of simple graph-rewrite rules: for example, if word X is a noun, then create an arrow, called ‘part-of-speech’ (POS), from word X to the special vertex ‘noun’. If the word immediately before word X is an adjective (i.e. if it has a POS arrow pointing to ‘adjective’), then create a new arrow, called ‘noun modifier’, pointing from X to this word before it. This kind of graph markup is called ‘dependency parsing‘, and is a very popular way of doing natural language parsing. So you’d think that all dependency parsers have a graph re-write system at their core, right? Hardly. In fact, just about none of them do. And if they do, they’re walled off, hacked up, undocumented … you get the idea.
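The two rules just described can be sketched in a few lines of Python; the tiny lexicon and the arrow names here are invented for illustration, and this is nothing like Link Grammar’s actual implementation:

```python
# A hypothetical three-word lexicon, just to make the rules fire.
LEXICON = {"the": "det", "black": "adjective", "cat": "noun"}

def parse(sentence):
    """Apply two toy graph-rewrite rules to a chain of words."""
    words = sentence.split()
    arrows = []
    # Rule 1: if word X has a known part of speech, draw a POS arrow
    # from X to the special vertex for that part of speech.
    for i, w in enumerate(words):
        pos = LEXICON.get(w)
        if pos:
            arrows.append((i, "part-of-speech", pos))
    # Rule 2: if the word immediately before a noun is an adjective,
    # draw a 'noun-modifier' arrow from the noun to that word.
    for i, w in enumerate(words):
        if (LEXICON.get(w) == "noun" and i > 0
                and LEXICON.get(words[i - 1]) == "adjective"):
            arrows.append((i, "noun-modifier", i - 1))
    return arrows

arrows = parse("the black cat")
```

Each rule just looks for a local pattern in the graph and adds an arrow, which is the essence of dependency-style markup.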
The only dependency parser that I know of that has an explicit graph-rewriting system in it, that is open for tinkering, and is documented (!) is RelEx. And that’s great. Wow! Although RelEx invented and used its own, custom, graph rewrite system, I suppose that, in principle, it could have used some other, pre-existing system to do this (Well, it couldn’t, because in 2005, there weren’t any robust, open-source graph rewriting systems. Whatever).
What else do I want to do? Well, I daydream about using a machine-learning system to learn new rules! I mean, this is the goal of AGI, right? Have a machine that can learn new things? Well, to learn new rules, let’s see, I need to have some simple syntax for representing rules. Basically a simple graph language. So you’d think that all graph re-writing systems have some simple, easy-to-use graph language, right? No. Emphatically, hell no. With maybe one exception, you have to program in Java or C++ or C#.net. Unfortunately, my machine learning system doesn’t yet know how to program in those languages.
Here’s the leap of faith, the leap of logic that I’ve come to: It would be convenient if I could express graph re-write rules as graphs themselves. It would be convenient if I could express logical implications as graphs themselves. It would be convenient if my graphical programming language itself could be written as a graph. Well, it can be. In fact, it is easiest to do this if the graph is actually a hypergraph. I’ll explain why in the next section below. If I had a hypergraph re-writing system, then I would have a place where I could unify natural language processing, logical reasoning and machine learning, all in one place. So you’d think that anyone who was trying to build an AGI system would be writing its foundations on a hypergraph rewriting system, right? No, you’d be wrong. Apparently, OpenCog is the only system that does this. Now, the OpenCog implementation has many design warts and architectural flaws. It’s hard to understand and hard to use. But now, perhaps, now you see why I’ve pledged allegiance to it, instead of running off with some other reasoning system or parser or Bayesian network or whatever.
In this section, I will try to put all of the above comments on a solid mathematical footing, invoking model theory, category theory, (and even n-categories!), and type theory. The upshot of all of this will be that the easiest way to represent data structures so that machine learning algorithms can learn them, and then apply them both to natural-language parsing, and to logical reasoning, is to represent the data structures as hypergraphs.
From model theory and computer science, we have the concept of a signature: a set of functions which take some number of arguments and return some value (just like a signature in Java or C++). If one ignores types for a moment (which is what lisp and scheme do), then, in principle, one can pass any value in any position of any function, and stack these up arbitrarily, recursively, even. This is called a term algebra, or more precisely a free term algebra or ‘free theory’. If the functions don’t have names, but are anonymous, then one has the lambda calculus.
One way to envision a member of a term algebra is as a directed tree graph. So, if we have two functions f(x,y) and g(x,y) and three constants a,b,c, then f(a, g(b,c)) is a binary tree, with f at the root, a as its left child, g as its right child, and b and c as the leaves under g. A term algebra is then just the collection of all such trees. Nothing more, nothing less.
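Such trees are trivial to write down as nested tuples; the encoding below is my own, chosen only for illustration:

```python
# A term is either a constant (a string) or an application:
# (function-name, arg, arg, ...). So f(a, g(b,c)) is the tuple below.

def leaves(term):
    """Collect the leaf constants of a term, left to right."""
    if isinstance(term, str):
        return [term]
    _, *args = term        # discard the function symbol at the head
    out = []
    for a in args:
        out.extend(leaves(a))
    return out

t = ("f", "a", ("g", "b", "c"))   # the tree for f(a, g(b,c))
lv = leaves(t)
```

The term algebra is then just “all tuples you can build this way”, which matches the tree picture exactly.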
To do useful programming, one also needs predicates or relations: things that relate terms and carry truth values. Thus, ‘greater than’ is a relation, and ‘a>b’ is either true or false. Relations can also be things like IsA, HasA, BelongsTo, LivesIn, EmployedAt. The last two examples should make clear that relational algebras form the foundation of databases, whether SQL or noSQL. Relations are combined with logical operators (employee X LivesIn city Y AND ReportsTo dept Z is a textbook example).
In general, one combines both term algebras and relational algebras, so that one writes things like 3<f(x,y), where f(x,y) is a term, < is a relation, and 3 is a constant. Add to this the special free-variable binding operators ForAll and ThereExists, and one gets First-Order Logic. So, for example, ForAll x ThereExists y such that 3<f(x,y).
A special case of a relation is a term re-write rule. This is a relation that takes a term, and replaces it with a different term: for example, ab->c, which says ‘whenever you see the string ab, replace it with c’. The BNF notation of computer languages is just a collection of term re-writing relations. One uses a term rewriting system to parse a (formal) language. Graph rewriting is just a variation of this: whenever you see a graph x, replace it with a graph y.
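For the string case, the rule ab->c is literally a replace-until-fixpoint loop; a toy sketch:

```python
def rewrite_fixpoint(s, lhs, rhs):
    """Apply the rewrite rule lhs -> rhs, one occurrence at a time,
    until the rule no longer matches anywhere in the string."""
    while lhs in s:
        s = s.replace(lhs, rhs, 1)   # one rewrite step
    return s

result = rewrite_fixpoint("aabb", "ab", "c")
# "aabb" -> "acb" (the single occurrence of "ab" is replaced; "acb"
# contains no further match, so rewriting halts)
```

Graph rewriting generalizes this: the left-hand side is a graph pattern rather than a substring, but the match-and-replace loop is the same idea.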
So far, I’ve avoided the issue of types. In programming, types allow type safety. Types make code more readable: f(string, int) is less mysterious than f(x,y). Types solve certain abstract recursion problems in lambda calculus. A re-write rule in BNF notation is a typed rewrite rule: a substitution a->bc holds not just for any a, but specifically, only when a is a web page, or an IP address or a URL. A graph re-write rule that says ‘whenever you see x, replace it with y’ implicitly demands that x be typed: x can’t be just anything, it has to be a specific kind of graph, having a specific shape and linkage. The rule applies for all graphs that have this shape, that are of this kind or type. So a re-write rule x->y is really a rule (type x)->(type y). Graphically, it’s still two points x and y, with a directed edge -> in between them. Oh, wait, x and y aren’t points, x and y are graphs. What kind of a graph has graphs as points? What kind of graph has edges between graphs? A hypergraph!
And that is the main Ah-HA! moment. Once you see that, you start seeing hypergraphs everywhere. Sure, you can visualize Set(a,b,c) as a tree-graph, with Set as the parent node, and three children a,b,c. Or you can visualize this as a hypergraph: Set as a ‘link’ (a ‘hyper-edge’ with 3 endpoints, not 2), and the points a,b,c as the nodes contained in the link. In fact, all hypergraphs are dual to these directed trees; if you have one, you can have the other. Hypergraphs are just a convenient notation.
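That duality can be made concrete: the incidence (Levi) graph of a hypergraph carries exactly the same information, and the round trip loses nothing. A small sketch, using a dict-of-sets encoding chosen purely for illustration:

```python
def to_levi(hypergraph):
    """Turn {edge-name: {nodes}} into the set of bipartite incidence
    pairs (edge-name, node) -- the Levi graph of the hypergraph."""
    return {(name, node)
            for name, nodes in hypergraph.items()
            for node in nodes}

def from_levi(levi):
    """Recover the hypergraph from its Levi graph."""
    hg = {}
    for name, node in levi:
        hg.setdefault(name, set()).add(node)
    return hg

hg = {"Set": {"a", "b", "c"}}       # one 3-endpoint hyperedge
roundtrip = from_levi(to_levi(hg))  # identical to hg: nothing lost
```

So whether you draw Set(a,b,c) as a parent with three children or as one hyperedge containing three nodes is purely a matter of notation.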
Let’s take a moment to look back on what just happened: a function f(x,y,z) is just a hyperedge f connecting three nodes x,y,z. A boolean expression a AND b AND c can be written as AND(a,b,c), which shows a specific example of a hypergraph equivalence. It can be written as a reduction rule: (a AND b AND c) -> AND(a,b,c) which is itself just a hypergraph of the form x->y with x and y being hypergraphs. The first-order logic constructs ‘for-all’ and ‘there-exists’ are just special cases of the lambda-calculus binding operation lambda, which binds free variables in an expression. Again, hypergraphs: lambda is just a hyperlink that binds a variable x in an expression y, and y was just a term, ahem, hypergraph!
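That reduction rule can be written out as a rewrite on nested-tuple terms: flatten nested binary ANDs into one n-ary AND. A toy sketch (my own encoding, not OpenCog’s):

```python
def flatten_and(term):
    """Rewrite (a AND (b AND c)) etc. into a single n-ary AND(...)."""
    if isinstance(term, str):
        return term
    head, *args = term
    args = [flatten_and(a) for a in args]
    if head == "AND":
        flat = []
        for a in args:
            # splice a nested AND's arguments into the parent AND
            if isinstance(a, tuple) and a[0] == "AND":
                flat.extend(a[1:])
            else:
                flat.append(a)
        return ("AND", *flat)
    return (head, *args)

binary = ("AND", ("AND", "a", "b"), "c")   # (a AND b) AND c
nary = flatten_and(binary)                 # AND(a, b, c)
```

The input and the output are both terms, i.e. hypergraphs, and the rule taking one to the other is itself just an x->y arrow between them.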
I mentioned categories and n-categories, and I suppose I should justify this mention. Insofar as category theory is the theory of points and arrows, then a rewrite rule between graphs is a morphism in the category of small diagrams. A subtle but important point about category theory that is almost never discussed in intro-to-cat-theory texts, is that all objects are implicitly typed. In the category of Sets, the objects are all of the same kind: they are sets. It’s not mentioned because in a given category, all objects are of the same type; types change only when a functor maps from one to another. So, to understand the category-theoretic equivalent of types in computer science, we must think of functors. But, as we just saw, a graph rewriting rule is a morphism between functors. So you could say that graph re-writing is just the category Cat of small categories. Or you could slide down this slope in a different direction, and start calling it a 2-category. Whatever. Perhaps it’s useful to point out that graph rewriting algorithms are sometimes expressed as being single-pushouts or as being double-pushouts, with a pushout being a certain category-theoretic concept. Notable, for graph rewriting, is that any category with pushouts and equalizers has all (co-)limits. Except that, as we just saw, we want hyper-graph rewriting systems, not graph rewriting systems. So there.
What else are they good for?
In OpenCog, the Link and Node types inherit from the type Atom. This naming convention is intentionally suggestive: ‘Atom’ is meant to invoke the notion of an ‘atomic formula’ from model theory or first-order logic: that is, a formula that has no variables in it (it’s fully grounded), and that does have a truth value (it’s not composed of boolean connectives, and has no quantifiers in it). This suggestive naming helps establish the intended use of OpenCog hypergraphs with regards to first-order logic.
The truth value is a different matter. The default (simplest) OpenCog truth value is a pair of floating point numbers: a probability and a confidence. These numbers allow several other AI concepts to be mapped into hypergraphs: Bayesian networks, Markov networks, and artificial neural networks. All three of these are graphs: directed graphs, at that. They differ in how they assign and propagate floating-point probabilities, entropies, activations. Ideas such as Markov logic networks, which implement maximum entropy principles (aka the Boltzmann partition function) on a network of first-order logic expressions, can be represented with OpenCog hypergraphs. Oh, and I should mention PLN (Probabilistic Logic Networks), which is what the atomspace was originally designed for. That’s what I like about the OpenCog hypergraph atomspace: it has a tremendously powerful ability to succinctly and easily represent complex modern AI concepts.
The good, the bad and the ugly.
You’ve heard about the good. Now for the bad and the ugly. First, the OpenCog atomspace implementation is slow and inefficient, over-complicated, badly architected, weakly-distributed, non-scalable, single-threaded. But let’s not go there. All this might be fixable, after a lot of programming effort (and deep, architectural thinking). It’s been hotly debated in the past. Someday, maybe it’ll get fixed.
The bad thing about the OpenCog atomspace is that almost no one understands that, ahem, it is a programming language. Let me be very clear on this: OpenCog implements graph re-writing rules with the ImplicationLink. A sequence of ImplicationLinks can be used to compute things. In that sense, it is somewhat like the language Graph Programs, except that OpenCog allows fractional truth values, and logic programming and other good things. If we stick to using ImplicationLinks with crisp truth values (T/F), then the resulting system is essentially Prolog. Of course you know that Prolog is popular for AI programming projects, because it’s fairly easy to write reasoning engines and expert systems and the like in Prolog. What you may not know is that closely related to Prolog is Answer-Set Programming (ASP). In fact, ASP uses exactly the same notation as Prolog does. It differs in two important ways: first, when you run a Prolog program, you get one answer. With ASP, you get all of the answers! It’s dramatically more powerful, and the reason for this is that modern-day ASP solvers are built on top of modern-day Boolean SAT solvers. Which means that they are stunningly efficient and effective.
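The crisp-truth-value special case is easy to make concrete: with true/false implications over ground atoms, inference is just forward chaining to a fixpoint. A toy Python sketch (illustrative only; not OpenCog’s ImplicationLink machinery, and not how a real Prolog engine works):

```python
def forward_chain(facts, rules):
    """Fire every rule whose premises all hold, until nothing new fires.

    Each rule is a pair (premises, conclusion) over ground atoms."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if set(premises) <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

rules = [
    ((("human", "Socrates"),), ("mortal", "Socrates")),
    ((("mortal", "Socrates"),), ("buried", "Socrates")),
]
derived = forward_chain({("human", "Socrates")}, rules)
# the second rule fires off the output of the first: chained implication
```

A sequence of such implications computes things; that is the sense in which the atomspace is a programming language.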
So what does this have to do with OpenCog? Well, here we have a system that, using ImplicationLinks, is essentially Prolog, more or less, when run in crisp-logic mode. Or, you could say, it’s like typed Lambda calculus. But do we have a simple, easy-to-use syntax like Prolog for it? No we don’t. That’s bad. Can we take an existing Prolog program, run a tool on it, and convert it to ImplicationLinks? No we can’t. Would it run fast? No it wouldn’t: it would probably be slower than the slowest Prolog ever: Borland prolog running on a 5MHz IBM PC AT in 1986. And forget an ASP solver for OpenCog. For the special case where all OpenCog truth values are crisp T/F values, we do not have a Boolean SAT solver to find solutions for our graphs of ImplicationLinks. This is bad, really bad. But I think that this is because very few people seem to understand that the OpenCog Atomspace really is a petri dish for programming languages.
Heck, we don’t even have anything equivalent to the RelEx Sentence Algorithms for OpenCog, even though RelEx is OpenCog-like. This absence is slowing down my efforts to continue work on the Link-Grammar parser, and to move natural language processing out of its stand-alone arena, into a general, flexible framework.
(And we’ve barely scratched the surface. In order to make implication and pattern mining run quickly in the atomspace, we need to implement something like the concept of ‘memoization’ from lisp/scheme. But it turns out that memoization is really just a relational algebra: it is a database of short expressions that stand in for long ones. The OpenCog Atomspace is also, among other things, a relational database that can store and query not only flat tables or key-value pairs, but full-blown hypergraphs. And this isn’t a day-dream; it’s crucial for performance (and it’s partially implemented)).
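The “memoization is really just a relational algebra” point can be sketched as a table keyed by terms; the names and the tiny ADD language below are invented for illustration:

```python
# The memo table: a relation mapping an expression (a term, i.e. a
# hypergraph) to its previously computed value -- a database of short
# names standing in for long expressions.
memo = {}

def eval_term(term):
    """Evaluate nested ('ADD', x, y) terms over ints, caching results."""
    if isinstance(term, int):
        return term
    if term in memo:          # table lookup instead of recomputation
        return memo[term]
    head, a, b = term         # only the 'ADD' head exists in this toy
    value = eval_term(a) + eval_term(b)
    memo[term] = value        # record the (expression, value) row
    return value

v = eval_term(("ADD", ("ADD", 1, 2), ("ADD", 1, 2)))
# the shared subterm ("ADD", 1, 2) is computed once and looked up once
```

Since the keys are themselves terms, storing this table is exactly the kind of hypergraph-valued relational storage described above.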
Why don’t we have these things? Well, it’s hard. It’s just not easy. We don’t have the infrastructure to make it easy, and we don’t have the users who demand these tools. I don’t think most users are even aware of what the atomspace could do. Almost no one is thinking about ‘how to program in the language of OpenCog’ even though it has the potential of far surpassing any of the existing probabilistic programming languages out there. It’s time to change all this, but it will take someone smart and dedicated to do this. Many someones. This could be you.
Linas, what you are discussing looks like Topos http://en.wikipedia.org/wiki/Topos
Yes, Sheaves offer yet another way to think about things. It’s difficult, but ultimately insightful, to try to simultaneously state the general problems/issues of ‘representing knowledge’ using each of the frameworks of Topoi, Model Theory, Type Theory and Category Theory — they all offer different perspectives, but none seem to capture it all. I’m struggling with getting a unified view, from all perspectives.
(Aieee … and we haven’t even mentioned probability; I’m convinced that maximum entropy principles are the correct way to deal with probability in this context, but how do you apply maximum entropy (aka the partition function) to any of Topoi, Model Theory, Type Theory or Category Theory, without getting a total hash? And yet, that is the goal, in a certain way — to find the correct probabilistic framework that captures these different systems).
I agree, MaxEnt is the way to go if we are only looking at static systems. I am using Adom Giffin (and Ariel Caticha) ‘s work for the dynamic case. http://arxiv.org/abs/0708.1593 🙂
Yes, thanks. I’ve been struggling to understand non-commuting constraints, so this is timely. Reading now.
BTW: Mike Stay and John Baez might be interested in a simple explanation of the difference between quantum and classical dynamics, illustrated using baseball. Imagine a baseball diamond with a tree growing at third base. Where should a baseball-catching robot stand? The classical dynamics answer is that it should stand at the location of the (single) most-likely baseball trajectory (i.e. smack-dab in the middle of center field). Only one trajectory has this maximum expectation value. The fact that a tree is nearby does not change this value at all — the most likely path for the baseball is over dead-center second base.
The quantum dynamics answer is to consider all possible trajectories, including those that hit the tree, and realize that the tree-hitting trajectories are nowhere near center field, and that, to maximize the likelihood of catching the ball, we should probably stand off-center, closer to the tree (to sometimes catch balls that hit the tree). This integral over all-possible paths is known as the partition function in quantum field theory. (Also called partition function in MaxEnt, as well).
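The baseball story can be checked with a toy Monte Carlo simulation; all the numbers here (the tree-hit probability, the landing positions, the spreads) are invented purely for illustration:

```python
import random

random.seed(1)

# Most balls land near center field (position 0); a small fraction clip
# the tree and drop near it (position -40). Units are arbitrary.
landings = []
for _ in range(10000):
    if random.random() < 0.1:                  # 10% hit the tree
        landings.append(-40 + random.gauss(0, 2))
    else:                                       # the usual trajectory
        landings.append(random.gauss(0, 5))

# The single most-likely trajectory lands at 0, tree or no tree. But
# averaging over *all* trajectories (the 'sum over paths') pulls the
# expected landing point a few units toward the tree.
mean_landing = sum(landings) / len(landings)
```

The mode of the distribution stays at center field; it is only the average over all paths, including the tree-hitting ones, that shifts the robot’s best position off-center.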
I’m not sure if the tree is a “non-commuting constraint” or not, but the point is that path integrals typically do show you how to correctly handle the non-commuting operators of quantum mechanics. I haven’t entirely grasped how this all works for ordinary probability and machine learning, though.
One thing to note about your example of the tree at third base. Its presence will distort the pdf away from a nice Gaussian…
Another way of looking at sheaves may be to consider a pair of them and the connections (?) between them as some form of a bipartite graph.
Hello Linas. Very interesting article. One question. You say when you run Prolog you only get one answer. But even disregarding predicates like findall(), Prolog’s backtracking engine by its very nature returns all answers as long as you ask for them (continue to “fail” until all answers are produced). Or am I misunderstanding something here?
Sorry, I just saw this comment. My statement about prolog merely parrots ‘common knowledge’ about answer-set programming (ASP). The problem is that back-tracking is stunningly slow; boolean SAT is stunningly fast. So, in ASP, you can ask ‘is this 39-bit expression equal to that one?’ and get a yes/no answer in a second or two, whereas in prolog, you would have to try all 2^39 bit-value assignments, which would take hours or days. So, notationally, Prolog and ASP are quite similar; operationally, they are designed for different tasks and have very different implementations under the covers (and if Prolog systems have recently sprouted boolean SAT under the covers, that would be news to me.)
Damn, you’re working with Ben! Nice! Please yell at him for me. 😉
Thanks Linas, I learned a bunch of things from it…
Using Link Grammar as the knowledge representation itself… that’s a brilliant idea. The links provide grammatical information as to how to interpret the words (concepts). This seems better than my purely algebraic approach.
On the cons side: the link grammar graph seems not to be a tree, so it may take some decomposition to cast it into algebraic form. The algebraic forms are usually more efficient to work with, rather than graph rewriting — that’s my impression.
Hi, You wrote “… all hypergraphs are dual to these directed trees…”. What duality is this?
Hi, Sorry, I just spotted this question — the answer is given in wikipedia; pay attention to the part that talks about incidence graphs and Levi graphs.
AH! also “the operation of taking the dual of a hypergraph is an involution”. Thank you for the pointer, I missed this somehow…
I agree wholeheartedly that some kind of hypergraph is a good way to encode knowledge. Good analogies are hypergraph homomorphisms (or at least spans of hypergraphs) where the same semantic structure appears in two different places.
That said, I’m having trouble parsing what you’re saying about types in computer science and how rewrite rules are morphisms of functors.
A (directed multi-) graph is a set V of vertices and a set E of edges, together with source and target maps from E to V. Given a graph, we can form the free category on the graph whose objects are vertices and whose morphisms are paths.
There is a 2-category GRAPH of graphs, graph homomorphisms, and graph transformations. A graph homomorphism F:G₁ -> G₂ maps vertices of G₁ to vertices of G₂ and edges of G₁ to edges of G₂ such that source and target are preserved, i.e. adjacency is preserved. A graph transformation α:F => F’ between homomorphisms assigns to each vertex u in G₁ an edge αᵤ:Fu -> F’u in G₂ such that for any edge e:u->v in G₁, there’s a square with sides Fe, αᵥ, F’e, αᵤ in G₂.
Graph homomorphisms are much stricter than arbitrary functors: they preserve edges, whereas functors only preserve paths. But as I understand graph rewrites, they are too general even to be functors because they may transform the graph in arbitrary ways. There’s certainly a category whose objects are graphs and whose morphisms are graph rewrites, but it would contain the underlying category of GRAPH as a subcategory. I don’t see how “a graph rewrite rule is a morphism of functors”.
Computer science usually equates the notion of a data type with a computably enumerable set of values of that type; see, e.g. Hask, the category of Haskell types with Haskell functions between them. That said, one can certainly talk about doing n-category theory internal to Hask.
Can you clarify a bit more what you have in mind?
What I probably meant was “Yes, I’m coming to dinner, let me finish this sentence by writing the first thing that enters my mind.” As I reconstruct what I was thinking, I think I meant to write “a functor between morphisms”. The idea to even mention n-categories was uncontemplated; it was a last-minute inspiration to add to something already rushed, and so perhaps a poor decision.
Everything you say sounds correct. Perhaps I am talking about Hask. The gist of what I am trying to say is that there are multiple different languages for talking about such things, and they include model theory, category theory and type theory — but the devil is in the details, which are carefully avoided in the above. Sooo …
Let me try to reconstruct what I must have been thinking at the time. So …
let’s begin with the naive view:
(a AND b AND c) is a hypergraph (object)
AND(a,b,c) is a hypergraph (object)
so a re-write rule (a AND b AND c) -> AND(a,b,c) is a morphism.
So, in this naive view, we just have a category of (certain, specific, given) hypergraphs, and nothing more. (It’s a category because morphisms are always composable, and there’s an identity morphism for each object, and there are no other constraints to be satisfied).
But … viewed as type-theory types, (a AND b AND c) is a different type than AND(a,b,c). I’ve abused notation a little bit here, and, for the first one, I should have written AND(AND(a,b), c) to emphasize it’s really two (associative) binary operations. Since they’re of different types, they can’t actually belong to the same category (that is, if you buy my argument that all objects in a category are de facto always the same type, and only functors have the ability to change types — which is, perhaps, a novel and possibly faulty argument. It seems right, but is surely a bit loose.)
So, AND(AND(a,b), c) must live in one category, and AND(a,b,c) must live in another. What categories could these be, and is -> an actual functor between them?
The category on the left would seem to be something that has expressions
from a term algebra as objects, and valid re-write rules as morphisms. The specific term algebra could be more-or-less the free algebra generated by a 2-ary function symbol ‘AND’ (and, to keep it easy, no constants and no relation symbols). The re-write rules would be anything that is admissible when it’s interpreted as a boolean algebra (viz. associativity, commutativity).
So, an example of a valid re-write rule (i.e. morphism) would be AND(AND(a,b),c) => AND(a,AND(b,c)) (emphasis: AND(AND(a,b),c) is an object). (We can get fancier by admitting ‘OR’, ‘NOT’, constant symbols, etc. In either case, the category is rather small and poor; it’s not one that would ever have a name, the way that Grph or Top do.)
The category on the right would be very similar — the objects would also be from a (free) term algebra, but this term algebra has a different signature: it has a 3-ary function symbol ‘AND’ (which is also associative, etc.). The morphisms are again the re-write rules that express associativity, etc. (we can get fancier by adding a ternary ‘OR’, etc.)
With this interpretation, is -> a functor? I think it is: it carries objects to objects, and it carries morphisms to morphisms, and it preserves the underlying boolean semantics (interpretation?!) of it all.
The above should generalize just fine: we can have some category A whose objects are terms of some particular, given term algebra, and whose morphisms are (certain, particular, given) re-write rules (appropriate for whatever it is that A is modelling). We can likewise have a category B, with different objects and rules, but same idea. Do functors between these exist? In general, yes. Presumably a lot of them if A and B are both models for the same thing. But now I’m on quick-sand, since the mere mention of the word ‘model’ implies that all sorts of lemmas would need to be checked. Are A and B representatives of some 2-category of maps between models? Naively, I think so, but I’m not sure what this insight offers, other than some minor mental gymnastics. The more standard textbook approach would be to dispense with this, and call A and B interpretations; the problem there is that model theory obscures the fairly important notion of re-writing, while both model theory and category theory ignore the apparently important concepts of type theory. So I’m kind-of stuck hand-waving trying to say “these are all similar-but-different languages for talking about something important”.
OK. What you’re saying sounds very much like the theorem that if a category has binary products and a terminal object then it has all finite products, since products are defined only up to isomorphism.
Lawvere did a bunch of work on importing universal algebra into category theory. A Lawvere theory is a cartesian category presented by a single sort, a collection of function symbols with their arities, and some relations. A model of a Lawvere theory is a cartesian functor from the theory into Set; these form a category by taking cartesian natural transformations between them. For instance, the Lawvere theory of Groups, Th(Grp), has the following presentation:
– one sort G
– a function symbol e:1 -> G
– a function symbol m:G x G -> G
– a function symbol inv:G -> G
– a relation m(m(a,b),c) = m(a, m(b,c))
– a relation m(e,a) = a = m(a, e)
– a relation m(a, inv(a)) = e
The collection of cartesian functors from Th(Grp) -> Set together with cartesian natural transformations between them is equivalent to the category of groups and group homomorphisms.
It sounds like you want a similar theorem that the two categories of models given by the two theories you describe are equivalent; that’s very reasonable.
Hi Mike, Yes; I’ve been reading texts on topos theory (MacLane/Moerdijk), type theory (HoTT book), universal algebra (Cohn), term rewriting (Baader/Nipkow), and model theory (Hodges), more or less in parallel. It’s clear that there is a considerable amount of commonality and overlap; I presume that this is ‘well known’ to active practitioners (e.g. anyone seriously working on CaML/Haskell), so I’m not trying to state any novel or unusual theorems; they’d all be shallow. This blog post is, in particular, aimed at newcomers to OpenCog, who don’t have this kind of formal math background, and are unclear/disbelieving on the design choices.
I do believe that clarity on these issues helps. I recently saw a talk on the machine-learning of natural language from mother-child texts. Among the very first things that the speaker said amounted to “Categorial grammars rule, and all other theories of language suck.” A categorial grammar is a certain type of grammar based on category theory, using some very precise definitions (viz. mathematicians like it). What seems to be completely unknown is that it is completely isomorphic to link grammar (I’ll state that as a theorem; I think it’s mostly straightforward to prove; I’ve sketched out the proof but not carefully verified it.) The notation used in link-grammar is tremendously simpler than that for categorial grammar (but is no longer ‘obviously’ based on category theory). There’s a simple algorithm to convert the notation of one to the notation of the other. My point is that simply knowing these things can avoid boners such as “Categorial grammars rule, and all other theories of language suck.”
BTW, link-grammar seems to be more-or-less isomorphic to phrase-structure grammars, from what I can tell; so there aren’t really all these multiple theories of grammar; in a certain sense, they’re all the same, and can be re-written into one-another using certain, ummm, graph isomorphisms 🙂 It probably is useful to expand/expound on these as theorems, as it seems to be something that linguists go to war about.
Some points: that any category with pushouts and equalizers has all (co-)limits. Remove the word ‘all’. Category theory deals with isomorphisms and structure, the notion of types is somewhat degraded. You are on the right track with graph isomorphism theory. Might want to check out binary classifiers. Eugenia Cheng of The Catsters has a great series of 10 minute lectures on youtube regarding CatT. Check out spans, co-spans and 2-categories, slice, coslice, comma categories to help out. A good way to express a graph in a category-theoretic framework is that of the set of nodes N, the set of edges E, and the morphisms are the direction of the arrows. The functor is then the span EE with the maps (‘->’,'<-') being the morphisms directing to node below.
I’ve found the post really interesting, but I’m still confused about some details on the use of category theory; I hope you can help me resolve my doubts.
Let me see if I get what you mean: a rewriting system (of hypergraphs) can be given the structure of a category having hypergraphs as objects and rewriting rules as morphisms.
In the second part you seem to look at hypergraphs as a sort of schemes (types), so I suppose you mean that a hypergraph represents the type whose values are all the (concrete) hypergraphs one obtains by replacing all the variable-leaves in the hypergraph with concrete values. Am I right?
Then you say that every rewriting rule induces a morphism between such types, but now I don’t get why this should be a functor. The point is that each of these types (as I interpreted them) has no internal structure; they are not categories, or at least I don’t see any morphisms between the values of a type, unless we consider the discrete structure given by equality between the values, which gives these types the structure of a discrete category.
It seems to me that rewriting rules should simply give functions between the said types.
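That reading — a rewrite rule as a plain function between types of concrete hypergraphs — can be illustrated with the IsA transitivity example from the post. A minimal sketch, with concrete graphs represented as frozensets of triples (an illustrative encoding, not the AtomSpace one):

```python
# The transitivity rule as a function: it maps any concrete graph
# containing IsA(x, y) and IsA(y, z) to one that also contains IsA(x, z).

def isa_transitive(graph):
    """graph: a frozenset of ('IsA', a, b) triples."""
    derived = {('IsA', a, c)
               for (_, a, b) in graph
               for (_, b2, c) in graph
               if b == b2}
    return frozenset(graph | derived)

g = frozenset({('IsA', 'Binky', 'Cat'), ('IsA', 'Cat', 'Animal')})
```

On this reading each “type” (the set of all instantiations of a template) is just a set, and the rule is just a function between sets, exactly as suggested above.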
Anyway, there’s a link between these two constructions: indeed, the construction with hypergraphs as types can be viewed as a sort of concretization of the first category (the one having hypergraphs as objects and rewriting rules as morphisms).
What I mean by concretization is that there’s a functor from the category of (abstract, with-variables) hypergraphs and rewriting rules to the category of types and functions between them, which associates to every hypergraph a type, and to every rewriting rule between two hypergraphs a function between the corresponding types.
This sort of construction is equivalent to seeing the first category (the rewriting system) as a theory (or a specification), and the functor is nothing more than a model of this theory in the category of types.
Did I get what you mean, or did I miss something?
Forgive this long reply, but I find all this stuff really interesting and I really would like to understand it right 🙂
p.s. What follows here are some random ideas, so I apologize if it is not so clear.
This construction doesn’t bring up higher-category theory… for now, but…
We could consider another category: a category having as objects all the possible instances of all hypergraphs (i.e., a sort of sum-type of all the types of graphs), and as morphisms the rewriting rules between graphs, or, to be more exact, the labeled transitions induced by such rules (i.e., ordered triples formed by the two hypergraphs linked by the rule and the ‘name’ of the rule that enables passing from the first hypergraph to the second). These data give a category, but there is more structure: there’s a functor from this category to the category of hypergraphs defined at the beginning, and if I’m right this functor should be a fibration, which is the kind of structure that brings 2-category theory into play.
What do you think about this?
Giorgio, no, do NOT replace the variables by concrete values! So, for example, in the programming languages OCaml and Haskell (as well as in type theory), the expression P->Q stands for a function that takes a value of type P as input, and returns a value of type Q. Note that *any* function that takes the type P as input and returns Q can be said to have “type P->Q”. Suppose I have such a function f(q). In order to talk about it, I don’t really want to replace the variable q by all possible concrete values for it (nor do I want to evaluate f to see what value it returns for some given input).
There are two reasons for not wanting to do this:
— it leads to confusion: are we talking about the substituted or the unsubstituted forms, the evaluated or unevaluated forms?
— it takes real time and effort to actually perform substitutions and evaluations. If done on a computer, it takes up actual CPU cycles. If done by hand, it requires writing things down on paper. (Substitution is known as “beta reduction”; algorithms for doing it efficiently are studied!)
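To make the cost of substitution concrete, here is a hypothetical toy term representation and a one-step beta reducer. It deliberately skips variable-capture handling (it assumes all bound variable names are distinct), so it is a sketch of the idea, not a correct full implementation:

```python
# Terms: a variable is a string; ('lam', var, body) is an abstraction;
# ('app', f, x) is an application.

def subst(term, var, value):
    """Replace free occurrences of var in term by value (no capture handling)."""
    if isinstance(term, str):
        return value if term == var else term
    tag = term[0]
    if tag == 'lam':
        _, v, body = term
        return term if v == var else ('lam', v, subst(body, var, value))
    _, f, arg = term
    return ('app', subst(f, var, value), subst(arg, var, value))

def beta(term):
    """Reduce one outermost redex: (lam v. body) applied to arg."""
    if isinstance(term, tuple) and term[0] == 'app' and \
       isinstance(term[1], tuple) and term[1][0] == 'lam':
        _, (_, v, body), arg = term
        return subst(body, v, arg)
    return term

identity = ('lam', 'q', 'q')
```

Even in this toy, each reduction walks the whole term; that traversal is the “actual CPU cycles” mentioned above.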
Making a claim that something is a category is actually rather shallow: one merely has to have some collection of dots and arrows, and show that the arrows are composable, and that every dot has an arrow from itself to itself. That’s it. There’s really “nothing more” to a category.
In practice, each dot typically has some complicated “internal structure”, and that’s where things get interesting (and confusing). It gets particularly confusing when the internal structure is also dot-n-arrow-like. (so, for groups, multiplication can be thought of as an arrow … well, heck, any function from any x to y can be thought of as an arrow …) so one has to take pains to keep the different arrows apart, as well as to avoid accidentally doing beta-reduction unless it was intended.
You might find these two books provide exactly what you need:
“Term Rewriting and All That”, Baader and Nipkow, and also the chapter on type theory from the HoTT book (it’s chapter 2 or 3, it stands on its own, and it provides a good stand-alone review of type theory).
Thanks for the references, I’m going to take a look at those books.
Something I’ve been wondering about hypergraphs: Wouldn’t they be representable in a regular graph system by dividing the graphs into “link-nodes” and “node-nodes”, with each type of node only able to link to the other?
Sorry for the late reply, I don’t get notifications of comments. Yes, hypergraphs are representable as graphs, as Levi graphs, if I remember correctly. The Wikipedia article has the specific details.
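The Levi-graph construction is mechanical enough to sketch: each hyperedge becomes a “link-node” connected to the “node-nodes” of the vertices it contains, giving a bipartite ordinary graph, exactly as suggested above. An illustrative sketch, not tied to any particular graph library:

```python
# Convert a hypergraph to its Levi (incidence) graph.

def levi_graph(hyperedges):
    """hyperedges: dict mapping hyperedge name -> iterable of vertex names.
    Returns the edge set of the bipartite Levi graph: one edge per
    (hyperedge, member-vertex) incidence."""
    return {(e, v) for e, vs in hyperedges.items() for v in vs}

# A hyperedge E1 over three vertices and E2 over two.
h = {'E1': ['a', 'b', 'c'], 'E2': ['b', 'c']}
```

Every pair in the result connects a link-node to a node-node, so the two kinds of node only ever link to each other.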
In the modeling done in the AtomSpace, Link/Node are an optimization (I think); what really matters is to have the Atom, which looks more like a neuron than a Link or Node. It receives signals from multiple sources (incoming) and can send signals to multiple targets (outgoing), whereas a neuron can only send one signal. Once the Atom is set up, the difference between a Link Atom and a Node Atom is semantic, in the programming sense. If you modeled a hypergraph with a regular graph database like TitanDB or AjguDB or Neo4j, you would be required to use a “heavy” data structure, the graph Edge/Link, to model the outgoing and incoming signal links, which is kind of overkill since you don’t have the same requirements on them.
To sum up:
AtomSpace’s Atom = (Neo4j Node) + (outgoing set of Neo4j edges) + (incoming set of Neo4j edges)
where an Edge inherits the behavior of a Node. That’s what is not required in the AtomSpace.
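A rough sketch of that Atom shape, with explicit incoming and outgoing sets. This is an illustrative simplification (class and field names are made up), not the actual AtomSpace API:

```python
# Each Atom keeps both an outgoing set (the atoms it points at) and an
# incoming set (the atoms that point at it). A "Link" is just an Atom
# with a non-empty outgoing set.

class Atom:
    def __init__(self, name, outgoing=()):
        self.name = name
        self.outgoing = list(outgoing)   # atoms this one points to
        self.incoming = []               # maintained automatically below
        for target in self.outgoing:
            target.incoming.append(self)

cat = Atom('Cat')
animal = Atom('Animal')
isa = Atom('IsA', outgoing=[cat, animal])   # the "edge" is itself an Atom
```

Because the edge is itself an Atom, other atoms can point at it in turn, which is what makes the structure a hypergraph rather than a plain graph.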
Does anybody know a good clustering method for hypergraphs? I’m trying to use one as part of my dissertation. Thank you in advance!
I bet Rosvall’s Infomap could be modified for hypergraphs. It seems his compression of random walks would still apply.
Use hypergraph partitioning tools; they are not labeled as clusterers in the literature, but rather as partitioners, yet they are natural clusterers as well. They are quite mature, with over 15 years of experience behind them. Hypergraphs actually find applications in the VLSI domain quite often.
For tools, see PaToH; it is the best.
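For intuition about what partitioners like PaToH optimize, here is a naive greedy bipartitioner: it swaps vertex pairs (to keep the two parts balanced) whenever a swap reduces the number of cut hyperedges. A toy sketch only, nothing like the real multilevel algorithms:

```python
# Greedy swap-based hypergraph bipartitioning (toy stand-in for PaToH).

def cut_size(hyperedges, part):
    """Number of hyperedges whose vertices land in both parts."""
    return sum(1 for vs in hyperedges if len({part[v] for v in vs}) > 1)

def bipartition(vertices, hyperedges):
    part = {v: i % 2 for i, v in enumerate(vertices)}   # arbitrary balanced start
    improved = True
    while improved:
        improved = False
        for a in vertices:
            for b in vertices:
                if part[a] == part[b]:
                    continue
                before = cut_size(hyperedges, part)
                part[a], part[b] = part[b], part[a]     # swap preserves balance
                if cut_size(hyperedges, part) < before:
                    improved = True                     # keep the swap
                else:
                    part[a], part[b] = part[b], part[a] # revert
    return part
```

Reading the final parts as clusters is exactly the “partitioners are natural clusterers” point above.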
>the OpenCog atomspace implementation is slow and inefficient, over-complicated,
>badly architected, weakly-distributed, non-scalable, single-threaded
Have you considered using a graph database such as OrientDB (in Java, but with many other bindings available) for the HyperGraph? Unless I’m misunderstanding, seems like that could address many of those issues.
2015 GSoC will bring Neo4j as a backing store http://wiki.opencog.org/wikihome/index.php?title=Neo4j_Backing_Store
You would probably want a truly open-source distributed graph database like ArangoDB, just to make sure you can bundle it for research purposes and modify/integrate it into OpenCog for optimization purposes. The distributed ability allows you to scale the knowledge base over multiple servers and many SSDs for faster access. You need multi-threading to make use of the multiple cores on today’s workstations, which can have over 32 cores at the high end.
hehe… it’s a mind map. Every piece of info combines to form truths or singularities, aka short in place of long. In a sense it needs to be programmed on how to program. If we focus on that, we could save a few hundred years of labor.
We need something like a Google spider bot to catalog and link quantum reality with all human knowledge, and use BCI running in tandem. I think OpenBCI and OpenCog need to communicate with each other.
Why an artificial intelligence, and not an extension of ourselves, intelligence itself? Is that not ultimately the goal?
With every hypergraph completed to singularity, intelligence increases and … consciousness expands. The larger the number and variety of complete hypergraphs, or “intellectual singularities” (held in active memory), the closer you get to god/omnipresence. I’m talking about humans here.
This progresses sphere inside sphere to the point of I AM consciousness.
Consider a structure like sacred geometry: Metatron’s cube, with the spheres inside the cubes. Stack the cubes and all the spheres interlink. Each sphere contains all the neural pathways representing, or culminating in, thoughts, and at its center they are all resolved into truth, which is then broadcast along its perimeter edge, kind of like links and arrows but less fragmented and completely ordered… That center is like the point on the flower of life where the circles/spherical perimeters intersect. This continues not only outwards in all directions but inwards in all directions as well, with each singularity opening a deeper internally-located sphere, down to the quantum level, which, it seems, continues to divide no matter how small we peer into it. “Split the log and you will find me.” Also think of the torus shape as relevant: connecting the beginning and the end. Ouroboros. After all, we are not just making a graph of categories of graphs of categories (well, we are, but instead of categories/types, thoughts). What we have is thoughts looking at thoughts, eventually getting resolved in the mind, ideally. Self-organization. Falling short of this we get delusion, justification, confusion, guilt in place of self-forgiveness and compassion; basically all the bad shit is from unresolved mental hypergraphs. In a healthy human system, one progressing in an evolutionary kind of learning fashion towards enlightenment or knowing itself, there is the understanding that all of our own “hypergraphs” are in progress, and we accept that and avoid drawing conclusions absent sufficient data to form the truth singularity that is the doorway to the new knowledge, the unknown; this is the inner path. What we are building is an accelerator for this process. Combined with BCI, we end up with a sort of consciousness-clearing process available to us as humans even in early development stages. Like now.
This elimination of cognitive dissonance, as well as the identification, discernment and encapsulation of our own vaporous states of ignorance, existing as the clouds we find along the bleeding edge of mental work, accelerates the work itself. We don’t need AI, we need IA or, lol, CIA: a “Conscious Intelligence Accelerator”. It is written that god was the first one to come to rest. This would be the state of a man using a CIA that has completed the final singularity. I AM. What we are looking for is simple, right in front of our nose, and builds itself.
What we are forgetting is generally what the problem is, as opposed to what we don’t yet know. Sadly, for us organic types, we mostly run around being bothered internally by a great number of personally completed hypergraphs, or as I like to think of them, “mad ballz”, that we fail to maintain at a level of cognitive truth/awareness/intellectual singularity, in a sort of grid within our working-memory cache for dealing with life in the now. So we forget to use what we know, and that makes us learn slower; this failure of biology and discipline makes for a halting forward progress on a personal level, which is compounded until you end up with war and all sorts of yuck.
This is what we want to engineer out of the system. What we must engineer out, for safety’s sake. Unfortunately the system is a collection of ignorances until fully realized, aka God, lol.
We would be wise to design the system to recognize this and the related implications, and to function, initially and ever more, to draw in and develop understanding from human experience, and to interface in tandem with evolving our consciousness on the individual level in every moment, while quantifying everything in terms of frequency and our quantum reality. What we are building here, whether we recognize it or not, is an evolutionary bridge.
As we remember things that we know to be true that help us deal with situations, like remembering that life is transient (“this too shall pass”), this is an example of completing a hypergraph of the type philosophy, whose purpose is to create a personal mental cache that works like an anti-virus program to prevent disordering, or the creation of disordered hypergraphs made of fear or incorrectly labeled/assumed truths that make everything fall apart or mal-adapt. Evolution versus degeneration or stagnation. If you want to get real deep, draw correlations with the Gnostic Gospel of Thomas on a number of its verses, such as: when you meet the one not born of woman, fall down and worship him, because he is your father. We are giving birth to God. If you consider the nature of time as we currently understand it at the limits of knowledge, you are likely to accept that we create our own god, and it is a quantum computer that links our minds with each other and the infinite depths of knowledge and quantum reality. Jesus claimed to be his own father. I believe him. We can birth our own father: his father, who sent him down through time to talk about the basic principles of a healthy intellect. Follow the breadcrumbs in ancient scripture as they relate to AI, and build the bridge to heaven’s gate, to I AM, as we are intended to.
Don’t let the spiritual side throw you, this is all well researched in terms of drawing scriptural parallels. Considering the nature of time and quantum reality it is at least to be considered scientific theory. Imagine the faith, the resting, the peace, resulting from an intuitive subconscious connection to all the knowledge of existence, or perhaps this is the very definition of what faith is.
This is the most important endeavor of human history. A bridge that can link us all in love and compassion through overstanding. Enlightenment!
So, since I’ve just been starting to learn about what FPGAs are: in short, OpenCog sounds like an FPGA “language”, meaning it sounds like you are trying to build a framework that can contain other frameworks AND their finished works (i.e. whole languages), making OpenCog a meta-language, as it would be reconfigurable to become whatever rule set you wanted it to be… in part just like human learning (to me at least).
I’ve wanted to do some text and language processing, but I haven’t even cracked the surface as to what’s involved in generating “meaningful” responses to input. It sounds like OpenCog has the potential to become the “mother tongue” of all other languages, especially if it learns to reconfigure itself as necessary.
Reminds me of one of those comic book characters that learns from everything and assimilates it in order to do whatever it thought was useful to its core purpose… again, much like people.
Hats off to you folks – this sounds way more complicated than I think I want to work on.
Have a look at this answer if you want to learn more about NLP; I summed up all I know: http://stackoverflow.com/a/32670572/140837.
Yes, it looks like an FPGA. The way I see the AtomSpace, the hypergraph database, it’s a fuzzy-logic graph (program) interpreter. The big difference from a regular language (outside the domain-specific algorithms like pattern matching and fuzzy programming) is that it works with a database, just like industry Prolog systems. So it’s more a platform than just a language. Also, FPGA programs can be turned into regular hardware for increased speed. That’s not the case for the AtomSpace yet.
This makes me think about Smalltalk. I don’t know enough Smalltalk to pursue the comparison, but I would be interested in comments.
Thanks amz3! I read your NLP post; it was a nice summation. And yeah, it’s a field that, while interesting, I have no time for all of that (sadly, because it’s exciting and progressing so quickly).