Why Hypergraphs?

OpenCog uses hypergraphs to represent knowledge.  Why?  I don’t think this is clearly, succinctly explained anywhere, so I will try to do so here.  This is a very important point: I can’t begin to tell you how many times I went searching for some whiz-bang logic programming system,  or inference engine, or theorem-prover, or some graph re-writing engine, or some probabilistic programming system, only to throw up my hands up and realize that, after many wasted hours, none of them do what I want.  If you’re interested in AGI, then let me assure you: they don’t do what you want, either.  So, what do I want them to do, and why?

Well, lets begin easy: with graph re-writing systems.  These days, almost everyone agrees that a great way to represent knowledge is with graphs.  The structure IsA(Cat, Animal) looks like a graph with two vertexes, Cat and Animal, and a labelled edge, IsA, between them.  If I also know that IsA(Binky, Cat), then, in principle, I should be able to deduce that IsA(Binky, Animal).  This is a simple transitive relationship, and the act of logical deduction, for this example, is a simple graph re-write rule: If you see two IsA edges in a row, you should draw a third IsA edge between the first and the last vertex.  Easy, right?

So perhaps you’d think that all logic induction and reasoning engines have graph rewrite systems at their core, right? So you’d think. In fact, almost none of them do.  And those that do, do it in some internal, ad hoc, non-public, undocumented way: there’s no API, its not exposed externally; its not an ‘official’ part of the system for you to use or tinker with.

OK, so why do I need a graph re-write system? Well, I’ve been working on a natural language parser, a so-called Viterbi decoder for Link Grammar.  My initial graph is a string of words: a sentence.  The vertexes are words, and the edges are arrows called “next word”. Real simple. To parse this sentence, I want to apply a certain set of simple graph-rewrite rules: for example, if word X is a noun, then create an arrow, called ‘part-of-speech’ (POS),  from word X to the special vertex ‘noun’.  If the word immediately before word X is an adjective (i.e. if  it has a POS arrow pointing to ‘adjective’), then create a new arrow, called ‘noun modifier’, pointing from X to this word before it.   This kind of graph markup is called ‘dependency parsing‘, and is a very popular way of doing natural language parsing.  So you’d think that all dependency parsers have a graph re-write system at their core, right?  Hardly. In fact, just about none of them do.  And if they do, they’re walled off, hacked up, undocumented … you get the idea.

The only dependency parser that I know of that has an explicit graph-rewriting system in it, that is open for tinkering, and is documented (!) is RelEx.  And that’s great.  Wow!  Although RelEx invented and used its own, custom, graph rewrite system, I suppose that, in principle, it could have used some other, pre-existing system to do this (Well, it couldn’t, because in 2005, there weren’t any robust, open-source graph rewriting systems. Whatever).

What else do I want to do? Well, I daydream about using a machine-learning system to learn new rules!   I mean, this is the goal of AGI, right? Have a machine that can learn new things?  Well, to learn new rules, lets see, I need to have some simple syntax for representing rules.  Basically a simple graph language.  So you’d think that all graph re-writing systems have some simple, easy-to-use graph language, right?  No. Emphatically, hell no. With maybe one exception, you have to program in Java or C++ or C#.net. Unfortunately, my machine learning system doesn’t yet know how to program in those languages.

Here’s the leap of faith, the leap of logic that I’ve come to: It would be convenient if I could express graph re-write rules as graphs themselves.  It would be convenient if I could express logical implications as graphs themselves.  It would be convenient if  my graphical programming language itself could be written as a graph. Well, it can be. In fact, it is easiest to do this if the graph is actually a hypergraph. I’ll explain why in the next section below.  If I had a hypergraph re-writing system, than I would have a place where I could unify natural language processing, logical reasoning and machine learning, all in one place.  So you’d think that anyone who was trying to build an AGI system wouldbe writing its foundations on a hypergraph rewriting system, right? No, you’d be wrong. Apparently, OpenCog is the only system that does this.  Now, the OpenCog implementation has many design warts and architectural flaws. Its hard to understand and hard to use.  But now, perhaps, now you see why I’ve pledged allegiance to it,   instead of running off with some other reasoning system or parser or Bayesian network or whatever.

Mathematical Foundations

In this section, I will try to put all of the above comments on a solid mathematical footing, invoking model theory, category theory, (and even n-categories!), and type theory.  The upshot of all of this will be that the easiest way to represent data structures so that machine learning algorithms can learn them, and then apply them both to natural-language parsing, and to logical reasoning, is to represent the data structures as hypergraphs.

From model theory and computer science, we have the concept of a signature: a set of functions which take some number of arguments and return some value (just like a signature in Java or C++).  If one ignores types for a moment (which is what lisp and scheme do), then, in principle, one can pass any value in any position of any function, and stack these up arbitrarily, recursively, even.  This is called a term algebra, or more precisely a free term algebra or ‘free theory’.  If the functions don’t have names, but are anonymous, then one has the lambda calculus.

One way to envision a member of a term algebra is as a directed tree graph.  So, if we have two functions f(x,y) and g(x,y) and three constants a,b,c, then f(a, g(b,c)) is a binary tree, with f at the top node, and g as the left node, and a, b and c as the leaves. A term algebra is then just the collection of all such trees. Nothing more, nothing less.

To do useful programming, one also needs predicates or relations: things that have truth values, and order terms. Thus, ‘greater then’ is a relation, and ‘a>b’ is either true or false.  Relations can also be things like IsA, HasA, BelongsTo, LivesIn, EmployedAt. The last two examples should make clear that relational algebras form the foundation of databases, whether SQL or noSQL.  Relations are combined with logical operators (employee X LivesIn city Y AND ReportsTo dept Z is a textbook example).

In general, one combines both term algebras and relational algebras, so that one writes things like 3<f(x,y) where f(x,y) is a term, < is a relation, 3 is a constant.  Add to this the special free-variable binding operators ForAll and ThereExists, one gets a First-Order Logic. So, for example, ForAll x ThereExists y such that 3<f(x,y).

A special case of a relation is a term re-write rule.  This is a relation that takes a term, and replaces it with a different term: for example, ab->c, which says ‘whenever you see the string ab, replace it with c’. The BNF notation of computer languages is just a collection of term re-writing relations. One uses a term rewriting system to parse a (formal) language. Graph rewriting is just a variation of this: whenever you see a graph x, replace it with a graph y.

So far, I’ve avoided the issue of types.  In programming, types allow type safety.  Types make code more readable: f(string, int) is less mysterious than f(x,y). Types solve certain abstract recursion problems in lambda calculus.  A re-write rule in BNF notation is a typed rewrite rule: a substitution a->bc holds not just for any a, but specifically, only when a is a web page, or an IP address or a URL.  A graph re-write rule that says ‘whenever you see x, replace it with y’ implicitly demands that x be typed: x can’t be just anything, it has to be a specific kind of graph, having a specific shape and linkage.  The rule applies for all graphs that have this shape, that are of this kind or type.  So a re-write rule x->y is really a rule (type x)->(type y). Graphically, its still two points x and y, with a directed edge -> in between them. Oh, wait, x and y aren’t points, x and y are graphs.  What kind of a graph has graphs as points?  What kind of graph has edges between graphs? A hypergraph!

And that is the main Ah-HA! moment.  Once you see that, you start seeing hypergraphs everywhere. Sure, you can visualize Set(a,b,c) as a tree-graph, with Set as the parent node, and three children a,b,c.  Or you can visualize this as a hypergraph: Set as a ‘link’ (a ‘hyper-edge’ with 3 endpoints, not 2), and the points a,b,c as the nodes contained in the link.  In fact, all hypergraphs are dual to these directed trees; if you have one, you can have the other.  Hypergraphs are just a convenient notation.

Lets take a moment to look back on what just happened: a function f(x,y,z) is just a hyperedge f connecting three nodes x,y,z. A boolean expression a AND b AND c can be written as AND(a,b,c), which shows a specific example of a hypergraph equivalance. It can be written as a reduction rule: (a AND b AND c) -> AND(a,b,c) which is itself just a hypergraph of the form x->y with x and y being hypergraphs.  The first-order logic constructs ‘for-all’ and ‘there-exists’ are just special cases of the lambda-calculus binding operation lambda, which binds free variables in an expression. Again, hypergraphs: lambda is just a hyperlink that binds a variable x in an expression y, and y was just a term, ahem, hypergraph!

I mentioned categories and n-categories, and I suppose I should justify this mention. Insofar as category theory is the theory of points and arrows, then a rewrite rule between graphs is a morphism in the category of small diagrams.  A subtle but important point about category theory that is almost never discussed in intro-to-cat-theory texts, is that all objects are implicitly typed. In the the category of Sets, the objects are all of the same kind: they are sets.  Its not mentioned because in a given category, all objects are of the same type; types change only when a functor maps from one to another.   So, to understand the category-theoretic equivalent of types in computer science, we must think of functors.  But, as we just saw, a graph rewriting rule is a morphism between functors.  So you could say that graph re-writing is just the category Cat of small categories.  Or you could slide down this slope in a different direction, and start calling it a 2-category. Whatever.  Perhaps its useful to point out that graph rewriting algorithms are sometimes expressed as being one-pushouts or as being 2-pushouts, with a pushout being a certain category-theoretic concept. Notable, for graph rewriting, is that any category with pushouts and equalizers has all (co-)limits. Except that, as we just saw, we want hyper-graph rewriting systems, not graph rewriting systems. So there.

What else are they good for?

In OpenCog, the Link and Node types inherit from the type Atom. This naming convention is intentionally suggestive: ‘Atom’ is meant to invoke the notion of an ‘atomic formula’ from model theory or first-order logic: that is, a formula that has no variables in it (its fully grounded), and that does have a truth value (its not composed of boolean connectives, and has no quantifiers in it).  This suggestive naming helps establish the intended use of OpenCog hypergraphs with regards to first-order logic.

The truth value is a different matter. The default (simplest) OpenCog truth value is a pair of floating point numbers: a probability and a confidence. These numbers allow several other AI concepts to be mapped into hypegraphs: Bayesian networks, Markov networks, and artificial neural networks. All three of these are graphs: directed graphs, at that. They differ in how they assign and propagate floating-point probabilites, entropies, activations. Ideas such as Markov logic networks, which implement maximum entropy principles (aka Boltzmann parition function) on a network of first-order logic expressions, can be represented with OpenCog hypergraphs.  Oh, and I should mention PLN (Probabilistic Logic Networks), which is what the atomspace was originally designed for. That’s what I like about the OpenCog hypergraph atomspace: it has a tremendously powerful ability to succinctly and easily represent complex modern AI concepts.

The good, the bad and the ugly.

You’ve heard about the good.  Now for the bad and the ugly.  First, the OpenCog atomspace implementation is slow and inefficient, over-complicated, badly architected, weakly-distributed, non-scalable, single-threaded. But lets not go there.  All this might be fixable, after a lot of programming effort (and deep, architectural thinking). Its been hotly debated in the past. Someday, maybe it’ll get fixed.

The bad thing about the OpenCog atomspace is that almost no one understands that, ahem, it is a programming language. Let me be very clear on this: OpenCog implements graph re-writing rules with the ImplicationLink. A sequence of ImplicationLinks can be used to compute things. In that sense, it is somewhat like the language Graph Programs, except that OpenCog allows fractional truth values, and logic programming and other good things.  If we stick to using ImplicationLinks with crisp truth values (T/F), then the resulting system is essentially Prolog. Of course you know that Prolog is popular for AI programming projects, because its fairly easy to write reasoning engines and expert systems and the like in Prolog.  What you may not know is that closely related to Prolog is Answer-Set Programming (ASP) . In fact, ASP uses exactly the same notation as Prolog does. It differs in two important ways: first, when you run a Prolog program, you get one answer. With ASP, you get all of the answers!  Its dramatically more powerful, and the reason for this is that  modern-day ASP solvers are built on top of modern-day Boolean SAT solvers. Which means that they are stunningly efficient and effective.

So what does this have to do with OpenCog? Well, here we have a system that, using ImplicationLinks, is essentially Prolog, more or less, when run in crisp-logic mode. Or, you could say, its like typed Lambda calculus. But do we have a simple, easy-to-use syntax like Prolog for it? No we don’t. That’s bad. Can we take an existing Prolog program, run a tool on it, and convert it to ImplicationLinks? No we don’t.  Would it run fast? No it wouldn’t: it would probably be slower than the slowest Prolog ever: Borland prolog running on a 5MHz IBM PC AT in 1986.  And forget an ASP solver for OpenCog.  For the special case where all OpenCog truth values are crisp T/F values, we do not have a Boolean SAT solver to find solutions for our graphs of ImplicationLinks.  This is bad, Really Bad. But I think that this is because very few people seem to understand that the OpenCog Atomspace really is a petri dish for programming languages.

Heck, we don’t even have anything equivalent to the RelEx Sentence Algorithms for OpenCog, even though RelEx is OpenCog-like. This absence is slowing down my efforts to continue work on the Link-Grammar parser, and to move natural language processing out of its stand-alone arena, into a general, flexible framework.

(And we’ve barely scratched the surface. In order to make implication and pattern mining run quickly in the atomspace, we need to implement something like the concept of ‘memoization‘ from lisp/scheme. But it turns out that memoization is really just a relational algebra: it is a database of short expressions that stand in for long ones. The OpenCog Atomspace is also, among other things, a relational database that can store and query not only flat tables or key-value pairs, but full-blown hypergraphs. And this isn’t a day-dream; its crucial for performance (and its partially implemented)).

Why don’t we have these things? Well, its hard. Its just not easy. We don’t have the infrastructure to make it easy, and we don’t have the users who demand these tools.   I don’t think most users are even aware of what the atomspace could even do.   Almost no one is thinking about ‘how to program in the language of OpenCog’ even though it has the potential of far surpassing any of the existing probabilistic programming languages out there.  Its time to change all this, but it will take someone smart and dedicated to do this. Many someones. This could be you.

About Linas Vepstas

Computer Science Researcher
This entry was posted in Design, Introduction, Theory and tagged , , , , , , , , , . Bookmark the permalink.
  • http://www.facebook.com/people/Stephen-KIng/100000479464385 Stephen KIng

    Linas, waht you are discussing looks like Topos http://en.wikipedia.org/wiki/Topos

  • http://www.robodance.com/ roschler

    Hello Linas. Very interesting article. One question. You say when you run Prolog you only get one answer. But even disregarding predicates like findall(), Prolog’s backtracking engine by its very nature returns all answers as long as you ask for them (continue to “fail” until all answers are produced). Or am I misunderstanding something here?

  • http://www.facebook.com/people/Stephen-KIng/100000479464385 Stephen KIng

    Damn, your working with Ben! Nice! Please yell at him for me. ;-)

  • King Yin Yan

    Thanks Linas, I learned a bunch of things from it…
    Using Link Grammar as the knowledge representation itself… that’s a brilliant idea. The links provide grammatical information as to how to interpret the words (concepts). This seems better than my purely algebraic approach.

    On the cons side: the link grammar graph seems not to be a tree, so it may take some decomposition to cast it into algebraic form. The algebraic forms are usually more efficient to work with, rather than graph rewriting — that’s my impression.

  • Stephen Paul King

    Hi, You wrote “… all hypergraphs are dual to these directed trees…”. What duality is this?

  • Stephen Paul King
  • Mike Stay

    I agree wholeheartedly that some kind of hypergraph is a good way to encode knowledge. Good analogies are hypergraph homomorphisms (or at least spans of hypergraphs) where the same semantic structure appears in two different places.

    That said, I’m having trouble parsing what you’re saying about types in computer science and how rewrite rules are morphisms of functors.

    A (directed multi-) graph is a set V of vertices and a set E of edges, together with source and target maps from E to V. Given a graph, we can form the free category on the graph whose objects are vertices and whose morphisms are paths.

    There is a 2-category GRAPH of graphs, graph homomorphisms, and graph transformations. A graph homomorphism F:G₁ -> G₂ maps vertices of G₁ to vertices of G₂ and edges of G₁ to edges of G₂ such that source and target are preserved, i.e. adjacency is preserved. A graph transformation α:F => F’ between homomorphisms assigns to each vertex u in G₁ an edge αᵤ:Fu -> F’u in G₂ such that for any edge e:u->v in G₁, there’s a square with sides Fe, αᵥ, F’e, αᵤ in G₂.

    Graph homomorphisms are much stricter than arbitrary functors: they preserve edges, whereas functors only preserve paths. But as I understand graph rewrites, they are too general even to be functors because they may transform the graph in arbitrary ways. There’s certainly a category whose objects are graphs and whose morphisms are graph rewrites, but it would contain the underlying category of GRAPH as a subcategory. I don’t see how “a graph rewrite rule is a morphism of functors”.

    Computer science usually equates the notion of a data type with a computably enumerable set of values of that type; see, e.g. Hask, the category of Haskell types with Haskell functions between them. That said, one can certainly talk about doing n-category theory internal to Hask.

    Can you clarify a bit more what you have in mind?

  • Linas Vepstas

    Hi Mike,

    What I probably meant was “Yes, I’m coming to dinner, let me finish this sentence by writing the first thing that enters my mind.” As I reconstruct what I was thinking, I think I meant to write “a functor between morphisms”. The idea to even mention n-categories was uncontemplated; it was a last-minute inspiration to add to something already rushed, and so perhaps a poor decision.

    Everything you say sounds correct. Perhaps I am talking about Hask. The gist of what I am trying to say is that there are mutiple different languages for talking about such things, and they include model theory, category theory and type theory — but the devil is in the details, which are carefully avoided in the
    above. Sooo …

    Let me try to reconstruct what I must have been thinking at the time. So …
    lets begin with the naive view:

    (a AND b AND c) is a hypergaph (object)

    AND(a,b,c) is a hypergaph (object)

    so a re-write rule (a AND b AND c) -> AND(a,b,c) is a morphism.

    So, in this naive view, we just have a category of (certain, specific, given) hypergraphs, and nothing more. (Its a category because morphisms are always composable, and there’s an identity morphism for each object, and there are no other constraints to be satisfied).

    But … viewed as type-theory types, (a AND b AND c) is a different type than AND(a,b,c) I’ve abused notation a little bit here, and, for the first one, I should have written AND(AND(a,b), c) to emphasize its really two (associative) binary operations. Since they’re of different types, they can’t actually belong to the same category (that is, if you buy my argument that all objects in a category are defacto always the same type, and only functors have the ability to change types. — which is, perhaps a novel and possibly faulty argument. It seems right, but is surely a bit loose.)

    So, AND(AND(a,b), c) must live in one category, and AND(a,b,c) must live in another. What categories could these be, and is -> an actual functor between them?

    The category on the left would seem to be something that has expressions
    from a term algebra as objects, and valid re-write rules as morphisms. The specific term algebra could be more-or-less the free algebra generated by a 2-ary function symbol ‘AND’ (and, to keep it easy, no constants and no relation symbols). The re-write rules would be anything that is admissible, when its interpreted as a boolean algebra (viz associativity, commutativity).
    So, an example of a valid re-rewrite rule (i.e. morphism) would be AND(AND(a,b),c) => AND(a,AND(b,c)) (emphasis: AND(AND(a,b),c) is an object) (We can get fancier by admitting ‘OR’, ‘NOT’, constant symbols, etc. In either case, the category is rather small, poor, its not one that would ever have a name, the way that Grph or Top do.)

    The category on the right would be very similar, — the objects would also be from a (free) term algebra, but this this term algebra has a differnt signature — it has a 3-ary function symbol ‘AND’ (which is also associative, etc) The morphisms are again the re-write rules that express associativity, etc. (we can get fancier by adding a ternary ‘OR”, etc.)

    With this interpretation, is -> a functor? I think it is: it carries objects to objects, and it carries morphisms to morphisms, and it preserves the underlying boolean semantics (interpretation?!) of it all.

    The above should generalize just fine: we can have some category A whose objects are terms of some particular, given term algebra, and whose morphisms are (certain, particular, given) re-write rules (appropriate for whatever it is that A is modelling). We can likewise have a category B, with different objects and rules, but same idea. Do functors between these exist? In general, yes. Presumably a lot of them if A and B are both models for the same thing. But now I’m on quick-sand, since the mere mention of the word ‘model’ implies that all sorts of lemmas would need to be checked. Are A and B representatives of some 2-category of maps between models? Naively, I think so, but I’m not sure what this insight offers, other than some minor mental gymnastics. The more standard textbook approach would be to dispense with this, and call A and B interpretations; the problem there is that model theory obscures the fairly important notion of re-writing, while both model theory and category theory ignore the apparently important concepts of type theory. So I’m kind-of stuck hand-waving trying to say “these are all similar-but-different languages for talking about something important”.

  • Linas Vepstas

    Hi, Sorry, I just spotted this question — the answer is given in wikipedia; pay attention to the part that talks about incidence graphs and Levi graphs.

  • Linas Vepstas

    Sorry, I just saw this comment. My statement about prolog merely parrots ‘common knowledge’ about answer-set programming (ASP). The problem is that back-tracking is stunningly slow; boolean SAT is stunningly fast. So, in ASP, you can ask ‘is this 39-bit expression equal to that one?’ and get a yes/no answer in a second or two, whereas in prolog, you would have to try all 2^39 bit-value assignments, which would take hours or days. So, notationally, Prolog ans ASP are quite similar; operationally, they are designed for different tasks and have very different implementations under the covers (and if Prolog systems have recently sprouted boolean SAT under the covers, that would be news to me.)

  • Mike Stay

    OK. What you’re saying sounds very much like the theorem that if a category has binary products and a terminal object then it has all finite products, since products are defined only up to isomorphism.

    Lawvere did a bunch of work on importing universal algebra into category theory. A Lawvere theory is a cartesian category presented by a single sort, a collection of function symbols with their arities, and some relations. A model of a Lawvere theory is a cartesian functor from the theory into Set; these form a category by taking cartesian natural transformations between them. For instance, the Lawvere theory of Groups, Th(Grp), has the following presentation:

    - one sort G

    - a function symbol e:1 -> G
    - a function symbol m:G x G -> G
    - a function symbol inv:G -> G

    - a relation m(m(a,b),c) = m(a, m(b,c))
    - a relation m(e,a) = a = m(a, e)
    - a relation m(a, inv(a)) = e

    The collection of cartesian functors from Th(Grp) -> Set together with cartesian natural transformations between them is equivalent to the category of groups and group homomorphisms.

    It sounds like you want a similar theorem that the two categories of models given by the two theories you describe are equivalent; that’s very reasonable.

  • Linas Vepstas

    Yes, Sheaves offer yet another way to think about things. Its difficult, but ultimately insightful, to try to simultaneously state the general problems/issues of ‘representing knowledge’ using each f the frameworks of Topoi, Model Theory, Type Theory and Category Theory — they all offer different perspectives, but none seem to capture it all. I’m struggling with getting a unified view, from all perspectives.

    (Aieee … and we haven’t even mentioned probability; I’m convinced that maximum entropy principles are the correct way to deal with probability in this context, but how do you apply maximum entropy (aka partition function) to any of Topoi, Model Theory, Type Theory or Category Theory, without getting a total hash? And yet, that is the goal, in a certain way — to find the correct probabilisitc framework that captures these different systems).

  • Stephen Paul King

    I agree, MaxEnt is the way to go if we are only looking at static systems. I am using Adom Giffin (and Ariel Caticha) ‘s work for the dynamic case. http://arxiv.org/abs/0708.1593 :-)

  • Stephen Paul King

    AH! also “the operation of taking the dual of a hypergraph is an involution”. Thank you for the pointer, I missed this somehow…

  • Stephen Paul King

    Another way of looking at sheaves may be to consider a pair of them and the connections (?) between then as some form of a bipartite graph.

  • Giorgio Mossa

    I’ve found the post really interesting but I still confused about some details on the use of category theory, I hope you can help me solve my doubts.

    Let me see if I get what you mean: rewriting system (of hypergraph) can be given the structure of a category having hypergraphs as objects and rewriting rules as morphisms.

    In the second part you seem to try to look to hypergraph as some of schemes (types), so I suppose you mean that an hypergraph represents the type whose values are all the (concrete)-hypergraphs that one obtains if replaces all the variable-leaves in the hypergraph with concrete values, am I right?
    Then you say that every rewriting rule induce a morphism between such types, but now I don’t get why this should be functor. The point is that each of these types (as I interpreted) has no internal structure, they are not categories, or at least I don’t see any morphisms between such values of the type, unless we consider the discrete structure which is given by equality between the values of the types, which give this types the structure of a discrete category.

    It seems to me that rewriting rules should simply give functions between the said types.

    Anyway there’s a link between this two constructions: indeed the construction with hypergraphs as types give can be viewed as a sort of concretization of the first category (having hypergraphs as objects and rewriting rules).
    What I mean by concretization is that there’s a functor from the category of (abstract/with variables) hypergraphs and rewriting rules to the category of types and function between them, which associates to every hypergraph a type and to every rewriting rule between two hypergraphs associates a function between the types corresponding to the hypergraphs.

    This sort of construction is equivalent to see the first category (the rewriting system) as a theory (or a specification) and the functor is nothing more than a model from this theory in the category of types.

    Did I get what you mean? or did I miss something?

    Forget this long reply, but I found really interesting all this stuff and I really would like to understand it right :)

    p.s. what follows here are some random ideas so I apologies if is not so clear.

    This construction doesn’t bring up higher-category theory….for now, but …
    We could consider another category: consider a category having as objects all the possible instances of all hypergraphs (i.e. a sort of sum-type of all the types of graphs) and as morphisms the rewriting rules between graphs, or to be more exact labeled transition induced by such rules (i.e. ordered triple formed by the hypergraphs linked by the rule and the ‘name’ of the rule that enables to pass from the first hypergraph to the second one). This data give a category but have another structure: there’s a functor from this category to the category of hypergraphs defined in the beggining, and if I’m right this functor should be a fibration, which is a structure which make enter 2-category theory into play.

    What you think about this?

  • Linas Vepstas

    Hi Mike, Yes; I’ve been reading texts on topos theory (MacLane/Moerdijk), type theory(HoTT book), universal algebra(Cohen), term rewriting(Bader/Nipkow), model theory(Hodges) more or less in parallel. Its clear that there is a considerable amount of commonality and overlap; I presume that this is ‘well known’ to active practitioners (e.g. anyone seriously working on CaML/Haskell), so I’m not trying to state any novel or unusual theorems; they’d all be shallow. This blog post is, in particular, aimed at newcomers to opencog, don’t have this kind of formal math background, and are unclear/disbelieving on the design choices.

    I do believe that clarity on these issues helps. I recently saw a talk on the machine-learning of natural language from mother-child texts. Among the very first things that the speaker said amounted to “Categorial grammers rule, and all other theories of language suck.” A categorial grammar is a certain type of grammar based on category theory, using some very precise definitions (viz mathematicians like it). What seems to be completely unknown is that is completely isomorphic to link grammar (I’ll state that as a theorem; I think its mostly straightforward to prove; I’ve sketched out the proof but not carefully verified it.) The notation used in link-grammar is tremendously simpler than that for categorial grammar (but is no longer ‘obviously’ based on category theory). There’s a simple algorithm to convert the notation of one to the notation of the other. My point is that simply knowing these things can avoid boners such as “Categorial grammers rule, and all other theories of language suck.”

    BTW, link-grammar seems to be more-or-less isomorphic to phrase-structure grammars, from what I can tell; so there aren’t really all these multiple theories of grammar; in a certain sense, they’re all the same, and can be re-written into one-another using certain, ummm, graph isomorphisms :-) It probably is useful to expand/expound on these as theorems, as it seems to be something that linguists go to war about.

  • Linas Vepstas

    Giorgio,, no, do NOT replace the variables by concrete values! So, for example, in the programming languages CaML, Haskell, (as well as in type theory), the expression P->Q stands for a function that takes a value of type P as input, and returns a value of type Q. Note that *any* function that takes the type P as input, and returns Q can be said to have “type P->Q”. Suppose I have such a function f(q). In order to talk about it, I don’t really want to replace the variable q by all possible concrete values for it (nor do I want to evaluate f to see what value it returns for some given input).

    There are two reasons for not wanting to do this:
    – it leads to confusion: are we talking about the substituted or the unsubstituted forms, the evaluated or unevaluated forms?
    – it takes real time and effort to actually perform substitutions and evaluations. If done on a computer, it takes up actual CPU cycles. If done by hand, it requires writing things down on paper. (Substitution is known as “beta reduction”, algorithms for doing it efficiently are studied!)

    Making a claim that something is a category is actually rather shallow: one merely has to have some collection of dots and arrows, and show that the arrows are composable, and that every dot has an arrow from itself to itself. That’s it. There’s really “nothing more” to a category.

    In practice, each dot typically has some complicated “internal structure”, and that’s where things get interesting (and confusing). It gets particularly confusing when the internal structure is also dot-n-arrow-like. (so, for groups, multiplication can be thought of as an arrow … well, heck, any function from any x to y can be thought of as an arrow …) so one has to take pains to keep the different arrows apart, as well as to avoid accidentally doing beta-reduction unless it was intended.

    You might find these two books provide exactly what you need:
    “Term rewriting and all that”, Bader and Nipkow and also the chapter on type theory from the HoTT book (its chapter 2 or 3, and it stands on it own, and it provides a good stand-alone review of type theory.)

  • Linas Vepstas

    Yes, thanks. I’ve been struggling to understand non-commuting constraints, so this is timely. Reading now.

    BTW: Mike Stay and John Baez might be interested in a simple explanation of the difference between quantum and classical dynamics, illustrated using baseball. Imagine a baseball diamond with a tree growing at third base. Where should a baseball-catching robot stand? The classical dynamics answer is that it should stand at the location of the (single) most-likely baseball trajectory (i.e. smack-dab in the middle of center field). Only one trajectory has this maximum expectation value. The fact that a tree is nearby does not change this value at all — the most likely path for the baseball is over dead-center second base.

    The quantum dynamics answer is to consider all possible trajectories, including those that hit the tree, and realize that the tree-hitting trajectories are nowhere near center field, and that, to maximize the likelihood of catching the ball, we should probably stand off-center, closer to the tree (to sometimes catch balls that hit the tree). This integral over all-possible paths is known as the partition function in quantum field theory. (Also called partition function in MaxEnt, as well).

    I’m not sure if the tree is a “non-commuting constraint” or not, but the point is that path integrals typically do show you how to correctly handle the non-commuting operators of quantum mechanics. I haven’t entirely grasped how this all works for ordinary probability and machine learning, though.

  • Stephen Paul King

    One thing to note about your example of the tree at third base. Its presence will distort the pdf away from a nice Gaussian…

  • Giorgio Mossa

    Thanks for the references I’m going to take a look to those books.

  • MDude

    Something I’ve been wondering about hypergraphs: Wouldn’t they be representable in a regular graph system by dividing the graphs into “link-nodes” and “node-nodes”, with each type of node only able to link to the other?

  • http://linas.org Linas Vepstas

    sorry for late reply, I don’t get notifications of comments. Yes, hypergraphs are representable as graphs, as Levi graphs, if I remember correctly. The wikipedia article has specific details.

  • Renato Freitas Martins

    Hi,
    Does anybody knows a good clustering method for hypergraph? I’m trying to use it as a part of my dissertation. Thank you in advance!

  • qd

    >the OpenCog atomspace implementation is slow and inefficient, over-complicated,

    >badly architected, weakly-distributed, non-scalable, single-threaded

    Have you considered using a graph database such as OrientDB (in Java, but with many other bindings available) for the HyperGraph? Unless I’m misunderstanding, seems like that could address many of those issues.

    https://groups.google.com/forum/#!topic/orient-database/_dHciP_VnAE

    http://www.orientechnologies.com/orientdb/

  • Derek P. Moore

    I bet Rosvall’s Infomap could me modified for hypergraphs. It seems his compression of random walks would still apply.

  • ros ros

    Use hypergraph partitioning tools, they are not labeled as clusterers in the literature, rather partitioners – but they are natural clusterers as well. They are quite mature – over 15 years of experience. Hypergraphs find actually their applications in VLSI domain quite often.

    For tools, see

    PaToH
    hmetis
    Mondriaan.

    PaToH is the best.

  • Renato Freitas Martins

    Thanks folks!