The goal of AGI is to create a thinking machine, a thinking organism, an algorithmic means of knowledge representation, knowledge discovery and self-expression. There are two conventional approaches to this endeavor. One is the ad hoc assembly of assorted technological pieces and parts, with the implicit belief that, after some clever software engineering, it will just come alive. The other is to propose some grand, over-arching theory-of-everything that, once implemented in software, will just come alive and become the Singularity.
This blog post is a sketch of the second case. As you read what follows, your eyes might glaze over, and you might think to yourself, “oh this is silly, why am I wasting my time reading this?” The reason for your impatience is that, to say what I need to say, I must necessarily talk in such generalities, and provide such silly, childish examples, that it all seems a bit vapid. The problem is that a theory of everything must necessarily talk about everything, which is hard to do without saying things that seem obvious. Do not be fooled. What follows is backed up by some deep and very abstract mathematics that few have access to. I’ll try to summon a basic bibliography at the end, but, for most readers who have not been studying the mathematics of knowledge for the last few decades, the learning curve will be impossibly steep. This is an expedition to the Everest of intellectual pursuits. You can come at this from any (intellectual) race, creed or color; but the formalities may likely exhaust you. That’s OK. If you have 5 or 10 or 20 years, you can train and work out and lift (intellectual) weights. You can get there. And so… on with the show.
The core premise is that “everything is a network” — By “network”, I mean a graph, possibly with directed edges, usually with typed edges, usually with weights, numbers, and other data on each vertex or edge. By “everything” I mean “everything”. Knowledge, language, vision, understanding, facts, deduction, reasoning, algorithms, ideas, beliefs … biological molecules… everything.
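To make "everything is a network" concrete, here is a minimal sketch of such a graph in Python: directed edges, each carrying a type label and a weight. The class name, edge types, and weights are all invented for illustration; they are not from any actual codebase.

```python
# A toy "graph of everything": directed edges with a type and a weight.
class Graph:
    def __init__(self):
        self.edges = []  # list of (source, edge_type, target, weight)

    def add(self, source, edge_type, target, weight=1.0):
        self.edges.append((source, edge_type, target, weight))

    def neighbors(self, source, edge_type=None):
        # All (target, weight) pairs reachable from source, optionally
        # restricted to one edge type.
        return [(t, w) for (s, ty, t, w) in self.edges
                if s == source and (edge_type is None or ty == edge_type)]

g = Graph()
g.add("thigh-bone", "connected-to", "hip-bone", weight=0.98)
g.add("car", "has-part", "wheel", weight=0.95)
g.add("car", "has-part", "windshield", weight=0.90)
```

Everything in the list above — facts, deductions, beliefs — is stored the same way: as typed, weighted edges between vertices.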
A key real-life, observed fact about the “graph of everything” is that it consists almost entirely of repeating sub-patterns. For example, “the thigh bone is connected to the hip bone” — this is true generically for vertebrates, no matter which animal it might be, whether it’s alive or dead, imaginary or real. The patterns may be trite, or they may be complex. For images/vision, an example might be “select all photos containing a car” — superficially, this requires knowing what cars look like, and which parts of the pattern are important (wheels, windshields) and which are not (color, parked in a lot or flying through space).
The key learning task is to find such recurring patterns, both in fresh sensory input (what “the computer” is seeing/hearing/reading right now) and in stored knowledge (when processing a dataset of previously-learned, remembered knowledge — for example, a dataset of medical symptoms). The task is not just “pattern recognition”, identifying a photo of a car, but pattern discovery — learning that there are things in the universe called “cars”, and that they have wheels and windows — extensive and intensive properties.
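The distinction between recognition and discovery can be sketched in a few lines. Below, a toy learner is handed raw part-of observations (the data and names are made up for illustration) and notices, with no labels given, that several otherwise unrelated things share a connection signature — the shared core is the discovered "car" pattern, while paint color and mud are incidental, just as color was above.

```python
from collections import defaultdict

# Raw observations: (thing, part) pairs, with no labels or categories given.
observations = [
    ("sedan", "wheel"), ("sedan", "windshield"), ("sedan", "red-paint"),
    ("truck", "wheel"), ("truck", "windshield"), ("truck", "mud"),
    ("kite",  "string"), ("kite", "tail"),
]

# Signature of each thing: the set of parts observed on it.
signature = defaultdict(set)
for thing, part in observations:
    signature[thing].add(part)

# Count how many distinct things exhibit each part.
part_counts = defaultdict(int)
for parts in signature.values():
    for p in parts:
        part_counts[p] += 1

# Parts shared by more than one thing form the candidate recurring pattern.
pattern_core = {p for p, n in part_counts.items() if n > 1}
```

Here `pattern_core` comes out as the wheels-and-windshield cluster: the learner has discovered that there is a recurring kind of thing, without ever being told "car".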
Learning does not mean “training” — of course, one can train, but AGI cannot depend on some pre-existing dataset, gathered by humans, annotated by humans. Learning really means that, starting from nothing at all, except one’s memories, one’s sensory inputs, and one’s wits and cleverness, one discovers something new, and remembers it.
OK, fine, the above is obvious to all. The novelty begins here: the best way to represent a graph with recurring elements in it is with “jigsaw puzzle pieces” (and NOT with vertices and edges!). The pieces represent the recurring elements, and the “connectors” on each piece indicate how the pieces are allowed to join together. For example, the legbone has a jigsaw-puzzle-piece connector on it that says it can only attach to a hipbone. This is true not only metaphorically, but (oddly enough) literally! So when I say “everything is a network” and “the network is a composition of jigsaw puzzle pieces”, the deduction is “everything can be described with these (abstract) jigsaw pieces.”
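The legbone-hipbone example can be written down directly. In this sketch, a piece is just a name plus a bag of typed connectors, each a tab ("+") or a slot ("-"), and two pieces may join only where a tab mates with a slot of the same type. The class, the connector names, and the tab/slot encoding are illustrative assumptions, not the actual representation used in any real system.

```python
# A "jigsaw piece": a thing plus its typed connectors.
class Piece:
    def __init__(self, name, connectors):
        self.name = name
        self.connectors = connectors  # e.g. {"hip-joint": "+"}: tab or slot

def can_join(a, b, ctype):
    # A "+" tab on one piece mates with a "-" slot of the same type
    # on the other; anything else is not a legal join.
    return {a.connectors.get(ctype), b.connectors.get(ctype)} == {"+", "-"}

legbone = Piece("legbone", {"hip-joint": "+"})
hipbone = Piece("hipbone", {"hip-joint": "-"})
ribcage = Piece("ribcage", {"spine-joint": "-"})
```

The point of the representation is that legality of assembly lives on the pieces themselves: the legbone simply has no connector that the ribcage can mate with.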
That this is the case in linguistics has been repeatedly rediscovered by more than a few linguists. It is explained perhaps most clearly and directly in the original Link Grammar papers, although I can point at some other writings as well; one from a “classical” (non-mathematical) humanities-department linguist; another from a hard-core mathematician – a category theorist – who rediscovered this from thin air. Once you know what to look for, it’s freakin’ everywhere. Say, in biology, the Krebs cycle (citric acid cycle) – some sugar molecules come in, some ATP goes out, and these chemicals relate to each other not only abstractly as jigsaw pieces, but also literally, in that they must have the right shapes! The carbon atom itself is of this very form: it can connect, by bonds, in very specific ways. Those bonds, or rather, the possibility of those bonds, can be imagined as the connecting tabs on jigsaw-puzzle pieces. This is not just a metaphor; it can also be stated in a very precise mathematical sense. (My lament: the mathematical abstraction needed to make this precise puts it out of reach of most.)
The key learning task is now transformed into one of discerning the shapes of these pieces, given a mixture of “what is known already” plus “sensory data”. The scientific endeavor is then: “How to do this?” and “How to do this quickly, efficiently, effectively?” and “How does this relate to other theories, e.g. neural networks?” I believe the answer to the last question is “yes, it’s related”, and I can kind-of explain how. The answer to the first question is “I have a provisional way of doing this, and it seems to work”. The middle question – efficiency? Ooooof. This part is … unknown.
There is an adjoint task to learning, and that is expressing and communicating. Given some knowledge, represented in terms of such jigsaw pieces, how can it be converted from its abstract form (sitting in RAM, on the computer disk) into communications: a sequence of words or sentences, or a drawing or painting?
That’s it. That’s the meta-background. At this point, I imagine that you, dear reader, probably feel no wiser than you did before you started reading. So … what can I say to impart actual wisdom? Well, let’s try an argument from authority: a jigsaw-puzzle piece is an object in an (asymmetric) monoidal category. The internal language of that category is … a language … a formal language having a syntax. Did that make an impression? Obviously, languages (the set of all syntactically valid expressions) and model-theoretic theories are dual to one another (this is obvious only if you know model theory). The learning task is to discover the structure, the collection of types, given the language. There is a wide abundance of machine-learning software that can do this in narrow, specific domains. There is no machine-learning software that can do this in the fully generic, fully abstract setting of … jigsaw puzzle pieces.
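To make "syntactic, lexical element" tangible, here is a toy Link-Grammar-style check: each word is a jigsaw piece whose connectors point right ("+") or left ("-"), and a word sequence is grammatical exactly when every connector finds a mate. The three-word lexicon and the greedy matcher are deliberately simplistic sketches of the idea, not the real Link Grammar parser.

```python
# Toy lexicon: each word lists its connectors.
# D = determiner-noun link, S = subject-verb link.
LEXICON = {
    "the":    ["D+"],        # wants a noun to its right
    "cat":    ["D-", "S+"],  # wants a determiner leftward, a verb rightward
    "sleeps": ["S-"],        # wants a subject to its left
}

def links(words):
    """Mate each X- with the earliest still-pending X+; return the links,
    or None if any connector goes unmatched (i.e. not grammatical)."""
    pending, made = [], []   # pending: (position, type) of unmet "+"
    for i, w in enumerate(words):
        for c in LEXICON[w]:
            ctype, sign = c[:-1], c[-1]
            if sign == "+":
                pending.append((i, ctype))
            else:  # "-": must mate with an earlier, still-pending "+"
                match = next((p for p in pending if p[1] == ctype), None)
                if match is None:
                    return None
                pending.remove(match)
                made.append((match[0], i, ctype))
    return made if not pending else None
```

Running this on "the cat sleeps" yields the two links (the–cat via D, cat–sleeps via S); scrambling the words leaves a connector dangling and the parse fails. That pass/fail behavior is exactly the "set of all syntactically valid expressions" above, generated by nothing but piece shapes.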
Don’t laugh. Reread this blog post from the beginning, and everywhere that you see “jigsaw piece”, think “syntactic, lexical element of a monoidal category”, and everywhere you see “network of everything”, think “model theoretic language”. Chew on this for a while, and now think: “Is this doable? Can this be encoded as software? Is it worthwhile? Might this actually work?”. I hope that you will see the answer to all of these questions is yes.
And now for the promised bibliography. The topic is both deep and broad. There’s a lot to comprehend, a lot to master, a lot to do. And, ah, I’m exhausted from writing this; you might be exhausted from reading. A provisional bibliography can be obtained from two papers I wrote on this topic:
The first paper is rather informal. The second invokes a bunch of math. Both have bibliographies. There are additional PDFs in each of the directories that fill in more details.
This is the level I am currently trying to work at. I invite all interested parties to come have a science party, and play around and see how far this stuff can be made to go.
p.s. Did I forget to talk about semantics and meaning? That would appear to be the case, as these words do not appear above. Fear not! It’s all in there! You can get a glimmer of this by examining the marvelous bridge from mundane syntax to deep semantics provided by Mel’čuk’s Meaning-Text Theory. I’ve convinced myself that it goes much farther than that, but I have not the ability to convince you of it, unless you arrive at that conclusion yourself.
p.p.s. Yes, these kinds of ideas have been circulating for decades, or longer. My overt goal is to convert these ideas into working software. Towards this end, some fair amount of progress has been made; see the learn and the generate github repos. The meta-goal is to recruit enough interested parties to pursue this at something more than a snail’s pace of progress.
Reminds me a bit of Ulf Grenander’s Pattern Theory, some of which I learned a while ago. Could be interesting to you in case you haven’t come across it.
Care to describe it in greater detail? Search engines immediately point at this: David Mumford, “Pattern Theory: A Unifying Perspective” (1994), which hits on some of my favorites: Bayes (eqn 1) and maximum likelihood (eqn 2) and pattern synthesis (figure 4, figure 12) and Gibbs fields (eqn 13). It then promptly gets tangled in a bunch of old ideas that I dislike, and want to run away from: signal analysis, wavelets, image processing. Image segmentation is indeed a key idea, but the results of segmentation need to be converted into an image grammar.
So, it’s got some of the key ideas, it says some of the important things, but leaves off the one thing I insist is absolutely vital: grammar. Well, two: not just grammar, but specifically, Link Grammar. We’ve had Chomskyan grammars since the 1960’s, context-free and context-sensitive and what-not, evolved into head-driven phrase structure grammars (HPSG) and similar. Which are all very nice, but toss up a stumbling block: how can one unify grammar with the ideas of Grenander, as explicated by Mumford? Well, you can’t, or, it’s hard, or just not obvious, or brain-smashingly confusing. The jigsaw puzzle-piece metaphor for grammar offers the way out. It shows, for example, how a segmented image can be treated as a collection of jigsaw puzzle pieces, and Link Grammar shows exactly how jigsaw pieces form a grammar. For the HPSG fans: please note, dependency grammars (DG) can be algorithmically converted into HPSGs. What I’m saying is that DGs are simpler for practical applications, such as image segmentation.
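The "segmented image as jigsaw pieces" claim can be sketched in the same connector style as above: each segment becomes a piece, its shared borders become its connectors, and an image-grammar rule just says which piece types are allowed to border which. The segment labels, the adjacency data, and the rule table below are all invented for illustration; a real image grammar would be learned, not hand-written.

```python
# Output of a hypothetical segmenter: segment label -> labels it borders.
adjacency = {
    "sky":     {"treetop", "roof"},
    "roof":    {"sky", "wall"},
    "wall":    {"roof", "ground"},
    "ground":  {"wall"},
    "treetop": {"sky"},
}

# A toy image grammar: piece type -> piece types it may legally touch.
GRAMMAR = {
    "sky":     {"treetop", "roof"},
    "roof":    {"sky", "wall"},
    "wall":    {"roof", "ground"},
    "ground":  {"wall"},
    "treetop": {"sky"},
}

def well_formed(adjacency, grammar):
    """The segmentation 'parses' iff every border is licensed by a rule."""
    return all(nbr in grammar.get(seg, set())
               for seg, nbrs in adjacency.items() for nbr in nbrs)
```

A scene where the sky borders the ground with nothing in between fails to parse, in exactly the way an ungrammatical sentence fails: a connector with no licensed mate.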