This post presents some speculative ideas and plans, but I broadcast them here because I think they are of particular strategic importance for the OpenCog project…
The topic is: how OpenCog and “current-variety deep learning perception algorithms” can help each other.
Background: Modern Deep Learning Networks
“Deep learning” architectures have worked wonders on visual and auditory data in recent years, and have also shown interesting but more limited results on other sorts of data, such as natural language. The impressive applications have all involved training deep learning nets using a supervised learning methodology on large training corpora, and the particulars of the network tend to be specifically customized to the problem at hand. There is also work on unsupervised learning, though so far purely unsupervised learning has not yielded practically impressive results. There is not much conceptually new in the recent deep learning work, and nothing big that’s new mathematically; it’s mostly the availability of massive computing power and training data that has led to the recent, exciting successes…
These deep learning methods are founded on broad conceptual principles, such as
- intelligence consists largely of hierarchical pattern recognition: recognition of patterns within patterns within patterns…
- a mind should use both bottom-up and top-down dynamics to recognize patterns in a given data item, based both on the item’s own properties and on experience from looking at other items
- in many cases, the dimensional structure of spacetime can be used to guide hierarchical pattern recognition (so that patterns higher up in the hierarchy pertain to larger regions of spacetime)
However, the tools normally labeled “deep learning” these days constitute a very, very particular way of embodying these general principles, using certain sorts of “formal neural nets” and related structures. There are many other ways to manifest the general principles of “hierarchical pattern recognition via top-down and bottom-up learning, guided by spatiotemporal structure.”
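As a toy illustration of these general principles (and not of any particular deep learning system), here is a minimal bottom-up hierarchy over a one-dimensional binary signal. Each layer recognizes named patterns in small windows, and each layer’s windows cover a larger region of the raw input than the layer below, so higher-level patterns pertain to larger regions of “spacetime.” All the pattern names and window sizes here are invented for illustration.

```python
# Toy two-level bottom-up pattern hierarchy over a 1-D binary signal.
# Layer 1 detects local patterns in windows of width 2; layer 2 detects
# patterns among layer-1 outputs, so it effectively sees windows of width 4.

def detect(signal, patterns, width):
    """Slide a non-overlapping window over `signal`; emit the name of any
    pattern matched (or "none")."""
    out = []
    for i in range(0, len(signal) - width + 1, width):
        window = tuple(signal[i:i + width])
        out.append(patterns.get(window, "none"))
    return out

# Layer 1: local patterns over the raw input (windows of width 2)
layer1_patterns = {(0, 1): "rise", (1, 0): "fall", (1, 1): "high", (0, 0): "low"}
# Layer 2: patterns over pairs of layer-1 labels (each covering width 4 of input)
layer2_patterns = {("rise", "fall"): "bump", ("fall", "rise"): "dip"}

signal = [0, 1, 1, 0, 1, 0, 0, 1]
l1 = detect(signal, layer1_patterns, 2)   # ['rise', 'fall', 'fall', 'rise']
l2 = detect(l1, layer2_patterns, 2)       # ['bump', 'dip']
```

Real deep networks replace the hand-written pattern dictionaries with learned weights, and mix top-down feedback into the recognition; but the hierarchical, region-widening structure is the same.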
The strongest advocates of the current deep learning methods claim that the deep networks currently used for perception can be taken as templates, or at least close inspirations, for creating deep networks to be used for everything else a human-level intelligence needs to do. The use of human-labeled training examples obviously doesn’t constitute a general-intelligence-capable methodology, but if one substitutes a reinforcement signal for a human label, then one has an in-principle workable methodology.
Weaker advocates claim that networks such as these may serve as a large part of a general intelligence architecture, but may ultimately need to be augmented by other components with (at least somewhat) different structures and dynamics.
It is sometimes suggested that the “right” deep learning network might serve the role of the “one crucial learning algorithm” underlying human and human-like general intelligence. However, the deep learning paradigm does not rely on this idea… it might also be that a human-level intelligence requires a significant number of differently-configured deep networks, connected together in an appropriate architecture.
Deep Learning + OpenCog
My own intuition is that, given the realities of current (or near-future) computer hardware technology, deep learning networks are a great way to handle visual and auditory perception and some other sorts of data processing; but that for many other critical parts of human-like cognition, deep learning is best suited for a peripheral role (or no role at all). Based on this idea, Ted Sanders, Jade O’Neill and I did some prototype experiments a few years ago, connecting a deep learning vision system (DeSTIN) with OpenCog by extracting patterns from DeSTIN states over time and importing relations among these patterns into the OpenCog Atomspace. This prototype work served to illustrate a principle, but did not represent a scalable methodology (the example dataset used was very small, and the different components of the architecture were piped together using ad hoc specialized scripts).
I’ve now started thinking seriously about how to resume this direction of work, but “doing it for real” this time. What I’d like to do is build a deep learning architecture inside OpenCog, initially oriented toward vision and audition, with a goal of making it relatively straightforward to interface between deep learning perception networks and OpenCog’s cognitive mechanisms.
What cognitive mechanisms am I thinking of?
- The OpenCog Pattern Miner, written by Shujing Ke (in close collaboration with me on the conceptual and math aspects), can be used to recognize (frequent or surprising) patterns among states of a deep learning network — if this network’s states are represented as Atoms. Spatiotemporal patterns among these “generally common or surprising” patterns may then be recognized and stored in the Atomspace as well. Inference may be done, using PLN, on the links representing these spatiotemporal patterns. Clusters of spatiotemporal patterns may be formed, and inference may be done regarding these clusters.
- Having recognized common patterns within a set of states of a deep network, one can then annotate new deep-network states with the “generally common patterns” that they contain. One may then use the links known in the Atomspace regarding these patterns, to create new *features* associated with nodes in the deep-network. These features may be used as inputs for the processing occurring within the deep network.
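To make these two mechanisms concrete, here is a small self-contained sketch in plain Python (this is not the actual OpenCog Pattern Miner or Atomspace API, and the unit IDs and support threshold are invented for illustration). Network states are modeled as sets of active unit IDs; frequently co-occurring unit pairs are mined as “generally common patterns”; and a new state is then annotated with the patterns it contains, yielding extra features that could be fed back into the network.

```python
from itertools import combinations
from collections import Counter

# Hypothetical stand-in for deep-network states: each state is the set of
# unit IDs active in the network at some moment (invented data).
states = [
    {"u1", "u2", "u5"},
    {"u1", "u2", "u7"},
    {"u1", "u2", "u5", "u9"},
    {"u3", "u4"},
]

# Mine frequent co-activation patterns (pairs of units active together),
# loosely analogous to what the Pattern Miner would find among Atom-encoded
# states.
counts = Counter()
for state in states:
    for pair in combinations(sorted(state), 2):
        counts[pair] += 1

min_support = 2
frequent = {pair for pair, n in counts.items() if n >= min_support}

def pattern_features(state, patterns):
    """Annotate a state with the known patterns it contains; these
    pattern-memberships can serve as extra features for the network."""
    return {p for p in patterns if set(p) <= state}

new_state = {"u1", "u2", "u8"}
features = pattern_features(new_state, frequent)   # {("u1", "u2")}
```

In the actual proposal, the mined patterns would be stored as Atoms, so that PLN inference and clustering could operate on them before their results are fed back down as features.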
This would be a quite thorough and profound form of interaction between perceptual and cognitive algorithms.
This sort of interaction could be done without implementing deep learning networks in the Atomspace, but it will be much easier operationally if they are represented in the Atomspace.
A Specific Proposal
So I’ve put together a specific proposal for putting deep learning into OpenCog, for computer vision (at first) and audition. In its initial version, this would let one build quite flexible deep learning networks in OpenCog, deferring the expensive number-crunching operations to the GPU via the Theano library developed by Yoshua Bengio’s group at U. Montreal.
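As a rough sketch of the intended division of labor (with numpy standing in for the Theano/GPU backend, and a plain Python list standing in for the Atomspace representation of the network), the idea is that the network’s structure is held in flexible symbolic form while the expensive matrix arithmetic is delegated to a compiled numeric backend. The class and parameter names below are invented for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenseLayer:
    """Structure/bookkeeping object; in the proposal, layer structure would
    live in the Atomspace rather than in Python objects."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.standard_normal((n_in, n_out)) * 0.1
        self.b = np.zeros(n_out)

    def forward(self, x):
        # The expensive number-crunching step: in the proposal this is the
        # part deferred to the GPU via Theano's compiled functions.
        return sigmoid(x @ self.W + self.b)

rng = np.random.default_rng(0)
net = [DenseLayer(8, 4, rng), DenseLayer(4, 2, rng)]

x = rng.standard_normal((3, 8))   # a batch of 3 input vectors
h = x
for layer in net:
    h = layer.forward(h)          # h ends with shape (3, 2)
```

With Theano, the `forward` computations would instead be declared symbolically and compiled once into a GPU-executable function, while the symbolic layer on top stays free to restructure the network.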
As it may get tweaked and improved or augmented by others, I’ve put it at the OpenCog wiki site instead of packing it into this blog post… you can read it at