Blockchains and smart contracts are all the rage these days. What does this have to do with OpenCog? Let me explain.
TL;DR:
The idea of a block-chain comes from the idea of block ciphers, where you want to securely sign (or encrypt) some message, by chaining together blocks of data, in such a way that each prior encrypted block provides “salt” or a “seed” for the next block. Both bitcoin and git use block-chaining to provide cryptographic signatures authenticating the data that they store. Now, git stores big blobs of ASCII text (aka “source code”), while bitcoin stores a very simple (and not at all general) general ledger. Instead of storing text-blobs, like git, or storing an oversimplified financial ledger, like bitcoin, what if, instead, we could store general structured data? Better yet: what if it was tuned for knowledge representation and reasoning? Better still: what if you could store algorithms in it, that could be executed? Put all of these things together, and you’ve got exactly what you need for smart contracts: a “secure”, cryptographically-authenticated general data store with an auditable transaction history. Think of it as internet-plus: a way of doing distributed agreements, world-wide. It has been the cypher-punk day-dream for decades, and is now maybe within reach. The rest of this essay unpacks these ideas a bit more.
Git vs. Bitcoin
When I say “git, the block-chain”, I’m not joking or misunderstanding; I mean it. Bitcoin takes the core idea of git, and adds a new component that git does not provide: incentives to supply an “Acked-by” or a “Signed-off-by” line. With git, people attach Ack and Sign-off lines only to increase their personal status, rather than to accumulate wealth. What is more, git does NOT handle acked-by/signed-off-by in a cryptographic fashion: it is purely manual. Torvalds or Andrew Morton or the other maintainers accumulate these, and they get added manually to the block chain, by cut-n-paste from email into the git commit message.
Some of the key differences between git and bitcoin are:
- Bitcoin handles acked-by messages automatically, not manually, and they accumulate as distinct crypto signatures on the block-chain. By contrast, the git process is to cut-n-paste the acked-by sigs from the email and into the commit, and only then crypto-sign. A modern block-chain API should provide this automated acked-by handling that git does not.
- Bitcoin provides (financial) incentives for people to generate acked-by messages: it does so through mining. Unfortunately, mining is incredibly stupid and wasteful: mining is designed to use immense amounts of CPU time before a new bitcoin is found. This stupidity is to balance a human weakness, by appealing to human greed: the ability to get “free money” in exchange for the crypto-signed acked-by’s. A modern block-chain API does NOT need to support mining, except maybe as an add-on feature so that it can say “hey, me too”. Keeping up with the Joneses.
For the things that I am interested in, I really don’t care about the mining aspect of blockchains. It’s just stupid. Git is a superior block-chain to bitcoin. It’s got more features, it’s got a better API, and it offers consistent histories (that is, merging!), which bitcoin does not. Understandably so: bitcoin wants to prevent double-spending. But there are other ways to avoid double-spending than to force a single master. Git shows the way.
Now, after building up git, I have to admit that it also has a lot of weaknesses: it does not provide any sort of built-in structured search or query. You can say “git log” and view the commit messages, or grep for text strings, but you cannot query the contents as data: there is no such feature.
Structured Data
Git is designed for block-chaining unstructured ASCII (utf8) blobs of character strings (source-code, basically); it started life as a source-code control system. Let’s compare that to structured data. So, in the 1960’s and 1970’s, the core concepts of relations and relational queries got worked out: the meaning of “is-a”, “has-a”, “part-of”, “is-owned-by”, etc. The result of this research was the concept of a relational database, and a structured query language (SQL) to query that structured data. Businesses loved SQL, and Oracle, Sybase, IBM DB2 boomed in the 1970’s and 1980’s, and that is because the concept of relational data fit very well with the way that businesses organize data.
Let’s compare SQL to bitcoin: In bitcoin, there is only ONE relational table, and it is hard-coded. It can store only one thing: a quantity of bitcoin. There is only one thing you can do to that table: add or remove bitcoin. That’s it.
In SQL, the user can design any kind of table at all, to hold any kind of data. Complete freedom. So, if you wanted to implement block-chained smart contracts, that is what you would do: allow the user to create whatever structured data they might want. For example: every month, the baker wants to buy X bags of flour from the miller for Y dollars: this is not just a contract, but a recurring contract: every month, it is the same. To handle it, an SQL architect designs an SQL table to store dollars, bags of flour, and multiple date-stamps: the date-stamp of when the order was made, the date-stamp of when the order was given to the shipping firm (who then crypto-signs the block-chain of this transaction), the date-stamp of when the baker received the flour, and the date-stamp of when the baker paid the miller. Each of these lives on the block-chain, and each gets crypto-signed when the transaction occurs.
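To make the baker-and-miller example concrete, here is a minimal sketch in Python, using the standard sqlite3, hashlib and hmac modules. Everything in it is an assumption made for illustration: the table name, the column names, the party keys and the sample values are all hypothetical, and an HMAC with a per-party secret merely stands in for a real public-key signature. Each event's signature also covers the previous signature, which is what chains the rows together.

```python
import hashlib
import hmac
import sqlite3

# Hypothetical per-party secrets; a real system would use public-key signatures.
KEYS = {"baker": b"baker-key", "miller": b"miller-key", "shipper": b"shipper-key"}

def sign(party, payload, prev_sig):
    """HMAC over the previous signature plus this event, chaining the events."""
    msg = (prev_sig + "|" + payload).encode()
    return hmac.new(KEYS[party], msg, hashlib.sha256).hexdigest()

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE flour_order_events (
    seq INTEGER PRIMARY KEY, order_id TEXT, event TEXT, party TEXT,
    bags INTEGER, dollars REAL, stamp TEXT, signature TEXT)""")

def record(order_id, event, party, bags, dollars, stamp):
    """Append one date-stamped event, signed by the party responsible for it."""
    row = db.execute("SELECT signature FROM flour_order_events "
                     "ORDER BY seq DESC LIMIT 1").fetchone()
    prev_sig = row[0] if row else ""
    payload = f"{order_id}|{event}|{party}|{bags}|{dollars}|{stamp}"
    db.execute("INSERT INTO flour_order_events "
               "(order_id, event, party, bags, dollars, stamp, signature) "
               "VALUES (?, ?, ?, ?, ?, ?, ?)",
               (order_id, event, party, bags, dollars, stamp,
                sign(party, payload, prev_sig)))

record("2016-03", "ordered", "baker", 40, 600.0, "2016-03-01")
record("2016-03", "shipped", "shipper", 40, 600.0, "2016-03-03")
record("2016-03", "received", "baker", 40, 600.0, "2016-03-05")
record("2016-03", "paid", "baker", 40, 600.0, "2016-03-06")

# An ordinary relational query still works on the signed data.
events = [r[0] for r in db.execute(
    "SELECT event FROM flour_order_events WHERE order_id = ? ORDER BY seq",
    ("2016-03",))]
```

The point of the sketch is only that the table layout follows the business process, while the signature column rides along with each row.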
The SQL architect was able to design the data table in such a way that it is NATURAL for the purchase-ship-sell, inventory, accounts-payable, accounts-receivable way that this kind of business is conducted.
There are far more complicated transactions, in the petroleum industry, where revenue goes to pipeline owners, well owners, distillers, etc. in a very complicated process. Another example is the music-industry royalties. Both of these industries use a rather complex financial ledger system that resembles financial derivatives, except that there is no futures contract structure to it: the pipeline owner cannot easily arbitrage the petroleum distiller. Anyway, this is what accounting programs and general ledgers excel at: they match up with the business process, and the reason they can match up is because the SQL architect can design the database tables so that they fit well with the business process.
If you want to build a blockchain-based smart contract, you need to add structured data to the block-chain. So this is an example of where git falls flat: it’s an excellent block-chain, but it can only store unstructured ASCII blobs.
Comparing Git to SQL: Git is also missing the ability to perform queries: but of course: the git data is unstructured, so queries are hard/impossible, by nature. A smart-contract block-chain MUST provide a query language! Without that, it is useless. Let me say it again: SQL is KEY to business contracts. If you build a blockchain without SQL-like features in it, it will TOTALLY SUCK. The world does not need another bitcoin!
I hope you have followed me so far.
The AtomSpace
OK, now, we are finally almost at where OpenCog is. So: the idea of relational data and relational databases was fleshed out in the 1960’s and the 1970’s, and it appears to be enough for accounting. However, it is not enough for other applications, in two different ways.
First, for “big data”, it is much more convenient to substitute SQL and ACID with NoSQL and BASE. The Google MapReduce system is a prime example of this. It provides a highly distributed, highly-parallelizable query mechanism for structured data. Conclusion: if you build a block-chain for structured data, but use only SQL-type PRIMARY-KEY’s for your tables, it will fail to scale to big-data levels. Your block-chain needs to support both SQL and NoSQL. The good news is that this is a “solved problem”: it is known that these are category-theoretic duals. There is a famous Microsoft paper on this: Erik Meijer and Gavin Bierman (Microsoft), “A co-Relational Model of Data for Large Shared Data Banks”, ACM Queue, Volume 9, issue 3, March 18, 2011. Its summary: “Contrary to popular belief, SQL and noSQL are really just two sides of the same coin.”
Next problem: for the task of “knowledge representation” (ontology, triple-stores, OWL, SPARQL) and “logical reasoning”, the flat tables and structures offered by SQL/noSQL are insufficient. It turns out that graph databases are much better suited for this task. Thus, we have the concept of a graph database; some well-known examples include Neo4j, TinkerPop, etc.
The OpenCog AtomSpace fits into this last category. Here, the traditional 1960’s-era “is-a” relation corresponds to the OpenCog InheritanceLink. Named relations (such as “Billy Bob is a part-time employee” in an SQL table) are expressed using EvaluationLinks and PredicateNodes:
(EvaluationLink (PredicateNode "is-employee") (ListLink (ConceptNode "BillyBob") (ConceptNode "employee")))
It’s a bit verbose, but it is another way of expressing the traditional SQL relations. It is somewhat No-SQL-like, because you do not have to declare an “is-employee” table in advance, the way you do in SQL. There is no “hoisting”; instead, you can create new predicates dynamically, on the fly, at any time.
OpenCog has a centralized database, called the AtomSpace. Notice how the above is a tree, and so the AtomSpace becomes a “forest of trees”. In the atomspace, each link or node is unique, and so each tree shares nodes and links: this is called a “Levi graph” and is a general bipartite way of representing hypergraphs. So, the atomspace is not just a graph database, it’s a hypergraph database.
Edits to this database are very highly regulated and centralized: so there is a natural location where a blockchain signature could be computed: every time an atom is added or removed, that is a good place to hash the atomspace contents, and apply a signature.
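As a rough illustration of hashing at the point where atoms are added or removed, here is a minimal Python sketch. It is not AtomSpace code: the class `ChainedAtomLog` and its methods are hypothetical, atoms are plain strings, and for simplicity it hashes each edit together with the previous hash rather than re-hashing the whole contents. A real design would also apply a signature to each hash, which is omitted here.

```python
import hashlib

class ChainedAtomLog:
    """Toy stand-in for an atomspace: a set of atoms plus a hash-chained edit log."""

    def __init__(self):
        self.atoms = set()
        self.log = []            # entries: (operation, atom, chained hash)
        self.head = "0" * 64     # placeholder hash for the empty starting state

    def _append(self, op, atom):
        # Each new hash covers the previous hash, so edits cannot be reordered.
        digest = hashlib.sha256(f"{self.head}|{op}|{atom}".encode()).hexdigest()
        self.log.append((op, atom, digest))
        self.head = digest

    def add(self, atom):
        if atom not in self.atoms:
            self.atoms.add(atom)
            self._append("add", atom)

    def remove(self, atom):
        if atom in self.atoms:
            self.atoms.remove(atom)
            self._append("remove", atom)

    def verify(self):
        """Recompute the chain from the start and compare with the stored hashes."""
        head = "0" * 64
        for op, atom, digest in self.log:
            head = hashlib.sha256(f"{head}|{op}|{atom}".encode()).hexdigest()
            if head != digest:
                return False
        return True

space = ChainedAtomLog()
space.add('(ConceptNode "BillyBob")')
space.add('(ConceptNode "employee")')
space.remove('(ConceptNode "employee")')
```

Because every edit goes through `add` or `remove`, there is exactly one place where the chain gets extended, which mirrors the "natural location" described above.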
The atomspace does NOT have any sort of history-of-transactions (we have not needed one, yet). We are (actually, Nil is) working on something similar, though, called the “backwards-inference tree”, which is used to store a chain of logical deductions or inferences that get made. It’s kind-of-like a transaction history, but instead of storing any kind of transaction, it only stores those transactions that can be chained together to perform a forward-chained logical deduction. Because each of these deductions leads to yet-another deduction, this is also a natural location to perform crypto block-chaining. That is, if some early inference is wrong or corrupted, all later inferences become invalid: that is the chaining. So we chain, but we have not needed crypto signatures on that chain.
The atomspace also has a query language, called the “pattern matcher”. It is designed to search only the current contents of the database. I suppose it could be extended to search the transaction history. The backward-inference-tree-chains were designed by Nil to be explicitly compatible with the pattern matcher.
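The remark that a corrupted early inference invalidates all later ones can be sketched with a small hash-linked structure. This is an illustration only, not how the inference tree is actually stored: the record layout, the names `seal` and `invalid`, and the example syllogism are all assumptions, and the inference records are assumed to be acyclic. Each conclusion is hashed together with the hashes of its premises.

```python
import hashlib

def h(text):
    return hashlib.sha256(text.encode()).hexdigest()

# Hypothetical inference records: name -> (conclusion, names of premises).
inferences = {
    "a": ("Socrates is a man", []),
    "b": ("All men are mortal", []),
    "c": ("Socrates is mortal", ["a", "b"]),
    "d": ("Socrates will die", ["c"]),
}

def seal(records):
    """Hash each conclusion together with the hashes of its premises."""
    sealed = {}

    def visit(name):
        if name not in sealed:
            conclusion, premises = records[name]
            sealed[name] = h(conclusion + "|" + "|".join(visit(p) for p in premises))
        return sealed[name]

    for name in records:
        visit(name)
    return sealed

def invalid(records, trusted):
    """Names whose recomputed hash no longer matches the trusted one."""
    current = seal(records)
    return sorted(n for n in records if current[n] != trusted[n])

trusted = seal(inferences)
# Corrupt one early premise and see which later inferences are affected.
tampered = dict(inferences, a=("Socrates is a cat", []))
broken = invalid(tampered, trusted)
```

Here `broken` names the corrupted premise and everything deduced from it, while the independent premise "b" still checks out.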
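To give a feel for what querying a graph of atoms by pattern means, here is a toy matcher in Python. It is emphatically not the OpenCog pattern matcher and does not use its API: atoms are modelled as nested tuples, variables are strings starting with "$", and the names `match` and `query` as well as the sample data are hypothetical.

```python
def match(pattern, atom, bindings):
    """Return extended bindings if `atom` matches `pattern`, else None."""
    if isinstance(pattern, str) and pattern.startswith("$"):
        if pattern in bindings:
            # A variable seen before must bind to the same value again.
            return bindings if bindings[pattern] == atom else None
        return {**bindings, pattern: atom}
    if isinstance(pattern, tuple) and isinstance(atom, tuple):
        if len(pattern) != len(atom):
            return None
        for p, a in zip(pattern, atom):
            bindings = match(p, a, bindings)
            if bindings is None:
                return None
        return bindings
    return bindings if pattern == atom else None

def query(pattern, atoms):
    """Search only the current contents: return one binding dict per match."""
    results = []
    for atom in atoms:
        found = match(pattern, atom, {})
        if found is not None:
            results.append(found)
    return results

atoms = [
    ("EvaluationLink", ("PredicateNode", "is-employee"),
     ("ListLink", ("ConceptNode", "BillyBob"), ("ConceptNode", "employee"))),
    ("EvaluationLink", ("PredicateNode", "is-employee"),
     ("ListLink", ("ConceptNode", "MarySue"), ("ConceptNode", "contractor"))),
    ("InheritanceLink", ("ConceptNode", "employee"), ("ConceptNode", "person")),
]

pattern = ("EvaluationLink", ("PredicateNode", "is-employee"),
           ("ListLink", ("ConceptNode", "$who"), ("ConceptNode", "$what")))
found = query(pattern, atoms)
```

Extending such a search to a transaction history would amount to running the same `query` over past snapshots or over the edit log, instead of over the current list of atoms.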
The AtomSpace is a typed graph store, and some of the types are taken from predicate logic: there is a boolean AndLink, boolean OrLink, a boolean NotLink; but also an intuitionist-logic ChoiceLink, AbsentLink, PresentLink, and to round it out, a Kripke-frame ContextLink (similar to a CYC “microtheory” but much, much better). The reason I am mentioning these logic types is because they are the natural constructor types for smart contracts: in a legal contract, you want to say “this must be fulfilled and this or this but not this”, and so the logical connectives provide what you need for specifying contractual obligations.
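The “this must be fulfilled and this or this but not this” shape of a contract can be sketched as a tiny evaluator over nested connectives. This is a Python illustration, not Atomese: the tuple tags "And", "Or" and "Not" stand in for AndLink, OrLink and NotLink, and the function `evaluate` and the fact names are hypothetical.

```python
def evaluate(clause, facts):
    """Evaluate a nested (connective, ...) clause against a set of true facts."""
    if isinstance(clause, str):
        return clause in facts
    op, *args = clause
    if op == "And":
        return all(evaluate(a, facts) for a in args)
    if op == "Or":
        return any(evaluate(a, facts) for a in args)
    if op == "Not":
        return not evaluate(args[0], facts)
    raise ValueError(f"unknown connective: {op}")

# Flour must be delivered, paid for by wire or by check, and not be late.
contract = ("And", "flour-delivered",
            ("Or", "paid-by-wire", "paid-by-check"),
            ("Not", "delivery-late"))

fulfilled = evaluate(contract, {"flour-delivered", "paid-by-check"})
late = evaluate(contract, {"flour-delivered", "paid-by-check", "delivery-late"})
```

The contract itself is just data, so it can be stored, queried and signed like any other structure; only the evaluator is code.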
Next, the AtomSpace has LambdaLinks, which implement the lambda abstractor from lambda calculus. This enables generic computation: you need this for smart contracts. The AtomSpace is NOT very strong in this area, though: it provides a rather basic computational ability with the LambdaLink, but it is very primitive, and does not go much farther. We do some, but not a lot of computation in the AtomSpace. It was not meant to be the kind of programming language that humans would want to code in.
The atomspace does NOT have any lambda flow in it, e.g. Marius Buliga’s ChemLambda. I am still wrestling with that. The atomspace does have a distinct beta-reduction type, called PutLink, dual to the LambdaLink abstractor. However, for theorem-proving, I believe that a few more abstractors are needed: Buliga has four: lambda and beta, and two more. I am also trying to figure out Jean-Yves Girard’s Ludics. Not there, yet.
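The duality between abstraction and beta-reduction mentioned above can be sketched as plain substitution over nested tuples. This is a deliberately naive Python illustration, not the LambdaLink or PutLink implementation: the names `substitute` and `beta_reduce` and the tuple tags are hypothetical, and it ignores nested lambdas and variable capture entirely.

```python
def substitute(body, var, value):
    """Replace every occurrence of `var` in a nested-tuple term with `value`."""
    if body == var:
        return value
    if isinstance(body, tuple):
        return tuple(substitute(part, var, value) for part in body)
    return body

def beta_reduce(lam, argument):
    """Apply ("Lambda", variable, body) to an argument by substitution."""
    tag, var, body = lam
    assert tag == "Lambda"
    return substitute(body, var, argument)

# The abstraction leaves a hole "$x"; beta-reduction plugs a value into it.
is_employee = ("Lambda", "$x",
               ("Evaluation", ("Predicate", "is-employee"),
                ("List", "$x", ("Concept", "employee"))))
result = beta_reduce(is_employee, ("Concept", "BillyBob"))
```

After the reduction, `result` is the same tree as the is-employee example earlier in this essay, with BillyBob plugged into the hole.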
Security, Scalability
Perhaps I failed to mention: the current AtomSpace design has no security features in it, whatsoever. Absolutely zero. Even the most trivial hostile action will wipe out everything. There is a reason for this: development is focused on reasoning and thinking. Also, the current atomspace is not scalable. It’s also rather low-performance. It’s unsuitable for big-data. None of these checkboxes that many people look for are satisfied by the atomspace. That’s because these issues are, for this project, quite low priority. We are focused on reasoning and understanding, and just about nothing else.
So, taken at face value, it is absurd to contemplate a blockchain for the atomspace; without even basic security, or decentralized, distributed storage, byzantine fault tolerance, and high performance, it’s a non-starter for serious consideration. Can these checkboxes be added to the atomspace, someday? Maybe. Soon? Not at all likely. These are nice-to-haves, but opencog’s primary focus must remain reasoning and thinking, not scalable, distributed, secure storage.
Conclusion
So that’s it, then: you can think of the OpenCog atomspace as a modern-day graphical, relational database that includes the datalog fragment of prolog, and lots of other parts as well. It has an assortment of weaknesses and failures, which I know of, but won’t get into here. It is probably a decent rough sketch for the data storage system that you’d want for a block-chained smart contract. To make it successful, you would need to do a whole lotta things:
- First, you’d need to actually get a real-life example of a smart-contract that people would want to have.
- Next, you’d have to build a smart-phone app for it.
- Next, it would have to appeal to some core class of users: I dunno, maybe package tracking for UPS, or the collection of executive signatures by some legal dept, or maybe some accounts-receivable system that some billing dept. would want to use. There has got to be a hook: people have to want to use it. It needs some magic ingredient.
The problem here is that, as a business, companies like IBM and PwC will trounce you at the high-end, because they already have the business customers, and the IBM STSM’s are smart enough to figure out how block-chains work, and so will get some architects to create that kind of system for them. At the low-end, there must be thousands of twenty-something programmers writing apps for cell-phones, daydreaming of being the next big unicorn, and they are all exploring payment systems and smart-cards and whatever, at a furious pace. So if you really want a successful block-chain, smart-contract business, here, hold on to your butt.
I think that the only hope is to go open source, work with the Apache foundation, have them do the marketing for the AtomSpace or something like it, and set up API’s that people want to use. That’s a lot of work. But that is the way to go.
Do you know Ethereum? It’s basically Bitcoin 2.0 that allows for arbitrary data as well as executable code to be stored on the Blockchain, and they are currently working on moving from POW (Proof of Work, i.e. Mining) to POS (Proof of Stake), which means no more pointless number-crunching, although all virtual miners still have to run the code in the blockchain that produced the new version/block to verify it.
It’s not quite what you want, as they don’t have a database like the AtomSpace, and while you could implement it on top, that would be quite inefficient. You would want to implement something like this on the client level. Even then it would be slow to verify.
Not sure how else you would want to prevent double-spending.
Well, not everything in life is about spending or double-spending. Git is a blockchain, and has no concept of spending in it. Git also allows storing of “arbitrary data”, as long as you don’t mind that your “arbitrary data” is an unstructured blob.
Here’s another: BigchainDB and a smart critique of its security flaws. Basically, if even one database node can be corrupted, then the entire blockchain can be wiped out in one stroke!
Which maybe underscores an important point I failed to emphasize: The current opencog system has absolutely no security in it, whatsoever. There is no attempt to lock down anything at all, and anyone who has access, has total access. So… let me update the post to say this.