Seems like everyone wants to be an epidemiologist these days, so why not OpenCog? After all, diseases spread through networks, propagating from one node to the next. A network is a graph, the AtomSpace is a graph database, and the value flow subsystem provides a convenient infrastructure for propagating data (e.g. communicable diseases) from one node to the next. How hard could it be to model the progression of disease through a social network? The answer? Pretty easy.
The full working demo is here. It highlights the brand-new network generation codebase. The network generator is an AtomSpace module, specialized for generating random networks that are constrained by grammatical rules. This is in the same sense as “grammar” in linguistics, or computer science: a description of syntax. With an appropriate grammar, one can generate linguistic parse trees. But there’s no need to constrain oneself to trees: one can generate graphs with undirected edges and loops.The network generator is generic. That said, please note: this is version 0.1 of the network generator! This really is a brand-new module. It is needed for several projects, but really, its just completely brand-new, and is not a complete, finished final product.
Note also: this is a technology demo! This is not a full-scale medical epidemiology tool. It could be used for scientific research — but, for that, it would need to be extended and attached to other scientific tools, such as data analysis tools, graphing tools, network visualization tools. Such tools are commonplace and abundant; we won’t dwell on them here. The demo here is a demo of network generation, and of running a state transition machine on a network. Its a demo of using Atomese for these sorts of computations, playing to the strengths of Atomese.
The demo consists of four parts. The first part demonstrates how to write a finite state machine in Atomese. The second demonstrates the current network generation API. Third: actually run the danged thing, and fourth: collect up and print some stats. Before starting, though, a few words about Atomese.
Atomese is a graphical programming language. Unlike almost all other programming languages, it was designed for automation, and not for humans. Atomese expressions are graphs; they live as graphs in the AtomSpace graph database. Atomese encodes “Abstract Syntax Trees” (see the linked Wikipedia article.) Such a representation is useful for all of the reasons that the Wikipedia article says they are. But it also means that “coding” in Atomese can sometimes feel unusual. Like: “Why can’t I just code in python? It would be so much easier!” — Well, but that misses the point. And the point is, again: Atomese was designed for automation.
What this really means is that the graphs are meant to be easy for other algorithms to manipulate. It’s designed for ease-of-use by machines, not for humans! The style is more barren, unadorned, without short-cuts. Its a bit verbose at times. That’s OK, machines don’t mind verbosity and tedium.
Consider, for example, a graphical editor – something where you can drag-n-drop, hand-draw interacting bubble diagrams. Something easy to use – something a medical professional could get the hang of in a few hours – something where a disease model could be sketched out with bubbles and lines and flow-charts. The goal of Atomese is that it becomes easy — really easy — to convert such diagrams into executable code. That’s what Atomese is designed to do.
In this demo, the disease progression model is written in Atomese. This is the model that a graphical GUI WYSIWYG whizz-bang system would generate.
Atomese enables other, more abstract approaches. Consider an automatic model explorer: a system that automatically creates a variety of different disease progression models, and then explores how each model works. The model generator might try to fit an existing dataset. Alternately, it could mutate models so as to obtain certain characteristics. This moves beyond just adjusting some parameters in some hand-created, hypothesized model. This is not just some monte-carlo search or machine-learning hill-climbing for parameter tuning. This is a whole-sale creation of previously non-existent code. Atomese allows software to read, understand, mutate, modify and write programs.
There are two state transition rules. One governs the transmission of disease between pairs of individuals, and the other controls the disease progression in an infected individual. Consider first the evolution of the disease state in a single individual.
(And (Equal (ValueOf (Variable "$A") seir-state) exposed) (GreaterThan (ValueOf (Variable "$A") susceptibility) (RandomNumber (Number 0) (Number 1)))) (SetValue (Variable "$A") seir-state infected)
I’m hoping this is is self-explanatory. The (Variable “$A”) is a person, an individual in the network. The seir-state is a location – a lookup key, holding the “SEIR” state: “susceptible”, “exposed”, “infected”, “recovered”. If the individual is exposed, and a random draw exceeds that individual’s susceptibility, they will become infected. All of the state transitions can be written in this form.
The above is written in scheme, but you could also use Python — the AtomSpace has Python bindings. Arguing about Python vs. something else kind-of misses the point: the above Atomese is actually a graph, a tree, in this case, that happens to encode, in its nodes and links, an abstract syntax tree (AST) specifying an executable program. Atomese allows you to work directly with the AST, whether in scheme in python, or just abstractly, at a higher layer. The Wikipedia article on AST’s explains exactly why this is a useful thing to have.
The person-to-person transmission rule looks like this:
(Cond (And (Equal (ValueOf (Variable "$A") seir-state) susceptible) (Equal (ValueOf (Variable "$B") seir-state) infected) (Or (And (Equal (Variable "$REL") (Concept "friend")) (GreaterThan (RandomNumber (Number 0) (Number 1)) (Number 0.3))) (And (Equal (Variable "$REL") (Concept "stranger")) (GreaterThan (RandomNumber (Number 0) (Number 1)) (Number 0.7))))) (SetValue (Variable "$A") seir-state exposed)
Again, this is hopefully obvious. If A is susceptible and B is infected, then maybe A becomes exposed, if A gets close enough/spends enough time with B. A passing stranger is unlikely to lead to exposure; but friends are harder to avoid and thus one is more likely to be exposed to an infected friend.
The network grammar in this demo is extremely simple. It specifies two types of edges: “friend” and “stranger”, used for different types of social encounter. The grammar itself just consists of a set of “seeds” or “burrs” – a single vertex (a prototype individual) together with some half-edges on it – potential edges that could form either “friend” or “stranger” relationships.
It’s perhaps the most useful to imagine each burr as a jigsaw-puzzle piece: in this case, with only two types of connectors: “friend” or “stranger”, varying in quantity from a few, to many. The visuals show a seed with some attached half-edges, and the process of connecting them. It’s painfully obvious, isn’t it?
A single seed, expressed in Atomese, looks like this:
(Section (Concept "person-2-3") (ConnectorSeq (Connector (Concept "friend") (ConnectorDir "*")) (Connector (Concept "friend") (ConnectorDir "*")) (Connector (Concept "stranger") (ConnectorDir "*")) (Connector (Concept "stranger") (ConnectorDir "*")) (Connector (Concept "stranger") (ConnectorDir "*"))))
It has two unconnected “friend” connectors, and three unconnected “stranger” connectors. The connectors are unpolarized – they have no directionality, so that the resulting relationships are symmetrical. That is, unlike an actual jigsaw-puzzle piece, the “*” ConnectorDir doesn’t have a “direction”, it allows an any-to-any connection to be made. When a connection is formed, the connector labels must match – thus, in this case, two different kinds of edges are formed in the final graph.
Further below is an example random network, generated by the current system. It’s long, and thin – very long and thin – as controlled by the network generation parameters. Yes, a network of this shape is not very realistic. Exploring different network shapes is part of the fun! Adjustable parameters include what one might expect: a maximum size, number of graphs to generate, a set of starting points that must be bridged, and so on. Again: this is version 0.1; there is more to come.
Graph generation parameters are also specified in Atomese. For example, the maximum network size:
(State (Member max-network-size params) (Number 2000))
Initializing the network
After generating the network, one now has a number of interconnected individuals. Some initial values and state must be assigned to each. This is straight-forward, in Atomese:
(Bind (TypedVariable (Variable "$person") (Type "ConceptNode")) (Present (Member (Variable "$person") anchor)) (SetValue (Variable "$person") seir-state susceptible) (SetValue (Variable "$person") susceptibility (RandomNumber (Number 0.2) (Number 0.8))) (SetValue (Variable "$person") infirmity (RandomNumber (Number 0.01) (Number 0.55))) (SetValue (Variable "$person") recovery (RandomNumber (Number 0.6) (Number 0.95))))
The BindLink is a graph-rewriting link. It searches for all graphs of a certain shape, having a variable area in them, and then takes that variable, and generates some new graphs. In this particular case, the variable is obvious: the first stanza is just a type declaration for the variable. The second stanza, the PresentLink, asserts that the indicated graph must be found in the AtomSpace. It’s a very simple graph here, just a MemberLink that must be present. The remaining stanzas are created and executed, initializing various keys on the individual with various random values.
Running the simulation
The spread of the disease across the network can be achieved using the same graph-query and graph-rewriting mechanism. Consider the graph query below.
(Bind (VariableList (TypedVariable (Variable "$A") (Type "ConceptNode")) (TypedVariable (Variable "$B") (Type "ConceptNode"))) (And (Present (Evaluation (Concept "friend") (UnorderedLink (Variable "$A") (Variable "$B")))) (Equal (ValueOf (Variable "$A") seir-state) susceptible) (Equal (ValueOf (Variable "$B") seir-state) infected)) (SetValue (Variable "$A") seir-state exposed))
This locates all “friendship” graphs: one must find, present in the AtomSpace, a very simple pairwise relation between A and B, marked with the label “friend”. Habitually, EvaluationLinks are used for this purpose. Having found this relationship, and if B is infected, and A is not yet ill, nor has developed immunity (that is, if A is susceptible), then A becomes exposed. The UnorderedLink is a link-type that allows any ordering of the contents, and during searches, explores all permutations. Its uses here to encode a mutual relationship between two friends.
After running such a disease-transmission step, one then runs a state-change step, using the SEIR rules described up top. And then one repeats … looping, until the population has succumbed, or there has been a break in the transmission that keeps a portion of the population isolated and uninfected.
The graph query language is an all purpose tool. Network stats are straight-forward to query. Of course! Like so:
(Get (TypedVariable (Variable "$indiv") (Type "ConceptNode")) (And (Present (Member (Variable "$indiv") anchor)) (Equal (ValueOf (Variable "$indiv") seir-state) infected)))
The above just searches for all individuals, identified by their membership to a common anchor point. and then checks to see if their state is “infected”. This returns a list of all of those individuals; one need merely count to see how many of them there are.
Putting it together
The demo is fully functional. Start a guile shell, load the demo file, and off it will go. Network generation may take anywhere from a small fraction of a section to over ten seconds, that’s the only burp. The source is heavily documented. If you get into trouble, the wiki links provide additional documentation. If Atomese is confusing, then perhaps its a review of the basic AtomSpace examples is in order.
The adventurous may try translating the demo to Python. The AtomSpace has Python bindings. They work. There’s no particular reason that Atomese has to be written in scheme (although, clearly, I find it preferable: its easier, simpler than Python. For me, at least.)
The AtomSpace is an in-RAM (hyper-)graph database with a power associated graph query language, specifically formulated for working with large, complex graphs, for re-writing and editing those graphs, and for managing the flow of state through the graph. Modelling disease networks is one of the things it should be good at … and I hope, with this blog post, I’ve convinced you that this is indeed the case.