<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OpenCog Brainwave &#187; MOSES</title>
	<atom:link href="http://blog.opencog.org/tag/moses/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.opencog.org</link>
	<description>The latest developments in building an open-source mind</description>
	<lastBuildDate>Wed, 21 Mar 2012 16:44:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Genetic Crossover in MOSES</title>
		<link>http://blog.opencog.org/2012/03/20/genetic-crossover-in-moses/</link>
		<comments>http://blog.opencog.org/2012/03/20/genetic-crossover-in-moses/#comments</comments>
		<pubDate>Tue, 20 Mar 2012 18:59:58 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[Introduction]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[Genetic Algorithm]]></category>
		<category><![CDATA[Hillclimbing]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Meta-learning]]></category>
		<category><![CDATA[MOSES]]></category>

		<guid isPermaLink="false">http://blog.opencog.org/?p=370</guid>
		<description><![CDATA[MOSES is a system for learning programs from input data.  Given a table of input values, and a column of outputs, MOSES tries to learn a program, the simplest program that can reproduce the output given the input values. ...]]></description>
			<content:encoded><![CDATA[<p><a href="http://wiki.opencog.org/w/MOSES">MOSES</a> is a system for learning programs from input data.  Given a table of input values, and a column of outputs, MOSES tries to learn a program, the simplest program that can reproduce the output given the input values.  The programs that it learns are in the form of a &#8220;program tree&#8221;: a nested concatenation of operators, such as addition or multiplication, boolean AND&#8217;s or OR&#8217;s, if-statements, and the like, taking the inputs as arguments.  To learn a program, it starts by guessing a new random program.  More precisely, it generates a new, random program tree, with as-yet unspecified operators at the nodes of the tree. So, for example, an arithmetic node may be addition, subtraction, multiplication, or division, or it may be entirely absent.  It hasn&#8217;t yet been decided which.   In MOSES, each such undecided node is termed a &#8220;knob&#8221;, and program learning is done by &#8220;turning the knobs&#8221; until a reasonable program is found.  But things don&#8217;t stop there: once a &#8220;reasonable&#8221; program is found, a new, random program tree is created by decorating this &#8220;most reasonable&#8221; program with a new set of knobs.  The process then repeats: knobs are turned until an even better program is found.</p>
<p>Thus, MOSES is a &#8220;metalearning&#8221; system: it consists of an outer loop, that creates trees and knobs, and an inner loop, that finds optimal knob settings.  Both loops &#8220;learn&#8221; or &#8220;optimize&#8221;; it is the nesting of these that garners the name &#8220;metalearning&#8221;.  Each loop can use completely different optimization algorithms in its search for optimal results.</p>
<p>The rest of this post concerns this inner loop, and making sure that it finds optimal knob settings as quickly and efficiently as possible.  The space of all possible knob settings is large: if, for example, each knob has 5 possible settings, and there are 100 knobs, then there are a total of 5<sup>100</sup> possible different settings: a combinatorial explosion. Such spaces are hard to search. There are a variety of different algorithms for exploring such a space.  One very simple, very traditional algorithm is &#8220;hillclimbing&#8221;.  This algo starts somewhere in this space, at a single point, say, the one with all the knobs set to zero.  It then searches the entire local neighborhood of this point: each knob is varied, one at a time, and a score is computed.  Of these scores, one will be best. The corresponding knob setting is then picked as the new center, and the process then repeats; it repeats until there is no improvement: until one can&#8217;t &#8220;climb up this hill&#8221; any further.  At this point, the inner loop is done; the &#8220;best possible&#8221; program has been found, and control is returned to the outer loop.</p>
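<p>The hill-climbing loop described above can be sketched in a few lines of Python. This is only an illustration, not the MOSES implementation: the function name <tt>hill_climb</tt>, the list-of-integers representation of knob settings, and the toy scoring function are all assumptions made for the example.</p>

```python
from typing import Callable, List


def hill_climb(score: Callable[[List[int]], float],
               num_knobs: int,
               num_settings: int) -> List[int]:
    """Greedy hill-climbing over knob settings (illustrative sketch only)."""
    center = [0] * num_knobs              # start with every knob set to zero
    best = score(center)
    while True:
        # Score every instance that differs from the center by exactly one knob.
        top_score, top_instance = best, None
        for knob in range(num_knobs):
            for setting in range(num_settings):
                if setting == center[knob]:
                    continue
                neighbor = center.copy()
                neighbor[knob] = setting
                s = score(neighbor)
                if s > top_score:
                    top_score, top_instance = s, neighbor
        if top_instance is None:          # no neighbor beats the center: stop
            return center
        center, best = top_instance, top_score


# Hypothetical toy score: how many knobs match a hidden target setting.
target = [3, 1, 4, 1, 2]


def toy_score(knobs: List[int]) -> float:
    return sum(k == t for k, t in zip(knobs, target))


print(hill_climb(toy_score, num_knobs=5, num_settings=5))
```

<p>On this toy problem every generation fixes one more knob, so the climber reaches the target after five generations. Each generation scores all 20 single-knob changes, which is the exhaustive neighborhood scan discussed in the rest of this post.</p>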
<p>Hill-climbing is a rather stupid algorithm: most knob settings will result in terrible scores, and are pointless to explore, but the hill-climber does so anyway, as it has no clue as to where the &#8220;good knobs&#8221; lie.  It does an exhaustive search of the local neighborhood of single-knob twists.  One can do much better by using estimation-of-distribution algorithms, such as the Bayesian Optimization Algorithm.  The basic premise is that knob settings are correlated: good settings are near other good settings.  By collecting statistics and computing probabilities, one can make informed, competent guesses at which knob settings might actually be good.  The downside to such algorithms is that they are complex:  the code is hard to write, hard to debug, and slow to run: there is a performance penalty for computing those &#8220;educated guesses&#8221;.</p>
<p>This post explores a middle ground: a genetic cross-over algorithm that improves on simple hill-climbing simply by blindly assuming that good knob settings really are &#8220;near each other&#8221;, without bothering to compute any probabilities to support this rash assumption.  The algorithm works; headway can be made by exploring only the small set of knob settings that correlate with previous good knob settings.</p>
<p>To explain this, it is time to take a look at some typical &#8220;real-life&#8221; data. In what follows, a dataset was collected from a customer-satisfaction survey; the goal is to predict satisfaction from a set of customer responses.  The dataset is a table; the outer loop has generated a program decorated with a set of knobs.  Starting with some initial knob setting, we vary each knob in turn, and compute the score. The first graph below shows what a  typical &#8220;nearest neighborhood&#8221; looks like.  The term &#8220;nearest neighborhood&#8221; simply means that, starting with the initial knob setting, the nearest neighbors are those that differ from it by exactly one knob setting, and no more.  There is also a <em>distance</em>=2 neighborhood: those instances that differ by exactly two knob settings from the &#8220;center&#8221; instance.  Likewise, there is a <em>distance</em>=3 neighborhood, differing by 3 knob settings,<em> etc. </em> The size of each neighborhood gets combinatorially larger.  So, if there are 100 knobs, and each knob has five settings, then there are 5 × 100=500 nearest neighbors.  There are 500 × 499 / 2 = 125K next-nearest neighbors, and 500 × 499 × 498 / (2 × 3) = 21M instances at <em>distance</em>=3.  In general, this is the binomial coefficient: (500 choose <em>k</em>) for distance <em>k</em>.  Different knobs, however, may have more or fewer than just 5 settings, so the above is just a rough example.</p>
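<p>The neighborhood sizes quoted above are easy to check. The snippet below simply evaluates the binomial coefficient (500 choose <em>k</em>) used in the rough example; the variable name <tt>slots</tt> is made up for the illustration.</p>

```python
from math import comb

slots = 5 * 100   # 100 knobs with 5 settings each, as in the rough example


def neighborhood_size(distance: int) -> int:
    """Number of instances at the given distance, per the (500 choose k) estimate."""
    return comb(slots, distance)


for distance in (1, 2, 3):
    print(distance, neighborhood_size(distance))
```

<p>This prints 500, 124750 and 20708500, matching the 500, 125K and 21M figures.</p>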
<div id="attachment_380" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/03/hc-20.png"><img class="size-full wp-image-380" src="http://blog.opencog.org/files/2012/03/hc-20.png" alt="Nearest Neighbor Scores" width="640" height="480" /></a><p class="wp-caption-text">Nearest Neighbor Scores</p></div>
<p>The above graph shows the distribution of nearest neighbor scores, for a &#8220;typical&#8221; neighborhood. The score of the center instance (the center of the neighborhood) is indicated by the solid green line running across the graph, labelled &#8220;previous high score&#8221;.  All of the other instances differ by exactly one knob setting from this center.  They&#8217;ve been scored and ranked, so that the highest-scoring neighbors are to the left.  As can be seen, there are maybe 15 instances with higher scores than the center, another 5 that seem to tie.  A slow decline is followed by a precipitous drop; there are another 80 instances with scores so bad that they are not shown in this figure.  The hill-climbing algo merely picks the highest scorer, declares it to be the new center, and repeats the process.</p>
<p>All of the other neighborhoods look substantially similar. The graph below shows an average over many generations (here, each iteration of the inner loop is one generation).  The jaggedness above is smoothed out by averaging.</p>
<div id="attachment_393" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/03/distrib-avg-bank-r01.png"><img class="size-full wp-image-393" src="http://blog.opencog.org/files/2012/03/distrib-avg-bank-r01.png" alt="Nearest Neighbor Score Change" width="640" height="480" /></a><p class="wp-caption-text">Nearest Neighbor Score Change</p></div>
<p>Rather than searching the entire neighborhood, one would like to test only those knob settings likely to yield good scores. But which might these be?  For nearest neighbors, there is no way to tell, without going through the bother of collecting statistics, and running them through some or another Bayesian estimation algorithm.</p>
<p>However, for more distant neighbors, there is a way of guessing and getting lucky: perform genetic cross-overs.  That is, take the highest and next-highest scoring instances, and create a new instance that differs from the center by two knob-settings, the two knobs associated with the two high scorers.  In fact, this new instance will very often be quite good, beating both of its parents.   The graph below shows what happens when we cross the highest scorer with each one of the next 70 highest. The label &#8220;1-simplex&#8221; simply reminds us that these instances differ by exactly two knob settings from the center.  More on simplexes later.  The green zero line is located at the highest-scoring single-knob change.  The graph shows that starting here, and twiddling the next-most-promising knob, can often be a win.  Not always: in the graph below, only 4 different knobs showed improvement.  However, we explored relatively few instances to find these four; for this dataset, most exemplars have thousands of knobs.</p>
<div id="attachment_391" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/03/distrib-avg-bank-1-plex1.png"><img class="size-full wp-image-391" src="http://blog.opencog.org/files/2012/03/distrib-avg-bank-1-plex1.png" alt="Average Score Change, 1-simplex" width="640" height="480" /></a><p class="wp-caption-text">Average Score Change, 1-simplex</p></div>
<p>The take-away lesson here is that we can avoid exhaustive searches by simply crossing the 10 or 20 or 30 best instances, and hoping for the best. In fact, we get lucky with these guesses quite often.  What happens if, instead of just crossing two, we cross three of the top scorers?  This is the &#8220;2-simplex&#8221;, below:</p>
<div id="attachment_390" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/03/distrib-avg-bank-2-plex1.png"><img class="size-full wp-image-390" src="http://blog.opencog.org/files/2012/03/distrib-avg-bank-2-plex1.png" alt="Average Score Change" width="640" height="480" /></a><p class="wp-caption-text">Average Score Change, 2-simplex</p></div>
<p>Notice that there are now even more excellent candidates!  How far can we go?  The 3-simplex graph below shows the average score change from crossing over four high-scoring instances:</p>
<div id="attachment_388" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/03/distrib-avg-bank-3-plex.png"><img class="size-full wp-image-388" src="http://blog.opencog.org/files/2012/03/distrib-avg-bank-3-plex.png" alt="Average Score Change" width="640" height="480" /></a><p class="wp-caption-text">Average Score Change, 3-simplex</p></div>
<p>The term &#8220;crossover&#8221; suggests some sort of &#8220;sexual genetic reproduction&#8221;. While this is correct, it is somewhat misleading.   The starting population is genetically very uniform, with little &#8220;genetic variation&#8221;.  The algorithm starts with one single &#8220;grandparent&#8221;, and produces a population of &#8220;parents&#8221;, each of which differs from the grandparent by exactly one knob setting.  In the &#8220;nearest neighborhood&#8221; terminology, the &#8220;grandparent&#8221; is the &#8220;center&#8221;, and each &#8220;parent&#8221; is exactly one step away from this center.  Any two &#8220;parents&#8221;, arbitrarily chosen, will always differ from one another by exactly two knob settings.  Thus, crossing over two parents will produce a child that differs by exactly one knob setting from each parent, and by two from the grandparent.   In the &#8220;neighborhood&#8221; model, this child is a distance=2 from the grandparent.   For the case of  three parents, the child is at distance=3 from the grandparent, and so on: four parents produce a child that is distance=4 from the grandparent.  Thus, while &#8220;sexual reproduction&#8221; is a sexy term, it loses its punch with the rather stark uniformity of the parent population; thinking in terms of &#8220;neighbors&#8221; and &#8220;distance&#8221; provides a more accurate mental model of what is happening here.</p>
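<p>The grandparent, parent and child distances described above can be made concrete with a small Python sketch. The names <tt>cross_over</tt> and <tt>hamming</tt> and the list-of-integers representation are assumptions for illustration, not MOSES code; the sketch also assumes that the parents being crossed change <em>different</em> knobs.</p>

```python
from typing import List, Sequence


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of knobs on which two instances differ (the 'distance')."""
    return sum(x != y for x, y in zip(a, b))


def cross_over(center: Sequence[int],
               parents: Sequence[Sequence[int]]) -> List[int]:
    """Merge into one child every knob change the parents made to the center."""
    child = list(center)
    for parent in parents:
        for knob, setting in enumerate(parent):
            if setting != center[knob]:
                child[knob] = setting
    return child


grandparent = [0, 0, 0, 0, 0]
parent_a = [0, 2, 0, 0, 0]     # differs from the grandparent in knob 1
parent_b = [0, 0, 0, 1, 0]     # differs from the grandparent in knob 3
child = cross_over(grandparent, [parent_a, parent_b])

print(child)
print(hamming(child, grandparent), hamming(child, parent_a))
```

<p>The child here is at distance 2 from the grandparent and distance 1 from each parent, as in the text.</p>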
<p>The term &#8220;simplex&#8221; used above refers to the shape of the iteration over the ranked instances: a 1-simplex is a straight line segment, a 2-simplex is a right triangle, a 3-simplex is a right tetrahedron.  The iteration is performed with 1, 2 or 3 nested loops that cross over 1, 2 or 3 instances against the highest.  It is important to notice that the loops do not run over the entire range of nearest neighbors, but only over the top scoring ones.   So, for example, crossing over the 7 highest-scoring instances for the 3-simplex generates 6!/(6-3)! = 6 × 5 × 4 = 120 candidates.  Scoring a mere 120 candidates can be very quick, as compared to an exhaustive search of many thousands of nearest neighbors.  Add to this the fact that most of the 120 are likely to score quite well, whereas only a tiny handful of the thousands of nearest neighbors will show any improvement, and the advantage of this guessing game is quite clear.</p>
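<p>A sketch of the 3-simplex iteration, under an assumption the post does not spell out: that the nested loops visit each unordered triple of instances once, as the tetrahedron shape suggests. The figure 6 × 5 × 4 = 120 counts ordered triples; since a crossover does not depend on the order of its parents, visiting each unordered triple once would give 20 distinct children from the same seven instances. The snippet prints both counts.</p>

```python
from itertools import combinations
from math import comb, perm

top = list(range(7))                 # ranks 0..6 of the seven best instances
best, rest = top[0], top[1:]

# Each candidate crosses the best instance with three of the remaining six.
triples = list(combinations(rest, 3))    # every unordered triple exactly once

print(len(triples), comb(6, 3), perm(6, 3))
```

<p>Either way, the number of candidates is tiny compared to the thousands of nearest neighbors in an exhaustive scan.</p>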
<p>So what is it like, after we put it all together? The graph below shows the score as a function of runtime.</p>
<div id="attachment_397" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/03/deme-tri-ti.png"><img class="size-full wp-image-397" src="http://blog.opencog.org/files/2012/03/deme-tri-ti.png" alt="Score as function of time" width="640" height="480" /></a><p class="wp-caption-text">Score as function of time</p></div>
<p>In the above graph, each tick mark represents one generation. The long horizontal stretches between tick marks show the time taken to perform an exhaustive nearest-neighborhood search.  For the first 100 seconds or so, the exemplar has very few knobs in it (a few hundred), and so an exhaustive search is quick and easy.  After this point, the exemplars get dramatically more complex, and consist of thousands of knobs.   At this point, an exhaustive neighborhood search becomes expensive: about 100 seconds or so, judging from the graph.  While the exhaustive search is always finding an improvement for this dataset, it is clear that performing some optimistic guessing can improve the score a good bit faster.  As can be seen from this graph, the algorithm falls back to an exhaustive search when the optimistic simplex-based guessing fails to show improvement; it then resumes with guessing.</p>
<p>To conclude: for many kinds of datasets, a very simple genetic-crossover algorithm combined with hillclimbing can prove a simple but effective search algorithm.</p>
<p><em>Nota Bene</em>: the above only works for some problem types; thus it is not (currently) enabled by default. To turn it on, specify the -Z1 flag when invoking moses.</p>
<h2>Appendix</h2>
<p>Just to keep things honest, and to show some of the difficulty of algorithm tuning, below is a graph of some intermediate results taken during the work.  I won&#8217;t explain what they all are, but do note one curious feature:  the algos which advance the fastest initially seem to have trouble advancing later on.  This suggests a somewhat &#8220;deceptive&#8221; scoring landscape: the strong early advancers get trapped in local maxima that they can&#8217;t escape.   The weak early advancers somehow avoid these traps.  Note also that results have some fair dependence on the random number generator seed; different algos effectively work with different random sequences, and so confuse direct comparison by some fair bit.</p>
<div id="attachment_415" class="wp-caption alignleft" style="width: 1034px"><a href="http://blog.opencog.org/files/2012/03/deme-tri-more.png"><img class="size-full wp-image-415" src="http://blog.opencog.org/files/2012/03/deme-tri-more.png" alt="Many Different Algorithms" width="1024" height="768" /></a><p class="wp-caption-text">Many Different Algorithms</p></div>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2012/03/20/genetic-crossover-in-moses/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Tuning Metalearning in MOSES</title>
		<link>http://blog.opencog.org/2012/02/07/tuning-moses/</link>
		<comments>http://blog.opencog.org/2012/02/07/tuning-moses/#comments</comments>
		<pubDate>Tue, 07 Feb 2012 20:53:31 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[MachineLearning]]></category>
		<category><![CDATA[MOSES]]></category>

		<guid isPermaLink="false">http://blog.opencog.org/?p=326</guid>
		<description><![CDATA[I've been studying MOSES recently, with an eye towards performance tuning it. Turns out optimization algorithms don't always behave the way you think they do, and certainly not the way you want them to.

Given a table of values, MOSES ...]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been studying <a href="http://wiki.opencog.org/w/MOSES">MOSES</a> recently, with an eye towards performance tuning it. Turns out optimization algorithms don&#8217;t always behave the way you think they do, and certainly not the way you want them to.</p>
<p>Given a table of values, MOSES will automatically learn a program that reproduces those values. That is, MOSES performs table regression: given N columns of &#8220;input&#8221; values, and one column of &#8220;output&#8221;, MOSES will create a program that outputs the output, given the inputs.  MOSES can deal with both floating point and boolean inputs, and thus can learn, for example, expressions such as ((x&lt;2) AND b) OR (x*(y+1) &gt;3).  MOSES programs are real &#8220;programs&#8221;: it can even learn branches and loops, although I won&#8217;t explore that here. For performance tuning, I studied the 4-parity problem: given 4 input bits, compute the parity bit.  Written out in terms of just AND, OR and NOT, this is a fairly complex expression, and is rather non-trivial to learn.</p>
<p>MOSES performs learning by keeping a &#8220;metapopulation&#8221; of example programs, or &#8220;exemplars&#8221;.  These are graded on how well they match the output, given the inputs. For the 4-parity problem, there are 2<sup>4</sup>=16 different possible inputs; a given program may get any number of these correct.  For example, there are 16 ways to get one answer wrong, 16×15/2 = 120 ways to get two wrong, 16×15×14/(2×3) = 560 ways to get three wrong, <em>etc.</em> This is the binomial coefficient: (16 choose <em>k</em>) ways to get <em>k</em> answers wrong, in general. But this doesn&#8217;t mean that there are only 16 different programs that get one answer wrong: there are zillions: some simple, some very very complex.</p>
<p>As MOSES iterates, it accumulates a metapopulation of programs that best fit the data. As soon as it finds a program that gets more correct answers than the others, the old metapopulation is wiped out; but then, it starts growing again, as new programs with equal score are found.  This is shown in the following graph:</p>
<div id="attachment_333" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/02/stats-r7.png"><img class="size-full wp-image-333" src="http://blog.opencog.org/files/2012/02/stats-r7.png" alt="Metapopulation size as function of generation." width="640" height="480" /></a><p class="wp-caption-text">Metapopulation size as function of generation number.</p></div>
<p>The red line shows the metapopulation size (divided by 50), as a function of the generation number (that is, the iteration count).  It can be seen to collapse every time the score improves; here, the &#8220;minus score&#8221;, in green, is the number of wrong answers: a perfect score has zero wrong answers; the program stops when a perfect score is reached.</p>
<p>In blue is the complexity of the program; more precisely, the complexity of the least complex program that produces the given score. Computing the parity requires a fairly complex combination of AND&#8217;s, OR&#8217;s and NOT&#8217;s; there is a minimum amount of complexity such a program can have.  Here, for example, are two different programs that compute the parity perfectly, a short one:</p>
<pre>and(or(and(or(and(!#1 !#2) and(!#3 #4)) or(!#2 !#3))
   and(#1 #2) and(#3 !#4))
   or(and(!#1 #2) and(#1 !#2) and(!#3 !#4) and(#3 #4)))</pre>
<p>and a longer one:</p>
<pre>or(and(or(and(or(!#1 !#3) #4) and(or(#1 !#2) !#3 !#4)
   and(or(#3 #4) #2)) or(and(or(!#1 !#4) #2 !#3)
   and(or(!#2 #3) #1 #4) and(!#1 !#4) and(!#2 #3)))
   and(#1 !#2 #3 !#4))</pre>
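<p>As a sanity check, the shorter of the two programs above can be translated by hand into Python and tested against all 16 inputs. The translation assumes that <tt>#n</tt> denotes the <em>n</em>-th input and <tt>!</tt> denotes negation; the function name <tt>short_program</tt> is made up for this sketch. Read this way, the program is true exactly when an even number of the four inputs are true.</p>

```python
from itertools import product


def short_program(x1: bool, x2: bool, x3: bool, x4: bool) -> bool:
    """Hand translation of the shorter program; #n is read as the n-th input."""
    first = ((((not x1 and not x2) or (not x3 and x4)) and (not x2 or not x3))
             or (x1 and x2)
             or (x3 and not x4))
    second = ((not x1 and x2) or (x1 and not x2)
              or (not x3 and not x4) or (x3 and x4))
    return first and second


# Compare against a direct count of the true inputs, for all 16 cases.
all_match = all(
    short_program(*bits) == (sum(bits) % 2 == 0)
    for bits in product([False, True], repeat=4)
)
print(all_match)
```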
<p>More on complexity later.</p>
<p>But first: how long does it take for MOSES to find a solution to 4-parity? It turns out that this depends strongly on the random-number sequence.  MOSES makes heavy use of a random number generator to explore the problem space.  Each run can be started with a different seed value, to seed the random number generator.  Some runs find the correct solution quickly; others take a surprisingly long amount of time. Amazingly so: the runtimes appear to follow a logarithmic distribution, as in the following graph:</p>
<div id="attachment_336" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/02/k4-bigger.png"><img class="size-full wp-image-336" src="http://blog.opencog.org/files/2012/02/k4-bigger.png" alt="" width="640" height="480" /></a><p class="wp-caption-text">Runtime, showing temperature dependence</p></div>
<p>On the vertical axis is the amount of time, in seconds, to find a solution. On the horizontal axis is the order in which a solution was found, out of 20 random attempts.  The way to read this graph is as follows: there is a Pr=1/20 chance of finding a solution in about 10 seconds.  There is a Pr=2/20 chance of finding a solution in about 20 seconds, <em>etc.</em> Continuing: about a Pr=6/20 chance of finding a solution in less than about 100 seconds, and a Pr=17/20 chance of finding a solution in less than about 1000 seconds.</p>
<p>The shape of this graph indicates that there is a serious problem with the current algorithm. To see this, consider running two instances of the  algorithm for 300 seconds each. Per the above graph, there is a 50-50 chance that each one will finish, or a 75% chance that at least one of them will finish.  That is, we have a 75% chance of having an answer after 600 CPU-seconds.  This is better than running a single instance, which requires about 900 seconds before it has a 75% chance of finding an answer!  This is bad.  It appears that, in many cases, the algorithm is getting stuck in a region far away from the best solution.</p>
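<p>The 75% figure above is just the chance that at least one of two independent runs succeeds. A minimal sketch of that arithmetic (the function name <tt>at_least_one</tt> is made up for the example, and independence of the runs is assumed):</p>

```python
def at_least_one(p: float, n: int) -> float:
    """Chance that at least one of n independent runs finishes, each with chance p."""
    return 1.0 - (1.0 - p) ** n


print(at_least_one(0.5, 2))   # two runs, each with a 50-50 chance
print(at_least_one(0.5, 3))   # a third run helps further
```

<p>With <em>p</em> = 0.5, two runs give 0.75 and three give 0.875.</p>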
<p>Can we do better? Yes. Write <em>p </em>= Pr(<em>t&lt;T</em>) for the probability that a single instance will find a solution in less time than <em>T</em>.  Then, from the complexity point of view, it would be nice if we had an algorithm for which two instances did NOT run faster than a single instance taking twice as long; that is, if</p>
<p style="padding-left: 30px">Pr(<em>t&lt;2T</em>) ≤ <em>p<sup>2</sup>+2p(1-p)</em></p>
<p>The first term, <em>p<sup>2</sup></em>, is the probability that both instances finished.  The second term is the probability that one instance finished, and the other one did not (times two, as there are two ways this could happen).   More generally, for <em>n</em> instances,  we sum the probability that all <em>n</em> finished, with the probability that <em>n-1</em> finished, and one did not (<em>n</em> different ways), <em>etc.</em>:</p>
<p style="padding-left: 30px">Pr(<em>t&lt;nT</em>) ≤ <em>p<sup>n</sup> + np<sup>n-1</sup>(1-p) + [n(n-1)/2]p<sup>n-2</sup>(1-p)<sup>2</sup> + &#8230; + np(1-p)<sup>n-1</sup></em></p>
<p>This inequality, this desired bound on performance, has a simple solution, given by the exponential decay of probability:</p>
<p style="padding-left: 30px">Pr(<em>t&lt;T</em>) = 1-exp(<em>-T/m</em>)</p>
<p>As before,  Pr(<em>t&lt;T</em>) is the probability of finding a solution in less than time <em>T</em>, and<em> m</em> is the mean time to finding a solution (the expectation value). To better compare the measured performance to this desired bound, we need to graph the data differently:</p>
<p><a href="http://blog.opencog.org/files/2012/02/log-bound-unclamped.png"><img class="alignnone size-full wp-image-356" src="http://blog.opencog.org/files/2012/02/log-bound-unclamped.png" alt="Showing the exponential bound" width="640" height="480" /></a></p>
<p>This graph shows the same data as before, but graphed differently: the probability of not yet having found a solution is shown on the horizontal axis. Note that this axis is logarithmic, so that the exponential decay bound becomes a straight line.  Here, the straight purple line shows the bound for a 500 second decay constant; ideally, we&#8217;d like an algorithm that generates points below this line.</p>
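<p>A quick numerical check of the claim that exponential decay solves the bound. The sketch sums the binomial terms for <em>n</em> parallel instances and compares the total with a single exponentially-distributed instance run <em>n</em> times as long; it also confirms that the logarithm of the not-yet-solved probability is linear in <em>T</em>, which is why the bound plots as a straight line. The function names and the value <em>T</em>=120 are made up for the example; <em>m</em>=500 matches the decay constant of the purple line.</p>

```python
from math import comb, exp, isclose, log


def exponential_cdf(T: float, m: float) -> float:
    """Pr(t < T) for exponential decay with mean time m."""
    return 1.0 - exp(-T / m)


def parallel_bound(p: float, n: int) -> float:
    """Sum of the binomial terms: at least one of n instances has finished."""
    return sum(comb(n, k) * p**k * (1.0 - p)**(n - k) for k in range(1, n + 1))


m = 500.0    # decay constant, in seconds
T = 120.0    # hypothetical time slice, in seconds

for n in (2, 3, 5):
    single = exponential_cdf(n * T, m)                  # one instance, time n*T
    parallel = parallel_bound(exponential_cdf(T, m), n)  # n instances, time T
    print(n, single, parallel, isclose(single, parallel))

# Log of the survival probability is linear in T.
print(log(1.0 - exponential_cdf(T, m)), -T / m)
```

<p>For the exponential distribution the two sides agree (up to rounding), so splitting the work across instances neither helps nor hurts.</p>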
<p>Before continuing, a short aside on the label &#8220;<em>temp</em>&#8221;, which we haven&#8217;t explained yet. During the search, MOSES typically picks one of the simplest possible programs out of the current metapopulation, and explores variations of it; that is, it explores its local neighborhood.  If it cannot find a better program, it picks another, simple, exemplar out of the metapopulation, and tries with that, and so on.   It occurred to me that perhaps MOSES was being too conservative in always picking from among the least complex exemplars.  Perhaps it should be more adventurous, and occasionally pick a complex exemplar, and explore variations on that.   The results are shown in the green and blue lines in the graph above.  The <tt>select_exemplar()</tt> function uses a Boltzmann distribution to pick the next exemplar to explore.  That is, the probability of picking an exemplar of complexity <em>C</em> as a starting point is proportional to</p>
<p style="padding-left: 30px">exp(-<em>C/temp</em>)</p>
<p>where <em>temp</em> is the &#8220;temperature&#8221; of the distribution. The original MOSES algorithm used <em>temp</em>=1, which appears to be a bit too cold; a temperature of 2 seems about right.  With luck, this new, improved code will be checked into BZR by the time you read this.</p>
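<p>To get a feel for what the temperature does, here is a small Python sketch of Boltzmann-weighted selection. It is not the MOSES <tt>select_exemplar()</tt> code: the names <tt>boltzmann_weights</tt> and <tt>pick_exemplar</tt>, and the example complexities, are assumptions for illustration.</p>

```python
import random
from math import exp
from typing import List, Sequence


def boltzmann_weights(complexities: Sequence[float], temp: float) -> List[float]:
    """Unnormalized weight exp(-C/temp) for each exemplar complexity C."""
    return [exp(-c / temp) for c in complexities]


def pick_exemplar(complexities: Sequence[float], temp: float,
                  rng: random.Random) -> int:
    """Return the index of an exemplar, drawn with Boltzmann weights."""
    weights = boltzmann_weights(complexities, temp)
    return rng.choices(range(len(complexities)), weights=weights, k=1)[0]


complexities = [4, 6, 10]            # hypothetical exemplar complexities
for temp in (1.0, 2.0):
    w = boltzmann_weights(complexities, temp)
    # How likely is the most complex exemplar, relative to the simplest?
    print(temp, w[2] / w[0])

print(pick_exemplar(complexities, 2.0, random.Random(0)))
```

<p>Raising the temperature from 1 to 2 lifts the relative weight of the complexity-10 exemplar from exp(-6) to exp(-3), roughly a twenty-fold increase: the search becomes more adventurous.</p>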
<p>There is another issue: the unbounded size of the metapopulation. When MOSES stalls, grinding away and having trouble finding a better solution, the size of the metapopulation tends to grow without bounds, linearly over time. It can get truly huge: sometimes up to a million, after a few thousand generations.  Maintaining such a large metapopulation is costly: it takes up storage, and eats up CPU time to keep it sorted in order of complexity.  Realistically, with a metapopulation that large, there is only a tiny chance (exponentially small!) that one of the high-complexity programs will be selected for the next round. The obvious fix is to clamp down on the population size, getting rid of the unlikely, high-complexity members.   I like the results so far:</p>
<div id="attachment_357" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/02/log-bound-clamped.png"><img class="size-full wp-image-357" src="http://blog.opencog.org/files/2012/02/log-bound-clamped.png" alt="clamped data" width="640" height="480" /></a><p class="wp-caption-text">Runtime, using a limited population size.</p></div>
<p>Clamping the population size clearly improves performance &#8212; by a factor of two or more, as compared to before.  However, the troublesome behavior, with some solutions being hard to discover, remains.</p>
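<p>One simple way to clamp is to keep only the least complex members. The sketch below assumes that rule and a made-up representation of the metapopulation as (complexity, program) pairs; the actual MOSES clamping logic may differ.</p>

```python
from typing import List, Tuple


def clamp_metapopulation(metapop: List[Tuple[int, str]],
                         max_size: int) -> List[Tuple[int, str]]:
    """Keep only the max_size least complex (complexity, program) pairs."""
    by_complexity = sorted(metapop, key=lambda entry: entry[0])
    return by_complexity[:max_size]


# Hypothetical metapopulation; program bodies are placeholders.
metapop = [(12, "prog_a"), (5, "prog_b"), (30, "prog_c"), (7, "prog_d")]
print(clamp_metapopulation(metapop, max_size=2))
```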
<p>Now, to attack the main issue: let&#8217;s hypothesize about what might be causing the exceptionally long runtimes.  Perhaps the algorithm is getting stuck at a local maximum?  Due to the knob-insertion/tweaking nature of the algorithm, there are no &#8220;true&#8221; local maxima, but some may just have very narrow exits.  The standard solution is to apply a simulated-annealing-type trick, to bounce the solver out of the local maximum.  But we are already using a Boltzmann factor, as described above, so what&#8217;s wrong?</p>
<p>The answer seems to be that the algorithm was discarding the &#8220;dominated&#8221;  exemplars, and was keeping only those with the best score, and varying levels of complexity. It only applied the Boltzmann factor to the complexity.  What if, instead, we applied the Boltzmann factor to a mixture of score and complexity?  Specifically, let&#8217;s try this:</p>
<p style="padding-left: 30px">exp(-(<em>C</em> - <em>S W</em>) / <em>temp</em>)</p>
<p>Here, <em>C</em> is the complexity, as before, while <em>S</em> is the score, and <em>W</em> a weight.  That is, some of the time, the algorithm will select exemplars with a poor score, thus bouncing out of the local maximum.  Setting <em>W</em> to zero regains the old behavior, where only the highest-scoring exemplars are explored.  So .. does this work? Yes! Bingo! Look at this graph:</p>
<div id="attachment_363" class="wp-caption alignnone" style="width: 650px"><a href="http://blog.opencog.org/files/2012/02/log-bound-weighted1.png"><img class="size-full wp-image-363" src="http://blog.opencog.org/files/2012/02/log-bound-weighted1.png" alt="score-weighted annealing" width="640" height="480" /></a><p class="wp-caption-text">Score-weighted Annealing</p></div>
<p>Two sets of data points, those for <em>W</em>=1/4 and 1/3, look very good.  It&#8217;s somewhat strange and confusing that other <em>W</em> values do so poorly.   I&#8217;m somewhat worried that the <em>W</em>=1/4 value is &#8220;magical&#8221;: take a look again at the very first graph in this post.  Notice that every time a better solution is found, the complexity jumps by about 4.  Is the <em>W</em>=1/4 value special to the 4-parity problem? Will other problems behave similarly, or not?</p>
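<p>The score-weighted Boltzmann factor is easy to experiment with. The sketch below assumes that the score <em>S</em> is minus the number of wrong answers (so a perfect score is zero), which is how the &#8220;minus score&#8221; in the first graph reads; the function name <tt>annealing_weight</tt> and the example exemplars are made up.</p>

```python
from math import exp


def annealing_weight(complexity: float, score: float,
                     weight: float, temp: float) -> float:
    """Unnormalized selection weight exp(-(C - S*W)/temp)."""
    return exp(-(complexity - score * weight) / temp)


# Hypothetical exemplars as (complexity, score); score = -(wrong answers).
exemplars = [(12, -2), (8, -3), (8, -5)]

for W in (0.0, 0.25):
    weights = [annealing_weight(c, s, W, temp=2.0) for c, s in exemplars]
    total = sum(weights)
    print(W, [round(w / total, 3) for w in weights])
```

<p>With <em>W</em>=0 the two complexity-8 exemplars are equally likely, regardless of score; with <em>W</em>=1/4 the better-scoring one is favored, while the poorer one keeps a non-zero chance of being picked.</p>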
<p>I&#8217;m continuing to experiment. Collecting data takes a long time.  More later&#8230;  The above was obtained with the code in bzr revision 6573, with constant values for &#8220;temp&#8221; and &#8220;weight&#8221; hand-edited as per the graphs.  Later revisions have refinements that fundamentally alter some loops, including that in <tt>select_exemplar()</tt>, thus altering the range of reasonable values, and the meaning/effect of some of these parameters. Sorry <img src='http://blog.opencog.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> </p>
<p>I do hope that this post offers some insight into how MOSES actually works.  A general overview of MOSES can be found on the <a href="http://wiki.opencog.org/w/MOSES">MOSES wiki</a>, as well as a detailed description of the <a href="http://wiki.opencog.org/w/MOSES_algorithm">MOSES algorithm</a>. But even so, the actual behavior, above, wasn&#8217;t obvious, at least to me, until I did the experiments.</p>
<h2>Appendix: Parallelizability</h2>
<p>A short footnote about the generic and fundamental nature of the exponential decay of time-to-solution in search problems. Earlier in this post, there is a derivation of exponential decay as the result of running <em>N</em> instances in parallel.   How should this be understood, intuitively?</p>
<p>Search algorithms are, by nature, highly parallelizable: there are many paths (<em>aka exemplars</em>) to explore; some lead to a solution, some do not.  (An exemplar is like a point on a path: from it, there are many other paths leading away.)  A serial search algorithm must implement a chooser: which exemplar to explore next? If this chooser is unlucky/unwise, it will waste effort exploring exemplars that don&#8217;t lead to a solution, before it finally gets around to the ones that do.  By contrast, if one runs <em>N</em> instances in parallel (<em>N</em> large), then the chooser doesn&#8217;t matter: the <em>N-1</em> &#8216;bad&#8217; exemplars are irrelevant, because the one good one that leads to a solution will end the show.</p>
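<p>A toy simulation can make this picture concrete. It is a sketch under a simplifying assumption that is not taken from MOSES, and not necessarily the model used in the derivation earlier in this post: each of the <em>N</em> instances independently stumbles on a solution with a fixed, hypothetical probability <tt>p_step</tt> per step. The run ends as soon as any one instance succeeds, so the fraction of runs still unsolved after <em>t</em> steps should be (1 - p_step)^(N t), which is an exponential decay in <em>t</em>.</p>

```python
import math
import random

def parallel_time_to_solution(n_instances, p_step, rng):
    """Number of steps until the first of n_instances independent searchers succeeds."""
    steps = 0
    while True:
        steps += 1
        # Each instance finds a solution on this step with probability p_step;
        # a single success among them ends the run.
        if any(p_step > rng.random() for _ in range(n_instances)):
            return steps

def survival_fraction(times, t):
    """Fraction of simulated runs that are still unsolved after t steps."""
    return sum(1 for x in times if x > t) / len(times)

def predicted_survival(n_instances, p_step, t):
    """Toy-model prediction (1 - p_step)**(n_instances * t), written as an exponential."""
    return math.exp(n_instances * t * math.log(1.0 - p_step))

if __name__ == "__main__":
    rng = random.Random(0)
    n, p = 8, 0.01  # hypothetical values, chosen only for illustration
    times = [parallel_time_to_solution(n, p, rng) for _ in range(4000)]
    for t in (5, 10, 20, 40):
        print(t, survival_fraction(times, t), predicted_survival(n, p, t))
```

<p>In this toy model, the measured and predicted columns should agree up to sampling noise, and doubling <em>N</em> doubles the decay rate.</p>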
<p>Thus, we conclude: if a serial search algorithm follows the exponential decay curve, then it has an optimal chooser for the next exemplar to explore.  If it is &#8220;worse&#8221; than exponential, then the chooser is poorly designed or incapable.  If it is &#8220;better&#8221; than exponential, then that means that there is a fixed startup cost associated with each parallel instance: cycles that each instance must pay in order to solve the problem, but that do not directly advance it towards a solution.  Ideal algorithms avoid/minimize such startup costs.  Thus, the perfect, optimal algorithm, when run in serial mode, will exhibit exponential solution-time decay.</p>
<p>The current MOSES algorithm very nearly achieves this for 4-parity, as shown in this last figure, which compares the original chooser to the current one (bzr revno 6579).</p>
<p><a href="http://blog.opencog.org/files/2012/02/final.png"><img class="alignnone size-full wp-image-366" src="http://blog.opencog.org/files/2012/02/final.png" alt="runtime, tuned chooser" width="640" height="480" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2012/02/07/tuning-moses/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Google Summer of Code</title>
		<link>http://blog.opencog.org/2008/05/05/google-summer-of-code/</link>
		<comments>http://blog.opencog.org/2008/05/05/google-summer-of-code/#comments</comments>
		<pubDate>Mon, 05 May 2008 20:36:11 +0000</pubDate>
		<dc:creator>David Hart</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[GSoC]]></category>
		<category><![CDATA[HyperGraphDB]]></category>
		<category><![CDATA[MOSES]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[OpenBiomind]]></category>
		<category><![CDATA[OpenCog]]></category>
		<category><![CDATA[OpenSim]]></category>
		<category><![CDATA[Pleasure]]></category>
		<category><![CDATA[RelEx]]></category>

		<guid isPermaLink="false">http://opencog.wordpress.com/?p=11</guid>
		<description><![CDATA[Crunch time is here! Our participation in Google's Summer of Code program has accelerated release schedules and shifted priorities. Ben is busy writing initial documentation, converting much of it from Novamente documentation. Gustavo, Senna and Linas are working to ...]]></description>
			<content:encoded><![CDATA[<p>Crunch time is here! Our participation in Google&#8217;s <a href="http://code.google.com/soc/2008/">Summer of Code</a> program has accelerated release schedules and shifted priorities. Ben is busy writing initial documentation, converting much of it from Novamente documentation. Gustavo, Senna and Linas are working to tidy OpenCog code, removing crufty and embarrassing bits and improving infrastructure and interfaces. Joel is working on the first collection of research-oriented MindAgents. You&#8217;ll hear more soon on this blog from these <a href="http://opencog.org/wiki/Community">team members</a>, and from GSoC students on the <a href="http://opencog.ning.com/">OpenCog Collective</a> blog and the new list <a href="http://groups.google.com/group/opencog-soc">opencog-soc@googlegroups.com</a>.</p>
<p>To quote Ben Goertzel&#8217;s post on <a href="http://groups.google.com/group/opencog/browse_thread/thread/aa7159e328f00406/63db705f7f518fb4">opencog@googlegroups.com</a>:<br />
<blockquote>The Google Summer of Code selection process is done, and 11 proposals were chosen.</p>
<p>It was a really painful process to go through, as we had more than 70 applications, and at least 25-30 of them were really quite good.</p>
<p>The accepted proposals span a fairly wide variety of areas, and the choices were ultimately made based on a number of factors including</p>
<p>&#8211; clarity and completeness of the proposal<br />
&#8211; background of the student<br />
&#8211; readiness of the OpenCog codebase for the project<br />
&#8211; critical-ness of the project for OpenCog</p>
<p>Of the 11 selected, 2 were for OpenBiomind projects, and the other 9 for OpenCog proper &#8230; including a bunch of stuff for the RelEx NLP toolkit.</p>
<p>There was a strong bias toward proposals dealing with improvements to OpenCog-related software components that already are moderately mature, like RelEx and MOSES.</p>
<p>Next year when OpenCog is more mature, if we are chosen to participate in GSoC again (as we hope, and have reason to somewhat expect), you can expect to see more explicitly, broadly, AGI-related proposals.</p>
<p>Anyway this list of selected projects is here for all who are curious:</p>
<p><b>OpenSim for OpenCog<br />
by Kino High Coursey, mentored by Andre Luiz de Senna</p>
<p>Implementing a SAT/SMT Based Link Grammar Parser<br />
by Filip Marić, mentored by Predrag Janicic</p>
<p>Bayesian and Causal Networks Inference using Indefinite Probabilities<br />
by Cesar Augusto Cavalheiro Marcondes, mentored by Cassio Pennachin</p>
<p>Java GUI for OpenBiomind<br />
by Bhavesh Sanghvi, mentored by Murilo Saraiva de Queiroz</p>
<p>MOSES: the Pleasure Algorithm<br />
by Alesis Novik, mentored by Nil Geisweiller</p>
<p>Graph Algorithms for HyperGraphDB<br />
by Guo Junfei, mentored by Ben Goertzel</p>
<p>Improved MOSES<br />
by ChenShuo, mentored by Moshe Looks</p>
<p>RelEx Web Crawler and HypergraphDB Manager<br />
by Rich Jones, mentored by David Hart</p>
<p>RelEx: Learning Simple Grammars<br />
by Elizabeth Dawn Alpert, mentored by Lukasz Kaiser</p>
<p>Distributed HyperGraphDB Version<br />
by Costa Ciprian, mentored by Borislav Iordanov</p>
<p>Recursive Feature Selection for Enhancing Genetic Disease Prediction<br />
by Paul Cao, mentored by Lucio de Souza Coelho</b></p>
<p>Many thanks to all who applied, all who agreed to help mentor &#8230; and especially to David Hart for coming up with the idea of applying for SIAI to be included as a mentoring organization in GSoC, with a focus on OpenCog work.</p></blockquote>
<p>We&#8217;d also like to thank the terrific Open Source team at Google, particularly Leslie Hawthorn, Dave Anderson and Chris DiBona, for their patience and good advice.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2008/05/05/google-summer-of-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

