<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>OpenCog Brainwave &#187; NLP</title>
	<atom:link href="http://blog.opencog.org/tag/nlp/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.opencog.org</link>
	<description>The latest developments in building an open-source mind</description>
	<lastBuildDate>Wed, 21 Mar 2012 16:44:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Meaning-Text Theory</title>
		<link>http://blog.opencog.org/2009/11/08/meaning-text-theory/</link>
		<comments>http://blog.opencog.org/2009/11/08/meaning-text-theory/#comments</comments>
		<pubDate>Sun, 08 Nov 2009 04:58:49 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Theory]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Mel'cuk]]></category>
		<category><![CDATA[MTT]]></category>
		<category><![CDATA[Natural Language Generation]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[RelEx]]></category>
		<category><![CDATA[semantics]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://brainwave.opencog.org/?p=154</guid>
		<description><![CDATA[During some recent reading, it struck me that a useful framework for thinking about and talking about sentence generation is the MTT or "meaning-text theory" of Igor Mel'cuk, et al. Here is one readable reference:

Igor A. Mel'čuk and ...]]></description>
			<content:encoded><![CDATA[<p>During some recent reading, it struck me that a useful framework for thinking and talking about sentence generation is the MTT or &#8220;meaning-text theory&#8221; of Igor Mel&#8217;čuk, <em>et al.</em> Here is one readable reference:</p>
<p>Igor A. Mel&#8217;čuk and Alain Polguère, (1987) &#8220;A Formal Lexicon in Meaning-Text Theory&#8221;, Computational Linguistics, vol. 13, pp. 261-275.</p>
<p><a href="http://portal.acm.org/citation.cfm?id=48160.48166" target="_blank">portal.acm.org/citation.cfm?id=48160.48166</a><br />
<a href="http://www.aclweb.org/anthology/J/J87/J87-3006.pdf" target="_blank">www.aclweb.org/anthology/J/J87/J87-3006.pdf</a></p>
<p>Within the context of that theory, the output of the Stanford parser is strictly at the SSynR or &#8220;surface syntactic representation&#8221; level, while, as a general rule, RelEx attempts to generate the DSynR or &#8220;deep syntactic representation&#8221; structure.  Some of what I&#8217;ve been trying to do with OpenCog aims at the &#8220;SemR&#8221; structure, as described in that paper.</p>
<p>The more I read about MTT, the more it seems to capture some of what we are trying to do (and de facto are doing) with NLP within OpenCog.  In particular, the MTT concept of a &#8220;lexical function&#8221; (which is not really described in that paper?) could be a particularly strong way of guaranteeing correct syntactic output for <a href="http://opencog.org/wiki/SegSim">segsim</a>, <a href="https://launchpad.net/nlgen">nlgen</a> or <a href="http://www.louisiana.edu/~bal2277/NLGen2.doc">NLGen2</a>.</p>
<p>&#8211; Linas Vepstas</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2009/11/08/meaning-text-theory/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Semantic dependency relations</title>
		<link>http://blog.opencog.org/2009/10/05/semantic-dependency-relations/</link>
		<comments>http://blog.opencog.org/2009/10/05/semantic-dependency-relations/#comments</comments>
		<pubDate>Mon, 05 Oct 2009 03:51:03 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Documentation]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[dependency grammar]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[Rel]]></category>
		<category><![CDATA[Semantic relations]]></category>
		<category><![CDATA[Syntax]]></category>

		<guid isPermaLink="false">http://opencog.wordpress.com/?p=145</guid>
		<description><![CDATA[I spent the weekend comparing the Stanford parser to RelEx, and learned a lot.  RelEx really does deserve to be called a "semantic relation extractor", and not just a "dependency relation extractor".  It provides a more abstract, ...]]></description>
			<content:encoded><![CDATA[<p>I spent the weekend comparing the <a href="http://nlp.stanford.edu/software/lex-parser.shtml">Stanford parser</a> to <a href="http://opencog.org/wiki/RelEx_Semantic_Relationship_Extractor">RelEx</a>, and learned a lot.  RelEx really does deserve to be called a &#8220;semantic relation extractor&#8221;, and not just a &#8220;dependency relation extractor&#8221;.  It provides a more abstract, more semantic output than the Stanford parser, which sticks very narrowly to the syntactic structure of a sentence.</p>
<p>I wrote up a few paragraphs on the most prominent differences; most of my updates were to the <a href="http://opencog.org/wiki/Dependency_relations">RelEx dependency relations</a> page.</p>
<p>Here are the main bullet points:</p>
<ul>
<li>RelEx attempts basic entity extraction, and thus avoids generating nn noun modifier relations for named entities.</li>
<li>RelEx will collapse the object and complement of a preposition into one. Stanford will do this for some, but not all relationships.</li>
<li>RelEx will convert passive subjects into objects, and instead indicate passiveness by tagging the verb with a passive tense feature.</li>
<li> RelEx avoids generating copulas, if at all possible, and instead indicates copular relations as predicative adjectives, or in other ways.</li>
<li>RelEx extracts semantic variables from questions, with the intent of simplifying question answering. For example, &#8220;<em>Where is the ball?</em>&#8221; generates <em>_pobj(_%atLocation, _$qVar) _psubj(_%atLocation, ball)</em>, which can then pattern-match a plausible answer: <em>_pobj(under, couch)</em>.</li>
<li>RelEx attempts to extract <a href="http://opencog.org/wiki/Comparison_variables">comparison variables</a>.</li>
</ul>
<p>It&#8217;s also clear to me that I could split up the RelEx processing into two stages: one which generates Stanford-style syntactic relations, and a second stage that generates the more abstract stuff.  This might be a wise move &#8230; Since RelEx is already more than 3x faster than the Stanford parser, this could attract new users.</p>
<p>&#8211; Linas Vepstas</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2009/10/05/semantic-dependency-relations/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Sentence Patterns</title>
		<link>http://blog.opencog.org/2009/09/08/sentence-patterns/</link>
		<comments>http://blog.opencog.org/2009/09/08/sentence-patterns/#comments</comments>
		<pubDate>Tue, 08 Sep 2009 14:54:23 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Introduction]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[chatbot]]></category>
		<category><![CDATA[IRC]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[parser]]></category>
		<category><![CDATA[question answering]]></category>
		<category><![CDATA[RelEx]]></category>

		<guid isPermaLink="false">http://opencog.wordpress.com/?p=139</guid>
		<description><![CDATA[I've recently resumed work on the question-answering chatbot, and am trying to get it to comprehend a broader range of questions and statements.   The "big idea" is to create a number of "sentence patterns" that the pattern matcher can ...]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve recently resumed work on the question-answering chatbot, and am trying to get it to comprehend a broader range of questions and statements.   The &#8220;big idea&#8221; is to create a number of &#8220;sentence patterns&#8221; that the pattern matcher can recognize and respond to.  The reason this is a &#8220;big&#8221; idea is that I am trying to avoid anything algorithmic or procedural &#8212; everything is to be done by specifying OpenCog hypergraphs, and NOT by writing C++ code, or <a href="http://www.gnu.org/software/guile/guile.html">scheme</a> code (or python code&#8230;etc). The reason for working entirely with patterns and hypergraphs, rather than with C++ or scheme, is that this puts the &#8220;knowledge&#8221; of the system into a form that AI routines can manipulate: learning algos can learn new hypergraphs; statistical algos can gather usage information on which hypergraphs get triggered, and so on.  This is all easier said than done: although I&#8217;ve eliminated a fair amount of question-answering code previously written in C++, I&#8217;ve also had to write some new scheme code. Bummer. <img src='http://blog.opencog.org/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> </p>
<p>Pattern matching is now used throughout the OpenCog NLP pipeline, although not in a unified manner. The <a href="http://www.abisource.com/projects/link-grammar/">Link Grammar parser</a> uses patterns (called &#8220;disjuncts&#8221;) to determine how the words in a sentence can link to one another, thus &#8220;parsing&#8221;, or pulling the grammatical structure out of a sentence (<a href="http://www.cs.cmu.edu/afs/cs.cmu.edu/project/link/pub/www/papers/ps/tr91-196.pdf">this paper</a> provides an excellent overview). The <a href="http://opencog.org/wiki/RelEx">RelEx dependency relation extractor</a> applies patterns to the link-grammar output to extract syntactic relations. For example, the sentence &#8220;John threw a ball&#8221; becomes</p>
<blockquote><p>_obj(throw, ball)<br />
_subj(throw, John)</p></blockquote>
<p>after RelEx gets done with it. And now, there are a dozen patterns inside of OpenCog that can pick out certain kinds of questions and statements from RelEx output, and pattern-match questions to find answers to them.</p>
<p>For example, the new OpenCog patterns convert &#8220;The capital of France is Paris&#8221; into</p>
<blockquote><p>capital_of(France, Paris)</p></blockquote>
<p>and similarly, &#8220;What is the capital of France?&#8221; into</p>
<blockquote><p>capital_of(France,what)</p></blockquote>
<p>Treating &#8220;what&#8221; as a variable, there is yet another pattern that matches up the form of the question to the form of the answer, thus deducing that &#8220;what&#8221; must be &#8220;Paris&#8221;.</p>
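<p>The matching step described above can be sketched in a few lines of Python. This is only an illustration and not the OpenCog pattern matcher: the tuple representation, the <tt>facts</tt> list, the <tt>QUESTION_WORDS</tt> set and the <tt>answer_question</tt> helper are all hypothetical names, and the sketch assumes that a question word in any argument position simply acts as a variable.</p>

```python
# Hypothetical in-memory store of relations extracted from statements.
facts = [
    ("capital_of", "France", "Paris"),
    ("capital_of", "Germany", "Berlin"),
]

# Assumption: these question words stand in for unknowns.
QUESTION_WORDS = {"what", "who", "where"}


def answer_question(query, known_facts):
    """Return every value the question word can take, one per matching fact."""
    answers = []
    for fact in known_facts:
        if len(fact) != len(query):
            continue
        bound = None
        matched = True
        for q_term, f_term in zip(query, fact):
            if q_term in QUESTION_WORDS:
                bound = f_term      # the variable takes the fact's value
            elif q_term != f_term:
                matched = False     # a constant disagreed; not an answer
                break
        if matched and bound is not None:
            answers.append(bound)
    return answers


print(answer_question(("capital_of", "France", "what"), facts))
```

<p>In the real system both the facts and the matching rule are themselves hypergraphs, so that learning algorithms can manipulate them; the Python above only mirrors the logic.</p>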
<p>Somewhat harder is using patterns to distinguish similar from dissimilar concepts, so that sentences like &#8220;John threw a green ball&#8221; aren&#8217;t used as answers to questions such as &#8220;Did John throw a red ball?&#8221;: the word &#8220;ball&#8221; with modifier &#8220;green&#8221; has to be detected as a different entity than the word &#8220;ball&#8221; with modifier &#8220;red&#8221; (such entities are called &#8220;semes&#8221; in the code).  In fact, out of laziness, I&#8217;ve punted on this one: the promotion of word-instances to &#8220;semes&#8221; is done by code, rather than by pattern matching. But soon, I hope, this will change. In the meantime, the <a href="http://buildbot.opencog.org/doxygen/d7/d41/opencog_2nlp_2seme_2README-source.html">README file</a> provides a more detailed discussion.</p>
<p>Here are some patterns that work these days:</p>
<blockquote><p>&lt;me&gt;         John threw a green ball.<br />
&lt;me&gt;         Fred threw a red ball<br />
&lt;me&gt;         Mary threw a blue rock<br />
&lt;me&gt;         who threw a ball?<br />
&lt;cogita-bot&gt; Syntax pattern match found: Fred John<br />
&lt;me&gt;         who threw a red ball?<br />
&lt;cogita-bot&gt; Syntax pattern match found: Fred</p>
<p>&lt;me&gt;         Did Fred throw a ball?<br />
&lt;cogita-bot&gt; Truth query determined &#8220;yes&#8221;: throw</p>
<p>&lt;me&gt;         Did Fred throw a red ball?<br />
&lt;cogita-bot&gt; Truth query determined &#8220;yes&#8221;: throw</p>
<p>&lt;me&gt;         The color of the book is red.<br />
&lt;me&gt;         What is the color of the book?<br />
&lt;cogita-bot&gt; Triples abstraction found: red</p>
<p>&lt;me&gt;         the cat sat on the mat<br />
&lt;me&gt;         what did the cat sit on?<br />
&lt;cogita-bot&gt; Triples abstraction found: mat</p></blockquote>
<p>And here are some that don&#8217;t yet work: &#8220;Did Fred throw a green ball?&#8221; &#8212; gets no reply, because the system can&#8217;t find an answer, and doesn&#8217;t make the common-sense leap of &#8220;can&#8217;t find answer-&gt; answer must be no&#8221;.  Another common-sense problem is illustrated by: &#8220;Did Fred throw a round ball?&#8221; &#8212; the system doesn&#8217;t know that balls are round, and simply assumes that a &#8220;round ball&#8221; is some special kind of &#8220;ball&#8221;.  Oh well. There&#8217;s work to be done.</p>
<p>You can try out the chatbot yourself (when it&#8217;s up, and not broken!) on the IRC chat channel #opencog on the freenode.net chat servers.</p>
<p>&#8211; Linas Vepstas</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2009/09/08/sentence-patterns/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Frequency of grammatical disjuncts</title>
		<link>http://blog.opencog.org/2009/07/06/frequency-of-grammatical-disjuncts/</link>
		<comments>http://blog.opencog.org/2009/07/06/frequency-of-grammatical-disjuncts/#comments</comments>
		<pubDate>Mon, 06 Jul 2009 18:14:56 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[frequency]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[GSoC]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[link-grammar]]></category>
		<category><![CDATA[NLP]]></category>

		<guid isPermaLink="false">http://opencog.wordpress.com/?p=123</guid>
		<description><![CDATA[The link-grammar parser uses labeled links to connect together pairs of words.  In order to capture the idea of proper grammatical construction, any given word is only allowed to have very specific links to its right or left: for ...]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://www.abisource.com/projects/link-grammar/">link-grammar parser</a> uses labeled links to connect together pairs of words.  In order to capture the idea of proper grammatical construction, any given word is only allowed to have very specific links to its right or left: for example, verbs have their subject on the left, and an object on the right.  Link-grammar defines hundreds of different link types, and there are typically dozens or even hundreds of ways that these can attach to a word. Each allowed set of links is called a &#8220;disjunct&#8221;. So, for example:</p>
<p style="text-align:center">MVp- Js+</p>
<p>is a disjunct that says &#8220;there must be an MVp link from this word, going to the left, and a Js link, going to the right&#8221;. This disjunct commonly connects prepositions to a verb on their left (the MV- link) and the object of the preposition on the right (the J+ link).</p>
<p>A good way to think about disjuncts is to imagine them as very fine-grained part-of-speech tags. Thus, when one sees &#8220;MVp- Js+&#8221; associated with a word, one knows not only that the word is a preposition, but even a bit more: it&#8217;s a preposition that took a singular object.  Disjuncts classify words not just into crude part-of-speech categories, but into much finer categories: thus verbs are not just transitive or intransitive verbs, but might be transitive verbs that take both direct and indirect objects, or participles, etc.</p>
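<p>To make the notation concrete, here is a minimal Python sketch that splits a disjunct string into its connectors and their directions. It is not part of the link-grammar code base; <tt>parse_disjunct</tt> is a hypothetical helper, and it assumes the simple space-separated form shown above, where each connector ends in a minus sign (link to the left) or a plus sign (link to the right).</p>

```python
def parse_disjunct(disjunct):
    """Split a disjunct such as "MVp- Js+" into (link_type, direction) pairs."""
    directions = {"-": "left", "+": "right"}
    connectors = []
    for token in disjunct.split():
        link_type, sign = token[:-1], token[-1]
        if sign not in directions:
            raise ValueError("connector must end in + or -: " + token)
        connectors.append((link_type, directions[sign]))
    return connectors


print(parse_disjunct("MVp- Js+"))
```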
<p>Siva Reddy, a GSoC 2009 summer student, prepared a table of the frequency of occurrence of different disjuncts in a large collection of text. The top six entries are</p>
<p style="text-align:center">Ds+           950275.635843<br />
Xp-           838569.90527<br />
A+          616522.664867<br />
AN+        566658.997313<br />
MVp- Js+       563082.649325<br />
MVp- Jp+      446487.310222</p>
<p style="text-align:left">and these are exactly what one might expect:</p>
<ul>
<li>Ds+ connects the determiner &#8220;the&#8221; to nouns: and of course, &#8220;the&#8221; is the most frequent word in the English language.</li>
<li>Xp- connects the period at the end of the sentence to the start of the sentence, so of course it&#8217;s frequently observed.</li>
<li>A+ connects adjectives to nouns, AN+ connects noun modifiers to nouns.</li>
<li>As noted above, MV connects verbs to modifying phrases, and J connects prepositions to objects, so that MV- J+ is the disjunct that most prepositions will get. Js connects to a singular object, Jp connects to a plural count or mass noun.</li>
</ul>
<p>A graph of rank vs. frequency is shown below:</p>
<div id="attachment_132" class="wp-caption alignnone" style="width: 490px"><img class="size-full wp-image-132" src="http://blog.opencog.org/files/2009/07/disjunct-true-rank2.png" alt="Disjunct rank vs. frequency of occurrence" width="480" height="360" /><p class="wp-caption-text">Disjunct rank vs. frequency of occurrence</p></div>
<p>As can be seen, the distribution is more or less Zipfian, with a power-law exponent of 1.5.  The fact that the long tail appears to be linear indicates that grammatical construction in the English language appears to be more or less scale-free: difficult and awkward constructions are increasingly rare.  The fact that the graph is not purely Zipfian, but instead has a knee for the most common grammatical connections, suggests that the most common grammatical constructions are &#8220;less common than they should be&#8221;: almost as if English speakers are resisting the use of formulaic sentence constructions. So, for example, since adjectives and noun-modifiers appear near the top of the rank, this suggests that English speakers &#8220;could have&#8221; used more adjectives and noun-modifiers, but didn&#8217;t. Quite why this is so is not clear.  Perhaps the use of anaphora and references in general helps decrease the need for lots of modifiers.</p>
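<p>For readers who want to check an exponent like this on their own counts, one common approach is a least-squares fit of log(frequency) against log(rank). The sketch below is an assumption about how such a fit could be done, not a description of how the graph above was produced; <tt>fit_power_law_exponent</tt> is a hypothetical helper, and the input is synthetic data constructed to follow an exact power law, not the disjunct table itself.</p>

```python
import math


def fit_power_law_exponent(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Returns the exponent a in frequency ~ C * rank**(-a).
    """
    ordered = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ordered) + 1)]
    ys = [math.log(freq) for freq in ordered]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return -covariance / variance


# Synthetic data that follows an exact power law with exponent 1.5.
synthetic = [1.0e6 * rank ** -1.5 for rank in range(1, 1001)]
exponent = fit_power_law_exponent(synthetic)
print(round(exponent, 3))
```

<p>On real counts with a knee, such a fit would presumably be restricted to the tail, since the top-ranked entries fall below the straight line.</p>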
<p>The open questions are then:</p>
<ol>
<li>Why a power law of 1.5?</li>
<li>Why is there a knee?</li>
<li>Does this result hold for other languages?</li>
</ol>
<p>The corpus used here consists of approximately 1 million sentences, obtained by parsing entire Wikipedia articles, Voice of America news stories, and 10 books from Project Gutenberg, including War and Peace, Jane Austen, and some scientific or medical texts.</p>
<p>&#8211; Linas Vepstas</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2009/07/06/frequency-of-grammatical-disjuncts/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>proto-chatbot at last!</title>
		<link>http://blog.opencog.org/2009/04/28/proto-chatbot-at-last/</link>
		<comments>http://blog.opencog.org/2009/04/28/proto-chatbot-at-last/#comments</comments>
		<pubDate>Tue, 28 Apr 2009 02:22:07 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Meta]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[chatbot]]></category>
		<category><![CDATA[ConceptNet]]></category>
		<category><![CDATA[link-grammar]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[RelEx]]></category>

		<guid isPermaLink="false">http://brainwave.opencog.org/?p=114</guid>
		<description><![CDATA[A prototype chatbot demonstrates the OpenCog NLP pipeline by parsing simple statements and answering simple questions.]]></description>
			<content:encoded><![CDATA[<p>Hands-on tutorials are planned for the next month or so; we&#8217;ve already had a few on PLN, and my turn is coming up, for the opencog NLP pipeline. So I thought I&#8217;d wire up a cute demo for the occasion: a rough, crude IRC chatbot, &#8220;La Cogita&#8221;. It can answer simple questions about straightforward statements.  Nothing fancy &#8230; it doesn&#8217;t do any reasoning <strong><em>at all</em></strong> &#8230; but it can work off of the basic syntactic structure of English sentences to find answers. Thus, for example:</p>
<p><tt><br />
&lt;linas&gt; Mary ate a mango<br />
&lt;cogita-bot&gt; Hello linas, parsing ...<br />
&lt;cogita-bot&gt; linas, you made a statement: Mary ate a mango<br />
&lt;linas&gt; what did Mary eat?<br />
&lt;cogita-bot&gt; Hello linas, parsing ...<br />
&lt;cogita-bot&gt; linas, you asked a question: what did Mary eat?<br />
&lt;cogita-bot&gt; The answer to your question is: mango<br />
</tt></p>
<p>It&#8217;s meant to be a demo of the basic NLP pipeline within OpenCog. It takes input text, runs it through the <a href="http://www.abisource.com/projects/link-grammar/">Link Grammar</a> + <a href="http://opencog.org/wiki/RelEx">RelEx</a> parser, imports the results into the OpenCog atomspace, sucks in a small common-sense database, and waits for questions to be posed.  The common-sense database is derived from MIT&#8217;s <a href="http://conceptnet.media.mit.edu/">ConceptNet (OpenMind/CommonSense project)</a>, and so one can have interactions like the following:</p>
<p><tt><br />
&lt;linas&gt; cogita-bot: what is a saxophone?<br />
&lt;cogita-bot&gt; Hello linas, parsing ...<br />
&lt;cogita-bot&gt; linas, you asked a question:  what is a saxophone?<br />
&lt;cogita-bot&gt; No answer was found to your question.<br />
&lt;linas&gt; hmm<br />
&lt;linas&gt; cogita-bot: what is an instrument?<br />
&lt;cogita-bot&gt; Hello linas, parsing ...<br />
&lt;cogita-bot&gt; linas, you asked a question:  what is an instrument?<br />
&lt;cogita-bot&gt; The answer to your question is: woodwind r bass harmonica An_Oboe Oboe megaphone saxophone chronometer drum scale ukulele cymbal instrument<br />
&lt;linas&gt; Heh. Complete with assorted linguistic garbage <img src='http://blog.opencog.org/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /><br />
</tt></p>
<p>You get the idea. Don&#8217;t ask it anything more complicated than the above examples: it will fail to find any answer. Again, <em><strong>it does no reasoning at all. It&#8217;s as thick as a brick</strong></em>. You can test-drive it at the #opencog channel on the freenode.net IRC network, assuming it&#8217;s not down for development.</p>
<p>Next up: wire in <a href="https://launchpad.net/nlgen">NLGen for natural-language output</a>, and start taking baby steps in actual reasoning. Anyway, I&#8217;m pretty excited, as this means that a lot of the basic bits and pieces are working, and I can now dive into the deep end, and start working on the hard stuff.</p>
<p>&#8211; Linas Vepstas</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2009/04/28/proto-chatbot-at-last/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Distribution of Mutual Information</title>
		<link>http://blog.opencog.org/2009/03/11/distribution-of-mutual-information/</link>
		<comments>http://blog.opencog.org/2009/03/11/distribution-of-mutual-information/#comments</comments>
		<pubDate>Wed, 11 Mar 2009 22:28:55 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[corpus linguistics]]></category>
		<category><![CDATA[frequency]]></category>
		<category><![CDATA[linguistics]]></category>
		<category><![CDATA[mutual information]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[statistics]]></category>
		<category><![CDATA[word pair]]></category>

		<guid isPermaLink="false">http://brainwave.opencog.org/?p=89</guid>
		<description><![CDATA[A bit of corpus linguistics is performed to examine the mutual information distribution of word pairs.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve been playing NLP statistics games for a long time now, and got to thinking that I had no clue as to the statistical distribution of some of the things I work with.  So below follow some graphs.</p>
<div id="attachment_95" class="wp-caption alignnone" style="width: 490px"><img class="size-full wp-image-95" src="http://blog.opencog.org/files/2009/03/mi-nearby1.png" alt="Mutual information of nearby words" width="480" height="360" /><p class="wp-caption-text">Mutual information of nearby words</p></div>
<p>Above is a graph showing the distribution of the mutual information of word pairs that occur in the same sentence. A number of texts were analyzed, including a portion of Wikipedia, some books from Project Gutenberg, etc. A collection of all possible pairs of words was created, where both words of the pair occur in the same sentence (with the left word of the pair having occurred in the sentence to the left of the right word in the pair). These were counted &#8212; about 10 million word pairs were observed &#8212; and their mutual information was calculated.</p>
<p>Mutual information is a measure of the likelihood of seeing two words occur together: thus, for example &#8220;Northern Ireland&#8221; will have a high mutual information, since the words &#8220;Northern&#8221; and &#8220;Ireland&#8221; are used together frequently.  By contrast, &#8220;Ireland is&#8221; will have negative mutual information, mostly because the word &#8220;is&#8221; is used with many, many other words besides &#8220;Ireland&#8221;; there is no special relationship between these words. High-mutual-information word pairs are typically noun phrases, often idioms and &#8220;collocations&#8221;, and almost always embody some concept (so, for example, &#8220;Northern Ireland&#8221; is the name of a place &#8212; the name of the conception of a particular country).</p>
<p>In mathematical terms, the mutual information of a word pair (x,y) is defined as:</p>
<p>M(x,y) = log<sub>2</sub> [ P(x,y) / ( P(x,*) P(*,y) ) ]</p>
<p>where P(x,y) is the probability of seeing the word pair (x,y), P(x,*) is the probability of seeing a word pair where the left word is x, and P(*,y) is the probability of seeing a word pair where the right word is y.</p>
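<p>As a worked illustration of this definition, the following Python sketch computes M(x,y) from a table of pair counts, estimating each probability as a count divided by the total number of observed pairs. The <tt>mutual_information</tt> helper is hypothetical and the counts are invented toy numbers, not the corpus statistics discussed in this post.</p>

```python
import math
from collections import Counter


def mutual_information(pair_counts):
    """Compute M(x,y) = log2( P(x,y) / (P(x,*) * P(*,y)) ) for every pair."""
    total = sum(pair_counts.values())
    left_counts = Counter()
    right_counts = Counter()
    for (left, right), count in pair_counts.items():
        left_counts[left] += count     # contributes to P(x,*)
        right_counts[right] += count   # contributes to P(*,y)
    scores = {}
    for (left, right), count in pair_counts.items():
        p_xy = count / total
        p_x = left_counts[left] / total
        p_y = right_counts[right] / total
        scores[(left, right)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy counts, invented for illustration only.
toy_counts = {
    ("Northern", "Ireland"): 8,
    ("Northern", "lights"): 2,
    ("Ireland", "is"): 1,
    ("Ireland", "has"): 4,
    ("it", "is"): 20,
    ("it", "was"): 9,
}
scores = mutual_information(toy_counts)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

<p>With these toy numbers, &#8220;Northern Ireland&#8221; comes out with positive mutual information and &#8220;Ireland is&#8221; with negative mutual information, matching the qualitative behaviour described above.</p>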
<p>The graph shows M(x,y) on the horizontal axis, and the probability of seeing such a value of M on the vertical axis. This is a bin-count of the distribution of possible values of mutual information, over all word pairs.  This is *NOT* a scatterplot of M(x,y) vs. P(x,y).</p>
<p>Here&#8217;s another graph: same as above, except that this time, only pairs of words that occur immediately next to one-another are considered.  The sample size is much smaller: only about 2.4M word-pairs were collected.</p>
<div id="attachment_97" class="wp-caption alignnone" style="width: 490px"><img class="size-full wp-image-97" src="http://blog.opencog.org/files/2009/03/mi-pair.png" alt="Mutual Information of Neighboring Word Pairs" width="480" height="360" /><p class="wp-caption-text">Mutual Information of Neighboring Word Pairs</p></div>
<p>The blue and green exponential lines are located in <strong>exactly</strong> the same place as in the previous graph. It&#8217;s humped in a different way than the previous graph. What is the shape of this hump?  Are the slopes characteristic, or do they vary from one corpus sample to another?  If anyone knows the answers to these questions, please let me know!</p>
<p>Notice the peaks off to the right, at high MI values, in the first graph. I think these are word pairs which are heavily used (topics/terms that are discussed) in one single contributing text, but in none of the others. That&#8217;s the hypothesis; I haven&#8217;t confirmed it.</p>
<p>Here is a <a href="http://linas.org/nlp/word-pairs.pdf">more detailed discussion, with many other additional figures</a>.</p>
<p>&#8211; Linas Vepstas</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2009/03/11/distribution-of-mutual-information/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Determining word senses from grammatical usage</title>
		<link>http://blog.opencog.org/2009/01/12/determining-word-senses-from-grammatical-usage/</link>
		<comments>http://blog.opencog.org/2009/01/12/determining-word-senses-from-grammatical-usage/#comments</comments>
		<pubDate>Mon, 12 Jan 2009 22:34:23 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[link-grammar]]></category>
		<category><![CDATA[Natural Language Processing]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[Syntax]]></category>
		<category><![CDATA[WSD]]></category>

		<guid isPermaLink="false">http://brainwave.opencog.org/?p=70</guid>
		<description><![CDATA[I've recently been tinkering with a mechanism for determining word senses based on their grammatical usage.  This has me pretty excited, because, so far, it seems to be reasonably accurate (i.e. not terrible), and lightning-fast.  I'm doing this by ...]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ve recently been tinkering with a mechanism for determining word senses based on their grammatical usage.  This has me pretty excited, because, so far, it seems to be reasonably accurate (<em>i.e.</em> not terrible), and lightning-fast.  I&#8217;m doing this by doing some heavy statistical NLP work, computing statistical correlations between word senses and syntax &#8212; specifically, link-grammar disjuncts.</p>
<p>The basic observation driving this work is that fairly often, one can identify the meaning of a word (or at least narrow it down) simply by observing how it is used in a sentence. To do this, one needs accurate, fine-grained syntactical information about a sentence: and Link Grammar provides an abundance of it. Link Grammar generates very fine-grained syntactic linkage information for every word in a sentence. Every dictionary word is associated with dozens, or even hundreds, of different linkage patterns, or &#8216;disjuncts&#8217;. These disjuncts indicate how a word in a sentence can be connected to other words to its left or right.  One may imagine that these disjuncts provide highly detailed grammatical information about a word. For example, they not only distinguish between a noun and a verb; they not only distinguish between present, past and future tenses of a verb, they not only distinguish between a transitive and an intransitive verb, but they also distinguish between a wide variety of relationships that are so fine that most are not even given formal names by linguists. This fine-grained information is a rich source of information, and is ideal for correlating with word senses.</p>
<p>For word-sense tagging, it turns out that simply picking the Most Frequent Sense (MFS) already gives a rather strong indication of what the correct word-sense assignment should be: it gives the correct answer about half the time (see [Mi04] below).  So the idea here is that supplementing the MFS with additional data, namely how the word was used syntactically, will improve accuracy even further.  That is, the idea is to compute the &#8220;MFS for a given syntactic context&#8221;.</p>
<p>Before one can perform statistical correlations between syntax and sense, one must first assign a sense to each word.  This is, of course, the really hard part of modern NLP. For now, I&#8217;m taking a fairly easy, straightforward, yet strong approach to this: I&#8217;m using an algorithm due to Rada Mihalcea[Mi05].  This algorithm is currently, as far as I know, the most accurate algorithm known for tagging words with word senses. Unfortunately, it is also fairly slow and CPU intensive, making practical deployment tricky.  It performs this tagging by associating, with each word in a sentence, a list of its possible senses.  Then, senses of nearby words are linked together with a similarity measure. This forms a network, a graph, over the sentence, with vertices being word-senses, and edges being weighted by the similarity between senses. Such a network is formally a Markov chain, and can be solved as such. There are many ways of solving a Markov chain; Mihalcea proposes, and I&#8217;ve implemented, the Google (Page-Brin) PageRank algorithm[PB]. The result is that each vertex (each word-sense) is assigned a probability. The highest-probability senses do indeed appear to be the linguistically correct senses most of the time; the correct sense will almost always appear in the top three probabilities.</p>
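<p>To make the graph-ranking idea concrete, here is a minimal sketch in Python. It is not the OpenCog implementation: the sense labels, similarity weights, damping factor and iteration count are all invented for illustration, and the update rule is just the textbook weighted PageRank power iteration.</p>

```python
from collections import defaultdict

def rank_senses(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over an undirected sense graph.

    edges maps a (sense_a, sense_b) pair to a similarity weight.
    Returns a dict mapping each sense to its score; scores sum to 1.
    """
    # Build a symmetric adjacency map: neighbours[u][v] = weight.
    neighbours = defaultdict(dict)
    for (a, b), weight in edges.items():
        neighbours[a][b] = weight
        neighbours[b][a] = weight

    nodes = sorted(neighbours)
    out_weight = {u: sum(neighbours[u].values()) for u in nodes}
    score = {u: 1.0 / len(nodes) for u in nodes}

    for _ in range(iterations):
        new_score = {}
        for u in nodes:
            # Each neighbour v passes on a share of its score,
            # proportional to the weight of the edge (v, u).
            incoming = sum(score[v] * w / out_weight[v]
                           for v, w in neighbours[u].items())
            new_score[u] = (1.0 - damping) / len(nodes) + damping * incoming
        score = new_score
    return score

# Hypothetical senses with invented similarity weights.
toy_edges = {
    ("bank#shore", "river#stream"): 0.9,
    ("bank#finance", "river#stream"): 0.1,
    ("bank#shore", "water#liquid"): 0.6,
    ("bank#finance", "water#liquid"): 0.05,
}
scores = rank_senses(toy_edges)
best_bank_sense = max(["bank#shore", "bank#finance"], key=scores.get)
```

<p>In this toy graph the &#8220;shore&#8221; sense of &#8220;bank&#8221; is strongly tied to its neighbours, so it ends up with the higher score. A real run would have one vertex per candidate sense of every word in the window.</p>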
<p>Once a word sense is identified, it can be correlated with the Link Grammar disjunct in play for the particular sentence.  This is done simply by processing a lot of sample text. A database then stores a frequency count for each (word, disjunct, word-sense) triple. After a reasonable amount of data is accumulated, the joint probability p(w,d,s) can be calculated (w==word, d==disjunct, s==sense), and, from this, various marginal and conditional probabilities and entropies. To make use of this information in a new sentence, one first parses the sentence using the Link Grammar parser, thus obtaining (word,disjunct) pairs. It is then a straightforward (and fast) database lookup to obtain the conditional probability p(s|w,d), the probability of observing the sense s given the pair (w,d).  The exciting result of this effort is that, quite often, the conditional probability p(s|w,d) identifies one sense more or less uniquely (i.e. there is one sense for which p(s|w,d) is about 1).</p>
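<p>The counting step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, p(s|w,d) = N(w,d,s) / N(w,d), where N is a raw frequency count; the disjunct strings, sense labels and counts below are made up, and the real statistics live in a database rather than in memory.</p>

```python
from collections import Counter

def sense_given_word_disjunct(counts, word, disjunct):
    """Return p(s|w,d) as a dict, computed from raw (w, d, s) counts."""
    matching = {s: n for (w, d, s), n in counts.items()
                if w == word and d == disjunct}
    total = sum(matching.values())
    if total == 0:
        return {}  # this (word, disjunct) pair was never observed
    return {s: n / total for s, n in matching.items()}

# Invented observations: (word, disjunct, sense) triples.
# The disjunct strings and sense labels are illustrative only.
counts = Counter()
observations = [
    ("run", "S- O+", "run%operate"),
    ("run", "S- O+", "run%operate"),
    ("run", "S- O+", "run%operate"),
    ("run", "S- MV+", "run%move-fast"),
    ("run", "S- MV+", "run%move-fast"),
    ("run", "S- MV+", "run%operate"),
]
counts.update(observations)

p = sense_given_word_disjunct(counts, "run", "S- O+")
```

<p>Here the first disjunct picks out a single sense with probability 1, while the second only narrows the choice to two senses. Those are the two outcomes described above.</p>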
<p>Although this result is quite exciting, it&#8217;s based on the inspection of a small handful of nouns and verbs. I believe that the result holds well in a broad setting, but I don&#8217;t have any quantitative measure for the extent of the setting. Clearly, there will be *some* words for which the sense will be obvious from the grammatical usage. But, on average, how many of these are there per sentence?  In some cases, the sense won&#8217;t be unique, but there will be many senses that are ruled out. How often does this happen?  Is it possible that the accuracy results are equal to, or even improve on, the Mihalcea accuracy results? (They may improve on them by averaging over and eliminating false-positives, eliminating them because of the various different semantic contexts a word might appear in).  A quantitative measure of the recall and accuracy can, in principle, be done, as there is a database (the SemCor database) of text that has been hand-annotated with the correct senses.  I&#8217;ve not yet given any serious thought to performing this quantitative analysis.</p>
<p>Still, I&#8217;m pretty excited. It seems to work pretty well; I like that. I&#8217;ve already roughed in some basic infrastructure into the Link Grammar parser so that it will return the sense tags for each parse, assuming you have the database installed.  The tags returned are WordNet 3.0 sense keys &#8212; strings like &#8220;run%2:38:04::&#8221; which can be used to look up specific senses from WordNet.</p>
<p>To expose this function, database support has been added to the link-grammar parser.  This has been added to the parser itself, as opposed to a layer built on top of it, because database support is needed for other reasons, specifically for parse ranking (Gee, I haven&#8217;t talked about parse ranking, have I?).  The database support is provided by SQLite[SQLLITE]. This was picked for two reasons: (1) its license is public domain, and is thus compatible with the link-grammar BSD license, and (2) it is an embedded database, requiring zero administration by the user. This second point is quite unlike traditional SQL databases, which typically require a trained database administrator to configure and operate. One reason that zero administration is possible is that the database is used in a read-only fashion: the data it holds is static. Code integrating this database is in the link-grammar SVN repository now, and will be available in version 4.4.2.</p>
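<p>For a feel of what such an embedded lookup involves, here is a small Python sketch using the standard sqlite3 module. The table name, column names and rows are hypothetical; they are not the schema that link-grammar actually ships, and the parser itself talks to SQLite from C rather than from Python.</p>

```python
import sqlite3

def open_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sense_stats ("
        " word TEXT, disjunct TEXT, sense TEXT, probability REAL)"
    )
    conn.executemany(
        "INSERT INTO sense_stats VALUES (?, ?, ?, ?)",
        [
            ("run", "S- O+", "run%operate", 0.95),
            ("run", "S- O+", "run%move-fast", 0.05),
            ("run", "S- MV+", "run%move-fast", 0.70),
        ],
    )
    conn.commit()
    return conn

def lookup_senses(conn, word, disjunct):
    """Return (sense, probability) rows for a pair, most likely first."""
    cursor = conn.execute(
        "SELECT sense, probability FROM sense_stats"
        " WHERE word = ? AND disjunct = ?"
        " ORDER BY probability DESC",
        (word, disjunct),
    )
    return cursor.fetchall()

conn = open_demo_db()
rows = lookup_senses(conn, "run", "S- O+")
```

<p>Since the shipped data is static, a file-backed version of this could be opened read-only; in Python that means passing a file: URI with mode=ro together with uri=True to sqlite3.connect.</p>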
<p>Creating the dataset is a good bit trickier. Currently, the Mihalcea algorithm is implemented within OpenCog.  Was this a good technology choice? I dunno, but it seemed like a reasonable experiment at the time. Parsed sentences are fed to OpenCog, where word senses are assigned, and then frequency counts are updated.  I&#8217;ve been feeding it a diet consisting solely of parsed Wikipedia articles: not very healthy, but maybe OK for now.  The Markov chain network used to solve for word senses is four sentences wide, as a window sliding across an article. That is, a given word sense is influenced by other words occurring in sentences as far as four sentences away.  This should keep accuracy up, without bogging down in solving a Markov chain across an entire article. The current OpenCog implementation uses the Page-Brin PageRank algorithm; however, I&#8217;m thinking that it might be faster simply to use the LINPACK subroutine library to solve for the eigenvectors directly. (The Page-Brin algorithm shows its power when the Markov chain has billions or trillions of nodes, e.g. as used by Google.  By contrast, with a four-sentence sliding window, the Markov matrix connects at most thousands of senses, and thus should be rapidly solvable by ordinary linear equation techniques.)</p>
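<p>The sliding window itself is simple to picture. The Python sketch below is one plausible reading of it (overlapping windows that advance one sentence at a time); the step size and the handling of short articles are my assumptions for illustration, not a description of the OpenCog code.</p>

```python
def sliding_windows(sentences, width=4):
    """Yield overlapping runs of `width` consecutive sentences."""
    if not sentences:
        return
    # An article shorter than the window yields one short window.
    last_start = max(len(sentences) - width, 0)
    for start in range(last_start + 1):
        yield sentences[start:start + width]

article = ["s1", "s2", "s3", "s4", "s5", "s6"]
windows = list(sliding_windows(article))
```

<p>Each window would then get its own small sense graph, which keeps the Markov chain at a size of thousands of vertices rather than the whole article.</p>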
<p>So far, I&#8217;ve put a few CPU-months of data crunching into this. It&#8217;s not much. It&#8217;s slow; it takes minutes per sentence, and this gives only one (word-sense, disjunct) pair. To build up a reasonable statistical dataset will require cpu-hours or more per word, and there aren&#8217;t really all that many cpu-hours in a cpu-month. So it&#8217;s slow slogging. The database coverage remains quite thin, as most disjuncts have been observed only a handful of times. There are maybe about a thousand or so words for which an &#8216;adequate&#8217; amount of statistics has been collected; I&#8217;d like it better if the database was deep enough to cover at least 20K words and 100K senses.  Since the preliminary results look so promising, I&#8217;m crossing my fingers, and have slowly been tinkering with ways to improve performance.</p>
<p>I&#8217;ll let you know &#8230;<br />
References<br />
==========<br />
[Mi04] Rada Mihalcea, Paul Tarau, Elizabeth Figa, &#8220;PageRank on Semantic Networks, with Application to Word Sense Disambiguation&#8221; (2004) COLING &#8217;04: Proceedings of the 20th international conference on Computational Linguistics <a href="http://dx.doi.org/10.3115/1220355.1220517">DOI</a><br />
[Mi05]  Rada Mihalcea, &#8220;Unsupervised Large-Vocabulary Word Sense Disambiguation with Graph-based Algorithms for Sequence Data  Labeling&#8221;, (2005) HLT &#8217;05: Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing <a href="http://dx.doi.org/10.3115/1220575.1220627">DOI</a></p>
<p>[SQLLITE] http://www.sqlite.org/</p>
<p>[PB] http://en.wikipedia.org/wiki/PageRank</p>
<p>&#8211; Linas Vepstas</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2009/01/12/determining-word-senses-from-grammatical-usage/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Hacking on Link-Grammar</title>
		<link>http://blog.opencog.org/2008/08/17/hacking-on-link-grammar/</link>
		<comments>http://blog.opencog.org/2008/08/17/hacking-on-link-grammar/#comments</comments>
		<pubDate>Sun, 17 Aug 2008 19:43:50 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[grammar]]></category>
		<category><![CDATA[Learning]]></category>
		<category><![CDATA[link-grammar]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[WSD]]></category>

		<guid isPermaLink="false">http://opencog.wordpress.com/?p=41</guid>
		<description><![CDATA[I hack, heads-down, on link-grammar every now and then. Yesterday, I fixed another round of broken parse rules: making sure that sentences like "John is altogether amazingly quick." "That one is marginally better" "I am done working" "I asked ...]]></description>
			<content:encoded><![CDATA[<p>I hack, heads-down, on <a href="http://www.abisource.com/projects/link-grammar/">link-grammar</a> every now and then. Yesterday, I fixed another round of broken parse rules: making sure that sentences like &#8220;John is altogether amazingly quick.&#8221; &#8220;That one is marginally better&#8221; &#8220;I am done working&#8221; &#8220;I asked Jim a question&#8221; &#8220;I was told that crap, too&#8221; all parse correctly.</p>
<p>Solving these required adding new rules to the link-grammar dictionary: so, for example, adding the rule &#8220;O+ &amp; {@MV+}&#8221; to the dictionary entry for &#8220;asked&#8221;, thus allowing the word &#8220;asked&#8221; to take a direct object (&#8220;&#8230;asked Jim&#8230;&#8221;) and an indirect object (&#8220;&#8230; a question&#8221;).</p>
<p>Patching dictionary entries by hand is tedious and time consuming. The long term goal is to get to the point where the system can learn new entries on its own. So I&#8217;ve been day-dreaming about how to do that, and some of the baby-steps in that direction.</p>
<p>The first step is to put all of the rules into a database where they can be easily modified, added-to, and deleted. Currently, the rules all live in a file, which can only be hand-edited.</p>
<p>A second step is to assign probabilities to the rules: often, multiple parses are possible, yet not each rule is equally likely to lead to a correct parse. So it would be better to have a probability  assigned to each rule, indicating how often it leads to a good parse. These probabilities are already needed for the GSoC project of adding a SAT solver to link-grammar. I&#8217;ve got a database of tens of millions of word pairs and link types, ready to go for this step.</p>
<p>A third step is to distinguish good parses from bad ones: this is the parse-ranking step: to judge parses for the likelihood of being good, or bad. There are superficial things one can do for parse ranking, and quite complex ones. I am hoping to someday-soon reinforce the parse-ranking scores with the results of word-sense disambiguation. Which brings us to the next point:</p>
<p>Step four: some grammar parsing rules are appropriate only for some senses of a word, but not for others. Link grammar already accounts for this at a rough scale: it has distinct rules for different parts of speech. These are indicated by tacking a single extra letter to the end of a word: walk.v (I walk.v) versus walk.n (I took.v a walk.n). This basic idea can be further refined, for finer divisions than just parts of speech: some word senses can only be used in certain ways, and not others.  The technical problem is that 26 letters is not enough&#8230; already, link-grammar is using some 20 or so of these suffixes.  So this mechanism needs to be expanded, somehow.</p>
<p>Step five, where the rubber hits the road: actually learning new rules.  This is much more vague; my ideas are still swimming. There are two distinct problems: new words, and fixing rules for existing words. New words should not be *much* of a problem: try to find synonyms for a word, and assume the new word can be used like the synonym. Modifying existing rules is much much harder&#8230;</p>
<p>&#8230; so hard, in fact, that I&#8217;m not sure I&#8217;m ready to engage in that, just yet.  Some steps can be taken in that direction, by looking at minimum-spanning-tree dependency grammars (MST grammars): that is, computing the mutual information of word pairs, and comparing them to the output of link-grammar. This should suggest where new links can be created. Then comes the question: what should the appropriate link type be, for this new link?  For some words, perhaps synonyms can help, but other words are so unique, that they have no synonyms:  the word &#8220;to be&#8221; is so central to the English language  that  trying to discover information about it is very difficult: it has no synonyms, even as it has many.</p>
<p>I&#8217;ve got some infrastructure set up to run this last experiment: I&#8217;ve got a fairly large collection of parsed sentences, from which I can build mutual information pairs, and compare these to link-grammar parses. In fact, I&#8217;ve already done so: I use these for parse ranking. (That is, if a link-grammar parse has a high mutual information content, then I assume it&#8217;s a good parse.)</p>
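<p>As a rough illustration of ranking parses this way, the Python sketch below computes pointwise mutual information, log2( p(a,b) / (p(a) p(b)) ), for word pairs and scores a parse by summing it over the parse&#8217;s links. The counts are invented, and both the pointwise form of MI and the simple sum are assumptions on my part; this is not the actual ranking code.</p>

```python
import math
from collections import Counter

def pair_mutual_information(pair_counts):
    """Pointwise mutual information, in bits, for each ordered word pair."""
    total = sum(pair_counts.values())
    left = Counter()
    right = Counter()
    for (a, b), n in pair_counts.items():
        left[a] += n
        right[b] += n
    # p(a,b) / (p(a) p(b)) simplifies to n * total / (left[a] * right[b]).
    return {
        (a, b): math.log2(n * total / (left[a] * right[b]))
        for (a, b), n in pair_counts.items()
    }

def parse_score(links, mi, unseen=0.0):
    """Sum the MI of every linked word pair in one candidate parse."""
    return sum(mi.get(link, unseen) for link in links)

# Invented counts of (left word, right word) pairs.
pair_counts = Counter({
    ("asked", "question"): 8,
    ("asked", "story"): 1,
    ("told", "story"): 4,
    ("told", "question"): 3,
})
mi = pair_mutual_information(pair_counts)

parse_a = [("asked", "question")]
parse_b = [("asked", "story")]
better = max([parse_a, parse_b], key=lambda links: parse_score(links, mi))
```

<p>With these counts, the pair (asked, question) has positive MI and (asked, story) has negative MI, so the first parse ranks higher.</p>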
<p>I could turn this around: given a sentence that is parsing badly, I can look for high mutual-info parses. Perhaps broaden the coverage by comparing to parses of approximately synonymous words, perhaps reinforced by word-sense disambiguation. But this generally leads to a combinatorial explosion, which, I guess, is expected. We know that general intelligence requires a lot of CPU.  The trick is to have the patience to set up an experiment, run the experiment, wait for its results, and then do it again&#8230;</p>
<p>Enough for now. I&#8217;m off to work on &#8220;related words&#8221; &#8212; another part of the puzzle.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2008/08/17/hacking-on-link-grammar/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Mapping Wordnet, RelEx to OpenCog</title>
		<link>http://blog.opencog.org/2008/05/05/mapping-wordnet-relex-to-opencog/</link>
		<comments>http://blog.opencog.org/2008/05/05/mapping-wordnet-relex-to-opencog/#comments</comments>
		<pubDate>Mon, 05 May 2008 23:30:08 +0000</pubDate>
		<dc:creator>Linas Vepstas</dc:creator>
				<category><![CDATA[Design]]></category>
		<category><![CDATA[Theory]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[OpenCog]]></category>
		<category><![CDATA[RelEx]]></category>
		<category><![CDATA[WordNet]]></category>
		<category><![CDATA[WSD]]></category>

		<guid isPermaLink="false">http://opencog.wordpress.com/?p=9</guid>
		<description><![CDATA[I spent the afternoon creating a formalized mapping from RelEx and Wordnet to OpenCog.  The goal is to clean things up enough so that I can run word-sense disambiguation code with opencog itself. Now, one thing that was ...]]></description>
			<content:encoded><![CDATA[<p>I spent the afternoon creating a formalized mapping from RelEx and Wordnet to OpenCog.  The goal is to clean things up enough so that I can run word-sense disambiguation code with opencog itself. Now, one thing that was nagging me is that this is, in some sense, the hard way forward: I could just download Rada Mihalcea&#8217;s WSD code from off the net, and just run that. But this misses the point: I want to have the network graph of word senses within opencog so that I can start trying to resolve senses across multiple sentences, and thus, potentially do document-level word-sense detection. So I&#8217;m pretty excited by the approach, but it sure does feel like re-inventing the wheel, sometimes. I hope to post the spec to opencog-discuss in a few days.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2008/05/05/mapping-wordnet-relex-to-opencog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Google Summer of Code</title>
		<link>http://blog.opencog.org/2008/05/05/google-summer-of-code/</link>
		<comments>http://blog.opencog.org/2008/05/05/google-summer-of-code/#comments</comments>
		<pubDate>Mon, 05 May 2008 20:36:11 +0000</pubDate>
		<dc:creator>David Hart</dc:creator>
				<category><![CDATA[Development]]></category>
		<category><![CDATA[GSoC]]></category>
		<category><![CDATA[HyperGraphDB]]></category>
		<category><![CDATA[MOSES]]></category>
		<category><![CDATA[NLP]]></category>
		<category><![CDATA[OpenBiomind]]></category>
		<category><![CDATA[OpenCog]]></category>
		<category><![CDATA[OpenSim]]></category>
		<category><![CDATA[Pleasure]]></category>
		<category><![CDATA[RelEx]]></category>

		<guid isPermaLink="false">http://opencog.wordpress.com/?p=11</guid>
		<description><![CDATA[Crunch time is here! Our participation in Google's Summer of Code program has accelerated release schedules and shifted priorities. Ben is busy writing initial documentation, converting much of it from Novamente documentation. Gustavo, Senna and Linas are working to ...]]></description>
			<content:encoded><![CDATA[<p>Crunch time is here! Our participation in Google&#8217;s <a href="http://code.google.com/soc/2008/">Summer of Code</a> program has accelerated release schedules and shifted priorities. Ben is busy writing initial documentation, converting much of it from Novamente documentation. Gustavo, Senna and Linas are working to tidy OpenCog code, removing crufty and embarrassing bits and improving infrastructure and interfaces. Joel is working on the first collection of research-oriented MindAgents. You&#8217;ll hear more soon on this blog from these <a href="http://opencog.org/wiki/Community">team members</a>, and from GSoC students on the <a href="http://opencog.ning.com/">OpenCog Collective</a> blog and the new list <a href="http://groups.google.com/group/opencog-soc">opencog-soc@googlegroups.com</a>.</p>
<p>To quote Ben Goertzel&#8217;s post on <a href="http://groups.google.com/group/opencog/browse_thread/thread/aa7159e328f00406/63db705f7f518fb4">opencog@googlegroups.com</a>:<br />
<blockquote>The Google Summer of Code selection process is done, and 11 proposals were chosen.</p>
<p>It was a really painful process to go through, as we had more than 70 applications, and at least 25-30 of them were really quite good.</p>
<p>The accepted proposals span a fairly wide variety of areas, and the choices were ultimately made based on a number of factors including</p>
<p>&#8211; clarity and completeness of the proposal<br />
&#8211; background of the student<br />
&#8211; readiness of the OpenCog codebase for the project<br />
&#8211; critical-ness of the project for OpenCog</p>
<p>Of the 11 selected, 2 were for OpenBiomind projects, and the other 9 for OpenCog proper &#8230; including a bunch of stuff for the RelEx NLP toolkit.</p>
<p>There was a strong bias toward proposals dealing with improvements to OpenCog-related software components that already are moderately mature, like RelEx and MOSES.</p>
<p>Next year when OpenCog is more mature, if we are chosen to participate in GSoC again (as we hope, and have reason to somewhat expect), you can expect to see more explicitly, broadly, AGI-related proposals.</p>
<p>Anyway this list of selected projects is here for all who are curious:</p>
<p><b>OpenSim for OpenCog<br />
by Kino High Coursey, mentored by Andre Luiz de Senna</p>
<p>Implementing a SAT/SMT Based Link Grammar Parser<br />
by Filip Marić, mentored by Predrag Janicic</p>
<p>Bayesian and Causal Networks Inference using Indefinite Probabilities<br />
by Cesar Augusto Cavalheiro Marcondes, mentored by Cassio Pennachin</p>
<p>Java GUI for OpenBiomind<br />
by Bhavesh Sanghvi, mentored by Murilo Saraiva de Queiroz</p>
<p>MOSES: the Pleasure Algorithm<br />
by Alesis Novik, mentored by Nil Geisweiller</p>
<p>Graph Algorithms for HyperGraphDB<br />
by Guo Junfei, mentored by Ben Goertzel</p>
<p>Improved MOSES<br />
by ChenShuo, mentored by Moshe Looks</p>
<p>RelEx Web Crawler and HypergraphDB Manager<br />
by Rich Jones, mentored by David Hart</p>
<p>RelEx: Learning Simple Grammars<br />
by Elizabeth Dawn Alpert, mentored by Lukasz Kaiser</p>
<p>Distributed HypergraphDB Version<br />
by Costa Ciprian, mentored by Borislav Iordanov</p>
<p>Recursive Feature Selection for Enhancing Genetic Disease Prediction<br />
by Paul Cao, mentored by Lucio de Souza Coelho</b></p>
<p>Many thanks to all who applied, all who agreed to help mentor &#8230; and especially to David Hart for coming up with the idea of applying for SIAI to be included as a mentoring organization in GSoC, with a focus on OpenCog work.</p></blockquote>
<p>We&#8217;d also like to thank the terrific Open Source team at Google, particularly Leslie Hawthorn, Dave Anderson and Chris DiBona, for their patience and good advice.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.opencog.org/2008/05/05/google-summer-of-code/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

