I spent the weekend comparing the Stanford parser to RelEx, and learned a lot. RelEx really does deserve to be called a “semantic relation extractor”, and not just a “dependency relation extractor”. It provides a more abstract, more semantic output than the Stanford parser, which sticks very narrowly to the syntactic structure of a sentence.
I wrote up a few paragraphs on the most prominent differences; most of my updates were to the RelEx dependency relations page.
Here are the main bullet points:
- RelEx attempts basic entity extraction, and thus avoids generating nn noun modifier relations for named entities.
- RelEx will collapse the object and complement of a preposition into one. Stanford does this for some relations, but not all.
- RelEx will convert passive subjects into objects, and instead indicate passiveness by tagging the verb with a passive tense feature.
- RelEx avoids generating copulas, if at all possible, and instead indicates copular relations as predicative adjectives, or in other ways.
- RelEx extracts semantic variables from questions, with the intent of simplifying question answering. For example, “Where is the ball?” generates _pobj(_%atLocation, _$qVar) _psubj(_%atLocation, ball), which can then pattern-match a plausible answer: _pobj(under, couch).
- RelEx attempts to extract comparison variables.
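
To make the question-answering bullet concrete, here is a minimal Python sketch of how the question relations might be pattern-matched against a candidate answer. It is not RelEx code. The tuple representation, the function names, and the rule that any term starting with `_%` or `_$` is a variable are assumptions read off the example above. The `_psubj(under, ball)` relation for the answer sentence is also assumed, since only `_pobj(under, couch)` is shown.

```python
# Relations are modeled as (name, head, dependent) tuples; this layout is an
# assumption made for illustration, not RelEx's actual output format.

def is_variable(term):
    # Assumption: terms starting with "_%" or "_$" are variables to be bound.
    return term.startswith("_%") or term.startswith("_$")

def unify(pattern, fact, bindings):
    """Match one pattern relation against one fact.

    Returns the extended bindings, or None if they do not match."""
    if pattern[0] != fact[0]:
        return None
    new = dict(bindings)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_variable(p):
            # A variable must bind to the same word everywhere it appears.
            if new.setdefault(p, f) != f:
                return None
        elif p != f:
            return None
    return new

def match(patterns, facts, bindings=None):
    """Return every set of bindings under which all patterns match some fact."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        return [bindings]
    first, rest = patterns[0], patterns[1:]
    results = []
    for fact in facts:
        extended = unify(first, fact, bindings)
        if extended is not None:
            results.extend(match(rest, facts, extended))
    return results

# "Where is the ball?"
question = [("_pobj", "_%atLocation", "_$qVar"),
            ("_psubj", "_%atLocation", "ball")]

# Assumed relations for a candidate answer such as "The ball is under the couch."
facts = [("_pobj", "under", "couch"),
         ("_psubj", "under", "ball")]

answers = match(question, facts)
print(answers)
```

If a match is found, the word bound to `_$qVar` is the candidate answer, and the word bound to `_%atLocation` is the preposition that locates it.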
It's also clear to me that I could split up the RelEx processing into two stages: one which generates Stanford-style syntactic relations, and a second stage that generates the more abstract stuff. This might be a wise move … Since RelEx is already more than 3x faster than the Stanford parser, this could attract new users.
– Linas Vepstas