Earley Deduction and Earley Parsing:
Parsing of Arbitrary ContextFree Grammars
Overview
There are a number of parsing algorithms for contextfree grammars, but most have restrictions. If you are looking for an efficient algorithm to handle arbitrary contextfree grammars, this is the page for you.
The parsing algorithms commonly used for programming languages, including the LL and LR algorithms, can only handle unambiguous grammars. However, when processing natural languages, ambiguous grammars are generally used, since natural languages themselves contain much ambiguity.
Prolog is a logic programming language and every contextfree grammar can be written as a Prolog program with virtually no change. A Prolog system can then be used to "execute" the logic program, which results in parsing a string.
Unfortunately, the Prolog execution model results in a topdown parse, which is not generally adequate for natural language grammars.
Earley deduction is an execution model which can be applied to logic programs, but unlike the Prolog execution model, Earley deduction is complete in the sense that it will find all solutions. As a general proof technique, Earley deduction can be applied to an arbitrary set of Horn clauses, an important subset of FirstOrder Logic.
When Earley deduction is applied to a logic program representing a contextfree grammar, the resulting algorithm is known as "Earley Parsing" and "Chart Parsing."
DATALOG is a subset of logic programs, i.e., a subset of Horn clauses. DATALOG is sufficient to encode relational operators and contextfree grammars. Definite Clause Grammars (DCGs), a notation for expressing contextfree grammars, fits within the DATALOG subset.
These papers go into greater depth on Earley deduction (and by extension, Earley parsing) and discuss optimizations for efficient implementation. Some performance results are also mentioned.
Papers

Earley Deduction (12 pages)
Describes Earley deduction; gives straightforward arguments for consistency and completeness; describes the DATALOG subset into which contextfree grammars can be written; discusses implementation techniques;
html
pdf

Optimizations to Earley Deduction for DATALOG Programs (11 pages)
More details on the efficient implementation of Earley deduction for the DATALOG subset.
html
pdf
About the Author
Harry H. Porter III, Ph.D.
Computer Science Department
Portland State University
Short Bio:
click here
Harry's Website: www.cs.pdx.edu/~harry
Email: harry@cs.pdx.edu
This page was produced May 15, 2009.