Parsing with Pictures by Keshav Pingali and Gianfranco Bilardi. (PDF file)
From an email that Keshav sent to the firstname.lastname@example.org email list:
Gianfranco Bilardi and I have developed a new approach to parsing context-free languages that we call “Parsing with pictures”. It provides an alternative (and, we believe, easier to understand) approach to context-free language parsing than the standard presentations using derivations or pushdown automata. It also unifies Earley, SLL, LL, SLR, and LR parsers among others.
Parsing problems are formulated as path problems in a graph called the grammar flow graph (GFG) that is easily constructed from a given grammar. Intuitively, the GFG is to context-free grammars what NFAs are to regular languages. Among other things, the paper has :
(i) an elementary derivation of Earley’s algorithm for parsing general context-free grammars, showing that it is an easy generalization of the well-known reachability-based NFA simulation algorithm,
(ii) a presentation of look-ahead that is independent of particular parsing strategies, and is based on a simple inter-procedural dataflow analysis,
(iii) GFG structural characterizations of LL and LR grammars that are simpler to understand than the standard definitions, and bring out a symmetry between these grammar classes,
(iv) derivations of recursive-descent and shift-reduce parsers for LL and LR grammars by optimizing the Earley parser to exploit this structure, and
(v) a connection between GFGs and NFAs for regular grammars based on the continuation-passing style (CPS) optimization.
Or if you prefer the more formal abstract:
The development of elegant and practical algorithms for parsing context-free languages is one of the major accomplishments of 20th century Computer Science. These algorithms are presented in the literature using string rewriting systems or abstract machines like pushdown automata, but the resulting descriptions are unsatisfactory for several reasons. First, even a basic understanding of parsing algorithms for some grammar classes such as LR(k) grammars requires mastering a formidable number of difficult concepts and terminology. Second, parsing algorithms for different grammar classes are often presented using entirely different formalisms, so the relationships between these grammar classes are obscured. Finally, these algorithms seem unrelated to algorithms for regular language recognition even though regular languages are a subset of context-free languages.
In this paper, we show that these problems are avoided if parsing is reformulated as the problem of finding certain kinds of paths in a graph called the Grammar Flow Graph (GFG) that is easily constructed from a context-free grammar. Intuitively, GFG’s permit parsing problems for context-free grammars to be formulated as path problems in graphs in the same way that non-deterministic finite-state automata do for regular grammars. We show that the GFG enables a unified treatment of Earley’s parser for general context-free grammars, recursive-descent parsers for LL(k) and SLL(k) grammars, and shift-reduce parsers for LR(k) and SLR(k) grammars. Computation of look-ahead sets becomes a simple interprocedural dataflow analysis. These results suggest that the GFG can be a new foundation for the study of context-free languages.
Odd as it may sound, some people want to be understood.
If you think being understood isn’t all that weird, do a slow read on this paper and provide feedback to the authors.