Archive for the ‘Compilers’ Category

Parsing with Pictures

Wednesday, October 17th, 2012

Parsing with Pictures by Keshav Pingali and Gianfranco Bilardi. (PDF file)

From an email that Keshav sent to the compilers@iecc.com email list:

Gianfranco Bilardi and I have developed a new approach to parsing context-free languages that we call “Parsing with pictures”. It provides an alternative (and, we believe, easier to understand) approach to context-free language parsing than the standard presentations using derivations or pushdown automata. It also unifies Earley, SLL, LL, SLR, and LR parsers among others.

Parsing problems are formulated as path problems in a graph called the grammar flow graph (GFG) that is easily constructed from a given grammar. Intuitively, the GFG is to context-free grammars what NFAs are to regular languages. Among other things, the paper has :

(i) an elementary derivation of Earley’s algorithm for parsing general context-free grammars, showing that it is an easy generalization of the well-known reachability-based NFA simulation algorithm,

(ii) a presentation of look-ahead that is independent of particular parsing strategies, and is based on a simple inter-procedural dataflow analysis,

(iii) GFG structural characterizations of LL and LR grammars that are simpler to understand than the standard definitions, and bring out a symmetry between these grammar classes,

(iv) derivations of recursive-descent and shift-reduce parsers for LL and LR grammars by optimizing the Earley parser to exploit this structure, and

(v) a connection between GFGs and NFAs for regular grammars based on the continuation-passing style (CPS) optimization.

Or if you prefer the more formal abstract:

The development of elegant and practical algorithms for parsing context-free languages is one of the major accomplishments of 20th century Computer Science. These algorithms are presented in the literature using string rewriting systems or abstract machines like pushdown automata, but the resulting descriptions are unsatisfactory for several reasons. First, even a basic understanding of parsing algorithms for some grammar classes such as LR(k) grammars requires mastering a formidable number of difficult concepts and terminology. Second, parsing algorithms for different grammar classes are often presented using entirely different formalisms, so the relationships between these grammar classes are obscured. Finally, these algorithms seem unrelated to algorithms for regular language recognition even though regular languages are a subset of context-free languages.

In this paper, we show that these problems are avoided if parsing is reformulated as the problem of finding certain kinds of paths in a graph called the Grammar Flow Graph (GFG) that is easily constructed from a context-free grammar. Intuitively, GFG’s permit parsing problems for context-free grammars to be formulated as path problems in graphs in the same way that non-deterministic finite-state automata do for regular grammars. We show that the GFG enables a unified treatment of Earley’s parser for general context-free grammars, recursive-descent parsers for LL(k) and SLL(k) grammars, and shift-reduce parsers for LR(k) and SLR(k) grammars. Computation of look-ahead sets becomes a simple interprocedural dataflow analysis. These results suggest that the GFG can be a new foundation for the study of context-free languages.

Odd as it may sound, some people want to be understood.

If you think being understood isn’t all that weird, do a slow read on this paper and provide feedback to the authors.

Martin Odersky: Reflection and Compilers

Saturday, April 14th, 2012

Martin Odersky: Reflection and Compilers

From the description:

Reflection and compilers do tantalizing similar things. Yet, in mainstream, statically typed languages the two have been only loosely coupled, and generally share very little code. In this talk I explore what happens if one sets out to overcome their separation.

The first half of the talk addresses the challenge how reflection libraries can share core data structures and algorithms with the language’s compiler without having compiler internals leaking into the standard library API. It turns out that a component system based on abstract types and path-dependent types is a good tool to solve this challenge. I’ll explain how the “multiple cake pattern” can be fruitfully applied to expose the right kind of information.

The second half of the talk explores what one can do when strong, mirror-based reflection is a standard tool. In particular, the compiler itself can use reflection, leading to a particular system of low-level macros that rewrite syntax trees. One core property of these macros is that they can express staging, by rewriting a tree at one stage to code that produces the same tree at the next stage. Staging lets us implement type reification and general LINQ-like functionality. What’s more, staging can also be applied to the macro system itself, with the consequence that a simple low-level macro system can produce a high-level hygienic one, without any extra effort from the language or compiler.

Ignore the comments about the quality of the sound and video. It looks like substantial improvements have been made or I am less sensitive to those issues. Give it a try and see what you think.

Strikes me as being very close to Newcomb’s thoughts on subject identity being composed of other subject identities.

Such that you could have subject representatives that “merge” together and then themselves form the basis for merging other subject representatives.

Suggestions of literature on reflection, its issues and implementations? (Donated books welcome as well. Contact for physical delivery address.)

The Heterogeneous Programming Jungle

Saturday, March 24th, 2012

The Heterogeneous Programming Jungle by Michael Wolfe.

Michael starts off with one definition of “heterogeneous:”

The heterogeneous systems of interest to HPC use an attached coprocessor or accelerator that is optimized for certain types of computation.These devices typically exhibit internal parallelism, and execute asynchronously and concurrently with the host processor. Programming a heterogeneous system is then even more complex than “traditional” parallel programming (if any parallel programming can be called traditional), because in addition to the complexity of parallel programming on the attached device, the program must manage the concurrent activities between the host and device, and manage data locality between the host and device.

And while he returns to that definition in the end, another form of heterogeneity is lurking not far behind:

Given the similarities among system designs, one might think it should be obvious how to come up with a programming strategy that would preserve portability and performance across all these devices. What we want is a method that allows the application writer to write a program once, and let the compiler or runtime optimize for each target. Is that too much to ask?

Let me reflect momentarily on the two gold standards in this arena. The first is high level programming languages in general. After 50 years of programming using Algol, Pascal, Fortran, C, C++, Java, and many, many other languages, we tend to forget how wonderful and important it is that we can write a single program, compile it, run it, and get the same results on any number of different processors and operating systems.

So there is the heterogeneity of attached coprocessor and, just as importantly, of the processors with coprocessors.

His post concludes with:

Grab your Machete and Pith Helmet

If parallel programming is hard, heterogeneous programming is that hard, squared. Defining and building a productive, performance-portable heterogeneous programming system is hard. There are several current programming strategies that attempt to solve this problem, including OpenCL, Microsoft C++AMP, Google Renderscript, Intel’s proposed offload directives (see slide 24), and the recent OpenACC specification. We might also learn something from embedded system programming, which has had to deal with heterogeneous systems for many years. My next article will whack through the underbrush to expose each of these programming strategies in turn, presenting advantages and disadvantages relative to the goal.

These are languages that share common subjects (think of their target architectures) and so are ripe for a topic map that co-locates their approaches to a particular architecture. Being able to incorporate official and non-official documentation, tests, sample code, etc., might enable faster progress in this area.

The future of HPC processors is almost upon us. It will not do to be tardy.

A Distributed C Compiler System on MapReduce: Mrcc

Thursday, March 15th, 2012

A Distributed C Compiler System on MapReduce: Mrcc

Alex Popescu of myNoSQL points to software and a paper on distributed C code for compilation.

Changing to distributed architectures may uncover undocumented decisions made long ago and far away. Decisions that we may choose to make differently this time. Hopefully we will do a better job of documenting them. (Not that it will happen but there is no law against hoping.)