Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 11, 2012

Evaluating the Design of the R Language

Filed under: Language,Language Design,R — Patrick Durusau @ 3:33 pm

Sean McDirmid writes:

From our recent discussion on R, I thought this paper by Floreal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek deserved its own post (ECOOP final version); abstract:

R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features.

Excerpts from the paper:

R comes equipped with a rather unlikely mix of features. In a nutshell, R is a dynamic language in the spirit of Scheme or JavaScript, but where the basic data type is the vector. It is functional in that functions are first-class values and arguments are passed by deep copy. Moreover, R uses lazy evaluation by default for all arguments, thus it has a pure functional core. Yet R does not optimize recursion, and instead encourages vectorized operations. Functions are lexically scoped and their local variables can be updated, allowing for an imperative programming style. R targets statistical computing, thus missing value support permeates all operations.
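To make that mix of features concrete, here is a small R sketch of my own (not from the paper) showing vectorized operations, missing-value propagation, first-class functions, and the by-copy behavior of arguments:

```r
# The basic data type is the vector; operations are vectorized,
# and missing values (NA) propagate through them
x <- c(1, 2, NA, 4)
x * 2                      # 2 4 NA 8

# Functions are first-class values
square <- function(n) n * n
sapply(x, square)          # 1 4 NA 16

# Arguments behave as if passed by deep copy:
# mutating a parameter does not affect the caller's binding
f <- function(v) { v[1] <- 99; v }
y <- c(1, 2, 3)
f(y)                       # 99 2 3
y                          # unchanged: 1 2 3
```

Note how `f` freely updates its local copy of `v`, imperative style, while the caller's `y` is untouched.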

One of our discoveries while working out the semantics was how eager evaluation of promises turns out to be. The semantics captures this with C[]; the only cases where promises are not evaluated are in the arguments of a function call and when promises occur in a nested function body; all other references to promises are evaluated. In particular, it was surprising and unnecessary to force assignments, as this hampers building infinite structures. Many basic functions that are lazy in Haskell, for example, are strict in R, including data type constructors. As for sharing, the semantics clearly demonstrates that R prevents sharing by performing copies at assignments.
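The promise behavior described above can be observed directly at the R prompt; this is my own illustration, under the assumption of a stock R interpreter:

```r
# Each argument is wrapped in a promise and evaluated only on first use,
# so an unused argument can contain an error that never fires
first <- function(a, b) a
first(42, stop("never reached"))   # returns 42; b is never forced

# delayedAssign creates a promise explicitly; it is forced on first reference
delayedAssign("p", { cat("forcing p\n"); 10 })
p + 1                              # prints "forcing p", evaluates to 11

# Laziness stops at assignment: the right-hand side of <- is evaluated
# eagerly, which is why R cannot build infinite structures the way a
# lazy Haskell definition like `ones = 1 : ones` can
```

Running `first(42, stop("never reached"))` quietly returning 42 is exactly the laziness the authors formalize; the eager forcing at assignment is what they found "surprising and unnecessary."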

The R implementation uses copy-on-write to reduce the number of copies. With superassignment, environments can be used as shared mutable data structures. The way assignment into vectors preserves the pass-by-value semantics is rather unusual and, from personal experience, it is unclear if programmers understand the feature. … It is noteworthy that objects are mutable within a function (since fields are attributes), but are copied when passed as an argument.
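These three mechanisms, copy-on-write, mutable shared environments, and superassignment, can be demonstrated in a few lines; again, this is my sketch rather than code from the paper:

```r
# Copy-on-write: the copy happens only when one of the sharers is modified
a <- c(1, 2, 3)
b <- a            # no copy yet; a and b share storage internally
b[1] <- 99        # b now gets its own copy
a                 # still 1 2 3 -- pass-by-value semantics preserved

# Environments are the exception: they are mutable and shared
e <- new.env()
touch <- function(env) assign("hits", 1, envir = env)
touch(e)
e$hits            # 1 -- the mutation is visible to the caller

# Superassignment (<<-) writes through to an enclosing environment,
# turning a closure's environment into shared mutable state
counter <- local({
  n <- 0
  function() { n <<- n + 1; n }
})
counter(); counter()   # 1, then 2
```

The vector case shows why programmers can be confused: `b[1] <- 99` looks like in-place mutation but is semantically a copy-and-rebind, while the environment case really is in-place mutation.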

Perhaps not immediately applicable to a topic map task today, but I would argue it is very relevant for topic maps in general.

In part because it is a reminder that when we write topic maps, topic map interfaces, or languages to be used with topic maps, we are fashioning languages. Languages that may or may not fit how our users view the world and how they tend to formulate queries or statements.

The test for an artificial language should be whether users have to stop to consider the correctness of their writing. Every pause is a sign that an error may be about to occur. Will they remember that this is an SVO language? Is the terminology a familiar one?

Correcting the errors of others may “validate” your self-worth but is that what you want as the purpose of your language?

(I saw this at Christophe Lalanne’s blog.)
