Archive for the ‘Language Design’ Category

Pattern Overloading

Tuesday, December 6th, 2016

Pattern Overloading by Ramsey Nasser.

From the post:

C-like languages have a problem of overloaded syntax that I noticed while teaching high school students. Consider the following snippets in such a language:


foo(4, 5);

function foo(int x) {

for(int i = 0; i < 10; i++) {

if(x > 10) {

case(x) {

A programmer experienced with this family would see

  1. Function invocation
  2. Function definition
  3. Control flow examples

In my experience, new programmers see these constructs as instances of the same idea: name(some-stuff) more-stuff. This is not an unreasonable conclusion to reach. The syntax for each construct is shockingly similar given that their semantics are wildly different.
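Nasser's point is easier to see by contrast with a keyword-first syntax. The sketch below (a loose illustration in Python, not from Nasser's post) shows the same four ideas, each opening with a distinct keyword or shape instead of sharing the name(some-stuff) more-stuff pattern:

```python
# Each construct announces itself differently, instead of sharing
# the C-like "name(some-stuff) more-stuff" shape.

def foo(x):            # definition: introduced by 'def'
    return x + 1

result = foo(4)        # invocation: bare name(args), no keyword

total = 0
for i in range(10):    # loop: introduced by 'for ... in'
    total += i

if result > 4:         # conditional: introduced by 'if'
    total += result

print(total)
```

Whether this particular set of keywords is the right one is a separate question; the point is only that the constructs no longer look like instances of the same idea.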

You won’t be called upon to re-design C, but Nasser’s advice:

Syntactic similarity should mirror semantic similarity

Or, to take a quote from the UX world:

Similar things should look similar and dissimilar things should look dissimilar

is equally applicable to any syntax that you design.

Gremlin Users – Beware the Double-Underscore!

Wednesday, February 3rd, 2016

A user recently posted this example from the Gremlin documentation:

out()).values('name') [apologies for the line wrap]

which returned:

“No such property: _ for class: Script121”

Marko Rodriguez responded:

Its a double underscore, not a single underscore.

__ vs. _

I mention this to benefit beginning Gremlin users who haven’t developed an underscore stutter but also as a plea for sanity in syntax design.

It’s easy to type two successive underscores, but the obviousness of a double underscore versus a single underscore depends on local typography.

To say nothing of the fact that what might be obvious to the eyes of a twenty-something may not be as obvious to the eyes of a fifty-something+.
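The Gremlin error has an exact analogue in any language: a name spelled with two underscores is simply a different identifier from a name spelled with one. A small Python sketch (purely illustrative, not Gremlin; the string value is a hypothetical stand-in):

```python
# '__' and '_' are distinct identifiers; defining one says nothing
# about the other.
__ = "anonymous traversal marker"   # stand-in for Gremlin's __

try:
    _            # single underscore: never defined in this script
    error = None
except NameError as exc:
    error = str(exc)   # the moral equivalent of "No such property: _"

print(error)
```

Running this prints a NameError message, the same class of failure the Gremlin user hit, and just as invisible to the eye scanning the source.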

In syntax design, answer the question:

Do you want to be clever or clear?

Growing a Language

Saturday, September 20th, 2014

Growing a Language by Guy L. Steele, Jr.

The first paper in a new series of posts from the Hacker School blog, “Paper of the Week.”

I haven’t found a good way to summarize Steele’s paper but can observe that a central theme is the growth of programming languages.

While enjoying the Steele paper, ask yourself how would you capture the changing nuances of a language, natural or artificial?


Using Category Theory to design…

Tuesday, July 29th, 2014

Using Category Theory to design implicit conversions and generic operators by John C. Reynolds.


A generalization of many-sorted algebras, called category-sorted algebras, is defined and applied to the language-design problem of avoiding anomalies in the interaction of implicit conversions and generic operators. The definition of a simple imperative language (without any binding mechanisms) is used as an example.

The greatest exposure most people have to implicit conversions is that they are handled properly.

This paper dates from 1980 so some of the category theory jargon will seem odd but consider it a “practical” application of category theory.

That should hold your interest. 😉

I first saw this in a tweet by scottfleischman.

An Empirical Investigation into Programming Language Syntax

Monday, July 14th, 2014

An Empirical Investigation into Programming Language Syntax by Greg Wilson.

A great synopsis of Andreas Stefik and Susanna Siebert’s “An Empirical Investigation into Programming Language Syntax.” ACM Transactions on Computing Education, 13(4), Nov. 2013.

A sample to interest you in the post:

  1. Programming language designers needlessly make programming languages harder to learn by not doing basic usability testing. For example, “…the three most common words for looping in computer science, for, while, and foreach, were rated as the three most unintuitive choices by non-programmers.”
  2. C-style syntax, as used in Java and Perl, is just as hard for novices to learn as a randomly-designed syntax. Again, this pain is needless, because the syntax of other languages (such as Python and Ruby) is significantly easier.

Let me repeat part of that:

C-style syntax, as used in Java and Perl, is just as hard for novices to learn as a randomly-designed syntax.

Randomly-designed syntax?

Now, think about the latest semantic syntax or semantic query syntax you have read about.

Was it designed for users? Was there any user testing at all?

Is there a lesson here for designers of semantic syntaxes and query languages?


I first saw this in Greg Wilson’s Software Carpentry: Lessons Learned video.

So You Want To Write Your Own Language?

Tuesday, February 4th, 2014

So You Want To Write Your Own Language? by Walter Bright.

From the post:

The naked truth about the joys, frustrations, and hard work of writing your own programming language

My career has been all about designing programming languages and writing compilers for them. This has been a great joy and source of satisfaction to me, and perhaps I can offer some observations about what you’re in for if you decide to design and implement a professional programming language. This is actually a book-length topic, so I’ll just hit on a few highlights here and avoid topics well covered elsewhere.

In case you are wondering if Walter is a good source for language writing advice, I pulled this bio from the Dr. Dobb’s site:

Walter Bright is a computer programmer known for being the designer of the D programming language. He was also the main developer of the first native C++ compiler, Zortech C++ (later to become Symantec C++, now Digital Mars C++). Before the C++ compiler he developed the Datalight C compiler, also sold as Zorland C and later Zortech C.

I am sure writing a language is an enormous amount of work, but Walter makes it sound quite attractive.

The Next 700 Programming Languages
[Essence of Topic Maps]

Saturday, March 16th, 2013

The Next 700 Programming Languages by P. J. Landin.


A family of unimplemented computing languages is described that is intended to span differences of application area by a unified framework. This framework dictates the rules about the uses of user-coined names, and the conventions about characterizing functional relationships. Within this framework the design of a specific language splits into two independent parts. One is the choice of written appearances of programs (or more generally, their physical representation). The other is the choice of the abstract entities (such as numbers, character-strings, lists of them, functional relations among them) that can be referred to in the language.

The system is biased towards “expressions” rather than “statements.” It includes a nonprocedural (purely functional) subsystem that aims to expand the class of users’ needs that can be met by a single print-instruction, without sacrificing the important properties that make conventional right-hand-side expressions easy to construct and understand.
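Landin's bias toward "expressions" rather than "statements" survives in most modern languages. A hedged illustration of the contrast, in Python rather than ISWIM:

```python
x = 3

# Statement-oriented: the result is assembled by side effect,
# one branch at a time
if x > 0:
    sign = 1
else:
    sign = -1

# Expression-oriented: the whole right-hand side denotes a value,
# closer to ISWIM's everything-is-an-expression style
sign_expr = 1 if x > 0 else -1

assert sign == sign_expr
print(sign_expr)
```

The expression form is what makes "a single print-instruction" able to carry a whole computation, as the abstract puts it.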

The introduction to this paper reminded me of an acronym, SWIM (See What I Mean) that was coined to my knowledge by Michel Biezunski several years ago:

Most programming languages are partly a way of expressing things in terms of other things and partly a basic set of given things. The ISWIM (If you See What I Mean) system is a byproduct of an attempt to disentangle these two aspects in some current languages.

This attempt has led the author to think that many linguistic idiosyncracies are concerned with the former rather than the latter, whereas aptitude for a particular class of tasks is essentially determined by the latter rather than the former. The conclusion follows that many language characteristics are irrelevant to the alleged problem orientation.

ISWIM is an attempt at a general purpose system for describing things in terms of other things, that can be problem-oriented by appropriate choice of “primitives.” So it is not a language so much as a family of languages, of which each member is the result of choosing a set of primitives. The possibilities concerning this set and what is needed to specify such a set are discussed below.

The essence of topic maps is captured by:

ISWIM is an attempt at a general purpose system for describing things in terms of other things, that can be problem-oriented by appropriate choice of “primitives.”

Every information system has a set of terms, the meaning of which are known to its designers and/or users.

Data integration issues arise from the description of terms, “in terms of other things,” being known only to designers and users.

The power of topic maps comes from the expression of descriptions “in terms of other things,” for terms.

Other designers or users can examine those descriptions to see if they recognize any terms similar to those they know by other descriptions.

If they discover descriptions they consider to be of same thing, they can then create a mapping of those terms.

Hopefully using the descriptions as a basis for the mapping. A mapping of term to term only multiplies the opaqueness of the terms.

For some systems, Social Security Administration databases for example, descriptions of terms “in terms of other things” may not be part of the database itself, but may be maintained as “best practice” documentation to facilitate later maintenance and changes.

For other systems, U.S. Intelligence community as another example, still chasing the will-o’-the-wisp* of standard terminology for non-standard terms, even the possibility of interchange depends on the development of description of terms “in terms of other things.”

Before you ask, yes, yes the Topic Maps Data Model (TMDM) and the various Topic Maps syntaxes are terms that can be described “in terms of other things.”

The advantage of the TMDM and relevant syntaxes is that even if not described “in terms of other things,” standardized terms enable interchange of a class of mappings. The default identification mapping in the TMDM being by IRIs.

Before and since Landin’s article we have been producing terms that could be described “in terms of other things.” In CS and other areas of human endeavor as well.

Isn’t it about time we started describing our terms rather than clamoring for one set of undescribed terms or another?

* I use the term will-o’-the-wisp quite deliberately.

After decades of failure to create universal information systems with computers, following on centuries of non-computer failures to reach the same goal, following on millennia of semantic and linguistic diversity, surely someone knows that attempts at universal information systems will leave intelligence agencies not sharing critical data.

Perhaps the method you choose says a great deal about the true goals of your project.

I first saw this in a tweet by CompSciFact.

Notation as a Tool of Thought

Thursday, November 29th, 2012

Notation as a Tool of Thought by Kenneth E. Iverson.

From the introduction:

Nevertheless, mathematical notation has serious deficiencies. In particular, it lacks universality, and must be interpreted differently according to the topic, according to the author, and even according to the immediate context. Programming languages, because they were designed for the purpose of directing computers, offer important advantages as tools of thought. Not only are they universal (general-purpose), but they are also executable and unambiguous. Executability makes it possible to use computers to perform extensive experiments on ideas expressed in a programming language, and the lack of ambiguity makes possible precise thought experiments. In other respects, however, most programming languages are decidedly inferior to mathematical notation and are little used as tools of thought in ways that would be considered significant by, say, an applied mathematician.

The thesis of the present paper is that the advantages of executability and universality found in programming languages can be effectively combined, in a single coherent language, with the advantages offered by mathematical notation.

It will expose you to APL, but that’s not a bad thing. The history of reasoning about data structures can be interesting and useful.

Iverson’s response to critics of the algorithms in this work was in part as follows:

…overemphasis of efficiency leads to an unfortunate circularity in design: for reasons of efficiency early programming languages reflected the characteristics of the early computers, and each generation of computers reflects the needs of the programming languages of the preceding generation. (5.4 Mode of Presentation)

A good reason to understand the nature of a problem before reaching for the keyboard.
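Iverson's point about notation shaping thought can be seen even without APL. Compare an accumulator loop with an expression that states the result directly (Python is used here only as an illustration; Iverson would have written +/⍳10):

```python
# Loop notation: describes a machine procedure, step by step
total = 0
for i in range(1, 11):
    total += i

# Reduction notation: states *what* is computed, much as APL's
# +/⍳10 sums the first ten integers in three characters
assert total == sum(range(1, 11)) == 55
print(total)
```

The second form leaves nothing for the reader to simulate in their head, which is exactly the property Iverson valued in a notation.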

Exploiting Parallelism and Scalability (XPS) (NSF)

Thursday, October 25th, 2012

Exploiting Parallelism and Scalability (XPS) (NSF)

From the announcement:

Synopsis of Program:

Computing systems have undergone a fundamental transformation from the single-processor devices of the turn of the century to today’s ubiquitous and networked devices and warehouse-scale computing via the cloud. Parallelism has become ubiquitous at many levels. The proliferation of multi- and many-core processors, ever-increasing numbers of interconnected high performance and data intensive edge devices, and the data centers servicing them, is enabling a new set of global applications with large economic and social impact. At the same time, semiconductor technology is facing fundamental physical limits and single processor performance has plateaued. This means that the ability to achieve predictable performance improvements through improved processor technologies has ended.

The Exploiting Parallelism and Scalability (XPS) program aims to support groundbreaking research leading to a new era of parallel computing. XPS seeks research re-evaluating, and possibly re-designing, the traditional computer hardware and software stack for today’s heterogeneous parallel and distributed systems and exploring new holistic approaches to parallelism and scalability. Achieving the needed breakthroughs will require a collaborative effort among researchers representing all areas– from the application layer down to the micro-architecture– and will be built on new concepts and new foundational principles. New approaches to achieve scalable performance and usability need new abstract models and algorithms, programming models and languages, hardware architectures, compilers, operating systems and run-time systems, and exploit domain and application-specific knowledge. Research should also focus on energy- and communication-efficiency and on enabling the division of effort between edge devices and clouds.

Full proposals due: February 20, 2013, (due by 5 p.m. proposer’s local time).

I see the next wave of parallelism and scalability being based on language and semantics. Less so on more cores and better designs in silicon.

Not surprising since I work in languages and semantics every day.

Even so, consider that a go-cart exceeding 160 miles per hour (260 km/h) remains a go-cart.

Go beyond building a faster go-cart.

Consider language and semantics when writing your proposal for this program.

OPLSS 2012

Saturday, August 11th, 2012

OPLSS 2012 by Robert Harper.

From the post:

The 2012 edition of the Oregon Programming Languages Summer School was another huge success, drawing a capacity crowd of 100 eager students anxious to learn the latest ideas from an impressive slate of speakers. This year, as last year, the focus was on languages, logic, and verification, from both a theoretical and a practical perspective. The students have a wide range of backgrounds, some already experts in many of the topics of the school, others with little or no prior experience with logic or semantics. Surprisingly, a large percentage (well more than half, perhaps as many as three quarters) had some experience using Coq, a large uptick from previous years. This seems to represent a generational shift—whereas such topics were even relatively recently seen as the province of a few zealots out in left field, nowadays students seem to accept the basic principles of functional programming, type theory, and verification as a given. It’s a victory for the field, and extremely gratifying for those of us who have been pressing forward with these ideas for decades despite resistance from the great unwashed. But it’s also bittersweet, because in some ways it’s more fun to be among the very few who have created the future long before it comes to pass. But such are the proceeds of success.

As if a post meriting your attention wasn’t enough, it concludes with:

Videos of the lectures, and course notes provided by the speakers, are all available at the OPLSS 12 web site.

Just a summary of what you will find:

  • Logical relations — Amal Ahmed
  • Category theory foundations — Steve Awodey
  • Proofs as Processes — Robert Constable
  • Polarization and focalization — Pierre-Louis Curien
  • Type theory foundations — Robert Harper
  • Monads and all that — John Hughes
  • Compiler verification — Xavier Leroy
  • Language-based security — Andrew Myers
  • Proof theory foundations — Frank Pfenning
  • Software foundations in Coq — Benjamin Pierce


Evaluating the Design of the R Language

Friday, May 11th, 2012

Evaluating the Design of the R Language

Sean McDirmid writes:

From our recent discussion on R, I thought this paper deserved its own post (ECOOP final version) by Floreal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek; abstract:

R is a dynamic language for statistical computing that combines lazy functional features and object-oriented programming. This rather unlikely linguistic cocktail would probably never have been prepared by computer scientists, yet the language has become surprisingly popular. With millions of lines of R code available in repositories, we have an opportunity to evaluate the fundamental choices underlying the R language design. Using a combination of static and dynamic program analysis we assess the success of different language features.

Excerpts from the paper:

R comes equipped with a rather unlikely mix of features. In a nutshell, R is a dynamic language in the spirit of Scheme or JavaScript, but where the basic data type is the vector. It is functional in that functions are first-class values and arguments are passed by deep copy. Moreover, R uses lazy evaluation by default for all arguments, thus it has a pure functional core. Yet R does not optimize recursion, and instead encourages vectorized operations. Functions are lexically scoped and their local variables can be updated, allowing for an imperative programming style. R targets statistical computing, thus missing value support permeates all operations.

One of our discoveries while working out the semantics was how eager evaluation of promises turns out to be. The semantics captures this with C[]; the only cases where promises are not evaluated is in the arguments of a function call and when promises occur in a nested function body, all other references to promises are evaluated. In particular, it was surprising and unnecessary to force assignments as this hampers building infinite structures. Many basic functions that are lazy in Haskell, for example, are strict in R, including data type constructors. As for sharing, the semantics clearly demonstrates that R prevents sharing by performing copies at assignments.

The R implementation uses copy-on-write to reduce the number of copies. With superassignment, environments can be used as shared mutable data structures. The way assignment into vectors preserves the pass-by-value semantics is rather unusual and, from personal experience, it is unclear if programmers understand the feature. … It is noteworthy that objects are mutable within a function (since fields are attributes), but are copied when passed as an argument.
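The copy-at-assignment behavior the authors describe is the opposite of what most dynamic languages do. A rough contrast, sketched in Python (Python is shown precisely because R's choice is the unusual one):

```python
import copy

a = [1, 2, 3]
b = a                 # Python (like JavaScript or Scheme): assignment
b[0] = 99             # aliases, so mutation through b ...
assert a[0] == 99     # ... is visible through a

c = copy.deepcopy(a)  # R-style value semantics behave as if every
c[0] = 0              # assignment performed a (copy-on-write) deep copy,
assert a[0] == 99     # leaving the original untouched
print(a[0])
```

Whether programmers actually internalize which of these two models their language uses is, as the paper notes, far from clear.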

Perhaps not immediately applicable to a topic map task today but I would argue very relevant for topic maps in general.

In part because it is a reminder that, when writing topic maps, topic map interfaces, or languages to be used with topic maps, we are fashioning languages. Languages that will, or perhaps will not, fit how our users view the world and how they tend to formulate queries or statements.

The test for an artificial language should be whether users have to stop to consider the correctness of their writing. Every pause is a sign that error may be about to occur. Will they remember that this is an SVO language? Or is the terminology a familiar one?

Correcting the errors of others may “validate” your self-worth but is that what you want as the purpose of your language?

(I saw this at Christophe Lalanne’s blog.)

The Heterogeneous Programming Jungle

Saturday, March 24th, 2012

The Heterogeneous Programming Jungle by Michael Wolfe.

Michael starts off with one definition of “heterogeneous:”

The heterogeneous systems of interest to HPC use an attached coprocessor or accelerator that is optimized for certain types of computation. These devices typically exhibit internal parallelism, and execute asynchronously and concurrently with the host processor. Programming a heterogeneous system is then even more complex than “traditional” parallel programming (if any parallel programming can be called traditional), because in addition to the complexity of parallel programming on the attached device, the program must manage the concurrent activities between the host and device, and manage data locality between the host and device.

And while he returns to that definition in the end, another form of heterogeneity is lurking not far behind:

Given the similarities among system designs, one might think it should be obvious how to come up with a programming strategy that would preserve portability and performance across all these devices. What we want is a method that allows the application writer to write a program once, and let the compiler or runtime optimize for each target. Is that too much to ask?

Let me reflect momentarily on the two gold standards in this arena. The first is high level programming languages in general. After 50 years of programming using Algol, Pascal, Fortran, C, C++, Java, and many, many other languages, we tend to forget how wonderful and important it is that we can write a single program, compile it, run it, and get the same results on any number of different processors and operating systems.

So there is the heterogeneity of attached coprocessor and, just as importantly, of the processors with coprocessors.

His post concludes with:

Grab your Machete and Pith Helmet

If parallel programming is hard, heterogeneous programming is that hard, squared. Defining and building a productive, performance-portable heterogeneous programming system is hard. There are several current programming strategies that attempt to solve this problem, including OpenCL, Microsoft C++AMP, Google Renderscript, Intel’s proposed offload directives (see slide 24), and the recent OpenACC specification. We might also learn something from embedded system programming, which has had to deal with heterogeneous systems for many years. My next article will whack through the underbrush to expose each of these programming strategies in turn, presenting advantages and disadvantages relative to the goal.

These are languages that share common subjects (think of their target architectures) and so are ripe for a topic map that co-locates their approaches to a particular architecture. Being able to incorporate official and non-official documentation, tests, sample code, etc., might enable faster progress in this area.

The future of HPC processors is almost upon us. It will not do to be tardy.

Notation as a Tool of Thought – Iverson – Turing Lecture

Sunday, October 23rd, 2011

Notation as a Tool of Thought by Kenneth E. Iverson – 1979 Turing Award Lecture

I saw this lecture tweeted with a link to a poor photocopy of a double column printing of the lecture.

I think you will find the single column version from the ACM awards site much easier to read.

Not to mention that the ACM awards site has all the Turing as well as other award lectures for viewing.

I suspect that a CS class could be taught using only ACM award lectures as the primary material. Perhaps someone already has; I would appreciate a pointer if so.

Execution in the Kingdom of Nouns

Sunday, October 9th, 2011

Execution in the Kingdom of Nouns

From the post:

They’ve a temper, some of them—particularly verbs: they’re the proudest—adjectives you can do anything with, but not verbs—however, I can manage the whole lot of them! Impenetrability! That’s what I say!
— Humpty Dumpty

Hello, world! Today we’re going to hear the story of Evil King Java and his quest for worldwide verb stamp-outage.1

Caution: This story does not have a happy ending. It is neither a story for the faint of heart nor for the critical of mouth. If you’re easily offended, or prone to being a disagreeable knave in blog comments, please stop reading now.

Before we begin the story, let’s get some conceptual gunk out of the way.

What I find compelling is the notion that a programming language should follow how we think, that is, how most of us think.
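Yegge's complaint is that Java forces every verb to be wrapped in a noun. A caricature of the two styles in Python (all names here are hypothetical, for illustration only):

```python
# Noun-oriented: the action is smuggled inside a manager class
class GarbageDeleter:
    def perform_deletion(self, items):
        return [i for i in items if i != "garbage"]

# Verb-oriented: the action is simply a function
def delete_garbage(items):
    return [i for i in items if i != "garbage"]

stuff = ["mail", "garbage", "keys"]
assert GarbageDeleter().perform_deletion(stuff) == delete_garbage(stuff)
print(delete_garbage(stuff))
```

Both do the same work; the question is which one matches how the person writing (or reading) it actually thinks about the task.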

If you want a successful topic map, should it follow/mimic the thinking of:

  1. the author
  2. the client
  3. intended user base?

#1 is easy, that’s the default and requires the least work.

#2 is instinctive, but you will need to educate the client to #3.

#3 is golden if you can hit that mark.

How to Think about Parallel Programming: Not!

Saturday, January 15th, 2011

How to Think about Parallel Programming: Not! by Guy Steele is a deeply interesting presentation on how not to approach parallel programming. The central theme is that languages should provide parallelism transparently, without programmers having to think in parallel.
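A central example in Steele's talk is that accumulator loops over-specify evaluation order, while associative combining operations leave a compiler or runtime free to split the work. A sketch of the contrast (illustrative Python, not from the talk):

```python
from functools import reduce

words = ["how", "to", "think", "about", "parallel", "programming"]

# Accumulator style: forces strictly left-to-right, sequential evaluation
total = 0
for w in words:
    total += len(w)

# Combining style: '+' is associative, so a runtime could partition the
# sequence across processors and combine partial sums in any grouping
combined = reduce(lambda a, b: a + b, (len(w) for w in words))

assert total == combined
print(combined)
```

The programmer writes neither threads nor locks; the opportunity for parallelism lives entirely in the algebraic properties of the combining operation.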

Parallel processing of topic maps is another way to scale topic maps for particular situations.

How to parallel process questions of subject identity is an open and possibly domain specific issue.

Watch the presentation even if you are only seeking an entertaining account of Steele’s first program.