## Archive for the ‘C/C++’ Category

### Numba Versus C++ – On Wolfram CAs

Tuesday, March 6th, 2018

Numba Versus C++ by David Butts, Gautham Dharuman, Bill Punch and Michael S. Murillo.

Python is a programming language that first appeared in 1991; soon, it will have its 27th birthday. Python was created not as a fast scientific language, but rather as a general-purpose language. You can use Python as a simple scripting language or as an object-oriented language or as a functional language…and beyond; it is very flexible. Today, it is used across an extremely wide range of disciplines and is used by many companies. As such, it has an enormous number of libraries and conferences that attract thousands of people every year.

But, Python is an interpreted language, so it is very slow. Just how slow? It depends, but you can count on about 10-100 times as slow as, say, C/C++. If you want fast code, the general rule is: don’t use Python. However, a few more moments of thought lead to a more nuanced perspective. What if you spend most of the time coding, and little time actually running the code? Perhaps your familiarity with the (slow) language, or its vast set of libraries, actually saves you time overall? And, what if you learned a few tricks that made your Python code itself a bit faster? Maybe that is enough for your needs? In the end, for true high performance computing applications, you will want to explore fast languages like C++; but, not all of our needs fall into that category.

As another example, consider the fact that many applications use two languages, one for the core code and one for the wrapper code; this allows for a smoother interface between the user and the core code. A common use case is C or C++ wrapped by, of course, Python. As a user, you may not even know that the code you are using is in another language! Such a situation is referred to as the “two-language problem”. This situation is great provided you don’t need to work in the core code, or you don’t mind working in two languages – some people don’t mind, but some do. The question then arises: if you are one of those people who would like to work only in the wrapper language, because it was chosen for its user friendliness, what options are available to make that language (Python in this example) fast enough that it can also be used for the core code?

We wanted to explore these ideas a bit further by writing a code in both Python and C++. Our past experience suggested that while Python is very slow, it could be made about as fast as C using the crazily-simple-to-use library Numba. Our basic comparisons here are: basic Python, Numba and C++. Because we are not religious about Python, and you shouldn’t be either, we invited expert C++ programmers to have the chance to speed up the C++ as much as they could (and, boy could they!).

This webpage is highly annoying, in both Firefox and Chrome. You’ll have to visit to get the full impact.

It is, however, also a great post on using Numba to obtain much faster results while still using Python. The use of Wolfram CAs (cellular automata) as examples is an added bonus.
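Assuming the post's "Wolfram CAs" are the one-dimensional elementary cellular automata, each cell's next state depends only on itself and its two neighbors. The rule number (0-255) serves as an eight-entry lookup table. The minimal C sketch below is not the authors' benchmark code. The function name `eca_step` and the wrap-around boundary are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One generation of an elementary (Wolfram) cellular automaton.
   cur and next each hold n cells (n >= 1) with values 0 or 1.
   Assumption: the row wraps around (periodic boundary). */
void eca_step(const uint8_t *cur, uint8_t *next, size_t n, uint8_t rule)
{
    for (size_t i = 0; i < n; i++) {
        unsigned left  = cur[(i + n - 1) % n];
        unsigned self  = cur[i];
        unsigned right = cur[(i + 1) % n];
        /* The neighborhood (left, self, right) forms a 3-bit index 0..7;
           bit `index` of the rule number is the cell's next state. */
        unsigned index = (left << 2) | (self << 1) | right;
        next[i] = (uint8_t)((rule >> index) & 1u);
    }
}
```

Calling it repeatedly and swapping `cur` and `next` each time produces the Rule 90 triangles or the Rule 30 pattern. Tight loops like this are where Numba or a C++ compiler earn their keep.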

Enjoy!

### Evolving a Decompiler

Wednesday, February 14th, 2018

Evolving a Decompiler by Matt Noonan.

From the post:

Back in 2016, Eric Schulte, Jason Ruchti, myself, Alexey Loginov, and David Ciarletta (all of the research arm of GrammaTech) spent some time diving into a new approach to decompilation. We made some progress but were eventually all pulled away to other projects, leaving a very interesting work-in-progress prototype behind.

Being a promising but incomplete research prototype, it was quite difficult to find a venue to publish our research. But I am very excited to announce that I will be presenting this work at the NDSS binary analysis research (BAR) workshop next week in San Diego, CA! BAR is a workshop on the state-of-the-art in binary analysis research, including talks about working systems as well as novel prototypes and works-in-progress; I’m really happy that the program committee decided to include discussion of these prototypes, because there are a lot of cool ideas out there that aren’t production-ready, but may flourish once the community gets a chance to start tinkering with them.

How wickedly cool!

Did I mention all the major components are open-source?

GrammaTech recently open-sourced all of the major components of BED, including:

• SEL, the Software Evolution Library. This is a Common Lisp library for program synthesis and repair, and is quite nice to work with interactively. All of the C-specific mutations used in BED are available as part of SEL; the only missing component is the big code database; just bring your own!
• clang-mutate, a command-line tool for performing low-level mutations on C and C++ code. All of the actual edits are performed using clang-mutate; it also includes a REPL-like interface for interactively manipulating C and C++ code to quickly produce variants.

The building of the “big code database” sounds like an exercise in subject identity, doesn’t it?

Topic maps anyone?

### Intro to Low-Level Graphics on Linux – Impressing Spouse’s Family

Sunday, November 12th, 2017

Intro to Low-Level Graphics on Linux

From the webpage:

This tutorial attempts to explain a few of the possible methods that exist on Linux to access the graphics hardware from a low level. I am not talking about using Xlib instead of GTK+ or QT5, nor am I talking about using DirectFB, I want to go even lower than that; I’m talking about drawing graphics to the screen without needing any external dependencies; I’m talking about communicating directly with the Linux kernel. I will also provide information about programming for newer graphical systems (Wayland/Mir) even though those do not involve direct communication with the kernel drivers. The reason I want to provide this information in this tutorial is that even though their APIs are higher level, the programming techniques used in low-level graphics programming can easily be adapted to work with Wayland and Mir. Also, similar to fbdev and KMS/DRM APIs, good programming resources are hard to come by.

Most Linux systems actually provide a few different methods for drawing graphics to the screen; there are options. However, the problem is that documentation is basically non-existent. So, I would like to explain here what you need to know to get started.

Please note that this tutorial assumes you have a basic knowledge of C, this is not a beginner tutorial, this is for people who are interested in something like learning more about how Linux works, or about programming for embedded systems, or just doing weird experimental stuff for fun.
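The quote mentions fbdev, the older of the two kernel interfaces. Drawing with it comes down to three steps: query the screen geometry with two ioctls, `mmap` the device, and write pixels. The minimal sketch below assumes a 32-bits-per-pixel mode in which the pixel value is laid out as `0x00RRGGBB`. It also assumes you run it from a text console with write access to the device. The helper names are hypothetical. `FBIOGET_VSCREENINFO`, `FBIOGET_FSCREENINFO`, `struct fb_var_screeninfo` and `struct fb_fix_screeninfo` come from `<linux/fb.h>`.

```c
#include <fcntl.h>
#include <linux/fb.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Byte offset of pixel (x, y) in the mapped framebuffer memory. */
size_t fb_pixel_offset(const struct fb_var_screeninfo *var,
                       const struct fb_fix_screeninfo *fix,
                       uint32_t x, uint32_t y)
{
    return (size_t)(y + var->yoffset) * fix->line_length +
           (size_t)(x + var->xoffset) * (var->bits_per_pixel / 8);
}

/* Fill a w x h rectangle at (x0, y0). Assumes 32 bits per pixel and
   that the rectangle lies inside the visible screen. */
void fb_fill_rect(uint8_t *fb, const struct fb_var_screeninfo *var,
                  const struct fb_fix_screeninfo *fix, uint32_t x0,
                  uint32_t y0, uint32_t w, uint32_t h, uint32_t color)
{
    for (uint32_t y = y0; y < y0 + h; y++)
        for (uint32_t x = x0; x < x0 + w; x++)
            *(uint32_t *)(fb + fb_pixel_offset(var, fix, x, y)) = color;
}

/* Open a framebuffer device such as "/dev/fb0", query its geometry,
   draw a 100x100 square at (10, 10) and clean up.
   Returns 0 on success, -1 on any failure. */
int fb_draw_square(const char *path, uint32_t color)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    struct fb_var_screeninfo var;
    struct fb_fix_screeninfo fix;
    if (ioctl(fd, FBIOGET_VSCREENINFO, &var) < 0 ||
        ioctl(fd, FBIOGET_FSCREENINFO, &fix) < 0 ||
        var.bits_per_pixel != 32 || var.xres < 110 || var.yres < 110) {
        close(fd);
        return -1;
    }

    uint8_t *fb = mmap(NULL, fix.smem_len, PROT_READ | PROT_WRITE,
                       MAP_SHARED, fd, 0);
    if (fb == MAP_FAILED) {
        close(fd);
        return -1;
    }
    fb_fill_rect(fb, &var, &fix, 10, 10, 100, 100, color);
    munmap(fb, fix.smem_len);
    close(fd);
    return 0;
}
```

The code uses `line_length` instead of `xres * 4` for the row stride because the driver may pad each row. The tutorial's KMS/DRM path avoids this legacy interface entirely.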

You can impress your spouse’s family this holiday season by writing C code for low-level graphics on Linux. They won’t know you are frantically typing comments into the example code and will be suitably impressed when it compiles.

The other reason to mention this tutorial is the presence of Linux on embedded systems, such as industrial controllers, monitoring equipment, etc. The more comfortable you are with such systems, the easier they will be to explore.

Enjoy!

### C Reference Manual (D.M. Ritchie, 1974)

Tuesday, May 23rd, 2017

C Reference Manual (D.M. Ritchie, 1974)

I mention the C Reference Manual, now forty-three (43) years old, as encouragement to write good documentation.

It may have a longer life than you ever expected!

For example, in 1974 Ritchie writes:

2.2 Identifier (Names)

An identifier is a sequence of letters and digits: the first character must be alphabetic.

Which we find replicated years later in ISO/IEC 8879 : 1986 (SGML):

4.198 name: A name token whose first character is a name start character.

4.201 name start character: A character that can begin a name: letters and others designated by the concrete syntax.

And in production [53]:

```
name start character =
    LC Letter |
    UC Letter |
    LCNMSTRT |
    UCNMSTRT
```

Here, Figure 1 of 9.2.1 (SGML Character) defines LC Letter as a-z, UC Letter as A-Z, and both LCNMSTRT and UCNMSTRT as (none) in the concrete syntax.

And in 1997, the letter vs. digit distinction finds its way into Extensible Markup Language (XML) 1.0.

```
[4] NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5] Name ::= (Letter | '_' | ':') (NameChar)*
```

“Letter” is a link to a production referencing all the qualifying Unicode characters, which is too long to include here.

What started off as an arbitrary choice, “alphabetic” characters as name start characters in 1974, is picked up some 12 years later (1986) in ISO/IEC 8879 (SGML), both of which were bound by a restricted character set.

When the opportunity came to abandon the letter versus digit distinction in name start characters (XML 1.0), the result was a larger character repertoire for name start characters, but digits remained second-class citizens.
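The comparison becomes concrete when both rules are written as small C functions restricted to ASCII. The XML function can safely ignore the non-ASCII ranges of Letter, CombiningChar and Extender because none of them contain ASCII characters. The 1974 manual also counts the underscore as alphabetic, and the sketch follows that. The function names are hypothetical. In both rules, a digit may appear anywhere except first.

```c
#include <stdbool.h>

static bool is_ascii_letter(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static bool is_ascii_digit(char c) { return c >= '0' && c <= '9'; }

/* 1974 C rule: letters and digits, first character alphabetic
   (the underscore counts as alphabetic). */
bool is_c1974_identifier(const char *s)
{
    if (!(is_ascii_letter(s[0]) || s[0] == '_'))
        return false;
    for (const char *p = s + 1; *p; p++)
        if (!(is_ascii_letter(*p) || is_ascii_digit(*p) || *p == '_'))
            return false;
    return true;
}

/* XML 1.0 productions [4] and [5], ASCII subset only: a Name starts
   with a Letter, '_' or ':' and continues with NameChars. */
bool is_xml_ascii_name(const char *s)
{
    if (!(is_ascii_letter(s[0]) || s[0] == '_' || s[0] == ':'))
        return false;
    for (const char *p = s + 1; *p; p++)
        if (!(is_ascii_letter(*p) || is_ascii_digit(*p) || *p == '.' ||
              *p == '-' || *p == '_' || *p == ':'))
            return false;
    return true;
}
```

Both functions reject `9lives`. Only the XML function accepts `a.b-c:d`.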

Can you point to an explanation of why Ritchie preferred alphabetic characters over digits for name start characters?

### Google open sources a MapReduce framework for C/C++

Monday, February 23rd, 2015

Google open sources a MapReduce framework for C/C++ by Derrick Harris.

From the post:

Google announced on Wednesday that the company is open sourcing a MapReduce framework that will let users run native C and C++ code in their Hadoop environments. Depending on how much traction MapReduce for C, or MR4C, gets and by whom, it could turn out to be a pretty big deal.

Hadoop is famously, or infamously, written in Java and as such can suffer from performance issues compared with native C++ code. That’s why Google’s original MapReduce system was written in C++, as is the Quantcast File System, that company’s homegrown alternative for the Hadoop Distributed File System. And, as the blog post announcing MR4C notes, “many software companies that deal with large datasets have built proprietary systems to execute native code in MapReduce frameworks.”

Great news, but be aware that “performance” is a tricky issue. If “performance” had a single meaning, the TIOBE Index for February 2015 (a rough gauge of programming language popularity) would look quite different.

I remember a conference story where a programmer had written an application using Python, reasoning that resource limitations would compel the client to return for a fuller, enterprise solution. To their chagrin, the customer never exhausted the potential of the first solution. 😉

### Flow: Actor-based Concurrency with C++ [FoundationDB]

Saturday, February 14th, 2015

Flow: Actor-based Concurrency with C++

From the post:

FoundationDB began with ambitious goals for both high performance per node and scalability. We knew that to achieve these goals we would face serious engineering challenges while developing the FoundationDB core. We’d need to implement efficient asynchronous communicating processes of the sort supported by Erlang or the Async library in .NET, but we’d also need the raw speed and I/O efficiency of C++. Finally, we’d need to perform extensive simulation to engineer for reliability and fault tolerance on large clusters.

To meet these challenges, we developed several new tools, the first of which is Flow, a new programming language that brings actor-based concurrency to C++11. To add this capability, Flow introduces a number of new keywords and control-flow primitives for managing concurrency. Flow is implemented as a compiler which analyzes an asynchronous function (actor) and rewrites it as an object with many different sub-functions that use callbacks to avoid blocking (see streamlinejs for a similar concept using JavaScript). The Flow compiler’s output is normal C++11 code, which is then compiled to a binary using traditional tools. Flow also provides input to our simulation tool, Lithium, which conducts deterministic simulations of the entire system, including its physical interfaces and failure modes. In short, Flow allows efficient concurrency within C++ in a maintainable and extensible manner, achieving all three major engineering goals:

• high performance (by compiling to native code),
• actor-based concurrency (for high productivity development),
• simulation support (for testing).
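The quote describes a rewrite in which an asynchronous function becomes "an object with many different sub-functions that use callbacks to avoid blocking." The hand-written C sketch below shows that transformation. It does not show Flow's keywords or the C++ its compiler actually generates. `async_read`, the event queue and `struct sum_actor` are all hypothetical. Any state that lives across a wait becomes a struct field, and each wait point splits the body into a new callback.

```c
#include <stdbool.h>

/* A minimal single-threaded event loop: a FIFO of pending callbacks. */
typedef void (*callback_fn)(void *ctx, int value);
struct event { callback_fn fn; void *ctx; int value; };

#define QUEUE_CAP 64
static struct event queue[QUEUE_CAP];
static int q_head, q_len;

static void post(callback_fn fn, void *ctx, int value)
{
    /* Sketch only: assumes at most QUEUE_CAP events are ever pending. */
    queue[(q_head + q_len++) % QUEUE_CAP] = (struct event){fn, ctx, value};
}

static void run_loop(void)
{
    while (q_len > 0) {
        struct event ev = queue[q_head];
        q_head = (q_head + 1) % QUEUE_CAP;
        q_len--;
        ev.fn(ev.ctx, ev.value);
    }
}

/* Hypothetical non-blocking read: rather than waiting, it schedules the
   continuation k to receive the next value from a fake input source. */
static int next_input = 10;
static void async_read(callback_fn k, void *ctx)
{
    post(k, ctx, next_input);
    next_input += 22;
}

/* Conceptually the actor is:  a = wait(read()); b = wait(read()); return a + b;
   After the rewrite, each wait point becomes a separate callback. */
struct sum_actor { int a; int result; bool done; };

static void sum_after_second_read(void *ctx, int b)
{
    struct sum_actor *self = ctx;
    self->result = self->a + b;
    self->done = true;
}

static void sum_after_first_read(void *ctx, int a)
{
    struct sum_actor *self = ctx;
    self->a = a;                     /* state that must survive the wait */
    async_read(sum_after_second_read, self);
}

void sum_actor_start(struct sum_actor *self)
{
    self->done = false;
    async_read(sum_after_first_read, self);
}
```

Starting the actor returns immediately. The result appears only after `run_loop` has delivered both reads, which here yields 10 + 32.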

Flow Availability

Flow is not currently available outside of FoundationDB, but we’d like to open-source it in the future. If you’d like to stay in the loop with our progress subscribe below.

Are you going to be ready when Flow is released separate from FoundationDB?

### Holiday Gift: Open-Source C++ SDK & GraphLab Create 1.2

Wednesday, December 24th, 2014

Holiday Gift: Open-Source C++ SDK & GraphLab Create 1.2 by Rajat Arya.

From the post:

Just when you were wondering how to keep from getting bored this holiday season, we’re delivering something to fuel your creativity and sharpen your C++ coding skills. With the release of GraphLab Create 1.x SDK (beta) you can now harness and extend the C++ engine that powers GraphLab Create.

Extensions built with the SDK can directly access the SFrame and SGraph data structures from within the C++ engine. Direct access enables you to build custom algorithms, toolkits, and lambdas in efficient native code. The SDK provides a lightweight path to create and compile custom functions and expose them through Python.

One of the great things about the Internet is that as soon as you wonder something like “…how am I going to keep from being bored…” a post like this one appears in your Twitter stream. Well, at least if you are a follower of @graphlabteam. (A good reason to be following @graphlabteam.)

Watching the explosive progress on graphs and graph processing over the past couple of years makes me suspect that the security side of the house is doing something wrong. I’m not sure what, but it isn’t making this sort of progress.

Enjoy the SDK!

### Wintel and Open Source

Thursday, November 13th, 2014

The software world is reverberating with the news that Microsoft is in the process of making .NET completely open source.

On the same day, Intel announced that it had released “Julia2C, a source-to-source translator from Julia to C.”

Hmmm, is this evidence that open source is a viable path for commercial vendors? 😉

Next Question: How long before non-open source code becomes a liability, as in a nesting place for government surveillance/malware?

Speculation: Not as long as it took Wintel to move towards open source.

Consumers should demand open source code as a condition for purchase. All software, all the time.

### Announcing Clasp

Tuesday, October 28th, 2014

Announcing Clasp by Christian Schafmeister.

From the post:

Today I am happy to make the first release of the Common Lisp implementation “Clasp”. Clasp uses LLVM as its back-end and generates native code. Clasp is a super-set of Common Lisp that interoperates smoothly with C++. The goal is to integrate these two very different languages together as seamlessly as possible to provide the best of both worlds. The C++ interoperation allows Common Lisp programmers to easily expose powerful C++ libraries to Common Lisp and solve complex programming challenges using the expressive power of Common Lisp. Clasp is licensed under the LGPL.

Common Lisp is considered by many to be one of the most expressive programming languages in existence. Individuals and small teams of programmers have created fantastic applications and operating systems within Common Lisp that require much larger effort when written in other languages. Common Lisp has many language features that have not yet made it into the C++ standard. Common Lisp has first-class functions, dynamic variables, true macros for meta-programming, generic functions, multiple return values, first-class symbols, exact arithmetic, conditions and restarts, optional type declarations, a programmable reader, a programmable printer and a configurable compiler. Common Lisp is the ultimate programmable programming language.

Clojure is a dialect of Lisp, which means you may spot situations where Lisp would be the better solution, especially if you can draw upon C++ libraries.

The project is “actively looking” for new developers. Could be your opportunity to get in on the ground floor!

### Native Actors – A Scalable Software Platform for Distributed, Heterogeneous Environments

Saturday, September 27th, 2014

Native Actors – A Scalable Software Platform for Distributed, Heterogeneous Environments by Dominik Charousset, Thomas C. Schmidt, Raphael Hiesgen, and Matthias Wählisch.

Abstract:

Writing concurrent software is challenging, especially with low-level synchronization primitives such as threads or locks in shared memory environments. The actor model replaces implicit communication by an explicit message passing in a ‘shared-nothing’ paradigm. It applies to concurrency as well as distribution, but has not yet entered the native programming domain. This paper contributes the design of a native actor extension for C++, and the report on a software platform that implements our design for (a) concurrent, (b) distributed, and (c) heterogeneous hardware environments. GPGPU and embedded hardware components are integrated in a transparent way. Our software platform supports the development of scalable and efficient parallel software. It includes a lock-free mailbox algorithm with pattern matching facility for message processing. Thorough performance evaluations reveal an extraordinary small memory footprint in realistic application scenarios, while runtime performance not only outperforms existing mature actor implementations, but exceeds the scaling behavior of low-level message passing libraries such as OpenMPI.
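The abstract mentions a lock-free mailbox. One well-known way to build a many-writer, single-reader mailbox uses C11 atomics. Writers push onto a linked stack with compare-and-swap. The single reader detaches the whole stack with one atomic exchange and reverses it into arrival order. This generic technique is offered only for illustration. The paper's algorithm and its pattern-matching dispatch may differ, and the type and function names are hypothetical.

```c
#include <stdatomic.h>
#include <stddef.h>

struct message {
    struct message *next;
    int payload;                        /* hypothetical payload */
};

struct mailbox {
    _Atomic(struct message *) head;     /* LIFO stack shared by all writers */
};

void mailbox_init(struct mailbox *mb) { atomic_init(&mb->head, NULL); }

/* Writer side: push with a CAS loop; safe for many concurrent writers. */
void mailbox_push(struct mailbox *mb, struct message *msg)
{
    struct message *old = atomic_load_explicit(&mb->head, memory_order_relaxed);
    do {
        msg->next = old;                /* old is refreshed on CAS failure */
    } while (!atomic_compare_exchange_weak_explicit(
                 &mb->head, &old, msg,
                 memory_order_release, memory_order_relaxed));
}

/* Reader side (single reader only): take every pending message at once,
   then reverse the list so messages come out in arrival order. */
struct message *mailbox_take_all(struct mailbox *mb)
{
    struct message *stack =
        atomic_exchange_explicit(&mb->head, NULL, memory_order_acquire);
    struct message *fifo = NULL;
    while (stack) {
        struct message *next = stack->next;
        stack->next = fifo;
        fifo = stack;
        stack = next;
    }
    return fifo;
}
```

The reader never pops individual nodes with compare-and-swap. As a result, the ABA problem that complicates general lock-free stacks does not arise here.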

When I read Stroustrup: Why the 35-year-old C++ still dominates ‘real’ dev, I started to post a comment asking why there were no questions about functional programming languages. But the interview is a “puff” piece and not a serious commentary on programming.

Then I ran across this work on implementing actors in C++. Maybe Stroustrup was correct without being aware of it.

Bundled with the C++ library libcppa, available at: http://www.libcppa.org

### OpenGM

Friday, August 1st, 2014

OpenGM

From the webpage:

OpenGM is a C++ template library for discrete factor graph models and distributive operations on these models. It includes state-of-the-art optimization and inference algorithms beyond message passing. OpenGM handles large models efficiently, since (i) functions that occur repeatedly need to be stored only once and (ii) when functions require different parametric or non-parametric encodings, multiple encodings can be used alongside each other, in the same model, using included and custom C++ code. No restrictions are imposed on the factor graph or the operations of the model. OpenGM is modular and extendible. Elementary data types can be chosen to maximize efficiency. The graphical model data structure, inference algorithms and different encodings of functions inter-operate through well-defined interfaces. The binary OpenGM file format is based on the HDF5 standard and incorporates user extensions automatically.
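The first storage point in the quote is that repeated functions are stored only once. The minimal C sketch below illustrates it with a pairwise discrete factor graph. Each factor refers to a shared function table by index, so a smoothness term used on every edge exists in memory once. The sketch only evaluates the energy of a labeling. It is not OpenGM's API, which is a C++ template library, and the types and names are hypothetical.

```c
#include <stddef.h>

/* A pairwise function over two variables with k labels each,
   stored as a k*k table (row = label of var_a, column = label of var_b). */
struct pairwise_fn { size_t k; const double *table; };

/* A factor names its two variables and the index of a shared function,
   so identical functions are not duplicated per factor. */
struct factor { size_t var_a, var_b, fn; };

/* Energy of a labeling: the sum of every factor's table entry. */
double energy(const struct pairwise_fn *fns, const struct factor *factors,
              size_t n_factors, const size_t *labels)
{
    double sum = 0.0;
    for (size_t f = 0; f < n_factors; f++) {
        const struct pairwise_fn *g = &fns[factors[f].fn];
        size_t a = labels[factors[f].var_a];
        size_t b = labels[factors[f].var_b];
        sum += g->table[a * g->k + b];
    }
    return sum;
}
```

For a chain of four binary variables sharing one Potts table `{0, 1, 1, 0}`, the labeling `0 0 1 1` costs 1 and `0 1 0 1` costs 3. Inference algorithms search for the labeling that minimizes this sum.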

Documentation lists algorithms with references.

I first saw this in a post by Danny Bickson, OpenGM graphical models toolkit.

### Learning Lisp With C

Wednesday, April 9th, 2014

Build Your Own Lisp by Daniel Holden.

From the webpage:

If you’re looking to learn C, or you’ve ever wondered how to build your own programming language, this is the book for you.

In just a few lines of code, I’ll teach you how to effectively use C, and what it takes to start building your own language.

Along the way we’ll learn about the weird and wonderful nature of Lisps, and what really makes a programming language. By building a real world C program we’ll learn implicit things that conventional books cannot teach. How to develop a project, how to make life easy for your users, and how to write beautiful code.

This book is free to read online. Get started now!
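For a taste of the kind of program the book builds, here is a tiny C evaluator for parenthesized prefix arithmetic such as `(+ 1 (* 2 3))`. It is not code from the book. It handles only integers and the operators `+`, `-` and `*`, and it assumes well-formed input. `eval_string` is a hypothetical name.

```c
#include <ctype.h>
#include <stdlib.h>

static void skip_space(const char **p)
{
    while (isspace((unsigned char)**p))
        (*p)++;
}

/* Evaluate an integer literal or "(op e1 e2 ...)", folding left to right. */
static long eval_expr(const char **p)
{
    skip_space(p);
    if (**p == '(') {
        (*p)++;
        skip_space(p);
        char op = **p;
        if (op != '\0')
            (*p)++;
        long acc = eval_expr(p);              /* first operand */
        skip_space(p);
        while (**p != ')' && **p != '\0') {
            long rhs = eval_expr(p);
            if (op == '+') acc += rhs;
            else if (op == '-') acc -= rhs;
            else if (op == '*') acc *= rhs;   /* other operators are ignored */
            skip_space(p);
        }
        if (**p == ')')
            (*p)++;
        return acc;
    }
    char *end;
    long value = strtol(*p, &end, 10);
    if (end == *p && **p != '\0')
        end++;                                /* skip a stray character rather than loop */
    *p = end;
    return value;
}

long eval_string(const char *src)
{
    return eval_expr(&src);
}
```

Recursion mirrors the nesting of the parentheses. That structure is the seed of the `eval` function every Lisp grows.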

This looks interesting and useful.

Enjoy!

### Screaming fast Lucene searches using C++ via JNI

Wednesday, June 19th, 2013

Screaming fast Lucene searches using C++ via JNI by Michael McCandless.

From the post:

At the end of the day, when Lucene executes a query, after the initial setup the true hot-spot is usually rather basic code that decodes sequential blocks of integer docIDs, term frequencies and positions, matches them (e.g. taking union or intersection for BooleanQuery), computes a score for each hit and finally saves the hit if it’s competitive, during collection.

Even apparently complex queries like FuzzyQuery or WildcardQuery go through a rewrite process that reduces them to much simpler forms like BooleanQuery.

Lucene’s hot-spots are so simple that optimizing them by porting them to native C++ (via JNI) was too tempting!

So I did just that, creating the lucene-c-boost github project, and the resulting speedups are exciting:

(…)

Speedups range from 0.7 X to 7.8 X.
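The hot loop described in the quote has two parts: decode a block of docIDs, then intersect two postings lists for a conjunctive BooleanQuery. It can be sketched in plain C. This is not code from lucene-c-boost, and Lucene's real postings use packed block encodings and skip data. The sketch assumes a plain array of gaps (differences between consecutive sorted docIDs) and a simple merge-style intersection, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn gaps between consecutive sorted docIDs back into docIDs.
   Assumption: the first gap is the first docID itself. */
void decode_gaps(const uint32_t *gaps, size_t n, uint32_t *doc_ids)
{
    uint32_t doc = 0;
    for (size_t i = 0; i < n; i++) {
        doc += gaps[i];
        doc_ids[i] = doc;
    }
}

/* Intersect two sorted docID lists (both terms must match).
   Writes the matches to out and returns how many there are. */
size_t intersect(const uint32_t *a, size_t na,
                 const uint32_t *b, size_t nb, uint32_t *out)
{
    size_t i = 0, j = 0, k = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j])
            i++;
        else if (a[i] > b[j])
            j++;
        else {
            out[k++] = a[i];
            i++;
            j++;
        }
    }
    return k;
}
```

Both loops are branchy and simple, which is exactly why they are tempting targets for native code.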

Read Michael’s post for explanations, warnings, caveats, etc.

But it is exciting news!

### Introduction to C and C++

Monday, March 25th, 2013

Introduction to C and C++

Description:

This course provides a fast-paced introduction to the C and C++ programming languages. You will learn the required background knowledge, including memory management, pointers, preprocessor macros, object-oriented programming, and how to find bugs when you inevitably use any of those incorrectly. There will be daily assignments and a small-scale individual project.

This course is offered during the Independent Activities Period (IAP), which is a special 4-week term at MIT that runs from the first week of January until the end of the month.

Just in case you want a deeper understanding of bugs that enable hacking or how to avoid creating such bugs in the first place.
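A classic example of such a bug is copying untrusted input into a fixed-size buffer with an unchecked `strcpy`, which can write past the end of the buffer. Below is a minimal sketch of a bounded alternative built on standard `snprintf`. That function never writes more than the given size and returns the length it wanted to write. `copy_bounded` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Copy src into dst (capacity dst_size bytes) without overflowing it.
   Returns false if src had to be truncated (or dst_size is 0). */
bool copy_bounded(char *dst, size_t dst_size, const char *src)
{
    int n = snprintf(dst, dst_size, "%s", src);
    return n >= 0 && (size_t)n < dst_size;
}
```

With an 8-byte buffer, `"hello"` fits and the function returns true. `"overlong input"` is cut to `"overlon"` plus the terminating NUL, and the false return tells the caller the input did not fit.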