## Archive for the ‘Computer Science’ Category

### Notes to (NUS) Computer Science Freshmen…

Monday, March 13th, 2017

Notes to (NUS) Computer Science Freshmen, From The Future

From the intro:

Early into the AY12/13 academic year, Prof Tay Yong Chiang organized a supper for Computer Science freshmen at Tembusu College. The bunch of seniors who were gathered there put together a document for NUS computing freshmen. This is that document.

Feel free to create a pull request to edit or add to it, and share it with other freshmen you know.

The Art of Computer Programming (a review of everything in Computer Science; pretty much nobody, save Knuth, has finished reading this)

When you think about the amount of time Knuth has spent researching, writing and editing The Art of Computer Programming (TAOCP), it doesn’t sound unreasonable to expect others, a significant number of others, to have read it.

Any online reading groups focused on TAOCP?

### New Spaceship Speed in Conway’s Game of Life

Saturday, January 14th, 2017

From the post:

In this article, I assume that you have basic familiarity with Conway’s Game of Life. If this is not the case, you can try reading an explanatory article but you will still struggle to understand much of the following content.

The day before yesterday ConwayLife.com forums saw a new member named zdr. When we the lifenthusiasts meet a newcomer, we expect to see things like “brand new” 30-cell 700-gen methuselah and then have to explain why it is not notable. However, what zdr showed us made our jaws drop.

It was a 28-cell c/10 orthogonal spaceship:

… (emphasis in the original)

The mentioned introduction isn’t sufficient to digest the material in this post.

There is a wealth of material available on cellular automata (the Game of Life is one).

LifeWiki is one resource and Complex Cellular Automata is another. While these hardly exhaust all there is to know about cellular automata, gaining familiarity with them will take some time and skill.
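If you lack that basic familiarity, the rules themselves fit in a few lines. Here is a minimal sketch in Python (the sparse set-of-live-cells representation is my own choice, not anything from the post):

```python
from collections import Counter

def step(live):
    """Advance one generation of Conway's Game of Life.
    `live` is the set of (x, y) coordinates of live cells."""
    # Count how many live neighbors each cell has.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # Birth on exactly 3 live neighbors; survival on 2 or 3.
    return {c for c, n in counts.items()
            if n == 3 or (n == 2 and c in live)}

# The glider, the original c/4 diagonal spaceship.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
cells = glider
for _ in range(4):
    cells = step(cells)
# After 4 generations the glider reappears, shifted one cell diagonally.
print(cells == {(x + 1, y + 1) for (x, y) in glider})  # True
```

A c/10 spaceship like zdr's does the same trick as the glider, except that it takes 10 generations to return to its shape, shifted one cell orthogonally rather than diagonally.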

Still, I offer this as encouragement that fundamental discoveries remain to be made.

But if and only if you reject conventional wisdom that prevents you from looking.

### D-Wave Just Open-Sourced Quantum Computing [DC Beltway Parking Lot Distraction]

Friday, January 13th, 2017

D-Wave Just Open-Sourced Quantum Computing by Dom Galeon.

D-Wave has just released a welcome distraction for CS types sitting in the DC Beltway Parking Lot on January 20-21, 2017. (I'm assuming you brought extra batteries for your laptop. After you run out of gas, it will be running on battery power alone.)

Just remember to grab a copy of Qbsolv before you leave for the tailgate/parking lot party on the Beltway.

A software tool known as Qbsolv allows developers to program D-Wave’s quantum computers even without knowledge of quantum computing. It has already made it possible for D-Wave to work with a bunch of partners, but the company wants more. “D-Wave is driving the hardware forward,” Bo Ewald, president of D-Wave International, told Wired. “But we need more smart people thinking about applications, and another set thinking about software tools.”

To that end, D-Wave has open-sourced Qbsolv, making it possible for anyone to freely share and modify the software. D-Wave hopes to build an open source community of sorts for quantum computing. Of course, to actually run this software, you’d need access to a piece of hardware that uses quantum particles, like one of D-Wave’s quantum computers. However, for the many who don’t have that access, the company is making it possible to download a D-Wave simulator that can be used to test Qbsolv on other types of computers.

This open-source Qbsolv joins an already-existing free software tool called Qmasm, which was developed by one of Qbsolv’s first users, Scott Pakin of Los Alamos National Laboratory. “Not everyone in the computer science community realizes the potential impact of quantum computing,” said mathematician Fred Glover, who’s been working with Qbsolv. “Qbsolv offers a tool that can make this impact graphically visible, by getting researchers and practitioners involved in charting the future directions of quantum computing developments.”

D-Wave’s machines might still be limited to solving optimization problems, but it’s a good place to start with quantum computers. Together with D-Wave, IBM has managed to develop its own working quantum computer in 2000, while Google teamed up with NASA to make their own. Eventually, we’ll have a quantum computer that’s capable of performing all kinds of advanced computing problems, and now you can help make that happen.

From the github page:

qbsolv is a metaheuristic or partitioning solver that solves a potentially large quadratic unconstrained binary optimization (QUBO) problem by splitting it into pieces that are solved either on a D-Wave system or via a classical tabu solver.
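To make "QUBO" concrete: the task is to minimize x′Qx over vectors x of 0s and 1s. A brute-force sketch in Python (the 3-variable Q matrix is my own toy example, not taken from qbsolv's documentation):

```python
from itertools import product

def solve_qubo(Q):
    """Brute-force minimize x' Q x over binary vectors x.
    Q is an n-by-n matrix as nested lists; feasible only for
    small n, since the search space doubles per variable."""
    n = len(Q)
    best_x, best_e = None, float("inf")
    for x in product((0, 1), repeat=n):
        e = sum(Q[i][j] * x[i] * x[j]
                for i in range(n) for j in range(n))
        if e < best_e:
            best_x, best_e = x, e
    return best_x, best_e

# Toy instance: diagonal terms reward selecting a variable,
# off-diagonal terms penalize selecting adjacent pairs.
Q = [[-1,  2,  0],
     [ 0, -1,  2],
     [ 0,  0, -1]]
print(solve_qubo(Q))  # ((1, 0, 1), -2)
```

Brute force doubles in cost with every added variable, which is exactly why qbsolv partitions large problems into sub-QUBOs that a D-Wave system or a classical tabu solver can handle.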

The phrase, “…might still be limited to solving optimization problems…” isn’t as limiting as it might appear.

A recent (2014) survey of quadratic unconstrained binary optimization (QUBO), The Unconstrained Binary Quadratic Programming Problem: A Survey runs some thirty-three pages and should keep you occupied however long you sit on the DC Beltway.

From page 10 of the survey:

Kochenberger, Glover, Alidaee, and Wang (2005) examine the use of UBQP as a tool for clustering microarray data into groups with high degrees of similarity.

Where I read one person’s “similarity” to be another person’s test of “subject identity.”

PS: Enjoy the DC Beltway. You may never see it motionless ever again.

### OpenTOC (ACM SIG Proceedings – Free)

Sunday, January 1st, 2017

OpenTOC

From the webpage:

ACM OpenTOC is a unique service that enables Special Interest Groups to generate and post Tables of Contents for proceedings of their conferences enabling visitors to download the definitive version of the contents from the ACM Digital Library at no charge.

Downloads of these articles are captured in official ACM statistics, improving the accuracy of usage and impact measurements. Consistently linking to definitive versions of ACM articles should reduce user confusion over article versioning.

Conferences are listed by year, 2014–2016, and by event.

A step in the right direction.

This service didn't exist the last time I had a digital library subscription. Contacting the secret ACM committee that decides on web features was verboten.

Enjoy this improvement in access while waiting for ACM access bottlenecks to wither and die.

### Continuous Unix commit history from 1970 until today

Thursday, December 29th, 2016

Continuous Unix commit history from 1970 until today

From the webpage:

The history and evolution of the Unix operating system is made available as a revision management repository, covering the period from its inception in 1970 as a 2.5 thousand line kernel and 26 commands, to 2016 as a widely-used 27 million line system. The 1.1GB repository contains about half a million commits and more than two thousand merges. The repository employs Git system for its storage and is hosted on GitHub. It has been created by synthesizing with custom software 24 snapshots of systems developed at Bell Labs, the University of California at Berkeley, and the 386BSD team, two legacy repositories, and the modern repository of the open source FreeBSD system. In total, about one thousand individual contributors are identified, the early ones through primary research. The data set can be used for empirical research in software engineering, information systems, and software archaeology.

You can read more details about the contents, creation, and uses of this repository through this link.

Two repositories are associated with the project:

• unix-history-repo is a repository representing a reconstructed version of the Unix history, based on the currently available data. This repository will be often automatically regenerated from scratch, so this is not a place to make contributions. To ensure replicability its users are encouraged to fork it or archive it.
• unix-history-make is a repository containing code and metadata used to build the above repository. Contributions to this repository are welcomed.

Not everyone will find this exciting but this rocks as a resource for:

empirical research in software engineering, information systems, and software archaeology

Need to think seriously about putting this on a low-end laptop and sealing it up in a Faraday cage.

Just in case. 😉

### Low fat computing

Thursday, December 22nd, 2016

Malcolm Sparks offers a summary of the presentation by Karsten Schmidt, along with the presentation itself.

Lots of strange and 3-D printable eye candy for the first 15 minutes or so with Schmidt’s background. Starts to really rock around 20 minutes in with Forth code and very low level coding.

To get a better idea of what Schmidt has been doing, see his website, thi.ng, his Forth REPL in JavaScript at http://forth.thi.ng/, or his GitHub repository: Github: thi.ng.

Stop by at http://toxiclibs.org/ although the material there looks dated.

### Operating Systems Design and Implementation (12th USENIX Symposium)

Thursday, November 17th, 2016

Operating Systems Design and Implementation (12th USENIX Symposium) – Savannah, GA, USA, November 2-4, 2016.

Message from the OSDI ’16 Program Co-Chairs:

We are delighted to welcome you to the 12th USENIX Symposium on Operating Systems Design and Implementation, held in Savannah, GA, USA! This year's program includes a record high 47 papers that represent the strength of our community and cover a wide range of topics, including security, cloud computing, transaction support, storage, networking, formal verification of systems, graph processing, system support for machine learning, programming languages, troubleshooting, and operating systems design and implementation.

Weighing in at seven hundred and ninety-seven (797) pages, this tome will prove more than sufficient to avoid annual family arguments during the holiday season.

Not to mention this is an opportunity to hone your skills to a fine edge.

### Understanding the fundamentals of attacks (Theory of Exploitation)

Thursday, November 3rd, 2016

Understanding the fundamentals of attacks – What is happening when someone writes an exploit? by Halvar Flake / Thomas Dullien.

The common "bag of tricks" for hacking, as Halvar refers to it, does cover all the major data breaches of the last 24 months.

No zero-day exploits.

Certainly none of the deep analysis offered by Halvar here.

Still, you owe it to yourself and your future on one side or the other of computer security, to review these slides and references carefully.

Even though Halvar concludes (in part)

Exploitation is programming emergent weird machines.

It does not require EIP/RIP, and is not a bag of tricks.

Theory of exploitation is still in embryonic stage.

Imagine the advantages of having mastered the art of exploitation theory at its inception.

In an increasingly digital world, you may be worth your own weight in gold. 😉

PS: Specifying the subject identity properties of exploits will assist in organizing them for future use/defense.

One expert hacker is like a highly skilled warrior.

Making exploits easy to discover/use by average hackers is like a skilled warrior facing a company of average fighters.

The outcome will be bloody, but never in doubt.

### The Hanselminutes Podcast

Friday, August 26th, 2016

I went looking for Felienne’s podcast on code smells and discovered along with it, The Hanselminutes Podcast: Fresh Air for Developers!

Felienne’s podcast is #542 so there is a lot of content to enjoy! (I checked the archive. Yes, there really are 542 episodes as of today.)

### Exploring Code Smells in code written by Children

Friday, August 26th, 2016

From the description:

Felienne is always learning. In exploring her PhD dissertation and her public speaking experience it’s clear that she has no intent on stopping! Most recently she’s been exploring a large corpus of Scratch programs looking for Code Smells. How do children learn how to code, and when they do, does their code “smell?” Is there something we can do when teaching to promote cleaner, more maintainable code?

Felienne discusses a paper due to appear in September on analysis of 250K Scratch programs for code smells.
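To make "code smell" concrete: a smell is a mechanical, countable property of code that correlates with poor maintainability. A toy detector in Python (the two smells and the threshold are my own illustration, not taken from Felienne's paper):

```python
def find_smells(scripts, long_threshold=12):
    """Flag two classic smells in a toy program representation.
    `scripts` maps script names to lists of statements."""
    smells = []
    # Smell 1: Long Method -- too many statements in one script.
    for name, stmts in scripts.items():
        if len(stmts) > long_threshold:
            smells.append(("long script", name))
    # Smell 2: Duplicate Code -- identical statement lists in two scripts.
    seen = {}
    for name, stmts in scripts.items():
        key = tuple(stmts)
        if key in seen:
            smells.append(("duplicate code", (seen[key], name)))
        else:
            seen[key] = name
    return smells

scripts = {
    "when_flag_clicked": ["move 10"] * 20,           # too long
    "when_key_a": ["say hi", "wait 1", "say bye"],
    "when_key_b": ["say hi", "wait 1", "say bye"],   # copy-pasted
}
print(find_smells(scripts))
# [('long script', 'when_flag_clicked'), ('duplicate code', ('when_key_a', 'when_key_b'))]
```

Real detectors refine the thresholds and add many more smell definitions, but the flavor is the same.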

Thoughts on teaching programmers to detect bug smells?

### If You Believe In OpenAccess, Do You Practice OpenAccess?

Wednesday, June 15th, 2016

CSC-OpenAccess LIBRARY

From the webpage:

CSC Open-Access Library aim to maintain and develop access to journal publication collections as a research resource for students, teaching staff, researchers and industrialists.

You can see a complete listing of the journals here.

Before you protest these are not Science or Nature, remember that Science and Nature did not always have the reputations they do today.

Let the quality of your work bolster the reputations of open access publications and attract others to them.

### How to Run a Russian Hacking Ring [Just like Amway, Mary Kay … + Career Advice]

Sunday, June 12th, 2016

From the post:

A man with intense eyes crouches over a laptop in a darkened room, his face and hands hidden by a black ski mask and gloves. The scene is lit only by the computer screen’s eerie glow.

Exaggerated portraits of malicious hackers just like this keep popping up in movies and TV, despite the best efforts of shows like Mr. Robot to depict hackers in a more realistic way. Add a cacophony of news about data breaches that have shaken the U.S. government, taken entire hospital systems hostage, and defrauded the international banking system, and hackers start to sound like omnipotent super-villains.

But the reality is, as usual, less dramatic. While some of the largest cyberattacks have been the work of state-sponsored hackers—the OPM data breach that affected millions of Americans last year, for example, or the Sony hack that revealed Hollywood’s intimate secrets​—the vast majority of the world’s quotidian digital malice comes from garden-variety hackers.

What a downer this would be at career day at the local high school.

Yes, you too can be a hacker but it’s as dull as anything you have seen in Dilbert.

Your location plays an important role in whether Russian hacking ring employment is in your future. Kaveh reports:

Even the boss’s affiliates, who get less than half of each ransom that they extract, make a decent wage. They earned an average of 600 dollars a month, or about 40 percent more than the average Russian worker.

$600/month is ok if you are living in Russia, not so hot if you aspire to Venice Beach. (It's too bad the beach cam doesn't pan and zoom.)

The level of technical skill required for low-hanging-fruit hacking is falling, meaning more competitors at the low end. Potential profits are going to fall even further.

The no-liability rule for buggy software will fall sooner rather than later, and skilled hackers (I mean security researchers) will find themselves in demand by both plaintiffs and defendants. You will earn more money if you can appear in court; some expert witnesses make $600/hour or more. (Compare that to $600/month in Russia.) Even if you can't appear in court, for reasons that seem good to you, fleshing out the details of hacks is going to be in demand from all sides.

You may start at the shallow end of the pool, but resolve not to stay there. Read deeply, practice every day, stay current on new developments and opportunities, and contribute to online communities.

### "This guy's arrogance takes your breath away"

Tuesday, May 31st, 2016

From the post:

Item No. 155: Correspondence with Edsger Dijkstra. 1979

At the time of this correspondence, Backus had just won the 1977 Turing Award and had chosen to talk about his then-current research on functional programming (FP) for his award lecture in Seattle. See this pdf of the published version, noting that Backus himself described "significant differences" with the talk that was actually given. Indeed, the transcript at the LoC was much more casual and easier to follow.

Dijkstra, in his characteristically acerbic and hyperbolic style, wrote a scathing public review (EWD 692) and some private critical remarks in what looks like a series of letters with Backus. From what I can tell, these letters are not part of the E. W. Dijkstra archives at UT Austin, nor are they available online anywhere else. So here they are for posterity.
You won't find long-form exchanges such as these in present-day, near-instant bait-reply cycles of email messages. That's unfortunate.

Chen has created a Github repository if you are interested in transcribing pre-email documents.

You can help create better access to the history of computer science and see how to craft a cutting remark, as opposed to blurting out the first insult that comes to mind.

Enjoy!

### Tip #20: Play with Racket [Computer Science for Everyone?]

Saturday, January 30th, 2016

From the post:

Racket is a programming language in the Lisp tradition that is different from other programming languages in a few important ways. It can be any language you want – because Racket is heavily used for pedagogy, it has evolved into a suite of languages and tools that you can use to explore as many different programming paradigms as you can think of. You can also download it and play with it right now, without installing anything else, or knowing anything at all about computers or programming.

Watching Matthias Felleisen's "big-bang: the world, universe, and network in the programming language" talk will give you an idea of how Racket can be used to help people learn how to think about mathematics, computation, and more. Try it out even if you "hate Lisp" or "don't know how to program" – it's really a lot of fun.

Aaron and Michael scooped President Obama's computer science skills for everyone by a day:

President Barack Obama said Saturday he will ask Congress for billions of dollars to help students learn computer science skills and prepare for jobs in a changing economy.

"In the new economy, computer science isn't an optional skill. It's a basic skill, right along with the three R's," Obama said in his weekly radio and Internet address…. (Obama Wants $4B to Help Students Learn Computer Science)

"Computer science for everyone" is a popular chant, but consider the Insecure Internet of Things (IIoT).

Will minimal computer science skills increase or decrease the level of security for the IIoT?

That’s what I think too.

Removal of IoT components is the only real defense. Expect a vibrant cottage industry to grow up around removing IoT components.

### Everything You Know About Latency Is Wrong

Thursday, December 24th, 2015

From the post:

Okay, maybe not everything you know about latency is wrong. But now that I have your attention, we can talk about why the tools and methodologies you use to measure and reason about latency are likely horribly flawed. In fact, they’re not just flawed, they’re probably lying to your face.

When I went to Strange Loop in September, I attended a workshop called “Understanding Latency and Application Responsiveness” by Gil Tene. Gil is the CTO of Azul Systems, which is most renowned for its C4 pauseless garbage collector and associated Zing Java runtime. While the workshop was four and a half hours long, Gil also gave a 40-minute talk called “How NOT to Measure Latency” which was basically an abbreviated, less interactive version of the workshop. If you ever get the opportunity to see Gil speak or attend his workshop, I recommend you do. At the very least, do yourself a favor and watch one of his recorded talks or find his slide decks online.

The remainder of this post is primarily a summarization of that talk. You may not get anything out of it that you wouldn’t get out of the talk, but I think it can be helpful to absorb some of these ideas in written form. Plus, for my own benefit, writing about them helps solidify it in my head.

Great post, not only for the discussion of latency but for two extensions to the admonition from The Moon is a Harsh Mistress, "Always cut cards":

• Always understand the nature of your data.
• Always understand the nature of your methodology.

If you fail at either of those, the results presented to you, or that you present to others, may be true, false or irrelevant, and you will have no way of knowing which.
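Gil's core point about averages is easy to demonstrate. A sketch in Python (the numbers are invented for illustration):

```python
# 10,000 requests: 99% take 10 ms, 1% stall for 2 seconds (say, a GC pause).
latencies_ms = [10] * 9900 + [2000] * 100

mean = sum(latencies_ms) / len(latencies_ms)
print(mean)  # 29.9 -- the "average" looks perfectly healthy

# But a session that touches 100 requests almost certainly hits a stall:
p_hit_stall = 1 - 0.99 ** 100
print(round(p_hit_stall, 2))  # 0.63
```

And that is before coordinated omission, Gil's term for load generators that quietly stop sampling while the system is stalled, which makes the recorded numbers rosier still.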

Treat’s post is just one example in a vast sea of data and methodologies which are just as misleading if not more so.

If you need motivation to put in the work, how’s your comfort level with being embarrassed in public? Like someone demonstrating your numbers are BS.

### Readings in Database Systems, 5th Edition (Kindle Stuffer)

Tuesday, December 15th, 2015

Readings in Database Systems, 5th Edition, Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker, editors.

From the webpage:

1. Preface [HTML] [PDF]
2. Background introduced by Michael Stonebraker [HTML] [PDF]
3. Traditional RDBMS Systems introduced by Michael Stonebraker [HTML] [PDF]
4. Techniques Everyone Should Know introduced by Peter Bailis [HTML] [PDF]
5. New DBMS Architectures introduced by Michael Stonebraker [HTML] [PDF]
6. Large-Scale Dataflow Engines introduced by Peter Bailis [HTML] [PDF]
7. Weak Isolation and Distribution introduced by Peter Bailis [HTML] [PDF]
8. Query Optimization introduced by Joe Hellerstein [HTML] [PDF]
9. Interactive Analytics introduced by Joe Hellerstein [HTML] [PDF]
10. Languages introduced by Joe Hellerstein [HTML] [PDF]
11. Web Data introduced by Peter Bailis [HTML] [PDF]
12. A Biased Take on a Moving Target: Complex Analytics
by Michael Stonebraker [HTML] [PDF]
13. A Biased Take on a Moving Target: Data Integration
by Michael Stonebraker [HTML] [PDF]

Complete Book: [HTML] [PDF]

Previous Editions: [HTML]

Citations to the "readings" do not present themselves as hyperlinks, but they are.

If you are giving someone a Kindle this Christmas, consider pre-loading Readings in Database Systems, along with the readings themselves, as a Kindle stuffer.

### The Moral Failure of Computer Scientists [Warning: Scam Alert!]

Sunday, December 13th, 2015

From the post:

Computer scientists and cryptographers occupy some of the ivory tower’s highest floors. Among academics, their work is prestigious and celebrated. To the average observer, much of it is too technical to comprehend. The field’s problems can sometimes seem remote from reality.

But computer science has quite a bit to do with reality. Its practitioners devise the surveillance systems that watch over nearly every space, public or otherwise—and they design the tools that allow for privacy in the digital realm. Computer science is political, by its very nature.

That’s at least according to Phillip Rogaway, a professor of computer science at the University of California, Davis, who has helped create some of the most important tools that secure the Internet today. Last week, Rogaway took his case directly to a roomful of cryptographers at a conference in Auckland, New Zealand. He accused them of a moral failure: By allowing the government to construct a massive surveillance apparatus, the field had abused the public trust. Rogaway said the scientists had a duty to pursue social good in their work.

He likened the danger posed by modern governments’ growing surveillance capabilities to the threat of nuclear warfare in the 1950s, and called upon scientists to step up and speak out today, as they did then.

I spoke to Rogaway about why cryptographers fail to see their work in moral terms, and the emerging link between encryption and terrorism in the national conversation. A transcript of our conversation appears below, lightly edited for concision and clarity.

I don’t disagree with Rogaway that all science and technology is political. I might use the term social instead but I agree, there are no neutral choices.

Having said that, I do disagree that Rogaway has the standing to pre-package a political stance colored as “morals” and denounce others as “immoral” if they disagree.

It is one of the oldest tricks in rhetoric but quite often effective, which is why people keep using it.

If Rogaway is correct that CS and technology are political, then his stance for a particular take on government, surveillance and cryptography is equally political.

Not that I disagree with his stance, but I don't consider it to be a moral choice.

Anything you can do to impede, disrupt or interfere with any government surveillance is fine by me. I won’t complain. But that’s because government surveillance, the high-tech kind, is a waste of time and effort.

Rogaway uses scientists who spoke out in the 1950’s about the threat of nuclear warfare as an example. Some example.

The Federation of American Scientists estimates that as of September 2015, there are approximately 15,800 nuclear weapons in the world.

Hmmm, doesn’t sound like their moral outrage was very effective does it?

There will be sessions, presentations, conferences, along with comped travel and lodging, publications for tenure, etc., but the sum of the discussion of morality in computer science will be largely the same.

The reason for the sameness of result is that discussions, papers, resolutions and the rest aren't nearly as important as the ethical/moral choices you make in the day-to-day practice as a computer scientist.

Choices in the practice of computer science make a difference, discussions of fictional choices don’t. It’s really that simple.*

*That’s not entirely fair. The industry of discussing moral choices without making any of them is quite lucrative and it depletes the bank accounts of those snared by it. So in that sense it does make a difference.

### Order of Requirements Matter

Tuesday, December 8th, 2015

Sam Lightstone posted to Twitter a great illustration of why the order of requirements can matter:

Visualizations rarely get much clearer.

You could argue that Minard’s map of Napoleon’s invasion of Russia is equally clear:

But Minard drew with the benefit of hindsight, not foresight.

The Laws of Robotics, on the other hand, have predictive value for the different orders of requirements.

I don’t know how many requirements Honeywell had for the Midas and Midas Black Gas Detectors but you can bet IP security was near the end of the list, if explicit at all.

IP security should be #1 with a bullet, especially for devices that detect Ammonia (caustic, hazardous), Arsine (highly toxic, flammable), Chlorine (extremely dangerous, poisonous for all living organisms), Hydrogen cyanide, and Hydrogen fluoride ("Hydrogen fluoride is a highly dangerous gas, forming corrosive and penetrating hydrofluoric acid upon contact with living tissue. The gas can also cause blindness by rapid destruction of the corneas.")

When IP security is not the first requirement, it’s not hard to foresee the outcome, an Insecure Internet of Things.

Is that what we want?

### arXiv Sanity Preserver

Sunday, November 29th, 2015

arXiv Sanity Preserver by Andrej Karpathy.

From the webpage:

There are way too many arxiv papers, so I wrote a quick webapp that lets you search and sort through the mess in a pretty interface, similar to my pretty conference format.

It’s super hacky and was written in 4 hours. I’ll keep polishing it a bit over time perhaps but it serves its purpose for me already. The code uses Arxiv API to download the most recent papers (as many as you want – I used the last 1100 papers over last 3 months), and then downloads all papers, extracts text, creates tfidf vectors for each paper, and lastly is a flask interface for searching through and filtering similar papers using the vectors.

Main functionality is a search feature, and most useful is that you can click “sort by tfidf similarity to this”, which returns all the most similar papers to that one in terms of tfidf bigrams. I find this quite useful.

You can see this rather remarkable tool online at: https://karpathy23-5000.terminal.com/

Beyond its obvious utility for researchers, this could be used as a framework for experimenting with other similarity measures.
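The tf-idf-over-bigrams pipeline in the quote is small enough to sketch without any libraries. In Python (my own simplification; Karpathy's webapp handles full paper texts and serves results through flask):

```python
import math
import re
from collections import Counter

def bigrams(text):
    """Term frequencies of word bigrams in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

def tfidf_vectors(docs):
    """tf-idf vectors: term frequency damped by how many docs share the bigram."""
    tfs = [bigrams(d) for d in docs]
    n = len(docs)
    df = Counter(g for tf in tfs for g in tf)  # document frequency per bigram
    return [{g: c * math.log(n / df[g]) for g, c in tf.items()} for tf in tfs]

def cosine(u, v):
    dot = sum(u[g] * v.get(g, 0.0) for g in u)
    norm = math.sqrt(sum(x * x for x in u.values())) * \
           math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

papers = [
    "deep convolutional networks for image classification",
    "convolutional networks for large scale image classification",
    "topic maps and subject identity in information systems",
]
vecs = tfidf_vectors(papers)
# The two vision papers should be closer to each other than to the third.
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Swapping out `bigrams` or `cosine` for another featurizer or metric is precisely the "experimenting with other similarity measures" part.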

Enjoy!

I first saw this in a tweet by Lynn Cherny.

### Best Paper Awards in Computer Science (2014)

Friday, November 27th, 2015

Best Paper Awards in Computer Science (2014)

From the webpage:

Jeff Huang’s list of the best paper awards from 29 CS conferences since 1996 up to and including 2014.

I saw a tweet about Jeff’s site being updated to include papers from 2014.

If you are looking for reading material in a particular field, this is a good place to start.

For a complete list of the organizations, conferences as expanded abbreviations: see: Best Paper Awards in Computer Science (2013). None of them have changed so I didn’t see the point of repeating them.

### LIQUi|> – A Quantum Computing Simulator

Friday, November 13th, 2015

From the post:

Next week, at the SuperComputing 2015 conference in Austin, Texas, Dave Wecker, a lead architect on the QuArC team, will discuss the recent public release on GitHub of a suite of tools that allows computer scientists to simulate a quantum computer’s capabilities. That’s a crucial step in building the tools needed to run actual quantum computers.

“This is the closest we can get to running a quantum computer without having one,” said Wecker, who has helped develop the software.

The software is called Language-Integrated Quantum Operations, or LIQUi|>. The funky characters at the end refer to how a quantum operation is written in mathematical terms.

The researchers are hoping that, using LIQUi|>, computer scientists at Microsoft and other academic and research institutions will be able to perfect the algorithms they need to efficiently use a quantum computer even as the computers themselves are simultaneously being developed.

“We can actually debut algorithms in advance of running them on the computer,” Svore said.

As of today, November 13, 2015, LIQUi|> has only one (1) hit at GitHub. I will try back next week to see what the numbers look like then.

You won’t have a quantum computer by the holidays but you may have created your first quantum algorithm by then.
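If you want a taste before downloading anything: a quantum state on n qubits is just a vector of 2^n complex amplitudes, and gates are matrices applied to it. A one-qubit sketch in Python (a generic state-vector simulation of my own, not LIQUi|> syntax, which is embedded in F#):

```python
import math

# One qubit starts in |0>: amplitude 1 for |0>, 0 for |1>.
state = [1 + 0j, 0 + 0j]

# The Hadamard gate puts |0> into an equal superposition of |0> and |1>.
H = [[1 / math.sqrt(2),  1 / math.sqrt(2)],
     [1 / math.sqrt(2), -1 / math.sqrt(2)]]

def apply(gate, state):
    """Matrix-vector multiply: the whole of gate application."""
    return [sum(gate[i][j] * state[j] for j in range(len(state)))
            for i in range(len(gate))]

state = apply(H, state)
# Measurement probabilities are the squared amplitude magnitudes: 50/50.
probs = [abs(a) ** 2 for a in state]
print([round(p, 3) for p in probs])  # [0.5, 0.5]
```

Simulators like LIQUi|> do the same linear algebra, heavily optimized, which is also why they hit a wall around a few dozen qubits: the state vector doubles with each qubit added.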

Enjoy!

### The Architecture of Open Source Applications

Thursday, November 12th, 2015

The Architecture of Open Source Applications

From the webpage:

Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

Our goal is to change that. In these two books, the authors of four dozen open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to these books provide unique insights into how they think.

If you are a junior developer, and want to learn how your more experienced colleagues think, these books are the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, these books can help you too.

Follow us on our blog at http://aosabook.org/blog/, or on Twitter at @aosabook and using the #aosa hashtag.

I happened upon these four books because of a tweet that mentioned: Early Access Release of Allison Kaptur’s “A Python Interpreter Written in Python” Chapter, which I found to be the tenth chapter of “500 Lines.”

OK, but what the hell is “500 Lines?” Poking around a bit I found The Architecture of Open Source Applications.

Which is the source for the material I quote above.

Do you learn from example?

Let me give you the flavor of three of the completed volumes and the “500 Lines” that is in progress:

The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks (vol. 1), from the introduction:

Carpentry is an exacting craft, and people can spend their entire lives learning how to do it well. But carpentry is not architecture: if we step back from pitch boards and miter joints, buildings as a whole must be designed, and doing that is as much an art as it is a craft or science.

Programming is also an exacting craft, and people can spend their entire lives learning how to do it well. But programming is not software architecture. Many programmers spend years thinking about (or wrestling with) larger design issues: Should this application be extensible? If so, should that be done by providing a scripting interface, through some sort of plugin mechanism, or in some other way entirely? What should be done by the client, what should be left to the server, and is “client-server” even a useful way to think about this application? These are not programming questions, any more than where to put the stairs is a question of carpentry.

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well. And more often than not, those are programs they wrote themselves. They never get to see the great programs of history, or read critiques of those programs’ designs written by experienced practitioners. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

This book is our attempt to change that. Each chapter describes the architecture of an open source application: how it is structured, how its parts interact, why it’s built that way, and what lessons have been learned that can be applied to other big design problems. The descriptions are written by the people who know the software best, people with years or decades of experience designing and re-designing complex applications. The applications themselves range in scale from simple drawing programs and web-based spreadsheets to compiler toolkits and multi-million line visualization packages. Some are only a few years old, while others are approaching their thirtieth anniversary. What they have in common is that their creators have thought long and hard about their design, and are willing to share those thoughts with you. We hope you enjoy what they have written.

The Architecture of Open Source Applications: Structure, Scale, and a Few More Fearless Hacks (vol. 2), from the introduction:

In the introduction to Volume 1 of this series, we wrote:

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well… As a result, they repeat one another’s mistakes rather than building on one another’s successes… This book is our attempt to change that.

In the year since that book appeared, over two dozen people have worked hard to create the sequel you have in your hands. They have done so because they believe, as we do, that software design can and should be taught by example—that the best way to learn how to think like an expert is to study how experts think. From web servers and compilers through health record management systems to the infrastructure that Mozilla uses to get Firefox out the door, there are lessons all around us. We hope that by collecting some of them together in this book, we can help you become a better developer.

The Performance of Open Source Applications, from the introduction:

It’s commonplace to say that computer hardware is now so fast that most developers don’t have to worry about performance. In fact, Douglas Crockford declined to write a chapter for this book for that reason:

If I were to write a chapter, it would be about anti-performance: most effort spent in pursuit of performance is wasted. I don’t think that is what you are looking for.

Donald Knuth made the same point thirty years ago:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

But between mobile devices with limited power and memory, and data analysis projects that need to process terabytes, a growing number of developers do need to make their code faster, their data structures smaller, and their response times shorter. However, while hundreds of textbooks explain the basics of operating systems, networks, computer graphics, and databases, few (if any) explain how to find and fix things in real applications that are simply too damn slow.

This collection of case studies is our attempt to fill that gap. Each chapter is written by real developers who have had to make an existing system faster or who had to design something to be fast in the first place. They cover many different kinds of software and performance goals; what they have in common is a detailed understanding of what actually happens when, and how the different parts of large applications fit together. Our hope is that this book will—like its predecessor The Architecture of Open Source Applications—help you become a better developer by letting you look over these experts’ shoulders.

500 Lines or Less From the GitHub page:

Every architect studies family homes, apartments, schools, and other common types of buildings during her training. Equally, every programmer ought to know how a compiler turns text into instructions, how a spreadsheet updates cells, and how a database efficiently persists data.

Previous books in the AOSA series have done this by describing the high-level architecture of several mature open-source projects. While the lessons learned from those stories are valuable, they are sometimes difficult to absorb for programmers who have not yet had to build anything at that scale.

“500 Lines or Less” focuses on the design decisions and tradeoffs that experienced programmers make when they are writing code:

• Why divide the application into these particular modules with these particular interfaces?
• Why use inheritance here and composition there?
• How do we predict where our program might need to be extended, and how can we make that easy for other programmers?

Each chapter consists of a walkthrough of a program that solves a canonical problem in software engineering in at most 500 source lines of code. We hope that the material in this book will help readers understand the varied approaches that engineers take when solving problems in different domains, and will serve as a basis for projects that extend or modify the contributions here.
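To give a taste of what a 500-line chapter distills: Kaptur's interpreter chapter centers on a bytecode loop that dispatches on opcodes and manipulates a value stack. A toy sketch of that idea (my own simplification, not the chapter's code; the opcode names are mine):

```python
# A toy stack-machine interpreter in the spirit of Allison Kaptur's
# "A Python Interpreter Written in Python" chapter.  The opcodes and
# structure are my own simplification, not the chapter's code.

def run(code):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for op, arg in code:
        if op == "LOAD_CONST":
            stack.append(arg)              # push a literal value
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)            # pop two operands, push the sum
        elif op == "PRINT":
            print(stack.pop())             # pop and display the top value
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack

# The expression 7 + 5 "compiled" to three instructions:
program = [("LOAD_CONST", 7), ("LOAD_CONST", 5), ("ADD", None)]
```

CPython's real interpreter works on the same principle, just with a few hundred opcodes instead of three.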

BTW, for markup folks, check out Parsing XML at the Speed of Light by Arseny Kapoulkine.

Many hours of reading and keyboard pleasure await anyone using these volumes.

### Visualizing What Your Computer (and Science) Ignore (mostly)

Thursday, November 12th, 2015

Abstract:

Structures and objects are often supposed to have idealized geometries such as straight lines or circles. Although not always visible to the naked eye, in reality, these objects deviate from their idealized models. Our goal is to reveal and visualize such subtle geometric deviations, which can contain useful, surprising information about our world. Our framework, termed Deviation Magnification, takes a still image as input, fits parametric models to objects of interest, computes the geometric deviations, and renders an output image in which the departures from ideal geometries are exaggerated. We demonstrate the correctness and usefulness of our method through quantitative evaluation on a synthetic dataset and by application to challenging natural images.

The video for the paper is quite compelling.

From the introduction to the paper:

Many phenomena are characterized by an idealized geometry. For example, in ideal conditions, a soap bubble will appear to be a perfect circle due to surface tension, buildings will be straight and planetary rings will form perfect elliptical orbits. In reality, however, such flawless behavior hardly exists, and even when invisible to the naked eye, objects depart from their idealized models. In the presence of gravity, the bubble may be slightly oval, the building may start to sag or tilt, and the rings may have slight perturbations due to interactions with nearby moons. We present Deviation Magnification, a tool to estimate and visualize such subtle geometric deviations, given only a single image as input. The output of our algorithm is a new image in which the deviations from ideal are magnified. Our algorithm can be used to reveal interesting and important information about the objects in the scene and their interaction with the environment. Figure 1 shows two independently processed images of the same house, in which our method automatically reveals the sagging of the house’s roof, by estimating its departure from a straight line.
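In one dimension the idea reduces to: fit the idealized model, compute the residuals, exaggerate them. A numpy sketch of that reduction (my simplification, not the paper's actual pipeline, which fits parametric models to features in images):

```python
import numpy as np

# A 1-D reduction of the Deviation Magnification idea -- my
# simplification, not the paper's pipeline: fit the idealized model,
# then re-render with the residuals from the fit exaggerated.

def magnify_deviations(x, y, factor=10.0):
    """Fit the ideal straight line y ~ a*x + b, then exaggerate each
    point's departure from it by `factor`."""
    a, b = np.polyfit(x, y, deg=1)        # the "idealized geometry"
    ideal = a * x + b
    residual = y - ideal                  # subtle, near-invisible deviations
    return ideal + factor * residual      # deviations made visible

x = np.linspace(0.0, 1.0, 5)
y = 2.0 * x + 0.5                         # a perfectly straight "roofline"
y[2] += 0.01                              # a sag too small to see
magnified = magnify_deviations(x, y, factor=50.0)  # the sag is now obvious
```

The paper's contribution is doing this robustly on natural images, where the model must be located and fit before any residual exists to magnify.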

Departures from “idealized geometry” make for captivating videos but there is a more subtle point that Deviation Magnification will help bring to the fore.

“Idealized geometry,” like discrete metrics for attitude measurement or metrics of meaning, is a myth. A useful myth: houses don’t (usually) fall down, marketing campaigns enjoy a high degree of success, and engineering relies successfully on approximations that depart from the “real world.”

Science and computers have a degree of precision that has no counterpart in the “real world.”

Watch the video again if you doubt that last statement.

Whether you are using science and/or a computer, always remember that your results are approximations based upon approximations.

I first saw this in Four Short Links: 12 November 2015 by Nat Torkington.

### Information Visualization MOOC 2015

Thursday, November 5th, 2015

Information Visualization MOOC 2015 by Katy Börner.

From the webpage:

This course provides an overview about the state of the art in information visualization. It teaches the process of producing effective visualizations that take the needs of users into account.

Among other topics, the course covers:

• Data analysis algorithms that enable extraction of patterns and trends in data
• Major temporal, geospatial, topical, and network visualization techniques
• Discussions of systems that drive research and development.

The MOOC ended in April of 2015 but you can still register for a self-paced version of the course.

A quick look at the 2013 client projects, or the current list of clients and projects with whom students can collaborate, will leave no doubt that this is a top-rank visualization course.

I first saw this in a tweet by Kirk Borne.

### #icanhazpdf

Monday, October 26th, 2015

From the post:

Most academic journals charge expensive subscriptions and, for those without a login, fees of $30 or more per article. Now academics are using the hashtag #icanhazpdf to freely share copyrighted papers. Scientists are tweeting a link to the paywalled article along with their email address in the hashtag—a riff on the infamous meme of a fluffy cat’s “I Can Has Cheezburger?” line. Someone else who does have access to the article downloads a pdf of the paper and emails the file to the person requesting it. The initial tweet is then deleted as soon as the requester receives the file.

3 rules to remember:

1. Paywall link + #icanhazpdf + your email.
2. Delete tweet when paper arrives.
3. Don’t ask/Don’t tell.

Enjoy!

### The Refreshingly Rewarding Realm of Research Papers

Wednesday, October 14th, 2015

From the description:

Sean Cribbs teaches us how to read and implement research papers – and translate what they describe into code. He covers examples of research implementations he’s been involved in and the relationships he’s built with researchers in the process.

A bit longer description at: http://chicago.citycode.io/sean-cribbs.html

Have you ever run into a thorny problem that makes your code slow or complicated, for which there is no obvious solution? Have you ever needed a data structure that your language’s standard library didn’t provide? You might need to implement a research paper!

While much of research in Computer Science doesn’t seem relevant to your everyday web application, all of those tools and techniques you use daily originally came from research! In this talk we’ll learn why you might want to read and implement research papers, how to read them for relevant information, and how to translate what they describe into code and test the results. Finally, we’ll discuss examples of research implementation I’ve been involved in and the relationships I’ve built with researchers in the process.

As you might imagine, I think this rocks!
### The World’s First $9 Computer is Shipping Today!

Sunday, September 27th, 2015

From the post:

Remember Project: C.H.I.P. ?

A $9 Linux-based, super-cheap computer that raised some $2 Million beyond a pledge goal of just $50,000 on Kickstarter will soon be in your pockets. Four months ago, Dave Rauchwerk, CEO of Next Thing Co., utilized the global crowd-funding corporation ‘Kickstarter’ for backing his project C.H.I.P., a fully functioning computer that offers more than what you could expect for just $9.

See Khyati’s post for technical specifications.

Security by secrecy is meaningless when potential hackers (ages 14-64) number 4.8 billion.

With enough hackers, all bugs can be found.

### Solving the Stable Marriage problem…

Friday, August 21st, 2015

With all the Ashley Madison hack publicity, I didn’t know there was a “stable marriage problem.” 😉

Turns out it is like the Eight-Queens problem. It is a “problem” but it isn’t one you are likely to encounter outside of a CS textbook.

Yan sets up the problem with this quote from Wikipedia:

The stable marriage problem is commonly stated as:

Given n men and n women, where each person has ranked all members of the opposite sex with a unique number between 1 and n in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. If there are no such people, all the marriages are “stable”. (It is assumed that the participants are binary gendered and that marriages are not same-sex).

The wording is a bit awkward. I would rephrase it to say that in a stable matching, there is no pair in which both partners prefer each other to their current partners. One of the partners can prefer someone else, but if that someone else does not return the preference, both marriages remain “stable.”
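That condition translates directly into code. Here is a sketch (my illustration, not from Yan's post or the Wikipedia article) that tests a proposed matching for such mutually-preferring pairs:

```python
# A direct check of the stability condition -- my illustration, not
# code from Yan's post: a matching is unstable iff some man and woman
# both prefer each other to their assigned partners ("blocking pair").

def is_stable(matching, men_prefs, women_prefs):
    """`matching` maps man -> woman; preference lists run from most
    to least preferred.  Returns True iff no blocking pair exists."""
    husband_of = {w: m for m, w in matching.items()}
    for m, wife in matching.items():
        for w in men_prefs[m]:
            if w == wife:
                break                     # everyone after wife is worse for m
            # m prefers w to his wife; blocking only if w prefers m too
            if women_prefs[w].index(m) < women_prefs[w].index(husband_of[w]):
                return False              # blocking pair (m, w) found
    return True
```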

The Wikipedia article does observe:

While the solution is stable, it is not necessarily optimal from all individuals’ points of view.

Yan sets up the problem and then walks through the required code.
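For readers who want the punchline without clicking through: the classic Gale-Shapley proposal algorithm solves the problem in O(n²). This is my own compact sketch, not Yan's code:

```python
from collections import deque

# The classic Gale-Shapley proposal algorithm -- my own compact sketch,
# not the code from Yan's walkthrough.  Each free man proposes to women
# in his order of preference; each woman keeps her best proposal so far.

def gale_shapley(men_prefs, women_prefs):
    """Return a stable matching as a dict mapping man -> woman.

    Both arguments map each person to a complete preference list of
    the other group, most preferred first.
    """
    # Precompute each woman's ranking so comparing suitors is O(1).
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    free_men = deque(men_prefs)                 # everyone starts free
    next_choice = {m: 0 for m in men_prefs}     # next woman m proposes to
    fiance = {}                                 # woman -> current fiancé

    while free_men:
        m = free_men.popleft()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        if w not in fiance:
            fiance[w] = m                       # first proposal: accept
        elif rank[w][m] < rank[w][fiance[w]]:
            free_men.append(fiance[w])          # w trades up; old fiancé freed
            fiance[w] = m
        else:
            free_men.append(m)                  # rejected; m proposes again

    return {m: w for w, m in fiance.items()}
```

The result is man-optimal: each man gets the best partner he can have in any stable matching, which is the asymmetry behind Wikipedia's observation that the solution is not necessarily optimal from every individual's point of view.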

Enjoy!

### Free Packtpub Books (Legitimate Ones)

Thursday, August 20th, 2015

Packtpub Books is running a “free book per day” event. Most of you know Packtpub already so I won’t belabor the quality of their publications, etc.

The important news is that for 24 hours each day in August, Packtpub Books is offering a different book for free download! The current free book offer appears to expire at the end of August, 2015.

Packtpub Books – Free Learning

This is a great way to introduce non-Packtpub customers to Packtpub publications.

### The Life Cycle of Programming Languages

Tuesday, June 30th, 2015

I don’t know that you will agree with Betsy’s conclusion but it is an interesting read.

Fourteen years ago the authors of the Agile Manifesto said unto us: all technical problems are people problems that manifest technically. In doing so they repeated what Peopleware’s DeMarco and Lister had said fourteen years before that. We cannot break the endless cycle of broken frameworks and buggy software by pretending that broken, homogenous [sic] communities can produce frameworks that meet the varied needs of a broad developer base. We have known this for three decades.

The “homogeneous community” in question is, of course, white males.

I have no idea if the founders of the languages she mentions are all white males or not. But for purposes of argument, let’s say that the founding communities in question are exclusively white males. And intentionally so.

OK, where is the comparison case of language development that demonstrates a more inclusive group (by gender, race, sexual orientation, religion) would produce less broken frameworks and less buggy software, by some specified measure?

I understand the point that frameworks and code are currently broken and buggy, no argument there. No need to repeat that or come up with new examples.

The question that interests me, and I suspect would interest developers and customers alike, is: where are the frameworks or code that are less buggy because they were created by more inclusive communities?

Inclusion will sell itself, quickly, if the case can be made that inclusive communities produce more useful frameworks or less buggy code.

In making the case for inclusion, citing studies that groups are more creative when diverse isn’t enough. Point to the better framework or less buggy code created by a diverse community. That should not be hard to do, assuming such evidence exists.

Make no mistake, I think discrimination on the basis of gender, race, sexual orientation, religion, etc. is not only illegal, it is immoral. However, the case for non-discrimination is harmed by speculative claims for improved results that are not based on facts.

Where are those facts? I would love to be able to cite them.

PS: Flames will be deleted. With others I fought gender/racial discrimination in organizing garment factories where the body heat of the workers was the only heat in the winter. Only to be betrayed by a union more interested in dues than justice for workers. Defeating discrimination requires facts, not rhetoric. (Recalling it was Brown vs. Board of Education that pioneered the use of social studies data in education litigation. They offered facts, not opinions.)