Archive for the ‘Computer Science’ Category

Lisp at the Frontier of Computation

Saturday, December 9th, 2017


Since the 1950s, Lisp has been used to describe and calculate in cutting-edge fields like artificial intelligence, robotics, symbolic mathematics, and advanced optimizing compilers. It is no surprise that Lisp has also found relevance in quantum computation, both in academia and industry. Hosted at Rigetti Computing, a quantum computing startup in Berkeley, Robert Smith will provide a pragmatic view of the technical, sociological, and psychological aspects of working with an interdisciplinary team, writing Lisp, to build the next generation of technology resource: the quantum computer.

ABOUT THE SPEAKER: Robert has been using Lisp for over decade, and has been fortunate to work with and manage expert teams of Lisp programmers to build embedded fingerprint analysis systems, machine learning-based product recommendation software, metamaterial phased-array antennas, discrete differential geometric computer graphics software, and now quantum computers. As Director of Software Engineering, Robert is responsible for building the publicly available Rigetti Forest platform, powered by both a real quantum computer and one of the fastest single-node quantum computer simulators in the world.

Video notes mention “poor audio quality.” Not the best but clear and audible to me.

The coverage of the quantum computer work is great but mostly a general promotion of Lisp.

Important links:

Forest (beta) Forest provides development access to our 30-qubit simulator the Quantum Virtual Machine ™ and limited access to our quantum hardware systems for select partners. Workshop video plus numerous other resources.

A Practical Quantum Instruction Set Architecture by Robert S. Smith, Michael J. Curtis, William J. Zeng. (speaker plus two of his colleagues)

The Computer Science behind a modern distributed data store

Thursday, December 7th, 2017

From the description:

What we see in the modern data store world is a race between different approaches to achieve a distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are at least three necessary ingredients which are everything else than trivial to combine and of course even more challenging when heading for an acceptable performance.

Over the past years there has been significant progress in respect in both the science and practical implementations of such data stores. In his talk Max Neunhöffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay and show four modern approaches of distributed open-source data stores.

Topics are:

  • Challenges in developing a distributed, resilient data store
  • Consensus, distributed transactions, distributed query optimization and execution
  • The inner workings of ArangoDB, Cassandra, Cockroach and RethinkDB

The talk will touch complex and difficult computer science, but will at the same time be accessible to and enjoyable by a wide range of developers.

I haven’t found the slides for this presentation but did stumble across ArangoDB Tech Talks and Slides.

Neunhöffer’s presentation will make you look at ArangoDB more closely.

So You Want to be a WIZARD [Spoiler Alert: It Requires Work]

Monday, November 20th, 2017

So You Want to be a WIZARD by Julia Evans.

I avoid using terms like inspirational, transforming, etc. because it is so rare that software, projects, presentations merit merit those terms.

Today I am making an exception to that rule to say:

So You Want to be a Wizard by Julia Evans can transform your work in computer science.

Notice the use of “can” in that sentence. No guarantees because unlike many promised solutions, Julia says up front that hard work is required to use her suggestions successfully.

That’s right. If these methods don’t work for you it will be because you did not apply them. (full stop)

No guarantees you will get praise, promotions, recognition, etc., as a result of using Julia’s techniques, but you will be a wizard none the less.

One consolation is that wizards rarely notice back-biters, office sycophants, and a range of other toxic co-workers. They are too busy preparing themselves to answer the next technical issue that requires a wizard.

10 Papers Every Developer Should Read (At Least Twice) [With Hyperlinks]

Thursday, November 16th, 2017

10 Papers Every Developer Should Read (At Least Twice) by Michael Feathers

Feathers omits hyperlinks for the 10 papers every developer should read, at least twice.

Hyperlinks eliminate searches by every reader, saving them time and load on their favorite search engine, not to mention providing access more quickly. Feathers’ list with hyperlinks follows.

Most are easy to read but some are rough going – they drop off into math after the first few pages. Take the math to tolerance and then move on. The ideas are the important thing.

See Feather’s post for his comments on each paper.

Even a shallow web composed of hyperlinks is better than no web at all.

A Primer for Computational Biology

Thursday, November 9th, 2017

A Primer for Computational Biology by Shawn T. O’Neil.

From the webpage:

A Primer for Computational Biology aims to provide life scientists and students the skills necessary for research in a data-rich world. The text covers accessing and using remote servers via the command-line, writing programs and pipelines for data analysis, and provides useful vocabulary for interdisciplinary work. The book is broken into three parts:

  1. Introduction to Unix/Linux: The command-line is the “natural environment” of scientific computing, and this part covers a wide range of topics, including logging in, working with files and directories, installing programs and writing scripts, and the powerful “pipe” operator for file and data manipulation.
  2. Programming in Python: Python is both a premier language for learning and a common choice in scientific software development. This part covers the basic concepts in programming (data types, if-statements and loops, functions) via examples of DNA-sequence analysis. This part also covers more complex subjects in software development such as objects and classes, modules, and APIs.
  3. Programming in R: The R language specializes in statistical data analysis, and is also quite useful for visualizing large datasets. This third part covers the basics of R as a programming language (data types, if-statements, functions, loops and when to use them) as well as techniques for large-scale, multi-test analyses. Other topics include S3 classes and data visualization with ggplot2.

Pass along to life scientists and students.

This isn’t the primer that separates the CS material from domain specific examples and prose. Adaptation to another domain is a question of re-writing.

I assume an adaptable primer wasn’t the author’s intention and so that isn’t a criticism but an observation that basic material is written over and over again, needlessly.

I first saw this in a tweet by Christophe Lalanne.

3 Reasons to Read: Algorithms to Live By

Monday, April 24th, 2017

How Algorithms can untangle Human Questions. Interview with Brian Christian by Roberto V. Zican.

The entire interview is worth your study but the first question and answer establish why you should read Algorithms to Live By:

Q1. You have worked with cognitive scientist Tom Griffiths (professor of psy­chol­ogy and cognitive science at UC Berkeley) to show how algorithms used by computers can also untangle very human questions. What are the main lessons learned from such a joint work?

Brian Christian: I think ultimately there are three sets of insights that come out of the exploration of human decision-making from the perspective of computer science.

The first, quite simply, is that identifying the parallels between the problems we face in everyday life and some of the canonical problems of computer science can give us explicit strategies for real-life situations. So-called “explore/exploit” algorithms tell us when to go to our favorite restaurant and when to try something new; caching algorithms suggest — counterintuitively — that the messy pile of papers on your desk may in fact be the optimal structure for that information.

Second is that even in cases where there is no straightforward algorithm or easy answer, computer science offers us both a vocabulary for making sense of the problem, and strategies — using randomness, relaxing constraints — for making headway even when we can’t guarantee we’ll get the right answer every time.

Lastly and most broadly, computer science offers us a radically different picture of rationality than the one we’re used to seeing in, say, behavioral economics, where humans are portrayed as error-prone and irrational. Computer science shows us that being rational means taking the costs of computation — the costs of decision-making itself — into account. This leads to a much more human, and much more achievable picture of rationality: one that includes making mistakes and taking chances.
… (emphasis in original)

After the 2016 U.S. presidential election, I thought the verdict that humans are error-prone and irrational was unassailable.

Looking forward to the use of a human constructed lens (computer science) to view “human questions.” There are answers to “human questions” baked into computer science so watching the authors unpack those will be an interesting read. (Waiting for my copy to arrive.)

Just so you know, the Picador edition is a reprint. It was originally published by William Collins, 21/04/2016 in hardcover, see: Algorithms to Live By, a short review by Roberto Zicari, October 24, 2016.

Sci Hub It!

Friday, April 7th, 2017

Sci Hub It!

Simple add-on to make it easier to use Sci-Hub.

If you aren’t already using this plug-in for Firefox you should be.

Quite handy!


Notes to (NUS) Computer Science Freshmen…

Monday, March 13th, 2017

Notes to (NUS) Computer Science Freshmen, From The Future

From the intro:

Early into the AY12/13 academic year, Prof Tay Yong Chiang organized a supper for Computer Science freshmen at Tembusu College. The bunch of seniors who were gathered there put together a document for NUS computing freshmen. This is that document.

Feel free to create a pull request to edit or add to it, and share it with other freshmen you know.

There is one sad note:

The Art of Computer Programming (a review of everything in Computer Science; pretty much nobody, save Knuth, has finished reading this)

When you think about the amount of time Knuth has spent researching, writing and editing The Art of Computer Programming (TAOCP), it doesn’t sound unreasonable to expect others, a significant number of others, to have read it.

Any online reading groups focused on TAOCP?

New Spaceship Speed in Conway’s Game of Life

Saturday, January 14th, 2017

New Spaceship Speed in Conway’s Game of Life by Alexy Nigin.

From the post:

In this article, I assume that you have basic familiarity with Conway’s Game of Life. If this is not the case, you can try reading an explanatory article but you will still struggle to understand much of the following content.

The day before yesterday forums saw a new member named zdr. When we the lifenthusiasts meet a newcomer, we expect to see things like “brand new” 30-cell 700-gen methuselah and then have to explain why it is not notable. However, what zdr showed us made our jaws drop.

It was a 28-cell c/10 orthogonal spaceship:

An animated image of the spaceship

… (emphasis in the original)

The mentioned introduction isn’t sufficient to digest the material in this post.

There is a wealth of material available on cellular automata (the Game of Life is one).

LifeWiki is one and Complex Cellular Automata is another. While not exhaustive of all there is to know about cellular automata, familiarity with take some time and skill.

Still, I offer this as encouragement that fundamental discoveries remain to be made.

But if and only if you reject conventional wisdom that prevents you from looking.

D-Wave Just Open-Sourced Quantum Computing [DC Beltway Parking Lot Distraction]

Friday, January 13th, 2017

D-Wave Just Open-Sourced Quantum Computing by Dom Galeon.

D-Wave has just released a welcome distraction for CS types sitting in the DC Beltway Parking Lot on January 20-21, 2017. (I assuming you brought extra batteries for your laptop.) After you run out of gas, your laptop will be running on battery power alone.

Just remember to grab a copy of Qbsolv before you leave for the tailgate/parking lot party on the Beltway.

A software tool known as Qbsolv allows developers to program D-Wave’s quantum computers even without knowledge of quantum computing. It has already made it possible for D-Wave to work with a bunch of partners, but the company wants more. “D-Wave is driving the hardware forward,” Bo Ewald, president of D-Wave International, told Wired. “But we need more smart people thinking about applications, and another set thinking about software tools.”

To that end, D-Wave has open-sourced Qbsolv, making it possible for anyone to freely share and modify the software. D-Wave hopes to build an open source community of sorts for quantum computing. Of course, to actually run this software, you’d need access to a piece of hardware that uses quantum particles, like one of D-Wave’s quantum computers. However, for the many who don’t have that access, the company is making it possible to download a D-Wave simulator that can be used to test Qbsolv on other types of computers.

This open-source Qbsolv joins an already-existing free software tool called Qmasm, which was developed by one of Qbsolv’s first users, Scott Pakin of Los Alamos National Laboratory. “Not everyone in the computer science community realizes the potential impact of quantum computing,” said mathematician Fred Glover, who’s been working with Qbsolv. “Qbsolv offers a tool that can make this impact graphically visible, by getting researchers and practitioners involved in charting the future directions of quantum computing developments.”

D-Wave’s machines might still be limited to solving optimization problems, but it’s a good place to start with quantum computers. Together with D-Wave, IBM has managed to develop its own working quantum computer in 2000, while Google teamed up with NASA to make their own. Eventually, we’ll have a quantum computer that’s capable of performing all kinds of advanced computing problems, and now you can help make that happen.

From the github page:

qbsolv is a metaheuristic or partitioning solver that solves a potentially large quadratic unconstrained binary optimization (QUBO) problem by splitting it into pieces that are solved either on a D-Wave system or via a classical tabu solver.

The phrase, “…might still be limited to solving optimization problems…” isn’t as limiting as it might appear.

A recent (2014) survey of quadratic unconstrained binary optimization (QUBO), The Unconstrained Binary Quadratic Programming Problem: A Survey runs some thirty-three pages and should keep you occupied however long you sit on the DC Beltway.

From page 10 of the survey:

Kochenberger, Glover, Alidaee, and Wang (2005) examine the use of UBQP as a tool for clustering microarray data into groups with high degrees of similarity.

Where I read one person’s “similarity” to be another person’s test of “subject identity.”

PS: Enjoy the DC Beltway. You may never see it motionless ever again.

OpenTOC (ACM SIG Proceedings – Free)

Sunday, January 1st, 2017


From the webpage:

ACM OpenTOC is a unique service that enables Special Interest Groups to generate and post Tables of Contents for proceedings of their conferences enabling visitors to download the definitive version of the contents from the ACM Digital Library at no charge.

Downloads of these articles are captured in official ACM statistics, improving the accuracy of usage and impact measurements. Consistently linking to definitive versions of ACM articles should reduce user confusion over article versioning.

Conferences are listed by year, 2014 – 2016 and by event.

A step in the right direction.

Do you know if the digital library allows bulk downloading of search result metadata?

It didn’t the last time I had a digital library subscription. Contacting the secret ACM committee that decides on web features was verboten.

Enjoy this improvement in access while waiting for ACM access bottlenecks to wither and die.

Continuous Unix commit history from 1970 until today

Thursday, December 29th, 2016

Continuous Unix commit history from 1970 until today

From the webpage:

The history and evolution of the Unix operating system is made available as a revision management repository, covering the period from its inception in 1970 as a 2.5 thousand line kernel and 26 commands, to 2016 as a widely-used 27 million line system. The 1.1GB repository contains about half a million commits and more than two thousand merges. The repository employs Git system for its storage and is hosted on GitHub. It has been created by synthesizing with custom software 24 snapshots of systems developed at Bell Labs, the University of California at Berkeley, and the 386BSD team, two legacy repositories, and the modern repository of the open source FreeBSD system. In total, about one thousand individual contributors are identified, the early ones through primary research. The data set can be used for empirical research in software engineering, information systems, and software archaeology.

You can read more details about the contents, creation, and uses of this repository through this link.

Two repositories are associated with the project:

  • unix-history-repo is a repository representing a reconstructed version of the Unix history, based on the currently available data. This repository will be often automatically regenerated from scratch, so this is not a place to make contributions. To ensure replicability its users are encouraged to fork it or archive it.
  • unix-history-make is a repository containing code and metadata used to build the above repository. Contributions to this repository are welcomed.

Not everyone will find this exciting but this rocks as a resource for:

empirical research in software engineering, information systems, and software archaeology

Need to think seriously about putting this on a low-end laptop and sealing it up in a Faraday cage.

Just in case. 😉

Low fat computing

Thursday, December 22nd, 2016

Low fat computing by Karsten Schmidt

A summary of the presentation by Schmidt by Malcolm Sparks, along with the presentation itself.

Lots of strange and 3-D printable eye candy for the first 15 minutes or so with Schmidt’s background. Starts to really rock around 20 minutes in with Forth code and very low level coding.

To get a better idea of what Schmidt has been doing, see his website:, or his Forth repl in Javascript,, or his GitHub repository or at: Github:

Stop by at although the material there looks dated.

Operating Systems Design and Implementation (12th USENIX Symposium)

Thursday, November 17th, 2016

Operating Systems Design and Implementation (12th USENIX Symposium) – Savannah, GA, USA, November 2-4, 2016.

Message from the OSDI ’16 Program Co-Chairs:

We are delighted to welcome to you to the 12th USENIX Symposium on Operating Systems Design and Implementation, held in Savannah, GA, USA! This year’s program includes a record high 47 papers that represent the strength of our community and cover a wide range of topics, including security, cloud computing, transaction support, storage, networking, formal verification of systems, graph processing, system support for machine learning, programming languages, troubleshooting, and operating systems design and implementation.

Weighing in at seven hundred and ninety-seven (797) pages, this tome will prove more than sufficient to avoid annual family arguments during the holiday season.

Not to mention this is an opportunity to hone your skills to a fine edge.

Understanding the fundamentals of attacks (Theory of Exploitation)

Thursday, November 3rd, 2016

Understanding the fundamentals of attacks – What is happening when someone writes an exploit? by Halvar Flake / Thomas Dullien.

The common “bag of tricks” as Halvar refers to them for hacking, does cover all the major data breaches for the last 24 months.

No zero-day exploits.

Certainly none of the deep analysis offered by Halvar here.

Still, you owe it to yourself and your future on one side or the other of computer security, to review these slides and references carefully.

Even though Halvar concludes (in part)

Exploitation is programming emergent weird machines.

It does not require EIP/RIP, and is not a bad of tricks.

Theory of exploitation is still in embryonic stage.

Imagine the advantages of having mastered the art of exploitation theory at its inception.

In an increasingly digital world, you may be worth your own weight in gold. 😉

PS: Specifying the subject identity properties of exploits will assist in organizing them for future use/defense.

One expert hacker is like a highly skilled warrior.

Making exploits easy to discover/use by average hackers is like a skilled warrior facing a company of average fighters.

The outcome will be bloody, but never in doubt.

The Hanselminutes Podcast

Friday, August 26th, 2016

The Hanselminutes Podcast: Fresh Air for Developers by Scott Hanselman.

I went looking for Felienne’s podcast on code smells and discovered along with it, The Hanselminutes Podcast: Fresh Air for Developers!

Felienne’s podcast is #542 so there is a lot of content to enjoy! (I checked the archive. Yes, there really are 542 episodes as of today.)

Exploring Code Smells in code written by Children

Friday, August 26th, 2016

Exploring Code Smells in code written by Children (podcast) by Dr. Felienne

From the description:

Felienne is always learning. In exploring her PhD dissertation and her public speaking experience it’s clear that she has no intent on stopping! Most recently she’s been exploring a large corpus of Scratch programs looking for Code Smells. How do children learn how to code, and when they do, does their code “smell?” Is there something we can do when teaching to promote cleaner, more maintainable code?

Felienne discusses a paper due to appear in September on analysis of 250K Scratch programs for code smells.

Thoughts on teaching programmers to detect bug smells?

If You Believe In OpenAccess, Do You Practice OpenAccess?

Wednesday, June 15th, 2016


From the webpage:

CSC Open-Access Library aim to maintain and develop access to journal publication collections as a research resource for students, teaching staff, researchers and industrialists.

You can see a complete listing of the journals here.

Before you protest these are not Science or Nature, remember that Science and Nature did not always have the reputations they do today.

Let the quality of your work bolster the reputations of open access publications and attract others to them.

How to Run a Russian Hacking Ring [Just like Amway, Mary Kay … + Career Advice]

Sunday, June 12th, 2016

How to Run a Russian Hacking Ring by Kaveh Waddell.

From the post:

A man with intense eyes crouches over a laptop in a darkened room, his face and hands hidden by a black ski mask and gloves. The scene is lit only by the computer screen’s eerie glow.

Exaggerated portraits of malicious hackers just like this keep popping up in movies and TV, despite the best efforts of shows like Mr. Robot to depict hackers in a more realistic way. Add a cacophony of news about data breaches that have shaken the U.S. government, taken entire hospital systems hostage, and defrauded the international banking system, and hackers start to sound like omnipotent super-villains.

But the reality is, as usual, less dramatic. While some of the largest cyberattacks have been the work of state-sponsored hackers—the OPM data breach that affected millions of Americans last year, for example, or the Sony hack that revealed Hollywood’s intimate secrets​—the vast majority of the world’s quotidian digital malice comes from garden-variety hackers.

What a downer this would be at career day at the local high school.

Yes, you too can be a hacker but it’s as dull as anything you have seen in Dilbert.

Your location plays an important role in whether Russian hacking ring employment is in your future. Kaveh reports:

Even the boss’s affiliates, who get less than half of each ransom that they extract, make a decent wage. They earned an average of 600 dollars a month, or about 40 percent more than the average Russian worker.

$600/month is ok, if you are living in Russia, not so hot if you aspire to Venice Beach. (It’s too bad the beach cam doesn’t pan and zoom.)

The level of technical skills required for low-lying fruit hacking is falling, meaning more competitors for the low-end. Potential profits are going to fall even further.

The no liability for buggy software will fall sooner rather than later and skilled hackers (I mean security researchers) will find themselves in demand by both plaintiffs and defendants. You will earn more money if you can appear in court, some expert witnesses make $600/hour or more. (Compare the $600/month in Russia.)

Even if you can’t appear in court, for reasons that seem good to you, fleshing out the details of hacks is going to be on demand from all sides.

You may start at the shallow end of the pool but resolve to not stay there. Read deeply, practice everyday, start current on new developments and opportunities, contribute to online communities.

“This guy’s arrogance takes your breath away”

Tuesday, May 31st, 2016

“This guy’s arrogance takes your breath away” – Letters between John W Backus and Edsger W Dijkstra, 1979 by Jiahao Chen.

From the post:

Item No. 155: Correspondence with Edsger Dijkstra. 1979

At the time of this correspondence, Backus had just won the 1977 Turing Award and had chosen to talk about his then-current research on functional programming (FP) for his award lecture in Seattle. See this pdf of the published version, noting that Backus himself described “significant differences” with the talk that was actually given. Indeed, the transcript at the LoC was much more casual and easier to follow.

Dijkstra, in his characteristically acerbic and hyperbolic style, wrote a scathing public review (EWD 692) and some private critical remarks in what looks like a series of letters with Backus.

From what I can tell, these letters are not part of the E. W. Dijkstra archives at UT Austin, nor are they available online anywhere else. So here they are for posterity.

You won’t find Long form exchanges such as these in present-day near instant bait-reply cycles of email messages.

That’s unfortunate.

Chen has created a Github repository if you are interested in transcribing pre-email documents.

You can help create better access to the history of computer science and see how to craft a cutting remark, as opposed to blurting out the first insult that comes to mind.


Tip #20: Play with Racket [Computer Science for Everyone?]

Saturday, January 30th, 2016

Tip #20: Play with Racket by Aaron Quint and Michael R. Bernstein.

From the post:

Racket is a programming language in the Lisp tradition that is different from other programming languages in a few important ways. It can be any language you want – because Racket is heavily used for pedagogy, it has evolved into a suite of languages and tools that you can use to explore as many different programming paradigms as you can think of. You can also download it and play with it right now, without installing anything else, or knowing anything at all about computers or programming. Watching Matthias Felleisen’s “big-bang: the world, universe, and network in the programming language” talk will give you an idea of how Racket can be used to help people learn how to think about mathematics, computation, and more. Try it out even if you “hate Lisp” or “don’t know how to program” – it’s really a lot of fun.

Aaron and Michael scooped President Obama’s computer science skills for everyone by a day:

President Barack Obama said Saturday he will ask Congress for billions of dollars to help students learn computer science skills and prepare for jobs in a changing economy.

“In the new economy, computer science isn’t an optional skill. It’s a basic skill, right along with the three R’s,” Obama said in his weekly radio and Internet address….(Obama Wants $4B to Help Students Learn Computer Science)

The “computer science for everyone” is a popular chant but consider the Insecure Internet of Things (IIoT).

Will minimal computer science skills increase or decrease the level of security for the IIoT?

That’s what I think too.

Removal of IoT components is the only real defense. Expect a vibrant cottage industry to grow up around removing IoT components.

Everything You Know About Latency Is Wrong

Thursday, December 24th, 2015

Everything You Know About Latency Is Wrong by Tyler Treat.

From the post:

Okay, maybe not everything you know about latency is wrong. But now that I have your attention, we can talk about why the tools and methodologies you use to measure and reason about latency are likely horribly flawed. In fact, they’re not just flawed, they’re probably lying to your face.

When I went to Strange Loop in September, I attended a workshop called “Understanding Latency and Application Responsiveness” by Gil Tene. Gil is the CTO of Azul Systems, which is most renowned for its C4 pauseless garbage collector and associated Zing Java runtime. While the workshop was four and a half hours long, Gil also gave a 40-minute talk called “How NOT to Measure Latency” which was basically an abbreviated, less interactive version of the workshop. If you ever get the opportunity to see Gil speak or attend his workshop, I recommend you do. At the very least, do yourself a favor and watch one of his recorded talks or find his slide decks online.

The remainder of this post is primarily a summarization of that talk. You may not get anything out of it that you wouldn’t get out of the talk, but I think it can be helpful to absorb some of these ideas in written form. Plus, for my own benefit, writing about them helps solidify it in my head.

Great post, not only for the discussion of latency but for two extensions to the admonition (Moon is a Harsh Mistress) “Always cut cards:”

  • Always understand the nature of your data.
  • Always understand the nature your methodology.

If you fail at either of those, the results presented to you or that you present to others may or may not be true, false or irrelevant.

Treat’s post is just one example in a vast sea of data and methodologies which are just as misleading if not more so.

If you need motivation to put in the work, how’s your comfort level with being embarrassed in public? Like someone demonstrating your numbers are BS.

Readings in Database Systems, 5th Edition (Kindle Stuffer)

Tuesday, December 15th, 2015

Readings in Database Systems, 5th Edition, Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker, editors.

From the webpage:

  1. Preface [HTML] [PDF]
  2. Background introduced by Michael Stonebraker [HTML] [PDF]
  3. Traditional RDBMS Systems introduced by Michael Stonebraker [HTML] [PDF]
  4. Techniques Everyone Should Know introduced by Peter Bailis [HTML] [PDF]
  5. New DBMS Architectures introduced by Michael Stonebraker [HTML] [PDF]
  6. Large-Scale Dataflow Engines introduced by Peter Bailis [HTML] [PDF]
  7. Weak Isolation and Distribution introduced by Peter Bailis [HTML] [PDF]
  8. Query Optimization introduced by Joe Hellerstein [HTML] [PDF]
  9. Interactive Analytics introduced by Joe Hellerstein [HTML] [PDF]
  10. Languages introduced by Joe Hellerstein [HTML] [PDF]
  11. Web Data introduced by Peter Bailis [HTML] [PDF]
  12. A Biased Take on a Moving Target: Complex Analytics
    by Michael Stonebraker [HTML] [PDF]
  13. A Biased Take on a Moving Target: Data Integration
    by Michael Stonebraker [HTML] [PDF]

Complete Book: [HTML] [PDF]

Readings Only: [HTML] [PDF]

Previous Editions: [HTML]

Citations to the “reading” do not present themselves as hyperlinks but they are.

If you are giving someone a Kindle this Christmas, consider pre-loading Readings in Database Systems, along with the readings as a Kindle stuffer.

The Moral Failure of Computer Scientists [Warning: Scam Alert!]

Sunday, December 13th, 2015

The Moral Failure of Computer Scientists by Kaveh Waddell.

From the post:

Computer scientists and cryptographers occupy some of the ivory tower’s highest floors. Among academics, their work is prestigious and celebrated. To the average observer, much of it is too technical to comprehend. The field’s problems can sometimes seem remote from reality.

But computer science has quite a bit to do with reality. Its practitioners devise the surveillance systems that watch over nearly every space, public or otherwise—and they design the tools that allow for privacy in the digital realm. Computer science is political, by its very nature.

That’s at least according to Phillip Rogaway, a professor of computer science at the University of California, Davis, who has helped create some of the most important tools that secure the Internet today. Last week, Rogaway took his case directly to a roomful of cryptographers at a conference in Auckland, New Zealand. He accused them of a moral failure: By allowing the government to construct a massive surveillance apparatus, the field had abused the public trust. Rogaway said the scientists had a duty to pursue social good in their work.

He likened the danger posed by modern governments’ growing surveillance capabilities to the threat of nuclear warfare in the 1950s, and called upon scientists to step up and speak out today, as they did then.

I spoke to Rogaway about why cryptographers fail to see their work in moral terms, and the emerging link between encryption and terrorism in the national conversation. A transcript of our conversation appears below, lightly edited for concision and clarity.

I don’t disagree with Rogaway that all science and technology is political. I might use the term social instead but I agree, there are no neutral choices.

Having said that, I do disagree that Rogaway has the standing to pre-package a political stance colored as “morals” and denounce others as “immoral” if they disagree.

It is one of the oldest tricks in rhetoric but quite often effective, which is why people keep using it.

If Rogaway is correct that CS and technology are political, then his stance for a particular take on government, surveillance and cryptography is equally political.

Not that I disagree with his stance, but I don’t consider it be a moral choice.

Anything you can do to impede, disrupt or interfere with any government surveillance is fine by me. I won’t complain. But that’s because government surveillance, the high-tech kind, is a waste of time and effort.

Rogaway uses scientists who spoke out in the 1950’s about the threat of nuclear warfare as an example. Some example.

The Federation of American Scientists estimates that as of September 2015, there are approximately 15,800 nuclear weapons in the world.

Hmmm, doesn’t sound like their moral outrage was very effective does it?

There will be sessions, presentations, conferences, along with comped travel and lodging, publications for tenure, etc., but the sum of the discussion of morality in computer science with be largely the same.

The reason for the sameness of result is that discussions, papers, resolutions and the rest, aren’t nearly as important as the ethical/moral choices you make in the day to day practice as a computer scientist.

Choices in the practice of computer science make a difference, discussions of fictional choices don’t. It’s really that simple.*

*That’s not entirely fair. The industry of discussing moral choices without making any of them is quite lucrative and it depletes the bank accounts of those snared by it. So in that sense it does make a difference.

Order of Requirements Matter

Tuesday, December 8th, 2015

Sam Lightstone posted a great illustration of why the order of requirements can matter to Twitter:


Visualizations rarely get much clearer.

You could argue that Minard’s map of Napoleon’s invasion of Russia is equally clear:


But Minard drew with the benefit of hindsight, not foresight.

The Laws of Robotics, on the other hand, have predictive value for the different orders of requirements.

I don’t know how many requirements Honeywell had for the Midas and Midas Black Gas Detectors but you can bet IP security was near the end of the list, if explicit at all.

IP security should be #1 with a bullet, especially for devices that detect Ammonia (caustic, hazarous), Arsine (highly toxic, flammable), Chlorine (extremely dangerous, poisonous for all living organisms), Hydrogen cyanide, and Hydrogen flouride (“Hydrogen fluoride is a highly dangerous gas, forming corrosive and penetrating hydrofluoric acid upon contact with living tissue. The gas can also cause blindness by rapid destruction of the corneas.”)

When IP security is not the first requirement, it’s not hard to foresee the outcome, an Insecure Internet of Things.

Is that what we want?

arXiv Sanity Preserver

Sunday, November 29th, 2015

arXiv Sanity Preserver by Andrej Karpathy.

From the webpage:

There are way too many arxiv papers, so I wrote a quick webapp that lets you search and sort through the mess in a pretty interface, similar to my pretty conference format.

It’s super hacky and was written in 4 hours. I’ll keep polishing it a bit over time perhaps but it serves its purpose for me already. The code uses Arxiv API to download the most recent papers (as many as you want – I used the last 1100 papers over last 3 months), and then downloads all papers, extracts text, creates tfidf vectors for each paper, and lastly is a flask interface for searching through and filtering similar papers using the vectors.

Main functionality is a search feature, and most useful is that you can click “sort by tfidf similarity to this”, which returns all the most similar papers to that one in terms of tfidf bigrams. I find this quite useful.


You can see this rather remarkable tool online at:

Beyond its obvious utility for researchers, this could be used as a framework for experimenting with other similarity measures.


I first saw this in a tweet by Lynn Cherny.

Best Paper Awards in Computer Science (2014)

Friday, November 27th, 2015

Best Paper Awards in Computer Science (2014)

From the webpage:

Jeff Huang’s list of the best paper awards from 29 CS conferences since 1996 up to and including 2014.

I saw a tweet about Jeff’s site being updated to include papers from 2014.

If you are looking for reading material in a particular field, this is a good place to start.

For a complete list of the organizations, conferences as expanded abbreviations: see: Best Paper Awards in Computer Science (2013). None of them have changed so I didn’t see the point of repeating them.

LIQUi|> – A Quantum Computing Simulator

Friday, November 13th, 2015

With quantum computing simulator, Microsoft offers a sneak peek into future of computing by Allison Linn.

From the post:

Next week, at the SuperComputing 2015 conference in Austin, Texas, Dave Wecker, a lead architect on the QuArC team, will discuss the recent public release on GitHub of a suite of tools that allows computer scientists to simulate a quantum computer’s capabilities. That’s a crucial step in building the tools needed to run actual quantum computers.

“This is the closest we can get to running a quantum computer without having one,” said Wecker, who has helped develop the software.

The software is called Language-Integrated Quantum Operations, or LIQUi|>. The funky characters at the end refer to how a quantum operation is written in mathematical terms.

The researchers are hoping that, using LIQUi|>, computer scientists at Microsoft and other academic and research institutions will be able to perfect the algorithms they need to efficiently use a quantum computer even as the computers themselves are simultaneously being developed.

“We can actually debut algorithms in advance of running them on the computer,” Svore said.

As of today, November 13, 2015, LIQUi|> has only one (1) hit at GitHub. Will try back next week to see what the numbers look like then.

You won’t have a quantum computer by the holidays but you may have created your first quantum algorithm by then.


The Architecture of Open Source Applications

Thursday, November 12th, 2015

The Architecture of Open Source Applications

From the webpage:

Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

Our goal is to change that. In these two books, the authors of four dozen open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to these books provide unique insights into how they think.

If you are a junior developer, and want to learn how your more experienced colleagues think, these books are the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, these books can help you too.

Follow us on our blog at, or on Twitter at @aosabook and using the #aosa hashtag.

I happened upon these four books because of a tweet that mentioned: Early Access Release of Allison Kaptur’s “A Python Interpreter Written in Python” Chapter, which I found to be the tenth chapter of “500 Lines.”

OK, but what the hell is “500 Lines?” Poking around a bit I found The Architecture of Open Source Applications.

Which is the source for the material I quote above.

Do you learn from example?

Let me give you the flavor of three of the completed volumes and the “500 Lines” that is in progress:

The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks (vol. 1), from the introduction:

Carpentry is an exacting craft, and people can spend their entire lives learning how to do it well. But carpentry is not architecture: if we step back from pitch boards and miter joints, buildings as a whole must be designed, and doing that is as much an art as it is a craft or science.

Programming is also an exacting craft, and people can spend their entire lives learning how to do it well. But programming is not software architecture. Many programmers spend years thinking about (or wrestling with) larger design issues: Should this application be extensible? If so, should that be done by providing a scripting interface, through some sort of plugin mechanism, or in some other way entirely? What should be done by the client, what should be left to the server, and is “client-server” even a useful way to think about this application? These are not programming questions, any more than where to put the stairs is a question of carpentry.

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well. And more often than not, those are programs they wrote themselves. They never get to see the great programs of history, or read critiques of those programs’ designs written by experienced practitioners. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

This book is our attempt to change that. Each chapter describes the architecture of an open source application: how it is structured, how its parts interact, why it’s built that way, and what lessons have been learned that can be applied to other big design problems. The descriptions are written by the people who know the software best, people with years or decades of experience designing and re-designing complex applications. The applications themselves range in scale from simple drawing programs and web-based spreadsheets to compiler toolkits and multi-million line visualization packages. Some are only a few years old, while others are approaching their thirtieth anniversary. What they have in common is that their creators have thought long and hard about their design, and are willing to share those thoughts with you. We hope you enjoy what they have written.

The Architecture of Open Source Applications: Structure, Scale, and a Few More Fearless Hacks (vol. 2), from the introduction:

In the introduction to Volume 1 of this series, we wrote:

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well… As a result, they repeat one another’s mistakes rather than building on one another’s successes… This book is our attempt to change that.

In the year since that book appeared, over two dozen people have worked hard to create the sequel you have in your hands. They have done so because they believe, as we do, that software design can and should be taught by example—that the best way to learn how think like an expert is to study how experts think. From web servers and compilers through health record management systems to the infrastructure that Mozilla uses to get Firefox out the door, there are lessons all around us. We hope that by collecting some of them together in this book, we can help you become a better developer.

The Performance of Open Source Applications, from the introduction:

It’s commonplace to say that computer hardware is now so fast that most developers don’t have to worry about performance. In fact, Douglas Crockford declined to write a chapter for this book for that reason:

If I were to write a chapter, it would be about anti-performance: most effort spent in pursuit of performance is wasted. I don’t think that is what you are looking for.

Donald Knuth made the same point thirty years ago:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

but between mobile devices with limited power and memory, and data analysis projects that need to process terabytes, a growing number of developers do need to make their code faster, their data structures smaller, and their response times shorter. However, while hundreds of textbooks explain the basics of operating systems, networks, computer graphics, and databases, few (if any) explain how to find and fix things in real applications that are simply too damn slow.

This collection of case studies is our attempt to fill that gap. Each chapter is written by real developers who have had to make an existing system faster or who had to design something to be fast in the first place. They cover many different kinds of software and performance goals; what they have in common is a detailed understanding of what actually happens when, and how the different parts of large applications fit together. Our hope is that this book will—like its predecessor The Architecture of Open Source Applications—help you become a better developer by letting you look over these experts’ shoulders.

500 Lines or Less From the GitHub page:

Every architect studies family homes, apartments, schools, and other common types of buildings during her training. Equally, every programmer ought to know how a compiler turns text into instructions, how a spreadsheet updates cells, and how a database efficiently persists data.

Previous books in the AOSA series have done this by describing the high-level architecture of several mature open-source projects. While the lessons learned from those stories are valuable, they are sometimes difficult to absorb for programmers who have not yet had to build anything at that scale.

“500 Lines or Less” focuses on the design decisions and tradeoffs that experienced programmers make when they are writing code:

  • Why divide the application into these particular modules with these particular interfaces?
  • Why use inheritance here and composition there?
  • How do we predict where our program might need to be extended, and how can we make that easy for other programmers

Each chapter consists of a walkthrough of a program that solves a canonical problem in software engineering in at most 500 source lines of code. We hope that the material in this book will help readers understand the varied approaches that engineers take when solving problems in different domains, and will serve as a basis for projects that extend or modify the contributions here.

If you answered the question about learning from example with yes, adding these works to your read and re-read list.

BTW, for markup folks, check out Parsing XML at the Speed of Light by Arseny Kapoulkine.

Many hours of reading and keyboard pleasure await anyone using these volumes.

Visualizing What Your Computer (and Science) Ignore (mostly)

Thursday, November 12th, 2015

Deviation Magnification: Revealing Departures from Ideal Geometries by Neal Wadhwa, Tali Dekel, Donglai Wei, Frédo Durand, William T. Freeman.


Structures and objects are often supposed to have idealized geome- tries such as straight lines or circles. Although not always visible to the naked eye, in reality, these objects deviate from their idealized models. Our goal is to reveal and visualize such subtle geometric deviations, which can contain useful, surprising information about our world. Our framework, termed Deviation Magnification, takes a still image as input, fits parametric models to objects of interest, computes the geometric deviations, and renders an output image in which the departures from ideal geometries are exaggerated. We demonstrate the correctness and usefulness of our method through quantitative evaluation on a synthetic dataset and by application to challenging natural images.

The video for the paper is quite compelling:

Read the full paper here:

From the introduction to the paper:

Many phenomena are characterized by an idealized geometry. For example, in ideal conditions, a soap bubble will appear to be a perfect circle due to surface tension, buildings will be straight and planetary rings will form perfect elliptical orbits. In reality, however, such flawless behavior hardly exists, and even when invisible to the naked eye, objects depart from their idealized models. In the presence of gravity, the bubble may be slightly oval, the building may start to sag or tilt, and the rings may have slight perturbations due to interactions with nearby moons. We present Deviation Magnification, a tool to estimate and visualize such subtle geometric deviations, given only a single image as input. The output of our algorithm is a new image in which the deviations from ideal are magnified. Our algorithm can be used to reveal interesting and important information about the objects in the scene and their interaction with the environment. Figure 1 shows two independently processed images of the same house, in which our method automatically reveals the sagging of the house’s roof, by estimating its departure from a straight line.

Departures from “idealized geometry” make for captivating videos but there is a more subtle point that Deviation Magnification will help bring to the fore.

“Idealized geometry,” just like discrete metrics for attitude measurement or metrics of meaning, etc. are all myths. Useful myths as houses don’t (usually) fall down, marketing campaigns have a high degree of success, and engineering successfully relies on approximations that depart from the “real world.”

Science and computers have a degree of precision that has no counterpart in the “real world.”

Watch the video again if you doubt that last statement.

Whether you are using science and/or a computer, always remember that your results are approximations based upon approximations.

I first saw this in Four Short Links: 12 November 2015 by Nat Torkington.