Archive for the ‘Computer Science’ Category

Evidence for Power Laws – “…I work scientifically!”

Saturday, February 17th, 2018

Scant Evidence of Power Laws Found in Real-World Networks by Erica Klarreich.

From the post:

A paper posted online last month has reignited a debate about one of the oldest, most startling claims in the modern era of network science: the proposition that most complex networks in the real world — from the World Wide Web to interacting proteins in a cell — are “scale-free.” Roughly speaking, that means that a few of their nodes should have many more connections than others, following a mathematical formula called a power law, so that there’s no one scale that characterizes the network.

Purely random networks do not obey power laws, so when the early proponents of the scale-free paradigm started seeing power laws in real-world networks in the late 1990s, they viewed them as evidence of a universal organizing principle underlying the formation of these diverse networks. The architecture of scale-freeness, researchers argued, could provide insight into fundamental questions such as how likely a virus is to cause an epidemic, or how easily hackers can disable a network.

An informative and highly entertaining read that reminds me of an exchange in The Never Ending Story between Atreyu and Engywook.

Engywook’s “scientific specie-ality” is the Southern Oracle. From the transcript:

Atreyu: Have you ever been to the Southern Oracle?

Engywook: Eh… what do YOU think? I work scientifically!

In the context of the movie, Engywook’s answer is deeply ambiguous.

Where do you land on the power law question?
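
If you want to run the experiment yourself, here is a minimal sketch using networkx and the third-party powerlaw package (the package choice, graph size, and comparison against a lognormal are my assumptions, not something from Klarreich’s article):

```python
import networkx as nx
import powerlaw  # Alstott et al.'s fitting package

# Build a preferential-attachment graph, then ask whether its degree
# distribution is better described by a power law or a lognormal.
G = nx.barabasi_albert_graph(10_000, 3)
degrees = [d for _, d in G.degree()]

fit = powerlaw.Fit(degrees, discrete=True)
R, p = fit.distribution_compare('power_law', 'lognormal')
print(f"alpha = {fit.power_law.alpha:.2f}, xmin = {fit.power_law.xmin}")
print(f"log-likelihood ratio = {R:.2f}, p = {p:.3f}")  # R > 0 favours the power law
```

Swap in the degree sequence of a real network and you can land on the question with data rather than opinion.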

Unfairness By Algorithm

Monday, February 5th, 2018

Unfairness By Algorithm: Distilling the Harms of Automated Decision-Making by Lauren Smith.

From the post:

Analysis of personal data can be used to improve services, advance research, and combat discrimination. However, such analysis can also create valid concerns about differential treatment of individuals or harmful impacts on vulnerable communities. These concerns can be amplified when automated decision-making uses sensitive data (such as race, gender, or familial status), impacts protected classes, or affects individuals’ eligibility for housing, employment, or other core services. When seeking to identify harms, it is important to appreciate the context of interactions between individuals, companies, and governments—including the benefits provided by automated decision-making frameworks, and the fallibility of human decision-making.

Recent discussions have highlighted legal and ethical issues raised by the use of sensitive data for hiring, policing, benefits determinations, marketing, and other purposes. These conversations can become mired in definitional challenges that make progress towards solutions difficult. There are few easy ways to navigate these issues, but if stakeholders hold frank discussions, we can do more to promote fairness, encourage responsible data use, and combat discrimination.

To facilitate these discussions, the Future of Privacy Forum (FPF) attempted to identify, articulate, and categorize the types of harm that may result from automated decision-making. To inform this effort, FPF reviewed leading books, articles, and advocacy pieces on the topic of algorithmic discrimination. We distilled both the harms and potential mitigation strategies identified in the literature into two charts. We hope you will suggest revisions, identify challenges, and help improve the document by contacting lsmith@fpf.org. In addition to presenting this document for consideration for the FTC Informational Injury workshop, we anticipate it will be useful in assessing fairness, transparency and accountability for artificial intelligence, as well as methodologies to assess impacts on rights and freedoms under the EU General Data Protection Regulation.

The primary attractions are two tables: Potential Harms from Automated Decision-Making and Potential Mitigation Sets.

Take the tables as a starting point for analysis.

Some “unfair” practices, such as higher auto insurance prices for night-shift workers (which result in differential access to insurance), are actuarial questions. Insurers are not public charities and can legally discriminate based on perceived risk.

GraphDBLP [“dblp computer science bibliography” as a graph]

Wednesday, January 31st, 2018

GraphDBLP: a system for analysing networks of computer scientists through graph databases by Mario Mezzanzanica, et al.

Abstract:

This paper presents GraphDBLP, a system that models the DBLP bibliography as a graph database for performing graph-based queries and social network analyses. GraphDBLP also enriches the DBLP data through semantic keyword similarities computed via word-embedding. In this paper, we discuss how the system was formalized as a multi-graph, and how similarity relations were identified through word2vec. We also provide three meaningful queries for exploring the DBLP community to (i) investigate author profiles by analysing their publication records; (ii) identify the most prolific authors on a given topic, and (iii) perform social network analyses over the whole community. To date, GraphDBLP contains 5+ million nodes and 24+ million relationships, enabling users to explore the DBLP data by referencing more than 3.3 million publications, 1.7 million authors, and more than 5 thousand publication venues. Through the use of word-embedding, more than 7.5 thousand keywords and related similarity values were collected. GraphDBLP was implemented on top of the Neo4j graph database. The whole dataset and the source code are publicly available to foster the improvement of GraphDBLP in the whole computer science community.

Although the article is behind a paywall, GraphDBLP as a tool is not! https://github.com/fabiomercorio/GraphDBLP.

From the webpage:

GraphDBLP is a tool that models the DBLP bibliography as a graph database for performing graph-based queries and social network analyses.

GraphDBLP also enriches the DBLP data through semantic keyword similarities computed via word-embedding.

GraphDBLP provides to users three meaningful queries for exploring the DBLP community:

  1. investigate author profiles by analysing their publication records;
  2. identify the most prolific authors on a given topic;
  3. perform social network analyses over the whole community;
  4. perform shortest-paths over DBLP (e.g., the shortest-path between authors, the analysis of co-author networks, etc.)

… (emphasis in original)
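
For a sense of what querying the graph looks like, here is a minimal sketch against a local Neo4j instance loaded with GraphDBLP. The label, relationship, and property names below are my guesses, not GraphDBLP’s actual schema, so check the repository before running it:

```python
from neo4j import GraphDatabase

# Hypothetical schema: (:Author)-[:WROTE]->(:Publication)-[:PUBLISHED_IN]->(:Venue)
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (a:Author)-[:WROTE]->(p:Publication)-[:PUBLISHED_IN]->(v:Venue {name: $venue})
RETURN a.name AS author, count(p) AS papers
ORDER BY papers DESC LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query, venue="SIGMOD"):  # most prolific authors at a venue
        print(record["author"], record["papers"])
driver.close()
```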

Sorry to see author, title, venue, publication, and keyword all represented as flat strings. Disappointing, but not uncommon.

Treating those flat strings as identifiers for richer, structured representations of their subjects would have to be layered on top of that default.

Not to minimize the importance of improving the usefulness of the dblp, but imagine integrating GraphDBLP into your local library system without a massive data mapping project. That is what lies just beyond the reach of this data project.

Don Knuth Needs Your Help

Monday, January 22nd, 2018

Donald Knuth Turns 80, Seeks Problem-Solvers For TAOCP

From the post:

An anonymous reader writes:

When 24-year-old Donald Knuth began writing The Art of Computer Programming, he had no idea that he’d still be working on it 56 years later. This month he also celebrated his 80th birthday in Sweden with the world premiere of Knuth’s Fantasia Apocalyptica, a multimedia work for pipe organ and video based on the Bible’s Book of Revelation, which Knuth describes as “50 years in the making.”

But Knuth also points to the recent publication of “one of the most important sections of The Art of Computer Programming” in preliminary paperback form: Volume 4, Fascicle 6: Satisfiability. (“Given a Boolean function, can its variables be set to at least one pattern of 0s and 1s that will make the function true?”)

Here’s an excerpt from its back cover:

Revolutionary methods for solving such problems emerged at the beginning of the twenty-first century, and they’ve led to game-changing applications in industry. These so-called “SAT solvers” can now routinely find solutions to practical problems that involve millions of variables and were thought until very recently to be hopelessly difficult.

“in several noteworthy cases, nobody has yet pointed out any errors…” Knuth writes on his site, adding “I fear that the most probable hypothesis is that nobody has been sufficiently motivated to check these things out carefully as yet.” He’s uncomfortable printing a hardcover edition that hasn’t been fully vetted, and “I would like to enter here a plea for some readers to tell me explicitly, ‘Dear Don, I have read exercise N and its answer very carefully, and I believe that it is 100% correct,'” where N is one of the exercises listed on his web site.

Elsewhere he writes that two “pre-fascicles” — 5a and 5b — are also available for alpha-testing. “I’ve put them online primarily so that experts in the field can check the contents before I inflict them on a wider audience. But if you want to help debug them, please go right ahead.”
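
If you want a feel for the problem before volunteering, here is a brute-force satisfiability check in Python. It is the opposite of the “revolutionary methods” on the back cover, a toy that only works for a handful of variables, but it makes the definition above concrete:

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    """Try every 0/1 assignment; clauses use DIMACS-style signed,
    1-based variable indices, e.g. -2 means 'not x2'."""
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return bits
    return None  # unsatisfiable

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
print(brute_force_sat([[1, -2], [2, 3], [-1, -3]], 3))
```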

Do you have some other leisure project for 2018 that is more important?

😉

Weird machines, exploitability, and provable unexploitability

Thursday, December 21st, 2017

Weird machines, exploitability, and provable unexploitability by Thomas Dullien (IEEE pre-print, to appear IEEE Transactions on Emerging Topics in Computing)

Abstract:

The concept of exploit is central to computer security, particularly in the context of memory corruptions. Yet, in spite of the centrality of the concept and voluminous descriptions of various exploitation techniques or countermeasures, a good theoretical framework for describing and reasoning about exploitation has not yet been put forward.

A body of concepts and folk theorems exists in the community of exploitation practitioners; unfortunately, these concepts are rarely written down or made sufficiently precise for people outside of this community to benefit from them.

This paper clarifies a number of these concepts, provides a clear definition of exploit, a clear definition of the concept of a weird machine, and how programming of a weird machine leads to exploitation. The paper also shows, somewhat counterintuitively, that it is feasible to design some software in a way that even powerful attackers – with the ability to corrupt memory once – cannot gain an advantage.

The approach in this paper is focused on memory corruptions. While it can be applied to many security vulnerabilities introduced by other programming mistakes, it does not address side channel attacks, protocol weaknesses, or security problems that are present by design.

A common vocabulary to bridge the gap between ‘Exploit practitioners’ (EPs) and academic researchers. Whether it will in fact bridge that gap remains to be seen, but even the attempt will prove useful.

Tracing the use/propagation of Dullien’s vocabulary across Google’s Project Zero reports and papers would provide a unique data set on the spread (or not) of a new vocabulary in computer science.

Not to mention being a way to map back into earlier literature with the newer vocabulary, via a topic map.

BTW, Dullien’s statement that “it is feasible to design some software in a way that even powerful attackers … cannot gain an advantage” is speculation and should not dampen your holiday spirits. (I root for the hare and not the hounds as a rule.)

Lisp at the Frontier of Computation

Saturday, December 9th, 2017

Abstract:

Since the 1950s, Lisp has been used to describe and calculate in cutting-edge fields like artificial intelligence, robotics, symbolic mathematics, and advanced optimizing compilers. It is no surprise that Lisp has also found relevance in quantum computation, both in academia and industry. Hosted at Rigetti Computing, a quantum computing startup in Berkeley, Robert Smith will provide a pragmatic view of the technical, sociological, and psychological aspects of working with an interdisciplinary team, writing Lisp, to build the next generation of technology resource: the quantum computer.

ABOUT THE SPEAKER: Robert has been using Lisp for over a decade, and has been fortunate to work with and manage expert teams of Lisp programmers to build embedded fingerprint analysis systems, machine learning-based product recommendation software, metamaterial phased-array antennas, discrete differential geometric computer graphics software, and now quantum computers. As Director of Software Engineering, Robert is responsible for building the publicly available Rigetti Forest platform, powered by both a real quantum computer and one of the fastest single-node quantum computer simulators in the world.

Video notes mention “poor audio quality.” Not the best but clear and audible to me.

The coverage of the quantum computing work is great, but the talk is mostly a general promotion of Lisp.

Important links:

Forest (beta): provides development access to our 30-qubit simulator, the Quantum Virtual Machine™, and limited access to our quantum hardware systems for select partners. Workshop video plus numerous other resources.

A Practical Quantum Instruction Set Architecture by Robert S. Smith, Michael J. Curtis, William J. Zeng. (speaker plus two of his colleagues)
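
If the quantum side of the talk is new to you, here is a minimal sketch of what a quantum virtual machine does under the hood: multiply a state vector by gate matrices. This is plain numpy, not the Forest API, and the gates are the textbook definitions:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.array([1, 0, 0, 0], dtype=complex)  # |00>
state = np.kron(H, I) @ state                  # Hadamard on qubit 0
state = CNOT @ state                           # entangle the two qubits
probs = np.abs(state) ** 2
print(dict(zip(["00", "01", "10", "11"], probs.round(3))))
# {'00': 0.5, '01': 0.0, '10': 0.0, '11': 0.5} -- a Bell state
```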

The Computer Science behind a modern distributed data store

Thursday, December 7th, 2017

From the description:

What we see in the modern data store world is a race between different approaches to achieve distributed and resilient storage of data. Every application needs a stateful layer which holds the data. There are at least three necessary ingredients, which are anything but trivial to combine, and of course even more challenging when heading for acceptable performance.

Over the past years there has been significant progress in both the science and the practical implementation of such data stores. In his talk Max Neunhöffer will introduce the audience to some of the needed ingredients, address the difficulties of their interplay, and show four modern approaches to distributed open-source data stores.

Topics are:

  • Challenges in developing a distributed, resilient data store
  • Consensus, distributed transactions, distributed query optimization and execution
  • The inner workings of ArangoDB, Cassandra, Cockroach and RethinkDB

The talk will touch complex and difficult computer science, but will at the same time be accessible to and enjoyable by a wide range of developers.
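
As a taste of one of those ingredients, here is a toy sketch of quorum reads and writes (R + W > N guarantees every read quorum overlaps the latest write quorum). It is my illustration, not how ArangoDB, Cassandra, Cockroach or RethinkDB actually implement replication:

```python
import random

N, W, R = 5, 3, 3  # replicas, write quorum, read quorum; R + W > N
replicas = [{"value": None, "version": 0} for _ in range(N)]

def write(value, version):
    for replica in random.sample(replicas, W):  # reach any W replicas
        replica.update(value=value, version=version)

def read():
    answers = random.sample(replicas, R)        # reach any R replicas
    return max(answers, key=lambda r: r["version"])["value"]

write("v1", 1)
write("v2", 2)
print(read())  # always "v2": any 3 replicas overlap the 3 that saw the last write
```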

I haven’t found the slides for this presentation but did stumble across ArangoDB Tech Talks and Slides.

Neunhöffer’s presentation will make you look at ArangoDB more closely.

So You Want to be a WIZARD [Spoiler Alert: It Requires Work]

Monday, November 20th, 2017

So You Want to be a WIZARD by Julia Evans.

I avoid using terms like inspirational, transforming, etc. because it is so rare that software, projects, or presentations merit those terms.

Today I am making an exception to that rule to say:

So You Want to be a Wizard by Julia Evans can transform your work in computer science.

Notice the use of “can” in that sentence. No guarantees because unlike many promised solutions, Julia says up front that hard work is required to use her suggestions successfully.

That’s right. If these methods don’t work for you it will be because you did not apply them. (full stop)

No guarantees you will get praise, promotions, recognition, etc., as a result of using Julia’s techniques, but you will be a wizard nonetheless.

One consolation is that wizards rarely notice back-biters, office sycophants, and a range of other toxic co-workers. They are too busy preparing themselves to answer the next technical issue that requires a wizard.

10 Papers Every Developer Should Read (At Least Twice) [With Hyperlinks]

Thursday, November 16th, 2017

10 Papers Every Developer Should Read (At Least Twice) by Michael Feathers

Feathers omits hyperlinks for the 10 papers every developer should read, at least twice.

Hyperlinks eliminate searches by every reader, saving them time and load on their favorite search engine, not to mention providing access more quickly. Feathers’ list with hyperlinks follows.

Most are easy to read but some are rough going – they drop off into math after the first few pages. Take the math to tolerance and then move on. The ideas are the important thing.

See Feathers’ post for his comments on each paper.

Even a shallow web composed of hyperlinks is better than no web at all.

A Primer for Computational Biology

Thursday, November 9th, 2017

A Primer for Computational Biology by Shawn T. O’Neil.

From the webpage:

A Primer for Computational Biology aims to provide life scientists and students the skills necessary for research in a data-rich world. The text covers accessing and using remote servers via the command-line, writing programs and pipelines for data analysis, and provides useful vocabulary for interdisciplinary work. The book is broken into three parts:

  1. Introduction to Unix/Linux: The command-line is the “natural environment” of scientific computing, and this part covers a wide range of topics, including logging in, working with files and directories, installing programs and writing scripts, and the powerful “pipe” operator for file and data manipulation.
  2. Programming in Python: Python is both a premier language for learning and a common choice in scientific software development. This part covers the basic concepts in programming (data types, if-statements and loops, functions) via examples of DNA-sequence analysis. This part also covers more complex subjects in software development such as objects and classes, modules, and APIs.
  3. Programming in R: The R language specializes in statistical data analysis, and is also quite useful for visualizing large datasets. This third part covers the basics of R as a programming language (data types, if-statements, functions, loops and when to use them) as well as techniques for large-scale, multi-test analyses. Other topics include S3 classes and data visualization with ggplot2.

Pass along to life scientists and students.
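
For a taste of the part 2 material, the kind of first exercise the book has in mind looks like this (my example, not taken from the text):

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

print(gc_content("ATGCGCATTAGC"))  # 0.5
```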

This isn’t a primer that separates the CS material from domain-specific examples and prose. Adapting it to another domain is a question of re-writing.

I assume an adaptable primer wasn’t the author’s intention and so that isn’t a criticism but an observation that basic material is written over and over again, needlessly.

I first saw this in a tweet by Christophe Lalanne.

3 Reasons to Read: Algorithms to Live By

Monday, April 24th, 2017

How Algorithms can untangle Human Questions. Interview with Brian Christian by Roberto V. Zicari.

The entire interview is worth your study but the first question and answer establish why you should read Algorithms to Live By:

Q1. You have worked with cognitive scientist Tom Griffiths (professor of psychology and cognitive science at UC Berkeley) to show how algorithms used by computers can also untangle very human questions. What are the main lessons learned from such a joint work?

Brian Christian: I think ultimately there are three sets of insights that come out of the exploration of human decision-making from the perspective of computer science.

The first, quite simply, is that identifying the parallels between the problems we face in everyday life and some of the canonical problems of computer science can give us explicit strategies for real-life situations. So-called “explore/exploit” algorithms tell us when to go to our favorite restaurant and when to try something new; caching algorithms suggest — counterintuitively — that the messy pile of papers on your desk may in fact be the optimal structure for that information.

Second is that even in cases where there is no straightforward algorithm or easy answer, computer science offers us both a vocabulary for making sense of the problem, and strategies — using randomness, relaxing constraints — for making headway even when we can’t guarantee we’ll get the right answer every time.

Lastly and most broadly, computer science offers us a radically different picture of rationality than the one we’re used to seeing in, say, behavioral economics, where humans are portrayed as error-prone and irrational. Computer science shows us that being rational means taking the costs of computation — the costs of decision-making itself — into account. This leads to a much more human, and much more achievable picture of rationality: one that includes making mistakes and taking chances.
… (emphasis in original)
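
The “explore/exploit” strategies Christian mentions map onto the multi-armed bandit problem. Here is a minimal epsilon-greedy sketch with made-up restaurant payoffs, my illustration rather than an example from the book:

```python
import random

payoffs = {"favorite": 0.7, "new place A": 0.5, "new place B": 0.9}  # hypothetical
estimates = {name: 0.0 for name in payoffs}
counts = {name: 0 for name in payoffs}
epsilon = 0.1  # fraction of visits spent exploring

for _ in range(10_000):
    if random.random() < epsilon:
        choice = random.choice(list(payoffs))       # explore
    else:
        choice = max(estimates, key=estimates.get)  # exploit the best estimate
    reward = 1 if random.random() < payoffs[choice] else 0
    counts[choice] += 1
    estimates[choice] += (reward - estimates[choice]) / counts[choice]

print(max(estimates, key=estimates.get))  # usually "new place B"
```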

After the 2016 U.S. presidential election, I thought the verdict that humans are error-prone and irrational was unassailable.

Looking forward to the use of a human constructed lens (computer science) to view “human questions.” There are answers to “human questions” baked into computer science so watching the authors unpack those will be an interesting read. (Waiting for my copy to arrive.)

Just so you know, the Picador edition is a reprint. It was originally published by William Collins, 21/04/2016 in hardcover, see: Algorithms to Live By, a short review by Roberto Zicari, October 24, 2016.

Sci Hub It!

Friday, April 7th, 2017

Sci Hub It!

Simple add-on to make it easier to use Sci-Hub.

If you aren’t already using this plug-in for Firefox you should be.

Quite handy!

Enjoy!

Notes to (NUS) Computer Science Freshmen…

Monday, March 13th, 2017

Notes to (NUS) Computer Science Freshmen, From The Future

From the intro:

Early into the AY12/13 academic year, Prof Tay Yong Chiang organized a supper for Computer Science freshmen at Tembusu College. The bunch of seniors who were gathered there put together a document for NUS computing freshmen. This is that document.

Feel free to create a pull request to edit or add to it, and share it with other freshmen you know.

There is one sad note:


The Art of Computer Programming (a review of everything in Computer Science; pretty much nobody, save Knuth, has finished reading this)

When you think about the amount of time Knuth has spent researching, writing and editing The Art of Computer Programming (TAOCP), it doesn’t sound unreasonable to expect others, a significant number of others, to have read it.

Any online reading groups focused on TAOCP?

New Spaceship Speed in Conway’s Game of Life

Saturday, January 14th, 2017

New Spaceship Speed in Conway’s Game of Life by Alexy Nigin.

From the post:

In this article, I assume that you have basic familiarity with Conway’s Game of Life. If this is not the case, you can try reading an explanatory article but you will still struggle to understand much of the following content.

The day before yesterday ConwayLife.com forums saw a new member named zdr. When we the lifenthusiasts meet a newcomer, we expect to see things like “brand new” 30-cell 700-gen methuselah and then have to explain why it is not notable. However, what zdr showed us made our jaws drop.

It was a 28-cell c/10 orthogonal spaceship:

[Animated image of the spaceship]

… (emphasis in the original)

The mentioned introduction isn’t sufficient to digest the material in this post.
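
If you are starting from zero, the rules themselves fit in a few lines. A minimal step function over a set of live cells, offered for orientation only (nothing like the optimized search tools lifenthusiasts actually use):

```python
from collections import Counter

def life_step(live):
    """One generation of Conway's Life; `live` is a set of (x, y) cells."""
    neighbours = Counter((x + dx, y + dy)
                         for x, y in live
                         for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                         if (dx, dy) != (0, 0))
    return {cell for cell, n in neighbours.items()
            if n == 3 or (n == 2 and cell in live)}

glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    glider = life_step(glider)
print(sorted(glider))  # the glider reappears, shifted one cell diagonally
```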

There is a wealth of material available on cellular automata (the Game of Life is one).

LifeWiki is one and Complex Cellular Automata is another. While they are not exhaustive of all there is to know about cellular automata, gaining familiarity with them will take some time and skill.

Still, I offer this as encouragement that fundamental discoveries remain to be made.

But if and only if you reject conventional wisdom that prevents you from looking.

D-Wave Just Open-Sourced Quantum Computing [DC Beltway Parking Lot Distraction]

Friday, January 13th, 2017

D-Wave Just Open-Sourced Quantum Computing by Dom Galeon.

D-Wave has just released a welcome distraction for CS types sitting in the DC Beltway Parking Lot on January 20-21, 2017. (I’m assuming you brought extra batteries for your laptop.) After you run out of gas, your laptop will be running on battery power alone.

Just remember to grab a copy of Qbsolv before you leave for the tailgate/parking lot party on the Beltway.

A software tool known as Qbsolv allows developers to program D-Wave’s quantum computers even without knowledge of quantum computing. It has already made it possible for D-Wave to work with a bunch of partners, but the company wants more. “D-Wave is driving the hardware forward,” Bo Ewald, president of D-Wave International, told Wired. “But we need more smart people thinking about applications, and another set thinking about software tools.”

To that end, D-Wave has open-sourced Qbsolv, making it possible for anyone to freely share and modify the software. D-Wave hopes to build an open source community of sorts for quantum computing. Of course, to actually run this software, you’d need access to a piece of hardware that uses quantum particles, like one of D-Wave’s quantum computers. However, for the many who don’t have that access, the company is making it possible to download a D-Wave simulator that can be used to test Qbsolv on other types of computers.

This open-source Qbsolv joins an already-existing free software tool called Qmasm, which was developed by one of Qbsolv’s first users, Scott Pakin of Los Alamos National Laboratory. “Not everyone in the computer science community realizes the potential impact of quantum computing,” said mathematician Fred Glover, who’s been working with Qbsolv. “Qbsolv offers a tool that can make this impact graphically visible, by getting researchers and practitioners involved in charting the future directions of quantum computing developments.”

D-Wave’s machines might still be limited to solving optimization problems, but it’s a good place to start with quantum computers. Together with D-Wave, IBM has managed to develop its own working quantum computer in 2000, while Google teamed up with NASA to make their own. Eventually, we’ll have a quantum computer that’s capable of performing all kinds of advanced computing problems, and now you can help make that happen.

From the github page:

qbsolv is a metaheuristic or partitioning solver that solves a potentially large quadratic unconstrained binary optimization (QUBO) problem by splitting it into pieces that are solved either on a D-Wave system or via a classical tabu solver.
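
To see what a QUBO instance actually is, here is a brute-force sketch, no D-Wave hardware or qbsolv required. The three-variable matrix is made up; qbsolv exists precisely because real instances have thousands of variables and cannot be enumerated like this:

```python
from itertools import product
import numpy as np

# A QUBO asks: minimize x^T Q x over binary vectors x.
Q = np.array([[-1,  2,  0],
              [ 0, -1,  2],
              [ 0,  0, -1]])

best = min(product([0, 1], repeat=3),
           key=lambda x: np.array(x) @ Q @ np.array(x))
print(best, np.array(best) @ Q @ np.array(best))  # (1, 0, 1) with energy -2
```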

The phrase, “…might still be limited to solving optimization problems…” isn’t as limiting as it might appear.

A recent (2014) survey of quadratic unconstrained binary optimization (QUBO), The Unconstrained Binary Quadratic Programming Problem: A Survey runs some thirty-three pages and should keep you occupied however long you sit on the DC Beltway.

From page 10 of the survey:


Kochenberger, Glover, Alidaee, and Wang (2005) examine the use of UBQP as a tool for clustering microarray data into groups with high degrees of similarity.

Where I read one person’s “similarity” to be another person’s test of “subject identity.”

PS: Enjoy the DC Beltway. You may never see it motionless ever again.

OpenTOC (ACM SIG Proceedings – Free)

Sunday, January 1st, 2017

OpenTOC

From the webpage:

ACM OpenTOC is a unique service that enables Special Interest Groups to generate and post Tables of Contents for proceedings of their conferences enabling visitors to download the definitive version of the contents from the ACM Digital Library at no charge.

Downloads of these articles are captured in official ACM statistics, improving the accuracy of usage and impact measurements. Consistently linking to definitive versions of ACM articles should reduce user confusion over article versioning.

Conferences are listed by year, 2014 – 2016 and by event.

A step in the right direction.

Do you know if the digital library allows bulk downloading of search result metadata?

It didn’t the last time I had a digital library subscription. Contacting the secret ACM committee that decides on web features was verboten.

Enjoy this improvement in access while waiting for ACM access bottlenecks to wither and die.

Continuous Unix commit history from 1970 until today

Thursday, December 29th, 2016

Continuous Unix commit history from 1970 until today

From the webpage:

The history and evolution of the Unix operating system is made available as a revision management repository, covering the period from its inception in 1970 as a 2.5 thousand line kernel and 26 commands, to 2016 as a widely-used 27 million line system. The 1.1GB repository contains about half a million commits and more than two thousand merges. The repository employs Git system for its storage and is hosted on GitHub. It has been created by synthesizing with custom software 24 snapshots of systems developed at Bell Labs, the University of California at Berkeley, and the 386BSD team, two legacy repositories, and the modern repository of the open source FreeBSD system. In total, about one thousand individual contributors are identified, the early ones through primary research. The data set can be used for empirical research in software engineering, information systems, and software archaeology.

You can read more details about the contents, creation, and uses of this repository through this link.

Two repositories are associated with the project:

  • unix-history-repo is a repository representing a reconstructed version of the Unix history, based on the currently available data. This repository will be often automatically regenerated from scratch, so this is not a place to make contributions. To ensure replicability its users are encouraged to fork it or archive it.
  • unix-history-make is a repository containing code and metadata used to build the above repository. Contributions to this repository are welcomed.

Not everyone will find this exciting but this rocks as a resource for:

empirical research in software engineering, information systems, and software archaeology
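
For a taste of that archaeology, assuming you have cloned unix-history-repo locally, a few lines of Python will count commits per year:

```python
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "-C", "unix-history-repo", "log", "--pretty=%ad", "--date=format:%Y"],
    capture_output=True, text=True, check=True)

by_year = Counter(log.stdout.split())
for year in sorted(by_year):
    print(year, by_year[year])
```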

Need to think seriously about putting this on a low-end laptop and sealing it up in a Faraday cage.

Just in case. 😉

Low fat computing

Thursday, December 22nd, 2016

Low fat computing by Karsten Schmidt

A summary by Malcolm Sparks of Schmidt’s presentation, along with the presentation itself.

The first 15 minutes or so cover Schmidt’s background, with lots of strange and 3-D printable eye candy. It really starts to rock around 20 minutes in, with Forth code and very low-level coding.

To get a better idea of what Schmidt has been doing, see his website thi.ng, his Forth REPL in JavaScript at http://forth.thi.ng/, or his GitHub repository (GitHub: thi.ng).

Stop by at http://toxiclibs.org/ although the material there looks dated.
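
If the Forth segments of the talk are unfamiliar, the whole model is a data stack plus words that pop their arguments and push their results. A toy sketch of that evaluation model, in Python rather than Schmidt’s Forth:

```python
def forth_eval(source):
    stack = []
    words = {
        "+":   lambda: stack.append(stack.pop() + stack.pop()),
        "*":   lambda: stack.append(stack.pop() * stack.pop()),
        "dup": lambda: stack.append(stack[-1]),
        ".":   lambda: print(stack.pop()),
    }
    for token in source.split():
        if token in words:
            words[token]()            # execute a word
        else:
            stack.append(int(token))  # push a literal
    return stack

forth_eval("3 dup * 4 dup * + .")  # prints 25, i.e. 3*3 + 4*4
```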

Operating Systems Design and Implementation (12th USENIX Symposium)

Thursday, November 17th, 2016

Operating Systems Design and Implementation (12th USENIX Symposium) – Savannah, GA, USA, November 2-4, 2016.

Message from the OSDI ’16 Program Co-Chairs:

We are delighted to welcome you to the 12th USENIX Symposium on Operating Systems Design and Implementation, held in Savannah, GA, USA! This year’s program includes a record high 47 papers that represent the strength of our community and cover a wide range of topics, including security, cloud computing, transaction support, storage, networking, formal verification of systems, graph processing, system support for machine learning, programming languages, troubleshooting, and operating systems design and implementation.

Weighing in at seven hundred and ninety-seven (797) pages, this tome will prove more than sufficient to avoid annual family arguments during the holiday season.

Not to mention this is an opportunity to hone your skills to a fine edge.

Understanding the fundamentals of attacks (Theory of Exploitation)

Thursday, November 3rd, 2016

Understanding the fundamentals of attacks – What is happening when someone writes an exploit? by Halvar Flake / Thomas Dullien.

The common “bag of tricks,” as Halvar refers to these techniques, does cover all the major data breaches of the last 24 months.

No zero-day exploits.

Certainly none of the deep analysis offered by Halvar here.

Still, you owe it to yourself and your future on one side or the other of computer security, to review these slides and references carefully.

Even though Halvar concludes (in part)

Exploitation is programming emergent weird machines.

It does not require EIP/RIP, and is not a bag of tricks.

Theory of exploitation is still in embryonic stage.

Imagine the advantages of having mastered the art of exploitation theory at its inception.

In an increasingly digital world, you may be worth your own weight in gold. 😉

PS: Specifying the subject identity properties of exploits will assist in organizing them for future use/defense.

One expert hacker is like a highly skilled warrior.

Making exploits easy to discover/use by average hackers is like a skilled warrior facing a company of average fighters.

The outcome will be bloody, but never in doubt.

The Hanselminutes Podcast

Friday, August 26th, 2016

The Hanselminutes Podcast: Fresh Air for Developers by Scott Hanselman.

I went looking for Felienne’s podcast on code smells and discovered, along with it, The Hanselminutes Podcast: Fresh Air for Developers!

Felienne’s podcast is #542 so there is a lot of content to enjoy! (I checked the archive. Yes, there really are 542 episodes as of today.)

Exploring Code Smells in code written by Children

Friday, August 26th, 2016

Exploring Code Smells in code written by Children (podcast) by Dr. Felienne

From the description:

Felienne is always learning. In exploring her PhD dissertation and her public speaking experience it’s clear that she has no intent on stopping! Most recently she’s been exploring a large corpus of Scratch programs looking for Code Smells. How do children learn how to code, and when they do, does their code “smell?” Is there something we can do when teaching to promote cleaner, more maintainable code?

Felienne discusses a paper due to appear in September on analysis of 250K Scratch programs for code smells.

Thoughts on teaching programmers to detect bug smells?

If You Believe In OpenAccess, Do You Practice OpenAccess?

Wednesday, June 15th, 2016

CSC-OpenAccess LIBRARY

From the webpage:

CSC Open-Access Library aim to maintain and develop access to journal publication collections as a research resource for students, teaching staff, researchers and industrialists.

You can see a complete listing of the journals here.

Before you protest these are not Science or Nature, remember that Science and Nature did not always have the reputations they do today.

Let the quality of your work bolster the reputations of open access publications and attract others to them.

How to Run a Russian Hacking Ring [Just like Amway, Mary Kay … + Career Advice]

Sunday, June 12th, 2016

How to Run a Russian Hacking Ring by Kaveh Waddell.

From the post:

A man with intense eyes crouches over a laptop in a darkened room, his face and hands hidden by a black ski mask and gloves. The scene is lit only by the computer screen’s eerie glow.

Exaggerated portraits of malicious hackers just like this keep popping up in movies and TV, despite the best efforts of shows like Mr. Robot to depict hackers in a more realistic way. Add a cacophony of news about data breaches that have shaken the U.S. government, taken entire hospital systems hostage, and defrauded the international banking system, and hackers start to sound like omnipotent super-villains.

But the reality is, as usual, less dramatic. While some of the largest cyberattacks have been the work of state-sponsored hackers—the OPM data breach that affected millions of Americans last year, for example, or the Sony hack that revealed Hollywood’s intimate secrets​—the vast majority of the world’s quotidian digital malice comes from garden-variety hackers.

What a downer this would be at career day at the local high school.

Yes, you too can be a hacker but it’s as dull as anything you have seen in Dilbert.

Your location plays an important role in whether Russian hacking ring employment is in your future. Kaveh reports:


Even the boss’s affiliates, who get less than half of each ransom that they extract, make a decent wage. They earned an average of 600 dollars a month, or about 40 percent more than the average Russian worker.

$600/month is ok, if you are living in Russia, not so hot if you aspire to Venice Beach. (It’s too bad the beach cam doesn’t pan and zoom.)

The level of technical skill required for low-hanging-fruit hacking is falling, meaning more competitors at the low end. Potential profits are going to fall even further.

The no-liability rule for buggy software will fall sooner rather than later, and skilled hackers (I mean security researchers) will find themselves in demand by both plaintiffs and defendants. You will earn more money if you can appear in court; some expert witnesses make $600/hour or more. (Compare the $600/month in Russia.)

Even if you can’t appear in court, for reasons that seem good to you, fleshing out the details of hacks is going to be on demand from all sides.

You may start at the shallow end of the pool, but resolve not to stay there. Read deeply, practice every day, stay current on new developments and opportunities, and contribute to online communities.

“This guy’s arrogance takes your breath away”

Tuesday, May 31st, 2016

“This guy’s arrogance takes your breath away” – Letters between John W Backus and Edsger W Dijkstra, 1979 by Jiahao Chen.

From the post:

Item No. 155: Correspondence with Edsger Dijkstra. 1979

At the time of this correspondence, Backus had just won the 1977 Turing Award and had chosen to talk about his then-current research on functional programming (FP) for his award lecture in Seattle. See this pdf of the published version, noting that Backus himself described “significant differences” with the talk that was actually given. Indeed, the transcript at the LoC was much more casual and easier to follow.

Dijkstra, in his characteristically acerbic and hyperbolic style, wrote a scathing public review (EWD 692) and some private critical remarks in what looks like a series of letters with Backus.

From what I can tell, these letters are not part of the E. W. Dijkstra archives at UT Austin, nor are they available online anywhere else. So here they are for posterity.

You won’t find long-form exchanges such as these in the present-day, near-instant bait-and-reply cycles of email.

That’s unfortunate.

Chen has created a Github repository if you are interested in transcribing pre-email documents.

You can help create better access to the history of computer science and see how to craft a cutting remark, as opposed to blurting out the first insult that comes to mind.

Enjoy!

Tip #20: Play with Racket [Computer Science for Everyone?]

Saturday, January 30th, 2016

Tip #20: Play with Racket by Aaron Quint and Michael R. Bernstein.

From the post:

Racket is a programming language in the Lisp tradition that is different from other programming languages in a few important ways. It can be any language you want – because Racket is heavily used for pedagogy, it has evolved into a suite of languages and tools that you can use to explore as many different programming paradigms as you can think of. You can also download it and play with it right now, without installing anything else, or knowing anything at all about computers or programming. Watching Matthias Felleisen’s “big-bang: the world, universe, and network in the programming language” talk will give you an idea of how Racket can be used to help people learn how to think about mathematics, computation, and more. Try it out even if you “hate Lisp” or “don’t know how to program” – it’s really a lot of fun.

Aaron and Michael scooped President Obama’s computer-science-skills-for-everyone pitch by a day:

President Barack Obama said Saturday he will ask Congress for billions of dollars to help students learn computer science skills and prepare for jobs in a changing economy.

“In the new economy, computer science isn’t an optional skill. It’s a basic skill, right along with the three R’s,” Obama said in his weekly radio and Internet address….(Obama Wants $4B to Help Students Learn Computer Science)

“Computer science for everyone” is a popular chant, but consider the Insecure Internet of Things (IIoT).

Will minimal computer science skills increase or decrease the level of security for the IIoT?

That’s what I think too.

Removal of IoT components is the only real defense. Expect a vibrant cottage industry to grow up around removing IoT components.

Everything You Know About Latency Is Wrong

Thursday, December 24th, 2015

Everything You Know About Latency Is Wrong by Tyler Treat.

From the post:

Okay, maybe not everything you know about latency is wrong. But now that I have your attention, we can talk about why the tools and methodologies you use to measure and reason about latency are likely horribly flawed. In fact, they’re not just flawed, they’re probably lying to your face.

When I went to Strange Loop in September, I attended a workshop called “Understanding Latency and Application Responsiveness” by Gil Tene. Gil is the CTO of Azul Systems, which is most renowned for its C4 pauseless garbage collector and associated Zing Java runtime. While the workshop was four and a half hours long, Gil also gave a 40-minute talk called “How NOT to Measure Latency” which was basically an abbreviated, less interactive version of the workshop. If you ever get the opportunity to see Gil speak or attend his workshop, I recommend you do. At the very least, do yourself a favor and watch one of his recorded talks or find his slide decks online.

The remainder of this post is primarily a summarization of that talk. You may not get anything out of it that you wouldn’t get out of the talk, but I think it can be helpful to absorb some of these ideas in written form. Plus, for my own benefit, writing about them helps solidify it in my head.

Great post, not only for the discussion of latency but for two extensions to the admonition from The Moon Is a Harsh Mistress, “Always cut cards”:

  • Always understand the nature of your data.
  • Always understand the nature of your methodology.

If you fail at either of those, the results presented to you, or that you present to others, may be true, false, or irrelevant, and you won’t know which.

Treat’s post is just one example in a vast sea of data and methodologies which are just as misleading if not more so.

If you need motivation to put in the work, how’s your comfort level with being embarrassed in public? Like someone demonstrating your numbers are BS.
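
If you want to see how a mean can lie, here is a toy sketch with made-up numbers: a service that responds in about 10 ms 98% of the time and occasionally stalls for half a second. None of this is from Tene’s talk, which goes much further (percentile plots, coordinated omission), but it makes the basic point:

```python
import random

samples = [random.gauss(10, 1) if random.random() < 0.98 else 500.0
           for _ in range(100_000)]
samples.sort()

mean = sum(samples) / len(samples)
p50 = samples[int(0.50 * len(samples))]
p99 = samples[int(0.99 * len(samples))]

print(f"mean {mean:.1f} ms, median {p50:.1f} ms, p99 {p99:.1f} ms")
# mean ~20 ms looks fine; p99 ~500 ms is what the slowest requests actually feel like
```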

Readings in Database Systems, 5th Edition (Kindle Stuffer)

Tuesday, December 15th, 2015

Readings in Database Systems, 5th Edition, Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker, editors.

From the webpage:

  1. Preface [HTML] [PDF]
  2. Background introduced by Michael Stonebraker [HTML] [PDF]
  3. Traditional RDBMS Systems introduced by Michael Stonebraker [HTML] [PDF]
  4. Techniques Everyone Should Know introduced by Peter Bailis [HTML] [PDF]
  5. New DBMS Architectures introduced by Michael Stonebraker [HTML] [PDF]
  6. Large-Scale Dataflow Engines introduced by Peter Bailis [HTML] [PDF]
  7. Weak Isolation and Distribution introduced by Peter Bailis [HTML] [PDF]
  8. Query Optimization introduced by Joe Hellerstein [HTML] [PDF]
  9. Interactive Analytics introduced by Joe Hellerstein [HTML] [PDF]
  10. Languages introduced by Joe Hellerstein [HTML] [PDF]
  11. Web Data introduced by Peter Bailis [HTML] [PDF]
  12. A Biased Take on a Moving Target: Complex Analytics
    by Michael Stonebraker [HTML] [PDF]
  13. A Biased Take on a Moving Target: Data Integration
    by Michael Stonebraker [HTML] [PDF]

Complete Book: [HTML] [PDF]

Readings Only: [HTML] [PDF]

Previous Editions: [HTML]

Citations to the “readings” do not present themselves as hyperlinks, but they are.

If you are giving someone a Kindle this Christmas, consider pre-loading Readings in Database Systems, along with the readings as a Kindle stuffer.

The Moral Failure of Computer Scientists [Warning: Scam Alert!]

Sunday, December 13th, 2015

The Moral Failure of Computer Scientists by Kaveh Waddell.

From the post:

Computer scientists and cryptographers occupy some of the ivory tower’s highest floors. Among academics, their work is prestigious and celebrated. To the average observer, much of it is too technical to comprehend. The field’s problems can sometimes seem remote from reality.

But computer science has quite a bit to do with reality. Its practitioners devise the surveillance systems that watch over nearly every space, public or otherwise—and they design the tools that allow for privacy in the digital realm. Computer science is political, by its very nature.

That’s at least according to Phillip Rogaway, a professor of computer science at the University of California, Davis, who has helped create some of the most important tools that secure the Internet today. Last week, Rogaway took his case directly to a roomful of cryptographers at a conference in Auckland, New Zealand. He accused them of a moral failure: By allowing the government to construct a massive surveillance apparatus, the field had abused the public trust. Rogaway said the scientists had a duty to pursue social good in their work.

He likened the danger posed by modern governments’ growing surveillance capabilities to the threat of nuclear warfare in the 1950s, and called upon scientists to step up and speak out today, as they did then.

I spoke to Rogaway about why cryptographers fail to see their work in moral terms, and the emerging link between encryption and terrorism in the national conversation. A transcript of our conversation appears below, lightly edited for concision and clarity.

I don’t disagree with Rogaway that all science and technology is political. I might use the term social instead but I agree, there are no neutral choices.

Having said that, I do disagree that Rogaway has the standing to pre-package a political stance colored as “morals” and denounce others as “immoral” if they disagree.

It is one of the oldest tricks in rhetoric but quite often effective, which is why people keep using it.

If Rogaway is correct that CS and technology are political, then his stance for a particular take on government, surveillance and cryptography is equally political.

Not that I disagree with his stance, but I don’t consider it be a moral choice.

Anything you can do to impede, disrupt or interfere with any government surveillance is fine by me. I won’t complain. But that’s because government surveillance, the high-tech kind, is a waste of time and effort.

Rogaway uses scientists who spoke out in the 1950s about the threat of nuclear warfare as an example. Some example.

The Federation of American Scientists estimates that as of September 2015, there are approximately 15,800 nuclear weapons in the world.

Hmmm, doesn’t sound like their moral outrage was very effective does it?

There will be sessions, presentations, conferences, along with comped travel and lodging, publications for tenure, etc., but the sum of the discussion of morality in computer science will be largely the same.

The reason for the sameness of result is that discussions, papers, resolutions and the rest aren’t nearly as important as the ethical/moral choices you make in the day-to-day practice of computer science.

Choices in the practice of computer science make a difference, discussions of fictional choices don’t. It’s really that simple.*

*That’s not entirely fair. The industry of discussing moral choices without making any of them is quite lucrative and it depletes the bank accounts of those snared by it. So in that sense it does make a difference.

Order of Requirements Matter

Tuesday, December 8th, 2015

Sam Lightstone posted to Twitter a great illustration of why the order of requirements can matter:

[Image: asimov-robotics]

Visualizations rarely get much clearer.

You could argue that Minard’s map of Napoleon’s invasion of Russia is equally clear:

[Image: 600px-Minard – Minard’s map of Napoleon’s Russian campaign]

But Minard drew with the benefit of hindsight, not foresight.

The Laws of Robotics, on the other hand, have predictive value for the different orders of requirements.

I don’t know how many requirements Honeywell had for the Midas and Midas Black Gas Detectors but you can bet IP security was near the end of the list, if explicit at all.

IP security should be #1 with a bullet, especially for devices that detect ammonia (caustic, hazardous), arsine (highly toxic, flammable), chlorine (extremely dangerous, poisonous to all living organisms), hydrogen cyanide, and hydrogen fluoride (“Hydrogen fluoride is a highly dangerous gas, forming corrosive and penetrating hydrofluoric acid upon contact with living tissue. The gas can also cause blindness by rapid destruction of the corneas.”)

When IP security is not the first requirement, it’s not hard to foresee the outcome, an Insecure Internet of Things.

Is that what we want?