Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 30, 2016

Tip #20: Play with Racket [Computer Science for Everyone?]

Filed under: Computer Science,Education,Programming — Patrick Durusau @ 2:23 pm

Tip #20: Play with Racket by Aaron Quint and Michael R. Bernstein.

From the post:

Racket is a programming language in the Lisp tradition that is different from other programming languages in a few important ways. It can be any language you want – because Racket is heavily used for pedagogy, it has evolved into a suite of languages and tools that you can use to explore as many different programming paradigms as you can think of. You can also download it and play with it right now, without installing anything else, or knowing anything at all about computers or programming. Watching Matthias Felleisen’s “big-bang: the world, universe, and network in the programming language” talk will give you an idea of how Racket can be used to help people learn how to think about mathematics, computation, and more. Try it out even if you “hate Lisp” or “don’t know how to program” – it’s really a lot of fun.

Aaron and Michael scooped President Obama’s computer-science-skills-for-everyone pitch by a day:

President Barack Obama said Saturday he will ask Congress for billions of dollars to help students learn computer science skills and prepare for jobs in a changing economy.

“In the new economy, computer science isn’t an optional skill. It’s a basic skill, right along with the three R’s,” Obama said in his weekly radio and Internet address….(Obama Wants $4B to Help Students Learn Computer Science)

“Computer science for everyone” is a popular chant, but consider the Insecure Internet of Things (IIoT).

Will minimal computer science skills increase or decrease the level of security for the IIoT?

That’s what I think too.

Removal of IoT components is the only real defense. Expect a vibrant cottage industry to grow up around removing IoT components.

December 24, 2015

Everything You Know About Latency Is Wrong

Filed under: Computer Science,Design,Statistics — Patrick Durusau @ 9:15 pm

Everything You Know About Latency Is Wrong by Tyler Treat.

From the post:

Okay, maybe not everything you know about latency is wrong. But now that I have your attention, we can talk about why the tools and methodologies you use to measure and reason about latency are likely horribly flawed. In fact, they’re not just flawed, they’re probably lying to your face.

When I went to Strange Loop in September, I attended a workshop called “Understanding Latency and Application Responsiveness” by Gil Tene. Gil is the CTO of Azul Systems, which is most renowned for its C4 pauseless garbage collector and associated Zing Java runtime. While the workshop was four and a half hours long, Gil also gave a 40-minute talk called “How NOT to Measure Latency” which was basically an abbreviated, less interactive version of the workshop. If you ever get the opportunity to see Gil speak or attend his workshop, I recommend you do. At the very least, do yourself a favor and watch one of his recorded talks or find his slide decks online.

The remainder of this post is primarily a summarization of that talk. You may not get anything out of it that you wouldn’t get out of the talk, but I think it can be helpful to absorb some of these ideas in written form. Plus, for my own benefit, writing about them helps solidify it in my head.

Great post, not only for the discussion of latency but for two extensions to the admonition from The Moon Is a Harsh Mistress, “Always cut cards”:

  • Always understand the nature of your data.
  • Always understand the nature of your methodology.

If you fail at either of those, the results presented to you, or that you present to others, may be true, false or irrelevant, and you won’t know which.

Treat’s post covers just one example in a vast sea of data and methodologies that are just as misleading, if not more so.

If you need motivation to put in the work, how’s your comfort level with being embarrassed in public? Like someone demonstrating your numbers are BS.
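To make the point concrete, here is a minimal Python sketch (invented latency numbers, not anyone’s benchmark) of why a mean hides exactly the behavior Gil Tene warns about:

```python
# A toy distribution (invented numbers): 99% of requests take 1-5 ms,
# 1% hit a 500-2000 ms stall (think GC pauses).
import random

random.seed(42)
samples = [random.uniform(1, 5) for _ in range(9900)]
samples += [random.uniform(500, 2000) for _ in range(100)]
samples.sort()

mean = sum(samples) / len(samples)
p50 = samples[int(0.50 * len(samples))]
p99 = samples[int(0.99 * len(samples))]
p999 = samples[int(0.999 * len(samples))]

print(f"mean={mean:.1f}ms p50={p50:.1f}ms p99={p99:.1f}ms p99.9={p999:.1f}ms")
# The mean (~15 ms) describes no request any user actually saw; the tail
# percentiles expose the stalls that dominate user experience.
```

Coordinated omission, which Gil covers at length, makes real measurements even worse than this toy suggests.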

December 15, 2015

Readings in Database Systems, 5th Edition (Kindle Stuffer)

Filed under: Computer Science,Database — Patrick Durusau @ 2:28 pm

Readings in Database Systems, 5th Edition, Peter Bailis, Joseph M. Hellerstein, Michael Stonebraker, editors.

From the webpage:

  1. Preface [HTML] [PDF]
  2. Background introduced by Michael Stonebraker [HTML] [PDF]
  3. Traditional RDBMS Systems introduced by Michael Stonebraker [HTML] [PDF]
  4. Techniques Everyone Should Know introduced by Peter Bailis [HTML] [PDF]
  5. New DBMS Architectures introduced by Michael Stonebraker [HTML] [PDF]
  6. Large-Scale Dataflow Engines introduced by Peter Bailis [HTML] [PDF]
  7. Weak Isolation and Distribution introduced by Peter Bailis [HTML] [PDF]
  8. Query Optimization introduced by Joe Hellerstein [HTML] [PDF]
  9. Interactive Analytics introduced by Joe Hellerstein [HTML] [PDF]
  10. Languages introduced by Joe Hellerstein [HTML] [PDF]
  11. Web Data introduced by Peter Bailis [HTML] [PDF]
  12. A Biased Take on a Moving Target: Complex Analytics
    by Michael Stonebraker [HTML] [PDF]
  13. A Biased Take on a Moving Target: Data Integration
    by Michael Stonebraker [HTML] [PDF]

Complete Book: [HTML] [PDF]

Readings Only: [HTML] [PDF]

Previous Editions: [HTML]

Citations in the readings do not present themselves as hyperlinks, but they are.

If you are giving someone a Kindle this Christmas, consider pre-loading Readings in Database Systems, along with the readings, as a Kindle stuffer.

December 13, 2015

The Moral Failure of Computer Scientists [Warning: Scam Alert!]

Filed under: Computer Science,Ethics — Patrick Durusau @ 9:09 pm

The Moral Failure of Computer Scientists by Kaveh Waddell.

From the post:

Computer scientists and cryptographers occupy some of the ivory tower’s highest floors. Among academics, their work is prestigious and celebrated. To the average observer, much of it is too technical to comprehend. The field’s problems can sometimes seem remote from reality.

But computer science has quite a bit to do with reality. Its practitioners devise the surveillance systems that watch over nearly every space, public or otherwise—and they design the tools that allow for privacy in the digital realm. Computer science is political, by its very nature.

That’s at least according to Phillip Rogaway, a professor of computer science at the University of California, Davis, who has helped create some of the most important tools that secure the Internet today. Last week, Rogaway took his case directly to a roomful of cryptographers at a conference in Auckland, New Zealand. He accused them of a moral failure: By allowing the government to construct a massive surveillance apparatus, the field had abused the public trust. Rogaway said the scientists had a duty to pursue social good in their work.

He likened the danger posed by modern governments’ growing surveillance capabilities to the threat of nuclear warfare in the 1950s, and called upon scientists to step up and speak out today, as they did then.

I spoke to Rogaway about why cryptographers fail to see their work in moral terms, and the emerging link between encryption and terrorism in the national conversation. A transcript of our conversation appears below, lightly edited for concision and clarity.

I don’t disagree with Rogaway that all science and technology is political. I might use the term social instead but I agree, there are no neutral choices.

Having said that, I do disagree that Rogaway has the standing to pre-package a political stance colored as “morals” and denounce others as “immoral” if they disagree.

It is one of the oldest tricks in rhetoric but quite often effective, which is why people keep using it.

If Rogaway is correct that CS and technology are political, then his stance for a particular take on government, surveillance and cryptography is equally political.

Not that I disagree with his stance, but I don’t consider it to be a moral choice.

Anything you can do to impede, disrupt or interfere with any government surveillance is fine by me. I won’t complain. But that’s because government surveillance, the high-tech kind, is a waste of time and effort.

Rogaway uses scientists who spoke out in the 1950s about the threat of nuclear warfare as an example. Some example.

The Federation of American Scientists estimates that as of September 2015, there are approximately 15,800 nuclear weapons in the world.

Hmmm, doesn’t sound like their moral outrage was very effective, does it?

There will be sessions, presentations, conferences, along with comped travel and lodging, publications for tenure, etc., but the sum of the discussion of morality in computer science will remain largely the same.

The reason for the sameness of result is that discussions, papers, resolutions and the rest aren’t nearly as important as the ethical/moral choices you make in the day-to-day practice of computer science.

Choices in the practice of computer science make a difference, discussions of fictional choices don’t. It’s really that simple.*

*That’s not entirely fair. The industry of discussing moral choices without making any of them is quite lucrative and it depletes the bank accounts of those snared by it. So in that sense it does make a difference.

December 8, 2015

Order of Requirements Matter

Filed under: Computer Science,Cybersecurity,Visualization — Patrick Durusau @ 7:03 pm

Sam Lightstone posted to Twitter a great illustration of why the order of requirements can matter:

[Image: Asimov’s Three Laws of Robotics, presented as an ordered list of requirements]

Visualizations rarely get much clearer.

You could argue that Minard’s map of Napoleon’s invasion of Russia is equally clear:

[Image: Minard’s map of Napoleon’s invasion of Russia]

But Minard drew with the benefit of hindsight, not foresight.

The Laws of Robotics, on the other hand, have predictive value for the different orders of requirements.

I don’t know how many requirements Honeywell had for the Midas and Midas Black Gas Detectors but you can bet IP security was near the end of the list, if explicit at all.

IP security should be #1 with a bullet, especially for devices that detect Ammonia (caustic, hazardous), Arsine (highly toxic, flammable), Chlorine (extremely dangerous, poisonous for all living organisms), Hydrogen cyanide, and Hydrogen fluoride (“Hydrogen fluoride is a highly dangerous gas, forming corrosive and penetrating hydrofluoric acid upon contact with living tissue. The gas can also cause blindness by rapid destruction of the corneas.”)

When IP security is not the first requirement, it’s not hard to foresee the outcome, an Insecure Internet of Things.

Is that what we want?

November 29, 2015

arXiv Sanity Preserver

Filed under: Computer Science,Searching,Similarity,Similarity Retrieval,TF-IDF — Patrick Durusau @ 4:07 pm

arXiv Sanity Preserver by Andrej Karpathy.

From the webpage:

There are way too many arxiv papers, so I wrote a quick webapp that lets you search and sort through the mess in a pretty interface, similar to my pretty conference format.

It’s super hacky and was written in 4 hours. I’ll keep polishing it a bit over time perhaps but it serves its purpose for me already. The code uses Arxiv API to download the most recent papers (as many as you want – I used the last 1100 papers over last 3 months), and then downloads all papers, extracts text, creates tfidf vectors for each paper, and lastly is a flask interface for searching through and filtering similar papers using the vectors.

Main functionality is a search feature, and most useful is that you can click “sort by tfidf similarity to this”, which returns all the most similar papers to that one in terms of tfidf bigrams. I find this quite useful.

[Image: screenshot of the arXiv Sanity Preserver interface]

You can see this rather remarkable tool online at: https://karpathy23-5000.terminal.com/

Beyond its obvious utility for researchers, this could be used as a framework for experimenting with other similarity measures.
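For instance, here is a hedged sketch of the “sort by tfidf similarity” step, using scikit-learn and placeholder abstracts rather than Karpathy’s actual code or data:

```python
# A sketch of tf-idf similarity ranking, not Karpathy's implementation.
# The toy abstracts below are placeholders, not real arXiv data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "convolutional neural networks for image classification",
    "recurrent neural networks for sequence to sequence learning",
    "support vector machines for text classification",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))  # unigrams + bigrams, as in the post
tfidf = vectorizer.fit_transform(abstracts)

# Rank all papers by cosine similarity to paper 0.
scores = cosine_similarity(tfidf[0], tfidf).ravel()
for idx in scores.argsort()[::-1]:
    print(f"{scores[idx]:.3f}  {abstracts[idx]}")
```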

Enjoy!

I first saw this in a tweet by Lynn Cherny.

November 27, 2015

Best Paper Awards in Computer Science (2014)

Filed under: Computer Science,Conferences — Patrick Durusau @ 5:17 pm

Best Paper Awards in Computer Science (2014)

From the webpage:

Jeff Huang’s list of the best paper awards from 29 CS conferences since 1996 up to and including 2014.

I saw a tweet about Jeff’s site being updated to include papers from 2014.

If you are looking for reading material in a particular field, this is a good place to start.

For a complete list of the conferences, with their abbreviations expanded into full names, see Best Paper Awards in Computer Science (2013). None of them have changed, so I didn’t see the point of repeating them.

November 13, 2015

LIQUi|> – A Quantum Computing Simulator

Filed under: Computer Science,Physics,Quantum — Patrick Durusau @ 8:23 pm

With quantum computing simulator, Microsoft offers a sneak peek into future of computing by Allison Linn.

From the post:


Next week, at the SuperComputing 2015 conference in Austin, Texas, Dave Wecker, a lead architect on the QuArC team, will discuss the recent public release on GitHub of a suite of tools that allows computer scientists to simulate a quantum computer’s capabilities. That’s a crucial step in building the tools needed to run actual quantum computers.

“This is the closest we can get to running a quantum computer without having one,” said Wecker, who has helped develop the software.

The software is called Language-Integrated Quantum Operations, or LIQUi|>. The funky characters at the end refer to how a quantum operation is written in mathematical terms.

The researchers are hoping that, using LIQUi|>, computer scientists at Microsoft and other academic and research institutions will be able to perfect the algorithms they need to efficiently use a quantum computer even as the computers themselves are simultaneously being developed.

“We can actually debut algorithms in advance of running them on the computer,” Svore said.

As of today, November 13, 2015, LIQUi|> has only one (1) hit at GitHub. Will try back next week to see what the numbers look like then.

You won’t have a quantum computer by the holidays but you may have created your first quantum algorithm by then.
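If you want a head start, here is a tiny state-vector simulation in Python/numpy. To be clear, LIQUi|> has its own language and toolchain (it is not Python); this only sketches the generic idea of simulating qubits, shown on a two-qubit Bell state:

```python
# Simulating a quantum computer means tracking the state vector and
# applying unitary gates. Here: a 2-qubit Bell state.
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
I = np.eye(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

state = np.array([1, 0, 0, 0], dtype=complex)  # |00>
state = np.kron(H, I) @ state                   # Hadamard on qubit 0
state = CNOT @ state                            # entangle the qubits

probs = np.abs(state) ** 2
for basis, p in zip(["00", "01", "10", "11"], probs):
    print(f"|{basis}>: {p:.2f}")                # 0.50 / 0.00 / 0.00 / 0.50
```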

Enjoy!

November 12, 2015

The Architecture of Open Source Applications

Filed under: Books,Computer Science,Programming,Software,Software Engineering — Patrick Durusau @ 9:08 pm

The Architecture of Open Source Applications

From the webpage:

Architects look at thousands of buildings during their training, and study critiques of those buildings written by masters. In contrast, most software developers only ever get to know a handful of large programs well—usually programs they wrote themselves—and never study the great programs of history. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

Our goal is to change that. In these two books, the authors of four dozen open source applications explain how their software is structured, and why. What are each program’s major components? How do they interact? And what did their builders learn during their development? In answering these questions, the contributors to these books provide unique insights into how they think.

If you are a junior developer, and want to learn how your more experienced colleagues think, these books are the place to start. If you are an intermediate or senior developer, and want to see how your peers have solved hard design problems, these books can help you too.

Follow us on our blog at http://aosabook.org/blog/, or on Twitter at @aosabook and using the #aosa hashtag.

I happened upon these four books because of a tweet that mentioned: Early Access Release of Allison Kaptur’s “A Python Interpreter Written in Python” Chapter, which I found to be the tenth chapter of “500 Lines.”

OK, but what the hell is “500 Lines?” Poking around a bit I found The Architecture of Open Source Applications.

Which is the source for the material I quote above.

Do you learn from example?

Let me give you the flavor of three of the completed volumes and the “500 Lines” that is in progress:

The Architecture of Open Source Applications: Elegance, Evolution, and a Few Fearless Hacks (vol. 1), from the introduction:

Carpentry is an exacting craft, and people can spend their entire lives learning how to do it well. But carpentry is not architecture: if we step back from pitch boards and miter joints, buildings as a whole must be designed, and doing that is as much an art as it is a craft or science.

Programming is also an exacting craft, and people can spend their entire lives learning how to do it well. But programming is not software architecture. Many programmers spend years thinking about (or wrestling with) larger design issues: Should this application be extensible? If so, should that be done by providing a scripting interface, through some sort of plugin mechanism, or in some other way entirely? What should be done by the client, what should be left to the server, and is “client-server” even a useful way to think about this application? These are not programming questions, any more than where to put the stairs is a question of carpentry.

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well. And more often than not, those are programs they wrote themselves. They never get to see the great programs of history, or read critiques of those programs’ designs written by experienced practitioners. As a result, they repeat one another’s mistakes rather than building on one another’s successes.

This book is our attempt to change that. Each chapter describes the architecture of an open source application: how it is structured, how its parts interact, why it’s built that way, and what lessons have been learned that can be applied to other big design problems. The descriptions are written by the people who know the software best, people with years or decades of experience designing and re-designing complex applications. The applications themselves range in scale from simple drawing programs and web-based spreadsheets to compiler toolkits and multi-million line visualization packages. Some are only a few years old, while others are approaching their thirtieth anniversary. What they have in common is that their creators have thought long and hard about their design, and are willing to share those thoughts with you. We hope you enjoy what they have written.

The Architecture of Open Source Applications: Structure, Scale, and a Few More Fearless Hacks (vol. 2), from the introduction:

In the introduction to Volume 1 of this series, we wrote:

Building architecture and software architecture have a lot in common, but there is one crucial difference. While architects study thousands of buildings in their training and during their careers, most software developers only ever get to know a handful of large programs well… As a result, they repeat one another’s mistakes rather than building on one another’s successes… This book is our attempt to change that.

In the year since that book appeared, over two dozen people have worked hard to create the sequel you have in your hands. They have done so because they believe, as we do, that software design can and should be taught by example—that the best way to learn how think like an expert is to study how experts think. From web servers and compilers through health record management systems to the infrastructure that Mozilla uses to get Firefox out the door, there are lessons all around us. We hope that by collecting some of them together in this book, we can help you become a better developer.

The Performance of Open Source Applications, from the introduction:

It’s commonplace to say that computer hardware is now so fast that most developers don’t have to worry about performance. In fact, Douglas Crockford declined to write a chapter for this book for that reason:

If I were to write a chapter, it would be about anti-performance: most effort spent in pursuit of performance is wasted. I don’t think that is what you are looking for.

Donald Knuth made the same point thirty years ago:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil.

but between mobile devices with limited power and memory, and data analysis projects that need to process terabytes, a growing number of developers do need to make their code faster, their data structures smaller, and their response times shorter. However, while hundreds of textbooks explain the basics of operating systems, networks, computer graphics, and databases, few (if any) explain how to find and fix things in real applications that are simply too damn slow.

This collection of case studies is our attempt to fill that gap. Each chapter is written by real developers who have had to make an existing system faster or who had to design something to be fast in the first place. They cover many different kinds of software and performance goals; what they have in common is a detailed understanding of what actually happens when, and how the different parts of large applications fit together. Our hope is that this book will—like its predecessor The Architecture of Open Source Applications—help you become a better developer by letting you look over these experts’ shoulders.

500 Lines or Less From the GitHub page:

Every architect studies family homes, apartments, schools, and other common types of buildings during her training. Equally, every programmer ought to know how a compiler turns text into instructions, how a spreadsheet updates cells, and how a database efficiently persists data.

Previous books in the AOSA series have done this by describing the high-level architecture of several mature open-source projects. While the lessons learned from those stories are valuable, they are sometimes difficult to absorb for programmers who have not yet had to build anything at that scale.

“500 Lines or Less” focuses on the design decisions and tradeoffs that experienced programmers make when they are writing code:

  • Why divide the application into these particular modules with these particular interfaces?
  • Why use inheritance here and composition there?
  • How do we predict where our program might need to be extended, and how can we make that easy for other programmers?

Each chapter consists of a walkthrough of a program that solves a canonical problem in software engineering in at most 500 source lines of code. We hope that the material in this book will help readers understand the varied approaches that engineers take when solving problems in different domains, and will serve as a basis for projects that extend or modify the contributions here.

If you answered the question about learning from example with yes, add these works to your read-and-re-read list.
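To give a flavor of the Kaptur chapter mentioned above, here is a minimal sketch (mine, not hers) of the stack-machine idea her interpreter builds on:

```python
# Not Kaptur's chapter code, just the core idea it develops:
# Python execution as a stack machine dispatching on bytecodes.
def run(instructions):
    """Interpret a tiny instruction set with an operand stack."""
    stack = []
    for op, arg in instructions:
        if op == "LOAD_CONST":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "PRINT":
            print(stack.pop())
        else:
            raise ValueError(f"unknown opcode: {op}")

# Equivalent of: print(7 + 5)
run([("LOAD_CONST", 7), ("LOAD_CONST", 5), ("ADD", None), ("PRINT", None)])
```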

BTW, for markup folks, check out Parsing XML at the Speed of Light by Arseny Kapoulkine.

Many hours of reading and keyboard pleasure await anyone using these volumes.

Visualizing What Your Computer (and Science) Ignore (mostly)

Filed under: Computer Science,Geometry,Image Processing,Image Understanding,Physics — Patrick Durusau @ 8:01 pm

Deviation Magnification: Revealing Departures from Ideal Geometries by Neal Wadhwa, Tali Dekel, Donglai Wei, Frédo Durand, William T. Freeman.

Abstract:

Structures and objects are often supposed to have idealized geometries such as straight lines or circles. Although not always visible to the naked eye, in reality, these objects deviate from their idealized models. Our goal is to reveal and visualize such subtle geometric deviations, which can contain useful, surprising information about our world. Our framework, termed Deviation Magnification, takes a still image as input, fits parametric models to objects of interest, computes the geometric deviations, and renders an output image in which the departures from ideal geometries are exaggerated. We demonstrate the correctness and usefulness of our method through quantitative evaluation on a synthetic dataset and by application to challenging natural images.

The video for the paper is quite compelling.

Read the full paper here: http://people.csail.mit.edu/nwadhwa/deviation-magnification/DeviationMagnification.pdf

From the introduction to the paper:

Many phenomena are characterized by an idealized geometry. For example, in ideal conditions, a soap bubble will appear to be a perfect circle due to surface tension, buildings will be straight and planetary rings will form perfect elliptical orbits. In reality, however, such flawless behavior hardly exists, and even when invisible to the naked eye, objects depart from their idealized models. In the presence of gravity, the bubble may be slightly oval, the building may start to sag or tilt, and the rings may have slight perturbations due to interactions with nearby moons. We present Deviation Magnification, a tool to estimate and visualize such subtle geometric deviations, given only a single image as input. The output of our algorithm is a new image in which the deviations from ideal are magnified. Our algorithm can be used to reveal interesting and important information about the objects in the scene and their interaction with the environment. Figure 1 shows two independently processed images of the same house, in which our method automatically reveals the sagging of the house’s roof, by estimating its departure from a straight line.
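The pipeline is easy to caricature in one dimension. The sketch below is my toy analogue (invented numbers, not the paper’s algorithm): fit the idealized model, then exaggerate the residuals:

```python
# A 1-D toy analogue of Deviation Magnification: a "roofline" with an
# invisible sag, fit to its ideal straight line, residuals magnified 50x.
import numpy as np

x = np.linspace(0, 10, 11)
sag = np.array([0, 0, .01, .03, .05, .06, .05, .03, .01, 0, 0])
roofline = 0.5 * x + sag                      # nearly straight, slightly bowed

slope, intercept = np.polyfit(x, roofline, 1) # fit the idealized geometry
ideal = slope * x + intercept

magnified = ideal + 50 * (roofline - ideal)   # exaggerate the deviation 50x
print(np.round(magnified - ideal, 2))         # the invisible sag, now visible
```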

Departures from “idealized geometry” make for captivating videos but there is a more subtle point that Deviation Magnification will help bring to the fore.

“Idealized geometry,” just like discrete metrics for attitude measurement or metrics of meaning, etc., is a myth. A useful myth: houses don’t (usually) fall down, marketing campaigns succeed often enough, and engineering successfully relies on approximations that depart from the “real world.”

Science and computers have a degree of precision that has no counterpart in the “real world.”

Watch the video again if you doubt that last statement.

Whether you are using science and/or a computer, always remember that your results are approximations based upon approximations.

I first saw this in Four Short Links: 12 November 2015 by Nat Torkington.

November 5, 2015

Information Visualization MOOC 2015

Filed under: Computer Science,Graphics,Visualization — Patrick Durusau @ 2:39 pm

Information Visualization MOOC 2015 by Katy Börner.

From the webpage:

This course provides an overview about the state of the art in information visualization. It teaches the process of producing effective visualizations that take the needs of users into account.

Among other topics, the course covers:

  • Data analysis algorithms that enable extraction of patterns and trends in data
  • Major temporal, geospatial, topical, and network visualization techniques
  • Discussions of systems that drive research and development.

The MOOC ended in April of 2015 but you can still register for a self-paced version of the course.

A quick look at 2013 client projects or the current list of clients and projects, with whom students can collaborate, will leave no doubt this is a top-rank visualization course.

I first saw this in a tweet by Kirk Borne.

October 26, 2015

How to Get Free Access to Academic Papers on Twitter [3 Rules]

Filed under: Computer Science,Twitter — Patrick Durusau @ 8:56 pm

How to Get Free Access to Academic Papers on Twitter by Aamna Mohdin.

From the post:

Most academic journals charge expensive subscriptions and, for those without a login, fees of $30 or more per article. Now academics are using the hashtag #icanhazpdf to freely share copyrighted papers.

Scientists are tweeting a link of the paywalled article along with their email address in the hashtag—a riff on the infamous meme of a fluffy cat’s “I Can Has Cheezburger?” line. Someone else who does have access to the article downloads a pdf of the paper and emails the file to the person requesting it. The initial tweet is then deleted as soon as the requester receives the file.

3 rules to remember:

  1. Paywall link + #icanhazpdf + your email.
  2. Delete tweet when paper arrives.
  3. Don’t ask/Don’t tell.

Enjoy!

October 14, 2015

The Refreshingly Rewarding Realm of Research Papers

Filed under: Computer Science,Research Methods — Patrick Durusau @ 9:20 pm

From the description:

Sean Cribbs teaches us how to read and implement research papers – and translate what they describe into code. He covers examples of research implementations he’s been involved in and the relationships he’s built with researchers in the process.

A bit longer description at: http://chicago.citycode.io/sean-cribbs.html

Have you ever run into a thorny problem that makes your code slow or complicated, for which there is no obvious solution? Have you ever needed a data structure that your language’s standard library didn’t provide? You might need to implement a research paper!

While much of research in Computer Science doesn’t seem relevant to your everyday web application, all of those tools and techniques you use daily originally came from research! In this talk we’ll learn why you might want to read and implement research papers, how to read them for relevant information, and how to translate what they describe into code and test the results. Finally, we’ll discuss examples of research implementation I’ve been involved in and the relationships I’ve built with researchers in the process.

As you might imagine, I think this rocks!

September 27, 2015

The World’s First $9 Computer is Shipping Today!

Filed under: Computer Science — Patrick Durusau @ 8:43 pm

The World’s First $9 Computer is Shipping Today! by Khyati Jain.

From the post:

Remember Project: C.H.I.P. ?

A $9 Linux-based, super-cheap computer that raised some $2 Million beyond a pledge goal of just $50,000 on Kickstarter will be soon in your pockets.

Four months ago, Dave Rauchwerk, CEO of Next Thing Co., utilized the global crowd-funding corporation ‘Kickstarter’ for backing his project C.H.I.P., a fully functioning computer that offers more than what you could expect for just $9.

See Khyati’s post for technical specifications.

Security by secrecy is meaningless when potential hackers (ages 14-64) number 4.8 billion.

With enough hackers, all bugs can be found.

August 21, 2015

Solving the Stable Marriage problem…

Filed under: Computer Science,Erlang — Patrick Durusau @ 2:58 pm

Solving the Stable Marriage problem with Erlang by Yan Cui.

With all the Ashley Madison hack publicity, I didn’t know there was a “stable marriage problem.” 😉

Turns out it is like the Eight-Queens problem. It is a “problem,” but it isn’t one you are likely to encounter outside of a CS textbook.

Yan sets up the problem with this quote from Wikipedia:

The stable marriage problem is commonly stated as:

Given n men and n women, where each person has ranked all members of the opposite sex with a unique number between 1 and n in order of preference, marry the men and women together such that there are no two people of opposite sex who would both rather have each other than their current partners. If there are no such people, all the marriages are “stable”. (It is assumed that the participants are binary gendered and that marriages are not same-sex).

The wording is a bit awkward. I would rephrase it to say that in no pair do both partners prefer some other partner. One of the partners can prefer someone else, but if that someone else does not share the preference, both marriages are “stable.”

The Wikipedia article does observe:

While the solution is stable, it is not necessarily optimal from all individuals’ points of view.

Yan sets up the problem and then walks through the required code.
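Yan’s code is Erlang; for comparison, here is a hedged Python sketch of the same Gale-Shapley proposal loop:

```python
# A minimal Gale-Shapley sketch (Python, for comparison with Yan's Erlang).
def stable_marriage(men_prefs, women_prefs):
    """men_prefs/women_prefs: dicts mapping person -> ranked list of partners."""
    free_men = list(men_prefs)
    next_choice = {m: 0 for m in men_prefs}       # next woman to propose to
    engaged_to = {}                                # woman -> man
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}   # lower rank = preferred

    while free_men:
        man = free_men.pop()
        woman = men_prefs[man][next_choice[man]]
        next_choice[man] += 1
        if woman not in engaged_to:
            engaged_to[woman] = man
        elif rank[woman][man] < rank[woman][engaged_to[woman]]:
            free_men.append(engaged_to[woman])     # she trades up
            engaged_to[woman] = man
        else:
            free_men.append(man)                   # rejected, tries again
    return engaged_to

prefs_m = {"a": ["x", "y"], "b": ["y", "x"]}
prefs_w = {"x": ["b", "a"], "y": ["a", "b"]}
print(stable_marriage(prefs_m, prefs_w))
# {'y': 'b', 'x': 'a'} - each man gets his first choice (proposer advantage)
```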

Enjoy!

August 20, 2015

Free Packtpub Books (Legitimate Ones)

Filed under: Books,Computer Science — Patrick Durusau @ 2:52 pm

Packtpub Books is running a “free book per day” event. Most of you know Packtpub already so I won’t belabor the quality of their publications, etc.

The important news is that for 24 hours each day in August, Packtpub Books is offering a different book for free download! The current free book offer appears to expire at the end of August, 2015.

Packtpub Books – Free Learning

This is a great way to introduce non-Packtpub customers to Packtpub publications.

Please share this news widely (and with other publishers). 😉

June 30, 2015

The Life Cycle of Programming Languages

Filed under: Computer Science,Programming — Patrick Durusau @ 4:03 pm

The Life Cycle of Programming Languages by Betsy Haibel

I don’t know that you will agree with Betsy’s conclusion but it is an interesting read.


Fourteen years ago the authors of the Agile Manifesto said unto us: all technical problems are people problems that manifest technically. In doing so they repeated what Peopleware’s DeMarco and Lister had said fourteen years before that. We cannot break the endless cycle of broken frameworks and buggy software by pretending that broken, homogenous [sic] communities can produce frameworks that meet the varied needs of a broad developer base. We have known this for three decades.

The “homogeneous community” in question is, of course, white males.

I have no idea if the founders of the languages she mentions are all white males or not. But for purposes of argument, let’s say that the founding communities in question are exclusively white males. And intentionally so.

OK, where is the comparison case of language development demonstrating that a group more inclusive of genders, races, sexual orientations and religions would produce less broken frameworks and less buggy software, by some specified measure?

I understand the point that frameworks and code are currently broken and buggy, no argument there. No need to repeat that or come up with new examples.

The question that interests me, and I suspect would interest developers and customers alike, is this: where are the frameworks or code that are less buggy because they were created by more inclusive communities?

Inclusion will sell itself, quickly, if the case can be made that inclusive communities produce more useful frameworks or less buggy code.

In making the case for inclusion, citing studies that groups are more creative when diverse isn’t enough. Point to the better framework or less buggy code created by a diverse community. That should not be hard to do, assuming such evidence exists.

Make no mistake, I think discrimination on the basis of gender, race, sexual orientation, religion, etc. is not only illegal, it is immoral. However, the case for non-discrimination is harmed by speculative claims for improved results that are not based on facts.

Where are those facts? I would love to be able to cite them.

PS: Flames will be deleted. With others I fought gender/racial discrimination in organizing garment factories where the body heat of the workers was the only heat in the winter. Only to be betrayed by a union more interested in dues than justice for workers. Defeating discrimination requires facts, not rhetoric. (Recalling it was Brown vs. Board of Education that pioneered the use of social studies data in education litigation. They offered facts, not opinions.)

June 29, 2015

A Critical Review of Recurrent Neural Networks for Sequence Learning

Filed under: Computer Science,Machine Learning,Neural Networks — Patrick Durusau @ 1:03 pm

A Critical Review of Recurrent Neural Networks for Sequence Learning by Zachary C. Lipton.

Abstract:

Countless learning tasks require awareness of time. Image captioning, speech synthesis, and video game playing all require that a model generate sequences of outputs. In other domains, such as time series prediction, video analysis, and music information retrieval, a model must learn from sequences of inputs. Significantly more interactive tasks, such as natural language translation, engaging in dialogue, and robotic control, often demand both.

Recurrent neural networks (RNNs) are a powerful family of connectionist models that capture time dynamics via cycles in the graph. Unlike feedforward neural networks, recurrent networks can process examples one at a time, retaining a state, or memory, that reflects an arbitrarily long context window. While these networks have long been difficult to train and often contain millions of parameters, recent advances in network architectures, optimization techniques, and parallel computation have enabled large-scale learning with recurrent nets.

Over the past few years, systems based on state of the art long short-term memory (LSTM) and bidirectional recurrent neural network (BRNN) architectures have demonstrated record-setting performance on tasks as varied as image captioning, language translation, and handwriting recognition. In this review of the literature we synthesize the body of research that over the past three decades has yielded and reduced to practice these powerful models. When appropriate, we reconcile conflicting notation and nomenclature. Our goal is to provide a mostly self-contained explication of state of the art systems, together with a historical perspective and ample references to the primary research.

Lipton begins with an all too common lament:

The literature on recurrent neural networks can seem impenetrable to the uninitiated. Shorter papers assume familiarity with a large body of background literature. Diagrams are frequently underspecified, failing to indicate which edges span time steps and which don’t. Worse, jargon abounds while notation is frequently inconsistent across papers or overloaded within papers. Readers are frequently in the unenviable position of having to synthesize conflicting information across many papers in order to understand but one. For example, in many papers subscripts index both nodes and time steps. In others, h simultaneously stands for link functions and a layer of hidden nodes. The variable t simultaneously stands for both time indices and targets, sometimes in the same equation. Many terrific breakthrough papers have appeared recently, but clear reviews of recurrent neural network literature are rare.

Unfortunately, Lipton gives no pointers to where the variant practices occur, leaving the reader forewarned but not forearmed.

Still, this is a survey paper with seventy-three (73) references over thirty-three (33) pages, so I assume you will encounter various notation practices if you follow the references and current literature.

Capturing variations in notation, along with where they have been seen, won’t win the Turing Award but may improve the CS field overall.
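For readers who want the one equation that anchors all the notational variants, here is the vanilla recurrent step in a single fixed notation, as a numpy sketch (mine, not the paper’s):

```python
# The vanilla RNN step, one fixed notation:
#   h_t = tanh(W_hx x_t + W_hh h_{t-1} + b_h)
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size = 4, 3

W_hx = rng.standard_normal((hidden_size, input_size)) * 0.1
W_hh = rng.standard_normal((hidden_size, hidden_size)) * 0.1
b_h = np.zeros(hidden_size)

def rnn_forward(inputs):
    """inputs: list of x_t vectors; returns the hidden state after the sequence."""
    h = np.zeros(hidden_size)                    # h_0
    for x in inputs:
        h = np.tanh(W_hx @ x + W_hh @ h + b_h)   # state carries the context
    return h

sequence = [rng.standard_normal(input_size) for _ in range(5)]
print(rnn_forward(sequence))
```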

June 15, 2015

Project Oberon

Filed under: Computer Science,Cybersecurity,Programming,Security — Patrick Durusau @ 2:36 pm

Project Oberon

From the webpage:

Project Oberon is a design for a complete computer system. Its simplicity and clarity enables a single person to know and implement the entire system, while still providing enough power to make it useful and usable in a production environment. This website contains information and resources to help you explore and use the system. The project is fully described in a book — Project Oberon: The Design of an Operating System, a Compiler, and a Computer — written by the designers, Niklaus Wirth and Jürg Gutknecht. The second (2013) edition of the book and the accompanying code are published on Niklaus Wirth’s website. We provide links to the original material here, along with local packaged copies, with kind permission from the authors.

You are unlikely to encounter an Oberon system in production use at most government or enterprise offices. Still, the experience of knowing how computer operating systems work will enable you to ask pointed security questions and to cut through the fog of evasion.

Niklaus comments in the 2013 preface:

But surely new systems will emerge, perhaps for different, limited purposes, allowing for smaller systems. One wonders where their designers will study and learn their trade. There is little technical literature, and my conclusion is that understanding is generally gained by doing, that is, “on the job”. However, this is a tedious and suboptimal way to learn. Whereas sciences are governed by principles and laws to be learned and understood, in engineering experience and practice are indispensable. Does Computer Science teach laws that hold for (almost) ever? More than any other field of engineering, it would be predestined to be based on rigorous mathematical principles. Yet, its core hardly is. Instead, one must rely on experience, that is, on studying sound examples.

The main purpose of and the driving force behind this project is to provide a single book that serves as an example of a system that exists, is in actual use, and is explained in all detail. This task drove home the insight that it is hard to design a powerful and reliable system, but even much harder to make it so simple and clear that it can be studied and fully understood. Above everything else, it requires a stern concentration on what is essential, and the will to leave out the rest, all the popular “bells and whistles”.

Recently, a growing number of people has become interested in designing new, smaller systems. The vast complexity of popular operating systems makes them not only obscure, but also provides opportunities for “back doors”. They allow external agents to introduce spies and devils unnoticed by the user, making the system attackable and corruptible. The only safe remedy is to build a safe system anew from scratch.

Did you catch that last line?

The only safe remedy is to build a safe system anew from scratch.

We don’t all need to build diverse (safe) systems, but it does sound like a task the government could contract out to computer science departments. Adoption by the government alone would create a large enough market share to make it a viable platform.

Think of it this way: we can keep building sieves upon sieves upon sieves… the nth sieve, all the while proclaiming increasing security, or a safe system can be built. Developers should think of all the apps to be re-invented for the safe system. Something for everybody.

June 14, 2015

CVPR 2015 Papers

CVPR [Computer Vision and Pattern Recognition] 2015 Papers by @karpathy.

This is very cool!

From the webpage:

Below every paper are TOP 100 most-occuring words in that paper and their color is based on LDA topic model with k = 7.
(It looks like 0 = datasets?, 1 = deep learning, 2 = videos , 3 = 3D Computer Vision , 4 = optimization?, 5 = low-level Computer Vision?, 6 = descriptors?)

You can sort by LDA topics, view the PDFs, and rank the other papers by tf-idf similarity to a particular paper.

Very impressive and suggestive of other refinements for viewing a large number of papers in a given area.
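The LDA step is easy to approximate with scikit-learn. A hedged sketch on placeholder abstracts (not the actual CVPR corpus or Karpathy’s code):

```python
# A sketch of the "LDA topic model with k = 7" step, on placeholder data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

papers = [
    "deep learning convolutional networks image classification",
    "3d reconstruction geometry camera pose estimation",
    "video action recognition temporal segments",
    # ... in practice, the full text of every CVPR paper
]

counts = CountVectorizer(stop_words="english").fit_transform(papers)
lda = LatentDirichletAllocation(n_components=7, random_state=0)
doc_topics = lda.fit_transform(counts)    # per-paper topic mixture
print(doc_topics.argmax(axis=1))          # dominant topic per paper, for coloring
```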

Enjoy!

June 13, 2015

What Is Code?

Filed under: Computer Science,Programming — Patrick Durusau @ 8:33 pm

What Is Code? by Paul Ford.

A truly stunning exposition on programming and computers. Written for business users, it is a model of exposition for that audience. At some 38,000 words, it isn’t superficial; it is insightful.

I suggest you bookmark it and read it on a regular basis. It won’t improve your computer skills but it may improve your communication skills.

If you want to know more background on the piece, see: What Is Code?: A Q&A With Writer and Programmer Paul Ford by Ashley Feinberg

If you want to follow on Twitter: Paul Ford.

May 22, 2015

Rosetta’s Way Back to the Source

Filed under: Compilers,Computer Science,Programming,Software Engineering — Patrick Durusau @ 4:11 pm

Rosetta’s Way Back to the Source – Towards Reverse Engineering of Complex Software by Herman Bos.

From the webpage:

The Rosetta project, funded by the EU in the form of an ERC grant, aims to develop techniques to enable reverse engineering of complex software that is available only in binary form. To the best of our knowledge we are the first to start working on a comprehensive and realistic solution for recovering the data structures in binary programs (which is essential for reverse engineering), as well as techniques to recover the code. The main success criterion for the project will be our ability to reverse engineer a realistic, complex binary. Additionally, we will show the immediate usefulness of the information that we extract from the binary code (that is, even before full reverse engineering), by automatically hardening the software to make it resilient against memory corruption bugs (and attacks that exploit them).

In the Rosetta project, we target common processors like the x86, and languages like C and C++ that are difficult to reverse engineer, and we aim for full reverse engineering rather than just decompilation (which typically leaves out data structures and semantics). However, we do not necessarily aim for fully automated reverse engineering (which may well be impossible in the general case). Rather, we aim for techniques that make the process straightforward. In short, we will push reverse engineering towards ever more complex programs.

Our methodology revolves around recovering data structures, code and semantic information iteratively. Specifically, we will recover data structures not so much by statically looking at the instructions in the binary program (as others have done), but mainly by observing how the data is used

Research question. The project addresses the question whether the compilation process that translates source code to binary code is irreversible for complex software. Irreversibility of compilation is an assumed property that underlies most of the commercial software today. Specifically, the project aims to demonstrate that the assumption is false.
… (emphasis added)

Herman gives a great thumbnail sketch of the difficulties and potential for this project.

Looking forward to news of a demonstration that “irreversibility of compilation” is false.

One important use case: verifying that software which claims to use buffer-overflow prevention techniques has in fact done so. Not the sort of thing I would entrust to statements in marketing materials.

May 19, 2015

The Back-to-Basics Readings of 2012

Filed under: Computer Science,Distributed Systems — Patrick Durusau @ 4:46 pm

The Back-to-Basics Readings of 2012 by Werner Vogels (CTO – Amazon.com).

From the post:

After the AWS re: Invent conference I spent two weeks in Europe for the last customer visits of the year. I have since returned and am now in New York City enjoying a few days of winding down the last activities of the year before spending the holidays here with family. Do not expect too many blog posts or twitter updates. Although there are still a few very exciting AWS news updates to happen this year.

I thought this was a good moment to collect all the readings I suggested this year in one summary post. It was not until later in the year that I started to recording the readings here on the blog, so I hope this is indeed the complete list. I am pretty sure some if not all of these papers deserved to be elected to the hall of fame of best papers in distributed systems.

My count is twenty-four (24) papers. More than enough for a weekend at the beach! 😉

I first saw this in a tweet by Computer Science.

May 18, 2015

FreeSearch

Filed under: Computer Science,Search Behavior,Search Engines,Searching — Patrick Durusau @ 5:52 pm

FreeSearch

From the “about” page:

The FreeSearch project is a search system on top of DBLP data provided by Michael Ley. FreeSearch is a joint project of the L3S Research Center and iSearch IT Solutions GmbH.

In this project we develop new methods for simple literature search that works on any catalogs, without requiring in-depth knowledge of the metadata schema. The system helps users proactively and unobtrusively by guessing at each step what the user’s real information need is and providing precise suggestions.

A more detailed description of the system can be found in this publication: FreeSearch – Literature Search in a Natural Way.

You can choose to search across:

DBLP (4,552,889 documents)

TIBKat (2,079,012 documents)

CiteSeer (1,910,493 documents)

BibSonomy (448,166 documents)

Enjoy!

March 15, 2015

Researchers just built a free, open-source version of Siri

Filed under: Artificial Intelligence,Computer Science,Machine Learning — Patrick Durusau @ 8:05 pm

Researchers just built a free, open-source version of Siri by Jordan Novet.

From the post:

Major tech companies like Apple and Microsoft have been able to provide millions of people with personal digital assistants on mobile devices, allowing people to do things like set alarms or get answers to questions simply by speaking. Now, other companies can implement their own versions, using new open-source software called Sirius — an allusion, of course, to Apple’s Siri.

Today researchers from the University of Michigan are giving presentations on Sirius at the International Conference on Architectural Support for Programming Languages and Operating Systems in Turkey. Meanwhile, Sirius also made an appearance on Product Hunt this morning.

“Sirius … implements the core functionalities of an IPA (intelligent personal assistant) such as speech recognition, image matching, natural language processing and a question-and-answer system,” the researchers wrote in a new academic paper documenting their work. The system accepts questions and commands from a mobile device, processes information on servers, and provides audible responses on the mobile device.

Read the full academic paper (PDF) to learn more about Sirius. Find Sirius on GitHub here.

Opens up the possibility of an IPA (intelligent personal assistant) that has custom intelligence. Are your day-to-day tasks Apple cookie-cutter tasks or do they go beyond that?

The security implications are interesting as well. What if your IPA “reads” on a news stream that you have been arrested? Or if you fail to check in within some time window?

I first saw this in a tweet by Data Geek.

March 13, 2015

Building A Digital Future

Filed under: Computer Science,Data Science,Marketing — Patrick Durusau @ 7:06 pm

You may have missed BBC gives children mini-computers in Make it Digital scheme by Jane Wakefield.

From the post:

One million Micro Bits – a stripped-down computer similar to a Raspberry Pi – will be given to all pupils starting secondary school in the autumn term.

The BBC is also launching a season of coding-based programmes and activities.

It will include a new drama based on Grand Theft Auto and a documentary on Bletchley Park.

Digital visionaries

The initiative is part of a wider push to increase digital skills among young people and help to fill the digital skills gap.

The UK is facing a significant skills shortage, with 1.4 million “digital professionals” estimated to be needed over the next five years.

The BBC is joining a range of organisations including Microsoft, BT, Google, Code Club, TeenTech and Young Rewired State to address the shortfall.

At the launch of the Make it Digital initiative in London, director-general Tony Hall explained why the BBC was getting involved.

Isn’t that clever?

Odd that I haven’t heard about a similar effort in the United States.

There are only 15 million (14.6 million actually) secondary students this year in the United States and at $35 per Raspberry Pi, that’s only $525,000,000. That may sound like a lot, but remember that the 2015 budget request for the Department of Homeland Security is $38.2 Billion (yes, with a B). We are spending roughly 73 times the amount needed to buy every secondary student in the United States a Raspberry Pi on DHS. A department that has yet to catch a single terrorist.
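Checking that arithmetic:

```python
# Back-of-the-envelope check of the figures above.
students = 15_000_000          # ~14.6M secondary students, rounded up
pi_cost = 35                   # dollars per Raspberry Pi
dhs_budget = 38_200_000_000    # FY2015 DHS budget request

total = students * pi_cost
print(total)                       # 525000000, i.e. $525 million
print(round(dhs_budget / total))   # ~73
```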

There would be consequences to buying every secondary student in the United States a Raspberry Pi:

  • Manufacturers of Raspberry Pi would have a revenue stream for more improvements
  • A vast secondary markets for add-ons for Raspberry Pi computers would be born
  • An even larger market for tutors and classes on Raspberry Pi would jump start
  • Millions of secondary students would be taking positive steps towards digital literacy

The only real drawback that I foresee is that the usual suspects would not be at the budget trough.

Maybe, just this once, the importance of digital literacy and inspiring a new generation of CS researchers is worth taking that hit.

Any school districts distributing Raspberry Pis on their own to set an example for the feds?

PS: I would avoid getting drawn into “accountability” debates. Some students will profit from them, some won’t. The important aspect is development of an ongoing principle of digital literacy and supporting it. Not every child reads books from the library but every community is poorer for the lack of a well supported library.

I first saw this in a tweet by Bart Hannsens.

February 26, 2015

Structure and Interpretation of Computer Programs (LFE Edition)

Filed under: Computer Science,CS Lectures,Erlang,LFE Lisp Flavored Erlang — Patrick Durusau @ 7:40 pm

Structure and Interpretation of Computer Programs (LFE Edition)

From the webpage:

This Gitbook (available here) is a work in progress, converting the MIT classic Structure and Interpretation of Computer Programs to Lisp Flavored Erlang. We are forever indebted to Harold Abelson, Gerald Jay Sussman, and Julie Sussman for their labor of love and intelligence. Needless to say, our gratitude also extends to the MIT press for their generosity in licensing this work as Creative Commons.

Contributing

This is a huge project, and we can use your help! Got an idea? Found a bug? Let us know!

Writing, or re-writing if you are transposing a CS classic into another language, is far harder than most people imagine. Probably even more difficult than the original because your range of creativity is bound by the organization and themes of the underlying text.

I may have some cycles to donate to proofreading. Anyone else?

February 22, 2015

The Morning Paper [computing papers selected by Adrian Colyer]

Filed under: Computer Science,Distributed Computing,Programming — Patrick Durusau @ 10:58 am

The Morning Paper [computing papers selected by Adrian Colyer]

From the about page:

The Morning Paper: a short summary of an important, influential, topical or otherwise interesting paper in the field of CS every weekday. The Morning Paper started out as a twitter project (#themorningpaper), then it became clear a longer form was also necessary because some papers just have too much good content to get across in a small batch of 140-character tweets!

The daily selection will still be tweeted on my twitter account (adriancolyer), with a quote or two to whet your appetite. Any longer excerpts or commentary will live here.

Why ‘The Morning Paper?’ (a) it’s a habit I enjoy, and (b) if one or two papers catch your attention and lead you to discover (or rediscover) something of interest then I’m happy.

Adrian’s 100th post was January 7, 2015 so you have some catching up to do. 😉

Very impressive and far more useful than the recent “newspaper” formats that automatically capture content from a variety of sources.

The Morning Paper is curated content, which makes all the difference in the world.

There is an emphasis on distributed computing making The Morning Paper a must read for anyone interested in the present and future of computing services.

Enjoy!

I first saw this in a tweet by Tyler Treat.

January 9, 2015

Machine Learning (Andrew Ng) – Jan. 19th

Filed under: Computer Science,Education,Machine Learning — Patrick Durusau @ 6:00 pm

Machine Learning (Andrew Ng) – Jan. 19th

From the course page:

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.

This course provides a broad introduction to machine learning, datamining, and statistical pattern recognition. Topics include: (i) Supervised learning (parametric/non-parametric algorithms, support vector machines, kernels, neural networks). (ii) Unsupervised learning (clustering, dimensionality reduction, recommender systems, deep learning). (iii) Best practices in machine learning (bias/variance theory; innovation process in machine learning and AI). The course will also draw from numerous case studies and applications, so that you’ll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas.

I could have just posted Machine Learning, Andrew Ng and 19 Jan., but there are people who have never heard of this course. Hard to believe, but I have been assured that is in fact the case.

So the prose stuff is for them. Why are you reading this far? Go register for the course!

I have heard rumors the first course had an enrollment of over 100,000! I wonder if this course will break current records?

Enjoy!
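PS: If you want a taste of week one before registering, here is a sketch (my toy data, not course material) of the course’s opening technique, univariate linear regression fit by gradient descent:

```python
# Linear regression by gradient descent on toy data: y ~ 2x + 1 plus noise.
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 2 * x + 1 + rng.normal(0, 0.5, 50)

w, b, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    pred = w * x + b
    grad_w = ((pred - y) * x).mean()   # gradient of mean squared error / 2 wrt w
    grad_b = (pred - y).mean()         # gradient wrt b
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))        # ~2.0 and ~1.0
```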

A Master List of 1,100 Free Courses From Top Universities:…

Filed under: Computer Science,Education — Patrick Durusau @ 4:27 pm

A Master List of 1,100 Free Courses From Top Universities: 33,000 Hours of Audio/Video Lectures

From the post:

While you were eating turkey, we were busy rummaging around the internet and adding new courses to our big list of Free Online Courses, which now features 1,100 courses from top universities. Let’s give you the quick overview: The list lets you download audio & video lectures from schools like Stanford, Yale, MIT, Oxford and Harvard. Generally, the courses can be accessed via YouTube, iTunes or university web sites, and you can listen to the lectures anytime, anywhere, on your computer or smart phone. We didn’t do a precise calculation, but there’s probably about 33,000 hours of free audio & video lectures here. Enough to keep you busy for a very long time.

Right now you’ll find 127 free philosophy courses, 82 free history courses, 116 free computer science courses, 64 free physics courses and 55 Free Literature Courses in the collection, and that’s just beginning to scratch the surface. You can peruse sections covering Astronomy, Biology, Business, Chemistry, Economics, Engineering, Math, Political Science, Psychology and Religion.

OpenCulture has gathered up a large variety of materials.

Sadly I must report that Akkadian, Egyptian, Hittite, Sanskrit, and Sumerian are all missing from their language resources. Maybe next year.

In the meantime, there are a number of other course selections to enjoy!

