Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

June 1, 2014

More Science for Computer Science

Filed under: Computer Science,Design,Science,UML — Patrick Durusau @ 6:44 pm

In Debunking Linus’s Law with Science I pointed you to a presentation by Felienne Hermans outlining why the adage:

given enough eyeballs, all bugs are shallow

is not only false but that the exact opposite is in fact true: the more people who participate in the development of software, the more bugs it will contain.

Remarkably, I have found another instance of the scientific method being applied to computer science.

The abstract for On the use of software design models in software development practice: an empirical investigation by Tony Gorschek, Ewan Tempero, and Lefteris Angelis, reads as follows:

Research into software design models in general, and into the UML in particular, focuses on answering the question how design models are used, completely ignoring the question if they are used. There is an assumption in the literature that the UML is the de facto standard, and that use of design models has had a profound and substantial effect on how software is designed by virtue of models giving the ability to do model-checking, code generation, or automated test generation. However for this assumption to be true, there has to be significant use of design models in practice by developers.

This paper presents the results of a survey summarizing the answers of 3785 developers answering the simple question on the extent to which design models are used before coding. We relate their use of models with (i) total years of programming experience, (ii) open or closed development, (iii) educational level, (iv) programming language used, and (v) development type.

The answer to our question was that design models are not used very extensively in industry, and where they are used, the use is informal and without tool support, and the notation is often not UML. The use of models decreased with an increase in experience and increased with higher level of qualification. Overall we found that models are used primarily as a communication and collaboration mechanism where there is a need to solve problems and/or get a joint understanding of the overall design in a group. We also conclude that models are seldom updated after initially created and are usually drawn on a whiteboard or on paper.

I plan on citing this paper the next time someone claims that UML diagrams will be useful for readers of a standard.

If you are interested in fact-correction issues at Wikipedia, you might want to flag the following statement in the article on UML:

UML has been found useful in many design contexts,[5] so much so that it has become ubiquitous in its field.

At least the second half of it, “so much so that it has become ubiquitous in its field,” appears to be false.

Do you know of any other uses of science with regard to computer science?

I first saw this in a tweet by Erik Meijer.

May 30, 2014

Debunking Linus’s Law with Science

Filed under: Computer Science,Science — Patrick Durusau @ 1:14 pm

Putting the science in computer science by Felienne Hermans.

From the description:

Programmers love science! At least, so they say. Because when it comes to the ‘science’ of developing code, the most used tool is brutal debate. Vim versus emacs, static versus dynamic typing, Java versus C#, this can go on for hours on end. In this session, software engineering professor Felienne Hermans will present the latest research in software engineering that tries to understand and explain what programming methods, languages and tools are best suited for different types of development.

Felienne dispels the notion that a discipline is scientific because it claims “science” as part of its name.

To inject some “science” into “computer science,” she reports tests of several propositions, widely held in CS circles, that don’t bear up when “facts” are taken into account.

For example, Linus’s Law: “Given enough eyeballs, all bugs are shallow.”

“Debunking” may not be strong enough because, as Felienne shows, the exact opposite of Linus’s Law is true: The more people who touch code, the more bugs are introduced.

If some proprietary software house rejoices over that fact, you can point out that the complexity of the originating organization also has a direct relationship to bugs: more bugs, not fewer.

That’s what happens when you go looking for facts. Old sayings turn out to be untrue, and people you already viewed with suspicion turn out to be more incompetent than you thought.

That’s science.

May 17, 2014

Types, and two approaches to problem solving

Filed under: Computer Science,Problem Solving,Types — Patrick Durusau @ 6:39 pm

Types, and two approaches to problem solving by Dan Piponi.

From the post:

There are two broad approaches to problem solving that I see frequently in mathematics and computing. One is attacking a problem via subproblems, and another is attacking a problem via quotient problems. The former is well known though I’ll give some examples to make things clear. The latter can be harder to recognise but there is one example that just about everyone has known since infancy.

I don’t want to spoil Dan’s surprise so all I can say is go read the post!

An intuitive appreciation for types may help you with the full monty of types.

May 12, 2014

Categories from scratch

Filed under: Category Theory,Computer Science,Mathematics — Patrick Durusau @ 7:00 pm

Categories from scratch by Raphael ‘kena’ Poss.

From the post:

Prologue

The concept of category from mathematics happens to be useful to computer programmers in many ways. Unfortunately, all “good” explanations of categories so far have been designed by mathematicians, or at least theoreticians with a strong background in mathematics, and this makes categories especially inscrutable to external audiences.

More specifically, the common explanatory route to approach categories is usually: “here is a formal specification of what a category is; then look at these known things from maths and theoretical computer science, and admire how they can be described using the notions of category theory.” This approach is only successful if the audience can fully understand a conceptual object using only its formal specification.

In practice, quite a few people only adopt conceptual objects by abstracting from two or more contexts where the concepts are applicable, instead. This is the road taken below: reconstruct the abstractions from category theory using scratches of understanding from various fields of computer engineering.

Overview

The rest of this document is structured as follows:

  1. introduction of example Topics of study: unix process pipelines, program statement sequences and signal processing circuits;
  2. Recollections of some previous knowledge about each example; highlight of interesting analogies between the examples;
  3. Identification of the analogies with existing concepts from category theory;
  4. a quick preview of Goodies from category theory;
  5. references to Further reading.

If you don’t already grok category theory, perhaps this will be the approach that tips the balance in your favor!
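
The post builds the abstraction up from examples; as a rough sketch of my own (not kena's code), here is the common pattern his three examples share, written as plain Python function composition: arrows compose, composition is associative, and there is an identity arrow.

    def compose(f, g):
        """Morphism composition: do g first, then f."""
        return lambda x: f(g(x))

    identity = lambda x: x

    # Unix-pipeline flavour: each stage is an arrow, a pipeline is their composite.
    pipeline = compose(str.split, compose(str.lower, str.strip))
    print(pipeline("  Hello Category Theory  "))   # ['hello', 'category', 'theory']

    # The category laws the examples share: identity and associativity.
    assert compose(str.lower, identity)("ABC") == compose(identity, str.lower)("ABC")
    assert compose(compose(str.split, str.lower), str.strip)(" X Y ") == \
           compose(str.split, compose(str.lower, str.strip))(" X Y ")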

May 2, 2014

Experimental CS – Networks

Filed under: Computer Science,Networks — Patrick Durusau @ 8:00 pm

Design and analysis of experiments in networks: Reducing bias from interference by Dean Eckles, Brian Karrer, and Johan Ugander.

Abstract:

Estimating the effects of interventions in networks is complicated when the units are interacting, such that the outcomes for one unit may depend on the treatment assignment and behavior of many or all other units (i.e., there is interference). When most or all units are in a single connected component, it is impossible to directly experimentally compare outcomes under two or more global treatment assignments since the network can only be observed under a single assignment. Familiar formalism, experimental designs, and analysis methods assume the absence of these interactions, and result in biased estimators of causal effects of interest. While some assumptions can lead to unbiased estimators, these assumptions are generally unrealistic, and we focus this work on realistic assumptions. Thus, in this work, we evaluate methods for designing and analyzing randomized experiments that aim to reduce this bias and thereby reduce overall error. In design, we consider the ability to perform random assignment to treatments that is correlated in the network, such as through graph cluster randomization. In analysis, we consider incorporating information about the treatment assignment of network neighbors. We prove sufficient conditions for bias reduction through both design and analysis in the presence of potentially global interference. Through simulations of the entire process of experimentation in networks, we measure the performance of these methods under varied network structure and varied social behaviors, finding substantial bias and error reductions. These improvements are largest for networks with more clustering and data generating processes with both stronger direct effects of the treatment and stronger interactions between units.

Deep sledding but that is to be expected as CS matures and abandons simplistic models, such as non-interaction between units in a network.

While I was reading the abstract, it occurred to me that merges that precipitate other merges could be said to cause interaction between topics.

Since the authors found error reduction in networks with as few as 1,000 vertices, you should not wait until you are building very large topic maps to take this paper into account.
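
For a sense of what "graph cluster randomization" from the abstract means in practice, here is a minimal sketch of the assignment step only (my own illustration, not the authors' code): treatment is randomized per cluster rather than per node, so a node and its within-cluster neighbors share an assignment. The clustering step itself is assumed to have happened already, and the node and cluster labels below are made up.

    import random

    def cluster_randomize(clusters, p_treat=0.5, seed=0):
        """Assign treatment at the cluster level (graph cluster randomization sketch).

        `clusters` maps node -> cluster id, produced by any graph clustering step.
        Returns node -> 0/1 treatment assignment.
        """
        rng = random.Random(seed)
        arm = {c: int(rng.random() < p_treat) for c in set(clusters.values())}
        return {node: arm[c] for node, c in clusters.items()}

    # Hypothetical toy clustering of six nodes into two clusters.
    clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
    print(cluster_randomize(clusters))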

Find Papers We Love

Filed under: Computation,Computer Science,CS Lectures — Patrick Durusau @ 7:29 pm

Find Papers We Love by Zachary Tong.

A search interface to the Github repository maintained by @Papers_we_love.

It’s not “big data” but this search interface is going to make my life better.

You?

PS: Papers We Love

From their homepage:

What was the last paper within the realm of computing you read and loved? What did it inspire you to build or tinker with? Come share the ideas in an awesome academic/research paper with fellow engineers, programmers, and paper-readers. Lead a session and show off code that you wrote that implements these ideas or just give us the lowdown about the paper (because of HARD MATH!). Otherwise, just come, listen, and discuss.

We’re curating a repository for papers and places-to-find papers. You can contribute by adding PR’s for papers, code, and/or links to other repositories.

We’re posting videos of all our presentations, from all our chapters.

This is a productive use of the Internet.

March 10, 2014

Orbital Computing – Electron Orbits That Is.

Filed under: Computer Science,HPC — Patrick Durusau @ 7:41 pm

Physicist proposes a new type of computing at SXSW. Check out orbital computing by Stacey Higginbotham.

From the post:

The demand for computing power is constantly rising, but we’re heading to the edge of the cliff in terms of increasing performance — both in terms of the physics of cramming more transistors on a chip and in terms of the power consumption. We’ve covered plenty of different ways that researchers are trying to continue advancing Moore’s Law — this idea that the number of transistors (and thus the performance) on a chip doubles every 18 months — especially the far out there efforts that take traditional computer science and electronics and dump them in favor of using magnetic spin, quantum states or probabilistic logic.

We’re going to add a new impossible that might become possible to that list thanks to Joshua Turner, a physicist at the SLAC National Accelerator Laboratory, who has proposed using the orbits of electrons around the nucleus of an atom as a new means to generate the binary states (the charge or lack of a charge that transistors use today to generate zeros and ones) we use in computing. He calls this idea orbital computing and the big takeaway for engineers is that one can switch the state of an electron’s orbit 10,000 times faster than you can switch the state of a transistor used in computing today.

That means you can still have the features of computing in that you use binary programming, but you just can compute more in less time. To get us to his grand theory, Turner had to take the SXSW audience through how computing works, how transistors work, the structure of atoms, the behavior of subatomic particles and a bunch of background on X-rays.

This would have been a presentation to see: Bits, Bittier Bits & Qubits: Physics of Computing

Try this SLAC Search for some publications by Joshua Turner.

It’s always fun to read about how computers will be able to process data more quickly. A techie sort of thing.

On the other hand, going 10,000 times faster with semantically heterogeneous data will get you to the wrong answer 10,000 times faster.

If you realize the answer is wrong, you may have time to try again.

What if you don’t realize the answer is wrong?

Do you really want to be the customs agent who stops a five-year-old because their name is similar to that of a known terrorist? Because the machine said they could not fly?

I’m excited about going faster, but worried about data going by too fast for anyone to question its semantics.

March 8, 2014

papers-we-love

Filed under: Computer Science,CS Lectures — Patrick Durusau @ 8:20 pm

papers-we-love

From the webpage:

Repository related to the following meetups:

Let us know if you are interested in starting a chapter!

A GitHub repository of CS papers.

If you decide to start a virtual “meetup” be sure to ping me. Nothing against F2F meetings (absolutely needed), but some of us can’t make them.

PS: There is also a list of other places to search for good papers.

March 2, 2014

Theoretical CS Books Online

Filed under: Books,Computer Science — Patrick Durusau @ 9:25 pm

Theoretical CS Books Online

An awesome list of theoretical CS books at Stack Exchange, Theoretical Computer Science.

If you like the online version, be sure to acquire or recommend to your library to acquire a hard copy.

Good behavior on our part may encourage good behavior on the part of publishers.

Enjoy!

I first saw this in a tweet by Computer Science.

February 28, 2014

OCW Bookshelf

Filed under: Books,Computer Science — Patrick Durusau @ 6:59 pm

OCW Bookshelf

From the post:

MIT OpenCourseWare shares the course materials from classes taught on the MIT campus. In most cases, this takes the form of course documents such as syllabi, lecture notes, assignments and exams.

Occasionally, however, we come across textbooks we can share openly. This page provides an index of textbooks (and textbook-like course notes) that can be found throughout the OCW site. (Note that in most cases, resources are listed below by the course they are associated with.)

Covers a wide range of courses from computer science and engineering to languages and math.

I expect this resource will keep growing so remember to check back from time to time.

February 6, 2014

Should Everybody Learn to Code?

Filed under: Computer Science,CS Lectures,Programming — Patrick Durusau @ 7:41 pm

Should Everybody Learn to Code? by Esther Shein.

Interesting essay but most of the suggestions read like this one:

Just as students are taught reading, writing, and the fundamentals of math and the sciences, computer science may one day become a standard part of a K–12 school curriculum. If that happens, there will be significant benefits, observers say. As the kinds of problems we will face in the future will continue to increase in complexity, the systems being built to deal with that complexity will require increasingly sophisticated computational thinking skills, such as abstraction, decomposition, and composition, says Wing.

“If I had a magic wand, we would have some programming in every science, mathematics, and arts class, maybe even in English classes, too,” says Guzdial. “I definitely do not want to see computer science on the side … I would have computer science in every high school available to students as one of their required science or mathematics classes.”

But university CS programs for the most part don’t teach people to code. Rather they teach computer science in the abstract.

Moreover, coding practice isn’t necessary to contribute to computer science, as illustrated in “History of GIS and Early Computer Cartography Project,” by John Hessler, Cartographic Specialist, Geography and Map Division, Library of Congress.

As part of a project to collect early GIS materials, the following was discovered in an archive:

One set of papers in particular, which deserves much more attention from today’s mapmakers, historians, and those interested in the foundations of current geographic thought, is the Harvard Papers in Theoretical Geography. These papers, subtitled, “Geography and the properties of surfaces,” detail the lab’s early experiments in the computer analysis of cartographic problems. They also give insight into the theoretical thinking of many early researchers as they experimented with theorems from algebraic topology, complex spatial analysis algorithms, and various forms of abstract algebras to redefine the map as a mathematical tool for geographic analysis. Reading some of the titles in the series today, for example, “Hyper-surfaces and Geodesic Lines in 4-D Euclidean Space and The Sandwich Theorem: A Basic One for Geography,” gives one a sense of the experimentation and imaginative thinking that surrounded the breakthroughs necessary for the development of our modern computer mapping systems.

And the inspiration for this work?

Aside from the technical aspects that archives like this reveal, they also show deeper connections with cultural and intellectual history. They demonstrate how the practitioners and developers of GIS found themselves compelled to draw both distinctions and parallels with ideas that were appearing in the contemporary scholarly literature on spatial and temporal reasoning. Their explorations into this literature was not limited to geographic ideas on lived human space but also drew on philosophy, cognitive science, pure mathematics, and fields like modal logic—all somehow to come to terms with the diverse phenomena that have spatiotemporal extent and that might be mapped and analyzed.

Coding is a measurable activity but being measurable doesn’t mean it is the only way to teach abstract thinking skills.

The early days of computer science, including compiler research, suggest coding isn’t required to learn abstract thinking skills.

Coding is a useful skill but let’s not confuse a skill or even computer science with abstract thinking skills. Abstract thinking is needed in many domains and we will all profit from not defining it too narrowly.

I first saw this in a tweet from Tim O’Reilly, who credits Simon St. Laurent with the discovery.

January 18, 2014

Introduction to Statistical Computing

Filed under: Computation,Computer Science,R,Statistics — Patrick Durusau @ 7:54 pm

Introduction to Statistical Computing by Cosma Shalizi.

Description:

Computational data analysis is an essential part of modern statistics. Competent statisticians must not just be able to run existing programs, but to understand the principles on which they work. They must also be able to read, modify and write code, so that they can assemble the computational tools needed to solve their data-analysis problems, rather than distorting problems to fit tools provided by others. This class is an introduction to programming, targeted at statistics majors with minimal programming knowledge, which will give them the skills to grasp how statistical software works, tweak it to suit their needs, recombine existing pieces of code, and when needed create their own programs.

Students will learn the core of ideas of programming — functions, objects, data structures, flow control, input and output, debugging, logical design and abstraction — through writing code to assist in numerical and graphical statistical analyses. Students will in particular learn how to write maintainable code, and to test code for correctness. They will then learn how to set up stochastic simulations, how to parallelize data analyses, how to employ numerical optimization algorithms and diagnose their limitations, and how to work with and filter large data sets. Since code is also an important form of communication among scientists, students will learn how to comment and organize code.

The class will be taught in the R language.

Slides and R code are available for three years of the course (as of the time of this post).

December 28, 2013

free-programming-books (878 books)

Filed under: Books,Computer Science — Patrick Durusau @ 5:09 pm

free-programming-books

A much larger collection of free books than I pointed to at Free Programming Books in October of 2011.

I count eight hundred and seventy-eight (878) entries along with twenty-two (22) pointers to other lists of free programming books.

If you were disappointed by the computer books you got for Christmas and/or didn’t get any computer books at all, you can find solace here. 😉

November 26, 2013

A Book About Bandit Algorithms

Filed under: Algorithms,Computer Science,Programming — Patrick Durusau @ 9:50 am

A Book About Bandit Algorithms by Noel Welsh.

From the introduction:

…The basic bandit problem is this:

We are alone in a room full of slot machines (fruit machines, or one-armed bandits). At every tick of the clock we must choose a machine to play. When we pull the arm on a machine we get a reward – perhaps a river of coins tumbles from the machine, or perhaps nothing at all happens – but we don’t know what reward we will get till we try the arm. By pulling arms we can learn about the machines, and we must use this knowledge to choose the best arms to pull.

Though I’ve tried to make the problem concrete (and also show how the name multi-armed bandit arises) my description leaves many things undefined:

  • I haven’t said how the machines behave. Do they always reward at the same rate, or does the rate of reward change arbitrarily, or even capriciously?
  • Do we know anything about the machines before we start? Are some of them, for example, from the same manufacturer, so we might assume they behave similarly?
  • What exactly is the goal? Are we playing this game forever, and if so do we want to make the most money possible? Alternatively, we might want to find the best machine as quickly as possible, so we can go do something more interesting with our time instead.

All of these different variations are bandit problems that we will discuss. The essence of a bandit problem is meeting two criteria. Firstly, we only learn about actions when we take them. Thus bandits are partial information problems. We also require that our actions do not change the machines’ behaviour, though their behaviour may change for other reasons. This important restriction means we don’t have to consider the effect our past actions might have on the present, and makes bandit algorithms tractable to solve. If we drop this restriction we have a problem known as reinforcement learning.
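
To make the setup concrete in code, here is a minimal sketch of epsilon-greedy, one classic strategy for the basic bandit problem described above. This is my own illustration, not code from Noel's book, and the machine payout probabilities are made up.

    import random

    def epsilon_greedy(pull, n_arms, steps, epsilon=0.1):
        """Play a bandit for `steps` rounds: explore a random arm with
        probability epsilon, otherwise exploit the best average so far."""
        counts = [0] * n_arms      # pulls per arm
        totals = [0.0] * n_arms    # cumulative reward per arm
        for _ in range(steps):
            if 0 in counts or random.random() < epsilon:
                arm = random.randrange(n_arms)            # explore
            else:
                averages = [t / c for t, c in zip(totals, counts)]
                arm = averages.index(max(averages))       # exploit
            reward = pull(arm)
            counts[arm] += 1
            totals[arm] += reward
        return sum(totals), counts

    # Made-up machines: arm i pays 1 with probability p[i], else 0.
    p = [0.2, 0.5, 0.7]
    total, counts = epsilon_greedy(lambda arm: 1 if random.random() < p[arm] else 0,
                                   n_arms=3, steps=1000)
    print(total, counts)

Note that the algorithm only learns about an arm by pulling it, which is exactly the partial-information character Noel describes.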

Is merging a “partial information problem” (PIP)?

Or does that answer depend upon the complexity of the merging criteria?

If the status of merging as a PIP rests on the complexity of merging criteria, what is the boundary where merging becomes a PIP?

A book in progress so be sure to sign up for new content and assist Noel with comments.

I first saw this mentioned in a tweet by Lars Marius Garshol.

October 18, 2013

Introduction to Data Science at Columbia University

Filed under: Computer Science,Data Science — Patrick Durusau @ 3:01 pm

Introduction to Data Science at Columbia University by Dr. Rachel Schutt.

The link points to a blog for the course. Includes entries for the same class last year.

Search for “INTRODUCTION TO DATA SCIENCE VERSION 2.0” and that should take you to the first class for Fall 2013.

Personally I would read the entries for 2012 as well.

Hard to know when a chance remark from a student will provoke new ideas from other students or even professors.

That is one of the things I like most about teaching: being challenged by insights and views that hadn’t occurred to me.

Not that I always agree with them but it is a nice break from talking to myself. 😉

I first saw this at: Columbia Intro to Data Science 2.0.

October 8, 2013

From Algorithms to Z-Scores:…

Filed under: Algorithms,Computer Science,Mathematics,Probability,R,Statistics — Patrick Durusau @ 2:47 pm

From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science by Norm Matloff.

From the Overview:

The materials here form a textbook for a course in mathematical probability and statistics for computer science students. (It would work fine for general students too.)


“Why is this text different from all other texts?”

  • Computer science examples are used throughout, in areas such as: computer networks; data and text mining; computer security; remote sensing; computer performance evaluation; software engineering; data management; etc.
  • The R statistical/data manipulation language is used throughout. Since this is a computer science audience, a greater sophistication in programming can be assumed. It is recommended that my R tutorials be used as a supplement:

  • Throughout the units, mathematical theory and applications are interwoven, with a strong emphasis on modeling: What do probabilistic models really mean, in real-life terms? How does one choose a model? How do we assess the practical usefulness of models?

    For instance, the chapter on continuous random variables begins by explaining that such distributions do not actually exist in the real world, due to the discreteness of our measuring instruments. The continuous model is therefore just that–a model, and indeed a very useful model.

    There is actually an entire chapter on modeling, discussing the tradeoff between accuracy and simplicity of models.

  • There is considerable discussion of the intuition involving probabilistic concepts, and the concepts themselves are defined through intuition. However, all models and so on are described precisely in terms of random variables and distributions.

Another open-source textbook from Norm Matloff!

Algorithms to Z-Scores (the book).

Source files for the book are available at: http://heather.cs.ucdavis.edu/~matloff/132/PLN.

Norm suggests his R tutorial, R for Programmers http://heather.cs.ucdavis.edu/~matloff/R/RProg.pdf as supplemental reading material.

To illustrate the importance of statistics, Norm gives the following examples in chapter 1:

  • The statistical models used on Wall Street made the “quants” (quantitative analysts) rich — but also contributed to the worldwide financial crash of 2008.
  • In a court trial, large sums of money or the freedom of an accused may hinge on whether the judge and jury understand some statistical evidence presented by one side or the other.
  • Wittingly or unconsciously, you are using probability every time you gamble in a casino — and every time you buy insurance.
  • Statistics is used to determine whether a new medical treatment is safe/effective for you.
  • Statistics is used to flag possible terrorists — but sometimes unfairly singling out innocent people while other times missing ones who really are dangerous.

Mastering the material in this book will put you a long way toward becoming a “statistical skeptic.”

So you can debunk misleading or simply wrong claims by government, industry and special interest groups. Wait! Those are also known as advertisers. Never mind.

September 28, 2013

Big Data Boot Camp Day 2,3 and 4, Simons Institute, Berkeley

Filed under: BigData,Computer Science,Programming — Patrick Durusau @ 7:25 pm

Big Data Boot Camp Day 2,3 and 4, Simons Institute, Berkeley by Igor Carron.

Igor has posted links to videos and supplemental materials for:

  • Algorithmic High-Dimensional Geometry II by Alex Andoni, Microsoft Research
  • High-Dimensional Statistics I by Martin Wainwright, UC Berkeley
  • High-Dimensional Statistics II by Martin Wainwright, UC Berkeley
  • Optimization I by Ben Recht, UC Berkeley
  • Optimization II by Ben Recht, UC Berkeley
  • Past, Present and Future of Randomized Numerical Linear Algebra I by Petros Drineas, Rensselaer Polytechnic Institute & Michael Mahoney, Stanford University
  • Past, Present and Future of Randomized Numerical Linear Algebra II by Petros Drineas, Rensselaer Polytechnic Institute & Michael Mahoney, Stanford University
  • Some Geometric Perspectives on Combinatorics: High-Dimensional, Local and Local-to-Global I by Nati Linial, Hebrew University of Jerusalem
  • Some Geometric Perspectives on Combinatorics: High-Dimensional, Local and Local-to-Global II by Nati Linial, Hebrew University of Jerusalem
  • Streaming, Sketching and Sufficient Statistics I by Graham Cormode, University of Warwick
  • Streaming, Sketching and Sufficient Statistics II by Graham Cormode, University of Warwick
  • Theory and Big Data by Ravi Kannan, Microsoft Research India

Just in case you have run out of video material for the weekend.

September 26, 2013

Big Data Boot Camp Day 1, Simons Institute, Berkeley

Filed under: BigData,Computer Science,Programming — Patrick Durusau @ 4:37 pm

Big Data Boot Camp Day 1, Simons Institute, Berkeley by Igor Carron.

Igor has posted links to videos and supplemental materials for:

  • Big Data: The Computation/Statistics Interface by Michael Jordan, UC Berkeley.
  • Algorithmic High-Dimensional Geometry I by Alex Andoni, Microsoft Research.
  • User-Friendly Tools for Random Matrices I, II, III by Joel Tropp, California Institute of Technology.

BTW, the Simons Institute for the Theory of Computing has a channel on YouTube.

As of September 26, 2013, some one hundred and five videos. Lots of top quality, cutting edge material.

Now, more than ever, ignorance is a self-inflicted wound.

September 21, 2013

Easy 6502

Filed under: Computer Science,Programming — Patrick Durusau @ 1:54 pm

Easy 6502 by Nick Morgan.

From the post:

In this tiny ebook I’m going to show you how to get started writing 6502 assembly language. The 6502 processor was massive in the seventies and eighties, powering famous computers like the BBC Micro, Atari 2600, Commodore 64, Apple II, and the Nintendo Entertainment System. Bender in Futurama has a 6502 processor for a brain. Even the Terminator was programmed in 6502.

So, why would you want to learn 6502? It’s a dead language isn’t it? Well, so’s Latin. And they still teach that. Q.E.D.

(Actually, I’ve been reliably informed that 6502 processors are still being produced by Western Design Center, so clearly 6502 isn’t a dead language! Who knew?)

Seriously though, I think it’s valuable to have an understanding of assembly language. Assembly language is the lowest level of abstraction in computers – the point at which the code is still readable. Assembly language translates directly to the bytes that are executed by your computer’s processor. If you understand how it works, you’ve basically become a computer magician.

Then why 6502? Why not a useful assembly language, like x86? Well, I don’t think learning x86 is useful. I don’t think you’ll ever have to write assembly language in your day job – this is purely an academic exercise, something to expand your mind and your thinking. 6502 was originally written in a different age, a time when the majority of developers were writing assembly directly, rather than in these new-fangled high-level programming languages. So, it was designed to be written by humans. More modern assembly languages are meant to be written by compilers, so let’s leave it to them. Plus, 6502 is fun. Nobody ever called x86 fun.

I’m not so sure about never needing assembly language in your day job. The security on Intel’s Ivy Bridge line of microprocessors can be broken by changing doping on the chip. Not an everyday skill. See Researchers can slip an undetectable trojan into Intel’s Ivy Bridge CPUs by Dan Goodin for the details.

I mention the 6502 ebook because programming where subject identity is an issue is different from programming where all identities are opaque.

Think about calling a method on a class in Java. Either the runtime finds the class and method or it fails. It does not look for a class with particular characteristics and invoke the method only if the class passes a subject identity test.

I mention this because it seems relevant to the scaling question for topic maps. When you are manipulating subject identities, there is more work going on than invoking opaque strings that are found or not.

Part of deciding how to deal with the additional overhead is choosing which subjects are treated as having identifiers and which are going to be treated as opaque strings.
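
A toy sketch of the difference I have in mind (the identifiers below are invented for illustration): an opaque lookup either matches the exact key or fails, while an identity-based lookup runs a test over a subject's identifying characteristics before acting, which is exactly the extra work that has to be paid for.

    # Opaque lookup: the key either matches exactly or the lookup fails.
    registry = {"isbn:978-0131103627": {"title": "The C Programming Language"}}

    def find_by_name(key):
        return registry.get(key)   # exact string match, nothing more

    # Identity-based lookup: a subject matches if it passes an identity test
    # built from its characteristics, so differently named records can merge.
    subjects = [
        {"identifiers": {"isbn:978-0131103627", "lccn:88005934"},
         "title": "The C Programming Language"},
    ]

    def find_by_identity(wanted):
        """Return the first subject sharing at least one identifier (a toy identity test)."""
        for s in subjects:
            if s["identifiers"] & set(wanted):
                return s
        return None

    print(find_by_name("lccn:88005934"))        # None: unknown opaque key
    print(find_by_identity({"lccn:88005934"}))  # found via a shared identifier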

August 8, 2013

Bret Victor – The Future of Programming

Filed under: Computer Science,Design,Programming — Patrick Durusau @ 1:32 pm

I won’t try to describe or summarize Bret’s presentation for fear I will spoil it for you.

I can say that if you aspire to make a difference in computer science, large or small, this is a video for you.

There are further materials at Bret’s website: http://worrydream.com/

How to Design Programs, Second Edition

Filed under: Computer Science,Design,Programming — Patrick Durusau @ 12:44 pm

How to Design Programs, Second Edition by Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi.

From the webpage:

Bad programming is easy. Idiots can learn it in 21 days, even if they are Dummies.

Good programming requires thought, but everyone can do it and everyone can experience the satisfaction that comes with it. The price is worth paying for the sheer joy of the discovery process, the elegance of the result, and the commercial benefits of a systematic program design process.

The goal of our book is to introduce readers of all ages and backgrounds to the craft of designing programs systematically. We assume few prerequisites: arithmetic, a tiny bit of middle school algebra, and the willingness to think through issues. We promise that the travails will pay off not just for future programmers but for anyone who has to follow a process or create one for others.

We are grateful to Ada Brunstein, our editor at MIT Press, who gave us permission to develop this second edition of “How to Design Programs” on-line.

Good to see this “classic” being revised online.

Differences: This second edition of “How to Design Programs” continues to present an introduction to systematic program design and problem solving. Here are some important differences:

  1. The recipes are applied in two different, typical settings: interactive graphical programs and so-called “batch” programs. The former mode of interaction is typical for games, the latter for data processing in business centers. Both kinds of programs are still created with our design recipes.

  2. While testing has always been a part of the “How to Design Programs” philosophy, the software started supporting it properly only in 2002, just after we had released the first edition. This new edition heavily relies on this testing support now.
  3. This edition of the book drops the design of imperative programs. The old chapters remain available on-line. The material will flow into the next volume of the book, “How to Design Components.”
  4. The book and its programs employ new libraries, also known as teachpacks. The preferred style is to link in these libraries via so-called require specifications, but it is still possible to add teachpacks via a menu in DrRacket.
  5. Finally, we decided to use a slightly different terminology:

    HtDP/1e       HtDP/2e
    contract      signature
    union         itemization

Any other foundation texts that have abandoned imperative programming?
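
For readers who have never seen the book, the "design recipe" shape is easy to show. HtDP itself teaches it in the Racket student languages, not Python; this is only my transplant of the steps (signature, purpose, examples-as-tests, then the definition).

    def fahrenheit_to_celsius(f: float) -> float:
        """Signature: float -> float.
        Purpose: convert a Fahrenheit temperature to Celsius."""
        return (f - 32) * 5 / 9

    # In the recipe, examples come first and stay around as executable tests.
    assert fahrenheit_to_celsius(32) == 0
    assert fahrenheit_to_celsius(212) == 100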

I first saw this in Nat Torkington’s Four short links: 5 August 2013.

August 4, 2013

Interesting times for literary theory

Filed under: Computer Science,Humanities,Literature — Patrick Durusau @ 3:16 pm

Interesting times for literary theory by Ted Underwood.

From the post:

(…)
This could be the beginning of a beautiful friendship. I realize a marriage between machine learning and literary theory sounds implausible: people who enjoy one of these things are pretty likely to believe the other is fraudulent and evil.** But after reading through a couple of ML textbooks,*** I’m convinced that literary theorists and computer scientists wrestle with similar problems, in ways that are at least loosely congruent. Neither field is interested in the mere accumulation of data; both are interested in understanding the way we think and the kinds of patterns we recognize in language. Both fields are interested in problems that lack a single correct answer, and have to be mapped in shades of gray (ML calls these shades “probability”). Both disciplines are preoccupied with the danger of overgeneralization (literary theorists call this “essentialism”; computer scientists call it “overfitting”). Instead of saying “every interpretation is based on some previous assumption,” computer scientists say “every model depends on some prior probability,” but there’s really a similar kind of self-scrutiny involved.
(…)

Computer science and the humanities could enrich each other greatly.

This could be a starting place for that enrichment.

DBLP feeds

Filed under: Bibliography,Computer Science — Patrick Durusau @ 2:16 pm

DBLP feeds

An RSS feed for conferences and journals that appear in the DBLP Computer Science Bibliography.

I count 1448 conference and 977 journal RSS feeds.

A great resource that merits your attention.
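
If you want to pull a feed programmatically rather than read it in a feed reader, a few lines of Python with the feedparser library will do. The feed URL below is a placeholder; substitute one of the conference or journal feeds listed on the DBLP page.

    import feedparser  # third-party: pip install feedparser

    # Placeholder URL: swap in an actual DBLP conference/journal feed.
    FEED_URL = "https://example.org/dblp-feed.rss"

    feed = feedparser.parse(FEED_URL)
    for entry in feed.entries[:10]:
        print(entry.title)
        print("  ", entry.link)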

August 1, 2013

Ponder This

Filed under: Computer Science,Contest — Patrick Durusau @ 2:30 pm

Ponder This: August 2013

From the webpage:

Welcome to our monthly puzzles

You are cordially invited to match wits with some of the best minds in IBM Research.

Seems some of us can’t see a problem without wanting to take a crack at solving it. Does that sound like you? Good. Forge ahead and ponder this month’s problem. We’ll post a new one every month, and allow a month for you to submit solutions (we may even publish submitted answers, especially if they’re correct). We usually won’t reply individually to submitted solutions but every few days we will update a list of people who answered correctly. At the beginning of the next month, we’ll post the answer.

The August 2013 puzzle starts off:

Put five-bit numbers on the vertices of a 9-dimensional hypercube such that, from any vertex, you can reach any number in no more than two moves along the edges of the hypercube.
(…)

A worked example using four-bit numbers on a 5-dimensional hypercube is given.
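
If you want to experiment before submitting, a brute-force checker is only a few lines. This sketch (mine, not IBM's) tests whether a candidate labeling of a d-dimensional cube lets every b-bit number be reached from every vertex in at most two moves. The toy labeling at the bottom is deliberately bad, just to show the interface; finding one that passes is the puzzle.

    from itertools import combinations

    def within_two_moves(v, dim):
        """All vertices of a dim-cube reachable from v in at most two edge moves."""
        yield v
        for i in range(dim):
            yield v ^ (1 << i)                      # one move
        for i, j in combinations(range(dim), 2):
            yield v ^ (1 << i) ^ (1 << j)           # two moves

    def covers(labels, dim, bits):
        """labels[vertex] is a bits-bit number; check the puzzle's reachability condition."""
        wanted = set(range(2 ** bits))
        return all({labels[u] for u in within_two_moves(v, dim)} >= wanted
                   for v in range(2 ** dim))

    toy = [v % 16 for v in range(2 ** 5)]   # naive labeling for the 5-D / 4-bit example
    print(covers(toy, dim=5, bits=4))       # False: the naive labeling does not work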

Are higher dimension data structures in your future?

July 19, 2013

Communicating Sequential Processes (CSP)

Filed under: Computer Science,Programming — Patrick Durusau @ 3:42 pm

Communicating Sequential Processes (CSP) by Tony Hoare.

From the webpage:

Communicating Sequential Processes, or CSP, is a language for describing patterns of interaction. It is supported by an elegant, mathematical theory, a set of proof tools, and an extensive literature. The book Communicating Sequential Processes was first published in 1985 by Prentice Hall International (who have kindly released the copyright); it is an excellent introduction to the language, and also to the mathematical theory.

An electronic version of the book has been produced, and may be copied, printed, and distributed free of charge. However, such copying, printing, or distribution may not: be carried out for commercial gain; or – for copyright reasons – take place within India, Pakistan, Bangladesh, Sri Lanka, or the Maldives; or involve any modification to the document itself.

Electronic version

Mailing list: csp-announce@comlab.ox.ac.uk
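
For a loose, modern illustration of the style CSP describes (processes that interact only by sending values along channels), here is a small Python sketch of my own. Bounded queues stand in for channels, which is only an approximation, since true CSP channels are synchronous rendezvous points.

    from queue import Queue
    from threading import Thread

    def producer(out):
        for n in range(5):
            out.put(n)        # send along the channel
        out.put(None)         # end-of-stream marker

    def doubler(inp, out):
        while (n := inp.get()) is not None:
            out.put(n * 2)
        out.put(None)

    def consumer(inp):
        while (n := inp.get()) is not None:
            print("got", n)

    a, b = Queue(maxsize=1), Queue(maxsize=1)   # bounded queues as stand-in channels
    threads = [Thread(target=producer, args=(a,)),
               Thread(target=doubler, args=(a, b)),
               Thread(target=consumer, args=(b,))]
    for t in threads: t.start()
    for t in threads: t.join()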

Enjoy!

June 24, 2013

Succinct data structures for representing equivalence classes

Filed under: Computer Science,Data Structures,Equivalence Class,Merging — Patrick Durusau @ 1:59 pm

Succinct data structures for representing equivalence classes by Moshe Lewenstein, J. Ian Munro, and Venkatesh Raman.

Abstract:

Given a partition of an n element set into equivalence classes, we consider time-space tradeoffs for representing it to support the query that asks whether two given elements are in the same equivalence class. This has various applications including for testing whether two vertices are in the same component in an undirected graph or in the same strongly connected component in a directed graph.

We consider the problem in several models.

— Concerning labeling schemes where we assign labels to elements and the query is to be answered just by examining the labels of the queried elements (without any extra space): if each vertex is required to have a unique label, then we show that a label space of (\sum_{i=1}^n \lfloor {n \over i} \rfloor) is necessary and sufficient. In other words, \lg n + \lg \lg n + O(1) bits of space are necessary and sufficient for representing each of the labels. This slightly strengthens the known lower bound and is in contrast to the known necessary and sufficient bound of \lceil \lg n \rceil for the label length, if each vertex need not get a unique label.

–Concerning succinct data structures for the problem when the n elements are to be uniquely assigned labels from label set {1, 2, …n}, we first show that \Theta(\sqrt n) bits are necessary and sufficient to represent the equivalence class information. This space includes the space for implicitly encoding the vertex labels. We can support the query in such a structure in O(\lg n) time in the standard word RAM model. We then develop structures resulting in one where the queries can be supported in constant time using O({\sqrt n} \lg n) bits of space. We also develop space efficient structures where union operation along with the equivalence query can be answered fast.

On the down side, this technique would not support merging based on arbitrary choice of properties.

On the up side, this technique does support merging based on pre-determined properties for merging.

The latter being the more common case, I commend this article to you for a close read.
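
For contrast with the succinct structures in the paper, here is the familiar non-succinct baseline for the same query (plus union): a plain union-find. It stores a parent pointer per element, far above the space bounds the authors prove, but it makes the "same equivalence class?" query concrete.

    class UnionFind:
        """Plain (non-succinct) baseline: same-class queries plus union."""

        def __init__(self, n):
            self.parent = list(range(n))
            self.rank = [0] * n

        def find(self, x):
            while self.parent[x] != x:
                self.parent[x] = self.parent[self.parent[x]]   # path halving
                x = self.parent[x]
            return x

        def union(self, a, b):
            ra, rb = self.find(a), self.find(b)
            if ra == rb:
                return
            if self.rank[ra] < self.rank[rb]:
                ra, rb = rb, ra
            self.parent[rb] = ra
            if self.rank[ra] == self.rank[rb]:
                self.rank[ra] += 1

        def same_class(self, a, b):
            return self.find(a) == self.find(b)

    uf = UnionFind(6)
    uf.union(0, 1)
    uf.union(1, 2)
    print(uf.same_class(0, 2), uf.same_class(0, 5))   # True False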

June 22, 2013

The Quipper Language [Quantum Computing]

Filed under: Computation,Computer Science,Functional Programming,Programming,Quantum — Patrick Durusau @ 5:26 pm

The Quipper Language

From the webpage:

Quipper is an embedded, scalable functional programming language for quantum computing. It provides, among other things:

  • A high-level circuit description language. This includes gate-by-gate descriptions of circuit fragments, as well as powerful operators for assembling and manipulating circuits.
  • A monadic semantics, allowing for a mixture of procedural and declarative programming styles.
  • Built-in facilities for automatic synthesis of reversible quantum circuits, including from classical code.
  • Support for hierarchical circuits.
  • Extensible quantum data types.
  • Programmable circuit transformers.
  • Support for three execution phases: compile time, circuit generation time, and circuit execution time. A dynamic lifting operation to allow circuit generation to be parametric on values generated at circuit execution time.
  • Extensive libraries of quantum functions, including: libraries for quantum integer and fixed-point arithmetic; the Quantum Fourier transform; an efficient Qram implementation; libraries for simulation of pseudo-classical circuits, Stabilizer circuits, and arbitrary circuits; libraries for exact and approximate decomposition of circuits into specific gate sets.

The website has a Quipper tutorial, online documentation and a detailed look at the language itself.

No link for a quantum computer but that isn’t very far off.

Learn Quipper now and perhaps you can lead the first Apache project to develop open source software for a quantum computer.

I first saw this in a tweet by Jose A. Alonso.

June 19, 2013

Symposium on Visions of the Theory of Computing

Filed under: Computer Science,CS Lectures — Patrick Durusau @ 3:08 pm

Symposium on Visions of the Theory of Computing by the Simons Institute for the Theory of Computing.

Description:

May 29-31, 2013, UC Berkeley: This three-day symposium brought together distinguished speakers and participants from the Bay Area and all over the world to celebrate both the excitement of fundamental research on the Theory of Computing, and the accomplishments and promise of computational research in effecting progress in other sciences – the two pillars of the research agenda of the Institute.

I’m not sure if it is YouTube or the people who post videos to YouTube, but customary sorting rules, such as by an author’s last name, appear to be lacking.

To facilitate your finding the lecture of your choice, I have sorted the videos by the speaker’s last name.

Should you have the occasion to post a list of videos, papers, presentations, please be courteous to readers who want to scan a list sorted by author/presenter.

It will still appear to be a random ordering to those unfamiliar with that technique. (Or ctrl-f or search engines.) 😉

June 17, 2013

From NAND to Tetris

Filed under: Computer Science,Programming — Patrick Durusau @ 4:23 pm

From NAND to Tetris: Building a Modern Computer from First Principles

From the website:

The official companion web site of Nand2Tetris courses

And of the book The Elements of Computing Systems, MIT Press, By Noam Nisan and Shimon Schocken.

The site contains all the software tools and project materials necessary to build a general-purpose computer system from the ground up. We also provide a set of lectures designed to support a typical course on the subject.

The materials are aimed at students, instructors, and self-learners. Everything is free and open-source; as long as you operate in a non-profit educational setting, you are welcome to modify and use our materials as you see fit.

Kris Geusebroek had me musing this morning about static database structures being a legacy of a shortage of CPU cycles. (I Mapreduced a Neo store [Legacy of CPU Shortages?])

What other limitations are due to system design?

This course could start you in the direction of answering that question.
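
The course builds its hardware in a simple HDL of its own, but the first rung of the ladder, every Boolean gate built from NAND alone, is easy to sketch in a few lines of Python:

    def nand(a, b):
        """The single primitive the course starts from."""
        return 0 if (a and b) else 1

    # Every other gate is some arrangement of NANDs:
    def not_(a):     return nand(a, a)
    def and_(a, b):  return nand(nand(a, b), nand(a, b))
    def or_(a, b):   return nand(not_(a), not_(b))
    def xor_(a, b):  return and_(or_(a, b), nand(a, b))

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, "->", and_(a, b), or_(a, b), xor_(a, b))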

May 26, 2013

SIAM Archives

Filed under: Algorithms,Archives,Computer Science — Patrick Durusau @ 2:05 pm

I saw an announcement for SDM 2014: SIAM International Conference on Data Mining, Philadelphia, Pennsylvania, USA, April 24 – 26, 2014, today, but the call for papers hasn’t appeared yet.

While visiting the conference site I followed the proceedings link to discover:

Workshop on Algorithm Engineering and Experiments (ALENEX) 2006 – 2013

Workshop on Analytic Algorithmics and Combinatorics (ANALCO) 2006 – 2013

ACM-SIAM Symposium on Discrete Algorithms (SODA) 2009 – 2013

Data Mining – 2001 – 2013

Mathematics for Industry 2009

Just in case you are short on reading material for the summer. 😉
