Archive for August, 2012

SMART – string matching research tool

Friday, August 31st, 2012

SMART – string matching research tool by Simone Faro and Thierry Lecroq.

From the webpage:

1 tool

smart (string matching algorithms research tool) provides a standard framework for researchers in string matching. It helps users test, design, evaluate and understand existing solutions for the exact string matching problem. Moreover, it provides implementations of (almost) all string matching algorithms and a wide corpus of text buffers.

40 years

In the last 40 years of research in computer science, string matching has been one of the most extensively studied problems, mainly due to its direct applications to such diverse areas as text, image and signal processing, speech analysis and recognition, data compression, information retrieval, and computational biology and chemistry. Moreover, string matching algorithms are basic components used in implementations of practical software running under most operating systems.

85 algos

Since 1970 more than 80 string matching algorithms have been proposed, more than 50% of them in the last ten years. The smart tool provides a comprehensive collection of all string matching algorithms, implemented in the C programming language, and helps researchers run experiments and compare results from a practical point of view. Smart provides a practical and standard platform for testing string matching algorithms and sharing results with the community.

12 texts

The smart tool provides also a corpus of 12 texts on which the string matching algorithms can be tested. Texts in the corpus are of different types, including natural language texts, genome sequences, protein sequences, and random texts with a uniform distribution of characters.
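For readers new to the problem the tool benchmarks, a single classical algorithm gives the flavor. The Python sketch below implements Horspool’s simplification of Boyer–Moore; it is illustrative only — smart ships C implementations, and this code is not one of them:

```python
def horspool(pattern, text):
    """Return the starting indices of all occurrences of pattern in text,
    using Horspool's simplification of Boyer-Moore."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Bad-character shift: distance from a character's last occurrence
    # (excluding the final position) to the end of the pattern.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    matches, pos = [], 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift by the last character of the current window.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

On random text the shifts are often close to the pattern length, which is why this family of algorithms is fast in practice.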

Do you know of any similar research tools for named entity recognition?

Bio-hackers will be interested in the “Complete genome of the E. Coli bacterium.”

A Fast Suffix Automata Based Algorithm for Exact Online String Matching

Friday, August 31st, 2012

A Fast Suffix Automata Based Algorithm for Exact Online String Matching by Simone Faro and Thierry Lecroq (Implementation and Application of Automata, Lecture Notes in Computer Science, 2012, Volume 7381, pp. 149–158, DOI: 10.1007/978-3-642-31606-7_13)

Abstract:

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. Automata play a very important role in the design of efficient solutions for the exact string matching problem. In this paper we propose a new, very simple solution which turns out to be very efficient in practical cases. It is based on a suitable factorization of the pattern and on a straightforward and light encoding of the suffix automaton. It turns out that on average the new technique leads to longer shifts than those obtained by other known solutions which make use of suffix automata.
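The paper’s own encoding is not reproduced here, but the earlier BNDM algorithm of Navarro and Raffinot gives the flavor of a “light encoding of the suffix automaton”: the nondeterministic suffix automaton is simulated with plain bit operations. A hedged Python sketch:

```python
def bndm(pattern, text):
    """Report all occurrences of pattern in text using BNDM, a
    bit-parallel simulation of the nondeterministic suffix automaton."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # B[c] has bit i set if c appears at position i of the reversed pattern.
    B = {}
    for i, c in enumerate(reversed(pattern)):
        B[c] = B.get(c, 0) | (1 << i)
    hibit = 1 << (m - 1)
    matches, pos = [], 0
    while pos <= n - m:
        j, last, D = m, m, (1 << m) - 1
        while D:
            D &= B.get(text[pos + j - 1], 0)
            j -= 1
            if D & hibit:
                if j > 0:
                    last = j      # a pattern prefix ends here; remember the shift
                else:
                    matches.append(pos)
                    break
            D <<= 1
        pos += last               # shift past characters that cannot match
    return matches
```

The `last` bookkeeping is what produces the long shifts the abstract refers to: the window jumps straight to the next position where a pattern prefix could align.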

I suppose it is too much to expect writers to notice matching “…pattern[s] in a text is a fundamental problem in [topic maps]….”

Pattern matching in topic maps can extend beyond strings so perhaps the oversight is understandable. 😉

Apache Hadoop YARN – ResourceManager

Friday, August 31st, 2012

Apache Hadoop YARN – ResourceManager by Arun Murthy

From the post:

This is the third post in the multi-part series to cover important aspects of the newly formed Apache Hadoop YARN sub-project. In our previous posts (part one, part two), we provided the background and an overview of Hadoop YARN, and then covered the key YARN concepts and walked you through how diverse user applications work within this new system.

In this post, we are going to delve deeper into the heart of the system – the ResourceManager.

In case your data processing needs run towards the big/large end of the spectrum.

Data mining local radio with Node.js

Friday, August 31st, 2012

Data mining local radio with Node.js by Evan Muehlhausen.

From the post:

More harpsichord?!

Seattle is lucky to have KINGFM, a local radio station dedicated to 100% classical music. As one of the few classical music fans in my twenties, I listen often enough. Over the past few years, I’ve noticed that when I tune to the station, I always seem to hear the plinky sound of a harpsichord.

Before I sent KINGFM an email admonishing them for playing so much of an instrument I dislike, I wanted to investigate whether my ears were deceiving me. Perhaps my own distaste for the harpsichord increased its impact in my memory.

This article outlines the details of this investigation and especially the process of collecting the data.
….

Another data collecting/mining post.

If you were collecting this data, how would you reliably share it with others?

In that regard, you might want to consider distinguishing members of the Bach family as a practice run.

I first saw this at DZone.

Adventures In Declarative Programming: Conway’s Game Of Life

Friday, August 31st, 2012

Adventures In Declarative Programming: Conway’s Game Of Life by Manuel Rotter.

From the post:

My first blog post about declarative programming explained how to write a Sudoku solver in the logic programming language Prolog. This time I’ll show you how to implement Conway’s Game of Life in the functional programming language Clojure.

But before that, let me explain a few general things. The first three paragraphs are for readers who are not familiar with certain concepts. Readers who already know what Clojure and Conway’s Game of Life are may feel free to skip those paragraphs. It starts getting serious at “Game of Life in Clojure”.

Having a result that interests me makes learning something new easier.

Here it is “Conway’s Game of Life,” a two-dimensional cellular automaton.
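Manuel’s implementation is in Clojure; as a language-neutral reference point, here is a minimal sketch of one generation in Python, using the common set-of-live-cells representation (not taken from his post):

```python
from collections import Counter

def life_step(live):
    """One Game of Life generation. `live` is a set of (x, y) cells."""
    # Count the live neighbours of every cell adjacent to a live cell.
    counts = Counter((x + dx, y + dy)
                     for (x, y) in live
                     for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                     if (dx, dy) != (0, 0))
    # A cell is live next generation if it has exactly 3 live neighbours,
    # or 2 live neighbours and is already alive.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}
```

Starting from a horizontal “blinker” of three cells, one step yields the vertical blinker, and a second step restores the original.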

You may also find the following of interest:

Game of Life 3D

The Game of Life in 3D (using three.js)

If you have heard of Wolfram’s A New Kind of Science, be aware the full text is online for free viewing with other materials at: Wolfram Science.

Distributed (in-memory) graph processing with Akka

Friday, August 31st, 2012

Distributed (in-memory) graph processing with Akka by Adelbert Chang.

From the post:

Graphs have always been an interesting structure to study in both mathematics and computer science (among other fields), and have become even more interesting in the context of online social networks such as Facebook and Twitter, whose underlying network structures are nicely represented by graphs.

These graphs are typically “big”, even when sub-graphed by attributes such as location or school. With “big” graphs comes the desire to extract meaningful information from them. In the age of multi-core CPUs and distributed computing, concurrent processing of graphs proves to be an important topic.

Luckily, many graph analysis algorithms are trivially parallelizable. One example that comes to mind is all-pairs shortest path. In the case of an undirected, unweighted graph, we can consider each vertex individually, and do a full BFS from each vertex.
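The “full BFS from each vertex” idea can be sketched briefly. The Python below is a sequential stand-in (Adelbert’s post uses Scala and Akka); the point is that each per-source BFS is independent, which is exactly what makes the problem easy to farm out to workers:

```python
from collections import deque

def all_pairs_shortest_paths(graph):
    """All-pairs shortest paths in an unweighted, undirected graph
    given as {vertex: set_of_neighbours}. Each per-source BFS is
    independent of the others, so the outer loop parallelizes trivially."""
    def bfs(source):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        return dist
    return {v: bfs(v) for v in graph}
```

Replacing the dictionary comprehension with a pool of workers (or Akka actors, as in the post) changes nothing about the per-source computation.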

In this post I detail a general framework for distributed graph processing. I do not use any particular graph library, so my graph class will simply be called Graph. Popular graph libraries for Scala can be found in Twitter’s Cassovary project or the Graph for Scala project.

I will also make use of Derek Wyatt’s submission to the Akka Summer of Blog—“Balancing Workloads Across Nodes with Akka 2”—which provides a nice and simple implementation of a BalancingDispatcher in the context of distributed processing.

If you like Akka, graphs, or both, you will enjoy this post.

Parsing Wikipedia Articles with Node.js and jQuery

Friday, August 31st, 2012

Parsing Wikipedia Articles with Node.js and jQuery by Ben Coe.

From the post:

For some NLP research I’m currently doing, I was interested in parsing structured information from Wikipedia articles. I did not want to use a full-featured MediaWiki parser. WikiFetch crawls a Wikipedia article using Node.js and jQuery and returns a structured JSON representation of the page.
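WikiFetch itself is Node.js/jQuery; to make the idea concrete, here is a hedged Python stand-in that turns headings and paragraphs into a crude structure. The tag choices and class name are mine, not WikiFetch’s:

```python
from html.parser import HTMLParser

class SectionParser(HTMLParser):
    """Collect a crude structured view of an article:
    a list of (heading, [paragraphs]) pairs."""
    def __init__(self):
        super().__init__()
        self.sections = []   # [(heading, [paragraph, ...]), ...]
        self._tag = None
    def handle_starttag(self, tag, attrs):
        if tag in ("h2", "p"):
            self._tag = tag
    def handle_endtag(self, tag):
        self._tag = None
    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "h2":
            self.sections.append((text, []))
        elif self._tag == "p" and self.sections:
            self.sections[-1][1].append(text)

parser = SectionParser()
parser.feed("<h2>History</h2><p>Early work.</p><h2>Uses</h2><p>Search.</p>")
```

`parser.sections` then holds the heading/paragraph pairs, which serialize to JSON directly.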

Harvesting of content (unless you are authoring all of it) is a major part of any topic map project.

Does this work for you?

Other small utilities or scripts you would recommend?

I first saw this at: DZone.

Developing CDH Applications with Maven and Eclipse

Friday, August 31st, 2012

Developing CDH Applications with Maven and Eclipse by Jon Natkins

Learn how to configure a basic Maven project that will be able to build applications against CDH

Apache Maven is a build automation tool that can be used for Java projects. Since nearly all the Apache Hadoop ecosystem is written in Java, Maven is a great tool for managing projects that build on top of the Hadoop APIs. In this post, we’ll configure a basic Maven project that will be able to build applications against CDH (Cloudera’s Distribution Including Apache Hadoop) binaries.

Maven projects are defined using an XML file called pom.xml, which describes things like the project’s dependencies on other modules, the build order, and any other plugins that the project uses. A complete example of the pom.xml described below, which can be used with CDH, is available on Github. (To use the example, you’ll need at least Maven 2.0 installed.) If you’ve never set up a Maven project before, you can get a jumpstart by using Maven’s quickstart archetype, which generates a small initial project layout.
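A minimal pom.xml of the kind the post describes might look like the following. The repository URL is Cloudera’s public Maven repository, but the artifact version shown is illustrative rather than copied from the GitHub example:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>cdh-app</artifactId>
  <version>1.0-SNAPSHOT</version>
  <repositories>
    <!-- Cloudera's repository hosts the CDH builds of Hadoop. -->
    <repository>
      <id>cloudera</id>
      <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>2.0.0-cdh4.0.0</version> <!-- illustrative CDH version -->
    </dependency>
  </dependencies>
</project>
```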

I don’t have a Fairness Doctrine, but since I had a post on make today, I thought one on Maven would not be out of place.

Both are likely to figure in active topic map/semantic application work.

BTW, since both “make” and “maven” have multiple meanings, how would you index this post to separate the uses here from other meanings?

Would it make a difference if, as appears above, some instances are surrounded with hyperlinks?

How would I indicate that the hyperlinks are identity references?

Or some subset of hyperlinks are identity references?

C Linked List Data Structure Explained ….. [Neo4j internals]

Friday, August 31st, 2012

C Linked List Data Structure Explained with an Example C Program by Himanshu Arora.

From the post:

A linked list is one of the fundamental data structures in C.

Knowledge of linked lists is a must for C programmers. This article explains the fundamentals of C linked lists with an example C program.

A linked list is a dynamic data structure whose length can be increased or decreased at run time.

How are linked lists different from arrays? Consider the following points:

  • An array is a static data structure: the length of an array cannot be altered at run time. A linked list, by contrast, is a dynamic data structure.
  • In an array, all the elements are kept at consecutive memory locations, while in a linked list the elements (or nodes) may be kept at any location but are still connected to each other.

When should you prefer linked lists over arrays? Linked lists are preferred mostly when you don’t know the volume of data to be stored. For example, in an employee management system, one cannot use arrays as they are of fixed length, while any number of new employees can join. In scenarios like these, linked lists (or other dynamic data structures) are used, as their capacity can be increased (or decreased) at run time (as and when required).
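The array-versus-linked-list contrast is not specific to C. A minimal sketch in Python (not the article’s C program) shows the run-time growth:

```python
class Node:
    """A singly linked list node: a payload plus a link to the next node."""
    def __init__(self, value, nxt=None):
        self.value = value
        self.next = nxt

def push_front(head, value):
    """Grow the list at run time -- no fixed capacity, unlike an array."""
    return Node(value, head)

def to_list(head):
    """Walk the chain of next-links and collect the values."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

Each `push_front` allocates one node wherever the runtime likes; nothing needs to be contiguous, which is the property the article’s bullet points describe.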

My Neo4J Internals (update) post pointed you to resources on Neo4j’s use of linked lists.

You may find this explanation of linked list data structures in C helpful.

Be aware that Knuth (TAOCP, vol. 1, page 233, 3rd ed.) suggests “nodes” may also be referred to as records, entities, beads, items or elements. (“Item” and “element” being the variants found in TAOCP.) I am sure there are others.

Question: Assume you have discovered an example of a variant name for “node.” (One of these or another one.) How would you use that discovery in formulating a search?

Next Generation Sequencing, GNU-Make and .INTERMEDIATE

Friday, August 31st, 2012

Next Generation Sequencing, GNU-Make and .INTERMEDIATE by Pierre Lindenbaum.

From the post:

I gave a crash course about NGS to a few colleagues today. For my demonstration I wrote a simple Makefile. Basically, it downloads a subset of human chromosome 22, indexes it with bwa, generates a set of fastqs with wgsim, aligns the fastqs, generates the *.sai, the *.sam and the *.bam, sorts the bam and calls the SNPs with mpileup.

An illustration that there is plenty of life left in GNU Make.

Plus an interesting tip on the use of .INTERMEDIATE in make scripts.
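For readers unfamiliar with the tip: files listed under .INTERMEDIATE may be deleted by make once the final target is built, and their absence alone does not trigger a rebuild. A hypothetical fragment in the spirit of Pierre’s pipeline (the bwa/samtools command lines here are illustrative, not copied from his Makefile):

```make
# The sorted BAM is the real deliverable; the .sai and .sam are scratch
# files that make may delete once sorted.bam exists.
.INTERMEDIATE: reads.sai reads.sam

sorted.bam: reads.sam
	samtools view -bS reads.sam | samtools sort - sorted

reads.sam: reads.sai
	bwa samse ref.fa reads.sai reads.fastq > reads.sam

reads.sai: reads.fastq
	bwa aln ref.fa reads.fastq > reads.sai
```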

As a starting point, consider Make (software).

Physics as a geographic map

Thursday, August 30th, 2012

Physics as a geographic map

Nathan Yau of Flowing Data points to a rendering of the subject area physics as a geographic map.

Somewhat dated (1939) but it shows a lot of creativity and no small amount of cartographic skill.

Rather than calling it a “fictional” map I would prefer to say it is an intellectual map of physics.

Like all maps, the objects appear in explicit relationships to each other, and there are no doubt as many implicit relationships as there are viewers of the map.

What continuum or dimensions would you use to create a map of modern ontologies?

That could make a very interesting exercise for a topic maps class: have students create maps and then attempt to draw out what unspoken dimensions were driving the layout between parts of the map.

Suggestions of mapping software anyone?

Recap of the August Pig Hackathon at Hortonworks

Thursday, August 30th, 2012

Recap of the August Pig Hackathon at Hortonworks by Russell Jurney.

From the post:

The August Pig Hackathon brought Pig users from Hortonworks, Yahoo, Cloudera, Visa, Kaiser Permanente, and LinkedIn to Hortonworks HQ in Sunnyvale, CA to talk and work on Apache Pig.

If you weren’t at this hackathon, Russell’s summary and pointers will make you want to attend the next one!

BTW, someone needs to tell Michael Sperberg-McQueen that Pig is being used to build generic DAG structures. Don’t worry, he’ll understand.

HTML5 Boilerplate

Thursday, August 30th, 2012

HTML5 Boilerplate

From the website:

HTML5 Boilerplate helps you build fast, robust, and adaptable web apps or sites. Kick-start your project with the combined knowledge and effort of hundreds of developers, all in one little package.

If this helps you roll out test web pages quickly, good.

If you prefer another package, please post a pointer.

The Top 10 Challenges in Extreme-Scale Visual Analytics [Human Bottlenecks and Parking Meters]

Thursday, August 30th, 2012

The Top 10 Challenges in Extreme-Scale Visual Analytics by Pak Chung Wong, Han-Wei Shen, Christopher R. Johnson, Chaomei Chen, and Robert B. Ross. (Link to PDF. IEEE Computer Graphics and Applications, July-Aug. 2012, pp. 63–67)

The top 10 challenges are:

  1. In Situ Interactive Analysis
  2. User-Driven Data Reduction
  3. Scalability and Multilevel Hierarchy
  4. Representing Evidence and Uncertainty
  5. Heterogeneous-Data Fusion
  6. Data Summarization and Triage for Interactive Query
  7. Analytics of Temporally Evolved Features
  8. The Human Bottleneck
  9. Design and Engineering Development
  10. The Renaissance of Conventional Wisdom

I was amused by #8: The Human Bottleneck, which reads:

Experts predict that all major high-performance computing (HPC) components—power, memory, storage, bandwidth, concurrence, and so on—will improve performance by a factor of 3 to 4,444 by 2018 [2]. Human cognitive capability will certainly remain constant. One challenge is to find alternative ways to compensate for human cognitive weaknesses.

It isn’t clear to me how speed in counting 0s and 1s is an indicator of “human cognitive weakness.”

Parking meters stand in the weather day and night. I don’t take that as a commentary on human endurance.

Do you?

A Model of Consumer Search Behaviour (slideshow) [Meta-analysis Anyone?]

Thursday, August 30th, 2012

A Model of Consumer Search Behaviour (slideshow) by Tony Russell-Rose.

From the post:

Here are the slides from the talk I gave at EuroHCIR last week on A Model of Consumer Search Behaviour. This talk extends and validates the taxonomy of information search strategies (aka ‘search modes’) presented at last year’s event, but applies it in this instance to the domain of site search, i.e. consumer-oriented websites and search applications. We found that site search users presented significantly different information needs from those of enterprise search users, implying some key differences in the information behaviours required to satisfy those needs.

Every so often I see “meta-analysis” used in medical research that combines the data from several clinical trials.

Are you aware of anyone who has performed a meta-analysis upon search behavior research?

Same question but with regard to computer interfaces more generally?

Applied and implied semantics in crystallographic publishing

Thursday, August 30th, 2012

Applied and implied semantics in crystallographic publishing by Brian McMahon. Journal of Cheminformatics 2012, 4:19 doi:10.1186/1758-2946-4-19.

Abstract:

Background

Crystallography is a data-rich, software-intensive scientific discipline with a community that has undertaken direct responsibility for publishing its own scientific journals. That community has worked actively to develop information exchange standards allowing readers of structure reports to access directly, and interact with, the scientific content of the articles.

Results

Structure reports submitted to some journals of the International Union of Crystallography (IUCr) can be automatically validated and published through an efficient and cost-effective workflow. Readers can view and interact with the structures in three-dimensional visualization applications, and can access the experimental data should they wish to perform their own independent structure solution and refinement. The journals also layer on top of this facility a number of automated annotations and interpretations to add further scientific value.

Conclusions

The benefits of semantically rich information exchange standards have revolutionised the scholarly publishing process for crystallography, and establish a model relevant to many other physical science disciplines.

A strong reminder to authors and publishers of the costs and benefits of making semantics explicit. (And the trade-offs involved.)

MyMiner: a web application for computer-assisted biocuration and text annotation

Thursday, August 30th, 2012

MyMiner: a web application for computer-assisted biocuration and text annotation by David Salgado, Martin Krallinger, Marc Depaule, Elodie Drula, Ashish V. Tendulkar, Florian Leitner, Alfonso Valencia and Christophe Marcelle. (Bioinformatics (2012) 28 (17): 2285–2287. doi: 10.1093/bioinformatics/bts435)

Abstract:

Motivation: The exponential growth of scientific literature has resulted in a massive amount of unstructured natural language data that cannot be directly handled by means of bioinformatics tools. Such tools generally require structured data, often generated through a cumbersome process of manual literature curation. Herein, we present MyMiner, a free and user-friendly text annotation tool aimed to assist in carrying out the main biocuration tasks and to provide labelled data for the development of text mining systems. MyMiner allows easy classification and labelling of textual data according to user-specified classes as well as predefined biological entities. The usefulness and efficiency of this application have been tested for a range of real-life annotation scenarios of various research topics.

Availability: http://myminer.armi.monash.edu.au.

Contacts: david.salgado@monash.edu and christophe.marcelle@monash.edu

Supplementary Information: Supplementary data are available at Bioinformatics online.

A useful tool and good tutorial materials.

I could easily see something similar for CS research (unless such already exists).

7 Habits of the Open Scientist

Thursday, August 30th, 2012

7 Habits of the Open Scientist

A series of posts by David Ketcheson that begins:

Science has always been based on a fundamental culture of openness. The scientific community rewards individuals for sharing their discoveries through perpetual attribution, and the community benefits through the ability to build on discoveries made by individuals. Furthermore, scientific discoveries are not generally accepted until they have been verified or reproduced independently, which requires open communication.

Historically, openness simply meant publishing one’s methods and results in the scientific literature. This enabled scientists all over the world to learn about essential advances made by their colleagues, modulo a few barriers. One needed to have access to expensive library collections, to spend substantial time and effort searching the literature, and to wait while research conducted by other groups was refereed, published, and distributed.

Nowadays it is possible to practice a fundamentally more open kind of research — one in which we have immediate, free, indexed, universal access to scientific discoveries. The new vision of open science is painted in lucid tones in Michael Nielsen’s Reinventing Discovery. After reading Nielsen’s book, I was hungry to begin practicing open science, but not exactly sure where to start. Here are seven ways I’m aware of. Each will be the subject of a longer forthcoming post.

The seven principles are:

  1. Freely accessible publications.
  2. Reproducible research.
  3. Pre-publication dissemination of research.
  4. Open collaboration through social media.
  5. Live open science.
  6. Open expository writing.
  7. Open bibliographies and reviews.

What are your habits for research on topic maps or other semantic technologies?

I first saw this at: Igor Carron’s Around the blogs in 80 summer hours.

MongoDB 2.2 Released [Aggregation News – Expiring Data From Merges?]

Thursday, August 30th, 2012

MongoDB 2.2 Released

From the post:

We are pleased to announce the release of MongoDB version 2.2. This release includes over 1,000 new features, bug fixes, and performance enhancements, with a focus on improved flexibility and performance. For additional details on the release:

Of particular interest to topic map fans:

Aggregation Framework

The Aggregation Framework is available in its first production-ready release as of 2.2. The aggregation framework makes it easier to manipulate and process documents inside of MongoDB, without needing to use MapReduce or separate application processes for data manipulation.

See the aggregation documentation for more information.

The H Open also mentions TTL (time to live), which can automatically remove documents from collections.

MongoDB documentation: Expire Data from Collections by Setting TTL.

Have you considered “expiring” data from merges?
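The TTL idea is simple enough to model outside MongoDB. The sketch below is a toy Python version of TTL expiry, not MongoDB’s implementation; the injectable clock is there purely so the behaviour can be demonstrated without waiting:

```python
import time

class TTLStore:
    """Toy key-value store where each entry expires `ttl` seconds
    after insertion -- a sketch of the TTL concept, nothing more."""
    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data = {}           # key -> (value, inserted_at)
    def put(self, key, value):
        self._data[key] = (value, self.clock())
    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, inserted = entry
        if self.clock() - inserted >= self.ttl:
            del self._data[key]   # lazily expire, like a TTL sweep
            return None
        return value
```

Applied to merges, the inserted-at timestamp could just as easily hang off a merged result, so stale merges age out instead of accumulating.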

Recline.js

Thursday, August 30th, 2012

Recline.js

From the documentation:

The Recline Library consists of 3 parts: Models, Backends and Views

Models

Models help you structure your work with data by providing some standard objects such as Dataset and Record – a Dataset being a collection of Records. More »

Backends

Backends connect your Models to data sources (and stores) – for example Google Docs spreadsheets, local CSV files, the DataHub, ElasticSearch etc. More »

Views

Views are user interface components for displaying, editing or interacting with the data. For example, maps, graphs, data grids or a query editor. More »
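Recline’s split is easiest to see in miniature. The sketch below is a Python analogy of the Model/Backend division, not Recline’s actual JavaScript API; the class names are invented:

```python
import csv, io

class Record(dict):
    """A single row of data."""

class Dataset:
    """A collection of Records fetched through a pluggable backend --
    the Model/Backend split Recline describes, sketched in Python."""
    def __init__(self, backend):
        self.backend = backend
        self.records = []
    def fetch(self):
        self.records = [Record(r) for r in self.backend.load()]
        return self.records

class CSVBackend:
    """Minimal backend: parse CSV text into dicts."""
    def __init__(self, text):
        self.text = text
    def load(self):
        return list(csv.DictReader(io.StringIO(self.text)))
```

Swapping `CSVBackend` for a spreadsheet or ElasticSearch backend leaves the Dataset, and any views built on it, untouched; that decoupling is what makes interface trial-and-error cheap.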

Recline.js makes trial-and-error with interfaces easy while you search for the one that users “like” in under 50 milliseconds. What if you had to hard-code every interface change? How quickly would the rule become: users must adapt to the interface? Not a bad rule, if you want to drive customers to other sites/vendors. (Think about that for a minute, then take Recline.js for a spin.)

The new Java 0Day examined

Wednesday, August 29th, 2012

The new Java 0Day examined

From the post:

A first analysis of the Java 0Day exploit code, which is already publicly available, suggests that the exploit is rather hard to notice: at first glance, the dangerous code looks just like any other Java program with no trace of any exotic bytecode. According to Michael Schierl, who has discovered several Java holes himself, the code’s secret is that it does something which it isn’t allowed to do: it uses the internal sun.awt.SunToolkit class to disable the SecurityManager, and ultimately the sandbox of Java.

The sun.awt.SunToolkit class gives public access to a method called getField() that provides access to the private attributes of other classes. Technically speaking, untrusted code such as the exploit that is being executed in the browser shouldn’t be able to access this method at all. But Java 7 introduced a new method to the Expression class, .execute(), which allowed expressions created at runtime to be executed. Bugs in the implementation of the new method allow the code to gain access to the getField() method.

I’m not going to make a habit out of reporting security issues, with Java or otherwise but this looked worth passing along.

Curious, with all the design pattern books, are there any design flaw pattern books?

The Curse Of Knowledge

Wednesday, August 29th, 2012

The Curse Of Knowledge by Mark Needham.

From the post:

My colleague Anand Vishwanath recently recommended the book ‘Made To Stick‘ and one thing that has really stood out for me while reading it is the idea of the ‘The Curse Of Knowledge’ which is described like so:

Once we know something, we find it hard to imagine what it was like not to know it. Our knowledge has “cursed” us. And it becomes difficult for us to share our knowledge with others, because we can’t readily re-create our listeners’ state of mind.

This is certainly something I imagine that most people have experienced, perhaps for the first time at school when we realised that the best teacher of a subject isn’t necessarily the person who is best at the subject.

I’m currently working on an infrastructure team and each week every team does a mini showcase where they show the other teams some of the things they’ve been working on.

It’s a very mixed audience – some very technical people and some not as technical people – so we’ve found it quite difficult to work out how exactly we can explain what we’re doing in a way that people will be able to understand.

A lot of what we’re doing is quite abstract/not very visible and the first time we presented we assumed that some things were ‘obvious’ and didn’t need an explanation.
….

Sounds like a problem that teachers/educators have been wrestling with for a long time.

Read the rest of Mark’s post, then find a copy of Made to Stick.

And/or, find a really good teacher and simply observe them teaching.

“Funny you should mention …”

Wednesday, August 29th, 2012

The New York Times’ At the Republican Convention, the Words Being Used is an interactive map of words used in speeches at the Republican Convention.

Nothing surprising about the words or even the distribution but something did catch my eye.

Look at the cloud and you will find:

  • Success – 65 mentions
  • God – 62 mentions
  • Romney – 148 mentions
  • Economy – 58
  • Tax – 48
  • Change – 28

So why does Romney get listed with “mentions,” while Economy, Tax and Change do not?

If there is a pattern to “mentions” or not, I haven’t recognized it.

What’s odder is that the excerpts from speeches appears to always say “mentions.”

Thought you might want to give it a try.

I first saw this at Information Aesthetics.

Love or Hate in 50 Milliseconds

Wednesday, August 29th, 2012

Users love simple and familiar designs – Why websites need to make a great first impression by Javier Bargas-Avila, Senior User Experience Researcher at YouTube UX Research

I knew it didn’t take long to love/hate a website but…:

I’m sure you’ve experienced this at some point: You click on a link to a website, and after a quick glance you already know you’re not interested, so you click ‘back’ and head elsewhere. How did you make that snap judgment? Did you really read and process enough information to know that this website wasn’t what you were looking for? Or was it something more immediate?

We form first impressions of the people and things we encounter in our daily lives in an extraordinarily short timeframe. We know the first impression a website’s design creates is crucial in capturing users’ interest. In less than 50 milliseconds, users build an initial “gut feeling” that helps them decide whether they’ll stay or leave. This first impression depends on many factors: structure, colors, spacing, symmetry, amount of text, fonts, and more.

As a comparison, the post cites the blink of an eye taking from 100 to 400 milliseconds.

Raises the bar on the 30-second “elevator speech,” doesn’t it?

Pass this on to web page, topic map (and other) semantic technology and software interface designers in general.

How would you test a webpage given this time constraint? (Serious question.)

ACLU maps cost of marijuana enforcement [Comparison]

Wednesday, August 29th, 2012

ACLU maps cost of marijuana enforcement

From the article:

Washington spent more than $200 million on enforcing and prosecuting marijuana laws and incarcerating the folks that violated them, the American Civil Liberties Union of Washington estimates.

The organization released an interactive map today of what it estimates each county spent on marijuana law enforcement. Although not specifically tied to Initiative 502, which gives voters a chance to legalize marijuana use for adults under some circumstances, ACLU is a supporter of the ballot measure.

I have always wondered what motivation, other than fear of others having a good time, could drive something as inane as an anti-marijuana policy.

I think I may have a partial answer.

That old American standby – keeping down competition.

In describing the $425.7 million dollars taken in by the Washington State Liquor Control Board, a map was given to show where the money went:

In Fiscal Year 2011, $345 million was sent to the General Fund, $71 million to cities and counties, $8.2 million to education and prevention, and $1.5 million to research. To see how much revenue your city or county received from the WSLCB in Fiscal Year 2011, visit www.liq.wa.gov/about/where-your-liquor-dollars-go [All the “where-your-liquor-dollars-go” links appear to be broken. They point to an FAQ and not the documentation.].

Consider Pierce County: spent on anti-marijuana enforcement – $21,138,797.

If you can guess the direct URL for the county-by-county liquor proceeds (http://liq.wa.gov/publications/releases/2011CountiesRevenue/fy2011-PIERCE.pdf for Pierce County), you will find that in 2011 the entire county received $7,489,073.

I’m just a standards editor and semantic integration enthusiast and by no means a captain of industry.

But spending nearly three times the local revenue from liquor, a competitor to marijuana, on anti-marijuana activities makes no business sense.
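The back-of-the-envelope arithmetic behind that “three times” claim, using the two Pierce County figures quoted above:

```python
spent_on_enforcement = 21_138_797   # Pierce County anti-marijuana spending
liquor_revenue = 7_489_073          # Pierce County share of WSLCB revenue, FY2011

# Enforcement spending as a multiple of the county's liquor revenue.
ratio = spent_on_enforcement / liquor_revenue
print(f"{ratio:.1f}x")
```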

If you can find the liquor revenue numbers for 2011, what other comparisons would you draw?

DARPA Seeking Unconventional Processors for ISR Data Analysis [Analog Computing By Another Name]

Wednesday, August 29th, 2012

DARPA Seeking Unconventional Processors for ISR Data Analysis by Erwin Gianchandani.

From the post:

Earlier this month, the Defense Advanced Research Projects Agency (DARPA) announced a new initiative that aims “to break the status quo of digital processing” by investigating new ways of “non-digital” computation that are “fundamentally different from current digital processors and the power and speed limitations associated with them.” Called Unconventional Processing of Signals for Intelligent Data Exploitation, or UPSIDE, the initiative specifically seeks “a new, ultra-low power processing method [that] may enable faster, mission-critical analysis of [intelligence, surveillance, and reconnaissance (ISR)] data.”

According to the DARPA announcement (after the jump):

Instead of traditional complementary metal-oxide-semiconductor (CMOS)-based electronics, UPSIDE envisions arrays of physics-based devices (nanoscale oscillators may be one example) performing the processing. These arrays would self-organize and adapt to inputs, meaning that they will not need to be programmed as digital processors are. Unlike traditional digital processors that operate by executing specific instructions to compute, it is envisioned that the UPSIDE arrays will rely on a higher level computational element based on probabilistic inference embedded within a digital system.

Probabilistic inference is the fundamental computational model for the UPSIDE program. An inference process uses energy minimization to determine a probability distribution to find the object that is the most likely interpretation of the sensor data. It can be implemented directly in approximate precision by traditional semiconductors as well as by new kinds of emerging devices.
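As a toy illustration (not DARPA’s design), inference by energy minimization can be sketched in a few lines: assign each candidate interpretation of the sensor data an energy, convert energies to a Boltzmann probability distribution, and take the minimum-energy hypothesis as the most likely interpretation. The hypothesis names and energy values below are invented for the example.

```python
import math

def infer(energies):
    """Toy energy-minimization inference: lower energy means higher
    probability under a Boltzmann distribution, so the minimum-energy
    hypothesis is the most likely interpretation (the MAP estimate)."""
    z = sum(math.exp(-e) for e in energies.values())  # partition function
    probs = {h: math.exp(-e) / z for h, e in energies.items()}
    best = min(energies, key=energies.get)  # minimum energy = most probable
    return probs, best

# Hypothetical energies for three interpretations of one sensor reading
probs, best = infer({"vehicle": 1.2, "building": 2.5, "clutter": 0.4})
print(best)  # "clutter": lowest energy, hence highest probability
```

The point of the sketch is only the correspondence the excerpt describes: minimizing energy and maximizing probability are the same search.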

DARPA program manager Dan Hammerstrom noted:

“Redefining the fundamental computation as inference could unlock processing speeds and power efficiency for visual data sets that are not currently possible. DARPA hopes that this type of technology will not only yield faster video and image analysis, but also lend itself to being scaled for increasingly smaller platforms.

“Leveraging the physics of devices to perform computations is not a new idea, but it is one that has never been fully realized. However, digital processors can no longer keep up with the requirements of the Defense mission. We are reaching a critical mass in terms of our understanding of the required algorithms, of probabilistic inference and its role in sensor data processing, and the sophistication of new kinds of emerging devices. At DARPA, we believe that the time has come to fund the development of systems based on these ideas and take computational capabilities to the next level.”

Just how old is this idea that is “not a new idea, but … one that has never been fully realized”?

If you search for “analog computing,” you will get a good idea of how old and how useful a concept it has been.

You can jump to the Wikipedia article, Analog Computer or take a brief tour with the Analog Computer Manual.

Please post a note if you experiment with analog computing and subject identity processing.

Or if you decide that models for chemical reactions in the human brain should be analog ones and not digital.

XLDB-2012 Conference Program

Tuesday, August 28th, 2012

XLDB-2012 Conference Program

The program for the Extremely Large Databases Conference, Workshop & Tutorials, September 10-13, 2012, Stanford University has been posted.

From the homepage:

In response to the exploding need for systems and tools capable of managing and analysing extremely large data sets, we are organizing the 6th Extremely Large Database Conference. The main goals of the conference are:

  • Encourage and accelerate the exchange of ideas between users trying to build extremely large databases worldwide and database solution providers
  • Share lessons, trends, innovations, and challenges related to building extremely large databases
  • Facilitate the development and growth of practical technologies for extremely large databases
  • Strengthen, expand, and engage the XLDB community

This year, the XLDB conference focuses on practical solutions.

Moon Shots, Flying Ponies and Requirements

Tuesday, August 28th, 2012

At Bruce Eckert’s Mind View site I read:

If somebody comes up to you and says something like, “How do I make this pony fly to the moon?”, the question you need to ask is, “What problem are you trying to solve?” You’ll find out that they really need to collect gray rocks. Why they thought they had to fly to the moon, and use a pony to do it, only they know. People do get confused like this. — Max Kanat-Alexander

Everyone has their own “true” version of that story that can be swapped over beers at a conference.

Or at a “Users say the darndest things” session.

Is that the key question? “What problem are you trying to solve?”

Or would it be better to ask: “What end result do you want?”

To keep it from being narrowly defined as a “problem,” it could be an opportunity, new product, service, etc.

And to avoid the solution being bound to include Lucene, Hadoop, MySQL, SQL Server, the Large Hadron Collider, etc.

Let’s find out what the goal is, then we can talk about solutions and what role technology will play.

Think of it this way, without an end result in mind, how will you know where to stop?

Atomic Scala

Tuesday, August 28th, 2012

Atomic Scala by Bruce Eckel and Dianne Marsh.

From the webpage:

Atomic Scala is meant to be your first Scala book, not your last. We show you enough to become familiar and comfortable with the language — competent, but not expert. You’ll be able to write useful Scala code, but you won’t necessarily be able to read all the Scala code you encounter.

When you’re done, you’ll be ready for more complex Scala books, several of which we recommend at the end of this book.

The first 25% of the book is available for download.

Take a peek at the “about” page if the author names sound familiar. 😉

I first saw this at Christopher Lalanne’s A bag of tweets / August 2012.

D3 2.10.0 Update

Tuesday, August 28th, 2012

D3 2.10.0 Update

An update of D3.js is out, just in case you are using it with topic map (or other) visualizations.

New features:

I first saw this at Christopher Lalanne’s A bag of tweets / August 2012.