Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

December 3, 2012

2013 International Supercomputing Conference

Filed under: Conferences,HPC,Supercomputing — Patrick Durusau @ 1:26 pm

2013 International Supercomputing Conference

Important Dates

  • Abstract Submission Deadline: Sunday, January 27, 2013, 23:59 AoE
  • Full Paper Submission Deadline: Sunday, February 10, 2013, 23:59 AoE
  • Author Notification: Sunday, March 10, 2013
  • Rebuttal Phase Starts: Sunday, March 10, 2013
  • Rebuttal Phase Ends: Sunday, March 17, 2013
  • Notification of Acceptance: Friday, March 22, 2013
  • Camera-Ready Submission: Sunday, April 7, 2013

From the call for papers:

  • Architectures (multicore/manycore systems, heterogeneous systems, network technology and programming models) 
  • Algorithms and Analysis (scalability on future architectures, performance evaluation and tuning) 
  • Large-Scale Simulations (workflow management, data analysis and visualization, coupled simulations and industrial simulations) 
  • Future Trends (Exascale HPC, HPC in the Cloud) 
  • Storage and Data (file systems and tape libraries, data intensive applications and databases) 
  • Software Engineering in HPC (application of methods, surveys) 
  • Supercomputing Facility (batch job management, job mix and system utilization and monitoring and administration tools) 
  • Scalable Applications: 50k+ (ISC Research thrust). The Research Paper committee encourages scientists to submit parallelization approaches that lead to scalable applications on more than 50,000 (CPU or GPU) cores
  • Submissions on other innovative aspects of high-performance computing are also welcome. 

Did I mention it will be in Leipzig, Germany? 😉

December 2, 2012

A Rickety Stairway to SQL Server Data Mining, Part 0.1: Data In, Data Out

Filed under: Data Mining,SQL,SQL Server — Patrick Durusau @ 7:46 pm

A Rickety Stairway to SQL Server Data Mining, Part 0.1: Data In, Data Out

A rather refreshing if anonymous take on statistics and data mining.

Since I can access SQL Servers in the cloud (without the necessity of maintaining a local Windows Server box), I thought I should look at data mining for SQL Server.

This was one of the first posts I encountered.

In the first of a series of amateur tutorials on SQL Server Data Mining (SSDM), I promised to pull off an impossible stunt: explaining the broad field of statistics in a few paragraphs without the use of equations. What other SQL Server blog ends with a cliffhanger like that? Anyone who aims at incorporating data mining into their IT infrastructure or skill set in any substantial way is going to have to learn to interpret equations, but it is possible to condense a few key statistical concepts in a way that will help those who aren’t statisticians – like me – to make productive use of SSDM without them. These crude Cliff’s Notes can at least familiarize DBAs, programmers and other readers of these tutorials with the minimal bare bones concepts they will need to know in order to interpret the data output by SSDM’s nine algorithms, as well as to illuminate the inner workings of the algorithms themselves. Without that minimal foundation, it will be more difficult to extract useful meaning from your data mining efforts.

The first principle to keep in mind is so absurdly obvious that it is often half-consciously forgotten – perhaps because it is right before our noses – but it is indispensable to understanding both the field of statistics and the stats output by SSDM. To wit, the numbers signify something. Some intelligence assigned meaning to them. One of the biggest hurdles when interpreting statistical data, reading equations or learning a foreign language is the subtle, almost subconscious error of forgetting that these symbols reflect ideas in the head of another conscious human being, which probably correspond to ideas that you also have in your head, but simply lack the symbols to express. An Englishman learning to read or write Spanish, Portuguese, Russian or Polish may often forget that the native speakers of these languages are trying to express the exact same concepts that an English speaker would; they have the exact same ideas in their heads as we do, but communicate them quite differently. Quite often, the seemingly incoherent quirks and rules of a particular foreign language may actually be part of a complex structure designed to convey identical, ordinary ideas in a dissimilar, extraordinary way. It is the same way with mathematical equations: the scientists and mathematicians who use them are trying to convey ideas in the most succinct way they know. It is often easier for laymen to understand the ideas and supporting evidence that those equations are supposed to express, when they’re not particularly well-versed in the detailed language that equations represent. I’m a layman, like some of my readers probably are. My only claim to expertise in this area is that when I was in fourth grade, I learned enough about equations to solve the ones my father, a college physics teacher, taught every week – but then I forgot it all, so I found myself back at Square One when I took up data mining a few years back.

On a side note, it would be wise for anyone who works with equations regularly to consciously remind themselves that they are merely symbols representing ideas, rather than the other way around; a common pitfall among physicists and other scientists who work with equations regularly seems to be the Pythagorean heresy, i.e. the quasi-religious belief that reality actually consists of mathematical equations. It doesn’t. If we add two apples to two apples, we end up with four apples; the equation 2 + 2 = 4 expresses the nature and reality of several apples, rather than the apples merely being a stand-in for the equation. Reality is not a phantom that obscures some deep, dark equation underlying all we know; math is simply a shortcut to expressing certain truths about the external world. This danger is magnified when we pile abstraction on top of abstraction, which may lead to the construction of ivory towers that eventually fall, often spectacularly. This is a common hazard in the field of finance, where our economists often forget that money is just an abstraction based on agreements among large numbers of people to assign certain meanings to it that correspond to tangible, physical goods; all of the periodic financial crashes that have plagued Western civilization since Tulipmania have been accompanied by a distinct forgetfulness of this fact, which automatically produces the scourge of speculation. I’ve often wondered if this subtle mistake has also contributed to the rash of severe mental illness among mathematicians and physicists, with John Nash (of the film A Beautiful Mind), Nicolai Tesla and Georg Cantor being among the most recognized names in a long list of victims. It may also be linked to the uncanny ineptitude of our most brilliant physicists and mathematicians when it comes to philosophy, such as Rene Descartes, Albert Einstein, Stephen Hawking and Alan Turing. In his most famous work, Orthodoxy, 20th Century British journalist G.K. Chesterton noticed the same pattern, which he summed up thus: “Poets do not go mad; but chess-players do. Mathematicians go mad, and cashiers; but creative artists very seldom. I am not, as will be seen, in any sense attacking logic: I only say that this danger does lie in logic, not in imagination.”[1] At a deeper level, some of the risk to mental health from excessive math may pertain to seeking patterns that aren’t really there, which may be closely linked to the madness underlying ancient “arts” of divination like haruspicy and alectromancy.

Listen to Your Stakeholders : Sowing seeds for future research

Filed under: Design,Interface Research/Design,Usability,Use Cases,Users — Patrick Durusau @ 5:06 pm

Listen to Your Stakeholders : Sowing seeds for future research by Tomer Sharon.

From the post:

If I needed to summarize this article in one sentence, I’d say: “Shut up, listen, and then start talking.”

User experience practitioners who are also excellent interviewers know that listening is a key aspect of a successful interview. By keeping your mouth shut you reduce the risk of verbal foibles and are in a better position to absorb information. When you are concentrated in absorbing information, you can then begin to identify research opportunities and effectively sow seeds for future research.

When you discuss future UX research with your stakeholders you want to collect pure, unbiased data and turn it into useful information that will help you pitch and get buy-in for future research activities. As in end-user interviews, in stakeholder interviews a word, a gesture, or even a blink or a certain body posture can bias an interviewee and add flaws to the data you collect. Let’s discuss several aspects of listening to your stakeholders when you talk with them about UX research. You will quickly see how these are similar to techniques you apply when interviewing users.

Stakeholders are our clients, whether internal or external to our organization. These are people who need to believe in what we do so they will act on research results and fund future research. We all have a stake in product development. They have a stake in UX research.

Tomer’s advice doesn’t require hardware or software. It does require wetware and some social interaction skills.

If you are successful with the repeated phrase technique, ping me. (“These aren’t the droids you are looking for.”) I have a phrase for them that starts with a routing number. 😉

10 PRINT CHR$(205.5+RND(1)); : GOTO 10

Filed under: Programming — Patrick Durusau @ 4:51 pm

10 PRINT CHR$(205.5+RND(1)); : GOTO 10 by Nick Montfort, Patsy Baudoin, John Bell, Ian Bogost, Jeremy Douglass, Mark C. Marino, Michael Mateas, Casey Reas, Mark Sample, and Noah Vawter.

Appropriate that I should stumble upon this after posting about Kevlin Henney’s presentation on Cool Code.

From the introduction:

Computer programs process and display critical data, facilitate communication, monitor and report on sensor networks, and shoot down incoming missiles. But computer code is not merely functional. Code is a peculiar kind of text, written, maintained, and modified by programmers to make a machine operate. It is a text nonetheless, with many of the properties of more familiar documents. Code is not purely abstract and mathematical; it has significant social, political, and aesthetic dimensions. The way in which code connects to culture, affecting it and being influenced by it, can be traced by examining the specifics of programs by reading the code itself attentively.

Like a diary from the forgotten past, computer code is embedded with stories of a program’s making, its purpose, its assumptions, and more. Every symbol within a program can help to illuminate these stories and open historical and critical lines of inquiry. Traditional wisdom might lead one to believe that learning to read code is a tedious, mathematical chore. Yet in the emerging methodologies of critical code studies, software studies, and platform studies, computer code is approached as a cultural text reflecting the history and social context of its creation. “Code . . . has been inscribed, programmed, written. It is conditioned and concretely historical,” new media theorist Rita Raley notes (2006). The source code of contemporary software is a point of entry in these fields into much larger discussions about technology and culture. It is quite possible, however, that the code with the most potential to incite critical interest from programmers, students, and scholars is that from earlier eras.
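
For anyone who has never run it, here is a rough Python stand-in for what the Commodore 64 one-liner does: pick one of two diagonal characters at random, over and over, so a maze-like pattern fills the screen. The original prints PETSCII diagonals (character codes 205 and 206) and loops forever; the slashes and the bounded loop below are my substitutions.

import random

# Approximation of 10 PRINT CHR$(205.5+RND(1)); : GOTO 10 -- the C64 version
# emits PETSCII diagonals endlessly; "/" and "\" stand in for them here, and
# the loop is bounded so the sketch terminates.
for _ in range(2000):
    print(random.choice("/\\"), end="")
print()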

I have only started to read the volume but it is already deeply interesting.

Topic maps would be obviously useful in tracing the history of programming code across paradigms, vocabularies and influences.

I suspect, but cannot prove, that topic maps may have a role in auditing/indexing the semantics of programming code.

Such that all calls to a particular data store, from sources written in the same or different languages, could be easily identified.

Or having identified a better way to perform a task, identifying when that same task is being executed by other, less optimal methods.
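
As a minimal sketch of that indexing idea: treat the data store as a subject with a single identifier and record every call site, whatever language it is written in, under that identifier. The identifier, languages and file paths below are invented for illustration.

from collections import defaultdict

# Hypothetical call-site index keyed by a shared subject identifier for the
# data store; the entries are made up, not drawn from any real codebase.
call_index = defaultdict(list)

def record_call(subject_id, language, path, line):
    call_index[subject_id].append((language, path, line))

record_call("http://example.org/subject/orders-db", "Java", "src/OrderDao.java", 42)
record_call("http://example.org/subject/orders-db", "Python", "etl/load_orders.py", 7)

# Every call to the same data store is now retrievable under one identifier,
# regardless of the source language.
print(call_index["http://example.org/subject/orders-db"])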

I first saw this in a tweet by Paul Steffen.

Grow up, use Mindmaps [Or, Grow confident and use what works for you.]

Filed under: Mapping,Maps,Mind Maps — Patrick Durusau @ 3:59 pm

Grow up, use Mindmaps by Anne Balke.

From the post:

No matter what the industry, there is one thing that all business owners have in common. We need to find ways to best utilize our time and to stay organized. Whether you’re just starting out or have had a successful business for years, in order to grow you need to plan for the future. The trick is finding a way to organize all the information that you gather along the way so that you can effectively develop a plan of action. You also need to be able to share your ideas and vision with others in a way that is concise and easy to follow.

The Solution – Mind Mapping

For small-business owners, mind maps are a useful tool for everything from brainstorming to strategic planning. Mind mapping is a way to visualize what you need to do and helps to organize information the same way that your brain does. NovaMind explains it quite well:

Our brains like thinking in pictures…The left half thinks linearly following direct linkages to related ideas. Our right brain likes to see the whole picture with colors and flow. A Mind Map caters to both sides of the brain… [making] it a very good way of storing and recalling information, presenting things to other people, and brainstorming new ideas.

I wasn’t aware the mind map folks had solved the problem of how brains work. Someone needs to call MIT to let them in on the news. 😉

Mind maps can be useful and may even be an authoring step prior to creation of a topic map. But a universal panacea they’re not.

I won’t ever make a very good software zealot. What software is best for you depends on your requirements and resources.

It is dishonest, intellectually and morally, to pretend otherwise.

If you are organizing a Christmas play for the approaching holidays, a topic map would do the job. But a spiral notebook and #2 pencil (with a pennalet for storage) have a shallower learning curve.

I would rephrase the title just a bit: Grow confident, use software that meets your needs, not what’s “hot” or popular.

Cool Code [Chess Program in 4.8 Tweets]

Filed under: Programming,Semantic Web — Patrick Durusau @ 10:51 am

Cool Code by Kevlin Henney.

From the description:

In most disciplines built on skill and knowledge, from art to architecture, from creative writing to structural engineering, there is a strong emphasis on studying existing work. Exemplary pieces from past and present are examined and discussed in order to provoke thinking and learn techniques for the present and the future. Although programming is a discipline with a very large canon of existing work to draw from, the only code most programmers read is the code they maintain. They rarely look outside the code directly affecting their work. This talk examines some examples of code that are interesting because of historical significance, profound concepts, impressive technique, exemplary style or just sheer geekiness.

Some observations:

At about 3:11 or a little before, Kevlin has a slide that reads:

There is an art, craft, and science to programming that exceeds far beyond the program. The act of programming marries the discrete world of computers to the fluid world of human affairs. Programmers mediate between the negotiated and uncertain truths of business and the crisp, uncompromising domain of bits and bytes and higher constructed types.

I rather like the phrases “…marries the discrete world of computers to the fluid world of human affairs,” and “…the negotiated and uncertain truths of business….”

It captures the divergence of the AI/Semantic Web paradigm from life as we experience it.

In order to have the Semantic Web, we have to prune “…negotiated and uncertain truths…” until what remains can fit into “…the discrete world of computers….”

You will enjoy Kevlin’s take on RUD (Rapid Unscheduled Disassembly). 😉

Or a chess program written in 672 bytes, or 4.8 tweets. (On which see 1K ZX Chess for the code and numerous other resources.)

The presentation is marred only by the unreadability (on the video) of some of the code examples.

Kevlin closes with:

If you don’t have time to read, you don’t have the time or tools to write. (Stephen King)


Kevlin’s homepage, and his papers.

97 Things Every Programmer Should Know: Collective Wisdom from the Experts (Kevlin as editor)

December 1, 2012

Encyclo

Filed under: Encyclo,News — Patrick Durusau @ 9:09 pm

Encyclo : An encyclopedia of the future of news from the Nieman Journalism Lab

From the about page:

Encyclo is an encyclopedia of the future of news, produced by the Nieman Journalism Lab at Harvard University.

You may already know the Lab for our reporting, analysis, and commentary on how the world of journalism is changing, both through our website and our Twitter feed. The Internet has revolutionized the way news is gathered, assembled, distributed, and consumed, and our mission is to learn about those changes, to identify what’s working and what isn’t, and to do our small part in helping that evolution along.

But our main site emphasizes new developments and the latest news. We think there’s great value in a resource that steps back a bit from the daily updates and focuses on background and context. What is it about Voice of San Diego that people find interesting? How has The New York Times been innovating? What model is Politico trying to achieve? Those kinds of questions are why we decided to build Encyclo — a resource on the most important organizations and issues in journalism’s evolution.

Another area where avoiding re-finding information, and having links between the subjects of stories, would be a tremendous benefit.

A site to watch and explore for opportunities for topic maps.

Spundge

Filed under: Data Streams,News,Writing — Patrick Durusau @ 8:56 pm

First look: Spundge is software to help journalists to manage real-time data streams by Andrew Phelps.

From the post:

“Spundge is a platform that’s built to take a journalist from information discovery and tracking all the way to publishing, regardless of whatever internal systems they have to contend with,” he told me.

A user creates notebooks to organize material (a scheme familiar to Evernote users). Inside a notebook, a user can add streams from multiple sources and activate filters to refine by keyword, time (past few minutes, last week), location, and language.

Spundge extracts links from those sources and displays headlines and summaries in a blog-style river. A user can choose to save individual items to the notebook or hide them from view, and Spundge’s algorithms begin to learn what kind of content to show more or less of. A user can also save clippings from around the web with a bookmarklet (another Evernote-like feature). If a notebook is public, the stream can be embedded in webpages, à la Storify. (Here’s an example of a notebook tracking the ONA 2012 conference.)
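
As a toy sketch of the keyword-and-recency filtering such a notebook stream implies (the item fields and function name are invented; this is not Spundge’s API):

from datetime import datetime, timedelta

# Hypothetical stream items; in practice these would come from feeds or APIs.
items = [
    {"title": "ONA 2012 keynote liveblog", "published": datetime(2012, 9, 20, 9, 5)},
    {"title": "Election data visualization roundup", "published": datetime(2012, 11, 7, 22, 40)},
]

def filter_stream(items, keyword, since):
    """Keep items that mention the keyword and were published after `since`."""
    return [
        item for item in items
        if keyword.lower() in item["title"].lower() and item["published"] >= since
    ]

last_week = datetime(2012, 11, 10) - timedelta(days=7)
print(filter_stream(items, "election", last_week))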

Looks interesting, but I wonder about the monochrome view it presents to the user.

That is, a particular user chooses their settings, and until and unless they change those settings, the limits of the content they are shown are set by those choices.

As opposed to, say, a human-curated source like the New York Times. (Give me human editors and the New York Times.)

Or is the problem a lack of human-curated data feeds?

A Consumer Electronics Named Entity Recognizer using NLTK [Post-Authoring ER?]

Filed under: Entity Resolution,Named Entity Mining,NLTK — Patrick Durusau @ 8:34 pm

A Consumer Electronics Named Entity Recognizer using NLTK by Sujit Pal.

From the post:

Some time back, I came across a question someone asked about possible approaches to building a Named Entity Recognizer (NER) for the Consumer Electronics (CE) industry on LinkedIn’s Natural Language Processing People group. I had just finished reading the NLTK Book and had some ideas, but I wanted to test my understanding, so I decided to build one. This post describes this effort.

The approach is actually quite portable and not tied to NLTK and Python, you could, for example, build a Java/Scala based NER using components from OpenNLP and Weka using this approach. But NLTK provides all the components you need in one single package, and I wanted to get familiar with it, so I ended up using NLTK and Python.

The idea is that you take some Consumer Electronics text, mark the chunks (words/phrases) you think should be Named Entities, then train a (binary) classifier on it. Each word in the training set, along with some features such as its Part of Speech (POS), Shape, etc is a training input to the classifier. If the word is part of a CE Named Entity (NE) chunk, then its trained class is True otherwise it is False. You then use this classifier to predict the class (CE NE or not) of words in (previously unseen) text from the Consumer Electronics domain.
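
To make the word-level classification idea concrete, here is a minimal NLTK sketch along those lines. The features, the toy training pairs and the sample sentence are invented for illustration and are not taken from Sujit’s post.

import nltk

# nltk.word_tokenize and nltk.pos_tag need the tokenizer and POS tagger models
# installed via nltk.download() before this will run.

def word_features(word, pos_tag):
    """Simple per-word features in the spirit of POS and word-shape features."""
    return {
        "pos": pos_tag,
        "is_capitalized": word[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "suffix3": word[-3:].lower(),
    }

# Toy training data: (word, POS tag, word-is-part-of-a-CE-named-entity).
training = [
    ("Samsung", "NNP", True),
    ("Galaxy", "NNP", True),
    ("announced", "VBD", False),
    ("a", "DT", False),
    ("new", "JJ", False),
    ("phone", "NN", False),
]

train_set = [(word_features(w, t), label) for w, t, label in training]
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Tag previously unseen text, then predict which words belong to CE entities.
sentence = "Sony released the Bravia line last year"
for word, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
    print(word, classifier.classify(word_features(word, tag)))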

Should help with mining data for “entities” (read “subjects” in the topic map sense) for addition to your topic map.

I did puzzle over the suggestion for improvement that reads:

Another idea is to not do reference resolution during tagging, but instead postponing this to a second stage following entity recognition. That way, the references will be localized to the text under analysis, thus reducing false positives.

Post-authoring reference resolution might benefit from that approach.

But, if references were resolved by authors during the creation of a text, such as the insertion of Wikipedia references for entities, a different result would be obtained.

In those cases, assuming the author of a text is identified, they can be associated with a particular set of reference resolutions.

Neo4j – New Website

Filed under: Graphs,Neo4j — Patrick Durusau @ 8:06 pm

Neo4j

Old location, new website.

Completely different (in a good way) from the previous version.

Take a look, you will be pleasantly surprised.

MOA Massively Online Analysis

Filed under: BigData,Data,Hadoop,Machine Learning,S4,Storm,Stream Analytics — Patrick Durusau @ 8:02 pm

MOA Massively Online Analysis : Real Time Analytics for Data Streams

From the homepage:

What is MOA?

MOA is an open source framework for data stream mining. It includes a collection of machine learning algorithms (classification, regression, and clustering) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems.

What can MOA do for you?

MOA performs BIG DATA stream mining in real time, and large scale machine learning. MOA can be easily used with Hadoop, S4 or Storm, and extended with new mining algorithms, and new stream generators or evaluation measures. The goal is to provide a benchmark suite for the stream mining community. Details.
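
MOA itself is Java, but the evaluation idea at the heart of stream mining, test-then-train (prequential) evaluation over an unbounded stream, is easy to sketch. The learner and toy stream below are invented for illustration; this is not MOA’s API.

from collections import Counter

class MajorityClassLearner:
    """Trivial incremental learner: predict the most frequent label seen so far."""
    def __init__(self):
        self.counts = Counter()

    def predict(self, _instance):
        return self.counts.most_common(1)[0][0] if self.counts else None

    def learn(self, _instance, label):
        self.counts[label] += 1

def prequential(stream, learner):
    """For each arriving instance: test first, then train, and track accuracy."""
    correct = total = 0
    for instance, label in stream:
        if learner.predict(instance) == label:
            correct += 1
        learner.learn(instance, label)
        total += 1
    return correct / total if total else 0.0

# Toy stream of (instance, label) pairs standing in for a real data stream.
stream = [({"x": i % 3}, i % 3 == 0) for i in range(1000)]
print(prequential(stream, MajorityClassLearner()))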

Short tutorials and a manual are available. Enough to get started, but you will need additional resources on machine learning if the field isn’t already familiar to you.

A small niggle about documentation: many projects have files named “tutorial” or, in this case, “Tutorial1” or “Manual.” Those files are easier to discover and save if the project name (and perhaps the version) is prepended, e.g., “Moa-2012-08-tutorial1” or “Moa-2012-08-manual.”

If data streams are in your present or future, definitely worth a look.
