Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

May 3, 2012

20 More Reasons You Need Topic Maps

Filed under: Identification,Identifiers,Identity,Marketing,Topic Maps — Patrick Durusau @ 6:23 pm

Well, Ed Lindsey did call his column 20 Common Data Errors and Variation but when you see the PNG of the 20 errors, here, you will agree my title works better (for topic maps anyway).

Not only that, but Ed’s opening paragraphs work for identifying a subject by more than one attribute (although this is “subject” in the police sense of the word):

A good friend of mine’s husband is a sergeant on the Chicago police force. Recently a crime was committed and a witness insisted that the perpetrator was a woman with blond hair, about five nine, weighing 160 pounds. She was wearing a gray pinstriped business suit with an Armani scarf and carrying a Gucci handbag.

So what does this sergeant have to do? Start looking at the women of Chicago. He only needs the women. Actually, he would start with women with blond hair (but judging from my daughter’s constant change of hair color he might skip that attribute). So he might start with women in a certain height range and in a certain weight group. He would bring those women in to the station for questioning.

As it turns out, when they finally arrested the woman at her son’s soccer game, she had brown hair, was 5’5″ tall and weighed 120 pounds. She was wearing an Oklahoma University sweatshirt, jeans and sneakers. When the original witness saw her she said yes, that’s the same woman. It turns out she was wearing four inch heels and the pantsuit made her look bigger.

So what can we learn from this episode that has to do with matching? Well, the first thing we need to understand is that each of the attributes from the witness can be used in matching the suspect, and then immediately we must also recognize that not all the attributes the witness gave the sergeant were extremely accurate. So later on, when we start talking about matching, we will use the term fuzzy matching. This means that when you look at an address, there could be a number of different types of errors in the address from one system that are not identical to an address in another system. Figure 1 shows a number of the common errors that can happen.
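To see how little machinery fuzzy matching requires, here is a minimal sketch (Python standard library only; the abbreviation table and the 0.85 threshold are my own illustrations, not anything from Ed’s column) that compares two address strings by similarity ratio instead of exact equality:

```python
from difflib import SequenceMatcher

def normalize(address: str) -> str:
    """Lowercase, collapse whitespace, expand a few common abbreviations."""
    abbrevs = {"st.": "street", "ave.": "avenue", "n.": "north", "s.": "south"}
    tokens = address.lower().split()
    return " ".join(abbrevs.get(t, t) for t in tokens)

def fuzzy_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two addresses as the same subject if similarity clears the threshold."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    return score >= threshold

# Two renderings of one address, with the kinds of variation Figure 1 catalogs.
print(fuzzy_match("123 N. Main St.", "123 North Main Street"))  # True
print(fuzzy_match("123 N. Main St.", "500 Oak Avenue"))         # False
```

A production matcher would weight attributes and tune thresholds per field, but the principle is the same one the sergeant used: near matches count.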

So, there you have it: 20 more reasons to use topic maps, a lesson on identifying a subject and proof that yes, a pinstriped pantsuit can make you look bigger.

April 30, 2012

Text Analytics: Yesterday, Today and Tomorrow

Filed under: Marketing,Text Analytics — Patrick Durusau @ 3:17 pm

Text Analytics: Yesterday, Today and Tomorrow

Another Tony Russell-Rose post that I ran across over the weekend:

Here’s something I’ve been meaning to share for a while: the slides for a talk entitled “Text Analytics: Yesterday, Today and Tomorrow”, co-authored with colleagues Vladimir Zelevinsky and Michael Ferretti. In this we outline some of the key challenges in text analytics, describe some of Endeca’s current research in this area, examine the current state of the text analytics market and explore some of the prospects for the future.

I was amused to read on slide 40:

Solutions still not standardized

Users differ in their views of the world of texts, solutions, data, formats, data structures, and analysis.

Anyone offering a “standardized” solution is selling their view of the world.

As a user/potential customer, I am rather attached to my view of the world. You?

April 25, 2012

OAG Launches Mapper, a New Network Analysis Mapping Tool

Filed under: Aviation,Books,Marketing,Travel — Patrick Durusau @ 6:27 pm

OAG Launches Mapper, a New Network Analysis Mapping Tool

From the post:

OAG, a UBM Aviation brand, today unveiled its new aviation analysis mapping tool, OAG Mapper. This latest innovation, from the global leader in aviation intelligence, combines a powerful global flight schedule query with advanced mapping software technology to quickly plot route network maps, based on data drawn from OAG’s market leading schedules database of 1,000 airlines and over 3,500 airports. It is ideal for those in commercial, marketing and strategic planning roles across the airlines, airports, tourism, consulting and route network related industry sectors.

A web-based tool that eliminates the need to hand-draw network routes onto maps, OAG Mapper allows users to either import IATA Airport codes, or to enter a carrier, airport, equipment type or a combination of these and generate custom network maps in seconds. The user can then highlight key routes by changing the thickness and colour of the lines and label them for easy reference, save the map to their profile and export to jpeg for use in network planning, forecasting, strategy and executive presentations.

This has aviation professional written all over it.

And what does aviation bring to mind? That’s right! Coin of the realm! Lots of coins from lots of realms.

Two thoughts:

First, and most obvious, use this service in tandem with other information for aviation professionals to create enhanced services for their use. Ask aviation professionals what they would like to see and how they would like to see it. (Novel software theory: give users what they want, how they want it. An easier sell than educating them.)

Second, we have all seen the travel sites that plot schedules, fees, destinations, hotels and car rentals.

But when was the last time you flew to an airport, rented a car and stayed in a hotel? That was the sum total of your trip?

Every location in the world has more to offer than that, well, not the South Pole but they don’t have a car rental agency. Or any beach. So why go there?

Sorry, got distracted. Every location in the world (with one exception, see above) has more than airports, hotels and car rentals. Suggestion: use topic maps (non-obviously) to create information/reservation-rich environments.

The Frankfurt Book Fair is an example of an event with literally thousands of connections to be made in addition to airport, hotel and car rental. Your application could be the one that crosses all the information systems (or lack thereof) to provide that unique experience.

You could hard-code it, but I assume you are brighter than that.

NYCFacets

Filed under: Marketing,Open Data — Patrick Durusau @ 6:26 pm

NYCFacets: Smart Open Data Exchange

From the FAQ:

Smart Open Data Exchange?

A: We don’t just catalog the metadata for each datasource. We squeeze out additional metadata – extrametadata, as we call it – and correlate all the datasources to allow Open Data Users to see the “forest for the trees”. Or in the case of NYC – the “city for the streets”? (TODO: find urban equivalent of “See Forest for the Trees“)

The “Smart” comes from a process we call “Crowdknowing” – leveraging metadata + extrametadata to score each dataset from various perspectives, automatically correlate them, and in the near future, perform semi-automatic domain mapping.

Extrametadata?

A: Derived Metadata – Statistics (Quantitative and Qualitative), Ontologies, Semantic Mappings, Inferences, Federated Queries, Scores, Curations, Annotations plus various other Machine and Human-powered signals through a process we call “Crowdknowing“.

Crowdknowing?

A: Human-powered, machine-accelerated, collective knowledge systems cataloging metadata + derived extrametadata (derived using semantics, statistics, algorithm and the crowd). At this stage, the human-powered aspect is not emphasized because we found that the NYC Data Catalog community is still in its infancy – there were very few comments and ratings. But we hope to help improve that over time as we crawl secondary signals (e.g. votes and comments in NYCBigApps, Challengepost and Appstores; Facebook likes; Tweets, etc.).

OK, it was covered as the winner of the most recent NYCBigApps contest but I thought it needed a separate shout-out.

Take a close look at what this site has done with a minimum of software and some clever thinking.

NYC BigApps

Filed under: Contest,Mapping,Marketing — Patrick Durusau @ 6:25 pm

NYC BigApps

From the webpage:

New York City is challenging software developers to create apps that use city data to make NYC better.

There are three completed contests (one just ended) that resulted in very useful applications.

NYC BigApps 3.0 resulted in:

NYC Facets: Best Overall Application – Grand Prize – Explores and visualizes more than 1 million facts about New York City.

Work+: Best Overall Application – Second Prize – Working from home not working for you? Discover new places to get things done.

Funday Genie: Investor’s Choice Application – The Funday Genie is an application for planning a free day. Our unique scheduling and best route algorithm creates a smart personalized day-itinerary of things to do, including events, attractions, restaurants, shopping, and more, based on the user’s preferences. Everyday can be a Funday.

among others.

Quick question: How would you interchange information between any two of these apps? Or if you like, any other two apps in this or prior contests?

Second question: How would you integrate additional information into any of these apps, prepared for use by another application?

Topic maps can:

  • collate information for display.
  • power re-usable and extensible mappings of data into other formats.
  • augment data for applications that lack merging semantics.

Where is your data today and where would you like for it to be tomorrow?
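To make the interchange question concrete, here is a minimal sketch (Python; the records, keys and the example.org identifier are all invented for illustration) of the basic topic map move: map each app’s local key to a shared subject identifier, then collate properties under that identifier:

```python
# Hypothetical records from two BigApps entries that both describe one venue.
nyc_facets_record = {"name": "Bryant Park", "facts": 128}
funday_record = {"events": ["movie night", "outdoor yoga"]}

# Map each app's local key to a shared subject identifier (an invented URL).
subject_identifiers = {
    ("nyc_facets", "nyc-4412"): "http://example.org/subject/bryant-park",
    ("funday_genie", "BRYANT PARK"): "http://example.org/subject/bryant-park",
}

def collate(records):
    """Merge properties from different apps that identify the same subject."""
    merged = {}
    for app, local_key, props in records:
        sid = subject_identifiers[(app, local_key)]
        merged.setdefault(sid, {}).update(props)
    return merged

print(collate([
    ("nyc_facets", "nyc-4412", nyc_facets_record),
    ("funday_genie", "BRYANT PARK", funday_record),
]))
```

The mapping table is the reusable artifact: add a third app and its data collates with the first two without touching either.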

April 19, 2012

NSA Money Trap

Filed under: Humor,Marketing — Patrick Durusau @ 7:23 pm

I am posting this under humor, in part due to the excellent writing of James Bamford.

Here is a sample of what you will find at: The NSA Is Building the Country’s Biggest Spy Center (Watch What You Say):

Today Bluffdale is home to one of the nation’s largest sects of polygamists, the Apostolic United Brethren, with upwards of 9,000 members. The brethren’s complex includes a chapel, a school, a sports field, and an archive. Membership has doubled since 1978—and the number of plural marriages has tripled—so the sect has recently been looking for ways to purchase more land and expand throughout the town.

But new pioneers have quietly begun moving into the area, secretive outsiders who say little and keep to themselves. Like the pious polygamists, they are focused on deciphering cryptic messages that only they have the power to understand. Just off Beef Hollow Road, less than a mile from brethren headquarters, thousands of hard-hatted construction workers in sweat-soaked T-shirts are laying the groundwork for the newcomers’ own temple and archive, a massive complex so large that it necessitated expanding the town’s boundaries. Once built, it will be more than five times the size of the US Capitol.

Rather than Bibles, prophets, and worshippers, this temple will be filled with servers, computer intelligence experts, and armed guards. And instead of listening for words flowing down from heaven, these newcomers will be secretly capturing, storing, and analyzing vast quantities of words and images hurtling through the world’s telecommunications networks. In the little town of Bluffdale, Big Love and Big Brother have become uneasy neighbors.

There is enough doom and gloom to keep the movie industry busy through Terminator XXX – The Commodore 128 Conspiracy.

Why am I not worried?

  1. 70% of all IT projects fail – Odds are better than 50% this is one of them.
  2. Location – Build a computer center in one of the hottest locations in the 48 states. Is that a comment on the planning of this center?
  3. Technology – In the time from planning to completion, two or three generations of computing architecture and design have occurred. Care to bet on the mixture of systems to be found at this “secret” location?
  4. 70% of all IT projects fail – Odds are better than 50% this is one of them.
  5. NSA advances in cryptography. Sure, just like Oak Ridge was breached by an “advanced persistent threat“:

    Oak Ridge National Labs blamed the incident on an “advanced persistent threat” (APT), a term commonly used by organizations to imply that the threat was so advanced that they would never have been able to protect themselves, said Gunter Ollmann, vice-president of research at Damballa.

    Would you expect anyone to claim being the victim of a high school level hack? Do you really think the NSA is going to say it’s a little ahead, maybe?

  6. Consider the NSA’s track record against terrorism, revolution, etc. You would get more timely information reading the Washington Post. Oh, but their real contribution is a secret.
  7. When they have a contribution, like listening to cell phones of terrorists, they leak it. No leaks, no real contributions.
  8. 70% of all IT projects fail – Odds are better than 50% this is one of them.
  9. Apparently there is no capacity (unless it is secret) to check signals intel against human intel. That’s like watching I Love Lucy episodes for current weather information. It has to be right part of the time.
  10. 70% of all IT projects fail – Odds are better than 50% this is one of them.

What is missing from most IT projects is an actor with technical expertise but no direct interest in the project. Someone who has no motive for CYA on the part of the client or contractor.

Someone who can ask of the decision makers: “What specific benefit is derived from ability X?” Such as the capacity to mine “big data.” To what end?

The oft cited benefit of “making better decisions” is not empowered by “big data.”

If you are incapable of making good business decisions now, that will be true after you have “big data.” (Sorry.)

April 18, 2012

Gas Price Fact Vacuum

Filed under: Marketing,Topic Maps — Patrick Durusau @ 6:12 pm

President Obama claims that speculation in oil markets is responsible for high gas prices, while others see only supply and demand.

Two stories, very different takes on the gas price question. The only fact the stories have in common is that gas prices are high.

In a high school debate setting, we would say the two teams did not “join the issue.” That is, they don’t directly address the questions raised by their opponents but trot out their “evidence,” which is ignored in turn by the other side.

The result is a claim rich but fact poor environment that leaves readers to cherry pick claims that support their present opinions.

If you are interested in public policy, for an area like gas prices, topic maps can capture the lack of “joining the issue” by both sides in such a debate.

Might make an interesting visual for use in presidential debates. Where have the candidates simply missed each other’s arguments?

Topic maps anyone? (PBS? Patrick Durusau)

If you want a more “practical” application of topic maps and the analysis that underlie them, think about the last set of ads, white papers, webinars you have seen on technology alternatives.

A topic map could help you get past the semantic-content="zero" parts of technology promotions. (Note the use of “could.” Like any technology, the usefulness of topic maps depends on the skill of their author(s) and user(s). Anyone who says differently is lying.)

April 17, 2012

Secret Service Babes

Filed under: BigData,Marketing,Topic Maps — Patrick Durusau @ 7:14 pm

The major news organizations are all over the story of the U.S. Secret Service and the prostitutes in Cartagena, Colombia.

But not every TV or radio station can afford to send reporters to Colombia.

And what news could they uncover at this point?

Ask yourself: Why were the Secret Service agents in Colombia?

Answer: The president was visiting.

Opportunity: Run the list of overnight presidential visits backwards and start interviewing the local prostitutes for their stories. May turn up some “Secret Service babes” of the smaller sort.

This is where a topic map makes an excellent mapping/information sharing tool. Since the agents are “secret,” you won’t have photos of Secret Service agents, but physical descriptions can be collated/merged.

Composite physical/sexual description as it were.

To be reviewed/recognized by other prostitutes or wives of the Secret Service agents.

Interested in a topic map of Secret Service Sexual Liaisons (SSSL)? (Patrick Durusau)

PS: Is this a “big data” mining opportunity?

April 14, 2012

Everything You Wanted to Know About Data Mining but Were Afraid to Ask

Filed under: Data Mining,Marketing — Patrick Durusau @ 6:24 pm

Everything You Wanted to Know About Data Mining but Were Afraid to Ask by Alexander Furnas.

Interesting piece from the Atlantic that you can use to introduce a client to the concepts of data mining. And at the same time, use as the basis for discussing topic maps.

For example, Furnas says:

For the most part, data mining tells us about very large and complex data sets, the kinds of information that would be readily apparent about small and simple things. For example, it can tell us that “one of these things is not like the other” a la Sesame Street or it can show us categories and then sort things into pre-determined categories. But what’s simple with 5 datapoints is not so simple with 5 billion datapoints.

Topic maps are more about things that are “like the other,” so that we can have them all in one place. Or at least all the information about them in one place.

See, that wasn’t hard.

The editorial and technical side of it, how information is gathered for useful presentation to a user, is hard.

But the client, like someone watching cable TV, is more concerned with the result than how it arrived.

Perhaps a different marketing strategy: results first.

Thoughts?

Rules and Rituals

Filed under: Marketing — Patrick Durusau @ 6:24 pm

Rules and Rituals by Basab Pradhan.

From the post:

Matt Richtel investigates the mystery of why laptops and not iPads need to be pulled out of bags for the X-Ray machine at airport security.

From the New York Times

What’s the distinction between the devices? Similar shapes, many similar functions, the tablet is thinner but not by much. Is the iPad a lower security risk? What about the punier laptop-like gadgets, the netbooks and ultrabooks? What about my smartphone?

Richtel contacts the TSA and security experts, but doesn’t really get a good answer. The TSA said that it had its reasons but declined to share them, saying that “the agency didn’t want to betray any secrets.” Another security expert called it “security theater”, implying that making passengers go through some inconvenience makes it look like the government is taking their security seriously!

A very amusing post on rules that concludes:

The only way to keep business agile is to constantly subject its rules to the sunlight of logic. Why do we have this rule in place? Did we make this rule when the conditions were different from what they are today? Do we completely understand the costs of this rule and have we weighed them against the benefits? Does anyone even remember why we have this rule?

Like zero based budgeting, we should be talking about zero-based rules.

Most of us would agree with Basab on the TSA and the use of the “sunlight of logic” with regard to airport security.

At least at first blush.

But it is a good illustration that the “sunlight of logic” is always from a particular perspective.

As a former frequent air traveler, my view was and is that the TSA is a public band-aid of little utility. In Atlanta, it is simply a job creation mechanism for sisters, cousins and other relatives. Now that groping children is part of their job, no doubt the pool of job applicants has increased.

From the perspective of people who like groping children, the “sunlight of logic” for the TSA is entirely different. The TSA exists to provide them with employment and legitimate reasons to grope children.

From the perspective of the politicians who created the TSA, the “sunlight of logic” for the TSA is that they are doing something about terrorism (a farcical claim to you or me, but I have heard it claimed by politicians).

Bottom line is that if I get to declare where “zero” starts, I’m pretty comfortable with “zero-based rules.” (You may be more or less comfortable.)

April 8, 2012

Context matters: Search can’t replace a high-quality index

Filed under: eBooks,Indexing,Marketing — Patrick Durusau @ 4:21 pm

Context matters: Search can’t replace a high-quality index

Joe Wikert writes:

I’ve never consulted an index in an ebook. From a digital content point of view, indexes seem to be an unnecessary relic of the print world. The problem with my logic is that I’m thinking of simply dropping a print index into an ebook, and that’s as shortsighted as thinking the future of ebooks in general is nothing more than quick-and-dirty conversions of print books. In this TOC podcast interview, Kevin Broccoli, CEO of BIM Publishing Services, talks about how indexes can and should evolve in the digital world.

Key points from the full video interview (below) include:

  • Why bother with e-indexes? — Searching for raw text strings completely removes context, which is one of the most valuable attributes of a good index. [Discussed at the 1:05 mark.]
  • Index mashups are part of the future — In the digital world you should be able to combine indexes from books on common topics in your library. That’s exactly what IndexMasher sets out to do. [Discussed at 3:37.]
  • Indexes with links — It seems simple but almost nobody is doing it. And as Kevin notes, wouldn’t it be nice for ebook retailers to offer something like this as part of the browsing experience? [Discussed at 6:24.]
  • Index as cross-selling tool — The index mashup could be designed to show live links to content you own but also include entries without links to content in ebooks you don’t own. Those entries could offer a way to quickly buy the other books, right from within the index. [Discussed at 7:28.]
  • Making indexes more dynamic — The entry for “Anderson, Chris” in the “Poke The Box” index on IndexMasher shows a simple step in this direction by integrating a Google and Amazon search into the index. [Discussed at 9:42.]

Apologies, but I left out the links to the interview to encourage you to visit the original. It is really worth your time.

Do these points sound like something a topic map could do? 😉

BTW, I am posting a note to IndexMasher and will advise. Sounds very interesting.

An R programmer looks at Julia

Filed under: Julia,Marketing,R — Patrick Durusau @ 4:20 pm

An R programmer looks at Julia by Douglas Bates.

Douglas writes:

In January of this year I first saw mention of the Julia language in the release notes for LLVM. I mentioned this to Dirk Eddelbuettel and later we got in contact with Viral Shah regarding a Debian package for Julia.

There are many aspects of Julia that are quite intriguing to an R programmer. I am interested in programming languages for “Computing with Data”, in John Chambers’ term, or “Technical Computing”, as the authors of Julia classify it. I believe that learning a programming language is somewhat like learning a natural language in that you need to live with it and use it for a while before you feel comfortable with it and with the culture surrounding it.

A common complaint for those learning R is finding the name of the function to perform a particular task. In writing a bit of Julia code for fitting generalized linear models, as described below, I found myself in exactly the same position of having to search through documentation to find how to do something that I felt should be simple. The experience is frustrating but I don’t know of a way of avoiding it. One word of advice for R programmers looking at Julia: the names of most functions correspond to the Matlab/Octave names, not the R names. One exception is the d-p-q-r functions for distributions, as I described in an earlier posting. [bold emphasis added in last paragraph]

Problem: Programming languages with different names for the same operation.

Suggestions anyone?

😉

Do topic maps spring to mind?

Perhaps with select match language, select target language and auto-completion capabilities?

An unintrusive window or pop-up for text entry of a name (or signature) in the match language, displaying the equivalent name/signature in the target language (would Hamming distance work here?). Using XTM/CTM as the format would enable distributed (and yet interchangeable) construction of editorial artifacts for various programming languages.
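A sketch of the lookup half of that idea (Python; the handful of R-to-Julia name pairs is illustrative only, and a real editorial artifact would carry signatures as well as names):

```python
from difflib import get_close_matches

# A few illustrative R -> Julia name pairs (not a complete or authoritative mapping).
r_to_julia = {
    "paste": "string",
    "toupper": "uppercase",
    "tolower": "lowercase",
    "strsplit": "split",
}

def translate(r_name):
    """Look up the Julia equivalent, tolerating small typos in the R name."""
    hits = get_close_matches(r_name, list(r_to_julia), n=1, cutoff=0.7)
    return r_to_julia[hits[0]] if hits else "(no mapping found)"

print(translate("toupper"))  # exact hit -> 'uppercase'
print(translate("touper"))   # fuzzy hit -> 'uppercase'
```

The fuzzy lookup stands in for the auto-completion; the dictionary itself is the part that could be interchanged as XTM/CTM.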

Not the path to world domination or peace but on the other hand, it would be useful.

Lumia Review Cluster

Filed under: Clustering,Marketing — Patrick Durusau @ 4:20 pm

Lumia Review Cluster

Matthew Hurst has clustered reviews of the Nokia Lumia 900.

His blog post is an image of the cluster at a point in time so you will have to go to http://d8taplex.com/track/microsoft-widescreen.html to interact with the cluster.

What would you add to this cluster to make it more useful? Such as sub-clustering only the reviews of the Nokia Lumia 900, or clustering based on mentions of other phones for comparison?

Other navigation?

You have the subject (what is being buzzed about) and you have a relative measure of the “buzz.” How do we take advantage of that “buzz?”

Navigating to a place is great fun but most people expect something at the end of the journey.

What does your topic map enable at the end of a journey?

Data and the Liar’s Paradox

Filed under: Data,Data Quality,Marketing — Patrick Durusau @ 4:20 pm

Data and the Liar’s Paradox by Jim Harris.

Jim writes:

“This statement is a lie.”

That is an example of what is known in philosophy and logic as the Liar’s Paradox because if “this statement is a lie” is true, then the statement is false, which would in turn mean that it’s actually true, but this would mean that it’s false, and so on in an infinite, and paradoxical, loop of simultaneous truth and falsehood.

I have never been a fan of the data management concept known as the Single Version of the Truth, and I often quote Bob Kotch, via Tom Redman’s excellent book, Data Driven: “For all important data, there are too many uses, too many viewpoints, and too much nuance for a single version to have any hope of success. This does not imply malfeasance on anyone’s part; it is simply a fact of life. Getting everyone to work from a Single Version of the Truth may be a noble goal, but it is better to call this the One Lie Strategy than anything resembling truth.”

More business/data quality reading.

Imagine my chagrin after years of studying literary criticism in graduate seminary classes (don’t ask, it’s a long and boring story) to discover that business types already know “truth” is a relative thing.

What does that mean for topic maps?

I would argue that with careful design we can capture several points of view, using a point of view as our vantage point.

As opposed to strategies that can only capture a single point of view, their own.

Capturing multiple viewpoints will be a hot topic when “big data” starts to hit the “big fan.”

Books That Influenced My Thinking: Quality, Productivity and Competitive Position

Filed under: Data Quality,Marketing — Patrick Durusau @ 4:19 pm

Books That Influenced My Thinking: Quality, Productivity and Competitive Position by Thomas Redman.

From the post:

I recently learned that Technics Publications, led by Steve Hoberman, is re-issuing one of my favorites, Data and Reality by William Kent. It led me to conclude I ought to review some of the books that most influenced my thinking about data quality. (I’ll include Data and Reality, when the re-issue appears). I am explicitly excluding books on data quality per se.

First up is Dr. Deming’s Quality, Productivity and Competitive Position (QPC). First published in 1982, to me this is Deming at his finest. The more famous Out of The Crisis came out about the same time and the two cover much the same material. But QPC is raw, powerful Deming. He is fed up with the economic malaise of corporate America at the time and he rails against top management for simply not understanding the role of quality in marketplace competition.

Data quality is a “hot” topic these days. I thought it might be useful to see what business perspective resources were available on the topic.

Both to learn management “speak” about data quality and how solutions are evaluated.

QPC sounds a bit dated (1982) but I rather doubt management has changed that much, although the terms by which management is described have probably changed a lot. Not the terms used by their employees but the terms used by consultants who are being paid by management. 😉

Not to forget that topic maps as information products, information services or software, all face the same issues of quality, productivity and competitive position.

50 million messages per second – on a single machine

Filed under: Akka,Marketing — Patrick Durusau @ 4:19 pm

50 million messages per second – on a single machine

From the post:

50 million messages per second on a single machine is mind blowing!

We have measured this for a micro benchmark of Akka 2.0.

As promised in Scalability of Fork Join Pool I will here describe one of the tuning settings that can be used to achieve even higher throughput than the amazing numbers presented previously. Using the same benchmark as in Scalability of Fork Join Pool and only changing the configuration we go from 20 to 50 million messages per second.

The micro benchmark uses pairs of actors sending messages to each other, classical ping-pong. All sharing the same fork join dispatcher.
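The ping-pong pattern itself is easy to sketch, if not its throughput. A toy version (Python asyncio standing in for Scala/Akka actors; no performance claim intended) of two actors exchanging messages through queues:

```python
import asyncio

async def actor(name, inbox, outbox, rounds):
    """Receive a message, then reply: the classic ping-pong benchmark shape."""
    for _ in range(rounds):
        msg = await inbox.get()
        await outbox.put(f"{name} saw {msg}")

async def main():
    a_in, b_in = asyncio.Queue(), asyncio.Queue()
    rounds = 5
    await a_in.put("ping")  # seed the exchange
    await asyncio.gather(
        actor("A", a_in, b_in, rounds),
        actor("B", b_in, a_in, rounds),
    )

asyncio.run(main())
```

Akka’s numbers come from running millions of such pairs over a tuned fork join dispatcher; the shape of the exchange is all this sketch shares with it.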

Fairly sure the web scale folks will just sniff and move on. It’s not like every Facebook user is sending individual messages to all of their friends and their friends’ friends, all at the same time.

On the other hand, 50 million messages per second per machine, on enough machines, and you are talking about a real pile of messages. 😉

Are we approaching the point of data being responsible for processing itself and reporting the results? Or at least reporting itself to the nearest processor with the appropriate inputs? Perhaps by broadcasting a message itself?

Closer to home, could a topic map infrastructure be built using message passing that reports a TMDM based data model? For use by query or constraint languages? That is, it presents a TMDM API, as it were, although behind the scenes the reported API is the result of message passing and processing.

That would make the data model or API if you prefer, a matter of what message passing had been implemented.

More malleable and flexible than a relational database scheme or Cyc based ontology. An enlightened data structure, for a new age.

April 4, 2012

Astronomers Look to Exascale Computing to Uncover Mysteries of the Universe

Filed under: Astroinformatics,Marketing — Patrick Durusau @ 3:34 pm

Astronomers Look to Exascale Computing to Uncover Mysteries of the Universe by Robert Gelber.

From the post:

Plans are currently underway for development of the world’s most powerful radio telescope. The Square Kilometer Array (SKA) will consist of roughly 3,000 antennae located in Southern Africa or Australia; its final location may be decided later this month. The heart of this system, however, will include one of the world’s fastest supercomputers.

The array is quite demanding of both data storage and processing power. It is expected to generate an exabyte of data per day and require a multi-exaflops supercomputer to process it. Rebecca Boyle of Popsci wrote an article about the telescope’s computing demands, estimating that such a machine would have to deliver between two and thirty exaflops.

The array is not due to go online until 2024 but that really isn’t that far away.

Strides in engineering, processing, programming, and other fields, all of which rely upon information retrieval, are going to be necessary. Will your semantic application advance or retard those efforts?

California abandons $2 billion court management system

Filed under: Marketing — Patrick Durusau @ 3:29 pm

California abandons $2 billion court management system by Michael Krigsman.

From the post:

Despite spending $500 million on the California Case Management System (CCMS), court officials terminated the project and allocated $8.6 million to determine whether they can salvage anything. In 2004, planners expected the system to cost $260 million; today, the price tag would be $2 billion if the project runs to completion.

The multi-billion project, started in 2001, was intended to automate California court operations with a common system across the state and replace 70 different legacy systems. Although benefits from the planned system seem clear, court leadership decided it could no longer afford the cost of completing the system, especially during this period of budget cuts, service reductions, and personnel layoffs.

This failure wasn’t entirely due to the diversity of legacy applications. I expect poor project management, local politics and just bad IT advice all played their parts.

But it is an example of how removing local diversity in IT represents a bridge too far.

Diversity in our population is a common thing. (English-only states notwithstanding.)

Diversity in IT is common as well.

Diversity in plant and animal populations makes them more robust.

Perhaps diversity in IT systems, with engineered interchange, could give us robustness and interoperability.

March 28, 2012

Once Upon A Subject Clearly…

Filed under: Identity,Marketing,Subject Identity — Patrick Durusau @ 4:22 pm

As I was writing up the GWAS Central post, the question occurred to me: does their mapping of identifiers take something away from topic maps?

My answer is no and I would like to say why if you have a couple of minutes. 😉 Seriously! It isn’t going to take that long. However long it has taken me to reach this point.

Every time we talk, write or otherwise communicate about a subject, we at the same time have identified that subject. Makes sense. We want whoever we are talking, writing to or communicating with, to understand what we are talking about. Hard to do if we don’t identify what subject(s) we are talking about.

We do it all day, every day. In public, in private, in semi-public places. 😉 And we use words to do it. To identify the subjects we are talking about.

For the most part, or at least fairly often, we are understood by other people. Not always, but most of the time.

The problem comes in when we start to gather up information from different people who may (or may not) use words differently than we do. So there is a much larger chance that we don’t mean the same thing by the same words. Or we may use different words to mean the same thing.

Words, which were our reliable servants for the most part, become far less reliable.

To counter that unreliability, we can create groups of words, mappings if you like, to keep track of what words go where. But, to do that, we have to use words, again.

Start to see the problem? We always use words, to clear up our difficulties with words. And there isn’t any universal stopping place. The Cyc advocates would have us stop there and the SUMO crowd would have us stop over there and the Semantic Web folks yet somewhere else and of course the topic map mavens, yet one or more places.

For some purposes, any one or more of those mappings may be adequate. A mapping is only as good and for as long as it is useful.

History tells us that every mapping will be replaced with other mappings. We would do well to understand/document the words we are using as part of our mappings, as well as we are able.

But if words are used to map words, where do we stop? My suggestion would be to stop as we always have, wherever looks convenient. So long as the mapping suits your present purposes, what more would you ask of it?

I am quite content to have such stopping places because it means we will always have more starting places for the next round of mapping!

Ironic isn’t it? We create mappings to make sense out of words and our words lay the foundation for others to do the same.

March 26, 2012

Accountable Government – Stopping Improper Payments

Filed under: Marketing — Patrick Durusau @ 6:35 pm

Accountable Government – Stopping Improper Payments by Kimberley Williams.

Kimberley cites a couple of the usual food stamp and unemployment fraud cases to show the need for data integration.

I find it curious that small-fry fraud (food stamps, welfare, unemployment) is nearly always cited as the basis for better financial controls in government.

The question that needs to be asked is: What is the ROI for stopping small-fry fraud? Which would need a reliable estimate of fraud versus the expense of better financial controls to stop it. If the expense is greater than the fraud, why bother?

On the other hand, defense contractor fraud may justify data integration and attempts at better financial controls. For example, for the fiscal year 2009, the Defense Criminal Investigative Service recovered $2,077,282,746. That’s billions with a B.

That is money recovered, not estimated fraud.

From the following year but just in case you want to personalize the narrative a bit:

A South Carolina defense contractor has agreed to pay the U.S. government more than $1 million to resolve fraud allegations related to a contract with the Defense Department.

U.S. Attorney Bill Nettles said Wednesday the Defense Department paid nearly $435,000 to Columbia-based FN Manufacturing LLC to mentor minority-owned companies. But the government says FN never provided some of the mentoring and contracted out some of the services, an action that violated the company’s contract.

FN is a subsidiary of FN Herstal of Belgium. The company makes the popular M-16 rifle, which is carried by almost every soldier. (Source: http://www2.wspa.com/news/2010/aug/04/defense-contractor-fined-ar-661483/)

or,

Defense company BAE Systems PLC said yesterday it would pay fines totaling more than $400 million after reaching settlements with Britain’s anti-fraud agency and the U.S. Justice Department to end decades-long corruption investigations into the company.

The world’s No. 2 defense contractor said that under its agreement with Washington, it would plead guilty to one criminal charge of conspiring to make false statements to the U.S. government over regulatory filings in 2000 and 2002. The agreement was subject to court approval, it said.

In Britain, it said it would plead guilty to one charge of breach of duty to keep proper accounting records about payments it made to a former marketing adviser in Tanzania in relation to the sale of a military radar system in 1999.

The bulk of the fines would be paid to the U.S. authorities. In Britain, BAE will be paying penalties of 30 million pounds ($46.9 million), including a charity payment to Tanzania.

BAE said it “regrets the lack of rigor in the past” and “accepts full responsibility for these past shortcomings.” (Source: http://www.capecodonline.com/apps/pbcs.dll/article?AID=/20100206/BIZ/2060310)


Measuring User Retention with Hadoop and Hive

Filed under: Hadoop,Hive,Marketing — Patrick Durusau @ 6:35 pm

Measuring User Retention with Hadoop and Hive by Daniel Russo.

From the post:

The Hadoop ecosystem is comprised of numerous technologies that can work together to provide a powerful and scalable mechanism for analyzing and deriving insight from large quantities of data.

In an effort to showcase the flexibility and raw power of queries that can be performed over large datasets stored in Hadoop, this post is written to demonstrate an example use case. The specific goal is to produce data related to user retention, an important metric for all product companies to analyze and understand.

Motivation: Why User Retention?

Broadly speaking, when equipped with the appropriate tools and data, we can enable our team and our customers to better understand the factors that drive user engagement and to ultimately make decisions that deliver better products to market.

User retention measures speak to the core of product quality by answering a crucial question about how the product resonates with users. In the case of apps (mobile or otherwise), that question is: “how many days does it take for users to stop using (or uninstall) the app?”.

Pinch Media (now Flurry) delivered a formative presentation early in the AppStore’s history. Among numerous insights collected from their dataset was the following slide, which detailed patterns in user retention across all apps implementing their tracking SDK:

I mention this example because:

  • User retention is the measure of an app’s success or failure.*
  • Hadoop and Hive skill sets are good ones to pick up.

* I have a pronounced fondness for requirements and the documenting of the same. Others prefer unit/user/interface/final tests. Still others prefer formal proofs of “correctness.” All pale beside the test of “user retention.” If users keep using an application, what other measure would be meaningful?
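Since the post is about computing retention from a usage log, the shape of the computation is worth a miniature. A sketch (Python, with a toy in-memory log standing in for the Hive table; the events are invented) of day-N retention, the fraction of a day’s new users still active N days later:

```python
from datetime import date

# Toy usage log: (user_id, activity_date); a Hive table would hold billions of these.
events = [
    ("u1", date(2012, 3, 1)), ("u1", date(2012, 3, 2)), ("u1", date(2012, 3, 8)),
    ("u2", date(2012, 3, 1)), ("u2", date(2012, 3, 2)),
    ("u3", date(2012, 3, 1)),
]

def day_n_retention(events, cohort_day, n):
    """Fraction of users first seen on cohort_day who are active n days later."""
    first_seen = {}
    for user, day in events:
        first_seen[user] = min(first_seen.get(user, day), day)
    cohort = {u for u, d in first_seen.items() if d == cohort_day}
    active = {u for u, d in events
              if u in cohort and (d - cohort_day).days == n}
    return len(active) / len(cohort) if cohort else 0.0

print(day_n_retention(events, date(2012, 3, 1), 1))  # 2 of 3 users -> 0.666...
```

Daniel’s post does the same grouping in Hive, where the cohort and activity joins run across the full dataset instead of a list.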

March 23, 2012

Building a Bigger Haystack

Filed under: Data Mining,Marketing,Topic Maps — Patrick Durusau @ 7:23 pm

Counterterrorism center increases data retention time to five years by Mark Rockwell.

From the post:

The National Counterterrorism Center, which acts as the government’s clearinghouse for terrorist data, has moved to hold onto certain types of data for up to five years to improve its ability to keep track of it across government databases.

On March 22, NCTC implemented new guidelines allowing a much lengthier data retention period for “terrorism information” in federal datasets including non-terrorism information. NCTC had previously been required to destroy data on citizens within three months if no ties were found to terrorism. Those rules, according to NCTC, limited the effectiveness of the data, since in some instances the ability to link across data sets over time could help track threats that weren’t immediate, or immediately evident. According to the center, the longer retention time can aid in connecting dots that aren’t immediately evident when the initial data is collected.

Director of National Intelligence James Clapper, Attorney General Eric Holder, and National Counterterrorism Center (NCTC) Director Matthew Olsen signed the updated guidelines designed on March 22 to allow NCTC to obtain and more effectively analyze certain data in the government’s possession to better address terrorism-related threats.

I looked for the new guidelines but apparently they are not posted to the NCTC website.

Here is the justification for the change:

One of the issues identified by Congress and the intelligence community after the 2009 Fort Hood shootings and the Christmas Day 2009 bombing attempt was the government’s limited ability to query multiple federal datasets and to correlate information from many sources that might relate to a potential attack, said the center. A review of those attacks recommended the intelligence community push for the use of state-of-the-art search and correlation capabilities, including techniques that would provide a single point of entry to various government databases, it said.

“Following the failed terrorist attack in December 2009, representatives of the counterterrorism community concluded it is vital for NCTC to be provided with a variety of datasets from various agencies that contain terrorism information,” said Clapper in a March 22 statement. “The ability to search against these datasets for up to five years on a continuing basis as these updated Guidelines permit will enable NCTC to accomplish its mission more practically and effectively than the 2008 Guidelines allowed.”

OK, so for those two cases, what evidence would having search capabilities over five years’ worth of data uncover? Even with the clarity of hindsight, there has been no showing of what data could have been uncovered.

The father of the attacker reported his son’s intentions to the CIA on November 19, 2009. That’s right, 36 days before the attack.

Building a bigger haystack is a singularly ineffectual way to fight terrorism. It will generate more data and more IT systems, with the personnel to man and sustain them, all of which serve agency growth, not the goal of fighting terrorism.

Cablegate was the result of a “bigger haystack” project. Do you think we need another one?

Topic maps and other semantic technologies can produce smaller, relevant haystacks.

I guess that is the question:

Do you want more staff and a larger budget, or the potential to combat terrorism? (The latter is only potential, given that US intelligence can’t intercept bombers on 36 days’ notice.)

March 21, 2012

FDsys – Topic Maps – Concrete Example

Filed under: Marketing,Topic Maps — Patrick Durusau @ 3:31 pm

FDsys – Topic Maps – Concrete Example

Have you ever wanted a quick, concrete example to give someone of the need for a topic map?

Today I saw: Liberating America’s secret, for-pay laws which is a great read on how pay-for standards are cited by federal regulations, but you have to pay for the standards to know what the rules say.

That’s a rip-off isn’t it? You not only have to follow the rules, on pain of enforcement, but you have to pay to know what the rules are.

Being a member of the OASIS Technical Advisory Board and general advocate of open standards, I see an opportunity for OASIS to claim some PR here.

So I go to the FDsys site, choose advanced search, limited to the Code of Federal Regulations and enter “OASIS.”

I get 298 “hits” for “collection:CFR and content:OASIS.”

Really?

Well, the first one is: http://www.gpo.gov/fdsys/pkg/CFR-2011-title18-vol1/pdf/CFR-2011-title18-vol1-sec37-6.pdf. Title 18?

In the event that an OASIS user makes an error in a query, the Responsible Party can block the affected query and notify the user of the nature of the error. The OASIS user must correct the error before making any additional queries. If there is a dispute over whether an error has occurred, the procedures in paragraph (d) of this section apply.

FYI, Title 18 is Federal Energy Regulatory Commission so this doesn’t sound right.

To cut to the chase, I find:

http://www.gpo.gov/fdsys/pkg/CFR-2009-title47-vol1/pdf/CFR-2009-title47-vol1-sec10-10.pdf

as one of the relevant examples.

Two questions:

  1. How to direct OASIS members to citations in U.S. and foreign laws/regs to promote OASIS?
  2. How to make U.S. and foreign regulators aware of relevant OASIS materials?

Hint: The answer is not:

  • Waiting for them to discover OASIS and its fine work.
  • Advertising the fine work of OASIS to its own membership and staff.

March 15, 2012

Mancrush on Todd Park?

Filed under: Governance,Government,Marketing — Patrick Durusau @ 8:02 pm

OK, I admit It. I have a mancrush on the new Federal CTO, Todd Park by Tim O’Reilly.

Tim waxes on about Todd’s success with startups and what I would call a vendor/startup show, Health Datapalooza. (Does the agenda for Health Datapalooza 2012 look just a little vague to you? Not what I would call a “technical” conference.)

And Tim closes with this suggestion:

I want to put out a request to all my friends in the technology world: if Todd calls you and asks you for help, please take the call, and do whatever he asks.

Since every denizen of K-Street already has Todd’s private cell number on speed dial, the technology community needs to take another tack.

Assuming you don’t already own several members of Congress and/or federal agencies, watch for news of IT issues relevant to your speciality.

Send in one (1) suggestion on a one (1) page letter that clearly summarizes why your proposal is relevant, cost-effective and worthy of further discussion. The brevity will be such a shocker that your suggestion will stand out from the hand cart stuff that pours in from, err, traditional sources.

The Office of Science and Technology Policy (no link from the White House homepage, to keep you from having to hunt for it). This is where Todd will be working.

Contact page for The Office of Science and Technology (You can attach a document to your message.)

I would copy your representative/senators, particularly if you donate on a regular basis.

Todd’s predecessor is described as having “…inspired and productive three years on the job.” (Todd Park Named New U.S. Chief Technology Officer). I wonder if that is what Tim means by “productive?”

March 14, 2012

Plastic Surgeon Holds Video Contest, Offers Free Nose Job to Winner

Filed under: Contest,Marketing — Patrick Durusau @ 7:35 pm

Plastic Surgeon Holds Video Contest, Offers Free Nose Job to Winner by Tim Nudd.

From the post:

Plastic surgeons aren’t known for their innovative marketing. But then, Michael Salzhauer isn’t your ordinary plastic surgeon. He’s “Dr. Schnoz,” the self-described “Nose King of Miami,” and he’s got an unorthodox offer for would-be patients—a free nose job to the winner of a just-announced video contest.

Can’t give away a nose job but what about a topic map?

What sort of contest should we have?

What would you do for a topic map?

March 13, 2012

ESPN API

Filed under: ESPN,Marketing — Patrick Durusau @ 8:16 pm

ESPN API

ESPN is developing a public API!

Sam Hunting often said that sports, with all the fan trivia, was a natural for topic maps.

Here is a golden opportunity!

Imagine a topic map that accesses ESPN and merges with local arrest/divorce records, fan blogs, photos from various sources.

First seen at Simply Statistics.

Then BI and Data Science Thinking Are Flawed, Too

Filed under: Identification,Identifiers,Marketing,Subject Identifiers,Subject Identity — Patrick Durusau @ 8:15 pm

Then BI and Data Science Thinking Are Flawed, Too

Steve Miller writes:

I just finished an informative read entitled “Everything is Obvious: *Once You Know the Answer – How Common Sense Fails Us,” by social scientist Duncan Watts.

Regular readers of Open Thoughts on Analytics won’t be surprised I found a book with a title like this noteworthy. I’ve written quite a bit over the years on challenges we face trying to be the rational, objective, non-biased actors and decision-makers we think we are.

So why is a book outlining the weaknesses of day-to-day, common sense thinking important for business intelligence and data science? Because both BI and DS are driven from a science of business framework that formulates and tests hypotheses on the causes and effects of business operations. If the thinking that produces that testable understanding is flawed, then so will be the resulting BI and DS.

According to Watts, common sense is “exquisitely adapted to handling the kind of complexity that arises in everyday situations … But ‘situations’ involving corporations, cultures, markets, nation-states, and global institutions exhibit a very different kind of complexity from everyday situations. And under these circumstances, common sense turns out to suffer from a number of errors that systematically mislead us. Yet because of the way we learn from experience … the failings of commonsense reasoning are rarely apparent to us … The paradox of common sense, therefore, is that even as it helps us make sense of the world, it can actively undermine our ability to understand it.”

The author argues that common sense explanations to complex behavior fail in three ways. The first error is that the mental model of individual behavior is systematically flawed. The second centers on explanations for collective behavior that are even worse, often missing the “emergence” – one plus one equals three – of social behavior. And finally, “we learn less from history than we think we do, and that misperception skews our perception of the future.”

Reminds me of Thinking, Fast and Slow by Daniel Kahneman.

Not that two books with a similar “take” prove anything, but you should put them on your reading list.

I wonder when/where our perceptions of CS practices have been skewed?

Or where that has played a role in our decision making about information systems?

March 10, 2012

Spreadsheets: Let’s Just Be Friends

Filed under: Marketing — Patrick Durusau @ 8:21 pm

Spreadsheets: Let’s Just Be Friends by Timothy Powers.

From the post:

Have you ever had that awkward conversation with a significant other where they tell you they just want to be friends?

Sometimes the news is hard to swallow. It forces you to ask yourself, “What could I have done better?”

This same tough conversation needs to happen with certain software applications too. People just stay in relationships with software for too long. That said, it’s time to have the “friend talk” and break up with spreadsheets.

You’ve never really loved them. It’s been a relationship of convenience – they just showed up one day on your laptop and the rest was history. Yes, they’re nice and have a good personality (as much as software can), but it’s time to cut the cord and just be friends.

You have to really admire the copy writers for IBM. Without a hint of a blush, the post concludes with notice of an upcoming presentation (March 7, 2012) that promises you will: “…see new solutions that will give you a more personal relationship with your data.”

So we go from “just being friends,” to a “more personal relationship” in a scant number of lines.

To be honest, I have never really wanted a personal relation with my data. Or with my computer for that matter. One is a resource and the other is a tool, nothing more. Maybe that view depends on your social skills. 😉

This piece may help you cast doubt on the suitability of spreadsheets for all cases but I would avoid promising that topic maps or any other technology offers a “personal relationship.” (Do you know if common law still has a breach of promise of marriage action?)

THOMSON REUTERS NEWS ANALYTICS FOR INTERNET NEWS AND SOCIAL MEDIA

Filed under: Marketing,News — Patrick Durusau @ 8:21 pm

THOMSON REUTERS NEWS ANALYTICS FOR INTERNET NEWS AND SOCIAL MEDIA

From the post:

Thomson Reuters News Analytics (TRNA) for Internet News and Social Media is a powerful tool that allows users to analyze millions of public and premium sources of internet content, tag and filter that content to focus on the most relevant sources, and turn the mass of data into actionable analytics that can be used to support trading, investment and risk management decisions.

The TRNA engine is based on tried and tested technology that is widely deployed by trading firms to analyze Reuters News and a host of other professional news wire services. TRNA for Internet News and Social Media leverages this core technology to analyze content sourced in collaboration with Moreover Technologies, which aggregates content from more than four million social media channels and 50,000 Internet news sites. This content is then analyzed in real-time by the TRNA engine, generating an output of quantifiable data points across a number of dimensions such as sentiment, relevance, and novelty. These and many other metrics can help analysts understand with greater context, what is being said and how it is being said across a number of media channels for a more complete picture.

I mention this story not because I think Thomson Reuters is using topic maps or that you will be likely to compete with them in the same markets.

No, I mention this story because Thomson Reuters doesn’t offer services for which there is no demand. That is to say, repackaging information from “big data” and other sources offers a new market for information products.

Using topic maps as the basis for repackaging streams of data into information products will enable you to leverage the talents of one analyst across any number of products, instead of tasking analysts with producing new graphics of well-known information.

Ad targeting at Yahoo

Filed under: Ad Targeting,Marketing,User Targeting — Patrick Durusau @ 8:20 pm

Ad targeting at Yahoo by Greg Linden.

From the post:

A remarkably detailed paper, “Web-Scale User Modeling for Targeting” (PDF), will be presented at WWW 2012 that gives many insights into how Yahoo does personalized advertising.

In summary, the researchers describe a system used in production at Yahoo that does daily builds of large user profiles. Each profile contains tens of thousands of features that summarize the interests of each user from the web pages they have viewed, searches they made, and ads they have viewed, clicked on, and converted (bought something) on. They explain how important it is to use conversions, not just ad clicks, to train the system. They measure the importance of using recent history (what you did in the last couple days), of using fine-grained data (detailed categories and even some specific pages and queries), of using large profiles, and of including data about ad views (which is a huge and low quality data source since there are multiple ad views per page view), and find all those significantly help performance.
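For a feel of the profile-building step, here is a toy sketch (Python; the feature names, event weights and decay factor are my inventions, not Yahoo’s) of recency-weighted aggregation, echoing the paper’s findings that recent history and conversions matter most:

```python
from collections import defaultdict

def build_profile(events, decay=0.9):
    """Aggregate (days_ago, event_type, feature) tuples into a weighted profile.

    Conversions get a higher base weight than views or clicks, echoing the
    paper's point that conversions are the more informative training signal.
    """
    base_weight = {"page_view": 1.0, "search": 2.0, "ad_click": 3.0, "conversion": 10.0}
    profile = defaultdict(float)
    for days_ago, event_type, feature in events:
        profile[feature] += base_weight[event_type] * (decay ** days_ago)
    return dict(profile)

# Invented browsing history for one user.
history = [
    (0, "search", "category:travel"),
    (1, "page_view", "category:travel"),
    (1, "conversion", "category:shoes"),
    (30, "ad_click", "category:cars"),
]
print(build_profile(history))
```

The real system does this daily, at web scale, over tens of thousands of features per user; the decay term is what makes the last couple of days dominate the profile.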

You need to read the paper and Greg’s analysis (+ additional references) if you are interested in user profiles/marketing.

Even if you are not, I think the paper offers a window into one view of user behavior. Whether that view works for you, your ad clients or topic map applications, is another question.

