Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

October 4, 2011

Who's Afraid of Topic Maps? (see end for alt title)

Filed under: Marketing,Topic Maps — Patrick Durusau @ 7:52 pm

I saw a post asking why programmers don't use topic maps. I replied at the time, but on reflection I think the answer is simpler than I first thought.

What do ontologies, classification systems and terminologies all have in common?

Ontology

SUMO and Cyc are projects that most would agree fall under the rubric of "ontology."

Classification System

The Library of Congress Subject Headings (LCSH) is an example of a classification system.

Terminology

SNOMED-CT self-identifies as a terminology, so it makes a good example.

Substitute other projects that fall under these labels. It doesn’t change the following analysis.

What do these projects have in common?

SNOMED-CT and LCSH were both produced by organs of the U.S. government but SUMO and Cyc were not.

SUMO and Cyc both claim to be universal upper ontologies while SNOMED-CT and LCSH make no such claims.

All four have organizations that have grown up around them or fostered them. Is that a clue? Perhaps.

Question: If you had a question about SUMO, Cyc, SNOMED-CT or LCSH, and needed an authoritative answer, who would you ask?

Answer: Ask the appropriate project. Yes? That is, only the maintainers of SUMO, Cyc, SNOMED-CT or LCSH can provide authoritative answers for their projects. Only their answers have interoperability with their systems.

Topic maps offer decentralized authority over terms, while maintaining the use of extended terms across topic map systems.

I know, that didn't read very smoothly, so let me try to explain by example.

In Semantic Integration: N-Squared to N+1 (and decentralized) I demonstrated how four (4) different authors could have four (4) different identifiers for Superman and write different things about Superman.

As a topic map author, I notice they are talking about the same subject and, unknown to those authors, I create a topic that merges all of their information about Superman together.

Those authors may continue to write different information about Superman using their identifier, but anyone using my topic map will see all the information gathered together.

The same reasoning applies to SNOMED-CT and LCSH, whose medical classifications differ. The medical community and their patients could wait until the SNOMED-CT and LCSH organizations create a mapping between the two, but there are other options.

A medical researcher who notices a mapping between terms in SNOMED-CT and LCSH could mark it, and future researchers (assuming they accept the mapping) would find information from either source, using either identifier, together. Creating sets of identifiers is just that simple. (I lie, it would take a useful interface as well, but that's doable.)
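
To make the mechanics concrete, here is a minimal sketch in plain Python (no topic map library; the codes are invented for illustration, not real SNOMED-CT or LCSH identifiers):

# A researcher's mapping topic: a set of identifiers asserted
# to identify the same subject. Both codes are hypothetical.
mapping_topic = {
    "identifiers": {
        "snomed:1234567",    # hypothetical SNOMED-CT concept id
        "lcsh:sh00000000",   # hypothetical LCSH heading id
    },
    "asserted_by": "researcher@example.org",
}

def identifies(query_id, topic):
    # Any identifier in the set now retrieves information
    # recorded under any of the others.
    return query_id in topic["identifiers"]

print(identifies("lcsh:sh00000000", mapping_topic))  # True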

Note the difference in process.

In one case, highly bureaucratic organizations with a stake in the use of "their" ontology/classification/terminology make all the decisions about which mappings are made and what those mappings include.

In the topic map case, the person with the need for the information and the expertise finds a mapping between information sources and adds the mapping on the spot. A breadcrumb, if you will, that leaves future researchers with more information than existed at that location before.

Oh, and one other issue: interoperability.

If you construct topic maps using the Topic Maps Data Model (TMDM) and one of the standard syntaxes, no matter what new material you add to it, it will work with other standard topic map software.

Try that with your local library catalog software for example.

The question isn't why programmers aren't using topic maps.

Topic maps enable decentralized decision making about information, without preset limits on that information, with interoperability by default.

The question is: Why aren’t you demanding the use of topic maps? (you have to answer that one for yourself)

Alt title: Topic Maps: Priesthood of the User

PS: Quite serious about the alt title. Remember when browsers were supposed to display webpages the way users wanted them? That didn't last very long, did it? Maybe we should have another go at it with information?

October 3, 2011

Scala – [Java – *] Documentation – Marketing Topic Maps

Filed under: Interface Research/Design,Marketing,Scala,Topic Maps — Patrick Durusau @ 7:06 pm

Scala Documentation

As usual, when I am pursuing one lead to interesting material for or on topic maps, another pops up!

The Scala Days 2011 wiki had the following note:

Please note that the Scala wikis are in a state of flux. We strongly encourage you to add content but avoid creating permanent links. URLs will frequently change. For our long-term plans see this post by the doc czar.

A post that was followed by the usual comments about re-inventing the wheel, documentation being produced but not known to many, etc.

I mentioned topic maps as a method to improve program documentation to a very skilled Java/Topic Maps programmer, who responded: How would that be an improvement over Javadoc?

How indeed?

Hmmm, well, for starters the API documentation would not be limited to a particular program. That is to say, for common code, the API documentation for, say, a package could be shared across several independent programs, so that when the package documentation is improved for one, it is improved for all.

Second, it is possible, although certainly not required, to maintain API documentation as "active" documentation. That is to say, it has a "fixed" representation such as HTML only because we have chosen to render it that way. Topic maps can reach out and incorporate content from any source as part of API documentation.

Third, this does not require any change in current documentation systems, which is fortunate because that would require the re-invention of the wheel in all documentation systems for source code/programming documentation. A wheel that continues to be re-invented with every new source repository and programming language.
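
To sketch the first two points (every name and URL below is invented), a single documentation topic can serve several independent programs at once:

# One topic identifies a package; occurrences point at documentation
# about that package wherever it lives.
package_doc_topic = {
    "subject_identifiers": {"http://example.org/packages/com.example.util"},
    "occurrences": [
        {"type": "apidoc",   "locator": "http://project-a.example.org/api/util.html"},
        {"type": "apidoc",   "locator": "http://project-b.example.org/docs/util/"},
        {"type": "tutorial", "locator": "http://wiki.example.org/util-howto"},
    ],
}
# Improve any target and the improvement is visible from every
# program whose topic map merges on the same subject identifier.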

So long as the content is addressable (hard to think of content that is non-addressable, do you have a counter-example?), topic maps can envelop and incorporate that content with other content in a meaningful way. Granting that incorporating some content requires more effort than other content. (The pointer "Go ask Bill with a street address" would be unusual but not unusable.)

The real question is, as always, is it worth the effort in a particular context to create such a topic map? Answers to that are going to vary depending upon your requirements and interests.

Comments?

PS: For extra points, how would you handle the pointer "Go ask Bill + street address" so that the pointer and its results can be used in a TMDM instance for merging purposes? It is possible. The result of any identifier can be represented as an IRI. That much TBL got right. It was the failure to realize that it is necessary to distinguish between use of an address as an identifier versus a locator that has caused so much wasted effort in the SW project.

Well, that and an identifier imperialism that requires every identifier be transposed into IRI syntax. Given all the extant identifiers, with new ones being invented every day, let's just say that replacing all extant identifiers comes under the "fairy tales we tell children" label, where they all live happily ever after.
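
For the extra-points exercise, one hedged approach (the urn scheme below is made up) is to percent-encode the instruction into an IRI that serves as an identifier, never a locator:

from urllib.parse import quote

def identifier_iri(instruction):
    # Wrap an arbitrary identifier in a (made-up) URN namespace.
    # The IRI identifies a subject; nothing promises that
    # dereferencing it retrieves anything.
    return "urn:x-identifier:" + quote(instruction, safe="")

print(identifier_iri("Go ask Bill, 123 Main Street, Anytown"))
# urn:x-identifier:Go%20ask%20Bill%2C%20123%20Main%20Street%2C%20Anytown

Two topics carrying that same subject identifier merge under the TMDM, no dereferencing required.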

September 30, 2011

Semantic Integration: N-Squared to N+1 (and decentralized)

Filed under: Data Integration,Mapping,Marketing,Semantics,TMDM,Topic Maps — Patrick Durusau @ 7:02 pm

Data Integration: The Relational Logic Approach pays homage to what is called the N-squared problem. The premise of N-squared for data integration is that every distinct identification must be mapped to every other distinct identification. Here is a graphic of the N-squared problem.

Two usual responses, depending upon the proposed solution.

First, get thee to a master schema (probably the most common). That is, map every distinct data source to a common schema and make all clients interact with that one schema. Case closed. Except data sources come and go, as do clients, so there is maintenance overhead, and agreeing on updates can take time.

Second, no system integrates every other possible source of data, so the fear of N-squared is greatly exaggerated. Not unlike the sudden rush for “big data” solutions whether the client has “big data” or not. Who would want to admit to having “medium” or even “small” data?

The third response is that of topic maps. The assumption that every identification must map to every other identification means things get ugly in a hurry. Topic maps question that premise.

Here is an illustration of how five separate topic maps, with five different identifications of a popular comic book character (Superman), can be combined and yet avoid the N-Squared problem. In fact, topic maps offer an N+1 solution to the problem.
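
To put numbers on it, a back-of-the-envelope check:

n = 5                        # distinct identifications of Superman below
pairwise = n * (n - 1) // 2  # N-squared approach: map every pair
hub = 1                      # topic map approach: one added topic listing all n
print(pairwise, hub)         # 10 1
# A sixth identification costs 5 more pairwise mappings,
# but only one more identifier on the added topic.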

Each snippet, written in Compact Topic Map (CTM) syntax, represents a separate topic map.


en-superman
http://en.wikipedia.org/wiki/Super_man ;
- "Superman" ;
- altname: "Clark Kent" .

***


de-superman
http://de.wikipedia.org/wiki/Superman ;
- "Superman" ;
- birthname: "Kal-El" .

***


fr-superman
http://fr.wikipedia.org/wiki/Superman ;
- "Superman" ;
birthplace: "Krypton" .

***


it-superman
http://it.wikipedia.org/wiki/Superman ;
- "Superman" ;
- altname: "Man of Steel" .

***


eo-superman
http://eo.wikipedia.org/wiki/Superman ;
- "Superman" ;
- altname: "Clark Joseph Kent" .

Copied into a common file, superman-N-squared.ctm, nothing happens. That's because they all have different subject identifiers. What if I add the following topic to the file/topic map:


superman
http://en.wikipedia.org/wiki/Super_man ;
http://de.wikipedia.org/wiki/Superman ;
http://fr.wikipedia.org/wiki/Superman ;
http://it.wikipedia.org/wiki/Superman ;
http://eo.wikipedia.org/wiki/Superman .

That results in the file superman-N-squared-solution.ctm.

Ooooh.

Or an author need only know one other identifier. So long as any group of authors uses at least one common identifier between any two maps, their separate topic maps merge. (Ordering of the merges may be an issue; see the sketch below.)

Another way to say that is that the trigger for merging of identifications is decentralized.

Which gives you a lot more eyes on the data, potential subjects and relationships between subjects.
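
For the curious, here is that merge rule in miniature, as a plain Python sketch (not any topic map engine's actual code). Because merging is a union over shared identifiers, the result is the same whatever order the maps arrive in, which answers the ordering worry above for this simple case:

def merge_topics(topics):
    # topics: a list of sets of subject identifiers.
    # Union any topics that share an identifier.
    merged = []
    for ids in topics:
        ids = set(ids)
        rest = []
        for m in merged:
            if m & ids:       # shared identifier: same subject
                ids |= m
            else:
                rest.append(m)
        merged = rest + [ids]
    return merged

maps = [
    {"http://en.wikipedia.org/wiki/Super_man"},
    {"http://de.wikipedia.org/wiki/Superman"},
    {"http://fr.wikipedia.org/wiki/Superman"},
    {"http://it.wikipedia.org/wiki/Superman"},
    {"http://eo.wikipedia.org/wiki/Superman"},
    {"http://en.wikipedia.org/wiki/Super_man",   # the added topic
     "http://de.wikipedia.org/wiki/Superman",
     "http://fr.wikipedia.org/wiki/Superman",
     "http://it.wikipedia.org/wiki/Superman",
     "http://eo.wikipedia.org/wiki/Superman"},
]
print(len(merge_topics(maps)))  # 1: five maps collapse into one topic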

PS: Did you know that the English and German versions give Superman's cover name as "Clark Kent," while the French, Italian and Esperanto versions give his cover name as "Clark Joseph Kent"?

PPS: The files are both here, Superman-Semantics-01.zip.

September 29, 2011

Why your product sucks

Filed under: Marketing,Software — Patrick Durusau @ 6:37 pm

Why your product sucks by Mike Pumphrey.

It isn’t often I stop listening to the Kinks for a software presentation, much less a recorded one. The title made me curious enough to spend six (6) minutes on it (total length).

My summary of the presentation:

Do you want to be righteous and make users work to use your software or do you want to be ubiquitous? Your choice.

September 27, 2011

Linked Data Semantic Issues (same for topic maps?)

Filed under: Linked Data,LOD,Marketing,Merging,Topic Maps — Patrick Durusau @ 6:51 pm

Sebastian Schaffert posted a message on the public-lod@w3.org list that raised several issues about Linked Data. Issues that sound relevant to topic maps. See what you think.

From the post:

We are working together with many IT companies (with excellent software developers) and trying to convince them that Semantic Web technologies are superior for information integration. They are already overwhelmed when they have to understand that a database ID for an object is not enough. If they have to start distinguishing between the data object and the real world entity the object might be representing, they will be lost completely.

I guess being told that a “real world entity” may have different ways to be identified must seem to be the road to perdition.

Curious, because the "real world" is a messy place. Or is that the problem? That the world of developers is artificially "clean," at least as far as identification and reference go?

Perhaps CS programs need to train developers for encounters with the messy "real world."

From the post:

> When you dereference the URL for a person (such as …/561666514#), you get back RDF. Our _expectation_, of course, is that that RDF will include some remarks about that person (…/561666514#), but there can be no guarantee of this, and no guarantee that it won’t include more information than you asked for. All you can reliably expect is that _something_ will come back, which the service believes to be true and hopes will be useful. You add this to your knowledge of the world, and move on.

There I have my main problem. If I ask for “A”, I am not really interested in “B”. What our client implementation therefore does is to throw away everything that is about B and only keeps data about A. Which is – in case of the FB data – nothing. The reason why we do this is that often you will get back a large amount of irrelevant (to us) data even if you only requested information about a specific resource. I am not interested in the 999 other resources the service might also want to offer information about, I am only interested in the data I asked for. Also, you need to have some kind of “handle” on how to start working with the data you get back, like:
1. I ask for information about A, and the server gives me back what it knows about A (there, my expectation again …)
2. From the data I get, I specifically ask for some common properties, like A foaf:name ?N and do something with the bindings of N. Now how would I know how to even formulate the query if I ask for A but get back B?

Ouch! That one cuts a little close. 😉

What about the folks who are "…not really interested in 'B'"?

How do topic maps serve their interests?

Or have we decided for them that more information about a subject is better?

Or is that a matter of topic map design? What information to include?

That “merging” and what gets “merged” is a user/client decision?

That is how it works in practice simply due to time, resources, and other constraints.

Marketing questions:

How to discover data users would like to have appear with other data, prior to having a contract to do so?

Can we re-purpose search logs for that?

September 26, 2011

VOGCLUSTERS: an example of DAME web application

Filed under: Astroinformatics,Data Integration,Marketing — Patrick Durusau @ 6:59 pm

VOGCLUSTERS: an example of DAME web application by Marco Castellani, Massimo Brescia, Ettore Mancini, Luca Pellecchia, and Giuseppe Longo.

Abstract:

We present the alpha release of the VOGCLUSTERS web application, specialized for data and text mining on globular clusters. It is one of the web2.0 technology based services of Data Mining & Exploration (DAME) Program, devoted to mine and explore heterogeneous information related to globular clusters data.

VOGCLUSTERS (The alpha website.)

From the webpage:

This page is the entry point to the VOGCLUSTERS Web Application (alpha release) specialized for data and text mining on globular clusters. It is a toolset of DAME Program to manage and explore GC data in various formats.

In this page the users can obtain news, documentation and technical support about the web application.

The goal of the project VOGCLUSTERS is the design and development of a web application specialized in the data and text mining activities for astronomical archives related to globular clusters. Main services are employed for the simple and quick navigation in the archives (uniformed under VO standards and constraints) and their manipulation to correlate and integrate internal scientific information.

The project is not to be understood as a straightforward website for the globular clusters, but as a web application. A website usually refers to the front-end interface through which the public interact with your information online. Websites are typically informational in nature with a limited amount of advanced functionality. Simple websites consist primarily of static content where the data displayed is the same for every visitor and content changes are infrequent. More advanced websites may have management and interactive content. A web application, or equivalently Rich Internet Application (RIA), usually includes a website component but features additional advanced functionality to replace or enhance existing processes. The interface design objective behind a web application is to simulate the intuitive, immediate interaction a user experiences with a desktop application.

Note the use of DAME as a foundation to “…manage and explore GC data in various formats.”

Just in case you are unaware, astronomy/radio astronomy, along with High Energy Physics (HEP), were the original big data.

If you have an interest in astronomy, this would be a good project to follow and perhaps to suggest topic map techniques.

Effective marketing of topic maps requires more than writing papers and hoping that someone reads them. Invest your time and effort into a project, then suggest (appropriately) the use of topic maps. You and your proposal will have more credibility that way.

September 24, 2011

The Impact of online reviews: An annotated bibliography

Filed under: Marketing,Reviews — Patrick Durusau @ 6:59 pm

The Impact of online reviews: An annotated bibliography by Panos Ipeirotis.

From the post:

A few weeks back, I received some questions about online consumer reviews, their impact on sales, and other related questions. At that point, I realized that while I had a good grasp of the technical literature within Computer Science venues, my grasp of the overall empirical literature within Marketing and Information Systems venues was rather shaky, so I had to do a better work in preparing a literature review.

So, I did whatever a self-respecting professor would do in such a situation: I asked my PhD student, Beibei Li, to compile a list of such papers, write a brief summary of each, and send me the list. She had passed her qualification exam by studying exactly this area, so she was the resident expert in the topic.

Beibei did not disappoint me. A few hours later I had a very good list of papers in my mailbox, together with the description. It was so good, that I thought that many other people would be interested in the list.

Questions:

  1. When was the last time you read a review of topic map software?
  2. When was the last time you read a review of a topic map?

I mention this bibliography in part to show the usefulness of online reviews and possibly how to make them effective.

If that sounds like cold-blooded marketing, there is a good reason. It is.

What topic map software or topic map would you suggest for review?

Where would you publish the review?

In case you are having trouble thinking of one, check the Topic Maps Lab projects listing.

Introducing Fech

Filed under: Dataset,Marketing — Patrick Durusau @ 6:58 pm

Introducing Fech by Michael Strickland.

From the post:

Ten years ago, the Federal Election Commission introduced electronic filing for political committees that raise and spend money to influence elections to the House and the White House. The filings contain aggregate information about a committee's work (what it has spent, what it owes) and more detailed listings of its interactions with the public (who has donated to it, who it has paid for services).

Journalists who work with these filings need to extract their data from complex text files that can reach hundreds of megabytes. Turning a new set into usable data involves using the F.E.C.'s data dictionaries to match all the fields to their positions in the data. But the available fields have changed over time, and subsequent versions don't always match up. For example, finding a committee's total operating expenses in version 7 means knowing to look in column 52 of the "F3P" line. It used to be found at column 50 in version 6, and at column 44 in version 5. To make this process faster, my co-intern Evan Carmi and I created a library to do that matching automatically.

Fech (think "F.E.C.h," say "fetch") is a Ruby gem that abstracts away any need to map data points to their meanings by hand. When you give Fech a filing, it checks to see which version of the F.E.C.'s software generated it. Then, when you ask for a field like "total operating expenses," Fech knows how to retrieve the proper value, no matter where in the filing that particular software version stores it.

At present Fech only parses presidential filings but can be extended to other filings.
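
The column numbers in the quote make the idea easy to sketch. This is not Fech's Ruby API, just a Python illustration of the version-to-position lookup it automates (whether the positions are zero- or one-based is left to a real parser):

# Where "total operating expenses" lives on the F3P line,
# by F.E.C. software version (positions from the quote above).
F3P_FIELDS = {
    "5": {"total_operating_expenses": 44},
    "6": {"total_operating_expenses": 50},
    "7": {"total_operating_expenses": 52},
}

def field(version, row, name):
    # row: a parsed F3P line as a list of columns.
    return row[F3P_FIELDS[version][name]]

# Callers ask for a field by name; the version bookkeeping is hidden:
# field("7", row, "total_operating_expenses")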

OK, so now it is easier to get campaign finance information. Now what?

So members of congress live in the pockets of their largest supporters. Is that news?

How would you use topic maps to make that news? Serious question.

Or how would you use topic maps to make that extracted data a value-add when combined with other New York Times content?


Update: Fech 1.1 Released.

September 19, 2011

SDD Contest!

Filed under: Humor,Marketing — Patrick Durusau @ 7:54 pm

As a follow up to my posting about SDD systems yesterday, I wanted to uncover some more material on the subject.

It occurred to me to search a well known computer science publisher’s site using just the acronym, “SDD.”

Here are the results from the first ten (10) hits (in no particular order):

  • semi-discrete matrix decomposition (SDD) method
  • soft decision-directed (SDD) adaptation
  • self-organizing link layer protocol (SDD)
  • Hierarchical Set Decision Diagrams (SDD)
  • SDD (strictly diagonally dominant)
  • secure Directed Diffusion protocol (SDD)
  • structured dialogic design
  • strong disjunctive database
  • solid-state-storage-device
  • storytest-driven development

Don't bother counting, it's ten (10) out of ten (10).

A better search engine would have computed a dissimilarity for terms and used that to separate terms that are not likely to have the same meaning. It should then group those "dissimilar" terms and say to the user: Your term(s) may have more than one meaning. We have created probable meanings for you to use as filters. (Then display snippets similar to those above to the user.)

That would avoid my having to sort through 236 “hits” as of today for “SDD.”
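
A toy sketch of that grouping step (assuming we have the text snippet for each hit; nothing here is tied to any particular search engine):

def words(snippet):
    return set(snippet.lower().split())

def dissimilar(a, b, threshold=0.8):
    # Jaccard dissimilarity of the snippets' word sets.
    wa, wb = words(a), words(b)
    return 1 - len(wa & wb) / len(wa | wb) > threshold

def group_hits(snippets):
    # Greedy grouping: a hit joins the first group it is NOT
    # dissimilar to; otherwise it starts a new probable meaning.
    groups = []
    for s in snippets:
        for g in groups:
            if not dissimilar(s, g[0]):
                g.append(s)
                break
        else:
            groups.append([s])
    return groups

# Each group becomes a filter offered to the user:
# "Your term may have more than one meaning."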

True, that is a very poor search term, but if we can boost performance for the edge cases, think of what we will do for the more mainstream searches.

Oh, sorry, almost forgot the contest part! Please contribute other expansions for the acronym SDD (non-obscene expansions only). No prizes, I am just curious about the number of unique expansions. It does make a good example of semantic ambiguity.


On a more serious note, a search interface that enabled readers to disambiguate content by choosing from a listing of terms would, over time, improve its search offerings to users. One can imagine professors having their graduate students disambiguate their articles for the same reason people write HTML pages: they want their content found by others.

Is Semantic Reuse A Requirement?

Filed under: Marketing — Patrick Durusau @ 7:52 pm

I was blogging about improvements at GeoCommons when I thought about my usual complaint about mapping services: There is no possibility of semantic reuse. Whatever the user thought they recognized as the basis for a mapping, is simply not recorded.

That is true for all the ETL apps, like Talend, Kettle and others. They all handle mappings/transformations, all of which fail to specify identifications of the subjects recognized by the user. Simply not present. Any reuse will require another user to make an implicit mapping of subjects and their identifications, which again goes unrecorded.

Is it the case that reuse of semantic mappings may not be a requirement?

That is, are mappings created as one-use mappings, so that in the event of changes the entire mapping has to be inspected, if not re-created?

September 18, 2011

Neo4j Pitchfilm

Filed under: Marketing,Neo4j — Patrick Durusau @ 7:28 pm

Neo4j Pitchfilm

15 second film to pitch Neo4j.

Does this remind you of any technology?

September 17, 2011

The Revolution(s) Are Being Televised

Filed under: Crowd Sourcing,Image Recognition,Image Understanding,Marketing — Patrick Durusau @ 8:17 pm

Revolutions usually mean human rights violations, lots of them.

Patrick Meier has a project to collect evidence of mass human rights violations in Syria.

See: Help Crowdsource Satellite Imagery Analysis for Syria: Building a Library of Evidence

Topic maps are an ideal solution to link objects in dated satellite images to eye witness accounts, captured military documents, ground photos, news accounts and other information.

I say that for two reasons:

First, with a topic map you can start from any linked object in a photo, a witness account, ground photo or news account and see all related evidence for that location. Granted, that takes someone authoring that collation, but it doesn't have to be only one someone.

Second, topic maps offer parallel subject processing, which can distribute the authoring task in a crowd-sourced project.

For example, I could be doing photo analysis and marking the location of military checkpoints. That would generate topics and associations for the geographic location, the type of installation, dates (from the photos), etc. Someone else could be interviewing witnesses and taking their testimony. As part of processing that testimony, another volunteer codes an approximate date and geographic location in connection with part of it. Still another person is coding military orders by identified individuals for checkpoints that include the one in question.

Associations between all these separately encoded bits of evidence, each unknown to the individual volunteers, become a mouse-click away from coming to the attention of anyone reviewing the evidence. And from determining responsibility.
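
A toy sketch of that collation (every name, date and coordinate below is invented):

topics = {
    "checkpoint-17": {"type": "checkpoint", "location": (35.0, 36.7)},
    "photo-0412":    {"type": "satellite-photo", "date": "2011-08-03"},
    "testimony-88":  {"type": "witness-account", "date": "2011-08-03"},
    "order-23":      {"type": "military-order", "signed_by": "officer-x"},
}

# Associations, each added independently by a different volunteer:
associations = [
    ("photo-0412",   "shows",      "checkpoint-17"),
    ("testimony-88", "describes",  "checkpoint-17"),
    ("order-23",     "authorizes", "checkpoint-17"),
]

def evidence_for(topic_id):
    # Start anywhere; every related piece is one hop away.
    return [a for a in associations if topic_id in (a[0], a[2])]

print(evidence_for("checkpoint-17"))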

The alternative, the one most commonly used, is to have an under-staffed international group piece together the best evidence it can from a sea of documents, photos, witness accounts, etc. An adequate job for the resources they have, but why settle for an “adequate” job when it can be done properly with 21st century technology?

September 13, 2011

The Science of Timing

Filed under: Marketing,Topic Maps — Patrick Durusau @ 7:12 pm

The Science of Timing

Webinar on when to tweet, blog, update Facebook, etc.

Just like other aspects of marketing there are patterns that are more successful than others.

Successful marketing of topic maps, whoever gets the contracts, is something I would really like to see.

What is it they say? A rising tide lifts all boats?

September 5, 2011

Palin, Bachmann, and the Internal Welfare Code (aka, Internal Revenue Code)

Filed under: Government Data,Marketing,Topic Maps — Patrick Durusau @ 8:02 pm

Sarah Palin and Rep. Michelle Bachmann (R-Minnesota) support a 0% corporate tax rate and closing corporate loopholes in the Internal Revenue Code.*

Those cheering are more interested in the 0% corporate tax rate than closing corporate loopholes.

Truth be told, it should be called the Internal Welfare Code (IWC) as most of its provisions are loopholes for one group or another.

That makes tax reform hard because it is welfare reform. To have reform, someone has to give up their welfare benefits.

When welfare/tax provisions are written into the IWC/IRC, reports are prepared on the cost in revenue for those provisions. It often is easy to see who benefits from them.

Now there is a topic map project: mapping the provisions of the IWC/IRC to the reports on "cost in revenue" for those provisions and identifying those who benefit from them. From that mapping you could produce a color-coded IWC/IRC that has the loopholes/provisions for each group identified by color. Or even re-organize the IWC/IRC by color, so the loopholes for each group can be roughly compared.
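
A toy sketch of that color-coding (the sections and beneficiaries are invented, not real IRC provisions):

# Hypothetical sections mapped to the groups that benefit from them.
provisions = {
    "sec. 9001": "energy producers",
    "sec. 9002": "energy producers",
    "sec. 7003": "real estate",
    "sec. 5001": "agriculture",
}
colors = {"energy producers": "red", "real estate": "blue",
          "agriculture": "green"}

# Re-organize by beneficiary so the loopholes can be roughly compared:
by_group = {}
for section, group in provisions.items():
    by_group.setdefault((colors[group], group), []).append(section)

for (color, group), sections in sorted(by_group.items()):
    print(color, group, sections)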

That would be government transparency with bite!

PS: If you know of any government transparency project that would be interested, please pass this along. Or any candidate for that matter.

*The logic of closing corporate loopholes while moving to a 0% tax rate escapes me. But, I am not running for President of the United States.

September 3, 2011

Topic Map Opportunity: Financial Crimes

Filed under: Marketing,Topic Maps — Patrick Durusau @ 6:48 pm

As in investigating them.*

The need for identity resolution is alive and well. Here is an example of one market for the results that topic maps deliver.

Investigating Financial Crimes: Looking for Parts of Needles Over Multiple Haystacks?

From the webpage:

The International Association of Financial Crimes Investigators (IAFCI) annual conference begins next week in Charlotte, NC. The association, a non-profit international organization, provides an environment within which information about financial fraud, fraud investigation and fraud prevention methods can be collected, exchanged and taught for the common good of the industry. Infoglide Software Corporation is a proud sponsor of IAFCI and will be attending this year's event. We invite all of our friends – and future customers – to come visit us at Booth 105. We would love to see you there. The conference begins on Monday, August 29th and runs through Thursday, September 2nd at the Charlotte Convention Center.

IAFCI has members across the world in every major continent, broken down by about one third law enforcement, one third banking and one third retail and service members. The membership dovetails nicely with Infoglide's customer base. With a presence in major retail organizations, top global banks and mission critical government agencies, it is evident that Infoglide's Identity Resolution Engine (IRE) is a tool that financial crimes investigators are excited about.

If you're in the business of detecting and investigating financial crimes, AML and fraud, you know what it's like to perform endless searches into disparate data sources looking for that golden nugget of information. It's worse than trying to find a needle in a haystack. In fact, the needle itself is usually spread across several haystacks. Fortunately, Infoglide's patented IRE software helps financial crimes investigators quickly identify 'persons of interest' within those haystacks of data. Here's how:

  1. Enterprise-wide Identity Resolution: allows single-request searching into multiple databases without the need to move or clean the data. Accounting for variations in names, addresses and other attributes, it eliminates time and effort in triaging fraud cases, and allows analysts to focus on the high-return cases.
  2. Social Link Discovery: looks at non-obvious relationships between individuals across databases. By understanding, for example, that a loan applicant shares an address with the loans officer, and also shares a telephone number with a known fraudster, a company can gain immediate insight into the risks associated with that transaction.
  3. Anonymous Resolution for data Privacy: allows organizations to productively search into restricted databases without violating international data privacy laws. The analyst can understand if a match was 'likely' found in the restricted data, without ever seeing or retrieving the actual results.
  4. Real Time Red Flag Analysis: is the proactive implementation of the technology that looks at incoming transactions and compares them to internal and third-party databases to understand possible identity matches and non-obvious relationships. If one is found, the software triggers an instant alert.

So, if you or someone you know is heading to the conference, please stop by to meet us. It's possible those haystacks aren't quite as intimidating as you thought.

What needles in haystacks are you finding?


* That and other uses, inquire.

September 1, 2011

Everything is Subjective

Filed under: Marketing,Topic Maps — Patrick Durusau @ 6:05 pm

Everything is Subjective

I ran across Peter Brown’s keynote presentation at TMRA 2008 this morning while looking for something else.

You will see why I paused as soon as you load the slides. 😉

Having the advantage of three years to think about the issues Peter raises, I think part of his conclusion:

It’s my world – I want to organize it around what is important for me, not you.

Is spot on.

I mention it because the tenth anniversary of 9/11 approaches and the U.S. intelligence agencies are still organizing their data around what is important to them, the number 1 goal being to increase the importance and budget allocation of their agency over others.

Still, if you can see it, you can map it and having mapped it, merge data with your own. I won’t ask how you got it.

August 25, 2011

Erlang Community Site

Filed under: Erlang,Marketing — Patrick Durusau @ 7:02 pm

Erlang Community site: www.trapexit.org

Interesting collection of links to various Erlang resources.

Includes Try Erlang site, where you can try Erlang in your browser.

I have seen topic maps displayed in web browsers. I have seen fairly ugly topic map editors in web browsers. No, I don't think I have seen a "Try Topic Maps" type site. Have I just missed it?

Thoughts? Suggestions?

August 24, 2011

Do You CTRL+F?

Filed under: Marketing,Search Interface,Searching,Topic Maps — Patrick Durusau @ 7:00 pm

College students stumped by search engines

This link was forwarded to me by Sam Hunting.

That college students can’t do adequate searching isn’t a surprise.

What did surprise me was the finding: “…90 percent of American Google users do not know how to use CTRL or Command+F to find a word on a page.”

That finding was reported in: Crazy: 90 Percent of People Don’t Know How to Use CTRL+F.

Or as it appears in the article:

This week, I talked with Dan Russell, a search anthropologist at Google, about the time he spends with random people studying how they search for stuff. One statistic blew my mind. 90 percent of people in their studies don’t know how to use CTRL/Command + F to find a word in a document or web page! I probably use that trick 20 times per day and yet the vast majority of people don’t use it at all.

“90 percent of the US Internet population does not know that. This is on a sample size of thousands,” Russell said. “I do these field studies and I can’t tell you how many hours I’ve sat in somebody’s house as they’ve read through a long document trying to find the result they’re looking for. At the end I’ll say to them, ‘Let me show one little trick here,’ and very often people will say, ‘I can’t believe I’ve been wasting my life!'”

How should this finding influence subject identity tests and/or user interfaces for topic maps?

Should this push us towards topic map based data products, marketed as data products, not as topic maps?

August 20, 2011

B.A.D. Data Is Not Always Bad…If You Have a Data Scientist

Filed under: Data Analysis,Marketing — Patrick Durusau @ 8:07 pm

B.A.D. Data Is Not Always Bad…If You Have a Data Scientist by Frank Coleman.

From the post:

How many times have you heard, "Bad data means bad decisions"? Starting with the Best Available Data (B.A.D.) is a great approach because it gets the inspection process moving. The best way to engage key stakeholders is to show them their numbers, even if you have low confidence in the results. If done well, you will be speaking with a group of passionate colleagues!

People are often afraid to start measuring a project or initiative because they have low confidence in the quality of the data they are accessing. But there is a great deal you can do with B.A.D. data; start by looking for trends. Many times the trend is all you really need to get going. Make sure you also understand what the distribution of this data looks like. You don't have to be a Six Sigma black belt (though it helps) to know if the data has a normal distribution. From there you can "geek out" if you want, but your time will be better served by keeping it simple – especially at this stage.

A bit “practical” for my tastes, ;-), but worth your attention.

August 18, 2011

Integration Imperatives Around Complex Big Data

Filed under: BigData,Data as Service (DaaS),Data Integration,Marketing — Patrick Durusau @ 6:52 pm

Integration Imperatives Around Complex Big Data

  • Informatica Corporation (NASDAQ: INFA), the world's number one independent provider of data integration software, today announced the availability of a new research report from the Aberdeen Group that shows how organizations can get the most from their data integration assets in the face of rapidly growing data volumes and increasing data complexity.
  • Entitled: Future Integration Needs: Embracing Complex Data, the Aberdeen report reveals that:
    • Big Data is the new reality – In 2010, organizations experienced a staggering average data volume growth of 40 percent.
    • XML adoption has increased dramatically – XML is the most common semi-structured data source that organizations integrate. 74 percent of organizations are integrating XML from external sources. 66 percent of organizations are integrating XML from internal sources.
    • Data complexity is skyrocketing – In the next 12 months enterprises plan to introduce more complex unstructured data sources – including office productivity documents, email, web content and social media data – than any other data type.
    • External data sources are proliferating – On average, organizations are integrating 14 external data sources, up from 11 a year ago.
    • Integration costs are rising – As integration of external data rises, it continues to be a labor- and cost-intensive task, with organizations integrating external sources spending 25 percent of their total integration budget in this area.
  • For example, according to Aberdeen, organizations that have effectively integrated complex data are able to:
    • Use up to 50 percent larger data sets for business intelligence and analytics.
    • Integrate external unstructured data into business processes twice as successfully (40 percent vs. 19 percent).
    • Deliver critical information in the required time window 2.5 times more often via automated data refresh.
    • Slash the incidence of errors in their data almost in half compared to organizations relying on manual intervention when performing data updates and refreshes.
    • Spend an average of 43 percent less on integration software (based on 2010 spend).
    • Develop integration competence more quickly with significantly lower services and support expenditures, resulting in less costly business results.

I like the 25% of data integration budgets being spent on integrating external data. Imagine making that easier for enterprises with a topic map based service.

Maybe "Data as service (DaaS)" will evolve from simply being data delivery to dynamic integration of data from multiple sources, where currency, reliability, composition, and other features of the data are on a sliding scale of value.

August 15, 2011

Visitor Conversion with Bayesian Discriminant and Hadoop

Filed under: Bayesian Models,Hadoop,Marketing — Patrick Durusau @ 7:31 pm

Visitor Conversion with Bayesian Discriminant and Hadoop

From the post:

You have lots of visitors on your eCommerce web site and obviously you would like most of them to convert. By conversion, I mean buying your product or service. It could also mean the visitor taking an action which potentially could financially benefit the business, e.g., opening an account or signing up for an email newsletter. In this post, I will cover some predictive data mining techniques that may facilitate a higher conversion rate.

Wouldn't it be nice if, for any ongoing session, you could predict the odds of the visitor converting during the session, based on the visitor's behavior during the session?

Armed with such information, you could take different kinds of actions to enhance the chances of conversion. You could entice the visitor with a discount offer. Or you could engage the visitor in a live chat to answer any product related questions.

There are simple predictive analytic techniques to predict the probability of a visitor converting. When the predicted probability crosses a predefined threshold, the visitor could be considered to have high potential of converting.

I would ask the question of “conversion” more broadly.

That is, how can we dynamically change the model of subject identity in a topic map to match a user's expectations? What user behavior would we track, and how would we track it, to reach such an end?

My reasoning: users are more interested in, and more likely to support, topic maps that reinforce their world views. And selling someone topic map output they find agreeable is easier than selling output they find disagreeable.
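
For the flavor of the technique named in the title, here is a minimal Bayesian discriminant over a single session feature. The numbers are invented; the post's own model uses more features and estimated parameters:

import math

# Hypothetical per-class statistics for "pages viewed this session":
# (mean, standard deviation, class prior), estimated from past sessions.
stats = {"convert": (9.0, 3.0, 0.05),
         "no_convert": (3.0, 2.0, 0.95)}

def log_score(x, mean, sd, prior):
    # Log of a Gaussian density times the class prior.
    return math.log(prior) - math.log(sd) - ((x - mean) ** 2) / (2 * sd * sd)

def p_convert(pages):
    a = log_score(pages, *stats["convert"])
    b = log_score(pages, *stats["no_convert"])
    return 1 / (1 + math.exp(b - a))

THRESHOLD = 0.5
for pages in (3, 8, 12):
    p = p_convert(pages)
    print(pages, round(p, 3), p > THRESHOLD)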

August 14, 2011

Mining Data in Motion

Filed under: Data Mining,Marketing,Topic Maps — Patrick Durusau @ 7:09 pm

Mining Data in Motion by Chris Nott says: "…The scope for business innovation is considerable."

Or in context:

Mining data in motion. On the face of it, this seems to be a paradox: data in motion is transitory and so can’t be mined. However, this is one of the most powerful concepts for businesses to explore innovative opportunities if they can only release themselves from the constraints of today’s IT thinking.

Currently analytics are focused on data at rest. But exploiting information as it arrives into an organisation can open up new opportunities. This might include influencing customers as they interact based on analytics triggered by web log insight, social media analytics, a real-time view of business operations, or all three. The scope for business innovation is considerable.

The ability to mine this live information in real time is a new field of computer science. The objective is to process information as it arrives, using the knowledge of what has occurred in the past, but the challenge is in organising the data in a way that it is accessible to the analytics, processing a stream of data in motion.

Innovation in this context is going to require subject recognition, whether so phrased or not, and collation with other information, some of which may also be from live feeds.

I am curious whether standards for warranting the reliability of identifications, or of information in general, are going to arise. I suspect there will be explicit liability limitations for information and the effort made to verify it. Free information will likely carry a disclaimer for any use for any purpose. Take your chances.

How reliable is the information you are supplied? That depends upon the level of liability you have purchased.

I wonder how an “information warranty” economy would affect information suppliers who now disavow responsibility for their information content. Interesting because businesses would not hire lawyers or accountants who did not take some responsibility for their work. Perhaps there are more opportunities in data mining than just data stream mining.

Perhaps: Topic Maps – How Much Information Certainty Can You Afford?

Information could range from the fun house stuff you see on Fox to full traceability to sources that expands in real time. Depends on what you can afford.

August 10, 2011

Neo4j Enhanced API

Filed under: Marketing,Neo4j,TMRM — Patrick Durusau @ 7:15 pm

Neo4j Enhanced API

From the wiki page:

A Neo4J graph consists of the following element types:

  • Node
  • Relationship
  • RelationshipType
  • Property name
  • Property value

These five types of elements don’t share a common interface, except for Node and Relationship, which both extend the PropertyContainer interface.

The Enhanced API unifies all Neo4j elements under the common interface Vertex.

Which has the result:

Generalizations of database elements

The Vertex interface support methods for the manipulation of Properties and Edges, thereby providing all methods normally associated with Nodes. Properties and Edges (including their Types) are Vertices too. This allows for the generalization of all Neo4j database elements as if they were Nodes.

Due to generalization it is possible to create Edges involving regular Vertices, Edges of any kind, including BinaryEdges and Properties.

The generalization also makes it possible to set Properties on all Vertices. So it even becomes possible to set a Property on a Property.

Hmmm, properties on properties, where have I heard that? 😉

Properties on properties and properties on values are what we need for robust data preservation, migration or even re-use.

I was reminded recently that SNOBOL turns 50 years old in 2012. Care to guess how many formats, schemas, and data structures we have been through during just that time period? Some of them were intended to be "legacy" formats, forever readable by those who follow. Except that people forget the meaning of the "properties" and their "values."

If we had properties on properties and properties on values, we could at least record our present understandings of those items. And others could do the same to our properties and values.
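
A bare-bones sketch of what that buys you (this is not the Enhanced API itself, whose interfaces I won't guess at, just the generalization in plain Python):

class Vertex:
    # Everything, including a property, is a vertex and can
    # therefore carry properties of its own.
    def __init__(self, value=None):
        self.value = value
        self.properties = {}  # name -> Vertex

    def set(self, name, value):
        self.properties[name] = Vertex(value)
        return self.properties[name]

record = Vertex("measurement")
weight = record.set("weight", 70)       # a property...
weight.set("units", "kilograms")        # ...with properties of its own,
weight.set("recorded", "2011-08-10")    # recording how we read it today.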

Those mappings would not be useful to everyone. But if present, we would have the option to follow those mappings or not.

Perhaps that’s the key, topic maps are about transparent choice in the reuse of data.

It leaves the exercise of that choice up to the user.

This is a step in that direction.

August 9, 2011

Tottenham riots: Data journalists and social scientists should join forces

Filed under: Marketing,Topic Maps — Patrick Durusau @ 7:56 pm

Tottenham riots: Data journalists and social scientists should join forces

Interpretations of riots reek of racial and class prejudice. I remember the riots in the 1960s as well as more recent ones. The interpretations that followed could be predicted based on what channel or commentator was on the TV.

Topic maps are well suited to bring up parallel events from history, along with calmer analysis.

I wonder if anyone would bother to read such a topic map? Or, like the various economic bubbles that keep repeating themselves, would they say "this time is different"?

What is a Data Scientist?

Filed under: Marketing,Topic Maps — Patrick Durusau @ 7:55 pm

What is a Data Scientist?

About as hard to answer as: “What is a topic map designer?”

From the post:

  • Obtain: pointing and clicking does not scale.
  • Scrub: the world is a messy place.
  • Explore: You can see a lot by looking.
  • Models: always bad, sometimes ugly.
  • iNterpret: "The purpose of computing is insight, not numbers."

See the post for the full details and suggest any other qualifications in your comments below. Thanks!

August 7, 2011

U.S. to Fund Hacking Projects That Thwart Cyber-Threats

Filed under: Funding,Marketing,Topic Maps — Patrick Durusau @ 7:07 pm

U.S. to Fund Hacking Projects That Thwart Cyber-Threats

From the post:

LAS VEGAS—Former L0pht hacker known as "Mudge" discussed a new government initiative to fund hacking projects designed to help block cyber-threats at the Black Hat security conference.

The Defense Advanced Research Projects Agency will fund new cyber-security proposals under the new Cyber-Fast Track project, Peiter Zatko, currently a program manager for the agency’s information innovation office, said in his Aug. 4 keynote speech at Black Hat. The project, originally announced at ShmooCon cyber-security conference back in January, will bridge the gap between hacker groups and government agencies, he said.

Under the Cyber-Fast Track initiative, DARPA will fund between 20 and 100 projects annually. Open to anybody, the program lets researchers pitch DARPA with ideas and have a project approved and funded within 14 days of the application, Zatko said. Developers will retain intellectual property rights while DARPA will operate under government use rights, Zatko said.

That sounds awfully attractive.

I suspect the more specific the proposal, the better the chance of getting it funded, so I will be omitting the universality arguments about topic maps and the coming data singularity. 😉

I don't hang out in hacker circles (oversight on my part) so I guess the first step is to look at some of the conferences to see what threats are being discussed, along with current remedies, to get a feel for where topic maps could make a difference.

If you do hang out in hacker circles (don’t tell me) and you are interested in working on a topic maps proposals for DARPA (I won’t ask where you get all your brilliant hacker ideas from, you must just read a lot), drop me a post.

August 3, 2011

Design: Build the Mobile Gov Toolkit

Filed under: eGov,Government Data,Marketing,Mobile Gov — Patrick Durusau @ 7:39 pm

Design: Build the Mobile Gov Toolkit

Tim O’Reilly tweeted this link.

Deadline for comments: 2 September 2011

From the post:

Your recommendations will help build an open, dynamic toolset–on a public wiki–to help agencies create and implement citizen-centric mobile gov services.

We are focusing on five areas.

  1. Policies: Tell us about policy gaps or ideas to support building mobile programs.
  2. Practices: What would jumpstart your efforts? Templates? Standards? Examples? Can you share your templates, standards, business cases?
  3. Partnerships: With whom and how can we work together?
  4. Products: What are your ideas for apps, mobile sites, text programs, mashups?
  5. Promotions: What are some great ways to spread the word?
  6. Do you have another category? You can add that, too.

What should we tell them about topic maps?

UK Government Paves Way for Data-Mining

Filed under: Authoring Topic Maps,Data Mining,Marketing — Patrick Durusau @ 7:37 pm

UK Government Paves Way for Data-Mining

Blog report on interesting UK government policy report.

From the post:

The key recommendation is that the Government should press at EU level for the introduction of an exception to current copyright law, allowing "non-consumptive" use of a work (ie a use that doesn't directly trade on the underlying creative and expressive purpose of the work). In the process of text-mining, copying is only carried out as part of the analysis process – it is a substitute for a human reading the work, and therefore does not compete with the normal exploitation of the work itself – in fact, as the paper says, these processes actually facilitate a work's exploitation (ie by allowing search, or content recommendation). (emphasis in original)

If you think of topic maps as a value-add on top of information stores, allowing “non-consumptive” access would be a real boon for topic maps.

You could create a topic map into copyrighted material, and the user of your topic map could access that material only if, say, they were a subscriber to that content.
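
A sketch of that arrangement (the subscription check, URLs and helper below are invented):

def fetch(url):
    # Stand-in for an authorized HTTP request to the publisher.
    return "<content of %s>" % url

occurrence = {
    "topic": "some-subject",
    "locator": "http://publisher.example.org/article/123",
}

def resolve(occ, user):
    # The topic map only points at the content; the copyrighted
    # material itself is fetched only for authorized subscribers.
    if "publisher.example.org" in user.get("subscriptions", ()):
        return fetch(occ["locator"])
    return "locator withheld: subscription required"

print(resolve(occurrence, {"subscriptions": ["publisher.example.org"]}))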

As Steve Newcomb has argued on many occasions, topic maps can become economic artifacts in their own right.

August 2, 2011

XBRL Challenge ($20K Prize)

Filed under: Funding,Marketing,Topic Maps,XBRL — Patrick Durusau @ 7:54 pm

XBRL Challenge ($20K Prize)

OK, I admit that after the US budget debate, $20K doesn't sound like a lot of money. 😉 But, think of the prestige, groupies, etc., that would go along with winning first place.

From the website:

Over 1770 companies have already filed XBRL-formatted financial statements with the SEC and by year-end 2011, all public companies will be doing so. While several XBRL-enabled tools are available on the marketplace today, we've created the XBRL Challenge to encourage the development of more tools and build awareness among analysts about the wealth of data available to them.

The XBRL Challenge is a contest that invites participants to contribute open source analytical applications for investors that leverage corporate XBRL data.

Here is the short description of what they are looking for:

Tools that rely on XBRL data, e.g., a tool that extracts data for multi-company comparison via a desktop application, or one that creates real-time valuation measures and delivers them to mobile devices.

I am going to check out the rules and existing apps.

See you near the winner’s circle?

July 27, 2011

The possibilities of Hadoop for Big Data (video)

Filed under: Marketing — Patrick Durusau @ 8:32 am

The possibilities of Hadoop for Big Data (video)

Start off your thinking about topic map advertising with something effective and amusing!

It’s short, attention getting, doesn’t over promise or bore with details.

Enjoy! (And think about a topic map ad.)

