Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

May 5, 2011

NoSQL databases for the .NET developer: What’s the fuss all about?

Filed under: Marketing,NoSQL — Patrick Durusau @ 1:43 pm

NoSQL databases for the .NET developer: What’s the fuss all about?

Date: May 24 2011 – 2:00pm – 3:00pm EST

http://www.regonline.com/970013

From the post:

NOSQL (Not Only SQL) databases are one of the hottest technology trends in the software industry. Ranging from web companies like Facebook, Foursquare, Twitter to IT power houses such as the US Federal Government, Banks or NASA; the number of companies that invest in the NOSQL paradigm as part of their infrastructure is growing exponentially. What is this NOSQL movement? What are the different types of NOSQL databases? What are the real advantages, challenges and ROIs? Can we leverage NOSQL databases from my .NET applications? This webinar will present an overview of the NOSQL movement from the perspectives of a .NET developer. We will explore the different types of NOSQL databases as well as their .NET interfaces. Finally, we will present a series of real world examples that illustrate how other companies have taken advantage of NOSQL databases as part of their infrastructure.

Are you ready to leverage a NoSQL database from inside a topic map .Net application?

May 2, 2011

Driving Topic Map Adoption

Filed under: Interface Research/Design,Marketing — Patrick Durusau @ 10:31 am

I ran across Top Three Drivers of Solr Adoption and thought it might offer some lessons for driving topic map adoption.

From a survey of customers, the drivers were:

  • Vendor Fatigue
  • Flexibility
  • Stability

Having a “cute” name didn’t make the list. So much for all the various debates and recriminations over what to call products. Useful products, even with bad names, survive and possibly thrive. Useless products, well-named or not, don’t.

Vendor Fatigue referred to the needlessly complex and sometimes over-reaching vendor agreements that seek to guarantee only particular levels of usage, etc. You really need to see the Dilbert cartoon at the post.

Very large vendors, ahem, I pass over without naming names, can rely on repeat business “just because.” Small vendors, on the other hand, should concentrate on delivering results and not so much on trying to trap customers in agreements. (You will also have lower legal fees.)

Good results = repeat business.

Flexibility referred to the ease with which Solr can be adapted to particular needs both for input and output. Topic maps have that in spades.

Stability: here I think what the author meant was complexity. That is, Lucene is far more complex than Solr, which makes it more difficult to maintain. Solr, like any other abstraction (compare editing with ex to vi), makes common tasks easier.

Topic maps can be as complex as need be.

But, in terms of user interfaces, successful topic map applications are going to be domain/task specific.

I say that because our views of editing/reading are so shaped by our communities, that departures from those, even if equally capable of some task, feel “unnatural.”

Shaping topic map interfaces in conversation with actual users, a fairly well documented technique, is more likely to produce a successful interface than developers guessing for days what they think is an “intuitive” interface.

April 27, 2011

Hard economic lessons for news

Filed under: Marketing — Patrick Durusau @ 2:32 pm

Hard economic lessons for news

I saw this in the TechDirt Daily Email. Mike Masnick offered the following summary:

  • Tradition is not a business model. The past is no longer a reliable guide to future success.
  • “Should” is not a business model. You can say that people “should” pay for your product but they will only if they find value in it.
  • Virtue is not a business model. Just because you do good does not mean you deserve to be paid for it.
  • Business models are not made of entitlements and emotions. They are made of hard economics. Money has no heart.
  • Begging is not a business model. It’s lazy to think that foundations and contributions can solve news’ problems. There isn’t enough money there.
  • No one cares what you spent. Arguing that news costs a lot is irrelevant to the market.

One or more of these themes have been offered as justifications for semantic technologies, including topic maps.

I would add for semantic technologies:

  • Saving the world isn’t a business model. Try for something with more immediate and measurable results.
  • Cult like faith in linking isn’t a solution, it’s a delusion. Linking per se is not a value, successful results (by whatever means) are.
  • Sharing my data isn’t a goal. Sharing in someone else’s data is.

April 24, 2011

Write Good Papers

Filed under: Authoring Topic Maps,Marketing — Patrick Durusau @ 5:31 pm

Write Good Papers by Daniel Lemire merits bookmarking and frequent review.

Authoring, whether of a blog post, a formal paper, program documentation, or a topic map, is authoring.

Review of these rules will improve the result of any authoring task.

April 23, 2011

NoSQL, NewSQL and Beyond:…

Filed under: Marketing,NoSQL — Patrick Durusau @ 8:21 pm

NoSQL, NewSQL and Beyond: The answer to SPRAINed relational databases

From the post:

The 451 Group’s new long format report on emerging database alternatives, NoSQL, NewSQL and Beyond, is now available.

The report examines the changing database landscape, investigating how the failure of existing suppliers to meet the performance, scalability and flexibility needs of large-scale data processing has led to the development and adoption of alternative data management technologies.

There is one point that I think presents an opportunity for topic maps:

Polyglot persistence, and the associated trend toward polyglot programming, is driving developers toward making use of multiple database products depending on which might be suitable for a particular task.

I don’t know if the report covers the reasons for polyglot persistence as I don’t have access to the “full” version of the report. Maybe someone who does can say if the report covers why the polyglot nature of IT resources is immune to attempts at its reduction.

April 22, 2011

Square Pegs and Round Holes in the NOSQL World

Filed under: Graphs,Key-Value Stores,Marketing,Neo4j,NoSQL — Patrick Durusau @ 1:06 pm

Square Pegs and Round Holes in the NOSQL World

Jim Webber reviews why graph databases (such as Neo4J) are better for storing graphs than Key-Value, Document or relational datastores.

He concludes:

In these kind of situations, choosing a non-graph store for storing graphs is a gamble. You may find that you’ve designed your graph topology far too early in the system lifecycle and lose the ability to evolve the structure and perform business intelligence on your data. That’s why Neo4j is cool – it keeps graph and application concerns separate, and allows you to defer data modelling decisions to more responsible points throughout the lifetime of your application.

You know, we could say the same thing about topic maps, that you don’t have to commit to all modeling decisions up front.
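The point about deferring modelling decisions can be sketched without any particular product. In a graph kept as plain labeled edges, a new relationship type is just a new edge label, not a schema migration. A minimal Python sketch (toy structures, not Neo4j’s API):

```python
# A graph as a list of labeled edges: no up-front schema required.
edges = []

def relate(source, rel_type, target):
    """Add a labeled edge; a new relationship type needs no migration."""
    edges.append((source, rel_type, target))

def neighbors(node, rel_type=None):
    """Follow outgoing edges, optionally filtered by relationship type."""
    return [t for (s, r, t) in edges
            if s == node and (rel_type is None or r == rel_type)]

# Initial model: only a 'reports_to' relationship.
relate("alice", "reports_to", "bob")

# Months later a new concern appears; just start recording it.
relate("alice", "mentors", "carol")

print(neighbors("alice"))             # all of alice's edges
print(neighbors("alice", "mentors"))  # only the new relationship type
```

The design choice being illustrated: because structure lives in the data rather than in a fixed schema, the topology can evolve as the application does.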

Something to think about.

April 21, 2011

IC Bias: If it’s measurable, it’s meaningful

Filed under: Data Models,Intelligence,Marketing — Patrick Durusau @ 12:37 pm

Dean Conway writes in Data Science in the U.S. Intelligence Community [1] about modeling assumptions:

For example, it is common for an intelligence analyst to measure the relationship between two data sets as they pertain to some ongoing global event. Consider, therefore, in the recent case of the democratic revolution in Egypt that an analyst had been asked to determine the relationship between the volume of Twitter traffic related to the protests and the size of the crowds in Tahrir Square. Assuming the analyst had the data hacking skills to acquire the Twitter data, and some measure of crowd density in the square, the next step would be to decide how to model the relationship statistically.

One approach would be to use a simple linear regression to estimate how Tweets affect the number of protests, but would this be reasonable? Linear regression assumes an independent distribution of observations, which is violated by the nature of mining Twitter. Also, these events happen in both time (over the course of several hours) and space (the square), meaning there would be considerable time- and spatial-dependent bias in the sample. Understanding how modeling assumptions impact the interpretations of analytical results is critical to data science, and this is particularly true in the IC.

His central point, that understanding how modeling assumptions impact the interpretations of analytical results is critical to data science, and particularly so in the IC, cannot be overemphasized.

The example of Twitter traffic reveals a deeper bias in the intelligence community, if it’s measurable, it’s meaningful.

No doubt Twitter facilitated communication within communities that already existed but that does not make it an enabling technology.

The revolution was made possible by community organizers working over decades (http://english.aljazeera.net/news/middleeast/2011/02/2011212152337359115.html) and trade unions (http://www.guardian.co.uk/commentisfree/2011/feb/10/trade-unions-egypt-tunisia).

And the revolution continued after Twitter and then cell phones were turned off.

Understanding such events requires investment in human intelligence and analysis, not over-reliance on SIGINT. [2]


[1] Spring (2011) issue of I-Q-Tel’s quarterly journal, IQT Quarterly

[2] That a source is technical or has lights and bells, does not make it reliable or even useful.

PS: The Twitter traffic, such as it was, may have primarily come from news media: “Twitter, I think, is being used by news media people with computer connections, through those kind of means.” Facebook, Twitter, and the Middle East, IEEE Spectrum, Steve Cherry interviews Ben Zhao, expert on social networking performance.

Are we really interested in how news people use Twitter, even in a social movement context?

April 20, 2011

5 Reasons Why Product Data Integration is Like Chasing Roadrunners

Filed under: Data Integration,Marketing — Patrick Durusau @ 2:16 pm

5 Reasons Why Product Data Integration is Like Chasing Roadrunners

Abstract:

Integrating product data carries a tremendous value, but cleanly integrating that data across multiple applications, data stores, countries and businesses can be as elusive a goal as catching the famed Looney Tunes character.

So why do it?

As a report from Automotive Aftermarket Industry Association pointed out, assuming $100 billion in transactions between suppliers and direct customer in the aftermarket each year, the shared savings potential tops $1.7 billion annually by eliminating product data errors in the supply chain. That’s just potential savings in one industry, in one year.

Note to self: The 1.7% savings on transaction errors requires a flexible and accurate mapping from one party’s information system to another. Something topic maps excel at.

You know what they say, a few $billion here, a few $billion there, and pretty soon you are talking about real money.
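The mapping meant here runs field by field, from one party’s product record to another’s. A toy Python sketch (field names hypothetical):

```python
# Supplier and customer describe the same product subject differently.
supplier_record = {"part_no": "BRK-1024", "desc": "brake pad", "uom": "EA"}

# A declared mapping from supplier field names to customer field names.
field_map = {"part_no": "sku", "desc": "description", "uom": "unit"}

def translate(record, mapping):
    """Rename fields per the mapping, dropping anything unmapped."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

print(translate(supplier_record, field_map))
# {'sku': 'BRK-1024', 'description': 'brake pad', 'unit': 'EA'}
```

The point is that the mapping is declared data, not code buried in an ETL job, so it can be audited and extended as new trading partners appear.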

Giving a Single Name a Single Identity

Filed under: Marketing,Subject Identity — Patrick Durusau @ 2:15 pm

Giving a Single Name a Single Identity

This was just too precious to pass up.

The securities industry, parts of it anyway, would like to identify what is being traded in a reliable way.

Answer: Well, we’ll just pick a single identifier, etc. Read the article for the details but see near the end:

If you are running a worldwide trading desk in search of buyers or sellers in every corner of the world, you’re going to have a hard time finding them, in a single universal manner, says Robin Strong, Director of Buy-side Market Strategy, at Fidessa Group, a supplier of trading systems.

That is “primarily because the parties involved at the end of the various bits of wire from a single buy-side dealing desk don’t tend to cooperate. They’re all competitors. They want to own their piece of the value chain,’’ whether it’s a coding system or an order management system. “They’ve built a market that they own” and want to protect, he said.

With a topic map you could create a mapping into other markets.

Topic maps: Enhance the market you own with a part of someone else’s.

How is that for a marketing slogan?

Should by some mischance a single identifier come about, topic maps can help maintain insider semantics to preserve the unevenness of the playing field.

April 19, 2011

The Lisp Curse

Filed under: Lisp,Marketing,Semantic Diversity — Patrick Durusau @ 9:48 am

The Lisp Curse by Rudolf Winestock begins:

This essay is yet another attempt to reconcile the power of the Lisp programming language with the inability of the Lisp community to reproduce their pre-AI Winter achievements. Without doubt, Lisp has been an influential source of ideas even during its time of retreat. That fact, plus the brilliance of the different Lisp Machine architectures, and the current Lisp renaissance after more than a decade in the wilderness demonstrate that Lisp partisans must have some justification for their smugness. Nevertheless, they have not been able to translate the power of Lisp into a movement with overpowering momentum.

In this essay, I argue that Lisp’s expressive power is actually a cause of its lack of momentum.

Read the essay, then come back here. I’ll wait.

… … … …

OK, good read, yes?

At first blush, I thought about HyTime and its expressiveness. Or of topic maps. Could there be a parallel?

But non-Lisp software projects proliferate.

Let’s use http://sourceforge.net for examples.

Total projects for the database category – 906.

How many were written using Lisp?

Lisp 1

Compared to:

Java 282
C++ 106
PHP 298
Total: 686

That may not be fair.

Databases may not attract AI/Lisp programmers.

What about artificial intelligence?

Lisp 8
Scheme 3
Total: 11

Compared to:

Java 115
C++ 111
C 42
Total: 268

Does that mean that Java, C++ and C are too expressive?

Or that their expressiveness has retarded their progress in some way?

Or is some other factor is responsible for proliferation of projects?

And a proliferation of semantics.

*****
Correction: I corrected sourceforge.org -> sourceforge.net and made it a hyperlink. Fortunately sourceforge silently redirects my mistake in entering the domain name in a browser.

April 16, 2011

MERGE Ahead

Filed under: Marketing,Merging,SQL — Patrick Durusau @ 2:44 pm

Merge Ahead: Introducing the DB2 for i SQL MERGE statement

Karl Hanson of IBM writes:

As any shade tree mechanic or home improvement handyman knows, you can never have too many tools. Sure, you can sometimes get by with inadequate tools on hand, but the right tools can help complete a job in a simpler, safer, and quicker way. The same is true in programming. New in DB2 for i 7.1, the MERGE statement is a handy tool to synchronize data in two tables. But as you will learn later, it can also do more. You might think of MERGE as doing the same thing you could do by writing a program, but with less work and with simpler notation.

Don’t panic, it isn’t merge in the topic map sense but it does show there are market opportunities for what is a trivial task for a topic map.
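For readers who have not met it, SQL MERGE matches rows on a key, updates the rows that exist and inserts the ones that don’t. A rough Python sketch of those upsert semantics (not DB2 syntax):

```python
def merge(target, source, key):
    """Synchronize target with source, matching rows on `key`:
    matched rows are updated, unmatched rows are inserted."""
    by_key = {row[key]: i for i, row in enumerate(target)}
    for row in source:
        if row[key] in by_key:
            target[by_key[row[key]]].update(row)   # WHEN MATCHED THEN UPDATE
        else:
            target.append(dict(row))               # WHEN NOT MATCHED THEN INSERT
            by_key[row[key]] = len(target) - 1
    return target

inventory = [{"id": 1, "qty": 10}]
updates   = [{"id": 1, "qty": 7}, {"id": 2, "qty": 3}]
merge(inventory, updates, "id")
print(inventory)  # id 1 updated, id 2 inserted
```

Note what this does not do: the match is on a single literal key, which is exactly why it is a trivial case of the subject-identity merging a topic map performs.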

That implies to me there are also opportunities for more complex tasks, suitable only for topic maps.

April 13, 2011

April 11, 2011

Movies with multiple Harry Potter wizards

Filed under: Humor,Marketing — Patrick Durusau @ 5:39 am

Movies with multiple Harry Potter wizards

From Flowingdata.com:

I feel like whenever I watch a British film, I see a Harry Potter wizard or witch in it. I guess I’m not imagining things. The Ragbag had a similar curiosity and graphed all the films with four or more wizards in it — all 24 of them.

Something for everyone to consider adding to their Harry Potter topic maps.

Rowling is said to be considering electronic versions of the Potter series. For a reported $100 Million. I would take a chance on digital piracy for $100 Million. 😉

A topic map to navigate the series, merged in some of the better fan material could be quite interesting.

April 10, 2011

Language Really Does Matter For Search – Post

Filed under: Marketing,Searching,Semantics — Patrick Durusau @ 2:51 pm

Language Really Does Matter For Search

Matthew Hurst writes:

While most pundits regard the deep, formal semantics promised by the likes of Powerset as not important to search I feel that I am personally finding search dead-ends in my long tail queries that clearly indicate the need for this type of feature. I will commit the sin of using a single example to support my point.

I don’t know if I qualify as a “pundit” but I certainly disagree that “deep, formal semantics” are not important for searching.

Well, assuming you want to find useful results.

I suspect part of the problem is that we have become accustomed to very low quality answers and sifting page after page of duplicated and/or irrelevant material.

As more data comes online, the returns on searches are only going to get worse.

And the greater the opportunity for topic maps.

Ten years ago when the first topic map standard was approved, there was web searching but the quality and quantity of data wasn’t nearly what it is today.

I don’t know of any hard statistics on it but I would venture to guess that among staff allowed to use the WWW at work, at least an hour a day, every day, is spent not finding information on the WWW.

Think about that. At least 250 hours per person per year.

And the real figure is probably much higher.

So if you have a staff of 1,000 people, 250,000 hours are being lost every year, not finding information on the WWW.

The only bright side is that the lost 250,000 hours aren’t a line item in the budget.

Topic maps can’t save all of that time for you but they can help create a find once, re-use many situation for your staff.

April 8, 2011

Strategies for Exploiting Large-scale Data in the Federal Government

Filed under: Hadoop,Marketing — Patrick Durusau @ 7:19 pm

Strategies for Exploiting Large-scale Data in the Federal Government

Yes, that federal government. The one in the United States that is purportedly going to shut down. Except that those responsible for the shutdown will still get paid. There’s logic in there somewhere or so I have been told.

Nothing specifically useful but more the flavor of conversations that are taking place where people have large datasets.

March 29, 2011

Tchaikovsky by any other name

Filed under: Marketing,Topic Maps — Patrick Durusau @ 12:50 pm

My daughter, a musician and library school student, sent me a link to variations on spellings of Tchaikovsky, which I quote below, followed by some comments.

If you mean, what is the most common way of spelling the composer's name (which in Russian was Пётр Ильич Чайковский) in the English language, then that would be "Pyotr Ilyich Tchaikovsky". But the composer himself used "Tchaikovsky", "Tschaikovsky" and "Tschaikowsky" when writing in other languages, while "Chaykovskiy" would be a more literal transliteration.

Here are some other versions from the Library of Congress catalog (http://authorities.loc.gov):

  • Ciaikovsky, Piotr Ilic
  • Tschaikowsky, Peter Iljitch
  • Tchaikowsky, Peter Iljitch
  • Ciaikovsky, Pjotr Iljc
  • Cajkovskij, Petr Il'ic
  • Tsjaikovsky, Peter Iljitsj
  • Czajkowski, Piotr
  • Chaikovsky, P. I.
  • Csajkovszkij, Pjotr Iljics
  • Tsjaĭkovskiej, Pjotr Iljietsj
  • Tjajkovskij, Pjotr Ilitj
  • Čaikovskis, P.
  • Chaĭkovskiĭ, Petr Il'ich
  • Tchaikovski, Piotr
  • Tchaikovski, Piotr Ilyitch
  • Chaĭkovskiĭ, Petr
  • Tchaikovsky, Peter
  • Tchaïkovsky, Piotr Ilitch
  • Tschaikowsky, Pjotr Iljitsch
  • Tschajkowskij, Pjotr Iljitsch
  • Tchaïkovski, P. I.
  • Ciaikovskij, Piotr
  • Ciaikovskji, Piotr Ilijich
  • Tschaikowski, Peter Illic
  • Tjajkovskij, Peter
  • Chaĭkovski, P'otr Ilich
  • Tschaikousky
  • Tschaijkowskij, P. I.
  • Tschaikowsky, P. I.
  • Chaĭkovski, Piotr Ilich
  • Tchaikovsky, Pyotr Ilyich
  • Čajkovskij, Pëtr Ilič
  • Tschaikovsky, Peter Ilyich
  • Tchaikofsky, Peter Ilyitch
  • Tciaikowski, P.
  • Tchaïkovski, Petr Ilitch
  • Ciaikovski, Peter Ilic
  • Tschaikowski, Pjotr
  • Tchaikowsky, Pyotr
  • Tchaikovskij, Piotr Ilic

You can see the original post at: http://www.tchaikovsky-research.net/en/forum/forum0059.html

An impressive list but doesn’t begin to touch the ways Tchaikovsky has been indexed in Russian libraries to say nothing of transliterations of his name in libraries around the world in other languages.

Or how his name has appeared in the literature.

You could search Google Books using all 40 variations listed above, plus variations in other languages.

As could everyone following you.

Or, some enterprising soul could create a topic map that responded with all the actual entries for Tchaikovsky the composer, whichever variation of his name that you used in any language.

Like an index, a topic map is a labor saving device for the user because the winnowing of false hits, addition of resources under variations on search terms and the creation of multiple ways (think spellings) that lead to the same materials, have already happened.
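The mechanics of that labor saving are easy to sketch: map every recorded variant to a single topic, so that whichever spelling a user types finds the same resources. A toy Python illustration (hypothetical data, not a topic map engine):

```python
# One topic, many base names: any variant leads to the same subject.
variants = ["Tchaikovsky, Pyotr Ilyich", "Tschaikowsky, Peter Iljitch",
            "Czajkowski, Piotr", "Chaikovsky, P. I."]

topic = {"subject": "the composer Tchaikovsky",
         "resources": ["catalog entry 1", "catalog entry 2"]}

# The winnowing has already happened: the index is built once,
# then every user benefits from it.
name_index = {v.lower(): topic for v in variants}

def lookup(name):
    """Return the resources for a subject, whatever spelling is used."""
    hit = name_index.get(name.lower())
    return hit["resources"] if hit else []

print(lookup("CZAJKOWSKI, PIOTR"))  # same resources as any other spelling
```

Every searcher after the first gets the benefit of the mapping without repeating the forty-variant search themselves.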

Of course, creation of a topic map, or paying for the use of one created by others, is a line item in the budget.

In a way that paying staff to stare at screen after screen of mind-numbing and quite possibly irrelevant “hits” is not.

We need to find a way to make the same case for topic maps that is made for indexes: that they are labor-saving, important devices.

Lowering Barriers to Contributions

Filed under: Marketing — Patrick Durusau @ 12:46 pm

Lowering Barriers to Contributions

Specifically about Erlang open source projects but possible lessons for topic map (and other) projects in general.

The theory is the easier we make it for people to contribute, the more people will contribute.

Hard to say that will happen in any particular case but I don’t see a downside.

March 20, 2011

a practical guide to noSQL

Filed under: Marketing,NoSQL — Patrick Durusau @ 1:26 pm

a practical guide to noSQL by Denise Mura strikes me as deeply problematic.

First, realize that Denise is describing the requirements that a MarkLogic server is said to meet.

That may or may not be the same as your requirements.

The starting point for evaluating any software, MarkLogic (which I happen to like) or not, must be with your requirements.

I mention this in part because I can think of several organizations and more than one government agency that have bought software that met a vendor’s requirements, but not their own.

The result was a sale for the vendor and a large software dog that everyone kept tripping over; pride and unwillingness to admit error kept it around for a very long time.

Take for example her claim that MarkLogic deliver[s] real-time updates, search, and retrieval results…. Well, ok, but if I run weekly reports on data that is uploaded on a daily basis, then real-time updates, search, and retrieval results may not be one of my requirements.

You need to start with your requirements (you do have written requirements, yes?) and not those of a vendor or what “everyone else” requires.

The same lesson holds true for construction of a topic map. It is your world view that it needs to reflect.

Second, it can also be used as a lesson in reading closely.

For example, of Lucene, Solr, and Sphinx, Denise says:

Search engines lie to you all the time in ways that are not always obvious because they need to take shortcuts to make performance targets. In other words, they don’t provide for a way to guarantee accuracy.

It isn’t clear from the context what lies Denise thinks we are being told. Or what it would mean to “guarantee accuracy.”

I can’t think of any obvious ways that a search engine has ever lied to me, much less any non-obvious ones. (That may be because they are non-obvious.)

There are situations where noSQL, SQL, MarkLogic and topic map solutions are entirely appropriate. But as a consumer you will need to cut through promotional rhetoric to make the choice that is right for you.

March 16, 2011

Legendary Plots

Filed under: Marketing,Topic Maps — Patrick Durusau @ 3:19 pm

Legendary Plots

Mostly focused on R but I have included it here because of the discussion of the legend.

Particularly improving the focus of the information presented.

I rarely want all the information about a subject.

I just want the information that is helpful in my particular context.

A topic map may hold far more information than it ever displays to me.

And only display some small part of that information.

Otherwise it is like drinking from a news sewer (insert your favorite example) during a disaster.

Lots of information, little of it makes any sense.

Data Integration: Moving Beyond ETL

Filed under: Data Governance,Data Integration,Marketing — Patrick Durusau @ 3:16 pm

Data Integration: Moving Beyond ETL

A sponsored white-paper by DataFlux, www.dataflux.com.

Where ETL = Extract Transform Load

Many of the arguments made in this paper fit quite easily with topic map solutions.

DataFlux appears to be selling data governance based solutions, although it appears to take an evolutionary approach to implementing such solutions.

It occurs to me that topic maps could be one stage in the documentation and evolution of data governance solutions.

High marks for a white paper that doesn’t claim IT salvation from a particular approach.

March 12, 2011

Allura

Filed under: Marketing,Software,Topic Maps — Patrick Durusau @ 6:47 pm

Allura

From the website:

Allura is an open source implementation of a software “forge”, a web site that manages source code repositories, bug reports, discussions, mailing lists, wiki pages, blogs and more for any number of individual projects.

SourceForge.net is running an instance of Allura (aka New Forge, or Forge 2.0)….

Among the many areas where topic maps could make a noticeable difference is software development.

If you have ever tried to use any of the report databases, maintained by either commercial vendors or open source projects, you know what I mean.

Some are undoubtedly better than others but I have never seen one I would want to re-visit.

But, no source code management project is going to simply adopt topic maps because you or I suggest it or someone else thinks it is a good idea.

Well, it’s an open project so here is your chance to work towards topic maps becoming part of this project!

Before you join the discussion lists, etc., a few questions/suggestions:

  1. Spend some time studying the project and its code. What are its current priorities? How can you contribute to those, so that later suggestions by you may find favor?
  2. Where in a source code management system is subject identity the most critical? Suggest you find 2 or at the most 3 and then propose changes for only 1 initially.
  3. How would you measure the difference that management of subject identity makes for participants? (Whether they are aware of the contribution of topic maps or not.)

March 10, 2011

Pentaho BI Suite Enterprise Edition (TM/SW Are You Listening?)

Filed under: BI,Linked Data,Marketing,Semantic Web — Patrick Durusau @ 8:12 am

Pentaho BI Suite Enterprise Edition

From the website:

Pentaho is the open source business intelligence leader. Thousands of organizations globally depend on Pentaho to make faster and better business decisions that positively impact their bottom lines. Download the Pentaho BI Suite today if you want to speed your BI development, deploy on-premise or in the cloud or cut BI licensing costs by up to 90%.

There are several open source offerings like this, Talend is another one that comes to mind.

I haven’t looked at its data integration in detail but suspect I know the answer to the question:

Say I have an integration of some BI assets using Pentaho and other BI assets integrated using Talend, how do I integrate those together while maintaining the separately integrated BI assets?

Or for that matter, how do I integrate BI that has been gathered and integrated by others, say Lexis/Nexis?

Interesting too to note that this is the sort of user slickness and ease that topic maps and (cough) linked data (see, I knew I could say it) face in the marketplace.

Does it offer all the bells and whistles of more sophisticated subject identity or reasoning approaches?

No, but if it offers all that users are interested in using, what is your complaint?

Both topic maps and semantic web/linked data approaches need to listen more closely to what users want.

As opposed to deciding what users need.

And delivering the latter instead of the former.

March 8, 2011

Topic Maps: Less Garbage In, Less Garbage Out

Filed under: Authoring Topic Maps,Marketing,Topic Maps — Patrick Durusau @ 10:03 am

The latest hue and cry over changes to the Google search algorithm (search for “Google farmer update,” I don’t want to dignify any of it with a link) seems like a golden advertising opportunity for topic maps.

The slogan?

Topic Maps: Less Garbage In, Less Garbage Out

That is one of the value-adds of any curated data source isn’t it?

Instead of say 200,000 “hits” post-Farmer update on some subject, what if a topic map offered 20?

Or 0.0001% of the 200,000?

Of course, there are those who would rush forward to say that I might miss an important email or blog posting on subject X.

True, but if it were truly an important email or blog posting then a curator is likely to have picked it up. Yes?

The point of curation is to save users the time and effort of winnowing (wading?) through information garbage.

Here’s a topic map construction idea:

  1. Capture all the out-going search requests from your location.
  2. Throw away all the porn searches.
  3. Create a topic map of the useful answers to the remaining searches.
  4. Use filtering software to block access to search engines and/or redirect to the topic map.
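Step 4 amounts to putting a curated lookup in front of the search engine. A rough Python sketch of that redirect logic (URLs and queries hypothetical):

```python
# Curated answers built from step 3; anything not covered
# falls through to step 4's redirect.
curated = {
    "expense report form": ["https://intranet.example/forms/expenses"],
    "vpn setup": ["https://intranet.example/it/vpn-howto"],
}

def answer(query):
    """Serve curated answers when we have them; otherwise send the
    user to the topic map rather than a raw search engine."""
    hits = curated.get(query.strip().lower())
    if hits:
        return hits
    return ["redirect: topic map front page"]

print(answer("VPN setup"))  # a curated answer, not 200,000 hits
```

The payoff is the find-once, re-use-many pattern: the first person to winnow a question’s answers does it for everyone who asks afterwards.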

Your staff is looking for answers to work related questions, yes?

A curated resource, like a topic map, would save them time and effort in finding answers to those questions.

March 5, 2011

Keep an Eye on the emerging Open-Source Analytics Stack – Post

Filed under: Examples,Marketing — Patrick Durusau @ 3:34 pm

Keep an Eye on the emerging Open-Source Analytics Stack

David Smith’s summary captures the tone of the piece:

For the business user, the key takeaway is that this data analytics stack, built on commodity hardware and leading-edge open-source software, is a lower-cost, higher-value alternative to the existing status quo solutions offered by traditional vendors. Just a couple of years ago, these types of robust analytic capabilities were only available through major vendors. Today, the open-source community provides everything that the traditional vendors provide — and more. With open-source, you have choice, support, lower costs and faster cycles of innovation. The open-source analytics stack is more than a handy collection of interoperable tools — it’s an intelligence platform.

In that sense, the open-source analytics stack is genuinely revolutionary.

I use and promote the use of open source software so don’t take this as being anti-open source.

I think the jury is still out on the lower-cost question.

In part because the notion that anyone who can use a keyboard and an open source package is qualified to do BI will reap its own reward.

There was a rumor years ago that local bar associations actually sponsored the “How to Avoid Probate” kits.

Reasoning that self-help would only increase the eventual fees for qualified counsel.

Curious to see how much of the “lower cost” of open source software is absorbed by correcting amateurish mistakes (assuming they are even admitted).

February 25, 2011

…a grain of salt

Filed under: Data Analysis,Data Models,Data Structures,Marketing — Patrick Durusau @ 5:46 pm

Benjamin Bock asked me recently about how I would model a mole of salt in a topic map.

That is a good question but I think we had better start with a single grain of salt and then work our way up from there.

At first blush, and only at first blush, many subjects look quite easy to represent in a topic map.

A grain of salt looks simple at first glance: just create a PSI (Published Subject Identifier), put that as the subjectIdentifier on a topic and be done with it.

Well…, except that I don’t want to talk about a particular grain of salt, I want to talk about salt more generally.

OK, one of those, I see.

Alright, same answer as before, except make the PSI for salt in general, not some particular grain of salt.

Well…, except that when I go to the Wikipedia article on salt, Salt, I find that salt is a compound of chlorine and sodium.

A compound, oh, that means something made up of more than one subject. In a particular type of relationship.

Sounds like an association to me.

Of a particular type, an ionic association. (I looked it up, see: Ionic Compound)

And this association between chlorine and sodium has several properties reported by Wikipedia, here are just a few of them:

  • Molar mass: 58.443 g/mol
  • Appearance: Colorless/white crystalline solid
  • Odor: Odorless
  • Density: 2.165 g/cm³
  • Melting point: 801 °C, 1074 K, 1474 °F
  • Boiling point: 1413 °C, 1686 K, 2575 °F
  • … and several others.

    If you are interested in scientific/technical work, please be aware of CAS, a work product of the American Chemical Society, with a very impressive range of unique identifiers. (56 million organic and inorganic substances, 62 million sequences and they have a counter that increments while you are on the page.)

    Note that unlike my suggestion, CAS takes the assign a unique identifier view for the substances, sequences and chemicals that they curate.

    Oh, sorry, got interested in the CAS as a source for subject identification. In fact, that is a nice segue into considering how to represent the millions and millions of compounds.

    We could create associations with the various components being role players but then we would have to reify those associations in order to hang additional properties off of them. Well, technically speaking in XTM we would create non-occurrence occurrences and type those to hold the additional properties.

    Sorry, I was presuming the decision to represent compounds as associations. Shout out when I start to presume that sort of thing. 😉

    The reason I would represent compounds as associations is that the components of the associations are then subjects I can talk about and even add additional properties to, or create mappings between.

    I suspect that CAS has chemistry from the 1800’s fairly well covered but what about older texts? Substances before then may not be of interest to commercial chemists but certainly would be of interest to historians and other scholars.

    Use of a topic map plus the CAS identifiers would enable scholars studying older materials to effectively share information about older texts, which have different designations for substances than CAS would record.

    You could argue that I could use a topic for compounds, much as CAS does, and rely upon searching in order to discover relationships.

    Tis true, tis true, but my modeling preference is for relationships seen as subjects, although I must confess I would prefer a next generation syntax that avoids the reification overhead of XTM.

    Given the prevalence of complex relationships/associations, as you see from the CAS index, I think a simplification of the representation of associations is warranted.

    Sorry, I never did quite reach Benjamin’s question about a mole of salt but I will take up that gauntlet tomorrow.

    We will see that measurement (which figured into his questions about recipes as well) is an interesting area of topic map design.
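The modeling choice discussed above, representing a compound as an association whose role players remain addressable subjects, can be sketched roughly as follows. The class names, PSIs and property names here are illustrative assumptions, not part of any topic map API or standard:

```python
# A minimal, hypothetical sketch of modeling NaCl as an association
# between element topics. Class and property names are illustrative,
# not drawn from any topic map API or syntax.

class Topic:
    def __init__(self, subject_identifier, name):
        self.subject_identifier = subject_identifier
        self.name = name

class Association:
    """An association whose role players are topics and which carries
    its own properties directly, avoiding XTM-style reification."""
    def __init__(self, assoc_type, roles, properties=None):
        self.assoc_type = assoc_type
        self.roles = roles              # role name -> Topic
        self.properties = properties or {}

sodium = Topic("http://example.org/psi/sodium", "Sodium")
chlorine = Topic("http://example.org/psi/chlorine", "Chlorine")

salt = Association(
    assoc_type="ionic-compound",
    roles={"cation": sodium, "anion": chlorine},
    properties={
        "molar_mass": "58.443 g/mol",
        "melting_point": "801 °C",
        "appearance": "colorless/white crystalline solid",
    },
)

# The components remain subjects in their own right, available for
# further statements or mappings (e.g. to CAS identifiers):
print(salt.roles["cation"].name)        # Sodium
print(salt.properties["molar_mass"])    # 58.443 g/mol
```

The point of the sketch is the shape, not the classes: the association itself holds the compound’s properties, while sodium and chlorine stay addressable for mapping to other identifier systems.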
    *****

    PS: Comments and/or suggestions on areas to post about are most welcome. Subject analysis for topic maps is, to a degree, not unlike cataloging in library science, except that the classification you assign is entirely the work product of your experience, reading and analysis. There are no fixed answers, only the ones that you find most useful.

    February 23, 2011

    Big Oil and Big Data

    Filed under: Entity Resolution,Marketing,Topic Maps — Patrick Durusau @ 11:47 am

    Big Oil and Big Data

    Mike Betron, Marketing Director of Infoglide, says that it is becoming feasible to mine “big data” and to exploit “entity resolution.”

    Those who want to exploit the availability of big data have another powerful tool at their disposal – entity resolution. The ability to search across multiple databases with disparate forms residing in different locations can tame large amounts of data very quickly, efficiently resolving multiple entities into one and finding hidden connections without human intervention in many application areas, including detecting financial fraud.

    By exploiting advancing technologies like entity resolution, systems can give organizations a distinct competitive advantage over those who lag in technology adoption.

    I have to quibble about the …without human intervention… part, although I am quite happy with augmented human supervision.

    Well, that and the implication that entity resolution is a new technology. In various guises, entity resolution has been in use for decades in medical epidemiology, for example.

    Preventing subject identifications from languishing in reports, summaries, and the other information debris of a modern organization, so that documented and accessible organizational memories prosper and grow: now that would be something different. (It could also be called a topic map.)
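As a rough illustration of the entity resolution Betron describes, resolving records from disparate sources into one entity, here is a deliberately naive sketch. The sources, field names and matching rule are invented for illustration; production systems use probabilistic or rule-based matchers, usually with the augmented human supervision argued for above:

```python
# A deliberately naive entity-resolution sketch: merge records from
# two hypothetical sources when their normalized name tokens and
# date of birth agree. Field names and data are invented examples.

def normalize(name):
    """Lowercase, strip commas, collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").split())

crm = [
    {"name": "Smith, John", "dob": "1970-01-01", "account": "A-17"},
    {"name": "Jane Doe", "dob": "1982-05-09", "account": "A-22"},
]
claims = [
    {"name": "john smith", "dob": "1970-01-01", "claim": "C-301"},
]

def resolve(left, right):
    """Merge right-hand records into left-hand ones sharing a key."""
    entities = []
    for a in left:
        key_a = (frozenset(normalize(a["name"]).split()), a["dob"])
        merged = dict(a)
        for b in right:
            key_b = (frozenset(normalize(b["name"]).split()), b["dob"])
            if key_a == key_b:
                merged.update(b)
        entities.append(merged)
    return entities

resolved = resolve(crm, claims)
print(resolved[0])   # "Smith, John" and "john smith" resolve to one entity
```

Even this toy version shows why human review matters: the matching rule silently decides which records are “the same,” and that decision is exactly the subject-identity question topic maps make explicit.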

    Do You Tweet?

    Filed under: Marketing,News — Patrick Durusau @ 11:41 am

    This is probably old news to most people who Tweet but I stumbled across: http://wthashtag.com, which maintains a directory of hash tags.

    Hash tags for which you can submit definitions.

    I have entered a definition for #odf.

    There already is a definition for #topicmaps.

    When you tweet about topic maps related material, events, etc., please use #topicmaps.

    I try to use that plus hashtags from other relevant areas in hopes people will follow the topic maps tag as something unfamiliar that may be of interest.

    Hard to say if it will be effective but I suspect no less effective than some marketing strategies I have seen.


    Update: Apparently this service has been moved from this address to: http://hashtag.it/. And as far as I can tell, the hashtags for topicmaps and ODF have been lost.

    Do note that the URL listed in the original tweet does not go to the tweet site but to “What the Trend? API.”

    February 22, 2011

    Luke

    Filed under: Hadoop,Lucene,Maps,Marketing,Search Engines — Patrick Durusau @ 1:34 pm

    Luke

    From the website:

    Lucene is an Open Source, mature and high-performance Java search engine. It is highly flexible, and scalable from hundreds to millions of documents.

    Luke is a handy development and diagnostic tool, which accesses already existing Lucene indexes and allows you to display and modify their content in several ways:

    • browse by document number, or by term
    • view documents / copy to clipboard
    • retrieve a ranked list of most frequent terms
    • execute a search, and browse the results
    • analyze search results
    • selectively delete documents from the index
    • reconstruct the original document fields, edit them and re-insert to the index
    • optimize indexes
    • open indexes consisting of multiple parts, and located on Hadoop filesystem
    • and much more…
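The “ranked list of most frequent terms” view that Luke provides over a Lucene index can be approximated conceptually in a few lines. This is a sketch of the idea over plain documents, not Luke’s actual implementation, and the sample documents are invented:

```python
# Conceptual sketch of Luke's "most frequent terms" view, computed
# over plain text rather than a real Lucene index. Documents invented.
from collections import Counter

docs = [
    "topic maps merge subjects across sources",
    "lucene indexes documents for search",
    "luke browses lucene indexes by term",
]

term_freq = Counter()
for doc in docs:
    term_freq.update(doc.split())

# Ranked list of the most frequent terms, as Luke would display:
for term, count in term_freq.most_common(3):
    print(term, count)
```

Lucene stores this information per field in its term dictionary, which is what lets Luke answer such queries without rescanning the documents.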

    Searching is interesting and I have several more search engines to report on this week, but the real payoff is finding.

    And recording the finding so that other users can benefit from it.

    We could each develop our own map of the London Underground, at the expense of repeating the effort of others.

    Or we can purchase a copy of the London Underground map.

    Which one seems more cost effective for your organization?

    See an Error at the Washington Post? Now You Can Easily Report It

    Filed under: Marketing,News — Patrick Durusau @ 1:21 pm

    See an Error at the Washington Post? Now You Can Easily Report It

    I’m not sure what would leave me less impressed but I can say this story doesn’t do much for me.

    How about you?

    The news is that every story will have a link to a form for reader feedback.

    That’s better than current practice but here are a couple of things that might make more of a difference:

    • Where possible, embed links directly in stories for websites or other online resources that are mentioned.

      Why cite a report from an agency, commission, etc. that is online and not provide a link?

      Leaves me with the impression you want me take your word for it.

    • Provide permalinks so users can create mappings to news stories to be used with their data.

    I would say that the permalinks should contain explicit subject identity but that is expecting too much.

    If we can link to it, we can add explicit subject identity.

    February 21, 2011

    Still Building Memex & Topic Maps Part 1

    Filed under: Marketing,Topic Maps — Patrick Durusau @ 3:47 pm

    Still Building the Memex, Stephen Davies writes:

    We define a Personal Knowledge Base – or PKB – as an electronic tool through which an individual can express, capture, and later retrieve the personal knowledge he or she has acquired.
    ….

    Personal: Like Bush’s memex, a PKB is intended for private use, and its contents are custom tailored to the individual. It contains trends, relationships, categories, and personal observations that its owner sees but which no one else may agree with. Many of the issues involved in PKB design are also relevant in collaborative settings, as when a homogeneous group of people is jointly building a shared knowledge base. In this case, the knowledge base could simply reflect the consensus view of all contributors; or, perhaps better, it could simultaneously store and present alternate views of its contents, so as to honor several participants who may organize it or view it differently. This can introduce another level of complexity.

    I am not sure that having “ear buds” from an intellectual iPod like a PKB is a good idea.

    The average reader is already well insulated from the inconvenience of information or opinions dissimilar from their own. Reflecting the “consensus view of all contributors” is a symptom, and not a desirable one at that.

    We have had the equivalents of PKBs over both Republican and Democratic administrations.

    The consequences of PKB or Personal Knowledge Bases, weapons of mass destruction in Iraq, the collapse of the housing market, serious mis-steps in both foreign and domestic policy, are too well known to need elaboration.

    The problem posed by a PKB is simpler but what we need are PbKB, Public Knowledge Bases.

    Even though more complex, a PbKB has the potential to put all citizens on an even footing with regard to debates over policy choices.

    It may be difficult to achieve in practice and only ever partially successful, but the result could hardly be worse than the echo chamber of a PKB.

    (Still Building Memex & Topic Maps Part 2 – Beyond the Echo Chamber)
