Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 23, 2011

50 Billion Things on the Internet by 2020

Filed under: Marketing — Patrick Durusau @ 3:09 pm

Cisco: 50 Billion Things on the Internet by 2020 [Infographic]

Read the post for estimates up to 1 trillion by 2013 or 2015, depending upon whose speculation you are being paid to agree with.

It is an amusing infographic but more for what it doesn’t say than for what it does.

The infographic depicts a fairly flat place, one that talks only about devices, cows, sensors and the like. That is good, but surely only part of the story.

What about all the relationships between those devices? How do we identify/address them? Or their states over some time sequence?

The Cisco “Planetary Skin” sounds like an exciting project, but even more so if those sensors and their data are correlated with other information.

To be sure, we are in for “a really big show,” whatever number you happen to prefer.

July 20, 2011

…Develop[ing] Personal Search Engine

Filed under: Marketing,Search Engines — Patrick Durusau @ 1:01 pm

Ness Computing Announces $5M Series A Financing to Develop Personal Search Engine

From the post:

SILICON VALLEY, Calif., July 19, 2011 /PRNewswire/ — Ness Computing is announcing that it raised a $5M Series A round of financing in November 2010. The round was led by Vinod Khosla and Ramy Adeeb of Khosla Ventures, with participation from Alsop Louie Partners, TomorrowVentures, Bullpen Capital, a co-founder of Palantir Technologies and several angel investors. This financing is enabling the company’s team of engineers and scientists, with expertise in information retrieval and machine learning, to pursue their vision to change the nature of search by building technology that delivers results and recommendations that are unique to each person using it.

The technology, which the company calls a Likeness Engine, represents a new approach to this complex engineering challenge by fusing a search engine and a recommendation engine, and will power the company’s first product, a mobile service called Ness. The Likeness Engine is different from traditional search engines that are useful for finding fact-based objective information that is the same for everyone, such as weather reports, dictionary terms, and stock prices. Ness Computing’s vision is to answer questions of a more subjective nature by understanding each person’s likes and dislikes, to deliver results that match his or her personal tastes. This can be seen in the difference between a person asking, “Which concerts are playing in New York City?” and “Which concerts would I most enjoy in New York City?” Ultimately, Ness aims to help people make decisions about dining, nightlife, entertainment, shopping, music, travel and more, culled expressly for them from the world’s almost limitless options.

Impressive array of previously successful talent.

I am not sure I buy the “objective” versus “subjective” information divide but clearly Ness is interested in learning the user’s view of the world in order to “better” answer their questions.

Depending on how successful the searches by Ness become, a user could become insulated in a cocoon of previous expressions of likes and dislikes.

That isn’t an original insight; I saw it somewhere in an article about personalized search results from search engines. Nor is it a problem that arose due to personalization of search engines.

The average user (read: not a librarian) tends to search for terms in a field or subject area that they already know. So they are unlikely to encounter information that uses different terminology. In a very real way, users’ searches are already highly personalized.

Personalization isn’t a bad thing but it is a limiting thing. That is, it puts a border on the information that you will get back from a search and you won’t have much of an opportunity to go beyond that. It simply never comes up. And information overload being what it is, having limited, safe results can be quite useful. Particularly if you like sleeping at the Holiday Inn, eating at McDonald’s and watching American Idol.

Hopefully Ness will address the semantic diversity issue in order to provide users, at least the ones who are interested, with a richer search experience. Topic maps would be useful in such an attempt.

July 18, 2011

The Joy of Erlang; Or, How To Ride A Toruk

Filed under: Erlang,Marketing,Topic Maps — Patrick Durusau @ 6:42 pm

The Joy of Erlang; Or, How To Ride A Toruk by Evan Miller.

From the post:

In the movie Avatar, there’s this big badass bird-brained pterodactyl thing called a Toruk that the main character must learn to ride in order to regain the trust of the blue people. As a general rule, Toruks do not like to be ridden, but if you fight one, subdue it, and then link your Blue Man ponytail to the Toruk’s ptero-tail, you get to own the thing for life. Owning a Toruk is awesome; it’s like owning a flying car you can control with your mind, which comes in handy when battling large chemical companies, impressing future colleagues, or delivering a pizza. But learning to ride a Toruk is dangerous, and very few people succeed.

I like to think of the Erlang programming language as a Toruk. Most people are frightened of Erlang. Legends of its abilities abound. In order to master it, you have to fight it, subdue it, and (finally) link your mind to it. But assuming you survive, you then get to control the world’s most advanced server platform, usually without even having to think. And let me tell you: riding a Toruk is great fun.

This guide is designed to teach you the Erlang state of mind, so that you are not afraid to go off and commandeer a Toruk of your own. I am going to introduce only a handful of Erlang language features, but we’re going to use them to solve a host of practical problems. The purpose is to give you the desire and confidence to go out and master the rest of the language yourself.

You are welcome to type the examples into your own Erlang shell and play around with them, but examples are foremost designed to be read. I recommend printing this document out and perusing it in a comfortable chair, away from email, compilers, 3-D movies, and other distractions.

Do you think people view topic maps as a Toruk?

How would you train them to ride rather than be eaten?

July 17, 2011

Hadoop & Startups: Where Open Source Meets Business Data

Filed under: Hadoop,Marketing,Subject Identity — Patrick Durusau @ 7:28 pm

Hadoop & Startups: Where Open Source Meets Business Data

From the post:

A decade ago, the open-source LAMP (Linux, Apache, MySQL, PHP/Python) stack began to transform web startup economics. As new open-source webservers, databases, and web-friendly programming languages liberated developers from proprietary software and big iron hardware, startup costs plummeted. This lowered the barrier to entry, changed the startup funding game, and led to the emergence of the current Angel/Seed funding ecosystem. In addition, of course, to enabling a generation of webapps we all use everyday.

This same process is now unfolding in the Big Data space, with an open-source ecosystem centered around Hadoop displacing the expensive, proprietary solutions. Startups are creating more intelligent businesses and more intelligent products as a result. And perhaps even more importantly, this technological movement has the potential to blur the sharp line between traditional business and traditional web startups, dramatically expanding the playing field for innovation.

So, how do we create an open-source subject identity ecosystem?

Note that I said “subject identity ecosystem” and not URLs pointing at arbitrary resources. Those are useful, but subject identity, to be re-usable, requires more than that.

Highly Scalable Erlang Web Apps

Filed under: Erlang,Marketing,Software — Patrick Durusau @ 7:26 pm

Highly Scalable Erlang Web Apps by Yurii Rashkovskii.

From the post:

Erlang is not well known for it’s ability for writing Web applications on the front-end; however, it can be incredibly powerful for writing scalable and highly scalable. Yurii Rashkovskii, creator of Beam.js and Erlagner.org is helping to change that with a laundry list of Erlang open source projects and libraries which make writing powerful and scalable Web applications back possible in Erlang. Yurii Rashkovskii recently presented on some of the powerful frameworks he has presented at the Erlang Factory in London and shares some of his projects and their powerful abilities.

In addition to useful information about Erlang web apps, Yurii says:

If one would look at my current list of open source Erlang projects, they might seem like a pile of unrelated stuff, but there’s actually a very basic idea behind most (if not all) of these projects. The idea is that if we want to make Erlang a much more attractive platform for other developers, we should act more on befriending adjacent communities, instead of directly competing with them. (emphasis added)

Is that a useful way to think about topic map applications?

Social Media in Strategic Communication (SMISC)

Filed under: Funding,Marketing,Social Networks — Patrick Durusau @ 7:25 pm

Social Media in Strategic Communication (SMISC)

From the Synopsis:

DARPA is soliciting innovative research proposals in the area of social media in strategic communication. Proposed research should investigate innovative approaches that enable revolutionary advances in science, devices, or systems. Specifically excluded is research that primarily results in evolutionary improvements to the existing state of practice. See the full DARPA-BAA-11-64 document attached.

Important Dates
Posting Date: see announcement at www.fbo.gov
Proposal Due Date
Initial Closing: August 30, 2011, 12:00 noon (ET)
Final Closing: October 11, 2011, 12:00 noon (ET)
Industry Day: Tuesday, August 2, 2011

Contracting Office Address:
3701 North Fairfax Drive
Arlington, Virginia 22203-1714
Primary Point of Contact:
Dr. Rand Waltzman
DARPA-BAA-11-64@darpa.mil

From the Funding Opportunity Description:

DARPA is soliciting innovative research proposals in the area of social media in strategic communication. Proposed research should investigate innovative approaches that enable revolutionary advances in science, devices, or systems. Specifically excluded is research that primarily results in evolutionary improvements to the existing state of practice. (emphasis added)

I think topic maps could be part of an approach that is revolutionary, not evolutionary.

I don’t have the infrastructure to field an application but if you do and have need for a wooly-pated consultant on such a project, give me a call.

PS: I first saw this in a tweet from Tim O’Reilly.

July 12, 2011

DataSift

Filed under: Marketing,Semantic Web,Semantics — Patrick Durusau @ 7:11 pm

DataSift

Congratulations to DataSift for the $6 Million in funding!

But there is another gem in the story about their funding.

Instead Mr. Halstead looked at how companies like Amazon had disrupted server rental and came up with a plan to do the same to data analysis. “For me the technology isn’t the game changer. For me it is approaching data processing in a democratized way. There are any number of companies that will sell you data, but they will typically charge you a five-figure sum minimum.

Note the line: “For me the technology isn’t the game changer. For me it is approaching data processing in a democratized way.”

That’s the trick, isn’t it? To approach “…semantics in a democratized way.”

Precisely what topic maps have to offer. Topic maps can capture your semantics. If and when you become interested in the semantics of others, you can map your semantics to theirs, preserving the integrity of both.

Disruptive to the top-down ontology approach, you ask? I suppose it is. But then democratization is always disruptive of authoritarian schemes and patterns.

The real question is: Whose semantics would you rather have? Your own or those of someone else?

July 10, 2011

2011 Digital Universe Study

Filed under: Marketing — Patrick Durusau @ 3:39 pm

2011 Digital Universe Study by IDC, sponsored by EMC

Textual excerpts: Extracting Value from Chaos

Multimedia: The 2011 Digital Universe Study: Extracting Value from Chaos

Lots of impressive (scary?) numbers about the expansion of the digital universe.

I rather liked the first technical call to action:

Investigate the new tools for creating metadata — the information you will need to understand which data is needed when and for what. Big data will be a fountain of big value only if it can speak to you through metadata.

And that no present metadata methodology was cited as “the” solution.

Sounds like we all have a lot of work and experimenting ahead of us.

I don’t think we are ever going to find the technology solution to any issue in the Digital Universe. Our companions are too busy inventing new issues and reinventing old ones in new guises for that to happen.

July 3, 2011

Who’s Your Daddy?

Filed under: Data Source,Dataset,Marketing,Mashups,Social Graphs,Social Networks — Patrick Durusau @ 7:30 pm

Who’s Your Daddy? (Genealogy and Corruption, American Style)

NPR (National Public Radio) News broadcast the opinion this morning that Brits are marginally less corrupt than Americans. Interesting question. Was Bonnie less corrupt than Clyde? Debate at your leisure but the story did prompt me to think of an excellent resource for tracking both U.S. and British style corruption.

It is probably all the talk of lineage in the news lately, but why not use the genealogy records that are gathered so obsessively to track the soft corruption of influence?

Just another data set to overlay on elected, appointed, and hired positions, lobbyists, disclosure statements, contributions, known sightings, congressional legislation and administrative regulations, etc. Could lead to a “Who’s Your Daddy?” segment on NPR where employment or contracts are questioned naming names. That would be interesting.

It also seems more likely to be effective than the “disclose your corruption” sunlight approach. Corruption is never confessed; it has to be rooted out.

July 2, 2011

Semantic Oil-Spots

Filed under: Marketing — Patrick Durusau @ 3:16 pm

While reading one of the surveys on Big Data it occurred to me that the W3C was correct about one thing. Data without semantics isn’t going to be very useful.

Attempting to impose semantics world wide reminds me of an article by Mehar Omar Khan, Don’t Try to Arrest the Sea: An Alternative Approach for Afghanistan. I commend it to you for reading but in summary it advocates what is known in some circles as the oil-spot strategy.

That is to create safe havens that offer benefits to the local populace and use those to attract others to the same benefits.

Topic maps, unlike some semantic strategies, have the potential to be semantic oil-spots. Their semantics are driven by the group or department where they are deployed and do not require consent or agreement beyond that range.

Which means that the group or department can begin to derive benefits from their use of topic maps, resulting in benefits that are not accruing to others. This allows topic maps and their use to sell themselves, rather than being imposed from the top down. (The FBI Virtual Case File project being a well known example of top down IT planning.)

Mehar Omar Khan summarizes his strategy as:

Don’t try to arrest the sea. Create islands. Having gone well past the phase of breaking the back of Al-Qaeda and dispersing the Taliban, concentrate on ‘creating and building’ examples. Set the beacon and you’ll see that all the lost ships and boats will come ashore.

Where are you setting your next topic map beacon?

Big Data: The next frontier…

Filed under: BigData,Marketing — Patrick Durusau @ 3:16 pm

Big Data: The next frontier for innovation, competition, and productivity

A McKinsey Global Institute (MGI) study, which MGI briefly summarizes as:

MGI studied big data in five domains—health care in the United States, the public sector in Europe, retail in the United States, and manufacturing and personal location data globally. Big data can generate value in each. For example, a retailer using big data to the full could increase its operating margin by more than 60 percent. Harnessing big data in the public sector has enormous potential, too. If US health care were to use big data creatively and effectively to drive efficiency and quality, the sector could create more than $300 billion in value every year. Two-thirds of that would be in the form of reducing US health care expenditure by about 8 percent. In the developed economies of Europe, government administrators could save more than €100 billion ($149 billion) in operational efficiency improvements alone by using big data, not including using big data to reduce fraud and errors and boost the collection of tax revenues. And users of services enabled by personal location data could capture $600 billion in consumer surplus. The research offers seven key insights.

The opportunity to increase an operating margin by 60 percent is likely to get any CEO’s attention.

However, I would advise that you read the full report and pay close attention to the seventh insight that concludes this summary and the report:

Several issues will have to be addressed to capture the full potential of big data. Policies related to privacy, security, intellectual property, and even liability will need to be addressed in a big data world. Organizations need not only to put the right talent and technology in place but also structure workflows and incentives to optimize the use of big data. Access to data is critical—companies will increasingly need to integrate information from multiple data sources, often from third parties, and the incentives have to be in place to enable this.

Guess what one word is never used in the full report (156 pages)? Starts with an “s.”

Give up? Semantics.

Privacy, IP, security, etc., are more popular topics but if you were to open up to public access all 6,000 plus HR systems at the Pentagon, evil doers would have as much trouble as the GAO in auditing them. Why? A lack of documented semantics. Eventually they too would throw up their hands and move on to more useful (from their perspective) activities.

The potential for value and all the popular problems are present in Big Data, but semantics come first. Otherwise it’s just a Big Mess.

How Much Data…?

Filed under: Marketing — Patrick Durusau @ 3:09 pm

How Much Data Will Humans Create & Store This Year? [INFOGRAPHIC]

I started to create a category called scary graphic but then decided that marketing came close enough.

Use this graphic to emphasize how your client will need the ability to navigate the digital wilderness. Well, ok, their corner of the digital wilderness.

Which is a piece of sanity that seems to be missing from most presentations on data growth. Twitter/Facebook/SuperCollider data is exploding but how much of that is relevant to managing sales data at Walmart? Or HomeDepot? Or even the New York Stock Exchange?

Twitter traffic and the like makes great copy, but crunching numbers with economic significance is more likely to win and maintain contracts.


PS: That’s not to discount the potential for tracking Twitter traffic and insider sales on stocks. But discovering the basis for lawsuits qualifies as crunching numbers with economic significance. Topic maps would be a nice way to summarize and present such information.

July 1, 2011

World Bank Data

Filed under: Data Source,Mapping,Marketing — Patrick Durusau @ 2:55 pm

World Bank Data

Although its data is also available through other portals, the World Bank offers access to over 7,000 indicators at its site, along with widgets for displaying the data.

While the World Bank Data website is well done and a step towards “transparency,” it does not address the need for “transparency” in terms of financial auditing.

Take for example the Uganda – Financial Sector DPC Project. Admittedly it is only $50M but given it has a forty (40) year term with a ten (10) year grace period, who will be able to say with any certainty what happened to the funds in question?

If there were a mapping between the financial systems that disburse these funds into the financial systems in Uganda, then on whatever basis the information is updated, the World Bank would know and could assure others of the fate of the funds in question.

Granted, I am assuming that different institutions and countries have different financial systems and that uniformity of such applications or systems isn’t realistic. It should certainly be possible to set up and maintain mappings between such systems. I suspect that mappings to banks and other financial institutions should be made as well to enable off-site auditing of any and all transactions.

Lest it seem like I am picking on World Bank recipients, I would recommend such mapping/auditing practices for all countries before approval of big ticket items like defense budgets. The fact that an auditing mapping fails in a following year is an indication something was changed for a reason. Once it is understood that changes attract attention and attention uncovers fraud, unexpected maintenance is unlikely to be an issue.
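To make the idea of an auditing mapping a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the field names, the two record layouts and the helper functions are hypothetical and do not describe any real World Bank or national system. It shows two things: detecting when a mapping "fails" because a field on either side has changed, and listing disbursements that have no matching record on the receiving side.

```python
# Hypothetical mapping from a disbursing system's field names to a
# receiving system's field names. None of these names come from a real system.
FIELD_MAP = {
    "loan_id": "project_ref",
    "amount_usd": "received_usd",
    "value_date": "posting_date",
}


def broken_mappings(field_map, source_fields, target_fields):
    """Return mapping entries whose source or target field no longer exists."""
    return sorted(
        (src, dst)
        for src, dst in field_map.items()
        if src not in source_fields or dst not in target_fields
    )


def translate(record, field_map):
    """Rename a source record's fields into the target system's vocabulary."""
    return {field_map[key]: value for key, value in record.items() if key in field_map}


def unreconciled(disbursed, received, field_map):
    """Return translated disbursements with no identical record on the receiving side."""
    translated = (translate(record, field_map) for record in disbursed)
    return [record for record in translated if record not in received]


# Illustrative records only.
DISBURSED = [{"loan_id": "P-1", "amount_usd": 50, "value_date": "2011-07-01"}]
RECEIVED = [{"project_ref": "P-1", "received_usd": 50, "posting_date": "2011-07-01"}]

if __name__ == "__main__":
    print(unreconciled(DISBURSED, RECEIVED, FIELD_MAP))
    # A receiving schema that has dropped "posting_date" breaks one mapping entry.
    print(broken_mappings(FIELD_MAP, set(DISBURSED[0]), {"project_ref", "received_usd"}))
```

A non-empty result from `broken_mappings` is the "something was changed" signal described above; a real mapping would also have to handle currency conversion, partial payments and timing differences, none of which this sketch attempts.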

June 28, 2011

Marketing What Users Want…And An Example

Filed under: Marketing — Patrick Durusau @ 9:51 am

Dick Weisinger in Information Overload: The Data Management Challenge cites the following numbers from a data management survey:

  • 36 percent of organizations say that email overload is their biggest data management problem
  • 28 percent say that document and content management is their biggest issue
  • 15 percent cite information access controls
  • 13 percent point to compliance issues that they must deal with
  • 8 percent say that social media is an area that causes them headaches

Hmmm, not even one percent (1%) said semantic integration was an issue.

Maybe they haven’t heard that semantic integration is all the rage in IT circles? Don’t they read Wired or Scientific American?

I am sure most of them do. Probably the same percentage as you would find at a semantic technology conference.

The difference is they are facing specific problems in an enterprise context. Problems for which they need solutions they can sell to their management as cost effective and doable. By yesterday. The full generality of semantic integration makes nice weekend reading but their management won’t sit long for it on the following Monday.

I am not warranting the following example is feasible or even useful but pose it as a thought experiment.

Assume management agrees email overload is a serious problem and suspects it stems from too many cc’s on posts. There are any number of ways to track such posts but let me outline a topic map solution.

First, create a topic map of the organizational structure, along with approval and informational relationships. This could become more fine grained but for purposes of illustration let’s start with those two relationships. The email addresses for the various actors are included for each person.

Second, since IT runs the SMTP servers that process all the email sent by employees, a copy of every message is stored with associations in the topic map between each sender and its recipient(s).

Third, after a month, a graphical map is presented to management showing emails inside/outside of approval/informational paths, along with senders and recipients of those posts.

Fourth, I would suggest discovering what functions are being performed by the targets of large numbers of out of band posts. They may be informal information hubs who need more formalized roles or greater responsibility. Or the approval/informational structures need revising.

Fifth, for the truly bold, the IT department can filter email to decision makers to allow only a restricted set of staff to reach them by email, thereby reducing their information load from intra- as well as inter-company email. I am sure they will be thankful to not have to set up their own email filters.
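The first four steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the addresses, the `ASSOCIATIONS` table standing in for the organizational topic map, and the function names are all hypothetical, and relationships are treated as undirected for simplicity. A real deployment would read the message log from the mail servers and use an actual topic map engine.

```python
from collections import Counter

# Hypothetical stand-in for the organizational topic map: each association
# links two people (identified by email address) with a relationship type.
ASSOCIATIONS = {
    ("alice@example.com", "bob@example.com"): "approval",
    ("bob@example.com", "carol@example.com"): "informational",
}

# Hypothetical message log: (sender, list of recipients including cc's).
MESSAGES = [
    ("alice@example.com", ["bob@example.com", "dave@example.com"]),
    ("carol@example.com", ["dave@example.com", "bob@example.com"]),
    ("bob@example.com", ["dave@example.com"]),
]


def relationship(sender, recipient):
    """Return the relationship type between two people, or None if there is none.

    Relationships are treated as undirected here, which is a simplification.
    """
    return ASSOCIATIONS.get((sender, recipient)) or ASSOCIATIONS.get((recipient, sender))


def classify(messages):
    """Split (sender, recipient) deliveries into in-band and out-of-band lists."""
    in_band, out_of_band = [], []
    for sender, recipients in messages:
        for recipient in recipients:
            if relationship(sender, recipient):
                in_band.append((sender, recipient))
            else:
                out_of_band.append((sender, recipient))
    return in_band, out_of_band


def out_of_band_hubs(out_of_band, threshold=2):
    """Return recipients receiving at least `threshold` out-of-band messages."""
    counts = Counter(recipient for _, recipient in out_of_band)
    return [(address, n) for address, n in counts.most_common() if n >= threshold]


if __name__ == "__main__":
    in_band, out_of_band = classify(MESSAGES)
    print("in band:", in_band)
    print("possible informal hubs:", out_of_band_hubs(out_of_band))
```

With the sample data, dave@example.com has no approval or informational relationship with anyone yet receives mail from three senders, which is the kind of informal hub the fourth step asks management to look at.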

You don’t have to use “topic map,” or “semantic integration,” or other such buzz words in selling such a solution. You can insert those as appropriate in the email story at your next semantic integration conference presentation.

June 24, 2011

The White Man’s Burden (in the 21st Century)

Filed under: Marketing — Patrick Durusau @ 10:55 am

Kipling (author of The White Man’s Burden in 1899) would have enjoyed the “about” page for Technology for Transparency Network, where he would find:

The Technology for Transparency Network is a research and mapping project that aims to improve understanding of the current state of online technology projects that increase transparency and accountability in Central & Eastern Europe, East Asia, Latin America, the Middle East and North Africa, Sub-Saharan Africa, Southeast Asia, South Asia, and the former Soviet Union.

(and then jump to the final paragraph)

For years now there has been an ongoing debate about whether the Internet is good or bad for democracy. But we have few case studies and even fewer comparative research mappings of Internet-based projects that aim to improve governance, especially in countries outside of North America and Western Europe. Hopefully the Technology for Transparency Network will lead not only to more informed debate about the Internet’s impact on democracy, but also to more participation and interest in projects that aim to empower and improve the livelihoods of citizens who were previously excluded from political participation. (emphasis added)

Governments in North America and Western Europe may be releasing more data than other governments. But that data includes White House visitor lists that are incomplete and that show neither who was met or why, nor the substance of the conversations or agreements. The released data is like a lollipop given to distract an unruly child. It may be really big and tasty, but the adult conversation continues while it is being consumed.

For example, what about the black hole that is the US military budget? True transparency would trace from the bill in Congress to each branch of the service to each contract to each subcontract and sub-subcontract and thence to every state and employer in that state, along with reports on the usefulness of that particular item or service and then back to the member of Congress who introduced it (more likely written by the prime contractor) into the budget. Or doing the same for the organized fraud that is the airport screening by the TSA, that since 9/11 has not caught a single terrorist, not one.

Or even better, all the diplomatic cables from North American and Western European governments while they mis-managed the last half of the 20th century? There are plenty of folks who are still alive who might have something to say about those discussions. Oh, people might be held accountable for betraying their countries and fellow citizens? Well, that is what transparency is all about, accountability.

Topic maps can be one part of a solution to help bring transparency and accountability to governments, no matter where they are located. Even including North America and Western Europe.

In part because diverse topic maps, using multiple national languages and views of the world, can be merged while retaining the integrity of those views. Which means users of the merged topic map can look for absences as well as intersections of interest. To empower people to make demands to fill in the missing pieces.
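As a rough illustration of merging while keeping each view intact, here is a small Python sketch. It is a deliberate simplification: topics are merged only when they share a subject identifier, and each source's names are kept side by side rather than overwritten. The identifiers, names and function names are hypothetical, and the merging rules of the actual Topic Maps standard are considerably richer than this.

```python
def merge(*topic_maps):
    """Merge (source, topics) pairs keyed by subject identifier.

    Each source's names are stored under its own label, so no view
    overwrites another.
    """
    merged = {}
    for source, topics in topic_maps:
        for subject_id, names in topics.items():
            merged.setdefault(subject_id, {})[source] = list(names)
    return merged


def intersections(merged, sources):
    """Subjects that every listed source has something to say about."""
    return sorted(
        subject_id
        for subject_id, views in merged.items()
        if all(source in views for source in sources)
    )


def absences(merged, source):
    """Subjects that some source covers but the given source does not."""
    return sorted(subject_id for subject_id, views in merged.items() if source not in views)


# Hypothetical topic maps in two languages; identifiers are illustrative only.
EN = ("en", {
    "http://example.org/subject/budget-2011": ["Defense budget 2011"],
    "http://example.org/subject/contract-17": ["Contract 17"],
})
FR = ("fr", {
    "http://example.org/subject/budget-2011": ["Budget de la défense 2011"],
    "http://example.org/subject/cable-42": ["Câble 42"],
})

if __name__ == "__main__":
    merged = merge(EN, FR)
    print("shared:", intersections(merged, ["en", "fr"]))
    print("missing from fr:", absences(merged, "fr"))
```

The `absences` result is the list of "missing pieces": subjects one community has documented and another has not.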

PS: To any governments that are about to fall: please post all diplomatic cables to a public ftp site. The other governments let you fall so you don’t owe them anything.

June 14, 2011

Seven Things Human Editors Do that Algorithms Don’t (Yet)

Filed under: Authoring Topic Maps,Marketing — Patrick Durusau @ 10:25 am

Seven Things Human Editors Do that Algorithms Don’t (Yet)

The seven things are all familiar:

  • Anticipation
  • Risk-taking
  • The whole picture
  • Pairing
  • Social importance
  • Mind-blowingness
  • Trust

At least for topic map authors.

How are you selling the human authorial input into your topic maps?

This list looks like a good place to start.

Quadrennial Defense Review

Filed under: Marketing — Patrick Durusau @ 10:24 am

Quadrennial Defense Review

Not light reading but if you are interested in the broad outlines of how U.S. defense may develop, this is one place to start.

I mention it because topic map applications can compete with existing applications that are already in place or they can be the solution to problems without an installed base. Depends upon your particular strategy for promoting topic maps or topic map based services.

Another reason for reading this and similar material is to pick up the vocabulary in which needs will be expressed, so that you can pitch topic maps as solutions in terms of needs as seen by the prospective client, in this case the DoD. Pitching subject-centric processing to someone looking to:

fully implement the National Security Professional (NSP) program to improve cross-agency training, education, and professional experience opportunities.
(page 71)

isn’t likely to be successful. Subject-centric processing may be the best way to accomplish their goal, but the focus should be on achieving their goal. Once they are satisfied that is the case, maybe they will ask how it is being done. Maybe not. The important thing is for them to say: “I want more of that.”

June 11, 2011

Hadoop: What is it Good For? Absolutely … Something

Filed under: Hadoop,Marketing — Patrick Durusau @ 12:43 pm

Hadoop: What is it Good For? Absolutely … Something by James Kobielus is an interesting review of how to contrast Hadoop with an enterprise data warehouse (EDW).

From the post:

So – apart from being an open-source community with broad industry momentum – what is Hadoop good for that you can’t get elsewhere? The answer to that is a mouthful, but a powerful one.

Essentially, Hadoop is vendor-agnostic in-database analytics in the cloud, leveraging an open, comprehensive, extensible framework for building complex advanced analytics and data management functions for deployment into cloud computing architectures. At the heart of that framework is MapReduce, which is the only industry framework for developing statistical analysis, predictive modeling, data mining, natural language processing, sentiment analysis, machine learning, and other advanced analytics. Another linchpin of Hadoop, Pig, is a versatile language for building data integration processing logic.

Promoting Hadoop without singing Aquarius, promising us a new era in human relationships, or that we are going to be smarter than we were 100, 500, or even 1,000 years ago. Just cold hard data analysis advantages, the sort that reputations, businesses and billings are built upon. Maybe there is a lesson there for topic maps?

June 10, 2011

5 Important Factors for Pricing Data in the Information (Overload) Age

Filed under: Marketing — Patrick Durusau @ 6:33 pm

5 Important Factors for Pricing Data in the Information (Overload) Age

Nick Ducoff, Co-founder and CEO of Infochimps, identifies five (5) factors to consider in pricing data.

Selling the results of the application of topic maps to data could follow his advice quite easily.

June 9, 2011

Paper: A Study of Practical Deduplication

Filed under: Deduplication,Marketing,Topic Maps — Patrick Durusau @ 6:34 pm

Paper: A Study of Practical Deduplication

From the post:

With BigData comes BigStorage costs. One way to store less is simply not to store the same data twice. That’s the radically simple and powerful notion behind data deduplication. If you are one of those who got a good laugh out of the idea of eliminating SQL queries as a rather obvious scalability strategy, you’ll love this one, but it is a powerful feature and one I don’t hear talked about outside the enterprise. A parallel idea in programming is the once-and-only-once principle of never duplicating code.

Someone asked the other day about how to make topic maps profitable.

Well, selling a solution to issues like duplication of data would be one of them.

You do know that the kernel of the idea for topic maps arose out of a desire to avoid paying 2X, 3X, 4X, or more for the same documentation on military equipment. Yes? Didn’t fly ultimately because of the markup that contractors get on documentation, which then funds their hiring military retirees. That doesn’t mean the original idea was a bad one.

Now, applying a topic map to military documentation systems and demonstrating the duplication of content, perhaps using one of Lars Marius Garshol’s similarity measures, that sounds like a rocking topic map application. Particularly in budget cutting times.
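A minimal sketch of how such a demonstration might start, in Python. To be clear, this is not one of Lars Marius Garshol’s measures. It swaps in two generic techniques: a SHA-256 fingerprint to catch exact duplicates and Jaccard similarity over word shingles to catch near duplicates. The manual identifiers, the sample text, and the 0.5 threshold are all made up for illustration.

```python
import hashlib
from itertools import combinations


def fingerprint(text: str) -> str:
    """Hash lowercased, whitespace-normalized text so layout changes do not hide a duplicate."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_duplicates(docs: dict, threshold: float = 0.5):
    """Return (exact, near) lists of document-id pairs."""
    exact, near = [], []
    for (id_a, text_a), (id_b, text_b) in combinations(sorted(docs.items()), 2):
        if fingerprint(text_a) == fingerprint(text_b):
            exact.append((id_a, id_b))
        elif jaccard(shingles(text_a), shingles(text_b)) >= threshold:
            near.append((id_a, id_b))
    return exact, near


# Hypothetical documentation fragments, invented for this example.
manuals = {
    "army-001": "Remove the four bolts and lift the access panel before servicing the pump.",
    "navy-017": "Remove the four  bolts and lift the access panel before servicing the pump.",
    "usaf-203": "Remove the four bolts and lift the access panel before servicing the fuel pump.",
    "army-044": "Check tire pressure weekly and record the reading in the vehicle log.",
}

exact_pairs, near_pairs = find_duplicates(manuals)
print("Paid for twice, word for word:", exact_pairs)
print("Paid for twice, lightly edited:", near_pairs)
```

Comparing every pair is fine for a demo but grows quadratically with the number of documents, so a real collection would need some way of narrowing the candidates first.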

June 7, 2011

Topic Map Travel Alert!

Filed under: Marketing,Topic Maps — Patrick Durusau @ 6:55 pm

Government harassing and intimidating Bradley Manning supporters was brought to my attention earlier today. Since I have announced a number of conferences that involve travel to destinations outside the United States, for U.S. citizens and residents, I wanted to bring it to your attention.

Best advice: Do not attempt to return to the United States with electronic devices.

What is particularly disheartening about this story is that it illustrates the continuing incompetence of the United States with regard to IT issues. The agents in question may as well be using paper towel tubes to look for clues about the Wikileaks security breach.

With a topic map, one of the first subjects to represent would be the deeply flawed system design that enabled access to diplomatic cables over an extended time period. The contractor, sysadmins and others would definitely be nodes in that map. A physical audit of all the equipment with access to that data would be another issue to represent, rather than a premature focus on one possible suspect. It isn’t all that hard to imagine other compromised hardware, given that thousands of people had access to the same data.

I would exclude from such a topic map, as noise, the reports/discussion about Private Manning because that is a diversion of attention from the poor security practices and accountability for those practices (none to speak of) that led to this breach, whoever was responsible.

Given how unperturbed the DoD seems to be about the leak of the State Department cables, one has to wonder how seriously the chain of command communicated the alleged need to maintain security with regard to these cables. That could be another set of nodes in such a map.

A far cry from the “quick, we need a suspect and therefore the suspect must be guilty, that’s why we are abusing the suspect,” approach taken in this case. Stopping security leaks is different from saving face in the aftermath of security leaks. Maybe that’s the difference.

Marketing Topic Maps to Geeks

Filed under: Marketing,RDF,Topic Maps — Patrick Durusau @ 6:54 pm

Another aspect of the “oh woe is topic maps” discussion is the lack of interest in topic maps by geeks. There are open source topic map projects, presentations at geeky conferences, demos, etc., but no real geek swell for topic maps. But the same isn’t true for ontologies, RDF, description logic (ok, maybe less for DL), etc.

In retrospect, that isn’t all that surprising. Take a gander inside any of the software project categories at sourceforge.net. Any of those projects could benefit from more participation but every year sees more projects in the same categories and oft times covering the same capabilities.

Does any of that say to you: There is an answer and it has to be my answer? I won’t bother with collecting the stats for the lack of code reuse, another aspect of this issue. It is too well known to belabor.

Topic maps made the fatal mistake of saying answers are supplied by users and not developers. If you don’t think that was a mistake, take a look at any RDF vocabulary and tell me it was written by a typical user community. Almost without exception (I am sure there must be some somewhere), RDF vocabularies are written by experts and imposed on users. Hence their popularity, at least among experts anyway.

Topic maps inverted the usual world view to say that since users are the source of the semantics in the texts they read, we should start with their views. Imposing world views is always more popular than learning them, particularly among the geek community. They know what users should be doing and they damned well better do it.

Oh, the other mistake that topic maps made was to say there was more than one world view. Multiple world views that could be aligned together. The ontologists scotched that idea decades ago, although they haven’t been able to agree on the one world view that should be in place. I suppose there may be (small letters) multiple world views, but they consist of the correct World View and numerous incorrect world views.

That would certainly be the position of US intelligence and diplomatic circles, who map into the correct World View all “incorrect world views,” which may account for their notable lack of successes over the last fifty or so years.

We should market topic maps to audiences who are interested in their own goals, not the goals of others, even geeks.

Goals vary from group to group. Some groups want to engage in disruptive behavior, other groups wish to prevent disruptive behavior, some want to advance research, still others want to be patent trolls.

Topic maps: Advance your goals with military grade IT. (How’s that for a new topic map slogan?)

Marketing Indexing

Filed under: Indexing,Marketing,Topic Maps — Patrick Durusau @ 6:25 pm

The episodic “oh, woe is topic maps! We aren’t as successful as ..(insert some semantic technology)..” posts are back on topicmapmail@infoloom.com. I don’t dispute that topic maps could improve its market share. I remember the “we’re #2, so we try harder” advertising campaign and take our present position as a reason to try harder, not to bewail our fate as ordained.

Let’s talk about how to market something closely related to topic maps, indexing.

I come to you with this great new idea, indexing. Instead of starting on page 1 and going through page n every time a reader wants to find information, the index points right to it. A real time saver.
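A rough sketch of the time saver, in Python. The three “pages” and the function names are invented for the example; the only point is the contrast between reading every page on each query and consulting an index that was built once.

```python
from collections import defaultdict

PUNCTUATION = ".,;:!?"


def words_on(page_text):
    """Lowercased words on a page, with surrounding punctuation stripped."""
    stripped = (word.strip(PUNCTUATION) for word in page_text.lower().split())
    return {word for word in stripped if word}


def scan(pages, term):
    """No index: start on page 1 and go through page n on every query."""
    return [number for number, text in enumerate(pages, start=1)
            if term.lower() in words_on(text)]


def build_index(pages):
    """Read the book once and record the pages on which each word appears."""
    index = defaultdict(set)
    for number, text in enumerate(pages, start=1):
        for word in words_on(text):
            index[word].add(number)
    return index


def lookup(index, term):
    """With an index: go straight to the page numbers."""
    return sorted(index.get(term.lower(), set()))


# A hypothetical three-page "book".
pages = [
    "Topic maps merge subjects.",
    "Indexes point readers to subjects.",
    "Merging relies on subject identity.",
]

index = build_index(pages)
print(scan(pages, "subjects"))
print(lookup(index, "subjects"))
```

Both calls return the same page numbers. The difference is that `scan` rereads the whole book for every question, while `lookup` pays that cost once, when the index is built.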

You get excited and so we discuss two different marketing approaches:

1) We can do presentations, papers, demos, etc., on the theory of indexing, models of indexing, write software that does indexing, with a lot of effort, etc.

or,

2) We present a publisher/reviewer/reader with a book without an index and we have a copy of the same book with an index, plus a list of ten subjects to find in the book.

Show of hands. Which one do you think would be more effective?

June 5, 2011

Kasabi

Filed under: Dataset,Graphs,Marketing,Topic Maps — Patrick Durusau @ 3:21 pm

Kasabi

A dataset collection, curation and interface website that is currently in a public beta.

Summarized in part as:

Search, Browse, Explore

You can browse through the catalog to find datasets based on their category, or search via keywords. From each dataset’s homepage you can quickly find useful information about its provenance, licensing and a snapshot of useful metrics such as when the dataset was last updated.

Using the Explore tools will get you deeper into the dataset: drilling down into detailed documentation and sample data.

Datasets and APIs

Every dataset in Kasabi has a range of core APIs listed right on the dataset homepage or discoverable through the search and browse tools. Choose the API that best supports what you need to do, whether its a search over the data or more complex queries. Subscribe to an API to immediately gain access using your API key. Your dashboard lists all your subscribed APIs, and each has a useful reference card of parameters and response formats available from its homepage. Need more detailed docs? We have those too.

Contribute APIs

Can’t find an API that matches your application? In Kasabi, you can contribute your own using our API building tools. These tools let developers create customised RESTful APIs that capture ways of querying or navigating across a dataset, producing results in a variety of built-in and custom formats. All contributed APIs are listed in the catalog, along with automatically generated documentation, allowing them to be shared with the Kasabi community.

The Contribute APIs feature looks quite interesting, particularly since all the datasets are stored as separate graph databases.

A bit more from the FAQ on custom APIs:

A custom API allows you tailor access to the dataset. This custom access will then be suited to your particular application or user community. By creating and maintaining a custom API over the data, you won’t be constrained by the default APIs provided by Kasabi or the data owner.

By allowing the developer community to share its skills in ways other than just creating applications, Kasabi lets us broaden the definition of data curation to cover APIs and access as well as the data itself.

Only fifty-nine (59) datasets as of June 4, 2011, with a definite UK flavor but I expect that will grow fairly quickly. The usual suspects, the CIA World Factbook, BBC, New York Times, DBpedia, are all present. More than enough information to make topic map interfaces interesting. The principal advantage of topic map interfaces is the ability to specify a basis for a mapping, thereby enabling other researchers to follow or not, as they choose.

May 21, 2011

Erlang – Give it a try! (Marketing TM idea?)

Filed under: Erlang,Marketing — Patrick Durusau @ 5:24 pm

Erlang – Give it a try!

I just spot checked this online shell but it looks interesting.

Might be the sort of thing to get someone interested in Erlang.

Shows just enough to awaken interest in its capabilities.

Reminds me of a mother who read just enough of a story to capture a child’s interest and refused to read the rest of it. So the child had to learn to read it on their own, to find out how the story continued.

Maybe we need to do the same to market topic maps? Topic map enough content in an area that users will want to extend the topic map to make it more useful to them. To complete the story as it were.

I know what interests me, but it isn’t any more marketable than topic maps of 18th century castrati or similar obscurities. Fundable but not marketable.

Suggestions for what might prove to be popular topic maps?

FamilySearch.org

Filed under: Dataset,Marketing — Patrick Durusau @ 5:17 pm

FamilySearch.org

After locating the census record abstracts for record linkage, it occurred to me to look for census records for other countries.

Which fairly quickly put me out at family history sites.

FamilySearch.org looks like one of the better ones.

Pointers to very diverse sets of records which should provide grist for any matching algorithms as well as modeling issues for other information.

I am not familiar with the software in this area but my impression is that a lot of effort has gone into even the free stuff, so poor UIs or poorly performing apps need not apply. Topic maps are going to have to offer a real value add to get traction in this area.

If you investigate or are in the family history area, post a note on whether current software allows merging family histories together.
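For anyone unsure what “merging” would mean here, a minimal sketch in Python. It says nothing about how any existing family history software works. The records, the field names, and the rule for deciding that two records describe the same person (normalized name, birth year, and birthplace) are all assumptions made up for the example.

```python
def identity_key(person):
    """Hypothetical basis for 'same person': normalized name, birth year, birthplace."""
    return (
        " ".join(person["name"].lower().split()),
        person["born"],
        person["birthplace"].lower(),
    )


def merge_histories(*histories):
    """Combine records from several family histories, one merged record per identity key."""
    merged = {}
    for history in histories:
        for person in history:
            record = merged.setdefault(identity_key(person), {
                "name": person["name"],
                "born": person["born"],
                "birthplace": person["birthplace"],
                "children": set(),
                "sources": set(),
            })
            # Keep everything each history knew, and remember where it came from.
            record["children"].update(person.get("children", []))
            record["sources"].add(person["source"])
    return merged


# Two invented family histories that overlap on one person.
smith_tree = [
    {"name": "Mary Smith", "born": 1851, "birthplace": "Leeds",
     "children": ["John Smith"], "source": "smith-family-tree"},
]
jones_tree = [
    {"name": "mary  smith", "born": 1851, "birthplace": "leeds",
     "children": ["Ann Jones"], "source": "jones-family-tree"},
    {"name": "Mary Smith", "born": 1879, "birthplace": "Leeds",
     "children": [], "source": "jones-family-tree"},
]

merged = merge_histories(smith_tree, jones_tree)
for key, record in sorted(merged.items()):
    print(key, sorted(record["children"]), sorted(record["sources"]))
```

The two 1851 records collapse into one person with both children and both sources, while the 1879 Mary Smith stays separate. Real records are far messier, which is where the matching algorithms come in, but stating the basis for the merge explicitly is what lets someone else decide whether to follow it.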

May 10, 2011

Promoting Topic Maps

Filed under: Marketing — Patrick Durusau @ 3:35 pm

The recent release of QuaaxTM made me wonder how many people comfortable with the LAMP stack are aware of QuaaxTM?

There is the Topic Maps Tools page by Lars Marius Garshol, which is always the first place I look for new software, but how many non-topic map users would know to look there?

I am thinking that we need a two-fold strategy:

1) Use Lars’ lists of software to create “flyers” as it were for particular languages/platforms. (Communities that use Python are probably not interested in C++ libraries.)

2) “Distribute” those flyers when appropriate (no spamming) in discussions in other communities. With pointers back to Topic Maps Tools.

The metadata associated with the current listings makes the tools easy to find, but that is a pull information model.

I am thinking more along the lines of a push information model.

If you think about advertising, it is all based on a push information model.

Maybe there is a lesson there for the topic maps community.

May 9, 2011

Stats of the Union tells health stories in America

Filed under: Data Source,Marketing,Visualization — Patrick Durusau @ 10:35 am

Stats of the Union tells health stories in America

From Flowingdata.com news of an iPad app that:

maps the status of health in America. Browse, pan, zoom, and explore through a number of demographics and breakdowns.

I don’t have an iPad (or iPhone) but both are venues of opportunity for topic maps.

It isn’t hard to imagine a topic map that takes the same information in Stats of the Union and adds in data that correlates obesity with the density of fast-food restaurants, making zoning decisions for the same a matter of public health.

To answer the question: “Why are you fat?” with a localized “McDonald’s, Wendy’s, Arby’s, etc.”

Nice visualizations from what I could see on the video.

Just a thought, to personalize the obesity app, you could map in frequent customers who are, ahem, extra large sizes. (With their consent of course. I wouldn’t bother asking McDonald’s.)

Perhaps a new slogan: Topic maps, focusing information to a sharp point.

What do you think?

Big Data – Demo

Filed under: BigData,Marketing — Patrick Durusau @ 10:32 am

The slides from JAX2011 by Pavlo Baron are as informative and entertaining as any set I have ever seen.

Big Data The slide deck.

If more people did slides like these, fewer people would be asleep or doing email during presentations.

Big Data – Demo For JAX2011

Pavlo blogs about his demo at JAX2011.

Big-Data-Demo-2

The source code for Pavlo’s demo.

*****

Every serious data project visits the factors Pavlo lists. The ones that succeed anyway.

May 5, 2011

Editing Geeks

Filed under: Documentation,Marketing,Standards — Patrick Durusau @ 1:55 pm

Top 25 Blogs for Editing Geeks

Something for those of us who are concerned with documentation and standards.

There is a lot of documentation, not to mention standards, in the topic maps area that could use attention.

Rather ironic that documentation for topic maps should be sub-par since the topic maps adventure started off as a software documentation project.

Perhaps getting our own house in order might make topic maps more appealing to others as well as giving all of us better documentation for existing topic map applications.
