Archive for the ‘Topic Maps’ Category

Causium Sales Model

Wednesday, May 15th, 2013

Atlassian’s Causium Sales Model Reaches $2.5 Million Charity Donations by Kit Eaton.

From the post:

Back in May 2010 Atlassian, a large innovative software company, revealed that its alternative business model causium had allowed it to donate $500,000 to international literacy improvement charity Room to Read. Now the firm says it has surpassed $2.5 million in donations, and is holding a special event with the charity on May 14th to celebrate.

Causium is an alternative to the freemium business model that many companies–from the Wall Street Journal to Babbel–follow. Under freemium thinking, Atlassian would give away some of its enterprise-grade code for free in order to attract business for its paid services. But instead, the company charges a nominal $10 fee, which it then donates to charity. The fee works in two ways–as a boost to charitable causes, and also to demonstrate to the software’s end-users that the code itself has value.

Atlassian’s President Jay Simons spoke to Fast Company, explaining that the plan has worked better than they expected: “We didn’t appreciate at the time that we were effectively building this annuity stream. Customers that buy the 10-user license will buy it again the following year.” The first year of the plan resulted in some $300,000 in charity donations, and the growth of the company’s reputation since means they donated the same amount in the first quarter of 2013. The donations are important to Room to Read, Simons says, because “they have a reliable funding source” on a regular basis.

Important to note that Atlassian had the market presence to make a causium sales model work.

On that score, see:

Why Atlassian is to Software as Apple is to Design by Mark Fidelman.

and, of course:

Atlassian.

Important lessons if you hope to make your software or service a success.

Topic Maps in Lake Wobegon

Wednesday, May 15th, 2013

Jim Harris writes in The Decision Wobegon Effect:

In his book The Most Human Human, Brian Christian discussed what Baba Shiv of the Stanford Graduate School of Business called the decision dilemma, “where there is no objectively best choice, where there are simply a number of subjective variables with trade-offs between them. The nature of the situation is such that additional information probably won’t even help. In these cases – consider the parable of the donkey that, halfway between two bales of hay and unable to decide which way to walk, starves to death – what we want, more than to be correct, is to be satisfied with our choice (and out of the dilemma).”

(…)

Jim describes the Wobegon effect, an effect that blinds decision makers to alternative bales of hay.

Topic maps are composed of a mass of decisions, both large and small.

Is the Wobegon effect affecting your topic map authoring?

Check Jim’s post and think about your topic map authoring practices.

Every Band On Spotify Gets A Soundrop Listening Room [Almost a topic map]

Sunday, May 12th, 2013

Every Band On Spotify Gets A Soundrop Listening Room by Eliot Van Buskirk.

From the post:

Soundrop, a Spotify app that shares a big investor with Spotify, says it alone has the ability to scale listening rooms up so that thousands of people can listen to the same song together at the same time, using a secret sauce called Erlang — a hyper-efficient coding language developed by Ericsson for use on big telecom infrastructures (updated).

Starting today, Soundrop will offer a new way to listen: individual rooms dedicated to any single artist or band, so that fans of (or newcomers to) their music can gather to listen to that bands music. The rooms are filled with tunes already, but anyone in the room can edit the playlist, add new songs (only from that artist or their collaborations), and of course talk to other listeners in the chatroom.

“The rooms are made automatically whenever someone clicks on the artist,” Soundrop head of partnerships Cortney Harding told Evolver.fm. “No one owns the rooms, though. Artists, labels and management have to come to us to get admin rights.”

In topic map terminology, what I hear is:

Using the Soundrop app, Spotify listeners can create topics for any single artist or band with a single click. Associations between the artist/band and their albums, individual songs, etc., are created automatically.

What I don’t hear is the exposure of subject identifiers to allow fans to merge in information from other resources, such as fan zines, concert reports and of course, covers from the Rolling Stone.

Perhaps Soundrop will offer subject identifiers and merging as a separate, perhaps subscription feature.

Could be a win-win if the Rolling Stone, for example, were to start exposing their subject identifiers for articles, artists and bands.

Some content producers will follow others, some will invent their own subject identifiers.

The important point being that with topic maps we can merge based on their identifiers.

Not some uniform-identifier-in-the-sky-by-and-by, which stymies progress until universal agreement arrives.
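To make the merging point concrete, here is a minimal Python sketch. It is not any topic map engine's API. It assumes each topic carries a set of subject identifiers and that two topics merge whenever those sets overlap. The function name, the dictionary layout and every example.org/.com/.net URL are hypothetical.

```python
def merge_topics(topics):
    """Merge topics that share at least one subject identifier.

    Each topic is a dict with two sets: "identifiers" and "names".
    This layout is an assumption made for illustration only.
    """
    merged = []  # invariant: entries never share an identifier
    for topic in topics:
        ids = set(topic["identifiers"])
        names = set(topic["names"])
        remaining = []
        for other in merged:
            if ids & other["identifiers"]:
                # Overlap found: absorb the existing topic.
                ids |= other["identifiers"]
                names |= other["names"]
            else:
                remaining.append(other)
        remaining.append({"identifiers": ids, "names": names})
        merged = remaining
    return merged


# Hypothetical identifiers published by three independent sources.
topics = [
    {"identifiers": {"http://example.org/artists/band-x"},
     "names": {"Band X listening room"}},
    {"identifiers": {"http://example.org/artists/band-x",
                     "http://example.com/bands/band-x"},
     "names": {"Band X cover story"}},
    {"identifiers": {"http://example.com/bands/band-x"},
     "names": {"Band X concert report"}},
    {"identifiers": {"http://example.net/artists/band-y"},
     "names": {"Band Y listening room"}},
]

merged = merge_topics(topics)
for topic in merged:
    print(sorted(topic["names"]))
```

Note that the concert report and the listening room never share an identifier directly. They end up in one topic only because the cover story carries both identifiers, which is the point: no source had to agree on a single universal identifier first.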

Why Are We Still Waiting for Natural Language Processing?

Friday, May 10th, 2013

Why Are We Still Waiting for Natural Language Processing? by Geoffrey Pullum.

From the post:

Try typing this, or any question with roughly the same meaning, into the Google search box:

Which UK papers are not part of the Murdoch empire?

Your results (and you could get identical ones by typing the same words in the reverse order) will contain an estimated two million or more pages about Rupert Murdoch and the newspapers owned by his News Corporation. Exactly what you did not ask for.

Putting quotes round the search string freezes the word order, but makes things worse: It calls not for the answer (which would be a list including The Daily Telegraph, the Daily Mail, the Daily Mirror, etc.) but for pages where the exact wording of the question can be found, and there probably aren’t any (except this post).

Machine answering of such a question calls for not just a database of information about newspapers but also natural language processing (NLP). I’ve been waiting for NLP to arrive for 30 years. Whatever happened?

This is a series you need to follow.

Geoffrey promises to report on three “unexpected developments” that relate to natural language processing.

The next installment to appear Monday, May 13, 2013.

Of course, with curated content, as with a topic map, you get a find-once/read-many result (FOMR?).

Help Map Historical Weather From Ship Logs

Thursday, May 9th, 2013

Help Map Historical Weather From Ship Logs by Caitlin Dempsey.

From the post:

The Old Weather project is a crowdsourcing data gathering endeavor to understand and map historical weather variability. The data collected will be used to understand past weather patterns and extremes in order to better predict future weather and climate. The project is headed by a team of collaborators from a range of agencies such as NOAA, the Met Office, the National Archives, and the National Maritime Museum.

Information about historical weather, in the form of temperature and pressure measurements, can be gleaned from old ship logbooks. For example, Robert Fitzory, the Captain of the Beagle, and his crew recorded weather conditions in their logs at every point the ship visited during Charles Darwin’s expedition. The English East India from the 1780s to the 1830s made numerous trips between the United Kingdom and China and India, with the ship crews recording weather measurements in their log books. Other expeditions to Antarctica provide rare historical measurements for that region of the world.

By utilizing a crowdsourcing approach, the Old Weather project team aims to use the collective efforts of public participation to gather data and to fact check data recorded from log books. There are 250,000 log books stored in the United Kingdom alone. Clive Wilkinson, a climate historian and research manager for the Recovery of Logbooks and International Marine Data (RECLAIM) Project, a part of NOAA’s Climate Database Modernisation Program, notes there are billions of unrecorded weather observations stored in logbooks around the world that could be captured and use to better climate prediction models.

In addition to climate data, I suspect that ships’ logs would make interesting records to dovetail, using a topic map, with other records, such as of ports, along their voyages.

Tracking the identities of passengers and crew, cargoes, social events/conditions along the way.

Standing on their own, logs and other historical materials are of interest, but integrated with other historical records a fuller historical tapestry emerges.

New York Times – Article Search API v. 2

Sunday, May 5th, 2013

New York Times – Article Search API v. 2

From the documentation page:

With the Article Search API, you can search New York Times articles from Sept. 18, 1851 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

The prior Article Search API described itself as:

With the Article Search API, you can search New York Times articles from 1981 to today, retrieving headlines, abstracts, lead paragraphs, links to associated multimedia and other article metadata.

An addition of one hundred and thirty years of content for searching. Not bad for a v. 2 release.

On cursory review, the API does appear to have changed significantly.

For example, the default fields for each request in version 1.0 were body, byline, date, title, url.

In version 2.0, the default fields returned are: web_url, snippet, lead_paragraph, abstract, print_page, blog, source, multimedia, headline, keywords, pub_date, document_type, news_desk, byline, type_of_material, _id, and word_count.

Five default fields for version 1.0 versus seventeen for version 2.0.

There are changes in terminology that will make discovering all the changes from version 1.0 to version 2.0 non-trivial.
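A rough way to start on that discovery is to diff the two default field lists quoted above. The sketch below does only that. The `GUESSED_RENAMES` table is my assumption about which version 1.0 fields might have been renamed; it is not taken from the New York Times documentation and would need to be verified against it.

```python
# Default fields as listed in the two versions of the documentation.
V1_DEFAULT_FIELDS = ["body", "byline", "date", "title", "url"]
V2_DEFAULT_FIELDS = [
    "web_url", "snippet", "lead_paragraph", "abstract", "print_page",
    "blog", "source", "multimedia", "headline", "keywords", "pub_date",
    "document_type", "news_desk", "byline", "type_of_material", "_id",
    "word_count",
]

v1, v2 = set(V1_DEFAULT_FIELDS), set(V2_DEFAULT_FIELDS)

shared = sorted(v1 & v2)    # same name in both versions
only_v1 = sorted(v1 - v2)   # dropped or renamed
only_v2 = sorted(v2 - v1)   # new or renamed

# ASSUMPTION: guesses based on the names alone, not on the documentation.
GUESSED_RENAMES = {"url": "web_url", "date": "pub_date", "title": "headline"}

# Version 1.0 fields with no obvious counterpart even after guessing.
unexplained_v1 = sorted(f for f in only_v1 if f not in GUESSED_RENAMES)

print("shared:", shared)
print("only in 1.0:", only_v1)
print("only in 2.0:", len(only_v2), "fields")
print("no guessed counterpart:", unexplained_v1)
```

Only one default field name survives unchanged, which is why a published mapping would save everyone the guesswork.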

Two fields that were present in version 1.0 that I don’t see (under another name?) in version 2.0 are:

dbpedia_resource:

DBpedia person names mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

The Times per_facet is often more comprehensive than dbpedia_resource, but the DBpedia name is easier to use with other data sources. For more information about linked open data, see data.nytimes.com.

dbpedia_resource_url:

URLs to DBpedia person names that have been mapped to Times per_facet terms. This field is case sensitive: values must be Mixed Case.

For more information about linked open data, see data.nytimes.com.

More documentation is promised, which I hope includes a mapping from version 1.0 to version 2.0.

Certainly looks like the basis for annotating content in the New York Times archives as part of a topic map.

Where users input their authentication details for the New York Times and/or other pay-per-view sites.

I can’t imagine anyone objecting to you helping them sell their content. ;-)

British Library Labs – Competition 2013

Sunday, May 5th, 2013

British Library Labs – Competition 2013

Deadline for entry: Wednesday 26 June, 2013 (midnight GMT)

From the webpage:

We want you to propose an innovative and transformative project using the British Library’s digital collections and if your idea is chosen, the Labs team will work with you to make it happen and you could win a prize of up to £3,000.

From the digitisation of thousands of books, newspapers and manuscripts, the curation of UK websites, bird sounds or location data for our maps, over the last two decades we’ve been faithfully amassing a vast and wide-ranging number of digital collections for the nation. What remains elusive, however, is understanding what researchers need in place in order to unlock the potential for new discoveries within these fascinating and diverse sets of digital content.

The Labs competition is designed to attract scholars, explorers, trailblazers and software developers who see the potential for new and innovative research and development opportunities lurking within these immense digital collections. Through soliciting imaginative and transformative projects utilising this content you will be giving us a steer as to the types of new processes, platforms, arrangements, services and tools needed to make it more accessible. We’ll even throw the Library’s resources behind you to make your idea a reality.

Numerous ways to get support for developing your idea before submission.

In terms of PR for your solution (hopefully topic maps based) do note:

Prizes

Winners will get direct curatorial and financial support for completing their project from the Labs team, which may involve an expenses paid residency at the British Library for a mutually agreed period of time (dependent on the winners’ circumstances, the winning ideas, access to resources and budget allowing).

  • Winners will receive £3000 for completing their project
  • Runners-up will receive £1000 for completing their project

The work will take place between Saturday July 6 and Monday 4 November, 2013, with the completed projects being showcased during November 2013 when prizes will be awarded.

What happens to your ideas?

All ideas will be posted on the Labs website after they have been judged. All project ideas submitted for the competition can continue to be worked on and where possible the Labs team will provide support (time and resources permitting). Well developed projects will be showcased together with the competition winners during November 2013.

This is also a good excuse to spend more time at the British Library website. I don’t spend nearly enough time there myself.

Is Search a Thing of the Past

Friday, May 3rd, 2013

Is Search a Thing of the Past by April Holmes.

April covers a survey of 2277 private technology firms that were acquired in 2012.

See her post for the details but the bottom line was:

None of them were search companies.

I can’t remember anyone ever saying they had a “great” search experience.

Can you?

If not, what would you want to replace present search interfaces? (Leaving technical feasibility aside for the moment.)

How to Go Viral, Every Time

Wednesday, April 24th, 2013

How to Go Viral, Every Time by Jess Bachman.

From the post:

Everyone wants their content to go viral. It’s the holy grail of marketing. It can turn companies and product into the talk of the town, even if they sell toiletries. The ROI on content with more than a million views is almost unmeasurable. So how do you make sure your content will go viral?

The secret is simple. Be incredibly lucky.

Luck is the third piece of the virality triumvirate and obviously the hardest to bank on. In fact, you cannot achieve true virality without it. With great content and powerful tactics you can certainly get millions of views on a consistent basis, but if lady luck doesn’t give her blessing, you will end up with a good – but not great – ROI.

What do you think would make good viral material for a topic map video?

And of course:

Anyone with skills at producing videos interested in a topic map video?

Data Science Markets [Marketing]

Saturday, April 20th, 2013

Data Visualization: The Data Industry by Sean Gonzalez.

From the post:

In any industry you either provide a service or a product, and data science is no exception. Although the people who constitute the data science workforce are in many cases rebranded from statistician, physicist, algorithm developer, computer scientist, biologist, or anyone else who has had to systematically encode meaning from information as the product of their profession, data scientists are unique from these previous professions in that they operate across verticals as opposed to diving ever deeper down the rabbit hole.

Sean identifies five (5) market segments in data science and a visualization product for each one:

  1. New Recruits
  2. Contributors
  3. Distillers
  4. Consultants
  5. Traders

See Sean’s post for the details.

Have you identified market segments and the needs they have for topic map based data and/or software?

Yes, I said their needs.

You may want a “…more just, verdant, and peaceful world” but that’s hardly a common requirement.

Starting with a potential customer’s requirements is more likely to result in a sale.

The TAO of Topic Maps in Spanish

Wednesday, April 17th, 2013

Steve Pepper sends word that The TAO of Topic Maps has been translated into Spanish!

I am very grateful to Maria Ramos of WebHostingHub.com for translating The TAO of Topic Maps into Spanish: http://www.webhostinghub.com/support/es/misc/mapas-tematicos.

Since the article contains a lot of technical terminology, it might be a good idea if some Spanish-speaking Topic Maps experts were to proof-read the translation. Please send any comments directly to Maria at mariar@webhostinghub.com with a cc: to me.

Other translations to note?

HowTo: Develop Your First Google Glass App [Glassware]

Wednesday, April 17th, 2013

HowTo: Develop Your First Google Glass App [Glassware] by Tarandeep Singh.

From the post:

Google has raised curtains off it’s Glass revealing detailed Tech Specs. Along with the specs came the much awaited Mirror API – The API for Glass apps.

So you had that killer app idea for Google Glass? Now its time for you to put those ideas into code!

The race is on to produce the first topic map based Google Glass App!

A response to a request can be a machine generated guess or a human curated answer.

Which one do you think users would prefer?

Glass – Another Topic Map Medium?

Thursday, April 11th, 2013

If you haven’t seen Glass, go to: http://www.google.com/glass/start/

If lame search results are annoying on your desktop, pad or cellphone, imagine not being able to escape them.

Or, for a positive spin, would you want a service provider with better results?

Bad data “in your face” may be the selling point we need.

Spreadsheet is Still the King of all Business Intelligence Tools

Thursday, April 11th, 2013

Spreadsheet is Still the King of all Business Intelligence Tools by Jim King.

From the post:

The technology consulting firm Gartner Group Inc. once precisely predicated that BI would be the hottest technology in 2012. The year of 2012 witnesses the sharp and substantial increase of BI. Unexpectedly, spreadsheet turns up to be the one developed and welcomed most, instead of the SAP BusinessObjects, IBM Cognos, QlikTech Qlikview, MicroStrateg, or TIBCO Spotfire. In facts, no matter it is in the aspect of total sales, customer base, or the increment, the spreadsheet is straight the top one.

Why the spreadsheet is still ruling the BI world?

See Jim’s post for the details but the bottom line was:

It is the low technical requirement, intuitive and flexible calculation capability, and business-expert-oriented easy solution to the 80% BI problems that makes the spreadsheet still rule the BI world.

Question:

How do you translate:

  • low technical requirement
  • intuitive and flexible calculation capacity (or its semantic equivalent)
  • business-expert-oriented solution to the 80% of BI problems

into a topic map application?

Selling Topic Maps: One Feature At A Time?

Thursday, April 11th, 2013

Dylan Jones writes in Data Quality: One Habit at a Time:

I started learning about data quality management back in 1992. Back then there were no conferences, limited publications and if you received an email via the internet the excitement lasted for hours.

Fast forward to today. We are practically swamped with data quality knowledge outlets. Sites like the Data Roundtable, OCDQ Blog and scores of other data quality bloggers provide practical ideas and techniques on an almost hourly basis.

We never lack for ideas and methods for implementing data quality management, and of course this is hugely beneficial for professionals looking to mature data quality in their organisation.

However, with all this knowledge comes a warning. Data quality management can only succeed when behaviours are changed, but to change a person’s behaviour requires the formation of new habits. This is where many projects will ultimately fail.

Have you ever started the New Year with a promise to change your ways and introduce new habits? Perhaps the guilt of festive excesses drove you to join a gym or undertake some other new health regime. How was that health drive looking in March? How about September?

The problem of habit formation is exacerbated when we attempt to change multiple habits. Perhaps we want to combine a regular running regime with learning new skills. The result is often failure.

Does your topic maps sales pitch require too much change? (I know mine does.)

Or do you focus on the one issue/problem that your client needs solving?

Sure, topic maps enable robust integration of diverse data stores, but if that’s not your client’s issue, why bring it up?

Can we sell more by promising less?

Open Access Theses and Dissertations

Tuesday, April 9th, 2013

Open Access Theses and Dissertations

From the webpage:

OATD aims to be the best possible resource for finding open access graduate theses and dissertations published around the world. Metadata (information about the theses) comes from over 600 colleges, universities, and research institutions. OATD currently indexes over 1.5 million theses and dissertations.

A search for “topic maps” as a phrase turns up thirty-six matches.

Try it for yourself: OATD search: “topic maps”

Yes, the total for RDF is substantially higher.

But I take that as an incentive to do a better job spreading the word about topic maps in academic circles.

Enjoy!

I first saw this at: Theses and Dissertations Available Through New Open Access Tool by Africa S. Hands.

Data-Plundering at Amazon

Saturday, April 6th, 2013

Amazon S3 storage buckets set to ‘public’ are ripe for data-plundering by Ted Samson.

From the post:

Using a combination of relatively low-tech techniques and tools, security researchers have discovered that they can access the contents of one in six Amazon Simple Storage Service (S3) buckets. Those contents range from sales records and personal employee information to source code and unprotected database backups. Much of the data could be used to stage a network attack, to compromise users accounts, or to sell on the black market.

All told, researchers managed to discover and explore nearly 2,000 buckets from which they gathered a list of more than 126 billion files. They reviewed over 40,000 publicly visible files, many of which contained sensitive information, according to Rapid 7 Senior Security Consultant Will Vandevanter.

….

The root of the problem isn’t a security hole in Amazon’s storage cloud, according to Vandevanter. Rather, he credited Amazon S3 account holders who have failed to set their buckets to private — or to put it more bluntly, organizations that have embraced the cloud without fully understanding it. The fact that all S3 buckets have predictable, publically accessible URLs doesn’t help, though.

That was close!

From the headline I thought Chinese government hackers had carelessly left Amazon S3 storage buckets open after downloading. ;-)

If you want an even lower tech technique for hacking into your network, try the following (with permission):

Call users from your internal phone system and say system passwords have been stolen and IT will monitor all logins for 72 hours. To monitor access, IT needs users’ logins and passwords to put tracers on accounts. Could make the difference in next quarter’s earnings being up or being non-existent.

After testing, are you in more danger from your internal staff than external hackers?

As you might suspect, I would be using a topic map to provide security accountability across both IT and users.

With the goal of assisting security risks to become someone else’s security risks.

K-Nearest Neighbors: dangerously simple

Saturday, April 6th, 2013

K-Nearest Neighbors: dangerously simple by Cathy O’Neil.

From the post:

I spend my time at work nowadays thinking about how to start a company in data science. Since there are tons of companies now collecting tons of data, and they don’t know what do to do with it, nor who to ask, part of me wants to design (yet another) dumbed-down “analytics platform” so that business people can import their data onto the platform, and then perform simple algorithms themselves, without even having a data scientist to supervise.

After all, a good data scientist is hard to find. Sometimes you don’t even know if you want to invest in this whole big data thing, you’re not sure the data you’re collecting is all that great or whether the whole thing is just a bunch of hype. It’s tempting to bypass professional data scientists altogether and try to replace them with software.

I’m here to say, it’s not clear that’s possible. Even the simplest algorithm, like k-Nearest Neighbor (k-NN), can be naively misused by someone who doesn’t understand it well. Let me explain.

The devil is all in the detail of what you mean by close. And to make things trickier, as in easier to be deceptively easy, there are default choices you could make (and which you would make) which would probably be totally stupid. Namely, the raw numbers, and Euclidean distance.

Read and think about Cathy’s post.

All those nice, clean, clear number values and a simple math equation, muddied by meaning.

Undocumented meaning.

And undocumented relationships between the variables the number values represent.
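A small Python sketch shows how much rides on that undocumented meaning. It is my own toy example, not code from Cathy's post: the two features (age in years, income in dollars), the data points and the scaling factor are all invented for illustration.

```python
import math


def nearest(query, points, scale=(1.0, 1.0)):
    """Return the name of the point closest to `query`.

    Distance is Euclidean after dividing each feature difference
    by the matching entry in `scale`.
    """
    def dist(p):
        return math.sqrt(
            sum(((a - b) / s) ** 2 for a, b, s in zip(p, query, scale))
        )
    return min(points, key=lambda name: dist(points[name]))


# Hypothetical records: (age in years, annual income in dollars).
points = {"A": (31, 52000), "B": (60, 50500)}
query = (30, 50000)

# Default choice: raw numbers, plain Euclidean distance.
raw_choice = nearest(query, points)

# Same data, but income measured in thousands of dollars.
scaled_choice = nearest(query, points, scale=(1.0, 1000.0))

print("raw numbers:", raw_choice)        # B: income differences swamp age
print("income in thousands:", scaled_choice)  # A: age now counts
```

Nothing about the data changed between the two calls. Only the unit of one variable did, and with it the answer to "who is close?" That choice is exactly the kind of meaning that needs to be written down somewhere.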

You could document your meaning and the relationships between variables and still make dumb decisions.

The hope is you or your successor will use documented meaning and relationships to make better decisions.

For documentation you can:

  • Try to remember the meaning of “close” and the relationships for all uses of K-Nearest Neighbors where you work.
  • Write meaning and relationships down on sticky notes collected in your desk drawer.
  • Write meaning and relationships on paper or in electronic files, the latter somewhere on the server.
  • Document meaning and relationships with a topic map, so you can leverage information already known. Including identifiers for the VP who ordered you to use particular values, for example. (Along with digitally signed copies of the email(s) in question.)

Which one are you using?

PS: This link was forwarded to me by Sam Hunting.

Topic Maps and Bookmarks

Friday, April 5th, 2013

A comment recently suggested web bookmarks as an ideal topic map use case for most users.

There has been work along those lines. I haven’t found/remembered every paper/proposal, so chime in with the ones I miss.

The one that first came to mind was Thomas Passin’s Browser bookmark management with Topic Maps at Extreme Markup in 2003.

Abstract:

Making effective use of large collections of browser bookmarks is difficult. The user faces major challenges in finding specific entries, in finding specific or general kinds of entries, and in finding related references. In addition, the ability to add annotations would be very valuable.

This paper discusses a practical model for a bookmark collection that has been organized into nested folders. It is shown convincingly that the folder structure in no way implies a hierarchical taxonomy, nor does it reflect a faceted classification scheme. The model is presented as a topic map.

A number of simple enhancements to the basic information are described, including a very modest amount of semantic analysis on the bookmark titles. An approach for preserving user-entered annotations across bookmark updates is delineated. Some issues of user interface are discussed. In toto, the model, the computed enrichment, and the user interface work together to provide effective collocation and navigation capabilities.

A bookmark application that embodies this model has been implemented entirely within a standard browser. The topic map engine is written entirely in javascript. The utility of this application, which the author uses daily, is remarkable considering the simplicity of the underlying model. It is planned to give a live demonstration during the presentation.

Then there was Tobias Hofmann and Martin Pradella, BookMap — A Topic Map Based Web Application for Organizing Bookmarks. (TMRA 2007)

Description:

This talk proposes a basic Ontology for use in Topic Maps storing semantic information on bookmark collections. Furthermore, we introduce a data model allowing to implement such a system on a LAMP (Linux, Apache, MySQL, PHP) platform, extended with the Cake-PHP framework. A prototype has been developed as proof of concept, where the use of AJAX and drag and drop capabilities in the browser resulted in a good user experience during a preliminary user evaluation.

and,

Toward a Topic Maps Amanuensis by Jack Park (2007)

Abstract:

The CALO project at SRI International provides unique opportunities to explore the boundaries of knowledge representation and organization in a learning environment. A goal reported here is to develop methods for assistance in the preparation of documents through a topic map framework populated by combinations of machine learning and recorded social gestures. This work in progress continues the evolution of Tagomizer, our social bookmarking application, adding features necessary for annotations of websites beyond simple bookmark-like tagging, including the creation of new subjects in the topic map. We report on the coupling of Tagomizer with a Java wiki engine, and show how this new framework will serve as a platform for CALO’s DocAssist application.

More recently:

ToMaBoM, Topic Map Bookmark Manager – Firefox Extension by Dieter Steiner (last updated 2012-11-05)

Features:

  • Create and Save Weblinks in a Topic Map
  • Organize and Manage Entries
  • Change Topic Map Meta-Model
  • Save copy of Webpages locally and access them from within the extension
  • Import and Export the Topic Map as XML Topic Map

I need to mention Gabriel Hopmans is working on a topic map bookmark app but I don’t have a link to share. Gabriel?

Over the weekend, read up on the older proposals and take a look at ToMaBoM.

What do you like/dislike, would like to see, not just there but in any topic map bookmark app?

PS: I am willing to bet that curated bookmarks, delivered to users (TM based searching), will be more popular than users doing the work themselves.

Building Attribute and Value Crosswalks… [Please Verify]

Friday, April 5th, 2013

Building Attribute and Value Crosswalks Using Esri’s Data Interoperability Extension by Nathan Lebel.

From the post:

The Esri Data Interoperability Extension gives GIS professionals the ability to build complex spatial extraction, transformation, and loading (ETL) tools. Traditionally the crosswalking of feature classes and attributes is done prior to setting up the migration tools and is used only as a guide. The drawback to this method is that it takes a considerable amount of time to build the crosswalks and then to build the ETL tools.

GISI’s article, “Building Attribute and Value Crosswalks in ESRI Data Interoperability Extension the Scalable/Dynamic Way” outlines the use of the SchemaMapper transformer within Data Interoperability Extension which can pull crosswalk information directly from properly formatted tables. For large projects this means you can store crosswalk information in a single repository and point each ETL tool to that repository without needing to manage multiple crosswalk documents. For projects that might change during the lifecycle of the project the use of SchemaMapper means that changes can be made to the repository without requiring any additional changes to the ETL tool. There are three examples used in this article which encompasses a majority of crosswalking tasks; feature class to feature class, attribute to attribute, and attribute value to attribute value crosswalking. All of the examples use CSV files to store the crosswalk information; however the transformer can pull directly from RDBMS tables as well which gives you the ability to build a user interface to create and update crosswalks which is recommended for large scale projects.

The full article can be accessed on GISI’s blog or as a PDF or Ebook in either EPUB or Kindle format.

If you have time, please read the original article. Obtain it from the links listed in the final paragraph.

I need for you to verify my reading of the process described in that article.

As far as I can tell, the author never says “why” or on what basis the various mappings are being made.

I would be hard pressed to duplicate the mapping based on the information given about the original data sources.

Having an opaque mapping can be useful, as the article says, but what if I stumble upon the mapping five years from now? Or two years? Or perhaps even six months from now?

Specifying the “why” of a mapping is something topic maps are uniquely qualified to do.

You can define merging rules that require the basis for mapping to be specified.

If that basis is absent, no merging occurs.
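
A minimal sketch of such a rule, in Python rather than any topic map syntax. The names `Mapping`, `basis` and `merge_if_justified`, along with the sample attribute names, are hypothetical and invented for illustration; nothing here comes from the GISI article or from a topic map standard. The only point is that a crosswalk entry without a recorded basis is refused.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """One crosswalk entry: a source attribute mapped to a target attribute."""
    source: str
    target: str
    basis: Optional[str] = None  # the "why" of the mapping (hypothetical field)


def merge_if_justified(mappings):
    """Return (merged, skipped); entries with no stated basis are never merged."""
    merged = {}
    skipped = []
    for m in mappings:
        if m.basis and m.basis.strip():
            merged[m.source] = (m.target, m.basis)
        else:
            skipped.append(m)  # basis absent, so no merging occurs
    return merged, skipped


# Illustrative entries only; the attribute names are made up.
crosswalk = [
    Mapping("RD_NAME", "StreetName", basis="Both hold the posted street name"),
    Mapping("RD_TYPE", "SurfaceCode"),  # nobody recorded why
]
merged, skipped = merge_if_justified(crosswalk)
```

Anyone who stumbles on `merged` six months later finds the reason stored next to each mapping, and `skipped` lists the mappings that still need one.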

Targeting Developers?

Thursday, April 4th, 2013

Most topic map software, either explicitly or implicitly, is targeted at developers.

I ran across a graphic today that highlights what I consider to be a flaw in that strategy.

The original graphic concerns the number of students enrolled in computer science:

CS enrollment

I first saw that in a tweet by Matt Asay.

I need to practice (read: learn) Gimp skills, so my first attempt to re-purpose the graphic was:

CS student enrollment

But that leaves my main point implied, so after some fiddling, I got:

Marketing image

Even without a marketing degree, I can pick the better marketing target.

What about you?

BTW, the experience with Hadoop supports my side, not the targeting for developers argument.

Yes, a lot of Hadoop tools are difficult to use, if not black arts.

However, Hadoop marketing has more hand waving and arm flapping than you will see among Democrats on entitlement reform and Republicans on tax reform, combined.

The Hadoop ecosystem (which I like a lot by the way) is billed to consumers as curing everything but AIDS and that is just a matter of application.

Consumer demand, from people who aren’t going to run Hadoop clusters, write Pig scripts, etc., is driving developers to build better tools and to learn the harder ones.

Suggestions on how to build consumer oriented marketing of topic maps will be greatly appreciated!

Requirements for an Authoring Tool for Topic Maps

Wednesday, April 3rd, 2013

I appreciated the recent comment that made it clear I was conflating several things under “authoring.”

One of those things was the conceptual design of a topic map, another was the transformation or importing of data into a topic map.

A third one was the authoring of a topic map in the sense of using an editor, much like a writer using a typewriter.

Not to denigrate the other two aspects of authoring but I haven’t thought about them as long as the sense of writing a topic map.

Today I wanted to raise the issue of requirements for an authoring/writing tool for topic maps.

I appreciate the mention of Wandora, which is a very powerful topic map tool.

But Wandora has more features than a beginning topic map author will need.

An author could graduate to Wandora, but it makes a difficult starting place.

Here is my sketch of requirements for a topic map authoring/writing tool:

  • Text entry (Unicode)
  • Prompts/Guides for required/optional properties (subject identifier, subject locator or item identifier)
  • Prompts/Guides for required/optional components (Think roles in an association)
  • Types (nice to have constrained to existing topics)
  • Scope (nice to have constrained to existing topics)
  • Separation of topics, associations, occurrences (TMDM as legend)
  • As little topic map lingo as possible
  • Pre-defined topics
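
To make a few of these requirements concrete, here is a rough Python sketch of the prompting behavior. It assumes a tool that wants at least one of the three identity properties named above and that constrains types and scopes to existing topics. The function name `prompts_for`, the dictionary keys and the prompt wording are all hypothetical.

```python
IDENTITY_FIELDS = ("subject_identifier", "subject_locator", "item_identifier")


def prompts_for(draft, known_topic_ids):
    """Return plain-language prompts for what a draft topic still needs."""
    prompts = []
    # Required property: at least one way to say what the topic is about.
    if not any(draft.get(field) for field in IDENTITY_FIELDS):
        prompts.append("Add a web address or ID that says what this entry is about.")
    # Types and scopes are constrained to topics that already exist.
    for key in ("type", "scope"):
        value = draft.get(key)
        if value is not None and value not in known_topic_ids:
            prompts.append(f"The {key} '{value}' is not an existing entry; pick one or create it first.")
    return prompts


known = {"person", "opera"}  # stands in for a set of pre-defined topics
needs = prompts_for({"type": "composer"}, known)
```

The prompts avoid words like "subject identifier" on purpose, in line with the goal of as little topic map lingo as possible.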

What requirements am I missing for a topic map authoring tool that is more helpful than a text editor but less complicated than TeX?

BTW, as I wrote this, it occurred to me to ask: How did you learn to write HTML?

Outing Censors

Wednesday, April 3rd, 2013

You may already be aware of threats and legal proceedings by Edwin Mellen Press against criticism of itself and its publications.

For one recent update, see: Posts Removed Because We’ve Received Letters From Edwin Mellen Press’ Attorney by Kent Anderson.

For further background, see: When Sellers and Buyers Disagree — Edwin Mellen Press vs. a Critical Librarian by Rick Anderson.

The thought occurs to me that over the years there must be a treasure trove of letters and other communications from Edwin Mellen Press, not to mention litigation files, depositions, etc.

But any story about Edwin Mellen Press will be written with access to only part of that historical information.

What if McMaster University were to publicize the “…demands and considerable pressure from the Edwin Mellen Press….?” And those demands could be mapped to other demands against others?

The demands by Edwin Mellen Press have been made against librarians. The very people who excel at the collection and creation of archives.

Is it time for the library community to pool its knowledge about Edwin Mellen Press?

My time resources are limited but I would be willing to contribute as I am able to such an effort.

You?

Information Management – Gartner 2013 “Predictions”

Wednesday, April 3rd, 2013

I hesitate to call Gartner reports “predictions.”

The public ones I have seen are c-suite summaries of information already known to the semi-informed.

Are Gartner “predictions” about what c-suite types may become informed about in the coming year?

That qualifies for the dictionary sense of “prediction.”

More importantly, what c-suite types may become informed about are clues on how to promote topic maps.

If you don’t have access to the real Gartner reports, Andy Price has summarized information management predictions in: IT trends: Gartner’s 2013 predictions for information management.

The ones primarily relevant to topic maps are:

  • Big data
  • Semantic technologies
  • The logical data warehouse
  • NoSQL DBMSs
  • Information stewardship applications
  • Information valuation/infonomics

One possible way to capitalize on these “predictions” would be to create a word cloud from the articles reporting on these “predictions.”

Every article will use slightly different language and the most popular terms are the ones to use for marketing.

Thinking they will be repeated often enough to resonate with potential customers.

Capturing the business needs answered by those terms would be a separate step.
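
The counting behind a word cloud is simple enough to sketch. The Python below is a rough illustration, not a finished tool: the stopword list is a tiny stand-in, the sample "articles" are placeholder sentences I made up rather than quotes from any report, and counting each term once per article is my assumption about what "most popular" should mean.

```python
import re
from collections import Counter

# A deliberately small stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "for", "is", "are", "on", "with"}


def term_counts(articles):
    """Count how many articles use each term (each term counted once per article)."""
    counts = Counter()
    for text in articles:
        words = set(re.findall(r"[a-z]+", text.lower()))
        counts.update(words - STOPWORDS)
    return counts


# Placeholder text only, standing in for the articles that report the predictions.
articles = [
    "Big data and semantic technologies dominate the list.",
    "Analysts expect big data to reshape information management.",
    "NoSQL and big data are the trends to watch.",
]
counts = term_counts(articles)
top_terms = counts.most_common(5)
```

Terms that show up in every article float to the top of `top_terms`, and those are the candidates for marketing copy.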

Topic Map Tool Chain

Tuesday, April 2nd, 2013

Belaboring the state of topic map tools won’t change this fact: It could use improvement.

Leaving the current state of topic map tools to one side, I have a suggestion about going forward.

What if we conceptualize topic map production as a tool chain?

A chain that can exist as separate components or with combinations of components.

Thinking like *nix tools, each one could be designed to do one task well.

The stages I see:

  1. Authoring
  2. Merging
  3. Conversion
  4. Query
  5. Display

The only odd looking stage is “conversion.”

By that I mean conversion from being held in a topic map data store or format to some other format for integration, query or display.

TaxMap, the oldest topic map on the WWW, is a conversion to HTML for delivery.

Converting a topic map into graph format enables the use of graph display or query mechanisms.

End-to-end solutions are possible but a tool chain perspective enables smaller projects with quicker returns.
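
Here is a toy version of the chain in Python, to show the shape of the idea rather than a real implementation. Every function name and the dictionary layout are assumptions made for illustration, the data is invented, and the merge stage is a naive single pass over shared subject identifiers.

```python
from functools import reduce


def author(_):
    """Stage 1: produce topics (hard-coded toy data here)."""
    return [
        {"id": "t1", "identifiers": {"http://example.org/puccini"}, "names": {"Puccini"}},
        {"id": "t2", "identifiers": {"http://example.org/puccini"}, "names": {"Giacomo Puccini"}},
        {"id": "t3", "identifiers": {"http://example.org/tosca"}, "names": {"Tosca"}},
    ]


def merge(topics):
    """Stage 2: merge topics that share a subject identifier (naive single pass)."""
    merged = []
    for topic in topics:
        for existing in merged:
            if existing["identifiers"] & topic["identifiers"]:
                existing["identifiers"] |= topic["identifiers"]
                existing["names"] |= topic["names"]
                break
        else:
            merged.append({"id": topic["id"],
                           "identifiers": set(topic["identifiers"]),
                           "names": set(topic["names"])})
    return merged


def convert(topics):
    """Stage 3: convert to plain rows for query or display."""
    return [(t["id"], sorted(t["names"])) for t in topics]


def query(rows, term="Puccini"):
    """Stage 4: keep rows with a name containing the term."""
    return [row for row in rows if any(term in name for name in row[1])]


def display(rows):
    """Stage 5: render rows as text lines."""
    return "\n".join(f"{tid}: {', '.join(names)}" for tid, names in rows)


def run_chain(stages, data=None):
    """Feed the output of each stage into the next, like a shell pipeline."""
    return reduce(lambda acc, stage: stage(acc), stages, data)


output = run_chain([author, merge, convert, query, display])
```

Because each stage only depends on the data handed to it, any one of them can be swapped out or run on its own, which is the point of thinking in terms of a tool chain.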

Comments/Suggestions?

On the Eighth Day

Monday, April 1st, 2013

On the eighth day of creation [language and time units are for the convenience of the reader. The celestial court exists outside of their strictures].

I started this post off as an April Fools Day gag but the keyboard ran away from me.

See what you think.

L = Lord

O = Other member(s) of the celestial court

L: The Tower of Babel is another example of bad PR from my own followers.

O: How so? Didn’t you confuse their languages to prevent an assault on Heaven?

L: Look around you. Is it likely I would be fearful of someone piling up bricks to assault Heaven?

O: Well, now that you mention it, no, it doesn’t seem likely. (In an uncertain tone of voice.)

L: Would it help if I explained why humans invented the story of the Tower of Babel?

O: Nodding quickly.

L: Arrogance.

O: Arrogance?

L: Think about it. There are two types of people. One type thinks they know what and how everyone else should be thinking. The other type knows who should be telling others what and how to think.

The Tower of Babel story blames me for the competition to force others to a single way of thinking.

What’s ironic is their arrogance multiplies the number of languages and approaches to languages. Every generation denigrates what went before, for a new bumper crop of shiny “truths.”

Need an example?

Take their “when in the beginning there was FORTRAN….”

Now look at any listing of major programming languages, never mind the smaller ones.

No Tower of Babel story there.

O: What about the Tower of Babel as an explanation for different languages?

L: Glad you asked.

I’ll give you one guess who thinks they are entitled to an explanation for everything.

More listening to others and less whining about not being in charge would be a start towards less confusion of languages.

USPTO – New Big Data App [Value-Add Opportunity]

Monday, April 1st, 2013

U.S. Patent and Trademark Office Launches New Big Data Application on MarkLogic®

From the post:

Real-Time, Granular, Online Access to Complex Manuals Improves Efficiency and Transparency While Reducing Costs

MarkLogic Corporation, the provider of the MarkLogic® Enterprise NoSQL database, today announced that the U.S. Patent and Trademark Office (USPTO) has launched the Reference Document Management Service (RDMS), which uses MarkLogic for real-time searching of detailed, specific, up-to-date content within patent and trademark manuals. RDMS enables real-time search of the Manual of Patent Examining Procedure (MPEP) and the Trademark Manual of Examination Procedures (TMEP). These manuals provide a vital window into the complexities of U.S. patent and trademark laws for inventors, examiners, businesses, and patent and government attorneys.

The thousands of examiners working for USPTO need to be able to quickly locate relevant instructions and procedures to assist in their examinations. The RDMS is enabling faster, easier searches for these internal users.

Having the most current materials online also means that the government can reduce reliance on printed manuals that quickly go out of date. USPTO can also now create and publish revisions to its manuals more quickly, allowing them to be far more responsive to changes in legislation.

Additionally, for the first time ever, the tool has also been made available to the public increasing the MPEP and TMEP accessibility globally, furthering the federal government’s efforts to promote transparency and accountability to U.S. citizens. Patent creators and their trusted advisors can now search and reference the same content as the USPTO examiners, in real time — instead of having to thumb through a printed reference guide.

The date on this report was March 26, 2013.

I don’t know if the USPTO is just playing games but searching their site for “Reference Document Management Service” produces zero “hits.”

Searching for “RDMS” produces four (4) “hits,” none of which were pointers to an interface.

Maybe it was too transparent?

The value-add proposition I was going to suggest was mapping the results of searching into some coherent presentation, like TaxMap.

And/or linking the results of searches into current literature in rapidly developing fields of technology.

Guess both of those opportunities will have to wait for basic searching to be available.

If you have a status update on this announced but missing project please ping me.

Current Topic Map Software?

Monday, April 1st, 2013

A recent comment about topic map tools reads in part:

First, it took me a long time to understand what tools are out there, what their capabilities are, and which ones are still maintained. (As an aside, you would think the topic map community would have a central topic map based repository/wiki to make it easy for new developers to get started. )

A valid criticism.

I could not name off hand all the currently maintained topic map projects.

Can you?

Moreover, shouldn’t there be more topic map tools?

Adoption of computer technologies in the absence of computer-based tools tends to be low.

Yes?

Speaking of Business Cases

Friday, March 29th, 2013

The Telenor post reminded me of my arguments about topic maps saving users time by not (re)searching for information already found.

In Telenor’s case, there was someone, customers in fact, who wanted faster and more accurate information.

Is there a business case for avoiding (re)searching for information already found?

Say where research is being billed to a client by the hour?

The more attorneys, CPAs, paralegals, etc. that find the same information = more billable hours.

Where a topic map = fewer billable hours.

And where billable hours aren’t an issue, what do users do with the time they used to spend on the appearance of working by searching?

I am reminded of a then department manager who described themselves as “…doing market research…” by reading the latest issue of Computer Shopper. Nearly twenty (20) years ago now but even then there were more effective means of such research.

On the other hand, there may be cases where use of topic maps by one side may force others to improve their game.

Intelligence gathering and processing for example.

Topic maps need not disrupt current layers of contracting, feathered nests and revolving doors, to say nothing of the turf guardians.

But topic maps could envelop such systems, in place, to provide access to integrated inter-agency intelligence, long before agreement is reached (if ever) on what intelligence to share.

Let’s do this the hard way [Topic Map Security]

Thursday, March 28th, 2013

Let’s do this the hard way by Edd Dumbill.

Discovery of high profile security vulnerabilities (Rails, MongoDB) caused Edd to pen this suggestion for software security:

But perhaps we are in need of an inversion of philosophy. Where Internet programming is concerned, everyone is quick to quote Postel’s law: “Be conservative in what you do, be liberal in what you accept from others.”

The fact of it is that being liberal in what you accept is really hard. You basically have two options: look carefully for only the information you need, which I think is the spirit of Postel’s law, or implement something powerful that will take care of many use cases. This latter strategy, though seemingly quicker and more future-proof, is what often leads to bugs and security holes, as unintended applications of powerful parsers manifest themselves.

My conclusion is this: use whatever language makes sense, but be systematically paranoid. Be liberal in what you accept, but conservative about what you believe.

Which raises the little noticed question of topic map security.

Suppose you are using the TMDM for a topic map and someone submits the topic map equivalent of “spam”: a topic that has the same subject identifier as some legitimate topic in your map but is an ad to get you into “bikini shape.”

My inbox has seen a rash of those lately. I shudder to think what I would look like in “bikini shape.” It would be good for others, not so much for me. ;-)

Or a topic that has a set of subject identifiers that causes merging between topics that should not be merged. Possibly overloading your system or at the very least, causing a disruption to your users.

There are no standard solutions to topic map security although I suspect some users/vendors have hand crafted their own.
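
As a starting point for discussion, a hand-crafted check might look something like the Python below. This is a sketch under my own assumptions, not part of any topic maps standard: the function `guarded_merge`, the idea of a trusted-source list, and the `max_merge` threshold are all hypothetical.

```python
def guarded_merge(existing, submission, trusted_sources, max_merge=1):
    """Decide whether a submitted topic may merge into the map.

    existing: dict of topic id -> set of subject identifiers
    submission: dict with "source" and "identifiers" keys
    Returns (accepted, reason).
    """
    # Guard 1: be conservative about what you believe.
    if submission["source"] not in trusted_sources:
        return False, "untrusted source"
    # Guard 2: refuse submissions that would collapse several topics into one.
    hits = [tid for tid, ids in existing.items() if ids & submission["identifiers"]]
    if len(hits) > max_merge:
        return False, f"would merge {len(hits)} distinct topics"
    return True, "ok"


topic_map = {
    "t1": {"http://example.org/puccini"},
    "t2": {"http://example.org/tosca"},
}
spam = {"source": "unknown", "identifiers": {"http://example.org/puccini"}}
bridge = {"source": "editor",
          "identifiers": {"http://example.org/puccini", "http://example.org/tosca"}}
honest = {"source": "editor", "identifiers": {"http://example.org/tosca"}}

results = [guarded_merge(topic_map, s, {"editor"}) for s in (spam, bridge, honest)]
```

The first guard covers the “spam” case and the second covers the case of a topic whose subject identifiers would merge topics that should stay apart. Neither addresses a trusted source that goes bad, which is one reason a standard treatment would be welcome.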

To be taken seriously in these security conscious times, I think we need to extend the topic maps standard to provide for topic map security.

Suggestions and proposals welcome!