Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 7, 2011

Dr. Watson?

I got up thinking that there needs to be a project for automated authoring of topic maps, and the name Dr. Watson suddenly occurred to me. After all, Dr. Watson was Sherlock Holmes’ sidekick, so it would not be like saying it could stand on its own. Plus there would be some name recognition and/or confusion with the real, or rather imaginary, Dr. Watson of Sherlock Holmes fame.

And there would be confusion with the Dr. Watson that is the internal debugger for Windows (MS, I never can remember if the ™ goes on Windows or MS. Not that anyone else would want to call themselves MS. 😉 ) Plus the Watson research center at IBM.

Well, I suspect being an automated, probabilistic topic map authoring system will be enough to distinguish it from the foregoing uses.

Any others that I should be aware of?

I say probabilistic because even with the TMDM’s string matching on URIs, it is only probable that two or more topics actually represent the same subject. It is always possible that a URI has been incorrectly used to identify the subject that a topic represents. And in such cases, the error perpetuates itself across a topic map.

So we start off with the realization that even string matching results in a probability of less than 1.0 (where 1.0 is absolute certainty) that two or more topics represent the same subject.
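To be concrete about it, here is a toy sketch of what an openly probabilistic matcher might look like. The record layout and the confidence numbers are my inventions for illustration, nothing more:

```python
# A toy sketch of openly probabilistic matching; the confidence numbers
# are illustrative assumptions, not calibrated probabilities.

def match_confidence(topic_a, topic_b):
    """Estimate the probability that two topics represent the same subject."""
    if set(topic_a["identifiers"]) & set(topic_b["identifiers"]):
        # A shared subject identifier forces a merge under the TMDM;
        # here it is only strong evidence, since a URI can be misapplied.
        return 0.95
    if set(topic_a["names"]) & set(topic_b["names"]):
        return 0.60  # a shared name is weaker evidence
    return 0.05

a = {"identifiers": ["http://example.org/dr-watson"], "names": ["Dr. Watson"]}
b = {"identifiers": ["http://example.org/dr-watson"], "names": ["John Watson"]}
print(match_confidence(a, b))  # 0.95, never 1.0
```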

Since we are already being probabilistic, why not be openly so?

But, before we get into the weeds and details, the project has to have a cool name. (As in not an acronym that is cool and we make up a long name to fit the acronym.)

All those in favor of Dr. Watson, please signify by raising your hands (or the beer you are holding).

More to follow.

December 6, 2011

Common Lisp is the best language to learn programming

Filed under: Authoring Topic Maps,Lisp,Programming,Topic Maps — Patrick Durusau @ 8:06 pm

Common Lisp is the best language to learn programming

From the post:

Now that Conrad Barski’s Land of Lisp (see my review on Slashdot) has come out, I definitely think Common Lisp is the best language for kids (or anyone else) to start learning computer programming.

Not trying to start a language war but am curious about two resources cited in this post:

Common Lisp HyperSpec

and,

Common Lisp the language, 2nd edition

My curiosity?

How would you map these two resources into a single topic map on Lisp?

Is there any third resource, perhaps the “Land of Lisp” that you would like to add?

Any blogs, mailing list posts, etc.?

Would that topic map be any different if you decided to add Scheme or Haskell to your topic map?

If this were a “learning lisp” resource for beginning programmers, how would you limit the amount of information exposed?

November 27, 2011

Tracking Scholars (and their work)

Filed under: Authoring Topic Maps — Patrick Durusau @ 8:54 pm

In An R function to analyze your Google Scholar Citations page I mused:

Scholars are fairly peripatetic these days and so have webpages, projects, courses, not to mention social media postings using various university identities. A topic map would be a nice complement to this function to gather up the “grey” literature that underlies final publications.

Matt O’Donnell follows that post up with a tweet asking what such a map would look like.

An example would help make the point but I did not want to choose one with a known outcome. Since I recently blogged about the Natural Language Processing course being taught by Christopher Manning and Dan Jurafsky, I will use both of them as examples.

From the course description we know:

Dan Jurafsky is Professor of Linguistics and Professor by Courtesy of Computer Science at Stanford University. Dan received his Bachelors degree in Linguistics in 1983 and his Ph.D. in Computer Science in 1992, both from the University of California at Berkeley, and also taught at the University of Colorado, Boulder before joining the Stanford faculty in 2004. He is the recipient of a MacArthur Fellowship and has served on a variety of editorial boards, corporate advisory boards, and program committees. Dan’s research extends broadly throughout natural language processing as well as its application to the behavioral and social sciences.

Jurafsky has at least three (possibly more) email addresses:

  • University of California at Berkeley – ending somewhere in the early 1990’s
  • University of Colorado, Boulder – between early 1990’s and 2004
  • Stanford – starting in 2004

Just following the link in the class blurb we have: jurafsky(at)stanford.edu for his (current) email at Stanford (it may have changed, can’t say based on what we know now) and a URL to use as a subject identifier, http://www.stanford.edu/~jurafsky/.

I should make up some really difficult technique at this point for discovering prior email addresses. 😉 Some of those may be necessary but what follows is a technique that works for most academics.

We know that Jurafsky started at Stanford in 2004 and for purposes of this exercise we will assume his email at Stanford has been stable. So we need email addresses prior to 2004. At least for CS or CS related fields, the first place I would go is The DBLP Computer Science Bibliography. Choosing author search and inputting “jurafsky” I get two “hits.”

  • Dan Jurafsky
  • Daniel Jurafsky

You will note on the right hand side of the listing of articles, on the “Ask Others…” line, there is a text box with the value used by DBLP to conduct the search. For both “Dan Jurafsky” and “Daniel Jurafsky” it is using author:daniel_jurafsky:. That is, it has regularized the name so that when you ask for “Dan Jurafsky,” the search is on the longer form.

Sorry, digression. Anyway, we know we need an address for some time prior to 2004 and, scanning the publications prior to 2004, I saw the following citation:

Daniel Gildea, Daniel Jurafsky: Automatic Labeling of Semantic Roles. Computational Linguistics 28(3): 245-288 (2002)

The source in Computational Linguistics is important because if you follow the Computational Linguistics 28 link, it will take you to a listing of that article in that particular issue of Computational Linguistics.

Oh, the icons:

  • Electronic Edition: link to the electronic version if one exists (may be a pay-per-view site)
  • CiteSeerX: searches the title as a string at CiteSeerX
  • Google scholar: searches the title as a string at Google Scholar
  • pubzone.org: links to the article if it appears in PubZone, a service of ETH Zurich in cooperation with ACM SIGMOD
  • BibTeX: the article’s citation in BibTeX format
  • bibliographical record in XML: the article’s citation in XML

If you choose the first icon, it will take you to a paper by Dan Jurafsky in 2002, where his email address is listed as: jurafsky@colorado.edu. (Computational Linguistics is now open access, all issues, which is why I suggested it first.)

You could also look at Jurafsky’s publication page and find the same paper.

Where there is a listing of publications, try there first, but realize that DBLP is a valuable research tool.

The oldest paper that Jurafsky has listed:

Jurafsky, Daniel, Chuck Wooters, Gary Tajchman, Jonathan Segal, Andreas Stolcke, Eric Fosler, and Nelson Morgan. 1994. Integrating Experimental Models of Syntax, Phonology, and Accent/Dialect in a Speech Recognizer (in AAAI-94 workshop)

Gives us his old Berkeley address: jurafsky@icsi.berkeley.edu.

Updating the information we have for Jurafsky:

  • University of California at Berkeley – jurafsky@icsi.berkeley.edu
  • University of Colorado, Boulder – jurafsky@colorado.edu
  • Stanford – jurafsky(at)stanford.edu

And his current homepage for a subject identifier: http://www.stanford.edu/~jurafsky/.

Or, in CTM notation for a topic map:

http://www.stanford.edu/~jurafsky/ # subject identifier
– “Dan Jurafsky”; # name with default type
email: jurafsky(at)stanford.edu @stanford; # occurrence with scope
email: jurafsky@colorado.edu @colorado; # occurrence with scope
email: jurafsky@icsi.berkeley.edu @icsi.berkeley. # occurrence with scope; note the period ending the topic “block”

I thought about and declined to use the notion of “currentEmail.” Using scopes allows for future changes in emails, while maintaining a sense of when certain email addresses were in use. Search engine policies notwithstanding, the world is not a timeless place.
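To see what those scopes buy us, here is a rough Python rendering of the same topic as plain data. The field names are my invention, not from any particular topic map library:

```python
# A rough sketch of the same topic as plain data; field names invented.
jurafsky = {
    "subject_identifier": "http://www.stanford.edu/~jurafsky/",
    "names": ["Dan Jurafsky"],
    "occurrences": [
        {"type": "email", "value": "jurafsky(at)stanford.edu", "scope": "stanford"},
        {"type": "email", "value": "jurafsky@colorado.edu", "scope": "colorado"},
        {"type": "email", "value": "jurafsky@icsi.berkeley.edu", "scope": "icsi.berkeley"},
    ],
}

# Scope lets us ask period-specific questions instead of privileging a
# single "current" value:
colorado_era = [o["value"] for o in jurafsky["occurrences"]
                if o["scope"] == "colorado"]
print(colorado_era)  # ['jurafsky@colorado.edu']
```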

I have some results from using Prof. Jurafsky’s prior addresses, but I want to polish them up a bit before posting.

(I will get to Christopher in the next part.)

Topic Map Tool Chain

Filed under: Authoring Topic Maps,Topic Maps — Patrick Durusau @ 8:52 pm

I have talked about a lot of software and techniques since starting this blog but I don’t have an easy way to organize them by topic map task. That is, when do you need which tool? And how would you evaluate one tool against another?

The second question, comparing tools, probably isn’t something I will get to in the coming year. Might but don’t get your hopes up. I do think I can start to outline one view of when you need which tool.

To talk about tools for topic maps, I need to have an outline of the process of creating a topic map.

My first cut at that process looks like this:

I already see some places that need repair/expansion so don’t take this as anything but a rough draft.

It can become better but only with your comments.

For example, I like the cloud metaphor, mostly because it is popular and people think they know what it means. 😉 But here it leaves the false impression that “clouds” are the only source of data for a topic map.

What about people and their experiences? Or museums, art, books (those hard rectangular things), sensors, etc. Public vs. private clouds.

Maybe what I should do is keep the cloud, remove data/text, and let the cloud be a hyperlink to another image that has more detail? Something like “universes of knowledge – enter here.” What do you think?

Question: For purposes of just blocking the process, should indexing point to “processing?” I know it can occur later or earlier but just curious how others feel.

The double-ended arrows show that interaction is possible between stages, such as authoring and the topic map instance. The act of authoring a topic map can lead the author down different paths than originally intended. That happens so constantly that I thought it important to capture.

Question: And similarity measures. Where do I put them? Personally I think they fall under mining/analysis because that will be the basis for creation of the topic map but I can see an argument for merging/processing of the topic map also needing such rules in case another topic map ventures within merging distance.

Comments/suggestions?

PS: I would like to keep the diagram fairly uncluttered, even if I have to use the images or arrows to lead to other information or expand in some way. Diagrams that can’t be interpreted in a glance seem to defeat the purpose of having a diagram. (Not claiming that quality for this diagram, which is one of the reasons I am asking for your help.)

November 23, 2011

Crowdsourcing Maps

Filed under: Authoring Topic Maps,Crowd Sourcing,Maps — Patrick Durusau @ 7:35 pm

Crowdsourcing Maps by Mikhil Masli appears in the November 2011 issue of Computer.

Mikhil describes geowikis as having three characteristics that enable crowdsourcing of maps:

  • simple, WYSIWYG editing of geographic features like roads and landmarks
  • versioning that works with a network of tightly coupled objects rather than independent documents, and
  • spatial monitoring tools that make it easier for users to “watch” a geographic area for possibly malicious edits and to interpret map changes visually.

How would those translate into characteristics of topic maps?

  • simple WYSIWYG interface
  • versioning at lowest level
  • subject monitoring tools to enable watching for edits
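As a rough sketch (all names invented) of what the third item, subject monitoring, might amount to:

```python
# A minimal sketch of subject monitoring: watchers register interest in
# a topic, and every edit to that topic notifies them. Names invented.
from collections import defaultdict

watchers = defaultdict(set)   # topic_id -> set of watching users
history = defaultdict(list)   # topic_id -> list of edit records

def watch(user, topic_id):
    watchers[topic_id].add(user)

def notify(user, topic_id, value):
    print(f"{user}: topic {topic_id} changed to {value!r}")

def edit(user, topic_id, new_value):
    history[topic_id].append({"by": user, "value": new_value})   # versioning
    for w in watchers[topic_id] - {user}:                        # monitoring
        notify(w, topic_id, new_value)

watch("patrick", "t42")
edit("anon", "t42", "suspicious new name")
```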

Oh, I forgot, the topic map originator would have to supply the basic content of the map. It is not going to be very interesting to have an empty map for others to fill in.

That is where geographic maps have the advantage: there is already some framework into which any user can add their smaller bit of information.

In creating environments where we want users to add to topic maps, we need to populate those “maps” and make it easy for users to contribute.

For example, a library catalog is already populated with information, and one possible goal (it may or may not be yours) would be to annotate library holdings with anonymous or attributed comments/reviews by library patrons. The binding could be based on the library’s internal identifier, with other subjects (such as roles) being populated transparently to the user.

Could you do that without a topic map? Possibly, depending on your access to the internals of your library catalog software. But could you then also associate all those reviews with a particular author and not a particular book they had written? 😉 Yes, it gets dicey when requirements for information delivery change over time. Topic maps excel at such situations because the subjects you want need only be defined. (Well, there is a bit more to it than that but the margin is too small to write it all down.)
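For the curious, here is a toy sketch of that re-aggregation; the identifiers and records are invented:

```python
# A sketch of the library example: reviews bound to a work through the
# catalog's internal identifier, with the author association letting us
# re-aggregate later. Identifiers and field names are hypothetical.
reviews = [
    {"item_id": "cat:000123", "user": "anon", "text": "Loved it."},
    {"item_id": "cat:000456", "user": "patron7", "text": "Slow start."},
]
works = {
    "cat:000123": {"title": "Book A", "author": "J. Smith"},
    "cat:000456": {"title": "Book B", "author": "J. Smith"},
}

# Requirement change: reviews per author rather than per book.
by_author = {}
for r in reviews:
    author = works[r["item_id"]]["author"]
    by_author.setdefault(author, []).append(r["text"])

print(by_author["J. Smith"])  # both reviews, across books
```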

My point here is that topic maps can be authored and vetted by small groups of experts but that they can also, with some planning, be usefully authored by large groups of individuals. That places a greater burden on the implementer of the authoring interface but experience with that sort of thing appears to be growing.

November 19, 2011

Crowdsourcing Scientific Research: Leveraging the Crowd for Scientific Discovery

Filed under: Authoring Topic Maps,Crowd Sourcing — Patrick Durusau @ 10:25 pm

Crowdsourcing Scientific Research: Leveraging the Crowd for Scientific Discovery by Dave Oleson.

From the post:

Lab scientists spend countless hours manually reviewing and annotating cells. What if we could give these hours back, and replace the tedious parts of science with a hands-off, fast, cheap, and scalable solution?

That’s exactly what we did when we used the crowd to count neurons, an activity that computer vision can’t yet solve. Building on the work we recently did with the Harvard Tuberculosis lab, we were able to take untrained people all over the world (people who might never have learned that DNA Helicase unzips genes), turn them into image analysts with our task design and quality control, and get results comparable to those provided by trained lab workers.

So, do you think authoring your topic map is more difficult than reliable identification of neurons? Really?

Maybe the lesson of crowdsourcing is that we need to be as smart at coming up with new ways to do old tasks as we think we are.

What do you think?

November 13, 2011

Looking for volunteers for collaborative search study

Filed under: Authoring Topic Maps,Collaboration,Searching,Volunteers — Patrick Durusau @ 9:59 pm

Looking for volunteers for collaborative search study

From the post:

We are about to deploy an experimental system for searching through CiteSeer data. The system, Querium, is designed to support collaborative, session-based search. This means that it will keep track of your searches, help you make sense of what you’ve already seen, and help you to collaborate with your colleagues. The short video shown below (recorded on a slightly older version of the system) will give you a hint about what it’s like to use Querium.

You may also want to visit the Session Search page.

Could be your opportunity to help shape the future of searching! Not to mention being a window into potentials for collaborative topic map authoring!

November 7, 2011

When Gamers Innovate

When Gamers Innovate

The problem (partially):

Typically, proteins have only one correct configuration. Trying to virtually simulate all of them to find the right one would require enormous computational resources and time.

On top of that there are factors concerning translational-regulation. As the protein chain is produced in a step-wise fashion on the ribosome, one end of a protein might start folding quicker and dictate how the opposite end should fold. Other factors to consider are chaperones (proteins which guide its misfolded partner into the right shape) and post-translation modifications (bits and pieces removed and/or added to the amino acids), which all make protein prediction even harder. That is why homology modelling or “machine learning” techniques tend to be more accurate. However, they all require similar proteins to be already analysed and cracked in the first place.

The solution:

Rather than locking another group of structural shamans in a basement to perform their biophysical black magic, the “Fold It” team created a game. It uses human brainpower, which is fuelled by high-octane logic and catalysed by giving it a competitive edge. Players challenge their three-dimensional problem-solving skills by trying to: 1) pack the protein 2) hide the hydrophobics and 3) clear the clashes.

Read the post or jump to the Foldit site.

Seems to me there are a lot of subject identity and relationship (association) issues that are a lot less complex than protein folding. Not that topic mappers should shy away from protein folding, but we should be more imaginative about our authoring interfaces. Yes?

November 3, 2011

Introducing DocDiver

Introducing DocDiver by Al Shaw, on the ProPublica Nerd Blog.

From the post:

Today [4 Oct. 2011] we’re launching a new feature that lets readers work alongside ProPublica reporters—and each other—to identify key bits of information in documents, and to share what they’ve found. We call it DocDiver [1].

Here’s how it works:

DocDiver is built on top of DocumentViewer [2] from DocumentCloud [3]. It frames the DocumentViewer embed and adds a new right-hand sidebar with options for readers to browse findings and to add their own. The “overview” tab shows, at a glance, who is talking about this document and “key findings”—ones that our editors find especially illuminating or noteworthy. The “findings” tab shows all reader findings to the right of each page near where readers found interesting bits.

Graham Moore (Networkedplanet) mentioned earlier today that the topic map working group should look for technologies and projects where topic maps can make a real difference for a minimal amount of effort. (I’m paraphrasing so if I got it wrong, blame me, not Graham.)

This looks like a case where an application is very close to having topic map capabilities but not quite. The project already has users and developers, and I suspect it would be interested in anything that would improve its software without starting over. That would be the critical part: to leverage existing software and imbue it with subject identity as we understand the concept, to the benefit of current users of the software.

October 20, 2011

Search Analytics for Your Site

Filed under: Authoring Topic Maps,Search Analytics — Patrick Durusau @ 6:36 pm

Search Analytics for Your Site

From the website:

Any organization that has a searchable web site or intranet is sitting on top of hugely valuable and usually under-exploited data: logs that capture what users are searching for, how often each query was searched, and how many results each query retrieved. Search queries are gold: they are real data that show us exactly what users are searching for in their own words. This book shows you how to use search analytics to carry on a conversation with your customers: listen to and understand their needs, and improve your content, navigation and search performance to meet those needs.

I haven’t read this book so don’t take this post as an endorsement or “buy” recommendation.

While watching the slide deck, it occurred to me that if search analytics could improve your website, why not use search analytics to develop the design and content of a topic map?

The design aspect in the sense that the most prominent, easiest to use/find content is what is popular with users. That could even vary by time of day if you have a topic map that is accessible 24 x 7.

The content aspect in the sense of what is included, what we say about it and perhaps how it is findable is based on search analysis.

If you were developing a topic map about Sarah Palin, perhaps searching for “dude” should return her husband as a topic. I can think of other nicknames but this isn’t a political blog.
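A toy sketch of how search logs might feed such a map; the alias table and the log are invented:

```python
# A sketch of using search logs to drive topic map content: queries are
# checked against an alias table of nicknames observed in the logs.
from collections import Counter

aliases = {
    "dude": "topic:todd-palin",          # nickname seen in search logs
    "the first dude": "topic:todd-palin",
}
query_log = ["sarah palin", "dude", "wasilla", "dude"]

hits = Counter(q for q in query_log if q in aliases)
# The most-searched aliases are the ones most worth adding as variant
# names on the topic itself.
print(hits.most_common())  # [('dude', 2)]
```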

Comments on this book or suggestions of other search analytics resources appreciated.

September 10, 2011

The Language Problem: Jaguars & The Turing Test

Filed under: Ambiguity,Authoring Topic Maps,Indexing,Language — Patrick Durusau @ 6:10 pm

The Language Problem: Jaguars & The Turing Test by Gord Hotchkiss.

The post begins innocently enough:

“I love Jaguars!”

When I ask you to understand that sentence, I’m requiring you to take on a pretty significant undertaking, although you do it hundreds of times each day without really thinking about it.

The problem comes with the ambiguity of words.

If you appreciate discussions of language, meaning and the shortfalls of our computing companions, you will really like this article and the promised following posts.

Not to mention bringing into sharp relief the issues that topic map authors (or indexers) face when trying to specify a subject that will be recognized and used by N unknown users.

I suppose that is really the tricky part, or at least part of it: the communication channel for an index or topic map is only one-way. There is no opportunity for the author to correct a reading/mis-reading. All of that lies with the user/reader alone.

GTD – Global Terrorism Database

Filed under: Authoring Topic Maps,Data,Data Integration,Data Mining,Dataset — Patrick Durusau @ 6:08 pm

GTD – Global Terrorism Database

From the homepage:

The Global Terrorism Database (GTD) is an open-source database including information on terrorist events around the world from 1970 through 2010 (with annual updates planned for the future). Unlike many other event databases, the GTD includes systematic data on domestic as well as international terrorist incidents that have occurred during this time period and now includes more than 98,000 cases.

While chasing down a paper that didn’t make the cut I ran across this data source.

Lacking an agreed upon definition of terrorism (see Chomsky for example), you may or may not find what you consider to be incidents of terrorism in this dataset.

Nevertheless, it is a dataset of events of popular interest and can be used to attract funding for your data integration project using topic maps.

TV Tropes

Filed under: Authoring Topic Maps,Data,Interface Research/Design — Patrick Durusau @ 6:06 pm

TV Tropes

Sam Hunting forwarded this to my attention.

From the homepage:

What is this about? This wiki is a catalog of the tricks of the trade for writing fiction.

Tropes are devices and conventions that a writer can reasonably rely on as being present in the audience members’ minds and expectations. On the whole, tropes are not clichés. The word clichéd means “stereotyped and trite.” In other words, dull and uninteresting. We are not looking for dull and uninteresting entries. We are here to recognize tropes and play with them, not to make fun of them.

The wiki is called “TV Tropes” because TV is where we started. Over the course of a few years, our scope has crept out to include other media. Tropes transcend television. They reflect life. Since a lot of art, especially the popular arts, does its best to reflect life, tropes are likely to show up everywhere.

We are not a stuffy encyclopedic wiki. We’re a buttload more informal. We encourage breezy language and original thought. There Is No Such Thing As Notability, and no citations are needed. If your entry cannot gather any evidence by the Wiki Magic, it will just wither and die. Until then, though, it will be available through the Main Tropes Index.

I rather like the definition of trope as “devices and conventions that a writer can reasonably rely on as present in the audience members’ minds and expectations.” I would guess under some circumstances we could call those “subjects” which we can include in a topic map. And then map the occurrences of those subjects in TV shows, for example.

As the site points out, it is called TV Tropes because it started with TV, but tropes have a much larger range than TV.

Being aware of and able to invoke (favorable) tropes in the minds of your users is one part of selling your topic map solution.

Solr Digest, Spring-Summer 2011, Part 1

Filed under: Authoring Topic Maps,Solr — Patrick Durusau @ 6:04 pm

Solr Digest, Spring-Summer 2011, Part 1

Don’t miss this issue of the Solr Digest.

It covers Solr releases 3.2, 3.3 and the upcoming 3.4 so there is no shortage of material. Part 2 is in the works.

Of particular interest to topic map authors will be the result grouping/field collapsing.

From the Apache wiki:

Field Collapsing and Result Grouping are different ways to think about the same Solr feature.

Field Collapsing collapses a group of results with the same field value down to a single (or fixed number) of entries. For example, most search engines such as Google collapse on site so only one or two entries are shown, along with a link to click to see more results from that site. Field collapsing can also be used to suppress duplicate documents.

Result Grouping groups documents with a common field value into groups, returning the top documents per group, and the top groups based on what documents are in the groups. One example is a search at Best Buy for a common term such as DVD, that shows the top 3 results for each category (“TVs & Video”,”Movies”,”Computers”, etc)

For example, collapsed results could be exported to a representation as a topic and either occurrences or associations of a particular type. Other uses will suggest themselves.
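As a rough sketch of that export, the query below uses Solr’s group=true and group.field parameters; the URL, field name and mapping code are illustrative, not a real exporter:

```python
# A sketch: run a Solr grouped query and recast each group as a topic
# with its member documents as occurrences. The host, core, field and
# document "id" field are assumptions for illustration.
import json
import urllib.request

params = "q=dvd&group=true&group.field=category&wt=json"
url = f"http://localhost:8983/solr/select?{params}"

with urllib.request.urlopen(url) as resp:
    data = json.load(resp)

topics = []
for group in data["grouped"]["category"]["groups"]:
    topics.append({
        "topic": group["groupValue"],   # the shared field value
        "occurrences": [d["id"] for d in group["doclist"]["docs"]],
    })
print(topics)
```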

August 29, 2011

Hyperwords

Filed under: Authoring Topic Maps,Interface Research/Design — Patrick Durusau @ 6:29 pm

Hyperwords

From the website:

Every word becomes a link to the most powerful services on the internet – Google, Wikipedia, translations, conversions and much more.

Hyperwords is available as a plugin for the Firefox, Chrome and Safari web browsers. A beta version is being tested for IE, Office and PDFs.

You can select a single word or a group of words or numbers.

It can also be licensed for use with a website and that enables you to customize the user’s experience.

Very high marks for a user friendly interface. Even casual users know how to select text, although what to do with it next may prove to be a big step. Still, “click on the icon” should be as easy to remember as “Use the Force, Luke!”, at least with enough repetition.

I am curious about the degree of customization that is possible with a licensed copy for a website. Quite obviously I am thinking about using words on a website, or some known set of websites, as keys into a topic map backend.
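A minimal sketch of the idea, with a purely hypothetical lookup endpoint:

```python
# A sketch of the customization idea: selected text on a page becomes a
# key into a topic map lookup service. The endpoint is hypothetical.
import urllib.parse

TOPIC_MAP_ENDPOINT = "https://example.org/tm/lookup"  # hypothetical

def lookup_url(selected_text, site="example.org"):
    q = urllib.parse.urlencode({"subject": selected_text, "site": site})
    return f"{TOPIC_MAP_ENDPOINT}?{q}"

print(lookup_url("Latent Dirichlet Allocation"))
```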

This could prove to be a major step forward for all semantic-based services.

Very much a watch this space service.

August 15, 2011

A Workflow for Digital Research Using Off-the-Shelf Tools

Filed under: Authoring Topic Maps,Digital Research,Research Methods — Patrick Durusau @ 7:30 pm

A Workflow for Digital Research Using Off-the-Shelf Tools by William J. Turkel.

An excellent overview of useful tools for digital research.

One or more of these will be useful in authoring your next topic map.

August 3, 2011

UK Government Paves Way for Data-Mining

Filed under: Authoring Topic Maps,Data Mining,Marketing — Patrick Durusau @ 7:37 pm

UK Government Paves Way for Data-Mining

A blog report on an interesting UK government policy report.

From the post:

The key recommendation is that the Government should press at EU level for the introduction of an exception to current copyright law, allowing “non-consumptive” use of a work (ie a use that doesn’t directly trade on the underlying creative and expressive purpose of the work). In the process of text-mining, copying is only carried out as part of the analysis process – it is a substitute for a human reading the work, and therefore does not compete with the normal exploitation of the work itself – in fact, as the paper says, these processes actually facilitate a work’s exploitation (ie by allowing search, or content recommendation). (emphasis in original)

If you think of topic maps as a value-add on top of information stores, allowing “non-consumptive” access would be a real boon for topic maps.

You could create a topic map into copyrighted material, and the user of your topic map could access that material only if, say, they were a subscriber to that content.
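A toy sketch of that gating, with invented names throughout:

```python
# A sketch of the subscriber idea: the topic map stores pointers into
# copyrighted material, and resolving a pointer is gated by the user's
# subscriptions. All names and records are hypothetical.
occurrence = {"topic": "topic:merger-clause",
              "source": "publisherX", "locator": "doc 17, p. 4"}
subscriptions = {"alice": {"publisherX"}, "bob": set()}

def resolve(user, occ):
    if occ["source"] in subscriptions.get(user, set()):
        return occ["locator"]          # subscriber sees the material
    return f"[{occ['source']}: subscription required]"

print(resolve("alice", occurrence))
print(resolve("bob", occurrence))
```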

As Steve Newcomb has argued on many occasions, topic maps can become economic artifacts in their own right.

July 26, 2011

A machine learning toolbox for musician-computer interaction

Filed under: Authoring Semantics,Authoring Topic Maps,Machine Learning — Patrick Durusau @ 6:27 pm

A machine learning toolbox for musician computer interaction

Abstract:

This paper presents the SARC EyesWeb Catalog, (SEC), a machine learning toolbox that has been specifically developed for musician-computer interaction. The SEC features a large number of machine learning algorithms that can be used in real-time to recognise static postures, perform regression and classify multivariate temporal gestures. The algorithms within the toolbox have been designed to work with any N-dimensional signal and can be quickly trained with a small number of training examples. We also provide the motivation for the algorithms used for the recognition of musical gestures to achieve a low intra-personal generalisation error, as opposed to the inter-personal generalisation error that is more common in other areas of human-computer interaction.

Recorded at: 11th International Conference on New Interfaces for Musical Expression. 30 May – 1 June 2011, Oslo, Norway. nime2011.org

The paper: A machine learning toolbox for musician computer interaction

The software: SARC EyesWeb Catalog [SEC]

Although written in the context of musician-computer interaction, the techniques described here could just as easily be applied to exploration or authoring of a topic map. Or for that matter exploring a data stream that is being presented to a user.

Imagine that one hand gives “focus” to some particular piece of data and the other hand “overlays” a query onto that data that then displays a portion of a topic map with that data as the organizing subject. Based on that result the data can be simply dumped back into the data stream or “saved” for further review and analysis.

July 10, 2011

How To Create a Hello World Page with structr
(Hello World for topic maps?)

Filed under: Authoring Topic Maps,Neo4j,structr — Patrick Durusau @ 3:40 pm

How To Create a Hello World Page with structr

Guide to creating a “Hello World” page with structr, which is a Neo4j-based CMS.

While looking at the guide, it occurred to me that most users are only going to create pages. I can’t imagine most sysadmins giving users the ability to create domains, sites or even templates. Users are going to author pages. And when their applications open, they are going to be inside a domain/site with only certain templates they can use, as part of a workflow with others.

Shouldn’t that be the same case for topic maps? That is, the average user does not author a topic map, does not author default subjects or identifiers, and probably doesn’t author identifiers at all. And for that matter, whatever they do author is part of a workflow with others. Yes?

Has the problem, at least in part, been that topic map explanations explain too much? What if, for some specific domain, we just said what to do and it simply worked?

Take www.worldcat.org for example. A topic map authoring interface to that resource should allow users to select one or more entries, returned from a query, which all share the same ISBN, as being the same item.

For example, search for “Real World Haskell.” Six items are returned, with the first two obviously being the same title. The first entry has the following ISBN entry: 9780596514983 0596514980, the second entry has: 0596514980 : 9780596514983. That’s right: add a colon separator and reverse the order of the ISBN numbers. Rather than two entries, a topic map should allow me to mark this as one entry and to process the underlying data to present it as such, including all the libraries with holdings.

That should not require any more effort on my part than choosing these entries as identical items. Ideally that choice on my part should accrue to the benefit of any subsequent users searching for the same entry.

The third and fourth items are the same text but in Japanese. My personal modeling choice would be to merge them and the sixth item (the Safari edition) as language and media variants respectively. Might need language/media variant choices.

No thorny theoretical issues, immediate benefit to current and subsequent users. Is that a way forward?
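A rough sketch of the comparison step (real WorldCat records would need more care):

```python
# A sketch of the ISBN comparison: strip punctuation, split, and compare
# as sets, so "9780596514983 0596514980" and "0596514980 : 9780596514983"
# come out identical.
def isbn_key(field):
    return frozenset(field.replace(":", " ").split())

entry1 = "9780596514983 0596514980"
entry2 = "0596514980 : 9780596514983"
print(isbn_key(entry1) == isbn_key(entry2))  # True -> candidate for one entry
```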


Deduplication of the WorldCat files by automated means is certainly possible and a value-add. But given the number of users who already consult WorldCat on a daily basis, why not take advantage of their human expertise?

June 25, 2011

Hammock-driven Development

Filed under: Authoring Topic Maps,Clojure — Patrick Durusau @ 8:51 pm

Hammock-driven Development

From the description:

Rich Hickey’s second, “philosophical” talk at the first Clojure Conj, in Durham, North Carolina on October 23rd, 2010.

The presentation reminded me of this story in Peopleware (p. 67):

In my years at Bell Labs, we worked in two-person offices. They were spacious, quiet, and the phones could be diverted. I shared my office with Wendl Thomis who went on to build a small empire as an electronic toy maker. In those days, he was working on the ESS fault dictionary. The dictionary scheme relied on the notion of n-space proximity, a concept hairy enough to challenge even Wendl’s powers of concentration. One afternoon, I was bent over a program listing while Wendl was staring into space, his feet propped up on the desk. Our boss came in and asked, “Wendl! What are you doing?” Wendl said, “I’m thinking.” And the boss said, “Can’t you do that at home?”

If you liked that story, you will like the presentation.

Everything that is said about software development is directly applicable to authoring topic maps and standards.

June 14, 2011

Seven Things Human Editors Do that Algorithms Don’t (Yet)

Filed under: Authoring Topic Maps,Marketing — Patrick Durusau @ 10:25 am

Seven Things Human Editors Do that Algorithms Don’t (Yet)

The seven things are all familiar:

  • Anticipation
  • Risk-taking
  • The whole picture
  • Pairing
  • Social importance
  • Mind-blowingness
  • Trust

At least for topic map authors.

How are you selling the human authorial input into your topic maps?

This list looks like a good place to start.

May 26, 2011

Google Correlate & Party Games

Filed under: Authoring Topic Maps,Search Engines,Searching — Patrick Durusau @ 3:42 pm

Google Correlate

A new service from Google. From the blog entry:

It all started with the flu. In 2008, we found that the activity of certain search terms are good indicators of actual flu activity. Based on this finding, we launched Google Flu Trends to provide timely estimates of flu activity in 28 countries. Since then, we’ve seen a number of other researchers—including our very own—use search activity data to estimate other real world activities.

However, tools that provide access to search data, such as Google Trends or Google Insights for Search, weren’t designed with this type of research in mind. Those systems allow you to enter a search term and see the trend; but researchers told us they want to enter the trend of some real world activity and see which search terms best match that trend. In other words, they wanted a system that was like Google Trends but in reverse.

This is now possible with Google Correlate, which we’re launching today on Google Labs. Using Correlate, you can upload your own data series and see a list of search terms whose popularity best corresponds with that real world trend. In the example below, we uploaded official flu activity data from the U.S. CDC over the last several years and found that people search for terms like [cold or flu] in a similar pattern to actual flu rates…

One use of Google Correlate would be party games to guess the correlated terms.

I looked at the “rainfall” correlation example.

For “annual rainfall (in) |correlate| disney vacation package,” I would have guessed “prozac” and not “mildew remover.” Shows what I know.

I am sure topic map authors have other uses for these Google tools. What are yours?

May 21, 2011

opencorporates

Filed under: Authoring Topic Maps,Dataset — Patrick Durusau @ 5:14 pm

opencorporates – The Open Corporate Database of the World

An “alpha”-status project that is collecting corporate registration/report information from around the world.

As of 21 May 2011, 12,678,041 companies.

Five US states plus District of Columbia, United Kingdom, Netherlands and a scattering of others.

This is a useful data source, provided the corporations of interest fall in a covered jurisdiction.

The following video illustrates the usefulness of this site:

How to use OpenCorporates to match companies in Google Refine

Certainly looks like a useful tool for populating a topic map to me!

That may be the ultimate value of all the Linked Data efforts: being the step before reconciliation of information into a reliable form for merger with other reconciled information. At some point raw information has to be gathered together, and a rough-cut gathering with Linked Data is as good as any other method.

May 15, 2011

Breaking Bin Laden: A Closer Look

Filed under: Authoring Topic Maps,Graphs — Patrick Durusau @ 5:57 pm

Breaking Bin Laden: A Closer Look

A post from the SocialFlow blog:

Since last Friday, when we first published Breaking Bin Laden: Visualizing the Power of a Single Tweet, our analysis and data visualization of the way news filtered out around the Bin Laden raid via the Twitter, we’ve been overwhelmed by the response. Thousands of Tweets, many in Spanish, French, German and Japanese.

There have been quite a few interesting articles written about our post as well. The Guardian asked important questions about how journalists can respond to the tremendous velocity of the real-time web. Over at Fast Company, Brian Solis used our visualization as a jumping off point for a discussion of who matters in “the information economy.”

There have been plenty of inquiries about the graph itself, so we wanted to provide you with the opportunity to explore it in greater depth. Click on the image below or download it, and zoom in to get a closer look at all of the intersecting forces that propelled a single tweet to its eventual astonishing spread.

Truly unusual work.

Makes me wonder about several things:

  1. What would it take to trace the addition of information to a topic map in a similar way?
  2. What would it look like to add information to the nodes in these graphs using a topic map?
  3. For that matter, what information would you want to add and why?

May 14, 2011

XMLSH

Filed under: Authoring Topic Maps,Data Mining — Patrick Durusau @ 6:25 pm

XMLSH – Command line shell for processing XML.

Another tool for your data mining/manipulation tool-kit!

May 12, 2011

Topic Map Metrics II

Filed under: Authoring Topic Maps,Topic Maps — Patrick Durusau @ 7:57 am

No real insight on how to construct a topic map metric, even by contract, but I do have a couple more examples for a discussion of metrics:

Example 1:

Robert Cerny wants a topic map of concrete things, like members of a sports team.

OK, if I add their wives to the topic map does that make it “more complete?”

What if I add their mistresses (current/former) and illegitimate children?

Does your answer change if the wives don’t (already) know about the mistresses?

That is, how would I set boundaries on the associations that are included in a topic map?

Example 2:

What if I am building a topic map for a journal archive?

There is a traditional index which does author, title indices, maybe even a subject index.

Assume that I convert that into a topic map and that is the “base” for the topic map.

That is, the topic map has to contain every entry that is found in the printed index.

At least now we can measure when the topic map falls short.
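A toy sketch of that measurement, with invented entries:

```python
# A sketch of the "specified base" metric: completeness measured against
# the printed index, not against the open-ended world.
index_entries = {"Smith, J.", "Jones, A.", "semantics", "parsing"}
map_subjects  = {"Smith, J.", "semantics", "parsing"}

missing = index_entries - map_subjects
coverage = len(index_entries & map_subjects) / len(index_entries)
print(f"coverage: {coverage:.0%}, missing: {sorted(missing)}")
# coverage: 75%, missing: ['Jones, A.']
```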

I think the second example is important because without a specified base of information, you are just waving your hands to talk about making a topic map more complete.

Well, maybe not a specified base but you do have to say “what subjects you want to talk about” as Steve Newcomb would say.

Not a lot but perhaps a useful place to start the discussion.

May 3, 2011

The Human – Computer Chasm & Topic Maps

Filed under: Authoring Topic Maps,Subject Identity,Topic Maps — Patrick Durusau @ 1:33 pm

Someone asked the other day why I thought adoption of topic maps hasn’t “set the woods on fire,” as my parents generation would say.

I am in the middle of composing a longer response with suggestions for marketing strategies but I wanted to stop and share something about the human – computer chasm that is relevant to topic maps.

Over the years the topic map community has debated various syntaxes, models, data model, recursive subject representation, query languages and the like. All of which have been useful and sometimes productive debates.

But in those debates, we sometimes (always?) overlooked the human – computer chasm when talking about subject identity.

Take a simple example:

When I see my mother-in-law I don’t think:

  1. http://www.durusau.net/general/Ethel-Holt.html
  2. Wife of Fred Holt
  3. Mother of Carol Holt (my wife)
  4. Mother-in-law of Patrick Durusau
  5. etc….

I know all those things but they aren’t how I recognize Ethel Holt.

I have known Ethel for more than thirty (30) years and have been her primary care-giver for the last decade or so.

To be honest, I don’t know how I recognize Ethel but suspect it is a collage of factors both explicit and implicit.

But topic maps don’t record our recognition of subjects. They record our after-the-fact explanations of how we think we recognized subjects, to be matched with circumstances that would lead to the same explanation.

I think part of the lack of progress with topic maps is that we failed to recognize the gap between how we recognize subjects and what we write down so computers can detect when two statements are about the same subject.

What topic maps are mapping isn’t properties of subjects (although it can be expressed that way) but the reasons given by some person for identifying a subject.

The act of recognition is human, complex and never fully explained.

Detecting subject sameness is mechanical and based on recorded explanations.

That distinction makes it clear the choices of properties, syntax, etc., for subject sameness, are a matter of convenience, nothing more.

The Proverbial Lone Wolf Librarian’s Weblog

Filed under: Authoring Topic Maps,Library,Searching — Patrick Durusau @ 1:08 pm

The Proverbial Lone Wolf Librarian’s Weblog

A blog that will be of interest to librarians and library school students in particular.

It collects presentations that focus on digital issues from a library perspective.

Acquiring the skills long taught to librarians will make you a better topic map author.

April 24, 2011

Write Good Papers

Filed under: Authoring Topic Maps,Marketing — Patrick Durusau @ 5:31 pm

Write Good Papers by Daniel Lemire merits bookmarking and frequent review.

Authoring, whether of a blog post, a formal paper, program documentation, or a topic map, is authoring.

Review of these rules will improve the result of any authoring task.

March 27, 2011

Authoring Topic Maps Interfaces

Filed under: Authoring Topic Maps,Interface Research/Design — Patrick Durusau @ 3:17 pm

In a discussion about authoring interfaces today I had cause to mention the use of styles to enable conversion of documents to SGML/XML.

This was prior to the major word processing formats converting to XML. Yes, there was a dark time with binary formats but I will leave that for another day.

As I recall, the use of styles, if done consistently, was a useful solution for how to reliably convert from binary formats to SGML/XML.

There was only one problem.

It was difficult if not impossible to get users to reliably use styles in their documents.

Which caused all sorts of havoc with the conversion process.

I don’t recall seeing any actual studies on users failing to use styles correctly but it was common knowledge at the time.

Does anyone have pointers to literature on the consistent use of styles by users?

I mention that recollection as a starting point for discussion of different levels of topic map authoring interfaces.

That is, users’ willingness to do something consistently is appallingly low.

So we need to design mechanisms to compensate for their lack of consistency. (to use a nice term for it)

Rather than expecting me to somehow mark that my use of the term “topic,” when followed immediately by “map,” is not a “topic” in the same sense as Latent Dirichlet Allocation (LDA), the interface should make that distinction on its own.

And when I am writing a blog post on Latent Dirichlet Allocation (LDA), the interface should ask, when I use the term “topic” (not followed immediately by “map”), whether I mean “topic” in the sense of 13250-2 or in the sense of LDA. My response is simply yes/no.

It really has to be that simple.
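A toy sketch of that interaction; the senses and the trigger rule are illustrative only:

```python
# A sketch of the yes/no interaction: when an ambiguous term appears, the
# interface proposes a sense and the author only confirms or rejects.
def subject_check(term, next_word, proposed):
    if term == "topic" and next_word == "map":
        return "topic (ISO 13250-2)"        # resolved silently by context
    answer = input(f'You wrote "{term}". Did you mean {proposed}? [y/n] ')
    return proposed if answer.lower().startswith("y") else None

# e.g. in a post on LDA: subject_check("topic", "modeling", "topic (LDA)")
```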

More complex authoring interfaces should be available, but creating systems that operate in the background of our day-to-day activities, silently gathering up topics, associations and occurrences, will go a long way toward solving some of the adoption problems for topic maps.

We have had spell-check for years.

Why not subject-check? (I will have to think about that part. Could be interesting. Images for people/places/things? We would be asking the person most likely to know, the author.)
