Another Word For It

Patrick Durusau on Topic Maps and Semantic Diversity

January 9, 2011

AutoMap – Extracting Topic Maps from Texts?

Filed under: Authoring Topic Maps,Entity Extraction,Networks,Semantics,Software — Patrick Durusau @ 10:59 am

AutoMap: Extract, Analyze and Represent Relational Data from Texts (according to its webpage).

From the webpage:

AutoMap is a text mining tool that enables the extraction of network data from texts. AutoMap can extract content analytic data (words and frequencies), semantic networks, and meta-networks from unstructured texts, and was developed by CASOS at Carnegie Mellon. Pre-processors for handling PDFs and other text formats exist. Post-processors for linking to gazetteers and belief inference also exist. The main functions of AutoMap are to extract, analyze, and compare texts in terms of concepts, themes, sentiment, semantic networks and the meta-networks extracted from the texts. AutoMap exports data in DyNetML and can be used interoperably with *ORA.

AutoMap uses part-of-speech tagging and proximity analysis to do computer-assisted Network Text Analysis (NTA). NTA encodes the links among words in a text and constructs a network of the linked words.

AutoMap subsumes classical Content Analysis by analyzing the existence, frequencies, and covariance of terms and themes.

For a rough cut at a topic map from a text, AutoMap looks like a useful tool.
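
AutoMap itself is a GUI tool, but the core NTA move, linking words that co-occur within a proximity window, is easy to sketch. A minimal Python illustration; the window size and tokenization are my assumptions, not AutoMap’s defaults:

    import re
    from collections import Counter

    def text_network(text, window=5):
        """Link words that co-occur within `window` tokens of each other."""
        tokens = re.findall(r"[a-z']+", text.lower())
        edges = Counter()
        for i, word in enumerate(tokens):
            for other in tokens[i + 1:i + window]:
                if other != word:
                    edges[tuple(sorted((word, other)))] += 1
        return edges

    sample = ("Topic maps encode links among subjects. "
              "AutoMap encodes links among words in a text.")
    for (a, b), n in text_network(sample).most_common(5):
        print(a, "--", b, n)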

In addition to the software, training material and other information are available.

My primary interest is the application of such a tool to legislative debates, legislation and court decisions.

None of those occur in a vacuum and topic maps could help provide a context for understanding such material.

January 4, 2011

ColorBrewer – A Tool for Color Design in Maps – Post

Filed under: Authoring Topic Maps,Graphics,Mapping,Maps — Patrick Durusau @ 10:23 am

ColorBrewer – A Tool for Color Design in Maps

From Matthew Hurst:

Just found ColorBrewer2 – a tool that helps select color schemes for map based data. The tool allows you to play with different criteria, then proposes a space of possible color combinations. Proactively filtering for color blindness, photocopy friendly and printer friendly is great. Adding projector friendly (no yellow please) would be nice. I’d love to see something like this for time series and other statistical data forms.

Just the thing for planning map-based interfaces for topic maps!
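
To see what the tool automates, here is a toy sketch of filtering palettes by those criteria. The palette names echo ColorBrewer’s, but the flags are invented for the illustration, not ColorBrewer’s actual metadata:

    # A toy palette catalog; the flags are invented, not ColorBrewer's metadata.
    palettes = {
        "YlGnBu-5": {"colorblind_safe": True,  "print_friendly": True,  "photocopy_safe": False},
        "RdYlGn-5": {"colorblind_safe": False, "print_friendly": True,  "photocopy_safe": False},
        "Greys-5":  {"colorblind_safe": True,  "print_friendly": True,  "photocopy_safe": True},
    }

    def usable(criteria):
        """Keep only palettes that pass every requested criterion."""
        return [name for name, flags in palettes.items()
                if all(flags.get(c) for c in criteria)]

    print(usable(["colorblind_safe"]))                    # ['YlGnBu-5', 'Greys-5']
    print(usable(["colorblind_safe", "photocopy_safe"]))  # ['Greys-5']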

December 23, 2010

ScraperWiki

Filed under: Authoring Topic Maps,Data Mining,Data Source — Patrick Durusau @ 1:47 pm

ScraperWiki

The website describes traditional screen scraping and then says:

ScraperWiki is an online tool to make that process simpler and more collaborative. Anyone can write a screen scraper using the online editor, and the code and data are shared with the world. Because it’s a wiki, other programmers can contribute to and improve the code. And, if you’re not a programmer yourself, you can request a scraper or ask the ScraperWiki team to write one for you.

Interesting way to promote the transition to accessible and structured data.

One step closer to data that can be incorporated into, or viewed through, a topic map!
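
A scraper of the kind ScraperWiki hosts is only a few lines. A minimal sketch using the Python standard library; the URL and table structure are placeholders:

    from html.parser import HTMLParser
    from urllib.request import urlopen

    class CellCollector(HTMLParser):
        """Collect the text of every <td> cell on a page."""
        def __init__(self):
            super().__init__()
            self.in_cell, self.cells = False, []
        def handle_starttag(self, tag, attrs):
            if tag == "td":
                self.in_cell = True
        def handle_endtag(self, tag):
            if tag == "td":
                self.in_cell = False
        def handle_data(self, data):
            if self.in_cell and data.strip():
                self.cells.append(data.strip())

    collector = CellCollector()
    collector.feed(urlopen("https://example.com/some-table.html").read().decode("utf-8"))
    print(collector.cells)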

December 14, 2010

NKE: Navigational Knowledge Engineering

Filed under: Authoring Topic Maps,Ontology,Subject Identity,Topic Maps — Patrick Durusau @ 5:36 pm

NKE: Navigational Knowledge Engineering

From the website:

Although structured data is becoming widely available, no other methodology – to the best of our knowledge – is currently able to scale up and provide light-weight knowledge engineering for a massive user base. Using NKE, data providers can publish flat data on the Web without extensively engineering structure upfront, but rather observe how structure is created on the fly by interested users, who navigate the knowledge base and at the same time also benefit from using it. The vision of NKE is to produce ontologies as a result of users navigating through a system. This way, NKE reduces the costs for creating expressive knowledge by disguising it as navigation. (emphasis in original)

This methodology may or may not succeed but it demonstrates a great deal of imagination.

Now imagine a similar concept but built around subject identity.

Where known ambiguities offer a user a choice of subjects to identify.

Or where there are different ways to identify a subject. The harder case.
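
A minimal sketch of that idea: present the candidate subjects for an ambiguous name, record the user’s pick, and let the tallies become structure. The names and candidates are invented:

    from collections import Counter

    # Known ambiguities: one name, several candidate subjects (invented data).
    candidates = {"Paris": ["Paris, France", "Paris, Texas", "Paris (mythology)"]}
    votes = Counter()

    def choose(name, pick):
        """Record which subject a navigating user meant by `name`."""
        assert pick in candidates[name]
        votes[(name, pick)] += 1

    choose("Paris", "Paris, France")
    choose("Paris", "Paris, France")
    choose("Paris", "Paris, Texas")

    # Structure created on the fly: a default reading, learned from navigation.
    print(votes.most_common(1))  # [(('Paris', 'Paris, France'), 2)]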

Questions:

  1. Read the paper/run the demo. Comments, suggestions? (3-5 pages, no citations)
  2. How would you adapt this approach to the identification of subjects? (3-5 pages, no citations)
  3. What data set would you suggest for a test case using the technique you describe in #2? Why is that data set a good test? (3-5 pages, pointers to the data set)

December 10, 2010

Decoding Searcher Intent: Is “MS” Microsoft Or Multiple Sclerosis? – Post

Filed under: Authoring Topic Maps,Interface Research/Design,Search Engines,Searching — Patrick Durusau @ 7:35 pm

Decoding Searcher Intent: Is “MS” Microsoft Or Multiple Sclerosis? is a great post from searchengineland.com.

Although focused on user behavior, as a guide to optimizing content for search engines, the same analysis is relevant for construction of topic maps.

A topic map for software help files is very unlikely to treat “MS” as anything other than Microsoft.

Even if those files might contain a reference to Multiple Sclerosis, written as “MS.”

Why?

Because every topic map will concentrate its identification of subjects and relationships between subjects where there is the greatest return on investment.

Just as we have documentation rot now, there will be topic map rot as some subjects near the boundary of what is being maintained.

And some subjects won’t be identified or maintained at all.

Perhaps another class of digital have-nots.
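
In topic map terms, that ROI decision can surface as scope: the same name resolves differently, or not at all, depending on the map’s domain. A toy sketch, with invented scopes and subjects:

    # One name, different subjects, depending on the scope of the map (toy data).
    name_index = {
        ("MS", "software-help"): "Microsoft",
        ("MS", "medical"):       "Multiple Sclerosis",
    }

    def resolve(name, scope, default=None):
        return name_index.get((name, scope), default)

    print(resolve("MS", "software-help"))  # Microsoft
    print(resolve("MS", "medical"))        # Multiple Sclerosis
    print(resolve("MS", "geography"))      # None, outside what the map maintains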

Questions:

  1. Read the post and prepare a one page summary of its main points.
  2. What other log analysis would you use in designing a topic map? (3-5 pages, citations)
  3. Should a majority of user behavior/expectations drive topic map design? (3-5 pages, no citations)

Semantically Equivalent Facets

Filed under: Authoring Topic Maps,Facets,Topic Map Software,Topic Map Systems,Topic Maps — Patrick Durusau @ 3:32 pm

I failed to mention semantically equivalent facets in either Identifying Subjects With Facets or Facets and “Undoable” Merges.

Sorry! I assumed it was too obvious to mention.

That is, if you are using facet-based navigation with a topic map, it will return/navigate the facet you ask for, and also return/navigate any semantically equivalent facet.

One of the advantages of using a topic map to underlie a facet system is that users get the benefit of something familiar, a set of facet axes they recognize, while at the same time getting the benefit of navigating semantically equivalent facets without knowing about it.

I suppose I should say that declared semantically equivalent facets are included in navigation.

Declared semantic equivalence doesn’t just happen, nor is it free.

Keeping that in mind will help you ask questions when sales or project proposals gloss over the hard questions: What return will you derive from an investment in semantic technologies? And when?
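
A sketch of the mechanics, assuming the equivalences have been declared as simple value sets (the facet and values are invented):

    # Declared equivalences between facet values; declared, not discovered.
    equivalent = {"sex": {"F": {"female"}, "female": {"F"}}}

    people = [
        {"name": "Ada",   "sex": "F"},
        {"name": "Grace", "sex": "female"},
        {"name": "Alan",  "sex": "M"},
    ]

    def select(facet, value):
        """Return items matching `value` or any declared equivalent value."""
        wanted = {value} | equivalent.get(facet, {}).get(value, set())
        return [p for p in people if p.get(facet) in wanted]

    print([p["name"] for p in select("sex", "F")])  # ['Ada', 'Grace']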

Facets and “Undoable” Merges

After writing Identifying Subjects with Facets, I started thinking about the merge of the subjects matching a set of facets, so the user could observe all the associations in which the members of that subject participated.

If merger is a matter of presentation to the user, then the user should be able to remove one of the members that makes up a subject from the merge, which results in the removal of the associations in which that member of the subject participated.

No more or less difficult than the inclusion/exclusion based on the facets, except this time it involves removal on the basis of roles in associations. That is, playing a role, being a role, etc., are treated as facets of a subject.

Well, except that an individual member of a collective subject is being manipulated.

This capability would enable a user to manipulate what members of a subject are represented in a merge. Not to mention being able to unravel a merge one member of a subject at a time.

An effective visual representation of such a capability could be quite stunning.
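
A sketch of a merge that stays undoable because it is only a presentation over its members; removing a member removes its associations from the view without touching the underlying data (records invented):

    # Each member of the collective subject carries its own associations (toy data).
    members = {
        "rec-1": [("parent-of", "child-A")],
        "rec-2": [("parent-of", "child-A"), ("contact", "555-0100")],
        "rec-3": [("contact", "555-0199")],
    }

    class MergeView:
        """A merge as presentation: members can be removed without loss."""
        def __init__(self, ids):
            self.ids = set(ids)
        def associations(self):
            return sorted({a for i in self.ids for a in members[i]})
        def remove(self, member_id):
            self.ids.discard(member_id)  # unravel the merge one member at a time

    view = MergeView(members)
    view.remove("rec-3")
    print(view.associations())  # [('contact', '555-0100'), ('parent-of', 'child-A')]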

Identifying Subjects With Facets

If facets are aspects of subjects, then for every group of facets, I am identifying the subject that has those facets.

If I have the facets, height, weight, sex, age, street address, city, state, country, email address, then at the outset, my subject is the subject that has all those characteristics, with whatever value.

We could call that subject: people.

Not the way I usually think about it but follow the thought out a bit further.

For each facet where I specify a value, the subject identified by the resulting value set is both different from the starting subject and, more importantly, has a smaller set of members in the data set.

Members that make up the collective that is the subject we have identified.

Assume we have narrowed the set of people down to a group subject that has ten members.

Then, we select merge from our application and it merges these ten members.

Sounds damned odd, to merge what we know are different subjects?

What if by merging those different members we can now find these different individuals have a parent association with the same children?

Or have a contact relationship with a phone number associated with an individual or group of interest?
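
A sketch of that narrowing-then-merging move on invented records: each facet value shrinks the collective subject, and “merging” the survivors exposes what they share:

    people = [
        {"id": 1, "city": "Oslo",   "age": 40, "children": {"Kim"}},
        {"id": 2, "city": "Oslo",   "age": 40, "children": {"Kim"}},
        {"id": 3, "city": "Bergen", "age": 35, "children": {"Lee"}},
    ]

    def narrow(records, **facets):
        """Each facet value shrinks the collective subject's membership."""
        return [r for r in records if all(r.get(k) == v for k, v in facets.items())]

    group = narrow(people, city="Oslo", age=40)

    # "Merging" the group: intersect their associations and look for overlap.
    shared_children = set.intersection(*(r["children"] for r in group))
    print([r["id"] for r in group], shared_children)  # [1, 2] {'Kim'}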

Robust topic map applications will offer users the ability to navigate and explore subject identities.

Subject identities that may not always be the ones you expect.

We don’t live in a canned world. Does your semantic software?

December 8, 2010

Aspects of Topic Maps

Writing about Bobo: Fast Faceted Search With Lucene made me start thinking about the various aspects of topic maps.

Authoring of topic maps was never discussed in the original HyTime-based topic map standard, and despite several normative syntaxes, even now it is mostly either you have a topic map or you don’t. Depending upon your legend.

Which is helpful given the unlimited semantics that can be addressed with topic maps but looks awfully hand-wavy to, ahem, outsiders.

Subject Identity, or should I say: when two subject representatives are deemed for some purpose to represent the same subject. (That’s clearer. ;-)) This lies at the heart of topic maps and the rest of the paradigm supports or is a consequence of this principle.

There is no one way to identify any subject and users should be free to use the identification that suits them best. Where subjects include the data structures that we build for users. Yes, IT doesn’t get to dictate what subjects can be identified or how. (Probably should have never been the case but that is another issue.)

Merging of subject representatives. Merging is an aspect of recognizing two or more subject representatives represent the same subject. What happens then is implementation, data model and requirement specific.

A user may wish to see separate representatives just prior to merger so merging can be audited or may wish to see only merged representatives for some subset of subjects or may have other requirements.

Interchange of topic maps. Not exclusively the domain of syntaxes/data models but an important purpose for them. It is entirely possible to have topic maps for which no interchange is intended or desirable. Rumor has it there are topic maps at the Y-12 facility at Oak Ridge, for example. Interchange was not their purpose.

Navigation of the topic map. The post that provoked this one is a good example. I don’t need specialized or monolithic software to navigate a topic map. It hampers topic map development to suggest otherwise.

Querying topic maps. Topic maps have been slow to develop a query language and that effort has recently re-started. Graph query languages, that are already fairly mature, may be sufficient for querying topic maps.

Given the diversity of subject identity semantics, I don’t foresee a one size fits all topic maps query language.

Interfaces for topic maps. However one resolves/implements other aspects of topic maps, due regard has to be paid to the issue of interfaces. Efforts thus far range from web portals to “look, it’s a topic map!” type interfaces.

In the defense of current efforts, human-computer interfaces are poorly understood. Not surprising since the human-codex interface isn’t completely understood and we have been working at that one considerably longer.

Questions:

  1. What other aspects of topic maps would you list?
  2. Would you sub-divide any of these aspects? If so, how?
  3. What suggestions do you have for one or more of these aspects?

Bayesian Model Selection and Statistical Modeling – Review

Filed under: Authoring Topic Maps,Bayesian Models,Software — Patrick Durusau @ 9:47 am

Bayesian Model Selection and Statistical Modeling by Tomohiro Ando, reviewed by Christian P. Robert.

If you are planning on using Bayesian models in your topic maps activities, read this review first.

You will thank the reviewer later.

Webinar: Revolution R is 100% R and More
9 AM Pacific 8 December 2010 (today)

Filed under: Authoring Topic Maps,R,Software — Patrick Durusau @ 7:59 am

Webinar: Revolution R is 100% R and More

Apologies for the short notice but this webinar may be of interest to those using R to mine data sets as part of topic map construction.

It was in my morning sweep of resources and was just posted yesterday.

I have a scheduling conflict but the webinar is said to be available for asynchronous viewing.

December 6, 2010

KissKissBan

KissKissBan: A Competitive Human Computation Game for Image Annotation

Authors: Chien-Ju Ho, Tao-Hsuan Chang, Jong-Chuan Lee, Jane Yung-jen Hsu, Kuan-Ta Chen

Keywords: Amazon Mechanical Turk, ESP Game, Games With A Purpose, Human Computation, Image Annotation

Abstract:

In this paper, we propose a competitive human computation game, KissKissBan (KKB), for image annotation. KKB is different from other human computation games since it integrates both collaborative and competitive elements in the game design. In a KKB game, one player, the blocker, competes with the other two collaborative players, the couples; while the couples try to find consensual descriptions about an image, the blocker’s mission is to prevent the couples from reaching consensus. Because of its design, KKB possesses two nice properties over the traditional human computation game. First, since the blocker is encouraged to stop the couples from reaching consensual descriptions, he will try to detect and prevent coalition between the couples; therefore, these efforts naturally form a player-level cheating-proof mechanism. Second, to evade the restrictions set by the blocker, the couples would endeavor to bring up a more diverse set of image annotations. Experiments hosted on Amazon Mechanical Turk and a gameplay survey involving 17 participants have shown that KKB is a fun and efficient game for collecting diverse image annotations.

This article makes me wonder: could “games” be used for the construction of topic maps?

I don’t know of any theoretical reason why topic map construction has to resemble a visit to the dentist’s office. 😉

Or for that matter, why does a user need to know they are authoring/using a topic map at all?
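
The core mechanic fits in a few lines. A toy round; the scoring rule is my simplification, not the paper’s exact design:

    def score_round(labels_a, labels_b, banned):
        """Couples score on matching labels that evade the blocker's banned set."""
        return (set(labels_a) & set(labels_b)) - set(banned)

    banned   = {"dog", "puppy"}            # the blocker's guesses
    player_a = ["dog", "leash", "park"]    # one half of the couple
    player_b = ["puppy", "park", "grass"]  # the other half

    print(score_round(player_a, player_b, banned))  # {'park'}, a more diverse label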

Questions:

  1. What other game or game-like scenarios do you think lend themselves to the creation of online content? (3-5 pages, citations)
  2. What type of information do you think users could usefully contribute to a topic map (whether known to be a topic map or not)? (3-5 pages, no citations)
  3. Sketch out a proposal for an online game that adds information, focusing on incentives and the information contributed. (3-5 pages, no citations)

December 5, 2010

d.note: revising user interfaces through change tracking, annotations, and alternatives

Filed under: Authoring Topic Maps,Interface Research/Design — Patrick Durusau @ 8:22 am

d.note: revising user interfaces through change tracking, annotations, and alternatives

Authors: Björn Hartmann, Sean Follmer, Antonio Ricciardi, Timothy Cardenas, Scott R. Klemmer

Abstract:

Interaction designers typically revise user interface prototypes by adding unstructured notes to storyboards and screen printouts. How might computational tools increase the efficacy of UI revision? This paper introduces d.note, a revision tool for user interfaces expressed as control flow diagrams. d.note introduces a command set for modifying and annotating both appearance and behavior of user interfaces; it also defines execution semantics so proposed changes can be tested immediately. The paper reports two studies that compare production and interpretation of revisions in d.note to freeform sketching on static images (the status quo). The revision production study showed that testing of ideas during the revision process led to more concrete revisions, but that the tool also affected the type and number of suggested changes. The revision interpretation study showed that d.note revisions required fewer clarifications, and that additional techniques for expressing revision intent could be beneficial. (There is a movie that accompanies this article as well.)

Designing/revising user interfaces is obviously relevant to the general task of creating topic maps software.

Questions:

  1. Pick a current topic map authoring tool and evaluate its user interface. (3-5 pages, no citations)
  2. Create a form for authoring topic map material in a particular domain.
  3. What are the strong/weak points of your proposal in #2? (3-5 pages, no citations)

December 3, 2010

Dynamic Indexes?

I was writing the post about the New York Times graphics presentation when it occurred to me how close we are to dynamic indexes.

After all, gaming consoles are export restricted.

What we now consider to be “runs,” static indexes and the like are computational artifacts.

They follow how we created indexes when they were done by hand.

What happens when the properties of what is being indexed, its identifications and merging rules can change on the fly and re-present itself to the user for further manipulation?
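
A sketch of the core move: make the identification/merging rule a parameter of the index and rebuild on the fly when it changes (data and key functions invented):

    from collections import defaultdict

    docs = ["Topic Maps", "topic maps", "TOPIC MAPS", "RDF"]

    def build_index(items, key):
        """Re-index on the fly under whatever identification rule `key` encodes."""
        index = defaultdict(list)
        for item in items:
            index[key(item)].append(item)
        return dict(index)

    print(len(build_index(docs, key=str)))        # 4 distinct entries
    print(len(build_index(docs, key=str.lower)))  # 2, same data, new merging rule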

I don’t think the fundamental issues of index construction get any easier with dynamic indexes but how we answer them will determine how quickly we can make effective use of such indexes.

Whether crossing the line first to dynamic indexes will be a competitive advantage, only time will tell.

I would like for some VC to be interested in finding out.

Caveat to VCs. If someone pitches this as making indexes more quickly, that isn’t the point. “Quick” and “dynamic” aren’t the same thing. Related but different. Keep both hands on your wallet.

Detecting “Duplicates” (same subject?)

Filed under: Authoring Topic Maps,Duplicates,String Matching,Subject Identity — Patrick Durusau @ 4:43 pm

A couple of interesting posts from the LingPipe blog:

Processing Tweets with LingPipe #1: Search and CSV Data Structures

Processing Tweets with LingPipe #2: Finding Duplicates with Hashing and Normalization

The second one on duplicates being the one that caught my eye.

After all, what are merging conditions in the TMDM other than the detection of duplicates?

Of course, I am interested in TMDM merging but also in the detection of fuzzy subject identity.

Whether that is then represented by an IRI or kept as a native merging condition is an implementation-type issue.
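
The LingPipe recipe, reduced to a sketch: normalize, hash, and treat a hash collision as an exact duplicate, with a token-overlap score for the fuzzier “same subject?” cases. The threshold is a guess, not a recommendation:

    import hashlib
    import re

    def normalize(text):
        return " ".join(re.findall(r"[a-z0-9]+", text.lower()))

    def fingerprint(text):
        return hashlib.sha1(normalize(text).encode()).hexdigest()

    def jaccard(a, b):
        sa, sb = set(normalize(a).split()), set(normalize(b).split())
        return len(sa & sb) / len(sa | sb)

    t1 = "RT: Cables released!!!"
    t2 = "rt cables released"
    t3 = "Cables released to the press"

    print(fingerprint(t1) == fingerprint(t2))  # True: duplicates after normalization
    print(jaccard(t1, t3) > 0.3)               # True: fuzzy "same subject?" candidate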

This could be very important for some future leak of diplomatic tweets. 😉

NoSQL Data Modeling

Filed under: Authoring Topic Maps,Database,Topic Maps — Patrick Durusau @ 4:06 pm

NoSQL Data Modeling

Alex Popescu emphasizes that data modeling is part and parcel of NoSQL database design.

Data modeling practice has something that topic maps practice does not: a wealth of material on data model patterns.

Rather I should say: subject identification patterns (which subjects to identify) and subject identity patterns (how to identify those subjects).

Both of which, if developed and written out, could help with the topic map authoring process.

S4

S4

From the website:

S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.

Just in case you were wondering if topic maps are limited to being bounded objects composed of syntax. No.
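
In the same spirit, a sketch of identifying subjects over an unbounded stream: process each event as it arrives and never materialize the whole stream. The stream and the subject test are stand-ins:

    import itertools
    from collections import Counter

    def event_stream():
        """Stand-in for an unbounded source; no end in sight, by design."""
        for i in itertools.count():
            yield f"sensor-{i % 3} reading {i}"

    subjects = Counter()
    for event in itertools.islice(event_stream(), 1000):  # tap the stream
        subjects[event.split()[0]] += 1                   # a trivial subject identity test

    print(subjects.most_common(1))  # [('sensor-0', 334)]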

Questions:

  1. Specify three sources of unbounded streams of data. (3 pages, citations)
  2. What subjects would you want to identify and on what basis in any one of them? (3-5 pages, citations)
  3. What other information about those subjects would you want to bind to the information in #2? What subject identity tests are used for those subjects in other sources? (5-10 pages, citations)

December 2, 2010

Building Concept Structures/Concept Trails

Automatically Building Concept Structures and Displaying Concept Trails for the Use in Brainstorming Sessions and Content Management Systems

Authors: Christian Biemann, Karsten Böhm, Gerhard Heyer and Ronny Melz

Abstract:

The automated creation and the visualization of concept structures become more important as the number of relevant information continues to grow dramatically. Especially information and knowledge intensive tasks are relying heavily on accessing the relevant information or knowledge at the right time. Moreover the capturing of relevant facts and good ideas should be focused on as early as possible in the knowledge creation process.

In this paper we introduce a technology to support knowledge structuring processes already at the time of their creation by building up concept structures in real time. Our focus was set on the design of a minimal invasive system, which ideally requires no human interaction and thus gives the maximum freedom to the participants of a knowledge creation or exchange processes. The initial prototype concentrates on the capturing of spoken language to support meetings of human experts, but can be easily adapted for the use in Internet communities that have to rely on knowledge exchange using electronic communication channel.

I don’t share the authors’ confidence that corpus linguistics is going to provide the level of accuracy expected.

But, I find the notion of a dynamic semantic map that grows, changes and evolves during a discussion to be intriguing.
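
The intriguing part, the map growing as the discussion does, fits in a few lines. A sketch where each new utterance strengthens co-occurrence links; the tokenization and weighting are my choices, not the authors’:

    from collections import Counter
    from itertools import combinations

    graph = Counter()  # the evolving concept structure

    def hear(utterance):
        """Fold one utterance into the concept graph as the discussion happens."""
        concepts = sorted(set(utterance.lower().split()))
        for pair in combinations(concepts, 2):
            graph[pair] += 1

    hear("topic maps model subjects")
    hear("subjects deserve multiple identifiers")
    hear("topic maps merge subjects")

    print(graph[("maps", "topic")], graph[("merge", "topic")])  # 2 1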

This article was published in 2006 so I will follow up to see what later results have been reported.

Apache Tika – a content analysis toolkit

Filed under: Authoring Topic Maps,Data Mining,Software — Patrick Durusau @ 7:57 pm

Apache Tika – a content analysis toolkit

From the website:

Apache Tika™ is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Formats include:

  • HyperText Markup Language
  • XML and derived formats
  • Microsoft Office document format
  • OpenDocument Format
  • Portable Document Format
  • Electronic Publication Format
  • Rich Text Format
  • Compression and packaging formats
  • Text formats
  • Audio formats
  • Image formats
  • Video formats
  • Java class files and archives
  • The mbox format

Sounds like we are getting close to pipelines for topic map production.
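
A sketch of the front end of such a pipeline, shelling out to the Tika application jar for extraction. The jar path and inbox directory are placeholders; -t asks Tika for plain text output:

    import subprocess
    from pathlib import Path

    TIKA_JAR = "tika-app.jar"  # placeholder path to the Tika application jar

    def extract_text(document):
        """Stage one of the pipeline: any supported format in, plain text out."""
        return subprocess.run(
            ["java", "-jar", TIKA_JAR, "-t", str(document)],
            capture_output=True, text=True, check=True,
        ).stdout

    for doc in Path("inbox").glob("*"):
        text = extract_text(doc)
        # ...downstream stages: entity extraction, subject identity tests, merging...
        print(doc.name, len(text), "characters")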

Comments?

November 27, 2010

Pattern Recognition

Filed under: Authoring Topic Maps,Pattern Recognition — Patrick Durusau @ 10:00 pm

Pattern Recognition by Robi Polikar.

Survey of pattern recognition.

Any method that augments your “recognition” of subjects in texts relies on some form of “pattern recognition.”

The suggested reading at the end of the article is very helpful.

Questions:

  1. Reports of use of any of the pattern recognition techniques in library research? (2-3 pages, citations)
  2. Pick one of the reported techniques. What type of topic map would it be used with? Why? (3-5 pages, citations)
  3. Demonstrate the use of one of the reported techniques on a data set. (project/class presentation)

November 26, 2010

Mechanical Turk and Jump Starting Topic Maps

Filed under: Authoring Topic Maps,Topic Maps — Patrick Durusau @ 9:47 am

Is anyone using the Mechanical Turk for topic map authoring purposes?

Would require breaking authoring into small tasks and perhaps capturing some information in the background.

Could be a refinement step to follow automatic data extraction or evaluation.

Use the LAMP stack for data collection.

Once an authoring framework was in place, just a question of populating it.

Would appreciate notes from anyone taking this approach to creating topic maps.

*****
Before anyone complains this would not be as precise as the brooding intellect approach to topic map authoring, yes, yes, you are right.

Just as printers rely on Danielle Steel and similar authors for their livelihood, semantic technologies, including topic maps, need to get their intellectual skirts dirty.

November 24, 2010

Text Visualization for Visual Text Analytics

Filed under: Authoring Topic Maps,Text Analytics,Visualization — Patrick Durusau @ 7:32 pm

Text Visualization for Visual Text Analytics

Authors: John Risch, Anne Kao, Stephen R. Poteet and Y. J. Jason Wu

Abstract:

The term visual text analytics describes a class of information analysis techniques and processes that enable knowledge discovery via the use of interactive graphical representations of textual data. These techniques enable discovery and understanding via the recruitment of human visual pattern recognition and spatial reasoning capabilities. Visual text analytics is a subclass of visual data mining / visual analytics, which more generally encompasses analytical techniques that employ visualization of non-physically-based (or “abstract”) data of all types. Text visualization is a key component in visual text analytics. While the term “text visualization” has been used to describe a variety of methods for visualizing both structured and unstructured characteristics of text-based data, it is most closely associated with techniques for depicting the semantic characteristics of the free-text components of documents in large document collections. In contrast with text clustering techniques which serve only to partition text corpora into sets of related items, these so-called semantic mapping methods also typically strive to depict detailed inter- and intra-set similarity structure. Text analytics software typically couples semantic mapping techniques with additional visualization techniques to enable interactive comparison of semantic structure with other characteristics of the information, such as publication date or citation information. In this way, value can be derived from the material in the form of multidimensional relationship patterns existing among the discrete items in the collection. The ultimate goal of these techniques is to enable human understanding and reasoning about the contents of large and complexly related text collections.

Not the latest word in the area but a useful survey of the issues that arise in text visualization.

Text visualization is important for the creation of topic maps as well as the viewing of information discovered by use of a topic map.

Questions:

  1. Update the bibliography of this paper for the techniques discussed.
  2. Are there new text visualization techniques?
  3. How would you use the techniques in this paper or newer ones, for authoring topic maps? (3-5 pages, citations)

November 20, 2010

Associations: The Kind They Pay For

Filed under: Associations,Authoring Topic Maps,Data Mining,Data Structures — Patrick Durusau @ 4:56 pm

Fun at a Department Store: Data Mining Meets Switching Theory

Author(s): Anna Bernasconi, Valentina Ciriani, Fabrizio Luccio, Linda Pagli

Keywords: SOP, Implicants, Data Mining, Frequent Itemsets, Blulife

Abstract:

In this paper we introduce new algebraic forms, SOP+ and DSOP+, to represent functions f: {0,1}^n → ℕ, based on arithmetic sums of products. These expressions are a direct generalization of the classical SOP and DSOP forms.

We propose optimal and heuristic algorithms for minimal SOP+ and DSOP+ synthesis. We then show how the DSOP+ form can be exploited for Data Mining applications. In particular we propose a new compact representation for the database of transactions to be used by the LCM algorithms for mining frequent closed itemsets.

A new technique for extracting associations between items present (or absent) in transactions (sales transactions).

Of interest to people with the funds to pay for data mining and topic maps.

Topic maps are useful to bind the mining of such associations to other information systems, such as supply chains.
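
Stripped to a sketch, the mining target is items that sell together. The paper’s contribution is a compact DSOP+ representation feeding the LCM algorithms, not this naive pair count, but the naive version shows what is being extracted (baskets and threshold invented):

    from collections import Counter
    from itertools import combinations

    transactions = [
        {"shirt", "tie"},
        {"shirt", "tie", "belt"},
        {"shoes", "belt"},
        {"shirt", "tie", "shoes"},
    ]

    def frequent_pairs(baskets, min_support=3):
        """Count item pairs across transactions; keep the frequent ones."""
        counts = Counter()
        for basket in baskets:
            for pair in combinations(sorted(basket), 2):
                counts[pair] += 1
        return {pair: n for pair, n in counts.items() if n >= min_support}

    print(frequent_pairs(transactions))  # {('shirt', 'tie'): 3}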

Questions:

  1. How would you use data mining of transaction associations to guide collection development? (3-5 pages, with citations)
  2. How would you use topic maps with the mining of transaction associations? (3-5 pages, no citations)
  3. How would you bind an absence of data to other information? (3-5 pages, no citations)

Observation: Intelligence agencies recognize the absence of data as an association. Binding that absence to other data is a job for topic maps.

Subjective Logic = Effective Logic?

Capture of Evidence for Summarization: An Application of Enhanced Subjective Logic

Author(s): Sukanya Manna, B. Sumudu U. Mendis, Tom Gedeon

Keywords: subjective logic, opinions, evidence, events, summarization, information extraction

Abstract:

In this paper, we present a method to generate an extractive summary from a single document using subjective logic. The idea behind our approach is to consider words and their co-occurrences between sentences in a document as evidence of their relatedness to the contextual meaning of the document. Our aim is to formulate a measure to find out ‘opinion’ about a proposition (which is a sentence in this case) using subjective logic in a closed environment (as in a document). Stronger opinion about a sentence represents its importance and are hence considered to summarize a document. Summaries generated by our method when evaluated with human generated summaries, show that they are more similar than baseline summaries.

The authors justify their use of “subjective” logic by saying:

… pointed out that a given piece of text is interpreted by different persons in different fashions, especially in how they understand and interpret the context. Thus we see that human understanding and reasoning is subjective in nature unlike propositional logic which deals with either truth or falsity of a statement. So, to deal with this kind of situation we used subjective logic to find out sentences which are significant in the context and can be used to summarize a document.

“Subjective” logic means we are more likely to reach the same result as a person reading the text.

Search results as used and evaluated by people.

That sounds like effective logic to me.
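
For the mechanics: Jøsang’s subjective logic represents an opinion as belief, disbelief and uncertainty summing to one, commonly derived from positive and negative evidence counts. A sketch of that standard mapping; treating sentences as carrying evidence counts is my framing, not the paper’s exact formulation:

    def opinion(r, s, a=0.5, W=2):
        """Map r positive and s negative evidence observations to an opinion.

        b + d + u == 1; W is the non-informative prior weight, a the base rate.
        """
        b = r / (r + s + W)          # belief
        d = s / (r + s + W)          # disbelief
        u = W / (r + s + W)          # uncertainty
        return b, d, u, b + a * u    # last value: expected probability

    # A sentence with strong supporting evidence vs. one with almost none.
    print(opinion(r=8, s=1))  # (0.727..., 0.0909..., 0.1818..., 0.818...)
    print(opinion(r=1, s=1))  # (0.25, 0.25, 0.5, 0.5)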

Questions:

  1. Read Audun Jøsang’s article Artificial Reasoning with Subjective Logic.
  2. Summarize three (3) applications (besides the article above) of “subjective” logic. (3-5 pages, citations)
  3. How do you think “subjective” logic should be modeled in topic maps? (3-5 pages, citations optional)

November 15, 2010

Analysis of Amphibian Biodiversity Data

Filed under: Authoring Topic Maps,Bioinformatics,Similarity — Patrick Durusau @ 3:14 pm

Analysis of Amphibian Biodiversity Data.

Traditional citation: Hayek, L.-A. C. 1994. Analysis of amphibian biodiversity data. Pp. 207-269. In: Measuring and monitoring biological diversity. Standard methods for amphibians. W. R. Heyer et al., eds. (Smithsonian Institution, Washington, D. C.).

Important for two reasons:

  1. it gathers together forty-six (46) similarity measures (yes, 46 of them; two classics are sketched below)
  2. illustrates that reading broadly is useful in topic maps work
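
Two of the forty-six, Jaccard and Sørensen, are already old friends in computing. A sketch on toy species lists:

    def jaccard(a, b):
        return len(a & b) / len(a | b)

    def sorensen(a, b):
        return 2 * len(a & b) / (len(a) + len(b))

    site1 = {"hyla", "rana", "bufo"}
    site2 = {"rana", "bufo", "ambystoma", "plethodon"}

    print(round(jaccard(site1, site2), 3))   # 0.4
    print(round(sorensen(site1, site2), 3))  # 0.571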

Questions:

  1. From Hayek, which measures would you want to use building your topic map? Why? (3-5 pages, no citations)
  2. What measures developed after Hayek would you want to use? (specific to your data) (3-5 pages, citations)
  3. Just curious, we talk about algorithms “measuring” similarity. Pick two things, books, articles, whatever that you think are “similar.” Would any of these algorithms say they were similar? (3-5 pages, no citations. Yes, it is a hard question.)

November 12, 2010

I See What You Mean

Filed under: Authoring Topic Maps,Marketing,Topic Maps — Patrick Durusau @ 6:28 pm

A recent email from Andrew S. Townley reminded me of a story I heard from my father decades ago.

Rural Louisiana, USA, early 1930s. A friend had just completed a new house and asked the local plumber to come install the “commode.” When the plumber started gathering up his tool kit, the friend protested that he didn’t need to bring “all that” with him, since he had done this many times before. The plumber persisted on the grounds that it was better to be prepared, so he would not have to return for additional tools.

When they arrive at the new house, the plumber finds he is to install what is known to him as a “toilet.”

Repeating the term “commode” over and over again would not have helped, nor, in a modern context, would having a universal URI for “commode.”

What would help, and what topic maps offer, is a representative for the subject that both “commode” and “toilet” name. A representative that contains the properties its authors thought identify the subject it represents.

That enables either party to the conversation to do two very important things:

  • Search for subjects in the way most familiar to them.
  • Examine the properties of the subject to see if it is the one they were seeking.

One more important thing, if they are editing a topic map:

  • Add additional properties that identify the subject in yet another way.

In my experience, understanding what others mean has meant asking the other person to explain what they mean in different ways until I finally stumble upon one and say: “I see what you mean!”

Topic maps are a way to bring “I see what you mean” to information systems.
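
A sketch of such a representative, with properties invented for the story:

    # One subject, one representative, many identifying properties (story data).
    fixture = {
        "names":    {"commode", "toilet", "water closet"},
        "category": "plumbing fixture",
        "function": "sanitation",
    }

    def find(term, subjects):
        """Search in the way most familiar to you; land on the same subject."""
        return [s for s in subjects if term in s["names"]]

    print(find("commode", [fixture]) == find("toilet", [fixture]))  # True

    # Editing the map: add yet another way to identify the same subject.
    fixture["names"].add("loo")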

*****
I am glossing over the fact that representatives contain properties of all sorts, not just those that identify a subject, and that which properties identify a subject must be declared.

What is critical to this post is that different people identify the same subjects differently and assign them different properties.

Intellectual credit for this post goes to Michel Biezunski. Michel and I had a conversation years ago where Michel was touting the phrase: “See What I Mean” or SWIM. I think my formulation fits the story better but you decide which phrasing works best for you.

LOD, Semantic Ambiguity and Topic Maps

Filed under: Authoring Topic Maps,Linked Data,Semantic Web,Topic Maps — Patrick Durusau @ 6:23 pm

The semantic ambiguity of linked data has been a hot topic of discussion of late.

Not only of what linked data links to but of linked data itself!

If you have invested a lot in linked data efforts, don’t panic!

Topic maps, even using XTM/CTM syntaxes, to say nothing of more exotic models, can reduce any semantic ambiguity using occurrences.

If and when it is necessary.

Quite serious, “if and when necessary.”

Err, “if and when necessary” meaning when it is important enough for someone to pay for the disambiguation.

Ambiguity between buyers and sellers of women’s shoes or lingerie probably abounds, but unless someone is willing to pay the freight for disambiguation, it isn’t my concern.

Linked data is exposing the ambiguity of the Semantic Web.

Being unable to solve the semantic ambiguity it exposes, linked data is creating opportunities for topic maps!

Maybe we should send the W3C a fruit basket or something?

November 9, 2010

Summarizing Multidimensional Data Streams: A Hierarchy-Graph-Based Approach

Filed under: Authoring Topic Maps,Data Mining — Patrick Durusau @ 7:44 pm

Summarizing Multidimensional Data Streams: A Hierarchy-Graph-Based Approach

Author(s): Yoann Pitarch, Anne Laurent, Pascal Poncelet

When dealing with potentially infinite data streams, storing the whole data stream history is unfeasible and providing a high-quality summary is required. In this paper, we propose a summarization method for multidimensional data streams based on a graph structure and taking advantage of the data hierarchies. The summarization method considers the data distribution and thus overcomes a major drawback of the Tilted Time Window common framework. We adapt this structure for synthesizing frequent itemsets extracted on temporal windows. Thanks to our approach, as users do not analyze any more numerous extraction results, the result processing is improved.

As a text scholar, I would presume that all occurrences are stored.

For high speed data streams too large to store, that are read in one pass, that isn’t an option.

If terabytes of high speed data are on your topic mapping horizon, start here.
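
The Tilted Time Window framework the paper improves on, in sketch form: recent history kept fine-grained, older history rolled up into coarser summaries, so an unbounded stream fits in bounded space. The window sizes and averaging roll-up are my choices:

    from collections import deque

    # Recent data fine-grained, older data coarser: bounded space, unbounded stream.
    levels = [deque(maxlen=4), deque(maxlen=4), deque(maxlen=4)]

    def add(value, level=0):
        if level == len(levels):
            return  # beyond the coarsest level, history is dropped
        window = levels[level]
        if len(window) == window.maxlen:
            add(sum(window) / len(window), level + 1)  # roll up a summary first
            window.clear()
        window.append(value)

    for v in range(20):
        add(v)
    print([list(w) for w in levels])
    # [[16, 17, 18, 19], [1.5, 5.5, 9.5, 13.5], []]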

****
PS: Posts on temporal modeling with proxies to follow (but not real soon).

XML Data Repository

Filed under: Authoring Topic Maps,Dataset — Patrick Durusau @ 4:00 pm

XML Data Repository.

Data in XML format for testing augmented authoring or search tools.

Whose Logic Binds A Topic Map?

Filed under: Authoring Topic Maps,Semantic Web,TMDM,TMRM,Topic Maps — Patrick Durusau @ 7:15 am

An exchange with Lars Heuer over what the TMRM should say about “ako” and “isa” (see: A Guide to Publishing Linked Data Without Redirects) brings up an important but often unspoken issue.

The current draft of the Topic Maps Reference Model (TMRM) says that subclass-superclass relationships are reflexive and transitive. Moreover, “isa” relationships are non-reflexive and transitive.

Which is all well and good, assuming that accords with your definition of subclass-superclass and isa. The Topic Maps Data Model (TMDM) on the other hand defines “isa” as non-transitive.

Either one is a legitimate choice and I will cover the resolution of that difference elsewhere.
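
The practical difference is easy to see in code. A sketch of instance retrieval under the two readings, with invented types:

    ako = {"opera": "music", "music": "art"}  # subclass -> superclass
    isa = {"Tosca": "opera"}                  # instance -> type

    def supertypes(t):
        """Walk the subclass-superclass chain (transitive under both readings)."""
        while t in ako:
            t = ako[t]
            yield t

    def instance_of(instance, query, transitive):
        direct = isa[instance]
        if not transitive:
            return query == direct  # TMDM reading: "isa" is non-transitive
        return query == direct or query in supertypes(direct)  # TMRM draft reading

    print(instance_of("Tosca", "art", transitive=True))   # True
    print(instance_of("Tosca", "art", transitive=False))  # False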

My point here is to ask: “Whose logic binds a topic map?”

My impression is that here and in the Semantic Web, logical frameworks are being created, into which users are supposed to fit their data.

As a user I would take serious exception to fitting my data into someone else’s world view (read logic).

That is the real question, isn’t it?

Does IT/SW dictate to users the logic that will bind their data, or do users get to define their own “logics”?

Given the popularity of tagging and folksonomies, user “logics” look like the better bet.
