Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

September 27, 2010

DeepaMehta

Filed under: Subject Identity,Topic Map Software,Topic Maps — Patrick Durusau @ 2:42 pm

DeepaMehta.

From the homepage:

DeepaMehta is a software platform for Knowledge Management. Knowledge is represented in a semantic network and is handled collaboratively. DeepaMehta combines interdisciplinary research with the idea of Open Source to generate a true benefit for workflow as well as for social processes. At the same time Deepa Mehta is an indian movie director.

The DeepaMehta user interface is build according to research in Cognitive Psychology and accomodates the knowledge building process of the individual. Instead of handling information through applications, windows and files with DeepaMehta the user handles all kind of information directly and individually. DeepaMehtas user interface is completely based on Mind Maps / Concept Maps.

Not quite my choice for an interface but then I have spent too many decades with books and similar resources.

A topic map that presents like a printed page but is populated with nodes of associations offering further information would be more to my tastes.

*****
PS: Posted to me by Jack Park

Soft fuzzy rough sets for robust feature evaluation and selection

Filed under: Rough Sets,Subject Identity — Patrick Durusau @ 2:08 pm

Soft fuzzy rough sets for robust feature evaluation and selection Authors: Qinghua Hu, Shuang An and Daren Yu. Keywords: Fuzzy rough sets – Feature evaluation – Noise – Soft fuzzy rough sets – Classification learning – Feature reduction

Introduces techniques that reduce the influence of noise on fuzzy rough sets. Important in a world full of noise.

Question for the ambitious: Survey ten articles on feature reduction that don’t cite each other. Pick 2 features that were eliminated in each article. Do you agree/disagree with the evaluation of those features? Not a question of the numerical calculation but your view of the useful/not useful nature of the feature.

September 26, 2010

Do ask, do tell

Filed under: Marketing,Semantics,Subject Identity — Patrick Durusau @ 7:42 pm

Do ask, do tell: a policy for successful semantic integration.

That is, ask and allow others to tell how they identify their subjects.

It does not mean, ask and then tell others a solution, approach, etc. to identify their subjects. (Including FOL.)

Users should be enabled to know when they are talking about the same thing. Using their own vocabularies.

Teach a user to integrate information and they have learned a new skill.

Teach a user to call an expert and they have gained a new bill.

Semantic experts have enough to do without making ordinary vocabularies require expert maintenance.

The General Case

Filed under: Marketing,Subject Identifiers,Subject Identity,Topic Maps — Patrick Durusau @ 7:04 am

The SciDB project illustrates that there is no general case solution for semantic identity.

If we distinguish between IRIs as addresses versus IRIs as identifiers, IRIs are useful for some cases of semantic identity. (IRIs can be used even if you don’t make that distinction, but they are less useful.)

But can you imagine an IRI for each tuple of values in the roughly 15 petabytes of data produced annually by the Large Hadron Collider? It may be very important to identify any number of those tuples, such as if (not when) they discover the Higgs boson.

Those tuples have semantic identity, as do subjects composed of those tuples.

Rather than seeking general solutions for all semantic identity, perhaps we should find solutions that work for particular cases.

September 22, 2010

Consultative Committee for Space Data Systems (CCSDS)

Filed under: Dataset,Space Data,Subject Identity — Patrick Durusau @ 8:15 pm

Consultative Committee for Space Data Systems (CCSDS) is a collaborative effort to create standards for space data.

Interesting because:

  1. Space exploration gets funding from governments
  2. Subjects for mapping in a variety of formats, etc.

Assuming that agreement can be reached on the format for a mission, the question remains how do we integrate that data with articles, books, presentations, data from other missions or sources, and/or analysis of other data?

Reaching agreement on a format for one mission, or even one set of data, is just a starting point for a more complicated conversation.

Journal of Cheminformatics

Filed under: Cheminformatics,Database,Subject Identity — Patrick Durusau @ 8:09 pm

Journal of Cheminformatics.

Journal of Cheminformatics is an open access, peer-reviewed, online journal encompassing all aspects of cheminformatics and molecular modeling including:

  • chemical information systems, software and databases, and molecular modelling
  • chemical structure representations and their use in structure, substructure, and similarity searching of chemical substance and chemical reaction databases
  • computer and molecular graphics, computer-aided molecular design, expert systems, QSAR, and data mining techniques

A good starting place for chemical subject identity issues.

September 19, 2010

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Destined to be a deeply influential resource.

Read the paper, use the Chem2Bio2RDF application for a week, then answer these questions:

  1. Choose three (3) subjects that are identified in this framework.
  2. For each subject, how is it identified in this framework?
  3. For each subject, have you seen it in another framework or system?
  4. For each subject seen in another framework/system, how was it identified there?

Extra credit: What one thing would you change about any of the identifications in this system? Why?

Subjects, Identifiers, IRI’s, Crisp Sets

Filed under: Crisp Sets,Fuzzy Sets,Rough Sets,Soft Sets,Subject Identity — Patrick Durusau @ 9:49 am

I was reading Fuzzy Sets, Uncertainty, and Information by George J. Klir and Tina A. Folger when it occurred to me that use of IRI’s as identifiers for subjects is, by definition, a “crisp set.”

Klir and Folger observe:

The crisp set is defined in such a way as to dichotomize the individuals in some given universe of discourse into two groups: members (those that certainly belong in the set) and nonmembers (those that certainly do not). A sharp, unambiguous distinction exists between the members of the class or category represented by the crisp set. (p. 3)

A subject can be assigned an IRI as an identifier, based on some set of properties.

That assignment and use as an identifier makes identification a crisp set operation.

Eliminates fuzzy, rough, soft and other non-crisp set operations, as well as other means of identification.
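To make the contrast concrete, here is a minimal sketch in Python. The example IRI, the property names, and the "fraction of expected properties" membership function are all assumptions made up for illustration; a real fuzzy approach could define membership quite differently.

```python
def crisp_member(identifiers, iri):
    """Crisp identification: the subject has the IRI or it does not."""
    return iri in identifiers


def fuzzy_member(observed, expected):
    """Fuzzy identification: a degree of membership between 0.0 and 1.0.

    Assumption for this sketch: the degree is the fraction of the expected
    properties that were actually observed.
    """
    expected = set(expected)
    if not expected:
        return 0.0
    return len(set(observed) & expected) / len(expected)


# Hypothetical identifiers and properties.
subject_iris = {"http://example.org/subject/widget"}

print(crisp_member(subject_iris, "http://example.org/subject/widget"))  # True
print(fuzzy_member({"red", "round"}, {"red", "round", "small", "shiny"}))  # 0.5
```

The crisp test can only answer yes or no. The fuzzy test can report "half a match," which is exactly the kind of answer an IRI comparison rules out.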

******
What formal characteristics of crisp sets are useful for topic maps?

Are those characteristics useful for topic map design, authoring or both?

Extra credit: Any set software you would suggest to test your answers?

September 18, 2010

Topic Map Question #1 – What Subjects/Entities Do You Want To Talk About?

Filed under: Authoring Topic Maps,Subject Identity,Topic Maps — Patrick Durusau @ 3:06 pm

The first topic map question is: “What subjects/entities do you want to talk about?”

Until that question is explored (it isn’t ever fully answered), the answers to other questions remain dangerously vague:

  • How to identify those subjects?
  • How do others identify the same subjects?
  • Are other identifications of any interest?
  • What other subjects are of interest?
  • How should those subjects be identified?
  • What relationships between subjects should be identified?
  • How should relationships between subjects be identified?
  • etc.

The responses “just use syntax X” or “use software Y” are answers to the question about subjects/entities.

Just not explicit answers.

Characteristic of the pig in a poke school of topic map design.

September 16, 2010

Data Clustering: 50 Years Beyond K-Means

Filed under: Clustering,Subject Identity — Patrick Durusau @ 4:23 am

Data Clustering: 50 Years Beyond K-Means Author: Anil K. Jain Keywords: clustering, clustering algorithms, semi-supervised clustering, ensemble clustering, simultaneous feature selection, data clustering, large scale data clustering.

Excellent survey and history of clustering.

September 14, 2010

International Journal of Approximate Reasoning – Volume 51, Issue 8, October 2010

Filed under: Data Mining,Similarity,Subject Identity — Patrick Durusau @ 3:49 am

International Journal of Approximate Reasoning – Volume 51, Issue 8, October 2010 has a couple of items of interest:

Redis Snippet for Storing the Social Graph – Post

Filed under: NoSQL,Subject Identity — Patrick Durusau @ 3:47 am

Redis Snippet for Storing the Social Graph from Alex Popescu, a snippet on storing relationships for a social graph using Redis.

Relationships are just a step away (representationally speaking) from associations. Worth a look.
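The step from relationship to association might look something like the following. This is not the snippet from the post; it is a sketch that uses plain Python sets in a dictionary to stand in for Redis-style sets, with key names and the `follows` association type invented for the example.

```python
from collections import defaultdict

# Each key maps to a set of member ids, much as a Redis set would.
graph = defaultdict(set)


def follow(graph, follower, followee):
    """Record a 'follows' relationship in both directions."""
    graph[f"user:{follower}:following"].add(followee)
    graph[f"user:{followee}:followers"].add(follower)


def as_association(follower, followee):
    """The same relationship as a typed association with named roles."""
    return {
        "type": "follows",  # hypothetical association type
        "roles": {"follower": follower, "followee": followee},
    }


follow(graph, "alice", "bob")
print(sorted(graph["user:bob:followers"]))
print(as_association("alice", "bob"))
```

The sets record who is connected to whom. The association adds what the bare sets leave implicit: the type of the relationship and the role each participant plays in it.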

September 13, 2010

Are You Going To: FIS:2010 – Projecting Subject Identity

Filed under: Conferences,Subject Identity — Patrick Durusau @ 6:01 pm

FIS:2010 (3rd Future Internet Symposium 2010) doesn’t quite have the ring of “San Francisco” but I work with conference announcements as they come in.

Projecting subject identity for subjects in linked data (or data in general) is missing from the program.

Projecting subject identity, performing comparisons and merging on those projections will power effective use of any future Internet.

Expecting a uniform data format is on par with waiting for Esperanto to become universal. You can be self-righteous or you can be effective. I suggest effective.

If you attend FIS:2010, ask the speakers about subject identity projection.

Key-Value Pairs

Filed under: NoSQL,Subject Identity,TMRM — Patrick Durusau @ 7:33 am

The Topic Map Reference Model can’t claim to have invented the key/value view of the world.

But it is interesting how much traction key/value pair approaches have been getting of late. From NoSQL in general to Neo4j and Redis in particular. (no offense to other NoSQL contenders, those are the two that came to mind)

Declare which key/value pairs identify a subject and you are on your way towards a subject-centric view of computing.

OK, there are some details but declaring how you identify a subject is the first step in enabling others to reliably identify the same subject.
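As a rough illustration of that first step, the sketch below declares which keys identify a subject and then merges records that agree on those keys. The record fields (`isbn`, `title`, `publisher`) and the function names are hypothetical, and this is not the TMRM's formal machinery, just the key/value idea in miniature.

```python
def identity_key(record, identifying_keys):
    """Project a record onto the keys declared to identify its subject."""
    return tuple((k, record.get(k)) for k in sorted(identifying_keys))


def merge_by_identity(records, identifying_keys):
    """Merge records whose identifying key/value pairs are equal.

    Every value seen for a key is kept, so nothing is lost in the merge.
    """
    merged = {}
    for record in records:
        target = merged.setdefault(identity_key(record, identifying_keys), {})
        for key, value in record.items():
            target.setdefault(key, set()).add(value)
    return list(merged.values())


# Hypothetical records from two different sources.
records = [
    {"isbn": "111", "title": "Topic Maps"},
    {"isbn": "111", "title": "topic maps", "publisher": "Example Press"},
    {"isbn": "222", "title": "Rough Sets"},
]

for subject in merge_by_identity(records, {"isbn"}):
    print(subject)
```

Change the declared identifying keys and the merged result changes with them, which is the point: the declaration, not the software, says what counts as the same subject.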

September 12, 2010

LNCS Volume 6304: Artificial Intelligence: Methodology, Systems, and Applications

Filed under: Classification,Ontology,Searching,Subject Identity — Patrick Durusau @ 6:48 pm

LNCS Volume 6304: Artificial Intelligence: Methodology, Systems, and Applications edited by Darina Dicheva, Danail Dochev, has, among other interesting titles, the following:

September 9, 2010

Calibrated Leakage?

Filed under: Data Mining,Examples,Subject Identity,Topic Maps — Patrick Durusau @ 6:36 pm

Unlike leaks from a faucet, only some leaks from the Obama White House annoy the administration.

All administrations approve of their “leaks” and dislike unfavorable “leaks.” In either case, it is an information mapping issue.

First, people who have access to particular documents or facts become topics. Their known associates, from FBI background checks, Facebook pages, etc., also become topics. Form associations between them.

Second, phone traffic and visitor/day book log entries become topics and build associations with White House staff and their friends.

Third, documents with a high likelihood of containing “leakable” stories or facts are topics with timed associations as they fan out across the staff.

Fourth, “leaks” in the media, particularly by time of the disclosure, are captured as topics as well as who reported it, etc.

No magic, just automating and making correlations between information and records that already exist in disparate forms.
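The kind of correlation involved can be sketched in a few lines. Everything here is made up for illustration: the names, documents, timestamps, and the single rule applied (who received a document before it was disclosed). Real records would be messier and the correlations richer.

```python
from datetime import datetime

# Timed associations: (person, document, time the person received it).
# All entries are hypothetical.
access = [
    ("staffer-a", "memo-1", datetime(2010, 9, 1, 9, 0)),
    ("staffer-b", "memo-1", datetime(2010, 9, 1, 15, 30)),
    ("staffer-c", "memo-1", datetime(2010, 9, 3, 10, 0)),
    ("staffer-a", "memo-2", datetime(2010, 9, 2, 11, 0)),
]


def had_access_before(document, disclosed_at, access):
    """People associated with a document at or before its disclosure."""
    return sorted(
        {person for person, doc, received in access
         if doc == document and received <= disclosed_at}
    )


# A disclosure captured as a topic: which document, and when it appeared.
print(had_access_before("memo-1", datetime(2010, 9, 2, 8, 0), access))
# ['staffer-a', 'staffer-b']
```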

A topic map enables estimates of how effectively approved “leaks” are propagating or investigation of the sources of unapproved “leaks.”

Topic maps: calibrating leakage.

******
PS: There are defenses to highly correlated data gathering/analysis. Please inquire.

Is Gluttony the Answer to Glut?

Filed under: Subject Identity,Topic Maps — Patrick Durusau @ 6:24 pm

One of the refrains about topic maps is that all the information about a subject can be gathered to a single location.

I am already suffering from information glut! A gluttonous topic maps solution is going to find more information about my subjects? That I didn’t even know existed? No thanks!

But: topic maps can gather all user-defined relevant information about a subject to a single location.

Using subject identity, topic maps can reliably filter out irrelevant, repetitive or even useless information about a subject. Which leaves you with a modest and digestible amount of information.

Topic maps: Putting information glut on a diet!

High-Performance Dynamic Pattern Matching over Disordered Streams

Filed under: Data Integration,Data Mining,Pattern Recognition,Subject Identity,Topic Maps — Patrick Durusau @ 4:12 pm

High-Performance Dynamic Pattern Matching over Disordered Streams by Badrish Chandramouli, Jonathan Goldstein, and David Maier came to me by way of Jack Park.

From the abstract:

Current pattern-detection proposals for streaming data recognize the need to move beyond a simple regular-expression model over strictly ordered input. We continue in this direction, relaxing restrictions present in some models, removing the requirement for ordered input, and permitting stream revisions (modification of prior events). Further, recognizing that patterns of interest in modern applications may change frequently over the lifetime of a query, we support updating of a pattern specification without blocking input or restarting the operator.

In case you missed it, this is related to: Experience in Extending Query Engine for Continuous Analytics.

The algorithmic trading use case in this article made me think of Nikita Ogievetsky. For those of you who do not know Nikita, he is an XSLT/topic map maven, currently working in the finance industry.

Do trading interfaces allow user definition of subjects to be identified in data streams? And/or merged with subjects identified in other data streams? Or is that an upgrade from the basic service?

September 8, 2010

Performativity and Topic Maps

Filed under: Mapping,Maps,Subject Identity,Topic Maps — Patrick Durusau @ 8:50 am

Parsing Performativity came to me by way of Sam Hunting.

Read the post, then ask yourself: Is my topic map wearing a path from point A to point B?

Perhaps performativity should be a measure of a topic map’s success?

September 7, 2010

Domains of Discourse, Identification (and Mapping)

Filed under: Subject Identity,Topic Maps — Patrick Durusau @ 6:09 am

Topic maps rhetoric has long maintained that different domains of discourse may have different ways to identify some single subject.

Topic maps provide the means to map between those different identifications, to provide a collocation point for all the information about such a subject.
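One way to picture that collocation point: treat each identification as a (domain, identifier) pair, declare which pairs identify the same subject, and group them. The sketch below uses a small union-find for the grouping; the domains and identifiers are invented, and union-find is simply a convenient technique here, not something topic maps prescribe.

```python
def collocate(equivalences):
    """Group identifications that have been declared equivalent.

    equivalences: iterable of pairs of hashable identifications.
    Returns a list of sets, one per subject.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in equivalences:
        parent[find(a)] = find(b)

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())


# Hypothetical identifications from three domains of discourse.
declared = [
    (("library", "call-no-123"), ("bookstore", "sku-9")),
    (("bookstore", "sku-9"), ("review-site", "item-42")),
    (("library", "call-no-456"), ("bookstore", "sku-7")),
]

for subject in collocate(declared):
    print(sorted(subject))
```

Note that the library and the review site never mapped to each other directly. They end up collocated because both mapped to the bookstore's identifier.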

If different domains can have different ways to identify the same subject, doesn’t it stand to reason that they can also have different ways to map subject identifications from other domains?

Some of them may lack any concept of mapping to/from foreign identifications. Identifications are expressed in a given vocabulary or not at all. Others will have a variety of concepts of mapping, some broader, some narrower.

Understanding subject identifications in various domains, as well as their concepts of mapping between domains, will only improve our promotion of topic maps.

September 5, 2010

“Linguistic terms do not hold exact meaning….”

Filed under: Data Integration,Fuzzy Sets,Information Retrieval,Subject Identity — Patrick Durusau @ 10:36 am

In some background research I ran across:

One of the most important applications of fuzzy set theory is the concept of linguistic variables. A linguistic variable is a variable whose values are not numbers, but words or sentences in a natural or artificial language. The value of a linguistic variable is defined as an element of its term set, a predefined set of appropriate linguistic terms. Linguistic terms are essentially subjective categories for a linguistic variable.

Linguistic terms do not hold exact meaning, however, and may be understood differently by different people. The boundaries of a given term are rather subjective, and may also depend on the situation. Linguistic terms therefore cannot be expressed by ordinary set theory; rather, each linguistic term is associated with a fuzzy set. (“Soft sets and soft groups,” by Hacı Aktaş and Naim Çağman, Information Sciences, Volume 177, Issue 13, 1 July 2007, Pages 2726-2735)

Fuzzy sets are yet another useful approach that has recognized linguistic uncertainty as an issue and developed mechanisms to address it.
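A small sketch may help with the quoted idea of a linguistic variable. The variable ("temperature"), its term set, the triangular membership functions and every breakpoint below are assumptions chosen for illustration; different people would draw the boundaries differently, which is rather the point.

```python
def triangular(a, b, c):
    """Membership function rising from a to a peak at b, then falling to c."""
    def mu(x):
        if x <= a or x >= c:
            return 0.0
        if x <= b:
            return (x - a) / (b - a)
        return (c - x) / (c - b)
    return mu


# Hypothetical term set for the linguistic variable "temperature" (Celsius).
temperature_terms = {
    "cold": triangular(-10, 0, 15),
    "warm": triangular(10, 20, 30),
    "hot": triangular(25, 35, 50),
}


def describe(x, terms):
    """Every linguistic term that applies to x, with its degree."""
    return {name: round(mu(x), 2) for name, mu in terms.items() if mu(x) > 0}


print(describe(12, temperature_terms))  # {'cold': 0.2, 'warm': 0.2}
print(describe(20, temperature_terms))  # {'warm': 1.0}
```

At 12 degrees the value is a little bit "cold" and a little bit "warm" at the same time, something an ordinary (crisp) set cannot express.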

What is “linguistic uncertainty” if it isn’t a question of “subject identity?”

Fuzzy sets have developed another way to answer questions about subject identity.

As topic maps mature I want to see the development of equivalences between approaches to subject identity.

Imagine a topic map system consisting of a medical scanning system that is identifying “subjects” in cultures using rough sets, with equivalences to “subjects” identified in published literature using fuzzy sets, that is refined by “subjects” from user contributions and interactions using PSIs or other mechanisms. (Or other mechanisms, past, present or future.)

September 3, 2010

Making Wikileaks Effective

Filed under: Information Retrieval,Marketing,Subject Identity,Topic Maps — Patrick Durusau @ 7:57 pm

Wikileaks has captured the headlines with the release of Afghan War Diary, 2004-2010.

I haven’t looked at the documents but document collections present the same issues for effective use.

First, document semantics vary depending upon whether they are being read by their intended audience, another military command or other audience. For example, locations may be identified by unfamiliar terms.

Second, and nearly as important, what if one analyst bridges the different semantics and identifies a location? How do they map it to their semantic and communicate that fact to others?

Could pass around a sticky note. Put it on a blackboard. Write it up in a multi-page report.

Topic maps are an effective means to navigate data and multiple interpretations of it, not to mention integrating other data you may have on hand.

Topic maps don’t constrain in advance what subjects you can identify or the basis on which you identify them, and they let you quickly share discoveries with others.

Wikileaks can be annoying. Topic maps can make Wikileaks effective. There’s a difference.

September 1, 2010

Structural, Syntactic, and Statistical Pattern Recognition

Filed under: Pattern Recognition,Subject Identity — Patrick Durusau @ 7:22 pm

Structural, Syntactic, and Statistical Pattern Recognition (Joint IAPR International Workshop, SSPR&SPR 2010, Cesme, Izmir, Turkey, August 18-20, 2010. Proceedings) edited by: Edwin R. Hancock, Richard C. Wilson, Terry Windeatt, Ilkay Ulusoy, and Francisco Escolano.

Pattern recognition is a first step towards assisting users in the subject recognition process that results in a topic map.

Content-Based Tile Retrieval System by Pavel Vácha and Michal Haindl, surprised me because it was about matching colors/patterns on ceramic tiles.

I was expecting a paper on tiling of an identity plane but it was just as delightful. Anyone who has ever shopped for paint or tile, particularly with one’s spouse, ;-), can understand the importance of color/pattern matching.

This paper is a good illustration of how pattern matching can be used to assist users, albeit not in a topic map context. Its application to the construction of a topic map would be just one step further.

Developers of topic map applications targeting real world data will find a number of insights and techniques in this collection of papers.

August 31, 2010

One of These Things

One of These Things could be a theme song for topic maps.

It is also a good idea for a topic map authoring interface.

Say you get ten (10) “hits” back from a search. Add a “checkbox” to each “hit.” Unchecked means same as other unchecked “hits.” Checked means different from the unchecked “hits.”

The “same subject” judgment becomes a collective one of all the users of the search interface. Different “hits” are going to be unchecked in any search return.
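Here is one way the collective judgment could be tallied. The threshold rule (a hit counts as "same subject" unless at least half the users checked it as different) is my assumption for the sketch; an actual interface might weigh judgments quite differently.

```python
from collections import Counter


def same_subject_hits(hit_ids, judgments, threshold=0.5):
    """Hits that fewer than `threshold` of users checked as 'different'.

    judgments: one set per user, holding the hit ids that user checked.
    """
    counts = Counter()
    for checked in judgments:
        counts.update(checked)
    users = len(judgments)
    return [h for h in hit_ids
            if users == 0 or counts[h] / users < threshold]


hits = list(range(1, 11))  # ten "hits" back from a search
judgments = [{3, 7}, {3}, {3, 7, 9}, set()]  # four hypothetical users

print(same_subject_hits(hits, judgments))
# [1, 2, 4, 5, 6, 8, 9, 10]
```

No single user decides. Hit 9 was checked by one user out of four and stays in the "same" group, while hits 3 and 7 drop out.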

Semantic input = Human input.

August 29, 2010

Journal of Artificial Intelligence Research – Journal

Filed under: Data Integration,Merging,Subject Identity — Patrick Durusau @ 7:23 pm

Journal of Artificial Intelligence Research is one of the oldest electronic journals on the Internet, not to mention that it offers free access to all its contents.

While some of the articles have titles like “The Strategy-Proofness Landscape of Merging”, P. Everaere, S. Konieczny and P. Marquis (2007), Volume 28, pages 49-105, they raise issues that sophisticated topic mappers will need to be able to discuss intelligently with data analysts.

Information Fusion – Journal

Filed under: Data Integration,Merging,Subject Identity — Patrick Durusau @ 6:59 pm

Information Fusion covers a number of areas of direct interest to topic map researchers and developers. An incomplete list includes:

  • Fusion Learning In Imperfect, Imprecise And Incomplete Environments
  • Intelligent Techniques For Fusion Processing
  • Fusion System Design And Algorithmic Issues
  • Fusion System Computational Resources and Demands Optimization
  • Special Purpose Hardware Dedicated To Fusion Applications

If you are considering this as a publication venue, consider their “open access” (quotes are theirs) before making that choice.

August 28, 2010

Annotated Computer Vision Bibliography

Filed under: Merging,Searching,Subject Identity — Patrick Durusau @ 5:33 am

Annotated Computer Vision Bibliography in its 17th year on the Internet!

Relevant to topic maps, among other reasons:

  1. Users visually distinguishing subjects in topic map use/authoring
  2. Pattern recognition, clustering, related techniques (chapter 14)
  3. Subject recognition of various types

Suggestions of specific articles of interest to topic mappers greatly appreciated!

August 27, 2010

A Comparison of Merging Operators in Possibilistic Logic

Filed under: Mapping,Merging,Subject Identity — Patrick Durusau @ 7:26 am

A Comparison of Merging Operators in Possibilistic Logic by Guilin Qi, Weiru Liu and David Bell has topic maps written all over it, doesn’t it?

The article is not yet available on my university server but I will keep a watch for it and will report back when I have more details. The author links are to their DBLP records.

Try the following searches on “merging operators” in DBLP and CiteSeerX:

******
Update: 28 August 2010

A Comparison of Merging Operators in Possibilistic Logic (another source for the paper) More comments to follow.

******

Update: 28 August 2010

Qi’s PhD thesis (2006) FUSION OF UNCERTAIN INFORMATION IN THE FRAMEWORK OF POSSIBILISTIC LOGIC starts with:

Possibilistic logic provides a good framework for dealing with merging problems when information is pervaded with uncertainty and inconsistency. Many merging operators in possibilistic logic have been proposed. However, there are still some important problems left unsolved.

Makes me curious about the “Many merging operators….” No promises of when but it would be interesting to start a list of those both within and without possibilistic logic.
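As a starting point for such a list, the two simplest operators I would expect to find are pointwise minimum (conjunctive) and pointwise maximum (disjunctive) combination of possibility distributions. The sketch below is my own illustration, not code or definitions taken from the thesis; the two sources and their possibility degrees are invented, and both sources are assumed to rate the same set of worlds.

```python
def merge(pi1, pi2, op):
    """Combine two possibility distributions pointwise with `op`.

    Assumption: both distributions are defined over the same worlds.
    """
    return {world: op(pi1[world], pi2[world]) for world in pi1}


def inconsistency(pi):
    """Degree to which no world remains fully possible."""
    return 1.0 - max(pi.values())


# Hypothetical sources rating three possible worlds.
source1 = {"a": 1.0, "b": 0.4, "c": 0.0}
source2 = {"a": 0.3, "b": 1.0, "c": 0.2}

conjunctive = merge(source1, source2, min)  # trust both sources
disjunctive = merge(source1, source2, max)  # trust at least one source

print(conjunctive, inconsistency(conjunctive))
print(disjunctive, inconsistency(disjunctive))
```

With these numbers the conjunctive merge leaves no world fully possible, a sign that the sources conflict, while the disjunctive merge avoids the conflict at the cost of saying less.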

August 25, 2010

Murray – Presentation History

Filed under: Graphs,Information Retrieval,Subject Identity — Patrick Durusau @ 3:36 pm

Ronald Murray forwarded a Presentation History that clarifies some of the issues raised in Ethnomathematics Doodles.

Please use “Presentation History” instead of “Ethnomathematics Doodles” on its own.

August 24, 2010

Ethnomathematics Doodles

Filed under: Graphs,Information Retrieval,Subject Identity — Patrick Durusau @ 7:29 pm

Ethnomathematics Doodles came by way of Ronald Murray, whose presentation, Moby-Dick to Mashups, was mentioned here not all that long ago.

BTW, Ron has placed the slides from that presentation up on Slideshare.net and is seeking comments on them.
