September « 2010 « Another Word For It

Another Word For It
Patrick Durusau on Topic Maps and Semantic Diversity

September 20, 2010

Catalogue & Index Blog

Filed under: Cataloging,Indexing — Patrick Durusau @ 2:40 pm

Catalogue & Index Blog.

Blog from the Chartered Institute of Library and Information Professionals (CILIP) Cataloguing and Indexing Group.

News about cataloguing, indexing and Cataloguing and Indexing Group activities.

September 19, 2010

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Filed under: Biomedical,Data Integration,Dataset,Health care,Interface Research/Design,RDF,SPARQL,Subject Identity,Uncategorized — Patrick Durusau @ 9:57 am

Chem2Bio2RDF: a semantic framework for linking and data mining chemogenomic and systems chemical biology data

Destined to be a deeply influential resource.

Read the paper, use the Chem2Bio2RDF application for a week, then answer these questions:

Choose three (3) subjects that are identified in this framework.
For each subject, how is it identified in this framework?
For each subject, have you seen it in another framework or system?
For each subject seen in another framework/system, how was it identified there?

Extra credit: What one thing would you change about any of the identifications in this system? Why?

Subjects, Identifiers, IRIs, Crisp Sets

Filed under: Crisp Sets,Fuzzy Sets,Rough Sets,Soft Sets,Subject Identity — Patrick Durusau @ 9:49 am

I was reading Fuzzy Sets, Uncertainty, and Information by George J. Klir and Tina A. Folger when it occurred to me that the use of IRIs as identifiers for subjects is, by definition, a “crisp set.”

Klir and Folger observe:

The crisp set is defined in such a way as to dichotomize the individuals in some given universe of discourse into two groups: members (those that certainly belong in the set) and nonmembers (those that certainly do not). A sharp, unambiguous distinction exists between the members and nonmembers of the class or category represented by the crisp set. (p. 3)

A subject can be assigned an IRI as an identifier, based on some set of properties.

That assignment and use as an identifier makes identification a crisp set operation.

That eliminates fuzzy, rough, soft and other non-crisp set operations, as well as other means of identification.
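A minimal sketch of the contrast, assuming subject representatives are plain property dictionaries: identification by IRI behaves as a crisp membership function (true or false), while a fuzzy membership function returns a degree between 0 and 1. The IRI, the property names, and the name-similarity rule are hypothetical illustrations, not part of any topic map standard.

```python
from difflib import SequenceMatcher

TARGET_IRI = "http://example.org/subject/paris"  # hypothetical identifying IRI

def crisp_membership(subject: dict) -> bool:
    """Crisp set: a subject either carries the identifying IRI or it does not."""
    return subject.get("iri") == TARGET_IRI

def fuzzy_name_membership(subject: dict, name: str = "Paris") -> float:
    """Fuzzy set: a degree of membership in [0, 1], here from name similarity."""
    return SequenceMatcher(None, subject.get("name", "").lower(), name.lower()).ratio()

subjects = [
    {"iri": TARGET_IRI, "name": "Paris"},
    {"iri": "http://example.org/other", "name": "Paris, France"},  # hypothetical
    {"name": "Pariss"},  # no IRI at all
]

for s in subjects:
    print(crisp_membership(s), round(fuzzy_name_membership(s), 2))
```

The second and third subjects are "close" by name, yet both fall outside the crisp set because the IRI test admits no degrees.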

******
What formal characteristics of crisp sets are useful for topic maps?

Are those characteristics useful for topic map design, authoring or both?

Extra credit: Any set software you would suggest to test your answers?

Oh My Gosh, What Happened To Paraguay? And China, You Are So Big!

Filed under: Mapping,Maps,Topic Maps,Visualization — Patrick Durusau @ 9:16 am

Oh My Gosh, What Happened To Paraguay? And China, You Are So Big!.

Robert Krulwich’s coverage (on his NPR blog) of inventive uses of maps to display world data.

Worldmapper, the source of the maps in this story, has over 700 maps for viewing.

The globe and countries provide a framework within which facts are displayed.

******
Choose a topic map and describe a framework for displaying information in that map.

Is your framework different from individual subjects in the topic map?

If so, in what way? More importantly, what goal(s) does that framework further?

Should those goals be subjects?

September 18, 2010

Topic Map Question #1 – What Subjects/Entities Do You Want To Talk About?

Filed under: Authoring Topic Maps,Subject Identity,Topic Maps — Patrick Durusau @ 3:06 pm

The first topic map question is: “What subjects/entities do you want to talk about?”

Until that question is explored (it isn’t ever fully answered), the answers to other questions remain dangerously vague:

How to identify those subjects?
How do others identify the same subjects?
Are other identifications of any interest?
What other subjects are of interest?
How should those subjects be identified?
What relationships between subjects should be identified?
How should relationships between subjects be identified?
etc.

The responses “just use syntax X” or “use software Y” are answers to the question about subjects/entities.

Just not explicit answers.

Such answers are characteristic of the “pig in a poke” school of topic map design.

The TV-tree — an index structure for high-dimensional data (1994)

Filed under: Feature Spaces,High Dimensionality,R-Trees,Similarity,Spatial Index — Patrick Durusau @ 8:05 am

The TV-tree — an index structure for high-dimensional data (1994) Authors: King-ip Lin, H. V. Jagadish, Christos Faloutsos Keywords: Spatial Index, Similarity Retrieval, Query by Context, R*-Tree, High-Dimensionality Feature Spaces.

Abstract:

We propose a file structure to index high-dimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such ‘varying length’ feature vectors. Finally we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses.

The notion of “…utilizing additional features whenever the additional discriminatory power is absolutely necessary…” is an important one.

Compare that to fixed, simplistic discrimination or fixed, complex, high-overhead discrimination between subject representatives.

Either one represents a failure of imagination.
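A toy sketch of the "use more features only when necessary" idea, assuming subject representatives are fixed-length feature vectors. It is not the TV-tree itself (which telescopes feature vectors inside an index over disk pages); it only shows the principle of consulting further features when the earlier ones fail to discriminate. The function name is hypothetical.

```python
from typing import Sequence

def min_discriminating_prefix(vectors: Sequence[Sequence[float]]) -> int:
    """Smallest k such that the first k features tell every vector apart.

    Later features are consulted only when the shorter prefix still leaves
    two vectors indistinguishable.
    """
    if not vectors:
        return 0
    dims = len(vectors[0])
    for k in range(dims + 1):
        prefixes = {tuple(v[:k]) for v in vectors}
        if len(prefixes) == len(vectors):
            return k
    raise ValueError("some vectors are identical in every feature")

points = [(1, 0, 5), (2, 0, 7), (2, 1, 7)]
print(min_discriminating_prefix(points))  # the third feature is never needed
```

Here the first feature alone cannot separate the last two points, so a second feature is pulled in; the third is never used.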

TREC 2010/2011

Filed under: Conferences,Heterogeneous Data,Information Retrieval,Searching,Software — Patrick Durusau @ 7:34 am

It’s too late to become a participant in TREC 2010, but everyone interested in building topic maps should be aware of this conference.

The seven tracks for this year are blog, chemical IR, entity, legal, relevance feedback, “session,” and web.

Prior TREC conferences are online, along with a host of other materials, at the Text REtrieval Conference (TREC) site.

The 2011 cycle isn’t that far away, so consider being a participant next year.

SimMetrics

Filed under: Binary Distance,Binary Similarity,Software — Patrick Durusau @ 7:29 am

SimMetrics: an extensible Java library of thirty (30) distance or similarity measures.
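SimMetrics itself is Java. As a hedged illustration of the kind of measure such a library collects, here is a Python sketch of Levenshtein (edit) distance plus one common way to normalize it into a similarity score. It does not reproduce the SimMetrics API; the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]

def levenshtein_similarity(a: str, b: str) -> float:
    """Map distance into [0, 1]; 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest
```

Normalizing by the longer string is only one convention; other libraries divide differently, so scores are not comparable across tools without checking.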

September 17, 2010

A Logical Account of Lying

Filed under: Classification,Indexing,Subject Identifiers — Patrick Durusau @ 2:46 pm

A Logical Account of Lying Authors: Chiaki Sakama, Martin Caminada and Andreas Herzig Keywords: lying, lies, argumentation systems, artificial intelligence, multiagent systems, intelligent agents.

Abstract:

This paper aims at providing a formal account of lying – a dishonest attitude of human beings. We first formulate lying under propositional modal logic and present basic properties for it. We then investigate why one engages in lying and how one reasons about lying. We distinguish between offensive and defensive lies, or deductive and abductive lies, based on intention behind the act. We also study two weak forms of dishonesty, bullshit and deception, and provide their logical features in contrast to lying. We finally argue dishonesty postulates that agents should try to satisfy for both moral and self-interested reasons. (emphasis in original)

Be the first to have your topic map distinguish between:

offensive lies
defensive lies
deductive lies
abductive lies (Someone tweet John Sowa please.)
deception
bullshit

Subj3ct.com has an identifier for the subject “bullshit,” http://dbpedia.org/resource/Bullshit, but it does not reflect this latest analysis.
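A small sketch of how a topic map might record these distinctions as topic types with a supertype chain. The hierarchy follows only what the abstract states (lying is a dishonest attitude; bullshit and deception are weak forms of dishonesty); it does not model how the paper relates the offensive/defensive and deductive/abductive pairs. The dictionary and function names are hypothetical; the DBpedia IRI is the one mentioned above.

```python
# Hypothetical topic types, each mapped to its supertype (None at the root).
SUPERTYPE = {
    "dishonesty": None,
    "lying": "dishonesty",
    "offensive lie": "lying",
    "defensive lie": "lying",
    "deductive lie": "lying",
    "abductive lie": "lying",
    "deception": "dishonesty",
    "bullshit": "dishonesty",
}

# Subject identifiers where one is known.
SUBJECT_IDENTIFIERS = {
    "bullshit": ["http://dbpedia.org/resource/Bullshit"],
}

def is_kind_of(topic_type: str, ancestor: str) -> bool:
    """Walk the supertype chain to test whether topic_type falls under ancestor."""
    current = topic_type
    while current is not None:
        if current == ancestor:
            return True
        current = SUPERTYPE[current]
    return False
```

With the types separated, "bullshit" is a kind of dishonesty but not a kind of lying, which is exactly the distinction a single DBpedia identifier cannot express.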

Planet Cataloging

Filed under: Access Points,Authority Record,Cataloging,Classification — Patrick Durusau @ 5:01 am

Planet Cataloging

An aggregation of more than 60 blogs on cataloging.

Read to improve your topic mapping (and cataloging) skills.

Tutorial: Getting Started With Cassandra – Post

Filed under: Cassandra,NoSQL — Patrick Durusau @ 4:42 am

Tutorial: Getting Started With Cassandra via Alex Popescu.

Jack Park says I should read about super columns and key/value pairs in Cassandra. This looks like a good starting place.
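In the Cassandra data model of this period, a super column is a named column whose value is itself a map of columns. The sketch below uses nested Python dictionaries purely as a mental model of that nesting; it is not the Cassandra client API, and the keyspace, column family and key names are hypothetical.

```python
# keyspace -> column family -> row key -> super column -> column -> value
data = {
    "Blog": {                               # hypothetical keyspace
        "PostsByAuthor": {                  # hypothetical super column family
            "durusau": {                    # row key
                "2010-09-17": {             # super column name
                    "title": "Tutorial: Getting Started With Cassandra",
                    "tags": "Cassandra,NoSQL",
                },
            },
        },
    },
}

def get_column(ks: str, cf: str, row: str, super_col: str, col: str) -> str:
    """Follow the nested keys down to a single key/value pair."""
    return data[ks][cf][row][super_col][col]

def super_column_names(ks: str, cf: str, row: str) -> list:
    """Cassandra keeps columns ordered by a comparator; sorted() approximates that for string names."""
    return sorted(data[ks][cf][row])
```

Viewed this way, a super column is just one more level of key/value pairs, which is why it keeps coming up in discussions of key/value stores.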

New Approach for Automated Categorizing and Finding Similarities in Online Persian News

Filed under: Clustering,Similarity — Patrick Durusau @ 4:39 am

New Approach for Automated Categorizing and Finding Similarities in Online Persian News Authors: Naser Ezzati Jivan, Mahlagha Fazeli and Khadije Sadat Yousefi Keywords: Categorization of web pages – category – automatic categorization of Persian news – feature – similarity – clustering – structure of web pages.

Abstract:

The Web is a great source of information where data are stored in different formats, e.g., web-pages, archive files and images. Algorithms and tools which automatically categorize web-pages have wide applications in real-life situations. A web-site which collects news from different sources can be an example of such situations. In this paper, an algorithm for categorizing news is proposed. The proposed approach is specialized to work with documents (news) written in the Persian language but it can be easily generalized to work with documents in other languages, too. There is no standard test-bench or measure to evaluate the performance of this kind of algorithms as the amount of similarity between two documents (news) is not well-defined. To test the performance of the proposed algorithm, we implemented a web-site which uses the proposed approach to find similar news. Some of the similar news items found by the algorithm have been reported.

Similarity: The first step towards subject identification.
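The paper's own algorithm is not reproduced here. As a generic baseline for "how similar are two news items," a bag-of-words cosine similarity is a common starting point. Splitting on whitespace is a simplifying assumption; Persian text would need proper normalization and tokenization.

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between the term-frequency vectors of two documents."""
    tf_a = Counter(doc_a.lower().split())
    tf_b = Counter(doc_b.lower().split())
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)
```

A score like this only says two items look alike; deciding that they are about the same subject is the further step.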

September 16, 2010

LISA ’10 Uncovering the Secrets of System Administration – Nov. 7-12 – San Jose

Filed under: Conferences — Patrick Durusau @ 12:43 pm

LISA ’10 Uncovering the Secrets of System Administration – Nov. 7-12 – San Jose

Attend for:

Papers/Tutorials on all aspects of Unix administration
Training on the latest tools
Discovering how topic maps can assist sysadmins

All USENIX conference proceedings are available free of charge.

Almost A Topic Map

Filed under: Cataloging,Classification,Topic Maps — Patrick Durusau @ 4:35 am

The Ann Arbor District Library, a very cool library, has added a topic-map-like characteristic to its catalog.

User tags are stored separately but displayed alongside the controlled vocabulary of the library.

Some subject identifications are more equal than others.

A legitimate choice that enhances both the formal vocabulary and the user-supplied “tags.”

One small step towards topic maps, ….

*****
Supplemental: 17 September 2010

More than one reader reported that my post was unclear. Here is a fuller explanation.

Follow the link Catalog. Next to the search catalog text box you will see a drop-down menu. Open it and you will see “Tags” as one of the options. Those “tags” are supplied by users of the catalog. In other words, you can search by the controlled vocabulary of the library or by user tags. Both are associated with particular items in the collection.
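A minimal data-model sketch of what the catalog appears to do, assuming each item carries two separate sets of terms: the library's subject headings and the users' tags. Both attach to the same item, but a search consults one vocabulary at a time. The class, function and record names are hypothetical; this is not the library's software.

```python
from dataclasses import dataclass, field

@dataclass
class CatalogItem:
    title: str
    subject_headings: set = field(default_factory=set)  # library's controlled vocabulary
    user_tags: set = field(default_factory=set)         # kept separate from the headings

def search(items, term, vocabulary="subject_headings"):
    """Search one vocabulary at a time, as the catalog's drop-down menu lets a user choose."""
    wanted = term.lower()
    return [item.title for item in items
            if wanted in {t.lower() for t in getattr(item, vocabulary)}]

# Hypothetical records: both term sets belong to the same item.
items = [
    CatalogItem("Example title A", {"Fuzzy sets"}, {"math", "must read"}),
    CatalogItem("Example title B", {"Cataloging"}, {"must read"}),
]
```

Keeping the two sets apart is what lets the catalog treat one vocabulary as "more equal" while still exposing the other.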

Data Clustering: 50 Years Beyond K-Means

Filed under: Clustering,Subject Identity — Patrick Durusau @ 4:23 am

Data Clustering: 50 Years Beyond K-Means Author: Anil K. Jain Keywords: clustering, clustering algorithms, semi-supervised clustering, ensemble clustering, simultaneous feature selection, data clustering, large scale data clustering.

Excellent survey and history of clustering.
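For readers who want the baseline the survey starts from, here is a short sketch of k-means in its classic Lloyd's-algorithm form: assign each point to its nearest centroid, move each centroid to the mean of its points, repeat until nothing changes. Random initialization from the input points is one simple choice among many; the function name and parameters are hypothetical.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm on tuples of numbers; returns centroids and the last assignment."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # Update step: move each centroid to its cluster mean (keep it if the cluster is empty).
        new_centroids = [
            tuple(sum(coord) / len(c) for coord in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

centroids, clusters = kmeans([(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)], k=2)
```

The result depends on the initial centroids, one of the weaknesses that much of the later work the survey covers tries to address.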

UCI Machine Learning Datasets

Filed under: Authoring Topic Maps,Dataset,Interface Research/Design — Patrick Durusau @ 4:12 am

UCI Machine Learning Datasets: a collection of 194 datasets (as of 2010/09/14) for machine learning.

Re-purpose to develop/test interfaces to assist in authoring topic maps.

September 15, 2010

Taxonomy for Characterizing Ensemble Methods in Classification Tasks

Filed under: Authoring Topic Maps,Classification,Ensemble Methods — Patrick Durusau @ 8:11 am

Taxonomy for Characterizing Ensemble Methods in Classification Tasks Author: Lior Rokach Keywords: Ensemble-methods; Classification; Boosting; Bagging; Partitioning; Decision trees; Neural networks.

A review and annotated bibliography of work on ensemble methods.

Ensemble methods, I like the sound of that.

Extend it to mean human authors + other methods creating a topic map.
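A sketch of that extension using the simplest ensemble rule, majority voting, where one of the "classifiers" stands in for a human author's judgment. The two automatic rules and the human lookup table are hypothetical placeholders, chosen only to show how votes combine.

```python
from collections import Counter
from typing import Callable, Sequence

Classifier = Callable[[str], str]

def majority_vote(classifiers: Sequence[Classifier], item: str) -> str:
    """Combine labels from several classifiers; ties go to the label seen first."""
    votes = Counter(clf(item) for clf in classifiers)
    return votes.most_common(1)[0][0]

def capitalization_rule(name: str) -> str:   # hypothetical automatic rule
    return "person" if name.istitle() else "concept"

def suffix_rule(name: str) -> str:           # hypothetical automatic rule
    return "concept" if name.endswith("ism") else "person"

HUMAN_JUDGMENTS = {"Durusau": "person", "Fuzzy sets": "concept"}  # hypothetical

def human_author(name: str) -> str:          # stands in for a human author's decision
    return HUMAN_JUDGMENTS.get(name, "unknown")

ensemble = [capitalization_rule, suffix_rule, human_author]
```

For "Fuzzy sets" the suffix rule gets it wrong, but the human author and the capitalization rule outvote it. Weighted voting, giving the human more say, is an obvious variation.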

International Association for Cryptologic Research

Filed under: Cryptography,Security,Topic Map Software,Topic Maps — Patrick Durusau @ 6:01 am

International Association for Cryptologic Research

The IACR hosts conference proceedings, the ePrint Archive, CryptoDB, and other goodies. See also the membership details for IACR.

Topic map applications need to offer features such as:

secure communications to and from topic maps.
secure and verified data for merging into topic maps.
capability to merge parts of separately held topic maps without disclosing the basis for merging.*
etc.

*(Important for a range of defense and security applications.)
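One simple way to approximate the last point, sketched under strong assumptions: two parties share a secret key, each publishes keyed hashes (HMAC-SHA256) of its subject identifiers, and topics whose hashes match become merge candidates. Raw identifiers never cross the boundary, but anyone holding the key can test guessed identifiers, so this is not a substitute for a real private set intersection protocol. The key and identifiers are hypothetical.

```python
import hashlib
import hmac

def blind(identifiers, key: bytes) -> dict:
    """Map HMAC-SHA256 digests back to the local identifiers they came from."""
    return {hmac.new(key, i.encode("utf-8"), hashlib.sha256).hexdigest(): i
            for i in identifiers}

def merge_candidates(ours: dict, theirs_digests: set) -> list:
    """Local identifiers whose blinded form also appears in the other party's set."""
    return sorted(ours[d] for d in ours.keys() & theirs_digests)

shared_key = b"hypothetical shared secret"
alice = blind({"http://example.org/a", "http://example.org/shared"}, shared_key)
bob = blind({"http://example.org/b", "http://example.org/shared"}, shared_key)
print(merge_candidates(alice, set(bob)))  # only digests were exchanged
```

Each side learns which of its own topics overlap, but not the rest of the other side's identifiers.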

1st ACM International Health Informatics Symposium – November 11-12, 2010

Filed under: Biomedical,Conferences,Health care — Patrick Durusau @ 5:48 am

1st ACM International Health Informatics Symposium – November 11-12, 2010.

Interesting presentations:

The Effect of Different Context Representations on Word Sense Discrimination in Biomedical Texts
An evaluation of feature sets and sampling techniques for de-identification of medical records
Federated Querying Architecture for Clinical & Translational Health IT
Contextualizing consumer health information searching: an analysis of questions in a social Q&A community

Will watch for the call for papers for next year. Would be nice to have a topic map paper or two on the program.

September 14, 2010

Towards a Principled Theory of Clustering

Filed under: Clustering — Patrick Durusau @ 4:06 am

Towards a Principled Theory of Clustering Author: Reza Bosagh Zadeh Keywords: Clustering functions, Single-Linkage, Max-Sum, Minimum/Maximum Spanning Trees, Effective Similarity.

An exploration of methods to characterize clustering algorithms “…in terms of the effective similarity between two points.” A line of research that may make the choice of clustering algorithm less arbitrary.
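Single-linkage, one of the clustering functions in the keywords, has a well-known link to minimum spanning trees: add edges in increasing order of distance, as Kruskal's algorithm does, and stop when k components remain; those components are the single-linkage clusters. A small sketch with a union-find structure, assuming Euclidean distance by default. It illustrates the MST connection only and is not the paper's formal framework.

```python
import math
from itertools import combinations

def single_linkage(points, k, dist=math.dist):
    """Kruskal's algorithm halted at k components yields single-linkage clusters."""
    parent = list(range(len(points)))

    def find(i):
        # Path halving keeps the union-find trees shallow.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    components = len(points)
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj  # merge the two closest remaining clusters
            components -= 1

    clusters = {}
    for idx, p in enumerate(points):
        clusters.setdefault(find(idx), []).append(p)
    return list(clusters.values())

groups = single_linkage([(0, 0), (0, 1), (5, 5), (5, 6), (20, 20)], k=3)
```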

International Journal of Approximate Reasoning – Volume 51, Issue 8, October 2010

Filed under: Data Mining,Similarity,Subject Identity — Patrick Durusau @ 3:49 am

International Journal of Approximate Reasoning – Volume 51, Issue 8, October 2010 has a couple of items of interest:

Heuristic algorithm for interpretation of multi-valued attributes in similarity-based fuzzy relational databases Author(s): Rafal A. Angryk, Jacek Czerniak Keywords: Similarity-based fuzzy relational databases; Multi-valued entries; Taxonomic symbolic attributes; Fuzzy similarity relation; Data mining.
Aggregating multiple classification results using fuzzy integration and stochastic feature selection Author(s): Nick J. Pizzi, Witold Pedrycz Keywords: Data classification; Fuzzy sets; Pattern recognition; Fuzzy integrals; Feature selection; Computational intelligence.

Redis Snippet for Storing the Social Graph – Post

Filed under: NoSQL,Subject Identity — Patrick Durusau @ 3:47 am

Redis Snippet for Storing the Social Graph from Alex Popescu, a snippet on storing relationships for a social graph using Redis.

Relationships are just a step away (representationally speaking) from associations. Worth a look.
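A common Redis pattern for a social graph keeps two sets per user, one for the accounts they follow and one for their followers. The sketch below mimics that with in-memory Python sets so it runs without a server; with the redis-py client the corresponding calls would be `sadd` and `sinter`. The key naming (`user:<id>:follows`) is an assumption, not necessarily what the linked snippet uses.

```python
from collections import defaultdict

# In-memory stand-in for Redis sets, keyed like "user:<id>:follows".
store = defaultdict(set)

def follow(follower: str, followee: str) -> None:
    """Record the relationship in both directions (two SADD calls in Redis)."""
    store[f"user:{follower}:follows"].add(followee)
    store[f"user:{followee}:followers"].add(follower)

def mutual_follows(a: str, b: str) -> set:
    """Accounts that both a and b follow (SINTER on the two 'follows' keys)."""
    return store[f"user:{a}:follows"] & store[f"user:{b}:follows"]

follow("1", "3")
follow("2", "3")
follow("1", "2")
```

Storing each relationship under both participants is what makes it resemble an association with two roles, follower and followee.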

September 13, 2010

A million answers to twenty questions: choosing by checklist

Filed under: Information Retrieval,Interface Research/Design,Topic Map Software — Patrick Durusau @ 6:09 pm

A million answers to twenty questions: choosing by checklist Authors: Michael Mandler, Paola Manzini, Marco Mariotti Keywords: Bounded rationality, utility maximization, choice function, lexicographic utility.

Mentions:

Checklist users can in effect perform a binary search, which makes the number of preference discriminations they make an exponential function of the number of properties that they use. As a result, an agent who makes a 1,000,000 preference discriminations needs a checklist that is just 20 properties long.
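The arithmetic behind the quoted figure: k yes/no properties can separate at most 2^k alternatives, so n discriminations need the smallest k with 2^k ≥ n. Since 2^19 = 524,288 and 2^20 = 1,048,576, twenty properties suffice for a million. A quick check, with a hypothetical function name:

```python
def properties_needed(n: int) -> int:
    """Smallest k with 2**k >= n: the length of a binary checklist for n alternatives."""
    if n < 1:
        raise ValueError("n must be positive")
    return (n - 1).bit_length()

print(properties_needed(1_000_000))
```

Integer `bit_length` avoids the floating-point edge cases of `math.ceil(math.log2(n))` for exact powers of two.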

Substitute “identity” for “preference.”

1. How many discriminations are necessary to identify a subject?
2. Does the order of discrimination matter?
3. What properties discriminate more than others?
4. Do the answers to 1-3 vary by domain? If so, in what way?

These are empirical questions: unlike ontologies, classifications and cataloging, here the answers come from users.

Are You Going To: FIS:2010 – Projecting Subject Identity

Filed under: Conferences,Subject Identity — Patrick Durusau @ 6:01 pm

FIS:2010 (3rd Future Internet Symposium 2010), doesn’t quite have the ring of “San Francisco” but I work with conference announcements as they come in.

Projecting subject identity for subjects in linked data (or data in general) is missing from the program.

Projecting subject identity, performing comparisons and merging on those projections will power effective use of any future Internet.

Expecting a uniform data format is on par with waiting for Esperanto to become universal. You can be self-righteous or you can be effective. I suggest effective.

If you attend FIS:2010, ask the speakers about subject identity projection.

Key-Value Pairs

Filed under: NoSQL,Subject Identity,TMRM — Patrick Durusau @ 7:33 am

The Topic Map Reference Model can’t claim to have invented the key/value view of the world.

But it is interesting how much traction key/value pair approaches have been getting of late, from NoSQL in general to Neo4j and Redis in particular. (No offense to other NoSQL contenders; those are the two that came to mind.)

Declare which key/value pairs identify a subject and you are on your way towards a subject-centric view of computing.

OK, there are some details, but declaring how you identify a subject is the first step in enabling others to reliably identify the same subject.
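A minimal sketch of that first step, assuming records are plain key/value dictionaries: declare which keys identify a subject, project each record onto those pairs, and merge records whose projections match. The declaration, keys and values are hypothetical, and "later values win" is just one possible policy for conflicts.

```python
from collections import defaultdict

# Hypothetical declaration: these keys, taken together, identify a subject.
IDENTIFYING_KEYS = ("isbn",)

def identity(record: dict, keys=IDENTIFYING_KEYS):
    """Project a record onto its identifying key/value pairs (None if any are missing)."""
    if not all(k in record for k in keys):
        return None
    return tuple((k, record[k]) for k in keys)

def merge(records, keys=IDENTIFYING_KEYS):
    """Merge records that share the same identifying key/value pairs."""
    merged = defaultdict(dict)
    unidentified = []
    for r in records:
        ident = identity(r, keys)
        if ident is None:
            unidentified.append(r)
        else:
            merged[ident].update(r)  # later values win on conflicting keys
    return list(merged.values()), unidentified

records = [
    {"isbn": "hypothetical-123", "title": "Fuzzy Sets, Uncertainty, and Information"},
    {"isbn": "hypothetical-123", "authors": "Klir; Folger"},
    {"title": "a record with no identifying keys"},
]
subjects, leftovers = merge(records)
```

Because the declaration is explicit, someone else can apply the same `IDENTIFYING_KEYS` to their own records and arrive at the same subjects.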

September 12, 2010

Gaming for Topic Maps?

Filed under: Authoring Topic Maps,Interface Research/Design,Topic Map Software,Topic Maps,Usability — Patrick Durusau @ 6:57 pm

Gaming for a Cure: Computer Gamers Tackle Protein Folding describes how over 57,000 “players” bested supercomputers:

Analysis shows that players bested the computers on problems that required radical moves, risks and long-term vision — the kinds of qualities that computers do not possess.

Distributed human contribution to massive information projects is a proven fact. (The reading programme of the OED is an earlier example.)

Can you make mapping large data sets into an interesting game?

For some clues, see: Foldit.

Cartesian Products and Topic Maps

Filed under: TMQL,TMRM — Patrick Durusau @ 6:50 pm

Using SQL Cross Join – the report writers secret weapon is a very clear explanation of the utility of cross-joins in SQL.

Cross-join = Cartesian product, something you will remember from the Topic Maps Reference Model.

Makes a robust WHERE clause look important, doesn’t it?
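A small illustration of why the WHERE clause carries the weight: a cross join pairs every row with every row, and only the filter turns that into something meaningful. The "tables" of topics and occurrences below are hypothetical, and `itertools.product` plays the role of CROSS JOIN.

```python
from itertools import product

# Hypothetical rows: (topic_id, name) and (topic_id, occurrence_url).
topics = [("t1", "Cassandra"), ("t2", "Redis")]
occurrences = [("t1", "http://example.org/cassandra-tutorial"),
               ("t2", "http://example.org/redis-snippet"),
               ("t2", "http://example.org/redis-docs")]

# CROSS JOIN: every topic paired with every occurrence (2 x 3 = 6 rows).
cross = list(product(topics, occurrences))

# WHERE topic.id = occurrence.topic_id: the filter keeps only the meaningful pairs.
joined = [(name, url) for (t_id, name), (o_topic, url) in cross if t_id == o_topic]
```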

LNCS Volume 6304: Artificial Intelligence: Methodology, Systems, and Applications

Filed under: Classification,Ontology,Searching,Subject Identity — Patrick Durusau @ 6:48 pm

LNCS Volume 6304: Artificial Intelligence: Methodology, Systems, and Applications, edited by Darina Dicheva and Danail Dochev, includes, among other interesting titles, the following:

Cross-Language Personalization through a Semantic Content-Based Recommender System Author(s): Pasquale Lops, Cataldo Musto, Fedelucio Narducci, Marco Gemmis, Pierpaolo Basile, Giovanni Semeraro Keywords: Cross-language Recommender System – Content-based Recommender System – Word Sense Disambiguation – MultiWordNet
Term Ranking and Categorization for Ad-Hoc Navigation Author(s): Ondrej Ševce, Jozef Tvarožek, Mária Bieliková Keywords: term – category – navigation – conceptual user profile
A Bayesian Model for Entity Type Disambiguation Author(s): Barbara Bazzanella, Heiko Stoermer, Paolo Bouquet Keywords: Entity Type Disambiguation – Entity Type – Naive Bayesian Model – Entity-Centric – Query
Towards Ontological Blending Author(s): Joana Hois, Oliver Kutz, Till Mossakowski, John Bateman Keywords: Ontologies – Creativity in AI – Blending – Algebraic Semiotics – Blendoid – Conceptual Blending – Conceptual Spaces – Integrating Ontologies
A Meta Learning Approach: Classification by Cluster Analysis Author(s): Anna Jurek, Yaxin Bi, Shengli Wu, Chris Nugent Keywords: Combining Classifiers – Stacking – Clustering – Meta-Learning
Mapping Data Driven and Upper Level Ontology Author(s): Mariana Damova, Svetoslav Petrov, Kiril Simov Keywords: Semantic Web – Linking Open Data – Upper Ontology – FactForge – Ontology Mapping – Linked Data – Inference

September 11, 2010

Informatics – University of Michigan

Filed under: Degree Program — Tags: Programs — Patrick Durusau @ 8:17 am

Informatics – University of Michigan is described as:

Informatics combines solid grounding in computer programming, mathematics, and statistics, combined with study of the ethical and social science aspects of complex information systems. Informatics majors learn to critically analyze various approaches to processing information and develop skills to design, implement, and evaluate the next generation of information technology tools.

Sounds like a good place to look for potential topic map authors and/or to promote the use of topic maps!

76 Binary Similarity and Distance Measures

Filed under: Binary Distance,Binary Similarity,Classification,Pattern Recognition — Patrick Durusau @ 5:53 am

A Survey of Binary Similarity and Distance Measures Authors: Seung-Seok Choi, Sung-Hyuk Cha, Charles C. Tappert Keywords: binary similarity measure, binary distance measure, hierarchical clustering, classification, operational taxonomic unit. (Journal of Systemics, Cybernetics and Informatics, Vol. 8, No. 1, pp. 43-48, 2010)
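Many binary similarity and distance measures are written in terms of four counts over a pair of binary vectors: positions where both are 1, where only one is 1 (two cases), and where both are 0. A sketch of those counts and three classic measures built from them. The survey's notation may assign b and c the other way around, which makes no difference for these symmetric measures. The function names are hypothetical.

```python
def otu_counts(x, y):
    """Counts (a, b, c, d) of the four match/mismatch cases for equal-length 0/1 vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # both present
    b = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # only in y
    c = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # only in x
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # both absent
    return a, b, c, d

def jaccard(x, y):
    """Ignores joint absences; two all-zero vectors are treated as identical by convention."""
    a, b, c, _ = otu_counts(x, y)
    return a / (a + b + c) if a + b + c else 1.0

def simple_matching(x, y):
    """Counts joint absences as agreement."""
    a, b, c, d = otu_counts(x, y)
    return (a + d) / (a + b + c + d)

def hamming(x, y):
    """Number of positions that disagree."""
    _, b, c, _ = otu_counts(x, y)
    return b + c
```

Whether joint absences should count as evidence of similarity is exactly the kind of choice that separates one measure from another.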
