Blog from the Chartered Institute of Library and Information Professionals (CILIP) Cataloging and Indexing Group.
News about cataloging, indexing and Cataloging and Indexing Group activities.
Destined to be a deeply influential resource.
Read the paper, use the Chem2Bio2RDF application for a week, then answer these questions:
Extra credit: What one thing would you change about any of the identifications in this system? Why?
I was reading Fuzzy Sets, Uncertainty, and Information by George J. Klir and Tina A. Folger when it occurred to me that the use of IRIs as identifiers for subjects is, by definition, a “crisp set.”
Klir and Folger observe:
The crisp set is defined in such a way as to dichotomize the individuals in some given universe of discourse into two groups: members (those that certainly belong in the set) and nonmembers (those that certainly do not). A sharp, unambiguous distinction exists between the members of the class or category represented by the crisp set. (p. 3)
A subject can be assigned an IRI as an identifier, based on some set of properties.
That assignment and use as an identifier makes identification a crisp set operation.
That eliminates fuzzy, rough, soft and other non-crisp set operations, as well as other means of identification.
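To make the contrast concrete, here is a minimal Python sketch. The IRIs, the property names, and the grading function are all assumptions made up for illustration; nothing here comes from Klir and Folger or from any topic map standard. The crisp test answers only yes or no, while the fuzzy-style test returns a grade of membership between 0 and 1.

```python
def crisp_member(subject_iris, iri):
    """Crisp set: the IRI either identifies the subject (True) or it does not (False)."""
    return iri in subject_iris


def fuzzy_member(subject_props, candidate_props):
    """Grade in [0, 1]: the share of the subject's properties the candidate matches.

    This grading function is an illustrative assumption, not a standard measure.
    """
    if not subject_props:
        return 0.0
    matched = sum(1 for k, v in subject_props.items() if candidate_props.get(k) == v)
    return matched / len(subject_props)


# Hypothetical data, not taken from any real system.
known_iris = {"http://example.org/id/paraguay"}
subject = {"name": "Paraguay", "continent": "South America", "capital": "Asunción"}
candidate = {"name": "Paraguay", "continent": "South America", "capital": "Asuncion"}

print(crisp_member(known_iris, "http://example.org/id/paraguay"))  # exact match only
print(crisp_member(known_iris, "http://example.org/id/Paraguay"))  # one letter off: not a member
print(fuzzy_member(subject, candidate))                            # partial membership
```

The second IRI differs only in the case of one letter, yet the crisp test rejects it outright. The fuzzy-style grade still reports that two of three properties agree.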
******
What formal characteristics of crisp sets are useful for topic maps?
Are those characteristics useful for topic map design, authoring or both?
Extra credit: Any set software you would suggest to test your answers?
Oh My Gosh, What Happened To Paraguay? And China, You Are So Big!.
Robert Krulwich’s (NPR blog) coverage of inventive use of maps to display world data.
Worldmapper, the source of the maps in this story, has over 700 maps for viewing.
The globe and countries provide a framework within which facts are displayed.
******
Choose a topic map and describe a framework for displaying information in that map.
Is your framework different from individual subjects in the topic map?
If so, in what way? More importantly, what goal(s) does that framework further?
Should those goals be subjects?
The TV-tree: an index structure for high-dimensional data (1994) Authors: King-ip Lin, H. V. Jagadish, Christos Faloutsos Keywords: Spatial Index, Similarity Retrieval, Query by Context, R*-Tree, High-Dimensionality Feature Spaces.
Abstract:
We propose a file structure to index high-dimensionality data, typically, points in some feature space. The idea is to use only a few of the features, utilizing additional features whenever the additional discriminatory power is absolutely necessary. We present in detail the design of our tree structure and the associated algorithms that handle such “varying length” feature vectors. Finally we report simulation results, comparing the proposed structure with the R*-tree, which is one of the most successful methods for low-dimensionality spaces. The results illustrate the superiority of our method, with up to 80% savings in disk accesses.
The notion of “…utilizing additional features whenever the additional discriminatory power is absolutely necessary…” is an important one.
Compare to fixed simplistic discrimination and/or fixed complex, high-overhead, discrimination between subject representatives.
Either one represents a failure of imagination.
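A rough Python sketch of that notion follows. To be clear, this is not the TV-tree or any part of its algorithms; it only illustrates, on invented feature vectors, the idea of consulting more features only when fewer fail to tell the items apart. The function name `features_needed` is my own.

```python
def features_needed(vectors):
    """Return the smallest k such that the first k features distinguish every
    vector from every other, or None if even the full vectors cannot."""
    if not vectors:
        return 0
    longest = max(len(v) for v in vectors)
    for k in range(1, longest + 1):
        prefixes = {tuple(v[:k]) for v in vectors}
        if len(prefixes) == len(vectors):
            return k  # k features are enough; the rest are never consulted
    return None


# Hypothetical subject representatives, each a tuple of feature values.
representatives = [(1, 0, 5), (1, 2, 5), (3, 0, 5)]
print(features_needed(representatives))  # the first feature alone leaves two items tied
```

For these three vectors the first feature is not enough, the first two are, and the third is never needed.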
It’s too late to become a participant in TREC 2010 but everyone interested in building topic maps should be aware of this conference.
The seven tracks for this year are blog, chemical IR, entity, legal, relevance feedback, “session,” and web.
Prior TREC conferences are online, along with a host of other materials, at the Text REtrieval Conference (TREC) site.
The 2011 cycle isn’t that far away so consider being a participant next year.
SimMetrics. An extensible Java library of thirty (30) distance or similarity measures.
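For readers who have not met a string distance measure before, here is a small sketch of one widely used measure, Levenshtein edit distance, in plain Python. It is written from scratch for illustration and does not use or imitate the SimMetrics API; the `similarity` normalization is one common convention and is an assumption on my part.

```python
def levenshtein(a, b):
    """Minimum number of single-character inserts, deletes, or substitutions
    needed to turn string a into string b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def similarity(a, b):
    """Scale the distance into [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(similarity("kitten", "sitting"))
```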
A Logical Account of Lying Authors: Chiaki Sakama, Martin Caminada and Andreas Herzig Keywords: lying, lies, argumentation systems, artificial intelligence, multiagent systems, intelligent agents.
Abstract:
This paper aims at providing a formal account of lying – a dishonest attitude of human beings. We first formulate lying under propositional modal logic and present basic properties for it. We then investigate why one engages in lying and how one reasons about lying. We distinguish between offensive and defensive lies, or deductive and abductive lies, based on intention behind the act. We also study two weak forms of dishonesty, bullshit and deception, and provide their logical features in contrast to lying. We finally argue dishonesty postulates that agents should try to satisfy for both moral and self-interested reasons. (emphasis in original)
Be the first to have your topic map distinguish between:
Subj3ct.com has an identifier for the subject “bullshit,” http://dbpedia.org/resource/Bullshit, but it does not reflect this latest analysis.
Tutorial: Getting Started With Cassandra via Alex Popescu.
Jack Park says I should read about super columns and key/value pairs in Cassandra. This looks like a good starting place.
New Approach for Automated Categorizing and Finding Similarities in Online Persian News Authors: Naser Ezzati Jivan, Mahlagha Fazeli and Khadije Sadat Yousefi Keywords: Categorization of web pages – category – automatic categorization of Persian news – feature – similarity – clustering – structure of web pages.
Abstract:
The Web is a great source of information where data are stored in different formats, e.g., web-pages, archive files and images. Algorithms and tools which automatically categorize web-pages have wide applications in real-life situations. A web-site which collects news from different sources can be an example of such situations. In this paper, an algorithm for categorizing news is proposed. The proposed approach is specialized to work with documents (news) written in the Persian language but it can be easily generalized to work with documents in other languages, too. There is no standard test-bench or measure to evaluate the performance of this kind of algorithms as the amount of similarity between two documents (news) is not well-defined. To test the performance of the proposed algorithm, we implemented a web-site which uses the proposed approach to find similar news. Some of the similar news items found by the algorithm has been reported.
Similarity: The first step towards subject identification.
LISA ’10 Uncovering the Secrets of System Administration – Nov. 7-12 – San Jose
Attend for:
Ann Arbor District Library, a very cool library that has added a topic map like characteristic to its catalog.
User tags are stored separately but displayed alongside the controlled vocabulary of the library.
Some subject identifications are more equal than others.
A legitimate choice that enhances both the formal vocabulary as well as the user supplied “tags.”
One small step towards topic maps, ….
*****
Supplemental: 17 September 2010
More than one reader reported that my post was unclear. Here is a fuller explanation.
Follow the link Catalog. Next to the catalog search text box you will see a drop-down menu. Select that and see “Tags” as one of the options. Those “tags” are supplied by users of the catalog. In other words, you can search by the controlled vocabulary of the library or by user tags. Both are associated with particular items in the collection.
Data Clustering: 50 Years Beyond K-Means Author: Anil K. Jain Keywords: clustering, clustering algorithms, semi-supervised clustering, ensemble clustering, simultaneous feature selection, data clustering, large scale data clustering.
Excellent survey and history of clustering.
International Association for Cryptologic Research
Hosts conference proceedings, ePrint Archive, CryptoDB, and other goodies. Membership details for IACR.
Topic map applications need to offer features such as:
*(Important for a range of defense and security applications.)
1st ACM International Health Informatics Symposium – November 11-12, 2010.
Interesting presentations:
Will watch for the call for papers for next year. Would be nice to have a topic map paper or two on the program.
Towards a Principled Theory of Clustering Author: Reza Bosagh Zadeh Keywords: Clustering functions, Single-Linkage, Max-Sum, Minimum/Maximum Spanning Trees, Effective Similarity.
Exploration of methods to characterize clustering algorithms “…in terms of the effective similarity between two points.” A line of research that may make choice of clustering algorithms less arbitrary.
International Journal of Approximate Reasoning – Volume 51, Issue 8, October 2010 has a couple of items of interest:
Redis Snippet for Storing the Social Graph from Alex Popescu, a snippet on storing relationships for a social graph using Redis.
Relationships are just a step away (representationally speaking) from associations. Worth a look.
A million answers to twenty questions: choosing by checklist Authors: Michael Mandler, Paola Manzini, Marco Mariotti Keywords: Bounded rationality, utility maximization, choice function, lexicographic utility.
Mentions:
Checklist users can in effect perform a binary search, which makes the number of preference discriminations they make an exponential function of the number of properties that they use. As a result, an agent who makes a 1,000,000 preference discriminations needs a checklist that is just 20 properties long.
Substitute “identity” for “preference.”
An empirical question: unlike ontologies, classifications and cataloging, the answers come from users.
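The arithmetic behind the “20 properties” figure is easy to check. The sketch below assumes each property is a yes/no test, so n properties separate at most 2**n cases; the function name `checklist_length` is mine, not the authors'.

```python
def checklist_length(discriminations):
    """Smallest number n of yes/no properties such that 2**n >= discriminations."""
    if discriminations < 1:
        raise ValueError("need at least one discrimination")
    # For a positive integer x, (x - 1).bit_length() is the ceiling of log2(x).
    return (discriminations - 1).bit_length()


print(2 ** 20)                      # cases that 20 yes/no properties can separate
print(checklist_length(1_000_000))  # properties needed for a million discriminations
```

Since 2**19 is 524,288 and 2**20 is 1,048,576, nineteen properties fall short of a million and twenty suffice.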
FIS:2010 (3rd Future Internet Symposium 2010) doesn’t quite have the ring of “San Francisco,” but I work with conference announcements as they come in.
Projecting subject identity for subjects in linked data (or data in general) is missing from the program.
Projecting subject identity, performing comparisons and merging on those projections will power effective use of any future Internet.
Expecting a uniform data format is on par with waiting for Esperanto to become universal. You can be self-righteous or you can be effective. I suggest effective.
If you attend FIS:2010, ask the speakers about subject identity projection.
The Topic Map Reference Model can’t claim to have invented the key/value view of the world.
But it is interesting how much traction key/value pair approaches have been getting of late. From NoSQL in general to Neo4j and Redis in particular. (no offense to other NoSQL contenders, those are the two that came to mind)
Declare which key/value pairs identify a subject and you are on your way towards a subject-centric view of computing.
OK, there are some details but declaring how you identify a subject is the first step in enabling others to reliably identify the same subject.
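A minimal sketch of that first step, in Python. The record layout, the choice of `isbn` as the identifying key, and the merge behavior are all assumptions for illustration; this is not the merging rule of the Topic Map Reference Model, only a picture of what “declare which key/value pairs identify a subject” could look like.

```python
def identity_key(record, identifying_keys):
    """Return the identifying key/value pairs, or None if a declared key is missing."""
    if all(k in record for k in identifying_keys):
        return tuple((k, record[k]) for k in identifying_keys)
    return None


def merge_by_identity(records, identifying_keys):
    """Combine records that share the same identifying key/value pairs.

    Values are collected into sets so that conflicting values are kept, not lost.
    Records lacking an identifying key are passed through unmerged.
    """
    merged = {}
    unidentified = []
    for record in records:
        key = identity_key(record, identifying_keys)
        if key is None:
            unidentified.append({k: {v} for k, v in record.items()})
            continue
        target = merged.setdefault(key, {})
        for k, v in record.items():
            target.setdefault(k, set()).add(v)
    return list(merged.values()) + unidentified


# Hypothetical records from two sources describing books.
records = [
    {"isbn": "example-1", "title": "Fuzzy Sets, Uncertainty, and Information"},
    {"isbn": "example-1", "author": "Klir and Folger"},
    {"isbn": "example-2", "title": "Some other book"},
]

for subject in merge_by_identity(records, identifying_keys=("isbn",)):
    print(subject)
```

The first two records declare the same identifying pair, so they end up as one subject carrying both the title and the author. Someone else who knows the declaration can reach the same result with their own data.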
Using SQL Cross Join – the report writers secret weapon is a very clear explanation of the utility of cross-joins in SQL.
Cross-join = Cartesian product, something you will remember from the Topic Maps Reference Model.
Makes a robust where clause look important, doesn’t it?
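A quick way to see the point is to run a cross join against an in-memory SQLite database from Python. The tables and their contents are invented for this sketch and are not taken from the linked article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE months (m TEXT)")
conn.execute("CREATE TABLE regions (r TEXT)")
conn.executemany("INSERT INTO months VALUES (?)", [("Jan",), ("Feb",), ("Mar",)])
conn.executemany("INSERT INTO regions VALUES (?)", [("North",), ("South",)])

# Cross join: every row of months paired with every row of regions.
all_pairs = conn.execute(
    "SELECT m, r FROM months CROSS JOIN regions"
).fetchall()

# The same join, narrowed by a where clause.
north_only = conn.execute(
    "SELECT m, r FROM months CROSS JOIN regions WHERE r = 'North'"
).fetchall()

print(len(all_pairs))   # 3 months x 2 regions
print(len(north_only))  # only the pairs that survive the where clause
conn.close()
```

Three rows times two rows gives six pairs. With tables of a few thousand rows each, the unfiltered product runs into the millions, which is why the where clause matters.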
LNCS Volume 6304: Artificial Intelligence: Methodology, Systems, and Applications edited by Darina Dicheva, Danail Dochev, has, among other interesting titles, the following:
Informatics – University of Michigan is described as:
Informatics combines solid grounding in computer programming, mathematics, and statistics, combined with study of the ethical and social science aspects of complex information systems. Informatics majors learn to critically analyze various approaches to processing information and develop skills to design, implement, and evaluate the next generation of information technology tools.
Sounds like a good place to look for potential topic map authors and/or to promote the use of topic maps!
A Survey of Binary Similarity and Distance Measures Authors: Seung-Seok Choi, Sung-Hyuk Cha, Charles C. Tappert Keywords: binary similarity measure, binary distance measure, hierarchical clustering, classification, operational taxonomic unit. (Journal of Systemics, Cybernetics and Informatics, Vol. 8, No. 1, pp. 43-48, 2010)
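As a taste of what such measures look like, here is a short Python sketch of three familiar ones computed from the usual four-way count of agreements and disagreements between two binary vectors. The example vectors are invented, and the function names are mine; consult the survey itself for the full catalog and its exact notation.

```python
def match_counts(x, y):
    """Count positions by pattern: a = both 1, b = x only, c = y only, d = both 0."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def jaccard(x, y):
    """Shared presences over all presences; joint absences are ignored."""
    a, b, c, _ = match_counts(x, y)
    return a / (a + b + c) if (a + b + c) else 1.0


def simple_matching(x, y):
    """All agreements, presences and absences alike, over the vector length."""
    a, b, c, d = match_counts(x, y)
    return (a + d) / (a + b + c + d)


def hamming(x, y):
    """Number of positions where the vectors disagree."""
    _, b, c, _ = match_counts(x, y)
    return b + c


# Hypothetical binary feature vectors for two subject representatives.
x = [1, 1, 0, 0, 1]
y = [1, 0, 0, 1, 1]
print(jaccard(x, y), simple_matching(x, y), hamming(x, y))
```

Whether a shared absence should count as evidence of similarity is the main design choice here: Jaccard ignores it, simple matching counts it.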