Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 25, 2010

parf: Parallel Random Forest Algorithm

Filed under: Classification,Random Forests — Patrick Durusau @ 9:28 pm

From the website:

The Random Forests algorithm is one of the best among the known classification algorithms, able to classify big quantities of data with great accuracy. Also, this algorithm is inherently parallelisable.

Originally, the algorithm was written in the programming language Fortran 77, which is obsolete and does not provide many of the capabilities of modern programming languages; also, the original code is not an example of “clear” programming, so it is very hard to employ in education. Within this project the program is adapted to Fortran 90. In contrast to Fortran 77, Fortran 90 is a structured programming language, legible — to researchers as well as to students.

The creator of the algorithm, Berkeley professor emeritus Leo Breiman, expressed a big interest in this idea in our correspondence. He has confirmed that no one has yet worked on a parallel implementation of his algorithm, and promised his support and help. Leo Breiman is one of the pioneers in the fields of machine learning and data mining, and a co-author of the first significant programs (CART – Classification and Regression Trees) in that field.

Well, while I was at code.google.com I decided to look around for any resources that might interest topic mappers in the new year. This one caught my eye.

Not much apparent activity, so this might be one where a volunteer or two could make a real difference.
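The website's claim that Random Forests are "inherently parallelisable" comes from the fact that each tree is grown independently on its own bootstrap sample. Here is a minimal Python sketch of that idea. It is not parf's Fortran 90 code, and it makes several simplifying assumptions: decision stumps (one-split trees) stand in for full trees, the toy data is made up, and a thread pool stands in for real parallel hardware. In CPython, threads interleave rather than run simultaneously, so a process pool or a cluster would be needed for an actual speed-up.

```python
import random
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def majority(labels, default=None):
    """Most common label, or `default` if there are no labels."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def train_stump(data, n_features, seed):
    """Grow one 'tree' (a one-split stump) on its own bootstrap sample.

    The result depends only on `data` and `seed`, so stumps can be grown in parallel.
    """
    rng = random.Random(seed)
    sample = [rng.choice(data) for _ in data]  # bootstrap: draw with replacement
    fallback = majority([y for _, y in sample])
    feats = rng.sample(range(n_features), max(1, n_features // 2))  # random feature subset
    best = None
    for f in feats:
        for t in sorted({x[f] for x, _ in sample}):
            left = [y for x, y in sample if x[f] <= t]
            right = [y for x, y in sample if x[f] > t]
            l_lab, r_lab = majority(left, fallback), majority(right, fallback)
            errors = sum(y != l_lab for y in left) + sum(y != r_lab for y in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    return best[1:]  # (feature, threshold, left label, right label)


def predict_stump(stump, x):
    f, t, l_lab, r_lab = stump
    return l_lab if x[f] <= t else r_lab


def train_forest(data, n_trees=25, workers=4):
    """Grow n_trees stumps concurrently; each task gets its own seed."""
    n_features = len(data[0][0])
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: train_stump(data, n_features, s), range(n_trees)))


def predict_forest(forest, x):
    """Majority vote over all stumps."""
    return majority([predict_stump(s, x) for s in forest])


# Made-up toy data: both features separate class "a" (i < 5) from class "b".
data = [((i, 10 - i), "a" if i < 5 else "b") for i in range(10)]
forest = train_forest(data)
print(predict_forest(forest, (0, 10)), predict_forest(forest, (9, 1)))
```

Because `train_stump` depends only on the data and its own seed, the forest comes out the same whether the trees are built one at a time or all at once. That independence is what makes the algorithm easy to spread across processors.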

December 7, 2010

A Library Case For Topic Maps

Filed under: Classification,Examples,Subject Identity,Topic Maps — Patrick Durusau @ 7:40 pm

Libraries would benefit from topic maps in a number of ways but I ran across a very specific one today.

To escape the paralyzing grip of library vendors, a number of open source library software projects, some system-wide and some even state-wide, are now underway.

OK, so you have a central registry of all the books. But the local libraries have millions of books with call numbers already assigned.

Libraries can either spend years and $millions to transition to uniform identifiers (doesn’t that sound “webby” to you?) or they can keep the call numbers they have.

Here is a real life example of the call numbers for Everybody’s Plutarch:

920 PLU

920.3 P

920 PLUT

R 920 P

920 P

Solution? One record (can you say proxy?) for this book with details for the various individual library holdings.

Libraries are already doing this so what is the topic map payoff?

Say I write a review of Everybody’s Plutarch and post it to the local library system with call number 920 P.

With a topic map, users of the system with 920.3 P (or any of the others) will also see my review.

The topic map payoff is that we can benefit from the contributions of others as well as contribute ourselves.

(Without having to move in mental lock step.)
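The proxy idea fits in a few lines of Python. This is a hypothetical illustration, not any library system's software. The record identifier `everybodys-plutarch` and the review text are made up. To keep the sketch short, call numbers serve as keys on their own; a real system would qualify each call number with its holding library, since “920 P” at one branch may be a different book elsewhere.

```python
from collections import defaultdict


class ProxyCatalog:
    """One proxy record per work; many local call numbers point to it."""

    def __init__(self):
        self._record_for = {}              # call number -> proxy record id
        self._reviews = defaultdict(list)  # proxy record id -> reviews

    def add_holding(self, record_id, call_number):
        self._record_for[call_number] = record_id

    def post_review(self, call_number, text):
        """A review posted under any call number is stored on the shared record."""
        self._reviews[self._record_for[call_number]].append(text)

    def reviews_for(self, call_number):
        return list(self._reviews[self._record_for[call_number]])


catalog = ProxyCatalog()
for call_number in ["920 PLU", "920.3 P", "920 PLUT", "R 920 P", "920 P"]:
    catalog.add_holding("everybodys-plutarch", call_number)  # hypothetical record id

catalog.post_review("920 P", "My review of Everybody's Plutarch.")
print(catalog.reviews_for("920.3 P"))
```

No library has to give up its call numbers. Each one just points at the shared record, and the contributions accumulate there.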

December 6, 2010

Survey on Social Tagging Techniques

Filed under: Bookmarking,Classification,Folksonomy,Tagging — Patrick Durusau @ 6:37 am

Survey on Social Tagging Techniques Authors: Manish Gupta, Rui Li, Zhijun Yin, Jiawei Han Keywords: Social tagging, bookmarking, tagging, social indexing, social classification, collaborative tagging, folksonomy, folk classification, ethnoclassification, distributed classification, folk taxonomy

Abstract:

Social tagging on online portals has become a trend now. It has emerged as one of the best ways of associating metadata with web objects. With the increase in the kinds of web objects becoming available, collaborative tagging of such objects is also developing along new dimensions. This popularity has led to a vast literature on social tagging. In this survey paper, we would like to summarize different techniques employed to study various aspects of tagging. Broadly, we would discuss about properties of tag streams, tagging models, tag semantics, generating recommendations using tags, visualizations of tags, applications of tags and problems associated with tagging usage. We would discuss topics like why people tag, what influences the choice of tags, how to model the tagging process, kinds of tags, different power laws observed in tagging domain, how tags are created, how to choose the right tags for recommendation, etc. We conclude with thoughts on future work in the area.

I recommend this survey in part due to its depth but also for not lacking a viewpoint:

…But fixed static taxonomies are rigid, conservative, and centralized. [cite omitted]…Hierarchical classifications are influenced by the cataloguer’s view of the world and, as a consequence, are affected by subjectivity and cultural bias. Rigid hierarchical classification systems cannot easily keep up with an increasing and evolving corpus of items…By their very nature, hierarchies tend to establish only one consistent, authoritative structured vision. This implies a loss of precision, erases differences of expression, and does not take into account the variety of user needs and views.

I am not innocent of having made similar arguments in other contexts. It makes good press among the young and dissatisfied, but it doesn’t bear up under close scrutiny.

For example, the claim is made that “hierarchical classifications” are “affected by subjectivity and cultural bias.” The implied claim is that social tagging is not. Yes? I would argue that all classification, hierarchical and otherwise, is affected by “subjectivity and cultural bias.”

Questions:

  1. Choose one of the other claims about hierarchical classifications. Is it also true of social tagging? Why/Why not? (3-5 pages, no citations)
  2. Choose a social tagging practice. What are its strengths/weaknesses? (3-5 pages, no citations)
  3. How would you use topic maps with the social tagging practice in #2? (3-5 pages, no citations)

to_be_classified: A Facet Analysis of a Folksonomy

Filed under: Classification,Facets,Folksonomy,Ranganathan — Patrick Durusau @ 5:37 am

to_be_classified: A Facet Analysis of a Folksonomy Author Elise Conradi Keywords Facet analysis, Faceted classification, VDP::Samfunnsvitenskap: 200::Biblioteks- og informasjonsvitenskap: 320::Kunnskapsgjenfinning og organisering: 323

Abstract:

This research examines Ranganathan’s postulational approach to facet analysis with the intention of manually inducing a faceted classification ontology from a folksonomy. Folksonomies are viewed as a source to a wealth of data representing users’ perspectives. An in-depth study of faceted classification theory is used to form a methodology based on the postulational approach. The dataset used to test the methodology consists of over 107,000 instances of 1,275 unique tags representing 76 popular non-fiction history books collected from the LibraryThing folksonomy. Preliminary results of the facet analysis indicate the manual inducement of two faceted classification ontologies in the dataset; one representing the universe of books and one representing the universe of subjects within the universe of books. The ontology representing the universe of books is considered to be complete, whereas the ontology representing the universe of subjects is incomplete. These differences are discussed in light of theoretical differences between special and universal faceted classifications. The induced ontologies are then discussed in terms of their substantiation or violation of Ranganathan’s Canons of Classification.

Highly recommended. Expect back references to this entry in the coming months.

Questions:

  1. Is Ranganathan’s “idea plane” for work in classification different from Husserl’s “bracketing?” If so, how? (3-5 pages, citations)
  2. How would you distinguish the “idea plane” from the “verbal plane?” (3-5 pages, no citations)
  3. How would you compare the “idea planes” as seen by two different classifiers? (3-5 pages, no citations)

November 30, 2010

Apache Mahout – Website

Filed under: Classification,Clustering,Data Mining,Mahout,Pattern Recognition,Software — Patrick Durusau @ 8:54 pm

Apache Mahout

From the website:

Apache Mahout’s goal is to build scalable machine learning libraries. With scalable we mean:

Scalable to reasonably large data sets. Our core algorithms for clustering, classification and batch based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. However we do not restrict contributions to Hadoop based implementations: Contributions that run on a single node or on a non-Hadoop cluster are welcome as well. The core libraries are highly optimized to allow for good performance also for non-distributed algorithms.

Current capabilities include:

  • Collaborative Filtering
  • User and Item based recommenders
  • K-Means, Fuzzy K-Means clustering
  • Mean Shift clustering
  • Dirichlet process clustering
  • Latent Dirichlet Allocation
  • Singular value decomposition
  • Parallel Frequent Pattern mining
  • Complementary Naive Bayes classifier
  • Random forest decision tree based classifier
  • High performance Java collections (previously Colt collections)

A topic maps class will only have enough time to show some examples of using Mahout. Perhaps an informal group?
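To give a flavor of the map/reduce paradigm the Mahout website mentions, here is one K-Means iteration written as a map step and a reduce step in plain Python. It sketches the general idea only. It is not Mahout's API or its Hadoop job, and the points and starting centroids are made up.

```python
import math
from collections import defaultdict


def kmeans_map(point, centroids):
    """Map: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return nearest, (point, 1)


def kmeans_reduce(key, values):
    """Reduce: add up (vector, count) pairs for one centroid and return their mean."""
    dims = len(values[0][0])
    sums = [0.0] * dims
    count = 0
    for vector, n in values:
        for d in range(dims):
            sums[d] += vector[d]
        count += n
    return key, tuple(s / count for s in sums)


def kmeans_iteration(points, centroids):
    # Shuffle: group mapper output by key, as Hadoop does between map and reduce.
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    new = list(centroids)  # a centroid that attracts no points keeps its position
    for key, values in groups.items():
        _, new[key] = kmeans_reduce(key, values)
    return new


points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5)]
centroids = [(0.0, 0.0), (10.0, 10.0)]
for _ in range(3):
    centroids = kmeans_iteration(points, centroids)
print(centroids)
```

The reducer only adds up `(vector, count)` pairs. That means partial sums computed on each node could be fed to it unchanged, which is the usual way such jobs cut network traffic. A driver program would repeat the job until the centroids stop moving.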

November 20, 2010

Classification and Pattern Discovery of Mood in Weblogs

Filed under: Classification,Clustering,Pattern Recognition — Patrick Durusau @ 10:18 am

Classification and Pattern Discovery of Mood in Weblogs Authors: Thin Nguyen, Dinh Phung, Brett Adams, Truyen Tran, Svetha Venkatesh

Abstract:

Automatic data-driven analysis of mood from text is an emerging problem with many potential applications. Unlike generic text categorization, mood classification based on textual features is complicated by various factors, including its context- and user-sensitive nature. We present a comprehensive study of different feature selection schemes in machine learning for the problem of mood classification in weblogs. Notably, we introduce the novel use of a feature set based on the affective norms for English words (ANEW) lexicon studied in psychology. This feature set has the advantage of being computationally efficient while maintaining accuracy comparable to other state-of-the-art feature sets experimented with. In addition, we present results of data-driven clustering on a dataset of over 17 million blog posts with mood groundtruth. Our analysis reveals an interesting, and readily interpreted, structure to the linguistic expression of emotion, one that comprises valuable empirical evidence in support of existing psychological models of emotion, and in particular the dipoles pleasure-displeasure and activation-deactivation.

The classification and pattern discovery of sentiment in weblogs will be a high priority for some topic maps.

Detection of teenagers who post to MySpace about violence, for example.

Questions:

  1. How would you use this technique for research on weblogs? (3-5 pages, no citations)
  2. What other word lists could be applied to research on weblogs? Thoughts on how they could be applied? (3-5 pages, citations)
  3. Does the “mood” of a text impact its classification in traditional schemes? How would you test that question? (3-5 pages, no citations)

Additional resources:

Affective Norms for English Words (ANEW) Instruction Manual and Affective Ratings

ANEW Message: Request form for ANEW word list.
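A minimal sketch of how a lexicon such as ANEW can become features for mood classification. The idea is to average the valence (pleasure-displeasure) and arousal (activation-deactivation) ratings of the words in a post that appear in the lexicon. The ratings below are invented placeholders, not ANEW values; the real list has to be requested. Plain averaging is also an assumed, simplified feature scheme, not the paper's exact method.

```python
import re

# HYPOTHETICAL placeholder ratings on a 1-9 scale: NOT real ANEW values.
TOY_LEXICON = {
    "happy": (8.0, 6.0),  # word -> (valence, arousal)
    "calm": (7.0, 2.0),
    "angry": (2.5, 7.5),
    "sad": (2.0, 4.0),
}


def affect_features(text, lexicon=TOY_LEXICON):
    """Return (mean valence, mean arousal, number of lexicon hits) for one post."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return None, None, 0
    valence = sum(v for v, _ in hits) / len(hits)
    arousal = sum(a for _, a in hits) / len(hits)
    return valence, arousal, len(hits)


print(affect_features("So happy and calm today!"))
```

Two or three numbers per post are far cheaper to compute and store than a full bag-of-words vector. That fits the abstract's point that this feature set is computationally efficient.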

November 17, 2010

RDA: Resource Description and Access

Filed under: Cataloging,Classification,RDA,Subject Identity,Topic Maps — Patrick Durusau @ 11:06 am

From the website:

RDA: Resource Description and Access is the new standard for resource description and access designed for the digital world. Built on the foundations established by AACR2, RDA provides a comprehensive set of guidelines and instructions on resource description and access covering all types of content and media. (emphasis in original)

In case you are interested in the 2008 draft, just to get the flavor of it, see: http://www.rdatoolkit.org/constituencyreview.

More to follow on RDA and topic maps.

November 8, 2010

Combining the Missing Link: An Incremental Topic Model of Document Content and Hyperlink

Filed under: Classification,Data Mining,Link-IPLSI — Patrick Durusau @ 7:59 am

Combining the Missing Link: An Incremental Topic Model of Document Content and Hyperlink Authors: Huifang Ma, Zhixin Li and Zhongzhi Shi Keywords: Topic model, Link-IPLSI, Incremental Learning, Adaptive Asymmetric learning

Abstract:

The content and structure of linked information such as sets of web pages or research paper archives are dynamic and keep on changing. Even though different methods are proposed to exploit both the link structure and the content information, no existing approach can effectively deal with this evolution. We propose a novel joint model, called Link-IPLSI, to combine texts and links in a topic modeling framework incrementally. The model takes advantage of a novel link updating technique that can cope with dynamic changes of online document streams in a faster and scalable way. Furthermore, an adaptive asymmetric learning method is adopted to freely control the assignment of weights to terms and citations. Experimental results on two different sources of online information demonstrate the time saving strength of our method and indicate that our model leads to systematic improvements in the quality of classification.
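In Link-IPLSI, the weighting of terms against citations happens inside the topic model. The sketch below uses a much simpler stand-in: it blends a term-based and a link-based cosine similarity with a single fixed weight `alpha`. It is not Link-IPLSI. It only shows what "controlling the weights" of the two kinds of evidence means. The documents and `alpha` values are made up.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def blended_similarity(doc1, doc2, alpha=0.5):
    """alpha weights term evidence; (1 - alpha) weights link/citation evidence."""
    text_sim = cosine(Counter(doc1["terms"]), Counter(doc2["terms"]))
    link_sim = cosine(Counter(doc1["links"]), Counter(doc2["links"]))
    return alpha * text_sim + (1 - alpha) * link_sim


d1 = {"terms": ["topic", "model", "link"], "links": ["p1", "p2"]}
d2 = {"terms": ["topic", "model", "graph"], "links": ["p2", "p3"]}
for alpha in (1.0, 0.5, 0.0):
    print(alpha, round(blended_similarity(d1, d2, alpha), 3))
```

Here `alpha` is set by hand, once, for the whole collection. According to the abstract, the paper's method adjusts this balance itself as documents and links change.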

Questions:

  1. Timed expiration of documents and terms? Appropriate for library settings? (discussion)
  2. Citations treated same as hyperlinks? (Aren’t citations more granular?) (3-5 pages, citations)
  3. What do we lose by citing documents rather than concepts/locations in documents? (3-5 pages, citations)

PS: The updating aspects of this paper are very important. Static data exists but isn’t very common in enterprise applications.

November 7, 2010

Parallel Implementation of Classification Algorithms Based on MapReduce

Filed under: Classification,Data Mining,Hadoop,MapReduce — Patrick Durusau @ 8:31 pm

Parallel Implementation of Classification Algorithms Based on MapReduce Authors: Qing He, Fuzhen Zhuang, Jincheng Li and Zhongzhi Shi Keywords: Data Mining, Classification, Parallel Implementation, Large Dataset, MapReduce

Abstract:

Data mining has attracted extensive research for several decades. As an important task of data mining, classification plays an important role in information retrieval, web searching, CRM, etc. Most of the present classification techniques are serial, which become impractical for large dataset. The computing resource is under-utilized and the executing time is not waitable. Provided the program mode of MapReduce, we propose the parallel implementation methods of several classification algorithms, such as k-nearest neighbors, naive bayesian model and decision tree, etc. Preparatory experiments show that the proposed parallel methods can not only process large dataset, but also can be extended to execute on a cluster, which can significantly improve the efficiency.

From the paper:

In this paper, we introduced the parallel implementation of several classification algorithms based on MapReduce, which make them be applicable to mine large dataset. The key is to design the proper key/value pairs. (emphasis in original)
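Naive Bayes training shows clearly why "the key is to design the proper key/value pairs," because every statistic it needs is a count. The sketch below simulates map, shuffle, and reduce in plain Python and assumes categorical features. The key layout `("feature", label, feature index, value)` is one reasonable choice, not necessarily the one in the paper. The weather-style records are made up.

```python
from collections import defaultdict
from itertools import chain


def nb_map(record):
    """Map: one training record -> (key, 1) pairs for every count naive Bayes needs."""
    features, label = record
    yield ("class", label), 1                  # numerator of the prior P(label)
    for i, value in enumerate(features):
        yield ("feature", label, i, value), 1  # numerator of P(feature i = value | label)


def shuffle(pairs):
    """Group mapper output by key, as the MapReduce framework does."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def nb_reduce(key, values):
    """Reduce: add up the counts for one key."""
    return key, sum(values)


def predict(features, counts, n_values=2):
    """Pick the label maximizing P(label) * prod P(value | label).

    Uses add-one smoothing; n_values (possible values per feature) is an
    assumption that holds for the toy data below.
    """
    labels = {k[1]: n for k, n in counts.items() if k[0] == "class"}
    total = sum(labels.values())
    best_label, best_score = None, -1.0
    for label, n_label in sorted(labels.items()):
        score = n_label / total
        for i, value in enumerate(features):
            score *= (counts.get(("feature", label, i, value), 0) + 1) / (n_label + n_values)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training records: (features, label).
records = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "yes"),
    (("rainy", "mild"), "yes"),
    (("rainy", "hot"), "no"),
    (("sunny", "mild"), "yes"),
]
mapped = chain.from_iterable(nb_map(r) for r in records)
counts = dict(nb_reduce(k, vs) for k, vs in shuffle(mapped).items())
print(predict(("sunny", "mild"), counts), predict(("rainy", "hot"), counts))
```

The reducer only adds integers, so it can also run as a combiner on each mapper node. Choosing keys whose values can be summed in any order is what lets the job spread across a cluster.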

Questions:

  1. Annotated bibliography of parallel classification algorithms (newer than this paper, 3-5 pages, citations)
  2. Report for class on application of parallel classification algorithms (report + paper)
  3. Application of parallel classification algorithm to a library dataset (project)
  4. Can the key/value pairs be interchanged with others? Yes/no, why? (3-5 pages, no citations.)

November 6, 2010

The AQ Methods for Concept Drift

Filed under: Authoring Topic Maps,Classification,Concept Drift,Topic Maps — Patrick Durusau @ 4:51 am

The AQ Methods for Concept Drift Author: Marcus A. Maloof Keywords: online learning, concept drift, aq algorithm, ensemble methods

Abstract:

Since the mid-1990’s, we have developed, implemented, and evaluated a number of learning methods that cope with concept drift. Drift occurs when the target concept that a learner must acquire changes over time. It is present in applications involving user preferences (e.g., calendar scheduling) and adversaries (e.g., spam detection). We based early efforts on Michalski’s aq algorithm, and our more recent work has investigated ensemble methods. We have also implemented several methods that other researchers have proposed. In this chapter, we survey results that we have obtained since the mid-1990’s using the Stagger concepts and learning methods for concept drift. We examine our methods based on the aq algorithm, our ensemble methods, and the methods of other researchers. Dynamic weighted majority with an incremental algorithm for producing decision trees as the base learner achieved the best overall performance on this problem with an area under the performance curve after the first drift point of .882. Systems based on the aq11 algorithm, which incrementally induces rules, performed comparably, achieving areas of .875. Indeed, an aq11 system with partial instance memory and Widmer and Kubat’s window adjustment heuristic achieved the best performance with an overall area under the performance curve, with an area of .898.

The author offers this definition of concept drift:

Concept drift [19, 30] is a phenomenon in which examples have legitimate labels at one time and have different legitimate labels at another time. Geometrically, if we view a target concept as a cloud of points in a feature space, concept drift may entail the cloud changing its position, shape, and size. From the perspective of Bayesian decision theory, these transformations equate to changes to the form or parameters of the prior and class-conditional distributions.

Hmmm, “legitimate labels,” sounds like a job for topic maps doesn’t it?
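The Stagger concepts mentioned in the abstract make the definition concrete. As they are usually described in the concept-drift literature, each example has a size, a color, and a shape, and the target concept changes twice: (1) size = small and color = red, (2) color = green or shape = circular, (3) size = medium or large. The sketch shows one example changing its "legitimate label" over time. It also includes a deliberately naive learner that copes with drift only by forgetting, keeping a fixed-size window of recent examples. That learner is illustrative only. It is not aq11, dynamic weighted majority, or Widmer and Kubat's adaptive window.

```python
from collections import deque
from itertools import product

SIZES = ("small", "medium", "large")
COLORS = ("red", "green", "blue")
SHAPES = ("square", "circular", "triangular")

# The three Stagger target concepts, in the order they usually occur in the stream.
CONCEPTS = [
    lambda size, color, shape: size == "small" and color == "red",
    lambda size, color, shape: color == "green" or shape == "circular",
    lambda size, color, shape: size in ("medium", "large"),
]


class WindowLearner:
    """Remembers only the last `size` labeled examples (a fixed window)."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old examples fall out automatically

    def learn(self, example, label):
        self.window.append((example, label))

    def predict(self, example):
        """Majority vote over stored copies of `example`; False if none are stored."""
        votes = [label for seen, label in self.window if seen == example]
        return sum(votes) * 2 > len(votes) if votes else False


x = ("small", "red", "square")
print([concept(*x) for concept in CONCEPTS])  # the same example's legitimate label changes over time

learner = WindowLearner(size=27)  # 27 = one pass over every attribute combination
for concept in CONCEPTS:
    for example in product(SIZES, COLORS, SHAPES):
        learner.learn(example, concept(*example))
    print(learner.predict(x))  # the window reflects only the current concept
```

A topic map would keep the old label instead of forgetting it, recording when each label was legitimate.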

Questions:

  1. Has concept drift been used in library classification? (research question)
  2. How would you use concept drift concepts in library classification? (3-5 pages, no citations)
  3. Demonstrate use of concept drift techniques to augment topic map authoring. (project)

On Classifying Drifting Concepts in P2P Networks

Filed under: Ambiguity,Authoring Topic Maps,Classification,Concept Drift — Patrick Durusau @ 4:07 am

On Classifying Drifting Concepts in P2P Networks Authors: Hock Hee Ang, Vivekanand Gopalkrishnan, Wee Keong Ng and Steven Hoi Keywords: Concept drift, classification, peer-to-peer (P2P) networks, distributed classification

Abstract:

Concept drift is a common challenge for many real-world data mining and knowledge discovery applications. Most of the existing studies for concept drift are based on centralized settings, and are often hard to adapt in a distributed computing environment. In this paper, we investigate a new research problem, P2P concept drift detection, which aims to effectively classify drifting concepts in P2P networks. We propose a novel P2P learning framework for concept drift classification, which includes both reactive and proactive approaches to classify the drifting concepts in a distributed manner. Our empirical study shows that the proposed technique is able to effectively detect the drifting concepts and improve the classification performance.

The authors define the problem as:

Concept drift refers to the learning problem where the target concept to be predicted, changes over time in some unforeseen behaviors. It is commonly found in many dynamic environments, such as data streams, P2P systems, etc. Real-world examples include network intrusion detection, spam detection, fraud detection, epidemiological, and climate or demographic data, etc.

The authors may well have been the first to formulate this problem among mechanical peers, but any humanist could have pointed out examples of concept drift between people, both in the literature and in real life.

Questions:

  1. What are the implications of concept drift for Linked Data? (3-5 pages, no citations)
  2. What are the implications of concept drift for static ontologies? (3-5 pages, no citations)
  3. Is concept development (over time) another form of concept drift? (3-5 pages, citations, illustrations, presentation)

*****
PS: Finding this paper is an illustration of ambiguity leading to serendipitous discovery. I searched for one of the authors instead of the exact title of another paper. While scanning the search results I found this paper.

October 29, 2010

VoxPopuLII – Blog

Filed under: Cataloging,Classification,FRBR,Information Retrieval,Legal Informatics — Patrick Durusau @ 5:46 am

VoxPopuLII.

From the blog:

VoxPopuLII is a guest-blogging project sponsored by the Legal Information Institute at the Cornell Law School. It presents the insights of the very diverse group of people working on legal informatics issues and government information, all around the world. It emphasizes new voices and big ideas.

Not your average blog.

The first post I encountered was: LexML Brazil Project

Questions (about LexML):

  1. What do you think about the strategy to deal with semantic diversity? Pluses? Minuses?
  2. The project says they are following: “Ranganathan’s ‘stratification planes’ classification system…” Your evaluation?
  3. Identify 3 instances of equivalents to the “stratification planes” classification system.
  4. How would you map those 3 instances to Ranganathan’s “stratification planes?”

October 11, 2010

Satrap: Data and Network Heterogeneity Aware P2P Data-Mining

Filed under: Classification,Heterogeneous Data,Networks,Searching,Semantic Diversity — Patrick Durusau @ 6:15 am

Satrap: Data and Network Heterogeneity Aware P2P Data-Mining Authors: Hock Hee Ang, Vivekanand Gopalkrishnan, Anwitaman Datta, Wee Keong Ng, Steven C. H. Hoi Keywords: Distributed classification, P2P network, cascade SVM

Abstract:

Distributed classification aims to build an accurate classifier by learning from distributed data while reducing computation and communication cost. A P2P network where numerous users come together to share resources like data content, bandwidth, storage space and CPU resources is an excellent platform for distributed classification. However, two important aspects of the learning environment have often been overlooked by other works, viz., 1) location of the peers which results in variable communication cost and 2) heterogeneity of the peers’ data which can help reduce redundant communication. In this paper, we examine the properties of network and data heterogeneity and propose a simple yet efficient P2P classification approach that minimizes expensive inter-region communication while achieving good generalization performance. Experimental results demonstrate the feasibility and effectiveness of the proposed solution.

Among the other claims for Satrap:

  • achieves the best accuracy-to-communication cost ratio given that data exchange is performed to improve global accuracy.
  • allows users to control the trade-off between accuracy and communication cost with the user-specified parameters.

I find these two the most interesting.

In part because semantic integration, whether explicit or not, is always a question of cost ratio and tradeoffs.

It would be refreshing to see papers that say which semantic integrations would be too costly with method X or are not possible with method Y.

October 5, 2010

Context-aware intelligent recommender system

Filed under: Classification,Context-aware,Fuzzy Logic — Patrick Durusau @ 6:49 am

Context-aware intelligent recommender system Authors: Mehdi Elahi Keywords: active learning, classification, context-aware, fuzzy logic, recommendation systems, recommenders

Abstract:

This demo paper presents a context-aware recommendation system. The system mines data from user’s web searches and other sources to improve the presentation of content on visited web pages. While user is browsing the internet, a memory resident agent records and analyzes the content of the webpages that were either searched for or visited in order to identify topic preferences. Then, based on such information, the content of requested web page is ranked and classified with different styles. The demo shows how a music weblog can be modified automatically based on user’s affinities.

Context-aware recommendation systems help present relevant information in large topic maps, but I am more interested in their use for authoring systems.

Automatic construction of topics/roles/associations based on prior choices (for user approval) comes to mind.

Not a tool for a casual author but certainly a power tool for professional information explorers. (librarians?)
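The loop in the abstract (record visited pages, infer topic preferences, rank new content) can be sketched with a plain term-frequency profile. This is an assumed simplification. The demo system is described in terms of fuzzy logic and active learning, and this sketch attempts neither. The page texts are made up.

```python
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z]+", text.lower())


class PreferenceProfile:
    """Accumulates term counts from pages the user searched for or visited."""

    def __init__(self):
        self.terms = Counter()

    def record_visit(self, page_text):
        self.terms.update(tokens(page_text))

    def score(self, item_text):
        """Share of the profile's weight carried by the item's distinct words."""
        total = sum(self.terms.values())
        if total == 0:
            return 0.0
        return sum(self.terms[w] for w in set(tokens(item_text))) / total

    def rank(self, items):
        return sorted(items, key=self.score, reverse=True)


profile = PreferenceProfile()
profile.record_visit("jazz piano trio live recording")
profile.record_visit("new jazz releases and piano solos")
posts = ["Metal festival lineup announced", "Piano jazz standards reissued", "Jazz guitar lessons"]
print(profile.rank(posts))
```

For authoring, the same profile could rank candidate topics or associations and offer the top few to the author for approval.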

October 3, 2010

Exploratory information search by domain experts and novices

Exploratory information search by domain experts and novices Authors: Ruogu Kang, Wai-Tat Fu Keywords: domain expertise, exploratory search, social search

Abstract:

The arising popularity of social tagging system has the potential to transform traditional web search into a new era of social search. Based on the finding that domain expertise could influence search behavior in traditional search engines, we hypothesized and tested the idea that domain expertise would have similar influence on search behavior in a social tagging system. We conducted an experiment comparing search behavior of experts and novices when they searched using a tradition search engine and a social tagging system. Results from our experiment showed that experts relied more on their own domain knowledge to generate search queries, while novices were influenced more by social cues in the social tagging system. Experts were also found to conform to each other more than novices in their choice of bookmarks and tags. Implications on the design of future social information systems are discussed.

Empirical validation of the idea that expert searchers (dare I say librarians?) can improve the search results for “novice” searchers.

A line of research that librarians need to take up and expand to combat budget cuts by the uninformed.

Note that experts suffer from the “vocabulary” problem just like novices, just in more sophisticated ways.

Designing a thesaurus-based comparison search interface for linked cultural heritage sources

Filed under: Classification,Heterogeneous Data,Interface Research/Design,Thesaurus — Patrick Durusau @ 7:15 am

Designing a thesaurus-based comparison search interface for linked cultural heritage sources Authors: Alia Amin, Michiel Hildebrand, Jacco van Ossenbruggen, Lynda Hardman Keywords: comparison search, thesauri, cultural heritage

Prototype: LISA, e-culture.multimedian.nl

Abstract:

Comparison search is an information seeking task where a user examines individual items or sets of items for similarities and differences. While this is a known information need among experts and knowledge workers, appropriate tools are not available. In this paper, we discuss comparison search in the cultural heritage domain, a domain characterized by large, rich and heterogeneous data sets, where different organizations deploy different schemata and terminologies to describe their artifacts. This diversity makes meaningful comparison difficult. We developed a thesaurus-based comparison search application called LISA, a tool that allows a user to search, select and compare sets of artifacts. Different visualizations allow users to use different comparison strategies to cope with the underlying heterogeneous data and the complexity of the search tasks. We conducted two user studies. A preliminary study identifies the problems experts face while performing comparison search tasks. A second user study examines the effectiveness of LISA in helping to solve comparison search tasks. The main contribution of this paper is to establish design guidelines for the data and interface of a comparison search application. Moreover, we offer insights into when thesauri and metadata are appropriate for use in such applications.

User-centric project that develops an interface into heterogeneous data sets.

What I would characterize as pre-mapping, that is, a stage at which no “canonical” mapping has yet been established.

Perhaps a good idea to preserve a pre-mapping stage, as any mapping represents but one choice among many.

September 29, 2010

LingPipe

Filed under: Classification,Clustering,Entity Extraction,Full-Text Search,Searching — Patrick Durusau @ 7:06 am

LingPipe.

The tutorial listing for LingPipe is the best summary of its capabilities.

Its sandbox is another “must see” location.

There may be better introductions to linguistic processing but I haven’t seen them.

September 17, 2010

A Logical Account of Lying

Filed under: Classification,Indexing,Subject Identifiers — Patrick Durusau @ 2:46 pm

A Logical Account of Lying Authors: Chiaki Sakama, Martin Caminada and Andreas Herzig Keywords: lying, lies, argumentation systems, artificial intelligence, multiagent systems, intelligent agents.

Abstract:

This paper aims at providing a formal account of lying – a dishonest attitude of human beings. We first formulate lying under propositional modal logic and present basic properties for it. We then investigate why one engages in lying and how one reasons about lying. We distinguish between offensive and defensive lies, or deductive and abductive lies, based on intention behind the act. We also study two weak forms of dishonesty, bullshit and deception, and provide their logical features in contrast to lying. We finally argue dishonesty postulates that agents should try to satisfy for both moral and self-interested reasons. (emphasis in original)

Be the first to have your topic map distinguish between:

  • offensive lies
  • defensive lies
  • deductive lies
  • abductive lies (Someone tweet John Sowa please.)
  • deception
  • bullshit

Subj3ct.com has an identifier for the subject “bullshit,” http://dbpedia.org/resource/Bullshit, but it does not reflect this latest analysis.
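A topic map that wants to record some of these distinctions could start from a common simplified reading. A sincere assertion states what the speaker believes. A lie states what the speaker believes to be false. Bullshit, in Harry Frankfurt's sense, is assertion made without belief either way. The encoding below is hypothetical and for illustration only. The paper's modal-logic definitions are richer: according to the abstract, they distinguish offensive/defensive and deductive/abductive lies by the intention behind the act, and nothing here models intention.

```python
from enum import Enum


class Belief(Enum):
    """The speaker's attitude toward the proposition p being asserted."""
    TRUE = "believes p"
    FALSE = "believes not-p"
    NONE = "believes neither"


def classify_assertion(speaker_belief: Belief) -> str:
    """Classify an assertion of p by the speaker's belief about p (simplified reading)."""
    if speaker_belief is Belief.TRUE:
        return "sincere assertion"
    if speaker_belief is Belief.FALSE:
        return "lie"
    return "bullshit"


for belief in Belief:
    print(belief.value, "->", classify_assertion(belief))
```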

Planet Cataloging

Filed under: Access Points,Authority Record,Cataloging,Classification — Patrick Durusau @ 5:01 am

An aggregation of more than 60 blogs on cataloging.

Read to improve your topic mapping (and cataloging) skills.

September 16, 2010

Almost A Topic Map

Filed under: Cataloging,Classification,Topic Maps — Patrick Durusau @ 4:35 am

The Ann Arbor District Library is a very cool library that has added a topic-map-like characteristic to its catalog.

User tags are stored separately but displayed alongside the controlled vocabulary of the library.

Some subject identifications are more equal than others.

A legitimate choice that enhances both the formal vocabulary as well as the user supplied “tags.”

One small step towards topic maps, ….

*****
Supplemental: 17 September 2010

More than one reader reported that my post was unclear. Here is a bit fuller explanation.

Follow the link Catalog. Next to the search catalog text box you will see a drop-down menu. Select it and you will see “Tags” as one of the options. Those “tags” are supplied by users of the catalog. In other words, you can search by the controlled vocabulary of the library or by user tags. Both are associated with particular items in the collection.
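The arrangement described above can be sketched as two separate fields on each catalog record, plus a search that picks one of them. User tags are kept apart from the controlled vocabulary but attached to the same items. The records, headings, and tags below are hypothetical, and this is not the Ann Arbor District Library's software.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogItem:
    title: str
    subjects: set = field(default_factory=set)  # the library's controlled vocabulary
    tags: set = field(default_factory=set)      # user-supplied tags, stored separately


def search(items, term, by="subjects"):
    """Search by controlled subject headings (by='subjects') or by user tags (by='tags')."""
    if by not in ("subjects", "tags"):
        raise ValueError("by must be 'subjects' or 'tags'")
    term = term.lower()
    return [item.title for item in items if term in {t.lower() for t in getattr(item, by)}]


# Hypothetical records for illustration.
items = [
    CatalogItem("Everybody's Plutarch", subjects={"Biography"}, tags={"greek heroes", "classics"}),
    CatalogItem("Example Title", subjects={"Knowledge representation"}, tags={"classics"}),
]
print(search(items, "biography"))
print(search(items, "classics", by="tags"))
```

Keeping the fields separate is what lets each vocabulary stay intact while both lead to the same items.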

September 15, 2010

Taxonomy for Characterizing Ensemble Methods in Classification Tasks

Filed under: Authoring Topic Maps,Classification,Ensemble Methods — Patrick Durusau @ 8:11 am

Taxonomy for Characterizing Ensemble Methods in Classification Tasks Author: Lior Rokach Keywords: Ensemble-methods; Classification; Boosting; Bagging; Partitioning; Decision trees; Neural networks.

A review and annotated bibliography of work on ensemble methods.

Ensemble methods, I like the sound of that.

Extend it to mean human authors + other methods creating a topic map.

September 12, 2010

LNCS Volume 6304: Artificial Intelligence: Methodology, Systems, and Applications

Filed under: Classification,Ontology,Searching,Subject Identity — Patrick Durusau @ 6:48 pm

LNCS Volume 6304: Artificial Intelligence: Methodology, Systems, and Applications, edited by Darina Dicheva and Danail Dochev, includes, among other interesting titles, the following:

September 11, 2010

76 Binary Similarity and Distance Measures

Filed under: Binary Distance,Binary Similarity,Classification,Pattern Recognition — Patrick Durusau @ 5:53 am

A Survey of Binary Similarity and Distance Measures
Authors: Seung-Seok Choi, Sung-Hyuk Cha, Charles C. Tappert
Keywords: binary similarity measure, binary distance measure, hierarchical clustering, classification, operational taxonomic unit.
(Journal of Systemics, Cybernetics and Informatics, Vol. 8, No. 1, pp. 43-48, 2010)
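Binary similarity and distance measures are commonly written in terms of four counts taken from two binary vectors (the operational taxonomic units): a (both 1), b (first 1, second 0), c (first 0, second 1) and d (both 0). Here is a minimal Python sketch of four familiar measures built from those counts. The function names are my own, and this covers only a handful of the measures the survey catalogs.

```python
def otu_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 vectors.

    a: both 1; b: x is 1, y is 0; c: x is 0, y is 1; d: both 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have equal length")
    pairs = list(zip(x, y))
    a = sum(1 for i, j in pairs if i == 1 and j == 1)
    b = sum(1 for i, j in pairs if i == 1 and j == 0)
    c = sum(1 for i, j in pairs if i == 0 and j == 1)
    d = sum(1 for i, j in pairs if i == 0 and j == 0)
    return a, b, c, d


def jaccard(x, y):
    # Ignores negative matches (d); undefined (ZeroDivisionError) if both vectors are all 0.
    a, b, c, _ = otu_counts(x, y)
    return a / (a + b + c)


def dice(x, y):
    # Like Jaccard, but positive matches count double.
    a, b, c, _ = otu_counts(x, y)
    return 2 * a / (2 * a + b + c)


def simple_matching(x, y):
    # Also known as Sokal-Michener: counts both kinds of agreement.
    a, b, c, d = otu_counts(x, y)
    return (a + d) / (a + b + c + d)


def hamming(x, y):
    # Distance, not similarity: the number of positions that disagree.
    _, b, c, _ = otu_counts(x, y)
    return b + c


x = [1, 1, 0, 1, 0]
y = [1, 0, 0, 1, 1]
print(otu_counts(x, y))       # (2, 1, 1, 1)
print(jaccard(x, y))          # 0.5
print(simple_matching(x, y))  # 0.6
print(hamming(x, y))          # 2
```

Whether to count the negative matches (d) is the main design choice separating many of these measures. When most features are absent from most items, measures that include d can make unrelated items look similar.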

August 2, 2010

…Library of Congress Subject Heading for Social Tags

Filed under: Cataloging,Classification,LCSH,OPACS — Patrick Durusau @ 6:46 pm

“A Semantic Similarity Approach for Predicting Library of Congress Subject Headings for Social Tags,” by Kwan Yi, appears in JASIST, 61(8):1658-1672, 2010. This is an important article for library students to read. Carefully.

The author recognizes that linking social tags to controlled vocabularies may help organize information that is only socially tagged. The article is also a good review of the application of five popular semantic similarity measures.

The more interesting next step would be the reverse of the one the author suggests: “The study of introducing the LCSH to give a control to social tags…” (p. 1670).

Why not introduce “social tags” to enrich the finding experience of users in LCSH settings?

A substantial body of users find information with “social tags,” so why not offer that option?

The user experience with “social tags” alongside LCSH headings in a library setting awaits future research.
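Whichever direction the linking runs, it depends on scoring how close a tag is to a heading. Here is a minimal Python sketch that uses a plain string-similarity stand-in, the Dice coefficient over character bigrams, rather than any of the five semantic similarity measures evaluated in the article. The candidate headings are illustrative only.

```python
def bigrams(text):
    """Set of lowercase character bigrams, e.g. 'cook' -> {'co', 'oo', 'ok'}."""
    s = text.casefold()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice(x, y):
    """Dice coefficient on character-bigram sets, from 0.0 to 1.0."""
    bx, by = bigrams(x), bigrams(y)
    if not bx and not by:
        return 0.0
    return 2 * len(bx & by) / (len(bx) + len(by))


def rank_headings(tag, headings, top=3):
    """Candidate headings sorted by decreasing similarity to a social tag.

    sorted() is stable, so tied headings keep their input order.
    """
    return sorted(headings, key=lambda h: dice(tag, h), reverse=True)[:top]


# Illustrative candidate headings, not drawn from the article's data.
headings = ["Gardening", "Cookery", "Computer programming"]
print(rank_headings("cooking", headings))  # ['Cookery', 'Gardening', 'Computer programming']
```

A character-level measure like this only catches spelling and word-form variants such as “cooking” and “Cookery.” Relating tags to headings that share no spelling is the job of the semantic measures the article reviews.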

June 6, 2010

Citation Indexing

Eugene Garfield’s homepage may not be familiar to topic map fans but it should be.

Garfield invented citation indexing in the late 1950s/early 1960s.

Among the treasures you will find here:

June 4, 2010

representing scientific discourse, or: why triples are not enough

Filed under: Classification,Indexing,Information Retrieval,Ontology,RDF,Semantic Web — Patrick Durusau @ 4:15 pm

representing scientific discourse, or: why triples are not enough by Anita de Waard, Disruptive Technologies Director (how is that for a cool title?), Elsevier Labs, merits a long look.

I won’t spoil the effect by trying to summarize the presentation. It is only 23 slides long.

Read those slides carefully and then get yourself to: Rhetorical Document Structure Group HCLS IG W3C. Read, discuss, contribute.

PS: Based on this slide pack I am seriously thinking of getting a Twitter account so I can follow Anita. Not saying I will but am as tempted as I have ever been. This looks very interesting. Fertile ground for discussion of topic maps.

June 1, 2010

Enhancing navigation in biomedical databases by community voting and database-driven text classification

Enhancing navigation in biomedical databases by community voting and database-driven text classification demonstrates how harnessing community knowledge improves the automatic classification of literature.

From the authors:

Using PepBank as a model database, we show how to build a classification-aided retrieval system that gathers training data from the community, is completely controlled by the database, scales well with concurrent change events, and can be adapted to add text classification capability to other biomedical databases.

The system can be seen at: PepBank.

You need to read the article in full to appreciate what the authors have done, but here are a couple of quick points to notice:

1) The use of heat maps to help users determine the relevance of a given abstract. (Domain-specific facts.)

2) The user interface allows yes/no voting on the same facts as appear in the heat map.

Voting results in reclassification of the entries.

Equally important is a user interface that enables immediate evaluation of relevance and quick user feedback on it.

The user is not asked a series of questions or given complex rating choices; it is simply yes or no. That may seem coarse, but the project demonstrates that, with proper design, it can be very useful.
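The yes/no design is easy to picture in code. Here is a minimal Python sketch that assumes a simple override rule: a decisive community vote replaces the classifier’s judgment, and a tie falls back to the classifier’s score. The quoted abstract says PepBank gathers training data from the community, so this rule, the `Entry` class, and its fields are hypothetical illustrations rather than the authors’ implementation.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    abstract_id: str
    classifier_score: float  # hypothetical relevance probability from a text classifier
    yes_votes: int = 0
    no_votes: int = 0

    def vote(self, relevant: bool) -> None:
        """Record one community yes/no vote."""
        if relevant:
            self.yes_votes += 1
        else:
            self.no_votes += 1

    def is_relevant(self, threshold: float = 0.5) -> bool:
        """Decisive votes override the classifier; otherwise use its score."""
        if self.yes_votes != self.no_votes:
            return self.yes_votes > self.no_votes
        return self.classifier_score >= threshold


entry = Entry("abstract-1", classifier_score=0.8)
print(entry.is_relevant())  # True: no votes yet, so the classifier decides
entry.vote(False)
print(entry.is_relevant())  # False: one "no" vote outweighs the classifier
entry.vote(True)
print(entry.is_relevant())  # True: votes tied, back to the classifier
```

Because each vote is a single bit, it can be recorded and applied immediately, which fits the quick-feedback interface described above.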

May 12, 2010

Gonorrhea and Weapons of Mass Destruction

Filed under: Classification,Concept Hierarchies,Humor,Ontology — Patrick Durusau @ 7:43 am

The Weapons of Mass Destruction (WMD) ontology at the Suggested Upper Merged Ontology (SUMO) website includes Gonorrhea.

Imagine a WMD debate over a Gonorrhea test for all airline passengers, blue ink for their thumbs (positive), along with penicillin shots.

The transmission mechanisms of Gonorrhea make it an unlikely weapon of mass destruction.

The monological nature of the WMD ontology prevents contrary views from being registered. It must, after all, have a determinate result.

Topic map authors can make equally foolish statements. The difference is that contrary views can be registered as well.

May 11, 2010

Subject World

Filed under: Access Points,Cataloging,Classification,Examples,LCSH,OPACS,Subject Headings — Patrick Durusau @ 9:01 am

Subject World (Japanese only)

Subject World is a project to visualize heterogeneous terminology, including catalogs, for use with library catalogs. It uses BSH4 subject headings (Basic Subject Headings) and NDC9 index terms (Nippon Decimal Classification) to visualize and retrieve information from the Osaka City University OPAC.

English language resources:

Subject World: A System for Visualizing OPAC (paper)

Slides with the same title (but published separately from the paper):

Subject World: A System for Visualizing OPAC (slides)

See also: Murakami Harumi Laboratory, in particular its research and publication pages.
