Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 10, 2011

Infovis vs. Statistical Graphics

Filed under: Visualization — Patrick Durusau @ 10:45 am

Infovis vs. Statistical Graphics Authors: Andrew Gelman and Antony Unwin.

One commonality between information visualization and statistical graphics is that it is easily possible to do both poorly. 😉

This is an entertaining overview of both. Recommended.

January 9, 2011

Apache UIMA

Apache UIMA

From the website:

Unstructured Information Management applications are software systems that analyze large volumes of unstructured information in order to discover knowledge that is relevant to an end user. An example UIM application might ingest plain text and identify entities, such as persons, places, organizations; or relations, such as works-for or located-at.

UIMA enables applications to be decomposed into components, for example “language identification” => “language specific segmentation” => “sentence boundary detection” => “entity detection (person/place names etc.)”. Each component implements interfaces defined by the framework and provides self-describing metadata via XML descriptor files. The framework manages these components and the data flow between them. Components are written in Java or C++; the data that flows between components is designed for efficient mapping between these languages.

UIMA additionally provides capabilities to wrap components as network services, and can scale to very large volumes by replicating processing pipelines over a cluster of networked nodes.

The UIMA project offers a number of annotators that produce structured information from unstructured texts.
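The component pipeline described above can be sketched in a few lines. This is a minimal illustration in Python of the decomposition idea, not UIMA's actual Java/C++ API; the component names and the tiny gazetteer are hypothetical stand-ins for real analysis engines.

```python
# Sketch of UIMA-style decomposition: each component reads and enriches a
# shared analysis structure (analogous to UIMA's CAS), and a driver chains
# the components together. Not UIMA's real API -- an illustration only.

def language_identification(cas):
    # Toy heuristic; a real component would use a language model.
    cas["language"] = "en"
    return cas

def sentence_boundary_detection(cas):
    cas["sentences"] = [s.strip() for s in cas["text"].split(".") if s.strip()]
    return cas

def entity_detection(cas):
    # Hypothetical gazetteer lookup standing in for a trained NER model.
    gazetteer = {"IBM": "organization", "Paris": "place"}
    cas["entities"] = [
        (token, gazetteer[token])
        for sentence in cas["sentences"]
        for token in sentence.split()
        if token in gazetteer
    ]
    return cas

def run_pipeline(text, components):
    cas = {"text": text}
    for component in components:
        cas = component(cas)
    return cas

result = run_pipeline(
    "IBM opened an office in Paris. The staff moved there.",
    [language_identification, sentence_boundary_detection, entity_detection],
)
print(result["entities"])  # [('IBM', 'organization'), ('Paris', 'place')]
```

The point of the design is the same as UIMA's: each component only knows the shared data structure, so pipelines can be reordered, replaced, or distributed without touching the components themselves.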

If you are using UIMA as a framework for development of topic maps, please post concerning your experiences with UIMA. What works, what doesn’t, etc.

Center for Computational Analysis of Social and Organizational Systems (CASOS)

Center for Computational Analysis of Social and Organizational Systems (CASOS)

Home of both ORA and AutoMap but I thought it merited an entry of its own.

Directed by Dr. Kathleen Carley:

CASOS brings together computer science, dynamic network analysis and the empirical study of complex socio-technical systems. Computational and social network techniques are combined to develop a better understanding of the fundamental principles of organizing, coordinating, managing and destabilizing systems of intelligent adaptive agents (human and artificial) engaged in real tasks at the team, organizational or social level. Whether the research involves the development of metrics, theories, computer simulations, toolkits, or new data analysis techniques advances in computer science are combined with a deep understanding of the underlying cognitive, social, political, business and policy issues.

CASOS is a university wide center drawing on a group of world class faculty, students and research and administrative staff in multiple departments at Carnegie Mellon. CASOS fosters multi-disciplinary research in which students and faculty work with students and faculty in other universities as well as scientists and practitioners in industry and government. CASOS research leads the way in examining network dynamics and in linking social networks to other types of networks such as knowledge networks. This work has led to the development of new statistical toolkits for the collection and analysis of network data (Ora and AutoMap). Additionally, a number of validated multi-agent network models in areas as diverse as network evolution , bio-terrorism, covert networks, and organizational adaptation have been developed and used to increase our understanding of real socio-technical systems.

CASOS research spans multiple disciplines and technologies. Social networks, dynamic networks, agent based models, complex systems, link analysis, entity extraction, link extraction, anomaly detection, and machine learning are among the methodologies used by members of CASOS to tackle real world problems.

Definitely a group that bears watching by anyone interested in topic maps!

AutoMap – Extracting Topic Maps from Texts?

Filed under: Authoring Topic Maps,Entity Extraction,Networks,Semantics,Software — Patrick Durusau @ 10:59 am

AutoMap: Extract, Analyze and Represent Relational Data from Texts (according to its webpage).

From the webpage:

AutoMap is a text mining tool that enables the extraction of network data from texts. AutoMap can extract content analytic data (words and frequencies), semantic networks, and meta-networks from unstructured texts developed by CASOS at Carnegie Mellon. Pre-processors for handling pdf’s and other text formats exist. Post-processors for linking to gazetteers and belief inference also exist. The main functions of AutoMap are to extract, analyze, and compare texts in terms of concepts, themes, sentiment, semantic networks and the meta-networks extracted from the texts. AutoMap exports data in DyNetML and can be used interoperably with *ORA.

AutoMap uses parts of speech tagging and proximity analysis to do computer-assisted Network Text Analysis (NTA). NTA encodes the links among words in a text and constructs a network of the linked words.

AutoMap subsumes classical Content Analysis by analyzing the existence, frequencies, and covariance of terms and themes.
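The proximity idea behind Network Text Analysis is easy to sketch: treat words that co-occur within a sliding window as linked, and count how often each link appears. This is a toy illustration of the technique, not AutoMap itself; the window size and stopword list are assumptions.

```python
from collections import Counter

def text_to_network(text, window=2, stopwords=()):
    # Link each pair of words that co-occur within a sliding window,
    # in the spirit of proximity-based Network Text Analysis.
    words = [w.strip(".,;").lower() for w in text.split()]
    words = [w for w in words if w and w not in stopwords]
    edges = Counter()
    for i in range(len(words)):
        for j in range(i + 1, min(i + window + 1, len(words))):
            if words[i] != words[j]:
                edges[tuple(sorted((words[i], words[j])))] += 1
    return edges

edges = text_to_network(
    "Topic maps merge subjects. Topic maps link subjects.",
    stopwords={"a", "the"},
)
print(edges[("maps", "topic")])  # 2
```

The resulting edge counts are exactly the kind of word network that AutoMap builds (with far more linguistic care) before exporting to DyNetML.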

For a rough cut at a topic map from a text, AutoMap looks like a useful tool.

In addition to the software, training material and other information is available.

My primary interest is the application of such a tool to legislative debates, legislation and court decisions.

None of those occur in a vacuum and topic maps could help provide a context for understanding such material.

ORA – Topic Maps as Networks?

Filed under: Networks,Software — Patrick Durusau @ 10:28 am

ORA (Organization Risk Analyzer) is a toolkit developed for the analysis of organizational networks that could prove to be very useful for topic maps when viewed as networks.

From the website:

*ORA is a dynamic meta-network assessment and analysis tool developed by CASOS at Carnegie Mellon. It contains hundreds of social network, dynamic network metrics, trail metrics, procedures for grouping nodes, identifying local patterns, comparing and contrasting networks, groups, and individuals from a dynamic meta-network perspective. *ORA has been used to examine how networks change through space and time, contains procedures for moving back and forth between trail data (e.g. who was where when) and network data (who is connected to whom, who is connected to where …), and has a variety of geo-spatial network metrics, and change detection techniques. *ORA can handle multi-mode, multi-plex, multi-level networks. It can identify key players, groups and vulnerabilities, model network changes over time, and perform COA analysis. It has been tested with large networks (10^6 nodes per 5 entity classes). Distance based, algorithmic, and statistical procedures for comparing and contrasting networks are part of this toolkit.
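To give a feel for the "key players" idea, here is a toy degree-centrality calculation. This is a minimal sketch, not one of *ORA's own metrics; the node names are invented, and real toolkits offer far richer measures (betweenness, eigenvector centrality, and so on).

```python
from collections import defaultdict

def degree_centrality(edges):
    # Fraction of the other nodes each node touches -- a simple stand-in
    # for the richer key-player metrics a toolkit like *ORA provides.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}

edges = [("ana", "ben"), ("ana", "carl"), ("ana", "dora"), ("ben", "carl")]
centrality = degree_centrality(edges)
key_player = max(centrality, key=centrality.get)
print(key_player, centrality[key_player])  # ana 1.0
```

Even this crude measure picks out the node every other node depends on, which is the intuition behind vulnerability and key-player analysis.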

Comments on which parts of this toolkit you find the most useful welcome.

International Network for Social Network Analysis

Filed under: Conferences,Networks — Patrick Durusau @ 6:39 am

International Network for Social Network Analysis

An organization focused on social networks (no surprise there) but also the source of a number of interesting resources, such as software and data sets.

There is a workshop on NetworkX to be offered at Sunbelt 2011.

Registration for the workshop closes 24 January 2011.

The family tree topic map demonstrated by Eric Freese, years ago now, is one example of a social network.

Both the site and organization merit a close look.

January 8, 2011

NetworkX

Filed under: Graphs,Maps,Networks,Software — Patrick Durusau @ 11:21 am

NetworkX

From the website:

NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
….
Features:

  • Standard graph-theoretic and statistical physics functions
  • Easy exchange of network algorithms between applications,
    disciplines, and platforms
  • Many classic graphs and synthetic networks
  • Nodes and edges can be "anything"
    (e.g. time-series, text, images, XML records)
  • Exploits existing code from high-quality legacy software in C,
    C++, Fortran, etc.
  • Open source (encourages community input)
  • Unit-tested

NetworkX is a nice way to display topic maps as graphs.

Its importance for topic maps lies in the ability to study properties of nodes (representatives of subjects, including relationships) and composition of nodes (merging in topic map speak).
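A small sketch of that idea, assuming NetworkX is installed: topics become nodes, associations become edges, and merging two identifiers for the same subject becomes a node contraction. The topic names here are invented for illustration.

```python
import networkx as nx

# Topics as nodes, associations as edges.
G = nx.Graph()
G.add_node("mark_twain", kind="topic")
G.add_node("samuel_clemens", kind="topic")
G.add_node("tom_sawyer", kind="topic")
G.add_edge("mark_twain", "tom_sawyer", role="author")
G.add_edge("samuel_clemens", "tom_sawyer", role="author")

# The two identifiers denote one subject, so contract them into one node --
# "merging" in topic map speak.
merged = nx.contracted_nodes(G, "mark_twain", "samuel_clemens", self_loops=False)
print(sorted(merged.nodes()))  # ['mark_twain', 'tom_sawyer']
```

After the contraction the graph has one author node carrying both former identities, and every graph-theoretic measure NetworkX offers can be applied to the merged map.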

Principal Programming Paradigms

Filed under: Semantics,Subject Identity,Visualization — Patrick Durusau @ 9:35 am

Principal Programming Paradigms is an outgrowth of Concepts, Techniques, and Models of Computer Programming, which is highly recommended for your reading pleasure.

I don’t think there are any universal threads running through semantic technologies but that may simply be my view of those technologies.

Questions:

  1. List with definitions your list of semantic technologies (3-5 pages, citations)
  2. What characteristics do you think form useful classifications of those technologies? (3-5 pages, citations)
  3. How would you visualize those technologies? (neatness counts)

Visualizing Terrorist Plots vs. Attacks

Filed under: Visualization — Patrick Durusau @ 7:10 am

The Beauty of Data Visualization David McCandless, TED presentation on data visualization, via Alex Popescu.

The visualization of global media fear is a great example.

Visualization can lead to discovery of patterns in information.

It also illustrates how selection of information influences the visualization.

Along with swine flu and killer wasps, I would have included reports of terrorist plots.

Unlike the Fall sweeps, terrorist plots aren’t announced by terrorists in advance. Terrorist plots are announced by, wait for it, governments.

Would plotting government sponsored announcements of terrorist plots illustrate the chimerical nature of reports of terrorist activity?

Questions (class activity):

  1. What media sources should we mine for reports of terrorist plots?
  2. What qualifies as a report of a terrorist plot and how should we measure the media intensity of those reports?
  3. Should we use a timeline for such reports? If so, what else should populate the timeline?
  4. How do we plot actual terrorist attacks?
  5. How would we show relationship/non-relationship of plots versus attacks? Say large number of reported plots but unrelated attack in Mumbai?
  6. Assuming we have settled all the foregoing questions, how would you capture this information in a topic map?

January 7, 2011

Association Game

Filed under: Associations,Humor — Patrick Durusau @ 7:16 pm

Actually it is called YouTube Name Mashup.

Enter a first and last name and it selects random YouTube videos.

It requires Chrome or Safari. (Seriously, even Mozilla dies.)

The association part?

Do this at a party, with or without topic mappers.

Divide into two teams. Each team has a turn at suggesting a first and last name for submission.

Teams have until the videos stop to write down a relationship (an association, to you topic map readers), with the roles in the relationship, between any subject in one video and any subject in the other video.

The best relationship is determined by applause from those attending the party.

Five rounds maximum.

Remember, the point of this exercise is to have fun and practice some imaginative thinking.

User Performance Using An Ontology-Driven Information Retrieval (ONTOIR) System

Filed under: Ontology,Topic Maps — Patrick Durusau @ 3:32 pm

User Performance Using An Ontology-Driven Information Retrieval (ONTOIR) System Author: Myongho Yi

Also published by VDM Verlag User Performance Using An Ontology-Driven Information Retrieval (ONTOIR) System

Abstract:

Enhancing the representation and relationship of information through ontology is a promising alternative approach for knowledge organization. This improved knowledge organization is vital for collocation of information and effective and efficient searching. This study concerned the testing of user performance when searching an ontology-driven information retrieval (ONTOIR) system that shows explicit relationships among resources. The study explores the possibilities of improving user performance in searching for information. The goal was to examine whether or not ontology enhances user performance in terms of recall and search time. The experiment was conducted with 40 participants to evaluate and compare the differences in user performance (recall and search time) between an ontology-driven information retrieval system and a traditional, thesaurus-driven information retrieval system.

Better recall and shorter search time were found when conducting relationship-based queries in an ontology-driven information retrieval system as compared to a thesaurus-based system. Further studies comparing user performance with a cluster-based search engine and an ontology-driven information retrieval system are needed.

FYI, the first link is $89.00 less than the second one.

A topic map could be used to deliver an ontology-driven search and navigation interface to information.

openNLP

Filed under: Data Mining,Natural Language Processing — Patrick Durusau @ 2:13 pm

openNLP

From the website:

OpenNLP is an organizational center for open source projects related to natural language processing. Its primary role is to encourage and facilitate the collaboration of researchers and developers on such projects.

OpenNLP also hosts a variety of java-based NLP tools which perform sentence detection, tokenization, pos-tagging, chunking and parsing, named-entity detection, and coreference using the OpenNLP Maxent machine learning package.

OpenNLP is incubating at the Apache Software Foundation (ASF).

Another set of NLP tools for topic map authors.

Probability and Computing

Filed under: Probabilistic Models — Patrick Durusau @ 10:30 am

Probability and Computing

Lecture notes by Ryan O’Donnell for his Fall 2009 course at Carnegie Mellon University.

Probability is an important topic (sorry) in a number of CS areas.

Probabilistic merging is why I mention it here, but understanding probability more broadly will be useful in other CS areas as well.
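As a toy illustration of probabilistic merging, consider treating a set-similarity score as a match probability and merging two subject records only when it clears a threshold. The records and the 0.5 threshold are invented for illustration; real systems would use calibrated models.

```python
def jaccard(a, b):
    # Similarity of two property sets, used here as a crude match probability.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def should_merge(rec1, rec2, threshold=0.5):
    # Merge only when the estimated match probability clears the threshold.
    return jaccard(rec1, rec2) >= threshold

r1 = {"name:john smith", "city:york", "job:mason"}
r2 = {"name:john smith", "city:york", "job:carpenter"}
print(round(jaccard(r1, r2), 2), should_merge(r1, r2))  # 0.5 True
```

The interesting questions, as the lecture notes make clear, are about the guarantees you can put on such thresholds, which is where the probability theory earns its keep.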

Provenance for Aggregate Queries

Filed under: Aggregation,Merging,Query Language,TMQL — Patrick Durusau @ 7:19 am

Provenance for Aggregate Queries Authors: Yael Amsterdamer, Daniel Deutch, Val Tannen

Abstract:

We study in this paper provenance information for queries with aggregation. Provenance information was studied in the context of various query languages that do not allow for aggregation, and recent work has suggested to capture provenance by annotating the different database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges rendering this approach inapplicable. Consequently, we propose a new approach, where we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe the values computation. We realize this approach in a concrete construction, first for “simple” queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance annotated databases.

Not reading for the faint of heart.

But, provenance for merging is one obvious application of this paper.

For that matter, provenance should also be a consideration for TMQL.
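A toy sketch of the paper's value-level idea: carry a provenance tag with each individual value through an aggregation, so the aggregate result records symbolically which annotated values produced it. The tuple tags `t1`–`t3` are invented, and this ignores the semiring machinery the paper actually develops.

```python
def provenance_sum(annotated_values):
    # Each value carries a provenance tag; the aggregate keeps a symbolic
    # record of which annotated values it was computed from.
    total = sum(value for value, _ in annotated_values)
    provenance = " + ".join(f"{tag}*{value}" for value, tag in annotated_values)
    return total, provenance

rows = [(10, "t1"), (25, "t2"), (5, "t3")]
total, why = provenance_sum(rows)
print(total, why)  # 40 t1*10 + t2*25 + t3*5
```

For merging, the attraction is obvious: a merged value that carries its provenance expression can always be unwound to the sources that contributed to it.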

Win: Data Analysis with Open Source Tools

Filed under: Data Analysis — Patrick Durusau @ 6:20 am

Comment to win a copy of Data Analysis with Open Source Tools

From the blog:

Want to win a copy? I have five of them up for grabs. For a chance to win, leave a comment below by January 9, 2011, 10:00pm PST. Tell us what you used to make your very first graph. Pencil and graph paper? Excel? R? Jelly beans?

I have entered a comment. Thought I should pass the opportunity along.

Apache OODT – Top Level Project

Filed under: Data Integration,Data Mining,Data Models,OODT,Software — Patrick Durusau @ 6:02 am

Apache OODT is the first NASA-developed software to achieve ASF Top Level Project status.

From the website:

Just what is Apache™ OODT?

It’s metadata for middleware (and vice versa):

  • Transparent access to distributed resources
  • Data discovery and query optimization
  • Distributed processing and virtual archives

But it’s not just for science! It’s also a software architecture:

  • Models for information representation
  • Solutions to knowledge capture problems
  • Unification of technology, data, and metadata

Looks like a project that could benefit from having topic maps as part of its tool kit.

Check out the 0.1 OODT release and see what you think.

Apache Mahout – Data Mining Class

Filed under: Data Mining,Mahout — Patrick Durusau @ 5:27 am

Apache Mahout – Data Mining Class at the Illinois Institute of Technology, by Dr. David Grossman.

Grossman is the co-author of: Information Retrieval: Algorithms and Heuristics (The Information Retrieval Series)(2nd Edition)

The class was organized by Grant Ingersoll, see: Apache Mahout Catching on in Academia.

Definitely worth a visit to round out your data mining skills.

January 6, 2011

10 Ways to be a Marketing Genius Like Lady Gaga

Filed under: Marketing — Patrick Durusau @ 3:39 pm

10 Ways to be a Marketing Genius Like Lady Gaga

Just another log for the discussion of how to market topic maps.

A very amusing slide deck that could prove to be quite useful.

I am not really sure of an equivalent to the monster claw hand as marketing for topic maps. Suggestions welcome!

The Top Five Information Management Meltdowns of 2010

Filed under: Data Source,Examples — Patrick Durusau @ 3:14 pm

The Top Five Information Management Meltdowns of 2010

Every year produces a number of stories like these.

Pick one and from the published reports, describe how you would incorporate topic maps to help lead to a different outcome. (3-5 pages, citations)

PS: It may be possible that topic maps play no direct role in avoiding the problem but lead to a more useful system.

Lucene and Solr: 2010 in Review – Post

Filed under: Lucene,Search Engines,Solr — Patrick Durusau @ 2:55 pm

Lucene and Solr: 2010 in Review

Great highlights of a busy and productive year for both Lucene and Solr.

Economic Indicator Database

Filed under: Data Source — Patrick Durusau @ 9:37 am

Economic Indicator Database

The US Census Bureau has made its database of economic indicators searchable.

You can even download data for further manipulation, although I must admit being bemused by the “Download straight from website to Excel.”

I wonder if they mean “Excel” in the sense of any old spreadsheet program, the way people say “photocopy” for a brand that starts and ends with an “X.” 😉

Probably not. They probably mean a specific bit of software.

Thought I would mention it as one of the many data sources from the US government that can be re-purposed for use with a topic map.

I am sure other governments make similar data sources available.

If you have a favorite one, please forward the URL, a brief description and any comments you want to make about using it with topic maps.

Moving Forward – Library Project Blog

Filed under: Interface Research/Design,Library,Library software,Software,Solr — Patrick Durusau @ 8:30 am

Moving Forward is a blog I discovered via all things cataloged.

From the Forward blog:

Forward is a Resource Discovery experiment that builds a unified search interface for library data.

Today Forward is 100% of the UW System Library catalogs and two UW digital collections. The project also experiments with additional search contextualization by using web service APIs.

Forward can be accessed at the URL:
http://forward.library.wisconsin.edu/.

Sounds like a great opportunity for topic map fans with an interest in library interfaces to make a contribution.

Topincs 5.2.0

Filed under: Topic Map Software — Patrick Durusau @ 8:20 am

The Topincs 5.2.0 release has a number of enhancements and bug fixes.

Among other things, Topincs 5.2.0 eliminates all problems on touch devices, like the iPhone and the iPad.

Not owning either one I will have to rely on reports from others.

Download: http://www.cerny-online.com/topincs/downloads/topincs-5.2.0.tar.gz

See the Topincs homepage for the manual and other information.

January 5, 2011

Indexicality: Understanding mobile human-computer interaction in context

Filed under: Context,Indexicality,Semiotics — Patrick Durusau @ 1:21 pm

Indexicality: Understanding mobile human-computer interaction in context Authors: Jesper Kjeldskov, Jeni Paay

Keywords: Mobile computing, indexicality, physical context, spatial context, social context, prototype systems, field evaluation, public transport, healthcare, sociality

Abstract:

A lot of research has been done within the area of mobile computing and context-awareness over the last 15 years, and the idea of systems adapting to their context has produced promising results for overcoming some of the challenges of user interaction with mobile devices within various specialized domains. However, today it is still the case that only a limited body of theoretically grounded knowledge exists that can explain the relationship between users, mobile system user interfaces, and their context. Lack of such knowledge limits our ability to elevate learning from the mobile systems we develop and study from a concrete to an abstract level. Consequently, the research field is impeded in its ability to leap forward and is limited to incremental steps from one design to the next. Addressing the problem of this void, this article contributes to the body of knowledge about mobile interaction design by promoting a theoretical approach for describing and understanding the relationship between user interface representations and user context. Specifically, we promote the concept of indexicality derived from semiotics as an analytical concept that can be used to describe and understand a design. We illustrate the value of the indexicality concept through an analysis of empirical data from evaluations of three prototype systems in use. Based on our analytical and empirical work we promote the view that users interpret information in a mobile computer user interface through creation of meaningful indexical signs based on the ensemble of context and system.

One of the more interesting observations by the authors is that the greater the awareness of context, the less information has to be presented to the user. For a mobile device, with its limited display area, that is an advantage.

It would be an advantage for other interfaces because even with a lot of screen real estate, it would be counter-productive to over run the user with information about a subject.

Present them with the information relevant to a particular context, leaving the option for them to request additional information.

Bribing Statistics

Filed under: Data Source,Marketing,Software — Patrick Durusau @ 1:03 pm

Bribing Statistics by Aleks Jakulin.

Self-reporting of bribery (“I paid a bribe” is the name of the application) in the United States is uncommon, at least when characterized as a bribe.

There are campaign finance reports and analysis that link organizations/causes to particular candidates. Not surprisingly, candidates vote in line with their major sources of funding.

The reason I mention it here is to suggest that topic maps could be used to provide a more granular mapping between contributions, office holders (or agency staff) and beneficiaries of legislation or contracts.

None of those things exist in isolation or without identity.

While one researcher might only be interested in DARPA contracts (to use a U.S. based example), the contract officers and the beneficiaries of those contracts, another researcher may be collecting data on campaign contributions that may include some of the beneficiaries of the DARPA contracts.

Topic maps are a great way to accumulate that sort of research over time.

Parallel prototyping leads to better design results, more divergence, and increased self-efficacy

Filed under: Computation,Parallelism — Patrick Durusau @ 12:43 pm

Parallel prototyping leads to better design results, more divergence, and increased self-efficacy Authors: Steven P. Dow, Alana Glassco, Jonathan Kass, Melissa Schwarz, Daniel L. Schwartz, and Scott R. Klemmer

Abstract:

Iteration can help people improve ideas. It can also give rise to fixation, continuously refining one option without considering others. Does creating and receiving feedback on multiple prototypes in parallel, as opposed to serially, affect learning, self-efficacy, and design exploration? An experiment manipulated whether independent novice designers created graphic Web advertisements in parallel or in series. Serial participants received descriptive critique directly after each prototype. Parallel participants created multiple prototypes before receiving feedback. As measured by click-through data and expert ratings, ads created in the Parallel condition significantly outperformed those from the Serial condition. Moreover, independent raters found Parallel prototypes to be more diverse. Parallel participants also reported a larger increase in task-specific self-confidence. This article outlines a theoretical foundation for why parallel prototyping produces better design results and discusses the implications for design education.

Interesting that I should stumble on this article after posting about parallel processing.

A useful heuristic for the development of subject identifications in an organization.

Is Parallel Programming Hard, And, If So, What Can You Do About It?

Filed under: Computation,Parallelism,Software — Patrick Durusau @ 12:36 pm

Is Parallel Programming Hard, And, If So, What Can You Do About It? Editor: Paul E. McKenney

Kirk Lowery forwarded this link to my attention.

Just skimming the first couple of chapters, I have to say it has some of the most amusing graphics I have seen in any CS book.

Performance, productivity and generality are all goals of parallel programming, as cited by this book.

I have to wonder, though, if subject recognition tasks, analogous to computer vision, are inherently parallel.

Doing them in parallel does not make them easier but not doing them in parallel certainly makes them harder.

For example, consider the last time you failed to recognize someone who wasn’t in the location or context where you normally see them.

Do you recognize the context in addition or in parallel to your recognition of the person’s face?
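Because each document (or face, or context) can be examined independently, subject recognition parallelizes trivially. A minimal sketch, with an invented gazetteer standing in for a real recognizer; CPU-bound recognition would use processes rather than threads, but the structure is the same.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy gazetteer standing in for a real subject recognizer.
KNOWN_SUBJECTS = {"mumbai", "darpa", "congress"}

def recognize_subjects(document):
    words = {w.strip(".,").lower() for w in document.split()}
    return words & KNOWN_SUBJECTS

documents = [
    "DARPA awarded the contract.",
    "The attack in Mumbai was unrelated.",
    "Congress held hearings on DARPA.",
]

# Each document is independent, so recognition runs in parallel and the
# results come back in document order.
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(recognize_subjects, documents))
print([sorted(r) for r in results])  # [['darpa'], ['mumbai'], ['congress', 'darpa']]
```

The hard part is not the fan-out but the fan-in: deciding whether the subjects recognized in parallel streams are the same subject, which is where the questions below begin.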

Questions:

  1. What benefits/drawbacks do you see in parallel processing of TMDM instances? (3-5 pages, citations)
  2. How would you design subject identifications for processing in a parallel environment? (3-5 pages, citations)
  3. How would you evaluate the need for parallel processing of subject identifications? (3-5 pages, citations)

Map of American English Dialects and Subdialects – Post

Filed under: Data Source,Mapping,Maps — Patrick Durusau @ 9:05 am

Map of American English Dialects and Subdialects

From Flowingdata.com a delightful map of American English dialects and subdialects. Several hundred YouTube videos are accessible through the map as examples.

Interesting example of mapping but moreover, looks like an excellent candidate for a topic map that binds in additional resources on the subject.

Enjoy!

January 4, 2011

ColorBrewer – A Tool for Color Design in Maps – Post

Filed under: Authoring Topic Maps,Graphics,Mapping,Maps — Patrick Durusau @ 10:23 am

ColorBrewer – A Tool for Color Design in Maps

From Matthew Hurst:

Just found ColorBrewer2 – a tool that helps select color schemes for map based data. The tool allows you to play with different criteria, then proposes a space of possible color combinations. Proactively filtering for color blindness, photocopy friendly and printer friendly is great. Adding projector friendly (no yellow please) would be nice. I’d love to see something like this for time series and other statistical data forms.

Just the thing for planning map based interfaces for topic maps!

Algorithms – Lecture Notes

Filed under: CS Lectures,String Matching,Subject Identity — Patrick Durusau @ 7:51 am

Algorithms, Jeff Erickson’s lecture notes.

Mentioned in a post on the Theoretical Computer Science blog, What Lecture Notes Should Everyone Read?.

From the introduction:

Despite several rounds of revision, these notes still contain lots of mistakes, errors, bugs, gaffes, omissions, snafus, kludges, typos, mathos, grammaros, thinkos, brain farts, nonsense, garbage, cruft, junk, and outright lies, all of which are entirely Steve Skiena’s fault. I revise and update these notes every time I teach the course, so please let me know if you find a bug. (Steve is unlikely to care.)

The notes are highly amusing and useful to anyone seeking to improve current subject identification (read searching) practices.

