Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

September 5, 2011

How Hard is the Local Search Problem?

Filed under: Geographic Information Retrieval,Local Search,Mapping,Searching — Patrick Durusau @ 7:33 pm

How Hard is the Local Search Problem? by Matthew Hurst.

The “local search” problem that Matthew is addressing is illustrated with Google’s mapping of local restaurants in Matthew’s neighborhood.

The post starts:

The local search problem has two key components: data curation (creating and maintaining a set of high quality statements about what the world looks like) and relevance (returning those statements in a manner that satisfies a user need). The first part of the problem is a key enabler to success, but how hard is it?

There are many problems which involve bringing together various data sources (which might be automatically or manually created) and synthesizing an improved set of statements intended to denote something about the real world. The way in which we judge the results of such a process is to take the final database, sample it, and test it against what the world looks like.

In the local search space, this might mean testing to see if the phone number in a local listing is indeed that associated with a business of the given name and at the given location.

But how do we quantify this challenge? We might perform the above evaluation and find out that 98% of the phone numbers are correctly associated. Is that good? Expected? Poor?

After following Matthew through his discussion of the various factors in “local search,” what are your thoughts on Google’s success with “local search?”

Could you do better?

How? Be specific, a worked example would be even more convincing.
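In that spirit, here is a minimal sketch (mine, not Matthew's, with made-up sample counts) of quantifying curation quality: estimate listing accuracy from a verification sample and attach a confidence interval, so "98%" comes with error bars.

```python
import math

def accuracy_estimate(correct: int, sampled: int, z: float = 1.96):
    """Point estimate of listing accuracy from a verification sample,
    with a normal-approximation confidence interval (z=1.96 ~ 95%)."""
    p = correct / sampled
    half_width = z * math.sqrt(p * (1 - p) / sampled)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)

# Hypothetical audit: 490 of 500 sampled phone numbers check out.
p, low, high = accuracy_estimate(490, 500)
print(f"accuracy {p:.1%}, 95% CI [{low:.1%}, {high:.1%}]")
```

Whether 98% is "good" then becomes a question about the cost of the remaining 2%, not about the point estimate alone.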

August 27, 2011

Factual’s Crosswalk API

Filed under: Crosswalk,Mapping — Patrick Durusau @ 9:09 pm

Factual’s Crosswalk API by Matthew Hurst.

From the post:

Factual, which is mining the web for knowledge using a variety of web mining methods, has released an API in the local space which aims to expose, for a specific local entity (e.g. a restaurant), the places on the web where it is mentioned. For example, you might find for a restaurant its homepage, its listing on Yelp, its listing on UrbanSpoon, etc.

This mapping between entities and mentions is potentially a powerful utility. Given all these mentions, if some of the data changes (e.g. via a user update on a Yelp page) then the central knowledge base information for that entity can be updated.

When I looked, the Crosswalk API was still limited to the US. Matthew uncovers mapping-accuracy issues known all too well to topic mappers.

From the Factual site:

Factual Crosswalk does four things:

  1. Converts a Factual ID into 3rd party identifiers and URLs
  2. Converts a 3rd party URL into a Factual canonical record
  3. Converts a 3rd party namespace and ID into a Factual canonical record
  4. Provides a list of URLs where a given Factual entity is found on the Internet

Don’t know about you but I am unimpressed.

In part because of the flatland mapping approach to identification. If all I know is Identifier1 was mapped to Identifier2, that is better than a poke with a sharp stick for identification purposes, but only barely. How do I discover what entity you thought was represented by Identifier1 or Identifier2?

I suppose piling up identifiers is one approach but we can do better than that.
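To make that concrete, a sketch with invented listings: the flat crosswalk records only that identifiers co-occur, while a subject-centric record carries the properties that justify the mapping, so a later reader can discover what was thought to be identified.

```python
# Flatland mapping: identifier to identifiers, nothing else recoverable.
crosswalk = {"factual:abc123": ["yelp:the-corner-cafe", "urbanspoon:99871"]}

# Subject-centric record: the same mapping plus the properties and the
# stated basis that justify treating the identifiers as co-referent.
subject = {
    "identifiers": ["factual:abc123", "yelp:the-corner-cafe", "urbanspoon:99871"],
    "properties": {
        "name": "The Corner Cafe",
        "phone": "+1-555-0100",
        "address": "123 Main St",
    },
    "basis": "name, phone and address agree across all three listings",
}
```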


PS: I am adding Crosswalk as a category so I can cover traditional crosswalks as developed by librarians. I am interested in what implicit parts of crosswalks should become explicit in a topic map. Pointers and suggestions welcome. Or conversions of crosswalks into topic maps.

August 5, 2011

Topic “Flow” Map?

Filed under: Mapping,Maps,Visualization — Patrick Durusau @ 7:06 pm

The Economist’s Twitter followers click links, Al Jazeera’s retweet, study finds

This article uses “topic map” in the sense of:

Timing and topical interest matter when seeking attention. By arranging audience tweets into topic maps, we were able to visualise the flow of attention between topics of interest, across the different audiences.

The “topic map” is of the @AJENGLISH audience for an hour.

You can see the full article at: Engaging News Hungry Audiences Tweet by Tweet: An audience analysis of prominent mainstream media news accounts on Twitter

The full article includes another “topic map” of Fox.

As a publisher, I would be interested in what terminology I could use to reach other audiences, perhaps at other times. Doing that mapping of identifications would require a more traditional topic map.

If you were clever with it, that could result in real-time tracking of different memes for the same subjects across user streams such as Facebook and Twitter. From tracking it is just a short step to modeling and then influencing those memes.

July 1, 2011

World Bank Data

Filed under: Data Source,Mapping,Marketing — Patrick Durusau @ 2:55 pm

World Bank Data

Though available through other portals as well, the World Bank offers access to over 7,000 indicators at its own site, along with widgets for displaying the data.

While the World Bank Data website is well done and a step towards “transparency,” it does not address the need for “transparency” in terms of financial auditing.

Take for example the Uganda – Financial Sector DPC Project. Admittedly it is only $50M but given it has a forty (40) year term with a ten (10) year grace period, who will be able to say with any certainty what happened to the funds in question?

If there were a mapping between the financial systems that disburse these funds and the financial systems in Uganda, then on whatever basis the information is updated, the World Bank would know and could assure others of the fate of the funds in question.

Granted, I am assuming that different institutions and countries have different financial systems and that uniformity of such applications or systems isn’t realistic. It should certainly be possible to set up and maintain mappings between such systems. I suspect that mappings to banks and other financial institutions should be made as well, to enable off-site auditing of any and all transactions.

Lest it seem like I am picking on World Bank recipients, I would recommend such mapping/auditing practices for all countries before approval of big-ticket items like defense budgets. The fact that an audit mapping fails in a following year is an indication that something was changed for a reason. Once it is understood that changes attract attention and attention uncovers fraud, unexpected maintenance is unlikely to be an issue.

June 23, 2011

The Need For Immutability

Filed under: Immutable,Mapping,Proxies — Patrick Durusau @ 1:52 pm

The Need For Immutability by Andrew Binstock.

From the post:

It makes data items ideal for sharing between threads

Andrew recites a short history of immutability.

Immutability also supports stable mappings between subject representatives.
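A minimal sketch of that point (mine, not Andrew's): an immutable value is hashable and cannot change out from under a map, which is exactly what a stable mapping between subject representatives needs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen=True makes instances immutable and hashable
class SubjectProxy:
    name: str
    phone: str

# Safe to share between threads and safe to use as a dictionary key:
# no thread can mutate the proxy after the mapping is built.
mapping = {SubjectProxy("The Corner Cafe", "+1-555-0100"): "factual:abc123"}

lookup = SubjectProxy("The Corner Cafe", "+1-555-0100")
assert mapping[lookup] == "factual:abc123"  # equal values find the same entry
```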

June 20, 2011

Designing and Refining Schema Mappings via Data Examples

Filed under: Database,Mapping,Schema — Patrick Durusau @ 3:34 pm

Designing and Refining Schema Mappings via Data Examples by Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang-Chiew Tan, from SIGMOD ’11.

Abstract:

A schema mapping is a specification of the relationship between a source schema and a target schema. Schema mappings are fundamental building blocks in data integration and data exchange and, as such, obtaining the right schema mapping constitutes a major step towards the integration or exchange of data. Up to now, schema mappings have typically been specified manually or have been derived using mapping-design systems that automatically generate a schema mapping from a visual specification of the relationship between two schemas. We present a novel paradigm and develop a system for the interactive design of schema mappings via data examples. Each data example represents a partial specification of the semantics of the desired schema mapping. At the core of our system lies a sound and complete algorithm that, given a finite set of data examples, decides whether or not there exists a GLAV schema mapping (i.e., a schema mapping specified by Global-and-Local-As-View constraints) that “fits” these data examples. If such a fitting GLAV schema mapping exists, then our system constructs the “most general” one. We give a rigorous computational complexity analysis of the underlying decision problem concerning the existence of a fitting GLAV schema mapping, given a set of data examples. Specifically, we prove that this problem is complete for the second level of the polynomial hierarchy, hence, in a precise sense, harder than NP-complete. This worst-case complexity analysis notwithstanding, we conduct an experimental evaluation of our prototype implementation that demonstrates the feasibility of interactively designing schema mappings using data examples. In particular, our experiments show that our system achieves very good performance in real-life scenarios.

Two observations:

1) The use of data examples may help overcome the difficulty of getting users to articulate “why” a particular mapping should occur.

2) Data examples that support mappings, if preserved, could be used to illustrate for subsequent users “why” particular mappings were made or even should be followed in mappings to additional schemas.

Mapping across revisions of a particular schema or across multiple schemas at a particular time is likely to benefit from this technique.
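A toy sketch of the paradigm (not the authors' system, which decides the existence of GLAV mappings): each data example is a (source, target) instance pair, and a candidate mapping "fits" if it reproduces every target from its source.

```python
# Each data example partially specifies the desired mapping's semantics.
examples = [
    ({"person": [("Alice", "NYC")]}, {"resident": [("Alice", "NYC")]}),
    ({"person": [("Bob", "LA")]},   {"resident": [("Bob", "LA")]}),
]

def candidate(source):
    """Hypothetical mapping: copy person(name, city) into resident(name, city)."""
    return {"resident": list(source.get("person", []))}

def fits(mapping, examples):
    """A mapping fits the examples if it reproduces every expected target."""
    return all(mapping(src) == tgt for src, tgt in examples)

print(fits(candidate, examples))  # True; a failing pair would pinpoint "why not"
```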

June 9, 2011

Ontologies As Low-Lying Subjects

Filed under: Mapping,Ontology — Patrick Durusau @ 6:35 pm

While writing up a call for papers on “integration” of ontologies, it occurred to me that ontologies are really low-lying subjects for topic maps.

Any text corpus or database is going to require extraction of its content as a first step.

Your second step is going to be processing that extracted content to identify subjects.

Your third step is going to be creating topics and associations between topics, along with the properties of topics.

Your fourth step, depending on the purpose of your topic map, will be to create pointers back into the content for users (occurrences).

And finally, your fifth step, is going to be fashioning the interface your users will use for the topic map.

Compare those steps to topic mapping ontologies:

Your first step isn’t extraction of the data because, while ontologies may exist in some format, they are free-standing sets of subjects.

Your second step won’t be to identify subjects because the ontology already has subjects identified. (Yes, there are other subjects you could identify but this is the low-lying fruit version).

You avoid the third step because subjects in an ontology already have properties and relationships to other subjects.

You don’t need pointers because the entire ontology fits into your topic map, so no fourth step.

You have a familiar interface, the ontology itself, which leaves you with no fifth step.

Well, that’s a slight exaggeration. 😉

You do need the third step where subjects in the ontology get properties in addition to the ones they have in their respective ontologies. Those added properties enable the same subjects in different ontologies to merge. If their respective properties are also subjects in the ontology, that is they can have properties, you should be able to merge those properties as well.
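A sketch of that third step, with invented subjects and an invented identifying property: once subjects from each ontology carry a shared key, merging is a grouping operation.

```python
from collections import defaultdict

# Subjects from two ontologies, each given an added identifying property
# ("key", hypothetical here) beyond its native properties.
ontology_a = [{"id": "a:Vehicle",  "key": "ext:vehicle", "label": "Vehicle"}]
ontology_b = [{"id": "b:Fahrzeug", "key": "ext:vehicle", "label": "Fahrzeug"}]

def merge(*ontologies):
    """Group subjects by the added identifying property; each group
    becomes one merged subject carrying everything known about it."""
    merged = defaultdict(lambda: {"ids": [], "labels": []})
    for ontology in ontologies:
        for subject in ontology:
            proxy = merged[subject["key"]]
            proxy["ids"].append(subject["id"])
            proxy["labels"].append(subject["label"])
    return dict(merged)

print(merge(ontology_a, ontology_b))
# {'ext:vehicle': {'ids': ['a:Vehicle', 'b:Fahrzeug'],
#                  'labels': ['Vehicle', 'Fahrzeug']}}
```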

I realize that the originators of some ontologies may disagree with the mappings, but that really isn’t the appropriate question. The question is whether users find the mapping useful for some particular purpose. I am not sure what other test one would use.

May 19, 2011

How to map connections with great circles

Filed under: Geographic Data,Mapping,R — Patrick Durusau @ 3:26 pm

How to map connections with great circles

From the post:

There are various ways to visualize connections, but one of the most intuitive and straightforward ways is to actually connect entities or objects with lines. And when it comes to geographic connections, great circles are a nice way to do this.

This is a very nice R tutorial on using great circles to visualize airline connections.

The same techniques could map “connections” of tweets, phone calls, emails, any type of data that can be associated with a geographic location.
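The tutorial's code is in R; as a language-neutral sketch of the underlying math, spherical linear interpolation between the two endpoints yields the intermediate points of a great circle:

```python
import math

def great_circle_points(lat1, lon1, lat2, lon2, n=50):
    """Points along the great circle between two (lat, lon) pairs in
    degrees, by spherical linear interpolation of unit vectors."""
    def to_vec(lat, lon):
        lat, lon = math.radians(lat), math.radians(lon)
        return (math.cos(lat) * math.cos(lon),
                math.cos(lat) * math.sin(lon),
                math.sin(lat))

    a, b = to_vec(lat1, lon1), to_vec(lat2, lon2)
    dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
    omega = math.acos(dot)  # angle between endpoints (assumed distinct)
    points = []
    for i in range(n + 1):
        t = i / n
        w1 = math.sin((1 - t) * omega) / math.sin(omega)
        w2 = math.sin(t * omega) / math.sin(omega)
        x, y, z = (w1 * p + w2 * q for p, q in zip(a, b))
        points.append((math.degrees(math.asin(z)),
                       math.degrees(math.atan2(y, x))))
    return points

# Rough New York -> London arc, five points:
for lat, lon in great_circle_points(40.71, -74.01, 51.51, -0.13, n=4):
    print(f"{lat:6.2f}, {lon:7.2f}")
```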

May 12, 2011

Information Heterogeneity and Fusion

Filed under: Data Fusion,Heterogeneous Data,Information Integration,Mapping — Patrick Durusau @ 7:54 am

2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems (HetRec 2011)

Important Dates:

Paper submission deadline: 25th July 2011
Notification of acceptance: 19th August 2011
Camera-ready version due: 12th September 2011
Workshop: 23rd or 27th October 2011

Datasets are also being made available. Just in case you can’t find any heterogeneous data lying around. 😉

Looks like a perfect venue for topic map papers. (Not to mention that a re-usable mapping between recommender systems looks like a commercial opportunity.)

From the website:

In recent years, increasing attention has been given to finding ways for combining, integrating and mediating heterogeneous sources of information for the purpose of providing better personalized services in many information seeking and e-commerce applications. Information heterogeneity can indeed be identified in any of the pillars of a recommender system: the modeling of user preferences, the description of resource contents, the modeling and exploitation of the context in which recommendations are made, and the characteristics of the suggested resource lists.

Almost all current recommender systems are designed for specific domains and applications, and thus usually try to make best use of a local user model, using a single kind of personal data, and without explicitly addressing the heterogeneity of the existing personal information that may be freely available (on social networks, homepages, etc.). Recognizing this limitation, among other issues: a) user models could be based on different types of explicit and implicit personal preferences, such as ratings, tags, textual reviews, records of views, queries, and purchases; b) recommended resources may belong to several domains and media, and may be described with multilingual metadata; c) context could be modeled and exploited in multi-dimensional feature spaces; d) and ranked recommendation lists could be diverse according to particular user preferences and resource attributes, oriented to groups of users, and driven by multiple user evaluation criteria.

The aim of HetRec workshop is to bring together students, faculty, researchers and professionals from both academia and industry who are interested in addressing any of the above forms of information heterogeneity and fusion in recommender systems. We would like to raise awareness of the potential of using multiple sources of information, and look for sharing expertise and suitable models and techniques.

Another dire need is for strong datasets, and one of our aims is to establish benchmarks and standard datasets on which the problems could be investigated. In this edition, we make available on-line datasets with heterogeneous information from several social systems. These datasets can be used by participants to experiment and evaluate their recommendation approaches, and be enriched with additional data, which may be published at the workshop website for future use.

April 9, 2011

David Rumsey Map Collection

Filed under: Mapping,Maps — Patrick Durusau @ 3:42 pm

David Rumsey Map Collection

From the website:

Welcome to the David Rumsey Map Collection Database and Blog. The Map Database has many viewers and the Blog has numerous categories.

The historical map collection has over 26,000 maps and images online. The collection focuses on rare 18th and 19th century North American and South American maps and other cartographic materials. Historic maps of the World, Europe, Asia, and Africa are also represented.

It is going to take a while to form even an impression of such a collection of maps and map related resources.

April 7, 2011

The Beauty of Maps

Filed under: Mapping,Maps — Patrick Durusau @ 7:27 pm

The Beauty of Maps

A BBC special from last year that is now available in twelve (12) parts on YouTube.

The mapping side of topic maps remains largely unexplored.

Perhaps this series will spark expeditions into the wilds of mapping semantics.

March 16, 2011

KNIME – 4th Annual User Group Meeting

Filed under: Data Analysis,Heterogeneous Data,Mapping,Subject Identity — Patrick Durusau @ 3:14 pm

KNIME – 4th Annual User Group Meeting

From the website:

The 4th KNIME Workshop and Users Meeting at Technopark in Zurich, Switzerland took place between February 28th and March 4th, 2011 and was a huge success.

The meeting was very well attended by more than 130 participants. The presentations ranged from customer intelligence and applications of KNIME in soil and fuel research through to high performance data analytics and KNIME applications in the Life Science industry. The second meeting of the special interest group attracted more than 50 attendees and was filled with talks about how KNIME can be put to use in this fast growing research area.

Presentations are available.

A new version of KNIME is available for download with the features listed in ChangeLog 2.3.3.

Focused on data analytics and workflow, KNIME is another software package that could benefit from an interchangeable subject-oriented approach.

February 22, 2011

Quantum GIS

Filed under: Geographic Information Retrieval,Mapping,Maps — Patrick Durusau @ 1:31 pm

Quantum GIS

From the website:

QGIS is a cross-platform (Linux, Windows, Mac) open source application with many common GIS features and functions. The major features include:

1. View and overlay vector and raster data in different formats and projections without conversion to an internal or common format.

Supported formats include:

  • spatially-enabled PostgreSQL tables using PostGIS and SpatiaLite,
  • most vector formats supported by the OGR library*, including ESRI shapefiles, MapInfo, SDTS and GML,
  • raster formats supported by the GDAL library*, such as digital elevation models, aerial photography or landsat imagery,
  • GRASS locations and mapsets,
  • online spatial data served as OGC-compliant WMS, WMS-C (Tile cache), WFS and WFS-T.

2. Create maps and interactively explore spatial data with a friendly graphical user interface. The many helpful tools available in the GUI include:

  • on the fly projection,
  • print composer,
  • overview panel,
  • spatial bookmarks,
  • identify/select features,
  • edit/view/search attributes,
  • feature labeling,
  • vector diagram overlay
  • change vector and raster symbology,
  • add a graticule layer,
  • decorate your map with a north arrow, scale bar and copyright label,
  • save and restore projects

3. Create, edit and export spatial data using:

  • digitizing tools for GRASS and shapefile formats,
  • the georeferencer plugin,
  • GPS tools to import and export GPX format, convert other GPS formats to GPX, or down/upload directly to a GPS unit

4. Perform spatial analysis using the fTools plugin for Shapefiles or the integrated GRASS plugin, including:

  • map algebra,
  • terrain analysis,
  • hydrologic modeling,
  • network analysis,
  • and many others

5. Publish your map on the internet using the export to Mapfile capability (requires a webserver with UMN MapServer installed)

6. Adapt Quantum GIS to your special needs through the extensible plugin architecture.

I didn’t find this on my own. 😉 This and the TIGER data source were both mentioned in Paul Smith’s Mapping with Location Data presentation.

The data and manipulations you usually find have no explicit basis in subject identity, but that is your opportunity to really shine.

Assuming, that is, that you can discover some user need that can be met with explicit subject identity, or met better with it than without.

Let’s try not to be like some vendors I could mention where a user’s problem has to fit the solution they are offering. I turned down an opportunity like that, some thirty years ago now, and see no reason to re-visit that decision.

At least in my view, any software solution has to fit my problem, not vice versa.

TIGER – Topologically Integrated Geographic Encoding and Referencing system

Filed under: Geographic Data,Mapping,Maps — Patrick Durusau @ 1:28 pm

TIGER – Topologically Integrated Geographic Encoding and Referencing system

From the US Census Bureau.

From the website:

Latest TIGER/Line® Shapefile Release

  • TIGER/Line® Shapefiles are spatial extracts from the Census Bureau’s MAF/TIGER database, containing features such as roads, railroads, rivers, as well as legal and statistical geographic areas.
  • They are made available to the public for no charge and are typically used to provide the digital map base for a Geographic Information System or for mapping software.
  • They are designed for use with geographic information system (GIS) software. The TIGER/Line® Shapefiles do not include demographic data, but they contain geographic entity codes that can be linked to the Census Bureau’s demographic data, available on American FactFinder.

2010 TIGER/Line® Shapefiles Main Page — Released on a rolling basis beginning November 30, 2010.

and,

TIGER®-Related Products

Great source of geographic and other data.

Can use it for mashups or, you can push beyond mashups to creating topic maps.

For example, plotting all the crime in an area is a mashup.

Interesting I suppose for real estate agents pushing housing in better neighborhoods.

Having the crime reported in an area, the locations of crimes committed by the same person (based on arrest reports), and that person’s known associates, that is starting to sound like a topic map. Then add in real-time observations and conversations of officers working the area.
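A sketch of the difference, with invented records: the mashup is the “occurred-at” layer alone; the topic map adds typed associations that can be traversed across subjects.

```python
# Subjects (topics) with properties; all records invented.
topics = {
    "person:doe":   {"type": "person",   "names": ["John Doe"]},
    "person:roe":   {"type": "person",   "names": ["Richard Roe"]},
    "crime:b-1138": {"type": "burglary", "date": "2011-02-12"},
    "loc:main-123": {"type": "location", "lat": 39.29, "lon": -76.61},
}

# Typed associations between subjects.
associations = [
    ("crime:b-1138", "occurred-at",  "loc:main-123"),  # the mashup layer
    ("person:doe",   "arrested-for", "crime:b-1138"),  # from arrest reports
    ("person:doe",   "associate-of", "person:roe"),    # known associates
]

# A traversal the plain mashup cannot make: locations of one person's crimes.
crimes = {c for p, r, c in associations
          if p == "person:doe" and r == "arrested-for"}
places = [c for p, r, c in associations
          if p in crimes and r == "occurred-at"]
print(places)  # ['loc:main-123']
```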

Enhancing traditional law enforcement, the most effective way to prevent terrorism.

“Mapping with Location Data” by Paul Smith (February 2011)

Filed under: Location Data,Mapping,Maps — Patrick Durusau @ 1:24 pm

“Mapping with Location Data” by Paul Smith (February 2011)

From the description:

With the recent announcement of Baltimore’s open-data initiative, “OpenBaltimore”, there’s been lots of buzz about what people can do with some of this city data. Enter Paul Smith, an expert on data and mapping. Paul will be talking about EveryBlock and how his company uses city data in their neighborhood maps, as well as showing off some cool map visualizations. He’ll also be providing some insight on how folks might be able to jump in and create their own maps based on their own location data.

Our speaker Paul Smith is co-founder and software engineer for EveryBlock, a “what’s going on in my neighborhood” website. He has been developing sites and software for the web since 1994. Originally from Maryland, he recently moved to Baltimore after more than a decade in Chicago, where he co-founded Friends of the Bloomingdale Trail and produced the Election Day Advent Calendar.

Great source of ideas for city data and use of the same.

Two ways where topic maps are a value-add:

1) The relationships between data sets are subjects that can be represented and additional information recorded about those relationships.

2) Identification of subjects can support reliable attachment of other data to the same subjects.

January 31, 2011

OpenData + R + Google = Easy Maps
(& Lessons for Topic Maps)

Filed under: Examples,Mapping,R — Patrick Durusau @ 7:15 am

OpenData + R + Google = Easy Maps from James Chesire (via R-Bloggers) is a compelling illustration of the use of R for mapping.

It also illustrates a couple of principles that are important for topic map authors to keep in mind:

1) An incomplete [topic] map is better than no [topic] map at all.

Chesire could have waited until he had all the data from every agency studying the issue of child labor and had reconciled that data with field surveys, plus published reports from news organizations, etc., but then we would not have this article, would we?

We also would not have a useful mapping of the data we have on hand.

I mention this one first because it is one that afflicts me the most.

I work on example topic maps but because they aren’t complete I am reluctant to see them as being in publishable shape.

The principle from software coding, release early and often, should be the operative one for topic map authoring.

2) There is no true view of the data that should be honored.

Many governments of countries on this map would dispute the accuracy of the data. And your point would be?

Every map tells a story from a point of view.

There isn’t any reason for your topic map to await approval of any particular group or organization included in it.

A world of data awaits us as topic mappers.

The only question is whether we are going to step up to take advantage of it?

*****
PS: My position on incomplete topic maps is not inconsistent with my view on PR driven SQL data dumps that are topic maps in name only. As they say, you can put lipstick on a pig, ….

January 28, 2011

Why Command Helpers Suck – Post

Filed under: Database,Examples,Mapping — Patrick Durusau @ 6:53 am

Why Command Helpers Suck is an amusing rant by Kristina Chodorow (author of MongoDB: The Definitive Guide) on the different command helpers for the same underlying database commands.

Shades of XWindows documentation and the origins of topic maps. Same commands, different terminology.

If, as Robert Cerny has suggested, topic maps don’t offer something new, then I think it is fair to observe that the problems topic maps work to solve aren’t new either. 😉

A bit more seriously, topic maps could offer Kristina a partial solution.

Imagine a utility for command helpers that is actively maintained and that has a mapping between all the known command helpers and a given database command.

Just enter the command you know and the appropriate command is sent to the database.

That is the sort of helper application that could easily find a niche.

The master mapping could be maintained with full identifications, notes, etc., but there needs to be a compiled version for speed of response.
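A sketch of that compiled side (helper spellings and command names are illustrative, not an inventory of any real shell): a flat lookup from every known helper to the canonical database command.

```python
# Compiled version of the master mapping: every known helper spelling
# points at the canonical command it denotes. Names are illustrative.
HELPER_TO_COMMAND = {
    "db.stats()":        {"dbstats": 1},
    "db.getStats()":     {"dbstats": 1},
    "database_stats":    {"dbstats": 1},
    "db.serverStatus()": {"serverStatus": 1},
}

def translate(helper: str) -> dict:
    """Return the canonical command for whatever helper the user knows."""
    try:
        return HELPER_TO_COMMAND[helper]
    except KeyError:
        raise ValueError(f"unknown helper: {helper!r}") from None

print(translate("db.getStats()"))  # {'dbstats': 1}
```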

January 23, 2011

geocommons

Filed under: Dataset,Geographic Information Retrieval,Mapping,Maps — Patrick Durusau @ 9:27 pm

geocommons

A very impressive resource for mapping data against a common geographic background.

Works for a lot of reasons, not the least of which is the amount of effort that has gone into the site and its tools.

But I think having a common frame of reference, that is, geographic locations, simplifies the problem addressed by topic maps.

That is, the data is seen through the common lens of geographic boundaries and/or locations.

To make it closer to the problem faced by topic maps, what if geographic locations had to be brought into focus before data could be mapped against them?

That seems to me to be the harder problem.

January 19, 2011

Quantum Mechanics of Topic Maps

Filed under: Category Theory,Mapping,Maps,Topic Maps — Patrick Durusau @ 6:47 pm

I ran across Alfred Korzybski’s dictum “…the map is not the territory…” the other day.

I’ve repeated it and have heard others repeat it.

Not to mention it being quoted in any number of books on mapping and mapping technologies.

It’s a natural distinction, between the artifact of a map and the territory it is mapping.

But it is important to note that Korzybski did not say “…a map cannot be a territory….”

Like the wave/particle duality in quantum mechanics, maps can be maps or they can be territories.

Depends upon the purpose with which we are viewing them.

A rather wicked observer effect that changes the formal properties of a map vis-a-vis a territory to being the properties of a territory vis-a-vis a map.

Maps (that is, syntaxes/data models) try to avoid that observer effect by proclaiming themselves to be the best of all possible maps, in the tradition of Dr. Pangloss.

They may be the best map for some situation, but they remain subject to being viewed as a territory, should the occasion arise.

(If that sounds like category theory to you, give yourself a gold star.)

The map-as-territory principle is what enables the viewing of subject representatives in different maps as representatives of the same subjects.

Otherwise, we must await the arrival of the universal mapping strategy.

It is due to arrive on the same train as the universal subject identifier for all subjects, for all situations and time periods.

January 5, 2011

Map of American English Dialects and Subdialects – Post

Filed under: Data Source,Mapping,Maps — Patrick Durusau @ 9:05 am

Map of American English Dialects and Subdialects

From Flowingdata.com, a delightful map of American English dialects and subdialects. Several hundred YouTube videos are accessible through the map as examples.

An interesting example of mapping but, moreover, it looks like an excellent candidate for a topic map that binds in additional resources on the subject.

Enjoy!

January 4, 2011

ColorBrewer – A Tool for Color Design in Maps – Post

Filed under: Authoring Topic Maps,Graphics,Mapping,Maps — Patrick Durusau @ 10:23 am

ColorBrewer – A Tool for Color Design in Maps

From Matthew Hurst:

Just found ColorBrewer2 – a tool that helps select color schemes for map based data. The tool allows you to play with different criteria, then proposes a space of possible color combinations. Proactively filtering for color blindness, photocopy friendly and printer friendly is great. Adding projector friendly (no yellow please) would be nice. I’d love to see something like this for time series and other statistical data forms.

Just the thing for planning map based interfaces for topic maps!

December 18, 2010

KNIME Version 2.3.0 released – News

Filed under: Heterogeneous Data,Mapping,Software,Subject Identity — Patrick Durusau @ 12:48 pm

KNIME Version 2.3.0 released

From the announcement:

The new version greatly enhances the usability of KNIME. It adds new features like workflow annotations, support for hotkeys, inclusion of R-views in reports, data flow switches, an option to hide node labels, variable support in the database reader/connector and R-nodes, and the ability to export KNIME workflows as SVG graphics.

With the 2.3 release we are also introducing a community node repository, which includes KNIME extensions for bio- and chemoinformatics and an advanced R-scripting environment.

December 14, 2010

Medical researcher discovers integration, gets 75 citations

Filed under: Humor,Mapping,Topic Maps — Patrick Durusau @ 5:52 pm

Medical researcher discovers integration, gets 75 citations

Steve Derose forwarded a pointer to this post.

Highly recommended.

Summary: A medical researcher re-discovers a technique for determining the area under a curve; it is accepted for publication and then cited 75 times. BTW, second-semester calculus starts with this issue.
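By most retellings of this story, the rediscovered technique is the trapezoidal rule, which takes a few lines:

```python
def trapezoid_area(xs, ys):
    """Area under a sampled curve: sum the trapezoids
    (x1 - x0) * (y0 + y1) / 2 over consecutive sample points."""
    pairs = list(zip(xs, ys))
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pairs, pairs[1:]))

# Glucose-curve flavored example: y = x^2 sampled on [0, 1].
xs = [i / 100 for i in range(101)]
ys = [x * x for x in xs]
print(trapezoid_area(xs, ys))  # ~0.33335 vs. the exact 1/3
```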

Questions:

  1. Can topic maps help researchers discover information in other fields? Yes/No? (3-5 pages, no citations)
  2. Assume yes, how would you construct a topic map to help the medical researcher? (3-5 pages, no citations)
  3. Assume no, what are the barriers that prevent topic maps from helping the researcher? (3-5 pages, no citations)

December 2, 2010

Building Concept Structures/Concept Trails

Automatically Building Concept Structures and Displaying Concept Trails for the Use in Brainstorming Sessions and Content Management Systems by Christian Biemann, Karsten Böhm, Gerhard Heyer and Ronny Melz.

Abstract:

The automated creation and the visualization of concept structures become more important as the number of relevant information continues to grow dramatically. Especially information and knowledge intensive tasks are relying heavily on accessing the relevant information or knowledge at the right time. Moreover the capturing of relevant facts and good ideas should be focused on as early as possible in the knowledge creation process.

In this paper we introduce a technology to support knowledge structuring processes already at the time of their creation by building up concept structures in real time. Our focus was set on the design of a minimal invasive system, which ideally requires no human interaction and thus gives the maximum freedom to the participants of a knowledge creation or exchange processes. The initial prototype concentrates on the capturing of spoken language to support meetings of human experts, but can be easily adapted for the use in Internet communities that have to rely on knowledge exchange using electronic communication channel.

I don’t share the authors’ confidence that corpus linguistics is going to provide the level of accuracy expected.

But, I find the notion of a dynamic semantic map that grows, changes and evolves during a discussion to be intriguing.

This article was published in 2006 so I will follow up to see what later results have been reported.

November 25, 2010

Sig.ma – Live views on the Web of Data

Filed under: Indexing,Information Retrieval,Lucene,Mapping,RDF,Search Engines,Semantic Web — Patrick Durusau @ 10:27 am

Sig.ma – Live views on the Web of Data

From the website:

In Sig.ma, elements such as large scale semantic web indexing, logic reasoning, data aggregation heuristics, pragmatic ontology alignments and, last but not least, user interaction and refinement, all play together to provide entity descriptions which become live, embeddable data mash ups.

Read one of various versions of an article on Sig.ma for the technical details.

From the Web Technologies article cited on the homepage:

Sig.ma revolves around the creation of Entity Profiles. An entity profile – which in the Sig.ma dataflow is represented by the “data cache” storage (Fig. 3) – is a summary of an entity that is presented to the user in a visual interface, or which can be returned by the API as a rich JSON object or a RDF document. Entity profiles usually include information that is aggregated from more than one source. The basic structure of an entity profile is a set of key-value pairs that describe the entity. Entity profiles often refer to other entities, for example the profile of a person might refer to their publications.

No, this isn’t an implementation of the TMRM.

This is an implementation of one way to view entities for a particular type of data. A very exciting one but still limited to a particular data set.

This is a big step forward.

For example, it isn’t hard to imagine entity profiles against particular websites or data sets. Entity profiles that are maintained and leased for use with search engines like Sig.ma.

Or going a bit further and declaring a basis for identification of subjects, such as the existence of properties a…n in an RDF graph.
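A sketch of that last idea with invented data: declare which properties constitute the basis for identification, then aggregate any descriptions that agree on all of them into one entity profile.

```python
# Hypothetical basis for identification: descriptions denote the same
# entity if they carry, and agree on, every declared key property.
KEY_PROPERTIES = ("name", "affiliation")

def same_entity(a: dict, b: dict) -> bool:
    return all(p in a and p in b and a[p] == b[p] for p in KEY_PROPERTIES)

def build_profiles(descriptions):
    """Fold per-source descriptions into entity profiles, aggregating
    key-value pairs from every source judged to describe one entity."""
    profiles = []
    for desc in descriptions:
        for profile in profiles:
            if same_entity(profile, desc):
                profile.update(desc)  # aggregate new key-value pairs
                break
        else:
            profiles.append(dict(desc))
    return profiles

sources = [
    {"name": "J. Smith", "affiliation": "DERI", "homepage": "http://example.org/js"},
    {"name": "J. Smith", "affiliation": "DERI", "publications": 12},
]
print(build_profiles(sources))  # one profile carrying all four keys
```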

Questions:

  1. Spend a couple of hours with Sig.ma researching library related questions. (Discussion)
  2. What did you like, dislike or find surprising about Sig.ma? (3-5 pages, no citations)
  3. Entity profiles for library science (Class project)

Sig.ma: Live Views on the web of data – bibliography issues

I normally start with a DOI here so you can see the article in question.

Not here.

Here’s why:

Sig.ma: Live views on the Web of Data Journal of Web Semantics. (10 pages)

Sig.ma: Live Views on the Web of Data WWW ’10 Proceedings(demo, 4 pages)

Sig.ma: Live Views on the Web of Data (8 pages) http://richard.cyganiak.de/2008/papers/sigma-semwebchallenge2009.pdf

Sig.ma: Live Views on the Web of Data (4 pages) http://richard.cyganiak.de/2008/papers/sigma-demo-www2010.pdf

Sig.ma: Live Views on the Web of Data (25 pages) http://fooshed.net/paper/JWS2010.pdf

Before saying anything ugly, ;-), this is some of the most exciting research I have seen in a long time. I will cover that part of it in a following post. But, to the matter at hand, bibliographic control.

Five (5) different articles, two published in recognized journals, all with the same name? (The demo articles are the same but have different headers/footers and page numbers, and so would likely be indexed as different articles.)

I will be able to resolve any confusion by obtaining the article in question.

But that isn’t an excuse.

I, along with everyone else interested in this research, will waste a small part of our time resolving the confusion. Confusion that could have been avoided for everyone.

Not unlike everyone who does the same search having to tread the same google glut.

With no way to pass on what we have resolved, for the benefit of others.

Questions:

  1. Help these authors out. How would you suggest they avoid this in the future? Use of the name is important. (3-5 pages, no citations)
  2. Help the library out. How will you deal with multiple papers with the same title, authors, pub year? (this isn’t uncommon) (3-5 pages, citations optional)
  3. How would you use topic maps to resolve this issue? (3-5 pages, no citations)

November 24, 2010

Strange Maps

Filed under: Mapping,Maps — Patrick Durusau @ 9:47 am

Strange Maps

From the website:

Frank Jacobs loves maps, but finds most atlases too predictable. He collects and comments on all kinds of intriguing maps—real, fictional, and what-if ones—and has been writing the Strange Maps blog since 2006, first on WordPress and now for Big Think.

I mention this because maps are often seen as depicting the way things are.

I prefer to think of maps, including maps of subjects, as useful for particular purposes.

That isn’t quite the same thing.

Questions:

  1. What basis for comparison/representation would you like to see used for states or counties? (discussion)
  2. What do you think that would show/demonstrate differently from standard maps? (discussion)
  3. Suggest data sources for creating such a representation. (3-5 pages, citations)

November 22, 2010

TxtAtlas

Filed under: Information Retrieval,Interface Research/Design,Mapping — Patrick Durusau @ 7:08 am

TxtAtlas

First noticed on Alex Popescu’s blog.

Text a phone number and then it appears as an entry on a map.

I have an uneasy feeling this may be important.

Not this particular application but the ease of putting content from dispersed correspondents together into a single map.

I wonder if instead of distance the correspondents could be dispersed over time? Say as users of a document archive?*

Questions:

  1. How would you apply these techniques to a document archive? (3-5 pages, no citations)
  2. How would you adapt the mapping of a document archive based on user response? (3-5 pages, no citations)
  3. Design an application of this technique for a document archive. (Project)

*Or for those seeking more real-time applications, imagine GPS coordinates + status updates from cellphones on a more detailed map. Useful for any number of purposes.

October 29, 2010

Ordnance Survey Linked Data

Filed under: Authoring Topic Maps,Mapping,Merging,Topic Maps — Patrick Durusau @ 5:40 am

Ordnance Survey Linked Data.

Description:

Ordnance Survey is Great Britain’s national mapping agency, providing the most accurate and up-to-date geographic data, relied on by government, business and individuals. OS OpenData is the opening up of Ordnance Survey data as part of the drive to increase innovation and support the “Making Public Data Public” initiative. As part of this initiative Ordnance Survey has published a number of its products as Linked Data. Linked Data is a growing part of the Web where data is published on the Web and then linked to other published data in much the same way that web pages are interlinked using hypertext. The term Linked Data is used to describe a method of exposing, sharing, and connecting data via URIs on the Web….

Let’s use topic maps to connect subjects that don’t have URIs.

Subject mapping exercise:

  1. Connect 5 subjects from the Domesday Book
  2. Connect 5 subjects from either The Shakespeare Paper Trail: The Early Years and/or The Shakespeare Paper Trail: The Later Years
  3. Connect 5 subjects from WW2 People’s War (you could do occurrences but try for something more imaginative)
  4. Connect 5 subjects from some other period of English history.
  5. Suggest other linked data sources and sources of subjects for subject mapping (extra credit)

October 17, 2010

The Neighborhood Auditing Tool for the UMLS and its Source Terminologies

Filed under: Authoring Topic Maps,Interface Research/Design,Mapping,Topic Maps,Usability — Patrick Durusau @ 5:19 am

The next NCBO Webinar will be presented by Dr. James Geller from the New Jersey Institute of Technology on “The Neighborhood Auditing Tool for the UMLS and its Source Terminologies” at 10:00am PDT, Wednesday, October 20.

ABSTRACT:

The UMLS’s integration of more than 100 source vocabularies makes it susceptible to errors. Furthermore, its size and complexity can make it very difficult to locate such errors. A software tool, called the Neighborhood Auditing Tool (NAT), that facilitates UMLS auditing is presented. The NAT supports “neighborhood-based” auditing, where, at any given time, an auditor concentrates on a single focus concept and one of a variety of neighborhoods of its closely related concepts. The NAT can be seen as a special browser for the complex structure of the UMLS’s hierarchies. Typical diagrammatic displays of concept networks have a number of shortcomings, so the NAT utilizes a hybrid diagram/text interface that features stylized neighborhood views which retain some of the best features of both the diagrammatic layouts and text windows while avoiding the shortcomings. The NAT allows an auditor to display knowledge from both the Metathesaurus (concept) level and the Semantic Network (semantic type) level. Various additional features of the NAT that support the auditing process are described. The usefulness of the NAT is demonstrated through a group of case studies. Its impact is tested with a study involving a select group of auditors.


WEBEX DETAILS:
Topic: NCBO Webinar Series
Date: Wednesday, October 20, 2010
Time: 10:00 am, Pacific Daylight Time (San Francisco, GMT-07:00)
Meeting Number: 929 613 752
Meeting Password: ncbomeeting

****

The above is a deeply edited version of NCBO Webinar – James Geller, October 20 at 10:00am PT, which has numerous other details.

If you translate “integration” as “merging,” the relevance to topic maps and to the exploration of data sets becomes immediately obvious.
