Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

December 17, 2010

Google Books Ngram Viewer

Filed under: Dataset,Software — Patrick Durusau @ 4:33 pm

Google Books Ngram Viewer

From the website:

Scholars interested in topics such as philosophy, religion, politics, art and language have employed qualitative approaches such as literary and critical analysis with great success. As more of the world’s literature becomes available online, it’s increasingly possible to apply quantitative methods to complement that research. So today Will Brockman and I are happy to announce a new visualization tool called the Google Books Ngram Viewer, available on Google Labs. We’re also making the datasets backing the Ngram Viewer, produced by Matthew Gray and intern Yuan K. Shen, freely downloadable so that scholars will be able to create replicable experiments in the style of traditional scientific discovery.

Since 2004, Google has digitized more than 15 million books worldwide. The datasets we’re making available today to further humanities research are based on a subset of that corpus, weighing in at 500 billion words from 5.2 million books in Chinese, English, French, German, Russian, and Spanish. The datasets contain phrases of up to five words with counts of how often they occurred in each year.
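The downloadable datasets are plain tab-separated files, one n-gram per line with yearly counts. A minimal sketch of tallying a phrase's occurrences per year, assuming the five-column layout described in the release notes (ngram, year, match count, page count, volume count) — the sample numbers below are illustrative, not real counts:

```python
import csv
from collections import defaultdict

# Illustrative rows in the (assumed) tab-separated layout:
# ngram, year, match_count, page_count, volume_count.
sample = """circumvallate\t1978\t335\t91\t91
circumvallate\t1979\t261\t91\t91
semantic diversity\t1978\t12\t10\t9
semantic diversity\t1979\t19\t15\t11"""

def counts_by_year(rows, phrase):
    """Total occurrences of a phrase per year."""
    totals = defaultdict(int)
    for ngram, year, match_count, _pages, _vols in rows:
        if ngram == phrase:
            totals[int(year)] += int(match_count)
    return dict(totals)

rows = csv.reader(sample.splitlines(), delimiter="\t")
print(counts_by_year(rows, "semantic diversity"))  # {1978: 12, 1979: 19}
```

Check the dataset download page for the exact column layout before relying on it; the point is only that the files are simple enough to process with ordinary scripting tools.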

Tracing shifts in language usage will help topic map designers create maps for historical materials that require less correction by users.

One wonders whether the extracts can be traced back to particular works.

That would enable a map developed for these extracts to be used with the scanned texts themselves.

Data Driven Journalism

Filed under: Data Mining,R — Patrick Durusau @ 2:15 pm

Data Driven Journalism

Report of a presentation by Peter Aldhous, San Francisco Bureau Chief of New Scientist magazine, to the Bay Area R user group.

The main focus is on the use of data in journalism, with coverage of R.

Needs only minor tweaking to make an excellent case for topic maps in journalism.

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

CFP – Dealing with the Messiness of the Web of Data – Journal of Web Semantics

From the call:

Research on the Semantic Web, which is now in its second decade, has had a tremendous success in encouraging people to publish data on the Web in structured, linked, and standardized ways. The success of what has now become the Web of Data can be read from the sheer number of triples available within the Linked-Open Data, Linked Life Data and Open-Government initiatives. However, this growth in data makes many of the established assumptions inappropriate and offers a number of new research challenges.

In stark contrast to early Semantic Web applications that dealt with small, hand-crafted ontologies and data-sets, the new Web of Data comes with a plethora of contradicting world-views and contains incomplete, inconsistent, incorrect, fast-changing and opinionated information. This information not only comes from academic sources and trustworthy institutions, but is often community built, scraped or translated.

In short: the Web of Data is messy, and methods to deal with this messiness are paramount for its future.

Now, we have two choices as the topic map community:

  • congratulate ourselves for seeing this problem long ago, high five each other, etc., or
  • step up and offer topic map solutions that incorporate as much of the existing SW work as possible.

I strongly suggest the second one.

Important dates:

We will aim at an efficient publication cycle in order to guarantee prompt availability of the published results. We will review papers on a rolling basis as they are submitted and explicitly encourage submissions well before the submission deadline. Submit papers online at the journal’s Elsevier Web site.

Submission deadline: 1 February 2011
Author notification: 15 June 2011

Revisions submitted: 1 August 2011
Final decisions: 15 September 2011
Publication: 1 January 2012

December 16, 2010

Emergent Semantics

Filed under: Bayesian Models,Emergent Semantics,Self-Organizing — Patrick Durusau @ 7:08 pm

Philippe Cudré-Mauroux Video, Slides from SOKS: Self-Organising Knowledge Systems, Amsterdam, 29 April 2010

Abstract:

Emergent semantics refers to a set of principles and techniques analyzing the evolution of decentralized semantic structures in large scale distributed information systems. Emergent semantics approaches model the semantics of a distributed system as an ensemble of relationships between syntactic structures.

They consider both the representation of semantics and the discovery of the proper interpretation of symbols as the result of a self-organizing process performed by distributed agents exchanging symbols and having utilities dependent on the proper interpretation of the symbols. This is a complex systems perspective on the problem of dealing with semantics.

A “must see” presentation!

More comments/questions to follow.

*****
Apologies but content/postings will be slow starting today, for a few days. Diagnostic on left hand has me doing hunt-and-peck with my right.

December 15, 2010

Mergent Annual Reports API

Filed under: Data Source — Patrick Durusau @ 3:59 pm

Mergent Annual Reports API

From http://www.programmableweb.com:

Provides scanned annual report documents (e.g. in PDF format) for US and Canadian publicly traded companies, with more than 20 years of historical reports available.

According to the Mergent Annual Reports API docs you can search on:

  • nameContains
  • naicsPrefix
  • primaryCusipPrefix
  • primaryTicker
  • sicPrefix
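A sketch of composing a search against those fields, restricted to the parameter names listed in the docs. The base URL here is a placeholder, not the real endpoint:

```python
from urllib.parse import urlencode

# Hypothetical base URL -- substitute the real endpoint from the API docs.
BASE = "https://api.example.com/annualreports"

def build_query(**params):
    """Compose a search URL from the documented field filters."""
    allowed = {"nameContains", "naicsPrefix", "primaryCusipPrefix",
               "primaryTicker", "sicPrefix"}
    unknown = set(params) - allowed
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    return BASE + "?" + urlencode(params)

url = build_query(primaryTicker="IBM", sicPrefix="357")
print(url)
```

Note what you can't do with this parameter set: no full-text field at all, which is the oddity discussed below.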

The SEC’s EDGAR tutorial mentions that the SEC has full text of annual reports and attachments for the last four (4) years.

So the omission of full-text searching looks odd.

Questions:

  1. Review the Edgar documentation. What other fixed fields would you add to this interface and why? (2-3 pages, no citations)
  2. Would you offer full-text searching? If yes, what do you gain? If no, why not? (2-3 pages, no citations)
  3. What other information (this source or some other) would you want to include? Why? (2-3 pages, no citations)

M*A*S*H 4077th + Klinger = Rationale for Topic Maps

Filed under: Examples,Topic Maps — Patrick Durusau @ 11:35 am

Period of Adjustment, M*A*S*H 4077th, Season 8, is an episode where Corporal Klinger has to learn to fill the shoes of the much beloved Corporal “Radar” O’Reilly.

It is also a scene that everyone has seen repeated everywhere from boardrooms to lunchrooms.

A key person, who has accumulated skills and knowledge about a position, retires, is transferred or promoted.

There follows a hellish period while their replacement learns where the supplies are stored, the key people to alert to problems, when to start prepping the boss for meetings, etc.

It seems like they will never get any of it right, until one day, without our really noticing it, they do.

They keep getting it right, until that person retires, is transferred or promoted. Then the hellish period begins again.

What if we could capture that organizational knowledge?

Organizational knowledge that wasn’t written down.

Topic maps help capture organizational knowledge because your staff can:

  • write down what they know,
  • how they know it,
  • when it is important to know it,
  • all in their own terms.

When your staff gets the opportunity (and the tools) to express that knowledge is up to you.

****
PS: Can someone help me update this example?

I haven’t watched much TV since the end of M*A*S*H and more recent examples would be a good thing. Thanks!

Mergent Company Fundamentals

Filed under: Data Source — Patrick Durusau @ 9:26 am

Mergent Company Fundamentals

From http://www.programmableweb.com:

Provides financial information on US and Canadian publicly traded companies, including historical financial statements (balance sheets, cash flow statements, income statements and ratios) going back 20+ years. Additionally, provides information on executives, including current and historical compensation, biography, insider holdings and insider transactions.

Interesting as a data source and certainly useful with other Mergent products (to be posted).

Questions:

  1. Identify three (3) other data sources that could be integrated with information from this API. (1-2 pages, citations)
  2. How would you integrate those resources with this one using a topic map? (3-5 pages, no citations)
  3. Why would you want to integrate those records using a topic map? What needs would be met by this integration? (3-5 pages, no citations)

Graph Databases – Intro Slide Deck – Ontologies – Connectedness

Filed under: Graphs,Ontology — Patrick Durusau @ 8:27 am

Graph (Theory and Databases) is a nice overview of graph theory and databases by Pere Urbón-Bayes. I first saw it at Alex Popescu’s site.

I do have a quibble about slide 14 with the usual graph showing progress towards Everything connected.

To your lower left are ontologies, RDF, Linked Data, Tagging, moving I suppose from less to more connected.

There is only one problem: Everything is already connected.

It doesn’t need electronic or other information systems for connections.

What is at issue is the representation of connections in electronic information systems.

The reason I emphasize that point is that all representations in electronic information systems are partial representations of some connections.

And as far as that goes, all representations do better with some aspects of connections than others.

For example, I don’t think that ontologies are further up the connection line than folksonomies.

Which one is more connected depends on your particular requirements.

Mashup Champaign

Filed under: Examples,Topic Maps — Patrick Durusau @ 7:25 am

When I think about government information, I tend to think about national or state government type information.

Which overlooks the level of government that has the most impact on our day to day lives, county and city government.

I am sure we have all seen the development maps that show how urban patterns develop.

What if we had a map (with a topic map underneath) that had ownership, family/business relationships keyed to the real estate and transactions on real estate, and holders of political office as the binding points?

I have started looking at Champaign for likely resources.

Questions:

  1. Examine the Enterprise Zone map. Comments/observations? (2-3 pages, no citations)
  2. What other city records would you want to see in connection with the map in #1? Are those records online? (2-3 pages, no citations)
  3. What relationships and other information would you want to record in a topic map about this map? (3-5 pages, no citations)

Mashup Australia

Filed under: Examples — Patrick Durusau @ 7:21 am

Mashup Australia

I ran across this site as a reference in a story at http://www.programmableweb.com.

From the website:

The Government 2.0 Taskforce invited you to MashupAustralia to help us show why open access to Australian government information is good for our economy and society.

Recommended as a source of mashup ideas.

Questions:

  1. Pick one of the mashups to review. Write a summary of the mashup. (2-3 pages, pointer to the mashup)
  2. What, if anything, would use of a topic map add to this mashup? (3-5 pages, no citations)
  3. Pick a related mashup. How would you integrate these two mashups using a topic map? (3-5 pages, no citations)

Geekiest Christmas Present Ever

Filed under: Conferences — Patrick Durusau @ 5:35 am

Want the geekiest Christmas present ever?

How about a week packed with long monologues/conversations about the caves, mazes and plumbing that underlie publishing, business and government?

How about a pair of tickets (you and your “other”) to Montreal for the Balisage 2011 conference?!

Previously seen at Balisage conferences:

Michael Kay, who became editor of XSLT/XPath, so he could write another book. (Note to Michael: Lots of people write books without first writing standards.)

Steve Newcomb, beloved by all, understood by, well, that is a somewhat shorter list. (Check at the Balisage registration desk. We’ll leave a note.)

Ken Holman, who is reformatting the world’s business documents, one country at a time. (Ken will soon have a permanent booth at the UN.)

Jon Bosak, who specializes in helping businesses he isn’t working for, work better. (Wait, there’s something wrong with that. It’s true, but seems unnatural.)

Michael Sperberg-McQueen, the markup world’s answer to Paladin, “HAVE MARKUP WILL TRAVEL.” (Why a joker on his business card? You’ll have to ask him.)

Important Dates:

  • 11 March 2011 – Peer review applications due
  • 8 April 2011 – Paper submissions due
  • 8 April 2011 – Applications due for student support awards
  • 20 May 2011 – Speakers notified
  • 8 July 2011 – Final papers due
  • 1 August 2011 – Pre-conference Symposium
  • 2-5 August 2011 – Balisage: The Markup Conference

Get your tickets now before all the planes/trains fill up, all going to Montreal for the Balisage 2011 conference!

Have a geeky Christmas!

(Or holiday/non-holiday/festival of your choice. Just be sure to be in Montreal for Balisage.)

December 14, 2010

Medical researcher discovers integration, gets 75 citations

Filed under: Humor,Mapping,Topic Maps — Patrick Durusau @ 5:52 pm

Medical researcher discovers integration, gets 75 citations

Steve Derose forwarded a pointer to this post.

Highly recommended.

Summary: A medical researcher re-discovers a technique for determining the area under a curve; it is accepted for publication and then cited 75 times. BTW, second-semester calculus starts with this issue.

Questions:

  1. Can topic maps help researchers discover information in other fields? Yes/No? (3-5 pages, no citations)
  2. Assume yes, how would you construct a topic map to help the medical researcher? (3-5 pages, no citations)
  3. Assume no, what are the barriers that prevent topic maps from helping the researcher? (3-5 pages, no citations)

NKE: Navigational Knowledge Engineering

Filed under: Authoring Topic Maps,Ontology,Subject Identity,Topic Maps — Patrick Durusau @ 5:36 pm

NKE: Navigational Knowledge Engineering

From the website:

Although structured data is becoming widely available, no other methodology – to the best of our knowledge – is currently able to scale up and provide light-weight knowledge engineering for a massive user base. Using NKE, data providers can publish flat data on the Web without extensively engineering structure upfront, but rather observe how structure is created on the fly by interested users, who navigate the knowledge base and at the same time also benefit from using it. The vision of NKE is to produce ontologies as a result of users navigating through a system. This way, NKE reduces the costs for creating expressive knowledge by disguising it as navigation. (emphasis in original)

This methodology may or may not succeed but it demonstrates a great deal of imagination.

Now imagine a similar concept but built around subject identity.

Where known ambiguities offer a user a choice of subjects to identify.

Or where there are different ways to identify a subject. The harder case.

Questions:

  1. Read the paper/run the demo. Comments, suggestions? (3-5 pages, no citations)
  2. How would you adapt this approach to the identification of subjects? (3-5 pages, no citations)
  3. What data set would you suggest for a test case using the technique you describe in #2? Why is that data set a good test? (3-5 pages, pointers to the data set)

World Library and Information Congress : 77th IFLA General Conference and Assembly

Filed under: Conferences,Library,Library Associations — Patrick Durusau @ 3:30 pm

Calls for Papers: World Library and Information Congress : 77th IFLA General Conference and Assembly 13-18 August 2011, San Juan, Puerto Rico

You should visit the main site as well but I linked directly to the call for papers listing. Some 15 of 16 calls for the main conference are still open and there are calls for satellite meeting papers as well.

Proceedings from prior conferences are available (at least the two that I checked) and I will include links to those in an upcoming post.

CFP: 10th International Workshop on Web Semantics (WebS 2011),

Filed under: Conferences,Ontology — Patrick Durusau @ 8:32 am

CFP: 10th International Workshop on Web Semantics (WebS 2011)

The 10th International Workshop on Web Semantics (WebS 2011) will be held in conjunction with the 22nd International Conference on Database and Expert Systems Applications (DEXA), to be held on 29 August – 02 September 2011 in Toulouse, France.

From the email post:

The special topic “Reliability of ontologies” aims on detecting reusable ontologies and measuring the reliability of possible reusable ontology candidates. How can we measure the reliability and the usability of ontologies? Which adaptations of state-of-the-art ontology engineering methodologies are necessary to support modeling reusable ontologies? What measurements for defining and comparing ontologies can be used and how could ontology repositories use them? These are some of the open research questions to be addressed by papers dedicated to this year’s special topic.

Important dates:

  • Paper submission: March 04, 2011
  • Notification of acceptance: May 16, 2011
  • WebS 2011 workshop: 29 August – 02 September, 2011

Questions:

  1. What role could topic maps play in answering/exploring the questions for this workshop? (3-5 pages, citations)
  2. (if after the workshop) How would a topic map solution differ from the solution offered by the paper you have chosen from those presented? (3-5 pages, citations)
  3. (after the workshop, extra credit) Create a topic map of the program committee, the presenters, the affiliations of the presenters, with a visual display of the same.*

*I don’t know what you will find, if anything. It is something I have always been curious about but obviously not curious enough to do the analysis.

Invenio – Library Software

Filed under: Library software,OPACS,Software — Patrick Durusau @ 7:51 am

Invenio (new release)

From the website:

Invenio is a free software suite enabling you to run your own digital library or document repository on the web. The technology offered by the software covers all aspects of digital library management from document ingestion through classification, indexing, and curation to dissemination. Invenio complies with standards such as the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic format. The flexibility and performance of Invenio make it a comprehensive solution for management of document repositories of moderate to large sizes (several millions of records).

Invenio has been originally developed at CERN to run the CERN document server, managing over 1,000,000 bibliographic records in high-energy physics since 2002, covering articles, books, journals, photos, videos, and more. Invenio is being co-developed by an international collaboration comprising institutes such as CERN, DESY, EPFL, FNAL, SLAC and is being used by about thirty scientific institutions worldwide (see demo).

One of many open source library projects where topic maps are certainly relevant.

Questions:

Choose one site for review and one for comparison from General/Demo – Invenio

  1. What features of the site you are reviewing could be enhanced by the use of topic maps? Give five (5) specific search results that could be improved and then say how they could be improved. (3-5 pages, include search results)
  2. Are your improvements domain specific? Use the comparison site in answering this question. (3-5 pages, no citations)
  3. How would you go about making the case for altering the current distribution? What is the payoff for the end user? (Not the same as enhancement; this asks when end users would find it easier/better/faster. Perhaps you should ask end users? How would you do that?) (3-5 pages, no citations)

Duplicate and Near Duplicate Documents Detection: A Review

Filed under: Data Mining,Duplicates — Patrick Durusau @ 7:24 am

Duplicate and Near Duplicate Documents Detection: A Review

Authors: J Prasanna Kumar, P Govindarajulu

Keywords: Web Mining, Web Content Mining, Web Crawling, Web pages, Duplicate Document, Near duplicate pages, Near duplicate detection

Abstract:

The development of Internet has resulted in the flooding of numerous copies of web documents in the search results making them futilely relevant to the users thereby creating a serious problem for internet search engines. The outcome of perpetual growth of Web and e-commerce has led to the increase in demand of new Web sites and Web applications. Duplicated web pages that consist of identical structure but different data can be regarded as clones. The identification of similar or near-duplicate pairs in a large collection is a significant problem with wide-spread applications. The problem has been deliberated for diverse data types (e.g. textual documents, spatial points and relational records) in diverse settings. Another contemporary materialization of the problem is the efficient identification of near-duplicate Web pages. This is certainly challenging in the web-scale due to the voluminous data and high dimensionalities of the documents. This survey paper has a fundamental intention to present an up-to-date review of the existing literature in duplicate and near duplicate detection of general documents and web documents in web crawling. Besides, the classification of the existing literature in duplicate and near duplicate detection techniques and a detailed description of the same are presented so as to make the survey more comprehensible. Additionally a brief introduction of web mining, web crawling, and duplicate document detection are also presented.
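Many of the techniques the survey classifies build on shingling: represent each document as its set of overlapping k-word windows and compare the sets. A minimal sketch of that resemblance measure (Jaccard similarity over word shingles) — a toy version, without the sketching/hashing that makes it work at web scale:

```python
def shingles(text, k=3):
    """The set of overlapping k-word windows in a document."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Resemblance of two shingle sets: 1.0 = identical, 0.0 = disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

doc1 = "the quick brown fox jumps over the lazy dog"
doc2 = "the quick brown fox leaps over the lazy dog"
print(jaccard(shingles(doc1), shingles(doc2)))  # 0.4
```

Two documents with resemblance above some threshold (often 0.9 or so) are treated as near duplicates; the papers reviewed differ mainly in how they approximate this comparison cheaply over billions of pages.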

Questions:

Duplicate document detection is a rapidly evolving field.

  1. What general considerations would govern a topic map to remain current in this field?
  2. What would we need to extract from this paper to construct such a map?
  3. What other technologies would we need to use in connection with such a map?
  4. What data sources should we use for such a map?

10 Best Data Visualization Projects of the Year – 2010

Filed under: Graphics,Visualization — Patrick Durusau @ 5:19 am

10 Best Data Visualization Projects of the Year – 2010

If you don’t know the flowingdata.com site, the playground of Nathan Yau, you should.

This is data visualization at its best.

Questions:

For the following, exclude data visualizations reported by Yau. That would make it too easy.

  1. Find three examples of good data visualization and say why they are good data visualization. (2-3 pages, references/pointers to the visualizations)
  2. Find three examples of poor data visualization and say why they are poor data visualizations. (2-3 pages, references/pointers to the visualizations)
  3. How would you use visualization with all or part of your topic map? (3-5 pages, no citations)

USA Today API

Filed under: Data Source,Dataset,News — Patrick Durusau @ 5:05 am

USA Today API

The nice folks at www.programmableweb.com reported today that USA Today has opened up its article archive, going back to 2004.

From the story USA Today Expands APIs to Include Articles Back to 2004:

The dataset contains all web stories going back to 2004, as well as blog posts, newspaper stories, and even wire feeds.

Questions:

Use the USA Today archive and another news archive of your choice to answer the following questions:

  1. Find a series of news stories about some major event and compare the language used.
  2. Could you find the stories in both archives using the same language? (2-3 pages, pointers to the news sources)
  3. What stories about the event would require different language to find? (2-3 pages, pointers to the news sources)

The point of this exercise is to develop examples of where creative searching is going to find more resources than using the typical search terms.

It will also illustrate the semantic limitations of current search engines.

December 13, 2010

Machine Learning and Data Mining with R – Post

Filed under: Bayesian Models,Data Mining,R — Patrick Durusau @ 7:58 pm

Machine Learning and Data Mining with R

Announcement of course notes and slides, plus live classes in San Francisco, January 2012, courtesy of the Revolutions blog from Revolution Analytics.

Check the post for details and links.

Search Potpourri: Pasties

Filed under: Humor,Search Engines — Patrick Durusau @ 11:51 am

Today’s search term: pasties

1-3: Err, nipple covers

4: a filled pastry case

5-8: nipple covers

9: a filled pastry case delivered

10: nipple covers

The other results alternate between those two subjects for the most part.

With the exception that airport scanners have created a market for flying pasties.

Can anyone suggest a search engine that doesn’t return both in the first page of “hits?”

Zeitgeist 2010: How the world searched

Filed under: Humor,News — Patrick Durusau @ 9:41 am

Zeitgeist 2010: How the world searched

From the website:

Based on the aggregation of billions of search queries people typed into Google this year, Zeitgeist captures the spirit of 2010.

Interactive map plus video.

I was disappointed topic maps weren’t listed.

But they don’t have a technology listing so that must be the reason.

Maybe we can get topic maps re-classed as Consumer Electronics or just try for the general In the News category.

Suggestions? 😉

USA Today Best-Selling Books API

Filed under: Books,Data Source,Dataset — Patrick Durusau @ 8:45 am

USA Today Best-Selling Books API

From the website:

USA Today’s Best-Selling Books API provides a method for developers to retrieve USA Today’s weekly compiled list of the nation’s best-selling books, which is published each Thursday. In addition, developers can also retrieve archived lists since the book list’s launch on Thursday, Oct. 28, 1993. The Best-Selling Books API can also be used to retrieve a title’s history on the list and metadata about each title.

Available metadata:

  • Author. Contains one or more names of the authors, illustrators, editors or other creators of the book.
  • BookListAppearances. The number of weeks a book has appeared in the Top 150, regardless of ISBN.
  • BriefDescription. A summary of the book. Contains indicators of the book’s class (fiction or non-fiction) and format (hardcover, paperback, e-book). If a title is available in multiple formats, the format noted is the one selling the most copies that week.
  • CategoryID. Code for book category type.
  • CategoryName. Text of book category type.
  • Class. Specifies whether the book is fiction or non-fiction.
  • FirstBookListAppearance. The date of the list when the particular ISBN first appeared in the top 150.
  • FormatName. Specifies whether the ISBN is assigned to a hardcover, paperback or e-book edition.
  • HighestRank. The highest position on the list achieved by this book, regardless of ISBN.
  • ISBN. The book’s 13- or 10-digit ISBN. The ISBN for a title in a given week is the ISBN of the version (hardcover, paperback or e-book) that sold the most copies that week.
  • MostRecentBooksListAppearance. The date of the list when the particular ISBN last appeared in the top 150.
  • Rank. The book’s rank on the list.
  • RankHistories. The weekly history of the ISBN fetched.
  • RankLastWeek. The book’s rank on the prior week’s list if it appeared. Books not on the previous week’s list are designated with a “0”.
  • Title. The book title. Titles are generally reported as specified by publishers and as they appear on the book’s cover.
  • TitleAPIUrl. URL to retrieve the list history for that ISBN. Note that the ISBN refers to the version of the title that sold the most copies that week if multiple formats were available for sale. Sales from other ISBNs assigned to that title may be included; we do not provide the other ISBNs each week.
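A sketch of working with a record built from those field names. The record below is illustrative and hand-made; the actual response wrapper and envelope may differ, but the field semantics (e.g. RankLastWeek of 0 meaning new to the list) come from the documentation above:

```python
import json

# Illustrative record using the documented field names; an actual API
# response may wrap records differently.
record_json = """{
  "Title": "Example Title",
  "Author": "A. Writer",
  "ISBN": "9780000000000",
  "Rank": 3,
  "RankLastWeek": 0,
  "HighestRank": 3,
  "BookListAppearances": 1,
  "Class": "Fiction"
}"""

def describe(record):
    """One-line summary; a RankLastWeek of 0 means new to the list."""
    movement = ("new this week" if record["RankLastWeek"] == 0
                else f"was #{record['RankLastWeek']} last week")
    return f"#{record['Rank']}: {record['Title']} ({movement})"

print(describe(json.loads(record_json)))
# #3: Example Title (new this week)
```

For topic map purposes, note that ISBN identifies an edition while Title-level fields (HighestRank, BookListAppearances) span editions — exactly the sort of one-subject/many-identifiers split topic maps are built for.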

Questions:

  1. Would you use a topic map to dynamically display this information to library patrons? If so, which parts? (2-3 pages, no citations)
  2. What information would you want to use to supplement this information? How would you map it to this information? (2-3 pages, no citations)
  3. What information would you include for library staff and not patrons? (if any) (2-3 pages, no citations)

10×10 – Words and Photos

Filed under: Data Source,Dataset,Subject Identity — Patrick Durusau @ 7:38 am

10×10

From the website:

10×10 (‘ten by ten’) is an interactive exploration of words and photos based on news stories during a particular hour. The 10×10 site displays 100 photos, each photo representative of a word used in many news stories published during the current hour. The 10×10 site maintains an archive of these photos and words back to 2004. The 10×10 API is organized like directories, with the year, month, day and hour. Retrieve the words list for a particular hour, then get the photos that correspond to those words.
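A sketch of that directory-style addressing. The host and file name below are placeholders I made up from the description ("organized like directories, with the year, month, day and hour"); the real paths are in the 10×10 API documentation:

```python
from datetime import datetime, timezone

# Hypothetical host; substitute the real one from the 10x10 API docs.
BASE = "http://example.org/10x10"

def words_path(when):
    """Path to the words list for a given hour, year/month/day/hour style."""
    return f"{BASE}/{when:%Y/%m/%d/%H}/words.txt"

ts = datetime(2010, 12, 13, 11, tzinfo=timezone.utc)
print(words_path(ts))  # http://example.org/10x10/2010/12/13/11/words.txt
```

Fetch the words list for an hour, then request the photo for each word the same way.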

A preset mapping of words to photos but nothing would prevent an application from offering additional photos.

Not to mention enabling merging based on the recognition of photos.*

Replication of merging could be an issue if based on image recognition.

On the other hand, I am not sure replication of merging would be any less certain than asking users to base merging decisions on textual content.

Reliable replication of merging is possible only when our mechanical servants are given rules to apply.

****
*Leaving aside replication of merging issues (which may not be an operational requirement), facial recognition, perhaps supplemented by human operator confirmation, could be an interesting component of mass acquisition of images, say at border entry/exit points.

Not that border personnel need be given access to such information, à la Secret Internet Protocol Router Network (SIPRNet) systems, but a topic map could simply signal an order to detain, follow, get their phone number.

Simply dumping data into systems doesn’t lead to more “connect the dot” moments.

Topic maps may be a way to lead to more such moments, depending upon the skill of their construction and your analysts. (inquiries welcome)

OrientDB 0.9.24

Filed under: NoSQL,OrientDB — Patrick Durusau @ 7:10 am

OrientDB 0.9.24 has been released! Direct download: http://orient.googlecode.com/files/orientdb-0.9.24.zip

Issues fixed: http://code.google.com/p/orient/issues/list?can=1&q=label:v0.9.24

Features for 0.9.25 (Jan. 2011): http://code.google.com/p/orient/issues/list?q=label:v0.9.25

To suggest a new feature: http://code.google.com/p/orient/issues/entry?template=New%20feature

December 12, 2010

Europeana: think culture

Filed under: Data Source,Dataset,Museums — Patrick Durusau @ 8:04 pm

Europeana: think culture

More than 14.6 million items from over 1500 organizations.

Truly an embarrassment of riches for anyone writing a topic map about Europe, its history, literature, influence on other parts of the world, etc.

I have just begun to explore the site and its interfaces. Will report back from time to time.

You can create your own tags but creation of an account requires the following agreement:

I understand that My Europeana gives me the opportunity to create tags for any item I wish. I agree that I will not create any tags that could be considered libelous, harmful, threatening, unlawful, defamatory, infringing, abusive, inflammatory, harassing, pornographic, obscene, fraudulent, invasive of privacy or publicity rights, hateful, or racially, ethnically or otherwise objectionable. By clicking this box I agree to abide by this agreement, and understand that if I don’t my membership of My Europeana will be terminated.

Just so you know.

Questions:

  1. Select ten (10) artifacts to be integrated with local resources, using a topic map. Create a topic map. (The artifacts can be occurrences but associations provide richer opportunities.)
  2. Select one of the projects on the Thought Lab page and review it.
  3. What would you suggest as an improvement to the project you selected in #2? (3-5 pages, citations)

The European Library

Filed under: Data Source,Dataset,Library — Patrick Durusau @ 8:03 pm

The European Library

Free access to 48 European national libraries with materials in 35 languages.

Not to mention 25 million pages of scanned material read by OCR.

Collection access via several interfaces.

Any library with web access should be able to offer its multi-lingual patrons, European ones anyway, primary materials in the languages of their preference.

Topic map mavens will no doubt want to push further than hyperlinks to language specific collections.

Questions:

Assume your library director has grown tired of “topic maps would…” type suggestions and asks you for a proposal to use topic maps to integrate part of the European Library materials into the local collection.

  1. How would you choose the parts of each collection to be part of the topic map? (2-3 pages, no citations)
  2. What other members of the library staff would you involve in planning the proposal/prototype? (2-3 pages, with attention to the skill sets needed)
  3. Outline your prototype topic map and create a small but workable topic map to demonstrate the mapping you propose to use. (3-5 pages, no citations. Topic map without a custom interface. Very necessary step for a successful topic map deployment but beyond the time we have here.)

SRU Search/Retrieval via URL

Filed under: Library,Query Language,Retrieval — Patrick Durusau @ 8:00 pm

SRU Search/Retrieval via URL

Standards, resources, including free implementations for the SRU effort.

SRU: the protocol – SearchRetrieve Operation: Binding for SRU 2.0 (draft)

CQL: The Contextual Query Language – CQL: The Contextual Query Language (draft)

The website reports that standardization is to be completed soon. And the available drafts date from 2010.

However, if you follow the known servers link, you will find only thirteen (13) servers as of 12 December 2010.

Standards can be written prior to widespread adoption, but before spending too much effort on this protocol and query language, I think we need to watch its adoption curve closely.
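To make the protocol concrete: an SRU search is just a URL whose parameters carry the operation, version, and a CQL query. A minimal sketch (the base URL here is a placeholder, not a real SRU server; parameter names follow the searchRetrieve drafts):

```python
from urllib.parse import urlencode

def sru_search_url(base_url, cql_query, start=1, maximum=10, version="2.0"):
    """Build a searchRetrieve request URL for an SRU server.

    SRU expresses the whole search as URL parameters; the query
    itself is written in CQL (Contextual Query Language).
    """
    params = {
        "operation": "searchRetrieve",
        "version": version,
        "query": cql_query,
        "startRecord": start,
        "maximumRecords": maximum,
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint -- substitute one of the thirteen known servers.
url = sru_search_url("http://example.org/sru", 'dc.title = "topic maps"')
```

The response comes back as XML, so the whole exchange can be scripted with nothing more than an HTTP client and an XML parser.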

Copac

Filed under: Data Source,Dataset,Library — Patrick Durusau @ 7:59 pm

Copac

From the website:

Copac is a freely available library catalogue, giving access to the merged online catalogues of many major UK and Irish academic and National libraries, as well as increasing numbers of specialist libraries.

Copac has c.36 million records, representing the merged holdings of:

  • members of the Research Libraries UK (RLUK). This includes the catalogues of the British Library, the National Library of Scotland, and the National Library of Wales / Llyfrgell Genedlaethol Cymru.
  • increasing numbers of specialist libraries with collections of national research interest, as well as records for specialist collections held in UK academic libraries.

Copac offers four interfaces:

  • Web interface (+ plugins, including one for Facebook)
  • Z39.50
  • OpenURL
  • SRU
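All four interfaces are URL- or protocol-based, so Copac records can be pulled into a topic map pipeline with a few lines of code. As one sketch, an OpenURL-style lookup passes citation metadata as key-value pairs (the resolver address below is illustrative, not Copac's actual endpoint):

```python
from urllib.parse import urlencode

def openurl(resolver, **metadata):
    """Build an OpenURL-style link: citation metadata as URL parameters.

    Uses OpenURL 0.1 key names (genre, title, aulast); the resolver
    address is a placeholder.
    """
    return resolver + "?" + urlencode(metadata)

link = openurl("http://example.org/resolver",
               genre="book", title="Topic Maps", aulast="Pepper")
```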

Questions:

  1. OK, so a topic map can merge your local library records with those in Copac. Why? What is your use case for that merging? (3-5 pages, no citations)
  2. What other data would you argue should be linked to Copac records using topic maps? (3-5 pages, no citations)
  3. What APIs at http://www.programmableweb.com would you use with Copac? Why? (3-5 pages, no citations)

ProgrammableWeb

Filed under: Data Source,Dataset,Topic Maps — Patrick Durusau @ 6:02 pm

ProgrammableWeb

As of 12 December 2010, 2479 APIs and 5429 Mashups.

Care to guess what a search on topic maps returns?

If your guess was 1 you would be correct.

Surely there is at least one API or Mashup out of the 7,908 listed that is a candidate for topic map #2?

Writing this as much of a note-home-from-the-teacher for myself as it is to anyone else.

Topic maps are a fundamentally different approach to semantic integration.

Not the usual rewrite/convert to the new shiny orthodox format approach, adopted quickly before it gets supplanted (or is found wanting).

Topic maps offer a number of interesting capabilities:

  • no need for a universal and unique identifier
  • “double-ended” nature of associations that binds players together (you don’t have to remember to write the relationship both ways)
  • complete logical lock-step universe not required prior to authoring (or afterwards)
  • supports multiple overlapping views of the same data
  • …and other advantages.
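The first capability, merging without a universal identifier, can be sketched in a few lines. This is not code from any real topic map engine, just an illustration of the principle that any overlap between identifier sets is enough to treat two topics as the same subject:

```python
def merge_topics(topics):
    """Merge topic records that share any identifier.

    A naive single-pass sketch: each incoming topic is merged into
    the first existing topic whose identifier set overlaps its own.
    A full engine would also merge to a fixed point (transitively).
    """
    merged = []
    for topic in topics:
        ids = set(topic["identifiers"])
        for existing in merged:
            if ids & set(existing["identifiers"]):
                existing["identifiers"] = sorted(set(existing["identifiers"]) | ids)
                existing["names"] = sorted(set(existing["names"]) | set(topic["names"]))
                break
        else:
            merged.append({"identifiers": sorted(ids),
                           "names": sorted(set(topic["names"]))})
    return merged
```

Two records with different local identifiers collapse into one topic the moment a third record carries both identifiers, with no central registry involved.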

But, my saying it to readers of this blog is preaching to the choir!

Surely some of us know relatives, former employers, parole officers who are not already sold on topic maps.

Please forward this post or tweet it to them.

Questions:

Search for APIs or mashups of interest to you.

  1. Which APIs or mashups interest you as sources for topic map material? Why?(2-3 pages, no citations)
  2. Are there materials outside these you would want to point to or include in your topic map? (2-3 pages, citations/pointers)
  3. How would you test your topic map? Not for syntactic correctness, but for inclusion of resources, terminology, etc. (3-5 pages, citations)
