Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

January 17, 2011

DBpedia 3.6 – Release

Filed under: Data Source,DBpedia — Patrick Durusau @ 11:28 am

DBpedia 3.6 – Release

From the announcement:

The new DBpedia dataset describes more than 3.5 million things, of which 1.67 million are classified in a consistent ontology, including 364,000 persons, 462,000 places, 99,000 music albums, 54,000 films, 16,500 video games, 148,000 organizations, 148,000 species and 5,200 diseases.

The DBpedia dataset features labels and abstracts for 3.5 million things in up to 97 different languages; 1,850,000 links to images and 5,900,000 links to external web pages; 6,500,000 external links into other RDF datasets, and 632,000 Wikipedia categories.

The dataset consists of 672 million pieces of information (RDF triples) out of which 286 million were extracted from the English edition of Wikipedia and 386 million were extracted from other language editions and links to external datasets.

Quick Links:

DBpedia MappingTool: a graphical user interface to support the community in creating and editing mappings as well as the ontology.

Improved DBpedia Ontology as well as improved Infobox mappings.

Some commonly used property names changed. Please see http://dbpedia.org/ChangeLog and http://dbpedia.org/Datasets/Properties to know which relations changed and update your applications accordingly!

Download the new DBpedia dataset from http://dbpedia.org/Downloads36

Available as Linked Data and via the DBpedia SPARQL endpoint at http://dbpedia.org/sparql
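For anyone who wants to try the endpoint from a script, here is a minimal sketch of building a query URL for it. The `query` parameter comes from the SPARQL protocol; the `format` parameter and the `dbo:Film` class name are my assumptions about how this particular endpoint and ontology are set up, so check them against the DBpedia documentation before relying on them.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint named in the announcement


def build_query_url(sparql, result_format="application/sparql-results+json"):
    """Return a GET URL for the endpoint.

    'query' is the standard SPARQL protocol parameter; 'format' is an
    assumption about what this endpoint accepts.
    """
    return ENDPOINT + "?" + urlencode({"query": sparql, "format": result_format})


# Illustrative query: ten resources typed as films (class name is an assumption).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?film WHERE { ?film a dbo:Film } LIMIT 10
"""

url = build_query_url(QUERY)
print(url)
```

The sketch only builds the URL; fetching it (for example with `urllib.request.urlopen`) is left out so nothing here depends on the endpoint being reachable.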

January 6, 2011

The Top Five Information Management Meltdowns of 2010

Filed under: Data Source,Examples — Patrick Durusau @ 3:14 pm

The Top Five Information Management Meltdowns of 2010

Every year produces a number of stories like these.

Pick one and from the published reports, describe how you would incorporate topic maps to help lead to a different outcome. (3-5 pages, citations)

PS: It may be possible that topic maps play no direct role in avoiding the problem but lead to a more useful system.

Economic Indicator Database

Filed under: Data Source — Patrick Durusau @ 9:37 am

Economic Indicator Database

The US Census Bureau has made its database of economic indicators searchable.

You can even download data for further manipulation, although I must admit being bemused by the “Download straight from website to Excel.”

I wonder if they mean “Excel” in the sense of any old spreadsheet program, the way people say “photocopy” using another name, one that starts and ends with an “X”? 😉

Probably not. They probably mean a specific bit of software.

Thought I would mention it as one of the many data sources from the US government that can be re-purposed for use with a topic map.

I am sure other governments make similar data sources available.

If you have a favorite one, please forward the URL, a brief description and any comments you want to make about using it with topic maps.

January 5, 2011

Bribing Statistics

Filed under: Data Source,Marketing,Software — Patrick Durusau @ 1:03 pm

Bribing Statistics by Aleks Jakulin.

Self-reporting of bribery (“I Paid a Bribe” is the name of the application) is uncommon in the United States, at least when it is characterized as a bribe.

There are campaign finance reports and analysis that link organizations/causes to particular candidates. Not surprisingly, candidates vote in line with their major sources of funding.

The reason I mention it here is to suggest that topic maps could be used to provide a more granular mapping between contributions, office holders (or agency staff) and beneficiaries of legislation or contracts.

None of those things exist in isolation or without identity.

While one researcher might only be interested in DARPA contracts (to use a U.S. based example), the contract officers and the beneficiaries of those contracts, another researcher may be collecting data on campaign contributions that may include some of the beneficiaries of the DARPA contracts.

Topic maps are a great way to accumulate that sort of research over time.
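A minimal sketch of that accumulation, assuming both researchers record a shared identifier for each organization. This is plain Python rather than any topic map engine, and every identifier, name and fact below is invented for illustration.

```python
from collections import defaultdict


def merge_by_identifier(*record_sets):
    """Merge records from any number of sources that share an 'id' value."""
    merged = defaultdict(lambda: {"names": set(), "facts": []})
    for records in record_sets:
        for rec in records:
            topic = merged[rec["id"]]
            topic["names"].add(rec["name"])   # keep every name variant
            topic["facts"].append(rec["fact"])  # keep every statement made
    return dict(merged)


# Hypothetical data from two researchers; all values are invented.
contracts = [
    {"id": "org:acme", "name": "Acme Corp.", "fact": "awarded contract (researcher A)"},
]
contributions = [
    {"id": "org:acme", "name": "ACME Corporation", "fact": "campaign contribution (researcher B)"},
    {"id": "org:other", "name": "Other LLC", "fact": "campaign contribution (researcher B)"},
]

topics = merge_by_identifier(contracts, contributions)
for ident, topic in sorted(topics.items()):
    print(ident, sorted(topic["names"]), topic["facts"])
```

Neither researcher had to adopt the other's naming: the two spellings of the organization end up on one topic because the identifier matched, not because the names did.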

Map of American English Dialects and Subdialects – Post

Filed under: Data Source,Mapping,Maps — Patrick Durusau @ 9:05 am

Map of American English Dialects and Subdialects

From Flowingdata.com a delightful map of American English dialects and subdialects. Several hundred YouTube videos are accessible through the map as examples.

Interesting example of mapping but moreover, looks like an excellent candidate for a topic map that binds in additional resources on the subject.

Enjoy!

December 30, 2010

Data Diligence: More Thoughts on Google Books’ Ngrams – Post

Filed under: Data Analysis,Data Source — Patrick Durusau @ 4:56 pm

Data Diligence: More Thoughts on Google Books’ Ngrams

Matthew Hurst asks a number of interesting questions about the underlying data for Google Books’ Ngrams.

He illustrates that large amounts of data have the potential to be useful, but divorced from any context, or at least limited in terms of the context that is known, they can be of limited utility.

Questions:

  1. Spend at least 4-6 hours exploring (ok, playing) with Google Books’ Ngrams.
  2. Develop 3 or 4 questions you would like to answer with this data source.
  3. What additional information or context would you need to answer your questions in #2?

December 29, 2010

Setting Government Data Free With ScraperWiki

Filed under: Data Mining,Data Source — Patrick Durusau @ 2:12 pm

Setting Government Data Free With ScraperWiki

reports on a video by Max Ogden illustrating the use of ScraperWiki to harvest government data.

If you are planning on adding government data to your topic map, this is a video you need to see.

SimpleGeo Makes Location Easy With Context and Places

Filed under: Data Source — Patrick Durusau @ 2:05 pm

SimpleGeo Makes Location Easy With Context and Places

Programmableweb.com reports on:

SimpleGeo Context takes a latitude and longitude and provides relevant contextual information such as weather, demographics, or neighborhood data for that co-ordinate.

SimpleGeo Places, which is a free database of business listings and Points of Interest (POI) that enables real-time community collaboration.

Interesting and free APIs that could add value to any topic map concerned with tourism or location information.

December 23, 2010

U.S. SEC RSS Feeds

Filed under: Data Source — Patrick Durusau @ 3:17 pm

U.S. SEC RSS Feeds

I ran across these feeds while looking at EDGAR Guide.

Should the SEC succeed in exposing EDGAR with an API, that would be useful for combining financial industry data.

Question: Is anyone capturing the current RSS feeds into a topic map application?

Mergent Corporate Actions and Dividends API

Filed under: Data Source — Patrick Durusau @ 3:05 pm

Mergent Corporate Actions and Dividends API

From http://www.programmableweb.com:

Provides information on corporate actions and events reported by US and Canadian publicly traded companies, including a detailed database of issued and declared dividends. The API allows access to detailed data on dividend distributions, stock splits, stock dividends, spin-offs, redemption of stock, rights, tender offers, mergers & acquisitions, bankruptcy filings, and more.

Another corporate finance information source that can be usefully combined with other information.

ScraperWiki

Filed under: Authoring Topic Maps,Data Mining,Data Source — Patrick Durusau @ 1:47 pm

ScraperWiki

The website describes traditional screen scraping and then says:

ScraperWiki is an online tool to make that process simpler and more collaborative. Anyone can write a screen scraper using the online editor, and the code and data are shared with the world. Because it’s a wiki, other programmers can contribute to and improve the code. And, if you’re not a programmer yourself, you can request a scraper or ask the ScraperWiki team to write one for you.

Interesting way to promote the transition to accessible and structured data.

One step closer to incorporation into or being viewed by a topic map!

December 19, 2010

ASPECT Vocabulary Bank for Education API

Filed under: Data Source,Education — Patrick Durusau @ 2:04 pm

ASPECT Vocabulary Bank for Education API

From the http://www.programmableweb.com website:

The ASPECT Vocabulary Bank for Education (VBE) provides both a browsable and searchable web application for users to locate, view and download terminology, as well as standards-based machine to machine interfaces. The VBE provides a range of multilingual, controlled lists relevant to learning in the EU, including those that are used to validate metadata profiles and a thesaurus used to describe educational topics. The RESTful API allows users to interface with the VBE.

The EU is a group that realizes not all users speak the same language.

December 18, 2010

Mergent Historical Securities API

Filed under: Data Source — Patrick Durusau @ 4:50 pm

Mergent Historical Securities API

From http://www.programmableweb.com:

Provides historical quotes for all US and Canadian stocks, indices, mutual funds, OTC bulletin board issues, and other securities. Prices are fully adjusted for splits, dividends and other corporate actions. Data on both living and dead issues is available. The API also provides access to non-price data such as short interest, shares outstanding, earnings per share and more.

This API along with one of the news archives could make an interesting exercise for high school or even college students.

What news events affect stock prices?

Do events of the same nature have the same impact?

December 15, 2010

Mergent Annual Reports API

Filed under: Data Source — Patrick Durusau @ 3:59 pm

Mergent Annual Reports API

From http://www.programmableweb.com:

Provides scanned annual report documents (e.g. in PDF format) for US and Canadian publicly traded companies, with more than 20 years of historical reports available.

According to the Mergent Annual Reports API docs you can search on:

  • nameContains
  • naicsPrefix
  • primaryCusipPrefix
  • primaryTicker
  • sicPrefix

The SEC’s EDGAR tutorial mentions that the SEC has full text of annual reports and attachments for the last four (4) years.

So the omission of full-text searching looks odd.

Questions:

  1. Review the EDGAR documentation. What other fixed fields would you add to this interface and why? (2-3 pages, no citations)
  2. Would you offer full-text searching? If yes, what do you gain? If no, why not? (2-3 pages, no citations)
  3. What other information (this source or some other) would you want to include? Why? (2-3 pages, no citations)

Mergent Company Fundamentals

Filed under: Data Source — Patrick Durusau @ 9:26 am

Mergent Company Fundamentals

From http://www.programmableweb.com:

Provides financial information on US and Canadian publicly traded companies, including historical financial statements (balance sheets, cash flow statements, income statements and ratios) going back 20+ years. Additionally, provides information on executives, including current and historical compensation, biography, insider holdings and insider transactions.

Interesting as a data source and certainly useful with other Mergent products (to be posted).

Questions:

  1. Identify three (3) other data sources that could be integrated with information from this API. (1-2 pages, citations)
  2. How would you integrate those resources with this one using a topic map? (3-5 pages, no citations)
  3. Why would you want to integrate those records using a topic map? What needs would be met by this integration? (3-5 pages)

December 14, 2010

USA Today API

Filed under: Data Source,Dataset,News — Patrick Durusau @ 5:05 am

USA Today API

The nice folks at www.programmableweb.com reported today that USA Today has opened up its article archive going back to 2004.

From the story USA Today Expands APIs to Include Articles Back to 2004:

The dataset contains all web stories going back to 2004, as well as blog posts, newspaper stories, and even wire feeds.

Questions:

Use the standard USA Today archive and another news archive of your choice to answer the following questions:

  1. Find a series of news stories about some major event and compare the language used.
  2. Could you find the stories in both archives using the same language? (2-3 pages, pointers to the news sources)
  3. What stories about the event would require different languages to find? (2-3 pages, pointers to the news sources)

The point of this exercise is to develop examples of where creative searching is going to find more resources than using the typical search terms.

It will also illustrate the semantic limitations of current search engines.

December 13, 2010

USA Today Best-Selling Books API

Filed under: Books,Data Source,Dataset — Patrick Durusau @ 8:45 am

USA Today Best-Selling Books API

From the website:

USA Today’s Best-Selling Books API provides a method for developers to retrieve USA Today’s weekly compiled list of the nation’s best-selling books, which is published each Thursday. In addition, developers can also retrieve archived lists since the book list’s launch on Thursday, Oct. 28, 1993. The Best-Selling Books API can also be used to retrieve a title’s history on the list and metadata about each title.

Available metadata:

  • Author. Contains one or more names of the authors, illustrators, editors or other creators of the book.
  • BookListAppearances. The number of weeks a book has appeared in the Top 150, regardless of ISBN.
  • BriefDescription. A summary of the book. Contains indicators of the book’s class (fiction or non-fiction) and format (hardcover, paperback, e-book). If a title is available in multiple formats, the format noted is the one selling the most copies that week.
  • CategoryID. Code for book category type.
  • CategoryName. Text of book category type.
  • Class. Specifies whether the book is fiction or non-fiction.
  • FirstBookListAppearance. The date of the list when the particular ISBN first appeared in the top 150.
  • FormatName. Specifies whether the ISBN is assigned to a hardcover, paperback or e-book edition.
  • HighestRank. The highest position on the list achieved by this book, regardless of ISBN.
  • ISBN. The book’s 13- or 10-digit ISBN. The ISBN for a title in a given week is the ISBN of the version (hardcover, paperback or e-book) that sold the most copies that week.
  • MostRecentBooksListAppearance. The date of the list when the particular ISBN last appeared in the top 150.
  • Rank. The book’s rank on the list.
  • RankHistories. The weekly history of the ISBN fetched.
  • RankLastWeek. The book’s rank on the prior week’s list if it appeared. Books not on the previous week’s list are designated with a “0”.
  • Title. The book title. Titles are generally reported as specified by publishers and as they appear on the book’s cover.
  • TitleAPIUrl. URL to retrieve the list history for that ISBN. Note that the ISBN refers to the version of the title that sold the most copies that week if multiple formats were available for sale. Sales from other ISBNs assigned to that title may be included; we do not provide the other ISBNs each week.

Questions:

  1. Would you use a topic map to dynamically display this information to library patrons? If so, which parts? (2-3 pages, no citations)
  2. What information would you want to use to supplement this information? How would you map it to this information? (2-3 pages, no citations)
  3. What information would you include for library staff and not patrons? (if any) (2-3 pages, no citations)

10×10 – Words and Photos

Filed under: Data Source,Dataset,Subject Identity — Patrick Durusau @ 7:38 am

10×10

From the website:

10×10 (‘ten by ten’) is an interactive exploration of words and photos based on news stories during a particular hour. The 10×10 site displays 100 photos, each photo representative of a word used in many news stories published during the current hour. The 10×10 site maintains an archive of these photos and words back to 2004. The 10×10 API is organized like directories, with the year, month, day and hour. Retrieve the words list for a particular hour, then get the photos that correspond to those words.
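To make the “organized like directories” description concrete, here is a small sketch that turns a date and hour into such a path. The base address is a placeholder and the `words.txt` file name is my assumption; neither is taken from the 10×10 documentation, so substitute the real values from the site.

```python
from datetime import datetime

BASE = "http://example.org/10x10"  # placeholder, NOT the real 10x10 address


def hour_path(when):
    """Build the year/month/day/hour directory path for one hour."""
    return f"{BASE}/{when:%Y/%m/%d/%H}"


def words_url(when):
    """URL of the word list for that hour (the file name is an assumption)."""
    return hour_path(when) + "/words.txt"


print(words_url(datetime(2004, 11, 21, 13)))
```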

A preset mapping of words to photos but nothing would prevent an application from offering additional photos.

Not to mention enabling merging based on the recognition of photos.*

Replication of merging could be an issue if based on image recognition.

On the other hand, I am not sure replication of merging would be any less certain than asking users to base merging decisions based on textual content.

Reliable replication of merging is possible only when our mechanical servants are given rules to apply.

****
*Leaving aside replication of merging issues (which may not be an operational requirement), facial recognition, perhaps supplemented by human operator confirmation, could be an interesting component of mass acquisition of images, say at border entry/exit points.

Not that border personnel need be given access to such information, a la Secret Intelligence – Public Recording Network (SIPRNet) systems, but a topic map could simply signal an order to detain, follow, get their phone number.

Simply dumping data into systems doesn’t lead to more “connect the dot” moments.

Topic maps may be a way to lead to more such moments, depending upon the skill of their construction and your analysts. (inquiries welcome)

December 12, 2010

Europeana: think culture

Filed under: Data Source,Dataset,Museums — Patrick Durusau @ 8:04 pm

Europeana: think culture

More than 14.6 million items from over 1500 organizations.

Truly an embarrassment of riches for anyone writing a topic map about Europe, its history, literature, influence on other parts of the world, etc.

I have just begun to explore the site and its interfaces. Will report back from time to time.

You can create your own tags but creation of an account requires the following agreement:

I understand that My Europeana gives me the opportunity to create tags for any item I wish. I agree that I will not create any tags that could be considered libelous, harmful, threatening, unlawful, defamatory, infringing, abusive, inflammatory, harassing, pornographic, obscene, fraudulent, invasive of privacy or publicity rights, hateful, or racially, ethnically or otherwise objectionable. By clicking this box I agree to abide by this agreement, and understand that if I don’t my membership of My Europeana will be terminated.

Just so you know.

Questions:

  1. Select ten (10) artifacts to be integrated with local resources, using a topic map. Create a topic map. (The artifacts can be occurrences but associations provide richer opportunities.)
  2. Select one of the projects on the Thought Lab page and review it.
  3. What would you suggest as an improvement to the project you selected in #2? (3-5 pages, citations)

The European Library

Filed under: Data Source,Dataset,Library — Patrick Durusau @ 8:03 pm

The European Library

Free access to 48 European national libraries with materials in 35 languages.

Not to mention 25 million pages of scanned material read by OCR.

Collection access via several interfaces.

Any library with web access should be able to offer its multilingual patrons, European ones anyway, primary materials in the languages they prefer.

Topic map mavens will no doubt want to push further than hyperlinks to language specific collections.

Questions:

Assume your library director has grown tired of “topic maps would…” type suggestions and asks you for a proposal to use topic maps to integrate part of the European Library materials into the local collection.

  1. How would you choose the parts of each collection to be part of the topic map? (2-3 pages, no citations)
  2. What other members of the library staff would you involve in planning the proposal/prototype? (2-3 pages, with attention to the skill sets needed)
  3. Outline your prototype topic map and create a small but workable topic map to demonstrate the mapping you propose to use. (3-5 pages, no citations. Topic map without a custom interface. Very necessary step for a successful topic map deployment but beyond the time we have here.)

Copac

Filed under: Data Source,Dataset,Library — Patrick Durusau @ 7:59 pm

Copac

From the website:

Copac is a freely available library catalogue, giving access to the merged online catalogues of many major UK and Irish academic and National libraries, as well as increasing numbers of specialist libraries.

Copac has c.36 million records, representing the merged holdings of:

  • members of the Research Libraries UK (RLUK). This includes the catalogues of the British Library, the National Library of Scotland, and the National Library of Wales / Llyfrgell Genedlaethol Cymru.
  • increasing numbers of specialist libraries with collections of national research interest, as well as records for specialist collections held in UK academic libraries.

Copac offers four interfaces:

  • Web interface (+ plugins, including one for Facebook)
  • Z39.50
  • OpenURL
  • SRU
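Of the four, SRU is the easiest to try from a script, since a search is just an HTTP GET with a handful of standard parameters. A minimal sketch, assuming SRU version 1.1 and a CQL query; the base URL is a placeholder and not Copac's actual SRU address, and the `dc.title` index is an assumption about what the server supports.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, maximum_records=10):
    """Build an SRU searchRetrieve URL from the standard request parameters."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,  # CQL, the query language used with SRU
        "maximumRecords": maximum_records,
    }
    return base_url + "?" + urlencode(params)


# Placeholder base URL; look up the real one in the Copac interface documentation.
url = sru_search_url("http://example.org/sru", 'dc.title = "topic maps"')
print(url)
```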

Questions:

  1. OK, so a topic map can merge your local library records with those in Copac. Why? What is your use case for that merging? (3-5 pages, no citations)
  2. What other data would you argue should be linked to Copac records using topic maps? (3-5 pages, no citations)
  3. What APIs at http://www.programmableweb.com would you use with Copac? Why? (3-5 pages, no citations)

ProgrammableWeb

Filed under: Data Source,Dataset,Topic Maps — Patrick Durusau @ 6:02 pm

ProgrammableWeb

As of 12 December 2010, 2479 APIs and 5429 Mashups.

Care to guess what a search on topic maps returns?

If your guess was 1 you would be correct.

Surely there is at least one API or Mashup out of the 7,908 listed that is a candidate for topic map #2?

Writing this as much as a note-home-from-the-teacher for myself as for anyone else.

Topic maps are a fundamentally different approach to semantic integration.

Not the usual re-write/convert to a new shiny orthodox format approach. Quickly before it gets supplanted (or is found wanting).

Topic maps offer a number of interesting capabilities:

  • no need for a universal and unique identifier
  • “double-ended” nature of associations that binds players together (you don’t have to remember to write the relationship both ways)
  • complete logical lock-step universe not required prior to authoring (or afterwards)
  • supports multiple overlapping views of the same data
  • …and other advantages.
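The “double-ended” point is easier to see than to describe. Below is a minimal sketch in plain Python, not any topic map API: one association holds both role players, so it can be found from either end without writing an inverse relationship. The class, the helper names and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    type: str
    roles: tuple  # pairs of (role type, player)


def associations_of(player, associations):
    """Find every association in which the player takes part, whatever its role."""
    return [a for a in associations if any(p == player for _, p in a.roles)]


def other_players(player, association):
    """Return the (role, player) pairs on the other end(s) of the association."""
    return [(r, p) for r, p in association.roles if p != player]


# Written once, never as a second "employs" / "employed-by" pair.
employment = Association("employment", (("employer", "acme"), ("employee", "alice")))
store = [employment]

print(other_players("alice", associations_of("alice", store)[0]))
print(other_players("acme", associations_of("acme", store)[0]))
```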

But, my saying it to readers of this blog is preaching to the choir!

Surely some of us know relatives, former employers, parole officers who are not already sold on topic maps.

Please forward this post or tweet it to them.

Questions:

Search for APIs or mashups of interest to you.

  1. Which APIs or mashups interest you as sources for topic map material? Why? (2-3 pages, no citations)
  2. Are there materials outside these you would want to point to or include in your topic map? (2-3 pages, citations/pointers)
  3. How would you test your topic map? Not for syntactic correctness but for inclusion of resources, terminology, etc. (3-5 pages, citations)

Outside.in Hyperlocal News API

Filed under: Data Source,Dataset — Patrick Durusau @ 6:00 pm

Outside.in Hyperlocal News API

From the website:

The Outside.in API lets you easily integrate hyperlocal news in your sites and applications by providing recent news stories and blog posts for any neighborhood, ZIP code, city, or state in the United States.

A news aggregation site that offers free developer accounts (daily limits on accesses).

Follows > 54,000 RSS feeds.

Questions:

  1. What subjects would a topic map for the postal code where you live include? What information would you use from this service? (2-3 pages)
  2. What subjects would a topic map for the region where you live include? What information would you use from this service? (2-3 pages)
  3. What subjects would a topic map for the country where you live include? What information would you use from this service? (2-3 pages)

If it sounds like you weren’t given enough room for all the subjects you would want to include, consider that no topic map, dictionary, encyclopedia, etc., is ever complete.

Editorial choices always have to be made. This is an exercise to give you an opportunity to make those choices and then discuss them with your classmates. (Instead of your director or perhaps a board of supervisors.)

Daylife Developer

Filed under: Data Source,Dataset,Software — Patrick Durusau @ 5:54 pm

Daylife Developer

News aggregation and analysis service.

Offers free developer access to their API, capped at 5,000 calls per day.

From the website:

Have an idea for the next big news application? Build a great app using the Daylife API, then we’ll market it to our clients and give you 70% of the proceeds from any sales. Learn more.

I almost did not mention this site so I could keep the 70% to myself, but there is room for more than one great news app using topic maps. 😉

Oh, but that means creating an app.

An app that uses topic maps to deliver substantively different and useful aggregation of news.

Both of those are critical requirements.

The app must be substantively different in delivery of a unique value-add from the use of topic maps. Something the user can’t get somewhere else.

The app must be useful in delivery of value-add found useful by some community. A community willing to pay for that usefulness.

See you at Daylife Developer?

******
PS: Send pointers to similar resources to: patrick@durusau.net.

The more resources become available, including aggregation services, the greater the opportunity for topic maps!

December 10, 2010

Trends in Large-Scale Subject Repositories

Filed under: Data Source — Patrick Durusau @ 11:40 am

Trends in Large-Scale Subject Repositories Authors: Jessica Adamick, Rebecca Reznik-Zellen

Abstract:

Noting a lack of broad empirical studies on subject repositories, the authors investigate subject repository trends that reveal common practices despite their apparent isolated development. Data collected on year founded, subjects, software, content types, deposit policy, copyright policy, host, funding, and governance are analyzed for the top ten most-populated subject repositories. Among them, several trends exist such as a multi- and interdisciplinary scope, strong representation in the sciences and social sciences, use of open source repository software for newer repositories, acceptance of pre- and post-prints, moderated deposits, submitter responsibility for copyright, university library or departmental hosting, and discouraged withdrawal of materials. In addition, there is a loose correlation between repository size and age. Recognizing the diversity of all subject repositories, the authors recommend that tools for assessment and evaluation be developed to guide subject repository management to best serve their respective communities.

A useful review of some of the leading subject repositories.

Crack the subject identity nut, reliably and in a cost-effective manner, for any of these repositories, and your advertising woes are over.

December 2, 2010

OpenSecrets.org

Filed under: Data Source — Patrick Durusau @ 5:59 am

OpenSecrets.org

From the website:

OpenSecrets.org is your nonpartisan guide to money’s influence on U.S. elections and public policy. Whether you’re a voter, journalist, activist, student or interested citizen, use our free site to shine light on your government. Count cash and make change.

Of particular interest to topic mappers will be their OpenSecrets Developer Tools which include:

  • APIs — for live mashups
  • OpenData — itemized tables for analysis and recombinations
  • Widgets — with the ease of cut and paste

Offers a number of interesting possibilities.

*****

I wonder if contract data is available that lists who approved contracts or was part of the approval process and the winners of those contracts, both in terms of organizations and individuals?

Seems to me that would be an interesting set of dots to put together. I will ask around.

Suggestions of data sources for other governments welcome!

Questions:

  1. Document sources of political funding for a non-US government.
  2. How would you apply topic maps to the OpenSecrets.org data? (3-5 pages, no citations)
  3. What additional data would you include in your topic map in #2? (3-5 pages, list sources of other data)

December 1, 2010

Campaign Finance API (US-centric)

Filed under: Data Source — Patrick Durusau @ 1:48 pm

Campaign Finance API (US-centric)

The New York Times sponsors an API that accesses United States Federal Election Commission filings. Requires registration but is otherwise free. There are some limits on queries, etc.

I mention it because topic map applications that “tag” (in another sense of the word) candidates with particular contributions and legislation need to start sooner rather than later.

The 2012 election cycle (US), will be here sooner than you expect.

BTW, similar data sources for other countries would be good to bring to the attention of the topic mapping community.

November 21, 2010

DUC: Document Understanding Conferences

Filed under: Conferences,Data Source,Summarization — Patrick Durusau @ 8:19 am

DUC: Document Understanding Conferences

From the website:

There is currently much interest and activity aimed at building powerful multi-purpose information systems. The agencies involved include DARPA, ARDA and NIST. Their programmes, for example DARPA’s TIDES (Translingual Information Detection Extraction and Summarization) programme, ARDA’s Advanced Question & Answering Program and NIST’s TREC (Text Retrieval Conferences) programme cover a range of subprogrammes. These focus on different tasks requiring their own evaluation designs.

Within TIDES and among other researchers interested in document understanding, a group grew up which has been focusing on summarization and the evaluation of summarization systems. Part of the initial evaluation for TIDES called for a workshop to be held in the fall of 2000 to explore different ways of summarizing a common set of documents. Additionally a road mapping effort was started in March of 2000 to lay plans for a long-term evaluation effort in summarization.

Data sets, papers, etc., on text summarization.

Yes, DUC has moved to the Text Analysis Conference (TAC) but what they don’t say is that the DUC data and papers for 2001 to 2007 are listed at this site only.

Something to remember when you are looking for text summarization data sets and research.

Questions:

  1. Select a paper from the 2007 DUC conference. Update on the status of that research. (3-5 pages, citations)
  2. For the authors of #1, annotated bibliography of publications since the paper in 2007.
  3. How would you use the technique from #1 in the construction of a topic map? Inform your understanding, selection, data for that map, etc.? (3-5 pages, no citations)

November 2, 2010

Afghanistan War Diary

Filed under: Authoring Topic Maps,Data Source,Maiana,Topic Maps — Patrick Durusau @ 5:15 am

Afghanistan War Diary.

A portion of the Afghanistan war documents published by Wikileaks as a topic map.

The release is an automatic conversion to a topic map so does not reflect the nuances that human authoring brings to a topic map.

June 23, 2010

Authorities and Vocabularies!

Filed under: Data Source,LCSH,RDF — Patrick Durusau @ 6:12 pm

Authorities and Vocabularies at the Library of Congress offers bulk downloads of some of their authorities and vocabularies. Like the Library of Congress subject headings!

Granted it is in RDF but your topic map application is going to encounter RDF eventually. You may as well develop some experience at incorporating it into your topic map as you would any other subject identification system.
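One way to start getting that experience is sketched below: treat each RDF subject URI as a subject identifier, take `skos:prefLabel` values as names, and keep everything else as properties to sort out later. The mapping is my assumption about a reasonable first pass, not a standard RDF-to-topic-map conversion, and the sample triples use invented example.org URIs rather than real Library of Congress data.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
SKOS_BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def triples_to_topics(triples):
    """Group (subject, predicate, object) triples into one topic per subject URI."""
    topics = {}
    for subject, predicate, obj in triples:
        topic = topics.setdefault(subject, {"names": [], "properties": []})
        if predicate == SKOS_PREF_LABEL:
            topic["names"].append(obj)  # labels become topic names
        else:
            topic["properties"].append((predicate, obj))  # decide later: occurrence or association
    return topics


# Invented sample data, for illustration only.
triples = [
    ("http://example.org/subjects/1", SKOS_PREF_LABEL, "Cartography"),
    ("http://example.org/subjects/1", SKOS_BROADER, "http://example.org/subjects/2"),
    ("http://example.org/subjects/2", SKOS_PREF_LABEL, "Geography"),
]

topics = triples_to_topics(triples)
for identifier, topic in topics.items():
    print(identifier, topic["names"], topic["properties"])
```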
