Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 17, 2011

CENDI Science Terminology Locator

Filed under: Government Data,Terminology,Vocabularies — Patrick Durusau @ 6:40 pm

Another CENDI resource that merits special mention.

From the webpage:

Browse the terminology resources across the U.S. Federal Science Agencies by selecting a topic and clicking the acronym resource link next to the category.

What you get when you follow one of the terminology links varies: “page not found” for NASA, RDF as an option at NALT, very complex term navigation (DOE), what appear to be search results in an agency database (USGS), a listing of terms with definitions and some navigation (DTIC), Descriptor Data (MeSH), “page not found” for NBII, and an outdated link for ERIC that nonetheless redirects to a thesaurus navigation page.

If you have someone in government who doesn’t think varying terminologies are an issue, send them this link. The varying responses, and what you see when you get there, should be proof enough for anyone.

CENDI Agency Terminology Resources

Filed under: Government Data,Terminology,Thesaurus,Vocabularies — Patrick Durusau @ 6:39 pm

From the webpage:

The following URLs provide access to the online thesauri and indexing resources of the various federal scientific & technical agencies including CENDI agencies. These resources are of interest to those wishing to know about the scientific and technical terminology used in various fields.

  • Agriculture & Food
  • Applied Science & Technologies
  • Astronomy & Space
  • Biology & Nature
  • Earth & Ocean Sciences
  • Energy & Energy Conservation
  • Environment & Environmental Quality
  • General Science
  • Health & Medicine
  • Physics, Chemistry, and Mathematics
  • Science Education

I will post on CENDI itself, but I thought this was important enough to call out separately, particularly since there are multiple thesauri in some of these categories.

For example:

NAL Agricultural Thesaurus http://agclass.nal.usda.gov/agt/agt.shtml

The NAL Agricultural Thesaurus (NALT) is annually updated and the 2007 edition contains over 65,800 terms organized into 17 subject categories. NALT is searchable online and is available in several formats (PDF, ASCII text, XML, SKOS) for download from the web site. NALT has standard hierarchical, equivalence and associative relationships and provides scope notes and over 2,400 definitions of terms for clarity. Proposals for new terminology can be sent to thes@nal.usda.gov. Published by the National Agricultural Library, United States Department of Agriculture.

Tesauro Agrícola http://agclass.nal.usda.gov/agt_es.shtml

Tesauro Agrícola is the Spanish language translation of the NAL Agricultural Thesaurus (NALT). The thesaurus accommodates the complexity of the Spanish language from a Western Hemisphere perspective. First published in May 2007, the thesaurus contains over 15,700 translated concepts and contains definitions for more than 2,400 terms. The thesaurus is searchable with a Spanish interface and is available in several formats (PDF, ASCII text, XML) for download from the web site. Proposals for new terminology can be sent to thes@nal.usda.gov. Published by the National Agricultural Library, United States Department of Agriculture.
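NALT advertises the standard hierarchical, equivalence and associative thesaurus relationships, and SKOS is one of its download formats. As a minimal sketch of how those relationships can be consumed, assuming a toy in-memory list of triples that uses SKOS property names (`skos:prefLabel`, `skos:altLabel`, `skos:broader`, `skos:related`) with made-up concepts rather than actual NALT data, the following resolves an entry term to its preferred term and walks the hierarchy in both directions:

```python
# Toy thesaurus as (subject, predicate, object) triples using SKOS property names.
# The concepts and labels are illustrative only; they are NOT taken from NALT.
TRIPLES = [
    ("c:fruits", "skos:prefLabel", "fruits"),
    ("c:apples", "skos:prefLabel", "apples"),
    ("c:apples", "skos:altLabel", "Malus domestica fruits"),
    ("c:apples", "skos:broader", "c:fruits"),
    ("c:pears", "skos:prefLabel", "pears"),
    ("c:pears", "skos:broader", "c:fruits"),
    ("c:apples", "skos:related", "c:orchards"),
    ("c:orchards", "skos:prefLabel", "orchards"),
]


def objects(subject, predicate):
    """Return every object recorded for a subject/predicate pair."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def pref_label(concept):
    """Preferred label of a concept, falling back to its identifier."""
    labels = objects(concept, "skos:prefLabel")
    return labels[0] if labels else concept


def lookup(term):
    """Equivalence: resolve a preferred or alternative label to its concept."""
    for s, p, o in TRIPLES:
        if p in ("skos:prefLabel", "skos:altLabel") and o.lower() == term.lower():
            return s
    return None


def narrower(concept):
    """Hierarchy, downward: narrower concepts are the inverse of skos:broader."""
    return sorted(s for s, p, o in TRIPLES if p == "skos:broader" and o == concept)


concept = lookup("Malus domestica fruits")
print(pref_label(concept))                                         # apples
print([pref_label(c) for c in objects(concept, "skos:broader")])  # ['fruits']
print([pref_label(c) for c in narrower("c:fruits")])              # ['apples', 'pears']
print([pref_label(c) for c in objects(concept, "skos:related")])  # ['orchards']
```

A real application would load the SKOS download with an RDF library instead of a hard-coded list; the point here is only that an entry term reaches its preferred term through the equivalence labels, and narrower terms fall out of inverting `skos:broader`.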

October 16, 2011

CENDI Agency Indexing System Descriptions: A Baseline Report

Filed under: Government Data,Indexing,Thesaurus — Patrick Durusau @ 4:08 pm

CENDI Agency Indexing System Descriptions: A Baseline Report (1998)

In some ways a bit dated, but also a snapshot in time of the indexing practices of the:

  • National Technical Information Service (NTIS),
  • Department of Energy, Office of Scientific and Technical Information (DOE OSTI),
  • US Geological Survey/Biological Resources Division (USGS/BRD),
  • National Aeronautics and Space Administration, STI Program (NASA),
  • National Library of Medicine/National Institutes of Health (NLM),
  • National Air Intelligence Center (NAIC),
  • Defense Technical Information Center (DTIC).

The summary reads:

Software/technology identification for automatic support to indexing. As the resources for providing human indexing become more precious, agencies are looking for technology support. DTIC, NASA, and NAIC already have systems in place to supply candidate terms. New systems are under development and are being tested at NAIC and NLM. The aim of these systems is to decrease the burden of work borne by indexers.

Training and personnel issues related to combining cataloging and indexing functions. DTIC and NASA have combined the indexing and cataloging functions. This reduces the paper handling and the number of “stations” in the workflow. The need for a separate cataloging function decreases with the advent of EDMS systems and the scanning of documents with some automatic generation of cataloging information based on this scanning. However, the merger of these two diverse functions has been a challenge, particularly given the difference in skill level of the incumbents.

Thesaurus maintenance software. Thesaurus management software is key to the successful development and maintenance of controlled vocabularies. NASA has rewritten its system internally for a client/server environment. DTIC has replaced its systems with a commercial-off-the-shelf product. NTIS and USGS/BRD are interested in obtaining software that would support development of more structured vocabularies.

Linked or multi-domain thesauri. Both NTIS and USGS/BRD are interested in this approach. NTIS has been using separate thesauri for the main topics of the document. USGS/BRD is developing a controlled vocabulary to support metadata creation and searching but does not want to develop a vocabulary from scratch. In both cases, there is concern about the resources for development and maintenance of an agency-specific thesaurus. Being able to link to multiple thesauri that are maintained by their individual “owners” would reduce the investment and development time.

Full-text search engines and human indexing requirements. It is clear that the explosion of information on the web (both relevant web sites and web-published documents) cannot be indexed in the old way. There are not enough resources; yet, the chaos of the web begs for more subject organization. The view of current full-text search engines is that the users often miss relevant documents and retrieve a lot of “noise”. The future of web searching is unclear, and the demands or requirements that it might place on indexing are unknown.

Quality Control in a production environment. As resources decrease and timeliness becomes more important, there are fewer resources available for quality control of the records. The aim is to build the quality in at the beginning, when the documents are being indexed, rather than add review cycles. However, it is difficult to maintain quality in this environment.

Training time. The agencies face indexer turnover and the need to produce at ever-increasing rates. Training time has been shortened over the years. There is a need to determine how to make shorter training periods more effective.

Indexing systems designed for new environments, especially distributed indexing. An alternative to centralized indexers is a more distributed environment that can take advantage of cottage labor and contract employees. However, this puts increasing demands on the indexing system. It must be remotely accessible, yet secure. It must provide equivalent levels of validation and up-front quality control.

Major project: Update this report, focusing on the issues listed in the summary.

October 15, 2011

Code For America

Filed under: eGov,Government Data,Marketing — Patrick Durusau @ 4:27 pm

I hesitated over this post. But since I am willing to promote topic maps for governments, near-governments, governments in the wings, wannabe governments and groups of various kinds opposed by governments, I should not balk at nationalistic or idealistic groups in the United States.

Projects that benefit from topic maps in government circles work as well in Boston as in Mogadishu and Kandahar.

Some adaptation for local goals and priorities is needed, but the underlying technical principles remain the same.

On 9/11, the siloed emergency responders could not effectively communicate with each other. Care to guess who can’t effectively communicate with each other in most major metropolitan areas? That is just one example of the siloed nature of state, local and city government (to use U.S.-centric terminology; supply your own local terminology).

Keep an eye out for the software that is open sourced as a result of this project. It may be adaptable to your local circumstances or silo. Or you may need a topic map.

September 22, 2011

Skills Matter – Autumn Update

Filed under: Conferences,Government Data,NoSQL,Scala — Patrick Durusau @ 6:26 pm

Given the state of UK airport security, about the only reason I would go to the UK would be for a Skills Matter (un)conference, eXchange, or tutorial! And I say that having enjoyed them only as recorded presentations, slides and code. Actual attendance must bring a lot of repeat customers.

On the schedule for this Fall:

Skills Matter Partner Conferences

Skills Matter has partnered with Silicon Valley Comes to the UK, WIP, Novoda, FuseSource and David Pollak, to provide you with the following fantastic (un)Conferences & Hackathons:

Skills Matter eXchanges

We’ll also be running some pretty cool one- and two-day long Skills Matter eXchanges, which are conferences featuring 45 minute long expert talks and lots of breaks to discuss what you have learned. Expect in-depth, hands-on talks led by real experts who are there to be quizzed, questioned and interrogated until you know as much as they do, or thereabouts! In the paragraphs below, you’ll be able to find out about the following eXchanges we have planned for the coming months:

Skills Matter Progressive Technology Tutorials

Skills Matter Progressive Technology Tutorials offer a collection of 4-hour tutorials, featuring a mix of in-depth and hands-on workshops on technology, agile and software craftsmanship. In the paragraphs below, you’ll be able to find out about the following eXchanges we have planned for the coming months:

September 9, 2011

LATC – Linked Open Data Around-the-Clock

Filed under: Government Data,Linked Data,LOD — Patrick Durusau @ 7:10 pm

This appears to be an early release of the site because it has an “unfinished” feel to it. For example, you have to poke around a bit to find the tools link. And it isn’t clear how the project intends to promote the use of those tools, or to create others that promote the use of linked data.

I suppose it is too late to avoid the grandiose “around-the-clock” project name? Web servers, barring some technical issue, are up 24 x 7. They keep going even as we sleep. Promise.

Objectives:

increase the number, the quality and the accuracy of data links between LOD datasets. LATC contributes to the evolution of the World Wide Web into a global data space that can be exploited by applications similar to a local database today. By increasing the number and quality of data links, LATC makes it easier for European Commission-funded projects to use the Linked Data Web for research purposes.

support institutions as well as individuals with Linked Data publication and consumption. Many of the practical problems that a European Commission-funded project may discover when interacting with the Web of Data are solved on the conceptual level and the solutions have been implemented into freely available data publication and consumption tools. What is still missing is the dissemination of knowledge about how to use these tools to interact with the Web of Linked Data. We aim at providing this knowledge.

create an in-depth test-bed for data intensive applications by publishing datasets produced by the European Commission, the European Parliament, and other European institutions as Linked Data on the Web and by interlinking them with other governmental data, such as found in the UK and elsewhere.

September 5, 2011

Palin, Bachmann, and the Internal Welfare Code (aka, Internal Revenue Code)

Filed under: Government Data,Marketing,Topic Maps — Patrick Durusau @ 8:02 pm

Sarah Palin and Rep. Michele Bachmann (R-Minnesota) support a 0% corporate tax rate and closing corporate loopholes in the Internal Revenue Code.*

Those cheering are more interested in the 0% corporate tax rate than in closing corporate loopholes.

Truth be told, it should be called the Internal Welfare Code (IWC) as most of its provisions are loopholes for one group or another.

That makes tax reform hard because it is welfare reform. To have reform, someone has to give up their welfare benefits.

When welfare/tax provisions are written into the IWC/IRC, reports are prepared on the cost in revenue for those provisions. It often is easy to see who benefits from them.

Now there is a topic map project. Mapping the provisions of the IWC/IRC to the reports on “cost in revenue” for those provisions and identifying those who benefit from them. From that mapping you could produce a color-coded IWC/IRC that has the loopholes/provisions for each group identified by color. Or even re-organize the IWC/IRC by color so the loopholes for each group can be roughly compared.

That would be government transparency with bite!
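As a rough sketch of the data side of such a project, assuming hypothetical section identifiers, report IDs, beneficiary groups and placeholder cost figures (none of them real IRC data), the mapping reduces to regrouping provisions by beneficiary and assigning each group a display color:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Provision:
    section: str         # hypothetical section identifier
    cost_report: str     # hypothetical ID of the "cost in revenue" report
    est_cost: float      # placeholder figure, NOT a real revenue estimate
    beneficiaries: list  # groups identified as benefiting from the provision


# Illustrative records only.
PROVISIONS = [
    Provision("Sec. A", "Report-1", 3.0, ["Group X"]),
    Provision("Sec. B", "Report-2", 1.5, ["Group Y"]),
    Provision("Sec. C", "Report-3", 2.0, ["Group X", "Group Z"]),
]

PALETTE = ["red", "blue", "green", "orange", "purple"]


def by_beneficiary(provisions):
    """Regroup provisions by beneficiary and give each group a color."""
    groups = defaultdict(list)
    for p in provisions:
        for b in p.beneficiaries:
            groups[b].append(p)
    colors = {b: PALETTE[i % len(PALETTE)] for i, b in enumerate(sorted(groups))}
    return groups, colors


groups, colors = by_beneficiary(PROVISIONS)
for group in sorted(groups):
    sections = [p.section for p in groups[group]]
    total = sum(p.est_cost for p in groups[group])
    print(f"{group} ({colors[group]}): {sections}, placeholder total {total}")
```

A provision that benefits several groups is counted in full for each of them, so the per-group totals support only the rough comparison described above, not an accounting of lost revenue.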

PS: If you know of any government transparency project that would be interested, please pass this along. Or any candidate for that matter.

*The logic of closing corporate loopholes when the tax rate is 0% escapes me. But then, I am not running for President of the United States.

Sartor et al. on Legislative XML for the Semantic Web

Filed under: Government Data,Legal Informatics — Patrick Durusau @ 7:34 pm

Sartor et al. on Legislative XML for the Semantic Web from the Legalinformatics Blog.

Legislative XML for the Semantic Web: Principles, Models, Standards for Document Management (Springer 2011), a collection of scholarly articles on the use of XML and Semantic Web technologies in connection with legislative information systems, has been published.

Should be of interest for anyone working on topic maps and legislative information systems.

August 3, 2011

Design: Build the Mobile Gov Toolkit

Filed under: eGov,Government Data,Marketing,Mobile Gov — Patrick Durusau @ 7:39 pm

Tim O’Reilly tweeted this link.

Deadline for comments: 2 September 2011

From the post:

Your recommendations will help build an open, dynamic toolset–on a public wiki–to help agencies create and implement citizen-centric mobile gov services.

We are focusing on five areas.

  1. Policies: Tell us about policy gaps or ideas to support building mobile programs.
  2. Practices: What would jumpstart your efforts? Templates? Standards? Examples? Can you share your templates, standards, business cases?
  3. Partnerships: With whom and how can we work together?
  4. Products: What are your ideas for apps, mobile sites, text programs, mashups?
  5. Promotions: What are some great ways to spread the word?
  6. Do you have another category? You can add that, too.

What should we tell them about topic maps?

July 7, 2011

LAC Releases Government of Canada Core Subject Thesaurus

Filed under: Government Data,RDF,SKOS — Patrick Durusau @ 4:30 pm

From the post:

The government of Canada has released a new downloadable version of its Core Subject Thesaurus in SKOS/RDF format. According to Library and Archives Canada, “The Government of Canada Core Subject Thesaurus is a bilingual thesaurus consisting of terminology that represents all the fields covered in the information resources of the Government of Canada. Library and Archives Canada is exploring the potential for linked data and the semantic web with LAC vocabularies, metadata and open content.”

When you reach the post with links to the vocabulary, you will find it is also available as XML and CSV.

There are changes from the 2009 version.

Here’s an example:

Old form: Adaptive aids (for persons with disabilities)
New form: Assistive Technologies
French equivalent: Technologie d’aide

Did you notice that the old form and new form don’t share a single word in common?

Imagine that, an unstable core subject thesaurus.

Over time, more terms will be added, changed and deleted. Is there a topic map in the house?
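One way to sketch the topic map answer, assuming a hypothetical subject identifier and hand-entered names taken from the example above (this is not LAC’s data model), is to keep every name a subject has carried, in any version or language, and index all of them so that any one resolves to the same subject:

```python
from dataclasses import dataclass, field


@dataclass
class Subject:
    """One subject, with every name it has carried across thesaurus versions."""
    identifier: str                             # hypothetical, not an LAC URI
    current: str                                # current preferred English term
    french: str                                 # French equivalent
    former: list = field(default_factory=list)  # earlier preferred terms


SUBJECTS = [
    Subject(
        identifier="subject:assistive-technologies",
        current="Assistive Technologies",
        french="Technologie d’aide",
        former=["Adaptive aids (for persons with disabilities)"],
    ),
]

# Index every name (current, former, French) so any of them finds the subject.
INDEX = {}
for s in SUBJECTS:
    for name in [s.current, s.french, *s.former]:
        INDEX[name.casefold()] = s


def resolve(term):
    """Return the subject a term names, or None if the term is unknown."""
    return INDEX.get(term.casefold())


hit = resolve("Adaptive aids (for persons with disabilities)")
print(hit.current, "/", hit.french)  # Assistive Technologies / Technologie d’aide
```

When the next version renames a term, the old preferred term moves into `former` instead of disappearing, so records indexed under either name still meet.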

June 28, 2011

Explore the Marvel Universe Social Graph

Filed under: Government Data,Graphs,Social Graphs,Social Networks — Patrick Durusau @ 9:50 am

From the post (but be sure to see the images):

From Friday evening to Sunday afternoon, Kai Chang, Tom Turner, and Jefferson Braswell were tuning their visualizations and had a lot of fun exploring the Spider-Man or Captain America ego networks. They came up with these beautiful snapshots and created a zoomable web version using the Seadragon plugin. They won the “Most aesthetically pleasing visualization” category, congratulations to Kai, Tom and Jefferson for their amazing work!

The datasets have been added to the wiki Datasets page, so you can play with them and maybe calculate some metrics like centrality on the network. The graph is pretty large, so be sure to increase your Gephi memory settings to more than 2 GB.

I am sure the Marvel Comic graph is a lot more amusing but I can’t help but wonder about ego networks that combined:

  • Lobbyists registered with the US government
  • Elected and appointed officials and their staffs, plus staff’s families
  • Washington social calendar reports
  • Political donations
  • The Green Book

Topic maps could play a role in layering contracts, legislation and other matters onto the various ego networks.
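For readers who want to try the suggested metrics outside Gephi, here is a minimal sketch of normalized degree centrality and ego-network extraction in plain Python. The five-node graph is hypothetical, not the Marvel dataset; a graph of that size is better handled in Gephi or a dedicated graph library.

```python
from collections import defaultdict

# Hypothetical undirected edges (e.g. two characters appearing in the same issue).
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]


def adjacency(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def degree_centrality(adj):
    """Degree divided by (n - 1), the largest degree possible in the graph."""
    n = len(adj)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adj.items()}


def ego_network(adj, ego):
    """The ego, its neighbors, and every edge among those nodes."""
    nodes = {ego} | adj[ego]
    edges = {tuple(sorted((u, v))) for u in nodes for v in adj[u] if v in nodes}
    return nodes, sorted(edges)


adj = adjacency(EDGES)
print(degree_centrality(adj)["A"])  # 0.75
print(ego_network(adj, "A"))
```

The same ego-network function applies to any of the combined sources listed above once their entities have been merged into shared nodes, and that merging step is where a topic map would do the work.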

June 27, 2011

Data-gov Wiki

Filed under: Government Data,Public Data,RDF — Patrick Durusau @ 6:32 pm

From the wiki:

The Data-gov Wiki is a project being pursued in the Tetherless World Constellation at Rensselaer Polytechnic Institute. We are investigating open government datasets using semantic web technologies. Currently, we are translating such datasets into RDF, getting them linked to the linked data cloud, and developing interesting applications and demos on linked government data. Most of the datasets shown on this page come from the US government’s data.gov Web site, although some are from other countries or non-government sources.

Try out their Drupal site with new demos:

Linking Open Government Data

Setting aside my misgivings about the “openness” that releasing government data brings, the Drupal site is a job well done and merits your attention.

June 26, 2011

regulations.gov

Filed under: Data Source,Government Data — Patrick Durusau @ 4:06 pm

An easy source of US government regulations, which you can then use to demonstrate how your topic map application either maps the regulation into a legal environment or maps named individuals who win or lose under the regulation (in or out of government).

June 19, 2011

Open Government Data 2011 wrap-up

Filed under: Conferences,Dataset,Government Data,Public Data — Patrick Durusau @ 7:35 pm

Open Government Data 2011 wrap-up by Lutz Maicher.

From the post:

On June 16, 2011 the OGD 2011 – the first Open Data Conference in Austria – took place. Thanks to a lot of preliminary work of the Semantic Web Company the topic open (government) data is very hot in Austria, especially in Vienna and Linz. Hence 120 attendees (see the list here) for the first conference is a real success. Congrats to the organizers. And congrats to the community which made the conference a very vital and interesting event.

If there is a Second Open Data Conference, it is a venue where topic maps should put in an appearance.

PublicData.EU Launched During DAA

Filed under: Dataset,Government Data,Public Data — Patrick Durusau @ 7:33 pm

From the post:

During the Digital Agenda Assembly this week in Brussels the new portal PublicData.EU was launched in beta. This is a step aimed to make public data easier to find across the EU. As it says on the ‘about’ page:

“In order to unlock the potential of digital public sector information, developers and other prospective users must be able to find datasets they are interested in reusing. PublicData.eu will provide a single point of access to open, freely reusable datasets from numerous national, regional and local public bodies throughout Europe.

Information about European public datasets is currently scattered across many different data catalogues, portals and websites in many different languages, implemented using many different technologies. The kinds of information stored about public datasets may vary from country to country, and from registry to registry. PublicData.eu will harvest and federate this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the “discoverability problem” of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe.

In addition to providing access to official information about datasets from public bodies, PublicData.eu will capture (proposed) edits, annotations, comments and uploads from the broader community of public data users. In this way, PublicData.eu will harness the social aspect of working with data to create opportunities for mass collaboration. For example, a web developer might download a dataset, convert it into a new format, upload it and add a link to the new version of the dataset for others to use. From fixing broken URLs or typos in descriptions to substantive comments or supplementary documentation about using the datasets, PublicData.eu will provide up to date information for data users, by data users.”

PublicData.EU is built by the Open Knowledge Foundation as part of the LOD2 project. “PublicData.eu is powered by CKAN, a data catalogue system used by various institutions and communities to manage open data. CKAN and all its components are open source software and used by a wide community of catalogue operators from across Europe, including the UK Government’s data.gov.uk portal.”
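CKAN exposes its catalogue through an HTTP action API, and `package_search` is its dataset search call. As a sketch, assuming a hypothetical catalogue URL and the versioned `/api/3/action/` endpoint of current CKAN releases (what PublicData.eu exposed in 2011 may have differed), the following builds a search request and parses a response of the documented shape. The sample response is made up so the example runs without a network connection:

```python
import json
from urllib.parse import urlencode

# Hypothetical catalogue base URL; substitute a real CKAN instance.
BASE_URL = "https://catalog.example.org"


def package_search_url(base, query, rows=10):
    """Build a CKAN action-API package_search request URL."""
    params = urlencode({"q": query, "rows": rows})
    return f"{base}/api/3/action/package_search?{params}"


def dataset_titles(response_text):
    """Extract (name, title) pairs from a package_search JSON response."""
    body = json.loads(response_text)
    if not body.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    return [(pkg["name"], pkg.get("title", "")) for pkg in body["result"]["results"]]


# A made-up response in CKAN's documented shape, standing in for a live request.
SAMPLE = json.dumps({
    "success": True,
    "result": {
        "count": 1,
        "results": [{"name": "example-dataset", "title": "Example dataset"}],
    },
})

print(package_search_url(BASE_URL, "air quality", rows=5))
# https://catalog.example.org/api/3/action/package_search?q=air+quality&rows=5
print(dataset_titles(SAMPLE))  # [('example-dataset', 'Example dataset')]
```

Against a live catalogue, the URL can be passed to `urllib.request.urlopen` and the body decoded before calling `dataset_titles`.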

Here’s a European marketing opportunity for topic maps. How would a topic map solution be different from what is offered here? (There are similar opportunities in the US as well.)
