Archive for the ‘Vocabularies’ Category

Dictionary of Fantastic Vocabulary [Increasing the Need for Topic Maps]

Monday, April 18th, 2016

Dictionary of Fantastic Vocabulary by Greg Borenstein.

Alexis Lloyd tweeted this link along with:

This is utterly fantastic.

Well, it certainly increases the need for topic maps!

From the bot description on Twitter:

Generating new words with new meanings out of the atoms of English.

Ahem, are you sure about that?

Is a bot generating meaning?

Or are readers conferring meaning on the new words as they are read?

If, as I contend, readers confer meaning, then the utterance of every “new” word opens up as many new meanings as there are readers of the “new” word.

Example of people conferring different meanings on a term?

Ask a dozen people what is meant by “shot” in:

It’s just a shot away

When Lisa Fischer breaks into her solo in:

(Best played loud.)

Differences in meanings make for funny moments, awkward pauses, blushes, in casual conversation.

What if the stakes are higher?

What if you need to produce (or destroy) all the emails by “bobby1”?

Is it enough to find some of them?

What have you looked for lately? Did you find all of it? Or only some of it?

New words appear every day.

You are already behind. You will get further behind using search.
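A minimal sketch of the “bobby1” problem, assuming an invented mailbox and a hand-built alias table (none of these names come from a real system). A literal search finds only the messages that use that exact identifier; a lookup through every identifier known for the same person finds the rest.

```python
# Hypothetical messages as (sender, subject). All names are invented.
MESSAGES = [
    ("bobby1", "Q3 numbers"),
    ("robert.smith@example.com", "Re: Q3 numbers"),
    ("Bob S.", "lunch?"),
    ("alice@example.com", "bobby1 asked about the audit"),
]

# Assumed alias table: every identifier known to refer to the same person.
ALIASES = {"bobby1": {"bobby1", "robert.smith@example.com", "Bob S."}}


def literal_search(sender):
    """Return messages whose sender field is exactly the searched string."""
    return [m for m in MESSAGES if m[0] == sender]


def alias_search(sender):
    """Return messages sent under any identifier mapped to the same person."""
    names = ALIASES.get(sender, {sender})
    return [m for m in MESSAGES if m[0] in names]


if __name__ == "__main__":
    found = len(literal_search("bobby1"))
    total = len(alias_search("bobby1"))
    print(f"literal search found {found} of {total} messages")
```

The alias table is the hard part, of course. Someone has to build and maintain it, which is exactly the mapping work that search alone does not do.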

What’s New for 2016 MeSH

Thursday, December 17th, 2015

What’s New for 2016 MeSH by Jacque-Lynne Schulman.

From the post:

MeSH is the National Library of Medicine controlled vocabulary thesaurus which is updated annually. NLM uses the MeSH thesaurus to index articles from thousands of biomedical journals for the MEDLINE/PubMed database and for the cataloging of books, documents, and audiovisuals acquired by the Library.

MeSH experts/users will need to absorb the details but some of the changes include:

Overview of Vocabulary Development and Changes for 2016 MeSH

  • 438 Descriptors added
  • 17 Descriptor terms replaced with more up-to-date terminology
  • 9 Descriptors deleted
  • 1 Qualifier (Subheading) deleted

and,

MeSH Tree Changes: Uncle vs. Nephew Project

In the past, MeSH headings were loosely organized in trees and could appear in multiple locations depending upon the importance and specificity. In some cases the heading would appear two or more times in the same tree at higher and lower levels. This arrangement led to some headings appearing as a sibling (uncle) next to the heading under which they were treed as a nephew. In other cases a heading was included at a top level so it could be seen more readily in printed material. We reviewed these headings in MeSH and removed either the Uncle or Nephew depending upon the judgement of our Internal and External reviewers. There were over 1,000 tree changes resulting from this work, many of which will affect search retrieval in MEDLINE/PubMed and the NLM Catalog.

and,

MeSH Scope Notes

MeSH had a policy that each descriptor should have a scope note regardless of how obvious its meaning. There were many legacy headings that were created without scope notes before this rule came into effect. This year we initiated a project to write scope notes for all existing headings. Thus far 481 scope notes to MeSH were added and the project continues for 2017 MeSH.
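The Uncle vs. Nephew cleanup quoted above can be sketched in a few lines. This is only my reading of the description, not NLM's procedure: a heading is flagged when one of its positions sits beside a node as a sibling while another of its positions sits directly under that same node. The dotted tree numbers and heading names below are invented.

```python
def parent(tree_number):
    """Return the parent tree number, or None for a top-level node."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def uncle_nephew_pairs(headings):
    """Return (heading, uncle_position, nephew_position) triples.

    A triple is reported when the node the nephew is treed under is a
    sibling of the uncle position, i.e. both share the same parent.
    """
    found = []
    for name, positions in headings.items():
        for uncle in positions:
            for nephew in positions:
                treed_under = parent(nephew)
                if treed_under is None or treed_under == uncle:
                    continue
                if parent(treed_under) == parent(uncle):
                    found.append((name, uncle, nephew))
    return found


# Hypothetical headings mapped to their tree positions.
HEADINGS = {
    "Heading A": ["X01.100", "X01.200.300"],  # beside X01.200 and under it
    "Heading B": ["X01.200"],
    "Heading C": ["X02.100", "X05.400.100"],  # two unrelated positions
}

if __name__ == "__main__":
    for name, uncle, nephew in uncle_nephew_pairs(HEADINGS):
        print(f"{name}: uncle at {uncle}, nephew at {nephew}")
```

Whichever position is removed, searches that relied on the other location in the tree return different results afterwards, which is the point of the warning about search retrieval.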

Echoes of Heraclitus:

It is not possible to step twice into the same river according to Heraclitus, or to come into contact twice with a mortal being in the same state. (Plutarch) (Heraclitus)

Semantics and the words we use to invoke them are always in a state of flux. Sometimes more, sometimes less.

The lesson here is that anyone who says you can have a fixed and stable vocabulary is not only selling something, they are selling you a broken something. If not broken on the day you start to use it, then fairly soon thereafter.

It took time for me to come to the realization that the same is true about information systems that attempt to capture changing semantics at any given point.

Topic maps in the sense of ISO 13250-2, for example, can capture and map changing semantics, but if and only if you are willing to accept its data model.

Which is good as far as it goes, but what if I want a different data model? That is, I still want to capture changing semantics and map between them, but using a different data model.

We may have a use case to map back to ISO 13250-2 or to some other data model. The point being that we should not privilege any data model or syntax in advance, at least not absolutely.

Not only do communities change but their preferences for technologies change as well. It seems just a bit odd to be selling an approach on the basis of capturing change only to build a dike to prevent change in your implementation.

Yes?

New: Library of Congress Demographic Group Terms (LCDGT)

Wednesday, May 13th, 2015

From an email:

As part of its ongoing effort to provide effective access to library materials, the Library of Congress is developing a new vocabulary, entitled Library of Congress Demographic Group Terms (LCDGT). This vocabulary will be used to describe the creators of, and contributors to, resources, and also the intended audience of resources. It will be created and maintained by the Policy and Standards Division, and be distinct from the other vocabularies that are maintained by that division: Library of Congress Subject Headings (LCSH), Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT), and the Library of Congress Medium of Performance Thesaurus for Music (LCMPT).

A general rationale for the development of LCDGT, information about the pilot vocabulary, and a link to the Tentative List of terms in the pilot may be found on LC’s Acquisitions and Bibliographic Access website at http://www.loc.gov/catdir/cpso/lcdgt-announcement.html.

The Policy and Standards Division is accepting comments on the pilot vocabulary and the principles guiding its development through June 5, 2015. Comments may be sent to Janis L. Young at jayo@loc.gov.

A follow-up question to this post asked:

Is there a list of the codes used in field 072 in these lists? Some I can figure out, but it would be nice to see a list of the categories you’re using.

The list in question is: DEMOGRAPHIC GROUP TERMS.

To which Adam Schiff replied:

The list of codes is in http://www.loc.gov/catdir/cpso/lcdgt-principles.pdf and online at http://www.loc.gov/standards/valuelist/lcdgt.html (although the latter is still lacking a few of the codes found in the former).

Enjoy!

The Vocabulary of Cyber War

Tuesday, April 21st, 2015

The Vocabulary of Cyber War

From the post:

At the 39th Joint Doctrine Planning Conference, a semiannual meeting on topics related to military doctrine and planning held in May 2007, a contractor for Booz Allen Hamilton named Paul Schuh gave a short presentation discussing doctrinal issues related to “cyberspace” and the military’s increasing effort to define its operations involving computer networks. Schuh, who would later become chief of the Doctrine Branch at U.S. Cyber Command, argued that military terminology related to cyberspace operations was inadequate and failed to address the expansive nature of cyberspace. According to Schuh, the existing definition of cyberspace as “the notional environment in which digitized information is communicated over computer networks” was imprecise. Instead, he proposed that cyberspace be defined as “a domain characterized by the use of electronics and the electromagnetic spectrum to store, modify, and exchange data via networked systems and associated physical infrastructures.”

Amid the disagreements about “notional environments” and “operational domains,” Schuh informed the conference that “experience gleaned from recent cyberspace operations” had revealed “the necessity for development of a lexicon to accommodate cyberspace operations, cyber warfare and various related terms” such as “weapons consequence” or “target vulnerability.” The lexicon needed to explain how the “‘four D’s (deny, degrade, disrupt, destroy)” and other core terms in military terminology could be applied to cyber weapons. The document that would later be produced to fill this void is The Cyber Warfare Lexicon, a relatively short compendium designed to “consolidate the core terminology of cyberspace operations.” Produced by the U.S. Strategic Command’s Joint Functional Command Component – Network Warfare, a predecessor to the current U.S. Cyber Command, the lexicon documents early attempts by the U.S. military to define its own cyber operations and place them within the larger context of traditional warfighting. A version of the lexicon from January 2009 obtained by Public Intelligence includes a complete listing of terms related to the process of creating, classifying and analyzing the effects of cyber weapons. An attachment to the lexicon includes a series of discussions on the evolution of military commanders’ conceptual understanding of cyber warfare and its accompanying terminology, attempting to align the actions of software with the outcomes of traditional weaponry.

A bit dated, 2009, particularly in terms of the understanding of cyber war but possibly useful for leaked documents from that time period and as a starting point to study the evolution of terminology in the area.

To the extent this crosses over with cybersecurity, you may find A Glossary of Common Cybersecurity Terminology (NICCS) or the Glossary of Information Security Terms useful. There is overlap between the two.

There are several information sharing efforts under development or in place, which will no doubt lead to the creation of more terminology.

Controlled Vocabularies and the Semantic Web

Wednesday, February 18th, 2015

Controlled Vocabularies and the Semantic Web Journal of Library Metadata – Special Issue Call for Papers

From the webpage:

Ranging from large national libraries to small and medium-sized institutions, many cultural heritage organizations, including libraries, archives, and museums, have been working with controlled vocabularies in linked data and semantic web contexts.  Such work has included transforming existing vocabularies, thesauri, subject heading schemes, authority files, term and code lists into SKOS and other machine-consumable linked data formats. 

This special issue of the Journal of Library Metadata welcomes articles from a wide variety of types and sizes of organizations on a wide range of topics related to controlled vocabularies, ontologies, and models for linked data and semantic web deployment, whether theoretical, experimental, or actual. 

Topics include, but are not restricted to the following:

  • Converting existing vocabularies into SKOS and/or other linked data formats.
  • Publishing local vocabularies as linked data in online repositories such as the Open Metadata Registry.
  • Development or use of special tools, platforms and interfaces that facilitate the creation and deployment of vocabularies as linked data.
  • Working with Linked Data / Semantic Web W3C standards such as RDF, RDFS, SKOS, and OWL.
  • Work with the BIBFRAME, Europeana, DPLA, CIDOC-CRM, or other linked data / semantic web models, frameworks, and ontologies.
  • Challenges in transforming existing vocabularies and models into linked data and semantic web vocabularies and models.

Click here for a complete list of possible topics.

Researchers and practitioners are invited to submit a proposal (approximately 500 words) including a problem statement, problem significance, objectives, methodology, and conclusions (or tentative conclusions for work in progress). Proposals must be received by March 1, 2015. Full manuscripts (4000-7000 words) are expected to be submitted by June 1, 2015. All submitted manuscripts will be reviewed on a double-blind review basis.

Please forward inquiries and proposal submissions electronically to the guest editors at: perkintj@miamioh.edu

Proposal Deadline: March 1, 2015.

Journal of Library Metadata online. Unfortunately, it is one of those journals where authors have to pay for their work to be accessible to others. The interface makes it look like you are going to have access until you attempt to view a particular article. I didn’t stumble across any that were accessible but I only tried four (4) or five (5) of them.

Interesting journal if you have access to it or if you are willing to pay $40.00 per article for viewing. I worked for an academic publisher for a number of years and have an acute sense of the value-add publishers bring to the table. Volunteer authors, volunteer editors, etc.

Underspecifying Meaning

Sunday, July 27th, 2014

Word Meanings Evolve to Selectively Preserve Distinctions on Salient Dimensions by Catriona Silvey, Simon Kirby, and Kenny Smith.

Abstract:

Words refer to objects in the world, but this correspondence is not one-to-one: Each word has a range of referents that share features on some dimensions but differ on others. This property of language is called underspecification. Parts of the lexicon have characteristic patterns of underspecification; for example, artifact nouns tend to specify shape, but not color, whereas substance nouns specify material but not shape. These regularities in the lexicon enable learners to generalize new words appropriately. How does the lexicon come to have these helpful regularities? We test the hypothesis that systematic backgrounding of some dimensions during learning and use causes language to gradually change, over repeated episodes of transmission, to produce a lexicon with strong patterns of underspecification across these less salient dimensions. This offers a cultural evolutionary mechanism linking individual word learning and generalization to the origin of regularities in the lexicon that help learners generalize words appropriately.

I can’t seem to access the article today but the premise is intriguing.

Perhaps people can have different “…less salient dimensions…” and therefore are generalizing words “inappropriately” from the standpoint of another person.

I am curious whether a test can be devised to identify those “…less salient dimensions…” in some target population. It might lead to faster identification of terms likely to be misunderstood.

Medical Vocabulary

Sunday, July 6th, 2014

Medical Vocabulary by John D. Cook.

A new twitter account that tweets medical terms with definitions.

Would a twitter account that focuses on semantic terminology be useful?

No promises, just curious.

I am thinking that promising semantic searching/integration would be more credible if there were some evidence that semanticists are aware of the vast and different terminology in their own field.

PS: John D. Cook has seventeen (17) Twitter accounts as of today.

I subscribe to several of them and they are very much worth the time to follow.

For the current list of John D. Cook twitter accounts, see: http://www.johndcook.com/twitter/

Cross-Scheme Management in VocBench 2.1

Sunday, April 13th, 2014

Cross-Scheme Management in VocBench 2.1 by Armando Stellato.

From the post:

One of the main features of the forthcoming VB2.1 will be SKOS Cross-Scheme Management

I started drafting some notes about cross-scheme management here: https://art-uniroma2.atlassian.net/wiki/display/OWLART/SKOS+Cross-Scheme+Management

I think it is important to have all the integrity checks related to this aspect clear for humans, and not only have them sealed deep in the code. These notes will help users get acquainted with this feature in advance. Once completed, these will be included also in the manual of VB.

For the moment I’ve only written the introduction, some notes about data integrity and then described the checks carried upon the most dangerous operation: removing a concept from a scheme. Together with the VB development group, we will add more information in the next days. However, if you have some questions about this feature, you may post them here, as usual (or you may use the vocbench user/developer user groups).

A consistent set of operations and integrity checks for cross-scheme are already in place for this 2.1, which will be released in the next days.

VB2.2 will focus on other aspects (multi-project management), while we foresee a second wave of facilities for cross-scheme management (such as mass-move/add/remove actions, fixing utilities, analysis of dangling concepts, corrective actions etc..) for VB2.3

I agree that:

I think it is important to have all the integrity checks related to this aspect clear for humans, and not only have them sealed deep in the code.

But I am less certain that following the integrity checks of SKOS is useful in all mappings between schemes.

If you are interested in such constraints, see Armando’s notes.
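To make “integrity checks clear for humans” concrete, here is a minimal sketch of the kind of check one might run before removing a concept from a scheme. The three conditions are my own assumptions about what could go wrong, not VocBench's documented checks, and the `Concept` class is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    name: str
    schemes: set = field(default_factory=set)  # in the role of skos:inScheme
    top_of: set = field(default_factory=set)   # in the role of skos:topConceptOf
    broader: set = field(default_factory=set)  # names of broader concepts


def removal_warnings(concepts, name, scheme):
    """List problems that removing `name` from `scheme` would create."""
    target = concepts[name]
    warnings = []
    if scheme in target.top_of:
        warnings.append(f"{name} is a top concept of {scheme}")
    if target.schemes == {scheme}:
        warnings.append(f"{name} would belong to no scheme")
    for other in concepts.values():
        if scheme not in other.schemes or name not in other.broader:
            continue
        # Broader concepts that would still be in the scheme after removal.
        remaining = {b for b in other.broader
                     if b != name and scheme in concepts[b].schemes}
        if not remaining and scheme not in other.top_of:
            warnings.append(f"{other.name} would dangle in {scheme}")
    return warnings


# Hypothetical two-concept vocabulary spread over schemes S1 and S2.
CONCEPTS = {
    "animal": Concept("animal", {"S1", "S2"}, {"S1"}),
    "dog": Concept("dog", {"S1"}, set(), {"animal"}),
}

if __name__ == "__main__":
    for warning in removal_warnings(CONCEPTS, "animal", "S1"):
        print(warning)
```

Note that every one of these warnings presumes the SKOS view of schemes and hierarchy. A mapping between schemes that does not share that view may have no use for them.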

Semantics of Business Vocabulary and Business Rules

Tuesday, February 4th, 2014

Semantics of Business Vocabulary and Business Rules

From 1.2 Applicability:

The SBVR specification is applicable to the domain of business vocabularies and business rules of all kinds of business activities in all kinds of organizations. It provides an unambiguous, meaning-centric, multilingual, and semantically rich capability for defining meanings of the language used by people in an industry, profession, discipline, field of study, or organization.

This specification is conceptualized optimally for business people rather than automated processing. It is designed to be used for business purposes, independent of information systems designs to serve these business purposes:

  • Unambiguous definition of the meaning of business concepts and business rules, consistently across all the terms, names and other representations used to express them, and across the natural languages in which those representations are expressed, so that they are not easily misunderstood either by “ordinary business people” or by lawyers.
  • Expression of the meanings of concepts and business rules in the wordings used by business people, who may belong to different communities, so that each expression wording is uniquely associated with one meaning in a given context.
  • Transformation of the meanings of concepts and business rules as expressed by humans into forms that are suitable to be processed by tools, and vice versa.
  • Interpretation of the meanings of concepts and business rules in order to discover inconsistencies and gaps within an SBVR Content Model (see 2.4) using logic-based techniques.
  • Application of the meanings of concepts and business rules to real-world business situations in order to enable reproducible decisions and to identify conformant and non-conformant business behavior.
  • Exchange of the meanings of concepts and business rules between humans and tools as well as between tools without losing information about the essence of those meanings.

I do need to repeat their warning from 6.2 How to Read this Specification:

This specification describes a vocabulary, or actually a set of vocabularies, using terminological entries. Each entry includes a definition, along with other specifications such as notes and examples. Often, the entries include rules (necessities) about the particular item being defined.

The sequencing of the clauses in this specification reflects the inherent logical order of the subject matter itself. Later clauses build semantically on the earlier ones. The initial clauses are therefore rather ‘deep’ in terms of SBVR’s grounding in formal logics and linguistics. Only after these clauses are presented do clauses more relevant to day-to-day business communication and business rules emerge.

This overall form of presentation, essential for a vocabulary standard, unfortunately means the material is rather difficult to approach. A figure presented for each sub-vocabulary does help illustrate its structure; however, no continuous ‘narrative’ or explanation is appropriate.

😉

OK, so you aren’t going to read it for giggles. But you will be encountering it in the wild world of data so at least mark the reference.

I first saw this in a tweet by Stian Danenbarger.

Three Linked Data Vocabularies

Friday, January 17th, 2014

Three Linked Data Vocabularies are W3C Recommendations

From the post:

Three Recommendations were published today to enhance data interoperability, especially in government data. Each one specifies an RDF vocabulary (a set of properties and classes) for conveying a particular kind of information:

  • The Data Catalog (DCAT) Vocabulary is used to provide information about available data sources. When data sources are described using DCAT, it becomes much easier to create high-quality integrated and customized catalogs including entries from many different providers. Many national data portals are already using DCAT.
  • The Data Cube Vocabulary brings the cube model underlying SDMX (Statistical Data and Metadata eXchange, a popular ISO standard) to Linked Data. This vocabulary enables statistical and other regular data, such as measurements, to be published and then integrated and analyzed with RDF-based tools.
  • The Organization Ontology provides a powerful and flexible vocabulary for expressing the official relationships and roles within an organization. This allows for interoperation of personnel tools and will support emerging socially-aware software.

More vocabularies for mapping into their respective areas, backwards for pre-existing vocabularies and forward for vocabularies that succeed them.

Create better SKOS vocabularies

Tuesday, January 14th, 2014

Create better SKOS vocabularies

From the webpage:

PoolParty SKOS Quality Checker allows you to perform automated quality checks on controlled vocabularies. You will receive a report of our findings.

This service is based on qSKOS and is able to make checks on over 20 quality issues.

You will organize uploaded vocabularies by giving a name for which you may provide different versions of the same vocabulary. This way you can easily track quality improvements over time.

You won’t need this for simple vocabularies (think schema.org) but it could be useful for more complex vocabularies.
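For a feel of what automated quality checks on a vocabulary look like, here is a minimal sketch with three checks of my own choosing: missing preferred labels, concepts with no links at all, and labels shared by more than one concept. It is not qSKOS code and does not claim to reproduce its list of issues; the dictionary layout is an assumption made for the example.

```python
from collections import defaultdict

# Hypothetical vocabulary: concept id -> preferred label and linked concept ids.
VOCAB = {
    "c1": {"pref": "Cluster", "links": ["c2"]},
    "c2": {"pref": "Node", "links": ["c1"]},
    "c3": {"pref": "", "links": []},
    "c4": {"pref": "cluster", "links": []},
}


def missing_labels(vocab):
    """Concepts whose preferred label is empty or whitespace."""
    return sorted(c for c, d in vocab.items() if not d["pref"].strip())


def orphans(vocab):
    """Concepts that link to nothing and are linked from nothing."""
    linked = {target for d in vocab.values() for target in d["links"]}
    return sorted(c for c, d in vocab.items()
                  if not d["links"] and c not in linked)


def label_conflicts(vocab):
    """Labels (compared case-insensitively) used by more than one concept."""
    by_label = defaultdict(list)
    for c, d in vocab.items():
        label = d["pref"].strip().lower()
        if label:
            by_label[label].append(c)
    return {label: sorted(ids) for label, ids in by_label.items() if len(ids) > 1}


if __name__ == "__main__":
    print("missing labels:", missing_labels(VOCAB))
    print("orphans:", orphans(VOCAB))
    print("label conflicts:", label_conflicts(VOCAB))
```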

Get savvy on the latest cloud terminology

Sunday, January 12th, 2014

Get savvy on the latest cloud terminology by Nick Hardiman.

From the post:

As with all technology, some lingo stays popular, while other phrases decline in use. Use this list to find out the newest terminology for all things cloud.

Some cloud terms, like cloudstorming, cloudware and external cloud, are declining in popularity. Other terms are up-and-coming, like vertical cloud.

This list gives all the latest lingo to keep you up-to-date on the most popular terms for all things cloud:

Nick has assembled a list of fifty-one (51) cloud terms.

Could be useful in creating a vocabulary a la schema.org.

As Nick says, the lingo is going to change. Using a microformat and vocabulary can help you maintain access to information.

For example, Nick says:

Cluster

a collection of machines that work together to deliver a customer service. Cloud clusters grow and shrink on-demand. A cloud service provides an API for scaling out a cluster, by adding more machines.

When quoting that, I could say:

<blockquote itemscope itemtype="http://schema.org/Thing">
<link itemprop="sameAs" href="http://en.wikipedia.org/wiki/Cluster_%28computing%29">

Cluster

a collection of machines that work together to deliver a customer service. Cloud clusters grow and shrink on-demand. A cloud service provides an API for scaling out a cluster, by adding more machines. </blockquote>

Which would distinguish (when searching) a “cluster” of computers from one of the other 38 uses of “cluster” found at: en.wikipedia.org/wiki/Cluster

Rather than using “Thing” from schema.org, I really should find or make an extension to that vocabulary for terms in various areas that are relevant to topic maps.
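The search distinction described above can be sketched with nothing but the Python standard library. The sketch assumes the identifier is carried in a `<link itemprop="sameAs">` element inside the quotation, and the two sample documents are invented. A plain text search for “cluster” matches both documents; a search on the identifier matches only the computing one.

```python
from html.parser import HTMLParser


class SameAsCollector(HTMLParser):
    """Collect href values of <link itemprop="sameAs"> elements."""

    def __init__(self):
        super().__init__()
        self.same_as = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("itemprop") == "sameAs" and attrs.get("href"):
            self.same_as.append(attrs["href"])


def identifiers(html):
    """Return every sameAs identifier found in an HTML fragment."""
    parser = SameAsCollector()
    parser.feed(html)
    parser.close()
    return parser.same_as


COMPUTING = "http://en.wikipedia.org/wiki/Cluster_%28computing%29"

# Two hypothetical quotations that both use the word "cluster".
DOCS = {
    "cloud-post": '<blockquote itemscope itemtype="http://schema.org/Thing">'
                  f'<link itemprop="sameAs" href="{COMPUTING}">'
                  "Cluster: a collection of machines</blockquote>",
    "astronomy-post": "<blockquote>A star cluster is a group of stars</blockquote>",
}


def find_by_text(docs, word):
    """Names of documents containing the word, ignoring case."""
    return [name for name, html in docs.items() if word.lower() in html.lower()]


def find_by_subject(docs, uri):
    """Names of documents that carry the given sameAs identifier."""
    return [name for name, html in docs.items() if uri in identifiers(html)]


if __name__ == "__main__":
    print("text search:", find_by_text(DOCS, "cluster"))
    print("subject search:", find_by_subject(DOCS, COMPUTING))
```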

Vocabularies at W3C

Wednesday, January 8th, 2014

Vocabularies at W3C by Phil Archer.

From the post:

In my opening post on this blog I hinted that another would follow concerning vocabularies. Here it is.

When the Semantic Web first began, the expectation was that people would create their own vocabularies/schemas as required – it was all part of the open world (free love, do what you feel, dude) Zeitgeist. Over time, however, and with the benefit of a large measure of hindsight, it’s become clear that this is not what’s required.

The success of Linked Open Vocabularies as a central information point about vocabularies is symptomatic of a need, or at least a desire, for an authoritative reference point to aid the encoding and publication of data. This need/desire is expressed even more forcefully in the rapid success and adoption of schema.org. The large and growing set of terms in the schema.org namespace includes many established terms defined elsewhere, such as in vCard, FOAF, Good Relations and rNews. I’m delighted that Dan Brickley has indicated that schema.org will reference what one might call ‘source vocabularies’ in the near future, I hope with assertions like owl:equivalentClass, owl:equivalentProperty etc.

Designed and promoted as a means of helping search engines make sense of unstructured data (i.e. text), schema.org terms are being adopted in other contexts, for example in the ADMS. The Data Activity supports the schema.org effort as an important component and we’re delighted that the partners (Google, Microsoft, Yahoo! and Yandex) develop the vocabulary through the Web Schemas Task Force, part of the W3C Semantic Web Interest Group of which Dan Brickley is chair.

Phil then makes a pitch for doing vocabulary work at the W3C but you can see his post for the details.

I think the success of schema.org is a flashing pointer to a semantic sweet spot.

It isn’t nearly everything that you could do with RDF/OWL or with topic maps, but it’s enough to show immediate ROI for a minimum of investment.

Make no mistake, people will develop different vocabularies for the same activities. Not a problem. Topic maps will be able to help you robustly map between different vocabularies.
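As a rough illustration of mapping between vocabularies, the sketch below groups terms that are connected by pairwise equivalence statements (in the spirit of owl:equivalentClass) and then answers “what is this term called in that vocabulary?”. The equivalence pairs are sample data for the example, not mappings I am asserting, and the function names are hypothetical.

```python
def merge_equivalents(pairs):
    """Group terms connected, directly or indirectly, by equivalence pairs.

    Returns a dict mapping each term to the set of all terms in its group.
    """
    group_of = {}
    for a, b in pairs:
        merged = group_of.get(a, {a}) | group_of.get(b, {b})
        for term in merged:
            group_of[term] = merged
    return group_of


def translate(group_of, term, prefix):
    """Return the equivalents of `term` that belong to the vocabulary `prefix`."""
    return sorted(t for t in group_of.get(term, {term}) if t.startswith(prefix))


# Illustrative equivalence statements between three vocabularies.
PAIRS = [
    ("schema:Person", "foaf:Person"),
    ("foaf:Person", "vcard:Individual"),
]

if __name__ == "__main__":
    groups = merge_equivalents(PAIRS)
    print(translate(groups, "schema:Person", "vcard:"))
```

Treating equivalence as transitive is itself a design choice. Two communities may each accept a mapping to a shared middle term without accepting a mapping to each other, which is where recording who asserted a mapping, and for what purpose, starts to matter.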

U.S. Military Slang

Thursday, December 5th, 2013

The definitive glossary of modern US military slang by Ben Brody.

From the post:

It’s painful for US soldiers to hear discussions and watch movies about modern wars when the dialogue is full of obsolete slang, like “chopper” and “GI.”

Slang changes with the times, and the military’s is no different. Soldiers fighting the wars in Iraq and Afghanistan have developed an expansive new military vocabulary, taking elements from popular culture as well as the doublespeak of the military industrial complex.

The US military drawdown in Afghanistan — which is underway but still awaiting the outcome of a proposed bilateral security agreement — is often referred to by soldiers as “the retrograde,” which is an old military euphemism for retreat. Of course the US military never “retreats” — rather it conducts a “tactical retrograde.”

This list is by no means exhaustive, and some of the terms originated prior to the wars in Afghanistan and Iraq. But these terms are critical to speaking the current language of soldiers, and understanding it when they speak to others. Please leave anything you think should be included in the comments.

Useful for documents that contain U.S. military slang, such as the Afghanistan War Diary.

As Ben notes at the outset, language changes over time so validate any vocabulary against your document/data set.
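A minimal sketch of that validation step, assuming a glossary given as a plain list of terms and documents given as plain strings (both invented here). It counts whole-word, case-insensitive occurrences and reports the glossary terms that never appear, which are candidates for being obsolete or simply absent from your data.

```python
import re
from collections import Counter


def term_counts(glossary, documents):
    """Count whole-word, case-insensitive occurrences of each glossary term."""
    counts = Counter()
    for term in glossary:
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        counts[term] = sum(len(pattern.findall(doc)) for doc in documents)
    return counts


def unused_terms(glossary, documents):
    """Glossary terms that never occur in the documents."""
    counts = term_counts(glossary, documents)
    return [term for term in glossary if counts[term] == 0]


# Hypothetical glossary and documents, for illustration only.
GLOSSARY = ["chopper", "GI", "retrograde"]
DOCUMENTS = [
    "The retrograde began in spring.",
    "Retrograde operations continued through the summer.",
]

if __name__ == "__main__":
    print(dict(term_counts(GLOSSARY, DOCUMENTS)))
    print("never seen:", unused_terms(GLOSSARY, DOCUMENTS))
```

The reverse check, frequent words in the documents that the glossary does not cover, is just as useful but needs tokenizing and stop-word handling that I have left out.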

Preservation Vocabularies [3 types of magnetic storage medium?]

Sunday, June 30th, 2013

Preservation Datasets

From the webpage:

The Linked Data Service is to provide access to commonly found standards and vocabularies promulgated by the Library of Congress. This includes data values and the controlled vocabularies that house them. Below are descriptions of each preservation vocabulary derived from the PREMIS standard. Inside each, a search box allows you to search the vocabularies individually.

New preservation vocabularies from the Library of Congress.

Your mileage will vary with these vocabularies.

Take storage for example.

As we all learned in school, there are only three kinds of magnetic “storage medium:”

  • hard disk
  • magnetic tape
  • TSM

😉

In case you don’t recognize TSM, it stands for IBM Tivoli Storage Manager.

Hmmmm, what about the twenty (20) types of optical disks?

Or other forms of magnetic media? Such as thumb drives, floppy disks, etc.

I picked “storage medium” at random.

Take a look at some of the other vocabularies and let me know what you think.

Please include links to more information in case the LOC decides to add more entries to its vocabularies.

I first saw this at: 21 New Preservation Vocabularies available at id.loc.gov.

Integrating controlled vocabularies… (webinar)

Friday, June 28th, 2013

“Integrating controlled vocabularies in information management systems: the new ontology plug-in”, 4th July

From the post:

The Webinar will introduce the new ontology plug-in developed in the context of the AIMS Community, how it works and the usage possibilities. It was created within the context of AgriOcean DSpace, however it is an independent plug-in and can be used in any other applications and information management systems.

The ontology plug-in searches multiple thesauri and ontologies simultaneously by using a web service broker (e.g. AGROVOC, ASFA, Plant Ontology, NERC-C19 ontology, and OceanExpert). It delivers as output a list of selected concepts, where each concept has a URI (or unique ID), a preferred label with optional language definition and the ontology from which the concepts has been selected. The application uses JAVA, Javascript and JQuery. As it is open software, developers are invited to reuse, enrich and enhance the existing source code.

We invite the participants of the webinar to give their view how thesauri and ontologies can be used in repositories and other types of information management systems and how the ontology plug-in can be further developed.

Date

4th of July 2013 – 16:00 Rome Time (Use Time Converter to calculate the time difference between your location and Rome, Italy)

On my must watch list.

Demo: http://193.190.8.15/ontwebapp/ontology.html

Source: https://code.google.com/p/ontology-plugin/

Imagine adapting the plugin to search for URIs in <a> elements and searching a database for the subjects they identify.
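A minimal sketch of that idea, written in Python rather than as a change to the plug-in itself: pull the `href` values out of `<a>` elements and look each one up in a table of subjects keyed by identifying URI. The table, the example.org URIs, and the record fields are all hypothetical stand-ins for a real database.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


# Hypothetical subject "database" keyed by identifying URI.
SUBJECTS = {
    "http://example.org/concept/maize": {"label": "maize", "source": "thesaurus A"},
    "http://example.org/concept/tuna": {"label": "tuna", "source": "thesaurus B"},
}


def subjects_in(html, subjects):
    """Return subject records for the URIs linked from an HTML fragment."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [subjects[h] for h in parser.hrefs if h in subjects]


PAGE = (
    '<p>Yields of <a href="http://example.org/concept/maize">corn</a> rose, '
    'see <a href="http://example.org/other">the report</a>.</p>'
)

if __name__ == "__main__":
    for record in subjects_in(PAGE, SUBJECTS):
        print(record["label"], "from", record["source"])
```

Notice that the link text says “corn” while the subject record says “maize”. The URI, not the word, is what identifies the subject.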

Vocabulary Management at W3C (Draft) [ontology and vocabulary as synonyms]

Thursday, June 6th, 2013

Vocabulary Management at W3C (Draft)

From the webpage:

One of the major stumbling blocks in deploying RDF has been the difficulty data providers have in determining which vocabularies to use. For example, a publisher of scientific papers who wants to embed document metadata in the web pages about each paper has to make an extensive search to find the possible vocabularies and gather the data to decide which among them are appropriate for this use. Many vocabularies may already exist, but they are difficult to find; there may be more than one on the same subject area, but it is not clear which ones have a reasonable level of stability and community acceptance; or there may be none, i.e. one may have to be developed in which case it is unclear how to make the community know about the existence of such a vocabulary.

There have been several attempts to create vocabulary catalogs, indexes, etc. but none of them has gained a general acceptance and few have remained up for very long. The latest notable attempt is LOV, created and maintained by Bernard Vatant (Mondeca) and Pierre-Yves Vandenbussche (DERI) as part of the DataLift project. Other application areas have more specific, application-dependent catalogs; e.g., the HCLS community has established such application-specific “ontology portals” (vocabulary hosting and/or directory services) as NCBO and OBO. (Note that for the purposes of this document, the terms “ontology” and “vocabulary” are synonyms.) Unfortunately, many of the cataloging projects in the past relied on a specific project or some individuals and they became, more often than not, obsolete after a while.

Initially (1999-2003) W3C stayed out of this process, waiting to see if the community would sort out this issue by itself. We hoped to see the emergence of an open market for vocabularies, including development tools, reviews, catalogs, consultants, etc. When that did not emerge, we decided to begin offering ontology hosting (on www.w3.org) and we began the Ontaria project (with DARPA funding) to provide an ontology directory service. Implementation of these services was not completed, however, and project funding ended in 2005. After that, W3C took no active role until the emergence of schema.org and the eventual creation of the Web Schemas Task Force of the Semantic Web Interest Group. WSTF was created both to provide an open process for schema.org and as a general forum for people interested in developing vocabularies. At this point, we are contemplating taking a more active role supporting the vocabulary ecosystem. (emphasis added)

The W3C proposal fails to address two issues with vocabularies:

1. Vocabularies are not the origin of the meanings of terms they contain.

Awful, according to yet another master of the king’s English quoted by Fries, could only mean awe-inspiring.

But it was not so. “The real meaning of any word,” argued Fries, “must be finally determined, not by its original meaning, its source or etymology, but by the content given the word in actual practical usage…. Even a hardy purist would scarcely dare pronounce a painter’s masterpiece awful, without explanations.” (The Story of Ain’t by David Skinner, HarperCollins 2012, page 47)

Vocabularies represent some community of semantic practice but that brings us to the second problem the W3C proposal ignores.

2. The meanings of terms in a vocabulary are not stable, universal, or self-evident.

The problem with most vocabularies is that they have no way to signal the context, community or other information that would help distinguish one vocabulary meaning from another.

A human reader may intuit context and other clues from a vocabulary and use those factors when comparing the vocabulary to a text.

Computers, on the other hand, know no more than they have been told.

Vocabularies need to move beyond being simple tokens and represent terms with structures that capture some of the information a human reader knows intuitively about those terms.

Otherwise vocabularies will remain mute records of some socially defined meaning, but we won’t know which ones.
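As a rough illustration of a term that is more than a token, here is one possible structure that carries the community and context along with the label. The field names and the sample entries for “awful” are assumptions made for this sketch, not part of the W3C draft or of any existing vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TermSense:
    """One meaning of a label, with the information a human reader would intuit."""
    label: str
    meaning: str
    community: str  # who uses the label this way (illustrative values below)
    context: str    # where or when that usage applies (illustrative values below)


def senses_for(label, senses, community=None):
    """Return the recorded senses of a label, optionally limited to one community."""
    matches = [s for s in senses if s.label == label]
    if community is not None:
        matches = [s for s in matches if s.community == community]
    return matches


# Illustrative entries only, loosely based on the "awful" example quoted above.
SENSES = [
    TermSense("awful", "awe-inspiring", "etymological purists", "original meaning"),
    TermSense("awful", "very bad", "everyday speakers", "actual practical usage"),
]

all_senses = senses_for("awful", SENSES)
everyday = senses_for("awful", SENSES, community="everyday speakers")
print(len(all_senses), everyday[0].meaning)
```

With the bare token “awful” a program cannot tell the two senses apart. With the added fields it at least knows which community it has been told about.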

Threat Assessment Glossary

Wednesday, April 24th, 2013

Threat Assessment Glossary by Denise Bulling and Mario Scalora.

If you are working in the public/national security area, you may need some vocabulary help.

I would check the definitions against other sources.

Here’s why:

Hunters (AKA Biters) Hunters are individuals who intend to follow a path toward violence and behave in ways to further that goal

I’m sure the NRA will like that one.

Identification Thoughts of the necessity and utility of violence by a subject that are made evident through behaviors such as researching previous attackers and collecting, practicing, and fantasizing about weapons

That looks like a typo but I can’t tell where it should go.

Terrorism Act of violence or threats of violence used to further the agenda of the perpetrator while causing fear and psychological distress

I would have included physical harm but I’m no expert on terrorism.

Construction of Controlled Vocabularies

Tuesday, April 2nd, 2013

Construction of Controlled Vocabularies: A Primer by Marcia Lei Zeng.

From the “why” page:

Vocabulary control is used to improve the effectiveness of information storage and retrieval systems, Web navigation systems, and other environments that seek to both identify and locate desired content via some sort of description using language. The primary purpose of vocabulary control is to achieve consistency in the description of content objects and to facilitate retrieval.

1.1 Need for Vocabulary Control (1.1)

The need for vocabulary control arises from two basic features of natural language, namely:

• Two or more words or terms can be used to represent a single concept

Example:
  salinity/saltiness
  VHF/Very High Frequency

• Two or more words that have the same spelling can represent different concepts

Example:
  Mercury (planet)
  Mercury (metal)
  Mercury (automobile)
  Mercury (mythical being)

Great examples for vocabulary control but for topic maps as well!

The topic map question is:

What do you know about the subject(s) in either case, that would make you say the words mean the same subject or different subjects?

If we can capture the information you think makes them represent the same or different subjects, there is a basis for repeating that comparison.

Perhaps even automatically.
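A small sketch of what repeating that comparison automatically might look like, assuming each name is recorded together with the identifying information a person would use to decide. The identifier strings below are invented for illustration, and sharing at least one identifier is only one possible rule for saying two names represent the same subject.

```python
def same_subject(a, b):
    """Treat two entries as the same subject if they share any identifier."""
    return bool(a["identifiers"] & b["identifiers"])


# Hypothetical entries: the identifier strings are made up for this sketch.
salinity = {"name": "salinity", "identifiers": {"concept:salt-content"}}
saltiness = {"name": "saltiness", "identifiers": {"concept:salt-content"}}
mercury_planet = {"name": "Mercury", "identifiers": {"planet:mercury"}}
mercury_metal = {"name": "Mercury", "identifiers": {"element:Hg"}}

# Different spellings, same subject.
print(same_subject(salinity, saltiness))
# Same spelling, different subjects.
print(same_subject(mercury_planet, mercury_metal))
```

The names play no part in the decision. Only the captured information does, which is why the comparison can be repeated by someone (or something) other than the person who first made it.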

Mary Jane pointed out this resource in a recent comment.

Data Catalog Vocabulary (DCAT) [Last Call ends 08 April 2013]

Tuesday, March 12th, 2013

Data Catalog Vocabulary (DCAT)

Abstract:

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.

By using DCAT to describe datasets in data catalogs, publishers increase discoverability and enable applications easily to consume metadata from multiple catalogs. It further enables decentralized publishing of catalogs and facilitates federated dataset search across sites. Aggregated DCAT metadata can serve as a manifest file to facilitate digital preservation.

If you have comments, now would be a good time to finish them up for submission.

I first saw this in a tweet by Sandro Hawke.

Core Public Service Vocabulary released for public review [Deadline 27 February 2013]

Thursday, February 14th, 2013

Core Public Service Vocabulary released for public review

From the post:

The Core Public Service Vocabulary has entered in public review period. Anyone interested is invited to provide feedback until 27 February 2013 (inclusive).

In December 2012, the ISA Programme launched the Core Public Service Vocabulary (CPSV) initiative as part of Action 1.1 on improving semantic interoperability in European e-Government systems. The CPSV is a simplified, reusable and extensible data model that captures the fundamental characteristics of a service offered by public administrations.

The CPSV is designed to make it easy to exchange basic information about the functions carried out by the public sector and the services in which those functions are carried out. By using the vocabulary, organisations publishing data about their services will for example enable:

  • easier discovery of those services within and across countries;
  • easier discovery of the legislation and policies that underpin service provision;
  • easier comparison of similar services provided by different organisations.

Download the draft specification and comment by 27 February 2013.

From text at the draft download site, it appears the Public Review Period was to run from 8 February to 27 February 2013.

Take a look and see if you think that is enough time, or if you have other comments.

AGROVOC 2013 edition released

Monday, February 11th, 2013

AGROVOC 2013 edition released

From the post:

The AGROVOC Team is pleased to announce the release of the AGROVOC 2013 edition.

The updated version contains 32,188 concepts in up to 22 languages, resulting in a total of 626,211 terms (in 2012: 32,061 concepts, 625,096 terms).

Please explore AGROVOC by searching terms, or browsing hierarchies.

AGROVOC 2013 is available for download, and accessible via web services.

From the “about” page:

The AGROVOC thesaurus contains 32,188 concepts in up to 22 languages covering topics related to food, nutrition, agriculture, fisheries, forestry, environment and other related domains.

A global community of editors consisting of librarians, terminologists, information managers and software developers, maintain AGROVOC using VocBench, an open-source multilingual, web-based vocabulary editor and workflow management tool that allows simultaneous, distributed editing. AGROVOC is expressed in Simple Knowledge Organization System (SKOS) and published as Linked Data.

Need some seeds for your topic map in “…food, nutrition, agriculture, fisheries, forestry, environment and other related domains”?

EU – Law-Related Authority Files

Friday, January 11th, 2013

The EU Data Portal has a number of law-related authority files:

I first saw these at: New EU Data Portal links to several law-related authority files.

Machine Learning based Vocabulary Management Tool

Monday, January 7th, 2013

Machine Learning based Vocabulary Management Tool – Assessment for the Linked Open Data by Ahsan Morshed and Ritaban Dutta.

Abstract:

Reusing domain vocabularies in the context of developing the knowledge based Linked Open data system is the most important discipline on the web. Many editors are available for developing and managing the vocabularies or Ontologies. However, selecting the most relevant editor is very difficult since each vocabulary construction initiative requires its own budget, time, resources. In this paper a novel unsupervised machine learning based comparative assessment mechanism has been proposed for selecting the most relevant editor. Defined evaluation criterions were functionality, reusability, data storage, complexity, association, maintainability, resilience, reliability, robustness, learnability, availability, flexibility, and visibility. Principal component analysis (PCA) was applied on the feedback data set collected from a survey involving sixty users. Focus was to identify the least correlated features carrying the most independent information variance to optimize the tool selection process. An automatic evaluation method based on Bagging Decision Trees has been used to identify the most suitable editor. Three tools namely Vocbench, TopBraid EVN and Pool Party Thesaurus Manager have been evaluated. Decision tree based analysis recommended the Vocbench and the Pool Party Thesaurus Manager are the better performer than the TopBraid EVN tool with very similar recommendation scores.

With the caveat that sixty (60) users in your organization (the number surveyed in this study) might reach different results, this is a useful study of vocabulary software.

More useful for its evaluation criteria for vocabulary software than as any absolute guide to the appropriate software.

I first saw this at: New article on vocabulary management tools.

Upcoming release of EuroVoc 4.4, EU’s multilingual thesaurus [December 18, 2012]

Wednesday, December 12th, 2012

Upcoming release of EuroVoc 4.4, EU’s multilingual thesaurus

From the post:

EuroVoc 4.4 will be released on December 18, 2012. During this day, the website might be temporary unavailable.

6.883 thesaurus concepts

This new edition is the result of a thorough revision among other things according to the concepts introduced by the ‘Lisbon Treaty’. It includes 6.883 thesaurus concepts of which 85 concepts are new, 142 have been updated and 28 have been classified as obsolete concepts.

These new concepts are the results of the proposals sent by the librarians from the libraries of the national parliaments in Europe, the European Institutions namely the European Parliament and the users of EuroVoc. All the terms in Portuguese have been revised according to the Portuguese language spelling reform. The prior lexical value remains available as Non-Preferred Terms.

EuroVoc, the EU’s multilingual thesaurus

EuroVoc is a multilingual, multidisciplinary thesaurus covering the activities of the EU, the European Parliament in particular. It contains terms in 22 EU languages. It is managed by the Publications Office, which moved forward to ontology-based thesaurus management and semantic web technologies conformant to W3C recommendations as well as latest trends in thesaurus standards.

There are documents prior to this version of the thesaurus and even documents prior to there being a EuroVoc thesaurus at all.

And there will be documents after EuroVoc has been superseded.

Not to mention that, in between, there will be documents that use other vocabularies.

Good thing we have topic maps to use this resource to its best advantage.

A way station in a sea of semantic currents and drifts.

Linked Science Core Vocabulary Specification

Monday, November 12th, 2012

Linked Science Core Vocabulary Specification (revision 0.91)

Abstract:

LSC, the Linked Science Core Vocabulary, is a lightweight vocabulary providing terms to enable publishers and researchers to relate things in science to time, space, and themes. More precisely, LSC is designed for describing scientific resources including elements of research, their context, and for interconnecting them. We introduce LSC as an example of building blocks for Linked Science to communicate the linkage between scientific resources in a machine-understandable way. The “core” in the name refers to the fact that LSC only defines the basic terms for science. We argue that the success of Linked Science—or Linked Data in general—lies in interconnected, yet distributed vocabularies that minimize ontological commitments. More specific terms needed by different scientific communities can therefore be introduced as extensions of LSC. LSC is hosted at LinkedScience.org; please check also other available vocabularies at LinkedScience.org/vocabularies.

A Linked Data vocabulary that you may encounter.

I first saw this in a tweet by Ivan Herman.

VEST Registry. Vocabularies

Wednesday, November 7th, 2012

VEST Registry. Vocabularies

From the webpage:

All the vocabularies in the VEST Registry are classified by type and subject domain. Most of the Vocabularies are related to indexing. The purpose of indexing is to facilitate the search and finding of the content in a collection by the use of controlled/code lists, authority files or controlled subject vocabularies. The indexing ensures that the content will be found by users when they search specifically in information management systems. There are different type of vocabularies like authority files, classification systems, concept maps, controlled lists, ontologies, taxonomies or subject headings. But under the concept Vocabularies you can find as well dictionaries, encyclopedies, glossaries, lexical databases or topic trees.

If I am reading the webpage correctly, 116 separate vocabularies.

Browse through them to get an idea of the range of materials here.

Just on the homepage I see:

African Studies Thesaurus

A structured vocabulary of 12,100 English terms in the field of African studies, the African Studies Thesaurus is developed and maintained by staff at the library of the African Studies Centre Leiden. It is used for indexing and retrieving material in the library collection and is directly linked to the catalogue.

ARABTERM United Nations Multilingual Terminology Database of the Arabic Translation Service

ARABTERM is a multilingual terminology database which provides United Nations nomenclature and special terms in four of the official UN languages – Arabic, English, French and Spanish. The database is mainly intended for use by the language and editorial staff of the United Nations to ensure consistent translation of common terms and phrases used within the Organization.

Biological Entities

This ontology manages reference data about biological species needed for fisheries fact sheets and statistical information, among other resources. Species items are organized and maintained in the Aquatic Science and Fisheries Information System (ASFIS) and currently includes nearly 11.000 species items related to Fisheries and Aquaculture.

CABI thesaurus

The CAB Thesaurus is the essential search tool for all users of the CAB ABSTRACTS™ and Global Health databases and related products. The CAB Thesaurus is not only an invaluable aid for database users but it has many potential uses by individuals and organizations indexing their own information resources for both internal use and on the Internet.

No slight intended towards any vocabulary I didn’t mention. Just a random listing from the homepage.

Why rebuild when you can re-use? (And map.)

FEMA Acronyms, Abbreviations and Terms

Wednesday, October 17th, 2012

FEMA Acronyms, Abbreviations and Terms (PDF)

From the webpage:

The FAAT List is a handy reference for the myriad of acronyms and abbreviations used within the federal government, emergency management and first response communities. This year’s new edition, which continues to reflect the evolving U.S. Department of Homeland Security, contains an approximately 50 percent increase in the number of entries and definitions bringing the total to over 6,200 acronyms, abbreviations and terms. Some items listed are obsolete, but they are included because they may still appear in publications and correspondence. Obsolete items can be found at the end of this document.

This may be handy for reading FEMA or related government documents.

Hasn’t been updated since 2009.

If you know of a more recent resource, please give a shout.

Thesauri (Vocabularies – TemaTres)

Saturday, August 18th, 2012

Thesauri (Vocabularies – TemaTres)

The TemaTres vocabulary server is important but even more so is its collection of one hundred and fifty vocabularies.

Send a note if you export your vocabulary to a topic map. Interested in examples of mappings between vocabularies.