You’ve probably never given much thought to the fact that picking up your cup of morning coffee presents your brain with a set of complex decisions. You need to decide how to aim your hand, grasp the handle and raise the cup to your mouth, all without spilling the contents on your lap.
A new Northwestern University study shows that, not only does your brain handle such complex decisions for you, it also hides information from you about how those decisions are made.
“Our study gives a salient example,” said Yangqing ‘Lucie’ Xu, lead author of the study and a doctoral candidate in psychology at Northwestern. “When you pick up an object, your brain automatically decides how to control your muscles based on what your eyes provide about the object’s shape. When you pick up a mug by the handle with your right hand, you need to add a clockwise twist to your grip to compensate for the extra weight that you see on the left side of the mug.
“We showed that the use of this visual information is so powerful and automatic that we cannot turn it off. When people see an object weighted in one direction, they actually can’t help but ‘feel’ the weight in that direction, even when they know that we’re tricking them,” Xu said. (emphasis added)
I never quite trusted my brain and now I have proof that it is untrustworthy. Hiding stuff indeed!
But that’s the trick of subject identification/identity isn’t it?
That our brains “recognize” all manner of subjects without any effort on our part.
Another part of the effortless features of our brains. But it hides the information we need to integrate information stores from ourselves and others.
Or rather, making it more work than we are usually willing to devote to digging it out.
When called upon to be “explicit” about subject identification, or even worse, to imagine how other people identify subjects, we prefer to stay at home consuming passive entertainment.
Two quick points:
First, need to think about how to incorporate this “feature” into delivery interfaces for users.
Second, what subjects would users pay others to mine/collate/identify for them? (Delivery being a separate issue.)
It is a long-term goal of arXiv to accurately identify and disambiguate all authors of all articles in arXiv. Such identification would provide accurate results for queries such as "show me all the other papers by the particular John Smith that wrote this paper", something that can be done only approximately with text-based searches. It would also permit construction of an author-article graph which is useful for relevance assessment and bibliometric analysis.
Since 2005 arXiv has used authority records that associate user accounts with articles authored by that user. These records support the endorsement system. The use of public author identifiers as a way to build services upon this data is new in 2009. Initially, users must opt-in to have a public author identifier and to expose the record of their articles on arXiv for use in other services. At some later date we hope to be able to improve our authority records to the point where we can create public author identifiers for all authors of arXiv articles without needing to enlist the help of each author to check their record before opting in.
The services we offer based on author identifiers are:
a way to dynamically include the list of your publications in your own home page using the JavaScript myarticles widget
an arXiv Facebook application providing a convenient way to alert friends to your arXiv articles and to comment on articles within Facebook
Significant enough in its own right but note the plans for the future:
The following enhancements and interoperability features are planned:
arXiv will permit authors to record other identifiers they have in other schemes and include these in the data feeds. This will allow agents and systems to link together the same author in different databases.
arXiv will support mechanisms for handling name changes, combination of accidentally created duplicates and separation of accidentally combined identifiers.
Recording other identifiers? What? Acknowledge that there can be more than one identifier (yours) per subject? Blasphemy!
Go arXiv.org!
I am sure suggestions, support, contributions, etc., would be most welcome.
I remember William Kent, the original author of “Data and Reality” from a presentation he made in 2003, entitled: “The unsolvable identity problem.”
His abstract there read:
The identity problem is intractable. To shed light on the problem, which currently is a swirl of interlocking problems that tend to get tumbled together in any discussion, we separate out the various issues so they can be rationally addressed one at a time as much as possible. We explore various aspects of the problem, pick one aspect to focus on, pose an idealized theoretical solution, and then explore the factors rendering this solution impractical. The success of this endeavor depends on our agreement that the selected aspect is a good one to focus on, and that the idealized solution represents a desirable target to try to approximate as well as we can. If we achieve consensus here, then we at least have a unifying framework for coordinating the various partial solutions to fragments of the problem.
I haven’t read the “new” version of “Data and Reality” (just ordered a copy) but I don’t recall the original needing much in the way of changes.
The original carried much the same message, that all of our solutions are partial even within a domain, temporary, chronologically speaking, and at best “useful” for some particular purpose. I rather doubt you will find that degree of uncertainty being confessed by the purveyors of any current semantic solution.
I did pull my second edition off the shelf and with free shipping (5-8 days), I should have time to go over my notes and highlights before the “new” version appears.
Regular readers of Open Thoughts on Analytics won’t be surprised I found a book with a title like this noteworthy. I’ve written quite a bit over the years on challenges we face trying to be the rational, objective, non-biased actors and decision-makers we think we are.
So why is a book outlining the weaknesses of day-to-day, common sense thinking important for business intelligence and data science? Because both BI and DS are driven from a science of business framework that formulates and tests hypotheses on the causes and effects of business operations. If the thinking that produces that testable understanding is flawed, then so will be the resulting BI and DS.
According to Watts, common sense is “exquisitely adapted to handling the kind of complexity that arises in everyday situations … But ‘situations’ involving corporations, cultures, markets, nation-states, and global institutions exhibit a very different kind of complexity from everyday situations. And under these circumstances, common sense turns out to suffer from a number of errors that systematically mislead us. Yet because of the way we learn from experience … the failings of commonsense reasoning are rarely apparent to us … The paradox of common sense, therefore, is that even as it helps us make sense of the world, it can actively undermine our ability to understand it.”
The author argues that common sense explanations to complex behavior fail in three ways. The first error is that the mental model of individual behavior is systematically flawed. The second centers on explanations for collective behavior that are even worse, often missing the “emergence” – one plus one equals three – of social behavior. And finally, “we learn less from history than we think we do, and that misperception skews our perception of the future.”
I work in local search at Microsoft which means, like all those working in this space, I have to deal with an identity crisis on a daily basis. Currently, most local search products – like Bing’s and Google’s – leverage multiple data sets to derive a digital model of the world that users can then interact with. In creating this digital model, multiple statements have to be conflated to form a unified representation. This can be extremely challenging for two reasons. Firstly, the system has to decide when two records are intended to denote the same real world entity. Secondly, the designers of the system have to determine what real world entities are and how to describe them.
For example, if a business moves, is that the same business or the closure of one and the opening of another? What does it mean to categorize a business? The cafe in Barnes and Noble is branded Starbucks but isn’t actually part of the Starbucks chain – should it surface as a separate entity or is it ‘hidden’ within the bookshop as an attribute (‘has cafe’)?
Thinking through these hard representational problems is as much part of the transformative trends going on in the tech industry as are those characterized by terms like ‘big data’ and ‘data scientist’.
Questions of identity and how to resolve different multiple references to the same entity have been debated at least since the time of Greek philosophers. Identity (Wikipedia page, see references on the various pages.)
This “philosophical challenge” has been going on for a very long time and so far I haven’t seen any demonstrations that the Web raises new questions.
You need to read Matthew’s identity example in his post.
The songs in question could be said to be instances of the same subject and a reference to that subject would be satisfied with any of those instances. From another point of view, the origin of the instances could be said to distinguish them into different subjects, say for proof of licensing purposes. Other view points are possible. Depends upon the purpose of your criteria of identification.
It is a very rich source of reference materials that you may find useful in developing subject heading proposals or subject classifications for other uses (such as topic maps).
But don’t neglect the materials you find on the SACO homepage.
I got a copy of “Thinking, Fast and Slow” for Christmas and it has already proven to be an enjoyable read.
Kahneman says early on (page 28):
The premise of this book is that it is easier to recognize other people’s mistakes than our own.
I thought about that line when I read a note from a friend that topic maps needed more than my:
tagging everything with “Topic Maps….”
Which means I haven’t been clear about the reasons for the breadth of materials I have covered and will be covering in this blog.
One premise of this blog is that the use and recognition of identifiers is essential for communication.
Another premise of this blog is that it is easier for us to study the use and recognition of identifiers by others, much for the same reasons we can recognize the mistakes of others more easily.
The use and recognition of identifiers by others aren’t mistakes but they may be different from those we would make. In cases where they differ from ours, we have a unique opportunity to study the choices made and the impacts of those choices. And we may learn patterns in those choices that we can eventually see in our own choices.
Understanding the use and recognition of identifiers in a particular circumstance, and the requirements for that use and recognition, is the first step towards deciding whether topic maps would be useful in that circumstance and in what way.
For example, when processing social security records in the United States, anything other than “bare” identifiers like a social security number may be unnecessary and add load with no corresponding benefit. Aligning social security records with bank records might require reconsidering the judgement to use only social security numbers. (Some information sharing is “against the law.” But as the Sheriff in “O Brother, Where Art Thou?” says: “The law is a man made thing.” Laws change, or you can commission absurdist interpretations of them.)
Topic maps aren’t everywhere but identifiers and recognition of identifiers are.
Understanding identifiers and their recognition will help you choose the most appropriate solution to a problem.
Apart from the content there are various features from metadata (like IP etc) which can help tell a spammer and regular user apart. Following are results of some data analysis (done on roughly 8000+ comments) which speak of another feature which proves to be a good discriminator. Hopefully this will aid others fighting spam/abuse (if not already using a similar feature).
(graph omitted)
The discriminator referred above is typing speed. The graph above plots the content length of a comment posted by a user against the (approximate) time he took to write it. If a user posts more than one comment in window of 5-10 minutes, we can consider those comments as consecutive posts. …
An illustration that subject identity tests are limited only by your imagination. From what I understand, very few spammers identify themselves using OWL and URLs. So, as in this case, you need other tests to separate them.
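A minimal sketch of such a test, in Python, assuming each comment arrives with its length and an approximate writing time. The `Comment` class, its field names and the 15 characters per second threshold are all hypothetical, chosen for illustration and not taken from the analysis quoted above.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    author: str
    length: int               # characters in the comment body
    seconds_to_write: float   # approximate time the author took to write it


# Hypothetical cut-off: an assumption for this sketch, not a measured value.
MAX_HUMAN_CHARS_PER_SECOND = 15.0


def typing_speed(comment):
    """Return characters per second; infinite if no time was measured."""
    if comment.seconds_to_write <= 0:
        return float("inf")
    return comment.length / comment.seconds_to_write


def looks_automated(comment, threshold=MAX_HUMAN_CHARS_PER_SECOND):
    """Flag a comment whose implied typing speed exceeds the threshold."""
    return typing_speed(comment) > threshold


if __name__ == "__main__":
    slow = Comment("reader", length=400, seconds_to_write=120)
    fast = Comment("suspect", length=2000, seconds_to_write=4)
    print(looks_automated(slow), looks_automated(fast))
```

The point is only that the "identity" of a spammer here rests on a measured property, not on any identifier the spammer supplies. A real deployment would need a threshold derived from data.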
A follow-up on this would be to see if particular spammers have speed patterns in their posts or searching more broadly, say across a set of blogs, a particular pattern. That is they start with blog X and then move down the line. Could be useful for dynamically configuring firewalls to block further content after they hit the first blog.
You have heard that passwords + keying patterns are used for personal identity?
ORCID, Inc. is a non-profit organization dedicated to solving the name ambiguity problem in scholarly research and brings together the leaders of the most influential universities, funding organizations, societies, publishers and corporations from around the globe. The ideal solution is to establish a registry that is adopted and embraced as the de facto standard by the whole of the community. A resolution to the systemic name ambiguity problem, by means of assigning unique identifiers linkable to an individual’s research output, will enhance the scientific discovery process and improve the efficiency of funding and collaboration. The organization is managed by a fourteen member Board of Directors.
ORCID’s principles will guide the initiative as it grows and operates. The principles confirm our commitment to open access, global communication, and researcher privacy.
Accurate identification of researchers and their work is one of the pillars for the transition from science to e-Science, wherein scholarly publications can be mined to spot links and ideas hidden in the ever-growing volume of scholarly literature. A disambiguated set of authors will allow new services and benefits to be built for the research community by all stakeholders in scholarly communication: from commercial actors to non-profit organizations, from governments to universities.
Thomson Reuters and Nature Publishing Group convened the first Name Identifier Summit in Cambridge, MA in November 2009, where a cross-section of the research community explored approaches to address name ambiguity. The ORCID initiative officially launched as a non-profit organization in August 2010 and is moving ahead with broad stakeholder participation (view participant gallery). As ORCID develops, we plan to engage researchers and other community members directly via social media and other activity. Participation from all stakeholders at all levels is essential to fulfilling the Initiative’s mission.
I am not altogether certain that elimination of ambiguity in identification will enable “…min[ing] to spot links and ideas hidden in the ever-growing volume of scientific literature.” Or should I say there is no demonstrated connection between unambiguous identification of researchers and such gains?
True enough, the claim is made but I thought science was based on evidence, not simply making claims.
And, like most researchers, I have discovered unexpected riches when mistaking one researcher’s name for another’s. Reducing ambiguity in identification will reduce the incidence of, well, ambiguity in identification.
I was picking tomatoes in the garden when I thought about telling Carol (my wife) the plants are about to stop producing.
Those plants are at a particular address, in the backyard, middle garden bed of three, are of three different varieties, but I am going to sum up those properties by saying: “The tomatoes are about to stop producing.”
It occurred to me that a subjectIdentifier could be assigned to a topic element on the basis of summing up properties of the topic.* That would have the advantage of enabling merging on the basis of subjectIdentifiers as opposed to more complex tests upon properties of a topic.
Disclosure of the basis for assignment of a subjectIdentifier is an interesting question.
It could be that a service wishes to produce subjectIdentifiers and index information based upon complex property measures, producing for consumption, the subjectIdentifiers and merge-capable indexes on one or more information sets. The basis for merging being the competitive edge offered by the service.
If promoting merging with a vendor’s process or format, which is seeking to become the TCP/IP of some area, the basis for merging and tools to assist with it will be supplied.
Or if you are an intelligence agency and you want an inward and outward facing interface that promotes merging of information but does not disclose your internal basis for identification, variants of this technique may be of interest.
*The notion of summing up imposes no prior constraints on the tests used or the location of the information subjected to those tests.
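One way to picture "summing up" is to derive an identifier from a chosen subset of properties, then merge on that identifier alone. The sketch below is an assumption-laden illustration: the function names, the `http://example.org/si/` base address and the choice of a truncated SHA-256 digest are all hypothetical, and nothing in the topic maps standards prescribes them.

```python
import hashlib
import json


def summary_identifier(properties, keys, base="http://example.org/si/"):
    """Derive a subject identifier from the selected properties.

    Raises KeyError if a selected property is missing, since a topic
    lacking it cannot be summed up on this basis.
    """
    selected = {k: properties[k] for k in sorted(keys)}
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
    return base + digest


def merge_by_identifier(topics, keys):
    """Merge property sets that sum up to the same subject identifier."""
    merged = {}
    for props in topics:
        sid = summary_identifier(props, keys)
        merged.setdefault(sid, {}).update(props)
    return merged


if __name__ == "__main__":
    keys = ["address", "bed", "crop"]
    a = {"address": "123 Main", "bed": "middle", "crop": "tomato", "variety": "Roma"}
    b = {"address": "123 Main", "bed": "middle", "crop": "tomato", "status": "about to stop"}
    for sid, props in merge_by_identifier([a, b], keys).items():
        print(sid, props)
```

A consumer of these identifiers can merge without knowing which properties were summed up, which is the non-disclosure case described above.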
The Office of the National Coordinator for Health IT has released an advance on a proposed rule about the use of existing metadata standards to support electronic health information exchange and to get feedback on the experience from various organizations that may have applied them.
ONC is considering inclusion of certain metadata standards in stage 2 of meaningful use.
The use of metadata, or elements that describe data, is considered key to fueling more complex health information exchange.
If you are maintaining or building topic maps to integrate medical data you will be interested in this proposal.
John Sowa’s posting on Watson merits reproduction in its entirety (light editing to format it for easy reading):
Peter,
Thanks for the reminder:
Dave Ferrucci gave a talk on UIMA (the Unstructured Information Management Architecture) back in May-2006, entitled: “Putting the Semantics in the Semantic Web: An overview of UIMA and its role in Accelerating the Semantic Revolution”
I recommend that readers compare Ferrucci’s talk about UIMA in 2006 with his talk about the Watson system and Jeopardy in 2011. In less than 5 years, they built Watson on the UIMA foundation, which contained a reasonable amount of NLP tools, a modest ontology, and some useful tools for knowledge acquisition. During that time, they added quite a bit of machine learning, reasoning, statistics, and heuristics. But most of all, they added terabytes of documents.
For the record, following are Ferrucci’s slides from 2006:
Compared to Ferrucci’s talks, the PBS Nova program was a disappointment. It didn’t get into any technical detail, but it did have a few cameo appearances from AI researchers. Terry Winograd and Pat Winston, for example, said that the problem of language understanding is hard.
But I thought that Marvin Minsky and Doug Lenat said more with their tone of voice than with their words. My interpretation (which could, of course, be wrong) is that both of them were seething with jealousy that IBM built a system that was competing with Jeopardy champions on national TV — and without their help.
In any case, the Watson project shows that terabytes of documents are far more important for commonsense reasoning than the millions of formal axioms in Cyc. That does not mean that the Cyc ontology is useless, but it undermines the original assumptions for the Cyc project: commonsense reasoning requires a huge knowledge base of hand-coded axioms together with a powerful inference engine.
An important observation by Ferrucci: The URIs of the Semantic Web are *not* useful for processing natural languages — not for ordinary documents, not for scientific documents, and especially not for Jeopardy questions:
1. For scientific documents, words like ‘H2O’ are excellent URIs. Adding an http address in front of them is pointless.
2. A word like ‘water’, which is sometimes a synonym for ‘H2O’, has an open-ended number of senses and microsenses.
3. Even if every microsense could be precisely defined and cataloged on the WWW, that wouldn’t help determine which one is appropriate for any particular context.
4. Any attempt to force human being(s) to specify or select a precise sense cannot succeed unless *every* human understands and consistently selects the correct sense at *every* possible occasion.
5. Given that point #4 is impossible to enforce and dangerous to assume, any software that uses URIs will have to verify that the selected sense is appropriate to the context.
6. Therefore, URIs found “in the wild” on the WWW can never be assumed to be correct unless they have been guaranteed to be correct by a trusted source.
These points taken together imply that annotations on documents can’t be trusted unless (a) they have been generated by your own system or (b) they were generated by a system which is at least as trustworthy as your own and which has been verified to be 100% compatible with yours.
In summary, the underlying assumptions for both Cyc and the Semantic Web need to be reconsidered.
I don’t always agree with Sowa but he has written extensively on conceptual graphs, knowledge representation and ontological matters. See http://www.jfsowa.com/
You will find a link to an interview with Minsky at that same location.
I don’t know that I would describe Minsky as “…seething with jealousy….”
While I enjoy Jeopardy and it is certainly more cerebral than say American Idol, I think Minsky is right in seeing the Watson effort as something other than artificial intelligence.
Q: In 2011, who was the only non-sentient contestant on the TV show Jeopardy?
I have been watching the identifier debate in the LOD community with its revisionists, personal accounts and other takes on what the problem is, if there is a problem and how to solve the problem if there is one.
I have a slightly different question: What happens when we have a name/identifier?
Short of being present when someone points to or touches an object, themselves, you (if the TSA) and says a name or identifier, what happens?
Try this experiment. Take a sheet of paper and write: George W. Bush.
Now write 10 facts about George W. Bush.
Please circle which ones that you think must match to identify George W. Bush.
So, even though you knew the name George W. Bush, isn’t it fair to say that the circled facts are what you would use to identify George W. Bush?
Here’s the fun part: Get a colleague or co-worker to do the same experiment. (Substitute Lady Gaga if your friends don’t know enough facts about George W. Bush.)
Now compare several sets of answers for the same person.
Working from the same name, you most likely listed different facts and different ones you would use to identify that subject.
Even though most of you would agree that some or all of the facts listed go with that person.
It sounds like even though we use identifiers/names, those just clue us in on facts, some of which we use to make the identification.
That’s the problem isn’t it?
A name or identifier can make us think of different facts (possibly identifying different subjects) and even if the same subject, we may use different facts to identify the subject.
Assuming we are at a set of facts (RDF graph, whatever) we need to know: What facts identify the subject?
And a subject may have different identifying properties, depending on the context of identification.
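A rough sketch of that last point, assuming facts are held as simple key/value pairs and each context names the facts that must match. The context names and the `IDENTIFYING_FACTS` table are hypothetical, and this is one possible reading of the experiment above, not a general solution.

```python
# Hypothetical table: which facts must match in which context.
IDENTIFYING_FACTS = {
    "civics": ("office",),
    "genealogy": ("born", "father"),
}


def same_subject(a, b, context):
    """True if both fact sets carry, and agree on, every identifying fact."""
    keys = IDENTIFYING_FACTS[context]
    return all(k in a and k in b and a[k] == b[k] for k in keys)


if __name__ == "__main__":
    mine = {"name": "George W. Bush", "office": "43rd U.S. President",
            "born": "1946-07-06"}
    yours = {"name": "George Bush", "office": "43rd U.S. President",
             "father": "George H. W. Bush"}
    print(same_subject(mine, yours, "civics"))
    print(same_subject(mine, yours, "genealogy"))
```

The two lists agree in one context and cannot be matched in the other, even though the names differ in both and neither list contains a disputed fact.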
Questions:
1. How to specify essential facts for identification as opposed to the extra ones?
2. How to answer #1 for an RDF graph?
3. How do you make others aware of your answer in #2?
Thinking that since everyone is recycling old stuff under the patterns rubric that topic maps may as well jump on the bandwagon.
Instead of the three amigos (was that a movie?) we could have the dirty dozen honchos (or was that another movie?). I don’t get out much these days so I would probably need some help with current cultural references.
This ties into Lars Heuer’s effort to distinguish between Playboy Playmates and Astronauts, while trying to figure out why birds keep, well, let’s just say he has to wash his hair a lot.
When you have an entry from DBpedia, what do you have to know to identify it? Its URI is one thing but I rarely encounter URIs while shopping. (Or playmates for that matter.)
So in 1 out of 2 emergency room situations we get the right term? (Not to knock Kelb’s work. It is an important approach that needs further development.)
Topic maps are guessing as well.
We don’t know what information in a subject identifier identifies a subject. Some of it? All of it? Under what circumstances?
Question: What information identifies a subject, at least to its author?
Answer: Ask the Author.
Asking authors what information identifies their subject(s) seems like an overlooked approach.
Domain specific vocabularies with additional information about subjects that indicates the information that identifies a subject versus merely supplemental information would be a good start.
That avoids inline syntax difficulties and enables authors to easily and quickly associate subject identification information with their documents.
Both RDF and TMDM Topic Maps could use the same vocabularies to improve their handling of associated document content.
Topic maps took a different approach to the problem of identifying subjects (than RDF) and so look at semantic drift differently.
In the original 13250, subject descriptor was defined as:
3.19 subject descriptor – Information which is intended to provide a positive, unambiguous indication of the identity of a subject, and which is the referent of an identity attribute of a topic link.
When 13250 was reformulated to focus on the XTM syntax and the legend known as the Topic Maps Data Model (TMDM), the subject descriptor of old became subject identifiers. (Clause 7, TMDM)
A subject identifier has information that identifies a subject.
The author of a topic uses information that identifies a subject to create a subject identifier. (Which is represented in a topic map by an IRI.)
Anyone can look at the subject identifier to see if they are talking about the same subject.
They are responsible for catching semantic drift if it occurs.
But, there is something missing from RDF and topic maps.
Something that would help with semantic drift, although they would use it differently.
I mentioned yesterday that creating a public interchangeable identifier isn’t as easy as identifying identifiers and documenting them publicly. (See “Recognizing an Interchangeable Identifier.”)
What if I identified (by some means) “Patrick” as an identifier and posted it to my website (public documentation).
Is that now a “public interchangeable identifier?”
No. Why?
First, there has to be some agreed upon means to declare an identifier to be an identifier. When I say agreed upon, it need not be something as formal as a standard but it has to be recognized by a community of users.
Second, it is important to know in what context this is an identifier? Akin to what we talk about as “scope” in topic maps. But with the recognition that the notion of “unconstrained” scope is a pernicious fiction. Scope may be unspecified but it is never unconstrained.
I would argue that no identifier exists without some defined scope. It may not be known or specified but the essence of an identifier, that it identifies some subject, exists only within some scope.
More on means to declare identifiers and their context anon.
Q: How would you recognize an interchangeable identifier?
A: Oh, yeah, that’s right. Anything we can talk about has an identifier, so how to recognize an interchangeable identifier?
If two people agree on column headers for a database table, they have interchangeable identifiers for the columns, at least between the two of them.
There are two requirements for interchangeable identifiers:
Identification as an identifier.
Notice of the identifier.
Any token can be an identifier under some circumstances so identifiers must be identified for interchange.
Notice of an identifier is usually a matter of being part of a profession or discipline. Some term is an identifier because it was taught to you as one.
That works for local interchange, but public interchange requires publicly documented identifiers.
That’s it. Identify identifiers and document the identifiers publicly and you will have public interchangeable identifiers.
It can’t be that simple? Well, truthfully, it’s not.
More on public interchangeable identifiers forthcoming.
The SciDB project illustrates that there is no general case solution for semantic identity.
If we distinguish between IRIs as addresses versus IRIs as identifiers, IRIs are useful for some cases of semantic identity. (IRIs can be used even if you don’t make that distinction, but they are less useful.)
But can you imagine an IRI for each tuple of values in the some 15 petabytes of data annually from the Large Hadron Collider? It may be very important to identify any number of those tuples. Such as if (not when) they discover the Higgs boson.
Those tuples have semantic identity, as do subjects composed of those tuples.
Rather than seeking general solutions for all semantic identity, perhaps we should find solutions that work for particular cases.
A Logical Account of Lying
Authors: Chiaki Sakama, Martin Caminada and Andreas Herzig
Keywords: lying, lies, argumentation systems, artificial intelligence, multiagent systems, intelligent agents.
Abstract:
This paper aims at providing a formal account of lying – a dishonest attitude of human beings. We first formulate lying under propositional modal logic and present basic properties for it. We then investigate why one engages in lying and how one reasons about lying. We distinguish between offensive and defensive lies, or deductive and abductive lies, based on intention behind the act. We also study two weak forms of dishonesty, bullshit and deception, and provide their logical features in contrast to lying. We finally argue dishonesty postulates that agents should try to satisfy for both moral and self-interested reasons. (emphasis in original)
Be the first to have your topic map distinguish between:
Lost In Translation is a summary of recent research on language and its impact on our thinking by Lera Boroditsky (Professor of psychology at Stanford University and editor in chief of Frontiers in Cultural Psychology).
Read the article for the details but concepts such as causality, space and others aren’t as fixed as you may have thought.
Another teaser:
It turns out that if you change how people talk, that changes how they think. If people learn another language, they inadvertently also learn a new way of looking at the world. When bilingual people switch from one language to another, they start thinking differently, too.
Topic maps show different ways to identify the same subject. Put enough alternative identifications together and you will learn to think in another language.
Question: Should topic maps come with the following warning?
Caution: Topic Map – You May Start Thinking Differently
Topic maps and the semantic web share problems and dangers in their rush to re-name things with IRIs.
The problems include, the number of subjects, the propagation (enforcement?) of new names, the emergence of new subjects, and others.
Re-naming has a graver danger, identified by Michael Shara, curator of Astrophysics, American Museum of Natural History, when asked why the heaviest star in the universe, R136a**, doesn’t have a better name.* He responded:
…partly because it [R136a] refers back to the original catalog, and once you go back to the original catalog, you can find all the literature that refers to it, so naming it John’s star or Betty’s Bright Object, would take that away from us.
So would renaming it to an IRI.
Request of the topic maps and semantic web communities:
Please let us keep our identifiers (as identifiers) and our history.
There was a lively discussion on the topicmapmail discussion list about books and whether they have any universal identifiers. (Look in the archives for July, 2010 and messages with MARC in the subject line.)
There are known problems with ISBNs, such as publishers re-using them or assigning duplicate ISBNs to different books or simply making mistakes with the numbers themselves.
One participant reported that Amazon uses its own unique identifier for books.
The United States Library of Congress has its own internal identifier for books in its collection.
Not to mention that other library systems have their own identifiers for their collections.
At a minimum, it is possible for a book, considered as a subject, to have an ISBN, an identifier from Amazon, another identifier at the Library of Congress and still others in other systems. Perhaps even a unique identifier from a book jobber that sells books to libraries.
If you think about that for a moment, it becomes clear that a book as a subject has a *set* of identifiers, all of which identify the same subject. Moreover, each of those identifiers works best in a particular context. Dare we say the identifier has a scope?
If I had a representative (a topic) for this subject (book) that had a set of identifiers (ISBN, ASIN, LOC, etc.) and each of those identifiers had a scope, I could reliably import information from any source that used at least one of those identifiers.
The originators of those identifiers can continue to use their identifiers and yet enjoy the benefits of information that was generated or collected using other identifiers.
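To make the idea concrete, here is a minimal sketch in Python of a topic that carries a set of scoped identifiers and absorbs records from any source that shares at least one of them. Everything in it is an assumption made for illustration: the names `ScopedId`, `Topic` and `TopicStore` are hypothetical, the identifier values are made-up placeholders, and the merge rule (first value seen for a property wins) is just one simple choice, not how any particular topic map engine works.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ScopedId:
    """An identifier plus the context (scope) in which it is meaningful."""
    scope: str   # e.g. "isbn", "asin", "lccn" (illustrative scope names)
    value: str


@dataclass
class Topic:
    """A representative for one subject, e.g. one book."""
    identifiers: set = field(default_factory=set)
    properties: dict = field(default_factory=dict)


class TopicStore:
    """Hypothetical store that merges records sharing any identifier."""

    def __init__(self):
        self._by_id = {}  # ScopedId -> Topic

    def import_record(self, identifiers, properties):
        ids = set(identifiers)
        # Collect every existing topic that shares at least one identifier.
        matches = []
        for ident in ids:
            found = self._by_id.get(ident)
            if found is not None and all(found is not m for m in matches):
                matches.append(found)
        topic = matches[0] if matches else Topic()
        # Fold any other matching topics into the first one.
        for other in matches[1:]:
            topic.identifiers |= other.identifiers
            for key, value in other.properties.items():
                topic.properties.setdefault(key, value)
        topic.identifiers |= ids
        # Assumption: keep the first value seen for each property.
        for key, value in properties.items():
            topic.properties.setdefault(key, value)
        # Every identifier now leads to the same topic.
        for ident in topic.identifiers:
            self._by_id[ident] = topic
        return topic

    def lookup(self, scope, value):
        return self._by_id.get(ScopedId(scope, value))


# Placeholder values, not real identifiers.
store = TopicStore()
store.import_record(
    [ScopedId("isbn", "0000000000"), ScopedId("asin", "B000000000")],
    {"title": "Example Book"},
)
store.import_record(
    [ScopedId("asin", "B000000000"), ScopedId("lccn", "00000000")],
    {"shelf": "stacks"},
)
book = store.lookup("lccn", "00000000")
```

The second record never mentions the ISBN, yet looking the book up by its LCCN returns the title that arrived with the ISBN record. Each source keeps using its own identifier.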
The internal system maintains its use of the Library of Congress Control Number (the LCCN in the title), which is a unique identifier for that record and allows the outside world access to the same information using a URI.
Question: When I have a work that is identified by an LCCN Permalink and also has an identifier in CiteSeerX, DBLP, WorldCat or in a European library, which one should I use?
Question: The FAQ says this link identifies the bibliographic record. Not the same thing as the book it identifies. How should I tell others that I am using the URI to identify a particular book? (Which is not the same thing as the record for that book.)
Citation indexes offer a concrete example of why blindly following the linked data mantra of creating “URIs as names for things” (Linked Data) is a bad idea.
Science Citation Index Expanded™ by Thomson Reuters offers coverage using citations to identify articles back to 1900. That works because the articles use citations as identifiers to reference previous articles.
There are articles available in digital form, from arXiv.org, CiteSeerX or some other digital repository. That means that they have an identifier in addition to the more traditional citation reference/identifier.
Where multiple identifiers identify the same subject, we need equivalence operators.
Where identifiers already identify subjects, we need operators that re-use those identifiers.
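One way to picture an equivalence operator is a small union-find structure that records which existing identifiers name the same subject, without minting any new ones. The sketch below is only an illustration under stated assumptions: the class name `IdentifierEquivalence` and its methods are hypothetical, and the identifier strings are made-up placeholders rather than real citations or repository keys.

```python
class IdentifierEquivalence:
    """Union-find over existing identifiers; no new names are created."""

    def __init__(self):
        self._parent = {}

    def _find(self, ident):
        # Unknown identifiers start out as their own representative.
        self._parent.setdefault(ident, ident)
        root = ident
        while self._parent[root] != root:
            root = self._parent[root]
        # Path compression: point everything on the path at the root.
        while self._parent[ident] != root:
            next_ident = self._parent[ident]
            self._parent[ident] = root
            ident = next_ident
        return root

    def declare_same(self, a, b):
        """State that identifiers a and b identify the same subject."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_subject(self, a, b):
        return self._find(a) == self._find(b)

    def aliases(self, ident):
        """All known identifiers for the subject that ident identifies."""
        root = self._find(ident)
        return {i for i in list(self._parent) if self._find(i) == root}


# Placeholder identifiers for one hypothetical article.
eq = IdentifierEquivalence()
eq.declare_same("citation:Author (1990) Journal 1:1", "arxiv:0000.00000")
eq.declare_same("arxiv:0000.00000", "citeseerx:10.1.1.0.0")
known = eq.aliases("citeseerx:10.1.1.0.0")
```

The traditional citation stays in use as an identifier. The repository identifiers are simply declared equivalent to it, so a lookup by any one of them reaches the others.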
Ask yourself, “What good is a new set of identifiers that partially duplicates existing identifiers?”
If you think you have a good answer, please email me or reply to this post. Thanks!
Thomas Neidhart’s comments made me realize I had been too brief on the issue of subject identifiers. I want to correct that by telling “The Story of Blow.”
If you are reading this post you are likely online so please open up another browser window to: Merriam-Webster and type in the search box the word “blow.”
1) Quite clearly “blow” identifies a lot of different subjects. So it is a “subject identifier” in the non-topic map sense.
2) And just as clearly, “blow” can lead us to additional information, or at least has the capacity to. That is, it can be resolved.
Doesn’t mean it will be resolved, only that resolution is possible.
3) The additional information point is illustrated by the Merriam-Webster entry. As a transitive verb, it lists some 14 separate meanings. All of which involve additional information to know which one is meant.
But the dictionary is just a common example.
Another is the information that speakers of English carry around about the meanings of “blow.”
Which means our resolutions of “blow” can differ from those of others. (The “vocabulary problem.”)
4) The additional information in a dictionary is explicit. That is you and I can both examine the same information.
That is in contrast to each of us hearing the term “blow” in conversation or over the radio/TV and deciding privately what was meant. We go through the first three steps but not to the fourth.
I could say: “That was good blow.” and leave you wondering what possible meaning I have assigned to the term “blow.” I’m surprised the dictionary omits this one. In another lifetime I would have understood it to be a reference to cocaine. So if I wanted that usage to be understood by others, I had better mark it with a Subject Identifier so as to make that meaning explicit.
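The difference between a bare name and a name marked with a Subject Identifier can be sketched in a few lines of Python. This is an illustration only: the `example.org` addresses are invented placeholders, the two senses are the ones mentioned in this post, and the `resolve` function is a hypothetical helper rather than part of any topic map software.

```python
# Invented placeholder Subject Identifiers, one per sense of "blow".
BLOW_SPEND = "http://example.org/subject/blow-spend-extravagantly"
BLOW_COCAINE = "http://example.org/subject/blow-cocaine"

# Explicit, shared information: anyone can inspect what each one means.
SUBJECTS = {
    BLOW_SPEND: "to spend extravagantly",
    BLOW_COCAINE: "cocaine",
}

# The bare name by itself leads to more than one subject.
NAMES = {"blow": {BLOW_SPEND, BLOW_COCAINE}}


def resolve(name, subject_identifier=None):
    """Return the Subject Identifiers a name may mean.

    Without a Subject Identifier the reader gets every candidate and
    must guess privately. With one, exactly one subject is meant.
    """
    candidates = NAMES.get(name, set())
    if subject_identifier is None:
        return sorted(candidates)
    if subject_identifier not in candidates:
        raise KeyError(f"{name!r} is not known to mean {subject_identifier}")
    return [subject_identifier]


ambiguous = resolve("blow")               # two candidates
explicit = resolve("blow", BLOW_COCAINE)  # exactly one
meaning = SUBJECTS[explicit[0]]
```

The unmarked name leaves the choice to the listener. The marked one carries the choice with it.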
I can think of several other missing definitions for “blow.” Can you?
PS: I was amused at the example given for the sense of “blow” as to spend extravagantly, “I will blow you to a steak.” Since Google reports no “hits” on that string I suspect it was inserted to catch anyone copying their definitions.