Archive for the ‘Context’ Category

Infinite Dimensional Word Embeddings [Variable Representation, Death to Triples]

Thursday, November 19th, 2015

Infinite Dimensional Word Embeddings by Eric Nalisnick and Sachin Ravi.


We describe a method for learning word embeddings with stochastic dimensionality. Our Infinite Skip-Gram (iSG) model specifies an energy-based joint distribution over a word vector, a context vector, and their dimensionality, which can be defined over a countably infinite domain by employing the same techniques used to make the Infinite Restricted Boltzmann Machine (Cote & Larochelle, 2015) tractable. We find that the distribution over embedding dimensionality for a given word is highly interpretable and leads to an elegant probabilistic mechanism for word sense induction. We show qualitatively and quantitatively that the iSG produces parameter-efficient representations that are robust to language’s inherent ambiguity.

Even better from the introduction:

To better capture the semantic variability of words, we propose a novel embedding method that produces vectors with stochastic dimensionality. By employing the same mathematical tools that allow the definition of an Infinite Restricted Boltzmann Machine (Côté & Larochelle, 2015), we describe a log-bilinear energy-based model–called the Infinite Skip-Gram (iSG) model–that defines a joint distribution over a word vector, a context vector, and their dimensionality, which has a countably infinite domain. During training, the iSGM allows word representations to grow naturally based on how well they can predict their context. This behavior enables the vectors of specific words to use few dimensions and the vectors of vague words to elongate as needed. Manual and experimental analysis reveals this dynamic representation elegantly captures specificity, polysemy, and homonymy without explicit definition of such concepts within the model. As far as we are aware, this is the first word embedding method that allows representation dimensionality to be variable and exhibit data-dependent growth.

Imagine a topic map model that “allow[ed] representation dimensionality to be variable and exhibit data-dependent growth.”

Simple subjects, say the sort you find at, can have simple representations.

More complex subjects, say the notion of “person” in U.S. statutory law (no, I won’t attempt to list them here), can extend their dimensional representation as far as is necessary.

Of course in this case, the dimensions are learned from a corpus but I don’t see any barrier to the intentional creation of dimensions for subjects and/or a combined automatic/directed creation of dimensions.

Or as I put it in the title, Death to All Triples.

More precisely, not just triples but any pre-determined limit on representation.

Looking forward to taking a slow read on this article and those it cites. Very promising.

Message of Ayatollah Seyyed Ali Khamenei To the Youth in Europe and North America

Tuesday, January 27th, 2015

#LETTER4U Message of Ayatollah Seyyed Ali Khamenei To the Youth in Europe and North America

Unlike many news sources I will not attempt to analyze this message from Ayatollah Seyyed Ali Khamenei.

You should read the message for yourself and not rely on the interpretations of others.

Ayatollah Seyyed Ali Khamenei’s request is an honorable one and should be granted. You will find it an exercise in attempting (one never really succeeds) to understand the context of another. That is one of the key skills in creating topic maps that traverse the contextual boundaries of departments, enterprises, government offices and cultures.

It isn’t easy to stray from one’s own cultural context but even making the effort is worthwhile.

Understanding Context

Sunday, January 25th, 2015

Understanding Context by Andrew Hinton.

From the post:

Technology is destabilizing the way we understand our surroundings. From social identity to ubiquitous mobility, digital information keeps changing what here means, how to get there, and even who we are. Why does software so easily confound our perception and scramble meaning? And how can we make all this complexity still make sense to our users?

Understanding Context — written by Andrew Hinton of The Understanding Group — offers a powerful toolset for grasping and solving the challenges of contextual ambiguity. By starting with the foundation of how people perceive the world around them, it shows how users touch, navigate, and comprehend environments made of language and pixels, and how we can make those places better.

Understanding Context is ideal for information architects, user experience professionals, and designers of digital products and services of any scope. If what you create connects one context to another, you need this book.


Amazon summarizes in part:

You’ll discover not only how to design for a given context, but also how design participates in making context.

  • Learn how people perceive context when touching and navigating digital environments
  • See how labels, relationships, and rules work as building blocks for context
  • Find out how to make better sense of cross-channel, multi-device products or services
  • Discover how language creates infrastructure in organizations, software, and the Internet of Things
  • Learn models for figuring out the contextual angles of any user experience

This book is definitely going on my birthday wish list at Amazon. (There, done!)

Looking forward to a slow read and in the meantime, will start looking for items from the bibliography.

My question, of course, is this: after expending all the effort to discover and/or design a context, how do I pass that context on to another?

To someone coming from a slightly different context? (Assuming always that the designer is “in” a context.)

From a topic map perspective, what subjects do I need to represent to capture a visual context? Even more difficult, what properties of those subjects do I need to capture to enable their discovery by others? Or to facilitate mapping those subjects to another context/domain?

Definitely a volume I would assign as reading for a course on topic maps.

I first saw this in a tweet by subjectcentric.

Accidental vs Deliberate Context

Saturday, December 27th, 2014

Accidental vs Deliberate Context by Jessica Kerr.

From the post:

In all decisions, we bring our context with us. Layers of context, from what we read about that morning to who our heroes were growing up. We don’t realize how much context we assume in our communications, and in our code.

One time I taught someone how to make the Baby Vampire face. It involves poking out both corners of my lower lip, so they stick up like poky gums. Very silly. To my surprise, the person couldn’t do it. They could only poke one side of the lower lip out at a time.

Turns out, few outside my family can make this face. My mom can do it, my sister can do it, my daughters can do it – so it came as a complete surprise to me when someone couldn’t. There is a lip-flexibility that’s part of my context, always has been, and I didn’t even realize it.

Jessica goes on to illustrate that communication depends upon the existence of some degree of shared context and that additional context can be explained to others, as on a team.

She distinguishes between “incidental” shared contexts and “deliberate” shared contexts. Incidental contexts arise from family or long association with friends. Common/shared experiences form an incidental context.

Deliberate contexts, on the other hand, are the intentional melding of a variety of contexts, in her examples, the contexts of biologists and programmers, who at the outset lacked a common context in which to communicate.

Forming teams with diverse backgrounds is a way to create a “deliberate” context, but my question would be how to preserve that “deliberate” context for others? It becomes an “incidental” context if others must join the team in order to absorb the previously “deliberate” context. If that is a requirement, then others will not be able to benefit from deliberately created contexts in which they did not participate.

If the process and decisions made in forming a “deliberate” context were captured by a topic map, then others could apply this “new” deliberate context to develop other “deliberate” contexts. Perhaps some of the decisions or mappings made would not suit another “deliberate” context but perhaps some would. And perhaps other “deliberate” contexts would evolve beyond the end of their inputs.

The point being that unless these “deliberate” contexts are captured, to whatever degree of granularity is desired, every “deliberate” context for say biologists and programmers is starting off at ground zero. Have you ever heard of a chemistry experiment starting off by recreating the periodic table? I haven’t. Perhaps we should abandon that model in the building of “deliberate” contexts as well.

Not to mention that re-usable “deliberate” contexts might enable greater diversity in teams.

Topic maps anyone?

PS: I suggest topic maps to capture “deliberate” context because topic maps are not constrained by logic. You can capture any subject and any relationship between subjects, logical or not. For example, a user of a modern dictionary, which lists words in alphabetical order, would be quite surprised if given a dictionary of Biblical Hebrew and asked to find a word (assuming they know the alphabet). The most common dictionaries of Biblical Hebrew list words by their roots and not as they appear to the common reader. There are arguments to be made for each arrangement but neither one is a “logical” answer.

The arrangement of dictionaries is another example of differing contexts. With a topic map I can offer a reader whichever Biblical Hebrew dictionary is desired, with only one text underlying both displays. As opposed to the printed version which can offer only one context or another.
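
The two arrangements can be sketched in a few lines. This is a minimal illustration, not how any topic map engine does it: the entries, the transliterations and the function names (`by_headword`, `by_root`) are all my own assumptions, chosen only to show one underlying list driving both displays.

```python
from collections import defaultdict

# Hypothetical entries: (word as the reader meets it, its root).
# The transliterations are illustrative and not taken from any
# particular dictionary.
ENTRIES = [
    ("mishpat", "sh-p-t"),
    ("shofet", "sh-p-t"),
    ("melek", "m-l-k"),
    ("malkut", "m-l-k"),
]


def by_headword(entries):
    """Alphabetical arrangement, as in a modern dictionary."""
    return sorted(word for word, _root in entries)


def by_root(entries):
    """Root arrangement: each word is listed under its root."""
    groups = defaultdict(list)
    for word, root in entries:
        groups[root].append(word)
    return {root: sorted(words) for root, words in sorted(groups.items())}


print(by_headword(ENTRIES))
print(by_root(ENTRIES))
```

Neither function changes `ENTRIES`. The reader's context picks the view, and the text underneath stays the same.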

Coeffects: The next big programming challenge

Sunday, January 12th, 2014

Coeffects: The next big programming challenge by Tomas Petricek.

From the post:

Context-aware programming matters

The phrase context in which programs are executed sounds quite abstract and generic. What are some concrete examples of such context? For example:

  • When writing a cross-platform application, different platforms (and even different versions of the same platform) provide different contexts – the API functions that are available.
  • When creating a mobile app, the different capabilities that you may (or may not) have access to are context (GPS sensor, accelerometer, battery status).
  • When working with data (be it sensitive database or social network data from Facebook), you have permissions to access only some of the data (depending on your identity) and you may want to track provenance information. This is another example of a context.

These are all fairly standard problems that developers deal with today. As the number of devices where programs need to run increases, dealing with diverse contexts will be becoming more and more important (and I’m not even talking about ubiquitous computing where you need to compile your code to a coffee machine).

We do not perceive the above things as problems (at best, annoyances that we just have to deal with), because we do not realize that there should be a better way. Let me dig into four examples in a bit more detail.

This post is a good introduction to Tomas’ academic work.

A bit further on Tomas explains what he means by “coeffects:”

Coeffects: Towards context-aware languages

The above examples cover a couple of different scenarios, but they share a common theme – they all talk about some context in which an expression is evaluated. The context has essentially two aspects:

  • Flat context represents additional data, resources and meta-data that are available in the execution environment (regardless of where in the program you access them). Examples include resources like GPS sensors or databases, battery status, framework version and similar.
  • Structural context contains additional meta-data related to variables. This can include provenance (source of the variable value), usage information (how often is the value accessed) or security information (does it contain sensitive data).

As a proponent of statically typed functional languages I believe that a context-aware programming language should capture such context information in the type system and make sure that basic errors (like the ones demonstrated in the four examples above) are ruled out at compile time.

This is essentially the idea behind coeffects. Let’s look at an example showing the idea in (a very simplified) practice and then I’ll say a few words about the theory (which is the main topic of my upcoming PhD thesis).

I don’t know that Tomas would agree but I see his “coeffects,” particularly “meta-data related to variables,” as keying off the subject identity of variables.

Think of it this way: What is the meaning of any value with no express or implied context?

My answer would be that a value without context is meaningless.

By example, how would you process the value “1”? Is it a boolean? Integer? A string?

Imbuing data with “meta-data” (or explicit identity as I prefer) is a first step towards transparent data.
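
To make the point concrete, here is a minimal sketch in which the value “1” carries an explicit type declaration with it. The `Value` class and the `INTERPRETERS` table are hypothetical names of mine. This is a runtime illustration only, not Tomas’ coeffect system, which works in the type system at compile time.

```python
from dataclasses import dataclass

# Hypothetical interpreters, keyed by a declared type name.
INTERPRETERS = {
    "boolean": lambda raw: raw == "1",
    "integer": int,
    "string": str,
}


@dataclass(frozen=True)
class Value:
    raw: str
    declared_type: str  # the explicit "meta-data" travelling with the value

    def interpret(self):
        """Read the raw text according to its declared type."""
        return INTERPRETERS[self.declared_type](self.raw)


for declared in ("boolean", "integer", "string"):
    print(declared, repr(Value("1", declared).interpret()))
```

The same raw text yields three different results. Strip the declaration and there is no principled way to choose among them.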

PS: See Petricek and Skeet’s Real-World Functional Programming.

Context Aware Searching

Thursday, September 19th, 2013

Scaling Up Personalized Query Results for Next Generation of Search Engines

From the post:

North Carolina State University researchers have developed a way for search engines to provide users with more accurate, personalized search results. The challenge in the past has been how to scale this approach up so that it doesn’t consume massive computer resources. Now the researchers have devised a technique for implementing personalized searches that is more than 100 times more efficient than previous approaches.

At issue is how search engines handle complex or confusing queries. For example, if a user is searching for faculty members who do research on financial informatics, that user wants a list of relevant webpages from faculty, not the pages of graduate students mentioning faculty or news stories that use those terms. That’s a complex search.

“Similarly, when searches are ambiguous with multiple possible interpretations, traditional search engines use impersonal techniques. For example, if a user searches for the term ‘jaguar speed,’ the user could be looking for information on the Jaguar supercomputer, the jungle cat or the car,” says Dr. Kemafor Anyanwu, an assistant professor of computer science at NC State and senior author of a paper on the research. “At any given time, the same person may want information on any of those things, so profiling the user isn’t necessarily very helpful.”

Anyanwu’s team has come up with a way to address the personalized search problem by looking at a user’s “ambient query context,” meaning they look at a user’s most recent searches to help interpret the current search. Specifically, they look beyond the words used in a search to associated concepts to determine the context of a search. So, if a user’s previous search contained the word “conservation” it would be associated with concepts like “animals” or “wildlife” and even “zoos.” Then, a subsequent search for “jaguar speed” would push results about the jungle cat higher up in the results — and not the automobile or supercomputer. And the more recently a concept has been associated with a search, the more weight it is given when ranking results of a new search.

I rather like the contrast of ambiguous searches being resolved with “impersonal techniques.”
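
The recency weighting described in the quoted post can be sketched roughly as follows. This is a toy under stated assumptions, not the researchers’ method: the concept tables, the `rank` function and the geometric `decay` factor are all hypothetical, and the actual system (described in the paper below) relies on indexing over RDF data.

```python
# Hypothetical tables: words from earlier searches mapped to concepts,
# and candidate readings of "jaguar speed" mapped to the concepts they imply.
CONCEPTS = {
    "conservation": {"animals", "wildlife", "zoos"},
    "horsepower": {"automobiles"},
}
INTERPRETATIONS = {
    "jungle cat": {"animals", "wildlife"},
    "car": {"automobiles"},
    "supercomputer": {"computing"},
}


def rank(recent_searches, decay=0.5):
    """Rank interpretations; recent_searches is ordered oldest first."""
    weights = {}
    # age 0 is the most recent search, so it receives the largest weight.
    for age, word in enumerate(reversed(recent_searches)):
        for concept in CONCEPTS.get(word, ()):
            weights[concept] = weights.get(concept, 0.0) + decay ** age
    scores = {
        name: sum(weights.get(c, 0.0) for c in concepts)
        for name, concepts in INTERPRETATIONS.items()
    }
    # Highest score first; ties fall back to alphabetical order.
    return sorted(scores, key=lambda name: (-scores[name], name))


print(rank(["horsepower", "conservation"]))
```

With “conservation” as the most recent search, the jungle cat reading comes out on top. With no search history at all, the ordering is arbitrary (alphabetical here), which is the “impersonal” case.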

The paper, Scaling Concurrency of Personalized Semantic Search over Large RDF Data by Haizhou Fu, Hyeongsik Kim, and Kemafor Anyanwu, has this abstract:

Recent keyword search techniques on Semantic Web are moving away from shallow, information retrieval-style approaches that merely find “keyword matches” towards more interpretive approaches that attempt to induce structure from keyword queries. The process of query interpretation is usually guided by structures in data, and schema and is often supported by a graph exploration procedure. However, graph exploration-based interpretive techniques are impractical for multi-tenant scenarios for large database because separate expensive graph exploration states need to be maintained for different user queries. This leads to significant memory overhead in situations of large numbers of concurrent requests. This limitation could negatively impact the possibility of achieving the ultimate goal of personalizing search. In this paper, we propose a lightweight interpretation approach that employs indexing to improve throughput and concurrency with much less memory overhead. It is also more amenable to distributed or partitioned execution. The approach is implemented in a system called “SKI” and an experimental evaluation of SKI’s performance on the DBPedia and Billion Triple Challenge datasets show orders-of-magnitude performance improvement over existing techniques.

If you are interested in scaling issues for topic maps, note the use of indexing as opposed to graph exploration techniques in this paper.

Also consider mining “discovered” contexts that lead to “better” results from the viewpoint of users. Those could be the seeds for serializing those contexts as topic maps.

Perhaps even directly applicable to work by researchers, librarians, intelligence analysts.

Seasoned searchers use richer contexts in searching than the average user and if those contexts are captured, they could enrich the search contexts of the average user.

How Impoverished is the “current world of search?”

Wednesday, May 8th, 2013

Internet Content Is Looking for You

From the post:

Where you are and what you’re doing increasingly play key roles in how you search the Internet. In fact, your search may just conduct itself.

This concept, called “contextual search,” is improving so gradually the changes often go unnoticed, and we may soon forget what the world was like without it, according to Brian Proffitt, a technology expert and adjunct instructor of management in the University of Notre Dame’s Mendoza College of Business.

Contextual search describes the capability for search engines to recognize a multitude of factors beyond just the search text for which a user is seeking. These additional criteria form the “context” in which the search is run. Recently, contextual search has been getting a lot of attention due to interest from Google.


“You no longer have to search for content, content can search for you, which flips the world of search completely on its head,” says Proffitt, who is the author of 24 books on mobile technology and personal computing and serves as an editor and daily contributor for

“Basically, search engines examine your request and try to figure out what it is you really want,” Proffitt says. “The better the guess, the better the perceived value of the search engine. In the days before computing was made completely mobile by smartphones, tablets and netbooks, searches were only aided by previous searches.


Context can include more than location and time. Search engines will also account for other users’ searches made in the same place and even the known interests of the user.

If time and location plus prior searches is context that “…flips the world of search completely on its head…”, imagine what a traditional index must do.

A traditional index being created by a person who has subject matter knowledge beyond the average reader and so is able to point to connections and facts (context) previously unknown to the user.

The “…current world of search…” is truly impoverished for time and location to have that much impact.

Qi4j SDK Release 2.0

Sunday, April 28th, 2013

Qi4j SDK Release 2.0

From the post:

After nearly 2 years of hard work, the Qi4j Community today launched its second generation Composite Oriented Programming framework.

Qi4j is Composite Oriented Programming for the Java platform. It is a top-down approach to write business applications in a maintainable and efficient manner. Qi4j let you focus on the business domain, removing most impedance mismatches in software development, such as object-relation mapping, overlapping concerns and testability.

Qi4j’s main areas of excellence are its enforcement of application layering and modularization, the typed and generic AOP approach, affinity based dependency injection, persistence management, indexing and query subsystems, but there are much more.

The 2.0 release is practically a re-write of the entire runtime, according to co-founder Niclas Hedhman; “Although we are breaking compatibility in many select areas, most 1.4 applications can be converted with relatively few changes.”. He continues; “These changes are necessary for the next set of planned features, including full Scala integration, the upcoming JDK8 and Event Sourcing integrated into the persistence model.”

“It has been a bumpy ride to get this release out the door.”, said Paul Merlin, the 2.0 Release Manager, “but we are determined that Qi4j represents the best technological platform for Java to create applications with high business value.” Not only has the community re-crafted a remarkable codebase, but also created a brand new website, fully integrated with the new Gradle build process.

Principles of Composite Oriented Programming:

  • Behavior depends on Context
  • Decoupling is a virtue
  • Business Rules matters more
  • Classes are dead, long live interfaces

“Behavior depends on Context” sounds a lot like identity depends on context, either of what the object represents or a user.

Does your application capture context for data or its users? If so, what does it do with that information?

Speak of the devil,… I just mentioned Peter Neubauer in a prior post, then I see his tweet on Qi4j. 😉

G2 | Sensemaking – Two Years Old Today

Sunday, February 3rd, 2013

G2 | Sensemaking – Two Years Old Today by Jeff Jonas.

From the post:

What is G2?

When I speak about Context Accumulation, Data Finds Data and Relevance Finds You, and Sensemaking I am describing various aspects of G2.

In simple terms G2 software is designed to integrate diverse observations (data) as it arrives, in real-time.  G2 does this incrementally, piece by piece, much in the same way you would put a puzzle together at home.  And just like at home, the more puzzle pieces integrated into the puzzle, the more complete the picture.  The more complete the picture, the better the ability to make sense of what has happened in the past, what is happening now, and what may come next.  Users of G2 technology will be more efficient, deliver high quality outcomes, and ultimately will be more competitive.

Early adopters seem to be especially interested in one specific use case: Using G2 to help organizations better direct the attention of its finite workforce.  With the workforce now focusing on the most important things first, G2 is then used to improve the quality of analysis while at the same time reducing the amount of time such analysis takes.  The bigger the organization, the bigger the observation space, the more essential sensemaking is.

About Sensemaking

One of the things G2 can already do pretty darn well – considering she just turned two years old – is ”Sensemaking.”  Imagine a system capable of paying very close attention to every observation that comes its way.  Each observation incrementally improving upon the picture and using this emerging picture in real-time to make higher quality business decisions; for example, the selection of the perfect ad for a web page (in sub-200 milliseconds as the user navigates to the page) or raising an alarm to a human for inspection (an alarm sufficiently important to be placed top of the queue).  G2, when used this way, enables Enterprise Intelligence.

Of course there is no magic.  Sensemaking engines are limited by their available observation space.  If a sentient being would be unable to make sense of the situation based on the available observation space, neither would G2.  I am not talking about Fantasy Analytics here.

I would say “subject identity” instead of “sensemaking” and after reading Jeff’s post, consider them to be synonyms.

Read the section General Purpose Context Accumulation very carefully.

As well as “Privacy by Design (PbD).”

BTW, G2 uses Universal Message Format XML for input/output.

Not to argue from authority but Jeff is one of only 77 active IBM Research Fellows.

Someone to listen to, even if we may disagree on some of the finer points.

Why I subscribe to the Ann Arbor Chronicle

Tuesday, October 30th, 2012

Why I subscribe to the Ann Arbor Chronicle by Jon Udell.

At one level, Jon’s post describes why he subscribes to a free online newspaper.

At another level, Jon is describing the value-add that makes content so valuable it attracts voluntary support.

That value-add? Context.

The newspaper has built a context for reporting news, the reported news is situated in the context of prior reports and situated in a context built from other sources.

As opposed to reporting allegations, rumors or even facts, with no useful context in which to evaluate them.

If you prefer cartoons, visit Use the calendar icon to search for: February 7, 1993.

Context-Aware Recommender Systems 2012 [Identity and Context?]

Tuesday, September 11th, 2012

Context-Aware Recommender Systems 2012 (In conjunction with the 6th ACM Conference on Recommender Systems (RecSys 2012))

I usually think of recommender systems as attempts to deliver content based on clues about my interests or context. If I dial 911, the location of the nearest pizza vendor probably isn’t high on my list of interests, etc.

As I looked over these proceedings, it occurred to me that subject identity, for merging purposes, isn’t limited to the context of the subject in question.

That is, some merging tests could depend upon my context as a user.

Take my 911 call for instance. For many purposes, a police substation, fire station, 24 hour medical clinic and a hospital are different subjects.

In a medical emergency situation, for which a 911 call might be a clue, all of those could be treated as a single subject – places for immediate medical attention.
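
A merging test that depends on the user’s context might look something like this. It is a minimal sketch: the `MERGE_RULES` table, the context label and the `merge` function are hypothetical, and real topic map merging is considerably richer than a lookup table.

```python
from collections import defaultdict

PLACES = [
    "police substation",
    "fire station",
    "24 hour medical clinic",
    "hospital",
]

# Hypothetical rule table: user context -> (subjects affected, merged identity).
MERGE_RULES = {
    "medical emergency": (set(PLACES), "place for immediate medical attention"),
}


def merge(subjects, user_context=None):
    """Group subjects under the identity that applies in the user's context."""
    members, identity = MERGE_RULES.get(user_context, (set(), None))
    merged = defaultdict(list)
    for subject in subjects:
        key = identity if subject in members else subject
        merged[key].append(subject)
    return dict(merged)


print(merge(PLACES))                       # four distinct subjects
print(merge(PLACES, "medical emergency"))  # one merged subject
```

The subjects themselves never change. Only the test applied to them does, and that test takes the user’s context as an input.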

What other subjects do you think might merge (or not) depending upon your context?

Table of Contents

  1. Optimal Feature Selection for Context-Aware Recommendation Using Differential Relaxation
    Yong Zheng, Robin Burke, Bamshad Mobasher.
  2. Relevant Context in a Movie Recommender System: Users’ Opinion vs. Statistical Detection
    Ante Odic, Marko Tkalcic, Jurij Franc Tasic, Andrej Kosir.
  3. Improving Novelty in Streaming Recommendation Using a Context Model
    Doina Alexandra Dumitrescu, Simone Santini.
  4. Towards a Context-Aware Photo Recommender System
    Fabricio Lemos, Rafael Carmo, Windson Viana, Rossana Andrade.
  5. Context and Intention-Awareness in POIs Recommender Systems
    Hernani Costa, Barbara Furtado, Durval Pires, Luis Macedo, F. Amilcar Cardoso.
  6. Evaluation and User Acceptance Issues of a Bayesian-Classifier-Based TV Recommendation System
    Benedikt Engelbert, Karsten Morisse, Kai-Christoph Hamborg.
  7. From Online Browsing to Offline Purchases: Analyzing Contextual Information in the Retail Business
    Simon Chan, Licia Capra.

Sarcastic Computers?

Thursday, May 31st, 2012

You may have seen the headline: Could Sarcastic Computers Be in Our Future? New Math Model Can Help Computers Understand Inference.

And the lead for the article sounds promising:

In a new paper, the researchers describe a mathematical model they created that helps predict pragmatic reasoning and may eventually lead to the manufacture of machines that can better understand inference, context and social rules.

Language is so much more than a string of words. To understand what someone means, you need context.

Consider the phrase, “Man on first.” It doesn’t make much sense unless you’re at a baseball game. Or imagine a sign outside a children’s boutique that reads, “Baby sale — One week only!” You easily infer from the situation that the store isn’t selling babies but advertising bargains on gear for them.

Present these widely quoted scenarios to a computer, however, and there would likely be a communication breakdown. Computers aren’t very good at pragmatics — how language is used in social situations.

But a pair of Stanford psychologists has taken the first steps toward changing that.

Context being one of those things you can use semantic mapping techniques to capture, I was interested.

Jack Park pointed me to a public PDF of the article: Predicting pragmatic reasoning in language games

Be sure to read the entire file.

A blue square, a blue circle, a green square.

Not exactly a general model for context and inference.

Context models and out-of-context objects

Saturday, May 5th, 2012

Context models and out-of-context objects by Myung Jin Choi, Antonio Torralba, Alan S. Willsky.


The context of an image encapsulates rich information about how natural scenes and objects are related to each other. Such contextual information has the potential to enable a coherent understanding of natural scenes and images. However, context models have been evaluated mostly based on the improvement of object recognition performance even though it is only one of many ways to exploit contextual information. In this paper, we present a new scene understanding problem for evaluating and applying context models. We are interested in finding scenes and objects that are “out-of-context”. Detecting “out-of-context” objects and scenes is challenging because context violations can be detected only if the relationships between objects are carefully and precisely modeled. To address this problem, we evaluate different sources of context information, and present a graphical model that combines these sources. We show that physical support relationships between objects can provide useful contextual information for both object recognition and out-of-context detection.

The authors distinguish object recognition in surveillance video versus still photographs, the subject of the investigation here. A “snapshot” if you will.

Subjects in digital media, assuming you don’t have the authoring data stream, exist in “snapshots” of a sort don’t they?

To start with they are bound up in a digital artifact, which among other things lives in a file system, with a last modified date, amongst many other files.

There may be more “context” for subjects in digital files than appears at first blush. Will have to give that some thought.

Social Media Monitoring with CEP, pt. 2: Context As Important As Sentiment

Sunday, February 5th, 2012

Social Media Monitoring with CEP, pt. 2: Context As Important As Sentiment by Chris Carlson.

From the post:

When I last wrote about social media monitoring, I made a case for using a technology like Complex Event Processing (“CEP”) to detect rapidly growing and geospatially-oriented social media mentions that can provide early warning detection for the public good (Social Media Monitoring for Early Warning of Public Safety Issues, Oct. 27, 2011).

A recent article by Chris Matyszczyk of CNET highlights the often conflicting and confusing nature of monitoring social media. A 26-year old British citizen, Leigh Van Bryan, gearing up for a holiday of partying in Los Angeles, California (USA), tweeted in British slang his intention to have a good time: “Free this week, for quick gossip/prep before I go and destroy America.” Since I’m not too far removed from the culture of youth, I did take this to mean partying, cutting loose, having a good time (and other not-so-current definitions.)

This story does not end happily, as Van Bryan and his friend Emily Bunting were arrested and then sent back to Blighty.

This post will not increase American confidence in the TSA, but it does illustrate how context can influence the identification of a subject (or “person of interest”), or the exclusion of one.

Context is captured in topic maps using associations. In this particular case, a view of the information on the young man in question would reveal a lack of associations with any known terror suspects, people on the no-fly list, suspicious travel patterns, etc.
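A rough sketch of that kind of view, in Python rather than any topic map syntax. The association types, names, and records below are all hypothetical, made up for illustration; nothing here comes from the CNET story or from a real watch list.

```python
from typing import NamedTuple


class Association(NamedTuple):
    kind: str     # hypothetical association type
    subject: str  # the subject the association is about
    other: str    # the subject at the other end


# Hypothetical records, invented for this sketch.
ASSOCIATIONS = [
    Association("listed-on", "Traveler A", "no-fly list"),
    Association("friend-of", "Traveler B", "Traveler C"),
]

# Association types an analyst might treat as warning signs (an assumption).
WATCH_KINDS = {"listed-on", "known-associate-of"}


def watch_associations(subject, associations=ASSOCIATIONS):
    """Return the subject's associations whose type is in WATCH_KINDS."""
    return [a for a in associations
            if a.subject == subject and a.kind in WATCH_KINDS]


def needs_review(subject, associations=ASSOCIATIONS):
    """True only when at least one watch-type association exists."""
    return bool(watch_associations(subject, associations))
```

Here `needs_review("Traveler B")` is false because the only association on file is an ordinary friendship. The absence of associations is itself the context that a single tweet lacks.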

Not to imply that having good information leads to good decisions; technology can’t correct that particular disconnect.

Multiple Recognitions: Reconsidered

Wednesday, February 1st, 2012

Yesterday I closed with these lines:

Requirement: A system of identification must support the same identifiers resolving to different identifications.

The consequences of deciding otherwise on such a requirement, I will try to take up tomorrow. (Multiple Recognitions)

Rereading that for today’s post, I don’t agree with myself.

The requirement isn’t a requirement at all but an observation that the same identifier may have multiple resolutions.

Better to say that designers of systems of identification should be aware of that observation, to avoid situations like the one I posed yesterday with the “I will call you a cab” example.
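For the design-minded, a minimal sketch of a lookup that tolerates the observation. The class name, the context labels and the two readings of the cab line are my own assumptions for illustration. Note that it stores a set of resolutions per context, so it does not presume that a context yields only one.

```python
from collections import defaultdict


class Resolutions:
    """Maps identifier -> context -> set of resolutions (hypothetical design)."""

    def __init__(self):
        self._by_identifier = defaultdict(lambda: defaultdict(set))

    def record(self, identifier, context, resolution):
        """Note that, in this context, the identifier resolved this way."""
        self._by_identifier[identifier][context].add(resolution)

    def resolve(self, identifier, context=None):
        """All known resolutions, or only those recorded for one context."""
        contexts = self._by_identifier.get(identifier, {})
        if context is None:
            found = set().union(*contexts.values())
        else:
            found = contexts.get(context, set())
        return sorted(found)


# Two hypothetical parties hearing the same words.
cab = Resolutions()
cab.record("call you a cab", "speaker", "summon a taxi for you")
cab.record("call you a cab", "listener", "address you as 'a cab'")
```

Asking `cab.resolve("call you a cab")` returns both readings, while passing `"speaker"` or `"listener"` as the context narrows it to one.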

A fortuitous mistake because it leads to the next issue that I wanted to address: Do identifiers have contexts in which they have only a single resolution?

Yesterday’s mistake has made me more wary of sweeping pronouncements so I am posing the context issue as a question. 😉

Can you think of any counter-examples?

The easiest place to look would be in comedy, where mistaken identity (such as in Shakespeare), double meanings, etc., are the bread and butter of the art. Two or more people hear or see the same identifier and reach different resolutions.

In those cases, if we had a rule that identifiers could only have a single resolution, we would have to simply skip over those cases. That seems like an inelegant solution.

Or would you shrink the context down to the individuals who had the different resolutions of an identifier?

Perhaps, perhaps but then what is your solution when later in the play one or more individuals discover their mistake and now hold a common resolution but still remember the one that was in error? Or perhaps more than one that was in error? How do we describe the context(s) there?

There is a long history of such situations in comedy. You may be tempted to say that recreational literature can be excluded. That “fictional” work isn’t the first place we want semantic technologies to work.

Perhaps but remember that comedy and “fiction” have their origin in our day to day affairs. The misunderstandings they parody are our misunderstandings.

The saying: “what did X know and when did they know it?” takes on new meaning when we talk about the interpretation of identifiers. Perhaps “freedom fighter” is a more sympathetic term until you “know” those forces are operating death squads. And may have different legal consequences.

How do you think boundaries for contexts should be set/designated? Seems like that would be an important issue to take up.

The communicative function of ambiguity in language

Friday, January 20th, 2012

The communicative function of ambiguity in language by Steven T. Piantadosi, Harry Tily and Edward Gibson. (Cognition, 2011) (PDF file)


We present a general information-theoretic argument that all efficient communication systems will be ambiguous, assuming that context is informative about meaning. We also argue that ambiguity allows for greater ease of processing by permitting efficient linguistic units to be re-used. We test predictions of this theory in English, German, and Dutch. Our results and theoretical analysis suggest that ambiguity is a functional property of language that allows for greater communicative efficiency. This provides theoretical and empirical arguments against recent suggestions that core features of linguistic systems are not designed for communication.

This is a must read paper if you are interested in ambiguity and similar issues.

At page 289, the authors report:

These findings suggest that ambiguity is not enough of a problem to real-world communication that speakers would make much effort to avoid it. This may well be because actual language in context provides other information that resolves the ambiguities most of the time.

I don’t know if our communication systems are efficient or not but I think the phrase “in context” is covering up a very important point.

Our communication systems came about in very high-bandwidth circumstances. We were in the immediate presence of a person speaking. With all the context that provides.

Even if we accept an origin of language of say 200,000 years ago, written language, which provides the basis for communication without the presence of another person, emerges only in the last five or six thousand years. Just to keep it simple, 5 thousand years would be 2.5% of the entire history of language.

So for 97.5% of the history of language, it has been used in a high bandwidth situation. No wonder it has yet to adapt to narrow bandwidth situations.
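The arithmetic, for anyone who wants to try other dates. Both figures are the round numbers assumed above, not measurements.

```python
# Round figures assumed in the text above.
language_years = 200_000  # assumed age of spoken language
writing_years = 5_000     # assumed age of written language

written_share = writing_years / language_years
spoken_only_share = 1 - written_share

print(f"written: {written_share:.1%}, spoken only: {spoken_only_share:.1%}")
```

Pushing the origin of language further back only shrinks the share of its history in which writing existed.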

If writing puts us into a narrow bandwidth situation, with all its ambiguity, where does that leave our computers?

Context and Semantics for Knowledge Management – … Personal Productivity [and Job Security]

Friday, October 28th, 2011

Context and Semantics for Knowledge Management – Technologies for Personal Productivity by Warren, Paul; Davies, John; Simperl, Elena (Eds.). 1st Edition., 2011, X, 392 p. 120 illus., 4 in color. Hardcover, ISBN 978-3-642-19509-9

I quite agree with the statement: “the fact that much corporate knowledge only resides in employees’ heads seriously hampers reuse.” True but it is also a source of job security. In organizations both large and small, in the U.S. and in other countries as well.

I don’t think any serious person believes the Pentagon (US) needs to have more than 6,000 HR systems. But job security presents different requirements from, say, productivity or accomplishment of mission (aside from the mission of remaining employed), which in this case is national defense.

How one overcomes job security is going to vary from system to system. Be aware it is a non-technical issue and technology is not the answer to it. It is a management issue that management would like to treat as a technology problem. Treating personnel issues as problems that can be solved with technology nearly universally fails.

From the announcement:

Knowledge and information are among the biggest assets of enterprises and organizations. However, efficiently managing, maintaining, accessing, and reusing this intangible treasure is difficult. Information overload makes it difficult to focus on the information that really matters; the fact that much corporate knowledge only resides in employees’ heads seriously hampers reuse.

The work described in this book is motivated by the need to increase the productivity of knowledge work. Based on results from the EU-funded ACTIVE project and complemented by recent related results from other researchers, the application of three approaches is presented: the synergy of Web 2.0 and semantic technology; context-based information delivery; and the use of technology to support informal user processes. The contributions are organized in five parts. Part I comprises a general introduction and a description of the opportunities and challenges faced by organizations in exploiting Web 2.0 capabilities. Part II looks at the technologies, and also some methodologies, developed in ACTIVE. Part III describes how these technologies have been evaluated in three case studies within the project. Part IV starts with a chapter describing the principal market trends for knowledge management solutions, and then includes a number of chapters describing work complementary to ACTIVE. Finally, Part V draws conclusions and indicates further areas for research.

Overall, this book mainly aims at researchers in academia and industry looking for a state-of-the-art overview of the use of semantic and Web 2.0 technologies for knowledge management and personal productivity. Practitioners in industry will also benefit, in particular from the case studies which highlight cutting-edge applications in these fields.

Axioms of Context

Wednesday, January 19th, 2011

Tefko Saracevic in his keynote address The notion of context in “Information Interaction in Context” at the Third Information Interaction in Context Symposium (IIiX’10) offered the following five (5) axioms of context:

  • Axiom 1: One cannot not have a context in information interaction. Every interaction is conducted within a context. Because context-less information interaction is impossible, it is not possible not to have a context.
  • Axiom 2: Every interaction has a content and relationship aspect – context is the latter and classifies the former. It means that all interactions, apart from information derived from meaning of words or terms describing the content, have more information to be derived from context.
  • Axiom 3: The nature of information interaction is asymmetric; it involves differing processes and interpretation by parties involved. Contexts are asymmetric as well. Systems context is primarily about meanings; user context is primarily about situations.
  • Axiom 4: Context is multilayered. It extends beyond users or systems. In interactions it is customary to consider direct context, but context extends indirectly to broader social context also.
  • Axiom 5: Context is not self-revealing, nor is it self-evident. Context may be difficult to formulate and synthesize. But plenty can go wrong when not taken into consideration in interactions.

Unfortunately only an abstract of Saracevic’s keynote is reported in the proceedings.

I think his fifth axiom, Context is not self-revealing, nor is it self-evident, is the one most relevant for topic maps.

What subjects we mean to identify depends upon contexts we may only dimly sense. Mostly because they are so familiar.

In a Bible encoding project several years ago, none of our messages made the context clear because we shared the context in which those messages took place.

Anyone who stumbled upon those at the time or later could have a hard time deciding what was being talked about and why.

We have always had the capacity to say more about context, but topic maps enable us to build mappings based on those statements of contexts.

The contexts that give our words meaning and identify the subjects of discussion.

Indexicality: Understanding mobile human-computer interaction in context

Wednesday, January 5th, 2011

Indexicality: Understanding mobile human-computer interaction in context
Authors: Jesper Kjeldskov, Jeni Paay
Keywords: Mobile computing, indexicality, physical context, spatial context, social context, prototype systems, field evaluation, public transport, healthcare, sociality


A lot of research has been done within the area of mobile computing and context-awareness over the last 15 years, and the idea of systems adapting to their context has produced promising results for overcoming some of the challenges of user interaction with mobile devices within various specialized domains. However, today it is still the case that only a limited body of theoretically grounded knowledge exists that can explain the relationship between users, mobile system user interfaces, and their context. Lack of such knowledge limits our ability to elevate learning from the mobile systems we develop and study from a concrete to an abstract level. Consequently, the research field is impeded in its ability to leap forward and is limited to incremental steps from one design to the next. Addressing the problem of this void, this article contributes to the body of knowledge about mobile interaction design by promoting a theoretical approach for describing and understanding the relationship between user interface representations and user context. Specifically, we promote the concept of indexicality derived from semiotics as an analytical concept that can be used to describe and understand a design. We illustrate the value of the indexicality concept through an analysis of empirical data from evaluations of three prototype systems in use. Based on our analytical and empirical work we promote the view that users interpret information in a mobile computer user interface through creation of meaningful indexical signs based on the ensemble of context and system.

One of the more interesting observations by the authors is that the greater the awareness of context, the less information has to be presented to the user. For a mobile device, with limited display area, that is an advantage.

It would be an advantage for other interfaces because even with a lot of screen real estate, it would be counter-productive to overrun the user with information about a subject.

Present them with the information relevant to a particular context, leaving the option for them to request additional information.

Probabilistic User Modeling in the Presence of Drifting Concepts

Saturday, December 4th, 2010

Probabilistic User Modeling in the Presence of Drifting Concepts Author(s): Vikas Bhardwaj, Ramaswamy Devarajan


We investigate supervised prediction tasks which involve multiple agents over time, in the presence of drifting concepts. The motivation behind choosing the topic is that such tasks arise in many domains which require predicting human actions. An example of such a task is recommender systems, where it is required to predict the future ratings, given features describing items and context along with the previous ratings assigned by the users. In such a system, the relationships among the features and the class values can vary over time. A common challenge to learners in such a setting is that this variation can occur both across time for a given agent, and also across different agents, (i.e. each agent behaves differently). Furthermore, the factors causing this variation are often hidden. We explore probabilistic models suitable for this setting, along with efficient algorithms to learn the model structure. Our experiments use the Netflix Prize dataset, a real world dataset which shows the presence of time variant concepts. The results show that the approaches we describe are more accurate than alternative approaches, especially when there is a large variation among agents. All the data and source code would be made open-source under the GNU GPL.

Interesting because not only do concepts drift from user to user but modeling users as existing in neighborhoods of other users was more accurate than purely homogeneous or heterogeneous models.


  1. If there is a “neighborhood” effect on users, what, if anything, does that imply for co-occurrence of terms? (3-5 pages, no citations)
  2. How would you determine “neighborhood” boundaries for terms? (3-5 pages, citations)
  3. Do “neighborhoods” for terms vary by semantic domains? (3-5 pages, citations)

Be aware that the Netflix dataset is no longer available. Possibly in response to privacy concerns. A demonstration of the utility of such concerns and their advocates.

Context of Data?

Wednesday, May 19th, 2010

Cristiana Bolchini and others in And What Can Context Do For Data? have started down an interesting path for exploration.

That all data exists in some context is an unremarkable observation until one considers how often that context can be stated, attributed to data, to say nothing of being used to filter or access that data.

Bolchini introduces the notion of a context dimension tree (CDT) which “models context in terms of a set of context dimensions, each capturing a different characteristic of the context.” (CACM, Nov. 2009, page 137) Note that dimensions can be decomposed into sub-trees for further analysis. Further operations combine these dimensions into the “context” of the data that is used to produce a particular view of the data.
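To make the idea concrete, here is a toy sketch of dimensions, sub-trees and a context-driven view. It is my own simplification, not the CDT formalism from the article and not the Context-ADDICT software; the dimension names, values and rows are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dimension:
    name: str
    # value name -> list of sub-Dimensions that refine that value
    values: dict = field(default_factory=dict)


# Hypothetical tree: "role" and "location" are top-level dimensions,
# and "tier" is a sub-dimension under the "customer" value of "role".
CDT = [
    Dimension("role", {
        "agent": [],
        "customer": [Dimension("tier", {"gold": [], "basic": []})],
    }),
    Dimension("location", {"office": [], "mobile": []}),
]


def allowed_pairs(dimensions):
    """Yield every (dimension, value) pair found anywhere in the tree."""
    for dim in dimensions:
        for value, subdims in dim.values.items():
            yield (dim.name, value)
            yield from allowed_pairs(subdims)


def is_valid_context(context, dimensions=CDT):
    """Check that each chosen value exists in the tree. (Simplified: it does
    not check that a sub-dimension sits under the chosen parent value.)"""
    pairs = set(allowed_pairs(dimensions))
    return all((d, v) in pairs for d, v in context.items())


# Hypothetical data, each row tagged with the context it belongs to.
ROWS = [
    {"doc": "price list", "role": "customer", "location": "mobile"},
    {"doc": "call script", "role": "agent", "location": "office"},
    {"doc": "store map", "role": "customer", "location": "office"},
]


def view(rows, context):
    """Keep rows that agree with every dimension the context names."""
    return [r["doc"] for r in rows
            if all(r.get(d) == v for d, v in context.items())]
```

With these rows, `view(ROWS, {"role": "customer"})` keeps the price list and the store map, and adding `"location": "mobile"` narrows the view to the price list alone.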

Not quite what is meant by scope in topic maps but something a bit more nuanced and subtle. I would argue (no surprise) that the context of a subject is part and parcel of its identity. And how much of that context we choose to represent will vary from project to project.

Further reading:

Bolchini, C., Curino, C. A., Quintarelli, E., Schreiber, F. A. and Tanca, L. A data-oriented survey of context models. SIGMOD Record, 2007.

Bolchini, C., Quintarelli, E. and Rossato, R. Relational data tailoring through view composition. In Proc. Intl. Conf. on Conceptual Modeling (ER’2007). LNCS. Nov. 2007

Context-ADDICT (it’s an acronym, I swear!) Website for the project developing this line of research. Prototype software available.

Context Is A Multi-Splendored Thing

Saturday, March 20th, 2010

Sven’s Dude, where’s my context? illustrates an interesting point about topic maps that is easy to overlook. He proposes to create a topic map that maps co-occurrences of terms to a topic and then uses that information as part of a search process.

Assume we had Sven’s topic map for a set of documents and we also had the results of a probe into the same set by coders who had sketched in some ways used to identify one or more subjects. Perhaps even the results of several probes into the set by different sets of coders. (Can anyone say, “different legal teams looking at the same document collection?”)

Each set of coders or team may be using different definitions of context to identify subjects. And, quite likely, they will be identifying the same subjects, albeit based on different contexts.

If team A discovers that the subject “Patrick Durusau” always uses the term “proxy,” as a technical term from an ISO standard, that information can inform all subsequent searches for that term. That is to say that as contexts are “established” for a document collection, subsequent searches can become more precise.

Expressed as a proposition: Topic maps enable cumulative exploration and mapping of information. (As opposed to where searches start at the beginning, again. You would think that would get tiresome.)
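A small sketch of what “cumulative” could look like in practice. The documents, the `Finding` record and the function names are hypothetical, and plain word matching stands in for whatever search engine you actually use.

```python
from typing import NamedTuple


class Finding(NamedTuple):
    author: str  # who uses the term
    term: str    # the term in question
    sense: str   # the sense a team established for that author's usage


# Hypothetical document collection.
DOCS = {
    "d1": {"author": "Patrick Durusau", "text": "the proxy stands in for a subject"},
    "d2": {"author": "Someone Else", "text": "configure the web proxy before browsing"},
    "d3": {"author": "Patrick Durusau", "text": "merging rules for topics"},
}


def search(term, findings=(), sense=None, docs=DOCS):
    """Plain term search, narrowed by established findings when a sense is given."""
    hits = [doc_id for doc_id, doc in docs.items() if term in doc["text"].split()]
    if sense is None:
        return hits
    authors = {f.author for f in findings if f.term == term and f.sense == sense}
    return [doc_id for doc_id in hits if docs[doc_id]["author"] in authors]


# Team A's discovery, recorded once and reused by every later search.
team_a = [Finding("Patrick Durusau", "proxy", "ISO technical term")]
```

A bare `search("proxy")` returns both d1 and d2. Passing `team_a` and the sense `"ISO technical term"` returns only d1, and every later team that adds findings makes the next search more precise instead of starting over.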