Archive for the ‘Serendipity’ Category

Six Degrees of Wikipedia – Eye Candy or Opportunity for Serendipity?

Wednesday, February 28th, 2018

Six Degrees of Wikipedia

As the name implies, Six Degrees of Wikipedia finds the shortest path between two Wikipedia pages. One definition of serendipity reads in part:

In general, serendipity is the act of finding something valuable or delightful when you are not looking for it. In information technology, serendipity often plays a part in the recognition of a new product need or in solving a design problem. Web surfing can be an occasion for serendipity since you sometimes come across a valuable or interesting site when you are looking for something else.

Serendipity requires exposure to things you aren’t looking for. Search engines excel at that, but their results are so noisy that serendipity is a rare occurrence.

Six Degrees of Wikipedia may have a different result.

First and foremost, humans created the links, for reasons unknown, that form the six degrees of separation. The resulting six degrees is a snapshot of human input from dozens, if not hundreds, of human actors, all of whom had unknown motivations.

Second, the limitation to six degrees results in a graph and nodes that can be absorbed in a glance.

Compare to the “I can make big and dense graphs” so typical in the “analysis” of social media results. (Hint: If any US government agency is asking, “The Russians did it.” is the correct response. Gin up supporting data on your own.)

Six degrees between topics would make a fascinating way to explore a topic map, especially one that merged topics from different domains. Randomly select labels to appear alongside those more familiar to a user. Provoke serendipity!
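
A minimal sketch of the shortest-path idea, assuming the links are held in a plain dictionary. This is not how Six Degrees of Wikipedia is implemented (I have not looked); it is a generic breadth-first search with a six-link cutoff. The function name, the toy graph and every page name in it are invented for illustration.

```python
from collections import deque


def shortest_path(links, start, goal, max_degrees=6):
    """Breadth-first search over a dict of page -> list of linked pages.

    Returns the shortest list of pages from start to goal, or None when
    no path exists within max_degrees links.
    """
    if start == goal:
        return [start]
    parents = {start: None}          # page -> page we reached it from
    queue = deque([(start, 0)])      # (page, number of links followed so far)
    while queue:
        page, depth = queue.popleft()
        if depth == max_degrees:
            continue                 # do not follow links past the cutoff
        for nxt in links.get(page, []):
            if nxt in parents:
                continue             # already reached by a path at least as short
            parents[nxt] = page
            if nxt == goal:
                # Walk the parent pointers back to the start.
                path = [nxt]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append((nxt, depth + 1))
    return None


# Toy link graph: hypothetical pages and links, not real Wikipedia data.
TOY_LINKS = {
    "Topic map": ["Library", "Graph theory"],
    "Library": ["Serendipity", "Book"],
    "Graph theory": ["Six degrees of separation"],
    "Serendipity": ["Sri Lanka"],
    "Book": [],
    "Six degrees of separation": ["Wikipedia"],
}

print(shortest_path(TOY_LINKS, "Topic map", "Sri Lanka"))
```

The same search would run over topics in a topic map if the dictionary held associations between topics instead of page links.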

Serendipity in the Stacks:…

Wednesday, January 7th, 2015

Serendipity in the Stacks: Libraries, Information Architecture, and the Problems of Accidental Discovery by Patrick L. Carr.


Serendipity in the library stacks is generally regarded as a positive occurrence. While acknowledging its benefits, this essay draws on research in library science, information systems, and other fields to argue that, in two important respects, this form of discovery can be usefully framed as a problem. To make this argument, the essay examines serendipity both as the outcome of a process situated within the information architecture of the stacks and as a user perception about that outcome.

A deeply dissatisfying essay on serendipity as evidenced in the author’s conclusion that reads in part:

While acknowledging the validity of Morville’s points, I nevertheless believe that, along with its positive aspects, serendipity in the stacks can be usefully framed as a problem. From a process-based standpoint, serendipity is problematic because it is an indicator of a potential misalignment between user intention and process outcome. And, from a perception-based standpoint, serendipity is problematic because it can encourage user-constructed meanings for libraries that are rooted in opposition to change rather than in users’ immediate and evolving information needs.

To illustrate the “…potential misalignment between user intention and process outcome,” Carr describes a user looking for a specific volume by call number. The book is missing from its location, and its absence leads to the discovery of an even more useful book nearby. Carr describes that as:

Even if this information were to prove to be more valuable to the user than the information in the book that was sought, the user’s serendipitous discovery nevertheless signifies a misalignment of user intention and process outcome.

Sorry, that went by rather quickly. If the user considers the discovery to be a favorable outcome, why should we take Carr’s word that it “signifies a misalignment of user intention and process outcome?” What other measure for success should an information retrieval system have other than satisfaction of its users? What other measure would be meaningful?

Carr refuses to consider how libraries could seek to maximize what is seen as a positive experience by users because:

By situating the library as a tool that functions to facilitate serendipitous discovery in the stacks, librarians risk also situating the library as a mechanism that functions as a symbolic antithesis to the tools for discovery that are emerging in online environments. In this way, the library could signify a kind of bastion against change. Rather than being cast as a vital tool for meeting discovery needs in emergent online environments, the library could be marginalized in a way that suggests to users that they perceive it as a means of retreat from online environments.

I don’t doubt the same people who think librarians are superfluous since “everyone can find what they need on the Internet” would be quick to see libraries as “bastion[s] against change.” For any number of reasons. But the opinions of semi-literates should not dictate library policy.

What Carr fails to take into account is that a stacks “environment,” which he concedes does facilitate serendipitous discovery, can be replicated in digital space.

For example, while it is currently a prototype, StackLife at Harvard is an excellent demonstration of a virtual stack environment.


Jonathan Zittrain, Vice-Dean for Library and Information Resources, Harvard Law School; Professor of Law at Harvard Law School and the Harvard Kennedy School of Government; Professor of Computer Science at the Harvard School of Engineering and Applied Sciences; Co-founder of the Berkman Center for Internet & Society, nominated StackLife for the Stanford Prize for Innovation in Research Libraries, saying in part:

  • It always shows a book (or other item) in a context of other books.
  • That context is represented visually as a scrollable stack of items — a shelf rotated so that users can more easily read the information on the spines.
  • The stack integrates holdings from multiple libraries.
  • That stack is sorted by “StackScore,” a measure of how often the library’s community has used a book. At the Harvard Library installation, the computation includes ten year aggregated checkouts weighted by faculty, grad, or undergrad; number of holdings in the 73 campus libraries, times put on reserve, etc.
  • The visualization is simple and clean but also information-rich. (a) The horizontal length of the book reflects the physical book’s height. (b) The vertical height of the book in the stack represents its page count. (c) The depth of the color blue of the spine indicates its StackScore; a deeper blue means that the work is more often used by the community.
  • When clicked, a work displays its Library of Congress Subject Headings (among other metadata). Clicking one of those headings creates a new stack consisting of all the library’s items that share that heading.
  • If there is a Wikipedia page about that work, Stacklife also displays the Wikipedia categories on that page, and lets the user explore by clicking on them.
  • Clicking on a work creates an information box that includes bibliographic information, real-time availability at the various libraries, and, when available: (a) the table of contents; (b) a link to Google Books’ online reader; (c) a link to the Wikipedia page about that book; (d) a link to any National Public Radio audio about the work; (e) a link to the book’s page at Amazon.
  • Every author gets a page that shows all of her works in the library in a virtual stack. The user can click to see any of those works on a shelf with works on the same topic by other authors.
  • Stacklife is scalable, presenting enormous collections of items in a familiar way, and enabling one-click browsing, faceting, and subject-based clustering.
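
The nomination lists the ingredients of StackScore but not the formula. A rough sketch of how such a usage score could be computed and used to order a stack, assuming a simple weighted sum: the weights, the `UsageRecord` type and the function names are all my inventions, not StackLife's.

```python
from dataclasses import dataclass

# Every weight here is invented for illustration; the real values are not given.
CHECKOUT_WEIGHTS = {"faculty": 3.0, "grad": 2.0, "undergrad": 1.0}
HOLDING_WEIGHT = 1.0
RESERVE_WEIGHT = 2.0


@dataclass
class UsageRecord:
    checkouts: dict   # patron type -> aggregated checkout count
    holdings: int     # number of libraries holding the work
    reserves: int     # times the work was put on reserve


def usage_score(rec):
    """Weighted sum of the usage signals named in the nomination."""
    weighted_checkouts = sum(
        CHECKOUT_WEIGHTS.get(patron, 0.0) * count
        for patron, count in rec.checkouts.items()
    )
    return (weighted_checkouts
            + HOLDING_WEIGHT * rec.holdings
            + RESERVE_WEIGHT * rec.reserves)


def sort_stack(records):
    """Order (title, UsageRecord) pairs from most used to least used."""
    return sorted(records, key=lambda pair: usage_score(pair[1]), reverse=True)


# Hypothetical titles and counts.
stack = [
    ("Rarely borrowed", UsageRecord({"undergrad": 1}, holdings=1, reserves=0)),
    ("Course staple", UsageRecord({"faculty": 2, "grad": 1, "undergrad": 4},
                                  holdings=3, reserves=1)),
]
print([title for title, _ in sort_stack(stack)])
```

Whatever the real formula, the point stands: the score reflects what the library's community has done, not what the library thinks users should want.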

Does StackLife sound like a library “…that [is] rooted in opposition to change rather than in users’ immediate and evolving information needs”?

I can’t speak for you but it doesn’t sound that way to me. It sounds like a library that isn’t imposing its definition of satisfaction upon users (good for Harvard) and that is working to blend the familiar with new to the benefit of its users.

We can only hope that College & Research Libraries will have a response from the StackLife project to Carr’s essay in the same issue.

PS: If you have library friends who don’t read this blog, please forward a link to this post to their attention. I know they are consumed with their current tasks but the StackLife project is one they need to be aware of. Thanks!

I first saw the essay on Facebook in a posting by Simon St.Laurent.

Penguins in Sweaters…

Sunday, November 3rd, 2013

Penguins in Sweaters, or Serendipitous Entity Search on User-generated Content by Ilaria Bordino, Yelena Mejova and Mounia Lalmas.


In many cases, when browsing the Web users are searching for specific information or answers to concrete questions. Sometimes, though, users find unexpected, yet interesting and useful results, and are encouraged to explore further. What makes a result serendipitous? We propose to answer this question by exploring the potential of entities extracted from two sources of user-generated content – Wikipedia, a user-curated online encyclopedia, and Yahoo! Answers, a more unconstrained question/answering forum – in promoting serendipitous search. In this work, the content of each data source is represented as an entity network, which is further enriched with metadata about sentiment, writing quality, and topical category. We devise an algorithm based on lazy random walk with restart to retrieve entity recommendations from the networks. We show that our method provides novel results from both datasets, compared to standard web search engines. However, unlike previous research, we find that choosing highly emotional entities does not increase user interest for many categories of entities, suggesting a more complex relationship between topic matter and the desirable metadata attributes in serendipitous search.
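
For readers who have not met a “lazy random walk with restart,” here is a minimal sketch of the general technique, not the authors' algorithm. I am assuming an unweighted graph stored as a dictionary in which every neighbor also appears as a key, and the `restart` and `laziness` values are arbitrary. The entity names are made up.

```python
def lazy_rwr(graph, seed, restart=0.15, laziness=0.5, iterations=100):
    """Score nodes by a lazy random walk with restart from seed.

    At each step the walker jumps back to seed with probability `restart`.
    Otherwise it stays put with probability `laziness` or moves to a
    neighbor chosen uniformly at random.
    """
    scores = {node: 0.0 for node in graph}
    scores[seed] = 1.0               # the walker starts at the seed
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, mass in scores.items():
            nxt[seed] += restart * mass          # restart step
            walk = (1.0 - restart) * mass
            neighbors = graph[node]
            if not neighbors:
                nxt[node] += walk                # nowhere to go: stay
                continue
            nxt[node] += laziness * walk         # lazy step: stay put
            share = (1.0 - laziness) * walk / len(neighbors)
            for neighbor in neighbors:
                nxt[neighbor] += share           # move to a neighbor
        scores = nxt
    return scores


# Hypothetical entity network; names and edges are invented.
ENTITIES = {
    "penguin": ["sweater", "antarctica"],
    "sweater": ["penguin", "wool"],
    "antarctica": ["penguin"],
    "wool": ["sweater"],
    "linux": [],
}

scores = lazy_rwr(ENTITIES, "penguin")
ranked = sorted((n for n in scores if n != "penguin"),
                key=scores.get, reverse=True)
print(ranked)
```

Entities near the seed score higher than distant ones, and entities with no connection to the seed score zero. The paper enriches its networks with sentiment, quality and category metadata, none of which this sketch attempts.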

From the introduction:

A system supporting serendipity must provide results that are surprising, semantically cohesive, i.e., relevant to some information need of the user, or just interesting. In this paper, we tackle the question of what makes a result serendipitous.

Serendipity, now that would make a very interesting product demonstration!

In particular if the search results were interesting to the client.

I must admit when I saw the first part of the title I was expecting an article on Linux. 😉


JChemInf volumes as single PDFs

Saturday, July 13th, 2013

JChemInf volumes as single PDFs by Egon Willighagen.

From the post:

One of the advantages of a print journal is that you are effectively forced to look at papers which may not have received your attention in the first place. Online journals do not provide such functionality, and you’re stuck with the table of contents, and never see that cool figure from that paper with the boring title.

Of course, the problem is artificial. We have pdftk and we can make PDF of issues, or in the present example, of complete volumes. Handy, I’d say. It saves you from many, many downloads and forces you to scan through all pages. Anyway, I wanted to scan the full JChemInf volumes, and rather have one PDF per volume. So, I created them. And you can get them too. The journal is Open Access after all (CC-BY).
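
A sketch of the merge Egon describes, assuming pdftk is installed and on the PATH. It relies on pdftk's `cat` operation (`pdftk in1.pdf in2.pdf cat output out.pdf`); the function names and file names are hypothetical, and this is not Egon's own script.

```python
import subprocess


def pdftk_merge_command(article_pdfs, volume_pdf):
    """Build the pdftk command line that concatenates PDFs in order."""
    if not article_pdfs:
        raise ValueError("need at least one input PDF")
    return ["pdftk", *article_pdfs, "cat", "output", volume_pdf]


def merge_volume(article_pdfs, volume_pdf):
    """Run pdftk; raises CalledProcessError if pdftk reports failure."""
    subprocess.run(pdftk_merge_command(article_pdfs, volume_pdf), check=True)


# Hypothetical file names, shown without running pdftk.
print(pdftk_merge_command(["article-01.pdf", "article-02.pdf"], "volume-1.pdf"))
```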


Egon has links to the Journal of Cheminformatics (as complete volumes), vols. 1 – 4.

He also has a good point about print journals increasing the potential for a chance encounter with unexpected information.

Personalization of search results is a step away from serendipity.

Thoughts on how to step back towards serendipity?

A long and winding road (….introducing serendipity into music recommendation)

Wednesday, April 25th, 2012

Auralist: introducing serendipity into music recommendation


Recommendation systems exist to help users discover content in a large body of items. An ideal recommendation system should mimic the actions of a trusted friend or expert, producing a personalised collection of recommendations that balance between the desired goals of accuracy, diversity, novelty and serendipity. We introduce the Auralist recommendation framework, a system that – in contrast to previous work – attempts to balance and improve all four factors simultaneously. Using a collection of novel algorithms inspired by principles of “serendipitous discovery”, we demonstrate a method of successfully injecting serendipity, novelty and diversity into recommendations whilst limiting the impact on accuracy. We evaluate Auralist quantitatively over a broad set of metrics and, with a user study on music recommendation, show that Auralist‘s emphasis on serendipity indeed improves user satisfaction.

A deeply interesting article for anyone interested in recommendation systems and the improvement thereof.

It is research that should go forward but among my concerns about the article:

1) I am not convinced of the definition of “serendipity:”

Serendipity represents the “unusualness” or “surprise” of recommendations. Unlike novelty, serendipity encompasses the semantic content of items, and can be imagined as the distance between recommended items and their expected contents. A recommendation of John Lennon to listeners of The Beatles may well be accurate and novel, but hardly constitutes an original or surprising recommendation. A serendipitous system will challenge users to expand their tastes and hopefully provide more interesting recommendations, qualities that can help improve recommendation satisfaction [23]

Or perhaps I am “hearing” it in the context of discovery. Such as searching for Smokestack Lightning and not finding the Yardbirds but Howlin’ Wolf as the performer. Serendipity in that sense does not carry any sense of “challenge.”

2) A survey of 21 participants, mostly students, is better than experimenters asking each other for feedback but only just. The social sciences department should be able to advise on test protocols and procedures.

3) There was no showing that “user satisfaction,” the item to be measured, is the same thing as “serendipity.” I am not entirely sure that other than by example, “serendipity” can even be discussed, let alone measured.

Take my Howlin’ Wolf example. How close or far away is the “serendipity” there versus an instance of “serendipity” as offered by Auralist? Unless and until we can establish a metric, at least a loose one, it is hard to say which one has more “serendipity.”
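
To make “a loose one” concrete, here is one possible metric, sketched under heavy assumptions: each item is a bag of tags, and “surprise” is one minus the average cosine similarity between a recommendation and the listener's history. I am not claiming this is Auralist's measure, and the tags below are invented.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse tag-weight dictionaries."""
    dot = sum(weight * b.get(tag, 0.0) for tag, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def surprise(history, candidate):
    """One minus the mean similarity of candidate to the listening history."""
    if not history:
        raise ValueError("history must not be empty")
    mean_similarity = sum(
        cosine_similarity(item, candidate) for item in history
    ) / len(history)
    return 1.0 - mean_similarity


# Hypothetical tag vectors.
history = [{"beatles": 1.0, "rock": 1.0}]
john_lennon = {"beatles": 1.0, "rock": 1.0, "solo": 1.0}
howlin_wolf = {"blues": 1.0, "chicago": 1.0}

print(surprise(history, john_lennon))
print(surprise(history, howlin_wolf))
```

Note what this does and does not measure. It scores unfamiliarity, so Howlin’ Wolf outscores John Lennon for a Beatles listener, but it says nothing about whether the discovery was valuable to the user, which is my third concern above.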

No Datum is an Island of Serendip

Wednesday, November 30th, 2011

No Datum is an Island of Serendip by Jim Harris.

From the post:

Continuing a series of blog posts inspired by the highly recommended book Where Good Ideas Come From by Steven Johnson, in this blog post I want to discuss the important role that serendipity plays in data — and, by extension, business success.

Let’s start with a brief etymology lesson. The origin of the word serendipity, which is commonly defined as a “happy accident” or “pleasant surprise” can be traced to the Persian fairy tale The Three Princes of Serendip, whose heroes were always making discoveries of things they were not in quest of either by accident or by sagacity (i.e., the ability to link together apparently innocuous facts to come to a valuable conclusion). Serendip was an old name for the island nation now known as Sri Lanka.

“Serendipity,” Johnson explained, “is not just about embracing random encounters for the sheer exhilaration of it. Serendipity is built out of happy accidents, to be sure, but what makes them happy is the fact that the discovery you’ve made is meaningful to you. It completes a hunch, or opens up a door in the adjacent possible that you had overlooked. Serendipitous discoveries often involve exchanges across traditional disciplines. Serendipity needs unlikely collisions and discoveries, but it also needs something to anchor those discoveries. The challenge, of course, is how to create environments that foster these serendipitous connections.”

I don’t disagree about the importance of serendipity but I do wonder about the degree to which we can plan or even facilitate it. At least in terms of software/interfaces, etc.

Remember Malcolm Gladwell and The Tipping Point? It’s a great read but there is one difficulty that I don’t think Malcolm dwells on enough. It is one thing to pick out tipping points (or alleged ones) in retrospect. It is quite another to pick out a tipping point before it occurs and to plan to take advantage of it. There are any number of rationalist explanations for various successes, but they are all after-the-fact constructs that serve particular purposes.

I do think we can make serendipity more likely by exposing people to a variety of information that makes the realization of connections between information more likely. That isn’t to say that serendipity will happen, just that we can create circumstances for people that will make the conditions ripe for it.

Serendipity Is Not An Intent

Tuesday, November 15th, 2011

Serendipity Is Not An Intent

From the post:

Wired had two amazing pieces on online advertising yesterday and while Felix Salmon’s piece The Future of Online Advertising could be Yieldbot’s manifesto it is the piece Can ‘Serendipity’ Be a Business Model? that deals more directly with our favorite topic, intent.


Twitter is the greatest discovery engine ever created on the web. But discovery can be and not be serendipitous. Sometimes, as Dorsey alludes to, you discover things you had no idea existed but much more often you discover things after you have intent around what you want to discover. This is an important differentiation for Twitter to consider. It’s important because it’s a different algorithm.

Discovery intent is not an algo about “how do we introduce you to something that would otherwise be difficult for you to find, but something that you probably have a deep interest in?” There is no “introduce” and “probably” in the discovery intent algo. Most importantly, there is no “we.” It’s an algo about “how do you discover what you’re interested in.”

Discovering more about what you’re interested in has always been Twitter’s greatest strength. It leverages both user-defined inputs and the rich content streams where context and realtime matching can occur. Just like Search.

If Twitter wants to build a discovery system for advertising it should look like this. (emphasis added)

This inverts the advertising and, when you think about it, the search algorithm. Rather than discovering, poorly, what interests the user or answers a question, enable the user to discover (a pull model) what interests them.

Completely different way of thinking about advertising and search.

Priesthood of the user? Worked (depending on who you ask) a long time ago.

Maybe, just maybe, a service architecture based on that as a goal, could disrupt the current “I know better than you” push models for search and advertising.

Is Precision the Enemy of Serendipity?

Wednesday, September 28th, 2011

I was reading claims of increased precision by software X the other day. I probably have mentioned this before (and it wasn’t original, then or now) that precision seems to me to be the enemy of serendipity.

For example, when I was an undergraduate, the library would display all the recent issues of journals on long angled shelves. So it was possible to walk along looking at the new issues in a variety of areas with ease. As a political science major I could have gone directly to journals on political science. But I would have missed the Review of Metaphysics and/or the Journal of the History of Ideas, both of which are rich sources of ideas relevant to topic maps (and information systems more generally).

But precision about the information available, a departmental page that links only to electronic versions of journals relevant to the “discipline,” reduces the opportunity to perhaps recognize relevant literature outside the confines of a discipline.

True, I still browse a lot, otherwise I would not notice titles like: k-means Approach to the Karhunen-Loéve Transform (aka PCA – Principal Component Analysis). I knew that k-means was a form of clustering that could help with gathering members of collective topics together but quite honestly did not recognize Karhunen-Loéve Transform. I know it as PCA – Principal Component Analysis, which I inserted in my blog title to help others recognize the technique.

Of course the problem is that sometimes I really want precision. Perhaps I am rushed to finish a job or need to find a reference for a standard, etc. In those cases I don’t have time to wade through a lot of search results and appreciate whatever (little) precision I can wring out of a search engine.

Whether I want more precision or more serendipity varies on a day to day basis for me. How about you?