Archive for the ‘Diversity’ Category

Trump Wins! Trump Wins! A Diversity Lesson For Data Scientists

Wednesday, November 9th, 2016

Here’s Every Major Poll That Got Donald Trump’s Election Win Wrong by Brian Flood.

From the post:

When Donald Trump shocked the world to become the president-elect on Tuesday night, the biggest loser wasn’t his opponent Hillary Clinton, it was the polling industry that tricked America into thinking we’d be celebrating the first female president right about now.

The polls, which Trump has been calling inaccurate and rigged for months, made it seem like Clinton was a lock to occupy the White House come January.

Nate Silver’s FiveThirtyEight is supposed to specialize in data-based journalism, but the site reported on Tuesday morning that Clinton had a 71.4 percent chance of winning the election. The site was wrong about the outcome in major battleground states including Florida, North Carolina and Pennsylvania, and Trump obviously won the election in addition to the individual states that were supposed to vote Clinton. Silver wasn’t the only pollster to botch the 2016 election.

Trump’s victory should teach would-be data scientists this important lesson:

Diversity is important in designing data collection

Some of the reasons given for the failure of prediction in this election:

  1. People without regular voting records voted.
  2. People polled weren’t honest about their intended choices.
  3. Pollsters weren’t looking for a large, angry segment of the population.

All of which can be traced back to a lack of imagination/diversity in the preparation of the polling instruments.

Ironic, isn’t it?

Strive for diversity, including people whose ideas you find distasteful.

Such as vocal Trump supporters. (Substitute your favorite villain.)

Boosting (in Machine Learning) as a Metaphor for Diverse Teams [A Quibble]

Sunday, October 23rd, 2016

Boosting (in Machine Learning) as a Metaphor for Diverse Teams by Renee Teate.

Renee’s summary:

tl;dr: Boosting ensemble algorithms in Machine Learning use an approach that is similar to assembling a diverse team with a variety of strengths and experiences. If machines make better decisions by combining a bunch of “less qualified opinions” vs “asking one expert”, then maybe people would, too.

Very much worth your while to read at length, but to set up my quibble:


What a Random Forest does is build up a whole bunch of “dumb” decision trees by only analyzing a subset of the data at a time. A limited set of features (columns) from a portion of the overall records (rows) is used to generate each decision tree, and the “depth” of the tree (and/or size of the “leaves”, the number of examples that fall into each final bin) is limited as well. So the trees in the model are “trained” with only a portion of the available data and therefore don’t individually generate very accurate classifications.

However, it turns out that when you combine the results of a bunch of these “dumb” trees (also known as “weak learners”), the combined result is usually even better than the most finely-tuned single full decision tree. (So you can see how the algorithm got its name – a whole bunch of small trees, somewhat randomly generated, but used in combination is a random forest!)
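As a rough illustration of the quoted description, here is a minimal pure-Python sketch of the bagging idea behind a random forest: each "dumb" tree is a one-split decision stump trained on a bootstrap sample of the rows and a single randomly chosen column, and the forest answers by majority vote. The toy data, the function names, and the choice of stumps are all assumptions made for illustration; this is not Renee's code, and a real random forest uses deeper trees and more careful feature sampling.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Find the single threshold on one feature with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above in (0, 1):  # label predicted when the value is above the threshold
            errors = sum(
                (above if row[feature] > threshold else 1 - above) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above


def stump_predict(stump, row):
    feature, threshold, above = stump
    return above if row[feature] > threshold else 1 - above


def train_forest(rows, labels, n_trees, rng):
    """Train each stump on a bootstrap sample of rows and one random column."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        feature = rng.randrange(len(rows[0]))           # one random column per stump
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], feature)
        )
    return forest


def forest_predict(forest, row):
    """Majority vote across all stumps."""
    votes = Counter(stump_predict(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric columns, label is 1 when their sum exceeds 10.
rng = random.Random(0)
rows = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(200)]
labels = [1 if a + b > 10 else 0 for a, b in rows]

forest = train_forest(rows, labels, n_trees=25, rng=rng)
accuracy = sum(forest_predict(forest, r) == y for r, y in zip(rows, labels)) / len(rows)
print(f"forest of {len(forest)} stumps, training accuracy {accuracy:.2f}")
```

An odd number of stumps avoids tied votes on a two-class problem. Note how cheap it is to retrain, regroup, or throw away any one stump.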

All true but “weak learners” in machine learning are easily reconfigured, combined with different groups of other “weak learners,” or even discarded.

None of which is true for people who are hired to be part of a diverse team.

I don’t mean to discount Renee’s metaphor because I think it has much to recommend it, but diverse “weak learners” make poor decisions too.

Don’t take my word for it, watch the 2016 congressional election results.

Be sure to follow Renee on @BecomingDataSci. I’m interested to see how she develops this metaphor and where it leads.

Enjoy!

End The Lack Of Diversity On The Internet Today!

Saturday, January 16th, 2016

Julia Evans tweeted earlier today:

“programmers are 0.66% of internet users, and build the software that everyone uses” – @heddle317

The strengths of having diversity on teams, including software teams, are well known and I won’t repeat those arguments here.

See: Why Diverse Teams Create Better Work, Diversity and Work Group Performance, More Diverse Personalities Mean More Successful Teams, Managing Groups and Teams/Diversity, or, How Diversity Makes Us Smarter, for five entry points into the literature on diversity.

With 0.66% of internet users writing software for everyone, do you see the lack of diversity?

One response is to turn people into “Linus Torvalds” so we have a broader diversity of people programming. Good thought but I don’t know of anyone who wants to be a Linus Torvalds. (Sorry Linus.)

There’s a great benefit to having more people master programming but long-term, it’s not a solution to the lack of diversity in the production of software for the Internet.

Even if the number of people writing software for the Internet went up ten-fold, that’s only 6.6% of the population of Internet users. Far too monotone to qualify as any type of diversity.

There is another way to increase diversity in the production of Internet software.

Warnings: You will have to express your intuitive experience in words. You will have to communicate your experiences to programmers. Some programmers will think they know a “better way” for you to experience the interface. Always remember your experience is the “user’s” experience, unlike theirs.

You can use software built for the Internet, comment on it, track your comments and respond to comments from programmers. Programmers won’t seek you or your comments out, so volunteering is the only option.

Programmers have their views, but if software doesn’t meet the needs, habits and customs of users, it’s useless.

Programmers can only learn the needs, habits and customs of users from you.

Are you going to help end this lack of diversity, and help programmers write better software, or not?

IoT: The New Tower of Babel

Thursday, December 10th, 2015

Luke Anderson’s post at Clickhole, titled: Humanity Could Totally Pull Off The Tower Of Babel At This Point, was a strong reminder of the Internet of Things (IoT).

See what you think:

If you went to Sunday school, you know the story: After the Biblical flood, the people of earth came together to build the mighty Tower of Babel. Speaking with one language and working tirelessly, they built a tower so tall that God Himself felt threatened by it. So, He fractured their language so that they couldn’t understand each other, construction ceased, and mankind spread out across the ancient world.

We’ve come a long way in the few millennia since then, and at this point, humanity could totally pull off the Tower of Babel.

Just look at the feats of human engineering we’ve accomplished since then: the Great Wall; the Golden Gate Bridge; the Burj Khalifa. And don’t even get me started on the International Space Station. Building a single tall building? It’d be a piece of cake.

Think about it. Right off the bat, we’d be able to communicate with each other, no problem. Besides most of the world speaking either English, Spanish, and/or Chinese by now, we’ve got translators, Rosetta Stone, Duolingo, the whole nine yards. Hell, IKEA instructions don’t even have words and we have no problem putting their stuff together. I can see how a guy working next to you suddenly speaking Arabic would throw you for a loop a few centuries ago. But now, I bet we could be topping off the tower and storming heaven in the time it took people of the past to say “Hey, how ya doing?”

Compare this Internet of Things statement from the Masters of Contracts that Yield No Useful Result:


IoT implementation, at its core, is the integration of dozens and up to tens of thousands of devices seamlessly communicating with each other, exchanging information and commands, and revealing insights. However, when devices have different usage scenarios and operating requirements that aren’t compatible with other devices, the system can break down. The ability to integrate different elements or nodes within broader systems, or bringing data together to drive insights and improve operations, becomes more complicated and costly. When this occurs, IoT can’t reach its potential, and rather than an Internet of everything, you see siloed Internets of some things.

The first, in case you can’t tell from it being posted at Clickhole, was meant as sarcasm or humor.

The second was deadly serious from folks who would put a permanent siphon on your bank account. Whether their services are cost effective or not is up to you to judge.

The Tower of Babel is a statement about semantics and the human condition. It should come as no surprise that we all prefer our language over that of others, whether those are natural or programming languages. Moreover, judging from code reuse, to say nothing of the publishing market, we prefer our restatements of the material, despite equally useful statements by others.

How else would you explain the proliferation of MS Excel books? 😉 One really good one is more than enough. Ditto for Bible translations.

Creating new languages to “fix” semantic diversity just adds another partially adopted language to the welter of languages that need to be integrated.

The better option, at least from my point of view, is to create mappings between languages, mappings that are based on key/value pairs to enable others to build upon, contract or expand those mappings.

It simply isn’t possible to foresee every use case or language that needs semantic integration but if we perform such semantic integration as returns ROI for us, then we can leave the next extension or contraction of that mapping to the next person with a different ROI.
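A minimal sketch of what such key/value mappings might look like, assuming two hypothetical device vocabularies and a shared vocabulary (none of these names come from a real IoT standard). The point is only that a mapping held as plain key/value pairs can be extended or contracted by the next person without touching the original.

```python
# Hypothetical vocabularies: keys are one vendor's field names,
# values are the equivalent names in a shared vocabulary.
vendor_a_to_shared = {"temp_c": "temperature", "hum_pct": "humidity"}


def extend(mapping, additions):
    """Return a new mapping with extra pairs; the original is left untouched."""
    merged = dict(mapping)
    merged.update(additions)
    return merged


def contract(mapping, keep):
    """Return a new mapping restricted to the keys someone actually needs."""
    return {key: value for key, value in mapping.items() if key in keep}


def translate(record, mapping):
    """Rename the keys the mapping knows; pass unknown keys through unchanged."""
    return {mapping.get(key, key): value for key, value in record.items()}


# The next person, with a different ROI, adds one pair and drops another.
extended = extend(vendor_a_to_shared, {"batt_v": "battery_voltage"})
contracted = contract(extended, {"temp_c", "batt_v"})

reading = {"temp_c": 21.5, "hum_pct": 40, "batt_v": 3.1}
print(translate(reading, vendor_a_to_shared))
print(translate(reading, contracted))
```

Unmapped keys pass through unchanged, so a partial mapping still does useful work and shows exactly where the next extension is needed.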

It’s heady stuff to think we can cure the problem represented by the legendary Tower of Babel, but there is a name for that. It’s called hubris and it never leads to a good end.

“At least they don’t seal the fire exits”…

Friday, June 5th, 2015

“At least they don’t seal the fire exits” Or why unpaid internships are BS by Auriel M. V. Fournier.

From the post:

I’m flipping through a job board, scanning for post docs, dreamily reading field technician posts and there they are

Unpaid internship in Amazing Place A

Unpaid technician working with Cool Species B

Some are obvious, and put their unpaid status it in the title, others you have to dig through the fine print, before you are hit you over the head with what a ‘unique oppurtunity this internship is’ how rare the animal or system, and how you should smile and love that you are not going to get paid, and might even have to pay them for the pleasure of working for them.

Every time I see one of these posts my skin crawls, my heart races, my eyes narrow. These jobs anger me, at my core, and I think we as a scientific community need to stop doing this to ourselves and our young scientists.

We get up and talk about how we need diversity in our field (whatever field it is, for me its wildlife ecology) how we need people from all backgrounds, cultures, creeds and races. Then we create positions that only those who come from means, and continue to have them can take. We are shooting ourselves in the foot by excluding people from getting into science. How is someone who has student loans (most students do), someone who has no financial support, someone with a child, or a sick parent, no family to buy a plane ticket for them, or any other kind of life situation supposed to take these positions? How?
….

Take the time to read Auriel’s post, whether you use unpaid internships or not. It’s not long and worth the read. I will wait for you to come back before continuing….back so soon?

Abstract just a little bit from Auriel’s post and think about her main point separate and apart from the specifics of unpaid internships. Is it that unpaid work can be undertaken only by those who can survive without getting paid for that work? Yes?

If you agree with that, how many unpaid editors, unpaid journal board members, unpaid peer reviewers, unpaid copy editors, unpaid program unit chairs, unpaid presenters, unpaid organizational officers, etc., do you think exist in academic circles?

Hmmm, do you think the people in all those unpaid positions still have to make ends meet at the end of the month? Take care of expenses out of their own pockets for travel and other expenses? Do you think the utility company cares whether you have done a good job as a volunteer peer reviewer this past month?

The same logic that Auriel uses in her post applies to all those unpaid positions as well. Not that academic groups can make all unpaid volunteer positions paid but any unpaid or underpaid position means you have made choices about who can hold those positions.

Accidental vs Deliberate Context

Saturday, December 27th, 2014

Accidental vs Deliberate Context by Jessica Kerr.

From the post:

In all decisions, we bring our context with us. Layers of context, from what we read about that morning to who our heroes were growing up. We don’t realize how much context we assume in our communications, and in our code.

One time I taught someone how to make the Baby Vampire face. It involves poking out both corners of my lower lip, so they stick up like poky gums. Very silly. To my surprise, the person couldn’t do it. They could only poke one side of the lower lip out at a time.

Turns out, few outside my family can make this face. My mom can do it, my sister can do it, my daughters can do it – so it came as a complete surprise to me when someone couldn’t. There is a lip-flexibility that’s part of my context, always has been, and I didn’t even realize it.

Jessica goes on to illustrate that communication depends upon the existence of some degree of shared context and that additional context can be explained to others, as on a team.

She distinguishes between “incidental” shared contexts and “deliberate” shared contexts. Incidental contexts arise from family or long association with friends; common or shared experiences form an incidental context.

Deliberate contexts, on the other hand, are the intentional melding of a variety of contexts: in her examples, the contexts of biologists and programmers, who at the outset lacked a common context in which to communicate.

Forming teams with diverse backgrounds is a way to create a “deliberate” context, but my question would be how to preserve that “deliberate” context for others? It becomes an “incidental” context if others must join the team in order to absorb the previously “deliberate” context. If that is a requirement, then others will not be able to benefit from deliberately created contexts in which they did not participate.

If the process and decisions made in forming a “deliberate” context were captured by a topic map, then others could apply this “new” deliberate context to develop other “deliberate” contexts. Perhaps some of the decisions or mappings made would not suit another “deliberate” context but perhaps some would. And perhaps other “deliberate” contexts would evolve beyond the end of their inputs.

The point being that unless these “deliberate” contexts are captured, to whatever degree of granularity is desired, every “deliberate” context for say biologists and programmers is starting off at ground zero. Have you ever heard of a chemistry experiment starting off by recreating the periodic table? I haven’t. Perhaps we should abandon that model in the building of “deliberate” contexts as well.

Not to mention that re-usable “deliberate” contexts might enable greater diversity in teams.

Topic maps anyone?

PS: I suggest topic maps to capture “deliberate” context because topic maps are not constrained by logic. You can capture any subject and any relationship between subjects, logical or not. For example, a user of a modern dictionary, which lists words in alphabetical order, would be quite surprised if given a dictionary of Biblical Hebrew and asked to find a word (assuming they know the alphabet). The most common dictionaries of Biblical Hebrew list words by their roots and not as they appear to the common reader. There are arguments to be made for each arrangement but neither one is a “logical” answer.

The arrangement of dictionaries is another example of differing contexts. With a topic map I can offer a reader whichever Biblical Hebrew dictionary is desired, with only one text underlying both displays. As opposed to the printed version which can offer only one context or another.
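To make the two-displays-from-one-text idea concrete, here is a small sketch. It is not a topic map, just one underlying list of entries rendered two ways. The entries are illustrative transliterations with simplified glosses, and Latin alphabetical order stands in for Hebrew alphabetical order; both are assumptions made to keep the example short.

```python
from itertools import groupby

# One underlying text: each entry records the word as read and its root.
ENTRIES = [
    {"word": "melek", "root": "mlk", "gloss": "king"},
    {"word": "dabar", "root": "dbr", "gloss": "word"},
    {"word": "mamlakah", "root": "mlk", "gloss": "kingdom"},
    {"word": "midbar", "root": "dbr", "gloss": "wilderness"},
]


def alphabetical_view(entries):
    """Words in plain alphabetical order, as in a modern dictionary."""
    return [entry["word"] for entry in sorted(entries, key=lambda e: e["word"])]


def root_view(entries):
    """Words grouped under their roots, as in a root-ordered lexicon."""
    by_root = sorted(entries, key=lambda e: (e["root"], e["word"]))
    return {
        root: [entry["word"] for entry in group]
        for root, group in groupby(by_root, key=lambda e: e["root"])
    }


print(alphabetical_view(ENTRIES))
print(root_view(ENTRIES))
```

Neither function changes `ENTRIES`; the reader picks the arrangement at display time.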

Non-Moral Case For Diversity

Monday, July 21st, 2014

Groups of diverse problem solvers can outperform groups of high-ability problem solvers by Lu Hong and Scott E. Page.

Abstract:

We introduce a general framework for modeling functionally diverse problem-solving agents. In this framework, problem-solving agents possess representations of problems and algorithms that they use to locate solutions. We use this framework to establish a result relevant to group composition. We find that when selecting a problem-solving team from a diverse population of intelligent agents, a team of randomly selected agents outperforms a team comprised of the best-performing agents. This result relies on the intuition that, as the initial pool of problem solvers becomes large, the best-performing agents necessarily become similar in the space of problem solvers. Their relatively greater ability is more than offset by their lack of problem-solving diversity.

I have heard people say that diverse teams are better, but always in the context of contending for members of one group or another to be included on a team.

Reading the paper carefully, I don’t think that is the authors’ point at all.

From the conclusion:

The main result of this paper provides conditions under which, in the limit, a random group of intelligent problem solvers will outperform a group of the best problem solvers. Our result provides insights into the trade-off between diversity and ability. An ideal group would contain high-ability problem solvers who are diverse. But, as we see in the proof of the result, as the pool of problem solvers grows larger, the very best problem solvers must become similar. In the limit, the highest-ability problem solvers cannot be diverse. The result also relies on the size of the random group becoming large. If not, the individual members of the random group may still have substantial overlap in their local optima and not perform well. At the same time, the group size cannot be so large as to prevent the group of the best problem solvers from becoming similar. This effect can also be seen by comparing Table 1. As the group size becomes larger, the group of the best problem solvers becomes more diverse and, not surprisingly, the group performs relatively better.

A further implication of our result is that, in a problem-solving context, a person’s value depends on her ability to improve the collective decision (8). A person’s expected contribution is contextual, depending on the perspectives and heuristics of others who work on the problem. The diversity of an agent’s problem-solving approach, as embedded in her perspective-heuristic pair, relative to the other problem solvers is an important predictor of her value and may be more relevant than her ability to solve the problem on her own. Thus, even if we were to accept the claim that IQ tests, Scholastic Aptitude Test scores, and college grades predict individual problem-solving ability, they may not be as important in determining a person’s potential contribution as a problem solver as would be measures of how differently that person thinks. (emphasis added)

Some people accept gender, race, nationality, etc. as markers for thinking differently and no doubt that is true in some cases. But presuming it is just as uninformed as presuming no differences in how people of different gender, race, and nationalities think.

You could ask, for example by presenting candidates for a team with open-ended problems that are capable of multiple solutions. Group similar solutions together and then pick randomly across the solution groups.

You may have a gender, race, nationality diverse team but if they think the same way, say Antonin Scalia and Clarence Thomas, then your team isn’t usefully diverse.
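A minimal sketch of that selection step, assuming someone has already labeled each candidate's solution with the approach it took (the hard, human part). The candidate names and approach labels are hypothetical.

```python
import random
from collections import defaultdict


def pick_diverse_team(candidates, team_size, rng):
    """Group candidates by solution approach, then draw randomly across groups."""
    groups = defaultdict(list)
    for name, approach in candidates:
        groups[approach].append(name)

    approaches = sorted(groups)
    rng.shuffle(approaches)          # no approach is favored by position
    for members in groups.values():
        rng.shuffle(members)         # no candidate is favored within a group

    team = []
    # Take one person per approach in turn until the team is full
    # or every group is exhausted.
    while len(team) < team_size and any(groups.values()):
        for approach in approaches:
            if groups[approach] and len(team) < team_size:
                team.append(groups[approach].pop())
    return team


# Hypothetical candidates, each tagged with how they solved the open-ended problem.
CANDIDATES = [
    ("Ana", "brute force"), ("Ben", "brute force"), ("Cy", "brute force"),
    ("Dee", "analogy"), ("Eli", "analogy"),
    ("Fay", "decomposition"),
]

team = pick_diverse_team(CANDIDATES, team_size=3, rng=random.Random(1))
print(team)
```

With three approaches and a team of three, every approach is represented once, whichever individuals happen to be drawn.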

Diversity of thinking should be your goal, not diversity of markers of diversity.

I first saw this in a tweet by Chris Dixon.

Corporate Culture Clash:…

Monday, July 15th, 2013

Corporate Culture Clash: Getting Data Analysts and Executives to Speak the Same Language by Drew Rockwell

From the post:

A colleague recently told me a story about the frustration of putting in long hours and hard work, only to be left feeling like nothing had been accomplished. Architecture students at the university he attended had scrawled their frustrations on the wall of a campus bathroom…“I wanted to be an architect, but all I do is create stupid models,” wrote students who yearned to see their ideas and visions realized as staples of metropolitan skylines. I’ve heard similar frustrations expressed by business analysts who constantly face the same uphill battle. In fact, in a recent survey we did of 600 analytic professionals, some of the biggest challenges they cited were “getting MBAs to accept advanced methods”, getting executives to buy into the potential of analytics, and communicating with “pointy-haired” bosses.

So clearly, building the model isn’t enough when it comes to analytics. You have to create an analytics-driven culture that actually gets everyone paying attention, participating and realizing what analytics has to offer. But how do you pull that off? Well, there are three things that are absolutely critical to building a successful, analytics-driven culture. Each one links to the next and bridges the gap that has long divided analytics professionals and business executives.

Some snippets to attract you to this “must read:”

(…)
In the culinary world, they say you eat with your eyes before your mouth. A good visual presentation can make your mouth water, while a bad one can kill your appetite. The same principle applies when presenting data analytics to corporate executives. You have to show them something that stands out, that they can understand and that lets them see with their own eyes where the value really lies.
(…)
One option for agile integration and analytics is data discovery – a type of analytic approach that allows business people to explore data freely so they can see things from different perspectives, asking new questions and exploring new hypotheses that could lead to untold benefits for the entire organization.
(…)
If executives are ever going to get on board with analytics, the cost of their buy-in has to be significantly lowered, and the ROI has to be clear and substantial.
(…)

I did pick the most topic map “relevant” quotes but they are as valid for topic maps as any other approach.

Seeing from different perspectives sounds like on-the-fly merging to me.

How about you?

Detecting Semantic Overlap and Discovering Precedents…

Monday, July 8th, 2013

Detecting Semantic Overlap and Discovering Precedents in the Biodiversity Research Literature by Graeme Hirst, Nadia Talent, and Sara Scharf.

Abstract:

Scientific literature on biodiversity is longevous, but even when legacy publications are available online, researchers often fail to search it adequately or effectively for prior publications; consequently, new research may replicate, or fail to adequately take into account, previously published research. The mechanisms of the Semantic Web and methods developed in contemporary research in natural language processing could be used, in the near-term future, as the basis for a precedent-finding system that would take the text of an author’s early draft (or a submitted manuscript) and find potentially related ideas in published work. Methods would include text-similarity metrics that take different terminologies, synonymy, paraphrase, discourse relations, and structure of argumentation into account.

Footnote one (1) of the paper gives an idea of the problem the authors face:

“Natural history scientists work in fragmented, highly distributed and parochial communities, each with domain specific requirements and methodologies [Scoble 2008]. Their output is heterogeneous, high volume and typically of low impact, but with a citation half-life that may run into centuries” (Smith et al. 2009). “The cited half-life of publications in taxonomy is longer than in any other scientific discipline, and the decay rate is longer than in any scientific discipline” (Moritz 2005). Unfortunately, we have been unable to identify the study that is the basis for Moritz’s remark.

The paper explores in detail issues that have daunted various search techniques, when the material is available in electronic format at all.

The authors make a general proposal for addressing these issues, with mention of the Semantic Web but omit from their plan:

The other omission is semantic interpretation into a logical form, represented in XML, that draws on ontologies in the style of the original Berners-Lee, Hendler, and Lassila (2001) proposal for the Semantic Web. The problem with logical-form representation is that it implies a degree of precision in meaning that is not appropriate for the kind of matching we are proposing here. This is not to say that logical forms would be useless. On the contrary, they are employed by some approaches to paraphrase and textual entailment (section 4.1 above) and hence might appear in the system if only for that reason; but even so, they would form only one component of a broader and somewhat looser kind of semantic representation.

That’s the problem with the Semantic Web in a nutshell:

The problem with logical-form representation is that it implies a degree of precision in meaning that is not appropriate for the kind of matching we are proposing here.

What if I want to be logically precise sometimes but not others?

What if I want to be more precise in some places and less precise in others?

What if I want to have different degrees or types of imprecision?

With topic maps the question is: How im/precise do you want to be?

The Twitter of Babel: Mapping World Languages through Microblogging Platforms

Friday, December 21st, 2012

The Twitter of Babel: Mapping World Languages through Microblogging Platforms by Delia Mocanu, Andrea Baronchelli, Bruno Gonçalves, Nicola Perra, Alessandro Vespignani.

Abstract:

Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data “proxies” of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allows us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.

So, rather than languages made homogeneous on the surface, users can use their own natural, heterogeneous languages, which we can analyze as such?

Cool!

Semantic and linguistic heterogeneity has persisted from the original Tower of Babel until now.

The smart money will be riding on managing semantic and linguistic heterogeneity.

Other money can fund emptying the semantic ocean with a tea cup.

Why There Shouldn’t Be A Single Version Of The Truth

Sunday, December 16th, 2012

Why There Shouldn’t Be A Single Version Of The Truth by Chuck Hollis.

From the post:

Legacy thinking can get you in trouble in so many ways. The challenge is that — well — there’s so much of it around.

Maxims that seemed to make logical sense in one era quickly become the intellectual chains that hold so many of us back. Personally, I’ve come to enjoy blowing up conventional wisdom to make room for emerging realities.

I’m getting into more and more customer discussions with progressive IT organizations that are seriously contemplating building platforms and services that meet the broad goal of “analytically enabling the business” — business analytics as service, if you will.

The problem? The people in charge have done things a certain way for a very long time. And the new, emerging requirements are forcing them to go back and seriously reconsider some of their most deeply-held assumptions.

Like having “one version of the truth”. I’ve seen multiple examples of it get in the way of organizations who need to be doing more with their data.

As usual, a highly entertaining and well illustrated essay from Chuck.

Chuck makes the case for enough uniformity to enable communication but enough diversity to generate new ideas and interesting discussions.

The Shades of Time Project

Thursday, April 26th, 2012

The Shades of TIME project by Drew Conway.

Drew writes:

A couple of days ago someone posted a link to a data set of all TIME Magazine covers, from March, 1923 to March, 2012. Of course, I downloaded it and began thumbing through the images. As is often the case when presented with a new data set I was left wondering, “What can I ask of the data?”

After thinking it over, and with the help of Trey Causey, I came up with, “Have the faces of those on the cover become more diverse over time?” To address this questions I chose to answer something more specific: Has the color values of skin tones in faces on the covers changed over time?

I developed a data visualization tool, I’m calling the Shades of TIME, to explore the answer to that question.

An interesting data set and an illustration of why topic map applications are more useful if they have dynamic merging (user selected).

Presented with the same evidence, the covers of TIME magazine, I most likely would have:

  • Mapped people on the covers to historical events
  • Mapped people on the covers to additional historical resources
  • Mapped covers into library collections
  • etc.

I would not have set out to explore the diversity in skin color on the covers. In part because I remember when it changed. That is part of my world knowledge. I don’t have to go looking for evidence of it.

My purpose isn’t to say authors, even topic map authors, should avoid having a point of view. That isn’t possible in any event. What I am suggesting is that, to the extent possible, users be enabled to impose their views on a topic map as well.
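As a loose illustration of user-selected merging, the sketch below lets the user supply the rule that decides which records count as "the same thing". The records are placeholders, not actual TIME cover data, and real topic map merging is considerably richer than grouping by a key; treat this as an assumption-laden toy.

```python
from collections import defaultdict

# Placeholder records; the people and events are deliberately anonymous.
COVERS = [
    {"issue": "1950-01", "person": "Person A", "event": "Event X"},
    {"issue": "1957-06", "person": "Person B", "event": "Event X"},
    {"issue": "1962-03", "person": "Person A", "event": "Event Y"},
]


def merge_by(records, key_fn):
    """Merge records that share a user-chosen key; return issues per key."""
    merged = defaultdict(list)
    for record in records:
        merged[key_fn(record)].append(record["issue"])
    return dict(merged)


# Three users, three views of the same covers.
by_person = merge_by(COVERS, lambda r: r["person"])
by_event = merge_by(COVERS, lambda r: r["event"])
by_decade = merge_by(COVERS, lambda r: r["issue"][:3] + "0s")

print(by_person)
print(by_event)
print(by_decade)
```

The author's view is whichever `key_fn` ships as the default; the user's view is whichever one they pass in.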

SEALS – Community Page

Sunday, January 29th, 2012

SEALS – Semantic Evaluation At Large Scale – Community Page

The community page was added after my first post on the SEALS project.

The next community event:

SEALS to present evaluation results at ESWC 2012

SEALS is pleased to announce that the workshop Evaluation of Semantic Technologies (IWEST 2012) has been confirmed to take place at the leading semantic web conference, ESWC (Extended Semantic Web Conference) 2012, scheduled to take place May 27-31, 2012 in beautiful Crete, Greece.

This workshop will be a venue for researchers and tool developers, firstly, to initiate discussion about the current trends and future challenges of evaluating semantic technologies. Secondly, to support communication and collaboration with the goal of aligning the various evaluation efforts within the community and accelerating innovation in all the associated fields as has been the case with both the TREC benchmarks in information retrieval and the TPC benchmarks in database research.

A call for papers will be published soon. All SEALS community members and evaluation campaign participants are especially encouraged to submit and participate.

If you attend, I am particularly interested in the results of the discussion about “aligning the various evaluation efforts within the community….”

I say that because when the project started, the “about” page reported:

This is a very active research area, currently supported by more than 3000 individuals integrated in 360 organisations which have produced around 700 tools, but still suffers from a lack of standard benchmarks and infrastructures for assessing research outcomes. Due to its physically boundless nature, it remains relatively disorganized and lacks common grounds for assessing research and technological outcomes.

Sounds untidy, even diverse, doesn’t it? 😉

To tell the truth, I am not bothered by the repetition of semantic diversity in efforts to reduce semantic diversity. I find it refreshing that our languages burst the bonds that would be imposed upon them on a regular basis. Tyrants of thought, social, political and economic arrangements, the well- and the ill-intended, all fail. (Some last longer than others but on a historical time scale, the governments of the East and West are ephemera. Their peoples, the originators of language and semantics, will persist.)

We can reduce semantic diversity when it is needful, or account for it, but even those efforts, as SEALS points out, exhibit the same semantic diversity as the area they purport to address.

ACM RecSys 2011 Workshop on Novelty and Diversity in Recommender Systems

Tuesday, December 13th, 2011

DiveRS 2011 – ACM RecSys 2011 Workshop on Novelty and Diversity in Recommender Systems

From the conference page:

Most research and development efforts in the Recommender Systems field have been focused on accuracy in predicting and matching user interests. However there is a growing realization that there is more than accuracy to the practical effectiveness and added-value of recommendation. In particular, novelty and diversity have been identified as key dimensions of recommendation utility in real scenarios, and a fundamental research direction to keep making progress in the field.

Novelty is indeed essential to recommendation: in many, if not most scenarios, the whole point of recommendation is inherently linked to a notion of discovery, as recommendation makes most sense when it exposes the user to a relevant experience that she would not have found, or thought of by herself –obvious, however accurate recommendations are generally of little use.

Not only does a varied recommendation provide in itself for a richer user experience. Given the inherent uncertainty in user interest prediction –since it is based on implicit, incomplete evidence of interests, where the latter are moreover subject to change–, avoiding a too narrow array of choice is generally a good approach to enhance the chances that the user is pleased by at least some recommended item. Sales diversity may enhance businesses as well, leveraging revenues from market niches.

It is easy to increase novelty and diversity by giving up on accuracy; the challenge is to enhance these aspects while still achieving a fair match of the user’s interests. The goal is thus generally to enhance the balance in this trade-off, rather than just a diversity or novelty increase.

DiveRS 2011 aims to gather researchers and practitioners interested in the role of novelty and diversity in recommender systems. The workshop seeks to advance towards a better understanding of what novelty and diversity are, how they can improve the effectiveness of recommendation methods and the utility of their outputs. We aim to identify open problems, relevant research directions, and opportunities for innovation in the recommendation business. The workshop seeks to stir further interest for these topics in the community, and stimulate the research and progress in this area.
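The accuracy/diversity trade-off described above can be illustrated with a greedy re-ranking in the style of maximal marginal relevance. This is a sketch under my own assumptions, not a method taken from the workshop: the relevance scores, the item tags, the `rerank` function and its `trade_off` parameter are all made up for illustration.

```python
def jaccard(a, b):
    """Similarity between two tag sets (0 = disjoint, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank(relevance, tags, k, trade_off=0.5):
    """Greedily pick k items, trading relevance against redundancy.

    trade_off = 1.0 ranks by relevance alone; lower values favor diversity.
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(item):
            # Redundancy: similarity to the closest item already selected.
            redundancy = max(
                (jaccard(tags[item], tags[s]) for s in selected), default=0.0
            )
            return trade_off * relevance[item] - (1 - trade_off) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical predicted relevance and tags for four items.
relevance = {"a": 0.9, "b": 0.85, "c": 0.6, "d": 0.3}
tags = {
    "a": {"jazz", "piano"},
    "b": {"jazz", "piano"},
    "c": {"folk", "guitar"},
    "d": {"jazz", "vocal"},
}

accuracy_only = rerank(relevance, tags, k=2, trade_off=1.0)
balanced = rerank(relevance, tags, k=2, trade_off=0.5)
```

With relevance alone, the two near-identical jazz piano items fill the list. With an even trade-off, the second slot goes to the less relevant but different folk item.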

The abstract from “Fusion-based Recommender System for Improving Serendipity” by Kenta Oku and Fumio Hattori reads:

Recent work has focused on new measures that are beyond the accuracy of recommender systems. Serendipity, which is one of these measures, is defined as a measure that indicates how the recommender system can find unexpected and useful items for users. In this paper, we propose a Fusion-based Recommender System that aims to improve the serendipity of recommender systems. The system is based on the novel notion that the system finds new items, which have the mixed features of two user-input items, produced by mixing the two items together. The system consists of item-fusion methods and scoring methods. The item-fusion methods generate a recommendation list based on mixed features of two user-input items. Scoring methods are used to rank the recommendation list. This paper describes these methods and gives experimental results.

Interested yet? 😉
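For a rough feel of the idea, here is a toy sketch of "mix two items, then rank what lies near the mixture." It is emphatically not the authors' item-fusion or scoring methods, which the abstract does not spell out. Averaging feature vectors, ranking by cosine similarity, and the item data are all my assumptions.

```python
import math


def fuse(x, y):
    """Mix two feature vectors by averaging them (an assumed fusion rule)."""
    return [(a + b) / 2 for a, b in zip(x, y)]


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def recommend(items, first, second, k=2):
    """Rank the items other than the two inputs by closeness to their mixture."""
    target = fuse(items[first], items[second])
    candidates = [name for name in items if name not in (first, second)]
    ranked = sorted(
        candidates, key=lambda name: cosine(items[name], target), reverse=True
    )
    return ranked[:k]


# Hypothetical items with features [mystery, romance, history].
items = {
    "mystery_novel": [1.0, 0.0, 0.0],
    "romance_novel": [0.0, 1.0, 0.0],
    "romantic_mystery": [0.7, 0.7, 0.0],
    "history_text": [0.0, 0.0, 1.0],
    "historical_romance": [0.0, 0.6, 0.8],
}

top = recommend(items, "mystery_novel", "romance_novel", k=2)
```

Here the user supplies a mystery and a romance, and the item that blends both ranks first.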