Archive for the ‘Diversity’ Category

The Twitter of Babel: Mapping World Languages through Microblogging Platforms

Friday, December 21st, 2012

The Twitter of Babel: Mapping World Languages through Microblogging Platforms by Delia Mocanu, Andrea Baronchelli, Bruno Gonçalves, Nicola Perra, Alessandro Vespignani.

Abstract:

Large scale analysis and statistics of socio-technical systems that just a few short years ago would have required the use of consistent economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data “proxies” of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allows us to investigate different indicators such as the linguistic homogeneity of different countries, the touristic seasonal patterns within countries and the geographical distribution of different languages in multilingual regions. This work highlights the potential of geolocalized studies of open data sources to improve current analysis and develop indicators for major social phenomena in specific communities.

So, rather than superficially homogeneous languages, users can use their own natural, heterogeneous languages, which we can analyze as such?

Cool!

Semantic and linguistic heterogeneity has persisted from the original Tower of Babel until now.

The smart money will be riding on managing semantic and linguistic heterogeneity.

Other money can fund emptying the semantic ocean with a tea cup.
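To make "linguistic homogeneity" a little more concrete, here is a minimal sketch of one way such an indicator could be computed from language-tagged, geolocated posts. It is not the method used in the paper: the function name `language_profile`, the two measures (share of the dominant language and Shannon entropy) and the sample data are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def language_profile(posts):
    """Summarize language use per country.

    `posts` is an iterable of (country, language) pairs. The result maps
    each country to its most frequent language, that language's share of
    posts, and the Shannon entropy (in bits) of the language distribution.
    Higher entropy means a less homogeneous country.
    """
    counts = defaultdict(Counter)
    for country, language in posts:
        counts[country][language] += 1

    profile = {}
    for country, langs in counts.items():
        total = sum(langs.values())
        shares = [n / total for n in langs.values()]
        dominant, dominant_n = langs.most_common(1)[0]
        profile[country] = {
            "dominant": dominant,
            "dominant_share": dominant_n / total,
            "entropy_bits": -sum(p * log2(p) for p in shares),
        }
    return profile


# Made-up sample for illustration only, not data from the paper.
sample = [
    ("BE", "nl"), ("BE", "nl"), ("BE", "fr"), ("BE", "fr"),
    ("JP", "ja"), ("JP", "ja"), ("JP", "ja"), ("JP", "en"),
]
profile = language_profile(sample)
for country, stats in sorted(profile.items()):
    print(country, stats)
```

The point of keeping the per-country counts rather than a single label is the same one made above: the heterogeneity is retained and measured, not flattened away.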

Why There Shouldn’t Be A Single Version Of The Truth

Sunday, December 16th, 2012

Why There Shouldn’t Be A Single Version Of The Truth by Chuck Hollis.

From the post:

Legacy thinking can get you in trouble in so many ways. The challenge is that — well — there’s so much of it around.

Maxims that seemed to make logical sense in one era quickly become the intellectual chains that hold so many of us back. Personally, I’ve come to enjoy blowing up conventional wisdom to make room for emerging realities.

I’m getting into more and more customer discussions with progressive IT organizations that are seriously contemplating building platforms and services that meet the broad goal of “analytically enabling the business” — business analytics as service, if you will.

The problem? The people in charge have done things a certain way for a very long time. And the new, emerging requirements are forcing them to go back and seriously reconsider some of their most deeply-held assumptions.

Like having “one version of the truth”. I’ve seen multiple examples of it get in the way of organizations who need to be doing more with their data.

As usual, a highly entertaining and well illustrated essay from Chuck.

Chuck makes the case for enough uniformity to enable communication but enough diversity to generate new ideas and interesting discussions.

The Shades of Time Project

Thursday, April 26th, 2012

The Shades of TIME project by Drew Conway.

Drew writes:

A couple of days ago someone posted a link to a data set of all TIME Magazine covers, from March, 1923 to March, 2012. Of course, I downloaded it and began thumbing through the images. As is often the case when presented with a new data set I was left wondering, “What can I ask of the data?”

After thinking it over, and with the help of Trey Causey, I came up with, “Have the faces of those on the cover become more diverse over time?” To address this question I chose to answer something more specific: Have the color values of skin tones in faces on the covers changed over time?

I developed a data visualization tool, I’m calling the Shades of TIME, to explore the answer to that question.

An interesting data set and an illustration of why topic map applications are more useful if they offer dynamic, user-selected merging.

Presented with the same evidence, the covers of TIME magazine, I most likely would have:

  • Mapped people on the covers to historical events
  • Mapped people on the covers to additional historical resources
  • Mapped covers into library collections
  • etc.

I would not have set out to explore the diversity in skin color on the covers. In part because I remember when it changed. That is part of my world knowledge. I don’t have to go looking for evidence of it.

My purpose isn’t to say authors, even topic map authors, should avoid having a point of view. That isn’t possible in any event. What I am suggesting is that, to the extent possible, users be enabled to impose their views on a topic map as well.

SEALS – Community Page

Sunday, January 29th, 2012

SEALS – Semantic Evaluation At Large Scale – Community Page

The community page was added after my first post on the SEALS project.

The next community event:

SEALS to present evaluation results at ESWC 2012

SEALS is pleased to announce that the workshop Evaluation of Semantic Technologies (IWEST 2012) has been confirmed to take place at the leading semantic web conference, ESWC (Extended Semantic Web Conference) 2012, scheduled to take place May 27-31, 2012 in beautiful Crete, Greece.

This workshop will be a venue for researchers and tool developers, firstly, to initiate discussion about the current trends and future challenges of evaluating semantic technologies. Secondly, to support communication and collaboration with the goal of aligning the various evaluation efforts within the community and accelerating innovation in all the associated fields as has been the case with both the TREC benchmarks in information retrieval and the TPC benchmarks in database research.

A call for papers will be published soon. All SEALS community members and evaluation campaign participants are especially encouraged to submit and participate.

If you attend, I am particularly interested in the results of the discussion about “aligning the various evaluation efforts within the community….”

I say that because when the project started, the “about” page reported:

This is a very active research area, currently supported by more than 3000 individuals integrated in 360 organisations which have produced around 700 tools, but still suffers from a lack of standard benchmarks and infrastructures for assessing research outcomes. Due to its physically boundless nature, it remains relatively disorganized and lacks common grounds for assessing research and technological outcomes.

Sounds untidy, even diverse, doesn’t it? ;-)

To tell the truth, I am not bothered by the repetition of semantic diversity in efforts to reduce semantic diversity. I find it refreshing that our languages burst the bonds that would be imposed upon them on a regular basis. Tyrants of thought, social, political and economic arrangements, the well- and the ill-intended, all fail. (Some last longer than others but on a historical time scale, the governments of the East and West are ephemera. Their peoples, the originators of language and semantics, will persist.)

We can reduce semantic diversity when that is needful, or account for it when it is not, but even those efforts, as SEALS points out, exhibit the same semantic diversity as the area they purport to address.

ACM RecSys 2011 Workshop on Novelty and Diversity in Recommender Systems

Tuesday, December 13th, 2011

DiveRS 2011 – ACM RecSys 2011 Workshop on Novelty and Diversity in Recommender Systems

From the conference page:

Most research and development efforts in the Recommender Systems field have been focused on accuracy in predicting and matching user interests. However there is a growing realization that there is more than accuracy to the practical effectiveness and added-value of recommendation. In particular, novelty and diversity have been identified as key dimensions of recommendation utility in real scenarios, and a fundamental research direction to keep making progress in the field.

Novelty is indeed essential to recommendation: in many, if not most scenarios, the whole point of recommendation is inherently linked to a notion of discovery, as recommendation makes most sense when it exposes the user to a relevant experience that she would not have found, or thought of by herself –obvious, however accurate recommendations are generally of little use.

Not only does a varied recommendation provide in itself for a richer user experience. Given the inherent uncertainty in user interest prediction –since it is based on implicit, incomplete evidence of interests, where the latter are moreover subject to change–, avoiding a too narrow array of choice is generally a good approach to enhance the chances that the user is pleased by at least some recommended item. Sales diversity may enhance businesses as well, leveraging revenues from market niches.

It is easy to increase novelty and diversity by giving up on accuracy; the challenge is to enhance these aspects while still achieving a fair match of the user’s interests. The goal is thus generally to enhance the balance in this trade-off, rather than just a diversity or novelty increase.

DiveRS 2011 aims to gather researchers and practitioners interested in the role of novelty and diversity in recommender systems. The workshop seeks to advance towards a better understanding of what novelty and diversity are, how they can improve the effectiveness of recommendation methods and the utility of their outputs. We aim to identify open problems, relevant research directions, and opportunities for innovation in the recommendation business. The workshop seeks to stir further interest for these topics in the community, and stimulate the research and progress in this area.
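The accuracy versus diversity trade-off described in the workshop text can be sketched with a greedy re-ranking in the style of maximal marginal relevance (MMR). This is a generic technique, not one taken from the workshop page; the function name `rerank`, the `trade_off` parameter, and the toy catalog with its relevance scores are assumptions made for illustration.

```python
def rerank(candidates, relevance, similarity, k, trade_off=0.7):
    """Greedy MMR-style re-ranking.

    At each step, pick the remaining item that maximizes
        trade_off * relevance - (1 - trade_off) * redundancy
    where redundancy is the item's highest similarity to anything
    already selected. trade_off=1.0 ranks purely by relevance.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            redundancy = max(
                (similarity(item, chosen) for chosen in selected),
                default=0.0,
            )
            return trade_off * relevance[item] - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical catalog: predicted relevance and a genre per item.
relevance = {"jazz_a": 0.9, "jazz_b": 0.85, "rock_a": 0.6, "folk_a": 0.5}
genre = {"jazz_a": "jazz", "jazz_b": "jazz", "rock_a": "rock", "folk_a": "folk"}


def same_genre(x, y):
    """Crude similarity: 1.0 for items of the same genre, else 0.0."""
    return 1.0 if genre[x] == genre[y] else 0.0


accurate_only = rerank(relevance, relevance, same_genre, k=3, trade_off=1.0)
balanced = rerank(relevance, relevance, same_genre, k=3, trade_off=0.7)
print(accurate_only)
print(balanced)
```

With `trade_off=1.0` the list is simply the three most relevant items, two of them jazz. Lowering it to 0.7 swaps the second jazz item for a folk one, giving up a little predicted accuracy for a more varied list, which is the balance the workshop text is after.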

The abstract from “Fusion-based Recommender System for Improving Serendipity” by Kenta Oku and Fumio Hattori reads:

Recent work has focused on new measures that are beyond the accuracy of recommender systems. Serendipity, which is one of these measures, is defined as a measure that indicates how the recommender system can find unexpected and useful items for users. In this paper, we propose a Fusion-based Recommender System that aims to improve the serendipity of recommender systems. The system is based on the novel notion that the system finds new items, which have the mixed features of two user-input items, produced by mixing the two items together. The system consists of item-fusion methods and scoring methods. The item-fusion methods generate a recommendation list based on mixed features of two user-input items. Scoring methods are used to rank the recommendation list. This paper describes these methods and gives experimental results.

Interested yet? ;-)
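For a rough feel of what "mixing the features of two user-input items" might look like, here is a small sketch. The abstract does not describe the paper's item-fusion or scoring methods, so the ones used here (averaging feature vectors, then ranking by cosine similarity) are stand-ins, and the catalog, feature values and function names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fuse(u, v):
    """Stand-in item fusion: the element-wise average of two feature vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]


def recommend(catalog, first, second, k=3):
    """Rank every other catalog item by similarity to the fused item."""
    target = fuse(catalog[first], catalog[second])
    others = [name for name in catalog if name not in (first, second)]
    others.sort(key=lambda name: cosine(catalog[name], target), reverse=True)
    return others[:k]


# Hypothetical items with features [cooking, travel, history].
catalog = {
    "cookbook": [1.0, 0.0, 0.0],
    "travel guide": [0.0, 1.0, 0.0],
    "food travelogue": [0.7, 0.7, 0.0],
    "war history": [0.0, 0.0, 1.0],
    "regional recipes": [0.9, 0.1, 0.0],
}

picks = recommend(catalog, "cookbook", "travel guide", k=2)
print(picks)
```

In this toy case the top pick is the item that sits between the two inputs, one that neither input alone would rank first.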