Archive for the ‘Novelty’ Category

Why Big Data Fails to Detect Terrorists

Thursday, December 17th, 2015

Kirk Borne tweeted a link to his presentation, Big Data Science for Astronomy & Space, and more specifically to slides 24 and 25 on novelty detection and surprise discovery.

Casting about for more resources to point out, I found Novelty Detection in Learning Systems by Stephen Marsland.

The abstract for Stephen’s paper:

Novelty detection is concerned with recognising inputs that differ in some way from those that are usually seen. It is a useful technique in cases where an important class of data is under-represented in the training set. This means that the performance of the network will be poor for those classes. In some circumstances, such as medical data and fault detection, it is often precisely the class that is under-represented in the data, the disease or potential fault, that the network should detect. In novelty detection systems the network is trained only on the negative examples where that class is not present, and then detects inputs that do not fit into the model that it has acquired, that is, members of the novel class.

This paper reviews the literature on novelty detection in neural networks and other machine learning techniques, as well as providing brief overviews of the related topics of statistical outlier detection and novelty detection in biological organisms.

The rest of the paper is very good and worth your time to read but we need not venture beyond the abstract to demonstrate why big data cannot, by definition, detect terrorists.

The root of the terrorist detection problem is summarized in the first sentence:

Novelty detection is concerned with recognising inputs that differ in some way from those that are usually seen.

So, what are the inputs of a terrorist that differ from the inputs usually seen?

That’s a simple enough question.

Previously committing a terrorist suicide attack is a definite tell but it isn’t a useful one.

Obviously the TSA doesn’t know because it has never caught a terrorist, despite its profile and wannabe psychics watching travelers.

You can churn big data 24×7 but if you don’t have a baseline of expected inputs, no input is going to stand out from the others.
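To make the baseline point concrete, here is a minimal sketch, assuming a single numeric feature and a simple z-score test. The function names, the threshold of 3.0 and the sample values are all made up for illustration; nothing here comes from Marsland's paper or from any deployed system.

```python
from statistics import mean, stdev


def fit_baseline(usual_values):
    """Summarise the 'usually seen' inputs as (mean, standard deviation)."""
    return mean(usual_values), stdev(usual_values)


def is_novel(value, baseline, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the baseline mean."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical data: a tight baseline makes an unusual input stand out.
tight = fit_baseline([9.8, 10.1, 10.0, 9.9, 10.2, 10.0])
tight_flags = [is_novel(x, tight) for x in (10.1, 14.0)]

# Hypothetical data: when the "usual" inputs are all over the map,
# the same input no longer stands out from the others.
loose = fit_baseline([1, 50, 100, 7, 300, 12])
loose_flag = is_novel(14.0, loose)
```

With the tight baseline only 14.0 is flagged; with the loose one nothing is. The arithmetic is trivial, which is the point: the hard part is having a baseline that means something.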

The San Bernardino attackers were not detected, because the inputs didn’t vary enough for the couple to stand out.

Even if they had been selected for close and unconstitutional monitoring of their etraffic, bank accounts, social media, phone calls, etc., there is no evidence that current data techniques would have detected them.

Before you invest or continue paying for big data to detect terrorists, ask the simple questions:

What is your baseline from which variance will signal a terrorist?

How often has it worked?

Once you have a dead terrorist, you can start from the dead terrorist and search your big data, but that’s an entirely different starting point.

Given the weeks, months and years of finger pointing following a terrorist attack, speed really isn’t an issue.

Incremental Classification, concept drift and Novelty detection (IClaNov)

Wednesday, October 8th, 2014

Incremental Classification, concept drift and Novelty detection (IClaNov)

From the post:

The development of dynamic information analysis methods, like incremental clustering, concept drift management and novelty detection techniques, is becoming a central concern in a bunch of applications whose main goal is to deal with information which is varying over time. These applications relate themselves to very various and highly strategic domains, including web mining, social network analysis, adaptive information retrieval, anomaly or intrusion detection, process control and management recommender systems, technological and scientific survey, and even genomic information analysis, in bioinformatics. The term “incremental” is often associated to the terms dynamics, adaptive, interactive, on-line, or batch. The majority of the learning methods were initially defined in a non-incremental way. However, in each of these families, were initiated incremental methods making it possible to take into account the temporal component of a data stream. In a more general way incremental clustering algorithms and novelty detection approaches are subjected to the following constraints:

  • Possibility to be applied without knowing as a preliminary all the data to be analyzed;
  • Taking into account of a new data must be carried out without making intensive use of the already considered data;
  • Result must be available after insertion of all new data;
  • Potential changes in the data description space must be taken into consideration.

This workshop aims to offer a meeting opportunity for academics and industry-related researchers, belonging to the various communities of Computational Intelligence, Machine Learning, Experimental Design and Data Mining to discuss new areas of incremental clustering, concept drift management and novelty detection and on their application to analysis of time varying information of various natures. Another important aim of the workshop is to bridge the gap between data acquisition or experimentation and model building.

ICDM 2014 Conference: December 14, 2014

The agenda for this workshop has been posted.

Does your ontology support incremental classification, concept drift and novelty detection? All of those exist in the ongoing data stream of experience if not within some more limited data stream from a source.

You can work from a dated snapshot of the world as it was, but over time will that best serve your needs?

Remember that for less than $250,000 (est.) the attacks on 9/11 provoked the United States into spending $trillions based on a Cold War snapshot of the world. Probably the highest return on investment for an attack in history.

The world is constantly changing and your data view of it should be changing as well.
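The constraints quoted from the workshop description (no advance knowledge of all the data, no heavy reuse of data already seen) can be illustrated with a running mean and variance. This is a minimal sketch using Welford's online algorithm, which is my choice of illustration and not something the workshop prescribes. The class name, the threshold and the sample stream are hypothetical.

```python
class RunningStats:
    """Running mean and variance (Welford's algorithm); stores no past observations."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new observation into the summary."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance of everything seen so far (0.0 until two values exist)."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0

    def is_novel(self, x, threshold=3.0):
        """True if x is more than `threshold` standard deviations from the running mean."""
        if self.n < 2 or self.variance == 0:
            return False
        return abs(x - self.mean) / self.variance ** 0.5 > threshold


# Hypothetical stream: test each value against the summary so far, then absorb it.
stats = RunningStats()
flagged = []
for value in [10, 11, 9, 10, 12, 10, 30]:
    if stats.is_novel(value):
        flagged.append(value)
    stats.update(value)
```

Note that this version never forgets anything, so it does nothing about concept drift. Down-weighting older observations would be one way to approach that, at the cost of another parameter to choose.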

The dynamics of correlated novelties

Tuesday, August 12th, 2014

The dynamics of correlated novelties by F. Tria, V. Loreto, V. D. P. Servedio, and S. H. Strogatz.

Abstract:

Novelties are a familiar part of daily life. They are also fundamental to the evolution of biological systems, human society, and technology. By opening new possibilities, one novelty can pave the way for others in a process that Kauffman has called “expanding the adjacent possible”. The dynamics of correlated novelties, however, have yet to be quantified empirically or modeled mathematically. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. We test these predictions on four data sets of human activity: the edit events of Wikipedia pages, the emergence of tags in annotation systems, the sequence of words in texts, and listening to new songs in online music catalogues. By quantifying the dynamics of correlated novelties, our results provide a starting point for a deeper understanding of the adjacent possible and its role in biological, cultural, and technological evolution.
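For a feel of the kind of model the abstract describes, here is a rough sketch of an urn that enlarges whenever a novelty occurs. The abstract quoted above gives no update rule or parameters, so the details below (reinforcing each draw with `rho` extra copies, adding `nu + 1` brand-new colours on a first-time draw, and the default values) are my assumptions about one way to set it up, not a statement of the authors' exact model.

```python
import random


def urn_with_triggering(steps, rho=4, nu=2, seed=0):
    """Simulate an expanding urn; return the count of distinct colours drawn after each step.

    rho and nu are illustrative assumptions, not values from the quoted abstract.
    """
    rng = random.Random(seed)
    urn = [0]          # start with a single colour
    next_colour = 1    # label for the next never-before-present colour
    seen = set()       # colours that have been drawn at least once
    distinct_counts = []
    for _ in range(steps):
        ball = rng.choice(urn)       # the drawn ball stays in the urn
        urn.extend([ball] * rho)     # reinforcement: the familiar gets more likely
        if ball not in seen:         # a novelty: enlarge the space of possibilities
            seen.add(ball)
            urn.extend(range(next_colour, next_colour + nu + 1))
            next_colour += nu + 1
        distinct_counts.append(len(seen))
    return distinct_counts


counts = urn_with_triggering(2000)
```

Plotting `counts` against the step number on log-log axes is one way to look for the sublinear growth that Heaps' law describes.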

From the introduction:

The notion that one new thing sometimes triggers another is, of course, commonsensical. But it has never been documented quantitatively, to the best of our knowledge. In the world before the Internet, our encounters with mundane novelties, and the possible correlations between them, rarely left a trace. Now, however, with the availability of extensive longitudinal records of human activity online, it has become possible to test whether everyday novelties crop up by chance alone, or whether one truly does pave the way for another.

Steve Newcomb often talks about serendipity and topic maps. What if it is possible to engineer serendipity? That is, over a large enough population, to discover the subjects that are going to trigger the transition where the “formerly adjacent possible becomes actualized”?

This work is in its very early stages but its impact on information delivery/discovery may be substantial.

ACM RecSys 2011 Workshop on Novelty and Diversity in Recommender Systems

Tuesday, December 13th, 2011

DiveRS 2011 – ACM RecSys 2011 Workshop on Novelty and Diversity in Recommender Systems

From the conference page:

Most research and development efforts in the Recommender Systems field have been focused on accuracy in predicting and matching user interests. However there is a growing realization that there is more than accuracy to the practical effectiveness and added-value of recommendation. In particular, novelty and diversity have been identified as key dimensions of recommendation utility in real scenarios, and a fundamental research direction to keep making progress in the field.

Novelty is indeed essential to recommendation: in many, if not most scenarios, the whole point of recommendation is inherently linked to a notion of discovery, as recommendation makes most sense when it exposes the user to a relevant experience that she would not have found, or thought of by herself –obvious, however accurate recommendations are generally of little use.

Not only does a varied recommendation provide in itself for a richer user experience. Given the inherent uncertainty in user interest prediction –since it is based on implicit, incomplete evidence of interests, where the latter are moreover subject to change–, avoiding a too narrow array of choice is generally a good approach to enhance the chances that the user is pleased by at least some recommended item. Sales diversity may enhance businesses as well, leveraging revenues from market niches.

It is easy to increase novelty and diversity by giving up on accuracy; the challenge is to enhance these aspects while still achieving a fair match of the user’s interests. The goal is thus generally to enhance the balance in this trade-off, rather than just a diversity or novelty increase.

DiveRS 2011 aims to gather researchers and practitioners interested in the role of novelty and diversity in recommender systems. The workshop seeks to advance towards a better understanding of what novelty and diversity are, how they can improve the effectiveness of recommendation methods and the utility of their outputs. We aim to identify open problems, relevant research directions, and opportunities for innovation in the recommendation business. The workshop seeks to stir further interest for these topics in the community, and stimulate the research and progress in this area.
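The accuracy versus diversity trade-off described above can be sketched as a greedy re-ranking step. This is a minimal illustration in the style of maximal marginal relevance, which is my choice and not a method from the workshop. The function name, the `trade_off` weight, the item names and the genre-based similarity are all hypothetical.

```python
def rerank(candidates, relevance, similarity, k, trade_off=0.7):
    """Greedily pick k items, trading predicted relevance against redundancy.

    trade_off=1.0 ranks purely by relevance; lower values penalise items
    that resemble something already selected.
    """
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(item):
            # Redundancy is the similarity to the closest item already chosen.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical catalogue: predicted relevance per item, and a crude genre-based similarity.
relevance = {"rock_a": 0.9, "rock_b": 0.85, "jazz_a": 0.6, "folk_a": 0.5}
genre = {"rock_a": "rock", "rock_b": "rock", "jazz_a": "jazz", "folk_a": "folk"}


def same_genre(a, b):
    return 1.0 if genre[a] == genre[b] else 0.0


accurate_only = rerank(relevance, relevance, same_genre, k=2, trade_off=1.0)
balanced = rerank(relevance, relevance, same_genre, k=2, trade_off=0.7)
```

With `trade_off=1.0` both slots go to rock; at 0.7 the second slot goes to the jazz item instead. Where to set that weight is exactly the balance the workshop description is talking about.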

The abstract from “Fusion-based Recommender System for Improving Serendipity” by Kenta Oku, Fumio Hattori reads:

Recent work has focused on new measures that are beyond the accuracy of recommender systems. Serendipity, which is one of these measures, is defined as a measure that indicates how the recommender system can find unexpected and useful items for users. In this paper, we propose a Fusion-based Recommender System that aims to improve the serendipity of recommender systems. The system is based on the novel notion that the system finds new items, which have the mixed features of two user-input items, produced by mixing the two items together. The system consists of item-fusion methods and scoring methods. The item-fusion methods generate a recommendation list based on mixed features of two user-input items. Scoring methods are used to rank the recommendation list. This paper describes these methods and gives experimental results.

Interested yet? 😉