Open Challenges for Data Stream Mining Research

Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

October 3, 2014

Filed under: BigData,Data Mining,Data Streams,Text Mining — Patrick Durusau @ 4:58 pm

Open Challenges for Data Stream Mining Research, SIGKDD Explorations, Volume 16, Number 1, June 2014.

Abstract:

Every day, huge volumes of sensory, transactional, and web data are continuously generated as streams, which need to be analyzed online as they arrive. Streaming data can be considered as one of the main sources of what is called big data. While predictive modeling for data streams and big data have received a lot of attention over the last decade, many research approaches are typically designed for well-behaved controlled problem settings, overlooking important challenges imposed by real-world applications. This article presents a discussion on eight open challenges for data stream mining. Our goal is to identify gaps between current research and meaningful applications, highlight open problems, and define new application-relevant research directions for data stream mining. The identified challenges cover the full cycle of knowledge discovery and involve such problems as: protecting data privacy, dealing with legacy systems, handling incomplete and delayed information, analysis of complex data, and evaluation of stream algorithms. The resulting analysis is illustrated by practical applications and provides general suggestions concerning lines of future research in data stream mining.

Under entity stream mining, the authors describe the challenge of aggregation:

The first challenge of entity stream mining task concerns information summarization: how to aggregate into each entity e at each time point t the information available on it from the other streams? What information should be stored for each entity? How to deal with differences in the speeds of individual streams? How to learn over the streams efficiently? Answering those questions in a seamless way would allow us to deploy conventional stream mining methods for entity stream mining after aggregation.
…

Sounds remarkably like an issue for topic maps, doesn’t it? Not topic maps in the sense that every entity has an IRI subjectIdentifier, but in the sense that merging rules define the basis on which two or more entities are considered to represent the same subject.
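To make the parallel concrete, here is a minimal Python sketch of rule-based merging over streams. It is my illustration, not the authors' method and not any topic map API. The class names, the field names (`email`, `customer_id`), and the merging rule itself (two records represent the same subject if they agree on any identifying field) are all assumptions made for the example. Each entity keeps a running summary, and per-attribute timestamps let streams arriving at different speeds update it out of order.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Running summary of one subject, aggregated across streams."""
    keys: set = field(default_factory=set)      # identifying (field, value) pairs
    count: int = 0                              # records merged into this entity
    last_seen: float = float("-inf")            # latest timestamp observed
    latest: dict = field(default_factory=dict)  # attribute -> (timestamp, value)


class EntityAggregator:
    def __init__(self, identifying_fields):
        # Hypothetical merging rule: records that share the value of any
        # identifying field are taken to represent the same subject.
        self.identifying_fields = tuple(identifying_fields)
        self.index = {}  # (field, value) -> Entity

    def update(self, timestamp, record):
        keys = {(f, record[f]) for f in self.identifying_fields if f in record}
        if not keys:
            raise ValueError("record carries no identifying field")
        # Collect the distinct entities this record matches.
        matched = []
        for key in keys:
            found = self.index.get(key)
            if found is not None and not any(found is m for m in matched):
                matched.append(found)
        entity = matched[0] if matched else Entity()
        # A record matching several entities shows they are one subject.
        for other in matched[1:]:
            self._absorb(entity, other)
        entity.keys |= keys
        for key in keys:
            self.index[key] = entity
        entity.count += 1
        entity.last_seen = max(entity.last_seen, timestamp)
        for name, value in record.items():
            self._set_latest(entity, name, timestamp, value)
        return entity

    @staticmethod
    def _set_latest(entity, name, timestamp, value):
        # Keep the newest value per attribute, whatever the arrival order.
        current = entity.latest.get(name)
        if current is None or timestamp >= current[0]:
            entity.latest[name] = (timestamp, value)

    def _absorb(self, entity, other):
        entity.keys |= other.keys
        entity.count += other.count
        entity.last_seen = max(entity.last_seen, other.last_seen)
        for name, (ts, value) in other.latest.items():
            self._set_latest(entity, name, ts, value)
        for key in other.keys:
            self.index[key] = entity  # repoint the absorbed entity's keys


agg = EntityAggregator(["email", "customer_id"])
agg.update(1.0, {"email": "a@example.org", "page": "/home"})  # web stream
agg.update(2.0, {"customer_id": 42, "amount": 19.99})         # transaction stream
# A record carrying both identifiers bridges the two entities into one.
merged = agg.update(3.0, {"email": "a@example.org", "customer_id": 42})
```

After the third record, the web entity and the transaction entity collapse into a single summary that a conventional stream miner could consume. What to store per entity and how to bound memory, the questions the authors raise, are left open here.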

The entire issue is on “big data” and if you are looking for research “gaps,” it is a great starting point. Table of Contents: SIGKDD explorations, Volume 16, Number 1, June 2014.

I included the TOC link because, for reasons known only to staff at the ACM, the articles in this issue don’t show up in the library index. One of the many “features” of the ACM Digital Library.

Another: the committee that oversees the Digital Library is undisclosed to members and can be contacted only by staff.
