Monday, April 2nd, 2012


iSAX is an extension of the SAX software for larger data sets. It is detailed in the paper "iSAX: Indexing and Mining Terabyte Sized Time Series."


Current research in indexing and mining time series data has produced many interesting algorithms and representations. However, it has not led to algorithms that can scale to the increasingly massive datasets encountered in science, engineering, and business domains. In this work, we show how a novel multiresolution symbolic representation can be used to index datasets which are several orders of magnitude larger than anything else considered in the literature. Our approach allows both fast exact search and ultra fast approximate search. We show how to exploit the combination of both types of search as sub-routines in data mining algorithms, allowing for the exact mining of truly massive real world datasets, containing millions of time series.
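To make the "multiresolution symbolic representation" concrete, here is a minimal Python sketch. It is not the authors' iSAX code, and the abstract above gives none of these details. The sketch assumes the usual SAX recipe (z-normalize, average into equal-width segments, discretize against standard normal quantile breakpoints) and power-of-two alphabet sizes. All function names (znorm, paa, breakpoints, sax_word, demote) are hypothetical and chosen for illustration only.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def znorm(series):
    """Rescale a series to zero mean and unit variance."""
    mu, sigma = mean(series), pstdev(series)
    if sigma == 0:
        return [0.0] * len(series)
    return [(x - mu) / sigma for x in series]


def paa(series, segments):
    """Piecewise aggregate approximation: mean of each equal-width segment."""
    if len(series) % segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of segments")
    width = len(series) // segments
    return [mean(series[i * width:(i + 1) * width]) for i in range(segments)]


def breakpoints(cardinality):
    """Standard normal quantiles that split the real line into equiprobable bins."""
    nd = NormalDist()
    return [nd.inv_cdf(i / cardinality) for i in range(1, cardinality)]


def sax_word(series, segments, cardinality):
    """One integer symbol per segment, each in the range 0..cardinality-1."""
    bps = breakpoints(cardinality)
    return [bisect_right(bps, v) for v in paa(znorm(series), segments)]


def demote(symbol, from_card, to_card):
    """Coarsen a symbol by dropping low-order bits (assumes power-of-two cardinalities)."""
    shift = from_card.bit_length() - to_card.bit_length()
    return symbol >> shift


series = [1, 2, 3, 4, 5, 6, 7, 8]
fine = sax_word(series, segments=4, cardinality=8)
coarse = [demote(s, 8, 4) for s in fine]
print(fine)    # [0, 2, 5, 7]
print(coarse)  # [0, 1, 2, 3], same as sax_word(series, 4, 4)
```

Under these assumptions, the breakpoints for cardinality 4 are a subset of those for cardinality 8, so a fine symbol can be coarsened by a bit shift without revisiting the raw series. That nesting is what lets words of different resolutions be compared, which is the property a multiresolution index would rely on.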

A number of the data sets on this page are marked “…warning 500meg file.”