DPSP: Distributed Progressive Sequential Pattern Mining on the Cloud Authors: Jen-Wei Huang, Su-Chen Lin, Ming-Syan Chen Keywords: sequential pattern mining, period of interest (POI), customer transactions
Abstract:
The progressive sequential pattern mining problem has been discussed in previous research works. With the increasing amount of data, single processors struggle to scale up. Traditional algorithms running on a single machine may have scalability troubles. Therefore, mining progressive sequential patterns intrinsically suffers from the scalability problem. In view of this, we design a distributed mining algorithm to address the scalability problem of mining progressive sequential patterns. The proposed algorithm DPSP, standing for Distributed Progressive Sequential Pattern mining algorithm, is implemented on top of Hadoop platform, which realizes the cloud computing environment. We propose Map/Reduce jobs in DPSP to delete obsolete itemsets, update current candidate sequential patterns and report up-to-date frequent sequential patterns within each POI. The experimental results show that DPSP possesses great scalability and consequently increases the performance and the practicability of mining algorithms.
The phrase mining sequential patterns was coined in Mining Sequential Patterns, a paper authored by Rakesh Agrawal, Ramakrishnan Srikant, and cited by the authors of this paper.
The original research was to find patterns in customer transactions, which I suspect are important “subjects” for discovery and representation in commerce topic maps.