Roger Magoulas, using not small iron reports:
The result: The training set was processed and the sample data set classified in six seconds. We were able to classify the entire 400,000-record data set in under six minutes — more than a four-orders-of-magnitude records processed per minute (26,000-fold) improvement. A process that would have run for days, in its initial implementation, now ran in minutes! The performance boost let us try out different feature options and thresholds to optimize the classifier. On the latest run, a random sample showed the classifier working with 92% accuracy.
set-oriented machine learning makes for:
Handling larger and more diverse data sets Applying machine learning to a larger set of problems Faster turnarounds Less risk Better focus on a problem Improved accuracy, greater understanding and more usable results
Seems to me sameness of subject representation is a classification task. Yes?
Going from days to minutes sounds attractive to me.
How about you?