Classification and regression trees by Wei-Yin Loh.
Classification and regression trees are machine-learningmethods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weakness in two examples. 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011 1 14–23 DOI: 10.1002/widm.8.
A bit more challenging that CSV formats but also very useful.
I heard a joke many years ago but a then U.S. Assistant Attorney General who said:
To create a suspect list for a truck hijacking in New York, you choose files with certain name characteristics, delete the ones that are currently in prison and those that remain are your suspect list. (paraphrase)
If topic maps can represent any “subject” then they should be able to represent “group subjects” as well. We may know that our particular suspect is the member of a group, but we just don’t know which member of the group is our suspect.
Think of it as a topic map that evolves as more data/analysis is brought to the map and members of a group subject can be broken out into smaller groups or even individuals.
In fact, displaying summaries of characteristics of members of a group in response to classification/regression could well help with the subject analysis process. An interactive construction/mining of the topic map as it were.
Great paper whether you use it for topic map subject analysis or more traditional purposes.