Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

March 3, 2015

SmileMiner [Conflicting Data Science Results?]

Filed under: Java,Machine Learning — Patrick Durusau @ 3:21 pm

SmileMiner

From the webpage:

SmileMiner (Statistical Machine Intelligence and Learning Engine) is a pure Java library of various state-of-the-art machine learning algorithms. SmileMiner is self-contained and requires only the Java standard library.

SmileMiner is well documented and you can browse the javadoc for more information. A basic tutorial is available on the project wiki.

To see SmileMiner in action, please download the demo jar file and then run java -jar smile-demo.jar.

  • Classification: Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting, Random Forest, Logistic Regression, Neural Networks, RBF Networks, Maximum Entropy Classifier, KNN, Naïve Bayesian, Fisher/Linear/Quadratic/Regularized Discriminant Analysis.
  • Regression: Support Vector Regression, Gaussian Process, Regression Trees, Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, Ridge Regression.
  • Feature Selection: Genetic Algorithm based Feature Selection, Ensemble Learning based Feature Selection, Signal Noise ratio, Sum Squares ratio.
  • Clustering: BIRCH, CLARANS, DBScan, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical Clustering, Sequential Information Bottleneck, Self-Organizing Maps, Spectral Clustering, Minimum Entropy Clustering.
  • Association Rule & Frequent Itemset Mining: FP-growth mining algorithm
  • Manifold learning: IsoMap, LLE, Laplacian Eigenmap, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection
  • Multi-Dimensional Scaling: Classical MDS, Isotonic MDS, Sammon Mapping
  • Nearest Neighbor Search: BK-Tree, Cover Tree, KD-Tree, LSH
  • Sequence Learning: Hidden Markov Model.

Great to have another machine learning library, but it reminded me of a question I read yesterday:

When two teams of data scientists report conflicting results, how does a manager choose between them?

There is a view, says Florian Zettelmeyer, the Nancy L. Ertle Professor of Marketing, that data science represents disembodied truth.

Zettelmeyer, himself a data scientist, fervently disagrees with that view.

“Data science fundamentally speaks to management decisions,” he said, “and management decisions are fundamentally political. There are agendas and there are winners and losers. As a result, different teams will often come up with different conclusions and it is the job of a manager to be able to call it. This requires a ‘working knowledge of data science.’”

Granted, it is a promotion for the Kellogg School of Management, but Zettelmeyer has a good point.

I’m not so sure that a “working knowledge of data science” is required to choose between different answers in data science. Knowing what the manager’s superiors are likely to accept is a more likely criterion.

A good machine learning library should give you enough options to approximate the expected answer.
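
To see how easily the "options" decide the answer, here is a minimal sketch in plain Java. It does not use SmileMiner's API; the class name ConflictingResults, the toy data, and the classify helper are all invented for illustration. Two "teams" run the same k-nearest-neighbor method on the same data and the same query point, differing only in the choice of k.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration only: not SmileMiner code, and the data is made up.
public class ConflictingResults {

    // One-dimensional toy training set: a lone class-1 point at 3.0
    // sits among class-0 points, with more class-1 points far away.
    static final double[] XS = {1.0, 2.0, 3.0, 4.0, 8.0, 9.0};
    static final int[] LABELS = {0, 0, 1, 0, 1, 1};

    // Binary k-nearest-neighbor vote: returns 1 if more than half of the
    // k closest training points carry label 1, otherwise 0.
    static int classify(double[] xs, int[] labels, double query, int k) {
        Integer[] order = new Integer[xs.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        // Sort training indices by distance to the query point.
        Arrays.sort(order,
                Comparator.comparingDouble((Integer i) -> Math.abs(xs[i] - query)));
        int ones = 0;
        for (int i = 0; i < k; i++) {
            ones += labels[order[i]];
        }
        return 2 * ones > k ? 1 : 0;
    }

    public static void main(String[] args) {
        double query = 3.1;
        // "Team A" uses k = 1, "Team B" uses k = 3.
        System.out.println("k=1 predicts class " + classify(XS, LABELS, query, 1));
        System.out.println("k=3 predicts class " + classify(XS, LABELS, query, 3));
    }
}
```

With this toy data the single nearest neighbor of 3.1 is the class-1 point at 3.0, so k=1 predicts class 1, while the three nearest neighbors (3.0, 4.0, 2.0) vote two to one for class 0. Neither run is "wrong"; the parameter choice settles the result.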

I first saw this in a tweet by Bence Arato.
