JML [Java Machine Learning]

JML [Java Machine Learning] by Mingjie Qian.

From the webpage:

JML is a pure Java library for machine learning. The goal of JML is to make machine learning methods easy to use and speed up the code translation from MATLAB to Java. Tutorial-JML.pdf

The current version implements logistic regression, Maximum Entropy modeling (MaxEnt), AdaBoost, LASSO, KMeans, spectral clustering, Nonnegative Matrix Factorization (NMF), sparse NMF, Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA) (by Gibbs sampling, based on LdaGibbsSampler.java by Gregor Heinrich), joint l_{2,1}-norms minimization, Hidden Markov Model (HMM), Conditional Random Field (CRF), etc., as examples of implementing machine learning methods with this general framework. The SVM package LIBLINEAR is also incorporated. I will try to add more important models such as Markov Random Field (MRF) to this package if I get the time :)

Another advantage of the JML library is its complete independence from feature engineering, so any preprocessed data can be run. For example, in natural language processing, feature engineering is crucial for MaxEnt, HMM, and CRF to work well and is often embedded in model training. However, we believe that it is better to separate feature engineering from parameter estimation. On the one hand, this achieves modularization, so that people can focus on one module without needing to consider the others; on the other hand, implemented modules can be reused without incompatibility concerns.
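To make the separation concrete, here is a minimal sketch in plain Java. It does not use JML's API; the class name `DecoupledPipelineSketch`, the toy features, and the helper methods are all assumptions made up for illustration. The point is only that the trainer receives numeric vectors and never sees how they were produced.

```java
class DecoupledPipelineSketch {

    /** Feature engineering module (hypothetical): maps a raw token to a numeric vector. */
    static double[] extractFeatures(String token) {
        return new double[] {
            1.0,                                                 // bias term
            Character.isUpperCase(token.charAt(0)) ? 1.0 : 0.0,  // starts with a capital letter?
            token.length() / 10.0                                // scaled token length
        };
    }

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Probability of the positive class under weights w. */
    static double predict(double[] w, double[] x) {
        double z = 0.0;
        for (int j = 0; j < w.length; j++) {
            z += w[j] * x[j];
        }
        return sigmoid(z);
    }

    /** Parameter estimation module: logistic regression by batch gradient descent. */
    static double[] train(double[][] x, int[] y, double step, int iterations) {
        double[] w = new double[x[0].length];
        for (int it = 0; it < iterations; it++) {
            double[] grad = new double[w.length];
            for (int i = 0; i < x.length; i++) {
                double err = predict(w, x[i]) - y[i];
                for (int j = 0; j < w.length; j++) {
                    grad[j] += err * x[i][j];
                }
            }
            // Step along the negative average gradient of the log-loss.
            for (int j = 0; j < w.length; j++) {
                w[j] -= step * grad[j] / x.length;
            }
        }
        return w;
    }

    public static void main(String[] args) {
        String[] tokens = {"Paris", "London", "apple", "river"};
        int[] labels = {1, 1, 0, 0};  // toy labels: 1 = proper noun

        // Step 1: feature engineering, done entirely outside the trainer.
        double[][] x = new double[tokens.length][];
        for (int i = 0; i < tokens.length; i++) {
            x[i] = extractFeatures(tokens[i]);
        }

        // Step 2: parameter estimation on the preprocessed vectors only.
        double[] w = train(x, labels, 1.0, 2000);
        System.out.println("P(proper | Berlin) = " + predict(w, extractFeatures("Berlin")));
        System.out.println("P(proper | table)  = " + predict(w, extractFeatures("table")));
    }
}
```

Swapping in a different `extractFeatures` leaves `train` untouched, which is the kind of reuse the passage above argues for.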

JML also provides implementations of several efficient, scalable, and widely used general-purpose optimization algorithms, which are very important for making machine learning methods applicable to large-scale data, although an optimization strategy tailored to the characteristics of a particular problem is more effective and efficient (e.g., dual coordinate descent for bound-constrained quadratic programming in SVM). Currently supported optimization algorithms are limited-memory BFGS, projected limited-memory BFGS (non-negative constrained or bound constrained), nonlinear conjugate gradient, primal-dual interior-point method, general quadratic programming, accelerated proximal gradient, and accelerated gradient descent. I would always like to implement more practical, efficient optimization algorithms. (emphasis in original)
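For a feel of what one of these general-purpose solvers looks like, below is a rough sketch of accelerated proximal gradient (the FISTA variant) applied to LASSO, i.e. minimizing 0.5 * ||Ax - b||^2 + lambda * ||x||_1. This is plain Java written for illustration, not JML's implementation or API: the class name `FistaLassoSketch`, the method signatures, and the use of the squared Frobenius norm as a step-size bound are all assumptions. It also assumes A has at least one nonzero entry.

```java
class FistaLassoSketch {

    /** Proximal operator of t * |v|: shrinks v toward zero by t. */
    static double softThreshold(double v, double t) {
        return Math.signum(v) * Math.max(Math.abs(v) - t, 0.0);
    }

    /** Minimizes 0.5 * ||a x - b||^2 + lambda * ||x||_1 with FISTA. */
    static double[] solve(double[][] a, double[] b, double lambda, int iterations) {
        int m = a.length;
        int n = a[0].length;

        // The squared Frobenius norm upper-bounds the largest eigenvalue of A^T A,
        // so 1 / lipschitz is a safe (if conservative) step size.
        double lipschitz = 0.0;
        for (double[] row : a) {
            for (double v : row) {
                lipschitz += v * v;
            }
        }

        double[] x = new double[n];  // current iterate
        double[] y = new double[n];  // extrapolated point
        double t = 1.0;              // momentum parameter

        for (int k = 0; k < iterations; k++) {
            // residual = A y - b
            double[] residual = new double[m];
            for (int i = 0; i < m; i++) {
                double r = -b[i];
                for (int j = 0; j < n; j++) {
                    r += a[i][j] * y[j];
                }
                residual[i] = r;
            }

            // Gradient step on the smooth part, then the proximal (shrinkage) step.
            double[] xNext = new double[n];
            for (int j = 0; j < n; j++) {
                double g = 0.0;
                for (int i = 0; i < m; i++) {
                    g += a[i][j] * residual[i];
                }
                xNext[j] = softThreshold(y[j] - g / lipschitz, lambda / lipschitz);
            }

            // Nesterov-style extrapolation.
            double tNext = (1.0 + Math.sqrt(1.0 + 4.0 * t * t)) / 2.0;
            for (int j = 0; j < n; j++) {
                y[j] = xNext[j] + ((t - 1.0) / tNext) * (xNext[j] - x[j]);
            }
            x = xNext;
            t = tNext;
        }
        return x;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 0.0}, {0.0, 1.0}};
        double[] b = {3.0, 0.5};
        double[] x = solve(a, b, 1.0, 2000);
        System.out.println(x[0] + ", " + x[1]);
    }
}
```

With A set to the identity, the LASSO solution is just the soft-thresholded b, so the iterates in `main` should approach (2, 0); that makes a handy sanity check before trying real data.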

Something else “practical” for your weekend. ;-)
