Archive for the ‘Support Vector Machines’ Category

Linear SVM Classifier on Twitter User Recognition

Friday, March 6th, 2015

Linear SVM Classifier on Twitter User Recognition by Leon van Bokhorst.

From the post:

Support Vector Machines (SVM) are very useful and popular in data classification, regression and outlier detection. This advanced supervised machine learning algorithm can quickly become very complex and hard to understand, but can lead to great results. In the example we train a linear SVM to detect and predict who’s the writer of a tweet.

A nice weekend-type project: Python, an IPython notebook, and 400 tweets (I think Leon is right that the sample is too small), but an opportunity to “arm up the switches and dial in the mils.”
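Leon's notebook has the real pipeline. As a rough, dependency-free sketch of the same idea, here is a linear SVM trained on bag-of-words counts with Pegasos-style stochastic subgradient descent. Pegasos is a swapped-in solver; the post does not say it is what Leon used. The tweets and author labels below are made up, the tokenizer is deliberately crude, and there is no bias term.

```python
import random
from collections import Counter

def tokenize(text):
    # Crude tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def build_vocab(texts):
    # Map each distinct token to a column index.
    vocab = {}
    for text in texts:
        for tok in tokenize(text):
            vocab.setdefault(tok, len(vocab))
    return vocab

def featurize(text, vocab):
    # Sparse bag-of-words vector {column index: count}; unknown tokens are dropped.
    counts = Counter(tokenize(text))
    return {vocab[tok]: c for tok, c in counts.items() if tok in vocab}

def dot(w, x):
    return sum(w[i] * v for i, v in x.items())

def train_linear_svm(X, y, dim, lam=0.01, epochs=50, seed=0):
    """Pegasos: stochastic subgradient descent on
    lam/2 * ||w||^2 + mean(max(0, 1 - y_i * w.x_i)), labels in {+1, -1}, no bias."""
    rng = random.Random(seed)
    w = [0.0] * dim
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for j in order:
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            violated = y[j] * dot(w, X[j]) < 1        # is the hinge loss active?
            w = [wi * (1.0 - eta * lam) for wi in w]  # regularizer (shrink) step
            if violated:
                for i, v in X[j].items():
                    w[i] += eta * y[j] * v            # hinge-loss step
    return w

def predict(w, x):
    return 1 if dot(w, x) >= 0 else -1

# Hypothetical tweets from two made-up authors (+1 and -1).
tweets = [
    ("shipping a new python release today", 1),
    ("python notebook tips for data science", 1),
    ("new python tutorial on pandas", 1),
    ("great match tonight go team", -1),
    ("what a goal in the final minute", -1),
    ("team training before the match", -1),
]
vocab = build_vocab(t for t, _ in tweets)
X = [featurize(t, vocab) for t, _ in tweets]
y = [label for _, label in tweets]
w = train_linear_svm(X, y, dim=len(vocab))
print(predict(w, featurize("python data notebook", vocab)))  # 1
print(predict(w, featurize("goal match tonight", vocab)))    # -1
```

With real tweets you would want far more data than six lines (or 400), a held-out test set, and a proper tokenizer for URLs, @mentions and hashtags.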

Enjoy!

While you are there, you should look around Leon’s blog. It has a number of interesting posts on statistics using Python.

SVM – Understanding the math

Monday, November 10th, 2014

SVM – Understanding the math – Part 1 by Alexandre Kowalczyk. (Part 2)

The first two tutorials of a series on Support Vector Machines (SVM) and their use in data analysis.

If you shudder when you read:

The objective of a support vector machine is to find the optimal separating hyperplane which maximizes the margin of the training data.

you won’t after reading these tutorials. They are well written and well illustrated.

If you think about it, math symbolism is like programming: a very precise language written with a great deal of economy, which makes it hard for the uninitiated to understand. The underlying ideas, however, can be extracted and explained. That is what you find here.
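For reference, the textbook hard-margin problem behind the quoted sentence reads as follows, for training points x_i with labels y_i in {-1, +1} (standard notation, not necessarily the tutorials' own):

```latex
\begin{aligned}
\min_{\mathbf{w},\, b} \quad & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} \\
\text{subject to} \quad & y_i\,(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,n \\[4pt]
\text{margin} \quad & = \frac{2}{\lVert \mathbf{w} \rVert}
\end{aligned}
```

The separating hyperplane is w·x + b = 0, and the margin is the distance between the two hyperplanes w·x + b = +1 and w·x + b = -1, so minimizing the norm of w is the same as maximizing the margin.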

Want to improve your understanding of what appears on the drop-down menu as SVM? This is a great place to start!

PS: A third tutorial is due out soon.

Better table search through Machine Learning and Knowledge

Friday, August 24th, 2012

Better table search through Machine Learning and Knowledge by Johnny Chen.

From the post:

The Web offers a trove of structured data in the form of tables. Organizing this collection of information and helping users find the most useful tables is a key mission of Table Search from Google Research. While we are still a long way away from the perfect table search, we made a few steps forward recently by revamping how we determine which tables are “good” (one that contains meaningful structured data) and which ones are “bad” (for example, a table that hold the layout of a Web page). In particular, we switched from a rule-based system to a machine learning classifier that can tease out subtleties from the table features and enables rapid quality improvement iterations. This new classifier is a support vector machine (SVM) that makes use of multiple kernel functions which are automatically combined and optimized using training examples. Several of these kernel combining techniques were in fact studied and developed within Google Research [1,2].

Important work on tables from Google Research.

Important in part because you can compare your efforts on accessible tables to theirs, to gain insight into what you are, or aren’t doing “right.”

For any particular domain, you should be able to do better than a general solution.

BTW, I disagree with the “good” versus “bad” table distinction. I suspect that tables that hold the layout of web pages, say for a CMS, are more consistent than database tables of comparable size. And that data may or may not be important to you.

Important versus non-important data for a particular set of requirements is a defensible distinction.

“Good” versus “bad” tables is not.
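The quoted post describes an SVM whose kernel is built from several kernel functions that are combined automatically. The property that makes such combinations legitimate is that a nonnegative weighted sum of valid kernels is itself a valid (positive semidefinite) kernel. A small Python sketch of that fact follows. The weights are fixed by hand (learning them is the part multiple kernel methods automate, and it is not shown), and the three table features (row count / 100, column count / 10, fraction of numeric cells) are hypothetical, not Google's.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq)

def combined_kernel(x, z, weights=(0.3, 0.7)):
    # Weighted sum of two kernels; valid as long as both weights are >= 0.
    # Here the weights are hand-picked, not learned.
    b_lin, b_rbf = weights
    return b_lin * linear_kernel(x, z) + b_rbf * rbf_kernel(x, z)

def gram(points, k):
    # Kernel (Gram) matrix K[i][j] = k(points[i], points[j]).
    return [[k(p, q) for q in points] for p in points]

def is_positive_definite(m):
    # Attempt a Cholesky factorization; for a symmetric matrix it
    # succeeds exactly when the matrix is positive definite.
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0:
                    return False
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Hypothetical feature vectors for four tables.
tables = [
    [0.20, 0.4, 0.9],
    [0.05, 0.2, 0.0],
    [1.50, 0.8, 0.6],
    [0.10, 0.1, 0.1],
]
G = gram(tables, combined_kernel)
print(is_positive_definite(G))    # True
bad = gram(tables, lambda x, z: combined_kernel(x, z, weights=(-1.0, 0.0)))
print(is_positive_definite(bad))  # False: a negative weight breaks validity
```

The positive definiteness of the first matrix comes from the RBF term, which is strictly positive definite on distinct points; the linear term only adds a positive semidefinite piece.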

Predictive Analytics: NeuralNet, Bayesian, SVM, KNN [part 4]

Monday, June 4th, 2012

Predictive Analytics: NeuralNet, Bayesian, SVM, KNN by Ricky Ho.

From the post:

Continuing from my previous blog in walking down the list of Machine Learning techniques. In this post, we’ll be covering Neural Network, Support Vector Machine, Naive Bayes and Nearest Neighbor. Again, we’ll be using the same iris data set that we prepared in the last blog.

Ricky continues his march through machine learning techniques. This post promises one more to go.
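Of the four techniques, nearest neighbor is the easiest to show in a few lines. A minimal k-NN classifier sketch in Python, using made-up two-feature points and labels rather than the iris data the post works with:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to query.
    train is a list of (feature_tuple, label); distance is Euclidean."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training points for two classes.
train = [
    ((1.0, 0.2), "A"), ((1.3, 0.3), "A"), ((1.5, 0.2), "A"),
    ((4.5, 1.5), "B"), ((4.7, 1.4), "B"), ((5.0, 1.7), "B"),
]
print(knn_predict(train, (1.2, 0.25)))  # A
print(knn_predict(train, (4.8, 1.6)))   # B
```

There is no training step at all; the cost is paid at query time, since every prediction scans the whole training set.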

Skytree: Big Data Analytics

Saturday, March 3rd, 2012

Skytree: Big Data Analytics

Released last week, Skytree offers both local and cloud-based data analytics.

From the website:

Skytree Server can accurately perform machine learning on massive datasets at high speed.

In the same way a relational database system (or database accelerator) is designed to perform SQL queries efficiently, Skytree Server is designed to efficiently perform machine learning on massive datasets.

Skytree Server’s scalable architecture performs state-of-the-art machine learning methods on data sets that were previously too big for machine learning algorithms to process. Leveraging advanced algorithms implemented on specialized systems and dedicated data representations tuned to machine learning, Skytree Server delivers up to 10,000 times performance improvement over existing approaches.

Currently supported machine learning methods:

  • Neighbors (Nearest, Farthest, Range, k, Classification)
  • Kernel Density Estimation and Non-parametric Bayes Classifier
  • K-Means
  • Linear Regression
  • Support Vector Machines (SVM)
  • Fast Singular Value Decomposition (SVD)
  • The Two-point Correlation
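Kernel density estimation, from the list above, is simple to state. A naive one-dimensional Gaussian KDE sketch in Python follows; the bandwidth and data are made up. Each density query sums over all n points, so evaluating the estimate at every data point costs O(n²), the kind of cost the “advanced algorithms” in the quote are presumably meant to cut. Skytree's own methods are not described in the post.

```python
import math

def gaussian_kde(data, bandwidth):
    """Return f(x) = 1/(n*h) * sum_i phi((x - x_i) / h),
    where phi is the standard normal density and h the bandwidth."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # O(n) work per query: one Gaussian bump per data point.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density

f = gaussian_kde([1.0, 2.0, 2.5, 4.0], bandwidth=0.5)
# Sanity check: a density should integrate to about 1 (Riemann sum on a fine grid).
dx = 0.01
total = sum(f(-5.0 + i * dx) * dx for i in range(1500))
print(round(total, 3))  # 1.0
```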

There is a “free” local version with a data limit (100,000 records) and of course the commercial local and cloud versions.

Comments?

Kernel Methods and Support Vector Machines de-Mystified

Sunday, October 9th, 2011

Kernel Methods and Support Vector Machines de-Mystified

From the post:

We give a simple explanation of the interrelated machine learning techniques called kernel methods and support vector machines. We hope to characterize and de-mystify some of the properties of these methods. To do this we work some examples and draw a few analogies. The familiar no matter how wonderful is not perceived as mystical.

Did the authors succeed in their goal of a “simple explanation”?

You might want to compare the Wikipedia entry they cite on support vector machines before making your comment. Success is often a relative term.
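One way to judge any “simple explanation” is to check its central claim yourself: a kernel computes an inner product in a feature space without ever building the features. A minimal Python check for the degree-2 polynomial kernel on 2-D inputs (our example, not one taken from the post):

```python
import math

def phi(x):
    # Explicit degree-2 feature map for 2-D input: (x1^2, sqrt(2)*x1*x2, x2^2).
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def poly2_kernel(x, z):
    # Homogeneous polynomial kernel (x . z)^2, computed without phi.
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z))   # 121.0
print(dot(phi(x), phi(z)))  # about 121 (up to floating-point rounding)
```

Both routes give the same number, but the kernel needs only the 2-D inner product, which is what lets SVMs use feature spaces that would be expensive or impossible to build explicitly.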