Archive for the ‘Weka’ Category

Re-Use, Re-Use! Using Weka within Lisp

Friday, August 19th, 2016

Suggesting code re-use, as described by Paul Homer in The Myth of Code Reuse, provokes this reaction from most programmers (substitute “re-use” for “refund”):


Atabey Kaygun demonstrates he isn’t one of those programmers in Using Weka within Lisp:

From the post:

As much as I like implementing machine learning algorithms from scratch within various languages I like using, in doing serious research one should not take the risk of writing error-prone code. Most likely somebody already spent many thousands of hours writing, debugging and optimizing code you can use with some effort. Re-use, people, re-use!

In any case, today I am going to describe how one can use Weka libraries within the ABCL implementation of Common Lisp. Specifically, I am going to use the k-means implementation of Weka.
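The Weka class doing the work in the post is weka.clusterers.SimpleKMeans; the ABCL code is essentially JVM interop around it. As a reminder of what is being re-used, here is a minimal plain-Python sketch of the k-means loop itself (toy data and a fixed iteration count; this is not the ABCL code from the post):

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Minimal Lloyd's-algorithm k-means over 2-D points."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2
                                + (y - centroids[c][1]) ** 2)
            clusters[i].append((x, y))
        # Update step: move each centroid to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centroids[i] = (sum(p[0] for p in cluster) / len(cluster),
                                sum(p[1] for p in cluster) / len(cluster))
    return centroids, clusters

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(points, k=2)
```

Weka’s SimpleKMeans supplies the production-grade details on top of this loop (missing-value handling, normalization, seeding, convergence checks), which is exactly why re-use beats rewriting.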

As usual, a well-written and useful guide to using Weka and Lisp.

The issues of code re-use aren’t confined to programmers.

Any stats you can suggest on re-use of database or XML schemas?

Weka MOOCs – Self-Paced Courses

Friday, July 8th, 2016

All three Weka MOOCs available as self-paced courses

From the post:

All three MOOCs (“Data Mining with Weka”, “More Data Mining with Weka” and “Advanced Data Mining with Weka”) are now available on a self-paced basis. All the material, activities and assessments are available from now until 24th September 2016 at:

The Weka software and MOOCs are great introductions to machine learning!

Advanced Data Mining with Weka – Starts 25 April 2016

Wednesday, April 6th, 2016

Advanced Data Mining with Weka by Ian Witten.

From the webpage:

This course follows on from Data Mining with Weka and More Data Mining with Weka. It provides a deeper account of specialized data mining tools and techniques. Again the emphasis is on principles and practical data mining using Weka, rather than mathematical theory or advanced details of particular algorithms. Students will analyse time series data, mine data streams, use Weka to access other data mining packages including the popular R statistical computing language, script Weka in Python, and deploy it within a cluster computing framework. The course also includes case studies of applications such as classifying tweets, functional MRI data, image classification, and signal peptide prediction.

The syllabus:

Advanced Data Mining with Weka is open for enrollment and starts 25 April 2016.

Five very intense weeks await!

Will you be there?

I first saw this in a tweet by Alyona Medelyan.

Data Mining with Weka (2014)

Sunday, March 2nd, 2014

Data Mining with Weka

From the course description:

Everybody talks about Data Mining and Big Data nowadays. Weka is a powerful, yet easy to use tool for machine learning and data mining. This course introduces you to practical data mining.

The 5-week course starts on 3rd March 2014.

Apologies, somehow I missed the notice on this class.

This will be followed by More Data Mining with Weka in late April of 2014.

Based on my experience with the Weka Machine Learning course, also with Professor Witten, I recommend either one or both of these courses without reservation.

Machine Learning: The problem is…

Wednesday, September 25th, 2013

I am watching the Data Mining with Weka videos and Prof. Ian Witten observed that Weka makes machine learning easy but:

The problem is understanding what it is that you have done.

That’s really the rub, isn’t it? You loaded data, the program ran without crashing, some output was displayed.

All well and good but does it mean anything?

Or does your boss tell you what a data set will show after you complete machine learning on it?

Not to single out machine learning: there are any number of ways to “cook” data long before it reaches the machine learning stage.

Take survey data, for example, where you ask some group of people for their responses.

A quick scan of survey methodology at Wikipedia and you will realize that services like Survey Monkey are for:


I’ve heard the argument that there is no money to do a survey correctly, so mid-management makes up questions that lead to the “correct” result. Business decisions are justified on that type of survey data.

Collecting data and running machine learning algorithms are vital day to day activities in data science.

Even if you plan to fool others, don’t be fooled yourself. Develop a critical outlook and a set of questions to ask of data sets, depending on their point of origin.

PS: Do you know of any courses on “data skepticism?” That would make a great course title. 😉

Data Mining with Weka [Free MOOC]

Thursday, August 29th, 2013

Data Mining with Weka

From the webpage:

Welcome to the free online course Data Mining with Weka

This 5-week MOOC will introduce data mining concepts through practical experience with the free Weka tool.

The course features:

The course will start September 9, 2013, with enrolments now open.

An opportunity to both keep your mind in shape and learn something useful.

The need for people with data intuition who also know machine learning is increasing.

Are you going to be the pro from Dover or not?

Data Mining with Weka

Sunday, April 14th, 2013

Data Mining with Weka by Prof Ian H. Witten.

From the documentation:

The purpose of this study is to gain information to help design and implement the main WekaMOOC course.

If you are interested in Weka or helping with the development of a MOOC or both, this is an opportunity for you.

I am curious if MOOCs or at least mini-MOOCs are going to replace the extended infomercials touted as webinars.

Update already: on Ubuntu, install Weka manually (not with Aptitude), so that you can set the JVM memory options at startup. I got JDBC error messages, but Weka otherwise ran properly.

Free Data Mining Tools [African Market?]

Wednesday, April 10th, 2013

The Best Data Mining Tools You Can Use for Free in Your Company by Mawuna Remarque KOUTONIN.

Short descriptions of the usual suspects, plus a couple (jHepWork and PSPP) that were new to me.

  1. RapidMiner
  2. RapidAnalytics
  3. Weka
  4. PSPP
  5. KNIME
  6. Orange
  7. Apache Mahout
  8. jHepWork
  9. Rattle

An interesting site in general.

Consider the following pitch for business success in Africa:

Africa: Your Business Should be Profitable in 45 days or Die

And the reasons for that claim:

1. “It’s almost virgin here. There are a lot of opportunities, but you have to fight!”

2. “Target the vanity class with vanity products. The “new rich” have a lot of money. They are tough on everything except their big ego and social reputation”

3. “Target the lazy executives and middle managers. Do the job they are paid for as a consultant. Be good, and politically savvy, and the money is yours”

4. “You’ll make more money in selling food or opening a restaurant than working for the Bank”

5. “You can’t avoid politics, but learn to think like the people you are talking with. Always finish your sentence with something like “the most important thing is the country’s development, not power. We all have to work in that direction”

6. “It’s about hard work and passion, but you should first forget about managing time like in Europe.

Take time to visit people, go to the vanity parties, have the patience to let stupid people finish their long empty sentences, and make the politicians understand that your project could make them win elections and strengthen their positions”

7. “Speed is everything. Think fast, Act fast, Be everywhere through friends, family and informants”

With the exception of #1, all of these points are advice I would give to someone marketing topic maps on any continent.

It may be easier to market topic maps where there are few legacy IT systems that might feel threatened by a new technology.

Applying Parallel Prediction to Big Data

Saturday, October 6th, 2012

Applying Parallel Prediction to Big Data by Dan McClary (Principal Product Manager for Big Data and Hadoop at Oracle).

From the post:

One of the constants in discussions around Big Data is the desire for richer analytics and models. However, for those who don’t have a deep background in statistics or machine learning, it can be difficult to know not only just what techniques to apply, but on what data to apply them. Moreover, how can we leverage the power of Apache Hadoop to effectively operationalize the model-building process? In this post we’re going to take a look at a simple approach for applying well-known machine learning approaches to our big datasets. We’ll use Pig and Hadoop to quickly parallelize a standalone machine-learning program written in Jython.

Playing Weatherman

I’d like to predict the weather. Heck, we all would – there’s personal and business value in knowing the likelihood of sun, rain, or snow. Do I need an umbrella? Can I sell more umbrellas? Better yet, groups like the National Climatic Data Center offer public access to weather data stretching back to the 1930s. I’ve got a question I want to answer and some big data with which to do it. On first reaction, because I want to do machine learning on data stored in HDFS, I might be tempted to reach for a massively scalable machine learning library like Mahout.

For the problem at hand, that may be overkill and we can get it solved in an easier way, without understanding Mahout. Something becomes apparent on thinking about the problem: I don’t want my climate model for San Francisco to include the weather data from Providence, RI. Weather is a local problem and we want to model it locally. Therefore what we need is many models across different subsets of data. For the purpose of example, I’d like to model the weather on a state-by-state basis. But if I have to build 50 models sequentially, tomorrow’s weather will have happened before I’ve got a national forecast. Fortunately, this is an area where Pig shines.
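The “many models across different subsets” idea is independent of Pig. A minimal plain-Python sketch (field names invented for illustration): group observations by state and fit one trivial model per group, here just a per-state mean temperature. Pig’s GROUP BY, feeding each group through a streamed Jython script, parallelizes exactly this step:

```python
from collections import defaultdict

# Toy observations: (state, temperature). In the post, these rows would
# live in HDFS and Pig would GROUP BY state before streaming each group
# through a Jython model-building script.
observations = [
    ("CA", 18.0), ("CA", 21.0), ("CA", 24.0),
    ("RI", 4.0), ("RI", 8.0),
]

def train_local_models(rows):
    """Fit one model per state -- here just the mean temperature."""
    by_state = defaultdict(list)
    for state, temp in rows:
        by_state[state].append(temp)
    return {state: sum(temps) / len(temps)
            for state, temps in by_state.items()}

models = train_local_models(observations)
# CA's model is untouched by RI's data: weather stays a local problem.
```

The design point is that each group’s model sees only its own data, so the 50 state models have no dependencies on each other and can be built in parallel.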

Two quick observations:

First, Dan makes my point about your needing the “right” data, which may or may not be the same thing as “big data.” Decide what you want to do before you reach for big iron and data.

Second, I never hear references to the “weatherman” without remembering: “you don’t need to be a weatherman to know which way the wind blows.” (link to the manifesto) If you prefer a softer version, Subterranean Homesick Blues by Bob Dylan.

Interacting with Weka from Jython

Thursday, September 20th, 2012

Interacting with Weka from Jython by Christophe Lalanne.

From the post:

I discovered a lovely feature: You can use WEKA directly with Jython in a friendly interactive REPL.

There are days when I think I need more than multiple workspaces on multiple monitors. I need an extra set of hands and eyes. 😉


R Integration in Weka

Sunday, July 8th, 2012

R Integration in Weka by Mark Hall.

From the post:

These days it seems like every man and his proverbial dog is integrating the open-source R statistical language with his/her analytic tool. R users have long had access to Weka via the RWeka package, which allows R scripts to call out to Weka schemes and get the results back into R. Not to be left out in the cold, Weka now has a brand new package that brings the power of R into the Weka framework.


In this section I briefly cover what the new RPlugin package for Weka >= 3.7.6 offers. This package can be installed via Weka’s built-in package manager.

Here is a list of the functionality implemented:

  • Execution of arbitrary R scripts in Weka’s Knowledge Flow engine
  • Datasets into and out of the R environment
  • Textual results out of the R environment
  • Graphics out of R in png format for viewing inside of Weka and saving to files via the JavaGD graphics device for R
  • A perspective for the Knowledge Flow and a plugin tab for the Explorer that provides visualization of R graphics and an interactive R console
  • A wrapper classifier that invokes learning and prediction of R machine learning schemes via the MLR (Machine Learning in R) library
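The last item, the wrapper classifier, is essentially an adapter: Weka sees an ordinary train/classify interface, while the real work is delegated to R schemes via MLR. A plain-Python sketch of that pattern (all names here are hypothetical illustrations, not the actual RPlugin API):

```python
class WrapperClassifier:
    """Adapter exposing a fit/predict interface while delegating the
    real work to externally supplied callables -- the shape of Weka's
    MLR wrapper, which hands training and prediction off to R."""

    def __init__(self, train_fn, predict_fn):
        self._train_fn = train_fn
        self._predict_fn = predict_fn
        self._model = None

    def fit(self, X, y):
        # In RPlugin this step would invoke an R learner via MLR.
        self._model = self._train_fn(X, y)
        return self

    def predict(self, X):
        return [self._predict_fn(self._model, x) for x in X]

# Stand-in "R scheme": 1-nearest-neighbour on a single feature.
def train_1nn(X, y):
    return list(zip(X, y))

def predict_1nn(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

clf = WrapperClassifier(train_1nn, predict_1nn).fit(
    [1.0, 2.0, 10.0], ["a", "a", "b"])
```

The caller never knows (or cares) which language produced the model, which is the whole point of the wrapper.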

The use of R appears to be spreading! (Oracle, SAP, Hadoop, just to name a few that come readily to mind.)

Where is it on your list of data mining tools?

I first saw this at DZone.