Archive for the ‘S4’ Category

MOA Massively Online Analysis

Saturday, December 1st, 2012

MOA Massively Online Analysis : Real Time Analytics for Data Streams

From the homepage:

What is MOA?

MOA is an open source framework for data stream mining. It includes a collection of machine learning algorithms (classification, regression, and clustering) and tools for evaluation. Related to the WEKA project, MOA is also written in Java, while scaling to more demanding problems.

What can MOA do for you?

MOA performs BIG DATA stream mining in real time, and large scale machine learning. MOA can be easily used with Hadoop, S4 or Storm, and extended with new mining algorithms, and new stream generators or evaluation measures. The goal is to provide a benchmark suite for the stream mining community.

Short tutorials and a manual are available. They are enough to get started, but you will need additional resources on machine learning if the subject isn’t already familiar.

A small niggle about documentation: many projects have files named “tutorial” (or, in this case, “Tutorial1”) or “Manual.” Those files are easier to discover and save if the project name, and perhaps the version, are prepended to the file name. Thus “Moa-2012-08-tutorial1” or “Moa-2012-08-manual.”
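As a sketch of that naming convention, a few lines of Python can build the prefixed name. The function name `prefixed_name` and the `.pdf` extension in the usage example are assumptions for illustration, not anything the MOA project ships.

```python
from pathlib import Path


def prefixed_name(project, version, filename):
    """Prepend project and version to a generic documentation file name."""
    path = Path(filename)
    # Lowercase the original stem and keep any extension unchanged.
    return f"{project}-{version}-{path.stem.lower()}{path.suffix}"


if __name__ == "__main__":
    print(prefixed_name("Moa", "2012-08", "Tutorial1.pdf"))
    print(prefixed_name("Moa", "2012-08", "Manual"))
```

A saved copy then sorts next to other MOA files and still says which release it documents.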

If data streams are in your present or future, definitely worth a look.

S4

Friday, December 3rd, 2010

S4

From the website:

S4 is a general-purpose, distributed, scalable, partially fault-tolerant, pluggable platform that allows programmers to easily develop applications for processing continuous unbounded streams of data.

Just in case you were wondering whether topic maps are limited to being bounded objects composed of syntax: no.

Questions:

  1. Specify three sources of unbounded streams of data. (3 pages, citations)
  2. What subjects would you want to identify and on what basis in any one of them? (3-5 pages, citations)
  3. What other information about those subjects would you want to bind to the information in #2? What subject identity tests are used for those subjects in other sources? (5-10 pages, citations)