Another Word For It Patrick Durusau on Topic Maps and Semantic Diversity

July 28, 2014

Oryx 2:…

Filed under: Machine Learning,Spark — Patrick Durusau @ 6:56 pm

Oryx 2: Lambda architecture on Spark for real-time large scale machine learning

From the overview:

This is a redesign of the Oryx project as “Oryx 2.0”. The primary design goals are:

1. A more reusable platform for lambda-architecture-style designs, with batch, speed and serving layers

2. Make each layer usable independently

3. Fuller support for common machine learning needs

  • Test/train set split and evaluation
  • Parallel model build
  • Hyper-parameter selection

4. Use newer technologies like Spark and Streaming in order to simplify:

  • Remove separate in-core implementations for scale-down
  • Remove custom data transport implementation in favor of message queues like Apache Kafka
  • Use a ‘real’ streaming framework instead of reimplementing a simple one
  • Remove complex MapReduce-based implementations in favor of Apache Spark-based implementations

5. Support more input (i.e. not just CSV)
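To make the batch, speed and serving layers named in the first goal concrete, here is a minimal in-memory sketch of the general lambda-architecture pattern. It is not Oryx 2's API: the class names (`BatchLayer`, `SpeedLayer`, `ServingLayer`), the `run_demo` function, the `deque` standing in for a message queue such as Kafka, and the toy "model" (a per-item count) are all assumptions made for illustration.

```python
from collections import Counter, deque


class BatchLayer:
    """Hypothetical batch layer: rebuilds a full model from all history."""

    def __init__(self):
        self.history = []

    def append(self, events):
        self.history.extend(events)

    def build_model(self):
        # Full recomputation over everything seen so far (slow but complete).
        return Counter(self.history)


class SpeedLayer:
    """Hypothetical speed layer: incremental counts since the last batch run."""

    def __init__(self):
        self.delta = Counter()

    def update(self, event):
        self.delta[event] += 1

    def reset(self):
        self.delta.clear()


class ServingLayer:
    """Hypothetical serving layer: merges batch model and speed-layer delta."""

    def __init__(self):
        self.batch_model = Counter()

    def load(self, model):
        self.batch_model = model

    def count(self, item, delta):
        # Counter returns 0 for items it has not seen.
        return self.batch_model[item] + delta[item]


def run_demo():
    queue = deque(["a", "b", "a"])  # stand-in for a message queue topic
    batch, speed, serving = BatchLayer(), SpeedLayer(), ServingLayer()

    # The speed layer sees each event at once; the batch layer gets it later.
    pending = []
    while queue:
        event = queue.popleft()
        pending.append(event)
        speed.update(event)
    before_batch = serving.count("a", speed.delta)

    # Periodic batch run: rebuild the model, publish it, reset the delta.
    batch.append(pending)
    serving.load(batch.build_model())
    speed.reset()
    after_batch = serving.count("a", speed.delta)

    # A new event arrives after the batch run; only the speed layer has it.
    speed.update("a")
    after_new = serving.count("a", speed.delta)
    return before_batch, after_batch, after_new
```

Calling `run_demo()` returns the count for item `"a"` at three points: before the first batch build, right after it, and after one more event arrives. The answer stays consistent across the batch hand-off because the serving layer always adds the speed-layer delta to the last published batch model. Each class can also be used on its own, which is the spirit of the second goal.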

The initial import was three days ago, if you are interested in being in on the beginning!
