Cascalog 2.0 In Depth by Sam Ritchie.
From the post:
Cascalog 2.0 has been out for over a year now, and outside of a post to the mailing list and a talk at Clojure/Conj 2013 (slides here), I’ve never written up the startlingly long list of new features brought by that release. So shameful.
This post fixes that. 2.0 was a big deal. Anonymous functions make it easy to reuse your existing, non-Cascalog code. The interop story with vanilla Clojure is much better, which is huge for testing. Finally, users can access the JobConf, Cascading’s counters and other Cascading guts during operations.
Here’s a list of the features I’ll cover in this post:
- new def*ops
- Anonymous function support
- Higher order functions
- Lifting Clojure functions into Cascalog
- expand-query
- Using functions as implicit filters in queries
- prepared functions, and access to Cascading’s guts
As if that weren’t enough, 2.0 adds a standalone Cascading DSL with an API similar to Scalding’s. You can move between this Cascading API and Cascalog. This makes it easy to use Cascading’s new features, like optimized joins, that haven’t bubbled up to the Cascalog DSL.
I’ll go over the Cascading DSL and the support for non-Cascading execution environments in a later post. For now, let’s get into it.
If you want to follow along, go ahead and clone the Cascalog repo, cd into the “cascalog-core” subdirectory and run “lein repl”. To try this code out in other projects, run “lein sub install” in the root directory. This will install [cascalog/cascalog-core "3.0.0-SNAPSHOT"] locally, so you can add it to your project.clj and give the code a whirl.…
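For anyone taking the "other projects" route, a project.clj that picks up the locally installed snapshot might look roughly like the sketch below. Only the cascalog-core coordinate comes from the post. The project name and the Clojure version are placeholders of mine, and a real project will probably also need a Hadoop dependency that is not shown here.

```clojure
;; Minimal sketch of a project.clj, assuming "lein sub install" has already
;; been run in the Cascalog repo root so the snapshot is in the local Maven repo.
;; "cascalog-playground" and the Clojure version are hypothetical placeholders.
(defproject cascalog-playground "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.5.1"]
                 ;; coordinate quoted in the post
                 [cascalog/cascalog-core "3.0.0-SNAPSHOT"]])
```

With that in place, "lein repl" from the new project's directory should give a REPL with the snapshot on the classpath.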
Belated but welcome review of the features of Cascalog 2.0!
I particularly liked the suggested “follow along” approach of the post.
Enjoy!