Archive for the ‘Scala’ Category

SKA LA

Tuesday, February 7th, 2012

SKA LA (link broken by site relocation, see below). Andy Petrella writes a multi-part series on:

Neo4J with Scala Play! 2.0 on Heroku

The outline from the first post:

I’ll try here to gather all steps of a spike I did to have a web prototype using scala and a graph database.

For that I used the below technologies.

Play! Framework as the web framework, in its 2.0-RC1 version.

Neo4J as the back end service for storing graph data.

Scala for telling the computer what it should do…

Here is an overview of what will be covered in the current suite.

  1. How to install Play! 2.0 RC1 from Git
  2. Install Neo4J and run it in a Server Mode. Explain its REST/Json Interface.
  3. Create a Play! project. Update it to open it in IDEA Community Edition
  4. An introduction of the Json facilities of Play! Scala. With the help of the SJson paradigm.
  5. Introduction of the Dispatch Scala library for HTTP communication
  6. How to use efficiently Dispatch’s Handler and Play!’s Json functionality together.
  7. Illustrate how to send Neo4J REST requests. For creating generic node, then create a persistent service that can re/store domain model instances.
  8. Create some views (don’t bother me for ‘em … I’m not a designer ^^) using Scala templates and Jquery ajax for browsing model and creating instances.
  9. Deploy the whole stuffs on Heroku.

If you aren’t already closing in on the winning entry for the Neo4j Challenge, this series of posts will get you a bit closer!

BTW, remember the deadline is February 29th. (Leap year if you are using the Gregorian system.)


All nine parts have been posted. Until I can make more tidy repairs, see: https://bitly.com/bundles/startupgeek/4

Parallelizing Machine Learning– Functionally: A Framework and Abstractions for Parallel Graph Processing

Sunday, February 5th, 2012

Parallelizing Machine Learning– Functionally: A Framework and Abstractions for Parallel Graph Processing by Heather Miller and Philipp Haller.

Abstract:

Implementing machine learning algorithms for large data, such as the Web graph and social networks, is challenging. Even though much research has focused on making sequential algorithms more scalable, their running times continue to be prohibitively long. Meanwhile, parallelization remains a formidable challenge for this class of problems, despite frameworks like MapReduce which hide much of the associated complexity. We present a framework for implementing parallel and distributed machine learning algorithms on large graphs, flexibly, through the use of functional programming abstractions. Our aim is a system that allows researchers and practitioners to quickly and easily implement (and experiment with) their algorithms in a parallel or distributed setting. We introduce functional combinators for the flexible composition of parallel, aggregation, and sequential steps. To the best of our knowledge, our system is the first to avoid inversion of control in a (bulk) synchronous parallel model.

An area of research that appears to have a great deal of promise. Very much worth your attention.

Typesafe Stack

Tuesday, December 27th, 2011

Typesafe Stack

From the website:

Scala. Akka. Simple.

A 100% open source, integrated distribution offering Scala, Akka, sbt, and the Scala IDE for Eclipse.

The Typesafe Stack makes it easy for developers to get started building scalable software systems with Scala and Akka. The Typesafe Stack is based on the most recent stable versions of Scala and Akka, and provides all of the major components needed to develop and deploy Scala and Akka applications.

Go ahead! You need something new to put on your new, shiny 5TB disk drive. ;-)

Scala IDE for Eclipse

Friday, December 23rd, 2011

Scala IDE for Eclipse

We released the Scala IDE V2.0 for Eclipse today! After 9 months of intensive work by the community contributors, users and the IDE team we are really proud to release the new version. Not only is it robust and reliable but also comes with much improved performance and responsiveness. There are a whole lot of new features that make it a real pleasure to use, Report errors as you type, Project builder with dependency tracking, Definition Hyperlinking and Inferred type hovers, Code completion and better integration with Java build tools, and lots more. You can learn more about them all below. We hope you will enjoy using the new version and continue to help us with ideas and improvement suggestions, or just contribute them.

While working on V2.0 the team has been listening hard to what the IDE users need. Simply stated faster compilation, better debugging and better integration with established Java tools like Maven. The good news is the team is ready for and excited by the challenge. Doing V2.0 we learned a lot about the build process and now understand what is needed to make significant gains in large project compile times. This and providing a solid debugging capability will be the main thrust of the next IDE development cycle. More details will be laid out as we go through the project planning phase and establish milestones. Contributors will be most welcome and we have made it a lot easier to be one. So if you want us to get the next version faster, come and help!

A lot of effort has gone into this version of the IDE and we would like to recognize the people who have contributed so much time and energy to the success of the project.

Extreme Cleverness: Functional Data Structures in Scala

Tuesday, December 20th, 2011

Extreme Cleverness: Functional Data Structures in Scala

From the description:

Daniel Spiewak shows how to create immutable data that supports structural sharing, such as: Singly-linked List, Banker’s Queue, 2-3 Finger Tree, Red-Black Tree, Patricia Trie, Bitmapped Vector Trie.

Every now and again I see a presentation that is head and shoulders above even very good presentations. This is one of those.

The coverage of the Bitmapped Vector Trie merits your close attention. Amazing performance characteristics.

Satisfy yourself, see: http://github.com/djspiewak/extreme-cleverness
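
For a feel of structural sharing before you watch, here is a minimal sketch of a persistent FIFO queue built from two immutable lists. It is my own simplification, not code from the talk or the repository: a real Banker's Queue adds lazy evaluation to get its amortized bounds, and the names PersistentQueueSketch and Queue are made up for illustration.

```scala
object PersistentQueueSketch {
  // Immutable FIFO queue: items are dequeued from `front`;
  // newly enqueued items are prepended to `back` (so `back` is in reverse order).
  final case class Queue[A](front: List[A], back: List[A]) {
    // O(1): the new queue shares `front` and the old `back` with this one.
    def enqueue(a: A): Queue[A] = Queue(front, a :: back)

    // Returns the oldest element and the remaining queue, or None if empty.
    def dequeue: Option[(A, Queue[A])] = front match {
      case h :: t => Some((h, Queue(t, back)))
      case Nil =>
        // `front` is exhausted: reverse `back` once and continue from there.
        back.reverse match {
          case h :: t => Some((h, Queue(t, Nil)))
          case Nil    => None
        }
    }
  }

  object Queue {
    def empty[A]: Queue[A] = Queue[A](Nil, Nil)
  }
}
```

Every enqueue or dequeue returns a new queue that reuses the old lists, so earlier versions of the queue remain valid and unchanged.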

Multilingual Graph Traversals

Thursday, December 8th, 2011

OK the real title is: JVM Language Implementations. ;-) I like mine better.

From the webpage:

Gremlin is a style of graph traversing that can be hosted in any number of languages. The benefit of this is that users can make use of the programming language they are most comfortable with and still be able to evaluate Gremlin-style traversals. This model is different than, lets say, using SQL in Java where the query is evaluated by passing a string representation of the query to the SQL engine. On the contrary, with native Gremlin support for other JVM languages, there is no string passing. Instead, simple method chaining in Gremlin’s fluent style. However, the drawback of this model is that for each JVM language, there are syntactic variations that must be accounted for.

The examples below demonstrate the same traversal in Groovy, Scala, and Java, respectively.

Seeing is believing.
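
To make the "no string passing" point concrete, here is a toy sketch of fluent method chaining in Scala over a tiny in-memory graph. None of this is Gremlin's actual API: Graph, Traversal, v, out and names are hypothetical stand-ins that only imitate the style described above.

```scala
object FluentTraversalSketch {
  final case class Vertex(name: String)
  final case class Edge(from: Vertex, label: String, to: Vertex)

  final case class Graph(edges: List[Edge]) {
    // Start a traversal at the vertex with the given name.
    def v(name: String): Traversal = Traversal(this, List(Vertex(name)))
  }

  // A traversal is just the graph plus the vertices reached so far.
  final case class Traversal(graph: Graph, current: List[Vertex]) {
    // Follow outgoing edges with the given label from every current vertex.
    def out(label: String): Traversal =
      Traversal(
        graph,
        for {
          vertex <- current
          edge   <- graph.edges
          if edge.from == vertex && edge.label == label
        } yield edge.to
      )

    def names: List[String] = current.map(_.name)
  }

  // Hypothetical sample data.
  val sample: Graph = Graph(List(
    Edge(Vertex("marko"), "knows", Vertex("josh")),
    Edge(Vertex("josh"), "created", Vertex("lop"))
  ))
}
```

With this sketch, FluentTraversalSketch.sample.v("marko").out("knows").out("created").names walks two hops with ordinary method calls that the compiler can check, rather than a query string handed to an engine.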

New Features in Scala 2.10

Friday, November 18th, 2011

New Features in Scala 2.10

From the post:

Today was most awaited (by me) talk of Devoxx. Martin Odersky gave presentation and announced a new features in the Scala 2.10. I just want to quickly go through all of them:

1. New reflection framework – it looks very nice (see photo) and 100% Scala. No need for Java for reflection API in order to work with Scala classes anymore!
2. Reification – it would be limited
3. type Dynamic – something similar to .NET 3
4. IDE improvements
5. Faster builds
6. SIPs: string interpolation and simpler implicits

At the moment it’s not clear whether mentioned SIPs would be really included in the release, but the chances are pretty high! So yes, we will finally get string interpolation!
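
For the curious, this is roughly what string interpolation looks like with the s interpolator, assuming Scala 2.10 or later. The object name and the values are made up for illustration.

```scala
object InterpolationSketch {
  val language = "Scala"
  val major = 2
  val minor = 10

  // $name splices a value into the string; ${...} splices an arbitrary expression.
  val banner: String = s"$language $major.$minor; next major would be ${major + 1}"
}
```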

Important for two reasons:

First, news about the upcoming features of Scala.

Second, we learn there is another expansion for SIPs. (I really didn’t plan it that way but it was nice how it worked out.)

NoSQL Exchange – 2 November 2011

Thursday, November 3rd, 2011

NoSQL Exchange – 2 November 2011

It doesn’t get much better or fresher (for non-attendees) than this!

  • Dr Jim Webber of Neo Technology starts the day by welcoming everyone to the first of many annual NOSQL eXchanges. View the podcast here…
  • Emil Eifrém gives a Keynote talk to the NOSQL eXchange on the past, present and future of NOSQL, and the state of NOSQL today. View the podcast here…
  • HANDLING CONFLICTS IN EVENTUALLY CONSISTENT SYSTEMS In this talk, Russell Brown examines how conflicting values are kept to a minimum in Riak and illustrates some techniques for automating semantic reconciliation. There will be practical examples from the Riak Java Client and other places.
  • MONGODB + SCALA: CASE CLASSES, DOCUMENTS AND SHARDS FOR A NEW DATA MODEL Brendan McAdams — creator of Casbah, a Scala toolkit for MongoDB — will give a talk on “MongoDB + Scala: Case Classes, Documents and Shards for a New Data Model”
  • REAL LIFE CASSANDRA Dave Gardner: In this talk for the NOSQL eXchange, Dave Gardner introduces why you would want to use Cassandra, and focuses on a real-life use case, explaining each Cassandra feature within this context.
  • DOCTOR WHO AND NEO4J Ian Robinson: Armed only with a data store packed full of geeky Doctor Who facts, by the end of this session we’ll have you tracking down pieces of memorabilia from a show that, like the graph theory behind Neo4j, is older than Codd’s relational model.
  • BUILDING REAL WORLD SOLUTION WITH DOCUMENT STORAGE, SCALA AND LIFT Aleksa Vukotic will look at how his company assessed and adopted CouchDB in order to rapidly and successfully deliver a next generation insurance platform using Scala and Lift.
  • ROBERT REES ON POLYGLOT PERSISTENCE Robert Rees: Based on his experiences of mixing CouchDB and Neo4J at Wazoku, an idea management startup, Robert talks about the theory of mixing your stores and the practical experience.
  • PARKBENCH DISCUSSION This Park Bench discussion will be chaired by Jim Webber.
  • THE FUTURE OF NOSQL AND BIG DATA STORAGE Tom Wilkie: Tom Wilkie takes a whistle-stop tour of developments in NOSQL and Big Data storage, comparing and contrasting new storage engines from Google (LevelDB), RethinkDB, Tokutek and Acunu (Castle).

And yes, I made a separate blog post on Neo4j and Dr. Who. ;-) What can I say? I am a fan of both.

Neo4j’s Cypher internals – Part 2: All clauses, more Scala’s Parser Combinators and query entry point

Thursday, November 3rd, 2011

Neo4j’s Cypher internals – Part 2: All clauses, more Scala’s Parser Combinators and query entry point

From the post:

During the previous post, I’ve explained what is Neo4j and then, explained how graph traversal could be done on Neo4j using the Java API. Next, I’ve introduced Cypher and how it helped write queries, in order to retrieve data from the graph. After introducing Cypher’s syntax, we dissected the Start Clause, which is the start point (duh) for any query being written on Cypher. If you hadn’t read it, go there, and then come back to read this one.

In this second part, I’ll show the other clauses existents in Cypher, the Match, Where, Return, Skip and Limit, OrderBy and Return. Some will be simple, some not and I’ll go in a more detailed way on those clauses that aren’t so trivial. After that, we will take a look at the Cypher query entry point, and how the query parsing is unleashed.

Nuff said, let’s get down to business.

This and part 1 are starting points for understanding Cypher. A key to evaluation of Neo4j as a topic map storage/application platform.

True enough, at present (1.4) Neo4j only supports 32 billion nodes, 32 billion relationships and 64 billion properties per database, but on the other hand I have fewer than 32 billion books, so at a certain level of coarseness it should be fine. ;-)

BTW, I do collect CS texts, old as well as new. Mostly algorithm, parsing, graph, IR, database sort of stuff but occasionally other stuff too. Just in case you have an author’s copy or need to clear out space for more books. Drop me a line if you would like to make a donation to my collection.

Learning Scala? Learn the Fundamentals First

Sunday, October 23rd, 2011

Learning Scala? Learn the Fundamentals First by Craig Tataryn.

From the post:

A few weeks back I gave my talk at JavaOne 2011 titled “The Scala Language Tour”, if you’re at all interested you can grab the slides and examples from github.

The session was very well received, my only enemy was time! Given 1 hour, how does one give 170+ people a taste of all that’s Scala without completely starving them of details? Lots and lots and lots of dry-runs of your presentation, that’s how. I must have iterated my talk a dozen or more times. I just couldn’t bring myself to trimming any more fat. The short story is, I could have used 5-10 more minutes. A crucial set of slides had to be omitted concerning the “Tuple” in Scala.

Demonstrates the fundamental nature of tuples in Scala, with examples of where it can be found in Scala code.
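
A minimal sketch of my own (not taken from Craig's slides) showing a few places where tuples turn up in everyday Scala. All names and values are made up for illustration.

```scala
object TupleSketch {
  // A tuple groups a fixed number of values of possibly different types.
  val pair: (String, Int) = ("answer", 42)

  // Pattern matching pulls the parts back out.
  val (label, value) = pair

  // The arrow syntax is just another way to build a two-element tuple.
  val viaArrow: (String, Int) = "answer" -> 42

  // Map entries are tuples, so a Map can be built from and iterated as pairs.
  val ages: Map[String, Int] = Map("ann" -> 31, "bob" -> 27)
  val names: List[String] = ages.toList.map { case (name, _) => name }.sorted

  // zip pairs up two collections element by element.
  val zipped: List[(Int, String)] = List(1, 2, 3).zip(List("a", "b", "c"))
}
```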

Scala Videos (and ebook)

Friday, October 21st, 2011

Scala Videos (and ebook)

While looking for something else (isn’t that always the case?) I ran across this collection of Scala videos and a free ebook, Scala for the Impatient, at Typesafe.

Something to enjoy over the weekend!

Scalex

Tuesday, October 11th, 2011

Scalex

From the webpage:

Scaladoc Index

Much like Hoogle for Haskell, Scalex lets you find Scala functions quickly.

  • map Search for the text “map”
  • list map Search for the text “list” and the text “map”
  • A => A Search for the type “A => A”
  • : A => A Search for the type “A => A”
  • a Search for the text “a”
  • map : List[A] => (A => B) => List[B]
    Search for the text “map” and the type “List[A] => (A => B) => List[B]”

Searches can be either textual (a list of words), or by type (a type signature) or both. A type search may optionally start with a : symbol. A search is considered a text search unless it contains a combination of text and symbols, or if it starts with :. To search for both a type and a name, place a : between them, for example size : List[A] => Int
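
To show what such type signatures describe, here is a small sketch of functions whose types match the two signatures used in the examples above. The names mapList and size are hypothetical; Scalex itself indexes library functions, and the standard library's own map is written as a method on List rather than in this curried form.

```scala
object ScalexSignatureSketch {
  // Has the type List[A] => (A => B) => List[B]:
  // takes a list, then a function, and returns the transformed list.
  def mapList[A, B]: List[A] => (A => B) => List[B] =
    xs => f => xs.map(f)

  // Has the type List[A] => Int, as in the query "size : List[A] => Int".
  def size[A]: List[A] => Int =
    xs => xs.size
}
```

For example, ScalexSignatureSketch.mapList[Int, String](List(1, 2))(n => n.toString) yields a List[String].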

It occurs to me that a topic map version of such a resource could have “occurrences” of functions drawn from a code base that exist in associations with known programs and programmers. As an added resource to see how things are done with a particular function by experts.

Without documentation of the surrounding code that might be less useful than one would otherwise think but all good code is documented, isn’t it? ;-)

Yes, Virginia, Scala is Learnable

Sunday, October 9th, 2011

Yes, Virginia, Scala is Learnable

Paul Snively writes:

We’re using Databinder Dispatch a lot in the Cloud Services Engineering group at VMware, and late last week I was discussing it with one of my colleagues, a very senior (not average!) Java developer. I showed him a snippet of Dispatch code and said I wouldn’t expect anyone to understand it on first reading. He seemed surprised by that, unfortunately in the sense that he seemed to believe that it was expected that team members understand Dispatch code on first reading. Then Dave Pollak’s excellent Yes, Virginia, Scala is Hard post appeared, calling me out by name. :-) While it’s extremely flattering that Dave thinks I’m a statistical outlier with respect to programming language expertise, his comment, along with my disappointment to find that a very capable colleague apparently felt pressure to understand something that I expect no one to understand immediately, impels me to try to address the question of Scala’s complexity.

He concludes:

My one-sentence summary, though, would be: there’s no substitute for actually learning the language, and yes, Virginia, Scala is learnable.

Perhaps a bit unfair but I am reminded of efforts to make metadata more “accessible” to people, innocent of any formal information/library training, who built data sets used by millions daily. Interesting I suppose but then I recall when Alta Vista was “the” search site. How many users could today even correctly identify the name? There will always be far more users looking for simple facts, surmise and rumor than those interested in more sophisticated analysis.

My counsel is to learn both the more sophisticated and perhaps even historical systems. You can always dumb delivery down.

ScalaDays 2011 Resources

Monday, October 3rd, 2011

ScalaDays 2011 Resources

From the webpage:

Below, you’ll find links to any publicly-available material relating to presentations given at ScalaDays 2011.

This includes, but is not limited to:

  • slides
  • videos
  • projects referenced
  • source code
  • blog articles
  • follow-ups / corrections

A number of resources that will be of interest to Scala programmers.

Scala – [Java - *] Documentation – Marketing Topic Maps

Monday, October 3rd, 2011

Scala Documentation

As usual, when I am pursuing one lead to interesting material for or on topic maps, another pops up!

The Scala Days 2011 wiki had the following note:

Please note that the Scala wikis are in a state of flux. We strongly encourage you to add content but avoid creating permanent links. URLs will frequently change. For our long-term plans see this post by the doc czar.

A post that was followed by the usual comments about re-inventing the wheel, documentation being produced but not known to many, etc.

I mentioned topic maps as a method to improve program documentation to a very skilled Java/Topic Maps programmer, who responded: How would that be an improvement over Javadoc?

How indeed?

Hmmm, well, for starters the API documentation would not be limited to a particular program. That is to say for common code the API documentation for say a package could be included across several independent programs so that when the package documentation is improved for one, it is improved for all.

Second, it is possible, although certainly not required, to maintain API documentation as “active” documentation, that is to say it has a “fixed” representation such as HTML, only because we have chosen to render it that way. Topic maps can reach out and incorporate content from any source as part of API documentation.

Third, this does not require any change in current documentation systems, which is fortunate because that would require the re-invention of the wheel in all documentation systems for source code/programming documentation. A wheel that continues to be re-invented with every new source repository and programming language.

So long as the content is addressable (hard to think of content that is non-addressable, do you have a counter-example?), topic maps can envelop and incorporate that content with other content in a meaningful way. Granting that incorporating some content requires more effort than other content. (Pointer “Go ask Bill with a street address” would be unusual but not unusable.)

The real question is, as always, is it worth the effort in a particular context to create such a topic map? Answers to that are going to vary depending upon your requirements and interests.

Comments?

PS: For extra points, how would you handle the pointer “Go ask Bill + street address” so that the pointer and its results can be used in a TMDM instance for merging purposes? It is possible. The result of any identifier can be represented as an IRI. That much TBL got right. It was failing to realize that it is necessary to distinguish between use of an address as an identifier versus a locator that has caused so much wasted effort in the SW project.

Well, that and an identifier imperialism that requires every identifier be transposed into IRI syntax. Given all the extant identifiers, with new ones being invented every day, let’s just say that replacing all extant identifiers comes under the “fairy tales we tell children” label, where they all live happily ever after.

Scala Tutorial – Tuples, Lists, methods on Lists and Strings

Monday, October 3rd, 2011

Scala Tutorial – Tuples, Lists, methods on Lists and Strings

I mention this not only because it looks like a good Scala tutorial series but also because it is being developed in connection with a course on computational linguistics at UT Austin (sorry, University of Texas at Austin, USA).

The cross-over between computer programming and computational linguistics illustrates the artificial nature of the divisions we make between disciplines and professions.

Notes on using the neo4j-scala package, Part 1

Friday, September 30th, 2011

Notes on using the neo4j-scala package, Part 1 by Sebastian Benthall.

From the post:

Encouraged by the reception of last week’s hacking notes, I’ve decided to keep experimenting with Neo4j and Scala. Taking Michael Hunger’s advice, I’m looking into the neo4j-scala package. My goal is to port my earlier toy program to this library to take advantage of more Scala language features.

These are my notes from stumbling through it. I’m halfway through.

Let’s encourage Sebastian some more!

Skills Matter – Autumn Update

Thursday, September 22nd, 2011

Skills Matter – Autumn Update

Given the state of UK airport security, about the only reason I would go to the UK would be for a Skills Matter (un)conference, eXchange, or tutorial! And that is from having only enjoyed them as recorded presentations, slides and code. Actual attendance must bring a lot of repeat customers.

On the schedule for this Fall:

Skills Matter Partner Conferences

Skills Matter has partnered with Silicon Valley Comes to the UK, WIP, Novoda, FuseSource and David Pollak, to provide you with the following fantastic (un)Conferences & Hackathon’s:

Skills Matter eXchanges

We’ll also be running some pretty cool one- and two-day long Skills Matter eXchanges, which are conferences featuring 45 minute long expert talks and lots of breaks to discuss what you have learned. Expect in-depth, hands-on talks led by real experts who are there to be quizzed, questioned and interrogated until you know as much as they do, or thereabouts! In the paragraphs below, you’ll be able to find out about the following eXchanges we have planned for the coming months:

Skills Matter Progressive Technology Tutorials

Skills Matter Progressive Technology Tutorials offer a collection of 4-hour tutorials, featuring a mix in-depth and hands-on workshops on technology, agile and software craftsmanship. In the paragraphs below, you’ll be able to find out about the following eXchanges we have planned for the coming months:

Scala School!

Wednesday, September 21st, 2011

Scala School!

From the webpage:

About

Scala school was started as a series of lectures at Twitter to prepare experienced engineers to be productive Scala programmers. Being a relatively new language, but also one that draws on many familiar concepts, we found this an effective way of getting new engineers up to speed quickly. This is the written material that accompanied those lectures. We have found that these are useful in their own right.

Approach

We think it makes the most sense to approach teaching Scala not as if it’s an improved Java but as a new language. Experience in Java is not expected. Focus will be around the interpreter and the object-functional style as well as the style of programming we do here. An emphasis will be placed on maintainability, clarity of expression, and leveraging the type system.

Most of the lessons require no software other than a Scala REPL. The reader is encouraged to follow along, and go further! Use these lessons as a starting point to explore the language.

Excellent!

Neo4j and Scala hacking notes

Wednesday, September 21st, 2011

Neo4j and Scala hacking notes

From the post:

This week FOSS4G, though it has nothing in particular to do with geospatial (…yet), I’ve started hacking around graph database Neo4j in Scala because I’m convinced both are the future. I’ve had almost no experience with either.

Dwins kindly held my hand through this process. He knows a hell of a lot about Scala and guided me through how some of the language features could help me work with the Neo4j API. In this post, I will try to describe the process and problems we ran into and parrot his explanations.

Very nice introduction to using Neo4j and Scala.

I am not sure if the lesson is to read documentation first or not. See what you think. ;-)

Extreme Cleverness: Functional Data Structures in Scala

Sunday, September 18th, 2011

Extreme Cleverness: Functional Data Structures in Scala by Daniel Spiewak.

Daniel is an enthusiastic and engaging speaker.

The graphics are particularly helpful.

The influence of chip architecture on the usefulness of data structures was interesting.

All the code, etc., at: http://www.github.com/djspiewak/extreme-cleverness

High Wizardry in the Land of Scala

Friday, September 9th, 2011

High Wizardry in the Land of Scala by Daniel Spiewak.

Daniel is obviously not a decaf fan. ;-)

Covers some advanced features of Scala:

  • Higher-Kinds
  • Type Classes
  • Type-Level Encoding
  • Continuations
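
As a taste of one item on that list, here is a minimal type class sketch. It is my own illustration, not code from Daniel's talk; Show, describe and the instances are hypothetical names.

```scala
object TypeClassSketch {
  // The type class: a capability that can be defined for a type after the fact.
  trait Show[A] {
    def show(a: A): String
  }

  object Show {
    // Instance for Int.
    implicit val intShow: Show[Int] = new Show[Int] {
      def show(a: Int): String = s"Int($a)"
    }

    // Instance for List[A], derived from whatever instance exists for A.
    implicit def listShow[A](implicit ev: Show[A]): Show[List[A]] =
      new Show[List[A]] {
        def show(as: List[A]): String = as.map(ev.show).mkString("[", ", ", "]")
      }
  }

  // Works for any A that has a Show instance; the compiler supplies it.
  def describe[A](a: A)(implicit ev: Show[A]): String = ev.show(a)
}
```

Neither Int nor List had to be modified or subclassed; the compiler assembles the List[Int] instance from the two definitions in the companion object.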

He mentions a series of posts at Apocalisp, which starts with: Type-Level Programming in Scala, which has a listing of the following posts, sort of. The list is incomplete and not really consistent with all the following articles. A complete listing would be handy. To save the author time I may contribute one.

I am wondering if type encoding of data structures would be useful with complex subject identifiers?

BTW, is Ruby a better language for conferences? Daniel mentions that at Ruby conferences they have porn in their slides. All that you find here is a nasty looking math slide. Would Benjamin be the right person to ask?

How Neo4j uses Scala’s Parser Combinator: Cypher’s internals – Part 1

Thursday, September 8th, 2011

How Neo4j uses Scala’s Parser Combinator: Cypher’s internals – Part 1

From the post:

I think that most of us, software developers, while kids, always wanted to know how things were made by inside. Since I was a child, I always wanted to understand how my toys worked. And then, what I used to do? Opened’em, sure. And of course, later, I wasn’t able to re-join its pieces properly, but this is not this post subject ;) . Well, understanding how things works behind the scenes can teach us several things, and in software this is no different, and we can study how an specific piece of code was created and mixed together with other code.

In this series of posts I’ll share what I’ve found inside Neo4J implementation, specifically, at Cypher’s code (its query language).

In this first part, I’ll briefly introduce Neo4J and Cypher and then I’ll start to explain the internals of its parser and how it works. Since it is a long (very very long subject, in fact), part 2 and subsequents are coming very very soon.

If you want to understand the internals of a graph query language, this looks like a good place to start.
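
If parser combinators are new to you, here is a hand-rolled sketch of the idea: small parsers glued together with operators into a parser for a simplified Start clause. This is not Neo4j's code and does not use Scala's parser combinator library; Parser, literal, regex, StartNode and the grammar are all simplified assumptions of mine.

```scala
object MiniParserCombinators {
  // A parser consumes a prefix of the input and returns a value plus the rest.
  final case class Parser[A](run: String => Option[(A, String)]) {
    // Sequence: run this parser, then `next` on what is left.
    def ~[B](next: => Parser[B]): Parser[(A, B)] = Parser[(A, B)] { in =>
      run(in).flatMap { case (a, rest) =>
        next.run(rest).map { case (b, rest2) => ((a, b), rest2) }
      }
    }

    // Alternative: try this parser, fall back to `other` on failure.
    def |(other: => Parser[A]): Parser[A] =
      Parser[A](in => run(in).orElse(other.run(in)))

    // Transform the parsed value.
    def map[B](f: A => B): Parser[B] =
      Parser[B](in => run(in).map { case (a, rest) => (f(a), rest) })
  }

  // Matches a fixed token, ignoring case and leading whitespace.
  def literal(s: String): Parser[String] = Parser[String] { in =>
    val t = in.dropWhile(_.isWhitespace)
    if (t.toLowerCase.startsWith(s.toLowerCase)) Some((s, t.drop(s.length))) else None
  }

  // Matches a regular expression at the start of the input.
  def regex(r: scala.util.matching.Regex): Parser[String] = Parser[String] { in =>
    val t = in.dropWhile(_.isWhitespace)
    r.findPrefixOf(t).map(m => (m, t.drop(m.length)))
  }

  val identifier: Parser[String] = regex("[A-Za-z_][A-Za-z0-9_]*".r)
  val number: Parser[Long] = regex("[0-9]+".r).map(_.toLong)

  final case class StartNode(variable: String, id: Long)

  // Simplified grammar: start <identifier> = node(<number>)
  val startClause: Parser[StartNode] =
    (literal("start") ~ identifier ~ literal("=") ~ literal("node") ~
      literal("(") ~ number ~ literal(")")).map {
      case ((((((_, v), _), _), _), id), _) => StartNode(v, id)
    }
}
```

Running MiniParserCombinators.startClause.run("start n = node(1)") yields the StartNode plus the unconsumed input; input that does not begin with a Start clause yields None.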


Update: Neo4j’s Cypher internals – Part 2: All clauses, more Scala’s Parser Combinators and query entry point

Scala for the Intrigued & Scala Traits

Wednesday, September 7th, 2011

Scala for the Intrigued & Scala Traits

The current issue of PragPub (Sept-2011) has a pair of articles on Scala.

“Scala for the Intrigued” by Venkat Subramaniam starts a new series on Scala. Conciseness is the emphasis of the first post, with the implication that conciseness is a virtue. Perhaps, perhaps. I prefer to think of “conciseness” as the proper use of macros for substitution. Still, it looks like an interesting series to follow.

Brian Tarbox in “Scala Traits” counters an “antipattern” in Java with a “pattern” in Scala using “traits.” Different choices than were made for Java so Scala offers different capabilities. Capabilities that you may find useful. Or not. But worth your while to consider.
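
For readers who have not met traits, a minimal sketch of mixing behavior into a class. This is my own toy example, not the pattern from Brian's article; Greeter, Loud and Person are made-up names.

```scala
object TraitSketch {
  // A trait can declare abstract members and also carry concrete behavior.
  trait Greeter {
    def name: String
    def greet: String = s"Hello, $name"
  }

  // A second trait refines the behavior it inherits.
  trait Loud extends Greeter {
    override def greet: String = super.greet.toUpperCase + "!"
  }

  class Person(val name: String) extends Greeter

  val quiet = new Person("Ada")
  // Traits can be mixed in per instance, without declaring a new class.
  val loud = new Person("Ada") with Loud
}
```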

Scala Style Guide

Saturday, July 23rd, 2011

Scala Style Guide

From the webpage:

In lieu of an official style guide from EPFL, or even an unofficial guide from a community site like Artima, this document is intended to outline some basic Scala stylistic guidelines which should be followed with more or less fervency. Wherever possible, this guide attempts to detail why a particular style is encouraged and how it relates to other alternatives. As with all style guides, treat this document as a list of rules to be broken. There are certainly times when alternative styles should be preferred over the ones given here.

Question: Is it a sign of maturity for a programming language to start having religious wars over styles?

Just curious. Thought this might mark a milestone in the development of Scala.

Overview: Visualization to Connect the Dots

Tuesday, July 19th, 2011

Overview is Hiring!

I don’t think I have ever re-posted a job ad but this one merits wide distribution:

We need two Java or Scala ninjas to build the core analytics and visualization components of Overview, and lead the open-source development community. You’ll work in the newsroom at AP’s global headquarters in New York, which will give you plenty of exposure to the very real problems of large document sets.

The exact responsibilities will depend on who we hire, but we imagine that one of these positions will be more focused on user experience and process design, while the other will do the computer science heavy lifting — though both must be strong, productive software engineers. Core algorithms must run on a distributed cluster, and scale to millions of documents. Visualization will be through high-performance OpenGL. And it all has to be simple and obvious for a reporter on deadline who has no time to fight technology. You will be expected to implement complex algorithms from academic references, and expand prototype techniques into a production application.

From the about page:

Overview is an open-source tool to help journalists find stories in large amounts of data, by cleaning, visualizing and interactively exploring large document and data sets. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.

There are good tools for searching within large document sets for names and keywords, but that doesn’t help find stories we’re not looking for. Overview will display relationships among topics, people, places and dates to help journalists to answer the question, “What’s in there?”

We’re building an interactive system where computers do the visualization, while a human guides the exploration. We will also produce documentation and training to help people learn how to use this system. The goal is to make this capability available to anyone who needs it.

Overview is a project of The Associated Press, supported by the John S. and James L. Knight Foundation as part of its Knight News Challenge. The Associated Press invests its resources to advance the news industry, delivering fast, unbiased news from every corner of the world to all media platforms and formats. The Knight News Challenge is an international contest to fund digital news experiments that use technology to inform and engage communities.

Sounds like a project that is worth supporting to me!

Analytics are great, but subject identity would be more useful.

Apply if you have the skill sets, repost the link, and/or volunteer to carry the good news of topic maps to the project.

Scala for the Curious Erlang Programmer

Wednesday, July 13th, 2011

Dean Wampler – Scala for the Curious Erlang Programmer

From the description:

Scala is a statically-typed, hybrid functional and object-oriented language for the JVM. The Scala standard library includes an Erlang- inspired Actors library. In this talk, I’ll discuss how Scala compares and contrasts to Erlang, highlighting the advantages and disadvantages of each language for particular needs. For example, we’ll discuss the pros and cons of a rich type system and static typing in Scala. We’ll discuss ways that Scala is perhaps more general purpose than Erlang, but not as powerful in the areas where Erlang excels.

Always useful to choose the right tool for a task. Including semantics as understood by users.

You may also enjoy Dean’s Polyglotprogramming site, with links to his presentations and blog.

Scaling Scala at Twitter by Marius Eriksen

Tuesday, July 12th, 2011

Scaling Scala at Twitter by Marius Eriksen

From the description:

Rockdove is the backend service that powers the geospatial features on Twitter.com and the Twitter API (“Twitter Places”). It provides a datastore for places and a geospatial search engine to find them. To throw out some buzzwords, it is:

  • a distributed system
  • realtime (immediately indexes updates and changes)
  • horizontally scalable
  • fault tolerant

Rockdove is written entirely in Scala and was developed by 2 engineers with no prior Scala experience (nor with Java or the JVM). We think the geospatial search engine provides an interesting case study as it presents a mix of algorithm problems and “classic” scaling and optimization issues. We will report on our experience using Scala, focusing especially on:

  • “functional” systems design
  • concurrency and parallelism
  • using a “research language” in practice
  • when, where and why we turned the “functional dial”
  • avoiding mutable state

Not to mention being a well done presentation!

Spark – Lightning-Fast Cluster Computing

Monday, June 27th, 2011

Spark – Lightning-Fast Cluster Computing

From the webpage:

What is Spark?

Spark is an open source cluster computing system that aims to make data analytics fast — both fast to run and fast to write.

To run programs faster, Spark provides primitives for in-memory cluster computing: your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop MapReduce.

To make programming faster, Spark integrates into the Scala language, letting you manipulate distributed datasets like local collections. You can also use Spark interactively to query big data from the Scala interpreter.

What can it do?

Spark was initially developed for two applications where keeping data in memory helps: iterative algorithms, which are common in machine learning, and interactive data mining. In both cases, Spark can outperform Hadoop by 30x. However, you can use Spark’s convenient API for general data processing too. Check out our example jobs.

Spark runs on the Mesos cluster manager, so it can coexist with Hadoop and other systems. It can read any data source supported by Hadoop.

Who uses it?

Spark was developed in the UC Berkeley AMP Lab. It’s used by several groups of researchers at Berkeley to run large-scale applications such as spam filtering, natural language processing and road traffic prediction. It’s also used to accelerate data analytics at Conviva. Spark is open source under a BSD license, so download it to check it out!

Hadoop must be doing something right to be treated as the solution to beat.

Still, depending on your requirements, Spark definitely merits your consideration.
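
The claim about manipulating distributed datasets "like local collections" is easier to picture with the local version in hand. Below is a word count over an ordinary in-memory Scala list. It deliberately uses no Spark calls, since I don't want to guess at its API; the object name and sample lines are made up.

```scala
object LocalCollectionsSketch {
  val lines: List[String] = List(
    "spark makes analytics fast",
    "fast to run and fast to write"
  )

  // Split lines into words, group equal words, and count each group.
  val counts: Map[String, Int] =
    lines
      .flatMap(line => line.split(" ").toList)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }
}
```

The pitch, as I read it, is that a job over a cluster-sized dataset is written in much this style.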

TinySearchEngine

Monday, June 27th, 2011

TinySearchEngine

A search engine written in 30 lines of Scala.

Features:

  • in-memory index
  • norms and IDF calculated online
  • default OR operator between query terms
  • index a document per line from a single file
  • read stopwords from a file
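
To give an idea of how little code such features need, here is a sketch of an in-memory inverted index with IDF weighting and OR semantics between query terms. It is not the TinySearchEngine source: the names are hypothetical, documents and stopwords are passed in directly rather than read from files, and norms are left out.

```scala
object TinyIndexSketch {
  // `docs` holds one document per entry; the position is the document id.
  final class Index(docs: Vector[String], stopwords: Set[String]) {
    private def tokens(text: String): List[String] =
      text.toLowerCase.split("\\W+").toList
        .filter(t => t.nonEmpty && !stopwords(t))

    // term -> (document id -> term frequency)
    private val postings: Map[String, Map[Int, Int]] =
      docs.zipWithIndex
        .flatMap { case (doc, id) => tokens(doc).map(t => (t, id)) }
        .groupBy(_._1)
        .map { case (term, hits) =>
          term -> hits.groupBy(_._2).map { case (id, hs) => id -> hs.size }
        }

    // Inverse document frequency: rarer terms weigh more.
    // Only called for terms that are present in `postings`.
    private def idf(term: String): Double =
      math.log(docs.size.toDouble / postings(term).size)

    // OR semantics: a document matches if it contains any query term.
    // Results are document ids, best score first, ties broken by id.
    def search(query: String): List[Int] = {
      val hits = for {
        term     <- tokens(query).distinct
        (id, tf) <- postings.getOrElse(term, Map.empty[Int, Int]).toList
      } yield (id, tf * idf(term))

      hits
        .groupBy(_._1)
        .map { case (id, scored) => (id, scored.map(_._2).sum) }
        .toList
        .sortWith { (a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1) }
        .map(_._1)
    }
  }
}
```

A term that appears in every document gets an IDF of zero, so it still matches but contributes nothing to the ranking.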