Archive for the ‘Scala’ Category

A walk-through for the Twitter streaming API

Sunday, April 14th, 2013

A walk-through for the Twitter streaming API by Jason Baldridge.

From the post:

Analyzing tweets is all the rage, and if you are new to the game you want to know how to get them programmatically. There are many ways to do this, but a great start is to use the Twitter streaming API, a RESTful service that allows you to pull tweets in real time based on criteria you specify. For most people, this will mean having access to the spritzer, which provides only a very small percentage of all the tweets going through Twitter at any given moment. For access to more, you need to have a special relationship with Twitter or pay Twitter or an affiliate like Gnip.

This post provides a basic walk-through for using the Twitter streaming API. You can get all of this based on the documentation provided by Twitter, but this will be slightly easier going for those new to such services. (This post is mainly geared for the first phase of the course project for students in my Applied Natural Language Processing class this semester.)

You need to have a Twitter account to do this walk-through, so obtain one now if you don’t have one already.

Basics of obtaining tweets from the Twitter stream.

I mention it as an active data source that may find its way into your topic map.

Saddle

Friday, April 5th, 2013

Saddle

From the webpage:

Saddle is a data manipulation library for Scala that provides array-backed, indexed, one- and two-dimensional data structures that are judiciously specialized on JVM primitives to avoid the overhead of boxing and unboxing.

Saddle offers vectorized numerical calculations, automatic alignment of data along indices, robustness to missing (N/A) values, and facilities for I/O.

Saddle draws inspiration from several sources, among them the R programming language & statistical environment, the numpy and pandas Python libraries, and the Scala collections library.

I have heard that one- and two-dimensional data structures can be quite useful. ;-)

Something to play with over the weekend.

Enjoy!
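To get a feel for what "automatic alignment of data along indices" and N/A handling mean, here is a toy sketch in plain Scala. It is not Saddle's API: ToySeries and its + method are hypothetical names. Adding two series yields a result indexed by the union of both indices, with None standing in for N/A wherever a label is missing on one side.

```scala
// Toy index alignment with N/A values, in the spirit of Saddle's Series.
// Hypothetical names throughout; this is NOT Saddle's API.
final case class ToySeries(data: Map[String, Double]) {
  // The result is indexed by the union of both indices; a label missing
  // from either side produces None (N/A) instead of a number.
  def +(other: ToySeries): Map[String, Option[Double]] =
    (data.keySet ++ other.data.keySet).map { k =>
      val sum = for {
        a <- data.get(k)
        b <- other.data.get(k)
      } yield a + b
      k -> sum
    }.toMap
}

val s1 = ToySeries(Map("a" -> 1.0, "b" -> 2.0))
val s2 = ToySeries(Map("b" -> 10.0, "c" -> 30.0))
println(s1 + s2)
```

Saddle does this over primitive-specialized arrays; the Map here only illustrates the semantics, not the performance.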

“Functional Programming for…Big Data”

Wednesday, March 20th, 2013

“Functional Programming for optimization problems in Big Data” by Paco Nathan.

Interesting slide deck, even if it doesn’t start with high drama. ;-)

Covers:

  1. Data Science
  2. Functional Programming
  3. Workflow Abstraction
  4. Typical Use Cases
  5. Open Data Example

The reading list mentioned in these slides makes a nice self-review course in data science.

The Open Data Example is for Palo Alto, but you can substitute a city with open data closer to home.

Applied Natural Language Processing

Wednesday, March 20th, 2013

Applied Natural Language Processing by Jason Baldridge.

Description:

This class will provide instruction on applying algorithms in natural language processing and machine learning for experimentation and for real world tasks, including clustering, classification, part-of-speech tagging, named entity recognition, topic modeling, and more. The approach will be practical and hands-on: for example, students will program common classifiers from the ground up, use existing toolkits such as OpenNLP, Chalk, StanfordNLP, Mallet, and Breeze, construct NLP pipelines with UIMA, and get some initial experience with distributed computation with Hadoop and Spark. Guidance will also be given on software engineering, including build tools, git, and testing. It is assumed that students are already familiar with machine learning and/or computational linguistics and that they already are competent programmers. The programming language used in the course will be Scala; no explicit instruction will be given in Scala programming, but resources and assistance will be provided for those new to the language.

From the syllabus:

The foremost goal of this course is to provide practical exposure to the core techniques and applications of natural language processing. By the end, students will understand the motivations for and capabilities of several core natural language processing and machine learning algorithms and techniques used in text analysis, including:

  • regular expressions
  • vector space models
  • clustering
  • classification
  • deduplication
  • n-gram language models
  • topic models
  • part-of-speech tagging
  • named entity recognition
  • PageRank
  • label propagation
  • dependency parsing

We will show, on a few chosen topics, how natural language processing builds on and uses the fundamental data structures and algorithms presented in this course. In particular, we will discuss:

  • authorship attribution
  • language identification
  • spam detection
  • sentiment analysis
  • influence
  • information extraction
  • geolocation

Students will learn to write non-trivial programs for natural language processing that take advantage of existing open source toolkits. The course will involve significant guidance and instruction in software engineering practices and principles, including:

  • functional programming
  • distributed version control systems (git)
  • build systems
  • unit testing
  • distributed computing (Hadoop)

The course will help prepare students both for jobs in the industry and for doing original research that involves natural language processing.

A great start to one aspect of being a “data scientist.”

I encountered this course via the Nak (Scala library for NLP) project. Version 1.1.1 was just released and I saw a tweet from Jason Baldridge on the same.

The course materials have exercises and a rich set of links to other resources.

You may also enjoy:

Jason Baldridge’s homepage.

Bcomposes (Jason’s blog).

Programming Isn’t Math

Sunday, March 10th, 2013

Programming Isn’t Math by Oscar Boykin.

From the description:

Functional programming has a rich history of drawing from mathematical theory, yet in this highly entertaining talk from the Northeast Scala Symposium, Twitter data scientist Oscar Boykin makes the case that programming is distinct from mathematics. This distinction is healthy and does not mean we can’t leverage many results and concepts from mathematics.

As examples, Oscar will discuss some recent work (algebird, bijection, scalding) and show cases where mathematical purity was both helpful and harmful to developing products at Twitter.

The phrase “…highly entertaining…” may be an understatement.

The type of presentation where you want to start reading new material during the presentation but are afraid of missing the next gold nugget!

Definitely one to start the week on!
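Algebird is built around algebraic structures such as monoids. A minimal sketch of why that matters for aggregation jobs: an associative combine with an identity lets partial counts from separate workers be merged in any grouping and still give the same answer. This is a hand-rolled type class for illustration, not Algebird's actual Monoid trait.

```scala
// A minimal monoid type class: an associative combine plus an identity.
// Hypothetical sketch; Algebird's real Monoid differs in detail.
trait Monoid[A] {
  def empty: A
  def combine(x: A, y: A): A
}

object Monoid {
  implicit val intSum: Monoid[Int] = new Monoid[Int] {
    def empty: Int = 0
    def combine(x: Int, y: Int): Int = x + y
  }

  // Maps merge key by key, combining values with the value monoid.
  implicit def mapMonoid[K, V](implicit v: Monoid[V]): Monoid[Map[K, V]] =
    new Monoid[Map[K, V]] {
      def empty: Map[K, V] = Map.empty
      def combine(x: Map[K, V], y: Map[K, V]): Map[K, V] =
        y.foldLeft(x) { case (acc, (k, vy)) =>
          acc.updated(k, acc.get(k).fold(vy)(vx => v.combine(vx, vy)))
        }
    }
}

// Because combine is associative, partial results can be merged in any grouping.
def combineAll[A](xs: Seq[A])(implicit m: Monoid[A]): A =
  xs.foldLeft(m.empty)(m.combine)

val partials = Seq(Map("scala" -> 2), Map("scala" -> 1, "math" -> 4))
println(combineAll(partials)) // Map(scala -> 3, math -> 4)
```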

Neo4j and Gatling Sitting in a Tree, Performance T-E-S-T-ING

Thursday, February 14th, 2013

Neo4j and Gatling Sitting in a Tree, Performance T-E-S-T-ING by Max De Marzi.

From the post:

I was introduced to the open-source performance testing tool Gatling a few months ago by Dustin Barnes and fell in love with it. It has an easy to use DSL, and even though I don’t know a lick of Scala, I was able to figure out how to use it. It creates pretty awesome graphics and takes care of a lot of work for you behind the scenes. They have great documentation and a pretty active Google group where newbies and questions are welcomed.

It requires you to have Scala installed, but once you do all you need to do is create your tests and use a command line to execute them. I’ll show you how to do a few basic things, like test that you have everything working, then we’ll create nodes and relationships, and then query those nodes.

You did run performance tests on your semantic application. Yes?

Semantic Search for Scala – Post 1

Saturday, February 2nd, 2013

Semantic Search for Scala – Post 1 by Mads Hartmann Jensen.

From the post:

The goal of the project is to create a semantic search engine for Scala, in the form of a library, and integrate it with the Scala IDE plugin for Eclipse. Part of the solution will be to index all aspects of Scala code, that is:

  • Definitions of the usual Scala elements: classes, traits, objects, methods, fields, etc.
  • References to the above elements. Some more challenging cases to consider are self-types, type-aliases, code injected by the compiler, and implicits.

With this information the library should be able to

  • Find all occurrences of any type of Scala element
  • Create a call-hierarchy, that is, list all incoming and outgoing method invocations for any Scala method.
  • Create a type-hierarchy, i.e. list all super- and subclasses, of a specific type (I won’t necessarily find time to implement this during my thesis but nothing is stopping me from working on the project even after I hand in the report)

Mads is working on his master’s thesis and Typesafe has agreed to collaborate with him.

For a longer description of the project (or to comment), see: Features and Trees

If you have suggestions on semantic search for programming languages, please contact Mads on Twitter: @Mads_Hartmann.
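The call-hierarchy feature boils down to two indexed lookups over caller/callee pairs. A sketch, assuming the edges have already been extracted by the indexer (Call and CallIndex are hypothetical names, not part of the Scala IDE or Mads's library):

```scala
// Hypothetical call-hierarchy index over caller -> callee pairs, assumed to
// have been extracted already (e.g. from an index of method references).
final case class Call(caller: String, callee: String)

final class CallIndex(calls: Seq[Call]) {
  // Outgoing: methods invoked by a given method.
  private val outgoingMap: Map[String, Set[String]] =
    calls.groupMap(_.caller)(_.callee).map { case (m, cs) => m -> cs.toSet }
  // Incoming: methods that invoke a given method.
  private val incomingMap: Map[String, Set[String]] =
    calls.groupMap(_.callee)(_.caller).map { case (m, cs) => m -> cs.toSet }

  def outgoing(method: String): Set[String] = outgoingMap.getOrElse(method, Set.empty)
  def incoming(method: String): Set[String] = incomingMap.getOrElse(method, Set.empty)
}

val index = new CallIndex(Seq(
  Call("Main.run", "Parser.parse"),
  Call("Main.run", "Indexer.index"),
  Call("Indexer.index", "Parser.parse")
))
println(index.incoming("Parser.parse"))
println(index.outgoing("Main.run"))
```

A real implementation would also have to resolve the hard cases listed above (implicits, self-types, compiler-injected code) before edges like these exist.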

The Neophyte’s Guide to Scala Part [n]…

Saturday, January 26th, 2013

Daniel Westheide has a series of posts introducing Scala to neophytes.

As of today:

The Neophyte’s Guide to Scala Part 1: Extractors

The Neophyte’s Guide to Scala Part 2: Extracting Sequences

The Neophyte’s Guide to Scala Part 3: Patterns Everywhere

The Neophyte’s Guide to Scala Part 4: Pattern Matching Anonymous Functions

The Neophyte’s Guide to Scala Part 5: The Option type

The Neophyte’s Guide to Scala Part 6: Error handling with Try

The Neophyte’s Guide to Scala Part 7: The Either type

The Neophyte’s Guide to Scala Part 8: Welcome to the Future

The Neophyte’s Guide to Scala Part 9: Promises and Futures in practice

The Neophyte’s Guide to Scala Part 10: Staying DRY with higher-order functions

Apologies for not seeing this sooner.

It makes a nice starting place for the Functional Programming Principles in Scala class by Martin Odersky, which starts on 25th March 2013.

I first saw this at Chris Cundill’s This week in #Scala (26/01/2013).
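For a taste of what the early parts cover, a small sketch combining a custom extractor (Part 1), Try (Part 6) and Either (Part 7). The Email extractor and the helper functions are made up for illustration:

```scala
import scala.util.{Failure, Success, Try}

// A custom extractor (Part 1): unapply returns an Option of the parts.
object Email {
  def unapply(s: String): Option[(String, String)] = s.split("@") match {
    case Array(user, domain) if user.nonEmpty && domain.nonEmpty => Some((user, domain))
    case _                                                       => None
  }
}

// The extractor plugs straight into pattern matching, and
// Either (Part 7) carries a descriptive error on the Left.
def describe(s: String): Either[String, String] = s match {
  case Email(user, domain) => Right(s"user=$user domain=$domain")
  case _                   => Left(s"not an email address: $s")
}

// Try (Part 6) turns a thrown exception into a Failure value.
def parsePort(s: String): Try[Int] = Try(s.trim.toInt)

println(describe("jason@example.org"))
parsePort("9000x") match {
  case Success(port) => println(s"port $port")
  case Failure(e)    => println(s"bad port: ${e.getMessage}")
}
```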

Functional Programming Principles in Scala

Saturday, January 26th, 2013

Functional Programming Principles in Scala by Martin Odersky.

March 25th 2013 (7 weeks long)

From the webpage:

This course introduces the cornerstones of functional programming using the Scala programming language. Functional programming has become more and more popular in recent years because it promotes code that’s safe, concise, and elegant. Furthermore, functional programming makes it easier to write parallel code for today’s and tomorrow’s multiprocessors by replacing mutable variables and loops with powerful ways to define and compose functions.

Scala is a language that fuses functional and object-oriented programming in a practical package. It interoperates seamlessly with Java and its tools. Scala is now used in a rapidly increasing number of open source projects and companies. It provides the core infrastructure for sites such as Twitter, LinkedIn, Foursquare, Tumblr, and Klout.

In this course you will discover the elements of the functional programming style and learn how to apply them usefully in your daily programming tasks. You will also develop a solid foundation for reasoning about functional programs, by touching upon proofs of invariants and the tracing of execution symbolically.

The course is hands on; most units introduce short programs that serve as illustrations of important concepts and invite you to play with them, modifying and improving them. The course is complemented by a series of assignments, most of which are also programming projects.

In case you missed it last time.

I first saw this at Chris Cundill’s This week in #Scala (26/01/2013).
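The claim about "replacing mutable variables and loops with powerful ways to define and compose functions" is easiest to see side by side. A minimal sketch (the function names are illustrative, not from the course):

```scala
// Imperative style: a mutable accumulator updated inside a loop.
def sumOfSquaresLoop(xs: Seq[Int]): Int = {
  var total = 0
  for (x <- xs) total += x * x
  total
}

// Functional style: no reassignment; the same result from composed operations.
def sumOfSquares(xs: Seq[Int]): Int = xs.map(x => x * x).sum

// Functions are values and compose directly with andThen.
val square: Int => Int = x => x * x
val increment: Int => Int = _ + 1
val squareThenIncrement: Int => Int = square andThen increment

println(sumOfSquares(Seq(1, 2, 3)))
println(squareThenIncrement(3))
```

Because the functional version has no shared mutable state, it is the kind of code that is easier to split across cores, which is the parallelism point the course description makes.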

…Functional Programming and Scala

Thursday, January 17th, 2013

Resources for Getting Started With Functional Programming and Scala by Kelsey Innis.

From the post:

This is the “secret slide” from my recent talk Learning Functional Programming without Growing a Neckbeard, with links to the sources I used to put the talk together and some suggestions for ways to get started writing Scala code.

The “…without growing a neckbeard” merits mention even if you are not interested in functional programming and topic maps.

Nice list of resources.

Don’t miss the presentation!

I first saw this at This week in #Scala (11/01/2013) by Chris Cundill.

Scala Cheatsheet

Monday, January 7th, 2013

Scala Cheatsheet by Brendan O’Connor.

Quick reference to Scala syntax.

Also includes examples of bad practice, labeled as such.

I first saw this at This week in #Scala (04/01/2013) by Chris Cundill.

Computational Finance with Map-Reduce in Scala [Since Quants Have Funding]

Wednesday, November 28th, 2012

Computational Finance with Map-Reduce in Scala by Ron Coleman, Udaya Ghattamaneni, Mark Logan, and Alan Labouseur. (PDF)

Assuming the computations performed by quants are semantically homogeneous (a big assumption), the sources of their data and the application of their outcomes are not.

The clients of quants aren’t interested in you humming “…it’s a big world after all…,” etc. They are interested in the furtherance of their financial operations.

Using topic maps to make an already effective tool more effective is the most likely way to capture their interest. (Short of taking hostages.)

I first saw this in a tweet by Data Science London.

Graham’s Guide to Learning Scala

Sunday, November 25th, 2012

Graham’s Guide to Learning Scala by Graham Lee.

From the post:

It’s a pretty widely-accepted view that, as a programmer, learning new languages is a Good Idea™. Most people with more than one language under their belt would say that learning new languages broadens your mind in ways that will positively affect the way you work, even if you never use that language again.

With the Christmas holidays coming up and many people likely to take some time off work, this end of the year presents a great opportunity to take some time out from your week-to-week programming grind and do some learning.

With that in mind, I present “Graham’s Guide to Learning Scala”. There are many, many resources on the web for learning about Scala. In fact, I think there’s probably too many! It would be quite easy to start in the wrong place and quickly get discouraged.

So this is not yet another resource to add to the pile. Rather, this is a guided course through what I believe are some of the best resources for learning Scala, and in an order that I think will help a complete newbie pick it up quickly but without feeling overwhelmed.

And, best of all, it has 9 Steps!

As Graham says, the holidays are coming up.

One way to avoid nosey family members, ravenous cousins and in-laws, almost off-key (you would have to know the key to be off-key) singing, is to spend some quality time with your laptop.

Graham offers a good selection of resources to fill a week, either now or at some other down time of the year.

Intro to Scalding by @posco and @argyris [video lecture]

Sunday, November 4th, 2012

Intro to Scalding by @posco and @argyris by Marti Hearst.

From the post:

On Thursday we learned about an alternative language for analyzing big data: Scalding. It’s built on Scala and is used extensively by the Twitter Revenue group. Oscar Boykin presented a lecture that he and Argyris Zymnis put together for us:

(video – see Marti’s post)

Because Scalding is built on the functional programming language Scala, it has an advantage over Pig in that you can have the equivalent of user-defined functions directly in your code. See the lecture notes for more details. Be sure to watch the video to get all the details, especially since Oscar managed to make us all laugh throughout his lecture. Thanks guys!

Another great lecture from Marti’s class, “Analyzing Big Data with Twitter.”

When the revenue department of a business, at least a successful business, starts using a technology, it’s time to take notice.

Atomic Scala

Tuesday, August 28th, 2012

Atomic Scala by Bruce Eckel and Dianne Marsh.

From the webpage:

Atomic Scala is meant to be your first Scala book, not your last. We show you enough to become familiar and comfortable with the language — competent, but not expert. You’ll be able to write useful Scala code, but you won’t necessarily be able to read all the Scala code you encounter.

When you’re done, you’ll be ready for more complex Scala books, several of which we recommend at the end of this book.

The first 25% of the book is available for download.

Take a peek at the “about” page if the author names sound familiar. ;-)

I first saw this at Christopher Lalanne’s A bag of tweets / August 2012.

ScalaNLP

Monday, August 20th, 2012

ScalaNLP

From the homepage:

ScalaNLP is a suite of machine learning and numerical computing libraries.

ScalaNLP is the umbrella project for Breeze and Epic. Breeze is a set of libraries for machine learning and numerical computing. Epic (coming soon) is a high-performance statistical parser.

From the about page:

Breeze is a suite of Scala libraries for numerical processing, machine learning, and natural language processing. Its primary focus is on being generic, clean, and powerful without sacrificing (much) efficiency.

The library currently consists of several parts:

  • breeze-math: Linear algebra and numerics routines
  • breeze-process: Libraries for processing text and managing data pipelines.
  • breeze-learn: Machine Learning, Statistics, and Optimization.

Possible future releases:

  • breeze-viz: Visualization and plotting
  • breeze-fst: Finite state toolkit

Breeze is the merger of the ScalaNLP and Scalala projects, because one of the original maintainers is unable to continue development. The Scalala parts are largely rewritten.

Epic is a high-performance statistical parser written in Scala. It uses Expectation Propagation to build complex models without suffering the exponential runtimes one would get in a naive model. Epic is nearly state-of-the-art on the standard benchmark dataset in Natural Language Processing. We will be releasing Epic soon.

In case you are interested in project history, see the Scalala source.

A fairly new community, so drop by and say hello.

Scalding for the Impatient

Sunday, August 12th, 2012

Scalding for the Impatient by Sujit Pal.

From the post:

A few weeks ago, I wrote about Pig, a DSL that allows you to specify a data processing flow in terms of PigLatin operations, and results in a sequence of Map-Reduce jobs on the backend. Cascading is similar to Pig, except that it provides a (functional) Java API to specify a data processing flow. One obvious advantage is that everything can now be in a single language (no more having to worry about UDF integration issues). But there are others as well, as detailed here and here.

Cascading is well documented, and there is also a very entertaining series of articles titled Cascading for the Impatient that builds up a Cascading application to calculate TF-IDF of terms in a (small) corpus. The objective is to showcase the features one would need to get up and running quickly with Cascading.

Scalding is a Scala DSL built on top of Cascading. As you would expect, Cascading code is an order of magnitude shorter than equivalent Map-Reduce code. But because Java is not a functional language, implementing functional constructs leads to some verbosity in Cascading that is eliminated in Scalding, leading to even shorter and more readable code.

I was looking for something to try my newly acquired Scala skills on, so I hit upon the idea of building up a similar application to calculate TF-IDF for terms in a corpus. The table below summarizes the progression of the Cascading for the Impatient series. I’ve provided links to the original articles for the theory (which is very nicely explained there) and links to the source code for both the Cascading and Scalding versions.

A very nice side by side comparison and likely to make you interested in Scalding.
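For readers who have not met TF-IDF, a toy in-memory version in plain Scala. It uses raw term counts for tf and log(N / df) for idf, which is one common variant; the Cascading and Scalding versions may weight terms differently, and this sketch runs on a single machine rather than as Map-Reduce jobs.

```scala
// Toy TF-IDF: tf = raw count of a term in a document,
// idf = log(N / df), where df = number of documents containing the term.
object TfIdf {
  def tokenize(doc: String): Seq[String] =
    doc.toLowerCase.split("""[^a-z]+""").toSeq.filter(_.nonEmpty)

  def scores(docs: Map[String, String]): Map[String, Map[String, Double]] = {
    val tokens: Map[String, Seq[String]] =
      docs.map { case (id, text) => id -> tokenize(text) }
    val n = docs.size.toDouble
    // Document frequency: each document counts a term at most once.
    val df: Map[String, Int] =
      tokens.values.toSeq.flatMap(_.distinct).groupMapReduce(identity)(_ => 1)(_ + _)
    tokens.map { case (id, ts) =>
      val tf = ts.groupMapReduce(identity)(_ => 1)(_ + _)
      id -> tf.map { case (term, count) => term -> count * math.log(n / df(term)) }
    }
  }
}

val corpus = Map(
  "d1" -> "Rain, rain, go away",
  "d2" -> "Rain on the plain"
)
println(TfIdf.scores(corpus)("d1"))
```

Note that a term appearing in every document ("rain" here) scores zero, which is the point of the idf factor.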

Scalding

Saturday, August 4th, 2012

Scalding: Powerful & Concise MapReduce Programming

Description:

Scala is a functional programming language on the JVM. Hadoop uses a functional programming model to represent large-scale distributed computation. Scala is thus a very natural match for Hadoop.

In this presentation to the San Francisco Scala User Group, Dr. Oscar Boykin and Dr. Argyris Zymnis from Twitter give us some insight on Scalding DSL and provide some example jobs for common use cases.

Twitter uses Scalding for data analysis and machine learning, particularly in cases where we need more than SQL-like queries on the logs, for instance fitting models and matrix processing. It scales beautifully from simple, grep-like jobs all the way up to jobs with hundreds of map-reduce pairs.

The Alice example failed: it counted the different forms of “Alice” as different words. I am reading a regex book, so that may have made the problem more obvious.

Lesson: Test code/examples before presentation. ;-)
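For the Alice problem, the usual fix is to normalize tokens before counting. A sketch in plain Scala collections rather than Scalding; the regular expressions are assumptions, not code from the presentation, and stripping a trailing "'s" also turns "it's" into "it":

```scala
// Word count with token normalization, so that "Alice", "ALICE" and
// "Alice's" all count as one word. Plain Scala collections, not Scalding.
object WordCount {
  // A trailing possessive 's (straight apostrophe only, an assumption).
  private val Possessive = """'s$""".r

  def normalize(token: String): String =
    Possessive.replaceAllIn(token.toLowerCase, "")

  def count(text: String): Map[String, Int] =
    text
      .split("""[^A-Za-z']+""")          // keep apostrophes inside tokens
      .toSeq
      .map(t => normalize(t.stripPrefix("'").stripSuffix("'"))) // drop quote marks
      .filter(_.nonEmpty)
      .groupMapReduce(identity)(_ => 1)(_ + _)
}

println(WordCount.count("Alice was beginning to get very tired. ALICE! Alice's sister..."))
```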

See the Github repository: https://github.com/twitter/scalding.

Both Scalding and the presentation are worth your time.

Scalatron

Wednesday, July 11th, 2012

Scalatron: Learn Scala with a programming game

From the homepage:

Scalatron is a free, open-source programming game in which bots, written in Scala, compete in a virtual arena for energy and survival. You can play by yourself against the computer or organize a tournament with friends. Scalatron may be the quickest and most entertaining way to become productive in Scala. – For updates, follow @scalatron on Twitter.

Entertaining and works right out of the “box.”

Well, remember the HBase port 8080 conflict. From the Scalatron documentation:

java -jar Scalatron.jar -help

This displays far more command-line options than will be meaningful at first.

For the HBase 8080 issue, you need:

java -jar Scalatron.jar port int

or in my case:

java -jar Scalatron.jar port 9000

Caution: on startup it will ask to make Google Chrome your default browser. Good that it asks, but annoying. Why not leave the user with whatever default browser they already prefer?

Anyway, it starts up, asks you (in a browser window) to create a user account, and lets you set the Administrator password.

The Scalatron window opens up and I can tell this could be really addictive, in or out of ISO WG meetings. ;-)

Scala resources mentioned in the Scalatron Tutorial document:

Other Resources

It’s a bit close to the metal to use as a model for a topic map “game.”

But I like the idea of “bots” (read teams) competing against each other, only at constructing a topic map.

Just sketching some rough ideas, but assuming some asynchronous means of communication (say tweets, emails, or IRC chat), a simple syntax (CTM anyone?), basic automated functions, and scoring, it should be doable, even if not on a “web” scale. ;-)

By “basic automated functions” I mean more than simply parsing syntax for addition to a topic map but including the submission of DOIs, for example, which are specified to be resolved against a vendor or library catalog, with the automatic production of additional topics, associations, etc. Repetitive entry of information by graduate students only proves they are skillful copyists.

Assuming some teams will discover the same information as others, some timing mechanism and awarding of “credit” for topics/associations/occurrences added to the map would be needed.

Not to mention the usual stuff of contests, leader board, regular updating of the map, along with graph display, etc.
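A rough sketch of the "credit" idea, assuming each submission carries a normalized subject key (say, a DOI) and an arrival timestamp: the earliest submission for a key wins, and the leaderboard counts wins per team. Submission, Credit and the example keys are hypothetical; ties on the timestamp go to whichever submission appears first in the input.

```scala
// Hypothetical contest scoring: first submission of a subject wins the credit.
final case class Submission(team: String, subjectKey: String, timestampMillis: Long)

object Credit {
  // subjectKey -> team that submitted it first.
  def firstClaims(subs: Seq[Submission]): Map[String, String] =
    subs.groupBy(_.subjectKey).map { case (key, group) =>
      key -> group.minBy(_.timestampMillis).team
    }

  // Teams ordered by number of first claims, ties broken by team name.
  def leaderboard(subs: Seq[Submission]): Seq[(String, Int)] =
    firstClaims(subs).values.toSeq
      .groupMapReduce(identity)(_ => 1)(_ + _)
      .toSeq
      .sortBy { case (team, points) => (-points, team) }
}

val submissions = Seq(
  Submission("red", "doi:example/a", 100L),
  Submission("blue", "doi:example/a", 90L),
  Submission("red", "doi:example/b", 120L)
)
println(Credit.leaderboard(submissions))
```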

Something to think about. As I tell my daughter, life is too important to be taken seriously. Perhaps the same is true about topic maps.

Forwarded by Jack Park. (Who is not responsible for my musings on the same.)

High-Performance Domain-Specific Languages using Delite

Saturday, June 2nd, 2012

High-Performance Domain-Specific Languages using Delite

Description:

This tutorial is an introduction to developing domain specific languages (DSLs) for productivity and performance using Delite. Delite is a Scala infrastructure that simplifies the process of implementing DSLs for parallel computation. The goal of this tutorial is to equip attendees with the knowledge and tools to develop DSLs that can dramatically improve the experience of using high performance computation in important scientific and engineering domains. In the first half of the day we will focus on example DSLs that provide both high-productivity and performance. In the second half of the day we will focus on understanding the infrastructure for implementing DSLs in Scala and developing techniques for defining good DSLs.

The graph manipulation language Green-Marl is one of the subjects of this tutorial.

This resource should be located and “boosted” by a search engine tuned to my preferences.

Skipping breaks, etc., you will find:

  • Introduction To High Performance DSLs (Kunle Olukotun)
  • OptiML: A DSL for Machine Learning (Arvind Sujeeth)
  • Liszt: A DSL for solving mesh-based PDEs (Zach Devito)
  • Green-Marl: A DSL for efficient Graph Analysis (Sungpack Hong)
  • Scala Tutorial (Hassan Chafi)
  • Delite DSL Infrastructure Overview (Kevin Brown)
  • High Performance DSL Implementation Using Delite (Arvind Sujeeth)
  • Future Directions in DSL Research (Hassan Chafi)

Compare your desktop computer to the MANIAC 1 (calculations for the first hydrogen bomb).

What have you invented/discovered lately?

Procedural Reflection in Programming Languages Volume 1

Saturday, April 14th, 2012

Procedural Reflection in Programming Languages Volume 1

Brian Cantwell Smith’s dissertation, the foundational document for reflection in programming languages.

Abstract:

We show how a computational system can be constructed to “reason”, effectively and consequentially, about its own inferential processes. The analysis proceeds in two parts. First, we consider the general question of computational semantics, rejecting traditional approaches, and arguing that the declarative and procedural aspects of computational symbols (what they stand for, and what behaviour they engender) should be analysed independently, in order that they may be coherently related. Second, we investigate self-referential behaviour in computational processes, and show how to embed an effective procedural model of a computational calculus within that calculus (a model not unlike a meta-circular interpreter, but connected to the fundamental operations of the machine in such a way as to provide, at any point in a computation, fully articulated descriptions of the state of that computation, for inspection and possible modification). In terms of the theories that result from these investigations, we present a general architecture for procedurally reflective processes, able to shift smoothly between dealing with a given subject domain, and dealing with their own reasoning processes over that domain.

An instance of the general solution is worked out in the context of an applicative language. Specifically, we present three successive dialects of LISP: 1-LISP, a distillation of current practice, for comparison purposes; 2-LISP, a dialect constructed in terms of our rationalised semantics, in which the concept of evaluation is rejected in favour of independent notions of simplification and reference, and in which the respective categories of notation, structure, semantics, and behaviour are strictly aligned; and 3-LISP, an extension of 2-LISP endowed with reflective powers. (Warning: Hand copied from an image PDF. Typing errors may have occurred.)

I think reflection as it is described here is very close to Newcomb’s notion of composite subject identities, which are themselves composed of composite subject identities.

It has me wondering: what would a general-purpose identification language with reflection look like?

Martin Odersky: Reflection and Compilers

Saturday, April 14th, 2012

Martin Odersky: Reflection and Compilers

From the description:

Reflection and compilers do tantalizingly similar things. Yet, in mainstream, statically typed languages the two have been only loosely coupled, and generally share very little code. In this talk I explore what happens if one sets out to overcome their separation.

The first half of the talk addresses the challenge of how reflection libraries can share core data structures and algorithms with the language’s compiler without having compiler internals leaking into the standard library API. It turns out that a component system based on abstract types and path-dependent types is a good tool to solve this challenge. I’ll explain how the “multiple cake pattern” can be fruitfully applied to expose the right kind of information.

The second half of the talk explores what one can do when strong, mirror-based reflection is a standard tool. In particular, the compiler itself can use reflection, leading to a particular system of low-level macros that rewrite syntax trees. One core property of these macros is that they can express staging, by rewriting a tree at one stage to code that produces the same tree at the next stage. Staging lets us implement type reification and general LINQ-like functionality. What’s more, staging can also be applied to the macro system itself, with the consequence that a simple low-level macro system can produce a high-level hygienic one, without any extra effort from the language or compiler.

Ignore the comments about the quality of the sound and video. It looks like substantial improvements have been made or I am less sensitive to those issues. Give it a try and see what you think.

Strikes me as being very close to Newcomb’s thoughts on subject identity being composed of other subject identities.

Such that you could have subject representatives that “merge” together and then themselves form the basis for merging other subject representatives.
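A toy sketch of that cascading merge, assuming representatives carry plain string identifiers and merge whenever they share one. Because a merged representative carries the union of identifiers, it can trigger further merges with representatives that matched neither original. Rep and mergeAll are hypothetical names, not part of any topic map engine.

```scala
// Hypothetical subject representative: identifiers used for merging, plus names.
final case class Rep(identifiers: Set[String], names: Set[String])

// Merge representatives that share any identifier. Each incoming rep absorbs
// every already-merged group it overlaps; groups stay pairwise disjoint,
// so a merged result can itself be the basis for the next merge.
def mergeAll(reps: Seq[Rep]): List[Rep] =
  reps.foldLeft(List.empty[Rep]) { (merged, r) =>
    val (overlapping, rest) =
      merged.partition(m => (m.identifiers & r.identifiers).nonEmpty)
    val combined = overlapping.foldLeft(r) { (a, b) =>
      Rep(a.identifiers ++ b.identifiers, a.names ++ b.names)
    }
    combined :: rest
  }

// "id:1" and "id:2" only meet through the third representative.
val result = mergeAll(Seq(
  Rep(Set("id:1"), Set("first")),
  Rep(Set("id:2"), Set("second")),
  Rep(Set("id:1", "id:2"), Set("bridge"))
))
println(result)
```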

Suggestions of literature on reflection, its issues and implementations? (Donated books welcome as well. Contact me for a physical delivery address.)

Neo4j Spring Data & Scala

Sunday, April 1st, 2012

Neo4j Spring Data & Scala by Jan Machacek.

From the post:

Spring Data is an excellent tool that generates implementations of repositories using the naming conventions similar to the convention used in the dynamic language runtimes such as Grails and Ruby on Rails. In this post, I am going to show you how to use Spring Data in your Scala code.

In this post, we will construct a trivial application that uses Spring Data Neo4j to persist simple User objects. The only difference is that we’ll use Scala throughout and highlight some of the sticky points of Spring Data in Scala.

The post seeks to illustrate that Spring remains relevant, even after the advent of Scala.

It does that, but code adoption, like application of security patches, is a mixed bag. Some people are using (read advocating) the latest releases, some people are using useful (read stable) software and still others are using older (read unsupported) software. You are likely to find Neo4j in one or more of those environments. Documentation for any and/or all of them would promote usage of Neo4j.

Multiperspective

Sunday, March 4th, 2012

Multiperspective

A hosted “content management system,” aka, a hosted website solution. Based on Scala and Neo4j.

I suspect that Scala and Neo4j make it easier for the system developers to offer a hosted website solution.

I am not sure that in a hosted solution the average web developer will notice the difference.

Still, unless you want a “custom” domain name, the service is “free” with some restrictions.

I would be interested to know whether you can tell that Scala and Neo4j are powering the usual services.

From “Under the hood”

Multispective.com is a next-generation content management system. In this post we will look how this system works and how this setup can benefit our users.

Unlike most other content management systems, multispective.com is entirely built in the programming language Scala, which means it runs on the rock-solid and highly performant Java Virtual Machine.

Scala offers us a highly powerful programming model, greatly cutting back the amount of software we had to write, while its powerful type system reduces the number of potential coding errors.

Another unique feature of multispective.com is the use of the Neo4j database engine.

Nearly all content management systems in use today store their information in a Relational Database Management System (RDBMS), a proven technology ubiquitous around the ICT spectrum.

Relational Database Management Systems are very useful and have become extremely robust through decades of improvements, but they are not very well suited for highly connected data.

The world-wide-web is highly connected and in our search for the right technology for our software, we decided a different approach towards storage of data was needed.

Neo4j ended up being the preferred solution for our storage needs. This database engine is based upon the model of the property-graph. Where an RDBMS stores information in tables, Neo4j stores information as nodes and relationships, where both can contain properties.

The data model of the property-graph is extremely simple, so it’s easy to reason about.

There were two main advantages to a graph-database for us. First of all, relationships are explicitly stored in the database. This makes navigating over complex networked data possible while maintaining a reasonable performance. Secondly, a graph database does not require a schema.
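The property-graph model described above (nodes and relationships that both carry properties, relationships stored explicitly, no fixed schema) can be sketched in a few lines of Scala. This is a minimal in-memory illustration under those assumptions, not Neo4j's API or storage engine; `Node`, `Relationship` and `PropertyGraph` are hypothetical names.

```scala
// Minimal in-memory property graph (hypothetical types, not Neo4j's API).
// Properties are free-form maps, so no schema is imposed on nodes or relationships.

final case class Node(id: Long, properties: Map[String, Any])
final case class Relationship(from: Long, to: Long, relType: String, properties: Map[String, Any])

final case class PropertyGraph(nodes: Map[Long, Node], relationships: Seq[Relationship]) {

  // Relationships are stored explicitly and indexed by their start node,
  // so navigation follows them directly instead of computing joins.
  private val outgoing: Map[Long, Seq[Relationship]] = relationships.groupBy(_.from)

  // Nodes reachable in one hop over relationships of the given type.
  def neighbours(id: Long, relType: String): Seq[Node] =
    outgoing.getOrElse(id, Seq.empty).filter(_.relType == relType).flatMap(r => nodes.get(r.to))

  // Breadth-first reachability over one relationship type (includes the start node).
  def reachable(start: Long, relType: String): Set[Long] = {
    @annotation.tailrec
    def loop(frontier: List[Long], seen: Set[Long]): Set[Long] = frontier match {
      case Nil => seen
      case head :: tail =>
        val next = neighbours(head, relType).map(_.id).filterNot(seen.contains).distinct
        loop(tail ++ next, seen ++ next)
    }
    loop(List(start), Set(start))
  }
}
```

Note that one node can carry a property (say, `tags`) that others lack, which is the "does not require a schema" point in the quote.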

Variations for computing results from sequences in Scala

Saturday, February 18th, 2012

Variations for computing results from sequences in Scala

From the post:

A common question from students who are new to Scala is: What is the difference between using the map function on lists, using for expressions and foreach loops? One of the major sources of confusion with regard to this question is that a for expression in Scala is not the equivalent of for loops in languages like Python and Java; instead, the equivalent of for loops is foreach in Scala. This distinction highlights the importance of understanding what it means to return values versus relying on side-effects to perform certain computations. It also helps reinforce some points about fixed versus reassignable variables and immutable versus mutable data structures.
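The distinction in the quote can be shown side by side. A small sketch (the word list is an arbitrary example): `map` and `for ... yield` both return a new value, while `foreach` (and a `for` without `yield`) returns `Unit`, so any result has to be gathered through a side effect on a mutable structure or a `var`.

```scala
import scala.collection.mutable.ListBuffer

object Loops {
  val words: List[String] = List("graph", "topic", "map")

  // map: returns a new immutable List; nothing is mutated.
  val lengthsViaMap: List[Int] = words.map(w => w.length)

  // for ... yield: the compiler rewrites this to words.map(w => w.length).
  val lengthsViaFor: List[Int] = for (w <- words) yield w.length

  // for without yield is words.foreach(...): it returns Unit, so the
  // result must be collected by mutating a buffer.
  val lengthsViaForeach: List[Int] = {
    val buffer = ListBuffer.empty[Int]
    for (w <- words) buffer += w.length
    buffer.toList
  }

  // The same side-effecting style with a reassignable variable.
  val totalViaVar: Int = {
    var total = 0
    words.foreach(w => total += w.length)
    total
  }
}
```

All three lists end up equal, but only the first two express the computation as a returned value.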

Continuing with the FP theme. Don’t miss the links to additional tutorial materials on Scala at the end of this post.

Effective Scala – Best Practices from Twitter

Friday, February 17th, 2012

Effective Scala – Best Practices from Twitter by Bienvenido David III.

From the post:

Twitter has open sourced its Effective Scala guide. The document is on GitHub, with the Git repository URL https://github.com/twitter/effectivescala.git. The document is licensed under CC-BY 3.0.

Scala is one of the primary programming languages used at Twitter, and most of the Twitter infrastructure is written in Scala. The Effective Scala guide is a series of short essays, a set of “best practices” learned from using Scala inside Twitter. Twitter’s use of Scala is mainly for creating high volume, distributed systems, though most of the guide should be applicable to other domains.

Sounds like a book to read if you are either looking for work at Twitter or just want to get better at Scala. Both are worthwhile goals.

Spring and Scala (Scala User Group London talk)

Thursday, February 9th, 2012

Spring and Scala (Scala User Group London talk) by Jan Machacek.

From the post:

Many thanks to all who came to my Spring in Scala talk. The video is now available at the Skills Matter website; I am adding the slides in PDF, the source code on GitHub and links to the other posts that explain in more detail the topics I mentioned in the talk.

It would be very nice if this became a tradition for Skills Matter presentations: video, slides, source code and a post with links to further resources.

Watch the presentation, download the slides and source code and read this post carefully. You won’t be disappointed.

Description of the presentation:

In this Spring in Scala talk, Jan Machacek will start by comparing Scala to the other languages on the Java platform. Find out that Scala code gets compiled to regular Java bytecode, making it accessible to your Spring code. You will also learn what functional programming means and how to see & apply the patterns of functional programming in what we would call enterprise code. In addition to being a functional language, Scala is a strongly typed language.

The second part of the talk will therefore explore the principles of type systems. You will find out what polymorphic functions are, and what the Scala wizards mean when they talk about type covariance and contravariance. Throughout the talk, there will be plenty of code examples comparing Spring beans in Java with their new form in Scala; together with plentiful references to the ever-growing Scala ecosystem, the talk will give you inspiration & guidance on using Scala in your Spring applications. Come over and find your functional mojo!
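For readers who want a quick preview of the type-system topics in the description, here is a small sketch of a polymorphic function, a covariant type and a contravariant type. The `Animal`/`Dog`/`Cat` hierarchy, `Box`, `Printer` and `firstOrElse` are hypothetical examples, not code from the talk.

```scala
// Hypothetical hierarchy used only to illustrate variance.
sealed trait Animal { def name: String }
final case class Dog(name: String) extends Animal
final case class Cat(name: String) extends Animal

// Covariant (+A): Box[Dog] is a subtype of Box[Animal],
// because a Box only hands out values of type A.
final case class Box[+A](value: A)

// Contravariant (-A): Printer[Animal] is a subtype of Printer[Dog],
// because something that can show any Animal can show a Dog.
trait Printer[-A] { def show(a: A): String }

object VarianceDemo {
  // A polymorphic function: one definition that works for every type A.
  def firstOrElse[A](xs: List[A], default: A): A = xs.headOption.getOrElse(default)

  val animalPrinter: Printer[Animal] = new Printer[Animal] {
    def show(a: Animal): String = s"animal: ${a.name}"
  }

  // These two assignments compile only because of the variance annotations.
  val dogBox: Box[Dog] = Box(Dog("Rex"))
  val animalBox: Box[Animal] = dogBox
  val dogPrinter: Printer[Dog] = animalPrinter
}
```

Removing the `+` from `Box` or the `-` from `Printer` turns the marked assignments into compile errors, which is a quick way to see what the annotations buy you.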

SKA LA

Tuesday, February 7th, 2012

SKA LA (link broken by site relocation, see below). Andy Petrella writes a multi-part series on:

Neo4J with Scala Play! 2.0 on Heroku

The outline from the first post:

I’ll try here to gather all steps of a spike I did to have a web prototype using Scala and a graph database.

For that I used the below technologies.

Play! Framework as the web framework, in its 2.0-RC1 version.

Neo4J as the back end service for storing graph data.

Scala for telling the computer what it should do…

Here is an overview of what will be covered in the current suite.

  1. How to install Play! 2.0 RC1 from Git
  2. Install Neo4J and run it in server mode. Explain its REST/Json interface.
  3. Create a Play! project. Update it to open it in IDEA Community Edition.
  4. An introduction to the Json facilities of Play! Scala, with the help of the SJson paradigm.
  5. Introduction of the Dispatch Scala library for HTTP communication
  6. How to use Dispatch’s Handler and Play!’s Json functionality together efficiently.
  7. Illustrate how to send Neo4J REST requests: first for creating a generic node, then for creating a persistent service that can store and restore domain model instances.
  8. Create some views (don’t bother me for ‘em … I’m not a designer ^^) using Scala templates and jQuery ajax for browsing the model and creating instances.
  9. Deploy the whole thing on Heroku.
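As a taste of step 7, here is a sketch of building a "create node" request against Neo4j's REST interface. It assumes the legacy endpoint `POST /db/data/node` of the Neo4j server versions of that era, a local server at `localhost:7474`, and flat string properties. The series itself uses Dispatch and Play!'s Json support; this sketch swaps in the JDK's `java.net.http` client (Java 11+) and a naive, hypothetical `toJson` helper, and it only builds the request without sending it.

```scala
import java.net.URI
import java.net.http.HttpRequest

object Neo4jRest {
  // Naive JSON encoding for flat string properties (hypothetical helper;
  // a real project would use a JSON library such as Play!'s).
  def toJson(props: Map[String, String]): String = {
    def quote(s: String): String =
      "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\""
    props.map { case (k, v) => s"${quote(k)}:${quote(v)}" }.mkString("{", ",", "}")
  }

  // Build (but do not send) a request that creates a node with these properties,
  // assuming the legacy /db/data/node endpoint under baseUri.
  def createNodeRequest(baseUri: String, props: Map[String, String]): HttpRequest =
    HttpRequest.newBuilder(URI.create(s"$baseUri/db/data/node"))
      .header("Content-Type", "application/json")
      .header("Accept", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(toJson(props)))
      .build()
}
```

Against a running server of that generation, the request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; newer Neo4j releases no longer expose this endpoint, so check the documentation for the version you run.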

If you aren’t already closing in on the winning entry for the Neo4j Challenge, this series of posts will get you a bit closer!

BTW, remember the deadline is February 29th. (Leap year if you are using the Gregorian system.)


All nine parts have been posted. Until I can make more tidy repairs, see: https://bitly.com/bundles/startupgeek/4

Parallelizing Machine Learning– Functionally: A Framework and Abstractions for Parallel Graph Processing

Sunday, February 5th, 2012

Parallelizing Machine Learning– Functionally: A Framework and Abstractions for Parallel Graph Processing by Heather Miller and Philipp Haller.

Abstract:

Implementing machine learning algorithms for large data, such as the Web graph and social networks, is challenging. Even though much research has focused on making sequential algorithms more scalable, their running times continue to be prohibitively long. Meanwhile, parallelization remains a formidable challenge for this class of problems, despite frameworks like MapReduce which hide much of the associated complexity. We present a framework for implementing parallel and distributed machine learning algorithms on large graphs, flexibly, through the use of functional programming abstractions. Our aim is a system that allows researchers and practitioners to quickly and easily implement (and experiment with) their algorithms in a parallel or distributed setting. We introduce functional combinators for the flexible composition of parallel, aggregation, and sequential steps. To the best of our knowledge, our system is the first to avoid inversion of control in a (bulk) synchronous parallel model.
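To see what "parallel, aggregation, and sequential steps" in a bulk synchronous parallel model look like, here is a sequential sketch of PageRank written as supersteps with plain Scala collections. It is not the authors' framework or its combinators; the graph representation, the PageRank example and the damping factor are illustrative assumptions, and it needs Scala 2.13+ for `groupMapReduce`.

```scala
// Sequential sketch of BSP-style supersteps (not the paper's framework).
object BspPageRank {
  type Graph = Map[Int, Seq[Int]] // vertex id -> outgoing neighbours

  // One superstep, composed of three phases:
  //  1. send: every vertex emits a share of its rank along each out-edge
  //     (the phase a framework would run in parallel across vertices);
  //  2. aggregate: messages are summed per target vertex;
  //  3. update: all vertices compute their new rank together (the barrier).
  def superstep(graph: Graph, ranks: Map[Int, Double], damping: Double): Map[Int, Double] = {
    val n = graph.size
    val messages: Seq[(Int, Double)] = graph.toSeq.flatMap { case (v, out) =>
      out.map(target => target -> ranks(v) / out.size)
    }
    val incoming: Map[Int, Double] = messages.groupMapReduce(_._1)(_._2)(_ + _)
    graph.keys.map { v =>
      v -> ((1 - damping) / n + damping * incoming.getOrElse(v, 0.0))
    }.toMap
  }

  // Run a fixed number of supersteps from a uniform starting rank.
  def run(graph: Graph, steps: Int, damping: Double = 0.85): Map[Int, Double] = {
    val initial = graph.keys.map(v => v -> 1.0 / graph.size).toMap
    (1 to steps).foldLeft(initial)((ranks, _) => superstep(graph, ranks, damping))
  }
}
```

Because each superstep is a pure function from old ranks to new ranks, the loop is just a fold; the user never hands control to a callback-driven runtime, which is the kind of inversion of control the abstract says the framework avoids. (This sketch assumes every vertex has at least one outgoing edge; dangling vertices would leak rank.)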

An area of research that appears to have a great deal of promise. Very much worth your attention.

Typesafe Stack

Tuesday, December 27th, 2011

Typesafe Stack

From the website:

Scala. Akka. Simple.

A 100% open source, integrated distribution offering Scala, Akka, sbt, and the Scala IDE for Eclipse.

The Typesafe Stack makes it easy for developers to get started building scalable software systems with Scala and Akka. The Typesafe Stack is based on the most recent stable versions of Scala and Akka, and provides all of the major components needed to develop and deploy Scala and Akka applications.

Go ahead! You need something new to put on your new, shiny 5TB disk drive. ;-)