Archive for the ‘Ruby’ Category

Transducers – java, js, python, ruby

Saturday, November 8th, 2014

Transducers – java, js, python, ruby

Struggling with transducers?

Learn better by example?

Cognitect Labs has released transducers for Java, JavaScript, Ruby, and Python.

Clojure recently added support for transducers – composable algorithmic transformations. These projects bring the benefits of transducers to other languages:

BTW, take a look at Rich Hickey’s latest (as of Nov. 2014) video on Transducers.
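
If examples help, here is a minimal sketch of the idea in plain Ruby. This only illustrates the concept — a transducer is a function that transforms a reducing function — and is not the API of the Cognitect libraries:

```ruby
# A transducer takes a reducing function and returns a new reducing
# function. Because it knows nothing about the source or sink
# collection, transducers compose freely.

def mapping(f)
  ->(rf) { ->(acc, x) { rf.call(acc, f.call(x)) } }
end

def filtering(pred)
  ->(rf) { ->(acc, x) { pred.call(x) ? rf.call(acc, x) : acc } }
end

# Compose left-to-right, as with Clojure's `comp` for transducers.
def compose(*xforms)
  ->(rf) { xforms.reverse.reduce(rf) { |inner, xf| xf.call(inner) } }
end

def transduce(xform, rf, init, coll)
  coll.reduce(init, &xform.call(rf))
end

keep_even_and_square = compose(
  filtering(->(x) { x.even? }),
  mapping(->(x) { x * x })
)

transduce(keep_even_and_square, ->(acc, x) { acc << x }, [], 1..10)
# => [4, 16, 36, 64, 100]
```

The same `keep_even_and_square` could feed a sum, a stream, or a channel — that independence from the collection is the whole point.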

Please forward to language specific forums.

Class Scheduling [Tutorial FoundationDB]

Saturday, December 21st, 2013

Class Scheduling

From the post:

This tutorial provides a walkthrough of designing and building a simple application in Python using FoundationDB. In this tutorial, we use a few simple data modeling techniques. For a more in-depth discussion of data modeling in FoundationDB, see Data Modeling.

The concepts in this tutorial are applicable to all the languages supported by FoundationDB. If you prefer, you can see a version of this tutorial in:

The offering of the same tutorial in different languages looks like a clever idea.

Like using a polyglot edition of the Bible with parallel original text and translations.

In a polyglot, the associations between words in different languages are implied rather than explicit.

Tiny Data: Rapid development with Elasticsearch

Sunday, October 27th, 2013

Tiny Data: Rapid development with Elasticsearch by Leslie Hawthorn.

From the post:

Today we’re pleased to bring you the story of the creation of SeeMeSpeak, a Ruby application that allows users to record gestures for those learning sign language. Florian Gilcher, one of the organizers of the Berlin Elasticsearch User Group participated in a hackathon last weekend with three friends, resulting in this brand new open source project using Elasticsearch on the back end. (Emphasis in original.)


Sadly, there are almost no good learning resources for sign language on the internet. If material is available, licensing is a hassle or both the licensing and the material is poorly documented. Documenting sign language yourself is also hard, because producing and collecting videos is difficult. You need third-party recording tools, video conversion and manual categorization. That’s a sad state in a world where every notebook has a usable camera built in!

Our idea was to leverage modern browser technologies to provide an easy recording function and a quick interface to categorize the recorded words. The result is SeeMeSpeak.

Two lessons here:

  1. Data does not have to be “big” in order to be important.
  2. Browsers are very close to being the default UI for users.

Ferret – Indexing with Ruby

Tuesday, May 7th, 2013

Ferret – Indexing with Ruby

From the post:

Following my scraping and crawling experiences, I was looking for a good indexer. Initially I was set to use Lucene, as I got pretty good recommendations about it. Lucene really shines, but I had decided to use Ruby or some other scripting language to avoid bloated code.

Browsing around I found Ferret, which is a text indexing library for Ruby. The benchmarks and references were good, and so I set out to do some testing to get used to it. Fortunately, the results were good, and the API is a breeze. Also, pagination is built-in. How cool is that?

For an initial test, I set out to index the Linux kernel source code. By looking at Brian McCallister’s example, I wrote two small scripts: indexer.rb and search.rb. I ran the indexer over the source tree, and came up with some very interesting results. The words I searched for were ‘net’, ‘skb’, ‘x86’ and finally ‘linux’.

You probably want to drop by Ferret, to pick up the source.

Ferret is described there as:

Ferret is an information retrieval library in the same vein as Apache Lucene[1]. Originally it was a full port of Lucene, but it now uses its own file format and indexing algorithm, although it is still very similar in many ways to Lucene. Everything you can do in Lucene you should be able to do in Ferret.
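
Ferret’s own API is richer, but the core of any text indexer can be sketched in a few lines of plain Ruby. This toy (not Ferret’s API) shows what the indexer.rb/search.rb pair is doing underneath:

```ruby
# A toy inverted index: each term maps to the ids of documents that
# contain it. Real indexers (Ferret, Lucene) add analyzers, scoring,
# compressed postings, pagination, and much more.
class TinyIndex
  def initialize
    @docs = []
    @postings = Hash.new { |h, term| h[term] = [] }
  end

  def add(text)
    id = @docs.size
    @docs << text
    text.downcase.scan(/\w+/).uniq.each { |term| @postings[term] << id }
    id
  end

  def search(term)
    @postings.fetch(term.downcase, []).map { |id| @docs[id] }
  end
end

index = TinyIndex.new
index.add("net skb handling in the linux kernel")
index.add("x86 boot code")
index.search("linux")  # => ["net skb handling in the linux kernel"]
```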

Critical Ruby On Rails Issue Threatens 240,000 Websites [Ruby TMs Beware]

Friday, January 11th, 2013

Critical Ruby On Rails Issue Threatens 240,000 Websites by Mathew J. Schwartz.

From the post:

All versions of the open source Ruby on Rails Web application framework released in the past six years have a critical vulnerability that an attacker could exploit to execute arbitrary code, steal information from databases and crash servers. As a result, all Ruby users should immediately upgrade to a newly released, patched version of the software.

That warning was sounded Tuesday in a Google Groups post made by Aaron Patterson, a key Ruby programmer. “Due to the critical nature of this vulnerability, and the fact that portions of it have been disclosed publicly, all users running an affected release should either upgrade or use one of the work arounds immediately,” he wrote. The patched versions of Ruby on Rails (RoR) are 3.2.11, 3.1.10, 3.0.19 and 2.3.15.

As a result, more than 240,000 websites that use Ruby on Rails Web applications are at risk of being exploited by attackers. High-profile websites that employ the software include Basecamp, Github, Hulu, Pitchfork, Scribd and Twitter.

Ruby developers will already be aware of this issue but if you have Ruby-based topic map software, you may not have an in-house Ruby developer.

The major players in the Ruby community are concerned so it’s time to ask someone to look at any Ruby software, topic maps or not, that you are running.

If you are interested in the details, see: Analysis of Rails XML Parameter Parsing Vulnerability.

At its heart, this is a subject identity issue.

If symbol and yaml types had defined properties/values (or value ranges) as part of their “identity,” then other routines could reject instances that do not meet a “safe” identity test.

But because instances are treated as having primitive identities, what gets injected is what you get (WGIIWYG).
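
Ruby later grew exactly this kind of guard: `YAML.safe_load` rejects any type that is not explicitly permitted, which is the “safe identity test” suggested above. A sketch (the keyword arguments assume a reasonably recent Psych):

```ruby
require 'yaml'

YAML.safe_load("--- hello")  # plain strings pass

begin
  # A Ruby symbol smuggled in via YAML -- the shape of the 2013 exploit.
  YAML.safe_load("--- !ruby/symbol admin")
rescue Psych::DisallowedClass
  # Rejected: Symbol is not on the permitted list.
end

# Types become "safe" only by explicit opt-in:
YAML.safe_load("--- :admin", permitted_classes: [Symbol])  # => :admin
```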

Using a Graph Database with Ruby [Parts 1 and 2]

Wednesday, January 9th, 2013

Using a Graph Database with Ruby. Part 1: Introduction and Using a Graph Database with Ruby. Part 2: Integration by Thiago Jackiw.

From the introduction to Part 2:

In the first article, we learned about graph databases, their differences and advantages over traditional databases, and about Neo4j. In this article, we are going to install Neo4j, integrate and evaluate the gems listed in the first part of this series.

The scenario that we are going to be working with is the continuation of the simple idea in the first article, a social networking example that is capable of producing traversal queries such as “given the fact that Bob is my friend, give me all friends that are friends of friends of friends of Bob”.

You may want to skip the first part if you are already familiar enough with Neo4j or graphs to want to use them. 😉

The second part walks you through creation of enough data to demonstrate traversals and some of the capabilities of Neo4j.
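
The traversal itself is easy to picture without Neo4j. A plain-Ruby adjacency hash (hypothetical data, not the article’s code) makes the “friends of friends of friends of Bob” query concrete:

```ruby
# Each person maps to their direct friends (undirected, stored both
# ways). This stands in for Neo4j's nodes and relationships.
FRIENDS = {
  "Bob" => ["Ann", "Cal"],
  "Ann" => ["Bob", "Dee"],
  "Cal" => ["Bob", "Eve"],
  "Dee" => ["Ann", "Fay"],
  "Eve" => ["Cal"],
  "Fay" => ["Dee"],
}

# People exactly `depth` hops away, never revisiting anyone closer.
def friends_at_depth(start, depth)
  frontier = [start]
  seen     = [start]
  depth.times do
    frontier = frontier.flat_map { |p| FRIENDS.fetch(p, []) }.uniq - seen
    seen    |= frontier
  end
  frontier
end

friends_at_depth("Bob", 1)  # => ["Ann", "Cal"]
friends_at_depth("Bob", 3)  # => ["Fay"]
```

A graph database earns its keep when this traversal has to run over millions of relationships; the shape of the query stays the same.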

I first saw this in a tweet by Glenn Goodrich.

Mining Twitter Data with Ruby – Visualizing User Mentions

Thursday, September 27th, 2012

Mining Twitter Data with Ruby – Visualizing User Mentions by Greg Moreno.

From the post:

In my previous post on mining twitter data with ruby, we laid our foundation for collecting and analyzing Twitter updates. We stored these updates in MongoDB and used map-reduce to implement a simple counting of tweets. In this post, we’ll show relationships between users based on mentions inside the tweet. Fortunately for us, there is no need to parse each tweet just to get a list of users mentioned in the tweet because Twitter provides the “entities.mentions” field that contains what we need. After we collected the “who mentions who”, we then construct a directed graph to represent these relationships and convert them to an image so we can actually see it.

Good lesson in paying attention to your data stream.

You can impress your clients with an elaborate system for parsing tweets for mentions, or you can just use the “entities.mentions” field.

I would rather use the “entities.mentions” field’s content to create linkage to more content, possibly searched/parsed content.

It is a question of where you are going to devote your resources.
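
For illustration, the “just use the field” approach looks roughly like this — hashes shaped like Twitter’s JSON payload, with the field name as the post gives it:

```ruby
# Each tweet already carries its mentions, so no text parsing is needed.
tweets = [
  { "user" => "greg", "entities" => { "mentions" => [{ "screen_name" => "ann" }] } },
  { "user" => "ann",  "entities" => { "mentions" => [{ "screen_name" => "greg" },
                                                     { "screen_name" => "bob" }] } },
]

# Directed "who mentions whom" edges, ready for a graph layout tool.
edges = tweets.flat_map do |t|
  t.dig("entities", "mentions").to_a.map { |m| [t["user"], m["screen_name"]] }
end
# => [["greg", "ann"], ["ann", "greg"], ["ann", "bob"]]
```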

Machine Learning in All Languages: Introduction

Wednesday, September 5th, 2012

Machine Learning in All Languages: Introduction by Burak Kanber.

From the post:

I love machine learning algorithms. I’ve taught classes and seminars and given talks on ML. The subject is fascinating to me, but like all skills fascination simply isn’t enough. To get good at something, you need to practice!

I also happen to be a PHP and Javascript developer. I’ve taught classes on both of these as well — but like any decent software engineer I have experience with Ruby, Python, Perl, and C. I just prefer PHP and JS. Before you flame PHP, I’ll just say that while it has its problems, I like it because it gets stuff done.

Whenever I say that Tidal Labs’ ML algorithms are in PHP, they look at me funny and ask me how it’s possible. Simple: it’s possible to write ML algorithms in just about any language. Most people just don’t care to learn the fundamentals deeply enough to write an algorithm from scratch. Instead, they rely on Python libraries to do the work for them, and end up not truly grasping what’s happening inside the black box.

Through this series of articles, I’ll teach you the fundamental machine learning algorithms in a variety of languages, including:

  • PHP
  • Javascript
  • Perl
  • C
  • Ruby

The series has just started, so it is too soon to comment, but I thought it might be of interest.
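
As a taste of what “from scratch in any language” means, here is ordinary least squares fitted by gradient descent in plain Ruby — my sketch, not code from the series:

```ruby
# Fit y = m*x + b by gradient descent on mean squared error.
def fit_line(xs, ys, rate: 0.05, steps: 5000)
  m = b = 0.0
  n = xs.size.to_f
  steps.times do
    grad_m = grad_b = 0.0
    xs.zip(ys) do |x, y|
      err = (m * x + b) - y   # prediction error for this point
      grad_m += err * x
      grad_b += err
    end
    m -= rate * grad_m / n    # step against the gradient
    b -= rate * grad_b / n
  end
  [m, b]
end

m, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])  # points on y = 2x + 1
# m converges to about 2.0, b to about 1.0
```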

Hadoop Streaming Support for MongoDB

Saturday, June 9th, 2012

Hadoop Streaming Support for MongoDB

From the post:

MongoDB has some native data processing tools, such as the built-in Javascript-oriented MapReduce framework, and a new Aggregation Framework in MongoDB v2.2. That said, there will always be a need to decouple persistence and computational layers when working with Big Data.

Enter MongoDB+Hadoop: an adapter that allows Apache’s Hadoop platform to integrate with MongoDB.

[graphic omitted]

Using this adapter, it is possible to use MongoDB as a real-time datastore for your application while shifting large aggregation, batch processing, and ETL workloads to a platform better suited for the task.

[graphic omitted]

Well, the engineers at 10gen have taken it one step further with the introduction of the streaming assembly for Mongo-Hadoop.

What does all that mean?

The streaming assembly lets you write MapReduce jobs in languages like Python, Ruby, and JavaScript instead of Java, making it easy for developers that are familiar with MongoDB and popular dynamic programming languages to leverage the power of Hadoop.

I like that, “…popular dynamic programming languages…” 😉

Any improvement to increase usability without religious conversion (using a programming language not your favorite) is a good move.
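
The contract behind that convenience is simple: streaming pipes records through your script’s stdin/stdout as tab-separated key/value lines, with the reducer’s input sorted by key. A word-count pair might look like this in Ruby (a sketch; the file names and job flags in the comment are illustrative):

```ruby
# Mapper: emit one "word\t1" line per word.
def map_words(lines)
  lines.flat_map { |line| line.split.map { |word| "#{word}\t1" } }
end

# Reducer: mapper output arrives sorted by key, so all lines for a
# word come together; sum them.
def reduce_counts(sorted_pairs)
  counts = Hash.new(0)
  sorted_pairs.each do |pair|
    word, n = pair.split("\t")
    counts[word] += n.to_i
  end
  counts.map { |word, total| "#{word}\t#{total}" }
end

# In a real job each half runs as its own executable over $stdin, e.g.:
#   hadoop jar hadoop-streaming.jar -mapper map.rb -reducer reduce.rb \
#     -input /books -output /counts
reduce_counts(map_words(["to be", "or not to be"]).sort)
# => ["be\t2", "not\t1", "or\t1", "to\t2"]
```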

Neo4j.rb – Update

Saturday, April 21st, 2012

Neo4j.rb – Update

From the webpage:

Neo4j.rb is a graph database for JRuby.

You can think of Neo4j as a high-performance graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables — yet enjoys all the benefits of a fully transactional, enterprise-strength database.

It uses two powerful and mature Java libraries:

  • Neo4j – for persistence and traversal of the graph
  • Lucene – for querying and indexing.

New features include:

  • Rules, and
  • Cypher DSL queries.

Neo4j: Social Skills for Ruby Developers

Sunday, February 12th, 2012

Neo4j: Social Skills for Ruby Developers

From the description:

Ruby developers tend to be a lonely bunch. Slumped over a Mac in a dimly lit corner of a warehouse turned open-workspace. Unable to approach new people and introduce yourself. Unable to have a conversation that doesn’t devolve into an opinionated debate. Social skills are limited to what you learned from Manga. Unfortunately, you can’t use those in real life. Yet, one day, someone shows up and asks if you can build ’em a “social site” – you know, friends, activity feeds, jealousy. And privacy settings. “Me?”, you think. “You want ME to build you a SOCIAL site?” Go ahead. Reach for that bottle of Neo4J. It’s time to celebrate!

About the Presenter

Prasanna Pendse (a member of the ChicagoRuby organizer team) was rescued from the mines deep in the belly of ClearCase, CMM, Digital Six Sigma and Waterfall five years ago. His newfound freedom at ThoughtWorks took him to such far flung places as China, Japan, India, Hong Kong and Malvern, PA. His travels brought him many points and upgrades, but the one thing that brings him most joy is Ruby! One day, Prasanna was slumped over his Mac in a dimly lit corner of a warehouse turned open-workspace when someone approached him and asked “can we make this social?”

Heroku Neo4j, App Harbor MVC4, Neo4jClient & Ruby Proxy

Sunday, February 5th, 2012

Heroku Neo4j, App Harbor MVC4, Neo4jClient & Ruby Proxy

The .NET environment for Neo4j has gotten easier to set up.

Romiko Derbynew outlines the process of deploying a 4-layer architecture using Heroku Neo4j, App Harbor MVC4, Neo4jClient and Ruby Proxy.

Well, there are some prerequisites:

Getting Started with Heroku ToolBelt/Neo4j on Windows.

Proxy Ruby Gem from

(I suppose saying you also need to have Ruby installed would be a bit much? 😉 )

Seriously, the value of the work by Romiko and others to create paths to be forked and expanded for Neo4j cannot be overestimated.

Java, Python, Ruby, Linux, Windows, are all doomed

Friday, February 3rd, 2012

Java, Python, Ruby, Linux, Windows, are all doomed by Russell Winder.

From the description:

The Multicore Revolution gathers pace. Moore’s Law remains in force — chips are getting more and more transistors on a quarterly basis. Intel are now out and about touting the “many core chip”. The 80-core chip continues its role as research tool. The 48-core chip is now actively driving production engineering. Heterogeneity not homogeneity is the new “in” architecture.

Where Intel research goes, AMD and others cannot be far behind.

The virtual machine based architectures of the 1990s, Python, Ruby and Java, currently cannot cope with the new hardware architectures. Indeed Linux and Windows cannot cope with the new hardware architectures either. So either we will have lots of hardware which the software cannot cope with, or . . . . . . well you’ll just have to come to the session.

The slides are very hard to see so grab a copy at:

From the description: Heterogeneity not homogeneity is the new “in” architecture.

Is greater heterogeneity in programming languages coming?

ROMA User-Customizable NoSQL Database in Ruby

Friday, January 27th, 2012

ROMA User-Customizable NoSQL Database in Ruby

From the presentation:

  • User-customizable NoSQL database in Ruby
  • Features
    • Key-value model
    • High scalability
    • High availability
    • Fault-tolerance
    • Better throughput
    • And…
  • To meet application-specific needs, ROMA provides
    • Plug-in architecture
    • Domain specific language (DSL) for Plug-in
  • ROMA enables meeting the above need in Rakuten Travel

The ROMA source code:

Rakuten reportedly has 70 million users, and while that may not be “web scale,” it may scale enough to meet your needs. 😉

Of particular interest are the DSL capabilities. See slides 31-33. Declaring your own commands. Something for other projects to consider.
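
Declaring commands through a DSL is easy to imitate in plain Ruby. This toy version (not ROMA’s actual plug-in API) shows the shape of the idea:

```ruby
# A store whose command set is declared, not hard-coded: `command`
# turns a name and a block into a method on every store instance.
class TinyStore
  def self.command(name, &body)
    define_method(name, &body)
  end

  def initialize
    @data = {}
  end

  command(:set) { |key, value| @data[key] = value }
  command(:get) { |key| @data[key] }

  # The kind of application-specific command ROMA's plug-ins enable:
  command(:incr) { |key, by = 1| @data[key] = @data.fetch(key, 0) + by }
end

store = TinyStore.new
store.incr("hits")
store.incr("hits", 5)
store.get("hits")  # => 6
```

The payoff is the same as ROMA’s: applications extend the store’s vocabulary without touching its core.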

Neo4j on Heroku (pts. 1, 2 & 3)

Wednesday, January 18th, 2012

Neo4j on Heroku Part 1 starts out:

On his blog Marko A. Rodriguez showed us how to make A Graph-Based Movie Recommender Engine with Gremlin and Neo4j.

In this two part series, we are going to take his work from the Gremlin shell and put it on the web using the Heroku Neo4j add-on and altering the Neovigator project for our use case. Heroku has a great article on how to get an example Neo4j application up and running on their Dev Center and Michael Hunger shows you how to add JRuby extensions and provides sample code using the Neo4j.rb Gem by Andreas Ronge.

We are going to follow their recipe, but we are going to add a little spice. Instead of creating a small 2 node, 1 relationship graph, I am going to show you how to leverage the power of Gremlin and Groovy to build a much larger graph from a set of files.

Neo4j on Heroku Part 2 starts out:

We are picking up where we left off on Neo4j on Heroku –Part One so make sure you’ve read it or you’ll be a little lost. So far, we have cloned the Neoflix project, set up our Heroku application and added the Neo4j add-on to our application. We are now ready to populate our graph.

CAUTION: Part 2 populates the graph with over one million relationships! If you are only looking for trivial uses of Neo4j, you had better stop before Part 2.

Neo4j on Heroku Part 3 starts out:

This week we learned that leaving the create_graph method accessible to the world was a bad idea. So let’s go ahead and delete that route in Sinatra, and instead create a Rake Task for it.

Part 3 also announces the Neo4j Challenge!

Thanks Max De Marzi!


Friday, January 6th, 2012

Neography
From the webpage:

Neography is a thin Ruby wrapper around the Neo4j REST API.

If you want the full power of Neo4j, you will want to use JRuby and the excellent Neo4j.rb gem by Andreas Ronge.

A complement to Neography is the Neology gem by Carlo Alberto Degli Atti.

An alternative is the Architect4r gem by Maximilian Schulz.

For all you Ruby hackers out there!

Getting started with Ruby and Neo4j

Thursday, January 5th, 2012

Getting started with Ruby and Neo4j

Max De Marzi walks you through installation of neography and then to making a social network graph. Nothing new but a gentle introduction to Neo4j with promises of more to come on Gremlin and Cypher (ways to walk across the graph).

Pass along to any Rubyists that need an introduction to Neo4j.

Heroku, Neo4j and Google Spreadsheet in 10min. Flat.

Saturday, December 3rd, 2011

Heroku, Neo4j and Google Spreadsheet in 10min. Flat. by Peter Neubauer

From the description:

This screencast shows how to use Neo4j on Heroku. We will do:

  • Create and install a Heroku app
  • Add a Neo4j instance to it
  • Create a custom Ruby app
  • Execute Cypher queries
  • Connect to the app using Google Spreadsheet
  • Build a small bar chart from a Cypher query.

Great presentation, with one tiny flaw: the screen is so tiny that one has to guess at the contents of the commands. Sure, I can come fairly close, but a file with transcripts of the terminal sessions and code would be nicer.

I recommend that you download the video for viewing. Watch it once online and you will see what I mean. I ran it full screen on a 22-inch Samsung, and a copy of the command sequence would have been appreciated.

Mneme: Scalable Duplicate Filtering Service

Saturday, November 12th, 2011

Mneme: Scalable Duplicate Filtering Service

From the post:

Detecting and dealing with duplicates is a common problem: sometimes we want to avoid performing an operation based on this knowledge, and at other times, as in the case of a database, we may want to only permit an operation based on a hit in the filter (ex: skip disk access on a cache miss). How do we build a system to solve the problem? The solution will depend on the amount of data, frequency of access, maintenance overhead, language, and so on. There are many ways to solve this puzzle.

In fact, that is the problem – there are too many ways. Having reimplemented at least half a dozen solutions in various languages and with various characteristics at PostRank, we arrived at the following requirements: we want a system that is able to scale to hundreds of millions of keys, we want it to be as space efficient as possible, have minimal maintenance, provide low latency access, and impose no language barriers. The tradeoff: we will accept a certain (customizable) degree of error, and we will not persist the keys forever.

Mneme: Duplicate filter & detection

Mneme is an HTTP web-service for recording and identifying previously seen records – aka, duplicate detection. To achieve the above requirements, it is implemented via a collection of bloomfilters. Each bloomfilter is responsible for efficiently storing the set membership information about a particular key for a defined period of time. Need to filter your keys for the trailing 24 hours? Mneme can create and automatically rotate 24 hourly filters on your behalf – no maintenance required.

Interesting in several respects:

  1. Duplicate detection
  2. Duplicate detection for a defined period of time
  3. Duplicate detection for a defined period of time with “customizable” degree of error

That would depend on your topic map project requirements. Assuming absolute truth forever and ever isn’t one of them, detecting duplicate subject representatives for some time period at a specified error rate may be the concept you are looking for.

It enables a discussion of how much certainty (error rate), for how long (time period), for detection of duplicates (subject representatives), and on what basis. All of those are going to impact project complexity and duration.

It is also interesting as a solution that will work quite well for some duplicate detection requirements.
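
The core mechanism is easy to sketch. A toy Bloom filter in plain Ruby (salted CRC32 stands in for proper hash functions) shows where the “customizable degree of error” comes from: more bits and more hashes mean fewer false positives, and there are never false negatives:

```ruby
require 'zlib'

class ToyBloom
  def initialize(bits: 10_000, hashes: 4)
    @bits   = Array.new(bits, false)
    @hashes = hashes
  end

  def add(key)
    positions(key).each { |i| @bits[i] = true }
  end

  # True means "probably seen"; false means "definitely not seen".
  def include?(key)
    positions(key).all? { |i| @bits[i] }
  end

  private

  # One bit index per salted hash of the key.
  def positions(key)
    (1..@hashes).map { |salt| Zlib.crc32("#{salt}:#{key}") % @bits.size }
  end
end

seen = ToyBloom.new
seen.add("story-42")
seen.include?("story-42")   # => true
seen.include?("story-999")  # => false (almost certainly)
```

Rotating N hourly filters, as Mneme does, turns “seen ever” into “seen in the trailing N hours” while keeping memory bounded.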

A new interface for IUCAT: Blacklight

Sunday, October 30th, 2011

A new interface for IUCAT: Blacklight by Hyeran Kang.

From the post:

As you may have heard, work has begun on a new interface for IUCAT. The IU Libraries OLE Discovery Layer Implementation Task Force (DLITF) will be overseeing the implementation of a new discovery layer, powered by Blacklight, to overlay our current SirsiDynix system. Development work is going on during this fall semester and a public Beta will be launched in spring 2012. This is a good time to share some background information around the new discovery interface, Blacklight.

What is Blacklight?

Blacklight is a free and open source OPAC (Online Public Access Catalog) solution developed at the University of Virginia (UVA) Library; check the project site for detailed information. While some OSS (Open Source Software) systems, such as Evergreen and Koha, were developed to replace a library’s entire ILS (Integrated Library System), Blacklight has been designed to work with a library’s current ILS to assist in reengineering the library’s searching tools. It uses Apache Solr for indexing and searching records and Ruby on Rails for its front end.

If Solr or Ruby on Rails is on your “to be learned” list, you might want to bump one or both up a notch or two and consider contributing to the Blacklight project.


Sunday, September 25th, 2011

MonoTable – Zero-admin, no single-point-of-failure, scalable NoSQL Data-Store in Ruby

From the webpage:


It’ll be available as a gem.


We are in the early design/implementation phase.

Primary Goals

  • Ordered key-value store / document store
  • REST api
  • Scales with ease
  • Easy setup and admin


The MonoTable Data Structure

“Everything should be made as simple as possible, but no simpler.” -Einstein

MonoTable stores all data in a single table. The table consists of records sorted by their keys. Each record, in addition to its key, can have 0 or more named fields. Basically, it’s a 2-dimensional hash where the first dimension supports range selects.

Sounds interesting but remember that Einstein may have been wrong about other issues: Models, Relativity & Reality.
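
The data structure described above — a 2-dimensional hash where the first dimension supports range selects — can itself be sketched in a few lines of plain Ruby (a toy model, not MonoTable’s implementation):

```ruby
class TinyTable
  def initialize
    @rows = {}  # key => { field => value }
  end

  def put(key, fields)
    (@rows[key] ||= {}).merge!(fields)
  end

  def get(key, field)
    @rows.dig(key, field)
  end

  # The range select over the first dimension: records sorted by key.
  def range(from, to)
    @rows.keys.sort
         .select { |k| (from..to).cover?(k) }
         .map { |k| [k, @rows[k]] }
  end
end

t = TinyTable.new
t.put("user:1", "name" => "Ann")
t.put("user:2", "name" => "Bob")
t.put("admin:1", "name" => "Eve")
t.range("user:1", "user:9")
# => [["user:1", {"name"=>"Ann"}], ["user:2", {"name"=>"Bob"}]]
```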

Overview of Neo4j.rb 1.0.0

Sunday, May 1st, 2011

Overview of Neo4j.rb 1.0.0

While specific to JRuby, this is a very nice set of guides, examples and documentation that will benefit anyone using Neo4j.

BTW, the main page for Neo4j.rb.

You may also be interested in Introduction to Neo4j.rb. (Note, the first slide really is blank except for “JAYWAY.” Just go to the next slide.)