Archive for May, 2014

Functional programming with Clojure

Wednesday, May 28th, 2014

Functional programming with Clojure

A MOOC led by Juhana Laurinharju, Jani Rahkola, and Ilmari Vacklin.

From the homepage:

Functional programming is a programming paradigm where pure functions are the basic building blocks of programs. A pure function is like a function in the mathematical sense. The outputs of the function are fully determined by its inputs. The idea is that this restriction makes your programs easier to understand. This course shows how you can code meaningful programs with mainly pure functions. Pure functional programming differs from object-oriented programming in that e.g. it does not make use of variables or loops.

The course is an introduction to functional programming with a dynamically typed language Clojure. We start with an introduction to Clojure; its syntax and development environment. Clojure has a good selection of data structures and we cover most of them. We also go through the basics of recursion and higher-order functions. The course material is in English.

Clojure is a young Lispish functional programming language on the Java virtual machine (JVM) platform, suitable for small and large programs. Because it runs on the JVM, all Clojure programs can use all the standard and third-party Java libraries freely. It offers tools for many tasks that are harder with other languages and has a special focus on concurrent programming.
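The "pure function" idea from the description is easy to see in miniature. Here is a sketch in Python rather than Clojure, with names of my own choosing:

```python
# A pure function: its output is fully determined by its inputs.
def circle_area(radius):
    return 3.141592653589793 * radius ** 2

# An impure counterpart: the result depends on hidden, mutable state.
counter = 0
def next_id():
    global counter
    counter += 1
    return counter

assert circle_area(2.0) == circle_area(2.0)  # same inputs, same output, always
assert next_id() != next_id()                # identical calls, different answers
```

The restriction the course describes is exactly the discipline of writing programs mostly out of functions like the first one.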

The only registration requirement is a GitHub account.

Now that’s a friendly registration process!

Enjoy!

I first saw this in Christophe Lalanne’s A bag of tweets / May 2014.

Microsoft Research’s Naiad Project

Wednesday, May 28th, 2014

Solve the Big Data Problems of the Future: Join Microsoft Research’s Naiad Project by Tara Grumm.

From the post:

Over the past decade, general-purpose big data platforms like Hadoop have brought distributed computing into the mainstream. As people have become accustomed to processing their data in the cloud, they have become more ambitious, wanting to do things like graph analysis, machine learning, and real-time stream processing on their huge data sources.

Naiad is designed to solve this more challenging class of problems: it adds support for a few key primitives – maintaining state, executing loops, and reacting to incoming data – and provides high-performance infrastructure for running them in a scalable distributed system.

The result is the best of both worlds. Naiad runs simple programs just as fast as existing general-purpose platforms, and complex programs as fast as specialized systems for graph analysis, machine learning, and stream processing. Moreover, as a general-purpose system, Naiad lets you compose these different applications together, enabling mashups (such as computing a graph algorithm over a real-time sliding window of a social media firehose) that weren’t possible before.

Who should use Naiad?

We’ve designed Naiad to be accessible to a variety of different users. You can get started right away with Naiad by writing programs using familiar declarative operators based on SQL and LINQ.

For power users, we’ve created low-level interfaces to make it possible to extend Naiad without sacrificing any performance. You can plug in optimized data structures and algorithms, and build new domain-specific languages on top of Naiad. For example, we wrote a graph processing layer on top of Naiad that has performance comparable with (and often better than) specialized systems designed only to process graphs.
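The "maintaining state and reacting to incoming data" primitives are easy to motivate with a toy. This is generic stream-processing Python, not Naiad's actual API:

```python
class StreamingWordCount:
    """A stateful operator: keeps running counts and emits the updated
    count for each incoming word -- a toy incremental dataflow vertex,
    not Naiad's API."""
    def __init__(self):
        self.counts = {}

    def on_receive(self, word):
        # State persists between arrivals; each arrival produces an update.
        self.counts[word] = self.counts.get(word, 0) + 1
        return (word, self.counts[word])

op = StreamingWordCount()
updates = [op.on_receive(w) for w in ["a", "b", "a", "a"]]
assert updates == [("a", 1), ("b", 1), ("a", 2), ("a", 3)]
```

Naiad's contribution is making this pattern scale across a distributed cluster, with loops and timely coordination on top.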

Big data geeks and open source supporters should take a serious look at the Naiad Project.

It will take a while but the real question in the future will be how well you can build upon a continuous data substrate.

Or as Harvey Logan says in Butch Cassidy and the Sundance Kid,

Rules? In a knife fight? No rules!

I would prepare accordingly.

OpenVis Conf (videos 2014)

Tuesday, May 27th, 2014

OpenVis Conf

Eighteen (18) great videos are up for your viewing pleasure!

I will have to limit myself to one video per day so I won’t have too many new ideas. 😉

Enjoy!

I first saw this in a tweet by Rob Simon.

DR Radio

Tuesday, May 27th, 2014

Dark Reading To Launch Weekly Internet Radio Show by Tim Wilson.

From the post:

DR Radio will take place every Wednesday at 1:00 p.m. ET and will feature live chat; first topic will be “A Day in the Life of a Penetration Tester.”

Dark Reading, the online community for security professionals, will launch a new Internet radio show tomorrow that will feature special guests, news, and an interactive chat enabling readers to talk with editors, speakers, and each other.

Dark Reading Radio, which will take place every Wednesday from 1:00 p.m. to 2:00 p.m. ET, will provide a platform to discuss new threats, best-practices, emerging research, and topics in the news. The inaugural show, which will take place May 21, is called A Day in the Life of a Penetration Tester and features John Sawyer, a top pen tester at InGuardians. Registration for the show is open now.

The goal of Dark Reading Radio is to provide a real-time conversation among security professionals, industry experts, and Dark Reading editors. The show’s format will include a 30-minute audio interview with a featured guest, but the online chat will take place for a full hour, enabling the audience to interact with the speaker after the interview has concluded.

Tomorrow:

Dark Reading Radio: The Real Reason Security Jobs Remain Vacant

Join us Wednesday, May 28, at 1:00 p.m. Eastern, to learn why good security staff really are not hard to find, if you know what to look for.

Nice way to break the week up.

Are there any weekly “semantic technology” radio shows?

Data as Code. Code as Data:…

Tuesday, May 27th, 2014

Data as Code. Code as Data: Tighter Semantic Web Development Using Clojure by Frédérick Giasson.

From the post:

I have been professionally working in the field of the Semantic Web for more than 7 years now. I have been developing all kinds of ontologies. I have been integrating all kinds of datasets from various sources. I have been working with all kinds of tools and technologies using all kinds of technology stacks. I have been developing services and user interfaces of all kinds. I have been developing a set of 27 web services packaged as the Open Semantic Framework and re-implemented the core Drupal modules to work with RDF data as I wanted them to. I did write hundreds of thousands of lines of code with one goal in mind: leveraging the ideas and concepts of the Semantic Web to make me, other developers, ontologists and data scientists work more accurately and efficiently with any kind of data.

However, even after doing all that, I was still feeling a void: a disconnection between how I was thinking about data and how I was manipulating it using the programming languages I was using, the libraries I was leveraging and the web services that I was developing. Everything is working, and is working really well; I did gain a lot of productivity in all these years. However, I was still feeling that void, that disconnection between the data and the programming language.

Frédérick promises to walk us through serializing RDF data into Clojure code.

Doesn’t that sound interesting?

Hmmm, will we find that data has semantics? And subjects that the data represents?

Can’t say, don’t know. But I am very interested in finding out how far Frédérick will go with “Data as Code. Code as Data.”

A crowdsourcing approach to building a legal ontology from text

Tuesday, May 27th, 2014

A crowdsourcing approach to building a legal ontology from text by Anatoly P. Getman and Volodymyr V. Karasiuk.

Abstract:

This article focuses on the problems of application of artificial intelligence to represent legal knowledge. The volume of legal knowledge used in practice is unusually large, and therefore the ontological knowledge representation is proposed to be used for semantic analysis, presentation and use of common vocabulary, and knowledge integration of problem domain. At the same time some features of legal knowledge representation in Ukraine have been taken into account. The software package has been developed to work with the ontology. The main features of the program complex, which has a Web-based interface and supports multi-user filling of the knowledge base, have been described. The crowdsourcing method is due to be used for filling the knowledge base of legal information. The success of this method is explained by the self-organization principle of information. However, as a result of such collective work a number of errors are identified, which are distributed throughout the structure of the ontology. The results of application of this program complex are discussed in the end of the article and the ways of improvement of the considered technique are planned.

Curious how this attempt to extract an ontology from legal texts compares to the efforts in the 1960s and 1970s to extract logic from the United States Internal Revenue Code? Apologies, but my undergraduate notes aren’t accessible, so I can’t give you article titles and citations.

If you do dig out some of that literature, pointers would be appreciated. As I recall, capturing the “logic” of those passages was fraught with difficulty.

Crawling With Nutch

Tuesday, May 27th, 2014

Crawling With Nutch by Elizabeth Haubert.

From the post:

Recently, I had a client using the LucidWorks search engine who needed to integrate with the Nutch crawler. This sounds simple, as both products have been around for a while and are officially integrated. Even better, there are some great “getting started in x minutes” tutorials already out there for Nutch, Solr, and LucidWorks. But there were a few gotchas that kept those tutorials from working for me out of the box. This blog post documents my process of getting Nutch up and running on an Ubuntu server.
….

I know exactly what Elizabeth means; I have yet to find a Nutch/Solr tutorial that isn’t incomplete in some way.

What is really amusing is to try to set up Tomcat 7, Solr, and Nutch together.

I need to write up that experience sometime fairly soon. But no promises if you vary from the releases I document.

Strangeloop 2014

Tuesday, May 27th, 2014

Strangeloop 2014

September 17-19, 2014 in St. Louis, Missouri.

Including the keynotes, sixty-six (66) speakers. I don’t think I have ever seen a stronger speaker lineup.

Enjoy!

Registration opens May 28, 2014.

Crossing the Chasm…

Tuesday, May 27th, 2014

Crossing the Chasm with Semantic Technology by Marin Dimitrov.

From the description:

After more than a decade of active efforts towards establishing Semantic Web, Linked Data and related standards, the verdict of whether the technology has delivered its promise and has proven itself in the enterprise is still unclear, despite the numerous existing success stories.

Every emerging technology and disruptive innovation has to overcome the challenge of “crossing the chasm” between the early adopters, who are just eager to experiment with the technology potential, and the majority of the companies, who need a proven technology that can be reliably used in mission critical scenarios and deliver quantifiable cost savings.

Succeeding with a Semantic Technology product in the enterprise is a challenging task involving both top quality research and software development practices, but most often the technology adoption challenges are not about the quality of the R&D but about successful business model generation and understanding the complexities and challenges of the technology adoption lifecycle by the enterprise.

This talk will discuss topics related to the challenge of “crossing the chasm” for a Semantic Technology product and provide examples from Ontotext’s experience of successfully delivering Semantic Technology solutions to enterprises.

I differ with Dimitrov on some of the details, but a solid +1! for slides 29 and 30.

I think you will recognize immediate similarity, at least on slide 29, to some of the promotions for topic maps.

Of course, the next question is how to get to slide 30, isn’t it?

3D Printed Hypercube of Monkeys

Tuesday, May 27th, 2014

Nothing is more fun than a 3D printed hypercube of monkeys

From the post:

The quaternion group {1,i,j,k,-1,-i,-j,-k} is a beautiful group of order eight. It didn’t have a physical representation, because the object would have to be 4-dimensional. But has the quaternion group ever appeared as the symmetry group of an object? The answer is yes. In order to visualize the symmetries of the quaternion group, mathematician Henry Segerman, sculptor Will Segerman and mathemusician Vi Hart have designed a four-dimensional object, a hypercube, and put a monkey at the center of each of the eight cubes.

If that doesn’t sound interesting enough, the post also has an animated image of the monkeys emerging from the 4th dimension, a video on “…how to make sculptures of 4D things,” and a pointer to: The Quaternion Group as a Symmetry Group.
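You don’t need a 3D printer to check the group’s structure. A few lines of Python (my own sketch, not from the post) generate Q8 from the Hamilton product:

```python
def qmul(a, b):
    """Hamilton product of quaternions given as (w, x, y, z) tuples."""
    w1, x1, y1, z1 = a
    w2, x2, y2, z2 = b
    return (w1*w2 - x1*x2 - y1*y2 - z1*z2,
            w1*x2 + x1*w2 + y1*z2 - z1*y2,
            w1*y2 - x1*z2 + y1*w2 + z1*x2,
            w1*z2 + x1*y2 - y1*x2 + z1*w2)

i = (0, 1, 0, 0)
j = (0, 0, 1, 0)

# Close {i, j} under multiplication; the result is the quaternion group Q8.
group = {i, j}
while True:
    new = {qmul(a, b) for a in group for b in group} | group
    if new == group:
        break
    group = new

assert len(group) == 8              # Q8 has order eight
assert qmul(i, j) == (0, 0, 0, 1)   # i * j = k
assert qmul(j, i) == (0, 0, 0, -1)  # j * i = -k: non-abelian
```

The non-commutativity in the last assertion is exactly what makes Q8 interesting as a symmetry group.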

Displaying countries in different perspectives impacts your perception of a map. Imagine the impact of emerging from the 4th dimension.

I first saw this in a tweet by Stefano Bertolo.

Bigdata and Blueprints

Tuesday, May 27th, 2014

Bigdata and Blueprints

From the webpage:

Blueprints is an open-source property graph model interface useful for writing applications on top of a graph database. Gremlin is a domain specific language for traversing property graphs that comes with an excellent REPL useful for interacting with a Blueprints database. Rexster exposes a Blueprints database as a web service and comes with a web-based workbench application called DogHouse.

To get started with bigdata via Blueprints, Gremlin, and Rexster, start by getting your bigdata server running per the instructions here.

Then, go and download some sample GraphML data. The Tinkerpop Property Graph is a good starting point.

Just in case you aren’t familiar with bigdata(R):

bigdata(R) is a scale-out storage and computing fabric supporting optional transactions, very high concurrency, and very high aggregate IO rates. The bigdata RDF/graph database can load 1B edges in under one hour on a 15 node cluster. Bigdata operates in both a single machine mode (Journal), highly available replication cluster mode (HAJournalServer), and a horizontally sharded cluster mode (BigdataFederation). The Journal provides fast scalable ACID indexed storage for very large data sets, up to 50 billion edges. The HAJournalServer adds replication, online backup, horizontal scaling of query, and high availability. The federation provides fast scalable shard-wise parallel indexed storage using dynamic sharding and shard-wise ACID updates and incremental cluster size growth. Both platforms support fully concurrent readers with snapshot isolation. (http://sourceforge.net/projects/bigdata/)

So, this is a major event for Blueprints.

I first saw this in a tweet by Marko A. Rodriguez.

Nonlinear Dynamics and Chaos

Tuesday, May 27th, 2014

Nonlinear Dynamics and Chaos – Steven Strogatz, Cornell University.

From the description:

This course of 25 lectures, filmed at Cornell University in Spring 2014, is intended for newcomers to nonlinear dynamics and chaos. It closely follows Prof. Strogatz’s book, “Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering.” The mathematical treatment is friendly and informal, but still careful. Analytical methods, concrete examples, and geometric intuition are stressed. The theory is developed systematically, starting with first-order differential equations and their bifurcations, followed by phase plane analysis, limit cycles and their bifurcations, and culminating with the Lorenz equations, chaos, iterated maps, period doubling, renormalization, fractals, and strange attractors. A unique feature of the course is its emphasis on applications. These include airplane wing vibrations, biological rhythms, insect outbreaks, chemical oscillators, chaotic waterwheels, and even a technique for using chaos to send secret messages. In each case, the scientific background is explained at an elementary level and closely integrated with the mathematical theory. The theoretical work is enlivened by frequent use of computer graphics, simulations, and videotaped demonstrations of nonlinear phenomena. The essential prerequisite is single-variable calculus, including curve sketching, Taylor series, and separable differential equations. In a few places, multivariable calculus (partial derivatives, Jacobian matrix, divergence theorem) and linear algebra (eigenvalues and eigenvectors) are used. Fourier analysis is not assumed, and is developed where needed. Introductory physics is used throughout. Other scientific prerequisites would depend on the applications considered, but in all cases, a first course should be adequate preparation.

Strogatz’s book “Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering,” is due out in a second edition in July of 2014. First edition was 2001.

Mastering the class and Strogatz’s book will enable you to call BS on projects with authority. Social groups are one example of chaotic systems. As a consequence, the near-religious certainty of policy wonks about the outcomes of particular policies is misguided.

Be cautious with those who respond to social dynamics being chaotic by saying: “…yes, but …(here follows their method of controlling the chaotic system).” Chaotic systems by definition cannot be controlled, nor can we account for all the influences and variables in such systems.
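The “cannot be controlled” point rests on sensitivity to initial conditions, which the logistic map (a standard example from the course material) shows in a few lines. This is my own illustration, not from the lectures:

```python
def trajectory(x, r=4.0, steps=60):
    """Iterate the logistic map x -> r*x*(1-x); r = 4 is the textbook chaotic case."""
    out = []
    for _ in range(steps):
        x = r * x * (1.0 - x)
        out.append(x)
    return out

ta = trajectory(0.300000000)
tb = trajectory(0.300000001)   # a one-in-a-billion change to the initial state

# The orbits start indistinguishable, then disagree wildly.
assert abs(ta[0] - tb[0]) < 1e-8
assert max(abs(p - q) for p, q in zip(ta, tb)) > 0.5
```

If a billionth of measurement error in the starting state destroys your forecast within sixty steps, “controlling” the system is off the table.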

The best you can do is what seems to work, most of the time.

Ethics and Big Data

Monday, May 26th, 2014

Ethical research standards in a world of big data by Caitlin M. Rivers and Bryan L. Lewis.

Abstract:

In 2009 Ginsberg et al. reported using Google search query volume to estimate influenza activity in advance of traditional methodologies. It was a groundbreaking example of digital disease detection, and it still remains illustrative of the power of gathering data from the internet for important research. In recent years, the methodologies have been extended to include new topics and data sources; Twitter in particular has been used for surveillance of influenza-like-illnesses, political sentiments, and even behavioral risk factors like sentiments about childhood vaccination programs. As the research landscape continuously changes, the protection of human subjects in online research needs to keep pace. Here we propose a number of guidelines for ensuring that the work done by digital researchers is supported by ethical-use principles. Our proposed guidelines include: 1) Study designs using Twitter-derived data should be transparent and readily available to the public. 2) The context in which a tweet is sent should be respected by researchers. 3) All data that could be used to identify tweet authors, including geolocations, should be secured. 4) No information collected from Twitter should be used to procure more data about tweet authors from other sources. 5) Study designs that require data collection from a few individuals rather than aggregate analysis require Institutional Review Board (IRB) approval. 6) Researchers should adhere to a user’s attempt to control his or her data by respecting privacy settings. As researchers, we believe that a discourse within the research community is needed to ensure protection of research subjects. These guidelines are offered to help start this discourse and to lay the foundations for the ethical use of Twitter data.
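Guideline 3 (“all data that could be used to identify tweet authors … should be secured”) has a simple technical counterpart. A hedged sketch in Python; the names and the design are mine, not the paper’s:

```python
import hashlib
import hmac
import os

# A per-study secret key, stored separately from the dataset (an assumption
# of this sketch, not a prescription from the paper).
STUDY_KEY = os.urandom(32)

def pseudonymize(user_id: str) -> str:
    """Replace a raw account identifier with a keyed one-way hash, so the
    dataset can still link tweets by the same author without recording who
    that author is."""
    return hmac.new(STUDY_KEY, user_id.encode("utf-8"),
                    hashlib.sha256).hexdigest()

a = pseudonymize("@some_user")
b = pseudonymize("@some_user")
c = pseudonymize("@other_user")
assert a == b                 # stable within the study
assert a != c                 # distinct authors stay distinct
assert "@some_user" not in a  # the raw handle never appears in the data
```

Discarding the key at the end of the study also satisfies guideline 4 in spirit: nobody can use the stored identifiers to go fetch more data about the authors.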

I am curious who is going to follow this suggested code of ethics?

Without long consideration, obviously not the NSA, FBI, CIA, DoD, or any employee of the United States government.

Ditto for the security services in any country plus their governments.

Industry players are well known for their near perfect recidivism rate on corporate crime so not expecting big data ethics there.

Drug cartels? Anyone shipping cocaine in multi-kilogram lots is unlikely to be interested in Big Data ethics.

That rather narrows the pool of prospective users of a code of ethics for big data doesn’t it?

I first saw this in a tweet by Ed Yong.

Self-Inflicted Wounds in Science (Astronomy)

Monday, May 26th, 2014

The Major Blunders That Held Back Progress in Modern Astronomy

From the post:

Mark Twain once said, “It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so. ”

The history of science provides many entertaining examples. So today, Abraham Loeb at Harvard University in Cambridge scours the history books for examples from the world of astronomy.

It turns out that the history of astronomy is littered with ideas that once seemed incontrovertibly right and yet later proved to be bizarrely wrong. Not least among these are the ancient ideas that the Earth is flat and at the centre of the universe.

But there is no shortage of others from the modern era. “A very common flaw of astronomers is to believe that they know the truth even when data is scarce,” says Loeb.

To make his point, Loeb has compiled a list of ten modern examples of ideas that were not only wrong but also significantly held back progress in astronomy “causing unnecessary delays in finding the truth”.

Highly amusing account of how “beliefs” in science can delay scientific progress. Three examples appear in this essay, with pointers to the other seven.

When someone says, “This is science/scientific…,” they are claiming to have followed the practices of scientific “rhetoric,” that is, the conventions for constructing a scientific argument.

Whether a scientific argument is correct or not, is an entirely separate question.

Verified Networking using Dependent Types

Sunday, May 25th, 2014

Verified Networking using Dependent Types by Simon Fowler.

Abstract:

Strongly, statically typed functional programming languages have found a strong grounding in academia and industry for a variety of reasons: they are concise, their type systems provide additional static correctness guarantees, and the structured management of side effects aids easier reasoning about the operation of programs, to name but a few.

Dependently-typed languages take these concepts a step further: by allowing types to be predicated on values, it is possible to impose arbitrarily specific type constraints on functions, resulting in increased confidence about their runtime behaviour.

This work demonstrates how dependent types may be used to increase confidence in network applications. We show how dependent types may be used to enforce resource usage protocols inherent in C socket programming, providing safety guarantees, and examine how a dependently-typed embedded domain-specific language may be used to enforce the conformance of packets to a given structure. These concepts are explored using two larger case studies: packets within the Domain Name System (DNS) and a networked game.
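Packet conformance is easy to motivate even without dependent types. Here is a runtime version of the DNS case study in Python (a sketch of mine; the dissertation’s point is that dependent types move this check to compile time):

```python
import struct

def parse_dns_header(packet: bytes):
    """Parse the fixed 12-byte DNS header (RFC 1035, section 4.1.1) and
    reject packets that do not conform to the expected structure."""
    if len(packet) < 12:
        raise ValueError("packet too short for a DNS header")
    ident, flags, qdcount, ancount, nscount, arcount = struct.unpack(
        "!HHHHHH", packet[:12])
    return {"id": ident,
            "qr": flags >> 15,              # 0 = query, 1 = response
            "opcode": (flags >> 11) & 0xF,
            "qdcount": qdcount, "ancount": ancount,
            "nscount": nscount, "arcount": arcount}

# A query header: id 0x1234, flags 0x0100 (recursion desired), one question.
hdr = parse_dns_header(bytes.fromhex("123401000001000000000000"))
assert hdr["id"] == 0x1234 and hdr["qr"] == 0 and hdr["qdcount"] == 1
```

In a dependently-typed setting, a packet that fails this shape check simply cannot be constructed in the first place.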

The use of statically typed functional programming languages is spreading. Fowler’s dissertation is yet another illustration of that fact.

When you read:

…examine how a dependently-typed embedded domain-specific language may be used to enforce the conformance of packets to a given structure.

Do you also hear:

…examine how a dependently-typed embedded domain-specific language may be used to enforce the conformance of proxies to a given structure.

??

Emotion Markup Language 1.0 (No Repeat of RDF Mistake)

Sunday, May 25th, 2014

Emotion Markup Language (EmotionML) 1.0

Abstract:

As the Web is becoming ubiquitous, interactive, and multimodal, technology needs to deal increasingly with human factors, including emotions. The specification of Emotion Markup Language 1.0 aims to strike a balance between practical applicability and scientific well-foundedness. The language is conceived as a “plug-in” language suitable for use in three different areas: (1) manual annotation of data; (2) automatic recognition of emotion-related states from user behavior; and (3) generation of emotion-related system behavior.

I started reading EmotionML with the expectation that the W3C had repeated its “one way and one way only” identification mistake from RDF.

Much to my pleasant surprise I found:

1.2 The challenge of defining a generally usable Emotion Markup Language

Any attempt to standardize the description of emotions using a finite set of fixed descriptors is doomed to failure: even scientists cannot agree on the number of relevant emotions, or on the names that should be given to them. Even more basically, the list of emotion-related states that should be distinguished varies depending on the application domain and the aspect of emotions to be focused. Basically, the vocabulary needed depends on the context of use. On the other hand, the basic structure of concepts is less controversial: it is generally agreed that emotions involve triggers, appraisals, feelings, expressive behavior including physiological changes, and action tendencies; emotions in their entirety can be described in terms of categories or a small number of dimensions; emotions have an intensity, and so on. For details, see Scientific Descriptions of Emotions in the Final Report of the Emotion Incubator Group.

Given this lack of agreement on descriptors in the field, the only practical way of defining an EmotionML is the definition of possible structural elements and their valid child elements and attributes, but to allow users to “plug in” vocabularies that they consider appropriate for their work. A separate W3C Working Draft complements this specification to provide a central repository of [Vocabularies for EmotionML] which can serve as a starting point; where the vocabularies listed there seem inappropriate, users can create their custom vocabularies.

An additional challenge lies in the aim to provide a generally usable markup, as the requirements arising from the three different use cases (annotation, recognition, and generation) are rather different. Whereas manual annotation tends to require all the fine-grained distinctions considered in the scientific literature, automatic recognition systems can usually distinguish only a very small number of different states.

For the reasons outlined here, it is clear that there is an inevitable tension between flexibility and interoperability, which need to be weighed in the formulation of an EmotionML. The guiding principle in the following specification has been to provide a choice only where it is needed, and to propose reasonable default options for every choice.
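For a flavor of the plug-in vocabulary design, a minimal EmotionML fragment looks roughly like this (the structure follows the spec, but treat the exact attribute values as illustrative rather than normative):

```xml
<emotionml version="1.0"
           xmlns="http://www.w3.org/2009/10/emotionml"
           category-set="http://www.w3.org/TR/emotion-voc/xml#everyday-categories">
  <!-- The category-set attribute plugs in a vocabulary; users may point
       it at a custom vocabulary of their own instead. -->
  <emotion>
    <category name="satisfied"/>
  </emotion>
</emotionml>
```

The key move is that the vocabulary lives behind a reference, not inside the language definition.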

Everything said here about emotions is equally true of identification, emotions being just one of the infinitely many sets of subjects that you might want to identify.

Had the W3C avoided RDF’s single-identifier scheme (and its reliance on one subset of logic for reasoning), RDF could have had plug-in “identifier” modules, enabling the use of all extant and future identifiers, not to mention “reasoning” according to the designs of users.

It is good to see the W3C learning from its earlier mistakes and enabling users to express their world views, as opposed to a world view as prescribed by the W3C.

When users declare their emotion vocabularies, those vocabularies are themselves subjects that merit identification, if only to avoid the problem of our not meaning the same thing by “owl:sameAs” that someone else means by “owl:sameAs.” (When owl:sameAs isn’t the Same: An Analysis of Identity Links on the Semantic Web by Harry Halpin, Ivan Herman, Patrick J. Hayes.)

Topic maps are a good solution for documenting subject identity and deciding when two or more identifications of subjects are the same subject.

I first saw this in a tweet by Inge Henriksen.

Real Time Robot Dance Party

Sunday, May 25th, 2014

From the description:

From the 2014 Solid Conference: In this day and age, we usually consider robots to be utilitarian, problem solvers. But there is another use for robots, that is for artistic expression.

In this fun, energetic talk, we will explore controlling multiple robots in real time. Roombas sway to gentle computer generated music, while Sphero balls roll with flashing lights. This robot jam will culminate in spectacular finale when the AR Drones fly in to join the dance.

Using Emacs, Overture, Clojure, a live robot dance by Carin Meier and Peter Shanley.

A very impressive demonstration, though of what I am not exactly sure. Which is, of course, a perfect demonstration!

Enjoy!

I first saw this in a tweet by Michael Klishin.

Human Computation

Saturday, May 24th, 2014

Human Computation

From the homepage:

Human Computation is an international and interdisciplinary forum for the electronic publication and print archiving of high-quality scholarly articles in all areas of human computation, which concerns the design or analysis of information processing systems in which humans participate as computational elements.

Submission Topics

(Editorial keywords are in boldface – please see author guidelines for details)

Applications – novel or transformative applications
Interfaces – HCI or related human factors methods or issues
Modalities – general interaction paradigms (e.g., gaming) and related methods
Techniques – repeatable methods, analogous to design patterns for OOP
Algorithms – wisdom of crowds, aggregation, reputation, crowdsourced analysis, and ML/HC
Architecture – development platforms, architectures, languages, APIs, IDEs, and compilers
Infrastructure – relevant networks, protocols, state space, and services
Participation – factors that influence human participation
Analysis – techniques for identifying typical characteristics and patterns in human computation systems
Epistemology – the role, source, representation, and construction of information
Policy – ethical, regulatory, and economic considerations
Security – security issues, including surreptitious behavior to influence system outcomes
Society – cultural, evolutionary, existential, psychological, and social impact
Organization – taxonomies of concepts, terminology, problem spaces, algorithms, and methods
Surveys – state of the art assessments of various facets
Meta-topics – insightful commentary on the future, philosophy, charter, and purpose of HC.
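The Algorithms topic above (“wisdom of crowds, aggregation”) has a one-function core. A sketch in Python, mine rather than the journal’s:

```python
from collections import Counter

def aggregate(labels):
    """Majority-vote aggregation of noisy worker labels -- the simplest
    'wisdom of crowds' algorithm for crowdsourced annotation."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner, count / len(labels)   # label plus an agreement score

label, agreement = aggregate(["cat", "cat", "dog", "cat", "bird"])
assert label == "cat"
assert agreement == 0.6
```

Much of the human computation literature is about doing better than this baseline, e.g., by weighting workers by their estimated reliability.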

Looks like a journal for topic map articles to me.

You?

I first saw this in a tweet by Matt Lease.

Fluokitten

Saturday, May 24th, 2014

Fluokitten: Category theory concepts in Clojure – Functors, Applicatives, Monads, Monoids and more.

From the “getting started” page:

This is a brief introductory guide to Fluokitten that aims to give you the necessary information to get up and running, as well as a brief overview of some available resources for learning key category theory concepts and how to apply them in Clojure with Fluokitten.

Overview

Fluokitten is a Clojure library that enables programming paradigms derived from category theory (CT). It provides:

  • A core library of CT functions uncomplicate.fluokitten.core;
  • Protocols for many CT concepts uncomplicate.fluokitten.protocols;
  • Implementations of these protocols for standard Clojure constructs (collections, functions, etc.) uncomplicate.fluokitten.jvm;
  • Macros and functions to help you write custom protocol implementations.
  • Accompanying website with learning resources.
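The functor idea at Fluokitten’s core — one `fmap` that works across container types — can be sketched in Python (this shows the concept only, not Fluokitten’s API):

```python
# Functor "instances" keyed by container type.
_FUNCTORS = {
    list:  lambda f, xs: [f(x) for x in xs],
    tuple: lambda f, xs: tuple(f(x) for x in xs),
    dict:  lambda f, d: {k: f(v) for k, v in d.items()},  # map over values
}

def fmap(f, container):
    """Apply f inside a container, preserving its shape. The functor law
    fmap(identity, c) == c should hold for every instance."""
    try:
        impl = _FUNCTORS[type(container)]
    except KeyError:
        raise TypeError(f"no functor instance for {type(container).__name__}")
    return impl(f, container)

inc = lambda x: x + 1
assert fmap(inc, [1, 2, 3]) == [2, 3, 4]
assert fmap(inc, {"a": 1, "b": 2}) == {"a": 2, "b": 3}
assert fmap(lambda x: x, (1, 2)) == (1, 2)   # identity law spot-check
```

Fluokitten does the same thing with Clojure protocols, which is what lets user-defined types participate too.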

Not your first resource on Clojure but certainly one to consult when you want to put category theory into practice with Clojure.

Morph

Saturday, May 24th, 2014

Morph – A Library of Morphisms: Monoids, Functors, and Monads by Armando Blancas.

From the webpage:

Morph

Morph is a library of Haskell-style morphisms: monoids, functors, and monads. These constructs are helpful for designing programs that are purely functional and that encapsulate the boilerplate employed by many programming techniques.

Features

  • Implementation based on protocols and data types.
  • Predefined monoids and functors; with support for Clojure collections.
  • Monads: Identity, Maybe, Either, Reader, Writer, State, Imperative.
  • Monad Transformers: MaybeT, EitherT, ReaderT, WriterT, StateT.
  • Support for curried functions.
  • Library of generic functions for the above constructs.
  • Sample code in src/main/resources.

These constructs have a reputation of being hard to explain and even harder to understand and to apply in everyday programming. I’ve made every effort to present them as regular techniques and idioms with practical benefits. Behind their strange or fancy names, these are just functions that work on data types.

An intuition of their behavior is all that’s needed to take advantage of these functions; you may never need or want to write your own. I’m pleased with the power and simplicity these techniques have to offer and I hope you may find them useful as well.

Lowering the learning curve for using functional programming languages? Is that a bit like being able to use a compiler but not write one? Whatever drives adoption is a good thing.
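As an intuition pump for the "strange or fancy names": the Maybe monad, one of the monads Morph predefines, can be roughed out in Python like this. This is a sketch of the idea only; Morph's real implementation uses Clojure protocols and data types:

```python
class Maybe:
    """A value that may be absent; chaining skips over absence."""
    def __init__(self, value, present=True):
        self.value = value
        self.present = present

    def bind(self, f):
        # 'Nothing' short-circuits: later steps never run.
        return f(self.value) if self.present else self

def nothing():
    return Maybe(None, present=False)

def safe_div(x, y):
    return nothing() if y == 0 else Maybe(x / y)

# A pipeline that would otherwise need a nil check at every step.
result = (Maybe(10)
          .bind(lambda x: safe_div(x, 2))
          .bind(lambda x: safe_div(x, 0))   # fails here...
          .bind(lambda x: safe_div(x, 5)))  # ...so this never runs
assert not result.present
```

Each bind either continues the pipeline or short-circuits; that repeated check is exactly the boilerplate the monad encapsulates.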

…Data Analytics Hackathon

Saturday, May 24th, 2014

Elasticsearch Teams up with MIT Sloan for Data Analytics Hackathon by Sejal Korenromp.

From the post:

Following from the success and popularity of the Hopper Hackathon we participated in late last year, last week we sponsored the MIT Sloan Data Analytics Club Hackathon for our latest offering to Elasticsearch aficionados. More than 50 software engineers, business students and other open source software enthusiasts signed up to participate, and on a Saturday to boot! The full day’s festivities included access to a huge storage and computing cluster, and everyone was set free to create something awesome using Elasticsearch.

Hacks from the finalists:

  • Quimbly – A Digital Library
  • Brand Sentiment Analysis
  • Conference Data
  • Twitter based sentiment analyzer
  • Statistics on Movies and Wikipedia

See Sejal’s post for the details of each hack and the winner.

I noticed several very good ideas in these hacks, no doubt you will notice even more.

Enjoy!

Lisp Flavored Erlang

Saturday, May 24th, 2014

Lisp Flavored Erlang: “These are your father’s parentheses. Elegant weapons, for a more… civilized age.”

From the homepage:

Origins

LFE has many origins, depending upon whether you’re looking at Lisp (and here), Erlang, or LFE-proper. The LFE community of contributors embraces all of these and more.

From the original release message:

I have finally released LFE, Lisp Flavoured Erlang, which is a lisp syntax front-end to the Erlang compiler. Code produced with it is compatible with “normal” Erlang code. There is an LFE-mode for Emacs and the lfe-mode.el file is included in the distribution… (Robert Virding)

I haven’t looked up the numbers but I am sure that LFE is, in the terminology of academia, one of the less often taught languages. However, it sounds deeply interesting as we all march towards scalable concurrent processing.

Erlang-Bookmarks

Saturday, May 24th, 2014

Erlang-Bookmarks

Almost two hundred (195 as of May 24, 2014) links gathered in the following groups:

  • API Clients
  • Blogs
  • Books
  • Community
  • Database clients
  • Debugging and profiling
  • Documentation
  • Documentation tools
  • Editors and IDEs
  • Erlang for beginners
  • Erlang Internals
  • Erlang interviews and resources
  • Erlang – more advanced topics
  • Exercises
  • Http clients
  • Json
  • Load testing tools
  • Loggers
  • Network
  • Other languages on top of the Erlang VM
  • Package managers
  • Podcasts
  • Projects using Erlang
  • Style guide and Erlang Enhancement Proposals
  • Testing Frameworks
  • Videos
  • Utils
  • War diaries
  • Web frameworks
  • Web servers

This should supply you with plenty of beach reading. 😉

Free MarkLogic Classes

Saturday, May 24th, 2014

Free MarkLogic Classes

From the webpage:

MarkLogic University offers FREE publicly scheduled instructor led courses! Here’s how it works:

  • Sign up for any public class listed below by paying the Booking Fee
  • Once you have completed the course, the Booking Fee will be fully refunded within 7 business days
  • If you register for the class but do not attend you will forfeit your Booking Fee
  • If you have any questions please contact training@marklogic.com

Vendor-specific, I know, but you can’t argue with the pricing scheme. If anything, it should help encourage you to attend and complete the classes.

If you take one (or more) of these courses, please comment or send me a private message. Thanks!

Elasticsearch 1.2.0 and 1.1.2 released

Saturday, May 24th, 2014

Elasticsearch 1.2.0 and 1.1.2 released by Clinton Gormley.

From the post:

Today, we are happy to announce the release of Elasticsearch 1.2.0, based on Lucene 4.8.1, along with a bug fix release Elasticsearch 1.1.2.

You can download them and read the full change lists here:

Elasticsearch 1.2.0 is a bumper release, containing over 300 new features, enhancements, and bug fixes. You can see the full changes list in the Elasticsearch 1.2.0 release notes, but we will highlight some of the important ones below:

Highlights of the more important changes for Elasticsearch 1.2.0:

  • Java 7 required
  • dynamic scripting disabled by default
  • field data and filter caches
  • gateways removed
  • indexing and merging
  • aggregations
  • context suggester
  • improved deep scrolling
  • field value factor

See Clinton’s post or the release notes for more complete coverage. (Aggregation looks particularly interesting.)
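As a taste of why the aggregations work is interesting: a search body that buckets documents by one field and averages another inside each bucket, composed client-side in Python. The index and field names here are made up for illustration:

```python
import json

# A hypothetical request body for the aggregations framework:
# count documents per "language" bucket, and average "stars" within each.
query = {
    "size": 0,  # we want only aggregation results, no hits
    "aggs": {
        "by_language": {
            "terms": {"field": "language"},
            "aggs": {
                "avg_stars": {"avg": {"field": "stars"}},
            },
        }
    },
}

# This JSON would be POSTed to http://localhost:9200/<index>/_search.
body = json.dumps(query)
```

Nesting a metric aggregation inside a bucket aggregation like this is what makes the framework composable.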

the HiggsML challenge

Saturday, May 24th, 2014

the HiggsML challenge

The challenge runs from May 12th to September 2014.

From the challenge:

In a nutshell, we provide a data set containing a mixture of simulated signal and background events, built from simulated events provided by the ATLAS collaboration at CERN. Competitors can use or develop any algorithm they want, and the one who achieves the best signal/background separation wins! Besides classical prizes for the winners, a special “HEP meets ML” prize will also be awarded with an invitation to CERN; we are also seeking to organise a NIPS workshop.

For this HEP challenge we deliberately picked one of the most recent and hottest playgrounds: the Higgs decaying into a pair of tau leptons. The first ATLAS results were made public in December 2013 in a CERN seminar, ATLAS sees Higgs boson decay to fermions. The simulated events that participants will have in their hands are the same that physicists used. Participants will be working in realistic conditions, although we have simplified the original problem quite a bit so that it becomes tractable without any background in physics.

HEP physicists, even ATLAS physicists, who have experience with multivariate analysis, neural nets, boosted decision trees, and the like are warmly encouraged to compete with machine learning experts.

The Laboratoire de l’Accélérateur Linéaire (LAL) is a French lab located in the vicinity of Paris. It is overseen by both the CNRS (IN2P3) and University Paris-Sud. It counts 330 employees (125 researchers and 205 engineers and technicians) and brings internationally recognized contributions to experimental Particle Physics, Accelerator Physics, Astroparticle Physics, and Cosmology.

Contact : for any question of general interest about the challenge, please consult and use the forum provided on the Kaggle web site. For private comments, we are also reachable at higgsml_at_lal.in2p3.fr.

Now there is a machine learning challenge for the summer!

Not to mention more science being done on the basis of public data sets.

Be sure to forward this to both your local computer science and physics department.
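For anyone sizing up the challenge: as I read the challenge documentation, submissions are scored not by plain accuracy but by an approximate median significance (AMS) metric over the selected events. A sketch of that metric, with the regularization constant assumed to be 10:

```python
import math

def ams(s, b, b_reg=10.0):
    """Approximate median significance: rewards selecting many true
    signal events (s) while keeping selected background events (b) low.
    b_reg regularizes the score when b is very small."""
    return math.sqrt(2.0 * ((s + b + b_reg)
                            * math.log(1.0 + s / (b + b_reg)) - s))

# More signal at the same background improves the score...
assert ams(200.0, 1000.0) > ams(100.0, 1000.0)
# ...and more background at the same signal hurts it.
assert ams(100.0, 2000.0) < ams(100.0, 1000.0)
```

So the task is a classification problem with an asymmetric objective: a looser cut admits more signal but also more background, and AMS arbitrates the trade.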

Scala eXchange 2013 (screencasts)

Friday, May 23rd, 2014

Scala eXchange 2013

From the webpage:

Join us at the third Annual Scala eXchange 2013 for 2 days of learning Scala skills! Meet the amazing Bill Venners and gain an understanding of the trade-offs between implicit conversions and parameters and how to take advantage of implicit parameters in your own designs. Or join Viktor Klang’s talk to learn strategies for the recovery and healing of your systems, when things go FUBAR. Find out about Lift from David Pollak, or find out about Adept, the new dependency management system for Scala, in Fredrik Ekholdt’s talk. Find out about the road to Akka Cluster, and beyond, in Jonas Boner’s keynote, or about the new design of the Macro-based Scala Parallel Collections with Alex Prokopec! Featuring 2 days of talks over 3 tracks, The Scala eXchange will bring the world’s top Scala experts and many of the creators of Scala stack technologies together with Europe’s Scala community to learn and share skills, exchange ideas and meet like-minded people. Don’t miss it!

There are forty-eight (48) screencasts from Scala eXchange 2013 posted for your viewing pleasure.

I can’t think of a better selling point for Scala eXchange 2014 than the screencasts from 2013.

Convert Existing Data into Parquet

Friday, May 23rd, 2014

Convert Existing Data into Parquet by Uri Laserson.

From the post:

Learn how to convert your data to the Parquet columnar format to get big performance gains.

Using a columnar storage format for your data offers significant performance advantages for a large subset of real-world queries. (Click here for a great introduction.)

Last year, Cloudera, in collaboration with Twitter and others, released a new Apache Hadoop-friendly, binary, columnar file format called Parquet. (Parquet was recently proposed for the ASF Incubator.) In this post, you will get an introduction to converting your existing data into Parquet format, both with and without Hadoop.

Actually, between Uri’s post and my pointing to it, Parquet has been accepted into the ASF Incubator!

All the more reason to start following this project.

Enjoy!
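The intuition behind the columnar format is easy to show in miniature. A toy Python sketch of the row-to-column transposition (Parquet itself layers encodings, compression, and nested schemas on top of this idea; the data is made up):

```python
# Row-oriented storage: each record kept together, as in CSV or Avro.
rows = [
    {"user": "ann", "country": "FR", "clicks": 3},
    {"user": "bob", "country": "FR", "clicks": 7},
    {"user": "cai", "country": "DE", "clicks": 2},
]

# Columnar storage: all values of one field kept contiguously.
# A query touching only "clicks" reads one column, not every record.
columns = {key: [row[key] for row in rows] for key in rows[0]}

assert columns["clicks"] == [3, 7, 2]
# Repeated values sit next to each other, which is why columnar files
# compress well (run-length or dictionary encoding on "country").
assert columns["country"] == ["FR", "FR", "DE"]
```

That single transposition is what buys the "significant performance advantages" for queries that scan a few columns of a wide table.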

Neo4j 2.0: Creating adjacency matrices

Friday, May 23rd, 2014

Neo4j 2.0: Creating adjacency matrices by Mark Needham.

From the post:

About 9 months ago I wrote a blog post showing how to export an adjacency matrix from a Neo4j 1.9 database using the cypher query language and I thought it deserves an update to use 2.0 syntax.

I’ve been spending some of my free time working on an application that runs on top of meetup.com’s API and one of the queries I wanted to write was to find the common members between 2 meetup groups.

The first part of this query is a cartesian product of the groups we want to consider which will give us the combinations of pairs of groups:

I can imagine several interesting uses for the adjacency matrices that Mark describes.

One of which is common membership in groups as the post outlines.

Another would be a common property or sharing a value within a range.

Yes?
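The common-membership idea is easy to prototype outside the database too. A small Python sketch with hypothetical meetup data, computing the same group-by-group matrix Mark's Cypher query produces:

```python
# Hypothetical data: which members belong to which meetup groups.
members = {
    "Graph DBs": {"alice", "bob", "carol"},
    "Clojure":   {"bob", "carol", "dave"},
    "Python":    {"carol", "eve"},
}

groups = sorted(members)
# Cell (i, j) counts the members groups i and j share -- the cartesian
# product of groups, scored by set intersection.
matrix = [[len(members[g1] & members[g2]) for g2 in groups]
          for g1 in groups]

idx = {g: i for i, g in enumerate(groups)}
assert matrix[idx["Graph DBs"]][idx["Clojure"]] == 2  # bob, carol
```

The Cypher version has the advantage of computing this inside Neo4j, without pulling every membership over the wire.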

Overview (new release)

Friday, May 23rd, 2014

Overview (new release)

A new version of Overview was released last Monday. The GitHub page lists the following new features:

  • Overview will reserve less memory on Windows with 32-bit Java. That means it won’t support larger document sets; it also means it won’t crash on startup for some users.
  • Overview now starts in “single-user mode”. You won’t be prompted for a username or password.
  • Overview will automatically open up a browser window to http://localhost:9000 when it’s ready to go.
  • You can export huge document sets without running out of memory.

Installation and upgrade instructions: https://github.com/overview/overview-server/wiki/Installing-and-Running-Overview

For more details on how Overview supports “document-driven journalism,” see the Overview Project homepage.