Another Word For It: Patrick Durusau on Topic Maps and Semantic Diversity

August 6, 2012

Writing a modular GPGPU program in Java

Filed under: CUDA,GPU,Java — Patrick Durusau @ 4:05 pm

Writing a modular GPGPU program in Java by Masayuki Ioki, Shumpei Hozumi, and Shigeru Chiba.

Abstract:

This paper proposes a Java to CUDA runtime program translator for scientific-computing applications. Traditionally, these applications have been written in Fortran or C without using a rich modularization mechanism. Our translator enables those applications to be written in Java and run on GPGPUs while exploiting a rich modularization mechanism in Java. This translator dynamically generates optimized CUDA code from a Java program given at bytecode level when the program is running. By exploiting dynamic type information given at translation, the translator devirtualizes dynamic method dispatches and flattens objects into simple data representation in CUDA. To do this, a Java program must be written to satisfy certain constraints.

This paper also shows that the performance overheads due to Java and WootinJ are not significantly high.
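
The "flattening" the abstract describes is easier to picture with a small sketch. This is not WootinJ code and not its actual constraints, just an illustration of the general idea: objects with primitive fields and monomorphic method calls can be lowered into plain arrays and straight-line loops, which is what a CUDA kernel wants to see.

    // Hypothetical illustration only: an object-per-element view of the data...
    final class Particle {               // final: no subclasses, so calls to step() are monomorphic
        float x, y, vx, vy;              // primitive fields, so instances can be flattened
        void step(float dt) {
            x += vx * dt;
            y += vy * dt;
        }
    }

    // ...and the flattened, structure-of-arrays form a translator could emit for the GPU.
    final class ParticlesFlat {
        float[] x, y, vx, vy;            // "simple data representation": parallel primitive arrays
        void stepAll(float dt) {         // the loop body is what becomes the CUDA kernel
            for (int i = 0; i < x.length; i++) {
                x[i] += vx[i] * dt;
                y[i] += vy[i] * dt;
            }
        }
    }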

Just in case you are starting to work on topic map processing routines for GPGPUs.

Something to occupy your time during the “dog days” of August.

August 4, 2012

Fun With Hadoop In Action Exercises (Java)

Filed under: Hadoop,Java — Patrick Durusau @ 7:01 pm

Fun With Hadoop In Action Exercises (Java) by Sujit Pal.

From the post:

As some of you know, I recently took some online courses from Coursera. Having taken these courses, I have come to the realization that my knowledge has some rather large blind spots. So far, I have gotten most of my education from books and websites, and I have tended to cherry-pick subjects which I need at the moment for my work, as a result of which I tend to ignore stuff (techniques, algorithms, etc.) that falls outside that realm. Obviously, this is Not A Good Thing™, so I have begun to seek ways to remedy that.

I first looked at Hadoop years ago, but never got much beyond creating proof-of-concept Map-Reduce programs (Java and Streaming/Python) for text mining applications. Lately, many subprojects (Pig, Hive, etc.) have come up to make it easier to deal with large amounts of data using Hadoop, and about these I know nothing. So in an attempt to ramp up relatively quickly, I decided to take some courses at BigData University.

The course uses BigInsights (IBM’s packaging of Hadoop), which runs only on Linux. VMWare images are available, but since I have a Macbook Pro, that wasn’t much use to me without a VMWare player (not free for Mac OSX). I then installed VirtualBox and tried to run a Fedora 10 64-bit image on it, and install BigInsights on Fedora, but it failed. I then tried to install Cloudera CDH4 (Cloudera’s packaging of Hadoop) on it (it’s a series of yum commands), but that did not work out either. Ultimately I decided to ditch VirtualBox altogether and do a pseudo-distributed installation of the stock Apache Hadoop (1.0.3) directly on my Mac, following instructions on Michael Noll’s page.

The Hadoop Fundamentals I course which I was taking covers quite a few things, but I decided to stop and actually read all of Hadoop in Action (HIA) in order to get more thorough coverage. I had purchased it some years before as part of Manning’s MEAP (Early Access) program, so it’s a bit dated (examples are mostly in the older 0.19 API), but it’s the only Hadoop book I possess, the concepts are explained beautifully, and it’s not a huge leap to mentally translate code from the old API to the new, so it was well worth the read.

I also decided to tackle the exercises (in Java for now) and post my solutions on GitHub. Three reasons. First, it exposes me to a more comprehensive set of scenarios than I have had previously, and forces me to use techniques and algorithms that I won’t otherwise. Second, hopefully some of my readers can walk circles around me where Hadoop is concerned, and they will be kind enough to provide criticism and suggestions for improvement. And third, there may be some who would benefit from having the HIA examples worked out. So anyway, here they are, my solutions to selected exercises from Chapters 4 and 5 of the HIA book for your reading pleasure.

Much good content follows!

This will be useful to a large number of people.

As well as setting a good example.
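
For anyone who, like Pal, has to translate mentally between the older 0.19 API and the newer one, here is a minimal word-count mapper in the org.apache.hadoop.mapreduce API. A generic sketch of my own, not one of Pal's solutions:

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // New-API mapper: extend Mapper directly (rather than implementing the old
    // org.apache.hadoop.mapred.Mapper interface) and write output through a Context.
    public class WordCountMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (token.isEmpty()) continue;
                word.set(token);
                context.write(word, ONE);   // emit (word, 1) pairs for the reducer
            }
        }
    }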

July 31, 2012

Vertical Scaling made easy through high-performance actors

Filed under: Actor-Based,Java,Messaging — Patrick Durusau @ 10:41 am

Vertical Scaling made easy through high-performance actors

From the webpage:

Vertical scaling is today a major issue when writing server code. Threads and locks are the traditional approach to making full use of fat (multi-core) computers, but the result is code that is difficult to maintain and which too often does not run much faster than single-threaded code.

Actors make good use of fat computers but tend to be slow as messages are passed between threads. Attempts to optimize actor-based programs result in actors with multiple concerns (loss of modularity) and lots of spaghetti code.

The approach used by JActor is to minimize the messages passed between threads by executing the messages sent to idle actors on the same thread used by the actor which sent the message. Message buffering is used when messages must be sent between threads, and two-way messaging is used for implicit flow control. The result is an approach that is easy to maintain and which, with a bit of care to the architecture, provides extremely high rates of throughput.

On an Intel i7, 250 million messages can be passed between actors in the same JVM per second, several orders of magnitude faster than comparable actor frameworks.

Hmmm, 250 million messages a second? On the topic map (TM) scale, that’s what?, about 1/4 TM? 😉

Seriously, if you are writing topic map server software, you need to take a look at JActor.
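
The JActor documentation is the place for its actual API; the core trick, running a message sent to an idle actor directly on the sender's thread instead of paying for a thread hand-off, can be sketched in a few lines of plain Java (a hypothetical illustration, not JActor's classes):

    import java.util.concurrent.ConcurrentLinkedQueue;
    import java.util.concurrent.atomic.AtomicBoolean;

    // Sketch only: a message sent to an idle actor runs on the sender's thread;
    // messages sent while the actor is busy are buffered and drained by whichever
    // thread currently "owns" the actor, so receive() stays single-threaded.
    abstract class TinyActor<T> {
        private final ConcurrentLinkedQueue<T> mailbox = new ConcurrentLinkedQueue<T>();
        private final AtomicBoolean running = new AtomicBoolean(false);

        protected abstract void receive(T message);

        public void send(T message) {
            mailbox.add(message);
            tryDrain();
        }

        private void tryDrain() {
            // Re-check after releasing ownership so a message enqueued during the
            // hand-off window is not stranded in the mailbox.
            while (!mailbox.isEmpty() && running.compareAndSet(false, true)) {
                try {
                    T m;
                    while ((m = mailbox.poll()) != null) receive(m);
                } finally {
                    running.set(false);
                }
            }
        }
    }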

June 11, 2012

Machine Learning in Java has never been easier! [Java App <-> BigML Rest API]

Filed under: Java,Machine Learning — Patrick Durusau @ 4:25 pm

Machine Learning in Java has never been easier!

From the post:

Java is by far one of the most popular programming languages. It’s on the top of the TIOBE index and thousands of the most robust, secure, and scalable backends have been built in Java. In addition, there are many wonderful libraries available that can help accelerate your project enormously. For example, most of BigML’s backend is developed in Clojure which runs on top of the Java Virtual Machine. And don’t forget the ever-growing Android market, with 850K new devices activated each day!

There are a number of machine learning Java libraries available to help build smart data-driven applications. Weka is one of the more popular options. In fact, some of BigML’s team members were Weka users as far back as the late 90s. We even used it as part of the first BigML backend prototype in early 2011. Apache Mahout is another great Java library if you want to deal with bigger amounts of data. However, in both cases you cannot avoid “the fun of running servers, installing packages, writing MapReduce jobs, and generally behaving like IT ops folks”. In addition, you need to be concerned with selecting and parametrizing the best algorithm to learn from your data, as well as finding a way to activate and integrate the model that you generate into your application.

Thus we are thrilled to announce the availability of the first Open Source Java library that easily connects any Java application with the BigML REST API. It has been developed by Javi Garcia, an old friend of ours. A few of the BigML team members have been lucky enough to work with Javi in two other companies in the past.

With this new library, in just a few lines of code you can create a predictive model and generate predictions for any application domain. From finding the best price for a new product to forecasting sales, creating recommendations, diagnosing malfunctions, or detecting anomalies.
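
The HTTP plumbing is not the interesting part, which is rather the point of wrapping it in a library. For the curious, a hedged sketch of the kind of call involved, using only the standard library and a made-up endpoint and payload (not the actual BigML bindings or URLs):

    import java.io.InputStream;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.Scanner;

    public class RestSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical endpoint and JSON body for requesting a prediction.
            URL url = new URL("https://api.example.com/prediction?apikey=YOUR_KEY");
            String body = "{\"model\": \"model/123\", \"input_data\": {\"sepal length\": 5.1}}";

            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            OutputStream out = conn.getOutputStream();
            out.write(body.getBytes("UTF-8"));
            out.close();

            // Read back the raw JSON response.
            InputStream in = conn.getInputStream();
            Scanner s = new Scanner(in, "UTF-8").useDelimiter("\\A");
            System.out.println(s.hasNext() ? s.next() : "");
            s.close();
        }
    }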

It won’t be as easy as “…in just a few lines of code…” but it will, what’s the term, modularize the building of machine learning applications. Someone has to run and maintain the servers, apply security patches, and do backups, but it doesn’t have to be you.

Specialization, that’s the other term. So that team members can be really good at what they do, as opposed to sorta good at a number of things.

If you need a common example, consider documentation, most of which is written by developers when they can spare the time. Reads like it. It costs your clients time and money when their developers have to work with poor documentation.

Not to mention costing you time and money when the software is no longer totally familiar to any one person.

PS: As of today, June 11, 2012, Java is now #2 and C is #1 on the TIOBE list.

June 6, 2012

Java Annotations

Filed under: Java,Java Annotations — Patrick Durusau @ 7:48 pm

Java Annotations

From the post:

An annotation is code about the code, that is, metadata about the program itself; in other words, organized data about the code, embedded within the code itself. It can be parsed by the compiler and by annotation processing tools, and can also be made available at run time.

Java’s basic comment facility lets us add information about the code and its logic so that, in the future, another programmer (or the same programmer) can understand the code better. Javadoc is an additional step over it: we add information about the classes, methods, and variables in the source code, organized using a defined syntax, so a tool can parse those comments and prepare a Javadoc document which can be distributed separately.

The Javadoc facility gives the option of understanding the code in an external way: instead of opening the code, the Javadoc document can be used separately. IDEs benefit from Javadoc as well, rendering information about the code as we develop. Annotations were introduced in JDK 1.5.

A reminder to myself of an opportunity for the application of topic maps to Java code. Obviously with regard to “custom” annotations but I suspect the range of usage for “standard” annotations is quite wide.

Not that there aren’t other “subjects” that could be usefully organized out of source code using topic maps. Such as which developers use which classes, methods, etc.
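
A minimal example of the “custom” annotation route: declare an annotation, apply it, and read it back by reflection. The @Subject annotation here is hypothetical, the sort of thing a topic-map extractor might look for; the mechanics are standard Java:

    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.reflect.Method;

    public class AnnotationSketch {

        // Hypothetical annotation recording which "subject" a method is about.
        @Retention(RetentionPolicy.RUNTIME)
        @interface Subject {
            String value();
        }

        @Subject("merging")
        static void mergeTopics() { /* ... */ }

        public static void main(String[] args) {
            // Reflection sees the annotation because it is retained at run time.
            for (Method m : AnnotationSketch.class.getDeclaredMethods()) {
                Subject s = m.getAnnotation(Subject.class);
                if (s != null) {
                    System.out.println(m.getName() + " is about: " + s.value());
                }
            }
        }
    }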

May 2, 2012

12 Ways to Increase Throughput by 32X and Reduce Latency by 20X

Filed under: Java,Messaging,Performance — Patrick Durusau @ 3:31 pm

12 Ways to Increase Throughput by 32X and Reduce Latency by 20X

From the post:

Martin Thompson, a high-performance technology geek, has written an awesome post, Fun with my-Channels Nirvana and Azul Zing. In it Martin shows the process and techniques he used to take an existing messaging product, written in Java, and increase throughput by 32X and reduce latency by 20X. The article is very well written with lots of interesting details that make it well worth reading.

You might want to start with the High Scalability summary before tackling the “real thing.”

Of interest to subject-centric applications that rely on messaging. And anyone interested in performance for the sheer pleasure of it.
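
If you have not tried JSR 166 yet, the shape of a fork/join computation is compact. A minimal sketch, summing an array by recursive splitting (standard JDK 7 API; the threshold is arbitrary):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.RecursiveTask;

    public class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10000;   // below this, just compute sequentially
        private final long[] data;
        private final int from, to;

        public SumTask(long[] data, int from, int to) {
            this.data = data; this.from = from; this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int mid = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, mid);
            SumTask right = new SumTask(data, mid, to);
            left.fork();                            // left half may be stolen by an idle worker
            return right.compute() + left.join();   // compute the right half here, then join
        }

        public static void main(String[] args) {
            long[] data = new long[1000000];
            for (int i = 0; i < data.length; i++) data[i] = i;
            System.out.println(new ForkJoinPool().invoke(new SumTask(data, 0, data.length)));
        }
    }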

April 5, 2012

Choosing a Java Version on Ubuntu

Filed under: Java — Patrick Durusau @ 3:38 pm

Choosing a Java Version on Ubuntu

Apologies but I had to write this down where I am likely to find it in the future.

I run a fair number of Java-based apps that are, shall we say, sensitive to the edition/version of Java that is being invoked.

Some updates take it upon themselves to “correct” my settings.

I happened upon this and it reminded me of that issue.

Thought you might find it helpful at some point.

March 5, 2012

Java Remote Method Invocation (RMI) for Bioinformatics

Filed under: Bioinformatics,Java,Remote Method Invocation (RMI) — Patrick Durusau @ 7:53 pm

Java Remote Method Invocation (RMI) for Bioinformatics by Pierre Lindenbaum.

From the post:

“Java Remote Method Invocation (Java RMI) enables the programmer to create distributed Java technology-based to Java technology-based applications, in which the methods of remote Java objects can be invoked from other Java virtual machines, possibly on different hosts.” [Oracle] In the current post, a Java client sends a Java class to a server that analyzes a DNA sequence fetched from the NCBI, using RMI.

Distributed computing, on both the client and the server, is likely to form part of a topic map solution. This example is drawn from bioinformatics, but the principles are generally applicable.
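
The pattern itself is small: a Remote interface shared by client and server, a server-side implementation exported through the registry, and a client that looks it up and calls it as if it were local. A generic sketch (not Lindenbaum’s bioinformatics code), with both ends in one JVM for brevity:

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    public class RmiSketch {
        // Shared interface: the only thing client and server must agree on.
        interface SequenceAnalyzer extends Remote {
            int countGC(String dna) throws RemoteException;
        }

        // Server-side implementation.
        static class SequenceAnalyzerImpl implements SequenceAnalyzer {
            public int countGC(String dna) {
                int n = 0;
                for (char c : dna.toCharArray()) if (c == 'G' || c == 'C') n++;
                return n;
            }
        }

        public static void main(String[] args) throws Exception {
            // Export the implementation and register it under a name.
            SequenceAnalyzer stub = (SequenceAnalyzer)
                    UnicastRemoteObject.exportObject(new SequenceAnalyzerImpl(), 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("analyzer", stub);

            // A client (normally in another JVM, possibly on another host) calls it remotely.
            SequenceAnalyzer remote = (SequenceAnalyzer) registry.lookup("analyzer");
            System.out.println(remote.countGC("GATTACA"));   // prints 2
        }
    }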

February 29, 2012

Work-Stealing & Recursive Partitioning with Fork/Join

Filed under: Java,MapReduce — Patrick Durusau @ 7:21 pm

Work-Stealing & Recursive Partitioning with Fork/Join by Ilya Grigorik.

From the post:

Implementing an efficient parallel algorithm is, unfortunately, still a non-trivial task in most languages: we need to determine how to partition the problem, determine the optimal level of parallelism, and finally build an implementation with minimal synchronization. This last bit is especially critical since as Amdahl’s law tells us: “the speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program”.

The Fork/Join framework (JSR 166) in JDK7 implements a clever work-stealing technique for parallel execution that is worth learning about – even if you are not a JDK user. Optimized for parallelizing divide-and-conquer (and map-reduce) algorithms, it abstracts all the CPU scheduling and work balancing behind a simple-to-use API.

I appreciated the observation later in the post that map-reduce is a special case of the pattern described in this post. A better understanding of the special cases can lead to a deeper understanding of the general one.

February 3, 2012

Java, Python, Ruby, Linux, Windows, are all doomed

Filed under: Java,Linux OS,Parallelism,Python,Ruby — Patrick Durusau @ 5:02 pm

Java, Python, Ruby, Linux, Windows, are all doomed by Russell Winder.

From the description:

The Multicore Revolution gathers pace. Moore’s Law remains in force — chips are getting more and more transistors on a quarterly basis. Intel are now out and about touting the “many core chip”. The 80-core chip continues its role as research tool. The 48-core chip is now actively driving production engineering. Heterogeneity not homogeneity is the new “in” architecture.

Where Intel research goes, AMD and others cannot be far behind.

The virtual machine based architectures of the 1990s, Python, Ruby and Java, currently cannot cope with the new hardware architectures. Indeed Linux and Windows cannot cope with the new hardware architectures either. So either we will have lots of hardware which the software cannot cope with, or . . . . . . well you’ll just have to come to the session.

The slides are very hard to see so grab a copy at: http://www.russel.org.uk/Presentations/accu_london_2010-11-18.pdf

From the description: Heterogeneity not homogeneity is the new “in” architecture.

Is greater heterogeneity in programming languages coming?

January 5, 2012

Open Data Structures

Filed under: Data Structures,Java — Patrick Durusau @ 4:04 pm

Open Data Structures by Pat Morin.

From “about:”

Open Data Structures covers the implementation and analysis of data structures for sequences (lists), queues, priority queues, unordered dictionaries, and ordered dictionaries.

Data structures presented in the book include stacks, queues, deques, and lists implemented as arrays and linked-list; space-efficient implementations of lists; skip lists; hash tables and hash codes; binary search trees including treaps, scapegoat trees, and red-black trees; and heaps, including implicit binary heaps and randomized meldable heaps.

The data structures in this book are all fast, practical, and have provably good running times. All data structures are rigorously analyzed and implemented in Java and C++. The Java implementations implement the corresponding interfaces in the Java Collections Framework.

The book and accompanying source code are free (libre and gratis) and are released under a Creative Commons Attribution License. Users are free to copy, distribute, use, and adapt the text and source code, even commercially. The book’s LaTeX sources, Java/C++ sources, and build scripts are available through github.

Noticed in David Eppstein’s Link Roundup.

December 31, 2011

Guava project

Filed under: Java — Patrick Durusau @ 7:27 pm

Guava project

From the web page:

The Guava project contains several of Google’s core libraries that we rely on in our Java-based projects: collections, caching, primitives support, concurrency libraries, common annotations, string processing, I/O, and so forth.

Something you may find useful in the coming year.

I first saw this in Christophe Lalanne’s A bag of tweets / Dec 2011.
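
As a small taste, the collections additions alone tighten up everyday code. A sketch using Guava’s Multimap and Joiner (class names stable across recent releases, but check against the version you pull in):

    import com.google.common.base.Joiner;
    import com.google.common.collect.HashMultimap;
    import com.google.common.collect.Multimap;

    public class GuavaSketch {
        public static void main(String[] args) {
            // Multimap replaces the usual Map<String, List<String>> boilerplate.
            Multimap<String, String> filedUnder = HashMultimap.create();
            filedUnder.put("Java", "Guava project");
            filedUnder.put("Java", "H2");
            filedUnder.put("Neo4j", "Neo4j and Spring");

            // Joiner handles the separators without a hand-rolled loop.
            System.out.println(Joiner.on(", ").join(filedUnder.get("Java")));
        }
    }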

December 21, 2011

Three New Splunk Developer Platform Offerings

Filed under: Java,Javascript,Python,Splunk — Patrick Durusau @ 7:24 pm

Three New Splunk Developer Platform Offerings

From the post:

Last week was a busy week for the Splunk developer platform team. We pushed live 2 SDKs within one hour! We are excited to announce the release of:

  • Java SDK Preview on GitHub. The Java SDK enables our growing base of customers to share and harness the core Splunk platform and the valuable data stored in Splunk across the enterprise. The SDK ships with a number of examples including an explorer utility that provides the ability to explore the components and configuration settings of a Splunk installation. Learn more about the Java SDK.
  • JavaScript SDK Preview on GitHub. The JavaScript SDK takes big data to the web by providing developers with the ability to easily integrate visualizations into custom applications. Now developers can take the timeline view and charting capabilities of Splunk’s out-of-the-box web interface and include them in their custom applications. Additionally, with node.js support on the server side, developers can build end-to-end applications completely in JavaScript. Learn more about the JavaScript SDK.
  • Splunk Developer AMI. A developer-focused, publicly available Linux Amazon Machine Image (AMI) that includes all the Splunk SDKs and Splunk 4.2.5. The Splunk Developer AMI will make it easier for developers to try the Splunk platform. To enhance the usability of the image, developers can sign up for a free developer license trial, which can be used with the AMI. Read our blog post to learn more about the developer AMI.

The delivery of the Java and JavaScript SDKs, coupled with our existing Python SDK (GitHub), reinforces our commitment to developer enablement by providing more language choice for application development, and putting the SDKs on the Splunk Linux AMI expedites the getting-started experience.

We are seeing tremendous interest in our developer community and customer base for Splunk to play a central role facilitating the ability to build innovative applications on top of a variety of data stores that span on-premises, cloud and mobile.

We are enabling developers to build complex Big Data applications for a variety of scenarios including:

  • Custom built visualizations
  • Reporting tool integrations
  • Big Data and relational database integrations
  • Complex event processing

Not to mention being just in time for the holidays! 😉

Seriously, tools to do useful work with “big data” are coming online. The question is going to be the skill with which they are applied.

December 8, 2011

Multilingual Graph Traversals

Filed under: Gremlin,Groovy,Java,Scala — Patrick Durusau @ 8:00 pm

OK the real title is: JVM Language Implementations. 😉 I like mine better.

From the webpage:

Gremlin is a style of graph traversing that can be hosted in any number of languages. The benefit of this is that users can make use of the programming language they are most comfortable with and still be able to evaluate Gremlin-style traversals. This model is different from, let’s say, using SQL in Java, where the query is evaluated by passing a string representation of the query to the SQL engine. On the contrary, with native Gremlin support for other JVM languages, there is no string passing. Instead, there is simple method chaining in Gremlin’s fluent style. However, the drawback of this model is that for each JVM language, there are syntactic variations that must be accounted for.

The examples below demonstrate the same traversal in Groovy, Scala, and Java, respectively.

Seeing is believing.

December 4, 2011

Translating math into code with examples in Java, Racket, Haskell and Python

Filed under: Haskell,Java,Mathematics,Python — Patrick Durusau @ 8:17 pm

Translating math into code with examples in Java, Racket, Haskell and Python by Matthew Might.

Any page that claims Okasaki’s Purely Functional Data Structures as an “essential reference” has to be interesting.

And…, it turns out to be very interesting!

If I have a complaint, it is that it ended too soon! See what you think.
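
To give the flavor of the exercise with a small example of my own (not one from Might’s page): take the textbook definition of the sample mean, mean(x) = (1/n) * (x_1 + x_2 + … + x_n), and transcribe it into Java almost symbol for symbol:

    public class MathToCode {
        // mean(x) = (1/n) * sum over i of x_i
        static double mean(double[] x) {
            int n = x.length;
            double sum = 0.0;
            for (int i = 0; i < n; i++) {   // the summation, 0-based
                sum += x[i];
            }
            return sum / n;
        }

        public static void main(String[] args) {
            System.out.println(mean(new double[] {1.0, 2.0, 3.0, 4.0}));  // 2.5
        }
    }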

November 27, 2011

H2

Filed under: Database,Java — Patrick Durusau @ 8:51 pm

H2

From the webpage:

Welcome to H2, the Java SQL database. The main features of H2 are:

  • Very fast, open source, JDBC API
  • Embedded and server modes; in-memory databases
  • Browser based Console application
  • Small footprint: around 1 MB jar file size

I ran across this the other day and it looked interesting.

Particularly since I want to start exploring the topic maps tool chain. And what parts can be best done by what software?
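
Exploring H2 takes little more than a JDBC URL; an in-memory database needs no installation at all. A minimal sketch, assuming the H2 jar is on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class H2Sketch {
        public static void main(String[] args) throws Exception {
            // "mem:" gives a throwaway in-memory database; a file path would persist it.
            // (Older H2 releases may need Class.forName("org.h2.Driver") first.)
            Connection conn = DriverManager.getConnection("jdbc:h2:mem:test", "sa", "");
            Statement st = conn.createStatement();
            st.execute("CREATE TABLE topics (id INT PRIMARY KEY, name VARCHAR(255))");
            st.execute("INSERT INTO topics VALUES (1, 'H2'), (2, 'Java')");

            ResultSet rs = st.executeQuery("SELECT name FROM topics ORDER BY id");
            while (rs.next()) {
                System.out.println(rs.getString("name"));
            }
            conn.close();
        }
    }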

November 23, 2011

Google Plugin for Eclipse (GPE) is Now Open Source

Filed under: Cloud Computing,Eclipse,Interface Research/Design,Java — Patrick Durusau @ 7:41 pm

Google Plugin for Eclipse (GPE) is Now Open Source by Eric Clayberg.

From the post:

Today is quite a milestone for the Google Plugin for Eclipse (GPE). Our team is very happy to announce that all of GPE (including GWT Designer) is open source under the Eclipse Public License (EPL) v1.0. GPE is a set of software development tools that enables Java developers to quickly design, build, optimize, and deploy cloud-based applications using the Google Web Toolkit (GWT), Speed Tracer, App Engine, and other Google Cloud services.

….

As of today, all of the code is available directly from the new GPE project and GWT Designer project on Google Code. Note that GWT Designer itself is based upon the WindowBuilder open source project at Eclipse.org (contributed by Google last year). We will be adopting the same guidelines for contributing code used by the GWT project.

Important for the reasons given but also one possible model for topic map services. What if your topic map services were hosted in the cloud and developers could write against it? That is, they would not have to concern themselves with the niceties of topic maps but simply request the information of interest to them, using tools you have provided to make that easier for them.

Take for example the Statement of Disbursements that I covered recently. If that were hosted as a topic map in the cloud, a developer, say working for a restaurant promoter, might want to query the topic map for who frequents eateries in a particular area. They are not concerned with the merging that has to take place between various budgets and the alignment of those merges with individuals, etc. They are looking for a list of places with House members alphabetically sorted after it.

November 4, 2011

Solr Performance Monitoring with SPM

Filed under: Java,Solr — Patrick Durusau @ 6:09 pm

Solr Performance Monitoring with SPM

From the post:

Originally delivered as Lightning Talk at Lucene Eurocon 2011 in Barcelona, this quick presentation shows how to use Sematext’s SPM service, currently free to use for unlimited time, to monitor Solr, OS, JVM, and more.

We built SPM because we wanted to have a good, easy-to-use tool to help us with Solr performance tuning during engagements with our numerous Solr customers. We hope you find our Scalable Performance Monitoring service useful! Please let us know if you have any sort of feedback, from SPM functionality and usability to its speed. Enjoy!

Nice set of slides!

I was relieved to discover that Sematext (I can spell it right with effort) is 100% organic, no GMOs! 😉

Please heed the call for the community to respond with feedback on SPM!

October 26, 2011

Oracle Releases NoSQL Database

Filed under: Java,NoSQL,Oracle — Patrick Durusau @ 6:58 pm

Oracle Releases NoSQL Database by Leila Meyer.

From the post:

Oracle has released Oracle NoSQL Database 11g, the company’s new entry into the NoSQL database market. Oracle NoSQL Database is a distributed, highly scalable, key-value database that uses the Oracle Berkeley Database Java Edition as its underlying storage system. Developed as a key component of the Oracle Big Data Appliance that was unveiled Oct. 3, Oracle NoSQL Database is available now as a standalone product.

(see the post for the list of features and other details)

Oracle NoSQL Database will be available in a Community Edition through an open source license and an Enterprise Edition through an Oracle Technology Network (OTN) license. The Community Edition is still awaiting final licensing approval, but the Enterprise Edition is available now for download from the Oracle Technology Network.

Don’t know that I will have the time but it would be amusing to compare the actual release with pre-release speculation about its features and capabilities.

More to follow as information becomes available.

LingPipe and Text Processing Books

Filed under: Java,LingPipe,Natural Language Processing — Patrick Durusau @ 6:57 pm

LingPipe and Text Processing Books

From the website:

We’ve decided to split what used to be the monolithic LingPipe book in two. As they’re written, we’ll be putting up drafts here.

NLP with LingPipe

You can download the PDF of the LingPipe book here:

Carpenter, Bob and Breck Baldwin. 2011. Natural Language Processing with LingPipe 4. Draft 0.5. June 2011. [Download: lingpipe-book-0.5.pdf]

Text Processing with Java

The PDF of the book on text in Java is here:

Carpenter, Bob, Mitzi Morris, and Breck Baldwin. 2011. Text Processing with Java 6. Draft 0.5. June 2011. [Download: java-text-book-0.5.pdf]

The pages are 7 inches by 10 inches, so if you print, you have the choice of large margins (no scaling) or large print (print fit to page).

Source code is also available.

October 22, 2011

Java Wikipedia Library (JWPL)

Filed under: Data Mining,Java,Software — Patrick Durusau @ 3:16 pm

Java Wikipedia Library (JWPL)

From the post:

Lately, Wikipedia has been recognized as a promising lexical semantic resource. If Wikipedia is to be used for large-scale NLP tasks, efficient programmatic access to the knowledge therein is required.

JWPL (Java Wikipedia Library) is an open-source, Java-based application programming interface that provides access to all the information contained in Wikipedia. The high-performance Wikipedia API provides structured access to information nuggets like redirects, categories, articles and link structure. It is described in our LREC 2008 paper.

JWPL contains a Mediawiki Markup parser that can be used to further analyze the contents of a Wikipedia page. The parser can also be used stand-alone with other texts using MediaWiki markup.

Further, JWPL contains the tool JWPLDataMachine that can be used to create JWPL dumps from the publicly available dumps at download.wikimedia.org.

Wikipedia is a resource of growing interest. This toolkit may prove useful in mining it for topic map purposes.

October 19, 2011

Playing with microsatellites (Simple Sequence Repeats), Java, and Neo4j

Filed under: Bioinformatics,Java,Neo4j — Patrick Durusau @ 3:16 pm

Playing with microsatellites (Simple Sequence Repeats), Java, and Neo4j

From the post:

I just finished this afternoon a small project I had to do about identification of microsatellites in DNA sequences. As with every new project I start, I think of something that:

  • I didn’t try before
  • is worth learning
  • is applicable in order to meet the needs of the specific project

These last few days it was the chance to get to know and try the visualization tool included in the latest version of the Neo4j Webadmin dashboard.

I had already heard of it a couple of times from different sources but had not had the chance to play a bit with it yet. So, after my first contact with it, I have to say that although it’s something Neo4j introduced only in recent versions, it already has a decent GUI and promising functionality.

Covers his domain model and the results of same.

October 15, 2011

Emil Eifrem discusses why Neo4J is relevant to Java Development

Filed under: Java,Neo4j — Patrick Durusau @ 4:27 pm

Emil Eifrem discusses why Neo4J is relevant to Java Development

From the description:

Emil Eifrem has run a technology startup from both Malmo, Sweden and, now, Silicon Valley. He discusses the differences between Silicon Valley-based startups and Swedish startups, and he explains the reasons why Neo4J is relevant to Java developers. Emil discusses some of the challenges involved in running a startup and how Neo4J can help address database scalability issues. This interview with O’Reilly Media was conducted at Oracle’s OpenWorld/JavaOne 2011 in San Francisco, CA.

Each to his own but “why Neo4J is relevant to Java Development” isn’t the take away I have from the interview.

It isn’t fair to say the Semantic Web activity is “academic” and that is why it is failing. Google Wave wasn’t an academic project and it failed pretty quickly. I suppose, being an academic at heart, I resent the notion that academics are impractical. Some are, some aren’t. Just as not all commercial products succeed simply because they have commercial backing. Oh, web servers for example.

Emil’s stronger point is that the Semantic Web does not solve a high priority problem for most users. Solve a problem only a few people care about, or one they aren’t willing to pay the cost to solve, and your project isn’t going very far.

Neo4j, for example, solves problems with highly connected data that cannot be addressed without the use of graph databases. That makes graph databases, of which Neo4j is one, very attractive and successful.

My take away: Emil Eifrem on Successful Startups (Ones With Solutions For High Priority Problems).

Great interview Emil!

October 9, 2011

Execution in the Kingdom of Nouns

Filed under: Java,Language,Language Design — Patrick Durusau @ 6:43 pm

Execution in the Kingdom of Nouns

From the post:

They’ve a temper, some of them—particularly verbs: they’re the proudest—adjectives you can do anything with, but not verbs—however, I can manage the whole lot of them! Impenetrability! That’s what I say!
— Humpty Dumpty

Hello, world! Today we’re going to hear the story of Evil King Java and his quest for worldwide verb stamp-outage.1

Caution: This story does not have a happy ending. It is neither a story for the faint of heart nor for the critical of mouth. If you’re easily offended, or prone to being a disagreeable knave in blog comments, please stop reading now.

Before we begin the story, let’s get some conceptual gunk out of the way.

What I find compelling is the notion that a programming language should follow how we think, that is, how most of us think.

If you want a successful topic map, should it follow/mimic the thinking of:

  1. the author
  2. the client
  3. intended user base?

#1 is easy, that’s the default and requires the least work.

#2 is instinctive, but you will need to educate the client to #3.

#3 is golden if you can hit that mark.

September 26, 2011

Building Distributed Indexing for Solr: MurmurHash3 for Java

Filed under: Indexing,Java,Solr — Patrick Durusau @ 7:01 pm

Building Distributed Indexing for Solr: MurmurHash3 for Java by Yonik Seeley.

From the post:

Background

I needed a really good hash function for the distributed indexing we’re implementing for Solr. Since it will be used for partitioning documents, it needed to be really high quality (well distributed), because we don’t want uneven shards. It also needs to be cross-platform, so a client could calculate this hash value themselves if desired, to predict which node has a given document.

MurmurHash3

MurmurHash3 is one of the top favorite new hash functions these days, being both really fast and of high quality. Unfortunately it’s written in C++, and a quick google did not yield any suitable high quality port. So I took 15 minutes (it’s small!) to port the 32-bit version, since it should be faster than the other versions for small keys like document ids. It works in 32-bit chunks and produces a 32-bit hash – more than enough for partitioning documents by hash code.

Something for your Solr friends.
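
The partitioning step that the hash feeds into is a one-liner once you have a well-distributed 32-bit value. A hedged sketch of the idea (not Seeley’s code, and using hashCode() as a stand-in for the MurmurHash3 port):

    public class ShardSketch {
        // Map a document id to one of numShards shards using a 32-bit hash.
        static int shardFor(String docId, int numShards) {
            int hash = docId.hashCode();             // stand-in for a MurmurHash3 x86_32 call
            return (hash & 0x7fffffff) % numShards;  // clear the sign bit, then take the remainder
        }

        public static void main(String[] args) {
            System.out.println(shardFor("doc-42", 8));  // the same id always lands on the same shard
        }
    }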

September 22, 2011

Neo4j and Spring – Practical Primer 28 Sept 2011 (London)

Filed under: Java,Neo4j,Spring Data — Patrick Durusau @ 6:24 pm

Neo4j and Spring – Practical Primer 28 Sept 2011 (London)

From the announcement:

In this talk Aleksa will introduce how you can integrate Neo4j with Spring – the popular Java enterprise framework.

Topics covered will include declarative transactions, object-to-graph mapping using Spring-data-graph components, as well as collections mapping using the Cypher and Gremlin annotation support introduced in the newly released version 1.1.0 of spring-data-graph. You can expect a lot of hands-on coding and practical examples while exploring the latest features of Neo4j and spring-data-graph.

Aleksa Vukotic is a Data Management Practice lead and Spring veteran with extensive experience as author, trainer, architect and developer. Having worked with graph data models on several projects, Aleksa has made Neo4j his technology of choice for solving complex graph-related problems.

Looks interesting to me!

September 8, 2011

JavaZone 2011 Videos

Filed under: Conferences,Java — Patrick Durusau @ 5:51 pm

JavaZone 2011 Videos

These just appeared online today and you are the best judge of the ones that interest you.

If you think some need to be called out, give a shout!

September 2, 2011

Jfokus 14-16 February 2012 – Call for Papers

Filed under: Conferences,Java — Patrick Durusau @ 7:53 pm

Jfokus 14-16 February 2012 – Call for Papers

Judging from prior years, there will be more than a few presentations of interest to topic mappers at this conference.

If you submit your proposal by October 1, 2011, your presentation could be one of them.

August 23, 2011

Mulan: A Java Library for Multi-Label Learning

Filed under: Java,Machine Learning — Patrick Durusau @ 6:38 pm

Mulan: A Java Library for Multi-Label Learning

From the website:

Mulan is an open-source Java library for learning from multi-label datasets. Multi-label datasets consist of training examples of a target function that has multiple binary target variables. This means that each item of a multi-label dataset can be a member of multiple categories or annotated by many labels (classes). This is actually the nature of many real world problems such as semantic annotation of images and video, web page categorization, direct marketing, functional genomics and music categorization into genres and emotions. An introduction on mining multi-label data is provided in (Tsoumakas et al., 2010).

Currently, the library includes a variety of state-of-the-art algorithms for performing the following major multi-label learning tasks:

  • Classification. This task is concerned with outputting a bipartition of the labels into relevant and irrelevant ones for a given input instance.
  • Ranking. This task is concerned with outputting an ordering of the labels, according to their relevance for a given data item.
  • Classification and ranking. A combination of the two tasks mentioned above.

In addition, the library offers the following features:

  • Feature selection. Simple baseline methods are currently supported.
  • Evaluation. Classes that calculate a large variety of evaluation measures through hold-out evaluation and cross-validation.

July 24, 2011

Neo4j, the open source Java graph database, and Windows Azure

Filed under: Java,Neo4j,Windows Azure — Patrick Durusau @ 6:45 pm

Neo4j, the open source Java graph database, and Windows Azure by Josh Sandhu.

From the post:

Recently I was travelling in Europe. I always find it a pleasure to see a mixture of varied things nicely co-mingling together. Old and new, design and technology, function and form all blend so well together, and there is no better place to see this than in Malmö, Sweden at the offices of Diversify Inc., situated in a building built in the 1500’s with a new savvy workstyle. This is also echoed at the office of Neo Technology in a slick and fancy incubator, Minc, situated next to the famous Turning Torso building and Malmö University in the new modern development of the city.

My new good friends, Diversify’s Magnus Mårtensson, Micael Carlstedt, Björn Ekengren, Martin Stenlund and Neo Technology’s Peter Neubauer, hosted my colleague Anders Wendt from Microsoft Sweden, and me. The topic of this meeting was Neo Technology’s Neo4j, the open source graph database, and Windows Azure. Neo4j is written in Java, but it also has a RESTful API and supports multiple languages. The database works as an object-oriented, flexible network structure rather than as strict and static tables. Neo4j is also based on graph theory, and its ability to digest and work with lots of data and scale makes it well suited to the cloud. Diversify has been doing some great work getting Java to work with Windows Azure and has given us on the Interoperability team a lot of great feedback on the tools Microsoft is building for Java. They have also been working with some live customers and have released a new case study, published in Swedish, with an English version made available by Diversify on their blog.

The most interesting part of the interviews was the statement that getting a Java application to run in Azure wasn’t hard. Getting a Java application to run well in Azure was another matter.

That was the disappointing aspect of this post as well. So other steps are required to get Neo4j to run well on Azure. How about something more than the general statement? Something that developers could use to judge the difficulty in considering a move to Azure?

Supplemental materials on getting Neo4j to run well on Azure would take this from a “we are all excited” piece, with its undisclosed set of issues, to a substantive contribution towards overcoming interoperability issues to everyone’s benefit.
