Archive for the ‘Java’ Category

U.S. sides with Oracle in Java copyright dispute with Google

Wednesday, May 27th, 2015

U.S. sides with Oracle in Java copyright dispute with Google by John Ribeiro.

From the post:

The administration of President Barack Obama sided with Oracle in a dispute with Google on whether APIs, the specifications that let programs communicate with each other, are copyrightable.

Nothing about the API (application programming interface) code at issue in the case materially distinguishes it from other computer code, which is copyrightable, wrote Solicitor General Donald B. Verrilli in a filing in the U.S. Supreme Court.

The court had earlier asked for the government’s views in this controversial case, which has drawn the attention of scientists, digital rights groups and the tech industry for its implications on current practices in developing software.

Although Google has raised important concerns about the effects that enforcing Oracle’s copyright could have on software development, those concerns are better addressed through a defense on grounds of fair use of copyrighted material, Verrilli wrote.

Neither the ScotusBlog case page, Google Inc. v. Oracle America, Inc., nor the Solicitor General’s Supreme Court Brief page, as of May 27, 2015, has a copy of the Solicitor General’s brief.

I hesitate to comment on the Solicitor General’s brief sight unseen, as media reports on legal issues are always vague and frequently wrong.

Whatever Solicitor General Verrilli may or may not have said to one side, software interoperability should be the default, not something established by affirmative defenses. Public policy should encourage interoperability of software.

Consumers, large and small, should be aware that reduction of interoperability between software means higher costs for consumers. Something to keep in mind when you are looking for a vendor.

SmileMiner [Conflicting Data Science Results?]

Tuesday, March 3rd, 2015


From the webpage:

SmileMiner (Statistical Machine Intelligence and Learning Engine) is a pure Java library of various state-of-the-art machine learning algorithms. SmileMiner is self-contained and requires only the Java standard library.

SmileMiner is well documented and you can browse the javadoc for more information. A basic tutorial is available on the project wiki.

To see SmileMiner in action, please download the demo jar file and then run java -jar smile-demo.jar.

  • Classification: Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting, Random Forest, Logistic Regression, Neural Networks, RBF Networks, Maximum Entropy Classifier, KNN, Naïve Bayesian, Fisher/Linear/Quadratic/Regularized Discriminant Analysis.
  • Regression: Support Vector Regression, Gaussian Process, Regression Trees, Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, Ridge Regression.
  • Feature Selection: Genetic Algorithm based Feature Selection, Ensemble Learning based Feature Selection, Signal Noise ratio, Sum Squares ratio.
  • Clustering: BIRCH, CLARANS, DBScan, DENCLUE, Deterministic Annealing, K-Means, X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical Clustering, Sequential Information Bottleneck, Self-Organizing Maps, Spectral Clustering, Minimum Entropy Clustering.
  • Association Rule & Frequent Itemset Mining: FP-growth mining algorithm
  • Manifold learning: IsoMap, LLE, Laplacian Eigenmap, PCA, Kernel PCA, Probabilistic PCA, GHA, Random Projection
  • Multi-Dimensional Scaling: Classical MDS, Isotonic MDS, Sammon Mapping
  • Nearest Neighbor Search: BK-Tree, Cover Tree, KD-Tree, LSH
  • Sequence Learning: Hidden Markov Model.
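To make one entry in that long list concrete, here is a from-scratch 1-nearest-neighbour classifier. This is my own illustration of the KNN idea, not SmileMiner’s API:

```java
public class NearestNeighbour {
    // return the label of the training point closest to the query
    static int classify(double[][] points, int[] labels, double[] query) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < points.length; i++) {
            double d = 0;
            for (int j = 0; j < query.length; j++) {
                double diff = points[i][j] - query[j];
                d += diff * diff;                 // squared Euclidean distance
            }
            if (d < bestDist) { bestDist = d; best = i; }
        }
        return labels[best];
    }

    public static void main(String[] args) {
        double[][] train = {{0, 0}, {0, 1}, {5, 5}, {6, 5}};
        int[] labels = {0, 0, 1, 1};
        System.out.println(classify(train, labels, new double[]{5.5, 5.0})); // prints 1
    }
}
```

A library earns its keep once you want k > 1, fast neighbour search (the KD-trees listed below), or distance metrics beyond Euclidean.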

Great to have another machine learning library but it reminded me of a question I read yesterday:

When two teams of data scientists report conflicting results, how does a manager choose between them?

There is a view, says Florian Zettelmeyer, the Nancy L. Ertle Professor of Marketing, that data science represents disembodied truth.

Zettelmeyer, himself a data scientist, fervently disagrees with that view.

“Data science fundamentally speaks to management decisions” he said, “and management decisions are fundamentally political. There are agendas and there are winners and losers. As a result, different teams will often come up with different conclusions and it is the job of a manager to be able to call it. This requires a ‘working knowledge of data science.’”

Granting it is a promotion for the Kellogg School of Management but Zettelmeyer has a good point.

I’m not so sure that a “working knowledge of data science” is required to choose between different answers in data science. A knowledge of what their superiors are likely to accept is a more likely criterion.

A good machine learning library should give you enough options to approximate the expected answer.

I first saw this in a tweet by Bence Arato.

Using Clojure To Generate Java To Reimplement Clojure

Thursday, November 13th, 2014

Using Clojure To Generate Java To Reimplement Clojure by Zach Tellman.

From the post:

Most data structures are designed to hold arbitrary amounts of data. When we talk about their complexity in time and space, we use big O notation, which is only concerned with performance characteristics as n grows arbitrarily large. Understanding how to cast an O(n) problem as O(log n) or even O(1) is certainly valuable, and necessary for much of the work we do at Factual. And yet, most instances of data structures used in non-numerical software are very small. Most lists are tuples of a few entries, and most maps are a few keys representing different facets of related data. These may be elements in a much larger collection, but this still means that the majority of operations we perform are on small instances.

But except in special cases, like 2 or 3-vectors that represent coordinates, it’s rarely practical to specify that a particular tuple or map will always have a certain number of entries. And so our data structures have to straddle both cases, behaving efficiently at all possible sizes. Clojure, however, uses immutable data structures, which means it can do an end run on this problem. Each operation returns a new collection, which means that if we add an element to a small collection, it can return something more suited to hold a large collection.

Tellman describes this problem and his solution in Predictably Fast Clojure. (The URL is to a time mark but I think the entire video is worth your time.)

If that weren’t cool enough, Tellman details the creation of 1000 lines of Clojure that generate 5500 lines of Java so his proposal can be rolled into Clojure.

What other data structures can be different when immutability is a feature?
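The small-to-large promotion Tellman describes can be sketched in plain Java. This is a toy illustration of the idea, not his generated code: below a threshold, an immutable map stores entries in a flat array and looks them up by linear scan; an assoc that crosses the threshold returns a HashMap-backed version instead.

```java
import java.util.HashMap;
import java.util.Map;

public class PromotingMap {
    interface ImmMap { Object get(Object k); ImmMap assoc(Object k, Object v); }

    static final int THRESHOLD = 8;

    /** Small representation: flat array of alternating keys and values. */
    static class ArrayMap implements ImmMap {
        final Object[] kvs;
        ArrayMap(Object[] kvs) { this.kvs = kvs; }
        public Object get(Object k) {
            for (int i = 0; i < kvs.length; i += 2)
                if (kvs[i].equals(k)) return kvs[i + 1];
            return null;
        }
        public ImmMap assoc(Object k, Object v) {  // assumes k is a new key, for brevity
            Object[] next = new Object[kvs.length + 2];
            System.arraycopy(kvs, 0, next, 0, kvs.length);
            next[kvs.length] = k;
            next[kvs.length + 1] = v;
            if (next.length / 2 <= THRESHOLD) return new ArrayMap(next);
            Map<Object, Object> m = new HashMap<>();  // promote to the large representation
            for (int i = 0; i < next.length; i += 2) m.put(next[i], next[i + 1]);
            return new HashedMap(m);
        }
    }

    /** Large representation: backed by a HashMap, copied on every assoc. */
    static class HashedMap implements ImmMap {
        final Map<Object, Object> m;
        HashedMap(Map<Object, Object> m) { this.m = m; }
        public Object get(Object k) { return m.get(k); }
        public ImmMap assoc(Object k, Object v) {
            Map<Object, Object> copy = new HashMap<>(m);
            copy.put(k, v);
            return new HashedMap(copy);
        }
    }

    public static void main(String[] args) {
        ImmMap m = new ArrayMap(new Object[0]);
        for (int i = 0; i < 10; i++) m = m.assoc("k" + i, i);
        System.out.println(m.getClass().getSimpleName() + " " + m.get("k3")); // prints HashedMap 3
    }
}
```

Only immutability makes the trick safe: since assoc returns a new value, the caller never notices that the representation changed underneath.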

Transducers – java, js, python, ruby

Saturday, November 8th, 2014

Transducers – java, js, python, ruby

Struggling with transducers?

Learn better by example?

Cognitect Labs has released transducers for Java, JavaScript, Ruby, and Python.

Clojure recently added support for transducers – composable algorithmic transformations. These projects bring the benefits of transducers to other languages:

BTW, take a look at Rich Hickey’s latest (as of Nov. 2014) video on Transducers.
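The core idea transfers to any language with first-class functions: a transducer transforms one reducing function into another, independent of the input source and the output collection. This is a rough sketch of my own in plain Java, not the API of Cognitect’s transducers-java:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class Transduce {
    /** A reducing function: folds one input into an accumulator. */
    interface Reducer<A, T> { A step(A acc, T in); }

    /** A transducer: rewrites a reducer, knowing nothing about source or sink. */
    interface Transducer<T, U> { <A> Reducer<A, T> apply(Reducer<A, U> rf); }

    static <T, U> Transducer<T, U> map(Function<T, U> f) {
        return new Transducer<T, U>() {
            public <A> Reducer<A, T> apply(Reducer<A, U> rf) {
                return (acc, in) -> rf.step(acc, f.apply(in));
            }
        };
    }

    static <T> Transducer<T, T> filter(Predicate<T> p) {
        return new Transducer<T, T>() {
            public <A> Reducer<A, T> apply(Reducer<A, T> rf) {
                return (acc, in) -> p.test(in) ? rf.step(acc, in) : acc;
            }
        };
    }

    public static void main(String[] args) {
        Reducer<List<Integer>, Integer> into = (acc, x) -> { acc.add(x); return acc; };
        // compose: keep evens, then scale by 10, with no intermediate collections
        Reducer<List<Integer>, Integer> xf =
                Transduce.<Integer>filter(x -> x % 2 == 0)
                         .apply(Transduce.<Integer, Integer>map(x -> x * 10).apply(into));
        List<Integer> out = new ArrayList<>();
        for (int i = 1; i <= 6; i++) out = xf.step(out, i);
        System.out.println(out); // prints [20, 40, 60]
    }
}
```

The same `xf` could feed a stream, a channel, or a file reader; that independence from context is the selling point of transducers.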

Please forward to language specific forums.

Lucene 4 Essentials for Text Search and Indexing

Sunday, March 9th, 2014

Lucene 4 Essentials for Text Search and Indexing by Mitzi Morris.

From the post:

Here’s a short-ish introduction to the Lucene search engine which shows you how to use the current API to develop search over a collection of texts. Most of this post is excerpted from Text Processing in Java, Chapter 7, Text Search with Lucene.

Not too short! 😉

I have seen blurbs about Text Processing in Java but this post convinced me to put it on my wish list.


PS: As soon as a copy arrives I will start working on a review of it. If you want to see that happen sooner rather than later, ping me.

Class Scheduling [Tutorial FoundationDB]

Saturday, December 21st, 2013

Class Scheduling

From the post:

This tutorial provides a walkthrough of designing and building a simple application in Python using FoundationDB. In this tutorial, we use a few simple data modeling techniques. For a more in-depth discussion of data modeling in FoundationDB, see Data Modeling.

The concepts in this tutorial are applicable to all the languages supported by FoundationDB. If you prefer, you can see a version of this tutorial in:

The offering of the same tutorial in different languages looks like a clever idea.

Like using a polyglot edition of the Bible with parallel original text and translations.

In a polyglot, the associations between words in different languages are implied rather than explicit.

Introducing Luwak,…

Monday, December 9th, 2013

Introducing Luwak, a library for high-performance stored queries by Charlie Hull.

From the post:

A few weeks ago we spoke in Dublin at Lucene Revolution 2013 on our work in the media monitoring sector for various clients including Gorkana and Australian Associated Press. These organisations handle a huge number (sometimes hundreds of thousands) of news articles every day and need to apply tens of thousands of stored expressions to each one, which would be extremely inefficient if done with standard search engine libraries. We’ve developed a much more efficient way to achieve the same result, by pre-filtering the expressions before they’re even applied: effectively we index the expressions and use the news article itself as a query, which led to the presentation title ‘Turning Search Upside Down’.

We’re pleased to announce the core of this process, a Java library we’ve called Luwak, is now available as open source software for your own projects. Here’s how you might use it:

That may sound odd, using the article as the query, but be aware that Charlie reports “speeds of up to 70,000 stored queries applied to an article in around a second on modest hardware.”

Perhaps not “big data speed” but certainly enough speed to get your attention.

Charlie mentions in his Dublin slides that Luwak could be used to “Add metadata to items based on their content.”

That is one use case, but creating topics/associations out of content would be another.
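Turning search upside down is easier to see in miniature. The sketch below is mine, not Luwak’s API: stored queries are conjunctions of terms, a term index prefilters which queries could possibly match, and only those candidates are fully evaluated against the article.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class QueryIndex {
    // queryId -> required terms (a stored query is a simple conjunction here)
    final Map<String, Set<String>> queries = new HashMap<>();
    // term -> ids of stored queries containing that term
    final Map<String, Set<String>> byTerm = new HashMap<>();

    void register(String id, String... terms) {
        queries.put(id, new HashSet<>(Arrays.asList(terms)));
        for (String t : terms)
            byTerm.computeIfAbsent(t, k -> new HashSet<>()).add(id);
    }

    /** Returns the ids of stored queries matched by the article. */
    Set<String> match(String article) {
        Set<String> docTerms = new HashSet<>(Arrays.asList(article.toLowerCase().split("\\W+")));
        Set<String> candidates = new HashSet<>();
        for (String t : docTerms) {                       // prefilter: article terms as "query"
            Set<String> qs = byTerm.get(t);
            if (qs != null) candidates.addAll(qs);
        }
        Set<String> hits = new HashSet<>();
        for (String id : candidates)                      // full evaluation of survivors only
            if (docTerms.containsAll(queries.get(id))) hits.add(id);
        return hits;
    }

    public static void main(String[] args) {
        QueryIndex idx = new QueryIndex();
        idx.register("q1", "merger", "acquisition");
        idx.register("q2", "election");
        System.out.println(idx.match("Talks of a merger and possible acquisition continue")); // prints [q1]
    }
}
```

With tens of thousands of stored expressions, the prefilter is what makes per-article matching tractable; Luwak does the same thing with real Lucene queries rather than term sets.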

JML [Java Machine Learning]

Friday, November 8th, 2013

JML [Java Machine Learning] by Mingjie Qian.

From the webpage:

JML is a pure Java library for machine learning. The goal of JML is to make machine learning methods easy to use and speed up the code translation from MATLAB to Java. Tutorial-JML.pdf

Current version implements logistic regression, Maximum Entropy modeling (MaxEnt), AdaBoost, LASSO, KMeans, spectral clustering, Nonnegative Matrix Factorization (NMF), sparse NMF, Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA) (by Gibbs sampling based on by Gregor Heinrich), joint l_{2,1}-norms minimization, Hidden Markov Model (HMM), Conditional Random Field (CRF), etc. just for examples of implementing machine learning methods by using this general framework. The SVM package LIBLINEAR is also incorporated. I will try to add more important models such as Markov Random Field (MRF) to this package if I get the time:)

JML library’s another advantage is its complete independence from feature engineering, thus any preprocessed data could be run. For example, in the area of natural language processing, feature engineering is a crucial part for MaxEnt, HMM, and CRF to work well and is often embedded in model training. However, we believe that it is better to separate feature engineering and parameter estimation. On one hand, modularization could be achieved so that people can simply focus on one module without need to consider other modules; on the other hand, implemented modules could be reused without incompatibility concerns.

JML also provides implementations of several efficient, scalable, and widely used general purpose optimization algorithms, which are very important for machine learning methods to be applicable on large scaled data, though a particular optimization strategy that considers the characteristics of a particular problem is more effective and efficient (e.g., dual coordinate descent for bound constrained quadratic programming in SVM). Currently supported optimization algorithms are limited-memory BFGS, projected limited-memory BFGS (non-negative constrained or bound constrained), nonlinear conjugate gradient, primal-dual interior-point method, general quadratic programming, accelerated proximal gradient, and accelerated gradient descent. I would always like to implement more practical efficient optimization algorithms. (emphasis in original)
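The separation JML argues for, feature engineering in one place, parameter estimation in another, looks like this in miniature. The learner below is my toy sketch, not JML’s API: it fits logistic regression by gradient descent and sees only a numeric matrix plus labels, with all feature engineering done elsewhere.

```java
public class LogReg {
    // fit weights by plain stochastic gradient descent on the log-loss
    static double[] fit(double[][] x, int[] y, int epochs, double lr) {
        double[] w = new double[x[0].length];
        for (int e = 0; e < epochs; e++)
            for (int i = 0; i < x.length; i++) {
                double z = 0;
                for (int j = 0; j < w.length; j++) z += w[j] * x[i][j];
                double p = 1.0 / (1.0 + Math.exp(-z));      // sigmoid
                for (int j = 0; j < w.length; j++)
                    w[j] += lr * (y[i] - p) * x[i][j];      // gradient step
            }
        return w;
    }

    static double predict(double[] w, double[] x) {
        double z = 0;
        for (int j = 0; j < w.length; j++) z += w[j] * x[j];
        return 1.0 / (1.0 + Math.exp(-z));
    }

    public static void main(String[] args) {
        // features already engineered upstream: [bias, value]
        double[][] x = {{1, 0}, {1, 1}, {1, 4}, {1, 5}};
        int[] y = {0, 0, 1, 1};
        double[] w = fit(x, y, 2000, 0.1);
        System.out.println(predict(w, new double[]{1, 5}) > 0.5); // prints true
    }
}
```

Whether the columns came from text, images, or a database never reaches the learner, which is exactly the modularity the JML description advocates.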

Something else “practical” for your weekend. 😉

JavaZone 2013

Sunday, September 29th, 2013

JavaZone 2013 (videos)

The JavaZone tweet I saw earlier today said five (5) lost videos had been found, so all one hundred and forty-nine (149) videos are up for viewing!

I should have saved this one for the holidays but at one or two a day, you may be done by the holidays! 😉

GraphHopper Maps…

Tuesday, July 23rd, 2013

GraphHopper Maps – High Performance and Customizable Routing in Java by Peter Karich.

From the post:

Today we’re proud to announce the first stable release of GraphHopper! After over a year of busy development we finally reached version 0.1!

GraphHopper is a fast and Open Source road routing engine written in Java based on OpenStreetMap data. It handles the full planet on a 15GB server but also scales down and can be embedded into your application! This means you’re able to run Germany-wide queries on Android with only 32MB in a few seconds. You can download the Android offline routing demo or have a look at our web instance which has world wide coverage for car, bike and pedestrian:

GraphHopper Java Routing

The trip to the current state of GraphHopper was rather stony as we had to start from scratch as there is currently no fast Java-based routing engine. What we’ve built is quite interesting as it shows that a Java application can be as fast as Bing or Google Maps (in 2011) and beats YOURS, MapQuest and Cloudmade according to the results outlined in a Blog post from Pascal and with tests against GraphHopper – although OSRM is still ahead. But how can a Java application be so fast? One important side is the used algorithm: Contraction Hierarchies – a ‘simple’ shortcutting technique to speed up especially lengthy queries. But even without this algorithm GraphHopper is fast which is a result of weeks of tuning for less memory consumption (yes, memory has something to do with speed), profiling and tweaking. But not only the routing is fast and memory efficient also the import process. And it should be easy to get started and modify GraphHopper to your needs.

Contraction hierarchies are a very active area of graph research.

Contraction Hierarchies at Wikipedia has nice coverage with a pointer to Robert Geisberger’s thesis, Contraction Hierarchies: Faster and Simpler Hierarchical Routing in Road Networks.

You may also be interested in:

Efficient Route Planning by Prof. Dr. Hannah Bast. A wiki for a 2012 summer course on route planning. Includes videos, slides, exercises, etc.
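As a reference point for what contraction hierarchies accelerate, here is the baseline they beat: plain Dijkstra over an adjacency list. A minimal sketch of mine, not GraphHopper code:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

public class Dijkstra {
    // adj[v] is an array of edges {to, weight}; returns distances from src
    static int[] shortest(int[][][] adj, int src) {
        int n = adj.length;
        int[] dist = new int[n];
        Arrays.fill(dist, Integer.MAX_VALUE);
        dist[src] = 0;
        PriorityQueue<int[]> pq = new PriorityQueue<>(Comparator.comparingInt(a -> a[1]));
        pq.add(new int[]{src, 0});
        while (!pq.isEmpty()) {
            int[] cur = pq.poll();
            if (cur[1] > dist[cur[0]]) continue;      // stale queue entry, skip
            for (int[] edge : adj[cur[0]]) {
                int nd = cur[1] + edge[1];
                if (nd < dist[edge[0]]) {             // relax the edge
                    dist[edge[0]] = nd;
                    pq.add(new int[]{edge[0], nd});
                }
            }
        }
        return dist;
    }

    public static void main(String[] args) {
        // 0 -> 1 (weight 4), 0 -> 2 (1), 2 -> 1 (2)
        int[][][] adj = {{{1, 4}, {2, 1}}, {}, {{1, 2}}};
        System.out.println(Arrays.toString(shortest(adj, 0))); // prints [0, 3, 1]
    }
}
```

Contraction hierarchies precompute shortcut edges so that a query can skip most of this relaxation work, which is where the order-of-magnitude speedups on continent-sized graphs come from.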

Esri Geometry API

Wednesday, March 27th, 2013

Esri Geometry API

From the webpage:


The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDF’s and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm and many other Java-based “big data” applications.


  • API methods to create simple geometries directly with the API, or by importing from supported formats: JSON, WKT, and Shape
  • API methods for spatial operations: union, difference, intersect, clip, cut, and buffer
  • API methods for topological relationship tests: equals, within, contains, crosses, and touches
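The Esri library implements these operations over real geometries; to make the topological tests concrete, here is a from-scratch version for axis-aligned rectangles. This illustrates the concepts only and is not the Esri API:

```java
public class Rect {
    final double x1, y1, x2, y2; // lower-left and upper-right corners

    Rect(double x1, double y1, double x2, double y2) {
        this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
    }

    /** "contains": this rectangle fully encloses the other. */
    boolean contains(Rect o) {
        return x1 <= o.x1 && y1 <= o.y1 && x2 >= o.x2 && y2 >= o.y2;
    }

    /** "intersects": the rectangles share at least one point. */
    boolean intersects(Rect o) {
        return x1 <= o.x2 && o.x1 <= x2 && y1 <= o.y2 && o.y1 <= y2;
    }

    public static void main(String[] args) {
        Rect a = new Rect(0, 0, 10, 10), b = new Rect(2, 2, 4, 4), c = new Rect(9, 9, 12, 12);
        System.out.println(a.contains(b) + " " + a.intersects(c) + " " + b.intersects(c)); // prints true true false
    }
}
```

The real API does the same kind of tests for arbitrary polygons, with spatial references handled for you, which is why you would reach for it in a MapReduce or Hive job.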

This looks particularly useful for mapping the rash of “public” data sets to facts on the ground.

Particularly if income levels, ethnicity, race, religion and other factors are taken into account.

Might give more bite to the “excess population,” aka the “47%” people speak so casually about.

Additional resources:

ArcGIS Geodata Resource Center

ArcGIS Blog


AI Algorithms, Data Structures, and Idioms…

Tuesday, March 19th, 2013

AI Algorithms, Data Structures, and Idioms in Prolog, Lisp and Java by George F. Luger and William A. Stubblefield.

From the introduction:

Writing a book about designing and implementing representations and search algorithms in Prolog, Lisp, and Java presents the authors with a number of exciting opportunities.

The first opportunity is the chance to compare three languages that give very different expression to the many ideas that have shaped the evolution of programming languages as a whole. These core ideas, which also support modern AI technology, include functional programming, list processing, predicate logic, declarative representation, dynamic binding, meta-linguistic abstraction, strong-typing, meta-circular definition, and object-oriented design and programming. Lisp and Prolog are, of course, widely recognized for their contributions to the evolution, theory, and practice of programming language design. Java, the youngest of this trio, is both an example of how the ideas pioneered in these earlier languages have shaped modern applicative programming, as well as a powerful tool for delivering AI applications on personal computers, local networks, and the world wide web.

Where could you go wrong with comparing Prolog, Lisp and Java?

Either for the intellectual exercise or because you want a better understanding of AI, a resource to enjoy!

PersistIT [B+ Tree]

Thursday, March 7th, 2013

PersistIT: A fast, transactional, Java B+Tree library

From the webpage:

Akiban PersistIT is a key/value data storage library written in Java™. Key features include:

  • Support for highly concurrent transaction processing with multi-version concurrency control
  • Optimized serialization and deserialization mechanism for Java primitives and objects
  • Multi-segment keys to enable a natural logical key hierarchy
  • Support for long records
  • Implementation of a persistent SortedMap
  • Extensive management capability including command-line and GUI tools

For more information

I mention this primarily because of the multi-segment keys, which I suspect could be useful for type hierarchies.

Possibly other uses as well but that is the first one that came to mind.
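The type-hierarchy idea can be sketched with nothing but the standard library. Below, composite keys are encoded so their segments sort hierarchically in a TreeMap, letting a prefix range scan pull back everything under one branch; this is my illustration of the multi-segment key pattern, not PersistIT’s API:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class SegmentKeys {
    // join segments with NUL, which sorts below printable characters
    static String key(String... segments) {
        return String.join("\u0000", segments);
    }

    public static void main(String[] args) {
        TreeMap<String, String> store = new TreeMap<>();
        store.put(key("vehicle", "car", "sedan"), "v1");
        store.put(key("vehicle", "car", "coupe"), "v2");
        store.put(key("vehicle", "truck"), "v3");
        store.put(key("animal", "cat"), "a1");

        // range scan: everything under vehicle/car in the hierarchy
        String prefix = key("vehicle", "car") + "\u0000";
        SortedMap<String, String> cars = store.subMap(prefix, prefix + "\uffff");
        System.out.println(cars.values()); // prints [v2, v1]; coupe sorts before sedan
    }
}
```

PersistIT does the segment encoding for you (and for mixed types, not just strings), but the payoff is the same: a logical hierarchy you can traverse by key range.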

…O’Reilly Book on NLP with Java?

Friday, February 22nd, 2013

Anyone Want to Write an O’Reilly Book on NLP with Java? by Bob Carpenter.

From the post:

Mitzi and I pitched O’Reilly books a revision of the Text Processing in Java book that she’s been finishing off.

The response from their editor was that they’d love to have an NLP book based on Java, but what we provided looked like everything-but-the-NLP you’d need for such a book. Insightful, these editors. That’s exactly how the book came about, when the non-proprietary content was stripped out of the LingPipe Book.

I happen to still think that part of the book is incredibly useful. It covers all of Unicode, ICU for normalization and detection, all of the streaming I/O interfaces, codings in HTML, XML and JSON, as well as in-depth coverage of reg-exes, Lucene, and Solr. All of the stuff that is continually misunderstood and misconfigured so that I have to spend way too much of my time sorting it out. (Mitzi finished the HTML, XML and JSON chapter, and is working on Solr; she tuned Solr extensively on her last consulting gig, by the way, if anyone’s looking for a Lucene/Solr developer).

Read Bob’s post and give him a shout if you are interested.

Would be a good exercise in learning how choices influence the “objective” outcomes.

Streaming Histograms for Clojure and Java

Wednesday, February 13th, 2013

Streaming Histograms for Clojure and Java

From the post:

We’re happy to announce that we’ve open-sourced our “fancy” streaming histograms. We’ve talked about them before, but now the project has been tidied up and is ready to share.

PDF & CDF for a 32-bin histogram approximating a multimodal distribution.

The histograms are a handy way to compress streams of numeric data. When you want to summarize a stream using limited memory there are two general options. You can either store a sample of data in hopes that it is representative of the whole (such as a reservoir sample) or you can construct some summary statistics, updating as data arrives. The histogram library provides a tool for the latter approach.

The project is a Clojure/Java library. Since we use a lot of Clojure at BigML, the readme’s examples are all Clojure oriented. However, Java developers can still find documentation for the histogram’s public methods.

A tool for visualizing/exploring large amounts of numeric data.
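The first of the two options mentioned in the post, keeping a representative sample in bounded memory, fits in a few lines (Vitter’s Algorithm R). The histogram library implements the second option; this sketch is mine, for contrast:

```java
import java.util.Random;

public class Reservoir {
    // keep a uniform sample of k items from a stream, using O(k) memory
    static double[] sample(double[] stream, int k, Random rnd) {
        double[] res = new double[k];
        for (int i = 0; i < stream.length; i++) {
            if (i < k) {
                res[i] = stream[i];              // fill the reservoir first
            } else {
                int j = rnd.nextInt(i + 1);      // keep item i with probability k/(i+1)
                if (j < k) res[j] = stream[i];
            }
        }
        return res;
    }

    public static void main(String[] args) {
        double[] stream = new double[1000];
        for (int i = 0; i < stream.length; i++) stream[i] = i;
        double[] s = sample(stream, 10, new Random(42));
        System.out.println(s.length); // prints 10; memory stays O(k) however long the stream
    }
}
```

The trade-off the post alludes to: a sample answers any question approximately, while a streaming histogram answers distribution questions (quantiles, CDF) more accurately for the same memory.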

Setting up Java GraphChi development environment…

Sunday, February 10th, 2013

Setting up Java GraphChi development environment – and running sample ALS by Danny Bickson.

From the post:

As you may know, our GraphChi collaborative filtering toolkit in C is becoming more and more popular. Recently, Aapo Kyrola did a great effort for porting GraphChi C into Java and implementing more methods on top of it.

In this blog post I explain how to setup GraphChi Java development environment in Eclipse and run alternating least squares algorithm (ALS) on a small subset of Netflix data.

Based on the level of user feedback I am going to receive for this blog post, we will consider porting more methods to Java. So email me if you are interested in trying it out.

If you are interested in more machine learning methods in Java, here’s your chance!

Not to mention your interest in graph based solutions.

Javadoc coding standards

Friday, November 23rd, 2012

Javadoc coding standards by Stephen Colebourne.

From the post:

These are the standards I tend to use when writing Javadoc. Since personal tastes differ, I’ve tried to explain some of the rationale for some of my choices. Bear in mind that this is more about the formatting of Javadoc, than the content of Javadoc.

There is an Oracle guide which is longer and more detailed than this one. The two agree in most places, however these guidelines are more explicit about HTML tags, two spaces in @param and null-specification, and differ in line lengths and sentence layout.

Each of the guidelines below consists of a short description of the rule and an explanation, which may include an example:
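As a taste of the conventions mentioned (HTML tags, two spaces after the parameter name in @param, explicit null-specification), a method documented in roughly that style might look like the following. The example method is mine, not Stephen’s:

```java
public class StringUtils {

    /**
     * Reverses the given text.
     * <p>
     * The input is not altered; a new string is returned.
     *
     * @param text  the text to reverse, not null
     * @return the reversed text, not null
     * @throws NullPointerException if the text is null
     */
    public static String reverse(String text) {
        return new StringBuilder(text).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(reverse("topic")); // prints cipot
    }
}
```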

Documentation of source code is vital to its maintenance. (cant)

But neither Stephen nor Oracle made much of the need to document the semantics of the source and/or data. If I am indexing/mapping across source files, <code> elements aren’t going to be enough to compare field names across documents.

I am assuming that semantic diversity is as present in source code as elsewhere. Would you assume otherwise?

Lucene with Zing, Part 2

Wednesday, November 21st, 2012

Lucene with Zing, Part 2 by Mike McCandless.

From the post:

When I last tested Lucene with the Zing JVM the results were impressive: Zing’s fully concurrent C4 garbage collector had very low pause times with the full English Wikipedia index (78 GB) loaded into RAMDirectory, which is not an easy feat since we know RAMDirectory is stressful for the garbage collector.

I had used Lucene 4.0.0 alpha for that first test, so I decided to re-test using Lucene’s 4.0.0 GA release and, surprisingly, the results changed! MMapDirectory’s max throughput was now better than RAMDirectory’s (versus being much lower before), and the concurrent mark/sweep collector (-XX:+UseConcMarkSweepGC) was no longer hitting long GC pauses.

This was very interesting! What change could improve MMapDirectory’s performance, and lower the pressure on concurrent mark/sweep’s GC to the point where pause times were so much lower in GA compared to alpha?

Mike updates his prior experience with Lucene and Zing.

Covers the use of gcLogAnalyser and Fragger to understand “why” his performance test results changed from the alpha to GA releases.

Insights into both Lucene and Zing.

Have you considered loading your topic map into RAM?

Full power to the Neo4j engines, Mr. Scott!

Tuesday, November 6th, 2012

René’s title: “Get the full neo4j power by using the Core Java API for traversing your Graph data base instead of Cypher Query Language“, makes you appreciate why René’s day job is “computer scientist” and not “ad copy writer.” 😉

René compares working with Neo4j via:

  • Java Core API
  • Traverser Framework
  • Cypher Query Language

And that is the order of their performance, from fastest to slowest:

  • Java Core API – Order of magnitude faster than Cypher
  • Traverser Framework – 25% slower than Java Core
  • Cypher Query Language – Slowest

Order of magnitude improvements tend to attract the attention of commercial customers and those with non-trivial data sets.

That is if you need performance today, not someday.

The Impedance Mismatch is Our Fault

Friday, November 2nd, 2012

The Impedance Mismatch is Our Fault by Stuart Halloway.

From the summary:

Stuart Dabbs Halloway explains what the impedance mismatch is and what can be done to solve it in the context of RDBMS, OOP, and NoSQL.

If you haven’t seen one of Stuart’s presentations, you need to treat yourself to this one.

Two points, among many others, to consider:

In “reality,”

  • Records are immutable.
  • Reality is cumulative.

How does your topic map application compare on those two points?

Coming soon on JAXenter: videos from JAX London [What Does Hardware Know?]

Wednesday, October 31st, 2012

Coming soon on JAXenter: videos from JAX London by Elliot Bentley.

From the post:

Can you believe it’s only been two weeks since JAX London? We’re already planning for the next one at JAX Towers (yes, really).

Yet if you’re already getting nostalgic, never fear – JAXenter is on hand to help you relive those glorious yet fleeting days, and give a taste of what you may have missed.

For a start, we’ve got videos of almost every session in the main room, including keynotes from Doug Cutting, Patrick Debois, Steve Poole and Martijn Verburg & Kirk Pepperdine, which we’ll be releasing gradually onto the site over the coming weeks. Slides for the rest of JAX London’s sessions are already freely available on SlideShare.

Pepperdine and Verburg, “Java and the Machine,” remark:

There’s no such thing as a process as far as the hardware is concerned.

A riff I need to steal to say:

There’s no such thing as semantics as far as the hardware is concerned.

We attribute semantics to data for input, we attribute semantics to processing of data by hardware, we attribute semantics to computational results.

I didn’t see a place for hardware in that statement. Do you?

Couchbase Java API Cheat Sheet Revisited

Thursday, September 27th, 2012

Couchbase Java API Cheat Sheet Revisited by Don Pinto.

From the post:

With the release of Couchbase Server 2.0 – Beta, I thought I’d take some time to update the Couchbase JAVA API Cheat Sheet I had posted earlier. Couchbase Server 2.0 has a lot of awesome features and the 2.0 compatible Java APIs are available in the Java SDK 1.1 Dev Preview 3.

What’s new?

  • Lots of new APIs to build and execute queries against views defined in Couchbase Server
  • APIs to specify persistence requirements
  • APIs to specify replication requirements

Hope you find this new cheat sheet helpful. I’ll be happy to know of any cool projects that you create using the new Java API. Or better yet, just share code via your Github account with us and other users.

Would look best with a color printer.

No suggestions so far on topic map cheat sheets.

Maybe I should have asked about “subject” cheat sheets?

The results of analysis/identification/modeling of subjects in public data sets.

New Tool: JMXC – JMX Console

Tuesday, September 25th, 2012

New Tool: JMXC – JMX Console

From the post:

When you are obsessed with performance and run a performance monitoring service like Sematext does, you need a quick and easy way to inspect Java apps’ MBeans in JMX. We just open-sourced JMXC, our 1-class tool for dumping the contents of JMX, or specific MBeans. This is a true and super-simple, no external dependencies console tool that can connect to JMX via Java application PID or via JMX URL and can dump either all MBeans or those specified on the command line.

JMXC lives alongside other Sematext open-source tools. Feedback and pull requests welcome! Enjoy!

If that sounds a tad cryptic, try reading: Introducing MBeans.
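JMXC’s “dump everything” behaviour can be sketched against the standard javax.management API for the current JVM. This is my own sketch of the idea, not JMXC’s code:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class MBeanDump {
    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : server.queryNames(null, null)) {
            System.out.println(name);                 // e.g. java.lang:type=Memory
            for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                try {
                    System.out.println("  " + attr.getName() + " = "
                            + server.getAttribute(name, attr.getName()));
                } catch (Exception e) {
                    // some attributes are unavailable or unsupported; skip them
                }
            }
        }
    }
}
```

Connecting to a *remote* process by PID or JMX URL, as JMXC does, needs a JMXConnector on top of this, but the dump loop is the same shape.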

Too good of an opportunity to highlight Sematext’s open source tools to miss.

Java: Parsing CSV files

Sunday, September 23rd, 2012

Java: Parsing CSV files by Mark Needham

Mark is switching to OpenCSV.

See his post for how he is using OpenCSV and other info.
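One reason to reach for a library like OpenCSV rather than String.split(","): quoted fields containing commas. A minimal parser of my own handling just that one case (far less robust than OpenCSV, which also copes with escaped quotes, newlines in fields, and so on):

```java
import java.util.ArrayList;
import java.util.List;

public class Csv {
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;                 // toggle quoted state
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString());           // unquoted comma ends a field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(parseLine("a,\"b,c\",d")); // prints [a, b,c, d]
    }
}
```

A naive `line.split(",")` would return four fields here, splitting "b,c" in half; that class of bug is exactly why switching to a tested CSV library is the right call.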

Real-Time Twitter Search by @larsonite

Saturday, September 22nd, 2012

Real-Time Twitter Search by @larsonite by Marti Hearst.

From the post:

Brian Larson gives a brilliant technical talk about how real-time search works at Twitter; he really knows what he’s talking about given that he’s the tech lead for search and relevance at Twitter!

The coverage of the real-time indexing, Java memory model, safe publication were particularly good.

As a bonus, also discusses relevance near the end of the presentation.

You may want to watch this more than once!

Brian recommends Java Concurrency in Practice by Brian Goetz as having good coverage of the Java memory model.

Customizing the java classes for the NCBI generated by XJC

Saturday, September 8th, 2012

Customizing the java classes for the NCBI generated by XJC by Pierre Lindenbaum.

From the post:

Reminder: XJC is the Java XML Binding Compiler. It automates the mapping between XML documents and Java objects:

(mapping graphic omitted)

The code generated by XJC allows you to:

  • Unmarshal XML content into a Java representation
  • Access and update the Java representation
  • Marshal the Java representation of the XML content into XML content

This post caught my eye because Pierre is adding an “equals” method.

It is a string equivalence test and for data in question that makes sense.

Your “equivalence” test might be more challenging.
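For data where string identity of one field is the right test, the pattern Pierre adds to the generated classes looks like this. The Gene class and accession field below are hypothetical stand-ins, not the NCBI-generated classes:

```java
public class Gene {
    final String accession;

    Gene(String accession) { this.accession = accession; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Gene)) return false;
        return accession.equals(((Gene) o).accession);   // string equivalence test
    }

    // equals and hashCode must agree, or hash-based collections misbehave
    @Override public int hashCode() { return accession.hashCode(); }

    public static void main(String[] args) {
        System.out.println(new Gene("NM_000546").equals(new Gene("NM_000546"))); // prints true
    }
}
```

A more challenging equivalence test, say, treating versioned accessions "NM_000546.5" and "NM_000546.6" as the same subject, would normalize inside equals and hashCode the same way.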

The new Java 0Day examined

Wednesday, August 29th, 2012

The new Java 0Day examined

From the post:

A first analysis of the Java 0Day exploit code, which is already publicly available, suggests that the exploit is rather hard to notice: at first glance, the dangerous code looks just like any other Java program with no trace of any exotic bytecode. According to Michael Schierl, who has discovered several Java holes himself, the code’s secret is that it does something which it isn’t allowed to do: it uses the internal sun.awt.SunToolkit class to disable the SecurityManager, and ultimately the sandbox of Java.

The sun.awt.SunToolkit class gives public (public) access to a method called getField() that provides access to the private attributes of other classes. Technically speaking, untrusted code such as the exploit that is being executed in the browser shouldn’t be able to access this method at all. But Java 7 introduced a new method to the Expression class, .execute(), which allowed expressions created at runtime to be executed. Bugs in the implementation of the new method allow the code to gain access to the getField() method.

I’m not going to make a habit out of reporting security issues, with Java or otherwise but this looked worth passing along.

Curious, with all the design pattern books, are there any design flaw pattern books?

Java for graphics cards

Sunday, August 19th, 2012

Java for graphics cards

From the post:

Phil Pratt-Szeliga, a postgraduate at Syracuse University in New York, has released the source code of his Rootbeer GPU compiler on Github. The developer presented the software at the High Performance Computing and Communication conference in Liverpool in June. The slides from this presentation can be found in the documentation section of the Github directory.

Short summary of Phil Pratt-Szeliga’s GPU compiler.

Is it a waste to have GPU cycles lying around or is there some more fundamental issue at stake?

To what degree does chip architecture drive choices at higher levels of abstraction?

Suggestions of ways to explore that question?

Announcing Scalable Performance Monitoring (SPM) for JVM

Tuesday, August 7th, 2012

Announcing Scalable Performance Monitoring (SPM) for JVM (Sematext)

From the post:

Up until now, SPM existed in several flavors for monitoring Solr, HBase, ElasticSearch, and Sensei. Besides metrics specific to a particular system type, all these SPM flavors also monitor OS and JVM statistics. But what if you want to monitor any Java application? Say your custom Java application run either in some container, application server, or from a command line? You don’t really want to be forced to look at blank graphs that are really meant for stats from one of the above mentioned systems. This was one of our own itches, and we figured we were not the only ones craving to scratch that itch, so we put together a flavor of SPM for monitoring just the JVM and (Operating) System metrics.

Now SPM lets you monitor OS and JVM performance metrics of any Java process through the following 5 reports, along with all other SPM functionality like integrated Alerts, email Subscriptions, etc. If you are one of many existing SPM users these graphs should look very familiar.

JVM monitoring isn’t like radio station management where you can listen for dead air. It’s a bit more complicated than that.

SPM may help with it.

Beyond the JVM and OS, how do you handle monitoring of topic map applications?
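For the JVM side at least, the raw numbers a service like SPM graphs are available from inside any Java process via the standard MXBeans. A sketch of mine, not SPM’s agent:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.OperatingSystemMXBean;

public class JvmStats {
    public static void main(String[] args) {
        MemoryMXBean mem = ManagementFactory.getMemoryMXBean();
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        // the kind of metrics a monitoring agent polls on a schedule
        System.out.println("heap used: " + mem.getHeapMemoryUsage().getUsed());
        System.out.println("load avg:  " + os.getSystemLoadAverage());
        System.out.println("threads:   " + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```

The hard part a product adds is everything around the polling: storage, graphing, alerting, and correlating these numbers across hosts, and for topic map applications you would still need application-level metrics (merge rates, query latency) on top.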

Writing a modular GPGPU program in Java

Monday, August 6th, 2012

Writing a modular GPGPU program in Java by Masayuki Ioki, Shumpei Hozumi, and Shigeru Chiba.


From the abstract:

This paper proposes a Java to CUDA runtime program translator for scientific-computing applications. Traditionally, these applications have been written in Fortran or C without using a rich modularization mechanism. Our translator enables those applications to be written in Java and run on GPGPUs while exploiting a rich modularization mechanism in Java. This translator dynamically generates optimized CUDA code from a Java program given at bytecode level when the program is running. By exploiting dynamic type information given at translation, the translator devirtualizes dynamic method dispatches and flattens objects into simple data representation in CUDA. To do this, a Java program must be written to satisfy certain constraints.

This paper also shows that the performance overheads due to Java and WootinJ are not significantly high.

Just in case you are starting to work on topic map processing routines for GPGPUs.

Something to occupy your time during the “dog days” of August.