Archive for the ‘Java’ Category
Wednesday, March 27th, 2013
Esri Geometry API
From the webpage:
geometry-api-java
The Esri Geometry API for Java can be used to enable spatial data processing in 3rd-party data-processing solutions. Developers of custom MapReduce-based applications for Hadoop can use this API for spatial processing of data in the Hadoop system. The API is also used by the Hive UDF’s and could be used by developers building geometry functions for 3rd-party applications such as Cassandra, HBase, Storm and many other Java-based “big data” applications.
Features
- API methods to create simple geometries directly with the API, or by importing from supported formats: JSON, WKT, and Shape
- API methods for spatial operations: union, difference, intersect, clip, cut, and buffer
- API methods for topological relationship tests: equals, within, contains, crosses, and touches
This looks particularly useful for mapping the rash of “public” data sets to facts on the ground.
Particularly if income levels, ethnicity, race, religion and other factors are taken into account.
Might give more bite to the “excess population,” aka the “47%” people speak so casually about.
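A rough sketch of what the operations and relationship tests in that feature list amount to. Note the assumption: this uses java.awt.geom from the JDK as a stand-in, not the Esri Geometry API, and the class name, method names and rectangles are all made up for illustration.

```java
import java.awt.geom.Area;
import java.awt.geom.Rectangle2D;

// Stand-in sketch: java.awt.geom from the JDK, NOT the Esri Geometry API.
class SpatialOpsSketch {

    // "contains"-style test: nothing of inner is left once outer is subtracted.
    static boolean contains(Area outer, Area inner) {
        Area leftover = new Area(inner);
        leftover.subtract(outer);
        return leftover.isEmpty();
    }

    // "intersects"-style test: the two shapes share some area.
    static boolean intersects(Area a, Area b) {
        Area overlap = new Area(a);
        overlap.intersect(b);
        return !overlap.isEmpty();
    }

    // Union of two shapes, leaving both inputs untouched.
    static Area union(Area a, Area b) {
        Area result = new Area(a);
        result.add(b);
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rectangles standing in for a district and two survey areas.
        Area district = new Area(new Rectangle2D.Double(0, 0, 10, 10));
        Area inside = new Area(new Rectangle2D.Double(2, 2, 3, 3));
        Area straddling = new Area(new Rectangle2D.Double(8, 8, 5, 5));

        System.out.println("district contains inside: " + contains(district, inside));
        System.out.println("district contains straddling: " + contains(district, straddling));
        System.out.println("district intersects straddling: " + intersects(district, straddling));
        System.out.println("union bounds: " + union(district, straddling).getBounds2D());
    }
}
```

With the Esri library on the classpath, its own import, operation and relationship methods would take the place of these helpers.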
Additional resources:
ArcGIS Geodata Resource Center
ArcGIS Blog
twitter@esri
Posted in GIS, Geometry, Java | No Comments »
Tuesday, March 19th, 2013
AI Algorithms, Data Structures, and Idioms in Prolog, Lisp and Java by George F. Luger and William A. Stubblefield.
From the introduction:
Writing a book about designing and implementing representations and search algorithms in Prolog, Lisp, and Java presents the authors with a number of exciting opportunities.
The first opportunity is the chance to compare three languages that give very different expression to the many ideas that have shaped the evolution of programming languages as a whole. These core ideas, which also support modern AI technology, include functional programming, list processing, predicate logic, declarative representation, dynamic binding, meta-linguistic abstraction, strong-typing, meta-circular definition, and object-oriented design and programming. Lisp and Prolog are, of course, widely recognized for their contributions to the evolution, theory, and practice of programming language design. Java, the youngest of this trio, is both an example of how the ideas pioneered in these earlier languages have shaped modern applicative programming, as well as a powerful tool for delivering AI applications on personal computers, local networks, and the world wide web.
Where could you go wrong with comparing Prolog, Lisp and Java?
Either for the intellectual exercise or because you want a better understanding of AI, a resource to enjoy!
Posted in Algorithms, Artificial Intelligence, Data Structures, Java, Lisp, Prolog | No Comments »
Thursday, March 7th, 2013
PersistIT: A fast, transactional, Java B+Tree library
From the webpage:
Akiban PersistIT is a key/value data storage library written in Java™. Key features include:
- Support for highly concurrent transaction processing with multi-version concurrency control
- Optimized serialization and deserialization mechanism for Java primitives and objects
- Multi-segment keys to enable a natural logical key hierarchy
- Support for long records
- Implementation of a persistent SortedMap
- Extensive management capability including command-line and GUI tools
For more information
I mention this primarily because of the multi-segment keys, which I suspect could be useful for type hierarchies.
Possibly other uses as well but that is the first one that came to mind.
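To make the type hierarchy idea concrete, here is a minimal sketch that uses only the JDK. It does not use the PersistIT API: a TreeMap with a segment-by-segment comparator stands in for multi-segment keys, and the class name, helper and sample types are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Sketch of multi-segment keys with a plain TreeMap (not the PersistIT API).
class SegmentedKeys {

    // Compare keys one segment at a time; a key that is a prefix of another sorts first.
    static final Comparator<List<String>> BY_SEGMENT = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    // Every key that starts with the prefix, i.e. a type and all of its subtypes.
    // Works because such keys are contiguous in BY_SEGMENT order.
    static List<List<String>> subtree(NavigableMap<List<String>, String> map, List<String> prefix) {
        List<List<String>> result = new ArrayList<>();
        for (List<String> key : map.tailMap(prefix, true).keySet()) {
            boolean hasPrefix = key.size() >= prefix.size()
                    && key.subList(0, prefix.size()).equals(prefix);
            if (!hasPrefix) break; // walked past the end of the subtree
            result.add(key);
        }
        return result;
    }

    // Hypothetical sample hierarchy.
    static NavigableMap<List<String>, String> sample() {
        NavigableMap<List<String>, String> types = new TreeMap<>(BY_SEGMENT);
        types.put(Arrays.asList("animal"), "any animal");
        types.put(Arrays.asList("animal", "mammal"), "warm-blooded");
        types.put(Arrays.asList("animal", "mammal", "dog"), "domestic dog");
        types.put(Arrays.asList("animal", "bird"), "feathered");
        types.put(Arrays.asList("plant", "tree"), "woody plant");
        return types;
    }

    public static void main(String[] args) {
        System.out.println(subtree(sample(), Arrays.asList("animal", "mammal")));
    }
}
```

The point of the ordering is that a supertype and everything beneath it form one contiguous key range, so "all subtypes of X" becomes a single range scan.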
Posted in B+Tree, Data Structures, Java | No Comments »
Friday, February 22nd, 2013
Anyone Want to Write an O’Reilly Book on NLP with Java? by Bob Carpenter.
From the post:
Mitzi and I pitched O’Reilly books a revision of the Text Processing in Java book that she’s been finishing off.
The response from their editor was that they’d love to have an NLP book based on Java, but what we provided looked like everything-but-the-NLP you’d need for such a book. Insightful, these editors. That’s exactly how the book came about, when the non-proprietary content was stripped out of the LingPipe Book.
I happen to still think that part of the book is incredibly useful. It covers all of unicode, ICU for normalization and detection, all of the streaming I/O interfaces, codings in HTML, XML and JSON, as well as in-depth coverage of reg-exes, Lucene, and Solr. All of the stuff that is continually misunderstood and misconfigured so that I have to spend way too much of my time sorting it out. (Mitzi finished the HTML, XML and JSON chapter, and is working on Solr; she tuned Solr extensively on her last consulting gig, by the way, if anyone’s looking for a Lucene/Solr developer).
Read Bob’s post and give him a shout if you are interested.
Would be a good exercise in learning how choices influence the “objective” outcomes.
Posted in Java, Natural Language Processing | No Comments »
Wednesday, February 13th, 2013
Streaming Histograms for Clojure and Java
From the post:
We’re happy to announce that we’ve open-sourced our “fancy” streaming histograms. We’ve talked about them before, but now the project has been tidied up and is ready to share.
PDF & CDF for a 32-bin histogram approximating a multimodal distribution.
The histograms are a handy way to compress streams of numeric data. When you want to summarize a stream using limited memory there are two general options. You can either store a sample of data in hopes that it is representative of the whole (such as a reservoir sample) or you can construct some summary statistics, updating as data arrives. The histogram library provides a tool for the latter approach.
The project is a Clojure/Java library. Since we use a lot of Clojure at BigML, the readme’s examples are all Clojure oriented. However, Java developers can still find documentation for the histogram’s public methods.
A tool for visualizing/exploring large amounts of numeric data.
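For a feel of the "summary statistics, updating as data arrives" option, a minimal sketch follows. It is not BigML's library or its API. It assumes one well-known approach: keep at most a fixed number of bins and, when the limit is exceeded, merge the two closest bins into their weighted mean. Class and method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a bounded-memory streaming histogram (not BigML's implementation).
class StreamingHistogram {
    private final int maxBins;
    private final TreeMap<Double, Long> bins = new TreeMap<>(); // bin center -> count

    StreamingHistogram(int maxBins) {
        if (maxBins < 1) throw new IllegalArgumentException("maxBins must be >= 1");
        this.maxBins = maxBins;
    }

    // Add one observation, then shrink back to maxBins if needed.
    void insert(double x) {
        bins.merge(x, 1L, Long::sum);
        if (bins.size() > maxBins) mergeClosestPair();
    }

    // Replace the two adjacent bins with the smallest gap by one bin at their weighted mean.
    private void mergeClosestPair() {
        Double previous = null;
        Double left = null;
        double bestGap = Double.POSITIVE_INFINITY;
        for (Double center : bins.keySet()) {
            if (previous != null && center - previous < bestGap) {
                bestGap = center - previous;
                left = previous;
            }
            previous = center;
        }
        Double right = bins.higherKey(left);
        long leftCount = bins.remove(left);
        long rightCount = bins.remove(right);
        double merged = (left * leftCount + right * rightCount) / (leftCount + rightCount);
        bins.merge(merged, leftCount + rightCount, Long::sum);
    }

    long total() {
        long sum = 0;
        for (long count : bins.values()) sum += count;
        return sum;
    }

    // Weighted mean of the bin centers; merging preserves it up to rounding.
    double mean() {
        double weighted = 0;
        for (Map.Entry<Double, Long> bin : bins.entrySet()) weighted += bin.getKey() * bin.getValue();
        return weighted / total();
    }

    Map<Double, Long> bins() {
        return Collections.unmodifiableMap(bins);
    }

    public static void main(String[] args) {
        StreamingHistogram histogram = new StreamingHistogram(4);
        for (int i = 1; i <= 100; i++) histogram.insert(i);
        System.out.println(histogram.bins());
    }
}
```

Memory stays proportional to the bin limit no matter how long the stream runs, which is the trade against keeping a sample.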
Posted in Clojure, Graphics, Java, Visualization | No Comments »
Sunday, February 10th, 2013
Setting up Java GraphChi development environment – and running sample ALS by Danny Bickson.
From the post:
As you may know, our GraphChi collaborative filtering toolkit in C is becoming more and more popular. Recently, Aapo Kyrola did a great effort for porting GraphChi C into Java and implementing more methods on top of it.
In this blog post I explain how to setup GraphChi Java development environment in Eclipse and run alternating least squares algorithm (ALS) on a small subset of Netflix data.
Based on the level of user feedback I am going to receive for this blog post, we will consider porting more methods to Java. So email me if you are interested in trying it out.
If you are interested in more machine learning methods in Java, here’s your chance!
Not to mention your interest in graph based solutions.
Posted in GraphChi, Graphs, Java, Machine Learning | No Comments »
Friday, November 23rd, 2012
Javadoc coding standards by Stephen Colebourne.
From the post:
These are the standards I tend to use when writing Javadoc. Since personal tastes differ, I’ve tried to explain some of the rationale for some of my choices. Bear in mind that this is more about the formatting of Javadoc, than the content of Javadoc.
There is an Oracle guide which is longer and more detailed than this one. The two agree in most places, however these guidelines are more explicit about HTML tags, two spaces in @param and null-specification, and differ in line lengths and sentence layout.
Each of the guidelines below consists of a short description of the rule and an explanation, which may include an example:
Documentation of source code is vital to its maintenance. (cant)
But neither Stephen nor Oracle made much of the need to document the semantics of the source and/or data. If I am indexing/mapping across source files, <code> elements aren’t going to be enough to compare field names across documents.
I am assuming that semantic diversity is as present in source code as elsewhere. Would you assume otherwise?
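As I read the three points called out in the quote (HTML tags, two spaces in @param, null-specification), a comment written that way might look like the sketch below. The class, the method and the exact "not null" wording are my own illustration, not text from Stephen's post.

```java
import java.util.Locale;
import java.util.Objects;

// Hypothetical helper, used only to show the Javadoc layout.
class FieldNames {

    /**
     * Normalizes a field name so that names from different documents can be compared.
     * <p>
     * The name is trimmed and lower-cased. No mapping between vocabularies is attempted,
     * so two fields with the same meaning but different names still compare as different.
     *
     * @param fieldName  the field name as it appears in the source, not null
     * @return the normalized name, not null
     * @throws NullPointerException if {@code fieldName} is null
     */
    static String normalize(String fieldName) {
        Objects.requireNonNull(fieldName, "fieldName");
        return fieldName.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  Family_Name "));
    }
}
```

Notice that even a well-formed comment like this says how the name is normalized, not what the field means.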
Posted in Documentation, Java | No Comments »
Wednesday, November 21st, 2012
Lucene with Zing, Part 2 by Mike McCandless.
From the post:
When I last tested Lucene with the Zing JVM the results were impressive: Zing’s fully concurrent C4 garbage collector had very low pause times with the full English Wikipedia index (78 GB) loaded into RAMDirectory, which is not an easy feat since we know RAMDirectory is stressful for the garbage collector.
I had used Lucene 4.0.0 alpha for that first test, so I decided to re-test using Lucene’s 4.0.0 GA release and, surprisingly, the results changed! MMapDirectory’s max throughput was now better than RAMDirectory’s (versus being much lower before), and the concurrent mark/sweep collector (-XX:+UseConcMarkSweepGC) was no longer hitting long GC pauses.
This was very interesting! What change could improve MMapDirectory’s performance, and lower the pressure on concurrent mark/sweep’s GC to the point where pause times were so much lower in GA compared to alpha?
Mike updates his prior experience with Lucene and Zing.
Covers the use of gcLogAnalyser and Fragger to understand “why” his performance test results changed from the alpha to GA releases.
Insights into both Lucene and Zing.
Have you considered loading your topic map into RAM?
Posted in Indexing, Java, Lucene, Zing JVM | No Comments »
Tuesday, November 6th, 2012
René’s title: “Get the full neo4j power by using the Core Java API for traversing your Graph data base instead of Cypher Query Language”, makes you appreciate why René’s day job is “computer scientist” and not “ad copy writer.”
René compares working with Neo4j via:
- Java Core API
- Traverser Framework
- Cypher Query Language
And that is the order of their performance, from fastest to slowest:
- Java Core API – Order of magnitude faster than Cypher
- Traverser Framework – 25% slower than Java Core
- Cypher Query Language – Slowest
Order of magnitude improvements tend to attract the attention of commercial customers and those with non-trivial data sets.
That is if you need performance today, not someday.
Posted in Cypher, Graphs, Java, Neo4j, Networks | No Comments »
Friday, November 2nd, 2012
The Impedance Mismatch is Our Fault by Stuart Halloway.
From the summary:
Stuart Dabbs Halloway explains what the impedance mismatch is and what can be done to solve it in the context of RDBMS, OOP, and NoSQL.
If you haven’t seen one of Stuart’s presentations, you need to treat yourself to this one.
Two points, among many others, to consider:
In “reality,”
- Records are immutable.
- Reality is cumulative.
How does your topic map application compare on those two points?
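One way to picture those two points in code, as a minimal sketch of my own and not anything taken from Stuart's talk: facts are immutable objects, and an "update" appends a newer fact rather than overwriting the old one. All names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Sketch of an append-only store: immutable records, cumulative history.
class CumulativeStore {

    // An immutable record: once asserted it is never edited or deleted.
    static final class Fact {
        final String subject;
        final String value;
        final long asOf;

        Fact(String subject, String value, long asOf) {
            this.subject = subject;
            this.value = value;
            this.asOf = asOf;
        }
    }

    private final List<Fact> log = new ArrayList<>();

    // "Updating" means appending a newer fact; older facts stay in the log.
    void assertFact(String subject, String value, long asOf) {
        log.add(new Fact(subject, value, asOf));
    }

    // What was known about a subject at a given time? Later entries win ties.
    Optional<String> valueAsOf(String subject, long time) {
        Fact best = null;
        for (Fact fact : log) {
            if (fact.subject.equals(subject) && fact.asOf <= time
                    && (best == null || fact.asOf >= best.asOf)) {
                best = fact;
            }
        }
        return best == null ? Optional.empty() : Optional.of(best.value);
    }

    int size() {
        return log.size();
    }

    public static void main(String[] args) {
        CumulativeStore store = new CumulativeStore();
        store.assertFact("label", "first label", 1);
        store.assertFact("label", "revised label", 5);
        System.out.println(store.valueAsOf("label", 3));
        System.out.println(store.valueAsOf("label", 9));
    }
}
```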
Posted in Java, ORM, Programming, Topic Map Software | No Comments »
Wednesday, October 31st, 2012
Coming soon on JAXenter: videos from JAX London by Elliot Bentley.
From the post:
Can you believe it’s only been two weeks since JAX London? We’re already planning for the next one at JAX Towers (yes, really).
Yet if you’re already getting nostalgic, never fear – JAXenter is on hand to help you relive those glorious yet fleeting days, and give a taste of what you may have missed.
For a start, we’ve got videos of almost every session in the main room, including keynotes from Doug Cutting, Patrick Debois, Steve Poole and Martijn Verburg & Kirk Pepperdine, which we’ll be releasing gradually onto the site over the coming weeks. Slides for the rest of JAX London’s sessions are already freely available on SlideShare.
Pepperdine and Verburg, “Java and the Machine,” remark:
There’s no such thing as a process as far as the hardware is concerned.
A riff I need to steal to say:
There’s no such thing as semantics as far as the hardware is concerned.
We attribute semantics to data for input, we attribute semantics to processing of data by hardware, we attribute semantics to computational results.
I didn’t see a place for hardware in that statement. Do you?
Posted in CS Lectures, Java, Performance, Processing, Programming | No Comments »
Thursday, September 27th, 2012
Couchbase Java API Cheat Sheet Revisited by Don Pinto.
From the post:
With the release of Couchbase Server 2.0 – Beta, I thought I’d take some time to update the Couchbase JAVA API Cheat Sheet I had posted earlier. Couchbase Server 2.0 has a lot of awesome features and the 2.0 compatible Java APIs are available in the Java SDK 1.1 Dev Preview 3.
What’s new?
- Lots of new APIs to build and execute queries against views defined in Couchbase Server
- APIs to specify persistence requirements
- APIs to specify replication requirements
Hope you find this new cheat sheet helpful. I’ll be happy to know of any cool projects that you create using the new Java API. Or better yet, just share code via your Github account with us and other users.
Would look best with a color printer.
No suggestions so far on topic map cheat sheets.
Maybe I should have asked about “subject” cheat sheets?
The results of analysis/identification/modeling of subjects in public data sets.
Posted in Couchbase, Java | No Comments »
Tuesday, September 25th, 2012
New Tool: JMXC – JMX Console
From the post:
When you are obsessed with performance and run a performance monitoring service like Sematext does, you need a quick and easy way to inspect Java apps’ MBeans in JMX. We just open-sourced JMXC, our 1-class tool for dumping the contents of JMX, or specific MBeans. This is a true and super-simple, no external dependencies console tool that can connect to JMX via Java application PID or via JMX URL and can dump either all MBeans or those specified on the command line.
JMXC lives at https://github.com/sematext/jmxc along with other Sematext open-source tools. Feedback and pull requests welcome! Enjoy!
If that sounds a tad cryptic, try reading: Introducing MBeans.
Too good of an opportunity to highlight Sematext’s open source tools to miss.
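To see what "dumping MBeans" means in practice, here is a minimal sketch using the standard javax.management classes. It is not JMXC's code: it only looks at the MBean server of the JVM it runs in, whereas JMXC connects to other processes by PID or JMX URL. The class and helper names are made up.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServer;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Sketch: dump MBeans of the current JVM (not JMXC itself).
class MBeanDump {

    // Names of all MBeans matching the pattern, or every MBean when pattern is null.
    static Set<ObjectName> names(String pattern) {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        try {
            ObjectName query = pattern == null ? null : new ObjectName(pattern);
            return new TreeSet<>(server.queryNames(query, null));
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad MBean name pattern: " + pattern, e);
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Optional first argument: an ObjectName pattern such as java.lang:type=Memory
        for (ObjectName name : names(args.length > 0 ? args[0] : null)) {
            System.out.println(name);
            for (MBeanAttributeInfo attribute : server.getMBeanInfo(name).getAttributes()) {
                if (!attribute.isReadable()) continue;
                Object value;
                try {
                    value = server.getAttribute(name, attribute.getName());
                } catch (Exception e) {
                    // Some attributes are unsupported on a given platform; report and move on.
                    value = "<unavailable: " + e.getClass().getSimpleName() + ">";
                }
                System.out.println("  " + attribute.getName() + " = " + value);
            }
        }
    }
}
```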
Posted in Java, Performance | No Comments »
Sunday, September 23rd, 2012
Java: Parsing CSV files by Mark Needham
Mark is switching to OpenCSV.
See his post for how he is using OpenCSV and other info.
Posted in CSV, Java | No Comments »
Saturday, September 8th, 2012
Customizing the java classes for the NCBI generated by XJC by Pierre Lindenbaum.
From the post:
Reminder: XJC is the Java XML Binding Compiler. It automates the mapping between XML documents and Java objects:
(mapping graphic omitted)
The code generated by XJC allows you to:
- Unmarshal XML content into a Java representation
- Access and update the Java representation
- Marshal the Java representation of the XML content into XML content
This post caught my eye because Pierre is adding an “equals” method.
It is a string equivalence test, and for the data in question that makes sense.
Your “equivalence” test might be more challenging.
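A small sketch of the difference. This is not Pierre's code or anything XJC generates: the class and its fields are hypothetical, and the looser test is just one example of a rule you might need when exact string equality is too strict.

```java
import java.util.Objects;

// Hypothetical value class, for illustration only.
class Taxon {
    private final String id;
    private final String scientificName;

    Taxon(String id, String scientificName) {
        this.id = Objects.requireNonNull(id, "id");
        this.scientificName = Objects.requireNonNull(scientificName, "scientificName");
    }

    // Strict string equivalence: every field must match character for character.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Taxon)) return false;
        Taxon that = (Taxon) other;
        return id.equals(that.id) && scientificName.equals(that.scientificName);
    }

    // Kept consistent with equals, as the Object contract requires.
    @Override
    public int hashCode() {
        return Objects.hash(id, scientificName);
    }

    // A looser, domain-specific test: same name, ignoring case and surrounding whitespace.
    boolean sameSubjectAs(Taxon other) {
        return scientificName.trim().equalsIgnoreCase(other.scientificName.trim());
    }

    public static void main(String[] args) {
        Taxon a = new Taxon("9606", "Homo sapiens");
        Taxon b = new Taxon("tax:9606", " homo sapiens ");
        System.out.println(a.equals(b));
        System.out.println(a.sameSubjectAs(b));
    }
}
```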
Posted in Bioinformatics, Java | No Comments »
Wednesday, August 29th, 2012
The new Java 0Day examined
From the post:
A first analysis of the Java 0Day exploit code, which is already publicly available, suggests that the exploit is rather hard to notice: at first glance, the dangerous code looks just like any other Java program with no trace of any exotic bytecode. According to Michael Schierl, who has discovered several Java holes himself, the code’s secret is that it does something which it isn’t allowed to do: it uses the internal sun.awt.SunToolkit class to disable the SecurityManager, and ultimately the sandbox of Java.
The sun.awt.SunToolkit class gives public access to a method called getField() that provides access to the private attributes of other classes. Technically speaking, untrusted code such as the exploit that is being executed in the browser shouldn’t be able to access this method at all. But Java 7 introduced a new method to the Expression class, .execute(), which allowed expressions created at runtime to be executed. Bugs in the implementation of the new method allow the code to gain access to the getField() method.
I’m not going to make a habit out of reporting security issues, with Java or otherwise, but this looked worth passing along.
Curious, with all the design pattern books, are there any design flaw pattern books?
Posted in Java, Security | 2 Comments »
Sunday, August 19th, 2012
Java for graphics cards
From the post:
Phil Pratt-Szeliga, a postgraduate at Syracuse University in New York, has released the source code of his Rootbeer GPU compiler on Github. The developer presented the software at the High Performance Computing and Communication conference in Liverpool in June. The slides from this presentation can be found in the documentation section of the Github directory.
…
Short summary of Phil Pratt-Szeliga’s GPU compiler.
Is it a waste to have GPU cycles lying around or is there some more fundamental issue at stake?
To what degree does chip architecture drive choices at higher levels of abstraction?
Suggestions of ways to explore that question?
Posted in GPU, Java | No Comments »
Tuesday, August 7th, 2012
Announcing Scalable Performance Monitoring (SPM) for JVM (Sematext)
From the post:
Up until now, SPM existed in several flavors for monitoring Solr, HBase, ElasticSearch, and Sensei. Besides metrics specific to a particular system type, all these SPM flavors also monitor OS and JVM statistics. But what if you want to monitor any Java application? Say your custom Java application runs either in some container, application server, or from a command line? You don’t really want to be forced to look at blank graphs that are really meant for stats from one of the above mentioned systems. This was one of our own itches, and we figured we were not the only ones craving to scratch that itch, so we put together a flavor of SPM for monitoring just the JVM and (Operating) System metrics.
Now SPM lets you monitor OS and JVM performance metrics of any Java process through the following 5 reports, along with all other SPM functionality like integrated Alerts, email Subscriptions, etc. If you are one of many existing SPM users these graphs should look very familiar.
JVM monitoring isn’t like radio station management where you can listen for dead air. It is a bit more complicated than that.
SPM may help with it.
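For a sense of the raw JVM and OS numbers any such monitor starts from, here is a minimal sketch using the JDK's java.lang.management beans. It is not SPM's agent, and the class name and metric keys are my own invention.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: one snapshot of JVM and OS metrics for the current process (not SPM).
class JvmSnapshot {

    static Map<String, Number> snapshot() {
        Map<String, Number> metrics = new LinkedHashMap<>();
        metrics.put("heap.used.bytes",
                ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getUsed());
        metrics.put("threads.live", ManagementFactory.getThreadMXBean().getThreadCount());
        metrics.put("os.processors",
                ManagementFactory.getOperatingSystemMXBean().getAvailableProcessors());
        // Negative when the platform cannot report a load average.
        metrics.put("os.load.average",
                ManagementFactory.getOperatingSystemMXBean().getSystemLoadAverage());

        long collections = 0;
        long collectionMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the value is undefined for a collector.
            collections += Math.max(0, gc.getCollectionCount());
            collectionMillis += Math.max(0, gc.getCollectionTime());
        }
        metrics.put("gc.collections", collections);
        metrics.put("gc.time.millis", collectionMillis);
        return metrics;
    }

    public static void main(String[] args) {
        snapshot().forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

A real monitor would take such snapshots on a schedule and ship them somewhere; the hard part is deciding which numbers mean trouble.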
Beyond the JVM and OS, how do you handle monitoring of topic map applications?
Posted in Java, Systems Administration | No Comments »
Monday, August 6th, 2012
Writing a modular GPGPU program in Java by Masayuki Ioki, Shumpei Hozumi, and Shigeru Chiba.
Abstract:
This paper proposes a Java to CUDA runtime program translator for scientific-computing applications. Traditionally, these applications have been written in Fortran or C without using a rich modularization mechanism. Our translator enables those applications to be written in Java and run on GPGPUs while exploiting a rich modularization mechanism in Java. This translator dynamically generates optimized CUDA code from a Java program given at bytecode level when the program is running. By exploiting dynamic type information given at translation, the translator devirtualizes dynamic method dispatches and flattens objects into simple data representation in CUDA. To do this, a Java program must be written to satisfy certain constraints.
This paper also shows that the performance overheads due to Java and WootinJ are not significantly high.
Just in case you are starting to work on topic map processing routines for GPGPUs.
Something to occupy your time during the “dog days” of August.
Posted in CUDA, GPU, Java | No Comments »
Saturday, August 4th, 2012
Fun With Hadoop In Action Exercises (Java) by Sujit Pal.
From the post:
As some of you know, I recently took some online courses from Coursera. Having taken these courses, I have come to the realization that my knowledge has some rather large blind spots. So far, I have gotten most of my education from books and websites, and I have tended to cherry pick subjects which I need at the moment for my work, as a result of which I tend to ignore stuff (techniques, algorithms, etc) that fall outside that realm. Obviously, this is Not A Good Thing™, so I have begun to seek ways to remedy that.
I first looked at Hadoop years ago, but never got much beyond creating proof of concept Map-Reduce programs (Java and Streaming/Python) for text mining applications. Lately, many subprojects (Pig, Hive, etc) have come up in order to make it easier to deal with large amounts of data using Hadoop, about which I know nothing. So in an attempt to ramp up relatively quickly, I decided to take some courses at BigData University.
The course uses BigInsights (IBM’s packaging of Hadoop) which runs only on Linux. VMWare images are available, but since I have a Macbook Pro, that wasn’t much use to me without a VMWare player (not free for Mac OSX). I then installed VirtualBox and tried to run a Fedora 10 64-bit image on it, and install BigInsights on Fedora, but it failed. I then tried to install Cloudera CDH4 (Cloudera’s packaging of Hadoop) on it (it’s a series of yum commands), but that did not work out either. Ultimately I decided to ditch VirtualBox altogether and do a pseudo-distributed installation of the stock Apache Hadoop (1.0.3) direct on my Mac following instructions on Michael Noll’s page.
The Hadoop Fundamentals I course which I was taking covers quite a few things, but I decided to stop and actually read all of Hadoop in Action (HIA) in order to get a more thorough coverage. I had purchased it some years before as part of Manning’s MEAP (Early Access) program, so it’s a bit dated (examples are mostly in the older 0.19 API), but it’s the only Hadoop book I possess, and the concepts are explained beautifully, and it’s not a huge leap to mentally translate code from the old API to the new, so it was well worth the read.
I also decided to tackle the exercises (in Java for now) and post my solutions on GitHub. Three reasons. First, it exposes me to a more comprehensive set of scenarios than I have had previously, and forces me to use techniques and algorithms that I won’t otherwise. Second, hopefully some of my readers can walk circles around me where Hadoop is concerned, and they would be kind enough to provide criticism and suggestions for improvement. And third, there may be some who would benefit from having the HIA examples worked out. So anyway, here they are, my solutions to selected exercises from Chapters 4 and 5 of the HIA book for your reading pleasure.
Much good content follows!
This will be useful to a large number of people.
As well as setting a good example.
Posted in Hadoop, Java | No Comments »
Tuesday, July 31st, 2012
Vertical Scaling made easy through high-performance actors
From the webpage:
Vertical scaling is today a major issue when writing server code. Threads and locks are the traditional approach to making full utilization of fat (multi-core) computers, but the result is code that is difficult to maintain and which too often does not run much faster than single-threaded code.
Actors make good use of fat computers but tend to be slow as messages are passed between threads. Attempts to optimize actor-based programs result in actors with multiple concerns (loss of modularity) and lots of spaghetti code.
The approach used by JActor is to minimize the messages passed between threads by executing the messages sent to idle actors on the same thread used by the actor which sent the message. Message buffering is used when messages must be sent between threads, and two-way messaging is used for implicit flow control. The result is an approach that is easy to maintain and which, with a bit of care to the architecture, provides extremely high rates of throughput.
On an intel i7, 250 million messages can be passed between actors in the same JVM per second–several orders of magnitude faster than comparable actor frameworks.
Hmmm, 250 million messages a second? On the topic map (TM) scale, that’s what?, about 1/4 TM?
Seriously, if you are writing topic map server software, you need to take a look at JActor.
Posted in Actor-Based, Java, Messaging | No Comments »
Monday, June 11th, 2012
Machine Learning in Java has never been easier!
From the post:
Java is by far one of the most popular programming languages. It’s on the top of the TIOBE index and thousands of the most robust, secure, and scalable backends have been built in Java. In addition, there are many wonderful libraries available that can help accelerate your project enormously. For example, most of BigML’s backend is developed in Clojure which runs on top of the Java Virtual Machine. And don’t forget the ever-growing Android market, with 850K new devices activated each day!
There are a number of machine learning Java libraries available to help build smart data-driven applications. Weka is one of the more popular options. In fact, some of BigML’s team members were Weka users as far back as the late 90s. We even used it as part of the first BigML backend prototype in early 2011. Apache Mahout is another great Java library if you want to deal with bigger amounts of data. However in both cases you cannot avoid “the fun of running servers, installing packages, writing MapReduce jobs, and generally behaving like IT ops folks”. In addition you need to be concerned with selecting and parametrizing the best algorithm to learn from your data as well as finding a way to activate and integrate the model that you generate into your application.
Thus we are thrilled to announce the availability of the first Open Source Java library that easily connects any Java application with the BigML REST API. It has been developed by Javi Garcia, an old friend of ours. A few of the BigML team members have been lucky enough to work with Javi in two other companies in the past.
With this new library, in just a few lines of code you can create a predictive model and generate predictions for any application domain. From finding the best price for a new product to forecasting sales, creating recommendations, diagnosing malfunctions, or detecting anomalies.
It won’t be as easy as “…in just a few lines of code…” but it will, what’s the term, modularize the building of machine learning applications. Someone has to run/maintain the servers, do security patches, backups but it doesn’t have to be you.
Specialization, that’s the other term. So that team members can be really good at what they do, as opposed to sorta good at a number of things.
If you need a common example, consider documentation, most of which is written by developers when they can spare the time. Reads like it. Costs your clients time and money trying to get their developers to work with poor documentation.
Not to mention costing you time and money when the software is no longer totally familiar to one person.
PS: As of today, June 11, 2012, Java is now #2 and C is #1 on the TIOBE list.
Posted in Java, Machine Learning | No Comments »
Wednesday, June 6th, 2012
Java Annotations
From the post:
Annotation is code about the code, that is metadata about the program itself. In other words, organized data about the code, embedded within the code itself. It can be parsed by the compiler, annotation processing tools and can also be made available at run-time too.
We have basic java comments infrastructure using which we add information about the code / logic so that in future, another programmer or the same programmer can understand the code in a better way. Javadoc is an additional step over it, where we add information about the class, methods, variables in the source code. The way we need to add is organized using a syntax. Therefore, we can use a tool and parse those comments and prepare a javadoc document which can be distributed separately.
The Javadoc facility gives an option for understanding the code in an external way: instead of opening the code, the javadoc document can be used separately. An IDE benefits from this javadoc, as it is able to render information about the code as we develop. Annotations were introduced in JDK 1.5.
A reminder to myself of an opportunity for the application of topic maps to Java code. Obviously with regard to “custom” annotations but I suspect the range of usage for “standard” annotations is quite wide.
Not that there aren’t other “subjects” that could be usefully organized out of source code using topic maps. Such as which developers use which classes, methods, etc.
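A minimal sketch of a "custom" annotation that survives to run time and is read back by reflection, which is the hook a topic map tool would need. The @Subject annotation, the Catalog class and the subject names are all hypothetical.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Map;
import java.util.TreeMap;

// Sketch: a custom run-time annotation and a reflective reader for it.
class AnnotationSketch {

    // Metadata about a method: which subject it deals with (hypothetical).
    @Retention(RetentionPolicy.RUNTIME) // keep it in the class file and visible at run time
    @Target(ElementType.METHOD)         // only methods may carry it
    @interface Subject {
        String value();
    }

    static class Catalog {
        @Subject("person")
        void addAuthor() { }

        @Subject("document")
        void addTitle() { }

        void helper() { } // unannotated, so it is skipped below
    }

    // Method name -> subject, for every annotated method declared on the class.
    static Map<String, String> subjectsOf(Class<?> type) {
        Map<String, String> subjects = new TreeMap<>();
        for (Method method : type.getDeclaredMethods()) {
            Subject subject = method.getAnnotation(Subject.class);
            if (subject != null) subjects.put(method.getName(), subject.value());
        }
        return subjects;
    }

    public static void main(String[] args) {
        System.out.println(subjectsOf(Catalog.class));
    }
}
```

Without RetentionPolicy.RUNTIME the annotation would not be visible to reflection, so getAnnotation would return null for every method.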
Posted in Java, Java Annotations | No Comments »
Thursday, April 5th, 2012
Choosing a Java Version on Ubuntu
Apologies but I had to write this down where I am likely to find it in the future.
I run a fair number of Java based apps that are, shall we say, sensitive as to the edition/version of Java that is being invoked.
Some updates take it upon themselves to “correct” my settings.
I happened upon this and it reminded me of that issue.
Thought you might find it helpful at some point.
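Whatever mechanism you use to pick the version, it helps to confirm which Java actually gets invoked. A tiny sketch using only standard system properties (the class name is made up):

```java
// Sketch: report which Java runtime is actually running this code.
class JavaVersionCheck {

    static String describe() {
        String newline = System.lineSeparator();
        return "java.version=" + System.getProperty("java.version") + newline
                + "java.vendor=" + System.getProperty("java.vendor") + newline
                + "java.home=" + System.getProperty("java.home");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Run it with the same java command your application uses and compare java.home with what you expected.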
Posted in Java | No Comments »
Monday, March 5th, 2012
Java Remote Method Invocation (RMI) for Bioinformatics by Pierre Lindenbaum.
From the post:
“Java Remote Method Invocation (Java RMI) enables the programmer to create distributed Java technology-based to Java technology-based applications, in which the methods of remote Java objects can be invoked from other Java virtual machines, possibly on different hosts.” [Oracle] In the current post a java client will send a java class to the server that will analyze a DNA sequence fetched from the NCBI, using the RMI technology.
Distributed computing, both to the client and server, is likely to form part of a topic map solution. This example is one drawn from bioinformatics but the principles are generally applicable.
Posted in Bioinformatics, Java, Remote Method Invocation (RMI) | No Comments »
Wednesday, February 29th, 2012
Work-Stealing & Recursive Partitioning with Fork/Join by Ilya Grigorik.
From the post:
Implementing an efficient parallel algorithm is, unfortunately, still a non-trivial task in most languages: we need to determine how to partition the problem, determine the optimal level of parallelism, and finally build an implementation with minimal synchronization. This last bit is especially critical since as Amdahl’s law tells us: “the speedup of a program using multiple processors in parallel computing is limited by the time needed for the sequential fraction of the program”.
The Fork/Join framework (JSR 166) in JDK7 implements a clever work-stealing technique for parallel execution that is worth learning about – even if you are not a JDK user. Optimized for parallelizing divide-and-conquer (and map-reduce) algorithms it abstracts all the CPU scheduling and work balancing behind a simple to use API.
I appreciated the observation later in the post that map-reduce is a special case of the pattern described in this post. A better understanding of the special cases can lead to a deeper understanding of the general one.
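The recursive partitioning pattern, as a minimal sketch of my own (not code from Ilya's post): split the range until it is small enough, fork one half, compute the other in the current thread, then join. The threshold is an arbitrary guess, and ForkJoinPool.commonPool() assumes Java 8 or later (on JDK7 you would create a ForkJoinPool yourself).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sketch: divide-and-conquer sum with the Fork/Join framework.
class ForkJoinSum {

    static final class SumTask extends RecursiveTask<Long> {
        private static final int THRESHOLD = 10_000; // below this, just loop (arbitrary value)
        private final long[] data;
        private final int from; // inclusive
        private final int to;   // exclusive

        SumTask(long[] data, int from, int to) {
            this.data = data;
            this.from = from;
            this.to = to;
        }

        @Override
        protected Long compute() {
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (int i = from; i < to; i++) sum += data[i];
                return sum;
            }
            int middle = (from + to) >>> 1;
            SumTask left = new SumTask(data, from, middle);
            SumTask right = new SumTask(data, middle, to);
            left.fork();                       // queue the left half; idle workers may steal it
            long rightSum = right.compute();   // work on the right half in this thread
            return left.join() + rightSum;     // wait for the left half and combine
        }
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    // Convenience for trying it out: the sum 1 + 2 + ... + n.
    static long sumOfFirst(int n) {
        long[] data = new long[n];
        for (int i = 0; i < n; i++) data[i] = i + 1;
        return sum(data);
    }

    public static void main(String[] args) {
        System.out.println(sumOfFirst(1_000_000));
    }
}
```

The "map" is the loop over a small slice and the "reduce" is the addition at the join, which is the special case the post points out.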
Posted in Java, MapReduce | No Comments »
Friday, February 3rd, 2012
Java, Python, Ruby, Linux, Windows, are all doomed by Russell Winder.
From the description:
The Multicore Revolution gathers pace. Moore’s Law remains in force — chips are getting more and more transistors on a quarterly basis. Intel are now out and about touting the “many core chip”. The 80-core chip continues its role as research tool. The 48-core chip is now actively driving production engineering. Heterogeneity not homogeneity is the new “in” architecture.
Where Intel research, AMD and others cannot be far behind.
The virtual machine based architectures of the 1990s, Python, Ruby and Java, currently cannot cope with the new hardware architectures. Indeed Linux and Windows cannot cope with the new hardware architectures either. So either we will have lots of hardware which the software cannot cope with, or . . . . . . well you’ll just have to come to the session.
The slides are very hard to see so grab a copy at: http://www.russel.org.uk/Presentations/accu_london_2010-11-18.pdf
From the description: Heterogeneity not homogeneity is the new “in” architecture.
Is greater heterogeneity in programming languages coming?
Posted in Java, Linux OS, Parallelism, Python, Ruby | No Comments »
Thursday, January 5th, 2012
Open Data Structures by Pat Morin.
From “about:”
Open Data Structures covers the implementation and analysis of data structures for sequences (lists), queues, priority queues, unordered dictionaries, and ordered dictionaries.
Data structures presented in the book include stacks, queues, deques, and lists implemented as arrays and linked-lists; space-efficient implementations of lists; skip lists; hash tables and hash codes; binary search trees including treaps, scapegoat trees, and red-black trees; and heaps, including implicit binary heaps and randomized meldable heaps.
The data structures in this book are all fast, practical, and have provably good running times. All data structures are rigorously analyzed and implemented in Java and C++. The Java implementations implement the corresponding interfaces in the Java Collections Framework.
The book and accompanying source code are free (libre and gratis) and are released under a Creative Commons Attribution License. Users are free to copy, distribute, use, and adapt the text and source code, even commercially. The book’s LaTeX sources, Java/C++ sources, and build scripts are available through github.
Noticed in David Eppstein’s Link Roundup.
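To show what "implement the corresponding interfaces in the Java Collections Framework" can look like, here is a minimal array-backed list sketch of my own, written from the general idea rather than copied from the book's sources. It extends AbstractList so it can be used anywhere a java.util.List is expected; the growth and shrink rules are my assumptions.

```java
import java.util.AbstractList;
import java.util.Arrays;

// Sketch: an array-backed list that plugs into the Collections Framework.
class ArrayStack<T> extends AbstractList<T> {
    private Object[] a = new Object[1]; // backing array
    private int n;                      // number of elements in use

    @Override
    public int size() {
        return n;
    }

    @Override
    @SuppressWarnings("unchecked")
    public T get(int i) {
        if (i < 0 || i >= n) throw new IndexOutOfBoundsException("index " + i);
        return (T) a[i];
    }

    @Override
    public T set(int i, T x) {
        T old = get(i); // also performs the bounds check
        a[i] = x;
        return old;
    }

    @Override
    public void add(int i, T x) {
        if (i < 0 || i > n) throw new IndexOutOfBoundsException("index " + i);
        if (n + 1 > a.length) resize();           // grow when full
        System.arraycopy(a, i, a, i + 1, n - i);  // shift the tail right by one
        a[i] = x;
        n++;
        modCount++; // lets inherited iterators detect concurrent modification
    }

    @Override
    public T remove(int i) {
        T old = get(i);
        System.arraycopy(a, i + 1, a, i, n - i - 1); // shift the tail left by one
        n--;
        a[n] = null;                                  // drop the stale reference
        if (a.length >= 3 * n) resize();              // shrink when mostly empty
        modCount++;
        return old;
    }

    // Resize the backing array to twice the current element count (at least 1).
    private void resize() {
        a = Arrays.copyOf(a, Math.max(2 * n, 1));
    }

    public static void main(String[] args) {
        ArrayStack<String> stack = new ArrayStack<>();
        stack.add("a");
        stack.add("b");
        stack.add(0, "start");
        System.out.println(stack);
    }
}
```

Because AbstractList supplies iterators, equals, toString and the rest, only these five methods have to be written.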
Posted in Data Structures, Java | No Comments »