Archive for the ‘Sorting’ Category

Hadoop MapReduce: to Sort or Not to Sort

Wednesday, March 6th, 2013

Hadoop MapReduce: to Sort or Not to Sort by Tendu Yogurtcu.

From the post:

What is the big deal about Sort? Sort is fundamental to the MapReduce framework, the data is sorted between the Map and Reduce phases (see below). Syncsort’s contribution allows native Hadoop sort to be replaced by an alternative sort implementation, for both Map and Reduce sides, i.e. it makes Sort phase pluggable.

MapReduce

Opening up the Sort phase to alternative implementations will facilitate new use cases and data flows in the MapReduce framework. Let’s look at some of these use cases:

The use cases include:

  • Optimized sort implementations.
  • Hash-based aggregations.
  • Ability to run a job with a subset of data.
  • Optimized full joins.

See Tendu’s post for the details.

I first saw this at Use Cases for Hadoop’s New Pluggable Sort by Alex Popescu.

Fast Parallel Sorting Algorithms on GPUs

Wednesday, December 5th, 2012

Fast Parallel Sorting Algorithms on GPUs by Bilal Jan, Bartolomeo Montrucchio, Carlo Ragusa, Fiaz Gul Khan, Omar Khan.

Abstract:

This paper presents a comparative analysis of the three widely used parallel sorting algorithms: OddEven sort, Rank sort and Bitonic sort in terms of sorting rate, sorting time and speed-up on CPU and different GPU architectures. Alongside we have implemented novel parallel algorithm: min-max butterfly network, for finding minimum and maximum in large data sets. All algorithms have been implemented exploiting data parallelism model, for achieving high performance, as available on multi-core GPUs using the OpenCL specification. Our results depicts minimum speed-up19x of bitonic sort against oddeven sorting technique for small queue sizes on CPU and maximum of 2300x speed-up for very large queue sizes on Nvidia Quadro 6000 GPU architecture. Our implementation of full-butterfly network sorting results in relatively better performance than all of the three sorting techniques: bitonic, odd-even and rank sort. For min-max butterfly network, our findings report high speed-up of Nvidia quadro 6000 GPU for high data set size reaching 224 with much lower sorting time.

Is there a GPU in your topic map processing future?

I first saw this in a tweet by Stefano Bertolo.

1MB Sorting Explained

Sunday, October 28th, 2012

1 MB Sorting Explained by Jeff Preshing.

From the post:

In my previous post, I shared some source code to sort one million 8-digit numbers in 1MB of RAM as an answer to this Stack Overflow question. The program works, but I didn’t explain how, leaving it as a kind of puzzle for the reader.

(image omitted)

I had promised to explain it in a followup post, and in the meantime, there’s been a flurry of discussion in the comments and on Reddit. In particular, commenter Ben Wilhelm (aka ZorbaTHut) already managed to explain most of it (Nice work!), and by now, I think quite a few people already get it. Nonetheless, I’ll write up another explanation as promised.

You may want to also review the answers and comments at Stack Overflow as well.

Sorting being one of those fundamental operations you will encounter time and again.

Even in topic maps.

Sorting by function value in Solr

Wednesday, September 28th, 2011

Sorting by function value in Solr by Rafał Kuć.

From the post:

In Solr 3.1 and later we have a very interesting functionality, which enables us to sort by function value. What does that gives us? Actually, a few interesting possibilities.

Let’s start

The first example that comes to mind, perhaps because of the project on which I worked some time ago, it’s sorting on the basis of distance between two geographical points. So far, to implement such functionality was needed changes in Solr (for example, LocalSolr or LocalLucene). Using Solr 3.1 and later, you can sort your search results using the value returned by the defined functions. For example, is Solr, we have the dist function calculating the distance between two points. One variation of the function is a function accepting five parameters: algorithm and two pairs of points. If, using this feature, we would like to sort your search results in ascending order from the point of latitude and longitude 0.0, we should add the following sort parameter to the Solr query:

See the post and think about how you would use this with Solr, or even how you might offer this to your users in other contexts.