Archive for the ‘Sorting’ Category

Sorting [Visualization]

Tuesday, March 24th, 2015

Carlo Zapponi created http://sorting.at/, a visualization of sorting resource that steps through different sorting algorithms. You can choose from four different initial states, six (6) different sizes (5, 10, 20, 50, 75, 100), and six (6) different colors.

The page defaults to Quick Sort and Heap Sort, but under add algorithms you will find:

I added Wikipedia links for the algorithms. For a larger list see:
Sorting algorithm.

I first saw this in a tweet by Eric Christensen.

Spark officially sets a new record in large-scale sorting

Thursday, November 6th, 2014

Spark officially sets a new record in large-scale sorting by Reynold Xin.

From the post:

A month ago, we shared with you our entry to the 2014 Gray Sort competition, a 3rd-party benchmark measuring how fast a system can sort 100 TB of data (1 trillion records). Today, we are happy to announce that our entry has been reviewed by the benchmark committee and we have officially won the Daytona GraySort contest!

In case you missed our earlier blog post, using Spark on 206 EC2 machines, we sorted 100 TB of data on disk in 23 minutes. In comparison, the previous world record set by Hadoop MapReduce used 2100 machines and took 72 minutes. This means that Spark sorted the same data 3X faster using 10X fewer machines. All the sorting took place on disk (HDFS), without using Spark’s in-memory cache. This entry tied with a UCSD research team building high performance systems and we jointly set a new world record.

Winning this benchmark as a general, fault-tolerant system marks an important milestone for the Spark project. It demonstrates that Spark is fulfilling its promise to serve as a faster and more scalable engine for data processing of all sizes, from GBs to TBs to PBs. In addition, it validates the work that we and others have been contributing to Spark over the past few years.

If you are not already familiar with Spark, see the project homepage and/or the extensive documentation page. (Be careful, you can easily lose yourself in the Spark documentation.)

Alphabetical Order

Tuesday, July 29th, 2014

Alphabetical order explained in a mere 27,817 words by David Weinberger.

From the post:

This is one of the most amazing examples I’ve seen of the complexity of even simple organizational schemes. “Unicode Collation Algorithm (Unicode Technical Standard #10)” spells out in precise detail how to sort strings in what we might colloquially call “alphabetical order.” But it’s way, way, way more complex than that.

Unicode is an international standard for how strings of characters get represented within computing systems. For example, in the familiar ASCII encoding, the letter “A” is represented in computers by the number 65. But ASCII is too limited to encode the world’s alphabets. Unicode does the job.

As the paper says, “Collation is the general term for the process and function of determining the sorting order of strings of characters” so that, for example, users can look them up on a list. Alphabetical order is a simple form of collation.

The best part is the summary of Unicode Technical Standard #10:

This document dives resolutely into the brambles and does not give up. It incidentally exposes just how complicated even the simplest of sorting tasks is when looked at in their full context, where that context is history, language, culture, and the ambiguity in which they thrive.

We all learned the meaning of “alphabetical order” in elementary school. But which “alphabetical order” depends upon language, culture, context, etc.

Other terms and phrases have the same problem. But the vast majority of them have no Unicode Technical Report with all the possible meanings.

For those terms there are topic maps.

I first saw this in a tweet by Computer Science.

The sound of sorting…

Sunday, December 22nd, 2013

The sound of sorting – 15 sorting algorithms visualized and sonified by Alex Popescu.

Alex has embedded a YouTube video that visualizes and sonifies 15 sorting algorithms.

As a special treat, Alex links to more details on the video.

If you are interested in more visualizations of algorithms, see Algoviz.org.

Hadoop MapReduce: to Sort or Not to Sort

Wednesday, March 6th, 2013

Hadoop MapReduce: to Sort or Not to Sort by Tendu Yogurtcu.

From the post:

What is the big deal about Sort? Sort is fundamental to the MapReduce framework, the data is sorted between the Map and Reduce phases (see below). Syncsort’s contribution allows native Hadoop sort to be replaced by an alternative sort implementation, for both Map and Reduce sides, i.e. it makes Sort phase pluggable.

MapReduce

Opening up the Sort phase to alternative implementations will facilitate new use cases and data flows in the MapReduce framework. Let’s look at some of these use cases:

The use cases include:

  • Optimized sort implementations.
  • Hash-based aggregations.
  • Ability to run a job with a subset of data.
  • Optimized full joins.

See Tendu’s post for the details.

I first saw this at Use Cases for Hadoop’s New Pluggable Sort by Alex Popescu.

Fast Parallel Sorting Algorithms on GPUs

Wednesday, December 5th, 2012

Fast Parallel Sorting Algorithms on GPUs by Bilal Jan, Bartolomeo Montrucchio, Carlo Ragusa, Fiaz Gul Khan, Omar Khan.

Abstract:

This paper presents a comparative analysis of the three widely used parallel sorting algorithms: OddEven sort, Rank sort and Bitonic sort in terms of sorting rate, sorting time and speed-up on CPU and different GPU architectures. Alongside we have implemented novel parallel algorithm: min-max butterfly network, for finding minimum and maximum in large data sets. All algorithms have been implemented exploiting data parallelism model, for achieving high performance, as available on multi-core GPUs using the OpenCL specification. Our results depicts minimum speed-up19x of bitonic sort against oddeven sorting technique for small queue sizes on CPU and maximum of 2300x speed-up for very large queue sizes on Nvidia Quadro 6000 GPU architecture. Our implementation of full-butterfly network sorting results in relatively better performance than all of the three sorting techniques: bitonic, odd-even and rank sort. For min-max butterfly network, our findings report high speed-up of Nvidia quadro 6000 GPU for high data set size reaching 224 with much lower sorting time.

Is there a GPU in your topic map processing future?

I first saw this in a tweet by Stefano Bertolo.

1MB Sorting Explained

Sunday, October 28th, 2012

1 MB Sorting Explained by Jeff Preshing.

From the post:

In my previous post, I shared some source code to sort one million 8-digit numbers in 1MB of RAM as an answer to this Stack Overflow question. The program works, but I didn’t explain how, leaving it as a kind of puzzle for the reader.

(image omitted)

I had promised to explain it in a followup post, and in the meantime, there’s been a flurry of discussion in the comments and on Reddit. In particular, commenter Ben Wilhelm (aka ZorbaTHut) already managed to explain most of it (Nice work!), and by now, I think quite a few people already get it. Nonetheless, I’ll write up another explanation as promised.

You may want to also review the answers and comments at Stack Overflow as well.

Sorting being one of those fundamental operations you will encounter time and again.

Even in topic maps.

Sorting by function value in Solr

Wednesday, September 28th, 2011

Sorting by function value in Solr by Rafał Kuć.

From the post:

In Solr 3.1 and later we have a very interesting functionality, which enables us to sort by function value. What does that gives us? Actually, a few interesting possibilities.

Let’s start

The first example that comes to mind, perhaps because of the project on which I worked some time ago, it’s sorting on the basis of distance between two geographical points. So far, to implement such functionality was needed changes in Solr (for example, LocalSolr or LocalLucene). Using Solr 3.1 and later, you can sort your search results using the value returned by the defined functions. For example, is Solr, we have the dist function calculating the distance between two points. One variation of the function is a function accepting five parameters: algorithm and two pairs of points. If, using this feature, we would like to sort your search results in ascending order from the point of latitude and longitude 0.0, we should add the following sort parameter to the Solr query:

See the post and think about how you would use this with Solr, or even how you might offer this to your users in other contexts.