Archive

Posts Tagged ‘data’

Sorting for Chumps

December 9, 2010 Leave a comment

A terse post about sorting, written for chumps by a chump

If you’ve never spent time on Urban Dictionary, you’re missing out on one of the finer things in life.

Chump. A sucka that tries to act cool, but is really a fool and tries to act tough, but really isn’t

source

I’m a chump who’s been out of University for a few years now, so I’ve long since forgotten anything that I don’t use in my daily job. Lately, I’ve been going back over some of the things I used to know and trying to re-learn basic fun things like sorting and runtime analysis.

I won’t bore you with the distinction between big-O and big-theta, but there’s a fun table on Wikipedia’s Big O notation page for the curious.

For chumps like me who appreciate how stunningly cool the math is, but just want real-world performance data… you’ve come to the right place.

Test methodology

As preparation for this blog post, I wrote a small sorting library and “integer sorting” program which you can find on github. I named it libsort just because I’m cool like that. The code isn’t the best I’ve ever written, but most of it was hacked out on iSSH so… there you go.

libsort currently implements the following algorithms:

  • bubblesort: O(n2), O(1) extra space
  • mergesort: O(n log n) runtime, O(n) extra space
  • quicksort: O(n log n) runtime, O(1) extra space

I started with bubblesort because it looked easy (remember, I’m a chump :)) and it’s also n2 so it should make for good comparison against faster algorithms.

Mergesort and Quicksort really weren’t that much harder to implement… I picked slightly optimized versions and coded them up based on Wikipedia’s pseudo-code. Man I love pseudo-code…

I also ran two versions of quicksort and mergesort, with some minor changes like malloc’ing once beforehand rather than having each recursive call malloc temporary space. You’ll see this show up in the graphs as quicksort_onemalloc below, etc.

Test machine

The system I ran the tests on looks like this

  • Intel Core i7-920 @ 2.67 GHz (quad-core with 2 threads/core = 8 virtual CPUs)
  • 6 GB of DDR3 memory (sorry, don’t know the bus speed)
  • Ubuntu 10.04 Desktop amd64
  • a regular old SATA HDD

The Results

You can download the spreadsheet I put together from the raw data. Or you can be a chump like me and just look at the graphs.

Result #1: bubblesort sucks.

Bubblesort does SO poorly once you get beyond 10,000 elements that I stopped measuring it. O(n2) really hurts… 100K elements took over 110 seconds. That’s really, really bad.

Result #2: quicksort did slightly better than mergesort, but not enough to really matter a whole lot for most purposes.

The graph above doesn’t really do justice for just how fast sorting is for small lists. Both of these sorting algorithms can sort over 1 MILLION integers in less than a quarter of a second. You really don’t see much difference until you get past 100 million, then my implementation of an in-place quicksort starts to really shine.

Looking at it in a log10-scale layout:

Looking at it from a log-scale shows pretty much linear increase in log-time, which sounds pretty good to this chump.

That’s all for now, please check out the libsort code on github and let me know if I’ve made any stupid mistakes, etc.

Thanks for reading – now let me know what you think!

Advertisements
Categories: Uncategorized Tags: , , ,