Archive

Posts Tagged ‘code’

If Today Was Your Last Day… To Code

February 6, 2011

This post comes courtesy of Nickelback, one of my “top 20” bands. I’m listening to their song “If Today Was Your Last Day” and thinking happy thoughts, so walk with me for a bit. Pretend it’s your last day to code – what would you do?

If today was your last day [to code], and tomorrow was too late…

Would you spend time in meetings discussing the next cool project? Sit through yet another status update meeting while a coworker reads bullets from a PowerPoint deck?

Would you write new wiki pages / document all the arcane parts of the system that nobody else understands? Would you finally finish the interface documentation you keep putting off? Would you clean up your desk and blurt out epithets at random passers-by?

Would you spend an extra hour teaching a coworker or junior engineer how to write cleaner/better code? Would you spend your last day tutoring, mentoring, making sure those you leave behind get the last ounce of wisdom before you go?

Would you start a new project, cut your teeth on the euphoria of starting something new and creating something from the ground up? Or would you rush around trying to fight the last few bugs blocking your product from shipping next week, in the hopes that your efforts will make the product better for the customer?

Would you seek out the hardest, gnarliest, most beasty multi-threaded race condition and kick it to the curb? Would you grab the nearest flyswatter and quash a few bugs just for old times’ sake? Or would you coast through, take an extra long lunch, and try to enjoy the last few hours of your coding existence?

Would you rewrite that one butt ugly section of code you’ve been ignoring? You know, the one that makes your skin crawl and makes your eyes bleed. Would you rip it to pieces and give it another go?

Would you explore new tools, migrate to git, drop MySQL for Postgres, check out the new code profiler, or finally use valgrind to kill memory leaks? Would you improve the validation for your project, check out CruiseControl or Hudson/Jenkins?

Would you write in C, assembly, Python, Perl, Ruby, Haskell, C#, Java, Erlang, Go, or Lisp, or would you invent your own language just for fun?

Would you write a new compiler, OS, GUI, database, web browser, or iOS/Android app? Would you migrate up the stack to functional programming or get into research? Or would you learn more about hardware, get closer to the processor and learn more about transistors?

Summarizing the song… Live today like it’s your last day, and do what you really wanted to do all along.

“It’s never too late to shoot for the stars regardless of whoever you are. So do whatever it takes…”

Best

Atto

Categories: Life

gcc optimization case study

January 4, 2011

I’ve used gcc for years, with varying levels of optimization… then the other day, I got really bored and started experimenting to see if gcc optimization flags really matter at all.

Trivial Example

It’s probably not the best example I could’ve come up with, but I hammered out the following code as a tiny and trivial test case.


/*
 * bar.c - trivial gcc optimization test case
 */
#include <stdio.h>

int main(int argc, char **argv)
{
	long long i, j, count;

	j = 1;
	count = 100*1000*1000LL;
	for (i = 0; i < count; i++)
		j = j << 1;
	printf("j = %lld\n", j);
	return 0;
}

The Makefile is pretty straightforward, nothing super interesting here:


CC=gcc
CFLAGS=-O0
OBJS=bar.o
TARGET=foo

$(TARGET): $(OBJS)
	$(CC) $(CFLAGS) -o $(TARGET) $(OBJS)

clean:
	-rm -f $(OBJS) $(TARGET) *~

It’s pretty boring code that loops 100 million times doing a left shift. Ignoring the fact that the result is useless, I ran gcc with a few different flags and recorded the results below. Runtimes are in seconds; the assembly was generated with objdump -S, and only the loop code is shown.

gcc flags           runtime (avg of 5)  loop disassembly
-O0                 0.2314 sec          29: shlq -0x10(%rbp)
                                        2d: addq $0x1,-0x8(%rbp)
                                        32: mov  -0x8(%rbp),%rax
                                        36: cmp  -0x18(%rbp),%rax
                                        3a: jl   29
-O1                 0.078 sec           e:  add  %rsi,%rsi
                                        11: add  $0x1,%rax
                                        15: cmp  $0x5f5e100,%rax
                                        1b: jne  e
-O2                 0.077 sec           same loop
-O3                 0.077 sec           same loop, different return
-Os                 0.072 sec           7:  inc  %rax
                                        a:  add  %rsi,%rsi
                                        d:  cmp  $0x5f5e100,%rax
                                        13: jne  7
-O2 -funroll-loops  0.014 sec           10: add  $0x8,%rax
                                        14: shl  $0x8,%rsi
                                        18: cmp  $0x5f5e100,%rax
                                        1e: jne  10

Runtime was measured with “time ./foo” on a Core i7-920 quad-core CPU with 6 GB of DDR3 memory.

Interestingly enough, the fastest gcc compile options for this useless code sample are -O2 -funroll-loops. It’s faster because gcc performs an 8-way unroll and therefore runs approximately 1/8th the loop iterations. This works in this trivial example because it literally replaces 8 left-shift-by-one operations with a single shift-left by 8.
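To make the transformation concrete, here’s a hand-unrolled sketch of the same loop. This is my own illustration of what an 8-way unroll does, not gcc’s actual output, and the function names are made up:

```c
/* The original loop: shift j left one bit per iteration. */
long long shift_loop(long long count)
{
	long long i, j = 1;

	for (i = 0; i < count; i++)
		j = j << 1;
	return j;
}

/* Hand-unrolled by 8: each pass replaces eight shift-by-one steps
 * with a single shift-by-eight, plus a cleanup loop for leftovers. */
long long shift_loop_unrolled(long long count)
{
	long long i, j = 1;

	for (i = 0; i + 8 <= count; i += 8)
		j = j << 8;
	for (; i < count; i++)
		j = j << 1;
	return j;
}
```

Both functions compute the same value; the unrolled one just pays the add/cmp/jne loop overhead an eighth as often.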

So that’s semi-interesting, but all I’ve proved so far is that gcc does in fact optimize, and that a trivial loop is indeed much faster when you let it.

Multi-threaded app

I was curious to see how this plays out on non-trivial programs, and I had some code at work that needed this kind of analysis anyway – so I gave it a whirl on my multi-threaded app. I can’t share the source code or the disassembly, but I saw some very odd results that I’m honestly not 100% sure how to explain.

flags             runtime (approx)
-O0               8.5 sec
-O2               12.5 sec
-O3               13 sec
-Os               7.9 sec
-Os -march=core2  7.7 sec

What I find really interesting is that with -O2 and -O3, the application runtime gets WORSE instead of better. My best guess is that with -O2 and its aggressive inlining, the code size blows up and the instruction cache hit rate suffers. I haven’t investigated it and probably won’t take the time, but I found it rather fascinating to see such a change just from compiler flags.

FIO benchmark

Anyone who has visited my blog before probably knows that I’m a big fan of Jens Axboe’s fio benchmark. I decided to record a similar result from using fio to benchmark /dev/ram0.

I’m using fio 1.44.2, compiled from source on my local Ubuntu 10.04 amd64 system.


$ sudo ./fio --bs=4k --direct=1 --filename=/dev/ram0 \
    --numjobs=4 --iodepth=8 --ioengine=libaio --group_reporting \
    --time_based --runtime=60 --rw=randread --name=randread

There’s no real rhyme or reason to the workload I chose (4k blocks, 4 jobs, iodepth=8, etc.), I just wanted something with a few threads and some outstanding IO to have a good chance of getting high bandwidth.

flags               bandwidth (MB/s)  avg latency (us)
-O0                 1208              1.04
-O1                 1524              0.83
-O2                 1645              13.95 or 0.78 (very odd)
-O3                 1676              0.76
-Os                 1543              3.29
-O2 -funroll-loops  1667              0.77 or 13.33 (odd again)

Summary

With three very different examples, I get three very different sets of results.

In my trivial, stupid for loop example, unrolling the loop made a lot of sense and using -O2 didn’t matter a whole lot.

In my multi-threaded app, most optimizations actually made the runtime worse, but size and architecture made a difference.

And with fio, the results are pretty much what you’d expect – higher levels of optimization make the benchmark faster.

What’s the conclusion?

Compiler flags matter, but like all optimizations, their usefulness and impact are highly workload- and application-dependent.

So… you just gotta try them all, see what is best for your project. I know, killer conclusion, but hey… I just tell it like I see it.

Thanks for reading, I’d love to hear your comments/feedback below.

atto

Categories: Uncategorized

Sorting for Chumps

December 9, 2010

A terse post about sorting, written for chumps by a chump

If you’ve never spent time on Urban Dictionary, you’re missing out on one of the finer things in life.

Chump. A sucka that tries to act cool, but is really a fool and tries to act tough, but really isn’t

source

I’m a chump who’s been out of University for a few years now, so I’ve long since forgotten anything that I don’t use in my daily job. Lately, I’ve been going back over some of the things I used to know and trying to re-learn basic fun things like sorting and runtime analysis.

I won’t bore you with the distinction between big-O and big-theta, but there’s a fun table on Wikipedia’s Big O notation page for the curious.

For chumps like me who appreciate how stunningly cool the math is, but just want real-world performance data… you’ve come to the right place.

Test methodology

As preparation for this blog post, I wrote a small sorting library and “integer sorting” program which you can find on github. I named it libsort just because I’m cool like that. The code isn’t the best I’ve ever written, but most of it was hacked out on iSSH so… there you go.

libsort currently implements the following algorithms:

  • bubblesort: O(n²) runtime, O(1) extra space
  • mergesort: O(n log n) runtime, O(n) extra space
  • quicksort: O(n log n) average runtime, in-place (only the recursion stack as extra space)

I started with bubblesort because it looked easy (remember, I’m a chump :)) and it’s also O(n²), so it should make for a good comparison against faster algorithms.

Mergesort and Quicksort really weren’t that much harder to implement… I picked slightly optimized versions and coded them up based on Wikipedia’s pseudo-code. Man I love pseudo-code…

I also ran two versions of quicksort and mergesort, with some minor changes like malloc’ing once beforehand rather than having each recursive call malloc temporary space. You’ll see this show up in the graphs as quicksort_onemalloc below, etc.
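For the curious, a minimal in-place quicksort along these lines might look like the following. This is a sketch using Lomuto partitioning, not necessarily the exact variant in libsort, and the names are mine:

```c
static void swap_ll(long long *a, long long *b)
{
	long long t = *a;

	*a = *b;
	*b = t;
}

/* In-place quicksort, Lomuto style: take the last element as the pivot,
 * sweep smaller elements to the left of it, drop the pivot into its
 * final slot, then recurse on the two halves. No temporary array needed. */
void quicksort(long long *arr, int lo, int hi)
{
	long long pivot;
	int i, j;

	if (lo >= hi)
		return;

	pivot = arr[hi];
	i = lo;
	for (j = lo; j < hi; j++)
		if (arr[j] < pivot)
			swap_ll(&arr[i++], &arr[j]);
	swap_ll(&arr[i], &arr[hi]);

	quicksort(arr, lo, i - 1);
	quicksort(arr, i + 1, hi);
}
```

Call it as quicksort(arr, 0, n - 1). The only “extra space” is the recursion stack, which is what makes it interesting to compare against mergesort’s O(n) temporary buffer.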

Test machine

The system I ran the tests on looks like this

  • Intel Core i7-920 @ 2.67 GHz (quad-core with 2 threads/core = 8 virtual CPUs)
  • 6 GB of DDR3 memory (sorry, don’t know the bus speed)
  • Ubuntu 10.04 Desktop amd64
  • a regular old SATA HDD

The Results

You can download the spreadsheet I put together from the raw data. Or you can be a chump like me and just look at the graphs.

Result #1: bubblesort sucks.

Bubblesort does SO poorly once you get beyond 10,000 elements that I stopped measuring it. O(n²) really hurts… 100K elements took over 110 seconds. That’s really, really bad.

Result #2: quicksort did slightly better than mergesort, but not enough to really matter a whole lot for most purposes.

The graph above doesn’t really do justice to just how fast sorting is for small lists. Both of these sorting algorithms can sort over 1 MILLION integers in less than a quarter of a second. You really don’t see much difference until you get past 100 million, then my implementation of an in-place quicksort starts to really shine.

Looking at it in a log10-scale layout:

Looking at it on a log scale shows a pretty much linear increase in log-time, which sounds pretty good to this chump.

That’s all for now, please check out the libsort code on github and let me know if I’ve made any stupid mistakes, etc.

Thanks for reading – now let me know what you think!

Categories: Uncategorized

Array map in C

October 16, 2010

Last year, I got “The Ruby Way” (link) as a gift to myself. I’m not a major Ruby freak, but there are some cool things about it and I really enjoy Ruby in general.

As I read my way through the book, I remember reading about Array#map. Array map is cool, it lets you do things like

my_strings.map {|s| s.strip! }

Which is roughly equivalent to

new_arr = []
my_strings.each {|s| new_arr << s.strip }

Of course, this is kind of a stupid example and you can do MUCH more interesting things with Array#select or Array#reject – like write a one-liner to find all the phone numbers in an array that start with 351, or that don’t hash to the same bucket, whatever. Anyway, I remember being really excited about Array#map for some reason when I read that chapter. And whenever I get excited about something… weird things tend to happen. 🙂

See, I spend a lot of time writing programs in C. It’s a great language, and pretty much the only option in the kernel/filesystem/embedded world. So when I see a cool feature in another language (like Array map), I think “gee, wouldn’t it be awesome to have that in C.”

Being a frequent C coder, I do a lot of things with/to/because of arrays – and the two most common things I do with arrays are indexing and looping. Indexing is cool because it’s an O(1) operation, and that’s great for writing fast code that uses a lot of RAM. Looping, on the other hand… well, its usefulness depends on the problem at hand, but this is pretty common:

for(i = 0; i < count; i--);
   do_stuff_with(&my_array[i]);

Ten bonus points if you can spot the subtle bugs I put into this trivial for loop. Hint #1: there are three of them. Hint #2: the program (or kernel/firmware/driver) will not do anything useful. Beyond the obvious problems with this code (and how easy it is to accidentally slip in the extra semicolon), writing for loops gets really tedious and old sometimes.

So I keep thinking that it’d be neat to write a C library to implement Array map, select, reject. But then I start thinking about the code. To do this properly, you could use macros, something like

#define ARRAY_MAP(arr, size, func)                       \
        for(int i=0; i<(size); i++) (func)(arr[i]);
...
my_arr = malloc(sizeof(u32)*10);
memcpy(my_arr, arr1, sizeof(u32)*10);
ARRAY_MAP(my_arr, 10, do_stuff);

Well, that looks like junk and doesn’t really do much. You could implement it with callbacks to avoid the evils of macros, but that doesn’t help you much either. You still have to handle memory allocation, deallocation, type safety, and in the end the code doesn’t look any simpler.
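For completeness, the callback version I’m dismissing would look something like this sketch (array_map and double_it are made-up names, just for illustration):

```c
#include <stddef.h>

typedef unsigned int u32;

/* Callback flavor of ARRAY_MAP: apply func to each element in place. */
static void array_map(u32 *arr, size_t size, void (*func)(u32 *))
{
	size_t i;

	for (i = 0; i < size; i++)
		func(&arr[i]);
}

/* Example callback: double the element it's handed. */
static void double_it(u32 *x)
{
	*x *= 2;
}
```

It works – array_map(my_arr, 10, double_it) and so on – but as noted above it doesn’t really buy you much over writing the for loop yourself, and it still punts on allocation and type safety.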

Not to mention, for anything really high-performance (filesystems, OSes, etc.) you generally want to avoid looping over arrays at all – the Linux kernel now has a native hash table and circular buffer (among other fun data types). If performance is critical, use a better/faster/cooler algorithm and a more appropriate data type.

So I’m back to square one, array map in C is pointless. If you don’t care about runtime, just use arrays and loops, or use Python or Ruby.

But array map/select/reject is so cool… wouldn’t it be neat in C?

infinite_loop:
   printk(KERN_ERR "/me wants array_map in C");
   goto infinite_loop;

Best,

Atto

Categories: Uncategorized

Simple Recursion

October 2, 2010

I started writing this post last week, then an update to the WordPress app crashed and I lost the draft. With any luck, this post will be better than last week’s post would’ve been 🙂

A few weeks back, there was an article or two posted to digg/reddit/slashdot about recursion. The basic bottom line was something to the effect that 90% of programmers fail at simple recursion, which is both surprising and expected.

Surprising, because simple recursion really is… simple recursion. Most coders have seen the infamous Fibonacci series, and the (pseudo) code is quite simple:

def fib(n):
   if n == 1 or n == 0:
      return 1
   else:
      return fib(n-1) + fib(n-2)

You have a simple base case (two cases, n is one or zero) to terminate the recursion; if the input is not trivial, you break the problem down into smaller problems and “recurse.” In the Fibonacci example above, it’s pretty simple code to compute a simple numeric series… so it surprises me that coders can and do routinely fumble on this exact problem. I have personally seen multiple people fail this question in an interview, even with significant prompting. So it shouldn’t surprise me, but it does.

Now some people will argue with me and say the Fibonacci series is not a good question because it’s too (basic, academic, fill in the blank). And generally speaking, they’re correct – the Fibonacci series is a pretty basic, boring problem which is solvable with simple, basic code. So I am definitely interested in more complex, more advanced code which tackles more complex problems (merge sort anyone?). However someone who doesn’t get the Fibonacci series (which is a classic recursion example) is very unlikely to ace more challenging questions.
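For reference, the same recursion drops straight into C, and a memoized variant (a sketch, keeping the pseudo-code’s convention that fib(0) = fib(1) = 1; the names and the cache bound are mine) shows how a tiny cache turns the exponential blowup into linear time:

```c
#include <assert.h>

/* Naive recursive Fibonacci, mirroring the pseudo-code above:
 * exponential time, since fib(n-1) and fib(n-2) get recomputed endlessly. */
static long long fib(int n)
{
	if (n == 0 || n == 1)
		return 1;
	return fib(n - 1) + fib(n - 2);
}

/* Memoized version: the same recursion, but each value is computed once. */
static long long fib_memo_helper(int n, long long *cache)
{
	if (n == 0 || n == 1)
		return 1;
	if (cache[n] == 0)
		cache[n] = fib_memo_helper(n - 1, cache) +
			   fib_memo_helper(n - 2, cache);
	return cache[n];
}

static long long fib_memo(int n)
{
	long long cache[91] = {0};	/* fib(90) still fits in a long long */

	assert(n >= 0 && n <= 90);
	return fib_memo_helper(n, cache);
}
```

The naive version is the one interview candidates are asked for; the memoized one is a nice follow-up question about why the naive version crawls past n of 40 or so.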

At this point, I have several other thoughts about recursion – non-trivial recursion is, not surprisingly, non-trivial; and I’d like to give some of the sorting algorithms a decent write-up. But for now I think I’ll keep it short and to the point:

Recursion can be elegant, fun, and simple. We should all spend a little more time recursing.

🙂

Categories: Uncategorized

Three things I love about python

September 23, 2010

I overheard a coworker spewing hate over python a few weeks back – he really was going on about it, and I honestly couldn’t understand why he needed to hate it so badly. It’s just a language; if you don’t like it, don’t use it. He was ranting about it so much that it made an impression on me, and later that night I got to thinking and remembered several things I really like about python.

Eight years ago I stumbled across python for a personal project, and I quickly grew to love its simplicity and power. Then I got distracted by school, life, and work; there was a bunch of perl and C thrown in the mix, and I forgot about python.

Recently, I picked it back up again for personal projects on github (etc) and now I wonder why I ever stopped.

Yes it’s just a programming language, it won’t make your breakfast for you, and it doesn’t really matter if you like it, think it smells funny, think .NET is better, etc. I do want to take a moment and share three things I really like about python, three reasons why I think it’s better than sliced white bread. 🙂

Whitespace Matters

I know that lots of people hate python for this very reason, but honestly after seeing so many ugly, obfuscated lines of C and hearing endless debates over where to put the curly braces and so on – there’s something so completely liberating and beautiful about

if len(mystring) > threshold:
    do_stuff()

Yes, you’re not as free to define your own coding style as you would be in C – but you also *never* have to look at something awful like

if(strlen(mystring) > threshold){ do_stuff();
  }
else  {
do_some_other_stuff(); }

And anyone who spends a lot of time in C knows this is a trivial and not too bothersome example.

Compiled Python

Another favorite complaint about python is performance. Your boss/friend/coworker/neighbor makes some ridiculous blanket statement like “python is so slow, look how much faster X is.”

Well, they’re probably right. So if you’re writing filesystems or operating systems, or if performance matters to you that much, then use language X (usually assembly, C, or D). If your app isn’t controlling a cruise missile or processing high-frequency trading & bank transactions – then maybe python is fast enough.

Or at least, that used to be the case. With compiled python, I’m not so sure anymore. There are clearly cases where you can use python to speed up development, write cleaner code, and then get pretty respectable performance without using a lower level language. Heck, some people even use python for firmware.

Irrespective of performance, there are other benefits to compiled python – from what I understand, the python interpreter (aka virtual machine) gets embedded into your “executable”, thus alleviating the requirement that python be installed on every machine you want your code to run on. Standalone executables are sometimes very nice to have and can help simplify deployment of your application, especially on non-Linux/Unix operating systems.

Interactive Mode

Like most of the other things in this post, this feature isn’t unique to python. Ruby and most modern dynamic languages have some kind of interactive mode / interpreter. I love being able to try out code interactively; it really makes a difference for me when developing code in python. There’s something totally awesome about being able to do the following:

$ python
Python 1.5.2b2 (#1, Feb 28 1999, 00:02:06)  [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> if 0 != 1:
...     print "Zero is not one! Awesome!"
Zero is not one! Awesome!
>>>

(Note: python startup header copy/pasted from python 2.5 tutorial)

Again, this example is pretty trivial (like most examples in my blog) but hey, you get what you pay for. I really love the interactive mode, it lets me try out an idea or snippet of code independent of the surrounding code it will eventually be embedded in. It’s almost like having a temporary, quick, throw-away unit test for mini code snippets. For me, it’s yet another reason to like python and ruby.

Thanks for reading, comments are always welcome.

atto

Categories: Uncategorized

Two things I hate about debugfs

August 29, 2010

I am a major fan of the Linux kernel’s debugfs filesystem. For those not familiar with its awesome power, read these LWN articles from 2004 and 2009, and the Wikipedia entry.

I’ve written IOCTLs in the past, and always felt like the stereotypical monolithic switch statement was a bit lame – so once I ran across debugfs, I kicked ioctls to the curb and never looked back. There’s something so completely elegant and awesome about a simple

$ cat /sys/kernel/debug/mymod/var1; echo 'erase' > /sys/kernel/debug/mymod/buffer2

But as I get more and more into debugfs, two things keep irritating me and hence this post.

Gripe #1: EXPORT_SYMBOL_GPL

Why in the world is debugfs exported as GPL only? Do the kernel developers really think that people who write proprietary modules don’t need debugfs? Or do they really want people abusing /sys or /proc files even further, leading to more mess and chaos?

Or, is it yet another manifestation of Linux’s split-personality reaction to non-GPL code? After all, the core kernel developers still can’t decide if non-GPL kernel modules are a violation of the GPL… so I shouldn’t be surprised to see something as awesome and inspiring as debugfs locked down to GPL-only exports.

But I am surprised, and it does grate on me. I love debugfs but hate that it’s exported GPL only.

Gripe #2: boilerplate code

Now this one is a bit more subjective, and quite possibly a sign that I’m doing something wrong… but I find myself writing a lot of boilerplate code. Sure, debugfs makes it super trivial to read/write simple variables —

struct dentry *my_foo = debugfs_create_x32(name, perms, root, u32_ptr);

But creating a more complex file with custom logic requires a struct file_operations which looks quite boilerplate:

static const struct file_operations file1_ops = {
    .owner = THIS_MODULE,
    .open = file1_open,
    .release = file1_release,
    .read = file1_read,
    .write = file1_write,
};

That’s not so bad when you’re creating one debugfs file, but try creating 5 or 10 or 20. After a while, all the structs look very, very similar, and all the functions look the same too. Open sets the file pointer’s private data to the inode’s private data and possibly allocates some RAM. Read does a simple_read_from_buffer. Write kmallocs a buffer, copies from userspace, then does a strncmp to see if the user echo’d the right magic token to kick off some logic. Release undoes whatever open did.

So while I love the flexibility and power of debugfs, the boilerplate code lends to many copy/paste errors and leaves me wishing debugfs would handle more for me in the normal boring cases. Or at least add some new file types… debugfs_create_text_buffer() anyone?

In Summary

In summary, I love debugfs. I’m a die hard Linux command line freak and debugfs fits me like a glove. But my experience with debugfs hasn’t been completely Utopian, and there are a couple of annoyances that leave me wanting a little more polish.

Hmmm…. wait… it’s open source right?

Maybe I’ll chip in some code, so I’m not just the whiny poser throwing stones from the corner.

Best

Atto

Categories: Uncategorized