True Zero-Copy with XIP vs PRAMFS

November 24, 2010

Introduction

A couple of weeks ago, Louis Brandy wrote a blog post entitled “Memory mapped IO for fun and profit,” which I found rather fascinating. I’ve always loved memory mapped files, for a lot of the reasons listed on his blog. It really makes your software simpler to be able to treat a chunk of memory like… a chunk of memory, letting the OS handle the IO behind the scenes as deemed appropriate.

I really liked his blog post; it was a great example of a use case where mmap() didn’t just make the code easier to debug and understand, it actually provided a real, noticeable performance boost.

But one thing that’s always bugged me about MMAP’d files is that they really aren’t “zero copy.” They’re more like “one copy,” because you still have to fault and fetch/flush pages via the normal IO mechanism. What if there was a way to actually perform zero-copy MMAP’d files? Granted, you either need RAM or something else that is direct-mapped into the processor (PCM/CMOx/Memristors anyone?), but I’d like to test my theory that this could lead to higher performance than the current mechanism.

The diagram below shows the various paths that files can take through the system; I’ve tried to color-code it, but probably made a mess of it.

The green arrows show the normal I/O flow with two copies of the data retrieved from storage, the red arrows show normal MMAP’d files, and the blue line labelled “mmap + direct_access” is what I’m all keen to test.

The basic idea I’m trying to convey is that conventional IO (either from a RAM disk or a regular SATA HDD) involves a fair amount of copying. Data is copied into the user program’s stack/heap, but it’s also copied into the page cache. So mmap’d files are neat because you can skip some of the copying and do I/O behind the scenes to flush dirty pages, etc. But what I’m after is a true zero-copy, which requires a RAMdisk.
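To make the “one copy” point concrete, here’s a bare-bones sketch of the regular mmap’d read path in C. The file name is hypothetical and error handling is minimal; the point is that the program just reads memory, while the kernel faults each page in from storage through the page cache behind the scenes.

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
        /* hypothetical data file, just for illustration */
        int fd = open("/tmp/data.bin", O_RDONLY);
        struct stat st;

        if (fd < 0 || fstat(fd, &st) < 0)
                return 1;

        /* map the whole file; reads now look like plain memory accesses,
         * but each first touch still faults the page in via the page cache */
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
                return 1;

        long sum = 0;
        for (off_t i = 0; i < st.st_size; i++)
                sum += p[i];
        printf("sum=%ld\n", sum);

        munmap(p, st.st_size);
        close(fd);
        return 0;
}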

I did some digging, and found the following snippet from Documentation/filesystems/xip.txt:

A block device operation named direct_access is used to retrieve a reference (pointer) to a block on-disk. The reference is supposed to be cpu-addressable, physical address and remain valid until the release operation is performed.

I did some more digging, and I think what I’m really looking for is like ext2’s XIP feature — but not so much from an “I want to execute this code from NOR” standpoint, more like I want MMAP’d files which are directly addressable by the CPU.

And then there’s PRAMFS, a new-ish filesystem which doesn’t use block devices or the block layer at all – preferring to directly control the target [RAM/PRAM/???] with some page table protection thrown in for fun.

So, I set out to test which of these three methods gives the best bang for our proverbial buck.

People who just want to see the graphs / data, feel free to skip to the conclusion.

The Contenders

In no order of preference, here are the various IO methods I plan to test:

  • ext2 + direct_access
  • pramfs
  • mmap() + block IO
  • libaio + block IO [no mmap]

My current plan is to use the fio benchmark with the “mmap” IO engine, because a) I have no useful examples / use cases, b) fio really is a flexible IO tester 🙂

Setting up the ext2 + direct_access method was actually quite involved: I decided to write a custom RAMdisk driver with direct_access support (xiprd), and I ended up rebuilding my kernel for ext2+XIP support… I’m not going to go into detail on getting this set up and working; google for “how to compile a kernel the Ubuntu way” if you’re curious.
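For the curious, here’s roughly the shape of the direct_access hook such a driver needs. This is a minimal sketch of the 2.6.36-era block_device_operations interface as I understand it, not the exact xiprd code; the globals are stand-ins, and it assumes a kmalloc’d (physically contiguous) backing buffer.

#include <linux/module.h>
#include <linux/blkdev.h>
#include <linux/mm.h>

/* Hypothetical globals standing in for the real driver state */
static void *xiprd_ramdisk;        /* kmalloc'd, physically contiguous */
static unsigned long xiprd_size;   /* size in bytes */

static int xiprd_direct_access(struct block_device *bdev, sector_t sector,
                               void **kaddr, unsigned long *pfn)
{
        unsigned long offset = sector << 9;

        if (offset >= xiprd_size)
                return -ERANGE;

        /* Hand back a kernel virtual address plus the matching page frame
         * number; the filesystem's XIP code maps that pfn straight into
         * the user's address space instead of going through the page cache. */
        *kaddr = xiprd_ramdisk + offset;
        *pfn   = page_to_pfn(virt_to_page(*kaddr));
        return 0;
}

static const struct block_device_operations xiprd_fops = {
        .owner         = THIS_MODULE,
        .direct_access = xiprd_direct_access,
};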

Benchmark machine setup:

  • i7-920 @ 2.67 GHz
  • 6GB DDR3 memory
  • Ubuntu 10.04 with vanilla kernel 2.6.36.1 + ftrace + ext2 XIP

ext2 + direct_access

Once I got my custom kernel with ext2 “xip” (execute in place) support built in, I loaded my custom block device driver (xiprd) with a 2 GB RAM-disk. Of course, trying to run the fio MMAP benchmark resulted in a crash and a system reboot… followed by a harried bug hunt.

This is still a work in progress; I think the bug has something to do with my using vmalloc() instead of kmalloc(), but switching to kmalloc() limits my RAMdisk size to 1MB. Using vmalloc, I keep getting the following error:


page_offset=14
[ 1720.844373] ramdisk=ffff880106e00000, kaddr=ffff880106e0e000, pfn=1076750
[ 1720.954378] fio[3749]: segfault at 7fe0265b9328 ip 000000000041b17e sp 00007fff43825f40 error 4 in fio[400000+3b000]

Hacking xiprd to use kmalloc, I then had to force mkfs.ext2 to use a block size of 4K (-b 4096), otherwise the mount failed.

$ sudo fio --bs=4k --ioengine=mmap --iodepth=1 \
--numjobs=10 --size=80k --direct=1 --runtime=10 --directory=/mnt/xip/ \
 --name=rand-read --rw=randread --group_reporting --time_based
(laying out files)
rand-read: (groupid=0, jobs=10): err= 0: pid=4363
  read : io=21079MB, bw=2108MB/s, iops=539567, runt= 10001msec
    clat (usec): min=2, max=28356, avg=13.28, stdev=53.77
  cpu          : usr=12.75%, sys=55.71%, ctx=31149, majf=0, minf=5396522
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=5396212/0, short=0/0
     lat (usec): 4=0.05%, 10=48.70%, 20=50.97%, 50=0.07%, 100=0.01%
     lat (usec): 250=0.07%, 500=0.03%, 750=0.01%, 1000=0.01%
     lat (msec): 2=0.02%, 4=0.02%, 10=0.02%, 20=0.01%, 50=0.01%

Run status group 0 (all jobs):
   READ: io=21079MB, aggrb=2108MB/s, minb=2158MB/s, maxb=2158MB/s, mint=10001msec, maxt=10001msec

OK, 2.1 GB/s of random read bandwidth and an average round-trip latency of 13.28 us using direct_access, not too shabby. However, this is still using a 1MB RAM disk, so I need to look into fixing the vmalloc issues.

Update: I tried booting with mem=3072m, and hacked xiprd to use __va(0x10000000) instead of vmalloc’ing. I also tried an ioremap_cache(), which failed. Using __va() appeared to work, but the machine rebooted when I tried to make the filesystem, so this is definitely a WIP.
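While I chase the bug, here’s my working theory about where the vmalloc version goes wrong, written out as code. The helpers (virt_to_page, vmalloc_to_pfn, __va) are real kernel APIs, but the xiprd_offset_to_pfn wrapper and its names are made up for illustration, and this is a hypothesis rather than a verified fix: each backing store needs a different address-to-pfn translation, and virt_to_page() is only valid for the linear mapping.

#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Working hypothesis, not a verified fix: how the offset-to-pfn math
 * would differ for each backing store I've tried. */
enum xiprd_backing { XIPRD_KMALLOC, XIPRD_VMALLOC, XIPRD_RESERVED };

static unsigned long xiprd_offset_to_pfn(enum xiprd_backing how, void *base,
                                         unsigned long phys_base,
                                         unsigned long off)
{
        void *kaddr = base + off;

        switch (how) {
        case XIPRD_KMALLOC:
                /* linear mapping, physically contiguous: virt_to_page() is valid */
                return page_to_pfn(virt_to_page(kaddr));
        case XIPRD_VMALLOC:
                /* only virtually contiguous: virt_to_page() gives garbage here,
                 * which may be why fio faults; translate page-by-page instead */
                return vmalloc_to_pfn(kaddr);
        case XIPRD_RESERVED:
        default:
                /* RAM hidden with mem=3072m and mapped via __va(phys_base) */
                return (phys_base + off) >> PAGE_SHIFT;
        }
}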

mmap + pramfs

Work in progress: I patched pramfs into the kernel source, rebuilt it, rebooted with mem=3072m, and tried creating a pramfs filesystem with:

sudo mount -t pramfs -o physaddr=0x100000000,init=2000M,bs=4k none /mnt/pram

When I boot the kernel with only 3GB of RAM, 0x100000000 is a valid system RAM address which is not in use – so this works fine. But when I run fio against it, my system locks up when laying out the last file. So this is still a work in progress; maybe I ran past the end of the RAM, or who knows. I’ll keep trying and update if I find anything useful.

I tried a few more times with the mmap and libaio engines, but couldn’t get it to work – so I filed bug 3118126 for the curious.

UPDATE: I finally got back around to compiling PRAMFS without memory protection and XIP. I guess the kernel doesn’t like setting PTEs for RAM it’s not tracking, which makes sense.

Results for libaio:

sudo fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=10 \
 --size=180m  --direct=1 --runtime=60 --directory=/mnt/pram/ \
 --name=rand-read --rw=randread --time_based --group_reporting
(laying out files, etc)
Jobs: 10 (f=10): [rrrrrrrrrr] [100.0% done] [841M/0K /s] [210K/0 iops] [eta 00m:00s]
rand-read: (groupid=0, jobs=10): err= 0: pid=1741
  read : io=49262MB, bw=840623KB/s, iops=210155, runt= 60008msec
    slat (usec): min=33, max=30023, avg=37.08, stdev=14.81
    clat (usec): min=0, max=20027, avg= 0.41, stdev= 1.21
    bw (KB/s) : min=48520, max=105888, per=12.47%, avg=104790.37, stdev=408.00
  cpu          : usr=2.49%, sys=77.28%, ctx=55025, majf=0, minf=356
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=12611029/0, short=0/0
     lat (usec): 2=99.98%, 4=0.02%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (usec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%

Run status group 0 (all jobs):
   READ: io=49262MB, aggrb=840623KB/s, minb=860798KB/s,
    maxb=860798KB/s, mint=60008msec, maxt=60008msec

For the impatient, that’s ~840 MB/s with libaio.

How about mmap + PRAMFS?

sudo fio --bs=4k --ioengine=mmap --iodepth=1 --numjobs=10 \
--size=180m --direct=1 --runtime=60 --directory=/mnt/pram/ \
 --name=rand-read --rw=randread --time_based --group_reporting
(cut)
Jobs: 10 (f=10): [rrrrrrrrrr] [100.0% done] [6827M/0K /s] [1707K/0 iops] [eta 00m:00s]
rand-read: (groupid=0, jobs=10): err= 0: pid=2260
  read : io=382030MB, bw=6367MB/s, iops=1630K, runt= 60001msec
    clat (usec): min=1, max=22316, avg= 4.35, stdev=21.06
    bw (KB/s) : min=44208, max=347336, per=1.31%, avg=85323.68, stdev=525.67
  cpu          : usr=48.01%, sys=31.28%, ctx=60143, majf=460800, minf=97339361
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=97799801/0, short=0/0
     lat (usec): 2=0.01%, 4=38.14%, 10=61.33%, 20=0.01%, 50=0.47%
     lat (usec): 100=0.03%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%

Run status group 0 (all jobs):
   READ: io=382030MB, aggrb=6367MB/s, minb=6520MB/s, maxb=6520MB/s, mint=60001msec, maxt=60001msec

Awesome! 6.3 GB/s of read speed using MMAP. I’m totally loving MMAP, it rocks.

mmap + block IO

I chose ext4 for this test, because it’s the most prevalent at the time of this writing and because rumor pegs it as being more stable than btrfs.

I loaded my xiprd driver so it would be a fairer comparison – RAM disk against RAM disk; hopefully fewer oranges and more apples to apples.


$ sudo fio --bs=4k --ioengine=mmap --iodepth=1 --numjobs=10 --size=180m \
 --direct=1 --runtime=60 --directory=/mnt/xip/ --name=rand-read \
--rw=randread --time_based --group_reporting
(laying out files)
rand-read: (groupid=0, jobs=10): err= 0: pid=2504
  read : io=392968MB, bw=6549MB/s, iops=1677K, runt= 60001msec
    clat (usec): min=1, max=24708, avg= 4.89, stdev=33.22
    bw (KB/s) : min=188848, max=367640, per=3.86%, avg=258557.40, stdev= 0.00
  cpu          : usr=49.12%, sys=30.48%, ctx=55807, majf=460797, minf=100139571
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=100599920/0, short=0/0
     lat (usec): 2=0.01%, 4=29.03%, 10=70.76%, 20=0.17%, 50=0.03%
     lat (usec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%

Run status group 0 (all jobs):
   READ: io=392968MB, aggrb=6549MB/s, minb=6707MB/s, maxb=6707MB/s, mint=60001msec, maxt=60001msec

The main highlights here are the 6549 MB/s of random read bandwidth, and the average completion latency (clat) of 4.89 usec. That completely blows direct_access+MMAP out of the water, and is pretty awesome for an MMAP'd ramdisk.

libaio + block IO

Just for complete and total fairness, I re-ran fio against my RAM disk with libaio (async IO) instead of MMAP:

 $ sudo fio --bs=4k --ioengine=libaio --iodepth=1 --numjobs=10 \
--size=180m --direct=1 --runtime=60 --directory=/mnt/xip/ \
--name=rand-read --rw=randread --time_based --group_reporting
(laying out files)
Jobs: 10 (f=10): [rrrrrrrrrr] [100.0% done] [4521M/0K /s] [1130K/0 iops] [eta 00m:00s]
rand-read: (groupid=0, jobs=10): err= 0: pid=4835
  read : io=264571MB, bw=4410MB/s, iops=1129K, runt= 60000msec
    slat (usec): min=2, max=32699, avg= 6.53, stdev=33.72
    clat (usec): min=0, max=21591, avg= 0.66, stdev=10.33
    bw (KB/s) : min=161100, max=368408, per=6.85%, avg=309377.89, stdev=12854.65
  cpu          : usr=18.43%, sys=60.72%, ctx=74361, majf=0, minf=367
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=67730119/0, short=0/0
     lat (usec): 2=99.84%, 4=0.15%, 10=0.01%, 20=0.01%, 50=0.01%
     lat (usec): 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
     lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%

Run status group 0 (all jobs):
   READ: io=264571MB, aggrb=4410MB/s, minb=4515MB/s, maxb=4515MB/s, mint=60000msec, maxt=60000msec

Skipping to the bottom line, using an MMAP'd RAMdisk gave ~2 GB/s more bandwidth and ~2 us lower latency. That is pretty awesome.

Conclusion(s)

UPDATE: I finally added some PRAMFS numbers below. Enjoy!

Regular, memory-mapped files win by a small amount when it comes to bandwidth -- but PRAMFS is a very close contender.

How about latency?

Regular mmap'd files ALMOST win here also. PRAMFS wins by a hair on this test, but only when using MMAP - PRAMFS + libaio is rather... not good. This is *not* what I expected, but is rather fascinating. Granted, all of this data is collected against a 2 GB RAM disk... but it is pretty interesting that regular bio-based transfers between the page cache and my RAM disk beat out a 1MB mmap'd, kmalloc'd, direct_access()'d RAMdisk.

What does this mean in real life?

I have no idea... probably the only real conclusions you can draw from all of these graphs and data are something like:

  1. Memory-mapped files can be faster for some workloads
  2. XIP is no match for regular MMAP'd files, in its present state

Oh, and here's a link to the spreadsheet (mmap_fio_data.xls) I used to generate the above graphs.

Atto

Extreme Dollhouse Programming

November 13, 2010

My kids woke up at 5am on Saturday morning, which was not surprising or particularly unusual.

Being the awesome, totally cool dad that I am :-), I let my wife sleep in while I entertained the kids.

Also being a geek / engineer, it obviously wasn’t good enough to just play with my kids’ toys… I soon found myself balancing dollhouse people on top of dollhouse furniture on top of dollhouses.

Ladies and Gentlemen, without further ado I give you… Extreme Dollhouse Programming.

I learned three things from playing with dollhouse toys:

  1. engineer + dollhouse = weird things happen
  2. it’s really hard to balance odd shapes while kids are trying to knock them down
  3. a lot of software is built just like these rickety furniture stacks

On #1, what else can I really say… weird stuff happens when you let engineers out of their cubicles. My wife tells me that I should get my head examined, because normal people might play house or act out episodes from Lifetime TV. Leave it to an engineer to stack Grandma, 6 chairs, and a toilet up in the sky.

#2 almost goes without saying… but it’s quite entertaining to see just how fast you can rebuild your tower before it gets knocked down by your toddler. Think of it like a game of reverse speed Jenga. Someday it’ll be a competitive Olympic sport… I can almost see my gold medal now 😉

And #3 is my lame excuse of a tie-in to justify this post.

But seriously though, how many projects have you worked on where you felt like the whole project could come crashing down at any minute? How many complex software systems are thrown together at high speed, held together by baling wire and ugly Perl scripts? How many projects have no formal requirements, or worse yet no real customers?

How many software projects are really, carefully, methodically planned and executed in an elegant way?

This is not a critique / rant where I tear into the software industry and make stupid arguments like “software developers suck” — I’m thinking more about the way I approach software development, and thinking we all have room for improvement.

And while there are some great ideas found in eXtreme Programming, Agile, etc – I don’t think there’s one true software development style or approach. More like there are some good ideas out there, and everyone should use these ideas to improve themselves and get better at the “craft.”

So here’s to building better dollhouses, better software, self-improvement, and all that jazz.

Atto


Joel on Software

November 3, 2010

I’m sitting at the Garden Court hotel in Palo Alto, California where there are no less than three bubbling fountains within earshot. I’m seated in an open-air courtyard (see photos below) where I’m impatiently waiting for 3:00 to roll around.

Joel Spolsky of “Joel on Software” fame is doing a world tour to pitch FogBugz and Kiln. I’m not familiar with FogBugz (I’ve been a JIRA/Crucible user for a while), but I’m mildly excited to hear about it, especially if he demos Evidence Based Scheduling. Kiln, well, yes, I am very curious to find out more about what they’ve done with Mercurial, especially with the recent announcement that Atlassian purchased BitBucket. I’ve been wondering for years why people spend so much time worrying about revision control tools and forget about the whole rest of the stack – GitHub is a perfect example of going beyond just revision control to include the whole change flow from code to bugs to review and so on.

So yes, I’m excited to learn more about Kiln… But the real reason I’m here is Joel.

It’s silly I know, but I’ve been following Joel’s blog for a number of years now and I’m sad that he’s stopped. I’m sure that after 10 years of blogging he’s on to new and better things – but Joel has always stood out in my mind as someone who makes things happen, someone who makes a difference. Am I personally a better developer because of Joel’s ramblings? Maybe, hard to prove. But he made me think new thoughts and helped me to see the world in a slightly different light, see things from another perspective. Do I agree with his thinking that .NET is awesome and Open source is pointless/misguided? Not in the slightest. But I am excited to finally meet the man who challenged my way of thinking and ultimately made me a better person through his long and analogy-ridden blog posts.

Well it’s getting close to the event, so I’m going to go join in the fun.

Best,

Atto

(Photos: fountain, courtyard, iPad)


Installing Linux

October 31, 2010

I still have a hazy recollection of my very first experience installing Linux.

My brother and I had read about Linux on some bulletin board (or maybe it was a library book, not sure), and we bought slackware 3.x CDs from some online retailer. Remember the days when you had to buy CDs because downloading 700 MB was completely and totally impossible over a 9600 baud modem? Man, those were the good old days.

We waited and waited for those CDs to come in the mail, and it took *forever*. Well, it felt like forever to me and my brother, but maybe it was a week or so.

We ripped open the packaging, threw the CD into the drive, and booted into the (curses-based) text installer. A few steps later, it asked us which drive partition to install Linux into. Partition? What’s that, some kind of privacy screen? Being the clueless noobs that we were, we told slackware to blow away the existing partitions and create new ones. Little did we know we had just tossed all our schoolwork, and 8 months of work on a 3D game engine we thought would make us rich and famous.

The next few hours were pretty stressful, as we ran all over the house looking for anything we might have backed up to a 3.5″ floppy, then we searched all the 5″ floppies. No dice, all our code was gone.

On the bright side, I stopped spending all my free time on Pascal and 3D models and started focusing more on my homework. Also, we both got completely and totally hooked on Linux. Something about the whole experience either scared us into learning more, or intrigued us – we spent the next several years fiddling with every *nix distro we could get our hands on. I’ve installed (in no particular order) Slackware, Debian, Knoppix, Gentoo, FreeBSD, OpenBSD, NetBSD, Ubuntu, MeeGo, Fedora, RedHat, CentOS, OpenSUSE, SLES, and probably some I’ve forgotten.

Along the way, I’ve picked up some useful skills that I turned into a career which I owe largely to the open nature of Linux. But that’s a story for another day.

Fast forward a bit to 2010. I recently made the switch back to Linux from Windows… on my work laptop.

I’ve been using Linux fairly consistently for about 15 years, but a lot of that time it’s been on desktops and/or servers. I’ve tried all kinds of Linux distros on laptops, but nothing ever seemed to work right. The video card didn’t work, or the wireless driver crashed, whatever. And even assuming the drivers were fine and everything worked on boot, let’s face it – there’s a lot of reasons Linux is in the minority of desktop OSes.

But I digress. I installed Ubuntu 10.10 (aka Maverick Meerkat) on a thinkpad, and I was pleasantly surprised. The installer was nice looking, easy to navigate. The partitioning wizard automatically resized my Windows partition so I can dual-boot if I ever need to. Installing the graphics card driver was painless, the sound card works out of the box, and I was pretty much blown away by how accessible Linux is on 10.10.

I guess the proper way to summarize it is that Ubuntu makes Linux painless, or at least nearly painless. Evolution/Exchange support is still buggy as all get out, plugging into my docking station doesn’t change my monitors, and suspend/resume/hibernate always crashes… but all told, Linux a la Ubuntu has come a long, long way.

By way of reference, I’ve had to install Fedora, CentOS, and OpenSUSE recently, and not much has changed in 10 years. The desktop is still ugly and inaccessible, the package managers rely on mirrors which are broken more often than not – and OpenSUSE refused to recognize my existing partitions; I had to boot an Ubuntu LiveCD to delete them before OpenSUSE would install.

Also, as an interesting data point, I found a Ubuntu 6.06 LiveCD lying around and booted it up for old times’ sake. Wow, I forgot how terrible Linux was as a desktop OS back in the day. I’ve always been a Linux junkie and I LOOOOOOVE the command line… but I am very glad to see the progress of Ubuntu and I have high hopes for Unity on the desktop.

So that’s all really, just a rambling long story about my experiences installing Linux. I spend most of my time fiddling around in the kernel, but it’s amazing how much impact minor usability improvements can have for even a kernel hacker / developer like me.

Here’s hoping the future of Linux is even brighter, and that we’ll see amazing user interfaces and other improvements that make Ubuntu 10.10 look pathetic.

Rock on Canonical, rock on.

Atto


Array map in C

October 16, 2010

Last year, I got “The Ruby Way” (link) as a gift to myself. I’m not a major Ruby freak, but there are some cool things about it and I really enjoy Ruby in general.

As I read my way through the book, I remember reading about array map. Array map is cool; it lets you do things like

my_strings.map { |s| s.strip }

Which is roughly equivalent to

new_arr = []
my_strings.each { |s| new_arr << s.strip }

Of course, this is kindof a stupid example and you can do MUCH more interesting things with Array#select or Array#reject – like write a one-liner to find all the phone numbers in an array that start with 351 or don’t hash to the same bucket, whatever. Anyway, I remember being really excited about array#map for some reason when I read that chapter. And whenever I get excited about something… weird things tend to happen. 🙂

See, I spend a lot of time writing programs in C. It’s a great language, and pretty much the only option in the kernel/filesystem/embedded world. So when I see a cool feature in another language (like Array map), I think “gee, wouldn’t it be awesome to have that in C.”

Being a frequent C coder, I do a lot of things with/to/because of arrays – and the two most common things I do with arrays are indexing and looping. Indexing is cool because it’s an O(1) operation, and that’s great for writing fast code that uses a lot of RAM. Looping, on the other hand… well, its usefulness depends on the problem at hand, but this is pretty common:

for(i = 0; i < count; i--);
   do_stuff_with(&my_array[i]);

Ten bonus points if you can spot the subtle bugs I put into this trivial for loop. Hint #1: there are three of them. Hint #2: the program (or kernel/firmware/driver) will not do anything useful. Beyond the obvious problems with this code (and how easy it is to accidentally slip in the extra semicolon), writing for loops gets really tedious and old sometimes.

So I keep thinking that it’d be neat to write a C library to implement Array map, select, reject. But then I start thinking about the code. To do this properly, you could use macros, something like

#define ARRAY_MAP(arr, size, func)                       \
        for(int i=0; i<(size); i++) (func)(arr[i]);
.
.
u32 *my_arr = malloc(sizeof(u32)*10);
memcpy(my_arr, arr1, sizeof(u32)*10);
ARRAY_MAP(my_arr, 10, do_stuff);

Well, that looks like junk and doesn’t really do much. You could implement it with callbacks to avoid the evils of macros, but that doesn’t help you much either. You still have to handle memory allocation, deallocation, type safety, and in the end the code doesn’t look any simpler.
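Just to show why the callback flavor doesn’t buy much either, here’s a rough sketch; array_map_u32 and double_it are names I made up for illustration.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical callback-based "map": apply func to each element of src,
 * writing the results into a freshly malloc'd array the caller must free. */
static uint32_t *array_map_u32(const uint32_t *src, size_t count,
                               uint32_t (*func)(uint32_t))
{
        uint32_t *dst = malloc(count * sizeof(*dst));
        if (!dst)
                return NULL;
        for (size_t i = 0; i < count; i++)
                dst[i] = func(src[i]);
        return dst;
}

static uint32_t double_it(uint32_t x) { return x * 2; }

int main(void)
{
        uint32_t nums[] = { 1, 2, 3, 4, 5 };
        uint32_t *doubled = array_map_u32(nums, 5, double_it);

        if (doubled) {
                for (size_t i = 0; i < 5; i++)
                        printf("%u ", (unsigned)doubled[i]);
                printf("\n");
                free(doubled);
        }
        return 0;
}

It works, but it’s welded to one element type, and allocation, cleanup, and error handling are still the caller’s problem; that’s exactly the boilerplate Ruby hides for you.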

Not to mention, for anything really high performance (filesystems, OSes, etc) you generally want to avoid looping over arrays in general – the Linux kernel now has a native hash table and circular buffer (among other fun data types). If performance is critical, use a better/faster/cooler algorithm and a more appropriate data type.

So I’m back to square one, array map in C is pointless. If you don’t care about runtime, just use arrays and loops, or use Python or Ruby.

But array map/select/reject is so cool… wouldn’t it be neat in C?

infinite_loop:
   printk(KERN_ERR "/me wants array_map in C");
   goto infinite_loop;

Best,

Atto


Simple Recursion

October 2, 2010

I started writing this post last week; then an update to the WordPress app crashed and I lost the draft. With any luck, this post will be better than last week’s post would’ve been 🙂

A few weeks back, there was an article or two posted to digg/reddit/slashdot about recursion. The basic bottom line was something to the effect that 90% of programmers fail at simple recursion, which is both surprising and expected.

Surprising, because simple recursion really is… simple recursion. Most coders have seen the infamous Fibonacci series, and the (pseudo) code is quite simple:

def fib(n):
   if n == 1 or n == 0:
      return 1
   else:
      return fib(n-1) + fib(n-2)

You have a simple base case (two of them, really: n is one or zero) to terminate the recursion, and if the inputs are not trivial to compute then you break the problem down into smaller problems and “recurse.” In the Fibonacci example above, it’s pretty simple code to compute a simple numeric series… so it surprises me that coders can and do routinely fumble on this exact problem. I have personally seen multiple people fail this question in an interview, even with significant prompting. So it shouldn’t surprise me, but it does.

Now some people will argue with me and say the Fibonacci series is not a good question because it’s too (basic, academic, fill in the blank). And generally speaking, they’re correct – the Fibonacci series is a pretty basic, boring problem which is solvable with simple, basic code. So I am definitely interested in more complex, more advanced code which tackles more complex problems (merge sort, anyone? there’s a sketch below). However, someone who doesn’t get the Fibonacci series (a classic recursion example) is very unlikely to ace more challenging questions.
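For a taste of what I mean by a more interesting recursion question, here’s a quick, hand-rolled merge sort in C (just my own sketch, not from any of the articles): split the array, recurse on each half, then merge the sorted halves.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Recursive merge sort with a caller-supplied scratch buffer of n ints.
 * Not tuned at all, just illustrative. */
static void merge_sort(int *a, int *scratch, size_t n)
{
        if (n < 2)              /* base case: 0 or 1 elements is already sorted */
                return;

        size_t mid = n / 2;
        merge_sort(a, scratch, mid);
        merge_sort(a + mid, scratch, n - mid);

        /* merge the two sorted halves into scratch, then copy back */
        size_t i = 0, j = mid, k = 0;
        while (i < mid && j < n)
                scratch[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
        while (i < mid)
                scratch[k++] = a[i++];
        while (j < n)
                scratch[k++] = a[j++];
        memcpy(a, scratch, n * sizeof(*a));
}

int main(void)
{
        int nums[] = { 5, 1, 4, 1, 5, 9, 2, 6 };
        size_t n = sizeof(nums) / sizeof(nums[0]);
        int *scratch = malloc(n * sizeof(*scratch));

        if (!scratch)
                return 1;
        merge_sort(nums, scratch, n);
        for (size_t i = 0; i < n; i++)
                printf("%d ", nums[i]);
        printf("\n");
        free(scratch);
        return 0;
}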

At this point, I have several other thoughts about recursion – non-trivial recursion is, not surprisingly, non-trivial; and I’d like to give some of the sorting algorithms a decent write-up. But for now I think I’ll keep it short and to the point:

Recursion can be elegant, fun, and simple. We should all spend a little more time recursing.

🙂


Three things I love about python

September 23, 2010

I overheard a coworker spewing hate over python a few weeks back; he really was going on about it, and I honestly couldn’t understand why he needed to hate it so badly. It’s just a language; if you don’t like it, don’t use it. He was ranting about it so much that it made an impression on me, and later that night I got to thinking and remembered several things I really like about python.

Eight years ago I stumbled across python for a personal project, and I quickly grew to love its simplicity and power. Then I got distracted by school, life, and work; there was a bunch of perl and C thrown into the mix, and I forgot about python.

Recently, I picked it back up again for personal projects on github (etc) and now I wonder why I ever stopped.

Yes it’s just a programming language, it won’t make your breakfast for you, and it doesn’t really matter if you like it, think it smells funny, think .NET is better, etc. I do want to take a moment and share three things I really like about python, three reasons why I think it’s better than sliced white bread. 🙂

Whitespace Matters

I know that lots of people hate python for this very reason, but honestly after seeing so many ugly, obfuscated lines of C and hearing endless debates over where to put the curly braces and so on – there’s something so completely liberating and beautiful about

if len(mystring) > threshold:
    do_stuff()

Yes, you’re not as free to define your own coding style as you would be in C – but you also *never* have to look at something awful like

if(strlen(mystring) > threshold){ do_stuff();
  }
else  {
do_some_other_stuff(); }

And anyone who spends a lot of time in C knows this is a trivial and not too bothersome example.

Compiled Python

Another favorite complaint about python is performance. Your boss/friend/coworker/neighbor makes some ridiculous blanket statement like “python is so slow, look how much faster X is.”

Well, they’re probably right. So if you’re writing filesystems or operating systems, or if performance matters to you that much, then use language X (usually assembly, C, or D). If your app isn’t controlling a cruise missile or processing high-frequency trading & bank transactions – then maybe python is fast enough.

Or at least, that used to be the case. With compiled python, I’m not so sure anymore. There are clearly cases where you can use python to speed up development, write cleaner code, and then get pretty respectable performance without using a lower level language. Heck, some people even use python for firmware.

Irrespective of performance, there are other benefits to compiled python – from what I understand, the python interpreter (aka virtual machine) gets embedded into your “executable,” thus alleviating the requirement that python be installed on every machine you want your code to run on. Standalone executables are sometimes very nice to have and can help simplify deployment of your application, especially on non-Linux/Unix operating systems.

Interactive Mode

Like most of the other things in this post, this feature isn’t unique to python. Ruby and most of the modern dynamic languages have some kind of interactive mode / interpreter. I love being able to try out code interactively; it really makes a difference for me when developing code in python. There’s something totally awesome about being able to do the following:

$ python
Python 1.5.2b2 (#1, Feb 28 1999, 00:02:06)  [GCC 2.8.1] on sunos5
Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
>>> if 0 != 1:
...     print "Zero is not one! Awesome!"
Zero is not one! Awesome!
>>>

(Note: python startup header copy/pasted from python 2.5 tutorial)

Again, this example is pretty trivial (like most examples in my blog) but hey, you get what you pay for. I really love the interactive mode, it lets me try out an idea or snippet of code independent of the surrounding code it will eventually be embedded in. It’s almost like having a temporary, quick, throw-away unit test for mini code snippets. For me, it’s yet another reason to like python and ruby.

Thanks for reading, comments are always welcome

atto
