Home > Uncategorized > Array map in C

Array map in C


Last year, I got “The Ruby Way” (link) as a gift to myself. I’m not a major Ruby freak, but there are some cool things about it and I really enjoy Ruby in general.

As I read my way through the book, I remember reading about array map. Array map is cool, it let’s you do things like

my_strings.map {|s| s.strip! }

Which is roughly equivalent to

new_arr = []
my_strings.each {|s| new_arr.append(s.strip) }

Of course, this is kindof a stupid example and you can do MUCH more interesting things with Array#select or Array#reject – like write a one-liner to find all the phone numbers in an array that start with 351 or don’t hash to the same bucket, whatever. Anyway, I remember being really excited about array#map for some reason when I read that chapter. And whenever I get excited about something… weird things tend to happen. 🙂

See, I spend a lot of time writing programs in C. It’s a great language, and pretty much the only option in the kernel/filesystem/embedded world. So when I see a cool feature in another language (like Array map), I think “gee, wouldn’t it be awesome to have that in C.”

Being a frequent C coder, I do a lot of things with/to/because of arrays – and the two most common things I do with arrays are indexing and looping. Indexing is cool because it’s an O(1) operation, and that’s great for writing fast code that uses a lot of RAM. Looping, on the other hand… well, it’s usefulness depends on the problem at hand, but this is pretty common:

for(i = 0; i < count; i--);
   do_stuff_with(&my_array[i]);

Ten bonus points if you can spot the subtle bugs I put into this trivial for loop. Hint #1: there are three of them. Hint #2: the program (or kernel/firmware/driver) will not do anything useful. Beyond the obvious problems with this code (and how easy it is to accidentally slip in the extra semicolon), writing for loops gets really tedious and old sometimes.

So I keep thinking that it’d be neat to write a C library to implement Array map, select, reject. But then I start thinking about the code. To do this properly, you could use macros, something like

#define ARRAY_MAP(arr, size, func)                       \
        for(int i=0; i<(size); i++) (func)(arr[i]);
.
.
my_arr = malloc(sizeof(u32)*10);
memcpy(my_arr, arr1, sizeof(u32)*10);
ARRAY_MAP(my_arr, 10, do_stuff);

Well, that looks like junk and doesn’t really do much. You could implement it with callbacks to avoid the evils of macros, but that doesn’t help you much either. You still have to handle memory allocation, deallocation, type safety, and in the end the code doesn’t look any simpler.

Not to mention, for anything really high performance (filesystems, OSes, etc) you generally want to avoid looping over arrays in general – the Linux kernel now has a native hash table and circular buffer (among other fun data types). If performance is critical, use a better/faster/cooler algorithm and a more appropriate data type.

So I’m back to square one, array map in C is pointless. If you don’t care about runtime, just use arrays and loops, or use Python or Ruby.

But array map/select/reject is so cool… wouldn’t it be neat in C?

infinite_loop:
   printk(KERN_ERR "/me wants array_map in C");
   goto infinite_loop;

Best,

Atto

Advertisements
Categories: Uncategorized Tags: , , , ,
  1. Dan
    November 27, 2010 at 10:46 pm

    Macros aren’t evil. That’s just simple-minded name-calling.

    They can be a very helpful tool. If you want to use a macro here, embed it in an outer do { } while (0) loop, and make some local variables to snapshot your inputs.

    If you REALLY want to blow your mind, make an “X” macro, something like this:
    #define ARRAY_MAP(arr, size) do { \
    int _my_i, _my_size = size; \
    for(_my_i = 0; _my_i < _my_size; _my_i++) { ARRAY_MAP_X(_my_i, arr[_my_i]) } \
    } while (0)
    // … use it here
    #define ARRAY_MAP_X(idx, item) printf("Item at position %d: %d\n", idx, item);
    ARRAY_MAP(my_arr, 10);

    Why? This allows you to embed multi-line statements, perform an expression, call a function, pass other arguments in, etc.

    … or use Objective C with blocks. But if you're doing this in the heart of an algorithm, you can save some mips by not entering and exiting the function millions of times.

    *Note – I have only written the above code, not compiled or tested it.

    • November 27, 2010 at 10:58 pm

      You’re right about everything except the “mindless name calling” part.

      > You could implement it with callbacks to avoid the evils of macros,

      This statement != “macros are evil”, think about it

  2. Dan
    November 27, 2010 at 10:50 pm

    … one more thing. Iterating over arrays is indeed fast if you want to visit all or most of the items in the array. It’s certainly faster than iterating over all of the items in a tree (mips and cache behavior).
    I know, you assumed that only a minor bit of the data is worth considering and there’s a fast-path way to skip that, but there are cases where you want to examine or otherwise visit all data in a rarely-changing dataset, and I can’t think of a better way to do that than a flat C array.
    (Array indexing is the same as pointer math plus a dereference! The “array” access is just syntax.)

    • November 28, 2010 at 7:38 am

      Good point, some of my comments above assumed you don’t want to visit/process everything in the list. If you *do*, arrays are fast indeed. Thanks for the awesome comments, come back soon!

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: