Archive for April, 2010

Code Review for Dummies

April 24, 2010

Early on in my college days, I realized that not every “CS major” can actually write coherent code. Some people chose computers because they were pre-law, others thought it would be good money, etc. And then there were the true nerds. Kids like me who’d been coding since their early teen years, who ate/slept/breathed and even dreamed code. Kids you’d be afraid to walk on the same side of the street with… kids who think in 3D coordinates and analyze the algorithmic complexity of setting up a row of folding chairs.

I think the real shocker for me was one college professor who couldn’t actually code at all. The guy was a flippin’ genius with algorithm design, pseudocode, and dynamic programming principles – but he couldn’t write actual code that would compile and run on a real computer.

And then there was me. I was a freakin’ code rockstar. I led group projects, wrote awesome code and turned in all my projects early.

Yeah, I thought I was pretty hot stuff.

A few years later, life smacked me upside the head a few times, and I slowly, ever so slowly, came to realize that I write bugs too. It’s true that programmers have different levels of ability, and there are plenty of code “posers” who can’t code their way out of a paper bag – but everyone writes bugs, sometimes really stupid ones. And sometimes, even when code is written, tested, and validated to be functionally correct – sometimes, it’s not fast enough. Or it uses too much memory, or it’s not portable to Mac OS, or you lose all your data if the power cord gets yanked out of the wall.

Code reviews can’t solve world hunger or promote world peace, but when done correctly they save time and money. I’ve personally seen code reviews find and fix critical bugs of every flavor and type, and the process pushes code in the direction it really should be going.

I haven’t met very many professional software developers who will argue against code review… but not everyone actually does it. And even when people do reviews, sometimes the review process is fundamentally flawed and ends up wasting time and money.

And now we finally get to the topic at hand: code review for dummies.

Or, putting it another way, here are some concrete steps you can follow to make code reviews work for you instead of against you.

  1. Put time in your project schedule for code reviews
  2. Use a code review tool like Review Board or Crucible
  3. Establish sane code standards, and enforce them frequently

This list is by no means perfect, but if you follow these suggestions you will be well on your way to amazing code reviews. Let’s take a moment to discuss each idea in brief.

Schedule Time for Reviews

It seems like something so duh obvious that it’s not worth mentioning – but this can actually be the most difficult part of code review.

If you don’t schedule time for code reviews, they just don’t happen. It’s pretty rare for a developer to spontaneously conduct a code review or code walkthrough instead of writing new code. And if your modus operandi is like that of most software teams, your project is weeks or months overdue and it’s down to crunch time… not the kind of environment most developers find conducive to reviews. That is, unless there’s significant managerial support and/or prodding.

Use a Code Review Tool

I literally shudder whenever I hear the words “Fagan code inspection.” If you’re not familiar with Fagan or his uber-complicated, much-abused process – go read about it.

I think the concept of a Fagan inspection is really great – for software projects that are adequately funded, with teams that execute the method perfectly, and have long software development cycles. And I’ve never seen all of those conditions met. Nobody ever comes to the meeting having read the code, and invariably it turns into a drawn out brawl over variable naming conventions.

Other approaches to actually conducting a review include developers emailing patches around, and “code walkthroughs.”

Emailing patches is not too bad, but it only works if all your developers know what a patch is and how to use one effectively. And even then, you have no permanent record of the comments or decisions from the review. Well, technically you have a bunch of emails, but the whole process is rather manual and makes any actual reporting or tracking of patches cumbersome.
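
For the curious, the patch-by-email round trip looks roughly like this – the file names here are invented for illustration:

$   svn diff > fix.patch              # SVN: capture your working copy changes
$   patch -p0 < fix.patch             # reviewer applies it to their working copy
$   git format-patch -1 HEAD          # git: export the latest commit as a patch file
$   git am 0001-fix-null-deref.patch  # reviewer applies it, commit message and all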

Code walkthroughs are an extremely important part of software development, and they should be part of your code review process too. But I won’t be covering them here.

Lightweight code review tools like Review Board and Crucible are indispensable and make the review process… oddly fun and painless. Check them out.

Establish Sane Coding Standards

The less I say about this, the better off everyone will be.

You have to have coding standards to make your code readable and maintainable. Without coding standards, code reviews are pointless brawls.
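
One cheap way to “enforce them frequently” is to wire the check into version control itself. Here’s a minimal sketch of a git pre-commit hook – tools/checkstyle.sh is a made-up stand-in for whatever style checker your team actually uses:

#!/bin/sh
# .git/hooks/pre-commit: refuse commits that fail the style check.
# (tools/checkstyle.sh is hypothetical; substitute your own checker.)
./tools/checkstyle.sh || {
    echo "Coding standard violations found, commit aborted." >&2
    exit 1
}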

‘Nuff said.


And that, folks, is the long and the short of it. Code review is a vital part of any serious software development process. Not doing it makes you a dummy… so don’t be a dummy: follow my advice and you’ll be glad you did.

Best

Atto


TortoiseGit – round 2, fight!

April 22, 2010

For some odd reason, I’ve gotten my head wrapped around TortoiseGit and hence this follow-up post.

Cloning a public repository successfully is neat… but not exactly useful.

So, for this post I’m going to try a couple of tricky things in rapid succession:

  1. create conflicting changes in a clone that I have to resolve when I push
  2. create two radically different changesets, consistent with two sparring developers

And if I get really creative, maybe I’ll try to clone an SVN repository just for the heck of it.

First things first.

Cloning my local repo was trivial, and not worth detailing. I created two conflicting changes, one in the master and one in the cloned repo. Committed in the clone with no issues, then hit the “push” dialog.

I was expecting some kind of error, or perhaps a merge dialog to pop up – but instead I got the following very confusing error message.

git.exe push "origin" master:master

To prevent you from losing history, non-fast-forward updates were rejected
Merge the remote changes before pushing again. See the 'Note about
fast-forwards' section of 'git push --help' for details.
To C:\Users\foo\Downloads\hg-git
! [rejected] master -> master (non-fast-forward)
error: failed to push some refs to 'C:\Users\foo\Downloads\hg-git'

As a complete and total git newbie, I found this error message very confusing. What is a fast-forward update? I actually missed the “merge the remote changes” message the first time through. This dialog is not very useful – why not list the log message of the remote changes that need to be pulled? This would be much more user friendly. At the very least, it should pop up with a button to pull & merge.

Hmmm… what to do next.

The dialog says I need to merge, but I seem to remember reading something about pulling before a merge.

How about TortoiseGit->Pull?

The result is shown in the image below.

Wow, that is not only confusing, but downright useless.

Unless I completely missed something, there’s no easy “edit conflicts” button that pops up the 3-way merge tool. I actually had to search around for a bit to figure it out. You can either right click on a file and then select TortoiseGit->Edit conflicts, or you can use the “Check for modifications” link to show a list of all changed (and conflicting) files.
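
For the record, I believe the command-line equivalent of this whole pull-and-resolve dance is roughly:

$   git pull origin master    # fetch and merge the remote changes
$   git mergetool             # resolve conflicts in a 3-way merge tool
$   git commit                # record the resolved merge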

OK, so I edited the conflicts, then committed them to my local clone. Now what happens if I push?

git.exe push "origin" master:master

remote: error: refusing to update checked out branch: refs/heads/master
remote: error: By default, updating the current branch in a non-bare repository
remote: error: is denied, because it will make the index and work tree inconsistent
remote: error: with what you pushed, and will require 'git reset --hard' to match
remote: error: the work tree to HEAD.
remote: error:
remote: error: You can set 'receive.denyCurrentBranch' configuration variable to
remote: error: 'ignore' or 'warn' in the remote repository to allow pushing into
remote: error: its current branch; however, this is not recommended unless you
remote: error: arranged to update its work tree to match what you pushed in some
remote: error: other way.
remote: error:
remote: error: To squelch this message and still keep the default behaviour, set
remote: error: 'receive.denyCurrentBranch' configuration variable to 'refuse'.
To C:\Users\foo\Downloads\hg-git
! [remote rejected] master -> master (branch is currently checked out)
error: failed to push some refs to 'C:\Users\foo\Downloads\hg-git'

Wow, this is so lame I’m almost ready to give up on git.

At this point, git is zero for two on the user-friendliness scale. So far, my googling for the many error messages in this git-barf just turns up more of the same – I can’t push from master -> master on a local repository. Which is completely and totally unhelpful, and doesn’t make any logical sense to me.
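
The closest thing to a clue I’ve dug up so far: apparently you’re only supposed to push into a bare repository – one with no working tree to clobber. Something like this, I gather (paths invented):

$   git clone --bare myrepo myrepo.git   # bare copy: no working tree
$   git clone myrepo.git working-clone   # hack away in clones of the bare repo
$   cd working-clone                     # ...edit, commit...
$   git push origin master               # pushing into a bare repo is allowed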

It’s getting late, so I’ll post back when I git this figured out (ha ha very punny).

Atto


First Impressions of TortoiseGit

April 21, 2010

This will be a really short post: I recently installed TortoiseGit and wanted to post my first impressions.

I’ve been a long-time SVN user, with a fair amount of experience using TortoiseSVN. TortoiseSVN is about as good as it gets when it comes to getting out of the way and letting you get your actual work done. I have a lot of good things to say about TortoiseSVN, but let’s face it – it’s just a GUI front-end for SVN.

I’m going to skip the debate about centralized vs. distributed revision control — the ’net is full and overflowing with opinions, debates, misinformation, and evangelism on both sides. At the end of the day, SVN and its peers (darcs, Mercurial, git, bazaar, Perforce, etc.) are just tools for managing changes to source code. Each has its own strengths and weaknesses, so I won’t bore you by repeating the pros and cons here.

What I really wanted to write about was my first impressions with using TortoiseGit, which is a Tortoise clone for git. I’m coming at this from a complete newbie’s perspective – I’ve only cloned one git repository before, and it’s been a while.

First things first: install TortoiseGit. No issues here – the installer runs, and within a minute or so I’ve got explorer integration working. An annoying artifact of running Windows is the constant reboots – sure enough, TortoiseGit wants me to reboot before it’s fully working. Argh. I didn’t want to reboot, so I put this on hold for a day or two until my next natural reboot.

Now, let’s clone a repo.

Right-click in Windows Explorer, click on “Git Clone” — oops! Got a dialog box of death; looks like I forgot to install msysgit. A download and install later, and now TortoiseGit pops up the clone dialog.

The SVN checkout and git clone dialogs look pretty much the same – nothing really interesting to see, but I’ve included a screenshot here.
[Screenshot: svn checkout vs. git clone]

I cloned hg-git.github.com, and the clone worked perfectly without any issues. It’ll be interesting to see how well this works from a corporate network, behind a proxy and all that.
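
If you are stuck behind a proxy, git can apparently be pointed at it with a one-time config setting – proxy.example.com being a placeholder, obviously:

$   git config --global http.proxy http://proxy.example.com:8080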

Of course, showing the log works perfectly and gives me all the changesets from the initial import.

One thing I really like about the git log is the graph running along the left side – it looks like a depiction of the branching & merging, but I’d have to check into it more to be sure.

I have yet to try anything actually useful – committing, branching, etc. – but at first glance it looks fully capable and functional, and I’m excited to get more into it as time permits.


Measure always, optimize sparingly

April 19, 2010

At work today, I found myself falling into a classic coder mental trap, one that’s worth sharing.

We have an automated test suite – we call them our “sanity tests” – that we’re expected to run prior to checking code into the repository. It’s a quick validation run that helps keep the code moving in a positive direction – we can know almost instantly if we’ve made changes that break the system, and back out bad code before anyone else is affected by it. It’s really the only sane way to code; I honestly don’t know how people and teams stay sane without… sanity tests.

Between each test run, we try to initialize the system back to a “known good” state, so failures from a previous test don’t result in false positives in later tests. The system “reset” routine destroys some intermediary data, and depending on the amount of data generated, it can take a while.

Or so I thought.

But I digress.

The total test time is very small, a few seconds on a nice Linux box and maybe half a minute on a slower Windows system. But for whatever reason, the time it takes to reset the system has been bugging me. I keep thinking to myself, if only I could shave the 8.37 seconds down to 4.83 seconds… that would be totally awesome.

So finally, I broke down and gave in to my obsessive compulsive nature and mentally geared up for a good half hour of debug/hack/burn to optimize the reset.

More than half an hour later, I put the finishing touches on the routine that destroys the intermediary data – because that was obviously the slowest, most annoying part.

Right?

Uh… no.

I literally smacked my own head in disgust.

I had fallen for the classic engineer “infinite optimization loop.” The special OCD place we all go where everything is open for tweaking, improving, and optimizing to death. The place where ROI is always infinite and there are no time constraints.

I went back and measured the reset routine, and the piece I had optimized accounted for only 1/3 of the reset time. My optimizations were about a 50% speedup – 1/6 of the reset time, and only 0.3 seconds shaved off the best total system time.
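
The five minutes of measuring I should have done first would have looked something like this – the script names are invented, but the lesson is real:

$   time ./reset_system.sh            # how long does the whole reset take?
$   time ./destroy_intermediate.sh    # the part I optimized: only ~1/3 of the total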

Holy cow, I’m a flippin’ rockstar… move over Elvis, here comes dork boy.

When I was in the middle of this grand epiphany, I remembered the sage advice of my uncle – a carpenter of 30+ years: “measure twice, cut once.”

In software engineering, we’re lucky because many physical constraints just don’t apply. There is no board, no saw. So there’s no reason we can’t cut a hundred times, measure, and go back and cut some more.

But the problem is, sometimes we get so far removed from physical constraints, we can far too easily believe there are no constraints. But time, money, and schedules are at odds with the Utopian, ideal engineer’s optimization loop.

So I’d like to propose a motto for coders of all races, creeds, and compilers:

Measure always, optimize sparingly.

Atto


ack is better than grep

April 16, 2010

Disclaimer – this post will probably only make sense if you’re a Linux/UNIX command line freak. So if you’re a Windows .NET programmer or a Web coder or not a coder at all – you might wanna skip this post.

If you spend much time at all working with source code (C, C++, Java, Ruby, Perl, Python, whatever), you quickly come to realize that navigating through code is tricky. Visual Studio users don’t really appreciate this, because IntelliSense hides most of the trouble (most of the time), but any good UNIX geek knows what a nightmare grep can be, and what an unholy combination of pipe magic is required to find your way around code.

I’ll post soon about ctags, cscope and maybe even code bubbles, and how you can use them in combination with the world’s best text editor (vim of course) to write some rockin’ good code.

However, these and other tagging-based tools usually require integration with an editor, and there are times when it’s just not convenient to launch or configure an editor. Or set up a project in an IDE. Sometimes you just want to find out which file function Foo is defined in, or who all the callers of Bar are, and you want the answer now.

A naive grep-based approach starts out like this:

$   grep -n "Foo" *
Binary file stuff.o matches
foo.c:247      Foo(bar, car, star);
(5 pages of irrelevant garbage cut)

Usually, you pick up some garbage from binary files in the same directory, and you forget to specify a recursive search, so it doesn’t find the file you really wanted.

So, you monkey around some more, use find to limit your search to source files, and you get this:

$   find . -name "*.c" -or -name "*.h" | xargs grep -n Foo
foo.c:247    Foo(bar, car, star);
subdir/proj/main.c:17    bar = malloc(sizeof(Foo));
.svn/tmp/foo.c:247    Foo(NULL, car, bar);
(Cut 10 pages of irrelevant SVN metadata)

Oops, looks like we got some subversion working copy junk mixed in with useful results. Blasted revision control always clutters up a grep chain. After a few more iterations, my grep command finally ends up looking like this:

$   find . -name "*.c" -or -name "*.h" | grep -v .svn | xargs grep -n Foo | grep -v otherstuff

It’s honestly a flipping nightmare.

A few months back, I stumbled across another command line tool (well, perl script really) that hides all the warts of grep, and helps me do the one task I really care about doing: finding everyone who uses Foo so I can add a new parameter to the argument list, or some other similar code maintenance task.

Now, instead of typing a monstrous, iterative grep session – I just type

$   ack Foo

And ack searches for the regular expression Foo in all source-code-like files in all subdirectories, displaying line numbers and highlighting in red the matching patterns.
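
A few more invocations I’ve found handy – check your ack version’s man page, since flags may vary:

$   ack -w Foo     # match Foo as a whole word only
$   ack -i foo     # case-insensitive search
$   ack -l Foo     # just list the files that match
$   ack --cc Foo   # restrict the search to C source files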

ack really is better than grep.

ack rocks. Save yourself a lot of grief and kick your grep habit to the curb, you will never look back.

Best,

Atto


Clock Ticks per Lifetime

April 12, 2010

Okay, so this is a slightly random thought, but I was thinking about clock ticks. And that led somewhere oddly profound, which I’ll share in a minute.

As an engineer, perfectionist, and anal-retentive type of guy, I’m constantly measuring, analyzing, and trying to optimize everything to death.

I am a geek’s geek. The type of geek that is better kept in the basement, the kind you’d be afraid to get too close to a real life customer. I think hacking around reading the Linux kernel source code is fun.

Anyways, part of my job as a software engineer is to “profile” code. I use various nifty tools and gizmos, combined with manual instrumentation and analysis, to find the slowest, most time-consuming parts of a software project — and then make them faster. Usually it’s not too tricky to find the worst part, and most of the time it’s not too difficult to make the slowest part faster — it’s generally much harder to work on the right piece of code for the right amount of time.
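
For C code, the classic entry-level version of this is gprof – a quick sketch, with a made-up program name:

$   gcc -pg -o myprog myprog.c      # compile with profiling instrumentation
$   ./myprog                        # run it; this drops a gmon.out file
$   gprof myprog gmon.out | head    # flat profile: where did the time go?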

It’s a lot easier to get lost in optimizing the wrong part of the code, the code you want to fix even though it only impacts 5% of the total runtime. You can spend a week making your program start 10% faster, and not realize that nobody flippin’ cares if your program starts in 2.1 seconds instead of 2.35.

There are other times when you completely miss an important clue – the alarm bells are ringing like crazy and the red flags are going nuts, but you (or someone you know) are asleep at the wheel. The lights are on, but nobody’s home… and you don’t optimize a small but critical piece of code. The reasons this happens are as varied as the seasons, but one common malady is lack of context. The Foo module only takes 749 µs (and Bar takes 200 ms), but you don’t realize that Foo runs in a tight loop and is the critical path. So you waste time and money “fixing” Bar and wonder why your software is buggy and slow.

It’s these little contextual clues that make all the difference. These key vital bits of information that provide a frame of reference and give shape to a project.

A good analogy that comes to mind is the corner pieces of a puzzle. One of the first things a kid learns about puzzles is that putting the corner pieces in place makes everything just flow together. Trying to build a puzzle without the corners (or sides) is much more difficult. There’s no framework to base the rest of the puzzle on, no structure to latch onto.

Getting to the point of this rambling story, I had a minor epiphany of sorts when thinking tonight about clock ticks. It wasn’t really a corner piece, maybe just a small side of the puzzle that I snapped into place. But I wanted to share it here in case anyone else finds it interesting.

I was thinking about processor clock ticks, and started to do a little math. Common processor clock speeds are in the GHz range; let’s pick 2.66 GHz just for fun. Let’s pretend you’re a respectable geek with a Core i7 system that you haven’t overclocked, so it’s running at the stock 2.66 GHz clock.

2.66 GHz is a lot of hertz. How many hertz? Well, a gigahertz is a billion cycles per second, so 2.66 GHz is 2.66 times a billion cycles/second.

That means your Core i7 processor has a clock period of

1/(2.66 × 10^9) sec ≈ 376 picoseconds

376 picoseconds is not a lot of time. It happens 2.66 billion times every second, which is amazing, because your Core i7 is getting something done every few clock ticks – for simplicity’s sake, let’s say your CPU gets a billion somethings done every second.

That is pretty neat, and totally geeks me out – but it made me think and wonder about how that relates to people. Assuming the average life expectancy is, say, 70 years, how many “clock ticks” do we get in our lifetime?

Drumroll to cue in some more math…

(2.66 × 10^9 Hz) × (3600 sec/hour) × (24 hours/day) ×
(365.25 days/year) × (70 years/lifetime) ≈ 5.876 × 10^18
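
If you don’t trust my algebra, bc agrees:

$   echo '2.66 * 10^9 * 3600 * 24 * 365.25 * 70' | bc
5876025120000000000.00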

That is a lot of clock ticks.

So, if you’re still with me, and the tedious algebra didn’t send you running for safety – the thought struck me that maybe, just maybe, with approximately 6 × 10^18 clock ticks in my life, I have enough time.

There are enough spare clock cycles for me to not be a complete perfectionist, to live a little and make mistakes. To learn from those mistakes, and make other stupid mistakes to learn from.

With 6 × 10^18 clock cycles (of which I have at least 3.5 × 10^18 remaining), there’s enough time to relax and enjoy my job, family, church, and life. To not waste time, but choose instead to live it more fully and purposefully.

So the next time someone like me is stressing about some piece of code that doesn’t matter, or stressing over little details that aren’t 100 percent relevant – tell them to relax and burn a few clock ticks being human.

There are enough clock ticks in your life to get everything done, and have fun in the process.

Cheers,

Atto


First post

April 11, 2010

I have finally done it.

I created… a blog.

I know, I know, everyone else and their dog has a blog, so why waste time yammering on about it? I mean, this is 2010, so it’s almost embarrassing to admit that I am just now, finally, actually blogging.

Unfortunately (or fortunately, depending on your point of view), this blog will not be dedicated to various cute kittens or lolcats, nor will there be any sign of knitting needles or sewing of any kind.

I am a geek. So this blog will have all kinds of geeky and confusing topics, for example:

Why does gdb not support reverse debugging of MMX type instructions (used by memset, strncat, etc), thereby rendering the feature pointless for nontrivial programs?

Why does the Windows command line not support basic utilities such as the Unix “tee” command? And why does PowerShell (which fixes some of the brokenness of cmd.exe) require the latest service pack?

Why does the Linux kernel change the block driver APIs with every kernel release? And why isn’t anyone working on LDD4?

And last but not least…

Why does everyone facebook?

Best,

Atto
