
Waste Management

Stevey's Drunken Blog Rants™

This month's column was inspired by some phone screens I did recently, in which candidates told me cheerfully that using C# means you "don't have to worry about memory." Since I can say, without hyperbole, that I've heard that statement at least a hundred grillion times [1], I thought I'd talk a little about garbage collection.

Garbage Collection is Your Friend

Don't get me wrong: I like garbage collection. It's a huge advance in abstraction - and, as I pointed out last month, programming is hard [2], so anything you can do to make it easier is a Good Thing. But it doesn't excuse you from "worrying about memory."

Lots of people pooh-pooh garbage collection as being a crutch for namby-pamby programmers who don't want to know anything about what's happening inside the computer. And based on my last two phone screens, I'm almost inclined to agree, even though I'm not entirely sure what "namby-pamby" means.

However, even programmers of the non-namby variety would be wise to study up on garbage collection. But why? Why should you care about garbage collection, if you're writing in some macho hand-managed language like C, or even a distinctly namby (but still hand-managed) language like Pascal?

Here's why: You should care because good garbage collectors are better at memory management than you are.

Heresy? Perhaps! I'm making a lot of assumptions about how good you are at hand-managing your program memory. For example, I'm assuming that you're a human being, when in fact you could easily be a dog.[3] But humor me - let's assume you're a person who makes mistakes once in a while, whereas a garbage collector is a reasonably well-designed and well-implemented program whose sole purpose in life is to manage memory.

"Conventional wisdom" (which also tells us baloney like boiling frogs don't try to escape the pot) is that if you do something yourself, you'll do it better than a computer program can. For instance, many people still believe that well-trained humans produce better assembly code than compilers do, even though the experts will tell you that you're nuts to even try.[4]

But even if you think you can generate tighter assembly code than a compiler can, does that mean you should hand-code everything in assembly language? Of course not. Almost all your effort will be wasted, since you'll be optimizing code that may only run once in a while, or even not at all.

Memory management is a similar kind of problem. You may be able to produce better memory-management code than a good garbage collector, but:

    1. For how much longer? The state of the art of garbage collection keeps advancing. Garbage collectors are getting pretty darned smart. They can determine object lifetimes, turn heap allocations into stack allocations, move objects around to manage the heap more effectively, pool immutable objects, and take advantage of hardware support for memory management. At some point, garbage collectors are bound to do a better job, overall, than people do.

    2. How much control do you really have? Hand-managing memory is a bit like driving - you may be a safe driver, but you're still putting your safety in the hands of other drivers every time you get on the road. Any one of them could make a mistake and plow right into you. Arguing that you should avoid garbage collectors is becoming increasingly like the argument that you should avoid wearing a seat belt, just in case you ever need to leap from a burning car.

    3. Are you really saving money? Even if today you can hand-manage your memory better than a garbage collector can - is it worth it? How much time do you spend writing, discussing, and debugging memory-management code? How much revenue have we lost from bugs related to errors in memory management?

The answer to all these questions is, of course: "I dunno." I didn't exactly research this article. But my gut tells me that if we're paying programmers to do memory management, we're doing something wrong.

Garbage Collection != Worry-Free

If garbage collection is soooo spiffy, then why do infrastructure folks mumble curses and spit contemptuously when you mention Java to them? Probably because some programmers view garbage collection as a license for unrestrained piggery.[5] That's about as valid as assuming you're allowed to smear ketchup all over the walls of your hotel room just because they have a cleaning service.

In reality, there are all sorts of things you need to worry about in a garbage-collected system. Here are a few of the classic ones:

Allocating too many objects - you can bring even the best garbage collector to its knees by feeding it too much garbage. When you're considering an algorithm or data structure, be aware of its memory requirements, and remember that you're writing code for the real world, code that will run on real machines.

On the other hand, many Java and Perl programmers go overboard and try to optimize for memory usage long before they've even run the program for the first time. As you write the code, you should focus on correctness and human-readability first. If your code runs into memory problems, then you can run a profiler to tell you where the hot spots are.

Keeping hard references to unused objects - if you have a live hard reference to an object, then the garbage collector isn't going to free it. This happens a lot when you're using caches - you throw your objects into a hashtable and never free (or clear) the hashtable. If you're using a long-lived memory cache, consider using a LinkedHashMap (which can be set up to evict its oldest entries) or a WeakHashMap (whose entries go away once their keys are no longer referenced elsewhere) so you don't inadvertently keep unused references around.
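
Here's a rough sketch of the LinkedHashMap flavor of that advice. The class names BoundedCache and BoundedCacheDemo and the two-entry limit are made up for illustration; the only real API in play is LinkedHashMap's access-order constructor and its removeEldestEntry hook.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: once maxEntries is exceeded, the
// least-recently-used entry is dropped, so stale objects don't stay
// reachable (and un-collectable) forever.
class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true: iteration runs from least- to most-recently used.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Consulted after each insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}

class BoundedCacheDemo {
    public static void main(String[] args) {
        BoundedCache<String, String> cache = new BoundedCache<>(2);
        cache.put("a", "alpha");
        cache.put("b", "beta");
        cache.get("a");           // touch "a", so "b" is now the eldest
        cache.put("c", "gamma");  // over the limit: "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

A WeakHashMap makes a different trade: you don't pick a size limit, but entries only disappear after their keys become unreachable elsewhere and the collector gets around to noticing, so the timing isn't yours to control.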

Keeping unused resources open - another classic boo-boo in garbage-collected programs is using up all of your filehandles, database connections, or some other limited resource, because you chose to release them in finalizers that are never invoked. Most garbage-collected systems offer no guarantees about when destructors or finalizers are invoked. This means that you can't use the standard C++ stack-object idiom for freeing resources. Java offers an idiom (via the finally keyword) with an equally strong guarantee. It's not the same guarantee - in Java, you should just never bother with finalizers - but it's a guarantee nonetheless, and you can use it to ensure your resources are freed at a particular point in the execution of your program.
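
A minimal sketch of the finally idiom follows. FakeConnection is a hypothetical stand-in for a filehandle or database connection (it's not a real library class), and runQuery is likewise invented for the example; the point is only where close() gets called.

```java
// Hypothetical stand-in for a limited resource such as a DB connection.
class FakeConnection {
    boolean open = true;

    String query(String sql) {
        if (sql.isEmpty()) {
            throw new IllegalArgumentException("empty query");
        }
        return "result of " + sql;
    }

    void close() {
        open = false;
    }
}

class FinallyDemo {
    // Releases the connection at a known point, whether or not query() throws.
    static String runQuery(FakeConnection conn, String sql) {
        try {
            return conn.query(sql);
        } finally {
            conn.close(); // runs on normal return and on exception alike
        }
    }

    public static void main(String[] args) {
        FakeConnection ok = new FakeConnection();
        System.out.println(runQuery(ok, "SELECT 1")); // prints result of SELECT 1
        System.out.println(ok.open);                  // prints false

        FakeConnection bad = new FakeConnection();
        try {
            runQuery(bad, "");
        } catch (IllegalArgumentException e) {
            System.out.println("failed: " + e.getMessage()); // prints failed: empty query
        }
        System.out.println(bad.open);                 // prints false
    }
}
```

Either way the resource is released when the try block exits, not whenever (or if) a finalizer happens to run. Java 7 and later offer try-with-resources as shorthand for the same pattern.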

There are other issues you need to be aware of as well, such as whether your garbage collector is tuned for the kind of program you're writing. The JVM offers lots of GC tuning parameters that can make a significant difference to your GC performance, and it's a good idea to read up on them.[6]
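
Before you start fiddling with tuning parameters, it helps to see what your collector is actually doing. Here's one possible sketch, using the standard java.lang.management beans; the GcReport class and its describe() method are invented for this example, and the collector names and counts you see will depend entirely on your JVM and its settings.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcReport {
    // Builds a short text report: max heap size plus per-collector statistics.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("max heap (bytes): ")
          .append(Runtime.getRuntime().maxMemory())
          .append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": ")
              .append(gc.getCollectionCount())   // collections so far (-1 if undefined)
              .append(" collections, ")
              .append(gc.getCollectionTime())    // approximate accumulated time in ms
              .append(" ms\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it under different settings (for instance, a different maximum heap via the standard -Xmx option) and comparing the numbers is a cheap first step before reaching for a real profiler.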

Further Reading

I was planning on doing a quick section on how garbage collectors work, but hey - would you look at that! Almost 1500 words already. Lucky me.

I'll just mention that not all garbage collectors are created equal. There's a tremendous amount of research going into them, which is all the more reason it's going to be hard to keep up with their performance, if you're the kind of person who likes to roll your own.

If you'd like to learn more about garbage collection, I recommend visiting www.memorymanagement.org, which has everything from tutorials to advanced algorithms. You might start with their FAQ, which gives a good overview, and dispels a few of the myths. Their article on memory management in various languages is a fun, quick read.

Food for Thought

C programmers think memory management is too important to be left to the computer. Lisp programmers think memory management is too important to be left to the user.

— Ellis and Stroustrup's The Annotated C++ Reference Manual

I have run some tests at the U of Oslo with about 100 users who generally agreed that Emacs had become faster in the latest Emacs pretest. All I had done was to remove the "Garbage collecting" message which people perceive as slowing Emacs down and tell them that it had been sped up. It is, somehow, permissible for a program to take a lot of time doing any other task than administrative duties like garbage collection.

— Erik Naggum (erik@naggum.no)

(Published August 15, 2004)

Notes

[1] Where a "grillion" is defined to be 1.

[2] Duh.

[3] True story: I had a vivid dream last year in which my brother's beagle, Bentley the Beagle, was explaining to me how excited he was that he could talk to me online and nobody knew he was a beagle. I'm not sure what to think of that, except that I'm apparently very susceptible to cartoon suggestions.

[4] But that doesn't mean you can't win Extreme Coolness Points for implementing the ADJ challenge in assembly. Hats off, Willie and Brian!

[5] Yes, piggery is a real word. I learned this when Anagram Genius informed me that "Stephen Francis Yegge" is an anagram for "Fatness, Hence Piggery." Thanks Mom, Dad.

[6] So you can be as baffled by them as the rest of us.