Transformation

Refactoring Trilogy, Part 1

I originally published this essay to my internal Google blog on 10/27/2005, and then republished it on the O'Reilly Ruby Blog in March 2006. I'm posting it here on 10/22/2006, since I think it belongs with my other essays, and I'm planning on writing two sequels.

Transformation

I keep hearing people say they couldn’t possibly use Ruby because it lacks automatic refactoring tools. And although it will eventually have some of them, there’s a class of refactorings that can be automated in Java but not in Ruby. This, people say, is a show-stopper.

I wonder.

What exactly is refactoring? I mean, it’s not a word in the dictionary.

Fowler tells us that it’s the art and science of turning smelly code into good code, in small, incremental steps. Provably correct, by construction. Algorithms for giving your code a makeover without breaking it in the process.

He gives us a nice taxonomy. He presents some good techniques, especially geared for Java programmers. Some are things we had already figured out and are habitual, and some of them are new.

Some of these “refactoring” techniques are automatable. And many of them are useful in languages other than Java. It seems Fowler and friends have stumbled on something real, something as big as OOP, almost. Or at least they were the first to market it and package it properly. Either way, we know about it now. Thank you, Fowler and friends!

Refactoring is one of the first programming books I’ve seen that talks about the almost mystical act of writing code. It takes the process, exposes all the insides, revels in it, walks you line by line through oh so many little decisions that affect code quality. These are things most people never talk about. They take them for granted. Most people talk about “architecture”. Refactoring talks about the idioms in the code we write every day. Real now-code, not planned someday-code.

It’s remarkable, really, that nobody talks about this. They leave all the so-called style choices to the programmer. Refactoring rubs our noses in the implications of our line-by-line style choices. Beautiful.

Discovering Refactoring

Refactoring caught my eye in the bookstore one day in 2002, years after it was published. I hadn’t read it because it was published by those UML weenies. I’ve just never been a fan of UML. It has its uses in database modeling (maybe), but I’ve never found it useful in class modeling. And I’ve never cared for the Booch/Jacobson/etc. crowd’s books.

Refactoring is right there, smack in the middle of that weenie series. Every time I see it, my eyes sweep across the cover without a second glance. You know the old saying!

But one wintery day in 2002, I’m in the bookstore, and I pick it up. No real reason. I’m curious. Don’t know why. Maybe I’d finally heard the word “refactoring” somewhere. What did it mean? It’s not a word in the dictionary.

“Factoring”, sure, that’s a dictionary word. You can factor numbers, or polynomials. Factoring I know. Don’t know why you’d re-do it, though. What’s “re”-factoring?

I open the book. It says local variables are the root of all evil. Perhaps not exactly those words, but it’s the first discussion I stumble across. Local variables!? I plop down in a squashy armchair, outraged, to read more. I want to know if this guy is actually insane, or merely an idiot.

Horror sets in: he’s right. His explanation makes chilling sense. One of my cherished programming practices — caching intermediate values in local variables, as an inline performance optimization — is clearly demonstrated, before my very eyes in the squashy armchair, to be Evil. It explains why I have certain methods in my code base that keep growing and growing, and for reasons I’ve never been quite able to grok, the methods are unsplittable.

These big methods, they’re the Bad Places. The areas of the code base where I loathe to tread. Dark caves that grow more evil every time I visit them. Because add functionality I must, but the locals have threaded their way impenetrably through each function, spiderwebs that catch me and hold me.

The book shows me why they’re unsplittable, then gives me axes to split them. Sharp and precise tools. And the techniques make sense, right then and there. Some even appear to be automatable. Wow.
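For readers who haven't seen the idea in code, here is a minimal sketch of the kind of change being described, which the book calls Replace Temp with Query. It is written in Python only for brevity; the pricing rule, the Order class, and every name below are hypothetical and not taken from the book.

```python
# Hypothetical pricing example; all names and numbers are invented for illustration.

def price_with_temps(quantity, item_price):
    # Before: intermediate values are cached in locals. Each later line leans
    # on the locals above it, so no chunk can be lifted out of the function
    # without dragging those locals along as parameters.
    base_price = quantity * item_price
    discount = max(0, quantity - 500) * item_price * 0.05
    shipping = min(base_price * 0.1, 100.0)
    return base_price - discount + shipping


class Order:
    # After: each local becomes a small query method. Any method can now be
    # moved, reused, or overridden on its own, because nothing is threaded
    # through a shared set of locals.
    def __init__(self, quantity, item_price):
        self.quantity = quantity
        self.item_price = item_price

    def base_price(self):
        return self.quantity * self.item_price

    def discount(self):
        return max(0, self.quantity - 500) * self.item_price * 0.05

    def shipping(self):
        return min(self.base_price() * 0.1, 100.0)

    def price(self):
        return self.base_price() - self.discount() + self.shipping()
```

The second version recomputes base_price() instead of caching it, which is exactly the trade the passage describes: giving up a small inline optimization in exchange for methods that can be split.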

I move on. Turning pages faster, now. Interested.

The book next tells me: don’t comment my code. Insanity again! But once again, his explanation makes sense. I resolve to stop writing one-line comments, and to start making more descriptive function and parameter names.
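As a rough illustration of that resolution, here is a small Python sketch in which a one-line comment is replaced by names that say the same thing. The account dictionary, the function names, and the 30-day threshold are all assumptions made up for the example.

```python
# Hypothetical example; the data shape and the 30-day rule are invented.
from datetime import date


def check(u, d):
    # Before: terse names, with a one-line comment doing the explaining.
    # account is stale if there has been no login for more than 30 days
    return (d - u["last_login"]).days > 30


# After: the names carry what the comment used to say.
STALE_AFTER_DAYS = 30


def days_since_last_login(account, today):
    return (today - account["last_login"]).days


def is_account_stale(account, today):
    return days_since_last_login(account, today) > STALE_AFTER_DAYS
```

A call such as is_account_stale(account, date.today()) now reads like the old comment, and unlike a comment, a name cannot silently drift away from what the code actually does.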

I buy the book, take it home, read it over and over. I marvel. It appears to be pure genius. To this day, I still feel that way, although perhaps not so greatly as I did on that day. But the book is a landmark, and it made me a better programmer overnight. How often does that happen?

Sudden embarrassment. How could I not have read this back in 1998? I’m awash with a horrid cold feeling, as if I’ve just learned I’ve been coming to work for years with my pants down around my ankles. Has everyone else at work already read this book? Am I the only one who didn’t know?

I ask around the next day. Casual. Cool. Not tipping my hand. You’ve read Refactoring, right? Nope. Everyone I ask says No. Most haven’t heard of it. Out of 20 developers I survey, only one guy has read it. No surprise there, since he reads everything. His vote? “Yeah, that’s a great book!”

I feel a rush of relief. Most people don’t know about it, then. I’m safe. I can study it, use it, not worry that everyone will know how foolish my code has been. My code had only been bad along a few dimensions. Most of it was well-engineered. I used design patterns, unit testing, source control, all the usual software engineering discipline. It just smelled a little bad, and now I could fix it.

Refactoring Today

Everyone knows about Refactoring nowadays, because IDEs now have all of the automatable refactorings from the book, and a few extras to boot.

But despite its overnight popularity, I doubt most engineers have read Fowler’s book, not even a few chapters of it. I suspect most engineers today don’t realize there are still many refactorings that are not automatable, even in Java. Most of them, even. Although that’s a topic for another day.

Today, I still don’t know exactly why they called it Refactoring. Catchy, I guess. It feels distantly related to factoring, in the math sense. Reorganization? Too broad. Refactoring seems fine.

Sometimes a good name is all that lies between a great idea and mass acceptance.

Refactoring today is an entire industry. It’s a banner. It’s the battle cry of Java-IDE lovers everywhere. Refactoring tools are Productivity in a Bottle. You browse a menu of refactorings, choose one, and the earth moves.

Why is automated refactoring so popular in the Java camp, and nowhere near as popular in other languages?

Java people say it’s because only Java allows you to accomplish this level of automation of code transformations.

Reprise

I wonder.

I read Fowler. I absorbed it. It’s the art and science of taking smelly code and turning it into better code, in small provable steps.

But he taught us something else, didn’t he?

Oh, but you wouldn’t know what that thing is, if you haven’t read his book. Have you? All of it? No skimming? C’mon now. Admit it. You skimmed.

Here’s the deal: to show us the paths from bad code to good, Fowler had to show us bad code. He showed us examples of what it looks like, and explained why it’s bad. He gave us a set of warning indicators and even called them “Code Smells”. More clever marketing? Perhaps. But they’re right on the mark.

How did that code get smelly in the first place?

Well, we optimized prematurely. We stored too many intermediate values, for fear of recomputing them. We didn’t write small functions, for fear of virtual method-call overhead. We made bloated class hierarchies for the imagined benefits of reuse. We made huge parameter lists to avoid allocating a container object. We used null everywhere as a semantic token. We allowed boolean-logic expressions to grow into unreadable thickets. We failed to encapsulate data and data structures with accessor methods. And many more bad things besides.
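To pick just one item from that list, here is a minimal Python sketch of a boolean thicket and one way it might be untangled by giving each clause a name. The free-shipping rule, the thresholds, and all function names are hypothetical, chosen only to show the shape of the problem.

```python
# Hypothetical shipping rule; every name and threshold is invented for illustration.

FREE_SHIPPING_MINIMUM = 50
FREE_SHIPPING_COUPON = "FREESHIP"
MEMBER_COUNTRIES = ("US", "CA")


def ships_free_thicket(total, is_member, country, coupon):
    # Before: one expression that the reader has to untangle clause by clause.
    return (total >= 50 and country == "US") or (is_member and country in ("US", "CA")) or (coupon is not None and coupon == "FREESHIP")


def qualifies_by_total(total, country):
    return total >= FREE_SHIPPING_MINIMUM and country == "US"


def qualifies_by_membership(is_member, country):
    return is_member and country in MEMBER_COUNTRIES


def has_free_shipping_coupon(coupon):
    return coupon == FREE_SHIPPING_COUPON


def ships_free(total, is_member, country, coupon):
    # After: the same logic, with each clause named for the rule it encodes.
    return (
        qualifies_by_total(total, country)
        or qualifies_by_membership(is_member, country)
        or has_free_shipping_coupon(coupon)
    )
```

Both functions return the same answer for the same inputs; the only difference is whether the next reader has to reverse-engineer the rule or can simply read it.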

We were making dozens, hundreds of little mistakes that added up to some pretty smelly code. The book catalogued our mistakes, gave them names, elevated them to First-Class Mistakes.

Then, presumably, we stopped making them? Well, maybe those of us who read the book. Even if we read it late. Better late than never. After reading it, you know what bad code smells like, and you know how it got that way. You’ve learned how to avoid writing it.

At what point did automated refactoring tools become the focus? The book’s original focus was on design, with tools for recovery. Now the focus is all on recovery, and specifically on the automatable subset of recovery techniques.

The implicit assumption here is that bad code just happens, inevitably. Even though we know all about its characteristics, and we know how to spot it instantly. Heck, if you read the book, you know it wasn’t just a catalog of 100-odd specific refactorings. It also presented themes. Once you get the core ideas, you can invent your own refactorings, and identify new code smells.

And now you know better how to write the code correctly the first time around.

Oh, you disagree? Because code is a living thing, and requirements change constantly? Yes. Code needs to change. But Refactoring isn’t the whole story on code change; it’s a relatively small part. There’s data modeling, and architecture, and design patterns, all the high-level astronautics. And of course the custom code-pattern transformations you apply almost daily that aren’t general enough to give first-class names. Changing any of these things requires techniques that I assure you do not have entries in the Refactoring menu of your IDE.

Refactoring is zoomed waaay in. It’s focused on how you personally wrote this or that class or method, down at the level where you were making choices about local variables, control-flow constructs, and other micro-design decisions.

You now know how to avoid doing the wrong things, at that level.

Well — you know if you read the book, that is. Without skimming. And also, I suppose, only if you were already experienced enough for the book to stab at you like a blade, mocking you for not noticing every one of these things yourself, so that you will remember its lessons in your bones for the rest of your days.

Did Refactoring make us lazy? Maybe so. Especially if we skimmed it, or just read the tools-half without reading the explanation-half. Then maybe we think the whole story is about recovery from code that inevitably goes sour. And even if you read the book, maybe those siren-song automatic refactorings have made you forget what the book was really about. Namely, fixing your code and then never again writing so amateurishly.

Is this the whole story? No. Do I sometimes still need to refactor my code? Yes. Are there subtleties I’m punting on for now? Yes. Refactoring can’t really be discussed in a vacuum; it’s interdependent with other modern development ideas, including “don’t repeat yourself”, “once and only once”, unit testing, and others. I may revisit refactoring again in that context.

Today, though, I’m just interested in why Java programmers are saying that the ability to “program” by pushing buttons is so critically important to them that they’re unwilling to consider using another language. Even a language that’s gaining rapid worldwide recognition as a step-function in productivity at least as great as Java was over C++, with almost none of the downsides or friction of similar-looking predecessors like Perl and Python.

I mean, they won’t even consider trying Ruby? Gosh. Pushing buttons must be… wonderful.

I close my eyes, envisioning that kind of power…

Pushbutton Productivity

Ah, those automated refactorings. Such programming power — instant productivity with the click of a button. Programming by Menu Selection. Choose your automated attack, and the very earth moves. Mechanical muscles moving mountains of code. It’s almost as if you’re superhuman. Programming never felt so much like a video game.

It must feel like piloting an earth-mover: a John Deere, a Komatsu, a Caterpillar — one of those huge yellow mechanical dinosaurs with the world’s largest tires, the ones that held us in awe when we were children. The driver pulls a lever, a hill of dirt moves aside. The work of a hundred men in a day, accomplished with the bored flick of a wrist. A lifeless yellow behemoth at your beck and call.

Now that’s productivity. If my job were moving mountains of dirt around, then I agree: I would not possibly be able to work effectively without an earth-mover, some tractors, a dump truck, maybe a crane or a backhoe. My personal tools for refactoring the surface of the earth. You couldn’t pry my fingers from their hundred-ton hulks.

Caterpillar. Such an odd name for a motorized Colossus. Or is it? They named it after a little segmented bug. All the segments look alike, repeating themselves. Each segment has two identical tiny legs. The legs have to move in coordinated waves to propel the bug forward. So much computational processing devoted entirely to crawling around in the mud!

Yes, I see it now. Caterpillars are long, machine-like insects. Earth-movers are huge, insect-like machines. It begins to make sense.

Automated code-refactoring tools work on caterpillar-like code. You have some big set of entities — objects, methods, names, anything patterned. All nearly identical. You have to change them all in a coordinated way, like a caterpillar’s crawl, moving all the legs or lines this way or that.

How did our code get that way to begin with? We wrote it badly. Refactoring to the rescue. Good design may be a lost cause, but we can recover, because we have automated servants to go fix all those little segments for us. They never get tired, and all we have to do is push buttons.

Well then. How could you possibly live without automated refactoring tools? How else could you coordinate the caterpillar-like motions of all Java’s identical tiny legs, its thousands of similar parts?

I’ll tell you how:

Ruby is a butterfly.