
Language Trickery and EJB

Today a very smart, very well-intentioned Java programmer responded to a mailing-list language discussion about closures by saying:

The main language "benefit" that we are to consider at Amazon is it's ability to solve business problems. The components are shared in order to reuse the code, not the language specific trickery. I would argue that if you can not expose you public module interfaces in a language-neutral way there is something wrong with you code. We should write to "Amazon" not to "C++" or "Java". You are not supposed to expose LISP "closure" (whatever that is) but concepts like "list of orders" or "customer ID". That can be expressed in a language neutral way. Read any OO design book for an explanation.

My response got a little too long for polite email society, so I've turned it into a blog entry. Well, to be honest, I'm also doing this to help combat the Blub Paradox, which manifests itself, in Java programmers, as a fear of closures and other such "language trickery."

Incidentally, I have no idea whether responding to an email thread with a tutorial-like blog entry is considered gauche. In fact, I have no idea whatsoever what constitutes proper blogging protocol. As far as I can tell, you're supposed to post links to news items and to other people's blogs every day, and that's what makes for "hip" blogs. You'd think you could automate that style of blogging with a script. But maybe that's why it's hip?

All I really know is that when I have occasional long-ish ideas, or snapshots of my current thinking about some long-running problem, my blog seems like a nice place to put them. And it's easier than publishing to my home directory. Sad, but true.

In any case, I've been getting the impression that many Java programmers become uncomfortable when people start talking about features offered by other languages. Not just the Java engineer on the mailing list today, but lots of Java programmers. The majority, maybe.

So I figured I'd put my very best effort into explaining the reasons that people have these discussions. Hopefully I can shed some light on the relationship between language trickery and (seemingly) more concrete problems like good EJB design.

Let's talk trickery!

My first magic trick: Loops

Let's hypothesize that Java didn't come equipped with looping constructs: for, while, and do, to be precise.

Instead, imagine that you had to create a java.lang.LoopObject, initialize the loop start/end/increment variables, and then pass in a code block to call on every iteration.

In that unhappy scenario, this Java code:

    // print even numbers from 2 to 100
    for (int i = 2; i <= 100; i += 2) {
        System.out.println(i);
    }

would instead look like this:

    // print even numbers from 2 to 100
    LoopObject loop = new LoopObject();
    loop.setStartingValue(2);
    loop.setEndingValue(100);
    loop.setCounterIncrement(2);
    loop.setOperation(new LoopBody() {
        public void loopBody(int currentIndex) {
            System.out.println(currentIndex);
        }
    });
    loop.start();

Et voila! We have a loop. Presumably you could call LoopObject.this.break() if you wanted to terminate the loop early.
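For the curious, here is a minimal sketch of what such a library class might look like. LoopObject and LoopBody are hypothetical names taken from the example above, not anything in the JDK. The sketch cheats by using a real for-loop inside start(), which our imaginary loop-free Java would not allow, and it assumes a positive increment.

```java
// Hypothetical sketch: LoopObject and LoopBody are not real JDK classes.
interface LoopBody {
    void loopBody(int currentIndex);
}

class LoopObject {
    private int startingValue;
    private int endingValue;
    private int counterIncrement = 1; // assumed default; must be positive
    private LoopBody operation;

    void setStartingValue(int value) { startingValue = value; }
    void setEndingValue(int value) { endingValue = value; }
    void setCounterIncrement(int value) { counterIncrement = value; }
    void setOperation(LoopBody body) { operation = body; }

    // Calls the body once per counter value, up to and including the ending value.
    void start() {
        for (int i = startingValue; i <= endingValue; i += counterIncrement) {
            operation.loopBody(i);
        }
    }
}
```

Nothing here stops a caller from skipping setOperation() and getting a NullPointerException from start(), which is rather the point of the sections that follow.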

This is high-quality Object Oriented Programming at work! Even if Java didn't have loops built into its syntax, OOP would come to the rescue: a library class could provide the feature for us.

OOP is a mechanism for adding functionality that your language doesn't provide for you. Every new class and API-call you add is creating vocabulary that extends and enriches your language. After all, a language is just a way to tell computers what to do.

Even though OOP has been amazingly useful in my example, I think most people would agree that having for-loops sure is an awful lot nicer. While-loops are cool too. (I wouldn't shed many tears if I lost the do-loop, but maybe that's just me.)

Having loop syntax is nicer because you're expressing the exact same computation with less code. Your intent is clearer to the reader, assuming they know your language.

Moreover, it stands out more than the LoopObject version's blob of method calls and object instantiations. People complain that Lisp looks like oatmeal, like a ball of mud that keeps getting bigger; that's because Lisp has almost no syntax. Syntax can provide visual cues that can make it easier for people to read the code. (On the other hand, you can easily overdo it; more on that below.)

Finally, and perhaps most importantly, your compiler can help you out, by whining at you if you've (e.g.) forgotten to initialize the loop variable.

The IteratorMaster Service

Look at it this way: a for-loop is a service provided to you, the programmer, by Java. Its Service API consists of:

- Declaring a name for your loop-counter variable.
- Specifying an initial value for the variable.
- Specifying the amount by which to increment the variable on each loop iteration.
- Specifying the value for the loop counter at which the service should stop iterating.
- Specifying a block of code to execute on each iteration.

As a reward for using the API above properly, Java's for-loop construct renders the following service to you: it invokes your block of code repeatedly, updating your loop counter in the manner you specified, and it gives your block access to the loop counter variable, as well as to any other variables and methods that are lexically in scope for that block.

That's an awful lot of service for one teeny statement.

Even better, the syntax for for-loops in Java forces you to adhere to the API for the service. The Object-Oriented version of the service doesn't force you to do it, at least not the way I've written it above. You could rewrite the LoopObject to accept three integer values in its (only) public constructor, in which case the loop would look like this:

    LoopObject loop = new LoopObject(2, 100, 2);
    loop.setOperation(new LoopBody() {
        public void loopBody(int currentIndex) {
            System.out.println(currentIndex);
        }
    });
    loop.start();

But that's slightly yucky, because you have three int parameters to your constructor, so it's easy for a programmer to put them in the wrong order accidentally.

Some languages provide trickery to make it harder to put the parameters in the wrong order, by having named parameters, but in Java (and C/C++), your options here are somewhat limited. You can go with the 3-int-param constructor, or you can have setter methods, and maybe throw an IllegalStateException if the user forgets to initialize one of the loop parameters.
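The setter-plus-exception option might look roughly like the sketch below. The names (CheckedLoopObject, CheckedLoopBody) are made up for illustration, and the boxed Integer fields are just one way to tell "never set" apart from zero. Notice that the mistake only surfaces when start() runs, not when the code compiles.

```java
// Hypothetical sketch: CheckedLoopObject and CheckedLoopBody are invented names.
interface CheckedLoopBody {
    void loopBody(int currentIndex);
}

class CheckedLoopObject {
    // Boxed Integers so that "never set" (null) is distinguishable from 0.
    private Integer startingValue;
    private Integer endingValue;
    private Integer counterIncrement;
    private CheckedLoopBody operation;

    void setStartingValue(int value) { startingValue = value; }
    void setEndingValue(int value) { endingValue = value; }
    void setCounterIncrement(int value) { counterIncrement = value; }
    void setOperation(CheckedLoopBody body) { operation = body; }

    void start() {
        // The "API contract" is enforced here, at run time, because the
        // compiler has no idea which setters were supposed to be called.
        if (startingValue == null || endingValue == null
                || counterIncrement == null || operation == null) {
            throw new IllegalStateException("loop is not fully initialized");
        }
        if (counterIncrement <= 0) {
            throw new IllegalStateException("increment must be positive");
        }
        for (int i = startingValue; i <= endingValue; i += counterIncrement) {
            operation.loopBody(i);
        }
    }
}
```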

When we're designing fancy object-oriented EJB interfaces for our service customers, I think it's a good idea to keep in mind that even a well-designed OO interface isn't necessarily as good as could be for your customers. If you have a clunky interface, then no matter how Object Oriented it is, it'll be more painful to use than perhaps it needs to be.

And sometimes OO is just plain clunky; you have no real choice in the matter.

Is Syntax better than OO interfaces?

So Java provides looping syntax for us. Is that just language trickery? After all, the OOP version for doing loops was serviceable enough. (So to speak.)

Perhaps it is trickery. Even so, I think most people would say it's awfully useful trickery. Maybe some programmers would be perfectly happy with a LoopObject, and they would become alarmed when folks on mailing lists talked about the possibility of incorporating a "for" keyword into Java. Who knows.

Let's assume, for the sake of discussion, that we all agree that having for-loops and while-loops in your language is better than requiring an OO interface to doing loops.

Do you sense a slippery slope approaching? I do, and I think it's going to be an interesting one.

A natural question springs to mind: what stuff do you include in your language, with direct syntactic or semantic support, and what stuff do you relegate to the status of second-class citizen, off in some class library?

Keeping in mind, of course, that your compiler can't check if you're using library calls in a valid way; you're pretty much at the mercy of whatever documentation and exception-handling your API designer provides. All your tools can do for you is ensure that you comply with the method signatures when invoking the service. That's true in Java and C++, and it's also true in language-neutral OO interface frameworks such as CORBA or SOAP. The method signature is the only thing the tools can check, for purposes of helping the user avoid errors.

Hmmm. So a method signature is pretty useful, then. Score one for strong typing.

But what about APIs that span method calls? Our LoopObject "service" illustrates the problem nicely. Your compiler can make sure you don't try to initialize the loop counter with the string "argle-bargle", which seems like a good thing. It's a semantic guarantee, and guarantees are pretty useful when you're constructing a service.

But the compiler can't know that you were supposed to call the darn method in the first place. It's a runtime error just waiting to happen.

The LoopObject class is obviously a toy example, but the same principle applies to any interface you're building, even (heck, especially) if it's a fancy, web-service-enabled, fully object-oriented CustomerOrderList interface: you have no way to ensure that the programmer follows the rules for using the interface, short of throwing an exception when they do it wrong. That's a poor substitute for some of the things a compiler can do for you, but it's all we've got.

Or is it? Would it be preferable to have some language trickery to help out our hapless users?

That depends, of course. For one thing, it depends on how many of your potential users will be relying on the service. You can't have syntax for everything, because nobody would be able to remember it all. You'd wind up with Perl. You'd have to explain to users that your service has a Customer Context and an Order Context, and that your API calls behave differently depending on which context your service guesses you're in, based on the time of day and its current mood.

Well then. We've established two boundary conditions for our problem: you want some syntax, but maybe not syntax for everything in the kitchen sink.

How much syntax then, dangit?

I don't think anybody knows. That's why we discuss it a lot. That's why different languages make different choices about it.

What syntax did Java decide to include, and was it the right decision?

Well, we use loops all the time. It would seem perversely inconvenient to require you to use a class or library to do iteration. So clearly that's a candidate for inclusion in the language, and Java definitely did the right thing there. Thank you, James!

And we all iterate through lists an awful lot, regardless of the problem domain, so some language designers were kind enough to provide syntax for this. However, they were unkind enough to call it map, which makes your users think it's some sort of cartography function until they learn that no, it's actually invoking a block of code for each element of a list.

Ruby and Groovy call it each, by the way.
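If you haven't met map before, here's a rough sketch of the idea written in Java's own OO style. Mapper and ListTricks are hypothetical names, not JDK classes, and this version follows the usual convention of collecting each result into a new list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: Mapper and ListTricks are invented names, not JDK classes.
interface Mapper<A, B> {
    B apply(A input);
}

class ListTricks {
    // Calls the mapper on each element and collects the results in order.
    static <A, B> List<B> map(List<A> items, Mapper<A, B> mapper) {
        List<B> results = new ArrayList<B>();
        for (A item : items) {
            results.add(mapper.apply(item));
        }
        return results;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("list of orders", "customer ID");
        // The "block of code" has to be wrapped in an anonymous class.
        List<Integer> lengths = map(names, new Mapper<String, Integer>() {
            public Integer apply(String name) {
                return name.length();
            }
        });
        System.out.println(lengths);
    }
}
```

Most of the call site is wrapper; the only part that says what we actually want is name.length().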

Java has no map function. Why not, though? After all, many other languages had it long before Java came along, and newer languages have incorporated it too.

The answer is probably historical: C and C++ didn't have it, and Java was designed to be easy for C++ programmers to learn. For some reason, programmers love to learn new stuff, as long as it's not syntax. They'll stay up all night reading about Hibernate or Jaxen or some godawfully complex new framework, but map is really scary for some reason. So are "closures", evidently.

I don't know why, either.

New Syntax for Java

Well, OK, there's no map operator in Java. But Java folks still do a lot of list iteration, so they added it (naturally!) using an Object-Oriented language extension: java.util.Iterator.

It does the job, but is it ever clunky. Sun realized (a bit late, I think, but better late than never) that every Java programmer in the universe was writing blobs of Iterator code over and over again, so in the latest release of Java they added... map?

Nope. Too fancy. Might scare people. So they added for. Yeah, I know, they had that already. They made a teensy extension to the syntax for the for-loop, one that the compiler turns directly into an Iterator. Oh, and they called that version "foreach" in the docs, even though the actual language keyword is for. Silly? Try figuring out the JDK release-numbering system sometime.
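To make the comparison concrete, here is a small sketch of the hand-written Iterator idiom next to the extended for-loop. ForEachDemo and its method names are illustrative only; for a List, the compiler turns the second loop into something roughly equivalent to the first.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class ForEachDemo {
    // The explicit Iterator idiom that everyone was writing by hand.
    static int sumWithIterator(List<Integer> values) {
        int total = 0;
        for (Iterator<Integer> it = values.iterator(); it.hasNext(); ) {
            total += it.next();
        }
        return total;
    }

    // The extended for-loop: same iteration, with the Iterator plumbing hidden.
    static int sumWithForEach(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(2, 4, 6);
        System.out.println(sumWithIterator(values) + " " + sumWithForEach(values));
    }
}
```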

Anyway, Java added some syntax in the latest release. Not much -- well, it depends on who you ask, I guess; some people feel they might as well have gone to the Louvre and painted a moustache on the Mona Lisa -- but definitely a few additions here and there. Generics, enums, and the so-called "smart" for-loop which is, in fact, not all that smart, as I mentioned in another blog entry a while back. And some other odds and ends.

If Java is gaining syntax, however slowly, then it reopens the question (even for Java programmers!) as to when exactly syntax is useful.

And what we've established, I think, is that it's useful when everyone is doing the same thing over and over. Then you introduce some syntax, which allows users to do the same thing with shorter, clearer code, and it allows the compiler to do some extra error checking for you.

All at the expense of some really loud whining from people who are too busy reading giant J2EE books to have time to learn a new language construct, even if, pound for pound, the language construct has 10x or even 100x the impact on their productivity.

Could it really be 100x? Sure, why not? After all, syntax is automation! It's code generation, in a very literal sense. Everyone loves automation and code generation, since they can save you reams and reams of potentially long and buggy code. So you'd think folks wouldn't mind a little syntax, peppered with some semantics, added into their language.

Syntax for distributed computing

Some people, folks who are clearly not scared of new syntax, were thinking about this problem about fifteen years ago, over in Sweden. They worked at a big telecommunications company called Ericsson, and they had to build this enormous, real-time distributed telephony switching and communications system.

Almost as if by magic, they time-traveled forward to this week, teleported to Seattle, hacked into our systems, and read this blog entry, and they thought: "Hey, he's right! Programming languages can automate reams of boring code, and also protect users from making mistakes. And we're about to embark on writing reams of distributed-computing code for this giant worldwide real-time telecommunications system. So, like, maybe we should make a language for it."

Well, OK, I'll be honest: they thought of it first. No sense trying to lie about it; you'd have figured it out eventually, probably using some Object Oriented methodology.

So Ericsson engineers decided to solve our problem, the one we're talking about hurling J2EE books at in the hopes of stunning it, with a new programming language made just for distributed computing. They cleverly called it "Ericsson Language", or Erlang for short.

They created syntax for the network calls, for running distributed processes, for doing peer reelections, restarting processes, for doing asynchronous event-based messaging, for doing exponential backoff and retry, all kinds o' stuff.

Rumor has it that they've built themselves one of the largest real-time, transactional distributed systems in the world, using only about a million lines of Erlang code, which they estimate would be about 20 million lines of buggy C/C++ code.

Go figure.

And they have Java bindings for their network protocol, so you can write Erlang servers in Java. (Or C/C++, if you prefer.) And it's all free, publicly available, heavily documented, and used by companies all over the world for doing distributed computing seamlessly and (by all accounts) far more productively and robustly.

I suppose we should probably take a look at it, but languages are pretty scary. Plus we'd have to get it into /opt/third-party somehow, which you know ain't gonna happen, because Heaven Help Us if some team decides to make a bold, innovative move and try to get their stuff done faster.

I thought I'd mention it in passing, though, as a curiosity.

My second magic trick: Closures

OK, so I've finally set the stage, I hope, to talk about what I initially started this blog entry for: closures. I'm going to explain closures to all you fine Java programmers out there, and then we can go back to discussing them on mailing lists without anyone becoming alarmed. I've even used soothing visual cues like Java Purple!

Closures, like loops, are a common programming-language feature. Even Java has them!

However, Java only supports them through a relatively cumbersome OO mechanism similar to my hypothetical LoopObject.

To see a Java closure in action, observe this snippet of (hopefully) non-scary Java code:

    // call my obj when AbstractButton c's action is performed
    public void addListener(AbstractButton c, final MyCallback obj)
    {
        c.addActionListener(new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                obj.doCallback(e.getActionCommand());
            }
        });
    }

This is perfectly ordinary Java code; you use this kind of construct (an anonymous inner class that implements some sort of interface) any time you want to have some snippet of code invoked "later". GUI event listeners do this a lot, and it's also a common idiom when you want a Runnable to be invoked on some other thread.

What makes it a closure? It's the "final MyCallback obj" arg. Your event listener is using a local variable of addListener, which we all know is going to go away forever when the method returns. But the event listener can still refer to it!

It lives on, at least in the event listener, because Java has made a copy of the value and saved it, magically, trickerily even, "somewhere" inside the anonymous event listener class.

You do this kind of thing a lot, even in Java. You have a block of code to execute later, but it needs to capture and remember a little of the environment it was declared in.

The only requirement Java imposes is that you declare any variables the closure will refer to as "final". And then you get them all saved up, nice and neat, forever and ever, inside your anonymous inner class. Not as instance variables (well, for all you know), but just as pure magic. Sure is convenient! We didn't have to declare a bunch of instance variables to hold them, and initialize them all, etc. etc. It all just happened for us.
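Here is a smaller, GUI-free sketch of the same trick that you can run from the command line. ClosureDemo and makeGreeter are hypothetical names; the point is only that the Runnable keeps using greeting after the method that declared it has returned.

```java
class ClosureDemo {
    // Returns a block of code to run later; it captures two final variables.
    static Runnable makeGreeter(String name, final StringBuilder log) {
        final String greeting = "Hello, " + name; // local to this method call
        return new Runnable() {
            public void run() {
                // makeGreeter has already returned by the time this runs,
                // yet greeting and log are still available here.
                log.append(greeting).append('\n');
            }
        };
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Runnable greeter = makeGreeter("world", log);
        greeter.run();
        System.out.print(log);
    }
}
```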

Many languages support this a little more directly by having syntax for it. If Java had first-class functions and syntax for closures, the code might look something like this:

    // call my obj when AbstractButton c's action is performed
    public void addListener(AbstractButton c, final MyCallback obj) {
        c.addActionListener { ActionEvent e |
            obj.doCallback(e.getActionCommand());
        }
    }

As in many other examples, adding some syntax doesn't appear to be a huge win. But recall that the second LoopObject implementation I gave above was only a little more than twice the size of the corresponding for-loop (7 lines instead of 3). And it was still sufficient to make everyone thank their lucky stars that Java provides loop syntax instead of the equivalent Object Oriented API Service Call.

The big difference here is that we don't have an anonymous class wrapper for the code block. We just have the code block, and all that really needs is the parameter(s) and the actual code to execute. We've removed some boilerplate, automating its generation with a little new syntax.

I think the syntax is just groovy, don't you?

Coffee break's over. Everyone back on your heads.

Well, back here in the Real World, we all know languages are horribly evil, and scary, and useless, and we should be solving the world's problems with EJB. Fine!

But you really don't need to remind us that there are CustomerOrder objects to expose; our bosses (and customers) remind us every day. Please let us at least discuss the possibility, on the new language discussion mailing list, that there might be a brighter future ahead of us than what EJB and OOP can actually provide.

Thank you.

(Published October 01, 2004)

