math-every-day

I just read a book called John von Neumann and the Origins of Modern Computing. Every few years I read a book that causes a discontinuity in my thinking -- a step function that's a lot larger than the little insights that most books or articles produce. For me, this was one of them.

Incidentally, I don't expect it will have the same impact on everyone. I think there's a right time for every book. You have to struggle with some set of issues for a while before the right book can really hit home. You might like it, but I'm not necessarily going to recommend it. It's more of a history or a biography than a technical book.

I thought I'd talk about some of the realizations, probably already obvious to you, that really "clicked" when I read it.

What Math Really Is

I finally realized how important math is. Innumeracy, the mathematical version of illiteracy, is crippling me and most of the rest of the programming world.

Von Neumann was originally "just" a mathematician (and chemical engineer), but he made lasting and often central contributions to fields other than pure mathematics. Yes, he invented the computer and computer programming as we know it, which is a fine thing to have on your resume, but it's only a tiny part of his work. The list is so long that the book I read couldn't begin to mention it all, let alone discuss it. I'll list a handful of his accomplishments here, but they don't begin to paint the full picture.

Johnny co-invented game theory, made critical contributions to the field of economics, and extended his game theory to formalize the theory of linear programming, which is now a staple optimization method for many disciplines, including some that we employ at Amazon (e.g. operations research).

He singlehandedly invented the theory of cellular automata -- you know, the one that madman Stephen Wolfram tries to take credit for in A New Kind of Science -- and was one of the pioneers of methods for understanding the human brain, which spun off any number of sub-fields.

He also did pioneering work in the theory of building large, reliable systems out of unreliable components.

He was one of the key contributors to the atomic bomb on the Manhattan Project. He's the one who suggested the implosion device for achieving reliable detonation (they had been trying to use a gun/projectile method), and he was instrumental in working out the hydrodynamic and thermodynamic problems in understanding the shock waves and blast behavior. Regardless of your feelings about nuclear weapons, you have to appreciate his contributions in a critical wartime initiative.

He made many major contributions to the field of numerical approximation. Many real-world problems are intractable to mathematical analysis; e.g., they're only describable by systems of high-order partial differential equations. Johnny tackled these problems with gusto, inventing or extending a huge number of different approaches for doing approximations or probabilistic analyses that yielded results where rigorous analysis had failed.

This is the primary reason that he went to all the effort to invent the modern computer. He wasn't satisfied with simply designing and building it; he used it to solve a huge variety of real-life problems.

For instance -- and this is just one example -- he turned weather forecasting into a science. Before Johnny decided to tackle the problem (and he knew nothing of meteorology; he had to learn it in order to solve it), it wasn't possible to get accurate 24-hour weather predictions, and the experts in the field had given up hope of ever making scientific weather predictions. You now get accurate weather forecasts because Johnny decided it was an interesting problem.

"Watching" Johnny von Neumann work in this book gave me a new appreciation for the importance of having a strong mathematical foundation. Math is a funny thing; it's not the way most people think of it -- it's an ever-expanding set of tools for modeling and solving problems. It's very much driven by practical considerations; if a new problem comes up, and it's not tractable to existing mathematical methods, then you work to make up new ones.

So math is very much like programming: it's a set of tools and practices that you use to model and solve problems. There are often multiple approaches that have their own pros and cons. You might be able to model a particular problem algebraically, geometrically, and numerically. One approach might be more accurate but more difficult to compute. Or an approach might be accurate and easy to compute, but only when your instance of the problem satisfies certain constraints.
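
Here's a quick sketch of that tradeoff in Python (my own toy example, not anything from the book): the same integral can be attacked analytically when a closed form exists, or numerically when it doesn't -- the numerical route is approximate, but it works on almost anything.

    import math

    def trapezoid(f, a, b, n=1000):
        """Approximate the integral of f over [a, b] with n trapezoids."""
        h = (b - a) / n
        total = 0.5 * (f(a) + f(b))
        for i in range(1, n):
            total += f(a + i * h)
        return total * h

    # Case 1: the integral of x^2 on [0, 1] has the exact answer 1/3,
    # so the analytic approach wins on both accuracy and cost.
    print(1.0 / 3.0, trapezoid(lambda x: x * x, 0.0, 1.0))

    # Case 2: the integral of exp(-x^2) on [0, 1] has no elementary
    # closed form, but the numerical approach still gives ~0.7468.
    print(trapezoid(lambda x: math.exp(-x * x), 0.0, 1.0))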

We have hundreds of problems at Amazon that nobody has tried to analyze mathematically. Von Neumann would have been excited by all of this (and possibly disgusted that we've been in business a decade without bringing in mathematicians to help us solve our problems -- in fact, we regularly turn them down as part of our interview process).

My new motto is "Math every day." I'm giving myself one year to master all the math I was supposed to have learned in high school and college: algebra, geometry, trigonometry, limits and conic sections, differential calculus, integral calculus, multivariate calculus, simple differential equations, linear algebra and eigenvectors/eigenvalues, discrete math and logic, probability and statistics. I "knew" it all at one time or another, without really understanding what the heck it was for, so I should be able to put it all together again fairly quickly, if I put my mind to it.

Math every day. You learn things a little at a time. Practice something every day for half an hour and you'll become comfortable with it in no time.

What Computers Really Are

Another realization I had while reading the book is that just about every course I took in my CS degree was either invented by Johnny von Neumann or builds on his work in mostly unintelligent ways.

Where to start? Before von Neumann, the only electronic computing devices were calculators. He invented the modern computer, effectively simulating a Universal Turing Machine because he felt a sequential device would be cheaper and faster to manufacture than a parallel one. I'd say at least 80% of what we learned in our undergrad machine-architecture course was straight out of his first report on designing a programmable computer. It really hasn't changed much.

He created a sequential-instruction device with a fast calculation unit but limited memory and slow data transfer (the infamous "von Neumann bottleneck", as if he's somehow responsible for everyone else being too stupid in the past 60 years to come up with something better). In fact, Johnny was well on his way to coming up with a working parallel computer based on neuron-like cellular automata; he probably would have had one in production by 1965 if he hadn't tragically died of cancer in 1957, at age 53.

Von Neumann knew well the limitations of his sequential computer, but needed to solve real problems with it, so he invented everything you'd need to do so: encoding machine instructions as numbers, fixed-point arithmetic, conditional branching, iteration and program flow control, subroutines, debugging and error checking (both hardware and software), algorithms for converting binary to decimal and back, and mathematical and logical systems for modelling problems so they could be solved (or approximated) on his computing machine.
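
To make just one item on that list concrete, here's roughly what converting between binary and decimal looks like, sketched in Python rather than 1940s machine code (the code is mine, purely for illustration):

    def decimal_to_binary(n: int) -> str:
        """Repeatedly divide by 2, collecting remainders (least significant bit first)."""
        if n == 0:
            return "0"
        bits = []
        while n > 0:
            bits.append(str(n % 2))
            n //= 2
        return "".join(reversed(bits))

    def binary_to_decimal(bits: str) -> int:
        """Horner's rule: double the running value and add each bit in turn."""
        value = 0
        for bit in bits:
            value = value * 2 + int(bit)
        return value

    print(decimal_to_binary(1957))            # '11110100101'
    print(binary_to_decimal("11110100101"))   # 1957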

He did much of the hardware and materials engineering himself, working closely with engineers to come up with solutions for memory, secondary storage, input and output (including hooking up an oscilloscope to the computer to visually check results that were too complex for punch cards). He secured the funding for building the computer and computing lab from government and other sources, developed training programs to teach people how to program computers, and worked with domain experts to find problems that were too hard for mathematical analysis but tractable to numerical solutions on his computer.

Of course he knew what was computable on his machine, because he worked out new definitions of elegance and efficiency, came up with the mathematical models for analyzing the complexity of algorithms being executed on his device, and with his staff, ran hundreds of experiments to learn what worked well and what didn't.

John von Neumann invented our universe.

Then he died at the depressingly early age of 53, robbing the world of perhaps the greatest genius of the 20th century. "Those who know" generally seem to rank Albert Einstein ahead of von Neumann, but Johnny always gets a solid #2 vote. Frankly, though, I think Johnny had a far bigger impact on my life, and not just because I'm a programmer. What did Albert do, really? Dashed all our hopes of faster-than-light travel, that's what he did. Whined a lot about not agreeing with quantum mechanics, that's what he did. To the best of my knowledge, Einstein didn't even know EJB, which according to many Amazon folks makes him an idiot.

When I say that von Neumann invented our universe, I'm not trying to be poetic or rhetorical. What I'm saying is that his first attempt at a computing machine, one that he didn't really like all that much and considered mostly a prototype, is still the one sitting on my desk today. That means we're a bunch of frigging boneheads. You, me, everyone.

Discrete math, data structures, algorithms: they've all been refined since he died, sure, but he started it all. Virtually every discipline that we think of as "computer science" is like Euclidean geometry: useful, sure, but far from the only kind out there.

Operating systems, threads, and processes are just a pathetic attempt at fooling you into thinking you have a parallel computer, right? Gosh, computers are so darn fast that you can have the processor zing around like Feynman's hypothetical "only electron in the universe", and it almost looks as if we're smart. But we're not. All the world at large truly understands is serial execution, which is precisely why we're so lost in the whole distributed computing space. Everyone talks about agents and crawlers and web services and all this other crap we don't understand; Johnny would have taken one look at it and invented tractable mathematical solutions.

Compilers: now there's one discipline where Johnny clearly didn't have much influence. He was a big old-school proponent of doing everything in machine code. He might have changed his mind if he'd lived another few decades; hard to say. But ordinary mortals realized they needed shortcuts: higher-level languages that would be translated into Johnny's machine instructions.

So people came up with a bunch of crap-ass languages that still had the exact same abstractions as the underlying machine: a global memory that you update by issuing statements or instructions, expressions that can be computed by the arithmetic-logic unit, conditional branching and loops, subroutines. Everything you need to be "Turing-complete", which is equivalent to von Neumann-complete.

The new languages simply added various shortcuts: named storage locations ("variables"), syntax for dealing with memory addresses, "types" for telling the compiler that a variable comes from a particular set of valid values and has a particular set of legal operations, "scopes" and "namespaces" for organizing your code a little better, "objects" for anthropomorphizing your data so it can be happy or sad (mostly sad), and other cruft.

But it's all crap. Why? Because it's all just sugaring for the capabilities of assembly language on von Neumann serial machines, plus a smattering of support for calling into the operating system code so you can pretend your program is performing truly parallel operations. And none of that parallel stuff works very well, since we don't understand it, because we don't know math.

What about data structures, you ask? Surely that's one island of purity, something that exists outside of the von Neumann universe? After all, we worked so hard to understand them. (Well, some of us did. Plenty of folks didn't even do that; they just call an API and it all works like magic.)

Sorry to disappoint you, but most of our data structures are fundamentally based on Johnny's sequential machine. Think of all those pointers: they're just memory addresses. The best you can do for a sorting algorithm, complexity-wise, assuming radix sort isn't possible, is n log n. Or so you thought; there are parallel algorithms that run in linear time. All your cherished data structures are simply the best that clever people could do at organizing data on a Turing Machine. If someone created a set of data structures whose pointers were URLs, that would be a step away from von Neumann.
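
To be clear about that caveat: the n log n bound only applies to comparison-based sorting. Here's a minimal counting-sort sketch (my example, assuming small non-negative integer keys) that sorts in linear time by exploiting knowledge of the keys instead of comparing them:

    def counting_sort(items, max_key):
        """Sort non-negative integers in O(n + max_key) time -- no comparisons."""
        counts = [0] * (max_key + 1)
        for x in items:
            counts[x] += 1
        result = []
        for value, count in enumerate(counts):
            result.extend([value] * count)
        return result

    print(counting_sort([5, 3, 8, 3, 0, 1], max_key=8))   # [0, 1, 3, 3, 5, 8]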

What about SMP or NUMA? Surely adding multiple processors is a huge step towards parallel computing? Nope, sorry. Well, not much of one. There's still a von Neumann bottleneck; the channel has just been made a bit shorter. We need it to be made infinitely wider by creating a truly parallel computer that doesn't consist of CPUs all trying to update the same global store.

Face it: Computer Science was a misnomer. It should have been called Johnny's First Universe.

What's Left?

OK, so let's speculate about what would have happened if Johnny had lived another 30 years, and built his parallel computer based on neural networks and cellular automata theory, and we all used those instead of the Turing devices. What things out of an undergraduate computer science degree would still be relevant?

That's easy: Math. Everything that's relevant would stem from math. There would be data structures, but they'd be different ones, derived from the complexity analysis of the "inner economy" of the parallel computer. There would be algorithms, but they'd also be different, and designing your own algorithms would require mathematical analytical abilities, just like they do today. There would be control systems and feedback loops and type systems and semantic nets, just like today, but all of those things are just useless superstition without math.

Just like everything, really. It's hard to think of useful courses in school that didn't eventually require you to start doing some mathematical analysis in order to understand the advanced concepts. Even philosophy, psychology and social sciences need math -- let's not forget that logic, statistics and stochastic/probabilistic methods are still branches of mathematics.

The other big one is Linguistics. It's a field that has its share of math, to be sure, and in the end, mathematics will probably explain why people are so scared of languages. But until the human brain is really mapped out, which could be a while, an understanding of languages and how they work is going to be fundamental. We can't even do math without having a language for it.

Programming Languages

I said earlier that all programming languages are just desperate attempts to provide shortcuts for operations on the von Neumann architecture. Actually, that's not entirely true. A handful of languages are designed to express computations in a "pure" form, not tied to Johnny's omnibus.

One such language is Prolog. There's a branch of math (what else?) that focuses on being able to describe or model a computation as a set of rules, and then having an engine evaluate the rules, supposedly without having to know or care how the engine does it. In practice, you do need to know how it does it, in order to be able to compute the complexity of the algorithm. This model is quite useful for certain problem domains, and Prolog is a language that gives you access to a rules engine, and provides "normal" language features for situations where the rules engine doesn't help.
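
To give a flavor of the "describe the rules, let an engine evaluate them" idea, here's a toy forward-chaining sketch in Python rather than Prolog (all the names and facts are made up for illustration); in actual Prolog the rule would read something like grandparent(X, Z) :- parent(X, Y), parent(Y, Z).

    # Facts are tuples; the single rule derives grandparent facts from parent facts.
    facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

    def rule_grandparent(facts):
        """If parent(X, Y) and parent(Y, Z), then grandparent(X, Z)."""
        derived = set()
        for (p1, x, y1) in facts:
            for (p2, y2, z) in facts:
                if p1 == p2 == "parent" and y1 == y2:
                    derived.add(("grandparent", x, z))
        return derived

    # Keep applying the rule until nothing new is derived (a fixed point).
    while True:
        new = rule_grandparent(facts) - facts
        if not new:
            break
        facts |= new

    print(("grandparent", "alice", "carol") in facts)   # True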

There are others, but I won't go into them because I don't understand them yet. Some of them aren't so much languages as "approaches" that can be used in any language -- approaches to doing truly parallel computation, for instance.

But Lisp is particularly interesting, because unlike most other languages, it wasn't designed as a language for the Turing machine. One of Alan Turing's professors, Alonzo Church, was working on a mathematical model for expressing computations at the same time Turing came up with his Turing Machines. Church had created a system called the Lambda Calculus, and he and Turing quickly realized the two systems were formally equivalent, albeit very different-looking. The Lambda Calculus is a system for building up computations by writing and composing functions.
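
To give a taste of what "everything is a function" means, here's the classic Church-numeral trick, written in Python syntax for familiarity (this illustration is mine, not something from the book): numbers are encoded as functions, and addition becomes function composition.

    zero = lambda f: lambda x: x                        # apply f zero times
    succ = lambda n: lambda f: lambda x: f(n(f)(x))     # apply f one more time
    add  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))

    def to_int(n):
        """Convert a Church numeral back to an ordinary int by counting applications."""
        return n(lambda k: k + 1)(0)

    two = succ(succ(zero))
    print(to_int(add(two)(two)))   # 4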

As it turns out, Turing machines are easier to implement in (1950s) hardware, and the Lambda calculus is easier for people to read and write by hand, as you'll know if you've ever tried to write out a significant-sized Turing machine.

In 1958, the year after von Neumann passed away, John McCarthy invented a mathematical notation called Lisp, which was based on the Lambda calculus, and allowed him to express programs much more conveniently than the von Neumann instruction set allowed him to do. Soon afterwards, a grad student decided to turn it into a programming language. Lisp survives today as the second oldest programming language still in use, after Fortran, which was invented in 1957.

The fact that Lisp is based on the lambda calculus is important. It means it's not some cheesy hack to provide syntactic sugar for von Neumann machine language. It's a model for writing programs that's intrinsically powerful and expressive. For one thing, being a functional language, Lisp translates far more easily to parallel machines, because it doesn't assume the world works by moving a head around on an infinite tape; i.e. it doesn't work by side-effect.

Lisp has some pragmatic additions to allow you to do programming by side-effect. Some people think this is bad, because that paves the way to ugly bugs. But as long as von Neumann machines are the norm, it's going to be practical. At least Lisp encourages you to program without side-effects. C, C++, Java, Perl, and most other languages embrace the von Neumann architecture and force you to write error-prone code.
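
Here's the contrast in miniature, using Python and a trivial example of my own choosing: the side-effecting version updates shared state in place, while the functional version maps a pure function over the data -- exactly the shape of program that spreads easily across parallel hardware.

    # Side-effecting style: a shared list is updated in place, so the steps
    # depend on each other and can't naively be run in parallel.
    results = []
    for x in [1, 2, 3, 4]:
        results.append(x * x)

    # Functional style: a pure function is mapped over the data; no step
    # touches shared mutable state, so the work could be split across workers.
    squares = list(map(lambda x: x * x, [1, 2, 3, 4]))

    print(results == squares)   # True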

And Lisp's design gives it several distinct technical advantages over other functional languages such as ML and Haskell. I'll leave them out of scope for the moment, but all of them contribute to Lisp's knack for compact code -- and the larger the system, the bigger the payoff. C++, Java and Perl code bases seem to grow almost linearly with the functionality, but carefully engineered Lisp code bases have a way of growing at a rate that feels closer to the log of the feature complexity.

Big Systems

Sometimes I hear people hint that we're wasting time when we talk about little toy code examples, or about programming languages. Their feeling is that it's silly to be talking about these things when people doing "real" work here are busy building giant production systems.

But if you consider my July 16th blog entry Lisp Wins (I think), where I talk about high-level language constructs like "map" saving you reams of code, or my September 24th blog entry Saving Time, where the Second Point talks about how I didn't give up on a function until I'd written it to be 60% shorter, you'll realize I've got a running theme here: big systems suck, and you shouldn't write them.

In the 1930s, people were solving large computational problems by accumulating huge masses of laborers, and doling out little parts of the computation to each person, who would work it out on paper using a slide rule or sometimes a calculator. People who did bits of calculations for a living were called "computers", and groups of people dividing up a calculation are now called "human computers".

The human computers of the 1930s weren't the first, either; they'd been around and well-documented, here and there, since the 16th century. And you've heard the speculations that the monks' religious rituals in some ancient monasteries were actually a ruse to carry out some large computation. But as computing machines increased in power, the need for human intervention in computations decreased.

Or did it? If that's the case, why are we hiring hundreds more programmers? Are we just churning big prayer-wheels here?

Mathematicians have traditionally avoided problems that they considered "intractable" to analysis. They've known about numerical methods for centuries; even Newton devised a few, as did the ancient Greeks. But brute-force methods have traditionally been met with some disdain, because the results of large-scale human computations rarely justify the cost of the undertaking.

Clairaut's work in 1757, in which he and two other people labored for 5 months to trace the orbit of Halley's Comet to compute its perihelion, garnered widespread criticism after his team missed the true perihelion by thirty-one days. Larger-scale computing groups have met with similar results: you can perform large, expensive computations, e.g. to create trigonometric tables, but humans make errors, and human computers generate unreliable output that becomes worse the harder they push to get it done faster.

Before von Neumann's programmable computer, mathematicians had few other options. They pushed pure analysis to its limits, but some physical and natural phenomena can only be described with complex systems of equations that cannot be solved analytically. Many problems had to be solved informally.

Von Neumann changed all that; he embraced numerical methods and invented entire new branches of math for use with the computers he was inventing -- the sequential computer for the most part, but also his planned parallel computers. And he was able to perform larger and larger computing tasks with small-ish teams. When a task grew large enough, requiring more than, say, 20 people, it was considered intractable. Rather than trying to secure funding for more people, he and his colleagues went to work on creating intrinsically more powerful mathematical models to make the problem tractable.

And don't you think for a moment that his problems were "easier" than the ones Amazon faces today. Looking at the speed at which he was pioneering the development of parallel computational models and research on intelligent, self-reproducing, self-repairing systems in the last few years of his life, it's pretty clear that he would have succeeded in building them. Johnny's approach was to make intractable tasks tractable.

That's not the way the computer programming industry works today. What is software engineering, if not dividing up problems to be worked on by multiple people? And coordinating the division of labor for huge computing tasks? It might seem more glamorous than doing arithmetic, but is it really? I don't think so. We're still just a big human computer, with a fancier slide rule.

All because we don't know math.

Epilogue

I used to think I was too old to go back and get good at math. I don't think that anymore. "Math every day." That's my new motto. Another good one might be: "Think outside the tape." (The Turing-machine tape, of course.)

Various (much slower) researchers have continued to advance the mathematics behind neural nets, cellular automata, parallel algorithms, and parallel languages. If Johnny came back today, he'd probably catch up with all of the advances in math, computer science, and genetics of the past 45 years within about, oh, maybe 18 months. I'll probably never catch up, not if I live to be 100. But I'm not worried; at least now I know that the better I get at math, the better I'll be at virtually everything I try my hand at.

I went and bought a bunch of math books today. Many of them were by the same guy: if there's one person who's had nearly as big an impact on Computer Science as Johnny von Neumann, it's the brilliant mathematician Donald Knuth. I'm going to be reading everything he ever wrote.

I've also bought myself some nice titles that teach you math and math history simultaneously, including Mathematics and its History by John Stillwell. I've decided that the only way to make math non-boring is to put it in historical perspective. Reading about the people, the discoveries, the competition and the breakthroughs makes it interesting in a way my math courses never managed to do.

And I bought some works by Noam Chomsky and other linguists. Time to familiarize myself with linguistics, too, since the field has such a huge similarity to the study of programming languages and other computer-science disciplines like search and natural language recognition. No need to drop everything else to do this; a few hours of study a week should more than suffice. I'm looking forward to my new education.

Math every day.

Postscript

There are so many reviews of Stephen Wolfram's insane rant about cellular automata that it's hard to find any one in particular that you're looking for. I was browsing them looking for the one that mentioned that Von Neumann was the man who invented CAs; it turns out to be the one titled The Emperor's New Kind of Clothes by Joe Weiss, and comes up first if you sort the reviews by Most Helpful First. (Hard to believe that's not our default configuration.)

The first few Most Helpful reviews of Wolfram's book are all quite good, but one of them is so good I just have to mention it here; otherwise you might not go read it. I've read some funny customer reviews in my time, but this one might be my new all-time favorite: A new kind of review. Enjoy!

(Published November 15th, 2004)