Friday, October 26, 2007

Worst Measurement Ever

I've been resisting posting about this for some time, ever since Patrick pointed me at the link almost six months ago, but I've reached my breaking point. In my quest to learn Erlang, I've come across at least three blogs and/or articles that actually cite this "measurement", with attribution, as if it were a legitimate claim about the scalability of Erlang.

Given that I'm reading his book, I would really expect more from Joe Armstrong and, by attribution, Ali Ghodsi.

The Apache vs. Yaws measurement is one of the most useless pieces of information produced by the Erlang community, to the point that I'd argue it does a disservice to the Erlang community and the language.

In any sort of quasi-scientific measurement (or primary school science experiment for that matter), I would expect to see:

  1. the actual code used to test the server
  2. the actual Linux kernel version
  3. the actual yaws server code

Instead, we see a graph of an under-documented experiment that creates conditions for a DoS test at best, not a web server scalability test.

From the looks of it, this "measurement" is not:

  1. documented to any reasonable extent - what kernel was used? what was the exact Apache configuration? what was the Yaws code used to serve the files?
  2. repeatable - no source, little documentation, little detail on the environment
  3. peer reviewed - without the above, nobody else can discuss in detail or attempt to repeat the same results
  4. valid - it simply does not repeat a real-world environment faced by any modern web server

Allow me to support some of those assertions from real-world experience helping large web sites:

  1. Without knowing which Linux kernel version was used, how Apache was configured (in detail), and how Apache was built, it's impossible to know whether Apache really "was configured for maximum performance". Since they decided not to share any of that vital information, I consider the entire experiment invalid. These things matter: just as it's easy to misconfigure the Erlang runtime by running BEAM on an SMP system without SMP enabled, it's easy to cripple Apache with the wrong build or kernel. For example, Apache gained epoll support with the 2.6.x kernels; anything earlier would be an inappropriate kernel to pair with mpm_worker. From the sparse detail given, we have no way to know whether the configuration was valid.
  2. In a real-world environment, if I saw hundreds of connections sitting inactive for ten seconds at a time, I would simply set the connection timeout to five seconds. I'm not sure how you would do the same on Yaws.
  3. Most production web servers don't serve one-byte files (the "load" requested by the clients in the measurement). I can't recall the last time I saw a high-volume server dish out a one-byte file. Real web servers serve files measured in KB or even MB. In fact, given Erlang's empirical difficulty handling basic file I/O, I'm not surprised they chose a one-byte file to simulate "load". If the file were any larger, Yaws would likely have been swamped by its own unbuffered I/O implementation and exhibited substantial latency per request.
  4. A one-byte file would be cached upstream of the server at any high-volume site, particularly if your clients were operating over high-latency connections where one character per ten seconds was realistic (i.e. dialup).
  5. If this were a real DoS attempt, it would be choked off at the ISP router, well before the web server saw the intentionally slow request.
  6. I can personally type an HTTP request faster than one character per ten seconds; this is simply not a realistic access pattern from a benign client.
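To make the configuration complaint in point 1 concrete, here is a sketch of the kind of Apache 2.2 mpm_worker directives a repeatable writeup would need to disclose. The values below are my own illustrative assumptions, not the benchmark's actual settings:

```apache
# Illustrative values only -- not the benchmark's actual configuration.
# The MPM choice and these limits determine how many slow clients
# Apache can hold open concurrently, which is exactly what the test stresses.

# Drop connections that sit idle, rather than holding them for minutes.
Timeout 5
KeepAlive On
KeepAliveTimeout 2

<IfModule mpm_worker_module>
    # 16 processes x 64 threads = 1024 concurrent connections.
    ServerLimit       16
    ThreadsPerChild   64
    MaxClients        1024
</IfModule>
```

Without numbers like these (plus the kernel version and build flags), "configured for maximum performance" is unverifiable.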

I'm not going to say that Erlang does or doesn't scale. My point is that this diagram and the corresponding writeup are utterly useless, and that to cite it as a valid study is irresponsible at best (especially when it's your own thesis).

I put this in the same category as Microsoft FUD - there is an agenda here, and people will read it and say "Oh, yeah - told you we rock" without questioning the details. Most, however, should dismiss it as pure FUD, and FUD served from an Apache 2.2.3 server no less (maybe Yaws wasn't up to the task).


Ulf Wiger said...

So to exemplify, you pick "empirical proof" that Erlang's file I/O sucks, without bothering to check the follow-ups of that experiment?

Tim Bray did as you suggested and posted the code. Others with more experience using Erlang have made the code run ca 17x faster. The slow file I/O was mostly due to the fact that the function used is designed for prompting a tty user for input.

Anyway, yaws was designed for dynamic content. The Apache vs Yaws benchmark was done back in the days when no-one outside telecoms cared about Erlang.

Bjarne Stroustrup once said: "There are only two kinds of languages: the kind everybody bitches about, and the kind nobody uses." Erlang has gone from being a language nobody uses to one that everybody bitches about. For one who's been using Erlang for years, this is kinda cool. (:

Erik Onnen said...

"without bothering to check the follow-ups of that experiment"

I've seen the dozens of alternatives come across the mailing list, for sure. What approach Yaws uses, I don't know. That doesn't change the fact that it took several experts to tune a fundamental operation; none of the other languages had that problem. Maybe Yaws doesn't have the issue, but the nature of the "load" test remains suspect on other grounds, I/O problems or not.

I can appreciate that the benchmark was done long ago, but it's still cited as a reference for Erlang's scalability when it doesn't really deserve that place in the language's emergence from the telcos. We should find some better examples and highlight those.

Ulf Wiger said...

"That doesn't change the fact that it took several experts to tune a fundamental operation - none of the other languages had that problem"

No, it didn't. Several versions that used a faster way to read a file were posted the day after Tim's post. Some users then decided to explore how fast it could go on an 8-core machine, and collectively brought the runtime down to ca 0.3 seconds.
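For readers who never saw those follow-ups, the essence of the faster versions was to stop going line by line through the tty-oriented io module and instead read the file as a binary in one call. A minimal sketch of that idea in my own words, not the actual code from the thread:

```erlang
%% Minimal sketch of the idea only -- not the actual Wide Finder code.
%% file:read_file/1 pulls the whole file into one binary, avoiding the
%% per-line round trip through the io server that io:get_line/1 incurs.
-module(fastlines).
-export([count_lines/1]).

count_lines(Path) ->
    {ok, Bin} = file:read_file(Path),
    %% Count newline characters in the binary.
    length(binary:matches(Bin, <<"\n">>)).
```

The point isn't this particular function; it's that the fix was a different I/O call, not expert-level tuning.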

BTW, since everyone likes to bash io:get_line(), let's conduct a small experiment just to illustrate that perhaps the designers of the code weren't simply incompetent:

Start two erlang nodes:
erl -sname n1
erl -sname n2

From the shell in n1, write:
(n1@myhost)1> rpc:call(n2@myhost,io,get_line,['What is your name? ']).
What is your name? Ulf

So what happened?
When making a remote procedure call to the node n2@myhost, Erlang automatically establishes a connection and spawns a process on the remote node to service the call. This process inherits the group leader (which services the basic I/O) of the caller. Thus, the prompt appears on the caller's node.

This is extremely useful when building distributed embedded systems, where you may have machines that have no TTY. There are always tradeoffs between speed and flexibility. A good environment should cater to both needs, but the user needs to learn the purpose of different functions.

This is not a problem until people getting their feet wet make beginner's mistakes, blog about them, and then have their findings referred to as "empirical proof".

I think it's great that people start pushing for Erlang to be optimized in areas that simply aren't important in non-stop embedded systems (blocking line-oriented I/O tends to be one of those). I've found Erlang's IO system to be fast, scalable, flexible and extremely robust.

BTW, if you want the code for the Apache vs Yaws experiment, just email Joe and ask him. It's never been a secret, but there was never any pressure to put it on-line before, as far as I know.

If you want an interesting reference to Erlang's capabilities in terms of (network) IO and load tolerance, you may read

Comparing C++ and Erlang for Motorola Telecoms Software

If you read the slides and decide that you want to argue against the findings, please first read the full paper (Copyright ACM). Some of their findings are consistent with Joe's experiment, but this was the outcome of a 3-year EU project.

And the fact that SICS (Swedish Institute for Computer Science) runs an Apache web server, like the majority of other sites out there, says precious little about Yaws capabilities (or lack thereof).

Steve Vinoski said...

Regarding the Wide Finder, I am certainly no Erlang expert. That was basically one of my first real Erlang programs, and yet I was able to create a solution after 3 tries (not counting tiny tweaks in between) that matched or exceeded the performance of Tim Bray's original Ruby. So, stating that "experts" were required to get Erlang to provide a decent solution is incorrect. According to Tim, even my first solution already outran Ruby on the T5120 64-core machine, and my third solution is way faster than that first solution. The experts took that third solution and made it even faster.

But on the Yaws vs. Apache front, I agree that it would be nice to see some updated results with details on how to recreate the tests. I was able to partially recreate the tests and it looked to me that Yaws could indeed handle many more connections than Apache and do it much more quickly, but I'd definitely have to perform much more rigorous testing before I'd be confident enough to officially publish any results.

Erik Onnen said...

Ulf and Steve, thanks for level-setting the widefinder issues. This wasn't really a widefinder post and in retrospect, I could have made my point without dwelling on I/O concerns. That came across as a troll on my part, but rather than removing it I'll leave the thread in context and produce another post on my widefinder thoughts.

Ulf Wiger said...


Just to clarify another point that got lost: I think it would be great to see a follow-up on the Yaws-vs-Apache experiment, even if this would end up refuting some of the initial findings. I agree that more info should be made available, so that others could add insight.

There are a lot of questions to answer. Obviously, Apache isn't primarily focused on performance, and there are other fast dynamic content servers out there. It would also be useful to highlight tuning aspects etc.

Joe isn't going to do this (I hope), since he doesn't work for SICS anymore. But he can contribute the background info, if asked (so he told me).