Node.js and Web Server Architecture Performance
Since I got an ACM subscription, I have been interested in finding out which research papers have practical implications for web developers. There was a blog post by Ted Dziuba that blasted the hype surrounding Node.js, calling it a cancer on the developer community and suggesting that its performance is sub-par and that its language, JavaScript, contains flaws as bad as those found in PHP. He ran a micro-benchmark, and the number of requests that Node.js could handle with the simplest function under a modest load was tiny.

Joshua Kehn replied to this, calling the example Dziuba used to prove the poor performance of Node.js flawed. Kehn demonstrates this by benchmarking the same example in PHP and in Python. The example isn't very good: it's the Fibonacci function, which requires caching/memoization to be efficient. Even with the inefficient implementation, Node.js is apparently the fastest of the three.

The trouble with this comparison is that the implementations of the programming languages are different and have different performance characteristics (for example, the PyPy implementation of Python sometimes performs much better). Another problem is that the HTTP server implementations may not be optimized. This is especially true for Python, where the best practice for deploying Django in production is to use Apache and mod_wsgi. Both the Python and PHP examples should be using Apache.
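To see why the Fibonacci example says more about the algorithm than about the platform, here is a minimal Python sketch comparing a doubly recursive version with a memoized one. The function names are my own and this is not the code from either blog post; I'm assuming the benchmark used the usual recursive definition.

```python
from functools import lru_cache


def fib_naive(n: int) -> int:
    """Doubly recursive Fibonacci: the number of calls grows exponentially."""
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


@lru_cache(maxsize=None)
def fib_memo(n: int) -> int:
    """Same recurrence, but each value is computed once and then cached."""
    if n < 2:
        return n
    return fib_memo(n - 1) + fib_memo(n - 2)


if __name__ == "__main__":
    # Both versions agree on the result; only the amount of work differs.
    print(fib_naive(20), fib_memo(20))
```

With the cache in place the work per request drops so far that a benchmark built on it would mostly be measuring the server, not the language.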
The comparison is between Node.js, Apache and the simple HTTP server included with Python. Was PHP slow because it was using Apache or was it slow because the PHP implementation is horrible? Was Python slow because the implementation of the language or server was un-optimized?
Once we separate these factors, we can judge the performance of Node.js more fairly. As a side note, neither blog post allows comments. I wonder why that is; are the authors afraid they'll be called out on this?
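One way to start separating the two is to time the handler's work by itself, with no HTTP server in the loop, and compare that with the end-to-end figure. The sketch below is a hypothetical helper (`seconds_per_call` and `handler_work` are names I made up), not something either post did.

```python
import time
from typing import Callable


def seconds_per_call(work: Callable[[], object], repeats: int = 100) -> float:
    """Average wall-clock seconds for one call to `work`, no server involved."""
    start = time.perf_counter()
    for _ in range(repeats):
        work()
    elapsed = time.perf_counter() - start
    return elapsed / repeats


def handler_work() -> int:
    # Stand-in for whatever the request handler computes (an assumption).
    return sum(i * i for i in range(10_000))


if __name__ == "__main__":
    per_call = seconds_per_call(handler_work)
    # Rough ceiling on calls per second if the server added no overhead.
    print(f"{per_call:.6f} s per call, at most {1 / per_call:.0f} calls/s")
```

If the end-to-end request rate is far below this ceiling, the gap is down to the server and the network stack, not the language.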
Node.js is event-driven and non-blocking, and I found a paper from 2007 that compares different web server architectures and their performance. It suggests that blocking sockets improve performance, and that event-based and pipeline-based servers have a higher throughput (Click here to learn how to measure the throughput of your web server). The web servers discussed in the paper are implemented in C++ and Java, which may make the comparison unfair to Node.js; however, it seems common to use Node.js as your production web server instead of Apache. The paper is a great example of how benchmarking should be done. The authors describe the exact hardware specification they used, the exact language and server implementations, and the tuning required, and they go further and do some verification.
Prior to running a full slate of experiments, a correctness test was conducted with each server to ensure that they were responding with the correct bytes for each request. In addition, it is important to verify that each server and each different configuration of the servers, successfully processes the given workload in a way that permits fair comparisons. Our SPECweb99-like workload uses a set of files with 36 different sizes, ranging from 102 to 921,600 bytes. Because some servers might obtain higher throughput or lower response times by not properly handling files of all sizes, we check if all file sizes are serviced equally. Client timeouts are permitted across all file sizes; however, the verification ensures that if timeouts occur, all file sizes timeout with approximately the same frequency (no single timeout percentage can be 2% larger than the mean). We also check that the maximum timeout percentage is ≤ 10%. An additional criteria is that no individual client experiences a disproportionate number of timeouts (i.e., timeout percentages for each file size are ≤ 5%). This check is similar in nature to that performed in SPECweb99 to verify that approximately the same quality of service is afforded to files of different sizes. Results are only included for experiments that pass verification for all request rates. In instances where verification did not pass, it was most often due to servers not being able to respond to requests for large files prior to the client timing out. This simulates a user getting frustrated while waiting for a page and stopping the request and browsers that have built-in timeout periods.
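The checks in the quoted passage are simple enough to sketch in a few lines of Python. This is my own reading, not the authors' code: I'm taking "2% larger than the mean" to mean two percentage points, and I apply the 5% bound per client even though the passage words it per file size. The function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Mapping


def passes_verification(
    timeout_pct_by_size: Mapping[int, float],
    timeout_pct_by_client: Mapping[str, float],
) -> bool:
    """Apply the three timeout checks described in the quoted passage.

    Percentages are on a 0-100 scale. At least one file size is required.
    Treating "2% larger" as two percentage points is an assumption.
    """
    sizes = list(timeout_pct_by_size.values())
    avg = mean(sizes)
    # No file size may time out noticeably more often than the average.
    evenly_spread = all(pct - avg <= 2.0 for pct in sizes)
    # The worst file size must still time out at most 10% of the time.
    max_ok = max(sizes) <= 10.0
    # No single client may see a disproportionate share of timeouts.
    clients_ok = all(pct <= 5.0 for pct in timeout_pct_by_client.values())
    return evenly_spread and max_ok and clients_ok
```

A run where only the largest file times out, say `{102: 0.0, 10240: 0.0, 921600: 9.0}`, fails the first check even though every value is under 10%, which is exactly the kind of server shortcut the verification is meant to catch.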
The most important thing I found in this paper is that there's a benchmark specification for web server testing called SPECweb99 (Click here to see the results published). Here's what its successor, SPECweb2009, measures:
Each workload measuring the peak performance (SPECweb2009_Banking, SPECweb2009_Ecommerce, and SPECweb2009_Support) measures the maximum number of simultaneous user sessions that a web server is able to support while still meeting specific throughput and error rate requirements. The TCP connections for each user session are made and sustained at a specified maximum bit rate with a maximum segment size intended to more realistically model conditions that will be seen on the Internet during the lifetime of this benchmark. Power (in Watts) used while running the peak performance on each of these workloads is measured as well.
SPECweb2009_Power is a workload that is fully based on SPECweb2009_Ecommerce. SPECweb2009_Power is run at six different load levels, starting with the highest load level that corresponds to the maximum number of connections used in SPECweb2009_Ecommerce, ramping down to idle. The workload measures the performance to power ratio as the sum of simultaneous user sessions to the sum of watts used.
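The performance to power ratio described above is just one sum divided by another. Here is a minimal Python sketch; the function name is my own and the six load levels in the usage example are made-up numbers, not published results.

```python
from typing import Sequence, Tuple


def performance_per_watt(levels: Sequence[Tuple[int, float]]) -> float:
    """Sum of simultaneous sessions divided by sum of watts over all load levels.

    Each entry is (simultaneous_sessions, average_watts) for one load level.
    """
    total_sessions = sum(sessions for sessions, _ in levels)
    total_watts = sum(watts for _, watts in levels)
    return total_sessions / total_watts


if __name__ == "__main__":
    # Hypothetical readings from the highest load level down to idle.
    readings = [
        (1000, 300.0),
        (800, 270.0),
        (600, 240.0),
        (400, 210.0),
        (200, 180.0),
        (0, 150.0),
    ]
    print(f"{performance_per_watt(readings):.3f} sessions per watt")
```

Note that the idle level adds watts but no sessions, so a server that draws a lot of power while doing nothing is penalized.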
This is the kind of testing that should be done, not the flawed microbenchmarks done by both Dziuba and Kehn. The importance of computer science is that we get testable and verifiable results and have a good way to compare different applications. I'm not even going to look at the performance problems that may be due to the poor implementations of PHP and Python, but they do matter, and there's a wealth of papers that suggest how we can improve performance at these lower levels (not to mention the number of papers that discuss improvements to the higher-level data structures and algorithms we use).
