Node.js and Web Server Architecture Performance
The comparison is between Node.js, Apache, and the simple HTTP server included with Python. Was PHP slow because it was running under Apache, or because the PHP implementation itself is poor? Was Python slow because the language implementation was unoptimized, or because the server was?
Once we separate these factors, we can judge the performance of Node.js more fairly. Side note: neither of the blog posts allows comments. I wonder why that is; are the authors afraid they'll be called out on this?
Node.js is event-driven and non-blocking, and I found a paper from 2007 that compares different web server architectures and their performance. It suggests that blocking sockets improve performance, and that event-based and pipeline-based servers have higher throughput (click here to learn how to measure the throughput of your web server). The web servers discussed in the paper are implemented in C++ and Java, and maybe this is unfair to Node.js; however, it seems common to use Node.js as your production web server instead of Apache. The paper is a great example of how benchmarking should be done. They describe the exact hardware they used, the exact language and server implementations, and the tuning required, and they go further and verify their results.
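To make "event-driven and non-blocking" concrete, here is a minimal Node.js server of my own (not from the paper or from either benchmark): a single thread services every connection, and each request fires a callback instead of tying up a dedicated thread.

    // Minimal event-driven, non-blocking HTTP server in Node.js.
    // One thread handles every connection; each incoming request
    // invokes the callback below rather than blocking on its socket.
    const http = require('http');

    const server = http.createServer((req, res) => {
      res.writeHead(200, { 'Content-Type': 'text/plain' });
      res.end('hello\n');
    });

    server.listen(8000, () => {
      console.log('Listening on http://localhost:8000');
    });

Pointing a load generator such as ApacheBench at it (for example, ab -n 10000 -c 100 http://localhost:8000/) gives a rough requests-per-second number, though nothing like the controlled setup the paper uses.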
Prior to running a full slate of experiments, a correctness test was conducted with each server to ensure that they were responding with the correct bytes for each request. In addition, it is important to verify that each server and each different configuration of the servers, successfully processes the given workload in a way that permits fair comparisons. Our SPECweb99-like workload uses a set of files with 36 different sizes, ranging from 102 to 921,600 bytes. Because some servers might obtain higher throughput or lower response times by not properly handling files of all sizes, we check if all file sizes are serviced equally. Client timeouts are permitted across all file sizes; however, the verification ensures that if timeouts occur, all file sizes timeout with approximately the same frequency (no single timeout percentage can be 2% larger than the mean). We also check that the maximum timeout percentage is ≤ 10%. An additional criteria is that no individual client experiences a disproportionate number of timeouts (i.e., timeout percentages for each file size are ≤ 5%). This check is similar in nature to that performed in SPECweb99 to verify that approximately the same quality of service is afforded to files of different sizes. Results are only included for experiments that pass verification for all request rates. In instances where verification did not pass, it was most often due to servers not being able to respond to requests for large files prior to the client timing out. This simulates a user getting frustrated while waiting for a page and stopping the request and browsers that have built-in timeout periods.
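To make the verification criteria above concrete, here is a rough sketch of my own (not the paper's actual harness) of the per-file-size timeout checks: no file size may time out more than 2 percentage points above the mean, and the worst file size may not exceed a 10% timeout rate.

    // Sketch of the verification described above. `stats` maps each file
    // size to the number of requests issued and the number that timed out.
    function verifyTimeouts(stats) {
      const percentages = Object.values(stats).map(
        ({ requests, timeouts }) => (100 * timeouts) / requests
      );
      const mean = percentages.reduce((a, b) => a + b, 0) / percentages.length;

      // No file size may time out more than 2 percentage points above the mean.
      const balanced = percentages.every((p) => p - mean <= 2);
      // The worst file size may not exceed a 10% timeout rate.
      const bounded = Math.max(...percentages) <= 10;

      return balanced && bounded;
    }

    // Example run with made-up numbers for three of the 36 file sizes.
    const stats = {
      102:    { requests: 5000, timeouts: 40 },
      30720:  { requests: 3000, timeouts: 35 },
      921600: { requests: 1000, timeouts: 20 },
    };
    console.log(verifyTimeouts(stats)); // true if the run passes verification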
The most important thing I found in this paper is that there's a specification for web server testing called SPECweb99 (click here to see the published results). Here's what it measures, as described for the newer SPECweb2009 revision:
Each workload measuring the peak performance (SPECweb2009_Banking, SPECweb2009_Ecommerce, and SPECweb2009_Support) measures the maximum number of simultaneous user sessions that a web server is able to support while still meeting specific throughput and error rate requirements. The TCP connections for each user session are made and sustained at a specified maximum bit rate with a maximum segment size intended to more realistically model conditions that will be seen on the Internet during the lifetime of this benchmark. Power (in Watts) used while running the peak performance on each of these workloads is measured as well.
SPECweb2009_Power is a workload that is fully based on SPECweb2009_Ecommerce. SPECweb2009_Power is run at six different load levels, starting with the highest load level that corresponds to the maximum number of connections used in SPECweb2009_Ecommerce, ramping down to idle. The workload measures the performance to power ratio as the sum of simultaneous user sessions to the sum of watts used.
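As a back-of-the-envelope illustration (the load levels and figures below are invented, not SPEC results), the performance-to-power ratio comes out to the sum of simultaneous user sessions across the six load levels divided by the sum of watts measured at those levels:

    // Made-up example of the SPECweb2009_Power metric as I read it:
    // six load levels, from the Ecommerce peak down to idle.
    const loadLevels = [
      { sessions: 3000, watts: 310 }, // 100% of the Ecommerce peak
      { sessions: 2400, watts: 280 },
      { sessions: 1800, watts: 250 },
      { sessions: 1200, watts: 220 },
      { sessions: 600,  watts: 190 },
      { sessions: 0,    watts: 150 }, // idle
    ];

    const totalSessions = loadLevels.reduce((sum, l) => sum + l.sessions, 0);
    const totalWatts = loadLevels.reduce((sum, l) => sum + l.watts, 0);

    console.log((totalSessions / totalWatts).toFixed(2)); // sessions per watt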
This is the kind of testing that should be done, not the flawed microbenchmarks run by both Dziuba and Kehn. The value of computer science is that it gives us testable, verifiable results and a sound way to compare different applications. I'm not even going to look at the performance problems that may stem from poor implementations of PHP and Python, but they do matter, and there's a wealth of papers suggesting how to improve performance at these lower levels (not to mention the number of papers discussing improvements to the higher-level data structures and algorithms we use).