A couple of months ago, in response to someone else's speed claims, I posted a comment that CherryPy's built-in WSGI server could serve 1200 simple requests per second. The demo used Apache's "ab" tool to test ("-k -n 3000 -c %s"). In the last few days before the release of CherryPy 3.0 final, I've done some further optimization of cherrypy.wsgiserver, and now get 2000+ req/sec on my modest laptop.
threads | Completed | Failed | req/sec | msec/req | KB/sec |
10 | 3000 | 0 | 2170.79 | 0.461 | 358.18 |
20 | 3000 | 0 | 2080.34 | 0.481 | 343.26 |
30 | 3000 | 0 | 1920.31 | 0.521 | 316.85 |
40 | 3000 | 0 | 2051.84 | 0.487 | 338.55 |
50 | 3000 | 0 | 2051.84 | 0.487 | 338.55 |
The improvements are due to a variety of optimizations, including:
I want to make it clear that the benchmark does not exercise any part of CherryPy other than the WSGI server. I used a very simple WSGI application (not the full CherryPy stack):
    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        response_headers = [('Content-type','text/plain'),
                            ('Content-Length','19')]
        start_response(status, response_headers)
        return ['My Own Hello World!']
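As a rough sketch of what the server does with this application, the snippet below calls it once with no server at all and checks that the hard-coded Content-Length agrees with the body. This matters for the "-k" (keep-alive) runs, where the client relies on that header to know where a response ends. It assumes Python 3, so the body is bytes rather than the str used above; `call_once` and the minimal `environ` are hypothetical helpers for illustration, not part of CherryPy.

```python
def simple_app(environ, start_response):
    """Same shape as the app above, with a bytes body as Python 3 WSGI expects."""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', '19')]
    start_response(status, response_headers)
    return [b'My Own Hello World!']


def call_once(app):
    """Hypothetical helper: invoke a WSGI app once without any server."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the app hands to the server.
        captured['status'] = status
        captured['headers'] = dict(headers)

    # Minimal stand-in environ; a real server fills in many more keys.
    environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/'}
    body = b''.join(app(environ, start_response))
    return captured['status'], captured['headers'], body


status, headers, body = call_once(simple_app)
# The declared length and the actual body length should agree.
print(status, headers['Content-Length'], len(body))
```

If the two lengths ever disagree, a keep-alive client can hang or misread the next response, which would distort the benchmark.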
The full stack of CherryPy includes the WSGI application side as well, and consequently takes more time. But that has risen from about 380 requests per second in October to:
Client Thread Report (1000 requests, 14 byte response body, 10 server threads):
threads | Completed | Failed | req/sec | msec/req | KB/sec |
10 | 1000 | 0 | 536.86 | 1.863 | 85.36 |
20 | 1000 | 0 | 509.47 | 1.963 | 81.01 |
30 | 1000 | 0 | 499.28 | 2.003 | 79.39 |
40 | 1000 | 0 | 491.90 | 2.033 | 78.21 |
50 | 1000 | 0 | 504.32 | 1.983 | 80.19 |
Average | 1000.0 | 0.0 | 508.366 | 1.969 | 80.832 |
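For anyone checking the arithmetic, here is a small sketch (Python 3 assumed) that recomputes the Average row from the five per-concurrency rows in the report, and shows that the msec/req column is simply 1000 divided by req/sec. The `mean` and `msec_per_request` helpers are names made up for this illustration; the numbers are copied from the table.

```python
# (threads, req/sec, msec/req, KB/sec), copied from the full-stack report.
rows = [
    (10, 536.86, 1.863, 85.36),
    (20, 509.47, 1.963, 81.01),
    (30, 499.28, 2.003, 79.39),
    (40, 491.90, 2.033, 78.21),
    (50, 504.32, 1.983, 80.19),
]


def mean(values):
    """Plain arithmetic mean."""
    values = list(values)
    return sum(values) / len(values)


def msec_per_request(req_per_sec):
    """Time per request across all concurrent requests, in milliseconds."""
    return 1000.0 / req_per_sec


avg_req_sec = mean(r[1] for r in rows)
avg_msec = mean(r[2] for r in rows)
avg_kb_sec = mean(r[3] for r in rows)

print("Average | %.3f | %.3f | %.3f" % (avg_req_sec, avg_msec, avg_kb_sec))
for threads, req_sec, msec, _ in rows:
    # Reported msec/req next to the value derived from req/sec.
    print(threads, msec, round(msec_per_request(req_sec), 3))
```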
If you want to benchmark the full CherryPy stack on your own, just install CherryPy and run the script at
Here's the other script for the "bare server" benchmarks:
    import re
    import sys
    import threading
    import time

    from cherrypy import _cpmodpy

    AB_PATH = ""
    APACHE_PATH = "apache"
    SCRIPT_NAME = ""
    PORT = 8080


    class ABSession:
        """A session of 'ab', the Apache HTTP server benchmarking tool."""

        parse_patterns = [('complete_requests', 'Completed',
                           r'^Complete requests:\s*(\d+)'),
                          ('failed_requests', 'Failed',
                           r'^Failed requests:\s*(\d+)'),
                          ('requests_per_second', 'req/sec',
                           r'^Requests per second:\s*([0-9.]+)'),
                          ('time_per_request_concurrent', 'msec/req',
                           r'^Time per request:\s*([0-9.]+).*concurrent requests\)$'),
                          ('transfer_rate', 'KB/sec',
                           r'^Transfer rate:\s*([0-9.]+)'),
                          ]

        def __init__(self, path=SCRIPT_NAME + "/", requests=3000, concurrency=10):
            self.path = path
            self.requests = requests
            self.concurrency = concurrency

        def args(self):
            assert self.concurrency > 0
            assert self.requests > 0
            return ("-k -n %s -c %s http://localhost:%s%s" %
                    (self.requests, self.concurrency, PORT, self.path))

        def run(self):
            # Parse output of ab, setting attributes on self
            args = self.args()
            self.output = _cpmodpy.read_process(AB_PATH or "ab", args)
            for attr, name, pattern in self.parse_patterns:
                val = re.search(pattern, self.output, re.MULTILINE)
                if val:
                    val = val.group(1)
                    setattr(self, attr, val)
                else:
                    setattr(self, attr, None)


    safe_threads = (25, 50, 100, 200, 400)
    if sys.platform in ("win32",):
        # For some reason, ab crashes with > 50 threads on my Win2k laptop.
        safe_threads = (10, 20, 30, 40, 50)


    def thread_report(path=SCRIPT_NAME + "/", concurrency=safe_threads):
        sess = ABSession(path)
        attrs, names, patterns = zip(*sess.parse_patterns)
        rows = [('threads',) + names]
        for c in concurrency:
            sess.concurrency = c
            sess.run()
            rows.append([c] + [getattr(sess, attr) for attr in attrs])
        return rows


    def print_report(rows):
        widths = []
        for i in range(len(rows[0])):
            lengths = [len(str(row[i])) for row in rows]
            widths.append(max(lengths))
        for row in rows:
            print
            for i, val in enumerate(row):
                print str(val).rjust(widths[i]), "|",
        print


    if __name__ == '__main__':

        def simple_app(environ, start_response):
            """Simplest possible application object"""
            status = '200 OK'
            response_headers = [('Content-type','text/plain'),
                                ('Content-Length','19')]
            start_response(status, response_headers)
            return ['My Own Hello World!']

        from cherrypy import wsgiserver as w
        s = w.CherryPyWSGIServer(("localhost", PORT), simple_app)
        threading.Thread(target=s.start).start()
        try:
            time.sleep(1)
            print_report(thread_report())
        finally:
            s.stop()
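To see how the parse patterns pick values out of ab's report without running ab, the sketch below (Python 3 assumed) applies the same five regular expressions to a sample string. The sample is hand-written for illustration: it follows the usual layout of ab's summary and reuses the figures from the 10-thread row of the bare-server table, but it is not captured output. `parse_ab_output` is a hypothetical stand-in for `ABSession.run`.

```python
import re

# Same attribute names and patterns as ABSession.parse_patterns.
parse_patterns = [
    ('complete_requests', r'^Complete requests:\s*(\d+)'),
    ('failed_requests', r'^Failed requests:\s*(\d+)'),
    ('requests_per_second', r'^Requests per second:\s*([0-9.]+)'),
    ('time_per_request_concurrent',
     r'^Time per request:\s*([0-9.]+).*concurrent requests\)$'),
    ('transfer_rate', r'^Transfer rate:\s*([0-9.]+)'),
]

# Hand-written sample in the shape of ab's summary (not real captured output).
SAMPLE_OUTPUT = """\
Complete requests:      3000
Failed requests:        0
Requests per second:    2170.79 [#/sec] (mean)
Time per request:       4.607 [ms] (mean)
Time per request:       0.461 [ms] (mean, across all concurrent requests)
Transfer rate:          358.18 [Kbytes/sec] received
"""


def parse_ab_output(output):
    """Return {attr: matched string or None} for each pattern."""
    results = {}
    for attr, pattern in parse_patterns:
        # re.MULTILINE lets ^ and $ anchor at every line of the report.
        match = re.search(pattern, output, re.MULTILINE)
        results[attr] = match.group(1) if match else None
    return results


parsed = parse_ab_output(SAMPLE_OUTPUT)
for attr, value in parsed.items():
    print(attr, value)
```

Note that ab prints two "Time per request" lines; the trailing `concurrent requests\)$` in the pattern is what skips the first one and selects the across-all-concurrent-requests figure shown in the msec/req column.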
Good work Robert! I upgraded Aspen trunk to the 3.0.0 server the other day ... the AI_PASSIVE changes threw me for a loop (an empty string for host now means AF_INET6 rather than AF_INET) but other than that I'm excited about the improvements. :)
BTW, I believe the problem with > 50 threads on Windows is due to a limit of 64 sockets in the Windows select implementation, which is there to encourage use of some other MS-specific API. Try -c64 and -c65 to illustrate the limit.
Sorry I missed you on IRC. Here's the reference I had in mind for the 64-socket limit:
Right. My IRC comment was noticing that both Python and Apache 2.0.59 set FD_SETSIZE to something larger than 64 (so I was initially confused why the limit was still in effect). However, Apache doesn't set it in the right place; adding the define to httpd-2.0.59\srclib\apr\include\apr.hw just before the winsock2.h include allows ab to use more than 64 sockets on Windows. You also need to tell benchmark.py about the location of the recompiled ab.exe (in the AB_PATH global variable), and remove or rewrite the special-casing for the number of threads when sys.platform == win32. Here are the results for the full CherryPy stack:
Client Thread Report (1000 requests, 14 byte response body, 10 server threads):
threads | Completed | Failed | req/sec | msec/req | KB/sec |
25 | 1000 | 0 | 531.15 | 1.883 | 84.45 |
50 | 1000 | 0 | 520.08 | 1.923 | 82.69 |
100 | 1000 | 0 | 499.28 | 2.003 | 79.39 |
200 | 1000 | 0 | 480.08 | 2.083 | 76.33 |
400 | 1000 | 0 | 436.05 | 2.293 | 69.33 |
Average | 1000.0 | 0.0 | 493.328 | 2.037 | 78.438 |
How about defining "modest laptop", or better still, giving figures for requesting a static file containing the same text from Apache when running on the same machine? Make sure the Apache version and the Apache MPM in use are stated as well.
If you give the figures for Apache, then people with a different machine can run a test with Apache (something most people would have access to) and, by comparing their Apache results to yours, extrapolate what performance the CherryPy WSGI server would yield without actually having to install CherryPy. Without some sort of reference like this the results are pretty meaningless and can't easily be compared to other systems. :-)
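The extrapolation suggested here amounts to a simple proportion, sketched below under the (strong) assumption that both servers scale by the same factor from one machine to another. The function name is made up, and the Apache figures in the usage lines are placeholders chosen for round arithmetic, not measurements from the post or from anyone's machine.

```python
def extrapolate_req_sec(ref_apache, ref_cherrypy, my_apache):
    """Scale a reference CherryPy figure by the ratio of Apache results.

    Assumes, hypothetically, that both servers speed up or slow down by
    the same factor between the reference machine and yours.
    """
    if ref_apache <= 0:
        raise ValueError("reference Apache figure must be positive")
    return ref_cherrypy * (my_apache / ref_apache)


# Placeholder inputs for illustration only (req/sec); none are real results.
REF_APACHE = 4000.0    # hypothetical Apache figure on the author's laptop
REF_CHERRYPY = 2000.0  # roughly the bare-server figure reported in the post
MY_APACHE = 6000.0     # hypothetical Apache figure on your own machine

estimate = extrapolate_req_sec(REF_APACHE, REF_CHERRYPY, MY_APACHE)
print("Estimated CherryPy wsgiserver req/sec: %.1f" % estimate)
```

Such an estimate is only as good as the same-scaling assumption; differences in OS, Python version, or Apache MPM could easily break it.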
Actually, there's little doubt FAPWS is much faster (I've benchmarked ~5K req/s with FAPWS vs ~1K req/s for CP3's wsgiserver on the same hardware) serving a simple "hello, world." type page.
Still, I'd suggest that CP3 currently has the fastest working implementation of a WSGI server as FAPWS is highly experimental, unstable, mostly undocumented, and not really suitable for anything but benchmarks ;-)
BTW, isn't "I am a real human" just what a not-real human would say?