
CherryPy 3 has fastest WSGI server yet

12/23/06

06:05:52 pm, by fumanchu · 608 words
Categories: CherryPy


A couple of months ago, in response to someone else's speed claims, I posted a comment that CherryPy's built-in WSGI server could serve 1200 simple requests per second. The demo used Apache's "ab" tool ("-k -n 3000 -c %s": keep-alives on, 3000 requests, at varying concurrency levels). In the last few days before the release of CherryPy 3.0 final, I've done some further optimization of cherrypy.wsgiserver, and now get 2000+ req/sec on my modest laptop.

threads | Completed | Failed | req/sec | msec/req | KB/sec |
     10 |      3000 |      0 | 2170.79 |    0.461 | 358.18 |
     20 |      3000 |      0 | 2080.34 |    0.481 | 343.26 |
     30 |      3000 |      0 | 1920.31 |    0.521 | 316.85 |
     40 |      3000 |      0 | 2051.84 |    0.487 | 338.55 |
     50 |      3000 |      0 | 2051.84 |    0.487 | 338.55 |

The improvements are due to a variety of optimizations, including:

  • Replacing mimetools/rfc822.Message with custom code for reading headers.
  • Using socket.sendall instead of a socket fileobject for writes.
  • Generic hand-tuning of code loops.
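To illustrate the second point: writing through a socket fileobject adds a Python-level buffering layer and an extra copy, while sendall() loops at the C level until every byte is written. A minimal sketch (the respond() helper is mine for illustration, not code from cherrypy.wsgiserver; Python 3 syntax):

```python
import socket

def respond(sock, status, headers, body):
    """Send one HTTP response with a single socket.sendall() call.

    sendall() loops at the C level until every byte is written,
    avoiding the extra copy and flush of a sock.makefile() wrapper.
    """
    lines = ["HTTP/1.1 %s" % status]
    lines.extend("%s: %s" % (k, v) for k, v in headers)
    head = ("\r\n".join(lines) + "\r\n\r\n").encode("latin-1")
    sock.sendall(head + body)

# Demonstrate on an in-process socket pair:
a, b = socket.socketpair()
respond(a, "200 OK", [("Content-Length", "5")], b"hello")
data = b.recv(1024)
print(data.decode("latin-1"))
```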

I want to make it clear that the benchmark does not exercise any part of CherryPy other than the WSGI server. I used a very simple WSGI application (not the full CherryPy stack):

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type','text/plain'),
                        ('Content-Length','19')]
    start_response(status, response_headers)
    return ['My Own Hello World!']
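If you want to poke at this app without any server at all, the stdlib's wsgiref can drive it directly and check it for WSGI conformance. This is just a sketch adapted to Python 3 (where the response body must be bytes); the CherryPy server is not involved:

```python
from wsgiref.util import setup_testing_defaults
from wsgiref.validate import validator

def simple_app(environ, start_response):
    """Same app as above, with a bytes body for Python 3."""
    start_response('200 OK', [('Content-type', 'text/plain'),
                              ('Content-Length', '19')])
    return [b'My Own Hello World!']

environ = {}
setup_testing_defaults(environ)          # fill in the required WSGI keys
seen = {}
def start_response(status, headers):
    seen['status'] = status

it = validator(simple_app)(environ, start_response)  # asserts WSGI conformance
body = b''.join(it)
it.close()
print(seen['status'], len(body))
```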

The full stack of CherryPy includes the WSGI application side as well, and consequently takes more time. But that has risen from about 380 requests per second in October to:

Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
     10 |      1000 |      0 |  536.86 |    1.863 |  85.36 |
     20 |      1000 |      0 |  509.47 |    1.963 |  81.01 |
     30 |      1000 |      0 |  499.28 |    2.003 |  79.39 |
     40 |      1000 |      0 |  491.90 |    2.033 |  78.21 |
     50 |      1000 |      0 |  504.32 |    1.983 |  80.19 |
Average |    1000.0 |    0.0 | 508.366 |    1.969 | 80.832 |

If you want to benchmark the full CherryPy stack on your own, just install CherryPy and run the script at cherrypy/test/benchmark.py.

Here's the other script for the "bare server" benchmarks:

import re
import sys
import threading
import time
from cherrypy import _cpmodpy

AB_PATH = ""
APACHE_PATH = "apache"
SCRIPT_NAME = ""
PORT = 8080


class ABSession:
    """A session of 'ab', the Apache HTTP server  benchmarking tool."""
    parse_patterns = [('complete_requests', 'Completed',
                       r'^Complete requests:\s*(\d+)'),
                      ('failed_requests', 'Failed',
                       r'^Failed requests:\s*(\d+)'),
                      ('requests_per_second', 'req/sec',
                       r'^Requests per second:\s*([0-9.]+)'),
                      ('time_per_request_concurrent', 'msec/req',
                       r'^Time per request:\s*([0-9.]+).*concurrent requests\)$'),
                      ('transfer_rate', 'KB/sec',
                       r'^Transfer rate:\s*([0-9.]+)'),
                      ]

    def __init__(self, path=SCRIPT_NAME + "/", requests=3000, concurrency=10):
        self.path = path
        self.requests = requests
        self.concurrency = concurrency

    def args(self):
        assert self.concurrency > 0
        assert self.requests > 0
        return ("-k -n %s -c %s <a href="http://localhost:%s%s"">http://localhost:%s%s"</a> %
                (self.requests, self.concurrency, PORT, self.path))

    def run(self):
        # Parse output of ab, setting attributes on self
        args = self.args()
        self.output = _cpmodpy.read_process(AB_PATH or "ab", args)
        for attr, name, pattern in self.parse_patterns:
            val = re.search(pattern, self.output, re.MULTILINE)
            if val:
                val = val.group(1)
                setattr(self, attr, val)
            else:
                setattr(self, attr, None)


safe_threads = (25, 50, 100, 200, 400)
if sys.platform in ("win32",):
    # For some reason, ab crashes with > 50 threads on my Win2k laptop.
    safe_threads = (10, 20, 30, 40, 50)


def thread_report(path=SCRIPT_NAME + "/", concurrency=safe_threads):
    sess = ABSession(path)
    attrs, names, patterns = zip(*sess.parse_patterns)
    rows = [('threads',) + names]
    for c in concurrency:
        sess.concurrency = c
        sess.run()
        rows.append([c] + [getattr(sess, attr) for attr in attrs])
    return rows

def print_report(rows):
    widths = []
    for i in range(len(rows[0])):
        lengths = [len(str(row[i])) for row in rows]
        widths.append(max(lengths))
    for row in rows:
        print
        for i, val in enumerate(row):
            print str(val).rjust(widths[i]), "|",
    print


if __name__ == '__main__':

    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        response_headers = [('Content-type','text/plain'),
                            ('Content-Length','19')]
        start_response(status, response_headers)
        return ['My Own Hello World!']

    from cherrypy import wsgiserver as w
    s = w.CherryPyWSGIServer(("localhost", PORT), simple_app)
    threading.Thread(target=s.start).start()
    try:
        time.sleep(1)
        print_report(thread_report())
    finally:
        s.stop()
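The parsing step in ABSession.run can be exercised without actually running ab, by feeding the same regexes a canned fragment in the shape of ab's summary output (a Python 3 sketch; the sample figures are copied from the first table above, and the fragment itself is hand-made):

```python
import re

# A trimmed, hand-made fragment in the shape of ab's summary output.
SAMPLE = """\
Complete requests:      3000
Failed requests:        0
Requests per second:    2170.79 [#/sec] (mean)
Time per request:       0.461 [ms] (mean, across all concurrent requests)
Transfer rate:          358.18 [Kbytes/sec] received
"""

PARSE_PATTERNS = [
    ('complete_requests', 'Completed', r'^Complete requests:\s*(\d+)'),
    ('failed_requests', 'Failed', r'^Failed requests:\s*(\d+)'),
    ('requests_per_second', 'req/sec', r'^Requests per second:\s*([0-9.]+)'),
    ('time_per_request_concurrent', 'msec/req',
     r'^Time per request:\s*([0-9.]+).*concurrent requests\)$'),
    ('transfer_rate', 'KB/sec', r'^Transfer rate:\s*([0-9.]+)'),
]

def parse(output):
    """Pull one value per pattern out of ab's output (None if absent)."""
    row = {}
    for attr, name, pattern in PARSE_PATTERNS:
        m = re.search(pattern, output, re.MULTILINE)
        row[attr] = m.group(1) if m else None
    return row

parsed = parse(SAMPLE)
print(parsed['requests_per_second'])   # → 2170.79
```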

5 comments

Comment from: Chad Whitacre [Visitor] · http://tech.whit537.org/

Good work Robert! I upgraded Aspen trunk to the 3.0.0 server the other day ... the AI_PASSIVE changes threw me for a loop (an empty string for host now means AF_INET6 rather than AF_INET) but other than that I'm excited about the improvements. :)

BTW, I believe the problem with > 50 threads on Windows is due to a limit of 64 sockets in the Windows select() implementation, which is there to encourage use of other MS-specific APIs. Try -c64 and -c65 to illustrate the limit.

12/26/06 @ 09:15
Comment from: Chad Whitacre [Visitor] · http://tech.whit537.org/

Sorry I missed you on IRC. Here's the reference I had in mind for the 64-socket limit:

http://mail.zope.org/pipermail/zope3-dev/2002-October/003235.html

12/27/06 @ 14:10
Comment from: fumanchu [Member] Email

Right. My IRC comment was noticing that both Python and Apache 2.0.59 set FD_SETSIZE to something larger than 64 (so I was initially confused why the limit was still in effect). However, Apache doesn't do it in the right place; adding it to httpd-2.0.59\srclib\apr\include\apr.hw just before the winsock2.h include allows ab to use more than 64 sockets on Windows. You also need to tell benchmark.py about the location of the recompiled ab.exe (in the AB_PATH global variable), and remove or rewrite the special-casing of the number of threads when sys.platform == "win32". Here are the results for the full CherryPy stack:


Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
     25 |      1000 |      0 |  531.15 |    1.883 |  84.45 |
     50 |      1000 |      0 |  520.08 |    1.923 |  82.69 |
    100 |      1000 |      0 |  499.28 |    2.003 |  79.39 |
    200 |      1000 |      0 |  480.08 |    2.083 |  76.33 |
    400 |      1000 |      0 |  436.05 |    2.293 |  69.33 |
Average |    1000.0 |    0.0 | 493.328 |    2.037 | 78.438 |

12/28/06 @ 01:01
Comment from: Grumpy Old Man [Visitor]

How about defining "modest laptop", or better still, giving figures for requesting a static file containing the same text from Apache running on the same machine? State the Apache version and which Apache MPM is being used as well.

If you give figures for Apache, then people with a different machine can run the same test against Apache (something most people have access to), compare their Apache results to yours, and extrapolate what performance the CherryPy WSGI server will yield without actually having to install CherryPy. Without some sort of reference like this, the results are pretty meaningless and can't easily be compared to other systems. :-)

01/22/07 @ 03:27
Comment from: Cliff Wells [Visitor] · http://pentropy.twisty-industries.com/

Actually, there's little doubt FAPWS is much faster (I've benchmarked ~5K req/s with FAPWS vs ~1K req/s for CP3's wsgiserver on the same hardware) serving a simple "hello, world." type page.

Still, I'd suggest that CP3 currently has the fastest working implementation of a WSGI server as FAPWS is highly experimental, unstable, mostly undocumented, and not really suitable for anything but benchmarks ;-)

BTW, isn't "I am a real human" just what a not-real human would say?

08/15/08 @ 04:24
