Currently (rev 1193), a typical CherryPy request has a standard execution path, and a standard time to complete it:
0.008 _cpwsgi.py:51(_wsgi_callable)
    0.001 _cpwsgi.py:36(translate_headers)
    0.001 _cpengine.py:131(request)
        0.001 _cprequest.py:623(Response.__init__)
    0.006 _cprequest.py:116(run)
        0.000 _cprequest.py:230(process_request_line)
        0.001 _cprequest.py:265(process_headers)
        0.003 _cprequest.py:189(respond)
            0.001 _cprequest.py:294(get_resource)
                0.001 _cprequest.py:415(Dispatcher.__call__)
                    0.001 _cprequest.py:432(find_handler)
            0.001 _cprequest.py:326(tool_up)
        0.001 _cprequest.py:644(finalize)
        0.001 cherrypy\__init__.py:96(log_access)
            0.001 logging\__init__.py:1015(log)
                0.001 logging\__init__.py:1055(_log)
            0.001 cherrypy\__init__.py:51(__getattr__)
                0.001 :0(getattr)
_cpwsgi._wsgi_callable() takes about 8 msec (on my box, using the builtin timer). That number breaks down into 1 msec for translate_headers(), 1 msec for _cpengine.request(), and 6 msec for Request.run(). Et cetera. These are all of the calls which take 1 msec or more to complete.
It looks like moving to Python's builtin logging module for the access log has added 1 msec to Request.run(). I think that's a reasonable trade: we lose a millisecond but gain syslog and rotating-log options.
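To illustrate what that millisecond buys, here's a minimal sketch of the kind of access logger the stdlib makes easy. The logger name, file path, and rotation sizes are illustrative, not CherryPy's actual configuration:

```python
import logging
import logging.handlers

def make_access_logger(path="access.log"):
    """Build an access logger that rotates its own files.

    The "access" logger name and the rotation parameters here are
    hypothetical; the point is that RotatingFileHandler (and
    SysLogHandler) come for free with the stdlib logging package.
    """
    logger = logging.getLogger("access")
    logger.setLevel(logging.INFO)
    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=1_000_000, backupCount=5)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger.addHandler(handler)
    return logger
```

Swapping RotatingFileHandler for logging.handlers.SysLogHandler gives syslog output with no change to the calling code.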
Somebody please explain to me why _cpwsgi.translate_headers takes a millisecond to change 20 strings from "HTTP_HEADER_NAME" to "Header-Name". I've tried lots of rewritings of it to no avail; moving from "yield" to returning a list did nothing, nor did inlining it into
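For readers unfamiliar with the function in question, here's a minimal sketch of the job it performs: mapping CGI-style WSGI environ keys back to HTTP header names. This is an illustration of the transformation, not CherryPy's actual code:

```python
def translate_headers(environ):
    """Turn WSGI environ keys like "HTTP_USER_AGENT" into header
    tuples like ("User-Agent", value).

    A hypothetical stand-in for _cpwsgi.translate_headers, written
    as a plain function returning a list (one of the variants tried
    in the post).
    """
    headers = []
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # "HTTP_USER_AGENT" -> "User-Agent"
            name = "-".join(part.title() for part in key[5:].split("_"))
            headers.append((name, value))
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            # These two header-derived keys lack the HTTP_ prefix.
            name = "-".join(part.title() for part in key.split("_"))
            headers.append((name, value))
    return headers
```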
I tried making the default Dispatcher cache the results from find_handler. That is, cache[(app, path_info)] = func, vpath, request.config. I couldn't see any speedup on cache hits.
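The caching idea can be sketched as follows. The (app, path_info) key and the (func, vpath, config) result shape come from the post; the wrapper class itself and the find_handler signature are hypothetical:

```python
class CachingDispatcher:
    """Memoize find_handler results per (app, path_info).

    A sketch of the experiment described above, not CherryPy's
    Dispatcher; find_handler is passed in as a callable so the
    caching logic stands alone.
    """

    def __init__(self, find_handler):
        self.find_handler = find_handler
        self.cache = {}

    def __call__(self, app, path_info):
        key = (app, path_info)
        try:
            return self.cache[key]
        except KeyError:
            # Miss: do the real lookup once and remember it.
            result = self.cache[key] = self.find_handler(app, path_info)
            return result
```

One plausible reason cache hits showed no speedup: the dict lookup plus tuple construction for the key costs about as much as the lookup it replaces when find_handler is already cheap.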
The next-to-last line above is interesting: 0.001 cherrypy\__init__.py:51(__getattr__) shows 1 msec being spent resolving attributes like cherrypy.response. I've already done a lot of work to minimize this by looking such attributes up once and binding them to a local, for example request = cherrypy.request, and then looking up further attributes using the local name. But perhaps there's more to be done.
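A micro-benchmark sketch of that local-binding trick, under stated assumptions: the NS stand-in object, the loop counts, and timeit are mine, not CherryPy's; only the relative ordering of the two timings is expected to hold:

```python
import timeit

class NS:
    """A hypothetical stand-in for the cherrypy module object."""
    pass

ns = NS()
ns.request = object()

def repeated_lookup():
    # Pay the attribute-lookup cost on every iteration.
    for _ in range(1000):
        ns.request

def local_binding():
    # Bind once; the loop body is then a cheap local-variable load.
    request = ns.request
    for _ in range(1000):
        request

slow = timeit.timeit(repeated_lookup, number=200)
fast = timeit.timeit(local_binding, number=200)
```

On CPython, LOAD_FAST for a local is consistently cheaper than a LOAD_ATTR, which is why the binding pays off in hot loops.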
The last line above shows 1 msec being used to call the builtin getattr() function. It seems we have a very object-oriented style.
I'll keep looking for ways to get any of those 0.001's to read 0.000. Perhaps now that I've moved profiling to WSGI middleware, I can aggregate times and work with numbers that have a little more precision.
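A sketch of what profiling at the WSGI layer looks like, so per-call times can accumulate across many requests rather than be read one request at a time. The middleware shape is standard WSGI; the class name and the report method are mine, not CherryPy's profiler module:

```python
import cProfile
import io
import pstats

class ProfilingMiddleware:
    """Wrap a WSGI app and accumulate cProfile stats across requests.

    A hypothetical sketch: CherryPy's own profiler module differs,
    but the aggregation idea is the same.
    """

    def __init__(self, app):
        self.app = app
        self.profiler = cProfile.Profile()

    def __call__(self, environ, start_response):
        # Each request adds its samples to the same profiler.
        return self.profiler.runcall(self.app, environ, start_response)

    def report(self, limit=10):
        # Aggregated numbers over N requests give more precision
        # than a single request's 0.001-rounded figures.
        out = io.StringIO()
        stats = pstats.Stats(self.profiler, stream=out)
        stats.sort_stats("cumulative").print_stats(limit)
        return out.getvalue()
```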
translate_headers is the first to fall under 0.001. http://www.cherrypy.org/changeset/1195
What tool did you use to get the nice profile tree?
I used the CherryPy profiler module output, my knowledge of CP internals, and my spacebar. ;) But it would be sweet if there were a tool to do that automatically, eh?
I'm glad to see you doing this work.
CherryPy is immensely cool, and I'd like to use it for some very heavily loaded intranet sites in the not-too-distant future. I haven't yet tried it and found its performance inadequate, but every little bit helps.
For the sake of heavily loaded sites, one thing I'd especially like to see integrated into CherryPy 3.0 is memcached session support (http://groups.google.com/group/cherrypy-users/browse_thread/thread/4fe014dc35683ed0/72ebf8ec451d2589?q=memcached&rnum=3#72ebf8ec451d2589 ). What do you think of that idea?