Category: CherryPy


07/10/06

11:07:40 am, by fumanchu
Categories: CherryPy

CherryPy 3 optimization

Currently (rev 1193), a typical CherryPy request has a standard execution path, and a standard time to complete it:

0.008 _cpwsgi.py:51(_wsgi_callable)
    0.001 _cpwsgi.py:36(translate_headers)
    0.001 _cpengine.py:131(request)
        0.001 _cprequest.py:623(Response.__init__)
    0.006 _cprequest.py:116(run)
        0.000 _cprequest.py:230(process_request_line)
        0.001 _cprequest.py:265(process_headers)
        0.003 _cprequest.py:189(respond)
            0.001 _cprequest.py:294(get_resource)
                0.001 _cprequest.py:415(Dispatcher.__call__)
                    0.001 _cprequest.py:432(find_handler)
            0.001 _cprequest.py:326(tool_up)
            0.001 _cprequest.py:644(finalize)
        0.001 cherrypy\__init__.py:96(log_access)
            0.001 logging\__init__.py:1015(log)
                0.001 logging\__init__.py:1055(_log)

0.001 cherrypy\__init__.py:51(__getattr__)
0.001 :0(getattr)

That is, _cpwsgi._wsgi_callable() takes about 8 msec (on my box using the builtin timer). That number breaks down into 1 msec for translate_headers(), 1 msec for _cpengine.request(), and 6 msec for Request.run(). Etcetera. These are all of the calls which take 1 msec or more to complete.

It looks like moving to Python's builtin logging for the access log has added 1 msec to Request.run(). I think that's reasonable; we lose a millisecond but gain syslog and rotating log options.

Somebody please explain to me why _cpwsgi.translate_headers takes a millisecond to change 20 strings from "HTTP_HEADER_NAME" to "Header-Name". I've tried many rewrites of it to no avail; moving from "yield" to returning a list did nothing, nor did inlining it into _wsgi_callable.
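For reference, here is a minimal sketch of the transformation being timed, written as a standalone function over a WSGI environ dict. It is not CherryPy's actual translate_headers; the special-casing of CONTENT_TYPE and CONTENT_LENGTH (which WSGI stores without the HTTP_ prefix) is an assumption added for completeness.

```python
def translate_headers(environ):
    """Yield (Header-Name, value) pairs from a WSGI environ (a sketch)."""
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # "HTTP_ACCEPT_ENCODING" -> "ACCEPT-ENCODING" -> "Accept-Encoding"
            yield key[5:].replace("_", "-").title(), value
        elif key in ("CONTENT_TYPE", "CONTENT_LENGTH"):
            # WSGI stores these two headers without the HTTP_ prefix.
            yield key.replace("_", "-").title(), value
```

Timing it in isolation, e.g. with `timeit.timeit(lambda: list(translate_headers(env)), number=10000)` over a 20-header environ, is one way to see how much of that millisecond is the string work itself versus profiler overhead.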

I tried making the default Dispatcher cache the results from find_handler, that is, cache[(app, path_info)] = (func, vpath, request.config). I couldn't see any speedup on cache hits.
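The caching experiment amounts to memoizing the lookup. A minimal sketch, assuming a hypothetical uncached `find_handler(app, path_info)` callable that returns a `(func, vpath, config)` tuple; the class name, signature, and unbounded dict are illustrative, not CherryPy's Dispatcher API:

```python
class CachingDispatcher:
    """Memoize handler lookups keyed by (app, path_info) (a sketch)."""

    def __init__(self, find_handler):
        # find_handler(app, path_info) -> (func, vpath, config); hypothetical.
        self.find_handler = find_handler
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def __call__(self, app, path_info):
        key = (app, path_info)
        try:
            result = self.cache[key]
            self.hits += 1
        except KeyError:
            # First time we see this path: do the real traversal and store it.
            result = self.cache[key] = self.find_handler(app, path_info)
            self.misses += 1
        return result
```

Note that the key includes every distinct path_info a client sends, so a real version would need a size bound (for example `functools.lru_cache`), and a hit only saves the traversal itself, not anything downstream of it.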

The next-to-last line above is interesting. 0.001 cherrypy\__init__.py:51(__getattr__) shows 1 msec being used for cherrypy.request and cherrypy.response. I've already done a lot of work to minimize this by looking them up once and binding to a local, for example, request = cherrypy.request, and then looking up further attributes using the local name. But perhaps there's more to be done.
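To see where those __getattr__ calls come from, here is a small sketch that counts how often a thread-local proxy's `__getattr__` fires. The `ThreadLocalProxy` and `_Serving` classes are stand-ins assumed for illustration; they mimic the general pattern, not the actual code at cherrypy\__init__.py:51. A proxy pays one `__getattr__` call per attribute read, while a reference to the underlying per-thread object pays none.

```python
import threading

class _Serving(threading.local):
    """Per-thread slot holding the current request (hypothetical stand-in)."""
    request = None

class ThreadLocalProxy:
    """Forward attribute reads to the current thread's object, counting them."""

    def __init__(self, serving, attrname):
        self._serving = serving
        self._attrname = attrname
        self.calls = 0

    def __getattr__(self, name):
        # Only runs for names not found on the proxy itself.
        self.calls += 1
        target = getattr(self._serving, self._attrname)
        return getattr(target, name)

class Request:
    def __init__(self):
        self.method = "GET"
        self.path = "/"
        self.headers = {}

serving = _Serving()
serving.request = Request()
request = ThreadLocalProxy(serving, "request")

def read_via_proxy():
    # Three attribute reads -> three __getattr__ calls.
    return (request.method, request.path, request.headers)

def read_via_local():
    # Resolve the per-thread object once, then read plain attributes.
    req = serving.request
    return (req.method, req.path, req.headers)
```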

The last line above shows 1 msec being used to call the builtin getattr() function. Seems we have a very object-oriented style. ;)

I'll keep looking for ways to get any of those 0.001's to read 0.000. Perhaps now that I've moved profiling to WSGI middleware, I can aggregate times and work with numbers that have a little more precision. ;)
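Aggregating across requests is straightforward with the standard library: keep one `cProfile.Profile` alive and run every request through it, so `pstats` reports totals and per-call averages over thousands of calls instead of one. This is a sketch of the idea, not the profiler middleware in CherryPy; it assumes a single-threaded server, since one profiler object is shared by all calls.

```python
import cProfile
import io
import pstats

class AggregatingProfiler:
    """WSGI middleware that accumulates cProfile data over many requests."""

    def __init__(self, app):
        self.app = app
        self.profiler = cProfile.Profile()
        self.requests = 0

    def __call__(self, environ, start_response):
        def run():
            body = self.app(environ, start_response)
            try:
                # Consume the body inside the profiled call so lazy
                # (generator) bodies are timed too; this gives up streaming.
                return list(body)
            finally:
                if hasattr(body, "close"):
                    body.close()
        self.requests += 1
        return self.profiler.runcall(run)

    def report(self, limit=20):
        """Return cumulative stats over all profiled requests as text."""
        out = io.StringIO()
        stats = pstats.Stats(self.profiler, stream=out)
        stats.sort_stats("cumulative").print_stats(limit)
        return out.getvalue()
```

After a few thousand requests, the `percall` columns in `report()` carry far more resolution than a single request's 0.001 readings.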

06/15/06

01:23:24 am, by admin
Categories: Python, CherryPy

How CherryPy processes a request

Inspired by James Bennett, here's a little treatise on how CherryPy processes a request. A couple of differences, though. First, Django is a "full-stack" web framework, with an ORM, built-in templating, etcetera, whereas CherryPy focuses on HTTP. Second, I'll be showing the process for CherryPy 2.2 (the current stable branch), but I'll try to point out along the way where CherryPy 3 (now in alpha) differs.

HTTP Server

Something must actually sit on a listening socket and receive requests from HTTP clients. CherryPy provides an HTTP server (_cpwsgiserver.py), or you can use Apache, lighttpd, or others.

Bridge from HTTP Server to CherryPy

The Web Server Gateway Interface spec came into being to connect various HTTP servers to various web frameworks (and gateways and middleware and...). If you want to use it to connect an HTTP server with CherryPy, feel free. CherryPy provides a "WSGI application callable" in _cpwsgi.py. Otherwise, you need a specific adapter at this stage to connect the two.
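For readers who haven't seen the WSGI side, this is the shape of the interface: the server calls an application with a CGI-style `environ` dict and a `start_response` callable, then iterates over the returned body. The example is generic WSGI, not CherryPy's `_cpwsgi` code.

```python
def application(environ, start_response):
    """A minimal WSGI application callable."""
    path = environ.get("PATH_INFO", "/")
    body = ("You asked for %s\n" % path).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Any WSGI server can host it; with the standard library's reference server that is `wsgiref.simple_server.make_server("127.0.0.1", 8080, application).serve_forever()`.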

The CherryPy Engine

Whether you use WSGI or not for the Bridge, it calls Engine.request(), which creates the all-important objects cherrypy.request and cherrypy.response, returning the former. The Bridge then calls request.run(), passing it the incoming message stream.
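The hand-off can be sketched with hypothetical stand-in classes. Only the sequence mirrors the text (the Bridge calls `Engine.request()`, gets the request back, calls `request.run()` with the input stream, then reads `status`, `header_list`, and `body` off the response); every class, signature, and the trivial echo behavior are assumptions.

```python
import io
import threading

# Per-thread slots, standing in for cherrypy.request and cherrypy.response.
serving = threading.local()

class Response:
    def __init__(self):
        self.status = "200 OK"
        self.header_list = []
        self.body = []

class Request:
    def __init__(self, method, path):
        self.method = method
        self.path = path

    def run(self, rfile):
        # A real Request parses headers and body, finds and calls the page
        # handler, and finalizes the response; this stand-in just echoes.
        data = rfile.read()
        response = serving.response
        text = "%s %s (%d bytes)" % (self.method, self.path, len(data))
        response.body = [text.encode("utf-8")]
        response.header_list = [("Content-Type", "text/plain")]

class Engine:
    def request(self, method, path):
        """Create this thread's request/response pair; return the request."""
        serving.request = Request(method, path)
        serving.response = Response()
        return serving.request

def make_bridge(engine):
    """The 'Bridge': a WSGI application callable wrapped around an Engine."""
    def app(environ, start_response):
        request = engine.request(environ["REQUEST_METHOD"], environ["PATH_INFO"])
        request.run(environ["wsgi.input"])
        response = serving.response
        start_response(response.status, response.header_list)
        return response.body
    return app

# Drive the bridge the way a WSGI server would.
app = make_bridge(Engine())
environ = {"REQUEST_METHOD": "POST", "PATH_INFO": "/echo",
           "wsgi.input": io.BytesIO(b"abc")}
body = app(environ, lambda status, headers: None)
```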

The CherryPy Request

Several steps occur here to convert the incoming stream to more usable data structures, pass the request to the appropriate user code, and then convert outbound data. In between the standard processing steps, users can define extra code to be run via filters (CP 2.2) or hooks (CP 3). Here's how CherryPy 2 does it:

  1. Request.processRequestLine() analyzes the first line of the request, turning "GET /path/to/resource?key=val HTTP/1.1" into a request method, path, query string, and version.
  2. Any on_start_resource filters are run.
  3. Request.processHeaders() turns the incoming HTTP request headers into a dictionary, and separates Cookie information.
  4. Any before_request_body filters are run.
  5. Request.processBody() turns the incoming HTTP request body into a dictionary if possible; otherwise, it's passed onward as a file-like object.
  6. Any before_main filters are run.
  7. The user-supplied page handler is looked up (see below).
  8. The user-supplied page handler is invoked. Its return value, which can be a string, a list, a file, or a generator object, will be used for the response body.
  9. Any before_finalize filters are run.
  10. Response.finalize() checks for HTTP correctness of the response, and transforms user-friendly data structures into HTTP-server-friendly structures.
  11. Any on_end_resource filters are run.

CherryPy 3 performs the same steps as above, but in the order: 1, 3, 7, 2, 4, 5, 6, 8, 9, 10, 11. That is, it determines which bit of user code will respond to the request much earlier in the process. This also means that internal redirects can "start over" much earlier. In addition, CP 3 can collect configuration data once (at the same time that it looks up the page handler); CP 2 re-collected config data every time it was used.
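To make the reordering concrete, here is a sketch that writes both sequences down as data and runs them through one loop. The step names follow the numbered list; `run_steps` and `simulate` are hypothetical helpers, and treating "config is known" as a single flag is a simplification (CP 2 actually looked config up on demand each time). It shows that under the CP 3 order every hook point runs after the handler and its config have been found.

```python
CP2_ORDER = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
CP3_ORDER = [1, 3, 7, 2, 4, 5, 6, 8, 9, 10, 11]

STEPS = {
    1: "process_request_line",
    2: "hooks:on_start_resource",
    3: "process_headers",
    4: "hooks:before_request_body",
    5: "process_body",
    6: "hooks:before_main",
    7: "lookup_handler",
    8: "call_handler",
    9: "hooks:before_finalize",
    10: "finalize",
    11: "hooks:on_end_resource",
}

def run_steps(order, actions):
    """Run each named step in order; 'actions' maps step names to callables."""
    trace = []
    for number in order:
        name = STEPS[number]
        actions.get(name, lambda: None)()
        trace.append(name)
    return trace

def simulate(order):
    """For each hook point reached, report whether config was already known."""
    state = {"config": None, "seen": []}

    def lookup_handler():
        # CP 3 collects config once, alongside the handler lookup.
        state["config"] = {"tools.gzip.on": True}

    def hook():
        state["seen"].append(state["config"] is not None)

    actions = {name: hook for name in STEPS.values() if name.startswith("hooks:")}
    actions["lookup_handler"] = lookup_handler
    run_steps(order, actions)
    return state["seen"]
```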

Page handlers

As mentioned (steps 7 and 8, above), CherryPy users write "page handlers", functions which receive the request parameters as arguments, and return the response body. CherryPy makes clever use of threadlocals, so all other data a developer needs is available in the global cherrypy.request and cherrypy.response objects (the parameters are as well, but it's awfully convenient to receive them as arguments to the page handler, and to return the body rather than setting it).

The URL is mapped to a page handler by traversing a tree of such handlers, so that the handler for "/a/b/c" is most likely root.a.b.c(). I say "most likely", because you can also define index() handlers and default() handlers.
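Here is a minimal sketch of that traversal over plain objects marked with an `exposed` attribute. It follows the rules just described (walk attributes segment by segment, try `index` when the whole path is consumed, otherwise fall back to the nearest `default`), but it is a simplified assumption rather than CherryPy's `mapPathToObject`: it ignores trailing-slash redirects and config, and simply refuses underscore-prefixed names.

```python
def find_handler(root, path):
    """Map a URL path onto a tree of objects; return (handler, extra_args)."""
    atoms = [atom for atom in path.strip("/").split("/") if atom]

    # Walk down the tree as far as the path allows, remembering each node.
    trail = [root]
    for atom in atoms:
        if atom.startswith("_"):
            break  # never expose private or dunder attributes
        child = getattr(trail[-1], atom, None)
        if child is None:
            break
        trail.append(child)

    # Whole path consumed: prefer an exposed index(), else the node itself.
    if len(trail) - 1 == len(atoms):
        node = trail[-1]
        index = getattr(node, "index", None)
        if getattr(index, "exposed", False):
            return index, []
        if callable(node) and getattr(node, "exposed", False):
            return node, []

    # Otherwise back up toward the root, looking for an exposed default().
    for depth in range(len(trail) - 1, -1, -1):
        default = getattr(trail[depth], "default", None)
        if getattr(default, "exposed", False):
            return default, atoms[depth:]

    raise LookupError("404 Not Found: %s" % path)
```

The leftover atoms returned with a `default` handler are what it would receive as positional arguments.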

The CherryPy Response

When the call to Request.run() returns, the Bridge uses the Response attributes status, header_list, and body to construct the outbound stream, and pass it to the HTTP server that made the request. CherryPy works hard to support both buffered and streaming output, so the body may be a generator object that is only iterated over at this point.

Exceptional circumstances

The page handler, or any of the filters/hooks, can decide that the response is complete, and that processing should be stopped. Most often, this is accomplished by raising an HTTPRedirect (3xx) exception, or an HTTPError (4xx or 5xx; NotFound (404) is so common it has its own subclass). Unanticipated errors are automatically converted into HTTPError(500). Users have some facility for modifying the actual error output with additional error filters/hooks.
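A small sketch of that control flow, with hypothetical exception classes standing in for CherryPy's (the real ones carry more state and set the response themselves): redirects and HTTP errors short-circuit into a response, and anything else becomes a 500.

```python
class HTTPRedirect(Exception):
    def __init__(self, url, status=302):
        super().__init__(url, status)
        self.url = url
        self.status = status

class HTTPError(Exception):
    def __init__(self, status=500, message=None):
        super().__init__(status, message)
        self.status = status
        self.message = message

class NotFound(HTTPError):
    def __init__(self, path=""):
        super().__init__(404, "The path %r was not found." % path)

def handle(page_handler, *args):
    """Return (status, headers, body) for one page handler call."""
    try:
        return 200, [], page_handler(*args)
    except HTTPRedirect as exc:
        return exc.status, [("Location", exc.url)], ""
    except HTTPError as exc:
        return exc.status, [], exc.message or ""
    except Exception:
        # Unanticipated errors are converted into a generic 500.
        return 500, [], "Internal Server Error"
```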

That's it!

05/09/06

11:49:50 pm, by fumanchu
Categories: CherryPy

One of the ways CherryPy 3 will rock

Looks like CherryPy 3 will be significantly faster than CP 2.2. Here are some quick benchmark (Apache ab) stats from my little Win2k laptop. The first three are from the same test (1000 requests, 14 byte response body, 10 server threads), for 10 to 50 client threads:

[Charts: req/sec, msec/req, and kb/sec, each plotted against the number of client threads]

These two are from a different test (1000 requests, 50 client threads, 10 server threads), for response sizes of 10 bytes, 100, 1K, 10K, 100K, and 100M:

[Charts: req/sec and kb/sec, each plotted against response size]

I believe the improvement comes from three areas. First, the lowercase_api flag and checks are no longer needed. Second, filters are no longer called just to see if they're turned on. Third, all of the configs and special attributes are now looked up once, inline with the page handler (i.e., controller method) lookup.

I can't wait to run the benchmark suite on a real server. :)

04/23/06

08:15:15 pm, by fumanchu
Categories: CherryPy

CherryPy 3 directions

I committed the first round of changes for CherryPy version 3 on Friday. It's nowhere near complete, but it should give some hints about the future.

Before I dive into the meat, you should know I moved some things around:

  • _cphttptools is now called _cprequest
  • There's a new 'tools.py' module (see below).
  • All of the code in the /filters folder still exists, but it's all been moved into the /lib folder. The filters folder has been removed.
  • You can now call functions and instantiate objects in config files. For example: now = cherrypy.lib.httptools.HTTPDate()

Dispatchers

In CherryPy 2.2, you're able to replace the page-handler-dispatch mechanism by using a custom Request class; that is, you would subclass _cphttptools.Request and override the main or mapPathToObject methods. That can be tedious, since you can't specify the Request class on a per-request basis; the Request object has already been formed by the time the URL has been parsed.

In CP 3, there's a new _cprequest.dispatch function, and each Request object calls it. If you don't like the way CP looks up page handlers by default, you can declare your own dispatcher in the config:

dispatcher = my.custom.dispatcher.function

or

dispatcher = my.custom.DispatchClass(blah)

The only requirement is that the right-hand side be a callable: it takes a "path" argument and should return a page handler (a callable). The default Dispatcher also sets request.virtual_path, so unless you're also setting request.execute_main to False, you should probably do the same.
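For illustration, a custom dispatcher meeting that contract could be a small class whose instances map path prefixes to handlers. Everything here is hypothetical (the `PrefixDispatcher` name, the prefix table, and the plain `request` object standing in for `cherrypy.request`); only the contract itself, a callable taking `path`, returning a page handler, and setting `request.virtual_path`, comes from the text.

```python
class Request:
    """Hypothetical stand-in for cherrypy.request."""
    virtual_path = None

request = Request()

class PrefixDispatcher:
    """Dispatch on the longest matching path prefix (a sketch)."""

    def __init__(self, routes):
        # routes: {"/prefix": handler}; longest prefixes are tried first.
        self.routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)

    def __call__(self, path):
        for prefix, handler in self.routes:
            if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
                rest = path[len(prefix):].strip("/")
                # Like the default dispatcher, record the leftover atoms.
                request.virtual_path = rest.split("/") if rest else []
                return handler
        raise LookupError("no handler for %r" % path)
```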

Filtering is now Hooking

I had a good long look at filters in CP 2.2. Despite their name, they don't really "filter" anything; nothing "passes through them". Some of them modify cherrypy.request attributes, but just as many of them don't. They're not implemented as filters; instead, they're "hooks".

A "hook" usually means a place where callbacks are called, and CherryPy filters have always been called from a pre-determined set of hooks (e.g. before_request_body). So I went ahead and changed the terminology throughout the codebase.

But there's a much bigger change than just the name. People have been pining for CP to release its grip over both filter declaration (which filters are available) and filter invocation (CP 2 calls all filter methods whether enabled or not). These issues have largely been solved in the current trunk by moving control out of the global cherrypy.filters module and into each Request object. Every Request object now possesses a "hooks" attribute, a _cprequest.HookMap object. The HookMap class has the following attributes and methods:

  • callbacks: a dict of the form {hookpoint: [callback, ...]}. The "hookpoint" is one of our old filter method names, like "before_finalize".
  • failsafe: a list of hook points whose callbacks should all run, even if some of those callbacks raise exceptions.
  • attach(point, callback, conf=None): allows you to attach a callback to be invoked by this request. Any code can do this, and can do it on the fly! See the new caching module for an example; if the request is served from cache before_main, then the logic which would cache the page handler output is never attached, and therefore never invoked.
  • run(point): runs all registered callbacks for the given hook point.
  • populate_from_config(): this is called automatically by the Request object, and searches for Tools which it can call to set up hooks. What's a Tool? Read on...
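Here is a minimal sketch of a HookMap with the attributes listed above. It is written from the description, not copied from `_cprequest.py`: `conf` is assumed to be keyword arguments for the callback, failsafe points run every callback and re-raise the first error afterwards, and `populate_from_config` is left out.

```python
class HookMap:
    """Per-request registry of callbacks keyed by hook point (a sketch)."""

    def __init__(self, points):
        self.callbacks = {point: [] for point in points}
        self.failsafe = []

    def attach(self, point, callback, conf=None):
        """Register callback to run at point, with conf passed as kwargs."""
        self.callbacks[point].append((callback, conf or {}))

    def run(self, point):
        """Run all callbacks for point, honoring the failsafe list."""
        error = None
        for callback, conf in self.callbacks[point]:
            if point in self.failsafe:
                try:
                    callback(**conf)
                except Exception as exc:
                    # Keep going; remember only the first failure.
                    if error is None:
                        error = exc
            else:
                callback(**conf)
        if error is not None:
            raise error
```

Because callbacks are attached per request, a tool like the cache can simply skip calling `attach` on a hit, and nothing runs for that hook point.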

Tools

CherryPy has always included a number of extensions and libraries which help you design web applications more quickly. In addition, many people have designed their own extensions to CP, some as custom filters, some as decorators, some as base classes to be subclassed, some as WSGI middleware, custom Request objects, on*Start methods, etc., etc., etc.

I'd like to call all of these extensions "features" for the rest of this post. A "feature" in this sense is any function(s) or module which could be implemented in a variety of ways. If the feature should apply site-wide, you probably want to run it like a CP2-style filter, and perhaps declare its scope in the config dict/file. But if it only applies to a page handler or two, you might think a decorator would be more attractive syntax. Sometimes, you want to invoke the feature from inside the page handler, after you've inspected a certain header, or after a lookup has failed.

However, it often happens that implementing your feature to be used in one of these ways harms its use in another: if you make a lovely decorator out of your feature, chances are that you cannot just "plug it in" as a before_main handler and expect it to work. This was a big problem for CP 2; a lot of logic could be useful elsewhere, but wasn't available because it was "locked away" inside a filter or some other construct.

A Tool is my new term for "feature adapter". If you can write your feature as a normal Python function, with normal Python arguments instead of config.get calls, chances are it can be wrapped in a Tool in a single line of code:

cherrypy.tools.cool_stuff = cherrypy.tools.Tool('before_finalize', cool.stuff)

What does that line buy you?

  • Your function is registered in the CherryPy tool registry, so
  • Your function can be called from the tools namespace: tools.cool_stuff(*args, **kwargs).
  • Your function can be used as a decorator via @tools.cool_stuff.wrap(*args, **kwargs). Any arguments passed to wrap() get passed to your function whenever it is called.
  • Your function can be used as a hook and managed in config. Remember populate_from_config (above)? It scans through the current config, finds any items that start with "tools.", and checks to see that "tools.cool_stuff.on" is True. If it is, it takes all other "tools.cool_stuff.*" config entries and passes them as named arguments to your cool_stuff function, at the hook point you requested.
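A rough sketch of what such a one-line wrapper could provide, under stated assumptions: a simplified `Tool` class (not `cherrypy.tools.Tool`), a plain dict as config, and any object with the `attach(point, callback, conf)` method described for HookMap above. As a simplification, the decorator form just calls the function before the page handler rather than honoring the hook point.

```python
import functools

class Tool:
    """Adapt a plain function for direct calls, decorators, and config hooks."""

    def __init__(self, point, func, name=None):
        self.point = point
        self.func = func
        self.name = name or func.__name__

    def __call__(self, *args, **kwargs):
        # Direct call: tools.cool_stuff(*args, **kwargs)
        return self.func(*args, **kwargs)

    def wrap(self, *args, **kwargs):
        """Decorator: call the tool (with these arguments) before the handler."""
        def decorator(handler):
            @functools.wraps(handler)
            def wrapper(*hargs, **hkwargs):
                self.func(*args, **kwargs)
                return handler(*hargs, **hkwargs)
            return wrapper
        return decorator

    def setup_from_config(self, config, hooks):
        """Attach to hooks if 'tools.<name>.on' is True; return whether it did."""
        prefix = "tools.%s." % self.name
        if not config.get(prefix + "on", False):
            return False
        # Every other tools.<name>.* entry becomes a keyword argument.
        conf = {key[len(prefix):]: value for key, value in config.items()
                if key.startswith(prefix) and key != prefix + "on"}
        hooks.attach(self.point, self.func, conf)
        return True
```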

That is the "simple case", and there is sufficient room for very complex additions to that (grep for the setup method). If your feature needs to replace the page handler, for example (as caching, static, and xmlrpc do), there's a tools.MainTool class; when used as a decorator or a hook, it automatically skips the page handler for you if your function returns True (meaning "I've handled this request, thanks").

I plan to explore other Tool improvements in the near future:

  • Argument inspection is high on the list, so that decorators etc. get the same argspec as your original function. You might also be able to import tools and let your IDE auto-complete your config entries, which in my mind would cut down on reaching for manuals quite a bit. It would have to be optional, because IIRC Jython doesn't have an "inspect" module.
  • Other wrappers on the Tool class for...what? Base classes? WSGI middleware? custom Request objects? on*Start/Stop methods?
  • Look harder at the flags request.processRequestBody and request.execute_main. They're ugly. Devious thought: replace request.processRequestBody and request.main with default hooks.
  • "Tools" may not be the best name.
  • Other hook points are possible. Investigate using hooks in a more generic fashion.
  • Using a tool as a decorator effectively means that it is not overridable in config. This "feature lock" is something I've wanted for quite a while, but there may need to be some means of allowing config to override such features or their arguments. For example, a developer may want to insist that a "staticfilter" be in place, but not particularly care about the OS path to its resources.
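On the argument-inspection item: in modern Python, `functools.wraps` records the original function as `__wrapped__`, and `inspect.signature()` follows it, so a wrapper can report the wrapped function's argspec. A sketch with a hypothetical `cool_stuff` function and a hypothetical `config_keys` helper that derives candidate config entry names from that signature:

```python
import functools
import inspect

def cool_stuff(path, mode="fast", *, verbose=False):
    """A hypothetical feature function with ordinary Python arguments."""
    return path, mode, verbose

def as_decorator(func):
    """Wrap func; functools.wraps copies metadata and sets __wrapped__."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper

def config_keys(tool_name, func):
    """List the config entries a tool could accept, read from its signature."""
    params = inspect.signature(func).parameters
    return ["tools.%s.%s" % (tool_name, name) for name in params]

wrapped = as_decorator(cool_stuff)
```

Here `inspect.signature(wrapped)` reports `(path, mode='fast', *, verbose=False)` rather than `(*args, **kwargs)`.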

There are other issues that need to be addressed in CherryPy 3, of course (separating the CP server and the HTTP server springs instantly to mind). But these changes should give us a good basis for consolidation of a lot of code, and the freedom to use all our beautiful library logic in whatever way is most appropriate to each application and installation. I look forward to all your ideas and improvements.

03/19/06

11:27:57 pm, by fumanchu
Categories: Python, CherryPy

Python webapps no longer deadorex

Are we live, or are we deadorex?

I spent a few hours of my weekend working on getting a Read-Eval-Print Loop (sometimes called an "interactive interpreter") in a web browser. It was surprisingly easy to do so using Python's builtin code module and CherryPy. You can get it here: http://projects.amor.org/misc/wiki/HTTPREPL If anyone wants to contribute adapters for other web frameworks, I'd be happy to include them.
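The core of such a tool is small. Here is a sketch of it, independent of any web framework: a `WebREPL` class (a hypothetical name, not HTTPREPL's actual code) that pushes one line at a time into `code.InteractiveConsole` and captures whatever it prints, so a page handler could return the prompt and output as HTML.

```python
import code
import contextlib
import io

class WebREPL:
    """Feed lines from a browser form into an InteractiveConsole (a sketch)."""

    def __init__(self):
        self.console = code.InteractiveConsole(locals={})
        self.more = False  # True while a multi-line statement is still open

    def push(self, line):
        """Run one line of input; return (next_prompt, captured_output)."""
        out = io.StringIO()
        # Expression results go to stdout, tracebacks to stderr; capture both.
        with contextlib.redirect_stdout(out), contextlib.redirect_stderr(out):
            self.more = self.console.push(line)
        prompt = "... " if self.more else ">>> "
        return prompt, out.getvalue()
```

One console instance per user session keeps the namespace alive between requests, which is what lets you build things up on the fly.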

Anyway, now that you can build your application completely on the fly, we're one step closer to Smalltalk-style web nirvana. Maybe I should include a textarea option for larger chunks of code? Maybe an option to save the command history with the prompts stripped out? Hm...

[Screenshot: example HTTPREPL session]

02/26/06

03:58:33 am, by fumanchu
Categories: CherryPy

Making a custom CherryPy Request class for Routes

While at PyCon in Dallas this weekend, I got a chance to hear David Creemer talk about how he's using CherryPy (among lots of other tools) to deliver a site that's getting 250,000 hits a day before it's even been officially launched. He mentioned during the "lightning" (5 minute) talk that, of his entire toolkit, CherryPy and SQLObject were two of the tools that "mostly worked, except..." I spoke with him after the talk about his concerns, and the big CherryPy issue was dispatching: he prefers a Routes-style dispatch mechanism, which makes changes to the design easier.

He had previously posted his cherrypy+routes script, which I read but hadn't done anything about. I mentioned to him yesterday that it might be better, when overriding the dispatch mechanism, to do so in a custom Request class rather than in a single exposed default method on the cherrypy tree. Here's a first crack at what that would look like. I haven't tested it; it's meant more to get the idea of custom Request classes out there than to be a working patch ;)

import urllib
import cherrypy
from cherrypy import _cphttptools
import routes


mapper = routes.Mapper()
controllers = {}

def redirect( url ):
    raise cherrypy.HTTPRedirect( url )

def mapConnect( name, route, controller, **kwargs ):
    controllers[ name ] = controller
    mapper.connect( name, route, controller=name, **kwargs )

def mapFinalize():
    mapper.create_regs( controllers.keys() )

def URL( name, query = None, doseq = None, **kwargs ):
    uri = routes.url_for( name, **kwargs )

    if not uri:
        return "/UNKNOWN-%s" % name

    if query:
        uri += '?' + urllib.urlencode(query, doseq)

    return uri


class RoutesRequest(_cphttptools.Request):

    def main(self, path=None):
        """Obtain and set cherrypy.response.body from a page handler."""
        if path is None:
            path = self.object_path

        page_handler = self.mapPathToObject(path)

        # Routes has already turned the meaningful parts of the path into
        # mapper_dict entries, so pass everything as keyword arguments.
        # (Also passing the raw path atoms positionally would call, e.g.,
        # authui.login('auth', 'login', ...) for "/auth/login".)
        kwargs = self.params.copy()
        kwargs.update( cherrypy.request.mapper_dict )

        try:
            body = page_handler(**kwargs)
        except Exception, x:
            x.args = x.args + (page_handler,)
            raise
        cherrypy.response.body = body

    def mapPathToObject(self, objectpath):
        """For path, return the corresponding exposed callable (or raise NotFound).

        path should be a "relative" URL path, like "/app/a/b/c". Leading and
        trailing slashes are ignored.
        """

        # tell routes to use the cherrypy threadlocal object
        config = routes.request_config()
        if hasattr(config, 'using_request_local'):
            config.request_local = lambda: self
            config = routes.request_config()

        # hook up the routes variables for this request
        config.mapper = mapper
        config.host = self.headerMap['Host']
        config.protocol = self.scheme
        config.redirect = redirect
        config.mapper_dict = mapper.match( objectpath )

        if config.mapper_dict:
            c = config.mapper_dict.pop( 'controller', None )
            if c:
                controller = controllers[c]

                # we have a controller, now emulate cherrypy's index/default/callable semantics:
                action = config.mapper_dict.pop( 'action', 'index' )

                meth = getattr( controller, action, None )
                if not meth:
                    meth = getattr( controller, 'default', None )

                if not meth and callable( controller ) and action == 'index' :
                    meth = controller

                if meth and getattr( meth, 'exposed', False ):
                    return meth

        raise cherrypy.NotFound( objectpath )


# 'home' (your root controller) and 'authui' (a module with exposed
# login/logout functions) must be defined or imported before this point

mapConnect( name = 'home', route = '', controller = home )
mapConnect( name = 'auth', route = 'auth/:action', controller = authui,
            requirements = dict( action='(login|logout)' ) )
mapFinalize()

cherrypy.server.request_class = RoutesRequest
cherrypy.server.start()

Look, Ma, no root!
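For readers unfamiliar with Routes, here is a sketch of roughly what `mapper.match()` does for the two routes above. It is a from-scratch illustration, not the Routes library: the `compile_route` and `match` helpers, their handling of `:name` segments, and treating `requirements` as regexes are assumptions modelled on the route strings in the example.

```python
import re

def compile_route(route, requirements=None):
    """Turn 'auth/:action' into a regex with a named group per :variable."""
    requirements = requirements or {}
    parts = []
    for segment in route.split("/"):
        if segment.startswith(":"):
            name = segment[1:]
            # By default a variable matches one path segment.
            pattern = requirements.get(name, "[^/]+")
            parts.append("(?P<%s>%s)" % (name, pattern))
        else:
            parts.append(re.escape(segment))
    return re.compile("^/?" + "/".join(parts) + "/?$")

def match(routes, path):
    """Return (controller_name, params) for the first route matching path."""
    for name, regex in routes:
        m = regex.match(path)
        if m:
            params = m.groupdict()
            # Mirror mapPathToObject: a missing action means 'index'.
            params.setdefault("action", "index")
            return name, params
    return None, {}
```

With `routes = [("home", compile_route("")), ("auth", compile_route("auth/:action", {"action": "(login|logout)"}))]`, the path "/auth/login" yields `("auth", {"action": "login"})`, and RoutesRequest.mapPathToObject then turns that controller name and action into an exposed method.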

02/07/06

11:10:35 pm, by fumanchu
Categories: Python, General, Dejavu, CherryPy

We're hiring, by the way

The job posting is pretty tame: we need a Python web developer. But I thought I'd add my personal point-of-view, and say that we really mean "developer" and not just "coder". You'd be responsible for producing working web apps, but that involves a lot of design work and architectural decision-making.

You would also be expected to contribute to the CherryPy HTTP framework and to Dejavu (my Python ORM), since I'm a core dev on both those projects and use them heavily already. In other words, if you have or want exposure to the full stack of modern web development challenges, this is the job for you. You'll be a full member of an IT team of 3 serving an energetic staff of 50.

You'll also get something that's hard to find in most programming jobs: warm fuzzies. We build homes for the poor in Mexico, simultaneously "building" the church in Mexico, the U.S., Canada, and elsewhere. We are not on the cutting-edge of world missions--we are defining that edge. If you've been thinking about "doing more for Jesus", but would rather write code than dig ditches in Uganda, give us a call (619-662-1200 ext 11).

11/23/05

11:42:58 am, by fumanchu
Categories: CherryPy

What will CherryPy 3 look like?

The correct answer is: "nobody knows". But here are some ideas I've been kicking around the ol' cranium lately...

[09:32] *** now talking in #cherrypy
[10:22] <Lawouach> where to start
[10:22] <Lawouach> what's your basic idea toward 3.0?
[10:22] <@fumanchu> oh, I have so many ;)
[10:22] <Lawouach> lol
[10:22] <Lawouach> say big general ones :)
[10:22] <Lawouach> not details per se
[10:23] <@fumanchu> 1) make CP have a kick-butt,
    non-CP-specific toolkit (lib/httptools), that is SO
    good that Quixote, Django, et al can't *help* but
    decide to use it instead of their own server processes
[10:24] <@fumanchu> even if they don't like the way CP
    maps handlers to URL's, for example
[10:24] <@fumanchu> they should be able to build a
    server with the behavior they like out of lib/httptools
[10:25] <Lawouach> we want to be lib that rule them all :)
[10:25] <@fumanchu> yup
[10:26] <Lawouach> i agree as long as we don't become a
    framework on our own, but i already know it's not what
    you intend :)
[10:26] <@fumanchu> right
[10:26] <@fumanchu> it's an anti-framework approach
[10:26] <@fumanchu> we make writing-a-web-framework
    into a weekend's work
[10:27] <@fumanchu> take some from column A; try all of column B
[10:27] <Lawouach> do you want to stay very low-level
    (aka HTTP wrapper level) or make it a bit higher level
    and provide functions such as the bast_match() we were
    talking about last week?
[10:27] <@fumanchu> best_match would be fine as long
    as it doesn't depend upon cherrypy
[10:28] <Lawouach> right, this was a bad example
[10:28] <Lawouach> but basically where httptools should stop?
[10:28] <@fumanchu> I think that can be open-ended
[10:28] <Lawouach> i think we should keep the level
    you've been doing till now
[10:29] <@fumanchu> 2) then, by pulling a ton of code
    out of _cphttptools (putting it in lib/httptools instead),
    I want to see if we can get the Request and Response
    objects down to a tiny size
[10:34] <@fumanchu> the trunk version of _cphttptools
    is already 60% of its 2.1 size
[10:35] <Lawouach> right. hmmm
[10:37] <@fumanchu> and a *lot* of what's left is very OO
[10:38] <@fumanchu> so, one idea I'm toying with: allow
    developers to use their own subclasses of Request
    and Response
[10:40] <@fumanchu> if we make it super-easy to use custom
    Request subclasses, then they will want to start
    overriding Request.run
[10:40] <@fumanchu> take out the filter logic, and
    Request.run becomes:
def _run(self, requestLine, headers, rfile):

    self.headers = list(headers)
    self.headerMap = httptools.HeaderMap()
    self.simpleCookie = Cookie.SimpleCookie()
    self.rfile = rfile
    self.processRequestLine(requestLine)

    try:
        self.processHeaders()
        self.processBody()
        self.main()
        cherrypy.response.finalize()
    except cherrypy.RequestHandled:
        pass
    except (cherrypy.HTTPRedirect, cherrypy.HTTPError), inst:
        inst.set_response()
        cherrypy.response.finalize()
[10:40] <Lawouach> regarding the subclassing of request
    and response, i'm know that it could interest very
    much the guys behind itools
[10:40] <@fumanchu> yes
[10:40] <@fumanchu> and Ben Bangert (routes)
[10:41] <@fumanchu> anyway, if Request.run is *that* simple,
    then who needs filters?
[10:41] <@fumanchu> just code them procedurally into your
    Request.run method
[10:43] <@fumanchu> looking over the filters that are built in...
[10:44] <@fumanchu> I think that half could be done just as
    easily as lib/httptools functions
[10:44] <@fumanchu> and half could be "always on"
[10:44] <@fumanchu> (if we continue to improve them, like
    encodingfilter, to meet the HTP spec)
[10:44] <@fumanchu> HTTP
[10:44] <Lawouach> that's my white cheap :) (i don't think this
    expression exists so i make it up!)
[10:45] <Lawouach> i really want CP to be HTTP conditionnaly compliant
    at least :)
[10:45] <Lawouach> and maybe in CP 4.0 to be unconditionnaly compliant!
[10:45] <Lawouach> :p
[10:45] <@fumanchu> I completely agree
[10:46] <@fumanchu> anyway, I want to stress that I'm still playing
    with these ideas
[10:46] <@fumanchu> nothing's set in stone
[10:47] <Lawouach> since you've be proposing them a while back,
    i've been a great fan of them
[10:47] <@fumanchu> and trying to implement them will turn up
    lots of problems, I'm sure
[10:47] <@fumanchu> oh, well thanks
[10:47] <Lawouach> that's why i don't have so many different
    things to bring for cp 3.0
[10:51] <@fumanchu> one of the nice things about these ideas
    for 3.0 is that the bulk of the work can be done within
    the 2.x branch
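The idea from the log, that a small, readable Request.run lets subclasses code extra behavior procedurally instead of through filters, can be sketched in modern Python. The classes below are hypothetical stand-ins, not CherryPy's Request; the step methods only record their names so the control flow is visible, and finalize always runs, which simplifies the RequestHandled case in the snippet above.

```python
class HTTPRedirect(Exception):
    def __init__(self, url):
        super().__init__(url)
        self.url = url

class Request:
    """A pared-down request whose run() is a plain sequence of steps."""

    def __init__(self):
        self.trace = []
        self.status = None

    def process_headers(self):
        self.trace.append("headers")

    def process_body(self):
        self.trace.append("body")

    def main(self):
        self.trace.append("main")

    def finalize(self):
        self.trace.append("finalize")

    def run(self):
        try:
            self.process_headers()
            self.process_body()
            self.main()
        except HTTPRedirect as exc:
            self.status = "302 Found -> " + exc.url
        self.finalize()

class LoggingRequest(Request):
    """Adds 'filter' behavior procedurally, by overriding run()."""

    def run(self):
        self.trace.append("log start")  # what an on_start_resource filter did
        super().run()
        self.trace.append("log end")    # what an on_end_resource filter did

class ForceLoginRequest(Request):
    """Short-circuits every request with a redirect."""

    def main(self):
        raise HTTPRedirect("/login")
```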

10/27/05

01:08:47 am, by fumanchu
Categories: CherryPy

URL-rewriting in CherryPy 2.1

There are many reasons (and places) a developer might want an original Request-URI to be treated as if it were another. CherryPy 2.1.0 has a (possibly bewildering) array of attributes, core code, and filters which either enable rewriting or are affected by it. Here's how I see the state of the art (this is not gospel; much of it is my opinion regarding design intent).

Features

First, some features which depend on rewriting:

  1. Generating URLs to spit back out in HTML.
  2. HTTP Redirects and their targets (new URL).
  3. Handler dispatch: mapping URIs to handler methods. Includes...
  4. Arbitrary mount points: allowing a deployer to mount an application at an arbitrary base URI.
  5. Config lookups, since the config map is keyed by URI (path only; no queryString or fragment).
  6. Logging: do you log the original URI, the rewritten one, or both? In different logs? In all messages?

Request Attributes

Now, cherrypy.request has the following attributes (grabbed straight from the book):

  • requestLine: This attribute is a string containing the first line of the raw HTTP request; for example, "GET /path/page HTTP/1.1".
  • method: This attribute is a string containing the HTTP request method, such as GET or POST.
  • path: This attribute is a string containing the path of the resource the client requested.
  • queryString: This attribute is a string containing the query string of the request (the part of the URL following '?').
  • protocol: This attribute is a string containing the HTTP protocol of the request in the form of HTTP/x.x.

Let's take an example HTTP requestLine and see if we can't parse it out:

     DELETE /path/to/handler/?param=somevalue HTTP/1.1
     \____/ \_______________/ \_____________/ \______/
     method       path          queryString   protocol

Pretty straightforward; no overlaps. Note that if the Request-URI includes a scheme and host, they'll be stripped when path is formed.
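As a sketch of that parse in modern Python (the function name and returned dict are illustrative, not CherryPy's processRequestLine), `urllib.parse.urlsplit` does the splitting and also discards a scheme and host when the Request-URI is in absolute form:

```python
from urllib.parse import urlsplit

def parse_request_line(request_line):
    """Split 'METHOD Request-URI PROTOCOL' into its named parts."""
    method, uri, protocol = request_line.split(" ", 2)
    # For an absolute URI, urlsplit puts scheme and host in their own fields,
    # so .path holds only the path.
    parts = urlsplit(uri)
    return {
        "method": method,
        "path": parts.path,
        "queryString": parts.query,
        "protocol": protocol,
    }
```

This assumes the trailing CRLF has already been stripped from the line.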


There are a couple of other URI-related request attributes:

  • base: This attribute is a string containing the root URL of the server. By default, it is equal to scheme://headerMap['Host'].
  • browserUrl: This attribute is a string containing the URL the client requested. By default, it is equal to base + path, plus the queryString, if provided.

Since the requestLine doesn't always include the scheme or host (it may, rarely), these are obtained from other sources and joined into base. The browserUrl joins the base, the path, and the queryString to form a complete, absolute URI (what was hopefully in the Address bar of the end-user's web browser, if that's applicable).
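In code, the two derived values are simple joins. A sketch with hypothetical helper names; the "localhost" fallback for a missing Host header is an assumption, not CherryPy's documented default:

```python
def make_base(scheme, headers, default_host="localhost"):
    """base = scheme://Host, falling back to a default host (an assumption)."""
    return "%s://%s" % (scheme, headers.get("Host", default_host))

def make_browser_url(base, path, query_string=""):
    """browserUrl = base + path, plus '?' + queryString when present."""
    url = base + path
    if query_string:
        url += "?" + query_string
    return url
```

Because browserUrl is built from base, any rewrite of base has to be followed by recomputing browserUrl, or the two will disagree.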


Finally, we have these copies/substitutes for the functionality provided by path:

  • objectPath: This attribute is a string containing the path of the exposed method that will be called to handle this request. This is usually the same as cherrypy.request.path, but can be changed in a filter to change which method is actually called.
  • originalPath: This attribute is a string containing the original value of cherrypy.request.path, in case it is modified by a filter during the request.

The objectPath may be used to control dispatching, but there's nothing in the core that uses it that way. Since it's almost always None, dispatching usually falls back to the value of path. Once the handler dispatch is completed, then objectPath contains the route to the found handler, expressed as a path; in the above example, it might be "/path/to/handler/index" if an "index" function handles the request.

The originalPath is also an odd attribute. You would think that CherryPy core features, especially those which use or implement URI rewriting, would make use of this value. But none of them do. It gets set but never used.

How to rewrite in 2.1

Rewriting "base"

This is what the builtin baseUrlFilter does, so that an instance of CherryPy running behind Apache with mod_proxy or mod_rewrite can spit back out proper URIs in HTML, redirects, etc. As far as I can tell, this works well and has no issues with the rest of CherryPy. The only other value which overlaps with the value of base is browserUrl, which the filter also rewrites.
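The heart of such a rewrite can be sketched as a pure function that swaps the old base for a new one and rewrites browserUrl to match, so the two never disagree. This is not the filter's actual code; rewrite_base and the example URLs are hypothetical.

```python
def rewrite_base(base, browser_url, new_base):
    """Replace base and keep browserUrl consistent with it."""
    new_base = new_base.rstrip("/")
    if browser_url.startswith(base):
        # browserUrl was built as base + path [+ "?" + queryString],
        # so swapping the leading base keeps path and query intact.
        browser_url = new_base + browser_url[len(base):]
    return new_base, browser_url
```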

Rewriting "path"

Another way to rewrite is to use a filter that changes the value of path for you as early as possible. For example, I use a VirtualPathFilter which does this:

import cherrypy

class VirtualPathFilter(object):
    """Filter that changes cherrypy.request.path, stripping a set prefix."""

    def onStartResource(self):
        if cherrypy.config.get('virtualPathFilter.on', False):
            # Normalize so that both '/prefix' and '/prefix/' work.
            prefix = cherrypy.config.get('virtualPathFilter.prefix', '').rstrip('/')
            if prefix:
                path = cherrypy.request.path
                if path == prefix:
                    path = '/'
                elif path.startswith(prefix + '/'):
                    # Strip only on a segment boundary, so '/prefixed' is left alone.
                    path = path[len(prefix):]
                cherrypy.request.path = path

This allows me to provide feature #4, arbitrary mount points. I write my application as if it were always mounted at /, but the deployer can then provide a virtualPathFilter.prefix to turn the URL /prefix/page?id=3 into /page?id=3.

Unfortunately, if the other pieces of CherryPy aren't written to support arbitrary mount points, then this scheme falls apart. And they aren't so written. I've just broken many of our other features:

  1. Generating URLs to spit back out in HTML. Broken. I now have to manually provide prefix to my HTML templates, or take on the nightmare of making every generated URL relative to the current one (e.g. "../../otherpage").
  2. HTTP Redirects and their targets. Broken. I now have to manually provide prefix to each redirect (or use relative URLs). But I can't control CherryPy's own redirects! For example, when CherryPy tries to redirect index methods by adding a trailing slash to the requested URI, it uses the value of path, which I've rewritten.
  3. Handler dispatch: not broken.
  4. Arbitrary mount points: not broken.
  5. Config lookups. Broken? Some other filter which does a config lookup could run its onStartResource method before mine. Since my filter is user-defined, it is forced to run after all of the builtin ones; none of those currently perform config lookups, however. If any of the server.* config entries are specified somewhere other than "global", then we have the same issue. Finally, what's to stop a future CP developer from adding more such problems (as they fix other bugs)?
  6. Logging: the error.log and access.log will both use the original URI (from requestLine). Broken? or not? One? Both?
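The manual workaround for items 1 and 2 amounts to a helper that re-applies the deployer's prefix to every application-relative URL before it goes into HTML or a Location header. A minimal sketch; make_url and the way prefix is passed in are hypothetical, and this does nothing for redirects that CherryPy issues on its own.

```python
def make_url(prefix, app_path, query_string=""):
    """Turn an app-relative path like '/page' into '/prefix/page'."""
    prefix = prefix.rstrip("/")
    if not app_path.startswith("/"):
        raise ValueError("app_path must be absolute within the app")
    url = prefix + app_path
    if query_string:
        url += "?" + query_string
    return url
```

Every template and every redirect in the app has to route through something like this, which is exactly the burden that makes path rewriting unattractive.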

Rewriting "objectPath"

An alternative to rewriting the path is to use a filter that changes the value of objectPath instead, before the handler is looked up and called. For example, we could change VirtualPathFilter to do this instead:

import cherrypy

class VirtualPathFilter(object):
    """Filter that changes cherrypy.request.objectPath, stripping a set prefix."""

    def beforeMain(self):
        # beforeMain runs before the handler is looked up and called.
        if cherrypy.config.get('virtualPathFilter.on', False):
            prefix = cherrypy.config.get('virtualPathFilter.prefix', '').rstrip('/')
            if prefix:
                path = cherrypy.request.path
                if path == prefix:
                    path = '/'
                elif path.startswith(prefix + '/'):
                    # Strip only on a segment boundary.
                    path = path[len(prefix):]
                # The key change from the previous version: objectPath, not path.
                cherrypy.request.objectPath = path

Are there any side-effects to this approach?

  1. Generating URLs to spit back out in HTML: Broken. No change from rewriting path.
  2. HTTP Redirects and their targets: Broken. No change from rewriting path.
  3. Handler dispatch: not broken.
  4. Arbitrary mount points: not broken.
  5. Config lookups. Broken. A call to config.get() defaults to using path, which we haven't rewritten, which might seem all right until you try to deploy the app: every configMap key must be rewritten to prefix the mount point, and this must be done separately for each site. Some might call this an acceptable trade-off. I don't. ;)
  6. Logging: probably not considered broken.
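To see why item 5 bites, consider a simplified path-based config lookup that tries the full path, then each parent, then "global". This is a hypothetical model of configMap lookups, not CherryPy's config.get, and "myapp.color" is an invented key. Because the lookup uses the unrewritten path, a section written as "/page" never matches a request for "/prefix/page".

```python
def config_get(config_map, path, key, default=None):
    """Look up key in the most specific matching section of config_map."""
    # Candidate sections: "/a/b", then "/a", then "/", then "global".
    candidates = []
    while path:
        candidates.append(path)
        if path == "/":
            break
        path = path.rsplit("/", 1)[0] or "/"
    candidates.append("global")
    for section in candidates:
        if key in config_map.get(section, {}):
            return config_map[section][key]
    return default
```

The deployer would have to duplicate the "/page" section as "/prefix/page" for each site, which is the trade-off objected to above.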

Recommendations for CherryPy 2.2

We need to fix rewriting path or objectPath, or both. Let's try fixing objectPath:

  1. Generating URLs to spit back out in HTML: What to do about user code? Tell them to always use relative URLs? Not acceptable, really. A rewriting filter needs some way to "unrewrite" an arbitrary path, it seems. A prefix-only rewriter could do this, for example, but not a regex-rewriter. Maybe a prefix stripped from path could just be appended to base?
  2. HTTP Redirects and their targets: Same issue as #1. But also make the trailing-slash hack redirect by using browserUrl instead of (objectPath or path) + queryString. Note that HTTPRedirect already uses browserUrl.
  3. Handler dispatch: not broken.
  4. Arbitrary mount points: not broken.
  5. Config lookups. Make config.get try objectPath? There's a big problem there: objectPath might grow an extra "/index" or "/default" suffix halfway through the request process. So we'd have to separate the two concepts into a "searchPath" and a "foundPath". Even if we did that, we would still have the issue that path-rewriting does (user-defined filters run late). We would have to find a way to run a rewriting filter before most (all?) others. Maybe rewriting shouldn't be a filter at all—maybe it should be part of the fixed API, if only for prefixed mount points.
  6. Logging: probably not considered broken.
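The "move the stripped prefix onto base" idea from item 1, together with the browserUrl-based trailing-slash redirect from item 2, might look like the sketch below. Everything here is hypothetical (function names, the plain-dict request); it only shows that once the prefix lives in base, base + path still reproduces the URL the client asked for, so a redirect built from browserUrl keeps the prefix.

```python
def move_prefix_to_base(request, prefix):
    """Strip prefix from request['path'] and append it to request['base']."""
    prefix = prefix.rstrip("/")
    path = request["path"]
    if prefix and (path == prefix or path.startswith(prefix + "/")):
        request["path"] = path[len(prefix):] or "/"
        request["base"] = request["base"] + prefix
    return request


def trailing_slash_target(browser_url):
    """Redirect target for an index handler requested without a trailing slash."""
    # Add the slash before any query string, working from browserUrl
    # rather than from a path that may have been rewritten.
    url, sep, query = browser_url.partition("?")
    if not url.endswith("/"):
        url += "/"
    return url + sep + query
```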

Seems we have our work cut out for us.

10/26/05

03:35:00 pm, by fumanchu, 262 words, English (US)
Categories: CherryPy

Agenda topics for CherryPy 2.2 roadmap meeting

There will be an IRC meeting for the CherryPy 2.2 roadmap on Thursday, 6pm GMT. Things I want to discuss (most-important items first):

  • Merging the /requestobj branch to trunk. This branch merges the cherrypy.request object with the _cphttptools.Request object. This is just generally a good idea, and will make the code cleaner. It could also be a first step toward...
  • Allowing subclassing of the Request object in order to handle various dispatch schemes.
  • An API for HTTP-method dispatch, including ways to encourage developers of new apps to care about idempotency.
  • Ticket #356: Formalize server.environment as a set of config defaults
  • Multiple apps in a single process, including...
  • Arbitrary mount points for applications
  • Possible solutions to #362 (guaranteed filter methods). One such solution might involve...
  • Alternatives to the filter system--see if there's any way to declare customizations (which the app developer sees as frozen, and critical to the app) in code
  • Formalize the intended uses of request attributes (like browserUrl, base, path, querystring, etc.). There is some confusion in the codebase regarding when those should be, and are currently, rewritten. This will then inform...
  • Fixing the VirtualHostFilter (to inspect the Host header)
  • Better HTTP content-negotiation (Accept-* and Vary headers)
  • Better content caching (Expires header, etc.)
  • Not really for discussion, but I want to get my mod_python WSGI gateway to work as well as mpcp on Unix

More as I think of them...
