
URL-rewriting in CherryPy 2.1

10/27/05

01:08:47 am, by fumanchu, 1452 words, English (US)
Categories: CherryPy


There are many reasons why a developer might want an original Request-URI to be treated as if it were another, and many places where that matters. CherryPy 2.1.0 has a (possibly bewildering) array of attributes, core code, and filters which either enable rewriting or are affected by it. Here's how I see the state of the art (this is not gospel; much of it is my opinion regarding design intent).

Features

First, some features which depend on rewriting:

  1. Generating URLs to spit back out in HTML.
  2. HTTP Redirects and their targets (new URL).
  3. Handler dispatch: mapping URIs to handler methods. Includes...
  4. Arbitrary mount points: allowing a deployer to mount an application at an arbitrary base URI.
  5. Config lookups, since the config map is keyed by URI (path only; no queryString or fragment).
  6. Logging: do you log the original URI, the rewritten one, or both? In different logs? In all messages?

Request Attributes

Now, cherrypy.request has the following attributes (grabbed straight from the book):

  • requestLine: This attribute is a string containing the first line of the raw HTTP request; for example, "GET /path/page HTTP/1.1".
  • method: This attribute is a string containing the HTTP request method, such as GET or POST.
  • path: This attribute is a string containing the path of the resource the client requested.
  • queryString: This attribute is a string containing the query string of the request (the part of the URL following '?').
  • protocol: This attribute is a string containing the HTTP protocol of the request, in the form HTTP/x.x.

Let's take an example HTTP requestLine and see if we can't parse it out:

     DELETE /path/to/handler/?param=somevalue HTTP/1.1
     \____/ \_______________/ \_____________/ \______/
     method       path          queryString   protocol

Pretty straightforward; no overlaps. Note that if the Request-URI includes a scheme and host, those are stripped when path is formed.
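Here's a minimal sketch, in plain Python and independent of CherryPy, of the split shown in the diagram. `parse_request_line` is a hypothetical name, not CherryPy's own parser, and it does no percent-decoding or validation:

```python
def parse_request_line(request_line):
    """Split a raw HTTP request line into (method, path, queryString, protocol)."""
    method, uri, protocol = request_line.split(" ", 2)
    # An absolute Request-URI ("http://host/path") carries a scheme and host;
    # drop both so that only the path (and query string) remain.
    if not uri.startswith("/") and "://" in uri:
        rest = uri.split("://", 1)[1]
        slash = rest.find("/")
        uri = rest[slash:] if slash != -1 else "/"
    # Everything after the first '?' is the query string (without the '?').
    path, _, query_string = uri.partition("?")
    return method, path, query_string, protocol
```

For the example above this yields `("DELETE", "/path/to/handler/", "param=somevalue", "HTTP/1.1")`.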


There are a couple of other URI-related request attributes:

  • base: This attribute is a string containing the root URL of the server. By default, it is equal to scheme://headerMap['Host'].
  • browserUrl: This attribute is a string containing the URL the client requested. By default, it is equal to base + path, plus the queryString, if provided.

Since the requestLine doesn't always include the scheme or host (it may, rarely), these are obtained from other sources and joined into base. The browserUrl joins the base, the path, and the queryString to form a complete, absolute URI (what was hopefully in the Address bar of the end-user's web browser, if that's applicable).
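A sketch of how the two defaults are composed, assuming the Host header is present. Both helper names are hypothetical; CherryPy computes these values internally and has its own fallbacks when Host is missing:

```python
def build_base(scheme, header_map):
    """Default base: scheme://Host, taken from the request's Host header."""
    # Assumes the Host header is present; a real server needs a fallback.
    return "%s://%s" % (scheme, header_map["Host"])


def build_browser_url(base, path, query_string):
    """browserUrl: base + path, plus '?' + queryString when one was sent."""
    url = base + path
    if query_string:
        url += "?" + query_string
    return url
```

With `Host: example.com:8080`, the DELETE example above gives a browserUrl of `http://example.com:8080/path/to/handler/?param=somevalue`.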


Finally, we have these copies/substitutes for the functionality provided by path:

  • objectPath: This attribute is a string containing the path of the exposed method that will be called to handle this request. This is usually the same as cherrypy.request.path, but can be changed in a filter to change which method is actually called.
  • originalPath: This attribute is a string containing the original value of cherrypy.request.path, in case it is modified by a filter during the request.

The objectPath may be used to control dispatching, but nothing in the core uses it that way. Since it is almost always None when dispatching starts, dispatching usually falls back to the value of path. Once handler dispatch is complete, objectPath contains the route to the found handler, expressed as a path; in the example above, it might be "/path/to/handler/index" if an "index" method handles the request.
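To illustrate the fallback and the after-dispatch value, here's a deliberately simplified dispatcher. `find_handler` and `dispatch` are hypothetical; CherryPy's real dispatcher also handles "default" methods and leftover virtual path segments, which this sketch omits:

```python
def find_handler(root, object_path):
    """Walk attribute names from root; return (handler, foundPath) or (None, None)."""
    node = root
    names = [n for n in object_path.strip("/").split("/") if n]
    for name in names:
        node = getattr(node, name, None)
        if node is None:
            return None, None
    # The node itself may be an exposed callable...
    if callable(node) and getattr(node, "exposed", False):
        return node, "/" + "/".join(names)
    # ...or it may have an exposed "index" method.
    index = getattr(node, "index", None)
    if index is not None and getattr(index, "exposed", False):
        return index, "/" + "/".join(names + ["index"])
    return None, None


def dispatch(request, root):
    # objectPath wins when something has set it; otherwise fall back to path.
    handler, found = find_handler(root, request.objectPath or request.path)
    # After dispatch, objectPath records the route to the handler found.
    request.objectPath = found
    return handler
```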

The originalPath is also an odd attribute. You would think that CherryPy core features, especially those which use or implement URI rewriting, would make use of this value. But none of them do. It gets set but never used.

How to rewrite in 2.1

Rewriting "base"

This is what the builtin baseUrlFilter does, so that an instance of CherryPy running behind Apache with mod_proxy or mod_rewrite can spit back out proper URIs in HTML, redirects, etc. As far as I can tell, this works well and has no issues with the rest of CherryPy. The only other value which overlaps with base is browserUrl, which the filter also rewrites.
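The core of a base rewrite is small. This sketch shows the idea behind baseUrlFilter; it is not the filter's actual code, and `rewrite_base` is a hypothetical name. Where `new_base` comes from (a config entry, or a header set by the proxy) is left out:

```python
def rewrite_base(request, new_base):
    """Point request.base, and the matching prefix of browserUrl, at new_base."""
    old_base = request.base
    # browserUrl starts with base, so swap that prefix to keep the two in sync.
    if request.browserUrl.startswith(old_base):
        request.browserUrl = new_base + request.browserUrl[len(old_base):]
    request.base = new_base
```

Because path is untouched, dispatch and config lookups are unaffected; only URLs built from base or browserUrl change.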

Rewriting "path"

Another way to rewrite is to use a filter that changes the value of path for you as early as possible. For example, I use a VirtualPathFilter which does this:

import cherrypy

class VirtualPathFilter(object):
    """Filter that changes cherrypy.request.path, stripping a set prefix."""

    def onStartResource(self):
        if cherrypy.config.get('virtualPathFilter.on', False):
            prefix = cherrypy.config.get('virtualPathFilter.prefix', '').rstrip('/')
            if prefix:
                path = cherrypy.request.path
                if path == prefix:
                    path = '/'
                elif path.startswith(prefix + '/'):
                    # Strip only at a segment boundary, so that a prefix of
                    # "/app" leaves "/apple" alone.
                    path = path[len(prefix):]
                cherrypy.request.path = path

This allows me to provide feature #4, arbitrary mount points. I write my application as if it were always mounted at /, but the deployer can then provide a virtualPathFilter.prefix to turn the URL /prefix/page?id=3 into /page?id=3.
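The prefix logic is a pure string transformation, so it can be pulled out and tested without a running server. `strip_prefix` is a hypothetical helper. It strips only at a segment boundary, so a prefix of /prefix leaves /prefixed/page alone. The query string is not part of path, so it passes through untouched:

```python
def strip_prefix(path, prefix):
    """Map a public path under `prefix` onto the application's own path."""
    prefix = prefix.rstrip("/")
    if not prefix:
        return path
    if path == prefix:
        return "/"
    # Only strip at a segment boundary: "/prefix" must not eat "/prefixed".
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    return path
```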

Unfortunately, if the other pieces of CherryPy aren't written to support arbitrary mount points, then this scheme falls apart. And they aren't so written. I've just broken many of our other features:

  1. Generating URLs to spit back out in HTML. Broken. I now have to manually provide prefix to my HTML templates, or take on the nightmare of making every generated URL relative to the current one (e.g. "../../otherpage").
  2. HTTP Redirects and their targets. Broken. I now have to manually provide prefix to each redirect (or use relative URLs). But I can't control the redirects CherryPy itself raises! For example, when CherryPy redirects a request for an index method by adding a trailing slash to the requested URI, it uses the value of path, which I've rewritten.
  3. Handler dispatch: not broken.
  4. Arbitrary mount points: not broken.
  5. Config lookups. Broken? Some other filter which does a config lookup could run its onStartResource method before mine. Since my filter is user-defined, it is forced to run after all of the builtin ones; none of those currently perform config lookups, however. If any of the server.* config entries are specified somewhere other than "global", then we have the same issue. Finally, what's to stop a future CP developer from adding more such problems (as they fix other bugs)?
  6. Logging: the error.log and access.log will both use the original URI (from requestLine). Broken or not? One? Both?
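To make item 2 concrete, here's a sketch comparing a trailing-slash redirect target built from the rewritten path with one built from browserUrl. Both function names are hypothetical and this is not CherryPy's actual redirect code. It assumes browserUrl still holds the URL the client asked for, which is the case when a filter rewrites only path:

```python
def redirect_from_path(base, path, query_string):
    """Trailing-slash target built from request.path (loses a stripped prefix)."""
    url = base + path + "/"
    if query_string:
        url += "?" + query_string
    return url


def redirect_from_browser_url(browser_url):
    """Same target built from browserUrl, which still matches the client's URL."""
    url, sep, query = browser_url.partition("?")
    return url + "/" + sep + query
```

For a request to http://example.com/prefix/page?id=3 whose path was rewritten to /page, the first gives http://example.com/page/?id=3 (outside the mount point), while the second gives http://example.com/prefix/page/?id=3.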

Rewriting "objectPath"

An alternative to rewriting the path is to use a filter that changes the value of objectPath instead, before the handler is looked up and called. For example, we could change VirtualPathFilter to do this instead:

import cherrypy

class VirtualPathFilter(object):
    """Filter that changes cherrypy.request.objectPath, stripping a set prefix."""

    def beforeMain(self):
        if cherrypy.config.get('virtualPathFilter.on', False):
            prefix = cherrypy.config.get('virtualPathFilter.prefix', '').rstrip('/')
            if prefix:
                path = cherrypy.request.path
                if path == prefix:
                    path = '/'
                elif path.startswith(prefix + '/'):
                    path = path[len(prefix):]
                # Unlike the path-rewriting version, set objectPath and
                # leave cherrypy.request.path untouched.
                cherrypy.request.objectPath = path

Are there any side-effects to this approach?

  1. Generating URLs to spit back out in HTML: Broken. No change from rewriting path.
  2. HTTP Redirects and their targets: Broken. No change from rewriting path.
  3. Handler dispatch: not broken.
  4. Arbitrary mount points: not broken.
  5. Config lookups. Broken. A call to config.get() defaults to using path, which we haven't rewritten. That might seem all right until you try to deploy the app: every configMap key must then be rewritten to include the mount point as a prefix, and this must be done separately for each site. Some might call this an acceptable trade-off. I don't. ;)
  6. Logging: probably not considered broken.
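A simplified model of a path-keyed config lookup shows the deployment problem in item 5. It assumes that a lookup tries the request path, then each parent path up to "/", then "global". `config_get` is a hypothetical stand-in, not CherryPy's config.get:

```python
def config_get(config_map, path, key, default=None):
    """Look up key for path, walking up toward "/" and then "global"."""
    while True:
        section = config_map.get(path, {})
        if key in section:
            return section[key]
        if path == "/":
            break
        # "/a/b" -> "/a" -> "/"
        path = path[:path.rfind("/")] or "/"
    return config_map.get("global", {}).get(key, default)
```

An app that ships with a section keyed "/admin" works when mounted at "/". Once it is mounted at "/myapp" and the lookup still uses the unrewritten path "/myapp/admin/page", that section is never reached and the "global" value wins, unless the deployer rekeys the section as "/myapp/admin".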

Recommendations for CherryPy 2.2

We need to fix rewriting path or objectPath, or both. Let's try fixing objectPath:

  1. Generating URLs to spit back out in HTML: What to do about user code? Tell them to always use relative URLs? Not acceptable, really. A rewriting filter needs some way to "unrewrite" an arbitrary path, it seems. A prefix-only rewriter could do this, for example, but not a regex-rewriter. Maybe a prefix stripped from path could just be appended to base?
  2. HTTP Redirects and their targets: Same issue as #1. But also make the trailing-slash hack redirect by using browserUrl instead of (objectPath or path) + queryString. Note that HTTPRedirect already uses browserUrl.
  3. Handler dispatch: not broken.
  4. Arbitrary mount points: not broken.
  5. Config lookups. Make config.get try objectPath? There's a big problem there: objectPath might grow an extra "/index" or "/default" suffix halfway through the request process. So we'd have to separate the two concepts into a "searchPath" and a "foundPath". Even if we did that, we would still have the issue that path-rewriting does (user-defined filters run late). We would have to find a way to run a rewriting filter before most (all?) others. Maybe rewriting shouldn't be a filter at all; maybe it should be part of the fixed API, if only for prefixed mount points.
  6. Logging: probably not considered broken.
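The idea in item 1 of moving a stripped prefix onto base can be sketched for the prefix-only case. Both helpers are hypothetical and this is speculation about a possible design, not existing CherryPy behavior. browserUrl is left as the client sent it:

```python
def mount_prefix(request, prefix):
    """Move a mount-point prefix from request.path onto the end of request.base.

    Only handles a plain prefix; a regex rewrite has no such simple inverse.
    """
    prefix = prefix.rstrip("/")
    path = request.path
    if prefix and (path == prefix or path.startswith(prefix + "/")):
        # The app sees its own path space, rooted at "/".
        request.path = path[len(prefix):] or "/"
        # base now ends with the mount point, so base + path still points
        # back at the URL the client used (plus a trailing slash when the
        # request was for the bare prefix).
        request.base = request.base + prefix


def url_for(request, app_path):
    """Build an absolute URL for an app-relative path such as "/otherpage"."""
    return request.base + app_path
```

Application code can then keep writing "/otherpage" and get http://example.com/prefix/otherpage back, with no template changes per deployment.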

Seems we have our work cut out for us.

3 comments

Comment from: John P. Speno [Visitor]

Yep, got to fix this stuff in 2.2. Big time! :-)

10/27/05 @ 07:56
Comment from: peterhunt [Member]

Maybe I'm missing something, but why don't you just modify cherrypy.request.browserUrl along with the objectPath?

10/27/05 @ 17:34
Comment from: fumanchu [Member]

If we modified browserUrl, then we'd have no way at all to make the trailing-slash hack work (I just changed the trunk, so that hack now correctly uses browserUrl).

10/27/05 @ 23:33
