There are many reasons why, and many places where, a developer would want an original Request-URI to be treated as if it were another. CherryPy 2.1.0 has a (possibly bewildering) array of attributes, core code, and filters which either enable rewriting or are affected by it. Here's how I see the state of the art (this is not gospel; much of it is my opinion regarding design intent).
First, some features which depend on rewriting:
Now, cherrypy.request has the following attributes (grabbed straight from the book):
Let's take an example HTTP requestLine and see if we can't parse it out:
DELETE /path/to/handler/?param=somevalue HTTP/1.1
\____/ \_______________/ \_____________/ \______/
method       path          queryString   protocol
Pretty straightforward; no overlaps. Note that if the Request-URI includes a scheme and host, that'll be stripped when path is formed.
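The split described above can be sketched in a few lines of modern Python. This is only an illustration, not CherryPy's own parsing code: the function name `parse_request_line` is made up for this example, and it relies on the standard library's `urllib.parse.urlsplit` to discard any scheme and host from the Request-URI.

```python
from urllib.parse import urlsplit


def parse_request_line(request_line):
    """Split a request line into (method, path, queryString, protocol).

    Hypothetical helper for illustration; not part of CherryPy.
    """
    method, request_uri, protocol = request_line.split(" ")
    # urlsplit separates scheme and host (if present) from path and query,
    # so an absolute Request-URI yields the same path as a relative one.
    parts = urlsplit(request_uri)
    return method, parts.path, parts.query, protocol


print(parse_request_line("DELETE /path/to/handler/?param=somevalue HTTP/1.1"))
# ('DELETE', '/path/to/handler/', 'param=somevalue', 'HTTP/1.1')
```

An absolute Request-URI such as http://www.example.com/path/to/handler/?param=somevalue produces the same path and queryString.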
There are a couple of other URI-related request attributes:
Since the requestLine doesn't always include the scheme or host (it may, rarely), these are obtained from other sources and joined into base. The browserUrl joins the base, the path, and the queryString to form a complete, absolute URI (what was hopefully in the Address bar of the end-user's web browser, if that's applicable).
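As a rough sketch of that join (the helper name `build_browser_url` and the example host are assumptions for illustration, not CherryPy code):

```python
def build_browser_url(base, path, query_string=""):
    """Join base, path, and queryString into one absolute URI.

    Hypothetical helper; assumes base looks like "http://host[:port]"
    and path starts with "/".
    """
    url = base.rstrip("/") + path
    if query_string:
        url += "?" + query_string
    return url


print(build_browser_url("http://www.example.com",
                        "/path/to/handler/",
                        "param=somevalue"))
# http://www.example.com/path/to/handler/?param=somevalue
```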
Finally, we have these copies/substitutes for the functionality provided by path:
The objectPath may be used to control dispatching, but there's nothing in the core that uses it that way. Since it's almost always None, dispatching usually falls back to the value of path. Once the handler dispatch is completed, then objectPath contains the route to the found handler, expressed as a path; in the above example, it might be "/path/to/handler/index" if an "index" function handles the request.
The originalPath is also an odd attribute. You would think that CherryPy core features, especially those which use or implement URI rewriting, would make use of this value. But none of them do. It gets set but never used.
This is what the builtin baseUrlFilter does, so that an instance of CherryPy running behind Apache with mod_proxy or mod_rewrite can emit proper URIs in HTML, redirects, etc. As far as I can tell, this works well and has no issues with the rest of CherryPy. The only other value which overlaps with the value of base is browserUrl, which the filter also rewrites.
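To make the base/browserUrl relationship concrete, here is a minimal standalone sketch of that kind of rewrite. It is not the filter's actual source: the function `rewrite_base` and the host names are invented for this example, and where the new base comes from (a config entry, a proxy header) is left out entirely.

```python
def rewrite_base(base, browser_url, new_base):
    """Return (new_base, new_browser_url) after swapping in a public base.

    Hypothetical helper for illustration only.
    """
    if "://" not in new_base:
        # A bare host was supplied: reuse the scheme of the current base.
        scheme = base[: base.find("://") + 3]
        new_base = scheme + new_base
    if browser_url.startswith(base):
        # browserUrl begins with base, so it must be rewritten in step.
        browser_url = new_base + browser_url[len(base):]
    return new_base, browser_url


print(rewrite_base("http://localhost:8080",
                   "http://localhost:8080/page?id=3",
                   "www.example.com"))
# ('http://www.example.com', 'http://www.example.com/page?id=3')
```

Only base and browserUrl change here; path and queryString are untouched, which is why this kind of rewrite does not interfere with dispatching.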
Another way to rewrite is to use a filter that changes the value of path for you as early as possible. For example, I use a VirtualPathFilter which does this:
import cherrypy

class VirtualPathFilter(object):
    """Filter that changes cherrypy.request.path, stripping a set prefix."""

    def onStartResource(self):
        if cherrypy.config.get('virtualPathFilter.on', False):
            prefix = cherrypy.config.get('virtualPathFilter.prefix', '')
            if prefix:
                path = cherrypy.request.path
                if path == prefix:
                    # A request for the mount point itself maps to the root.
                    path = '/'
                elif path.startswith(prefix):
                    # Drop the leading prefix from the path.
                    path = path[len(prefix):]
                cherrypy.request.path = path
This allows me to provide feature #4, arbitrary mount points. I write my application as if it were always mounted at /, but the deployer can then provide a virtualPathFilter.prefix to turn the URL /prefix/page?id=3 into /page?id=3.
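The prefix-stripping step can be tried in isolation, without a running CherryPy. The sketch below uses a made-up helper, `strip_prefix`, and differs from the filter above in one respect that is my own variation: it only strips the prefix at a path-segment boundary, so /prefixed/page is left alone. Note that it operates on the path only; the query string (?id=3) travels separately in queryString.

```python
def strip_prefix(path, prefix):
    """Strip a mount-point prefix from path (hypothetical helper).

    Unlike a bare startswith() test, this only matches whole path
    segments, so "/prefix" does not match "/prefixed/page".
    """
    prefix = prefix.rstrip("/")
    if not prefix:
        return path
    if path == prefix:
        # The mount point itself maps to the application root.
        return "/"
    if path.startswith(prefix + "/"):
        return path[len(prefix):]
    return path


print(strip_prefix("/prefix/page", "/prefix"))    # /page
print(strip_prefix("/prefix", "/prefix"))         # /
print(strip_prefix("/prefixed/page", "/prefix"))  # /prefixed/page
```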
Unfortunately, if the other pieces of CherryPy aren't written to support arbitrary mount points, then this scheme falls apart. And they aren't so written. I've just broken many of our other features:
server.* config entries are specified somewhere other than "global", then we have the same issue. Finally, what's to stop a future CP developer from adding more such problems (as they fix other bugs)?
An alternative to rewriting the path is to use a filter that changes the value of objectPath instead, before the handler is looked up and called. For example, we could change VirtualPathFilter to do this instead:
import cherrypy

class VirtualPathFilter(object):
    """Filter that changes cherrypy.request.objectPath, stripping a set prefix."""

    def beforeMain(self):
        if cherrypy.config.get('virtualPathFilter.on', False):
            prefix = cherrypy.config.get('virtualPathFilter.prefix', '')
            if prefix:
                path = cherrypy.request.path
                if path == prefix:
                    # A request for the mount point itself maps to the root.
                    path = '/'
                elif path.startswith(prefix):
                    # Drop the leading prefix from the path.
                    path = path[len(prefix):]
                cherrypy.request.objectPath = path
#                                ^^^^^^^^^^
Are there any side-effects to this approach?
config.get() defaults to using path, which we haven't rewritten. That might seem all right until you try to deploy the app: every configMap key must be rewritten to include the mount-point prefix, and this must be done separately for each site. Some might call this an acceptable trade-off. I don't. We need to fix rewriting path or objectPath, or both. Let's try fixing objectPath:
Seems we have our work cut out for us.
Yep, got to fix this stuff in 2.2. Big time! :-)
Maybe I'm missing something, but why don't you just modify cherrypy.request.browserUrl along with the objectPath?
If we modified browserUrl, then we'd have no way at all to make the trailing-slash hack work (I just changed the trunk, so that hack now correctly uses browserUrl).