
Internal Redirect WSGI middleware

11/13/06

10:57:51 pm, by fumanchu, 745 words, English (US)
Categories: Python, CherryPy, WSGI

I played around with this as a potential hack for CherryPy 3. It's WSGI middleware for adding almost-transparent "internal redirect" capabilities to any WSGI application.

My operating theory was that anyone writing a WSGI app that does not already have an internal-redirect feature was probably using HTTP redirects (302, 303, or 307) to do nearly the same thing. This middleware simply waits for a 307 response status and performs the redirection itself within the same request, without informing the user-agent.

This should be OK because 307 responses aren't normally cacheable anyway, and some versions of IE already skip the user confirmation that the spec requires, so the middleware merely duplicates an existing browser bug. I could have used a custom HTTP code like 399, but if that ever leaked out to the UA (because someone forgot to enable the middleware), the UA should treat it as "300 Multiple Choices", which didn't seem like a good fit. At least with 307, the fallback should be appropriate, if not graceful.
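To make the contract concrete, here is a minimal sketch of the kind of application this targets: it knows nothing about the middleware and simply emits an ordinary 307 with a Location header. The names `legacy_app` and `call`, the paths, and the example.com host are all hypothetical, chosen only for illustration.

```python
def legacy_app(environ, start_response):
    """Hypothetical app: /old answers with a plain 307 pointing at /new."""
    if environ.get("PATH_INFO") == "/old":
        location = "http://example.com/new?from=old"
        start_response("307 Temporary Redirect",
                       [("Location", location), ("Content-Length", "0")])
        return [b""]
    body = b"new page"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]


def call(app, path):
    """Call a WSGI app directly and capture status, headers and body.

    Illustration-only helper; a real server supplies a full environ.
    """
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    environ = {"PATH_INFO": path, "SCRIPT_NAME": "", "QUERY_STRING": ""}
    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body
```

Without the middleware, the user-agent receives the 307 and follows it itself. With the middleware in front, the same response is intercepted and the request for /old is answered with whatever /new produces.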

Here's the middleware code, which could probably use some improvements:

"""WSGI middleware which performs "internal" redirection."""

import StringIO
from urlparse import urlparse


class _Redirector(object):

    def __init__(self, nextapp, recursive=False):
        self.nextapp = nextapp
        self.recursive = recursive

        self.location = None
        self.write_proxy = None
        self.status = None
        self.headers = None
        self.exc_info = None

        self.seen_paths = []

    def start_response(self, status, headers, exc_info=None):
        if status[:3] == "307":
            for name, value in headers:
                if name.lower() == "location":
                    self.location = value
                    break
        self.status = status
        self.headers = headers
        self.exc_info = exc_info
        return self.write

    def write(self, data):
        # This is only here for silly apps which call write.
        if self.write_proxy is None:
            self.write_proxy = self.sr(self.status, self.headers, self.exc_info)
        self.write_proxy(data)

    def __call__(self, environ, start_response):
        self.sr = start_response

        nextenv = environ.copy()
        curpath = nextenv['PATH_INFO']
        if nextenv.get('QUERY_STRING'):
            curpath = curpath + "?" + nextenv['QUERY_STRING']
        self.seen_paths.append(curpath)

        while True:
            # Consume the response (in case it's a generator).
            response = [x for x in self.nextapp(nextenv, self.start_response)]

            if self.location is None:
                # No redirection required; complete the response normally.
                # If the app used write(), the headers were already sent.
                if self.write_proxy is None:
                    self.sr(self.status, self.headers, self.exc_info)
                return response

            # Start with a fresh copy of the environ and start altering it.
            nextenv = environ.copy()
            nextenv['REQUEST_METHOD'] = 'GET'
            nextenv['CONTENT_LENGTH'] = '0'
            nextenv['wsgi.input'] = StringIO.StringIO()
            nextenv['redirector.history'] = self.seen_paths[:]

            # "The [Location response-header] field value
            # consists of a single absolute URI."
            (nextenv["wsgi.url_scheme"],
             nextenv["SERVER_NAME"],
             path, params,
             nextenv["QUERY_STRING"], frag) = urlparse(self.location)

            if frag:
                raise ValueError("Illegal #fragment in Location response "
                                 "header %r" % self.location)

            if params:
                path = path + ";" + params

            # Assume 'path' is already unquoted according to
            # http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2
            if path.lower().startswith(environ['SCRIPT_NAME'].lower()):
                nextenv["PATH_INFO"] = path[len(environ['SCRIPT_NAME']):]
            else:
                raise ValueError("Location response header %r does not "
                                 "match current SCRIPT_NAME %r"
                                 % (self.location, environ['SCRIPT_NAME']))

            # Update self.seen_paths and check for recursive calls.
            curpath = nextenv['PATH_INFO']
            if nextenv.get('QUERY_STRING'):
                curpath = curpath + "?" + nextenv['QUERY_STRING']
            if not self.recursive and curpath in self.seen_paths:
                raise RuntimeError("redirector visited the same URL twice: %r"
                                   % curpath)
            else:
                self.seen_paths.append(curpath)

            # Reset self for the next iteration
            self.location = None
            self.write_proxy = None
            self.status = None
            self.headers = None
            self.exc_info = None


def redirector(nextapp, recursive=False):
    """WSGI middleware which performs "internal" redirection.

    Whenever the next application sets a response status of 307 and
    provides a Location response header, this component will not pass
    that response on to the user-agent; instead, it parses the URI
    provided in the Location response header and calls the same
    application again using that URI. The following entries in the
    WSGI environ dict may be modified when redirecting: wsgi.url_scheme,
    SERVER_NAME, PATH_INFO, QUERY_STRING. REQUEST_METHOD is always
    set to 'GET', so any desired parameters must be supplied as
    query string arguments in the Location response header.
    The wsgi.input entry will always be reset to an empty StringIO,
    and CONTENT_LENGTH will be set to 0.

    If 'recursive' is False (the default), each new target URI will be
    checked to see if it has already been visited in the same request;
    if so, a RuntimeError is raised. If 'recursive' is True, no check
    is made and therefore no such errors are raised.
    """
    def redirect_wrapper(environ, start_response):
        ir = _Redirector(nextapp, recursive)
        return ir(environ, start_response)
    return redirect_wrapper
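
As a worked example of the Location handling described in the docstring, the sketch below isolates the parsing step and returns the environ entries a given Location would produce. `environ_updates` is a hypothetical helper, not part of the middleware, and the import fallback is only there so the snippet also runs on Python 3.

```python
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2


def environ_updates(location, script_name=""):
    """Return the environ entries a redirect to `location` would set.

    Hypothetical helper mirroring the parsing logic of the middleware.
    """
    scheme, netloc, path, params, query, frag = urlparse(location)
    if frag:
        raise ValueError("Illegal #fragment in Location %r" % location)
    if params:
        # urlparse splits ";params" off the last path segment; rejoin them.
        path = path + ";" + params
    if not path.lower().startswith(script_name.lower()):
        raise ValueError("%r does not match SCRIPT_NAME %r"
                         % (location, script_name))
    return {
        "wsgi.url_scheme": scheme,
        "SERVER_NAME": netloc,
        "PATH_INFO": path[len(script_name):],
        "QUERY_STRING": query,
        "REQUEST_METHOD": "GET",
        "CONTENT_LENGTH": "0",
    }


updates = environ_updates("http://example.com/app/new;v=1?x=2", "/app")
print(updates["PATH_INFO"], updates["QUERY_STRING"])
```

Two caveats fall out of this. A relative Location such as "/new" parses with an empty scheme and host, so the middleware as posted really does expect an absolute URI. And because the host part is copied verbatim, a Location with an explicit port would put "host:port" into SERVER_NAME.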

4 comments

Comment from: jos [Visitor]

You might want to check out Ian Bicking's WSGIRemote: http://pythonpaste.org/wsgiremote/

I think it solves a similar problem.

11/14/06 @ 07:00
Comment from: fumanchu [Member]

I think you mean paste.recursive? http://pythonpaste.org/module-paste.recursive.html

That is similar; however, it requires the next application to import paste so it can raise a known exception (that the middleware then traps). In CherryPy, at least, it would take more code to pass such an exception out of the app and make sure all the right finalization code is run before the forward occurs; that's all sidestepped neatly by (ab)using 307 (which has to be supported regardless of whether or not you use the middleware).

I also considered informing the middleware via a custom environ entry, but that seemed like it was contrary to WSGI's style, where the intent is that the server writes environ entries, middleware reads and/or changes them, and apps only read them.

11/14/06 @ 09:14
Comment from: Phillip J. Eby [Visitor] · http://dirtsimple.org/

FYI, this middleware is not WSGI compliant, because it consumes the entire body of every response given to it, not merely those that are redirects. This is explicitly forbidden by the following section of PEP 333:

http://www.python.org/dev/peps/pep-0333/#middleware-handling-of-block-boundaries

11/14/06 @ 10:21
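
One way to address the concern raised in the comment above is to decide inside start_response whether a response is an internal redirect, and to pass every other response through lazily. The following is a rough Python 3 sketch under several assumptions, not the code from the post: `streaming_redirector`, `_stream` and `_redirected_environ` are hypothetical names, a simple `max_redirects` cap stands in for the visited-path check, and wsgi.url_scheme and SERVER_NAME are left untouched.

```python
import io
from urllib.parse import urlsplit


def _redirected_environ(environ, location):
    """Build the environ for the internal GET request to `location`."""
    parts = urlsplit(location)
    script_name = environ.get("SCRIPT_NAME", "")
    if not parts.path.startswith(script_name):
        raise ValueError("%r is outside SCRIPT_NAME %r"
                         % (location, script_name))
    nextenv = environ.copy()
    nextenv.update({
        "REQUEST_METHOD": "GET",
        "CONTENT_LENGTH": "0",
        "wsgi.input": io.BytesIO(),
        "PATH_INFO": parts.path[len(script_name):],
        "QUERY_STRING": parts.query,
    })
    return nextenv


def _stream(first, rest, result):
    """Yield chunks one at a time, then close the app's iterable."""
    try:
        if first is not None:
            yield first
        for chunk in rest:
            yield chunk
    finally:
        if hasattr(result, "close"):
            result.close()


def streaming_redirector(app, max_redirects=10):
    """Follow 307 responses internally; stream all other responses."""

    def wrapper(environ, start_response):
        nextenv = environ
        for _ in range(max_redirects + 1):
            state = {"location": None}

            def capture(status, headers, exc_info=None):
                if status[:3] == "307":
                    for name, value in headers:
                        if name.lower() == "location":
                            state["location"] = value
                            # Swallow the redirect; write() data is dropped.
                            return lambda data: None
                # Not an internal redirect: hand straight to the server.
                if exc_info is None:
                    return start_response(status, headers)
                return start_response(status, headers, exc_info)

            result = app(nextenv, capture)
            chunks = iter(result)
            # Pull one chunk so that start_response has been called.
            first = next(chunks, None)
            if state["location"] is None:
                return _stream(first, chunks, result)
            if hasattr(result, "close"):
                result.close()
            nextenv = _redirected_environ(environ, state["location"])
        raise RuntimeError("more than %d internal redirects" % max_redirects)

    return wrapper
```

Only the body of a swallowed 307 is discarded here; a normal response is forwarded chunk by chunk, which is the behaviour the PEP 333 section on block boundaries asks of middleware. Whether that is enough for full compliance in every corner case (for example, apps that never call start_response) has not been verified.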
Comment from: Brisbane SEO Guy [Visitor] · http://www.searchtempo.com

@Phillip: Well, it is a hack, you know.

Great article, thanks for the info!

08/11/10 @ 22:28
