Category: WSGI

Pages: 1 2 >>

09/08/08

Permalink 11:36:06 am, by fumanchu Email , 132 words   English (US)
Categories: IT, CherryPy, WSGI

Resources are concepts for the client

...not objects on the server. Roy Fielding explains yet again:

Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”). These views need not have anything to do with how the information is stored on the server, or even what kind of state it ultimately reflects. It just needs to be understandable (and actionable) by the recipient.

I have found this to be the single most-misunderstood aspect of HTTP. Too many people conceive of URI's as just names for files or database objects. They can be so much more.

10/02/07

Permalink 12:55:58 am, by fumanchu Email , 159 words   English (US)
Categories: CherryPy, WSGI

CherryPy 3 request_queue_size

Well, that was instructive. Leaving server.request_queue_size at the default 5:

C:\Python24\Lib\site-packages>python cherrypy\test\benchmark.py
Starting CherryPy app server...
Started in 1.10800004005 seconds

Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
     10 |      1000 |      0 |  736.81 |    1.357 | 119.36 |
     20 |      1000 |      0 |  436.07 |    2.293 |  70.64 |
     30 |      1000 |      0 |  348.38 |    2.870 |  56.44 |
     40 |      1000 |      0 |  233.10 |    4.290 |  37.76 |
     50 |      1000 |      0 |  296.77 |    3.370 |  48.08 |
Average |    1000.0 |    0.0 | 410.226 |    2.836 | 66.456 |

Client Thread Report (1000 requests, 14 bytes via staticdir, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
     10 |      1000 |      0 |  421.73 |    2.371 |  87.30 |
     20 |      1000 |      0 |  374.87 |    2.668 |  77.60 |
     30 |      1000 |      0 |  306.71 |    3.260 |  63.49 |
     40 |      1000 |      0 |  240.08 |    4.165 |  49.70 |
     50 |      1000 |      0 |  170.03 |    5.881 |  35.20 |
Average |    1000.0 |    0.0 | 302.684 |    3.669 | 62.658 |

Size Report (1000 requests, 50 client threads, 10 server threads):

    bytes | Completed | Failed | req/sec | msec/req |   KB/sec |
       10 |      1000 |      0 |  187.98 |    5.320 |    29.70 |
      100 |      1000 |      0 |  207.45 |    4.820 |    51.45 |
     1000 |      1000 |      0 |  186.89 |    5.351 |   210.81 |
    10000 |      1000 |      0 |  228.12 |    4.384 |  2262.07 |
   100000 |      1000 |      0 |  245.60 |    4.072 | 24022.01 |
100000000 |      1000 |     10 |   20.83 |   48.001 | 20358.12 |

Upping server.request_queue_size to 128:

C:\Python24\Lib\site-packages>python cherrypy\test\benchmark.py
Starting CherryPy app server...
Started in 1.10700011253 seconds

Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req |  KB/sec |
     10 |      1000 |      0 |  745.38 |    1.342 |  120.75 |
     20 |      1000 |      0 |  772.32 |    1.295 |  125.12 |
     30 |      1000 |      0 |  654.11 |    1.529 |  105.97 |
     40 |      1000 |      0 |  929.02 |    1.076 |  150.50 |
     50 |      1000 |      0 |  641.03 |    1.560 |  103.85 |
Average |    1000.0 |    0.0 | 748.372 |   1.3604 | 121.238 |

Client Thread Report (1000 requests, 14 bytes via staticdir, 10 server threads):

threads | Completed | Failed | req/sec | msec/req |  KB/sec |
     10 |      1000 |      0 |  547.89 |    1.825 |  113.41 |
     20 |      1000 |      0 |  588.10 |    1.700 |  121.74 |
     30 |      1000 |      0 |  704.42 |    1.420 |  145.82 |
     40 |      1000 |      0 |  547.89 |    1.825 |  113.41 |
     50 |      1000 |      0 |  516.96 |    1.934 |  107.01 |
Average |    1000.0 |    0.0 | 581.052 |   1.7408 | 120.278 |

Size Report (1000 requests, 50 client threads, 10 server threads):

    bytes | Completed | Failed | req/sec | msec/req |   KB/sec |
       10 |      1000 |      0 |  622.35 |    1.607 |    98.33 |
      100 |      1000 |      0 |  604.74 |    1.654 |   149.37 |
     1000 |      1000 |      0 |  667.74 |    1.498 |   752.54 |
    10000 |      1000 |      0 |  890.31 |    1.123 |  8837.25 |
   100000 |      1000 |      0 |  728.44 |    1.373 | 71247.09 |
100000000 |      1000 |    202 |   12.81 |   78.094 |     None |

07/24/07

Permalink 10:03:20 am, by fumanchu Email , 393 words   English (US)
Categories: CherryPy, WSGI

Please don't use wsgiapp

Gordon Tillman has a wiki page up on how to mix Django content into a CherryPy site. It's easy and probably works, but please don't do it anymore.

We're officially going to deprecate the wsgiapp Tool because 1) it doesn't conform to the WSGI spec (and cannot be fixed to do so), and 2) there's a better way to mix content in a CherryPy site: tree.graft.

The tree.graft(app, script_name) method is the proper way to add Django or other WSGI content to an existing CherryPy site. Instead of nesting the two frameworks, we branch instead. To take Gordon's example, instead of:

class DjangoApp(object):
    _cp_config = {
        'tools.wsgiapp.on': True,
        'tools.wsgiapp.app': AdminMediaHandler(WSGIHandler()),
}
...
cherrypy.tree.mount(DjangoApp(), '/')

You should always write this instead:

cherrypy.tree.graft(AdminMediaHandler(WSGIHandler()), '/')

Look, if you nest the one inside the other, CherryPy's going to do an awful lot of HTTP request parsing that is going to be completely redundant, since Django's going to do it again anyway. And this code is not very fast. Your site is going to crawl. That's strike one for nesting.

Strike two is the "always on" nature of nesting as opposed to branching. When you write your request/response cycle like an onion, every component which could possibly play a part in the request has to be called, even if just to reply "I'm not involved in this one". Given the slowness of Python function calls, this is rarely a good thing. If you thought your site was crawling before... This was a major design flaw of CherryPy 2, and is a major reason CherryPy 3 is 3x faster: the old Filters were called all the time, even if you didn't need them; the new Tools are only called when they're applicable.

Strike three against the nested approach is that it's always easier to traverse a tree of siblings than it is to traverse a nested set; programmers, for some reason, like to hide information from you, including how their site components go together. The branched version will be much easier to reason about, statically analyze, and write inspection tools for.

So please, use tree.graft, and stop using the wsgiapp Tool in CherryPy 3. We're going to formally deprecate it soon.

06/24/07

Permalink 03:22:20 pm, by fumanchu Email , 1679 words   English (US)
Categories: Python, CherryPy, WSGI

Web Site Process Bus

WSGI has enabled an ecosystem where site deployers can, in theory, mix multiple applications from various frameworks into a single web site, served by a single HTTP server. And that's great. But there are several areas where WSGI is purposefully silent, where there is still room for standards-based collaboration:

  • managing WSGI HTTP servers (start/stop/restart)
  • construction of the WSGI component graph (servers -> middlewares -> apps)
  • main process state control (start/stop/restart/graceful)
  • site-wide services (autoreload, thread monitors, site logging)
  • config file formats and parsing for all of the above

Most frameworks address all of the above already, to varying degrees; however, they still tend to do so in a very monolithic manner. Paste is notable for attempting to provide some of them in discrete pieces (especially WSGI graph construction and a config format tailor-made for it).

But I'm going to focus here on just two of these issues: process state and site-wide services. I believe we can separate these two from the rest of the pack and provide a simple, common specification for both, one that's completely implementable in 100 lines of code by any framework.

The problem

One of the largest issues when combining multiple frameworks in a single process is answering the question, "who's in control of the site as a whole?" Multiple frameworks means multiple code bases who all think they should provide:

  • the startup script
  • daemonization
  • dropping privileges
  • PID file management
  • site logging
  • autoreload
  • signal handling
  • sys.exit calls
  • atexit handlers
  • main thread error trapping

...and they often disagree about those behaviors. Throw Apache or lighttpd into the mix and you've got some serious deployment issues.

The typical solution to this is to have each component provide a means of shutting off each process-controlling feature. For example, CherryPy 3 obeys the config entry engine.autoreload_on = False, while django-admin.py takes a --noreload command-line arg. But these are different for each framework, and difficult to coordinate as the number of components grows. Since, for example, only one autoreloader is needed per site, a more usable solution would be to selectively turn on just one instead of turning off all but one.

For a worse example, let's look at handling SIGTERM. Currently, we have the following:

SIGTERM before WSPBus

OK, Django doesn't actually provide a SIGTERM handler, but you get the idea. If several components register a SIGTERM handler, only one of them will "win" by virtue of being the last one to register. And chances are, the winning handler will shut down its component cleanly and then exit the process, leaving other components to fend for themselves.

In fact, there's a whole list of negatives for the monolithic approach to process control and site services:

  1. Frameworks and servers have to provide all desirable site behaviors, or force their packagers/deployers to develop them ad-hoc.
  2. Frameworks and servers all have different API's for changing process state. Race conditions and unpredictable outcomes are common.
  3. Frameworks and servers all have different API's for reacting to process state changes. Resource acquisition and cleanup becomes a huge unknown.
  4. Frameworks and servers have to know they're being deployed alongside other frameworks and servers.

We could attempt to solve this with a Grand Unified Site Container, but that would most likely:

  1. force a single daemon implementation, thus eliminating innovation in process invocation,
  2. force a single configuration syntax, thus denying any market over declaration styles,
  3. force a static set of site services, limiting any improvements in process interaction,
  4. add an additional dependency to every framework,
  5. deny using HTTP servers like Apache and lighttpd in the same process (since they do their own process control), and
  6. be a dumping-ground for every other aspect of web development, from databases to templating.

A solution: the Web Site Process Bus

The Web Site Process Bus uses a simple publish/subscribe architecture to loosely connect WSGI components with site services. Here's our SIGTERM example, implemented with a WSPBus:

SIGTERM after WSPBus

The singleton Bus object does three things:

  1. It models server-availability state via a "state" attribute, which is a sentinel value from the set: (STARTING, STARTED, STOPPING, STOPPED).
  2. It possesses methods to change the state, such as "start", "stop", "restart", "graceful", and "exit".
  3. It possesses "publish" and "subscribe"/"unsubscribe" methods for named channels.

Each method which changes the state also has an equivalent named channel. Any framework, server, or other component may register code as a listener on any channel. For example, a web framework can register database-connection code to be run when the "start" method is called, and disconnection code for the "stop" method:

bus.subscribe("start", orm.connpool.start)
bus.subscribe("stop", orm.connpool.stop)

Any channel which has no listeners will simply ignore all published messages. This allows component code to be much simpler; callers do not need to know whether their actions are appropriate--they are appropriate if a listener is subscribed to that channel.

In addition to the builtin state-transition channels, components are free to define their own pub/sub channels. CherryPy's current implementation, for example, defines the additional channels start_thread and stop_thread, and registers channels for signals, such as "SIGTERM", "SIGHUP", and "SIGUSR1" (which then typically call bus methods like "restart" and "exit"). Some of these could be standardized. Other custom channels would be more naturally tightly-coupled, requiring awareness on the part of callers and callees.

Since WSPB state-changing method calls are expected to be sporadic, and often fundamentally serial (e.g., "autoreload"), their execution is synchronous. Subscribers (mostly of custom channels), however, are free to return immediately, and continue their operation asynchronously.

Benefits

The WSPB cleanly solves all of the problems outlined above. The various components are no longer in competition over process state; instead, there is a single race-free state machine. However, no single component has to know whether or how many other components are deployed in the same site.

Frameworks and servers can provide a subset of all site services, with a common, imperative-Python API for deployers to add or substitute their own. However, the WSPB doesn't define a config syntax, so each framework can continue to provide its own unique layer to translate config into that API. A deployer of a combined Pylons/Zope website could choose a Pylons startup script and config syntax to manage the lifecycle of the Zope components.

The WSPB doesn't try to instantiate or compose WSGI components (server -> middleware -> app) either. So there's even room for site daemons which provide no traditional web app functionality; instead, they specialize in providing tools to compose WSGI component graphs via a config file or even a GUI.

It also "plays nice" with mod_python, mod_proxy, mod_wsgi, FastCGI, and SCGI. Those who develop WSGI gateways for these will have a clear incentive to consolidate their ad-hoc startup and shutdown models into the WSPB. For example, a modpython gateway can use apache.register_cleanup to just call bus.stop() instead of providing custom cleanup-declaration code.

Best of all, the WSPB can be defined as a specification which any framework can provide in a small amount of code. Rather than attempt to draft the specification here (that can be hashed out on Web-SIG, since this is by no means complete), I'm just going to provide an example:

try:
    set
except NameError:
    from sets import Set as set
import sys
import threading
import time
import traceback as _traceback


# Use a flag to indicate the state of the bus.
class _StateEnum(object):
    class State(object):
        pass
states = _StateEnum()
states.STOPPED = states.State()
states.STARTING = states.State()
states.STARTED = states.State()
states.STOPPING = states.State()


class Bus(object):
    """Process state-machine and messenger for HTTP site deployment."""

    states = states
    state = states.STOPPED

    def __init__(self):
        self.state = states.STOPPED
        self.listeners = dict([(channel, set()) for channel
                               in ('start', 'stop', 'exit',
                                   'restart', 'graceful', 'log')])
        self._priorities = {}

    def subscribe(self, channel, callback, priority=None):
        """Add the given callback at the given channel (if not present)."""
        if channel not in self.listeners:
            self.listeners[channel] = set()
        self.listeners[channel].add(callback)

        if priority is None:
            priority = getattr(callback, 'priority', 50)
        self._priorities[(channel, callback)] = priority

    def unsubscribe(self, channel, callback):
        """Discard the given callback (if present)."""
        listeners = self.listeners.get(channel)
        if listeners and callback in listeners:
            listeners.discard(callback)
            del self._priorities[(channel, callback)]

    def publish(self, channel, *args, **kwargs):
        """Return output of all subscribers for the given channel."""
        if channel not in self.listeners:
            return []

        exc = None
        output = []

        items = [(self._priorities[(channel, listener)], listener)
                 for listener in self.listeners[channel]]
        items.sort()
        for priority, listener in items:
            # All listeners for a given channel are guaranteed to run even
            # if others at the same channel fail. We will still log the
            # failure, but proceed on to the next listener. The only way
            # to stop all processing from one of these listeners is to
            # raise SystemExit and stop the whole server.
            try:
                output.append(listener(*args, **kwargs))
            except (KeyboardInterrupt, SystemExit):
                raise
            except:
                self.log("Error in %r listener %r" % (channel, listener),
                         traceback=True)
                exc = sys.exc_info()[1]
        if exc:
            raise
        return output

    def start(self):
        """Start all services."""
        self.state = states.STARTING
        self.log('Bus starting')
        self.publish('start')
        self.state = states.STARTED

    def restart(self):
        """Restart the process (may close connections)."""
        self.stop()

        self.log('Bus restart')
        self.publish('restart')

    def graceful(self):
        """Advise all services to reload."""
        self.log('Bus graceful')
        self.publish('graceful')

    def block(self, state=states.STOPPED, interval=0.1):
        """Wait for the given state, KeyboardInterrupt or SystemExit."""
        try:
            while self.state != state:
                time.sleep(interval)
        except (KeyboardInterrupt, IOError):
            # The time.sleep call might raise
            # "IOError: [Errno 4] Interrupted function call" on KBInt.
            self.log('Keyboard Interrupt: shutting down bus')
            self.stop()
        except SystemExit:
            self.log('SystemExit raised: shutting down bus')
            self.stop()
            raise

    def stop(self):
        """Stop all services."""
        self.state = states.STOPPING
        self.log('Bus stopping')
        self.publish('stop')
        self.state = states.STOPPED

    def exit(self, status=0):
        """Stop all services and exit the process."""
        self.stop()

        self.log('Bus exit')
        self.publish('exit')
        sys.exit(status)

    def log(self, msg="", traceback=False):
        if traceback:
            exc = sys.exc_info()
            msg += "\n" + "".join(_traceback.format_exception(*exc))
        self.publish('log', msg)

02/25/07

Permalink 02:21:23 pm, by admin Email , 572 words   English (US)
Categories: Python, CherryPy, WSGI

PyCon 2007 and CherryPy

PyCon 2007 is nearing a close; here are some notes on how it affected CherryPy:

Web application deployment

Chad Whitacre (author of Aspen) herded several cats into a room on Sunday and forced us to discuss the various issues surrounding Python web application deployment. This is hinted at in the WSGI spec:

Finally, it should be mentioned that the current version of WSGI does not prescribe any particular mechanism for "deploying" an application for use with a web server or server gateway. At the present time, this is necessarily implementation-defined by the server or gateway. After a sufficient number of servers and frameworks have implemented WSGI to provide field experience with varying deployment requirements, it may make sense to create another PEP, describing a deployment standard for WSGI servers and application frameworks.

There were three basic realms where the participants agreed we could try to collaborate/standardize:

  1. Process control: stop, start, restart, daemonization, signal handling, socket re-use, drop privileges, etc. If you're familiar with CherryPy 3, you'll recognize this list as 95% of the current cherrypy.engine object. The CherryPy team has already been discussing ways of breaking up the Engine object; this may facilitate that (and vice-versa). Joseph Tate volunteered to look at socket re-use issues specifically, but the general consensus seemed to be that much of this would be hashed out on Web-SIG.

  2. WSGI stack composition: Jim Fulton proposed that we could all agree on Paste Deploy (at least a good portion of the API) to manage this in a cross-framework manner. Most heads nodded, "yes". Jim also proposed that each of the framework authors take the next week to refamiliarize themselves with Deploy, and then start pestering Ian Bicking with specific API issues. Ian suggested that he should fork Paste Deploy into another project specifically for this. For CherryPy, this would first mean offering standard egg entry points. [Personally, I'd like to standardize on a pure-Python API for deploy, not a config file format API. In other words, make the config file format optional, so that users of CP-only apps could avoid having to learn a distinct config file format for deployment. It should be possible to transform various config file formats into the same Python object(s).]

  3. Benchmarks: Jim also suggested we create a standard WSGI HTTP server benchmark suite, with various test applications and concurrency scenarios. This would compare various WSGI HTTP servers, as opposed to CherryPy's existing benchmark suite which compares successive versions of the full CP stack. Ian volunteered to begin work on that project (with the expectation that others would contribute substantial use cases, etc).

Others who were present for at least a portion of the long discussion: me, Mark Ramm, Kevin Dangoor, Ben Bangert, Jonathan Ellis, Matt Good, Brian Beck, and Calvin Hendryx-Parker.

WSGI middleware authoring

After some discussion with Mark (and he with Ian and Ben), we agreed that CherryPy could do more in the WSGI-middleware-authoring department. There is a continuous pressure to simply re-use or fix up the existing CherryPy request object to fill this need; however, there are some fundamental problems with that approach (such as the use of threadlocals to manage context, and the difficulty of streaming WSGI output through a CherryPy app). At the moment, I'm leaning toward adding a new API to CherryPy which would be similar to the application API, but specifically targeted at middleware authoring.

11/13/06

Permalink 10:57:51 pm, by fumanchu Email , 745 words   English (US)
Categories: Python, CherryPy, WSGI

Internal Redirect WSGI middleware

I played around with this as a potential hack for CherryPy 3. It's WSGI middleware for adding almost-transparent "internal redirect" capabilities to any WSGI application.

My operating theory was that anyone writing a WSGI app that does not already have an internal-redirect feature was probably using HTTP redirects (302, 303, or 307) to do nearly the same thing. This middleware simply waits for a 307 response status and performs the redirection itself within the same request, without informing the user-agent.

This should be OK because 307 isn't normally cacheable anyway, and some versions of IE don't bother to ask the user as the spec requires already, so it just duplicates an existing browser bug. I could have used a custom HTTP code like 399, but if that ever leaked out to the UA (because someone forgot to enable the middleware) then the UA should fall back to "300 Multiple Choices", which didn't seem like a good fit. At least by using 307, the fallback should be appropriate, if not graceful.

Here's the code, which could probably use some improvements:

"""WSGI middleware which performs "internal" redirection."""

import StringIO


class _Redirector(object):

    def __init__(self, nextapp, recursive=False):
        self.nextapp = nextapp
        self.recursive = recursive

        self.location = None
        self.write_proxy = None
        self.status = None
        self.headers = None
        self.exc_info = None

        self.seen_paths = []

    def start_response(self, status, headers, exc_info):
        if status[:3] == "307":
            for name, value in headers:
                if name.lower() == "location":
                    self.location = value
                    break
        self.status = status
        self.headers = headers
        self.exc_info = exc_info
        return self.write

    def write(self, data):
        # This is only here for silly apps which call write.
        if self.write_proxy is None:
            self.write_proxy = self.sr(self.status, self.headers, self.exc_info)
        self.write_proxy(data)

    def __call__(self, environ, start_response):
        self.sr = start_response

        nextenv = environ.copy()
        curpath = nextenv['PATH_INFO']
        if nextenv.get('QUERY_STRING'):
            curpath = curpath + "?" + nextenv['QUERY_STRING']
        self.seen_paths.append(curpath)

        while True:
            # Consume the response (in case it's a generator).
            response = [x for x in self.nextapp(nextenv, self.start_response)]

            if self.location is None:
                # No redirection required; complete the response normally.
                self.sr(self.status, self.headers, self.exc_info)
                return response

            # Start with a fresh copy of the environ and start altering it.
            nextenv = environ.copy()
            nextenv['REQUEST_METHOD'] = 'GET'
            nextenv['CONTENT_LENGTH'] = '0'
            nextenv['wsgi.input'] = StringIO.StringIO()
            nextenv['redirector.history'] = self.seen_paths[:]

            # "The [Location response-header] field value
            # consists of a single absolute URI."
            (nextenv["wsgi.url_scheme"],
             nextenv["SERVER_NAME"],
             path, params,
             nextenv["QUERY_STRING"], frag) = urlparse(self.location)

            if frag:
                raise ValueError("Illegal #fragment in Location response "
                                 "header %r" % self.location)

            if params:
                path = path + ";" + params

            # Assume 'path' is already unquoted according to
            # <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2">http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2</a>
            if path.lower().startswith(environ['SCRIPT_NAME'].lower()):
                nextenv["PATH_INFO"] = path[len(environ['SCRIPT_NAME']):]
            else:
                raise ValueError("Location response header %r does not "
                                 "match current SCRIPT_NAME %r"
                                 % (self.location, environ['SCRIPT_NAME']))

            # Update self.seen_paths and check for recursive calls.
            curpath = nextenv['PATH_INFO']
            if nextenv.get('QUERY_STRING'):
                curpath = curpath + "?" + nextenv['QUERY_STRING']
            if curpath in self.seen_paths:
                raise RuntimeError("redirector visited the same URL twice: %r"
                                   % curpath)
            else:
                self.seen_paths.append(curpath)

            # Reset self for the next iteration
            self.location = None
            self.write_proxy = None
            self.status = None
            self.headers = None
            self.exc_info = None


def redirector(nextapp, recursive=False):
    """WSGI middleware which performs "internal" redirection.

    Whenever the next application sets a response status of 307 and
    provides a Location response header, this component will not pass
    that response on to the user-agent; instead, it parses the URI
    provided in the Location response header and calls the same
    application again using that URI. The following entries in the
    WSGI environ dict may be modified when redirecting: wsgi.url_scheme,
    SERVER_NAME, PATH_INFO, QUERY_STRING. REQUEST_METHOD is always
    set to 'GET', so any desired parameters must be supplied as
    query string arguments in the Location response header.
    The wsgi.input entry will always be reset to an empty StringIO,
    and CONTENT_LENGTH will be set to 0.

    If 'recursive' is False (the default), each new target URI will be
    checked to see if it has already been visited in the same request;
    if so, a RuntimeError is raised. If 'recursive' is True, no check
    is made and therefore no such errors are raised.
    """
    def redirect_wrapper(environ, start_response):
        ir = _Redirector(nextapp, recursive)
        return ir(environ, start_response)
    return redirect_wrapper

12/05/05

Permalink 11:23:32 am, by fumanchu Email , 103 words   English (US)
Categories: WSGI

WSGI gateway for mod_python status fix

After much woe, I think I finally tracked down the status problems I was having with modpython_gateway.py (which is now available on my "misc" Trac site). It should now correctly handle redirects, 404's, and .css and .js content. I think it also fixed my earlier "delayed content" problem.

I hereby nominate mod_python's status API for the "One Obvious Way To Do It" booby prize. Having req.status, a return value/status, and the option to raise a status makes far too many combinations.

11/06/05

Permalink 01:18:00 am, by fumanchu Email , 879 words   English (US)
Categories: Python, WSGI

WSGI wrapper for mod_python

Update Nov 6, 2005: Finally got it to work with Apache2-prefork on Unix (it only worked on mpm_winnt until now).

Update Oct 25, 2005: I was having a problem setting up a new install of my CherryPy application, using this recipe. It turned out that I didn't have the right interpreter_name in my PythonImport directive:

PythonImport module interpreter_name

Therefore, the CherryPy server started in a different intepreter than the one being used for the requests. It must exactly match the value of req.interpreter, and is case-sensitive. I've updated the code with comments to that effect (just to have it all in one place).

Update Aug 11, 2005: I was having a problem serving .css and .js pages. CherryPy's standalone WSGI server did fine, but mod_python did not. I finally tracked it down to the fact that I was both setting apache's req.status and returning req.status from the handler. Funky. It worked when I chose to simply return the value, and not set it.

Update June 5, 2005:

  1. Pages take forever to terminate when returning a status of 200--apache.OK must be returned instead in that case.

  2. Added code for hotshot profiling.

  3. Added code for using paste.lint

As I mentioned I was doing last week, I wrote a more complete WSGI wrapper for modpython. Here it is. Feedback welcome. Phil Eby told me he'd like a mod_python wrapper for inclusion in wsgiref; he should feel free to use this one however he sees fit. ;)


"""
WSGI wrapper for mod_python. Requires Python 2.2 or greater.


Example httpd.conf section for a CherryPy app called "mcontrol":

<Directory D:\htdocs\mcontrol>
    SetHandler python-program
    PythonHandler wsgiref.modpy_wrapper::handler
    PythonOption application cherrypy.wsgiapp::wsgiApp
    PythonOption import mcontrol.cherry::startup
</Directory>

"""

import sys
from mod_python import apache
from wsgiref.handlers import BaseCGIHandler


class InputWrapper(object):
    
    def __init__(self, req):
        self.req = req
    
    def close(self):
        pass
    
    def read(self, size=-1):
        return self.req.read(size)
    
    def readline(self):
        return self.req.readline()
    
    def readlines(self, hint=-1):
        return self.req.readlines(hint)
    
    def __iter__(self):
        line = self.readline()
        while line:
            yield line
            # Notice this won't prefetch the next line; it only
            # gets called if the generator is resumed.
            line = self.readline()


class ErrorWrapper(object):
    
    def __init__(self, req):
        self.req = req
    
    def flush(self):
        pass
    
    def write(self, msg):
        self.req.log_error(msg)
    
    def writelines(self, seq):
        self.write(''.join(seq))


bad_value = ("You must provide a PythonOption '%s', either 'on' or 'off', "
             "when running a version of mod_python < 3.1")

class Handler(BaseCGIHandler):
    
    def __init__(self, req):
        options = req.get_options()
        
        # Threading and forking
        try:
            q = apache.mpm_query
        except AttributeError:
            threaded = options.get('multithread', '').lower()
            if threaded == 'on':
                threaded = True
            elif threaded == 'off':
                threaded = False
            else:
                raise ValueError(bad_value % "multithread")
            
            forked = options.get('multiprocess', '').lower()
            if forked == 'on':
                forked = True
            elif forked == 'off':
                forked = False
            else:
                raise ValueError(bad_value % "multiprocess")
        else:
            threaded = q(apache.AP_MPMQ_IS_THREADED)
            forked = q(apache.AP_MPMQ_IS_FORKED)
        
        env = dict(apache.build_cgi_env(req))
        
        if req.headers_in.has_key("authorization"):
            env["HTTP_AUTHORIZATION"] = req.headers_in["authorization"]
        
        BaseCGIHandler.__init__(self,
                                stdin=InputWrapper(req),
                                stdout=None,
                                stderr=ErrorWrapper(req),
                                environ=env,
                                multiprocess=forked,
                                multithread=threaded
                                )
        self.request = req
        self._write = req.write
    
    def _flush(self):
        pass
    
    def send_headers(self):
        self.cleanup_headers()
        self.headers_sent = True
        # Can't just return 200 or the page will hang until timeout
        s = int(self.status[:3])
        if s == 200:
            self.finalstatus = apache.OK
        else:
            self.finalstatus = s
        # the headers.Headers class doesn't have an iteritems method...
        for key, val in self.headers.items():
            if key.lower() == 'content-length':
                if val is not None:
                    self.request.set_content_length(int(val))
            elif key.lower() == 'content-type':
                self.request.content_type = val
            else:
                self.request.headers_out[key] = val


_counter = 0

def profile(req):
    # Call this function instead of handler
    # to get profiling data for each call.
    import hotshot, os.path
    ppath = os.path.dirname(__file__)
    if not os.path.exists(ppath):
        os.makedirs(ppath)
    global _counter
    _counter += 1
    ppath = os.path.join(ppath, "cp_%s.prof" % _counter)
    prof = hotshot.Profile(ppath)
    result = prof.runcall(handler, req)
    prof.close()
    return result

def handler(req):
    config = req.get_config()
    debug = int(config.get("PythonDebug", 0))
    
    options = req.get_options()
    
    # Because PythonImport cannot be specified per Directory or Location,
    # take any 'import' PythonOption's and import them. If a function name
    # in that module is provided (after the "::"), it will be called with
    # the request as an argument. The module and function, if any, should
    # be re-entrant (i.e., handle multiple threads), and, since they will
    # be called per request, must be designed to run setup code only on the
    # first request (a global 'first_request' flag is usually enough).
    import_opt = options.get('import')
    if import_opt:
        atoms = import_opt.split('::', 1)
        modname = atoms.pop(0)
        module = __import__(modname, globals(), locals(), [''])
        if atoms:
            func = getattr(module, atoms[0])
            func(req)
    
    # Import the wsgi 'application' callable and pass it to Handler.run
    modname, objname = options['application'].split('::', 1)
    module = __import__(modname, globals(), locals(), [''])
    app = getattr(module, objname)
    
    h = Handler(req)
##    from paste import lint
##    app = lint.middleware(app)
    h.run(app)
    
    # finalstatus was set in Handler.send_headers()
    return h.finalstatus

10/20/05

Permalink 09:00:23 pm, by fumanchu Email , 332 words   English (US)
Categories: Python, CherryPy, WSGI

Is CherryPy a web framework?

Guido van Rossum recently wrote:

Python, in its design philosophy, tries hard not to be a framework. (This sets it apart from Java, which is hostile to non-Java code.) Python tries to be helpful when you want to solve part of your problem using a different tool. It tries to work well even if Python is only a small part of your total solution. It tries to be agnostic of platform-specific frameworks, optionally working with them (e.g. fork and pipes on Unix) but not depending or relying on them. Even threads are quite optional to Python.

Oddly enough, this is how I feel about CherryPy, that it tries hard not to be a framework. It tries to be helpful, recognizing that it's most likely only part of your solution. It tries to be agnostic of templating and persistence systems, and has little to say about markup languages, content-types, site architecture, or RPC formats.

Guido was responding to Phillip J. Eby, who wrote:

A Pythonic framework shouldn't load you down with new management burdens and keep you from using other frameworks. It should make life easier, and make your code more interoperable, not less. Indeed, I've pretty much come to agreement with the part of the Python developer community that has says Frameworks Are Evil.

Not wanting to be Evil, I've tried to make CherryPy 2.1 a system which doesn't load you down with new management burdens. Instead, it exposes the functionality of HTTP by presenting it in a Pythonic way. I and many others think it makes life easier—CherryPy appears to have an underscore-shaped learning curve. ;) And as for interoperability, CherryPy was one of the first Python web-application servers to grow a WSGI interface.

I suppose that CherryPy will always have to bear the moniker of "framework", if only because it calls your code, instead of the other way around. But let's keep it a Pythonic framework as long as we can.

08/16/05

Permalink 05:17:39 pm, by fumanchu Email , 662 words   English (US)
Categories: CherryPy, WSGI

Funny how people only goggle over the baby

Simon Willison recently wrote a description of Django's request-handling mechanism. Here's a quick comparison with CherryPy:

When Django receives a request, the first thing it does is create an HttpRequest object (or subclass there-of) to represent that request.

CherryPy has a Request object, as well. However, it's purely an internal object; it doesn't get passed around to application code. One of the design points of CherryPy is that it allows you to write (at least a majority of) your code "like any other app"; this means that input arrives as "simple data" via function parameters, and you use the "return" statement to output data, not custom HTTP-framework objects. Point in favor of CP, IMO.

Once the object has been created, Django performs URL resolution. This is a process by which the URL specified in the request is used to select a view function to handle the creation of a response. A trivial Django application is simply one or more view functions and a configuration file that maps those functions to URLs.

Like almost every other web framework. ;) The only difference from CherryPy is that CP specifies the mapping in code, not config files. Another point to CP.

Having resolved the URL to a view, the view function is called with the request object as the first argument. Other keyword arguments may be passed as well depending on the URL configuration; see the documentation for details.

See above; CherryPy is flatter, and tends to pass data, not internal objects.

The view function is where the bulk of the work happens: it is here that database queries are made, templates loaded, HTML is generated and an HttpResponse object encapsulating the result is created. The view function returns this object, which is then passed back to the environment-specific code (mod_python or WSGI) which passes it back to the browser as an HTTP response.

Again, CherryPy is flatter, expecting you to return data, not objects. You can return a string, an iterable of strings, a file, or None, or yield any of those. Point.

This is all pretty straightforward stuff - but I skipped a couple of important details: exceptions and middleware. The view function doesn't have to return an HttpResponse; it can raise an exception instead, the most common varieties being Http404 (for file-not-found) or Http500 (for server error). In development servers these exceptions will be formatted and sent back to the browser, while in production mode they will be silently logged and a "friendly" error message displayed.

CherryPy also has user-raisable exceptions; however, they're not so low-level. Instead of Http404, you raise cherrypy.NotFound. Instead of Http3xx, you raise cherrypy.HTTPRedirect. I prefer CP's style, of course, but I don't think it's a clear "winner" over Httpxxx exceptions.

Middleware is even more interesting. Django provides three hooks in the above sequence where middleware classes can intervene, with the middleware classes to be used defined in the site's configuration file. This results in three types of middleware: request, view and response (although one middleware class can apply for more than one hook).

CherryPy has 7 such hooks; two are for errors, so let's call it 5 for a more-reasonable comparison. But see my previous post on why static hook points may not be the best approach. Still, 5 is better than 3 :). Point.

The bulk of the above code can be found in the call method of the ModPythonHandler class and the get_response method of the BaseHandler class.

That sounds like an unfortunate violation of the DRY principle. CherryPy isolates all of that nicely via the server.request() function. Are we keeping score yet?

As Django is not yet at a 1.0 release, the above is all subject to potential refactoring future change.

I can't wait to see Django 1.0! Until then, I'm going to take our adolescent web framework and go sulk in my room. ;)

1 2 >>

September 2017
Sun Mon Tue Wed Thu Fri Sat
 << <   > >>
          1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30

Search

The requested Blog doesn't exist any more!

XML Feeds

free blog software