
Web Site Process Bus

06/24/07

03:22:20 pm, by fumanchu, 1679 words
Categories: Python, CherryPy, WSGI

WSGI has enabled an ecosystem where site deployers can, in theory, mix multiple applications from various frameworks into a single web site, served by a single HTTP server. And that's great. But there are several areas where WSGI is purposefully silent, where there is still room for standards-based collaboration:

  • managing WSGI HTTP servers (start/stop/restart)
  • construction of the WSGI component graph (servers -> middlewares -> apps)
  • main process state control (start/stop/restart/graceful)
  • site-wide services (autoreload, thread monitors, site logging)
  • config file formats and parsing for all of the above

Most frameworks address all of the above already, to varying degrees; however, they still tend to do so in a very monolithic manner. Paste is notable for attempting to provide some of them in discrete pieces (especially WSGI graph construction and a config format tailor-made for it).

But I'm going to focus here on just two of these issues: process state and site-wide services. I believe we can separate these two from the rest of the pack and provide a simple, common specification for both, one that's completely implementable in 100 lines of code by any framework.

The problem

One of the largest issues when combining multiple frameworks in a single process is answering the question, "who's in control of the site as a whole?" Multiple frameworks mean multiple code bases, each of which thinks it should provide:

  • the startup script
  • daemonization
  • dropping privileges
  • PID file management
  • site logging
  • autoreload
  • signal handling
  • sys.exit calls
  • atexit handlers
  • main thread error trapping

...and they often disagree about those behaviors. Throw Apache or lighttpd into the mix and you've got some serious deployment issues.

The typical solution to this is to have each component provide a means of shutting off each process-controlling feature. For example, CherryPy 3 obeys the config entry engine.autoreload_on = False, while django-admin.py takes a --noreload command-line arg. But these are different for each framework, and difficult to coordinate as the number of components grows. Since, for example, only one autoreloader is needed per site, a more usable solution would be to selectively turn on just one instead of turning off all but one.

For a worse example, let's look at handling SIGTERM. Currently, we have the following:

[Figure: SIGTERM before WSPBus]

OK, Django doesn't actually provide a SIGTERM handler, but you get the idea. If several components register a SIGTERM handler, only one of them will "win" by virtue of being the last one to register. And chances are, the winning handler will shut down its component cleanly and then exit the process, leaving other components to fend for themselves.
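
The "last one wins" behavior is easy to reproduce with the standard library's signal module. The following is only a sketch: the two handler functions are hypothetical stand-ins for handlers that two frameworks might install, not code from any real framework, and it has to run in the main thread.

```python
import signal


def framework_a_handler(signum, frame):
    """Hypothetical: framework A would shut itself down here."""


def framework_b_handler(signum, frame):
    """Hypothetical: framework B would shut itself down, then exit."""


original = signal.getsignal(signal.SIGTERM)
try:
    # Each framework installs its own handler at startup.
    signal.signal(signal.SIGTERM, framework_a_handler)
    previous = signal.signal(signal.SIGTERM, framework_b_handler)

    # signal.signal() returns the handler it replaced. Unless B chains
    # to it explicitly, A's handler is never called again.
    replaced_a = previous is framework_a_handler
    active_is_b = signal.getsignal(signal.SIGTERM) is framework_b_handler
finally:
    # Put back whatever was installed before the demonstration.
    signal.signal(signal.SIGTERM,
                  original if original is not None else signal.SIG_DFL)
```

Both flags end up true: the second registration silently displaces the first, which is exactly the coordination problem described above.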

In fact, there's a whole list of negatives for the monolithic approach to process control and site services:

  1. Frameworks and servers have to provide all desirable site behaviors, or force their packagers/deployers to develop them ad-hoc.
  2. Frameworks and servers all have different APIs for changing process state. Race conditions and unpredictable outcomes are common.
  3. Frameworks and servers all have different APIs for reacting to process state changes. Resource acquisition and cleanup become a huge unknown.
  4. Frameworks and servers have to know they're being deployed alongside other frameworks and servers.

We could attempt to solve this with a Grand Unified Site Container, but that would most likely:

  1. force a single daemon implementation, thus eliminating innovation in process invocation,
  2. force a single configuration syntax, thus denying any market over declaration styles,
  3. force a static set of site services, limiting any improvements in process interaction,
  4. add an additional dependency to every framework,
  5. deny using HTTP servers like Apache and lighttpd in the same process (since they do their own process control), and
  6. be a dumping-ground for every other aspect of web development, from databases to templating.

A solution: the Web Site Process Bus

The Web Site Process Bus uses a simple publish/subscribe architecture to loosely connect WSGI components with site services. Here's our SIGTERM example, implemented with a WSPBus:

[Figure: SIGTERM after WSPBus]

The singleton Bus object does three things:

  1. It models server-availability state via a "state" attribute, which is a sentinel value from the set: (STARTING, STARTED, STOPPING, STOPPED).
  2. It possesses methods to change the state, such as "start", "stop", "restart", "graceful", and "exit".
  3. It possesses "publish" and "subscribe"/"unsubscribe" methods for named channels.

Each method which changes the state also has an equivalent named channel. Any framework, server, or other component may register code as a listener on any channel. For example, a web framework can register database-connection code to be run when the "start" method is called, and disconnection code for the "stop" method:

bus.subscribe("start", orm.connpool.start)
bus.subscribe("stop", orm.connpool.stop)

Any channel which has no listeners will simply ignore all published messages. This allows component code to be much simpler; callers do not need to know whether their actions are appropriate--they are appropriate if a listener is subscribed to that channel.
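
To make that concrete, here is a minimal sketch of the publish/subscribe core. MiniBus is a stripped-down stand-in written for this illustration, and FakePool is a hypothetical connection pool; neither is part of any framework.

```python
class MiniBus(object):
    """Stripped-down stand-in for the Bus, for illustration only."""

    def __init__(self):
        self.listeners = {}

    def subscribe(self, channel, callback):
        self.listeners.setdefault(channel, set()).add(callback)

    def publish(self, channel, *args, **kwargs):
        # A channel nobody subscribed to simply yields an empty list.
        return [listener(*args, **kwargs)
                for listener in self.listeners.get(channel, ())]


class FakePool(object):
    """Hypothetical connection pool exposing start() and stop()."""

    def __init__(self):
        self.connected = False

    def start(self):
        self.connected = True

    def stop(self):
        self.connected = False


bus = MiniBus()
pool = FakePool()
bus.subscribe("start", pool.start)
bus.subscribe("stop", pool.stop)

bus.publish("start")
connected_after_start = pool.connected

# Nobody listens on "graceful", so publishing to it is a harmless no-op.
unheard = bus.publish("graceful")

bus.publish("stop")
```

The caller of publish("graceful") never has to check whether any component cares about graceful reloads; the empty result is the whole story.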

In addition to the builtin state-transition channels, components are free to define their own pub/sub channels. CherryPy's current implementation, for example, defines the additional channels start_thread and stop_thread, and registers channels for signals, such as "SIGTERM", "SIGHUP", and "SIGUSR1" (which then typically call bus methods like "restart" and "exit"). Some of these could be standardized. Other custom channels would be more naturally tightly-coupled, requiring awareness on the part of callers and callees.
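
A signal channel could be wired up roughly as follows. This is a sketch under assumptions, not CherryPy's actual implementation: MiniBus and publish_signal are hypothetical names, the two "stop" listeners stand in for real components, and the "SIGTERM" channel calls stop() rather than exit() so that the example does not terminate the interpreter.

```python
import signal


class MiniBus(object):
    """Stripped-down stand-in for the Bus, for illustration only."""

    def __init__(self):
        self.listeners = {}

    def subscribe(self, channel, callback):
        self.listeners.setdefault(channel, []).append(callback)

    def publish(self, channel, *args, **kwargs):
        return [listener(*args, **kwargs)
                for listener in self.listeners.get(channel, [])]

    def stop(self):
        self.publish("stop")


def publish_signal(bus, name):
    """Install a handler that republishes the named signal on the bus.

    Returns the previously installed handler so callers can restore it.
    """
    def handler(signum, frame):
        bus.publish(name)
    return signal.signal(getattr(signal, name), handler)


bus = MiniBus()
shutdown_log = []

# Two hypothetical components, each cleaning up after itself.
bus.subscribe("stop", lambda: shutdown_log.append("framework A closed"))
bus.subscribe("stop", lambda: shutdown_log.append("framework B closed"))

# The signal channel drives a bus method, not component code.
bus.subscribe("SIGTERM", bus.stop)

previous = publish_signal(bus, "SIGTERM")
try:
    # Simulate delivery by calling the installed handler directly.
    signal.getsignal(signal.SIGTERM)(signal.SIGTERM, None)
finally:
    signal.signal(signal.SIGTERM,
                  previous if previous is not None else signal.SIG_DFL)
```

Only one handler is ever registered with the operating system, yet both components get to clean up, because they listen for the state change rather than for the signal.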

Since WSPB state-changing method calls are expected to be sporadic, and often fundamentally serial (e.g., "autoreload"), their execution is synchronous. Subscribers (mostly of custom channels), however, are free to return immediately, and continue their operation asynchronously.
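
One way a subscriber might "return immediately" is to hand its work to a thread. The sketch below assumes a hypothetical custom channel named "flush" and a hypothetical slow task; the module-level subscribe and publish functions are stand-ins for the bus methods.

```python
import threading

listeners = {}


def subscribe(channel, callback):
    listeners.setdefault(channel, []).append(callback)


def publish(channel, *args):
    # Synchronous: each listener is called in turn, in this thread.
    return [callback(*args) for callback in listeners.get(channel, [])]


release = threading.Event()   # lets the example control the slow task
done = threading.Event()      # set once the slow task has finished


def flush_cache():
    """Hypothetical slow cleanup; waits until the example releases it."""
    release.wait(timeout=5)
    done.set()


def flush_cache_async():
    """Listener that starts the work and returns without waiting."""
    worker = threading.Thread(target=flush_cache)
    worker.daemon = True
    worker.start()
    return worker


subscribe("flush", flush_cache_async)

workers = publish("flush")
# publish() has already returned, but the work is still pending.
returned_before_finish = not done.is_set()

release.set()
workers[0].join(timeout=5)
finished = done.is_set()
```

The bus itself stays simple and serial; any concurrency is the subscriber's own business.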

Benefits

The WSPB cleanly solves all of the problems outlined above. The various components are no longer in competition over process state; instead, there is a single race-free state machine. At the same time, no single component has to know whether, or how many, other components are deployed in the same site.

Frameworks and servers can provide a subset of all site services, with a common, imperative-Python API for deployers to add or substitute their own. However, the WSPB doesn't define a config syntax, so each framework can continue to provide its own unique layer to translate config into that API. A deployer of a combined Pylons/Zope website could choose a Pylons startup script and config syntax to manage the lifecycle of the Zope components.

The WSPB doesn't try to instantiate or compose WSGI components (server -> middleware -> app) either. So there's even room for site daemons which provide no traditional web app functionality; instead, they specialize in providing tools to compose WSGI component graphs via a config file or even a GUI.

It also "plays nice" with mod_python, mod_proxy, mod_wsgi, FastCGI, and SCGI. Those who develop WSGI gateways for these will have a clear incentive to consolidate their ad-hoc startup and shutdown models into the WSPB. For example, a mod_python gateway can use apache.register_cleanup to just call bus.stop() instead of providing custom cleanup-declaration code.

Best of all, the WSPB can be defined as a specification which any framework can provide in a small amount of code. Rather than attempt to draft the specification here (that can be hashed out on Web-SIG, since this is by no means complete), I'm just going to provide an example:

# Python 2.3 compatibility: fall back to sets.Set when the builtin is missing.
try:
    set
except NameError:
    from sets import Set as set
import sys
import threading
import time
import traceback as _traceback


# Use a flag to indicate the state of the bus.
class _StateEnum(object):
    class State(object):
        pass
states = _StateEnum()
states.STOPPED = states.State()
states.STARTING = states.State()
states.STARTED = states.State()
states.STOPPING = states.State()


class Bus(object):
    """Process state-machine and messenger for HTTP site deployment."""

    states = states
    state = states.STOPPED

    def __init__(self):
        self.state = states.STOPPED
        self.listeners = dict([(channel, set()) for channel
                               in ('start', 'stop', 'exit',
                                   'restart', 'graceful', 'log')])
        self._priorities = {}

    def subscribe(self, channel, callback, priority=None):
        """Add the given callback at the given channel (if not present)."""
        if channel not in self.listeners:
            self.listeners[channel] = set()
        self.listeners[channel].add(callback)

        if priority is None:
            priority = getattr(callback, 'priority', 50)
        self._priorities[(channel, callback)] = priority

    def unsubscribe(self, channel, callback):
        """Discard the given callback (if present)."""
        listeners = self.listeners.get(channel)
        if listeners and callback in listeners:
            listeners.discard(callback)
            del self._priorities[(channel, callback)]

    def publish(self, channel, *args, **kwargs):
        """Return output of all subscribers for the given channel."""
        if channel not in self.listeners:
            return []

        exc = None
        output = []

        items = [(self._priorities[(channel, listener)], listener)
                 for listener in self.listeners[channel]]
        # Sort on priority alone; the callables themselves are not orderable.
        items.sort(key=lambda item: item[0])
        for priority, listener in items:
            # All listeners for a given channel are guaranteed to run even
            # if others at the same channel fail. We will still log the
            # failure, but proceed on to the next listener. The only way
            # to stop all processing from one of these listeners is to
            # raise SystemExit and stop the whole server.
            try:
                output.append(listener(*args, **kwargs))
            except (KeyboardInterrupt, SystemExit):
                raise
            except Exception:
                self.log("Error in %r listener %r" % (channel, listener),
                         traceback=True)
                exc = sys.exc_info()[1]
        if exc:
            # Every listener has had its turn; re-raise the last failure.
            raise exc
        return output
        return output

    def start(self):
        """Start all services."""
        self.state = states.STARTING
        self.log('Bus starting')
        self.publish('start')
        self.state = states.STARTED

    def restart(self):
        """Restart the process (may close connections)."""
        self.stop()

        self.log('Bus restart')
        self.publish('restart')

    def graceful(self):
        """Advise all services to reload."""
        self.log('Bus graceful')
        self.publish('graceful')

    def block(self, state=states.STOPPED, interval=0.1):
        """Wait for the given state, KeyboardInterrupt or SystemExit."""
        try:
            while self.state != state:
                time.sleep(interval)
        except (KeyboardInterrupt, IOError):
            # The time.sleep call might raise
            # "IOError: [Errno 4] Interrupted function call" on KBInt.
            self.log('Keyboard Interrupt: shutting down bus')
            self.stop()
        except SystemExit:
            self.log('SystemExit raised: shutting down bus')
            self.stop()
            raise

    def stop(self):
        """Stop all services."""
        self.state = states.STOPPING
        self.log('Bus stopping')
        self.publish('stop')
        self.state = states.STOPPED

    def exit(self, status=0):
        """Stop all services and exit the process."""
        self.stop()

        self.log('Bus exit')
        self.publish('exit')
        sys.exit(status)

    def log(self, msg="", traceback=False):
        if traceback:
            exc = sys.exc_info()
            msg += "\n" + "".join(_traceback.format_exception(*exc))
        self.publish('log', msg)
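
The example above runs listeners in ascending priority order, reading a "priority" attribute from the callback and defaulting to 50. Here is that ordering rule in isolation, as a sketch: run_by_priority and the three listener functions are hypothetical names chosen for illustration.

```python
def run_by_priority(callbacks, default=50):
    """Call each callback once, lowest priority number first."""
    ordered = sorted(callbacks,
                     key=lambda cb: getattr(cb, "priority", default))
    return [cb() for cb in ordered]


def open_site_log():
    return "log"
open_site_log.priority = 10       # run early


def start_http_server():
    return "server"               # no attribute, so the default (50) applies


def drop_privileges():
    return "privileges"
drop_privileges.priority = 70     # run late


order = run_by_priority([drop_privileges, start_http_server, open_site_log])
# order == ["log", "server", "privileges"]
```

A deployer can therefore sequence services from different frameworks without either framework knowing about the other, as long as they agree on what the numbers mean.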

3 comments

Comment from: VanL [Visitor]

This sounds like a good idea - but isn't it a specialized re-implementation of D-Bus? Why not keep the same idea and standardize on D-Bus? It would be just as easy to create a pure-Python implementation of D-Bus as to specify and create another messaging bus. It would be even easier to use the existing D-Bus Python bindings and spend any necessary time at the message level.

06/25/07 @ 04:27

Comment from: fumanchu [Member]

The WSPBus is a bus, yes, but D-Bus is primarily for interprocess comms, while WSPBus is for intraprocess comms. So all of the overhead of sockets, marshalling, and authentication is really unnecessary in this context. The other big difference is that D-Bus is designed primarily for one-to-one messaging, while the WSPBus is designed for one-to-many messaging. Finally, D-Bus is designed for RPC, whereas WSPBus can be useful as a pure pub/sub bus.

One of my design goals for this spec is to minimize the pain for framework and server authors. So "bindings" are right out, since that would introduce C code into several pure-Python packages. And a pure-Python implementation of D-Bus would be much larger than WSPBus; Debian's python-dbus package is ~800KB, but CP's current version of WSPBus is 8KB.

06/25/07 @ 09:07

Comment from: rgz [Visitor]

Real nice, gotta write my TurboGears representative about this! Yes, I see the usefulness of a single-process solution for SIGTERM handling. Thanks :)

06/25/07 @ 12:43
