Categories: Python, Cation, CherryPy, Dejavu, WHELPS, WSGI

Pages: << 1 2 3 4 5 6 7 8 9 10 11 >>

07/09/07

Permalink 12:27:50 pm, by fumanchu Email , 241 words   English (US)
Categories: Python, Dejavu, CherryPy

Lines of code

I was asked last week how many lines of code some of my projects are, and didn't have an answer handy. Fortunately, it's easy to write a LOC counter in Python:

"""Calculate LOC (lines of code) for a given package directory."""

import os
import re

def loc(path, pattern="^.*\.py$"):
    """Return the number of lines of code for all files in the given path.

    If the 'pattern' argument is provided, it must be a regular expression
    against which each filename will be matched. By default, all filenames
    ending in ".py" are analyzed.
    """
    lines = 0
    for root, dirs, files in os.walk(path):
        for name in files:
            if re.match(pattern, name):
                f = open(os.path.join(root, name), 'rb')
                for line in f:
                    line = line.strip()
                    if line and not line.startswith("#"):
                        lines += 1
                f.close()
    return lines

I've added the above to my company's public-domain misc package at http://projects.amor.org/misc/. Here are the results for my high-priority projects (some are proprietary):

>>> from misc import loc
>>> loc.loc(r"C:\Python24\Lib\site-packages\raisersedge")
2290
>>> loc.loc(r"C:\Python24\Lib\site-packages\dejavu")
7703
>>> loc.loc(r"C:\Python24\Lib\site-packages\geniusql")
9509
>>> loc.loc(r"C:\Python24\Lib\site-packages\cherrypy")
16391
>>> loc.loc(r"C:\Python24\Lib\site-packages\endue")
9339
>>> loc.loc(r"C:\Python24\Lib\site-packages\mcontrol")
11512
>>> loc.loc(r"C:\Python24\Lib\site-packages\misc")
4648

~= 61 kloc. Pretty hefty for a single in-house web app stack. :/ But, hey, nobody said integration projects were easy.

06/24/07

Permalink 03:22:20 pm, by fumanchu Email , 1679 words   English (US)
Categories: Python, CherryPy, WSGI

Web Site Process Bus

WSGI has enabled an ecosystem where site deployers can, in theory, mix multiple applications from various frameworks into a single web site, served by a single HTTP server. And that's great. But there are several areas where WSGI is purposefully silent, where there is still room for standards-based collaboration:

  • managing WSGI HTTP servers (start/stop/restart)
  • construction of the WSGI component graph (servers -> middlewares -> apps)
  • main process state control (start/stop/restart/graceful)
  • site-wide services (autoreload, thread monitors, site logging)
  • config file formats and parsing for all of the above

Most frameworks address all of the above already, to varying degrees; however, they still tend to do so in a very monolithic manner. Paste is notable for attempting to provide some of them in discrete pieces (especially WSGI graph construction and a config format tailor-made for it).

But I'm going to focus here on just two of these issues: process state and site-wide services. I believe we can separate these two from the rest of the pack and provide a simple, common specification for both, one that's completely implementable in 100 lines of code by any framework.

The problem

One of the largest issues when combining multiple frameworks in a single process is answering the question, "who's in control of the site as a whole?" Multiple frameworks means multiple code bases who all think they should provide:

  • the startup script
  • daemonization
  • dropping privileges
  • PID file management
  • site logging
  • autoreload
  • signal handling
  • sys.exit calls
  • atexit handlers
  • main thread error trapping

...and they often disagree about those behaviors. Throw Apache or lighttpd into the mix and you've got some serious deployment issues.

The typical solution to this is to have each component provide a means of shutting off each process-controlling feature. For example, CherryPy 3 obeys the config entry engine.autoreload_on = False, while django-admin.py takes a --noreload command-line arg. But these are different for each framework, and difficult to coordinate as the number of components grows. Since, for example, only one autoreloader is needed per site, a more usable solution would be to selectively turn on just one instead of turning off all but one.

For a worse example, let's look at handling SIGTERM. Currently, we have the following:

SIGTERM before WSPBus

OK, Django doesn't actually provide a SIGTERM handler, but you get the idea. If several components register a SIGTERM handler, only one of them will "win" by virtue of being the last one to register. And chances are, the winning handler will shut down its component cleanly and then exit the process, leaving other components to fend for themselves.

In fact, there's a whole list of negatives for the monolithic approach to process control and site services:

  1. Frameworks and servers have to provide all desirable site behaviors, or force their packagers/deployers to develop them ad-hoc.
  2. Frameworks and servers all have different API's for changing process state. Race conditions and unpredictable outcomes are common.
  3. Frameworks and servers all have different API's for reacting to process state changes. Resource acquisition and cleanup becomes a huge unknown.
  4. Frameworks and servers have to know they're being deployed alongside other frameworks and servers.

We could attempt to solve this with a Grand Unified Site Container, but that would most likely:

  1. force a single daemon implementation, thus eliminating innovation in process invocation,
  2. force a single configuration syntax, thus denying any market over declaration styles,
  3. force a static set of site services, limiting any improvements in process interaction,
  4. add an additional dependency to every framework,
  5. deny using HTTP servers like Apache and lighttpd in the same process (since they do their own process control), and
  6. be a dumping-ground for every other aspect of web development, from databases to templating.

A solution: the Web Site Process Bus

The Web Site Process Bus uses a simple publish/subscribe architecture to loosely connect WSGI components with site services. Here's our SIGTERM example, implemented with a WSPBus:

SIGTERM after WSPBus

The singleton Bus object does three things:

  1. It models server-availability state via a "state" attribute, which is a sentinel value from the set: (STARTING, STARTED, STOPPING, STOPPED).
  2. It possesses methods to change the state, such as "start", "stop", "restart", "graceful", and "exit".
  3. It possesses "publish" and "subscribe"/"unsubscribe" methods for named channels.

Each method which changes the state also has an equivalent named channel. Any framework, server, or other component may register code as a listener on any channel. For example, a web framework can register database-connection code to be run when the "start" method is called, and disconnection code for the "stop" method:

bus.subscribe("start", orm.connpool.start)
bus.subscribe("stop", orm.connpool.stop)

Any channel which has no listeners will simply ignore all published messages. This allows component code to be much simpler; callers do not need to know whether their actions are appropriate--they are appropriate if a listener is subscribed to that channel.

In addition to the builtin state-transition channels, components are free to define their own pub/sub channels. CherryPy's current implementation, for example, defines the additional channels start_thread and stop_thread, and registers channels for signals, such as "SIGTERM", "SIGHUP", and "SIGUSR1" (which then typically call bus methods like "restart" and "exit"). Some of these could be standardized. Other custom channels would be more naturally tightly-coupled, requiring awareness on the part of callers and callees.

Since WSPB state-changing method calls are expected to be sporadic, and often fundamentally serial (e.g., "autoreload"), their execution is synchronous. Subscribers (mostly of custom channels), however, are free to return immediately, and continue their operation asynchronously.

Benefits

The WSPB cleanly solves all of the problems outlined above. The various components are no longer in competition over process state; instead, there is a single race-free state machine. However, no single component has to know whether or how many other components are deployed in the same site.

Frameworks and servers can provide a subset of all site services, with a common, imperative-Python API for deployers to add or substitute their own. However, the WSPB doesn't define a config syntax, so each framework can continue to provide its own unique layer to translate config into that API. A deployer of a combined Pylons/Zope website could choose a Pylons startup script and config syntax to manage the lifecycle of the Zope components.

The WSPB doesn't try to instantiate or compose WSGI components (server -> middleware -> app) either. So there's even room for site daemons which provide no traditional web app functionality; instead, they specialize in providing tools to compose WSGI component graphs via a config file or even a GUI.

It also "plays nice" with mod_python, mod_proxy, mod_wsgi, FastCGI, and SCGI. Those who develop WSGI gateways for these will have a clear incentive to consolidate their ad-hoc startup and shutdown models into the WSPB. For example, a modpython gateway can use apache.register_cleanup to just call bus.stop() instead of providing custom cleanup-declaration code.

Best of all, the WSPB can be defined as a specification which any framework can provide in a small amount of code. Rather than attempt to draft the specification here (that can be hashed out on Web-SIG, since this is by no means complete), I'm just going to provide an example:

try:
    set
except NameError:
    from sets import Set as set
import sys
import threading
import time
import traceback as _traceback


# Use a flag to indicate the state of the bus.
class _StateEnum(object):
    class State(object):
        pass
states = _StateEnum()
states.STOPPED = states.State()
states.STARTING = states.State()
states.STARTED = states.State()
states.STOPPING = states.State()


class Bus(object):
    """Process state-machine and messenger for HTTP site deployment."""

    states = states
    state = states.STOPPED

    def __init__(self):
        self.state = states.STOPPED
        self.listeners = dict([(channel, set()) for channel
                               in ('start', 'stop', 'exit',
                                   'restart', 'graceful', 'log')])
        self._priorities = {}

    def subscribe(self, channel, callback, priority=None):
        """Add the given callback at the given channel (if not present)."""
        if channel not in self.listeners:
            self.listeners[channel] = set()
        self.listeners[channel].add(callback)

        if priority is None:
            priority = getattr(callback, 'priority', 50)
        self._priorities[(channel, callback)] = priority

    def unsubscribe(self, channel, callback):
        """Discard the given callback (if present)."""
        listeners = self.listeners.get(channel)
        if listeners and callback in listeners:
            listeners.discard(callback)
            del self._priorities[(channel, callback)]

    def publish(self, channel, *args, **kwargs):
        """Return output of all subscribers for the given channel."""
        if channel not in self.listeners:
            return []

        exc = None
        output = []

        items = [(self._priorities[(channel, listener)], listener)
                 for listener in self.listeners[channel]]
        items.sort()
        for priority, listener in items:
            # All listeners for a given channel are guaranteed to run even
            # if others at the same channel fail. We will still log the
            # failure, but proceed on to the next listener. The only way
            # to stop all processing from one of these listeners is to
            # raise SystemExit and stop the whole server.
            try:
                output.append(listener(*args, **kwargs))
            except (KeyboardInterrupt, SystemExit):
                raise
            except:
                self.log("Error in %r listener %r" % (channel, listener),
                         traceback=True)
                exc = sys.exc_info()[1]
        if exc:
            raise
        return output

    def start(self):
        """Start all services."""
        self.state = states.STARTING
        self.log('Bus starting')
        self.publish('start')
        self.state = states.STARTED

    def restart(self):
        """Restart the process (may close connections)."""
        self.stop()

        self.log('Bus restart')
        self.publish('restart')

    def graceful(self):
        """Advise all services to reload."""
        self.log('Bus graceful')
        self.publish('graceful')

    def block(self, state=states.STOPPED, interval=0.1):
        """Wait for the given state, KeyboardInterrupt or SystemExit."""
        try:
            while self.state != state:
                time.sleep(interval)
        except (KeyboardInterrupt, IOError):
            # The time.sleep call might raise
            # "IOError: [Errno 4] Interrupted function call" on KBInt.
            self.log('Keyboard Interrupt: shutting down bus')
            self.stop()
        except SystemExit:
            self.log('SystemExit raised: shutting down bus')
            self.stop()
            raise

    def stop(self):
        """Stop all services."""
        self.state = states.STOPPING
        self.log('Bus stopping')
        self.publish('stop')
        self.state = states.STOPPED

    def exit(self, status=0):
        """Stop all services and exit the process."""
        self.stop()

        self.log('Bus exit')
        self.publish('exit')
        sys.exit(status)

    def log(self, msg="", traceback=False):
        if traceback:
            exc = sys.exc_info()
            msg += "\n" + "".join(_traceback.format_exception(*exc))
        self.publish('log', msg)

06/05/07

Permalink 02:19:41 pm, by admin Email , 408 words   English (US)
Categories: IT, Python

Python concurrency syntax

via Bill de hÓra, I ran across this thread on LtU wherein Peter Van Roy comments:

The real problem is not threads as such; it is threads plus shared mutable state. To solve this problem, it's not necessary to throw away threads. It is sufficient to disallow mutable state shared between threads (mutable state local to one thread is still allowed).

...and Allan McInnes adds:

The "problem with threads" lies in the current approach to sharing state by default, and "pruning away nondeterminism" to get a correctly functioning system.

...and "dbfaken" adds:

Perhaps we should have strong syntax distinctions for mutation.

Since the first versions of Dejavu (my Python mediated-DB/ORM), I've noticed that this "pruning away nondeterminism" approach is exactly the wrong direction for systems which are designed to be thread-safe; we could instead explore languages and systems which allow us to "prune away determinism". By that I mean, mutable state should not be shared between threads by default; any mutable state which needs to be shared should be explicitly declared as such. This would make systems like Dejavu much simpler to create, use, and maintain.

I've often wondered what a "strong syntax distinction for [shared] mutation" would look like in Python. The simplest solution would probably have to:

  1. Make class.__dict__'s immutable. This is a natural choice given the normal usage patterns of classes by developers in the wild: generally, a class exists to share methods between instances. There are valid use cases for classes which are mutable, but they are rare; perhaps a sentinel of some kind provided by object could re-enable mutability for classes, but it should be off by default.
  2. Make all module.__dict__'s immutable. This has already been suggested on python-dev (IIRC by GvR himself), although I believe it was suggested as a way to reduce monkeypatching.
  3. Provide a @shared annotation for explicitly declaring shared mutable data.

This is just one solution to a small set of use cases: threaded programs where the explicit shared state is small compared to the total lines of code. I haven't the experience to state whether such a model is inherently damaging to other concurrent needs and designs. It has the benefit, however, of having little impact on single-threaded programs.

Would such a feature help catapult Python into the "large systems" space?

03/16/07

Permalink 03:55:32 pm, by fumanchu Email , 23 words   English (US)
Categories: CherryPy

It's official: CherryPy rocks

From sucks-rocks.com:

CherryPy rocks

No, really. It rocks. Rocks, rocks, rocks.

(Thanks, jamwt!)

02/25/07

Permalink 02:21:23 pm, by admin Email , 572 words   English (US)
Categories: Python, CherryPy, WSGI

PyCon 2007 and CherryPy

PyCon 2007 is nearing a close; here are some notes on how it affected CherryPy:

Web application deployment

Chad Whitacre (author of Aspen) herded several cats into a room on Sunday and forced us to discuss the various issues surrounding Python web application deployment. This is hinted at in the WSGI spec:

Finally, it should be mentioned that the current version of WSGI does not prescribe any particular mechanism for "deploying" an application for use with a web server or server gateway. At the present time, this is necessarily implementation-defined by the server or gateway. After a sufficient number of servers and frameworks have implemented WSGI to provide field experience with varying deployment requirements, it may make sense to create another PEP, describing a deployment standard for WSGI servers and application frameworks.

There were three basic realms where the participants agreed we could try to collaborate/standardize:

  1. Process control: stop, start, restart, daemonization, signal handling, socket re-use, drop privileges, etc. If you're familiar with CherryPy 3, you'll recognize this list as 95% of the current cherrypy.engine object. The CherryPy team has already been discussing ways of breaking up the Engine object; this may facilitate that (and vice-versa). Joseph Tate volunteered to look at socket re-use issues specifically, but the general consensus seemed to be that much of this would be hashed out on Web-SIG.

  2. WSGI stack composition: Jim Fulton proposed that we could all agree on Paste Deploy (at least a good portion of the API) to manage this in a cross-framework manner. Most heads nodded, "yes". Jim also proposed that each of the framework authors take the next week to refamiliarize themselves with Deploy, and then start pestering Ian Bicking with specific API issues. Ian suggested that he should fork Paste Deploy into another project specifically for this. For CherryPy, this would first mean offering standard egg entry points. [Personally, I'd like to standardize on a pure-Python API for deploy, not a config file format API. In other words, make the config file format optional, so that users of CP-only apps could avoid having to learn a distinct config file format for deployment. It should be possible to transform various config file formats into the same Python object(s).]

  3. Benchmarks: Jim also suggested we create a standard WSGI HTTP server benchmark suite, with various test applications and concurrency scenarios. This would compare various WSGI HTTP servers, as opposed to CherryPy's existing benchmark suite which compares successive versions of the full CP stack. Ian volunteered to begin work on that project (with the expectation that others would contribute substantial use cases, etc).

Others who were present for at least a portion of the long discussion: me, Mark Ramm, Kevin Dangoor, Ben Bangert, Jonathan Ellis, Matt Good, Brian Beck, and Calvin Hendryx-Parker.

WSGI middleware authoring

After some discussion with Mark (and he with Ian and Ben), we agreed that CherryPy could do more in the WSGI-middleware-authoring department. There is a continuous pressure to simply re-use or fix up the existing CherryPy request object to fill this need; however, there are some fundamental problems with that approach (such as the use of threadlocals to manage context, and the difficulty of streaming WSGI output through a CherryPy app). At the moment, I'm leaning toward adding a new API to CherryPy which would be similar to the application API, but specifically targeted at middleware authoring.

01/23/07

Permalink 12:05:53 pm, by fumanchu Email , 433 words   English (US)
Categories: Python, Dejavu

Mapping Python types to DB types

Reading Barry Warsaw's recent use of SQLAlchemy, I'm reminded once again of how ugly I find SQLAlchemy's PickleType and SQLObject's PickleCol concepts. I have nothing against the concept of pickle itself, mind you, but I do have an issue with implementation layer names leaking into application code.

The existence of a PickleType (and BlobType, etc.) means that the application developer needs to think in terms of database types. This adds another mental model to the user's (my) tiny brain, one which is unnecessary. It constantly places the burden on the developer to map Python types to database types.

For Dejavu, I started in the opposite direction, and decided that object properties would be declared in terms of Python types, not database types. When you write a new Unit class, you even pass the actual type (such as int or unicode) to the property constructor instead of a type name! Instead of separate classes for each type, there is only a single UnitProperty class. This frees programmers from having to map types in their code (and therefore in their heads); it removes an entire mental model (DB types) at coding time, and allows the programmer to remain in the Python flow.

However, the first versions of Dejavu went too far in this approach, mostly due to the fact that Dejavu started from the "no legacy" side of ORM development; that is, it assumed your Python model would always create the database. This allowed Dejavu to choose appropriate database types for the declared Python types, but meant that existing applications (with existing data) were difficult to port to Dejavu, because the type-adaptation machinery had no way to recognize and handle database types other than those Dejavu preferred to create.

Dejavu 1.5 (soon to be released) corrects this by allowing true "M x N" type adaptation. What this means is that you can continue to directly use Python types in your model, but you also gain complete control over the database types. The built-in type adapters understand many more (Python type <-> DB type) adaptation pairs, now, but you also have the power to add your own. In addition, Dejavu now has DB type-introspection capabilities—the database types will be discovered for you, and appropriate adapters used on the fly. [...and Dejavu now allows you to automatically create Python models from existing databases.]

In short, it is possible to have an ORM with abstractions that don't leak (at least not on a regular basis—the construction of a custom adapter requires some thought ;) ).

01/20/07

Permalink 02:46:25 am, by fumanchu Email , 1378 words   English (US)
Categories: CherryPy

help(CherryPy 3.0)

Abstract

  1. CherryPy just grew its first metaclass.
  2. CherryPy just grew its first stdlib monkeypatch.
  3. Because of 1 and 2, CherryPy is now a heck of a lot easier to learn and use.
  4. Points 1, 2, and 3 all apply to unreleased trunk code and are subject to change.

Intro

I've been a proper fool (and might still be). I've been telling everyone that CherryPy 3 is much easier to learn and use because it's been tailored to be help()-ful. What I meant was that you could open an interactive interpreter, type help(cherrypy.<thing>) and have at least some idea of what it does. I spent quite a bit of time honing the top-level namespace down to as few components as possible (and some of the component namespaces, too) in order to make help() easier to read.

This is harder to do than you might think. Unlike simple linear scripts or libraries, the most important objects when CherryPy is "live" don't exist at an interactive prompt. The Request, Response, and Session objects are all heavily dependent on the context of a real HTTP conversation. They're hard to create in a vacuum. And although there's one of each per thread while the system is running, they are implemented as thread local objects so that the CherryPy programmer can treat each of them as if there were only one: a global.

Reusing thread locals

Thread locals are a great invention, but they suffer from one serious drawback when used in a threaded framework: they allow anyone to add attributes to them. If the framework re-uses the same thread for multiple requests, it becomes difficult to reliably clean out all of those attributes between requests.

CherryPy's solution to that was to add a container in 2.1; instead of a separate thread local for the Request, Response, and Session objects, there is a single, hidden thread local called cherrypy._serving, and the Request, Response, and Session objects for each thread are attributes of the "serving" object. This makes it easy for cleanup code: it just calls cherrypy._serving.__dict__.clear() when the request ends. (Aside: this technique also allows the Request, Response and Session types to be overridden).

However, pushing those objects into a container means they're no longer so easy to reference. CherryPy code would become uglier and more difficult if, instead of:

cherrypy.request.method

...you had to write:

cherrypy._serving.request.method

So a _ThreadLocalProxy class was introduced to allow CherryPy code to keep writing the nicer, shorter syntax. In short, it passes __getattr__ (and other double-underscore methods) through to a wrapped object. So cherrypy.request became a proxy object to a wrapped Request object. Ditto for response and session.

That was fine for CherryPy 2, but one of the goals for version 3.0 is better IDE support. Most IDE's at least provide calltips for code completion, but there aren't usually any HTTP requests coming in as you're writing code! CP 2's thread local proxies didn't have a request object in the main thread (or any thread that wasn't started by the HTTP server), so typing cherrypy.request. couldn't result in a calltip as you coded. The solution for CherryPy 3 was to have the proxy's __getattr__ and friends wrap a default object if a live object could not be found. And the default objects' attributes are true defaults; if they're not overridden (in config or code), they won't change when the system goes live. This makes interactive exploration even easier; you can forget all about the threading and pretend you're looking at live, global objects.

help(proxy) isn't helpful

But there's another catch: one of the few problems with using a proxy object in pure Python is that it's no longer of the same type as the wrapped object. Unfortunately for us, Python's builtin help function uses pydoc, and pydoc calls type(obj) quite a bit.

You can certainly call help(cherrypy.request.run) and get the correct docstring, because "run" is an attribute of cherrypy.request, the proxy calls __getattr__ first, and then type() is called on the attribute, not the request object/proxy. But if you attempt help(cherrypy.request), you're in for some confusion, because the proxy implementation leaks out.

Or rather, it did leak out until just now. I took the plunge and CherryPy now monkeypatches pydoc, so that it "passes the help() call through the proxy". Monkeypatching the standard library is of course a huge no-no, but the alternative was to essentially copy and paste most of pydoc and distribute the result with CherryPy. Now, help(cherrypy.response) at least prints:

>>> help(cherrypy.response)
Help on Response in module cherrypy._cprequest object:

class Response(__builtin__.object)
 |  An HTTP Response, including status, headers, and body.
 |  
 |  Application developers should use Response.headers (a dict) to
 |  set or modify HTTP response headers. When the response is finalized,
 |  Response.headers is transformed into Response.header_list as
 |  (key, value) tuples.
 |  
 |  Methods defined here:
 |  
 |  __init__(self)
 |  
 |  check_timeout(self)
 |      If now > self.time + self.timeout, set self.timed_out.
 |      
 |      This purposefully sets a flag, rather than raising an error,
 |      so that a monitor thread can interrupt the Response thread.
 |  
 |  collapse_body(self)
 |  
 |  finalize(self)
 |      Transform headers (and cookies) into self.header_list.
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes defined here:
 |  
 |  __dict__ = <dictproxy object>
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__ = <attribute '__weakref__' of 'Response' objects>
 |      list of weak references to the object (if defined)
 |  
 |  body = <cherrypy._cprequest.Body object>
 |      The body of the HTTP response (the response entity).
 |  
 |  cookie = <SimpleCookie: >
 |  
 |  header_list = []
 |  
 |  headers = {}
 |  
 |  status = ''
 |  
 |  stream = False
 |  
 |  time = None
 |  
 |  timed_out = False
 |  
 |  timeout = 300

Documenting data

But there's a further flaw with the above output of help(); none of the data members of the Response class are documented! A few of them are mentioned in the class docstring, to be sure, but hardly to a truly useful extent. The Request object is an even poorer state, since it has so many more data members.

The solution for that issue is somewhat complicated, as well. It turns out that there are plenty of good documentation generators for Python code (that emit HTML or text; epydoc and pudge spring to mind), but no serious helpers for making help() more informative. This is a real shame; I would almost always rather have help() be truly helpful than go read a book or search online docs.

So I proposed a (small!) metaclass to help alleviate the problem for CherryPy. When you look at CherryPy source code, now, you might see something like this:

class Request(object):
    """An HTTP request."""

    __metaclass__ = cherrypy._AttributeDocstrings

    prev = None
    prev__doc = """
    The previous Request object (if any). This should be None
    unless we are processing an InternalRedirect."""

    # Conversation/connection attributes
    local = http.Host("localhost", 80)
    local__doc = \
        "An http.Host(ip, port, hostname) object for the server socket."

    remote = http.Host("localhost", 1111)
    remote__doc = \
        "An http.Host(ip, port, hostname) object for the client socket."

The _AttributeDocstrings metaclass does one thing: finds class members whose names look like <attrname>__doc, takes their str value, formats it, and folds it into the class docstring. Here's a snippet of the resulting help() output:

Help on Request in module cherrypy._cprequest object:

class Request(__builtin__.object)
 |  An HTTP request.
 |  
 |  local [= http.Host('localhost', 80, 'localhost')]:
 |      An http.Host(ip, port, hostname) object for the server socket.
 |  
 |  prev [= None]:
 |      The previous Request object (if any). This should be None
 |      unless we are processing an InternalRedirect.
 |  
 |  remote [= http.Host('localhost', 1111, 'localhost')]:
 |      An http.Host(ip, port, hostname) object for the client socket.

Christian's first question was, "why not just write it yourself by hand in the docstring?" Here's the long answer. The metaclass:

  1. Places the docstring nearer to the attribute declaration.
  2. Makes attribute docs more uniform ("name (default): doc").
  3. Automatically gets the attribute name right in the docstring.
  4. Automatically gets the default value right in the docstring.

I chose the naming convention because it allows the attribute name and the attribute__doc name to line up horizontally (it doesn't matter which comes first; I prefer to put the doc after the attribute). It also looks similar to the conventions in Python's C code, where doc variable names look like module_attribute__doc__ or sometimes just attribute_doc.

Code faster

Hopefully these two improvements, although more awkward than I like implementation-wise, will make using CherryPy much easier and faster. Feel free to help() us out by writing a few data member docstrings!

01/17/07

Permalink 05:12:12 pm, by fumanchu Email , 53 words   English (US)
Categories: Python

Selenium RC fixed for FF 2.0.0.1

The bug report is here. And, as stated in the original forum thread, you can get a nightly release from http://release.openqa.org/selenium-remote-control/nightly/ . Just replace your existing selenium-server.jar with the new one.

12/23/06

Permalink 06:05:52 pm, by fumanchu Email , 608 words   English (US)
Categories: CherryPy

CherryPy 3 has fastest WSGI server yet

A couple of months ago, in response to someone else's speed claims, I posted a comment that CherryPy's built in WSGI server could serve 1200 simple requests per second. The demo used Apache's "ab" tool to test ("-k -n 3000 -c %s"). In the last few days before the release of CherryPy 3.0 final, I've done some further optimization of cherrypy.wsgiserver, and now get 2000+ req/sec on my modest laptop.

threads | Completed | Failed | req/sec | msec/req | KB/sec |
     10 |      3000 |      0 | 2170.79 |    0.461 | 358.18 |
     20 |      3000 |      0 | 2080.34 |    0.481 | 343.26 |
     30 |      3000 |      0 | 1920.31 |    0.521 | 316.85 |
     40 |      3000 |      0 | 2051.84 |    0.487 | 338.55 |
     50 |      3000 |      0 | 2051.84 |    0.487 | 338.55 |

The improvements are due to a variety of optimizations, including:

  • Replacing mimetools/rfc822.Message with custom code for reading headers.
  • Using socket.sendall instead of a socket fileobject for writes.
  • Generic hand-tuning of code loops.

I want to make it clear that the benchmark does not exercise any part of CherryPy other than the WSGI server. I used a very simple WSGI application (not the full CherryPy stack):

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type','text/plain'),
                        ('Content-Length','19')]
    start_response(status, response_headers)
    return ['My Own Hello World!']

The full stack of CherryPy includes the WSGI application side as well, and consequently takes more time. But that has risen from about 380 requests per second in October to:

Client Thread Report (1000 requests, 14 byte response body, 10 server threads):

threads | Completed | Failed | req/sec | msec/req | KB/sec |
     10 |      1000 |      0 |  536.86 |    1.863 |  85.36 |
     20 |      1000 |      0 |  509.47 |    1.963 |  81.01 |
     30 |      1000 |      0 |  499.28 |    2.003 |  79.39 |
     40 |      1000 |      0 |  491.90 |    2.033 |  78.21 |
     50 |      1000 |      0 |  504.32 |    1.983 |  80.19 |
Average |    1000.0 |    0.0 | 508.366 |    1.969 | 80.832 |

If you want to benchmark the full CherryPy stack on your own, just install CherryPy and run the script at cherrypy/test/benchmark.py.

Here's the other script for the "bare server" benchmarks:

import re
import sys
import threading
import time
from cherrypy import _cpmodpy

AB_PATH = ""
APACHE_PATH = "apache"
SCRIPT_NAME = ""
PORT = 8080


class ABSession:
    """A session of 'ab', the Apache HTTP server  benchmarking tool."""
    parse_patterns = [('complete_requests', 'Completed',
                       r'^Complete requests:\s*(\d+)'),
                      ('failed_requests', 'Failed',
                       r'^Failed requests:\s*(\d+)'),
                      ('requests_per_second', 'req/sec',
                       r'^Requests per second:\s*([0-9.]+)'),
                      ('time_per_request_concurrent', 'msec/req',
                       r'^Time per request:\s*([0-9.]+).*concurrent requests\)$'),
                      ('transfer_rate', 'KB/sec',
                       r'^Transfer rate:\s*([0-9.]+)'),
                      ]

    def __init__(self, path=SCRIPT_NAME + "/", requests=3000, concurrency=10):
        self.path = path
        self.requests = requests
        self.concurrency = concurrency

    def args(self):
        assert self.concurrency > 0
        assert self.requests > 0
        return ("-k -n %s -c %s <a href="http://localhost:%s%s"">http://localhost:%s%s"</a> %
                (self.requests, self.concurrency, PORT, self.path))

    def run(self):
        # Parse output of ab, setting attributes on self
        args = self.args()
        self.output = _cpmodpy.read_process(AB_PATH or "ab", args)
        for attr, name, pattern in self.parse_patterns:
            val = re.search(pattern, self.output, re.MULTILINE)
            if val:
                val = val.group(1)
                setattr(self, attr, val)
            else:
                setattr(self, attr, None)


safe_threads = (25, 50, 100, 200, 400)
if sys.platform in ("win32",):
    # For some reason, ab crashes with > 50 threads on my Win2k laptop.
    safe_threads = (10, 20, 30, 40, 50)


def thread_report(path=SCRIPT_NAME + "/", concurrency=safe_threads):
    sess = ABSession(path)
    attrs, names, patterns = zip(*sess.parse_patterns)
    rows = [('threads',) + names]
    for c in concurrency:
        sess.concurrency = c
        sess.run()
        rows.append([c] + [getattr(sess, attr) for attr in attrs])
    return rows

def print_report(rows):
    widths = []
    for i in range(len(rows[0])):
        lengths = [len(str(row[i])) for row in rows]
        widths.append(max(lengths))
    for row in rows:
        print
        for i, val in enumerate(row):
            print str(val).rjust(widths[i]), "|",
    print


if __name__ == '__main__':

    def simple_app(environ, start_response):
        """Simplest possible application object"""
        status = '200 OK'
        response_headers = [('Content-type','text/plain'),
                            ('Content-Length','19')]
        start_response(status, response_headers)
        return ['My Own Hello World!']

    from cherrypy import wsgiserver as w
    s = w.CherryPyWSGIServer(("localhost", PORT), simple_app)
    threading.Thread(target=s.start).start()
    try:
        time.sleep(1)
        print_report(thread_report())
    finally:
        s.stop()

11/13/06

Permalink 10:57:51 pm, by fumanchu Email , 745 words   English (US)
Categories: Python, CherryPy, WSGI

Internal Redirect WSGI middleware

I played around with this as a potential hack for CherryPy 3. It's WSGI middleware for adding almost-transparent "internal redirect" capabilities to any WSGI application.

My operating theory was that anyone writing a WSGI app that does not already have an internal-redirect feature was probably using HTTP redirects (302, 303, or 307) to do nearly the same thing. This middleware simply waits for a 307 response status and performs the redirection itself within the same request, without informing the user-agent.

This should be OK because 307 isn't normally cacheable anyway, and some versions of IE don't bother to ask the user as the spec requires already, so it just duplicates an existing browser bug. I could have used a custom HTTP code like 399, but if that ever leaked out to the UA (because someone forgot to enable the middleware) then the UA should fall back to "300 Multiple Choices", which didn't seem like a good fit. At least by using 307, the fallback should be appropriate, if not graceful.

Here's the code, which could probably use some improvements:

"""WSGI middleware which performs "internal" redirection."""

import StringIO


class _Redirector(object):

    def __init__(self, nextapp, recursive=False):
        self.nextapp = nextapp
        self.recursive = recursive

        self.location = None
        self.write_proxy = None
        self.status = None
        self.headers = None
        self.exc_info = None

        self.seen_paths = []

    def start_response(self, status, headers, exc_info):
        if status[:3] == "307":
            for name, value in headers:
                if name.lower() == "location":
                    self.location = value
                    break
        self.status = status
        self.headers = headers
        self.exc_info = exc_info
        return self.write

    def write(self, data):
        # This is only here for silly apps which call write.
        if self.write_proxy is None:
            self.write_proxy = self.sr(self.status, self.headers, self.exc_info)
        self.write_proxy(data)

    def __call__(self, environ, start_response):
        self.sr = start_response

        nextenv = environ.copy()
        curpath = nextenv['PATH_INFO']
        if nextenv.get('QUERY_STRING'):
            curpath = curpath + "?" + nextenv['QUERY_STRING']
        self.seen_paths.append(curpath)

        while True:
            # Consume the response (in case it's a generator).
            response = [x for x in self.nextapp(nextenv, self.start_response)]

            if self.location is None:
                # No redirection required; complete the response normally.
                self.sr(self.status, self.headers, self.exc_info)
                return response

            # Start with a fresh copy of the environ and start altering it.
            nextenv = environ.copy()
            nextenv['REQUEST_METHOD'] = 'GET'
            nextenv['CONTENT_LENGTH'] = '0'
            nextenv['wsgi.input'] = StringIO.StringIO()
            nextenv['redirector.history'] = self.seen_paths[:]

            # "The [Location response-header] field value
            # consists of a single absolute URI."
            (nextenv["wsgi.url_scheme"],
             nextenv["SERVER_NAME"],
             path, params,
             nextenv["QUERY_STRING"], frag) = urlparse(self.location)

            if frag:
                raise ValueError("Illegal #fragment in Location response "
                                 "header %r" % self.location)

            if params:
                path = path + ";" + params

            # Assume 'path' is already unquoted according to
            # <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2">http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.2</a>
            if path.lower().startswith(environ['SCRIPT_NAME'].lower()):
                nextenv["PATH_INFO"] = path[len(environ['SCRIPT_NAME']):]
            else:
                raise ValueError("Location response header %r does not "
                                 "match current SCRIPT_NAME %r"
                                 % (self.location, environ['SCRIPT_NAME']))

            # Update self.seen_paths and check for recursive calls.
            curpath = nextenv['PATH_INFO']
            if nextenv.get('QUERY_STRING'):
                curpath = curpath + "?" + nextenv['QUERY_STRING']
            if curpath in self.seen_paths:
                raise RuntimeError("redirector visited the same URL twice: %r"
                                   % curpath)
            else:
                self.seen_paths.append(curpath)

            # Reset self for the next iteration
            self.location = None
            self.write_proxy = None
            self.status = None
            self.headers = None
            self.exc_info = None


def redirector(nextapp, recursive=False):
    """WSGI middleware which performs "internal" redirection.

    Whenever the next application sets a response status of 307 and
    provides a Location response header, this component will not pass
    that response on to the user-agent; instead, it parses the URI
    provided in the Location response header and calls the same
    application again using that URI. The following entries in the
    WSGI environ dict may be modified when redirecting: wsgi.url_scheme,
    SERVER_NAME, PATH_INFO, QUERY_STRING. REQUEST_METHOD is always
    set to 'GET', so any desired parameters must be supplied as
    query string arguments in the Location response header.
    The wsgi.input entry will always be reset to an empty StringIO,
    and CONTENT_LENGTH will be set to 0.

    If 'recursive' is False (the default), each new target URI will be
    checked to see if it has already been visited in the same request;
    if so, a RuntimeError is raised. If 'recursive' is True, no check
    is made and therefore no such errors are raised.
    """
    def redirect_wrapper(environ, start_response):
        ir = _Redirector(nextapp, recursive)
        return ir(environ, start_response)
    return redirect_wrapper

<< 1 2 3 4 5 6 7 8 9 10 11 >>

August 2014
Sun Mon Tue Wed Thu Fri Sat
 << <   > >>
          1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30
31            

Search

The requested Blog doesn't exist any more!

XML Feeds

powered by b2evolution