
Customization vs handler dispatch in web application servers

10/25/05

01:33:07 pm, by fumanchu, 2897 words, English (US)
Categories: IT, Python, CherryPy

[10/25/05 Update: changed some terms to make it more clear.]

At its most basic, a web-application server can be said to map a set of URI's to a set of handlers. From Roy Fielding's REST dissertation:

"The resource is a conceptual mapping -- the server receives the identifier (which identifies the mapping) and applies it to its current mapping implementation (usually a combination of collection-specific deep tree traversal and/or hash tables) to find the currently responsible handler implementation and the handler implementation then selects the appropriate action+response based on the request content."

In an HTTP server, the "identifier" is the URI (which includes the query string, as I learned recently). The "handler implementation" is almost always a function in some programming language; for many HTTP servers written with scripting languages, these handlers will be written in the same language as the server. CherryPy 2.1, Django 1.0, and Quixote 2.3 are Python examples of this. mod_python 3.1 is an example of a Python web-application tool where the HTTP server is written in some other language (Apache, written in C). In a moment, we'll take a look at how each of these manages URI-to-handler mappings, which we'll call "dispatch".

Every web-application server, whether tied to a larger framework (Django) or not (CherryPy, Quixote, mod_python), must also address the need for customization. By "customization", I mean modifications to the per-request behavior of the server. I do not mean the behavior of the application, although the same techniques are often employed for both. I also do not mean end-user settings, which are properly stored in an application database. I also want to make it absolutely clear that I don't mean "data which exist in configuration files"—the concept of customization is distinct from the medium.

Now let's look at our four servers, and see how they manage URI-to-handler dispatch, and how they provide customization:

CherryPy

CherryPy takes the "deep tree traversal" route in order to map URI's to handlers. The developer creates a cherrypy.root object, which always maps to the root URI "/". Subpaths are attached as attributes to the root object. Since the path portion of a URI is also hierarchical, there is a relatively straightforward mapping.

CherryPy allows some flexibility by providing a default method; if the mapper reaches the end of its search without finding a matching handler, it will then reverse direction, looking for a parent method named "default", which it then calls, passing any child path info as arguments. That is, a URI of "/path/to/parent/child/repr?color=red", if handled by cherrypy.root.path.to.parent.default, will be called as default("child", "repr", color="red").
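To make the traversal concrete, here is a minimal sketch of that lookup in plain Python. It is not CherryPy's own code: the dispatch function and the Node and Parent classes are names I made up for illustration, and details such as index methods and exposure checks are left out.

```python
class Node:
    """Empty container; child path segments are attached as attributes."""


class Parent:
    def default(self, *path, **params):
        # Receives the unmatched trailing path segments as positional args.
        return ("default", path, params)


def dispatch(root, path, params):
    """Walk attributes of `root` along `path`; fall back to the nearest `default`."""
    segments = [s for s in path.split("/") if s]
    trail = [root]  # objects matched so far, starting with the root
    for seg in segments:
        child = getattr(trail[-1], seg, None)
        if child is None:
            break
        trail.append(child)
    matched = len(trail) - 1
    # The whole path was consumed and ended on something callable: call it.
    if matched == len(segments) and callable(trail[-1]):
        return trail[-1](**params)
    # Otherwise reverse direction, looking for a parent method named "default".
    for depth in range(matched, -1, -1):
        handler = getattr(trail[depth], "default", None)
        if callable(handler):
            return handler(*segments[depth:], **params)
    raise LookupError("404: " + path)


# Build the tree root.path.to.parent (hypothetical application objects).
root = Node()
root.path = Node()
root.path.to = Node()
root.path.to.parent = Parent()

result = dispatch(root, "/path/to/parent/child/repr", {"color": "red"})
```

Here the search stops at "child", backs up to the parent object, and calls its default method with ("child", "repr") and color="red", just as described above.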

CherryPy manages customization primarily via an internal Python dict (a key->value map); each key is a URI, and each value is another dict of (name, setting) pairs. This is often specified in an "ini-style" config file.
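A rough sketch of that structure follows. The setting names ("gzip.on", "log.screen") and the get_setting helper are hypothetical, and I am assuming here that a lookup falls back to parent paths when a section lacks a setting; treat it as an illustration of the shape of the data, not as CherryPy's API.

```python
# Each key is a URI path; each value is a dict of (name, setting) pairs.
config = {
    "/": {"gzip.on": False, "log.screen": True},
    "/static": {"gzip.on": True},
}


def get_setting(path, name, default=None):
    """Return the setting for `path`, falling back to parent paths."""
    while True:
        section = config.get(path or "/", {})
        if name in section:
            return section[name]
        if path in ("", "/"):
            return default
        # Trim the last segment: "/static/css" -> "/static" -> "".
        path = path.rsplit("/", 1)[0]
```

With this data, get_setting("/static/css/site.css", "gzip.on") finds the "/static" section, while a request for "log.screen" on the same path falls through to the root section.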

Django

Django might be said to epitomize the "hash tables" approach to handler dispatch, using an ordered set of regular expressions. The urlpatterns object is a tuple of tuples, where each inner tuple is of the form: ("pattern", "handler"). The pattern-handler pairs are evaluated in order until the URI matches a pattern, at which point the handler is looked up (converted from a string to the function which it identifies) and called. By using regular expressions, Django is free to map any set of URI's to a given handler.
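The ordered-pattern idea can be sketched in a few lines. This is not Django's resolver: the handler names, the HANDLERS registry (standing in for the string-to-function lookup), and the resolve function are all assumptions made for the example.

```python
import re


def archive(request, year):
    return "archive for " + year


def article(request, year, slug):
    return "article %s/%s" % (year, slug)


# Stand-in for converting a handler string into the function it names.
HANDLERS = {"blog.archive": archive, "blog.article": article}

# A tuple of ("pattern", "handler") tuples, evaluated in order.
urlpatterns = (
    (r"^blog/(?P<year>\d{4})/$", "blog.archive"),
    (r"^blog/(?P<year>\d{4})/(?P<slug>[\w-]+)/$", "blog.article"),
)


def resolve(path):
    """Return (handler, keyword args) for the first pattern that matches."""
    for pattern, handler_name in urlpatterns:
        match = re.search(pattern, path)
        if match:
            return HANDLERS[handler_name], match.groupdict()
    raise LookupError("404: " + path)


handler, kwargs = resolve("blog/2005/dispatch-notes/")
page = handler(None, **kwargs)
```

Because the patterns are tried in order, a more general pattern placed first would shadow the more specific ones after it.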

Django keeps global customization data in a settings module; each global variable in that module can be used as a named value. Per-request customization, however, is managed entirely within the handlers, in code.

Quixote

Quixote has a mapping strategy apparently designed for maximum flexibility. Applications create a Publisher object, to which the server passes each HTTP request. The default Publisher will call self.root_directory._q_traverse(), passing the value of the PATH_INFO environment variable (split into chunks by each "/" in the URI). The _q_traverse method may then "do what it likes" with that path info; the common Directory object tries to map URI's to local methods, or to _q_traverse methods of successive child objects.
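Here is a toy version of that hand-off, to show how each object decides what to do with the remaining path. Only the _q_traverse name comes from Quixote; the Directory class below, its exports attribute, and the publish function are simplified stand-ins of my own, not the real classes.

```python
class Directory:
    """Toy stand-in for a Quixote-style directory object."""

    exports = ()  # names this object is willing to publish

    def _q_traverse(self, path):
        if not path:
            raise LookupError("404: empty path")
        name, rest = path[0], path[1:]
        if name not in self.exports:
            raise LookupError("404: " + name)
        target = getattr(self, name)
        if rest:
            # More path remains: hand it to the child's own _q_traverse.
            return target._q_traverse(rest)
        return target()


class Help(Directory):
    exports = ("topics",)

    def topics(self):
        return "help topics"


class Root(Directory):
    exports = ("help", "about")
    help = Help()

    def about(self):
        return "about page"


def publish(root_directory, path_info):
    """Split PATH_INFO on "/" and start the traversal at the root."""
    chunks = [p for p in path_info.split("/") if p]
    return root_directory._q_traverse(chunks)
```

Any object in the chain could replace _q_traverse with something entirely different, which is where the flexibility comes from.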

Quixote manages global customizations with a Config class; each attribute of that class is a constant which the server uses to customize its request-response process. The data can be read from a file (more properly, executed from a Python file). But like Django, per-request customization is managed entirely within the handlers, in code.

mod_python

Mod_python plugs right into Apache, and can control much more of the HTTP conversation than most of the other frameworks. Here, let's just talk about the PythonHandler directive; it's used as follows:

<Location /myapp>
    SetHandler python-program
    PythonHandler wsgiref.modpython_gateway::handler
</Location>

That is, the mapping between URI's and handlers is performed with Apache's Location, LocationMatch, Directory, Files, and similar directives. That's usually not the whole story, however; many mod_python applications define few Handlers, or even just one, choosing instead to implement their own additional dispatch and customization layers within those handlers. I believe this tendency is one of the factors which have led so many Python web-framework developers, even the above three, to build on top of mod_python as a deployment option.

The only generic, server-provided means of managing customization data for mod_python is the "PythonOption key value" directive (although other directives exist and may even be inspected; much of the customization in a mod_python application is done entirely within Apache, or via other modules). Each PythonOption applies to the same set of URI's for which the given handler will be invoked.

Customization referents

All of the above designs, as described, have an additional detail in common: they can map multiple URI's to a given handler, but cannot (or tend not to) do the reverse: map a given URI to multiple handlers [1]. This is a surjective, not injective, mapping:

[Figure: surjective, not injective, map. 1=D, 2=B, 3=A, 4=A]

The central question then arises: is per-request customization data bound to the URI's, or to the handlers? Let's answer that for each of our four examples:

  • CherryPy customizations are definitely associated with URI's. Each section in the config dict is directly mapped from a URI key (note that a single key may match multiple URI's).
  • Django customizations are definitely associated with/defined by handlers. Any customization is written into the View objects themselves, in Python.
  • Quixote customizations are also definitely associated with handlers, just like in Django.
  • Mod_python customizations are associated with URI's; each PythonOption (or other Apache directive) applies to a given Location.

In the two frameworks (CherryPy and mod_python) where per-request customization data is associated more closely with URI's, the implementation is declarative as opposed to imperative; the server is free to use the data as it sees fit, in order to meet the perceived goal of the user. In the other two servers, Django and Quixote, the implementation is ad-hoc; developers may choose to use declarative implementations (for example, global constants), or they may "hard-code" the behavior.

This difference shouldn't be a surprise. Django is already a full-stack framework like Ruby on Rails or Spring. CherryPy, in contrast, was designed to (optionally) act as a base component for larger, "full-stack" frameworks like Subway or TurboGears. That is, CherryPy must be customizable both by end-applications and by intermediate frameworks. That would be difficult to achieve with imperative customization. Mod_python goes even further, since Django, Quixote, and even CherryPy can optionally use it to connect to Apache.

Access roles and customization

We need to pause, here, and make a distinction between application developers and application deployers. For many small applications, these two roles are played by the same person (who cannot understand why everyone is so picky about config architecture ;) ). But for larger applications or megaframeworks, the two are very distinct. Frequently, the following division of roles is expected:

                   Imperative                 Declarative
Server code        Framework developer        App developer
Application code   App developer              App developer (often default values)
Config files       Deployer (a Programmer)    Deployer

Much of this is a direct result of the state of programming tools and languages. For example, "imperative server code" is the domain of framework developers, because only they have CVS/SVN privileges; if anyone else makes changes to that code, they fear losing their changes on the next update (although distributed revision control systems like Arch or Bazaar can help ameliorate this a little). Similarly, config files exist outside the CVS/SVN of the application code, and are therefore the deployer's only domain. But note that config files which use the same language as the application code are often assumed to be too difficult for non-programmers to use.

The Customization Maturity Vector

When developing applications (whether new or existing), many developers tend to start with all behaviors embedded in imperative code. After a time, the developers notice a need for varying behaviors, and decide to provide a switch, in code, for it. This may take the form of constant values or subclassing or composition or some other pattern. Once the behavior set is reduced to a small number of variants, control over its customization may be placed in a config file. This results in a fairly predictable vector:

Imperative Code (IC) -> Declarative Code (DC) -> Declarative Text (DT)

Server and framework authors do the same, of course, preferring to start with imperative implementations in code, and moving slowly, but predictably, to declarative implementations in text. And they are right to move slowly; the decisions about where to store and retrieve such data are critical to proper isolation and encapsulation, key ingredients of multi-layered software.

However, what is often not addressed is that the different mechanisms not only implement access control, but directly affect program readability and server architecture, as well. For example, a server author who wishes to make a new, customizable feature available has several options. They may:

  1. IC: Provide a separate method for each possible behavior, expecting the application developer to call the correct one when required.
  2. IC: Provide a default method, which they expect to be overridden in a subclass (or otherwise replaced/superseded; some might call this Declarative, depending on the syntax).
  3. DC: Provide a standard location by which the application developer may declare a method to be called (plugin style).
  4. DC: Provide a method which is configured via a constant; the value may be placed in a variety of scopes.
  5. DT: Provide a method which is configured via a "config entry", whether in code or in a text file.
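The five options above can be sketched side by side for one hypothetical feature, response compression. Every class name, the "[server]" section, and the "compress" key are invented for the example; none of them belong to any of the four servers.

```python
import configparser
import zlib


# 1. IC: one method per behavior; the app developer calls the right one.
class Server1:
    def send_plain(self, body):
        return body

    def send_compressed(self, body):
        return zlib.compress(body)


# 2. IC: a default method meant to be overridden in a subclass.
class Server2:
    def encode(self, body):
        return body

    def send(self, body):
        return self.encode(body)


class CompressingServer(Server2):
    def encode(self, body):
        return zlib.compress(body)


# 3. DC: a standard location where the app developer declares callables.
class Server3:
    encoders = ()  # plugin slot, applied in order

    def send(self, body):
        for encoder in self.encoders:
            body = encoder(body)
        return body


class PluggedServer(Server3):
    encoders = (zlib.compress,)


# 4. DC: behavior switched by a constant.
class Server4:
    COMPRESS = False

    def send(self, body):
        return zlib.compress(body) if self.COMPRESS else body


# 5. DT: behavior switched by a config entry read from text.
class Server5:
    def __init__(self, config_text):
        parser = configparser.ConfigParser()
        parser.read_string(config_text)
        self.compress = parser.getboolean("server", "compress", fallback=False)

    def send(self, body):
        return zlib.compress(body) if self.compress else body
```

Note how the person who can flip the switch changes from option to option: in 1 and 2 it takes new code, in 3 and 4 a one-line declaration, and in 5 an edit to a text file.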

In my limited experience, their decision will most likely be motivated by the roles defined in the previous section, and by the "Maturity Vector" above. I'd like to hear from some other authors about their experience. But for now, let's move on to the architectural implications.

Issues for declarative mechanisms

CherryPy and Apache-configured mod_python share a common weakness: customization data is stored in config files, in a declarative language which is not that of the application code. Developers, however, like to think of customizations as applying to handlers, not to URI's, and they often gripe when the effects of configuration files are divorced in time and space from their corresponding handler code.

For example, CherryPy has a plugin mechanism consisting of filters: classes with a set of methods which are called at various points in the request process. Until version 2.1, all filters were declared in code; each object on the cherrypy.root tree could define its own _cpFilterList attribute, a list of filter instances. Such filters apply to that object and its children, and therefore any URI's which map to that object or its children. CherryPy provided some filters in the standard distribution, but many were created by app developers to meet their specific needs.

Beginning in version 2.1, however, the builtin filters changed their declaration method, from an in-code list to configuration files, and therefore changed from being associated with handlers to being associated with URI's. Interestingly, user-defined filters did not. As a result, developers, especially CherryPy veterans, confuse the two mechanisms. When configuration is bound to URI's instead of handlers, it is easy to delay the actual handler dispatch; in some cases, it may be delayed too long, limiting the customization which developers can perform, both in the handlers themselves and via any filter/plugin hooks provided by the server.

The effect is not limited to the filters themselves. Since some app developers see filters as a catch-all for customization needs, they place customizations in filters which don't really belong there, because they naively expect user-defined filters to be automatically declarable and configurable via the config file (they're not). On the other hand, filters are perceived to be black magic by many app developers, and some behaviors are hard-coded in handlers which belong more properly in filters (Python decorators seem to be a Strange Attractor for this).

Another problem arises because text-config-file declarations in both of these tools map to URI's, and not handlers. Server and app developers are at a loss in those rare cases when they need to allow deployers to specifically customize a handler as opposed to a URI. For example, a deployer for a site which internally redirects an arbitrary number of paths (e.g. http://www.site.com/~[user]/help -> /help) to a common, customizable handler would much rather write a single config entry, but the config file format forces them to write one entry for each virtual path.
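A small sketch shows the difference in bookkeeping. The user names, the "help.theme" setting, and the resolve function are hypothetical; the point is only the number of config entries each binding requires.

```python
def help_page():
    return "help"


users = ["alice", "bob", "carol"]  # hypothetical user names

# Bound to URIs: one section per virtual path, growing with the user list.
uri_config = {"/~%s/help" % user: {"help.theme": "plain"} for user in users}

# Bound to the handler: one entry, no matter how many URIs reach it.
handler_config = {help_page: {"help.theme": "plain"}}


def resolve(path):
    """Toy internal redirect: any /~<user>/help is served by help_page."""
    parts = path.strip("/").split("/")
    if len(parts) == 2 and parts[0].startswith("~") and parts[1] == "help":
        return help_page
    raise LookupError("404: " + path)


# A user added after deployment is covered by the handler-bound entry...
theme = handler_config[resolve("/~dave/help")]["help.theme"]
# ...but has no section at all in the URI-bound config.
missing = "/~dave/help" not in uri_config
```

A URI-keyed format could soften this with wildcard or pattern keys, but that is a feature the config format itself has to provide.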

Issues for imperative mechanisms

Django and Quixote, by contrast, seem to handle all per-request customization in code, imperatively. At least, there is no central, server-managed repository for declarative settings; any app developer could decide to implement their own, and some do. Some simply expect customization to be done directly on the source files.

The first problem with this approach is that deployers (who are not programmers) have a significant psychological, if not educational, hurdle to overcome when configuring their copy of the application. [This isn't a religious treatise, so I'll stop there.]

The second problem is that it's difficult to really extend the framework itself. It's not expected, of course, that anyone would write a framework on top of Django, but neither is there any facility for extending per-request Django behavior, other than in imperative code. That is, if a site needed to gzip only some of their HTTP responses, a developer would have to implement that behavior, for each handler, in imperative code.
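For instance, the gzip case might end up as a decorator applied handler by handler. This is a minimal sketch, assuming handlers return a text body; the gzipped decorator and both handlers are invented names, and real code would also check the request's Accept-Encoding header and set Content-Encoding on the response.

```python
import functools
import gzip


def gzipped(handler):
    """Wrap a handler so its text body is returned gzip-compressed."""

    @functools.wraps(handler)
    def wrapper(*args, **kwargs):
        body = handler(*args, **kwargs)
        return gzip.compress(body.encode("utf-8"))

    return wrapper


@gzipped
def report(request):
    return "large report " * 100


def homepage(request):
    # Left uncompressed: the choice lives in the code of each handler.
    return "small page"
```

A deployer who wants to compress homepage too has to edit the source and add the decorator; there is no setting to change.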

Finally, when customization is bound to handlers instead of URI's, the timing of the handler dispatch is of utmost importance; it must occur very early in the request-handling process, so that any per-request customization data can be available as soon as possible. Often, the mapping from URI to handler must be done absolutely first, so that the server itself has access to such data, even before calling the main handler. Django, for example, resolves the URI to the handler right away; the only serious action it takes first is to call global middleware (which has no per-request config). Quixote takes a hybrid approach; the _q_traverse methods act as "server code" (providing dispatch, and possibly configuration) but are instantiated in application objects.
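The ordering constraint can be illustrated with a toy request loop. The settings attribute, the ROUTES table, and the "log_requests" flag are assumptions made for this sketch and do not describe how Django or Quixote store anything.

```python
def report():
    return "report body"


# Hypothetical handler-bound setting, stored on the function itself.
report.settings = {"log_requests": False}


def homepage():
    return "home"


ROUTES = {"/report": report, "/": homepage}
SERVER_DEFAULTS = {"log_requests": True}


def handle(path, log):
    # 1. Dispatch comes first; nothing handler-specific is known before this.
    handler = ROUTES[path]
    # 2. Only now can the per-handler data be merged over the defaults.
    settings = dict(SERVER_DEFAULTS)
    settings.update(getattr(handler, "settings", {}))
    # 3. Server behavior that depends on that data must wait until here.
    if settings["log_requests"]:
        log.append(path)
    return handler()
```

Anything the server does before step 1, such as global middleware, cannot see the handler's settings at all.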

Conclusion

The choice of where to store per-request customization data in a web application server is never a trivial one. It is constrained by project maturity and social expectations, as well as the architecture of both the server and each application. These concerns often compete; occasionally, they produce unresolvable conflicts.

When designing a web application server, the design of any configuration system is of utmost importance, and will affect the design of the entire server and any applications or frameworks which are built on the server. If a configuration system is not flexible, it may make the server hard (or impossible) to apply to some application domains, or limit the extensibility of the server.

The customization system must also be designed to work in concert with the handler-dispatch mechanism. Customizations of all kinds should be analyzed and explicitly designed to be bound to either handlers or URI's. Pretending that handlers and URI's are synonymous will only hide implementation conflicts, delaying them from design time to deployment time.

Any feedback on this document is welcome. I'd like to learn a lot more about this from other server/framework authors' experiences, and from the analysis of any developers and/or deployers. Add your comments below, and I'll work on keeping this document updated.


[1] Mod_python has the best facilities for doing this, since many Apache handlers are designed to be run in series, or to cascade. An enterprising Quixote developer could, in theory, write a Publisher which called multiple handlers (but then, any of these tools' handlers could implement their own additional layer of dispatch, as well). But the vast majority of applications tend to remain surjective.

3 comments

Comment from: Ben Bangert [Visitor] · http://www.groovie.org/

I'm not entirely sure what the point of this post is. Is it comparing methods, or trying to find strengths in one or the other?

If you're looking for better systems for deployers, I'd suggest looking at Paste. The config file format is rather minimal given the ability it wraps up, though CherryPy will need the ticket resolved that prevents it from functioning properly with multiple webapps in a single process.

The other thing I'm not sure you're accounting for is all of the frameworks you mention will fail in a deployment scenario where the entire web application is deployed under another URL namespace. If you wrote your web application with the presumption it'd "own" the root path, and have hard-coded /archives/2004 in templates and such, the webapp will fail when a user tries to deploy it under /joes/blog URI.

For a web application to survive under this type of deployment, all URL's it generates to its own parts will need to be generated. This was one of my primary goals in porting the Rails system of Routes to Python (and I liked their flexible mapping system that didn't use huge regexps). Zope already has this under control as well since they use URL generation to call webapp resources.

I've read your post a few times now, and I'm still having difficulty deciphering the terms you're using for various contexts. Perhaps that's because I'm not primarily a web framework creator?

I should mention however, that Myghty allows for significant customization of dispatch far beyond what I believe CherryPy or Django offer. As I'm building a mega-framework of sorts on top of Myghty, the customization has allowed me to easily build MVC frameworks and experiment using:

  • A stateless controller class (one controller is instantiated and used persistently)
  • A stateful controller class (controller is instantiated and used for each request)
  • Stateless controller that's automatically instantiated outside the user's code
  • Pipe-lined request dispatch (à la the Axkit XML/XSLT approach)
  • Routes-based dispatching



Myghty's advanced resolver mechanism allows the entire resolver dispatch scheme to be fully customized by the web application developer (should they choose to). Entire custom resolvers can be written and added in, or pipe-lined. The resolvers can even be fine-tuned per the context of the request, for example one rule-set of resolvers could be used during the initial request, and a different set of resolvers could be called during a sub-request.

This entire approach makes for a very, very powerful and customizable base for building custom mega-frameworks on top of it (as I'm doing). Of course, in Myghty's case this means the templating it provides is available, though there's no reason one couldn't build their own mega-framework using Myghty for this feature and using another template language.

Hopefully that helps some with the comparison of URL dispatching/configuration and the customizability it provides a developer.

10/25/05 @ 11:47
Comment from: fumanchu [Member]

Hi, Ben!

The post is descriptive. I'm trying to point out some of the issues that a developer will run into when making a framework.

"If you're looking for better systems for deployers, I'd suggest looking at Paste."

I'm not, really. I'm much more interested in the content of such a file than the management of it. [However, if the management tools, like Paste, are widely considered to be hard to understand, then framework and server authors are naturally going to prefer other ways to allow customization of their code.]

"The other thing I'm not sure you're accounting for is all of the frameworks you mention will fail in a deployment scenario where the entire web application is deployed under another URL namespace."

I disagree, but I think it may be semantics. There are two issues involved in supporting arbitrary mount points. First, dispatch needs to still work (input), and second, URL's need to be written appropriately (output). The latter can be addressed either via relative URL's or via rewriting (interpolation of the correct absolute root). At least three of the four servers I covered have facilities for correct dispatch:

  • One of Django's design goals is to promote fluid roots.
  • Mod_python can depend on its deployers using mod_rewrite in almost every case.
  • It's easy enough to find or write a filter for CherryPy which does the dispatch-rewriting for you.



I can't speak for Quixote on this issue.

Regarding the output (URL generation), I'm only really familiar with CherryPy, which, since it is templating-language agnostic, leaves the relativity of URI's to the application developer (I know all of my URI's are both correct and re-routable). One nice thing about CherryPy is that it doesn't own any URI's itself, and therefore does not have to address this issue in the core.

Regardless of existing support, I think Routes is a great step in providing this functionality to lots of existing frameworks.

"I've read your post a few times now, and I'm still having difficulty deciphering the terms you're using for various contexts. Perhaps that's because I'm not primarily a web framework creator?"

Perhaps, but it's more likely just my intuitive munging of terms. I'll re-read it (now that I've had some sleep) and see if I can't touch it up and be more consistent.

Thanks for the info re: Myghty and Routes. I've actually got another post in draft right now regarding CherryPy, and how I'd like to turn it inside-out in order to support various dispatch schemes. [I don't think it'll start doing response composition anytime soon, though. That is, I don't see CP resolvers handling "sub-requests"; perhaps that's an advantage that Myghty provides, that CP can't count on (not having a blessed template solution).]

10/25/05 @ 12:23
Comment from: Ben Bangert [Visitor] · http://www.groovie.org/

I'm looking forward to better methods to plug-in dispatch schemes for CherryPy. One person already coded up an integration with Routes for CherryPy though I'm sure it has some bugs since he couldn't fully integrate it into CherryPy's back-end. I believe as it currently is, this should be implemented as some sort of filter?

In Routes, I added two functions, intended to be used by the developer in whatever framework they use, that pass the configuration data using a thread-local singleton class. Kind of non-Pythonic (implicit instead of explicit), but it seemed like the best choice.

There are some interesting abilities gained and lost by having the C in MVC taken care of independently of the V. Of course, it will be more modular and a framework developer will be able to pick and choose more parts when they can plug-in a Controller (middle-section) type thing like CherryPy, then a template language, a Model class, etc. On the other hand, more advanced functionality that requires access to the Controller or sub-requests to Controller data might not work at all or as well when they are bound more closely.

This is one of the reasons I'm using Myghty as the VC in MVC for my own little mega-framework. In the case I cite at the end, the functionality desired is to pull the content of what is essentially a sub-request, and insert its content into the template being rendered. It's possible this is do-able in CherryPy, but I can't seem to find it in the docs.

Regarding creating frameworks in general, I'm hoping more people choose mega-framework type things, so I don't have to keep learning tons of new stuff anytime I want a new framework (i.e., build on CherryPy, Myghty, or other extensible frameworks). Transferable skill-sets are nice to have, as TurboGears emphasizes.

10/25/05 @ 14:27
