

Permalink 01:06:38 am, by fumanchu Email , 676 words   English (US)
Categories: CherryPy

CherryPy now handles partial GETs

Partial GET requests are a handy way for a client to request a portion of a resource, rather than the entire resource. HTTP clients send a Range: bytes=start-stop request header, where start and stop are non-negative integers. The HTTP server can then send only those bytes (inclusive) in the response. Multiple byte ranges are also possible. CherryPy has had support for this since, well, earlier this morning (changeset 549, in the current svn trunk).

As Jon Udell noted a while back, Adobe Reader uses Range headers to retrieve a document piecemeal if the server supports it. Here's an example .pdf request, and the server's response (many headers omitted for clarity):

GET /mail.pdf HTTP/1.1

200 OK
Accept-Ranges 'bytes'
Content-Length '6786140'
Content-Type 'application/pdf'
Last-Modified 'Mon, 01 Dec 2003 18:13:02 GMT'

On the first request, the server returns a normal 200 response, and begins outputting the file. However, it also outputs the "Accept-Ranges" response header. This tells the client that partial GET requests (using the Range header) will be honored. Therefore, the client tries it, jumping to the PDF's content catalog at the end of the file:

GET /mail.pdf HTTP/1.1
RANGE 'bytes=6633280-6634323,6633278-6633279,6634324-6636107,6669998-6672067,5710727-5712197,

206 Partial Content
Accept-Ranges 'bytes'
Content-Length '158347'
Content-Type 'multipart/byteranges; boundary='

Now our server has returned a different status-code, "206 Partial Content". Since the client requested multiple byteranges, the response body is a multipart/byteranges entity. Each part inside that multipart body has its own Content-Type and Content-Range headers.

Since that worked, Adobe Reader proceeds to read more ranges:

GET /mail.pdf HTTP/1.1
RANGE 'bytes=6626812-6627960,6785258-6786139'

206 Partial Content
Accept-Ranges 'bytes'
Content-Length '2305'
Content-Type 'multipart/byteranges; boundary='

GET /mail.pdf HTTP/1.1
RANGE 'bytes=6636108-6636167,194633-198372,198373-202575'

206 Partial Content
Accept-Ranges 'bytes'
Content-Length '8392'
Content-Type 'multipart/byteranges; boundary='
Last-Modified 'Mon, 01 Dec 2003 18:13:02 GMT'

...and then appears to have finished, after taking some time to process those responses. However, when I scrolled to the last page in the PDF document, Adobe Reader made an additional request:

GET /mail.pdf HTTP/1.1
RANGE 'bytes=6668978-6669997,6672068-6672327,5698663-5699716,5723626-5724706,

206 Partial Content
Accept-Ranges 'bytes'
Content-Length '36370'
Content-Type 'multipart/byteranges; boundary='
Last-Modified 'Mon, 01 Dec 2003 18:13:02 GMT'

So it seems there's quite a bit of partial-retrieval going on, ultimately making the client more responsive from the user's point-of-view.

How to take advantage of partial GET support in CherryPy

If you want to serve static files so that clients can request portions of them (including resumable downloads!), you only need to use the StaticFilter, which handles Range requests transparently. Here's the script I used to serve mail.pdf:

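import cherrypy

class Root: pass
cherrypy.root = Root()

# The config.update() wrapper and the server.start() call are
# reconstructed; the 'global' section name is an assumption.
cherrypy.config.update({
    'global': {
        'server.environment': 'production',
        'staticFilter.on': True,
        'staticFilter.dir': 'static',
    },
})
cherrypy.server.start()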

Yes, that's really all you need to make an HTTP static file server! The staticFilter.dir (where I saved mail.pdf) is relative to wherever you save the above script. If you'd like it relative to some other absolute path, set that in staticFilter.root.

If you'd like to respond to Range request headers, but you're not serving static files, you can still benefit from CherryPy's core. In cherrypy/_cphttptools, there is a get_ranges(content_length) function which you can use; it examines the current request's Range header, and returns a list of (start, stop) tuples (or just returns None if there's no header). For example, given the Range header:

Range: bytes=30000-40000

The call get_ranges(50000) will return [(30000, 40001)]. Note that we've incremented stop by 1, so that you can use it in a string-slicing operation (byte-ranges are inclusive, but Python's slices have exclusive upper-bounds).

Note also that you need to supply a content-length to get_ranges. It's perfectly valid for a client to request Range: bytes=-500, and expect to receive the last 500 bytes of the resource. So you need to specify the total length in order to do the subtraction.
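To make the slicing convention concrete, here is a minimal sketch of a Range parser that follows the same rules: inclusive byte ranges in, exclusive-stop tuples out, and suffix ranges resolved against the total length. The function name parse_byte_ranges is hypothetical, and this is not CherryPy's get_ranges implementation; it skips unsatisfiable ranges and does no error handling for malformed numbers.

```python
def parse_byte_ranges(header_value, content_length):
    """Return a list of (start, stop) tuples with an exclusive stop, or None.

    Hypothetical helper for illustration only; not CherryPy's get_ranges.
    """
    if not header_value:
        return None
    unit, _, spec = header_value.partition("=")
    if unit.strip().lower() != "bytes":
        return None

    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first:
            start = int(first)
            if start >= content_length:
                continue  # starts past the end: unsatisfiable, skip it
            # "N-" is open-ended and runs to the last byte.
            stop = int(last) if last else content_length - 1
            stop = min(stop, content_length - 1)
            if stop < start:
                continue  # backwards range, skip it
            # Byte ranges are inclusive; add 1 so the tuple works as a slice.
            ranges.append((start, stop + 1))
        elif last:
            # "-N" means the final N bytes, which needs the total length.
            length = min(int(last), content_length)
            if length > 0:
                ranges.append((content_length - length, content_length))
    return ranges


body = bytes(bytearray(range(50)))
for start, stop in parse_byte_ranges("bytes=0-9,-5", len(body)):
    chunk = body[start:stop]  # each tuple plugs straight into a slice
```

With a 50000-byte resource, parse_byte_ranges("bytes=30000-40000", 50000) gives [(30000, 40001)] and parse_byte_ranges("bytes=-500", 50000) gives [(49500, 50000)].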

CherryPy doesn't yet handle the If-Range request header, so feel free to write that and contribute it. ;) ETag support would be nice, too.


Permalink 11:08:42 am, by fumanchu Email , 120 words   English (US)
Categories: CherryPy

Is your code a novel?

Remco, a long-time friend of and contributor to CherryPy, started porting his first app to CP 2.1 today, and had this to say:

<remco>    btw, the code has been cleaned beautifully!!!
[fumanchu] well, thanks
<remco>    respect to all of you who contributed to it
[fumanchu] I tried to make the core process easy to read
<remco>    well, it's still a webserver core, so one has to keep focussed,
<remco>    but compared to 2.0 or prior to that : it reads like a novel! :D
[fumanchu] heh
[fumanchu] 2.0 was a collection of short stories
<remco>    and you can jot that on ur resume

Thanks, I just might do that. :)


Permalink 12:05:47 pm, by fumanchu Email , 785 words   English (US)
Categories: Python, CherryPy

Code Coverage with CherryPy 2.1

CherryPy [1] helps with both the collection and the analysis of coverage data (for a good introduction to code coverage, see [link missing]). Now, I'm a visual learner, so I'm going to skip right to the screenshot and explain it in detail afterward. This is a browser session with two frames: a menu frame on the left and a file frame on the right. Clicking on one of the filenames in the menu will show you that file, annotated with coverage data, in the right-hand frame. This stats-browser is included with CherryPy, and can be used for any application, not just CherryPy or CP apps.

[1] All of this is present in CherryPy 2.1 beta, revision 543. Get it via SVN.

[Screenshot: coverage stats browser session]

Collection of coverage statistics

You need to start by obtaining the coverage.py module, either the original from Gareth Rees, or Ned Batchelder's updated version. Drop it in site-packages.

Covering CherryPy

If you're collecting coverage statistics for CherryPy itself, just run the test suite with the --cover option. Coverage data will be collected in cherrypy/lib/coverage.cache. Example:

mp5:/usr/lib/python2.3/site-packages# python cherrypy/test/ --cover

Covering CherryPy applications

If you write a test suite for your own applications, build it on top of the tools present in cherrypy/test. Here's a minimal example:

import os, sys
localDir = os.path.dirname(__file__)
dbpath = os.path.join(localDir, "db")

from cherrypy.test import test

if __name__ == '__main__':
    # Place our current directory's parent (myapp/) at the beginning
    # of sys.path, so that all imports are from our current directory.
    curpath = os.path.normpath(os.path.join(os.getcwd(), localDir))
    sys.path.insert(0, os.path.normpath(os.path.join(curpath, '../../')))

    # Names of the test modules to run; the rest of the list is elided.
    testList = ["test_directory",
                ]
    testConf = os.path.join(localDir, "test.conf")
    # Assumed reconstruction: pass the list and config to the
    # TestHarness described below.
    test.TestHarness(testList).run(testConf)

By using the TestHarness from CherryPy's test suite, you automatically get access to the --cover command-line arg (and --profile and all the others, too, but that's for another day). Again, coverage data will be collected in cherrypy/lib/coverage.cache by default.

Covering Other Applications

You can use the stats-browser even if you don't use the CherryPy framework to develop your applications. Just use coverage.py as it was originally intended: coverage.py -x

The coverage data, in this case, will be collected by default into a .coverage file. You need to tell the stats-server where this file is (see below). Note that successive manual calls to coverage.py will accumulate stats; the CherryPy test suite, in contrast, erases the data on each run.

Analysis of coverage statistics

Once you've got coverage data sitting around in a file somewhere, it's a snap to have CherryPy serve it in your browser. If you're covering the CherryPy test suite, or your own CP app using CP's TestHarness (see above), just execute:

mp5:/usr/lib/python2.3/site-packages# python cherrypy/lib/

Then, point your browser to http://localhost:8080, and you should see an image similar to the above.

By default, the server reads coverage data from cherrypy/lib/coverage.cache, the same file our collector wrote to by default. If you covered your own application and collected the data in another file, you can supply that path as a command-line arg:

# python cherrypy/lib/ /path/to/.coverage 8088

If you supply a second arg, as in this example, it will change the port for you (from the default of 8080).

You need to stop (Ctrl-C) and restart the server if you recollect coverage data.

The interface

Each file in the menu has coverage stats, and is a hyperlink; click on one, and the file frame will show you the file contents, annotated with coverage data. Lines that start with ">" were touched, and those that start with "!" were not.

Click the "Show %" button to show a "percent covered" figure for each file. This can take a long time if you have lots of files, so it's best to first restrict your view using the directory links. Each directory is a hyperlink; click on one to restrict the menu to that folder only. Percentages below the "threshold" value will be shown in red. The "Show %" feature isn't "sticky", by the way; that is, if you click on a different directory link, or refresh the page, the figures will disappear. That's a necessary evil due to the slowness of generating percentages for many files. Just hit the "Show %" button again as needed.
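As a rough illustration of what a "percent covered" figure means for one annotated file, here is a small sketch. It assumes the annotation format described above (">" for touched lines, "!" for untouched ones) and treats every other line as non-executable. The function name percent_covered is made up for this example, and the stats browser itself may well compute its figures differently.

```python
def percent_covered(annotated_lines):
    """Return the share of executable lines that were touched, as a percentage.

    Assumes ">" marks a touched line and "!" an untouched one; anything
    else (blank lines, comments) is ignored. Returns None when the file
    has no executable lines at all.
    """
    touched = 0
    missed = 0
    for line in annotated_lines:
        if line.startswith(">"):
            touched += 1
        elif line.startswith("!"):
            missed += 1
    total = touched + missed
    if total == 0:
        return None
    return 100.0 * touched / total


sample = [
    "> import os",
    "  # a comment, not counted",
    "> def f():",
    "!     return os.getcwd()",
]
figure = percent_covered(sample)  # two of three executable lines touched
```

Doing this for every file in a large tree means reading and scanning each one, which gives a feel for why the button is opt-in.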

As you can see from the screenshot, I've got some more tests to write! Hope you find this tool as useful as I do. :)


Permalink 02:16:45 pm, by admin Email , 10 words   English (US)
Categories: General

Overheard at lunch

"My face may be copyrighted, but my body's public domain."

Permalink 12:35:52 am, by fumanchu Email , 420 words   English (US)
Categories: IT

Yet Another Firefox Bookmarklet Trick

Bizarro has got to be my favorite comic these days. You'd think King Features would get with the times and give me an RSS/Atom feed to my favorite comic, so I'd be reminded to go see their ads every day. But they haven't, because they get better revenues from dead-tree syndication, I guess, and want to drive me either there or to their new $15.00/year service.

Not only that, but even their "Web 1.0" interface broke recently, on purpose. I used to be able to select a month's worth of recent strips from a dropdown list. Now, that list isn't populated correctly (it always shows only the first week of the previous month, in classic nagware fashion). Fortunately, you can still request the comic for a given date by hacking the HTML form and putting the date in the URL yourself. But that gets old fast: too many keystrokes. I thought about making my own RSS wrapper for the strip, but King Features does a good job of checking referrers.

My next thought was to use a Mozilla keyword search, which allows you to shortcut a lot of the input. The classic example is binding the keyword "google" to a Google search URL with a %s placeholder for the query; then you can type "google bizarro" in your location box and be taken to the full search URL for "bizarro".

Of course, that wouldn't format the date in my case, and if you're like me, the current date in YYYYMMDD format isn't something on the tip of your brain. Fortunately, you can use javascript in your bookmarklets! Here's what I ended up with:

javascript:url='';d=new Date();d=String((d.getFullYear()*100+d.getMonth()+1)*100+d.getDate());d=prompt('Publication Date', d);location.replace(url+d);

which, more readably, is:

    url='';
    d=new Date();
    d=String((d.getFullYear()*100+d.getMonth()+1)*100+d.getDate());
    d=prompt('Publication Date', d);
    location.replace(url+d);

When I click the bookmark, the script prompts me for the date value, and defaults to today's date in the proper format. I'm sure this could be made much more complete, but I'm too lazy for that. Besides, now that I've blogged about my loophole, I'm sure they'll close it soon enough. ;)


Permalink 05:17:39 pm, by fumanchu Email , 662 words   English (US)
Categories: CherryPy, WSGI

Funny how people only goggle over the baby

Simon Willison recently wrote a description of Django's request-handling mechanism. Here's a quick comparison with CherryPy:

When Django receives a request, the first thing it does is create an HttpRequest object (or subclass there-of) to represent that request.

CherryPy has a Request object, as well. However, it's purely an internal object; it doesn't get passed around to application code. One of the design points of CherryPy is that it allows you to write (at least a majority of) your code "like any other app"; this means that input arrives as "simple data" via function parameters, and you use the "return" statement to output data, not custom HTTP-framework objects. Point in favor of CP, IMO.

Once the object has been created, Django performs URL resolution. This is a process by which the URL specified in the request is used to select a view function to handle the creation of a response. A trivial Django application is simply one or more view functions and a configuration file that maps those functions to URLs.

Like almost every other web framework. ;) The only difference from CherryPy is that CP specifies the mapping in code, not config files. Another point to CP.

Having resolved the URL to a view, the view function is called with the request object as the first argument. Other keyword arguments may be passed as well depending on the URL configuration; see the documentation for details.

See above; CherryPy is flatter, and tends to pass data, not internal objects.

The view function is where the bulk of the work happens: it is here that database queries are made, templates loaded, HTML is generated and an HttpResponse object encapsulating the result is created. The view function returns this object, which is then passed back to the environment-specific code (mod_python or WSGI) which passes it back to the browser as an HTTP response.

Again, CherryPy is flatter, expecting you to return data, not objects. You can return a string, an iterable of strings, a file, or None, or yield any of those. Point.
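Here is a toy sketch of that style: handlers are plain methods that take request parameters as ordinary arguments and return a string. The toy_dispatch function is a hypothetical stand-in for the framework's URL-to-method lookup, written only so the example runs without CherryPy installed; it is not how CherryPy resolves URLs internally. Marking a method with exposed = True follows the CherryPy convention for publishing a handler.

```python
class Root:
    def index(self):
        return "Welcome"
    index.exposed = True

    def greet(self, name="world"):
        # Input arrives as a plain keyword argument; output is plain data.
        return "Hello, %s!" % name
    greet.exposed = True

    def secret(self):
        # Not marked as exposed, so the dispatcher refuses to call it.
        return "not reachable"


def toy_dispatch(root, path, params):
    """Hypothetical lookup: map a one-segment path to an exposed method."""
    name = path.strip("/") or "index"
    handler = getattr(root, name, None)
    if handler is None or not getattr(handler, "exposed", False):
        raise LookupError("no exposed handler for %r" % path)
    return handler(**params)


page = toy_dispatch(Root(), "/greet", {"name": "Remco"})
```

Nothing in Root mentions requests or responses, so the same class can be exercised from a test or an interactive session without any HTTP machinery.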

This is all pretty straightforward stuff - but I skipped a couple of important details: exceptions and middleware. The view function doesn't have to return an HttpResponse; it can raise an exception instead, the most common varieties being Http404 (for file-not-found) or Http500 (for server error). In development servers these exceptions will be formatted and sent back to the browser, while in production mode they will be silently logged and a "friendly" error message displayed.

CherryPy also has user-raisable exceptions; however, they're not so low-level. Instead of Http404, you raise cherrypy.NotFound. Instead of Http3xx, you raise cherrypy.HTTPRedirect. I prefer CP's style, of course, but I don't think it's a clear "winner" over Httpxxx exceptions.

Middleware is even more interesting. Django provides three hooks in the above sequence where middleware classes can intervene, with the middleware classes to be used defined in the site's configuration file. This results in three types of middleware: request, view and response (although one middleware class can apply for more than one hook).

CherryPy has 7 such hooks; two are for errors, so let's call it 5 for a more-reasonable comparison. But see my previous post on why static hook points may not be the best approach. Still, 5 is better than 3 :). Point.

The bulk of the above code can be found in the call method of the ModPythonHandler class and the get_response method of the BaseHandler class.

That sounds like an unfortunate violation of the DRY principle. CherryPy isolates all of that nicely via the server.request() function. Are we keeping score yet?

As Django is not yet at a 1.0 release, the above is all subject to potential refactoring future change.

I can't wait to see Django 1.0! Until then, I'm going to take our adolescent web framework and go sulk in my room. ;)

Permalink 11:01:16 am, by fumanchu Email , 239 words   English (US)
Categories: Python, Dejavu, CherryPy

It doesn't take much of a Python to swallow my brain

Lines of code in the four systems I hack on most often (and of which I have a more-or-less complete grasp):

>>> import LOC
>>> LOC.LOC(r"C:\Python23\Lib\site-packages\cherrypy")
>>> LOC.LOC(r"C:\Python23\Lib\site-packages\dejavu")
>>> LOC.LOC(r"C:\Python23\Lib\site-packages\endue")
>>> LOC.LOC(r"C:\Python23\Lib\site-packages\mcontrol")

Something about my brain must naturally fit 7500-to-10000-line chunks of Python code. I certainly experience a strong drive to keep these systems from becoming more complicated, which I usually express via aggressive refactoring.

Some other packages (which I don't hack on) for comparison:

>>> LOC.LOC(r"C:\Python23\Lib\site-packages\colorstudy\SQLObject")
>>> LOC.LOC(r"C:\Python23\Lib\site-packages\paste")
>>> LOC.LOC(r"C:\Python23\Lib\site-packages\PIL")
>>> LOC.LOC(r"C:\Python23\Lib\site-packages\twisted")
>>> LOC.LOC(r"D:\download\Zope-2.8.1-final\Zope-2.8.1-final")

I think the sheer size of paste, twisted, and zope has actively kept me from wanting to dig into them further (but it's certainly not the only factor). Irrational, perhaps, but a natural human response to information overload.

Here's the LOC script if anyone wants to compare packages:

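import os, codecs, re

def LOC(root, pattern=r'^.*\.py$'):
    """Return the total line count of all files under root matching pattern."""
    LOCs = []
    pattern = re.compile(pattern)
    for path, dirs, files in os.walk(root):
        for f in files:
            if pattern.match(f):
                # os.walk already yields paths that start with root
                mod = os.path.join(path, f)
                # codecs.open is inferred from the import above
                lines = len(codecs.open(mod, "rb").readlines())
                LOCs.append(lines)
    return sum(LOCs)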


Permalink 01:12:42 am, by fumanchu Email , 466 words   English (US)
Categories: Python, CherryPy

unittest's bad rap

Phillip J. Eby recently said:

unittest has gotten something of a bad rap, I think. Regardless of whether you like its basic testing facilities or not, it is an extremely good framework. In fact, I think it's one of the most beautiful frameworks in the Python standard library. Its functionality is cleanly separated into four roles, each of which can be filled by any object implementing the right interface: runners, loaders, cases, and results. Because of this exceptionally clean factoring, the basic framework is amazingly extensible.

I couldn't agree more, which is why I recently converted CherryPy's ad hoc test suite to one based on unittest. Although unittest doesn't fit every program out-of-the-box, its components are a breeze to subclass in order to make it fit your problem domain. CherryPy 2.1 now has a nice webtest module (which will only get better as it is extended), specifically designed for simultaneously testing both the client and server sides of an HTTP request. The webtest module:

  • Understands that full web-application test suites can have a lot of components and requests, and tries to keep its "no failures" output to a minimum.
  • Provides simplified page-request functions--all HTTP methods are available (like POST and PUT), and required request headers are set automatically if not provided manually.
  • Runs the page requests in the same process as the web-application server thread(s). This allows errors in the server to be trapped and then reported in the unittest thread (see the server_error function).
  • Automatically reloads all modules that have been imported by each test module.
  • Allows the person running the tests to see the response status, headers, and body, and also the requested URL, whenever an assertion fails. This helps test-first design tremendously.
  • Allows failed assertions to be ignored, so that the current test method may proceed with the remainder of the tests. Since many page requests are idempotent GETs, this can help debugging by collecting more failure information at once.
  • Provides easy regular-expression matching against the response body.
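A couple of the ideas in that list can be sketched with nothing but the standard library: a TestCase subclass that fetches a page in-process, reports the URL, status, headers, and body when an assertion fails, and offers regular-expression matching against the body. All of the names here (fake_app, MiniWebCase, get_page, assert_status, assert_matches_body) are invented for the illustration and are not webtest's API; fake_app stands in for a real server.

```python
import re
import unittest


def fake_app(path):
    """Hypothetical in-process 'server' returning (status, headers, body)."""
    headers = {"Content-Type": "text/plain"}
    if path == "/hello":
        return "200 OK", headers, "Hello, world!"
    return "404 Not Found", headers, "Nothing at %s" % path


class MiniWebCase(unittest.TestCase):
    """A TestCase with page-request helpers and verbose failure messages."""

    def get_page(self, path):
        self.url = path
        self.status, self.headers, self.body = fake_app(path)

    def _context(self):
        # Everything a person needs to see when an assertion fails.
        return "url=%r status=%r headers=%r body=%r" % (
            self.url, self.status, self.headers, self.body)

    def assert_status(self, expected):
        if not self.status.startswith(str(expected)):
            self.fail("expected status %s; %s" % (expected, self._context()))

    def assert_matches_body(self, pattern):
        if re.search(pattern, self.body) is None:
            self.fail("no match for %r; %s" % (pattern, self._context()))


class HelloTest(MiniWebCase):
    def test_hello(self):
        self.get_page("/hello")
        self.assert_status(200)
        self.assert_matches_body(r"Hello, \w+!")

    def test_missing(self):
        self.get_page("/nope")
        self.assert_status(404)


def run_suite():
    # Loader, cases, runner, and result: the four roles, each replaceable.
    suite = unittest.TestLoader().loadTestsFromTestCase(HelloTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Only the case role is customized here; the stock loader, runner, and result objects are reused unchanged, which is the kind of clean factoring the quote describes.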

The webtest module is available, by the way, to be used in other frameworks or applications. There's nothing CherryPy-specific in that module; all of that is found in test\, which wraps webtest to fit the CP test suite. Anyone using webtest for their own framework or app could learn a thing or two from the wrappers there.

It'd be nice if some of PJE's mini-tutorial found its way into the docs for unittest; I said it was "a breeze to subclass", but only by reading most of the unittest source. Oh, and thanks, Phillip, for disallowing anonymous comments on your blog—that made me write up a more extensive post here on my own blog. But could you give me a trackback URL at least? ;)


Permalink 01:57:26 pm, by fumanchu Email , 45 words   English (US)
Categories: CherryPy

New CherryPy Planet


There's a new Planet in the OSS solar system, for posts related to CherryPy! CherryPy is "a pythonic, object-oriented web-development framework", which also happens to be fast, WSGI-ready, and easily extendable. Check out version 2.1, now in beta; you won't be disappointed!

Permalink 10:34:19 am, by fumanchu Email , 1132 words   English (US)
Categories: Python, Dejavu

Where Dejavu fits in the ORM cosmos

Florent Guillaume has written a good survey of his personal ORM options for Zope3. I thought I'd take the opportunity to discuss Dejavu, my own pure-Python ORM, in relation to his analysis.

SQLObject is a pure python mapper, and it is used in Zope 3 through sqlos. It provides declarative mapping for your classes ... any instance ... will actually be stored in SQL behind your back. A unique id is generated for each [object] and also stored in SQL. There are various facilities to provide relations between tables, and map them to lists in the python objects.

SQLStorage is an Archetypes storage that uses SQL as a backend. You can write an Archetypes schema ... SQLStorage relies on the Archetypes UIDs to uniquely identify objects. A Relation field can be used to have relations to other objects. Note that Archetypes objects stored through SQLStorage still have a presence in the ZODB, which means it's not a solution if you totally want to get rid of Data.fs bloat.

...both SQLObject and Archetypes require you to specify in your code that you will use SQL for some objects. Things are not "transparent" from the programmer's point of view.

Now that I haven't been actively hacking on Dejavu for a few months, I've had the opportunity to sit back and think more clearly about what I like in it. One of the things I like most is that the decision about what storage system to use is not made by the application programmer; instead, it's made by the deployer(s) of an app. Therefore:

  • Programmers can remain blissfully unaware of SQL. They should still understand something about storage in the abstract—things like indexing, size hints, and relations—but the Dejavu API completely replaces SQL, custom file drivers, caching mechanisms, and any other storage APIs, transparently.
  • Deployers can make their own decisions about storage mechanisms, on a per-class level. Even if those decisions are religious or political in nature. ;) But if they're not...
  • Deployers can test their dataset using various storage systems to find out which is fastest (or against other metrics).

Objects that you want to persist do have to follow a pretty complicated interface standard, and by far the easiest way to guarantee that is by subclassing dejavu.Unit. So Dejavu does "require you to specify in your code that you will [persist] some objects". I'm not sure whether Florent was defining "transparent" in terms of declaring "SQL" specifically, or declaring "persistence" in general.

Ape (Adaptable Persistence Engine) is a framework to do object-relational mapping at a lower level than the above two solutions, because it works at the ZODB Storage level...the downside is that the structure of the tables in the SQL database is chosen by the framework.

Dejavu doesn't do that. Object properties (and table names) are declared in Python code, just like SQLObject or SQLStorage (although more readably, IMO). But Dejavu doesn't add any properties which you don't explicitly specify; only the "ID" property is present by default. Dejavu does expose a "create_storage" method for turning your in-Python schema into empty database tables (for use when your database isn't already populated).

However Ape's default SQL mapper already tries hard to provide data mapping in a natural way; for instance all properties are made available in a natural manner, object titles or containment relationship are also naturally expressed.

If I understand that paragraph correctly, it's saying that the persistence mechanism doesn't conflict with normal Pythonic code. This is a great strength of Dejavu: objects are still objects, and their properties are gotten and set in a natural way, with reasonably-transparent coercion if necessary:

>>> class Knight(Unit):
        Name = UnitProperty(unicode)
>>> galahad = Knight()
>>> galahad.Name = "Galahad the Chaste"
>>> galahad.Name
u'Galahad the Chaste'
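The coercion on assignment can be approximated with an ordinary Python descriptor. This is only a sketch of the idea under my own assumptions: CoercedProperty is a made-up name, it is not Dejavu's UnitProperty, and it does none of the persistence bookkeeping a real Unit needs. It uses str and int so that it runs on current Pythons, where str plays the role unicode does above.

```python
class CoercedProperty(object):
    """Hypothetical descriptor that coerces assigned values to a type."""

    def __init__(self, type_, default=None):
        self.type_ = type_
        self.default = default

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class, not an instance
        values = obj.__dict__.setdefault("_values", {})
        return values.get(self, self.default)

    def __set__(self, obj, value):
        # Coerce anything that isn't already the declared type; None
        # is passed through to mean "no value".
        if value is not None and not isinstance(value, self.type_):
            value = self.type_(value)
        obj.__dict__.setdefault("_values", {})[self] = value


class Knight(object):
    Name = CoercedProperty(str)
    Quests = CoercedProperty(int)


galahad = Knight()
galahad.Name = "Galahad the Chaste"
galahad.Quests = "3"  # stored as the integer 3
```

Application code just gets and sets attributes; the type declaration lives in one place on the class.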

Hornet is an SQL bridge (alpha software for now) that also works at the ZODB layer. In contrast to Ape, it is much more geared toward existing datasets, or regular SQL access to tables. It requires you to define schemas in code too, but once this is done object access is totally transparent, as in Ape.

Dejavu sounds closer to Hornet than to Ape; one of the original reasons I wrote Dejavu was to get transparent access to a third-party database over which I had no schema control. You need to define the schema in code, but there's no reason that couldn't be automated for most storage systems (databases are the easy part ;) ).

To me Hornet and Ape are promising because I believe they integrate at the right level. Ape is better because it doesn't require explicit schema declaration, and that's very important when you have flexible objects where the users add new fields on document instances (which happens all the time in CPS using FlexibleTypeInformation-based content objects).

There was a time in the development of Dejavu that I wanted to support not only manual schema discovery, but actually discovery-on-the-fly at runtime. I think I only dropped it because I didn't have an explicit need for it; my hunch is that the framework could still support it easily. One of the best things about Dejavu is that it's only being used in production today at a couple of sites; at this stage of development, a good coder could easily step in and hack it into what they want it to be. :)

...I want to store blobs in the filesystem. This can be done at the application level by various products such as CPS DiskFileField, Archetypes ExternalStorage, chrism's Blob product, and others. It can also be done transparently at the ZODB storage level using Ape, and that's a much simpler way to do it.

That could be done with Dejavu with a tiny amount of work; subclass an existing StorageManager and special-case the BLOB fields. You could probably even use one of the above products to do it within Dejavu.

...I plan on using the most flexible (and underused) framework available, which is Ape. I'll write various classifiers, and mappers so that a typical CMF or CPS site can be mapped naturally to SQL. I'll also replace the catalog by an implementation that does SQL queries. This will not be a simple endeavour, but I feel this is the only way to get what I truly need in the end, without sacrifying any flexibility.

Wow, that's a lot of work. Care to leverage the 1000 man-hours I put into Dejavu instead?

If Ape proves to hard to work with (because it imposes its own framework of mapping), I'll go the Hornet way of writing a storage directly, with flexible enough policies for the mapping to SQL or the filesystem (or LDAP for that matter).

I wouldn't mind having a Dejavu StorageManager for LDAP... ;)
