Categories: Python, Cation, CherryPy, Dejavu, WHELPS, WSGI

Pages: << 1 2 3 4 5 6 7 8 9 10 11 >>

01/09/09

Permalink 12:22:51 pm, by fumanchu Email , 369 words   English (US)
Categories: Python

assertNoDiff

I recently had to test output that consisted of a long list of dicts against an expected set. After too many long debugging sessions with copious print statements and lots of hand-comparison, I finally got smart and switched to using Python's builtin difflib to give me just the parts I was interested in (the wrong parts).

With difflib and a little pprint magic, a failing test now looks like this:

Traceback (most recent call last):
  File "C:\Python25\lib\site-packages\app\test\util.py", line 237, in tearDown
    self.assertNoDiff(a, b, "Expected", "Received")
  File "C:\Python25\lib\site-packages\app\test\util.py", line 382, in failIfDiff
    raise self.failureException, msg
AssertionError:
--- Expected

+++ Received

@@ -13,4 +13,3 @@

 {'call': 'getuser101',
 'output': {'first_name': 'Georg',
            'gender': u'Male',
            'last_name': 'Handel',
            ...}}
 {'call': 'getuser1',
 'output': None}
 {'call': 'getuser101',
 'output': {'first_name': 'Georg',
            'gender': u'Male',
            'last_name': 'Handel',
            ...}}
-{'call': 'getuser101',
 'output': {'first_name': 'Georg',
            'gender': u'Male',
            'last_name': 'Handel',
            ...}}

...and I can now easily see that the "Received" data is missing the last dict in the "Expected" list. Here's the code (not exactly what I committed at work, but I think this is even better):

import difflib
from pprint import pformat


class DiffTestCaseMixin(object):

    def get_diff_msg(self, first, second,
                     fromfile='First', tofile='Second'):
        """Return a unified diff between first and second."""
        # Force inputs to iterables for diffing.
        # use pformat instead of str or repr to output dicts and such
        # in a stable order for comparison.
        if isinstance(first, (tuple, list, dict)):
            first = [pformat(d) for d in first]
        else:
            first = [pformat(first)]

        if isinstance(second, (tuple, list, dict)):
            second = [pformat(d) for d in second]
        else:
            second = [pformat(second)]

        diff = difflib.unified_diff(
            first, second, fromfile=fromfile, tofile=tofile)
        # Add line endings.
        return ''.join([d + '\n' for d in diff])

    def failIfDiff(self, first, second, fromfile='First', tofile='Second'):
        """If not first == second, fail with a unified diff."""
        if not first == second:
            msg = self.get_diff_msg(first, second, fromfile, tofile)
            raise self.failureException, msg

    assertNoDiff = failIfDiff

The get_diff_msg function is broken out to allow a test method to call self.fail(msg), where 'msg' might be the join'ed output of several diffs.

Happy testing!

10/26/08

Permalink 08:48:51 pm, by fumanchu Email , 102 words   English (US)
Categories: IT, Python

The Timeless Way of Software

I just finished Chris Alexander's The Timeless Way of Building and I only have one question with regards to software development: why do we laud the patterns and ignore the call to context? In other words: modularity is the enemy of usable software. It also happens to be the enemy of efficient and of readable software. If I see one more networking package or ORM with One Abstraction To Rule Them All I am going to scream. You and I are really good at abstractions. We are freaks. Most people have a hard time with them. Try not to proliferate them unnecessarily.

09/08/08

Permalink 11:36:06 am, by fumanchu Email , 132 words   English (US)
Categories: IT, CherryPy, WSGI

Resources are concepts for the client

...not objects on the server. Roy Fielding explains yet again:

Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”). These views need not have anything to do with how the information is stored on the server, or even what kind of state it ultimately reflects. It just needs to be understandable (and actionable) by the recipient.

I have found this to be the single most-misunderstood aspect of HTTP. Too many people conceive of URI's as just names for files or database objects. They can be so much more.

07/25/08

Permalink 01:20:56 am, by fumanchu Email , 243 words   English (US)
Categories: Python, CherryPy

CherryPy for Python 3000

I'm categorically rejecting the 2to3 approach--for myself anyway. If you think it would help, feel free to:

  1. "upgrade" CP to 2.6, which AFAICT means ensuring it will no longer work in 2.5 or previous versions
  2. turn on the 3k warning
  3. import-and-fix until you don't get any warnings
  4. run-tests-and-fix until you don't get any warnings
  5. run 2to3
  6. import-and-fix until you don't get any errors
  7. run-tests-and-fix until you don't get any errors
  8. wait for bug reports

Me, I'd rather just drop cherrypy/ into 3k and skip steps 1-5.

Changes I had to make so far (http://www.cherrypy.org/changeset/2029):

  • (4) urlparse -> urllib.parse
  • (24) "except (ExcA, ExcB):" -> "except ExcA, ExcB:"
  • (30) "except ExcClass, x:" -> "except ExcClass as x"
  • (22) u"" -> ""
  • (1) BaseHTTPServer -> http.server
  • (1) rfc822 -> email.utils
  • (4) md5.new() -> hashlib.md5()
  • (3) sha.new() -> hashlib.sha1()
  • (3) urllib2 -> urllib
  • (28) StringIO -> io
  • (1) func.func_code -> func.code
  • (6) Cookie -> http.cookies
  • (3) ConfigParser -> configparser
  • (1) rfc822._monthnames -> email._parseaddr._monthnames
  • (105) print -> print()
  • (35) httplib -> http.client
  • (22) basestring -> (str, bytes)
  • (12) items() -> list(items())
  • (46) iteritems() -> items()
  • (11) Thread.get/setName -> get/set_name
  • (1) exec "" -> exec("")
  • (1) 0777 -> 0o777
  • (1) Queue -> queue
  • (1) urllib.unquote -> urllib.parse.unquote

At the moment, I'm a bit blocked importing wsgiserver--we had a nonblocking version of makefile that subclassed the old socket._fileobject class. Looks like the whole socket implementation has changed (and much of it pushed down into C). Not looking forward to reimplementing that.

07/11/08

Permalink 04:16:50 pm, by fumanchu Email , 228 words   English (US)
Categories: WHELPS

Writing High-Efficiency Large Python Systems--Lesson #3: Banish lazy imports

Lazy imports can be done either explicitly, by moving import statements inside functions (instead of at the global level), or by using tools such as LazyImport from egenix. Here's why they suck:

> fetchall (PgSQL:3227)
--> __fetchOneRow (PgSQL:2804)
----> typecast (PgSQL:874)
... 26703 function calls later ...
----< typecast (PgSQL:944): 
      <mx.DateTime.DateTime object for
       '2005-08-15 00:00:00.00' at 2713120>
    3477.321ms

Yes, folks, that single call took 3.4 seconds to run! That would be shorter if I weren't tracing calls, but...ick. Don't make your first customer wait like this in a high-performance app. The solution if you're stuck with lazy imports in code you don't control is to force them to be imported early:

mx.DateTime.Parser.DateFromString('2001-01-01')

Now that same call:

> fetchall (PgSQL:3227)
--> __fetchOneRow (PgSQL:2804)
----> typecast (PgSQL:874)
... 7 function calls later ...
----< typecast (PgSQL:944): 
      <mx.DateTime.DateTime object for
       '2005-08-15 00:00:00.00' at 27cf360>
    1.270ms

That's 1/3815th the number of function calls and 1/2738th the run time. I am not missing decimal points.

Not only is this time-consuming for the first requestor, but lends itself to nasty interactions when a second request starts before the first is done with all the imports. Module import is one of the least-thread-safe parts of almost any app, because people are used to expecting all imports in the main thread at process start.

I'm trying very hard not to rail at length about WSGI frameworks that expect to start up applications during the first HTTP request...but it's so tempting.

07/03/08

Permalink 05:37:31 pm, by fumanchu Email , 319 words   English (US)
Categories: WHELPS

Writing High-Efficiency Large Python Systems--Lesson #2: Use nothing but local syslog

You want to log everything, but you'll find that even in the simplest requests with the fastest response times, a simple file-based access log can add 10% to your response time (which usually means ~91% as many requests per second). The fastest substitute we've found for file-based logging in Python is syslog. Here's how easy it is:

import syslog
syslog.syslog(facility | priority, msg)

Nothing's faster, at least nothing that doesn't require you telling Operations to compile a new C module on their production servers.

"But wait!" you say, "Python's builtin logging module has a SysLogHandler! Use that!" Well, no. There are two reasons why not. First, because Python's logging module in general is bog-slow--too slow for high-efficiency apps. It can make many function calls just to decide it's not going to log a message. Second, the SysLogHandler in the stdlib uses a UDP socket by default. You can pass it a string for the address (probably '/dev/log') and it will use a UNIX socket just like syslog.syslog, but it'll still do it in Python, not C, and you still have all the logging module overhead.

Here's a SysLogLibHandler if you're stuck with the stdlib logging module:

class SysLogLibHandler(logging.Handler):
    """A logging handler that emits messages to syslog.syslog."""
    priority_map = {
        10: syslog.LOG_NOTICE, 
        20: syslog.LOG_NOTICE, 
        30: syslog.LOG_WARNING, 
        40: syslog.LOG_ERR, 
        50: syslog.LOG_CRIT, 
        0: syslog.LOG_NOTICE, 
        }

    def __init__(self, facility):
        self.facility = facility
        logging.Handler.__init__(self)

    def emit(self, record):
        syslog.syslog(self.facility | self.priority_map[record.levelno],
                      self.format(record))

I suggest using syslog.LOCAL0 - syslog.LOCAL7 for the facility arg. If you're writing a server, use one facility for access log messages and a different one for error/debug logs. Then you can configure syslogd to handle them differently (e.g., send them to /var/log/myapp/access.log and /var/log/myapp/error.log).

Permalink 05:02:59 pm, by fumanchu Email , 189 words   English (US)
Categories: WHELPS

Writing High-Efficiency Large Python Systems--Lesson #1: Transactions in tests

Don't write your test suite to create and destroy databases for each run. Instead, make each test method start a transaction and roll it back. We just made that move at work on a DAL project, and the test suite went from 500+ seconds to run the whole thing down to around 100. It also allowed us to remove a lot of "undo" code in the tests.

This means ensuring your test helpers always connect to their databases on the same connection (transactions are connection-specific). If you're using a connection pool where leased conns are bound to each thread, this means rewriting tests that start new threads (or leaving them "the old way"; that is, create/drop). It also means that, rather than running slightly different .sql files per test or module, you instead have a base of data and allow each test to add other data as needed. If your rollbacks work, these can't pollute other tests.

Obviously, this is much harder if you're doing integration testing of sharded systems and the like. But for application logic, it'll save you a lot of headache to do this from the start.

06/27/08

Permalink 12:22:48 pm, by fumanchu Email , 238 words   English (US)
Categories: Python

Specifically designed to be readable

Duncan McGreggor writes:

The Twisted source code was specifically designed to be read
(well, the code from the last two years, anyway).

If that were true, then this would not be ('object' graciously donated by me to the Twisted Foundation):


>>> from twisted.web import http
>>> http.HTTPChannel.mro()
[<class 'twisted.web.http.HTTPChannel'>,
 <class 'twisted.protocols.basic.LineReceiver'>,
 <class 'twisted.internet.protocol.Protocol'>,
 <class 'twisted.internet.protocol.BaseProtocol'>,
 <type 'object'>,
 <class twisted.protocols.basic._PauseableMixin at 0x02ABCB70>,
 <class twisted.protocols.policies.TimeoutMixin at 0x02ABC420>,
]

This wouldn't be true either:

$ grep -R "class I.*" /usr/lib/python2.5/site-packages/twisted | wc -l
287

Interfaces are great for development of a framework, but suck for development with a framework. That must be an older rev on my nix box; that number's grown to 380 in trunk! Not all of those are Interfaces, but most are.

Here's my personal favorite:

for tran in 'Generic TCP UNIX SSL UDP UNIXDatagram Multicast'.split():
    for side in 'Server Client'.split():
        if tran == "Multicast" and side == "Client":
            continue
        base = globals()['_Abstract'+side]
        method = {'Generic': 'With'}.get(tran, tran)
        doc = _doc[side]%vars()
        klass = new.classobj(tran+side, (base,),
                             {'method': method, '__doc__': doc})
        globals()[tran+side] = klass

You've got a tough row to hoe, Twisted devs. Good luck.

06/11/08

Permalink 08:56:58 pm, by fumanchu Email , 335 words   English (US)
Categories: Python, CherryPy

Tracking memory leaks with Dowser

Marius Gedminas just wrote a post on memory leaks. He could have used Dowser to find the leak more easily, I'll bet.

Dowser is a CherryPy application for monitoring and managing object references in your Python program. Because CherryPy runs everything (even the listening HTTP socket) in its own threads, it's a snap to include Dowser in any Python process. Dowser is also very lightweight (because CherryPy is). Here's how I added it to a Twisted project we're using at work:

...
from twisted.application import service
application = service.Application("My Server")
s.setServiceParent(application)

import cherrypy
from misc import dowser
cherrypy.config.update({'server.socket_port': 8088})
cherrypy.tree.mount(dowser.Root())
cherrypy.engine.autoreload.unsubscribe()
# Windows only
cherrypy._console_control_handler.unsubscribe()
cherrypy.engine.start()

from twisted.internet import reactor
reactor.addSystemEventTrigger('after', 'shutdown', cherrypy.engine.exit)

The lines before 'import cherrypy' already existed and are here just for context (this is a Twisted service.tac module). Let's quickly discuss the new code:

  1. import cherrypy and dowser. You don't have to stick dowser into a 'misc' folder; that's just how I checked it out from svn.
  2. Set the port you want CherryPy to listen on; pick a port your app isn't already using if it's a TCP server.
  3. Mount the dowser root.
  4. Turn off the CherryPy autoreloader, and the Ctrl-C handler if you're on Windows. I should really turn that off by default in CP. :/
  5. Start the engine, which starts listening on the port in a new thread among other things.
  6. Tell Twisted to stop CherryPy when it stops.

Then browse to http://localhost:8088/ and you'll see pretty sparklines of all the objects. Change the URL to http://localhost:8088/?floor=20 to see graphs for only those objects which have 20 or more objects.

Then, just click on the 'TRACE' links to get lots more information about each object. See the Dowser wiki page for more details and screenshots.

04/26/08

Permalink 10:38:58 pm, by fumanchu Email , 53 words   English (US)
Categories: IT, Python

Vellum coming along nicely

First, a great aphorism from Zed's (Vellum book](http://www.zedshaw.com/projects/vellum/manual-final.pdf) (pdf):

Makefiles are the C programmer’s REPL and interpreter.

He also asks himself:

What’s the minimum syntax needed to describe a build specification?

I predict good things based on the presence of that question alone.

<< 1 2 3 4 5 6 7 8 9 10 11 >>

October 2017
Sun Mon Tue Wed Thu Fri Sat
 << <   > >>
1 2 3 4 5 6 7
8 9 10 11 12 13 14
15 16 17 18 19 20 21
22 23 24 25 26 27 28
29 30 31        

Search

The requested Blog doesn't exist any more!

XML Feeds

powered by b2evolution