07/25/08

Permalink 01:20:56 am, by fumanchu Email , 243 words   English (US)
Categories: Python, CherryPy

CherryPy for Python 3000

I'm categorically rejecting the 2to3 approach--for myself anyway. If you think it would help, feel free to:

  1. "upgrade" CP to 2.6, which AFAICT means ensuring it will no longer work in 2.5 or previous versions
  2. turn on the 3k warning
  3. import-and-fix until you don't get any warnings
  4. run-tests-and-fix until you don't get any warnings
  5. run 2to3
  6. import-and-fix until you don't get any errors
  7. run-tests-and-fix until you don't get any errors
  8. wait for bug reports

Me, I'd rather just drop cherrypy/ into 3k and skip steps 1-5.

Changes I had to make so far (http://www.cherrypy.org/changeset/2029):

  • (4) urlparse -> urllib.parse
  • (24) "except ExcA, ExcB:" -> "except (ExcA, ExcB):"
  • (30) "except ExcClass, x:" -> "except ExcClass as x:"
  • (22) u"" -> ""
  • (1) BaseHTTPServer -> http.server
  • (1) rfc822 -> email.utils
  • (4) md5.new() -> hashlib.md5()
  • (3) sha.new() -> hashlib.sha1()
  • (3) urllib2 -> urllib
  • (28) StringIO -> io
  • (1) func.func_code -> func.__code__
  • (6) Cookie -> http.cookies
  • (3) ConfigParser -> configparser
  • (1) rfc822._monthnames -> email._parseaddr._monthnames
  • (105) print -> print()
  • (35) httplib -> http.client
  • (22) basestring -> (str, bytes)
  • (12) items() -> list(items())
  • (46) iteritems() -> items()
  • (11) Thread.get/setName -> get/set_name
  • (1) exec "" -> exec("")
  • (1) 0777 -> 0o777
  • (1) Queue -> queue
  • (1) urllib.unquote -> urllib.parse.unquote
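To make a few of those renames concrete, here is a minimal sketch of how the Python 3 spellings look side by side. The function names (summarize, render) and the sample inputs are made up for illustration; none of this is taken from the CherryPy changeset.

```python
import hashlib
import io
from urllib.parse import urlparse, unquote


def summarize(url, payload):
    """Return (host, decoded path, sha1 hex digest) using Python 3 names."""
    parts = urlparse(url)                        # was urlparse.urlparse
    path = unquote(parts.path)                   # was urllib.unquote
    digest = hashlib.sha1(payload).hexdigest()   # was sha.new(payload)
    return parts.netloc, path, digest


def render(mapping):
    """Write sorted key=value lines to an in-memory text buffer."""
    buf = io.StringIO()                          # was StringIO.StringIO
    for key, value in sorted(mapping.items()):   # was iteritems()
        print(key, value, sep="=", file=buf)     # was the print statement
    return buf.getvalue()


try:
    int("not a number")
except (TypeError, ValueError) as exc:           # was "except ExcClass, x:"
    error_name = type(exc).__name__
```

Note that hashlib.sha1() wants bytes in Python 3, which is the same str/bytes split that drives the u"" and basestring items in the list.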

At the moment, I'm a bit blocked importing wsgiserver--we had a nonblocking version of makefile that subclassed the old socket._fileobject class. Looks like the whole socket implementation has changed (and much of it pushed down into C). Not looking forward to reimplementing that.

07/11/08

Permalink 04:16:50 pm, by fumanchu Email , 228 words   English (US)
Categories: WHELPS

Writing High-Efficiency Large Python Systems--Lesson #3: Banish lazy imports

Lazy imports can be done either explicitly, by moving import statements inside functions (instead of at the global level), or by using tools such as LazyImport from egenix. Here's why they suck:

> fetchall (PgSQL:3227)
--> __fetchOneRow (PgSQL:2804)
----> typecast (PgSQL:874)
... 26703 function calls later ...
----< typecast (PgSQL:944): 
      <mx.DateTime.DateTime object for
       '2005-08-15 00:00:00.00' at 2713120>
    3477.321ms

Yes, folks, that single call took nearly 3.5 seconds to run! That would be shorter if I weren't tracing calls, but...ick. Don't make your first customer wait like this in a high-performance app. If you're stuck with lazy imports in code you don't control, the solution is to force them to happen early, at startup:

import mx.DateTime  # at startup; the call below pulls in the lazily imported Parser
mx.DateTime.Parser.DateFromString('2001-01-01')

Now that same call:

> fetchall (PgSQL:3227)
--> __fetchOneRow (PgSQL:2804)
----> typecast (PgSQL:874)
... 7 function calls later ...
----< typecast (PgSQL:944): 
      <mx.DateTime.DateTime object for
       '2005-08-15 00:00:00.00' at 27cf360>
    1.270ms

That's 1/3815th the number of function calls and 1/2738th the run time. I am not missing decimal points.

Not only is this time-consuming for the first requestor, but it also lends itself to nasty interactions when a second request starts before the first is done with all the imports. Module import is one of the least-thread-safe parts of almost any app, because people are used to expecting all imports to happen in the main thread at process start.
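For imports that are merely deferred into function bodies, the same early-import idea can be sketched as a preload step run in the main thread before any request is served. The module list and the preload name below are hypothetical; a lazy-module wrapper like the one traced above may still need a real call (as with DateFromString) to trigger the load.

```python
import importlib
import sys

# Hypothetical list: replace with the modules your request path imports lazily.
EAGER_MODULES = ["json", "decimal", "email.utils"]


def preload(names):
    """Import every named module now, in the calling (main) thread."""
    loaded = []
    for name in names:
        importlib.import_module(name)   # no-op if it is already imported
        loaded.append(name)
    return loaded


if __name__ == "__main__":
    # Run this before starting worker threads or accepting connections.
    preload(EAGER_MODULES)
```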

I'm trying very hard not to rail at length about WSGI frameworks that expect to start up applications during the first HTTP request...but it's so tempting.

07/03/08

Permalink 05:37:31 pm, by fumanchu Email , 319 words   English (US)
Categories: WHELPS

Writing High-Efficiency Large Python Systems--Lesson #2: Use nothing but local syslog

You want to log everything, but you'll find that even in the simplest requests with the fastest response times, a simple file-based access log can add 10% to your response time (which usually means ~91% as many requests per second). The fastest substitute we've found for file-based logging in Python is syslog. Here's how easy it is:

import syslog
syslog.syslog(facility | priority, msg)

Nothing's faster, at least nothing that doesn't require you telling Operations to compile a new C module on their production servers.
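Spelled out with real constants, that call might look like the sketch below. The choice of LOG_LOCAL0 and the helper names are assumptions for illustration only; the syslog module is Unix-only.

```python
import syslog

# Assumption: any of LOG_LOCAL0..LOG_LOCAL7 would do; LOCAL0 is arbitrary.
FACILITY = syslog.LOG_LOCAL0


def make_priority(facility, level):
    """Combine a facility and a severity level into one syslog priority."""
    return facility | level


def log_info(msg):
    """Send an informational message straight to the local syslog daemon."""
    syslog.syslog(make_priority(FACILITY, syslog.LOG_INFO), msg)


def log_error(msg):
    """Send an error message straight to the local syslog daemon."""
    syslog.syslog(make_priority(FACILITY, syslog.LOG_ERR), msg)
```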

"But wait!" you say, "Python's builtin logging module has a SysLogHandler! Use that!" Well, no. There are two reasons why not. First, because Python's logging module in general is bog-slow--too slow for high-efficiency apps. It can make many function calls just to decide it's not going to log a message. Second, the SysLogHandler in the stdlib uses a UDP socket by default. You can pass it a string for the address (probably '/dev/log') and it will use a UNIX socket just like syslog.syslog, but it'll still do it in Python, not C, and you still have all the logging module overhead.

Here's a SysLogLibHandler if you're stuck with the stdlib logging module:

import logging
import syslog

class SysLogLibHandler(logging.Handler):
    """A logging handler that emits messages to syslog.syslog."""
    # Map stdlib logging levels (record.levelno) to syslog priorities.
    priority_map = {
        10: syslog.LOG_NOTICE,   # DEBUG
        20: syslog.LOG_NOTICE,   # INFO
        30: syslog.LOG_WARNING,  # WARNING
        40: syslog.LOG_ERR,      # ERROR
        50: syslog.LOG_CRIT,     # CRITICAL
        0: syslog.LOG_NOTICE,    # NOTSET
        }

    def __init__(self, facility):
        self.facility = facility
        logging.Handler.__init__(self)

    def emit(self, record):
        # A syslog priority is the facility OR'ed with the severity.
        syslog.syslog(self.facility | self.priority_map[record.levelno],
                      self.format(record))

I suggest using syslog.LOG_LOCAL0 through syslog.LOG_LOCAL7 for the facility arg. If you're writing a server, use one facility for access log messages and a different one for error/debug logs. Then you can configure syslogd to handle them differently (e.g., send them to /var/log/myapp/access.log and /var/log/myapp/error.log).
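The two-facility split can be sketched as follows. The handler is a compact copy of the one above so the example runs on its own; the logger names, the build_logger helper, and the LOCAL0/LOCAL1 assignment are assumptions, not anything a particular server requires.

```python
import logging
import syslog


class SysLogLibHandler(logging.Handler):
    """Compact copy of the handler above so this sketch is self-contained."""
    priority_map = {
        logging.WARNING: syslog.LOG_WARNING,
        logging.ERROR: syslog.LOG_ERR,
        logging.CRITICAL: syslog.LOG_CRIT,
    }

    def __init__(self, facility):
        logging.Handler.__init__(self)
        self.facility = facility

    def emit(self, record):
        # Levels missing from the map fall back to LOG_NOTICE.
        level = self.priority_map.get(record.levelno, syslog.LOG_NOTICE)
        syslog.syslog(self.facility | level, self.format(record))


def build_logger(name, facility):
    """Return a logger whose only handler writes to the given facility."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.propagate = False            # keep records out of the root logger
    if not logger.handlers:
        logger.addHandler(SysLogLibHandler(facility))
    return logger


# Hypothetical names: one facility per log stream.
access_log = build_logger("myapp.access", syslog.LOG_LOCAL0)
error_log = build_logger("myapp.error", syslog.LOG_LOCAL1)
```

With that in place, syslogd can route the two facilities to different files without the application knowing any paths.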

Permalink 05:02:59 pm, by fumanchu Email , 189 words   English (US)
Categories: WHELPS

Writing High-Efficiency Large Python Systems--Lesson #1: Transactions in tests

Don't write your test suite to create and destroy databases for each run. Instead, make each test method start a transaction and roll it back. We just made that move at work on a DAL project, and the test suite went from 500+ seconds to run the whole thing down to around 100. It also allowed us to remove a lot of "undo" code in the tests.

This means ensuring your test helpers always connect to their databases on the same connection (transactions are connection-specific). If you're using a connection pool where leased conns are bound to each thread, this means rewriting tests that start new threads (or leaving them "the old way"; that is, create/drop). It also means that, rather than running slightly different .sql files per test or module, you instead have a base of data and allow each test to add other data as needed. If your rollbacks work, these can't pollute other tests.

Obviously, this is much harder if you're doing integration testing of sharded systems and the like. But for application logic, it'll save you a lot of headache to do this from the start.
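Here is a minimal sketch of the pattern using the stdlib's sqlite3 and unittest, assuming a single shared connection and a small base of data. The table, class, and test names are invented for illustration; our DAL project did not use SQLite.

```python
import sqlite3
import unittest


class TransactionalTestCase(unittest.TestCase):
    """Build the schema and base data once; roll back after every test."""

    @classmethod
    def setUpClass(cls):
        # One connection for the whole class: transactions are per-connection.
        cls.conn = sqlite3.connect(":memory:")
        cls.conn.execute(
            "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
        cls.conn.execute("INSERT INTO customer (name) VALUES ('base')")
        cls.conn.commit()

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def tearDown(self):
        # sqlite3 opens a transaction implicitly before each INSERT,
        # so rolling back here discards whatever the test added.
        self.conn.rollback()

    def count(self):
        row = self.conn.execute("SELECT COUNT(*) FROM customer").fetchone()
        return row[0]


class CustomerTests(TransactionalTestCase):
    def test_add_one(self):
        self.conn.execute("INSERT INTO customer (name) VALUES ('alice')")
        self.assertEqual(self.count(), 2)

    def test_add_two(self):
        self.conn.executemany(
            "INSERT INTO customer (name) VALUES (?)", [("bob",), ("carol",)])
        self.assertEqual(self.count(), 3)
```

Both tests see only the base row plus their own inserts, whichever order they run in, and neither needs any "undo" code.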

06/27/08

Permalink 12:22:48 pm, by fumanchu Email , 238 words   English (US)
Categories: Python

Specifically designed to be readable

Duncan McGreggor writes:

The Twisted source code was specifically designed to be read
(well, the code from the last two years, anyway).

If that were true, then this would not be ('object' graciously donated by me to the Twisted Foundation):


>>> from twisted.web import http
>>> http.HTTPChannel.mro()
[<class 'twisted.web.http.HTTPChannel'>,
 <class 'twisted.protocols.basic.LineReceiver'>,
 <class 'twisted.internet.protocol.Protocol'>,
 <class 'twisted.internet.protocol.BaseProtocol'>,
 <type 'object'>,
 <class twisted.protocols.basic._PauseableMixin at 0x02ABCB70>,
 <class twisted.protocols.policies.TimeoutMixin at 0x02ABC420>,
]

This wouldn't be true either:

$ grep -R "class I.*" /usr/lib/python2.5/site-packages/twisted | wc -l
287

Interfaces are great for development of a framework, but suck for development with a framework. That must be an older rev on my *nix box; that number's grown to 380 in trunk! Not all of those are Interfaces, but most are.

Here's my personal favorite:

for tran in 'Generic TCP UNIX SSL UDP UNIXDatagram Multicast'.split():
    for side in 'Server Client'.split():
        if tran == "Multicast" and side == "Client":
            continue
        base = globals()['_Abstract'+side]
        method = {'Generic': 'With'}.get(tran, tran)
        doc = _doc[side]%vars()
        klass = new.classobj(tran+side, (base,),
                             {'method': method, '__doc__': doc})
        globals()[tran+side] = klass

You've got a tough row to hoe, Twisted devs. Good luck.

06/11/08

Permalink 08:56:58 pm, by fumanchu Email , 335 words   English (US)
Categories: Python, CherryPy

Tracking memory leaks with Dowser

Marius Gedminas just wrote a post on memory leaks. He could have used Dowser to find the leak more easily, I'll bet.

Dowser is a CherryPy application for monitoring and managing object references in your Python program. Because CherryPy runs everything (even the listening HTTP socket) in its own threads, it's a snap to include Dowser in any Python process. Dowser is also very lightweight (because CherryPy is). Here's how I added it to a Twisted project we're using at work:

...
from twisted.application import service
application = service.Application("My Server")
s.setServiceParent(application)

import cherrypy
from misc import dowser
cherrypy.config.update({'server.socket_port': 8088})
cherrypy.tree.mount(dowser.Root())
cherrypy.engine.autoreload.unsubscribe()
# Windows only
cherrypy._console_control_handler.unsubscribe()
cherrypy.engine.start()

from twisted.internet import reactor
reactor.addSystemEventTrigger('after', 'shutdown', cherrypy.engine.exit)

The lines before 'import cherrypy' already existed and are here just for context (this is a Twisted service.tac module). Let's quickly discuss the new code:

  1. Import cherrypy and dowser. You don't have to stick dowser into a 'misc' folder; that's just how I checked it out from svn.
  2. Set the port you want CherryPy to listen on; pick a port your app isn't already using if it's a TCP server.
  3. Mount the dowser root.
  4. Turn off the CherryPy autoreloader, and the Ctrl-C handler if you're on Windows. I should really turn that off by default in CP. :/
  5. Start the engine, which starts listening on the port in a new thread among other things.
  6. Tell Twisted to stop CherryPy when it stops.

Then browse to http://localhost:8088/ and you'll see pretty sparklines for every type of object. Change the URL to http://localhost:8088/?floor=20 to see graphs for only those types which have 20 or more instances.

Then, just click on the 'TRACE' links to get lots more information about each object. See the Dowser wiki page for more details and screenshots.

04/26/08

Permalink 10:38:58 pm, by fumanchu Email , 53 words   English (US)
Categories: IT, Python

Vellum coming along nicely

First, a great aphorism from Zed's [Vellum book](http://www.zedshaw.com/projects/vellum/manual-final.pdf) (pdf):

Makefiles are the C programmer’s REPL and interpreter.

He also asks himself:

What’s the minimum syntax needed to describe a build specification?

I predict good things based on the presence of that question alone.

04/25/08

Permalink 07:21:10 pm, by fumanchu Email , 42 words   English (US)
Categories: IT, Python

Epic [FAIL]

You have my permission to name your next test framework, library, or script "epic" and bill it as "more full of [FAIL] than any other test thingy".

Oh, and http://www.google.com/search?q=epic.py

/me looks in Titus' direction...

04/22/08

Permalink 11:58:01 am, by fumanchu Email , 105 words   English (US)
Categories: IT, Python, Dejavu

LINQ in Python

Chui's counterpoint pines:

There are some interesting ideas raised in LINQ that even Python developers ought to explore and consider adopting in a future Python.

Python had all this before LINQ in Dejavu and now Geniusql, and more pythonically, to boot. Instead of:

var AnIQueryable = from Customer in db.Customers where
    Customer.FirstName.StartsWith("m") select Customer;

you can write:

m_names = Customer.select(
    lambda cust: cust.FirstName.startswith("m"))

and instead of:

var AverageRuns =(from Master in this.db.Masters
    select Master.Runs).Average()

you can write:

avgruns = Masters.select(lambda m: avg(m.Runs))
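To show the shape of that lambda-as-query style without a database, here is a purely in-memory analogue. It is not how Dejavu or Geniusql work (they push the lambda's logic down into the store); the select helper, the dataclasses, and the sample rows are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Customer:
    FirstName: str


@dataclass
class Master:
    Runs: int


def select(rows, predicate):
    """Return the rows for which the predicate lambda is true."""
    return [row for row in rows if predicate(row)]


customers = [Customer("mary"), Customer("Bob"), Customer("mike")]
masters = [Master(10), Master(4), Master(7)]

# Filter with a plain lambda, as in the Customer.select() call above.
m_names = select(customers, lambda cust: cust.FirstName.startswith("m"))

# Aggregate with an ordinary Python expression.
avgruns = mean(m.Runs for m in masters)
```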

02/18/08

Permalink 03:49:54 pm, by fumanchu Email , 1195 words   English (US)
Categories: IT

Xtremely Quick Development System

Divmod has a development methodology which they call UQDS. It's billed as lightweight, but I've been using it for 6 months now and find it burdensome. The basic flow of UQDS is: make a ticket, do all work in a branch, get a full review, merge to trunk. In theory, this brings the benefits of peer review and fewer conflicts between developers. In practice, however, I've found the following problems:

  1. Review is great, but is often performed by whoever is "free" rather than by those whom the change most affects. The larger the group affected, the more review happens after the change is merged back to trunk anyway.
  2. Conflicts are not reduced, they're delayed and distributed. One branch still gets merged to trunk before others, and those others often must all forward merge.
  3. The amount of change in a branch grows too large, and often incorporates changes which have little or nothing to do with the original ticket. Comments and whitespace get touched up from one dev to another; small buglets get found and fixed; refactoring happens. Review is more difficult.
  4. Commit history is unreadable. Changesets consist entirely of massive merges from branches and equally massive forward merges.
  5. The review overhead isn't worth the supposed benefits; developers spend too much time in review, and often invent bikeshed problems in order to feel their review time is worthwhile. There is no distinction between the review process for tiny versus massive changesets.
  6. Many problems still leak through the review process, but reverting trunk changesets is harder because they tend to be too large and mix concerns. That is, you often end up reverting half a changeset, which Subversion does not make easy. UQDS tries to avoid reverting multiple changesets at once, but overcorrects by making fractional reverts more common. With Subversion, at least, I'd rather be saddled with the former.
  7. UQDS says, "[developers] can take all the time they need to come up with a good message" when they commit to trunk. In reality, they don't--they just want to finish quickly and move on to the next ticket.

So, here's my answer:

The Xtremely Quick Development System (XQDS)

The goals of XQDS:

  1. Code fast.
  2. Improve the process documentation.

The strategy of XQDS:

  1. Reduce changeset size. This allows everyone to code faster, since they don't need to exert as much mental effort to understand others' changes. It also makes the timeline more readable (and revertible!), since each changeset and its commit message is atomic.
  2. Resolve conflicts immediately.
  3. Apply review resources only where needed.
  4. Make tests easier to run than not run. If your test suite takes so long that you're forced to run it on a remote machine, you've either done something wrong or this methodology is too small for you. Which is fine.
  5. Record all decisions: on tickets where warranted, on trunk changesets regardless.

The flow of XQDS is:

  1. A task is created in an issue tracker.
  2. The task is discussed, on the ticket if possible. If the conversation is long or ephemeral, it may be conducted on IRC or elsewhere; however, at least the final decision(s) should be summarized on the ticket, not on a wiki page, mailing list, or other medium. This step may be ongoing as work is done.
  3. Someone accepts the ticket. If work lapses, others are always free to accept it for themselves.
  4. The worker does work in their local copy of trunk. Branches are created sparingly as needed, and the worker switches their local copy using svn switch.
  5. Work gets merged to trunk as it passes the full test suite in the smallest functional chunks possible. Sometimes "smallest possible" can actually be quite large; however, refactorings, buglets, and doc improvements are committed in their own changesets. In my experience, there's nothing worse than trying to review a changeset that's 5% ticket fix and 95% whitespace changes.
  6. Each commit includes a message that MUST describe the actual change; comments about the reasons or context are desirable but secondary. However, commits that directly influence a ticket always reference the ticket.
  7. Everyone runs svn up on at least a daily basis, and definitely before committing. Conflicts are resolved as needed with each local copy.

Questions:

  • Don't you lose the benefit of branching? What if Jethro checks in broken code and goes to lunch?

    1. Jethro shouldn't be checking in broken code. You fix this by increasing peer shame, not ignoring it or shifting the detection work onto another developer.
    2. This happens even using UQDS. Review by a single developer doesn't catch all possible broken code.
    3. You can still branch whenever you see fit. You're just not required to branch for each ticket.
  • Don't you lose the benefit of review? Review helps avoid conflict and also teaches the reviewee.

    1. In my experience, review is better with XQDS than UQDS. First, you apply review resources where they are needed. Not every change needs review. This allows developers to invest more energy into reviews which merit it.
    2. You raise the bar for review: everyone is expected to watch the timeline, svn up often, and help resolve conflicts.
    3. Senior developers are allowed to touch up junior developers' code in situ. It's always faster and often much more effective to show improvements than explain them.
    4. People who care about, or are particularly skilled in, various aspects of code quality are themselves responsible for upkeep. At some point, you can stop trying to teach Jethro how to avoid contention over concurrent shared mutable resources because he's never going to get it. At some point, you stop trying to make the Twisted developer follow your PEP-8 whitespace conventions because she has no incentive to do so; you fix it yourself and get on with life.
  • What if I need to share unfinished code with another developer? Or switch developers mid-feature? Or switch platforms mid-feature? Or switch features mid-developer?

    • Make a branch. XQDS doesn't prohibit this. But it doesn't mandate it like UQDS does. All that Combinator nonsense can be replaced with svn switch and a simple folder rename when you want to put aside some work for a while.
  • Isn't trunk broken more often?

    • Not if you have a good local test suite. If your test suite can't detect broken code, you need to write more functional (end-to-end) tests.
  • UQDS says it improves information flow to managers. Don't you lose that with XQDS?

    • Not at all. Managers use the same tools (e.g. Trac timelines) that developers use to monitor progress, and with XQDS the timeline is actually readable without all those cross merges. Managers can therefore also write Trac SQL queries, for example, which report developer activity based on changeset activity.
  • Doesn't XQDS require more conflict resolution?

    • Not more, just more immediate. As UQDS notes, this can result in a situation where you feel like you cannot commit your local work because it's now broken due to someone else's changes. But that's a false impression: just svn switch to a new branch and commit your (now broken) changes. You're going to have to resolve the conflict either way; but XQDS allows you to skip making a branch unless you need it.
