
Specifically designed to be readable

06/27/08

12:22:48 pm, by fumanchu
Categories: Python


Duncan McGreggor writes:

The Twisted source code was specifically designed to be read
(well, the code from the last two years, anyway).

If that were true, then this would not be ('object' graciously donated by me to the Twisted Foundation):


>>> from twisted.web import http
>>> http.HTTPChannel.mro()
[<class 'twisted.web.http.HTTPChannel'>,
 <class 'twisted.protocols.basic.LineReceiver'>,
 <class 'twisted.internet.protocol.Protocol'>,
 <class 'twisted.internet.protocol.BaseProtocol'>,
 <type 'object'>,
 <class twisted.protocols.basic._PauseableMixin at 0x02ABCB70>,
 <class twisted.protocols.policies.TimeoutMixin at 0x02ABC420>,
]

This wouldn't be true either:

$ grep -R "class I.*" /usr/lib/python2.5/site-packages/twisted | wc -l
287

Interfaces are great for development of a framework, but suck for development with a framework. That count must come from an older rev on my *nix box; the number has grown to 380 in trunk! Not all of those are Interfaces, but most are.

Here's my personal favorite:

for tran in 'Generic TCP UNIX SSL UDP UNIXDatagram Multicast'.split():
    for side in 'Server Client'.split():
        if tran == "Multicast" and side == "Client":
            continue
        base = globals()['_Abstract'+side]
        method = {'Generic': 'With'}.get(tran, tran)
        doc = _doc[side]%vars()
        klass = new.classobj(tran+side, (base,),
                             {'method': method, '__doc__': doc})
        globals()[tran+side] = klass

You've got a tough row to hoe, Twisted devs. Good luck.

18 comments

Comment from: James Justin Harrell [Visitor] · http://jamesjustinharrell.com/

I had to disable your styling to prevent the bar on the right from sitting on top of the content, and you're talking about readability?

06/27/08 @ 13:52
Comment from: Kumar McMillan [Visitor] Email · http://farmdev.com/

Well put. My first experience with learning Twisted was reading through the official docs, following the example code. I plowed along only to realize that all the examples were wrong since Twisted's core interface had changed so drastically -- none of the examples worked. I thought, ok, I'll just read the source. That's where I got lost. I don't recall what version this was (this was perhaps a year ago?) and I'm sure things are much different now but I think it highlights that for any project to be successful it has to have up-to-date documentation presented in an easy-to-read way. That rules out most auto-generated docs (although I do think pydoctor has a nicer interface than most).

Django is a great example of why documentation makes a project successful. With all due respect, Django itself (the code) has many design flaws. But getting started with it is really easy and everything you need to know is documented.

06/27/08 @ 13:56
Comment from: Tristan Seligmann [Visitor] · http://mithrandi.vox.com/

The two code snippets you mention were written far more than two years ago, so far as I know (I suspect they may form some of the oldest code in Twisted, although I haven't actually checked.) As far as interfaces go, I'm not sure why you say they suck for development /with/ a framework; to me, they form critical documentation that greatly eases my use of the interfaces in question. Interfaces exist, whether you make them explicit or not; by making them explicit, you can attach a lot of useful information which makes the developer's life easier.

06/27/08 @ 13:57
Comment from: Duncan McGreggor [Visitor] · http://oubiwann.blogpsot.com/

"Interfaces are great for development of a framework, but suck for development with a framework."

What?! Why? You give no reason or evidence for this. Before you can, let me give counter evidence and support for my original claim:

Interfaces in Twisted provide code-level documentation (in the vein of "design-by-contract"). If you are going to try to learn Twisted in any depth, the very first thing you should do is read the interface(s) for the classes in which you are interested in understanding. Your own argument is defeated by your first example!

As for your other example:
1) it dates back to 2003, at the very latest, and thus does not meet the criteria for the explicit qualification I gave in my blog (past two years), and
2) it's for dynamic classes! How clear do you expect that to be?! ;-)

06/27/08 @ 14:06
Comment from: Pavel Pergamenshchik [Visitor]

What the heck? The intent of the code in the second example is mind-bogglingly, blindingly clear half-way through the module docstring. Would you prefer to see 13 repetitions of the same class definition in that file?

06/27/08 @ 17:55
Comment from: Lakin Wecker [Visitor] · http://lakin.weckers.net

@Duncan:
I agree that fumanchu hasn't properly backed up his claims. But neither do you. His argument isn't defeated by his first example? Or at least, you haven't given any proof to the contrary.

@Pavel:
No, we'd rather the framework design didn't necessitate such a mind-boggling repetitive definition of classes.

We'd rather the design was simple, not over-designed to the point of needing 380 class definitions to solve the same problem that most threaded programs solve by using a handful of synchronization primitives. Twisted takes its design so seriously that I honestly believe that it has violated most of the principles that brought me to python in the first place. To me, Twisted is the antithesis of python -c 'import this'

My experience with twisted over the past 5 months of full time development using it is: Twisted, and (more importantly) programs that use twisted, are a dense, ugly, complicated, nested, impractical set of unreadable special cases that are hard to explain, and get in the way of understanding or solving the real problem.

06/27/08 @ 21:35
Comment from: Duncan McGreggor [Visitor] · http://oubiwann.blogpsot.com/

@Lakin:

My claim was simple: fumanchu presented the presence of interfaces as something that runs counter to readability. This goes counter to one of the primary uses of interfaces in any language, which is this: they explicitly document how classes should be used and/or created. Therefore the presence of interfaces increases the readability of the source code. Therefore, the evidence that he presented defeats his own argument.

As for the Twisted code that you have experienced: my guess is that the problem lies in one of two areas:
1) it was written by people that do not know Twisted, or
2) you do not understand it

The latter is not meant as an insult; it's just a matter of fact that unless you have a great deal of experience working with concurrency in applications, it's going to look crazy. If you've only spent five months working with threading (and all of the problems inherent in writing threaded code that is actually correct), you'd be in the same boat. It takes five minutes to get started, but years of dedication to master this stuff.

I've seen horrible, horrible looking Twisted code. That's not Twisted's fault; it's the responsibility of whoever wrote it. Hell, when I first started working with Twisted, I wrote stuff that would probably make me weep now. Anyone can take a good framework and give it a bad name by creating trash and not taking responsibility for it. But here's what's more important: Twisted allows you (if you take the time) to write truly elegant code that solves very complicated problems quickly and efficiently.

One last point: if you think that a piece of threaded code is simple, I can show you a piece of Twisted code that does the same thing that is just as simple. If you are working on an asynchronous project and think that threaded code doesn't hide the problem and makes it easier to understand, then nearly every threaded programming expert in the field is shaking their head at you right now: it's just as difficult. One of the most sinister side effects of threaded programming is that a simple API is a breeding ground for propagating misunderstanding into buggy code. There's a quote from one of these (threading) field experts floating around somewhere that says something on the order of "there are three people in the world that really understand threads, and I'm not one of them."

If you think that one model of dealing with concurrency is better because it makes concurrent programming easy, then you really, really need to do more research and push yourself to understand what it is that's really going on. They are different models for dealing with one of the most difficult engineering problems humanity has ever faced. Hating one school of thought because of this (or because of someone else's bad code) doesn't make that engineering any easier. Taking the time to find out the right way to write that code, however, will make that engineering easier.

06/27/08 @ 22:35
Comment from: Lakin Wecker [Visitor] · http://lakin.weckers.net

@Duncan:

First off thanks for the clarification on your initial point. I now understand what you were trying to say.

I never made the claim that threaded programming makes concurrency easier to solve than asynchronous programming. Concurrency is hard, and neither approach makes it significantly easier.

I said that I found threading to provide a more simple interface than Twisted. Twisted is not the only approach to asynchronous programming, and it specifically happens to be one that I can't stand.

Admittedly our project has 4-5 people who aren't twisted experts. But it has at least one who is, and he set the example of how to write the code, and even reviewed and accepted our code. Yet I still find the code to be complicated, ugly, nested and horrid to maintain when compared with the simple easy to use alternative API (based on threading) that was proposed. This is not to say that threaded solutions always provide a more simple and effective way to solve concurrency issues than all other methods.

06/27/08 @ 23:55
Comment from: Duncan McGreggor [Visitor] · http://oubiwann.blogpsot.com/

@Lakin

Understood, and well-said. Thanks for clarifying.

And I also hate horrid, nasty, hard-to-maintain code ;-)

06/28/08 @ 00:32
Comment from: Glyph Lefkowitz [Visitor] Email · http://glyph.twistedmatrix.com/

@fumanchu: Interfaces are documentation. Most people complain that Twisted has too little documentation, or that the documentation is wrong. This is the first complaint I've received that there's too much of it!

Secondly, I wouldn't defend that example as the most straightforward code in the world, but:

1. If that code is the worst you can find in Twisted, I'll be relieved. Really? A loop which creates a few dynamic classes? Do you actually find that hard to read, compared to, say, this? http://cherrypy.org/browser/tags/cherrypy-3.0.3/cherrypy/_cpthreadinglocal.py#L178

2. We're human. Everybody makes mistakes. The fact that we value readable code doesn't make all of our code magically readable. This particular code predates our everything-gets-reviewed-for-readability by several years; I think it might even go back to Twisted 1.0. Duncan specifically had a caveat in his blog post about this.

3. I guess you're complaining about the depth of the inheritance hierarchy in that interactive interpreter example. It's not really clear, but I'll go ahead based on that assumption. In more recent years we've come to prefer composition to inheritance, but inheritance is a way to make code more readable by separating concerns. Do you really want to read about line-termination or buffering logic in the middle of the HTTP implementation? Or timeouts, or producer/consumer setup? Those things are kept elsewhere for a reason; they're used by lots of different code within Twisted. If you want help navigating the inheritance hierarchy and you don't have an IDE to help you with that, we generate cross-referenced API documentation which points at both interfaces and base classes:
http://twistedmatrix.com/documents/8.1.0/api/twisted.web.http.HTTPChannel.html
It can be a helpful code-reading aid if you're trying to follow the logic of a system.

@Kumar: Almost all of our example code dates from before we instituted test-driven development for the whole project, so the examples are largely untested and do sometimes unfortunately break. However, I know that not "all" the examples are wrong, since I've run some of them recently. Have you reported the breakages that you encountered as bugs? http://twistedmatrix.com/trac/newticket

You're absolutely right about Django's documentation, though. I think their documentation is awesome. I hope that one day Twisted's can be as good. And I'm not just saying that now! You can ask Jacob Kaplan-Moss; I told him so at PyCon.

@Lakin: You seem to have some very deep-seated misunderstandings about what Twisted is and does. "380 class definitions" - and I'm assuming you're talking about the 380 interfaces which Bob found - "to solve the same problem that most threaded programs solve by using a handful of synchronization primitives"? What? Let's take a look at a representative interface from Twisted:
http://twistedmatrix.com/documents/8.1.0/api/twisted.words.iwords.IChatClient.html
What synchronization primitive, exactly, provides a multi-protocol multiplexing abstract description of a user's interaction with an instant messaging service? Is that "condition"? I have to say I never really understood the distinction between "condition" and "event" too thoroughly but I am pretty sure that's not what it does.

Twisted needs exactly zero class definitions to solve the same problems that most threaded programs solve by using synchronization primitives; Twisted doesn't have those problems. One might say that the whole point of having something like Twisted is to avoid those problems entirely. Of course, you do get some other problems, for example the issue that you need a serialization protocol to communicate between parallelized system components, but the interesting thing about Twisted is that you discover that you rarely even need parallelized system components, so you have the problem fairly rarely.

"we'd rather the framework design didn't necessitate such a mind-boggling repetitive definition of classes." It doesn't, in any way that I can think of. That particular file was just defining a bunch of different classes for convenience. I personally happen to think that it was a bad idea to do it that way, but the code in question is from twisted.application, which is the deployment and configuration part of Twisted. It's not a core part of the engine; in fact, I don't often write programs which use those particular classes because I tend not to use that style of configuration.

I'm sorry to hear that your experience developing with Twisted has been negative. But this isn't a productive conversation; you're not even making sense, let alone criticising Twisted in a way which would let us deal with the issues that you're having and improve it. You've called Twisted "repetitive", "over-designed", say that it has "violated ... the principles that brought me to Python", "ugly", "complicated", "impractical", "unreadable" and so on and so on. This is a lot of name-calling. But there are no specific examples to back it up. What makes a program ugly? What makes it repetitive? Can you cite a specific piece of functionality in Twisted which encourages or requires repetitiveness, complexity, impracticality? What constitutes "over"-design?

So, since you haven't given me anything specific to counter the name-calling, let me instead do something equally non-specific: name-dropping. Here are some companies whose employees disagree with you, since they use Twisted and love it: Lucasfilm. Dreamworks. Google. Apple. Zenoss. ITA Software. And, as I recently discovered, Rackspace.

I don't like name-calling or name-dropping. I'd prefer to discuss real issues rather than try to posture and demonstrate my authority. After all, you've got a master's degree in computer science and I'm a college dropout, so I'm at a disadvantage there. I'm just giving you this small list of examples in the hope that you will reconsider your strongly biased attitude. If so many people doing such radically different things like it, after all, it must not be that bad. Why has it been so bad for you, and not them? What's different about what you're doing?

As far as "neither approach makes it significantly easier"; actually, I think the research community has come to the conclusion by now that shared-mutable-state multithreading is basically impossible to get right. Event-driven programming doesn't really make the problem any easier, but it does avoid introducing massive difficulties of its own. You might want to have a look at Mark Miller's paper, "Concurrency Among Strangers" - http://www.erights.org/talks/promises/ - specifically section 4, "why not shared-state concurrency".

It's a common mistake to think that both approaches are equivalent. However, one reason you don't like Twisted may simply be that you don't fully appreciate all the problems that you have to track down in threaded approaches to distributed applications, which you don't even need to worry about in the Twisted approach. You're seeing the benefits of threaded code (easier to read, fewer concerns, less indentation) without seeing any of the costs (security problems, race conditions, deadlocks, reduced testability, and so on, and so on).

06/28/08 @ 02:43
Comment from: Thomas Hervé [Visitor]

Judging the overall complexity of a system by its most complex part is pointless. It's like looking at Objects/abstract.c and saying "oh my god you Python guys are stupid I'll never use Python!".

06/28/08 @ 02:45
Comment from: Lakin Wecker [Visitor] · http://lakin.weckers.net

Hi Glyph,

I apologize for both not being clear on what part of Twisted I was criticizing, and for the un-founded name calling. Maybe this will teach me not to post late at night after a particularly frustrating week. :)

I do understand that Twisted's primary purpose is networking, and threads' primary purpose is allowing concurrency. However, in order to allow for parallel processing of network interactions, twisted does provide classes and facilities for supporting concurrency. I would like to say more, but this post is not the appropriate forum.

I am in the middle of an extremely negative experience with Twisted. Not all of it is Twisted's fault. I suppose this sort of outburst on my part has been a long time coming, and I should have begun it in a more constructive manner and in smaller amounts rather than letting my negative experience with it boil to the point of exploding into such a non-productive experience.

As you pointed out, name-calling and name-dropping will get us nowhere, so I will not continue it further.

We've long since stopped discussing the original point of fumanchu's post, and I am largely responsible for starting down that direction.

I would like to respond, but it's probably best served if I did it somewhere else. When I do respond, I promise you that I will make every attempt to turn it into the sort of constructive criticism that either results in my learning of something new, or in a set of issues that you may address within Twisted.

06/28/08 @ 11:27
Comment from: fumanchu [Member] Email

With a deep inheritance chain, code becomes difficult to follow serially. Any typical block of logic requires tracing the code through multiple superclasses, and Python doesn't provide very good static tools for determining which superclass' method is being called; i.e., without actually instantiating the object. Frameworks make this especially difficult since they tend to have objects which are difficult to instantiate in isolation. Twisted is certainly no exception to this.

Twisted, more than any other major Python project I have seen, makes this worse not only by proliferating superclasses, but also by placing them in separate module files and even disparate subpackages. Was the TCP class I wanted (whose name I can't quite recall exactly) in twisted/internet, twisted/application/internet, twisted/internet/posixbase, twisted/internet/protocol, or twisted/internet/tcp? Can't recall.

Composition instead of inheritance actually makes this problem worse in a dynamically-typed language, since it's often not clear where to find the source code for an object that's passed in or set as an attribute. Twisted's callback architecture compounds this since even the execution flow is dynamically composed.

All of which is to say, Twisted is designed to be DRY first, robust second, and readable a distant third because of the constraints imposed by the first two. You may have made a great effort to make Twisted more readable--great! Worth the effort. But it isn't designed to be readable, and IMO cannot be called so without a concerted effort to reduce either its "reuse everywhere" design, or its scope.
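
To make the first paragraph's "which superclass' method is being called" problem concrete, here is a minimal sketch of one partial workaround: walking a class's `__mro__` and reporting the first class whose own namespace defines a given name. The helper `defining_class` and the toy hierarchy are hypothetical illustrations, not Twisted code. The sketch still requires importing the class, and it says nothing about attributes or callbacks attached at runtime, so it does not amount to a static tool.

```python
def defining_class(cls, name):
    """Return the first class in cls's MRO whose own namespace defines name."""
    for klass in cls.__mro__:
        if name in vars(klass):
            return klass
    return None  # not defined anywhere in the hierarchy


# Hypothetical toy hierarchy, loosely shaped like a protocol stack with a mixin.
class BaseProtocol:
    def connectionMade(self):
        pass


class Protocol(BaseProtocol):
    def dataReceived(self, data):
        pass


class TimeoutMixin:
    def setTimeout(self, period):
        pass


class LineReceiver(Protocol):
    def dataReceived(self, data):
        pass

    def lineReceived(self, line):
        pass


class Channel(LineReceiver, TimeoutMixin):
    def lineReceived(self, line):
        pass


# No Channel instance is created; only the class object is inspected.
for name in ("lineReceived", "dataReceived", "connectionMade", "setTimeout"):
    print(name, "->", defining_class(Channel, name).__name__)
```

Even in this five-class toy, the four methods resolve to four different classes, which is the kind of tracing the comment describes.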

06/28/08 @ 12:33
Comment from: Glyph Lefkowitz [Visitor] Email · http://glyph.twistedmatrix.com/

@fumanchu: It seems we may have different definitions of "readability" entirely.

I find DRY to be a critical component of readability. It takes more time and effort to read repetitive code, because you have to learn to recognize large-scale patterns rather than simple names which identify those patterns. I don't think that this is some weird quirk I've picked up. I think this is the reason that the subroutine was invented. In fact, Wikipedia specifically has an entry about the subroutine, stating "There are many advantages to breaking a program up into subroutines, including...improving readability of a program".

Do you want malloc.c inlined on every line of python code? It's constantly being invoked in lots of different ways, and it's very far away from your Python source.

Perhaps saying that Twisted is "designed for readability" is not really precise enough to be a truly useful classification, but it does mean something. More specifically, Twisted requires that all code that is modified (either new or old code being maintained) have accompanying docstrings that describe its intent, and that the reviewer of each change (who did not work on that change) be able to understand it based on that documentation.

06/28/08 @ 14:16
Comment from: fumanchu [Member] Email

@glyph,

Apparently we do have entirely different definitions of "readability". Perhaps when Duncan wrote "source code" he meant that term to include "English statements describing what the Python statements do". I didn't read it that way at first--I assumed he meant the Python statements, since I tend to understand code by reading it directly more than by reading descriptions of it. YMMV. It'd be nice to be good at providing both.

06/28/08 @ 14:44
Comment from: Lakin Wecker [Visitor] · http://lakin.weckers.net

I'm hoping that the discussion I (mistakenly) started in these comments can carry on at http://www.abetterkindofangry.com/2008/07/simple-problem-deserves-simple-solution.html

07/01/08 @ 15:29
Comment from: Colin [Visitor] · http://syllogism.co.za

Uh, what's wrong with interfaces?

Regardless of what you think is wrong with there being a lot of them (Twisted does a lot of stuff, so this is unsurprising) I've very rarely had to actually touch any of them.

07/21/08 @ 03:26
Comment from: fumanchu [Member] Email

Interfaces separate information about a class from the class itself. That means either additional time looking up such info (often in a different file), or keeping such info in one's head. Being lazy and a bear of very little brain, I'm pretty ruthless about reducing the former in duration and the latter in volume. If it took you a moment to parse that last sentence because you had to look at the previous sentence to match "former" and "latter" to their concrete referents, then QED.

07/21/08 @ 15:18
