When Nubi first became a student of Master Fu, he would ask the master many things.
"Which is the finest Web framework?" "Why is an Object Database better than a Relational Database?" "What is the best sorting algorithm?"
To these, the master would make no reply, nor even acknowledge that Nubi had spoken. However, Nubi would ask other masters and their students these same questions, and received many replies.
One day, Nubi asked Fu, "Master, there are many questions I ask of you which you do not answer. Yet I ask these same questions of others, and they have many answers. I begin to doubt, and fear that I will not find Wisdom by following you. Why do you remain silent?"
Master Fu replied, "I do not answer because you have no question."
At this reply, Nubi was greatly distressed, and said, "if I have no question, then how can others answer me?"
Without speaking, Fu hit Nubi over the head with his walking-stick.
Nubi turned himself in a circle in great confusion, and asked, "why did you hit me with your stick just now?"
Fu forced the stick into Nubi's hand. Upon grasping the stick, Nubi was enlightened.
Jeremy Hylton posted a little blurb about wanting "to use composition instead of inheritance". I try to do that, too, and have an additional trick I play. Because function-calls are rather expensive in Python, I sometimes use None instead of a "dummy object" for the null-behavior case.
class Parent(object): pass a = Parent() a.lock = None b = Parent() b.lock = lambda: True
>>> import timeit >>> t1 = timeit.Timer("if a.lock: a.lock()", "import test; a = test.a") >>> t2 = timeit.Timer("if b.lock: b.lock()", "import test; b = test.b") >>> t2.timeit() 1.7819946389834485 >>> t1.timeit() 0.44152985924188926 >>> t3 = timeit.Timer("b.lock()", "import test; b = test.b") >>> t3.timeit() 1.373198548977598
The third case,
t3, is what usually happens with normal delegation: even with a dummy object, a function-call is performed each time. If you have an object, you also have attribute lookup every time. If you have a high ratio of null delegates, it can save some time to use None instead. Using the naive timings above, if you have 50% null delegates, the times should average to
(1.78 + .44) / 2 = 1.11 versus the
1.37 average of dummy objects. If you have, say, 75% nulls, you get
.775 per call, on average.
I've got to add access control to "Mission Control" (our main business application), and I cannot decide where to do it.
There are two main components: a UI (built on a UI framework, Cation), and a Model layer full of business objects (built on an Object-Relational Mapper, Dejavu).
Part of me wants to build the controls into the business objects themselves, since that's really what needs to be protected. I don't care if someone can access a page if they can't see any data on it (or add any data through it). It wouldn't be difficult to enforce, because all the business objects already subclass
dejavu.Unit, the base class for the ORM. The downside is that Dejavu doesn't currently have any concept of users or access controls. Any user context, therefore, would have to be passed into Dejavu from outside. This wouldn't be too hard, since every Dejavu session checks out its own
Sandbox object--I could just stick a reference to the access-control library into the Sandbox. I'd probably end up leaving the default Sandbox as is, and write a new SecureSandbox class, then decide between the two via a config parameter.
Writing the locks into the Model would be nice, because, in general, those change less often than the View. When new features are added, having the locks in the View code means a higher probability of forgetting to test for permission.
It would also be nice for management by end-user admins--they tend to understand "you can see Mission Trips but not Projects" better than long lists of event descriptions like "Show the Zip Code of the Primary Addressee on a Group Scheduling Report (but not a Group Scheduling Calendar)".
But another part of me wants to put the locks in the View. It would allow finer-grained permissions, and allow locks on behaviors which don't involve persisted business objects. It would also be better-integrated, since the View layer already interacts with authentication code in the webserver. There's also a lot of precedent, in that most ORM's don't manage permissions.
I don't get it. I keep seeing people try to help software developers by telling them to stop trying (not trying to pick on you, John). There is a consistent trend to answer tough questions with, "someone has already written that, go use theirs". When did that start, and why do I keep seeing it?
I should probably explain why I find it a hard pill to swallow. Quite simply, I have strong recent experiences where I learned far more about a technology by building it than by using someone else's. This is certainly the case with my two "main" frameworks right now, Cation and Dejavu. Cation is a web application framework, and Dejavu is an object-relational mapper. In both cases, there are already many Python implementations, each with their own quirks and design decisions. Part of what drove me to write my own was that, given a choice between a dozen or more viable Python web app frameworks, I didn't have enough information to decide between them. Writing my own gave me that knowledge. I now feel I have the ability to judge which one I would go with in a new project, because I've hit the roadblocks myself and overcome them. Just as importantly, I know my own data better and can better match up the feature set that supports our requirements.
I'm the only programmer here at Amor, so I have to write a little bit of everything. I have to say that in those cases where I do go with someone else's code, I almost always learn it and live with it, for good or ill. Nothing ever gets fixed, it only gets worked around. Real fixes have a much higher cost--when it's Someone Else's Code, it's Somebody Else's Problem.
If software development is a craft, then we have to allow the apprentices to try things out. Maybe you, dear reader, got all the training you'll ever need in college, where you were forced to write compilers and parsers, full UI's and databases. But I didn't. I'd still like to learn more.
Fu's First Law: Everything's hard 'til you've done it once.
...if you do want great software, you have to let the developers own what they're building.
The converse is also true for me: if you want great software, you have to let the developers build what they're owning.
Do I think this is always true? Of course not. I'm not advocating a glorified "Not Invented Here" syndrome. There's no need for me to write Python all over again (however, I needed to hack bytecode for Dejavu). I don't need to write mod_python all over again (however, I needed to inspect its source code on more than one occasion for Cation). My point is that context is all-important, and recommending to someone that they skip the learning process strikes me as irresponsible, if you do not first inquire about that person's context.
Update: and now I read Jeffrey Shell: "But in my experience, the work required to do a good integration in a lot of these systems seems to be about equal to writing one from scratch."
Due to my trip to El Paso during Palm Sunday week (March 19-25 or thereabouts), I'm sad to say I won't be attending PyCon in Washington, DC. This is extremely unfortunate, since so many Python notables are scheduled to attend this year.
So I'll say "see you some other time" to Facundo Batista, Ian Bicking, JP Calderone, Brett Cannon, Michael Chermside, Andrew Dalke, Fred Drake, Bruce Eckel, Jim Fulton, Raymond Hettinger, Steve Holden, Jim Hugunin, Jeremy Hylton, Bob Ippolito, Andrew Kuchling, Cameron Laird, Ed Leafe, Glyph Lefkowitz, Ted Leung, Alex Martelli, Tim Peters, Armin Rigo, Simon Willison, Ka-Ping Yee, Holger Krekel, Martin Löwis, Guido van Rossum! and many others whose work I have read, respected, and (on occasion) stolen.
I'll work on going next year.
On c.l.p., Leif K-Brooks wrote:
My ideal solution would be an object database (or object-relational mapper, I guess) which provided total transparency in all but a few places, built-in indexing, built-in features for handling schema changes, the ability to create attributes which are required to be unique among other instances, and built-in querying with pretty syntax.
Dejavu meets a bunch of those needs (transparency, indexing, very pretty querying--no schema change tools though), but my initial reaction to reading his list was to say to myself, "Dejavu doesn't really provide 'attributes which are required to be unique'."
Well, it turns out it does, it's just that they're limited: the 'ID' attribute is the only one which can be required to be unique. They do not need to be globally unique--you could have a Customer object with an ID of 23 and a Purchase object with an ID of 23. But you can't have two Customer objects, each with the same ID of 23.
Well, that's not quite true. Rather than having a ton of explicit guards sprinkled throughout the code, the 'requirement to be unique' is enforced by the fact that Sandboxes will simply overwrite one Unit with another if they have the same ID. It's just that this is done lazily, so far a brief period of time, you could have two objects with the same ID.
When you memorize B, your Storage Manager may allocate space for instance B. But you'll only be able to recall either A or B (provider-dependent).
The other built-in guard is the UnitSequencer, which provides ID values when you don't. If you provide your own ID, then the sequencer doesn't get called. Right now, there are two sequencers:
I suppose I could rewrite the Null one to guarantee uniqueness, in order to avoid the corner case I outlined above. I just haven't had the need. I'm a conscientious coder, and haven't needed any more checks than these two.
The final issue for (future) users of Dejavu might be that they need some attribute other than ID to be unique. But I've never run into that, and am reticent to add it unless someone asks for it. So ask, but be prepared to convince me that that's a better approach than redesigning your model, or using a custom StorageManager to fake it, or both.
Dejavu's domain objects (those objects which are part of your model, and which you probably want to persist) all subclass from
dejavu.Unit. When you create a new Unit, you declare that it should be persisted by calling
Unit.memorize(). Plenty of other ORM's (Object-Relational Mappers) don't work this way; instead, every object is automatically persisted unless you specify otherwise. Why doesn't Dejavu do it automatically?
In the most common case, Dejavu uses the Model which you design to create and populate a database. Your Unit classes become tables, Unit Properties translate into columns, and each Unit instance is persisted as a row. In this case, your write the Model and let it drive the database design to match. In some cases, however, you need to integrate with an existing database. Dejavu has been designed to support this, as well (although it requires more work).
Often, a pre-existent database will possesses validation checks, from type and value constraints to referential integrity enforcement. Such guards often require that data be collected and pre-processed by your application before submitting it for persistence; that is, you must get "everything right" before you add a new row to a table. Since these requirements are application-specific, it would be impossible for Dejavu to guess when the object is "ready to persist". As the Zen says, "In the face of ambiguity, refuse the temptation to guess."
It is entirely possible for you to write a subclass of
dejavu.Unit which, at the end of
memorize for you. It would have been much harder and more confusing to do the opposite. If you find yourself not needing the flexibility which explicit calls to
memorize provide (and remembering to call
memorize becomes a burden), feel free to use such a subclass.
Further, Dejavu is designed to work with multiple stores, which may be determined dynamically. I had a use case, for example, to manage Transaction objects, where Income Transactions were placed in one store, and Expense Transactions were placed in another store. A custom Storage Manager proxied the two stores, and decided in which store to persist each Transaction Unit based on the
is_expense attribute. That attribute might not be known at the time an automatic mechanism persisted the object. To make matters worse, the Income store used an autoincrementing ID field, a fact over which I had no control, so I couldn't simply migrate a Transaction from one store to the other as the attribute changed.
Another approach would have been to simply have two separate Unit classes, IncomeTransaction and ExpenseTransaction. However, this would have broken encapsulation--the storage requirements would be intruding on the model design. I very much want the Model to migrate seamlessly on the day when we ditch the Income store, and the integrated Transaction class fits the company's mental and behavioral model better.
All that said, the decision comes down to "explicit is better than implicit". Since Dejavu is a library, it's much easier to provide the functionality in an explicit manner, and allow application developers to make it implicit if they see fit. If the behavior is performed implicitly, "behind the scenes", it is much more difficult to allow developers to then make that explicit when they need to; you end up either exposing class internals, or forcing the developer to reproduce your internal logic, often incompletely.
Update: I stumbled onto Mike Spille's blog, which talks a bit more (and better) about middleware versus libraries.
Ian Bicking recently promoted the idea of a WSGI reference library, to possibly include the following components (among others):
Not being the most careful reader in the world, I was thrown by the phrase, "...collaborating on a ... library of WSGI middleware"; I read the list as if he meant each piece would be a middleware component! Of course he did not intend that. Many of the items in the list are WSGI applications, which sit at the end of the software stack.
Some of the items in the list are, in fact, paraware; that is, they parallel the main application. Traditional programming libraries/toolkits are a common example of paraware. They provide functionality by supplying input and output hooks, which are supplied and consumed by the main application:
mylib.set_value(3) mylib.munge(obj) result = mylib.get('f')
Middleware, on the other hand, handles/munges a content stream, and sits between at least two other components in a software stack. Middleware is a nasty thing in many environments, because each middleware component must manage I/O of all shared objects, in two directions (both its caller and the next component in the stack). In Python, however (and specifically WSGI), the shared objects are all on the same heap, and can all be passed by reference.
I see problems with writing most of these components as middleware. WSGI has a shot at being ubiquitous because it enforces a set of interfaces and a data model; this same enforcement, however, can also be a liability, since WSGI is not yet ubiquitous. As a developer of a web framework, I have a dilemma: I need to provide the same functionality whether my users use WSGI or not. This means I need to write such components as libraries (so they can be used as paraware) and then wrap them with WSGI boilerplate (so they can be used as middleware). This leads to serious code smell. WSGI's callback structure is complicated enough without me introducing library-code wrappers. Perhaps what we need are generic pieces of WSGI middleware which you can init with a callback from your library code. Hmmm.
I've been meaning for a while now to investigate breaking my Cation app framework down into a set of libraries (instead of the monolithic framework it is today). You can see from the dearth of recent checkins that I haven't done any of it yet. Many of those could be added to a WSGI library (some are already on Ian's list). Here are the ones I'd be most interested in writing:
I'd like to do this myself because Cation keeps a list of application developers (usernames), and shows full tracebacks in the browser to developers. Ordinary users get a "pretty" error message, and the full traceback goes into the log only. I'm pretty sure a standard library version wouldn't do that. Integrating the usernames into the error handling logic leads me to want to provide this as paraware, since middleware components are usually not expected to interoperate.
This isn't WSGI-specific, and shouldn't be a candidate for WSGI. But it's something I'd like to rewrite in more of a library style, instead of a framework.
For example, this would assist a WSGI application in fulfilling a request to shut down--each active web request (thread) could be sent a shutdown message and kill itself gracefully from outside the application itself.
Since HTML form values are always received as strings, a standard (but overridable) way to convert them into Python values would be helpful. In the other direction, values need to be coerced to strings, put in the encoding of the server (or of the page), and often quoted safely. Again, this would probably need enough customizability that it would be a poor candidate for middleware, but a good candidate for a set of library calls.
Classic middleware, meeting a need orthogonal to the actual content delivery, and not needing customization or context.
Something that might on occasion need to be specialized, but ultimately a commodity for 90+% of cases. The standard implementation would be nothing more than a pretty interface over simple (but secure) file management.
That's enough for the next year or so Pity I have so many other projects to work on simultaneously.
|<< <||> >>|