Toshio Kuratomi's "How to Build Applications Linux Distributions will Package". As a web framework developer, I found it priceless.
I cringe at a lot of APIs these days, because I see designers making the same mistakes again and again. Perhaps the most pervasive mistake is the dreaded NBU design: Namespacing By Underscores. For example, imagine you have a "Thing" class with a "color" attribute:
t = Thing()
t.color = 'red'
One day, you decide to switch from color names to RGB triples. Why, oh, why is this your first thought?
t.color_r = 255
t.color_g = 0
t.color_b = 0
That's your PHP (or JavaScript, or SQL, or other) experience poking its ugly head in. Yes, PHP 5.3 finally has namespaces, and you can use objects as namespaces in JS if you're diligent. But chances are, you won't.
In Python, namespaces are easy. Use them. Ask yourself what the clearest syntax is, and you might come up with something like this:
>>> t.color = RGB(255, 0, 0)
>>> t.color.red
255
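One way to get that syntax, assuming RGB is a small value type you define yourself (the name is hypothetical, not from any library), is a namedtuple: the three channels then live in the color's own namespace instead of as underscored siblings on Thing.

```python
from collections import namedtuple

# Hypothetical value type: red/green/blue are reached through t.color,
# not flattened onto Thing as color_r, color_g, color_b.
RGB = namedtuple('RGB', ['red', 'green', 'blue'])


class Thing(object):
    """Stand-in for the Thing class above; color can be any object."""

    def __init__(self, color=None):
        self.color = color


t = Thing()
t.color = RGB(255, 0, 0)
print(t.color.red)  # 255
print(t.color)      # RGB(red=255, green=0, blue=0)
```

Because RGB is still a tuple, it compares and unpacks naturally (r, g, b = t.color), so callers that want the raw triple lose nothing.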
This is not just a matter of clever delegation (replacing a str attribute with an RGB object)--it covers all manner of interface design decisions. Here's a recent example from python-dev regarding the email package's interface for Python 3:
message.headers['Subject']
message.bytes_headers['Subject']
Please don't do that--it makes it seem as if the "message" object has a set of headers and a distinct set of bytes_headers. At the least, you've elevated the rare case to be a peer of the common case. A new user of the email module shouldn't see anything about bytes in help(message) or dir(message). Instead, write this:
message.headers['Subject'] = 'A conversation'
message.headers['Subject'].encoding = 'utf-8'
message.headers['Subject'].encode()
Or, if you really prefer bytes over unicode as the canonical representation:
message.headers['Subject'] = b'A conversation'
message.headers['Subject'].encoding = 'utf-8'
message.headers['Subject'].decode()
If message.headers[x].encoding is given a sane default, and you expect the vast majority of users to deal only in unicode, they may never see the .encoding attribute or the .encode method. Good! We've made the common case easy and the rare cases possible.
In addition, we've embellished the Header object with a bytes representation using standard Python conventions: just like Python 3's str object has an encode method, so does our Header object. It's far easier to remember that such a convention applies than to remember a brand-new name like "bytes_headers" or "decoded_headers".
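A minimal sketch of such a Header object, assuming Python 3 and a dict-like headers mapping that wraps every value on assignment. Header, Headers and Message are hypothetical names for illustration, not the email package's actual API, and only the unicode-canonical variant is shown.

```python
class Header(str):
    """A str that remembers which encoding to use for its bytes form."""
    encoding = 'utf-8'  # sane class-level default; most users never touch it

    def encode(self, encoding=None, errors='strict'):
        # Same name and signature as str.encode, but defaulting to this
        # header's own .encoding rather than always 'utf-8'.
        return str.encode(self, encoding or self.encoding, errors)


class Headers(dict):
    """A dict that stores every assigned value as a Header.

    Assumes text values; a bytes-canonical variant would mirror this
    with a bytes subclass and a decode() method.
    """

    def __setitem__(self, key, value):
        dict.__setitem__(self, key, Header(value))


class Message(object):
    """No parallel bytes_headers attribute: bytes come from Header.encode()."""

    def __init__(self):
        self.headers = Headers()


message = Message()
message.headers['Subject'] = 'A conversation'
message.headers['Subject'].encoding = 'utf-8'
print(message.headers['Subject'].encode())  # b'A conversation'
```

Since Header subclasses str, code that only wants text treats it as an ordinary string and never needs to learn that .encoding exists.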
Namespaces are one honking great idea -- let's do more of those! But please not faked via underscores.
For PyCon 2009, I'm giving two talks! One is on extending CherryPy and the other on the innards of Dejavu/GeniuSQL. I think I've finally trimmed them to fit the required time slots (I could easily have made 4-hour talks of each), and I've posted my presentations:
Use the arrow keys or mouse-click to proceed through them. The images don't load as fast over the network as they will when I present, so be patient if you preview them yourself. Also, try to use 1024 x 768 fullscreen--they're laid out specifically for that resolution.
Update: video is now available thanks to the great people who put on PyCon:
JJ Behrens is looking for work. Smart guy. You should hire him.
After you hire him, have a look at my resume and hire me too.
I'm starting a new category here: the Linnaeus Awards. Candidates must be exemplars of Linnaean Taxonomy:
The method, the soul of science, designates at first sight any body in nature in such a way that the body in question expresses the name that is proper to it, and that this name recalls all the knowledge that may, in the course of time, have been acquired about the body thus named: so that in the midst of extreme confusion there is revealed the sovereign order of nature.
So, if you encounter a trout in the wild, you don't call it a "trout". You call it an "Oncorhynchus (mykiss) aguabonita masculinus trescenti-septi-squamatic duodecim-annus-natis...", stuffing every conceivable attribute of the object into its name.
Feel free to nominate additional candidates here or email: linnaeus@aminus.org.
Today's nomination:
Factory-factories are not new. But this one goes a step further with some of its "implementing classes":
...and genuflective attributes like:
But it doesn't stop there; the copy nominates itself:
There is nothing magic about the request processor: It may very well be a POJO. The RequestProcessorFactoryFactory is passed to the AbstractReflectiveHandlerMapping at startup...
Passing a factory-factory to an abstract-anything makes this a good candidate. Using the phrase "nothing magic" with a straight face catapults it to the top.
I recently had to test output that consisted of a long list of dicts against an expected set. After too many long debugging sessions with copious print statements and lots of hand-comparison, I finally got smart and switched to using Python's standard-library difflib module to show me just the parts I was interested in (the wrong parts).
With difflib and a little pprint magic, a failing test now looks like this:
Traceback (most recent call last):
File "C:\Python25\lib\site-packages\app\test\util.py", line 237, in tearDown
self.assertNoDiff(a, b, "Expected", "Received")
File "C:\Python25\lib\site-packages\app\test\util.py", line 382, in failIfDiff
raise self.failureException, msg
AssertionError:
--- Expected
+++ Received
@@ -13,4 +13,3 @@
{'call': 'getuser101',
'output': {'first_name': 'Georg',
'gender': u'Male',
'last_name': 'Handel',
...}}
{'call': 'getuser1',
'output': None}
{'call': 'getuser101',
'output': {'first_name': 'Georg',
'gender': u'Male',
'last_name': 'Handel',
...}}
-{'call': 'getuser101',
'output': {'first_name': 'Georg',
'gender': u'Male',
'last_name': 'Handel',
...}}
...and I can now easily see that the "Received" data is missing the last dict in the "Expected" list. Here's the code (not exactly what I committed at work, but I think this is even better):
import difflib
from pprint import pformat


class DiffTestCaseMixin(object):
    """Mix into a unittest.TestCase to get unified-diff failure messages."""

    def get_diff_msg(self, first, second,
                     fromfile='First', tofile='Second'):
        """Return a unified diff between first and second."""
        # Force inputs to iterables for diffing.
        # Use pformat instead of str or repr to output dicts and such
        # in a stable order for comparison.
        # Dicts are pformat'ed whole: iterating one would yield only its keys.
        if isinstance(first, (tuple, list)):
            first = [pformat(d) for d in first]
        else:
            first = [pformat(first)]
        if isinstance(second, (tuple, list)):
            second = [pformat(d) for d in second]
        else:
            second = [pformat(second)]
        # lineterm='' stops the ---, +++ and @@ header lines from getting
        # a second newline when line endings are added below.
        diff = difflib.unified_diff(
            first, second, fromfile=fromfile, tofile=tofile, lineterm='')
        # Add line endings.
        return ''.join([d + '\n' for d in diff])

    def failIfDiff(self, first, second, fromfile='First', tofile='Second'):
        """If not first == second, fail with a unified diff."""
        if not first == second:
            msg = self.get_diff_msg(first, second, fromfile, tofile)
            raise self.failureException(msg)

    assertNoDiff = failIfDiff
The get_diff_msg method is broken out so that a test method can call self.fail(msg), where msg might be the joined output of several diffs.
Happy testing!
I just finished Chris Alexander's The Timeless Way of Building and I have only one question with regard to software development: why do we laud the patterns and ignore the call to context? In other words: modularity is the enemy of usable software. It also happens to be the enemy of efficient and of readable software. If I see one more networking package or ORM with One Abstraction To Rule Them All, I am going to scream. You and I are really good at abstractions. We are freaks. Most people have a hard time with them. Try not to proliferate them unnecessarily.
Most programs, especially libraries and frameworks, need "configuration". But exactly how to implement that is a murky subject, mostly because the boundary between "configuration" and "code" is itself ill-defined.
So let's try to define it. The first thing you might notice is that the dictionary definition of "configuration", an arrangement of parts, is quite different from what you typically find in a modern "configuration file". For example, take a typical Apache httpd.conf file. It does contain several directives which identify components: LoadModule, for example. But far outweighing these are directives which set attributes, usually on an object or on the system as a whole. Directives like "Listen 80", "ThreadsPerChild 250", and "LogLevel debug", even though they could be implemented via arrangements of pieces, probably aren't. Instead, the values are most likely implemented as permanent cell variables which never appear, or move, or disappear, but instead only change in value. Even the LoadModule directive doesn't really arrange any pieces within a space; it merely identifies and includes them in an abstract set of "loaded modules". One might argue that the Location context directive deals with arrangements of URLs, but those aren't really arranged; they simply exist. You can't rearrange /path/to/resource to be above /path. No, the dictionary definition of "configuration" as "arrangement" is a holdover from our mostly-hardware past, where even the most dynamic "configuration system" still required moving cards and jumpers around in physical space.
There are some notable exceptions, of course, but the vast majority of software on the market today that is "configurable" consists of a fairly static set of objects, plus a formalized means of tweaking a subset of attributes of those objects. The most common exception to this, the "plugin", is also rarely arranged with respect to other plugins or components; instead, it is merely "turned on" or included. I believe this tendency is due to a natural human limitation: we just don't reason about graphs and networks very well yet, at least not nearly as well as we reason about vectors (of instructions) and sets. We feel good when working on serial problems, and bad when working on parallel ones. As Chris Alexander said:
There is little purpose, then, in saying: It would be better if this force did not exist. For if it does exist anyway, designs based on such wishful thinking will fail.
So then, let's discuss ways to implement this kind of "configuration". Again, let's look at Apache's httpd.conf: here we find almost a DSL, in that http_config.h defines functions to tokenize and parse a config file into another representation, a config vector. Then that intermediate structure is transformed into the values actually used, like, say, request_rec->server->keep_alive_timeout.
Or take a typical postgresql.conf file. The entries therein are translated (via the ConfigureNamesBool array) to their internal variable names, and set globally. For example, check_function_bodies is implemented as an extern in guc.h. When a block of code needs to switch on the value of check_function_bodies, it #includes that header and reads the global value directly.
These designs carry with them several problems:
server package, for example, config.c has more lines of code than any other C module except core.c.
There is a way to implement "configuration" as we have defined it above (setting values on named attributes) which avoids the above problems. Rather than defining a layer where external names, types, and values get translated to internal names, types, and values in an ad-hoc mapping, we can define a better translation step by obeying 3 very simple constraints:
For example, if you have an internal "database" object with a "default_encoding" string attribute, the conventional approach might yield a config file entry like:
DatabaseDefaultEncoding: utf8
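To make the contrast concrete, here is a rough sketch of that conventional style, in Python rather than C and with hypothetical names throughout: every external directive needs its own hand-written row mapping it to an internal object, attribute, and type.

```python
class Database(object):
    """Hypothetical internal object with one configurable attribute."""
    default_encoding = 'ascii'


database = Database()

# Ad-hoc translation table: external name -> (object, attribute, type).
# Every new setting means another hand-maintained row here.
DIRECTIVES = {
    'DatabaseDefaultEncoding': (database, 'default_encoding', str),
}


def apply_conventional(lines):
    """Parse 'Name: value' lines by looking each name up in DIRECTIVES."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        name, raw = line.split(':', 1)
        obj, attr, cast = DIRECTIVES[name.strip()]
        setattr(obj, attr, cast(raw.strip()))


apply_conventional(['DatabaseDefaultEncoding: utf8'])
print(database.default_encoding)  # utf8
```

The table is exactly the ad-hoc layer of external-to-internal names described above; it grows with every attribute you expose.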
But if we follow the above constraints, we instead see config entries like this:
database.default_encoding = 'utf8'
We can generalize that to:
(path.to.object).key = value
...and in fact, we can write a simple parser which performs just that mapping. In the simplest implementation, only the set of objects is defined, and the set of keys is open-ended (that is, any attribute of the given object(s) is overridable):
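# config.pairs() yields ("path.to.object.key", value) pairs; configurables
# maps each "path.to.object" name to the live object it configures.
for key, value in config.pairs():
    objname, attrname = key.rsplit(".", 1)
    obj = configurables[objname]
    setattr(obj, attrname, value)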
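A slightly fuller, runnable sketch of that parser, assuming config lines of the form path.to.object.key = value, values written as Python literals (parsed with the standard library's ast.literal_eval), and a hypothetical configurables dict that registers which objects may be configured:

```python
import ast


class Database(object):
    """Hypothetical internal object; any attribute may be overridden."""
    default_encoding = 'ascii'
    pool_size = 5


configurables = {'database': Database()}


def parse_pairs(text):
    """Yield (dotted_key, value) pairs from 'a.b = literal' lines."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, raw = line.split('=', 1)
        # Values are Python literals, so types come for free.
        yield key.strip(), ast.literal_eval(raw.strip())


def configure(text, configurables):
    """Direct Attribute Configuration: (path.to.object).key = value."""
    for key, value in parse_pairs(text):
        objname, attrname = key.rsplit('.', 1)
        obj = configurables[objname]
        setattr(obj, attrname, value)


configure("""
# comments and blank lines are ignored
database.default_encoding = 'utf8'
database.pool_size = 20
""", configurables)

print(configurables['database'].default_encoding)  # utf8
print(configurables['database'].pool_size)         # 20
```

No per-setting table is needed: adding an attribute to Database makes it configurable. The flip side of an open-ended key set is that a typo in a key silently creates a new attribute unless you check hasattr first.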
In contrast to the conventional approach, in the "Direct Attribute Configuration" pattern:
That's enough for now; feel free to expand in the comments.
...not objects on the server. Roy Fielding explains yet again:
Web architects must understand that resources are just consistent mappings from an identifier to some set of views on server-side state. If one view doesn’t suit your needs, then feel free to create a different resource that provides a better view (for any definition of “better”). These views need not have anything to do with how the information is stored on the server, or even what kind of state it ultimately reflects. It just needs to be understandable (and actionable) by the recipient.
I have found this to be the single most-misunderstood aspect of HTTP. Too many people conceive of URIs as just names for files or database objects. They can be so much more.
Interesting timing on Joe Gregorio's latest foray. Lately, I've been URI-ifying all the JSON calls which etsy.com's PHP layer makes to the back end (partly with the hope that that API would be opened up to the public someday, but that isn't currently a business need). Even though the company is bucking the mainstream quite successfully, the site itself is pretty typical e-commerce. Here's what I ended up with.
Out of 298 URIs (not counting querystring variants):
/users/{user_id}/images/)
/collection/subcollection/{id}, and tend to map to a database row (although many of those are virtual, being split in practice over several tables).
/collection/count.
/collection/ids/.
/collection/count_and_limited_ids, which is perhaps a quirk of our architecture; at some point, I'd like to see how splitting these each into 2 calls affects performance.
DELETE /collection/{id}/cache
The URI space for this API is pretty sparse right now--these URIs were simply created to replace an existing RPC-style space of procedure names. And it's essentially a single data point. However, I think it's pretty representative of e-commerce needs for RESTful JSON. One lesson might be that pagination (count and ids) should be addressed in any coordinated protocol effort.
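As a rough illustration of why count and ids pair so naturally for pagination, here is a sketch of the three resources for one collection. It uses an in-memory dict, hypothetical function names, and assumed offset/limit querystring parameters; a real service would route the URIs and send the JSON over HTTP.

```python
import json

# Hypothetical stand-in for one collection's rows, keyed by id.
ROWS = {i: {'id': i, 'title': 'item %d' % i} for i in range(1, 26)}


def get_count():
    """GET /collection/count"""
    return json.dumps(len(ROWS))


def get_ids(offset=0, limit=10):
    """GET /collection/ids/?offset=...&limit=... (params assumed)"""
    ids = sorted(ROWS)[offset:offset + limit]
    return json.dumps(ids)


def get_count_and_limited_ids(offset=0, limit=10):
    """GET /collection/count_and_limited_ids: both answers, one round trip."""
    ids = sorted(ROWS)[offset:offset + limit]
    return json.dumps({'count': len(ROWS), 'ids': ids})


print(get_count())                       # 25
print(get_ids(offset=20, limit=10))      # [21, 22, 23, 24, 25]
print(get_count_and_limited_ids(0, 3))   # {"count": 25, "ids": [1, 2, 3]}
```

The combined resource saves a request per page at the cost of a less orthogonal URI space, which is the trade-off the split-into-2-calls experiment would measure.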