I heartily agree with the bold bits at least:
So dear users: Use my stuff, have fun with it. And letting me know that you're doing is the best reward I can think of. And if you can contribute patches, that's even better.
I'm tired of codes.
By "code" I mean a mapping from one set of terms to another.
That's a code.
Codes are good for reducing space and/or time if you really need to. A 4-byte integer takes less space than an 8-byte+overhead string. 'grep -u' takes less typing than 'grep --unix-byte-offsets'.
Codes are good if names vary. Internationalization techniques like gettext map various translations to a single key (often the phrase as rendered in the dominant language). But even within the same language, people change the names they use to refer to things all the time.
Codes are good at hiding information.
Whether you want them to or not.
That's a problem.
Because codes hide information, the user of the code, whether willing or not, has to have access to the code. That means either a copy of the mapping table in its entirety, or a copy of an algorithm for performing the mapping.
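A minimal sketch of that cost, assuming a made-up order-status field (STATUS_CODES, decode, and the values are hypothetical, invented only for illustration). The coded record means nothing until you fetch the table; the direct record carries its own meaning.

```python
# Hypothetical mapping table: every reader of a coded record needs a copy of it.
STATUS_CODES = {1: "pending", 2: "shipped", 3: "returned"}


def decode(record):
    """Translate a coded record back into names by consulting the table."""
    return {"status": STATUS_CODES[record["status"]]}


coded = {"status": 2}            # unreadable without STATUS_CODES
direct = {"status": "shipped"}   # readable on its own

# The two say the same thing, but only one needs a lookup to be understood.
assert decode(coded) == direct
```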
Some of these you can keep in your head, but there's only so much space in your head.
We invented paper to keep more of these than could fit in our brains, but paper is slower than brain.
We invented computers to manage the volume of paper but 'command --help' and 'man command' are still slower than brain.
If a code exists to save space but space becomes microscopically cheap, do you still need a code?
If a code exists to save a person time but the person wastes more time looking up the code than they save using it, do you still need a code?
If a code exists to save a computer time but the computer wastes more time looking up the code than it saves using it, do you still need a code?
Codes don't just introduce the cost of mapping. They're far worse. Codes take a domain A which has its own syntax (the relationship of one thing in the domain to another thing in the domain) and introduce a second domain B with its own syntax (again, within the domain), in addition to the new semantic (the relationship between domains). (A) <-> (B). That's 3 analytic elements in place of 1.
But it's even worse in information systems since domain A is probably already a set of names with its own set of referents to things in the real world R. So instead of (R) <-> (A) we now have (R) <-> (A) <-> (B). If I have to map from B to R, that's 6 sets of interactions I now need to understand. You're pushing the 7±2 boundary.
Names refer to things.
If you need a name to refer to a name, that's a code.
Codes add complexity.
If you have a choice, expose directly. Many of you don't have a choice because you still think the unix command line is the best UI ever. You need to get out more. (There are UIs out there that can show you the mapping without interrupting your flow.) Many of you don't have a choice because you think in C or some other close-to-the-metal language which requires manual memory management and lots of numbered wires. Please keep using codes there. But please don't bring them into high-level languages: we're better off without them.
My brain is full and I'm tired of being slowed down by codes that return worse than nothing for their investment. Please stop inventing new ones. I know, you're a computer scientist and that's what computer scientists do. But you're good at it (aren't you?), both authoring and using them. Most people aren't. The rest of us are busy.
In The text/plain Semantic Web, Benjamin Carlyle argues:
Perhaps the most important media type in an enterprise-scale or world-scale semantic web or REST architecture is text/plain. The text/plain type is essentially schema free, and allows a representation to be retrieved or PUT with little to no jargon or domain-specific knowledge required by server or client. It is applicable to a wide range of problems and contexts, and is easily consumed by tools and humans alike.
Substitute 'application/json' and that paragraph starts to make sense. But then, the author also says "To my mind the best resource in formatting and processing of simple text-compatible data types can be found in the specification for XML Schema." So perhaps I shouldn't be too hard on the poor refugee. He comes tantalizingly close:
Part of the problem that emerges is that text/plain is not specific enough. It doesn't have sub-types that are clearly tied to a specification document or standards body. This makes interoperability a potential nightmare of heuristic detection.
Another problem with using text/plain in its bare form is its default assumption of a US-ASCII character type. This can lead to obvious problems in a modern internationalised world.
Both of which JSON solves nicely: it has basic types and SHALL be encoded with a Unicode encoding (UTF-8 by default).
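A quick sketch of both points using Python's standard json module (the document contents here are made up for illustration): the basic types survive a round trip without any heuristic detection, and non-ASCII text travels as UTF-8.

```python
import json

# An invented document exercising JSON's basic types plus a non-ASCII string.
doc = {
    "name": "Zoë",
    "count": 3,
    "ratio": 0.5,
    "active": True,
    "tags": ["a", "b"],
    "note": None,
}

# ensure_ascii=False keeps "ë" as a real character; we then encode as UTF-8.
payload = json.dumps(doc, ensure_ascii=False).encode("utf-8")

# The receiver recovers typed values, not just text to be guessed at.
restored = json.loads(payload.decode("utf-8"))
assert restored == doc
assert isinstance(restored["count"], int)
assert isinstance(restored["active"], bool)
```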
Again, ideally we would be making use of a well-defined standards body to own and maintain the media types used to communicate very basic information.
The IANA and IETF sound like well-defined standards bodies to me...
Perhaps the clearest indication that you are overusing text/plain is that you are experiencing an explosion in hyperlinks. When you start to need a document to provide links for consumers to find these text/plain-centric resources, you should probably consider incorporating the information directly into these documents themselves.
A. Hyperlinks are a Good Thing.
B. You should first consider providing hyperlinks in a machine-discoverable fashion; text/plain is not it. A nice version of "it" is using XHR to GET/PUT application/json resources.
C. Allow comments on your blog.
Toshio Kuratomi's How to Build Applications Linux Distributions will Package. As a web framework dev, this was priceless.
I cringe at a lot of APIs these days, because I see designers making the same mistakes again and again. Perhaps the most pervasive mistake is the dreaded NBU design: Namespacing By Underscores. For example, imagine you have a "Thing" class with a "color" attribute:

    t = Thing()
    t.color = 'red'

One day, you decide to switch from color names to RGB triples. Why, oh, why is this your first thought?

    t.color_r = 255
    t.color_g = 0
    t.color_b = 0

In Python, namespaces are easy. Use them. Ask yourself what the clearest syntax is, and you might come up with something like this:

    >>> t.color = RGB(255, 0, 0)
    >>> t.color.red
    255

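One way to get there, as a rough sketch: RGB and Thing below are stand-ins I've made up to match the usage above, and a namedtuple is only one of several reasonable choices for the color type.

```python
from collections import namedtuple

# Hypothetical helper type; the example above only shows how it is used.
RGB = namedtuple("RGB", ["red", "green", "blue"])


class Thing:
    """Stand-in for the Thing class from the example."""

    def __init__(self, color=None):
        self.color = color


t = Thing()
t.color = RGB(255, 0, 0)

# The components live in the color's own namespace, not in underscore-prefixed names on Thing.
assert t.color.red == 255
assert t.color == (255, 0, 0)
```

The Thing class still has exactly one "color" attribute; everything about red, green, and blue belongs to the object stored there.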
This is not just a matter of clever delegation (replacing a str attribute with an RGB object)--it covers all manner of interface design decisions. Here's a recent example from python-dev regarding the email package's interface for Python 3:
Please don't do that--it makes it seem as if the "message" object has a set of headers and a distinct set of bytes_headers. At the least, you've elevated the rare case to be a peer of the common case. A new user of the email module shouldn't see anything about bytes in help(message) or dir(message). Instead, write this:

    message.headers['Subject'] = 'A conversation'
    message.headers['Subject'].encoding = 'utf-8'
    message.headers['Subject'].encode()

Or, if you really prefer bytes over unicode as the canonical representation:

    message.headers['Subject'] = b'A conversation'
    message.headers['Subject'].encoding = 'utf-8'
    message.headers['Subject'].decode()

If message.headers[x].encoding is given a sane default, and you expect the vast majority of users to only deal in unicode, they may never see the .encoding and .encode attributes. Good! We've made the common case easy and the rare cases possible.
In addition, we've embellished the Header object with a bytes representation using standard Python conventions: just like Python 3's str object has an encode method, so does our Header object. It's far easier to remember that such a convention applies, than to remember a brand-new name like "bytes_headers" or "decoded_headers".
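To make the unicode-first variant concrete, here is a minimal sketch. Header and Headers are hypothetical classes written for this post, not the email package's actual API, and a bare headers object stands in for message.headers.

```python
class Header(str):
    """Hypothetical header value: text first, bytes available on request."""

    def __new__(cls, value, encoding="utf-8"):
        self = super().__new__(cls, value)
        self.encoding = encoding  # sane default; most callers never touch it
        return self

    def encode(self, encoding=None, errors="strict"):
        # Same name and meaning as str.encode, defaulting to this header's encoding.
        return super().encode(encoding or self.encoding, errors)


class Headers(dict):
    """Hypothetical container that wraps plain strings in Header on assignment."""

    def __setitem__(self, name, value):
        if not isinstance(value, Header):
            value = Header(value)
        super().__setitem__(name, value)


headers = Headers()
headers["Subject"] = "A conversation"
headers["Subject"].encoding = "latin-1"   # the rare case: still possible
raw = headers["Subject"].encode()         # bytes, using the header's own encoding
```

Someone who only ever deals in text assigns and reads strings and never meets .encoding or .encode at all.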
Namespaces are one honking great idea -- let's do more of those! But please not faked via underscores.
For PyCon 2009, I'm giving two talks! One on extending CherryPy and one on the innards of Dejavu/GeniuSQL. I think I've finally reduced my talks to the required time slots (I could easily have made 4-hour talks for each) and posted my presentations:
Use the arrow keys or mouse-click to proceed through them. The images don't load as fast over the network as they will when I present, so be patient if you preview them yourself. Also, try to use 1024 x 768 fullscreen--they're laid out specifically for that resolution.
Update: video is now available thanks to the great people who put on PyCon:
My roomie Josh put on an art show at his church last Friday. Very cool show; around 1000 people showed up.
I was busy with a "guys night" for my church's youth groups and couldn't get out to the art show until midnight or so. Good thing; I scored one of the live art pieces for our kitchen wall at home (doorknob included for scale):
The Linnaeus Awards
I'm starting a new category here: the Linnaeus Awards. Candidates must be exemplars of Linnaean Taxonomy:
The method, the soul of science, designates at first sight any body in nature in such a way that the body in question expresses the name that is proper to it, and that this name recalls all the knowledge that may, in the course of time, have been acquired about the body thus named: so that in the midst of extreme confusion there is revealed the sovereign order of nature.
So, if you encounter a trout in the wild, you don't call it a "trout". You call it an "Oncorhynchus (mykiss) aguabonita masculinus trescenti-septi-squamatic duodecim-annus-natis...", stuffing every conceivable attribute of the object into its name.
- Longest Name. Names which are mashed together because of formalLanguageIdentifierRestrictions or natürlichsprachemodifizierdiarrhöe might get bonus points.
- Most Dimensions. Names which incorporate knowledge from varied axes, the more the better.
- Most Abstract. Placing the number-of-scales-on-a-fish into its name is fun, but for real "sovereign order" you need to incorporate the vocabulary of the taxonomy itself into the name. For example, a function in a spreadsheet program named, "SpreadsheetProgramAdditionFunction". Bonus points for including terms from ontology, taximetrics, or metaphysics.
Feel free to nominate additional candidates here or email: email@example.com.
Factory-factories are not new. But this one goes a step further with some of its "implementing classes":
...and genuflective attributes like:
But it doesn't stop there; the copy nominates itself:
There is nothing magic about the request processor: It may very well be a POJO. The RequestProcessorFactoryFactory is passed to the AbstractReflectiveHandlerMapping at startup...
Passing a factory-factory to an abstract-anything makes this a good candidate. Using the phrase "nothing magic" with a straight face catapults it to the top.