
Python is not PHP

04/10/09

12:59:29 pm, by fumanchu
Categories: IT, Python

I cringe at a lot of APIs these days, because I see designers making the same mistakes again and again. Perhaps the most pervasive mistake is the dreaded NBU design: Namespacing By Underscores. For example, imagine you have a "Thing" class with a "color" attribute:

t = Thing()
t.color = 'red'

One day, you decide to switch from color names to RGB triples. Why, oh, why is this your first thought?

t.color_r = 255
t.color_g = 0
t.color_b = 0

That's your PHP (or JavaScript, or SQL, or other) experience rearing its ugly head. Yes, PHP 5.3 finally has namespaces, and you can use objects as namespaces in JS if you're diligent. But chances are, you won't.

In Python, namespaces are easy. Use them. Ask yourself what the clearest syntax is, and you might come up with something like this:

>>> t.color = RGB(255, 0, 0)
>>> t.color.red
255
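One way to get that syntax is a small value type. The sketch below is only an illustration: RGB is built here with collections.namedtuple, and the Thing class is a stand-in, since the post does not show either definition.

```python
from collections import namedtuple

# Hypothetical value type: one attribute per channel, grouped under one name.
RGB = namedtuple('RGB', ['red', 'green', 'blue'])


class Thing:
    """Stand-in for the Thing class used in the examples above."""

    def __init__(self, color=None):
        self.color = color


t = Thing()
t.color = RGB(255, 0, 0)

print(t.color.red)      # attribute access inside the "color" namespace
r, g, b = t.color       # a namedtuple still unpacks like a plain triple
```

A namedtuple is immutable, so changing one channel means assigning a new RGB (or calling t.color._replace(blue=128)); a small regular class would work just as well if mutable channels are wanted.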

This is not just a matter of clever delegation (replacing a str attribute with an RGB object)--it covers all manner of interface design decisions. Here's a recent example from python-dev regarding the email package's interface for Python 3:

message.headers['Subject']
message.bytes_headers['Subject']

Please don't do that--it makes it seem as if the "message" object has a set of headers and a distinct set of bytes_headers. At the least, you've elevated the rare case to be a peer of the common case. A new user of the email module shouldn't see anything about bytes in help(message) or dir(message). Instead, write this:

message.headers['Subject'] = 'A conversation'
message.headers['Subject'].encoding = 'utf-8'
message.headers['Subject'].encode()

Or, if you really prefer bytes over unicode as the canonical representation:

message.headers['Subject'] = b'A conversation'
message.headers['Subject'].encoding = 'utf-8'
message.headers['Subject'].decode()

If message.headers[x].encoding is given a sane default, and you expect the vast majority of users to only deal in unicode, they may never see the .encoding and .encode attributes. Good! We've made the common case easy and the rare cases possible.

In addition, we've embellished the Header object with a bytes representation using standard Python conventions: just as Python 3's str object has an encode method, so does our Header object. It's far easier to remember that such a convention applies than to remember a brand-new name like "bytes_headers" or "decoded_headers".
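Here is a minimal sketch of how the unicode-first variant could be wired up. It is not the email package's API: Header, Headers and Message are hypothetical names, and the 'utf-8' default is an assumption standing in for "a sane default".

```python
class Header(str):
    """Hypothetical header value: text that knows its own wire encoding."""

    encoding = 'utf-8'  # assumed default; can be overridden per instance

    def encode(self, encoding=None, errors='strict'):
        # Same convention as str.encode, but falls back to self.encoding.
        return super().encode(encoding or self.encoding, errors)


class Headers(dict):
    """Hypothetical mapping that wraps assigned values in Header."""

    def __setitem__(self, name, value):
        if not isinstance(value, Header):
            value = Header(value)
        super().__setitem__(name, value)


class Message:
    """Stand-in for the message object discussed above."""

    def __init__(self):
        self.headers = Headers()


message = Message()
message.headers['Subject'] = 'Café conversation'

# Common case: plain text in, plain text out, no mention of bytes.
print(message.headers['Subject'])

# Rare case: pick an encoding and ask for the bytes form.
message.headers['Subject'].encoding = 'latin-1'
wire_bytes = message.headers['Subject'].encode()
```

Because Header subclasses str, users who never touch .encoding or .encode() just see a string, and the bytes story stays out of dir(message).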

Namespaces are one honking great idea -- let's do more of those! But please not faked via underscores.

13 comments

Comment from: Wayne Witzel III [Visitor] · http://pieceofpy.com

I have to agree. I also see it a lot in config files, even if the config file is in Python. Never made any sense to me.

04/10/09 @ 15:48
Comment from: Kit Dallege [Visitor]

Hmm, whatever could you mean. Here is an example straight out of Django's settings file that's always driven me nuts. Not picking on Django, just one of a ton of examples that can be found in most every popular lib.

DATABASE_ENGINE = 'postgresql'
DATABASE_NAME = 'mydb'
DATABASE_USER = 'me'
DATABASE_PASSWORD = 'mypass'
DATABASE_HOST = ''
DATABASE_PORT = ''

04/10/09 @ 16:56
Comment from: Thomas [Visitor]

I'm not sure what this has to do with PHP. Either way, wouldn't something like this be better?

t = Thing()
t.color = Color(name = 'red').rgb(255, 0, 0)


04/10/09 @ 17:52
Comment from: Thomas [Visitor]

I might prefer this:

t.color = Color(name = 'red', rgb = (255, 0, 0))

04/10/09 @ 18:13
Comment from: rgz [Visitor] Email · http://rgzblog.blogspot.com

Hey! I take offense at the JavaScript reference! I only really use jQuery but I make extensive use of namespaces in JavaScript because JavaScript makes it easier than Python!

The closest similar things in Python are dictionaries and named tuples, and neither is as convenient*!

* For namespacing; for other stuff, dicts and named tuples are superior of course.

04/10/09 @ 21:48
Comment from: love encounter flow [Visitor] Email · http://jizura.com

i've come more and more to work in ways that differ from more classical oop as promoted by most pythonistas:

(1) i write libraries of methods that i publish as stateless singletons;

(2) i keep ALL of my data in completely generic types---the ‘seven sisters of JSON’: None, True, False, number, str, list, dict. from these you can build/simulate more advanced types like sets and bags etc. the central structural datatype is the dictionary. there is nothing in data that can not be represented with these seven types (i would not be against native sets, dates and so on, but we don’t get those in JSON).

(3) library methods accept parameters in terms of the seven sisters; most typically, the first argument to a method will be a dictionary that represents structured data.

example: my_color = { '~isa':'foobar/color', 'red':255, 'green':45, 'blue':120, }. the wave sign in front of ~isa denotes a ‘functional namespace sigil’, where ~ denotes standardized system realm; other examples are % for cached values and @ left to the user. these sigils help to keep apart subnamespaces. the convention is that anything that can not be directly expressed in terms of None, True, False, number, str, or list gets wrapped in a dict with a suitable value for ~isa. (i do very little structural validation and structural inheritance right now, but this may evolve.)

benefit #1: ALL of your data is at any point in time ready to be pickled and ready to be serialized to JSON and sent over the wire (as long as no circular structures are involved). ciao to the ways i used to literally bury data in my specific objects, copying them back and forth between different types of objects. i do much more data enrichment and data transformation these days. no more fiddling around with __getinitargs__, __getnewargs__, __getstate__, __reduce__, __reduce_ex__, __setstate__, and friends---yes, this is pretty hard to get right. these days i pickle.dump(x) and as_json(x) my data without a second thought and can actually read the outputs of print as_json(x) with indentation turned on.

ANYTHING i ever keep in a list or a dictionary (with a name that does not start with %) is ALWAYS either a string, a number, a boolean, None, or a dict or a list that is non-circular. a simple rule to live by.

you see, the hitch with classical oop is that it does not seem to address, as such, two splits that i see in data handling: on one hand you have ‘data that persists’ and, on the other hand, ‘data that is transitional’. the other dimension is the split between ‘data that is generic’ (like booleans, numbers and so on) and ‘data that is proprietary’ (like instances of class Foo). the kind of data that i want to pickle to my hard disk and publish to web clients is non-transient data; the kind of data that i can pickle and send over the wire is generic data (or else i must deal with subtleties of pickling and write very specialized converters that reduce the proprietary to the generic).

say you want to represent colors. you could start with a dictionary, like {'r':1,'g':2,'b':3}. then, you want to have a ‘smart’ color that gives you my_color.hsv() and my_color.darker(), and what was previously designed as a dictionary now turns into an object of class Color() with color.r, color.g, color.darker() and so on. so now you have one way to keep that data in a serializable form, and one way to keep that data in a processable form.

if you do it this way, and it is the most-recommended one, you immediately lose standardized ways of representing and communicating your data. each time you want to pass in your color to a method of another class or send it to a web page you will have to pluck those elementary values from the instance (or define another method to do that), which means you will find yourself doing a lot of def __repr__(self):return '%s/%s/%s'%(self.r,self.g,self.b,)-like stuff.

worse, say you have an object class Fluffy_Colors that knows how to do from_rgb_to_hsv() conversion and one Ez_shades that knows how to darken() and lighten() colors. you author your own class to take advantage of that code and provide ways to convert and modify colors. how do you get the conversion from RGB to HSV? instantiate a fc = Fluffy_Colors( red=self.r, green=self.g, blue=self.b, ) and call fc.from_rgb_to_hsv() on it. you want to keep the result? easy, self.r=fc.red etc. how is lightening a color done? right, es=Ez_shades( rgb=(self.r, self.g, self.b, ) ) (call conventions always tend to differ), then self.r=es.rgb[0] etc. so you do a lot of data-packing and unpacking in order to instantiate objects in types that are not interesting to you in the first place---you asked for a specific car upgrade but the only car you were allowed to upgrade at garage Fluffy_Colors is a Fluffy_Colors car, so you buy one, have the upgrade done, then go home and rip out that new carburetor to put it back into your car.

now the way of programming i talk about does in itself not spare you from adapting your data to another library’s conventions. but if library A expects one kind of generic data structure and library B expects another kind, chances are there are ways to describe that transition in a generic way, too---in a kind of generic XSLT-kind transformation (i hate XSLT but the idea to describe structural transformations in a generic way is great). in simple cases, trivial name mapping should suffice.

benefit #2: there is no obstacle to using method x from library A for your data d as long as it is meaningful---no more subclassing etc. this encourages the formulation of methods that work more on the general appearance of data than on specific types.

benefit #3: APIs become simpler. one drawback of APIs these days is that they become so complex, sometimes downright awkward. often you have to do real dances with your objects and classes. with this kind of data objects and manipulating libraries, you can start to work in much more data-centric ways.

one possible avenue to further develop this style of programming that i see right now is to actually marry data objects (expressed as PODs, plain old dictionaries) and libraries (expressed as classes with methods) by using derivatives of dict and then do method discovery. as i see it, a majority of daily programming chores consists in doing the same sort of data mangling and massaging again and again, with parameters and variations of the ever same recipes. what i try to do is write generic bills of fare so procedures can be more easily shared across concerns.

just realized this has little to do with the original post... oops.

well, for one, you get namespaces for libraries that are not bound to specific instance types, but to specific data structures with the way described, which is good.

two, structures built on dictionaries and lists naturally form namespaces: { '~isa':'rectangle', 'position':{ '~isa':'2d-vector', 'x':108, 'y':42, }, 'size':{ '~isa':'2d-vector', 'x':50, 'y':70, }, } (but remember that ‘flat is better than nested’).

three, you get refined namespaces with sigils. i often find that python’s restrictions on variable names are quite strict. JavaScript allows $, and these days $ gets used a lot in up-and-coming libraries like jQuery and Prototype. this is very similar to the way i use sigils. for example, when i have to keep a reference to a specific instance, when i need to build circular structures and when i build derived representations (say, x['name']['%full'] from x['name']['first']+x['name']['last']), i prefix those things with a % sign. this way, if a conflict should arise, i always know which data is to trust and which data is to toss. when i transmit data using JSON, i have those % values all filtered out so only essentials are kept. hey, and i never have to touch to_json() and __reduce_ex__() again!
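A rough sketch of that filtering step, assuming all dictionary keys are strings; strip_cached and the sample record are made-up names for illustration, not code from any library mentioned here.

```python
import json


def strip_cached(data, sigil='%'):
    """Return a copy of data without dict keys that start with the sigil."""
    if isinstance(data, dict):
        return {key: strip_cached(value, sigil)
                for key, value in data.items()
                if not key.startswith(sigil)}
    if isinstance(data, list):
        return [strip_cached(item, sigil) for item in data]
    return data  # None, booleans, numbers and strings pass through


# Hypothetical record with one derived, %-prefixed value.
x = {'~isa': 'person',
     'name': {'first': 'Ada', 'last': 'Lovelace', '%full': 'Ada Lovelace'}}

# Only the essentials go over the wire; the original keeps its cached value.
wire = json.dumps(strip_cached(x), sort_keys=True)
print(wire)
```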




04/11/09 @ 06:45
Comment from: Chris Leary [Visitor] · http://blog.cdleary.com

Is mimetypes incorrect in having mimetypes.suffix_map, mimetypes.encodings_map, and mimetypes.types_map?

04/17/09 @ 01:55
Comment from: mark [Visitor]

i am not actually sure that
color.r

would be better than

color_r

or color.red
vs
color_red

i mean, we would have to compare versions of the same length, right? This seems like a rather small issue...

My bigger problem is that I am not sure how this comes from PHP. In fact, the first version seems beautiful compared to the PHP code I have read in my life (I left PHP years ago already and didn't look back though, so I don't know if things really improved, and I honestly don't really care about PHP that much)

05/08/09 @ 19:46
Comment from: Vladimir [Visitor] · http://twitter.com/mindwork

In PHP, you can write an RGB class like in your example and define a magic method __call that will provide getters and setters for the class variables. Then use it like this:
$color = new RGBColor(255,0,0);
$red = $color->getRed();
$color->setBlue(128);

05/10/09 @ 01:43
Comment from: Robert [Visitor]

@mark

The reason that "color_r" is bad is that anybody else would read that as one variable and not a call. Whereas "color.r" is much clearer that you are calling the color of something.

05/10/09 @ 06:42
Comment from: php/python [Visitor] · http://www.php2python.com

@Kit Dallege

DATABASE_ENGINE = 'postgresql'
DATABASE_NAME = 'mydb'
[...]

I think he isn't saying that we should never use underscores - we shouldn't use them instead of namespaces or OOP.
That's how I understand it.

11/09/09 @ 08:07
Comment from: White Magic [Visitor] · http://www.whitemagickspells.org

Yeah, I agree some APIs are really badly made.

04/29/10 @ 11:29
Comment from: Laura Myob [Visitor] · http://stoneconsulting.com.au/

Not coming from PHP either, but even if color_r seems like more of a variable than color.r it is still fairly easily understood.

Besides, can't you just write a program that will change your _r's to .r's if necessary?

Please keep writing your excellent blog!

06/15/10 @ 01:03
