« Spam innardshelp(CherryPy 3.0) »

Mapping Python types to DB types


Permalink 12:05:53 pm, by fumanchu Email , 433 words   English (US)
Categories: Python, Dejavu

Mapping Python types to DB types

Reading Barry Warsaw's recent use of SQLAlchemy, I'm reminded once again of how ugly I find SQLAlchemy's PickleType and SQLObject's PickleCol concepts. I have nothing against the concept of pickle itself, mind you, but I do have an issue with implementation layer names leaking into application code.

The existence of a PickleType (and BlobType, etc.) means that the application developer needs to think in terms of database types. This adds another mental model to the user's (my) tiny brain, one which is unnecessary. It constantly places the burden on the developer to map Python types to database types.

For Dejavu, I started in the opposite direction, and decided that object properties would be declared in terms of Python types, not database types. When you write a new Unit class, you even pass the actual type (such as int or unicode) to the property constructor instead of a type name! Instead of separate classes for each type, there is only a single UnitProperty class. This frees programmers from having to map types in their code (and therefore in their heads); it removes an entire mental model (DB types) at coding time, and allows the programmer to remain in the Python flow.

However, the first versions of Dejavu went too far in this approach, mostly due to the fact that Dejavu started from the "no legacy" side of ORM development; that is, it assumed your Python model would always create the database. This allowed Dejavu to choose appropriate database types for the declared Python types, but meant that existing applications (with existing data) were difficult to port to Dejavu, because the type-adaptation machinery had no way to recognize and handle database types other than those Dejavu preferred to create.

Dejavu 1.5 (soon to be released) corrects this by allowing true "M x N" type adaptation. What this means is that you can continue to directly use Python types in your model, but you also gain complete control over the database types. The built-in type adapters understand many more (Python type <-> DB type) adaptation pairs, now, but you also have the power to add your own. In addition, Dejavu now has DB type-introspection capabilities—the database types will be discovered for you, and appropriate adapters used on the fly. [...and Dejavu now allows you to automatically create Python models from existing databases.]

In short, it is possible to have an ORM with abstractions that don't leak (at least not on a regular basis—the construction of a custom adapter requires some thought ;) ).


Comment from: John M. Camara [Visitor]

I don't personally feel that pickles and blobs are necessarily a bad thing but some people tend to abuse them. Just like some people abuse the use of relational databases for all applications even though some applications are better suited for object oriented databases. But relational database abuse is another topic so I'll say no more on that.

Let me make up an example and ask you how you would handle this situation. Lets say you are building an application that will allow users to display drawings that contain vector graphics. Each drawing contains a description, author, last_rev_date, and can contain any number of lines, circles, squares, etc. Now lets also say that the only querying that will be required will only be on the description, author, and last_rev_date fields.

Do you really feel like it would be a better solution to create tables for lines, circles, squares, etc, then create the necessary relationships, as well as dealing with the conversions to and from objects to a relational database vs just creating a table with fields for the description, author, last_rev_date and a pickle field for the pickled drawing object that references all the lines, circles, squares, etc?

As for me I would just pick the pickle method instead of doing all the ORM gymnastics even if an ORM could do it for me without any additional code.


01/23/07 @ 17:40
Comment from: fumanchu [Member] Email
Do you really feel like it would be a better solution
to create tables for lines, circles, squares, etc,
then create the necessary relationships, as well as
dealing with the conversions to and from objects to
a relational database vs just creating a table with
fields for the description, author, last_rev_date and
a pickle field for the pickled drawing object that
references all the lines, circles, squares, etc?

No, I don't feel the former is a better solution at all. As I said, I have nothing against pickling; I should have been more clear that pickle is the default mechanism by which Dejavu stores complex types like list and dict in databases.

But in Dejavu, I can declare a UnitProperty for the content using the Python type "SVGObject" (or what-have-you), rather than using the storage mechanisms or DB datatypes. Then, since I'm probably the first to want to persist SVG objects like this with Dejavu, I would write two small methods to do the pickling and unpickling on the way into and out of the database. They would probably be one-liners, just as if you weren't using a database at all. But I get all of the persistence of the metadata (like author, etc.) for free right alongside it. And my point is that my application code doesn't have to know pickling is involved.

Personally, if I were persisting drawing objects, I'd first consider using a filesystem so that each drawing's 'content' (the actual drawing data) would be persisted as a file with a known extension and binary data type. No pickling necessary or desired. Dejavu 1.5 has a filesystem backend that wouldn't take much to add a SVG adapter to.

01/23/07 @ 20:01
Comment from: John M. Camara [Visitor]

Ok. I now see the point your trying to make but I'm not sure I agree it's the right choice in all cases. If in the example I provided it was the only application that would ever need access to the "SVGObject" I would prefer that the application take care of the pickling and unpicking as I like to keep things simple. The mechanism I would use for the unpicking would be to create a builder class.

It is very likely that while unpicking these objects additional work will need to be done besides recreating the objects and I would use the builder class to perform this work. If I use the approach you mention this code would be split up into 2 pieces.

Now if more than one application needed access to this "SVGObject" I may write a separate application that would act as the "database" but in turn may use a relation database and file system, or relational database and Berkeley DB, or some other means of storing the data. Now I'm not familiar with Dejavu and what I just describe may in fact be what Dejavu is doing for you.

If Dejavu has this capability I would normally not use such a product to meet this need as I would likely want more control over the storage and querying capabilities than is likely to be provide by the library. I find that when I have applications that gets into this gray area of not mostly fitting into a relational database or mostly fitting into an object orient database that the solutions to these issues tend to be unique for each application.

On the other hand, for applications that either mostly fit into a relational or object orient database I could see I library handling these cases just fine.

Now as far as storing the content of these drawings to a file system. I would use other requirements that were not stated above to make the real decision as to how the content would be stored. I may chose a file, relational database, object orient database, or maybe a Berkeley database. I just made up the drawing example to have something which I had hope would not raise a "What if down the road you want to search on xx" question that may occur if I gave a business example. So I just wanted an example were pickling an object would seam like reasonable thing to do.

01/23/07 @ 22:01
Comment from: fumanchu [Member] Email

Very few things are the right choice in all cases. ;) But Dejavu indeed "does that for you"--one of its strengths is that it allows you to select different stores for different data, and then combines them transparently, so your application only sees a single "backend".

There are always "other requirements" involved, as you say. But rather than make educated guesses as to which backend is best, Dejavu is designed to let you easily test the performance of various stores with your own data (rather than relying on abstract benchmarks), and then go ahead and use the best mix of stores with the least amount of pain.

01/23/07 @ 22:50

Leave a comment

Your email address will not be revealed on this site.

Your URL will be displayed.

Please enter the phrase "I am a real human." in the textbox above.
(Line breaks become <br />)
(Name, email & website)
(Allow users to contact you through a message form (your email will not be revealed.)
September 2017
Sun Mon Tue Wed Thu Fri Sat
 << <   > >>
          1 2
3 4 5 6 7 8 9
10 11 12 13 14 15 16
17 18 19 20 21 22 23
24 25 26 27 28 29 30


The requested Blog doesn't exist any more!

XML Feeds

powered by b2evolution