
Storm, SQLAlchemy, and Geniusql


Permalink 07:00:00 pm, by fumanchu, 2186 words, English (US)
Categories: Python, Dejavu


As Mike Bayer mentioned, I recently did some cross-ORM comparisons. The charts below all show "number of requests" on the X axis and "number of seconds to complete those requests" on the Y axis.

Update: Mike found some problems in my methodology and posted a very thorough "revisiting" post. Definitely read that if you read this.


I used trunk versions of Geniusql (1.0) and SQLAlchemy (0.4, without using the orm subpackage), and the latest stable distro of Storm (0.10). All test runs were done on my Win2k laptop, which has a 2GHz CPU and 512MB of RAM. PostgreSQL 8.0.0rc1 was the DB engine, and psycopg2 was the driver.

Scripts and a CSV are here: http://www.aminus.org/rbre/python/ormcompare/ I'd be happy to be notified if I did something wrong in the SQLAlchemy or Storm code (especially if you can patch it to do better ;) ).

Results and Analysis

As with all benchmarks, take this with a heavy grain of salt. The numbers say far more about the values and designs of each ORM than they do about the performance of particular subsystems in each ORM. To wit:

Step 2: Insert

Inserting single objects: [benchmark chart for step 2: insert]

SQLAlchemy is doing something that Geniusql and Storm aren't doing on every insert. A trace of SQLAlchemy and Geniusql shows Geniusql making 13 Python function calls for a single simple insert, whereas SQLAlchemy makes over 400! What could it be doing?

Apparently, SQLAlchemy is spending a lot of time generating bound parameters. That is, it generates the SQL INSERT INTO "Animal" ("ID", "Name", "Species", "Legs") VALUES (:ID, :Name, :Species, :Legs), which Geniusql doesn't do. Ostensibly it does this for speed; if so, the benchmark results seem to argue that it's a losing battle: the auto-generation of optimized code costs more than the savings gained from the optimization.
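
To make the distinction concrete, here is a minimal sketch of the two styles at the driver level (values and the second statement are hypothetical; note that psycopg2's actual paramstyle is pyformat, so SQLAlchemy's :Name placeholders get translated to %(Name)s before execution):

import psycopg2
conn = psycopg2.connect("dbname=geniusql_bench host=localhost user=postgres password=djvpg")
cur = conn.cursor()
# Bound parameters: the statement carries placeholders, and the driver
# merges the values in at execute time.
cur.execute('INSERT INTO "Animal" ("ID", "Name", "Species", "Legs")'
            ' VALUES (%(ID)s, %(Name)s, %(Species)s, %(Legs)s)',
            {'ID': 13, 'Name': 'Ape', 'Species': 'Hominidae', 'Legs': 2})
# Pre-interpolated SQL: the values are quoted into the string up front,
# so no parameter-processing machinery runs at execute time.
cur.execute('INSERT INTO "Animal" ("ID", "Name", "Species", "Legs")'
            " VALUES (14, 'Bear', 'Ursidae', 4)")
conn.commit()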

It also spends far too long fetching the autoincremented IDs, using 73 Python function calls; Geniusql manages it in 4: a fetch, a log, a __getattr__, and a pull method to coerce the new ID to the proper Python type. But then, SQLAlchemy is also populating the new object with other (non-autoincrementing) default values, in Python, which Geniusql doesn't do (it relegates that to its ORM layer, Dejavu). I'd bet SQLAlchemy would be better off just fetching the new row (which the DB has already populated with default values anyway).
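
At the driver level, that post-grab is tiny; a sketch (function and sequence name hypothetical; see the discussion of SELECT last_value in the comments below):

def grab_new_id(cursor, seqname='"Animal_ID_seq"'):
    # Ask the sequence for the last value it handed out. This relies on
    # running on the same connection that performed the INSERT, which
    # Geniusql takes pains to ensure.
    cursor.execute("SELECT last_value FROM %s" % seqname)
    return cursor.fetchone()[0]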

Here's a trace of Geniusql retrieving the autoincremented ID:

> _grab_new_ids (postgres:520)
  > fetch (psycopg:94)
    > log (objects:794)
    < log (objects:795) 0.047ms
    > __getattr__ (conns:23)
    < __getattr__ (conns:24): <built-in method cursor of psycopg2._psycopg.connection object at 0x00ADB020> 0.052ms
  < fetch (psycopg:120): ([(13L,)], (('last_value', 20, 2, 8, None, None, None),)) 1.232ms
< _grab_new_ids (postgres:527): {'ID': 13L} 1.601ms

and here's an SQLAlchemy trace:

> get_column_default (postgres:616)
  > __get__ (util:191)
  < __get__ (util:195): True 0.062ms
  > get_column_default (base:1530)
    > traverse_single (sql:893)
      > visit_sequence (postgres:633)
        > <lambda> (base:1528)
        < <lambda> (base:1528): <sqlalchemy.databases.postgres.PGDialect object at 0x00A23A30> 0.049ms
        > format_sequence (ansisql:1026)
          > __generic_obj_format (ansisql:1004)
            > _get_case_sensitive (schema:110)
            < _get_case_sensitive (schema:119): True 0.058ms
          < __generic_obj_format (ansisql:1010): 'animal_id_seq' 0.289ms
        < format_sequence (ansisql:1027): 'animal_id_seq' 0.462ms
        > execute (base:712)
          > _execute_text (base:722)
            > __distill_params (base:728)
            < __distill_params (base:735) 0.080ms
            > __create_execution_context (base:761)
              > <lambda> (base:932)
              < <lambda> (base:932): <sqlalchemy.databases.postgres.PGDialect object at 0x00A23A30> 0.051ms
              > create_execution_context (postgres:231)
                > __init__ (default:146)
                  > __encode_param_keys (default:184)
                    > supports_unicode_statements (default:50)
                    < supports_unicode_statements (default:52): False 0.049ms
                    > proc (default:189)
                    < proc (default:193) 0.065ms
                  < __encode_param_keys (default:198) 0.794ms
                  > supports_unicode_statements (default:50)
                  < supports_unicode_statements (default:52): False 0.048ms
                  > create_cursor (postgres:183)
                    > _is_server_side (postgres:180)
                    < _is_server_side (postgres:181): False 0.051ms
                    > _get_connection (base:532)
                    < _get_connection (base:534): <sqlalchemy.pool._ConnectionFairy object at 0x00BDA950> 0.059ms
                    > cursor (pool:308)
                      > __init__ (pool:391)
                      < __init__ (pool:393) 0.061ms
                    < cursor (pool:311): <sqlalchemy.pool._CursorFairy object at 0x00BDAE90> 0.271ms
                  < create_cursor (postgres:190): <sqlalchemy.pool._CursorFairy object at 0x00BDAE90> 0.832ms
                < __init__ (default:174) 2.301ms
              < create_execution_context (postgres:232): <sqlalchemy.databases.postgres.PGExecutionContext object at 0x00BDAD70> 2.487ms
            < __create_execution_context (base:762): <sqlalchemy.databases.postgres.PGExecutionContext object at 0x00BDAD70> 2.860ms
            > __execute_raw (base:764)
              > is_info_enabled (logging:71)
                > isEnabledFor (logging/__init__:1139)
                  > getEffectiveLevel (logging/__init__:1125)
                  < getEffectiveLevel (logging/__init__:1135): 30 0.129ms
                < isEnabledFor (logging/__init__:1145): False 0.334ms
              < is_info_enabled (logging:72): False 0.530ms
              > __execute (base:774)
                > do_execute (default:103)
                  > __getattr__ (pool:404)
                  < __getattr__ (pool:405): <built-in method execute of psycopg2._psycopg.cursor object at 0x00B94978> 0.049ms
                < do_execute (default:104) 2.892ms
              < __execute (base:781) 3.186ms
              > _autocommit (base:685)
                > in_transaction (base:617)
                < in_transaction (base:618): False 0.049ms
                > match (sre:126)
                  > _compile (sre:213)
                  < _compile (sre:218): <_sre.SRE_Pattern object at 0x00BDC020> 0.087ms
                < match (sre:129) 0.270ms
              < _autocommit (base:689) 0.608ms
            < __execute_raw (base:772) 4.783ms
            > result (default:233)
              > get_result_proxy (postgres:192)
                > _is_server_side (postgres:180)
                < _is_server_side (postgres:181): False 0.051ms
                > __init__ (base:1126)
                  > <lambda> (default:176)
                    > <lambda> (default:180)
                      > _branch (base:538)
                        > __init__ (base:524)
                        < __init__ (base:530) 0.114ms
                      < _branch (base:547): <sqlalchemy.engine.base.Connection object at 0x00BD1350> 0.298ms
                    < <lambda> (default:180): <sqlalchemy.engine.base.Connection object at 0x00BD1350> 0.475ms
                    > <lambda> (base:549)
                    < <lambda> (base:549): Engine(postgres://postgres:djvpg@localhost/geniusql_bench) 0.048ms
                  < <lambda> (default:176): Engine(postgres://postgres:djvpg@localhost/geniusql_bench) 0.888ms
                  > is_debug_enabled (logging:68)
                    > isEnabledFor (logging/__init__:1139)
                      > getEffectiveLevel (logging/__init__:1125)
                      < getEffectiveLevel (logging/__init__:1135): 30 0.130ms
                    < isEnabledFor (logging/__init__:1145): False 0.340ms
                  < is_debug_enabled (logging:69): False 0.541ms
                  > is_select (default:219)
                    > match (sre:126)
                      > _compile (sre:213)
                      < _compile (sre:218): <_sre.SRE_Pattern object at 0x00BAB7F0> 0.086ms
                    < match (sre:129): <_sre.SRE_Match object at 0x00BC49F8> 0.261ms
                  < is_select (default:222): True 0.431ms
                  > _init_metadata (base:1151)
                    > _create_key_cache (base:1182)
                      > __init__ (util:51)
                      < __init__ (util:52) 0.049ms
                    < _create_key_cache (base:1204): {} 0.321ms
                    > __getattr__ (pool:404)
                    < __getattr__ (pool:405): (('nextval', 20, 2, 8, None, None, None),) 0.053ms
                    > dbapi_type_map (default:30)
                    < dbapi_type_map (default:34): {} 0.048ms
                    > decode_result_columnname (default:25)
                      > decode (utf_8:15)
                      < decode (utf_8:16): (u'nextval', 7) 0.053ms
                    < decode_result_columnname (default:28): u'nextval' 0.234ms
                    > dialect_impl (types:49)
                    < dialect_impl (types:51): NullType() 0.060ms
                  < _init_metadata (base:1179) 1.758ms
                < __init__ (base:1135) 4.304ms
              < get_result_proxy (postgres:196): <sqlalchemy.engine.base.ResultProxy object at 0x00BD1E30> 4.737ms
            < result (default:234): <sqlalchemy.engine.base.ResultProxy object at 0x00BD1E30> 4.917ms
          < _execute_text (base:726): <sqlalchemy.engine.base.ResultProxy object at 0x00BD1E30> 13.304ms
        < execute (base:715): <sqlalchemy.engine.base.ResultProxy object at 0x00BD1E30> 13.544ms
        > scalar (base:1318)
          > _fetchone_impl (base:1284)
            > __getattr__ (pool:404)
            < __getattr__ (pool:405): <built-in method fetchone of psycopg2._psycopg.cursor object at 0x00B94978> 0.051ms
          < _fetchone_impl (base:1285): (13L,) 0.250ms
          > _process_row (base:1291)
            > __init__ (base:1431)
            < __init__ (base:1436) 0.072ms
          < _process_row (base:1292): (13L,) 0.308ms
          > __getitem__ (base:1462)
            > _get_col (base:1280)
              > __getitem__ (util:53)
              < __getitem__ (util:55): (NullType(), NullType(), 0) 0.063ms
              > convert_result_value (types:193)
              < convert_result_value (types:194): 13L 0.048ms
            < _get_col (base:1282): 13L 0.503ms
          < __getitem__ (base:1467): 13L 0.694ms
          > close (base:1206)
            > close (pool:398)
            < close (pool:400) 0.063ms
            > <lambda> (base:1140)
              > <lambda> (default:182)
              < <lambda> (default:182): <sqlalchemy.engine.base.Connection object at 0x00BDACF0> 0.048ms
            < <lambda> (base:1140): <sqlalchemy.engine.base.Connection object at 0x00BDACF0> 0.229ms
            > <lambda> (base:552)
            < <lambda> (base:552): False 0.050ms
          < close (base:1219) 0.796ms
        < scalar (base:1327): 13L 2.721ms
      < visit_sequence (postgres:635): 13L 17.369ms
    < traverse_single (sql:896): 13L 17.584ms
  < get_column_default (base:1532): 13L 17.773ms
< get_column_default (postgres:631): 13L 18.196ms

Step 3: Properties

Retrieving rows/objects by ID: [benchmark chart for step 3: properties]

Again, Storm and Geniusql are close but SQLAlchemy is not; it must be doing things in this step that the others aren't.

Geniusql uses the pattern: parse expression, write SQL, fetch, iterate over result rows (and coerce inbound values as they are yielded).
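
That last phase is essentially a generator; a toy sketch of the idea (names hypothetical, not Geniusql's actual code):

def iter_rows(cursor, sql, coercers):
    # Fetch, then coerce inbound values lazily as each row is yielded,
    # applying one coercion function per column of the result.
    cursor.execute(sql)
    for row in cursor.fetchall():
        yield tuple(coerce(value) for coerce, value in zip(coercers, row))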

Storm does it a little differently. The first time you call store.find (at least, I'm assuming it's only the first time), Storm pre-constructs SQL for you, including INSERT INTO, autoincrement ID getters, SELECT by ID, etc. It's the actual find().one() call that does the traditional work. The only significant difference from Geniusql in this step is that Storm's inbound type adapters seem to be slower (by a factor of 10 or so).
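
For reference, the Storm side of this test looks roughly like the following (a hypothetical model matching the benchmark's Animal table; the database URI is the one from the trace logs):

from storm.locals import Store, create_database, Int, Unicode

class Animal(object):
    __storm_table__ = "Animal"
    ID = Int(primary=True)
    Name = Unicode()
    Species = Unicode()
    Legs = Int()

db = create_database("postgres://postgres:djvpg@localhost/geniusql_bench")
store = Store(db)
# The first find() for a class is (I'm assuming) where the SQL
# pre-construction happens; one() then issues the actual SELECT.
animal = store.find(Animal, Animal.ID == 13).one()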

There's no single thing I can point to that SQLAlchemy does in this case that the others don't; it just seems to take more steps to do each phase. The correlations seem to be taking the most time; SQLAlchemy takes about 5 times longer than Geniusql to generate the SQL. And, as in the insert step, I'm not sure the bind params are helping SQLAlchemy. But then, Storm does parameter binding and manages to stay fast.

Step 4: Expressions

Retrieving rows/objects with various WHERE clauses: [benchmark chart for step 4: expressions]

Oh, dear. It looks like Storm needs some work in this department. Those pre-calculated fetch-by-ID SQL statements are handy, but can't apply to every situation. But let's look at SQLAlchemy first:

SQLAlchemy does a lot of traversal of tree-like structures. Some of this could/should be cached. For example, Geniusql stores the quoted names of each column, so only a single attribute lookup is needed for each one in a SELECT clause; fetching all the qnames and assembling them into the final SQL takes about 1 ms in my trace log. SQLAlchemy, on the other hand, makes about 24 Python function calls (4.5 ms) for each column. The actual fetch-iterate-and-coerce step is practically the same for both libraries.
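
A minimal sketch of that caching idea (not Geniusql's actual code): compute each column's quoted name once, up front, so that writing a SELECT clause is one attribute lookup per column:

class Column(object):
    def __init__(self, name):
        self.name = name
        # Quote once at bind time; every SELECT afterward just reads qname.
        self.qname = '"%s"' % name.replace('"', '""')

def select_clause(columns):
    # One attribute lookup per column; no per-query tree traversal.
    return "SELECT " + ", ".join(col.qname for col in columns)

cols = [Column("ID"), Column("Name"), Column("Species"), Column("Legs")]
print(select_clause(cols))
# SELECT "ID", "Name", "Species", "Legs"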

Storm takes about 3.4ms to generate each column name in a SELECT clause. It also does some odd things with inbound coercion: it gets a decimal.Decimal back for a column that was declared Float (?! Zoo.Admission), and coerces that plus the unicode Zoo.Name, but doesn't run a coercion function at all for the date fields--I guess it knows/trusts that psycopg will return valid date objects. Ah, so then it wraps the Decimal scalar value up in a storm.variables.FloatVariable object. Are you allowed to use binary floating-point "all the way down" in Storm?

At any rate, it takes Storm about 60ms in my trace log to return the first object of each result, and then about 20ms per row/object thereafter. Geniusql takes about 20ms for the first row, and under 2ms for each row thereafter. It's a good bet that that's the bottleneck, and it's most likely due to slow inbound adapters.

Step 5: Crazy stuff

Retrieving rows with certain fields only (projection), use of builtin functions (today/now), and use of DISTINCT: [benchmark chart for step 5: aggregates]

This one's a little crazy for several reasons. First, it tests some dissimilar things:

  • The 'Legs' attribute of the 'Animal' table is returned for all rows.
  • The 'Species' and 'Lifespan' columns are then retrieved for all rows.
  • The 'Zoo.Founded' date attribute is compared using an SQL function like TODAY or NOW.
  • A DISTINCT query is run on the Animal table.

Second, I couldn't find a way to get Storm to write TODAY into the SQL, so that test cheats a bit and just sticks datetime.datetime.today into the initial expression. I'm pretty convinced that's why the Storm times are so low on this test.
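
In other words, the Storm test does something like this (hypothetical Zoo model and comparison; the real test's expression may differ):

import datetime
from storm.locals import Store, create_database, DateTime, Int, Unicode

class Zoo(object):
    __storm_table__ = "Zoo"
    ID = Int(primary=True)
    Name = Unicode()
    Founded = DateTime()

store = Store(create_database("postgres://postgres:djvpg@localhost/geniusql_bench"))
# The comparison value is computed once in Python, so Storm never has to
# render an SQL TODAY()/NOW() call; the other libraries make the database
# evaluate the function itself on every execution.
founded_before_today = store.find(Zoo, Zoo.Founded < datetime.datetime.today())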

But let's break down the trace log times for the first test, fetching SELECT Legs FROM Animal. Here's SQLAlchemy:

view = select([Animal.c.Legs]).execute().fetchall()
legs = [x[0] for x in view]

[Animal.c.Legs] =  1.3 ms
select()        =  2.9 ms
execute()       = 63.6 ms
fetchall()      = 15.5 ms
listcomp        = 10.4 ms

Here's the Geniusql equivalent:

view = db.select((Animal, ['Legs']))
legs = [x[0] for x in view]

select()        = 13.2 ms
+- fetch()      =     1.9 ms
listcomp        =  4.2 ms

And the Storm timings:

legs = getall(Animal.Legs)

getall()        = 13.8 ms
+- execute()    =    10.0 ms
+- get_all()    =     2.7 ms

I just noticed I forgot the listcomp in the Storm test. Meh.

The third query is more complicated than the ones we've seen so far, so it takes up most of the time in this entire step. Geniusql takes 74.455ms to write the SQL and fetch the data. SQLAlchemy takes 17ms just parsing all of the overridden operators in the expression (Storm overrides operators, too, but typically does each one in a third of the time). Then select() is 3ms, execute() is 115ms (!), and fetchall() is 11 more. I've been thinking lately of adding compound SELECTs to Geniusql, but if that means doubling the execution time of SELECTs that have no correlated subqueries, perhaps I'll wait a while. Just kidding--I'm sure it can be done more simply; if SQLAlchemy could short-circuit the traversal of ClauseVisitors in the common case (one table, no compounded SELECTs), it might improve dramatically.

Step 6: Editing

Updating rows/objects: [benchmark chart for step 6: editing]

No surprises here. SQLAlchemy parses the WHERE clause of the UPDATE statements faster than Geniusql, but here come those correlated subquery traversals and bind params again to slow the whole thing down.

I just noticed something else: Geniusql uses the following sequence of events to grab a connection:

> get (conns:241)
  > id (conns:271)
  < id (conns:273): 0.049ms
< get (conns:269): 0.280ms

Here's the SQLAlchemy system:

> contextual_connect (base:1036)
  > __init__ (base:524)
    > raw_connection (base:1086)
      > connect (pool:157)
        > __init__ (pool:262)
          > __init__ (pool:256)
          < __init__ (pool:257) 0.055ms
          > get (pool:171)
            > do_get (pool:533)
              > get (queue:124)
                > acquire (threading:93)
                  > currentThread (threading:672)
                  < currentThread (threading:674): <_MainThread(MainThread, started)> 0.058ms
                  > _note (threading:44)
                  < _note (threading:45) 0.050ms
                < acquire (threading:109): True 0.518ms
                > _empty (queue:181)
                < _empty (queue:182): False 0.049ms
                > _get (queue:193)
                < _get (queue:194): <sqlalchemy.pool._ConnectionRecord object at 0x00C0F1F0> 0.050ms
                > notify (threading:236)
                  > _is_owned (threading:143)
                    > currentThread (threading:672)
                    < currentThread (threading:674): <_MainThread(MainThread, started)> 0.054ms
                  < _is_owned (threading:144): True 0.232ms
                  > _note (threading:44)
                  < _note (threading:45) 0.047ms
                < notify (threading:243) 0.649ms
                > release (threading:111)
                  > currentThread (threading:672)
                  < currentThread (threading:674): <_MainThread(MainThread, started)> 0.055ms
                  > _note (threading:44)
                  < _note (threading:45) 0.048ms
                < release (threading:119) 0.487ms
              < get (queue:157): <sqlalchemy.pool._ConnectionRecord object at 0x00C0F1F0> 2.532ms
            < do_get (pool:536): <sqlalchemy.pool._ConnectionRecord object at 0x00C0F1F0> 2.743ms
          < get (pool:172): <sqlalchemy.pool._ConnectionRecord object at 0x00C0F1F0> 2.914ms
          > get_connection (pool:219)
          < get_connection (pool:234): <connection object at 0x00B94E58;
              dsn: 'dbname=geniusql_bench host=localhost user=postgres password=*****', closed: 0> 0.079ms
          > __get__ (logging:86)
            > _get_instance_name (logging:56)
            < _get_instance_name (logging:60): 'sqlalchemy.pool.QueuePool.0x..50' 0.061ms
            > getLogger (logging/__init__:1225)
              > getLogger (logging/__init__:832)
                > _acquireLock (logging/__init__:171)
                  > acquire (threading:93)
                    > currentThread (threading:672)
                    < currentThread (threading:674): <_MainThread(MainThread, started)> 0.056ms
                    > _note (threading:44)
                    < _note (threading:45) 0.048ms
                  < acquire (threading:109): True 0.496ms
                < _acquireLock (logging/__init__:181) 0.687ms
                > _releaseLock (logging/__init__:183)
                  > release (threading:111)
                    > currentThread (threading:672)
                    < currentThread (threading:674): <_MainThread(MainThread, started)> 0.056ms
                    > _note (threading:44)
                    < _note (threading:45) 0.048ms
                  < release (threading:119) 0.481ms
                < _releaseLock (logging/__init__:188) 0.658ms
              < getLogger (logging/__init__:862): <logging.Logger instance at 0x00B96F58> 1.775ms
            < getLogger (logging/__init__:1232): <logging.Logger instance at 0x00B96F58> 1.994ms
            > getEffectiveLevel (logging/__init__:1125)
            < getEffectiveLevel (logging/__init__:1135): 30 0.134ms
          < __get__ (logging:90): False 2.731ms
        < __init__ (pool:273) 6.473ms
        > checkout (pool:319)
        < checkout (pool:325): <sqlalchemy.pool._ConnectionFairy object at 0x00C0FF50> 0.092ms
      < connect (pool:159): <sqlalchemy.pool._ConnectionFairy object at 0x00C0FF50> 6.878ms
    < raw_connection (base:1089): <sqlalchemy.pool._ConnectionFairy object at 0x00C0FF50> 7.059ms
  < __init__ (base:530) 7.308ms
< contextual_connect (base:1042): <sqlalchemy.engine.base.Connection object at 0x00C0F910> 7.486ms

Houston, we have a problem. Don't log if you don't have to (especially when using the Python stdlib's extremely-nested logging module, which is slow even when turned off), and use a map of conns to threads instead of going to the Queue for every request. Geniusql manages this by using weakref wrappers on the conns, so when a thread no longer references its mapped connection, it's returned to the pool. Storm gets around this by not having a pool, or even threadsafe Store objects--each Store has its own connection, so it "maps conns to threads" by limiting you to one Store+conn per thread.
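
For the record, here is a minimal sketch (not Geniusql's actual class) of that weakref scheme: a checkout is a dict lookup, and when a thread drops its last reference to its wrapper, the weakref callback returns the raw connection to the idle list.

import threading
import weakref

class ConnectionWrapper(object):
    def __init__(self, raw):
        self.raw = raw
    def __getattr__(self, name):
        # Delegate cursor(), commit(), etc. to the real connection.
        return getattr(self.raw, name)

class ThreadMappedPool(object):
    def __init__(self, connect):
        self.connect = connect  # zero-arg factory, e.g. lambda: psycopg2.connect(dsn)
        self.idle = []          # raw connections awaiting reuse
        self.refs = {}          # thread name -> weakref to that thread's wrapper

    def get(self):
        key = threading.currentThread().getName()
        ref = self.refs.get(key)
        wrapper = ref() if ref is not None else None
        if wrapper is None:
            raw = self.idle.pop() if self.idle else self.connect()
            wrapper = ConnectionWrapper(raw)
            def recycle(_, key=key, raw=raw):
                # Fires when the owning thread's last reference dies.
                self.refs.pop(key, None)
                self.idle.append(raw)
            self.refs[key] = weakref.ref(wrapper, recycle)
        return wrapper

Synchronization, stale-connection invalidation, and pool sizing are all elided here, which is exactly the sort of thing Mike raises in the comments below.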

That's enough analysis for now. :) Hope you enjoyed the ride.


Comment from: mike bayer [Visitor]

Hey Robert -

I would note that the connection pool example you're showing there uses SQLAlchemy's QueuePool implementation, which is swappable - you're free to put in a simple get/put system there if you want. The SQLAlchemy pool is also using weakref-oriented wrappers so that connections get returned to the pool automatically. If Geniusql's license is compatible, it might be worth looking into adapting your approach into a SQLAlchemy implementation; although I'd be curious to see how you handle various synchronization issues as well as things like invalidating stale connections and stuff like that (stuff our users really want).

As far as inserts, our overhead is strictly that of converting bind params according to type objects, encoding the keys of dictionary-based parameters (you need that when using unicode identifiers with multibyte chars), and doing whatever string/date/binary/etc. conversions are needed. But also, we are processing Python-side defaults, taking note of database-side defaults, and returning the full set of primary key values for the row, if you are using a single execute; if you use executemany() semantics, much less of that goes on and I think you'll find the results are much faster like that.

So you did hit it on the head: SA's "values" are those of richer data processing on the inbound and outbound sides, since if you don't need any of that, raw DBAPI is going to beat everyone anyway. But I can say I'm glad you at least tested the 0.4 trunk and not 0.3 - we've only begun in the last few months to really start optimizing a lot of things, as you can see with the statement compilation, for example.

08/18/07 @ 20:17
Comment from: mike bayer [Visitor] · http://www.sqlalchemy.org

Also, I'd like to note that since you used Psycopg2, SQLAlchemy's insert constructs will pre-fire a sequence in order to get the value; this is because in order to post-target the row, you need that value. cursor.lastrowid in psycopg2 returns the OID of the row, not the primary key value, which only works on a Postgres that has OIDs enabled; this setting is deprecated in Postgres, which means the value is not of general use.

If you run the benchmark with SQLite or MySQL, you'll see we don't do any pre-firing of anything, since cursor.lastrowid is usable, and you'd get faster results.

08/18/07 @ 20:41
Comment from: fumanchu [Member] Email

Hi Mike,

Good to know about the swappable Pools; Geniusql allows that too, and I'm about to write one which handles invalidation of stale connections. I'd be interested to know more about the "synchronization issues" you've encountered.

Regarding unicode, I confess I haven't done multibyte identifier tests. I'd be surprised if I couldn't use octal escapes there just as well, though?

I can't tell, however, if you got the impression that Geniusql does no Python-type-to-DB-type conversion; it certainly does, even to the point of determining in advance the resultant type of every binary and comparison subexpression in the SQL.

Regarding Postgres sequences, I've always found SELECT last_value FROM sequence to work just fine, as long as you use the same connection as you did for the INSERT (which Geniusql takes pains to ensure).

But perhaps the most important distinction is that Geniusql (and therefore Dejavu) does not rely on the DB-API. I've found too many inconsistencies and performance problems there to want to support it--native driver calls are almost always faster and more reliable.

08/18/07 @ 22:53
Comment from: Jonathan Ellis [Visitor] · http://spyced.blogspot.com

Thanks for the analysis, Robert!

08/19/07 @ 09:58
Comment from: Mark Ramm [Visitor] · http://www.compoundthinking.com/blog

Thanks for the interesting read.

One nit-pick with your analysis though: bind parameters may very well be worthwhile if you are database-bottlenecked and you have plenty of application CPU cycles around (or can buy them easily). So I would think you will probably find a point at which real applications get better performance out of doing the work to optimize the database query. But I'll admit that test is quite a bit more complicated to run, as you need one or two app servers and a separate DB server so you can run your DB server up to 100% utilization...

08/19/07 @ 13:49
Comment from: fumanchu [Member] Email

If the bottleneck on the database machine is CPU, yes, absolutely. But blindly asking an ORM to generate bind params in dynamic code for dynamic queries may not only shift that work from the DB server to the app server, but increase it, since you're shifting the work from query-plan optimization in C to SQL parameterization in Python. So you may be robbing Peter to pay Paul. You'd probably want to take the additional step at that point to pre-compile hotspot queries on the app server in parameterized form, which somewhat negates the "dynamic-programming" benefits of an ORM in the first place. :/

08/19/07 @ 17:00
Comment from: Brian Harring [Visitor]

Interesting stats, but I'd be more interested in code or a blog entry on how you generated the time per function invocation; I know pstats has the API already, but your presentation implies you've got something a bit less manual ;)

08/20/07 @ 00:42
Comment from: Jonathan Ellis [Visitor] · http://spyced.blogspot.com

Re bind params, I've never seen an application yet where the admins wouldn't happily trade some inefficiency on the app server side in exchange for less load on the db. App servers are designed so that you can just throw machines at the problem if you need to; this is harder to do with databases. (How much harder depends on the app, but it's always harder.)

08/20/07 @ 09:46
Comment from: mike bayer [Visitor] · http://www.sqlalchemy.org

Robert -

I'm puzzled... you don't use the DBAPI? How do you talk to Postgres--did you write your own version of psycopg2? That would explain why Geniusql doesn't use bind parameters either (at least for inserts...). As I noted yesterday, it's not going to work if/when Geniusql starts supporting Oracle, which is very bind-parameter oriented and won't let larger LOB entries in without a bind param. Using bind params is definitely one of our core values (and apparently Storm's too).

Your count of "400 function calls" for an insert seems a little biased; such as, you're probably checking out the connection from the pool each time, you're recompiling the statement every time, you're using Postgres for which we add extra steps specific to it, maybe you loaded up the Insert call extra SQL defaults (instead of, more typically, just configuring them as DEFAULTs on your table), which SA currently does not "inline", even though we now will probably be adding this enhancement.

With SQLite, where we don't have as many DBAPI/database-related decisions to make, I count 161 calls for executing a compiled Insert() construct against an engine, including pool checkout and everything. If you pre-compile the Insert(), it's 97. Use a direct connection instead of the engine, it's 86. Execute against a straight string, it's 44. There's some logging overhead happening here, like calls to logging.getEffectiveLevel() and such, which we can further optimize.

Interestingly, when you execute against MySQLDB, an in-Python DBAPI, MySQLDB itself uses 21 function calls. So it's really amazing that Geniusql has doubled the speed of a pure-Python DBAPI (not exactly sure how)... but I think the scale of function calls here needs to be placed into perspective. It's not that much. Literally, nobody has ever had any issues with our speed at this level. We've addressed a good deal of ORM concerns in the past, and we had some recent talk about executemany() in particular (which has been addressed). But other than the executemany() issue, at the SQL compile/execute level, this entire discussion has simply never come up... if you truly need to execute thousands of INSERT statements lightning fast, you wouldn't be using Python at all. We address every concern that comes to us, and this one, i.e. about our connection pool and basic compile/execution methodologies being too slow, is definitely a first.

So let's look at SA's worst-case scenario, which is Postgres. We do more work with Postgres to work out the "id" issue. As I mentioned, this is because cursor.lastrowid is useless. We pre-execute the sequence, whereas you (Geniusql) post-grab it. It's a little misleading that you are comparing SQLAlchemy's slightly different function to yours, where you aren't post-grabbing the ID in your example. The pre-execute/post-grab issue will eventually go away since Postgres 8.4 has a "RETURNING ID" type of clause you can add to an INSERT.

So if you run an uncompiled Insert() against Postgres, don't include the ID so we're going to pre-execute, use the heavyweight QueuePool, and let the engine checkout/checkin the connection for you, and the table has two columns, it's 323 function calls. So yes, there's our worst case coming up. But if you run against a checked-out connection (which you'd certainly be doing if you had a lot of INSERTs to execute), it goes down to 250. If you include your ID and don't pre-execute the sequence (by providing your own id), it goes down to 188. Pre-compile, and we're back down to 106. And of course, we're going to look into chopping 20-30 method calls, if not more, off our Insert() compilation overhead... these things just have never been prioritized before.

We will almost certainly be adding an "inline=True" flag to insert() and update(), which indicates to execute all sequences and defaults inline with the query... so all the overhead you mention about pre-fetching defaults will go away. This is a very easy feature add that quite simply nobody has ever asked for or cared about... it's not any deep architectural flaw, just a particular implementation decision that until now had never raised any eyebrows.

As for our connection pool itself, why's it so damn slow? Well, it's doing a lot that I am sure your implementation, at three function calls, is not. The QueuePool is using Queue.Queue to maintain connections, but it's our own local copy that uses RLock instead of Lock. Why? Because we found a deadlock condition that can occur when a connection tries to close itself inside the __del__() method. So most of that overhead appears to be RLock/threadlocal-type stuff. It's only because our pool was used in high-load production scenarios that we found this out... so it will be interesting to see what new things you learn with Geniusql when it gets out into the wild. But the RLock overhead is still unacceptable, so we will look into factoring that out.

Why are we using Queue.Queue? Because our QueuePool does a lot. It has a fixed size, but can also grow by an additional margin that you can configure to handle spikes in load. It also throttles incoming concurrent requests, which all line up and wait their turn to get a connection, instead of your database being overwhelmed (all of this is load/road tested). Application containers may have a dynamically-allocated pool of threads which shrinks and grows, some apps may even need to use multiple connections in one thread (such as for dual-transaction operations), and highly horizontal setups need to limit the overall number of database connections. So these things are necessary in large-scale cases. And in small-scale cases, nobody cares about the overhead being there; it's still quite small.

All of our pool subclasses can detect connection hangups (differentiating them from other error conditions) and invalidate connections, and can automatically recycle connections that have been open for a fixed amount of time (this is crucial for DBs like MySQL, which time out after 8 hours). These are all features that our users require... so that's why a checkout takes 35 function calls (or 21 with the lighter-weight SingletonThreadPool). If and when Geniusql and Storm start adding features like these, your function call counts are going to go up too. And now that someone has actually raised this issue for the first time, we're going to look into bringing the call count down.

Also, as you noted, Storm doesn't have a connection pool; they suggest placing a Store object in a thread-local object, which effectively means you write your own "connection management" strategy using a simple thread-local variable. You can do the identical pattern in SQLAlchemy: just create a bunch of Session() objects, bind each to a Connection, and store them within a thread-local object. Or, like I mentioned, write any put()/get() pool you want and just stick it in create_engine(). To my knowledge, nobody has ever needed this.

So SQLAlchemy will be looking into some of these issues and bringing call counts down. But with a couple of years' usage, a mailing list of 800 users, and nary a complaint about any of this before, I think the larger perspective is that there are some good ideas and observations here, but not without a heaping dose of hair-splitting.

08/20/07 @ 11:04
Comment from: mike bayer [Visitor] · http://www.sqlalchemy.org

One more point - if you send the "threadlocal" flag to create_engine(), our QueuePool does the "use the same connection for the current thread" thing you mention as well. The reason this is not turned on by default is that someone might need to use two connections in the same thread, such as one that's in a transaction and another that logs information about the transaction regardless of it rolling back. It's something that used to be default behavior back in version 0.1, but after people got bit by the "surprise", it's now an explicit option... there are actually two different kinds of "checkout" available, so it's quite easy to use even if you do need multiple connections.

Sorry for all the blog spam but we're just flinging off the 80 pounds of mud you've flung onto us ;).

08/20/07 @ 11:17
Comment from: mike bayer [Visitor] · http://www.sqlalchemy.org

OK Robert, latest trunk cleans up the log lines in pool, sets the "use_threadlocal" option at the pool level on by default, and I've taken out a WeakValueDictionary() which was adding 4 or 5 calls; it now uses 21 calls for a fresh checkout (using queue.get()), and only 6 for an existing checkout. The RLock I've added to Queue.Queue adds 7 calls of overhead to queue.get(). I'm inclined to leave it, since it's known that a Queue.Queue.put() can happen inside a get() if you have a put() occurring within a __del__() method somewhere, which will deadlock.

08/20/07 @ 13:04
Comment from: fumanchu [Member] Email

Good to hear about the improvements, Mike. I hope you didn't think I was complaining about any of these things, more just trying to point out that there are different design goals and histories for each library. I hope you're not feeling too much pressure to make things faster.

Re: the DBAPI. Geniusql may use libraries like psycopg2, but it doesn't rely on DBAPI features (like cursor.lastrowid) universally; if there's a better way to do it with a given library (like "SELECT last_value" in PG), I'll prefer that. For pypgsql, Geniusql doesn't use any of the DBAPI calls--it goes straight to calling lower-level functions in the _libpq module that pypgsql ships. For ADO (SQL Server/MS Access), it makes COM calls directly, without using any third-party DBAPI library.

I'm not sure what you mean by "you aren't post-grabbing the ID in your example". The charts and my discussion all assume Geniusql is grabbing the IDs...?

Oh, and Geniusql uses a Queue for connections, too, just not on every call since it usually maps conns to threads (or you can swap that out and use any other means of sticky conns you like, or none at all). I plan to write a Pool subclass for my company soon which grows and shrinks the queue as needed.

Are you really using __del__, gosh, anywhere? Oog. ;) Weakrefs are much more reliable, and I bet they will be the key to removing the RLocks.

Thanks for all the info! It's not spam at all. :)

08/20/07 @ 18:19
Comment from: Ali Afshar [Visitor] · http://unpythonic.blogspot.com/


I had never heard of Geniusql, but I had a look at it. Thanks.

What do you use to generate the documentation in /docs/geniusql/?


08/20/07 @ 18:47
Comment from: fumanchu [Member] Email

Hi Ali,

I use a text editor and my own two hands to generate the documentation for Geniusql and Dejavu. Sorry there's no whiz-bang framework for that yet. ;)

08/20/07 @ 18:48
Comment from: mike bayer [Visitor] · http://www.sqlalchemy.org

Hey Robert -

Yes I'm a little crazy, but it's all good. My process is somewhat angsty as you might have noticed :)

Also, I guess I'm incorrect about my understanding of the "pre"/"post" grab. How is it that I'm using too much time issuing "SELECT nextval(seq)" but you're not spending too much time saying "SELECT last_value"?

Yeah, you might be right about the weakref. The thing is, the __del__ issue has so far been impossible to reproduce; I only know of it because it was apparent from a stack trace received by a user. At the moment I'm not really recalling how the heck he got it to do that; it's on the ML somewhere...

08/20/07 @ 20:17
Comment from: Michael Welch [Visitor]

Great work! Minor nitpick: the graphs would be much more readable if you had labels for the X and Y axes.

08/20/07 @ 20:28
Comment from: fumanchu [Member] Email

Okay Mike, I've added trace logs to Step 2 to illustrate what I mean by the difference in sequence handling.

I also added axis labels to the first chart. I'll add them to the others... uh... when I get time.

08/20/07 @ 23:42
Comment from: mike bayer [Visitor] · http://www.sqlalchemy.org

Robert -

What version of SA are you testing against there? The "_get_case_sensitive()" method was removed weeks ago, many iterations ago. I think a current stack dump would be much improved.

08/21/07 @ 14:52
Comment from: Massimo [Visitor] · http://mdp.cti.depaul.edu

I would love for you to benchmark the Gluon ORM (http://mdp.cti.depaul.edu). If you are interested and I can help in any way, let me know. I could benchmark it myself, but I believe a comparative and independent test would look better.

11/08/07 @ 14:06
Comment from: Massimo [Visitor] · http://mdp.cti.depaul.edu

I wrote equivalent benchmark code for the Gluon web framework and I posted it at
You can use it to compare Gluon to the other ORMs line by line and benchmark it yourself.
In my tests it comes out faster than the other three (and I find it more readable), but I would very much prefer if you or somebody else were to run the tests and post independent benchmark numbers.

11/10/07 @ 20:59
Comment from: Stephen Day [Visitor]


Just thinking you should add an additional disclaimer on the first chart, just in case readers skip past the disclaimer near the top (as I did). Or remove the chart entirely, as it doesn't seem to have any value.


02/06/08 @ 16:44
