For reasons I don't have time to fully explore, once in a great while pickle.dumps(obj) produces different strings for the same object on successive runs. Here's the one that bit me today:
((2, 'Tiger River'), "(I2\nS'Tiger River'\np1\ntp2\n.", '23ca69094eb994abc75cdec989d22398')
((2, 'Tiger River'), "(I2\nS'Tiger River'\ntp1\n.", '2b40ffb53a0be2c4cfe4f99b24d64842')
The first item in each set is the object being pickled. The second item is the result of pickle.dumps(obj). Note that the same object is pickled to two different strings on distinct runs. No idea why.
This is important enough to blog about because the third item in each set above is an md5 hash of the pickle, which is how I discovered this: my attempts to recover memcached objects using keys from md5.new(pickle.dumps(object)).hexdigest() failed because the pickles differed. So until someone with more time and brains than I comments on why, I recommend you don't use pickle.dumps to create md5 seeds for cache keys.
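One way to follow that advice, sketched in Python 3 under the assumption that the cached objects are JSON-serializable: derive the key from a canonical JSON encoding rather than from a pickle. `cache_key` is a hypothetical helper, and `hashlib.md5` stands in for the old `md5` module.

```python
import hashlib
import json


def cache_key(obj):
    """Hypothetical helper: md5 hex key from a canonical JSON encoding of obj.

    Assumes obj is JSON-serializable (numbers, strings, lists/tuples,
    dicts with string keys). sort_keys fixes dict ordering and the compact
    separators fix whitespace, so the encoding depends only on the value.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


T = (2, "Tiger River")
print(cache_key((2, "Tiger River")))
print(cache_key(T))  # same key: unlike a pickle, no memo or refcount state is involved
```

The trade-off: JSON has no tuple type, so `(2, "Tiger River")` and `[2, "Tiger River"]` map to the same key, which is usually acceptable for cache lookups.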
Is that really pickle.dumps? It looks like cPickle.dumps.
>>> import pickle, cPickle
>>> pickle.dumps( (2, "Tiger River"), 0 )
"(I2\nS'Tiger River'\np0\ntp1\n."
>>> cPickle.dumps( (2, "Tiger River"), 0 )
"(I2\nS'Tiger River'\np1\ntp2\n."
It appears to differ based on reference counts:
>>> cPickle.dumps( (2, "Tiger River"), 0 )
"(I2\nS'Tiger River'\np1\ntp2\n."
>>> T = (2, "Tiger River")
>>> cPickle.dumps( T, 0)
"(I2\nS'Tiger River'\ntp1\n."
Probably because of this optimization in cPickle.c::put
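static int
put(Picklerobject *self, PyObject *ob)
{
    /* Memoize (emit a PUT) only if the object might be referenced again:
       skip objects whose refcount is below 2, and skip everything in fast mode. */
    if (ob->ob_refcnt < 2 || self->fast)
        return 0;
    return put2(self, ob);
}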
where "put2" is the function that emits the "p1\n" PUT opcode (a memo entry) causing the difference.
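The broader point is that the pickle memo (the PUT/GET opcodes) tracks object identity, not value. The following minimal Python 3 sketch assumes CPython's `pickle` module rather than the Python 2 `cPickle` discussed above, and does not reproduce the refcount check itself. It shows two equal tuples pickling to different bytes purely because one of them shares a string object.

```python
import pickle

# Build two equal strings at runtime so they are distinct objects
# (joining two pieces creates a new string object on each call).
a = "".join(["Tiger", " River"])
b = "".join(["Tiger", " River"])

shared = (2, a, a)    # the same string object appears twice
separate = (2, a, b)  # two equal but distinct string objects

p_shared = pickle.dumps(shared, protocol=0)      # second `a` becomes a GET of the memo
p_separate = pickle.dumps(separate, protocol=0)  # `b` is written out in full

print(p_shared)
print(p_separate)
print(pickle.loads(p_shared) == pickle.loads(p_separate))  # equal values, different bytes
```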
I get different results between pickle and cPickle:
import pickle, cPickle, dis
tup = (2, 'Tiger River')
pickle.dumps(tup)
"(I2\nS'Tiger River'\np0\ntp1\n."
cPickle.dumps(tup)
"(I2\nS'Tiger River'\ntp1\n."
Try passing the pickles to dis.dis:
dis.dis(pickle.dumps(tup))
0 STORE_SLICE+0
1 PRINT_ITEM_TO
2 DELETE_SLICE+0
3 UNARY_POSITIVE
4 RETURN_VALUE
5
6 IMPORT_STAR
7 LOAD_ATTR 25959 (25959)
10 21024
13 LOAD_ATTR 25974 (25974)
16 2599
19 JUMP_IF_TRUE 2608 (to 2630)
22 LOAD_GLOBAL 12656 (12656)
25 UNARY_POSITIVE
26
dis.dis(cPickle.dumps(tup))
0 STORE_SLICE+0
1 PRINT_ITEM_TO
2 DELETE_SLICE+0
3 UNARY_POSITIVE
4 RETURN_VALUE
5
6 IMPORT_STAR
7 LOAD_ATTR 25959 (25959)
10 21024
13 LOAD_ATTR 25974 (25974)
16 2599
19 LOAD_GLOBAL 12656 (12656)
22 UNARY_POSITIVE
23
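Note that dis.dis treats a string as raw Python bytecode, so the opcode names above (STORE_SLICE, PRINT_ITEM_TO, ...) are not pickle instructions; the only useful signal is the differing lengths. The standard library's pickletools module decodes actual pickle opcodes. A Python 3 sketch (the discussion here is Python 2) that disassembles the two pickles quoted at the top of the post, then applies pickletools.optimize, which drops PUT opcodes that no GET refers to:

```python
import io
import pickle
import pickletools

# The two protocol-0 pickles of (2, 'Tiger River') quoted in the post.
with_put = b"(I2\nS'Tiger River'\np1\ntp2\n."
without_put = b"(I2\nS'Tiger River'\ntp1\n."


def listing(data):
    """Return pickletools' symbolic disassembly of a pickle as a string."""
    buf = io.StringIO()
    pickletools.dis(data, out=buf)
    return buf.getvalue()


print(listing(with_put))     # MARK, INT, STRING, PUT 1, TUPLE, PUT 2, STOP
print(listing(without_put))  # same, minus the PUT after the STRING

# Both decode to the same object.
assert pickle.loads(with_put) == pickle.loads(without_put) == (2, "Tiger River")

# No GET opcodes appear, so optimize removes every PUT and the bytes match.
assert pickletools.optimize(with_put) == pickletools.optimize(without_put)
```

This only normalizes unused PUTs; it does not make pickling deterministic in general (shared objects still produce GETs), so it is no substitute for a canonical cache key.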
Try using marshal.dumps; it should be faster, but as far as I know, neither pickle nor marshal guarantees identical output for equal objects.
pickle and cPickle aren't guaranteed to produce the same output:
http://docs.python.org/lib/module-cPickle.html#tex2html118
Since the pickle data format is actually a tiny stack-oriented programming language, and some freedom is taken in the encodings of certain objects, it is possible that the two modules produce different data streams for the same input objects. However it is guaranteed that they will always be able to read each other's data streams.
Also, you may want to read this to understand pickle further:
http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/