pickle.dumps not suitable for hashing

11/03/07

05:33:51 pm, by fumanchu
Categories: Python

For reasons I don't have time to fully explore, once in a great while pickle.dumps(obj) doesn't produce consistent strings on successive runs. Here's the one that bit me today:

((2, 'Tiger River'), "(I2\nS'Tiger River'\np1\ntp2\n.", '23ca69094eb994abc75cdec989d22398')
((2, 'Tiger River'), "(I2\nS'Tiger River'\ntp1\n.", '2b40ffb53a0be2c4cfe4f99b24d64842')

The first item in each set is the object being pickled. The second item is the result of pickle.dumps(obj). Note that the same object is pickled to two different strings on distinct runs. No idea why.

This is important enough to blog about because the third item in each set above is an md5 hash of the pickle, which is how I discovered this: my attempts to recover memcached objects using keys from md5.new(pickle.dumps(object)).hexdigest() failed because the pickles differed. So until someone with more time and brains than I comments on why, I recommend you don't use pickle.dumps to create md5 seeds.
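One way to sidestep the problem is to hash a representation whose byte layout you control instead of hashing the pickle. The following is only a sketch, written for Python 3 (hashlib rather than the md5 module used above). It assumes the key objects are made of JSON-friendly values such as numbers, strings, lists and dicts, and cache_key is a hypothetical helper name, not something from any library.

```python
import hashlib
import json


def cache_key(obj):
    """Return an md5 hex digest built from a canonical JSON dump of obj.

    Assumption: obj contains only JSON-serializable values.
    """
    # sort_keys and fixed separators pin down dict ordering and whitespace,
    # so equal values always serialize to the same text.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


# Two equal tuples, one of them freshly built, get the same key.
key_one = cache_key((2, "Tiger River"))
key_two = cache_key((2, "Tiger" + " River"))
print(key_one == key_two)
```

A caveat with this choice: JSON has no tuple type, so (2, 'Tiger River') and [2, 'Tiger River'] map to the same key. If that distinction matters for your cache, a different canonical form would be needed.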

3 comments

Comment from: Andrew Dalke [Visitor]

Is that really pickle.dumps? It looks like cPickle.dumps.


>>> import pickle, cPickle
>>> pickle.dumps( (2, "Tiger River"), 0 )
"(I2\nS'Tiger River'\np0\ntp1\n."
>>> cPickle.dumps( (2, "Tiger River"), 0 )
"(I2\nS'Tiger River'\np1\ntp2\n."


The output appears to differ based on reference counts:


>>> cPickle.dumps( (2, "Tiger River"), 0 )
"(I2\nS'Tiger River'\np1\ntp2\n."
>>> T = (2, "Tiger River")
>>> cPickle.dumps( T, 0)
"(I2\nS'Tiger River'\ntp1\n."


Probably because of this optimization in cPickle.c::put


static int
put(Picklerobject *self, PyObject *ob)
{
    if (ob->ob_refcnt < 2 || self->fast)
        return 0;

    return put2(self, ob);
}


where "put2" is the one which generates the "p1\n" causing the problems.
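The difference between the two strings can be made visible by listing their opcodes. This is a rough sketch for Python 3 using the standard pickletools module, with the two strings from the post rewritten as bytes literals. The variable names and the opcode_names helper are made up for illustration. It also shows that pickletools.optimize, which drops PUT opcodes that no GET refers to, happens to erase this particular difference. That should not be read as a general promise of canonical pickles.

```python
import pickle
import pickletools

# The two protocol-0 streams quoted in the post, as Python 3 bytes.
with_both_puts = b"(I2\nS'Tiger River'\np1\ntp2\n."
with_one_put = b"(I2\nS'Tiger River'\ntp1\n."


def opcode_names(stream):
    """Return the names of the opcodes in a pickle stream, in order."""
    return [opcode.name for opcode, arg, pos in pickletools.genops(stream)]


names_a = opcode_names(with_both_puts)
names_b = opcode_names(with_one_put)

# Remove PUT opcodes that are never read back by a GET.
normalized_a = pickletools.optimize(with_both_puts)
normalized_b = pickletools.optimize(with_one_put)

print(names_a)
print(names_b)
print(normalized_a == normalized_b)
print(pickle.loads(normalized_a))
```

The first stream carries one PUT more than the second (the memo entry for the string), which is the "p1\n" that put2 emits.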

11/03/07 @ 19:47
Comment from: Gustavo Picon [Visitor] · http://gpicon.org

I get different results between pickle and cPickle:


>>> import pickle, cPickle, dis
>>> tup = (2, 'Tiger River')
>>> pickle.dumps(tup)
"(I2\nS'Tiger River'\np0\ntp1\n."
>>> cPickle.dumps(tup)
"(I2\nS'Tiger River'\ntp1\n."


Try passing

>>> dis.dis(pickle.dumps(tup))
0 STORE_SLICE+0
1 PRINT_ITEM_TO
2 DELETE_SLICE+0
3 UNARY_POSITIVE
4 RETURN_VALUE
5
6 IMPORT_STAR
7 LOAD_ATTR 25959 (25959)
10 21024
13 LOAD_ATTR 25974 (25974)
16 2599
19 JUMP_IF_TRUE 2608 (to 2630)
22 LOAD_GLOBAL 12656 (12656)
25 UNARY_POSITIVE
26
>>> dis.dis(cPickle.dumps(tup))
0 STORE_SLICE+0
1 PRINT_ITEM_TO
2 DELETE_SLICE+0
3 UNARY_POSITIVE
4 RETURN_VALUE
5
6 IMPORT_STAR
7 LOAD_ATTR 25959 (25959)
10 21024
13 LOAD_ATTR 25974 (25974)
16 2599
19 LOAD_GLOBAL 12656 (12656)
22 UNARY_POSITIVE
23



Try using marshal.dumps; that should be faster, but AFAIK neither pickle nor marshal guarantees identical dumps.

11/03/07 @ 21:24
Comment from: Ido [Visitor]

pickle and cPickle aren't guaranteed to produce the same output:

http://docs.python.org/lib/module-cPickle.html#tex2html118

Since the pickle data format is actually a tiny stack-oriented programming language, and some freedom is taken in the encodings of certain objects, it is possible that the two modules produce different data streams for the same input objects. However it is guaranteed that they will always be able to read each other's data streams.
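A small sketch of what that guarantee means in practice, written for Python 3 with the two strings from the post as bytes literals (the variable names are only illustrative): both streams load back to the same object, yet hashing the raw streams gives different digests.

```python
import hashlib
import pickle

# The two streams from the post, as Python 3 bytes.
stream_a = b"(I2\nS'Tiger River'\np1\ntp2\n."
stream_b = b"(I2\nS'Tiger River'\ntp1\n."

# Either stream can be read back, and the results compare equal.
obj_a = pickle.loads(stream_a)
obj_b = pickle.loads(stream_b)

# The raw bytes differ, so any hash taken over them differs too.
digest_a = hashlib.md5(stream_a).hexdigest()
digest_b = hashlib.md5(stream_b).hexdigest()

print(obj_a == obj_b)
print(digest_a == digest_b)
```

So the streams are interchangeable for loading, but not for building keys.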



Also, you may want to read this to further understand pickle:
http://peadrop.com/blog/2007/06/18/pickle-an-interesting-stack-language/

11/04/07 @ 16:03
