Built-in Python hash() Function


Answer:

As stated in the documentation, the built-in hash() function is not designed for producing hashes that are stored externally. It provides an object's hash value so that the object can be used in dictionaries and similar structures. It is also implementation-specific (Google App Engine, GAE, uses a modified version of Python). Check out:

>>> class Foo:
...     pass
...
>>> a = Foo()
>>> b = Foo()
>>> hash(a), hash(b)
(-1210747828, -1210747892)

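As you can see, the two values differ: hash() calls the object's __hash__ method rather than a 'normal' hashing algorithm such as SHA, and in CPython the default __hash__ of a user-defined class is based on the object's identity, not its contents.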

Given the above, the rational choice is to use the hashlib module.
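
A minimal sketch of the hashlib approach, assuming SHA-256 and UTF-8 encoding are acceptable for the data being hashed (the helper name stable_digest is hypothetical, not part of hashlib):

```python
import hashlib


def stable_digest(text: str) -> str:
    """Return the SHA-256 hex digest of text, encoded as UTF-8."""
    # hashlib works on bytes, so the string is encoded explicitly.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


digest = stable_digest("hello")
print(digest)
```

Unlike hash(), the digest depends only on the input bytes, so it should be the same on any platform or Python implementation that provides hashlib.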


Use hashlib, because hash() was designed to:

quickly compare dictionary keys during a dictionary lookup

and is therefore not guaranteed to return the same value across Python implementations.


The result is no surprise. In fact, taking the lower 32 bits of the 64-bit value gives:

In [1]: -5768830964305142685L & 0xffffffff
Out[1]: 1934711907L

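So if you want reliable results for ASCII strings, take just the lower 32 bits as an unsigned integer. The hash function for strings is 32-bit-safe and almost portable. Note that this describes Python 2, as the L suffix in the session indicates; since Python 3.3, string hashes are salted per process by default, so they vary between runs unless PYTHONHASHSEED is fixed.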
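
The masking step can be sketched in Python 3 syntax, where integer literals take no L suffix. The helper name low32 is hypothetical; the input is the value from the session above.

```python
def low32(value: int) -> int:
    """Keep the lower 32 bits of value, read as an unsigned integer."""
    # Python integers are arbitrary precision, so the mask also turns a
    # negative input into its non-negative 32-bit two's complement form.
    return value & 0xFFFFFFFF


hash_64bit = -5768830964305142685
print(low32(hash_64bit))  # 1934711907
```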

On the other hand, you cannot rely at all on hash() being invariant for any object whose class does not explicitly define the __hash__ method.
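
As an illustration, here is a minimal sketch of a class that defines __hash__ explicitly. The Point class is hypothetical, and delegating to the hash of a tuple of fields is one common choice, not the only one.

```python
class Point:
    """Hypothetical value class whose hash depends only on its fields."""

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self) -> int:
        # Equal points produce equal hashes because the tuples are equal.
        return hash((self.x, self.y))


a = Point(1, 2)
b = Point(1, 2)
print(a is b)              # False
print(hash(a) == hash(b))  # True
```

This makes the hash independent of object identity within one interpreter. The concrete value still comes from hash() on a tuple, so it remains implementation-specific and is not a substitute for hashlib when the value has to be stored.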

For ASCII strings it works only because the hash is computed from the individual characters of the string, as in the following:

class string:
    def __hash__(self):
        if not self:
            return 0  # empty
        value = ord(self[0]) << 7
        for char in self:
            value = c_mul(1000003, value) ^ ord(char)
        value = value ^ len(self)
        if value == -1:
            value = -2
        return value

where c_mul is "cyclic" multiplication: the product wraps around at the machine word size instead of growing, as fixed-width integer arithmetic does in C.
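
A runnable sketch of that scheme in Python 3, under stated assumptions: the bits parameter and the name legacy_string_hash are hypothetical, and c_mul is modelled here as wrapping to a signed integer of the chosen width. It illustrates the description above and is not the hash() of any current Python release.

```python
def c_mul(a: int, b: int, bits: int = 32) -> int:
    """Multiply a and b, wrapping to a signed integer of the given width."""
    mask = (1 << bits) - 1
    product = (a * b) & mask
    # Reinterpret the top bit as the sign bit.
    if product >= 1 << (bits - 1):
        product -= 1 << bits
    return product


def legacy_string_hash(text: str, bits: int = 32) -> int:
    """Sketch of the character-by-character string hash described above."""
    if not text:
        return 0  # empty string
    value = ord(text[0]) << 7
    for char in text:
        value = c_mul(1000003, value, bits) ^ ord(char)
    value ^= len(text)
    if value == -1:
        value = -2  # -1 is reserved as an error marker
    return value


h32 = legacy_string_hash("example", bits=32)
h64 = legacy_string_hash("example", bits=64)
print((h64 & 0xFFFFFFFF) == (h32 & 0xFFFFFFFF))
```

Because multiplication and XOR on the low bits do not depend on the high bits, the lower 32 bits of the 64-bit result match the 32-bit result, except in the rare case where the -1 adjustment applies to only one of the two widths. That exception is one reason to call the function "almost" portable.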

