Showing posts with the label Hash

Can CRC32 Be Used As A Hash Function?

Answer : CRC32 works very well as a hash algorithm. The whole point of a CRC is to hash a stream of bytes with as few collisions as possible. That said, there are a few points to consider:

CRCs are not secure. Secure hashing requires a much more computationally expensive algorithm. For a simple bucket hasher, security is usually a non-issue.

Different CRC flavors exist with different properties. Make sure you use the right algorithm, e.g. the one with polynomial 0x11EDC6F41 (CRC32C), which is the optimal general-purpose choice.

As a hashing speed/quality trade-off, the x86 CRC32 instruction (which computes CRC32C) is tough to beat. However, this instruction doesn't exist in older CPUs, so beware of portability problems.

---- EDIT ----

Mark Adler provided a link to a useful article for hash evaluation by Bret Mulvey. Using the source code provided in the article, I ran the "bucket test" for both CRC32C and Jenkins96. These tables show the probability that a tru...
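To make the bucket-hasher idea concrete, here is a minimal sketch of CRC32C in pure Python, assuming the usual reflected form of polynomial 0x11EDC6F41 (constant 0x82F63B78) with an all-ones initial value and final XOR. The names crc32c and bucket_for are made up for this illustration and are not from the original answer. This bitwise loop is slow and only shows the algorithm; a real bucket hasher would rely on a table-driven version or the hardware instruction.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli); illustrative, not optimized."""
    crc = 0xFFFFFFFF  # conventional initial value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; XOR in the reflected polynomial
            # if the bit shifted out was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF  # conventional final XOR


def bucket_for(key: bytes, num_buckets: int) -> int:
    """Hypothetical helper: map a key to a bucket index via CRC32C."""
    return crc32c(key) % num_buckets


if __name__ == "__main__":
    print(hex(crc32c(b"123456789")))
    print(bucket_for(b"example-key", 16))
```

The string "123456789" is the standard check input for CRC variants, so it is a convenient way to confirm that a given implementation really is CRC32C and not the zlib/PNG CRC-32, which uses a different polynomial.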

Built In Python Hash() Function

Answer : As stated in the documentation, the built-in hash() function is not designed for storing the resulting hashes externally. It provides an object's hash value for use in dictionaries and similar structures. It is also implementation-specific (GAE uses a modified version of Python). Check out:

    >>> class Foo:
    ...     pass
    ...
    >>> a = Foo()
    >>> b = Foo()
    >>> hash(a), hash(b)
    (-1210747828, -1210747892)

As you can see, the values differ, because hash() uses the object's __hash__ method instead of 'normal' hashing algorithms such as SHA. Given the above, the rational choice is to use the hashlib module.

Use hashlib, because hash() was designed to quickly compare dictionary keys during a dictionary lookup and therefore does not guarantee the same result across Python implementations.

The response is absolutely no surprise: in fact

    In [1]: -5768830964305142685L & 0xffffffff
    Out[1]: 1934711907L...
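As a rough sketch of the hashlib route, the snippet below derives a hash from the content of a string, so the value stays the same across processes and Python implementations. SHA-256 is an arbitrary choice here, and stable_hash and stable_hash32 are hypothetical helper names, not anything from the original answer.

```python
import hashlib


def stable_hash(text: str) -> str:
    """Hex digest that depends only on the UTF-8 bytes of the text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stable_hash32(text: str) -> int:
    """Unsigned 32-bit integer taken from the first 4 digest bytes."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


if __name__ == "__main__":
    print(stable_hash("abc"))
    print(stable_hash32("abc"))
```

Truncating the digest to 32 bits gives up most of its collision resistance, so that variant only makes sense where a short, reproducible key is needed and security is not a concern.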