I dug into this failure a little and verified that it is specifically the "letter_range" portion of the test that sporadically fails. Frozensets constructed from floats, ints, or the empty frozenset, as well as frozensets recursively containing any of those, have deterministic hashes that don't vary with the seed.
I isolated the letter_range test for various values of n to see how often this failure happens. I ran it with PYTHONHASHSEED set to each of the first 10000 integers and got the following failing seeds:
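The seed-independence claim can be checked with a small sketch along these lines. The helper names (hash_under_seed, varies_with_seed) are made up for illustration, and it assumes the interpreter honours PYTHONHASHSEED when started as a subprocess:

```python
import os
import subprocess
import sys


def hash_under_seed(expr, seed):
    """Evaluate hash(<expr>) in a fresh interpreter with PYTHONHASHSEED=seed.

    Hypothetical helper, not part of the test suite.
    """
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def varies_with_seed(expr, seeds=(1, 2, 3)):
    """Return True if hash(<expr>) differs between any of the given seeds."""
    return len({hash_under_seed(expr, s) for s in seeds}) > 1
```

For example, varies_with_seed("frozenset([1, 2.5, frozenset()])") should be False, while varies_with_seed("frozenset('abc')") should be True because string hashes are randomized.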
n=2 -
n=3 -
n=4 300, 1308, 2453, 4196, 5693, 8280, 8353
n=5 4846, 5693
n=6 3974
n=7 36, 1722, 5064, 8562, 8729
n=8 2889, 5916, 5986
n=9 -
n=10 -
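A rough sketch of how such a scan could be run follows. It is my reconstruction, not the exact script I used: it assumes letter_range(n) is string.ascii_letters[:n], that the test hashes every subset of those letters as a frozenset, and that a failure means 4*u <= 2**n. The names CHILD, distinct_masked_hashes and failing_seeds are hypothetical.

```python
import os
import subprocess
import sys

# Code run in a child interpreter: prints u, the number of distinct
# values of hash(subset) & mask over all subsets of the first n letters.
CHILD = """
import itertools, string, sys
n = int(sys.argv[1])
letters = string.ascii_letters[:n]
mask = 2 ** n - 1
subsets = (frozenset(c) for r in range(n + 1)
           for c in itertools.combinations(letters, r))
print(len({hash(s) & mask for s in subsets}))
"""


def distinct_masked_hashes(n, seed):
    """Return u for the given n with PYTHONHASHSEED set to seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(n)],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


def failing_seeds(n, seeds):
    """Return the seeds for which the assumed check 4*u > 2**n fails."""
    t = 2 ** n
    return [s for s in seeds if 4 * distinct_masked_hashes(n, s) <= t]
```

Calling failing_seeds(7, range(10000)) would reproduce one row of the table, at the cost of one interpreter start per seed. The hash implementation differs between Python versions, so the seeds listed above are not expected to fail on every build.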
I checked the behavior of pseudorandom integers in the range 0 to 2**64-1 by taking a large sample of values of "len({random.randint(0,2**64) & 127 for _ in range(128)})". If the hashes really are effectively randomly distributed, the value of "u" in the test for n=7 follows an approximately Gaussian distribution with a mean of ~81 and a standard deviation of ~3.5. A value of 31 would therefore be about 14 standard deviations below the mean, which seems quite unreasonable.
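The random baseline can be reproduced with a sketch like the one below. It uses random.getrandbits(64) instead of randint so the draws stay strictly within 0 to 2**64-1; the function names and the fixed seed are illustrative choices.

```python
import random
import statistics


def distinct_low_bits(rng, n_bits=7):
    """Draw 2**n_bits random 64-bit ints and count distinct low n_bits values."""
    mask = 2 ** n_bits - 1
    return len({rng.getrandbits(64) & mask for _ in range(2 ** n_bits)})


def baseline(samples=10_000, n_bits=7, seed=0):
    """Return (mean, standard deviation) of the count over many trials."""
    rng = random.Random(seed)  # fixed seed only so the run is repeatable
    sizes = [distinct_low_bits(rng, n_bits) for _ in range(samples)]
    return statistics.mean(sizes), statistics.stdev(sizes)
```

The mean agrees with the expected number of occupied buckets when 128 items are thrown uniformly into 128 buckets, 128 * (1 - (127/128)**128), which is about 81.1.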
I then took the distribution of the set sizes from the letter_range test for n=7 with 10,000 different seeds and plotted it alongside the distribution of set sizes from the last 7 bits of pseudorandom numbers in the attached file "frozenset_string_n7_10k.png".
The hashes of the frozensets of single letters follow a very different distribution. Either this test is inappropriate and will cause sporadic build failures, or there is a problem with the hash algorithm.