Open Bug 507232 Opened 11 years ago Updated 10 years ago

Implement a thread-safe lock-free hashtable

Categories

(Core :: General, enhancement)

Status

ASSIGNED

People

(Reporter: mozilla+ben, Assigned: irina)

References

(Blocks 1 open bug)

Details

Implement the lock-free thread-safe hashtable described in this paper:

Shalev, O. and Shavit, N. 2006. Split-ordered lists: Lock-free extensible hash tables. J. ACM 53, 3 (May. 2006), 379-405.
http://doi.acm.org/10.1145/1147954.1147958

making use of the sorted list implementation in bug 502692.
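For reference, the heart of the paper's scheme is to order the list by bit-reversed hash, so that each bucket's nodes stay contiguous as the table doubles. A minimal sketch of the key transformation, assuming 32-bit hashes (the helper names are mine, mirroring the paper's so_regularkey/so_dummykey; real code would presumably use PRUint32 rather than stdint types):

  #include <stdint.h>

  // Naive 32-bit reversal; just a stand-in for the faster table-based
  // trick discussed in the comments below.
  static uint32_t ReverseBits32(uint32_t v)
  {
      uint32_t r = 0;
      for (int i = 0; i < 32; ++i) {
          r = (r << 1) | (v & 1);
          v >>= 1;
      }
      return r;
  }

  // Regular keys set the most significant bit before reversing, so after
  // reversal they always sort after the dummy key of their bucket.
  static uint32_t SoRegularKey(uint32_t hash)
  {
      return ReverseBits32(hash | 0x80000000U);
  }

  // Dummy (bucket sentinel) keys are just the bit-reversed bucket index.
  static uint32_t SoDummyKey(uint32_t bucket)
  {
      return ReverseBits32(bucket);
  }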

Irina, I know you've already attached an implementation to bug 500305, but we'll need a separate bug as parts of that patch become ready to land.
Here's a trick for generating the bit-reversal lookup table using macros:
http://graphics.stanford.edu/~seander/bithacks.html#BitReverseTable

That trick reverses only one byte at a time, but the 256-entry table is much smaller than a full-word table would be, and we can write wrappers that reverse any number of bytes using it.
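Concretely, here's a sketch of that table plus one possible word-sized wrapper (the wrapper name and stdint types are my own choices):

  #include <stdint.h>

  // 256-entry byte-reversal table, generated at compile time by the
  // macro trick from the bithacks page.
  static const uint8_t BitReverseTable256[256] = {
  #define R2(n) n, n + 2*64, n + 1*64, n + 3*64
  #define R4(n) R2(n), R2(n + 2*16), R2(n + 1*16), R2(n + 3*16)
  #define R6(n) R4(n), R4(n + 2*4), R4(n + 1*4), R4(n + 3*4)
      R6(0), R6(2), R6(1), R6(3)
  };

  // Reverse a 32-bit word: reverse each byte via the table, then swap
  // the byte order.  Wrappers for other widths follow the same pattern.
  static uint32_t ReverseBits32(uint32_t v)
  {
      return ((uint32_t)BitReverseTable256[v & 0xff] << 24) |
             ((uint32_t)BitReverseTable256[(v >> 8) & 0xff] << 16) |
             ((uint32_t)BitReverseTable256[(v >> 16) & 0xff] << 8) |
              (uint32_t)BitReverseTable256[(v >> 24) & 0xff];
  }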
If I'm not mistaken, the implementation given in
https://bugzilla.mozilla.org/attachment.cgi?id=391406&action=diff
forces the client hash function to be one-to-one (as it happens to be in the current test code!).  In other words, although it handles bucket collisions provided the keys are different, it doesn't handle identical-key collisions.
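To make the failure mode concrete (with a deliberately bad hash of my own invention, not the one in the patch):

  #include <stdint.h>

  // Summing bytes ignores their order, so "AB" and "BA" produce the
  // same hash.  A SortedList keyed on the hash alone can then store
  // only one of the two entries.
  static uint32_t WeakHash(const char *s)
  {
      uint32_t h = 0;
      for (; *s; ++s)
          h += (uint8_t)*s;
      return h;
  }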

There are a couple of ways to fix this:

  1.  Require ValueType to support operator <, so that SortedList can continue to maintain a total ordering of (key, value) pairs.

  2.  Associate more than one value with each key, somehow maintaining the uniqueness (and small constant size) of the set of associated values.  This only depends on ValueType supporting operator == (which it must, really).

  3.  Use some kind of open addressing (i.e., probing on collision). This would probably require SortedList::insert to support two failure modes: one indicating that the exact (key, value) pair is already in the table, and another indicating that a different value is associated with the given key. In the latter case, the key could be incremented and the insert retried. The find operation would need to be modified accordingly: when (key, different-value) is found, increment the key and look again.

I mention the third idea only for completeness; I really don't like it at all!  The second option feels the most correct, but the first would be easiest to implement.  Can you think of other ways around the collision problem?
I don't think that 1 or 3 would work (at least not without major changes to SortedList) because:
- SortedList is aware only of the keys, not of the values
- SortedList assumes there are no duplicates (and relies on this to determine whether an insertion succeeded)

2 can work if we use a ValueSet instead of the ValueType when inserting nodes in the SortedList, but I'm not sure what the behavior of find and remove should be then.
(In reply to comment #3)
> I don't think that 1 or 3 would work (at least not without major changes to
> SortedList) because:
> - SortedList is aware only of the keys, not of the values

We could change this.  I would be in favor of hiding the key-value distinction from SortedList.  All SortedList::find really needs is operator< and operator==, so the hash table could synthesize a type implementing these comparison operators based on the key and value.  But it can only do this if the value type is <-comparable.
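Something like the following is what I have in mind; a rough sketch (the Entry name is made up), assuming ValueType supports operator<:

  // Synthesized entry that hides the key-value distinction from
  // SortedList: it orders by key first and value second, so the list
  // keeps a total order even when several values share one key.
  template <class KeyType, class ValueType>
  struct Entry
  {
      KeyType key;
      ValueType value;

      bool operator==(const Entry& other) const
      {
          return key == other.key && value == other.value;
      }

      bool operator<(const Entry& other) const
      {
          if (key != other.key)
              return key < other.key;
          return value < other.value;  // requires ValueType::operator<
      }
  };

SortedList itself would then only ever see one opaque, totally ordered type.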

> 2 can work if we use a ValueSet instead of the ValueType when inserting nodes
> in the SortedList, but I'm not sure what the behavior of find and remove should
> be then.

Again, I think SortedList could be kept safe from the details here.  The hash table would never really need to remove anything from the SortedList, as long as it could insert/find/remove from the value set.  That value set would need to be an unordered list, since we're trying to avoid requiring ValueType to support operator <.
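For the value set I'm picturing something like a CAS-prepended singly linked list; a rough sketch (using std::atomic for brevity; the real code would use whatever CAS primitive SortedList already uses, and memory reclamation is ignored entirely):

  #include <atomic>

  // Unordered lock-free value set for one key.  Insert prepends with a
  // CAS; membership needs only ValueType::operator==.
  template <class ValueType>
  class ValueSet
  {
      struct Node
      {
          ValueType value;
          Node* next;
      };
      std::atomic<Node*> mHead;

  public:
      ValueSet() : mHead(nullptr) {}

      // Returns false if the value is already present.  Note the racy
      // pre-check: two concurrent inserts of the same value could both
      // succeed, so real uniqueness needs a stronger scheme.
      bool Insert(const ValueType& v)
      {
          Node* node = new Node{v, mHead.load()};
          do {
              if (Contains(v)) {
                  delete node;
                  return false;
              }
          } while (!mHead.compare_exchange_weak(node->next, node));
          return true;
      }

      bool Contains(const ValueType& v) const
      {
          for (Node* n = mHead.load(); n; n = n->next)
              if (n->value == v)
                  return true;
          return false;
      }
  };

Removal is the hard part: deleting from a lock-free list needs the same marked-pointer technique that SortedList presumably already uses, so maybe the set could reuse that machinery.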
I have an idea.  According to the paper, the motivation for the dummy nodes is to facilitate key removal.  Suppose we associate a small lock-free set with each key to hold the values that hash to it, instead of requiring a 1:1 key-value correspondence; then it seems we never need to remove keys, only add/remove values to/from the associated set.  If we don't remove the keys, do we still need dummy nodes?  And if we don't need dummy nodes, wouldn't that simplify the algorithm somewhat?
I don't think we can give up the dummy nodes even if we never remove the keys. They also serve to initialize the buckets.
So, assuming each bucket points to the first key it contains, we might have two threads trying to insert two different keys into the same bucket, where both keys come before the existing first key (in their "split-order"). Then we would have to update the bucket to point to the smaller of the two, and it seems to me that this introduces a race condition.