Closed Bug 500305 Opened 15 years ago Closed 2 years ago

[meta] Explore thread-safe lock-free data structures

Categories

(Core :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking


RESOLVED INCOMPLETE

People

(Reporter: mozilla+ben, Unassigned)

References

Details

(Keywords: meta)

Attachments

(1 file, 7 obsolete files)

Metabug.  Bugs pertaining to the implementation of specific thread-safe lock-free data structures should be marked as blocking this bug.

General discussion of lock-free implementation techniques is welcome here.
Depends on: 500308
Depends on: 502692
Depends on: 504005
Attachment #389217 - Attachment is obsolete: true
(In reply to comment #2)
> Created an attachment (id=389226) [details]
> merged patches (queue & sortlist)

Are the Queue parts of this patch still up-to-date?
(In reply to comment #3)
> (In reply to comment #2)
> > Created an attachment (id=389226) [details] [details]
> > merged patches (queue & sortlist)
> 
> Are the Queue parts of this patch still up-to-date?

I made very small changes to Queue.h since I posted the patch.
Attached patch lockfree hashtable (obsolete) — Splinter Review
Did you guys check out Cliff Click's work?

http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html
http://video.google.com/videoplay?docid=2139967204534450862&ei=DCRqSurkFZjcqAP5lckc&q=cliff+click+lock+free+hash&client=firefox-a

Not sure it all worked out as planned, I haven't had time to search.

Mozilla, Google, and WebKit hashtables differ in some ways but all have been optimized to avoid malloc / new of many little structures. In its day the double hashtable code was a big savings compared to open hashing with buckets and chains where each element on the chain is separately allocated. The cost to trade off is entry address stability.

Also it has been important for performance to avoid division-based hashing, esp. with powers of two. Multiplicative hashing is used instead for strength reduction *and* better scattering behavior.

jsdhash.{h,cpp} is worth a read if you haven't looked, although it suffers from being C code (only recently compiled as C++) so the API is hard to use, compared to what one could do with C++; also with more delegation the minimum entry could encode free and removed sentinels along with the stored key in one word per entry, for a more efficient Set implementation than the two-word minimum that is currently supported. Google's hashtable code delegates like that.

http://mxr.mozilla.org/mozilla-central/source/js/src/jsdhash.h

/be
(In reply to comment #6)
> Did you guys check out Cliff Click's work?
> 
> http://blogs.azulsystems.com/cliff/2007/03/a_nonblocking_h.html
> http://video.google.com/videoplay?docid=2139967204534450862&ei=DCRqSurkFZjcqAP5lckc&q=cliff+click+lock+free+hash&client=firefox-a
> 
> Not sure it all worked out as planned, I haven't had time to search.

I hadn't seen that work, but thanks for the links.  The citation for the paper Irina's been using is "Split-Ordered Lists: Lock-Free Extensible Hash Tables" (Ori Shalev and Nir Shavit, 2006).  It should be available online, or I can provide it privately.

The Shalev/Shavit paper sets forth a clever notion of "moving 'the buckets among the items' rather than 'the items among the buckets'," as they describe it.  This approach promises to make rehashing a trivial operation.  (NB, even though rehashing is a rare operation, not having to worry about whether rehashing is happening simplifies/cheapens insert/lookup/delete.)

> Mozilla, Google, and WebKit hashtables differ in some ways but all have been
> optimized to avoid malloc / new of many little structures. In its day the
> double hashtable code was a big savings compared to open hashing with buckets
> and chains where each element on the chain is separately allocated. The cost to
> trade off is entry address stability.

We definitely plan to avoid malloc, as a future optimization, by overriding operator new to pool-allocate the Cell objects used by the SortedList data structure.  We need address stability, for better or worse, given that the "buckets" in the Shalev/Shavit algorithm are pointers into a master linked list of elements.

> Also it has been important for performance to avoid division-based hashing,
> esp. with powers of two. Multiplicative hashing is used instead for strength
> reduction *and* better scattering behavior.

Unfortunately, the Shalev/Shavit algorithm relies on the number of buckets being a power of two and on the use of % to truncate hashcodes (so that, when the buckets list doubles in size, elements from an old bucket may move into exactly one new bucket; if there are other ways of achieving that behavior, they are not obvious to me).

Your strength reduction and scattering behavior points are well taken; however, these concessions appear essential to the cheap rehashing behavior of the algorithm.  Just how much these drawbacks are mitigated by lock freedom remains to be quantified.

> jsdhash.{h,cpp} is worth a read if you haven't looked, although it suffers from
> being C code (only recently compiled as C++) so the API is hard to use,
> compared to what one could do with C++; also with more delegation the minimum
> entry could encode free and removed sentinels along with the stored key in one
> word per entry, for a more efficient Set implementation than the two-word
> minimum that is currently supported. Google's hashtable code delegates like
> that.
> 
> http://mxr.mozilla.org/mozilla-central/source/js/src/jsdhash.h

Will read.

> /be

Thanks!
Multiplicative hash can be thought of as reciprocal divide followed by remainder. Instead of h % n where h is a nominal uint32 hash code and n is a power of two (the buckets vector size), which strength-reduces to h & (n-1), you use:

  (h * PHI) >> (32 - log2(n));

PHI is the Golden Ratio in fixed point: 9E3779C7 via bc(1):

scale=8
obase=16
(1+sqrt(5))/2
1.9E37799
(2^32)/.
9E3779C7.DA09952

I think this has the same properties you seek. But I do not understand what you mean by "when the buckets list doubles in size, elements from an old bucket may move into exactly one new bucket" with %, since for n = 32 you could have h1 = 5 and h2 = 37 both hash to the same bucket (5), but when you grow to n = 64, h1 is in bucket 5 but h2 is in bucket 37.

/be
(In reply to comment #8)
> But I do not understand what you
> mean by "when the buckets list doubles in size, elements from an old bucket may
> move into exactly one new bucket" with %, since for n = 32 you could have h1 =
> 5 and h2 = 37 both hash to the same bucket (5), but when you grow to n = 64, h1
> is in bucket 5 but h2 is in bucket 37.

I meant that some elements stay in the original bucket and some elements move to a new bucket (but only one new bucket).  In your example, the original bucket is bucket 5, and, after the doubling, h1=5 stays in that bucket and h2=37 moves to the new bucket, 37.  The insight is that all k for which k % 2^i = 5 will have k % 2^{i+1} equal to either 5 or 5+2^i (e.g. with h3=69, h3 % 32 == h3 % 64 == 5, and with h4=101, h4 % 32 == 5 but h4 % 64 == 37).
(In reply to comment #9)
> (In reply to comment #8)
> > But I do not understand what you
> > mean by "when the buckets list doubles in size, elements from an old bucket may
> > move into exactly one new bucket"
 . . .
> ... The insight is that all k for which k % 2^i
> = 5 will have k % 2^{i+1} equal to either 5 or 5+2^i (e.g. with h3=69, h3 % 32
> == h3 % 64 == 5, and with h4=101, h4 % 32 == 5 but h4 % 64 == 37).

Ok, I see -- so "when the buckets list doubles in size, elements from the old bucket move into one of at most two buckets".

Multiplicative hash has the same property. Let m = log2(n). Given nominal hash codes h1 and h2 such that:

  (h1 * PHI) >> (32 - m) == (h2 * PHI) >> (32 - m)

doubling the table size increments m to (m + 1), shifting right by one fewer bit. This is a zero-filling right shift because h is uint32. So we get an extra bit on the right, which can of course be only 0 or 1. So h1 and h2 go in the same bucket if their extra bits agree, and in two different buckets otherwise.

/be
Brendan, thank you for the idea!

But in this case, even if h1 and h2 end up in the same bucket, that bucket's index will differ from the original one, which has to be kept. If h1 and h2 go to different buckets, then one of them has to be in the original bucket and the other in a bucket whose "parent" is the original bucket.

Maybe this would work if we shift left by (32-m) bits and then shift right (i.e., keep the least significant bits instead of the most significant)? That way, a key stays in the old bucket if it has a 0 in the high bit, or moves to the "child" bucket if it has a 1.

I should also mention that the parent is obtained by zeroing the most significant bit.

The requirement that the bucket be either the original one or its child exists because a thread that doesn't yet know about the size change may try to access the key in the old bucket. The child buckets are still reachable from the original buckets, so that thread can still find the key it is looking for.
- the user must provide a pointer to a hash function when creating a hashtable
and % was replaced by &
- updated the lockfree queue acc. to the comments at 
https://bugzilla.mozilla.org/show_bug.cgi?id=500308#c13

I've also experimented with multiplicative hashing, but it doesn't seem a solution in this case, since the bit patterns used for selecting the buckets are lost by multiplying the keys.
Attachment #389226 - Attachment is obsolete: true
Attachment #390532 - Attachment is obsolete: true
Attachment #391243 - Attachment is obsolete: true
(In reply to comment #13)
> Created an attachment (id=391406) [details]
> added LockfreeEventQueue.h and TestLockfreeEventQueue.cpp

A couple of things you might want to look into while I'm reviewing the hashtable.

- Move the non-testing header files from xpcom/tests into xpcom/glue/lockfree (needs to be created), and modify EXPORTS in xpcom/glue/Makefile.in to export these headers.  This is roughly the layout we'll want by the time this code is ready to land.  Be aware that you'll need to re-make in xpcom/glue before re-making xpcom/tests.

- In LockfreeEventQueue::GetEvent:

  - If you change the while loop test to

      while (result ? !lfq_.dequeue(*result) : lfq_.isEmpty())

    then you don't need either instance of

      if (result) empty = !lfq_.dequeue(*result);
      else empty = lfq_.isEmpty();

    and you can also get rid of PRBool empty.

  - Instead of waiting a whole 1000ms, you can PR_Sleep(PR_INTERVAL_NO_WAIT), which is equivalent to yielding.

  - Better yet, in order to avoid busy waiting entirely, we should use some sort of lock.  The trick is to acquire it only when necessary: when the queue is empty in GetEvent, and when PutEvent puts the first event into an empty queue (or, a little easier, when PutEvent knows that there are Getters waiting).

More to come on the hashtable.
(In reply to comment #14)
>   - Better yet, in order to avoid busy waiting entirely, we should use some
> sort of lock.

(By lock I mean monitor, and by monitor I mean mozilla::Monitor (#include "mozilla/Monitor.h").  You'll want to use mozilla::MonitorAutoEnter to acquire the lock associated with the monitor so that it gets released when the MonitorAutoEnter object goes out of scope.  See http://mxr.mozilla.org/mozilla-central/source/xpcom/tests/TestThreadPoolListener.cpp#95 for an example.)
Depends on: 507232
(In reply to comment #14)

>   - Instead of waiting a whole 1000ms, you can PR_Sleep(PR_INTERVAL_NO_WAIT),
> which is equivalent to yielding.
> 
>   - Better yet, in order to avoid busy waiting entirely, we should use some
> sort of lock.  The trick is to acquire it only when necessary: when the queue
> is empty in GetEvent, and when PutEvent puts the first event into an empty
> queue (or, a little easier, when PutEvent knows that there are Getters
> waiting).

The biggest problem here is that nsThreadPool, which uses the nsEventQueue, already holds a lock when it enters GetEvent, so for now, PR_Sleep is not enough.
Re: comment 15, why a monitor? Monitors are reentrant, shouldn't be used if you can use a condvar instead. Condition variables are for waiting. Monitors should be avoided, IMHO.

Re: comment 16: don't nest locks and don't sleep holding a lock -- what does that leave as the solution?

/be
(In reply to comment #17)
> Re: comment 15, why a monitor? Monitors are reentrant, shouldn't be used if you
> can use a condvar instead. Condition variables are for waiting. Monitors should
> be avoided, IMHO.

Mostly because of this comment:
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/CondVar.h#50

But if we can get away with a CondVar, avoiding the reentrancy support does seem like a compelling reason.

Chris Jones, any thoughts?  It would be nice to have a CondVarAutoEnter class.

> Re: comment 16: don't nest locks and don't sleep holding a lock -- what does
> that leave as the solution?

As long as the event queue interface explicitly advertises when it may sleep/wait, the nesting issue is the client's problem; in this case, nsThreadPool's.  Provided nsThreadPool only ever passes mayWait=PR_FALSE to nsEventQueue::GetEvent (via nsEventQueue::GetPendingEvent, say), the nested operation should always run to completion without sleeping/waiting, and the lock nesting should be safe (if not advisable), right?
(In reply to comment #18)
> (In reply to comment #17)
> > Re: comment 15, why a monitor? Monitors are reentrant, shouldn't be used if you
> > can use a condvar instead. Condition variables are for waiting. Monitors should
> > be avoided, IMHO.
> 
> Mostly because of this comment:
> http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/CondVar.h#50
> 

This comment is meant for "John Q. Public"; I made it because IMHO monitors have a friendlier and less error-prone API.  That said, (i) needing to be fast, like event queues might, is a "compelling reason"; and (ii) I defer to Brendan's judgment.

> But if we can get away with a CondVar, avoiding the reentrancy support does
> seem like a compelling reason.
> 
> Chris Jones, any thoughts?  It would be nice to have a CondVarAutoEnter class.
> 

I don't know much about our current event queue, but exposing the underlying monitor sounds very scary.  If this implementation is meant to be a drop-in replacement for the existing event queue, then exposing the monitor alone would make me lean towards sticking with monitor synchronization.

> > Re: comment 16: don't nest locks and don't sleep holding a lock -- what does
> > that leave as the solution?
> 
> As long as the event queue interface explicitly advertises when it may
> sleep/wait, the nesting issue is the client's problem; in this case,
> nsThreadPool's.  Provided nsThreadPool only ever passes mayWait=PR_FALSE to
> nsEventQueue::GetEvent (via nsEventQueue::GetPendingEvent, say), the nested
> operation should always run to completion without sleeping/waiting, and the
> lock nesting should be safe (if not advisable), right?

Sounds like nsThreadPool is making bad assumptions here.
(In reply to comment #19)
> (In reply to comment #18)
> > (In reply to comment #17)
> > > Re: comment 15, why a monitor? Monitors are reentrant, shouldn't be used if you
> > > can use a condvar instead. Condition variables are for waiting. Monitors should
> > > be avoided, IMHO.
> > 
> > Mostly because of this comment:
> > http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/CondVar.h#50
> 
> This comment is meant for "John Q. Public"; I made it because IMHO monitors
> have a friendlier and less error-prone API.  That said, (i) needing to be fast,
> like event queues might, is a "compelling reason"; and (ii) I defer to
> Brendan's judgment.

Monitors are a trap. Java erred in making objects thread-safe by default and then allowing nesting. People built multi-threaded apps with obvious hot spots and gratuitous locking for single-threaded paths. Then Java started adding thread-unsafe data structures to correct course, leading to the worst of both worlds.

I'd rather we not have Monitors. If we do (and we do now, so...) that comment must go.

> I don't know much about our current event queue, but exposing the underlying
> monitor sounds very scary.  If this implementation is meant to be a drop-in
> replacement for the existing event queue, then exposing the monitor alone would
> make me lean towards sticking with monitor synchronization.

Agreed under that constraint, but let's not go down a bad path just to preserve drop-in capability.

> > > Re: comment 16: don't nest locks and don't sleep holding a lock -- what does
> > > that leave as the solution?
> > 
> > As long as the event queue interface explicitly advertises when it may
> > sleep/wait, the nesting issue is the client's problem; in this case,
> > nsThreadPool's.  Provided nsThreadPool only ever passes mayWait=PR_FALSE to
> > nsEventQueue::GetEvent (via nsEventQueue::GetPendingEvent, say), the nested
> > operation should always run to completion without sleeping/waiting, and the
> > lock nesting should be safe (if not advisable), right?
> 
> Sounds like nsThreadPool is making bad assumptions here.

Agreed. This needs fixing by doing something else, not piling up nested locks or monitors.

/be
Attached patch queue and hashtable updated (obsolete) — Splinter Review
The *.h files were moved into xpcom/glue/lockfree.

LockfreeEventQueue is ready to replace nsEventQueue, but I'm not sure it's going to be a performance win. Both nsThread and nsThreadPool, which use nsEventQueue, take locks before accessing the queue.
Attachment #391406 - Attachment is obsolete: true
Counting how many threads are inside enqueue/dequeue seems to work nicely.

I guess I complicated the code a bit so that dequeue won't wait too much: it should wait for at most one delete operation, because cleanup always checks whether mutex > 0.
Also, I realized that I was wrong about enqueues being safe: for example, an enqueue can be interrupted by another enqueue and some dequeues, and then the old value of tail will point to an invalid location.
Attachment #394568 - Attachment is obsolete: true
Cleanup has to check mutex after setting cleaning to 1; also, I changed the "while(cleaning);" to a condition variable.
This is still better than the nsEventQueue lock and than the monitor used in the patch at
https://bug500305.bugzilla.mozilla.org/attachment.cgi?id=394568
because threads have to wait only when another thread is doing a cleanup and they need to do an addReffed assignment. There can be at most one thread doing the cleanup, and it will stop when it detects that other threads are waiting for it.

To solve the assignment problem for the sorted list, the same technique is used (cleanup + condVar), with the difference that a LockfreeQueue is used to store the pointers that need to be freed.
Attachment #395234 - Attachment is obsolete: true

The bug assignee hasn't logged in to Bugzilla in the last 7 months.
:overholt, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: irina → nobody
Flags: needinfo?(overholt)

I don't think we need to care about an ancient exploratory bug.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(overholt)
Resolution: --- → INCOMPLETE