Closed Bug 486606 Opened 16 years ago Closed 14 years ago

Rewrite: Convert thread synchronization to new API

Categories

(Developer Infrastructure :: Source Code Analysis, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 645263

People

(Reporter: cjones, Assigned: cjones)

References

Details

Essentially, the rewrite needs to do the following:

  PRLock*     -> nsLock*
  PRMonitor*  -> nsMonitor*
  PRCMonitor* -> nsMonitor*
  PRCondVar*  -> nsCondVar*

  nsAutoLock::NewLock       -> nsLock::NewLock
  nsAutoMonitor::NewMonitor -> nsMonitor::NewMonitor

There is a one-to-one match between PR_XXX functions and nsXXX methods.  PRCMonitor is an oddity, but probably ought to be converted by hand.

If this proves to be easy, it would also be marvelous to rewrite the code in such a way that we can make these methods unnecessary:

  nsAutoLock::lock/unlock
  nsAutoMonitor::Enter/Exit

I think these methods are (1) an impediment to reasoning about code using AutoLock/Monitor; and (2) place an undue implementation burden on the AutoLock/Monitor implementation (it should be a dirt-simple wrapper, and not have to track any state.)

The former rewrite seems almost easy enough to do with sed.  The latter is tricker.  One way to do it is (i) find all decls that make calls to Auto::lock/unlock or Auto::Enter/Exit; (ii) change their type to nsLock*/nsMonitor*.  I'm open to other ideas, though.
Chris, thanks again for taking this on. A few comments:

PRCMonitor: just say no. We should stop using it.

unlock/lock and Exit/Enter nesting avoidance shows overlong outer critical sections, agree these are impediments and burdens. If you are game to revisit the nesting badness and shorten/reorder/iterate-to-recheck or whatever the right fix is to relieve the ancient, exiguous need for these bad boys, great.

/be
Why do we need NewLock/NewMonitor? Can't we get non-heap-allocated locks here?

Removing Enter/Exit is a problem. How else do you write code like

  mon.Enter();
  while (...) {
    if (...) {
      mon.Exit();
      ...
      mon.Enter();
    }
    ...
  }
  mon.Exit();

in a sane way? Our video decoders have a lot of this. If you can demonstrate how to get rid of it in say nsOggDecoder.cpp, I'll be convinced.

Creating new subclasses, say nsDroppableLock and nsDroppableMonitor, and only using them where necessary, would be fine.
(In reply to comment #2)
> Why do we need NewLock/NewMonitor? Can't we get non-heap-allocated locks here?
> 
> Removing Enter/Exit is a problem. How else do you write code like
> 
>   mon.Enter();
>   while (...) {
>     if (...) {
>       mon.Exit();
>       ...
>       mon.Enter();
>     }
>     ...
>   }
>   mon.Exit();
> 
> in a sane way? Our video decoders have a lot of this. If you can demonstrate
> how to get rid of it in say nsOggDecoder.cpp, I'll be convinced.

Sounds like need another RAII class that reverses the Enter/Exit pattern.
We do have an nsAutoUnlock class for locks, just not for monitors.
(In reply to comment #2)
> Why do we need NewLock/NewMonitor? Can't we get non-heap-allocated locks here?
> 

It's on my todo list.  I think the migration is easier if we switch to the new, heap-allocated API first.

> Removing Enter/Exit is a problem. How else do you write code like
> 
>   mon.Enter();
>   while (...) {
>     if (...) {
>       mon.Exit();
>       ...
>       mon.Enter();
>     }
>     ...
>   }
>   mon.Exit();
> 
> in a sane way? Our video decoders have a lot of this. If you can demonstrate
> how to get rid of it in say nsOggDecoder.cpp, I'll be convinced.
> 

The glib answer is s/nsAutoMonitor/nsMonitor/.  But creating nsAutoExitMonitor suits me as well, and appears to handle your case.
roc: why are the decoders using monitors over long critical sections? Don't make me blame Java now :-/.

Canonical synch structure for queueing uses a lock/condvar pair, not a monitor.

/be
(In reply to comment #6)
> roc: why are the decoders using monitors over long critical sections? Don't
> make me blame Java now :-/.
> 
> Canonical synch structure for queueing uses a lock/condvar pair, not a monitor.

There's state shared beween the main thread and the decoder thread. The decoder thread holds the monitor continuously except where we explicitly drop it (which is, roughly, whenever we might block or do something CPU-intensive, or in a Wait()). This is good because it reduces the number of interleavings.
(In reply to comment #7)
> (In reply to comment #6)
> > roc: why are the decoders using monitors over long critical sections? Don't
> > make me blame Java now :-/.
> > 
> > Canonical synch structure for queueing uses a lock/condvar pair, not a monitor.
> 
> There's state shared beween the main thread and the decoder thread. The decoder
> thread holds the monitor continuously except where we explicitly drop it (which
> is, roughly, whenever we might block or do something CPU-intensive, or in a
> Wait()). This is good because it reduces the number of interleavings.

This sounds like a job for nsMonitor.  Presumably the points where the monitor is released don't correspond to any kind of lexical scope.
(In reply to comment #8)
> This sounds like a job for nsMonitor.  Presumably the points where the monitor
> is released don't correspond to any kind of lexical scope.

Maybe not, but using non-scope-based objects can lead to errors that are difficult to find when code gets modified later (by including early returns, for instance). Having the scope enforce the duration of the lock (or, unlock) provides nice protection.

Could (should?) we even go so far as to remove the Lock/Unlock and Enter/Exit functions from the nsLock/nsMonitor?
Sure, there's always placement new and explicit destructor calls to fall back on! :-)
Don't make me make that constructor private! ;)
(In reply to comment #10)
> (In reply to comment #8)
> > This sounds like a job for nsMonitor.  Presumably the points where the monitor
> > is released don't correspond to any kind of lexical scope.
> 
> Maybe not, but using non-scope-based objects can lead to errors that are
> difficult to find when code gets modified later (by including early returns,
> for instance). Having the scope enforce the duration of the lock (or, unlock)
> provides nice protection.
> 

Except that the scope can't enforce that when AutoMonitor also has explicit Enter/Exit, which the decoder seems to require/want.  I think it's even more confusing to mix the two.  I want to keep "AutoMonitor == simple: you're in it when it's in scope", and "Monitor == not simple: you're on your own, be careful."

> Could (should?) we even go so far as to remove the Lock/Unlock and Enter/Exit
> functions from the nsLock/nsMonitor?

This would be equivalent to only providing AutoLock/AutoMonitor.  That'd be nice, but the amount of refactoring it would require is more than I want to tackle.  Plus there are probably cases where it's not possible, like code that needs to interrupt a task and drop back into an event loop, while still having the task hold a lock.

BTW, this entire discussion is probably better suited to bug 58904, where we're ostensibly deciding on the API :).  Rewriting the existing code to the new API (whatever it may be) is not nearly as interesting.
(In reply to comment #14)
> Except that the scope can't enforce that when AutoMonitor also has explicit
> Enter/Exit, which the decoder seems to require/want.

Oh, sorry, I was assuming we had agreed to make a nsAutoExitMonitor or somesuch.
(In reply to comment #14)
> 
> Except that the scope can't enforce that when AutoMonitor also has explicit
> Enter/Exit, which the decoder seems to require/want.  I think it's even more
> confusing to mix the two.  I want to keep "AutoMonitor == simple: you're in it
> when it's in scope", and "Monitor == not simple: you're on your own, be
> careful."

I'd replace all of the "ninja" constructs possible. If something can cause trouble, it will cause trouble.

I think it's most beneficial to discuss api in terms of rewriting, because as you rewrite you realize what is and isn't possible with existing code.
(In reply to comment #15)
> (In reply to comment #14)
> > Except that the scope can't enforce that when AutoMonitor also has explicit
> > Enter/Exit, which the decoder seems to require/want.
> 
> Oh, sorry, I was assuming we had agreed to make a nsAutoExitMonitor or
> somesuch.

That could work, if the places it's released/reacquired correspond to some
lexical scope.  But see above.

Also, AutoExitMonitor is weirder than AutoUnlock, since Monitors are reentrant.
 Either AutoExitMonitor (i) Exits the most recent Enter; or (ii) exits all
previous Enters.  Option (i) makes sense (to me) by analogy to AutoUnlock, but
doesn't guarantee that succeeding code is out of the Monitor.  Option (ii)
makes sense when you want to use it before Block() or Compute(), but doesn't
provide the same analogy to AutoUnlock (which feels funny).  I don't like
either by itself.

So it seems that we can either: 
  (1) provide bare nsMonitor and let performance-critical code take care of its
own problems; or
  (2) provide AutoEnterMonitor, autoExitMonitor ((i) above), and
AutoReallyExitMonitor ((ii) above), and refactor existing code to only use
AutoXXXMonitor on lexical boundaries.
It's probably obvious that I lean towards (1) :).
(In reply to comment #16)
> (In reply to comment #14)
> > 
> > Except that the scope can't enforce that when AutoMonitor also has explicit
> > Enter/Exit, which the decoder seems to require/want.  I think it's even more
> > confusing to mix the two.  I want to keep "AutoMonitor == simple: you're in it
> > when it's in scope", and "Monitor == not simple: you're on your own, be
> > careful."
> 
> I'd replace all of the "ninja" constructs possible. If something can cause
> trouble, it will cause trouble.
> 
> I think it's most beneficial to discuss api in terms of rewriting, because as
> you rewrite you realize what is and isn't possible with existing code.

I wholly agree.  But beyond a fancy analysis/refactoring, the best I can do is provide the guidance comment "Use AutoMonitor if at all possible" (which is in the proposed patch).

But maybe the analysis/refactoring is tractable ...
(In reply to comment #17)
>   (1) provide bare nsMonitor and let performance-critical code take care of its
> own problems; or

I don't think it's about performance-critical code. I think it's good for safety too to avoid unnecessary potential interleavings.
(In reply to comment #19)
> (In reply to comment #17)
> >   (1) provide bare nsMonitor and let performance-critical code take care of its
> > own problems; or
> 
> I don't think it's about performance-critical code. I think it's good for
> safety too to avoid unnecessary potential interleavings.

Hmm.  If you need to Exit the monitor at arbitrary Block() and Compute() points for better performance, then it seems like the main thread and decoding thread in your example are maybe sharing too much data.  A potentially simpler and better-performing way to reduce non-commutable interleavings would be with director classes and annotations that make data ownership (and its transfer) explicit.  But I'm really talking out of my arse here; I haven't looked at the code.
(In reply to comment #20)
> But I'm really talking out of my arse here; I haven't looked at the
> code.

Skimmed the code.  Wow, it's very nice.  A couple of small, practical points:

(1) these can be uncommented:
  //  NS_ASSERTION(PR_InMonitor(mDecoder->GetMonitor()), ...);
and updated to the new NSPR macro.  (It's not quite what you want, but close enough.)
(2) you might want a Mutex+CondVar rather than a Monitor.  Code like:
  mon.Exit();
  DecodeFrame();
  mon.Enter();
will cause weird performance bugs if the monitor is reentered anywhere in the context of the call to that function (meaning that the Exit doesn't fully exit the monitor).  This impl is a great choice for safety, and I'm sure this really nice code maintains implicit invariants guarding against reentry, but I'd rather debug a Mutex deadlock.

The rest of this is BS-y philosophical stuff.  I don't mean to single out nsOggDecode, because given its model, it's amazing code: safe and moderately easy to understand.  But, I don't like the model.  Maybe it's time to start having these kinds of discussions.

nsOggDecode uses a share-everything, protect-against-race model.  But all it really needs to share is nsOggDecode<-->(command queue)<-->nsOggDecodeStateMachine, network-->(buffer)-->nsOggDecodeStateMachine, and nsOggDecode-->(buffer)-->HTML media element sink.  Here, (command queue), (buffer), and (buffer) are the only things accessed concurrently, and can be more easily protected.  These things can also be implemented very efficiently on a shared-memory machine, and more easily ported to non-shared machines (like the Cell, but more may be coming in the not-too-distant future).  The advantage is simplicity: everything in nsOggDecodeStateMachine can run without caring about concurrency.  And similarly for nsOggDecode, "network" and the "media element".  This also can yield higher performance, because each of these actors can run concurrently if need be (and compilers can generate better code).

Whereas with the current implementation, each pair above has access to each other's entire public API.  Certainly it can be argued that much of that concurrent public API is what one would get if one started with (command queue)s and (buffer)s, and optimized them by hand.  But I can't think of any case in this example where that should be necessary.

Maybe an analogy to pointer safety would illustrate what I mean.  It'd be weird to write code like:

  Foo(Ptr* const p) {
    if (p == 0)  error;
    else         p->Bar();
    if (p == 0)  error;
    else         p->Baz();
    ...
    if (p == 0)  error;
    else         p->Boo();
  }

when

  Foo(Ptr* const p) {
    assert(p);
    p->Bar();
    p->Baz();
    ...
    p->Boo();
  }

is the same thing, but simpler and more efficient.  The analogy is that the second version compartmentalizes the potential weird behavior (concurrency :: garbage pointer), instead of sprinkling the safety concerns throughout the implementation.

Anyway, pie in the sky.  Let the flames begin.
New (soon to executable), rewrite specification:

rewrite SyncPrimitiveUpgrade {
  type PRLock => mozilla::Mutex
  call PR_NewLock => mozilla::Mutex::NewMutex
  call PR_DestroyLock => mozilla::Mutex::DestroyMutex
  call PR_Lock(lock) => lock->Lock()
  call PR_Unlock(lock) => lock->Unlock()

  type PRMonitor => mozilla::Monitor
  call PR_NewMonitor => mozilla::Monitor::NewMonitor
  call PR_DestroyLock => mozilla::Monitor::DestroyMonitor
  call PR_EnterMonitor(mon) => mon->Enter()
  call PR_ExitMonitor(mon) => mon->Exit()
  call PR_Wait(mon, ticks) => mon->Wait(ticks)
  call PR_Notify(mon) => mon->Notify()
  call PR_NotifyAll(mon) => mon->NotifyAll()

  type PRCondVar => mozilla::CondVar
  call PR_NewCondVar => mozilla::CondVar::NewCondVar
  call PR_DestroyCondVar => mozilla::CondVar::DestroyCondVar
  call PR_WaitCondVar(cvar, ticks) => cvar->Wait(ticks)
  call PR_NotifyCondVar(cvar) => cvar->Notify()
  call PR_NotifyAllCondVar(cvar) => cvar->NotifyAll()
}
I completely agree with everything in comment #21 -- I said something in another bug about bogo-Java-monitors being replaced by mutex+condvar. So who disagrees?

/be
Replacing the monitor with a mutex and conditional variable is completely reasonable.

Changing the decoder model to be purely message-passing, I'm not sure. We started it that way but it didn't work out very well. One problem is that for A/V sync you really do want the main thread peeking into shared state to see what the "current" video frame is. It would be interesting to have a broader discussion about that in some better forum.
(In reply to comment #24)
> One problem is that for
> A/V sync you really do want the main thread peeking into shared state to see
> what the "current" video frame is.

Do you mean the actual frame contents, or some small frame identifier (e.g., sequence number/address)?  If the latter, I had imagined the decoder pushing this information to the main thread's control queue.  For an integer or similar, that should be cheap.
For that to work, you'd have to flush the control queue every time the main thread paints the video, and in general the decoder will be sending events that require the main thread to run script, which we don't want to do during painting.

Maybe you could do it with multiple control channels or something. There might be some advantage over the current design, but we'd have to think about it.
Round 1 of this upgrade has been completed.  For the record, all the mozilla code could have been rewritten in one shot, this afternoon and evening; however, I stopped short of that for reasons related to the tryserver and reviewer latency.
(In reply to comment #24)
> Replacing the monitor with a mutex and conditional variable is completely
> reasonable.
> 
> Changing the decoder model to be purely message-passing, I'm not sure. We
> started it that way but it didn't work out very well. One problem is that for
> A/V sync you really do want the main thread peeking into shared state to see
> what the "current" video frame is. It would be interesting to have a broader
> discussion about that in some better forum.

Update: we (the Electrolysis team + media team) will attempt to use our IPC language (https://wiki.mozilla.org/IPC_Protocols) to make this happen.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
Product: Core → Firefox Build System
Product: Firefox Build System → Developer Infrastructure
You need to log in before you can comment on or make changes to this bug.