Status: RESOLVED FIXED (enhancement)
Opened 2 years ago; last updated 3 months ago

People: Reporter: billm; Assigned: billm

Tracking: Depends on 7 bugs, Blocks 1 bug
Version: unspecified; Target Milestone: mozilla57
Firefox Tracking Flags: firefox57 fixed

Attachments (4 attachments, 6 obsolete attachments):
- 757 bytes, patch; froydnj: review+
- 87.79 KB, patch; froydnj: review+
- 1.24 KB, patch; froydnj: review+
- 8.38 KB, patch; froydnj: review+
This bug will cover the work to implement a cooperative scheduler for the Quantum DOM project. Basic pieces are:
- Start a pool of threads, only one of which will run at a time.
- Have a queue of labeled and unlabeled tasks.
- Threads in the pool pop tasks off the queue and run them if it is valid to do so.
- Threads can be interrupted while running JS or doing layout so that a different thread in the pool can run.
This patch creates a ZoneGroup (a JS engine abstraction) for each TabGroup (a browser abstraction). The two should be roughly 1:1, modulo differences with system stuff.
Posted patch TLS patch (obsolete) — Splinter Review
This patch tries to cope with the fact that we'll now be running "main thread" code across a pool of cooperatively scheduled threads. So far I've mostly been commenting out thread safety assertions and copying TLS across threads as needed.
Posted patch main scheduler mishmash patch (obsolete) — Splinter Review
This is a WIP implementation of the scheduler. The patch also includes more commented out assertions and some XPConnect changes related to bug 1343396 (I got lazy keeping everything separate).

The main files of interest are xpcom/threads/{Scheduler,SchedulerQueue,CooperativeThreadPool}.{cpp,h}.
Depends on: 1359490
(In reply to Bill McCloskey (:billm) from comment #0)
> - Threads can be interrupted while running JS or doing layout so that a
> different thread in the pool can run.

How is this going to interact with global state?  I ran across nsContentUtils::{Set,}MicroTaskLevel() &co today and I'm sure there are other things like it; are we going to carefully orchestrate things such that certain "while running JS" things are uninterruptible?
(In reply to Nathan Froyd [:froydnj] from comment #4)
> (In reply to Bill McCloskey (:billm) from comment #0)
> > - Threads can be interrupted while running JS or doing layout so that a
> > different thread in the pool can run.
> 
> How is this going to interact with global state?  I ran across
> nsContentUtils::{Set,}MicroTaskLevel() &co today and I'm sure there are
> other things like it; are we going to carefully orchestrate things such that
> certain "while running JS" things are uninterruptible?

Yeah, stuff like that will have to become TLS. Olli is working on the microtask problem. I think there is somewhat less of it than might be feared. Code like this usually needs special handling for nested event loops, and we don't have too much special code for nested event loops. Microtasks are here, for example:
http://searchfox.org/mozilla-central/rev/baf47b352e873d4516d7656186d6d7c7447d3873/dom/base/nsDocument.cpp#12986
Depends on: 1359903
Depends on: 1359923
Depends on: 1360236
Depends on: 1372670
Please note that the IDB counters are also exposed via TabGroup::IndexedDB{Transaction|Database}Counter() in bug 1352401, for use in the scheduler's preemption decisions.
Posted patch main patch (obsolete) — Splinter Review
It's probably time to start getting feedback on this. It depends a lot on bug 1382922, so you should probably look over that one first.

The basic design is to have a CooperativeThreadPool that allows only one thread in the pool at a time to run. Then there's a Scheduler, which orchestrates everything, decides when to yield, and handles the event loop stuff. The patch adds a LabeledEventQueue, which keeps a separate queue for each SchedulerGroup. See the big comment in LabeledEventQueue.h to understand how this is structured.

I also had to change HasPendingEvent to return a tri-state result: no events, events that are not ready, or events that are ready. This is useful for vsync. Suppose there is a vsync task in the high-priority queue, and suppose it's not ready to run because it would touch a SchedulerGroup that's already running something. If a thread asks for an event, we want PrioritizedEventQueue to return no events in this case, rather than skipping over the high-priority queue and returning a normal-priority event.
Attachment #8851100 - Attachment is obsolete: true
Attachment #8851105 - Attachment is obsolete: true
Attachment #8851106 - Attachment is obsolete: true
Attachment #8891136 - Flags: feedback?(nfroyd)
Attachment #8891136 - Attachment is patch: true
Comment on attachment 8891136 [details] [diff] [review]
main patch

Review of attachment 8891136 [details] [diff] [review]:
-----------------------------------------------------------------

I keep telling myself I need to take a good look at the core logic of the scheduler, but we can do a short first round of feedback, at least.

::: xpcom/threads/CooperativeThreadPool.cpp
@@ +123,5 @@
> +CooperativeThreadPool::CooperativeThread::ThreadMethod()
> +{
> +  sTlsCurrentThread.set(this);
> +
> +  PR_SetCurrentThreadName("CoopThrd");

We have nsThreadPoolNaming, could you use that to name the threads, please?

::: xpcom/threads/LabeledEventQueue.cpp
@@ +50,5 @@
> +      return true;
> +    }
> +  }
> +
> +  return false;

Might be slightly more in keeping with code conventions to do:

nsCOMPtr<...> labelable = ...;
if (!labelable) {
  return false;
}

...search for it in scheduler groups...

@@ +84,5 @@
> +  if (isLabeled) {
> +    RunnableEpochQueue* queue = mLabeled.LookupOrAdd(group);
> +    queue->Push(QueueEntry(event.forget(), epoch->mEpochNumber));
> +  } else {
> +    mUnlabeled.Push(QueueEntry(event.forget(), epoch->mEpochNumber));

Maybe just:

RunnableEpochQueue* queue = isLabeled ? mLabeled.LookupOrAdd(group) : &mUnlabeled;
queue->Push(QueueEntry(...));

@@ +108,5 @@
> +  if (mEpochs.IsEmpty()) {
> +    return nullptr;
> +  }
> +
> +  Epoch epoch = mEpochs.FirstElement();

`Epoch&`, perhaps?

::: xpcom/threads/Queue.h
@@ +8,5 @@
>  #define mozilla_Queue_h
>  
>  namespace mozilla {
>  
> +template<class T, size_t ItemsPerPage = 255, size_t Slop = 0>

What is Slop for?  I imagine that it's used for carefully rounding things up to jemalloc bucket sizes?

I don't understand why it's good to have the user specify this, though.  Why not take ItemsPerPage as a hint instead, and inside the class attempt to figure out what a good jemalloc-friendly Page size would be?

::: xpcom/threads/ThreadEventQueue.cpp
@@ +153,5 @@
>  bool
>  ThreadEventQueue<InnerQueueT>::ShutdownIfNoPendingEvents()
>  {
>    MutexAutoLock lock(mLock);
> +  if (mNestedQueues.IsEmpty() && mBaseQueue->HasPendingEvent(lock).IsEmpty()) {

Maybe the method should be called Status, instead of HasPendingEvent, since it's no longer a binary thing?

::: xpcom/threads/nsThreadManager.cpp
@@ +121,3 @@
>  
> +  bool startScheduler = false;
> +  if (PR_GetEnv("MOZ_USE_SCHEDULER") && XRE_IsContentProcess()) {

This PR_GetEnv is just there for testing purposes, right?  I assume it's going to be replaced by a pref or something at some point?
(In reply to Nathan Froyd [:froydnj] from comment #8)
> ::: xpcom/threads/Queue.h
> @@ +8,5 @@
> >  #define mozilla_Queue_h
> >  
> >  namespace mozilla {
> >  
> > +template<class T, size_t ItemsPerPage = 255, size_t Slop = 0>
> 
> What is Slop for?  I imagine that it's used for carefully rounding things up
> to jemalloc bucket sizes?

Yes.

> I don't understand why it's good to have the user specify this, though.  Why
> not take ItemsPerPage as a hint instead, and inside the class attempt to
> figure out what a good jemalloc-friendly Page size would be?

I don't know how to do that. Do you have something in mind? Given an arbitrary size, rounding it to the nearest power of 2 as a constexpr isn't so easy. I found some weird templatey things on Stack Overflow. Is that what you mean?
Actually, I thought up a reasonable way to handle this. The user would pass in an "estimated number of items" N, which would be required to be a power of 2. Then we would actually use N-1 as the number of items. I guess that's pretty simple and it minimizes space wasted.
Comment on attachment 8891136 [details] [diff] [review]
main patch

Review of attachment 8891136 [details] [diff] [review]:
-----------------------------------------------------------------

More comments, hopefully better ones than last time.

::: xpcom/threads/CooperativeThreadPool.cpp
@@ +23,5 @@
> +  , mNumThreads(aNumThreads)
> +  , mController(aController)
> +  , mSelectedThread(0)
> +{
> +  MOZ_ASSERT(aNumThreads < kMaxThreads);

...maybe we should just clamp the value of aNumThreads instead of silently letting bad data through?

@@ +77,5 @@
> +      mSelectedThread = i;
> +      mThreads[i]->mCondVar.Notify();
> +      return;
> +    }
> +  }

There should be a MOZ_ASSERT_UNREACHABLE() or similar after this loop, yes?  (That's kind of what AssertNotDeadlocked is trying to say, but the extra assert here can't be a bad thing.)

@@ +125,5 @@
> +  sTlsCurrentThread.set(this);
> +
> +  PR_SetCurrentThreadName("CoopThrd");
> +
> +  mozilla::IOInterposer::RegisterCurrentThread();

We want to think about starting the profiler on these threads?  (...and how to integrate all the information from them into something coherent for the profiler, etc...)

@@ +163,5 @@
> +CooperativeThreadPool::CooperativeThread::Init()
> +{
> +  PR_CreateThread(PR_USER_THREAD, ThreadFunc, this,
> +                  PR_PRIORITY_NORMAL, PR_GLOBAL_THREAD,
> +                  PR_JOINABLE_THREAD, 0);

If we're not going to check this call for failure, or even try to remember the thread to join it at some later point, is there any reason we shouldn't move this into the constructor?

(Actually, what does happen if we happen to find ourselves unable to create a thread/initialize the scheduler?)

@@ +203,5 @@
> +  } while (selected != mIndex + 1);
> +
> +  if (!found) {
> +    // We need to block all threads.
> +    selected = mPool->mNumThreads;

It's not immediately obvious to me how we get unblocked; is there documentation for that somewhere?

::: xpcom/threads/CooperativeThreadPool.h
@@ +46,5 @@
> +
> +  // Must be called from within the thread pool.
> +  static void Yield(MutexAutoLock& aProofOfLock);
> +
> +  class Resource {

Documentation on Resource would be very nice.

@@ +82,5 @@
> +    CooperativeThreadPool* mPool;
> +    CondVar mCondVar;
> +    Resource* mBlocker;
> +    nsCOMPtr<nsIThread> mThread;
> +    size_t mIndex;

Might make this `const`.

@@ +91,5 @@
> +  static const size_t kMaxThreads = 8;
> +
> +  Mutex& mMutex;
> +  bool mRunning;
> +  size_t mNumThreads;

Might make this `const`, just to reassure ourselves this isn't going to change.

@@ +93,5 @@
> +  Mutex& mMutex;
> +  bool mRunning;
> +  size_t mNumThreads;
> +  Controller& mController;
> +  mozilla::UniquePtr<CooperativeThread> mThreads[kMaxThreads];

Making this a mozilla::Array would have the virtue of bounds checking accesses in debug mode.

@@ +94,5 @@
> +  bool mRunning;
> +  size_t mNumThreads;
> +  Controller& mController;
> +  mozilla::UniquePtr<CooperativeThread> mThreads[kMaxThreads];
> +  size_t mSelectedThread;

This should feature a comment about how it might not be less than mNumThreads, since mSelectedThread == mNumThreads means something special, AFAICT.  Or it would be great if the state were encoded in something else other than a sentinel value (Variant<size_t, ThreadsBlocked> or something, maybe even integrating mRunning into the mix so we could have a more state machine-like representation?), so we weren't accidentally indexing off the end.

::: xpcom/threads/LabeledEventQueue.cpp
@@ +68,5 @@
> +
> +  // Create a new epoch if necessary.
> +  Epoch* epoch;
> +  if (mEpochs.IsEmpty()) {
> +    epoch = &mEpochs.Push(Epoch(Epoch::FirstEpochNumber(isLabeled), isLabeled));

Maybe just Epoch::First(isLabeled)?

@@ +72,5 @@
> +    epoch = &mEpochs.Push(Epoch(Epoch::FirstEpochNumber(isLabeled), isLabeled));
> +  } else {
> +    Epoch& lastEpoch = mEpochs.LastElement();
> +    if (lastEpoch.IsLabeled() != isLabeled) {
> +      epoch = &mEpochs.Push(Epoch(lastEpoch.mEpochNumber + 1, isLabeled));

Seems like there ought to be a better way, something like lastEpoch.nextOtherEpoch() here, that doesn't expose quite so much of the details of the numbering scheme.

@@ +145,5 @@
> +  return nullptr;
> +}
> +
> +PendingEventStatus
> +LabeledEventQueue::HasPendingEvent(MutexAutoLock& aProofOfLock)

Is there a deliberate reason that the lock proof parameters aren't const references?

@@ +163,5 @@
> +    }
> +  }
> +
> +  // Go through the labeled queues and look for one whose head is from the
> +  // current epoch and is allowed to run.

All the work we have to do in LabeledEventQueue::HasPendingEvent seems kind of expensive, especially since we have to do it both for "can I run something?" and then "get me something to run" after I've answered the first question in the affirmative.  And we have to do it for multiple queues for every visit to the event loop, correct?  Could we figure out a way that the second part of that could be cheaper, like returning a reference to the queue and/or queue entry in the PendingEventStatus::Ready state?  (I don't know of a good way offhand to make the first part cheaper, but maybe there's some good operating system idea we could steal.)

::: xpcom/threads/Scheduler.cpp
@@ +358,5 @@
> +{
> +  MutexAutoLock mutex(mLock);
> +  while (!mShuttingDown) {
> +    size_t threadIndex = mThreadPool->CurrentThreadIndex();
> +    if (threadIndex < kNumThreads && mContexts[threadIndex]) {

What is guaranteeing that this condition is going to be true?  Otherwise, we're going to get a thread index, find it doesn't satisfy the condition, and wait for 50us...and maybe do that for a while.  That's not going to be great.

::: xpcom/threads/Scheduler.h
@@ +31,5 @@
> +
> +  static bool UnlabeledEventRunning();
> +  static bool AnyEventRunning();
> +
> +  class EventLoopActivation

I think you can make this MOZ_RAII or something like that?
Attachment #8891136 - Flags: feedback?(nfroyd)
Posted patch main patch v2 (obsolete) — Splinter Review
I think this addresses all of your comments. A few remarks:

1. Unfortunately we start up the thread manager before XPCOM has been initialized, which means that prefs aren't available. I had to use a special mechanism to send down the scheduler-related prefs.

2. I found that breaking up HasPendingEvents into IsEmpty and HasReadyEvent works well. The path I'm most concerned about for performance is when nsThread gets an event. This calls into PrioritizedEventQueue, which then does some checks on various LabeledEventQueues. I was able to turn these all into IsEmpty checks, which are quite cheap. We only need to use HasReadyEvent when someone calls NS_HasPendingEvents or nsThread::HasPendingEvents, which are relatively rare I think.

3. There's still a FIXME in there because shutdown won't work when some of the cooperative threads are in nested event loops. I still need to think through how to solve this. But I don't think it should affect the review much.
Attachment #8891136 - Attachment is obsolete: true
Attachment #8897106 - Flags: review?(nfroyd)
Comment on attachment 8897106 [details] [diff] [review]
main patch v2

Review of attachment 8897106 [details] [diff] [review]:
-----------------------------------------------------------------

We need some sort of better scheduling algorithm, but I guess we can land this with the current one for testing, and try to iterate from there.

::: dom/ipc/ContentParent.cpp
@@ +2020,5 @@
>    extraArgs.push_back(boolPrefs.get());
>    extraArgs.push_back("-stringPrefs");
>    extraArgs.push_back(stringPrefs.get());
> +  extraArgs.push_back("-schedulerPrefs");
> +  extraArgs.push_back(schedulerPrefs.get());

Comments for why we're adding a separate command-line option for these would be splendid.  I'm not sure where the right place to add some comments are, but it'd be good if they weren't hidden in a bugzilla comment, because somebody's bound to be confused about this.

::: tools/profiler/core/platform.cpp
@@ -2984,5 @@
>  profiler_register_thread(const char* aName, void* aGuessStackTop)
>  {
>    DEBUG_LOG("profiler_register_thread(%s)", aName);
>  
> -  MOZ_RELEASE_ASSERT(!NS_IsMainThread());

Uh, why are these here?  Is this because we're attempting to register the cooperative scheduler threads as of interest to the profiler, and it's choking on this assertion, because those threads are "not" main threads?  Could we keep the assertion but weaken it a little bit for the cooperative threads?

::: xpcom/threads/CooperativeThreadPool.cpp
@@ +68,5 @@
> +    mShutdownCondition.Wait();
> +  }
> +
> +  for (size_t i = 0; i < mNumThreads; i++) {
> +    mThreads[i]->EndShutdown();

Are we intending to keep the mutex locked during cooperative thread shutdown?  If we are, maybe EndShutdown should take a MutexAutoLock to assert that state of affairs.

@@ +75,5 @@
> +
> +void
> +CooperativeThreadPool::RecheckBlockers(const MutexAutoLock& aProofOfLock)
> +{
> +  mMutex.AssertCurrentThreadOwns();

I have thought for a while that MutexAutoLock should have a method like AssertOwns(const Mutex&), which would make this assertion a little more idiomatic.

@@ +140,5 @@
> +
> +  {
> +    // Make sure only one thread at a time can proceed.
> +    MutexAutoLock lock(mPool->mMutex);
> +    while (mPool->mSelectedThread != AsVariant(mIndex)) {

I was going to comment on whether we could make this more efficient, but I suppose it shouldn't matter because we shouldn't really be sitting in this loop for very long, right?  The major loop is below, yes?

@@ +151,5 @@
> +  nsCOMPtr<nsIThread> thread = do_GetCurrentThread();
> +  mEventTarget = thread;
> +
> +  for (;;) {
> +    // FIXME: This won't work with nested event loops!

Uh.  That's not good!

::: xpcom/threads/CooperativeThreadPool.h
@@ +36,5 @@
> +    virtual void OnStartThread(size_t aIndex, const nsACString& aName, void* aStackTop) = 0;
> +    virtual void OnStopThread(size_t aIndex) = 0;
> +
> +    virtual void OnSuspendThread(size_t aIndex) = 0;
> +    virtual void OnResumeThread(size_t aIndex) = 0;

Some documentation on all of the Controller methods would be good.

::: xpcom/threads/LabeledEventQueue.h
@@ +11,5 @@
> +#include "mozilla/Queue.h"
> +
> +namespace mozilla {
> +
> +class LabeledEventQueue final : public AbstractEventQueue

This could stand some documentation.  The multi-paragraph comment below is a good start at explaining some of the implementation.  Maybe some of that could be cut out and reused for the class itself?

@@ +100,5 @@
> +    bool IsLabeled() const { return EpochNumberIsLabeled(mEpochNumber); }
> +
> +    Epoch NextEpoch(bool aIsLabeled) const
> +    {
> +      return Epoch(mEpochNumber + 1, aIsLabeled);

This function lets us construct invalid Epochs, right?  Can we somehow make that not possible?

::: xpcom/threads/Scheduler.cpp
@@ +319,5 @@
> +
> +already_AddRefed<nsIThreadObserver>
> +SchedulerEventQueue::GetObserverOnThread()
> +{
> +  return do_AddRef(mObserver.get());

MOZ_ASSERT that we're on the correct thread, here?

@@ +425,5 @@
> +      MakeUnique<EventQueue>(),
> +      MakeUnique<EventQueue>(),
> +      do_AddRef(aIdlePeriod));
> +
> +    prioritized = static_cast<PrioritizedEventQueue<AbstractEventQueue>*>(queue.get());

Maybe move the assignment to `prioritized` out of both arms?

@@ +518,5 @@
> +/* static */ void
> +Scheduler::EventLoopActivation::Init()
> +{
> +  if (!sTopActivation.init()) {
> +    MOZ_CRASH("unable to create EventLoopActivation TLS");

We should just have an infallibleInit method on thread local variables; we have this pattern in sooo many places.

@@ +648,5 @@
> +
> +  JSContext* cx = dom::danger::GetJSContext();
> +  mScheduler->SetJSContext(aIndex, cx);
> +  if (sPrefPreemption) {
> +    JS_AddInterruptCallback(cx, SchedulerImpl::InterruptCallback);

Are we hijacking some other interrupt callback here?  I guess the assumption is that we're running on a completely different thread than the actual main thread?  Too bad the JS engine doesn't give us any way to tell whether we're overwriting something here, so we could assert.

@@ +720,5 @@
> +{
> +  MOZ_ASSERT(XRE_IsParentProcess());
> +  nsPrintfCString result("%d%d%d%d,%d",
> +                         Preferences::GetBool("dom.ipc.scheduler", false),
> +                         Preferences::GetBool("dom.ipc.scheduler.chaoticScheduling", false),

I'm wondering whether chaoticScheduling should even be a pref; it doesn't really seem like something we'd want to advertise to users.

::: xpcom/threads/Scheduler.h
@@ +20,5 @@
> +class SchedulerGroup;
> +class SchedulerImpl;
> +class SynchronizedEventQueue;
> +
> +class Scheduler

Some documentation on this would be excellent.

::: xpcom/threads/nsThreadManager.cpp
@@ +47,5 @@
> +void
> +NS_SetMainThread(PRThread* aVirtualThread)
> +{
> +  gTlsCurrentVirtualThread.set(aVirtualThread);
> +  NS_SetMainThread();

Can we assert anything here to ensure we don't call this function at an untimely place?

::: xpcom/threads/nsThreadManager.h
@@ +44,5 @@
>    // initialized.
>    nsThread* GetCurrentThread();
>  
> +  nsThread* CreateCurrentThread(mozilla::SynchronizedEventQueue* aQueue,
> +                                nsThread::MainThreadFlag aMainThread);

This could stand some commentary on why it's different from GetCurrentThread.  Or maybe a name with "Cooperative" in it?

::: xpcom/threads/nsThreadUtils.h
@@ +1664,5 @@
> +void
> +NS_SetMainThread(PRThread* aVirtualThread);
> +
> +void
> +NS_UnsetMainThread();

These could stand some comments, maybe some way of restricting these to only being called from the cooperative scheduler.
Attachment #8897106 - Flags: review?(nfroyd) → feedback+
Posted patch MutexAutoLock assertions (obsolete) — Splinter Review
This patch adds the MutexAutoLock assertions you asked for (I think).
Attachment #8899669 - Flags: review?(nfroyd)
Posted patch main patch v3Splinter Review
> Are we hijacking some other interrupt callback here?  I guess the assumption is
> that we're running on a completely different thread than the actual main
> thread?  Too bad the JS engine doesn't give us any way to tell whether we're
> overwriting something here, so we could assert.

JS_AddInterruptCallback adds another interrupt callback. It doesn't replace the old one. They get called one after the other during an interrupt callback. If any one of them returns false, then the script is stopped.

I think I fixed everything else.
Attachment #8897106 - Attachment is obsolete: true
Attachment #8899671 - Flags: review?(nfroyd)
Comment on attachment 8899670 [details] [diff] [review]
add infallibleInit for TLS

Review of attachment 8899670 [details] [diff] [review]:
-----------------------------------------------------------------

Sure, this will work.
Attachment #8899670 - Flags: review?(nfroyd) → review+
Comment on attachment 8899669 [details] [diff] [review]
MutexAutoLock assertions

Review of attachment 8899669 [details] [diff] [review]:
-----------------------------------------------------------------

I should have made clearer that I didn't necessarily expect you to write these extra patches, but I'm happy you did!  Comments below.

::: xpcom/threads/Mutex.h
@@ +173,5 @@
>  
> +  // Assert that the current thread owns the lock right now.
> +  void AssertOwns() const
> +  {
> +    mLock->AssertCurrentThreadOwns();

I'm unsure about this method; if you have access to the lock at the point where you called BaseAutoLock::AssertOwns(), I think you should just be calling the method below.  If you don't have access to the lock at the point where you called BaseAutoLock::AssertOwns()...that's *really* strange code, right?  (I guess you could have a class that exposed:

  Mutex& GetLock();

and then you'd lock it, and calling GetLock() is a bit unwieldy...but I still think that's strange code, and we shouldn't cater too much to it.)

@@ +181,5 @@
> +  // aMutex is the lock passed to the constructor.
> +  void AssertOwns(const T& aLock) const
> +  {
> +    MOZ_ASSERT(&aLock == mLock);
> +    AssertOwns();

I was thinking that we'd just have AssertOwns(const T&) with the equality check; do we need to ask the mutex whether it's owned by the current thread?  You have an AutoLock instance constructed with the mutex, which locked the mutex, so if the mutex is equivalent to the one you've passed in, it is owned by definition.  Are there other circumstances where we'd need to check with the mutex for ownership?
Attachment #8899669 - Flags: review?(nfroyd) → feedback+
(In reply to Nathan Froyd [:froydnj] from comment #18)
> Comment on attachment 8899669 [details] [diff] [review]
> MutexAutoLock assertions
> 
> Review of attachment 8899669 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> @@ +181,5 @@
> > +  // aMutex is the lock passed to the constructor.
> > +  void AssertOwns(const T& aLock) const
> > +  {
> > +    MOZ_ASSERT(&aLock == mLock);
> > +    AssertOwns();
> 
> I was thinking that we'd just have AssertOwns(const T&) with the equality
> check; do we need to ask the mutex whether it's owned by the current thread?
> You have an AutoLock instance constructed with the mutex, which locked the
> mutex, so if the mutex is equivalent to the one you've passed in, it is
> owned by definition.  Are there other circumstances where we'd need to check
> with the mutex for ownership?

I'm mostly worried about a case like this:
  {
    MutexAutoLock lock(mutex);
    {
      MutexAutoUnlock unlock(mutex);
      CallFuncThatExpectsLock(lock);
    }
  }
Comment on attachment 8899671 [details] [diff] [review]
main patch v3

Review of attachment 8899671 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for adding all the extra documentation!  Hopefully that will make it easier for future developers to untangle things.

Let's get this landed and then start experimenting with it.
Attachment #8899671 - Flags: review?(nfroyd) → review+
(In reply to Bill McCloskey (:billm) from comment #19)
> (In reply to Nathan Froyd [:froydnj] from comment #18)
> > Are there other circumstances where we'd need to check
> > with the mutex for ownership?
> 
> I'm mostly worried about a case like this:
>   {
>     MutexAutoLock lock(mutex);
>     {
>       MutexAutoUnlock unlock(mutex);
>       CallFuncThatExpectsLock(lock);
>     }
>   }

Ugh.  OK, that's reasonable.  Can you describe in a comment why pointer equality checking is insufficient, then?
Attachment #8899669 - Attachment is obsolete: true
Attachment #8900378 - Flags: review?(nfroyd)
I noticed this while fixing some test failures. The thread observers really belong on the event queue rather than the thread, since they get called whenever an event is run out of the queue. We want to share the observers across all threads that use the same queue. The easiest place to put them is on SynchronizedEventQueue, since it's the nsThread's view of an "event queue". Unlike the other stuff on that class, this stuff doesn't need to be virtual. It's just some data that we attach to all implementations.
Attachment #8900398 - Flags: review?(nfroyd)
Comment on attachment 8900378 [details] [diff] [review]
MutexAutoLock assertions v2

Review of attachment 8900378 [details] [diff] [review]:
-----------------------------------------------------------------

I have suggestions, but otherwise this looks good.

::: xpcom/threads/Mutex.h
@@ +183,5 @@
> +  // ...
> +  // CallMethod(lock);
> +  //
> +  // If CallMethod calls aProofOfLock.AssertOwns, then it can guarantee that the
> +  // "proof" is valid.

Thanks for explaining this.  WDYT about presenting the less complicated case first:

"Assert that aLock is the mutex passed to the constructor and that the current thread owns the mutex.  In coding patterns such as:

  void LockedMethod(const MutexAutoLock& aProofOfLock)
  {
    aProofOfLock.AssertOwns(mMutex);
    ...
  }

the former condition is asserting that the "proof" actually owns the mutex that we think it should.  The need for the latter condition is more subtle; it's possible to have code like:

  MutexAutoLock lock(someMutex);
  ...
  MutexAutoUnlock unlock(someMutex);
  ...
  LockedMethod(lock);

and in such a case, simply asserting that the mutex pointers match is not sufficient; mutex ownership must be asserted as well.

Note that if you are going to use the coding pattern presented above, you should use this method in preference to using AssertCurrentThreadOwns on the mutex you expected to be held, since this method provides stronger guarantees."
Attachment #8900378 - Flags: review?(nfroyd) → review+
Comment on attachment 8900398 [details] [diff] [review]
move nsThread::mEventObservers to event queue

Review of attachment 8900398 [details] [diff] [review]:
-----------------------------------------------------------------

Doh, good catch.
Attachment #8900398 - Flags: review?(nfroyd) → review+

Comment 26

2 years ago
Pushed by wmccloskey@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e03fe144848
Move mEventObservers to SynchronizedEventQueue (r=froydnj)
https://hg.mozilla.org/integration/mozilla-inbound/rev/aa70fc29d09a
Add MutexAutoLock::AssertOwns (r=froydnj)
https://hg.mozilla.org/integration/mozilla-inbound/rev/0cbb2122622b
Add TLS infallible init (r=froydnj)
https://hg.mozilla.org/integration/mozilla-inbound/rev/3519544711af
Initial Quantum DOM scheduler implementation, disabled by default (r=froydnj)
Depends on: 1394655

Comment 28

2 years ago
It looks like I've just come across a bug that makes Nightly hang and crash. STR: create new profile, go to about:config and set
input_event_queue.supported true
dom.ipc.scheduler true

Close Nightly, launch Nightly, open new tab, go to https://bugzilla.mozilla.org/
Open new tab, go to https://bugzilla.mozilla.org/show_bug.cgi?id=1390044
Open new tab, go to about:config

ER: pages are quickly loaded and rendered
AR: tabs get stuck loading forever, blank screen/new tab page instead of the web page, high CPU usage.

Nightly works normally with only input_event_queue.supported set to true. Nightly works normally with only dom.ipc.scheduler set to true.
Bill, did you see comment 28?
Flags: needinfo?(wmccloskey)
(In reply to ajfhajf from comment #28)
> It looks like I've just come across a bug that makes Nightly hang and crash.

Tested with the latest code on central and found that it asserts at [1] when comparing the timestamp. That's because we don't initialize the timestamp for handling input events at [2] when the queue is selected in SelectQueue [3].

Tried to update the timestamp, but then hit the NS_IsMainThread() assertion in nsRefreshDriver. I think that's because the current thread is not yet marked as the main thread when CooperativeThreadPool::RecheckBlockers is called [4].

[1] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/xpcom/threads/PrioritizedEventQueue.cpp#157
[2] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/xpcom/threads/PrioritizedEventQueue.cpp#125
[3] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/xpcom/threads/PrioritizedEventQueue.cpp#272
[4] http://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/xpcom/threads/CooperativeThreadPool.cpp#91

Comment 31

2 years ago
One more observation.

1. Create new profile, set

dom.ipc.scheduler true
dom.ipc.scheduler.preemption true

Restart, open new tab, go to https://bugzilla.mozilla.org/
Open new tab, go to https://bugzilla.mozilla.org/show_bug.cgi?id=1350432

ER: Both pages load fine
AR: Both tabs crash

2. Create new profile, set

dom.ipc.scheduler true
dom.ipc.scheduler.preemption true
dom.ipc.scheduler.chaoticScheduling true
dom.ipc.scheduler.useMultipleQueues true

Restart, open new tab, go to https://bugzilla.mozilla.org/
Open new tab, go to https://bugzilla.mozilla.org/show_bug.cgi?id=1350432

ER: Both pages load fine, minimal CPU usage
AR: I have quad-core i5 2300, every Nightly process uses 25% of the CPU, CPU stuck at 100% load, browser and Windows stop responding
(In reply to Andrew Overholt [:overholt] from comment #29)
> Bill, did you see comment 28?

Yes, I think this is fixed now.

Regarding comment 31, it's not supported to turn on preemption without useMultipleQueues.
Flags: needinfo?(wmccloskey)

Comment 33

2 years ago
(In reply to Bill McCloskey (:billm) from comment #32)
> Yes, I think this is fixed now.
> 
> Regarding comment 31, it's not supported to turn on preemption without
> useMultipleQueues.
I can still reproduce the bug from comment 28 in Nightly 18-09-17 with a new profile.

Comment 34

2 years ago
From my tests on the latest Nightly, it seems like enabling preemption causes a massive spike in CPU usage and renders Firefox unresponsive.
Keith, would you mind filing a separate bug about comment 34? Thank you.
Flags: needinfo?(kah0922)
I don't think we need further bugs. This code isn't ready for testing.
Flags: needinfo?(kah0922)

Comment 37

2 years ago
(In reply to Bill McCloskey (:billm) from comment #36)
> I don't think we need further bugs. This code isn't ready for testing.

Perhaps we need to hide the prefs from about:config for now? :-)

Comment 38

2 years ago
After more tests (Latest Nightly on Linux):
dom.ipc.scheduler on its own works. It does rename the WebExtensions process to just Web.
Enabling dom.ipc.scheduler.chaoticScheduling also works, but it breaks remote WebExtensions.

So currently only dom.ipc.scheduler.preemption renders Firefox unusable.
FYI...

dom.ipc.scheduler set to true causes Fx58 to crash on Windows 10 x86-64.  Totally not usable even with a new profile.

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0
Some crash reports for you:

https://crash-stats.mozilla.com/report/index/abefbd2f-c567-4779-9e89-bfa031170922

https://crash-stats.mozilla.com/report/index/153dd00e-adbf-4817-ac4e-7d2191170922

When I first start Fx58 the weirdness begins.  I eventually hang and have to exit Fx.  Upon restarting it I get the crash dumps.
The problem also happens on Fx57.
Depends on: 1410077
Depends on: 1410081
Depends on: 1410096

Updated

2 years ago
Depends on: 1414645

Updated

Last year
Depends on: 1442970