Closed Bug 1589561 Opened 2 years ago Closed 2 years ago

Factor out idle state tracking from PrioritizedEventQueue


(Core :: XPCOM, task)

Not set



Tracking Status
firefox71 --- fixed


(Reporter: bzbarsky, Assigned: bzbarsky)



(1 file)

We're going to want this in multiple places, it looks like, so it should go into a separate object.

We could try to move the EnforcePEndingTaskGuarantee() bit into PeekIdleDeadline, but
then we'd need to check HasReadyEvent() on mDeferredTimersQueue and mIdleQueue
before we unlock the mutex and PeekIdleDeadline, and it's not clear that that
state cannot change once the mutex is unlocked...

FYI, backing out a tiny comment change I just landed may make it easier to land this stuff

Ah, I should just re-merge on top of that change, and that will make it easier to merge to our other stuff later.

Pushed by
Factor out idle handling from PrioritizedEventQueue.  r=smaug
Backout by
Backed out changeset 798bab343a0a for turning bug 1586420 into permafail. CLOSED TREE
Flags: needinfo?(bzbarsky)

OK, so I tracked down what's going on with the gtest failures, at least; I managed to reproduce them under rr on Linux with chaos mode. I expect the other failures are similar, though I haven't managed to reproduce them locally yet.

We're running the test at which basically creates 50 threads, joins them all, repeats that 1000 times. We hang partway through the test.

That happens because we are running on the main thread in the test and attempt to join an nsThread by calling Shutdown on it. That posts a runnable to that thread, which shuts down and posts a runnable back to us. We, in the meantime, block while waiting for that runnable via the SpinEventLoopUntil call under nsThread::Shutdown. In our event loop spin we call into PrioritizedEventQueue::GetEvent and if it returns null we sleep until an event appears. In this case it returns null, we sleep, and never wake up.

And that happens because we call GetEvent and discover that there is no event queued so far. So we go to run this code:

    MutexAutoUnlock unlock(*mMutex);
    return nullptr;

and while we have the mutex unlocked is exactly when the shutting-down thread queues its nsThreadShutdownAckEvent on us. So we return null and sleep waiting for more events to appear, but that never happens. So yes, the MutexAutoUnlock thing is just fishy...

Without my changes this does not fail because the code before my changes looks like this:

    return nullptr;

Now both EnsureIsPaused and ClearIdleToken can end up doing MutexAutoUnlock in the code without my changes, but ClearIdleToken only does it if we have mIdleRequestId, and that only happens in content processes, while this test is running in a parent process. And EnsureIsPaused only does the unlock if mActive is true (which in this case it's not) and mIdleScheduler is non-null (but in this case it's null, of course). So without my changes this test can never reach the MutexAutoUnlock lines.

Of course in an actual Firefox run, if a content process ever joins a thread while not in shutdown, it could in fact hit that exact hang if nothing else posts any other runnables to the main-thread event queue. Which seems like a pretty unlikely situation....

Anyway, in terms of fixing this, apart from getting rid from the MutexAutoUnlock bits completely, which is the long-term plan what I could do for now is replace all the "proof of unlock" arguments with a Mutex& argument and doing the MutexAutoUnlock calls within IdlePeriodState in places that exactly correspond to the current MutexAutoUnlock calls. That will at least restore the status quo.

I pushed with a patch to that effect, for the moment. If that generally sounds reasonable, I'll put it up in phabricator...

Flags: needinfo?(bzbarsky) → needinfo?(bugs)

That tryserver patch looks ok to me, for now.

I agree getting rid of unlocks is something we should do eventually.

(And there is a separate bug to get rid of nsThread::Shutdow requiring spinning event loops)

Flags: needinfo?(bugs)
Pushed by
Factor out idle handling from PrioritizedEventQueue.  r=smaug
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71
You need to log in before you can comment on or make changes to this bug.