Factor out idle state tracking from PrioritizedEventQueue
Categories
(Core :: XPCOM, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox71 | --- | fixed |
People
(Reporter: bzbarsky, Assigned: bzbarsky)
Details
Attachments
(1 file)
We're going to want this in multiple places, it looks like, so it should go into a separate object.
Assignee | ||
Comment 1•5 years ago
|
||
We could try to move the EnforcePEndingTaskGuarantee() bit into PeekIdleDeadline, but
then we'd need to check HasReadyEvent() on mDeferredTimersQueue and mIdleQueue
before we unlock the mutex and PeekIdleDeadline, and it's not clear that that
state cannot change once the mutex is unlocked...
FYI, backing out a tiny comment change I just landed may make it easier to land this stuff
https://hg.mozilla.org/integration/autoland/rev/c1b0190a53476e04a3f79d056d177af0c30adfa0
Assignee | ||
Comment 3•5 years ago
|
||
Ah, I should just re-merge on top of that change, and that will make it easier to merge to our other stuff later.
Comment 6•5 years ago
•
|
||
Please ignore the backout commit message, was led astray by this round of failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&fromchange=c3f3c9dd353724c0e16cab127bd04ce3afc59537&searchStr=OS%2CX%2C10.14%2Cdebug%2CWeb%2Cplatform%2Ctests%2Ctest-macosx1014-64%2Fdebug-web-platform-tests-e10s-16%2CW%28wpt16%29&tochange=5812f2d6c02fcb5dce7a722f94546a726b09c547&selectedJob=271949074
However, the changes in https://hg.mozilla.org/integration/autoland/rev/798bab343a0aa9e7940e5a449e6555096ecdc4ac might have led to a spike in build bustages, especially valgrind ones like: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=busted&tochange=21b88cef47e1e4aafb8450953739d3c15cffb766&fromchange=b5f7dccff453e27c686b89f4d94852d6ec37d1fa&selectedJob=271949018
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=271944122&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=271947273&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=271957135&repo=autoland
Boris, could you please take a look over this? Thank you.
Assignee | ||
Comment 7•5 years ago
|
||
OK, so I tracked down what's going on with the gtest failures, at least; I managed to reproduce them under rr on Linux with chaos mode. I expect the other failures are similar, though I haven't managed to reproduce them locally yet.
We're running the test at https://searchfox.org/mozilla-central/rev/696cf3b52a9aa04aa5013008a8b448904a298356/xpcom/tests/gtest/TestThreads.cpp#99-126 which basically creates 50 threads, joins them all, repeats that 1000 times. We hang partway through the test.
That happens because we are running on the main thread in the test and attempt to join an nsThread
by calling Shutdown
on it. That posts a runnable to that thread, which shuts down and posts a runnable back to us. We, in the meantime, block while waiting for that runnable via the SpinEventLoopUntil
call under nsThread::Shutdown
. In our event loop spin we call into PrioritizedEventQueue::GetEvent
and if it returns null we sleep until an event appears. In this case it returns null, we sleep, and never wake up.
And that happens because we call GetEvent
and discover that there is no event queued so far. So we go to run this code:
MutexAutoUnlock unlock(*mMutex);
mIdlePeriodState.RanOutOfTasks(unlock);
return nullptr;
and while we have the mutex unlocked is exactly when the shutting-down thread queues its nsThreadShutdownAckEvent
on us. So we return null and sleep waiting for more events to appear, but that never happens. So yes, the MutexAutoUnlock
thing is just fishy...
Without my changes this does not fail because the code before my changes looks like this:
EnsureIsPaused();
ClearIdleToken();
return nullptr;
Now both EnsureIsPaused
and ClearIdleToken
can end up doing MutexAutoUnlock
in the code without my changes, but ClearIdleToken
only does it if we have mIdleRequestId
, and that only happens in content processes, while this test is running in a parent process. And EnsureIsPaused
only does the unlock if mActive
is true (which in this case it's not) and mIdleScheduler
is non-null (but in this case it's null, of course). So without my changes this test can never reach the MutexAutoUnlock
lines.
Of course in an actual Firefox run, if a content process ever joins a thread while not in shutdown, it could in fact hit that exact hang if nothing else posts any other runnables to the main-thread event queue. Which seems like a pretty unlikely situation....
Anyway, in terms of fixing this, apart from getting rid from the MutexAutoUnlock
bits completely, which is the long-term plan what I could do for now is replace all the "proof of unlock" arguments with a Mutex&
argument and doing the MutexAutoUnlock
calls within IdlePeriodState in places that exactly correspond to the current MutexAutoUnlock
calls. That will at least restore the status quo.
I pushed https://treeherder.mozilla.org/#/jobs?repo=try&revision=204d00c2281d701a578d233dfc55b2764b2ce426 with a patch to that effect, for the moment. If that generally sounds reasonable, I'll put it up in phabricator...
That tryserver patch looks ok to me, for now.
I agree getting rid of unlocks is something we should do eventually.
(And there is a separate bug to get rid of nsThread::Shutdow requiring spinning event loops)
Comment 10•5 years ago
|
||
bugherder |
Description
•