telemetry | High frequency Assertion failure: mThreadLocalIndex != kBadThreadLocalIndex (BackgroundChild::Startup() was never called!), at /builds/worker/checkouts/gecko/ipc/glue/BackgroundImpl.cpp:378
Categories
(Core :: IPC, defect, P3)
Tracking
Branch | Tracking | Status |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox-esr91 | --- | fixed |
firefox89 | --- | unaffected |
firefox90 | --- | unaffected |
firefox91 | --- | wontfix |
firefox93 | --- | wontfix |
firefox94 | --- | fixed |
People
(Reporter: aosmond, Assigned: jld)
References
(Regression)
Details
(Keywords: assertion, intermittent-failure, regression)
Attachments
(2 files, 2 obsolete files)
- 192.98 KB, application/zip
- 48 bytes, text/x-phabricator-request (RyanVM: approval-mozilla-esr91+)
+++ This bug was initially created as a clone of Bug #1644166 +++
This seems to be permafailing on https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=YjfU3do0SzyzEiXq761CMA.0&revision=c7230b53515ec3e0552294ced049f7ee9072e128&searchStr=linux%2Cwebrender%2Cdebug%2Ctelemetry
[task 2021-06-08T23:43:57.878Z] 23:43:57 INFO - Assertion failure: mThreadLocalIndex != kBadThreadLocalIndex (BackgroundChild::Startup() was never called!), at /builds/worker/checkouts/gecko/ipc/glue/BackgroundImpl.cpp:383
We can't move these tests over until the intermittent gets fixed (or we get a pass for creating a permafail).
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 4•3 years ago
There are 151 total failures in the last 7 days on
- windows10-64-qr debug
- windows10-64 debug
- windows10-32-qr debug
- macosx1015-64-qr debug
- linux1804-64 debug
Randell, please take a look.
Comment hidden (Intermittent Failures Robot) |
Comment 6•3 years ago
Should be fixed by the backout of Bug 1687843:
https://hg.mozilla.org/integration/autoland/rev/a0c0b00b2df9
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 9•3 years ago
Jesup, this has reappeared on esr91: https://treeherder.mozilla.org/jobs?repo=mozilla-esr91&duplicate_jobs=visible&resultStatus=testfailed%2Cbusted%2Cexception&fromchange=820004fa5e68dc6da164d9ffb5149fd901960c3f&searchStr=Linux%2C18.04%2Cx64%2CWebRender%2Cdebug%2CTelemetry%2Ctests%2Ctest-linux1804-64-qr%2Fdebug-telemetry-tests-client-e10s%2Cc
Could you please take a look?
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 13•3 years ago
Jed, this bug is failing frequently on ESR91 and we're struggling to find someone who can help move the investigation forward. Can you please take a look and redirect as needed? Thanks.
Assignee
Comment 14•3 years ago
I think what's going on here is that the idle scheduler is being used (in a content process) before PBackground is initialized, because the latter happens relatively late: from ContentChild::InitXPCOM called from RecvSetXPCOMProcessAttributes. So if thread scheduling is such that receiving the message is delayed enough for the process to consider itself idle before that point, then I can see how we'd crash like this.
But it's interesting that this is happening only on ESR91 and no other branches.
Redirecting needinfo to people who might have more detailed knowledge.
Comment 15•3 years ago
OK, so obviously the API (GetOrCreateForCurrentThread) doesn't hint that it can be called only in some cases but not in others. It should just handle the problematic case in some reasonable way.
To fix this particular issue it should be enough to return TimeStamp() or TimeStamp::Now() from https://searchfox.org/mozilla-central/rev/ad2ffab089e4e0c0fe99a1a046ab2b1c45546bdb/xpcom/threads/IdlePeriodState.cpp#102 when PBackground isn't ready yet. What would be the easiest way to check that?
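The guard suggested in comment 15 can be sketched roughly as follows. This is a minimal, self-contained model under stated assumptions, not Gecko code: std::chrono::steady_clock::time_point stands in for mozilla::TimeStamp, a plain global stands in for the thread-local index that the assertion checks, and ToyBackgroundStartup, ToyBackgroundIsStarted, RequestIdleDeadline and GetIdleDeadline are hypothetical names. Of the two return values the comment mentions, the sketch picks the null, TimeStamp()-style value.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-ins, not Gecko APIs: std::chrono replaces
// mozilla::TimeStamp, and a plain global replaces BackgroundImpl's
// thread-local index.
using TimePoint = std::chrono::steady_clock::time_point;
constexpr unsigned kBadIndex = ~0u;  // plays the role of kBadThreadLocalIndex
unsigned gThreadLocalIndex = kBadIndex;

// Hypothetical: models BackgroundChild::Startup() allocating the index.
void ToyBackgroundStartup() { gThreadLocalIndex = 0; }
bool ToyBackgroundIsStarted() { return gThreadLocalIndex != kBadIndex; }

// Hypothetical stand-in for asking the parent process for an idle
// budget over PBackground.
TimePoint RequestIdleDeadline() {
  return std::chrono::steady_clock::now() + std::chrono::milliseconds(50);
}

// Guarded accessor: before startup, return a null time point
// ("no idle budget yet") instead of reaching the assertion.
TimePoint GetIdleDeadline() {
  if (!ToyBackgroundIsStarted()) {
    return TimePoint{};
  }
  return RequestIdleDeadline();
}
```

Before ToyBackgroundStartup() runs, GetIdleDeadline() returns a default-constructed time point; afterwards it returns a real deadline. The patch that eventually landed in this bug took a different route (moving PBackground startup earlier) rather than adding a guard like this.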
Assignee
Comment 16•3 years ago
I discussed this with the other IPC people and it was pointed out that the end of ContentProcess::Init should be late enough for everything in BackgroundChild::Startup to work and earlier than anything that involves event loops.
Also, the thread-local storage initialization could be separated from the rest and done much earlier if necessary, but I don't think that's the case here.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 21•3 years ago
Comment 22•3 years ago
bugherder uplift
Comment hidden (Intermittent Failures Robot) |
Comment 24•3 years ago
Comment 25•3 years ago
Disabling the one test didn't work, so let's skip the entire job on Linux debug instead.
Comment 26•3 years ago
bugherder uplift
Assignee
Comment 27•3 years ago
I'll take this. If comment 14 / comment 16 are right, it should be simple to fix.
Comment hidden (Intermittent Failures Robot) |
Assignee
Comment 29•3 years ago
Try run on the ESR91 branch (passes 40x) vs. the control group (fails in 8/10 test runs). I'll send this for review.
Assignee
Comment 30•3 years ago
Previously we were starting PBackground in content processes in response to receiving the SetXPCOMProcessAttributes IPC message, which is sent immediately after the process is launched. Meanwhile, the idle scheduler tries to use PBackground when the main thread considers itself idle. But if thread scheduling is such that the content process main thread becomes idle before the IPC I/O thread has received and dispatched that message, then we have a problem (signaled by an assertion failure).
This patch moves content process PBackground startup earlier, to the end of ContentProcess::Init; that point is after enough of IPC and XPCOM is started for it to work, but before we start spinning the main thread event loop.
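The race and the reordering described in the patch above can be illustrated with a toy, single-threaded model of the content process main thread. This is a sketch under stated assumptions, not Gecko code: ToyContentProcess, Simulate, and the tick-based loop are hypothetical. Each tick either runs one queued task or, when the queue is empty, runs an idle callback that needs PBackground; the SetXPCOMProcessAttributes handler is delivered at a configurable tick to mimic delay on the IPC I/O thread.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical toy model (not Gecko code) of a content process main thread.
class ToyContentProcess {
 public:
  explicit ToyContentProcess(bool startBackgroundInInit)
      : mStartInInit(startBackgroundInInit) {}

  // Stands in for ContentProcess::Init, which runs before the loop spins.
  void Init() {
    if (mStartInInit) {
      StartBackground();  // new ordering: end of Init
    }
  }

  // messageTick: tick at which the I/O thread dispatches the
  // SetXPCOMProcessAttributes handler to the main thread.
  void RunEventLoop(int messageTick, int totalTicks) {
    for (int tick = 0; tick < totalTicks; ++tick) {
      if (tick == messageTick) {
        mQueue.push_back([this] {
          if (!mStartInInit) {
            StartBackground();  // old ordering: inside the message handler
          }
        });
      }
      if (mQueue.empty()) {
        OnIdle();  // main thread considers itself idle
      } else {
        auto task = std::move(mQueue.front());
        mQueue.pop_front();
        task();
      }
    }
  }

  bool IdleRanBeforeBackground() const { return mIdleRanTooEarly; }

 private:
  void StartBackground() { mBackgroundStarted = true; }

  // The real failure was an assertion in BackgroundImpl.cpp;
  // this model only records that the idle path ran too early.
  void OnIdle() {
    if (!mBackgroundStarted) {
      mIdleRanTooEarly = true;
    }
  }

  bool mStartInInit;
  bool mBackgroundStarted = false;
  bool mIdleRanTooEarly = false;
  std::deque<std::function<void()>> mQueue;
};

// Returns true if the idle callback ran before PBackground was started.
bool Simulate(bool startInInit, int messageTick) {
  ToyContentProcess process(startInInit);
  process.Init();
  process.RunEventLoop(messageTick, 10);
  return process.IdleRanBeforeBackground();
}
```

With the old ordering, Simulate(false, 3) reports that the idle callback ran before PBackground was started, while Simulate(false, 0) does not; with the new ordering, Simulate(true, n) never does, because startup completes before the loop spins. In the real browser the delay depends on thread scheduling rather than a fixed tick, which is consistent with the failure showing up intermittently.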
Comment 31•3 years ago
Pushed by jedavis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/671ba1530436 Start PBackground earlier in content processes. r=nika
Comment 32•3 years ago
Backout by ccozmuta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ff118c8d4a31 Backed out changeset 671ba1530436 for causing bustages on ContentProcess.cpp. CLOSED TREE
Comment 33•3 years ago
Pushed by jedavis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/82e09e54c635 Start PBackground earlier in content processes. r=nika
Comment 34•3 years ago
bugherder
Comment hidden (Intermittent Failures Robot) |
Comment 36•3 years ago
This fix looks fantastic on Try on top of ESR91:
https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Cusercancel%2Crunning%2Cpending%2Crunnable&revision=f54ec923b5da8d80e8069d1a7ad96ccf372ae41f&searchStr=telemetry-tests
Is this something we can safely uplift there? If so, go ahead and nominate.
Assignee
Comment 37•3 years ago
Comment on attachment 9242150 [details]
Bug 1715414 - Start PBackground earlier in content processes.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixes intermittent test failures.
- User impact if declined: This is relatively harmless in practice (the idle scheduler code looks like it fails gracefully, and I don't think any other PBackground users are currently affected), but the main motivation for uplift is to allow the failing tests that were disabled earlier in this bug to be re-enabled so we can catch any actual failures in those tests.
- Fix Landed on Version: 94
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This just runs an existing small part of child process startup earlier (but still after the things it depends on; the code is simple enough to be reasonably sure of that), and it's been stable on 94 for a while.
- String or UUID changes made by this patch: none
Comment 38•3 years ago
Comment on attachment 9242150 [details]
Bug 1715414 - Start PBackground earlier in content processes.
Approved for 91.3esr, thanks.
Comment 39•3 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-esr91/rev/e1a3b77fc5f3
https://hg.mozilla.org/releases/mozilla-esr91/rev/06a904f58de5
Comment hidden (Intermittent Failures Robot) |