Linux webrender asan xpcshell frequent retries that end up in exception
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr102 | --- | unaffected |
| firefox109 | --- | unaffected |
| firefox110 | --- | unaffected |
| firefox111 | --- | fixed |
People
(Reporter: smolnar, Assigned: jstutte)
References
(Regression)
Details
(Keywords: regression, Whiteboard: [stockwell disable-recommended])
Attachments
(1 file)
There are frequent Linux webrender asan xpcshell test retries that end up in exception:
Started from this push
@Jens, can you take a look?
The regression range indicates that this failure started with bug 1810666.
Updated•2 years ago
|
Comment 1•2 years ago
|
||
Set release status flags based on info from the regressing bug 1810666
| Assignee | ||
Comment 2•2 years ago
|
||
I am able to reproduce this, but unfortunately the log files of those tasks do not seem to be accessible. I could probably try to run the asan build from try locally (which I have never done), but then I would not know which test to look out for.
There is some potential in the patches from bug 1810666 to cause more content process creations during parent shutdown, this might just make us hit some limit of the machine or we might see a real problem with those patches.
| Assignee | ||
Comment 4•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #2)
There is some potential in the patches from bug 1810666 to cause more content process creations during parent shutdown, this might just make us hit some limit of the machine or we might see a real problem with those patches.
I added some diagnostic assert to see if we ever hit the case I have in mind.
| Assignee | ||
Comment 5•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #4)
I added some diagnostic assert to see if we ever hit the case I have in mind.
From a first run, this seems not to be the case - and it ran successfully now. I re-triggered another few times just to understand, if this is intermittent now. Did someone alter the machine's configuration?
| Assignee | ||
Comment 6•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #5)
From a first run, this seems not to be the case - and it ran successfully now. I re-triggered another few times just to understand, if this is intermittent now. Did someone alter the machine's configuration?
Hmm, now I see 6 re-triggers of X3, while I am sure I only started two of them. There seems to be something weird going on with restarts here? In any case, without being able to see any logs, I do not see much that is actionable for me here.
Comment 7•2 years ago
•
|
||
The test groups that run when this ends as an exception are:
browser/components/customizableui/test/unit/xpcshell.ini
browser/components/sessionstore/test/unit/xpcshell.ini
browser/extensions/formautofill/test/unit/heuristics/third_party/xpcshell.ini
browser/tools/mozscreenshots/tests/xpcshell/xpcshell.ini
chrome/test/unit/xpcshell.ini
devtools/server/actors/compatibility/lib/test/xpcshell/xpcshell.ini
devtools/shared/discovery/tests/xpcshell/xpcshell.ini
devtools/shared/tests/xpcshell/xpcshell.ini
devtools/shared/webconsole/test/xpcshell/xpcshell.ini
docshell/test/unit/xpcshell.ini
dom/abort/tests/unit/xpcshell.ini
dom/base/test/unit_ipc/xpcshell.ini
dom/encoding/test/unit/xpcshell.ini
dom/media/webvtt/test/xpcshell/xpcshell.ini
dom/messagechannel/tests/unit/xpcshell.ini
dom/notification/test/unit/xpcshell.ini
dom/quota/test/xpcshell/xpcshell.ini
dom/tests/unit/xpcshell.ini
extensions/pref/autoconfig/test/unit/xpcshell.ini
extensions/pref/autoconfig/test/unit/xpcshell_snap.ini
intl/uconv/tests/unit/xpcshell.ini
js/xpconnect/tests/unit/xpcshell.ini
modules/libjar/test/unit/xpcshell.ini
modules/libmar/tests/unit/xpcshell.ini
parser/xml/test/unit/xpcshell.ini
remote/shared/test/xpcshell/xpcshell.ini
security/manager/ssl/tests/unit/xpcshell-smartcards.ini
testing/modules/tests/xpcshell/xpcshell.ini
toolkit/components/aboutthirdparty/tests/xpcshell/xpcshell.ini
toolkit/components/asyncshutdown/tests/xpcshell/xpcshell.ini
toolkit/components/autocomplete/tests/unit/xpcshell.ini
toolkit/components/commandlines/test/unit_unix/xpcshell.ini
toolkit/components/contextualidentity/tests/unit/xpcshell.ini
toolkit/components/credentialmanagement/tests/xpcshell/xpcshell.ini
toolkit/components/ctypes/tests/unit/xpcshell.ini
toolkit/components/downloads/test/unit/xpcshell.ini
toolkit/components/extensions/test/xpcshell/xpcshell.ini
toolkit/components/mediasniffer/test/unit/xpcshell.ini
toolkit/components/messaging-system/targeting/test/unit/xpcshell.ini
toolkit/components/mozintl/test/xpcshell.ini
toolkit/components/osfile/tests/xpcshell/xpcshell.ini
toolkit/components/passwordmgr/test/unit/xpcshell.ini
toolkit/components/satchel/test/unit/xpcshell.ini
toolkit/components/startup/tests/unit/xpcshell.ini
toolkit/components/telemetry/dap/tests/xpcshell/xpcshell.ini
toolkit/components/thumbnails/test/xpcshell.ini
toolkit/components/urlformatter/tests/unit/xpcshell.ini
toolkit/components/windowcreator/tests/unit/xpcshell.ini
toolkit/mozapps/update/tests/unit_service_updater/xpcshell.ini
toolkit/profile/xpcshell/xpcshell.ini
widget/headless/tests/xpcshell.ini
fwiw these ^ are only run on backstop pushes
vs when there's a green one there's only this one:
toolkit/components/extensions/test/xpcshell/xpcshell.ini
Taking as example this range.
Jens, could this be a case of a test misbehaving just as it was in Bug 1796753?
| Assignee | ||
Comment 9•2 years ago
•
|
||
The patches from bug 1810666 did not change any test directly. However, there is potential for them to cause a higher number of content processes, or at least to change the order in which they are created/removed. AFAICS, this try push shows that the only known case where we expect this to be possible is not hit when the patches from bug 1811195 are also applied (but the test still fails), though I might be overlooking something.
Does the XPCShell test harness know how many processes were ever spawned or, even better, the maximum number of processes alive in parallel? It would be interesting to compare this number between the successful and the failed runs.
And can we reduce the number of tests running in parallel on those instances (or give them more memory), just to see if it makes a difference? I'd like to understand if something is really going nuts and allocating a very high number of extra processes or if we just sail along the border already and small fluctuations make us fail.
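To make the two numbers asked about above concrete (total processes ever spawned, and the maximum alive at the same time), here is a minimal sketch of a counter that could track them. It is not part of the XPCShell harness; the class name `PeakCounter` and its methods are hypothetical, and a real harness would have to call them from wherever it launches and reaps processes.

```python
import threading


class PeakCounter:
    """Hypothetical helper: tracks items alive now, the peak, and the total started."""

    def __init__(self):
        self._lock = threading.Lock()
        self.alive = 0
        self.peak = 0
        self.total = 0

    def started(self):
        # Called when a process is spawned.
        with self._lock:
            self.alive += 1
            self.total += 1
            self.peak = max(self.peak, self.alive)

    def finished(self):
        # Called when a process exits.
        with self._lock:
            self.alive -= 1


counter = PeakCounter()
# Simulated lifecycle: three processes start, one exits, two more start.
for _ in range(3):
    counter.started()
counter.finished()
for _ in range(2):
    counter.started()
print(f"total spawned: {counter.total}, peak alive: {counter.peak}")
```

Comparing `total` and `peak` between a green run and a failed run would show whether process counts explode or merely fluctuate near a limit.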
| Reporter | ||
Comment 10•2 years ago
|
||
Unfortunately, I do not have information on these specific metrics for the XPCShell test harness.
@Aryx, do you have any insight about this?
Comment 12•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #9)
Does the XPCShell test harness know how many processes were ever spawned or, even better, the maximum number of processes alive in parallel? It would be interesting to compare this number between the successful and the failed runs.
And can we reduce the number of tests running in parallel on those instances (or give them more memory), just to see if it makes a difference? I'd like to understand if something is really going nuts and allocating a very high number of extra processes or if we just sail along the border already and small fluctuations make us fail.
Comment 13•2 years ago
|
||
Interesting questions. This might be possible; here are some potential solutions.
- xpcshell.ini has options to run tests sequentially, not in parallel ( https://searchfox.org/mozilla-central/search?q=sequential&path=xpcshell.ini&case=false&regexp=false ). In fact, about a year ago I took the most frequent failures (ones that almost perma failed in parallel but maybe not in sequential) and forced them to run as sequential.
- we run 1 test per thread, and this is defined by https://searchfox.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#54 (NUM_THREADS = int(cpu_count() * 4))
That does not fully answer the question yet, so here is a bit more: if you designate a test to run sequentially, it is put into a list, and after all the parallel tests have completed we iterate through the sequential list:
https://searchfox.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#1946
So a few ways forward:
- adjust num_threads and push to try
- if certain tests or directories are suspect, add run-sequentially to the manifest
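The scheduling described above (parallel tests on a pool of threads first, then the run-sequentially tests one at a time) can be sketched roughly as follows. This is an illustration only, not the code in runxpcshelltests.py: the function `run_all`, the dictionary-based test records, and `fake_run` are hypothetical names made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def run_all(tests, num_threads, run_one):
    """Run parallel-safe tests on a pool, then the sequential ones one at a time."""
    parallel = [t for t in tests if not t.get("run-sequentially")]
    sequential = [t for t in tests if t.get("run-sequentially")]
    results = {}
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        for test, outcome in zip(parallel, pool.map(run_one, parallel)):
            results[test["name"]] = outcome
    # The pool has drained here, so nothing else runs alongside these tests.
    for test in sequential:
        results[test["name"]] = run_one(test)
    return results


order = []


def fake_run(test):
    # Stand-in for launching a real test; records the execution order.
    order.append(test["name"])
    return "PASS"


tests = [
    {"name": "test_a.js"},
    {"name": "test_b.js", "run-sequentially": "flaky in parallel"},
    {"name": "test_c.js"},
]
results = run_all(tests, num_threads=2, run_one=fake_run)
print(order[-1])
```

In this sketch the sequential test always runs last, regardless of where it appears in the manifest, which mirrors the behavior described in the comment.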
| Assignee | ||
Comment 15•2 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #13)
So a few ways forward:
- adjust num_threads and push to try
- if certain tests or directories are suspect, add run-sequentially to the manifest
(In reply to Cosmin Sabou [:CosminS] from comment #7)
The test groups that run when this ends as an exception are:
browser/components/customizableui/test/unit/xpcshell.ini ... widget/headless/tests/xpcshell.ini
fwiw these ^ are only run on backstop pushes
vs when there's a green one there's only this one:
toolkit/components/extensions/test/xpcshell/xpcshell.ini
:CosminS, based on the above: do you have an idea for which tests we could apply those manifest changes then? I do not really have the feeling we identified a clear offender, yet.
| Assignee | ||
Comment 16•2 years ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #13)
- we run 1 test per thread, and this is defined by https://searchfox.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#54 (NUM_THREADS = int(cpu_count() * 4))
It seems that constant has not been changed for 9 years now. But IIUC there is also an option threadCount that can be set from the command line? Do we ever use this option, and/or would that be an easier way to test it?
| Assignee | ||
Comment 18•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #16)
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #13)
- we run 1 test per thread, and this is defined by https://searchfox.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#54 (NUM_THREADS = int(cpu_count() * 4))
It seems that constant has not been changed for 9 years now. But IIUC there is also an option threadCount that can be set from the command line? Do we ever use this option, and/or would that be an easier way to test it?
There is also this adjustment for tsan already. I cannot find anything similar for asan, though. A successful run of X4 asan shows:
[task 2023-01-20T19:43:21.720Z] 19:43:21 INFO - Using at most 8 threads.
while a successful run of tsan says:
[task 2023-01-20T19:24:56.226Z] 19:24:56 INFO - Using at most 4 threads.
logged from runxpcshelltests.py.
If cpu_count() == 2 on the node in question, then the 8 threads seen for asan would be the initial NUM_THREADS = int(cpu_count() * 4) value, and the 4 threads seen for tsan would be that value after the tsan adjustment. It is probably reasonable to make the same or a similar adjustment for asan?
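The arithmetic can be spelled out in a small sketch. Only the factor of 4 and the logged values (8 threads for asan, 4 for tsan) come from this bug; the halving rule, the `REDUCED_BUILDS` set, and the function name `max_threads` are assumptions chosen to match those log lines, not the harness's actual code.

```python
# Hypothetical: build types that get a reduced thread count.
# "asan" is included to illustrate the adjustment proposed here.
REDUCED_BUILDS = {"tsan", "asan"}


def max_threads(cpus, build=None):
    """Return the thread cap: 4 per CPU, halved (assumed) for sanitizer builds."""
    threads = int(cpus * 4)
    if build in REDUCED_BUILDS:
        # Assumed adjustment: halve, but never drop below one thread.
        threads = max(1, threads // 2)
    return threads


print(max_threads(2))          # unadjusted value on a 2-CPU node
print(max_threads(2, "tsan"))  # value after the assumed tsan adjustment
print(max_threads(2, "asan"))  # value if asan got the same treatment
```

With two CPUs the unadjusted cap matches the 8 threads logged for the asan run, and the halved cap matches the 4 threads logged for tsan.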
| Assignee | ||
Comment 19•2 years ago
|
||
| Assignee | ||
Comment 20•2 years ago
•
|
||
(In reply to Jens Stutte [:jstutte] from comment #19)
Try: https://treeherder.mozilla.org/jobs?repo=try&revision=4ae7bd2d6aa321cca5b1c042c33122ccd1ad1657
That looks good so far. The log shows we are now using 4 "threads". For some reason I do not understand, Phabricator keeps the patch I attached here secret, so it does not show up here?
Comment 21•2 years ago
|
||
I think lowering the thread count for asan wouldn't be a problem.
| Assignee | ||
Comment 22•2 years ago
|
||
Updated•2 years ago
|
Comment 23•2 years ago
|
||
Comment 24•2 years ago
|
||
| bugherder | ||
| Reporter | ||
Updated•2 years ago
|