Intermittent TEST-UNEXPECTED-TIMEOUT | toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html | application timed out after 370 seconds with no output
Categories
(WebExtensions :: General, defect, P5)
Tracking
(firefox-esr78 unaffected, firefox89 unaffected, firefox90 unaffected, firefox91 fixed, firefox92 fixed)
People
(Reporter: intermittent-bug-filer, Assigned: rpl)
References
(Regression)
Details
(Keywords: intermittent-failure, regression, Whiteboard: [stockwell fixed:patch])
Crash Data
Attachments
(3 files, 1 obsolete file)
Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=342754204&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QivaseasQc6rmZ5gni9nSQ/runs/0/artifacts/public/logs/live_backing.log
[task 2021-06-14T22:55:32.941Z] 22:55:32 INFO - TEST-PASS | toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html | authHandler.onAuthRequired called with 840 http://mochi.test:8888/tests/toolkit/components/passwordmgr/test/mochitest/authenticate.sjs?realm=auth_nonblocking_forwardAuth&user=auth_nonblocking_forwardAuth&pass=auth_nonblocking_forwardAuth result undefined
[task 2021-06-14T22:55:32.941Z] 22:55:32 INFO - Buffered messages finished
[task 2021-06-14T22:55:32.942Z] 22:55:32 ERROR - TEST-UNEXPECTED-TIMEOUT | toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html | application timed out after 370 seconds with no output
[task 2021-06-14T22:55:32.942Z] 22:55:32 ERROR - Force-terminating active process(es).
[task 2021-06-14T22:55:32.942Z] 22:55:32 INFO - Determining child pids from psutil...
[task 2021-06-14T22:55:32.953Z] 22:55:32 INFO - [5328, 5350, 5415, 5448, 5482, 5512, 5545, 5566, 5598, 5631, 5659, 5696, 5729, 5761, 5787, 5826, 5853, 5875, 5906, 5931, 5956, 5981, 6006, 6031, 6056, 6081, 6106, 6131, 6157, 6182, 6230]
[task 2021-06-14T22:55:32.954Z] 22:55:32 INFO - ==> process 5267 launched child process 5287
Comment 1•4 years ago
Looks like this has started after Bug 1709838 landed. Retriggers.
Christoph, could you take a look at it? Thank you.
Updated•4 years ago
Comment hidden (Intermittent Failures Robot)
Comment 3•4 years ago
Set release status flags based on info from the regressing bug 1709838
Updated•4 years ago
Comment 4•4 years ago
On it ...
Comment 5•4 years ago
Hey Cosmin, I evaluated test_ext_webrequest_auth.html in detail and I don't think that Bug 1709838 is causing the intermittent.
In more detail: within Bug 1709838 we enabled https-first in Private Windows, which upgrades connections from http to https in PBM, but:
- only in case the port is the default port - in other words, https-first does not apply to http://mochi.test:8888/ like in the testcase;
- only in PBM, which is not the case in the testcase.
When running the test I also couldn't find any upgrades or any other indicators that https-first would be messing with it - maybe some other patch caused this intermittent to appear?
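For reference, a minimal sketch of what the https-first behavior described above boils down to, assuming the standard pref names (the prefs and the exact upgrade rule are stated from memory, not quoted from Bug 1709838):

```js
// Privileged JS sketch; Services.jsm import as used in 2021-era code.
const { Services } = ChromeUtils.import("resource://gre/modules/Services.jsm");

// Bug 1709838 enables https-first only for Private Browsing Mode:
Services.prefs.setBoolPref("dom.security.https_first_pbm", true);
// https-first in regular (non-private) windows stays off:
Services.prefs.setBoolPref("dom.security.https_first", false);

// The upgrade only rewrites default-port http:// loads in a private
// window, so http://mochi.test:8888/ (explicit non-default port, and a
// non-private test window) is never upgraded to https.
```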
Comment 6•4 years ago
Hi Christoph, maybe it's something that surfaced along with those changes.
The thing is that before those changes there were 11 green jobs for test_ext_webrequest_auth
(log), and after the patch landed there are 8/21 timeouts, as you can see from this TH link. Added 20 more retriggers to see if it will fail on the previous push.
Comment 7•4 years ago
After 20 more retriggers, it still fails first on that push.
Comment 8•4 years ago
(In reply to Cosmin Sabou [:CosminS] from comment #7)
> After 20 more retriggers, it still fails first on that push.
Hey Shane, I don't know how to move forward with this intermittent test failure. I left some analysis in comment 5, but I can't reproduce the error. I was wondering, since you wrote the test, if you could take a look and maybe with joint forces we can get the problem resolved - thank you!
Comment hidden (Intermittent Failures Robot)
Comment 10•4 years ago
There is probably some minor timing change that affects this test on linux. The log shows that onAuthRequired is called, but otherwise there is not much logging in this test.
I would add some logging between each await and figure out which promise is failing; that might tell us something. Then maybe consider doing a Promise.all on these promises (see the sketch below): https://searchfox.org/mozilla-central/rev/683046a2d37e1e2a4fd68e7749c1effd1aee2d38/toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html#109-117
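A minimal sketch of that suggestion, assuming mochitest's standard info() logger is in scope; the promise names below are illustrative placeholders, not the actual variables from the test:

```js
// Wrap each pending promise so the test log shows which one settled and
// which one is still hanging when the harness times out.
function labeled(name, promise) {
  return promise.then(
    value => {
      info(`${name} resolved`);
      return value;
    },
    err => {
      info(`${name} rejected: ${err}`);
      throw err;
    }
  );
}

// Inside the async test task, instead of awaiting each promise in
// sequence with no output in between (placeholder promise names):
await Promise.all([
  labeled("onAuthRequired", authRequiredPromise),
  labeled("requestCompleted", completedPromise),
]);
```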
Failing that, I'd disable the test for linux+debug since that seems to be the bulk of the timeouts.
Comment 11•4 years ago
Updated•4 years ago
Comment hidden (Intermittent Failures Robot)
Updated•4 years ago
Comment hidden (Intermittent Failures Robot)
Updated•4 years ago
Assignee
Comment 15•4 years ago
Hey Shane,
have you ever been able to trigger it locally? I tried to add some debugging logging and trigger it in a push to try with a TV job, but unfortunately I've been unable to trigger it in that TV job, even though the TV job ran 10 times.
This makes me suspect there may be some interaction with a test case running before this one, and that without it the test does not fail even if executed multiple times and in chaos mode (as the TV job does).
Here is some more context on a couple of details I noticed while looking at the logs of this intermittent failure.
Looking at a number of the intermittent failures tracked in this bug, I noticed that every screenshot captured when the failure is triggered shows the authentication dialog open, while the test is expected to intercept the requests that would open the authentication dialog and replace it with a mock object to test the behaviors covered by this test.
I'm unable to trigger this locally, so I didn't dig more into it, but looking at the AuthRequestor class (defined and used internally in WebRequest.jsm) I noticed that AuthRequestor's _getForwardPrompt has a try...catch with an empty catch block as part of its own fallback mechanism. There is a chance that the test is somehow triggering an exception when it fails, and the fallback mechanism is opening the authentication dialog as a side effect of that, but the empty catch block is preventing us from seeing an error log when that happens.
If we could trigger the failure with some additional debugging logs in those two _getForwardPrompt try...catch blocks, that may help to shed some light on this intermittent.
To be fair, it may be reasonable to still report the error (using Cu.reportError) if we are running on a debug build; that may help us get some more details to work with if this kind of unexpected issue keeps happening in the future.
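A minimal sketch of that idea, assuming the rough shape of _getForwardPrompt; the helper names around the catch block are illustrative placeholders, not the exact WebRequest.jsm internals:

```js
// Sketched as it might sit inside the AuthRequestor class.
// Assumed import (commented because a lone method can't host it):
// const { AppConstants } = ChromeUtils.import("resource://gre/modules/AppConstants.jsm");

_getForwardPrompt(data) {
  try {
    // Forward to the extension-provided (or test-mocked) prompt first.
    return this._getForwardedInterface(data);
  } catch (e) {
    // Previously an empty catch block: the exception vanished and the
    // fallback silently opened the real auth dialog. Reporting it on
    // debug builds surfaces the hidden failure in the test logs.
    if (AppConstants.DEBUG) {
      Cu.reportError(e);
    }
    return this._getFallbackPrompt(data);
  }
}
```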
Assignee
Comment 16•4 years ago
Updated•4 years ago
Assignee
Comment 17•4 years ago
Adding leave-open to prevent this issue from being closed once D119644 lands. The purpose of that patch is to provide us some more details about the failure happening while the test runs on the build infrastructure along with the other tests in the same chunk, but it wouldn't fix the intermittent failure yet.
(Also clearing the needinfo assigned to Shane; we have already discussed this and agreed to proceed with the proposed approach described in comment 15.)
Comment 18•4 years ago
Comment 19•4 years ago
bugherder
Assignee
Comment 20•4 years ago
This patch fixes the intermittent failures being triggered by test_ext_webrequest_auth.html by making sure that
the "http-on-modify-request" observer added by the test tasks that are part of this test file ignores any http
request that isn't related to the mochi.test URL that is expected to trigger the authentication dialog.
With the previous version of the chrome script executed by these two test tasks, there was a chance that the observer
would set the mocked nsIAuthPromptProvider/nsIAuthPrompt2 notification callbacks on the wrong channel, if
that http channel was also related to the mochi.test host. After that the observer was removed, and so it would never
set the notificationCallbacks on the channel it was meant to be set on (and that would make the test
trigger the actual authentication prompt that can be seen in the failure screenshots).
As part of this patch we are also reverting the additional debugging logs added in a previous patch attached to
this issue.
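A minimal sketch of the idea behind the fix, assuming the general shape of the test's chrome script; the variable names, the mock object, and the exact URL check are illustrative, not the landed code:

```js
// One-shot "http-on-modify-request" observer that only touches the
// channel for the expected auth URL. mockAuthPrompt stands in for the
// test's mocked nsIAuthPromptProvider/nsIAuthPrompt2 callbacks object.
const EXPECTED_URL =
  "http://mochi.test:8888/tests/toolkit/components/passwordmgr/test/mochitest/authenticate.sjs";

function observer(subject, topic) {
  const channel = subject.QueryInterface(Ci.nsIHttpChannel);
  // Before the fix the observer could match any mochi.test channel and
  // then unregister itself, leaving the real auth channel un-mocked.
  if (!channel.URI.spec.startsWith(EXPECTED_URL)) {
    return; // ignore unrelated requests instead of consuming the mock
  }
  channel.notificationCallbacks = mockAuthPrompt;
  Services.obs.removeObserver(observer, "http-on-modify-request");
}
Services.obs.addObserver(observer, "http-on-modify-request");
```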
Assignee
Comment 21•4 years ago
Removing the leave-open keyword; we are pushing the fix attached to this issue to autoland as D119880, and I'm pretty confident it should prevent these intermittent failures from now on.
Comment 22•4 years ago
Comment 23•4 years ago
bugherder
Comment hidden (Intermittent Failures Robot)
Assignee
Comment 25•4 years ago
The trend of the intermittent failures tracked by this bug also seems to confirm that the last patch we landed (comment 23) is preventing the unexpected timeout failures.
Removing disable-recommended from the whiteboard to reflect that.
Updated•4 years ago
Comment 26•4 years ago
The patch landed in nightly and beta is affected.
:rpl, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.
For more information, please visit auto_nag documentation.
Comment 27•4 years ago
mozilla-beta failure https://treeherder.mozilla.org/logviewer?job_id=345632569&repo=mozilla-beta&lineNumber=10340
Assignee
Comment 28•4 years ago
(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #26)
> The patch landed in nightly and beta is affected.
> :rpl, is this bug important enough to require an uplift?
> If not please set status_beta to wontfix.
> For more information, please visit auto_nag documentation.
The patch with the fix would not graft cleanly on beta as-is without landing both D119644 and D119880 (because the fix from D119880 also reverts the additional debugging logging added in D119644).
It may not be worth it if the intermittency rate on beta isn't that high, but if it is high enough to be annoying then I'll be more than happy to attach a separate test-only patch that only applies the fix to the test (which also wouldn't require an explicit uplift request).
Andreaa, wdyt?
Comment 29•4 years ago
(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #28)
> (In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #26)
> > The patch landed in nightly and beta is affected.
> > :rpl, is this bug important enough to require an uplift?
> > If not please set status_beta to wontfix.
> > For more information, please visit auto_nag documentation.
> The patch with the fix would not graft cleanly on beta as-is without landing both D119644 and D119880 (because the fix from D119880 also reverts the additional debugging logging added in D119644).
> It may not be worth it if the intermittency rate on beta isn't that high, but if it is high enough to be annoying then I'll be more than happy to attach a separate test-only patch that only applies the fix to the test (which also wouldn't require an explicit uplift request).
> Andreaa, wdyt?
Failure rate is not high at the moment: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-07-12&endday=2021-07-19&tree=mozilla-beta&bug=1716452.
We'll keep an eye on it and add a ni if it gets to a high frequency.
Thank you!
Comment hidden (Intermittent Failures Robot)
Comment 31•4 years ago
IMO, we should go ahead with the test-only uplift patch as we'll be living with these failures on ESR91 for the next year otherwise.
Assignee
Comment 32•4 years ago
This attachment contains the fix landed on Nightly, turned into a single test-only patch that can be applied cleanly on 91 beta as-is (obtained by just folding both D119644 and D119880 into a single patch and then double-checking that it applies without conflicts on 91 beta as expected).
Assignee | ||
Comment 33•4 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #31)
> IMO, we should go ahead with the test-only uplift patch as we'll be living with these failures on ESR91 for the next year otherwise.
+1 That's definitely a good point.
I've just attached the test-only patch that can be grafted on 91 beta in comment 32.
Comment 34•4 years ago
bugherder uplift
Comment hidden (Intermittent Failures Robot)
Description