Closed Bug 1716452 Opened 4 years ago Closed 4 years ago

Intermittent TEST-UNEXPECTED-TIMEOUT | toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html | application timed out after 370 seconds with no output

Categories

(WebExtensions :: General, defect, P5)

Tracking

(firefox-esr78 unaffected, firefox89 unaffected, firefox90 unaffected, firefox91 fixed, firefox92 fixed)

RESOLVED FIXED
92 Branch

People

(Reporter: intermittent-bug-filer, Assigned: rpl)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [stockwell fixed:patch])

Attachments

(3 files, 1 obsolete file)

Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=342754204&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/QivaseasQc6rmZ5gni9nSQ/runs/0/artifacts/public/logs/live_backing.log


[task 2021-06-14T22:55:32.941Z] 22:55:32     INFO - TEST-PASS | toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html | authHandler.onAuthRequired called with 840 http://mochi.test:8888/tests/toolkit/components/passwordmgr/test/mochitest/authenticate.sjs?realm=auth_nonblocking_forwardAuth&user=auth_nonblocking_forwardAuth&pass=auth_nonblocking_forwardAuth result undefined 
[task 2021-06-14T22:55:32.941Z] 22:55:32     INFO - Buffered messages finished
[task 2021-06-14T22:55:32.942Z] 22:55:32    ERROR - TEST-UNEXPECTED-TIMEOUT | toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html | application timed out after 370 seconds with no output
[task 2021-06-14T22:55:32.942Z] 22:55:32    ERROR - Force-terminating active process(es).
[task 2021-06-14T22:55:32.942Z] 22:55:32     INFO - Determining child pids from psutil...
[task 2021-06-14T22:55:32.953Z] 22:55:32     INFO - [5328, 5350, 5415, 5448, 5482, 5512, 5545, 5566, 5598, 5631, 5659, 5696, 5729, 5761, 5787, 5826, 5853, 5875, 5906, 5931, 5956, 5981, 6006, 6031, 6056, 6081, 6106, 6131, 6157, 6182, 6230]
[task 2021-06-14T22:55:32.954Z] 22:55:32     INFO - ==> process 5267 launched child process 5287

Looks like this started after Bug 1709838 landed. Retriggers.
Christoph, could you have a look at it? Thank you.

Flags: needinfo?(ckerschb)
Regressed by: 1709838
Has Regression Range: --- → yes

Set release status flags based on info from the regressing bug 1709838

On it ...

Assignee: nobody → ckerschb
Status: NEW → ASSIGNED
Flags: needinfo?(ckerschb)

Hey Cosmin, I evaluated test_ext_webrequest_auth.html in detail and I don't think that Bug 1709838 is causing the intermittent failure.

In more detail:
Within Bug 1709838, we enable https-first in Private Windows, which upgrades connections from http to https in PBM, but ...

  • only if the port is the default port; in other words, https-first does not apply to http://mochi.test:8888/ as used in the test case
  • only for connections in PBM, which is not the case in this test.

When running the test I also couldn't find any upgrades or any other indicators that https-first would be interfering with it. Maybe some other patch caused this intermittent failure to appear?

Assignee: ckerschb → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(csabou)

Hi Christoph, maybe it's something that surfaced along with those changes.
The thing is that before those changes there are 11 green jobs for test_ext_webrequest_auth (log), and after the patch landed there are 8/21 timeouts, as you can see from this TH link. Added 20 more retriggers to see if it will fail on the previous push.

Flags: needinfo?(csabou)

After 20 more retriggers, this is still the first push where it fails.

Flags: needinfo?(ckerschb)

(In reply to Cosmin Sabou [:CosminS] from comment #7)

After 20 more retriggers, this is still the first push where it fails.

Hey Shane, I don't know how to move forward with this intermittent test failure. I left some analysis in comment 5, but I can't reproduce the error. Since you wrote the test, I was wondering if you could take a look; maybe with joint forces we can get the problem resolved - thank you!

Flags: needinfo?(ckerschb) → needinfo?(mixedpuppy)

There is probably some minor timing change that affects this test on Linux. The log shows that onAuthRequired is called, but otherwise there is not much logging in this test.

I would add some logging between each await to figure out which promise is failing; that might tell us something. Then maybe consider doing a Promise.all on these promises: https://searchfox.org/mozilla-central/rev/683046a2d37e1e2a4fd68e7749c1effd1aee2d38/toolkit/components/extensions/test/mochitest/test_ext_webrequest_auth.html#109-117
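
A minimal sketch of what that debugging could look like inside the test task; the promise names below are placeholders for illustration, not the actual identifiers used in test_ext_webrequest_auth.html:

    // Placeholder promise names standing in for the awaits in the linked range.
    info("waiting for the onBeforeRequest promise");
    await onBeforeRequestPromise;
    info("waiting for the onAuthRequired promise");
    await onAuthRequiredPromise;
    info("waiting for the cleanup promise");
    await cleanupPromise;

    // Alternatively, collapse the sequential awaits into a single one, after the
    // info() logging has narrowed down which promise never resolves:
    await Promise.all([onBeforeRequestPromise, onAuthRequiredPromise, cleanupPromise]);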

Failing that, I'd disable the test for Linux debug builds, since those seem to account for the bulk of the timeouts.

Flags: needinfo?(mixedpuppy)
Assignee: nobody → abutkovits
Status: NEW → ASSIGNED
Assignee: abutkovits → nobody
Status: ASSIGNED → NEW
Crash Signature: [@ profiler_thread_sleep()]

Hey Shane,
have you ever been able to trigger it locally? I tried adding some debugging logging and triggering the failure in a try push with a TV job, but unfortunately I was unable to trigger it in that TV job, even though the job ran 10 times.
This makes me suspect there may be some interaction with a test case running before this one; without that preceding test, this one does not fail even when executed multiple times and in chaos mode (as the TV job does).

Below is some more context on a couple of details I noticed while looking at the logs of this intermittent failure.


Looking at a bunch of the intermittent failures tracked in this bug, I noticed that all screenshots captured when the failure is triggered show the authentication dialog open, while the test is expected to intercept the request that opens the authentication dialog and replace the dialog with a mock object to verify the behaviors this test covers.

I'm unable to trigger this locally and so I didn't dig further into it, but looking at the AuthRequestor class (defined and used internally in WebRequest.jsm) I noticed that AuthRequestor's _getForwardPrompt has a try..catch with an empty catch block as part of its own fallback mechanism. So there is a chance that when the test fails it is somehow triggering an exception, the fallback mechanism is opening the authentication dialog as a side effect, and the empty catch block is preventing us from seeing an error log when that happens.

Triggering the failure with some additional debugging logs in those two _getForwardPrompt try...catch blocks may help shed some light on this intermittent failure.

To be fair, it may be reasonable to still report the error (using Cu.reportError) when running a debug build; that may give us more details to work with if this kind of unexpected issue keeps happening in the future.
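
For illustration, a rough sketch of that debug-only reporting; the helper names and the surrounding structure of _getForwardPrompt are paraphrased assumptions, not the actual WebRequest.jsm code:

    // Sketch only; identifiers are hypothetical.
    try {
      // Attempt to forward the auth prompt through the channel's own
      // notification callbacks (the path the test expects to exercise).
      prompt = getForwardedAuthPrompt(channel);
    } catch (e) {
      // Previously swallowed silently; in debug builds, surface the error so
      // failures like this intermittent leave a trace in the test log.
      if (AppConstants.DEBUG) {
        Cu.reportError(e);
      }
      // ...fall through to the fallback, which shows the real auth dialog.
    }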

Flags: needinfo?(mixedpuppy)
Assignee: nobody → lgreco
Status: NEW → ASSIGNED

Adding leave-open to prevent this issue from being closed once D119644 lands; the purpose of that patch is to give us some more details about the failure while the test runs on the build infrastructure along with the other tests in the same chunk, but it doesn't fix the intermittent failure yet.

(Also clearing the needinfo assigned to Shane; we have already discussed this and agreed to proceed with the proposed approach described in comment 15.)

Flags: needinfo?(mixedpuppy)
Keywords: leave-open
Pushed by luca.greco@alcacoop.it: https://hg.mozilla.org/integration/autoland/rev/d5e14c4d6f96 Ensure WebRequest.jsm AuthRequestor _getForwardPrompt does report errors in debug builds. r=mixedpuppy

This patch fixes the intermittent failures triggered by test_ext_webrequest_auth.html by making sure that
the "http-on-modify-request" observer added by the test tasks in this test file ignores any http
request that isn't related to the mochi.test URL expected to trigger the authentication dialog.

With the previous version of the chrome script executed by these two test tasks, there was a chance that the observer
would set the mocked nsIAuthPromptProvider/nsIAuthPrompt2 notification callbacks on the wrong channel, if that
http channel was also related to the mochi.test host. After that the observer was removed, and so it would never
set notificationCallbacks on the channel it was actually meant for (which made the test trigger the real
authentication prompt that can be seen in the failure screenshots).

As part of this patch we are also reverting the additional debugging logs added in a previous patch attached to
this issue.
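
For reference, a hedged sketch of the shape of that change in the test's chrome script; the constant, the exact URL check, and the mock object name are assumptions for illustration, not the actual patch:

    // Sketch of the fix described above; identifiers are hypothetical.
    const EXPECTED_URL =
      "http://mochi.test:8888/tests/toolkit/components/passwordmgr/test/mochitest/authenticate.sjs";

    const observer = {
      observe(subject, topic) {
        if (topic !== "http-on-modify-request") {
          return;
        }
        const channel = subject.QueryInterface(Ci.nsIHttpChannel);
        // Ignore any request that isn't the one expected to trigger the auth
        // dialog, instead of mocking the first mochi.test channel that shows up.
        if (!channel.URI.spec.startsWith(EXPECTED_URL)) {
          return;
        }
        // Install the mocked nsIAuthPromptProvider/nsIAuthPrompt2 callbacks on
        // the intended channel, then stop observing.
        channel.notificationCallbacks = mockNotificationCallbacks;
        Services.obs.removeObserver(observer, "http-on-modify-request");
      },
    };
    Services.obs.addObserver(observer, "http-on-modify-request");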

Removing the leave-open keyword; we are pushing the fix attached to this issue to autoland as D119880, and I'm pretty confident it should prevent these intermittent failures from now on.

Keywords: leave-open
Pushed by luca.greco@alcacoop.it: https://hg.mozilla.org/integration/autoland/rev/9078c74116cd Fix test_ext_webrequest_auth.html intermittent failures. r=mixedpuppy
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

The trend of the intermittent failures tracked by this bug also seems to confirm that the last patch we landed (comment 23) is preventing the unexpected timeout failures.

Removing disable-recommended from the whiteboard to reflect that.

Whiteboard: [stockwell disable-recommended] → [stockwell]
Attachment #9229565 - Attachment is obsolete: true

The patch landed in nightly and beta is affected.
:rpl, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(lgreco)
Whiteboard: [stockwell] → [stockwell fixed:patch]

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #26)

The patch landed in nightly and beta is affected.
:rpl, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

The patch with the fix would not graft cleanly on beta as-is without landing both D119644 and D119880 (because the fix in D119880 also reverts the additional debugging logging added in D119644).

It may not be worth it if the intermittency rate on beta isn't that high, but if it is high enough to be annoying then I'll be more than happy to attach a separate test-only patch that only applies the fix to the test (which also wouldn't require an explicit uplift request).

Andreaa, wdyt?

Flags: needinfo?(lgreco) → needinfo?(apavel)

(In reply to Luca Greco [:rpl] [:luca] [:lgreco] from comment #28)

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #26)

The patch landed in nightly and beta is affected.
:rpl, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

The patch with the fix would not graft cleanly on beta as-is without landing both D119644 and D119880 (because the fix in D119880 also reverts the additional debugging logging added in D119644).

It may not be worth it if the intermittency rate on beta isn't that high, but if it is high enough to be annoying then I'll be more than happy to attach a separate test-only patch that only applies the fix to the test (which also wouldn't require an explicit uplift request).

Andreaa, wdyt?

Failure rate is not high at the moment: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-07-12&endday=2021-07-19&tree=mozilla-beta&bug=1716452.
We'll keep an eye on it and add a ni if it reaches a high frequency.

Thank you!

Flags: needinfo?(apavel)

IMO, we should go ahead with the test-only uplift patch as we'll be living with these failures on ESR91 for the next year otherwise.

Flags: needinfo?(lgreco)

This attachment contains the fix landed on Nightly, turned into a single test-only patch that can be applied cleanly on 91 beta as-is (obtained by folding both D119644 and D119880 into a single patch and then double-checking that it applies without conflicts on 91 beta as expected).

(In reply to Ryan VanderMeulen [:RyanVM] from comment #31)

IMO, we should go ahead with the test-only uplift patch as we'll be living with these failures on ESR91 for the next year otherwise.

+1 That's definitely a good point.

I've just attached the test-only patch, which can be grafted onto 91 beta, in comment 32.

Flags: needinfo?(lgreco)