Closed Bug 1835822 Opened 1 year ago Closed 1 year ago

Perma Wd TEST-UNEXPECTED-TIMEOUT | expected OK [TEST-INFO took 80003ms] when Gecko 115 merges to Beta on 2023-06-05

Categories

(Remote Protocol :: Marionette, defect)

defect

Tracking

(firefox-esr102 unaffected, firefox113 unaffected, firefox114 unaffected, firefox115+ fixed)

RESOLVED FIXED
Tracking Status
firefox-esr102 --- unaffected
firefox113 --- unaffected
firefox114 --- unaffected
firefox115 + fixed

People

(Reporter: noriszfay, Assigned: jdescottes)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Central-as-beta simulation

How to run these simulations

Failure log:

[task 2023-05-30T11:23:39.993Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump BusyCount: 0
[task 2023-05-30T11:23:39.993Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump CrashInfo: dispatch failed
[task 2023-05-30T11:23:39.994Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump Found ActiveWorker (dedicated): chrome://global/content/translations/language-id-engine-worker.js
[task 2023-05-30T11:23:39.994Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump Principal: [System Principal]
[task 2023-05-30T11:23:39.994Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump LoadingPrincipal: [System Principal]
[task 2023-05-30T11:23:39.995Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump BusyCount: 0
[task 2023-05-30T11:23:39.995Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump CrashInfo: dispatch failed
[task 2023-05-30T11:23:49.682Z] 11:23:49     INFO - PID 13583 | [Parent 13597, IPC I/O Parent] WARNING: Process 14026 may be hanging at shutdown; will wait for up to 40000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc:184
[task 2023-05-30T11:24:18.369Z] 11:24:18     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/classic/get_element_attribute/get.py | expected OK
[task 2023-05-30T11:24:18.370Z] 11:24:18     INFO - TEST-INFO took 80003ms
[task 2023-05-30T11:24:18.443Z] 11:24:18     INFO - STDOUT: ======================== 32 passed in 79.85s (0:01:19) =========================
[task 2023-05-30T11:24:18.444Z] 11:24:18     INFO - Closing logging queue
[task 2023-05-30T11:24:18.444Z] 11:24:18     INFO - queue closed
[task 2023-05-30T11:24:18.452Z] 11:24:18     INFO - PID 14234 | 1685445858451	geckodriver	INFO	Listening on 127.0.0.1:33079
[task 2023-05-30T11:24:18.453Z] 11:24:18     INFO - Starting runner
[task 2023-05-30T11:24:18.654Z] 11:24:18     INFO - TEST-START | /webdriver/tests/classic/get_element_attribute/user_prompts.py

Hey Julian, this might be caused by Bug 1816538 as all the Wd fails seems to be caused by timeouts (hanging in shutdown), could you take a look at it please. Thank you!

Flags: needinfo?(jdescottes)
Status: NEW → RESOLVED
Closed: 1 year ago
Flags: needinfo?(jdescottes)
Resolution: --- → FIXED

Removing tracking request and setting Fx115 to fixed per Comment 1

(In reply to Norisz Fay [:noriszfay] from comment #0)

Central-as-beta simulation

How to run these simulations

Failure log:

[task 2023-05-30T11:23:39.993Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump BusyCount: 0
[task 2023-05-30T11:23:39.993Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump CrashInfo: dispatch failed
[task 2023-05-30T11:23:39.994Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump Found ActiveWorker (dedicated): chrome://global/content/translations/language-id-engine-worker.js
[task 2023-05-30T11:23:39.994Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump Principal: [System Principal]
[task 2023-05-30T11:23:39.994Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump LoadingPrincipal: [System Principal]
[task 2023-05-30T11:23:39.995Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump BusyCount: 0
[task 2023-05-30T11:23:39.995Z] 11:23:39     INFO - PID 13583 | [Child 14026: Main Thread]: D/WorkerShutdownDump CrashInfo: dispatch failed
[task 2023-05-30T11:23:49.682Z] 11:23:49     INFO - PID 13583 | [Parent 13597, IPC I/O Parent] WARNING: Process 14026 may be hanging at shutdown; will wait for up to 40000ms: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc:184
[task 2023-05-30T11:24:18.369Z] 11:24:18     INFO - TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/classic/get_element_attribute/get.py | expected OK
[task 2023-05-30T11:24:18.370Z] 11:24:18     INFO - TEST-INFO took 80003ms
[task 2023-05-30T11:24:18.443Z] 11:24:18     INFO - STDOUT: ======================== 32 passed in 79.85s (0:01:19) =========================
[task 2023-05-30T11:24:18.444Z] 11:24:18     INFO - Closing logging queue
[task 2023-05-30T11:24:18.444Z] 11:24:18     INFO - queue closed
[task 2023-05-30T11:24:18.452Z] 11:24:18     INFO - PID 14234 | 1685445858451	geckodriver	INFO	Listening on 127.0.0.1:33079
[task 2023-05-30T11:24:18.453Z] 11:24:18     INFO - Starting runner
[task 2023-05-30T11:24:18.654Z] 11:24:18     INFO - TEST-START | /webdriver/tests/classic/get_element_attribute/user_prompts.py

Hey Julian, this might be caused by Bug 1816538 as all the Wd fails seems to be caused by timeouts (hanging in shutdown), could you take a look at it please. Thank you!

After investigating, this is probably unrelated to Bug 1816538. I couldn't really see a connection between the two, so I did a beta simulation with and without the patch from Bug 1816538 and both show the same timeout permas

Status: RESOLVED → REOPENED
No longer regressed by: 1816538
Resolution: FIXED → ---

Did some additional beta simulations, it seems failures start with the following push: https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=5eb0c339e4a6c919642f1e70d15347b766ceec00

The pushlog contains the changes for Bug 1831963, which flips a preference for early beta. And the failures are only on early beta simulations.
So I did a beta simulation from central: https://treeherder.mozilla.org/jobs?repo=try&revision=c2ac0bcfc78bd170b368926b46e741ddee00ffaf -> all orange.

Then I backed out the preference flip from Bug 1831963: https://treeherder.mozilla.org/jobs?repo=try&revision=0fa1bb8848bf26e9d5b39b9ce68aa96c5cf8a6b1 -> mostly green

So I strongly suspect this was regressed by the pref flip from Bug 1831963.

Looking at the logs it seems that it just slows down each and every test, it doesn't specifically hang in a single spot. This makes long tests go well over the test timeout.

Regressed by: 1831963

:gregtatum, since you are the author of the regressor, bug 1831963, could you take a look? Also, could you set the severity field?

For more information, please visit BugBot documentation.

Flags: needinfo?(gtatum)

Note that for the Wdspec test failures, one option would be to turn off the feature in the Recommended Preferences used for automation. I can provide a patch for that if that can help.

That seems like a reasonable course of action to me. I'm investigating some perf regressions with the translations feature being enabled, but I don't see how the translations feature needs to be enabled for web driver tests.

Flags: needinfo?(gtatum)

(In reply to Greg Tatum [:gregtatum] from comment #8)

That seems like a reasonable course of action to me. I'm investigating some perf regressions with the translations feature being enabled,

Thanks for the feedback. In that case I will disable the pref for automation in a separate bug.

but I don't see how the translations feature needs to be enabled for web driver tests.

It's always better if webdriver users run their tests against a version of Firefox which close to what real Firefox users have. If the configuration for webdriver is too different from the default one, they might be unable to detect issues and regressions that will affect their users. So ideally we would be as close as possible to a real Firefox configuration. But it's quite difficult to achieve. Here it's also puzzling that this only affected beta and not nightly, so even after disabling it we should still keep a follow up bug to try and re-enable the preference if possible.

Assignee: nobody → jdescottes
Depends on: 1836093
Assignee: jdescottes → nobody

With the translation feature disabled for our tests we won't have this perma failure anymore. But it would make sense to keep a bug open to get this tracked. Especially when it's easy to reproduce. Greg, is there already an existing bug that could be re-used or should we file a new one?

Flags: needinfo?(gtatum)

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #10)

With the translation feature disabled for our tests we won't have this perma failure anymore. But it would make sense to keep a bug open to get this tracked. Especially when it's easy to reproduce. Greg, is there already an existing bug that could be re-used or should we file a new one?

I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1836204 for re-enabling the feature, and pinged Greg there already.

Flags: needinfo?(gtatum)

Ah ok. So then lets close this bug as fixed and make sure that it's tracked correctly on the other bug. Thanks.

Assignee: nobody → jdescottes
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Duplicate of this bug: 1836275
Duplicate of this bug: 1835483
You need to log in before you can comment on or make changes to this bug.