Closed Bug 1898598 Opened 6 months ago Closed 5 months ago

Intermittent browser/components/translations/tests/browser/browser_translations_select_telemetry_translation_failure_ui.js | single tracking bug

Categories

(Firefox :: Translations, defect, P5)

Tracking

RESOLVED FIXED
Tracking Status
firefox-esr115 --- unaffected
firefox126 --- unaffected
firefox127 --- unaffected
firefox128 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: nordzilla)

References

(Regression)

Details

(4 keywords, Whiteboard: [stockwell disable-recommended])

Attachments

(1 file)

Filed by: smolnar [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=459398288&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/F9Y1JXkcTjql7SdQ4C_6zw/runs/0/artifacts/public/logs/live_backing.log


TEST-PASS | browser/components/translations/tests/browser/browser_translations_select_telemetry_translation_failure_ui.js | Expected select-translations-panel-to selection to match 'en' - 
[task 2024-05-23T18:41:27.984Z] 18:41:27     INFO - Buffered messages finished
[task 2024-05-23T18:41:27.985Z] 18:41:27     INFO - TEST-UNEXPECTED-FAIL | browser/components/translations/tests/browser/browser_translations_select_telemetry_translation_failure_ui.js | Uncaught exception in test bound test_select_translations_panel_telemetry_translation_failure_ui - undefined - timed out after 200 tries.
[task 2024-05-23T18:41:27.985Z] 18:41:27     INFO - Leaving test bound test_select_translations_panel_telemetry_translation_failure_ui
[task 2024-05-23T18:41:27.986Z] 18:41:27     INFO - GECKO(3433) | MEMORY STAT | vsize 3700MB | residentFast 515MB | heapAllocated 304MB
[task 2024-05-23T18:41:27.986Z] 18:41:27     INFO - TEST-OK | browser/components/translations/tests/browser/browser_translations_select_telemetry_translation_failure_ui.js | took 24764ms
[task 2024-05-23T18:41:27.987Z] 18:41:27     INFO - Not taking screenshot here: see the one that was previously logged
[task 2024-05-23T18:41:27.988Z] 18:41:27     INFO - TEST-UNEXPECTED-FAIL | browser/components/translations/tests/browser/browser_translations_select_telemetry_translation_failure_ui.js | Found an unexpected tab at the end of test run: https://example.com/browser/toolkit/components/translations/tests/browser/translations-tester-select.html - 
[task 2024-05-23T18:41:27.988Z] 18:41:27     INFO - GECKO(3433) | [Child 9665, Main Thread] WARNING: NS_ENSURE_TRUE(mDoneSetup) failed: file /builds/worker/checkouts/gecko/editor/composer/nsEditingSession.cpp:1163
[task 2024-05-23T18:41:27.989Z] 18:41:27     INFO - GECKO(3433) | [Child 3693: Main Thread]: I/DocShellAndDOMWindowLeak ++DOCSHELL 7f4610d5c400 == 1 [pid = 3693] [id = 83]
[task 2024-05-23T18:41:27.989Z] 18:41:27     INFO - GECKO(3433) | [Child 3693: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 165 (7f460fb2b200) [pid = 3693] [serial = 250] [outer = 0]
[task 2024-05-23T18:41:27.990Z] 18:41:27     INFO - GECKO(3433) | [Child 3693: Main Thread]: I/DocShellAndDOMWindowLeak ++DOMWINDOW == 166 (7f4610d5c800) [pid = 3693] [serial = 251] [outer = 7f460fb2b200]
[task 2024-05-23T18:41:28.020Z] 18:41:28     INFO - checking window state
[task 2024-05-23T18:41:28.120Z] 18:41:28     INFO - TEST-START | browser/components/translations/tests/browser/browser_translations_select_telemetry_unsupported_language_ui.js

:nordzilla, since you are the author of the regressor, bug 1870368, could you take a look?

For more information, please visit BugBot documentation.

Flags: needinfo?(enordin)

Set release status flags based on info from the regressing bug 1870368

This intermittent seems to be fairly regular in CI, but it doesn't reproduce as readily locally.

Given how often it's happening, and that it's a new test, I'm going to try to prioritize resolving this one.

Assignee: nobody → enordin
Flags: needinfo?(enordin)
Keywords: pernosco-wanted

Splits up the test case in browser_translations_select_telemetry_translation_failure_ui.js
in hopes that it cuts down on the intermittent failure in CI.

Keywords: leave-open

I'm going to land the patch to split up this test, but I don't think it fixes it entirely, so I'm adding leave-open.

FOG's IPC and test methods are very synchronous and deterministic, so intermittents like this should be avoidable.

Current hypothesis: Maybe the "events" ping is being submitted during the test run, gobbling the recorded event out of storage before it is asserted. FOG Logging being on during a failed run will tell us if there's any merit to that, and also give us clues to follow if it isn't.

Pushed by enordin@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/729e400f7b0b Split up translation_failure_ui test r=translations-reviewers,gregtatum
See Also: → 1900085

Thank you, :nordzilla, for catching a failure in action with logging enabled.

This log shows evidence consistent with my hypothesis: the "events" ping is sent on line 15323 because a user-interaction-inactive notification is received after more than 2 minutes of "user activity" accumulated from previous tests. This happens after the open event is recorded but before its presence is asserted, which is some wickedly bad luck.

This also explains why it doesn't happen locally: running locally, we don't give it ten minutes per run. There's not enough time to grow idle, or if it does become idle, it's after too short an active period to trigger the "events" ping (which shares the "baseline" ping's schedule).
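
To make the ordering problem concrete, here is a toy model of the race described above. It is only a sketch: ToyEventStore, runToyTest, and pingFiresMidTest are hypothetical names invented for illustration, and nothing here reflects how FOG actually stores events. It only assumes what the comments above state, namely that submitting the "events" ping removes recorded events from storage.

```javascript
// Toy model only: this is NOT FOG's real implementation.
class ToyEventStore {
  constructor() {
    this.events = [];
    this.submittedPings = [];
  }

  // Record an event into in-memory storage.
  record(name) {
    this.events.push(name);
  }

  // Submitting the "events" ping drains everything recorded so far.
  submitEventsPing() {
    this.submittedPings.push(this.events);
    this.events = [];
  }

  // Returns the stored events, or undefined when storage is empty.
  testGetValue() {
    return this.events.length ? [...this.events] : undefined;
  }
}

// Records an "open" event, optionally lets the ping fire, then asserts.
function runToyTest(store, { pingFiresMidTest }) {
  store.record("open");
  if (pingFiresMidTest) {
    // The unlucky case: the ping goes out between record and assert.
    store.submitEventsPing();
  }
  const value = store.testGetValue();
  return value !== undefined && value.includes("open");
}

const passes = runToyTest(new ToyEventStore(), { pingFiresMidTest: false });
const fails = runToyTest(new ToyEventStore(), { pingFiresMidTest: true });
console.log(passes, fails);
```

In the second run the event was recorded correctly and even submitted, but the assertion finds nothing in storage because the ping already took it.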

We can paper over this failure immediately with a Services.obs.notifyObservers("user-interaction-inactive"); at the beginning of this test (this either triggers the "events" ping early or avoids triggering it at all; both are acceptable), or by doing what we did in bug 1690728 and giving dom.events.user_interaction_interval a value in excess of the length of time the app is open (meaning EventStateManager won't ever claim the user's gone active or idle for the duration). Either will ensure the "events" ping is not sent during the test.
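
A minimal sketch of the first workaround, for anyone picking this up. The helper name forceIdleNotification and the fakeServices stand-in are hypothetical and exist only so the snippet runs outside Firefox; in a real browser test the Services global would be passed in. Note that the observer service's notifyObservers takes the subject as its first argument and the topic as its second, so the call is spelled with a null subject here.

```javascript
// Hypothetical helper: send the idle notification up front so the "events"
// ping is either submitted before the test records anything or not at all.
function forceIdleNotification(services) {
  // notifyObservers(subject, topic): no subject is needed, so pass null.
  services.obs.notifyObservers(null, "user-interaction-inactive");
}

// Stand-in for the Services global (assumption, for running outside Firefox).
const notified = [];
const fakeServices = {
  obs: {
    notifyObservers(subject, topic) {
      notified.push(topic);
    },
  },
};

forceIdleNotification(fakeServices);
console.log(notified);
```

In the test itself this would presumably be called as forceIdleNotification(Services) before the first recorded event, with a comment referencing bug 1900085 so the hack can be removed later.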

A more appropriate (less hacky) solution is being pursued in bug 1900085, so if we take one of the routes above in the meantime, please do leave a note there (and reference the bug number in a comment next to the hack) so we can undo it later.

Regressions: 1900091
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED

The two pushes in comment 19 must be based on an old branch, because I split up this test file into two separate files, and a test file with this exact name no longer exists in tree.
