Closed Bug 1855360 Opened 1 year ago Closed 2 months ago

Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug

Categories

(Firefox :: Shopping, defect, P5)

RESOLVED FIXED
130 Branch
Tracking Status
firefox129 --- fixed
firefox130 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: aminomancer)

Details

(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [stockwell disable-recommended])

Attachments

(2 files)

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=430438837&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/dHZcYHtLSPamKHNd-qI5hA/runs/0/artifacts/public/logs/live_backing.log


[task 2023-09-27T04:30:15.010Z] 04:30:15     INFO - TEST-PASS | browser/components/shopping/tests/browser/browser_ui_telemetry.js | {"category":"shopping","extra":{"source":"addressBarIcon"},"name":"surface_closed"} deepEqual {"category":"shopping","name":"surface_closed","extra":{"source":"addressBarIcon"}} - 
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - Leaving test bound test_close_telemetry_recorded
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - Buffered messages finished
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - TEST-UNEXPECTED-FAIL | browser/components/shopping/tests/browser/browser_ui_telemetry.js | This test exceeded the timeout threshold. It should be rewritten or split up. If that's not possible, use requestLongerTimeout(N), but only as a last resort. - 
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - GECKO(9670) | MEMORY STAT | vsize 3148MB | residentFast 367MB | heapAllocated 219MB
[task 2023-09-27T04:30:15.014Z] 04:30:15     INFO - TEST-OK | browser/components/shopping/tests/browser/browser_ui_telemetry.js | took 47695ms
[task 2023-09-27T04:30:15.015Z] 04:30:15     INFO - checking window state
[task 2023-09-27T04:30:15.015Z] 04:30:15     INFO - TEST-START | browser/components/shopping/tests/browser/browser_unanalyzed_product.js

Update

There have been 37 failures within the last 7 days:

  • 10 failures on Linux 18.04 x64 WebRender debug/opt
  • 6 failures on Linux 18.04 x64 WebRender Shippable opt
  • 3 failures on Linux 18.04 x64 WebRender tsan opt
  • 6 failures on OS X 10.15 WebRender debug/opt
  • 6 failures on Windows 11 x86 22H2 WebRender debug/opt
  • 6 failures on Windows 11 x64 22H2 WebRender debug/opt

Recent log: https://treeherder.mozilla.org/logviewer?job_id=431408743&repo=autoland&lineNumber=14279

Jared, can you assign this to someone?
Thank you.

Flags: needinfo?(jhirsch)
Whiteboard: [stockwell needswork:owner]
Flags: needinfo?(jhirsch)

Hi Jared! Can you please take a look at this? I think the recent spike in failures here is caused by the changes from Bug 1868602. The following new failure line appeared after that bug landed:

TEST-UNEXPECTED-FAIL | browser/components/shopping/tests/browser/browser_ui_telemetry.js | Uncaught exception in test bound test_shopping_sidebar_displayed - at chrome://mochitests/content/browser/browser/components/shopping/tests/browser/browser_ui_telemetry.js:197 - TypeError: can't access property "length", tabSwitchEvents is null
Thank you!
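
The TypeError quoted above arises when `.length` is read from a null event list. As a minimal sketch, assuming the list comes from a telemetry getter that yields null when nothing has been recorded, the read could be made null-safe so that the test fails on an assertion rather than on an uncaught exception. The helper name `eventCount` is hypothetical and does not come from the test file.

```javascript
// Hypothetical helper, not taken from browser_ui_telemetry.js.
// Returns the number of recorded events, treating null/undefined
// (nothing recorded yet) as zero instead of throwing.
function eventCount(events) {
  return events?.length ?? 0;
}

// With a null list this no longer throws. In a mochitest, a later check such
// as `Assert.equal(eventCount(tabSwitchEvents), 1)` would then fail cleanly.
console.log(eventCount(null)); // 0
console.log(eventCount([{ category: "shopping", name: "surface_closed" }])); // 1
```

This only changes how the failure surfaces. It does not explain why no events were recorded in the first place.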

Flags: needinfo?(jhirsch)
Summary: Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → High frequency browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug
Whiteboard: [stockwell unknown]
Assignee: nobody → csabou
Status: NEW → ASSIGNED
Assignee: csabou → nobody
Status: ASSIGNED → NEW
Keywords: leave-open
Attachment #9375434 - Attachment description: Bug 1855360 - Disable browser_ui_telemetry.js on linux for frequent failures. r=#intermittent-reviewers → Bug 1855360 - Disable browser_ui_telemetry.js on linux 18.04 for frequent failures. r=#intermittent-reviewers
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/91e4c1fcde28 Disable browser_ui_telemetry.js on linux 18.04 for frequent failures. r=intermittent-reviewers,jmaher DONTBUILD
Summary: High frequency browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]

If you run the whole shopping browser test manifest locally and watch it as it runs, you may see a point where browser_ui_telemetry.js hangs for 30+ seconds, and you may get the "exceeds the timeout threshold" failure at the end. I'm not sure if it's consistently reproducible on every platform, but it is 100% consistent for me on Windows 10.

I encountered this before with a different test, and my fix was simply to add Services.fog.testResetFOG() after every subtest. I don't know exactly why that worked. My intuition at the time was that some kind of data was accumulating, and once it crossed a certain threshold, finally calling Services.fog.testResetFOG() would cause a hang as it stumbled over something. Whereas if you call that method often enough, it never crosses that threshold, so it never hangs. It's not simply that you're replacing one long hang with 15 small hangs - the total duration of the test was greatly reduced by placing these calls all over the place. And I also found that removing tests didn't alter the duration, until I removed a certain number, at which point the hang completely disappeared. So there seems to be a certain threshold where you go from no hang to a really long one.

This particular test already has very frequent Services.fog.testResetFOG() calls. So it doesn't actually fail individually. It has to be run in combination with other tests for the hang and the timeout threshold failure to happen. I think some other tests in this manifest are accumulating this unknown FOG data, and then browser_ui_telemetry.js is the test that's hanging because it's the first one in a long time to call testResetFOG(). That was exactly what happened with my previous test, except that it happened in the context of one test file. All the earlier subtests didn't hang, because they didn't call testResetFOG. It was only the first testResetFOG call that ran into problems.

So I think this failure can be avoided by adding testResetFOG to one or some of the tests in this manifest that come between browser_shopping_onboarding.js and browser_ui_telemetry.js (since those 2 already call it). The placement would be kind of arbitrary, since it would not be necessary for any actual assertions. But if it's causing 30-second hangs for me, I imagine that adds up considerably on CI, so I suspect this is worthwhile, even if it's a bit hacky.

ni?chutten because I think we discussed this briefly several months ago, and I seem to recall you having more information about the underlying cause of the issue. If that can be fixed in a more direct manner, it'd probably be preferable to adding testResetFOG for reasons other than its intended purpose. Thanks!
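
The workaround described above, calling `Services.fog.testResetFOG()` often enough that pending data never builds up, could be sketched roughly as follows. The wrapper name `withFogReset` is hypothetical, and the `fog` object is passed in as a parameter only so the sketch can run outside Firefox; in a mochitest it would presumably be `Services.fog`.

```javascript
// Hypothetical helper: wraps a subtest so that FOG test state is reset once
// the subtest finishes, whether it passed or threw. `fog` is injected so this
// runs outside Firefox; in a mochitest it would be Services.fog.
function withFogReset(subtest, fog) {
  return async function wrapped(...args) {
    try {
      return await subtest(...args);
    } finally {
      // Reset after every subtest so no single reset has much to clear.
      fog.testResetFOG();
    }
  };
}

// Assumed usage inside a browser mochitest (not runnable outside the harness):
//   add_task(withFogReset(async function test_something() { ... }, Services.fog));
```

For test files that need no per-subtest reset, a single `registerCleanupFunction(() => Services.fog.testResetFOG())` at the top of the file might be enough. Which tests in the manifest need it is left open here, as it is in the comment above.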

Flags: needinfo?(chutten)

Yeah, that was bug 1833453 which I couldn't prioritize at the time. It was down to there being so many pending pings on disk, and disk not being as rapid as we might like. Might be the same here, not sure.

Flags: needinfo?(chutten)
See Also: → 1833453

@Shane, it's perma on win/mac, after bug 1900486 landed.

Flags: needinfo?(shughes)
Summary: Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → Perma browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug

Thanks, that makes sense. I was running the tests after bug 1900486. So it's my regression, presumably due to the extra test I added. Seems like the fix will still be the same.

Pushed by smolnar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5cdafdad54eb Fix the Shopping test manifest FOG timeout failure. r=omc-reviewers,negin
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c5fffd211e13 Enable browser_ui_telemetry.js on linux as the test is fixed. a=test-only
Flags: needinfo?(shughes)
Flags: needinfo?(jhirsch)
Keywords: leave-open
Summary: Perma browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug
Whiteboard: [stockwell disabled]

For posterity, try push.

Status: NEW → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Target Milestone: --- → 130 Branch
Assignee: nobody → csabou

The patch landed in nightly and beta is affected.
:CosminS, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox129 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(csabou)

Shane, if you think the fix here needs to get into beta, where it is permafailing, please coordinate with dmeehan.

Assignee: csabou → shughes
Flags: needinfo?(csabou) → needinfo?(shughes)
Whiteboard: [stockwell disable-recommended]

Is permafailing bad enough to justify an uplift? I would normally not uplift non-user-facing issues, but I'd defer to your judgment on CI matters.

Flags: needinfo?(shughes)

(In reply to Shane Hughes [:aminomancer] from comment #78)

Is permafailing bad enough to justify an uplift? I would normally not uplift non-user-facing issues, but I'd defer to your judgment on CI matters.

In general, we aim to minimize test failures in all branches when it makes sense.

This is a test-only change, so I can push it without an uplift request. I'll take it in my next push to beta and esr128.

Donal, the fix is in https://bugzilla.mozilla.org/show_bug.cgi?id=1855360#c60. I'll let you decide if that needs an uplift request. Comment 69 just enables the test after the fix. Thank you.

Flags: needinfo?(dmeehan)
Flags: needinfo?(dmeehan)