1855360 - Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug

Reporter

Description

•

1 year ago

treeherder

Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=430438837&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/dHZcYHtLSPamKHNd-qI5hA/runs/0/artifacts/public/logs/live_backing.log

[task 2023-09-27T04:30:15.010Z] 04:30:15     INFO - TEST-PASS | browser/components/shopping/tests/browser/browser_ui_telemetry.js | {"category":"shopping","extra":{"source":"addressBarIcon"},"name":"surface_closed"} deepEqual {"category":"shopping","name":"surface_closed","extra":{"source":"addressBarIcon"}} - 
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - Leaving test bound test_close_telemetry_recorded
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - Buffered messages finished
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - TEST-UNEXPECTED-FAIL | browser/components/shopping/tests/browser/browser_ui_telemetry.js | This test exceeded the timeout threshold. It should be rewritten or split up. If that's not possible, use requestLongerTimeout(N), but only as a last resort. - 
[task 2023-09-27T04:30:15.013Z] 04:30:15     INFO - GECKO(9670) | MEMORY STAT | vsize 3148MB | residentFast 367MB | heapAllocated 219MB
[task 2023-09-27T04:30:15.014Z] 04:30:15     INFO - TEST-OK | browser/components/shopping/tests/browser/browser_ui_telemetry.js | took 47695ms
[task 2023-09-27T04:30:15.015Z] 04:30:15     INFO - checking window state
[task 2023-09-27T04:30:15.015Z] 04:30:15     INFO - TEST-START | browser/components/shopping/tests/browser/browser_unanalyzed_product.js

Comment hidden (Intermittent Failures Robot)

Natalia Csoregi [:nataliaCs]

Comment 2

•

1 year ago

Update

There have been 37 failures within the last 7 days:

10 failures on Linux 18.04 x64 WebRender debug/opt
6 failures on Linux 18.04 x64 WebRender Shippable opt
3 failures on Linux 18.04 x64 WebRender tsan opt
6 failures on OS X 10.15 WebRender debug/opt
6 failures on Windows 11 x86 22H2 WebRender debug/opt
6 failures on Windows 11 x64 22H2 WebRender debug/opt

Recent log: https://treeherder.mozilla.org/logviewer?job_id=431408743&repo=autoland&lineNumber=14279

Jared, can you assign this to someone?
Thank you.

Flags: needinfo?(jhirsch)

Whiteboard: [stockwell needswork:owner]

Comment hidden (Intermittent Failures Robot)

Iulian Moraru

Updated

•

8 months ago

Flags: needinfo?(jhirsch)

Iulian Moraru

Comment 21

•

8 months ago

Hi Jared! Can you please take a look at this? I think the recent spike in failures here is caused by the changes from Bug 1868602. A new failure line appeared after that bug landed: TEST-UNEXPECTED-FAIL | browser/components/shopping/tests/browser/browser_ui_telemetry.js | Uncaught exception in test bound test_shopping_sidebar_displayed - at chrome://mochitests/content/browser/browser/components/shopping/tests/browser/browser_ui_telemetry.js:197 - TypeError: can't access property "length", tabSwitchEvents is null
Thank you!
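
The TypeError quoted above is what happens when code reads `.length` on a null value. A minimal sketch of a defensive check, assuming (the bug does not confirm this) that `tabSwitchEvents` comes from a getter that returns null when no events have been recorded yet. `eventCount` is a hypothetical helper, not code from the test file:

```javascript
// Hypothetical helper: treat a null/undefined event list as "no events"
// so an assertion can fail with a clear message instead of a TypeError.
function eventCount(events) {
  return events?.length ?? 0;
}

// Illustrative values only; real values would come from the telemetry API.
const tabSwitchEvents = null;
const recordedEvents = [{ category: "shopping", name: "surface_closed" }];
console.log(eventCount(tabSwitchEvents)); // 0
console.log(eventCount(recordedEvents)); // 1
```

A guard like this only changes how the failure is reported; it does not explain why no events were recorded in the first place.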

Flags: needinfo?(jhirsch)

Comment hidden (Intermittent Failures Robot)

Cosmin Sabou [:CosminS]

Comment 25

•

8 months ago

•

Edited

This is failing frequently on Linux runs. At this failure rate, it should be fixed or it will be disabled on Linux. https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2024-01-01&endday=2024-01-16&tree=trunk&failurehash=all&bug=1855360
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=443543302&repo=autoland

Summary: Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → High frequency browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug

Whiteboard: [stockwell unknown]

Comment hidden (Intermittent Failures Robot)

Cosmin Sabou [:CosminS]

Comment 28

•

8 months ago

Attached file: Bug 1855360 - Disable browser_ui_telemetry.js on linux 18.04 for frequent failures. r=#intermittent-reviewers

Phabricator Automation

Updated

•

8 months ago

Assignee: nobody → csabou

Status: NEW → ASSIGNED

Cosmin Sabou [:CosminS]

Updated

•

8 months ago

Assignee: csabou → nobody

Status: ASSIGNED → NEW

Keywords: leave-open

Phabricator Automation

Updated

•

8 months ago

Attachment #9375434 - Attachment description: Bug 1855360 - Disable browser_ui_telemetry.js on linux for frequent failures. r=#intermittent-reviewers → Bug 1855360 - Disable browser_ui_telemetry.js on linux 18.04 for frequent failures. r=#intermittent-reviewers

Pulsebot

Comment 29

•

8 months ago

Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/91e4c1fcde28 Disable browser_ui_telemetry.js on linux 18.04 for frequent failures. r=intermittent-reviewers,jmaher DONTBUILD

Cosmin Sabou [:CosminS]

Updated

•

8 months ago

Summary: High frequency browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug

Whiteboard: [stockwell disable-recommended] → [stockwell disabled]

Pulsebot

Comment 30

•

8 months ago

Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/935ef2364448 Fix the skip-if syntax. a=bustage-fix

Serban Stanca [:SerbanS]

Comment 31

•

8 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/91e4c1fcde28
https://hg.mozilla.org/mozilla-central/rev/935ef2364448

Comment hidden (Intermittent Failures Robot)

Shane Hughes [:aminomancer]

Assignee

Comment 43

•

3 months ago

If you run the whole shopping browser test manifest locally and watch it as it runs, you may see a point where browser_ui_telemetry.js hangs for 30+ seconds, and you may get the "exceeds the timeout threshold" failure at the end. I'm not sure if it's consistently reproducible on every platform, but it is 100% consistent for me on Windows 10.

I encountered this before with a different test, and my fix was simply to add Services.fog.testResetFOG() after every subtest. I don't know exactly why that worked. My intuition at the time was that some kind of data was accumulating, and once it crossed a certain threshold, finally calling Services.fog.testResetFOG() would cause a hang as it stumbled over something. Whereas if you call that method often enough, it never crosses that threshold, so it never hangs. It's not simply that you're replacing one long hang with 15 small hangs - the total duration of the test was greatly reduced by placing these calls all over the place. And I also found that removing tests didn't alter the duration, until I removed a certain number, at which point the hang completely disappeared. So there seems to be a certain threshold where you go from no hang to a really long one.

This particular test already has very frequent Services.fog.testResetFOG() calls. So it doesn't actually fail individually. It has to be run in combination with other tests for the hang and the timeout threshold failure to happen. I think some other tests in this manifest are accumulating this unknown FOG data, and then browser_ui_telemetry.js is the test that's hanging because it's the first one in a long time to call testResetFOG(). That was exactly what happened with my previous test, except that it happened in the context of one test file. All the earlier subtests didn't hang, because they didn't call testResetFOG. It was only the first testResetFOG call that ran into problems.

So I think this failure can be avoided by adding testResetFOG to one or some of the tests in this manifest that come between browser_shopping_onboarding.js and browser_ui_telemetry.js (since those 2 already call it). The placement would be kind of arbitrary, since it would not be necessary for any actual assertions. But if it's causing 30-second hangs for me, I imagine that adds up considerably on CI, so I suspect this is worthwhile, even if it's a bit hacky.
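
A minimal sketch of the workaround described above, assuming a hypothetical helper named `withFogReset` (it is not part of the Firefox tree). It wraps a subtest so that `testResetFOG()` runs after it, even if the subtest throws. The `fog` parameter stands in for `Services.fog`, which only exists inside Firefox, so a stub is used here to keep the block runnable on its own:

```javascript
// Hypothetical helper: wraps a subtest so FOG is reset after it finishes,
// even if the subtest throws. `fog` stands in for Services.fog.
function withFogReset(fog, subtest) {
  return async function wrapped(...args) {
    try {
      return await subtest(...args);
    } finally {
      fog.testResetFOG();
    }
  };
}

// Stub used only to make this sketch runnable outside Firefox.
const stubFog = {
  resets: 0,
  testResetFOG() {
    this.resets += 1;
  },
};

// Runs one wrapped subtest and reports how many resets happened.
async function runDemo() {
  const subtest = withFogReset(stubFog, async () => "done");
  const result = await subtest();
  return { result, resets: stubFog.resets };
}
```

In a browser mochitest the same idea would look roughly like `add_task(withFogReset(Services.fog, async function test_something() { ... }))`, or simply a direct `Services.fog.testResetFOG()` call at the end of each subtest.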

ni?chutten because I think we discussed this briefly several months ago, and I seem to recall you having more information about the underlying cause of the issue. If that can be fixed in a more direct manner, it'd probably be preferable to adding testResetFOG for reasons other than its intended purpose. Thanks!

Flags: needinfo?(chutten)

Chris H-C :chutten

Comment 44

•

3 months ago

Yeah, that was bug 1833453 which I couldn't prioritize at the time. It was down to there being so many pending pings on disk, and disk not being as rapid as we might like. Might be the same here, not sure.

Flags: needinfo?(chutten)

Comment 46

•

3 months ago

@Shane, it's perma on win/mac, after bug 1900486 landed.

Flags: needinfo?(shughes)

Summary: Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → Perma browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug

Shane Hughes [:aminomancer]

Assignee

Comment 47

•

3 months ago

Thanks, that makes sense. I was running the tests after bug 1900486. So it's my regression, presumably due to the extra test I added. Seems like the fix will still be the same.

Comment hidden (Intermittent Failures Robot)

Shane Hughes [:aminomancer]

Assignee

Comment 51

•

3 months ago

Attached file: Bug 1855360 - Fix the Shopping test manifest FOG timeout failure. r=#omc-reviewers

Comment hidden (Intermittent Failures Robot)

Pulsebot

Comment 58

•

3 months ago

Pushed by smolnar@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5cdafdad54eb Fix the Shopping test manifest FOG timeout failure. r=omc-reviewers,negin

Comment hidden (Intermittent Failures Robot)

tszentpeteri

Comment 60

•

2 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5cdafdad54eb

Comment hidden (Intermittent Failures Robot)

Pulsebot

Comment 69

•

2 months ago

Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c5fffd211e13 Enable browser_ui_telemetry.js on linux as the test is fixed. a=test-only

Cosmin Sabou [:CosminS]

Updated

•

2 months ago

status-firefox129: --- → affected

status-firefox130: --- → fixed

Flags: needinfo?(shughes)

Flags: needinfo?(jhirsch)

Keywords: leave-open

Summary: Perma browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug → Intermittent browser/components/shopping/tests/browser/browser_ui_telemetry.js | single tracking bug

Whiteboard: [stockwell disabled]

Cosmin Sabou [:CosminS]

Comment 70

•

2 months ago

•

Edited

For posterity, try push.

Comment hidden (Intermittent Failures Robot)

pstanciu

Comment 72

•

2 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c5fffd211e13

Status: NEW → RESOLVED

Closed: 2 months ago

Resolution: --- → FIXED

Target Milestone: --- → 130 Branch

BugBot [:suhaib / :marco/ :calixte]

Updated

•

2 months ago

Assignee: nobody → csabou

BugBot [:suhaib / :marco/ :calixte]

Comment 73

•

2 months ago

The patch landed in nightly and beta is affected.
:CosminS, is this bug important enough to require an uplift?

If yes, please nominate the patch for beta approval.
If no, please set status-firefox129 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(csabou)

Comment hidden (Intermittent Failures Robot)

Cosmin Sabou [:CosminS]

Comment 76

•

2 months ago

Shane, if you think the fix here needs to get into beta, where it is permafailing, please coordinate with dmeehan.

Assignee: csabou → shughes

Flags: needinfo?(csabou) → needinfo?(shughes)

Whiteboard: [stockwell disable-recommended]

Comment hidden (Intermittent Failures Robot)

Shane Hughes [:aminomancer]

Assignee

Comment 78

•

2 months ago

Is permafailing bad enough to justify an uplift? I would normally not uplift non-user-facing issues, but I'd defer to your judgment on CI matters.

Flags: needinfo?(shughes)

Comment hidden (Intermittent Failures Robot)

Donal Meehan [:dmeehan]

Comment 80

•

2 months ago

(In reply to Shane Hughes [:aminomancer] from comment #78)

Is permafailing bad enough to justify an uplift? I would normally not uplift non-user-facing issues, but I'd defer to your judgment on CI matters.

In general, we aim to minimize test failures in all branches when it makes sense.

This is a test-only change, so I can push it without an uplift request. I'll take it in my next push to beta and esr128.

Pulsebot

Comment 81

•

2 months ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/00d97fc78aa3

Donal Meehan [:dmeehan]

Updated

•

2 months ago

status-firefox129: affected → fixed

Cosmin Sabou [:CosminS]

Comment 82

•

2 months ago

Donal, the fix is in https://bugzilla.mozilla.org/show_bug.cgi?id=1855360#c60. I'll let you decide if that needs an uplift request. Comment 69 just enables the test after the fix. Thank you.

Flags: needinfo?(dmeehan)

Pulsebot

Comment 83

•

2 months ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/09503592810b

Donal Meehan [:dmeehan]

Updated

•

2 months ago

Flags: needinfo?(dmeehan)

Comment hidden (Intermittent Failures Robot)

Bug 1855360 - Disable browser_ui_telemetry.js on linux 18.04 for frequent failures. r=#intermittent-reviewers (8 months ago, Cosmin Sabou [:CosminS], 48 bytes, text/x-phabricator-request)
Bug 1855360 - Fix the Shopping test manifest FOG timeout failure. r=#omc-reviewers (3 months ago, Shane Hughes [:aminomancer], 48 bytes, text/x-phabricator-request)