Frequent fenix browsertime [taskcluster:error] Aborting task...
Categories
(Testing :: Raptor, defect, P2)
Tracking
(firefox-esr102 unaffected, firefox108 unaffected, firefox109 unaffected, firefox110 wontfix, firefox111 fixed)
| Tracking | Status | |
|---|---|---|
| firefox-esr102 | --- | unaffected |
| firefox108 | --- | unaffected |
| firefox109 | --- | unaffected |
| firefox110 | --- | wontfix |
| firefox111 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: zmckenney)
References
(Regression)
Details
(Keywords: intermittent-failure, regression, Whiteboard: [geckoview:m111][stockwell disable-recommended])
Attachments
(1 obsolete file)
Filed by: archaeopteryx [at] coole-files.de
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=401127404&repo=fenix
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/KZQ3t02DTxKlJaFfHkI0Yg/runs/0/artifacts/public/logs/live_backing.log
~40% of browsertime tasks for fenix hit the run time limit. This issue started on 2022-12-31.
[task 2023-01-02T06:46:45.575Z] 06:46:37 INFO - raptor-browsertime Info: Starting a browsertime pageload
[task 2023-01-02T06:46:45.576Z] 06:46:37 INFO - raptor-browsertime Info: Waiting for 1000 ms (post_startup_delay)
[task 2023-01-02T06:46:45.576Z] 06:46:38 INFO - raptor-browsertime Info: Navigating to about:blank, count: 0
[task 2023-01-02T06:46:45.576Z] 06:46:38 INFO - raptor-browsertime Info: Navigating to url about:blank iteration 14
[task 2023-01-02T06:46:45.576Z] 06:46:44 INFO - raptor-browsertime Info: Navigating to primary url:https://www.allrecipes.com/
[task 2023-01-02T06:46:45.576Z] 06:46:44 INFO - raptor-browsertime Info: Cycle 0, waiting for 1000 ms
[task 2023-01-02T06:46:45.576Z] 06:46:45 INFO - raptor-browsertime Info: Cycle 0, starting the measure
[task 2023-01-02T06:46:45.576Z] 06:46:45 INFO - raptor-browsertime Info: Testing url https://www.allrecipes.com/ iteration 14
[taskcluster:error] Aborting task...
There was an infrastructure change related to HTTP/3, potentially 1-2 days earlier. Please reach out to :jlorenzo if you need more information.
Comment 1•3 years ago
|
||
:aryx, could you trigger a couple backfill runs for a handful of tests on the following two commits? I'm unable to trigger them:
(1) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=4946c2795c357521e1c03c9368c4dc4526d2c9a6
(2) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=bdf6432abce70c5c9633390edb2fd7a74392cafd
Comment 2•3 years ago
|
||
I don't see backfills in the custom actions, and adding the task to these pushes fails. Maybe Johan can add/run the browsertime allrecipes task to the two pushes which don't have it yet?
Comment 3•3 years ago
|
||
Ok, :jlorenzo, would you be able to add new tasks for browsertime to the two pushes below? :aryx, and I are having issues triggering them.
(1) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=4946c2795c357521e1c03c9368c4dc4526d2c9a6
(2) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=bdf6432abce70c5c9633390edb2fd7a74392cafd
Comment 4•3 years ago
|
||
Updated•3 years ago
|
Comment 5•3 years ago
|
||
Hi there! Crap, I'm facing the same scope issue. It'll be fixed by the patch I just attached.
Updated•3 years ago
|
Comment 6•3 years ago
|
||
Comment on attachment 9310502 [details]
Bug 1808231 - Allow browsertime tasks to be scheduled by an Action task r=jcristau
Revision D165870 was moved to bug 1808334. Setting attachment 9310502 [details] to obsolete.
Comment 7•3 years ago
|
||
(In reply to Johan Lorenzo [:jlorenzo] from comment #5)
It'll be fixed by the patch I just attached.
Update: It'll be fixed in bug 1808334
Updated•3 years ago
|
Comment 8•3 years ago
|
||
Excellent, thanks :jlorenzo!
Comment 9•3 years ago
|
||
(In reply to Greg Mierzwinski [:sparky] from comment #3)
Ok, :jlorenzo, would you be able to add new tasks for browsertime to the two pushes below? :aryx, and I are having issues triggering them.
(1) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=4946c2795c357521e1c03c9368c4dc4526d2c9a6
(2) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=bdf6432abce70c5c9633390edb2fd7a74392cafd
New tasks added! You guys should be able to schedule them next time. Let me know if that's not the case 🙂
Comment 10•3 years ago
|
||
Will do, thanks for the quick fix Johan!
So far based on the backfill, it looks like the failure started on this top commit: https://treeherder.mozilla.org/jobs?repo=fenix&tochange=4946c2795c357521e1c03c9368c4dc4526d2c9a6&fromchange=d07de1e028350bfb2b28cd2ad2e92b735c2ac88c
Which is from bug 1807716.
Comment 11•3 years ago
|
||
:zmckenney, would you know what might be happening here? Here's a logcat log if that helps (it seems to fail on browser startup): https://firefoxci.taskcluster-artifacts.net/DcDeZLjkQSyHjMRBbGPQew/0/public/test_info/logcat-R58R36P8E0D.log
We also start the app a bit differently which involves glean. Here's a list of extra arguments we pass when we start the app:
--firefox.android.activity mozilla.telemetry.glean.debug.GleanDebugActivity
--firefox.android.intentArgument=-a
--firefox.android.intentArgument android.intent.action.VIEW
--firefox.android.intentArgument=--es
--firefox.android.intentArgument=startNext
--firefox.android.intentArgument=org.mozilla.fenix.IntentReceiverActivity
--firefox.android.intentArgument=--esa
--firefox.android.intentArgument=sourceTags
--firefox.android.intentArgument=automation
--firefox.android.intentArgument=--ez
--firefox.android.intentArgument=performancetest
--firefox.android.intentArgument=true
--firefox.android.intentArgument=-d
--firefox.android.intentArgument about:blank
This was taken from this log: https://firefoxci.taskcluster-artifacts.net/cyDNKHuTSkKhq_6JRhl-nw/0/public/logs/live_backing.log
You can find more tests here: https://treeherder.mozilla.org/jobs?repo=fenix&tochange=4946c2795c357521e1c03c9368c4dc4526d2c9a6&fromchange=d07de1e028350bfb2b28cd2ad2e92b735c2ac88c
Let me know if you need anything else from me.
| Assignee | ||
Comment 12•3 years ago
|
||
[:sparky] thanks for adding the startup of the app since this pointed me to where to look. This is failing because of an existing race condition here that checks whether Glean is initialized. Because Glean is initialized asynchronously this was never guaranteed to return true and because of the change in initialization order it now returns false and finishes.
The change to the order of application initialization is still proper since we want to warm up the engine before initializing Glean. I'm not sure how the test harness is setup but waiting for Glean to be initialized will solve this issue. A temporary fix for this might be adding a wait or repeating retry with timeout in GleanDebugActivity. A better fix would probably be in observing whether Glean has been initialized inside GleanDebugActivity before running the tests.
Hope this helps!
Comment 13•3 years ago
|
||
That's very helpful, thanks! Regarding how our test harness is setup, we don't do any explicit waiting for the GleanDebugActivity to be initialized.
I see two ideas for how we can resolve this: (1) increase the wait time after browser initialization (flaky), and (2) resolve the race condition in Fenix.
I'll start on option (1) with an additional 5 second wait and see how it goes.
:zmckenney, would you know if Glean is exposed in JS in a way that would allow us to wait for it to exist? This would be a better solution versus waiting a hard-coded amount of time.
Comment 14•3 years ago
|
||
Try run with an increased settle time for Fenix (5s): https://treeherder.mozilla.org/jobs?repo=try&revision=13bdbb1df23401a170304ce9560c6a57619033a2
| Assignee | ||
Comment 15•3 years ago
•
|
||
:sparky I don't know about the JS visibility of Glean but I think we're going to need to check the java API initialization. I've created bug 1808586 for waiting on the initialization of Glean before proceeding with the tests which I don't think will be too difficult.
Comment 16•3 years ago
|
||
Set release status flags based on info from the regressing bug 1807716
Comment 17•3 years ago
|
||
Ok that would be great thanks! I should note that the waiting I tried didn't work because we're failing before that code:
[task 2023-01-05T01:06:34.928Z] 00:57:33 INFO - raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2023-01-05T01:06:34.928Z] 01:00:33 INFO - raptor-browsertime Info: firefox failed to start, trying 2 more time(s): Failed to start firefox in 180 seconds.
[task 2023-01-05T01:06:34.928Z] 01:00:33 INFO - raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2023-01-05T01:06:34.928Z] 01:03:33 INFO - raptor-browsertime Info: firefox failed to start, trying 1 more time(s): Failed to start firefox in 180 seconds.
[task 2023-01-05T01:06:34.928Z] 01:03:33 INFO - raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2023-01-05T01:06:34.928Z] 01:06:33 INFO - raptor-browsertime Info: firefox failed to start, trying 0 more time(s): Failed to start firefox in 180 seconds.
[task 2023-01-05T01:06:34.928Z] 01:06:33 ERROR - raptor-browsertime Error: Browsertime failed to run
[task 2023-01-05T01:06:34.928Z] 01:06:33 INFO - raptor-browsertime Info: at SeleniumRunner.start (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/lib/core/seleniumRunner.js:91:13)
[task 2023-01-05T01:06:34.928Z] 01:06:33 INFO - raptor-browsertime Info: at async Iteration.run (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/lib/core/engine/iteration.js:118:7)
[task 2023-01-05T01:06:34.929Z] 01:06:33 INFO - raptor-browsertime Info: at async Engine.runByScript (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/lib/core/engine/index.js:274:20)
[task 2023-01-05T01:06:34.929Z] 01:06:33 INFO - raptor-browsertime Info: at async run (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/bin/browsertime.js:104:22)
[task 2023-01-05T01:06:34.929Z] 01:06:33 ERROR - raptor-browsertime Error: Browsertime failed to run
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 20•3 years ago
•
|
||
:zmckenny, do you know when bug 1808586 might get resolved? I'm wondering if I should disable all the Fenix performance tests as they are failing far too often but I'm not sure how long the fix may take.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Assignee | ||
Comment 23•3 years ago
|
||
:sparky I don't know but I did talk with the Glean team today and they are aware of it. Sounds like they'll probably triage or look at it this week.
| Comment hidden (Intermittent Failures Robot) |
Comment 25•3 years ago
|
||
Tracking for Fenix 111 because Dave Hunt says Glean bug 1808586 is causing a high number of failures (~50%) in Fenix performance tests in CI.
| Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
| Comment hidden (Intermittent Failures Robot) |
Comment 28•3 years ago
|
||
Assigning to Zac because he will verify that Jan-Erik's fix for Glean bug 1808586 fixes these test failures.
| Assignee | ||
Comment 29•3 years ago
•
|
||
I will still test this to verify the fix but adding STR below.
Note: Because we only care about the race condition for fixing this failure this STR for local builds is likely not suitable for other bugs due to it allowing a device that is not used for normal perf tests (on an emulator no less).
STR on local emulator
- Create an emulated device such as a Pixel 6 (I used API 30 without Google Play)
- Set ADB to root by running
adb root - Run device
- add
raptorEnabledtolocal.propertiesin Fenix project - Install nightly variant of Fenix to device
- Navigate to Gecko project root
- edit <root of Gecko project>/testing/raptor/raptor/power.py
- In the function
get_device_typeadd aneliffor the device you are using with apass - My example was
elif device_type == "sdk_gphone_arm64": pass
- In the function
- Use the failing Amazon test to determine if race condition is resolved by running
./mach raptor --test amazon --app fenix --binary org.mozilla.fenix
I've confirmed that this test fails if Glean is initialized after engine warmup in FenixApplication and succeeds if Glean is initialized first.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
| Comment hidden (Intermittent Failures Robot) |
Comment 35•3 years ago
|
||
It looks like the issue is resolved with yesterday's push! https://treeherder.mozilla.org/jobs?repo=fenix&revision=d8cd52189cdb8a784e2e123ce605266ddcede7d7
Thank you for the help everyone!
Updated•3 years ago
|
Comment 36•3 years ago
|
||
The patch landed in nightly and beta is affected.
:zmckenney, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox110towontfix.
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Description
•