Closed Bug 1808231 Opened 3 years ago Closed 3 years ago

Frequent fenix browsertime [taskcluster:error] Aborting task...

Categories

(Testing :: Raptor, defect, P2)

defect

Tracking

(firefox-esr102 unaffected, firefox108 unaffected, firefox109 unaffected, firefox110 wontfix, firefox111 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr102 --- unaffected
firefox108 --- unaffected
firefox109 --- unaffected
firefox110 --- wontfix
firefox111 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: zmckenney)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [geckoview:m111][stockwell disable-recommended])

Attachments

(1 obsolete file)

Filed by: archaeopteryx [at] coole-files.de
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=401127404&repo=fenix
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/KZQ3t02DTxKlJaFfHkI0Yg/runs/0/artifacts/public/logs/live_backing.log


~40% of browsertime tasks for fenix hit the run time limit. This issue started on 2022-12-31.

[task 2023-01-02T06:46:45.575Z] 06:46:37     INFO -  raptor-browsertime Info: Starting a browsertime pageload
[task 2023-01-02T06:46:45.576Z] 06:46:37     INFO -  raptor-browsertime Info: Waiting for 1000 ms (post_startup_delay)
[task 2023-01-02T06:46:45.576Z] 06:46:38     INFO -  raptor-browsertime Info: Navigating to about:blank, count: 0
[task 2023-01-02T06:46:45.576Z] 06:46:38     INFO -  raptor-browsertime Info: Navigating to url about:blank iteration 14
[task 2023-01-02T06:46:45.576Z] 06:46:44     INFO -  raptor-browsertime Info: Navigating to primary url:https://www.allrecipes.com/
[task 2023-01-02T06:46:45.576Z] 06:46:44     INFO -  raptor-browsertime Info: Cycle 0, waiting for 1000 ms
[task 2023-01-02T06:46:45.576Z] 06:46:45     INFO -  raptor-browsertime Info: Cycle 0, starting the measure
[task 2023-01-02T06:46:45.576Z] 06:46:45     INFO -  raptor-browsertime Info: Testing url https://www.allrecipes.com/ iteration 14
[taskcluster:error] Aborting task...

There was an infrastructure change related to HTTP/3, potentially 1-2 days earlier. Please reach out to :jlorenzo if you need more information.

I don't see backfills in the custom actions, and adding the task to these pushes fails. Maybe Johan can add/run the browsertime allrecipes task to the two pushes which don't have it yet?

Flags: needinfo?(aryx.bugmail)
Assignee: nobody → jlorenzo
Status: NEW → ASSIGNED

Hi there! Crap, I'm facing the same scope issue. It'll be fixed by the patch I just attached.

Assignee: jlorenzo → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(jlorenzo)
Assignee: nobody → jlorenzo
Status: NEW → ASSIGNED
Depends on: 1808334

Comment on attachment 9310502 [details]
Bug 1808231 - Allow browsertime tasks to be scheduled by an Action task r=jcristau

Revision D165870 was moved to bug 1808334. Setting attachment 9310502 [details] to obsolete.

Attachment #9310502 - Attachment is obsolete: true

(In reply to Johan Lorenzo [:jlorenzo] from comment #5)

It'll be fixed by the patch I just attached.

Update: It'll be fixed in bug 1808334

Assignee: jlorenzo → nobody
Status: ASSIGNED → NEW

Excellent, thanks :jlorenzo!

(In reply to Greg Mierzwinski [:sparky] from comment #3)

Ok, :jlorenzo, would you be able to add new tasks for browsertime to the two pushes below? :aryx, and I are having issues triggering them.
(1) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=4946c2795c357521e1c03c9368c4dc4526d2c9a6
(2) https://treeherder.mozilla.org/jobs?repo=fenix&tier=1%2C2%2C3&searchStr=browsertime&revision=bdf6432abce70c5c9633390edb2fd7a74392cafd

New tasks added! You guys should be able to schedule them next time. Let me know if that's not the case 🙂

Will do, thanks for the quick fix Johan!

So far based on the backfill, it looks like the failure started on this top commit: https://treeherder.mozilla.org/jobs?repo=fenix&tochange=4946c2795c357521e1c03c9368c4dc4526d2c9a6&fromchange=d07de1e028350bfb2b28cd2ad2e92b735c2ac88c

Which is from bug 1807716.

Regressed by: 1807716

:zmckenney, would you know what might be happening here? Here's a logcat log if that helps (it seems to fail on browser startup): https://firefoxci.taskcluster-artifacts.net/DcDeZLjkQSyHjMRBbGPQew/0/public/test_info/logcat-R58R36P8E0D.log

We also start the app a bit differently which involves glean. Here's a list of extra arguments we pass when we start the app:

--firefox.android.activity mozilla.telemetry.glean.debug.GleanDebugActivity 
--firefox.android.intentArgument=-a
--firefox.android.intentArgument android.intent.action.VIEW
--firefox.android.intentArgument=--es
--firefox.android.intentArgument=startNext
--firefox.android.intentArgument=org.mozilla.fenix.IntentReceiverActivity
--firefox.android.intentArgument=--esa
--firefox.android.intentArgument=sourceTags
--firefox.android.intentArgument=automation
--firefox.android.intentArgument=--ez
--firefox.android.intentArgument=performancetest
--firefox.android.intentArgument=true
--firefox.android.intentArgument=-d
--firefox.android.intentArgument about:blank

This was taken from this log: https://firefoxci.taskcluster-artifacts.net/cyDNKHuTSkKhq_6JRhl-nw/0/public/logs/live_backing.log
You can find more tests here: https://treeherder.mozilla.org/jobs?repo=fenix&tochange=4946c2795c357521e1c03c9368c4dc4526d2c9a6&fromchange=d07de1e028350bfb2b28cd2ad2e92b735c2ac88c

Let me know if you need anything else from me.

Flags: needinfo?(zmckenney)

[:sparky] thanks for adding the startup of the app since this pointed me to where to look. This is failing because of an existing race condition here that checks whether Glean is initialized. Because Glean is initialized asynchronously this was never guaranteed to return true and because of the change in initialization order it now returns false and finishes.

The change to the order of application initialization is still proper since we want to warm up the engine before initializing Glean. I'm not sure how the test harness is setup but waiting for Glean to be initialized will solve this issue. A temporary fix for this might be adding a wait or repeating retry with timeout in GleanDebugActivity. A better fix would probably be in observing whether Glean has been initialized inside GleanDebugActivity before running the tests.

Hope this helps!

Flags: needinfo?(zmckenney)

That's very helpful, thanks! Regarding how our test harness is setup, we don't do any explicit waiting for the GleanDebugActivity to be initialized.

I see two ideas for how we can resolve this: (1) increase the wait time after browser initialization (flaky), and (2) resolve the race condition in Fenix.

I'll start on option (1) with an additional 5 second wait and see how it goes.

:zmckenney, would you know if Glean is exposed in JS in a way that would allow us to wait for it to exist? This would be a better solution versus waiting a hard-coded amount of time.

Flags: needinfo?(zmckenney)

:sparky I don't know about the JS visibility of Glean but I think we're going to need to check the java API initialization. I've created bug 1808586 for waiting on the initialization of Glean before proceeding with the tests which I don't think will be too difficult.

Flags: needinfo?(zmckenney)

Set release status flags based on info from the regressing bug 1807716

Ok that would be great thanks! I should note that the waiting I tried didn't work because we're failing before that code:

[task 2023-01-05T01:06:34.928Z] 00:57:33     INFO -  raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2023-01-05T01:06:34.928Z] 01:00:33     INFO -  raptor-browsertime Info: firefox failed to start, trying 2 more time(s): Failed to start firefox in 180 seconds.
[task 2023-01-05T01:06:34.928Z] 01:00:33     INFO -  raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2023-01-05T01:06:34.928Z] 01:03:33     INFO -  raptor-browsertime Info: firefox failed to start, trying 1 more time(s): Failed to start firefox in 180 seconds.
[task 2023-01-05T01:06:34.928Z] 01:03:33     INFO -  raptor-browsertime Info: Skip setting default preferences for Firefox
[task 2023-01-05T01:06:34.928Z] 01:06:33     INFO -  raptor-browsertime Info: firefox failed to start, trying 0 more time(s): Failed to start firefox in 180 seconds.
[task 2023-01-05T01:06:34.928Z] 01:06:33    ERROR -  raptor-browsertime Error: Browsertime failed to run
[task 2023-01-05T01:06:34.928Z] 01:06:33     INFO -  raptor-browsertime Info:     at SeleniumRunner.start (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/lib/core/seleniumRunner.js:91:13)
[task 2023-01-05T01:06:34.928Z] 01:06:33     INFO -  raptor-browsertime Info:     at async Iteration.run (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/lib/core/engine/iteration.js:118:7)
[task 2023-01-05T01:06:34.929Z] 01:06:33     INFO -  raptor-browsertime Info:     at async Engine.runByScript (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/lib/core/engine/index.js:274:20)
[task 2023-01-05T01:06:34.929Z] 01:06:33     INFO -  raptor-browsertime Info:     at async run (/builds/task_167287963545905/fetches/browsertime/node_modules/browsertime/bin/browsertime.js:104:22)
[task 2023-01-05T01:06:34.929Z] 01:06:33    ERROR -  raptor-browsertime Error: Browsertime failed to run
Depends on: 1808586

:zmckenny, do you know when bug 1808586 might get resolved? I'm wondering if I should disable all the Fenix performance tests as they are failing far too often but I'm not sure how long the fix may take.

Flags: needinfo?(zmckenney)
See Also: → 1809175

:sparky I don't know but I did talk with the Glean team today and they are aware of it. Sounds like they'll probably triage or look at it this week.

Flags: needinfo?(zmckenney)

Tracking for Fenix 111 because Dave Hunt says Glean bug 1808586 is causing a high number of failures (~50%) in Fenix performance tests in CI.

Priority: P5 → P2
Whiteboard: [stockwell disable-recommended] → [stockwell disable-recommended] [geckoview:m111?]
Rank: 111

Assigning to Zac because he will verify that Jan-Erik's fix for Glean bug 1808586 fixes these test failures.

Assignee: nobody → zmckenney
Whiteboard: [stockwell disable-recommended] [geckoview:m111?] → [stockwell disable-recommended] [geckoview:m111]

I will still test this to verify the fix but adding STR below.

Note: Because we only care about the race condition for fixing this failure this STR for local builds is likely not suitable for other bugs due to it allowing a device that is not used for normal perf tests (on an emulator no less).

STR on local emulator

  • Create an emulated device such as a Pixel 6 (I used API 30 without Google Play)
  • Set ADB to root by running adb root
  • Run device
  • add raptorEnabled to local.properties in Fenix project
  • Install nightly variant of Fenix to device
  • Navigate to Gecko project root
  • edit <root of Gecko project>/testing/raptor/raptor/power.py
    • In the function get_device_type add an elif for the device you are using with a pass
    • My example was elif device_type == "sdk_gphone_arm64": pass
  • Use the failing Amazon test to determine if race condition is resolved by running ./mach raptor --test amazon --app fenix --binary org.mozilla.fenix

I've confirmed that this test fails if Glean is initialized after engine warmup in FenixApplication and succeeds if Glean is initialized first.

Whiteboard: [stockwell disable-recommended] [geckoview:m111] → [geckoview:m111]

It looks like the issue is resolved with yesterday's push! https://treeherder.mozilla.org/jobs?repo=fenix&revision=d8cd52189cdb8a784e2e123ce605266ddcede7d7

Thank you for the help everyone!

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

The patch landed in nightly and beta is affected.
:zmckenney, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox110 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(zmckenney)
Flags: needinfo?(zmckenney)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: