Closed Bug 1776050 Opened 3 years ago Closed 11 months ago

Intermittent toolkit/components/narrate/test/browser_narrate.js | single tracking bug

Categories

(Toolkit :: Reader Mode, defect, P3)

Tracking

RESOLVED INCOMPLETE

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [stockwell disabled])

Attachments

(1 file, 1 obsolete file)

No description provided.

Additional information about this bug's failures and frequency patterns can be found by running: ./mach test-info failure-report --bug 1776050

Mass-marking intermittent trackers as P5/S4 if they have fewer than 5 instances a week. Please clear the severity and priority fields if the frequency increases, and needinfo the triage owners.

Severity: -- → S4
Priority: P3 → P5
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

There have been 34 total failures in the last 7 days; see the recent failure log.
Affected platforms are:

  • linux1804-64-qr
  • linux1804-64-shippable-qr
Flags: needinfo?(gijskruitbosch+bugs)
Whiteboard: [stockwell needswork:owner]

Eemeli, looks like these tests started failing after the Fluent refactor of Reader Mode. I don't suppose you have any idea why that might have made this test sad?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(earo)
Priority: -- → P3

Sorry for the delay; properly back at work today after a few weeks' travels.

This might be a timing issue. The first failing test checks that the dropdown has closed after a scroll, which is functionality provided by the parent AboutReader controls. There, scroll events that are triggered by changes in the page height are ignored, but the test applies no such filter.

Could it be that in some situations a Fluent formatting pass manages to happen at just the wrong time to trigger such an ignored scroll event on account of inserting some content into the DOM, such as the estimated reading time?
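
For illustration, a minimal sketch of the kind of height filter described above (names and structure here are assumptions, not the actual AboutReader.jsm code):

  // Ignore scroll events caused by the page height changing (e.g. content
  // such as the estimated reading time being inserted into the DOM), and
  // only treat same-height scrolls as user scrolling that should close
  // the dropdowns.
  let lastHeight = doc.body.clientHeight;
  win.addEventListener("scroll", () => {
    let height = doc.body.clientHeight;
    if (height !== lastHeight) {
      lastHeight = height;
      return; // height-change scroll: ignored
    }
    closeDropdowns();
  });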

Flags: needinfo?(earo) → needinfo?(gijskruitbosch+bugs)

(In reply to Eemeli Aro [:eemeli] from comment #11)

Could it be that in some situations a Fluent formatting pass manages to happen at just the wrong time to trigger such an ignored scroll event on account of inserting some content into the DOM, such as the estimated reading time?

This is an interesting hypothesis. I'd like to add some logging to confirm. But I don't think there's an easy way to catch in JS when the "natural" processing of Fluent has finished, right? What could we hang the logging off?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(earo)

(In reply to :Gijs (he/him) from comment #12)

This is an interesting hypothesis. I'd like to add some logging to confirm. But I don't think there's an easy way to catch in JS when the "natural" processing of Fluent has finished, right? What could we hang the logging off?

Correct; there's no explicit event for when Fluent DOM translations are ready. We could maybe track the contents of all nodes with a data-l10n-id and compare them on each scroll event, along with the body height?

Indirectly we could confirm my hypothesis by modifying the test I linked to earlier so that it applies the same body height filter as in the runtime code to the scroll events it's waiting for, and seeing if that gets rid of the intermittent failures. At the very least, this should make the expected behaviour a bit more like what's intended.

Would you like me to submit a patch for that?
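
A minimal sketch of what that could look like in the test (the test waits for scroll via ContentTaskUtils.waitForEvent, so a checkFn filter along these lines; variable names here are illustrative):

  // Only accept scroll events where the body height did not change,
  // mirroring the height filter in the runtime AboutReader code.
  let lastHeight = content.document.body.clientHeight;
  await ContentTaskUtils.waitForEvent(content, "scroll", false, () => {
    let height = content.document.body.clientHeight;
    if (height !== lastHeight) {
      lastHeight = height;
      return false; // a height-change scroll: keep waiting
    }
    return true; // a "real" scroll
  });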

Flags: needinfo?(earo) → needinfo?(gijskruitbosch+bugs)

(In reply to Eemeli Aro [:eemeli] from comment #14)

Indirectly we could confirm my hypothesis by modifying the test I linked to earlier so that it applies the same body height filter as in the runtime code to the scroll events it's waiting for, and seeing if that gets rid of the intermittent failures. At the very least, this should make the expected behaviour a bit more like what's intended.

Would you like me to submit a patch for that?

Yes please! That seems worth a shot. :-)

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(earo)

Once this try run completes without triggering the failure, I'll submit the patch for it: https://treeherder.mozilla.org/jobs?repo=try&revision=03844d39ba9ba7a91b32a3d3118b8f0e0dcdc6ea

Flags: needinfo?(earo)

(In reply to Eemeli Aro [:eemeli] from comment #16)

Once this try run completes without triggering the failure, I'll submit the patch for it: https://treeherder.mozilla.org/jobs?repo=try&revision=03844d39ba9ba7a91b32a3d3118b8f0e0dcdc6ea

Thanks! I triggered test-verify jobs on that trypush for you. I'm not 100% confident they'll work correctly when triggered after the fact on trypushes, but I guess we'll find out.

Looks like I was wrong, and filtering the scroll events doesn't help. I submitted a second try push just to make sure. I'm a bit out of ideas here, especially as this doesn't seem to reproduce for me locally (M1 MacBook).

Flags: needinfo?(gijskruitbosch+bugs)

I don't have access to a Linux box again until Wednesday, but the screenshot from a failed test makes me wonder if we're scrolled all the way to the top. The fact that we're only scrolling by 10px makes this quite tricky to establish at all, I guess. I've asked Pernosco to repro in case that turns up anything, but if it's some kind of race condition, I suspect the slowdown from running under rr means it won't.

(In reply to :Gijs (he/him) from comment #20)

I don't have access to a Linux box again until Wednesday, but the screenshot from a failed test makes me wonder if we're scrolled all the way to the top. The fact that we're only scrolling by 10px makes this quite tricky to establish at all, I guess. I've asked Pernosco to repro in case that turns up anything, but if it's some kind of race condition, I suspect the slowdown from running under rr means it won't.

Apparently Pernosco is very unhappy about this repro attempt. I'll try to take a look on my Linux VM tomorrow.

(In reply to :Gijs (he/him) from comment #22)

Apparently Pernosco is very unhappy about this repro attempt. I'll try to take a look on my Linux VM tomorrow.

Unfortunately I can't reproduce the issue at all locally.

Between that, not having any ideas as to what's wrong, and Pernosco not working, I'm honestly not 100% sure how to investigate further. We can try adding some logging and debugging on try... but it's going to be tedious.

These are failing on the spi-nw variant, so reproducing it most likely requires those preferences to be set. I don't know if Pernosco or the local attempts to reproduce were using those preferences.

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #25)

These are failing on the spi-nw variant, so reproducing it most likely requires those preferences to be set.

Thanks for reminding me, good catch. Unfortunately I appear to have no luck with those set, either. I tried with --verify, I tried running the whole directory, but no cigar.

I don't know if Pernosco or the local attempts to reproduce were using those preferences.

I think Pernosco would, yeah - my earlier local attempts definitely didn't set them. I'm also trying with artifact builds, but if anything I'd expect those to be closer to what runs on infra than a "real" local build...
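
For reference, local reproduction attempts look something like this (the exact pref names the spi-nw variant sets are my assumption; the variant definition in-tree is authoritative):

  ./mach mochitest toolkit/components/narrate/test/browser_narrate.js \
    --setpref network.process.enabled=true \
    --setpref network.http.network_access_on_socket_process.enabled=true \
    --run-until-failure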

I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression is:

  • The test fails on jobs that run on a single-core Linux machine. The failure is more likely related to the single-core hardware than to the Linux operating system. In fact, I could reproduce it on my M1 MacBook Pro by forcing the test to run only on power-efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me).
  • The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G
  • I noticed that on the slow Linux machine, the test document hasn't finished visually loading (it is not yet shown in the profiler screenshots) when the actual test starts. I pushed to try with an arbitrary 4s wait before the start of the test, and that try run was green. The "Document::NotifyIntersectionObservers" runnable runs during the arbitrary 4s wait: https://share.firefox.dev/3uUbom7. I think it runs at the time the page finishes loading.

In the failing case, the style tree is invalidated with this stack:

XRE_InitChildProcess []
Task Document::NotifyIntersectionObservers []
IntersectionCallback []
js::RunScript []
$bound [self-hosted:1494:24]
js::RunScript []
_topScrollChange [resource://gre/modules/AboutReader.jsm:1321:18]
DOMTokenList.toggle []

In the passing case, I see invalidations with these stacks:

XRE_InitChildProcess []
promise callback []
js::RunScript []
AsyncFunctionNext [self-hosted:945:26]
js::RunScript []
InterpretGeneratorResume [self-hosted:1957:33]
(root scope) chrome://mochitests/content/browser/toolkit/components/narrate/test/browser_narrate.js [chrome://mochitests/content/browser/toolkit/components/narrate/test/browser_narrate.js:13:15]
Window.scrollBy []
XRE_InitChildProcess []
Task PVsync::Msg_Notify(unlabeled) []
PVsync::Msg_Notify []
RefreshDriver tick []
EventDispatcher::Dispatch []
EventDispatcher::Dispatch scroll []
EventListener.handleEvent []
js::RunScript []
handleEvent [resource://gre/modules/AboutReader.jsm:365:13]
_closeDropdowns [resource://gre/modules/AboutReader.jsm:1240:17]
DOMTokenList.remove []
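
For context, the _topScrollChange frame in the failing stack is an IntersectionObserver callback, wired up roughly like this (an illustrative sketch, not the actual AboutReader.jsm source):

  // An observer watching a sentinel element at the top of the document;
  // its callback runs from the Document::NotifyIntersectionObservers
  // runnable and toggles a class depending on whether we're at the top.
  let observer = new IntersectionObserver(entries => {
    let atTop = entries.some(entry => entry.isIntersecting);
    container.classList.toggle("scrolled-to-top", atTop);
  });
  observer.observe(doc.querySelector(".top-anchor"));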

So our tasks are sometimes being run on single-core machines? Curious whether this is a permafail on single core, or intermittent on single core. Our tests should be running on something more realistic - this test runs on t-linux-large-gcp:
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-t/worker-types/t-linux-large-gcp?sortBy=Last%20Active&sortDirection=desc

We run on an n2-standard-2:
https://hg.mozilla.org/ci/ci-configuration/file/e49187d9b67b02ace49930eae0836defa26098c3/worker-pools.yml#l3905
https://cloud.google.com/compute/docs/general-purpose-machines#n2_machines

:masterwayz - can you confirm what we are running at GCP and how many cores? I see we have 2 vCPUs and 8 GB of RAM for these machines.

Flags: needinfo?(mgoossens)

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #30)

So our tasks are sometimes being run on single-core machines? Curious whether this is a permafail on single core, or intermittent on single core.

Intermittent: it fails about 50% of the time in my try run with the profiler running. The profiler likely takes some CPU resources away from the rest of what's happening, which may help with reproducing.

The CPU info seen by the profiler is:
CPU name: Intel(R) Xeon(R) CPU @ 2.80GHz
CPU cores: 1 physical core, 2 logical cores
I think this matches the "2 vCPUs"

Our tests should be running on something more realistic

https://data.firefox.com/dashboard/hardware says single-core machines still represent more than 1.5% of our users. It might still make sense to have some test coverage on single-core machines, but it would be nicer if the machine being single-core was clearly visible on Treeherder.

I think in a more ideal world we'd have the 1-core machines (the current n2-standard-2) for our linux32 tests, and then pick a 2- or 4-core machine for the rest of our tests (linux64). I am sure we have plenty of cores on macOS/Windows/Android.

:masterwayz, to add on to my request: is there an option at GCP with 2 or 4 cores that we could add to the pool and test?

So I checked: as with most virtualization technologies (some allow you to set these things manually), one vCPU = one thread on the physical host CPU.
So that means two vCPUs would be one core with both of its threads.
Four vCPUs would be two cores, each with its associated hyperthreads. Not all modern CPUs have hyperthreading (the lower-end ones don't), but they both work the same; how it gets reported shouldn't matter, as it should give the same amount of power either way.

To answer Joel, if we mean physical cores:
n2-standard-4 = 2 cores with 4 threads, so four vCPUs total
n2-standard-8 = 4 cores with 8 threads, so eight vCPUs total.

Flags: needinfo?(mgoossens)

Sounds like we want to have a separate bug to change what machines get used for these tests - Joel, can you file that based on the info from Michelle?

I'll try to find time to dig into the test failure some more with the info Florian unearthed from the profiling work anyway.

Flags: needinfo?(jmaher)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

This fails frequently with this failure line: TEST-UNEXPECTED-FAIL | toolkit/components/narrate/test/browser_narrate.js | popup is hidden after scroll - false == true - got false, expected true (operator ==).
Backfill range and retriggers: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception%2Crunnable&fromchange=d6aa92d2655910b034dd430187865b1dd646f01f&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Copt%2Cmochitests%2Cwith%2Cnetworking%2Con%2Csocket%2Cprocess%2Ctest-linux1804-64-qr%2Fopt-mochitest-browser-chrome-spi-nw%2Cbc5&group_state=expanded&selectedTaskRun=biHWUi-jReG7AGQEC-jgsA.0&tochange=345ec3c4436ae80e5f25367b1a07e301aaa416c5
I see that this is an old issue, so I will update the bug summary to reflect the current situation.

Summary: Intermittent toolkit/components/narrate/test/browser_narrate.js | single tracking bug → High frequency toolkit/components/narrate/test/browser_narrate.js | single tracking bug

Filed bug 1807666 to track the work on multi-core machines.

Flags: needinfo?(jmaher)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Given that this has 234 total failures in the last 30 days (https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-12-05&endday=2023-01-04&tree=trunk&failurehash=all&bug=1776050) and 99% of them are on Linux opt, could we go ahead and disable this until a fix is in place?

Flags: needinfo?(jmaher)

Yes - once the fix is ready, it is super easy to turn the test back on!

Flags: needinfo?(jmaher)
Assignee: nobody → csabou
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0c86cc71e8f7 Disable browser_narrate.js on linux opt for frequent failures. r=jmaher,intermittent-reviewers DONTBUILD
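
For reference, disabling a test like this typically amounts to a one-line manifest annotation along these lines (a sketch; the exact condition in the actual patch may differ):

  # toolkit/components/narrate/test/browser.ini (illustrative)
  [browser_narrate.js]
  skip-if = os == 'linux' && !debug  # Bug 1776050 - high frequency on linux opt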
Assignee: csabou → nobody
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Summary: High frequency toolkit/components/narrate/test/browser_narrate.js | single tracking bug → Intermittent toolkit/components/narrate/test/browser_narrate.js | single tracking bug

(In reply to Florian Quèze [:florian] from comment #29)

I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression is:

  • The test fails on jobs that run on a single-core Linux machine. The failure is more likely related to the single-core hardware than to the Linux operating system. In fact, I could reproduce it on my M1 MacBook Pro by forcing the test to run only on power-efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me).

It doesn't for me, also on an M1 MacBook Pro. :-(

Is there some other way to make the local mach test invocation still closer to resembling the single-core Linux machine?

  • The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G

This is confusing because that is what is supposed to happen, AIUI - the intersection observer should be triggered when there's a scroll, and that will also resolve the waitForEvent promise, right?

  • I noticed that on the slow Linux machine, the test document hasn't finished visually loading (it is not yet shown in the profiler screenshots) when the actual test starts. I pushed to try with an arbitrary 4s wait before the start of the test, and that try run was green. The "Document::NotifyIntersectionObservers" runnable runs during the arbitrary 4s wait: https://share.firefox.dev/3uUbom7. I think it runs at the time the page finishes loading.

I guess we could wait for the document to be painted in the child? Perhaps that would be enough?
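
A rough sketch of that idea (waiting for two animation frames in the child is a common idiom to guarantee at least one refresh-driver tick/paint; whether that's a strong enough readiness signal here is an open question):

  // Before the test starts scrolling, wait in the content process until
  // two animation frames have fired, ensuring at least one paint.
  await SpecialPowers.spawn(gBrowser.selectedBrowser, [], async () => {
    await new Promise(resolve => {
      content.requestAnimationFrame(() => {
        content.requestAnimationFrame(resolve);
      });
    });
  });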

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(florian)

(In reply to :Gijs (he/him) from comment #49)

(In reply to Florian Quèze [:florian] from comment #29)

I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression is:

  • The test fails on jobs that run on a single-core Linux machine. The failure is more likely related to the single-core hardware than to the Linux operating system. In fact, I could reproduce it on my M1 MacBook Pro by forcing the test to run only on power-efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me).

It doesn't for me, also on an M1 MacBook Pro. :-(

Is there some other way to make the local mach test invocation still closer to resembling the single-core Linux machine?

I don't know. I can still reproduce failures of the test with my current mozilla-central local build using the exact command from comment 29. The first time I got a "Test timed out" failure, and the second time a failure similar to what we were investigating here.

  • The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G

This is confusing because that is what is supposed to happen, AIUI - the intersection observer should be triggered when there's a scroll, and that will also resolve the waitForEvent promise, right?

I don't know what's expected or unexpected; I only observed a difference in the markers between the passing runs and the failing run. Here's a view of the same profile that also shows the passing runs: https://share.firefox.dev/3jXlEYN

I guess we could wait for the document to be painted in the child? Perhaps that would be enough?

Maybe? It would be nice to understand the root cause of the intermittent failure, but we might indeed also want a quick solution to hide the problem.

Flags: needinfo?(florian)
Attachment #9383184 - Attachment is obsolete: true
Status: REOPENED → RESOLVED
Closed: 3 years ago → 11 months ago
Resolution: --- → INCOMPLETE