Closed Bug 1776050 Opened 3 years ago Closed 11 months ago

Intermittent toolkit/components/narrate/test/browser_narrate.js | single tracking bug

Categories

(Toolkit :: Reader Mode, defect, P3)

Tracking

RESOLVED INCOMPLETE

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [stockwell disabled])

Attachments

(1 file, 1 obsolete file)

No description provided.

Additional information about this bug's failures and frequency patterns can be found by running: ./mach test-info failure-report --bug 1776050

Mass-marking intermittent trackers as P5/S4 if they have fewer than 5 instances a week. Please clear the severity and priority fields if the frequency increases, and needinfo the triage owners.

Severity: -- → S4
Priority: P3 → P5
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

There have been 34 total failures in the last 7 days; see the recent failure log.
Affected platforms are:

  • linux1804-64-qr
  • linux1804-64-shippable-qr
Flags: needinfo?(gijskruitbosch+bugs)
Whiteboard: [stockwell needswork:owner]

Eemeli, looks like these tests started failing after the Fluent refactor of Reader Mode. I don't suppose you have any idea why that might have made this test sad?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(earo)
Priority: -- → P3

Sorry for the delay; properly back at work today after a few weeks' travels.

This might be a timing issue. The first failing test checks that the dropdown has closed after a scroll, which is functionality provided by the parent AboutReader controls. There, scroll events that are triggered by changes in the page height are ignored, but the test applies no such filter.

Could it be that in some situations a Fluent formatting pass manages to happen at just the wrong time to trigger such an ignored scroll event on account of inserting some content into the DOM, such as the estimated reading time?
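
For illustration, a minimal sketch of the kind of height filter described above (names and structure here are assumptions, not the actual AboutReader.jsm code):

  // Ignore scroll events caused by the page height changing (e.g. content
  // such as the estimated reading time being inserted into the DOM), and
  // only treat same-height scrolls as user scrolling that should close
  // the dropdowns.
  let lastHeight = doc.body.clientHeight;
  win.addEventListener("scroll", () => {
    let height = doc.body.clientHeight;
    if (height !== lastHeight) {
      lastHeight = height;
      return; // height-change scroll: ignored
    }
    closeDropdowns();
  });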

Flags: needinfo?(earo) → needinfo?(gijskruitbosch+bugs)

(In reply to Eemeli Aro [:eemeli] from comment #11)

Could it be that in some situations a Fluent formatting pass manages to happen at just the wrong time to trigger such an ignored scroll event on account of inserting some content into the DOM, such as the estimated reading time?

This is an interesting hypothesis. I'd like to add some logging to confirm. But I don't think there's an easy way to catch in JS when the "natural" processing of Fluent has finished, right? What could we hang the logging off?

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(earo)

(In reply to :Gijs (he/him) from comment #12)

This is an interesting hypothesis. I'd like to add some logging to confirm. But I don't think there's an easy way to catch in JS when the "natural" processing of Fluent has finished, right? What could we hang the logging off?

Correct; there's no explicit event for when Fluent DOM translations are ready. We could maybe track the contents of all nodes with a data-l10n-id and compare them on each scroll event, along with the body height?

Indirectly we could confirm my hypothesis by modifying the test I linked to earlier so that it applies the same body height filter as in the runtime code to the scroll events it's waiting for, and seeing if that gets rid of the intermittent failures. At the very least, this should make the expected behaviour a bit more like what's intended.

Would you like me to submit a patch for that?
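
A minimal sketch of what that could look like in the test (the test waits for scroll via ContentTaskUtils.waitForEvent, so a checkFn filter along these lines; variable names here are illustrative):

  // Only accept scroll events where the body height did not change,
  // mirroring the height filter in the runtime AboutReader code.
  let lastHeight = content.document.body.clientHeight;
  await ContentTaskUtils.waitForEvent(content, "scroll", false, () => {
    let height = content.document.body.clientHeight;
    if (height !== lastHeight) {
      lastHeight = height;
      return false; // a height-change scroll: keep waiting
    }
    return true; // a "real" scroll
  });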

Flags: needinfo?(earo) → needinfo?(gijskruitbosch+bugs)

(In reply to Eemeli Aro [:eemeli] from comment #14)

Indirectly we could confirm my hypothesis by modifying the test I linked to earlier so that it applies the same body height filter as in the runtime code to the scroll events it's waiting for, and seeing if that gets rid of the intermittent failures. At the very least, this should make the expected behaviour a bit more like what's intended.

Would you like me to submit a patch for that?

Yes please! That seems worth a shot. :-)

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(earo)

Once this try run completes without triggering the failure, I'll submit the patch for it: https://treeherder.mozilla.org/jobs?repo=try&revision=03844d39ba9ba7a91b32a3d3118b8f0e0dcdc6ea

Flags: needinfo?(earo)

(In reply to Eemeli Aro [:eemeli] from comment #16)

Once this try run completes without triggering the failure, I'll submit the patch for it: https://treeherder.mozilla.org/jobs?repo=try&revision=03844d39ba9ba7a91b32a3d3118b8f0e0dcdc6ea

Thanks! I triggered test-verify jobs on that trypush for you. I'm not 100% confident they'll work correctly when triggered after the fact on trypushes, but I guess we'll find out.

Looks like I was wrong, and filtering the scroll events doesn't help. I submitted a second try push just to make sure. I'm a bit out of ideas here, especially as this doesn't seem to reproduce for me locally (M1 MacBook).

Flags: needinfo?(gijskruitbosch+bugs)

I don't have access to a Linux box again until Wednesday, but the screenshot from a failed test makes me wonder if we're scrolled all the way to the top. The fact that we're only scrolling by 10px makes this quite tricky to establish at all, I guess. I've asked Pernosco to repro in case that turns up anything, but if it's some kind of race condition, I suspect the slowdown from running under rr means it won't.

(In reply to :Gijs (he/him) from comment #20)

I don't have access to a Linux box again until Wednesday, but the screenshot from a failed test makes me wonder if we're scrolled all the way to the top. The fact that we're only scrolling by 10px makes this quite tricky to establish at all, I guess. I've asked Pernosco to repro in case that turns up anything, but if it's some kind of race condition, I suspect the slowdown from running under rr means it won't.

Apparently Pernosco is very unhappy about this repro attempt. I'll try to take a look on my Linux VM tomorrow.

(In reply to :Gijs (he/him) from comment #22)

Apparently Pernosco is very unhappy about this repro attempt. I'll try to take a look on my Linux VM tomorrow.

Unfortunately I can't reproduce the issue at all locally.

Between that, not having any ideas as to what's wrong, and Pernosco not working, I'm honestly not 100% sure how to investigate further. We can try adding some logging and debugging on try... but it's going to be tedious.

These are failing on the spi-nw variant, so reproducing it most likely requires those preferences to be set. I don't know if Pernosco or the local attempts to reproduce were using those preferences.

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #25)

These are failing on the spi-nw variant, so reproducing it most likely requires those preferences to be set.

Thanks for reminding me, good catch. Unfortunately I appear to have no luck with those set, either. I tried with --verify, I tried running the whole directory, but no cigar.

I don't know if Pernosco or the local attempts to reproduce were using those preferences.

I think Pernosco would, yeah - my earlier local attempts definitely didn't set them. I'm also trying with artifact builds, but if anything I'd expect those to be closer to what runs on infra than a "real" local build...
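
For reference, local reproduction attempts look something like this (the exact pref names the spi-nw variant sets are my assumption; the variant definition in-tree is authoritative):

  ./mach mochitest toolkit/components/narrate/test/browser_narrate.js \
    --setpref network.process.enabled=true \
    --setpref network.http.network_access_on_socket_process.enabled=true \
    --run-until-failure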

I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression is:

  • The test fails on jobs that run on a single-core Linux machine. The failure is more likely related to the single-core hardware than to the Linux operating system. In fact, I could reproduce it on my M1 MacBook Pro by forcing the test to run only on power-efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me).
  • The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G
  • I noticed that on the slow Linux machine, the test document hasn't finished visually loading (it is not yet shown in the profiler screenshots) when the actual test starts. I pushed to try with an arbitrary 4s wait before the start of the test, and that try run was green. The "Document::NotifyIntersectionObservers" runnable runs during the arbitrary 4s wait: https://share.firefox.dev/3uUbom7. I think it runs at the time the page finishes loading.

In the failing case, the style tree is invalidated with this stack:

XRE_InitChildProcess []
Task Document::NotifyIntersectionObservers []
IntersectionCallback []
js::RunScript []
$bound [self-hosted:1494:24]
js::RunScript []
_topScrollChange [resource://gre/modules/AboutReader.jsm:1321:18]
DOMTokenList.toggle []

In the passing case, I see invalidations with these stacks:

XRE_InitChildProcess []
promise callback []
js::RunScript []
AsyncFunctionNext [self-hosted:945:26]
js::RunScript []
InterpretGeneratorResume [self-hosted:1957:33]
(root scope) chrome://mochitests/content/browser/toolkit/components/narrate/test/browser_narrate.js [chrome://mochitests/content/browser/toolkit/components/narrate/test/browser_narrate.js:13:15]
Window.scrollBy []
XRE_InitChildProcess []
Task PVsync::Msg_Notify(unlabeled) []
PVsync::Msg_Notify []
RefreshDriver tick []
EventDispatcher::Dispatch []
EventDispatcher::Dispatch scroll []
EventListener.handleEvent []
js::RunScript []
handleEvent [resource://gre/modules/AboutReader.jsm:365:13]
_closeDropdowns [resource://gre/modules/AboutReader.jsm:1240:17]
DOMTokenList.remove []
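
For context, the _topScrollChange frame in the failing stack is an IntersectionObserver callback, wired up roughly like this (an illustrative sketch, not the actual AboutReader.jsm source):

  // An observer watching a sentinel element at the top of the document;
  // its callback runs from the Document::NotifyIntersectionObservers
  // runnable and toggles a class depending on whether we're at the top.
  let observer = new IntersectionObserver(entries => {
    let atTop = entries.some(entry => entry.isIntersecting);
    container.classList.toggle("scrolled-to-top", atTop);
  });
  observer.observe(doc.querySelector(".top-anchor"));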

So our tasks are sometimes being run on single-core machines? Curious whether this is a permafail on single core, or intermittent on single core. Our tests should be running on something more realistic - this test runs on t-linux-large-gcp:
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-t/worker-types/t-linux-large-gcp?sortBy=Last%20Active&sortDirection=desc

We run on an n2-standard-2:
https://hg.mozilla.org/ci/ci-configuration/file/e49187d9b67b02ace49930eae0836defa26098c3/worker-pools.yml#l3905
https://cloud.google.com/compute/docs/general-purpose-machines#n2_machines

:masterwayz - can you confirm what we are running at GCP and how many cores? I see we have 2 vCPUs and 8 GB of RAM for these machines.

Flags: needinfo?(mgoossens)

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #30)

So our tasks are sometimes being run on single-core machines? Curious whether this is a permafail on single core, or intermittent on single core.

Intermittent: it fails about 50% of the time in my try run with the profiler running. The profiler likely takes some CPU resources away from the rest of what's happening, which may help with reproducing.

The CPU info seen by the profiler is:
CPU name: Intel(R) Xeon(R) CPU @ 2.80GHz
CPU cores: 1 physical core, 2 logical cores
I think this matches the "2 vCPUs"

Our tests should be running on something more realistic

https://data.firefox.com/dashboard/hardware says single-core machines still represent more than 1.5% of our users. It might still make sense to have some test coverage on single-core machines, but it would be nicer if the machine being single-core was clearly visible on Treeherder.

I think in a more ideal world we'd have the 1-core machines (the current n2-standard-2) for our linux32 tests, and then pick a 2- or 4-core machine for the rest of our tests (linux64). I am sure we have plenty of cores on macOS/Windows/Android.

:masterwayz, to add on to my request: is there an option at GCP with 2 or 4 cores that we could add to the pool and test?

So I checked: as with most virtualization technologies (some allow you to set these things manually), one vCPU = one thread on the physical host CPU.
So that means two vCPUs would be one core with both of its threads.
Four vCPUs would be two cores, each with its associated hyperthreads. Not all modern CPUs have hyperthreading (the lower-end ones don't), but they both work the same; how it gets reported shouldn't matter, as it should give the same amount of power either way.

To answer Joel, if we mean physical cores:
n2-standard-4 = 2 cores with 4 threads, so four vCPUs total
n2-standard-8 = 4 cores with 8 threads, so eight vCPUs total.

Flags: needinfo?(mgoossens)

Sounds like we want to have a separate bug to change what machines get used for these tests - Joel, can you file that based on the info from Michelle?

I'll try to find time to dig into the test failure some more with the info Florian unearthed from the profiling work anyway.

Flags: needinfo?(jmaher)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

This fails frequently with this failure line: TEST-UNEXPECTED-FAIL | toolkit/components/narrate/test/browser_narrate.js | popup is hidden after scroll - false == true - got false, expected true (operator ==).
Backfill range and retriggers: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception%2Crunnable&fromchange=d6aa92d2655910b034dd430187865b1dd646f01f&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Copt%2Cmochitests%2Cwith%2Cnetworking%2Con%2Csocket%2Cprocess%2Ctest-linux1804-64-qr%2Fopt-mochitest-browser-chrome-spi-nw%2Cbc5&group_state=expanded&selectedTaskRun=biHWUi-jReG7AGQEC-jgsA.0&tochange=345ec3c4436ae80e5f25367b1a07e301aaa416c5
I see that this is an old issue, so I will update the bug summary to reflect the current situation.

Summary: Intermittent toolkit/components/narrate/test/browser_narrate.js | single tracking bug → High frequency toolkit/components/narrate/test/browser_narrate.js | single tracking bug

Filed bug 1807666 to track the work on multi-core machines.

Flags: needinfo?(jmaher)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Given that this has 234 total failures in the last 30 days (https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-12-05&endday=2023-01-04&tree=trunk&failurehash=all&bug=1776050) and 99% of them are on Linux opt, could we go ahead and disable this until a fix is in place?

Flags: needinfo?(jmaher)

Yes - once the fix is ready, it is super easy to turn the test back on!

Flags: needinfo?(jmaher)
Assignee: nobody → csabou
Pushed by csabou@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0c86cc71e8f7 Disable browser_narrate.js on linux opt for frequent failures. r=jmaher,intermittent-reviewers DONTBUILD
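
For reference, disabling a test like this typically amounts to a one-line manifest annotation along these lines (a sketch; the exact condition in the actual patch may differ):

  # toolkit/components/narrate/test/browser.ini (illustrative)
  [browser_narrate.js]
  skip-if = os == 'linux' && !debug  # Bug 1776050 - high frequency on linux opt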
Assignee: csabou → nobody
Keywords: leave-open
Whiteboard: [stockwell disable-recommended] → [stockwell disabled]
Summary: High frequency toolkit/components/narrate/test/browser_narrate.js | single tracking bug → Intermittent toolkit/components/narrate/test/browser_narrate.js | single tracking bug

(In reply to Florian Quèze [:florian] from comment #29)

I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression is:

  • The test fails on jobs that run on a single-core Linux machine. The failure is more likely related to the single-core hardware than to the Linux operating system. In fact, I could reproduce it on my M1 MacBook Pro by forcing the test to run only on power-efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me).

It doesn't for me, also on an M1 MacBook Pro. :-(

Is there some other way to make the local mach test invocation still closer to resembling the single-core Linux machine?

  • The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G

This is confusing because that is what is supposed to happen, AIUI - the intersection observer should be triggered when there's a scroll, and that will also resolve the waitForEvent promise, right?

  • I noticed that on the slow Linux machine, the test document hasn't finished visually loading (it is not yet shown in the profiler screenshots) when the actual test starts. I pushed to try with an arbitrary 4s wait before the start of the test, and that try run was green. The "Document::NotifyIntersectionObservers" runnable runs during the arbitrary 4s wait: https://share.firefox.dev/3uUbom7. I think it runs at the time the page finishes loading.

I guess we could wait for the document to be painted in the child? Perhaps that would be enough?
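
A rough sketch of that idea (waiting for two animation frames in the child is a common idiom to guarantee at least one refresh-driver tick/paint; whether that's a strong enough readiness signal here is an open question):

  // Before the test starts scrolling, wait in the content process until
  // two animation frames have fired, ensuring at least one paint.
  await SpecialPowers.spawn(gBrowser.selectedBrowser, [], async () => {
    await new Promise(resolve => {
      content.requestAnimationFrame(() => {
        content.requestAnimationFrame(resolve);
      });
    });
  });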

Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(florian)

(In reply to :Gijs (he/him) from comment #49)

(In reply to Florian Quèze [:florian] from comment #29)

I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression is:

  • The test fails on jobs that run on a single-core Linux machine. The failure is more likely related to the single-core hardware than to the Linux operating system. In fact, I could reproduce it on my M1 MacBook Pro by forcing the test to run only on power-efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me).

It doesn't for me, also on an M1 MacBook Pro. :-(

Is there some other way to make the local mach test invocation still closer to resembling the single-core Linux machine?

I don't know. I can still reproduce failures of the test with my current mozilla-central local build using the exact command from comment 29. The first time I got a "Test timed out" failure, and the second time a failure similar to what we were investigating here.

  • The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G

This is confusing because that is what is supposed to happen, AIUI - the intersection observer should be triggered when there's a scroll, and that will also resolve the waitForEvent promise, right?

I don't know what's expected or unexpected; I only observed a difference in the markers between the passing runs and the failing run. Here's a view of the same profile that also shows the passing runs: https://share.firefox.dev/3jXlEYN

I guess we could wait for the document to be painted in the child? Perhaps that would be enough?

Maybe? It would be nice to understand the root cause of the intermittent failure, but we might indeed also want a quick solution to hide the problem.

Flags: needinfo?(florian)
Attachment #9383184 - Attachment is obsolete: true
Status: REOPENED → RESOLVED
Closed: 3 years ago → 11 months ago
Resolution: --- → INCOMPLETE