Intermittent toolkit/components/narrate/test/browser_narrate.js | single tracking bug
Categories
(Toolkit :: Reader Mode, defect, P3)
People
(Reporter: jmaher, Unassigned)
References
Details
(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [stockwell disabled])
Attachments
(1 file, 1 obsolete file)
Reporter
Comment 1•3 years ago
Additional information about this bug's failures and frequency patterns can be found by running: ./mach test-info failure-report --bug 1776050
Comment 2•3 years ago
Mass marking intermittent trackers as P5/S4 if less than 5 instances a week. Please clear the severity and priority fields and needinfo the triage owners if the frequency increases.
Comment 3•3 years ago
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit auto_nag documentation.
Comment 4•3 years ago
treeherder
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=393121987&repo=autoland
Comment hidden (Intermittent Failures Robot)
Comment 6•3 years ago
There have been 34 total failures in the last 7 days, recent failure log.
Affected platforms are:
- linux1804-64-qr
- linux1804-64-shippable-qr
Comment hidden (Intermittent Failures Robot)
Comment 8•3 years ago
Eemeli, looks like these tests started failing after the fluent refactor of reader mode. I don't suppose you have any idea why that might have made this test sad?
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Updated•3 years ago
Comment 11•3 years ago
Sorry for the delay; properly back at work today after a few weeks' travels.
This might be a timing issue. The first failing test is checking that the dropdown has closed after a scroll, which is functionality provided by the parent AboutReader controls. There, scroll events that are triggered by changes in the page height are ignored, but they're not ignored in the test.
Could it be that in some situations a Fluent formatting pass manages to happen at just the wrong time to trigger such an ignored scroll event on account of inserting some content into the DOM, such as the estimated reading time?
Updated•3 years ago
Comment 12•3 years ago
(In reply to Eemeli Aro [:eemeli] from comment #11)
Could it be that in some situations a Fluent formatting pass manages to happen at just the wrong time to trigger such an ignored scroll event on account of inserting some content into the DOM, such as the estimated reading time?
This is an interesting hypothesis. I'd like to add some logging to confirm. But I don't think there's an easy way to catch in JS when the "natural" processing of fluent has finished, right? What could we hang the logging off?
Comment hidden (Intermittent Failures Robot)
Comment 14•3 years ago
(In reply to :Gijs (he/him) from comment #12)
This is an interesting hypothesis. I'd like to add some logging to confirm. But I don't think there's an easy way to catch in JS when the "natural" processing of fluent has finished, right? What could we hang the logging off?
Correct; there's no explicit event for when Fluent DOM translations are ready. Could maybe track the contents of all nodes with a data-l10n-id, and compare them on each scroll event, as well as the body height?
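For illustration only - this is not an existing helper in the test, just a sketch of the kind of logging described above, assuming it runs in the content process (e.g. inside SpecialPowers.spawn) where content is the reader window:

```js
// Hypothetical diagnostic, not part of browser_narrate.js: snapshot the text
// of every data-l10n-id node plus the body height, and log on each scroll
// event whether a Fluent formatting pass appears to have changed the DOM.
function snapshotL10n(doc) {
  return {
    height: doc.body.clientHeight,
    text: Array.from(doc.querySelectorAll("[data-l10n-id]"), n => n.textContent),
  };
}

let last = snapshotL10n(content.document);
content.addEventListener("scroll", () => {
  let next = snapshotL10n(content.document);
  console.log("scroll:", {
    heightChanged: next.height !== last.height,
    l10nChanged: JSON.stringify(next.text) !== JSON.stringify(last.text),
  });
  last = next;
});
```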
Indirectly we could confirm my hypothesis by modifying the test I linked to earlier so that it applies the same body height filter as in the runtime code to the scroll events it's waiting for, and seeing if that gets rid of the intermittent failures. At the very least, this should make the expected behaviour a bit more like what's intended.
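Roughly, something like this - a sketch of the kind of filter meant above, not the actual patch; the helper name and the use of clientHeight are illustrative, and it assumes it is run against the reader content window:

```js
// Hypothetical test helper: resolve only on a scroll event that is not
// accompanied by a change in body height, mirroring the check AboutReader
// itself applies before closing the dropdowns.
function waitForScrollWithoutHeightChange(win) {
  return new Promise(resolve => {
    let lastHeight = win.document.body.clientHeight;
    win.addEventListener("scroll", function onScroll() {
      let height = win.document.body.clientHeight;
      let heightChanged = height !== lastHeight;
      lastHeight = height;
      if (!heightChanged) {
        win.removeEventListener("scroll", onScroll);
        resolve();
      }
    });
  });
}
```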
Would you like me to submit a patch for that?
Comment 15•3 years ago
(In reply to Eemeli Aro [:eemeli] from comment #14)
Indirectly we could confirm my hypothesis by modifying the test I linked to earlier so that it applies the same body height filter as in the runtime code to the scroll events it's waiting for, and seeing if that gets rid of the intermittent failures. At the very least, this should make the expected behaviour a bit more like what's intended.
Would you like me to submit a patch for that?
Yes please! That seems worth a shot. :-)
Comment 16•3 years ago
Once this try run completes without triggering the failure, I'll submit the patch for it: https://treeherder.mozilla.org/jobs?repo=try&revision=03844d39ba9ba7a91b32a3d3118b8f0e0dcdc6ea
Comment 17•3 years ago
(In reply to Eemeli Aro [:eemeli] from comment #16)
Once this try run completes without triggering the failure, I'll submit the patch for it: https://treeherder.mozilla.org/jobs?repo=try&revision=03844d39ba9ba7a91b32a3d3118b8f0e0dcdc6ea
Thanks! I triggered test-verify jobs on that trypush for you. I'm not 100% confident they'll work correctly when triggered after the fact for trypushes but I guess we'll find out.
Comment 18•3 years ago
Looks like I was wrong, and filtering the scroll events doesn't help. I submitted a second try just to make sure. I'm a bit out of ideas here, esp. as this doesn't seem to replicate for me locally (M1 Macbook).
Comment hidden (Intermittent Failures Robot)
Comment 20•3 years ago
I don't have access to a Linux box again until Wednesday, but the screenshot from a failed test makes me wonder if we're scrolled all the way to the top. The fact that we're scrolling by 10px is making this quite tricky to establish at all, I guess. I've asked pernosco to repro in case that pulls up anything, but if it's some kind of race condition then I suspect the slowdown from running under rr might mean this won't bring up anything.
Comment hidden (Intermittent Failures Robot)
Comment 22•3 years ago
(In reply to :Gijs (he/him) from comment #20)
I don't have access to a Linux box again until Wednesday, but the screenshot from a failed test makes me wonder if we're scrolled all the way to the top. The fact that we're scrolling by 10px is making this quite tricky to establish at all, I guess. I've asked pernosco to repro in case that pulls up anything, but if it's some kind of race condition then I suspect the slowdown from running under rr might mean this won't bring up anything.
Apparently pernosco is very unhappy about this repro attempt. Will try to take a look on my linux vm tomorrow.
Comment hidden (Intermittent Failures Robot)
Comment 24•3 years ago
(In reply to :Gijs (he/him) from comment #22)
Apparently pernosco is very unhappy about this repro attempt. Will try to take a look on my linux vm tomorrow.
Unfortunately I can't reproduce the issue at all locally.
Between that and not having any ideas as to what's wrong, and pernosco not working, I'm honestly not 100% sure how to investigate further. We can try adding some logging and debug on try... but it's gonna be tedious.
Reporter
Comment 25•3 years ago
these are failing on the spi-nw variant, so to reproduce it most likely needs to have those preferences set. I don't know if pernosco or the local attempts to reproduce were using those preferences.
Comment 26•3 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #25)
these are failing on the spi-nw variant, so most likely to reproduce it needs to have those preferences set.
Thanks for reminding me, it's a good catch. Unfortunately I appear to have no luck with those set, either. I tried with --verify, I tried running the directory, but no cigar.
I don't know if pernosco or local attempts to reproduce were using those preferences.
I think pernosco would, yeah - my earlier local attempts definitely forgot. I'm also trying with artifact builds, but if anything I'd expect that to be closer to what's run on infra, vs. a "real" local build...
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 29•3 years ago
I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression from what I saw is:
- The test fails on jobs that run on a single core Linux machine. The failure is more likely related to the single core hardware than to the Linux operating system. And actually, I could reproduce on my M1 macbook pro by forcing the test to only run on power efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me)
- The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G
- I noticed that on the slow Linux machine, the test document hasn't finished visually loading (it is not shown yet on the profiler screenshots) when the actual test starts. I pushed to try with an arbitrary 4s wait before the start of the test, and that try run was green. The "Document::NotifyIntersectionObservers" runnable runs during the 4s arbitrary wait: https://share.firefox.dev/3uUbom7 I think it runs at the time the page finishes loading. (A sketch of that kind of delay is included after the stacks below.)
In the failing case, the style tree is invalidated with this stack:
XRE_InitChildProcess []
Task Document::NotifyIntersectionObservers []
IntersectionCallback []
js::RunScript []
$bound [self-hosted:1494:24]
js::RunScript []
_topScrollChange [resource://gre/modules/AboutReader.jsm:1321:18]
DOMTokenList.toggle []
In the passing case, I see invalidations with these stacks:
XRE_InitChildProcess []
promise callback []
js::RunScript []
AsyncFunctionNext [self-hosted:945:26]
js::RunScript []
InterpretGeneratorResume [self-hosted:1957:33]
(root scope) chrome://mochitests/content/browser/toolkit/components/narrate/test/browser_narrate.js [chrome://mochitests/content/browser/toolkit/components/narrate/test/browser_narrate.js:13:15]
Window.scrollBy []
XRE_InitChildProcess []
Task PVsync::Msg_Notify(unlabeled) []
PVsync::Msg_Notify []
RefreshDriver tick []
EventDispatcher::Dispatch []
EventDispatcher::Dispatch scroll []
EventListener.handleEvent []
js::RunScript []
handleEvent [resource://gre/modules/AboutReader.jsm:365:13]
_closeDropdowns [resource://gre/modules/AboutReader.jsm:1240:17]
DOMTokenList.remove []
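For reference, the "arbitrary 4s wait" experiment mentioned in the list above was roughly of this shape (an illustrative sketch, not the landed code):

```js
// Give the reader page time to finish loading, so the late
// Document::NotifyIntersectionObservers runnable fires before the test
// starts waiting for scroll events.
// eslint-disable-next-line mozilla/no-arbitrary-setTimeout
await new Promise(resolve => setTimeout(resolve, 4000));
```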
Reporter
Comment 30•3 years ago
so our tasks are being run sometimes on single core machines? curious if this is perma fail on single core, or intermittent on single core. Our tests should be running on something more realistic - this test runs on t-linux-large-gcp:
https://firefox-ci-tc.services.mozilla.com/provisioners/gecko-t/worker-types/t-linux-large-gcp?sortBy=Last%20Active&sortDirection=desc
we run on an n2-standard-2:
https://hg.mozilla.org/ci/ci-configuration/file/e49187d9b67b02ace49930eae0836defa26098c3/worker-pools.yml#l3905
https://cloud.google.com/compute/docs/general-purpose-machines#n2_machines
:masterwayz - can you confirm what we are running at gcp and the cores? I see we have 2 vCPUs and 8GB of RAM for the machines.
Comment 31•3 years ago
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #30)
so our tasks are being run sometimes on single core machines? curious if this is perma fail on single core, or intermittent on single core.
Intermittent: it fails about 50% of the time in my try run with the profiler running. The profiler likely takes some CPU resources away from the rest of what's happening and may help reproducing.
The CPU info seen by the profiler is:
CPU name: Intel(R) Xeon(R) CPU @ 2.80GHz
CPU cores: 1 physical core, 2 logical cores
I think this matches the "2 vCPUs"
Our tests should be running on something more realistic
https://data.firefox.com/dashboard/hardware says single core machines still represent more than 1.5% of our users. It might still make sense to have some test coverage on single core machines, but it would be nicer if the machine being single core was clearly visible on treeherder.
Reporter
Comment 32•3 years ago
I think in a more ideal world we should have the 1-core (current n2-standard-2) be for our linux32 tests, and then pick a 2 or 4 core machine for the rest of our tests (linux64). I am sure we have plenty of cores on osx/windows/android.
:masterwayz, to add on to my request, is there an option at gcp with 2 || 4 cores that we could add to the pool and test?
Comment 33•3 years ago
So I checked: as with most virtualization technologies (some allow you to manually set these things), one vCPU = one thread on the physical host CPU.
So that means that two vCPUs would be one core with both of its threads.
Four vCPUs would be two cores, each with their associated hyperthreads. Not all modern CPUs have hyperthreading either (the lower-end ones don't), but they both work the same; how it gets reported shouldn't matter, as it should give the same amount of power either way.
To answer Joel, if we mean physical cores:
n2-standard-4 = 2 cores with 4 threads, so four vCPUs total
n2-standard-8 = 4 cores with 8 threads, so eight vCPUs total.
Comment hidden (Intermittent Failures Robot)
Comment 35•3 years ago
Sounds like we want to have a separate bug to change what machines get used for these tests - Joel, can you file that based on the info from Michelle?
I'll try to find time to dig into the test failure some more with the info Florian unearthed from the profiling work anyway.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Updated•3 years ago
Comment 38•3 years ago
This fails frequently with this failure line: TEST-UNEXPECTED-FAIL | toolkit/components/narrate/test/browser_narrate.js | popup is hidden after scroll - false == true - got false, expected true (operator ==).
Backfill range and retriggers: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception%2Crunnable&fromchange=d6aa92d2655910b034dd430187865b1dd646f01f&searchStr=linux%2C18.04%2Cx64%2Cwebrender%2Copt%2Cmochitests%2Cwith%2Cnetworking%2Con%2Csocket%2Cprocess%2Ctest-linux1804-64-qr%2Fopt-mochitest-browser-chrome-spi-nw%2Cbc5&group_state=expanded&selectedTaskRun=biHWUi-jReG7AGQEC-jgsA.0&tochange=345ec3c4436ae80e5f25367b1a07e301aaa416c5
I see that this is an old issue; I will update the bug summary to reflect the current situation with this bug.
Reporter
Comment 39•3 years ago
filed bug 1807666 to track work on multi-core
Comment hidden (Intermittent Failures Robot)
Updated•3 years ago
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 43•3 years ago
Given that this has 234 total failures in the last 30 days https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2022-12-05&endday=2023-01-04&tree=trunk&failurehash=all&bug=1776050 and 99% of them are on linux opt, could we go ahead and disable this until a fix is in place?
Reporter
Comment 44•3 years ago
yes, once the fix is ready it is super easy to turn the test back on!
Comment 45•3 years ago
Updated•3 years ago
Comment 46•3 years ago
Updated•3 years ago
Comment 47•3 years ago
bugherder
Updated•3 years ago
Comment hidden (Intermittent Failures Robot)
Comment 49•3 years ago
(In reply to Florian Quèze [:florian] from comment #29)
I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression from what I saw is:
- The test fails on jobs that run on a single core Linux machine. The failure is more likely related to the single core hardware than to the Linux operating system. And actually, I could reproduce on my M1 macbook pro by forcing the test to only run on power efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me)
It doesn't for me, also on an M1 macbook pro. :-(
Is there some other way to make the local mach test invocation still closer to resembling the single core Linux machine?
- The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G
This is confusing because that is what is supposed to happen, AIUI - the intersection observer should be triggered when there's a scroll, and that will also resolve the waitForEvent promise, right?
- I noticed that on the slow Linux machine, the test document hasn't finished visually loading (it is not shown yet on the profiler screenshots) when the actual test starts. I pushed to try with an arbitrary 4s wait before the start of the test, and that try run was green. The "Document::NotifyIntersectionObservers" runnable runs during the 4s arbitrary wait: https://share.firefox.dev/3uUbom7 I think it runs at the time the page finishes loading.
I guess we could wait for the document to be painted in the child? Perhaps that would be enough?
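One possible shape for that - a sketch only, not a reviewed fix, assuming gBrowser.selectedBrowser is the reader tab - would be to wait for two animation frames in the content process before the first scroll assertion, which only happens once the document has been painted:

```js
// Wait until the reader document has actually been painted in the child,
// so late layout work can't race with the scroll the test is about to do.
await SpecialPowers.spawn(gBrowser.selectedBrowser, [], async () => {
  await new Promise(resolve =>
    content.requestAnimationFrame(() => content.requestAnimationFrame(resolve))
  );
});
```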
Comment 50•3 years ago
(In reply to :Gijs (he/him) from comment #49)
(In reply to Florian Quèze [:florian] from comment #29)
I profiled this test failure per Gijs' suggestion. After looking at multiple profiles, my current impression from what I saw is:
- The test fails on jobs that run on a single core Linux machine. The failure is more likely related to the single core hardware than to the Linux operating system. And actually, I could reproduce on my M1 macbook pro by forcing the test to only run on power efficiency cores (taskpolicy -c background ./mach test toolkit/components/narrate/test/browser_narrate.js --profiler --run-until-failure reproduces for me)
It doesn't for me, also on an M1 macbook pro. :-(
Is there some other way to make the local mach test invocation still closer to resembling the single core Linux machine?
I don't know. I can still reproduce failures of the test with my current mozilla-central local build with the exact command from comment 29. The first time I got a "Test timed out" failure, and the second time a failure similar to what we were investigating here.
- The test seems to fail when "Runnable - Document::NotifyIntersectionObservers" happens during "ContentTaskUtils.waitForEvent - scroll": https://share.firefox.dev/3UXfW5G
This is confusing because that is what is supposed to happen, AIUI - the intersection observer should be triggered when there's a scroll, and that will also resolve the waitForEvent promise, right?
I don't know what's expected or unexpected. I only observed a difference in the markers in the passing runs vs. the failing run. Here's a view of the same profile that also shows the passing runs: https://share.firefox.dev/3jXlEYN
I guess we could wait for the document to be painted in the child? Perhaps that would be enough?
Maybe? It would be nice to understand the root cause of the intermittent failure, but we might indeed also want a quick solution to hide the problem.
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Updated•2 years ago
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment hidden (Intermittent Failures Robot)
Comment 57•11 months ago
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Updated•11 months ago