Perma Android async-scrolling/position-fixed-async-zoom-1.html == async-scrolling/position-fixed-async-zoom-1-ref.html | image comparison, max difference: 255, number of differing pixels: 15000
Categories
(Core :: Panning and Zooming, defect)
Tracking
| Release | Tracking | Status |
|---|---|---|
| firefox131 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: dholbert)
References
Details
(Keywords: intermittent-failure, intermittent-testcase)
Attachments
(2 files)
Filed by: sstanca [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=471831635&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Gvlcf1OkTdmUqtCeATGFtA/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Gvlcf1OkTdmUqtCeATGFtA/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
Comment 1•11 months ago
This started permafailing once this push landed on autoland. From backfills and retriggers, it seems that this goes back a long way, failing even where it was previously green.
Updated•11 months ago
Assignee
Comment 2•11 months ago
Maybe something changed or got deployed/enabled (outside of the autoland/mozilla-central repos) related to Android + nofission test configurations in the past day?
(I recall owlish and whimboo greening up tests with Android+fission recently; not sure if either of them are aware of any such change? CC'ing them both in case they have ideas.)
Assignee
Comment 3•11 months ago
Though also: this seems like it turned out not to be a perma-fail, right? There are some pushes where it looks perma-faily, but other pushes where we've got some green and some orange, and others (older and newer than the fail-prone ones) that look fully green.
Assignee
Comment 4•11 months ago
(I guess specifically for the Android "R-nofis-bk" configuration (note the "-bk" suffix), this does currently seem like a perma-fail. On other platforms, it looks more intermittent.)
[EDIT: since I didn't know and have now found out: "-bk" here means "backfill" tests, i.e. a task from a failing run, backfilled to run on previous builds, or something along those lines. Also, the intermittent-ness is potentially not really intermittent; see comment 8.]
Comment 5•11 months ago
Nothing I worked on should have affected that. However, when I compare screenshots of the failures across different tests, I notice that the test image has a zoom value set to twice that of the reference image. This suggests there might be an issue with async zooming?
Comment 6•11 months ago
This is similar to bug 1913376, where Android 7.0 reftest failures were also observed for a particular combination of reftest manifests. This dynamic scheduling got turned on recently in bug 1893181.
Is there anything shared between the failing tests in the two bugs?
Comment 7•11 months ago
Most likely this fails when run as a standalone manifest vs. in a big bunch of manifests. Usually this is the result of a previous test changing the state of the browser. In other cases it could be that the browser naturally does something (e.g. garbage collection) before this test runs, and the test is now running in a browser where some service it depends on hasn't been fully initialized.
Assignee
Comment 8•11 months ago
(In reply to Daniel Holbert [:dholbert] from comment #3)
> Though also: this seems like it turned out not to be a perma-fail, right? There are some pushes where it looks perma-faily, but other pushes where we've got some green and some orange, and others (older and newer than the fail-prone ones) that look fully green.
Ah, it looks like maybe the green logs are all just logs that ran hardly any tests at all (and not these tests in particular).
e.g. compare these two runs (a33f8b42c8f35579dd1e9f74036641b094bd3f4d, where this is green, vs. its parent c0f325a58cd9f31aafbe33c36e8d5963cfc50511, where this looks perma-orange, on "Android 7.0 x86-64 Lite WebRender opt" R-nofis and R-nofis-ship R6):
https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=android%2C7.0%2Copt%2Creftests%2Ctest-android-em-7.0-x86_64-lite-qr%2Fopt-geckoview-reftest-nofis%2Cr6&tochange=a33f8b42c8f35579dd1e9f74036641b094bd3f4d&fromchange=c0f325a58cd9f31aafbe33c36e8d5963cfc50511
The green logs only run 47 tests, and they're 4 minutes & 6 minutes long (respectively) -- here's the link and a quote from the first one:
https://firefoxci.taskcluster-artifacts.net/IRK2uhq-SqSpEgm275e4FQ/0/public/logs/live_backing.log
[task 2024-08-26T17:50:46.701Z] 17:50:46 INFO - REFTEST SUITE-START | Running 47 tests
[task 2024-08-26T17:50:46.701Z] 17:50:46 INFO - REFTEST TEST-START | gfx/tests/reftest/468496-1.html == gfx/tests/reftest/468496-1-ref.html
[...]
[task 2024-08-26T17:51:16.966Z] 17:51:16 INFO - REFTEST TEST-END | gfx/tests/reftest/1912431-emoji-globalAlpha.html == gfx/tests/reftest/1912431-emoji-globalAlpha-ref.html
[task 2024-08-26T17:51:16.966Z] 17:51:16 INFO - REFTEST INFO | Result summary:
[task 2024-08-26T17:51:16.966Z] 17:51:16 INFO - REFTEST INFO | Successful: 40 (40 pass, 0 load only)
[task 2024-08-26T17:51:16.966Z] 17:51:16 INFO - REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
[task 2024-08-26T17:51:16.966Z] 17:51:16 INFO - REFTEST INFO | Known problems: 7 (0 known fail, 0 known asserts, 1 random, 6 skipped, 0 slow)
[task 2024-08-26T17:51:16.966Z] 17:51:16 INFO - REFTEST SUITE-END | Shutdown
vs. the orange tests look like this and run 3163 tests (including the ones in question here), taking 25-40min:
https://firefoxci.taskcluster-artifacts.net/BDpzYP1bS8KsLAexwIIxOQ/0/public/logs/live_backing.log
[task 2024-08-26T17:10:03.297Z] 17:10:03 INFO - REFTEST SUITE-START | Running 3163 tests
[task 2024-08-26T17:10:03.298Z] 17:10:03 INFO - REFTEST TEST-START | layout/reftests/bugs/105-1.html == layout/reftests/bugs/105-1-ref.html
[...]
[task 2024-08-26T17:31:22.723Z] 17:31:22 INFO - REFTEST TEST-END | layout/reftests/position-dynamic-changes/horizontal/fromauto-leftA-widthA-rightN.html?border_parent == layout/reftests/position-dynamic-changes/horizontal/leftA-widthA-rightN-ref.html?border_parent
[task 2024-08-26T17:31:22.723Z] 17:31:22 INFO - REFTEST INFO | Result summary:
[task 2024-08-26T17:31:22.723Z] 17:31:22 INFO - REFTEST INFO | Successful: 2993 (2993 pass, 0 load only)
[task 2024-08-26T17:31:22.723Z] 17:31:22 WARNING - REFTEST INFO | Unexpected: 6 (6 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 failed load, 0 exception)
[task 2024-08-26T17:31:22.723Z] 17:31:22 WARNING - One or more unittests failed.
[task 2024-08-26T17:31:22.723Z] 17:31:22 INFO - REFTEST INFO | Known problems: 164 (47 known fail, 0 known asserts, 23 random, 94 skipped, 0 slow)
[task 2024-08-26T17:31:22.723Z] 17:31:22 INFO - REFTEST SUITE-END | Shutdown
jmaher, is there any reason why we'd be picking such a vastly different set of tests for the same configuration from one run to the next? Note that the only commit that differs between these two runs is https://hg.mozilla.org/integration/autoland/rev/a33f8b42c8f35579dd1e9f74036641b094bd3f4d which didn't add or remove any tests, and hence doesn't seem like it should have changed the test bucketing?
Assignee
Updated•11 months ago
Assignee
Comment 9•11 months ago
(Are we doing something tricky like e.g. analyzing the commit itself to figure out which tests should get included in the R-nofis(R6) bucket?)
It seems pretty suspicious that both of Robert Longson's pushes yesterday (for bug 1914345) happened to be the ones that got the broader bucket of R-nofis(R6) and R-nofis-ship(R6) tests, whereas their direct child commits (which touched zero reftests) mysteriously got a much smaller (and hence greener) bucket of R-nofis(R6) tests:
https://treeherder.mozilla.org/jobs?repo=autoland&group_state=expanded&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=android%2C7.0%2Copt%2Creftests%2Ctest-android-em-7.0-x86_64-lite-qr%2Fopt-geckoview-reftest-nofis%2Cr6&tochange=abeda35559be4439d219295722d3d364be36850e&fromchange=92acbb3b404d41b2fc9a3f5307603d42b70f31fa
Whatever's responsible for the variability here, we might want to rethink it, because it leads to confusion about whether a test is a perma-failure or not, and about whether a patch introduced a test failure or not (e.g. longsonr's bug 1914345 got repeatedly backed out for supposedly causing this test failure, when really it seems to have been a preexisting test failure, in a test that only gets run under some semi-mysterious conditions).
Comment 10•11 months ago
We determine the tests based on the code, selecting the tests most relevant to the change. Every 20th push (or every 2 hours) we run ALL the tests, to ensure we are not missing coverage for too long of a gap.
Mochitest + xpcshell + wpt have been running like this for 4.5 years; now reftests can run like this too, reducing load on autoland and try (if using ./mach try auto) a bit.
Ideally all tests should be able to run independently of any other test and in any order, but we know that isn't always the case. I am on PTO this week; otherwise I would do a series of try pushes to narrow down which test runs beforehand, in order to make this pass.
Assignee
Comment 11•11 months ago
Thanks for the reply when on PTO! That info helps.
> otherwise I would do a series of try pushes to narrow down which test runs beforehand, in order to make this pass.
I might poke at doing that myself a bit. It's not yet clear to me that this test is passing at all in any recent test run. In the green test runs that I've looked at so far, the test is simply not-running-at-all. (Maybe it is passing in other test runs on other pushes, but I'm not sure how/where to look to find those test runs where it might hypothetically be getting run + passing.)
Assignee
Comment 12•11 months ago
Update: the test is indeed usually passing; it's only failing when it gets run in combination with just the right set of prior reftest directories.
In this case the relevant prior-reftest directory is layout/reftests/meta-viewport. So this issue is reproducible if you run e.g.:
./mach try fuzzy layout/reftests/meta-viewport/ layout/reftests/async-scrolling -q "android lite reftest"
I just tested a patch which reduced the reftest.list files drastically, and it still reproduced the issue:
https://hg.mozilla.org/try/rev/c4205fe6d189c75565a90af87944043a999775eb
Here's the try run showing the issue reproducing with that patch:
https://treeherder.mozilla.org/jobs?repo=try&revision=8efc2e09141d04a12494b1a2152458431f3d581a
I reduced the meta-viewport reftest.list file to just its defaults line plus a single test:
defaults pref(dom.meta-viewport.enabled,true) pref(apz.allow_zooming,true)
So I suspect those default prefs are involved here -- they're perhaps not getting cleared properly when we reach the end of that directory, or something along those lines?
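As a concrete sketch, the whole reduced manifest looks roughly like this (only the defaults line is verbatim from above; the single test entry below is just illustrative, and the actual test and reference file names may differ):
defaults pref(dom.meta-viewport.enabled,true) pref(apz.allow_zooming,true)
# hypothetical single entry -- real test/reference names may differ
== desktop-mode.html desktop-mode-ref.html
The defaults line sets those prefs for every test entry in this manifest file; the suspicion above is that they might not be getting reset when the harness moves on to the next directory.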
Assignee
Comment 13•11 months ago
I'm pretty sure this is the culprit:
https://searchfox.org/mozilla-central/rev/45d6f8bf028e049f812aa26dced565d50068af5d/layout/reftests/meta-viewport/desktop-mode.html#24-26
let win = SpecialPowers.wrap(window);
SpecialPowers.spawnChrome([win.browsingContext.id], id => {
  BrowsingContext.get(id).forceDesktopViewport = true;
});
This forceDesktopViewport setting persists and potentially impacts the tests that follow this one.
I'm working on adding a new dummy reftest to run right after this one, which simply restores that setting to its default value. I'm pretty sure that'll fix this.
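Roughly, the cleanup reftest's script would just mirror the snippet above and reset the flag (this is a sketch, not the actual patch; it assumes false is the default value):
let win = SpecialPowers.wrap(window);
SpecialPowers.spawnChrome([win.browsingContext.id], id => {
  // Undo desktop-mode.html's change so later reftests start from the default state.
  BrowsingContext.get(id).forceDesktopViewport = false;
});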
Assignee
Comment 14•10 months ago
This patch doesn't change the test's behavior at all; it's just syntactic sugar. This makes things cleaner in the next patch in the series (allowing us to ergonomically return a value from the chrome task to the test logic).
Assignee
Comment 15•10 months ago
The reftest desktop-mode.html modifies forceDesktopViewport, and it needs that modification to still be in effect when its reftest snapshot is taken. However, we also need to undo that modification, to avoid having an impact on subsequent reftests. This patch adds a "dummy" reftest to do that cleanup work, which will now be run immediately after desktop-mode.html.
This patch also makes both reftests validate their expectations about what value the forceDesktopViewport setting should have when the test gets run. If forceDesktopViewport doesn't have the expected value, then the tests change their background to red, to trigger a test-failure.
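As a rough illustration of that validation pattern (a sketch only -- the variable names and exact structure here are my assumptions, not the literal patch), the check in each test looks something like:
// Hypothetical sketch; assumes it runs inside an async setup function in the test.
let expected = false; // false for desktop-mode.html; true for desktop-mode-cleanup.html
let win = SpecialPowers.wrap(window);
let actual = await SpecialPowers.spawnChrome([win.browsingContext.id], id => {
  // Read the flag in the parent process and return it to the test.
  return BrowsingContext.get(id).forceDesktopViewport;
});
if (actual !== expected) {
  // Paint the page red so the reftest snapshot won't match its reference.
  document.body.style.backgroundColor = "red";
}
(desktop-mode.html expects the flag to start in its default off state; desktop-mode-cleanup.html expects desktop-mode.html to have just run and turned it on.)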
Assignee
Comment 16•10 months ago
Try run without patch stack (expected to be orange):
https://treeherder.mozilla.org/jobs?repo=try&revision=17d7be9ade018c5cfdfb035368b66632ff081ef2
Try run with patch stack (expected to be green):
https://treeherder.mozilla.org/jobs?repo=try&revision=cb2a909fae877033ed0b0812f5288b59962a27da
(Both try runs are running reftests in layout/reftests/meta-viewport and layout/reftests/async-scrolling.)
Assignee
Comment 17•10 months ago
Here's a try run including test-verify tasks:
[EDIT: updated link to include a try run that actually has my patches, so that test-verify hopefully actually runs the tests]
https://treeherder.mozilla.org/jobs?repo=try&revision=ee0d5d9e28d98e17443b39fa2d8c69c80a97ca21
I anticipate that test-verify will fail for desktop-mode-cleanup.html, since that test has a built-in strong expectation that desktop-mode.html will have run prior to it and made the modification that the -cleanup.html test is there to undo. (And in test-verify, that expectation doesn't hold up, because the test gets run in isolation.)
I don't have strong feelings about whether to keep that expectation in the test. It's not mandatory, and it's kinda nice for strictness (in case the tests get reshuffled apart somehow in the future), but it's unfortunate that it makes the test fail in test-verify mode. Though I'm also not super-concerned about test-verify in this case, since this test will hopefully never need to be modified (and hence won't get TV tasks often, or ever, once it's added); and I've added a comment in the test itself mentioning that it's known and sort-of-intentional that the test will fail when run on its own. So hopefully that avoids confusion/trouble in a future world where this test gets occasional TV tasks scheduled which end up failing.
Assignee
Comment 18•10 months ago
I guess the preexisting test desktop-mode.html itself will also now fail verification, specifically in the "run test 10 times in a row" modes. The new expectation that I added to that test (that forceDesktopViewport is initially off) won't hold up if the test is run in a loop (without its -cleanup.html buddy). As above, we could relax this expectation in the test, but it's kind of nice as validation that the test pair as a whole is sound (i.e. that the value we're setting in the cleanup test is in fact the original/default value).
In a perfect world, we wouldn't need the separate -cleanup.html test file, and we'd be able to make this reftest standalone and atomic by scheduling the "cleanup" work (restoring forceDesktopViewport) to happen after the reftest snapshot. But I'm not sure we can arrange for that; once the reftest snapshot has been taken, the test harness navigates away from the test, and I don't think there's a way for a test to reliably schedule JS to run at that point.
Assignee
Comment 19•10 months ago
Observations from try run in comment 17:
- desktop-mode-cleanup.html does indeed fail in test-verify mode, for reasons described in comment 17.
- desktop-mode.html fails in test-verify mode in the "Run each test 10 times in one browser" section (and its "in chaos mode" variant), for reasons described in comment 18.
Seeing the actual test failures in the Try run, I'm feeling like that's probably not great (particularly for the "real" reftest desktop-mode.html), so I'm inclined to remove the precondition so that the tests pass verification.
One other observation: both of these tests apparently get skipped in most test-verify tasks -- they're only run in test-verify-gpu verification (which is Linux-only). In other test-verify tasks, the log shows them being skipped/discarded like so:
INFO - Per-test run (non-gpu) discarded gpu test layout\reftests\meta-viewport\desktop-mode-cleanup.html (gpu)
INFO - Per-test run (non-gpu) discarded gpu test layout\reftests\meta-viewport\desktop-mode.html (gpu)
WARNING - No tests to verify.
Comment 20•10 months ago
Comment 21•10 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a8224d80f9d1
https://hg.mozilla.org/mozilla-central/rev/ad62a16baf7a
Comment hidden (Intermittent Failures Robot)