We run a lot of duplicated reftests

NEW
Unassigned

Status

enhancement
P4
normal
11 months ago
4 months ago

People

(Reporter: mats, Unassigned)

Tracking

(Blocks 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 2 obsolete attachments)

It appears we're running tests both in layout/reftests/w3c-css/submitted/
and testing/web-platform/tests/css/vendor-imports/mozilla/mozilla-central-reftests/ which are duplicates of each other.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=56aeabe8b84a3874b2ec1d06389894e22fdb9f5c

Shouldn't we exclude the latter to conserve computing resources?

Also, layout/reftests/w3c-css/submitted/README claims to be
the original and that changes should be made there.
But then if I make a code change that requires a test change,
am I supposed to  mark the temporarily obsolete copy in
the other place as failing, and then update the meta data
again once the updated test is vendored in again?
This seems rather tedious...
IIRC, the main blocker to this is that wpt don't run on all supported platforms yet (i.e. Android).
(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)
> IIRC, the main blocker to this is that wpt don't run on all supported
> platforms yet (i.e. Android).

See also bug 1423971, which this might be a dupe of.
Bug 1423971 would be the ideal fix, but can we do something in the meantime?
I don't see the point of running the WPT copy until that bug is fixed.
Per bug 1481723 we don't even run them on Windows and Mac -- they're run only on Linux.

Given the various things that our reftest harness does that the wpt reftest harness doesn't do, I'm inclined to say that for now we shouldn't run the wpt copy.  The reason to add things to w3c-css/submitted rather than directly to the wpt directories is that we get the various features in our main reftest harness.  (Or, I guess, that they get run on platforms other than Linux...)
I think the argument from computing resources is fairly weak; reftests run fast in CI and the number of duplicate tests is small enough that we're talking about a single digit number of minutes per run, or a negligible fraction of the overall runtime.

The argument that updating the internal copy of the tests is difficult does make sense, and having duplicate copies occasionally causes me a lot of work.

Having said that, I would prefer to frame this discussion around how we can work together to reach a point where the layout team treats wpt tests as first class, rather than trying to figure out what it's acceptable to disable in order to avoid the pain that duplicating tests causes.
I think everyone on the Layout team would be very happy to treat the WPT
files as the originals (as I said in comment 3).  But that seems to be
blocked on a number of things outside of our control (WPT not working
on Android was mentioned in comment 1).
I don't think that all the blocking issues are out of your control.

Several of the issues aren't really about the harness but about unexplained behaviour in Gecko, for example the fact that we see lots of antialiasing-related failures on Windows.

Android is perhaps something of an exception; the recent backstory on that is that maja_zf was working on it before she went on parental leave, and finding cover to continue that work has proven more difficult than we would have liked. However there is the prospect of resuming work there in the near future.

I'm sure I haven't communicated the current state of affairs, or progress that's being made, as well as I should. But if we have agreement on where we want to end up then there's certainly ways to work together to get there faster.
(In reply to James Graham [:jgraham] from comment #7)
> Several of the issues aren't really about the harness but about unexplained
> behaviour in Gecko, for example the fact that we see lots of
> antialiasing-related failures on Windows.

Hmm, are you saying that we get antialiasing failures when running a WPT reftest
but not when running *the same test* in our traditional reftest framework?
If so, is that due to differences in the frameworks, or simply that WPT
lacks the relevant "fuzzy" annotations in the manifest?
What I know at the moment is that the try runs in bug 1481723 show failures on Windows on tests that pass on Linux, with substantial numbers of differences being apparently due to antialiasing. It is *possible* that all these tests need to be marked as fuzzy on Windows, but that seems like a pretty unfortunate situation for wpt since even when fuzzy support in the harness lands [1] we will continually add new tests from upstream without any automated way to update the fuzzy annotations. Unless people are paying very close attention, we're likely to end up with a lot of unnecessary EXPECTED-FAIL on Windows in those imported tests. Therefore I've been hoping there's some set of prefs I can set, or similar, that will give more reliable results on that platform. But so far I haven't managed to dig into this very deeply.

[1] https://github.com/web-platform-tests/wpt/pull/12187
> Therefore I've been hoping there's some set of prefs I can set, or similar,
> that will give more reliable results on that platform.

I think we've discussed that in the past and decided that we explicitly
don't want to do that because then we don't test the actual code path
the test will take in a release build.


> we're likely to end up with a lot of unnecessary EXPECTED-FAIL on Windows in
> those imported tests

I don't see that as a problem for WPT tests that we're currently not running
on these platforms.  BTW, I looked at some of the failures in bug 1481723 and
some of them appears to be bugs in the tests themselves, not anti-aliasing.
(dependencies on font metrics and whatnot)
I don't see any failures in any of the
testing/web-platform/tests/css/vendor-imports/mozilla/mozilla-central-reftests/
tests though.  If the failures are only for tests that we currently don't
run on these platforms we should just mark them as failing and enable
the suite IMO.  It will create an initial backlog of tests for us to
investigate but that seems better to me than not running them at all
on these platforms.

BTW, are the WPT machines from the same pool as other types of tests
on that platform?
> If the failures are only for tests that we currently don't
> run on these platforms we should just mark them as failing and enable
> the suite IMO.  It will create an initial backlog of tests for us to
> investigate but that seems better to me than not running them at all
> on these platforms.

If people are happy with hundreds of extra "pass-looking" failures then I can enable the tests on Windows 10 with very little effort. Windows 7 has something that looks like a scrollbar issue that I'm investigating, and Mac has some compareCanvases issue that needs work.

> BTW, are the WPT machines from the same pool as other types of tests on that platform?

On AWS wpt uses xlarge instances by default whereas reftest seems to use default large instances. I don't know what pooling exists outside of the AWS-backed infrastructure.
> On AWS wpt uses xlarge instances by default whereas reftest seems
> to use default large instances.

Interesting, so I don't know what differences there are between
xlarge/large, could there be differences in graphics cards/drivers,
Windows settings etc?

Would it be possible to do a WPT test run on a "large" machine -
just to see if you get a /different/ set of failures?
Oh actually I just found some different configuration related to Windows (xlarge vs large seems to just be linux). AFAICT Windows 10 reftests are run on hardware and others are not. jmaher seems to have set that up, so I bet he knows the context here.
Flags: needinfo?(jmaher)
we are unable to run reftests on windows 10 in AWS due to issues with the image hanging/crashing (will be blue jobs in treeherder).  I don't know the details, but we run code coverage tests on windows 10 on hardware instead of AWS for the same reason.

I would imagine in November we could be switching reftests on windows10 to AWS as the work to fix the driver issues and installation issues for that driver are recently fixed.
Flags: needinfo?(jmaher)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=78994cdedf7517619e8dc67987f17255ded7ac6a&selectedJob=199332028 is a wpt run using the same Windows machines as for reftests; it still seems to be failing tests in the same way.

Are tests like https://searchfox.org/mozilla-central/source/testing/web-platform/tests/css/CSS2/box-display/display-change-001.xht really buggy (linking to the reftest analyzer is really hard, but it's one of the tests in [1])? That seems to fail due to blurring of the borders of the rendering of the Ahem glyph, but are tests supposed to account for that somehow? If they are it seems like we really really want a lint.

[1] https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/aCOopG5FTRSOa8BdrZF5Ug/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
display-change-001.xht makes the assumption that text using the Ahem font
isn't anti-aliased.  (I think that's a pretty common assumption for tests
using Ahem, BTW).  Apparently it fails on this platform.

https://searchfox.org/mozilla-central/source/testing/web-platform/tests/css/CSS2/box-display/display-change-001.xht

Maybe there's a config knob somewhere to suppress anti-aliasing for
Ahem specifically?  Maybe Jonathan knows.
Flags: needinfo?(jfkthame)
(In reply to Mats Palmgren (:mats) from comment #16)
> display-change-001.xht makes the assumption that text using the Ahem font
> isn't anti-aliased.  (I think that's a pretty common assumption for tests
> using Ahem, BTW).  Apparently it fails on this platform.
> 
> https://searchfox.org/mozilla-central/source/testing/web-platform/tests/css/
> CSS2/box-display/display-change-001.xht
> 
> Maybe there's a config knob somewhere to suppress anti-aliasing for
> Ahem specifically?  Maybe Jonathan knows.

I don't know of a config setting that would do this. In theory, it should be possible to use an appropriate 'gasp' table[1] in the Ahem font to request non-AA rendering; but I don't know whether the various rasterizers across all platforms actually respect this or not.

[1] https://docs.microsoft.com/en-us/typography/opentype/spec/gasp
Flags: needinfo?(jfkthame)
Hmm, I'm surprised there isn't a @font-face feature to control
anti-aliasing per font.

Maybe we could set gfxFont::mAntialiasOption to kAntialiasNone
for Ahem, based on a pref? (enabled in testing profiles)

I'm puzzled how/if other UAs pass these tests...
I think that blink and WebKit disable font antialiasing for tests.

gsnedders tried altering the gasp table for the Ahem font but it didn't seem to work; maybe we got the settings wrong [1]

[1] https://github.com/web-platform-tests/wpt/pull/13010
jfkthame: Do you have some time to look at https://github.com/web-platform-tests/wpt/pull/13010 and see if the changes to Ahem look like they ought to be doing the right thing?

If the consensus is that we should disable antialiasing for Ahem specifically in the test configs, I can look at implementing that but I'm totally unfamiliar with the code so would need some pointers to get going. Otherwise I'd be very happy to test a patch :)
Flags: needinfo?(jfkthame)
(In reply to James Graham [:jgraham] from comment #20)
> jfkthame: Do you have some time to look at
> https://github.com/web-platform-tests/wpt/pull/13010 and see if the changes
> to Ahem look like they ought to be doing the right thing?

Looks like it to me, in principle. I'm a bit doubtful, though, whether the font rasterizer always respects the 'gasp' table; e.g. in the case of Linux, I suspect this might be overridden by stuff in the fontconfig configuration.

Or I guess we might be overriding it somewhere in Moz2d-land or Skia or something.... that's getting pretty far into the gfx side of things, dealing with details of rasterization etc. Maybe Lee would know where to look for more detail.
Flags: needinfo?(jfkthame) → needinfo?(lsalzman)
(In reply to Jonathan Kew (:jfkthame) from comment #21)
> (In reply to James Graham [:jgraham] from comment #20)
> > jfkthame: Do you have some time to look at
> > https://github.com/web-platform-tests/wpt/pull/13010 and see if the changes
> > to Ahem look like they ought to be doing the right thing?
> 
> Looks like it to me, in principle. I'm a bit doubtful, though, whether the
> font rasterizer always respects the 'gasp' table; e.g. in the case of Linux,
> I suspect this might be overridden by stuff in the fontconfig configuration.
> 
> Or I guess we might be overriding it somewhere in Moz2d-land or Skia or
> something.... that's getting pretty far into the gfx side of things, dealing
> with details of rasterization etc. Maybe Lee would know where to look for
> more detail.

I would think it would be better to manipulate font AA via a pref. Trying to manipulate the font itself seems sketchy.
Flags: needinfo?(lsalzman)
I'd argue the opposite:  if we can make the reftests in web-platform-tests more portable between browsers and configurations, that's a pretty substantial win for the usability of those tests (and their accessibility to contributors, etc.).
Posted patch kAntialiasNone (obsolete) — Splinter Review
jgraham: could you test this patch please?

(It intentionally disables AA for *all* fonts currently,
because I wanted to see how many other tests we have that
depend on AA (around 30), but hopefully it should fix
the failing Ahem reftests.)
Flags: needinfo?(james)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=31fefee5116c44e07f67897a7a5e03ff7e7a8bb8&selectedJob=200154874

No change on Linux, fixes the vast majority of issues on Windows 10 (for wpt). Do you also want non-wpt reftests?
Flags: needinfo?(james)
Just WPT is fine, thanks.  I suspect that it would fix most
issues on Win10 if we limit this hack to Ahem only.  There
seems to be some remaining issue with the viewport though:
https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/D1FU5FkFRs6vy2jov8Xo0w/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
(then click "Circle differences")

I'm assuming that the OSX failures are still that the Ahem
fails to install properly there.
Yeah, Windows 7 has some strange issues getting the correct viewport size, which I'm investigating in bug 1490096 (seems to be a gfx issue since I see different behaviour in headless and non-headless mode and they layer tree looks sensible).
OK, so we have several suggested solutions for the Ahem issue:

1. Change the Ahem font so it's no longer antialiased
2. Update all the tests so they only ever do glyph-to-glyph comparisons, so that the antialaising is always the same in test and ref.
3. Disable antialiasing for wpt reftests
4. Disable antialiasing for Ahem specifically in wpt reftests.

I agree with dbaron that 1 would be ideal if we could make it work, but my sense from all the font experts is that it's hard to achieve in practice. 2 seems like a theoretically pure solution that is difficult to implement because it would involve rewriting hundreds of existing tests and probably not be easy to maintain for new tests.

If we agree that 1 and 2 aren't practical, then that leaves 3 and 4. 3 has a considerable disadvantage that it makes the rendering different to a real user session, and may miss bugs where AA affects other layout. But it has the considerable advantage that it's the same as Blink and WebKit do, so we're less likely to run into tests that work in those browsers but fail with AA issues in gecko. 4 is the compromise option where we disable AA for Ahem only. I don't have a sense for how big the advantages/disadvantages of this are compared to option 3; do we have a lot of tests where layout depends on font metrics, but Ahem isn't the font being used in the test?

p.s. the Windows 7 issue seems to be that a window with the dialog flag doesn't necessarily open at the specified size on that platform.
Here's a little more fleshed out patch.  It puts the feature behind
a pref and only enables it for WPT and reftests.  Also, it disables
AA for Ahem only.

There are some failures on OSX I don't understand though:
https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/cnB_dXsBTliKSrohP8GIwg/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
It appears that disabling AA affected the Ahem baseline?
Attachment #9010072 - Attachment is obsolete: true
Another disadvantage of the w3c-css/submitted tests is that they make contributing or changing the tests really difficult to other developers from other engines.

Rego just mentioned that he wanted to adjust one of the contain/ tests for a spec change, and he right now would need to either copy the tests into wpt, or submit the patch to Gecko (marking the tests as failing here), which is probably fine but somewhat unfortunate.

I suggested him to submit a patch moving the reftests to testing/web-platform.
mats: I made a similar patch in bug 1494715, but I think your one already addresses most of the review issues. So I hope you don't mind that I replaced my patch with yours, set you as author, and updated the review [1] (I'll obviously check that the author information is preserved on landing; no idea what Phabricator/Lando do when the commit author changes).

I think in the interests of making progress we shouldn't set the pref on non-wpt reftests for now so that we can handle any changes separately?

[1] https://phabricator.services.mozilla.com/D7200
Depends on: 1481723, 1495430
Flags: needinfo?(mats)
I'm puzzled by the new failures on OSX though (comment 29).
I think we need to understand those first.  Any help with that
would be appreciated.

Also, there are a couple of @font-face reftests that legitimately
need to have AA for Ahem (they load Ahem.ttf under a different
name and compares it to Ahem).  I tried to fix that by just
adding a pref() in the manifest to disable this feature for
those tests but I discovered that it doesn't work.  I suspect
that Ahem was already loaded in that content process and once
it's in the font cache it's too late.  So we need to add a pref
callback to clear the font cache or something.
Alternatively, we could change those few tests to use
a different name than Ahem; that's probably easier.
Flags: needinfo?(mats)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=931935f9781eb22a92c4b739f018266d62c9f93e&selectedJob=202937460 shows 

Assertion failure: IsAtomic<bool>::value || NS_IsMainThread() (Non-atomic static pref 'gfx.font_ahem_antialias_none' being accessed on background thread), at /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/StaticPrefList.h:439

amongst other problems.
Could you try declaring it RelaxedAtomicBool instead?
(assuming it's for prefs like this)
The OSX failures are still that the Ahem font wasn't installed correctly, right?

The Win7 failures:
 [Exception... "Failure" nsresult: "0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: chrome://marionette/content/reftest.js :: toBase64 :: line 224" data: no]
seems to be an error with the snapshotting feature on that platform?
(The debug build says: Assertion failure: [GFX1]: Failed to create DrawTarget, Type: 3 Size: Size(616,638)
so maybe we just ran out of memory?)

The Win10 failures (meshgradient*.svg) makes no sense to me.  The test
snapshot is blank, so I'm guessing they test a SVG feature we don't support?
We should probably just mark them as failing and file a bug in Core/SVG.

Other than that, it looks pretty good to me. ;-)
https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=203425483&revision=6c8d8af7621c834fb62fbeb156634c073aeb136f looks more healthy. Note that this includes updated metadata so we aren't actually passing all tests (and in particular we haven't fixed the AHem-on-OSX issue), but we don't see any more errors.
I think we need to figure out the OSX issue I noted in comment 29
before landing that patch though.
Jonathan, maybe you can take a look?
Flags: needinfo?(jfkthame)
(In reply to Mats Palmgren (:mats) from comment #37)
> I think we need to figure out the OSX issue I noted in comment 29
> before landing that patch though.

Yes, I'm puzzled by that too.

As a first step, I confirmed in a local build that the font metrics we get (as computed by gfxMacFont::InitMetrics()) are identical in the unpatched and patched cases.

The frame tree we end up with for an affected testcase is also identical. So there's no actual change in layout resulting from the patch; the issue is happening during painting only.

Next step is to see what's different about the display list, perhaps...
Flags: needinfo?(jfkthame)
(In reply to Mats Palmgren (:mats) from comment #29)
> It appears that disabling AA affected the Ahem baseline?

I'm not sure this is a baseline issue at all, actually; it just appears like one at first glance. But the failures I looked at involve Ahem glyphs in an element with vertical writing mode, so when we see a discrepancy at the top and/or bottom edges that corresponds to an issue in the *inline* direction.

Moreover, the examples I checked turn out to be already annotated as fuzzy already. The trouble is that when we disable AA for Ahem, the fuzz numbers tend (in vertical writing-mode examples) to get larger. Specifically, we seem to end up consistently painting the Ahem glyph 1 device pixel lower -- i.e. with an inline-position that is 1 dev px advanced -- than would be "correct" according to the element's rect. We *sometimes* also have it offset 1 dev px to the right (i.e. in the block direction), but that's variable depending on the font size.
Here's a testcase that (with a patched build and the gfx.font_ahem_antialias_none pref set to true) can be used to compare the rendering we get with and without AA, in both horizontal and vertical writing modes.

With antialiasing enabled, there's frequently some fuzziness on one or more edges of the block, depending on size and directionality, but there doesn't seem to be any consistent error, just antialias blur. With it disabled, however, the vertical writing-mode example (4th line) clearly shows a consistent 1 pixel offset in the vertical direction.
Updated to also test vertical writing-mode with text-orientation:upright. In this case the problem doesn't occur; the non-AA glyphs are rendered at the expected vertical position (though depending on size, there's sometimes a 1-pixel offset horizontally).

The other thing I notice is that with WebRender enabled, the issue here doesn't seem to occur; it's specific to the non-WR code path for drawing rotated ("sideways") text in vertical modes.

This suggests perhaps we need to pixel-snap the y-coordinate of the center of rotation in the code at https://searchfox.org/mozilla-central/rev/5786b9be9f887ed371c804e786081b476a403104/gfx/thebes/gfxFont.cpp#2306-2319, or something like that.
Attachment #9015021 - Attachment is obsolete: true
A further demonstration that the problem arises due to the rotation origin being at a fractional-device-pixel vertical position. Here, we just repeat the exact same testcase several times. The problem occurs only on lines 2 and 3 out of the 6 examples here; on the other lines, the vertical positioning is OK.
(In reply to Jonathan Kew (:jfkthame) from comment #41)
> This suggests perhaps we need to pixel-snap the y-coordinate of the center
> of rotation in the code at
> https://searchfox.org/mozilla-central/rev/
> 5786b9be9f887ed371c804e786081b476a403104/gfx/thebes/gfxFont.cpp#2306-2319,
> or something like that.

Maybe SetMatrixDouble -> SetMatrix -> ChangeTransform:
https://searchfox.org/mozilla-central/source/gfx/thebes/gfxContext.cpp#954
  mRect.NudgeToIntegers();
So, given bug 1425698 is still unsolved, it seems like blocking the Ahem antialiasing changes, scoped to wpt-only, on a mac-only issue prevents us turning on Windows tests for not much gain. Would there be objections to landing a set of fixes to enable the wpt CSS tests on Windows only? I think that is currently blocked on https://phabricator.services.mozilla.com/D7200 getting a second round of review from jfkthame, and me splitting the relevant patches into windows-specific and mac-specific parts.
Sounds good to me.

Also, once bug 1425698 is resolved, I think we should also enable it
on OSX even if the "rounding issue" discussed above is unresolved.
I'm assuming that we aim to make WebRender the default soon-ish,
so it seems OK to mark those tests as failing in a non-WR OSX build.
All CSS tests now run on Mac and Windows (both with Ahem)

https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&selectedJob=213118961&revision=9c736fa46ae77b97f6d3cbb73e9b32f975c8dba0

is a try run on Android; there's some bustage that I think is related to dpi/scaling. I set devicePixelRatio to 1, but some of the failures e.g. in https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/dTSbpRpoSvSMW_9a5Gea5g/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1 look like the kind of slight misalignment that you get running reftests on high dpi devices. So I'm not sure whether there is some other setting I should tweak, or similar.
(In reply to James Graham [:jgraham] from comment #47)
> All CSS tests now run on Mac and Windows (both with Ahem)

Yay \o/

> is a try run on Android; there's some bustage

It looks to me like AA differences are causing some of these errors...
For example, css/CSS2/css1/c412-hz-box-000.xht which use Ahem text
in the test and compares that to <img> blocks in the reference.
The Ahem text is now NOT doing AA as we intended, but the image
rendering apply AA at the edges (which seems a bit odd to me).
We might be able to make some of these tests pass with an .ini file
that unset our ahem pref on Android only?

Some other failures seems to be unrelated to AA though, e.g.
css/CSS2/tables/fixed-table-layout-003f01.xht which has some
text (the "p" in "200px") protruding below the block size.
It seems weird to me that part of a descender can cause visual
overflow like this when using a regular font.  It might be
technically allowed for a font to do that though, I don't know.
Perhaps we should just mark these tests as failing on Android?
Depends on: 1509983
After figuring out what setting are required, Android reftests are close to landing. It seems like we have a lot of intermittents related to what I think is intermittent rendering of the scrollbar e.g. https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/aZ-5y73ER7Ki_EOFS7jhjg/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1 If anyone knows how to address that I would be very interested to know.
The best fix for cases like this is to split the test in two parts
so that it doesn't overflow the viewport.  Using a smaller font-size
might work too if the font-size isn't relevant to the test.
OK; there's more general work going on to ensure that wpt reftests don't overflow the bounds of the viewport (principally by making the viewport larger), so if that's the only thing to do here then I guess we can go ahead and disable these for now and re-enable them once we have addressed the underlying issue.
Although I'm unclear how to handle cases like the position:sticky test in https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/aHJ22_QqTaS9UfVP6Uhcbw/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1 which uses scrolling child elements. It looks like there are prefs to either disable overlay scrollbars or change the time for them to disappear to 0, so I'm going to try that, but I don't know if there's a preferred option here?
For those two specific tests in comment 52 (tall--auto-32px--omitted-width-nonpercent-height-viewbox.html and position-sticky-grid.html), we could probably adjust the test's CSS to use "overflow:hidden" rather than "scroll", right?  (In the former test, putting "overflow:hidden" on the <body> or <html> element.)

Those tests don't seem to be intending to test scrollbar rendering -- they're just testing rendering in a scrollable-via-JS area, and overflow:hidden is still scrollable-via-JS.  If scrollbars cause screenshotting-problems from fading etc., then that's a pretty good argument for avoiding overflow:scroll in this case.

But: having said that, I also expect there are *other* testcases that *do* intend to test scrollbar rendering (e.g. to exercise edge cases relating to the space that scrollbars may occupy if they're not overlaid). So we probably do need a more general solution beyond just fixing the tests in comment 52.
Perhaps we could set the fade time to "infinity"?
Then we still have general test coverage for overlay scrollbars
but with stable rendering.  Then we only need to reset that
setting for tests that checks the fade-away behavior.

All the recorded blockers here are fixed. Are there other issues that people feel should block deduping the reftests?

No, I don't see any blockers. We can fix any remaining tests
that fails when run as WPT but not as a reftest as we go.

Are we running the WPT reftests on Android now, and if not, should we be blocking on that? Looking at a m-c push, I see Tier 2 Wr jobs on Android 7.0 x86 opt, but no other Android configurations.

Flags: needinfo?(james)

I see the dependent bug 1509983 which enabled them for that configuration, in Tier 2. What prevents those from moving to Tier 1, and should we be running those tests in debug or other Android version configurations?

OK, so the Android configuration has been a little bit in flux, but the situation is clearer now.

We are currently running tests in Fennec as Tier-2 and (just landed, and not yet very green) GeckoView as Tier-3.

The intent is to green up the GeckoView version and then eventually promote that to Tier-1. I expect that Fennec tests will be dropped at some point, although that could be subject to product decisions. After the opt tests are Tier-1 (or at least Tier-2) we can look at debug, but I'm not confident we will be able to run the full suite in that configuration given the limited resources and relatively poor performance. But hopefully I'm wrong.

I don't think there's any plan to run the tests in other android configurations; given the relative performance it's hard to justify running things in ARM Andoid emulators on x86 hardware rather than in x86 Andoid emulators on x86 hardware unless there's a strong justification for doing so. For generic gecko code we've been taking the approach that Android vs !Android is a more significant distinction than ARM vs x86, although there are obviously exceptions for things like the Spidermonkey JIT. If that's something we need to do with (a subset of) wpt, we should discuss that.

Flags: needinfo?(james)
You need to log in before you can comment on or make changes to this bug.