Closed Bug 1341078 Opened 7 years ago Closed 7 years ago

Run CSSWG tests

Categories

(Testing :: web-platform-tests, defect)

defect
Not set
normal

Tracking

(firefox56 fixed)

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: jgraham, Assigned: jgraham)

References

Details

Attachments

(2 files)

Tests will need to be greened up, unstable ones disabled, chunked in an appropriate way, etc.
This is expected to land upstream today, so we need to take some short term action to deal with that, even if it's just skipping the import initially. A better option from my point of view would be to do the import but disable running the tests by default for now.

gps: I understand that you have concerns about imorting this many files into the repo. What do you need me to do to test the hypothesis that it will be problematic? FWIW here's a try run based on a sample of the import:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2b18baa9dfefcae67921318d14ae71304b46a1a
Flags: needinfo?(gps)
Can you please assemble a commit (or series of commits) with all the proposed changes and push that somewhere (like Try)? I want to pull things down locally and poke around.

Aside from straight file count, I'm interested in things like increased bundle size, inode overhead, and `hg update` performance implications.
Flags: needinfo?(gps) → needinfo?(james)
Oh, it wasn't clear if that Try push you referenced is a full import because of the word "sample" in your comment. I'm not sure if that's "sample" meaning "subset" or "sample" meaning "trial run."
That's sample meaning "trial run". The full change has landed upstream now, but I don't think it's substantially different to what's in that try run.
Flags: needinfo?(james)
Other than the file count, the first thing that jumped out to me was the raw repo size increase.

Using `hg bundle -r <revs> --base <parent> -t gzip-v2`, I produced a standalone bundle file containing only the changes in these 2 changesets. Surprisingly, that file was 30,082,219 bytes.

`hg files -r . 'set:added() and size(">100k")'` on the changeset that imports the files shows a bunch of 100+kb files:

$ hg files -r . 'set:added() and size(">100k")' | xargs ls -al | awk '{print $5, $9}' | sort -n -r
8983997 testing/web-platform/tests/css/work-in-progress/hixie/comments-010.xht
3292555 testing/web-platform/tests/css/css-regions-1/interactivity/full-screen/support/big-buck-bunny-240p.webm
3292555 testing/web-platform/tests/css/css-regions-1/elements/support/big-buck-bunny-240p.webm
2473623 testing/web-platform/tests/css/css-regions-1/elements/support/big-buck-bunny-240p.ogv
1795435 testing/web-platform/tests/css/css-regions-1/interactivity/full-screen/support/big-buck-bunny-240p.mp4
1795435 testing/web-platform/tests/css/css-regions-1/elements/support/big-buck-bunny-240p.mp4
1571848 testing/web-platform/tests/css/css-writing-modes-3/support/mplus-1p-regular.ttf
871518 testing/web-platform/tests/css/work-in-progress/ttwf_shenzen/loutongbing/dish-isolated.png
803300 testing/web-platform/tests/css/css-writing-modes-3/support/mplus-1p-regular.woff
803300 testing/web-platform/tests/css/css-text-3/support/mplus-1p-regular.woff
803300 testing/web-platform/tests/css/css-text-3/i18n/support/mplus-1p-regular.woff
803300 testing/web-platform/tests/css/css-text-3/i18n/reference/support/mplus-1p-regular.woff
622724 testing/web-platform/tests/css/css-text-3/i18n/support/GentiumPlus-R.woff
330356 testing/web-platform/tests/css/css-regions-1/elements/support/Three.js
289724 testing/web-platform/tests/css/css-fonts-3/support/fonts/gsubtest-lookup3.otf
272605 testing/web-platform/tests/css/WOFF2/testcaseindex.xht
261428 testing/web-platform/tests/css/css-fonts-3/support/fonts/LinLibertine_Re-4.7.5.woff
244082 testing/web-platform/tests/css/css-writing-modes-3/tools/generators/ucd/DerivedGeneralCategory.txt
235940 testing/web-platform/tests/css/css-writing-modes-3/support/tcu-font.otf
232344 testing/web-platform/tests/css/fonts/CSSTest/csstest-basic-bold.ttf
229636 testing/web-platform/tests/css/fonts/CSSTest/csstest-basic-regular.ttf
227600 testing/web-platform/tests/css/fonts/CSSTest/csstest-basic-italic.ttf
221620 testing/web-platform/tests/css/fonts/CSSTest/csstest-basic-bolditalic.ttf
205416 testing/web-platform/tests/css/compositing-1/mix-blend-mode/support/RGB_Circles.mov
202312 testing/web-platform/tests/css/compositing-1/mix-blend-mode/support/RGB_Circles.webmsd.webm
191182 testing/web-platform/tests/css/work-in-progress/ttwf_bj/babyliner/images/border-image-003.jpg
150641 testing/web-platform/tests/css/work-in-progress/mozilla/mochitest-simplified/packed.js
129820 testing/web-platform/tests/css/css3-selectors/reports/implementreportOpera10.0.html
129738 testing/web-platform/tests/css/work-in-progress/gabriele/css-xml-dom/js/prototype.js
129315 testing/web-platform/tests/css/css3-selectors/reports/implementreportFx3.5.html
129265 testing/web-platform/tests/css/css3-selectors/reports/implementreportKonqueror4.2.2.html
129130 testing/web-platform/tests/css/css3-selectors/reports/implementreportPrince7.0.html
127871 testing/web-platform/tests/css/css-writing-modes-3/text-orientation-script-001.html
127264 testing/web-platform/tests/css/css3-selectors/reports/implementreportTEMPLATE.html
121949 testing/web-platform/tests/css/compositing-1/test-plan/test-plan.html
121335 testing/web-platform/tests/css/css-fonts-3/support/fonts/gsubtest-lookup3.ufo/features.fea
120451 testing/web-platform/tests/css/css-grid-1/grid-items/support/200x200-green.png
119784 testing/web-platform/tests/css/css-fonts-3/support/fonts/gsubtest-lookup1.otf
115747 testing/web-platform/tests/css/work-in-progress/ttwf_bj/babyliner/images/border-image-002.jpg
110078 testing/web-platform/tests/css/work-in-progress/hp/House24.png

Just these files alone is 32,819,523 bytes uncompressed. tar+gzipped is 20,153,151.

Is there any fat we can trim here? That ~9MB comments-010.xht file is especially unappealing.
For the record, investigating this bloat allowed me to find a performance bug in Mercurial when status-based filesets are used. Mercurial 4.2 will be >10x faster on these operations for large repositories.
gsnedders: Can we remove any of these large files from upstream? It seems like things under work-in-progress or reports might be good candidates for removal?
Flags: needinfo?(geoffers+mozilla)
So if we want to run CSSWG tests as part of web-platform tests, what's our plan for the imported tests in layout/reftests/w3c-css both submitted and received? I suppose we don't want to keep two separate sets of them in-tree?
Flags: needinfo?(james)
(Sorry this conversation got split across two bugs, which is entirely my fault).

dbaron has concerns about the ability of the wpt reftest harness to provide the same test depth as the layout/ reftest harness. I have concerns that the latter is unable to support the requirements we have for frequent synchronisation with upstream. In the short term I think it makes sense to run both sets of tests. I plan to concentrate on improving reftest support in the wpt harness. Once that work is complete I think we will be in a better position to decide on the right long-term strategy.
Flags: needinfo?(james)
As I've said elsewhere, I think it would be better to add the necessary manifest features to the reftest harness in layout/tools/reftest/.

I also think the performance of running all the CSSWG reftests using the wpt harness is likely to be an unacceptable load on our infrastructure.  (Running them with the layout/tools/reftest/ harness might be as well, but it's at least faster.)
(In reply to James Graham [:jgraham] from comment #8)
> gsnedders: Can we remove any of these large files from upstream? It seems
> like things under work-in-progress or reports might be good candidates for
> removal?

Maybe, we probably want to move the regions tests away from big-buck-bunny (as was done with the Opera media tests) to what we already have in /common (though sadly we'll still have to duplicate the files given we can't actually use symlinks because Windows).

We probably also want to try and push as much out of work-in-progress (given 99% of it is an ancient dumping ground).

Many of the large files are fonts, which are probably harder to deal with.

I agree implementation reports can die, though.

(In reply to James Graham [:jgraham] from comment #10)
> (Sorry this conversation got split across two bugs, which is entirely my
> fault).
> 
> dbaron has concerns about the ability of the wpt reftest harness to provide
> the same test depth as the layout/ reftest harness. I have concerns that the
> latter is unable to support the requirements we have for frequent
> synchronisation with upstream. In the short term I think it makes sense to
> run both sets of tests. I plan to concentrate on improving reftest support
> in the wpt harness. Once that work is complete I think we will be in a
> better position to decide on the right long-term strategy.

We have discussion about that in bug 1263568.
Flags: needinfo?(geoffers+mozilla)
8983997 testing/web-platform/tests/css/work-in-progress/hixie/comments-010.xht 

https://github.com/w3c/web-platform-tests/pull/5321

3292555 testing/web-platform/tests/css/css-regions-1/interactivity/full-screen/support/big-buck-bunny-240p.webm
3292555 testing/web-platform/tests/css/css-regions-1/elements/support/big-buck-bunny-240p.webm
2473623 testing/web-platform/tests/css/css-regions-1/elements/support/big-buck-bunny-240p.ogv
1795435 testing/web-platform/tests/css/css-regions-1/interactivity/full-screen/support/big-buck-bunny-240p.mp4
1795435 testing/web-platform/tests/css/css-regions-1/elements/support/big-buck-bunny-240p.mp4

https://github.com/w3c/web-platform-tests/pull/5326

1571848 testing/web-platform/tests/css/css-writing-modes-3/support/mplus-1p-regular.ttf

https://github.com/w3c/web-platform-tests/pull/5328

871518 testing/web-platform/tests/css/work-in-progress/ttwf_shenzen/loutongbing/dish-isolated.png
115747 testing/web-platform/tests/css/work-in-progress/ttwf_bj/babyliner/images/border-image-002.jpg

https://github.com/w3c/web-platform-tests/pull/5327

129820 testing/web-platform/tests/css/css3-selectors/reports/implementreportOpera10.0.html
129315 testing/web-platform/tests/css/css3-selectors/reports/implementreportFx3.5.html
129265 testing/web-platform/tests/css/css3-selectors/reports/implementreportKonqueror4.2.2.html
129130 testing/web-platform/tests/css/css3-selectors/reports/implementreportPrince7.0.html
127264 testing/web-platform/tests/css/css3-selectors/reports/implementreportTEMPLATE.html

https://github.com/w3c/web-platform-tests/pull/5320


Would landing these be enough to assuage the concern about the raw repo size?
Flags: needinfo?(gps)
Honestly, 30 MB compressed was tolerable. I you told me there was no way to make the content smaller, I would have said "OK." I just don't like needless fat and am glad we're cutting some away :)

After removing the large files, my biggest concern is file count. This is a bit annoying. But if we have to take ~30k new files, we have to do it.

With the addition of Servo + Rust crates, test262, and now WPT, we've added a quite a number of files to the Firefox repo this year. We're getting to the point where a sparse checkout (only realize a subset of files in the working directory) is becoming a necessity, especially for automation. I'm trying to push against upstream Mercurial to get this feature integrated into core. (It exists as a Facebook-maintained extension now.) I'd prefer it be an officially supported extension before we start leaning on it too much at Mozilla. But that's more my problem than yours.
Flags: needinfo?(gps)
(In reply to David Baron :dbaron: ⌚️UTC+8 from comment #11)
 
> I also think the performance of running all the CSSWG reftests using the wpt
> harness is likely to be an unacceptable load on our infrastructure. 
> (Running them with the layout/tools/reftest/ harness might be as well, but
> it's at least faster.)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f4f9732692123863cd4bc10dbb46ff077e5211b6 implements a reftest backend in wpt similar to the one the reftest harness is using (feedback very welcome). A small local test suggested performance from the point at which marionette connects to the point at which the last test is done within 10-20% of the reftest harness (I didn't gather enough data to be more accurate, but it is at least significnatly faster than the current implementation).

With 12 jobs (too many it turns out) and some instability (adds overhead from extra browser restarts) it takes <100 minutes of machine time to run all the reftests from CSS+wpt with this backend on linux64 opt. It seems to me that, assuming I didn't do something terribly wrong — which is of course quite possible — this is not an unreasonanable amount of extra runtime given the number of tests added.
Depends on: 1363428
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4bb710f46330aef60c02ef5eb5cb08a5dc443f37

This is approximately ready to land, initially enabled on Linux only for resource reasons. However I think gps has some residual concerns about adding these files to the tree.
Flags: needinfo?(gps)
So this is what - ~37000 new files? We're at ~196000 now. So that puts us ~233000.

This will slow down parts of automation and possibly annoy developers. Fortunately, Mercurial 4.3 will have experimental support for sparse checkout. So we can work around perf issues in automation in August. For developers, it may have to wait until the feature stabilizes a bit in 4.4. Either way, Git users may suffer, as Git's sparse checkout support is a PITA to configure. And Git doesn't yet have support for integrating with a filesystem watcher (although patches are actively being discussed on the Git mailing list).

If developers will benefit from having these tests in tree, let's move forward. Please do let me know when you land this so I can let people know to look for oddities on the hg.mo server.
Flags: needinfo?(gps)
If we land this, we should probably remove layout/reftests/w3c-css. That is ~3800 files. Not a lot, though...
> If developers will benefit from having these tests in tree, let's move forward. Please do let me know when you land this so I can let people know to look for oddities on the hg.mo server.

OK. Thanks. I think this is important additional testing, particularly as we do significant work on our layout engine (many of these tests have been running on servo for some time, but we loose that coverage when we import the code to gecko). I am on PTO from 11th July for a week, so my tentative plan at this point is to land this asap when I get back. Some additional metadata changes will be required due to code changes in the interim, but hopefully that's not more than a day or so of effort and I can land this on the 19th or 20th July.

> If we land this, we should probably remove layout/reftests/w3c-css. That is ~3800 files

That has the disadvantage that it would reduce our test coverage on android since wpt doesn't run there (just because that involves lots of "trivial" but high effort work getting stability and it hasn't been priortized). I believe there is a desire to keep the old import around until we fix wpt to work on all platforms.
Depends on: 1381842
Blocks: 1381855
Depends on: 1381856
Comment on attachment 8887511 [details]
Bug 1341078 - Update taskcluster tasks for wpt CSS tests,

https://reviewboard.mozilla.org/r/158354/#review163590
Attachment #8887511 - Flags: review?(ahalberstadt) → review+
Comment on attachment 8887510 [details]
Bug 1341078 - Run CSS tests on Linux,

https://reviewboard.mozilla.org/r/158352/#review163594
Attachment #8887510 - Flags: review?(ahalberstadt) → review+
Blocks: 1381892
Blocks: 1381893
Blocks: 1381898
Blocks: 1381899
Assignee: nobody → james
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: