We are currently running b2g emulator reftest-sanity tests on the core branches. We should expand the set of reftests being run to all passing tests.
Created attachment 681547: Base patch

From Aug-Oct I triaged most of the failing/random reftests that cropped up and ended up with this patch. It's been a while since I ran it, so there are probably new failures by now. I plan on checking it in to cedar and trying to get a stable green run again.
Pushed to cedar: https://hg.mozilla.org/projects/cedar/rev/a0b15032b295
Most of the chunks are getting killed because they are taking more than an hour to run. When I ran these on my desktop they took around 20 min, but I guess the slaves aren't as powerful. We'll want to:
1) Figure out how to speed them up
2) Use more chunks
3) Possibly get the emulators running on mac, where wait times aren't as high

This is all outside the scope of this bug. It'll be a slow process.
I overwrote local changes on Cedar in the latest merge: https://hg.mozilla.org/projects/cedar/diff/5b7cce7a7f1b/layout/reftests/font-inflation/reftest.list
For posterity, here is :cjones' rankings of reftest b2g importance:

== Critical ==
I wouldn't consider shipping a phone without knowing the exact state of these tests. In order of importance.
crashtests
layout/reftests/reftest-sanity
layout/reftests/bugs
layout/reftests/invalidation

== High priority ==
Tests that are critical to the project and for which desktop/android coverage is *not* mostly sufficient. In no particular order.
content/canvas/test/reftest
image/test/reftest
gfx/tests/reftest
layout/reftests/position-dynamic-changes
layout/reftests/text
layout/reftests/canvas
layout/reftests/svg/smil
layout/reftests/svg/as-image
layout/reftests/font-inflation
layout/reftests/transform
layout/reftests/image
layout/reftests/scrolling
layout/reftests/forms
layout/reftests/css-gradients
layout/reftests/ogg-video
layout/reftests/transform-3d
layout/reftests/layers
layout/reftests/flexbox
layout/reftests/webm-video
layout/reftests/selection
layout/reftests/css-selectors
layout/reftests/css-calc
layout/reftests/font-face

== Normal priority ==
Tests that we should run but for which desktop/android coverage *is* mostly sufficient. In no particular order.
content/html/content/reftests
content/test/reftest
layout/reftests/border-radius
layout/reftests/cssom
layout/reftests/text-shadow
layout/reftests/columns
layout/reftests/list-item
layout/reftests/table-width
layout/reftests/css-ui-valid
layout/reftests/css-optional
layout/reftests/box-sizing
layout/reftests/bidi
layout/reftests/font-matching
layout/reftests/table-background
layout/reftests/text-indent
layout/reftests/marquee
layout/reftests/image-element
layout/reftests/indic-shaping
layout/reftests/line-breaking
layout/reftests/datalist
layout/reftests/css-transitions
layout/reftests/svg
layout/reftests/css-visited
layout/reftests/css-charset
layout/reftests/table-dom
layout/reftests/counters
layout/reftests/css-parsing
layout/reftests/unicode
layout/reftests/text-decoration
layout/reftests/box-ordinal
layout/reftests/abs-pos
layout/reftests/table-overflow
layout/reftests/css-placeholder
layout/reftests/css-default
layout/reftests/text-transform
layout/reftests/text-overflow
layout/reftests/pagination
layout/reftests/image-rect
layout/reftests/z-index
layout/reftests/percent-overflow-sizing
layout/reftests/object
layout/reftests/font-features
layout/reftests/image-region
layout/reftests/inline-borderpadding
layout/reftests/css-disabled
layout/reftests/pixel-rounding
layout/reftests/native-theme
layout/reftests/box-shadow
layout/reftests/table-bordercollapse
layout/reftests/floats
layout/reftests/css-import
layout/reftests/text-svgglyphs
layout/reftests/generated-content
layout/reftests/table-anonymous-boxes
layout/reftests/w3c-css
layout/reftests/first-line
layout/reftests/box-properties
layout/reftests/css-mediaqueries
layout/reftests/css-valid
layout/reftests/css-invalid
layout/reftests/first-letter
layout/reftests/css-ui-invalid
layout/reftests/box
layout/reftests/css-enabled
layout/reftests/backgrounds
layout/reftests/ib-split
layout/reftests/tab-size
layout/reftests/border-image
layout/reftests/css-submit-invalid
layout/reftests/margin-collapsing
layout/reftests/dom
layout/reftests/css-required
layout/reftests/css-valuesandunits
layout/reftests/mathml
parser/htmlparser/tests/reftest
editor/reftests
netwerk/test/reftest
toolkit/content/tests/reftests
widget/reftests

== Completely worthless ==
(Just a waste of CPU cycles, please don't run.)
dom/plugins/test/reftest
editor/reftests/xul
layout/reftests/printing
layout/reftests/xul
layout/reftests/xul-document-load
layout/xul
toolkit/themes/pinstripe/reftests
I disabled everything except the critical and high priority tests on cedar. Unfortunately because chunking doesn't take into account skipped tests, and due to the uneven distribution of skipped tests, some chunks are still timing out (since they are still running 1000+ tests while other chunks are only running ~100-200 tests).
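The imbalance described above can be illustrated with a small model. This is only a simplified sketch, not the reftest harness's actual chunking code: the `chunk` and `runnable` helpers, the 1000-test manifest, and the skip pattern are all hypothetical. It assumes tests are split into contiguous, equal-length chunks before skipped tests are dropped, and compares that with filtering first.

```python
def chunk(tests, total_chunks):
    """Split tests into total_chunks contiguous slices of near-equal length."""
    size, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks


def runnable(tests):
    """Keep only the tests that are not marked as skipped."""
    return [t for t in tests if not t["skip"]]


# Hypothetical manifest: 1000 tests, the first 600 of which are skipped
# (standing in for whole directories disabled on B2G).
tests = [{"name": f"test-{i}", "skip": i < 600} for i in range(1000)]

# Chunk first, drop skipped tests afterwards: workload per chunk is uneven.
chunk_then_filter = [len(runnable(c)) for c in chunk(tests, 4)]

# Drop skipped tests first, then chunk: every chunk gets the same workload.
filter_then_chunk = [len(c) for c in chunk(runnable(tests), 4)]

print(chunk_then_filter)  # [0, 0, 150, 250]
print(filter_then_chunk)  # [100, 100, 100, 100]
```

In this toy model, chunking before filtering leaves two chunks with nothing to run while one chunk carries more than half of the work; filtering first evens the load out.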
The easiest solution would probably be to create a separate reftest_b2g.list root manifest. This way we wouldn't technically be skipping everything and all chunks would see an even distribution of tests.
Yes, for now, comment 7 is the way forward. The chunking problem itself is being worked on in bug 818156.
Created attachment 699322: Patch 1.0 - Enable larger set of b2g reftests

I'm fairly confident that the set of reftests enabled by this patch is green enough on cedar to get the ball rolling.

Notes:
* the two important files to look at are layout/reftests/reftest.list, to see the overall set of tests that will be run, and layout/tools/reftest/runreftestb2g.py, since I had to turn off <iframe mozbrowser> due to bug 785074, which causes tons of additional failures
* I used skip-if instead of random or fails-if to avoid unnecessary test slave load
* after landing this patch we'll need to update the mozharness configs to point to the root manifest instead of reftest-sanity
* if you'd rather I create a separate root manifest for B2G as opposed to skip-if'ing everything in the main one, I can attach a new patch

There are obviously still some fundamental problems with the reftest harness on B2G and this patch isn't going to make anyone happy (including myself). But at the end of the day it will put us in a better position in terms of test coverage than we are currently.
Comment on attachment 699322: Patch 1.0 - Enable larger set of b2g reftests

Looks like we're still getting the odd random orange on cedar; I guess we can cover those with new bugs, or skip those as well if they become too frequent.
https://hg.mozilla.org/releases/mozilla-aurora/rev/885f829b692b This patch applied cleanly to aurora, but there were massive differences on the b2g-18 branch. I don't think that merging it by hand will produce a green test run anyway, so I'd advocate not turning these tests on there and waiting for the next merge.
I made a typo when merging the root manifest to aurora: https://hg.mozilla.org/releases/mozilla-aurora/rev/a6a8dd94822b
(In reply to Andrew Halberstadt [:ahal] from comment #15)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/885f829b692b
>
> This patch applied cleanly to aurora, but there were massive differences on the b2g-18 branch. I don't think that merging it by hand will produce a green test run anyway, so I'd advocate not turning these tests on there and waiting for the next merge.

There aren't any planned merges to b2g18. We may need to bite the bullet and hide the tests on b2g18 until we can exclude all of the failures there.
Comment on attachment 699322: Patch 1.0 - Enable larger set of b2g reftests

Sorry, this f? hit me at a really bad crunch time. I didn't look through all the manifest changes, but it's usually bad form to disable tests without a bug to re-enable them or a comment explaining why.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #19)
> Sorry, this f? hit me at a really bad crunch time

No worries, I mostly just wanted you to be aware of this horrible patch and the fact that more tests are running.

> it's usually bad form to disable tests without a bug to re-enable or comment explaining why.

Agreed, though:

A) We are running 10 chunks at 30+ minutes each, for over 5 hours of B2G reftest per push. Realistically these tests are never coming back on with emulators, as we just don't have capacity for even this much. When we switch to pandaboards I'll re-enable everything and re-triage on pandas, and emulator reftests will be phased out.

B) There are so many failures (possibly in the thousands) that I don't know if it is harness, platform, emulator or test related. The best I could do is comment with a tracking bug, which isn't much more useful than nothing at all.
Are there any tracking bugs filed on further-increasing the number of reftests that are run on B2G? We've got a frightening number of entire subdirectories marked as "skip-if(B2G)", from this bug's changeset - this part in particular, tweaking the toplevel reftest.list file: http://hg.mozilla.org/mozilla-central/diff/d932f2172ce2/layout/reftests/reftest.list (I'm assuming the situation was worse beforehand, but this still leaves us in a pretty bad state, reftest-coverage-wise, and I'm hoping we have plans to get better. :))
ahal, can you answer dholbert?
(In reply to Daniel Holbert [:dholbert] from comment #21)
> Are there any tracking bugs filed on further-increasing the number of reftests that are run on B2G?
>
> We've got a frightening number of entire subdirectories marked as "skip-if(B2G)", from this bug's changeset - this part in particular, tweaking the toplevel reftest.list file:
> http://hg.mozilla.org/mozilla-central/diff/d932f2172ce2/layout/reftests/reftest.list
>
> (I'm assuming the situation was worse beforehand, but this still leaves us in a pretty bad state, reftest-coverage-wise, and I'm hoping we have plans to get better. :))

Yes, I'd really like to get more tests enabled as well. The main problem is that reftests can't be run on the Ubuntu AWS VMs (bug 818968), so we have to keep them running on actual hardware. Combined with the fact that they are *very* slow on the emulators (~40 minutes for 500 tests), we can't just wholesale enable them without getting long test backups. That being said, since we've moved other tests off of the Fedora pool, we do have a bit of spare capacity to enable some more tests. I've blogged/posted to dev.b2g about this in the past but no one seemed interested.

To answer your question, there aren't any bugs filed to enable specific swathes of tests, but if you would like to file some, feel free to make them block the 'b2g-reftest' main tracking bug. Or if you want to give me a list of the currently disabled tests that you think would be most useful to enable, I'd be happy to enable them when I have a few spare cycles.
Not really 40 minutes for 500 tests; it's actually more like 25 minutes for 500 tests, plus 15 minutes of setup/teardown time per chunk. We could easily add another 1000 tests and not lose anything, just by switching from 10 chunks to 5.
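The arithmetic behind that claim can be sketched as follows. This is a rough model under stated assumptions, not a measurement: it takes the 25 minutes per 500 tests and 15 minutes of setup/teardown quoted above, and assumes (hypothetically) that the current 10 chunks run 500 tests each, for 5000 tests in total. The function names are illustrative only.

```python
SETUP_MIN = 15          # setup/teardown minutes per chunk (figure quoted above)
RUN_MIN_PER_500 = 25    # minutes to run 500 tests (figure quoted above)


def total_machine_minutes(num_tests, num_chunks):
    """Total machine time per push: fixed overhead per chunk plus run time."""
    return num_chunks * SETUP_MIN + num_tests * RUN_MIN_PER_500 / 500


def minutes_per_chunk(num_tests, num_chunks):
    """Wall-clock time of one chunk, assuming tests are spread evenly."""
    return SETUP_MIN + (num_tests / num_chunks) * RUN_MIN_PER_500 / 500


# Assumed current load: 10 chunks of 500 tests each.
current = total_machine_minutes(5000, 10)

# Proposed: 1000 more tests, but only 5 chunks.
proposed = total_machine_minutes(6000, 5)

print(current)                      # 400.0
print(proposed)                     # 375.0
print(minutes_per_chunk(6000, 5))   # 75.0
```

Under these assumed figures the total machine time per push drops even with 1000 extra tests, because half of the per-chunk overhead disappears. The trade-off is that each individual chunk runs longer, about 75 minutes in this model.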