Bug 811779 - Expand set of reftests running on m-i/m-c/try
Status: RESOLVED FIXED
Product: Testing
Classification: Components
Component: Reftest
Version: unspecified
Platform: All All
Importance: -- normal
Target Milestone: mozilla21
Assigned To: Andrew Halberstadt [:ahal]
Mentors:
Duplicates: 807799
Depends on: 811783 818156 820958
Blocks: b2g-reftest
Reported: 2012-11-14 10:07 PST by Andrew Halberstadt [:ahal]
Modified: 2013-06-27 19:58 PDT (History)


Attachments
Base patch (245.42 KB, patch)
2012-11-14 10:16 PST, Andrew Halberstadt [:ahal]
no flags
Patch 1.0 - Enable larger set of b2g reftests (323.54 KB, patch)
2013-01-08 10:37 PST, Andrew Halberstadt [:ahal]
jgriffin: review+

Description Andrew Halberstadt [:ahal] 2012-11-14 10:07:58 PST
We are currently running b2g emulator reftest-sanity tests on the core branches. We should expand the set of reftests being run to all passing tests.
Comment 1 Andrew Halberstadt [:ahal] 2012-11-14 10:16:20 PST
Created attachment 681547
Base patch

From August to October I triaged most of the failing/random reftests that cropped up and ended up with this patch. It's been a while since I ran it, so there are probably new failures by now.

I plan on checking it in to cedar and trying to get a stable green run again.
Comment 2 Andrew Halberstadt [:ahal] 2012-11-16 10:27:09 PST
Pushed to cedar: https://hg.mozilla.org/projects/cedar/rev/a0b15032b295
Comment 3 Andrew Halberstadt [:ahal] 2012-11-16 13:26:04 PST
Most of the chunks are getting killed because they take more than an hour to run. On my desktop they took around 20 minutes, but I guess the slaves aren't as powerful.

We'll want to:
1) Figure out how to speed them up
2) Use more chunks
3) Possibly get the emulators running on Mac, where wait times aren't as high

This is all outside the scope of this bug. It'll be a slow process.
Comment 4 Aki Sasaki [:aki] 2012-11-28 15:07:44 PST
I overwrote local changes on Cedar in the latest merge: https://hg.mozilla.org/projects/cedar/diff/5b7cce7a7f1b/layout/reftests/font-inflation/reftest.list
Comment 5 Andrew Halberstadt [:ahal] 2012-12-04 12:02:17 PST
For posterity, here are :cjones' rankings of reftest importance on B2G:

== Critical ==
I wouldn't consider shipping a phone without knowing the exact state
of these tests.  In order of importance.

crashtests
layout/reftests/reftest-sanity
layout/reftests/bugs
layout/reftests/invalidation


== High priority ==
Tests that are critical to the project and for which desktop/android
coverage is *not* mostly sufficient.  In no particular order.

content/canvas/test/reftest
image/test/reftest
gfx/tests/reftest
layout/reftests/position-dynamic-changes
layout/reftests/text
layout/reftests/canvas
layout/reftests/svg/smil
layout/reftests/svg/as-image
layout/reftests/font-inflation
layout/reftests/transform
layout/reftests/image
layout/reftests/scrolling
layout/reftests/forms
layout/reftests/css-gradients
layout/reftests/ogg-video
layout/reftests/transform-3d
layout/reftests/layers
layout/reftests/flexbox
layout/reftests/webm-video
layout/reftests/selection
layout/reftests/css-selectors
layout/reftests/css-calc
layout/reftests/font-face


== Normal priority ==
Tests that we should run but for which desktop/android coverage *is*
mostly sufficient.  In no particular order.

content/html/content/reftests
content/test/reftest
layout/reftests/border-radius
layout/reftests/cssom
layout/reftests/text-shadow
layout/reftests/columns
layout/reftests/list-item
layout/reftests/table-width
layout/reftests/css-ui-valid
layout/reftests/css-optional
layout/reftests/box-sizing
layout/reftests/bidi
layout/reftests/font-matching
layout/reftests/table-background
layout/reftests/text-indent
layout/reftests/marquee
layout/reftests/image-element
layout/reftests/indic-shaping
layout/reftests/line-breaking
layout/reftests/datalist
layout/reftests/css-transitions
layout/reftests/svg
layout/reftests/css-visited
layout/reftests/css-charset
layout/reftests/table-dom
layout/reftests/counters
layout/reftests/css-parsing
layout/reftests/unicode
layout/reftests/text-decoration
layout/reftests/box-ordinal
layout/reftests/abs-pos
layout/reftests/table-overflow
layout/reftests/css-placeholder
layout/reftests/css-default
layout/reftests/text-transform
layout/reftests/text-overflow
layout/reftests/pagination
layout/reftests/image-rect
layout/reftests/z-index
layout/reftests/percent-overflow-sizing
layout/reftests/object
layout/reftests/font-features
layout/reftests/image-region
layout/reftests/inline-borderpadding
layout/reftests/css-disabled
layout/reftests/pixel-rounding
layout/reftests/native-theme
layout/reftests/box-shadow
layout/reftests/table-bordercollapse
layout/reftests/floats
layout/reftests/css-import
layout/reftests/text-svgglyphs
layout/reftests/generated-content
layout/reftests/table-anonymous-boxes
layout/reftests/w3c-css
layout/reftests/first-line
layout/reftests/box-properties
layout/reftests/css-mediaqueries
layout/reftests/css-valid
layout/reftests/css-invalid
layout/reftests/first-letter
layout/reftests/css-ui-invalid
layout/reftests/box
layout/reftests/css-enabled
layout/reftests/backgrounds
layout/reftests/ib-split
layout/reftests/tab-size
layout/reftests/border-image
layout/reftests/css-submit-invalid
layout/reftests/margin-collapsing
layout/reftests/dom
layout/reftests/css-required
layout/reftests/css-valuesandunits
layout/reftests/mathml
parser/htmlparser/tests/reftest
editor/reftests
netwerk/test/reftest
toolkit/content/tests/reftests
widget/reftests

== Completely worthless ==
(Just a waste of CPU cycles, please don't run.)

dom/plugins/test/reftest
editor/reftests/xul
layout/reftests/printing
layout/reftests/xul
layout/reftests/xul-document-load
layout/xul
toolkit/themes/pinstripe/reftests
Comment 6 Andrew Halberstadt [:ahal] 2012-12-04 12:06:40 PST
I disabled everything except the critical and high priority tests on cedar. Unfortunately, chunking doesn't take skipped tests into account, and the skipped tests are unevenly distributed, so some chunks are still timing out: they still run 1000+ tests while other chunks run only ~100-200.
Comment 7 Andrew Halberstadt [:ahal] 2012-12-04 12:36:47 PST
The easiest solution would probably be to create a separate reftest_b2g.list root manifest. This way we wouldn't technically be skipping everything and all chunks would see an even distribution of tests.
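
The imbalance from comment 6 and the effect of a separate manifest can be sketched roughly as follows. This is a simplified model, not the actual reftest harness: the `chunk` and `runnable_per_chunk` helpers, the test names, and the 4000/1000 split are all hypothetical, and it assumes chunks are contiguous slices of the manifest order.

```python
def chunk(tests, total_chunks):
    """Split tests into contiguous, near-equal slices by position only."""
    size, extra = divmod(len(tests), total_chunks)
    chunks, start = [], 0
    for i in range(total_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(tests[start:end])
        start = end
    return chunks


def runnable_per_chunk(chunks):
    """Count the tests in each chunk that are not marked as skipped."""
    return [sum(1 for _, skipped in c if not skipped) for c in chunks]


# Hypothetical manifest: 4000 entries as (name, skipped) pairs.
# The first 1000 are enabled, the remaining 3000 are skipped.
all_tests = [(f"test-{i}", i >= 1000) for i in range(4000)]

# Chunking the full list, skipped entries included (the comment 6 situation).
uneven = runnable_per_chunk(chunk(all_tests, 4))

# Chunking a separate list that contains only enabled tests (the comment 7 idea).
enabled = [t for t in all_tests if not t[1]]
even = runnable_per_chunk(chunk(enabled, 4))

print(uneven)  # [1000, 0, 0, 0]
print(even)    # [250, 250, 250, 250]
```

In this toy case every chunk has the same number of manifest entries, but only chunking over the enabled tests spreads the actual work evenly.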
Comment 8 cmtalbert 2012-12-05 18:07:55 PST
Yes, for now, comment 7 is the way forward. The chunking problem itself is being worked on in bug 818156.
Comment 9 Andrew Halberstadt [:ahal] 2013-01-08 10:37:11 PST
Created attachment 699322
Patch 1.0 - Enable larger set of b2g reftests

I'm fairly confident that the set of reftests enabled by this patch is green enough on cedar to get the ball rolling.

Notes:
* the two important files to look at are layout/reftests/reftest.list, which shows the overall set of tests that will be run, and layout/tools/reftest/runreftestb2g.py, where I had to turn off <iframe mozbrowser> because bug 785074 causes tons of additional failures
* I used skip-if instead of random or fails-if to avoid unnecessary test slave load
* after landing this patch we'll need to update the mozharness configs to point to the root manifest instead of reftest-sanity
* if you'd rather I create a separate root manifest for B2G as opposed to skip-if'ing everything in the main one, I can attach a new patch
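
To illustrate the skip-if note above, here is a minimal sketch of why skipping reduces slave load while the other annotations do not. It models only three manifest annotations with a single bare condition such as `B2G`. The `executes` function, the regular expression, and the test file names are hypothetical and are not the harness's real parser.

```python
import re

# Simplified model: does the test still execute when the condition holds?
ANNOTATIONS = {
    "skip-if": False,   # test is not run at all
    "random-if": True,  # test runs, its result is ignored
    "fails-if": True,   # test runs, a failure is expected
}

PATTERN = re.compile(r"^(skip-if|random-if|fails-if)\((\w+)\)\s+")


def executes(line, platform):
    """Return True if this manifest line would be executed on platform."""
    match = PATTERN.match(line)
    if match is None:
        return True  # no annotation: always runs
    annotation, condition = match.groups()
    if condition != platform:
        return True  # annotation does not apply on this platform
    return ANNOTATIONS[annotation]


# Hypothetical manifest lines.
lines = [
    "== a.html a-ref.html",
    "skip-if(B2G) == b.html b-ref.html",
    "random-if(B2G) == c.html c-ref.html",
    "fails-if(B2G) == d.html d-ref.html",
]

b2g_load = sum(executes(line, "B2G") for line in lines)
other_load = sum(executes(line, "Android") for line in lines)
print(b2g_load, other_load)  # 3 4
```

Only the skipped test costs no run time on B2G. The random and expected-fail tests still execute even though their results cannot turn the run orange.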

There are obviously still some fundamental problems with the reftest harness on B2G and this patch isn't going to make anyone happy (including myself). But at the end of the day it will put us in a better position in terms of test coverage than we are currently.
Comment 10 Jonathan Griffin (:jgriffin) 2013-01-08 14:59:27 PST
Comment on attachment 699322
Patch 1.0 - Enable larger set of b2g reftests

Review of attachment 699322:
-----------------------------------------------------------------

Looks like we're still getting the odd random orange on cedar; I guess we can cover those with new bugs, or skip those as well if they become too frequent.
Comment 11 Andrew Halberstadt [:ahal] 2013-01-10 13:19:11 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/d932f2172ce2
Comment 12 Ed Morley [:emorley] 2013-01-11 06:31:44 PST
https://hg.mozilla.org/mozilla-central/rev/d932f2172ce2
Comment 13 Ed Morley [:emorley] 2013-01-11 06:59:43 PST
https://hg.mozilla.org/mozilla-central/rev/d932f2172ce2
Comment 14 Ed Morley [:emorley] 2013-01-11 07:47:57 PST
https://hg.mozilla.org/mozilla-central/rev/d932f2172ce2
Comment 15 Andrew Halberstadt [:ahal] 2013-01-11 08:07:00 PST
https://hg.mozilla.org/releases/mozilla-aurora/rev/885f829b692b

This patch applied cleanly to aurora, but there were massive differences on the b2g-18 branch. I don't think that merging it by hand will produce a green test run anyway, so I'd advocate not turning these tests on there and waiting for the next merge.
Comment 16 Andrew Halberstadt [:ahal] 2013-01-11 08:15:12 PST
*** Bug 807799 has been marked as a duplicate of this bug. ***
Comment 17 Andrew Halberstadt [:ahal] 2013-01-11 08:46:37 PST
I made a typo when merging the root manifest to aurora:
https://hg.mozilla.org/releases/mozilla-aurora/rev/a6a8dd94822b
Comment 18 Jonathan Griffin (:jgriffin) 2013-01-11 09:48:51 PST
(In reply to Andrew Halberstadt [:ahal] from comment #15)
> https://hg.mozilla.org/releases/mozilla-aurora/rev/885f829b692b
> 
> This patch applied cleanly to aurora, but there were massive differences on
> the b2g-18 branch. I don't think that merging it by hand will produce a
> green test run anyway, so I'd advocate not turning these tests on there and
> waiting for the next merge.

There aren't any planned merges to b2g18.  We may need to bite the bullet and hide the tests on b2g18 until we can exclude all of the failures there.
Comment 19 Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2013-01-16 11:24:55 PST
Comment on attachment 699322
Patch 1.0 - Enable larger set of b2g reftests

Sorry, this f? hit me at a really bad crunch time.  I didn't look through all the manifest changes, but it's usually bad form to disable tests without a bug to re-enable them or a comment explaining why.
Comment 20 Andrew Halberstadt [:ahal] 2013-01-16 11:55:40 PST
(In reply to Chris Jones [:cjones] [:warhammer] from comment #19)
> Sorry, this f? hit me at a really bad crunch time

No worries, I mostly just wanted you to be aware of this horrible patch and the fact that more tests are running.

> it's usually bad form to disable tests without
> a bug to re-enable or comment explaining why.

Agreed, though:

A) We are running 10 chunks at 30+ minutes each for over 5 hours of B2G reftest per push. Realistically these tests are never coming back on with emulators as we just don't have capacity for even this much. When we switch to pandaboards I'll re-enable everything, re-triage on pandas and emulator reftests will be phased out.

B) There are so many failures (possibly in the thousands) that I don't know if it is harness, platform, emulator or test related. The best I could do is comment with a tracking bug which isn't much more useful than nothing at all.
Comment 21 Daniel Holbert [:dholbert] 2013-06-27 12:27:36 PDT
Are there any tracking bugs filed on further-increasing the number of reftests that are run on B2G?

We've got a frightening number of entire subdirectories marked as "skip-if(B2G)", from this bug's changeset - this part in particular, tweaking the toplevel reftest.list file:
 http://hg.mozilla.org/mozilla-central/diff/d932f2172ce2/layout/reftests/reftest.list

(I'm assuming the situation was worse beforehand, but this still leaves us in a pretty bad state, reftest-coverage-wise, and I'm hoping we have plans to get better. :))
Comment 22 Jonathan Griffin (:jgriffin) 2013-06-27 15:34:56 PDT
ahal, can you answer dholbert?
Comment 23 Andrew Halberstadt [:ahal] 2013-06-27 15:55:05 PDT
(In reply to Daniel Holbert [:dholbert] from comment #21)
> Are there any tracking bugs filed on further-increasing the number of
> reftests that are run on B2G?
> 
> We've got a frightening number of entire subdirectories marked as
> "skip-if(B2G)", from this bug's changeset - this part in particular,
> tweaking the toplevel reftest.list file:
>  http://hg.mozilla.org/mozilla-central/diff/d932f2172ce2/layout/reftests/
> reftest.list
> 
> (I'm assuming the situation was worse beforehand, but this still leaves us
> in a pretty bad state, reftest-coverage-wise, and I'm hoping we have plans
> to get better. :))

Yes, I'd really like to get more tests enabled as well. The main problem is that reftests can't be run on the Ubuntu AWS VMs (bug 818968), so we have to keep them running on actual hardware. Combined with the fact that they are *very* slow on the emulators (~40 minutes for 500 tests), we can't just wholesale enable them without getting long test backups.

That being said, since we've moved other tests off of the Fedora pool, we do have a bit of spare capacity to enable some more tests. I've blogged/posted to dev.b2g about this in the past but no one seemed interested.

To answer your question, there aren't any bugs filed to enable specific swathes of tests, but if you would like to file some, feel free to make them block the 'b2g-reftest' main tracking bug. Or if you want to give me a list of the currently disabled tests that you think would be most useful to enable, I'd be happy to enable them when I have a few spare cycles.
Comment 24 Phil Ringnalda (:philor) 2013-06-27 19:58:53 PDT
It's not really 40 minutes for 500 tests; it's more like 25 minutes for 500 tests plus 15 minutes of setup/teardown time per chunk. We could easily add another 1000 tests and not lose anything, just by switching from 10 chunks to 5.
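
The arithmetic behind that claim can be checked with a small sketch. It uses the figures above (15 minutes of fixed overhead per chunk, 500 tests per 25 minutes) and assumes, purely for illustration, that the current 10 chunks each run about 500 tests, so 5000 tests in total. The function names are hypothetical.

```python
SETUP_MIN = 15             # setup/teardown overhead per chunk, in minutes
TESTS_PER_MIN = 500 / 25   # 20 tests per minute of actual test time


def machine_minutes(total_tests, chunks):
    """Total machine time per push: fixed overhead plus test time."""
    return chunks * SETUP_MIN + total_tests / TESTS_PER_MIN


def chunk_minutes(total_tests, chunks):
    """Duration of one chunk, assuming tests are split evenly."""
    return SETUP_MIN + (total_tests / chunks) / TESTS_PER_MIN


current = machine_minutes(5000, 10)    # assumed current load
proposed = machine_minutes(6000, 5)    # 1000 more tests, half the chunks

print(current, proposed)               # 400.0 375.0
print(chunk_minutes(5000, 10))         # 40.0
print(chunk_minutes(6000, 5))          # 75.0
```

Under these assumptions the total machine time drops from 400 to 375 minutes even with 1000 extra tests. The trade-off is that each individual chunk runs for 75 minutes instead of 40, which matters if a per-chunk time limit like the one mentioned in comment 3 still applies.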
