Closed Bug 737961 Opened 13 years ago Closed 12 years ago

add reftest manifest conditions for height width


(Testing :: Reftest, defect)






(Reporter: jmaher, Assigned: jmaher)


(Depends on 2 open bugs, Blocks 1 open bug)



(4 files, 5 obsolete files)

Currently we have some reftests which require up to 1000 pixels of height.  This is great for testing on platforms and hardware which have adequate resources to accommodate such a screen resolution, but problematic where the screen is smaller.

As it appears there are only a few tests which have this, I have adjusted their manifest as well.
Assignee: nobody → jmaher
Attachment #608036 - Flags: review?(dbaron)
Blocks: 737427
Why do these tests fail despite reftest.js enforcing a constant browser element size?

    // Make sure the browser element is exactly 800x1000, no matter
    // what size our window is
    gBrowser.setAttribute("style", "min-width: 800px; min-height: 1000px; max-width: 800px; max-height: 1000px");
> +    var testRect = gBrowser.getBoundingClientRect();
> +    sandbox.resolution = {width: testRect.right, height: testRect.bottom};

What is this actually measuring?

I'm not familiar with getBoundingClientRect, but based on the names it seems like you might need to subtract left/top.
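For what it's worth, getBoundingClientRect() returns left/top/right/bottom relative to the viewport, so using right/bottom directly only measures the size correctly when the element sits at the origin. A minimal sketch of the subtraction (the rect literal below is hypothetical, standing in for the real element):

```javascript
// Sketch: derive width/height from a getBoundingClientRect()-style rect.
// If the element is offset from (0, 0), right/bottom alone overstate the
// size, so subtract left/top.
function measuredResolution(rect) {
  return {
    width: rect.right - rect.left,
    height: rect.bottom - rect.top,
  };
}

// Hypothetical rect for an 800x1000 element offset by 10px on each axis:
const rect = { left: 10, top: 10, right: 810, bottom: 1010 };
console.log(measuredResolution(rect)); // { width: 800, height: 1000 }
```

With the patch as written, any offset of the browser element would inflate the reported resolution.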
On android, we fail this check with a 1024x768 resolution:

What that means is we are not able to set a resolution at 1000px height because our screen resolution is smaller than that.
ping for review.
ping for a review, or please reassign to others if necessary.
Comment on attachment 608036 [details] [diff] [review]
add resolution.height for reftest.list manifest and use it where needed (1.0)

The test format is designed around the assumption that the tests are 800x1000.  While the number of false *failures* when using a lower resolution might be low, you're going to get tons of lost test coverage when running at a lower resolution.

If we're going to run the tests at some smaller size we need to agree on what that is and make sure the tests are actually designed to test what they're supposed to test within that size.  (The W3C recently had some discussion and agreed on a somewhat smaller size, I think.  It's worth finding out what that size was.)

What are our constraints on sizes?
Attachment #608036 - Flags: review?(dbaron) → review-
I believe we only have a small number of tests which require >600 pixels of height.

The driving factor is that on mobile we have a 1024x768 resolution by default on the tegras.  As we adjust this higher, the memory available to applications decreases due to the memory shared between the OS and the video card.  When running the tests at the default resolution, the number of hangs and timeouts we experience is much lower than at 1600x1200 (the only available resolution that provides 1000 pixels of height).  It also saves us 10 minutes of automation time, as we cannot run our mochitests and talos tests with the restricted memory.

It boils down to ignoring 4 reftests that are currently running, versus missing entire chunks of tests because the runs turn red or purple due to a timeout or hang.  

The majority of the reftests that I have looked at work well in 400x400.  I think overall there are about 12 tests which specifically use the 1000 pixels height.

Looking on the W3C lists, I didn't see a consensus regarding a specific width/height, but I did see some questions about it from Dec 2011.  Specifically asking whether >420px is useful as in one case tests greater than that would add a scrollbar.
Can we isolate those 12 tests into their own separate job and only adjust the screen resolution for that specific job?
It is just 4 tests on android that we run (the rest already are not run due to other conditions).  

I am not sure if we want to run 4 tests in a 25 minute job by itself, or just ignore them.  Either way, I want to get test results that are meaningful and run more tests in general.
Looking at 203 reftest-based test runs (jsreftest, reftest, crashtest), 105 of those runs had failures of some type (orange, red, retry).

On a similar note, the 8 mochitest suites that we run had 172 total runs with 31 failures, and out of 104 talos runs, only 19 failures were detected.

The reftests are a real problem for our overall Android failure rate.  

Does anybody have a problem with me disabling the 4 tests (which we run on Android) that require 1000 pixels of height and then running the tests at 1024x768?
How did you determine that there are 4 tests that require 1000 pixels of height?
I did 2 things:
1) ran the reftests on android with a smaller screen size (and --ignore-window-size)
2) ran the tests on desktop and changed the gBrowser.setAttribute() style values to a 600px height

I got similar results between both runs and after looking at the test cases to confirm the usage of >600 pixels of height, I see these tests fail on android:
So you didn't examine any of the tests that continued to pass to see if there was a loss in test coverage?
The same number of tests pass/fail as before (with the exception of the 4 listed above).  Those 4 tests clearly need the larger resolution.  I have looked at about 20 different tests (out of a few thousand) and none of them indicate a need for a larger pixel size; most use <400x400.

Let me know if there is something specific I should look for and I can take the time and look through the tests.
For a start, you need to look for tests where a substantive part (meaning, something other than text-to-cause-scrollbars) of the test is being cut off (and thus no longer compared).  It's probably also worth looking for other height-dependencies (e.g., positioning things based on percentage heights in ways designed for a particular size).  See comment 7.

Print reftests may also require special consideration.
That said, before you spend the time on evaluating every single test, we should decide on what the new target size is, probably after discussion in a public forum (dev-platform, probably).
See Also: → 746260
For convenience, I created a simple test case in which the two images will be identical at 400x600, but will differ at anything greater. I plan to use this for testing and figured it might be useful for someone else too.
This patch uses the canvas' getContext("2d").getImageData().data array and compares each pixel to the default pixel color to calculate the actual width and height used by the test.

The downside to this is it takes a long time (~1-2 minutes per test). I can still do some optimizations that might help a bit and if we can rule out large chunks of tests that for sure won't be a problem it could help out as well. We'd only need to run it once and we could store the output for future use.
I think this is a big step in the right direction.  While this takes a while, we can run this in parallel across many machines.  Ideally giving us results overnight.  Alternatively finding a way to make this a bit faster would probably be a better use of time :)
Hmm, on closer look my method might not work out as well as I hoped. Off window data doesn't seem to be rendered. I think it may still be possible to scroll the window and render it piece by piece onto the canvas however. I'm doing some more digging.
This patch optimizes the last one by scanning pixels in reverse so we don't have to compare the entire image data array each time. Worst case scenario is still to compare the entire array.
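For context, here is a sketch of the reverse-scan idea (all names are illustrative; the real patch works on the canvas's getImageData().data array, which is a flat RGBA array):

```javascript
// Sketch of the reverse scan: given an ImageData-style flat RGBA array,
// find the bottom-most and right-most pixel that differs from the default
// background colour. Scanning from the end lets us stop at the first
// differing pixel per axis instead of walking the whole array every time;
// the worst case (a blank test) still touches every pixel.
function usedResolution(data, width, height, bg) {
  const differs = (x, y) => {
    const i = (y * width + x) * 4;
    return data[i] !== bg[0] || data[i + 1] !== bg[1] ||
           data[i + 2] !== bg[2] || data[i + 3] !== bg[3];
  };
  let usedW = 0, usedH = 0;
  outerH:                                    // bottom-most differing row
  for (let y = height - 1; y >= 0; y--) {
    for (let x = 0; x < width; x++) {
      if (differs(x, y)) { usedH = y + 1; break outerH; }
    }
  }
  outerW:                                    // right-most differing column
  for (let x = width - 1; x >= 0; x--) {
    for (let y = 0; y < height; y++) {
      if (differs(x, y)) { usedW = x + 1; break outerW; }
    }
  }
  return { width: usedW, height: usedH };
}

// Tiny 4x4 example: white background with one non-white pixel at (2, 1).
const w = 4, h = 4;
const data = new Uint8ClampedArray(w * h * 4).fill(255);
const i = (1 * w + 2) * 4;
data[i + 1] = 0;  // drop green/blue so the pixel differs from white
data[i + 2] = 0;
console.log(usedResolution(data, w, h, [255, 255, 255, 255]));
// { width: 3, height: 2 }
```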
Attachment #654224 - Attachment is obsolete: true
Here is a log that I generated using the previous patch:

There are:
3218 tests with resolution > 600x400
2712 tests with resolution > 600x600
1448 tests with resolution > 800x720
1425 tests with resolution == 800x1000

Note that these include tests that set the background colour or have relatively sized elements. The actual number that needs to be fixed will likely be much smaller.
I wrote a tool which loads each reftest (that goes over the target resolution) side by side in two iframes, one at 800x1000 the other at the target resolution. You can then visually inspect them and tell the tool if the test needs fixing at that resolution.

I went through all 3218 tests at 600x400 and came up with this list of tests that will need to be fixed (around 1000):

The good thing about this is that if we decide to go with a larger resolution, e.g. 600x600, then only these tests will need to be triaged, since a test that is good at 600x400 will also be good at larger resolutions.
A backwards way to look at this would be to disable all 1000 of those tests, change the resolution, and let the original test authors turn them back on after review/fixing.
Another option is to do what this bug says and add a manifest condition for all tests >600x400 so we can start running on mobile platforms right now where we cannot adjust the resolution.  This means that desktop can still test at 800x1000 and life will be good.  

My only concern with either plan is that we delay fixing the tests or addressing them properly and they stay off or we end up in a bad position in the future.
Joel and I talked on irc and decided that maybe we should target 600x600 instead. This cuts the number of tests to be fixed to ~600 and will give us better test coverage on mobile in the short term.
Blocks: 784278
this patch takes all the work that :ahal has done and adds manifest conditions for anything that has a height or a width >600 pixels.  This is identified by the manifest condition: smallScreen.
Attachment #608036 - Attachment is obsolete: true
Attachment #663163 - Flags: review?(dbaron)
Attachment #663163 - Flags: feedback?(ahalberstadt)
Comment on attachment 663163 [details] [diff] [review]
add reftest condition for smallScreen (2.0)

I think I've been pretty clear that I think this isn't the approach we should be taking.
Attachment #663163 - Flags: review?(dbaron) → review-
I read back through this bug and I don't see the recommended approach.  We cannot turn on tests for panda board based automation with large resolutions, this patch allows us to ignore the 502 tests which depend on larger resolutions than we can support.

If you have a solution that gets us there by the end of the month, I would like to hear it and I can work on it.
(In reply to Joel Maher (:jmaher) from comment #31)
> I read back through this bug and I don't see the recommended approach.

A few other notes though:
 * we should aim to switch the size we're running reftests at on all platforms
 * fantasai raised the issue in the CSS WG; gave other vendors until next Wednesday to report back if they have problems with 600x600
 * The set of tests that need to be fixed for a smaller resolution (which I think is what you've based this patch on) is, I think, much larger than the set of tests that would need to be skipped as a temporary measure when running the harness at a smaller resolution (which is only the set that would actually fail at the smaller resolution, not those that need to be updated in order to get full coverage at that smaller resolution).  If you really need the temporary measure of running reftests at different sizes on different platforms, you should only conditionally-disable the much smaller set.  I'm still uncomfortable with doing that as a temporary measure, since I think it means that we'll never go back and do things correctly (which I still don't think is that much work)
In other words, if you're going to skip-if() tests, you should just skip-if() the list of 4 tests from comment 13.  The point of finding the full list of tests that needs to be fixed is to actually fix the tests.  Not fixing them, but running them at a smaller resolution, means we'll have partial test coverage on those tests rather than the none that we'd get from skip-if()'ing them.

Or, to put it another way, the whole point of finding the full list of tests that needed to be fixed to have full test coverage is to fix those tests, not to disable them (and have no coverage on what they're testing, instead of partial).
Thanks for the clarification.  Next Wednesday we can file a list of bugs for the reftests which we should change to meet the new standard size.  Even though there are 502 tests identified, the number of files to adjust will be much smaller as most of these tests share a common file.

As a temporary solution we skip-if the tests which fail when we run at a lower resolution.  Ideally in a month or so we can have the tests converted and have complete test coverage.
Sounds like a plan, I can help fix up the tests next week. I also wrote a tool which loads the same test side by side in two iframes which should help a bit with the fixing.
With this patch, we will random-if the 9 cases which fail due to a smaller screen resolution on android.  Once we have confirmation of the target resolution we can start fixing the test files to adjust the resolution down and then back this patch out.
Attachment #663163 - Attachment is obsolete: true
Attachment #663163 - Flags: feedback?(ahalberstadt)
Attachment #664998 - Flags: review?(dbaron)
Depends on: 795022
Depends on: 795278
As B2G is becoming a higher priority, we need to get this landed so we can start working on turning tests on.  Please reassign this if the review should be done by someone else.
Depends on: 799105
Depends on: 800093
Depends on: 800099
Dbaron, we really need a review here. I need this done so I can get reftest supported on panda boards for B2G which cannot support the current reftest resolution requirements. I'd ideally like to be able to test this next week (week of October 15). Can we get a review here so we can get this resolved?
Flags: needinfo?(dbaron)
Comment on attachment 664998 [details] [diff] [review]
manifest off the 9 tests which fail on android with smaller resolution (1.0)

Can you:
 (1) resurrect the smallScreen bit of the previous patch
 (2) change these from random-if(Android) to fails-if(smallScreen)

(It's better to use fails than random, since then we'll know to get the test coverage back when the underlying problem is fixed.)

r=dbaron with that, if you agree it makes sense
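For concreteness, a sketch of what the suggested annotation change would look like in a reftest.list manifest (test file names here are hypothetical; smallScreen would be the sandbox variable resurrected from the earlier patch):

```
# Before: randomized only on Android, so coverage loss is invisible
random-if(Android) == tall-1000px.html tall-1000px-ref.html

# After: tied to the actual underlying condition, so the annotation can be
# removed once the test works at the smaller size
fails-if(smallScreen) == tall-1000px.html tall-1000px-ref.html
```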
Attachment #664998 - Flags: review?(dbaron) → review+
Flags: needinfo?(dbaron)
updated with feedback from review.
Attachment #664998 - Attachment is obsolete: true
Attachment #671941 - Flags: review+
This caused some unexpected passes on Win Opt:
REFTEST TEST-UNEXPECTED-PASS | file:///c:/talos-slave/test/build/reftest/tests/layout/reftests/reftest-sanity/corners-3.html | image comparison (!=), max difference: 255, number of differing pixels: 1
REFTEST TEST-UNEXPECTED-PASS | file:///c:/talos-slave/test/build/reftest/tests/layout/reftests/reftest-sanity/corners-4.html | image comparison (!=), max difference: 255, number of differing pixels: 1
REFTEST TEST-UNEXPECTED-PASS | file:///c:/talos-slave/test/build/reftest/tests/layout/reftests/backgrounds/background-size-body-cover-not-fixed.html | image comparison (!=), max difference: 255, number of differing pixels: 160000
REFTEST TEST-UNEXPECTED-PASS | file:///c:/talos-slave/test/build/reftest/tests/layout/reftests/backgrounds/background-size-body-contain-no-repeat.html | image comparison (!=), max difference: 255, number of differing pixels: 160000
REFTEST TEST-UNEXPECTED-PASS | file:///c:/talos-slave/test/build/reftest/tests/layout/reftests/svg/sizing/scrollbars-01.svg | image comparison (!=), max difference: 248, number of differing pixels: 13600
REFTEST TEST-UNEXPECTED-PASS | file:///c:/talos-slave/test/build/reftest/tests/layout/reftests/svg/smil/transform/translate-pattern-1.svg | image comparison (==)

I believe all of those tests were marked "fails-if(smallScreen)" in this bug's push. We're apparently treating that as "fails" on WinOpt and spamming because the tests are not actually failing.
For reference, here's another run with the same unexpected passes, on Win Debug Reftest:
Rev3 WINNT 6.1 mozilla-inbound debug test reftest on 2012-10-16 14:25:43 PDT for push 55d64487e54a
Also: this appears to sporadically trigger UNEXPECTED-PASS on Android opt for corners-3.html and corners-4.html.

Here's a green Android Opt R1 log from after this landed:
...and here's the next Android Opt R1 log after that one:
which was orange with:
REFTEST TEST-UNEXPECTED-PASS | | image comparison (!=), max difference: 255, number of differing pixels: 1
REFTEST TEST-UNEXPECTED-PASS | | image comparison (!=), max difference: 255, number of differing pixels: 1
...and one more Android Opt R1 log with unexpected passes in those same 2 tests:
The reason these tests failed on windows [7] opt is that we run the screen resolution on that platform at 1024x768 instead of 1280x1024 or a higher resolution as we do on all other platforms.

What I don't understand is that when running the tests on Android with 1024x768 we get failures in a few of the tests, but these same tests pass with the same resolution on windows.

A possible solution to this is to:

If I knew the reason for the difference I would feel better about that solution.  In the meantime we are making slow progress on fixing the other tests so we can adjust the overall size down and not require these hacks.

I am open to all suggestions about debugging and or other manifest approaches to get us running on smaller resolutions for android.
I have already fixed some of the tests that were needed for this patch.  Since Win7 runs at 1024x768, we need to restrict the annotation to Android only.  Right now there are 5 remaining tests that fall into this category.
Attachment #671941 - Attachment is obsolete: true
Attachment #675186 - Flags: review?(dbaron)
Comment on attachment 675186 [details] [diff] [review]
updated to only fail for android (3.0)

r=dbaron; sorry for the delay
Attachment #675186 - Flags: review?(dbaron) → review+
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
We really need to *change* the reftest resolution. A lot of modern laptops (including all Apple laptops) have a vertical resolution of 900 pixels by default, which causes many reftest failures when running locally. Even when the reftest doesn't fail, we are using a different code path in the layer system in this situation, which means that developers running their tests locally are not running the same code that we run on our CI infrastructure.

I'm going to file a new bug about pushing forward a change. The current situation is seriously problematic, especially because most developers are probably not aware that the problem even exists.
please look at bug 795022 and bug 795278, they have had patches for years with no luck.