Last Comment Bug 580786 - Support fuzzy matching for reftests
: Support fuzzy matching for reftests
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Graphics (show other bugs)
: Trunk
: x86 Mac OS X
: -- normal with 1 vote (vote)
: mozilla12
Assigned To: Jeff Muizelaar [:jrmuizel]
:
:
Mentors:
Depends on: 1252361
Blocks: 586835 578134 578135 718329
  Show dependency treegraph
 
Reported: 2010-07-21 14:41 PDT by Jeff Muizelaar [:jrmuizel]
Modified: 2016-02-29 19:56 PST (History)
12 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
Add fuzzy matching support (2.79 KB, patch)
2010-08-05 15:58 PDT, Jeff Muizelaar [:jrmuizel]
no flags Details | Diff | Splinter Review
An updated version that actually works (4.14 KB, patch)
2010-08-12 12:14 PDT, Jeff Muizelaar [:jrmuizel]
dbaron: review-
Details | Diff | Splinter Review
Address review comments (6.18 KB, patch)
2011-11-16 22:30 PST, Jeff Muizelaar [:jrmuizel]
dbaron: review-
Details | Diff | Splinter Review
Require gwindowutils (2.17 KB, patch)
2011-12-13 20:11 PST, Jeff Muizelaar [:jrmuizel]
dbaron: review+
Details | Diff | Splinter Review
Address review comments better (7.43 KB, patch)
2011-12-13 20:13 PST, Jeff Muizelaar [:jrmuizel]
dbaron: review+
Details | Diff | Splinter Review
Fix the Android test failures (1.05 KB, patch)
2012-01-05 15:34 PST, Jeff Muizelaar [:jrmuizel]
no flags Details | Diff | Splinter Review
final version (8.07 KB, patch)
2012-01-09 16:15 PST, Jeff Muizelaar [:jrmuizel]
dbaron: review+
Details | Diff | Splinter Review

Description Jeff Muizelaar [:jrmuizel] 2010-07-21 14:41:39 PDT
The basic idea here is to support a threshold of differences so that we don't get as bitten by rasterization differences. The threshold would be per test and platform and likely a sum of squared differences.
Comment 1 Vladimir Vukicevic [:vlad] [:vladv] 2010-07-21 14:44:36 PDT
I thought that we decided to explicitly not do this for reftests?
Comment 2 Jeff Muizelaar [:jrmuizel] 2010-07-21 14:59:48 PDT
(In reply to comment #1)
The alternative seems to be disabling some tests for D2D. Some reftests assume that clip() creates the same pixels as fill(), D2D doesn't appear to have this property.
Comment 3 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-21 23:03:26 PDT
Under what conditions? Is that not a D2D bug?
Comment 4 Jeff Muizelaar [:jrmuizel] 2010-07-22 15:46:26 PDT
(In reply to comment #3)
> Under what conditions? Is that not a D2D bug?

I took an in-depth look at this today. 

test:
 - draw a grey circle
reference:
 - clip a grey surface to a circle mask

It looks like we can get slightly different results from the floating point computations in this two situations. I wasn't able to get a great picture of where exactly the differences were coming from because the PIX debugger's results didn't match the hardware and when run with the reference driver the difference seemed to go away.

All of our previous graphics frameworks have done compositing with integers so with the switch to loosely standardized floating point it seems like that these kinds of problems will tend to creep in.
Comment 5 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-22 16:01:37 PDT
Can we have the tinderbox machines use the reference driver? If we want to be testing non-reference drivers then presumably we'll need a separate pool of test machines for that anyway.
Comment 6 Jeff Muizelaar [:jrmuizel] 2010-07-23 14:33:30 PDT
I tried out the box shadow reftests with the reference driver and while the results were closer there was still a pixel that didn't match for reasons I could not explain. However, pixel shader's inputs and outputs were slightly different.

The reference is driver is also very slow. It takes 25 minutes to run the box-shadow reftests vs. the 7s seconds that hardware version. I also tried out WARP it was fast enough but had the worst errors of the three.
Comment 7 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-23 14:41:40 PDT
How much of a perf hit would it be to have the D2D backend do all fills via clip and paint?
Comment 8 Jeff Muizelaar [:jrmuizel] 2010-07-23 14:53:07 PDT
(In reply to comment #7)
> How much of a perf hit would it be to have the D2D backend do all fills via
> clip and paint?

I expect it would be noticeable. The D2D backend isn't very good at clipping. The clip and paint turns into a texture allocation the size of the bounds a fill() of that and then then a clip pass onto the destination. Texture allocation has in the past shown to be quite expensive.
Comment 9 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-23 22:49:54 PDT
Then I think we should just disable those tests for D2D.
Comment 10 Jeff Muizelaar [:jrmuizel] 2010-07-26 07:09:41 PDT
(In reply to comment #9)
> Then I think we should just disable those tests for D2D.

What's the advantage of disabling those tests over fuzzy matching them?
Comment 11 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-27 15:15:32 PDT
Maybe I'm just paranoid, but I worry that a general fuzzy-matching facility would be too strong a temptation for people to paper over small rendering differences.
Comment 12 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-07-27 15:21:16 PDT
And I think the thing to do for the bugs this blocks is to fix our painting code to use filling instead of clipping when painting background colors with rounded borders.
Comment 13 Jeff Muizelaar [:jrmuizel] 2010-08-05 15:58:14 PDT
Created attachment 463341 [details] [diff] [review]
Add fuzzy matching support

Here's an example of what this could look like.

It adds fuzzy-if and would be used like fuzzy-if(d2d). The fuzzy allows a max difference of 1 which should take care of any rounding differences.
Comment 14 Jeff Muizelaar [:jrmuizel] 2010-08-12 12:14:20 PDT
Created attachment 465328 [details] [diff] [review]
An updated version that actually works

The following tests can use fuzzy matching instead of disabling them.

fuzzy-if(d2d) == boxshadow-rounded-spread.html boxshadow-rounded-spread-ref.html
fuzzy-if(d2d) == boxshadow-onecorner.html boxshadow-onecorner-ref.html
fuzzy-if(d2d) == 523468-1.html 523468-1-ref.html
fuzzy-if(d2d) == 555388-1.html 555388-1-ref.html
fuzzy-if(d2d) == radial-1a.html radial-1-ref.html
fuzzy-if(d2d) == dynamic-2.html dynamic-2-ref.html
fuzzy-if(d2d) == filter-filterRes-low-01.svg pass.svg
Comment 15 Robert O'Callahan (:roc) (email my personal email if necessary) 2010-08-12 16:10:01 PDT
Comment on attachment 465328 [details] [diff] [review]
An updated version that actually works

I'm OK with the principle, but dbaron needs to review this.
Comment 16 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-08-12 16:44:14 PDT
So the idea here is that this allows colors to vary by a single color component?  So an rgb(0, 0, 1) pixel could match black in the reference, but an rgb(0, 0, 2) pixel could not?  I guess I could be ok with that.

However, you need to document what it means in layout/reftest/tools/README.txt.

Also, if we do this, I think the case where we don't have DOMWindowUtils would need to be no-longer-supported.  So I think we'd want to rip out the !gWindowUtils code.  (If that doesn't work on tinderbox, we have a problem... but I'm not going to guarantee that it will work on tinderbox; you need to test.)

>+            if (maxDifference.value == 1) {
>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

I think this might be clearer as:

if (expected == EXPECTED_FUZZY && maxDifference.value == 1) {
  // When the expected result is fuzzy, a max difference of 1 counts as equal.
  equal = true;
}


Also, " " + "max difference: " doesn't need two strings.

And you don't want the change from "IMAGE 1" to "IMAGE 1a"; I'm guessing that was to see if your changes were taking effect?

>+    outputs[EXPECTED_FUZZY] = {
>+        true:  {s: "TEST-PASS"                  , n: "Pass"},
>+        false: {s: "TEST-UNEXPECTED-FAIL"       , n: "UnexpectedFail"}
>+    };

Can you just do outputs[EXPECTED_FUZZY] = outputs[EXPECTED_PASS] ?
Comment 17 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-08-12 16:52:49 PDT
Also, I'm going to ask that you add tests to layout/reftests/reftest-sanity/ that check that this works as expected.  Such tests would involve:
 * a pair of pieces of markup that have only single-color-component-value differences
 * a pair of pieces of markup that have some two-color-component-value differences
 * tests involving fuzzy, fuzzy-if(true), and fuzzy-if(false) with both == and !=
Comment 18 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-08-13 16:32:58 PDT
Comment on attachment 465328 [details] [diff] [review]
An updated version that actually works

Marking review- to elicit answers to questions (and perhaps also a revised patch).
Comment 19 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2010-09-04 07:12:30 PDT
Are you planning to update this patch so it can get landed?
Comment 20 Jeff Muizelaar [:jrmuizel] 2011-11-16 22:30:07 PST
Created attachment 575098 [details] [diff] [review]
Address review comments
Comment 21 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-12-09 14:56:12 PST
Comment on attachment 575098 [details] [diff] [review]
Address review comments

>+# reftest syntax: fuzzy
>+fuzzy == fuzzy.html fuzzy-ref.html
>+fuzzy != fuzzy2.html fuzzy-ref.html
>+fuzzy-if(true) == fuzzy2.html fuzzy-ref.html
>+fuzzy-if(false) == fuzzy-ref.html fuzzy-ref.html

Hmmm.  I hadn't thought about fuzzy with != before.  You need to document what it does.  I think == and != should be symmetric, so that != combined with fuzzy requires that it be more different than the fuzziness threshold.  And how it behaves should certainly be documented.  (I think it behaves right based on the code -- though I'm confused by the tests.)

You also need to include the actual test files in the patch.

>             if (gWindowUtils) {
>-                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, {});
>+                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, maxDifference);
>                 equal = (differences == 0);
>             } else {
>                 differences = -1;
>+                maxDifference.value = -1;
>                 var k1 = gCanvas1.toDataURL();
>                 var k2 = gCanvas2.toDataURL();
>                 equal = (k1 == k2);
>             }

You need to address my earlier comment about getting rid of the !gWindowUtils codepath.

> 
>-            // whether the comparison result matches what is in the manifest
>-            var test_passed = (equal == (gURLs[0].type == TYPE_REFTEST_EQUAL));
>             // what is expected on this platform (PASS, FAIL, or RANDOM)
>             var expected = gURLs[0].expected;
>+
>+            if (maxDifference.value == 2) {

This seems pretty wrong -- a maxDifference of 2 counts as a fuzzy match, but if it's 1 it doesn't count as a fuzzy match‽

>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

See my earlier review comment on this.
Comment 22 Jeff Muizelaar [:jrmuizel] 2011-12-13 20:11:48 PST
Created attachment 581518 [details] [diff] [review]
Require gwindowutils
Comment 23 Jeff Muizelaar [:jrmuizel] 2011-12-13 20:13:17 PST
Created attachment 581519 [details] [diff] [review]
Address review comments better
Comment 24 Jeff Muizelaar [:jrmuizel] 2011-12-13 20:14:45 PST
(In reply to David Baron [:dbaron] from comment #21)
> Comment on attachment 575098 [details] [diff] [review]
> Address review comments
> 
> >+# reftest syntax: fuzzy
> >+fuzzy == fuzzy.html fuzzy-ref.html
> >+fuzzy != fuzzy2.html fuzzy-ref.html
> >+fuzzy-if(true) == fuzzy2.html fuzzy-ref.html
> >+fuzzy-if(false) == fuzzy-ref.html fuzzy-ref.html
> 
> Hmmm.  I hadn't thought about fuzzy with != before.  You need to document
> what it does.  I think == and != should be symmetric, so that != combined
> with fuzzy requires that it be more different than the fuzziness threshold. 
> And how it behaves should certainly be documented.  (I think it behaves
> right based on the code -- though I'm confused by the tests.)
> 
> You also need to include the actual test files in the patch.
> 
> >             if (gWindowUtils) {
> >-                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, {});
> >+                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, maxDifference);
> >                 equal = (differences == 0);
> >             } else {
> >                 differences = -1;
> >+                maxDifference.value = -1;
> >                 var k1 = gCanvas1.toDataURL();
> >                 var k2 = gCanvas2.toDataURL();
> >                 equal = (k1 == k2);
> >             }
> 
> You need to address my earlier comment about getting rid of the
> !gWindowUtils codepath.
> 
> > 
> >-            // whether the comparison result matches what is in the manifest
> >-            var test_passed = (equal == (gURLs[0].type == TYPE_REFTEST_EQUAL));
> >             // what is expected on this platform (PASS, FAIL, or RANDOM)
> >             var expected = gURLs[0].expected;
> >+
> >+            if (maxDifference.value == 2) {
> 
> This seems pretty wrong -- a maxDifference of 2 counts as a fuzzy match, but
> if it's 1 it doesn't count as a fuzzy match‽

Fixed.

> 
> >+                equal = expected == EXPECTED_FUZZY;
> >+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
> >+            }
> 
> See my earlier review comment on this.

I've chosen to keep it as it was. The advantage of this approach is that it will print things that match fuzzy but otherwise fail. If you still don't like it though, I'd be ok with changing it.
Comment 25 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-12-21 11:43:26 PST
Comment on attachment 581518 [details] [diff] [review]
Require gwindowutils

r=dbaron, but you need a better commit message
Comment 26 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2011-12-21 11:52:59 PST
Comment on attachment 581519 [details] [diff] [review]
Address review comments better

remove the file fuzzy.list -- nothing uses it.  (I presume you were using
it for debugging?)

In the reftest list, could you also add:

fails fuzzy-if(false) fuzzy.html fuzzy-ref.html

>+            if (maxDifference.value > 0 && maxDifference.value <= 2) {
>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

OK, I suppose this is ok, as long as you put inside the if, before the
assignment to equal:

    if (equal) {
        throw "Inconsistent result from compareCanvases.";
    }

I also don't think dumping out "equal" and "expected" is useful there.
(It might have been while debugging the patch, but I don't think it is
to people debugging reftests.)


r=dbaron with that
Comment 27 Jeff Muizelaar [:jrmuizel] 2011-12-29 20:07:45 PST
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bd2b99f0645
Comment 28 Phil Ringnalda (:philor) 2011-12-29 21:45:43 PST
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/3f2e5930cb89 - both fuzzy.html and too-fuzzy.html fail on Android reftest-1 (Android XUL, that being the one that actually runs reftests).
Comment 29 Jeff Muizelaar [:jrmuizel] 2012-01-05 15:34:00 PST
Created attachment 586248 [details] [diff] [review]
Fix the Android test failures

The Android test failures are presumably caused by 565. This fixes the problem by making too-fuzzy.html fuzzier and having the final test be random-if(Android).
Comment 30 Jeff Muizelaar [:jrmuizel] 2012-01-05 18:11:05 PST
(In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> Created attachment 586248 [details] [diff] [review]
> Fix the Android test failures
> 
> The Android test failures are presumably caused by 565. This fixes the
> problem by making too-fuzzy.html fuzzier and having the final test be
> random-if(Android).

Hmm. Apparently random-if(Android) fuzzy-if(false) doesn't seem to work properly.
Comment 31 Jeff Muizelaar [:jrmuizel] 2012-01-06 13:04:35 PST
(In reply to Jeff Muizelaar [:jrmuizel] from comment #30)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> > Created attachment 586248 [details] [diff] [review]
> > Fix the Android test failures
> > 
> > The Android test failures are presumably caused by 565. This fixes the
> > problem by making too-fuzzy.html fuzzier and having the final test be
> > random-if(Android).
> 
> Hmm. Apparently random-if(Android) fuzzy-if(false) doesn't seem to work
> properly.

But fuzzy-if(false) random-if(Android) seems to do the trick.
Comment 32 Jeff Muizelaar [:jrmuizel] 2012-01-09 16:15:17 PST
Created attachment 587193 [details] [diff] [review]
final version
Comment 33 Zack Weinberg (:zwol) 2012-01-13 15:21:03 PST
I kinda want a way to say "_these pixels_ may be fuzzy", but that can wait for a follow-up bug.
Comment 34 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-01-14 03:06:45 PST
You can mask out particular pixels with a transparent image, for example.

Maybe you still want to test that those pixels are close to expected, but I doubt that matters much.
Comment 35 Matt Brubeck (:mbrubeck) 2012-01-17 10:15:38 PST
Landed on inbound but backed out (along with the other patches it landed with):
https://hg.mozilla.org/integration/mozilla-inbound/rev/52f52eb53549

because one of the changes caused reftests to fail at startup with "REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: SyntaxError: missing ) in parenthetical":
https://tbpl.mozilla.org/php/getParsedLog.php?id=8612031&tree=Mozilla-Inbound
Comment 36 Matt Brubeck (:mbrubeck) 2012-01-17 12:19:27 PST
Re-landed on inbound:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=8b4aae361875

(Please comment in bugs when patches land on inbound, especially if they have been backed out and/or re-landed.  It makes it much easier for us to update bugs correctly after merges.)
Comment 38 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-01-18 17:30:42 PST
(In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> But fuzzy-if(false) random-if(Android) seems to do the trick.

That must be fixed! We shouldn't be burdening developers with that kind of gotcha.
Comment 39 Jeff Muizelaar [:jrmuizel] 2012-01-18 17:50:03 PST
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> > But fuzzy-if(false) random-if(Android) seems to do the trick.
> 
> That must be fixed! We shouldn't be burdening developers with that kind of
> gotcha.

Well since both conditions are true there needs to be some sort of precedence. Currently, that precedence is right to left which might be weirder than left to right, but if you think of them evaluating from left to right it seems reasonable.
Comment 40 Robert O'Callahan (:roc) (email my personal email if necessary) 2012-01-18 18:21:37 PST
True, sorry.
Comment 41 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2012-01-24 14:40:55 PST
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> > But fuzzy-if(false) random-if(Android) seems to do the trick.
> 
> That must be fixed! We shouldn't be burdening developers with that kind of
> gotcha.

My intent was that the rightmost *true* expectation is the one that's used.  So something that's false shouldn't be involved in the precedence at all.  Were you seeing otherwise?  Looking at the code it looks like that's what it will do.

I noticed (while checking the code for the above that it worked as I expected) that this patch introduced a few lines that use the wrong indent (2 spaces instead of 4) here:

            } else if ((m = item.match(/^fuzzy\((\d+),(\d+)\)$/))) {
              cond = false;
              expected_status = EXPECTED_FUZZY;
              fuzzy_max_delta = Number(m[1]);
              fuzzy_max_pixels = Number(m[2]);
            } else if ((m = item.match(/^fuzzy-if\((.*?),(\d+),(\d+)\)$/))) {
              cond = false;
              if (Components.utils.evalInSandbox("(" + m[1] + ")", sandbox)) {
                expected_status = EXPECTED_FUZZY;
                fuzzy_max_delta = Number(m[2]);
                fuzzy_max_pixels = Number(m[3]);
              }
Comment 42 David Baron :dbaron: ⌚️UTC-7 (busy September 14-25) 2012-01-24 14:50:31 PST
Comment on attachment 587193 [details] [diff] [review]
final version

r=dbaron (though I'm not sure why you made a new review request when all of the changes matched the requests I made when marking review+ on an older version -- then again, you already landed it)

Note You need to log in before you can comment on or make changes to this bug.