Support fuzzy matching for reftests

RESOLVED FIXED in mozilla12

Status

()

Core
Graphics
RESOLVED FIXED
7 years ago
a year ago

People

(Reporter: jrmuizel, Assigned: jrmuizel)

Tracking

(Blocks: 1 bug)

Trunk
mozilla12
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 4 obsolete attachments)

(Assignee)

Description

7 years ago
The basic idea here is to support a threshold of differences so that we don't get as bitten by rasterization differences. The threshold would be per test and platform and likely a sum of squared differences.
I thought that we decided to explicitly not do this for reftests?
(Assignee)

Comment 2

7 years ago
(In reply to comment #1)
The alternative seems to be disabling some tests for D2D. Some reftests assume that clip() creates the same pixels as fill(), D2D doesn't appear to have this property.
(Assignee)

Updated

7 years ago
Blocks: 578134, 578135
Under what conditions? Is that not a D2D bug?
(Assignee)

Comment 4

7 years ago
(In reply to comment #3)
> Under what conditions? Is that not a D2D bug?

I took an in-depth look at this today. 

test:
 - draw a grey circle
reference:
 - clip a grey surface to a circle mask

It looks like we can get slightly different results from the floating point computations in this two situations. I wasn't able to get a great picture of where exactly the differences were coming from because the PIX debugger's results didn't match the hardware and when run with the reference driver the difference seemed to go away.

All of our previous graphics frameworks have done compositing with integers so with the switch to loosely standardized floating point it seems like that these kinds of problems will tend to creep in.
Can we have the tinderbox machines use the reference driver? If we want to be testing non-reference drivers then presumably we'll need a separate pool of test machines for that anyway.
(Assignee)

Comment 6

7 years ago
I tried out the box shadow reftests with the reference driver and while the results were closer there was still a pixel that didn't match for reasons I could not explain. However, pixel shader's inputs and outputs were slightly different.

The reference is driver is also very slow. It takes 25 minutes to run the box-shadow reftests vs. the 7s seconds that hardware version. I also tried out WARP it was fast enough but had the worst errors of the three.
How much of a perf hit would it be to have the D2D backend do all fills via clip and paint?
(Assignee)

Comment 8

7 years ago
(In reply to comment #7)
> How much of a perf hit would it be to have the D2D backend do all fills via
> clip and paint?

I expect it would be noticeable. The D2D backend isn't very good at clipping. The clip and paint turns into a texture allocation the size of the bounds a fill() of that and then then a clip pass onto the destination. Texture allocation has in the past shown to be quite expensive.
Then I think we should just disable those tests for D2D.
(Assignee)

Comment 10

7 years ago
(In reply to comment #9)
> Then I think we should just disable those tests for D2D.

What's the advantage of disabling those tests over fuzzy matching them?
Maybe I'm just paranoid, but I worry that a general fuzzy-matching facility would be too strong a temptation for people to paper over small rendering differences.
And I think the thing to do for the bugs this blocks is to fix our painting code to use filling instead of clipping when painting background colors with rounded borders.
(Assignee)

Comment 13

7 years ago
Created attachment 463341 [details] [diff] [review]
Add fuzzy matching support

Here's an example of what this could look like.

It adds fuzzy-if and would be used like fuzzy-if(d2d). The fuzzy allows a max difference of 1 which should take care of any rounding differences.
(Assignee)

Comment 14

7 years ago
Created attachment 465328 [details] [diff] [review]
An updated version that actually works

The following tests can use fuzzy matching instead of disabling them.

fuzzy-if(d2d) == boxshadow-rounded-spread.html boxshadow-rounded-spread-ref.html
fuzzy-if(d2d) == boxshadow-onecorner.html boxshadow-onecorner-ref.html
fuzzy-if(d2d) == 523468-1.html 523468-1-ref.html
fuzzy-if(d2d) == 555388-1.html 555388-1-ref.html
fuzzy-if(d2d) == radial-1a.html radial-1-ref.html
fuzzy-if(d2d) == dynamic-2.html dynamic-2-ref.html
fuzzy-if(d2d) == filter-filterRes-low-01.svg pass.svg
Attachment #463341 - Attachment is obsolete: true
Attachment #465328 - Flags: review?
Attachment #465328 - Flags: review? → review?(roc)
Comment on attachment 465328 [details] [diff] [review]
An updated version that actually works

I'm OK with the principle, but dbaron needs to review this.
Attachment #465328 - Flags: review?(roc) → review?(dbaron)
So the idea here is that this allows colors to vary by a single color component?  So an rgb(0, 0, 1) pixel could match black in the reference, but an rgb(0, 0, 2) pixel could not?  I guess I could be ok with that.

However, you need to document what it means in layout/reftest/tools/README.txt.

Also, if we do this, I think the case where we don't have DOMWindowUtils would need to be no-longer-supported.  So I think we'd want to rip out the !gWindowUtils code.  (If that doesn't work on tinderbox, we have a problem... but I'm not going to guarantee that it will work on tinderbox; you need to test.)

>+            if (maxDifference.value == 1) {
>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

I think this might be clearer as:

if (expected == EXPECTED_FUZZY && maxDifference.value == 1) {
  // When the expected result is fuzzy, a max difference of 1 counts as equal.
  equal = true;
}


Also, " " + "max difference: " doesn't need two strings.

And you don't want the change from "IMAGE 1" to "IMAGE 1a"; I'm guessing that was to see if your changes were taking effect?

>+    outputs[EXPECTED_FUZZY] = {
>+        true:  {s: "TEST-PASS"                  , n: "Pass"},
>+        false: {s: "TEST-UNEXPECTED-FAIL"       , n: "UnexpectedFail"}
>+    };

Can you just do outputs[EXPECTED_FUZZY] = outputs[EXPECTED_PASS] ?
Also, I'm going to ask that you add tests to layout/reftests/reftest-sanity/ that check that this works as expected.  Such tests would involve:
 * a pair of pieces of markup that have only single-color-component-value differences
 * a pair of pieces of markup that have some two-color-component-value differences
 * tests involving fuzzy, fuzzy-if(true), and fuzzy-if(false) with both == and !=
Blocks: 586835
Comment on attachment 465328 [details] [diff] [review]
An updated version that actually works

Marking review- to elicit answers to questions (and perhaps also a revised patch).
Attachment #465328 - Flags: review?(dbaron) → review-
Are you planning to update this patch so it can get landed?
(Assignee)

Comment 20

6 years ago
Created attachment 575098 [details] [diff] [review]
Address review comments
Attachment #465328 - Attachment is obsolete: true
Attachment #575098 - Flags: review?(dbaron)
Comment on attachment 575098 [details] [diff] [review]
Address review comments

>+# reftest syntax: fuzzy
>+fuzzy == fuzzy.html fuzzy-ref.html
>+fuzzy != fuzzy2.html fuzzy-ref.html
>+fuzzy-if(true) == fuzzy2.html fuzzy-ref.html
>+fuzzy-if(false) == fuzzy-ref.html fuzzy-ref.html

Hmmm.  I hadn't thought about fuzzy with != before.  You need to document what it does.  I think == and != should be symmetric, so that != combined with fuzzy requires that it be more different than the fuzziness threshold.  And how it behaves should certainly be documented.  (I think it behaves right based on the code -- though I'm confused by the tests.)

You also need to include the actual test files in the patch.

>             if (gWindowUtils) {
>-                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, {});
>+                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, maxDifference);
>                 equal = (differences == 0);
>             } else {
>                 differences = -1;
>+                maxDifference.value = -1;
>                 var k1 = gCanvas1.toDataURL();
>                 var k2 = gCanvas2.toDataURL();
>                 equal = (k1 == k2);
>             }

You need to address my earlier comment about getting rid of the !gWindowUtils codepath.

> 
>-            // whether the comparison result matches what is in the manifest
>-            var test_passed = (equal == (gURLs[0].type == TYPE_REFTEST_EQUAL));
>             // what is expected on this platform (PASS, FAIL, or RANDOM)
>             var expected = gURLs[0].expected;
>+
>+            if (maxDifference.value == 2) {

This seems pretty wrong -- a maxDifference of 2 counts as a fuzzy match, but if it's 1 it doesn't count as a fuzzy match‽

>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

See my earlier review comment on this.
Attachment #575098 - Attachment description: Addresss review comments → Address review comments
Attachment #575098 - Flags: review?(dbaron) → review-
(Assignee)

Comment 22

6 years ago
Created attachment 581518 [details] [diff] [review]
Require gwindowutils
Attachment #581518 - Flags: review?(dbaron)
(Assignee)

Comment 23

6 years ago
Created attachment 581519 [details] [diff] [review]
Address review comments better
Attachment #581519 - Flags: review?(dbaron)
(Assignee)

Comment 24

6 years ago
(In reply to David Baron [:dbaron] from comment #21)
> Comment on attachment 575098 [details] [diff] [review]
> Address review comments
> 
> >+# reftest syntax: fuzzy
> >+fuzzy == fuzzy.html fuzzy-ref.html
> >+fuzzy != fuzzy2.html fuzzy-ref.html
> >+fuzzy-if(true) == fuzzy2.html fuzzy-ref.html
> >+fuzzy-if(false) == fuzzy-ref.html fuzzy-ref.html
> 
> Hmmm.  I hadn't thought about fuzzy with != before.  You need to document
> what it does.  I think == and != should be symmetric, so that != combined
> with fuzzy requires that it be more different than the fuzziness threshold. 
> And how it behaves should certainly be documented.  (I think it behaves
> right based on the code -- though I'm confused by the tests.)
> 
> You also need to include the actual test files in the patch.
> 
> >             if (gWindowUtils) {
> >-                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, {});
> >+                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, maxDifference);
> >                 equal = (differences == 0);
> >             } else {
> >                 differences = -1;
> >+                maxDifference.value = -1;
> >                 var k1 = gCanvas1.toDataURL();
> >                 var k2 = gCanvas2.toDataURL();
> >                 equal = (k1 == k2);
> >             }
> 
> You need to address my earlier comment about getting rid of the
> !gWindowUtils codepath.
> 
> > 
> >-            // whether the comparison result matches what is in the manifest
> >-            var test_passed = (equal == (gURLs[0].type == TYPE_REFTEST_EQUAL));
> >             // what is expected on this platform (PASS, FAIL, or RANDOM)
> >             var expected = gURLs[0].expected;
> >+
> >+            if (maxDifference.value == 2) {
> 
> This seems pretty wrong -- a maxDifference of 2 counts as a fuzzy match, but
> if it's 1 it doesn't count as a fuzzy match‽

Fixed.

> 
> >+                equal = expected == EXPECTED_FUZZY;
> >+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
> >+            }
> 
> See my earlier review comment on this.

I've chosen to keep it as it was. The advantage of this approach is that it will print things that match fuzzy but otherwise fail. If you still don't like it though, I'd be ok with changing it.
Comment on attachment 581518 [details] [diff] [review]
Require gwindowutils

r=dbaron, but you need a better commit message
Attachment #581518 - Flags: review?(dbaron) → review+
Comment on attachment 581519 [details] [diff] [review]
Address review comments better

remove the file fuzzy.list -- nothing uses it.  (I presume you were using
it for debugging?)

In the reftest list, could you also add:

fails fuzzy-if(false) fuzzy.html fuzzy-ref.html

>+            if (maxDifference.value > 0 && maxDifference.value <= 2) {
>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

OK, I suppose this is ok, as long as you put inside the if, before the
assignment to equal:

    if (equal) {
        throw "Inconsistent result from compareCanvases.";
    }

I also don't think dumping out "equal" and "expected" is useful there.
(It might have been while debugging the patch, but I don't think it is
to people debugging reftests.)


r=dbaron with that
Attachment #581519 - Flags: review?(dbaron) → review+
(Assignee)

Updated

6 years ago
Assignee: nobody → jmuizelaar
Target Milestone: --- → mozilla11
(Assignee)

Comment 27

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/5bd2b99f0645
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/3f2e5930cb89 - both fuzzy.html and too-fuzzy.html fail on Android reftest-1 (Android XUL, that being the one that actually runs reftests).
Target Milestone: mozilla11 → ---
(Assignee)

Comment 29

6 years ago
Created attachment 586248 [details] [diff] [review]
Fix the Android test failures

The Android test failures are presumably caused by 565. This fixes the problem by making too-fuzzy.html fuzzier and having the final test be random-if(Android).
Attachment #586248 - Flags: review?(dbaron)
(Assignee)

Comment 30

6 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> Created attachment 586248 [details] [diff] [review]
> Fix the Android test failures
> 
> The Android test failures are presumably caused by 565. This fixes the
> problem by making too-fuzzy.html fuzzier and having the final test be
> random-if(Android).

Hmm. Apparently random-if(Android) fuzzy-if(false) doesn't seem to work properly.
(Assignee)

Comment 31

6 years ago
(In reply to Jeff Muizelaar [:jrmuizel] from comment #30)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> > Created attachment 586248 [details] [diff] [review]
> > Fix the Android test failures
> > 
> > The Android test failures are presumably caused by 565. This fixes the
> > problem by making too-fuzzy.html fuzzier and having the final test be
> > random-if(Android).
> 
> Hmm. Apparently random-if(Android) fuzzy-if(false) doesn't seem to work
> properly.

But fuzzy-if(false) random-if(Android) seems to do the trick.
(Assignee)

Comment 32

6 years ago
Created attachment 587193 [details] [diff] [review]
final version
Attachment #581519 - Attachment is obsolete: true
Attachment #586248 - Attachment is obsolete: true
Attachment #587193 - Flags: review?(dbaron)
Attachment #586248 - Flags: review?(dbaron)
I kinda want a way to say "_these pixels_ may be fuzzy", but that can wait for a follow-up bug.
You can mask out particular pixels with a transparent image, for example.

Maybe you still want to test that those pixels are close to expected, but I doubt that matters much.
Blocks: 718329
Landed on inbound but backed out (along with the other patches it landed with):
https://hg.mozilla.org/integration/mozilla-inbound/rev/52f52eb53549

because one of the changes caused reftests to fail at startup with "REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: SyntaxError: missing ) in parenthetical":
https://tbpl.mozilla.org/php/getParsedLog.php?id=8612031&tree=Mozilla-Inbound
Re-landed on inbound:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=8b4aae361875

(Please comment in bugs when patches land on inbound, especially if they have been backed out and/or re-landed.  It makes it much easier for us to update bugs correctly after merges.)
Target Milestone: --- → mozilla12
https://hg.mozilla.org/mozilla-central/rev/39948fbe6659
https://hg.mozilla.org/mozilla-central/rev/06c1e5e1387a
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> But fuzzy-if(false) random-if(Android) seems to do the trick.

That must be fixed! We shouldn't be burdening developers with that kind of gotcha.
(Assignee)

Comment 39

6 years ago
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> > But fuzzy-if(false) random-if(Android) seems to do the trick.
> 
> That must be fixed! We shouldn't be burdening developers with that kind of
> gotcha.

Well since both conditions are true there needs to be some sort of precedence. Currently, that precedence is right to left which might be weirder than left to right, but if you think of them evaluating from left to right it seems reasonable.
True, sorry.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> > But fuzzy-if(false) random-if(Android) seems to do the trick.
> 
> That must be fixed! We shouldn't be burdening developers with that kind of
> gotcha.

My intent was that the rightmost *true* expectation is the one that's used.  So something that's false shouldn't be involved in the precedence at all.  Were you seeing otherwise?  Looking at the code it looks like that's what it will do.

I noticed (while checking the code for the above that it worked as I expected) that this patch introduced a few lines that use the wrong indent (2 spaces instead of 4) here:

            } else if ((m = item.match(/^fuzzy\((\d+),(\d+)\)$/))) {
              cond = false;
              expected_status = EXPECTED_FUZZY;
              fuzzy_max_delta = Number(m[1]);
              fuzzy_max_pixels = Number(m[2]);
            } else if ((m = item.match(/^fuzzy-if\((.*?),(\d+),(\d+)\)$/))) {
              cond = false;
              if (Components.utils.evalInSandbox("(" + m[1] + ")", sandbox)) {
                expected_status = EXPECTED_FUZZY;
                fuzzy_max_delta = Number(m[2]);
                fuzzy_max_pixels = Number(m[3]);
              }
Comment on attachment 587193 [details] [diff] [review]
final version

r=dbaron (though I'm not sure why you made a new review request when all of the changes matched the requests I made when marking review+ on an older version -- then again, you already landed it)
Attachment #587193 - Flags: review?(dbaron) → review+
Depends on: 1252361
You need to log in before you can comment on or make changes to this bug.