Closed Bug 580786 Opened 14 years ago Closed 12 years ago

Support fuzzy matching for reftests

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla12

People

(Reporter: jrmuizel, Assigned: jrmuizel)

References

(Blocks 1 open bug)

Details

Attachments

(3 files, 4 obsolete files)

Add fuzzy matching support 14 years ago Jeff Muizelaar [:jrmuizel] 2.79 KB, patch		Details \| Diff \| Splinter Review
An updated version that actually works 14 years ago Jeff Muizelaar [:jrmuizel] 4.14 KB, patch	dbaron : review-	Details \| Diff \| Splinter Review
Address review comments 13 years ago Jeff Muizelaar [:jrmuizel] 6.18 KB, patch	dbaron : review-	Details \| Diff \| Splinter Review
Require gwindowutils 13 years ago Jeff Muizelaar [:jrmuizel] 2.17 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review
Address review comments better 13 years ago Jeff Muizelaar [:jrmuizel] 7.43 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review
Fix the Android test failures 13 years ago Jeff Muizelaar [:jrmuizel] 1.05 KB, patch		Details \| Diff \| Splinter Review
final version 12 years ago Jeff Muizelaar [:jrmuizel] 8.07 KB, patch	dbaron : review+	Details \| Diff \| Splinter Review

Jeff Muizelaar [:jrmuizel]

Assignee

Description

•

14 years ago

The basic idea here is to support a threshold of differences so that we don't get as bitten by rasterization differences. The threshold would be per test and platform and likely a sum of squared differences.

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 1

•

14 years ago

I thought that we decided to explicitly not do this for reftests?

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 2

•

14 years ago

(In reply to comment #1)
The alternative seems to be disabling some tests for D2D. Some reftests assume that clip() creates the same pixels as fill(), D2D doesn't appear to have this property.

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

14 years ago

Blocks: 578134, 578135

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 3

•

14 years ago

Under what conditions? Is that not a D2D bug?

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 4

•

14 years ago

(In reply to comment #3)
> Under what conditions? Is that not a D2D bug?

I took an in-depth look at this today. 

test:
 - draw a grey circle
reference:
 - clip a grey surface to a circle mask

It looks like we can get slightly different results from the floating point computations in this two situations. I wasn't able to get a great picture of where exactly the differences were coming from because the PIX debugger's results didn't match the hardware and when run with the reference driver the difference seemed to go away.

All of our previous graphics frameworks have done compositing with integers so with the switch to loosely standardized floating point it seems like that these kinds of problems will tend to creep in.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 5

•

14 years ago

Can we have the tinderbox machines use the reference driver? If we want to be testing non-reference drivers then presumably we'll need a separate pool of test machines for that anyway.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 6

•

14 years ago

I tried out the box shadow reftests with the reference driver and while the results were closer there was still a pixel that didn't match for reasons I could not explain. However, pixel shader's inputs and outputs were slightly different.

The reference is driver is also very slow. It takes 25 minutes to run the box-shadow reftests vs. the 7s seconds that hardware version. I also tried out WARP it was fast enough but had the worst errors of the three.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 7

•

14 years ago

How much of a perf hit would it be to have the D2D backend do all fills via clip and paint?

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 8

•

14 years ago

(In reply to comment #7)
> How much of a perf hit would it be to have the D2D backend do all fills via
> clip and paint?

I expect it would be noticeable. The D2D backend isn't very good at clipping. The clip and paint turns into a texture allocation the size of the bounds a fill() of that and then then a clip pass onto the destination. Texture allocation has in the past shown to be quite expensive.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 9

•

14 years ago

Then I think we should just disable those tests for D2D.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 10

•

14 years ago

(In reply to comment #9)
> Then I think we should just disable those tests for D2D.

What's the advantage of disabling those tests over fuzzy matching them?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 11

•

14 years ago

Maybe I'm just paranoid, but I worry that a general fuzzy-matching facility would be too strong a temptation for people to paper over small rendering differences.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

14 years ago

And I think the thing to do for the bugs this blocks is to fix our painting code to use filling instead of clipping when painting background colors with rounded borders.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 13

•

14 years ago

Attached patch Add fuzzy matching support (obsolete) — Details — Splinter Review

Here's an example of what this could look like.

It adds fuzzy-if and would be used like fuzzy-if(d2d). The fuzzy allows a max difference of 1 which should take care of any rounding differences.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 14

•

14 years ago

Attached patch An updated version that actually works (obsolete) — Details — Splinter Review

The following tests can use fuzzy matching instead of disabling them.

fuzzy-if(d2d) == boxshadow-rounded-spread.html boxshadow-rounded-spread-ref.html
fuzzy-if(d2d) == boxshadow-onecorner.html boxshadow-onecorner-ref.html
fuzzy-if(d2d) == 523468-1.html 523468-1-ref.html
fuzzy-if(d2d) == 555388-1.html 555388-1-ref.html
fuzzy-if(d2d) == radial-1a.html radial-1-ref.html
fuzzy-if(d2d) == dynamic-2.html dynamic-2-ref.html
fuzzy-if(d2d) == filter-filterRes-low-01.svg pass.svg

Attachment #463341 - Attachment is obsolete: true

Attachment #465328 - Flags: review?

Joe Drew (not getting mail)

Updated

•

14 years ago

Attachment #465328 - Flags: review? → review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

14 years ago

Comment on attachment 465328 [details] [diff] [review]
An updated version that actually works

I'm OK with the principle, but dbaron needs to review this.

Attachment #465328 - Flags: review?(roc) → review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 16

•

14 years ago

So the idea here is that this allows colors to vary by a single color component?  So an rgb(0, 0, 1) pixel could match black in the reference, but an rgb(0, 0, 2) pixel could not?  I guess I could be ok with that.

However, you need to document what it means in layout/reftest/tools/README.txt.

Also, if we do this, I think the case where we don't have DOMWindowUtils would need to be no-longer-supported.  So I think we'd want to rip out the !gWindowUtils code.  (If that doesn't work on tinderbox, we have a problem... but I'm not going to guarantee that it will work on tinderbox; you need to test.)

>+            if (maxDifference.value == 1) {
>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

I think this might be clearer as:

if (expected == EXPECTED_FUZZY && maxDifference.value == 1) {
  // When the expected result is fuzzy, a max difference of 1 counts as equal.
  equal = true;
}


Also, " " + "max difference: " doesn't need two strings.

And you don't want the change from "IMAGE 1" to "IMAGE 1a"; I'm guessing that was to see if your changes were taking effect?

>+    outputs[EXPECTED_FUZZY] = {
>+        true:  {s: "TEST-PASS"                  , n: "Pass"},
>+        false: {s: "TEST-UNEXPECTED-FAIL"       , n: "UnexpectedFail"}
>+    };

Can you just do outputs[EXPECTED_FUZZY] = outputs[EXPECTED_PASS] ?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 17

•

14 years ago

Also, I'm going to ask that you add tests to layout/reftests/reftest-sanity/ that check that this works as expected.  Such tests would involve:
 * a pair of pieces of markup that have only single-color-component-value differences
 * a pair of pieces of markup that have some two-color-component-value differences
 * tests involving fuzzy, fuzzy-if(true), and fuzzy-if(false) with both == and !=

Joe Drew (not getting mail)

Updated

•

14 years ago

Blocks: 586835

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 18

•

14 years ago

Comment on attachment 465328 [details] [diff] [review]
An updated version that actually works

Marking review- to elicit answers to questions (and perhaps also a revised patch).

Attachment #465328 - Flags: review?(dbaron) → review-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 19

•

14 years ago

Are you planning to update this patch so it can get landed?

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 20

•

13 years ago

Attached patch Address review comments — Details — Splinter Review

Attachment #465328 - Attachment is obsolete: true

Attachment #575098 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 21

•

13 years ago

Comment on attachment 575098 [details] [diff] [review]
Address review comments

>+# reftest syntax: fuzzy
>+fuzzy == fuzzy.html fuzzy-ref.html
>+fuzzy != fuzzy2.html fuzzy-ref.html
>+fuzzy-if(true) == fuzzy2.html fuzzy-ref.html
>+fuzzy-if(false) == fuzzy-ref.html fuzzy-ref.html

Hmmm.  I hadn't thought about fuzzy with != before.  You need to document what it does.  I think == and != should be symmetric, so that != combined with fuzzy requires that it be more different than the fuzziness threshold.  And how it behaves should certainly be documented.  (I think it behaves right based on the code -- though I'm confused by the tests.)

You also need to include the actual test files in the patch.

>             if (gWindowUtils) {
>-                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, {});
>+                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, maxDifference);
>                 equal = (differences == 0);
>             } else {
>                 differences = -1;
>+                maxDifference.value = -1;
>                 var k1 = gCanvas1.toDataURL();
>                 var k2 = gCanvas2.toDataURL();
>                 equal = (k1 == k2);
>             }

You need to address my earlier comment about getting rid of the !gWindowUtils codepath.

> 
>-            // whether the comparison result matches what is in the manifest
>-            var test_passed = (equal == (gURLs[0].type == TYPE_REFTEST_EQUAL));
>             // what is expected on this platform (PASS, FAIL, or RANDOM)
>             var expected = gURLs[0].expected;
>+
>+            if (maxDifference.value == 2) {

This seems pretty wrong -- a maxDifference of 2 counts as a fuzzy match, but if it's 1 it doesn't count as a fuzzy match‽

>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

See my earlier review comment on this.

Attachment #575098 - Attachment description: Addresss review comments → Address review comments

Attachment #575098 - Flags: review?(dbaron) → review-

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 22

•

13 years ago

Attached patch Require gwindowutils — Details — Splinter Review

Attachment #581518 - Flags: review?(dbaron)

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 23

•

13 years ago

Attached patch Address review comments better (obsolete) — Details — Splinter Review

Attachment #581519 - Flags: review?(dbaron)

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 24

•

13 years ago

(In reply to David Baron [:dbaron] from comment #21)
> Comment on attachment 575098 [details] [diff] [review]
> Address review comments
> 
> >+# reftest syntax: fuzzy
> >+fuzzy == fuzzy.html fuzzy-ref.html
> >+fuzzy != fuzzy2.html fuzzy-ref.html
> >+fuzzy-if(true) == fuzzy2.html fuzzy-ref.html
> >+fuzzy-if(false) == fuzzy-ref.html fuzzy-ref.html
> 
> Hmmm.  I hadn't thought about fuzzy with != before.  You need to document
> what it does.  I think == and != should be symmetric, so that != combined
> with fuzzy requires that it be more different than the fuzziness threshold. 
> And how it behaves should certainly be documented.  (I think it behaves
> right based on the code -- though I'm confused by the tests.)
> 
> You also need to include the actual test files in the patch.
> 
> >             if (gWindowUtils) {
> >-                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, {});
> >+                differences = gWindowUtils.compareCanvases(gCanvas1, gCanvas2, maxDifference);
> >                 equal = (differences == 0);
> >             } else {
> >                 differences = -1;
> >+                maxDifference.value = -1;
> >                 var k1 = gCanvas1.toDataURL();
> >                 var k2 = gCanvas2.toDataURL();
> >                 equal = (k1 == k2);
> >             }
> 
> You need to address my earlier comment about getting rid of the
> !gWindowUtils codepath.
> 
> > 
> >-            // whether the comparison result matches what is in the manifest
> >-            var test_passed = (equal == (gURLs[0].type == TYPE_REFTEST_EQUAL));
> >             // what is expected on this platform (PASS, FAIL, or RANDOM)
> >             var expected = gURLs[0].expected;
> >+
> >+            if (maxDifference.value == 2) {
> 
> This seems pretty wrong -- a maxDifference of 2 counts as a fuzzy match, but
> if it's 1 it doesn't count as a fuzzy match‽

Fixed.

> 
> >+                equal = expected == EXPECTED_FUZZY;
> >+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
> >+            }
> 
> See my earlier review comment on this.

I've chosen to keep it as it was. The advantage of this approach is that it will print things that match fuzzy but otherwise fail. If you still don't like it though, I'd be ok with changing it.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 25

•

13 years ago

Comment on attachment 581518 [details] [diff] [review]
Require gwindowutils

r=dbaron, but you need a better commit message

Attachment #581518 - Flags: review?(dbaron) → review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 26

•

13 years ago

Comment on attachment 581519 [details] [diff] [review]
Address review comments better

remove the file fuzzy.list -- nothing uses it.  (I presume you were using
it for debugging?)

In the reftest list, could you also add:

fails fuzzy-if(false) fuzzy.html fuzzy-ref.html

>+            if (maxDifference.value > 0 && maxDifference.value <= 2) {
>+                equal = expected == EXPECTED_FUZZY;
>+                gDumpLog("REFTEST fuzzy match: " + equal + " " + expected + " \n");
>+            }

OK, I suppose this is ok, as long as you put inside the if, before the
assignment to equal:

    if (equal) {
        throw "Inconsistent result from compareCanvases.";
    }

I also don't think dumping out "equal" and "expected" is useful there.
(It might have been while debugging the patch, but I don't think it is
to people debugging reftests.)


r=dbaron with that

Attachment #581519 - Flags: review?(dbaron) → review+

Jeff Muizelaar [:jrmuizel]

Assignee

Updated

•

13 years ago

Assignee: nobody → jmuizelaar

Target Milestone: --- → mozilla11

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 27

•

13 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/5bd2b99f0645

Phil Ringnalda (:philor)

Comment 28

•

13 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/3f2e5930cb89 - both fuzzy.html and too-fuzzy.html fail on Android reftest-1 (Android XUL, that being the one that actually runs reftests).

Target Milestone: mozilla11 → ---

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 29

•

13 years ago

Attached patch Fix the Android test failures (obsolete) — Details — Splinter Review

The Android test failures are presumably caused by 565. This fixes the problem by making too-fuzzy.html fuzzier and having the final test be random-if(Android).

Attachment #586248 - Flags: review?(dbaron)

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 30

•

13 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> Created attachment 586248 [details] [diff] [review]
> Fix the Android test failures
> 
> The Android test failures are presumably caused by 565. This fixes the
> problem by making too-fuzzy.html fuzzier and having the final test be
> random-if(Android).

Hmm. Apparently random-if(Android) fuzzy-if(false) doesn't seem to work properly.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 31

•

13 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #30)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #29)
> > Created attachment 586248 [details] [diff] [review]
> > Fix the Android test failures
> > 
> > The Android test failures are presumably caused by 565. This fixes the
> > problem by making too-fuzzy.html fuzzier and having the final test be
> > random-if(Android).
> 
> Hmm. Apparently random-if(Android) fuzzy-if(false) doesn't seem to work
> properly.

But fuzzy-if(false) random-if(Android) seems to do the trick.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 32

•

12 years ago

Attached patch final version — Details — Splinter Review

Attachment #581519 - Attachment is obsolete: true

Attachment #586248 - Attachment is obsolete: true

Attachment #587193 - Flags: review?(dbaron)

Attachment #586248 - Flags: review?(dbaron)

Zack Weinberg (:zwol)

Comment 33

•

12 years ago

I kinda want a way to say "_these pixels_ may be fuzzy", but that can wait for a follow-up bug.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 34

•

12 years ago

You can mask out particular pixels with a transparent image, for example.

Maybe you still want to test that those pixels are close to expected, but I doubt that matters much.

Matt Woodrow (:mattwoodrow)

Updated

•

12 years ago

Blocks: 718329

Matt Brubeck (:mbrubeck)

Comment 35

•

12 years ago

Landed on inbound but backed out (along with the other patches it landed with):
https://hg.mozilla.org/integration/mozilla-inbound/rev/52f52eb53549

because one of the changes caused reftests to fail at startup with "REFTEST TEST-UNEXPECTED-FAIL | | EXCEPTION: SyntaxError: missing ) in parenthetical":
https://tbpl.mozilla.org/php/getParsedLog.php?id=8612031&tree=Mozilla-Inbound

Matt Brubeck (:mbrubeck)

Comment 36

•

12 years ago

Re-landed on inbound:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=8b4aae361875

(Please comment in bugs when patches land on inbound, especially if they have been backed out and/or re-landed.  It makes it much easier for us to update bugs correctly after merges.)

Target Milestone: --- → mozilla12

Joe Drew (not getting mail)

Comment 37

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/39948fbe6659
https://hg.mozilla.org/mozilla-central/rev/06c1e5e1387a

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 38

•

12 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> But fuzzy-if(false) random-if(Android) seems to do the trick.

That must be fixed! We shouldn't be burdening developers with that kind of gotcha.

Jeff Muizelaar [:jrmuizel]

Assignee

Comment 39

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> > But fuzzy-if(false) random-if(Android) seems to do the trick.
> 
> That must be fixed! We shouldn't be burdening developers with that kind of
> gotcha.

Well since both conditions are true there needs to be some sort of precedence. Currently, that precedence is right to left which might be weirder than left to right, but if you think of them evaluating from left to right it seems reasonable.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 40

•

12 years ago

True, sorry.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 41

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #38)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #31)
> > But fuzzy-if(false) random-if(Android) seems to do the trick.
> 
> That must be fixed! We shouldn't be burdening developers with that kind of
> gotcha.

My intent was that the rightmost *true* expectation is the one that's used.  So something that's false shouldn't be involved in the precedence at all.  Were you seeing otherwise?  Looking at the code it looks like that's what it will do.

I noticed (while checking the code for the above that it worked as I expected) that this patch introduced a few lines that use the wrong indent (2 spaces instead of 4) here:

            } else if ((m = item.match(/^fuzzy\((\d+),(\d+)\)$/))) {
              cond = false;
              expected_status = EXPECTED_FUZZY;
              fuzzy_max_delta = Number(m[1]);
              fuzzy_max_pixels = Number(m[2]);
            } else if ((m = item.match(/^fuzzy-if\((.*?),(\d+),(\d+)\)$/))) {
              cond = false;
              if (Components.utils.evalInSandbox("(" + m[1] + ")", sandbox)) {
                expected_status = EXPECTED_FUZZY;
                fuzzy_max_delta = Number(m[2]);
                fuzzy_max_pixels = Number(m[3]);
              }

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 42

•

12 years ago

Comment on attachment 587193 [details] [diff] [review]
final version

r=dbaron (though I'm not sure why you made a new review request when all of the changes matched the requests I made when marking review+ on an older version -- then again, you already landed it)

Attachment #587193 - Flags: review?(dbaron) → review+

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

8 years ago

Depends on: 1252361

You need to log in before you can comment on or make changes to this bug.