Open Bug 1346692 Opened 7 years ago Updated 2 years ago

Reftest failure type may overwrite each other

Categories

(Testing :: Reftest, defect)

Version 3
defect

Tracking

(Not tracked)

People

(Reporter: shinglyu, Unassigned)

References

Details

If you write this kind of expectation in reftest.list:

   skip-if(http.oscpu=="Linux\u0020x86_64") fails-if(stylo) == foo.html foo.html

Assuming that I'm on a linux x86-64 AND stylo build, it will evaluate to EXPECTED-FAIL

But If I reverse the two:

   fails-if(stylo) skip-if(http.oscpu=="Linux\u0020x86_64") == foo.html foo.html

It will evaluate to SKIP. 

In this particular case, I think SKIP is a better solution, because when we use skip that usually means it's crashing on linux. There are already some similar usage in m-c like "skip-if(Android) fails-if(webrender)"

I was thinking about make reftest crash when it see skip-if followed by fails-if, but there are other case that are totally valid like:

    skip-if(!Andorid) fails-if(Android) ...

I'm not sure what's the best way to fix it, maybe we should collect all conditions and evaluate them together, rather then evaluate them one by one in a while loop.

Do you have any opinion on this, dbaron?
Flags: needinfo?(dbaron)
Another possibility is that we clearly define the specificity of the flags, and we collect all the flags, sort them by specificity then evaluate them from the least specific to the most specific.
The documentation has always said that later failure types listed override earlier ones:
https://searchfox.org/mozilla-central/rev/ef0b6a528e0e7b354ddf8cdce08392be8b8ca679/layout/tools/reftest/README.txt#56
this allows people to write whatever overriding they want by changing the order.  (Though there are a few things listed in that list that should really be separate.)

The alternative is to somehow rank the types, which might make sense for skip, but I think it's far less obvious how to do that for the other types, so I think I lean towards keeping things simple and understandable even if they don't always produce the result people expect, because it makes it always *possible* to produce the expected results.

For example, if a test is random-if() because on some machines the native theme button makes the test too wide but doesn't do the same to the reference, but fails-if() because of a bug on some platforms that causes a vertical alignment error, then we want the fails-if() to override the random-if() and report a TEST-UNEXPECTED-PASS if the test passes.  On the other hand, if a test is random-if() because an image drawing bug causes the entire test or reference to sometimes fail to draw (which might lead to one or both being blank) and fails-if() on some build configurations because of a positioning error, we want the random-if() to override the fails-if() so that we do not report a TEST-UNEXPECTED-PASS if both test and reference are blank.
Flags: needinfo?(dbaron)
(If anything, I think maybe the mistake was that earlier things in the line should have overridden later ones, instead of later ones overriding earlier ones.)
Blocks: 1337766
See Also: → 1337766
Thanks for clarifying the documentation!
bug 1057526 and bug 1299325 seem to need attention when it comes to operator precedence in the reftest manifest, specifically with fuzzy if.
See Also: → 1299325
The leave-open keyword is there and there is no activity for 6 months.
:ahal, maybe it's time to close this bug?
Flags: needinfo?(ahal)
I think there might still be some discussion to be had here, but the leave-open can definitely be removed.
Flags: needinfo?(ahal)
Keywords: leave-open
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.