Closed Bug 1521147 Opened 5 years ago Closed 5 years ago

WPT reftest known failure screenshots get logged with "TEST-FAIL" which reftest-analyzer does not understand

Categories

(Testing :: web-platform-tests, defect)

Version 3
defect
Not set
normal

Tracking

(firefox67 fixed)

RESOLVED FIXED
mozilla67
Tracking Status
firefox67 --- fixed

People

(Reporter: dholbert, Assigned: jgraham)

References

Details

Attachments

(2 files)

tl;dr: the --log-tbpl output from WPT currently flags known test failures as TEST-FAIL, which is incomplete (and rejected by reftest-analyzer) because it's missing a "known failure vs. unexpected failure" assessment. We should instead use TEST-KNOWN-FAIL which then lets human readers and reftest-analyzer understand the failure.

STR:

  1. Run the following command (for this test or any known-failure wpt reftest -- note that this particular one may be fixed in a few days so it'll become less good of a testfile):
./mach test --reftest-screenshot=fail \
--log-tbpl mylog.txt  testing/web-platform/tests/css/vendor-imports/mozilla/mozilla-central-reftests/flexbox/flexbox-justify-content-horiz-001b.xhtml
  1. Try to load your "mylog.txt" file in https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml
    ...or, just copypaste the 3 lines for the failure (with the filenames and REFTEST IMAGE...") into the reftest-analyzer.xhtml textbox and click the button to process it.

ACTUAL RESULTS:
reftest-analyzer rejects it, because the test is flagged as TEST-FAIL, which is a classification that reftest-analyzer doesn't understand, because it doesn't have an assessment of whether the failure was expected or not.

EXPECTED RESULTS:
The logging should say TEST-KNOWN-FAIL rather than TEST-FAIL. This is something that reftest-analyzer understands, and it also helpfully classifies the test failure for human readers.

Note 1: We also have bug 1521140 which may cause reftest-analyzer to reject your log for sillier reasons. (Fortunately that's easy to work around by just deleting the trailing text as indicated in that bug.)

Note 2: If you perform this bug's STR with a test that has an unexpected failure, then reftest-analyzer parses the log just fine, because the WPT harness flags those (in --log-tbpl output) as TEST-UNEXPECTED-FAIL, which reftest-analyzer understands.

Here's some sample --log-tbpl logged output that demonstrates the issue. The problematic line looks like this:

TEST-FAIL | /css/vendor-imports/mozilla/mozilla-central-reftests/flexbox/flexbox-justify-content-horiz-001b.xhtml | took 4069ms

If I adjust that line to say TEST-KNOWN-FAIL, then reftest analyzer handles it (and the subsequent two "REFTEST IMAGE" lines) just fine.

In case it matters, I'm using a debug build (ac_add_options --enable-debug --disable-optimize), built from this cset:
https://hg.mozilla.org/integration/autoland/rev/b0377ec57ddd

This same bug affects the terminal output of ./mach wpt, too. (Comment 0 uses mach test which doesn't log data URIs to the console for some reason; but if I run...

./mach wpt \
testing/web-platform/tests/css/vendor-imports/mozilla/mozilla-central-reftests/flexbox/flexbox-justify-content-horiz-001b.xhtml

...then I do see data URIs logged to the terminal, and they have this busted TEST-FAIL annotation in the line just before them, which I believe wants to be TEST-KNOWN-FAIL.

This allows TEST-FAIL in addition to TEST-KNOWN-FAIL for expeceted failures. It
also makes the second pipe ("|") and everything following it optional, to match
the behaviour of the mozlog formatter.

Thanks for taking a look at this. But: rather than making reftest-analyzer more permissive, would it be possible to instead change the logger to use TEST-KNOWN-FAIL for this scenario?

If that doesn't add complications/side-effects, that seems like a better solution, to me. The current "TEST-FAIL" tag feels like a misleading logging annotation for this scenario, from a human-readability perspective, because it seems to imply that the failure is "real" & unexpected. (when in fact, it's expected & not notable)

Flags: needinfo?(james)

Well I would argue we should take my pach in any case since it brings the reftest analyzer in line with the way that the tbpl logger has worked for the last four years or so. In particular the part about not requring a second pipe character is just a case where the analyzer is over-fitted to the output of one test harness and not matching the general format.

As for TEST-FAIL vs TEST-KNOWN-FAIL, the reason for preferring the former is that TEST-KNOWN-FAIL implies (to me) that the issue has been triaged by a human and judged to be not a problem. But at least for web-platform-tests that isn't true; any TEST-FAIL could be a serious interop problem that requires attention and may never have been looked at by anyone. The status means just that it doesn't represent a regression in firefox. I'd argue that TEST-FAIL is more neutral than TEST-KNOWN-FAIL and so conveys this semantic better.

There is a possibility of making that difference explicit; there's a convention of tagging bugs in wpt metadata files with bug: labels. I could pass that through the logging infrastructure and output TEST-KNOWN-FAIL when there's a bug to point at and TEST-FAIL when there isn't. I don't know if other harnesses have similar ways to add machine-readable annotataions, but I guess they probably do.

In any case, the reftest analyzer should support both TEST-FAIL and TEST-KNOWN-FAIL.

Of course there's also the possibility that some other tool is reading TBPL logs and not handling TEST-KNOWN-FAIL since that hasn't been produced by anything using the standard logging infrastructure for some time. So any changes here are risky :/

Flags: needinfo?(james)

(In reply to James Graham [:jgraham] from comment #7)

As for TEST-FAIL vs TEST-KNOWN-FAIL, the reason for preferring the former is that TEST-KNOWN-FAIL implies (to me) that the issue has been triaged by a human and judged to be not a problem.

"known" isn't necessarily meant to convey "not a problem".

Typically, when read by a human Gecko developer running e.g. a whole folder of tests, these annotations are proximally intended to help answer the question "Did the changes in my patch-stack cause this failure?"

TEST-KNOWN-FAIL indicates "probably not", because the test is already known to fail. Whereas TEST-FAIL, the rational assumption is "maybe/probably this is something I caused" -- we're green on the mozilla-central treeherder, so if I'm getting TEST-FAIL output, then it seems like I must've locally caused the problem.

But I get your point - some of these tests are only "known fail" in the sense that they've been observed to fail, but nobody's triaged the failure & determined whether/how-much we should care. Maybe we should create an annotation like TEST-UNTRIAGED-FAIL or something like that for these sorts of cases, which would be treated like TEST-KNOWN-FAIL but would convey extra information for human readers that maybe-we-should-look-into-this-one?

Anyway, I suppose the current patch is good for the time being.

Pushed by james@hoppipolla.co.uk:
https://hg.mozilla.org/integration/autoland/rev/77bd40408628
Make reftest analyzer more compatible with mozlog tbpl logger, r=dholbert
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Assignee: nobody → james
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: