Closed Bug 1553584 Opened 5 years ago Closed 5 years ago

web-platform-test failures on Android x86-64 with the wrong reference case being used

Categories

(Testing :: web-platform-tests, defect, P1)

Version 3
defect

Tracking

(firefox69 fixed)

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: dholbert, Assigned: impossibus)

References

()

Details

Attachments

(1 file)

When looking at bug 1553388's reftest-analyzer view, I noticed that some of the tests seem to have been compared against the wrong reference case. (Not the test that bug 1553388 was filed for, but the failure right after it in the log. Hence, spinning this off as a separate bug from bug 1553388.)

Log:
https://treeherder.mozilla.org/logviewer.html#?job_id=247695754&repo=mozilla-central

Reftest-analyzer view:
https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/XANi9e4kTjOQTBvJHQIIag/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1

If you view that reftest-analyzer ^^ link and click the second entry (css/CSS2/backgrounds/background-root-020.xht), you can see that its reference case snapshot is VERY different from its test screenshot. In fact, it looks like the test screenshot is correct (green+lime), and the reference-case-screenshot that we used was the reference case for the previous test! (i.e. we're comparing it against background-root-019-ref.xht. instead of against 020-ref.xht)

For convenience, here's the test in question:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/css/CSS2/backgrounds/background-root-020.xht

reftest-analyzer shows what appears to be a correct snapshot of that test (green page, 2 light-green borders, and descriptive text).

But for the reference snapshot, reftest-analyzer shows what seems to be this unrelated reference case:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/css/CSS2/backgrounds/background-root-019-ref.xht
...which is not the reference case that background-root-020.xht actually calls for.

See Also: → 1553388

I've noticed a bunch of new bug reports come in today for web-platform-test intermittent failures, all on "Android 7.0 x86-64", many of which look like there might be a race condition involved. Specifically, these bugs:
bug 1553349, bug 1553350, bug 1553388, bug 1553404, bug 1553643, bug 1553645

They mostly don't look like the specific issue described in comment 0 here (wrong-reference-case-snapshot), but many of them look like they've got some sort of reftest-snapshot race condition involved -- e.g. an image or font not having loaded yet.

gbrown or jgraham, do you know if there were any recent changes to our testing infra that might've caused these? (Or perhaps: was this Android 7.0 x86-64 platform only just recently spun up/turned on/made-visible?)

Flags: needinfo?(gbrown)
Flags: needinfo?(james)

Android wpt is relatively new, but I don't know of very recent changes. :kwierso is the expert...

Flags: needinfo?(gbrown) → needinfo?(wkocher)
Summary: web-platform-test failures on Android with the wrong reference case being used → web-platform-test failures on Android x86-64 with the wrong reference case being used

android x86-64 web-platform-tests are specifically testing against geckoview's testrunneractivity. I landed a patch to bump these jobs up to tier-2 (they were previously tier-3, so hidden from view) a few days ago, because they seemed to be roughly the same green-ness as the android x86 wpt (which runs against regular fennec).

I'm not familiar enough with geckoview's internals to know if there's a problem with it or maybe a race condition between it and marionette. I've seen weird things with crashes where the crash stack sometimes prints before the "Test Crashed" lines, but sometimes after. Maybe it's the same thing?

Looking at the logs from the failures, I think it's always the last test in a "batch" that fails... At least, there's some logging that looks like "starting up a new batch" after each reftest failure message. And there are about 5 seconds between the TEST-UNEXPECTED-FAIL message and the next TEST-START message. (I've omitted the part of the line with the timestamp in the snippet here, for brevity, but you can see it in actual logs).

Here's one example -- notice all the logging between the REFTEST IMAGE lines and the subsequent TEST-START line:

INFO - TEST-START | /css/CSS2/borders/border-left-color-062.xht
INFO - TEST-PASS | /css/CSS2/borders/border-left-color-062.xht | took 346ms
INFO - TEST-START | /css/CSS2/borders/border-left-color-063.xht
INFO - TEST-UNEXPECTED-FAIL | /css/CSS2/borders/border-left-color-063.xht | Testing http://web-platform.test:8000/css/CSS2/borders/border-left-color-063.xht == http://web-platform.test:8000/css/CSS2/borders/border-bottom-color-063-ref.xht
INFO - REFTEST IMAGE 1 (TEST): data:image/png;base64,[...]
INFO - REFTEST IMAGE 2 (REFERENCE): data:image/png;base64,[...]
INFO took 328ms
INFO - Closing logging queue
INFO - queue closed
INFO - Setting android reftest preferences
INFO - Setting up ssl
INFO - certutil |
INFO - certutil |
INFO - certutil |
INFO - Certificate Nickname Trust Attributes
INFO - SSL,S/MIME,JAR/XPI
INFO -
INFO - web-platform-tests CT,,
INFO -
INFO - adb Granting important runtime permissions to org.mozilla.geckoview.test
INFO - adb launch_application: am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity -a android.intent.action.MAIN --es env9 MOZ_PROCESS_LOG=/tmp/tmpjqsbitpidlog --es env8 MOZ_CRASHREPORTER_NO_REPORT=1 --es args "-no-remote -profile /sdcard/tests/profile --marionette about:blank" --es env3 STYLO_THREADS=4 --es env2 MOZ_HIDE_RESULTS_TABLE=1 --es env1 R_LOG_VERBOSE=1 --es env0 MOZ_CRASHREPORTER=1 --es env7 MOZ_DISABLE_NONLOCAL_CONNECTIONS=1 --es env6 R_LOG_DESTINATION=stderr --es env5 MOZ_CRASHREPORTER_SHUTDOWN=1 --es env4 MOZ_LOG=signaling:3,mtransport:4,DataChannel:4,jsep:4 --ez use_multiprocess True --es env10 R_LOG_LEVEL=6
INFO - Starting runner
INFO - TEST-START | /css/CSS2/borders/border-left-color-064.xht
INFO - TEST-PASS | /css/CSS2/borders/border-left-color-064.xht | took 729ms
INFO - TEST-START | /css/CSS2/borders/border-left-color-065.xht

So: unless I'm misinterpreting the logging (which is entirely possible), then maybe there's a bug in the harness that causes trouble for the last test in each batch? (i.e. there's a problem with the way we run/snapshot the last test before each chunk of "Setting android reftest preferences", "Setting up ssl", etc., which I'm interpreting as the logging for starting up a subsequent batch of tests)

(In reply to Daniel Holbert [:dholbert] from comment #2)

I've noticed a bunch of new bug reports come in today

Two more bugs that were just filed: bug 1553656, bug 1553657
(Both are Android x86-64; both are explainable via an image-not-yet-loaded-when-snapshot-is-taken race condition; and both have logging that looks like my previous comment, with "restarting-the-test-process"-looking logging after the test failure).

kwierso, given this volume of bug reports, maybe this platform isn't ready to be Tier 2 after all? I suspect these bug reports are indicative of an issue with the platform and/or harness, so I'm worried that we're just going to have a neverending stream of such reports for random tests.

(In reply to Daniel Holbert [:dholbert] from comment #6)

kwierso, given this volume of bug reports, maybe this platform isn't ready to be Tier 2 after all? I suspect these bug reports are indicative of an issue with the platform and/or harness, so I'm worried that we're just going to have a neverending stream of such reports for random tests.

Fair, I'll drop them back to tier-3 again in bug 1550895.

Flags: needinfo?(wkocher)

The last test in a batch thing is misleading because we always restart the runner after we get an unexpected result. So I expect there's just a race here when we're running the reftest code in geckoview. I think that majazf and/or Kwierso should take a look in the first instance, but this seems very high priority.

Flags: needinfo?(james)
Priority: -- → P1

Yes, the emulator logcat shows that screenshot comparison is done while the relevant doc is still loading. Sometimes the wrong reference doc is used in the screenshot, sometimes wrong test doc: e.g. in the case of border-right-color-025.xht, the reference is correct but the test data being used is for the previous test (border-right-color-024.xht).

I'm still digging around, but so far I think this could be a problem with how reftestWait [1] works with TestRunnerActivity.

[1] https://searchfox.org/mozilla-central/rev/662de518b1686c4769320d6b8825ce4864c4eda0/testing/marionette/listener.js#1605

I did some try runs with debug-level logging from Marionette (I also did a trace-level run, but that didn't seem to include any additional logging. "marionette.log.level": "trace" should work, no?):

Here's a failing job with extra logging: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=248971383&repo=try&lineNumber=4361
The Marionette debug messages are in the logcat log.

Excerpt

05-29 21:22:16.560  7218  7233 I Gecko   : 1559161336560    Marionette  INFO    Testing http://web-platform.test:8000/css/CSS2/borders/border-top-width-028.xht == http://web-platform.test:8000/css/CSS2/borders/border-bottom-width-006-ref.xht
05-29 21:22:16.560  7218  7233 I Gecko   : 1559161336560    Marionette  DEBUG   screenshot http://web-platform.test:8000/css/CSS2/borders/border-bottom-width-006-ref.xht remainingCount: 20 cache: true cacheKey: 800x600
05-29 21:22:16.560  7218  7233 I Gecko   : 1559161336560    Marionette  DEBUG   screenshot http://web-platform.test:8000/css/CSS2/borders/border-bottom-width-006-ref.xht taken from cache
05-29 21:22:16.560  7218  7233 I Gecko   : 1559161336560    Marionette  DEBUG   screenshot http://web-platform.test:8000/css/CSS2/borders/border-top-width-028.xht remainingCount: 1 cache: false cacheKey: 800x600
05-29 21:22:16.560  7218  7233 I Gecko   : 1559161336560    Marionette  DEBUG   reusing canvas from canvas pool
05-29 21:22:16.560  7218  7233 I Gecko   : 1559161336560    Marionette  DEBUG   Starting load of http://web-platform.test:8000/css/CSS2/borders/border-top-width-028.xht
05-29 21:22:16.560  7278  7293 D GeckoViewNavigation[C]: loadURI: uri=http://web-platform.test:8000/css/CSS2/borders/border-top-width-028.xht where=1 flags=0x1 tp=null
05-29 21:22:16.560  7278  7293 D GeckoViewNavigation[C]: shouldLoadURI http://web-platform.test:8000/css/CSS2/borders/border-top-width-028.xht
05-29 21:22:16.580  7278  7293 I Gecko   : 1559161336580    Marionette  DEBUG   [2147483649] Waiting for page load of http://web-platform.test:8000/css/CSS2/borders/border-top-width-028.xht
05-29 21:22:16.580  7278  7293 D GeckoViewSelectionAction[C]: handleEvent: visibilitychange
05-29 21:22:16.580  7278  7293 D GeckoViewContent[C]: handleEvent: pagehide
05-29 21:22:16.580  7278  7293 D GeckoViewAutoFill: Clearing auto-fill
05-29 21:22:16.580  7278  7293 I Gecko   : ++DOMWINDOW == 38 (0x7f293febb000) [pid = 7278] [serial = 557] [outer = 0x7f29421e8020]
05-29 21:22:16.580  7218  7233 D GeckoViewNavigation: onLocationChange
05-29 21:22:16.580  7218  7233 E GeckoConsole: [JavaScript Error: "TypeError: this.webNavigation is null" {file: "chrome://global/content/elements/browser-custom-element.js" line: 336}]
<snip> webNavigation is null happens in successful run also
05-29 21:22:16.590  7278  7293 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
05-29 21:22:16.590  7278  7293 D GeckoViewContent[C]: handleEvent: DOMContentLoaded
05-29 21:22:16.610  7278  7293 D GeckoViewContent[C]: handleEvent: pageshow
<< Note the next line [1] >>
05-29 21:22:16.620  7278  7293 I Gecko   : 1559161336620    Marionette  DEBUG   [2147483649] Waiting for rendering  
05-29 21:22:16.620  7278  7293 I Gecko   : 1559161336620    Marionette  DEBUG   [2147483649] flushRendering false
05-29 21:22:16.690  7218  7233 I Gecko   : 1559161336690    Marionette  DEBUG   lhs canvas size 800x600
05-29 21:22:16.690  7218  7233 I Gecko   : 1559161336690    Marionette  DEBUG   rhs canvas size 800x600
05-29 21:22:16.690  7218  7233 I Gecko   : 1559161336690    Marionette  INFO    No differences allowed
05-29 21:22:16.690  7218  7233 I Gecko   : 1559161336690    Marionette  INFO    Found 20624 pixels different, maximum difference per channel 255
05-29 21:22:16.720  7218  7233 I Gecko   : 1559161336720    Marionette  DEBUG   Canvas pool (800x600) is of length 1

So in the above example, case border-top-width-028.xht has reference border-bottom-width-006-ref.xht.
Reference is loaded from cache successfully, but harness uses screenshot of border-top-width-027.xht as test (the previous test). From the logs, it seems that border-top-width-028.xht does actually load, and yet the screenshot doesn't reflect that. At note [1] in the log excerpt, I'm guessing this Promise resolves too soon.

I have to pause now, but I'll test that theory when I'm back tonight or on Monday.

Aside from that, the debug log didn't reveal anything interesting when I compared it to the same successful test in a different run. The race behaviour is not related to caching of screenshots or canvas pools in reftest.js. The time taken to "wait for rendering" doesn't vary between success and failure either so I don't think we're hitting any timeouts. I noticed a lot of this.webNavigation is null errors, but that happens in successful jobs just as much.

Setting n-i on jgraham: in case you have any thoughts to add regarding the reftest-related code in Marionette.

Flags: needinfo?(james)

Yeah, I guess it's plausible that the rendering is so far behind the navigation that we actually end up with the layout from the previous page. I wonder if a layout developer has any insight into whether that's possible and how we might go about debugging it and/or preventing it?

Flags: needinfo?(james)

We're only seeing this behaviour on Android debug (as far as I can tell -- identifying the bad screenshots can only be done manually) but I don't think it actually has anything to do with Android or geckoview. I suspect that the test environment is just slow enough to make the race apparent.

I'm able to reproduce this locally with testing/web-platform/tests/css/CSS2/css1/c44-ln-box-001.xht, and I've been looking at MozAfterPaint.transactionId and lastTransactionId from windowUtils.

See https://treeherder.mozilla.org/#/jobs?repo=try&revision=6172469318a211978728f0b872179f925d70afc1

I record lastTransactionId at each call to flushWindow. When the test passes, the transaction id of the last MozAfterPaint that we handle before taking a screenshot is equal to lastTransactionId. When the test fails, it's either less than lastTransactionId or we never wait for MozAfterPaint at all.

The way I understand it is: there are more "paints" to process, but we're not flushing them. This is odd, because isMozAfterPaintPending is false by the time we take a screenshot, and yet...

So maybe I can use the transaction ID concept to fix the problem. Also requestAnimationFrame instead of plain setTimeout might help us not resolve the Promise too early.

dholbert, do you have any idea why the transaction IDs might be off like this, or what can be done to flush the layout further? All the harness does now is call getBoundingClientRect()

Flags: needinfo?(dholbert)

From above try push, this job is a good example of an unexpected failure: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=250074632&repo=try&lineNumber=1799 (See logcat)

Excerpt (failure):

06-05 03:04:12.200  3410  3425 I Gecko   : 1559700252200	Marionette	DEBUG	[2147483649] Waiting for page load of http://web-platform.test:8000/css/CSS2/css1/c44-ln-box-003.xht
06-05 03:04:12.224  3410  3425 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
06-05 03:04:12.226  3410  3425 D GeckoViewContent[C]: handleEvent: DOMContentLoaded
06-05 03:04:12.249  3410  3425 D GeckoViewContent[C]: handleEvent: pageshow
06-05 03:04:12.252  3410  3425 I Gecko   : 1559700252252	Marionette	DEBUG	[2147483649] Waiting for rendering
06-05 03:04:12.253  3410  3425 I Gecko   : 1559700252253	Marionette	DEBUG	[2147483649] afterPaintWasPending: false
06-05 03:04:12.253  3410  3425 I Gecko   : 1559700252253	Marionette	DEBUG	[2147483649] lastTransactionId: 144
06-05 03:04:12.253  3410  3425 I Gecko   : 1559700252253	Marionette	DEBUG	[2147483649] flush another window
06-05 03:04:12.253  3410  3425 I Gecko   : 1559700252253	Marionette	DEBUG	[2147483649] flushRendering false
06-05 03:04:12.324  3351  3366 I Gecko   : 1559700252324	Marionette	DEBUG	lhs canvas size 800x600
06-05 03:04:12.324  3351  3366 I Gecko   : 1559700252324	Marionette	DEBUG	rhs canvas size 800x600
06-05 03:04:12.325  3351  3366 I Gecko   : 1559700252325	Marionette	INFO	No differences allowed
06-05 03:04:12.325  3351  3366 I Gecko   : 1559700252325	Marionette	INFO	Found 43200 pixels different, maximum difference per channel 255

Versus the log for same test when it passes:

06-05 03:04:09.768  3425  3440 I Gecko   : 1559700249768	Marionette	DEBUG	[2147483649] Waiting for rendering
06-05 03:04:09.768  3425  3440 I Gecko   : 1559700249768	Marionette	DEBUG	[2147483649] afterPaintWasPending: true
06-05 03:04:09.768  3425  3440 I Gecko   : 1559700249768	Marionette	DEBUG	[2147483649] lastTransactionId: 146
06-05 03:04:09.768  3425  3440 I Gecko   : 1559700249768	Marionette	DEBUG	[2147483649] flush another window
06-05 03:04:09.771  3425  3440 I Gecko   : 1559700249771	Marionette	DEBUG	[2147483649] flushRendering true
06-05 03:04:09.877  3425  3440 I Gecko   : 1559700249877	Marionette	DEBUG	[2147483649] current transaction id: 145
06-05 03:04:09.877  3425  3440 I Gecko   : 1559700249877	Marionette	DEBUG	[2147483649] afterPaintWasPending: true
06-05 03:04:09.877  3425  3440 I Gecko   : 1559700249877	Marionette	DEBUG	[2147483649] lastTransactionId: 148
06-05 03:04:09.877  3425  3440 I Gecko   : 1559700249877	Marionette	DEBUG	[2147483649] flush another window
06-05 03:04:09.878  3425  3440 I Gecko   : 1559700249878	Marionette	DEBUG	[2147483649] flushRendering true
06-05 03:04:09.903  3425  3440 I Gecko   : 1559700249903	Marionette	DEBUG	[2147483649] current transaction id: 147
06-05 03:04:09.903  3425  3440 I Gecko   : 1559700249903	Marionette	DEBUG	[2147483649] afterPaintWasPending: false
06-05 03:04:09.903  3425  3440 I Gecko   : 1559700249903	Marionette	DEBUG	[2147483649] lastTransactionId: 148
06-05 03:04:09.903  3425  3440 I Gecko   : 1559700249903	Marionette	DEBUG	[2147483649] flush another window
06-05 03:04:09.903  3425  3440 I Gecko   : 1559700249903	Marionette	DEBUG	[2147483649] flushRendering false
06-05 03:04:10.016  3366  3381 I Gecko   : 1559700250016	Marionette	DEBUG	lhs canvas size 800x600
06-05 03:04:10.016  3366  3381 I Gecko   : 1559700250016	Marionette	DEBUG	rhs canvas size 800x600
06-05 03:04:10.016  3366  3381 I Gecko   : 1559700250016	Marionette	INFO	No differences allowed

(In reply to Maja Frydrychowicz :maja_zf (she/her) (UTC-4) (maja@mozilla.com) from comment #12)

dholbert, do you have any idea why the transaction IDs might be off like this, or what can be done to flush the layout further? All the harness does now is call getBoundingClientRect()

One idea: so, your trigger-a-flush function getBoundingClientRect() just flushes layout (per its use of FlushType::Layout), and it doesn't flush painting -- so if there's some async pending paint and we snapshot immediately after the layout flush, it's conceivable that we'll get an out-of-date rendering. (I'm hand-waving a bit here, because I'm not sure about timing of paints, particularly on Android. But it's conceivable to me that this might be the problem.)

So, if we want to flush more thoroughly, we might want to use a function that flushes with FlushType::Display. One option for that is the nsIDOMWindowUtils method updateLayerTree() (which flushes Display here). It looks like that's what we use in the normal reftest harness, actually: https://searchfox.org/mozilla-central/rev/153172de0c5bfca31ef861bd8fc0995f44cada6a/layout/tools/reftest/reftest-content.js#1121,1137

Flags: needinfo?(dholbert)
Assignee: nobody → mjzffr
Pushed by mjzffr@gmail.com: https://hg.mozilla.org/integration/autoland/rev/c271ae1b60b9 Delay wpt reftest screenshots until after paint. r=jgraham
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Regressions: 1558849
See Also: → 1823452
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: