Open Bug 1567639 Opened 5 years ago Updated 2 years ago

Intermittent Android WPT reftest failures due to a CSS background image that hadn't (yet?) rendered

Categories

(Testing :: web-platform-tests, defect, P3)

Version 3
defect

Tracking

(Not tracked)

People

(Reporter: dholbert, Unassigned)

References

Details

Attachments

(1 file)

I've seen a number of intermittent WPT failure bugs filed recently where the failure seems to have been due to an image that didn't render (or didn't render yet) when the reftest snapshot was taken.

I'm filing this bug to aggregate these failures & so we can discuss/investigate them in a single place.

Depends on: 1567495
Blocks: 1566918
Blocks: 1567495
No longer depends on: 1567495
Blocks: 1567344, 1565386

Bugbug thinks this bug is a task, but please change it back in case of error.

Type: defect → task
Assignee: nobody → mjzffr
Type: task → defect
Blocks: 1553388
Summary: Intermittent Andoid WPT reftest failures due to an image that hadn't (yet?) rendered → Intermittent Android WPT reftest failures due to an image that hadn't (yet?) rendered
Blocks: 1568159
Blocks: 1569273
Blocks: 1569952
Blocks: 1569992
Blocks: 1571041
Blocks: 1571030
Blocks: 1570888
Blocks: 1570877
Blocks: 1570859
Blocks: 1570837
Blocks: 1570781
Blocks: 1570795
Blocks: 1570815
Blocks: 1570831
Blocks: 1570835

This category of failure is happening extremely frequently (and always in a different test, which means it always spawns a new bug report, which means it always adds a new bit of triage to be done). I just went through the list of new Layout bugs that were filed in the past ~day, and the majority of them were versions of this bug. (I should probably be duping them to this bug, but so far I've been marking them as dependent.)

It's making me wonder if we should mass-disable WPT tests on Android (perhaps anything with background-image). Maja, do you have any ideas here, either on a fix or on a way to stop the flood of incoming bugs?

Flags: needinfo?(mjzffr)
Blocks: 1571114
Blocks: 1571113

Sorry, I'm finally circling around to take a closer look at this. I'll report my findings on the 6th. If I don't have an easy fix, then yes, mass disabling while I investigate further will be a good choice.

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #3)

Sorry, I'm finally circling around to take a closer look at this. I'll report my findings on the 6th. If I don't have an easy fix, then yes, mass disabling while I investigate further will be a good choice.

(Or rather mass-marking-these-as-known-intermittents. Disabling is a last resort.)

Blocks: 1571338
Blocks: 1571402
Blocks: 1571140
Summary: Intermittent Android WPT reftest failures due to an image that hadn't (yet?) rendered → Intermittent Android WPT reftest failures due to a CSS background image that hadn't (yet?) rendered

I think this case might be related as well: https://bugzilla.mozilla.org/show_bug.cgi?id=1554777.

That one is easy to reproduce locally, for some reason, unlike the others. The logging I observed in https://bugzilla.mozilla.org/show_bug.cgi?id=1554777#c12 is quite different now, so I'll take a look again.

Blocks: 1554777

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #5)

I think this case might be related as well: https://bugzilla.mozilla.org/show_bug.cgi?id=1554777.

That one is easy to reproduce locally, for some reason, unlike the others. The logging I observed in https://bugzilla.mozilla.org/show_bug.cgi?id=1554777#c12 is quite different now, so I'll take a look again.

To recap, in the case of css/css-backgrounds/border-image-width-006.xht I can clearly see that the image resource is loaded, but the border-image-source rule does not appear to kick-in (the green box is not visible) until I interact with the GeckoView app in some way (touch the content window, or even inspect the element through the remote debugger). Unfortunately, adb logcat with lots of debug logging hasn't revealed anything that might point to a cause. I've tried both in TestRunnerActivity and GeckoViewExample.

dholbert, would you mind trying this out on your end?

Steps to reproduce:

  1. Delete testing/web-platform/meta/css/css-backgrounds/border-image-width-006.xht.ini so the test is enabled.
  2. Enable Android artifact builds in mach bootstrap, then ./mach build and ./mach package
  3. ./mach android-emulator --version x86-7.0
  4. ./mach wpt testing/web-platform/tests/css/css-backgrounds/border-image-width-006.xht - this will run the test in TestRunnerActivity and pause the runner. You can leave it running to use its server for the test URLs during manual testing.

You can also see the failure when running manually like:

adb shell am start -W -n org.mozilla.geckoview.test/org.mozilla.geckoview.test.TestRunnerActivity -a android.intent.action.MAIN -es args "http://web-platform.test:8000/css/css-backgrounds/border-image-width-006.xht"

I took a closer look in Nightly's about:debugging which is where I could see that the image resource was indeed loaded despite not being displayed: enable USB and connect to the remote browser and then inspect the "tab" in question.

Aside: For some reason I can't inspect remote tabs on a browser that's been started by the wpt test harness so I take some extra steps to run the test outside of the harness:

  1. ./mach run will install and run GeckoViewExample
  2. In desktop Nightly about:debugging, connect to "Mozilla Fennec" -- that's actually GeckoViewExample
  3. Inspect the main process and set the pref to recognize the web-platform.test domain in the browser console, then debug as usual
Preferences = ChromeUtils.import(
  "resource://gre/modules/Preferences.jsm"
);
Preferences.set("network.dns.localDomains", "web-platform.test")

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #6)

To recap, in the case of css/css-backgrounds/border-image-width-006.xht I can clearly see that the image resource is loaded, but the border-image-source rule does not appear to kick-in (the green box is not visible) until I interact with the GeckoView app in some way (touch the content window, or even inspect the element through the remote debugger).

Also, :snorp, can you think of any GeckoView/Android-specific reason why the document layout/paint is paused as described above? My guess is it's somehow related to focus? The test itself doesn't rely on any event handlers or CSS animation.

Flags: needinfo?(snorp)

When I try to profile on mobile, GeckoViewExample/TestRunnerActivity disconnects, BUT by some fluke I just saw this same css-style-not-displaying-until-click behaviour on desktop. In the attached profile you can clearly see two paints after my click at 14s.

Good find!

FWIW, in my ac_add_options --enable-debug --enable-optimize="-O1" GeckoView build, running in Android emulator, I can reproduce the failure about 20% of the time via the adb shell command from comment 6 (running that after running the ./mach wpt ... command).

RE comment 8's profile: for convenience, here's the profiler.firefox.com-hosted view of it: https://perfht.ml/33gBVek

Flags: needinfo?(dholbert)

I can reproduce the failure in Desktop Nightly as well (by visiting http://localhost:8000/css/css-backgrounds/border-image-width-006.xht while the WPT server is running for a desktop mozilla-central WPT run)

I'm sorry to say, I suspect the border-image-width-006.xht failure may be unrelated to the intermittent background-image failures...

The border-image-width-006.xht testcase uses border-image-slice which creates an extra wrapper ClippedImage struct under the hood, and I think that extra wrapper is what's introducing trouble.

I'm going to poke at that test locally and see if I can figure out what's going on, but for now I'll post my results for that test on bug 1554777 to avoid adding too many potentially-unrelated-to-the-real-underlying-issue comments on this bug.

Blocks: 1571923
Blocks: 1571945

I cannot reproduce this behaviour locally with any of the blocked tests, so it's really hard to see what's going on.

Lacking any better ideas, here's a try push with additional Marionette logging that could show something interesting about document life cycle and rendering with respect to screenshots: https://treeherder.mozilla.org/#/jobs?repo=try&revision=09992c40ef22851379e5a011fcfd659061c17688

If we could see when the images render, if at all, that would be great.

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #4)

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #3)

Sorry, I'm finally circling around to take a closer look at this. I'll report my findings on the 6th. If I don't have an easy fix, then yes, mass disabling while I investigate further will be a good choice.

(Or rather mass-marking-these-as-known-intermittents. Disabling is a last resort.)

On second thought, I don't think a mass metadata update is a good idea here. The most affected directories seem to be "*borders", "*backgrounds" and "*writing-modes". In these directories alone, the number of css reftests using a support image for background-image or background, etc, is very large. At the same time, I see enough other failure modes than image-not-loading/rendering among the tests reported in the last week, and I'm concerned about silencing that signal.

Sheriffs have started lumping some of these intermittents together such as in Bug 1572420 and Bug 1568236, which isn't great either due to the variety of failures, but it should lighten your triage burden a bit.

I will spend some time next week fixing Bug 1567638 to make the reftest analyzer more helpful.

Assignee: mjzffr → nobody

The priority flag is not set for this bug.
:jgraham, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(james)
Flags: needinfo?(james)
Priority: -- → P1

(In reply to Maja Frydrychowicz :maja_zf (UTC-4) (maja@mozilla.com) from comment #14)

Extra logging with more clues about paints here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3ab01d4ee50081a2c35f400e8c73ee5c89a71368

Alright, so that yielded some new information about paints.

A passing test looks like this. Based on the transaction IDs, at least 3 paints occur between pageshow and our screenshots 64ms later.

Starting load of http://web-platform.test:8000/css/css-writing-modes/abs-pos-non-replaced-vrl-126.xht
08-14 07:49:46.700 17998 18013 D GeckoViewNavigation[C]: loadURI: uri=http://web-platform.test:8000/css/css-writing-modes/abs-pos-non-replaced-vrl-126.xht where=1 flags=0x1 tp=null
08-14 07:49:46.701 17934 17934 I GeckoSession: handleMessage GeckoView:OnLoadRequest uri=http://web-platform.test:8000/css/css-writing-modes/abs-pos-non-replaced-vrl-126.xht
08-14 07:49:46.701 17998 18013 D GeckoViewNavigation[C]: shouldLoadURI http://web-platform.test:8000/css/css-writing-modes/abs-pos-non-replaced-vrl-126.xht
08-14 07:49:46.702 17998 18013 I Gecko   : 1565765386702    Marionette  DEBUG   [6] Waiting for page load of http://web-platform.test:8000/css/css-writing-modes/abs-pos-non-replaced-vrl-126.xht
08-14 07:49:46.708 17934 17949 W GeckoEventDispatcher: No listener for GeckoView:StateUpdated
08-14 07:49:46.710 17998 18013 D GeckoViewSelectionAction[C]: handleEvent: visibilitychange
08-14 07:49:46.710 17998 18013 D GeckoViewContent[C]: handleEvent: pagehide
08-14 07:49:46.711 17998 18013 D GeckoViewAutoFill: Clearing auto-fill
08-14 07:49:46.715 17934 17949 E GeckoConsole: [JavaScript Error: "TypeError: this.docShell is null" {file: "chrome://global/content/elements/browser-custom-element.js" line: 899}]
08-14 07:49:46.720 17998 18013 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
08-14 07:49:46.720 17998 18013 D GeckoViewContent[C]: handleEvent: DOMContentLoaded
08-14 07:49:46.754 17998 18013 D GeckoViewContent[C]: handleEvent: pageshow
08-14 07:49:46.754 17998 18013 I Gecko   : 1565765386754    Marionette  DEBUG   [6] Waiting for rendering
08-14 07:49:46.754 17998 18013 I Gecko   : 1565765386754    Marionette  DEBUG   [6] afterPaintWasPending: true
08-14 07:49:46.754 17998 18013 I Gecko   : 1565765386754    Marionette  DEBUG   [6] lastTransactionId: 825
08-14 07:49:46.754 17998 18013 I Gecko   : 1565765386754    Marionette  DEBUG   [6] flush another window
08-14 07:49:46.754 17998 18013 I Gecko   : 1565765386754    Marionette  DEBUG   [6] reftestWait: true
08-14 07:49:46.768 17998 18013 I Gecko   : 1565765386768    Marionette  DEBUG   [6] MAPaint was pending, handled: 825
08-14 07:49:46.769 17998 18013 I Gecko   : 1565765386769    Marionette  DEBUG   [6] afterPaintWasPending: true
08-14 07:49:46.769 17998 18013 I Gecko   : 1565765386769    Marionette  DEBUG   [6] lastTransactionId: 827
08-14 07:49:46.769 17998 18013 I Gecko   : 1565765386769    Marionette  DEBUG   [6] flush another window
08-14 07:49:46.769 17998 18013 I Gecko   : 1565765386769    Marionette  DEBUG   [6] reftestWait: true
08-14 07:49:46.784 17998 18013 I Gecko   : 1565765386784    Marionette  DEBUG   [6] MAPaint was pending, handled: 826
08-14 07:49:46.784 17998 18013 I Gecko   : 1565765386784    Marionette  DEBUG   [6] afterPaintWasPending: false
08-14 07:49:46.784 17998 18013 I Gecko   : 1565765386784    Marionette  DEBUG   [6] lastTransactionId: 827
08-14 07:49:46.784 17998 18013 I Gecko   : 1565765386784    Marionette  DEBUG   [6] flush another window
08-14 07:49:46.806 17998 18013 I Gecko   : 1565765386806    Marionette  DEBUG   [6] We think rendering is done:lastTransactionId: 827
08-14 07:49:46.818 17934 17949 I Gecko   : 1565765386818    Marionette  DEBUG   lhs canvas size 800x600
08-14 07:49:46.818 17934 17949 I Gecko   : 1565765386818    Marionette  DEBUG   rhs canvas size 800x600
08-14 07:49:46.820 17934 17949 I Gecko   : 1565765386820    Marionette  INFO    No differences allowed

Failing test has no new paints after pageshow. "We think rendering is done:lastTransactionId: 817" indicates that no new paints happen after our last layout or paint flush. The screenshot is taken some time in the following frame or later, in this case 50ms after pageshow.

08-14 07:49:09.926 17921 17936 I Gecko   : 1565765349926    Marionette  DEBUG   [6] Waiting for page load of http://web-platform.test:8000/css/css-writing-modes/abs-pos-non-replaced-vrl-126.xht
08-14 07:49:09.934 17921 17936 D GeckoViewSelectionAction[C]: handleEvent: visibilitychange
08-14 07:49:09.934 17921 17936 D GeckoViewContent[C]: handleEvent: pagehide
08-14 07:49:09.934 17921 17936 D GeckoViewAutoFill: Clearing auto-fill
08-14 07:49:09.939 17858 17873 E GeckoConsole: [JavaScript Error: "TypeError: this.docShell is null" {file: "chrome://global/content/elements/browser-custom-element.js" line: 899}]
08-14 07:49:09.945 17921 17936 D GeckoViewContent[C]: handleEvent: DOMTitleChanged
08-14 07:49:09.946 17921 17936 D GeckoViewContent[C]: handleEvent: DOMContentLoaded
08-14 07:49:10.001 17921 17936 D GeckoViewContent[C]: handleEvent: pageshow
08-14 07:49:10.001 17921 17936 I Gecko   : 1565765350001    Marionette  DEBUG   [6] Waiting for rendering
08-14 07:49:10.001 17921 17936 I Gecko   : 1565765350001    Marionette  DEBUG   [6] afterPaintWasPending: false
08-14 07:49:10.002 17921 17936 I Gecko   : 1565765350001    Marionette  DEBUG   [6] lastTransactionId: 817
08-14 07:49:10.002 17921 17936 I Gecko   : 1565765350002    Marionette  DEBUG   [6] flush another window
08-14 07:49:10.018 17921 17936 I Gecko   : 1565765350018    Marionette  DEBUG   [6] We think rendering is done:lastTransactionId: 817
08-14 07:49:10.051 17858 17873 I Gecko   : 1565765350051    Marionette  DEBUG   lhs canvas size 800x600
08-14 07:49:10.051 17858 17873 I Gecko   : 1565765350051    Marionette  DEBUG   rhs canvas size 800x600
08-14 07:49:10.056 17858 17873 I Gecko   : 1565765350056    Marionette  INFO    No differences allowed
08-14 07:49:10.104 17858 17873 I Gecko   : 1565765350104    Marionette  DEBUG   Canvas pool (800x600) is of length 1
08-14 07:49:10.104 17858 17873 I Gecko   : 1565765350104    Marionette  DEBUG   2 <- [1,208,null,{"value":{"status":"FAIL","message":"Testing http://web-platform.test:8000/css/css-writing-modes/abs-pos-non-repl ... QIAACQRoAAAABpBAgAAJBGgAAAAGkECAAAkEaAAAAAaQQIAACQRoAAAABpBAgAAJBGgAAAAGkECAAAkEaAAAAAaf4/OR5qgehc+w4AAAAASUVORK5CYII="}]}}}]

Maybe this is also covered by Bug 1554777 after all?

Thanks for doing that investigation!

Maybe this is also covered by Bug 1554777 after all?

It's conceivable they're related, but it's unlikely that the patch there will help with the other failures here. That bug's patch is highly specific to border-image (it makes tweaks like IsBorderImageSizeAvailable()), which won't change the logic for CSS background images that are involved in most of the failures here.

Blocks: 1575117
Blocks: 1567475
No longer blocks: 1570795
No longer blocks: 1571402

I'm not sure if this is still happening, and we've marked all the resulting failures as intermittent, or if the problem went away at some point.

Priority: P1 → P3
Severity: normal → S3

The severity field for this bug is relatively low, S3. However, the bug has 14 duplicates.
:jgraham, could you consider increasing the bug severity?

For more information, please visit auto_nag documentation.

Flags: needinfo?(james)
Flags: needinfo?(james)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: