Empty screenshot when attaching GeckoView to already loaded session
Categories
(Core :: Graphics, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr68 | --- | unaffected |
firefox-esr78 | --- | unaffected |
firefox79 | --- | wontfix |
firefox80 | --- | wontfix |
firefox81 | --- | fixed |
People
(Reporter: jonalmeida, Assigned: jnicol)
References
(Regression)
Details
(Keywords: regression, Whiteboard: [geckoview:m81][fenix:p1])
Attachments
(3 files)
In bug 1641064, we decided to fail fast when we cannot create a thumbnail. This approach seems to work better for our Fenix needs. However, a side-effect from it seems to be that it provides us with a blank bitmap on failure. It's not reasonably possible to figure out if the bitmap is blank or has content in it unfortunately.
Would it be possible to have the return value be a GeckoResult<Bitmap?>
instead?
Reporter | ||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 1•5 years ago
•
|
||
This approach seems to work better for our Fenix needs. However, a side-effect from it seems to be that it provides us with a blank bitmap on failure
Hey Jon, as far as I am aware we don't intentionally send an empty Bitmap. When we fail fast in Bug 1641064 we reject the GeckoResult
, so no empty bitmap is sent there.
Reporter | ||
Comment 2•5 years ago
•
|
||
For sanity sake, I re-verified that we are indeed getting blank bitmaps back from GV. Christian corrected me in realizing that waiting for the firstContentFullPaint
was added to only stop the exceptions (but we were still seeing those come through) and not for getting a useful screenshot.
We're seeing this happen when we return to the browser and we take a screenshot if the page is done loading.
Comment 3•5 years ago
|
||
(In reply to Jonathan Almeida [:jonalmeida] from comment #2)
We're seeing this happen when we return to the browser and we take a screenshot if the page is done loading.
I see, that makes sense. We can definitely look at this case and figure out if there's something we can do better here.
Comment 4•5 years ago
|
||
A good first step here would be to see if we can write a junit test for this.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Comment 6•5 years ago
|
||
@Jamie: Could you take a look at this.
Assignee | ||
Comment 7•5 years ago
|
||
Sure. Jonathan, do you have a simple steps to reproduce? I'm afraid I'm not sure how to translate "attaching GeckoView to already loaded session" in to user interactions! Also, are you seeing this on specific devices, or all devices? (ie is it webrender or non-webrender?) Thanks!
Reporter | ||
Comment 8•5 years ago
•
|
||
Hey Jamie, sorry about the lack of context.
Steps to reproduce
- Open Fenix or Reference Browser and load a site with static content (so that its easier to visually see the differences) like https://mozilla.github.io/geckoview/.
- Open the tabs tray and observe a thumbnail image.
- Switch to a different screen by going to the three-dot kebab menu and clicking Settings.
- Return back to page by pressing back.
- Open the tabs tray and observe a thumbnail image.
Expected results
- At 5, you should see the same screenshot as the step 3.
Actual results
- At 5, you see a blank white thumbnail.
This used to be easier to reproduce on Fenix but there have been lots of changes there so it's a hit of miss for me, we are still getting lots of reports on the issue though, for example this one; Reference Browser is consistent.
As mentioned in comment 2, it seems like we're getting blank white images from Gecko on requesting a thumbnail when returning to a browser view where the page being loaded from the browser cache in GeckoView, which is roughly what the "attaching GeckoView to already loaded session" would translate to.
Assignee | ||
Comment 9•5 years ago
|
||
Thanks for the detailed instructions Jon! I can indeed reproduce consistently in the Reference Browser, much more than in Fenix. Will try to get to the bottom of this tomorrow.
Assignee | ||
Comment 10•5 years ago
|
||
Okay, I think I've figured out what's going on here.
When geckoview is paused, we call ClearCachedResources()
. This makes its way to LayerManagerComposite::ClearCachedResources()
or WebRenderBridgeParent::RecvClearCachedResources()
, which will throw away the data they need to render the page (eg the PaintedLayer content tiles, the webrender display list, etc).
When we then resume we immediately force a composite, so that the uninitialized SurfaceTexture is not visible (resulting in a black flash). But since we have no content to composite, all we can do is just clear the screen to the chosen clear colour. It's not ideal, but the logic is that a flash of the clear colour is preferable to a flash of black.
I gather that Fenix/RB previously waited for the first contentful paint before requesting the screenshot, hence why it captured the correct data. By now requesting the screenshot immediately it is requesting the screenshot of the first frame, which is just the clear colour.
I guess we have a couple of ways of fixing this:
- Fenix can wait until the first contentful paint before capturing the screenshot. Make the GeckoView API explicit that attempting to capture a thumbnail before then will return a blank colour.
- Make LayerManagerComposite/WebRenderBridgeParent wait until a contentful composite before capturing the screenshot.
Is there some context for why was Fenix changed to capture the screenshot immediately instead of waiting for the first contentful paint?
Assignee | ||
Comment 11•5 years ago
|
||
As an aside, you'll notice that pages such as about:support do always get thumbnailed correctly. I think this is because we don't ClearCachedResources() for parent-process pages, or something.
I think there's a discussion to be had about whether it's wise to clear our cached resources immediately upon pausing. This "clear colour frame" causes a slight flickering effect which is noticeable (though much less noticable than when it was a black flash!). We probably don't want that to happen when the user opens the settings menu then presses back. But we certainly will want to clear them eventually after pausing, so the screenshot mechanism needs to be resilient to this one way or the other.
(In reply to Jamie Nicol [:jnicol] from comment #11)
As an aside, you'll notice that pages such as about:support do always get thumbnailed correctly. I think this is because we don't ClearCachedResources() for parent-process pages, or something.
I think there's a discussion to be had about whether it's wise to clear our cached resources immediately upon pausing. This "clear colour frame" causes a slight flickering effect which is noticeable (though much less noticable than when it was a black flash!). We probably don't want that to happen when the user opens the settings menu then presses back. But we certainly will want to clear them eventually after pausing, so the screenshot mechanism needs to be resilient to this one way or the other.
Yes, I think we should only clear those resources after a timeout or memory pressure. Short pauses of the compositor are very common in situations like the one you mentioned. You also don't want to throw everything away when switching between two (or more) tabs frequently, etc.
(In reply to Jamie Nicol [:jnicol] from comment #10)
Thanks for digging into this!
I gather that Fenix/RB previously waited for the first contentful paint before requesting the screenshot, hence why it captured the correct data. By now requesting the screenshot immediately it is requesting the screenshot of the first frame, which is just the clear colour.
I guess we have a couple of ways of fixing this:
- Fenix can wait until the first contentful paint before capturing the screenshot. Make the GeckoView API explicit that attempting to capture a thumbnail before then will return a blank colour.
This is pretty annoying for the app to track, and we already have some pretty annoying caveats with this method.
- Make LayerManagerComposite/WebRenderBridgeParent wait until a contentful composite before capturing the screenshot.
This would be my preferred solution. It could timeout if there isn't a reasonable paint within some threshold.
Is there some context for why was Fenix changed to capture the screenshot immediately instead of waiting for the first contentful paint?
I think it's still doing it then too, but not sure. We did have some problems where FCP wasn't fired consistently.
Updated•5 years ago
|
Reporter | ||
Comment 14•5 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #10)
Is there some context for why was Fenix changed to capture the screenshot immediately instead of waiting for the first contentful paint?
This was done for a short period of time until we understood that this change was still needed.
In Fenix and Reference Browser (that both get this feature from Android Components), we indeed wait for the firstContentfulPaint
before requesting a screenshot, see here.
On a Matrix conversation, Randall from FxR did some investigation here and found that the flag is only set per session and not per site load, which would explain why we still see this issue.
Make LayerManagerComposite/WebRenderBridgeParent wait until a contentful composite before capturing the screenshot.
Maybe this recommendation is something that would need to be altered depending on my above comment?
Assignee | ||
Comment 15•5 years ago
|
||
On a Matrix conversation, Randall from FxR did some investigation here and found that the flag is only set per session and not per site load, which would explain why we still see this issue.
Yeah, I was looking in to this further today and reached a similar conclusion. I think on the Gecko end the first contentful paint event does get fired for each new site (as it's tracked by a property on the PresContext), but android-components never unsets its copy of the flag. But yeah, it certainly doesn't get unset after we lose our painted data when pausing, so we hit this bug.
Maybe this recommendation is something that would need to be altered depending on my above comment?
I do think it logically makes more sense for android-components to completely decide when to thumbnail. But for that to work GV needs to be able to tell it when we no longer have content painted. This would presumably require a new/changed API so might therefore be a more difficult change to make. I'd also worry about changing the semantics of things like the firstContentfulPaint
flag which might be used for other things too, and modifying PresContext code which I really don't know.
OTOH the compositor isn't really aware of how suitable the thing it's compositing is for thumbnailing, so I think it's a bit of a hack to do that option. But it's probably the path of least resistance. My handwavey plan is to add a flag to LayerManagerComposite
/WebRenderBridgeParent
which tracks whether we are contentful/thumbnailable. When we send a Layer transaction or webrender display list, we can set a flag which sets that true. When we ClearCachedResources()
we set it false. When recieve a screenshot request, we immediately perform the screenshot if the flag is true, otherwise we wait.
Assignee | ||
Comment 16•5 years ago
•
|
||
After attempting to implement the discussed workarounds in graphics code, I've decided not to continue that approach. I wasn't happy with the code at all, and I think the other option of communicating to GeckoView when the contentful paint status is reset is much cleaner. It involves an API change (I've renamed GeckoSession.ContentDelegate.onFirstContentfulPaint()
to GeckoSession.ContentDelegate.onContentfulPaintStatusChanged(boolean paintStatus)
, and some changes to android-component's engine-gecko-*
components. Thanks to their abstractions the Thumbnail component is magically fixed without any changes required!
Patch up for this shortly, and I can submit a PR for the android-components changes too
Assignee | ||
Comment 17•5 years ago
|
||
Android-components listens to the GeckoView callback onFirstContentfulPaint to
track whether a contentful paint has occured, in order to decide when to
thumbnail a tab. Currently this gets fired once per tab.
However, when the GeckoSession is paused, we clear cached resources in the
compositor. This means that when the session is resumed, the compositor does not
have the necessary information to render the page (such as painted content
buffers, or the webrender display list). Because android-components attempts to
capture a new thumbnail immediately upon resuming, it ends up capturing a blank
thumbnail.
To fix this, change the onFirstContentfulPaint() callback to
onContentfulPaintStatusChanged(bool paintStatus), so that android-components can
be informed when the contenful paint is no longer visible. It can then wait
until the subsequent contentful paint occurs before capturing the thumbnail.
Updated•5 years ago
|
Assignee | ||
Comment 18•5 years ago
|
||
Add test paintStatusReset() to ContentDelegateTest, which asserts that
onPaintStatusReset() is called after GeckoSession.setActive(false).
Add test capturePixelsSessionDeactivatedActivated() to ScreenshotTest, which
asserts that capturePixels() is successful if called when the session is
deactivated then reactivated, after waiting for the onPaintStatusReset() and
onFirstContentfulPaint() callbacks.
Depends on D87341
Comment 19•5 years ago
|
||
Comment 20•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 21•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c0a323ca7d8b
https://hg.mozilla.org/mozilla-central/rev/f2663e59234a
Updated•5 years ago
|
Description
•