Closed Bug 1439285 Opened 2 years ago Closed 2 years ago

stylo-chrome: flickering tab background when switching between two windows

Categories

(Core :: ImageLib, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- unaffected
firefox60 --- fixed

People

(Reporter: darkspirit, Assigned: emilio)

References

(Blocks 1 open bug)

Details

(Keywords: nightly-community, regression, Whiteboard: [gfx-noted])

Attachments

(6 files)

Attached video 2018-02-19_01-52-10.mp4
Nightly 60 x64 20180218100128 de_DE @ Debian Testing (KDE, Radeon RX480)

Install the theme, open a second window and switch between them.

mozregression --launch 2018-02-18 --pref layout.css.servo.chrome.enabled:true startup.homepage_welcome_url:"https://addons.mozilla.org/en-US/firefox/addon/quantum-lights-dynamic/"

See attached video.
Do you happen to have a regression range? It'd be super-helpful if so.
Flags: needinfo?(jan)
Summary: stylo: stylo-chrome: flickering tab background when switching between two windows → stylo-chrome: flickering tab background when switching between two windows
Also, is it with webrender or some other thing? It seems really strange that this kind of flicker comes from the style system instead of graphics.
It seems to be a recent regression. I cannot reproduce it in a 20180125 build.
(This bug only appears with stylo-chrome enabled.)

mozregression --good 2018-02-01 --bad 2018-02-18 --pref layout.css.servo.chrome.enabled:true startup.homepage_welcome_url:"https://addons.mozilla.org/en-US/firefox/addon/quantum-lights-dynamic/"
> 8:16.80 INFO: Last good revision: b8a6c14ad2804b410fe11a341360f13652649629
> 8:16.80 INFO: First bad revision: 7d098669f8cb02be84451118e9c93bc0b9b3d461
> 8:16.80 INFO: Pushlog:
> https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b8a6c14ad2804b410fe11a341360f13652649629&tochange=7d098669f8cb02be84451118e9c93bc0b9b3d461

> 7d098669f8cb	Andrew Osmond — Bug 1408636 - Ensure accessibility tests pass regardless of image caching affecting whitespace. r=yzen
> 30ee5a21a6a1	Andrew Osmond — Bug 1383682 - Part 3. Prevent imgRequestProxy from leaking the current state when validating. r=tnikkel
> 62497967f092	Andrew Osmond — Bug 1383682 - Part 2. Rename IProgressObserver::SetNotificationsDeferred to make purpose clear. r=tnikkel
> c657d87a0011	Andrew Osmond — Bug 1383682 - Part 1. Split off imgRequestProxy notification deferrals for validation. r=tnikkel
Blocks: 1383682
Has Regression Range: --- → yes
Flags: needinfo?(jan)
Keywords: regression
Oh.
"Last good" from comment 4: The tab background is deep purple for a moment when opening a new window. And when switching between them tab backgrounds intermittently flicker deep purple.

Without stylo-chrome everything is perfect.
Andrew, any chance you could take a look to why your patches (see comment 4) affected this with stylo enabled? No worries if not, I guess I or Xidorn can investigate otherwise. I just guessed you may know faster than us :)
Flags: needinfo?(aosmond)
I found another range.

good = The window gets grey after installing the theme and stays forever like this. A second and third window are fine. I can switch between them and don't see a dark purple tab background flickering.

bad = The window gets grey after installing the theme. When opening a second window the first one switches from grey to the active theme. When I switch between them I see the dark purple tab background flickering described in comment 5.

mozregression --good 2017-10-28 --bad 2017-11-01 --pref layout.css.servo.chrome.enabled:true startup.homepage_welcome_url:"https://addons.mozilla.org/en-US/firefox/addon/quantum-lights-dynamic/"
> 12:10.04 INFO: Last good revision: 21e6718ad041feb812d53e73dbe82f35e984fe47
> 12:10.04 INFO: First bad revision: 0238eee84c25b41e3582648d0cb4e84b0739a268
> 12:10.04 INFO: Pushlog:
> https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=21e6718ad041feb812d53e73dbe82f35e984fe47&tochange=0238eee84c25b41e3582648d0cb4e84b0739a268

> 0238eee84c25	Xidorn Quan — Bug 1390694 - Part 5: Enable browser_windowactivation.js test. r=emilio
> 6b2e7817128b	Cameron McCormack — Bug 1390694 - Part 4: Restyle the document when document state dependencies change. r=emilio
> 4617a3675f58	Cameron McCormack — Bug 1390694 - Part 3: Add nsBindingManager function to check for document state dependencies. r=emilio
> 68e5971edabb	Cameron McCormack — Bug 1390694 - Part 2: Allow EnumerateBoundContentBindings callbacks to stop enumeration. r=emilio
> faafb9307cb1	Cameron McCormack — Bug 1390694 - Part 1: Add ServoStyleSet function for checking document state dependencies. r=emilio
Before bug 1390694, window focus change doesn't trigger restyle, and thus the result of comment 7. I don't think that's the expected result.
In bug 1383682, I made imgRequestProxy entries stop giving cached info if the underlying imgRequest is being validated. This should actually cause *less* flickering, because one would display a cached result, the cache gets evicted, and then we redecode. My best guess is that before the timing was such that we never really flickered with stylo-chrome; decoding is usually pretty fast and can race with the main thread.

As for the particulars of this bug, I blame bug 1406134. If I add a similar hack for moz-extension as we've done for moz-page-thumb, and skip cache validation for said URIs, the flickering goes away. This is a consequence of us moving away from file:: URIs where imagelib used to just check the file timestamp to do cache validation. We need support from the underlying channel to do this check for us.

Considering the growing need for this, I will look into what is necessary to make this happen, or at least provide a stubbed nsICacheInfoChannel implemention for the relevant channels as recommended in bug 1406134 comment 4. This would move the hacks out of imagelib and to where they belong for eventual proper implementation...
See Also: → 1406134
Moving this to imagelib for now, as a short term fix will be necessary regardless.
Component: CSS Parsing and Computation → ImageLib
Priority: -- → P3
Whiteboard: [gfx-noted]
Ok, so I dug into why this didn't happen with the old style system at all, and the reason is the following: Currently image requests are stored in style structs. Gecko can reuse style structs from the same style context, when an element is restyled.

If you add logging in nsStyleImageRequest::Resolve on this simple test-case, you'll see a lot of image requests triggered for Stylo, and only one on initial styling for Gecko.

I'll try to come up with a setup that doesn't require this. nsFrame::DidSetStyleContext already needs to deal with the images anyway.

I think with this we'd be able to remove StyleFoo::FinishStyle, ResolveSameStructsAs, etc etc.
Shouldn't stylo reuse the same imgRequest via cache in specified value?
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #13)
> Shouldn't stylo reuse the same imgRequest via cache in specified value?

That's what the problem is on the imagelib/networking side. When you use LoadImage to get a new imgRequestProxy (instead of cloning an existing imgRequestProxy), they both start out pointing to the original imgRequest. However LoadImage will trigger a validation of the cache, which always fails, because channels other than nsHTTPChannel don't implement nsICacheInfoChannel. We used to get around this by using SubstitutingURL + file URLs, and a hack in imagelib to check the file last modified timestamp, but with the content process losing access to the file system, this doesn't work anymore. The file channel needs to support caching. That is what I'm looking into.

If you can avoid the extra LoadImage calls, and preserve your original state as emilio is suggesting in comment 12, then you will work around this limitation.
See Also: → 1440305
Attachment #8953034 - Attachment mime type: text/plain → text/html
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #13)
> Shouldn't stylo reuse the same imgRequest via cache in specified value?

We don't cache the request in the specified value.
I think I can hack around this before 60 so we can ship stylo chrome... It won't be beautiful, but I think it's the less risky thing to do for now, and we really really need to get rid of the old style system.
Assignee: nobody → emilio
Comment on attachment 8953180 [details]
Bug 1439285: Make the old image request arrive to FinishStyle if we come from ResolveSameStructsAs.

https://reviewboard.mozilla.org/r/222452/#review228376

::: layout/style/nsStyleStruct.cpp:4277
(Diff revision 1)
> +  size_t i = 0;
>    for (nsStyleContentData& data : mContents) {
> -    data.Resolve(aPresContext);
> +    const nsStyleContentData* oldData =
> +      (aOldStyle && aOldStyle->mContents.Length() > i)
> +      ? &aOldStyle->mContents[i]
> +      : nullptr;
> +    data.Resolve(aPresContext, oldData);

Looks like you forgot to convert this to a traditional for loop, so i will always be 0.

::: layout/style/nsStyleStruct.cpp:4667
(Diff revision 1)
> +  size_t i = 0;
>    for (nsCursorImage& cursor : mCursorImages) {

I think |i| can get out of sync here. Please convert this to a traditional for loop.
Attachment #8953180 - Flags: review?(bobbyholley) → review+
Comment on attachment 8953181 [details]
Bug 1439285: Hack around bug 1406134.

https://reviewboard.mozilla.org/r/222454/#review228378

Please file a dependent bug on the imagelib bug so that we remember to rip this out.
Attachment #8953181 - Flags: review?(bobbyholley) → review+
Comment on attachment 8953180 [details]
Bug 1439285: Make the old image request arrive to FinishStyle if we come from ResolveSameStructsAs.

https://reviewboard.mozilla.org/r/222452/#review228376

> Looks like you forgot to convert this to a traditional for loop, so i will always be 0.

Whoops, I did forget the `++`. I'm happy to do any of either that or converting to a for loop.

> I think |i| can get out of sync here. Please convert this to a traditional for loop.

Well, `i` can indeed get out of sync, but just when it doesn't matter anymore.
Flags: needinfo?(aosmond)
Blocks: 1440442
Pushed by ecoal95@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/90dde2f81fa0
Make the old image request arrive to FinishStyle if we come from ResolveSameStructsAs. r=bholley
https://hg.mozilla.org/integration/autoland/rev/1218f0cd72df
Hack around bug 1406134. r=bholley
(In reply to Emilio Cobos Álvarez [:emilio] from comment #15)
> (In reply to Xidorn Quan [:xidorn] UTC+10 from comment #13)
> > Shouldn't stylo reuse the same imgRequest via cache in specified value?
> 
> We don't cache the request in the specified value.

We do, see https://searchfox.org/mozilla-central/rev/bd05e3853c6e982e2a35c1cc404b987b2bc914d6/layout/style/nsCSSValue.h#292

But as described in comment 14, there is probably some problem with reusing that, which imagelib should probably fix.
(In reply to Andrew Osmond [:aosmond] from comment #14)
> (In reply to Xidorn Quan [:xidorn] UTC+10 from comment #13)
> > Shouldn't stylo reuse the same imgRequest via cache in specified value?
> 
> That's what the problem is on the imagelib/networking side. When you use
> LoadImage to get a new imgRequestProxy (instead of cloning an existing
> imgRequestProxy), they both start out pointing to the original imgRequest.
> However LoadImage will trigger a validation of the cache, which always
> fails, because channels other than nsHTTPChannel don't implement
> nsICacheInfoChannel. We used to get around this by using SubstitutingURL +
> file URLs, and a hack in imagelib to check the file last modified timestamp,
> but with the content process losing access to the file system, this doesn't
> work anymore. The file channel needs to support caching. That is what I'm
> looking into.
> 
> If you can avoid the extra LoadImage calls, and preserve your original state
> as emilio is suggesting in comment 12, then you will work around this
> limitation.

It still doesn't explain, actually... It seems we only call LoadImage for each specified value at most once, so later use of the same specified image shouldn't trigger another LoadImage. We don't even clone imgRequestProxy, but just addref it.

This sounds like something which is supposed to work doesn't work properly in the style system...
So the reason actually is... the images are in custom properties, so we don't have cache in specified value, and every time we restyle the element, we generate a new specified value and computed value, so no image request is preserved among restyle!

Sounds like maybe we should also put image cache in custom properties as well when we find they have url which is used as images... That sounds a bit hard, and it would become especially hard given when url() becomes a function which accepts a string (which means you should be able to do `--some-url: "/some-image.png"; background-image: url(var(--some-url));`).
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #29)
> So the reason actually is... the images are in custom properties, so we
> don't have cache in specified value, and every time we restyle the element,
> we generate a new specified value and computed value, so no image request is
> preserved among restyle!
> 
> Sounds like maybe we should also put image cache in custom properties as
> well when we find they have url which is used as images... That sounds a bit
> hard, and it would become especially hard given when url() becomes a
> function which accepts a string (which means you should be able to do
> `--some-url: "/some-image.png"; background-image: url(var(--some-url));`).

I see, nice find! So, the hackaround still stands I guess, and probably that's a bigger reason to not cache them on specified / computed values and cache them later in the pipeline, like bug 1440305 proposes...
https://hg.mozilla.org/mozilla-central/rev/90dde2f81fa0
https://hg.mozilla.org/mozilla-central/rev/1218f0cd72df
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
(In reply to Andrew Osmond [:aosmond] from comment #10)
> In bug 1383682, I made imgRequestProxy entries stop giving cached info if
> the underlying imgRequest is being validated. This should actually cause
> *less* flickering, because one would display a cached result, the cache gets
> evicted, and then we redecode.

I disagree with this. With the old way of doing things our imgRequestProxy was always paintable (even if it drew the old image). With the new way it cannot be painted at all (old or new) during validation. And remember that the majority of the time the old and new images are the same.

Old way: paint old image during validation, then allow painting of new image but with possibility of drawing nothing if data over the network didn't come in fast enough (like all images).

New way: paint no image during validation, then allow painting of the new image (same possibility of drawing nothing as before).


(In reply to Emilio Cobos Álvarez [:emilio] from comment #12)
> If you add logging in nsStyleImageRequest::Resolve on this simple test-case,
> you'll see a lot of image requests triggered for Stylo, and only one on
> initial styling for Gecko.

(In reply to Andrew Osmond [:aosmond] from comment #14)
> That's what the problem is on the imagelib/networking side. When you use
> LoadImage to get a new imgRequestProxy (instead of cloning an existing
> imgRequestProxy), they both start out pointing to the original imgRequest.
> However LoadImage will trigger a validation of the cache, which always
> fails, because channels other than nsHTTPChannel don't implement
> nsICacheInfoChannel.

LoadImage should not be triggering a validation of the cache when all of the LoadImage calls are coming from the same document (in fact this is required per spec). See

https://dxr.mozilla.org/mozilla-central/rev/3f474e48db7f7b0f1c2c090f812290a8d9a21650/image/imgLoader.cpp#1931

where the loadid and key are basically a pointer to the document. Note that we only validate if the document of the new LoadImage call does not match the document of the previous LoadImage call that is in the cache. If it's all in the same document all LoadImage should do is a hashtable lookup and return the existing image in the cache.

So what is going on here that that isn't working?
Flags: needinfo?(emilio)
Flags: needinfo?(aosmond)
Let me dig into a debugger on nsContentUtils::LoadImage reverting the patch in this bug...
So the explanation is that obviously two different windows are two different XUL documents.

Here's what I see (this was written while I debugged):

The load is triggered, as expected, from:

#0  0x00007f7b46069ae0 in nsContentUtils::LoadImage(nsIURI*, nsINode*, nsIDocument*, nsIPrincipal*, unsigned long, nsIURI*, mozilla::net::ReferrerPolicy, imgINotificationObserver*, int, nsTSubstring<char16_t> const&, imgRequestProxy**, unsigned int, bool) (aURI=0x7f7b1fabb700, aContext=0x7f7b2ebab000, aLoadingDocument=0x7f7b2ebab000, aLoadingPrincipal=0x7f7b3aeb4b30, aRequestContextID=0, aReferrer=0x7f7b21589800, aReferrerPolicy=mozilla::net::RP_Unset, aObserver=0x0, aLoadFlags=0, initiatorType=..., aRequest=0x7ffebf7c4bb8, aContentPolicyType=37, aUseUrgentStartForChannel=false) at /home/emilio/projects/moz/gecko/dom/base/nsContentUtils.cpp:3816
#1  0x00007f7b48ada060 in mozilla::css::ImageLoader::LoadImage(nsIURI*, nsIPrincipal*, nsIURI*, mozilla::css::ImageValue*, mozilla::CORSMode) (this=0x7f7b215127a0, aURI=0x7f7b1fabb700, aOriginPrincipal=0x7f7b3aeb4b30, aReferrer=0x7f7b21589800, aImage=0x7f7b099fef20, aCorsMode=mozilla::CORSMode::CORS_NONE) at /home/emilio/projects/moz/gecko/layout/style/ImageLoader.cpp:266
#2  0x00007f7b48ba4d01 in mozilla::css::ImageValue::Initialize(nsIDocument*) (this=0x7f7b099fef20, aDocument=0x7f7b2ebab000)
    at /home/emilio/projects/moz/gecko/layout/style/nsCSSValue.cpp:3186
#3  0x00007f7b48ca49c9 in nsStyleImageRequest::Resolve(nsPresContext*, nsStyleImageRequest const*) (this=0x7f7b098fe5e0, aPresContext=0x7f7b2ebf6000, aOldImageRequest=0x7f7b01de6d60)
    at /home/emilio/projects/moz/gecko/layout/style/nsStyleStruct.cpp:2224
#4  0x00007f7b48cc0a9a in nsStyleImage::ResolveImage(nsPresContext*, nsStyleImage const*) (this=0x7f7b1e7bcdc8, aContext=0x7f7b2ebf6000, aOldImage=0x7f7b1e7bc6a8)
    at /home/emilio/projects/moz/gecko/obj-x86_64-pc-linux-gnu/dist/include/nsStyleStruct.h:464
#5  0x00007f7b48cc983d in nsStyleImageLayers::Layer::ResolveImage(nsPresContext*, nsStyleImageLayers::Layer const*) (this=0x7f7b1e7bcdc8, aContext=0x7f7b2ebf6000, aOldLayer=0x7f7b1e7bc6a8)
    at /home/emilio/projects/moz/gecko/obj-x86_64-pc-linux-gnu/dist/include/nsStyleStruct.h:792
#6  0x00007f7b48cc3164 in nsStyleImageLayers::ResolveImages(nsPresContext*, nsStyleImageLayers const*) (this=0x7f7b049f8458, aContext=0x7f7b2ebf6000, aOldLayers=0x7f7b0e147718)
    at /home/emilio/projects/moz/gecko/obj-x86_64-pc-linux-gnu/dist/include/nsStyleStruct.h:847
#7  0x00007f7b48cae35f in nsStyleBackground::FinishStyle(nsPresContext*, nsStyleBackground const*) (this=0x7f7b049f8458, aPresContext=0x7f7b2ebf6000, aOldStyle=0x7f7b0e147718)
    at /home/emilio/projects/moz/gecko/layout/style/nsStyleStruct.cpp:3362
#8  0x00007f7b48d4d1a1 in mozilla::ServoStyleContext::ResolveSameStructsAs(mozilla::ServoStyleContext const*) (this=0x7f7b0c910608, aOther=0x7f7b0d823508)
    at /home/emilio/projects/moz/gecko/obj-x86_64-pc-linux-gnu/dist/include/nsStyleStructList.h:83
#9  0x00007f7b48d388e4 in mozilla::ServoRestyleManager::ProcessPostTraversal(mozilla::dom::Element*, mozilla::ServoStyleContext*, mozilla::ServoRestyleState&, mozilla::ServoPostTraversalFlags) (this=0x7f7b215128e0, aElement=0x7f7b2151ec10, aParentContext=0x0, aRestyleState=..., aFlags=mozilla::ServoPostTraversalFlags::Empty)
    at /home/emilio/projects/moz/gecko/layout/base/ServoRestyleManager.cpp:899

(rr) p requestFlags
$5 = 4

We find a cache entry, but in imgLoader::ValidateEntry:

(rr) p hasExpired
$10 = true
(rr) p expiryTime
$11 = 1519390562

But hey! It's a file: url, so we enter the block at:

  https://searchfox.org/mozilla-central/rev/bd05e3853c6e982e2a35c1cc404b987b2bc914d6/image/imgLoader.cpp#1883

But GetFile doesn't return anything meaningful and thus it fails.

The spec in mozilla::net::SubstitutingURL::EnsureFile is:

  jar:file:///home/emilio/projects/moz/gecko/obj-x86_64-pc-linux-gnu/tmp/scratch_user/extensions/%7Bed26ddcb-5611-4512-a89a-51b8db81cfb2%7D.xpi!/day.png

The scheme is "jar", so we hit this condition and return an error in GetFile:

  https://searchfox.org/mozilla-central/rev/bd05e3853c6e982e2a35c1cc404b987b2bc914d6/netwerk/protocol/res/SubstitutingProtocolHandler.cpp#74

The ValidateSecurityInfo check works just fine and we get through it.

Now:

(rr) p request->LoadId()
$17 = (void *) 0x7f7b2b34b000
(rr) p aCX
$18 = (mozilla::dom::XULDocument *) 0x7f7b2ebab000

So we set validateRequest to true.

The LoadId is also another XUL doc:

(rr) p (mozilla::dom::XULDocument*)request->LoadId()
$25 = (mozilla::dom::XULDocument *) 0x7f7b2b34b000
(rr) p $25->GetRootElement()
$26 = (nsIDocument::Element *) 0x7f7b1fa4da60

(Are we ping-ponging between docs?)

We enter in ValidateRequestWithNewChannel, and create a new channel for the
request, probably it being the underlying issue for the flickering with Andrew's explanation.

Timothy, does that answer the question? Anything else you want me to look into?
Flags: needinfo?(emilio) → needinfo?(tnikkel)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #34)
> So the explanation is that obviously two different windows are two different
> XUL documents.

Sorry to make you dig into that.

Ah, okay. I was confused because of your reduced testcase in comment 12. So the reduced testcase doesn't really reproduce the problem then, it was just to help debugging/understanding? Or you have to have two copies of it open in order to reproduce?

We have bug 919113 filed for the issue that we can only remember one document for which the image is valid. That would probably save us from having to validate in the majority of cases except where we actually need it.
Flags: needinfo?(tnikkel)
Flags: needinfo?(emilio)
Flags: needinfo?(aosmond)
Oh, yeah, the test-case was only to understand why the number of calls to ResolveImage was different across the old and new style system, sorry for the confusion.
Flags: needinfo?(emilio)
Attachment #8953034 - Attachment description: testcase. → testcase that shows stylo calling nsStyleImageRequest::Resolve way more often than the old style system, due to struct caching.
You need to log in before you can comment on or make changes to this bug.