Closed Bug 1462411 Opened 5 years ago Closed 5 years ago
Try to Flush
Async Paints in Client Layer Manager::End Transaction
Currently we block entering a layer transaction while previous async paints are still finishing.  The reason we do this is because we are double buffered and cannot have two active frames being painted into a single backbuffer. But, the painting only happens in EndTransaction . It seems to me that we can delay blocking for an async paint until we get to EndTransaction. This would let us do display list building and frame layer building at the same time we are painting. In the future we might determine cases where we don't need to block at all, like an empty transaction or if there are no invalidated painted regions. The concern for race conditions lies in two areas, IPDL and content clients/texture clients. When we have async paints still active, we postpone all IPDL messages on the CompositorBridge protocol until they complete . This is important so that the Update message is not sent until the paints are complete and so that other messages are in order as they are sent. This may have some effects here, and I will need to think about it. The other thing is we need to make sure that no content clients are manipulated or destroyed that have async paints happening. We currently protect against this by flushing async paints before we call root->RenderLayer(), and that would continue with this change. Here's a try run with a prototype of this patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7731a2f09f6132566c30e171cacd7771af63b27 It looks green to me.  https://searchfox.org/mozilla-central/rev/d4b9e50875ad7e5d20f2fee6a53418315f6dfcc0/gfx/layers/client/ClientLayerManager.cpp#228  https://searchfox.org/mozilla-central/rev/d4b9e50875ad7e5d20f2fee6a53418315f6dfcc0/gfx/layers/client/ClientLayerManager.cpp#356  https://searchfox.org/mozilla-central/rev/d4b9e50875ad7e5d20f2fee6a53418315f6dfcc0/gfx/layers/ipc/ShadowLayers.cpp#771
Matt, would you have any concerns about this change? Are there any things we need to watch out here during DL building or FLB?
I can't think of anything potentially dangerous during DL building. Might be worth double checking (or finding a way to assert) that we never try to clear buffers or copy data during FLB. FLB might also delete the Layers used in the previous paint, and that would probably drop a reference to the pixel buffer. We'd want to make sure we're holding the right refs, and that nobody unlocks/unmaps memory that we still want to write to.
(In reply to Matt Woodrow (:mattwoodrow) from comment #2) > I can't think of anything potentially dangerous during DL building. > > Might be worth double checking (or finding a way to assert) that we never > try to clear buffers or copy data during FLB. > I haven't seen anything in the public interface of ClientPaintedLayer that would clear buffers or copy data. There is one exception, when gfxEnv::DumpPaint() is enabled GetCompositableClient() is used with Dump() to read the current buffer. This happens in FLB so with this change we could be reading bogus values. Not sure what the best solution here is, but detecting this and falling back to the old behavior seems reasonable. > FLB might also delete the Layers used in the previous paint, and that would > probably drop a reference to the pixel buffer. We'd want to make sure we're > holding the right refs, and that nobody unlocks/unmaps memory that we still > want to write to. We currently hold on to texture clients for this reason . I don't see a reason this would become worse now. The IPDL/TextureClient destruction stuff is a bit of a mystery to me, but I believe it will behave the same as before. I sent an email to David Anderson as he originally wrote this code, and he wasn't sure exactly the reason. He thinks it was most likely a safety precaution around FLB mutating the layer tree, but also that he's not sure of anything in the paint thread code that would depend on that so that this could be safe. I think this is worth landing and monitoring for regressions.  https://searchfox.org/mozilla-central/rev/da499aac682d0bbda5829327b60a865cbc491611/gfx/layers/Layers.cpp#1608  https://searchfox.org/mozilla-central/rev/da499aac682d0bbda5829327b60a865cbc491611/gfx/layers/ipc/CompositorBridgeChild.h#246
Attachment #8977065 - Flags: review?(matt.woodrow) → review+
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c0b2977cab3c Move FlushAsyncPaints to ClientLayerManager::EndTransactionInternal. r=mattwoodrow
Perf wins! == Change summary for alert #13421 (as of Thu, 24 May 2018 11:36:11 GMT) == Improvements: 9% tsvgx windows7-32 opt e10s stylo 261.64 -> 236.82 8% tsvgx osx-10-10 opt e10s stylo 320.37 -> 294.97 8% tsvgx windows7-32 pgo e10s stylo 232.51 -> 214.63 For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=13421
This also had a major impact on the SCALARS_GFX.OMTP.PAINT_WAIT_RATIO telemetry probe.  Comparing the probe from 5/25 to 5/24, we see the median drop by half from 161.2 to 86.7. That is a change of 1.6% of blocking paints waiting over 200us to 0.87% of paints.  http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/2530/alerts/?from=2018-05-25&to=2018-05-25  https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-05-25&keys=__none__!__none__!__none__&max_channel_version=nightly%252F62&measure=SCALARS_GFX.OMTP.PAINT_WAIT_RATIO&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-05-25&table=0&trim=1&use_submission_date=0  https://telemetry.mozilla.org/new-pipeline/dist.html#!cumulative=0&end_date=2018-05-24&keys=__none__!__none__!__none__&max_channel_version=nightly%252F62&measure=SCALARS_GFX.OMTP.PAINT_WAIT_RATIO&min_channel_version=null&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2018-05-24&table=0&trim=1&use_submission_date=0
You need to log in before you can comment on or make changes to this bug.