Closed Bug 1265873 Opened 8 years ago Closed 8 years ago

Flickering when scrolling puppygames with OSX Basic

Categories

(Core :: Graphics: Layers, defect)

48 Branch
Unspecified
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox47 --- unaffected
firefox48 --- unaffected
firefox49 --- fixed

People

(Reporter: BenWa, Assigned: sotaro)

References

Details

(Keywords: regression, Whiteboard: [gfx-noted])

Attachments

(3 files, 7 obsolete files)

Scrolling http://www.puppygames.net/blog/?p=1282 with either the trackpack or mousewhell but not the scrollbar. There's heavy flickering.
Attached image GIF
The GIF is recorded at a low framerate but the flickering I see is even more jarring. Notice how it's not the whole layer that's fickering as well. The part on the right doesn't flicker.
I don't see flickering on my machine, just checkerboarding. But I have the velocity bias set to zero. Can you try that and see whether it makes a difference?
I see the background lagging behind (due to a main-thread parallax effect), but no flickering.
I tried with 'apz.velocity_bias;0' and it still reproduces.

I'll try to run a bisection.
Caused by OSX Basic.
Summary: Flickering when scrolling puppygames → Flickering when scrolling puppygames with OSX Basic
Depends on: 1263053
Blocks: 1263053
No longer depends on: 1263053
Component: Panning and Zooming → Graphics: Layers
Keywords: regression
OS: Unspecified → Mac OS X
Whiteboard: [gfx-noted]
Version: unspecified → 48 Branch
I confirmed the problem on my mac.
(In reply to Sotaro Ikeda [:sotaro] from comment #7)
> I confirmed the problem on my mac.

The problem happened when e10s was enabled.
I also see the problem with BasicCompositor on win8, when tiling and e10s is enabled.
Assignee: nobody → sotaro.ikeda.g
During scrolling I sometimes saw different tile's drawing.

And when nglayout.debug.paint_flashing is set, I saw very frequent flashing.
(In reply to Sotaro Ikeda [:sotaro] from comment #10)
> During scrolling I sometimes saw different tile's drawing.

Tile's front buffer's recycling seems now work correctly.
attachment 8744152 [details] [diff] [review] addressed the problem on my Win8 and mac.
(In reply to Sotaro Ikeda [:sotaro] from comment #11)
> (In reply to Sotaro Ikeda [:sotaro] from comment #10)
> > During scrolling I sometimes saw different tile's drawing.
> 
> Tile's front buffer's recycling seems now work correctly.

In current gecko, front buffer is recycled at the end of Transaction. It does not wait end of the buffer's compositing on compositor side. There is a risk that client side does recycle and paint to compositing buffer. On BasicCompositor, it seems to happen easily, because compositing is very heavy.
With CompositorOGL, we copy the tile contents when we process the transaction, by uploading it to a GPU texture. With BasicCompositor, apparently we don't copy. So do we need to keep the texture locked, until compositing is done? It sounds like we have very similar requirements here to what we need in bug 1265824.
Tiling is supposed to handle this already, for TextureHosts that return false for HasIntermediateBuffer.

All tiles passed into TiledLayerBufferComposite::UseTiles should have a read lock on them, and if HasIntermediateBuffer() is false, then we only unlock the tiles the composite *after* a new transaction arrives (with a new set of tiles).

Tiles that exist in both the new set and the old set will have been read locked a second time by content, so this unlock still leaves them in a locked state.

Any tiles that are no longer in the tiled buffer will have their final read lock released (once we've finished the next composite and are sure no access will be made to them by deferred GPU work) and will be available to the tile pool again.

On the client side it looks like ComputeHasIntermediateBuffer should be determining that we don't have an intermediate buffer when using basic compositor.
Yea, on TileClient point of view, TextureClient seems to be handled correctly with gfxSharedReadLock. But this bug was out of TileClient management :(
Attachment #8744152 - Attachment description: patch - Make Tiles front buffer recycleing wait corresponding DidComposite event → patch - Make Tiles buffers recycleing wait corresponding DidComposite event
Update nits.
Attachment #8744152 - Attachment is obsolete: true
Hmm, TransactionId is not always updated incrementally from ClientLayerManager point of view. Sometimes, it is reset to 1 and increment again. It needs to be always incremental.
It seems better to pick FwdTransactionId from Bug 1252835.
Add FwdTransacitonId.
Attachment #8744167 - Attachment is obsolete: true
Update nits.
Attachment #8744249 - Attachment is obsolete: true
Attachment #8744326 - Flags: review?(nical.bugzilla)
(In reply to Sotaro Ikeda [:sotaro] from comment #17)
> Yea, on TileClient point of view, TextureClient seems to be handled
> correctly with gfxSharedReadLock. But this bug was out of TileClient
> management :(

Can you provide more details about what's wrong here? The tiling code is supposed to handle this through the copy-on-write lock mechanism. If that logic has bugs, I would prefer that we fix it rather than add new stuff to work around the issue.

(In reply to Sotaro Ikeda [:sotaro] from comment #14)
> In current gecko, front buffer is recycled at the end of Transaction. It
> does not wait end of the buffer's compositing on compositor side. There is a
> risk that client side does recycle and paint to compositing buffer. On
> BasicCompositor, it seems to happen easily, because compositing is very
> heavy.

Shouldn't this take the state of the Copy-on-write lock into account? If the compositor hasn't unlocked it, we shouldn't try to recycle the texture (yet). My recollection of the details involved is a bit fuzzy...
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Nicolas Silva [:nical] from comment #24)
> (In reply to Sotaro Ikeda [:sotaro] from comment #17)
> > Yea, on TileClient point of view, TextureClient seems to be handled
> > correctly with gfxSharedReadLock. But this bug was out of TileClient
> > management :(
> 
> Can you provide more details about what's wrong here? The tiling code is
> supposed to handle this through the copy-on-write lock mechanism. If that
> logic has bugs, I would prefer that we fix it rather than add new stuff to
> work around the issue.

When TileClient calls ReturnTextureClientDeferred(), it returned TextureClient to TextureClientPool and gfxSharedReadLock is unlocked and cleared.
  https://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/TiledContentClient.cpp#674

The gfxSharedReadLock is not referenced anymore on client side. Only Host side continues to hold ref of gfxSharedReadLock. Then TextureClientPool reuse the TextureClient after TextureClientPool::ReturnDeferredClients(). It does not care about gfxSharedReadLock to recycle TextureClient. ReturnDeferredClients() is called in ClientLayerManager::EndTransaction(). If compositor side delayed its task, the TextureClient could be still in use in Host side.

> (In reply to Sotaro Ikeda [:sotaro] from comment #14)
> > In current gecko, front buffer is recycled at the end of Transaction. It
> > does not wait end of the buffer's compositing on compositor side. There is a
> > risk that client side does recycle and paint to compositing buffer. On
> > BasicCompositor, it seems to happen easily, because compositing is very
> > heavy.
> 
> Shouldn't this take the state of the Copy-on-write lock into account? If the
> compositor hasn't unlocked it, we shouldn't try to recycle the texture
> (yet). My recollection of the details involved is a bit fuzzy...

As the above, the lock is not applied to TextureClient that ReturnTextureClientDeferred() is called.
attachment 8744326 [details] [diff] [review] did not use gfxSharedReadLock to address ReturnTextureClientDeferred(), since it might be better to limit the lock only to TileClient.
Flags: needinfo?(sotaro.ikeda.g)
Ok, this makes sense to me now. When we're recycling buffers within a tile, the read lock system work. But sometimes we discard tile buffers and return those buffers to the pool while they should still be locked.

The code in DiscardBackBuffer only returns the tile if we're already unlocked, and discards the buffer entirely if not.

The simple solution would be to do the same for DiscardFrontBuffer.

Ultimately it seems like it would be preferable to keep the buffers and locks together at all times, teach the pool about locking, and only recycle buffers that are unlocked.
Attachment #8744326 - Flags: review?(nical.bugzilla)
(In reply to Matt Woodrow (:mattwoodrow) from comment #27)
> 
> Ultimately it seems like it would be preferable to keep the buffers and
> locks together at all times, teach the pool about locking, and only recycle
> buffers that are unlocked.

I am creating a patch to do it.
Attachment #8744326 - Attachment is obsolete: true
Attachment #8745193 - Attachment description: Use gfxSharedReadLock in TextureClientPool → patch - Use gfxSharedReadLock in TextureClientPool
Fix build failures on android.
Attachment #8745193 - Attachment is obsolete: true
Blocks: 1267569
Need to address ShmemSection lifetime problem.
(In reply to Sotaro Ikeda [:sotaro ouf of office 29/Apr - 8/May] from comment #33)
> Need to address ShmemSection lifetime problem.

If you are talking about the mUsedShmems.empty() assertion, I am okay with r+ing a patch that just removes it. That bogus assertion is getting in the way of too much stuff and causes intermittent failures in cases where things are actually correct.
(In reply to Nicolas Silva [:nical] from comment #34)
> If you are talking about the mUsedShmems.empty() assertion, I am okay with
> r+ing a patch that just removes it.

I put a patch on bug 1267246 that removes the assertion.
(In reply to Nicolas Silva [:nical] from comment #34)
> (In reply to Sotaro Ikeda [:sotaro ouf of office 29/Apr - 8/May] from
> comment #33)
> > Need to address ShmemSection lifetime problem.
> 
> If you are talking about the mUsedShmems.empty() assertion.

test failures happened on mac. Address of mShmemSection seemed to become invalid when ReadCount became 0. gfxShmSharedReadLock::ReadUnlock() dealloc ShmemSection when the ReadCount became 0. It seemed to be affected to the failures.

https://dxr.mozilla.org/mozilla-central/source/gfx/layers/client/TiledContentClient.cpp#422
Attached patch patch - Update ReadLock handling (obsolete) — Splinter Review
Fix ASSERT.
Attachment #8745812 - Attachment is obsolete: true
Blocks: 1252835
Attachment #8745199 - Flags: review?(nical.bugzilla)
Attachment #8745832 - Flags: review?(nical.bugzilla)
Attachment #8745199 - Attachment description: patch - Use gfxSharedReadLock in TextureClientPool → patch part 1 - Use gfxSharedReadLock in TextureClientPool
Attachment #8745832 - Attachment description: patch - Update ReadLock handling → patch part 2 - Update ReadLock handling
Attachment #8745199 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8745832 [details] [diff] [review]
patch part 2 - Update ReadLock handling

Forgot to update one asserts in TextureClientPool.
Attachment #8745832 - Flags: review?(nical.bugzilla)
Fix assert.
Attachment #8745832 - Attachment is obsolete: true
Attachment #8745912 - Flags: review?(nical.bugzilla)
Attachment #8745912 - Flags: review?(nical.bugzilla) → review+
https://hg.mozilla.org/mozilla-central/rev/693598bb1756
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
I see a 3% talos improvement on osx for the main_rss measurement- thought I would mention it:
https://treeherder.mozilla.org/perf.html#/alerts?id=1043
Hi Florin, can we have QA's help to verify this?
Flags: needinfo?(florin.mezei)
Setting verification flag.
Flags: needinfo?(florin.mezei) → qe-verify+
Hi Sotaro,
Since this patch also affects 48, are you also considering to uplift this patch to 48?
Flags: needinfo?(sotaro.ikeda.g)
By Bug 1263053 comment 15, it does not affect 48. Then it is not necessary to uplift the patch to 48.
Flags: needinfo?(sotaro.ikeda.g)
See Also: → 1293908
I can't reproduce this using 48 Nightly under Mac OS X 10.9.5.
Are there any prefs that I need to set first or is this specific to some OS's? If that's the case, can you please confirm the fix using Firefox 49 RC build?
Thank you!
Flags: needinfo?(bgirard)
According to the tracking flags it says that 48 is unaffected. Make sure 'layers.acceleration.disabled' is set to false and to restart the build on an affected build to reproduce the problem.
Flags: needinfo?(bgirard)
I was testing the Nightly from the date this bug was filled (2016-04-19) - this was 48 (as stated for Version in the top description). 
48 was set as unaffected after a backout.

'layers.acceleration.disabled' preference is set to false by default.

Even though I couldn't reproduce the flickering, I can see that on Firefox 49 RC the scrolling is improved (eg: on OSX the background image doesn't scroll anymore).

Considering this, I'll remove the qe-verify+ flag and leave the status flag as it is.
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.