Closed Bug 1717209 Opened 5 years ago Closed 4 years ago

Categories

(Core :: Graphics: Canvas2D, defect)

defect

Tracking

()

VERIFIED FIXED
92 Branch
Tracking Status
firefox91 --- verified
firefox92 --- verified

People

(Reporter: jrmuizel, Assigned: bobowen)

References

Details

Attachments

(1 file)

STR:

  1. Open 'https://www.bitstamp.net/market/tradeview/'. This opens a page with a TradeView element. It's not just TradeView that makes Firefox freeze but I found this key into triggering it sooner.
  2. Open a second tab with any other website.
  3. Maximize the Firefox window. I found that the smaller the window, the less likely it freezes.
  4. Constantly switch between the tabs manually with your mouse (ctrl+tab does not work), but while on the TradeView page, hover your mouse over the TradeView element first before switching again. Do this until Firefox freezes. Can take anywhere from 1 switch up to 30; it's pretty random but a freeze is guaranteed if your environment is affected.

https://share.firefox.dev/2TNGxs8

We're hanging while recording a path into the CanvasRingBuffer.

Bob, can you take a look at this?

Flags: needinfo?(bobowencode)

Adding that this is since version 89 (the one that introduced the new style). And the following info might be helpful:

Disabling hardware acceleration helps. Setting gfx.webrender.software.opengl to true does not help (HW accel. on). Setting gfx.webrender.force-disabled to false helps (HW accel. on). I'm only able to trigger this issue on my desktop PC. Laptop still uses Windows 7 and seems unaffected.

Issue also present on my brand new laptop. Hope this issue doesn't get ignored. The bug got introduced in version 89. It would be a shame if it carried on to future versions.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)

Bob, can you take a look at this?

Sorry, been out for a couple of weeks, yes I'll look at this now.

Flags: needinfo?(bobowencode)

Well this is exciting.
I've reproduced this twice and both times it is caused by the AcquireSync failing and then the crash from bug 1709603.
In that bug I was speculating that we might have an issue with our TextureClient read locks on tab switch, but I couldn't actually reproduce the write lock issue and crash.
In theory we have a mechanism to handle this, but something is obviously going wrong somewhere.
Hopefully now I have STR I can track it down.

I can see a whole load of printf debugging in the not too distant future.

Assignee: nobody → bobowencode
Status: NEW → ASSIGNED
See Also: → 1709603

So it looks like relying on mTextureLockIsUnreliable as stated in bug 1709603 comment 4, doesn't work.
This only gets set in ClearCachedResources, which is only called from an expiration tracker as far as I can tell.

So, we can on rare (but clearly not rare enough) occasions end up:

  • forwarding (at least) two textures with no read lock
  • these start being used with associated Acquire/ReleaseSyncs in the Renderer thread
  • they then get selected as back buffers in the content process, with associated AcquireSync in canvas threads
  • and we hit the point where one of these is waiting for AcquireSync in Renderer thread and the other in the canvas thread, because each already has a lock on the other texture

Just trying to work out how best to resolve this.

OK patch coming soon.
It checks to see if the front texture is read locked when retrieved from PersistentBufferProviderShared::GetTextureClient.
This can only happen if we are re-forwarding a texture.
We can't read lock this again because I think it might only get released once in the Compositor / Renderer.
So instead, it borrows and returns its DrawTarget to copy to a new front buffer, which we can safely read lock.
If it isn't read locked at all then we ensure it is updated to read lock again when we re-forward it.

With this patch I can't reproduce the issue.

If we are re-forwarding a texture has not been updated then it might still be
read locked. We rely on the read lock for us to know when the Compositor or
Renderer has finished with the texture. So this change copies it to a new
texture, which we can safely read lock.
If the texture is not read locked then we ensure that it is set as updated,
which means we will read lock it as we forward it.

Pushed by bobowencode@gmail.com: https://hg.mozilla.org/integration/autoland/rev/723d9c48460a Copy to a new canvas texture, when re-forwarding a texture that is still read locked. r=lsalzman
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 92 Branch

Comment on attachment 9230528 [details]
Bug 1717209: Copy to a new canvas texture, when re-forwarding a texture that is still read locked. r=jrmuizel!

Beta/Release Uplift Approval Request

  • User impact if declined: This cause of hangs (and potential subsequent GPU process crashes) on certain sites using canvas 2D will remain.
    Particularly thinking that 91 is an ESR release.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: See description.
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): I think this is low/medium, it's not a trivial patch but it's fairly simple.
  • String changes made/needed: None
Attachment #9230528 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Bob, all the crashes in bug 1709603 are on the nightly channel, are the hangs causing these crashes also happening on the beta/release channel or only on nightly builds? I want to make sure we are fixing a problem that happens on the release channel and not just pre-release before evaluating this for uplift. Thanks

Flags: needinfo?(bobowencode)

(In reply to Pascal Chevrel:pascalc from comment #14)

Bob, all the crashes in bug 1709603 are on the nightly channel, are the hangs causing these crashes also happening on the beta/release channel or only on nightly builds? I want to make sure we are fixing a problem that happens on the release channel and not just pre-release before evaluating this for uplift. Thanks

Yes the hangs still happen on beta and release, but we only crash immediately on nightly.
It is also possible that the timeout and failure to lock the texture, will cause other instability on beta and release.

Flags: needinfo?(bobowencode)

Reproduced the issue on Firefox 91.0a1 (2021-06-18) under Windows 10.

The issue is no longer reproducible on 92.0a1 (2021-07-26).

The issue is reproducible on 91.0b7.

Comment on attachment 9230528 [details]
Bug 1717209: Copy to a new canvas texture, when re-forwarding a texture that is still read locked. r=jrmuizel!

Approved for 91 beta 8, thanks.

Attachment #9230528 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [qa-triaged]
See Also: → 1718930
Regressions: 1723214

I think this issue needs to be re-opened. The initial problem is resolved, but it introduced a new problem where the canvas does not scale with a window re-size. Example: https://i.imgur.com/mL7lBSM.png

Tested in 91.0 stable and 92.0b3.

(In reply to limeclipse from comment #20)

I think this issue needs to be re-opened. The initial problem is resolved, but it introduced a new problem where the canvas does not scale with a window re-size. Example: https://i.imgur.com/mL7lBSM.png

Tested in 91.0 stable and 92.0b3.

Thanks for taking the time to report this.
I could reproduce, so I've bisected it with mozregression and it looks like this is caused by the enabling of Window.visualViewport in bug 1551302.

Not sure whether that's because the website detects that and changes behaviour, but I'll leave a comment on that bug to add this case.

(In reply to Bob Owen (:bobowen) from comment #21)
...

Not sure whether that's because the website detects that and changes behaviour, but I'll leave a comment on that bug to add this case.

Forgot to say that a similar regression (bug 1725569) has been filed, so that's the one I've commented on.

Verified that the issue is also fixed on 91.0.2 under Windows 10.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Issue is not fixed on 91.0.2 under Windows 10 for me. Note that TradeView seems to have implemented a workaround that reloads the viewport when it detects a window resize.

(In reply to limeclipse from comment #25)

Issue is not fixed on 91.0.2 under Windows 10 for me. Note that TradeView seems to have implemented a workaround that reloads the viewport when it detects a window resize.

Hi, this particular bug is fixed.
The bug you found already had a similar one with what is probably the root cause filed as bug 1725569.

Oh, my apologies. I though "this issue" was in the context of the newly discovered one. Thanks for pointing that out.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: