Intermittent damp | application crashed [@ mozilla::CrossProcessSemaphore::CrossProcessSemaphore]

RESOLVED FIXED in Firefox 54

Status

()

defect
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: intermittent-bug-filer, Assigned: mattwoodrow)

Tracking

({crash, intermittent-failure})

unspecified
mozilla55
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 unaffected, firefox54 fixed, firefox55 fixed)

Details

Attachments

(3 attachments)

Matt: since you introduced CrossProcessSemaphore in Bug 1325227, could you take a look? Thanks!
Component: Talos → Graphics: Layers
Flags: needinfo?(matt.woodrow)
Keywords: crash
Product: Testing → Core
Version: Version 3 → unspecified
Thread 0 (crashed)
[task 2017-03-16T07:32:31.143611Z] 07:32:31     INFO -  0  libxul.so!mozilla::CrossProcessSemaphore::CrossProcessSemaphore [CrossProcessSemaphore_posix.cpp:ab96d8a9e247 : 54 + 0x0]
[task 2017-03-16T07:32:31.144335Z] 07:32:31     INFO -     rax = 0x000000000061fd00   rdx = 0x0000000000000002
[task 2017-03-16T07:32:31.145136Z] 07:32:31     INFO -     rcx = 0x00007fd0c5ffeee7   rbx = 0x00007fd0a5984860
[task 2017-03-16T07:32:31.145896Z] 07:32:31     INFO -     rsi = 0x0000000000000000   rdi = 0x00007fd0d291c140
[task 2017-03-16T07:32:31.146727Z] 07:32:31     INFO -     rbp = 0x00007fff235bf3f0   rsp = 0x00007fff235bf3d0
[task 2017-03-16T07:32:31.147530Z] 07:32:31     INFO -      r8 = 0x0000000000000000    r9 = 0x00007fd0a5800000
[task 2017-03-16T07:32:31.148242Z] 07:32:31     INFO -     r10 = 0x0000000000000000   r11 = 0x0000000000000206
[task 2017-03-16T07:32:31.149132Z] 07:32:31     INFO -     r12 = 0x00007fd0a7121240   r13 = 0x0000000000000001
[task 2017-03-16T07:32:31.149964Z] 07:32:31     INFO -     r14 = 0x00007fff235bf610   r15 = 0x0000000000000001
[task 2017-03-16T07:32:31.150703Z] 07:32:31     INFO -     rip = 0x00007fd0c3c66ea0
[task 2017-03-16T07:32:31.151526Z] 07:32:31     INFO -     Found by: given as instruction pointer in context
[task 2017-03-16T07:32:31.152321Z] 07:32:31     INFO -  1  libxul.so!mozilla::layers::TextureClient::EnableBlockingReadLock [TextureClient.cpp:ab96d8a9e247 : 1441 + 0x21]
[task 2017-03-16T07:32:31.153349Z] 07:32:31     INFO -     rbx = 0x00007fd0a78b8ee0   rbp = 0x00007fff235bf410
[task 2017-03-16T07:32:31.154415Z] 07:32:31     INFO -     rsp = 0x00007fff235bf400   r12 = 0x00007fd0a5984850
[task 2017-03-16T07:32:31.155510Z] 07:32:31     INFO -     r13 = 0x00007fff235bf428   r14 = 0x00007fff235bf610
[task 2017-03-16T07:32:31.156539Z] 07:32:31     INFO -     r15 = 0x0000000000000001   rip = 0x00007fd0c4090d4f
[task 2017-03-16T07:32:31.157555Z] 07:32:31     INFO -     Found by: call frame info
[task 2017-03-16T07:32:31.158617Z] 07:32:31     INFO -  2  libxul.so!mozilla::layers::ContentClientRemoteBuffer::CreateBackBuffer [ContentClient.cpp:ab96d8a9e247 : 323 + 0xc]
[task 2017-03-16T07:32:31.159691Z] 07:32:31     INFO -     rbx = 0x00007fd0a46aeef0   rbp = 0x00007fff235bf450
[task 2017-03-16T07:32:31.160761Z] 07:32:31     INFO -     rsp = 0x00007fff235bf420   r12 = 0x0000000000000002
[task 2017-03-16T07:32:31.161820Z] 07:32:31     INFO -     r13 = 0x00007fff235bf428   r14 = 0x00007fff235bf610
[task 2017-03-16T07:32:31.162836Z] 07:32:31     INFO -     r15 = 0x0000000000000001   rip = 0x00007fd0c4098cec
[task 2017-03-16T07:32:31.163889Z] 07:32:31     INFO -     Found by: call frame info
[task 2017-03-16T07:32:31.165018Z] 07:32:31     INFO -  3  libxul.so!mozilla::layers::ContentClientDoubleBuffered::EnsureBackBufferIfFrontBuffer [ContentClient.cpp:ab96d8a9e247 : 631 + 0x5]
[task 2017-03-16T07:32:31.166178Z] 07:32:31     INFO -     rbx = 0x00007fd0a46aeef0   rbp = 0x00007fff235bf470
[task 2017-03-16T07:32:31.167279Z] 07:32:31     INFO -     rsp = 0x00007fff235bf460   r12 = 0x00007fff235bf5c0
[task 2017-03-16T07:32:31.168436Z] 07:32:31     INFO -     r13 = 0x00007fff235bf5c8   r14 = 0x00007fff235bf610
[task 2017-03-16T07:32:31.169568Z] 07:32:31     INFO -     r15 = 0x0000000000000001   rip = 0x00007fd0c4098ebd
It looks like sem_init is failing, possibly due to hitting resource limits.

Nical, any ideas what we should do when we can't have any new semaphores?

The two obvious (and simple) choices are:

* Treat it as a failure to allocate the texture and bail out from rendering the layer entirely.

* Ignore it, render the layer, but have it unsynchronized. It's possible that the user would never notice, but they might get weird corruption from races.


There's also other work I think we could do to reduce the chances of this happening.

* Share a single readlock across a component alpha white/black buffer pair.

* When re-allocating buffers (due to a size change), re-use the existing lock rather than allocating a new one.

These may or may not help (depending on what exactly is causing us to run out), and it's not clear if they're worth the engineering effort right now.
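As a rough illustration of the first option above (treat a failed semaphore as a failed texture allocation), here is a minimal sketch, assuming a POSIX/Linux target. `SharedSemaphore`, `TryCreate`, `Texture` and this `EnableBlockingReadLock` are hypothetical stand-ins written for this comment, not the real `mozilla::CrossProcessSemaphore` or `TextureClient` code.

```cpp
#include <semaphore.h>
#include <sys/mman.h>

#include <cassert>
#include <cstdint>
#include <memory>

// Hypothetical wrapper for illustration only; NOT the real
// mozilla::CrossProcessSemaphore API.
class SharedSemaphore {
 public:
  // Returns nullptr when the OS cannot provide a semaphore, so the caller
  // can decide how to degrade instead of crashing during construction.
  static std::unique_ptr<SharedSemaphore> TryCreate(uint32_t aInitialValue) {
    // Place the semaphore in shared anonymous memory so that a forked
    // child process could also see it.
    void* mem = mmap(nullptr, sizeof(sem_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
      return nullptr;
    }
    sem_t* sem = static_cast<sem_t*>(mem);
    if (sem_init(sem, /* pshared = */ 1, aInitialValue) != 0) {
      // sem_init failed (for example, a resource limit): clean up and report.
      munmap(mem, sizeof(sem_t));
      return nullptr;
    }
    return std::unique_ptr<SharedSemaphore>(new SharedSemaphore(sem));
  }

  SharedSemaphore(const SharedSemaphore&) = delete;
  SharedSemaphore& operator=(const SharedSemaphore&) = delete;

  ~SharedSemaphore() {
    sem_destroy(mSem);
    munmap(mSem, sizeof(sem_t));
  }

  // Non-blocking acquire; true if the count was decremented.
  bool TryWait() { return sem_trywait(mSem) == 0; }
  void Signal() { sem_post(mSem); }

 private:
  explicit SharedSemaphore(sem_t* aSem) : mSem(aSem) { assert(aSem); }
  sem_t* mSem;
};

// Hypothetical texture stand-in that only carries the lock.
struct Texture {
  std::unique_ptr<SharedSemaphore> mReadLock;
};

// Option 1: report failure so the caller can skip rendering the layer.
bool EnableBlockingReadLock(Texture& aTexture) {
  aTexture.mReadLock = SharedSemaphore::TryCreate(1);
  return aTexture.mReadLock != nullptr;
}
```

A caller would check the return value of `EnableBlockingReadLock` and bail out of the paint when it is false. The second option (render unsynchronized) would amount to ignoring that return value.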
Flags: needinfo?(matt.woodrow) → needinfo?(nical.bugzilla)
I would rather avoid potentially unsynchronized texture accesses, because they're a bit hard to debug and the temptation to blame any glitch on them may become strong. If the lock serialization fails, I would rather fall back to the copy-on-write behavior and force the copy next time we render into that texture, or have a pre-allocated lock per frame and fall back to blocking on it instead of blocking on just that texture.
Sharing the lock across buffer pairs sounds like a good idea, and recycling locks as well.
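The two lock-saving ideas (one lock per component alpha pair, and recycling the lock on reallocation) could look roughly like the sketch below. All types and functions here (`ReadLock`, `Texture`, `ComponentAlphaPair`, `AttachSharedLock`, `Reallocate`) are hypothetical stand-ins for illustration, not the actual Gecko classes, and the counter-based lock only models ownership, not cross-process blocking.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for a read lock; a plain counter instead of a
// real cross-process primitive.
struct ReadLock {
  int mReaders = 0;
  void Acquire() { ++mReaders; }
  void Release() {
    assert(mReaders > 0);
    --mReaders;
  }
};

// Hypothetical texture stand-in.
struct Texture {
  int mWidth = 0;
  int mHeight = 0;
  std::shared_ptr<ReadLock> mReadLock;
};

// The on-black and on-white buffers are always locked and unlocked
// together, so one lock can serve both.
struct ComponentAlphaPair {
  Texture mOnBlack;
  Texture mOnWhite;
};

// Attaches one lock to both textures; returns false if no lock is available
// so the caller can treat it as an allocation failure.
bool AttachSharedLock(ComponentAlphaPair& aPair,
                      std::shared_ptr<ReadLock> aLock) {
  if (!aLock) {
    return false;
  }
  aPair.mOnBlack.mReadLock = aLock;
  aPair.mOnWhite.mReadLock = std::move(aLock);
  return true;
}

// On a size change, carry the existing lock over to the new buffer
// instead of allocating a fresh one.
Texture Reallocate(const Texture& aOld, int aWidth, int aHeight) {
  Texture fresh;
  fresh.mWidth = aWidth;
  fresh.mHeight = aHeight;
  fresh.mReadLock = aOld.mReadLock;
  return fresh;
}
```

Sharing halves the number of locks needed for component alpha layers. Whether recycling is safe depends on when the compositor releases the old buffer, which this sketch does not model.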
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8855636 [details]
Bug 1341496 - Part 3: Make CrossProcessSemaphore allocation fallible.

https://reviewboard.mozilla.org/r/127506/#review130512
Attachment #8855636 - Flags: review?(wmccloskey) → review+
Comment on attachment 8855635 [details]
Bug 1341496 - Part 2: Don't use a separate ReadLock for the second component alpha texture as they should always be locked/unlocked at the same time.

https://reviewboard.mozilla.org/r/127504/#review133650
Attachment #8855635 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8855634 [details]
Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid.

https://reviewboard.mozilla.org/r/127502/#review133664
Attachment #8855634 - Flags: review?(nical.bugzilla) → review+
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5785434af74
Part 1: Don't try to serialize read locks that aren't valid. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/f4c724034728
Part 2: Don't use a separate ReadLock for the second component alpha texture as they should always be locked/unlocked at the same time. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/f79d7564d39d
Part 3: Make CrossProcessSemaphore allocation fallible. r=billm
Matt, do we need to uplift this to Beta as well?
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
Comment on attachment 8855634 [details]
Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1325227
[User impact if declined]: Crashes on pages that introduce lots of layers.
[Is this code covered by automated tests?]: Yes, intermittent failures introduced by the regressing bug have stopped happening.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It just adds graceful fallback for when allocation of a system object fails.
[String changes made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8855634 - Flags: approval-mozilla-beta?
Comment on attachment 8855634 [details]
Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid.

Fix an intermittent-failure. Beta54+. Should be in 54 beta 3.
Attachment #8855634 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Matt Woodrow (:mattwoodrow) from comment #19)
> [Is this code covered by automated tests?]: Yes, intermittent failures
> introduced by the regressing bug have stopped happening.
> [Has the fix been verified in Nightly?]: No.
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on Matt's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
FWIW, this might fix #1345899 by accident, where CrossProcessSemaphore isn't functional. I'll make sure to test 54.0b3.
With 54.0b3 on OpenBSD and the default of false for layers.enable-tiles, the window is displayed empty, and the terminal is filled with messages like:

Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 751 (t=3.74234) |[256][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3699) |[257][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3843) |[258][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4132) |[259][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4206) |[260][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4337) |[261][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4429) |[262][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4628) |[263][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4736) |[264][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4998) |[250][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3021) |[251][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3099) |[252][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3219) |[253][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3301) |[254][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3478) |[255][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3554) [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747  

So even if the browser doesn't crash per se (#1345899), it isn't usable without enabling tiles. I don't know how related it is to the CrossProcessSemaphore thing; I'm a bit lost in the interdependencies of e10s/gfx/tiles...
Duplicate of this bug: 1359228