Intermittent damp | application crashed [@ mozilla::CrossProcessSemaphore::CrossProcessSemaphore]

RESOLVED FIXED in Firefox 54

Status

()

Core
Graphics: Layers
RESOLVED FIXED
10 months ago
8 months ago

People

(Reporter: Treeherder Bug Filer, Assigned: mattwoodrow)

Tracking

({crash, intermittent-failure})

unspecified
mozilla55
crash, intermittent-failure
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 unaffected, firefox54 fixed, firefox55 fixed)

Details


Attachments

(3 attachments)

(Reporter)

Description

10 months ago
treeherder
Filed by: wkocher [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=79146911&repo=autoland

https://archive.mozilla.org/pub/firefox/tinderbox-builds/autoland-linux64/autoland_ubuntu64_hw_test-g2-e10s-bm105-tests1-linux-build1727.txt.gz

Comment 1

9 months ago
12 failures in 790 pushes (0.015 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 5
* try: 2
* graphics: 2
* oak: 1
* mozilla-central: 1
* autoland: 1

Platform breakdown:
* linux64: 8
* linux32: 4

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1341496&startday=2017-03-06&endday=2017-03-12&tree=all
matt: since you introduced CrossProcessSemaphore in Bug 1325227, could you take a look? Thanks!
Component: Talos → Graphics: Layers
Flags: needinfo?(matt.woodrow)
Keywords: crash
Product: Testing → Core
Version: Version 3 → unspecified
Thread 0 (crashed)
[task 2017-03-16T07:32:31.143611Z] 07:32:31     INFO -  0  libxul.so!mozilla::CrossProcessSemaphore::CrossProcessSemaphore [CrossProcessSemaphore_posix.cpp:ab96d8a9e247 : 54 + 0x0]
[task 2017-03-16T07:32:31.144335Z] 07:32:31     INFO -     rax = 0x000000000061fd00   rdx = 0x0000000000000002
[task 2017-03-16T07:32:31.145136Z] 07:32:31     INFO -     rcx = 0x00007fd0c5ffeee7   rbx = 0x00007fd0a5984860
[task 2017-03-16T07:32:31.145896Z] 07:32:31     INFO -     rsi = 0x0000000000000000   rdi = 0x00007fd0d291c140
[task 2017-03-16T07:32:31.146727Z] 07:32:31     INFO -     rbp = 0x00007fff235bf3f0   rsp = 0x00007fff235bf3d0
[task 2017-03-16T07:32:31.147530Z] 07:32:31     INFO -      r8 = 0x0000000000000000    r9 = 0x00007fd0a5800000
[task 2017-03-16T07:32:31.148242Z] 07:32:31     INFO -     r10 = 0x0000000000000000   r11 = 0x0000000000000206
[task 2017-03-16T07:32:31.149132Z] 07:32:31     INFO -     r12 = 0x00007fd0a7121240   r13 = 0x0000000000000001
[task 2017-03-16T07:32:31.149964Z] 07:32:31     INFO -     r14 = 0x00007fff235bf610   r15 = 0x0000000000000001
[task 2017-03-16T07:32:31.150703Z] 07:32:31     INFO -     rip = 0x00007fd0c3c66ea0
[task 2017-03-16T07:32:31.151526Z] 07:32:31     INFO -     Found by: given as instruction pointer in context
[task 2017-03-16T07:32:31.152321Z] 07:32:31     INFO -  1  libxul.so!mozilla::layers::TextureClient::EnableBlockingReadLock [TextureClient.cpp:ab96d8a9e247 : 1441 + 0x21]
[task 2017-03-16T07:32:31.153349Z] 07:32:31     INFO -     rbx = 0x00007fd0a78b8ee0   rbp = 0x00007fff235bf410
[task 2017-03-16T07:32:31.154415Z] 07:32:31     INFO -     rsp = 0x00007fff235bf400   r12 = 0x00007fd0a5984850
[task 2017-03-16T07:32:31.155510Z] 07:32:31     INFO -     r13 = 0x00007fff235bf428   r14 = 0x00007fff235bf610
[task 2017-03-16T07:32:31.156539Z] 07:32:31     INFO -     r15 = 0x0000000000000001   rip = 0x00007fd0c4090d4f
[task 2017-03-16T07:32:31.157555Z] 07:32:31     INFO -     Found by: call frame info
[task 2017-03-16T07:32:31.158617Z] 07:32:31     INFO -  2  libxul.so!mozilla::layers::ContentClientRemoteBuffer::CreateBackBuffer [ContentClient.cpp:ab96d8a9e247 : 323 + 0xc]
[task 2017-03-16T07:32:31.159691Z] 07:32:31     INFO -     rbx = 0x00007fd0a46aeef0   rbp = 0x00007fff235bf450
[task 2017-03-16T07:32:31.160761Z] 07:32:31     INFO -     rsp = 0x00007fff235bf420   r12 = 0x0000000000000002
[task 2017-03-16T07:32:31.161820Z] 07:32:31     INFO -     r13 = 0x00007fff235bf428   r14 = 0x00007fff235bf610
[task 2017-03-16T07:32:31.162836Z] 07:32:31     INFO -     r15 = 0x0000000000000001   rip = 0x00007fd0c4098cec
[task 2017-03-16T07:32:31.163889Z] 07:32:31     INFO -     Found by: call frame info
[task 2017-03-16T07:32:31.165018Z] 07:32:31     INFO -  3  libxul.so!mozilla::layers::ContentClientDoubleBuffered::EnsureBackBufferIfFrontBuffer [ContentClient.cpp:ab96d8a9e247 : 631 + 0x5]
[task 2017-03-16T07:32:31.166178Z] 07:32:31     INFO -     rbx = 0x00007fd0a46aeef0   rbp = 0x00007fff235bf470
[task 2017-03-16T07:32:31.167279Z] 07:32:31     INFO -     rsp = 0x00007fff235bf460   r12 = 0x00007fff235bf5c0
[task 2017-03-16T07:32:31.168436Z] 07:32:31     INFO -     r13 = 0x00007fff235bf5c8   r14 = 0x00007fff235bf610
[task 2017-03-16T07:32:31.169568Z] 07:32:31     INFO -     r15 = 0x0000000000000001   rip = 0x00007fd0c4098ebd

Comment 4

9 months ago
8 failures in 777 pushes (0.01 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* mozilla-inbound: 3
* autoland: 3
* mozilla-central: 2

Platform breakdown:
* linux64: 5
* linux32: 2
* linux64-stylo: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1341496&startday=2017-03-13&endday=2017-03-19&tree=all
(Assignee)

Comment 5

9 months ago
It looks like sem_init is failing, possibly due to hitting resource limits.

Nical, any ideas what we should do when we can't have any new semaphores?

The two obvious (and simple) choices are:

* Treat it as a failure to allocate the texture and bail out from rendering the layer entirely.

* Ignore it, render the layer, but have it unsynchronized. It's possible that the user would never notice, but they might get weird corruption from races.


There's also other work I think we could do to reduce the chances of this happening.

* Share a single readlock across a component alpha white/black buffer pair.

* When re-allocating buffers (due to a size change), re-use the existing lock rather than allocating a new one.

These may or may not help (depending on what exactly is causing us to run out), and it's not clear if they're worth the engineering effort right now.
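For illustration, the first option (treating a failed sem_init as an allocation failure rather than crashing) could look roughly like the sketch below. `FallibleSemaphore` and its methods are hypothetical names, not the actual mozilla::CrossProcessSemaphore interface. The sketch also assumes a process-local semaphore (pshared = 0) to stay self-contained; a cross-process semaphore would need to live in shared memory.

```cpp
#include <semaphore.h>

#include <cerrno>
#include <cstdio>
#include <cstring>
#include <memory>

// Hypothetical wrapper for illustration only; not the real
// mozilla::CrossProcessSemaphore API.
class FallibleSemaphore {
 public:
  // Returns nullptr (instead of asserting or crashing) when sem_init fails,
  // for example because of an invalid value or a resource limit.
  static std::unique_ptr<FallibleSemaphore> Create(unsigned int aInitialValue) {
    std::unique_ptr<FallibleSemaphore> sem(new FallibleSemaphore());
    // pshared = 0: process-local, an assumption made to keep the sketch small.
    if (sem_init(&sem->mSem, 0, aInitialValue) != 0) {
      std::fprintf(stderr, "sem_init failed: %s\n", std::strerror(errno));
      return nullptr;  // mValid is still false, so the destructor skips sem_destroy.
    }
    sem->mValid = true;
    return sem;
  }

  FallibleSemaphore(const FallibleSemaphore&) = delete;
  FallibleSemaphore& operator=(const FallibleSemaphore&) = delete;

  ~FallibleSemaphore() {
    if (mValid) {
      sem_destroy(&mSem);
    }
  }

  // Non-blocking acquire; returns true if the count was decremented.
  bool TryWait() { return sem_trywait(&mSem) == 0; }

  // Release; increments the count.
  void Signal() { sem_post(&mSem); }

 private:
  FallibleSemaphore() = default;

  sem_t mSem;
  bool mValid = false;
};
```

A caller would check the returned pointer and bail out of enabling the blocking read lock when it is null, which corresponds to "treat it as a failure to allocate the texture" above.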
Flags: needinfo?(matt.woodrow) → needinfo?(nical.bugzilla)
I would rather avoid potentially unsynchronized texture accesses, because it's a bit hard to debug and the temptation to blame any glitch on that may become strong. If the lock serialization fails I would rather fall back to the copy-on-write behavior and force the copy next time we render into that texture, or have a pre-allocated lock per frame and fall back to blocking on it instead of blocking on just that texture.
Sharing the lock across buffer pairs sounds like a good idea, and recycling locks as well.
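A rough sketch of two of the ideas discussed here: sharing one read lock across a component alpha buffer pair, and falling back to a forced copy when no lock is available. All type and function names (`SketchReadLock`, `SketchTexture`, `AttachLockOrFallBack`, `AttachSharedLock`) are hypothetical stand-ins, not the real Gecko TextureClient or TextureReadLock classes, and the "forced copy" is only modelled as a flag.

```cpp
#include <memory>
#include <utility>

// Hypothetical stand-in for a texture read lock.
struct SketchReadLock {
  int readers = 0;
};

// Hypothetical stand-in for a texture that may or may not have a lock.
struct SketchTexture {
  std::shared_ptr<SketchReadLock> lock;  // null if lock allocation failed
  bool forceCopyOnNextWrite = false;     // copy-on-write fallback flag
};

// Attach a lock if one is available. Otherwise mark the texture so that the
// next render into it copies first instead of writing unsynchronized.
inline void AttachLockOrFallBack(SketchTexture& aTexture,
                                 std::shared_ptr<SketchReadLock> aLock) {
  if (aLock) {
    aTexture.lock = std::move(aLock);
    aTexture.forceCopyOnNextWrite = false;
  } else {
    aTexture.lock = nullptr;
    aTexture.forceCopyOnNextWrite = true;
  }
}

// The two textures of a component alpha pair are locked and unlocked
// together, so a single lock can serve both and halve the number of
// system objects needed.
inline void AttachSharedLock(SketchTexture& aOnBlack, SketchTexture& aOnWhite,
                             std::shared_ptr<SketchReadLock> aLock) {
  AttachLockOrFallBack(aOnBlack, aLock);
  AttachLockOrFallBack(aOnWhite, aLock);
}
```

Passing a null lock to `AttachSharedLock` puts both textures on the copy path, so neither is ever written while unsynchronized.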
Flags: needinfo?(nical.bugzilla)

Comment 7

9 months ago
9 failures in 898 pushes (0.01 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 5
* mozilla-inbound: 3
* mozilla-central: 1

Platform breakdown:
* linux64: 8
* linux64-stylo: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1341496&startday=2017-03-20&endday=2017-03-26&tree=all

Comment 8

9 months ago
7 failures in 845 pushes (0.008 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 3
* graphics: 2
* try: 1
* mozilla-inbound: 1

Platform breakdown:
* linux32: 4
* linux64: 2
* linux64-stylo: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1341496&startday=2017-03-27&endday=2017-04-02&tree=all
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 12

9 months ago
mozreview-review
Comment on attachment 8855636 [details]
Bug 1341496 - Part 3: Make CrossProcessSemaphore allocation fallible.

https://reviewboard.mozilla.org/r/127506/#review130512
Attachment #8855636 - Flags: review?(wmccloskey) → review+

Comment 13

8 months ago
mozreview-review
Comment on attachment 8855635 [details]
Bug 1341496 - Part 2: Don't use a separate ReadLock for the second component alpha texture as they should always be locked/unlocked at the same time.

https://reviewboard.mozilla.org/r/127504/#review133650
Attachment #8855635 - Flags: review?(nical.bugzilla) → review+

Comment 14

8 months ago
mozreview-review
Comment on attachment 8855634 [details]
Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid.

https://reviewboard.mozilla.org/r/127502/#review133664
Attachment #8855634 - Flags: review?(nical.bugzilla) → review+

Comment 15

8 months ago
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5785434af74
Part 1: Don't try to serialize read locks that aren't valid. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/f4c724034728
Part 2: Don't use a separate ReadLock for the second component alpha texture as they should always be locked/unlocked at the same time. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/f79d7564d39d
Part 3: Make CrossProcessSemaphore allocation fallible. r=billm
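As a loose illustration of what Part 1 describes (skipping serialization for read locks that are not valid), a guard of the following shape would do. `SerializableLock`, `LockDescriptor` and `SerializeLock` are hypothetical names invented for this sketch and do not reflect the actual patch or the Gecko IPC types.

```cpp
#include <cstdint>

// Hypothetical lock representation: a validity flag plus an opaque handle.
struct SerializableLock {
  bool valid = false;
  std::uint64_t handle = 0;
};

// Hypothetical wire format sent to the other process.
struct LockDescriptor {
  std::uint64_t handle = 0;
};

// Returns false and leaves aOut untouched when there is no valid lock,
// so the caller can take a fallback path instead of sending garbage.
inline bool SerializeLock(const SerializableLock* aLock, LockDescriptor& aOut) {
  if (!aLock || !aLock->valid) {
    return false;
  }
  aOut.handle = aLock->handle;
  return true;
}
```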

Comment 16

8 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c5785434af74
https://hg.mozilla.org/mozilla-central/rev/f4c724034728
https://hg.mozilla.org/mozilla-central/rev/f79d7564d39d
Status: NEW → RESOLVED
Last Resolved: 8 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55

Comment 17

8 months ago
7 failures in 817 pushes (0.009 failures/push) were associated with this bug in the last 7 days.   

Repository breakdown:
* autoland: 6
* graphics: 1

Platform breakdown:
* linux64-stylo: 5
* linux64: 1
* linux32: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1341496&startday=2017-04-17&endday=2017-04-23&tree=all
Matt, do we need to uplift this to Beta as well?
Assignee: nobody → matt.woodrow
status-firefox53: --- → unaffected
status-firefox54: --- → affected
status-firefox-esr52: --- → unaffected
Flags: needinfo?(matt.woodrow)
(Assignee)

Comment 19

8 months ago
Comment on attachment 8855634 [details]
Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1325227
[User impact if declined]: Crashes on pages that introduce lots of layers.
[Is this code covered by automated tests?]: Yes, intermittent failures introduced by the regressing bug have stopped happening.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It just adds graceful fallback for when allocation of a system object fails.
[String changes made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8855634 - Flags: approval-mozilla-beta?
Comment on attachment 8855634 [details]
Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid.

Fix an intermittent-failure. Beta54+. Should be in 54 beta 3.
Attachment #8855634 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 21

8 months ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/8f4eea328bd5
https://hg.mozilla.org/releases/mozilla-beta/rev/b38f7851fe6f
https://hg.mozilla.org/releases/mozilla-beta/rev/3eb107770633
status-firefox54: affected → fixed
(In reply to Matt Woodrow (:mattwoodrow) from comment #19)
> [Is this code covered by automated tests?]: Yes, intermittent failures
> introduced by the regressing bug have stopped happening.
> [Has the fix been verified in Nightly?]: No.
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on Matt's assessment on manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
Fwiw this might fix #1345899 by accident, where CrossProcessSemaphore isn't functional - I'll make sure to test 54.0b3.
With 54.0b3 on OpenBSD and the default of false for layers.enable-tiles, the window is displayed empty, and the terminal is filled with messages like:

Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 751 (t=3.74234) |[256][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3699) |[257][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3843) |[258][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4132) |[259][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4206) |[260][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4337) |[261][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4429) |[262][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4628) |[263][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4736) |[264][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4998) |[250][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3021) |[251][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3099) |[252][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3219) |[253][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3301) |[254][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3478) |[255][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3554) [GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747  

So even if the browser doesn't crash per se (#1345899), it isn't usable without enabling tiles. I don't know how related it is to the CrossProcessSemaphore thing; I'm a bit lost in the interdependencies of e10s/gfx/tiles...

Updated

8 months ago
Duplicate of this bug: 1359228