Closed Bug 1416726 Opened 2 years ago Closed 2 years ago

mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor

Categories

(Core :: Graphics, defect, P2)

58 Branch
Unspecified
Linux
defect

Tracking


RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- unaffected
firefox58 --- fixed
firefox59 --- fixed
firefox60 + fixed
firefox61 --- fixed

People

(Reporter: cg+zbmvyynohtmvyyn, Assigned: rhunt)

References

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
Build ID: 20171112220346

Steps to reproduce:

Use the latest Nightly 58.0a1 on 64-bit Linux
Go to http://wsprnet.org
Select map
Select band "all" from dropdown
Push update button


Actual results:

Page crashes (for example https://crash-stats.mozilla.com/report/index/c8baf58f-3373-4f2b-ba8f-310150171113)



Expected results:

No crash.
Reproduced on two Linux machines with the latest Nightly.

https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Aipc%3A%3AFatalError%20%7C%20mozilla%3A%3Adom%3A%3AContentChild%3A%3AFatalErrorIfNotUsingGPUProcess%20%7C%20mozilla%3A%3Alayers%3A%3APCompositorBridgeChild%3A%3ASendPTextureConstructor

Not crashing on Windows.

Using mozregression gives this result:
2017-11-12T22:21:23: DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=713c3225f14883b71e0d6febda505e56889f8234&full=1
2017-11-12T22:21:24: DEBUG : Found commit message: Don't do PaintThebes after PaintOffMainThread. (bug 1399692 part 10, r=dvander) MozReview-Commit-ID: J0IOzqIGRtz
2017-11-12T22:21:24: INFO : The bisection is done.
Crash Signature: [@ mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor ]
Keywords: crash, regression
OS: Unspecified → Linux
Component: Untriaged → DOM
Product: Firefox → Core
Sometimes the same procedure results in different crash signature:
[@ mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPLayerTransactionConstructor ]

For example
bp-b316a2ce-89a3-4377-ba06-cd07d0171113

Still happens on 59.0a1 nightly
Crash Signature: [@ mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor ] → [@ mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor ] [@ mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozil…
Blocks: 1399692
Component: DOM → Graphics
Flags: needinfo?(rhunt)
Priority: -- → P2
Whiteboard: [gfx-noted]
Flags: needinfo?(hshih)
This should be fixed by reverting some content client changes in bug 1416921.
Assignee: nobody → rhunt
Depends on: 1416921
Flags: needinfo?(rhunt)
Ryan, how does it look with bug 1416921 landed?
Flags: needinfo?(rhunt)
It looks like this has been fixed now.
Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Flags: needinfo?(rhunt)
Flags: needinfo?(hshih)
Resolution: --- → FIXED
Do we need to land something on Beta for 58 still?
Flags: needinfo?(rhunt)
Target Milestone: --- → mozilla59
Yes, this bug and bug 1416921 are still reproducible on Beta for 58. I'll request uplift.
Flags: needinfo?(rhunt)
(In reply to Ryan Hunt [:rhunt] from comment #6)
> Yes, this bug and bug 1416921 are still reproducible on Beta for 58. I'll
> request uplift.

I was looking through the patch that landed on inbound to make sure I didn't add any fixups that aren't in the patch on Bugzilla, and it matches. But I also found a line of code that was dropped by that bug that shouldn't have been, and filed bug 1421871 about it. That should be landed and uplifted with bug 1416921.
Bugs 1416921 and 1421871 are in 58, so I'll go ahead and mark this fixed.
It seems the root cause of the crash was not fixed; the problem that was fixed had just made this one easier to trigger.

The crash can still be triggered by going to the wsprnet.org page, selecting the map, selecting "all" from the dropdown, clicking update, and after that zooming in and out repeatedly with Ctrl+mouse wheel.

New trace here: bp-13a6f565-3d13-4d09-8083-c6b670171226

I did a new bisection, this time the regression is all the way back to:

2017-12-26T19:41:39: DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=a19ac5da60828cdcf374e6798dc7e07b8b4c1b7d&full=1
2017-12-26T19:41:41: DEBUG : Found commit message:
Bug 1325227 - Part 10: Use blocking read locks instead of forcing a synchronous transaction when using ContentClientRemoteBuffer. r=nical

Should this bug be reopened, or entered into a new one?
Flags: needinfo?(rhunt)
Crash Signature: mozilla::layers::PCompositorBridgeChild::SendPLayerTransactionConstructor ] → mozilla::layers::PCompositorBridgeChild::SendPLayerTransactionConstructor ] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPTexture…
I'd say this bug can be reopened. This confirms my suspicions that this wasn't exactly related to OMTP, and is a preexisting issue made worse with timing changes. 

The page linked creates many layers (I forget how many, but it may have been in the thousands), and this can cause resource failures we don't handle well.

I think it would be worth figuring out why we create so many layers, and trying to fix that.
No longer blocks: 1399692
Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(rhunt)
Resolution: FIXED → ---
Crash Signature: mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor ] → mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor ] [@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPLayerTransaction…
Another STR that always crashes for me (on Nightly 20180129100406 and Release 58):

Go to:
http://www.erlangen.mein-abfallkalender.de/containerstandorte
Click with mouse on + icon (zoom in): crash

Clicking on - icon (zoom out) does not crash
Zooming with Ctrl+Mousewheel does not crash

bp-ee86ad44-3804-4446-b6f2-001fb0180129
I have this crash on GitHub, meetup.com, etc. (10 times in 2 days)

Happy to test anything.

Ryan, is there anything I could do to help?
Flags: needinfo?(rhunt)
This is approximately 20% of all OSX Nightly crashes for the Jan 31 Nightly, i.e. a very common crash. Though it isn't coming from all that many installations.
With my STR from above, I got this regression range:
INFO: Last good revision: bcda57aa0c3b291dcd5de5d7fd048af69dbdff4d
INFO: First bad revision: a19ac5da60828cdcf374e6798dc7e07b8b4c1b7d
INFO: Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=bcda57aa0c3b291dcd5de5d7fd048af69dbdff4d&tochange=a19ac5da60828cdcf374e6798dc7e07b8b4c1b7d
Could you provide a link for the OSX crashes? I'm having difficulty finding them on crash-stats; I only see Linux crashes.
Flags: needinfo?(continuation)
Would you happen to have a link to one of the crashes you had?

What platform was it on? OSX and Linux/Windows use different code paths, and it'd help to know whether this is related to parallel painting with tiling, which was just enabled on OSX.
Flags: needinfo?(rhunt)
(In reply to Ryan Hunt [:rhunt] from comment #15)
> Could you provide a link for the OSX crashes? I'm having difficulty
> finding them on crash-stats; I only see Linux crashes.

Hmm, yeah, that's strange. I don't see any OSX crashes, either. Maybe they got removed, or I somehow was mistaken.
Flags: needinfo?(continuation)
Some examples:
bp-f68aa17a-f9f2-4274-b420-9eed80180206
bp-0f209b46-946e-4dc2-9fbd-7f51d0180201
bp-c1edbc14-66c8-4f7d-aa8e-52e4b0180201

Linux only; this is causing my Nightly Firefox to be unusable.
Flags: needinfo?(rhunt)
Hmm, those crash signatures are:

[@ mozalloc_abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPLayerTransactionConstructor ]

Which has a different cause from:

mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor 

Both come from IPC being down, but the SendPTextureConstructor one is from running out of shmems, and I'm not sure what's going on with the SendPLayerTransactionConstructor one. It's happening during the startup of a tab, and it looks like IPC is already down. I can't see any more information about why IPC is failing.

Has this just started happening, and if so, would it be possible to use mozregression to find when the instability started? Are you able to open any tabs successfully? If possible, log output would also help. about:support might have some clues as well.
Flags: needinfo?(rhunt)
I've tried to narrow down the SendPLayerTransactionConstructor crashes. They are more easily reproduced on recent builds, but so far I've been able to reproduce them as far back as the 2017-05-04 build.
Builds much earlier than that are too crashy to easily differentiate between the crashes (SendPLayerTransactionConstructor or SendPTextureConstructor), because the whole browser tends to crash upon entering the page.

I suspect the root cause is the same for both, bisecting including the browser crashes gets this result:

2018-02-07T11:57:31: DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=a19ac5da60828cdcf374e6798dc7e07b8b4c1b7d&full=1
2018-02-07T11:57:32: DEBUG : Found commit message:
Bug 1325227 - Part 10: Use blocking read locks instead of forcing a synchronous transaction when using ContentClientRemoteBuffer. r=nical

I've noticed that the SendPLayerTransactionConstructor crash is a bit different. It can be reproduced the same way, using scroll-wheel zoom on the wsprnet page. Sometimes, instead of getting the "tab crashed" page, I only get a white page that is not responsive. Switching to another tab and back, I get the spinner in the middle of the white page.
So I suspect it's the "tab crashed" page crashing instead in this case.
Maybe the previous crash on the wsprnet page took down IPC, and for some reason it's not restored when the "tab crashed" page takes over.
I find that I usually have updates pending when this crash happens to me; is it possible that we fail to launch the GPU process (because it's incompatible with the currently running Firefox or something) and then this is the result?
(In reply to Nathan Froyd [:froydnj] from comment #21)
> I find that I usually have updates pending when this crash happens to me; is
> it possible that we fail to launch the GPU process (because it's
> incompatible with the currently running Firefox or something) and then this
> is the result?

According to bug 1366808, it could theoretically happen, as the GPU process has the same build ID check as the content processes.
I just received https://crash-stats.mozilla.com/report/index/f42156f2-8786-4b3b-a08a-d53540180207 while an update was not pending (in fact, had recently restarted from an update), so maybe that's not a super-great theory.

I have a smart bookmark pointing at http://www.imcpl.org/cgi-bin/searchdirect_new.pl?restrict=catalog&q=%s with a keyword set, and entering "$KEYWORD search terms" in the URL bar (+ hitting enter, of course) is particularly likely to trigger this crash.
I keep hitting this, so some new data points:

https://crash-stats.mozilla.com/report/index/06ea525a-8d31-41fb-847c-f59010180210
https://crash-stats.mozilla.com/report/index/1e8d511e-a269-46e7-b4b0-817b60180210
https://crash-stats.mozilla.com/report/index/8bea56d4-4d47-4b34-9cc2-63ae70180210

Crashes within a minute of starting the browser, and:

https://crash-stats.mozilla.com/report/index/6b8566fd-e445-4190-93fc-4c6d50180210
https://crash-stats.mozilla.com/report/index/8c6df4f1-6a93-4cc5-9680-7e0ec0180210
https://crash-stats.mozilla.com/report/index/04913626-0637-4558-9698-c431a0180210

are all crashes within five seconds of starting the browser.  Are we sure there's not some sort of race in starting the GPU process, where one part of the code thinks we are using the GPU process, but another part of the code disagrees?
There have been race conditions in this part of the code for sure. But on Linux the GPU process should be disabled (and I see that in the crash reports), which means that we shouldn't be ignoring this IPC error in this case.

For some reason the MessageChannel to the compositor is down right away after startup in these crashes. Either we never connected, or the connection was broken for some reason. I don't think it's likely we never connected, because in that case we'd crash much sooner than paint, most likely in TabChild::InitRendering.

For the crashes with an STR earlier in this bug, the connection went down because of a huge number of PaintedLayers being created, but that shouldn't be the case with these latest crashes, as they happen right at startup.

Andrew, do you have any idea of what could be going on here or a diagnostic we could add to get more information?
Flags: needinfo?(aosmond)
From my own testing using the STR from comment 11, the parent and the content process split the file handles between them. CrossProcessSemaphore appears not to release its file handles after being shared. In short, we create so many of them that we run out of file handles. (CrossProcessMutex has a similar flaw.) This matches the push log.
Flags: needinfo?(aosmond)
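Andrew's finding can be illustrated outside of Gecko. This is a minimal sketch, not Gecko code: it simulates "one open file descriptor per layer read lock" with pipes (each pipe costs two fds), lowers the soft RLIMIT_NOFILE so exhaustion happens quickly, and counts how many "locks" succeed before the OS refuses with EMFILE. It assumes a POSIX system, matching the Linux-only reports in this bug.

```cpp
// Sketch (assumption: POSIX/Linux): per-layer fd allocation vs. RLIMIT_NOFILE.
#include <sys/resource.h>
#include <unistd.h>
#include <vector>

// Temporarily lower the soft fd limit, then allocate one "read lock"
// (simulated by a pipe, i.e. two fds) per layer until allocation fails.
// Returns how many locks were created before exhaustion.
int count_locks_before_exhaustion(rlim_t soft_limit) {
  struct rlimit old_lim;
  getrlimit(RLIMIT_NOFILE, &old_lim);

  struct rlimit lim = old_lim;
  lim.rlim_cur = soft_limit;
  setrlimit(RLIMIT_NOFILE, &lim);

  std::vector<int> fds;
  int created = 0;
  for (;;) {
    int pipefd[2];
    if (pipe(pipefd) != 0) {
      break;  // EMFILE: the per-process fd limit is exhausted
    }
    fds.push_back(pipefd[0]);
    fds.push_back(pipefd[1]);
    ++created;
  }

  for (int fd : fds) close(fd);
  setrlimit(RLIMIT_NOFILE, &old_lim);  // restore the original limit
  return created;
}
```

With a soft limit of 64, only a few dozen such locks fit before allocation fails, which is why a page creating thousands of layers hits the wall so reliably.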
I'm just gonna summarize the findings we have here so I don't forget over the long weekend.

The problem is that we are running out of file handles from creating so many PaintedLayers, which use blocking read locks, which in turn use CrossProcessSemaphore. The reason we create so many PaintedLayers on these pages with maps is that the maps create absolutely positioned pins that set the top and left properties often enough, and quickly enough, that it triggers layerization for each one. They are pretty small, so we are able to layerize a lot before we run out of AGR budget. [1]

Possible solutions:

1. We could introduce a count limit to the AGR budget, similar to the layer count limit [2]. But if a page is truly animating a lot of layers, there's a chance that could cause a painting regression.

2. We could change how we allocate CrossProcessSemaphores so that they share shmems from a pool, thereby lowering the number of file handles we need to open.

3. We could rework the layers IPC to send the shmem file handle to the compositor only once, then close it, and send some other ID in each layer transaction. This would probably get around the file handle limits completely.

[1] https://searchfox.org/mozilla-central/rev/cac28623a15ace458a8f4526e107a71db1519daf/layout/painting/nsDisplayList.cpp#1786
[2] https://searchfox.org/mozilla-central/rev/cac28623a15ace458a8f4526e107a71db1519daf/layout/painting/FrameLayerBuilder.cpp#4039
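Solution 3 could be sketched roughly like this; the names below are hypothetical, not the actual Gecko API. The receiving side registers the read-lock handle once under a small ID when the texture is created (after which the sender can close its copy of the fd), and every later layer transaction refers to the ID, so no further file descriptors cross the process boundary.

```cpp
// Sketch of solution 3 (hypothetical names, not the actual Gecko API).
#include <cstdint>
#include <unordered_map>

using ReadLockId = uint64_t;

class ReadLockRegistry {
 public:
  // Called once, at texture creation: take ownership of the handle and
  // hand back a cheap ID. The sender can now close its copy of the fd.
  ReadLockId Register(int handle) {
    ReadLockId id = mNextId++;
    mHandles[id] = handle;
    return id;
  }

  // Called on every layer transaction: resolve the ID; no fd traffic needed.
  bool Lookup(ReadLockId id, int* outHandle) const {
    auto it = mHandles.find(id);
    if (it == mHandles.end()) {
      return false;
    }
    *outHandle = it->second;
    return true;
  }

 private:
  ReadLockId mNextId = 1;
  std::unordered_map<ReadLockId, int> mHandles;
};
```

The open-fd count then scales with the number of live textures on the receiving side rather than with the number of layer transactions, which is the property the eventual fix ("drop file handles immediately after" sending) relies on.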
Some layerization tweaks, i.e. solution 1, certainly might help. But in my experience, fixing these bugs is like playing whack-a-mole, and it'll be pretty much impossible to ensure we never have too many layers on every site.

So I think for this bug we have to do option 2 or 3.

At the same time I will look in to sites creating too many layers and see what can be done.
This will probably regress in higher numbers once 60.0 is released to the beta audience.
Did anything change in 60 that would make this more common?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=992fa5bf9ea6a99b0a33df6f71470b44ed486423

There are two IPC-related failures in this try run that are a little suspicious, but they are in unaffected call stacks, so I think it's okay.
Comment on attachment 8958113 [details]
Allocate TextureReadLock at TextureClient creation and drop file handles immediately after. (bug 1416726, )

https://reviewboard.mozilla.org/r/227070/#review232830

::: commit-message-fdd1a:14
(Diff revision 1)
> +1. Create the TextureReadLock in the TextureClient constructor so it's available before IPC creation
> +    a. This is superficial as EnableReadLock was always called before IPC creation
> +1. Send the ReadLockDescriptor in the PTextureConstructor message and close the file handle
> +1. Receive the ReadLockDescriptor in TextureHost and close the file handle
> +1. Send a boolean flag in layer transactions if the texture is read locked instead of a descriptor
> +1. Use a boolean flag in TextureHost to determine if the ReadLock must be unlocked instead of a nullptr

nit: 1, 2, 3 ... instead of 1, 1, 1 :).

::: gfx/layers/client/ContentClient.cpp:765
(Diff revision 1)
>  
>    RefPtr<TextureClient> textureClientOnWhite;
>    if (aFlags & TextureFlags::COMPONENT_ALPHA) {
>      textureClientOnWhite = textureClient->CreateSimilar(
>        mForwarder->GetCompositorBackendType(),
> -      aFlags | ExtraTextureFlags(),
> +      aFlags | ExtraTextureFlags() | TextureFlags::BLOCKING_READ_LOCK,

textureClientOnWhite did not originally have a blocking read lock. Was it missing?

::: gfx/layers/client/TextureClient.cpp:1364
(Diff revision 1)
>  {
>    mData->FillInfo(mInfo);
>    mFlags |= mData->GetTextureFlags();
> +
> +  if (mFlags & TextureFlags::NON_BLOCKING_READ_LOCK) {
> +    EnableReadLock();

Might want to assert here that BLOCKING_READ_LOCK isn't set.
Attachment #8958113 - Flags: review?(aosmond) → review+
Comment on attachment 8958113 [details]
Allocate TextureReadLock at TextureClient creation and drop file handles immediately after. (bug 1416726, )

https://reviewboard.mozilla.org/r/227070/#review232830

> textureClientOnWhite did not originally have a blocking read lock. Was it missing?

No, it wasn't; it was added erroneously.

> Might want to assert here that BLOCKING_READ_LOCK isn't set.

Yes, good idea.
Pushed by rhunt@eqrion.net:
https://hg.mozilla.org/integration/mozilla-inbound/rev/45b5743ae3ad
Allocate TextureReadLock at TextureClient creation and drop file handles immediately after. (bug 1416726, r=aosmond)
Blocks: 1445008
https://hg.mozilla.org/mozilla-central/rev/45b5743ae3ad
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: mozilla59 → mozilla61
I can't reproduce the crash anymore on latest nightly. Thanks for the fix!
Thanks for the fix, Ryan! If there are specific sites in this bug causing bad layerization, even if they no longer cause this crash, let's file specific bugs for them and I'll look into them.
(In reply to Jamie Nicol [:jnicol] from comment #38)
> Thanks for the fix Ryan!

Yes, thank you!

> If there are specific sites in this bug causing bad
> layerization, even if they no longer cause this crash, let's file
> specific bugs for them and I'll look into them

Are there easy ways for users to tell whether sites are causing poor layerization, besides the browser crashing?
Flags: needinfo?(jnicol)
Short answer: poor performance and/or crashing often indicate poor layerization, but the user might not easily be able to attribute it to that.

Longer answer:

Poor layerization should never cause crashes in theory. It does, of course, due to bugs such as this one and, as always, OOMs.

The easiest way to see the layerization is to set layers.draw-borders=true. Sometimes this will make it immediately obvious that we've done a bad job, though it isn't always easy to see, nor 100% objective. Using the Gecko Profiler can also help - lots of time spent allocating layers is usually a big indication, for example. Also, enabling layers.dump or layout.display-list.dump will log the layer tree, which can be inspected.

So if we enable layer borders and take your smart bookmark example above, you'll see the loading spinners each create a fair number of tiny layers while they spin. I'm not sure why they do so, nor whether this is quite bad enough to be the only problem on this page. But it's certainly worth investigating.

Likewise, on http://wsprnet.org/drupal/wsprnet/map, each of the markers on the map gets its own layer, and there are a lot of them. At a glance I don't think this is necessarily bad, however - how else would they move correctly with the map while zooming? So I'm glad this bug got fixed in a way other than simply limiting the number of layers we create.
Flags: needinfo?(jnicol)
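For reference, the debugging prefs Jamie mentions can be flipped in about:config, or collected in a user.js fragment like the one below. The pref names are taken directly from the comment above; the assumption is that all three default to false in a normal profile.

```
// user.js fragment: layerization debugging prefs mentioned above.
user_pref("layers.draw-borders", true);       // outline each layer on screen
user_pref("layers.dump", true);               // log the layer tree
user_pref("layout.display-list.dump", true);  // log the display list as well
```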
Depends on: 1445067
No longer depends on: 1445067
Is this something that should be considered for uplift?
Flags: needinfo?(rhunt)
Yes, I think it should be. I'll request uplift.
Flags: needinfo?(rhunt)
Comment on attachment 8958113 [details]
Allocate TextureReadLock at TextureClient creation and drop file handles immediately after. (bug 1416726, )

Approval Request Comment
[Feature/Bug causing the regression]: Graphics texture IPC used a cross-process synchronization mechanism (read locks) that required a file descriptor to be open for each layer on the screen. In some edge cases we end up generating thousands of layers, and that can cause us to crash from file descriptor exhaustion.

[User impact if declined]: Some web pages may crash, such as the Google Maps embedding pages listed in this bug.

[Is this code covered by automated tests?]: No

[Has the fix been verified in Nightly?]: Yes

[Needs manual test from QE? If yes, steps to reproduce]: No

[List of other uplifts needed for the feature/fix]: Just this bug, the uploaded mozreview request has a few nits that were fixed locally before being pushed to central.

[Is the change risky?]: Moderately

[Why is the change risky/not risky?]: The way read locks are sent to the compositor was changed, and while this new method is more straightforward it was not a single line fix.

[String changes made/needed]: None
Attachment #8958113 - Flags: approval-mozilla-beta?
Comment on attachment 8958113 [details]
Allocate TextureReadLock at TextureClient creation and drop file handles immediately after. (bug 1416726, )

This is slightly scary, but it seems to be doing OK in Nightly and was verified to fix the issue with running out of file descriptors by at least one affected user, so let's get this on beta 60.
Attachment #8958113 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Ryan Hunt [:rhunt] from comment #43)

> [Needs manual test from QE? If yes, steps to reproduce]: No

Marking this as qe- per Ryan's assessment on manual testing needs.
Flags: qe-verify-
There are no more crashes with signature "mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPLayerTransactionConstructor" in Nightly 61 and Beta 60, but there still are crashes with signature "mozilla::ipc::FatalError | mozilla::dom::ContentChild::FatalErrorIfNotUsingGPUProcess | mozilla::layers::PCompositorBridgeChild::SendPTextureConstructor" (though the crash volume dropped after the patch landed).
:rhunt, should I file a new bug?
Flags: needinfo?(rhunt)
Yes, that would be good. That signature can have multiple causes, so it'd be good to handle what remains in a separate bug.
Flags: needinfo?(rhunt)
See Also: → 1484288