Closed Bug 1286437 Opened 3 years ago Closed 3 years ago

Crash in mozilla::ipc::FatalError | mozilla::layers::PLayerTransactionParent::Read

Categories

(Core :: Graphics: Layers, defect, critical)

50 Branch
Unspecified
Windows 10
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
firefox47 --- unaffected
firefox48 --- unaffected
firefox49 --- unaffected
firefox-esr45 --- unaffected
firefox50 --- fixed

People

(Reporter: kanru, Assigned: nical)

Details

(Keywords: crash, regression, Whiteboard: [gfx-noted])

Crash Data

Attachments

(1 file)

Bug 1208226#c63 mentions getting this crash when he tore away a tab, and there is another comment in the bug about tearing tabs causing this crash - https://crash-stats.mozilla.com/report/index/dd2c131a-332c-4f5a-909f-af9742160706
Version: unspecified → 50 Branch
I have been looking into this for quite a while now and it's hard to tell what the root cause is, beyound that a texture is added to the update list of a LayerTransactionChild (se UseTextures in ShadowLayers.cpp) and destroyed (seemingly outside of the transaction) causing the parent side to receive the destruction of the texture before the OpUseTexture/Update message.
A band-aid that is probably not unreasonable would be to keep a list of strong references to all textureclients queued for update in ShadowLayers.cpp so that the TextureClient is not destroyed before the transaction is sent.
Ideally I'd find out if the texture is indeed destroyed out of the transaction and how it is even possible since queuing textures for updates happens in the transaction. Or maybe figure out why the destruction of the texture is not added to the current transaction if the texture is in fact destroyed during a transaction.
I'm still digging.
Flags: needinfo?(nical.bugzilla)
(In reply to Nicolas Silva [:nical] from comment #2)
> ...
> I'm still digging.

Thanks.  I'd rather this didn't make it to aurora (we have until the end of the month.)
Assignee: nobody → nical.bugzilla
(In reply to Nicolas Silva [:nical] from comment #2)
> I have been looking into this for quite a while now and it's hard to tell
> what the root cause is, beyound that a texture is added to the update list
> of a LayerTransactionChild (se UseTextures in ShadowLayers.cpp) and
> destroyed (seemingly outside of the transaction) causing the parent side to
> receive the destruction of the texture before the OpUseTexture/Update
> message.
> A band-aid that is probably not unreasonable would be to keep a list of
> strong references to all textureclients queued for update in
> ShadowLayers.cpp so that the TextureClient is not destroyed before the
> transaction is sent.
> Ideally I'd find out if the texture is indeed destroyed out of the
> transaction and how it is even possible since queuing textures for updates
> happens in the transaction. Or maybe figure out why the destruction of the
> texture is not added to the current transaction if the texture is in fact
> destroyed during a transaction.
> I'm still digging.

What's confusing is this looks like a deserialization error. Generally those should be impossible - sending a bad actor can do it, but it's not failing to deserialize an actor. It's failing decoding a timestamp (PLayerTransactionParent.cpp:3223). MessageChannel only dispatches after receiving the full length of a message, so we would only expect this to happen if something were severely wrong, like the message dispatching on the wrong top-level actor, or a compromised child process, or pickle/unpickle code not being symmetric.
(In reply to David Anderson [:dvander] from comment #4)
> [...] like the message dispatching on the
> wrong top-level actor

Oooooooh! Somehow I was convinced that the child process would check that, and that a crash in the parent process could therefore not be caused by the child process sending with the wrong top-level actor.
I just hacked something up to check and indeed, the symptoms are the same when that happens. I'll add some checks ShadowLayers.cpp and hopefully we'll get more interesting crash information.
This is not the ideal thing. I first tried to get the IPDL generator to emit checks that actors have proper top-level protocol when serializing their handles but modifying the IPDL generator is ridiculously hard and I want to get better crash data asap so this will do in the short term.
Attachment #8773705 - Flags: review?(dvander)
Attachment #8773705 - Flags: review?(dvander) → review+
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c4e144e372b
Crash the content process instead of the parent when an actor is sent on the wrong channel. r=dvander
Whiteboard: [gfx-noted] → [gfx-noted][leave-open]
Is this being followed up in bug 1291296 now?  It seems this assert is hitting there, and we have some patches there.  Should we close this?
Flags: needinfo?(nical.bugzilla)
Yes, let's close it now that we know that the crash has moved.
Flags: needinfo?(nical.bugzilla)
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Whiteboard: [gfx-noted][leave-open] → [gfx-noted]
Fixed in 50 as part of bug 1291296.
You need to log in before you can comment on or make changes to this bug.