Closed Bug 1491340 Opened 2 years ago Closed 2 years ago

Crash in xul.dll@0x17e687a | xul.dll@0x17e6af2 | xul.dll@0x181a883 | xul.dll@0x181b87d | xul.dll@0x509db1 | xul.dll@0x18acb9f | xul.dll@0x449876 | xul.dll@0x445691 | xul.dll@0x16c39e | xul.dll@0x16c111 | xul.dll@0x1a8011 | xul.dll@0x1cff7 | xul.dll@0x1...

Categories

(Core :: General, defect)

63 Branch
x86
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox-esr60 --- unaffected
firefox62 --- unaffected
firefox63 --- fixed
firefox64 --- fixed

People

(Reporter: marcia, Assigned: Alex_Gaynor)

References

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is
report bp-3190e4ef-207f-46b0-baba-d6a9e0180913.
=============================================================

Win7 specific crash which started in 63.0b5: https://bit.ly/2p7sY3S. Here are the bugs that went into that beta: https://mzl.la/2MYBXD3

Full Mercurial Changelog: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_63_0b4_RELEASE&tochange=554953db0c6120f7c67794dfb1bd5b493f15749d&full=1

Correlation: Error deserializing variant TShmem of union MemoryOrShmem

Some of the URLs appear to be web developers testing various things.

Top 10 frames of crashing thread:

0 xul.dll xul.dll@0x17e687a 
1 xul.dll xul.dll@0x17e6af2 
2 xul.dll xul.dll@0x181a883 
3 xul.dll xul.dll@0x181b87d 
4 xul.dll xul.dll@0x509db1 
5 xul.dll xul.dll@0x18acb9f 
6 xul.dll xul.dll@0x449876 
7 xul.dll xul.dll@0x445691 
8 xul.dll xul.dll@0x16c39e 
9 xul.dll xul.dll@0x16c111 

=============================================================
philipp pointed out that this signature has been around before in previous betas, this query has a similar crash reason: https://bit.ly/2NdU8oq.
Adding the other top signatures seen throughout the 63 betas.
Crash Signature: x...] → x...] [@ xul.dll@0x17ee99a | xul.dll@0x17eec12 | xul.dll@0x1822983 | xul.dll@0x182397d | xul.dll@0x50c771 | xul.dll@0x18b4cbf | xul.dll@0x44a086 | xul.dll@0x445df1 | xul.dll@0x16c35e | xul.dll@0x16c0d1 | xul.dll@0x1a8151 | xul.dll@0x1d037 | xul.dll@0x14…
Adding another signature seen in B6.
Crash Signature: xul.dll@0x1a8151 | xul.dll@0x1d037 | xul.dll@0x144a1e | xul.dll@0x1ce91 | x... ] → xul.dll@0x1a8151 | xul.dll@0x1d037 | xul.dll@0x144a1e | xul.dll@0x1ce91 | x... ] [@ xul.dll@0x17f593a | xul.dll@0x17f5bb2 | xul.dll@0x18299c3 | xul.dll@0x182a9bd | xul.dll@0x50afa1 | xul.dll@0x18bc31f | xul.dll@0x44cf46 | xul.dll@0x448d41 | xul.dll@0x16…
Ted - Any ideas here?
Flags: needinfo?(ted)
adding b7 signature
Crash Signature: xul.dll@0x16dd11 | xul.dll@0x16da51 | xul.dll@0x1a9fd1 | xul.dll@0x1d287 | xul.dll@0x14546e | x... ] → xul.dll@0x16dd11 | xul.dll@0x16da51 | xul.dll@0x1a9fd1 | xul.dll@0x1d287 | xul.dll@0x14546e | x... ] [@ xul.dll@0x17d5cfa | xul.dll@0x17d5f72 | xul.dll@0x1809e43 | xul.dll@0x180ae3d | xul.dll@0x50b991 | xul.dll@0x189ca1f | xul.dll@0x44d3f6 | xul.dll@0x4…
adding B8 signature.
Crash Signature: xul.dll@0x449131 | xul.dll@0x16e471 | xul.dll@0x16e1b1 | xul.dll@0x1aa871 | xul.dll@0x1d257 | xul.dll@0x145985 | x...] → xul.dll@0x449131 | xul.dll@0x16e471 | xul.dll@0x16e1b1 | xul.dll@0x1aa871 | xul.dll@0x1d257 | xul.dll@0x145985 | x...] [@ xul.dll@0x17df30a | xul.dll@0x17df582 | xul.dll@0x1813353 | xul.dll@0x181434d | xul.dll@0x50b851 | xul.dll@0x18a5c5f | xul.dll@0x44…
If you look at the modules tab, libxul is missing a debug ID. This usually only happens in near-OOM situations. Looking at the memory annotations does indicate that this system is pretty low on memory.
Flags: needinfo?(ted)
Hey Eric,  

These seem like near-OOMs. Is there anything here that looks like we should dig further into it, from a MemShrink perspective? 

-selena
Flags: needinfo?(erahm)
These all have an IPC fatal error of "Error deserializing variant TShmem of union MemoryOrShmem" [1]. This points to a gfx issue. The few reports I looked at had low virtual memory, it seems possible we're leaking shared memory.

Andrew, do you know who could look at this?

[1] https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/LayersSurfaces.cpp#2805
Flags: needinfo?(erahm) → needinfo?(aosmond)
Seems to be 32-bit specific which makes sense for running out of virtual memory, we have some crashes on Windows 10 so it's more generic than just 7.
OS: Windows 7 → Windows
I think the evidence points to this having been in beta since the start, and this was first observed in 20180829100131 in 63.0a1. It may have been introduced sooner than that, but probably not more than a day or two given it seems to have at least once for a build from each day thereafter.

The Windows builds around that date:

Pushlog for 20180829100131: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3205543f957c&tochange=b75561ff5ffe
Pushlog for 20180828220157: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=768eef11f5ff&tochange=3205543f957c
Pushlog for 20180827100129: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e4e2245fc142&tochange=768eef11f5ff
Bug 1485762 strikes me as a smoking gun. I'm guessing either the signature morphed from something more specific (whatever component needed the shmem would have either failed gracefully or crashed), or we used to fail silently.
Flags: needinfo?(aosmond) → needinfo?(agaynor)
Signature morph seems possible, but I'm a bit confused. As far as I could tell when I landed that patch, the "invalid ID" path was only ever hit by the fuzzer. IIRC, I changed that path to a MOZ_CRASH and did some try runs and verified that it wasn't hit.
Flags: needinfo?(agaynor)
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #14)
> Signature morph seems possible, but I'm a bit confused. As far as I could
> tell when I landed that patch, the "invalid ID" path was only ever hit by
> the fuzzer. IIRC, I changed that path to a MOZ_CRASH and did some try runs
> and verified that it wasn't hit.

It seems to only happen once per nightly build at most across the whole population, some builds don't even see one. We only got meaningful volume on beta. If we can find the old signatures, we can possibly confirm whether or not the crash rate has changed -- if it is the same or lower, I imagine that would alleviate concerns.
It appears if we got a shmem we couldn't map in, we would still create the texture:

https://searchfox.org/mozilla-central/rev/ffe6eaf2f032e58ec3b0650a87df2c62ae4ca441/gfx/layers/composite/TextureHost.cpp#284

Apparently I was even the last one to touch this (!!). This is called indirectly by TextureHost::CreateIPDLActor which in turn comes from AllocPTextureParent on CompositorBridgeParent, ImageBridgeParent, VideoBridgeParent. Buried inside the SurfaceDescriptor parameter there is a shmem, which presumably fails to be read, so we don't get to the graceful failure in the graphics code.

Based on that, I recommend we back out bug 1485762 from beta (and central). This may cause issues with the fuzzer, but I don't know much about the motivations of the original change or if it can be achieved another way that doesn't prevent the discretion of the IPDL actors to decide the severity of not getting the shmem...
I would expect that code to become unreachable with my patch landed. Returning |false| from the deserializer should cause an error, instead of going into the AllocPTextureParent (etc) code.
Adding signature seen in b9.
Crash Signature: xul.dll@0x44d616 | xul.dll@0x449581 | xul.dll@0x16e6a1 | xul.dll@0x16e3e1 | xul.dll@0x1aaa11 | xul.dll@0x1d2b7 | xul.dll@0x145bde | x...] → xul.dll@0x44d616 | xul.dll@0x449581 | xul.dll@0x16e6a1 | xul.dll@0x16e3e1 | xul.dll@0x1aaa11 | xul.dll@0x1d2b7 | xul.dll@0x145bde | x...] [@ xul.dll@0x17ee0ea | xul.dll@0x17ee362 | xul.dll@0x1822133 | xul.dll@0x182312d | xul.dll@0x50e4d1 | xul.dll@0x18b…
(In reply to Andrew Osmond [:aosmond] from comment #16)
> It appears if we got a shmem we couldn't map in, we would still create the
> texture:
> 
> https://searchfox.org/mozilla-central/rev/
> ffe6eaf2f032e58ec3b0650a87df2c62ae4ca441/gfx/layers/composite/TextureHost.
> cpp#284
> 
> Apparently I was even the last one to touch this (!!). This is called
> indirectly by TextureHost::CreateIPDLActor which in turn comes from
> AllocPTextureParent on CompositorBridgeParent, ImageBridgeParent,
> VideoBridgeParent. Buried inside the SurfaceDescriptor parameter there is a
> shmem, which presumably fails to be read, so we don't get to the graceful
> failure in the graphics code.
> 
> Based on that, I recommend we back out bug 1485762 from beta (and central).
> This may cause issues with the fuzzer, but I don't know much about the
> motivations of the original change or if it can be achieved another way that
> doesn't prevent the discretion of the IPDL actors to decide the severity of
> not getting the shmem...

Alex, can you please backout bug 1485762 and request an uplift? The code Andrew noted appropriately handles invalid shmem instances in a non-fatal way. If you need to assert in fuzzing builds, perhaps we can wrap that in a `#ifdef FUZZING`.
Flags: needinfo?(agaynor)
I can, but I'm still unclear on how that patch would cause these issues.

The result of that patch should be that invalid shmem IDs result in the ParamTraits<> parsing of the IPC message error, which means either the parent or the child gets killed (I can't remember which). Is that what's happening here?
Flags: needinfo?(agaynor)
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #20)
> The result of that patch should be that invalid shmem IDs result in the
> ParamTraits<> parsing of the IPC message error, which means either the
> parent or the child gets killed (I can't remember which). Is that what's
> happening here?

Yes, there are some details in comment 9.
Ok, backports landed and uplifted.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Assignee: nobody → agaynor
Target Milestone: --- → mozilla64
Adding signature for B10. I think maybe the backout missed Beta10 and will be in the next beta.
Crash Signature: xul.dll@0x18b4adf | xul.dll@0x44d5a6 | xul.dll@0x4493a1 | xul.dll@0x16e611 | xul.dll@0x16e351 | xul.dll@0x1aa8d1 | xul.dll@0x1d287 | xul.dll@0x145b3b | x...] → xul.dll@0x18b4adf | xul.dll@0x44d5a6 | xul.dll@0x4493a1 | xul.dll@0x16e611 | xul.dll@0x16e351 | xul.dll@0x1aa8d1 | xul.dll@0x1d287 | xul.dll@0x145b3b | x...] [@ xul.dll@0x17e384a | xul.dll@0x17e3ac2 | xul.dll@0x1817963 | xul.dll@0x181895d | xul.dll@0x50…
there is handful of signatures in b11 but their share has greatly diminished, from 2% of browser crashes in b10 to 0.1% in b11 currently.
Crash Signature: xul.dll@0x508bc1 | xul.dll@0x18a9ccf | xul.dll@0x4492a6 | xul.dll@0x4450c1 | xul.dll@0x16cf1e | xul.dll@0x16cc91 | xul.dll@0x1a8c41 | xul.dll@0x1cfe7 | xul.dll@0x1453fe | x...] → xul.dll@0x508bc1 | xul.dll@0x18a9ccf | xul.dll@0x4492a6 | xul.dll@0x4450c1 | xul.dll@0x16cf1e | xul.dll@0x16cc91 | xul.dll@0x1a8c41 | xul.dll@0x1cfe7 | xul.dll@0x1453fe | x...] [@ xul.dll@0x17e460a | xul.dll@0x17e4882 | xul.dll@0x17f4f7f | xul.dll@0x17f…
You need to log in before you can comment on or make changes to this bug.