Closed
Bug 1491340
Opened 6 years ago
Closed 6 years ago
Crash in xul.dll@0x17e687a | xul.dll@0x17e6af2 | xul.dll@0x181a883 | xul.dll@0x181b87d | xul.dll@0x509db1 | xul.dll@0x18acb9f | xul.dll@0x449876 | xul.dll@0x445691 | xul.dll@0x16c39e | xul.dll@0x16c111 | xul.dll@0x1a8011 | xul.dll@0x1cff7 | xul.dll@0x1...
Categories
(Core :: General, defect)
Tracking
()
RESOLVED
FIXED
mozilla64
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox62 | --- | unaffected |
firefox63 | --- | fixed |
firefox64 | --- | fixed |
People
(Reporter: marcia, Assigned: Alex_Gaynor)
References
Details
(Keywords: crash, regression)
Crash Data
This bug was filed from the Socorro interface and is report bp-3190e4ef-207f-46b0-baba-d6a9e0180913. ============================================================= Win7 specific crash which started in 63.0b5: https://bit.ly/2p7sY3S. Here are the bugs that went into that beta: https://mzl.la/2MYBXD3 Full Mercurial Changelog: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_63_0b4_RELEASE&tochange=554953db0c6120f7c67794dfb1bd5b493f15749d&full=1 Correlation: Error deserializing variant TShmem of union MemoryOrShmem Some of the URLs appear to be web developers testing various things. Top 10 frames of crashing thread: 0 xul.dll xul.dll@0x17e687a 1 xul.dll xul.dll@0x17e6af2 2 xul.dll xul.dll@0x181a883 3 xul.dll xul.dll@0x181b87d 4 xul.dll xul.dll@0x509db1 5 xul.dll xul.dll@0x18acb9f 6 xul.dll xul.dll@0x449876 7 xul.dll xul.dll@0x445691 8 xul.dll xul.dll@0x16c39e 9 xul.dll xul.dll@0x16c111 =============================================================
Reporter | ||
Comment 1•6 years ago
|
||
philipp pointed out that this signature has been around before in previous betas, this query has a similar crash reason: https://bit.ly/2NdU8oq.
Reporter | ||
Comment 2•6 years ago
|
||
Adding the other top signatures seen throughout the 63 betas.
Crash Signature: x...] → x...]
[@ xul.dll@0x17ee99a | xul.dll@0x17eec12 | xul.dll@0x1822983 | xul.dll@0x182397d | xul.dll@0x50c771 | xul.dll@0x18b4cbf | xul.dll@0x44a086 | xul.dll@0x445df1 | xul.dll@0x16c35e | xul.dll@0x16c0d1 | xul.dll@0x1a8151 | xul.dll@0x1d037 | xul.dll@0x14…
Reporter | ||
Comment 3•6 years ago
|
||
Adding another signature seen in B6.
Crash Signature: xul.dll@0x1a8151 | xul.dll@0x1d037 | xul.dll@0x144a1e | xul.dll@0x1ce91 | x... ] → xul.dll@0x1a8151 | xul.dll@0x1d037 | xul.dll@0x144a1e | xul.dll@0x1ce91 | x... ]
[@ xul.dll@0x17f593a | xul.dll@0x17f5bb2 | xul.dll@0x18299c3 | xul.dll@0x182a9bd | xul.dll@0x50afa1 | xul.dll@0x18bc31f | xul.dll@0x44cf46 | xul.dll@0x448d41 | xul.dll@0x16…
Comment 5•6 years ago
|
||
adding b7 signature
Crash Signature: xul.dll@0x16dd11 | xul.dll@0x16da51 | xul.dll@0x1a9fd1 | xul.dll@0x1d287 | xul.dll@0x14546e | x... ] → xul.dll@0x16dd11 | xul.dll@0x16da51 | xul.dll@0x1a9fd1 | xul.dll@0x1d287 | xul.dll@0x14546e | x... ]
[@ xul.dll@0x17d5cfa | xul.dll@0x17d5f72 | xul.dll@0x1809e43 | xul.dll@0x180ae3d | xul.dll@0x50b991 | xul.dll@0x189ca1f | xul.dll@0x44d3f6 | xul.dll@0x4…
Reporter | ||
Comment 6•6 years ago
|
||
adding B8 signature.
Crash Signature: xul.dll@0x449131 | xul.dll@0x16e471 | xul.dll@0x16e1b1 | xul.dll@0x1aa871 | xul.dll@0x1d257 | xul.dll@0x145985 | x...] → xul.dll@0x449131 | xul.dll@0x16e471 | xul.dll@0x16e1b1 | xul.dll@0x1aa871 | xul.dll@0x1d257 | xul.dll@0x145985 | x...]
[@ xul.dll@0x17df30a | xul.dll@0x17df582 | xul.dll@0x1813353 | xul.dll@0x181434d | xul.dll@0x50b851 | xul.dll@0x18a5c5f | xul.dll@0x44…
Comment 7•6 years ago
|
||
If you look at the modules tab, libxul is missing a debug ID. This usually only happens in near-OOM situations. Looking at the memory annotations does indicate that this system is pretty low on memory.
Flags: needinfo?(ted)
Comment 8•6 years ago
|
||
Hey Eric, These seem like near-OOMs. Is there anything here that looks like we should dig further into it, from a MemShrink perspective? -selena
Flags: needinfo?(erahm)
Comment 9•6 years ago
|
||
These all have an IPC fatal error of "Error deserializing variant TShmem of union MemoryOrShmem" [1]. This points to a gfx issue. The few reports I looked at had low virtual memory, it seems possible we're leaking shared memory. Andrew, do you know who could look at this? [1] https://searchfox.org/mozilla-central/source/__GENERATED__/ipc/ipdl/LayersSurfaces.cpp#2805
Flags: needinfo?(erahm) → needinfo?(aosmond)
Comment 10•6 years ago
|
||
Seems to be 32-bit specific which makes sense for running out of virtual memory, we have some crashes on Windows 10 so it's more generic than just 7.
OS: Windows 7 → Windows
Comment 11•6 years ago
|
||
Are we sure we have the right regression window? https://crash-stats.mozilla.com/search/?ipc_fatal_error_msg=~Error%20deserializing%20variant%20TShmem%20of%20union%20MemoryOrShmem&date=%3E%3D2018-06-25T21%3A04%3A48.000Z&date=%3C2018-09-25T21%3A04%3A48.000Z&_sort=-date&_facets=signature&_facets=version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version I suspect this started happening earlier...possibly even in nightly.
Comment 12•6 years ago
|
||
I think the evidence points to this having been in beta since the start, and this was first observed in 20180829100131 in 63.0a1. It may have been introduced sooner than that, but probably not more than a day or two given it seems to have at least once for a build from each day thereafter. The Windows builds around that date: Pushlog for 20180829100131: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3205543f957c&tochange=b75561ff5ffe Pushlog for 20180828220157: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=768eef11f5ff&tochange=3205543f957c Pushlog for 20180827100129: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e4e2245fc142&tochange=768eef11f5ff
Comment 13•6 years ago
|
||
Bug 1485762 strikes me as a smoking gun. I'm guessing either the signature morphed from something more specific (whatever component needed the shmem would have either failed gracefully or crashed), or we used to fail silently.
Flags: needinfo?(aosmond) → needinfo?(agaynor)
Assignee | ||
Comment 14•6 years ago
|
||
Signature morph seems possible, but I'm a bit confused. As far as I could tell when I landed that patch, the "invalid ID" path was only ever hit by the fuzzer. IIRC, I changed that path to a MOZ_CRASH and did some try runs and verified that it wasn't hit.
Flags: needinfo?(agaynor)
Comment 15•6 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #14) > Signature morph seems possible, but I'm a bit confused. As far as I could > tell when I landed that patch, the "invalid ID" path was only ever hit by > the fuzzer. IIRC, I changed that path to a MOZ_CRASH and did some try runs > and verified that it wasn't hit. It seems to only happen once per nightly build at most across the whole population, some builds don't even see one. We only got meaningful volume on beta. If we can find the old signatures, we can possibly confirm whether or not the crash rate has changed -- if it is the same or lower, I imagine that would alleviate concerns.
Comment 16•6 years ago
|
||
It appears if we got a shmem we couldn't map in, we would still create the texture: https://searchfox.org/mozilla-central/rev/ffe6eaf2f032e58ec3b0650a87df2c62ae4ca441/gfx/layers/composite/TextureHost.cpp#284 Apparently I was even the last one to touch this (!!). This is called indirectly by TextureHost::CreateIPDLActor which in turn comes from AllocPTextureParent on CompositorBridgeParent, ImageBridgeParent, VideoBridgeParent. Buried inside the SurfaceDescriptor parameter there is a shmem, which presumably fails to be read, so we don't get to the graceful failure in the graphics code. Based on that, I recommend we back out bug 1485762 from beta (and central). This may cause issues with the fuzzer, but I don't know much about the motivations of the original change or if it can be achieved another way that doesn't prevent the discretion of the IPDL actors to decide the severity of not getting the shmem...
Assignee | ||
Comment 17•6 years ago
|
||
I would expect that code to become unreachable with my patch landed. Returning |false| from the deserializer should cause an error, instead of going into the AllocPTextureParent (etc) code.
Reporter | ||
Comment 18•6 years ago
|
||
Adding signature seen in b9.
Crash Signature: xul.dll@0x44d616 | xul.dll@0x449581 | xul.dll@0x16e6a1 | xul.dll@0x16e3e1 | xul.dll@0x1aaa11 | xul.dll@0x1d2b7 | xul.dll@0x145bde | x...] → xul.dll@0x44d616 | xul.dll@0x449581 | xul.dll@0x16e6a1 | xul.dll@0x16e3e1 | xul.dll@0x1aaa11 | xul.dll@0x1d2b7 | xul.dll@0x145bde | x...]
[@ xul.dll@0x17ee0ea | xul.dll@0x17ee362 | xul.dll@0x1822133 | xul.dll@0x182312d | xul.dll@0x50e4d1 | xul.dll@0x18b…
Comment 19•6 years ago
|
||
(In reply to Andrew Osmond [:aosmond] from comment #16) > It appears if we got a shmem we couldn't map in, we would still create the > texture: > > https://searchfox.org/mozilla-central/rev/ > ffe6eaf2f032e58ec3b0650a87df2c62ae4ca441/gfx/layers/composite/TextureHost. > cpp#284 > > Apparently I was even the last one to touch this (!!). This is called > indirectly by TextureHost::CreateIPDLActor which in turn comes from > AllocPTextureParent on CompositorBridgeParent, ImageBridgeParent, > VideoBridgeParent. Buried inside the SurfaceDescriptor parameter there is a > shmem, which presumably fails to be read, so we don't get to the graceful > failure in the graphics code. > > Based on that, I recommend we back out bug 1485762 from beta (and central). > This may cause issues with the fuzzer, but I don't know much about the > motivations of the original change or if it can be achieved another way that > doesn't prevent the discretion of the IPDL actors to decide the severity of > not getting the shmem... Alex, can you please backout bug 1485762 and request an uplift? The code Andrew noted appropriately handles invalid shmem instances in a non-fatal way. If you need to assert in fuzzing builds, perhaps we can wrap that in a `#ifdef FUZZING`.
Flags: needinfo?(agaynor)
Assignee | ||
Comment 20•6 years ago
|
||
I can, but I'm still unclear on how that patch would cause these issues. The result of that patch should be that invalid shmem IDs result in the ParamTraits<> parsing of the IPC message error, which means either the parent or the child gets killed (I can't remember which). Is that what's happening here?
Flags: needinfo?(agaynor)
Comment 21•6 years ago
|
||
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #20) > The result of that patch should be that invalid shmem IDs result in the > ParamTraits<> parsing of the IPC message error, which means either the > parent or the child gets killed (I can't remember which). Is that what's > happening here? Yes, there are some details in comment 9.
Assignee | ||
Comment 22•6 years ago
|
||
Ok, backports landed and uplifted.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Updated•6 years ago
|
status-firefox-esr60:
--- → unaffected
Reporter | ||
Comment 23•6 years ago
|
||
Adding signature for B10. I think maybe the backout missed Beta10 and will be in the next beta.
Crash Signature: xul.dll@0x18b4adf | xul.dll@0x44d5a6 | xul.dll@0x4493a1 | xul.dll@0x16e611 | xul.dll@0x16e351 | xul.dll@0x1aa8d1 | xul.dll@0x1d287 | xul.dll@0x145b3b | x...] → xul.dll@0x18b4adf | xul.dll@0x44d5a6 | xul.dll@0x4493a1 | xul.dll@0x16e611 | xul.dll@0x16e351 | xul.dll@0x1aa8d1 | xul.dll@0x1d287 | xul.dll@0x145b3b | x...]
[@ xul.dll@0x17e384a | xul.dll@0x17e3ac2 | xul.dll@0x1817963 | xul.dll@0x181895d | xul.dll@0x50…
Comment 24•6 years ago
|
||
there is handful of signatures in b11 but their share has greatly diminished, from 2% of browser crashes in b10 to 0.1% in b11 currently.
Crash Signature: xul.dll@0x508bc1 | xul.dll@0x18a9ccf | xul.dll@0x4492a6 | xul.dll@0x4450c1 | xul.dll@0x16cf1e | xul.dll@0x16cc91 | xul.dll@0x1a8c41 | xul.dll@0x1cfe7 | xul.dll@0x1453fe | x...] → xul.dll@0x508bc1 | xul.dll@0x18a9ccf | xul.dll@0x4492a6 | xul.dll@0x4450c1 | xul.dll@0x16cf1e | xul.dll@0x16cc91 | xul.dll@0x1a8c41 | xul.dll@0x1cfe7 | xul.dll@0x1453fe | x...]
[@ xul.dll@0x17e460a | xul.dll@0x17e4882 | xul.dll@0x17f4f7f | xul.dll@0x17f…
You need to log in
before you can comment on or make changes to this bug.
Description
•