Closed Bug 1498784 Opened 3 years ago Closed 3 years ago
Crash in mozilla::ipc::Optional
This bug was filed from the Socorro interface and is report bp-d551d66c-bff2-4f4e-aae5-5963f0181012. ============================================================= Top 10 frames of crashing thread: 0 xul.dll mozilla::ipc::OptionalIPCStream::AssertSanity ipc/ipdl/_ipdlheaders/mozilla/ipc/IPCStream.h:742 1 xul.dll mozilla::ipc::OptionalIPCStream::AssertSanity ipc/ipdl/_ipdlheaders/mozilla/ipc/IPCStream.h:748 2 xul.dll ?CleanupIPCStream@?A0xCBC2F23F@ipc@mozilla@@YAXAEAVOptionalIPCStream@12@_N1@Z.llvm.1628995540106455789 ipc/glue/IPCStreamUtils.cpp:348 3 xul.dll mozilla::ipc::AutoIPCStream::~AutoIPCStream ipc/glue/IPCStreamUtils.cpp:478 4 xul.dll mozilla::UniquePtr<mozilla::ipc::AutoIPCStream, mozilla::DefaultDelete<mozilla::ipc::AutoIPCStream> >::reset mfbt/UniquePtr.h:343 5 xul.dll void nsTArray_Impl<mozilla::UniquePtr<mozilla::ipc::AutoIPCStream, mozilla::DefaultDelete<mozilla::ipc::AutoIPCStream> >, nsTArrayInfallibleAllocator>::ClearAndRetainStorage xpcom/ds/nsTArray.h:1382 6 xul.dll void nsTArray_Impl<mozilla::UniquePtr<mozilla::ipc::AutoIPCStream, mozilla::DefaultDelete<mozilla::ipc::AutoIPCStream> >, nsTArrayInfallibleAllocator>::Clear xpcom/ds/nsTArray.h:1895 7 xul.dll void mozilla::dom::cache::AutoChildOpArgs::~AutoChildOpArgs dom/cache/AutoUtils.cpp:141 8 xul.dll mozilla::dom::cache::Cache::Put dom/cache/Cache.cpp:453 9 xul.dll static bool mozilla::dom::Cache_Binding::put dom/bindings/CacheBinding.cpp:871 ============================================================= crash reports with this signature are newly showing up in 63.0b14 and 64.0a1 with "MOZ_RELEASE_ASSERT((T__None) <= (mType)) (invalid type tag)". from the list of changes that were going into b14 (https://mzl.la/2pT1a3P) it looks like the signature might be related to the crash fix for bug 1484524.
The signature appeared in nightly 64 with buildid 20181011220118. In comparing pushlogs for this build and for beta 14, it appears that bug 1484524 likely introduced this regression. :ytausky, could you investigate please ?
This was bound to happen... Take everything I say with a grain of salt since I still don't know how to reproduce this scenario, so I cannot test whether a fix for it actually works. Having that said, I think I know what's going on here. When `ToCacheResponse` fails, the element referenced by `pair` get removed, making the reference previously returned from `pair.response()` dangling. That reference was indirectly stored in `mStreamCleanupList`, so when that list is cleared, cleanup code is run on a dangling reference, leading to the crash. I could try to patch that up, but until we find a way to reliably reproduce this, it's hard to give any guarantees.  https://searchfox.org/mozilla-central/source/dom/cache/AutoUtils.cpp#333  https://searchfox.org/mozilla-central/source/dom/cache/TypeUtils.cpp#521
Assignee: nobody → ytausky
Status: NEW → ASSIGNED
Priority: -- → P2
A hasty attempt to fix another crash (bug 1484524) created a use- after-free situation, leading to another crash. This patch attempts to rectify that, hopefully without running into yet another crash.
The crashes all have 0xe5e5e5e5e5e5e5e5 in r9, which is consistent with a use-after-free. Also, of the 55 crashes known to crash-stats, 52 are in a content process but 3 are in the parent process, so sandboxing may not completely contain this and/or this may be usable as a sandbox escape.
I took a look at where we're crashing to see if I could get any more info from the crash reports; this is from win64 beta 20181011200118 (GNU/AT&T syntax; the int3 is the crash site): 181a29590: 48 83 ec 28 sub $0x28,%rsp 181a29594: 8b 41 40 mov 0x40(%rcx),%eax 181a29597: 85 c0 test %eax,%eax 181a29599: 78 0a js 0x181a295a5 181a2959b: 83 f8 03 cmp $0x3,%eax 181a2959e: 7d 23 jge 0x181a295c3 181a295a0: 48 83 c4 28 add $0x28,%rsp 181a295a4: c3 retq 181a295a5: 48 8d 05 5f f1 17 03 lea 0x317f15f(%rip),%rax # 0x184ba870b 181a295ac: 48 8b 0d 55 34 23 03 mov 0x3233455(%rip),%rcx # 0x184c5ca08 181a295b3: 48 89 01 mov %rax,(%rcx) 181a295b6: cc int3 The value that we're bounds-checking is in %eax and the branch for the value being negative (which would include 0xe5e5e5e5) is taken, but the register is clobbered by the process of setting a global variable to point to the assertion string, so we don't have the invalid value itself in the crash report. I don't know exactly where the heap poison value in r9 is coming from; I walked up the stack looking for references to r9, and I didn't see any until the mozilla::dom::cache::Cache::Put frame, but that function is complicated.
I'm working on a unit-ish test that reproduces what I assume to be this scenario. This requires some groundwork to enable mocking some stuff, so it might take a while. In the meantime it might be wise to revert the patch for bug 1484524 -- before the patch, the crash happened due to a normal assertion failure, without dereferencing any dangling or null pointers.
The change that caused this has been backed out in bug 1484524, so I'm resolving this as fixed.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.