Closed
Bug 1402586
Opened 7 years ago
Closed 7 years ago
Crash in nsTArray_base<T>::SwapArrayElements<T> | nsTArray_Impl<T>::SwapElements<T> | mozilla::ipc::DeserializeIPCStream
Categories
(Core :: DOM: Core & HTML, defect)
Tracking
()
RESOLVED
FIXED
mozilla58
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | unaffected |
firefox57 | --- | fixed |
firefox58 | --- | fixed |
People
(Reporter: philipp, Assigned: bkelly)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
1.51 KB,
patch
|
asuth
:
review+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-b5433694-a1ff-4539-9f77-0444a0170922. ============================================================= Crashing Thread (33), Name: DOM Worker Frame Module Signature Source 0 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::SwapArrayElements<nsTArrayInfallibleAllocator, nsTArrayInfallibleAllocator>(nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>&, unsigned __int64, unsigned __int64) xpcom/ds/nsTArray-inl.h:349 1 xul.dll nsTArray_Impl<RefPtr<mozilla::dom::quota::DirectoryLockImpl>, nsTArrayInfallibleAllocator>::SwapElements<nsTArrayInfallibleAllocator>(nsTArray_Impl<RefPtr<mozilla::dom::quota::DirectoryLockImpl>, nsTArrayInfallibleAllocator>&) xpcom/ds/nsTArray.h:1802 2 xul.dll mozilla::ipc::DeserializeIPCStream(mozilla::ipc::IPCStream const&) ipc/glue/IPCStreamUtils.cpp:414 3 xul.dll mozilla::ipc::DeserializeIPCStream(mozilla::ipc::OptionalIPCStream const&) ipc/glue/IPCStreamUtils.cpp:430 4 xul.dll <lambda_d131e09cdf1f65f318b74bf31670ad4c>::operator() dom/cache/CacheStreamControlChild.cpp:120 5 xul.dll mozilla::MozPromise<mozilla::ipc::OptionalIPCStream, mozilla::ipc::PromiseRejectReason, 0>::ThenValue<<lambda_d131e09cdf1f65f318b74bf31670ad4c>, <lambda_456c3dd1edb9b73c99703b668fed0127> >::DoResolveOrRejectInternal(mozilla::MozPromise<mozilla::ipc::OptionalIPCStream, mozilla::ipc::PromiseRejectReason, 0>::ResolveOrRejectValue&) xpcom/threads/MozPromise.h:765 6 xul.dll mozilla::MozPromise<mozilla::ipc::FileDescriptor, mozilla::ipc::PromiseRejectReason, 0>::ThenValueBase::DoResolveOrReject(mozilla::MozPromise<mozilla::ipc::FileDescriptor, mozilla::ipc::PromiseRejectReason, 0>::ResolveOrRejectValue&) xpcom/threads/MozPromise.h:497 7 xul.dll mozilla::MozPromise<mozilla::ipc::OptionalIPCStream, mozilla::ipc::PromiseRejectReason, 0>::ThenValueBase::ResolveOrRejectRunnable::Run() xpcom/threads/MozPromise.h:402 8 xul.dll `anonymous namespace'::ExternalRunnableWrapper::Cancel dom/workers/WorkerPrivate.cpp:277 9 xul.dll mozilla::dom::workers::WorkerRunnable::Run() dom/workers/WorkerRunnable.cpp:253 10 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1039 11 xul.dll NS_ProcessPendingEvents(nsIThread*, unsigned int) xpcom/threads/nsThreadUtils.cpp:463 12 xul.dll mozilla::dom::workers::WorkerPrivate::ClearMainEventQueue(mozilla::dom::workers::WorkerPrivate::WorkerRanOrNot) dom/workers/WorkerPrivate.cpp:5603 13 xul.dll `anonymous namespace'::WorkerThreadPrimaryRunnable::Run dom/workers/RuntimeService.cpp:2863 14 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1039 15 xul.dll NS_ProcessNextEvent(nsIThread*, bool) xpcom/threads/nsThreadUtils.cpp:521 16 xul.dll mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:338 17 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:319 18 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:299 19 xul.dll nsThread::ThreadFunc(void*) xpcom/threads/nsThread.cpp:427 20 nss3.dll PR_NativeRunThread nsprpub/pr/src/threads/combined/pruthr.c:397 21 nss3.dll pr_root nsprpub/pr/src/md/windows/w95thred.c:95 22 ucrtbase.dll o__realloc_base 23 kernel32.dll BaseThreadInitThunk 24 ntdll.dll LdrpAllocateTls 25 kernelbase.dll kernelbase.dll@0x6d4ef this signature for content process crashes first showed up on 57.0a1 build 20170916220246, which would look related to bug 1397128 timing wise.
Flags: needinfo?(bkelly)
Assignee | ||
Updated•7 years ago
|
Assignee | ||
Comment 1•7 years ago
|
||
I think this is what is happening: 1. A worker thread starts reading the Cache API stream which triggers a lazy open. 2. While the lazy open is in process the worker thread starts shutting down. 3. The worker shutdown closes the stream. 4. The lazy open resolves some time around here placing a runnable in the worker event loop. 5. The CacheStreamControlChild sees the close and allows itself to be destroyed. 6. The WorkerHolder for the CacheStreamControlChild is dropped allowing the worker to shutdown. 7. The worker shuts down the PBackground actor killing the PFileDescriptorChild in the serialized stream. 8. The lazy open resolve runnable finally fires when the worker empties its event loop at the end. 9. The lazy open resolve runnable tries to deserialize the PFileDescriptorChild and crashes.
Assignee | ||
Updated•7 years ago
|
Blocks: ServiceWorkers-stability
Assignee | ||
Comment 2•7 years ago
|
||
Andrew, do you mind reviewing this? Tom reviewed the original patches for this feature, but he is out on PTO at the moment. This patch just ensures that we hold the worker alive while the lazy open operation is in progress. This will prevent the worker thread from kill the PBackground IPC tree out from underneath us. Note, its ok if the CacheStreamControlChild actor itself goes away, but we must ensure the PBackground actor and its child PFileDescriptorSetChild actor live until we do the DeserializeIPCStream() method. I don't have a way to reproduce this at the moment. Its quite racy. So I'm not planning on writing a test. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a91a4f886c7615fd3c8ac6eb982dad82eb284408
Attachment #8911835 -
Flags: review?(bugmail)
Comment 3•7 years ago
|
||
Comment on attachment 8911835 [details] [diff] [review] Hold the worker alive while Cache API completes a lazy body open IPC operation. r=asuth Review of attachment 8911835 [details] [diff] [review]: ----------------------------------------------------------------- Excellent analysis in comment 1 and comment in the patch, thank you! (In reply to Ben Kelly [:bkelly] from comment #2) > I don't have a way to reproduce this at the moment. Its quite racy. So I'm > not planning on writing a test. Yes, I think we'd need test infrastructure changes to support something like this. Something that effectively looked like gdb (or rr) playing god with scheduling based on a set of symbolic ordering descriptions in whatever the test definition is. This could also be done internally with a swapped-in replacement event dispatch mechanism that could perform symbolic unwinding itself to know when the runnable under test was getting dispatched and should result in stalling the queue's execution until whatever the next semantic event is; in this case, thread shutdown.
Attachment #8911835 -
Flags: review?(bugmail) → review+
Pushed by bkelly@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/68cc39f8ed24 Hold the worker alive while Cache API completes a lazy body open IPC operation. r=asuth
Comment 5•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/68cc39f8ed24
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Assignee | ||
Comment 6•7 years ago
|
||
Comment on attachment 8911835 [details] [diff] [review] Hold the worker alive while Cache API completes a lazy body open IPC operation. r=asuth Approval Request Comment [Feature/Bug causing the regression]: Bug 1397128 [User impact if declined]: Intermittent crashes on sites using service workers. [Is this code covered by automated tests?]: We have extensive tests for this code, but it doesn't trigger the race condition responsible for the crash. [Has the fix been verified in Nightly?]: The fix is in nightly, but we don't have a reliable way to reproduce the crash. So its difficult to verify. Its a low frequency crash. [Needs manual test from QE? If yes, steps to reproduce]: No. [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Minimal risk [Why is the change risky/not risky?]: The solution to this bug is to keep a worker thread slightly longer while some IPC messaging completes. This is a standard operation we do in other places. [String changes made/needed]: None
Attachment #8911835 -
Flags: approval-mozilla-beta?
Comment 7•7 years ago
|
||
Comment on attachment 8911835 [details] [diff] [review] Hold the worker alive while Cache API completes a lazy body open IPC operation. r=asuth Fix a crash, taking it. Should be in 57b4 (gtb tomorrow Thursday)
Attachment #8911835 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 8•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/dce8934980ff
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•