Closed Bug 1450874 Opened 6 years ago Closed 6 years ago

Crash in nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild

Categories

(Core :: DOM: Service Workers, defect, P1)

All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox-esr52 --- unaffected
firefox59 --- unaffected
firefox60 --- unaffected
firefox61 + fixed

People

(Reporter: calixte, Assigned: bkelly)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-4a398844-871f-4b8c-ab8e-c4c870180403.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll nsTArray_Impl<RefPtr<mozilla::ComputedStyle>, nsTArrayInfallibleAllocator>::AppendElement<already_AddRefed<mozilla::ComputedStyle>, nsTArrayInfallibleAllocator> xpcom/ds/nsTArray.h:2286
1 xul.dll mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild dom/workers/WorkerLoadInfo.cpp:472
2 xul.dll mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded dom/serviceworkers/ServiceWorkerPrivate.cpp:1822
3 xul.dll mozilla::dom::ServiceWorkerPrivate::SendFetchEvent dom/serviceworkers/ServiceWorkerPrivate.cpp:1757
4 xul.dll static nsresult mozilla::dom::`anonymous namespace'::ContinueDispatchFetchEventRunnable::Run dom/serviceworkers/ServiceWorkerManager.cpp:2328
5 xul.dll nsPermissionManager::WhenPermissionsAvailable extensions/cookie/nsPermissionManager.cpp:3444
6 xul.dll static void <lambda_499c2de420bd99c0430356c54ec813d7>::operator dom/serviceworkers/ServiceWorkerManager.cpp:2496
7 xul.dll nsresult mozilla::detail::RunnableFunction<<lambda_499c2de420bd99c0430356c54ec813d7> >::Run xpcom/threads/nsThreadUtils.h:551
8 xul.dll mozilla::net::HttpBaseChannel::EnsureUploadStreamIsCloneable netwerk/protocol/http/HttpBaseChannel.cpp:918
9 xul.dll mozilla::dom::ServiceWorkerManager::DispatchFetchEvent dom/serviceworkers/ServiceWorkerManager.cpp:2509

=============================================================

There are 223 crashes (from 17 installations) in nightly 61 with buildid 20180402220122. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1448141.

[1] https://hg.mozilla.org/mozilla-central/rev?node=c22501d278b4
Flags: needinfo?(bkelly)
Crash Signature: [@ nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild] → [@ nsTArray_Impl<T>::AppendElement<T> | mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild] [@ mozilla::dom::WorkerLoadInfo::InterfaceRequestor::MaybeAddTabChild] [@ libxul.so@0x1bf53d7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIf…
Crash Signature: mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] → mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] [@ libxul.so@0x1be2eb7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Priority: -- → P1
Mike, does it reproduce in a clean profile for you?  Any chance you can save off the profile for me to look at?
Flags: needinfo?(mconley)
(In reply to Ben Kelly [Part-time, not reviewing][:bkelly] from comment #3)
> Mike, does it reproduce in a clean profile for you?  Any chance you can save
> off the profile for me to look at?

Hm, no, it doesn't.

This is my default profile - I'm a little reluctant to send it anywhere, really. I trust you, certainly, but I'm a bit antsy about putting it anywhere, mailing it somehow, etc.

Is there some debugging information I can give you instead?
Flags: needinfo?(mconley) → needinfo?(bkelly)
I think this is perhaps related to bug 1448141 P4.  If the service worker script fails to load we don't actually clear our mWorkerPrivate.  So we can try dispatching to a shutting down worker.
Mike, can you send me these files and folders from your profile?

  serviceworker.txt
  storage/default/https+++www.cnet.com

The cnet.com folder may have a userContextId part as well if you are using containers.
Flags: needinfo?(bkelly) → needinfo?(mconley)
Also, if you could at least save your current profile off that would help me.  I want to provide you a test build later today.
I can always reproduce the crash with the cnet.com url:
https://crash-stats.mozilla.com/report/index/e9e5610e-6d94-4baa-b78e-940b50180403
Ok.  I got the files I asked for in comment 6 from Mike.  I'll keep investigating.  Hopefully the try build in comment 8 will help once its done building.
Affects outlook.live.com as well. Just try opening any received mail.   
https://crash-stats.mozilla.com/report/index/d620645e-37f6-46fa-b383-02ead0180403
Hey bkelly - the try build appears to fix the crash for me!
Flags: needinfo?(mconley)
(In reply to Mike Conley (:mconley) (:⚙️) (Totally backlogged on reviews and needinfos) from comment #12)
> Hey bkelly - the try build appears to fix the crash for me!

Ok.  It appears to still fail in a debug build, though... trying to get that sorted and then I'll post for review.  I'll also try to write a test.
Comment on attachment 8964592 [details] [diff] [review]
P1 If a service worker script self closes due to a compilation error, then don't re-use the worker for further events. r=asuth

Andrew, this patch makes us follow our TerminateWorker() path if we detect that the service worker thread started shutting down on its own.

Self-shutdown was not possible in previous code, but was introduced in bug 1448141.  When the worker script cannot be compiled we now immediately enter the closing state on the worker thread.  This prevents the service worker code from accidentally holding it alive while it dispatched functional events.

The ServiceWorkerPrivate code, however, was never written with the expectation that worker thread would terminate itself.  So this patch catches the condition by checking for a non-running status and treating it like a normal terminate.

I believe we are hitting this path because navigating to a controlled page effectively queues up multiple fetch events before we are able to run the update to clear the bad service worker.  The navigation tries to start the service worker, fails, but then loads from the network.  Then a style sheet or script resource tries to hit the service worker and triggers the crash.

Note, I can't completely explain the last frame in the crash stack.  It seems the strong ref from SWP should keep the WorkerPrivate and it's info's nsTArray at least in memory.  Its possible there is something further wrong due to WorkerRef, etc, or maybe the crash stack is just bogus.

In any case, I think this check is reasonably safe and local testing shows that it fixes the crash.
Attachment #8964592 - Flags: review?(bugmail)
Comment on attachment 8964618 [details] [diff] [review]
P2 Don't save service worker time stamps if the fetch event didn't actually dispatch. r=asuth

This patch makes us skip trying to set the various time stamps after resuming a failed channel.  Ideally we would avoid calling this method, but fixing that would require additional plumbing I would prefer not to rush for this top crasher.
Attachment #8964618 - Flags: review?(bugmail)
See Also: → 1183245
BTW, my local STR are:

1. Visit https://blog.wanderview.com/blog/2017/03/13/firefox-52-settimeout-changes/
2. Close the browser
3. Go to your profile dir and delete storage/default/https+++blog.wanderview.com
4. Open the browser and visit the URL in (1) again.

I'm going to see if I can trigger in a test now.
I can't seem to get an automated test to trigger this crash.  I think its too timing dependent.  I'll probably just go without a test.
Ben, did you try by using a Marionette test? Not sure which harness you were using but only Marionette offers a real restart of the browser.
(In reply to Ben Kelly [Part-time, not reviewing][:bkelly] from comment #15)
> Note, I can't completely explain the last frame in the crash stack.  It
> seems the strong ref from SWP should keep the WorkerPrivate and it's info's
> nsTArray at least in memory.  Its possible there is something further wrong
> due to WorkerRef, etc, or maybe the crash stack is just bogus.

Disassembly for the linux crasher https://crash-stats.mozilla.com/report/index/e9e5610e-6d94-4baa-b78e-940b50180403 via radare2 seems to show that we're dying dereferencing mLoadInfo.mInterfaceRequestor and it's null.  Specifically, we're in InterfaceRequestor::MaybeAddTabChild, we've returned from the call to do_GetWeakReference(tabChild) and are doing [(first arg to InterfaceRequestor::MaybeAddTabChild which should be the 'this') + 0x20], and the assembly then does another dereference on that with +0x20 before calling the actual nsTArray append method.

I need to go to dinner but will finish review this evening.
Ok.  I think that makes sense.  This clears the mInterfaceRequestor if we have started closing:

https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#2858
Sorry, its more likely cleared here:

https://searchfox.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#288

In any case, I still think checking the running status is a fine fix.
Attachment #8964592 - Flags: review?(bugmail) → review+
Attachment #8964618 - Flags: review?(bugmail) → review+
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d602877c654a
P1 If a service worker script self closes due to a compilation error, then don't re-use the worker for further events. r=asuth
https://hg.mozilla.org/integration/mozilla-inbound/rev/b054a5905c4b
P2 Don't save service worker time stamps if the fetch event didn't actually dispatch. r=asuth
https://hg.mozilla.org/mozilla-central/rev/d602877c654a
https://hg.mozilla.org/mozilla-central/rev/b054a5905c4b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Crash Signature: mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] [@ libxul.so@0x1be2eb7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] → mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] [@ libxul.so@0x1be2eb7 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded] [@ libxul.so@0x1bd9997 | mozilla::dom::ServiceWorkerPrivate::SpawnWorkerIfNeeded]
We've gone a few buildid's without this crash triggering, so I believe this fix worked.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: