Intermittent test_quit_restart.py TestServerQuitApplication.test_attempt_quit | application crashed [@ mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread]

RESOLVED FIXED in Firefox 56

Status

()

defect
--
critical
RESOLVED FIXED
2 years ago
5 months ago

People

(Reporter: intermittent-bug-filer, Assigned: baku)

Tracking

({crash, intermittent-failure, regression})

unspecified
mozilla56
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox54 unaffected, firefox55 unaffected, firefox56 fixed)

Details

(crash signature)

Attachments

(1 attachment, 2 obsolete attachments)

Not sure why this got filed under Headless. It's happening for all kind of builds during Marionette jobs.

Here the stack details:

[task 2017-06-22T15:24:09.981435Z] 15:24:09     INFO - Crash reason:  SIGSEGV
[task 2017-06-22T15:24:09.982957Z] 15:24:09     INFO - Crash address: 0x58
[task 2017-06-22T15:24:09.984698Z] 15:24:09     INFO - Process uptime: not available
[task 2017-06-22T15:24:09.986195Z] 15:24:09     INFO - 
[task 2017-06-22T15:24:09.987801Z] 15:24:09     INFO - Thread 0 (crashed)
[task 2017-06-22T15:24:09.989655Z] 15:24:09     INFO -  0  libxul.so!mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread [Response.h:0000dd6746c4 : 107 + 0x3]
[task 2017-06-22T15:24:09.990874Z] 15:24:09     INFO -     rax = 0x00007ffea920b1d0   rdx = 0x0000000000000000
[task 2017-06-22T15:24:09.992140Z] 15:24:09     INFO -     rcx = 0x0000000000000003   rbx = 0x00007f5499837dd0
[task 2017-06-22T15:24:09.993801Z] 15:24:09     INFO -     rsi = 0x00007ffea920b1d0   rdi = 0x0000000000000000
[task 2017-06-22T15:24:09.994974Z] 15:24:09     INFO -     rbp = 0x00007ffea920b250   rsp = 0x00007ffea920b1b0
[task 2017-06-22T15:24:09.995641Z] 15:24:09     INFO -      r8 = 0x0000000000000000    r9 = 0x00000000000007ca
[task 2017-06-22T15:24:09.996262Z] 15:24:09     INFO -     r10 = 0x00007ffea920b300   r11 = 0x0000000000000000
[task 2017-06-22T15:24:09.996408Z] 15:24:09     INFO -     r12 = 0x00007ffea920b1d0   r13 = 0x00007ffea920b38f
[task 2017-06-22T15:24:09.996994Z] 15:24:09     INFO -     r14 = 0x00007ffea920b1c0   r15 = 0x00007f54c285b710
[task 2017-06-22T15:24:09.997179Z] 15:24:09     INFO -     rip = 0x00007f54be8ee094
[task 2017-06-22T15:24:09.997734Z] 15:24:09     INFO -     Found by: given as instruction pointer in context
[task 2017-06-22T15:24:09.997979Z] 15:24:09     INFO -  1  libxul.so!BeginConsumeBodyRunnable<mozilla::dom::Response>::Run [FetchConsumer.cpp:0000dd6746c4 : 67 + 0x5]
[task 2017-06-22T15:24:09.998689Z] 15:24:09     INFO -     rbx = 0x00007f54cd064500   rbp = 0x00007ffea920b260
[task 2017-06-22T15:24:09.999260Z] 15:24:09     INFO -     rsp = 0x00007ffea920b260   r12 = 0x00007f54b406fc28
[task 2017-06-22T15:24:09.999987Z] 15:24:09     INFO -     r13 = 0x00007ffea920b38f   r14 = 0x00007ffea920b2d0
[task 2017-06-22T15:24:10.000514Z] 15:24:09     INFO -     r15 = 0x00007f54c285b710   rip = 0x00007f54be8ee3a7
[task 2017-06-22T15:24:10.001038Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.001099Z] 15:24:10     INFO -  2  libxul.so!nsThread::ProcessNextEvent(bool, bool*) + 0x36f
[task 2017-06-22T15:24:10.001585Z] 15:24:10     INFO -     rbx = 0x00007f54cd064500   rbp = 0x00007ffea920b370
[task 2017-06-22T15:24:10.002123Z] 15:24:10     INFO -     rsp = 0x00007ffea920b270   r12 = 0x00007f54b406fc28
[task 2017-06-22T15:24:10.002648Z] 15:24:10     INFO -     r13 = 0x00007ffea920b38f   r14 = 0x00007ffea920b2d0
[task 2017-06-22T15:24:10.003194Z] 15:24:10     INFO -     r15 = 0x00007f54c285b710   rip = 0x00007f54bfd5088f
[task 2017-06-22T15:24:10.003732Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.004277Z] 15:24:10     INFO -  3  libxul.so!NS_ProcessPendingEvents(nsIThread*, unsigned int) + 0x3b
[task 2017-06-22T15:24:10.004804Z] 15:24:10     INFO -     rbx = 0x0000000000000000   rbp = 0x00007ffea920b3c0
[task 2017-06-22T15:24:10.005315Z] 15:24:10     INFO -     rsp = 0x00007ffea920b380   r12 = 0x00007f54cd064500
[task 2017-06-22T15:24:10.005392Z] 15:24:10     INFO -     r13 = 0x00007ffea920b38f   r14 = 0x00000000ffffffff
[task 2017-06-22T15:24:10.005988Z] 15:24:10     INFO -     r15 = 0x00000000001e6fc2   rip = 0x00007f54c060a2bb
[task 2017-06-22T15:24:10.006487Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.008246Z] 15:24:10     INFO -  4  libxul.so!mozilla::ShutdownXPCOM(nsIServiceManager*) + 0xf1
[task 2017-06-22T15:24:10.008774Z] 15:24:10     INFO -     rbx = 0x00007f54cd064500   rbp = 0x00007ffea920b440
[task 2017-06-22T15:24:10.008842Z] 15:24:10     INFO -     rsp = 0x00007ffea920b3d0   r12 = 0x00007ffea920b400
[task 2017-06-22T15:24:10.009332Z] 15:24:10     INFO -     r13 = 0x00007f54cd0b3298   r14 = 0x00007ffea920b3f0
[task 2017-06-22T15:24:10.009846Z] 15:24:10     INFO -     r15 = 0x00007f54bfd39040   rip = 0x00007f54c0610601
[task 2017-06-22T15:24:10.010335Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.010412Z] 15:24:10     INFO -  5  libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup [nsAppRunner.cpp:0000dd6746c4 : 1467 + 0x8]
[task 2017-06-22T15:24:10.010924Z] 15:24:10     INFO -     rbx = 0x00007f54cd0bd1f8   rbp = 0x00007ffea920b470
[task 2017-06-22T15:24:10.011463Z] 15:24:10     INFO -     rsp = 0x00007ffea920b450   r12 = 0x0000000000000000
[task 2017-06-22T15:24:10.011549Z] 15:24:10     INFO -     r13 = 0x0000000000000000   r14 = 0x0000000000000000
[task 2017-06-22T15:24:10.011663Z] 15:24:10     INFO -     r15 = 0x00007f54bfd39040   rip = 0x00007f54c09fb832
[task 2017-06-22T15:24:10.012212Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.012804Z] 15:24:10     INFO -  6  libxul.so!mozilla::DefaultDelete<ScopedXPCOMStartup>::operator() [UniquePtr.h:0000dd6746c4 : 528 + 0x5]
[task 2017-06-22T15:24:10.013425Z] 15:24:10     INFO -     rbx = 0x00007f54cd0bd1f8   rbp = 0x00007ffea920b490
[task 2017-06-22T15:24:10.013534Z] 15:24:10     INFO -     rsp = 0x00007ffea920b480   r12 = 0x0000000000000000
[task 2017-06-22T15:24:10.014501Z] 15:24:10     INFO -     r13 = 0x0000000000000000   r14 = 0x0000000000000000
[task 2017-06-22T15:24:10.015044Z] 15:24:10     INFO -     r15 = 0x00007f54bfd39040   rip = 0x00007f54c09fb7a3
[task 2017-06-22T15:24:10.015165Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.015786Z] 15:24:10     INFO -  7  libxul.so!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) + 0x3a5
[task 2017-06-22T15:24:10.016425Z] 15:24:10     INFO -     rbx = 0x00007ffea920b530   rbp = 0x00007ffea920b520
[task 2017-06-22T15:24:10.017047Z] 15:24:10     INFO -     rsp = 0x00007ffea920b4a0   r12 = 0x0000000000000000
[task 2017-06-22T15:24:10.017220Z] 15:24:10     INFO -     r13 = 0x0000000000000000   r14 = 0x0000000000000000
[task 2017-06-22T15:24:10.018086Z] 15:24:10     INFO -     r15 = 0x00007f54bfd39040   rip = 0x00007f54c09f8655
[task 2017-06-22T15:24:10.018581Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.019257Z] 15:24:10     INFO -  8  libxul.so!XRE_main [nsAppRunner.cpp:0000dd6746c4 : 4869 + 0x5]
[task 2017-06-22T15:24:10.019324Z] 15:24:10     INFO -     rbx = 0x00007ffea920b530   rbp = 0x00007ffea920b700
[task 2017-06-22T15:24:10.019880Z] 15:24:10     INFO -     rsp = 0x00007ffea920b530   r12 = 0x00007ffea920b680
[task 2017-06-22T15:24:10.020473Z] 15:24:10     INFO -     r13 = 0x0000000000000005   r14 = 0x00007ffea920c868
[task 2017-06-22T15:24:10.021108Z] 15:24:10     INFO -     r15 = 0x00007ffea920b720   rip = 0x00007f54c09f827f
[task 2017-06-22T15:24:10.021164Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.021735Z] 15:24:10     INFO -  9  firefox!do_main(int, char**, char**) + 0x8a
[task 2017-06-22T15:24:10.022342Z] 15:24:10     INFO -     rbx = 0x0000000000421a73   rbp = 0x00007ffea920c750
[task 2017-06-22T15:24:10.022407Z] 15:24:10     INFO -     rsp = 0x00007ffea920b710   r12 = 0x00007ffea920e7bc
[task 2017-06-22T15:24:10.023005Z] 15:24:10     INFO -     r13 = 0x00007ffea920c868   r14 = 0x00007f54c2791050
[task 2017-06-22T15:24:10.023630Z] 15:24:10     INFO -     r15 = 0x0000000000000005   rip = 0x000000000041b34a
[task 2017-06-22T15:24:10.023724Z] 15:24:10     INFO -     Found by: call frame info
[task 2017-06-22T15:24:10.024255Z] 15:24:10     INFO - 10  firefox!main + 0x74
[task 2017-06-22T15:24:10.024885Z] 15:24:10     INFO -     rbx = 0x00007ffea920c868   rbp = 0x00007ffea920c780
[task 2017-06-22T15:24:10.025487Z] 15:24:10     INFO -     rsp = 0x00007ffea920c760   r12 = 0x0000000000000005
[task 2017-06-22T15:24:10.025550Z] 15:24:10     INFO -     r13 = 0x00007ffea920c898   r14 = 0x000001cf98e5226c
[task 2017-06-22T15:24:10.026278Z] 15:24:10     INFO -     r15 = 0x0000000000000000   rip = 0x0000000000411854
[task 2017-06-22T15:24:10.026397Z] 15:24:10     INFO -     Found by: call frame info

Maybe this is the same underlying crash as bug 1374922? Andrea, can you please have a look? Thanks.
Component: Headless → DOM
Flags: needinfo?(amarchesini)
Product: Firefox → Core
(In reply to Henrik Skupin (:whimboo) from comment #1)
> Maybe this is the same underlying crash as bug 1374922? Andrea, can you
> please have a look? Thanks.

Andrea, can you please check?
Duplicate of this bug: 1377393
Posted patch crash.patch (obsolete) — Splinter Review
This patch does several things:

1. mBody and mGlobal are nullified only in the DTOR of FetchConsumer.
2. mShuttingDown is a boolean that is set to true when the consuming is done, or aborted.
3. mShuttingDown is used to avoid the beginning of the body consumption in case Fetch has been already aborted.
4. mConsumeBodyPump is nullified differently, directly when mShuttingDown is set to true.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8882749 - Flags: review?(bkelly)
Posted patch crash.patch (obsolete) — Splinter Review
Same as before, but nsProxyRelease needs to use CancelableRunnables.
Attachment #8882749 - Attachment is obsolete: true
Attachment #8882749 - Flags: review?(bkelly)
Attachment #8882803 - Flags: review?(bkelly)
Crash Signature: [@ mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread] → [@ mozilla::dom::FetchBodyConsumer<mozilla::dom::Response>::BeginConsumeBodyMainThread] [@ mozilla::dom::FetchBodyConsumer<T>::BeginConsumeBodyMainThread]
This is top #14 of Nightly 20170702030204 on Windows.
Comment on attachment 8882803 [details] [diff] [review]
crash.patch

Review of attachment 8882803 [details] [diff] [review]:
-----------------------------------------------------------------

I think we need to better handle the case where the worker thread is shutdown.  I don't think the NS_ProxyRelease() stuff will work reliably since its not covered within a WorkerHolder, AFAICT.  Also, our event targets on workers are a bit of a mess and make it harder to do this correctly.

::: dom/fetch/FetchConsumer.cpp
@@ +391,1 @@
>    : mTargetThread(NS_GetCurrentThread())

I think its better to use aGlobalObject->EventTargetFor(TaskCategory::Other) instead of NS_GetCurrentThread().  This will ensure the runnable is labeled properly.

@@ +408,5 @@
>  {
>    NS_ProxyRelease("FetchBodyConsumer::mBody",
>                    mTargetThread, mBody.forget());
> +  NS_ProxyRelease("FetchBodyConsumer::mGlobal",
> +                  mTargetThread, mGlobal.forget());

So, this has a reasonable chance of leaking on a Worker thread.  I think we should at least have a WorkerHolder here to ensure the thread is still alive.

Also, I have some patches I am working on to make the WorkerPrivate::EventTarget class continue to accept runnables as long as a WorkerHolder is active.  This would avoid these leaks.  Alternatively, use the WorkerPrivate ControlEventTarget for the mTargetThread here.

So should the consumer be its own Holder instead of being ref'd from FetchBodyWorkerHolder?

::: xpcom/threads/nsProxyRelease.h
@@ +31,4 @@
>  {
>  public:
>    ProxyReleaseEvent(const char* aName, already_AddRefed<T> aDoomed)
> +  : CancelableRunnable(aName), mDoomed(aDoomed.take()) {}

I think this should override Cancel() to call Run(), right?  We must execute the release on the target thread.
Attachment #8882803 - Flags: review?(bkelly) → review-
Posted patch crash.patchSplinter Review
Better approach without extra WorkerHolders.
Attachment #8882803 - Attachment is obsolete: true
Attachment #8884251 - Flags: review?(bkelly)
Comment on attachment 8884251 [details] [diff] [review]
crash.patch

Review of attachment 8884251 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to chat in IRC before r+'ing this.  Are you around today?

::: dom/fetch/FetchConsumer.cpp
@@ +414,5 @@
>  void
>  FetchBodyConsumer<Derived>::AssertIsOnTargetThread() const
>  {
> +  MOZ_ASSERT_IF(mGlobal,
> +                mGlobal->EventTargetFor(TaskCategory::Other) == mTargetThread);

Can you write this as `MOZ_ASSERT(mTargetThread->IsOnCurrentThread())`?  I think we sometimes return different event targets from EventTargetFor() on the main thread if we've entered xpcom shutdown, etc.

Note, if its a WorkerPrivate::EventTarget then this will return false if the worker has started shutting down.  I think we should probably make that work through shutdown, though.
Attachment #8884251 - Flags: review?(bkelly)
Duplicate of this bug: 1378522
See this bug for something that might help here.
See Also: → 1379243
Comment on attachment 8884251 [details] [diff] [review]
crash.patch

I'm going to go ahead and r+ this since its a top crasher.  I don't have any specific objections to the patch.  I would like to chat at some point, though, as its not 100% clear to me how these changes fix the crash.  You could also simply post a description of what the crash was and how this fixes in this bug.

Also, please consider if you need bug 1379243 for the mGlobal->EventTargetFor() stuff to work right on worker threads.
Attachment #8884251 - Flags: review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6a2348df217
FetchConsumer must check if the operation has been aborted before starting, r=bkelly
https://hg.mozilla.org/mozilla-central/rev/e6a2348df217
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Please request Beta/ESR52 approval on this (and associated bugs) whenever we're feeling confident that the crashes are solved.
Flags: needinfo?(amarchesini)
Comment on attachment 8884251 [details] [diff] [review]
crash.patch

Approval Request Comment
[Feature/Bug causing the regression]: Fetch API
[User impact if declined]: UAF in the Fetch Body consumption.
[Is this code covered by automated tests?]: no, it's racy.
[Has the fix been verified in Nightly?]: it was an intermittent failure.
[Needs manual test from QE? If yes, steps to reproduce]: none
[List of other uplifts needed for the feature/fix]: 1374922
[Is the change risky?]: not too much.
[Why is the change risky/not risky?]: There is a special boolean value to avoid the creation of a fetch stream pump in case the operation has been already aborted before starting.
[String changes made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8884251 - Flags: approval-mozilla-esr52?
Attachment #8884251 - Flags: approval-mozilla-beta?
Comment on attachment 8884251 [details] [diff] [review]
crash.patch

Should fix a crash, let's uplift this for beta 9.
Attachment #8884251 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8884251 [details] [diff] [review]
crash.patch

Since this is fallout from bug 1371889 let's uplift to ESR 53.3.0.
Attachment #8884251 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
This needs rebased patches for Beta/ESR52.
Flags: needinfo?(amarchesini)
Actually, we need to uplift bugs 1373555, 1374922 and 1375749.
Do we want to proceed? the crash seems gone in m-i/m-c.
Flags: needinfo?(amarchesini) → needinfo?(ryanvm)
Do we need bug 1380604 uplifted as well?  That fix seems related to the FetchConsumer stuff here.
Flags: needinfo?(amarchesini)
That one as well, yes.
Flags: needinfo?(amarchesini)
Keywords: regression
Flags: needinfo?(ryanvm)
This won't be landing on 55.
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.