Closed
Bug 1381748
Opened 7 years ago
Closed 6 years ago
Intermittent test_quit_restart.py TestServerQuitApplication.test_attempt_quit | application crashed [@ RefPtr<mozilla::dom::FetchBody<mozilla::dom::Response> >::operator->] after assertion failure: mRawPtr != nullptr
Categories
(Core :: DOM: Service Workers, defect, P1)
Core
DOM: Service Workers
Tracking
()
RESOLVED
FIXED
mozilla57
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | unaffected |
firefox56 | --- | fixed |
firefox57 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: baku)
References
Details
(Keywords: crash, intermittent-failure)
Crash Data
Attachments
(6 files)
10.06 KB,
patch
|
catalinb
:
review+
|
Details | Diff | Splinter Review |
2.49 KB,
patch
|
catalinb
:
review+
|
Details | Diff | Splinter Review |
7.48 KB,
patch
|
catalinb
:
review+
|
Details | Diff | Splinter Review |
10.08 KB,
patch
|
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
2.57 KB,
patch
|
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
8.66 KB,
patch
|
lizzard
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Filed by: hskupin [at] mozilla.com https://treeherder.mozilla.org/logviewer.html#?job_id=114685902&repo=mozilla-central https://queue.taskcluster.net/v1/task/YNnmNffZTmGZSqJPmVnyFw/runs/0/artifacts/public/logs/live_backing.log Looks like the patch on bug 1375749 didn't fix it completely
Comment 1•7 years ago
|
||
Btw. there is an assertion: Assertion failure: mRawPtr != nullptr (You can't dereference a NULL RefPtr with operator->().), at /home/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:315 [task 2017-07-16T15:30:27.407601Z] 15:30:27 INFO - #01: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x19efa9c] [task 2017-07-16T15:30:27.409850Z] 15:30:27 INFO - #02: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x19f236a] [task 2017-07-16T15:30:27.409918Z] 15:30:27 INFO - #03: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x19f278a] [task 2017-07-16T15:30:27.410567Z] 15:30:27 INFO - #04: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x548ca5] [task 2017-07-16T15:30:27.410642Z] 15:30:27 INFO - #05: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x544663] [task 2017-07-16T15:30:27.410723Z] 15:30:27 INFO - #06: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x55f03c] [task 2017-07-16T15:30:27.411567Z] 15:30:27 INFO - #07: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x319e5a4] [task 2017-07-16T15:30:27.412064Z] 15:30:27 INFO - #08: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x319e5dd] [task 2017-07-16T15:30:27.412778Z] 15:30:27 INFO - #09: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x31a5962] [task 2017-07-16T15:30:27.413453Z] 15:30:27 INFO - #10: ???[/home/worker/workspace/build/application/firefox/libxul.so +0x31a5bd6] [task 2017-07-16T15:30:27.414110Z] 15:30:27 INFO - #11: ???[/home/worker/workspace/build/application/firefox/firefox +0x55fd] [task 2017-07-16T15:30:27.414508Z] 15:30:27 INFO - #12: ???[/home/worker/workspace/build/application/firefox/firefox +0x4e06] [task 2017-07-16T15:30
status-firefox56:
--- → affected
Flags: needinfo?(amarchesini)
Summary: Intermittent test_quit_restart.py TestServerQuitApplication.test_attempt_quit | application crashed [@ RefPtr<mozilla::dom::FetchBody<mozilla::dom::Response> >::operator->] → Intermittent test_quit_restart.py TestServerQuitApplication.test_attempt_quit | application crashed [@ RefPtr<mozilla::dom::FetchBody<mozilla::dom::Response> >::operator->] after assertion failure: mRawPtr != nullptr
Assignee | ||
Comment 2•7 years ago
|
||
This has been fixed by bug 1380604.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(amarchesini)
Resolution: --- → DUPLICATE
Comment 3•7 years ago
|
||
No, it's not. The patch for bug 1380604 got merged to autoland on July 13th: https://hg.mozilla.org/integration/autoland/rev/530b1eb0a548 But this failure is still present: https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=mn&bugfiler&fromchange=0ee162368b58416628490ee3cd97e3f7f6bf9b77&selectedJob=115465007
Status: RESOLVED → REOPENED
Flags: needinfo?(amarchesini)
Resolution: DUPLICATE → ---
Comment 4•7 years ago
|
||
03:01:04 INFO - 1500433264831 Marionette DEBUG Received observer notification "xpcom-shutdown" [task 2017-07-19T03:01:04.841559Z] 03:01:04 INFO - 1500433264831 Marionette DEBUG Received observer notification "xpcom-shutdown" [task 2017-07-19T03:01:04.870654Z] 03:01:04 INFO - [Parent 3444] WARNING: 'NS_FAILED(aStatus)', file /home/worker/workspace/build/src/dom/fetch/FetchConsumer.cpp, line 558 [task 2017-07-19T03:01:04.870752Z] 03:01:04 INFO - [Parent 3444] WARNING: 'rv.Failed()', file /home/worker/workspace/build/src/dom/fetch/FetchConsumer.cpp, line 690 It's the same null dereference. Looks like the sync runnable dispatch at http://searchfox.org/mozilla-central/source/dom/fetch/FetchConsumer.cpp#689 can still fail during shutdown.
Assignee | ||
Comment 5•7 years ago
|
||
The issue here is that, on shutdown, the worker decides to do not create a sync event loop; we don't cleanup the main-thread part of FetchConsumer; FetchConsumer crashes using a null mBody pointer. This first patch removes mBody from FetchConsumer. Note the real fix is in patch 3, but this is nice to have: only thread-safe objects are touched on main-thread if main-thread is not the target thread.
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8887910 -
Flags: review?(bkelly)
Assignee | ||
Comment 6•7 years ago
|
||
Completely unrelated patch. I can move it to a separate bug if you prefer.
Attachment #8887911 -
Flags: review?(bkelly)
Assignee | ||
Comment 7•7 years ago
|
||
Note that this patch is based on your work on control runnables. I don't use any control runnables! The issue here is that, if the creation of the sync event loop fails, FetchConsumer releases resources also when on the main-thread they will be used, eventually. The new approach doesn't release resources when the WorkerHolder is notified. But instead, it dispatches and event on the main-thread in order to start the shutting down. I have identified 3 scenarios: 1. the beginning of the consuming is not started yet. In this case, AutoFailConsumeBody will end up calling ContinueConsumeBody() in BeginConsumeBodyMainThread(). 2. the reading is running. The pump is canceled. OnStreamComplete() will be called. And from here we end up calling ContinueConsumeBody(). 3. the reading is completed. ContinueConsumeBody() has been scheduled and it will run eventually. In all these scenarios, the pump is nullified on the main-thread because, if it is created, we always run OnStreamComplete() - except for scenario 1. ContinueConsumeBody() is called when the pump is always null and we can release resources. Nothing should run after it.
Attachment #8887916 -
Flags: review?(bkelly)
Updated•7 years ago
|
Priority: -- → P1
Updated•7 years ago
|
status-firefox54:
--- → unaffected
status-firefox55:
--- → unaffected
status-firefox-esr52:
--- → affected
Comment 8•7 years ago
|
||
Comment on attachment 8887910 [details] [diff] [review] part 1 - Remove mBody Redirecting to Catalin since I believe he was looking at this code recently and I may not have time to do this before next week.
Attachment #8887910 -
Flags: review?(bkelly) → review?(catalin.badea392)
Updated•7 years ago
|
Attachment #8887911 -
Flags: review?(bkelly) → review?(catalin.badea392)
Updated•7 years ago
|
Attachment #8887916 -
Flags: review?(bkelly) → review?(catalin.badea392)
Comment 9•7 years ago
|
||
(In reply to Andrea Marchesini [:baku] from comment #7) > Created attachment 8887916 [details] [diff] [review] > part 3 - shutdown > > Note that this patch is based on your work on control runnables. I don't use > any control runnables! Does this mean it depends on bug 1379243? I was planning to land that friday after I got back home, but I suppose we could try to land sooner if needed. Note, I need to remove that printf_stderr(), so I don't think we can just do a checkin-needed with the currently uploaded patches.
Comment 10•7 years ago
|
||
Comment on attachment 8887910 [details] [diff] [review] part 1 - Remove mBody Review of attachment 8887910 [details] [diff] [review]: ----------------------------------------------------------------- lgtm
Attachment #8887910 -
Flags: review?(catalin.badea392) → review+
Updated•7 years ago
|
Attachment #8887911 -
Flags: review?(catalin.badea392) → review+
Comment 11•7 years ago
|
||
Comment on attachment 8887916 [details] [diff] [review] part 3 - shutdown Review of attachment 8887916 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Andrea Marchesini [:baku] from comment #7) > Created attachment 8887916 [details] [diff] [review] > part 3 - shutdown > > Note that this patch is based on your work on control runnables. I don't use > any control runnables! AutoFailConsumeBody uses control runnables. Could you please explain what's the dependency on the other bug? ::: dom/fetch/FetchConsumer.cpp @@ +454,5 @@ > FetchBodyConsumer<Derived>::BeginConsumeBodyMainThread() > { > AssertIsOnMainThread(); > > + AutoFailConsumeBody<Derived> autoReject(this); Is this guaranteed to succeed in dispatching the runnable by the worker holder? I'm thinking of the following case: 1. queue BeginConsumeBodyMainThread() <long backlog on the main thread> 2. Worker gets terminated 3. BeginConsumeBodyMainThread gets executed. Can we still dispatch a runnable to release objects at this point? @@ +661,2 @@ > > + mMainThreadEventTarget->Dispatch(r.forget(), NS_DISPATCH_NORMAL); Can this fail during browser shutdown? The logs suggest that the previous sync runnable dispatch r->Dispatch(Killing, rv) can still fail. That's either because the worker is in >=Killing state or because the main thread dispatch failed. But since we have a WorkerHolder preventing the shutdown, we shouldn't reach the Killing state, right?
Comment 12•7 years ago
|
||
(In reply to Ben Kelly [PTO, back July 24][:bkelly] from comment #9) > > Note that this patch is based on your work on control runnables. I don't use > > any control runnables! > > Does this mean it depends on bug 1379243? FWIW, I pushed bug 1379243 to inbound this morning.
![]() |
||
Updated•6 years ago
|
Summary: Intermittent test_quit_restart.py TestServerQuitApplication.test_attempt_quit | application crashed [@ RefPtr<mozilla::dom::FetchBody<mozilla::dom::Response> >::operator->] after assertion failure: mRawPtr != nullptr → Intermittent test_quit_restart.py TestServerQuitApplication.test_attempt_quit | application crashed [@ RefPtr<mozilla::dom::FetchBody<mozilla::dom::Response> >::operator->] after assertion failure: mRawPtr != nullptr
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 14•6 years ago
|
||
Sorry I didn't see your comment. > AutoFailConsumeBody uses control runnables. Could you please explain what's > the dependency > on the other bug? bug 1379243 is about 'converting' normal runnables to ControlRunnables when a worker is shutting down. This helps to do the shutting down. > <long backlog on the main thread> > 2. Worker gets terminated > > 3. BeginConsumeBodyMainThread gets executed. Can we still dispatch a > runnable to release objects at this point? The holder is not released, right? So this means that the it's still possible to dispatch runnables. Using bug 1379243, the normal runnable will be converted to a control one and it should be dispatched. > The logs suggest that the previous sync runnable dispatch > r->Dispatch(Killing, rv) can still fail. That's either because the worker is > in >=Killing state or because the main thread dispatch failed. But since we > have a WorkerHolder preventing the shutdown, we shouldn't reach the Killing > state, right? Worker fails in the creation of the sync event loop because it is shutting down. But then we still release the WorkerHolder (see the RAII class), and the body. When BeginConsumeBodyMainThread runs, mBody is null, and we crash. Here we don't create any sync event loop. We just keep the worker alive and we wait to receive a runnable (maybe a Control one).
Updated•6 years ago
|
Attachment #8887916 -
Flags: review?(catalin.badea392) → review+
Comment 15•6 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/2a8ceea666e9 Cleanup FetchConsumer workflow - part 1 - no mBody, r=catalinb https://hg.mozilla.org/integration/mozilla-inbound/rev/067897f212ac Cleanup FetchConsumer workflow - part 2 - cleanup RegisterWorkerHolder, r=catalinb https://hg.mozilla.org/integration/mozilla-inbound/rev/da550c5c845f Cleanup FetchConsumer workflow - part 3 - shutdown workflow, r=catalinb
Comment 16•6 years ago
|
||
sorry had to back this out for frequent failures on android like https://treeherder.mozilla.org/logviewer.html#?job_id=117908955&repo=mozilla-inbound starting with this push here
Flags: needinfo?(amarchesini)
Comment 17•6 years ago
|
||
Backout by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/19f1250e37cd Backed out changeset da550c5c845f https://hg.mozilla.org/integration/mozilla-inbound/rev/299ae6eb5ec1 Backed out changeset 067897f212ac https://hg.mozilla.org/integration/mozilla-inbound/rev/fa20618aedb5 Backed out changeset 2a8ceea666e9 for frequent android failures in test_postMessage_override.html
Comment 18•6 years ago
|
||
That looks like an unrelated test failure.
Assignee | ||
Comment 19•6 years ago
|
||
It looks so, but somehow it exists only when those patches are in the tree.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•6 years ago
|
status-firefox57:
--- → affected
Comment 22•6 years ago
|
||
Do we have an update here? This crash is still happening a lot.
Comment hidden (Intermittent Failures Robot) |
Comment 24•6 years ago
|
||
Cătălin and Ben, is there someone else who could pick this up until Andrea is back?
Flags: needinfo?(catalin.badea392)
Flags: needinfo?(bkelly)
Comment 25•6 years ago
|
||
Andrew, see question in comment 24.
Flags: needinfo?(bkelly) → needinfo?(overholt)
Comment 26•6 years ago
|
||
(In reply to Henrik Skupin (:whimboo) from comment #24) > Cătălin and Ben, is there someone else who could pick this up until Andrea > is back? Same answer as the other bug. Is this super-urgent?
Flags: needinfo?(overholt) → needinfo?(hskupin)
Comment 27•6 years ago
|
||
We are close to get warnings from OF but given comment 4 it should be a null pointer crash too. So depending the trend for the next days, we should be able to wait.
Flags: needinfo?(hskupin)
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 30•6 years ago
|
||
Andrea, do you have an update for us now that you are back? Since lately it seems to be a beta only failure. Thanks.
Updated•6 years ago
|
Flags: needinfo?(amarchesini)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•6 years ago
|
Flags: needinfo?(amarchesini)
Comment 32•6 years ago
|
||
Pushed by amarchesini@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4206a9271660 Cleanup FetchConsumer workflow - part 1 - no mBody, r=catalinb https://hg.mozilla.org/integration/mozilla-inbound/rev/30aeba7deb29 Cleanup FetchConsumer workflow - part 2 - cleanup RegisterWorkerHolder, r=catalinb https://hg.mozilla.org/integration/mozilla-inbound/rev/c309a6360b2f Cleanup FetchConsumer workflow - part 3 - shutdown workflow, r=catalinb
![]() |
||
Comment 33•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4206a9271660 https://hg.mozilla.org/mozilla-central/rev/30aeba7deb29 https://hg.mozilla.org/mozilla-central/rev/c309a6360b2f
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 34•6 years ago
|
||
We'll eventually want to backport this to Beta, but probably not until early next week after it's had some bake time.
Comment 39•6 years ago
|
||
Is this ready for a Beta approval request? At this point, it wouldn't ship in a Beta until b11 next week.
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 40•6 years ago
|
||
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 41•6 years ago
|
||
Assignee | ||
Comment 42•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Attachment #8905995 -
Attachment is patch: true
Comment 43•6 years ago
|
||
Did you mean to request approval on these too? :)
Flags: needinfo?(amarchesini)
Assignee | ||
Comment 44•6 years ago
|
||
Comment on attachment 8905995 [details] [diff] [review] m-b - part 1 Approval Request Comment [Feature/Bug causing the regression]: Fetch + workers [User impact if declined]: a crash when fetch is running in a worker that is terminating. Race condition. [Is this code covered by automated tests?]: no [Has the fix been verified in Nightly?]: no but we don't have crashes on try. So I guess we can consider it verified. [Needs manual test from QE? If yes, steps to reproduce]: none [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: maybe... [Why is the change risky/not risky?]: this is a rewrite of the canceling of the fetch-body consumption. It's a lot of code, but it's green on try, it fixes crashes, and it's important to have it on beta. [String changes made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8905995 -
Flags: approval-mozilla-beta?
Assignee | ||
Updated•6 years ago
|
Attachment #8905996 -
Flags: approval-mozilla-beta?
Assignee | ||
Updated•6 years ago
|
Attachment #8905997 -
Flags: approval-mozilla-beta?
Comment hidden (Intermittent Failures Robot) |
Comment on attachment 8905995 [details] [diff] [review] m-b - part 1 Big change but this should fix a lot of test failures/crashes. Try builds look good so let's uplift it for beta 11 today.
Attachment #8905995 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8905996 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8905997 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 47•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/cec5d3fc21b1 https://hg.mozilla.org/releases/mozilla-beta/rev/32d8da2721a2 https://hg.mozilla.org/releases/mozilla-beta/rev/335fe07a77ed
Updated•6 years ago
|
Blocks: 1375749, CVE-2017-7793
No longer depends on: 1375749
Comment hidden (Intermittent Failures Robot) |
You need to log in
before you can comment on or make changes to this bug.
Description
•