Closed
Bug 1330693
Opened 7 years ago
Closed 7 years ago
Crash in mozilla::dom::workers::ServiceWorkerJobQueue::JobFinished
Categories
(Core :: DOM: Service Workers, defect)
Tracking
()
RESOLVED
FIXED
mozilla53
People
(Reporter: bkelly, Assigned: bkelly)
References
Details
(Keywords: crash)
Crash Data
Attachments
(1 file)
4.47 KB,
patch
|
ehsan.akhgari
:
review+
gchang
:
approval-mozilla-aurora+
gchang
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-460ce510-1953-43fd-9be1-11a732170112. ============================================================= Our double-completion checks in JobFinish() now trigger the release assertions in nsTArray. We should change this to make an explicit IsEmpty() check.
Assignee | ||
Comment 1•7 years ago
|
||
Actually, I think this is more about calling JobFinished() on a nullptr mFinalCallback object.
Assignee | ||
Comment 2•7 years ago
|
||
Ehsan, given the crash stack in this bug I think we still have some trouble with double-completion of SW jobs. This patch improves our protections by checking further up the completion call chain. In this case I think the crash is from calling JobFinished() on a nullptr mFinalCallback for some reason. I also added more diagnostic assertions to try to narrow down the problem.
Attachment #8826243 -
Flags: review?(ehsan)
Comment 3•7 years ago
|
||
Hmm. So there are only two crashes with this signature so far. It's obvious that this isn't SafeElementAt failing. But I also think it's clear this isn't a nullptr dereference crash, given that the crash address in comment 0 is 0x1612004 and the crash address in bp-aac6a324-b920-4a7c-8cef-4fe292170111 is 0xfffffffff35a5fb0 (nullptr deref crashes typically occur at an address close to 0, for example, accessing a member with offset N would crash on address N if |this| is nullptr.) More specifically, since ServiceWorkerJobQueue::Callback::JobFinished is virtual, if mFinalCallback was 0 then the deref of the |this| pointer to get to the vtable would crash at address 0. So I think there is something else going on here.
Comment 4•7 years ago
|
||
Comment on attachment 8826243 [details] [diff] [review] Try to handle SW job double-completion better. r=ehsan Review of attachment 8826243 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/workers/ServiceWorkerJob.cpp @@ +215,5 @@ > > mState = State::Finished; > > + MOZ_DIAGNOSTIC_ASSERT(mFinalCallback); > + if (mFinalCallback) { As I said before, I'm pretty sure this isn't going to solve the crash, but still the patch may uncover the bug we're chasing here. FWIW I suggest looking at the crash dumps for this, let me know if you want me to download them for you. But since we only have two of these crashes, it's up to you how much time you want to spend on debugging it... :-)
Attachment #8826243 -
Flags: review?(ehsan) → review+
Assignee | ||
Comment 5•7 years ago
|
||
In the last 24 hours we've gotten 3 more reports that suggest nullptr of mFinalCallback: https://crash-stats.mozilla.com/report/index/571939d5-08eb-4fa2-b654-459832170113 Is our nullptr de-ref working as expected in release?
Flags: needinfo?(ehsan)
Assignee | ||
Comment 6•7 years ago
|
||
Comment on attachment 8826243 [details] [diff] [review] Try to handle SW job double-completion better. r=ehsan Approval Request Comment [Feature/Bug causing the regression]: Service workers [User impact if declined]: Low frequency crashes [Is this code covered by automated tests?]: Its heavily tested, but our automation does not trigger this problem. [Has the fix been verified in Nightly?]: It has not yet merged to nightly, but I'd like to flag now so the patch might make it into the beta RC next week. [Needs manual test from QE? If yes, steps to reproduce]: None [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Minimal risk [Why is the change risky/not risky?]: This patch adds some additional state and nullptr checks. It has a very low risk of regression. [String changes made/needed]: None Note, I don't think this patch is very speculative. See the crash stack in comment 5 which is pretty clearly a nullptr de-ref on FF51b that is fixed by this patch.
Attachment #8826243 -
Flags: approval-mozilla-beta?
Attachment #8826243 -
Flags: approval-mozilla-aurora?
Comment 7•7 years ago
|
||
Ben are you going to land this soon so that it might get into m-c? The sheriffs do often handle some merges over the weekend. I will be checking back on uplifts tomorrow.
status-firefox51:
--- → affected
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Flags: needinfo?(bkelly)
Comment 8•7 years ago
|
||
Gerry, just letting you know you might want this for tomorrow's RC build.
Flags: needinfo?(gchang)
Comment 9•7 years ago
|
||
Comment on attachment 8826243 [details] [diff] [review] Try to handle SW job double-completion better. r=ehsan Fix a crash. Beta51+ & Aurora52+. Should be in 51 RC.
Flags: needinfo?(gchang)
Attachment #8826243 -
Flags: approval-mozilla-beta?
Attachment #8826243 -
Flags: approval-mozilla-beta+
Attachment #8826243 -
Flags: approval-mozilla-aurora?
Attachment #8826243 -
Flags: approval-mozilla-aurora+
Comment 10•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-aurora/rev/f987327e5c36
Comment 11•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/15467610e733
Comment 12•7 years ago
|
||
ben, i landed this on gchangs request on beta and aurora. Does this need to land on trunk too ? (i guess so from the status flags)
Assignee | ||
Comment 13•7 years ago
|
||
Sorry, I thought I landed this. I think maybe the tree was closed when I tried to push and forgot to try again. Landing now.
Flags: needinfo?(bkelly)
Comment 14•7 years ago
|
||
Pushed by bkelly@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/9b84a435cd2a Try to handle SW job double-completion better. r=ehsan
Assignee | ||
Updated•7 years ago
|
Crash Signature: [@ mozilla::dom::workers::ServiceWorkerJobQueue::JobFinished] → [@ mozilla::dom::workers::ServiceWorkerJobQueue::JobFinished]
[@ mozilla::dom::workers::ServiceWorkerJob::Finish ]
Comment 15•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/9b84a435cd2a
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment 16•7 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #5) > In the last 24 hours we've gotten 3 more reports that suggest nullptr of > mFinalCallback: > > https://crash-stats.mozilla.com/report/index/571939d5-08eb-4fa2-b654- > 459832170113 Yeah this is definitely a clear case of a null deref. > Is our nullptr de-ref working as expected in release? What do you mean?
Flags: needinfo?(ehsan) → needinfo?(bkelly)
Assignee | ||
Comment 17•7 years ago
|
||
I mean, are you sure things like PGO, other optimizations, or some kind of runtime config could be allowing us to traverse that nullptr vtable on windows release? I feel like I see a lot of crash reports where it seems we have made it through a nullptr read.
Flags: needinfo?(bkelly)
Comment 18•7 years ago
|
||
If the vtable pointer is null and we do try to make a virtual call, we have to dereference that address. Barring that, there are only two cases that I can think of where something like what you explain can happen. 1. Somebody having mapped a readable page at address 0. Technically we don't protect against that, but if some malicious code does that then null pointer derefs will return what's at the address. This value isn't probably specific to the call site though, and will be some constant, and interpreting that as a vtable can have arbitrary results. But the chances of you seeing something on the stack past that point that makes sense are slim. 2. The call being devirtualized by the optimizer, if the compiler can figure out the concrete type of the object at hand. Here, the compiler can't know anything about mFinalCallback since it's set based on what the caller passed to Start(), so there's no way for it to devirtualize. Just to be certain, I checked the code in Nightly and there's definitely a virtual call there: if (mFinalCallback) { mFinalCallback->JobFinished(this, aRv); 66B52860 mov eax,dword ptr [ecx] if (mFinalCallback) { mFinalCallback->JobFinished(this, aRv); 66B52862 push ebx 66B52863 push esi 66B52864 call dword ptr [eax]
Assignee | ||
Comment 19•7 years ago
|
||
(In reply to :Ehsan Akhgari from comment #18) > 2. The call being devirtualized by the optimizer, if the compiler can figure > out the concrete type of the object at hand. Here, the compiler can't know > anything about mFinalCallback since it's set based on what the caller passed > to Start(), so there's no way for it to devirtualize. Just to be certain, I > checked the code in Nightly and there's definitely a virtual call there: > > if (mFinalCallback) { > mFinalCallback->JobFinished(this, aRv); > 66B52860 mov eax,dword ptr [ecx] > if (mFinalCallback) { > mFinalCallback->JobFinished(this, aRv); > 66B52862 push ebx > 66B52863 push esi > 66B52864 call dword ptr [eax] Is this a windows PGO build? Because mFinalCallback will only ever have a single concrete type set for it; ServiceWorkerJobQueue::Callback. It does seem optimizable to me.
Comment 20•7 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #19) > (In reply to :Ehsan Akhgari from comment #18) > > 2. The call being devirtualized by the optimizer, if the compiler can figure > > out the concrete type of the object at hand. Here, the compiler can't know > > anything about mFinalCallback since it's set based on what the caller passed > > to Start(), so there's no way for it to devirtualize. Just to be certain, I > > checked the code in Nightly and there's definitely a virtual call there: > > > > if (mFinalCallback) { > > mFinalCallback->JobFinished(this, aRv); > > 66B52860 mov eax,dword ptr [ecx] > > if (mFinalCallback) { > > mFinalCallback->JobFinished(this, aRv); > > 66B52862 push ebx > > 66B52863 push esi > > 66B52864 call dword ptr [eax] > > Is this a windows PGO build? Yes, that's what Nightly is. > Because mFinalCallback will only ever have a > single concrete type set for it; ServiceWorkerJobQueue::Callback. It does > seem optimizable to me. Well, like I showed before, the code isn't optimized to devirtualize this call.
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to :Ehsan Akhgari from comment #20) > Yes, that's what Nightly is. You didn't tell me what platform you were looking at. I thought you used a mac, so I wasn't sure if you were looking at the windows visual studio produced binary.
Comment 22•7 years ago
|
||
(In reply to Ben Kelly [:bkelly] from comment #21) > (In reply to :Ehsan Akhgari from comment #20) > > Yes, that's what Nightly is. > > You didn't tell me what platform you were looking at. I thought you used a > mac, so I wasn't sure if you were looking at the windows visual studio > produced binary. Oh no sorry, I was talking about Windows (since the crash happens on Windows it seems).
You need to log in
before you can comment on or make changes to this bug.
Description
•