Still crashing in nsTArray_Impl<T>::IndexOf<T> | mozilla::dom::Geolocation::RemoveRequest

RESOLVED FIXED in Firefox 47

Status

()

defect
--
critical
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: khuey, Assigned: mccr8)

Tracking

({crash})

unspecified
mozilla48
Unspecified
Windows NT
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox46 wontfix, firefox47+ fixed, firefox48+ fixed, firefox-esr45 affected)

Details

(Whiteboard: btpp-active, crash signature)

Attachments

(1 attachment)

This bug was filed from the Socorro interface and is 
report bp-a472c842-2c7b-4ca7-a267-478cf2160405.
=============================================================

Actual stack, from MSVC:

 	xul.dll!nsTArray_Impl<nsIContent * __ptr64,nsTArrayInfallibleAllocator>::IndexOf<nsIContent * __ptr64,nsDefaultComparator<nsIContent * __ptr64,nsIContent * __ptr64> >(nsIContent * const & aItem, unsigned __int64) Line 1098	C++
>	xul.dll!mozilla::dom::Geolocation::RemoveRequest(nsGeolocationRequest * aRequest) Line 1415	C++
 	xul.dll!nsGeolocationRequest::NotifyErrorAndShutdown(unsigned short aErrorCode) Line 426	C++
 	xul.dll!nsGeolocationRequest::TimerCallbackHolder::Notify(nsITimer * __formal) Line 781	C++
 	xul.dll!nsTimerImpl::Fire() Line 541	C++
 	xul.dll!nsTimerEvent::Run() Line 290	C++
 	xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 994	C++
 	xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 297	C++
 	xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 98	C++
 	xul.dll!MessageLoop::RunHandler() Line 224	C++
 	xul.dll!MessageLoop::Run() Line 204	C++
 	xul.dll!nsBaseAppShell::Run() Line 158	C++
 	xul.dll!nsAppShell::Run() Line 264	C++
 	xul.dll!nsAppStartup::Run() Line 282	C++
 	xul.dll!XREMain::XRE_mainRun() Line 4343	C++
 	xul.dll!XREMain::XRE_main(int argc, char * * argv, const nsXREAppData * aAppData) Line 4439	C++
 	xul.dll!XRE_main(int argc, char * * argv, const nsXREAppData * aAppData, unsigned int aFlags) Line 4546	C++

This is 1% of our nightly crashes over the last three days.
It should be related to bug 1256061.

Hi Andrew,
could you take a look?
Flags: needinfo?(continuation)
Bah, I guess this is another variant of bug 1256061. I even see the same crash signatures. It looks like it is on the first clause of the inequality in RemoveRequest() rather than the second, so maybe the geolocation object is null instead of the request? I'm not sure how mLocator can end up null here, as I don't think it gets cleared outside of the CC unlink method and the destructor, and the object is being held live on the stack. Maybe I'll just throw a null check in...
Assignee: nobody → continuation
Blocks: 1256061
Crash Signature: [@ nsTArray_Impl<T>::IndexOf<T>] → [@ nsTArray_Impl<T>::IndexOf<T>][@ nsTArray_Impl<T>::IndexOf<T> | mozilla::dom::Geolocation::RemoveRequest]
Flags: needinfo?(continuation)
Whiteboard: btpp-active
I looked at why Breakpad fails to get a good stack here, and it's actually pretty straightforward. Frame 0 is a leaf function, so it doesn't have an entry in the unwind table. However, if Breakpad can't unwind using unwind info it skips to trying to use a frame pointer, and %rbp is not actually a frame pointer here, but it points to valid memory so Breakpad hands that back as the return address and falls apart. The actual return address is just sitting there at the top of the stack. I think I know how to fix Breakpad to do the right thing here without breaking other cases, so I'll try to fix that upstream.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> I hit this twice in a row on http://beta.temblor.net/

Did you share your location? Did you close the tab quickly?
I shared my location and then tried to zoom in and move around.  I didn't close the tab quickly.
I wrote a patch to Breakpad's stackwalker that allows it to get the correct stack in this situation:
https://codereview.chromium.org/1902783002
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #7)
> I wrote a patch to Breakpad's stackwalker that allows it to get the correct
> stack in this situation:
> https://codereview.chromium.org/1902783002

Awesome! Is there a place where we track getting that deployed on our side?
No, but I'm going to land it upstream today, and spin a new Breakpad build for Socorro, so the next Socorro production push after that should pick it up.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #9)
> No, but I'm going to land it upstream today, and spin a new Breakpad build
> for Socorro, so the next Socorro production push after that should pick it
> up.

OK, thanks, we should get a bunch of recent crashes with this signature reprocessed then (and attach the new signature here) and we probably should get a message to stability@m.o to watch out for changing signatures (and better stacks).
If an unlinked nsGeolocationRequest somehow stays alive, then calling
Notify() on it will likely cause a null-deref crash. I'm not sure
exactly how that could happen, but maybe this will at least fix the
crash. mLocator is only cleared in Unlink.
Attachment #8744402 - Flags: review?(josh)
Attachment #8744402 - Flags: review?(josh) → review+
blocking-b2g: 2.2r? → ---
https://hg.mozilla.org/mozilla-central/rev/1e006c7b5eda
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
[Tracking Requested - why for this release]:
this is around in 47 builds as well. in the last builds of the aurora cycle the signature is at #7 with 1.7% of crashes. maybe the fix could be uplifted to 47?
(In reply to philipp from comment #15)
> maybe the fix could be uplifted to 47?

Yeah, definitely, if it actually works this time. :)

Kyle, once you've updated your Nightly, could you see if you can still reproduce this? Thanks. Not that it is a guarantee, but it would give us at least some partial information.
Flags: needinfo?(khuey)
Summary: crash in nsTArray_Impl<T>::IndexOf<T> → Still crashing in nsTArray_Impl<T>::IndexOf<T> | mozilla::dom::Geolocation::RemoveRequest
so far there are no more crashes after build 20160424030601, which is a good sign :-)
Hi Andrew, given the promising data from Nightly, should we uplift this to Beta47? Thanks!
Flags: needinfo?(continuation)
Comment on attachment 8744402 [details] [diff] [review]
Don't Notify() an unlinked nsGeolocationRequest.

Approval Request Comment
[Feature/regressing bug #]: bug 1238427
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: this code in general is tested, but this particular part isn't well-tested
[Risks and why]: low-risk: it just adds a null check
[String/UUID change made/needed]: none
Flags: needinfo?(khuey)
Flags: needinfo?(continuation)
Attachment #8744402 - Flags: approval-mozilla-beta?
Attachment #8744402 - Flags: approval-mozilla-aurora?
Comment on attachment 8744402 [details] [diff] [review]
Don't Notify() an unlinked nsGeolocationRequest.

Nightly data on this crash fix looks promising, Aurora48+, Beta47+
Attachment #8744402 - Flags: approval-mozilla-beta?
Attachment #8744402 - Flags: approval-mozilla-beta+
Attachment #8744402 - Flags: approval-mozilla-aurora?
Attachment #8744402 - Flags: approval-mozilla-aurora+
Comment on attachment 8744402 [details] [diff] [review]
Don't Notify() an unlinked nsGeolocationRequest.

This is already in Fx48 (which was Nightly pre-merge) when the fix landed.
Attachment #8744402 - Flags: approval-mozilla-aurora+
FWIW I wasn't able to reproduce this on yesterday's nightly.
I don't think this is common enough of a crash to justify backporting to ESR45, but it would be easy to do.
Blocks: 1238427
No longer blocks: 1256061
You need to log in before you can comment on or make changes to this bug.