Closed Bug 1256061 Opened 4 years ago Closed 4 years ago

crash in nsTArray_Impl<T>::IndexOf<T> | mozilla::dom::Geolocation::RemoveRequest

Categories

(Core :: DOM: Geolocation, defect, critical)

46 Branch
All
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox46 + fixed
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: mats, Assigned: mccr8)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file, 1 obsolete file)

[Tracking Requested - why for this release]:

This bug was filed from the Socorro interface and is 
report bp-cc61b79e-b06b-43ad-a608-3bf672160311.
=============================================================

Looks like a recent regression, starting in v46 ?
All crashes are on Windows, so far.
All crashes are null-pointer crashes:
 x86 	EXCEPTION_ACCESS_VIOLATION_READ 	0x1c
amd64 	EXCEPTION_ACCESS_VIOLATION_READ 	0x38 

Stack:
nsTArray_Impl<unsigned int, nsTArrayInfallibleAllocator>::IndexOf<unsigned int, nsDefaultComparator<unsigned int, unsigned int> >(unsigned int const&, unsigned int, nsDefaultComparator<unsigned int, unsigned int> const&)
mozilla::dom::Geolocation::RemoveRequest(nsGeolocationRequest*)
nsGeolocationRequest::NotifyErrorAndShutdown(unsigned short)
nsGeolocationRequest::TimerCallbackHolder::Notify(nsITimer*)
nsTimerImpl::Fire()
nsTimerEvent::Run()
nsThread::ProcessNextEvent(bool, bool*)
NS_ProcessNextEvent(nsIThread*, bool)
mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
MessageLoop::RunHandler()
MessageLoop::Run()
nsBaseAppShell::Run()
nsAppShell::Run()
nsAppStartup::Run()
XREMain::XRE_mainRun()
[...]
New crash in 46, tracking it.
ywu, can you take a look? 
dougt, just making sure you see this as you may be able to find a good owner for it.
Flags: needinfo?(ywu)
Flags: needinfo?(dougt)
Hi Andrew,
could you check this bug?
I think the bug may be caused by your patch in bug 1238427.
Flags: needinfo?(ywu) → needinfo?(continuation)
Thanks for the needinfo. Yeah, at first glance I need to null check mRequest.get() and not mRequest.
Assignee: nobody → continuation
Flags: needinfo?(dougt)
Flags: needinfo?(continuation)
Blocks: 1238427
oh, WeakPtr is annoyingly different to nsWeakPtr
Anyhow, sounds like a bug in WeakPtr.
The try run hasn't come back yet, but this seems simple. I'm not really sure how to write a test for this.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=97d3ccc71186
Attachment #8731370 - Flags: review?(josh)
Attachment #8731370 - Flags: review?(josh) → review+
See Also: → 1257296
https://hg.mozilla.org/mozilla-central/rev/9e8c140ba97d
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Comment on attachment 8731370 [details] [diff] [review]
Actually check if the underlying referent still exists in nsGeolocationRequest::TimerCallbackHolder::Notify().

Approval Request Comment
[Feature/regressing bug #]: bug 1238427
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: no tests
[Risks and why]: low, this just fixes up a null check
[String/UUID change made/needed]: none
Attachment #8731370 - Flags: approval-mozilla-beta?
Attachment #8731370 - Flags: approval-mozilla-aurora?
Comment on attachment 8731370 [details] [diff] [review]
Actually check if the underlying referent still exists in nsGeolocationRequest::TimerCallbackHolder::Notify().

Crash fix, Aurora47+, Beta46+
Attachment #8731370 - Flags: approval-mozilla-beta?
Attachment #8731370 - Flags: approval-mozilla-beta+
Attachment #8731370 - Flags: approval-mozilla-aurora?
Attachment #8731370 - Flags: approval-mozilla-aurora+
For what it is worth, there haven't been any of these crashes on Nightly since my patch landed. However, on Nightly the crash isn't common enough to really tell if it is a fix. I think once the patch is on Aurora we should be able to tell.
Comment on attachment 8731370 [details] [diff] [review]
Actually check if the underlying referent still exists in nsGeolocationRequest::TimerCallbackHolder::Notify().

dholbert pointed out that my patch probably is not actually changing the behavior, because before my patch it is probably using operator* which just does get. I was looking at WeakReference instead of WeakPtr. Furthermore, this crash showed up in the 3-22 build, confirming my fix was not a fix. I'll have to investigate further.
Attachment #8731370 - Flags: approval-mozilla-beta+
Attachment #8731370 - Flags: approval-mozilla-aurora+
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I think it is possible for the TimerCallbackHolder to fire off a
Notify() while the geolocation object and the nsGeolocationRequest are
only holding each other alive, so they would be freed by the cycle
collector the next time it runs, but we haven't run the cycle
collector yet. If that happens, then Geolocation::RemoveRequest()
would break the cycle, causing stuff to unravel and bad things to
happen. To fix this, we just hold the request alive in
TimerCallbackHolder::Notify(), which will also ensure that the
geolocation object is alive, hopefully preventing crashes.

This will make the Notify() behavior similar to what it was before bug
1238427, when the nsITimer object would hold a strong reference to the
request when the Notify() was being run.
Attachment #8733986 - Flags: review?(josh)
Comment on attachment 8733986 [details] [diff] [review]
Hold a strong reference to a request when we call a method on it.

Review of attachment 8733986 [details] [diff] [review]:
-----------------------------------------------------------------

Delightful!
Attachment #8733986 - Flags: review?(josh) → review+
(I'm going to back out my prior patch before landing the other one, just to ensure cleaner backporting.)

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3a350b491ffa
https://hg.mozilla.org/mozilla-central/rev/403d085af5dd
https://hg.mozilla.org/mozilla-central/rev/9d73e42396c8
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Attachment #8731370 - Attachment is obsolete: true
Comment on attachment 8733986 [details] [diff] [review]
Hold a strong reference to a request when we call a method on it.

Approval Request Comment
[Feature/regressing bug #]: bug 1238427
[User impact if declined]: crashes
[Describe test coverage new/current, TreeHerder]: none
[Risks and why]: very low, this mostly restores existing behavior
[String/UUID change made/needed]: none

I'd like to get this on Aurora quickly because the crash happens much more frequently on Aurora than on other branches, so it would help me verify that this fixes the crash. I'd then require approval for beta.
Attachment #8733986 - Flags: approval-mozilla-aurora?
Comment on attachment 8733986 [details] [diff] [review]
Hold a strong reference to a request when we call a method on it.

Crash fix that will work (positive thinking ;), Aurora47+
Attachment #8733986 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8733986 [details] [diff] [review]
Hold a strong reference to a request when we call a method on it.

Approval Request Comment

See comment 20. There have been no crashes on Aurora or Nightly since this patch landed on those branches.
Attachment #8733986 - Flags: approval-mozilla-beta?
Comment on attachment 8733986 [details] [diff] [review]
Hold a strong reference to a request when we call a method on it.

Crash fix, verified on aurora, let's try to get this into 46 beta 6.
Attachment #8733986 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1263001
No longer depends on: 1263001
You need to log in before you can comment on or make changes to this bug.