nsGeolocationRequest::mTimeoutTimer can cause windows to leak

RESOLVED FIXED in Firefox 45

Status

()

RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: karlt, Assigned: mccr8)

Tracking

(Blocks: 1 bug, {memory-leak})

43 Branch
mozilla46
Unspecified
All
memory-leak
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox43 wontfix, firefox44 wontfix, firefox45 fixed, firefox46 fixed, firefox-esr38 wontfix)

Details

(Whiteboard: [MemShrink:P1], URL)

Attachments

(1 attachment, 4 obsolete attachments)

(Reporter)

Description

3 years ago
1. Load about:memory
2. Open new tab and load http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358
3. Close tab.
4. Minimize memory use.
5. Measure.

Reproduced on nightly without e10s:

4 (100.0%) -- ghost-windows
├──2 (50.00%) ── http://s7.addthis.com/static/sh.63ce9099e9ce7d094433c330.html#rand=0.927425674208481&iit=1452480348184&tmr=load%3D1452480347536%26core%3D1452480347580%26main%3D1452480348172%26ifr%3D1452480348197&cb=0&cdn=0&md=0&kw=&ab=-&dh=www.bunnings.co.nz&dr=&du=http%3A%2F%2Fwww.bunnings.co.nz%2Fcoolaroo-bird-netting-4-x-4m_p00293358&href=http%3A%2F%2Fwww.bunnings.co.nz%2Fcoolaroo-bird-netting-4-x-4m_p00293358&dt=Coolaroo%20Bird%20Netting%204%20x%204m%20%7C%20Bunnings%20Warehouse&dbg=0&cap=tc%3D0%26ab%3D0&inst=1&jsl=1&prod=undefined&lng=en&ogt=image%2Curl%2Csite_name%2Cdescription%2Ctype%3Dwebsite%2Ctitle&pc=men&pub=ra-4e2f690a7922f66b&ssl=0&sid=5693175b03ab2bd7&srpl=1&srd=1&srf=0.01&srx=1&ver=300&xck=0&xtr=0&og=title%3DCoolaroo%2520Bird%2520Netting%25204%2520x%25204m%26type%3Dwebsite%26description%3DFind%2520Coolaroo%2520Bird%2520Netting%25204%2520x%25204m%2520for%2520the%2520lowest%2520prices%2520at%2520Bunnings%2520Warehouse.%2520Visit%2520your%2520local%2520store%2520for%2520the%2520widest%2520range%2520of%2520GARDEN%2520LIFESTYLE%2520%253E%2520GARDEN%2520CARE%2520%253E%2520GARDEN%2520CLOTH%2520%253E%2520BIRD%2520NETTING%2520%253E%2520PACK%2520products.%26site_name%3DBunnings%2520New%2520Zealand%26url%3Dhttp%253A%252F%252Fwww.bunnings.co.nz%252Four-range%252Fgarden%252Flandscaping%252Fplant-protection%252Fbird-netting%252Fcoolaroo%252520bird%252520netting%2525204%252520x%2525204m_00293358%26image%3Dhttp%253A%252F%252Fwww.bunnings.co.nz%252Fassets%252Fimg%252Fbunnings-logo-small-img.jpg&csi=undefined&toLoJson=uvs%3D5693175b36e4cfd1000%26chr%3DUTF-8%26vcl%3D0&rev=v4.0.3-wp&ct=1&xld=1&xd=1 [2]
└──2 (50.00%) ── http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358 [2]
(Reporter)

Comment 1

3 years ago
First noticed on 43.0b2.  Location service is involved in some:

26 (100.0%) -- ghost-windows
├───6 (23.08%) ── http://stirlingsports.co.nz/store-locator [6]
├───2 (07.69%) ── http://epsonenau.ugc.bazaarvoice.com/5625-en_nz/crossdomain.htm?format=embedded#origin=http%3A%2F%2Fwww.epson.co.nz [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358 [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/gardman-garden-netting-4-x-6m_p00153918 [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/gardman-garden-netting-6-x-4m_p00220177 [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/search/products?q=bird%20netting&redirectFrom=Any [2]
├───2 (07.69%) ── http://www.torpedo7.co.nz/products/T7W4RN5YM/title/torpedo7-girl-s-mystic-long-sleeve-rash-top [2]
├───2 (07.69%) ── https://disqus.com/embed/comments/?base=default&version=f3e1717b71e7256da258d3a504e56865&f=openprintingorg&t_u=http%3A%2F%2Fwww.openprinting.org%2Fprinter%2FBrother%2FBrother-MFC-j615w&t_d=Printer%3A%20Brother%20MFC-j615w%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&t_t=Printer%3A%20Brother%20MFC-j615w%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&s_o=default [2]
├───2 (07.69%) ── https://disqus.com/embed/comments/?base=default&version=f3e1717b71e7256da258d3a504e56865&f=openprintingorg&t_u=http%3A%2F%2Fwww.openprinting.org%2Fprinter%2FEpson%2FEpson-WF-3620_Series&t_d=Printer%3A%20Epson%20WF-3620%20Series%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&t_t=Printer%3A%20Epson%20WF-3620%20Series%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&s_o=default [2]
├───2 (07.69%) ── https://e1.fanplayr.com/tunnel.html?v3 [2]
├───1 (03.85%) ── http://www.supercheapauto.com.au/online-store/products/Stanfred-Bike-Carrier-Single-Pole-Twist-4-Clamp-Velcro.aspx?pid=326853
└───1 (03.85%) ── http://www.supercheapauto.com.au/online-store/products/Stanfred-Bike-Carrier-Single-Pole-Twist-4-Clamp-Velcro.aspx?pid=326853#Recommendations
Flags: needinfo?(continuation)

Comment 3

3 years ago
(In reply to Karl Tomlinson (ni?:karlt) from comment #0)
> 1. Load about:memory
> 2. Open new tab and load
> http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358
Am I expected to "Share location"? Or does that not matter?

Comment 4

3 years ago
I guess yes, since I can't reproduce if location is not shared.

Comment 5

3 years ago
Hmm, but can't reproduce with location shared either.

Comment 6

3 years ago
Another question:
can you reproduce on debug build? If so, does the leak show up as a shutdown leak or is it a runtime leak? 
export XPCOM_MEM_LEAK_LOG=1 showing leaks means shutdown leaks. (it is just that runtime leaks tend to be much easier to debug.)
(Assignee)

Comment 7

3 years ago
I can reproduce with e10s disabled. I had to share my location on the drop down thing.

I looked at the CC logs, but it just says that the Bunnings nsGlobalWindow is being held alive.
Assignee: nobody → continuation
(Assignee)

Updated

3 years ago
Flags: needinfo?(continuation)
Keywords: mlk
Whiteboard: [MemShrink]
(Assignee)

Comment 8

3 years ago
In a local debug build, when I click on allow on the location door hanger, I get an additional modal dialogue asking if I want to share my location with the web page. If I close the page, then click "allow" on the modal dialogue, then I get a ghost window. On my regular profile, in Nightly, I don't see the additional confirmation dialogue, which is odd. I was still able to reproduce the leak one time, somehow.
(Assignee)

Comment 9

3 years ago
The ghost window is freed before shutdown, but the browser does leak some objects. 1 CoreLocationLocationProvider, 1 nsGeolocationService, 8 nsStringBuffer, 8 nsTArray_base.
(Assignee)

Comment 10

3 years ago
At least for this leak I'm seeing in debug builds, the missing reference seems to be from nsGeolocationRequest.
(Assignee)

Comment 11

3 years ago
(In reply to Andrew McCreight [:mccr8] from comment #10)
> At least for this leak I'm seeing in debug builds, the missing reference
> seems to be from nsGeolocationRequest.

More specifically, an nsGeolocationRequest gets destroyed in shutdown that is holding alive an nsContentPermissionRequester that is holding alive the window.
(Assignee)

Comment 12

3 years ago
It sounds like Google geolocation is broken right now (bug 1232995). Maybe we're hitting some odd failure case.
(Reporter)

Comment 13

3 years ago
The ghost window doesn't show up every time, but I see it around half the time in an optimized build and in a new profile.  Sometimes I need to Minimize memory use twice before it shows up as a ghost window after Measure.

(In reply to Olli Pettay [:smaug] from comment #3)
> (In reply to Karl Tomlinson (ni?:karlt) from comment #0)
> > 1. Load about:memory
> > 2. Open new tab and load
> > http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358
> Am I expected to "Share location"? Or does that not matter?

I ignore the door hanger and close the tab with the mouse button.
This dismissed the door hanger before starting the tab-close animation.

(In reply to Olli Pettay [:smaug] from comment #6)
> Another question:
> can you reproduce on debug build? If so, does the leak show up as a shutdown
> leak or is it a runtime leak? 
> export XPCOM_MEM_LEAK_LOG=1 showing leaks means shutdown leaks. (it is just
> that runtime leaks tend to be much easier to debug.)

It does reproduce in a debug build.  I don't yet know re frequency because it takes much longer to run.

XPCOM_MEM_LEAK_LOG=1 did not report any object leaked for me.  I did get

[18597] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /mnt/ssd1/karl/moz/dev/xpcom/base/nsCycleCollector.cpp, line 3563
(Reporter)

Updated

3 years ago
OS: Unspecified → Linux
(Reporter)

Updated

3 years ago
Component: DOM → Geolocation
mccr8 is on OSX and can reproduce this.
OS: Linux → All
(Assignee)

Comment 15

3 years ago
Created attachment 8706680 [details] [diff] [review]
Make window reference from nsContentPermissionRequester weak.

This patch changes the leak from a missing reference to the nsGlobalWindow to a missing reference to two callbacks, which I think is an improvement.
(Assignee)

Comment 16

3 years ago
The two callbacks are being held alive by an nsGeolocationRequest. That in turn is being held alive by the timer thread.

The request is added to a timer in nsGeolocationRequest::SetTimeoutTimer(). The timeout is based on a timeout value from an options object that is passed in. The default value is 2147483647, which is apparently infinity, because according to bug 741131 comment 0, "If timeout was not specified, set the internal timeout variable to Infinity.". I don't know why we bother setting a timer that will never finish, but that's kind of a separate issue, because as smaug pointed out, a web page could just specify an absurdly large timeout, and we don't want to leak the page in that case.

So the bug here is that we need to cancel the timeout somehow when the page goes away. Given that this is an infinite timer, and we don't always leak, we must cancel it in some circumstances and not others.
(Assignee)

Comment 17

3 years ago
I filed bug 1238801 about not starting a timer that will never finish, but like I said that is orthogonal to the leak.
(Assignee)

Comment 18

3 years ago
Created attachment 8707158 [details] [diff] [review]
Make window reference from nsContentPermissionRequester weak.

I don't know if this is needed to fix the leak but it seems like a good idea.
(Assignee)

Comment 19

3 years ago
Created attachment 8707159 [details] [diff] [review]
Manually shutdown pending callbacks.

This alone fixes the leak, though it is kind of a bandaid. Maybe it is good to have anyways?
(Assignee)

Comment 20

3 years ago
Created attachment 8707163 [details] [diff] [review]
Pretend the strong timer reference to nsGeolocationRequest is a self reference.

This fixes the leak even without the previous patch, and should be more general. If there's a timeout timer going, then it has a strong reference to the request, but we don't want to keep the request alive if the only thing keeping it alive is the timer. So we tell the CC about reference from the timer, except we pretend it is from the object itself.

Olli, what do you think about this? I like how this patch is general (the approach in the previous patch really requires that you audit every place that calls release on a request, which is scary), but these fake CC edges are always odd.
Attachment #8707163 - Flags: feedback?(bugs)
(Assignee)

Updated

3 years ago
Summary: ghost windows from bunnings → nsGeolocationRequest::mTimeoutTimer can cause windows to leak
Comment on attachment 8707163 [details] [diff] [review]
Pretend the strong timer reference to nsGeolocationRequest is a self reference.

oh, fancy.
And normally nsGeolocationRequests are kept alive by Geolocation object, right?
(and Geolocation is kept alive by Navigator which is kept alive by Window)
Attachment #8707163 - Flags: feedback?(bugs) → feedback+
(Assignee)

Comment 22

3 years ago
(In reply to Olli Pettay [:smaug] from comment #21)
> And normally nsGeolocationRequests are kept alive by Geolocation object,
> right?

Yes. It is possible that some code somewhere implicitly relies on that, but I can't imagine anything important happens to a geolocation object after its page goes away.
Whiteboard: [MemShrink] → [MemShrink:P1]
(Assignee)

Comment 23

3 years ago
The patch is currently crashing during CC in the xpcshell test test_geolocation_reset_accuracy.js. I think the callback is being cleared from the timer (I guess by the timers being shut down) before the final CC, so we end up reporting more references to the request than the refcount. The obvious solution of checking the callback of the timer does not work because the best you can do is to return the callback, which AddRefs the request during a CC, which is bad.
Add HasCallback() to nsITimer ?
(Assignee)

Comment 25

3 years ago
Created attachment 8709643 [details] [diff] [review]
Avoid a strong reference from the timeout timer to nsGeolocationRequest.

The timeout timer of a geolocation request holds a strong reference to the request. This can cause the window to leak if the request is not completed before the tab containing the window is closed.

To fix this, I made the timer instead hold a strong reference to a wrapper class that has only a weak reference to the request. The request destructor must now cancel the timeout timer.

I also outlined a call to StopTimeoutTimer() in nsGeolocationRequest::Shutdown().

This adds a browser-chrome test, because those already detect leaks-until-shutdown. I verified that the test leaks without my patch and doesn't leak with it. I don't really know what I'm doing, so please take a look at it.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ede1cf265db5
Attachment #8709643 - Flags: review?(josh)
(Assignee)

Updated

3 years ago
Attachment #8706680 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8707158 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8707159 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8707163 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
status-firefox43: --- → wontfix
status-firefox44: --- → wontfix
status-firefox45: --- → affected
status-firefox46: --- → affected
status-firefox-esr38: --- → wontfix

Updated

3 years ago
Attachment #8709643 - Flags: review?(josh) → review+
(Assignee)

Updated

3 years ago
Keywords: checkin-needed
(Assignee)

Updated

3 years ago
Blocks: 1181677
(Assignee)

Updated

3 years ago
Duplicate of this bug: 1041219
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f87d296b0528 for "/home/worker/workspace/build/src/dom/geolocation/nsGeolocation.cpp:119:5: error: bad implicit conversion constructor for 'TimerCallbackHolder'", https://treeherder.mozilla.org/logviewer.html#?job_id=20170911&repo=mozilla-inbound
So we need this at least on Aurora.
(I would even consider to Beta, but that is unlikely to be approved)
(Assignee)

Comment 30

3 years ago
I added the missing explicit to the ctor, which should fix the issue in comment 28.

Comment 32

3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/b09c86460959
Status: NEW → RESOLVED
Last Resolved: 3 years ago
status-firefox46: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
(Assignee)

Comment 33

3 years ago
Comment on attachment 8709643 [details] [diff] [review]
Avoid a strong reference from the timeout timer to nsGeolocationRequest.

This is an approval request for beta 45.

Approval Request Comment
[Feature/regressing bug #]: old as far as I can tell
[User impact if declined]: bad leaks on pages with geolocation if the tab is closed before the geolocation request finishes.
[Describe test coverage new/current, TreeHerder]: There's some testing.
[Risks and why]: low. This should only substantially change behavior when there would have otherwise been a leak.
[String/UUID change made/needed]: none
Attachment #8709643 - Flags: approval-mozilla-beta?
Comment on attachment 8709643 [details] [diff] [review]
Avoid a strong reference from the timeout timer to nsGeolocationRequest.

Fix a leak + has tests, taking it.
Attachment #8709643 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 35

3 years ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/76c81f89ca91
status-firefox45: affected → fixed
(Assignee)

Updated

3 years ago
Depends on: 1256061
(Assignee)

Updated

2 years ago
Depends on: 1263001
You need to log in before you can comment on or make changes to this bug.