Closed Bug 1238427 Opened 8 years ago Closed 8 years ago

nsGeolocationRequest::mTimeoutTimer can cause windows to leak

Categories

(Core :: DOM: Geolocation, defect)

43 Branch
Unspecified
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox43 --- wontfix
firefox44 --- wontfix
firefox45 --- fixed
firefox46 --- fixed
firefox-esr38 --- wontfix

People

(Reporter: karlt, Assigned: mccr8)

References

(Blocks 1 open bug, )

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P1])

Attachments

(1 file, 4 obsolete files)

1. Load about:memory
2. Open new tab and load http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358
3. Close tab.
4. Minimize memory use.
5. Measure.

Reproduced on nightly without e10s:

4 (100.0%) -- ghost-windows
├──2 (50.00%) ── http://s7.addthis.com/static/sh.63ce9099e9ce7d094433c330.html#rand=0.927425674208481&iit=1452480348184&tmr=load%3D1452480347536%26core%3D1452480347580%26main%3D1452480348172%26ifr%3D1452480348197&cb=0&cdn=0&md=0&kw=&ab=-&dh=www.bunnings.co.nz&dr=&du=http%3A%2F%2Fwww.bunnings.co.nz%2Fcoolaroo-bird-netting-4-x-4m_p00293358&href=http%3A%2F%2Fwww.bunnings.co.nz%2Fcoolaroo-bird-netting-4-x-4m_p00293358&dt=Coolaroo%20Bird%20Netting%204%20x%204m%20%7C%20Bunnings%20Warehouse&dbg=0&cap=tc%3D0%26ab%3D0&inst=1&jsl=1&prod=undefined&lng=en&ogt=image%2Curl%2Csite_name%2Cdescription%2Ctype%3Dwebsite%2Ctitle&pc=men&pub=ra-4e2f690a7922f66b&ssl=0&sid=5693175b03ab2bd7&srpl=1&srd=1&srf=0.01&srx=1&ver=300&xck=0&xtr=0&og=title%3DCoolaroo%2520Bird%2520Netting%25204%2520x%25204m%26type%3Dwebsite%26description%3DFind%2520Coolaroo%2520Bird%2520Netting%25204%2520x%25204m%2520for%2520the%2520lowest%2520prices%2520at%2520Bunnings%2520Warehouse.%2520Visit%2520your%2520local%2520store%2520for%2520the%2520widest%2520range%2520of%2520GARDEN%2520LIFESTYLE%2520%253E%2520GARDEN%2520CARE%2520%253E%2520GARDEN%2520CLOTH%2520%253E%2520BIRD%2520NETTING%2520%253E%2520PACK%2520products.%26site_name%3DBunnings%2520New%2520Zealand%26url%3Dhttp%253A%252F%252Fwww.bunnings.co.nz%252Four-range%252Fgarden%252Flandscaping%252Fplant-protection%252Fbird-netting%252Fcoolaroo%252520bird%252520netting%2525204%252520x%2525204m_00293358%26image%3Dhttp%253A%252F%252Fwww.bunnings.co.nz%252Fassets%252Fimg%252Fbunnings-logo-small-img.jpg&csi=undefined&toLoJson=uvs%3D5693175b36e4cfd1000%26chr%3DUTF-8%26vcl%3D0&rev=v4.0.3-wp&ct=1&xld=1&xd=1 [2]
└──2 (50.00%) ── http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358 [2]
First noticed on 43.0b2.  Location service is involved in some:

26 (100.0%) -- ghost-windows
├───6 (23.08%) ── http://stirlingsports.co.nz/store-locator [6]
├───2 (07.69%) ── http://epsonenau.ugc.bazaarvoice.com/5625-en_nz/crossdomain.htm?format=embedded#origin=http%3A%2F%2Fwww.epson.co.nz [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358 [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/gardman-garden-netting-4-x-6m_p00153918 [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/gardman-garden-netting-6-x-4m_p00220177 [2]
├───2 (07.69%) ── http://www.bunnings.co.nz/search/products?q=bird%20netting&redirectFrom=Any [2]
├───2 (07.69%) ── http://www.torpedo7.co.nz/products/T7W4RN5YM/title/torpedo7-girl-s-mystic-long-sleeve-rash-top [2]
├───2 (07.69%) ── https://disqus.com/embed/comments/?base=default&version=f3e1717b71e7256da258d3a504e56865&f=openprintingorg&t_u=http%3A%2F%2Fwww.openprinting.org%2Fprinter%2FBrother%2FBrother-MFC-j615w&t_d=Printer%3A%20Brother%20MFC-j615w%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&t_t=Printer%3A%20Brother%20MFC-j615w%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&s_o=default [2]
├───2 (07.69%) ── https://disqus.com/embed/comments/?base=default&version=f3e1717b71e7256da258d3a504e56865&f=openprintingorg&t_u=http%3A%2F%2Fwww.openprinting.org%2Fprinter%2FEpson%2FEpson-WF-3620_Series&t_d=Printer%3A%20Epson%20WF-3620%20Series%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&t_t=Printer%3A%20Epson%20WF-3620%20Series%20%7C%20OpenPrinting%20-%20The%20Linux%20Foundation&s_o=default [2]
├───2 (07.69%) ── https://e1.fanplayr.com/tunnel.html?v3 [2]
├───1 (03.85%) ── http://www.supercheapauto.com.au/online-store/products/Stanfred-Bike-Carrier-Single-Pole-Twist-4-Clamp-Velcro.aspx?pid=326853
└───1 (03.85%) ── http://www.supercheapauto.com.au/online-store/products/Stanfred-Bike-Carrier-Single-Pole-Twist-4-Clamp-Velcro.aspx?pid=326853#Recommendations
Flags: needinfo?(continuation)
(In reply to Karl Tomlinson (ni?:karlt) from comment #0)
> 1. Load about:memory
> 2. Open new tab and load
> http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358
Am I expected to "Share location"? Or does that not matter?
I guess yes, since I can't reproduce if location is not shared.
Hmm, but can't reproduce with location shared either.
Another question:
can you reproduce on debug build? If so, does the leak show up as a shutdown leak or is it a runtime leak? 
export XPCOM_MEM_LEAK_LOG=1 showing leaks means shutdown leaks. (it is just that runtime leaks tend to be much easier to debug.)
I can reproduce with e10s disabled. I had to share my location on the drop down thing.

I looked at the CC logs, but it just says that the Bunnings nsGlobalWindow is being held alive.
Assignee: nobody → continuation
Flags: needinfo?(continuation)
Keywords: mlk
Whiteboard: [MemShrink]
In a local debug build, when I click on allow on the location door hanger, I get an additional modal dialogue asking if I want to share my location with the web page. If I close the page, then click "allow" on the modal dialogue, then I get a ghost window. On my regular profile, in Nightly, I don't see the additional confirmation dialogue, which is odd. I was still able to reproduce the leak one time, somehow.
The ghost window is freed before shutdown, but the browser does leak some objects. 1 CoreLocationLocationProvider, 1 nsGeolocationService, 8 nsStringBuffer, 8 nsTArray_base.
At least for this leak I'm seeing in debug builds, the missing reference seems to be from nsGeolocationRequest.
(In reply to Andrew McCreight [:mccr8] from comment #10)
> At least for this leak I'm seeing in debug builds, the missing reference
> seems to be from nsGeolocationRequest.

More specifically, an nsGeolocationRequest gets destroyed in shutdown that is holding alive an nsContentPermissionRequester that is holding alive the window.
It sounds like Google geolocation is broken right now (bug 1232995). Maybe we're hitting some odd failure case.
The ghost window doesn't show up every time, but I see it around half the time in an optimized build and in a new profile.  Sometimes I need to Minimize memory use twice before it shows up as a ghost window after Measure.

(In reply to Olli Pettay [:smaug] from comment #3)
> (In reply to Karl Tomlinson (ni?:karlt) from comment #0)
> > 1. Load about:memory
> > 2. Open new tab and load
> > http://www.bunnings.co.nz/coolaroo-bird-netting-4-x-4m_p00293358
> Am I expected to "Share location"? Or does that not matter?

I ignore the door hanger and close the tab with the mouse button.
This dismissed the door hanger before starting the tab-close animation.

(In reply to Olli Pettay [:smaug] from comment #6)
> Another question:
> can you reproduce on debug build? If so, does the leak show up as a shutdown
> leak or is it a runtime leak? 
> export XPCOM_MEM_LEAK_LOG=1 showing leaks means shutdown leaks. (it is just
> that runtime leaks tend to be much easier to debug.)

It does reproduce in a debug build.  I don't yet know re frequency because it takes much longer to run.

XPCOM_MEM_LEAK_LOG=1 did not report any object leaked for me.  I did get

[18597] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /mnt/ssd1/karl/moz/dev/xpcom/base/nsCycleCollector.cpp, line 3563
OS: Unspecified → Linux
Component: DOM → Geolocation
mccr8 is on OSX and can reproduce this.
OS: Linux → All
This patch changes the leak from a missing reference to the nsGlobalWindow to a missing reference to two callbacks, which I think is an improvement.
The two callbacks are being held alive by an nsGeolocationRequest. That in turn is being held alive by the timer thread.

The request is added to a timer in nsGeolocationRequest::SetTimeoutTimer(). The timeout is based on a timeout value from an options object that is passed in. The default value is 2147483647, which is apparently infinity, because according to bug 741131 comment 0, "If timeout was not specified, set the internal timeout variable to Infinity.". I don't know why we bother setting a timer that will never finish, but that's kind of a separate issue, because as smaug pointed out, a web page could just specify an absurdly large timeout, and we don't want to leak the page in that case.

So the bug here is that we need to cancel the timeout somehow when the page goes away. Given that this is an infinite timer, and we don't always leak, we must cancel it in some circumstances and not others.
I filed bug 1238801 about not starting a timer that will never finish, but like I said that is orthogonal to the leak.
I don't know if this is needed to fix the leak but it seems like a good idea.
This alone fixes the leak, though it is kind of a bandaid. Maybe it is good to have anyways?
This fixes the leak even without the previous patch, and should be more general. If there's a timeout timer going, then it has a strong reference to the request, but we don't want to keep the request alive if the only thing keeping it alive is the timer. So we tell the CC about reference from the timer, except we pretend it is from the object itself.

Olli, what do you think about this? I like how this patch is general (the approach in the previous patch really requires that you audit every place that calls release on a request, which is scary), but these fake CC edges are always odd.
Attachment #8707163 - Flags: feedback?(bugs)
Summary: ghost windows from bunnings → nsGeolocationRequest::mTimeoutTimer can cause windows to leak
Comment on attachment 8707163 [details] [diff] [review]
Pretend the strong timer reference to nsGeolocationRequest is a self reference.

oh, fancy.
And normally nsGeolocationRequests are kept alive by Geolocation object, right?
(and Geolocation is kept alive by Navigator which is kept alive by Window)
Attachment #8707163 - Flags: feedback?(bugs) → feedback+
(In reply to Olli Pettay [:smaug] from comment #21)
> And normally nsGeolocationRequests are kept alive by Geolocation object,
> right?

Yes. It is possible that some code somewhere implicitly relies on that, but I can't imagine anything important happens to a geolocation object after its page goes away.
Whiteboard: [MemShrink] → [MemShrink:P1]
The patch is currently crashing during CC in the xpcshell test test_geolocation_reset_accuracy.js. I think the callback is being cleared from the timer (I guess by the timers being shut down) before the final CC, so we end up reporting more references to the request than the refcount. The obvious solution of checking the callback of the timer does not work because the best you can do is to return the callback, which AddRefs the request during a CC, which is bad.
Add HasCallback() to nsITimer ?
The timeout timer of a geolocation request holds a strong reference to the request. This can cause the window to leak if the request is not completed before the tab containing the window is closed.

To fix this, I made the timer instead hold a strong reference to a wrapper class that has only a weak reference to the request. The request destructor must now cancel the timeout timer.

I also outlined a call to StopTimeoutTimer() in nsGeolocationRequest::Shutdown().

This adds a browser-chrome test, because those already detect leaks-until-shutdown. I verified that the test leaks without my patch and doesn't leak with it. I don't really know what I'm doing, so please take a look at it.

try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ede1cf265db5
Attachment #8709643 - Flags: review?(josh)
Attachment #8706680 - Attachment is obsolete: true
Attachment #8707158 - Attachment is obsolete: true
Attachment #8707159 - Attachment is obsolete: true
Attachment #8707163 - Attachment is obsolete: true
Attachment #8709643 - Flags: review?(josh) → review+
Keywords: checkin-needed
Blocks: GhostWindows
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/f87d296b0528 for "/home/worker/workspace/build/src/dom/geolocation/nsGeolocation.cpp:119:5: error: bad implicit conversion constructor for 'TimerCallbackHolder'", https://treeherder.mozilla.org/logviewer.html#?job_id=20170911&repo=mozilla-inbound
So we need this at least on Aurora.
(I would even consider to Beta, but that is unlikely to be approved)
I added the missing explicit to the ctor, which should fix the issue in comment 28.
https://hg.mozilla.org/mozilla-central/rev/b09c86460959
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Comment on attachment 8709643 [details] [diff] [review]
Avoid a strong reference from the timeout timer to nsGeolocationRequest.

This is an approval request for beta 45.

Approval Request Comment
[Feature/regressing bug #]: old as far as I can tell
[User impact if declined]: bad leaks on pages with geolocation if the tab is closed before the geolocation request finishes.
[Describe test coverage new/current, TreeHerder]: There's some testing.
[Risks and why]: low. This should only substantially change behavior when there would have otherwise been a leak.
[String/UUID change made/needed]: none
Attachment #8709643 - Flags: approval-mozilla-beta?
Comment on attachment 8709643 [details] [diff] [review]
Avoid a strong reference from the timeout timer to nsGeolocationRequest.

Fix a leak + has tests, taking it.
Attachment #8709643 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1256061
Depends on: 1263001
You need to log in before you can comment on or make changes to this bug.