Closed Bug 1326136 Opened 7 years ago Closed 7 years ago

Intermittent leakcheck | default process: 27858 bytes leaked (BackstagePass, CacheEntry, CacheEntryHandle, CacheFile, CacheFileHandle, ...)

Categories

(Core :: Networking, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox52 + fixed
firefox-esr52 --- fixed
firefox53 + fixed
firefox54 + fixed

People

(Reporter: intermittent-bug-filer, Assigned: mayhemer)

References

Details

(Keywords: intermittent-failure, memory-leak, regression, Whiteboard: [necko-active][stockwell fixed])

Attachments

(1 file, 1 obsolete file)

Component: General → Networking: Cache
Keywords: mlk
I strongly suspect that this and bug 1317290 are the same issue. Both have similar leaked URLs and XHR stuff on the stack of the leaked objects.
See Also: → 1317290
I believe bug 1325438 is also the same underlying issue. CC'ing Shawn Huang as he is working on the other one.
CCing mccr8 here too so we don't end up duplicating efforts across multiple bugs (given comment 1).
This one still seems to be happening. In a few logs, I see http://127.0.0.1:8888/experiments-dummy/manifest as the manifest.
Blocks: 1325148
I looked at about 1/5 of all of the times that this leak happened, and every single one had the single leaked URL http://127.0.0.1:8888/experiments-dummy/manifest
[Tracking Requested - why for this release]: leak regression
The only substantial use of that pref seems to be in browser/experiments/Experiments.jsm. Maybe this is somehow a regression from bug 987512? That landed at the end of December, and it looks like the only substantial change to that file since September.
Flags: needinfo?(kmaglione+bmo)
Component: Networking: Cache → Client: Desktop
Product: Core → Firefox Health Report
I don't think so. This looks like the same basic issue as bug 1317290, which wouldn't have been affected by those changes. I'll see if I can manage to reproduce this locally.
Flags: needinfo?(kmaglione+bmo)
Yeah, I see a AsyncShutdown.profileBeforeChange in there, so my bet is still on that getting registered after shutdown, or something.
The async shutdown service is implemented in JS, so it's hard to see how adding a blocker late in shutdown would lead to these kinds of leaks.

The problem in bug 1326225 turned out to be that it was trying to fetch an HTTPS URL form an HTTP-only test server, which lead to the connection stalling until it timed out, and the requests never being cleaned up until shutdown.

Given how hard this is to reproduce, I'm guessing we're having a similar problem here. If the request is triggered just before shutdown, and it doesn't have time to complete, we just wind up never cleaning it up.
I'm seeing this in all of the logs I look at with leaks, but not in any of the shutdowns with leaks, which supports my suspicion:

INFO - [3037] WARNING: cannot post event if not initialized: file /home/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnectionMgr.cpp, line 232

So I guess that we're either canceling the request without calling the proper handlers when shutting down the network stack, or starting it after the network stack has started shutting down, but not handling it properly.
Component: Client: Desktop → Networking
Product: Firefox Health Report → Core
Tracking 53+ for this leak regression.
(In reply to Kris Maglione [:kmag] from comment #21)
> I'm seeing this in all of the logs I look at with leaks, but not in any of
> the shutdowns with leaks, which supports my suspicion:
> 
> INFO - [3037] WARNING: cannot post event if not initialized: file
> /home/worker/workspace/build/src/netwerk/protocol/http/nsHttpConnectionMgr.
> cpp, line 232
> 
> So I guess that we're either canceling the request without calling the
> proper handlers when shutting down the network stack, or starting it after
> the network stack has started shutting down, but not handling it properly.

After shutdown (exactly: profile-change-net-teardown or xpcom-shutdown) we synchronously fail any request (nsHttpChannel::AsyncOpen fails).

So, this is probably a request in progress being intercepted by shutdown.  I'll try to find out which event it is we fail to dispatch and go on from there.
Assignee: nobody → honzab.moz
Whiteboard: [necko-active]
We are failing here:

mozilla::net::nsHttpConnectionMgr::PostEvent 
mozilla::net::nsHttpConnectionMgr::CancelTransaction 
mozilla::net::nsHttpChannel::Cancel 
mozilla::dom::XMLHttpRequestMainThread::CloseRequest 
mozilla::dom::XMLHttpRequestMainThread::Abort 
mozilla::dom::XMLHttpRequestBinding::abort 

I'll check why that transaction is not canceled as part of the shutdown logic, tho.
Seems like nsHttpConnection::Close called from nsHttpConnectionMgr::OnMsgShutdown posted from nsHttpConnectionMgr::Shutdown doesn't close its (active) transaction, at least not directly.
Blocks: 1332532
Comment on attachment 8827491 [details] [diff] [review]
v1 (close transaction in conn::close() at shutdown)

Review of attachment 8827491 [details] [diff] [review]:
-----------------------------------------------------------------

I think the path we want to include is to have nsHttpConnectionMgr call conn->CloseTransaction(conn->Transaction(), NS_ERROR_ABORT) as it does some things other than just trans->close().. it also calls connection::Close() so it could be a straight substitution for that existing call in connection manager - it would just need to plumb the aIsShutdown param through which isn't currently on CloseTransaction().. but we could add it either as a default param or closeTransaction2 or whatever.

wdyt?
Attachment #8827491 - Flags: review?(mcmanus)
honza - I think you may have missed patrick's comment 36
Flags: needinfo?(odvarko)
(In reply to Randell Jesup [:jesup] from comment #37)
> honza - I think you may have missed patrick's comment 36

you wanted :mayhemer.  no, I didn't.  I just don't have time right now to work on this bug.  but it's on the list.
Flags: needinfo?(odvarko)
connman now calls conn->CloseTransaction(conn->Transaction(), ..ABORT, true) on shutdown

https://treeherder.mozilla.org/#/jobs?repo=try&revision=bcf0c9a95d677a13c71c57d1f0e43702ccf3829d
Attachment #8827491 - Attachment is obsolete: true
Attachment #8838050 - Flags: review?(mcmanus)
Comment on attachment 8838050 [details] [diff] [review]
v2 (close transaction at shutdown with conn::CloseTransaction(conn->Trans()))

Review of attachment 8838050 [details] [diff] [review]:
-----------------------------------------------------------------

thanks honza
Attachment #8838050 - Flags: review?(mcmanus) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9a6f80040aa2
Let nsHttpConnection close it's transaction on shutdown. r=mcmanus
Keywords: checkin-needed
Looks to me like the underlying code issue potentially goes back further even we were seeing these specific leaks in automation on 52? I'm wondering if we should consider backporting here.
Flags: needinfo?(continuation)
Sounds like a good idea.
Flags: needinfo?(continuation)
See Also: → 1336811
https://hg.mozilla.org/mozilla-central/rev/9a6f80040aa2
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
I know you want to let this bake for a few days, so please request approval when you're ready.
Flags: needinfo?(honzab.moz)
Tracking 52/54+ for this regression.
This is a shutdown only small memory leak - the only impact is on the CI. I would not take any risk in backporting it.
52 is going to be our next ESR and at least one dependent bug shows these leaks as happening there as well (and a more severe one at that). Saying we should live with these leaks on a supported branch in CI for the next year doesn't seem entirely reasonable to me unless these patches are exceedingly high-risk IMO.
relman can make the call - just noting there is no user impact here to wontfixing
Comment on attachment 8838050 [details] [diff] [review]
v2 (close transaction at shutdown with conn::CloseTransaction(conn->Trans()))

Approval Request Comment
[Feature/Bug causing the regression]: since ever, I think
[User impact if declined]: none
[Is this code covered by automated tests?]: yes!
[Has the fix been verified in Nightly?]: yes!
[Needs manual test from QE? If yes, steps to reproduce]:  no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: a low-risk patch
[Why is the change risky/not risky?]: should this patch cause shutdown crashes or shutdown hangs, we would already know.
[String changes made/needed]: none
Flags: needinfo?(honzab.moz)
Attachment #8838050 - Flags: approval-mozilla-beta?
Attachment #8838050 - Flags: approval-mozilla-aurora?
Comment on attachment 8838050 [details] [diff] [review]
v2 (close transaction at shutdown with conn::CloseTransaction(conn->Trans()))

Fix an intermittent failure. Aurora53+.
Attachment #8838050 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
See Also: → 1262224
Comment on attachment 8838050 [details] [diff] [review]
v2 (close transaction at shutdown with conn::CloseTransaction(conn->Trans()))

let's take this on beta to help with CI for the esr branch
Attachment #8838050 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [necko-active] → [necko-active][stockwell fixed]
You need to log in before you can comment on or make changes to this bug.