Closed Bug 1232558 Opened 9 years ago Closed 8 years ago

Intermittent e10s leakcheck | tab process: 12537 bytes leaked (ChannelEventQueue, ChildDNSService, CompareCache, CompareManager, CompareNetwork, ...)

Categories

(Core :: DOM: Service Workers, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED FIXED
Tracking Status
e10s + ---

People

(Reporter: philor, Assigned: bkelly)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 5 obsolete files)

Component: Networking: DNS → DOM: Service Workers
Flags: needinfo?(ehsan)
Blocks: e10s-tests
tracking-e10s: --- → +
As far as I can tell this has triggered 4 times so far:

  https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232558

It's most likely fallout from bug 1226443 starting update activity late during shutdown.

I'd like to get bug 1231974 landed before looking at this.
Blocks: 1226443
Flags: needinfo?(ehsan)
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Theory:

1) Delayed update is scheduled during a test
2) Tests complete and framework starts cleaning up
3) Delayed update fires, calling PropagateSoftUpdate() which sends a message to parent
4) SWM gets xpcom-shutdown and starts shutting down. No update job is present to be canceled.
5) Parent calls back from the PropagateSoftUpdate() with NotifySoftUpdate()
6) Update job is queued and runs during shutdown. This job is not canceled because it's scheduled after xpcom-shutdown.

This patch makes us just short-circuit in SoftUpdate if we get a NotifySoftUpdate() after shutdown.
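To make the race in steps 3 to 6 concrete, here is a minimal sketch of the short-circuit, assuming a much simplified manager. It is not the actual ServiceWorkerManager code: `ToyServiceWorkerManager`, `OnXPCOMShutdown()`, `QueuedJobs()` and the string scope key are hypothetical stand-ins; only the SoftUpdate/NotifySoftUpdate names come from this bug.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the SWM in the theory above. It only tracks
// whether shutdown has begun and which update jobs have been queued.
class ToyServiceWorkerManager {
 public:
  // Called when the "xpcom-shutdown" notification arrives (step 4).
  void OnXPCOMShutdown() { mShuttingDown = true; }

  // Called when the parent answers PropagateSoftUpdate() (step 5).
  void NotifySoftUpdate(const std::string& scope) { SoftUpdate(scope); }

  void SoftUpdate(const std::string& scope) {
    // The short-circuit: once shutdown has started, a late notification must
    // not create a job, because nothing is left to cancel it (step 6).
    if (mShuttingDown) {
      return;
    }
    mQueuedJobs.push_back(scope);
  }

  const std::vector<std::string>& QueuedJobs() const { return mQueuedJobs; }

 private:
  bool mShuttingDown = false;
  std::vector<std::string> mQueuedJobs;
};
```

Replaying the sequence against the toy class, a NotifySoftUpdate() that arrives after OnXPCOMShutdown() leaves the job list unchanged instead of queuing work during shutdown.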

Let's see if it helps:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c299e3d828df
Still got a leak with that last patch. I realized we are not canceling queued jobs at shutdown, though. Let's see if that helps as well.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e7e543fa561b
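A rough sketch of why canceling only the running job is not enough, assuming a simple FIFO job queue. `ToyJob` and `ToyJobQueue` are hypothetical and do not reflect the real service worker job queue types.

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>

// Hypothetical job with a cancellation flag that holders of the job can see.
struct ToyJob {
  std::string scope;
  bool canceled = false;
  bool ran = false;
};

// Hypothetical FIFO queue: jobs wait here until RunNext() picks them up.
class ToyJobQueue {
 public:
  std::shared_ptr<ToyJob> Enqueue(const std::string& scope) {
    auto job = std::make_shared<ToyJob>();
    job->scope = scope;
    mJobs.push_back(job);
    return job;
  }

  // Cancel every job, including ones that have not started yet. If only the
  // active job were canceled, the waiting ones could still start later
  // during shutdown.
  void CancelAll() {
    for (auto& job : mJobs) {
      job->canceled = true;
    }
    mJobs.clear();
  }

  // Run the next job unless it was canceled.
  void RunNext() {
    if (mJobs.empty()) {
      return;
    }
    auto job = mJobs.front();
    mJobs.pop_front();
    if (!job->canceled) {
      job->ran = true;
    }
  }

  bool Empty() const { return mJobs.empty(); }

 private:
  std::deque<std::shared_ptr<ToyJob>> mJobs;
};
```

Calling CancelAll() from the xpcom-shutdown handler in this model means no queued job gets a chance to run afterwards.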
Attachment #8701334 - Attachment is obsolete: true
A few more shutdown checks added to this patch.
Attachment #8701485 - Attachment is obsolete: true
And then pipe the cancel/abort down into the ServiceWorkerScriptCache CompareManager. Without this, if we are already running the comparison, the cancellation at shutdown does nothing.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c3c9719380b9
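The forwarding described above can be sketched roughly as follows, assuming the update job owns the comparison object and the comparison owns its in-flight network request. `ToyUpdateJob`, `ToyCompareManager` and `ToyChannel` are hypothetical stand-ins, not the real ServiceWorkerScriptCache classes.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for the network request the comparison waits on.
struct ToyChannel {
  bool canceled = false;
  void Cancel() { canceled = true; }
};

// Hypothetical stand-in for CompareManager: owns the in-flight script fetch.
class ToyCompareManager {
 public:
  void Start() { mChannel = std::make_shared<ToyChannel>(); }
  void Abort() {
    if (mChannel) {
      mChannel->Cancel();
    }
  }
  std::shared_ptr<ToyChannel> Channel() const { return mChannel; }

 private:
  std::shared_ptr<ToyChannel> mChannel;
};

// Hypothetical update job. Cancel() forwards to the comparison; without this
// forwarding, a comparison already in progress keeps its channel open.
class ToyUpdateJob {
 public:
  void StartComparison() {
    mCompare = std::make_unique<ToyCompareManager>();
    mCompare->Start();
  }
  void Cancel() {
    if (mCompare) {
      mCompare->Abort();
    }
  }
  ToyCompareManager* Compare() const { return mCompare.get(); }

 private:
  std::unique_ptr<ToyCompareManager> mCompare;
};
```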
Still getting leaks here.  I'm starting to wonder if we're falling victim to the same necko leak in bug 1218297.  I see an EventTokenBucket in the leak list.
See Also: → 1218297
Try build with just the necko leak fix:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=214fc0e2b765

Try build with the necko leak fix and the patches in this bug:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=11867b4133c6

Let's see if the leak shows up in either of these.
Comment on attachment 8701629 [details] [diff] [review]
P1 Try to ensure service worker jobs do not run during shutdown. r=ehsan

The try runs show that the leak still happens, but suggest these patches reduce the frequency. So I'd like to proceed with them for now.

I'm doing more triggers, but so far they seem to reduce the frequency from 15% to 5% on linux debug m-e10s(1).
Attachment #8701629 - Flags: review?(ehsan)
Attachment #8701600 - Flags: review?(ehsan)
Comment on attachment 8701629 [details] [diff] [review]
P1 Try to ensure service worker jobs do not run during shutdown. r=ehsan

Further triggers show the frequency has not actually dropped.
Attachment #8701629 - Flags: review?(ehsan)
Comment on attachment 8701600 [details] [diff] [review]
P2 Abort the ServiceWorkerScriptCache CompareManager at xpcom-shutdown. r=ehsan

This patch is probably a good start, though, since it consistently changes the number of objects leaked.
Attachment #8701600 - Flags: review?(ehsan)
I'll have to resume work on this in January.
See Also: → 1233774
I think we must spin the event loop here in order to gracefully let our network objects clean up after being canceled.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f38bab11917
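A minimal model of why spinning is needed, assuming Cancel() delivers its completion callback as a separate event rather than synchronously. `ToyEventLoop`, `SpinUntil()` and `ToyAsyncChannel` are hypothetical; this is not the Gecko event loop API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical single-threaded event queue standing in for the main thread.
class ToyEventLoop {
 public:
  void Post(std::function<void()> task) { mTasks.push_back(std::move(task)); }

  // Run queued tasks until done() is true. Returns false if the queue empties
  // first; a real event loop would block waiting for more events at that
  // point, which is how a callback that never arrives becomes a hang.
  bool SpinUntil(const std::function<bool()>& done) {
    while (!done()) {
      if (mTasks.empty()) {
        return false;
      }
      auto task = std::move(mTasks.front());
      mTasks.pop_front();
      task();
    }
    return true;
  }

 private:
  std::deque<std::function<void()>> mTasks;
};

// Hypothetical channel whose Cancel() posts its stop notification, so the
// caller must let the loop run before cleanup completes.
class ToyAsyncChannel {
 public:
  explicit ToyAsyncChannel(ToyEventLoop& loop) : mLoop(loop) {}
  void Cancel() {
    mLoop.Post([this] { mStopped = true; });  // stands in for OnStopRequest
  }
  bool Stopped() const { return mStopped; }

 private:
  ToyEventLoop& mLoop;
  bool mStopped = false;
};
```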
Depends on: 1237158
So I believe this leak is greatly exacerbated by excessive updating in the dom/canvas/test mochitests.  I filed bug 1237158 to unregister the service worker there and avoid these updates.  That should reduce the frequency of this bug.

I'd still like to make SWM shutdown cleanly here, though.
I think my shutdown hang issues with the "block shutdown until update job exits out" patch are due to e10s channels not firing OnStopRequest if the actor is torn down.
Jason, do you have any idea why I would not get an OnStopRequest callback from an e10s http channel when I call Cancel() during xpcom-shutdown?  I am spinning the event loop waiting for the callback, but it never seems to come.  I'm having real trouble gracefully closing network connections during shutdown in e10s mode.
Flags: needinfo?(jduell.mcbugs)
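The suspected failure mode can be modeled roughly like this, assuming the child learns about OnStopRequest only through a message from the parent and that messages sent to a torn-down actor are dropped. `ToyIPCActor` and `ToyChildChannel` are hypothetical and only illustrate the hypothesis; they are not the real necko IPC classes.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical IPC actor: once torn down, incoming messages are lost.
class ToyIPCActor {
 public:
  void Send(std::function<void()> message) {
    if (mTornDown) {
      return;  // message silently dropped
    }
    mInbox.push_back(std::move(message));
  }
  void TearDown() {
    mTornDown = true;
    mInbox.clear();
  }

  // Deliver pending messages until done() holds or nothing is left.
  bool DeliverUntil(const std::function<bool()>& done) {
    while (!done()) {
      if (mInbox.empty()) {
        return false;  // a real event loop would block here forever
      }
      auto message = std::move(mInbox.front());
      mInbox.pop_front();
      message();
    }
    return true;
  }

 private:
  bool mTornDown = false;
  std::deque<std::function<void()>> mInbox;
};

// Hypothetical child channel: the parent's reply to Cancel() is what would
// trigger OnStopRequest in the child.
struct ToyChildChannel {
  ToyIPCActor& actor;
  bool gotStopRequest = false;
  void Cancel() {
    actor.Send([this] { gotStopRequest = true; });
  }
};
```

With a live actor the wait finishes; with a torn-down actor the callback never runs, which matches the hang described above.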
See Also: → 1232555
We have decided to simply avoid service workers running during shutdown via bug 1237363.  This has not triggered since bug 1237158 landed.  I'm going to close for now.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
No longer blocks: 1233774
> do you have any idea why I would not get an OnStopRequest callback from an e10s
> http channel when I call Cancel() during xpcom-shutdown? 

Not really.  We're having major issues with TCP/UDP sockets hanging on windows during shutdown (close() never returns), but that generally shouldn't affect HTTP OnStopRequest, which happens when all the bytes for the HTTP request have been received, not when the socket is closed.
Flags: needinfo?(jduell.mcbugs)