Closed Bug 1612009 Opened 6 years ago Closed 5 years ago

test_https_origin_after_redirect.html leaks with DocumentChannel and SW-e10s disabled

Categories

(Core :: Networking, defect, P2)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox72 --- unaffected
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix

People

(Reporter: mccr8, Assigned: asuth)

References

(Regression)

Details

(Keywords: memory-leak, regression, Whiteboard: [MemShrink:P2][necko-triaged])

Attachments

(1 file, 1 obsolete file)

If you apply the patch in bug 1610888, then the test dom/serviceworkers/test/test_https_origin_after_redirect.html leaks. This got papered over on beta, and is probably too late to fix, but it would be good to understand what is happening.

The leaks are a BrowsingContext, a BrowsingContextGroup, a CanonicalBrowsingContext (probably the same thing as the BrowsingContext), a SessionStorageManager, and a few other things.

These kind of hold onto each other, so the question is, what is keeping the clump alive? I looked at the shutdown CC logs, and the BrowsingContext is being held alive by a BrowserParent, which holds alive the rest of it. That shows up in the first CC log, then they never appear again.

Note that BrowserParent doesn't show up as a leak. My theory is that something is holding the BrowserParent alive, we run the shutdown CC, then whatever it is releases the BrowserParent, which releases the clump of objects, but we're not going to run the CC again, so we leak. It is possible that the BrowserParent is being leaked until shutdown, which is not good.

Keywords: memory-leak
Whiteboard: [MemShrink:P2]

It looks like the BrowserParent is being released when we destroy a channel at shutdown. I'll attach a stack for the final release once the (slow) OSX symbolicating finishes.

Component: DOM: Navigation → Networking

Andreas, can you look into this a bit and see if we need to do something else in 73 to ensure we don't leak?

Flags: needinfo?(afarre)

I'm going to try to patch and reproduce, but my initial thougths go to the cycle collected fields of CanonicalBrowsingContext that are neither traversed or unlinked. That's my initial hypothesis at lest.

Assignee: nobody → afarre
Status: NEW → ASSIGNED
Flags: needinfo?(afarre)

That's a good thing to look at. The channel stuff being involved also makes me think about the various leaks we have when we forget to call OnStop on a channel or whatever. Unfortunately, I don't really know about how that is supposed to work.

Including the PendingRemotenessChange in CC did nothing, but I tried adding:

--- a/dom/ipc/BrowserParent.cpp
+++ b/dom/ipc/BrowserParent.cpp
@@ -741,16 +741,17 @@ void BrowserParent::ActorDestroy(ActorDestroyReason why) {
 
       if (!mBrowsingContext->IsTopContent()) {
         OnSubFrameCrashed();
       }
     }
   }
 
   mFrameLoader = nullptr;
+  mBrowsingContext = nullptr;
 }

and this removes the leak, so there's definitely a cycle going through here. Not sure if this is the correct fix, but it's /a/ fix at least.

Make sure to null out mBrowsingContext when
BrowserParent::ActorDestroy is called, or else we'll leak it.

Do we know at this point what the in-the-wild impact of this leak is? Is this likely to be something high-magnitude or can we safely say that shipping 73 with it isn't likely to be the end of the world?

Priority: -- → P2
Whiteboard: [MemShrink:P2] → [MemShrink:P2][necko-triaged]

Jens, is there a service worker person who could look at this issue? Leaking a channel in the RedirectChannelRegistrar until shutdown in a service worker test that involves redirection feels like it could be a service worker issue. Hopefully we're shipping DocumentChannel in the next release, but I do worry that there might be an underlying issue that persists even with DocumentChannel, so it would be good if we could understand what is going on here better. Thanks.

Flags: needinfo?(jstutte)
Flags: needinfo?(jstutte) → needinfo?(bugmail)

Normal opt build pernosco trace, investigating still ongoing, leaving needinfo:
https://pernos.co/debug/RWk9ovrL1s4Fqy7ZEz-auw/index.html

Attachment #9125043 - Attachment is obsolete: true

Can I assign the bug to you while you're working on it? I can take it again if it turns out to not be service workers?

Yes, taking. Leaving this in Networking for now because it's likely to be Channel-related still.

Assignee: afarre → bugmail
Flags: needinfo?(bugmail)

Will you come to this soon, Andrew? Thanks!

Flags: needinfo?(bugmail)

I should be able to get to this next week.

Flags: needinfo?(bugmail)
Flags: needinfo?(jstutte)

Setting as wontfix for 77 given that it is an unresolved p2.

Flags: needinfo?(jstutte)

Marking this WFM since all loads now use sw-e10s and DocumentChannel. Also, I believe there were some ordering consistency fixes made in DocumentChannel in the course of making some enhancements related to that.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: