Closed Bug 1409115 Opened 2 years ago Closed 2 years ago

Ghost window on Twitter and YouTube (with a broken profile?)

Categories

(Core :: DOM: Service Workers, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox-esr52 --- disabled
firefox57 --- wontfix
firefox58 + fixed
firefox59 + fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

(Blocks 2 open bugs)

Details

(Keywords: memory-leak, Whiteboard: [MemShrink:P1])

Attachments

(2 files, 2 obsolete files)

STR:

- Open Twitter in my default profile (it does not seem to work when I'm logged into Twitter on a clean profile)
- Click on one of the "cards" of an individual Tweet. I clicked on an image generated from a Tweet containing an image.
- Close the tab.
- Go to about:memory, click on "minimize memory usage" a few times, then click on "measure".

In the resulting memory report, there should be no windows for Twitter still alive, but it seems like there are. Eventually, these turn into ghost windows.

This may be the same issue as bug 1398652, but I'll file it separately because that may be something else, and because it is marked MoCo confidential now.
I bisected this, and got this regression range containing bug 1290481:
  https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=7f1714325cb3e40427e40031a2191fe20cdc3241&tochange=6aa1cc819058deaa13f85278e2f5970b8ced0998

[Tracking Requested - why for this release]: This is a regression that causes a bad leak on a popular site (Twitter). The leak in turn can cause severe browser lagginess.
Blocks: 1290481
Keywords: mlk
Whiteboard: [MemShrink]
Tom, can you investigate this regression from bug 1290481? I can try to help with the leak investigation, but I'm not familiar with the quota manager.
Flags: needinfo?(ttung)
I'm kind of skeptical it could be caused by bug 1290481, to be honest.  Almost all the changes in that bug are in code that runs in the parent process or on IO threads.  It has very little interaction with any DOM objects.
Yeah, I'll retry it a few times.

I skimmed through the patches, and this is the only particularly suspicious thing:
  https://hg.mozilla.org/integration/mozilla-inbound/rev/8c60914e10e9#l3.235
That adds a QuotaInfo, which can contain a strong reference to an nsIFile, which I imagine could end up entraining a window, somehow.
(In reply to Andrew McCreight [:mccr8] from comment #4)
> I skimmed through the patches, and this is the only particularly suspicious
> thing:
>   https://hg.mozilla.org/integration/mozilla-inbound/rev/8c60914e10e9#l3.235
> That adds a QuotaInfo, which can contain a strong reference to an nsIFile,
> which I imagine could end up entraining a window, somehow.

I certainly hope not.  We shouldn't be doing any nsIFile stuff on the main thread.  And that code explicitly runs in the parent process.  (Sandbox prevents opening files in the child, etc.)
I bisected a second time using this command (taken from an intermediate step from my prior bisection) and got the same result:
  ./mach mozregression --profile <my profile> --good 2017-09-05 --bad 2017-09-07

This doesn't quite match up with bug 1398652, which was initially reported against a build from 20170905220108, while bug 1290481 would have first shown up a few days later.
I used git bisect across the 16 patches in the series and according to that this is the regressing change set:

Bug 1290481 - P6: Upgrade QuotaManager to v3.0 for adding directory padding file to existing DOM Cache directory. r=bkelly, janv
    
MozReview-Commit-ID: KlVsaGhpABk

(I'm on Linux in case that matters.)
Are you running into an issue where you are trying to downgrade the disk schema and stuff is breaking?  Did you do the mozregression with the profile being copied or recreated each time?
Flags: needinfo?(continuation)
(In reply to Ben Kelly [:bkelly] from comment #8)
> Are you running into an issue where you are trying to downgrade the disk
> schema and stuff is breaking?  Did you do the mozregression with the profile
> being copied or recreated each time?

I think mozregression creates a fresh copy of the profile for each step of the bisection. My profile is on the latest version of Nightly.
Flags: needinfo?(continuation)
Ok, then I have no explanation for how that rev could possibly create this kind of leak.  Also, I have been running FF57 beta which has that rev and I don't see this leak.  Also, the fact you can't reproduce this on a clean profile suggests there is some kind of schema corruption going on.

We may need a copy of your profile to diagnose the problem.
Andrew, do you have any ideas here?
Flags: needinfo?(bugmail)
Summary: Ghost window on Twitter → Ghost window on Twitter (with a broken profile?)
As I understand the operation of mozregression based on the documentation of its "--profile-persistence" option[1] and its default of "clone", given a nightly profile specified like in comment 6, we expect that every time mozregression runs a build prior to the "P6" patch, QuotaManager and IndexedDB and ServiceWorkers will be broken.  This means Twitter can't do anything fancy with IndexedDB or ServiceWorkers and has to fallback to being ephemeral.  (It can use LocalStorage and cookies, however.)

I think the take-away from that is that whatever is causing the ghost window involves IndexedDB or ServiceWorkers.  (I don't see it using the Cache API directly.)

I fear the bisection needs to be re-done using a profile that is freshly created using Firefox 56, which should eliminate the schema change issue.  The default --profile-persistence=clone should work.  (Maybe "clone-first" would work as well if that first run happens at the oldest point of the bisection and there were no schema upgrade backouts involved.)

1: https://github.com/mozilla/mozregression/blob/1256293c1c27b2e7b619827e75ddf377566b2fff/mozregression/cli.py#L144
Flags: needinfo?(bugmail)
I guess that makes sense. It sounds like the regression window is not going to be accurate then, as bkelly originally suggested, so I'll clear the various flags.
No longer blocks: 1290481
Flags: needinfo?(ttung)
Keywords: regression
Blocks: GhostWindows
Related questions would be:
- Are web notifications turned on?
- Do you notice ServiceWorkers for twitter hanging out under "/workers(twitter.com)/worker(https://twitter.com/push_service_worker.js" in the memory reports?  (Maybe also under decommitted?)
(In reply to Andrew Sutherland [:asuth] from comment #14)
> Related questions would be:
> - Are web notifications turned on?

I don't think so.

> - Do you notice ServiceWorkers for twitter hanging out under
> "/workers(twitter.com)/worker(https://twitter.com/push_service_worker.js" in
> the memory reports?  (Maybe also under decommitted?)

No.
STR wfm on my nightly 58.
(Sorry, that was not clear: using mccr8's STR, I also got the same ghost results.)
(In reply to David Durst [:ddurst] from comment #17)
> (Sorry, that was not clear: using mccr8's STR, I also got the same ghost
> results.)

What if you create a new profile?
Flags: needinfo?(ddurst)
I'm trying to re-create it with the same one (because in theory I should be able to).
And now I can't re-create it. I'm on 20171016100113, which I *think* happened in the process of restarting to clean things up,  -- but tbh, I didn't record the build ID before (I updated in 1398652 from 20170915100121, but I didn't record to what before now). :\
Flags: needinfo?(ddurst)
Just happened again. Only 3 tabs open (twitter, wikipedia, pitchfork). After minimizing memory usage, there's already one ghost.
David, Andrew, do either of you have any cookie or storage preferences changed from the default?  For example delete cookies on exit, etc?

I noticed I had a profile that could not do SW things and its because we have disabled SW in FF57 when nsContentUtils::StorageAllowed() returns eDeny.

Maybe we have a leak somewhere when things are denied this way?
Flags: needinfo?(ddurst)
Flags: needinfo?(continuation)
(In reply to Ben Kelly [:bkelly] from comment #22)
> David, Andrew, do either of you have any cookie or storage preferences
> changed from the default?  For example delete cookies on exit, etc?

Not as far as I know. (Right now I also have ghost windows for Facebook and Reddit, FWIW...)
Flags: needinfo?(continuation)
We probably need one of the problem profiles zipped up and shared with me, asuth, or janv.

Or, maybe try running in a DEBUG build and see if any useful warnings pop out from dom/cache, dom/quota, etc?
Not as far as I know either. FWIW, I think my most recent was a ghost for wikipedia, and not twitter.
Flags: needinfo?(ddurst)
Priority: -- → P2
Hmm, maybe now that :asuth has landed his backcompat schema hack for bug 1290481 we could try:

1. Upgrade to nightly with bug 1404344.  (May not be spun yet.)
2. See if the leak reproduces when you run on a build before bug 1290481.  (This should now avoid the schema bustage with the downgrade.)
3. If so, start a new bisect with the first bad being before bug 1290481.  (You want to avoid traversing revisions between bug 1290481 and bug 1404344.

What do you think?
Flags: needinfo?(continuation)
(In reply to Ben Kelly [:bkelly] from comment #26)
> What do you think?

The problem is that I don't remember seeing ghost windows back in the start of September, so I think maybe the new factor is something in my browser state and not Firefox itself.
Am I correct in assuming that "Refresh Nightly" would be sufficient to clean up the profile -- and then see if this recurs?
Just reproduced it with a cleaned profile using build 20171016220427, and not twitter -- in this case, it was from a forum (https://www.doityourself.com/) that has a lot of ads embedded.

I can try using other builds, since this was neither a lot of windows (8) and regular use. Should be able to reproduce in just a few days, but I can't start until later this week.
Summary: Ghost window on Twitter (with a broken profile?) → Ghost windows
(In reply to David Durst [:ddurst] from comment #29)
> Just reproduced it with a cleaned profile using build 20171016220427, and
> not twitter -- in this case, it was from a forum
> (https://www.doityourself.com/) that has a lot of ads embedded.

Did that happen when the tab for the forum was still open or after you closed it?
Still open.
(In reply to David Durst [:ddurst] from comment #31)
> Still open.

That's not really a leaking window then, most likely. Probably the page removed the iframe but it is still holding it alive through JS. It is a false positive of our current ghost window heuristic. See bug 1360310.
Flags: needinfo?(continuation)
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #32)
> (In reply to David Durst [:ddurst] from comment #31)
> > Still open.
> 
> That's not really a leaking window then, most likely. Probably the page
> removed the iframe but it is still holding it alive through JS. It is a
> false positive of our current ghost window heuristic. See bug 1360310.

Aha. Gotcha. Resetting.
Summary: Ghost windows → Ghost window on Twitter (with a broken profile?)
Duplicate of this bug: 1398652
I have tested this issue using the provided STR from comment 0, but I haven't managed to reproduce it. Tried to reproduce this on latest Nightly (58.0a1) build, on a older Nightly build from 2017-10-16 and also on latest Beta 57.0b12 build.
Tried with multiple scenarios, but every time when I have closed the Twitter tab there were no ghost window in memory report. I have used a clean profile and also an older one for testing.

Andrew, in comment 7 you said that you are on Linux, what version of Linux did you use? I have tested this on Arch Linux and also on Windows 7.
Whiteboard: [MemShrink] → [MemShrink:P2]
I'm not really sure what to do with this.
Flags: needinfo?(continuation)
I've done a little more investigation into this, on my same profile. I added some code to run the Unlink method on the ghost window. This manages to get rid of most of the DOM stuff for the tab, but the GC stuff all remains. It looks like the JS reflector for the window is being leaked via a field called mScriptOwner. Based on SearchFox, this comes from IDB.
Component: DOM → DOM: IndexedDB
Flags: needinfo?(jvarga)
Ok, maybe that's not fair. There's a Promise with one unknown reference that is keeping an IDBDatabase alive:

0x7f6d091dcd60 [Promise]
    --[mPromiseObj]--> 0x7f6cf4123f40 [JS Object (Promise)]
    --[group_global]--> 0x7f6d0579a100 [JS Object (Window)]
    --[globalIndexedDBs]--> 0x7f6cf412df20 [JS Object (Object)]
    --[dm_typeahead]--> 0x7f6d0384e4c0 [JS Object (Object)]
    --[database]--> 0x7f6cfd3d8730 [JS Object (IDBDatabase)]
    --[UnwrapDOMObject(obj)]--> 0x7f6d06076120 [IDBDatabase ]

I'm not sure why the Promise is alive.
Component: DOM: IndexedDB → DOM
Flags: needinfo?(jvarga)
Ben is also seeing a leak on Twitter, in bug 1422912.

One thing we noticed is that for both of us, if we try to enable web notifications via the Twitter settings, clicking on the "Turn on" button does nothing. On my profile on my other machine, if I enable it and then block notifications, Twitter gives me an error message. Ben's serviceworker.txt was corrupted, and when he cleared it, the leak seemed to stop happening. My serviceworker.txt file looks okay at a glance, but maybe my service workers are also in a broken state.
See Also: → 1422912
When I get a verbose CC log, I see one or two nsGlobalWindows for twitter leaking, one for the card and one for twitter itself. For the card window, it is held alive by a promise, as seen above in comment 38.

I did refcount logging for nsGlobalWindow, and looked for the missing reference. The window is not held alive through shutdown. Looking from the end, there is a ton of refcount traffic associated with the shutdown CC, and before that, there's a bunch from when I ran the memory reporter, but in the middle there is this single release, which I think must be the missing reference:

<nsGlobalWindowInner> 0x7f232e679800 4 Release 2370 [thread 0x7f23524834c0]
    #01: operator delete(void*) (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/mozalloc.h:230)
    #02: nsTArray_Impl<RefPtr<mozilla::dom::workers::ServiceWorkerJob::Callback>, nsTArrayInfallibleAllocator>::DestructRange(unsigned long, unsigned long) (/media/ssd/mc/obj-dbg.noindex/dist/include/nsTArray.h:2025 (discriminator 32))
    #03: ~nsTArray_Impl (/media/ssd/mc/obj-dbg.noindex/dist/include/nsTArray.h:887)
    #04: ~RefPtr (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/RefPtr.h:78)
    #05: operator delete(void*) (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/mozalloc.h:230)
    #06: mozilla::dom::workers::ServiceWorkerJob::Release() (/media/ssd/mc/dom/workers/ServiceWorkerJob.h:144 (discriminator 10))
    #07: nsTArray_Impl<RefPtr<mozilla::dom::workers::ServiceWorkerJob>, nsTArrayInfallibleAllocator>::DestructRange(unsigned long, unsigned long) (/media/ssd/mc/obj-dbg.noindex/dist/include/nsTArray.h:2025 (discriminator 34))
    #08: mozilla::RefPtrTraits<mozilla::dom::workers::ServiceWorkerJobQueue>::Release(mozilla::dom::workers::ServiceWorkerJobQueue*) (/media/ssd/mc/obj-dbg.noindex/dist/include/mozilla/RefPtr.h:41)
    #09: mozilla::dom::workers::ServiceWorkerManager::Observe(nsISupports*, char const*, char16_t const*) (/media/ssd/mc/dom/workers/ServiceWorkerManager.cpp:3863)
    #10: non-virtual thunk to mozilla::dom::workers::ServiceWorkerManager::Observe(nsISupports*, char const*, char16_t const*) (/media/ssd/mc/dom/workers/ServiceWorkerManager.cpp:3835)
    #11: nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) (/media/ssd/mc/xpcom/ds/nsObserverList.cpp:111)
    #12: nsTHashtable<nsObserverList>::GetEntry(char const*) const (/media/ssd/mc/xpcom/ds/nsTHashtable.h:135 (discriminator 2))
    #13: mozilla::ShutdownXPCOM(nsIServiceManager*) (/media/ssd/mc/xpcom/build/XPCOMInit.cpp:865 (discriminator 2))

It looks like that on shutdown the ServiceWorkerManager purges some list of jobs, which causes an array ServiceWorkerJob::Callbacks to be destroyed. ServiceWorkerResolveWindowPromiseOnRegisterCallback is a subclass of Callback that holds a strong pointer to a window, so maybe that's what is at fault?
Component: DOM → DOM: Service Workers
Attached patch Diagnostic patch. DO NOT LAND (obsolete) — Splinter Review
I made this little diagnostic patch that sets a global flag when the service manger is shutting down, and then prints out a message in the ServiceWorkerResolveWindowPromiseOnRegisterCallback dtor that includes the window and the promise if we are doing that during shutdown. I confirmed that on my Twitter test case that the window matches the twitter window I am leaking, and the Promise is the same Promise that is rooting the card window.
Blocks: 1420795
I'm seeing this Twitter leak on my Linux machine's main profile. On my OSX machine's main profile, I can readily reproduce a YouTube ghost window, by shift-clicking to a particular YouTube video from a particular reddit page. Note that these STR don't work on my Linux machine, but they do work 100% of the time on my OSX profile! I ran my default profile in a debug build with the patch here, and I confirmed that the leaking window was also being held alive as the window field of a ServiceWorkerResolveWindowPromiseOnRegisterCallback. It is possible that bug 1420795 is the same thing.
Summary: Ghost window on Twitter (with a broken profile?) → Ghost window on Twitter and YouTube (with a broken profile?)
Also, I did a try push that fatally asserts when we leak a ServiceWorkerResolveWindowPromiseOnRegisterCallback through shutdown, and it didn't trigger, so this isn't a case that seems to be covered by our testing. https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bd087728a331bb8486bdd8e5642c31a66c17478
There is some other bug causing the service worker job queue to get stuck.  We need to investigate that as well.  Fixing the leak when it gets stuck is worth doing separate, though.
The way this service worker ownership works is like this: there's a per-process ServiceWorkerManager, which has a table where the values are strong references to RegistrationDataPerPrincipal. This has a table mJobQueues where the values are strong references to ServiceWorkerJobQueue. This contains an array of strong references to ServiceWorkerJob. This in turn contains various strong references to ServiceWorkerJob::Callback.

That class has a few subtypes:
a) ServiceWorkerJobQueue::Callback, which contains a strong reference back to a ServiceWorkerJobQueue.
b) ServiceWorkerResolveWindowPromiseOnRegisterCallback, which contains a strong reference to a window and to a Promise (which in turn will end up containing a strong reference to a window).
c) UnregisterJobCallback contains a strong reference to a nsIServiceWorkerUnregisterCallback, which can be implemented in JS or C++. One of the C++ implementations, UnregisterCallback, holds a strong reference to a promise.
d) UpdateJobCallback, which also has a subtype that holds a strong reference to a promise.

The keys for the various tables seem to be based on something more like a principal than a window, but they can keep alive per-window things. This whole setup seems prone to leaking, but I'm not sure how to improve it.

The strong window reference in (b) can be changed to a weak reference (nobody would expect these service worker things to be keep the window alive), but I don't know what to do about all of the promises, because those are probably only kept alive by the service worker stuff, so we can't make a weak reference to it.

At least for the case of ServiceWorkerResolveWindowPromiseOnRegisterCallback, it creates some kind of blank promise:
  RefPtr<Promise> promise = Promise::Create(sgo, result);
...and then doesn't do anything until with it until it is ready to be resolved. It looks like it would be possible to keep a weak reference to the window in the callback, and then lazily create the actual promise from that window once we're ready to actually resolve it. I'll see if that's possible for the other cases.
Oh I see, ServiceWorkerManager::Register() actually returns the promise. So I guess you do need to actually create the promise there. But on the other hand, I think that means it is okay to only have a weak reference to it in the callback, because presumably we don't actually need to do anything with it if content has thrown away any reference to it.
(In reply to Andrew McCreight [:mccr8] from comment #46)
> Oh I see, ServiceWorkerManager::Register() actually returns the promise. So
> I guess you do need to actually create the promise there. But on the other
> hand, I think that means it is okay to only have a weak reference to it in
> the callback, because presumably we don't actually need to do anything with
> it if content has thrown away any reference to it.

This sounds correct to me.
I have a patch that makes a bunch of callback references weak, and it seems to fix the Twitter leak for me. I haven't tried it on YouTube yet. One problem is that WorkerThreadUpdateCallback has a RefPtr<PromiseWorkerProxy>, which should really get the same treatment, but the proxy doesn't support weak references so I'm not sure how to do that.
For (d), the PromiseResolverCallback held by UpdateJobCallback holds alive a promise, but it looks like it is some kind of MozPromise, which is some multithreaded thing that is not directly holding alive JS, so I'll leave that alone.
Assignee: nobody → continuation
The PromiseWorkerProxy is for a Promise created on a worker global, so it should not be holding any window references.
This seems to break service workers enough that even the first test in dom/workers/test/serviceworkers/ times out, so maybe it isn't a real fix, but it just breaks service workers so they can't run and thus we can't leak.
I guess the problem is when script chains off the promise, but doesn't hold it alive.
Oh, right, that makes sense. Somehow the ownership of these promises has to be tied to the window and not the service worker manager. If the window owns them, then we can tell the CC about it and then we shouldn't have this leak.
I wonder if we should have some kind of PromiseWindowProxy like we have the PromiseWorkerProxy.  The window would then have some mechanism of signaling the window is shutting down, etc.

This would be easier if we had something that provided a common shutdown hook for nsIGlobalObjects like bug 1379209 suggests.
I implemented my idea by adding a multiset of strong references to promises to nsGlobalWindow. When service workers create a promise, they add it to this multiset. The CC traverses/unlink the multiset, so these promises can't keep the window alive, but the promises won't go away early. I encapsulated all of this in a PromiseWorkerProxy class as Ben suggested. This still fixes the leak, and passes at least the mochitest plain service worker tests.
Attachment #8934749 - Attachment is obsolete: true
L64 try looks okay: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d7c24a281451f9db3d78a62bac870538ef130a80

I have no idea how to test this, unfortunately.
(In reply to Andrew McCreight [:mccr8] from comment #58)
> L64 try looks okay:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=d7c24a281451f9db3d78a62bac870538ef130a80
> 
> I have no idea how to test this, unfortunately.

I think you would have to add some kind of xpidl or webidl interface that creates a promise, but never resolves it.  Not sure if its worth it or not...
Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

https://reviewboard.mozilla.org/r/205976/#review212552

::: dom/promise/PromiseWindowProxy.h:40
(Diff revision 3)
> +// promise. It solves (2) by adding a strong reference to the promise
> +// on the window itself. This ensures that as long as both the
> +// PromiseWindowProxy and the window are still alive that the promise
> +// will remain alive. This strong reference is cycle collected, so it
> +// won't cause the window to leak.
> +class PromiseWindowProxy

final

::: dom/promise/PromiseWindowProxy.h:43
(Diff revision 3)
> +// will remain alive. This strong reference is cycle collected, so it
> +// won't cause the window to leak.
> +class PromiseWindowProxy
> +{
> +public:
> +  PromiseWindowProxy(nsPIDOMWindowInner* aWindow, mozilla::dom::Promise* aPromise);

Should we provide Move() constructors, etc?  I guess we can add them when we need to.

::: dom/promise/PromiseWindowProxy.cpp:21
(Diff revision 3)
> +
> +
> +PromiseWindowProxy::PromiseWindowProxy(nsPIDOMWindowInner* aWindow, Promise* aPromise)
> +  : mPromise(aPromise)
> +{
> +  auto* window = nsGlobalWindowInner::Cast(aWindow);

MOZ_DIAGNOSTIC_ASSERT() that aWindow and aPromise are non-nullptr here?

::: dom/promise/PromiseWindowProxy.cpp:37
(Diff revision 3)
> +}
> +
> +RefPtr<Promise>
> +PromiseWindowProxy::Get() const
> +{
> +  if (!mPromise) {

Please add MOZ_ASSERT(NS_IsMainThread()) to the various proxy methods.
Attachment #8935134 - Flags: review?(bkelly) → review+
Pushed by amccreight@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/70ab37e6d1ed
Fix leaks by adding promise window proxy. r=bkelly
[Tracking Requested - why for this release]: This is a really bad leak that has started to affect YouTube, possibly as they started rolling out use of a new WebAPI. It only affects some Firefox profiles, for unknown reasons.
Whiteboard: [MemShrink:P2] → [MemShrink:P1]
See Also: 1422912
Duplicate of this bug: 1422912
Blocks: 1421681
Tracking this for 58/59 since it's a memory leak happening with a popular site.
https://hg.mozilla.org/mozilla-central/rev/70ab37e6d1ed
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

Approval Request Comment
[Feature/Bug causing the regression]: service workers
[User impact if declined]: really bad memory leaks on Youtube and Twitter, for some unknown segment of users
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes, I checked locally
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: Not very
[Why is the change risky/not risky?]: It doesn't change too much, and it has been on Nightly for a few days without any apparent problems.
[String changes made/needed]: none
Attachment #8935134 - Flags: approval-mozilla-beta?
Comment on attachment 8935134 [details]
Bug 1409115 - Fix leaks by adding promise window proxy.

While this maybe a big code change, it seems like a good bug to fix. Beta58+

Gerry, fyi.
Flags: needinfo?(gchang)
Attachment #8935134 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
The patch does not apply to beta. Please take a look. Thanks!
Flags: needinfo?(continuation)
Note, we uplifted the fix for the problem causing the SW job queue to stall.  So maybe we can get away without uplifting this if it's too hard.
The patch cannot be landed. Here is the error message:

grafting 454683:70ab37e6d1ed "Bug 1409115 - Fix leaks by adding promise window proxy. r=bkelly"
merging dom/base/nsGlobalWindow.cpp and dom/base/nsGlobalWindowInner.cpp to dom/base/nsGlobalWindow.cpp
merging dom/promise/moz.build
merging dom/workers/ServiceWorkerManager.cpp
merging dom/workers/ServiceWorkerRegistration.cpp
warning: conflicts while merging dom/base/nsGlobalWindow.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')

Please take a look. Thanks!
Looks like this just needs rebasing around the Inner/Outer split. Should be a fairly mechanical change.
The service worker job queue can get stuck sometimes, so we don't want
the per-process service worker data structure to hold strong
references to content. Instead, I add a proxy that is only a weak
reference. The wrinkle is that we need to keep the promise alive as
long as the job and the window are otherwise alive. I solve this by
putting a cycle collected reference to the promise on the window
itself.

MozReview-Commit-ID: 4qFaKQYB9mX
I've rebased this for beta.
Flags: needinfo?(continuation)
Sorry about that. In the middle of trying to rebase my patch, I changed two places that didn't need to be changed. I think I've fixed it, but I'll wait to upload a new patch until my try run finishes. https://treeherder.mozilla.org/#/jobs?repo=try&revision=970838864911270be00443a2f636c7cc4de27b82
Attachment #8937829 - Attachment is obsolete: true
Flags: needinfo?(continuation)
Duplicate of this bug: 1420795
Flags: needinfo?(gchang)
You need to log in before you can comment on or make changes to this bug.