support.microsoft.com ghost window via IDBDatabase object on 57 beta

RESOLVED FIXED in Firefox 57

Status

()

defect
P1
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: bkelly, Assigned: janv)

Tracking

(Blocks 1 bug)

unspecified
mozilla58
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox57+ fixed, firefox58 fixed)

Details

(Whiteboard: [MemShrink])

Attachments

(2 attachments, 1 obsolete attachment)

I currently have a ghost window on FF57 beta:

236.35 MB (100.0%) -- explicit
├──109.54 MB (46.35%) -- window-objects
│  ├───48.39 MB (20.47%) ++ top(https://mail.google.com/mail/u/0/#inbox, id=10737418241)
│  ├───32.60 MB (13.79%) ++ top(https://www.irccloud.com/irc/freenode/channel/whatwg, id=10737418243)
│  ├───15.42 MB (06.52%) -- top(none)
│  │   ├──11.01 MB (04.66%) -- ghost
│  │   │  ├──10.46 MB (04.43%) -- window(https://support.microsoft.com/en-us/help/2207548/slow-performance-on-windows-server-when-using-the-balanced-power-plan)
│  │   │  │  ├───4.47 MB (01.89%) ── layout/style-sheets
│  │   │  │  ├───4.36 MB (01.84%) -- js-compartment(https://support.microsoft.com/en-us/help/2207548/slow-performance-on-windows-server-when-using-the-balanced-power-plan)
│  │   │  │  │   ├──3.90 MB (01.65%) ++ classes
│  │   │  │  │   └──0.46 MB (00.19%) ++ (4 tiny)
│  │   │  │  └───1.63 MB (00.69%) ++ (2 tiny)
│  │   │  └───0.55 MB (00.23%) ++ window(about:blank)
│  │   └───4.41 MB (01.87%) -- detached
│  │       ├──4.40 MB (01.86%) -- window(https://www.google.com/search?q=windows+high+performance+profile&ie=utf-8&oe=utf-8)
│  │       │  ├──2.78 MB (01.18%) -- js-compartment(https://www.google.com/search?q=windows+high+performance+profile&ie=utf-8&oe=utf-8)
│  │       │  │  ├──2.60 MB (01.10%) ++ classes
│  │       │  │  └──0.17 MB (00.07%) ++ (4 tiny)
│  │       │  └──1.63 MB (00.69%) ++ (3 tiny)
│  │       └──0.01 MB (00.00%) ++ (3 tiny)
│  └───13.14 MB (05.56%) ++ top(https://bugzilla.mozilla.org/enter_bug.cgi?product=Core&component=DOM, id=10737419362)

I analyzed the CC log and found:

bkelly@valen:/mnt/c/devel/tmp/cclogs$ grep nsGlobalWindow * | grep microsoft
cc-edges.4324.1508506761.log:0000021E9AA19000 [rc=295] nsGlobalWindow # 10737419057 inner https://support.microsoft.com/en-us/help/2207548/slow-performance-on-windows-server-when-using-the-balanced-power-plan
bkelly@valen:/mnt/c/devel/tmp/cclogs$ /mnt/c/devel/heapgraph/find_roots.py cc-edges.4324.1508506761.log 0000021E9AA19000
Parsing cc-edges.4324.1508506761.log. Done loading graph. 

0000021EAD997B70 [IDBDatabase https://support.microsoft.com/en-us/help/2207548/slow-performance-on-windows-server-when-using-the-balanced-power-plan]
    --[mFactory]--> 0000021EAC64E400 [IDBFactory]
    --[mWindow]--> 0000021E9AA19000 [nsGlobalWindow # 10737419057 inner https://support.microsoft.com/en-us/help/2207548/slow-performance-on-windows-server-when-using-the-balanced-power-plan]

    Root 0000021EAD997B70 is a ref counted object with 1 unknown edge(s).

It appears an IDBDatabase object has been leaked for some reason and is holding the window alive.

I'm not exactly sure on STR.  Its possible I closed this tab while it was in the middle of loading.
The google search window is also being held alive from the same IDBDatabase object:

bkelly@valen:/mnt/c/devel/tmp/cclogs$ grep nsGlobalWindow * | grep google.com/search
cc-edges.4324.1508506761.log:0000021E9D13A000 [rc=259] nsGlobalWindow # 10737419108 inner https://www.google.com/search?q=windows+high+performance+profile&ie=utf-8&oe=utf-8
bkelly@valen:/mnt/c/devel/tmp/cclogs$ /mnt/c/devel/heapgraph/find_roots.py cc-edges.4324.1508506761.log 0000021E9D13A000
Parsing cc-edges.4324.1508506761.log. Done loading graph. 

0000021EAD997B70 [IDBDatabase https://support.microsoft.com/en-us/help/2207548/slow-performance-on-windows-server-when-using-the-balanced-power-plan]
    --[mScriptOwner]--> 0000021E9B785060 [JS Object (Window)]
    --[CLASS_OBJECT(Object), CLASS_OBJECT(Function), **UNKNOWN SLOT 175**]--> 0000021E9B787080 [JS Object (Proxy)]
    --[proxy target]--> 0000021EC96C7B80 [JS Object (Proxy)]
    --[group_global, proxy target]--> 0000021E9B785380 [JS Object (Window)]
    --[UnwrapDOMObject(obj)]--> 0000021E9D13A000 [nsGlobalWindow # 10737419108 inner https://www.google.com/search?q=windows+high+performance+profile&ie=utf-8&oe=utf-8]

    Root 0000021EAD997B70 is a ref counted object with 1 unknown edge(s).


Is that because of the opener relationship?
I tried performing the search and visiting the site again, but I can't reproduce.  I wonder if its just an IDB upgrade path or something that I'm not triggering again.
Component: DOM → DOM: IndexedDB
Jan, please investigate.
Assignee: nobody → jvarga
Flags: needinfo?(jvarga)
Priority: -- → P1
(In reply to Olli Pettay [:smaug] (work week Oct 16-20) from comment #4)
> So the obvious first thing to look at is all the member variables keeping a
> strong reference to IDBDataBase. At least the following and explain why they
> can't cause leaks:
> http://searchfox.org/mozilla-central/rev/
> 8a6a6bef7c54425970aa4fb039cc6463a19c0b7f/dom/indexedDB/ActorsChild.h#431

BackgroundDatabaseRequestChild is short lived, shouldn't be a problem

> http://searchfox.org/mozilla-central/rev/
> 8a6a6bef7c54425970aa4fb039cc6463a19c0b7f/dom/indexedDB/ActorsChild.h#304

This strong ref only exists until received database from the parent is fully constructed.
Is it possible the data from the parent isn't ever fully constructed?
(In reply to Ben Kelly [:bkelly] from comment #2)
> I tried performing the search and visiting the site again, but I can't
> reproduce.  I wonder if its just an IDB upgrade path or something that I'm
> not triggering again.

So, if there's a version change, IDBDatabase is created and held alive by mTemporaryStrongDatabase.
I can imagine that if something fails during version change, we may not clear stuff correctly.
I'll try to investigate this tomorrow.
I still can't reproduce it, but I will continue investigating, it's my top priority.
I can reproduce using a simplified test case, working on a fix.
Flags: needinfo?(jvarga)
Posted patch fix (obsolete) — Splinter Review
Andrew, can you take a look at this patch ?
I'll also attach a simple test you can use to verify the fix.
Basically, when a version change transaction is aborted we never release the strong "temporary" ref to created IDBDatabase object.
Attachment #8922462 - Flags: feedback?(bugmail)
Posted file test
Tracked for 57
Comment on attachment 8922462 [details] [diff] [review]
fix

Review of attachment 8922462 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

Restating: ReleaseDOMObject() was only being invoked in the HandleResponse(OpenDatabaseRequestResponse&) overload path, but errors trigger the HandleResponse(nsresult) overload path, so ReleaseDOMObject() was not being invoked in the error case.  This patch adds a weak back-reference to ensure ReleaseDOMObject() is called in that case.  There is also some invariant maintenance that the back-reference is cleared when the BackgroundDatabaseChild is destroyed.

::: dom/indexedDB/ActorsChild.cpp
@@ +2146,5 @@
>  
>    MaybeCollectGarbageOnIPCMessage();
>  
> +  if (mOpenRequestActor && mTemporaryStrongDatabase) {
> +    mOpenRequestActor->SetDatabaseActor(nullptr);

Should we also be nulling out mTemporaryStrongDatabase here if we believe it can be non-null in the first place?  If not, I think a comment is in order.  I do understand this clause here is mainly for the invariant about ensuring the weak back-reference is cleared.

::: dom/indexedDB/ActorsChild.h
@@ +252,5 @@
>    friend class PermissionRequestChild;
>    friend class PermissionRequestParent;
>  
>    RefPtr<IDBFactory> mFactory;
> +  BackgroundDatabaseChild* mDatabaseActor;

I think this wants to be annotated with MOZ_NON_OWNING_REF[1].  A comment that helps clarify the cycle would be great too.  I need to step out for lunch, but can help provide one in the review phase.

1: https://searchfox.org/mozilla-central/rev/dd47bee6468de7e1221b4d006342ad6b9813d0e5/mfbt/Attributes.h#512
Attachment #8922462 - Flags: feedback?(bugmail) → feedback+
Ok, I'll address your suggestions.
Thanks for quick feedback!
Posted patch fixSplinter Review
Attachment #8922462 - Attachment is obsolete: true
Attachment #8923189 - Flags: review?(bugmail)
(In reply to Andrew Sutherland [:asuth] from comment #14)
> ::: dom/indexedDB/ActorsChild.cpp
> @@ +2146,5 @@
> >  
> >    MaybeCollectGarbageOnIPCMessage();
> >  
> > +  if (mOpenRequestActor && mTemporaryStrongDatabase) {
> > +    mOpenRequestActor->SetDatabaseActor(nullptr);
> 
> Should we also be nulling out mTemporaryStrongDatabase here if we believe it
> can be non-null in the first place?  If not, I think a comment is in order. 
> I do understand this clause here is mainly for the invariant about ensuring
> the weak back-reference is cleared.

I think this can only happen in some pathological case and then other things would fail too.
So I removed it from the patch.

> 
> ::: dom/indexedDB/ActorsChild.h
> @@ +252,5 @@
> >    friend class PermissionRequestChild;
> >    friend class PermissionRequestParent;
> >  
> >    RefPtr<IDBFactory> mFactory;
> > +  BackgroundDatabaseChild* mDatabaseActor;
> 
> I think this wants to be annotated with MOZ_NON_OWNING_REF[1].  A comment
> that helps clarify the cycle would be great too.  I need to step out for
> lunch, but can help provide one in the review phase.

I checked the documentation for this and it says that it's for reference counted objects.
BackgroundDatabaseChild is not ref-counted. Anyway, I added a comment that explains the problem and the need
for the back-reference.
Comment on attachment 8923189 [details] [diff] [review]
fix

Review of attachment 8923189 [details] [diff] [review]:
-----------------------------------------------------------------

Excellent comment, thank you!
Attachment #8923189 - Flags: review?(bugmail) → review+
Pushed by jvarga@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d7a53e1454ad
Clear database actor's strong reference to IDBDatabase when opening of a database fails; r=asuth
https://hg.mozilla.org/mozilla-central/rev/d7a53e1454ad
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Is it appropriate to ask for 57 uplift here?
Flags: needinfo?(jvarga)
(In reply to Andrew Overholt [:overholt] from comment #21)
> Is it appropriate to ask for 57 uplift here?

Yes, but I would wait a day or so if possible.
Flags: needinfo?(jvarga)
I think we should have it ready to go here since the timeline is *so* short on 57.
Ok, I'll request it now.
Comment on attachment 8923189 [details] [diff] [review]
fix

Approval Request Comment
[Feature/Bug causing the regression]: This is probably very old issue, introduced years ago.
[User impact if declined]: Entire windows are leaked leading to excessive memory footprint in specific (rather rare) case when a version change transaction is aborted.
[Is this code covered by automated tests?]: No, it's hard to test it. Only manually at the moment.
[Has the fix been verified in Nightly?]: Yes, it landed on m-c.
[Needs manual test from QE? If yes, steps to reproduce]: No, only a developer can verify this.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: Low risk.
[Why is the change risky/not risky?]: There's a theoretical possibility that weak references are not correctly set/cleared but both author of the patch and the reviewer double checked the code. Ideally this would bake on m-c for some time, but the timeline is short on 57, so requesting beta approval now.
[String changes made/needed]: None
Attachment #8923189 - Flags: approval-mozilla-beta?
Comment on attachment 8923189 [details] [diff] [review]
fix

Taking a leap of faith here, this is a bad problem, worth fixing in 57, Beta57+
Attachment #8923189 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Hi Ben, could you please verify this issue is fixed as expected on a latest Nightly build? If it doesn't, please NI me and I might consider asking for a backout. Thanks!
Flags: needinfo?(bkelly)
Sorry, I don't have an easy way to reproduce it.  I think Jan has a reduced test case that reproduces.
Flags: needinfo?(bkelly) → needinfo?(jvarga)
Yeah, there's an html test:
https://bugzilla.mozilla.org/attachment.cgi?id=8922463
It opens an IDB database with name "myDB"

Then there's a patch which adds debugging printfs:
https://bugzilla.mozilla.org/attachment.cgi?id=8922462

First printf is in IDBFactory::OpenInternal and it displays the database name.
The following printf is in IDBDatabase constructor and it displays memory address of newly created IDBDatabase.
If the fix works correctly, there should be another printf coming from IDBDatabase desctructor.

I don't know maybe there's a better way to test leaks like this, but currently this is best what I have.
Flags: needinfo?(jvarga)
You need to log in before you can comment on or make changes to this bug.