Closed Bug 738011 Opened 12 years ago Closed 12 years ago

Report ghost windows in about:memory

Categories

(Toolkit :: about:memory, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla14

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

(Whiteboard: [MemShrink:P1])

Attachments

(1 file, 5 obsolete files)

See bug 737857 comment 0 for the definition of "ghost window".

I would love to put a tooltip on "window-objects/ghost", but unfortunately we can't do that at the moment.  I volunteer to write the code if njn will let me.
Blocks: 737857
This doesn't do criterion (3), that a window moves from detached to ghost only after a certain amount of time.
Whiteboard: [MemShrink]
Attached patch Patch v1 (obsolete) — Splinter Review
Attachment #608072 - Attachment is obsolete: true
Attachment #608469 - Flags: review?(n.nethercote)
Comment on attachment 608469 [details] [diff] [review]
Patch v1

Review of attachment 608469 [details] [diff] [review]:
-----------------------------------------------------------------

I like the basic idea, assuming it works.  Have you seen ghost windows in practice, maybe via a contrived test case?

As for the review outcome, I'm going to be a pain.  I'm giving r- because I'd like to see the outcome of the merging of the two big comments, and some other things.  Furthermore, I'm enough of a Gecko/XPCOM noob that I'd like someone else (jst?) to review an updated version as well -- in particular, the big chunk of new code in nsWindowMemoryReporter.cpp, which I only reviewed lightly.

::: dom/base/nsWindowMemoryReporter.cpp
@@ +87,5 @@
> +    nsCOMPtr<nsIScriptObjectPrincipal> scriptObjPrincipal =
> +      do_QueryInterface(aWindow);
> +    NS_ENSURE_TRUE(scriptObjPrincipal, NULL);
> +
> +    nsIPrincipal *principal = scriptObjPrincipal->GetPrincipal();

This looks merely like a more complicated way of getting the principal.  Was there something wrong with the old way?

@@ +131,5 @@
>    // DOM window objects fall into one of three categories:
>    // - "active" windows are currently either displayed in an active
>    //   tab, or a child of such a window.
>    // - "cached" windows are in the fastback cache.
>    // - "other" windows are closed (or navigated away from w/o being

You need to modify this comment to explain how the reporter paths have changed.  Also, see below my remarks on the big comment in nsWindowMemoryReporter.h.

@@ +179,4 @@
>    } else {
> +    nsCOMPtr<nsIURI> uri = GetWindowURI(aWindow);
> +    nsCAutoString domain;
> +    aTLDService->GetBaseDomain(uri, 0, domain);

Is |domain| ever used?  Looks like it could be removed, and |aTLDService| as well?

@@ +185,5 @@
> +      windowPath += NS_LITERAL_CSTRING("ghost/");
> +    }
> +    else {
> +      windowPath += NS_LITERAL_CSTRING("detached/");
> +    }

Previously windows of this kind had a path like this:

  window-objects/top(none)/<URL>

Now they look like this:

  window-objects/{ghost,detached}/<URL>

I'd prefer it if they looked like this:

  window-objects/top(none)/{ghost,detached}/<URL>

Two advantages of this:  (a) I like that every entry following window-objects has the form "top(...)";  (b) it mirrors the paths for active and cached windows, which look like this:

  window-objects/top(<top-URL>)/{active,cached}/<URL>

@@ +191,3 @@
>  
>    nsIDocShell *docShell = aWindow->GetDocShell();
>    if (docShell) {

In the big comment in nsWindowMemoryReporter.h you say 'A window W is "detached" iff its docshell is null'.  But here you test whether it has a top window.  I realize the two are equivalent but it still confused me for a moment.  Perhaps the |if (top)| block and the |if (docShell)| block could be combined.

@@ +410,5 @@
> +    // This window shares a domain with a non-detached window, so reset its
> +    // clock.
> +    aTimeStamp = TimeStamp();
> +  }
> +  else {

Nit: isn't it standard to put the 'else' on the same line as the '}'?

@@ +418,5 @@
> +    if (aTimeStamp.IsNull()) {
> +      // This may become a ghost window later; start its clock.
> +      aTimeStamp = data->now;
> +    }
> +    else if ((data->now - aTimeStamp).ToSeconds() > data->ghostTimeout) {

Ditto.

@@ +428,5 @@
> +          data->ghostWindowIDs->PutEntry(pWindow->WindowID());
> +        }
> +      }
> +    }
> +

Nit: remove blank line?

::: dom/base/nsWindowMemoryReporter.h
@@ +69,5 @@
> + * nsWindowMemoryReporter is responsible for the 'explicit/window-objects'
> + * memory reporter.
> + *
> + *
> + * We define two window states, "detached" and "ghost", as follows.

I like that you've written a big comment.  But we have another big comment in nsWindowMemoryReporter.cpp, within CollectWindowReports, which I suspect you weren't aware of.  The two should probably be combined.  I'm not sure if combining them here or there would be better.

Also, I think "detached" and "ghost" are sub-cases of the "other" state, is that right?  That should be made clear when the two comments are merged.

@@ +90,5 @@
> + *        immediately declare a window a ghost before the GC/CC has had a
> + *        chance to run.)
> + *
> + * Although ghost windows can happen legitimately, their presence is often
> + * indicative of a memory leak.

Can you give an example of a legitimate case?

@@ +92,5 @@
> + *
> + * Although ghost windows can happen legitimately, their presence is often
> + * indicative of a memory leak.
> + *
> + * nsWindowMemoryRepoter observes window detachment and uses mDetachedWindows

Nit: "Repoter" -> "Reporter"
Attachment #608469 - Flags: review?(n.nethercote) → review-
> I like the basic idea, assuming it works.  Have you seen ghost windows in practice, maybe via a 
> contrived test case?

Yes, I installed vimperator, got some zombie compartments, and saw corresponding ghost windows.

> This looks merely like a more complicated way of getting the principal.  Was there something wrong 
> with the old way?

The old code took nsGlobalWindow*.  The new code takes nsIDOMWindow* (because you don't get concrete classes out of a weak reference), so you have to QI it.  It's a pain, I agree.

> I'd prefer it if they looked like this:
>
>  window-objects/top(none)/{ghost,detached}/<URL>
>
> Two advantages of this:  (a) I like that every entry following window-objects has the form 
> "top(...)";  (b) it mirrors the paths for active and cached windows, which look like this:
>
>  window-objects/top(<top-URL>)/{active,cached}/<URL>

Detached windows always have size basically zero.  I think it's due to the fact that the window nominally gives up its document upon SetDocShell(null), even though the document is still somehow attached to the window (you can still do window.document from JS).

This is a separate bug that I don't want to try to fix here, but the net effect is, detached/ghost is always hidden under tiny.  So now we have

  window-objects/tiny/top(none)/ghost/window(url)

which is much more nesting than I'd like for such an important list of objects.

I'd be happy to do

  window-objects/top(none)/ghost-window(url)

(I'm working on a patch to add ghost windows to about:components, so ghost windows will be highly visible there, but I still would like not to over-nest things.)
> This is a separate bug that I don't want to try to fix here, but the net
> effect is, detached/ghost is always hidden under tiny.  So now we have
> 
>   window-objects/tiny/top(none)/ghost/window(url)
> 
> which is much more nesting than I'd like for such an important list of
> objects.

The "tiny" is an artifact inserted in non-verbose mode, so you don't have any control over that.
 
> I'd be happy to do
> 
>   window-objects/top(none)/ghost-window(url)

Eh, ok.

> (I'm working on a patch to add ghost windows to about:components, so ghost
> windows will be highly visible there, but I still would like not to
> over-nest things.)

In which case about:compartments will be a misnomer... :/
> The "tiny" is an artifact inserted in non-verbose mode, so you don't have any control over that.

Precisely why I want to reduce the nesting level elsewhere.  :)

> In which case about:compartments will be a misnomer... :/

Indeed!
> Nit: isn't it standard to put the 'else' on the same line as the '}'?

Since we're all about a data-driven approach here,

$ git grep '^\s*else\s*{' dom | wc -l
564
$ git grep '^\s*}\s*else\s*{' dom | wc -l
719

Cuddle-else does have a huge advantage in the tree as a whole (23k to 5k), so I'll change mine.  :)
Heavens, don't assume existing code is an indicator of good style! :P
> Heavens, don't assume existing code is an indicator of good style! :P

In all seriousness, that's usually how we do things in Gecko.  We have a bajillion styles, and we match the prevailing style in the module we're writing in.

Of course, every reviewer has his own personal pet style, so I often end up writing to my reviewer's preferred style.  You'll get a note as a "cuddle-else" reviewer.
Attached patch Interdiff v1 --> v2 (obsolete) — Splinter Review
Attachment #608577 - Flags: review?(n.nethercote)
Comment on attachment 608577 [details] [diff] [review]
Interdiff v1 --> v2

Oops, this includes changes I thought would be ignored.
Attachment #608577 - Attachment is obsolete: true
Attachment #608577 - Flags: review?(n.nethercote)
Attached patch Interdiff v1 --> v2 (obsolete) — Splinter Review
Attachment #608578 - Flags: review?(n.nethercote)
Attached patch Patch v2 (obsolete) — Splinter Review
Assignee: nobody → justin.lebar+bug
Attachment #608469 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #608579 - Flags: review?(n.nethercote)
Attachment #608578 - Flags: review?(n.nethercote)
Comment on attachment 608579 [details] [diff] [review]
Patch v2

Review of attachment 608579 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, it's ready for that second reviewer.  Thanks!

::: dom/base/nsWindowMemoryReporter.cpp
@@ +150,2 @@
>  
> +  if (top) {

No point having two |if (top)| blocks in a row.  Just move this inside the first one.

::: dom/base/nsWindowMemoryReporter.h
@@ +77,5 @@
> + *  - "cached" windows, which are in the fastback cache (aka the bfcache).
> + *
> + *  - "detached" windows, which have a null docshell.  (This happens when the
> + *    <iframe> or tab containing the window is destroyed -- i.e., when the window
> + *    is no longer active or cached.)

As before, the code in nsWindowMemoryReporter.cpp checks if the window has a top window rather than if it has a docshell.  Maybe note that the two are equivalent here?  Also, I wonder if it's worth asserting that the two are equivalent in nsWindowMemoryReporter.cpp.

@@ +79,5 @@
> + *  - "detached" windows, which have a null docshell.  (This happens when the
> + *    <iframe> or tab containing the window is destroyed -- i.e., when the window
> + *    is no longer active or cached.)
> + *
> + * Additionally, we classify a subset of detached windows as "ghost" windows.

You have three categories, one of which has a subset.  But they're presented in about:memory as four categories.  Maybe change "detached" and "ghost" to "detached-normal" and "detached-ghost"?  Or maybe just leave it, I'm being super-pedantic.

@@ +97,5 @@
> + * Although ghost windows can happen legitimately (a page can hold a reference
> + * to a cross-domain window and then close its container), the presence of
> + * ghost windows is often indicative of a memory leak.
> + *
> + * nsWindowMemoryRepoter observes window detachment and uses mDetachedWindows

"Repoter"
Attachment #608579 - Flags: review?(n.nethercote) → review+
> You have three categories, one of which has a subset.  But they're presented in about:memory as four 
> categories.  Maybe change "detached" and "ghost" to "detached-normal" and "detached-ghost"?  Or 
> maybe just leave it, I'm being super-pedantic.

I doubt anyone is going to be confused by this.  But I kind of do want to add a tooltip to explicit/window-objects/top(none)/ghost and friends.
Blocks: 738624
Attached patch Patch v3Splinter Review
Attachment #608578 - Attachment is obsolete: true
Attachment #608579 - Attachment is obsolete: true
Comment on attachment 608681 [details] [diff] [review]
Patch v3

Kyle, do you want to take this?
Attachment #608681 - Flags: review?(khuey)
Comment on attachment 608681 [details] [diff] [review]
Patch v3

Mounir, can you look at this patch?
Attachment #608681 - Flags: review?(khuey) → review?(mounir)
Whiteboard: [MemShrink] → [MemShrink:P1]
Attachment #608681 - Flags: review?(mounir) → review?(continuation)
Comment on attachment 608681 [details] [diff] [review]
Patch v3

Review of attachment 608681 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/base/nsWindowMemoryReporter.cpp
@@ +40,5 @@
> +#include "nsIEffectiveTLDService.h"
> +#include "mozilla/Services.h"
> +#include "mozilla/Preferences.h"
> +#include "nsNetCID.h"
> +#include "nsWeakPtr.h"

You include weak pointer in the header.  Do you need to include it here, too?  I don't know what the right thing is.

@@ +84,5 @@
>    }
>  
>    if (!uri) {
> +    nsCOMPtr<nsIScriptObjectPrincipal> scriptObjPrincipal =
> +      do_QueryInterface(aWindow);

Why this indirect way?

@@ +146,5 @@
>                                        : NS_LITERAL_CSTRING("active/");
>    } else {
> +    nsCOMPtr<nsIURI> uri = GetWindowURI(aWindow);
> +
> +    if (aGhostWindowIDs->Contains(aWindow->WindowID())) {

aGhostWindowIDs can't be null, so maybe it would make more sense to pass as a reference?

@@ +231,5 @@
>    // destroyed while we're calling the memory reporter callback.
>    WindowArray windows;
>    windowsById->Enumerate(GetWindows, &windows);
>  
> +  // Get the IDs of all the "ghost" windows in existance.

nit: existence.  Or you could just say "Get the IDs of all "ghost" windows.", because windows that don't exist surely can't be ghost windows, so this is a little redundant. ;)

@@ +294,5 @@
>  }
>  
> +NS_IMETHODIMP
> +nsWindowMemoryReporter::Observe(nsISupports* aSubject, const char* aTopic,
> +                                const PRUnichar* aData)

nit: move the * over for all three arguments

@@ +392,5 @@
> +  nsIEffectiveTLDService *tldService;
> +};
> +
> +static PLDHashOperator
> +GetNonDetachedWindowDomainsEnumerator(const PRUint64& aId, nsGlobalWindow* aWindow,

nit: move * over

@@ +431,5 @@
> + * all ghost windows we found.
> + */
> +void
> +nsWindowMemoryReporter::CheckForGhostWindows(
> +  nsTHashtable<nsUint64HashKey>* aOutGhostIDs /* = NULL */)

nit: * on the right

@@ +440,5 @@
> +    NS_WARNING("Couldn't get TLDService.");
> +    return;
> +  }
> +
> +  nsGlobalWindow::WindowByIdTable* windowsById =

nit: * on the right

@@ +443,5 @@
> +
> +  nsGlobalWindow::WindowByIdTable* windowsById =
> +    nsGlobalWindow::GetWindowsTable();
> +  if (!windowsById) {
> +    NS_WARNING("GetWindowsTable returned null?");

Why the question mark?  It definitely returned null. ;)

::: dom/base/nsWindowMemoryReporter.h
@@ +66,4 @@
>  };
>  
> +/**
> + * nsWindowMemoryReporter is responsible for the 'explicit/window-objects'

Thanks for moving this to the header, it is easier to find there.

@@ +70,5 @@
> + * memory reporter.
> + *
> + * We classify DOM window objects into one of three categories:
> + *
> + *  - "active" windows, which are displayed in a tab (as the top-level window

This case doesn't end in a period, but the other two cases do.  I don't know what the right way is, but at least be consistent.

@@ +77,5 @@
> + *  - "cached" windows, which are in the fastback cache (aka the bfcache).
> + *
> + *  - "detached" windows, which have a null docshell.  (This happens when the
> + *    <iframe> or tab containing the window is destroyed -- i.e., when the window
> + *    is no longer active or cached.)

No need I can see to put this in parens.

@@ +86,5 @@
> + *   1) The window is detached.
> + *
> + *   2) There exist no non-detached windows with the same base domain as
> + *      the window's principal.  (For example, the base domain of
> + *      "wiki.mozilla.co.uk" is "mozilla.co.uk".)

Again, no real need for parens here.  Also please briefly explain here why this is a criterion.  I sort of have a guess but I'm not really sure.

@@ +91,5 @@
> + *
> + *   3) The window has met criteria (1) and (2) above for at least
> + *      memory.ghost_window_timeout_seconds.  (This is so we don't
> + *      immediately declare a window a ghost before the GC/CC has had a
> + *      chance to run.)

no need for parens.

@@ +95,5 @@
> + *      chance to run.)
> + *
> + * Although ghost windows can happen legitimately (a page can hold a reference
> + * to a cross-domain window and then close its container), the presence of
> + * ghost windows is often indicative of a memory leak.

You should put this sentence up before the 3 criteria.  This is the high level view of what a ghost window is, which we should give before diving into the technicalities of what they are.  Also, this list of three things is inconsistent wrt the other list.  either make them both use dashes or both numbered, and the indentation should be consistent.

@@ +98,5 @@
> + * to a cross-domain window and then close its container), the presence of
> + * ghost windows is often indicative of a memory leak.
> + *
> + * nsWindowMemoryReporter observes window detachment and uses mDetachedWindows
> + * to remember when a window first met criteria (1) and (2).

What if a window starts failing to meet criteria 2?  As far as I can see, criteria 1 is monotonic, but the user could open up a new window with the new domain.  Can we be sure that there's no relation between the two?

@@ +102,5 @@
> + * to remember when a window first met criteria (1) and (2).
> + *
> + * When producing a memory report, we can iterate over each window in
> + * mDetachedWindows, see if it satisfied (1) and (2) long enough ago, and thus
> + * determine if it's a ghost.

This seems overly wordy to me.  You could just throw some kind of "to determine ghost windows" phrase onto the previous sentence instead of this whole thing.

@@ +105,5 @@
> + * mDetachedWindows, see if it satisfied (1) and (2) long enough ago, and thus
> + * determine if it's a ghost.
> + *
> + *
> + * We use the following memory reporter path for active and cached windows:

I don't know anything about memory paths, but I guess njn looked at this so it should be okay.

@@ +143,5 @@
>    // Protect ctor, use Init() instead.
>    nsWindowMemoryReporter();
> +
> +  /**
> +   * Called when our asynchronous event to call CheckForGhostWindows, enqueued

This sentence is weird and confusing.

@@ +161,5 @@
> +   *
> +   * This is called asynchronously after we observe a DOM window being detached
> +   * from its docshell, and also right before we generate a memory report.
> +   */
> +  void CheckForGhostWindows(nsTHashtable<nsUint64HashKey>* aOutGhostIDs = NULL);

nit: *aOutGhostIDs

@@ +173,5 @@
> +   * (Although windows are not added to this table until they're detached, it's
> +   * possible for a detached window to become non-detached, and we won't
> +   * remove it from the table until CheckForGhostWindows runs.)
> +   */
> +  nsDataHashtable<nsISupportsHashKey, mozilla::TimeStamp> mDetachedWindows;

It seems a little sad to have a list of detached windows here, and another list for the patch Olli is working on.  I wonder if it makes sense to have a central list?  You should coordinate a little with him on that.

@@ +176,5 @@
> +   */
> +  nsDataHashtable<nsISupportsHashKey, mozilla::TimeStamp> mDetachedWindows;
> +
> +  /**
> +   * Do we have an asynchronous call to CheckForGhostWindows pending?

Phrasing this as a question seems a little odd to me, but maybe that's just me.
Attachment #608681 - Flags: review?(continuation) → review+
>@@ +84,5 @@
>>    }
>>  
>>    if (!uri) {
>> +    nsCOMPtr<nsIScriptObjectPrincipal> scriptObjPrincipal =
>> +      do_QueryInterface(aWindow);
>
>Why this indirect way?

(See comment 4)  The old code took nsGobalWindow, but you can't get concrete classes out of a weak pointer, so I have to QI my way to victory.

>
>@@ +146,5 @@
>>                                        : NS_LITERAL_CSTRING("active/");
>>    } else {
>> +    nsCOMPtr<nsIURI> uri = GetWindowURI(aWindow);
>> +
>> +    if (aGhostWindowIDs->Contains(aWindow->WindowID())) {
>
>aGhostWindowIDs can't be null, so maybe it would make more sense to pass as a reference?

Everything else is a pointer here (even though e.g. aWindowTotalSizes is also never-null), so unless you feel strongly, I'm going to leave this one as a pointer.

>> + * Although ghost windows can happen legitimately (a page can hold a reference
>> + * to a cross-domain window and then close its container), the presence of
>> + * ghost windows is often indicative of a memory leak.
>
>You should put this sentence up before the 3 criteria.  This is the high level
>view of what a ghost window is, which we should give before diving into the
>technicalities of what they are.  Also, this list of three things is
>inconsistent wrt the other list.  either make them both use dashes or both
>numbered, and the indentation should be consistent.

I'll make the indentation consistent.  But I don't want to use numbers for both of them, because I talk about "criterion (X)" a bunch of times, and it's clearer if there's only one thing labeled "(1)".

>@@ +98,5 @@
>> + * to a cross-domain window and then close its container), the presence of
>> + * ghost windows is often indicative of a memory leak.
>> + *
>> + * nsWindowMemoryReporter observes window detachment and uses mDetachedWindows
>> + * to remember when a window first met criteria (1) and (2).
>
>What if a window starts failing to meet criteria 2?  As far as I can see, criteria 1 is monotonic, but the user could open up a new window with the new domain.  Can we be sure that there's no relation between the two?

If a window starts failing to meet criterion 2, it's no longer a ghost.  I don't know what you mean about being sure there's no relation between the two.

>@@ +143,5 @@
>>    // Protect ctor, use Init() instead.
>>    nsWindowMemoryReporter();
>> +
>> +  /**
>> +   * Called when our asynchronous event to call CheckForGhostWindows, enqueued
>
>This sentence is weird and confusing.

This was a seriously bad sentence; thanks for pointing it out.

>@@ +173,5 @@
>> +   * (Although windows are not added to this table until they're detached, it's
>> +   * possible for a detached window to become non-detached, and we won't
>> +   * remove it from the table until CheckForGhostWindows runs.)
>> +   */
>> +  nsDataHashtable<nsISupportsHashKey, mozilla::TimeStamp> mDetachedWindows;
>
>It seems a little sad to have a list of detached windows here, and another list for the patch Olli is working on.  I wonder if it makes sense to have a central list?  You should coordinate a little with him on that.

I talked to Olli over dinner last night, and it's not clear whether his spooky windows will be compatible with these here.  He probably doesn't care about reducing false-positives using the same-domain check that I have here, for example.

Anyway, he said I should land this as-is, and we can figure out if we can combine efforts once he figures out his patch.
That all sounds good to me!
I will probably create a bit different kind of list. Something which contains zombie documents
and windows, and adds some bypass counter to them etc.
Push backed out for mochitest-oth orange:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=9e4d09efa335

https://tbpl.mozilla.org/php/getParsedLog.php?id=10542171&tree=Mozilla-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=10540830&tree=Mozilla-Inbound
...and a few more.

(Had trouble loading the logs due to size and I've got to be up in a few hours, so don't have anything more specific right now, sorry.)

https://hg.mozilla.org/integration/mozilla-inbound/rev/abe59de81f87
I think the last push failed because:

 * mochitest-chrome leaks many windows with the null principal,
 * every time we detach a window, we iterate over all windows and call EffectiveTLDService::GetBaseDomain(window_uri),
 * for windows with the null principal, window_uri is null, and
 * EffectiveTLDService::GetBaseDomain prints a NS_ENSURE warning when it's passed null,
 * which caused us to overflow our logs.

I fixed this by not calling EffectiveTLDService::GetBaseDomain when the URI is null.
Sorry, this was backed out again on inbound because of WinXP debug reftest logs overflowing:
https://hg.mozilla.org/integration/mozilla-inbound/rev/61aa02b5a7a1
Once more, with feeling.  I looked through some of the logs from try and didn't see any more masses of warnings.

https://hg.mozilla.org/integration/mozilla-inbound/rev/e517839de582
https://hg.mozilla.org/mozilla-central/rev/e517839de582
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla14
You need to log in before you can comment on or make changes to this bug.