Last Comment Bug 757807 - Ghost windows on nytimes.com
: Ghost windows on nytimes.com
Status: RESOLVED FIXED
[MemShrink]
: mlk
Product: Core
Classification: Components
Component: DOM (show other bugs)
: Trunk
: x86_64 Windows 7
: -- normal with 1 vote (vote)
: mozilla15
Assigned To: Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
:
Mentors:
http://robert.ocallahan.org/2012/05/f...
: 727974 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-05-23 05:30 PDT by Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
Modified: 2012-05-27 15:45 PDT (History)
9 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
affected
-
affected
+
fixed


Attachments
fix (6.64 KB, patch)
2012-05-23 23:34 PDT, Robert O'Callahan (:roc) (Exited; email my personal email if necessary)
bugs: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta-
Details | Diff | Splinter Review

Description Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 05:30:58 PDT
Several times with long-lived browser sessions I've noticed "Ghost Windows" in about:compartments at nytimes.com domains. Currently I've got:

http://movies.nytimes.com/2012/05/16/movies/the-dictator-with-sacha-baron-cohen.html [4]
http://www.nytimes.com/2007/02/14/world/asia/14iht-green.4590765.html [4]
http://www.nytimes.com/2012/05/18/world/asia/china-princelings-using-family-ties-to-g [clipped, maybe 4 too?]

I do not have many addons installed. Currently the only enabled addons are about:telemetry, about:ccdump, DOM Inspector, Font Information, and PDF Viewer.

I installed about:ccdump and had it identify "zombie documents". It identified two documents for each of the above URLs (6 total). Then I did "Show Roots" on each of those.
For both http://www.nytimes.com/2012/05/18/world/asia/china-princelings-using-family-ties-to-gain-riches.html?_r=1 documents, it returned the same root: a single <p> element in one of those documents with refcount 5, known edges 3.
For both http://www.nytimes.com/2007/02/14/world/asia/14iht-green.4590765.html documents, it return the same root: the same <p> element.
For both http://movies.nytimes.com/2012/05/16/movies/the-dictator-with-sacha-baron-cohen.html documents, it returned the same root: a <p> element in one of those documents.
Comment 1 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 05:47:18 PDT
That <p> element also has refcount 5, known edges 3. It is a child of a div with class "articleBody", i.e., one of the regular paragraphs of the article. (It happens to be the one start "There is nothing especially outrageous here.")

The other <p> element is the one starting "Chinese officials and their relatives rarely discuss such a delicate issue publicly."
Comment 2 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 05:58:04 PDT
In my current browser session, every time I double-click on a word in the article, the Web page pops up its little "?" icon you can click on for a definition. If I click on that icon, close the resulting popup window, and then close the article tab, I have a new ghost window for that article.

In a debug build with a test profile, this does not happen :-(.
Comment 3 Kyle Huey [:khuey] (khuey@mozilla.com) 2012-05-23 11:14:09 PDT
I would identify one of the nodes being held alive by unknown edges, set a breakpoint on nsFoo::Release with this == that node, and then close the tab and identify the stacks that don't match up with what the CC sees.
Comment 4 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 14:13:06 PDT
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #3)
> I would identify one of the nodes being held alive by unknown edges, set a
> breakpoint on nsFoo::Release with this == that node, and then close the tab
> and identify the stacks that don't match up with what the CC sees.

Tried that. In fact, tried exiting Firefox entirely. The breakpoint was not hit. Maybe I did something wrong. Or maybe it was because I was using an opt build --- although it did have symbols and it wasn't PGO.

Fortunately I seem to be able to reproduce the problem quickly when I start up again in my default profile (with session store restored).
Comment 5 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 15:43:14 PDT
This may actually involve DOM Inspector. I've been using DOM Inspector to bypass the NYTimes article limit by manually removing the overlay element.
Comment 6 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 16:08:18 PDT
Here's what I think are steps to reproduce.

1) Install DOM Inspector (2.0.11 here).
2) Visit articles at nytimes.com until you can no longer view articles for free.
3) Visit one more article to get the paywall overlay.
4) Open DOM inspector, click the left-most toolbar button, click on the paywall overlay image to select that element in DOM Inspector.
5) Follow the DOM hierarchy up to the outermost DIV element containing the overlay image, and remove that DIV from the DOM (e.g. pressing Del or using "Cut" from the DOMI context menu).
6) Double-click on a word in the article text of the NYTimes page.
7) Click on the ? icon to get a definition of the word in a popup window.
8) Close the popup window.
9) Close the DOM Inspector window.
10) Close the NYTimes page tab.
11) Open about:compartments.

You should have a ghost window for the NYTimes page.

I'll try to minimize these STR a bit.
Comment 7 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 16:31:42 PDT
I think you can skip the ? icon and the popup window bit. I've reproduced the bug in a new profile just by opening tons on NYTimes articles until I get the paywall overlay, then removing it using the DOM inspector as described above, then closing the NYTimes tabs.

Interestingly, it's not just the window where I removed the paywall overlay that is shown as a ghost window.
Comment 8 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 16:49:32 PDT
I tried doing comment #3 again and again I never hit the breakpoint. Maybe the addresses exposed by about:ccdump don't match the 'this' pointer in nsGenericElement::Release, or maybe Visual Studio is screwing things up in some other way.
Comment 9 Justin Lebar (not reading bugmail) 2012-05-23 17:06:02 PDT
*** Bug 727974 has been marked as a duplicate of this bug. ***
Comment 10 Justin Lebar (not reading bugmail) 2012-05-23 17:40:47 PDT
> Here's what I think are steps to reproduce.

On a clean profile (no add-ons), I saw a ghost window just after visiting a few sites on nytimes then opening about:compartments.  No double-clicking on words, no paywall interspersed.  (This was on a debug build, and I haven't been able to reproduce it.)

It will probably help to set memory.ghost_window_timeout_seconds to 1.  At its default value of 60, you won't get any ghost windows until they've been detached for 60s; that may be affecting your STR.

(Alternatively, you can leave the timeout the same and simply look for a nytimes compartment in the list.)
Comment 11 Justin Lebar (not reading bugmail) 2012-05-23 21:09:42 PDT
> It will probably help to set memory.ghost_window_timeout_seconds to 1.  At its default value of 60, 
> you won't get any ghost windows until they've been detached for 60s; that may be affecting your STR.

Actually, you don't need to do this.  When you open about:compartments, it gc/cc's a bunch of times, then marks anything which survives as a ghost.
Comment 12 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 21:32:09 PDT
I opened nytimes.com, then loaded a bunch of pages from there into different tabs, then closed the tabs. I had to go through several rounds of opening pages into other tabs and closing the tabs, because I had trouble getting ghost windows. Then, with just the home page open in a tab, I noticed that under User Compartments there was a page from a tab that I'd already closed, but it wasn't showing up in Ghost Windows. So I closed the main nytimes.com tab and presto, the offending compartment started showing up in Ghost Windows. Maybe that's expected, but it seemed weird.
Comment 13 Justin Lebar (not reading bugmail) 2012-05-23 21:39:15 PDT
> So I closed the main nytimes.com tab and presto,
> the offending compartment started showing up in Ghost Windows. Maybe that's
> expected, but it seemed weird.

Yes, that's by design.  I'm writing a blog post about it now, but until then, see

http://hg.mozilla.org/mozilla-central/file/f43e8d300f21/dom/base/nsWindowMemoryReporter.h#l58
Comment 14 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 21:49:44 PDT
Instead of comment #3 (breakpoint in nsGenericElement::Release), I tried setting a data breakpoint on the mRefCnt member of the offending element. This is a LOT faster, but this breakpoint was never hit during shutdown either :-(. I conclude that the 2 unknown references to the element are in fact permanently leaked. As supporting evidence, my debug build complains that zillions of nytimes-related URLs are leaked, and the --DOMWINDOW count ends at 5.
Comment 15 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 23:14:51 PDT
I think I've found it!

nsFrameSelection contains mDelayedMouseEvent. mDelayedMouseEvent contains "target", "currentTarget" and "originalTarget". All three are nsCOMPtrs, none are traced by cycle collection. In my memory dump, "target" and "originalTarget" are the unknown references to the leaking <p> element.
Comment 16 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 23:15:26 PDT
(this revelation brought to you by WinDbg and its "search memory" command)
Comment 17 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 23:22:07 PDT
The callers of GetDelayedCaretData() only use me->IsShift() and me->clickCount. So we don't even need to store an actual event there.
Comment 18 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-23 23:34:23 PDT
Created attachment 626706 [details] [diff] [review]
fix
Comment 19 Olli Pettay [:smaug] 2012-05-24 03:13:29 PDT
Comment on attachment 626706 [details] [diff] [review]
fix


> nsFrameSelection::nsFrameSelection()
>-  : mDelayedMouseEvent(false, 0, nsnull, nsMouseEvent::eReal)
> {
>   PRInt32 i;
>   for (i = 0;i<nsISelectionController::NUM_SELECTIONTYPES;i++){
>     mDomSelections[i] = new nsTypedSelection(this);
>     mDomSelections[i]->SetType(GetSelectionTypeFromIndex(i));
>   }
>   mBatching = 0;
>   mChangesDuringBatching = false;

You should initialize the new member variables somewhere in the ctor.

I think we should take the patch to Aurora too.
Leaks are bad.
Comment 20 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-24 03:41:15 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/a339c3a648af
Comment 21 Olli Pettay [:smaug] 2012-05-24 05:23:24 PDT
This should be safe for Aurora, maybe even for Beta.
Comment 22 Justin Lebar (not reading bugmail) 2012-05-24 07:58:48 PDT
Should we file a separate bug on hooking nsMouseEvent up to the CC?
Comment 23 Olli Pettay [:smaug] 2012-05-24 08:42:46 PDT
No. ns*Event structs shouldn't be CCable. They aren't refcounted either.
This was just a bizarre and wrong way to use an ns*Event struct.
Comment 24 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-24 21:52:27 PDT
Comment on attachment 626706 [details] [diff] [review]
fix

Review of attachment 626706 [details] [diff] [review]:
-----------------------------------------------------------------

This is a bad bug, and a very simple fix.

It would be great to get this fixed for FF13, but since it's an old bug it would be OK to miss FF13.
Comment 25 :Ms2ger (⌚ UTC+1/+2) 2012-05-25 08:24:57 PDT
https://hg.mozilla.org/mozilla-central/rev/a339c3a648af
Comment 26 Alex Keybl [:akeybl] 2012-05-25 11:23:52 PDT
Comment on attachment 626706 [details] [diff] [review]
fix

[Triage Comment]
Thanks Robert - I'd rather we didn't take this on FF13 for our final beta, since it doesn't feel like the issue chemspill-worthy (although it does look low risk). Approving for Aurora 14, however.
Comment 27 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-27 15:01:41 PDT
https://hg.mozilla.org/releases/mozilla-aurora/rev/e4dc3f0a913c
Comment 28 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2012-05-27 15:45:40 PDT
Bustage fix:
https://hg.mozilla.org/releases/mozilla-aurora/rev/7802a253d406

Note You need to log in before you can comment on or make changes to this bug.