Last Comment Bug 789773 - nsIWebProgressListener implementations referencing the load's window in onStateChange() (like NoScript or Roboform) cause popup loads to be aborted and the browser to hang on exit
: nsIWebProgressListener implementations referencing the load's window in onSta...
Status: VERIFIED FIXED
[MemShrink:P3]
: regression, verifyme
Product: Core
Classification: Components
Component: Document Navigation (show other bugs)
: 17 Branch
: x86 Windows XP
: -- normal with 2 votes (vote)
: mozilla18
Assigned To: Bobby Holley (:bholley) (busy with Stylo)
: juan becerra [:juanb]
Mentors:
: 789808 791401 791700 791802 791812 792073 792368 792548 (view as bug list)
Depends on: 792269 792899
Blocks: 774633 790218 791328 791812 791960
  Show dependency treegraph
 
Reported: 2012-09-09 01:30 PDT by Vladimir_S
Modified: 2012-12-27 02:52 PST (History)
42 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
+
verified
+
verified
+
verified
unaffected


Attachments
about:memory for nytimes.com ghost windows (8.97 KB, text/plain)
2012-09-09 21:14 PDT, timbugzilla
no flags Details
about:memory - Chase.com (9.66 KB, text/plain)
2012-09-10 12:00 PDT, therube
no flags Details
Contrived test case as a chrome scratchpad (1.00 KB, application/x-javascript)
2012-09-12 02:50 PDT, Giorgio Maone [:mao]
no flags Details
Cross-browser reduced test case as a chrome scratchpad (1.16 KB, application/x-javascript)
2012-09-13 09:22 PDT, Giorgio Maone [:mao]
no flags Details
Don't kill loads when calling nsDocShell::CreateAboutBlankContentViewer to replace an existing initial about:blank document. v2 (2.84 KB, patch)
2012-09-14 04:18 PDT, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review-
Details | Diff | Splinter Review
Tests. v1 (3.54 KB, patch)
2012-09-14 04:20 PDT, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
Details | Diff | Splinter Review
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v1 (2.11 KB, patch)
2012-09-16 09:19 PDT, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
Details | Diff | Splinter Review
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v2 (2.25 KB, patch)
2012-09-17 11:05 PDT, Bobby Holley (:bholley) (busy with Stylo)
no flags Details | Diff | Splinter Review
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3 (2.29 KB, patch)
2012-09-17 14:57 PDT, Bobby Holley (:bholley) (busy with Stylo)
bzbarsky: review+
akeybl: approval‑mozilla‑aurora+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Vladimir_S 2012-09-09 01:30:08 PDT
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20120904124322

Steps to reproduce:

Tested the work of 17.0a2 and later 18.0a1 Fx branches with NoScript extension. Versions of NoScript: 2.5.3, 2.5.4 and 2.5.4rc3.


Actual results:

After the Aurora update from Sept.07.2012 after closing of the browser the process firefox.exe still remains active; VB-based forum works slightly incorrectly - gives no opportunity to load the attachment to message: the corresponding pop-up window does not appear. Tested on http://www.tehnari.ru/ (registration required). 


Expected results:

Nothing except ordinary work of browser.
Comment 1 Vladimir_S 2012-09-09 01:39:05 PDT
Details:
Passing to Safe Browser mode heals the situation.
Creating the new profile with ONLY NoScript installed extension shows the bug described.
The permission of all scripts in NoScript does not help.
Branch Fx 16.0b2 has no problems with NoScript.
Before Sept.07.2012 Aurora with active NoScript worked perfectly.
Comment 2 timbugzilla 2012-09-09 04:24:23 PDT
I have been having this problem (firefox.exe process remains after exiting browser) as well with noscript and the recent nightly moz-central win32 builds. It has also been causing ghost windows after about 1 hour of browsing.

Oddly, this happens on win7 x64 but not win8 RTM x86.

Please add "MemShrink" to the whiteboard field.
Comment 3 timbugzilla 2012-09-09 04:35:46 PDT
Sorry, make that "[MemShrink]"
Comment 4 Nicholas Nethercote [:njn] 2012-09-09 17:07:24 PDT
I don't see why this is relevant to MemShrink.
Comment 5 timbugzilla 2012-09-09 17:38:19 PDT
(In reply to Nicholas Nethercote [:njn] from comment #4)
> I don't see why this is relevant to MemShrink.

I am getting ghost windows with noscript (see comment 2).
Comment 6 timbugzilla 2012-09-09 21:14:00 PDT
I can reproduce this on win8 x86 now. After exiting Nightly the firefox.exe process remains alive. 

Ghost windows are held from nytimes.com pages. These are frequently seen (takes > 1 hour to start seeing them though).  There are problems with opening images via js commands on the nytimes site due to noscript. Perhaps there is a problem with the script surrogates?

I've attached my about:memory output as a txt file. An example of the js that noscript is incorrectly blocking on the whitelisted NY Times page is:

javascript:pop_me_up2('http://www.nytimes.com/imagepages/2012/09/09/realestate/09COVER3.html','09COVER3_html','width=720,height=563,scrollbars=yes,toolbars=no,resizable=yes')
Comment 7 timbugzilla 2012-09-09 21:14:56 PDT
Created attachment 659626 [details]
about:memory for nytimes.com ghost windows
Comment 8 Giorgio Maone [:mao] 2012-09-09 22:54:19 PDT
I'm on my way back from Mozcamp and I can't look into this seriously until tomorrow, but could you please check wether the problem goes away if you toggle the noscript.surrogate.enabled about:config preference to false, and possibly find a regression window either in NoScript or, more likely, in Firefox?
Comment 9 timbugzilla 2012-09-09 23:54:43 PDT
(In reply to Giorgio Maone from comment #8)
> I'm on my way back from Mozcamp and I can't look into this seriously until
> tomorrow, but could you please check wether the problem goes away if you
> toggle the noscript.surrogate.enabled about:config preference to false, and
> possibly find a regression window either in NoScript or, more likely, in
> Firefox?

I set it to false (in safe mode), but am still getting ghost windows, and still can't open the pop-up images on nytimes.com pages. Weird.
Comment 10 timbugzilla 2012-09-10 00:20:11 PDT
Oddly I can't seem to turn surrogates off - e.g. I still see entries like the following in about:memory despite the "false" value for noscript.surrogate.enabled remaining after restarting etc.

─109,056 B (00.12%) -- compartment(https://www.google.co.nz/, NoScript::ScriptSurrogate@https://www.google.co.nz/ (from: chrome://noscript/content/ScriptSurrogate.js:238))
│  │   │  │       ├───90,112 B (00.10%) -- gc-heap
Comment 11 timbugzilla 2012-09-10 02:06:29 PDT
I'm still getting ghost windows with noscript 2.5.2, and I don't remember having problems that far back. So, change in moz-central that triggered it?
Comment 12 [Baboo] 2012-09-10 04:13:07 PDT
The ghost windows are listed under about:compartments?
Comment 14 therube 2012-09-10 11:52:26 PDT
(Don't know about ghost windows, but) I'll confirm this, the hang on Quit.

"NoScript+Fetch Text URL+Aurora=Hang"
http://forums.informaction.com/viewtopic.php?f=10&t=9902

& I have some posts in this thread too:

"NoScript blocking javascript popup on whitelisted website"
http://forums.informaction.com/viewtopic.php?f=7&t=9461

The Sept.07.2012 date is what I found too.

works: http://hg.mozilla.org/releases/mozilla-aurora/rev/c2cc63c864e4
broke: http://hg.mozilla.org/releases/mozilla-aurora/rev/41eab5a0e537

Setting noscript.surrogate.enabled;false did not help the situation.
Comment 15 therube 2012-09-10 11:53:56 PDT
Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a2
Build identifier: 20120910013002

I see the same (at least the NoScript+Fetch Text URL+Aurora) on Win7 x64 & Aurora.
Comment 16 therube 2012-09-10 12:00:47 PDT
Created attachment 659812 [details]
about:memory - Chase.com
Comment 17 Matthias Versen [:Matti] 2012-09-10 13:15:44 PDT
Could anyone find the exact regression range (the last working build and the first broken one) ?
Could this be caused by a Noscript update ?
Comment 18 therube 2012-09-10 13:42:27 PDT
From what I saw, the NoScript version was immaterial.
(I had checked back to noscript-2.0.9.9rc6.xpi, March 2, 2011, & the issue still persists.)
Comment 19 Andrew McCreight [:mccr8] 2012-09-10 13:58:31 PDT
Yes, we need a regression window for Firefox.
Comment 20 therube 2012-09-10 14:06:53 PDT
STR:

URL: http://www.vbforums.com/forumdisplay.php?25-Visual-Basic-.NET
On that page, the thread, "Sticky: VB.NET Example Code & Hints you may not know"

Click the paperclip.
A window pops up.

Quit.

Firefox exits.
 
 
Install NoScript & repeat, but do NOT allow vbforums.com.

Click the paperclip.
No window pops up (that is not necessarily an issue).

Quit.

Firefox exits.
 
 
Repeat again, though this time allowing vbforums.com.

Click the paperclip.
No popup window.

Quit.

Firefox hangs in memory.
Comment 21 therube 2012-09-10 15:41:10 PDT
*** Bug 789808 has been marked as a duplicate of this bug. ***
Comment 22 Josh Matthews [:jdm] (away until 9/3) 2012-09-10 15:51:23 PDT
(In reply to Matthias Versen (Matti) from comment #17)
> Could anyone find the exact regression range (the last working build and the
> first broken one) ?
> Could this be caused by a Noscript update ?

Is comment 14 good enough?
Comment 23 David Clark 2012-09-10 16:44:45 PDT
I am able to reproduce this bug without NoScript in safe mode and a new profile on my Windows 8 x64 RTM with the x86 and x64 nightlies since the Sep 6th.

I did some cross referencing on the changelogs of the Sep 6th nightly (http://forums.mozillazine.org/viewtopic.php?f=23&t=2535353) and the regression range in comment 14 (http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?startdate=2012-09-06+01%3A00%3A00&enddate=2012-09-06+19%3A00%3A00) and got three bugs: 771354, 774607 and 774633.
Comment 24 Matthias Versen [:Matti] 2012-09-10 17:02:59 PDT
(In reply to Josh Matthews [:jdm] from comment #22)
> Is comment 14 good enough?
It is but I somehow missed that comment
Comment 25 Bobby Holley (:bholley) (busy with Stylo) 2012-09-10 17:21:17 PDT
I'm totally willing to believe that bug 774633 broke something here. I had to mess around with a lot of the fragile embedding layer stuff (appshell, windowwatcher, etc). So it's likely a subtle behavior change with respect to opening/closing windows that NoScript is choking on.

That being said, we're unlikely to back out bug 774633, because it's very important for security. It'd be great if somebody who knows the NoScript code could dive in and let us know what's going on.
Comment 26 Nicholas Nethercote [:njn] 2012-09-10 20:50:57 PDT
> It'd be great if somebody who knows the NoScript
> code could dive in and let us know what's going on.

Comment 23 suggests that NoScript isn't necessary for the bug to trigger.  Having said that, breaking NoScript is bad -- it's the 6th most popular add-on on AMO with over 2 million users.
Comment 27 Bobby Holley (:bholley) (busy with Stylo) 2012-09-11 01:17:49 PDT
(In reply to Nicholas Nethercote [:njn] from comment #26)
> > It'd be great if somebody who knows the NoScript
> > code could dive in and let us know what's going on.
> 
> Comment 23 suggests that NoScript isn't necessary for the bug to trigger. 

Ah, hm.

There are a lot of mixed signals here about what this bug is actually about. Can someone figure out exactly what the problem is, list precise steps to reproduce, and figure out which platforms are affected?
Comment 28 timbugzilla 2012-09-11 01:23:01 PDT
Perhaps comment 23 had a typo? I can't reproduce without noscript.
Comment 29 Sat 2012-09-11 14:07:43 PDT
Nightly from 5th September is the one that does not have this bug. The build from 6th already has it.
Comment 30 David Clark 2012-09-11 14:15:51 PDT
No my comment in comment 23 is not a typo, I've never installed NoScript before but can reproduce in safe mode and new profiles. I'm able to go back to the 5th Nightly and everything is fine so I'm guessing some configuration, of installed programs, on my side has the same effect as having NoScript installed.
Comment 31 Bobby Holley (:bholley) (busy with Stylo) 2012-09-11 16:54:38 PDT
David, can you describe the exact bug you're seeing? Is it windows-only?
Comment 32 therube 2012-09-11 17:09:12 PDT
> I'm guessing some configuration, of installed programs, on my side has the same
> effect as having NoScript installed.

That's very possible.

So how about providing your steps to reproduce & the extensions you have installed.

And how about bisecting your extensions to see if you can come up with the one(s) causing the problem?

In (another) case of mine, the Fetch Text URL extension & (as it was) NoScript, together combine to cause (likely) this issue when I use the "Fetch text URL in /new window/" (& not in a new tab, or current window) function in FTU.
Comment 33 David Clark 2012-09-11 19:13:57 PDT
I've tested the same profile on a Windows 8 RTM and a Windows 7 SP1 machine (both 64-bit versions) and it only affected the Windows 8 machine. It also affects safe mode as well as new profiles on that machine.

The problem I am seeing is that JavaScript popups links fail to display (example: http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html, clicking on the 'my popup' link). Every time it is clicked however a compartment is created, shown as 'http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html, about:blank [n]' (n is the number of times clicked) in about:compartments. This most likely leads to the problem when Firefox is closed but gets stuck during the shutdown process.
Comment 34 timbugzilla 2012-09-11 19:25:29 PDT
(In reply to David Clark from comment #33)
> I've tested the same profile on a Windows 8 RTM and a Windows 7 SP1 machine
> (both 64-bit versions) and it only affected the Windows 8 machine. It also
> affects safe mode as well as new profiles on that machine.
> 
> The problem I am seeing is that JavaScript popups links fail to display
> (example: http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html,
> clicking on the 'my popup' link). Every time it is clicked however a
> compartment is created, shown as
> 'http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html,
> about:blank [n]' (n is the number of times clicked) in about:compartments.
> This most likely leads to the problem when Firefox is closed but gets stuck
> during the shutdown process.

I cannot reproduce this in safe-mode (on win8 RTM (x86)). Should the bug reported in comment 33 be filed as a separate bug?
Comment 35 timbugzilla 2012-09-11 19:29:23 PDT
With noscript enabled the procedure in comment 33 does produce a ghost window and shutdown problems however.
Comment 36 David Clark 2012-09-11 22:13:00 PDT
Well I don't think I need to make a separate bug as I found that I can no longer can reproduce it after uninstalling and reinstalling a few programs (must have been an unique configuration issue).
Comment 37 Giorgio Maone [:mao] 2012-09-12 02:47:01 PDT
OK, I've found it, and surrogates are not guilty at all.

If any nsIWebProgressListener implementation tries to access wp.DOMWindow during the popup window initial chrome load (for the browser.xul file), the loading gets silently aborted (no exception is thrown), the ghost window is left around and the browser hangs on exit.

This issue is very similar to bug 771655. In facts, the contrived test case (which triggers the bug in safe mode) is almost identical: 

<scratchpad environment=browser>

if ("_testPL" in window)
  Cc['@mozilla.org/docloaderservice;1'].getService(Ci.nsIWebProgress)
    .removeProgressListener(_testPL);

window._testPL = {
  START_DOC: Ci.nsIWebProgressListener.STATE_START | Ci.nsIWebProgressListener.STATE_IS_DOCUMENT,
  onStateChange: function(wp, req, stateFlags, status) {
    if ((stateFlags & this.START_DOC) === this.START_DOC &&
        /^jar:file:\/\/.*\/browser.xul$/.test(req.name)) {
      wp.DOMWindow;
      Cu.reportError("wp.DOMWindow touched on " + req.name);
    }
  },
  QueryInterface: function(iid) {
    if (iid.equals(Ci.nsISupportsWeakReference) || 
        iid.equals(Ci.nsIWebProgressListener))
      return this;
    throw Cr.NS_ERROR_NO_INTERFACE;
  }
}

Cc['@mozilla.org/docloaderservice;1'].getService(Ci.nsIWebProgress)
  .addProgressListener(_testPL, Ci.nsIWebProgress.NOTIFY_STATE_REQUEST);

try {
  content.open("http://mozilla.org", "_blank", "width=640,height=400")
} catch(e) {
  Cu.reportError(e);
} finally {
  alert("Done");
}
</scratchpad>

Notice that the onStateChange() method runs to completion, but the scratchpad script doesn't: the "Done" alert is never shown, and no error is shown in the console either.

I probably can and will work-around this in NoScript by retrieving the window from the channel's notificationCallback, but I'm fairly sure other extensions (e.g. Roboform) are going to break Firefox if the root problem doesn't get understood and possibly fixed.
Comment 38 Giorgio Maone [:mao] 2012-09-12 02:50:51 PDT
Created attachment 660366 [details]
Contrived test case as a chrome scratchpad

To reproduce this bug, toggle the devtools.chrome.enabled preference to true, then open a Scratchpad window (ctrl+F4), select Environment>Browser and open the attachment. Run it with ctrl+Run.
Expected result: a mozilla.org popup opens, and a "Done" alert is shown.
Actual result: nothing happens, and the browser hangs on exit.
Comment 39 Giorgio Maone [:mao] 2012-09-12 03:23:59 PDT
It looks like the work-around I proposed myself in comment 37 doesn't work after all: just referencing the window, even if obtained from the channel's notificationCallback, breaks this stuff.

I guess I'll have to single out the browser.xul load (UGLY!). Please fix.
Comment 40 Giorgio Maone [:mao] 2012-09-12 04:41:19 PDT
FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1 from http://noscript.net/getit#devel
Comment 41 timbugzilla 2012-09-12 04:43:14 PDT
(In reply to Giorgio Maone from comment #40)
> FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1
> from http://noscript.net/getit#devel

Thanks! Does the work-around have any performance impact?
Comment 42 Giorgio Maone [:mao] 2012-09-12 06:19:50 PDT
(In reply to timbugzilla from comment #41)
> (In reply to Giorgio Maone from comment #40)
> > FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1
> > from http://noscript.net/getit#devel
> 
> Thanks! Does the work-around have any performance impact?

Practically none.
Comment 43 Vladimir_S 2012-09-12 07:49:14 PDT
(In reply to Giorgio Maone from comment #40)
> FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1
> from http://noscript.net/getit#devel

From bugreporter: thank you very much! Tested Aurora+NoScript 2.5.5rc1 - works pefectly, no problems. Neither with browser closing, no with VB forums.
Comment 44 therube 2012-09-12 08:57:45 PDT
Unfortunately this issue persists in SeaMonkey (-:
 
 
New Profile
Install NoScript 2.5.5rc1
Load http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html
Allow htmlcodetutorial.com
Click the 'my popup' link on that page

No popup ensues

Quit

SeaMonkey hangs in memory
 
 
Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a2
Build identifier: 20120912013003
Comment 45 Giorgio Maone [:mao] 2012-09-12 12:02:14 PDT
(In reply to therube from comment #44)
> Unfortunately this issue persists in SeaMonkey (-:

No suprise, since current work-around involves checking for the specific browser window XUL address, which may differ (and does differ, in this case) across browsers.

I'm gonna release a refined work-around which tries to be browser-agnostic in next build.
Comment 46 therube 2012-09-12 12:47:07 PDT
Working now in SeaMonkey with the latest development build 2.5.5rc2, thanks :-).
Comment 47 Bobby Holley (:bholley) (busy with Stylo) 2012-09-13 09:10:00 PDT
Thanks for the reduced testcase, giorgio. Confirmed on mac and windows. I'm looking into it.

Note that in unpackaged debug builds, the testcase doesn't actually work, because it assumes that the load is in a jar. Just loosening up the regexp to /browser.xul/ seems to do the trick.
Comment 48 Giorgio Maone [:mao] 2012-09-13 09:22:31 PDT
Created attachment 660865 [details]
Cross-browser reduced test case as a chrome scratchpad

To reproduce this bug, toggle the devtools.chrome.enabled preference to true, then open a Scratchpad window (ctrl+F4), select Environment>Browser and open the attachment. Run it with ctrl+Run.

This version matches the correct browser window URI automatically, no matter which toolkit-based browser we're using.
Comment 49 Bobby Holley (:bholley) (busy with Stylo) 2012-09-13 10:46:13 PDT
Ok, so the shutdown hang has to do with the fact that we never actually make it out of the call to nsXULWindow::CreatNewContentWindow. That function spins the event loop while waiting for the chrome-y part of the content window to load:

http://hg.mozilla.org/mozilla-central/file/f39786e8364d/xpfe/appshell/src/nsXULWindow.cpp#l1770

But we never actually return from that. Yuck.
Comment 50 Bobby Holley (:bholley) (busy with Stylo) 2012-09-13 15:54:44 PDT
Ah, ok. So here's what happens:

1 - The scratchpad script calls content.open("http://mozilla.org", "_blank")

2 - This calls into nsXULWindow::CreateNewContentWindow, which calls nsAppShellService::CreateTopLevelWindow, and then spins the event loop until the chrome-y part of the window is ready.

3 - As the load starts, we fire OnStateChange to signal the beginning of the document load, which calls the handler in the testcase in comment 37. The stack at this point looks like this: http://pastebin.mozilla.org/1820172 . Note that we're dealing with the nsGlobalChromeWindow here, not the content window inside of it.

4 - The call to GetDOMWindow causes the DOM window to get XPConnect-wrapped on its way out the door as a return value, which causes us to generate an about:blank window, per the stack here: http://pastebin.mozilla.org/1820473.

5 - Then, per the changes in bug 774633, we call SetInitialPrincipalToSubject, which causes us to create a _new_ about:blank content viewer (with a different principal), like so: http://pastebin.mozilla.org/1820476

6 - Because of (4), the call in (5) runs into an existing mContentViewer in the DocShell, which it proceeds to blow away. It calls Stop on the load group, which cancels the outstanding request to load browser.xul.

7 - This cancellation synchronously calls into nsWebShellWindow::OnStateChange for the loading XUL window, and sets mLockedUntilChromeLoad to false (since we're no longer waiting for the load). But this all happens too soon! We haven't yet set it to true, which we do after unwinding the stack to nsXULWindow::CreateNewContentWindow. It sets mLockedUntilChromeLoad immediately after calling CreateTopLevelWindow. This is a reasonable assumption, because the URL it's trying to load is browser.xul, which _should_ load asynchronously. But we just cancelled that load and replaced it with an about:blank document, so we spin indefinitely waiting to hear that the chrome document loaded.
Comment 51 Bobby Holley (:bholley) (busy with Stylo) 2012-09-13 16:19:40 PDT
So, I think we should avoid cancelling pending loads if the existing document is about:blank (and possibly the initial document, as well?). I'm going to work up an automated test and a patch.
Comment 52 Bobby Holley (:bholley) (busy with Stylo) 2012-09-13 17:10:46 PDT
https://tbpl.mozilla.org/?tree=Try&rev=09a64b17e52d
Comment 53 Bobby Holley (:bholley) (busy with Stylo) 2012-09-14 04:18:37 PDT
Created attachment 661180 [details] [diff] [review]
Don't kill loads when calling nsDocShell::CreateAboutBlankContentViewer to replace an existing initial about:blank document. v2

This is green on try modulo one assertion that I think we can tweak. I'll let bz make the call here.

Note that this needs to land and be backported to beta pronto.
Comment 54 Bobby Holley (:bholley) (busy with Stylo) 2012-09-14 04:20:26 PDT
Created attachment 661181 [details] [diff] [review]
Tests. v1
Comment 55 Boris Zbarsky [:bz] 2012-09-15 17:25:32 PDT
Comment on attachment 661181 [details] [diff] [review]
Tests. v1

r=me
Comment 56 Boris Zbarsky [:bz] 2012-09-15 17:25:53 PDT
Comment on attachment 661180 [details] [diff] [review]
Don't kill loads when calling nsDocShell::CreateAboutBlankContentViewer to replace an existing initial about:blank document. v2

+++ b/docshell/base/nsDocShell.cpp
+  if (mContentViewer && !viewerIsInitialDocument) {
+    // We've got a nontrivial content viewer already. Make sure the user

This doesn't look right.  There are various ways web content can end up working with the initial document in a docshell, adding DOM nodes to it, etc.  So we really do want to enter this code for initial documents in some cases.

Why are we not creating the initial about:blank with the right principal?  Doing that seems like it would be a better fix to me.  For example, can we stash the right principal on the docshell before we go loading stuff in it, so it knows to create things in EnsureContentViewer with the right principal?
Comment 57 Bobby Holley (:bholley) (busy with Stylo) 2012-09-15 17:56:05 PDT
(In reply to Boris Zbarsky (:bz) from comment #56)
> Why are we not creating the initial about:blank with the right principal?
> Doing that seems like it would be a better fix to me.  For example, can we
> stash the right principal on the docshell before we go loading stuff in it,
> so it knows to create things in EnsureContentViewer with the right principal?

I asked you the same question on IRC in july. You said "we just don't have the APIs for that." So I'm confused about your question here, unless you're implying that the situation here is different.

Though now that you mention it, I could see that it might be different here since we're dealing with the creation of a chrome window, meaning that we don't have to go through all the declarative junk to make the docshell. So you maybe we should just stash the principal on the docshell in nsWebShellWindow::Initialize? That'd probably work...
Comment 58 Boris Zbarsky [:bz] 2012-09-15 18:15:52 PDT
> So you maybe we should just stash the principal on the docshell in
> nsWebShellWindow::Initialize?

That sounds plausible, yes.  Somewhere on the window-creation codepath there.

The "no apis" part was about generic window creation, which embedders can hook into.  But here we have a specific window creation codepath that we control from top to bottom.
Comment 59 Bobby Holley (:bholley) (busy with Stylo) 2012-09-16 09:19:33 PDT
Created attachment 661608 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v1

How about this?
Comment 60 Boris Zbarsky [:bz] 2012-09-16 11:54:58 PDT
Comment on attachment 661608 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v1

r=me
Comment 61 Bobby Holley (:bholley) (busy with Stylo) 2012-09-16 15:10:57 PDT
https://tbpl.mozilla.org/?tree=Try&rev=f9aac600cb86
Comment 62 Bobby Holley (:bholley) (busy with Stylo) 2012-09-17 09:53:52 PDT
Sigh...orange.
Comment 63 Bobby Holley (:bholley) (busy with Stylo) 2012-09-17 11:05:08 PDT
Created attachment 661845 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v2

Oh, for crying out loud. It looks like CreateAboutBlankContentViewer doesn't actually call SetIsInitialDocument() on the document even when it should. This API is a royal footgun.

I'm going to fix it the dumb way here since this patch needs to be backported, and then clean this up a bit in a separate bug.
Comment 64 Bobby Holley (:bholley) (busy with Stylo) 2012-09-17 11:08:27 PDT
https://tbpl.mozilla.org/?tree=Try&rev=0fd18ad6dd59
Comment 65 Bobby Holley (:bholley) (busy with Stylo) 2012-09-17 14:05:19 PDT
Comment on attachment 661845 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v2

Anddddd, this caused TestAppShellReadyState to go orange on windows again. :\
Comment 66 Jorge Villalobos [:jorgev] 2012-09-17 14:14:41 PDT
*** Bug 791812 has been marked as a duplicate of this bug. ***
Comment 67 Bobby Holley (:bholley) (busy with Stylo) 2012-09-17 14:57:19 PDT
Created attachment 661933 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3

This one turned out to be super easy. I was using nsContentUtils::GetScriptSecurityManager, which was
returning null for the C++ unit test in question here, because nobody ever calls nsContentUtils::Init.
So let's just instantiate it with do_GetService instead.

With that, this _should_ be actually green across the board and ready to land. Flagging bz for
the final signoff.
Comment 68 Bobby Holley (:bholley) (busy with Stylo) 2012-09-17 15:52:43 PDT
*** Bug 791802 has been marked as a duplicate of this bug. ***
Comment 69 Boris Zbarsky [:bz] 2012-09-17 20:46:21 PDT
Comment on attachment 661933 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3

r=me, weeping though I am.  ;)
Comment 71 Bobby Holley (:bholley) (busy with Stylo) 2012-09-18 08:15:16 PDT
Comment on attachment 661933 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3

Just landed on inbound, but this needs to get on beta and aurora ASAP, so flagging for approval to get this on the radar. I'll let the drivers make the call for when to land it.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 774633
User impact if declined: Serious compat problems with addons. When those compat problems manifest, the browser also hangs in nasty ways.
Testing completed (on m-c, etc.): Just landed on m-i.
Risk to taking this patch (and alternatives if risky): This code is really nasty, so anything touching it is pretty risky. The current bug is pretty bad, though. Backing out bug 774633 at this point would necessitate backing out bug 754202, and maybe some other stuff. This patch is certainly lower risk than that.
String or UUID changes made by this patch: none
Comment 72 Alex Keybl [:akeybl] 2012-09-18 11:40:27 PDT
(In reply to Bobby Holley (:bholley) from comment #71)
> User impact if declined: Serious compat problems with addons. When those
> compat problems manifest, the browser also hangs in nasty ways.
> Risk to taking this patch (and alternatives if risky): This code is really
> nasty, so anything touching it is pretty risky. The current bug is pretty
> bad, though. Backing out bug 774633 at this point would necessitate backing
> out bug 754202, and maybe some other stuff. This patch is certainly lower
> risk than that.

If we need to take this fix, we'll land today for b4 (unorthodox I know, but we need external testing). One alternative is to do nothing for beta, if we believe only NoScript is affected, given the upcoming workaround. Is that the case?
Comment 73 Alex Keybl [:akeybl] 2012-09-18 11:43:32 PDT
I now see RoboForm and others may be affected too. I'm guessing testing only a subset of possibly affected add-ons won't do us much good, since this appears to be a prevalent API.
Comment 74 Alex Keybl [:akeybl] 2012-09-18 11:44:37 PDT
Comment on attachment 661933 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3

[Triage Comment]
Lowest risk fix we know of for a pretty serious issue. Please make recommendations around how QA can best be on the lookout for regressions (NoScript/RoboForm testing, etc.)
Comment 75 Shane Tomlinson [:stomlinson] 2012-09-18 12:22:42 PDT
(In reply to Alex Keybl [:akeybl] from comment #72)

> If we need to take this fix, we'll land today for b4 (unorthodox I know, but
> we need external testing). One alternative is to do nothing for beta, if we
> believe only NoScript is affected, given the upcoming workaround. Is that
> the case?

Another major addon where this manifests is Google PageSpeed. Please see https://bugzilla.mozilla.org/show_bug.cgi?id=791812.
Comment 76 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-18 13:23:29 PDT
Removing QAWANTED since it looks like this is at a state where it needs verification (re: verifyme). If someone can point out the add-ons we need to be testing and the steps to reproduce, that would be great. I'm having a bit of trouble pulling out all the scenarios mentioned in this bug.
Comment 77 Bobby Holley (:bholley) (busy with Stylo) 2012-09-18 13:31:45 PDT
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #76)
> Removing QAWANTED since it looks like this is at a state where it needs
> verification (re: verifyme). If someone can point out the add-ons we need to
> be testing and the steps to reproduce, that would be great. I'm having a bit
> of trouble pulling out all the scenarios mentioned in this bug.

Other bugs we've linked to this are in the deps / dupe fields: bug 791812, bug 790218, bug 791328, bug 789808, and bug 791802. This patch here of course has an automated test that verifies the testcase from comment 48.

I'm prepping this for aurora and beta right now.
Comment 78 Kris Maglione [:kmag] 2012-09-18 13:37:36 PDT
(In reply to Alex Keybl [:akeybl] from comment #73)
> I now see RoboForm and others may be affected too. I'm guessing testing only
> a subset of possibly affected add-ons won't do us much good, since this
> appears to be a prevalent API.

It's a very common pattern, yes.
Comment 79 Bobby Holley (:bholley) (busy with Stylo) 2012-09-18 13:46:20 PDT
Pushed to mozilla-beta:

remote:   https://hg.mozilla.org/releases/mozilla-beta/rev/a12d980c8681
remote:   https://hg.mozilla.org/releases/mozilla-beta/rev/3682a06448ce
Comment 80 Bobby Holley (:bholley) (busy with Stylo) 2012-09-18 13:47:13 PDT
actually, status-firefox18: fixed is incorrect, since this hasn't yet been merged from inbound to mc. ;-)
Comment 81 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-09-18 13:57:53 PDT
Can someone please clarify the call-to-action for QA? I'm a bit confused as recent comments seem to imply automation is good enough and manually testing NoScript & Roboform is a waste of time. I'm not sure I agree with that assessment if this is the case.
Comment 82 Bobby Holley (:bholley) (busy with Stylo) 2012-09-18 14:06:29 PDT
Pushed to mozilla-aurora:

remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/e669ed4a734d
remote:   https://hg.mozilla.org/releases/mozilla-aurora/rev/865c90c3df4e
Comment 83 Bobby Holley (:bholley) (busy with Stylo) 2012-09-18 14:09:13 PDT
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #81)
> Can someone please clarify the call-to-action for QA? I'm a bit confused as
> recent comments seem to imply automation is good enough and manually testing
> NoScript & Roboform is a waste of time.

I don't believe it's a waste of time. The testcase reported in this bug was a reduced testcase from the NoScript author, but it's probably still worth ensuring that the various addons behave appropriately. More importantly (and probably higher priority) is going through the bugs mentioned in comment 77 that we've linked (behaviorally) to this bug, and making sure that the specific STR in those bugs no longer reproduces.

That being said, I did not make said call to action.
Comment 84 Irakli Gozalishvili [:irakli] [:gozala] [@gozala] 2012-09-18 17:18:03 PDT
This change has introduced regression in the addon-sdk code (see bug 792269). I'm trying to figure why is that, and would really like if you could help me to understand.

It looks like exceptions are thrown by the following line:
https://github.com/mozilla/addon-sdk/blob/master/packages/api-utils/lib/frame/utils.js#L56

because browser.docshell property
https://developer.mozilla.org/en-US/docs/XUL/browser#p-docShell
is not present.

Is that change intentional ? If so is there any other way we could use to access it ?
Comment 85 Irakli Gozalishvili [:irakli] [:gozala] [@gozala] 2012-09-18 18:09:35 PDT
So I tried to dig dipper into the issue and it looks like the browser node
created by the function I pointed to earlier is not a browser (not sure how to
describe this better). It shows up as XUL browser node but does not implements
any of the properties / method from browser.

This makes me believe that the issue maybe related to the way we create window
that we use to create `browser` in. Complete and reduced code looks as follows:


let XUL = 'http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul';

let windowWatcher = Cc['@mozilla.org/embedcomp/window-watcher;1'].
                      getService(Ci.nsIWindowWatcher);
                       
let window = windowWatcher.
              openWindow(null, 'data:text/html;charset=utf-8,Window', null, '', null);

let frame = window.document.createElementNS(XUL, 'browser');
frame.setAttribute('type', 'content');
frame.setAttribute('src', 'about:blank');
Comment 86 Kris Maglione [:kmag] 2012-09-18 18:27:15 PDT
The XBL portions of the <browser> element (which include the docShell) property are applied via CSS, so aren't guaranteed to be present immediately. I'm not sure why this patch would affect that, but the workaround is generally to do something that would force a reflow (getBoundingClientRect, e.g.), which is obviously less than ideal.
Comment 87 Boris Zbarsky [:bz] 2012-09-18 19:55:04 PDT
The real problem from the code in comment 85 is that it's creating elements from transient about:blank documents, as far as I can tell...
Comment 88 Irakli Gozalishvili [:irakli] [:gozala] [@gozala] 2012-09-18 20:35:52 PDT
I have modified snippet to take both suggestions into account:


let XUL = 'http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul';

let windowWatcher = Cc['@mozilla.org/embedcomp/window-watcher;1'].
                      getService(Ci.nsIWindowWatcher);
                       
let window = windowWatcher.
              openWindow(null, 'data:text/html;charset=utf-8,Window', null, '', null);


window.addEventListener('load', function() {
  let frame = window.document.createElementNS(XUL, 'browser');
  frame.setAttribute('type', 'content');
  frame.setAttribute('src', 'data:text/html;charset=utf-8,frame');
  frame.getBoundingClientRect()
  window.alert(window.document.readyState + ' : ' + window.location + ' ' + frame.docshell)
}, false)


Still browser is kind of broken, alert has following message:

complete : data:text/html;charset=utf-8,Window undefined

Is there anything else I'm missing ?
Comment 89 Boris Zbarsky [:bz] 2012-09-18 20:36:55 PDT
Getting the bounding client rect before inserting the node into the DOM is not going to do anything, really.
Comment 90 Kris Maglione [:kmag] 2012-09-18 21:04:52 PDT
Just to be clear, the reflow hack is usually not necessary when working with XUL. It's generally only necessary in some XBL edge cases, so if things work without it, it's best to avoid it. Regardless, the docShell property shouldn't be available until the browser is inserted into the DOM tree, and the reflow hack needs to happen after that.

Not sure about bz's about:blank suggestion (but it seems the most likely culprit given the circumstances). What's the document in the failing instance, and when is it being called?
Comment 91 Irakli Gozalishvili [:irakli] [:gozala] [@gozala] 2012-09-18 21:07:09 PDT
(In reply to Boris Zbarsky (:bz) from comment #89)
> Getting the bounding client rect before inserting the node into the DOM is
> not going to do anything, really.

Oh right, my bad! You were correct, problem was that I was creating a browser from a transient document, forcing reflow does not seems to be necessary. Also as you have pointed out issue in my tests was that I forgot to insert created element into document.

Thanks!
Comment 92 Bobby Holley (:bholley) (busy with Stylo) 2012-09-19 01:45:05 PDT
> Nicholas Nethercote [:njn]
> Whiteboard: [MemShrink] → [MemShrink:P3]

Nick, can you clarify this? I got an email about a .06% MaxHeap regression on mac, but that doesn't seem like very much.

This patch just eagerly instantiates the about:blank viewer on top-level chrome docshells, whereas we'd previously wait to lazily instantiate it later on in the same call. However! The only case where we wouldn't instantiate it would be in the call to nsAppShellService::CreateHiddenWindow ( http://mxr.mozilla.org/mozilla-central/source/xpfe/appshell/src/nsAppShellService.cpp#82 ). AIUI, this is done once at startup on MacOSX in order to create a dummy window that keeps the browser alive even when all the windows are closed.

So we could stick some conditionals somewhere in there to avoid it, though it would come at the cost of nastifying some already-really-nasty-and-regression-prone code. Is a .06% memory regression (24k of a 400MB heap) worth doing that for? If so, please file a separate bug and I'll look into it.
Comment 93 Nicholas Nethercote [:njn] 2012-09-19 03:00:05 PDT
The MemShrink annotation was added shortly after the bug was created because of the ghost windows, which have now been fixed, AIUI :)  I wasn't aware of any MaxHeap regression, and .06% on Mac isn't worth worrying about.
Comment 95 Philip Chee 2012-09-19 21:40:41 PDT
about:home doesn't exist in SeaMonkey. Can you use some other about: page like about:neterror ?
Comment 96 Bobby Holley (:bholley) (busy with Stylo) 2012-09-20 00:00:13 PDT
(In reply to Philip Chee from comment #95)
> about:home doesn't exist in SeaMonkey. Can you use some other about: page
> like about:neterror ?

Fine by me assuming the test continues to pass. File a separate bug?
Comment 97 Matthias Versen [:Matti] 2012-09-20 01:59:46 PDT
*** Bug 792548 has been marked as a duplicate of this bug. ***
Comment 98 Matthias Versen [:Matti] 2012-09-20 05:42:49 PDT
*** Bug 792368 has been marked as a duplicate of this bug. ***
Comment 99 Matthias Versen [:Matti] 2012-09-20 07:51:15 PDT
*** Bug 791401 has been marked as a duplicate of this bug. ***
Comment 100 André Ziegler 2012-09-21 01:33:24 PDT
I also had the issue with Firefox 16 Beta 3 and b4 seams to fix it. Now Firefox no longer hangs at close and JavaScript links work again now :)
Comment 101 Mihaela Velimiroviciu (:mihaelav) 2012-09-21 08:23:53 PDT
We have run several scenarios to verify this issue, and we could not reproduce it anymore.
Environment:
* OSs: Win XP, Win 7, Win 8, MacOS X 10.7.4
* Builds: latest Firefox 16 beta 4, latest Aurora (20120920042010)
* NoScript 2.5.3, 2.5.4, 2.5.5
We also verified duplicates: bug #791812 and bug #789808
Comment 102 Matthias Versen [:Matti] 2012-09-23 02:02:05 PDT
*** Bug 792073 has been marked as a duplicate of this bug. ***
Comment 103 Matthias Versen [:Matti] 2012-09-24 07:51:27 PDT
*** Bug 791700 has been marked as a duplicate of this bug. ***
Comment 104 Mihai Morar, (:MihaiMorar) 2012-12-27 02:52:08 PST
Verified fixed for FF 18b5 Windows 7 x64, Ubuntu x86 and Mac OS 10.8 following theese STR:

1. Install the latest FF Beta.
2. Navigate to http://mozqa.com/data/firefox/popups/popup_trigger.html?count=1
3. In the notification bar, go to Options and select 'Allow popups from mozqa.com'
4. Refresh the page 
5. Install No Script 2.5.3 add-on (https://addons.mozilla.org/en-US/firefox/addon/noscript/versions/?page=1#version-2.5.3)
6. Navigate to http://mozqa.com/data/firefox/popups/popup_trigger.html?count=1
7. In the NoScript notification bar, go to Options and select 'Allow mozqa.com'
8. Quit Firefox and check the processes in the task manager
Expected results: 
In step 4, one pop up window is opened
In step 6 no pop up window is opened
In step 7 one pop up window is opened again.
In step 8 - no firefox.exe process remains opened

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0 (20121219074241)
Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0 (20121219074241)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:18.0) Gecko/20100101 Firefox/18.0 (20121219074241)

Note You need to log in before you can comment on or make changes to this bug.