Closed Bug 789773 Opened 12 years ago Closed 12 years ago

nsIWebProgressListener implementations referencing the load's window in onStateChange() (like NoScript or Roboform) cause popup loads to be aborted and the browser to hang on exit

Categories

(Core :: DOM: Navigation, defect)

17 Branch
x86
Windows XP
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla18
Tracking Status
firefox15 --- unaffected
firefox16 + verified
firefox17 + verified
firefox18 + verified
firefox-esr10 --- unaffected

People

(Reporter: vlad.sakh, Assigned: bholley)

References

Details

(Keywords: regression, verifyme, Whiteboard: [MemShrink:P3])

Attachments

(5 files, 4 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20120904124322

Steps to reproduce:

Tested the work of 17.0a2 and later 18.0a1 Fx branches with NoScript extension. Versions of NoScript: 2.5.3, 2.5.4 and 2.5.4rc3.


Actual results:

After the Aurora update from Sept.07.2012 after closing of the browser the process firefox.exe still remains active; VB-based forum works slightly incorrectly - gives no opportunity to load the attachment to message: the corresponding pop-up window does not appear. Tested on http://www.tehnari.ru/ (registration required). 


Expected results:

Nothing except ordinary work of browser.
Details:
Passing to Safe Browser mode heals the situation.
Creating the new profile with ONLY NoScript installed extension shows the bug described.
The permission of all scripts in NoScript does not help.
Branch Fx 16.0b2 has no problems with NoScript.
Before Sept.07.2012 Aurora with active NoScript worked perfectly.
I have been having this problem (firefox.exe process remains after exiting browser) as well with noscript and the recent nightly moz-central win32 builds. It has also been causing ghost windows after about 1 hour of browsing.

Oddly, this happens on win7 x64 but not win8 RTM x86.

Please add "MemShrink" to the whiteboard field.
Sorry, make that "[MemShrink]"
I don't see why this is relevant to MemShrink.
(In reply to Nicholas Nethercote [:njn] from comment #4)
> I don't see why this is relevant to MemShrink.

I am getting ghost windows with noscript (see comment 2).
I can reproduce this on win8 x86 now. After exiting Nightly the firefox.exe process remains alive. 

Ghost windows are held from nytimes.com pages. These are frequently seen (takes > 1 hour to start seeing them though).  There are problems with opening images via js commands on the nytimes site due to noscript. Perhaps there is a problem with the script surrogates?

I've attached my about:memory output as a txt file. An example of the js that noscript is incorrectly blocking on the whitelisted NY Times page is:

javascript:pop_me_up2('http://www.nytimes.com/imagepages/2012/09/09/realestate/09COVER3.html','09COVER3_html','width=720,height=563,scrollbars=yes,toolbars=no,resizable=yes')
I'm on my way back from Mozcamp and I can't look into this seriously until tomorrow, but could you please check wether the problem goes away if you toggle the noscript.surrogate.enabled about:config preference to false, and possibly find a regression window either in NoScript or, more likely, in Firefox?
(In reply to Giorgio Maone from comment #8)
> I'm on my way back from Mozcamp and I can't look into this seriously until
> tomorrow, but could you please check wether the problem goes away if you
> toggle the noscript.surrogate.enabled about:config preference to false, and
> possibly find a regression window either in NoScript or, more likely, in
> Firefox?

I set it to false (in safe mode), but am still getting ghost windows, and still can't open the pop-up images on nytimes.com pages. Weird.
Oddly I can't seem to turn surrogates off - e.g. I still see entries like the following in about:memory despite the "false" value for noscript.surrogate.enabled remaining after restarting etc.

─109,056 B (00.12%) -- compartment(https://www.google.co.nz/, NoScript::ScriptSurrogate@https://www.google.co.nz/ (from: chrome://noscript/content/ScriptSurrogate.js:238))
│  │   │  │       ├───90,112 B (00.10%) -- gc-heap
I'm still getting ghost windows with noscript 2.5.2, and I don't remember having problems that far back. So, change in moz-central that triggered it?
The ghost windows are listed under about:compartments?
(Don't know about ghost windows, but) I'll confirm this, the hang on Quit.

"NoScript+Fetch Text URL+Aurora=Hang"
http://forums.informaction.com/viewtopic.php?f=10&t=9902

& I have some posts in this thread too:

"NoScript blocking javascript popup on whitelisted website"
http://forums.informaction.com/viewtopic.php?f=7&t=9461

The Sept.07.2012 date is what I found too.

works: http://hg.mozilla.org/releases/mozilla-aurora/rev/c2cc63c864e4
broke: http://hg.mozilla.org/releases/mozilla-aurora/rev/41eab5a0e537

Setting noscript.surrogate.enabled;false did not help the situation.
Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a2
Build identifier: 20120910013002

I see the same (at least the NoScript+Fetch Text URL+Aurora) on Win7 x64 & Aurora.
Could anyone find the exact regression range (the last working build and the first broken one) ?
Could this be caused by a Noscript update ?
Component: Untriaged → Extension Compatibility
Whiteboard: [memshrink]
From what I saw, the NoScript version was immaterial.
(I had checked back to noscript-2.0.9.9rc6.xpi, March 2, 2011, & the issue still persists.)
Yes, we need a regression window for Firefox.
STR:

URL: http://www.vbforums.com/forumdisplay.php?25-Visual-Basic-.NET
On that page, the thread, "Sticky: VB.NET Example Code & Hints you may not know"

Click the paperclip.
A window pops up.

Quit.

Firefox exits.
 
 
Install NoScript & repeat, but do NOT allow vbforums.com.

Click the paperclip.
No window pops up (that is not necessarily an issue).

Quit.

Firefox exits.
 
 
Repeat again, though this time allowing vbforums.com.

Click the paperclip.
No popup window.

Quit.

Firefox hangs in memory.
(In reply to Matthias Versen (Matti) from comment #17)
> Could anyone find the exact regression range (the last working build and the
> first broken one) ?
> Could this be caused by a Noscript update ?

Is comment 14 good enough?
I am able to reproduce this bug without NoScript in safe mode and a new profile on my Windows 8 x64 RTM with the x86 and x64 nightlies since the Sep 6th.

I did some cross referencing on the changelogs of the Sep 6th nightly (http://forums.mozillazine.org/viewtopic.php?f=23&t=2535353) and the regression range in comment 14 (http://hg.mozilla.org/releases/mozilla-aurora/pushloghtml?startdate=2012-09-06+01%3A00%3A00&enddate=2012-09-06+19%3A00%3A00) and got three bugs: 771354, 774607 and 774633.
(In reply to Josh Matthews [:jdm] from comment #22)
> Is comment 14 good enough?
It is but I somehow missed that comment
Status: UNCONFIRMED → NEW
Ever confirmed: true
I'm totally willing to believe that bug 774633 broke something here. I had to mess around with a lot of the fragile embedding layer stuff (appshell, windowwatcher, etc). So it's likely a subtle behavior change with respect to opening/closing windows that NoScript is choking on.

That being said, we're unlikely to back out bug 774633, because it's very important for security. It'd be great if somebody who knows the NoScript code could dive in and let us know what's going on.
> It'd be great if somebody who knows the NoScript
> code could dive in and let us know what's going on.

Comment 23 suggests that NoScript isn't necessary for the bug to trigger.  Having said that, breaking NoScript is bad -- it's the 6th most popular add-on on AMO with over 2 million users.
(In reply to Nicholas Nethercote [:njn] from comment #26)
> > It'd be great if somebody who knows the NoScript
> > code could dive in and let us know what's going on.
> 
> Comment 23 suggests that NoScript isn't necessary for the bug to trigger. 

Ah, hm.

There are a lot of mixed signals here about what this bug is actually about. Can someone figure out exactly what the problem is, list precise steps to reproduce, and figure out which platforms are affected?
Keywords: qawanted
Perhaps comment 23 had a typo? I can't reproduce without noscript.
Nightly from 5th September is the one that does not have this bug. The build from 6th already has it.
No my comment in comment 23 is not a typo, I've never installed NoScript before but can reproduce in safe mode and new profiles. I'm able to go back to the 5th Nightly and everything is fine so I'm guessing some configuration, of installed programs, on my side has the same effect as having NoScript installed.
David, can you describe the exact bug you're seeing? Is it windows-only?
> I'm guessing some configuration, of installed programs, on my side has the same
> effect as having NoScript installed.

That's very possible.

So how about providing your steps to reproduce & the extensions you have installed.

And how about bisecting your extensions to see if you can come up with the one(s) causing the problem?

In (another) case of mine, the Fetch Text URL extension & (as it was) NoScript, together combine to cause (likely) this issue when I use the "Fetch text URL in /new window/" (& not in a new tab, or current window) function in FTU.
I've tested the same profile on a Windows 8 RTM and a Windows 7 SP1 machine (both 64-bit versions) and it only affected the Windows 8 machine. It also affects safe mode as well as new profiles on that machine.

The problem I am seeing is that JavaScript popups links fail to display (example: http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html, clicking on the 'my popup' link). Every time it is clicked however a compartment is created, shown as 'http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html, about:blank [n]' (n is the number of times clicked) in about:compartments. This most likely leads to the problem when Firefox is closed but gets stuck during the shutdown process.
(In reply to David Clark from comment #33)
> I've tested the same profile on a Windows 8 RTM and a Windows 7 SP1 machine
> (both 64-bit versions) and it only affected the Windows 8 machine. It also
> affects safe mode as well as new profiles on that machine.
> 
> The problem I am seeing is that JavaScript popups links fail to display
> (example: http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html,
> clicking on the 'my popup' link). Every time it is clicked however a
> compartment is created, shown as
> 'http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html,
> about:blank [n]' (n is the number of times clicked) in about:compartments.
> This most likely leads to the problem when Firefox is closed but gets stuck
> during the shutdown process.

I cannot reproduce this in safe-mode (on win8 RTM (x86)). Should the bug reported in comment 33 be filed as a separate bug?
With noscript enabled the procedure in comment 33 does produce a ghost window and shutdown problems however.
Well I don't think I need to make a separate bug as I found that I can no longer can reproduce it after uninstalling and reinstalling a few programs (must have been an unique configuration issue).
OK, I've found it, and surrogates are not guilty at all.

If any nsIWebProgressListener implementation tries to access wp.DOMWindow during the popup window initial chrome load (for the browser.xul file), the loading gets silently aborted (no exception is thrown), the ghost window is left around and the browser hangs on exit.

This issue is very similar to bug 771655. In facts, the contrived test case (which triggers the bug in safe mode) is almost identical: 

<scratchpad environment=browser>

if ("_testPL" in window)
  Cc['@mozilla.org/docloaderservice;1'].getService(Ci.nsIWebProgress)
    .removeProgressListener(_testPL);

window._testPL = {
  START_DOC: Ci.nsIWebProgressListener.STATE_START | Ci.nsIWebProgressListener.STATE_IS_DOCUMENT,
  onStateChange: function(wp, req, stateFlags, status) {
    if ((stateFlags & this.START_DOC) === this.START_DOC &&
        /^jar:file:\/\/.*\/browser.xul$/.test(req.name)) {
      wp.DOMWindow;
      Cu.reportError("wp.DOMWindow touched on " + req.name);
    }
  },
  QueryInterface: function(iid) {
    if (iid.equals(Ci.nsISupportsWeakReference) || 
        iid.equals(Ci.nsIWebProgressListener))
      return this;
    throw Cr.NS_ERROR_NO_INTERFACE;
  }
}

Cc['@mozilla.org/docloaderservice;1'].getService(Ci.nsIWebProgress)
  .addProgressListener(_testPL, Ci.nsIWebProgress.NOTIFY_STATE_REQUEST);

try {
  content.open("http://mozilla.org", "_blank", "width=640,height=400")
} catch(e) {
  Cu.reportError(e);
} finally {
  alert("Done");
}
</scratchpad>

Notice that the onStateChange() method runs to completion, but the scratchpad script doesn't: the "Done" alert is never shown, and no error is shown in the console either.

I probably can and will work-around this in NoScript by retrieving the window from the channel's notificationCallback, but I'm fairly sure other extensions (e.g. Roboform) are going to break Firefox if the root problem doesn't get understood and possibly fixed.
Summary: Conflict of Firefox Aurora and Nightly with NoScript extension → nsIWebProgressListener implementations accessing webProgress.DOMWindow in onStateChange() cause popup loads to be aborted and the browser to hang on exit
To reproduce this bug, toggle the devtools.chrome.enabled preference to true, then open a Scratchpad window (ctrl+F4), select Environment>Browser and open the attachment. Run it with ctrl+Run.
Expected result: a mozilla.org popup opens, and a "Done" alert is shown.
Actual result: nothing happens, and the browser hangs on exit.
Summary: nsIWebProgressListener implementations accessing webProgress.DOMWindow in onStateChange() cause popup loads to be aborted and the browser to hang on exit → nsIWebProgressListener implementations accessing webProgress.DOMWindow in onStateChange() (like NoScript or Roboform) cause popup loads to be aborted and the browser to hang on exit
It looks like the work-around I proposed myself in comment 37 doesn't work after all: just referencing the window, even if obtained from the channel's notificationCallback, breaks this stuff.

I guess I'll have to single out the browser.xul load (UGLY!). Please fix.
Summary: nsIWebProgressListener implementations accessing webProgress.DOMWindow in onStateChange() (like NoScript or Roboform) cause popup loads to be aborted and the browser to hang on exit → nsIWebProgressListener implementations referencing the load's window in onStateChange() (like NoScript or Roboform) cause popup loads to be aborted and the browser to hang on exit
FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1 from http://noscript.net/getit#devel
(In reply to Giorgio Maone from comment #40)
> FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1
> from http://noscript.net/getit#devel

Thanks! Does the work-around have any performance impact?
(In reply to timbugzilla from comment #41)
> (In reply to Giorgio Maone from comment #40)
> > FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1
> > from http://noscript.net/getit#devel
> 
> Thanks! Does the work-around have any performance impact?

Practically none.
(In reply to Giorgio Maone from comment #40)
> FYI the ugly work-around mentioned above is available in NoScript 2.5.5rc1
> from http://noscript.net/getit#devel

From bugreporter: thank you very much! Tested Aurora+NoScript 2.5.5rc1 - works pefectly, no problems. Neither with browser closing, no with VB forums.
Unfortunately this issue persists in SeaMonkey (-:
 
 
New Profile
Install NoScript 2.5.5rc1
Load http://www.htmlcodetutorial.com/linking/linking_famsupp_72.html
Allow htmlcodetutorial.com
Click the 'my popup' link on that page

No popup ensues

Quit

SeaMonkey hangs in memory
 
 
Mozilla/5.0 (Windows NT 5.1; rv:17.0) Gecko/17.0 Firefox/17.0 SeaMonkey/2.14a2
Build identifier: 20120912013003
(In reply to therube from comment #44)
> Unfortunately this issue persists in SeaMonkey (-:

No suprise, since current work-around involves checking for the specific browser window XUL address, which may differ (and does differ, in this case) across browsers.

I'm gonna release a refined work-around which tries to be browser-agnostic in next build.
Working now in SeaMonkey with the latest development build 2.5.5rc2, thanks :-).
Whiteboard: [memshrink] → [MemShrink]
Thanks for the reduced testcase, giorgio. Confirmed on mac and windows. I'm looking into it.

Note that in unpackaged debug builds, the testcase doesn't actually work, because it assumes that the load is in a jar. Just loosening up the regexp to /browser.xul/ seems to do the trick.
To reproduce this bug, toggle the devtools.chrome.enabled preference to true, then open a Scratchpad window (ctrl+F4), select Environment>Browser and open the attachment. Run it with ctrl+Run.

This version matches the correct browser window URI automatically, no matter which toolkit-based browser we're using.
Attachment #660366 - Attachment is obsolete: true
Ok, so the shutdown hang has to do with the fact that we never actually make it out of the call to nsXULWindow::CreatNewContentWindow. That function spins the event loop while waiting for the chrome-y part of the content window to load:

http://hg.mozilla.org/mozilla-central/file/f39786e8364d/xpfe/appshell/src/nsXULWindow.cpp#l1770

But we never actually return from that. Yuck.
Ah, ok. So here's what happens:

1 - The scratchpad script calls content.open("http://mozilla.org", "_blank")

2 - This calls into nsXULWindow::CreateNewContentWindow, which calls nsAppShellService::CreateTopLevelWindow, and then spins the event loop until the chrome-y part of the window is ready.

3 - As the load starts, we fire OnStateChange to signal the beginning of the document load, which calls the handler in the testcase in comment 37. The stack at this point looks like this: http://pastebin.mozilla.org/1820172 . Note that we're dealing with the nsGlobalChromeWindow here, not the content window inside of it.

4 - The call to GetDOMWindow causes the DOM window to get XPConnect-wrapped on its way out the door as a return value, which causes us to generate an about:blank window, per the stack here: http://pastebin.mozilla.org/1820473.

5 - Then, per the changes in bug 774633, we call SetInitialPrincipalToSubject, which causes us to create a _new_ about:blank content viewer (with a different principal), like so: http://pastebin.mozilla.org/1820476

6 - Because of (4), the call in (5) runs into an existing mContentViewer in the DocShell, which it proceeds to blow away. It calls Stop on the load group, which cancels the outstanding request to load browser.xul.

7 - This cancellation synchronously calls into nsWebShellWindow::OnStateChange for the loading XUL window, and sets mLockedUntilChromeLoad to false (since we're no longer waiting for the load). But this all happens too soon! We haven't yet set it to true, which we do after unwinding the stack to nsXULWindow::CreateNewContentWindow. It sets mLockedUntilChromeLoad immediately after calling CreateTopLevelWindow. This is a reasonable assumption, because the URL it's trying to load is browser.xul, which _should_ load asynchronously. But we just cancelled that load and replaced it with an about:blank document, so we spin indefinitely waiting to hear that the chrome document loaded.
So, I think we should avoid cancelling pending loads if the existing document is about:blank (and possibly the initial document, as well?). I'm going to work up an automated test and a patch.
This is green on try modulo one assertion that I think we can tweak. I'll let bz make the call here.

Note that this needs to land and be backported to beta pronto.
Attachment #661180 - Flags: review?(bzbarsky)
Attached patch Tests. v1Splinter Review
Attachment #661181 - Flags: review?(bzbarsky)
Blocks: 791328
Comment on attachment 661181 [details] [diff] [review]
Tests. v1

r=me
Attachment #661181 - Flags: review?(bzbarsky) → review+
Comment on attachment 661180 [details] [diff] [review]
Don't kill loads when calling nsDocShell::CreateAboutBlankContentViewer to replace an existing initial about:blank document. v2

+++ b/docshell/base/nsDocShell.cpp
+  if (mContentViewer && !viewerIsInitialDocument) {
+    // We've got a nontrivial content viewer already. Make sure the user

This doesn't look right.  There are various ways web content can end up working with the initial document in a docshell, adding DOM nodes to it, etc.  So we really do want to enter this code for initial documents in some cases.

Why are we not creating the initial about:blank with the right principal?  Doing that seems like it would be a better fix to me.  For example, can we stash the right principal on the docshell before we go loading stuff in it, so it knows to create things in EnsureContentViewer with the right principal?
Attachment #661180 - Flags: review?(bzbarsky) → review-
(In reply to Boris Zbarsky (:bz) from comment #56)
> Why are we not creating the initial about:blank with the right principal?
> Doing that seems like it would be a better fix to me.  For example, can we
> stash the right principal on the docshell before we go loading stuff in it,
> so it knows to create things in EnsureContentViewer with the right principal?

I asked you the same question on IRC in july. You said "we just don't have the APIs for that." So I'm confused about your question here, unless you're implying that the situation here is different.

Though now that you mention it, I could see that it might be different here since we're dealing with the creation of a chrome window, meaning that we don't have to go through all the declarative junk to make the docshell. So you maybe we should just stash the principal on the docshell in nsWebShellWindow::Initialize? That'd probably work...
> So you maybe we should just stash the principal on the docshell in
> nsWebShellWindow::Initialize?

That sounds plausible, yes.  Somewhere on the window-creation codepath there.

The "no apis" part was about generic window creation, which embedders can hook into.  But here we have a specific window creation codepath that we control from top to bottom.
Assignee: nobody → bobbyholley+bmo
Component: Extension Compatibility → Document Navigation
Product: Firefox → Core
How about this?
Attachment #661180 - Attachment is obsolete: true
Attachment #661608 - Flags: review?(bzbarsky)
Comment on attachment 661608 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v1

r=me
Attachment #661608 - Flags: review?(bzbarsky) → review+
Sigh...orange.
Oh, for crying out loud. It looks like CreateAboutBlankContentViewer doesn't actually call SetIsInitialDocument() on the document even when it should. This API is a royal footgun.

I'm going to fix it the dumb way here since this patch needs to be backported, and then clean this up a bit in a separate bug.
Attachment #661608 - Attachment is obsolete: true
Attachment #661845 - Flags: review?(bzbarsky)
Comment on attachment 661845 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v2

Anddddd, this caused TestAppShellReadyState to go orange on windows again. :\
Attachment #661845 - Flags: review?(bzbarsky)
This one turned out to be super easy. I was using nsContentUtils::GetScriptSecurityManager, which was
returning null for the C++ unit test in question here, because nobody ever calls nsContentUtils::Init.
So let's just instantiate it with do_GetService instead.

With that, this _should_ be actually green across the board and ready to land. Flagging bz for
the final signoff.
Attachment #661845 - Attachment is obsolete: true
Attachment #661933 - Flags: review?(bzbarsky)
Comment on attachment 661933 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3

r=me, weeping though I am.  ;)
Attachment #661933 - Flags: review?(bzbarsky) → review+
Blocks: 791812
Comment on attachment 661933 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3

Just landed on inbound, but this needs to get on beta and aurora ASAP, so flagging for approval to get this on the radar. I'll let the drivers make the call for when to land it.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 774633
User impact if declined: Serious compat problems with addons. When those compat problems manifest, the browser also hangs in nasty ways.
Testing completed (on m-c, etc.): Just landed on m-i.
Risk to taking this patch (and alternatives if risky): This code is really nasty, so anything touching it is pretty risky. The current bug is pretty bad, though. Backing out bug 774633 at this point would necessitate backing out bug 754202, and maybe some other stuff. This patch is certainly lower risk than that.
String or UUID changes made by this patch: none
Attachment #661933 - Flags: approval-mozilla-beta?
Attachment #661933 - Flags: approval-mozilla-aurora?
(In reply to Bobby Holley (:bholley) from comment #71)
> User impact if declined: Serious compat problems with addons. When those
> compat problems manifest, the browser also hangs in nasty ways.
> Risk to taking this patch (and alternatives if risky): This code is really
> nasty, so anything touching it is pretty risky. The current bug is pretty
> bad, though. Backing out bug 774633 at this point would necessitate backing
> out bug 754202, and maybe some other stuff. This patch is certainly lower
> risk than that.

If we need to take this fix, we'll land today for b4 (unorthodox I know, but we need external testing). One alternative is to do nothing for beta, if we believe only NoScript is affected, given the upcoming workaround. Is that the case?
Attachment #661933 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I now see RoboForm and others may be affected too. I'm guessing testing only a subset of possibly affected add-ons won't do us much good, since this appears to be a prevalent API.
Comment on attachment 661933 [details] [diff] [review]
Eagerly create the about:blank viewer in nsWebShellWindow::Initialize. v3

[Triage Comment]
Lowest risk fix we know of for a pretty serious issue. Please make recommendations around how QA can best be on the lookout for regressions (NoScript/RoboForm testing, etc.)
Attachment #661933 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Keywords: verifyme
(In reply to Alex Keybl [:akeybl] from comment #72)

> If we need to take this fix, we'll land today for b4 (unorthodox I know, but
> we need external testing). One alternative is to do nothing for beta, if we
> believe only NoScript is affected, given the upcoming workaround. Is that
> the case?

Another major addon where this manifests is Google PageSpeed. Please see https://bugzilla.mozilla.org/show_bug.cgi?id=791812.
Removing QAWANTED since it looks like this is at a state where it needs verification (re: verifyme). If someone can point out the add-ons we need to be testing and the steps to reproduce, that would be great. I'm having a bit of trouble pulling out all the scenarios mentioned in this bug.
Keywords: qawanted
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #76)
> Removing QAWANTED since it looks like this is at a state where it needs
> verification (re: verifyme). If someone can point out the add-ons we need to
> be testing and the steps to reproduce, that would be great. I'm having a bit
> of trouble pulling out all the scenarios mentioned in this bug.

Other bugs we've linked to this are in the deps / dupe fields: bug 791812, bug 790218, bug 791328, bug 789808, and bug 791802. This patch here of course has an automated test that verifies the testcase from comment 48.

I'm prepping this for aurora and beta right now.
(In reply to Alex Keybl [:akeybl] from comment #73)
> I now see RoboForm and others may be affected too. I'm guessing testing only
> a subset of possibly affected add-ons won't do us much good, since this
> appears to be a prevalent API.

It's a very common pattern, yes.
actually, status-firefox18: fixed is incorrect, since this hasn't yet been merged from inbound to mc. ;-)
Can someone please clarify the call-to-action for QA? I'm a bit confused as recent comments seem to imply automation is good enough and manually testing NoScript & Roboform is a waste of time. I'm not sure I agree with that assessment if this is the case.
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #81)
> Can someone please clarify the call-to-action for QA? I'm a bit confused as
> recent comments seem to imply automation is good enough and manually testing
> NoScript & Roboform is a waste of time.

I don't believe it's a waste of time. The testcase reported in this bug was a reduced testcase from the NoScript author, but it's probably still worth ensuring that the various addons behave appropriately. More importantly (and probably higher priority) is going through the bugs mentioned in comment 77 that we've linked (behaviorally) to this bug, and making sure that the specific STR in those bugs no longer reproduces.

That being said, I did not make said call to action.
QA Contact: jbecerra
Whiteboard: [MemShrink] → [MemShrink:P3]
This change has introduced regression in the addon-sdk code (see bug 792269). I'm trying to figure why is that, and would really like if you could help me to understand.

It looks like exceptions are thrown by the following line:
https://github.com/mozilla/addon-sdk/blob/master/packages/api-utils/lib/frame/utils.js#L56

because browser.docshell property
https://developer.mozilla.org/en-US/docs/XUL/browser#p-docShell
is not present.

Is that change intentional ? If so is there any other way we could use to access it ?
So I tried to dig dipper into the issue and it looks like the browser node
created by the function I pointed to earlier is not a browser (not sure how to
describe this better). It shows up as XUL browser node but does not implements
any of the properties / method from browser.

This makes me believe that the issue maybe related to the way we create window
that we use to create `browser` in. Complete and reduced code looks as follows:


let XUL = 'http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul';

let windowWatcher = Cc['@mozilla.org/embedcomp/window-watcher;1'].
                      getService(Ci.nsIWindowWatcher);
                       
let window = windowWatcher.
              openWindow(null, 'data:text/html;charset=utf-8,Window', null, '', null);

let frame = window.document.createElementNS(XUL, 'browser');
frame.setAttribute('type', 'content');
frame.setAttribute('src', 'about:blank');
The XBL portions of the <browser> element (which include the docShell) property are applied via CSS, so aren't guaranteed to be present immediately. I'm not sure why this patch would affect that, but the workaround is generally to do something that would force a reflow (getBoundingClientRect, e.g.), which is obviously less than ideal.
The real problem from the code in comment 85 is that it's creating elements from transient about:blank documents, as far as I can tell...
I have modified snippet to take both suggestions into account:


let XUL = 'http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul';

let windowWatcher = Cc['@mozilla.org/embedcomp/window-watcher;1'].
                      getService(Ci.nsIWindowWatcher);
                       
let window = windowWatcher.
              openWindow(null, 'data:text/html;charset=utf-8,Window', null, '', null);


window.addEventListener('load', function() {
  let frame = window.document.createElementNS(XUL, 'browser');
  frame.setAttribute('type', 'content');
  frame.setAttribute('src', 'data:text/html;charset=utf-8,frame');
  frame.getBoundingClientRect()
  window.alert(window.document.readyState + ' : ' + window.location + ' ' + frame.docshell)
}, false)


Still browser is kind of broken, alert has following message:

complete : data:text/html;charset=utf-8,Window undefined

Is there anything else I'm missing ?
Getting the bounding client rect before inserting the node into the DOM is not going to do anything, really.
Just to be clear, the reflow hack is usually not necessary when working with XUL. It's generally only necessary in some XBL edge cases, so if things work without it, it's best to avoid it. Regardless, the docShell property shouldn't be available until the browser is inserted into the DOM tree, and the reflow hack needs to happen after that.

Not sure about bz's about:blank suggestion (but it seems the most likely culprit given the circumstances). What's the document in the failing instance, and when is it being called?
(In reply to Boris Zbarsky (:bz) from comment #89)
> Getting the bounding client rect before inserting the node into the DOM is
> not going to do anything, really.

Oh right, my bad! You were correct, problem was that I was creating a browser from a transient document, forcing reflow does not seems to be necessary. Also as you have pointed out issue in my tests was that I forgot to insert created element into document.

Thanks!
> Nicholas Nethercote [:njn]
> Whiteboard: [MemShrink] → [MemShrink:P3]

Nick, can you clarify this? I got an email about a .06% MaxHeap regression on mac, but that doesn't seem like very much.

This patch just eagerly instantiates the about:blank viewer on top-level chrome docshells, whereas we'd previously wait to lazily instantiate it later on in the same call. However! The only case where we wouldn't instantiate it would be in the call to nsAppShellService::CreateHiddenWindow ( http://mxr.mozilla.org/mozilla-central/source/xpfe/appshell/src/nsAppShellService.cpp#82 ). AIUI, this is done once at startup on MacOSX in order to create a dummy window that keeps the browser alive even when all the windows are closed.

So we could stick some conditionals somewhere in there to avoid it, though it would come at the cost of nastifying some already-really-nasty-and-regression-prone code. Is a .06% memory regression (24k of a 400MB heap) worth doing that for? If so, please file a separate bug and I'll look into it.
The MemShrink annotation was added shortly after the bug was created because of the ghost windows, which have now been fixed, AIUI :)  I wasn't aware of any MaxHeap regression, and .06% on Mac isn't worth worrying about.
https://hg.mozilla.org/mozilla-central/rev/1d17123006b3
https://hg.mozilla.org/mozilla-central/rev/1612f075c42d
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
about:home doesn't exist in SeaMonkey. Can you use some other about: page like about:neterror ?
(In reply to Philip Chee from comment #95)
> about:home doesn't exist in SeaMonkey. Can you use some other about: page
> like about:neterror ?

Fine by me assuming the test continues to pass. File a separate bug?
Depends on: 792899
I also had the issue with Firefox 16 Beta 3 and b4 seams to fix it. Now Firefox no longer hangs at close and JavaScript links work again now :)
We have run several scenarios to verify this issue, and we could not reproduce it anymore.
Environment:
* OSs: Win XP, Win 7, Win 8, MacOS X 10.7.4
* Builds: latest Firefox 16 beta 4, latest Aurora (20120920042010)
* NoScript 2.5.3, 2.5.4, 2.5.5
We also verified duplicates: bug #791812 and bug #789808
Verified fixed for FF 18b5 Windows 7 x64, Ubuntu x86 and Mac OS 10.8 following theese STR:

1. Install the latest FF Beta.
2. Navigate to http://mozqa.com/data/firefox/popups/popup_trigger.html?count=1
3. In the notification bar, go to Options and select 'Allow popups from mozqa.com'
4. Refresh the page 
5. Install No Script 2.5.3 add-on (https://addons.mozilla.org/en-US/firefox/addon/noscript/versions/?page=1#version-2.5.3)
6. Navigate to http://mozqa.com/data/firefox/popups/popup_trigger.html?count=1
7. In the NoScript notification bar, go to Options and select 'Allow mozqa.com'
8. Quit Firefox and check the processes in the task manager
Expected results: 
In step 4, one pop up window is opened
In step 6 no pop up window is opened
In step 7 one pop up window is opened again.
In step 8 - no firefox.exe process remains opened

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:18.0) Gecko/20100101 Firefox/18.0 (20121219074241)
Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0 (20121219074241)
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:18.0) Gecko/20100101 Firefox/18.0 (20121219074241)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.