Closed
Bug 622326
Opened 14 years ago
Closed 13 years ago
Tab-modal confirm(): browser freeze
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
VERIFIED
FIXED
mozilla2.0b11
Tracking | Status | |
---|---|---|
blocking2.0 | --- | final+ |
People
(Reporter: kdevel, Assigned: Paolo)
References
(Blocks 1 open bug)
Details
(Keywords: regression, Whiteboard: [softblocker][fx4-fixed-bugday])
Attachments
(1 file, 5 obsolete files)
5.84 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Build Identifier: Mozilla/5.0 (X11; Linux x86_64; rv:2.0b9pre) Gecko/20100101 Firefox/4.0b9pre Follow-up to Bug 621308. Reproducible: Always Steps to Reproduce: 1. open reduced testcase https://bug619644.bugzilla.mozilla.org/attachment.cgi?id=498236 2. press stop-button with mouse (stop loading page) 3. create second tab open reduced testcase https://bug619644.bugzilla.mozilla.org Actual Results: Browser freezes. Expected Results: Keep running. Built from http://hg.mozilla.org/mozilla-central/rev/b2275b45a33d
STR #3 should read: 3. create second tab, open reduced testcase https://bug619644.bugzilla.mozilla.org/attachment.cgi?id=498236
Comment 2•14 years ago
|
||
WFM using Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b9pre) Gecko/20101231 Firefox/4.0b9pre ID:20101231030358
Blocks: 59314
Keywords: regression
Product: Firefox → Toolkit
QA Contact: general → general
Version: unspecified → Trunk
Comment 3•14 years ago
|
||
Confirmed on http://hg.mozilla.org/mozilla-central/rev/b2275b45a33d Mozilla/5.0 (X11; Linux i686; rv:2.0b9pre) Gecko/20110101 Firefox/4.0b9pre ID:20110101030401 And this happens since http://hg.mozilla.org/mozilla-central/rev/baa6cc2f72e4 Mozilla/5.0 (X11; Linux i686; rv:2.0b8pre) Gecko/20101120 Firefox/4.0b8pre ID:20101120030848
Status: UNCONFIRMED → NEW
blocking2.0: --- → ?
Ever confirmed: true
Comment 4•14 years ago
|
||
Browser occupies a core of CPU(i.e 25% on C2Q) on windows7, However, the browser does not freeze. I filed a Bug 622331,(I am not sure whether it to be duplication of this or not)
Comment 5•14 years ago
|
||
Edge case so this doesn't block for now, would be nice to figure out whether there is something more sinister going on with these though.
blocking2.0: ? → -
Assignee | ||
Comment 6•14 years ago
|
||
nsThread logging revealed that the high CPU usage observed here is caused by an event handler, ClearScopeEvent, that continues to re-post itself to the event queue until a certain condition is met: http://hg.mozilla.org/mozilla-central/file/4a3866321a14/dom/base/nsGlobalWindow.cpp#l1153 The event is raised for the first time just before a DOM inner window is destroyed ("--DOMWINDOW" console message) to prevent memory leaks by clearing the inner window's global JavaScript object. This operation cannot be done safely until all the scripts in the window's scripting context stopped their execution (I guess that this is because any one of them might hold a reference to data stored in the global object of the window being destroyed?). The fact is that the scripting context is actually stored in the outer window and is thus shared between all the inner windows, either visible or invisible. This means that the condition the event handler is looking for (no scripts executing) will never be met if *any* inner window is executing a script, even a window that is not currently being destroyed. This also means that no inner window can be destroyed if there is at least one of them that, for example, is stuck in a modal event loop. To reproduce this condition, and the high CPU usage, reliably: 1. Type in the address bar: data:text/html,<body onload="alert(0);"> Leave the tab-modal message box open. 2. Open a new tab and repeat step 1. 3. Close the first tab. 4. Wait a few seconds. This is often necessary because the destruction of DOM windows when they are closed is not immediate, but delayed a bit. Can also be reproduced on FF 3.6 using two different windows instead of tabs. This issue existed for a long time, but the new tab-modal dialogs will make it happen more often, because they start inner modal event loops. The attached patch replaces the event approach with a polling timer, and solves the CPU usage issue for me. Later I noticed that there might be a better approach based on nsIScriptContext::SetTerminationFunction, I think I'll try and implement it later. Meanwhile I'm interested in feedback on the issue in general. I'm not sure whether termination functions are executed when any script on the stack returns to C++ or when the first script posted to the JS context returns to C++. If the case is the latter, in presence of modal loops, termination functions in general might be invoked much later than expected, however in my case that would be better because I should not need to check jsscx->GetExecutingScript() again.
Assignee: nobody → paolo.02.prg
Status: NEW → ASSIGNED
Attachment #502676 -
Flags: feedback?(bzbarsky)
Assignee | ||
Comment 7•14 years ago
|
||
This should fix the CPU usage issue and related event starvation. Just a bit of XPCOM / C++ advice required: +nsGlobalWindow::ClearWindowScopeOnDestruction(nsISupports *aWindow) +{ + nsCOMPtr<nsIDOMWindow> xpcom_window(do_QueryInterface(aWindow)); + nsIDOMWindow *raw_window = xpcom_window; + nsGlobalWindow *window = static_cast<nsGlobalWindow *>(raw_window); I suppose there is a saner way of doing this cast?
Attachment #502676 -
Attachment is obsolete: true
Attachment #502906 -
Flags: review?(bzbarsky)
Attachment #502676 -
Flags: feedback?(bzbarsky)
Assignee | ||
Comment 8•14 years ago
|
||
For testing, I don't think an automated one can check CPU usage reliably. The manual testing steps are however quite easy: 1. Open a CPU usage monitor (for example, Task Manager on Windows). 2. Check that CPU usage is low to begin with. 3. Go to this address: data:text/html,<body onload="alert(0);"> Leave the tab-modal message box open. 4. Open a new tab, go to this address: data:text/html,<body onload="alert(0);"> Leave the tab-modal message box open. 5. Close the first tab. 6. Wait 10 seconds. 7. Check that CPU usage is still low and Firefox is responsive.
Assignee | ||
Comment 9•14 years ago
|
||
(In reply to comment #5) > Edge case so this doesn't block for now, would be nice to figure out whether > there is something more sinister going on with these though. Further investigation revealed that this is not such an edge case, it may happen very often when tab-modal dialogs in more than one tab are ignored. When this problem occurs, starvation of UI refresh events make the interface unresponsive, and Firefox becomes unusable. There is a patch and a manual test case awaiting review. It may also be appropriate to backport the fix to 1.9.2 and earlier branches.
blocking2.0: - → ?
Updated•14 years ago
|
Whiteboard: [has patch] [needs review bz]
Comment 10•14 years ago
|
||
Reproducible regression causing hang/resource-starvation. This seems like the kind of scenario that we just can't accurately predict how edge-case-y it is, so I think we should block on fixing it.
blocking2.0: ? → final+
Updated•14 years ago
|
Whiteboard: [has patch] [needs review bz] → [has patch] [needs review bz] [softblocker]
Comment 11•14 years ago
|
||
> I'm not sure whether termination functions are executed when any script on
> the stack returns to C++
Sadly, yes.
jst, Blake, peterv do we even need this scope-clearing jazz anymore? We don't need it to actually clear the inner window scope, and with cycle collector we shouldn't need it for leak prevention....
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment #11) > jst, Blake, peterv do we even need this scope-clearing jazz anymore? We don't > need it to actually clear the inner window scope, and with cycle collector we > shouldn't need it for leak prevention.... It also occurred to me that this might not be needed anymore because of the cycle collector. If this is true and there's no reason to wait until all scripts stopped to call NotifyWindowIDDestroyed, then all this code can be removed.
Comment 13•14 years ago
|
||
(In reply to comment #11) > jst, Blake, peterv do we even need this scope-clearing jazz anymore? We don't > need it to actually clear the inner window scope, and with cycle collector we > shouldn't need it for leak prevention.... See bug 470510. We're not quite there yet.
Comment 14•14 years ago
|
||
Comment on attachment 502906 [details] [diff] [review] The patch >+++ b/dom/base/nsGlobalWindow.cpp >+nsGlobalWindow::ClearWindowScopeOnDestruction(nsISupports *aWindow) I'd call this function something more like TryClearWindowScope. >+ nsCOMPtr<nsIDOMWindow> xpcom_window(do_QueryInterface(aWindow)); >+ nsIDOMWindow *raw_window = xpcom_window; >+ nsGlobalWindow *window = static_cast<nsGlobalWindow *>(raw_window); This can just be: nsGlobalWindow *window = static_cast<nsGlobalWindow *>(static_cast<nsIDOMWindow*>(this)); r=me with those two nits.
Attachment #502906 -
Flags: review?(bzbarsky) → review+
Assignee | ||
Comment 15•14 years ago
|
||
Attachment #502906 -
Attachment is obsolete: true
Assignee | ||
Comment 16•14 years ago
|
||
Setting checkin-needed, manual test case is in comment 8.
Keywords: checkin-needed
OS: Linux → All
Hardware: x86_64 → All
Whiteboard: [has patch] [needs review bz] [softblocker] → [has patch] [softblocker]
Assignee | ||
Comment 17•14 years ago
|
||
Uhm... latest patch contained garbage due to automatic line-ending conversions. Sorry for the bugspam.
Attachment #504003 -
Attachment is obsolete: true
Comment 18•14 years ago
|
||
Pushed http://hg.mozilla.org/mozilla-central/rev/0f1fd87b570d Paolo, thanks for the fix!
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Flags: in-testsuite?
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b10
Reporter | ||
Comment 19•14 years ago
|
||
new Bug 626015
I backed this out on suspicion of causing bug 626100, an intermittent large leak in a11y mochitests. http://hg.mozilla.org/mozilla-central/rev/9c32afba1189
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Updated•14 years ago
|
Component: General → DOM
Product: Toolkit → Core
QA Contact: general → general
Assignee | ||
Comment 21•14 years ago
|
||
It looks like the leak happens when SetDocShell(null) is called before all scripts in the outer window's scripting context have stopped. Here's the code: http://mxr.mozilla.org/comm-central/source/mozilla/dom/base/nsGlobalWindow.cpp#2319 In this condition and with the patch applied, the call to FreeInnerObjects sets one or more termination functions on the context, but the context is then destroyed in SetDocShell, a few lines of code later, and this assertion is hit: http://mxr.mozilla.org/comm-central/source/mozilla/dom/base/nsJSEnvironment.cpp#1158 Without the patch applied, no leak occurs apparently, but still mContext->ClearScope is never called on mJSObject for inner windows. By the way, the following lines mitigate the problem by solving the leak in case the assertion is hit. Maybe it's worth including them anyway? NS_PRECONDITION(!mTerminations, "Shouldn't have termination funcs by now"); + if (mTerminations) + delete mTerminations; To avoid the assertion in this specific case, I see two possibilities: 1) If we don't need to call ClearScope if we are going to destroy the context, we could just call FreeInnerObjects(PR_FALSE) instead of FreeInnerObjects(PR_TRUE). 2) If we need to call ClearScope even if we are going to destroy the context, I'm open to suggestions :-) However, I can also see a scenario where we set a termination function on the context during a previous scope clearing request operation, and we never got the opportunity of running it. The same goes for termination functions that other windows set on us. Any idea on how to handle this case? We might either cancel the execution of termination functions, or forcibly run them even if scripts are still executing, but I don't know this area of the code enough to understand the implications.
Comment 22•14 years ago
|
||
jst or mrbkap are probably the best people to answer those questions. I can dig through the code to figure that stuff out, but they might recall off the top of their heads.
Assignee | ||
Comment 23•13 years ago
|
||
NS_PRECONDITION(!mTerminations, "Shouldn't have termination funcs by now"); + if (mTerminations) + delete mTerminations; With this change only, we get the current behavior, without the leak. So in case we can't figure out a better behavior, we could just fix this and re-land, what do you think? I'm not sure what to do with assertions in cases like this one, where we know that the condition might occur during the normal flow of the program, even if in rare cases. See also bug 398103 for the assertion in LeaveModalState. Should these be removed, turned into warnings, or left in the code as they are?
Comment 24•13 years ago
|
||
> I'm not sure what to do with assertions in cases like this one
Remove them; they're clearly bogus if they're asserting something that just isn't true.
Assignee | ||
Comment 25•13 years ago
|
||
Attachment #504008 -
Attachment is obsolete: true
Attachment #506036 -
Flags: review?(bzbarsky)
Comment 26•13 years ago
|
||
Comment on attachment 506036 [details] [diff] [review] The patch, fixing memory leak This looks fine to me, but I'd like jst to take a look at least at the mTerminations changes too...
Attachment #506036 -
Flags: review?(jst)
Attachment #506036 -
Flags: review?(bzbarsky)
Attachment #506036 -
Flags: review+
Comment 28•13 years ago
|
||
Comment on attachment 506036 [details] [diff] [review] The patch, fixing memory leak - In nsJSContext::~nsJSContext(): + if (mTerminations) + delete mTerminations; No need for the if check here, delete is null safe. r=jst The one thing that I see here that's a difference in behavior from before, other than fixing this bug, is that this makes us potentially clear scope synchronously in cases where we used to go back to the event loop before doing so. This should make no difference to any JS code, but it could in theory cause objects to be deleted earlier than they do before this change. I can't think of any cases where that matters, but it could affect C++ callers that don't hold proper strong refs to stuff or what not. If we feel it's worth worrying about that we could keep ClearScopeEvent code in addition to the new termination function code, but I'm not convinced it is.
Attachment #506036 -
Flags: review?(jst) → review+
Assignee | ||
Comment 29•13 years ago
|
||
(In reply to comment #28) > If we feel it's worth worrying about > that we could keep ClearScopeEvent code in addition to the new termination > function code, but I'm not convinced it is. As soon as we're convinced it isn't, here is the patch ready to land :-) Thanks for the review!
Attachment #506036 -
Attachment is obsolete: true
Comment 30•13 years ago
|
||
I decided to go ahead and push this fix. http://hg.mozilla.org/mozilla-central/rev/5e0bba79c052 Thanks for your work her Paolo!
Status: REOPENED → RESOLVED
Closed: 14 years ago → 13 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 31•13 years ago
|
||
(In reply to comment #30) > I decided to go ahead and push this fix. > Thanks for your work her Paolo! Thanks to you! Curiously, I was going to use this patch to test my brand-new tryserver access exactly now, this saved some cycles for other devs I guess :-)
Updated•13 years ago
|
Whiteboard: [has patch] [softblocker] → [softblocker]
Target Milestone: mozilla2.0b10 → mozilla2.0b11
Comment 32•13 years ago
|
||
Sorry to spoil your try server test run plans, I guess you're going to have to fix some other bugs now so you can test your try server access :)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla2.0b11 → mozilla2.0b10
Comment 33•13 years ago
|
||
jst, did you mean to reopen this?
Comment 34•13 years ago
|
||
I'm assuming he did not.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Target Milestone: mozilla2.0b10 → mozilla2.0b11
Status: RESOLVED → VERIFIED
Whiteboard: [softblocker] → [softblocker][fx4-fixed-bugday]
Comment 35•13 years ago
|
||
Verified with Mozilla/5.0 (X11; Linux i686; rv:2.0b12pre) Gecko/20110204 Firefox/4.0b12pre
Updated•5 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•