tab-modal prompt in Gmail eventually triggers slow-script dialog for nsPrompter.js

VERIFIED FIXED in mozilla2.0b11

Status

()

defect
VERIFIED FIXED
9 years ago
6 years ago

People

(Reporter: Natch, Assigned: Dolske)

Tracking

(Depends on 1 bug, {regression})

Trunk
mozilla2.0b11
x86
Windows Vista
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking2.0 betaN+)

Details

(Whiteboard: [hardblocker][b11][fx4-fixed-bugday])

Attachments

(1 attachment, 3 obsolete attachments)

From bug 620145 comment 6:

When submitting the message a prompt is shown confirming the action.

If you leave the prompt open without dismissing it a stop script dialog comes
up for the following line:

../dist/bin/components/nsPrompter.js:463

Where the prompter is looping over args.promptActive and spinning the event
loop (I think). There's something wrong there apparently...
Product: Core → Toolkit
QA Contact: general → general
I can't reproduce this (on OS X, but shouldn't matter). Tried both a nightly and a debug build.

Can you reproduce with a _nightly_ instead of your own build? I ask because the slow-script dialog is normally suppressed for chrome code, though I vaguely recall it being enabled for debug builds.

The actual loop here in nsPrompter.js is normal an expected, that's how the event loop is spun. But I know our accounting is already semi-broken (see bug 577385), and would guess this is a symptom of that.
(In reply to comment #1)
Reproduced this on a nightly. You have to wait a __long__ time with the prompt showing for this to happen.

I can't reproduce this with any other prompt situation.

This doesn't seem to be the same scenario at all as bug 577385.
(In reply to comment #2)
> Reproduced this on a nightly. You have to wait a __long__ time with the prompt
> showing for this to happen.

Define "long time". Minutes? Hours?

> This doesn't seem to be the same scenario at all as bug 577385.

The point is that we already know the accounting is broken in some cases.
5 - 10 minutes. Usually closer to 5.
Still not fixed after bug 620145. Seems like this may just be bug 577385.
No longer blocks: 620145
blocking2.0: betaN+ → ?
blocking2.0: ? → betaN+
Assignee: nobody → ddahl
Posted file JS debugging output (obsolete) —
The JS debugging output from my debug build. I wonder if this issue is magnified since there are 3 dialogs in a row here. Just digging.
Whiteboard: [hard blocker]
If you change the call to thread.processNextEvent(true) to thread.processNextEvent(false) the stop script dialog pops up much much quicker (in seconds).

If the 'mayWait' arg is set to false, what events are potentially not being processed?

It is a huge number of events - if false the number is roughly 135000 - in which case we only process 65 successfully. 

When 'mayWait' is true, we process 75 events successfully.
Assignee: ddahl → dolske
Whiteboard: [hard blocker] → [hardblocker]
Summary: When submitted, gmail message with no text hangs in prompter code → tab-modal prompt in Gmail eventually triggers slow-script dialog for nsPrompter.js
If mayWait is false, as the name suggests, we don't wait.  In other words, if you processNextEvent(true) the function does not return until we've received and processed an event.  If you processNextEvent(false) the function will return immediately if there is no waiting event.  There is no distinction here based on the "type" of event, only on whether or not there is one ready to be dispatched.

How are you determining how many events we're processing?  Based on the numbers you've given I'd assume that 135000 is the number of loop iterations and 65 is the actual number of events we're handling.  Processing an event "unsuccessfully" isn't really a defined concept in this situation.
I've been able to reproduce this on OS X and Windows (though Windows seems to take _much_ longer to trigger, for some reason). A few data points...

I had been thinking that we disabled slow-script dialogs for chrome code, such that any dialog blaming chrome code must be a bug. We did that in bug 492410, we only disabled them for a branch release. Indeed, running |while(1) {}| in the Error Console triggers a slow-script dialog after 20 seconds.

My initial suspicion was that this was an accounting glitch related bug 577385, but I'm not so sure now. I get a stack like:

    xul.dll!nsJSContext::DOMOperationCallback()
    mozjs.dll!js_InvokeOperationCallback()
    mozjs.dll!js_HandleExecutionInterrupt()
    mozjs.dll!js::Interpret()
    mozjs.dll!js::RunScript()
    mozjs.dll!js::Invoke()
    mozjs.dll!js::ExternalInvoke()
    mozjs.dll!js::ExternalInvoke()
    mozjs.dll!JS_CallFunctionValue()
    xul.dll!nsJSContext::CallEventHandler()
    xul.dll!nsGlobalWindow::RunTimeout()
    xul.dll!nsGlobalWindow::TimerCallback()
    xul.dll!nsTimerImpl::Fire()
    xul.dll!nsTimerEvent::Run()
    xul.dll!nsThread::ProcessNextEvent()
    ...nsPrompter...
    ...nsGlobalWindow::Confirm()...
    ...gmail JS...
    ...original event loop...

This seems like it could just be a legit slow-script detection. After all, the |while(more) { processNextEvent() }| loop in nsPrompter is accumulating runtime, and I don't see any code that treats processNextEvent specially.

The part I don't understand yet is why the slow-script dialog takes randomly long periods of time to happen, instead of exactly 20 seconds after the tab-modal prompt is shown. Similarly, I've not yet been able to reproduce it with simpler testcases, so I'm not sure if the GMail case is somehow special.

Perhaps it's just a non-deterministic interaction between when the watchdog timer fires and what's on the stack? (eg, catching the tight nsPrompter loop vs some other JS running on top of that event loop?)

Continuing to investigate...
Tentatively marking "risky" since we don't know what the problem is yet.
Whiteboard: [hardblocker] → [hardblocker][dolske investigating][risky]
bkap and I did some poking, looks like this _is_ a legitimate slow-script dialog, chrome's looping and eventually the watchdog catches it.

Oddly, I'm now able to quickly reproduce this on Windows (whereas before it was hard). Maybe running a debug build, turning off the JITs, and dropping dom.max_chrome_script_run_time to 5 seconds helps?

One possible solution to this would be to resurrect bug 569314... Not for performance reasons, but so that upon start/stop of the new event loop spin we're in native code and can thus easily twiddle the context's runtime accounting. [Just 569314 as-is wouldn't be enough, because upon return is would still look like a long-running function, so there's still a small window where this bug could happen.]

But mrbkap's laser eyes -- *fryspewpewpew* -- noticed that in nsJSContext::DOMOperationCallback there's already this existing check:

769   if (ctx->mModalStateDepth) {
770     // We're waiting on a modal dialog, nothing more to do here.
771     return JS_TRUE;
772   }

We're getting past this, so it's a bit surprising we have a context where mModalStateDepth is 0. Need to do yet more debugger poking with mrbkap to figure out exactly what's happening.
Results from some debugging with^H^H^H^Hby mrbkap...

Root of the problem is that the slow-script checking code (nsJSContext::DOMOperationCallback) is checking is _current_ context (right?) -- which in this case is one of the GMail subframes, window.frames[3]. But back in bug 611166 we switched EnterModalState / LeaveModalState to use the _window's_ context instead (window aka the global object of which the called alert/confirm/prompt is a method upon). Thus the current context's mModalStateDepth is 0, and the check mentioned in comment 11 isn't able to suppress the slow-script dialog.

So, the planned fix is to essentially back out bug 611166, and instead fix the original problem in that bug by having EnterModalState return the window associated with the current context, so that LeaveModalState can use it. That avoids the "Uh, mismatched LeaveModalState() call!" assert while making sure we set mModalStateDepth on the context that the watchdog will be looking at.

[Probably safe to leave the code _added_ by 611166 -- so that we're setting the modal state on both the current context and window's context.]
Also good news; I can reproduce this with a simple script in an <iframe> that calls window.parent.alert().
Posted patch Patch v.1 (obsolete) — Splinter Review
Attachment #502624 - Attachment is obsolete: true
Attachment #507769 - Flags: superreview?(jst)
Attachment #507769 - Flags: review?(mrbkap)
Attachment #507769 - Attachment is patch: true
Attachment #507769 - Attachment mime type: application/octet-stream → text/plain
Whiteboard: [hardblocker][dolske investigating][risky] → [hardblocker][has patch][needs review mrbkap][needs sr jst][risky]
Comment on attachment 507769 [details] [diff] [review]
Patch v.1

Still need to bump the IID of nsIDOMWindowUtils.

Nits:

after callerWin = do_QueryInterface(sgo), you have too many newlines after the 
closing brace.

the |if (aCallerWin)| block doesn't need two space to either side of it either.

r=mrbkap with those things fixed.
Attachment #507769 - Flags: review?(mrbkap) → review+
Whiteboard: [hardblocker][has patch][needs review mrbkap][needs sr jst][risky] → [hardblocker][needs sr jst][needs updated patch]
Posted patch Patch v.2 (obsolete) — Splinter Review
Attachment #507769 - Attachment is obsolete: true
Attachment #508569 - Flags: superreview?(jst)
Attachment #507769 - Flags: superreview?(jst)
Whiteboard: [hardblocker][needs sr jst][needs updated patch] → [hardblocker][needs sr jst]
Comment on attachment 508569 [details] [diff] [review]
Patch v.2

- In nsGlobalWindow::EnterModalState():

+  JSContext *cx = nsContentUtils::GetCurrentJSContext();
+
+  nsCOMPtr<nsIDOMWindow> callerWin;
+  nsIScriptContext *scx;
+  if (cx && (scx = GetScriptContextFromJSContext(cx))) {
+    scx->EnterModalState();
+
+    nsIScriptGlobalObject *sgo = scx->GetGlobalObject();
+    callerWin = do_QueryInterface(sgo);
+  }
+

This code could use callerWin = do_QueryInterface(nsJSUtils::GetDynamicScriptGlobal(cx)), but make sure cx is non-null.

- In nsIDOMWindowUtils.idl:

+  /**
+   * Same as enterModalState, but returns the window associated with the
+   * current JS context.
+   */
+  nsIDOMWindow enterModalStateWithWindow();
+
+  /**
+   * Same as leaveModalState, but takes a window associated with the active
+   * context when enterModalStateWithWindow was called. The currently context
+   * might be different at the moment (see bug 621764).
+   */
+  void leaveModalStateWithWindow(in nsIDOMWindow aWindow);

Given that these effectively render the existing enterModalState/leaveModalState methods useless, or harmful, even, we could mark the old ones with [noscript], which is a binary comptible change that would prevent extensions from calling them. Or I guess there's a risk of that breaking JS extensions, which may mean we should wait with that change...

sr=jst with that.
Attachment #508569 - Flags: superreview?(jst) → superreview+
Whiteboard: [hardblocker][needs sr jst] → [hardblocker][needs sr jst][has patch]
Whiteboard: [hardblocker][needs sr jst][has patch] → [hardblocker][has patch]
Is this can-land now, dolske?
Let's try to get this in before we tag for b11.
Whiteboard: [hardblocker][has patch] → [hardblocker][has patch][b11]
Posted patch Patch v.3Splinter Review
Patch with jst's comments addressed.

I made the [noscript] change, this interface was added for scripting very recently (by me, for tab-modal prompts), I wouldn't expect addons to be using it, and a Addons mxr search indeed has no hits.
Attachment #508569 - Attachment is obsolete: true
Pushed http://hg.mozilla.org/mozilla-central/rev/835b313007b4
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [hardblocker][has patch][b11] → [hardblocker][b11]
Target Milestone: --- → mozilla2.0b11
Comment on attachment 508997 [details] [diff] [review]
Patch v.3

This checkin introduced the warning:
> dom/base/nsGlobalWindow.cpp: In member function ‘virtual void nsGlobalWindow::LeaveModalState(nsIDOMWindow*)’:
> dom/base/nsGlobalWindow.cpp:6425:14: warning: unused variable ‘cx’

From the landed patch:

>@@ -6406,16 +6417,24 @@ nsGlobalWindow::LeaveModalState()
>+  JSContext *cx = nsContentUtils::GetCurrentJSContext();
>+
>+  if (aCallerWin) {
>+    nsCOMPtr<nsIScriptGlobalObject> sgo(do_QueryInterface(aCallerWin));
>+    nsIScriptContext *scx = sgo->GetContext();
>+    scx->LeaveModalState();
>+  }

dolske: did you intend to use |cx| there, or is it safe to just remove it?
Verified OK (I waited 5 mins while it took 1 min with older versions to get the error) with 4.0b11
Status: RESOLVED → VERIFIED
Whiteboard: [hardblocker][b11] → [hardblocker][b11][fx4-fixed-bugday]
Build identifier: Mozilla/5.0 (Windows NT 6.1; rv:2.0b12pre) Gecko/20110206 Firefox/4.0b12pre

I can also confirm that the issue is no longer present.
(In reply to comment #22)
> Comment on attachment 508997 [details] [diff] [review]
> Patch v.3
> 
> This checkin introduced the warning:

Per dolske's request, filed a followup on this -- Bug 633891.
Depends on: 633891
Depends on: 647727
You need to log in before you can comment on or make changes to this bug.