Closed Bug 1167575 Opened 9 years ago Closed 8 years ago

code executed in wrong order with setTimeout after alert

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 51
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- fixed

People

(Reporter: gouwa5700, Assigned: bzbarsky)

References

Details

(Keywords: addon-compat)

Attachments

(8 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.65 Safari/537.36

Steps to reproduce:

See this bug in action on JSBin: http://jsbin.com/cijonu

Code to reproduce the bug in Firefox( you might need to run it several times):

console.log('processing: task #1');
setTimeout(function(){
  console.log('processing: task #3');
},0);
alert('See console logs');
console.log('processing: task #2');


Actual results:

console output:

"processing: task #1"
"processing: task #3"
"processing: task #2"


Expected results:

console output:

"processing: task #1"
"processing: task #2"
"processing: task #3"
Build ID: 20151209030228
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:45.0) Gecko/20100101 Firefox/45.0

Hi,

I tested this with Firefox Nightly 45.0a1  on Mac OS X 10.10 and I can't reproduce the problem.
Please download the Firefox Nightly from here: https://nightly.mozilla.org/ and retest the problem.
Flags: needinfo?(gouwa5700)
Hi,
Marking this as Resolved: Incomplete due to the lack of response from the reporter.
Reporter, please feel free to reopen it if you are still having this issue on the latest Firefox version.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(gouwa5700)
Resolution: --- → INCOMPLETE
I'm on FF 45.0.2 on OSX and also repro this issue.

Here is a JS bin
http://jsbin.com/yukudor/1/edit?js,console

Here is the code:

setTimeout(function () {
  console.log('timeout');
}, 1);

var result = prompt('hi');
console.log('prompt');

I would expect to see prompt, timeout. However, sometimes I instead get timeout, prompt.
Hi,

You are right, I was able to reproduce the issue from comment 4 with FF 45.0.2 on Mac OS X 10.10. I also tried with FF Nightly 48.0a1 and I can't reproduce this issue. Can you please retest this with FF Nightly to see if you can reproduce it? Please download the Firefox Nightly from here: https://nightly.mozilla.org/
I also find myself unable to repro on FF Nightly.
Hi,

Thanks for your feedback, I did some searches and I found out that this bug is reproducible also on the next Firefox 46 version. The good news is that on Firefox DevelopersEdition the bug can't be reproduce, that means the first good release version will be Firefox 47. Based on the Rapid Release Calendar, https://wiki.mozilla.org/RapidRelease/Calendar, Firefox 47 release will be shipped on 2016-06-06.
I just tested this bug on Firefox 48.0.1, I can still reproduce it. Also, I tried it with Firefox Nightly 51.0a1(2016-08-24), the bug persists. See the screenshots.

BTW, the way you guys treat this bug is amateur to me. Rather than digging into the code and find the root cause, you keep pushing it back by saying you can't reproduce the bug on the latest version. But it turns out you just didn't take it seriously. Don't ask me to test it on the latest Firefox, just find the root cause and fix it once for all.
Version: 40 Branch → 51 Branch
Status: RESOLVED → UNCONFIRMED
Resolution: INCOMPLETE → ---
Hi Greg, 
First of all thank you very much for your feedback and effort. 
Before we set a component to a bug and start to dig for the problem we need to be sure that is a Firefox problem and someone else can reproduce it or, if not we can assume that is not a Firefox issue.  

I tested again this issue ussing the examples from User Story and comment 4. I used Mac OS X 10.10 with  Firefox Nightly 51.0a1(2016-08-24) and Firefox 48.0.1 release and I can't reproduce it. 

Do you have add-ons? I have tested with new profile and no add-ons. You can try in safe mode to see if add-ons affect this issue. In safe mode all add-ons are disabled. 

Please see the attached files.
Component: Untriaged → DOM
Product: Firefox → Core
Thanks for your reply. I simply download the latest release version and nightly build with no additional modification.
It seems harder to reproduce the bug with the JS bin. But by copying and pasting the code in console, I get a high chance to reproduce it.

Here (http://g.recordit.co/2m52ddlWij.gif) is a screen recording of me reproducing the bug, please try to reproduce as I did.
I tried to reproduce the issue like you suggested in the gif file but with no results, for me is working in Nightly and release versions. 

I have an idea, can you please try to create a new profile and see how it goes?  
You have the steps here: https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles?redirectlocale=en-US&redirectslug=Managing-profiles#w_starting-the-profile-manager
Code to reproduce the bug:
console.log('processing: task #1');
setTimeout(function(){
  console.log('processing: task #3');
},0);
alert('See console logs');
console.log('processing: task #2');

Steps to reproduce the bug in Firefox:
1. Open Firefox
2. Open console and paste the above code and run:
3. Open a new tab and repeat step 2

The bug should show up.
I tried the above instructions many times, it can be reproduced every time. Please try it and confirm if it shows up.
I tried both safe mode and creating a new profile, which doesn't change a fact. :(
I attached a screenshot, please see that in my case after "processing: task #2" I have "undefined", but the order is as expected.
just try it more times.
Attached video Recording #10.mp4
I set a component to this bug, but I can't confirm it because I can't reproduce it. 
Please see the attachement with the recording.
please try to open the console window with keyboard shortcut Command+Option+I, instead of right clicking and "Inspect Element".
Code to reproduce the bug:
console.log('processing: task #1');
setTimeout(function(){
  console.log('processing: task #3');
},0);
alert('See console logs');
console.log('processing: task #2');

Steps to reproduce the bug in Firefox:
1. Open Firefox
2. Open console window with keyboard shortcut Command+Option+I and paste the above code and run
3. Open a new tab and repeat step 2

The bug should show up.
I tried with keyboard shortcut Command+Option+I but nothing changed. I still can't reproduce it.
I found a reliable way to reproduce this issue:
1. open the link in comment 0 and close the dialog when it prompts
2. open the link in comment 0 again, and close the dialog when it prompts
3. open web console on both tabs
4. refresh the first tab and wait for the dialog to show up, but don't close it
5. refresh the second tab and wait for the dialog to show up, but don't close it
6. switch to the first tab and close the dialog
7. switch to the second tab and close the dialog

Before step 6, you should see
> processing: task #1
in the web console of the first tab, and after that, you should see
> processing: task #1
> processing: task #3
and task #2 will never show up until you finish step 7 and switch back to the first tab.

So there is some issue that synchronous script is blocked due to the same origin alert, while asynchronous script is not properly blocked. I guess this could have a wider impact than this specific case, though probably rare. I think it could lead to race condition in some script-intensive apps.

I'm not sure whether it is a bug in DOM or in the JavaScript engine. I suppose the latter.
Status: UNCONFIRMED → NEW
Component: DOM → JavaScript Engine
Ever confirmed: true
Isn't this just the thing where alert spins a nested event loop, which then executes the timeout, or something?
(In reply to Jan de Mooij [:jandem] from comment #26)
> Isn't this just the thing where alert spins a nested event loop, which then
> executes the timeout, or something?

Probably. I suppose the nested event loop shouldn't do that anyway. Asynchronous code is never supposed to be executed before synchronous code, is it?
alert does spin a nested event loop.

However while the alert is up we don't run timeouts for that window, precisely to avoid the problem being described in this bug.  See the IsInModalState() check at the top of nsGlobalWindow::RunTimeout.  If this stuff is not working, that's certainly a DOM issue, not a JS engine one....

Anyway, the steps in comment 25 _would_ reproduce this problem, because when the first modal comes down we'll go ahead and run the timeout (since we're no longer in modal state), but we're still spinning the event loop in the _other_ tab, so the linear script execution is not continuing in either tab.  That part all makes sense.  This particular scenario can't really be addressed without making alert() block the world (and thus prevent putting up a "nested" alert in the second tab), which we don't want to do because it makes for a terrible user experience.

_Maybe_ we could address it with the event queue rework for the cases when the two documents "share" an event queue.  And with that plus quantum DOM (putting things that don't share an event queue on different threads), we might be able to address it entirely, maybe.  It would involve alert blocking the entire thread, which may also not be the UX we want.

In any case, the upshot is that the spec says that alert is supposed to totally block the world and we purposefully violate that spec because that's incredibly user-hostile behavior.  We make things sort of look like the spec world for simple cases, but for complicated ones it's observably different.  Most simply, your page can be closed or navigated away from while an alert is up, which is not allowed per spec.

Back to the original bug report, though... The case of alerts coming up while other alerts are already up is a pretty rare edge case and the reproduction observed at https://bug1167575.bmoattachments.org/attachment.cgi?id=8784730 does not seem to involve that situation: the alert in the first tab is brought down before whatever is done in the second tab is done, no?
Component: JavaScript Engine → DOM
The steps in comment 15 and comment 23 don't mention whether the alerts are being dismissed and when.  That part is pretty important.  I can reproduce using Xidorn's steps, obviously, but that's the only reproduction scenario I've found so far...
(In reply to Boris Zbarsky [:bz] from comment #29)
> The steps in comment 15 and comment 23 don't mention whether the alerts are
> being dismissed and when.  That part is pretty important.  I can reproduce
> using Xidorn's steps, obviously, but that's the only reproduction scenario
> I've found so far...

In my case, I simply dismissed the alert when it showed up, the bug appeared on the second tab.

So to be precise, the code and steps to reproduce this bug for my instance of Firefox Nightly 51.0a1(2016/08/24) was:
Code to reproduce the bug:
console.log('processing: task #1');
setTimeout(function(){
  console.log('processing: task #3');
},0);
alert('See console logs');
console.log('processing: task #2');

Steps to reproduce the bug in Firefox:
1. Open Firefox
2. Open console window with keyboard shortcut Command+Option+I and paste the above code and hit Enter key to run
3. Dismiss the alert that shows up
4. Open a new tab and repeat step 2 and 3

The bug shows up on the console of the second tab.
(In reply to Boris Zbarsky [:bz] from comment #29)
> The steps in comment 15 and comment 23 don't mention whether the alerts are
> being dismissed and when.  That part is pretty important.  I can reproduce
> using Xidorn's steps, obviously, but that's the only reproduction scenario
> I've found so far...

The STR in comment 30 (which is the steps shown in the recording in comment 19) is also reproducible for me. It is not as reliably as my steps, but still highly reproducible.
OK, I have experimented with this a bit, and found the key ingredient to reproduce: I can reproduce using the console when about:newtab is the currently loaded page (even if it's the only tab that's ever been opened in that window).  I couldn't reproduce before because all my profiles always have the newtab page set to about:blank...

Looking into this now.
Xidorn, were you reproducing this on Mac?  I can reproduce on Mac reasonably consistently, but not on Linux....
Flags: needinfo?(xidorn+moz)
Yes, I reprodeced this on Mac.
Flags: needinfo?(xidorn+moz)
So I _think_ I know what's going on here.

The alert code has this structure (in components/prompts/src/nsPrompter.js):

        args.promptActive = true;
        newPrompt = tabPrompt.appendPrompt(args, onPromptClose);
        let thread = Services.tm.currentThread;
        while (args.promptActive)
            thread.processNextEvent(true);

where onPromptClose calls winUtils.leaveModalState().

I'm dismissing the alert by hitting enter, so we go through toolkit/components/prompts/content/tabprompts.xml the onKeyAction function, which calls onButtonClick, which does:

                this.Dialog["onButton" + buttonNum]();
                this.shutdownPrompt();

The shutdownPrompt() call is what ends up calling onPromptClose.  The line involving buttonNum lands in toolkit/components/prompts/src/CommonDialog.jsm in the onButton0 function, which does:

        this.args.promptActive = false;

So the idea is that we set the boolean that will break us out of the processNextEvent loop, then call onPromptClose, which exits the modal state.  The problem I'm seeing is this.  If I breakpoint in onButton0 and look at the C++ stack at that point, the part of the stack that starts off in the event loop and ends up in that tabmodalprompt XBL binding, it looks like this:

#27 0x0000000106ea5010 in nsXBLPrototypeHandler::ExecuteHandler (this=0x148a108c0, aTarget=0x124b2a430, aEvent=0x1224e2660) at nsXBLPrototypeHandler.cpp:325
....
#57 0x00000001071dcbfc in nsAppShell::ProcessNextNativeEvent (this=0x11f666940, aMayWait=true) at nsAppShell.mm:574
#58 0x0000000107142c16 in nsBaseAppShell::DoProcessNextNativeEvent (this=0x11f666940, mayWait=true) at nsBaseAppShell.cpp:138
#59 0x0000000107143183 in nsBaseAppShell::OnProcessNextEvent (this=0x11f666940, thr=0x100526100, mayWait=true) at nsBaseAppShell.cpp:289
#60 0x00000001071dd562 in nsAppShell::OnProcessNextEvent (this=0x11f666940, aThread=0x100526100, aMayWait=true) at nsAppShell.mm:743
#61 0x00000001071dd60e in non-virtual thunk to nsAppShell::OnProcessNextEvent(nsIThreadInternal*, bool) (this=0x11f666948, aThread=0x100526100, aMayWait=true) at nsAppShell.mm:733
#62 0x0000000102645fc9 in nsThread::ProcessNextEvent (this=0x100526100, aMayWait=true, aResult=0x7fff5fbe78e8) at nsThread.cpp:1029
#63 0x000000010267ee5e in NS_InvokeByIndex (that=0x100526100, methodIndex=10, paramCount=2, params=0x7fff5fbe78d0) at xptcinvoke_x86_64_unix.cpp:180

So frame 63 is the call from JS to processNextEvent.  Frame 61 is the nsThread code calling into appshell, which lands in frame 57 and processes the keypress, eventually firing the keypress event listener.

But then the stack unwinds back to frame 62, and if you look at what nsThread::ProcessNextEvent looks like, it does:

    obs->OnProcessNextEvent(this, reallyWait); // This is the call that ends up in appshell
...
      mEvents->GetEvent(reallyWait, getter_AddRefs(event), lock);

etc.

In other words, when JS makes that single processNextEvent() call, our event loop will process some OS events _and_ one Gecko runnable before returning.  At least on Mac.  So what can happen, and I expect is happening, is the following:

1)  We start spinning the event loop.
2)  The timer for the timeout fires and doesn't run the timer
    callback because we're in modal state.
3)  The next call to processNextEvent happens when the Gecko
    runnable queue is empty.
4)  The next OS event that comes in is the keypress.
5)  We process the keypress, do the XBL/prompter stuff, set our
    "stop spinning the event loop" boolean, and call onPromptClose.
6)  onPromptClose calls leaveModalState.
7)  leaveModalState queues a runnable to run the timeouts that
    fired while in modal state.
8)  We unwind to nsThread::ProcessNextEvent.
9)  We grab the next runnable and run it; this fires the timer callback.
10) We return to the JS code in nsPrompter.js and break out of
    the loop because args.promptActive is now false.

I don't know why about:newtab makes this more likely to happen...

In any case, we can't really leave modal state when we stop spinning the nested event loop, because someone might spin _another_ event loop under that (the alert-in-other-tab case) and then we would never leave modal state.

One option would be to leave modal state off a runnable, not sync off the keypress event.  That would ensure that leaving modal state can only happen from the actual gecko runnable queue part of processNextEvent and hence that the runnable leaving modal state queues won't get run by the processNextEvent() call in nsPrompter.js.  This is probably the safest option, but leaves the "can process multiple things under a single processNextEvent call" footgun in place generally.  I _think_ we'd need to also not leave our nested loop until that runnable has been processed; otherwise we can leave it and unwind before leaving modal state, which is bad.

Another option would be to change nsThread::ProcessNextEvent to not mess with runnables if the call to nsBaseAppShell::OnProcessNextEvent processed native events.  But it's not clear to me whether this makes sense, and in general seems like a pretty risky change.

A third option would be to change the Mac native event processing to put things in the runnable queue instead of directly dispatching notifications to Gecko.  This also sounds risky.  And I just checked, and Linux has the same behavior with keypress being processed sync under nsAppShell::OnProcessNextEvent (so I don't know why I never managed to reproduce this on Linux).

Sounds to me like we should do that first option, which would be inside the Firefox front-end code for alerts....

Justin, do you know who might be able to take a look at this?
Component: DOM → General
Flags: needinfo?(dolske)
Product: Core → Firefox
Interesting find, thanks Kanru for CC'ing.

I had some experience working with alert/prompters (see bug 1266353 and it's see also bugs). I could give this bug a try if there isn't someone available for this bug as their #1 priority ...
I'm a little curious if this is also reproducible with window-modal prompts. Flipping prompts.tab_modal.enabled should still give you the old behavior, although we don't test that so it's possible it doesn't work any more.

bz's comments indicate that the issue with processNextEvent running multiple things exists on OSX and Linux, but I don't see anything about Windows -- I presume it doesn't have this problem?

Do we know if this bug  causes problems on any real-world sites, or is it just a testcase demo?

Altogether, this seems like a low-priority issue to me. I think the processNextEvent code on OS X has basically always worked this was (at least I vaguely remember this aspect having come up before), so I'd expect that that this bug has existed since at least Firefox 4 (or earlier, if it happens with window-modal prompts too).

From a quick look, "leave modal state off a runnable" workaround sounds like it might be fine, although I'm rather wary of playing whack-a-mole with other edge-case bugs by refactoring this code path. But that should be avoidable by moving all of the 2nd half of tabprompts.xml's onKeyAction() into a runnable -- everything after the event.stopPropagation() call @  http://searchfox.org/mozilla-central/rev/0dd403224a5acb0702bdbf7ff405067f5d29c239/toolkit/components/prompts/content/tabprompts.xml#288-294 I don't think anything in there cares exactly when it runs, so deferring things slighty (event -> spin -> runnable) seems ok.
Flags: needinfo?(dolske)
> I'm a little curious if this is also reproducible with window-modal prompts

Not so far, for me, but I had a good bit of trouble reproducing with tab-modal ones too...

> I presume it doesn't have this problem?

It probably does.  I think we consistently have processNextEvent service OS events and then Gecko runnables.  The only question is what, precisely, servicing OS events does.  All of this is very timing-dependent, obviously.

> Do we know if this bug  causes problems on any real-world sites, or is it just a testcase demo?

Nothing we know of so far, which doesn't mean it's not happening.
I don't think moving things from onKeyAction will help enough.  Clicking the button should have the same problem: processing of the OS event will happen and post a Gecko runnable.

I think the right answer is actually to put the entire body of onButtonClick off on a runnable, if we want to go for minimal refactoring.
If we synchronously shut down the prompt, that will exit modal state while we're
still processing our mouse or key event, which happens _before_ Gecko runnable
processing.  Exiting modal state will generally post a runnable to run timeouts,
so we can end up processing that runnable before we return from processNextEvent
and unwind to whatever code did the alert() call on the web page.  The upshot is
that the webpage will see timeouts fire before the alert() call returns.

If we exit modal state off a Gecko runnable instead, that will ensure that we
return to the nsPrompter code immediately after exiting modal state, see that
the prompt is no longer active, and return to the calling web page code _before_
processing any more runnables.
Attachment #8786102 - Flags: review?(dolske)
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attachment #8786102 - Attachment is obsolete: true
Attachment #8786102 - Flags: review?(dolske)
Attachment #8786326 - Flags: review?(dolske)
Attachment #8786209 - Attachment is obsolete: true
Attachment #8786209 - Flags: review?(dolske)
Jorge, there are a few addons that seem to do onButtonClick(someNumber) calls on prompts.  We should give their authors a heads-up about this now acting async, in case it affects their code....
Keywords: addon-compat
Er, Jorge, see comment 43.
Flags: needinfo?(jorge)
Thanks, I'll include it in the compat messaging.
Flags: needinfo?(jorge)
Attachment #8786326 - Flags: review?(dolske) → review+
Pushed by bzbarsky@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6d30a609081d
Ensure that we don't exit modal state while we're still waiting on a Gecko runnable in the event-loop-spinning code in nsPrompter.js.  r=dolske
https://hg.mozilla.org/mozilla-central/rev/6d30a609081d
Status: ASSIGNED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 51
I have reproduced this bug with Nightly 41.0a1 (2015-05-22) on Windows 8.1(64 Bit!).

This bug's fix is verified on Latest Nightly 51.0a1.
Build ID : 20160908030434
User Agent : Mozilla/5.0 (Windows NT 6.3; WOW64; rv:51.0) Gecko/20100101 Firefox/51.0

[bugday-20160907]
In order to avoid creating a spurious bug entry, I would appreciate some insights about a similar condition which arise in my extension code -- I would like to know if what was fixed here will fix a difficult to reproduce issue reported by users of my extension, and which issue I have randomly encountered myself.

The most prominent symptom is exactly what is reported here: setTimeout callbacks are reported in the wrong order, and from what the browser console says, it is as if setTimeout is no longer asynchronous, the callback functions are called immediately. At first I thought it was an issue with the extension code, until I finally realize this *always* occurs immediately after the following error is reported in the browser console:

NS_ERROR_NOT_AVAILABLE: Cannot call openModalWindow on a hidden window

Then following this error message, there are recursive calls to SetTimeout caused by the fact that setTimeout is no longer asynchronous, and this leads to a javascript timeout error. See a screenshot of the error on my issue tracker: <https://github.com/gorhill/uBlock/issues/1973#issuecomment-245930385>.

Now, I gather than the "hidden window" mentioned in the error message above is that of the chrome code of the extension, but nowhere in the extension code do I ever create a modal window, so I would appreciate some insights as to why this error occurs in my extension code.
That's a different issue from the one that was fixed here.  This bug was about tab-modal prompts.  The code that gives you the openModalWindow error is trying to open an actual window-modal prompt, but trying to parent it to some non-visible window.  Please file a new bug on that issue?
Is it something that we would like to uplift to aurora/50? Thanks
Given that it's not a new issue and that the fix is reasonably high-risk (in my opinion), I don't think we should uplift it.  I would much prefer a full testing cycle here.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: