Closed Bug 888753 Opened 11 years ago Closed 11 years ago

Promises hang under certain conditions with nested event loops

Categories

(Toolkit :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla26

People

(Reporter: Irving, Assigned: Irving)

Details

(Whiteboard: [Async])

Attachments

(1 file, 2 obsolete files)

One of the Addon Manager test cases just happened to use a nested event loop in just the right way to trigger a deadlock bug in the Promise implementation. Attached is a test case, including some debug dumps I added to Promise.jsm to see what was going on.

In short, what happened was that within a single tick of the event loop my code queues up .then() calls on two different already-resolved promises; this causes PromiseWalker to schedule the walker loop for the next tick.

When we walker loop starts, it invokes the first .then handler, which starts a nested event loop. This prevents the outer walker loop from iterating on to the next .then handler.

Because all the promises in question have already been completed, we don't get another call to schedulePromise, so the walker loop doesn't get restarted inside the nested event loop.

We might be able to fix this by always re-scheduling another walker loop at https://hg.mozilla.org/mozilla-central/annotate/e24391ffbe7e/toolkit/modules/Promise.jsm#l279 if this.handlers.length > 1
Attachment #769512 - Flags: feedback?(paolo.mozmail)
Attachment #769512 - Flags: feedback?(dteller)
Generally, we dislike strongly event loop testing, so I'm not sure of the best thing to do. Do we want to ensure that things work as you expect them to? Do we just want to ensure that we have a way of finding where the unexpected behavior comes from?

Note that "fixing" promise does not guarantee that we won't have the exact same issue if we move to DOM promise, some day.
The nested event loops I tripped over are part of a very large test suite that would need tons of rewriting. That said, the event loops should not have been invoked inside of callbacks, so I've fixed them; that's why this bug doesn't block me.

That said, our current constant-stack Promise implementation has code in it to protect against nested event loops, so when I tripped over a bug in that protection I filed it.

If the fix isn't too expensive, I'd prefer that we make our implementation as robust as possible (and perhaps test the DOM Promise implementation against the same suite).
(In reply to :Irving Reid from comment #0)
> One of the Addon Manager test cases just happened to use a nested event loop
> in just the right way to trigger a deadlock bug in the Promise
> implementation.

I'm glad you found this bug early!

> In short, what happened was that within a single tick of the event loop my
> code queues up .then() calls on two different already-resolved promises;
> this causes PromiseWalker to schedule the walker loop for the next tick.

Is your test case able to trigger the deadlock with two "then" calls on the
same promise? If so, it's better to test that case, as we don't guarantee
the order of calls to handlers on different promises.
 
> We might be able to fix this by always re-scheduling another walker loop at
> https://hg.mozilla.org/mozilla-central/annotate/e24391ffbe7e/toolkit/modules/
> Promise.jsm#l279 if this.handlers.length > 1

Looks like a good solution. Can you make a patch, adding the test case to
"test_Promise.jsm"? (I'd like a single file because we plan to run the suite
against other implementations in the future, to see how they comply).

(In reply to David Rajchenbach Teller [:Yoric] from comment #1)
> Generally, we dislike strongly event loop testing, so I'm not sure of the
> best thing to do. Do we want to ensure that things work as you expect them
> to? Do we just want to ensure that we have a way of finding where the
> unexpected behavior comes from?

I also thought event loop testing wasn't important here, until I found that
the JavaScript Debugger may actually use nested event loops during promise
resolution (and we can be robust to code potentially opening modal dialogs
during resolution, that are handled by nested event loops at the OS level).

> Note that "fixing" promise does not guarantee that we won't have the exact
> same issue if we move to DOM promise, some day.

This is why I think we should have a test case for this condition here, as we
fortunately detected it and can reproduce it reliably.
(In reply to Paolo Amadini [:paolo] from comment #3)
> > Note that "fixing" promise does not guarantee that we won't have the exact
> > same issue if we move to DOM promise, some day.
> 
> This is why I think we should have a test case for this condition here, as we
> fortunately detected it and can reproduce it reliably.

That sounds good.
Irving, could you check the behavior of the same test with DOM Promise and see with baku if we need to add this to their test suite?
Comment on attachment 769512 [details] [diff] [review]
Failing test case for deadlock in Promise library

Looks like I didn't clear the feedback request when writing comment 3. This bug
is currently actionable.
Attachment #769512 - Flags: feedback?(paolo.mozmail)
(In reply to Paolo Amadini [:paolo] from comment #3)
> (In reply to :Irving Reid from comment #0)
...
> > In short, what happened was that within a single tick of the event loop my
> > code queues up .then() calls on two different already-resolved promises;
> > this causes PromiseWalker to schedule the walker loop for the next tick.
> 
> Is your test case able to trigger the deadlock with two "then" calls on the
> same promise? If so, it's better to test that case, as we don't guarantee
> the order of calls to handlers on different promises.

Yes, but I'm not sure how important that is to us - when someone writes

promise.then(loop waiting for a signal)
promise.then(signal end of loop)

it looks like it shouldn't work, and a naive Promise implementation would never execute the second then() handler. But when they write

promise.then(do some work that will signal end of loop, and then loop waiting for the signal)

and the work happens to use a second promise to send the signal, it's not as obvious that it shouldn't work, and I think it would work in a naive implementation.

> > We might be able to fix this by always re-scheduling another walker loop at
> > https://hg.mozilla.org/mozilla-central/annotate/e24391ffbe7e/toolkit/modules/
> > Promise.jsm#l279 if this.handlers.length > 1
> 
> Looks like a good solution. Can you make a patch, adding the test case to
> "test_Promise.jsm"? (I'd like a single file because we plan to run the suite
> against other implementations in the future, to see how they comply).

moved the test case into test_Promise.jsm, but I'd like you to double-check that I hooked into your test framework correctly. The test does fail without my fix and pass with it, so I think I got it right...
Assignee: nobody → irving
Attachment #769512 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #792293 - Flags: review?(paolo.mozmail)
(In reply to :Irving Reid from comment #6)
> Yes, but I'm not sure how important that is to us - when someone writes
> 
> promise.then(loop waiting for a signal)
> promise.then(signal end of loop)
> 
> it looks like it shouldn't work, and a naive Promise implementation would
> never execute the second then() handler. But when they write
> 
> promise.then(do some work that will signal end of loop, and then loop
> waiting for the signal)
> 
> and the work happens to use a second promise to send the signal, it's not as
> obvious that it shouldn't work, and I think it would work in a naive
> implementation.

Sounds good. We should explain this in a comment, and specify that this covers
the current implementation but doesn't guarantee regression coverage, since the
code that sets the variable to exit the loop could be executed before the loop
itself. This may happen in a different implementation that still has the event
loop blocking issue.

We should be clear in how our tests work, and specify when we do unusual things,
because tests are sometimes used as a go-to place for finding examples on how
to use the code.
Comment on attachment 792293 [details] [diff] [review]
Reschedule promise walker more often to avoid deadlocks

Review of attachment 792293 [details] [diff] [review]:
-----------------------------------------------------------------

This looks quite good, I just had the chance for some nitpicking :-)

r+ with the requested changes and explanatory comments.

::: toolkit/modules/Promise.jsm
@@ +284,5 @@
>  
>    /**
> +   * Sets up the PromiseWalker loop to start on the next tick of the event loop
> +   */
> +  startWalkerLoop: function()

nit: scheduleWorkerLoop?

@@ +341,5 @@
>      // impact on overall performance.
> +    if (this.handlers.length > 1) {
> +      this.startWalkerLoop();
> +    }
> +    else {

mini style nit: } else {

::: toolkit/modules/tests/xpcshell/test_Promise.js
@@ +664,5 @@
>  
> +
> +
> +// Test deadlock in Promise.jsm with nested event loops
> +let done_event_loop = false;

shouldExitNestedEventLoop sounds more similar to what this does.

Also, these needn't be global, can be declared inside the test case for isolation.

@@ +669,5 @@
> +
> +function event_loop() {
> +  let thr = Services.tm.mainThread;
> +  while(!done_event_loop) {
> +    dump(".");

Let's keep the output deterministic by not calling "dump" for each iteration.

@@ +696,5 @@
> +      event_loop();
> +    }, null);
> +
> +    do_print("Setting wait for second promise");
> +    return promise2.then(null, error => {return 3;})

We don't invoke the error handler, right?

Looks like ".then(null)" might be sufficient, also please add a comment explaining why it is needed.

@@ +700,5 @@
> +    return promise2.then(null, error => {return 3;})
> +    .then(
> +      count => {
> +        done_event_loop = true;
> +      });

nit: a one-liner ".then(() => { shouldExitNestedEventLoop = true; });" seems more readable to me.
Attachment #792293 - Flags: review?(paolo.mozmail) → review+
https://hg.mozilla.org/mozilla-central/rev/12b9f93e32ad
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: