Closed Bug 1179909 Opened 9 years ago Closed 9 years ago

Make our event loop less crazy, episode 1

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla43
Tracking Status
firefox43 + fixed

People

(Reporter: khuey, Assigned: khuey)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 4 obsolete files)

Attached patch 0001-stable-state-refactor.patch (obsolete) — Splinter Review
This is motivated by three separate but related problems:

1. Our concept of recursion depth is broken for things that run from AfterProcessNextEvent observers (e.g. Promises).  We decrement the recursionDepth counter before firing observers, so a Promise callback running at the lowest event loop depth has a recursion depth of 0 (whereas a regular nsIRunnable would be 1).  This is a problem because it's impossible to distinguish a Promise running after a sync XHR's onreadystatechange handler from a top-level event (since the former runs with depth 2 - 1 = 1, and the latter runs with just 1).

2. The nsIThreadObserver mechanism that is used by a lot of code to run "after" the current event is a poor fit for anything that runs script.  First, the order the observers fire in is the order they were added, not anything fixed by spec.  Additionally, running script can cause the event loop to spin, which is a big source of pain here (bholley has some nasty bug caused by this).

3. We run Promises from different points in the code for workers and main thread.  The latter runs from XPConnect's nsIThreadObserver callbacks, while the former runs from a hardcoded call to run Promises in the worker event loop.  What workers do is particularly problematic because it means we can't get the right recursion depth no matter what we do to nsThread.

The solve this, this patch does the following:

1. Consolidate some handling of microtasks and all handling of stable state from appshell and WorkerPrivate into CycleCollectedJSRuntime.
2. Make the recursionDepth counter only available to CycleCollectedJSRuntime (and its consumers) and remove it from the nsIThreadInternal and nsIThreadObserver APIs.
3. Adjust the recursionDepth counter so that microtasks run with the recursionDepth of the task they are associated with.
4. Introduce the concept of metastable state to replace appshell's RunBeforeNextEvent.  Metastable state is reached after every microtask or task is completed.  This provides the semantics that bent and I want for IndexedDB, where transactions autocommit at the end of a microtask and do not "spill" from one microtask into a subsequent microtask.  This differs from appshell's RunBeforeNextEvent in two ways:
  a) It fires between microtasks, which was the motivation for starting this.
  b) It no longer ensures that we're at the same event loop depth in the native event queue.  bent decided we don't care about this.
5. Reorder stable state to happen after microtasks such as Promises, per HTML.  Right now we call the regular thread observers, including appshell, before the main thread observer (XPConnect), so stable state tasks happen before microtasks.

Work remaining to do:
1. Forbid script execution during nsIThreadObserver callbacks and the stable state execution.
2. Write a ton of tests.
Attachment #8629001 - Flags: feedback?(bzbarsky)
Attachment #8629001 - Flags: feedback?(bent.mozilla)
will this also fix bug 1170484 ?
(In reply to cacheoverflow from comment #1)
> will this also fix bug 1170484 ?

Not by itself, no.  See bug 1180686 comment 5 and subsequent comments.
Blocks: 1132436
Blocks: graphene
Kyle, could you rebase this patch? I would do it myself, but it doesn't look trivial.

We need this patch for the larch branch, used for browser.html.
Without it, browser.html keeps crashing.
Flags: needinfo?(khuey)
Attached file untested rebase (obsolete) —
This compiles on linux, but is otherwise untested.
Flags: needinfo?(khuey)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) (UTC+8 July 17-25, expect delays) from comment #4)
> This compiles on linux, but is otherwise untested.

Thank you. I just needed to update nsAppShell.h/mm.
Comment on attachment 8629001 [details] [diff] [review]
0001-stable-state-refactor.patch

So a few thoughts:

1)  It sounds like metastable state callbacks need to not ever run script, since if they do that would spin the event loop, right?  Can we put scriptblockers in there somewhere?

2)  Per spec, processing of microtasks is supposed to have a reentry guard.  We don't have that right now, but we should add it.  Once we do, will this new setup still work correctly in cases in which script running off a microtask spins the event loop?  In particular, I'm looking at the dummy event hackery we have and I'm not sure whether it would still be good enough.

3)  The metastable stuff needs better documentation.

4)  We should unify the two different things we call "microtasks", but that should probably be a followup...

In general, this looks pretty reasonable to me (though I haven't carefully reviewed every single bit).  Especially if by some miracle it's green on try the first time you push it there.  ;)
Attachment #8629001 - Flags: feedback?(bzbarsky) → feedback+
Blocks: 1180686
This is basically the patch bz reviewed, rebased, with one change.  Because of terrible things like bug 1094248, we need to convince appshell not to block if we have promises outstanding.  This is a problem now because BeforeProcessTask is called after appshell, while XPConnect's OnProcessNextEvent was called before it.

Ultimately we should make Promises and stable state behave more like mutation observers with respect to nested event loops, but this doesn't make things any worse.
Attachment #8629001 - Attachment is obsolete: true
Attachment #8635867 - Attachment is obsolete: true
Attachment #8629001 - Flags: feedback?(bent.mozilla)
Attachment #8643813 - Flags: review?(bugs)
Hmm, this might not regress bug 780770 (or this might), but OnProcessNextEvent/AfterProcessNextEvent handling becomes totally different because push/pop null cx happens after OnProcessNextEvent, but before AfterProcessNextEvent.
Could you explain why that change?
I would say it doesn't matter, but it does affect events running off of appshell, which can run content script, so perhaps it does matter.
This makes the changes we talked about on IRC.  There is a rather lengthy commit message that you should read if you have not already.
Attachment #8643813 - Attachment is obsolete: true
Attachment #8643813 - Flags: review?(bugs)
Attachment #8644794 - Flags: review?(bugs)
Comment on attachment 8644794 [details] [diff] [review]
0001-Bug-1179909-Refactor-stable-state-handling.-r-smaug.patch

># HG changeset patch
># User Kyle Huey <khuey@kylehuey.com>
>
>Bug 1179909: Refactor stable state handling. r=smaug
>
>This is motivated by three separate but related problems:
>
>1. Our concept of recursion depth is broken for things that run from AfterProcessNextEvent observers (e.g. Promises). We decrement the recursionDepth counter before firing observers, so a Promise callback running at the lowest event loop depth has a recursion depth of 0 (whereas a regular nsIRunnable would be 1). This is a problem because it's impossible to distinguish a Promise running after a sync XHR's onreadystatechange handler from a top-level event (since the former runs with depth 2 - 1 = 1, and the latter runs with just 1).
>


>-function promiseAsync_SyncHXRAndImportScripts()
>+function promiseAsync_SyncXHRAndImportScripts()
> {
>   var handlerExecuted = false;
> 
>   Promise.resolve().then(function() {
>     handlerExecuted = true;
> 
>     // Allow other assertions to run so the test could fail before the next one.
>     setTimeout(runTest, 0);
>   });
> 
>   ok(!handlerExecuted, "Handlers are not called until the next microtask.");
> 
>   var xhr = new XMLHttpRequest();
>   xhr.open("GET", "testXHR.txt", false);
>   xhr.send(null);
> 
>-  ok(!handlerExecuted, "Sync XHR should not trigger microtask execution.");
>+  todo(!handlerExecuted, "Sync XHR should not trigger microtask execution.");
> 
>   importScripts("relativeLoad_import.js");
> 
>-  ok(!handlerExecuted, "importScripts should not trigger microtask execution.");
>+  todo(!handlerExecuted, "importScripts should not trigger microtask execution.");
This is very worrisome.
Could we get the existing behavior on workers if AfterProcessTask would check if we're in workers and recursion depth is 2 before doing any microtask magic.
And we could also remove http://mxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#5410



> NS_IMETHODIMP
> SheetLoadData::OnProcessNextEvent(nsIThreadInternal* aThread,
>-                                  bool aMayWait,
>-                                  uint32_t aRecursionDepth)
>+                                  bool aMayWait)
> {
>+  // XXXkhuey this is insane!
>   // We want to fire our load even before or after event processing,
>   // whichever comes first.
>   FireLoadEvent(aThread);
>   return NS_OK;
> }
> 
> NS_IMETHODIMP
> SheetLoadData::AfterProcessNextEvent(nsIThreadInternal* aThread,
>-                                     uint32_t aRecursionDepth,
>                                      bool aEventWasProcessed)
> {
>+  // XXXkhuey this too!
>   // We want to fire our load even before or after event processing,
>   // whichever comes first.
>   FireLoadEvent(aThread);
>   return NS_OK;
> }
We obviously need followup for this.




>+++ b/testing/xpcshell/runxpcshelltests.py
>@@ -859,17 +859,17 @@ class XPCShellTests(object):
>         if self.mozInfo is None:
>             self.mozInfo = os.path.join(self.testharnessdir, "mozinfo.json")
> 
>     def buildCoreEnvironment(self):
>         """
>           Add environment variables likely to be used across all platforms, including remote systems.
>         """
>         # Make assertions fatal
>-        self.env["XPCOM_DEBUG_BREAK"] = "stack-and-abort"
>+        self.env["XPCOM_DEBUG_BREAK"] = "stack"
Is this really relevant change?






>   nsCOMPtr<nsIException> mPendingException;
>+  nsThread* mOwningThread; // Manual refcounting to avoid include hell.
What include hell? Can't you just forward declare nsThread.


 
>+  struct RunInMetastableStateData {
Nit, { to its own line.


>-[scriptable, uuid(09b424c3-26b0-4128-9039-d66f85b02c63)]
>+[scriptable, uuid(cc8da053-1776-44c2-9199-b5a629d0a19d)]
> interface nsIThreadObserver : nsISupports
I think we should drop 'scriptable'


With todo(!handlerExecuted, "importScripts should not trigger microtask execution."); case fixed in worker this would be r+, but I think I could take a look at a new patch.
Attachment #8644794 - Flags: review?(bugs) → review-
Attachment #8645196 - Flags: review?(bugs) → review+
Depends on: 1193917
Should this be uplifted at least to aurora? There are some push+IDB+Promises combinations that crash currently.
I oppose trying to uplift anything non-critical related to event loops. Event loops are notoriously regression-prone and we need the full release cycle to find bugs.
If this doesn't get uplifted (which does seem reasonable), then it's probably advisable to disable IndexedDB on workers for all versions prior to trunk where practical.  See bug 1152026 for context.

I presume disabling would take the form of removing the indexeddb stuff from the worker global scope.
Since this landed there have been 5 new intermittents reported in script scheduling tests; I don't have particular evidence of a link, but the nature of this change does waggle it's eyebrows suggestively and gesture furtively while mouthing 'look over there', to quote xkcd[1].

New intermittents are bugs 1193478, 1193568, 1194301, 1194416 and 1194690

[1] https://xkcd.com/552/
Hmm, actually looking some more I think the fact this is only happening in debug builds suggests that this change may not be responsible; I'll do some retriggers and report back.
Tracking this for 43 to keep an eye on it.
Depends on: 1197188
(In reply to Andrew Sutherland [:asuth] from comment #21)
> If this doesn't get uplifted (which does seem reasonable), then it's
> probably advisable to disable IndexedDB on workers for all versions prior to
> trunk where practical.  See bug 1152026 for context.

Ben, Kyle, Jan, what do you think?
Flags: needinfo?(khuey)
Flags: needinfo?(bent.mozilla)
Flags: needinfo?(Jan.Varga)
Note that Kyle has a less dramatic workaround patch for consideration at https://bugzilla.mozilla.org/show_bug.cgi?id=1152026#c37 which seems like the best option now.
(In reply to Andrew Sutherland [:asuth] from comment #26)
> Note that Kyle has a less dramatic workaround patch for consideration at
> https://bugzilla.mozilla.org/show_bug.cgi?id=1152026#c37 which seems like
> the best option now.

Sounds good.
Flags: needinfo?(Jan.Varga)
Yeah, this is being dealt with in the other bug.
Flags: needinfo?(khuey)
Flags: needinfo?(bent.mozilla)
Depends on: 1194538
Depends on: 1200514
Depends on: 1193922
Depends on: 1223691
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: