Closed Bug 1193394 Opened 9 years ago Closed 7 years ago

Microtasks and promises

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: annevk, Assigned: arai)

References

(Blocks 6 open bugs, )

Details

Attachments

(2 files, 12 obsolete files)

59.22 KB, patch
arai
: review+
Details | Diff | Splinter Review
6.88 KB, patch
arai
: review+
Details | Diff | Splinter Review
Currently promises do not execute at the end of a callback, while the HTML specification does call for a microtask checkpoint there.

I worked through the specification with bz and I believe we concluded that although ES6 integration is less than ideal (see https://www.w3.org/Bugs/Public/show_bug.cgi?id=25981) we could implement the correct microtask timing before all the ES6 integration is done in the HTML Standard.

We should fix this as the timing differences this creates is starting to be noticed by developers.
Here is a demo of the issue: http://jsbin.com/kaqume/1/edit?js,console,output

Click the blue square. The `promise` logs should appear before the `mutate` logs. The correct ordering is:

"click"
"promise"
"mutate"
"click"
"promise"
"mutate"
"timeout"
"timeout"
I must say, HTML's task handling is now extremely complicated.  It would take me a while to digest it and figure out if it makes sense.
Depends on: 1195336
Making Promises to use the same microtask handling as what MutationObservers use would make us follow the spec quite a bit better, though not perfectly.
I think the only thing that really ended up blocking this was that we had no reentry guard on the microtask queue processing like the spec has.  If we add that, then I believe it would be just fine to run promises off end-of-microtask.  Of course this means that any time you spin the event loop inside a microtask callback promises will be all sorts of broken until you're done with the spinning, which may be a serious problem (consider what happens with alert() in a microtask callback).  :(
https://jsfiddle.net/h0446dky/ from https://github.com/w3c/html/issues/165 seems also like it would be fixed by this.
(In reply to Boris Zbarsky [:bz] from comment #4)
>  Of course this means that any time you spin the event
> loop inside a microtask callback promises will be all sorts of broken until
> you're done with the spinning, which may be a serious problem (consider what
> happens with alert() in a microtask callback).  :(

Our microtask setup (used by MutationObserver) tries to handle that event loop spinning at least in common cases. That is why there are those nsAutoSyncOperation stack variables.
> Our microtask setup (used by MutationObserver) tries to handle that event loop spinning at least in common cases.

Right, but the point is that per spec spinning the event loop in a microtask callback should NOT lead to microtask reentry.  So whatever we're doing is presumably a spec violation...

The testcase linked from comment 5 doesn't seem like it would be related to this bug; I expect us to process the promise queue in there.  But I'd need to check what actually happens in that case; will try to do that in a bit.
(In reply to Boris Zbarsky [:bz] from comment #7)
> > Our microtask setup (used by MutationObserver) tries to handle that event loop spinning at least in common cases.
> 
> Right, but the point is that per spec spinning the event loop in a microtask
> callback should NOT lead to microtask reentry.  So whatever we're doing is
> presumably a spec violation...
Yes, that is what we do for MutationObservers. No re-entry (in the common cases). That is what I meant "tries to handle".
OK, so if you put up an alert from inside a mutation observer and then switch away from that tab, does that break mutation observers for all other pages until you go back to that tab and click away the alert?
OK, we do process the promise queue before parsing is done, but it's off the event loop, so the parser doesn't have a defined insertion point there.

I guess we could try to do that by aligning promise processing better with our current microtask thing (and implementing the no-reentry thing and then seeing whether it's sufficiently web-compatible in practice given non-app-blocking alerts, which I doubt), or by just manually processing in nsScriptLoader::EvaluateScript...
Ultimately I want to have per-window/whatever event and microtask queues that we can turn on or off on a whim.

That does require the various specs to converge on a definition of microtask.  At one point MutationObservers and Promises were doing different things. idk if that's been fixed.
(In reply to Boris Zbarsky [:bz] from comment #10)
> or by just manually processing in
> nsScriptLoader::EvaluateScript...
FWIW, http://mxr.mozilla.org/mozilla-central/source/dom/base/nsScriptLoader.cpp#936
That's before the script runs.  The microtask stuff after the script runs is in nsScriptLoader::EvaluateScript.
It _is_ sort of related.  I think if we did promise stuff at end of callback (but only if not reentering and only if the JS stack is empty, which is what the current spec seems to call for) the testcase linked in comment 14 would be fixed.

The problem is that the "JS stack is empty" thing is rather bogus in practice.  See spec discussion at https://github.com/whatwg/html/issues/1294 and the resulting "er, actually this may not be what this spec says" note in the spec.  So we'd basically have to make up what "JS stack is empty" actually means.
See Also: → 1292685
Assignee: nobody → bugs
Attached patch wip (obsolete) β€” β€” Splinter Review
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e6d9b8d75f552ee73488e7cf9ac355fa913bc2be
(that is on top of f713114b8c8d)

Quite a few browser chrome and devtools issues.
Flags: needinfo?(ejpbruel)
FWIW, I've fixed all the failing devtools tests, and some browser chrome tests, but there are still dozens of browser chrome tests to fix.
Assuming the ni? in comment 18 is no longer relevant.
Flags: needinfo?(ejpbruel)
Blocks: 1317481
Adding the optimizations described in bug 1317481 should wait for this bug to be resolved: we need to have Gecko tell SpiderMonkey whenever a microtask is enqueued, and it doesn't make sense to do that before actually running promises off of microtasks.
Current state
https://treeherder.mozilla.org/#/jobs?repo=try&revision=06303808e6bbfc1588c2608108fc4a2a0e0d15d6
FWIW, the current status is https://treeherder.mozilla.org/#/jobs?repo=try&revision=e470859790dd1f572e957f5e7af915a2fda7dc75&group_state=expanded&filter-searchStr=Linux%20x64%20debug
Still quite a few tests to fix, or mostly tests I think.
Any progress on that? It seems to have an observable impact and the longer we go without it, the more code is written to expect the broken behavior.
I need to get back to this. We just have so many broken browser and devtools tests :/
Any chance this work can be chunked into pieces? It feels like something worth a collaborative effort. Maybe a sprint?
(In reply to Olli Pettay [:smaug] (vacation-ish until July 30) from comment #26)
> I need to get back to this. We just have so many broken browser and devtools
> tests :/

Would it be possible to:

1. Make a pref for that enables the new behavior
2. Change broken devtools/browser tests use the old behavior
3. Migrate broken tests to new behavior incrementally
4. Remove pref and old behavior

This would at least prevent any new tests from being written and fix our content visible behavior.
Flags: needinfo?(bugs)
(In reply to Ben Kelly [:bkelly] from comment #28)
> (In reply to Olli Pettay [:smaug] (vacation-ish until July 30) from comment
> #26)
> > I need to get back to this. We just have so many broken browser and devtools
> > tests :/
> 
> Would it be possible to:
> 
> 1. Make a pref for that enables the new behavior
Not really. This changes everything in rather low level. Having a pref would be hard, as far as I see.

I'll try to get back to this again, asap.
Flags: needinfo?(bugs)
Blocks: 1388979
Olli, we've noticed several idb wpt failures recently that are very likely due to this task. Bevis is able to help follow up the work if you aren't able to get back to this soon. Please let us know what's the best way to move this forward. :)
Flags: needinfo?(bugs)
Priority: -- → P2
Let me rebase the patches I have and then we need to look at the tests again.
Flags: needinfo?(bugs)
I'd be willing to help fix some of the broken tests, if that would be useful. Feel free to needinfo me if it would.
Attached patch promises_as_microtasks.diff (obsolete) β€” β€” Splinter Review
This is on top of the patch for bug 1406922.

Bevis, want to take a look. 

At least browser starts with this, let's see how broken tests are.
I'll update some my test fixes, but lots of code has moved to use async-wait, which may have changed stuff.

If we can get bug 1406922 to m-i, then we can keep rebasing this patch and fix tests in bugs which block this one.


remote: View your change here:
remote:   https://hg.mozilla.org/try/rev/812f6d76e8a2edee30bff88c00688eb5e86bf7e2
remote: 
remote: Follow the progress of your build on Treeherder:
remote:   https://treeherder.mozilla.org/#/jobs?repo=try&revision=812f6d76e8a2edee30bff88c00688eb5e86bf7e2
remote: 
remote: It looks like this try push has talos jobs. Compare performance against a baseline revision:
remote:   https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=812f6d76e8a2edee30bff88c00688eb5e86bf7e2
Attachment #8916783 - Flags: feedback?(btseng)
Comment on attachment 8916783 [details] [diff] [review]
promises_as_microtasks.diff

Review of attachment 8916783 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for summarizing these, Olli!
I'd like to take this bug to follow up your patches and suggestions to keep fixing the broken tests. :)
Attachment #8916783 - Flags: feedback?(btseng)
Take this bug to follow up.
Assignee: bugs → btseng
Attached patch rebased (obsolete) β€” β€” Splinter Review
Attachment #8916783 - Attachment is obsolete: true
While investigating the failures in tryserver, I found that this common assertion failure [1] at CycleCollectedJSContext::ProcessMetastableStateQueue() introduced by bug 1179909 looks unnecessary to be true.

According to the attached log, I just realized that it's possible to dispatch a DOMEvent in stable state and an AutoMicroTask instance will be used to perform a microtask check point after the JS callback of the DOM Event is returned.
In this case, the mDoingStableStates is true because we are still clearing the StableStateQueue.

The reason why this was not captured obviously is that we didn't run into AfterProcessMicrotask()/ProcessMetastableStateQueue() in PerformMicroTaskCheckPoint() if the MicroTaskQueue is empty but we do now to comply the spec change in 
https://html.spec.whatwg.org/multipage/webappapis.html#perform-a-microtask-checkpoint
(Note: The step 4 of "Cleanup Indexed Database transactions" could be mapped to what we do in ProcessMetastableStateQueue() currently).

[1] https://treeherder.mozilla.org/logviewer.html#?job_id=137149466&repo=try&lineNumber=2052
Attached patch WIP- Part 1: Microtasks and promises (obsolete) β€” β€” Splinter Review
Update WIP patch to remove improper assertion mentioned in comment 37 and re-run try server to identify possible failures:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7e2d1920398dcecf71db1ab6b1cfdf973c0f04e1
Attachment #8801311 - Attachment is obsolete: true
Attachment #8917340 - Attachment is obsolete: true
Attachment #8918773 - Attachment description: wip → WIP- Part 1: Microtasks and promises
This is to remove the expected failure checks to expect these failures be PASSED if the fix in patch part 1 in included.
(Note: all WPT shall be green on try after both patches of part1/part2 are applied.)
comment 37 shows a bug to fix in media code.
(In reply to Olli Pettay [:smaug] (work week Oct 16-20) from comment #40)
> comment 37 shows a bug to fix in media code.
Do you mean that a DOMEvent shouldn't be fired in media code during stable state?
(In reply to Bevis Tseng[:bevis][:btseng] from comment #41)
> (In reply to Olli Pettay [:smaug] (work week Oct 16-20) from comment #40)
> > comment 37 shows a bug to fix in media code.
> Do you mean that a DOMEvent shouldn't be fired in media code during stable
> state?

BTW, there is one thing abnormal compared to what we do in media code and the spec:
1. In HTMLMediaElement, the stable state is used to run its "synchronous section":
   http://searchfox.org/mozilla-central/source/dom/html/HTMLMediaElement.cpp#1880-1912
2. However, according to the spec, the "stable state" is now merged back to the microtask check point by queuing a microtask instead of adding task into a queue of the stable state.

There seems some history in spec that the stable state was represented as micotask check point.
(In reply to Bevis Tseng[:bevis][:btseng] from comment #42)
> 2. However, according to the spec, the "stable state" is now merged back to
> the microtask check point by queuing a microtask instead of adding task into
> a queue of the stable state.
Here is the link of "await a stable state":
https://html.spec.whatwg.org/multipage/webappapis.html#await-a-stable-state
> There seems some history in spec that the stable state was represented as
> micotask check point.
(In reply to Bevis Tseng[:bevis][:btseng] from comment #41)
> (In reply to Olli Pettay [:smaug] (work week Oct 16-20) from comment #40)
> > comment 37 shows a bug to fix in media code.
> Do you mean that a DOMEvent shouldn't be fired in media code during stable
> state?

Yes. We've been fixing other similar issues in media code before, where it does something during metastablestate.
Hmm, though this is perhaps a bit different issue since we trigger metastablestate within stable state.

But anyhow, is the issue only about some chrome only event? Couldn't that fire async?
(In reply to Olli Pettay [:smaug] (work week Oct 16-20) from comment #44)
> Yes. We've been fixing other similar issues in media code before, where it
> does something during metastablestate.
> Hmm, though this is perhaps a bit different issue since we trigger
> metastablestate within stable state.
> 
> But anyhow, is the issue only about some chrome only event? Couldn't that
> fire async?

After testing assertion of !CycleCollectedJSContext::mDoingStableStates in EventDispatcher::Dispatch on try using pure m-c branch [1], I found that this is the only source that abuse the stable state to dispatch an DOM event(Chrome only) and was introduced from bug 1274919:
http://searchfox.org/mozilla-central/diff/1499ee07106852b1146f573a00a2514189212f7b/dom/media/MediaDecoder.cpp#183

However, it's still confusing.
If await-stable-state now in HTML spec [2] is the same to queue a microtask, why should we concern about firing DOMEvent from stable state when DOM event from a microtask is possible?
I understood that microtask queue gives the implementation opportunities to finish some sub-tasks generated by current tasks before processing next tasks. However, it's difficult to know what can/cannot be done in stable state/microtask from the spec. :|

BTW, I am also doing another try to do HTMLMediaElement::RunInStableState() using microtask [3].
However, this makes media code inconsistent to other ::RunInStableState() related to media (HTMLTrackElement, TextTrackManager, MediaManager, MediaStreamGraph, AbstractThread) which might introduce new ordering issues if the order in StableState queue matters.

In addition, there is another try to replace all StableStates with Microtask [4].
Hopefully, this help us to clarify potential problems to be fixed.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1bee6bf1006e7627557053e94dfca13f4754b0b5&group_state=expanded
[2] https://html.spec.whatwg.org/multipage/webappapis.html#await-a-stable-state
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=ff5f0a758fda20b86825ec534e415ed57394bb4c
[4] https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ef01d31a5f57b9f7c16822595c8b6c2578ba00d
JS dispatching events during microtask is totally fine.
But the assertion is about stable state, which is still in our implementation closer to the old spec.
Can we please not fix stable state handling too in this bug, just to limit the scope a bit.

And in general, if JS runs during our current stable state, the stable state becomes something quite different, since event loop may now start spinning there and what not. That clearly hints about a bug in code dealing with stable states.
Summary the spec and the problems in FF about microtask checkpoint:
https://docs.google.com/presentation/d/1momsC3suU8m-CrdZyYD_6QATATehjzZHbkGmL6KsmSk/edit?usp=sharing
Update the status on try of today's linux-x64-dbg-build:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=958ab2ae4da88c1cab2bb60d5d76431b281a6c36&group_state=expanded
1. all WPT are green now.
2. the remained orange factors are c(1,3), m(2,5,9,15,16), mda(2,3), bc(1,2,3,4,5,6,7,8,9,10,13,14,15,16), dt(2,3,4,6,7).

I'll keep fixing these failure with the following orders mda,m,bc,c,dt according to my familiarity to these tests.

[Want to Help?]
Welcome!!! \o/
As explained in the slides of comment 47, the fix in this bug makes FF fundamentally more web-compatible to the execution orders between events/callbacks and microtasks.
Hence, for anyone who is interested in seeing this happened asap, it is encouraged to import the patches of this bug from try and file a bug that blocks this bug to fix any of these orange factors.

In addition, here are some clues to fix these failure:
1. The major reason that causes these failure is the change to support microtask chekpoint at the end of outermost JS callback for promise callbacks and there will only be single microtask queue shared by MutationObservers and Promises.
2. However, in current FF implementation, all the promise callbacks are only expected to be executed at the end of current task.
3. With the fix of this bug, it is expected to be executed earlier, for example, at the return of an event listener or at the return of a JS callback.
4. See the patches in bug 1413125 and bug 1413466 as examples for fixing these kinds of failure.

For example, if this is an implementation of current task:
> #1 ::Run() {
> #2   DispatchTrustedEvent(NS_LITERAL_STRING("controllerchange"); // event listeners will be invoked if added.
> #3   mCallback->Call(); // where 'mCallback' is an instance of CallbackFunction. i.e., JS callback.
> #4   doSomething();
> #5 }
Instead of executing all promised callbacks settled at #2 or #3 after Run() returns,
1. Callbacks of the Promises settled at #2 will be executed at the return of each event listener called inside DispatchTrustedEvent().
2. Callbacks of the Promises settled at #3 will be executed at the return of callbackFunction->Call().
Depends on: 1413817
Comment on attachment 8918773 [details] [diff] [review]
WIP- Part 1: Microtasks and promises

nsContentUtils::IsSubDocumentTabbable is just refined a bit so that we properly detect zombie viewer from focus handling point of view. If we're in a load event handling, we should be able to focus the current document.

Animation.cpp change is just making the code to use MicroTaskRunnable, and not Runnable anymore.
Attachment #8918773 - Flags: review?(btseng)
Comment on attachment 8918773 [details] [diff] [review]
WIP- Part 1: Microtasks and promises

Review of attachment 8918773 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/WorkerPrivate.cpp
@@ +6162,5 @@
>  
>        // XXXkhuey should we abort JS on the stack here if we got Abort above?
>      }
> +    CycleCollectedJSContext* context = CycleCollectedJSContext::Get();
> +    context->PerformDebuggerMicroTaskCheckpoint();

if (context) {
  context->PerformDebuggerMicroTaskCheckpoint();
}
Attachment #8918773 - Flags: review?(btseng) → review+
Attached patch (v1) Part 1: Microtasks and promises (obsolete) β€” β€” Splinter Review
rebase after bug 1409985 was fixed.
Attachment #8918768 - Attachment is obsolete: true
Attachment #8918773 - Attachment is obsolete: true
Attachment #8924811 - Flags: review+
This is to update the expected failure caught on try if part 1 is merged.
Attachment #8918783 - Attachment is obsolete: true
Attachment #8924813 - Flags: review?(bugs)
Comment on attachment 8924813 [details] [diff] [review]
(v1) Part 2: Update expected failure checks.

I guess extendable-event-async-waituntil.https.html case is a bug in the testcase and will be fixed elsewhere.
Attachment #8924813 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #53)
> I guess extendable-event-async-waituntil.https.html case is a bug in the
> testcase and will be fixed elsewhere.
Yes, bug 1409979 is fired to address this issue.
Bevis, it would be nice to run the try on all platforms, I saw test cases that fail on Android but not on Linux, see bug 1415734.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #55)
> Bevis, it would be nice to run the try on all platforms, I saw test cases
> that fail on Android but not on Linux, see bug 1415734.

Will do. Thanks for reminding!
Bug 1413817 already depends on bug 1415734.
No longer depends on: 1415734
Update try result on all platforms:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc35b181a4a55c0bd3cca8a19346ff224545a649

I'll fire more follow-up bugs for the failures not found previously on linux-build in comment 48.

Please see comment 48 for the reason and potential solutions of these failures.

Bug 1414173 also gives another example for the fix by postponing the verification of the test until the next task.
Depends on: 1415783
Remove following expected failures on OSX according to the result on try in comment 58:
testing/web-platform/meta/webvtt/rendering/cues-with-video/processing-model/basic.html.ini
testing/web-platform/meta/webvtt/rendering/cues-with-video/processing-model/bidi/bidi_ruby.html.ini
testing/web-platform/meta/webvtt/rendering/cues-with-video/processing-model/bidi/u0041_first.html.ini
testing/web-platform/meta/webvtt/rendering/cues-with-video/processing-model/bidi/u06E9_no_strong_dir.html.ini
Attachment #8924813 - Attachment is obsolete: true
Attachment #8926746 - Flags: review+
Attached patch (v2) Part 1: Microtasks and promises. (obsolete) β€” β€” Splinter Review
1. Rebase after nsGlobalWindowInner is introduced.
2. Simplify ProcessMetastableStateQueue to ClearIDBTransactions.

Treeherder results on all platforms for reference:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3f98ca453ea0c49c7a259ef5355c3a19d3736a9c

In addition, I'd like to do some experiment to make promise callbacks be processed in old fashion even though the MT queue is unified to see if we could land this patch asap.
Attachment #8924811 - Attachment is obsolete: true
Attachment #8929363 - Flags: review+
Update treeherder result:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b5800bb624450f22893e4918d653b3dbe231c4ac

Note: Patch part 1 on try has been rebased after CustomElementRegister.h/cpp is changed in bug 1419305 is landed.
FWIW, I still think it would be a good idea to turn the new behavior on by default and somehow mark these tests as needing work.  Or out-right disable the tests.
Can we organize a sprint in Austin to tackle those tests?
The risk is that the issue might not be in tests only but in actual code relying on the old behavior. That is why we should, IMO, go through the failing tests and fix them or the relevant production code.
FWIW, arai and hiro are working really hard on fixing all those tests. We're making good progress. Arai has nearly done all the browser-chrome ones (except for one, the remaining ones are just blocked waiting for review) and is moving on to the DevTools ones.
(In reply to Olli Pettay [:smaug] (ni?s and f?s processed after r?s) from comment #65)
> The risk is that the issue might not be in tests only but in actual code
> relying on the old behavior. That is why we should, IMO, go through the
> failing tests and fix them or the relevant production code.

at least a part of bug 1423008 is outside of test code.
so I feel it's really risky to land this with tests disabled.
It's risky to turn it on by default, according to some fix I have done, which causes unexpected crash or function broken if the implementation relying on the execution order in old scheduling heavily. Some of the fix to the test cases were rejected by reviewers (Bug 1414176) after they realized more issues inside their implementation if the scheduling is changed.

In addition, it's not possible to land it and disable it by default as well after some experiment is done in bug 1420816 which was blocked by some implementation relying on the order of mutation observers and promise callbacks which increase the complexity too much to make new scheduling compatible with the old one.

Hence, it's more reasonable to grant help from each module owner/peer of each failed test to fix it properly.
It's really appreciated that :arai and :hiro have almost fixed all browser-chrome tests and dom/animation tests.

About organizing a sprint in Austin,
according to the bug I've fixed so far, I expect this will take several run of sprint, so it's not enough to do it only in Austin.
It shall be more helpful to figure out an assignee of each failure dependency and fix them in higher priority instead if we think make Firefox more web-compatible with the microtask algorithm defined in HTML standard is very important (of course, it is!) especially when more and more developers use promise/await function for async function calls in their web contents.
Update the reasons why we need this to fix asap and how to fix it into the slides in comment 47.
Ok, thanks for explaining!
Weekly update of the try result:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8338f5cd7bd3a24b01db8c7f28f661b8f878ea40&selectedJob=151143213

1. We have all bc test green on linux64 \o/, although there are still several oranges other platforms.
2. The failures on U2F found last week is fixed in bug 1422661.
Update try result:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b71a2448564552227b51ca4db12e12e242a50836

1. rebase microtask patch due to the fix of bug 1416966 in dom/animation.
2. the failures in dom/network had been fix in bug 1415797.
Blocks: 1425778
summary for test failures here.  hope it helps :)
(it's rough summary, so it doesn't filter pre-existing intermittent or something like that)
https://docs.google.com/a/mozilla.com/spreadsheets/d/1iH50biQLuUSMSwMGd5LZFchDgmbAFjlEYdM92nDiPAc/edit?usp=sharing
Wow, that's some impressive analysis!
I suppose that based on the state of bug 1414180 we're hoping the browser-chrome chunk will shrink / disappear in the next run.
Update the try result rebased on today's m-c:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe67e1680fd8958d759cf41662f2db8c152d76e3
Update try result on today's m-c:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=caddd5487d746816bb03fe7c691f5e8d03164cbc

Note: Patch part 1 has been rebased due to the fix of bug 1409976 in dom/base/nsDOMMutationObserver.cpp.
Weekly update of the try result:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c7b7ba9086f5524823ea95f717f4bfbe0430313a
Depends on: 1430378
Weekly update of the try result:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc889458d9c94fd40e976c572ff37e7556298307
No longer depends on: 1415796
Assignee: bevistseng → nobody
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8ef75c03168fa73749a49a5a6f0e51652c943a1a
Depends on: 1432396
separated the result for e10s and non-e10s
https://docs.google.com/spreadsheets/d/1iH50biQLuUSMSwMGd5LZFchDgmbAFjlEYdM92nDiPAc/edit#gid=51109845
reflected the file move (dom/workers/ServiceWorkerPrivate.cpp => dom/serviceworkers/ServiceWorkerPrivate.cpp)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bbc47847154283c68f27e071eedfd1347f0f0cb2
Blocks: 1434146
Assigning to :arai since this bug needs an owner.
Assignee: nobody → arai.unmht
Status: NEW → ASSIGNED
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c80025c134b1855e14a19899a74fbd3fbe51ec34
(note that some xpcshell test mail fail due to bug 1435644)
Is bug 1414176 the last perma orange?  If so, I think we can land this once bug 1414176 has fixed and file intermittent failure bugs for the rest.  Oh, bug 1432396 is also perma orange on non-E10S.
there are 3 remaining bugs:
  Bug 1414176 - Fix failure WebRTC tests relying on non-comformant Promise handling
  Bug 1431960 - browser/components/preferences/in-content/tests/browser_subdialogs.js fails after bug 1193394
  Bug 1432396 - Some layout/style/test/ tests fail with "ASSERTION: Stop called too early or too late" and "ASSERTION: No document in Destroy()!" on non-e10s after bug 1193394

and all bugs have patches :)
and yes, bug 1431960 is intermittent.
Depends on: 1437248
rebased onto m-c
https://treeherder.mozilla.org/#/jobs?repo=try&revision=701a987d4fcddd24a4f67f08df6f9ee2b9b81f3a

some new unknown failure happens
I'll investigate tomorrow
The assertion "!presContext->HasPendingMediaQueryUpdates" is also mine (bug 1436625). :/
See Also: → 1436625
Depends on: 1437680
(In reply to Hiroyuki Ikezoe (:hiro) from comment #92)
> The assertion "!presContext->HasPendingMediaQueryUpdates" is also mine (bug
> 1436625). :/

It's unrelated. :)
See Also: 1436625
Depends on: 1437712
Depends on: 1437714
Depends on: 1437728
here's rebased patch
Attachment #8929363 - Attachment is obsolete: true
Attachment #8950788 - Flags: review+
Attachment #8926746 - Attachment is obsolete: true
Attachment #8950789 - Flags: review+
Rebased
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9653afa89f067d4a8146ffab41835e9153808fd

4 new possibly-unknown failure
https://docs.google.com/spreadsheets/d/1iH50biQLuUSMSwMGd5LZFchDgmbAFjlEYdM92nDiPAc/edit#gid=411918523
I'll check them
Depends on: 1439471
Depends on: 1439472
I did push a try run on Windows64.

DAMP will be regressed by this bug.  It might be possible that the test measures the time at inappropriate points.

Joel, can you please take a look it?

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-central&newProject=try&newRevision=9606ff55e53b7c25944c29edf6d034b70146c94c&originalSignature=2e2b666b2c9746ca9fe1184efc6f182c330e8398&newSignature=2e2b666b2c9746ca9fe1184efc6f182c330e8398&framework=1&selectedTimeRange=172800
Flags: needinfo?(jmaher)
did you test this on opt as well?  I am looping in :ochameau as he has been working heavily on making sure the damp test is meaningful to the devtools team.
Flags: needinfo?(jmaher) → needinfo?(poirot.alex)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #97)
> I did push a try run on Windows64.
> 
> DAMP will be regressed by this bug.  It might be possible that the test
> measures the time at inappropriate points.
> 
> Joel, can you please take a look it?
> 
> https://treeherder.mozilla.org/perf.html#/
> comparesubtest?originalProject=mozilla-
> central&newProject=try&newRevision=9606ff55e53b7c25944c29edf6d034b70146c94c&o
> riginalSignature=2e2b666b2c9746ca9fe1184efc6f182c330e8398&newSignature=2e2b66
> 6b2c9746ca9fe1184efc6f182c330e8398&framework=1&selectedTimeRange=172800

If you are concerned about console.openwithcache.open.settle regression, it is fine.
All the "settle" subtests are here to track the possible pending operations we don't correctly wait for after each test.
Your patch propably highlights these pending operations we miss.

Please proceed with your patch and ni? me if you get any reports from perf sherrifs.
Flags: needinfo?(poirot.alex)
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #98)
> did you test this on opt as well?

Yes, opt build also regresses, I did test it locally.

 (In reply to Alexandre Poirot [:ochameau] from comment #99)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #97)
> > I did push a try run on Windows64.
> > 
> > DAMP will be regressed by this bug.  It might be possible that the test
> > measures the time at inappropriate points.
> > 
> > Joel, can you please take a look it?
> > 
> > https://treeherder.mozilla.org/perf.html#/
> > comparesubtest?originalProject=mozilla-
> > central&newProject=try&newRevision=9606ff55e53b7c25944c29edf6d034b70146c94c&o
> > riginalSignature=2e2b666b2c9746ca9fe1184efc6f182c330e8398&newSignature=2e2b66
> > 6b2c9746ca9fe1184efc6f182c330e8398&framework=1&selectedTimeRange=172800
> 
> If you are concerned about console.openwithcache.open.settle regression, it
> is fine.
> All the "settle" subtests are here to track the possible pending operations
> we don't correctly wait for after each test.
> Your patch propably highlights these pending operations we miss.
> 
> Please proceed with your patch and ni? me if you get any reports from perf
> sherrifs.

Thanks!  We will leave the perf regression here in this bug.

Thank you both!
Weekly update
https://treeherder.mozilla.org/#/jobs?repo=try&revision=e75eb8b4c78f3f63c8b392adc75f455fedecf14b
Depends on: 1441388
Depends on: 1441424
I think it's better landing this bug's patches even if new failure for newly added tests aren't fixed (bug 1441388, bug 1441424, and maybe some more coming?),
otherwise we won't be able to land them ;)

so, I'd like to land them once existing important bugs (especially bug 1414176) get fixed,
and temporarily disable new failing tests, does that plan sound fine?
Flags: needinfo?(hikezoe)
Yeah, if the new tests were added without any implementation changes, I think we can land this with disabling the tests.  In any cases, we should discuss the test author individually.
Flags: needinfo?(hikezoe)
Yeah, I agree if there are new broken tests, those can be just disabled temporarily.
(looks like patch part 1 has wrong username)
rebased and fixed author
Attachment #8950788 - Attachment is obsolete: true
Attachment #8954607 - Flags: review+
Attachment #8950789 - Attachment is obsolete: true
Attachment #8954619 - Flags: review+
Note that I did take a chat with :cpearce about bug 1441424, and he told me that he can take a look at it tomorrow, and he is fine with disabling it.  So we will land this bug with disabling it if he couldn't.
I think he said, "ok. you can disable the test. I'll try to re-enable later on this week." -- i.e. no need to wait until tomorrow.

However, we should send a notice to dev-platform about this change.

We should probably:

* Link to Bevis' original description of the change: https://groups.google.com/forum/#!searchin/mozilla.dev.platform/bevis/mozilla.dev.platform/naB6gaevSSo/w-29kbpUBQAJ
* Explain that this brings us into line with the HTML spec and Chrome (what about Safari? Edge?)
* Possibly mention that this is increasingly necessary as Promises become more widely used (smaug mentioned to me last week that it's important for Custom Elements)
* Refer people to Bevis' slides if they encounter tests that fail as a result of this
* Mention that Olli and Bevis started this work (and arai, hiro, and others are finishing it)
* Mention that adding a pref for this behavior was considered but rejected due to the complexity it would involve: https://bugzilla.mozilla.org/show_bug.cgi?id=1420816#c9

I suppose a combined "Intent to implement and ship"[1] would be appropriate.

[1] https://wiki.mozilla.org/ExposureGuidelines#Intent_to_implement_and_ship
Try run with all patches applied (bug 1439472, and draft patch to disable bug 1441424 test), to see if there's new failure that needs to be disabled or needs extra investigation.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6844d6856193cdcbabe07d0a2715a581feef8c15
Thank you Brian.  I did it.
https://groups.google.com/forum/#!topic/mozilla.dev.platform/bGFs5G8E_NI
apparently something in session and session restore implementation changes :P
this is the pushlog between the last and the current try's parents
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=580d833df9c4&tochange=ee326c976eeb
for example in https://searchfox.org/mozilla-central/source/browser/base/content/test/tabs/browser_bug580956.js
waiting for an event tick after removing tab fixes.

some others seems also to be about undo close tab
rough list of failing tests:
  browser/base/content/test/tabs/browser_bug580956.js
  browser/components/sessionstore/test/browser_350525.js
  browser/components/sessionstore/test/browser_454908.js
  browser/components/sessionstore/test/browser_456342.js
  browser/components/sessionstore/test/browser_628270.js
  browser/components/sessionstore/test/browser_911547.js
  browser/components/sessionstore/test/browser_broadcast.js
  browser/components/sessionstore/test/browser_dying_cache.js
  browser/components/sessionstore/test/browser_formdata.js
  browser/components/sessionstore/test/browser_formdata_cc.js
  browser/components/sessionstore/test/browser_frame_history.js
  browser/components/sessionstore/test/browser_page_title.js
  browser/components/sessionstore/test/browser_sessionHistory.js
  browser/base/content/test/general/browser_bug817947.js
  browser/base/content/test/general/browser_ctrlTab.js
  xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentscript.js
possibly from bug 1392352 or bug 1432509.

bug 1392352 changes tabbrowser.xml to js, so it means that MicroTask checkpoints in XBL accessors are removed.
I'll check backing out that fixes.

if it's the reason, possible solutions are:
  1. backout bug 1392352 for now
  2. fix tests to wait for an event tick to make sure microtask checkpoint is performed before checking some state
     (looks like most tests are checking or using session-related UI or feature, so requiring an extra event tick won't be an issue)
  3. fix implementation somehow (?)
triggered try with backing out bug 1392352
https://treeherder.mozilla.org/#/jobs?repo=try&revision=73e026cabb23f72f6fbd362f22a0cc741b2bd7d1&filter-searchStr=bc%20linux%2064%20opt

meanwhile, I'll check if it's possible to fix testcases.
so far I confirmed that waiting an event tick after closing tab fixes 4 of them.
turns out that all of them are the same issue.
removing a tab won't update the closed tabs information instantly,
but those tests expect that.
So, just waiting for an event tick solves.

then, backing out bug 1392352 doesn't fix the orange, so it should be from something else.
anyway, I think just fixing tests should be fine.
Depends on: 1441872
reduced the range
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=724b51d7263f&tochange=ee326c976eeb
possibly here
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=724b51d7263f&tochange=83c87140dc3d
FWIW, I just pushed a try with the patch for bug 1441872 and :cpearce patch in bug 1441424.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a7d2e936e716831852bfcd9d052549e087608d4
arai also pushed a try;
https://treeherder.mozilla.org/#/jobs?repo=try&revision=821ea90bc82a42f6df93b3aeded41b1dfe163a4f&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-resultStatus=runnable

And it seems to dom/animation/test/chrome/test_running_on_compositor.html started failing on Android.  I am fine with disabling  the test.  Filed bug 1442150 for that and enabling it again.
Depends on: 1442150
https://hg.mozilla.org/integration/mozilla-inbound/rev/55c94c05c57f0bb34ca131c68c79e05a97aa4b6c
Bug 1193394 - Part 1: Microtasks and promises scheduling. r=bevis

https://hg.mozilla.org/integration/mozilla-inbound/rev/a54ef2d8f89626be6809676ff3534f077910052c
Bug 1193394 - Part 2: Update expected failure checks. r=smaug
seems like there's collision with bug 1420939.
I'll apply some extra workaround and disable some others.
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8632bbbd2730a57af28ba0c356f20b01a6185d5
Bug 1193394 - Part 3: Disable browser_ext_browserAction_popup.js and browser_ext_browserAction_popup_resize.js. r=bustage

https://hg.mozilla.org/integration/mozilla-inbound/rev/29e1fceaf48d9a09662a18789d034f3093c4b5e8
Bug 1193394 - Part 4: Wait for the next event tick before calling promiseDocumentFlushed in descriptionHeightWorkaround. r=bustage CLOSED TREE
I'll file followup bugs for them
Blocks: 1442187
Blocks: 1442189
No longer blocks: 1442189
Depends on: 1442189
No longer blocks: 1442187
Depends on: 1442187
https://hg.mozilla.org/integration/mozilla-inbound/rev/a0e26f6b2784e7946f166dbaed90861342fa6fb1
Backed out changeset 29e1fceaf48d (bug 1193394)

https://hg.mozilla.org/integration/mozilla-inbound/rev/0e7140a7c841c3b74b70d050636a01b9d4619b56
Bug 1193394 - Part 5: Disable some tests that relies on descriptionHeightWorkaround. r=bustage CLOSED TREE
Depends on: 1442218
https://hg.mozilla.org/integration/mozilla-inbound/rev/9683f24ff8ec5ba768c4a0a124d8439c228b2c8b
Bug 1193394 - Part 6: Disable browser_devices_get_user_media_unprompted_access.js and browser_ext_commands_execute_browser_action.js. r=bustage CLOSED TREE
Backed out browser-chrome failures on browser_ext_popup_background.js
backout: https://hg.mozilla.org/integration/mozilla-inbound/rev/a416b0a21b1395dfe7dc28577a31be57c075d9b4
push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=9683f24ff8ec5ba768c4a0a124d8439c228b2c8b 
failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=165196742&repo=mozilla-inbound&lineNumber=12857

[task 2018-03-01T14:05:09.256Z] 14:05:09     INFO - TEST-PASS | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Arrow content should have correct background - 
[task 2018-03-01T14:05:09.260Z] 14:05:09     INFO - TEST-PASS | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Arrow should have correct background - 
[task 2018-03-01T14:05:09.266Z] 14:05:09     INFO - Test that dynamically-changed background color is applied
[task 2018-03-01T14:05:09.267Z] 14:05:09     INFO - TEST-PASS | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Arrow content should have correct background - 
[task 2018-03-01T14:05:09.270Z] 14:05:09     INFO - TEST-PASS | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Arrow should have correct background - 
[task 2018-03-01T14:05:09.272Z] 14:05:09     INFO - Test that non-opaque background color results in default styling
[task 2018-03-01T14:05:09.273Z] 14:05:09     INFO - TEST-PASS | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Arrow fill should be the default one - 
[task 2018-03-01T14:05:09.275Z] 14:05:09     INFO - Test menu panel browserAction popup
[task 2018-03-01T14:05:09.277Z] 14:05:09     INFO - Console message: Warning: attempting to write 7264 bytes to preference browser.uiCustomization.state. This is bad for general performance and memory usage. Such an amount of data should rather be written to an external file. This preference will not be sent to any content processes.
[task 2018-03-01T14:05:09.279Z] 14:05:09     INFO - Console message: Warning: attempting to write 7320 bytes to preference browser.uiCustomization.state. This is bad for general performance and memory usage. Such an amount of data should rather be written to an external file. This preference will not be sent to any content processes.
[task 2018-03-01T14:05:09.286Z] 14:05:09     INFO - Buffered messages finished
[task 2018-03-01T14:05:09.288Z] 14:05:09     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Test timed out - 
[task 2018-03-01T14:05:09.289Z] 14:05:09     INFO - Not taking screenshot here: see the one that was previously logged
[task 2018-03-01T14:05:09.290Z] 14:05:09     INFO - TEST-UNEXPECTED-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Extension left running at test shutdown - 
[task 2018-03-01T14:05:09.291Z] 14:05:09     INFO - Stack trace:
[task 2018-03-01T14:05:09.293Z] 14:05:09     INFO -     chrome://mochikit/content/tests/SimpleTest/ExtensionTestUtils.js:ExtensionTestUtils.loadExtension/<:109
[task 2018-03-01T14:05:09.294Z] 14:05:09     INFO -     chrome://mochikit/content/browser-test.js:Tester.prototype.nextTest<:666
[task 2018-03-01T14:05:09.295Z] 14:05:09     INFO -     timeoutFn@chrome://mochikit/content/browser-test.js:1162:9
[task 2018-03-01T14:05:09.296Z] 14:05:09     INFO -     setTimeout handler*Tester_execTest@chrome://mochikit/content/browser-test.js:1124:9
[task 2018-03-01T14:05:09.297Z] 14:05:09     INFO -     Tester.prototype.nextTest</<@chrome://mochikit/content/browser-test.js:958:9
[task 2018-03-01T14:05:09.298Z] 14:05:09     INFO -     SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:795:59
[task 2018-03-01T14:05:09.303Z] 14:05:09     INFO - TEST-FAIL | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | Assertion count 2 is greater than expected range 0-0 assertions. - 
[task 2018-03-01T14:05:09.305Z] 14:05:09     INFO - GECKO(1043) | MEMORY STAT | vsize 2881MB | residentFast 372MB | heapAllocated 101MB
[task 2018-03-01T14:05:09.306Z] 14:05:09     INFO - TEST-OK | browser/components/extensions/test/browser/test-oop-extensions/browser_ext_popup_background.js | took 90226ms
Flags: needinfo?(arai.unmht)
(In reply to Tooru Fujisawa [:arai] from comment #127)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/
> a0e26f6b2784e7946f166dbaed90861342fa6fb1
> Backed out changeset 29e1fceaf48d (bug 1193394)
> 
> https://hg.mozilla.org/integration/mozilla-inbound/rev/
> 0e7140a7c841c3b74b70d050636a01b9d4619b56
> Bug 1193394 - Part 5: Disable some tests that relies on
> descriptionHeightWorkaround. r=bustage CLOSED TREE

Hi, I can totally understand that we would like to get this bug done, but "temporarily" disabling several security UI tests should not happen without Firefox peer review. FWIW, this patch would not get an r+ from me, unless we would have a strong guarantee that the cause of the failure will be fixed within 1 or 2 days and that you're ready to back out your patches again otherwise. Is that the case (sorry for missing some context here)?

Thank you.
Also to be clear, people are working on these tests and on the areas of code covered by these tests right now.
I apologize about disabling the tests.  Honestly we don't want to disable any tests but we have limited knowledge about browser UI code, so there was no choice at that moment.  Actually arai tried a workaround there instead of disabling the tests, but it failed.  So, please help up and get involved.  Thanks!
See Also: → 1442465
Disabling tests with just r=bustage is not acceptable IMO even if it is only temporary. If you need help from front-end engineers to help fix failing tests then there are plenty of us who could be asked for help, you can reach out in #fx-team or any of the Firefox peers should be willing to help: https://wiki.mozilla.org/Modules/Firefox
Sweet. Thanks for your cooperation, next time I will get in touch with you guys.  Now we need your helps in bug 1442465 and bug 1442187.
Hi, first of all I apologize for the disruption to your work.

The broader context for this is: https://groups.google.com/forum/#!topic/mozilla.dev.platform/bGFs5G8E_NI

The context specific to disabling the tests is that hiro and arai have been working tirelessly to fix tests from these patches (something they picked up since the Taipei engineer responsible was no longer available) but in the time it takes to fix one test and wait for review, another 2 or 3 failing tests would be added. Furthermore, this is a core change that was determined infeasible to guard behind a pref[1] such that we could simply pref it off for failing tests.

Hiro and Arai have been running try runs all week, chasing down each new failure and fixing it but by the time they get review for each new failure fix (since the reponsible group is different each time), another 2 or 3 have shown up. Yesterday they got as close as we've ever been in several months of doing this so once the final review came in Arai landed this bug and both Hiro and Arai stayed up very late watching for new failures. They tried very hard to fix the failures from bug 1420939 (which was only pushed to mozilla-inbound about 12 hours prior and hence wasn't included in the most recent try run) but the fix didn't appear to work so the test was disabled until we could investigate it today. Rest assured they understand probably better than anyone else the importance of tests.

I apologize again for the disruption but we'd appreciate your help to get this bug to stick--it blocks so many groups' work and is not in Hiro and Arai's primary area of work but is something they are doing to help unblock others.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1420816#c9
(In reply to Brian Birtles (:birtles) from comment #135)
> The context specific to disabling the tests is that hiro and arai have been
> working tirelessly to fix tests from these patches (something they picked up
> since the Taipei engineer responsible was no longer available) but in the
> time it takes to fix one test and wait for review, another 2 or 3 failing
> tests would be added. Furthermore, this is a core change that was determined
> infeasible to guard behind a pref[1] such that we could simply pref it off
> for failing tests.
> 
> Hiro and Arai have been running try runs all week, chasing down each new
> failure and fixing it but by the time they get review for each new failure
> fix (since the reponsible group is different each time), another 2 or 3 have
> shown up. Yesterday they got as close as we've ever been in several months
> of doing this so once the final review came in Arai landed this bug and both
> Hiro and Arai stayed up very late watching for new failures. They tried very
> hard to fix the failures from bug 1420939 (which was only pushed to
> mozilla-inbound about 12 hours prior and hence wasn't included in the most
> recent try run)

I'm not unsympathetic. But if that's the case, they should have sent out a warning email about the impending large change, and worked with the sheriffs to backout the conflicting changes.

The failing tests suggested that critical parts of our browser UI may have been broken by the change. And even if they weren't, having the tests disabled makes it much more likely that breakage to those parts of the UI would go unnoticed.

If you'd had approval from the relevant peers to disable the tests, that would be one thing. That would at least ensure that the people working on that code could verify that nothing was broken. And that they knew tests were disabled and that their changes might cause silent breakage.
> they should have sent out a warning email about the impending large change

Hiro did, yesterday.  It's on dev-platform, like any other intent to ship.

There was also mail about this from Bevis back in December describing the coming change and asking people to be very careful writing tests that use promises and not to make invalid assumptions about promise resolution timing.  People kept right on writing broken tests, because they had no way to tell they were writing them.

> If you'd had approval from the relevant peers to disable the tests,

The problem is that it takes longer to get approval than it does to add new tests that are broken...

Maybe it would have been good to give people more heads-up that their tests were about to be disabled.  Maybe it would have been ok to close the tree for a few days to land this.

I think the right thing to do at this point is to focus on getting any still-disabled tests reenabled ASAP.
To be clear, I'm not trying to attack anybody over this. But I want to make sure this doesn't happen again.

And since these patches have been backed out, and two peers and the module owner for the disabled tests have commented that the way this was handled was unacceptable, we need to come up with a better plan.

(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #137)
> > they should have sent out a warning email about the impending large change
> 
> Hiro did, yesterday.  It's on dev-platform, like any other intent to ship.

He sent an intent to ship, yes, and I read it. But intent to ship threads are generally about web-facing changes that need sign-off from the relevant peers. That's different from "I'm about to land a disruptive change that might break your patches and require a tree closure".

If they'd sent a message saying that they planned to land this, and if it broke any tests that didn't fail in their try run, those tests would be disabled, I would have objected, several others would have objected, and we would have either come up with a different plan, or come to some other sort of agreement.

> > If you'd had approval from the relevant peers to disable the tests,
> 
> The problem is that it takes longer to get approval than it does to add new
> tests that are broken...

That's true, but those are nonetheless the tree rules. I know we bend them often enough, but this went beyond bending. Some of my tests were disabled, which I wasn't happy about, but I would have let pass. But disabling security UI tests without *any* kind of review, let alone review by a peer of the module goes so far beyond that...

I'm a peer of the module those tests live in, and technically have the right to approve disabling those tests, and I would never even consider disabling them without approval of the people directly responsible for them. Especially the day before a code freeze. I would expect to lose my privileges over something like that.

> Maybe it would have been good to give people more heads-up that their tests
> were about to be disabled.  Maybe it would have been ok to close the tree
> for a few days to land this.

I don't think we would have needed to close the tree for a few days to land this. My suggestion would be:

1) Send an email to dev-platform and dev-firefox announcing that these changes will land during a tree closure.
2) Close the trees, merge to m-c, and land this patch.
3) If any failures occur, bisect to the last green try run, and backout the patches that triggered the failure.
4) Re-open trees once green.

That should take, at most, a few hours.

> I think the right thing to do at this point is to focus on getting any
> still-disabled tests reenabled ASAP.

The disabled tests have been re-enabled by the backout. So if we can land this time without breaking any new tests, that's fine. But if we can't, we need to decide on the right way to handle this.
> we need to come up with a better plan

OK, the better plan I would propose (assuming that the new semantics actually matches the spec for promises) is that people get _very_ responsive when Hiro or Arai approach them for help with getting their tests fixed.  Now how do we get there?

> If they'd sent a message saying that they planned to land this, and if
> it broke any tests that didn't fail in their try run, those tests would be disabled

Just so we're on the same page, the intent to ship _did_ say this:

   "We had fixed all test failures but still it's possible
    that new failures will appear before the change gets merged
    into mozilla-central.  If we found any failed tests we will disable
    it temporarily and file a bug to enable it respectively."

Note that I've objected to people disabling tests in order to land in the past, most vehemently.  So I really do sympathize with your take here, trust me!  I definitely assumed that the relevant test owners would be contacted about any tests that needed to be disabled...

In any case, what all this means in practice here is that landing this needs to be not just Hiro and Arai's responsibility.

> Especially the day before a code freeze.

This, I agree, is a bad idea.

> we need to decide on the right way to handle this.

The right way to handle this is for people who write broken tests (even if they did so inadvertently) to step up and help fix those tests.

> That should take, at most, a few hours.

I suspect just the end-to-end time on the relevant builds will eat up at least a day.  But OK, if we're ok with backing out new tests that fail instead of disabling them, then your proposed 4-step plan would work.  What's not clear to me is why backing out a test is in any way better than disabling it.  Either way, you're no longer running that test.
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #139)
> > If they'd sent a message saying that they planned to land this, and if
> > it broke any tests that didn't fail in their try run, those tests would be disabled
> 
> Just so we're on the same page, the intent to ship _did_ say this:
> 
>    "We had fixed all test failures but still it's possible
>     that new failures will appear before the change gets merged
>     into mozilla-central.  If we found any failed tests we will disable
>     it temporarily and file a bug to enable it respectively."

Ah. Fair enough. But this is why I suggested something separate from an intent to implement, since even though I read the message, I didn't see that.

> > we need to decide on the right way to handle this.
> 
> The right way to handle this is for people who write broken tests (even if
> they did so inadvertently) to step up and help fix those tests.

Agreed. I'm willing to help fix any tests that are affected by this.

> > That should take, at most, a few hours.
> 
> I suspect just the end-to-end time on the relevant builds will eat up at
> least a day.  But OK, if we're ok with backing out new tests that fail
> instead of disabling them, then your proposed 4-step plan would work. 
> What's not clear to me is why backing out a test is in any way better than
> disabling it.  Either way, you're no longer running that test.

I'm not suggesting backing out the tests. I'm suggesting backing out the changesets that caused the test failures. Hiro and Arai identified the changeset that broke the UI tests after their last try run pretty quickly. Backing out that changeset wouldn't have required disabling any tests, and the affected tests could have been fixed when the smaller patches re-landed without risking much new bustage landing in the mean time.
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #140)
> I'm not suggesting backing out the tests. I'm suggesting backing out the
> changesets that caused the test failures. Hiro and Arai identified the
> changeset that broke the UI tests after their last try run pretty quickly.
> Backing out that changeset wouldn't have required disabling any tests, and
> the affected tests could have been fixed when the smaller patches re-landed
> without risking much new bustage landing in the mean time.

That's completely my fault.  At that time, I tried to get in touch with the changeset author and reviewer to get the permission for backing out, but couldn't get in touch.  Whereas I decided to disable tests without permission.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #141)
> That's completely my fault.  At that time, I tried to get in touch with the
> changeset author and reviewer to get the permission for backing out, but
> couldn't get in touch.  Whereas I decided to disable tests without
> permission.

As a general rule, backouts for bustage are always acceptable without permission. Disabling tests for bustage is not.

I can make myself available the next time you try to land this if you need help dealing with any Firefox, toolkit, or XPConnect failures. If I'm not available and you can't find another peer, please prefer backing out to disabling tests.
I apologize for the trouble, and thank you all for your help!

I'm planning to land patches today if remaining patches get r+ (and no failure happens for try runs),
if something goes wrong hardly again, I'll back them out, and try again on Monday morning.


here's try runs for inbound and autoland, and inbound with beta-simulation, with patches from bug 1442187 and bug 1442218:

  inbound: https://treeherder.mozilla.org/#/jobs?repo=try&revision=51c3cb736e418415fd44a239ab220c82fe62b991
  autoland: https://treeherder.mozilla.org/#/jobs?repo=try&revision=52b2a752104d84365b8fdf249f1c47496c0a0862
  inbound beta: https://treeherder.mozilla.org/#/jobs?repo=try&revision=028e20e4e03b9f1496deb6f1e543bc3783c15ac2

and links to changes from the try runs above (to make it easier to figure out what causes the failure, when landing):

  https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b59a2fcd3435&tochange=tip
  https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=9f0ee2f582a2&tochange=tip
Flags: needinfo?(arai.unmht)
I would clearly prefer that we delay this to 61. As this bug has been opened 3 years ago, we can wait a few more days.
Could you please backout the changes and land that in 61 at the beginning of the cycle? (it isn't that far in time)

This will give you more time to fix the regressions in the tests.

Also, please note that we don't have the aurora branch anymore. So, it is critical that the current nightly (which is also in beta) is stable.
Flags: needinfo?(arai.unmht)
(I meant, not landing the patch again until nightly == 61, ie March 12th).
If we delay this to 61, uplifting other patches (I mean, patches unrelated to this bug but somehow touches promise) to ESR will become very troublesome,
as we've seen so many failure caused by the execution timing difference, similar issue may happen for each ESR uplift.
that's the reason why I'd like to land this to 60.
to be clear, it's not a strong opinion.
if keeping beta stable is more important than making ESR uplift safer, I'm fine with delaying.
Flags: needinfo?(arai.unmht)
Sorry but it looks like it is an important change landing very late in the 60 cycle. Seeing that you had to disable tests, I expect other unexpected fallouts during the beta cycle.

Could you detail what will be the impact for ESR?
(In reply to Sylvestre Ledru [:sylvestre] from comment #148)
> Sorry but it looks like it is an important change landing very late in the
> 60 cycle. Seeing that you had to disable tests, I expect other unexpected
> fallouts during the beta cycle.
> 
> Could you detail what will be the impact for ESR?

The change in this bug will change the execution timing of Promise resolution/rejection callback,
specifically, the callback will be executed earlier and more frequently than before.

so, a patch written after this fix will expect that frequent/earlier execution timing,
and uplifting them to not-fixed version means, the callback won't be executed until later,
and that may result in malfunction.  and every each uplift-patch will have to worry that if this patch miss 60.
> every each uplift-patch will have to worry that if this patch miss 60
as we take a wide range of patch (stability, security, etc), could you elaborate why this will impact every patch?

I am still unclear what would be the final impact on the overall level? Tests failing in the ESR branch?
(In reply to Sylvestre Ledru [:sylvestre] from comment #150)
> > every each uplift-patch will have to worry that if this patch miss 60
> as we take a wide range of patch (stability, security, etc), could you
> elaborate why this will impact every patch?

Promise callback is fundamental part these days, it's used in so many UI part and platform code,
and the execution timing (MicroTask checkpoint) is inside core layer of the browser code, so both part will be affected.
(so... maybe "every-each" might be too-strong.  at least most of UI and platform change)


> I am still unclear what would be the final impact on the overall level?
> Tests failing in the ESR branch?

yeah, if it's properly covered by test, it will result in test failure.
(In reply to Sylvestre Ledru [:sylvestre] from comment #150)
> > every each uplift-patch will have to worry that if this patch miss 60
> as we take a wide range of patch (stability, security, etc), could you
> elaborate why this will impact every patch?
> 
> I am still unclear what would be the final impact on the overall level?
> Tests failing in the ESR branch?

Yes, but also web compatibility fallout. So if people write a website and test with release Firefox, people on ESR may not get the 'right' behaviour because the order of execution of bits of code are different.

This is a very important part of aligning with other browsers in terms of execution order of async(-ish) code involving event handlers, promise resolution, etc. I agree with :arai that it'll ultimately be better to have this in 60, even if the next few days are going to be bumpy. Like Kris, I'm happy to commit to helping make it as smooth as possible.

(In reply to Tooru Fujisawa [:arai] from comment #143)
> I apologize for the trouble, and thank you all for your help!
> 
> I'm planning to land patches today if remaining patches get r+ (and no
> failure happens for try runs),
> if something goes wrong hardly again, I'll back them out, and try again on
> Monday morning.

If you're intending to land on inbound again, go ahead with that plan. But if we do end up hitting Monday, I wonder if Kris' plan in comment 138 to land on m-c might make this easier, because you have a roughly stable base to work with.
(In reply to Sylvestre Ledru [:sylvestre] from comment #150)
> I am still unclear what would be the final impact on the overall level?
> Tests failing in the ESR branch?

When we have a critical security bug found in 61+ and need to uplift to ESR we may get unexpected failures because ESR uses different promise behavior.  It will make it harder to do all future uplifts to ESR60.
> I would clearly prefer that we delay this to 61. As this bug has been opened 3 years ago, we can wait a few more days.

This is already causing a lot of bugs and inconsistencies in our codebase. Having to maintain this on ESR, especially in parallel with the fixed Promise behavior would be a nightmare for all code maintainers.

I sympathize with the general policy of avoiding landing major changes late in the cycle, but I see this as an exception worth having.
Blocks: 1442150
No longer depends on: 1442150
Sylvestre, any chance we can get your insight about landing this to Firefox 60 until Monday morning in JST (it's midnight in EU)?  Monday morning in JST is a good time to land this since usually there are not so much activities on our trees.

Thanks!
Flags: needinfo?(sledru)
Julien is the owner of 60. 

From the various discussions we had about that, looks like we don't have much of a choice... I hope this won't put the 60 release at risk!
Flags: needinfo?(sledru) → needinfo?(jcristau)
Thank you for your comment.

given that it's already Monday morning, I'll land these patches now.
Julien, if they're not acceptable for 60, please back them out.
https://hg.mozilla.org/integration/mozilla-inbound/rev/068c59c7c4ec46802b4a18e98adc227aed6d5da1
Bug 1193394 - Part 1: Microtasks and promises scheduling. r=bevis

https://hg.mozilla.org/integration/mozilla-inbound/rev/84ea422093dd615bd191a3e579ed299d96c802f1
Bug 1193394 - Part 2: Update expected failure checks. r=smaug
https://hg.mozilla.org/mozilla-central/rev/068c59c7c4ec
https://hg.mozilla.org/mozilla-central/rev/84ea422093dd
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
IonuΘ›, this bug regresses DAMP talos results significantly, but it's expected.  Please see :ochameau's comment int comment 99.  Thank you!
Flags: needinfo?(igoldan)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #162)
> IonuΘ›, this bug regresses DAMP talos results significantly, but it's
> expected.  Please see :ochameau's comment int comment 99.  Thank you!

Thanks for the heads up! I already see the effects on all desktop platforms.
Flags: needinfo?(igoldan)
Depends on: 1443401
Depends on: 1443429
I'm super happy to confirm that the abort/commit behavior when using promises within IndexedDB transactions is now fixed (see Comment 14 and Comment 16) and conform to Chrome.

https://jsfiddle.net/8v6syfca/

This is huge! ...the previous implementation with regards to transactions had forced many developers to write hacks and twisted code when working with IndexedDB.
Whoooop! That's brilliant news!
Depends on: 1443746
Depends on: 1446644
Depends on: 1446031
Flags: needinfo?(jcristau)
Depends on: 1483124
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: