Closed Bug 994712 Opened 10 years ago Closed 10 years ago

the wheels fell off the CART - talos regression on all platforms from 8-25%

Categories

(Toolkit :: Async Tooling, defect)

defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla32
Tracking Status
firefox31 + fixed
firefox32 --- fixed

People

(Reporter: jmaher, Assigned: Gijs)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][qa-])

Attachments

(1 file)

It is unbelievable but true: the horse named talos is having trouble pulling the CART because the wheels fell off when bug 887923 landed on fx-team.

I know this is true because I read it on the internet:
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/9PhctvDggho

In fact, I looked on tbpl and saw a jump in numbers to double check:
https://tbpl.mozilla.org/?tree=Fx-Team&fromchange=1af4ac38f025&tochange=67e72ea05f66&jobname=fx-team%20talos%20svgr

If we don't have a fix in the works for this, I would like to back this out until we can get this to have a smaller impact.
Summary: the wheels fell of the CART - talos regression on all platfroms from 8-25% → the wheels fell of the CART - talos regression on all platforms from 8-25%
Arrrrrrrgh. :/

Yoric - were performance comparisons done between the original Promise implementation that Task used, and the one that bug 887923 uses instead?
Flags: needinfo?(dteller)
Summary: the wheels fell of the CART - talos regression on all platforms from 8-25% → the wheels fell off the CART - talos regression on all platforms from 8-25%
(In reply to Mike Conley (:mconley) from comment #1)
> Arrrrrrrgh. :/
> 
> Yoric - were performance comparisons done between the original Promise
> implementation that Task used, and the one that bug 887923 uses instead?

The new implementation calls setTimeout quite often. In terms of raw performance, it's clearly slower. In terms of responsiveness, it's expected to be much better, though. Also, it doesn't explode the stack (see bug 991668), which means we pretty much want it.

Unless there's a better plan, Paolo?
Flags: needinfo?(paolo.mozmail)
Looking at the regression here: https://datazilla.mozilla.org/?start=1396570098&stop=1397142363&product=Firefox&repository=Fx-Team-Non-PGO&os=win&os_version=5.1.2600&test=cart&project=talos

It looks like the hit came from the .error tests. The same push seems to have actually caused us to paint frames _faster_, which is interesting.

avih suggests that this spike in "error" might be because the Task module takes longer to load. Is that possible?
Looking a bit deeper into the numbers, CART runs the customize animation 25 times in a row (you can see their values if you click/hover a single data point on datazilla main view). It seems that the first animation on this sequence has consistently higher error values than the rest.

Before the regression, the first 1-customize-enter.error.TART (on windows 7 on datazilla) value is around ~370ms. The rest are ~240ms.

After the regression, the first is still ~370ms, but the rest are now ~290ms.

So to me, this suggests that initialization of the customize animation with the new promise (probably regardless of loading the module from disk) somehow takes longer, or waits longer, before the actual animation starts.
(In reply to Mike Conley (:mconley) from comment #3)
> Looking at the regression here:
> https://datazilla.mozilla.org/
> ?start=1396570098&stop=1397142363&product=Firefox&repository=Fx-Team-Non-
> PGO&os=win&os_version=5.1.2600&test=cart&project=talos
> 
> It looks like the hit came from the .error tests. The same push seems to
> have actually caused us to paint frames _faster_, which is interesting.
> 
> avih suggests that this spike in "error" might be because the Task module
> takes longer to load. Is that possible?

Mike, isn't the talos test essentially measuring from start of the gCustomizeMode.enter call until the end of the transition?

If we have yields before the transition, that'd explain the slowdown and the increase in error on the first value.

Can we be evil and make the talos test redefine the _transitioning property, and use that instead to mark the beginning/end of the transition for the measurements?
Flags: needinfo?(mconley)
Flags: needinfo?(avihpit)
This could be just a measurement issue, or an actual regression.

There is a difference between the two implementations that may cause a regression, namely that resolutions are delayed until the next event loop tick, unless they already happen during a resolution loop.

If some code uses setTimeout to resolve a promise, the setTimeout callback will never be located inside a resolution chain, thus any promise will only be resolved during the next tick. This is expected in a correct Promise implementation - the old behavior was simply incorrect.
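
For illustration, a minimal standalone sketch of that timing (this uses plain setTimeout/Promise semantics rather than the Promise.jsm internals; the dump() labels just show the ordering):

  let resolveIt;
  let p = new Promise(resolve => { resolveIt = resolve; });
  p.then(() => dump("2: the then() handler runs on a later tick\n"));

  setTimeout(() => {
    resolveIt();                               // resolution happens here...
    dump("1: setTimeout callback finished\n");
    // ...but the then() handler above has not run yet; it is queued to run
    // after this callback returns, i.e. on a later turn of the event loop.
  }, 0);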

The simplest way to work around this is to avoid starting many Tasks in performance-sensitive areas. Another option is to create a Promise.setTimeout static function that behaves exactly like setTimeout but sets up an outer Promise resolution loop before invoking the callback, so that resolutions (including those inside Task.spawn invocations) may start during the same tick.

Apart from this, there is no significant difference in performance between the two JavaScript Promise implementations in most cases. The only other case that might show a difference is when the Promise.reject() static function is called, since a stack trace is now collected; rejections, like exceptions, should not happen under normal circumstances.

In any case, I'd be more inclined to attempt fixing forward first (fixing the measurements or any identified issue) rather than backing out, which I'd only do as a last resort. The longer we keep the incorrect implementation in place, the more code could be developed that relies on it.
Flags: needinfo?(paolo.mozmail)
Of course, if instead of setTimeout the resolution is triggered by any other DOM event, the only solution is to avoid using Promises or Tasks at all. In fact, a possible Task usage mistake we identified during the conversion was the pattern of spawning a task and using it to set a response to the event using a closure, assuming the Task started synchronously.
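
A rough sketch of that mistake pattern (hypothetical names throughout: someTarget, lookUpAnswer and event.detail.setResponse are stand-ins, not code from the actual conversion):

  const { Task } = Components.utils.import("resource://gre/modules/Task.jsm", {});

  someTarget.addEventListener("my-request", function (event) {
    let response = null;
    Task.spawn(function* () {
      response = yield lookUpAnswer(); // resumes asynchronously
    });
    // Bug: this line assumes the task above has already run and set
    // `response`, which is not guaranteed once anything in it resolves
    // asynchronously.
    event.detail.setResponse(response);
  });
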
And finally, if you're using Tasks only to start the animation and not to control individual frames, then this is just one event loop tick getting in the way of the measurement, and this is unlikely to cause any noticeable difference during real usage, so not a real regression.
I realize that there is a second difference: Promise.jsm is now an empty shell around Promise-backend.js (I'm not sure exactly why, but that was requested by devtools, IIUC), so loading requires one additional set of disk I/O operations and loads two scripts.
Flags: needinfo?(dteller)
I think we should profile a build with and without the new Promise implementation to make sure we're not skipping over something here.

Otherwise, I'm fully prepared to eat this as a single event loop tick getting in the way of our measurement... is there a simple way to prove or disprove that theory?
(In reply to :Gijs Kruitbosch from comment #5)
> Mike, isn't the talos test essentially measuring from start of the
> gCustomizeMode.enter call until the end of the transition?
> 
> If we have yields before the transition, that'd explain the slowdown and the
> increase in error on the first value.
> 
> Can we be evil and make the talos test redefine the _transitioning property
> and use that instead to mark the beginning/end of the transition, and use
> that for measurements?

The regression reflects a user-visible change: the duration from triggering customize mode until it's done has increased. More specifically, it quite probably reflects increased "lag" from the trigger until the actual animation starts.

Changing the measurement to not reflect this change would just make this change invisible to us, which is not a desirable result.

CART measures several aspects of the animation, hopefully reflecting as many performance changes as we can wrt entering and exiting customize mode. The lag between the trigger and the actual animation start definitely falls into the things we want to express in numbers.

We can try to change how CART measures things, but however you look at it, the regression in this bug is not something we want invisible to us.
Flags: needinfo?(avihpit)
I think Avi and I are pretty much aligned on this.

Let's not back out the Promise implementation just yet, but let's see if some profiles show us anything useful about where the lag is coming from.

I'll get to that maybe today.
Flags: needinfo?(mconley)
Ok, took a few minutes and ran some profiles on my Win 7 VM. Verrrrrrry interesting.

I added some markers in the build I tested to mark different points during the transition.

This is 4 runs of CART via the add-on interface:

Original promise implementation: http://people.mozilla.org/~bgirard/cleopatra/#report=0eab479b51d76af9f6c34cf74fe1859b33ffc1ef

New promise implementation: http://people.mozilla.org/~bgirard/cleopatra/#report=3733fc9c24a13fe96bc4a0f4218055c1d5a241e2


Notice anything different? I sure did.

CART's jamming in a bunch of markers itself, so you'll have to dodge around them, but see if you can just select and zoom into a single range from TransitionEnter:start to TransitionEnter:end (TransitionEnter:start marks the point where we add the attribute that kicks off the CSS transition, and TransitionEnter:end marks the point where we receive the "transitionend" event OR the setTimeout fallback).

For the old promise implementation, I see a big red spike where we block the main-thread.

In the new promise implementation, it looks smooth as butter - but it takes longer.

My conclusion here is that with this new implementation, somehow we're able to spin the event loop more; the end result is that we block the main thread for less of the time but take longer to do the job.

Does that conclusion make sense?
The new implementation also somehow seems to fix 985373.

I'm starting to feel like we should take the money and run.
Bug 985373, to be specific, and to make a link.
(In reply to Mike Conley (:mconley) from comment #13)
> Does that conclusion make sense?

Yes. That being said, I wonder if we could get the lead-in for the actual transition to not have a timeout by short-circuiting the yield for PanelUI.ensureReady() so it's sync if the panel was ready anyway. Right now we always return a promise, which causes the 'resume' of the outer "customize enter" function we passed to Task.spawn to be async; passing an object or some other truthy non-promise value would cause it to be sync. It would be interesting to know whether that undoes the benefit we seem to have gotten from this change, and whether it gains us anything in terms of the lead time we seem to have before starting the transition. It seems you have this set up locally - are you able to quickly test this and see what effect it has? :-)
Flags: needinfo?(mconley)
(In reply to Mike Conley (:mconley) from comment #13)
> In the new promise implementation, it looks smooth as butter - but it takes
> longer.

Do I understand correctly that the gains are user-visible but the losses aren't?
If so, I'd say this regression is actually a net gain.
Clearing the needinfo because short-circuiting the ensureReady function didn't seem to move any numbers, at least as far as I saw.

(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on April 9th-16th) from comment #17)
> (In reply to Mike Conley (:mconley) from comment #13)
> > In the new promise implementation, it looks smooth as butter - but it takes
> > longer.
> 
> Do I understand correctly that the gains are user-visible but the losses
> aren't?
> If so, I'd say this regression is actually a net gain.

I'm starting to lean that way as well, but I'd really like to know why this "error" value is going up. Neither of my profiles from my Win 7 VM is being clear about where the time is going... all I know is that we're spending _less_ time blocking the main thread, which is a good thing.
Flags: needinfo?(mconley)
I think these profiles from my 10.6 talos slave make it clearer. At the time of me writing this comment, Cleopatra seems to be freaking out and painting everything red, which it shouldn't. I also don't have symbols here, but they're really not important - what I'm interested in is measuring the time between markers (it's kinda hard since the markers are so dense, but if you double-click on one, the tree hierarchy will allow you to highlight individual markers).

CART on old promise implementation: http://people.mozilla.org/~bgirard/cleopatra/#report=939ac157194e0be380f9c75cba440e8dabfecb86

customize-enter test:
  Time to go from starting test to transition start: ~29ms
  Time to go from starting transition to end transition: ~218ms

customize-enter-css test: 
  Time to go from starting test to transition start: ~30ms
  Time to go from starting transition to end transition: ~215ms



CART on new promise implementation: http://people.mozilla.org/~bgirard/cleopatra/#report=86a6ce8015104e750a141d0cf2aac903f2ba4fe5

customize-enter test:
  Time to go from starting test to transition start: ~70ms (+41ms)
  Time to go from starting transition to end transition: ~267ms (+49ms)

customize-enter-css test: 
  Time to go from starting test to transition start: ~42ms (+12ms)
  Time to go from starting transition to end transition: ~237ms (+22ms)

Total increase for customize-enter test: ~90ms
Total increase for customize-enter-css test: ~34ms


The increase in time to hit the transition start in the customize-enter test is a big one - it looks like the new Promise implementation lets us breathe a bit more between calling gCustomizeMode.enter() (when about:customizing loads in a tab) and kicking off the transition.

There's a similar lag in the firing of the customization-transitionend event.

Strange that that lag seems to go down for the customize-enter-css tests.


So my conclusion from all of this is that the new Promise implementation simply allows the event loop to spin more, and that's where the lag is coming from. The fact that we block the main thread less is, IMO, a happy result. We also seem to get a bit better at painting each frame (the average painting time per frame went down).

I think we should keep the current Promise implementation, with the understanding that this CART "regression" is just lag introduced from spinning the event loop more. Avi, do you concur, or have I missed anything?
Flags: needinfo?(avihpit)
Thanks for the excellent investigation, Mike.

Personally, if I had to choose between these performance characteristics without taking any other factors into account, I'd choose the one with the old promise. IMO the weight of an extra lag of 40-90ms (or more on slower systems) is higher than that of slightly quicker frame intervals.

It also appears that there isn't a readily available solution for the regression with the new promise.

It's also clear that we don't want to delay Australis.

AFAICT, the new promise implementation is quite important to keep, and we're going there anyway, so reverting to the old promise would only be a temporary measure and would hold us back with the new promise. Therefore, I think we should keep the new promise.

It's not clear to me from bug 887923 whether the new promise is/should be uplifted to Firefox 29 (or at all), but if it is and we're unable to fix the regression (and we don't really have the time or safety margins for that), it would impact the first public appearance of Australis, which is undesirable.

Assuming the worst wrt this regression (new promise is uplifted to 29, we're unable to fix the regression), I would be able to live with it, but I won't be terribly happy about it.

The fact that we understand what triggered the regression is good, but IMO this regression still needs to be addressed.

One option, though I don't know how viable it is, is to use the new promise globally but keep the old implementation for customize mode for an extra train, and hopefully we'll be able to fix the regression before customize mode switches to the new promise. If this approach is reasonably uncomplicated, it could be viable, but I don't think it's a good one if it requires a lot of effort or risk.

The bottom line is that I don't think we should block on this (neither the new promise, nor the regression it causes here), but even if it gets released with the regression, we should still put in a good effort to fix it, from the new promise side and/or from the customize side, to get the performance back to good levels. Better late than never.
Flags: needinfo?(avihpit)
Going ahead and tracking this so we can keep an eye on things.
I am not clear from the comments whether we plan to fix this regression or ignore it. If we will ignore it, please resolve this as wontfix. From looking at the comments, I would vote for trying to fix this (even if only partially).
So I had a few minutes to look at this again today. I instrumented Task.jsm, and didn't notice anything significantly slower in its initialization or running of the main task that's spawned for entering customize mode.

I then instrumented the internals of that task, and that's where I found a difference - Gijs was right in comment 16 that at least some of the time is being lost because we have nested Tasks. Making PanelUI.ensureReady return a resolved promise synchronously if it's already initialized seems to reduce that time significantly. I'm not sure why I didn't catch that before - I had run the experiment a week or so back and didn't notice a difference. Perhaps I was overtired. :/

Anyhow, I've pushed to try to see if I'm imagining things. I only pushed for OS X because it builds quickly, and I expect this to be platform independent.

baseline: https://tbpl.mozilla.org/?tree=Try&rev=722a4c57999b
patch: https://tbpl.mozilla.org/?tree=Try&rev=04c13d9470f3
compare-talos: http://compare-talos.mattn.ca/?oldRevs=722a4c57999b&newRev=04c13d9470f3&server=graphs.mozilla.org&submit=true

If this turns out to be it, I'll owe Gijs a beverage of his choice. I should have been less careless trying his advice. :)
Hrm. So much for that theory. No improvement, at least not for 10.6.

Still digging...
thanks for looking into this!
(In reply to Mike Conley (:mconley) from comment #23)
> So I had a few minutes to look at this again today. I instrumented Task.jsm,
> and didn't notice anything significantly slower in its initialization or
> running of the main task that's spawned for entering customize mode.
> 
> I then instrumented the internals of that task, and that's where I found a
> difference - Gijs was right in comment 16 that at least some of the time is
> going because we have nested Tasks. Making PanelUI.ensureReady return a
> resolved promise synchronously if it's already initialized seems to reduce
> that time significantly. I'm not sure why I didn't catch that before - I had
> run the experiment a week or so back, and didn't notice a difference.
> Perhaps I was overtired. :/
> 

AIUI, it isn't resolve() that's async - it's .then(foo) which is run async, even if the promise is already resolved (and in this sense, "new promise" differs from old promise). This means that your try push /shouldn't/ make a difference - Promise.resolve() returns a resolved promise, but this._readyPromise is also already resolved at that point.

This makes the issue tricky to fix "properly". Task.jsm will continue running code sync if you yield something that *isn't* a promise, but it's a bit of a hack to not return a promise from something sometimes, but return a promise most of the time. AFAIK there isn't a way to check if a promise is resolved or not (except by attaching a .then() handler, but that will fire async and we're back to square 1...).
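
To make that concrete, a contrived illustration (not real customize mode code):

  const { Task } = Components.utils.import("resource://gre/modules/Task.jsm", {});

  Task.spawn(function* () {
    let a = yield 42;                  // not a promise: execution continues synchronously
    let b = yield Promise.resolve(42); // a promise: the task resumes from a then()
                                       // callback, i.e. on a later tick, even though
                                       // the promise is already resolved
  });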

The simplest thing would be, perhaps, to add an API "isReady" that just returns a boolean indicating whether ensureReady /needs/ to be called, or not. I've pushed something to try for this on the same baseline:

remote:   https://tbpl.mozilla.org/?tree=Try&rev=60da2fdb580a

Let's see how that goes before investigating our other options. :-)
Ah, thanks for correcting me Gijs. No wonder I wasn't getting happy results - I wasn't doing it right. ;)

http://compare-talos.mattn.ca/?oldRevs=722a4c57999b&newRev=60da2fdb580a&server=graphs.mozilla.org&submit=true

So, holy crap, that's a heck of a win. I don't believe that wins _all_ of the regression back, but it wins the vast majority of it.

From the breakdown though, I do note that the patch also causes our customize-enter-css.half and customize-enter-css.all measurements to go up...

Thoughts, Avi?
Flags: needinfo?(avihpit)
Here's a compare-talos from before / after bug 887923 landed on fx-team. Since this is an fx-team build, we have a ton of other test results in here, and we have all platforms. Try to ignore those.

Focus only on CART, and on OS X in particular (compare-talos has a hard time filtering out PGO build data, so the Windows and Linux comparisons are likely bogus).

We only did a single CART run for OS X on these two builds, so I've triggered 4 more each.

http://compare-talos.mattn.ca/?oldRevs=0df48cf38d63&newRev=7262fabfd83e&server=graphs.mozilla.org&submit=true
Whoa...

The compare-talos values from comment 27 suggest that this patch completely reverses the perf changes from the new promise (indicated on comment 28). This applies to both the improvements and regressions which the new promise brought, at least on OS X 10.6 - for which we have results from each stage.

So as far as this regression bug goes, reverting all the perf changes would fix it.

As for the other question of: are the previous values better? IMO they are, though that's a subjective view (expressed on comment 20). I don't think there's an absolute answer to this question since the changes are a mix of improvements and regressions without clear weight difference between them.

Note to self/mconley: the enter-css values, especially the .half and .all should have been better than the overall enter .half/all. That's the reason we added the css only part in the first place - to get a clearer picture of the css animation without the overhead of the rest. But it seems to not be the case (enter.half is 10ms, enter-css.half is ~30ms). Need to investigate.
Flags: needinfo?(avihpit)
Attachment #8412864 - Flags: review?(mconley)
Comment on attachment 8412864 [details] [diff] [review]
Gijs's patch - v1

Review of attachment 8412864 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds like we want this. r=me with isReady switched to a getter.

Good find, Gijs. :)

::: browser/components/customizableui/content/panelUI.js
@@ +199,5 @@
>          break;
>      }
>    },
>  
> +  isReady: function() {

Since this isn't taking any arguments, might as well make this a Getter.
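
I.e. something like this (sketch only; the backing field name is hypothetical):

  get isReady() {
    return !!this._initialized;
  },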
Attachment #8412864 - Flags: review?(mconley) → review+
Assignee: nobody → gijskruitbosch+bugs
I just went for it:

remote:   https://hg.mozilla.org/integration/fx-team/rev/bef43b38d6ae
Whiteboard: [talos_regression] → [talos_regression][fixed-in-fx-team]
(In reply to Mike Conley (:mconley) from comment #32)
> I just went for it:
> 
> remote:   https://hg.mozilla.org/integration/fx-team/rev/bef43b38d6ae

Thanks! Sorry, am/was on holiday this weekend, and neglected to land this. :-(

NB: you don't seem to have applied your own nit of making this a getter. I'm not fussed either way, but rs=me to change that (just remember to update the consumer).
https://hg.mozilla.org/mozilla-central/rev/bef43b38d6ae
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [talos_regression][fixed-in-fx-team] → [talos_regression]
Target Milestone: --- → mozilla32
I have verified via the graphs (data on graph server) that we have fixed this on all platforms!!  Well done guys.

This missed the uplift to Aurora - can we backport this to Aurora?
Flags: needinfo?(mconley)
Comment on attachment 8412864 [details] [diff] [review]
Gijs's patch - v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 

CART regression caused by bug 887923.


User impact if declined: 

Possibly a more delayed, less smooth customize mode transition.


Testing completed (on m-c, etc.): 

A few days to bake on m-c, and confirmed improvement in performance from our CART test.


Risk to taking this patch (and alternatives if risky): 

Very low - we just fast path an initialization if it's already taken place.


String or IDL/UUID changes made by this patch:

None.
Attachment #8412864 - Flags: approval-mozilla-aurora?
Flags: needinfo?(mconley)
Attachment #8412864 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [talos_regression] → [talos_regression][qa-]