bug 959130 caused a significant ts_paint regression

RESOLVED WONTFIX

Status

()

Firefox
Session Restore
RESOLVED WONTFIX
4 years ago
4 years ago

People

(Reporter: Gijs, Assigned: smacleod)

Tracking

({perf, regression})

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Australis:P-])

(Reporter)

Description

4 years ago
https://tbpl.mozilla.org/?tree=Fx-Team&fromchange=bf25e29dc677&tochange=c092abc72367&rev=75303a3ddc0c

Non-PGO numbers on this cset:

639.47
607.84
626.47
648.42
654.11
626.00
634.11

avg: 633.8


Non-PGO numbers on previous cset:

616.47
612.95
593.47
607.32
613.68
616.16
609.95

avg: 610.0
(Reporter)

Comment 1

4 years ago
Copying useful comments:

(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from bug 967766 comment #19)
> What bug 959130 is replace the input used by CrashMonitor (the OS.File
> worker) and Session Restore (the SessionWorker) by something that doesn't
> take up to 6 seconds to return a value.
> 
> I suspect that we move from the following situation:
> 1. CrashMonitor and SessionRestore request some input;
> 2. firstPaint;
> 3. CrashMonitor and SessionRestore receive their input;
> 4. do stuff with the input;
> 5. session restored.
> 
> To the following situation:
> 1. CrashMonitor and SessionRestore request some input;
> 2. input arrives faster, CrashMonitor and SessionRestore receive their input;
> 3. do stuff with the input;
> 4. firstPaint;
> 5. session restored.
> 
> In which case, there is nothing to worry about.

(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from bug 967766 comment #20)
> Steven, I believe that your code could test whether the input callbacks are
> called before or after firstPaint. This would help us settle the issue.



There is also the separate matter of which of these two csets caused the regression. If they can work separately, it might be worth doing a try push to figure out what's up here. Tim, do you want to take this on?
Flags: needinfo?(ttaubert)
They can definitely work separately.
Try push (pgo) without the CrashManager part:

https://tbpl.mozilla.org/?tree=Try&rev=61ad8e43c78f
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=8c080cc8aa5d&newRev=61ad8e43c78f&submit=true

Compare shows a 27.04ms / 4.9% improvement.



Try push (pgo) without both parts:

https://tbpl.mozilla.org/?tree=Try&rev=9e9eb6d53ab7
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=8c080cc8aa5d&newRev=9e9eb6d53ab7&submit=true

Compare shows a 14.43ms / 2.65% improvement.



So, yeah. We're better off with only the CM part backed out than with both parts backed out, that is a little weird. Maybe the overhead of having the two workers spawned at startup is the reason for that?
Flags: needinfo?(ttaubert)
(Reporter)

Comment 4

4 years ago
(In reply to Tim Taubert [:ttaubert] from comment #3)
> Try push (pgo) without the CrashManager part:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=61ad8e43c78f
> http://perf.snarkfest.net/compare-talos/index.
> html?oldRevs=8c080cc8aa5d&newRev=61ad8e43c78f&submit=true
> 
> Compare shows a 27.04ms / 4.9% improvement.
> 
> 
> 
> Try push (pgo) without both parts:
> 
> https://tbpl.mozilla.org/?tree=Try&rev=9e9eb6d53ab7
> http://perf.snarkfest.net/compare-talos/index.
> html?oldRevs=8c080cc8aa5d&newRev=9e9eb6d53ab7&submit=true
> 
> Compare shows a 14.43ms / 2.65% improvement.
> 
> 
> 
> So, yeah. We're better off with only the CM part backed out than with both
> parts backed out, that is a little weird. Maybe the overhead of having the
> two workers spawned at startup is the reason for that?

Dumb question: can you lazymodulegetter those extra Cu.imports and does that help?
(Reporter)

Comment 5

4 years ago
TIL: I can't spell on weekends. My brain is on leave, clearly.
Summary: bug 959130 caused a signifcant ts_paint regression → bug 959130 caused a significant ts_paint regression
Pushed three csets on top of https://hg.mozilla.org/try/rev/059ed9908fda - a clean revision prior to all regression.

1) 059ed9908fda + part 1 (session file)
2) 059ed9908fda + part 2 (crash monitor)
3) 059ed9908fda + part 1 + part 2

We can compare those to the earlier try push of 059ed9908fda and see what the results are.
(In reply to Tim Taubert [:ttaubert] from comment #6)
> 1) 059ed9908fda + part 1 (session file)

regression: 6.48ms 1.37%

> 2) 059ed9908fda + part 2 (crash monitor)

regression: 12.57ms 2.65%

> 3) 059ed9908fda + part 1 + part 2

regression: 29.71ms 6.27%
So that looks like doing both concurrently is not good. That might be due to contention on the stream transport service I/O pool.
So just for fun, I implemented sync reading for both files and pushed that to try.

Here's the CrashMonitor part:

https://tbpl.mozilla.org/?tree=Try&rev=5c132ed51a0b
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=0a4ef47d0d09&newRev=5c132ed51a0b&submit=true
Spoiler: 7.58ms / 1.6% regression

And here the SessionFile part:

https://tbpl.mozilla.org/?tree=Try&rev=9f82784f4b20
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=0a4ef47d0d09&newRev=9f82784f4b20&submit=true
Spoiler: 0.23ms / 0.05% regression
(Assignee)

Updated

4 years ago
Assignee: nobody → smacleod
Status: NEW → ASSIGNED
I'm seeing 130-200ms until SessionFile.load() has finished locally, using NetUtil. Reverting back to the OS.File workers yields 270-290ms on cold start. I wonder if a good middle way would be to just put worker scripts in the startup cache? That would hopefully be a little faster than 270ms?

It's rather hard to talk about that without numbers but teaching dom/worker/ScriptLoader.cpp to read from scache seems more complicated than I thought, so I didn't really get anywhere with yesterday night's experiments.
Whiteboard: [Australis:P-]
(Assignee)

Comment 11

4 years ago
WONTFIXing this since we had decided to eat this regression and hopefully Bug 970495 has made things faster.
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.