Closed Bug 1287938 Opened 3 years ago Closed 3 years ago

ts_paint is measuring the wrong thing in the e10s case

Categories

(Testing :: Talos, defect, P1)

defect

Tracking

(e10s+, firefox50 fixed)

RESOLVED FIXED
mozilla50
Tracking Status
e10s + ---
firefox50 --- fixed

People

(Reporter: mconley, Assigned: mconley)

References

Details

Attachments

(1 file)

My patches in bug 1261842 were apparently causing a ts_paint regression. While investigating it, it became clear to me that ts_paint is measuring the wrong thing.

ts_paint works by starting a session, and immediately loading tspaint_test.html into the initial browser tab. When that page loads, the page (with privileges) waits until it is painted, and then takes a delta between the paint time and the time that the process started to determine how long it took to paint the initial page from starting the browser.

The problem is that in the e10s case, the way that we're getting the process start time gets the _content process start time_ and not the parent process start time. This is causing us to look really good on the ts_paint measurement, but it's pretty bogus.

My patches in bug 1261842 made it look like I had added a regression, but in fact what I had done was cause the content process to start earlier so that the delta between that start time and the paint event was larger.

We need to update ts_paint to use the parent process start time.
Assignee: nobody → mconley
Oh, that's exciting. We were trying to make sense out of ts_paint numbers for L20n lately and that would explain at least half of our WTFs.

Can we also switch ts_paint to use Performance API as we're at it? (performance.mark's and performance.measure's would allow us to easily plug our own tests - like "are we done before first paint?)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #1)
> Can we also switch ts_paint to use Performance API as we're at it?
> (performance.mark's and performance.measure's would allow us to easily plug
> our own tests - like "are we done before first paint?)

I don't see how - none of those APIs give you parent process start time, AIUI, not even on the browser.xul file - just "when did we start loading this file into this window" which is much later.
(In reply to :Gijs Kruitbosch from comment #2)
> (In reply to Zibi Braniecki [:gandalf][:zibi] from comment #1)
> > Can we also switch ts_paint to use Performance API as we're at it?
> > (performance.mark's and performance.measure's would allow us to easily plug
> > our own tests - like "are we done before first paint?)
> 
> I don't see how - none of those APIs give you parent process start time,
> AIUI, not even on the browser.xul file - just "when did we start loading
> this file into this window" which is much later.

right, so the best we can get is MozFirstPaint of ts_paint.html. Obviously I'd love MozFirstPaint of browser.xul, but we're using that, ok.

The problem I have is that we're doing it by calling Date.now() on a local variable, and I can't access startup results from anywhere else, or add my marks and compare them to the marks used by ts_paint already.
(Note that the first paint in the parent process can be got via Services.startup.getStartupInfo()['firstPaint'] )
Instead of checking for the paint time of the tspaint_test.html content,
we're now measuring the delta between process start and first paint as
reported by the parent process's startup info.

Review commit: https://reviewboard.mozilla.org/r/65580/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/65580/
Attachment #8772876 - Flags: review?(jmaher)
jmaher and I just had a conversation in #perf. We came to the conclusion that the thing we want ts_paint to measure is the delta between the process start and the first paint of the browser window.

Both of those pieces of data can be gotten from Services.startup.getStartupInfo. My patch modifies the test to get that data from the parent, calculate the delta, and record the result.
Comment on attachment 8772876 [details]
Bug 1287938 - Make ts_paint use the parent process start time and first paint info in calculations.

https://reviewboard.mozilla.org/r/65580/#review62592

I was reluctant at first to accept the talospowers issue, but upon further thought this is actually quite straightforward and good.
Attachment #8772876 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher ) from comment #7)
> Comment on attachment 8772876 [details]
> Bug 1287938 - Make ts_paint use the parent process start time and first
> paint info in calculations.
> 
> https://reviewboard.mozilla.org/r/65580/#review62592
> 
> I was reluctant at first to accept the talospowers issue, but upon further
> thought this is actually quite straightforward and good.

Try push seems to indicate that my fix breaks the non-e10s case - it's not emitting results: https://treeherder.mozilla.org/#/jobs?repo=try&revision=deb714da1b8dfe35e8b717fb8adccc9808fb64a2

Investigating...
Priority: -- → P1
Comment on attachment 8772876 [details]
Bug 1287938 - Make ts_paint use the parent process start time and first paint info in calculations.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/65580/diff/1-2/
Comment on attachment 8772876 [details]
Bug 1287938 - Make ts_paint use the parent process start time and first paint info in calculations.

Had to make a minor change to deal with Linux weirdness.
Attachment #8772876 - Flags: review+ → review?(jmaher)
Comment on attachment 8772876 [details]
Bug 1287938 - Make ts_paint use the parent process start time and first paint info in calculations.

odd, reviewboard retains my r+, so let me do this the old fashioned way.
Attachment #8772876 - Flags: review?(jmaher) → review+
Pushed by mconley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bc767d7a30dd
Make ts_paint use the parent process start time and first paint info in calculations. r=jmaher
https://hg.mozilla.org/mozilla-central/rev/bc767d7a30dd
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
As a note, this patch bc767d7a30dd has been merged into Aurora, here is the perf alert from Aurora:

https://treeherder.mozilla.org/perf.html#/alerts?id=2185

  ts_paint linux64 pgo - 4.1% worse
  ts_paint linux64 pgo e10s - 95.99% worse 
  ts_paint osx-10-10 opt e10s - 67.31% worse 
  ts_paint windows7-32 pgo e10s - 67.8% worse
  ts_paint windows8-64 pgo e10s - 58.49% worse
  ts_paint windowsxp pgo e10s -	45.98% worse

  ts_paint osx-10-10 opt - 15.93% better
  ts_paint windows7-32 pgo - 17.93% better
  ts_paint windows8-64 pgo - 21.24% better
  ts_paint windowsxp pgo - 17.67% better
> ts_paint linux64 pgo e10s - 95.99% worse

Slightly off topic, these are the numbers:

500.58 < 981.08  95.99% 

So 96% regression here is just less than twice worse.

But if this test was higher-is-better, then 96% regression would have been 25 times worse (e.g. if the before results were 100, then 96% regression would put the new regressed result at 4).

But the numbers listed at comment 15 don't state that this test is lower-is-better nor displays the actual before and after numbers (from which it could be deduced its lower-is-better).

Therefore, by just looking at these numbers it's impossible to tell if 96% regression is twice worse or 25 times worse.
Avi, feel free to file a bug in perfherder to make the numbers more meaningful or have the copy/text summary show raw values- i do agree there is some ambiguity there!
You need to log in before you can comment on or make changes to this bug.