Closed Bug 1363845 Opened 7 years ago Closed 7 years ago

Talos pageloader changes to support 'time to first non-blank paint' measurement for tp6

Categories

(Testing :: Talos, enhancement)

Version 3
enhancement
Not set
normal

Tracking

(firefox56 fixed)

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: rwood, Assigned: rwood)

References

Details

(Whiteboard: [PI:July])

Attachments

(1 file)

For the new talos quantum-pageload test, have the test use the 'time to first non-blank paint' event/perf marker.
Summary: Have quantum-pageload talos test measure 'time to first non-blank paint' → Talos pageloader changes to support 'time to first non-blank paint' measurement
Depends on: 1210906
Depends on: 1377251
Assignee: nobody → rwood
Status: NEW → ASSIGNED
Can we use timeToNonBlankPaint here instead of the MozAfterPaint?
Flags: needinfo?(rwood)
(In reply to Zibi Braniecki [:gandalf][:zibi] from comment #5)
> Can we use timeToNonBlankPaint here instead of the MozAfterPaint?

Yes that is the goal. Right now I'm just trying to see if I can actually retrieve the value, and apparently it is almost guaranteed a timeToNonBlankPaint value will exist by the time the MozAfterPaint event is received.
Flags: needinfo?(rwood)
See Also: → 1381988
In bug 1377251 comment 24, bsmedberg suggests listening for the 'onload' event and then reading the timeToNonBlankPaint property, which should (!) have been set by onload time.
Summary: Talos pageloader changes to support 'time to first non-blank paint' measurement → Talos pageloader changes to support 'time to first non-blank paint' measurement for tp6
Comment on attachment 8888468 [details]
Bug 1363845 - Talos pageloader changes to support 'time to first non-blank paint' measurement for tp6;

https://reviewboard.mozilla.org/r/159414/#review164772

pretty close- a few bits of cleanup and the big question- does this work on try server?

::: testing/talos/talos/pageloader/chrome/pageloader.js:271
(Diff revision 1)
> -        if (gUseE10S) {
> -          let contentScript = "data:,function _contentLoadHandler(e) { " +
> +        let contentScript = "data:,function _contentLoadHandler(e) { " +
> -            "  if (e.originalTarget.defaultView == content) { " +
> +          "  if (e.originalTarget.defaultView == content) { " +
> -            "    content.wrappedJSObject.tpRecordTime = function(t, s, n) { sendAsyncMessage('PageLoader:RecordTime', { time: t, startTime: s, testName: n }); }; ";
> +          "    content.wrappedJSObject.tpRecordTime = function(t, s, n) { sendAsyncMessage('PageLoader:RecordTime', { time: t, startTime: s, testName: n }); }; ";
> -          if (useMozAfterPaint) {
> +        if (useFNBPaint) {
> +          dumpLine("Using fnbpaint");

this dumpline isn't useful for production

::: testing/talos/talos/pageloader/chrome/pageloader.js:273
(Diff revision 1)
> -            "  if (e.originalTarget.defaultView == content) { " +
> +          "  if (e.originalTarget.defaultView == content) { " +
> -            "    content.wrappedJSObject.tpRecordTime = function(t, s, n) { sendAsyncMessage('PageLoader:RecordTime', { time: t, startTime: s, testName: n }); }; ";
> +          "    content.wrappedJSObject.tpRecordTime = function(t, s, n) { sendAsyncMessage('PageLoader:RecordTime', { time: t, startTime: s, testName: n }); }; ";
> -          if (useMozAfterPaint) {
> +        if (useFNBPaint) {
> +          dumpLine("Using fnbpaint");
> +          contentScript += "" +
> +          "var retryCounter = 0; " +

can you make this: gRetryCounter?

::: testing/talos/talos/pageloader/chrome/pageloader.js:280
(Diff revision 1)
> +          "  x = content.window.performance.timing.timeToNonBlankPaint; " +
> +          "  if (typeof x == 'undefined') { " +
> +          "    sendAsyncMessage('PageLoader:FNBPaintError', {}); " +
> +          "  } " +
> +          "  if (x > 0) { " +
> +          "    dump('fnbpaint value is: ' + x + '\\n'); " +

the dump statements in here are not useful for production

::: testing/talos/talos/pageloader/chrome/pageloader.js:688
(Diff revision 1)
> +    // NOTE: window.performance.timing.timeToNonBlankPaint is currently a duration
> +    // value (ms) but may be changed to a timestamp value; see Bug 1377251
> +    var time = fnbpaint;
> +  } else {
> -  var end_time = Date.now();
> +    var end_time = Date.now();
> -  var time = (end_time - start_time);
> +    var time = (end_time - start_time);

if we are expecting fnbpaint and we don't get it, will we end up using the old fashioned time instead?

::: testing/talos/talos/pageloader/chrome/pageloader.js:829
(Diff revision 1)
>  
> -    if (gUseE10S) {
> -      let mm = content.selectedBrowser.messageManager;
> +    let mm = content.selectedBrowser.messageManager;
> -      mm.removeMessageListener("PageLoader:LoadEvent", ContentListener);
> +    mm.removeMessageListener("PageLoader:LoadEvent", ContentListener);
> -      mm.removeMessageListener("PageLoader:RecordTime", ContentListener);
> +    mm.removeMessageListener("PageLoader:RecordTime", ContentListener);
> +    mm.removeMessageListener("PageLoader:FNBPaintError", ContentListener);

mozafterpaint is on content, but fnbpaint is on mm; is that intentional?

should we also put fnbpaint behind if(useFNBPaint) ?
Attachment #8888468 - Flags: review?(jmaher) → review-
Comment on attachment 8888468 [details]
Bug 1363845 - Talos pageloader changes to support 'time to first non-blank paint' measurement for tp6;

https://reviewboard.mozilla.org/r/159414/#review164772

> the dump statements in here are not useful for production

I need this for now to run on try, since quantum-pageload only runs on windows and I am developing on osx. I am testing locally but with tsvg_static. Will def remove this before landing though!

> if we are expecting fnbpaint and we don't get it, will we end up using the old fashioned time instead?

Nope, if we are expecting fnbpaint and don't get it, pageloader (and bubbling up to talos) will error out before this point.

> mozafterpaint is on content, but fnbpaint is on mm; is that intentional?
> 
> should we also put fnbpaint behind if(useFNBPaint) ?

It's my lack of understanding of this, I thought that's how you had to send messages from content to chrome, via the mm. Not sure how to proceed here then.
A bunch of debug statements in pageloader, to help trace the code path with quantum-pageload:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9454e750645b97eb4023cfe51cad20652decdd15
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=9454e750645b97eb4023cfe51cad20652decdd15

Interesting, so this try run has fnbpaint enabled for tp6 as well as tsvg_static. The debug statements do show up on tsvg_static on OSX and show it capturing first non-blank paint successfully. However for some reason on the other platforms the debug statements aren't showing up so I can't verify it is capturing successfully there.

So next step, I'll put some temporary debug code in the pageloader content script, so that when fnbpaint is actually grabbed it will error out of talos, that way I can be certain fnbpaint is being captured successfully on the other platforms.
Hmm, so capturing fnbpaint in talos is only working on OSX but not the other platforms for some reason, even though the -fnbpaint flag is passed into talos on all.
Comment on attachment 8888468 [details]
Bug 1363845 - Talos pageloader changes to support 'time to first non-blank paint' measurement for tp6;

Big rebase, creating new attachment/review
Attachment #8888468 - Attachment is obsolete: true
Got it working after rebasing and signing pageloader add-on. Successfully grabbed time to fnbpaint in talos on all platforms. This try push is talos deliberately crashing out of tsvg_static and tp6 after grabbing the fnbpaint value.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=1e56d06278258583d9c75143f2f50f6197656277&selectedJob=117113123

Note:  window.performance.timing.timeToNonBlankPaint was just changed from a duration value to a timestamp an hour ago, so I'll update accordingly:

https://bugzilla.mozilla.org/show_bug.cgi?id=1377251#c33
Comment on attachment 8888468 [details]
Bug 1363845 - Talos pageloader changes to support 'time to first non-blank paint' measurement for tp6;

https://reviewboard.mozilla.org/r/159414/#review166280

this looks much nicer!
Attachment #8888468 - Flags: review?(jmaher) → review+
Pushed by rwood@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8d22078682e4
Talos pageloader changes to support 'time to first non-blank paint' measurement for tp6; r=jmaher
results are in:
== Change summary for alert #8268 (as of July 25 2017 19:22 UTC) ==

Improvements:

 73%  quantum_pageload_amazon summary windows7-32 opt e10s stylo     2,768.33 -> 736.17
 73%  quantum_pageload_amazon summary windows10-64 opt e10s stylo    2,441.42 -> 666.54
 69%  quantum_pageload_facebook summary windows7-32 opt e10s stylo   1,346.96 -> 411.50
 68%  quantum_pageload_facebook summary windows10-64 opt e10s stylo  1,305.12 -> 416.17
 67%  quantum_pageload_youtube summary windows7-32 opt e10s stylo    1,465.04 -> 478.08
 60%  quantum_pageload_youtube summary windows10-64 opt e10s stylo   1,389.48 -> 553.75
 53%  quantum_pageload_google summary windows7-32 opt e10s stylo     1,137.21 -> 528.88
 53%  quantum_pageload_google summary windows10-64 opt e10s stylo    1,142.17 -> 535.83

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8268

:bsmedberg- are there any concerns with this change in values from the mozafterpaint->time to first non blank paint ?
Flags: needinfo?(benjamin)
Whiteboard: [PI:July]
https://hg.mozilla.org/mozilla-central/rev/8d22078682e4
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
I don't know enough about mozafterpaint to say for certain, although 2,768.33 -> 736.17 looks like we're measuring pretty different things: mstange can you give this a sanity-check?

Are we sure that this is measuring first *nonblank* paint and not the first *blank* paint?
Flags: needinfo?(benjamin) → needinfo?(mstange)
Previously tp6 was just waiting for load then grabbing Date.now() and using that for the end_date, to calculate the duration (we weren't using mozafterpaint for tp6). Now at the time of load instead of just grabbing Date.now(), we're waiting for 'window.performance.timing.timeToNonBlankPaint' to have a value (there right away most times) and grabbing that to use for the 'end_date' value, which is significantly less than Date.now().
Ah then this data makes perfect sense and is reasonable.
Flags: needinfo?(mstange)
so are we still sure this is measuring the first nonblank paint?  I imagine the overhead in getting the execution to pageloader and querying date.now() is accounting for a good portion of the time.  Just seeing 2000ms vs 600ms is a bit daunting.

:bsmedberg, does this match up with what we see in manual video analysis?
Flags: needinfo?(benjamin)
This is in the range of the live-site video recording. Examples:
 google search is 24 frames ~= 400ms.
 amazon homepage is 41 frames ~= 680ms

Video recording with mitmproxy is not yet stable enough to compare.

What does "overhead in getting the execution to pageloader and querying date.now" mean?
Flags: needinfo?(benjamin)
by overhead, I am referring to the time it takes to switch context then call the date.now() api wherein the case of window.performance..., this is set by the browser and doesn't change, it might be a few milliseconds, or maybe a few hundred, I doubt it is too much.
You're talking about recording the time the framework *starts* navigation? if you're using Date.now(), I'd strongly suggest switching to window.performance.now() for your start-navigation metric. It returns a monotonic DOMHighResTimeStamp which should be more comparable.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #31)
> You're talking about recording the time the framework *starts* navigation?
> if you're using Date.now(), I'd strongly suggest switching to
> window.performance.now() for your start-navigation metric. It returns a
> monotonic DOMHighResTimeStamp which should be more comparable.

Just FYI, Bug 1380676 - Fix instances of using Date.now() under testing/talos
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: