Closed Bug 1344893 Opened 7 years ago Closed 7 years ago

Report Navigation Timing into Telemetry

Categories

(Toolkit :: Performance Monitoring, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Performance Impact high
Tracking Status
firefox55 --- fixed

People

(Reporter: Harald, Assigned: wcpan)

References

(Blocks 1 open bug)

Details

(Whiteboard: [ps-radar])

Attachments

(2 files, 9 obsolete files)

8.88 KB, patch
wcpan
: review+
Details | Diff | Splinter Review
3.41 KB, patch
wcpan
: review+
Details | Diff | Splinter Review
To have counterparts to page load benchmarks we need to understand the breakdown of loading. We have new perceived measures landing, like non-blank paint (bug 1307242) but also need to know how it correlates to other milestones on the load lifecycle.

As a start we need probes for (naming per PerformanceTiming + proposed telemetry probe name):

- domLoading - TIME_TO_DOM_LOADING
- domInteractive - TIME_TO_DOM_INTERACTIVE
- domContentLoaded - …
- domComplete
- loadEvent

This should be only measures for the root document.
:kanru, can your team help with that?
Flags: needinfo?(kchen)
Hi Harald, I'm curious, how do you want to use this data? We can add the corresponding performance timing data to telemetry but you can only compare the averages of the timings. It's probably a good way to have insights into real world numbers and catch regressions but is not helpful for diagnosing particular website.
Flags: needinfo?(kchen) → needinfo?(hkirschner)
Whiteboard: [ps-radar]
This data would not be collected per site. But as these are measures that are collected by sites out there, we we can reach out to them and compare. This data is also reported by our page load benchmarks, so we need it to tweak the benchmark so it gets as close as possible to users.
Flags: needinfo?(hkirschner)
Whiteboard: [ps-radar] → [ps-radar][qf:p1]
We might use evented telemetry for that later, but for now just having the distributions for the timings is useful.
Flags: needinfo?(kchen)
responseStart should be also tracked as time to first byte metric. This only applies to top level document and should not be tracked for sub-resources.
Assignee: nobody → wpan
Can we prioritize this in the work queue as it blocks setting performance targets. We already have data from benchmarks.
Comment on attachment 8853035 [details]
Bug 1344893 - Part 1: Report Navigation Timing into Telemetry.

https://reviewboard.mozilla.org/r/125146/#review128194

Horribly sorry that I'm behind my reviews.

::: dom/base/nsDOMNavigationTiming.h:127
(Diff revision 1)
>  
>    void Clear();
>  
> +  bool IsTopLevelContentDocument() const;
> +
> +  nsWeakPtr mDocShell;

mozilla::WeakPtr might make this all  a bit simpler

::: dom/base/nsDOMNavigationTiming.cpp:283
(Diff revision 1)
>    }
>    return 0;
>  }
> +
> +bool
> +nsDOMNavigationTiming::IsTopLevelContentDocument() const

This stuff could perhaps use the GetSameTypeRoot thingie from docshell.
Attachment #8853035 - Flags: review?(bugs)
Comment on attachment 8853036 [details]
Bug 1344893 - Part 2: Add time to first byte metric.

https://reviewboard.mozilla.org/r/125148/#review128196

Similar stuff here as in the other patch.
And sorry about the delay!


If you update the patches, I can take another look (even if my review queue is closed).
Attachment #8853036 - Flags: review?(bugs)
Flags: needinfo?(bugs)
Harald, do you need data about DOMContentLoaded start or end time? Same with load event.
I assume so.
Flags: needinfo?(bugs) → needinfo?(hkirschner)
Comment on attachment 8853558 [details] [diff] [review]
0002-Bug-1344893-Part-2-Add-time-to-first-byte-metric.-r-.patch

oh, PerformanceTiming::PerformanceTiming is a bit hard to follow.
One needs to know that aHttpChannel is null for subresources.
Could you add a comment about why to check !aHttpChannel.

But I don't quite understand this.
We create PerformanceTiming only if needed. If the page never accesses .performance, nothing is added to telemetry, assuming I read the code right.

Am I missing something?
Attachment #8853558 - Flags: review-
> Harald, do you need data about DOMContentLoaded start or end time? Same with load event.

With the multitude of discussions about diagnostic metrics in mind, I would probably collect the start-end durations for respective. This assumes engineers would benefit from this, as those durations are otherwise only slightly helpful to defer perceived performance but definitely help with tracking specific improvements.
Flags: needinfo?(hkirschner)
(In reply to Olli Pettay [:smaug] (review queue closed. Please ask reviews from other DOM peers) from comment #15)
> Comment on attachment 8853558 [details] [diff] [review]
> 0002-Bug-1344893-Part-2-Add-time-to-first-byte-metric.-r-.patch
> 
> oh, PerformanceTiming::PerformanceTiming is a bit hard to follow.
> One needs to know that aHttpChannel is null for subresources.
> Could you add a comment about why to check !aHttpChannel.
> 
> But I don't quite understand this.
> We create PerformanceTiming only if needed. If the page never accesses
> .performance, nothing is added to telemetry, assuming I read the code right.
> 
> Am I missing something?

I think we will create PerformanceTiming for each HttpChannel:
https://dxr.mozilla.org/mozilla-central/rev/720b9177c6856c1c4339d0fac1bf5149c0d53950/netwerk/protocol/http/HttpChannelChild.cpp#1027

Another concern is that if dom.enable_performance is disabled, the value will be zero.
I should also check the flag to avoid invalid data.
Since the pref is enabled by default, this should not affect probe collecting too much.
aha, I couldn't have imagined HttpChannel to make that call. 

And yes, would be good to not report non-valid data.
Rebase for latest m-c, nothing changed, carry over r=smaug.
Attachment #8853557 - Attachment is obsolete: true
Attachment #8853558 - Attachment is obsolete: true
Attachment #8855245 - Flags: review+
1. added #include "nsITimedChannel.h" because it could not compile without unified build
2. changed the comment for !aHttpChannel
3. added pref checking
Flags: needinfo?(bugs)
(In reply to :Harald Kirschner :digitarald from comment #16)
> > Harald, do you need data about DOMContentLoaded start or end time? Same with load event.
> 
> With the multitude of discussions about diagnostic metrics in mind, I would
> probably collect the start-end durations for respective. This assumes
> engineers would benefit from this, as those durations are otherwise only
> slightly helpful to defer perceived performance but definitely help with
> tracking specific improvements.

Not quite sure what you mean with the comment. You want to track both start and end?
Flags: needinfo?(bugs) → needinfo?(hkirschner)
Comment on attachment 8855246 [details] [diff] [review]
0002-Bug-1344893-Part-2-Add-time-to-first-byte-metric.-r-.patch

>@@ -47,6 +52,15 @@ PerformanceTiming::PerformanceTiming(Performance* aPerformance,
>   }
> 
>   InitializeTimingInfo(aChannel);
>+
>+  // The aHttpChannel argument is null implies that this PerformanceTiming
>+  // object is being used for subresources, which is not relevant to this probe.
>+  if (!aHttpChannel &&
>+      nsContentUtils::IsPerformanceTimingEnabled() &&
>+      IsTopLevelContentDocument()) {
Isn't the comment backwards. aHttpChannel is non-null implies subresources.


But please wait for Harald's feedback on what data we want to collect here before landing.
Attachment #8855246 - Flags: review+
Harald, could you take a look at comment 21?
Flags: needinfo?(kchen)
(In reply to Olli Pettay [:smaug] from comment #21)
> (In reply to :Harald Kirschner :digitarald from comment #16)
> > > Harald, do you need data about DOMContentLoaded start or end time? Same with load event.
> > 
> > With the multitude of discussions about diagnostic metrics in mind, I would
> > probably collect the start-end durations for respective. This assumes
> > engineers would benefit from this, as those durations are otherwise only
> > slightly helpful to defer perceived performance but definitely help with
> > tracking specific improvements.
> 
> Not quite sure what you mean with the comment. You want to track both start
> and end?

Yes, we should track start and end.
Flags: needinfo?(hkirschner)
(In reply to :Harald Kirschner :digitarald from comment #24)
> Yes, we should track start and end.

So we not only want to track "from navigation start to all load events end", but also "from navigation start to the first load event start"?
Flags: needinfo?(hkirschner)
The 4 affected timings are:

- domContentLoadedEventStart
- domContentLoadedEventEnd
- loadEventStart
- loadEventEnd
Flags: needinfo?(hkirschner)
Add two probes to meature domContentLoadedEventStart and loadEventStart. (due to comment 26)

No other code changed, carry over r=smaug.
Attachment #8855245 - Attachment is obsolete: true
Attachment #8855246 - Attachment is obsolete: true
Attachment #8859060 - Flags: review+
Correct comment mentioned in comment 22. Carry over r=smaug.
Attachment #8859062 - Flags: review+
Comment on attachment 8859060 [details] [diff] [review]
0001-Bug-1344893-Part-1-Report-Navigation-Timing-into-Tel.patch

Request approval for adding probes.
Attachment #8859060 - Flags: feedback?(benjamin)
Comment on attachment 8859062 [details] [diff] [review]
0002-Bug-1344893-Part-2-Add-time-to-first-byte-metric.-r-.patch

Request approval for adding probes.
Attachment #8859062 - Flags: feedback?(benjamin)
Comment on attachment 8859060 [details] [diff] [review]
0001-Bug-1344893-Part-1-Report-Navigation-Timing-into-Tel.patch

I have deep skepticism that this will be valuable telemetry, but I'm willing to collect it so Harald can experiment with it and demonstrate its value. data-r=me
Attachment #8859060 - Flags: feedback?(benjamin) → feedback+
Comment on attachment 8859062 [details] [diff] [review]
0002-Bug-1344893-Part-2-Add-time-to-first-byte-metric.-r-.patch

data-r=me
Attachment #8859062 - Flags: feedback?(benjamin) → feedback+
Add data-review=bsmedberg. Carry over r=smaug.
Attachment #8859060 - Attachment is obsolete: true
Attachment #8859062 - Attachment is obsolete: true
Attachment #8859489 - Flags: review+
Add data-review=bsmedberg. Carry over r=smaug.
Attachment #8859490 - Flags: review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3559898682b9
Part 1: Report Navigation Timing into Telemetry. r=smaug, data-review=bsmedberg
https://hg.mozilla.org/integration/mozilla-inbound/rev/3e22ac1beeda
Part 2: Add time to first byte metric. r=smaug, data-review=bsmedberg
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/3559898682b9
https://hg.mozilla.org/mozilla-central/rev/3e22ac1beeda
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Wei-Cheng, do you know why the telemetry sample counts for the various TIME_TO_DOM_*_MS telemetry probes are not exactly the same? I assumed we would report one sample for each probe on every page load, so we should have the exactly same number of DOM_LOADING samples as DOM_COMPLETE samples. But I see the following sample counts for Beta 55:

TIME_TO_DOM_LOADING_MS = 20.62M (This count is very small!)
TIME_TO_DOM_INTERACTIVE_MS = 6.27B
TIME_TO_DOM_CONTENT_LOADED_START_MS = 7.08B
TIME_TO_DOM_CONTENT_LOADED_END_MS = 7.08B (LOADED_END count is same as LOADED_START, as I would expect)
TIME_TO_NON_BLANK_PAINT_MS = 4.02B
TIME_TO_DOM_COMPLETE_MS = 5.31B

Do we only record these samples if the web content queries the performance.timing properties?

https://telemetry.mozilla.org/new-pipeline/dist.html#max_channel_version=beta%252F55&measure=TIME_TO_DOM_LOADING_MS
Flags: needinfo?(wpan)
I think that's because mTiming does not always ready when the document is in loading state.
We probably need to record those probes in nsDocument::SetReadyStateInternal directly.
Flags: needinfo?(wpan)
Thanks. I filed bug 1387625.
Depends on: 1387625

Have we considered adding earlier navigation timing results?

I'm thinking specifically of
domainLookupStart / domainLookupEnd
and
connectStart / connectEnd

which would give us data points on network latency.

Performance Impact: --- → P1
Whiteboard: [ps-radar][qf:p1] → [ps-radar]
You need to log in before you can comment on or make changes to this bug.