Closed Bug 1423623 Opened 2 years ago Closed 2 years ago

add telemetry for alternate data stream on service worker synthesized channels

Categories

(Core :: DOM: Service Workers, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
firefox59 --- fixed

People

(Reporter: bkelly, Assigned: edenchuang)

References

(Blocks 3 open bugs)

Details

Attachments

(2 files, 3 obsolete files)

In bug 1350359 we added support for propagating the nsICacheInfoChannel and the alternate data stream to service worker synthesized channels.  It would be nice to somehow measure the impact from this change.

Eden, can you look at adding telemetry probes for:

1. Add a scalar to count how many synthesized channels are using the alternate data stream.  I think we could do it around here:

https://searchfox.org/mozilla-central/source/dom/workers/ServiceWorkerEvents.cpp#671

Comparing to the SW_SYNTHESIZED_RES_COUNT would also let us compute a percent of total synthesized requests that use alternate data.  I expect the percentage will be somewhat low for now since we only have the passthrough fetch event case, but will increase when we implement Cache API support.

2. Maybe change the other probes here:

https://searchfox.org/mozilla-central/source/netwerk/protocol/http/InterceptedChannel.cpp#172

To report values for "navigation", "subresource/script", and "subresource/other".  Or we could go ahead and provide more granularity for subresource now stuff like "subresource/stylesheet", "subresource/image", etc.

The goal here would be to see if the "subresource/script" timing values improve when we ship the Cache API alternate data stream.  We could also uplift this particular probe change to beta 58 so we could maybe see a difference when bug 1350359 merges to beta.

We probably want to do this before bug 1336199 so we can measure impact.
Flags: needinfo?(echuang)
Nicolas points out we shouldn't see an improvement here since the alternate data stream is saving time later in the pipeline during javascript compilation.  There may be some regression in load time as well if the alternate data stream is much larger.

We should at least have some idea of how this impacts our load timing metrics, though.
I guess maybe uplifting to beta may not be necessary.  This is more just sanity checking, etc.
(In reply to Ben Kelly [:bkelly] from comment #1)
> Nicolas points out we shouldn't see an improvement here since the alternate
> data stream is saving time later in the pipeline during javascript
> compilation.  There may be some regression in load time as well if the
> alternate data stream is much larger.
> 

Sorry, I am not sure I understand why we shouldn't see an improvement.
Does that mean we use the alternative data after javascript compilation?

> We should at least have some idea of how this impacts our load timing
> metrics, though.

I think we can still add some telemetry probes first according to your comment, then to see how to create a new measurement for the impact.
Assignee: nobody → echuang
Status: NEW → ASSIGNED
Flags: needinfo?(echuang)
Priority: -- → P2
(In reply to Eden Chuang[:edenchuang] from comment #3)
> (In reply to Ben Kelly [:bkelly] from comment #1)
> > Nicolas points out we shouldn't see an improvement here since the alternate
> > data stream is saving time later in the pipeline during javascript
> > compilation.  There may be some regression in load time as well if the
> > alternate data stream is much larger.
> > 
> 
> Sorry, I am not sure I understand why we shouldn't see an improvement.
> Does that mean we use the alternative data after javascript compilation?

So if you think about what it takes to execute a <script> element there are a few steps:

1. Load the script from network, http cache, or service worker
2. Parse/compile the script
3. Execute the script

We are working in step (1) by allowing the alternate data stream to be surfaced to step (2).  By itself, step (1) may get slower with the alternate data stream because it will often be 3 to 4 times larger than the original javascript code.

In the grand scheme of things, though, this is fine because the alternate data stream then lets step (2) run much faster.  This makes up for the time lost in step (1).

Since the probes I suggest in comment 0 only have visibility into step (1) they will not show any benefit in step 2.  And they might show a slight regression in load times because of the larger data size.

I still think its good to add these probes, though, so we can see what the actual impact is or if there is an extreme regression, etc.

Does that make sense?

Thanks.
Comment on attachment 8936994 [details] [diff] [review]
P1: add telemetry for alternate data stream on service worker synthesized channels. r?bkelly

Review of attachment 8936994 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with comments addressed.

Also, you need to request data approval. For example, see bug 1416629 comment 13.

::: netwerk/protocol/http/InterceptedChannel.cpp
@@ +171,5 @@
>  
>    nsCString navigationOrSubresource = nsContentUtils::IsNonSubresourceRequest(channel) ?
>      NS_LITERAL_CSTRING("navigation") : NS_LITERAL_CSTRING("subresource");
>  
> +  if (navigationOrSubresource.Equals(NS_LITERAL_CSTRING("subresource"))) {

This code needs to be added to InterceptedHttpChannel as well.  Sorry for the duplication.  I guess you could make a helper method somewhere that converts from content policy type to the telemetry key.  We should be removing InterceptedChannel.cpp in the next month or two, though, so maybe just copying the code is fine for now.

@@ +180,5 @@
> +          navigationOrSubresource.Append(NS_LITERAL_CSTRING("/script"));
> +          break;
> +        }
> +        case nsIContentPolicy::TYPE_OTHER: {
> +          navigationOrSubresource.Append(NS_LITERAL_CSTRING("/other"));

Can you move this case down into the default: case and make all non-script, image, and stylesheet takes "subresource-other"?

I think completely abandoning "subresource" without a suffix would be preferable to avoid confusion looking at the timeseries telemetry chart.  It may not be clear to people reading it that "subresource" data is still being reported, but a bunch of requests were removed on this push. By using the suffix consistently the "subresource" data will just end instead.
Attachment #8936994 - Flags: review?(bkelly) → review+
(In reply to Ben Kelly [:bkelly] from comment #6)
> > +        case nsIContentPolicy::TYPE_OTHER: {
> > +          navigationOrSubresource.Append(NS_LITERAL_CSTRING("/other"));
> 
> Can you move this case down into the default: case and make all non-script,
> image, and stylesheet takes "subresource-other"?
> 
> I think completely abandoning "subresource" without a suffix would be
> preferable to avoid confusion looking at the timeseries telemetry chart.  It
> may not be clear to people reading it that "subresource" data is still being
> reported, but a bunch of requests were removed on this push. By using the
> suffix consistently the "subresource" data will just end instead.

Or maybe better:

Could we continue to report the "subresource" key like we do today and also call the telemetry method with a second key for "subresource-script", "-image", and "-stylsheet"?
Update patch according to comment 6.

> Could we continue to report the "subresource" key like we do today and also
> call the telemetry method with a second key for "subresource-script",
> "-image", and "-stylsheet"?

Telemetry::Accumulate does not support to pass the second key. So I keep subresource-script for all script types and subresource-other for remains.
Attachment #8936994 - Attachment is obsolete: true
Attachment #8937503 - Flags: review?(bkelly)
(In reply to Eden Chuang[:edenchuang] from comment #8)
> Telemetry::Accumulate does not support to pass the second key. So I keep
> subresource-script for all script types and subresource-other for remains.

Oh, I meant to call Accumulate() twice.  Once with the "subresource" key and once with whatever "subresource-*" key is selected, if any.

Does that make sense?
Flags: needinfo?(echuang)
(In reply to Ben Kelly [:bkelly] from comment #9)
> (In reply to Eden Chuang[:edenchuang] from comment #8)
> > Telemetry::Accumulate does not support to pass the second key. So I keep
> > subresource-script for all script types and subresource-other for remains.
> 
> Oh, I meant to call Accumulate() twice.  Once with the "subresource" key and
> once with whatever "subresource-*" key is selected, if any.
> 
> Does that make sense?

Sorry I misunderstand the meaning of the second key.
Flags: needinfo?(echuang)
Comment on attachment 8937503 [details] [diff] [review]
P1: add telemetry for alternate data stream on service worker synthesized channels. r?bkelly

Will send another patch according to comment 9
Attachment #8937503 - Flags: review?(bkelly) → review-
Update the patch.

1. generate following subresource time stamp keys
   subresource-script for all script type.
   subresource-image for all image type.
   subresource-stylesheet for all stylesheet type.
   subresource-other for all other types.

2. keep original reporting and adding new report value with subresource time stamp keys.
Attachment #8937503 - Attachment is obsolete: true
Attachment #8937516 - Flags: review?(bkelly)
Comment on attachment 8937516 [details] [diff] [review]
P1: add telemetry for alternate data stream on service worker synthesized channels. r?bkelly

Review of attachment 8937516 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!  Please request data approval as described in bug 1416629 comment 13.
Attachment #8937516 - Flags: review?(bkelly) → review+
Attached file request.md
Francois, could you help to review the data collection request document? Thanks.
Attachment #8937629 - Flags: review?(francois)
Comment on attachment 8937516 [details] [diff] [review]
P1: add telemetry for alternate data stream on service worker synthesized channels. r?bkelly

Francois, can you do a data review of this telemetry probe? Thanks.
Attachment #8937516 - Flags: review+ → review?(francois)
Comment on attachment 8937629 [details]
request.md

1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes, scalars.yml.

2) Is there a control mechanism that allows the user to turn the data collection on and off? 

Yes, telemetry setting.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?**

Not permanent.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?  **

Category 1.

5) Is the data collection request for default-on or default-off?

Default on everywhere.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes.

8) Does there need to be a check-in in the future to determine whether to renew the data?

No, telemetry alerts will be sent.
Attachment #8937629 - Flags: review?(francois) → review+
Comment on attachment 8937516 [details] [diff] [review]
P1: add telemetry for alternate data stream on service worker synthesized channels. r?bkelly

Review of attachment 8937516 [details] [diff] [review]:
-----------------------------------------------------------------

I have not looked at the code but the probe looks good. datareview+
Attachment #8937516 - Flags: review?(francois) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5a50adabbb11
Add telemetry for alternate data stream on service worker synthesized channels. r=bkelly, data-r=francois
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/5a50adabbb11
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Looking at telemetry in beta 59 currently we see about 8.6M interceptions using alternate data stream out of 126.97M total cases where respondWith is called.  So that's about 6.8% of interceptions taking advantage of alternate bytecode cache.

That's higher than I expected given that we only support a narrow range of use cases right now.  Only those service workers using respondWidth(fetch(evt.request)).  I guess there are a fair number of people using pass-through service workers for script requests.
You need to log in before you can comment on or make changes to this bug.