Closed Bug 1567462 Opened 2 years ago Closed 1 year ago

Resurrect telemetry for http redirects

Categories

(Core :: Networking: HTTP, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
mozilla73
Tracking Status
firefox70 --- wontfix
firefox73 --- fixed

People

(Reporter: overholt, Assigned: CuveeHsu)

References

Details

(Whiteboard: [necko-triaged])

Attachments

(3 files)

We had this telemetry from bug 1413512 in Firefox 59 through 63. There is a long tail of schemas if you look at the telemetry from then (https://mzl.la/2Y16Vec - click on the scheme to see all the types submitted) but I'm interested because Fission will often involve process switching.

Nika, was there anything else we discussed that would be useful?

Nhi, can anyone on the Necko team take this? The code is somewhere around https://searchfox.org/mozilla-central/source/netwerk/protocol/http/nsHttpHandler.cpp#839 FWIW.

Flags: needinfo?(nika)
Flags: needinfo?(nhnguyen)

It's good to know the wide diversity of different schemes which HTTP loads can redirect to. Many of these are external protocol handlers which, I believe, will never produce content to draw in a target process, but others (like data:, ftp:, jar: etc) can produce data. If we want to restrict what URLs can be the target of HTTP redirects to a known set in the future, we probably need data on what URLs are being used as targets in the wild.

We should probably do this collection after security checks might reject the attempt to redirect, as redirects which are already redirected likely don't need to be recorded and might throw off the numbers.

I don't think it's important to collect information on which custom protocols are being used. In fact, given that they're referring to people's programs on their computers (e.g. zoomapp:), we probably shouldn't collect that info for privacy reasons. I think it would be cleanest to collect into a well-defined series of buckets.

There should probably be one bucket for each internally-recognized scheme, and a bucket for externally-handled schemes. It looks like NS_NewURI now directly encodes this list, rather than using interface lookup, so it might be easiest to maintain the list of known schemes nearby that function (https://searchfox.org/mozilla-central/rev/96403eac4ff6f96f89a0b120b29bd46540d5705e/netwerk/base/nsNetUtil.cpp#1694).

In general, I'm surprised by the number of different schemes which are used as redirect targets! The sheer number helps validate the DocumentChannel approach, though, by making it so we don't need to handle every one of these independently. Having more formal documentation about what schemes are allowed as targets, though, may be valuable both for internal security and in the standard.

Flags: needinfo?(nika)
See Also: → document-channel
Assignee: nobody → juhsu
Flags: needinfo?(nhnguyen)
Priority: -- → P2
Whiteboard: [necko-triaged]

We should probably do this collection after security checks might reject the attempt to redirect, as redirects which are already redirected likely don't need to be recorded and might throw off the numbers.

I assume you mean checking if the redirection is valid.
That is https://searchfox.org/mozilla-central/rev/e0b0c38ee83f99d3cf868bad525ace4a395039f1/netwerk/base/nsIOService.cpp#602-611

We should collect the telemetry after this check.

I don't think it's important to collect information on which custom protocols are being used. In fact, given that they're referring to people's programs on their computers (e.g. zoomapp:), we probably shouldn't collect that info for privacy reasons. I think it would be cleanest to collect into a well-defined series of buckets.

There should probably be one bucket for each internally-recognized scheme, and a bucket for externally-handled schemes. It looks like NS_NewURI now directly encodes this list, rather than using interface lookup, so it might be easiest to maintain the list of known schemes nearby that function (https://searchfox.org/mozilla-central/rev/96403eac4ff6f96f89a0b120b29bd46540d5705e/netwerk/base/nsNetUtil.cpp#1694).

So, I look into the telemetry we collected.
What we're interested in is {http(s), about, javascript, ftp, file, jar, data}
I'm not sure if I need to take care of {ssh, moz*, dat}
We have few but non-zero redirections.
ssh, dat use external app, and I believe we will remove them from NS_NewURI in bug 1561860.
moz* is something we should forbid.

By the way, we have extremely low samples of redirections to ftp/file.
The number is much less than magnet://, a p2p protocol IIRC.

For fission purpose, we will continue to break down this information by whether the redirection happens for a document load or a subresource.

Attachment #9109209 - Attachment description: Bug 1567462 - Resurrect telemetry for http redirects → Bug 1567462 - Resurrect telemetry for http redirects, r=valentin
Attached file request.md
Attachment #9110921 - Flags: data-review?(chutten)
Comment on attachment 9110921 [details]
request.md

Load balancing to Loren.
Attachment #9110921 - Flags: data-review?(chutten) → data-review?(laustin)

Sorry for the push. Could we move this forward? Thanks.

Flags: needinfo?(laustin)

Any possible to have your attention here, Chris? I don't want to miss this train again. Sorry for the inconvenience.

Flags: needinfo?(chutten)
Comment on attachment 9110921 [details]
request.md

PRELIMINARY NOTES:
The data collection will only collect on pre-release channels (it lacks a "releaseChannelCollection" key with value "opt-out"), but the review request is for all channels. I'll proceed with the review for all channels.

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is Telemetry so is documented in its definitions file [Histograms.json](https://hg.mozilla.org/mozilla-central/file/tip/toolkit/components/telemetry/Histograms.json) and the [Probe Dictionary](https://telemetry.mozilla.org/probe-dictionary/).

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry so can be controlled through Firefox's Preferences.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Junior Hsu is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview+
Flags: needinfo?(laustin)
Flags: needinfo?(chutten)
Attachment #9110921 - Flags: data-review?(laustin) → data-review+

Thanks for the review. I'll update to patch to opt-out all channels.

Pushed by juhsu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f1be062c4e2a
Resurrect telemetry for http redirects, r=valentin
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73
Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/integration/autoland/rev/c68fb518e696
Make the test_redirect_protocol_telemetry.js test collect telemetry data for all products during testing. r=valentin

(In reply to Pulsebot from comment #15)

Pushed by mkmelin@iki.fi:
https://hg.mozilla.org/integration/autoland/rev/c68fb518e696
Make the test_redirect_protocol_telemetry.js test collect telemetry data for
all products during testing. r=valentin

I just realized we should also clear the pref after the test like so:
https://searchfox.org/mozilla-central/rev/62a130ba0ac80f75175e4b65536290b52391f116/netwerk/test/unit/test_parse_content_type.js#430-434
Geoff, can you submit an extra patch that does so?

Flags: needinfo?(geoff)

AFAIK, xpcshell tests get a clean plate automatically so I would think that's not necessary.

I agree with Magnus, but I'll do it if it means you'll remember not to break us in this way again. :-)

Flags: needinfo?(geoff)
You need to log in before you can comment on or make changes to this bug.