Exclude spurious document loads from the zero-byte-load telemetry
Categories
(Core :: Networking, enhancement, P2)
Tracking
()
People
(Reporter: nika, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [necko-triaged][necko-priority-queue])
As described in bug 1696551 comment 39, there appear to be a number of issues with the current zero-byte-load telemetry causing it to over-report failures for many navigations to/from about: documents. The telemetry should probably be updated as described in that bug to avoid these over-reporting problems.
Was looking into the telemetry a bit this morning. I think some (perhaps most?) of the NS_ERROR_FAILURE errors for documents loaded in-content (like about pages and activity-stream-noscripts.html) are not real errors, and should probably be filtered out so we can get a real idea of what the errors look like here.
The zero-length-read telemetry reports a message whenever we don't read any data from the underlying file (e.g. because it was cancelled before reading). We do this intentionally very frequently for about: documents loaded in content processes due to how process switching/selection works:
When we start a document navigation, we first open the channel in the parent process, and wait for onStartRequest, where we'll make a process selection decision. After that decision, we then use RedirectToRealChannel to redirect the request into the "real" channel in the parent process. This registers the channel with the RedirectChannelRegistrar so that it can be linked up with a channel opened in the content process which was selected for the navigation. For non-http(s) channels, however, we often don't end up linking up the channel at all, as the content process will instead open up its own independent version of the channel to directly read the channel data (e.g. from the omnijar).
If the channel is never taken to be linked up, we should notice this in RedirectToRealChannelFinished, and end up calling FinishReplacementChannelSetup(NS_ERROR_FAILURE), which cancels the channel with that error, before any data is read from it.
I think there's a good chance that this normal pattern is treated as a zero-byte read for the purposes of the zero-byte-read telemetry, and might be inflating the number of errors we're seeing here quite a bit. I'm not sure how best to filter it out though, as NS_ERROR_FAILURE with no cancellation reason isn't the most descriptive. The easiest might be to add a new error which is used by this call-site as well as this assignment to indicate this reason for cancelling (maybe it should be something specific, like NS_ERROR_DLL_NO_PARENT_CHANNEL? - names are hard). That error could then be filtered out. (We might also want to add a reason string here, though it'd be less precise.)
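To make the proposed filtering concrete, here is a minimal standalone sketch of the idea. It is not Gecko code: LoadStatus, LoadResult, ShouldReportZeroByteLoad and ZeroByteLoadCounter are all hypothetical names invented for illustration, and LoadStatus::NoParentChannel only models the dedicated error suggested above (whatever it ends up being called). The assumption is that the telemetry can see both the final status and the number of bytes read.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for nsresult values; these are NOT the real Gecko
// definitions, just a model of the three cases discussed above.
enum class LoadStatus : std::uint8_t {
  Ok,
  Failure,          // models a generic NS_ERROR_FAILURE
  NoParentChannel,  // models the proposed dedicated cancellation error
};

// Hypothetical summary of a finished document load.
struct LoadResult {
  LoadStatus status;
  std::uint64_t bytesRead;
};

// Returns true if the load should count towards the zero-byte-load telemetry.
constexpr bool ShouldReportZeroByteLoad(const LoadResult& aResult) {
  if (aResult.bytesRead != 0) {
    return false;  // Data was read, so this is not a zero-byte load.
  }
  // The parent-side channel was cancelled on purpose because the content
  // process opened its own channel, so this is not a real failure.
  return aResult.status != LoadStatus::NoParentChannel;
}

// Hypothetical accumulator showing how the filter would change the counts.
struct ZeroByteLoadCounter {
  std::uint32_t reported = 0;
  std::uint32_t filtered = 0;

  void Observe(const LoadResult& aResult) {
    // The cancellation described above happens before any data is read.
    assert(aResult.status != LoadStatus::NoParentChannel ||
           aResult.bytesRead == 0);
    if (ShouldReportZeroByteLoad(aResult)) {
      ++reported;
    } else if (aResult.bytesRead == 0) {
      ++filtered;
    }
  }
};
```

With a distinct status, the filter is a single comparison and doesn't have to guess based on the URL or on a free-form reason string, which is the main argument for a new error code over a cancellation reason.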
As a side note, there are a large number of errors on activity-stream-noscripts.html which were previously filtered out using https://searchfox.org/mozilla-central/rev/4d00b50c42a788c51ac4c5fe92b684569ba6c3a5/modules/libjar/nsJARChannel.cpp#1009-1015. However, recent changes to how activity stream is bundled have changed the path for this file, meaning that these errors are no longer filtered out. I imagine (almost?) all of these errors are from the call-site I mention above (the newtab page is notoriously loaded in a content process).
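If a path-based filter is kept for this file in the meantime, one way to make it survive bundling changes would be to match on the leaf name rather than the full entry path. The sketch below is standalone and hypothetical: LeafName and IsExpectedZeroByteEntry are made-up helpers, and it assumes (without having checked the linked code in detail) that the existing filter broke because it depends on the file's location.

```cpp
#include <string_view>

// Returns the part of aPath after the last '/', or the whole string if
// there is no '/'.
inline std::string_view LeafName(std::string_view aPath) {
  const auto slash = aPath.rfind('/');
  return slash == std::string_view::npos ? aPath : aPath.substr(slash + 1);
}

// Hypothetical filter: true if the entry is the activity stream page whose
// zero-byte loads are expected, regardless of which directory it lives in.
inline bool IsExpectedZeroByteEntry(std::string_view aEntryPath) {
  return LeafName(aEntryPath) == "activity-stream-noscripts.html";
}
```

This is more of a stopgap than a fix: it would hide the errors for this one file again, but a dedicated error code at the cancellation call-site would cover every document that hits the same pattern.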
Updated•10 months ago