Closed Bug 1397293 Opened 7 years ago Closed 7 years ago

Fix Telemetry sending pings late after network shutdown

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: gfritzsche, Assigned: chutten)

References

Details

Attachments

(3 files)

Per bug 1393731, comment 6: > Looking locally, at my about:telemetry page, i see a common pattern: > - one health ping with reason "immediate" > - followed after <1sec by a "shutdown" health ping > > We should be able to confirm if this common with a longitudinal, per-client analysis. > If so, that would (again) point to that we are trying to send pings after the > Firefox network stack already shut down. This seems a concrete issue that we need to address. For potential Telemetry risk we may hold changes here back to 58.
Assignee: nobody → gfritzsche
We have multiple network related observer notifications, these are what i found: "profile-change-net-teardown" "network:offline-status-changed" with data=="offline" "network:offline-about-to-go-offline" Andrea, do you know which one to prefer in our situation? Right now we specifically want to avoid/stop sending activity from Telemetry when the browser shuts down. We may extend this later to cover temporary network availability too, but this is out of scope for now.
Flags: needinfo?(amarchesini)
Patrick, do you know about the questions in comment 1?
Assignee: gfritzsche → chutten
Flags: needinfo?(mcmanus)
Flags: needinfo?(mcmanus) → needinfo?(daniel)
"profile-change-net-teardown" is the shutdown notification to use which signals that shutdown is now in progress. The other two are for when Firefox is "merely" going offline.
Flags: needinfo?(daniel)
Flags: needinfo?(amarchesini)
Comment on attachment 8912205 [details] bug 1397293 - Don't try to send Telemetry after net shutdown https://reviewboard.mozilla.org/r/183578/#review188764 ::: toolkit/components/telemetry/Histograms.json:7272 (Diff revision 1) > "eUnreachable", > "eChannelOpen", > "eRedirect", > "abort", > - "timeout" > + "timeout", > + "eTooLate" Mh, since this is a categorical histogram, do we need to rename it when adding labels? See [doc](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html#categorical) ::: toolkit/components/telemetry/TelemetrySend.jsm:1088 (Diff revision 1) > // We can't send the pings to the server, so don't try to. > this._log.trace("_doPing - Can't send ping " + ping.id); > return Promise.resolve(); > } > > + if (this._tooLateToSend) { Can we add test coverage for this in test_TelemetrySend (or a new file)? We might want to check that: - Pings still get persisted and will be sent next time telemetry starts. - The histogram contains the right values.
Attachment #8912205 - Flags: review?(alessio.placitelli) → review-
Comment on attachment 8912205 [details] bug 1397293 - Don't try to send Telemetry after net shutdown https://reviewboard.mozilla.org/r/183578/#review188764 > Mh, since this is a categorical histogram, do we need to rename it when adding labels? See [doc](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html#categorical) Nope! So long as I keep under the 50-bucket default, we're cool. See: Bug 1374508, amongst others. > Can we add test coverage for this in test_TelemetrySend (or a new file)? We might want to check that: > > - Pings still get persisted and will be sent next time telemetry starts. > - The histogram contains the right values. I tried adding a test and had difficulty clearing the number of pending pings. I'll try again.
Comment on attachment 8912205 [details] bug 1397293 - Don't try to send Telemetry after net shutdown https://reviewboard.mozilla.org/r/183578/#review188764 > Nope! So long as I keep under the 50-bucket default, we're cool. See: Bug 1374508, amongst others. Nice, I didn't know that! > I tried adding a test and had difficulty clearing the number of pending pings. I'll try again. Cool, let me know if there's anything I can do to help with that.
Comment on attachment 8912367 [details] bug 1397293 - Suppress TelemetryStopwatch errors in TelemetrySend tests https://reviewboard.mozilla.org/r/183696/#review189064 ::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:621 (Diff revision 1) > Services.prefs.setBoolPref(TelemetryUtils.Preferences.FhrUploadEnabled, origFhrUploadEnabled); > }); > > add_task(async function cleanup() { > await PingServer.stop(); > + TelemetryStopwatch.setTestModeEnabled(false); AFAIK, that's not needed: these tests are executed in a separated environment compared to the other files.
Attachment #8912367 - Flags: review?(alessio.placitelli) → review+
Attachment #8912366 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8912205 [details] bug 1397293 - Don't try to send Telemetry after net shutdown https://reviewboard.mozilla.org/r/183578/#review189070 Thanks for addressing the feedback!
Attachment #8912205 - Flags: review- → review+
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e663536181b0 Don't try to send Telemetry after net shutdown r=Dexter https://hg.mozilla.org/integration/autoland/rev/49ebf04b8291 Test TelemetrySend tooLateToSend r=Dexter https://hg.mozilla.org/integration/autoland/rev/747a89f3dead Suppress TelemetryStopwatch errors in TelemetrySend tests r=Dexter
Comment on attachment 8912205 [details] bug 1397293 - Don't try to send Telemetry after net shutdown Approval Request Comment [Feature/Bug causing the regression]: None. Long-standing issue with Telemetry [User impact if declined]: Twice as many "health" pings will be sent by users. [Is this code covered by automated tests?]: Yes, patch 2. [Has the fix been verified in Nightly?]: I've seen health ping volumes decrease[1] and eTooLate numbers increase[2] which is pretty good evidence in support of this working. [Needs manual test from QE? If yes, steps to reproduce]: I don't think so. [List of other uplifts needed for the feature/fix]: The other patches in this bug [Is the change risky?]: No. [Why is the change risky/not risky?]: Instead of trying to send a net request at a time we know it will fail, we don't try. [String changes made/needed]: None. [1]: https://screenshots.firefoxusercontent.com/images/af909e5a-3965-487e-a7b8-d068f5ae0146.png [2]: https://mzl.la/2ynjCru
Attachment #8912205 - Flags: approval-mozilla-beta?
Attachment #8912366 - Flags: approval-mozilla-beta?
Attachment #8912367 - Flags: approval-mozilla-beta?
Comment on attachment 8912205 [details] bug 1397293 - Don't try to send Telemetry after net shutdown As much as I'd like to take this change in 57, we are trying to limit the uplifts to severe recent regressions due to Quantum, Photon, Stylo work. This uplift doesn't meet that bar. Let this fix ride the 58 train.
Attachment #8912205 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8912366 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #8912367 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Depends on: 1437120
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: