Closed Bug 1437120 Opened 2 years ago Closed 2 years ago

Sending pings at shutdown can silently fail (sync pings are not being sent)

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 - wontfix
firefox59 --- fixed
firefox60 --- fixed

People

(Reporter: Dexter, Assigned: chutten)

References

Details

(Keywords: regression)

Attachments

(3 files)

STR:

1. go to about:config and change toolkit.telemetry.server to http://127.0.0.1:8080
2. (Optional) In Firefox, go to Tools -> Sync now
3. shutdown firefox

Expected result:

A sync ping should be serialized in the pending pings directory (%profile%/saved-telemetry-pings) along with the other pings. The sync ping must be visible in the archive, by browsing about:telemetry.

Current results:

The sync ping is displayed in the archive through about:telemetry, but no "sync" pending ping is present.
Priority: -- → P1
Trying this myself the only pending ping is a saved-session ping.

All the rest (main, bhr, sync, and another saved-session) are in datareporting/archived/2018-02.

...we _do_ immediately respond to changes in to the server URL, right? They weren't successfully sent off to inbound because we forgot to tell pingsender the user'd changed the server location?
I had a bit of a play with this. I instrumented our telemetry.js's call to TelemetryController.submitExternalPing() so it logs the value returned by the promise - turns out it resolved with the GUID of the submission.

I see 2 different behaviors:

* the ping ends up in datareporting/archived/2018-02

* the ping remains in saved-telemetry-pings and a log message is generated:
> *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping

After re-running the profile, the one in saved-telemetry-pings does seem to end up in datareporting/archived/2018-02

Also, FWIW, in bug 1435067 we want to change to using the pingsender, but it's not clear how or if that would affect this issue.
See Also: → 1436657
This makes the existing test fail with the scenario Alessio spotted.
We made progress on narrowing this down and should have a more concrete update tomorrow.
    The current theory of the bug is that submitting a ping without using pingsender after the network has torn itself down but before TelemetrySend has started shutting down may result in Telemetry thinking that it has successfully sent a ping when in fact it has bailed out early[1].

    It appears as though "sync" pings sent at shutdown are the only pings that fall into this category.

    If you use pingsender, you don't have a problem, because we specifically save your ping as pending to disk and then invoke pingsender to send it. This means most submitExternalPing-sent pings sent at shutdown are immune, as well as not-first-session "main" pings.

    If you submit a ping earlier than net teardown, you don't have a problem, because normal ping-sending logic applies. (Try to send it, if it doesn't work, pend it to disk) This means main/daily, main/environment-change, health, update, and the bulk of our data collection is immune.

    If you submit a ping later than TelemetrySend shutdown, you don't have a problem, because if TelemetrySend is shut down we save everything to disk. This means our first sessions' main/shutdown and saved-session pings are immune.

    It is possible that some shield studies may have been written to time their custom pings in just such a way to run afoul of this bug, because they are legacy addons that can do anything.

    The earliest this might have started happening is Nightly58 build 20170928100123, when the "too late to send" logic was added in bug 1397293. (I'm not sure when sync started sending "sync" pings at shutdown)

    As part of that bug we added a test to ensure exactly this sort of thing wouldn't happen. The test's faulty. It checks if a too-late-to-send ping is pending after we try to send it after net teardown, but it does so _just_ before it is removed from the in-memory pending queue. 

    We'll fix the test as part of this bug.

    [1]: https://searchfox.org/mozilla-central/rev/9d47b5fb14152865ad1375eb3ee2e571d81ecdb9/toolkit/components/telemetry/TelemetrySend.jsm#1102
Assignee: nobody → chutten
Status: NEW → ASSIGNED
This patch not only fixes the behaviour and the test to be according to our theory of what is correct, it also makes it so that the STR now result in a sync ping being saved in saved-telemetry-pings in a build with these changes (and a few extra changes to force it to think it's allowed to send even though it's a developer build, which are not included in the patch).

I would appreciate a more solid QA pass being thrown at this, but given that it fixes a bug (even if not the one we're hunting) I have still put it up for review and hope to get it into Nightly post-haste, even if "sync"-ping QA remains outstanding.

At which point we might be able to do server-side confirmation as well as client-side.
Comment on attachment 8950741 [details]
bug 1437120 - Ensure pings sent after net shutdown are persisted to disk

https://reviewboard.mozilla.org/r/219984/#review225946

::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:448
(Diff revision 1)
>    TelemetryController.submitExternalPing(TEST_TYPE, {});
> -  Assert.equal(TelemetrySend.pendingPingCount, 1, "Should not send the ping, should pend delivery");
>  
> +  // Triggering a shutdown should persist the pings
> +  await TelemetrySend.shutdown();
> +  Assert.equal(TelemetryStorage.pendingPingCount, 1, "Should have a pending ping on disk");

Are we sure that's the ping we're looking for? Can't this be the `main` ping? The `pingsender` should be disabled in [this test](https://searchfox.org/mozilla-central/rev/d03ad8843e3bf2e856126bc53b0475c595e5183b/toolkit/components/telemetry/tests/unit/head.js#340).
(In reply to Alessio Placitelli [:Dexter] from comment #8)
> Comment on attachment 8950741 [details]
> bug 1437120 - Ensure pings sent after net shutdown are persisted to disk
> 
> https://reviewboard.mozilla.org/r/219984/#review225946
> 
> ::: toolkit/components/telemetry/tests/unit/test_TelemetrySend.js:448
> (Diff revision 1)
> >    TelemetryController.submitExternalPing(TEST_TYPE, {});
> > -  Assert.equal(TelemetrySend.pendingPingCount, 1, "Should not send the ping, should pend delivery");
> >  
> > +  // Triggering a shutdown should persist the pings
> > +  await TelemetrySend.shutdown();
> > +  Assert.equal(TelemetryStorage.pendingPingCount, 1, "Should have a pending ping on disk");
> 
> Are we sure that's the ping we're looking for? Can't this be the `main`
> ping? The `pingsender` should be disabled in [this
> test](https://searchfox.org/mozilla-central/rev/
> d03ad8843e3bf2e856126bc53b0475c595e5183b/toolkit/components/telemetry/tests/
> unit/head.js#340).

+1. We should at least load it and check that the type is correct.
Comment on attachment 8950741 [details]
bug 1437120 - Ensure pings sent after net shutdown are persisted to disk

https://reviewboard.mozilla.org/r/219984/#review226040

Looks great now, thanks!
Attachment #8950741 - Flags: review?(alessio.placitelli) → review+
[Tracking Requested - why for this release]:
This loses some pings submitted to Telemetry on shutdown (sync specifically).
This breaks the promise of transport reliability that clients of Telemetry have.
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a0a032ce94f8
Ensure pings sent after net shutdown are persisted to disk r=Dexter
https://hg.mozilla.org/mozilla-central/rev/a0a032ce94f8
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
This fix is in the latest nightly (nightly 20180215103933, see comment 14).
Rebecca, could you confirm the fix?
Flags: needinfo?(rbillings)
I updated nightly, set the prefs and verified that sync pings are being generated correctly by the gzipServer:

ex: 127.0.0.1 - - [15/Feb/2018 09:32:30] "POST /submit/telemetry/bc23955b-8723-654c-a2f9-4c420e219858/main/Firefox/60.0a1/nightly/20180215103933?v=4 HTTP/1.1" 200 -
Thu Feb 15 09:32:30 2018	Received submission #10: 24750 bytes. Saved in report10.json

I have an empty folder under /Users/me/Library/Applications/Firefox/Profiles/todaysprofile/saved-telemetry-pings.
I also can pull up about:telemetry & do not see any "archive" option - apologies for not being more familiar with it!
Flags: needinfo?(rbillings)
Attached image Finding archived pings
Thanks!

Here is where to find the archived pings on about:telemetry, on the top left.
(It's hard to discover, i'll file something to think about improvements.)

Does it show up there?
Flags: needinfo?(rbillings)
Got it: I went to current ping> archive> ping type= sync and there are at least 4 entries from today. 
Let me know if there is any data or details, or json reports that would be helpful to send/attach.
Flags: needinfo?(rbillings)
Rebecca, what steps did you follow? Some sync pings were always getting through. We want to make sure we're saving the sync pings sent at Firefox shutdown to the saved-telemetry-pings/ folder (if they are sent too late in shutdown).
Flags: needinfo?(rbillings)
1. update nightly
2. clone the gzip server repo from https://github.com/mozilla/gzipServer
3. start it locally using "python gzipServer.py -p 8080"
4. create new FF profile
5. go to about:config and change toolkit.telemetry.server to http://127.0.0.1:8080
6. create new fxa account & verify
7. open nightly & sign in with new account
8. (Optional) In Firefox, go to Tools -> Sync now
9. shutdown firefox
10. start firefox
11. wait a couple of minutes

** I rechecked my saved-telemetry-pings folder for the created profile and it now contains 6 pings. However, the earliest one has a timestamp of 40 minutes after I did the test- which is why the folder was empty when I commented above. The test was at 9:32am, the ping creation time is 10:29am.
Flags: needinfo?(rbillings)
Could you check saved-telemetry-pings for a "sync" ping between steps 9 and 10? The files should be plain text files that you can open in a text editor like Notepad.
I re-ran the test. In 10 minutes I've seen at least 3 POST pings go by - but the saved-telemetry-pings folder remains empty. I synced the account at 2:57, and the about:telemetry archive says last sync at 2:44 [the end of a different testing session].I'm going to let it run for a while and see how long it takes to get data into the saved-telemetry-pings folder.
I let the browser remain open for an hour. The last sync was at 3:57, but about:telemetry continued to show 2:44 and the saved-telemetry-pings folder remained empty. I then closed the browser and the folder populated with 3 saved-telemetry-ping files which I zipped and emailed to :chutten.
Thank you :rbillings!

The zip file contained three valid pings: one sync, one saved-session, and one bhr ping.

This verifies that the fix is working on Nightly.
Status: RESOLVED → VERIFIED
Comment on attachment 8950741 [details]
bug 1437120 - Ensure pings sent after net shutdown are persisted to disk

Approval Request Comment
[Feature/Bug causing the regression]: bug 1397293 
[User impact if declined]: Many telemetry pings will not be sent, impacting our ability to track the health of Sync.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Low risk, limited to telemetry pings sent during shutdown
[Why is the change risky/not risky?]:
[String changes made/needed]: None
Attachment #8950741 - Flags: approval-mozilla-beta?
Comment on attachment 8950741 [details]
bug 1437120 - Ensure pings sent after net shutdown are persisted to disk

fix telemetry pings issue, beta59+
Attachment #8950741 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
qe- since it has automated tests.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.