Closed Bug 1489520 Opened 7 years ago Closed 7 years ago

Canary client id not always resetting to new value

Categories

(Toolkit :: Telemetry, defect, P1)

defect
Points:
2

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox63 --- fixed
firefox64 --- fixed

People

(Reporter: mreid, Assigned: janerik)

References

Details

Attachments

(2 files)

I was looking at about:telemetry and noticed that my client id has changed to: c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0 This seems an unlikely UUID to say the least :) I double checked main_summary, and there appear to be about 500 main pings per day from this particular client id (I only have 8 so far in my local datareporting archive, so they're not all from me).
Hm, never mind. Looks like I disabled telemetry locally. False alarm!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
Pings with that client ID should never be sent. If it does happen that is indeed a bug. I'll investigate.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee: nobody → jrediger
Points: --- → 1
Priority: -- → P1
With the help of :mreid I investigated this a bit. The main pings with a client ID of c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0 were never send out from his Firefox instance (as expected). We always archive pings, even if not sending them. I verified this using stmo and the specific ping document IDs. I identified a small data race that could happen when flipping the flag: https://searchfox.org/mozilla-central/rev/d4ef4e9747133aa2914aca2a15cf9df1e42a6aa0/toolkit/components/telemetry/app/TelemetryController.jsm#846 If something is submitting a ping right after the flag was flipped, but before the observer is executed, it will use the cached c0ffee ID and, because the pref is enabled, send a ping. I guess that's a rare thing, but with that many clients out there, it can potentially happen more often. I don't know (yet) if that is what we are seeing with the 500 pings/day in main_summary.
Blocks: 1489560
I investigated a bit further. I looked into more details of pings with the canary client ID (c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0) here: https://sql.telemetry.mozilla.org/queries/58858 This shows that the canary client ID turns up in pings from release/beta versions. This can be explained like this: 1) A user tries out beta or nightly. 3) In beta/nightly they disable Telemetry upload, at which point the canary client ID is persisted. 4) They switch back to stable 5) They enable Telemetry. 5) A normal ping is send with the canary client ID. This would explain why we see consistent pings with the canary ID from probably the same clients. This does not fully explain why we see pings with a canary ID 500-800 times a day. Possible solution: At Telemetry init check the persisted client ID and if upload is enabled. If client ID matches the canary and the upload is enabled, reset the client ID to a valid value. (Probably in reverse if upload is disabled as well) This should now be done regardless of other solutions in order to get those clients to valid client IDs again eventually. It should probably be uplifted to beta to get it into release as soon as possible (not sure of direct stable uplifts). Additionally: We can extend Telemetry to avoid ever sending if the payload contains the canary clientID. Though this would mean we might lose otherwise valid (and allowed-to-upload) data. We should monitor the rate of pings with the canary client ID on ingestions.
Flags: needinfo?(mreid)
Flags: needinfo?(gfritzsche)
(In reply to Jan-Erik Rediger [:janerik] from comment #4) > I looked into more details of pings with the canary client ID > (c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0) here: > https://sql.telemetry.mozilla.org/queries/58858 > > This shows that the canary client ID turns up in pings from release/beta > versions. > This can be explained like this: > 1) A user tries out beta or nightly. > 3) In beta/nightly they disable Telemetry upload, at which point the canary > client ID is persisted. > 4) They switch back to stable > 5) They enable Telemetry. > 5) A normal ping is send with the canary client ID. Why would the client id not set to a new value between 5) & 6)?
Flags: needinfo?(gfritzsche) → needinfo?(jrediger)
(In reply to Jan-Erik Rediger [:janerik] from comment #4) > This would explain why we see consistent pings with the canary ID from > probably the same clients. > This does not fully explain why we see pings with a canary ID 500-800 times Is it possible to get numbers on how many of these are linked to channel switching?
(In reply to Jan-Erik Rediger [:janerik] from comment #4) > At Telemetry init check the persisted client ID and if upload is enabled. > If client ID matches the canary and the upload is enabled, reset the client > ID to a valid value. > (Probably in reverse if upload is disabled as well) > > This should now be done regardless of other solutions in order to get those > clients to valid client IDs again eventually. > It should probably be uplifted to beta to get it into release as soon as > possible (not sure of direct stable uplifts). +1, to adding checks both ways & the uplifts. How does that work out when switching back and forth? beta -> release -> beta?
(In reply to Georg Fritzsche [:gfritzsche] from comment #5) > Why would the client id not set to a new value between 5) & 6)? Because stable doesn't have that code yet. (In reply to Georg Fritzsche [:gfritzsche] from comment #6) > Is it possible to get numbers on how many of these are linked to channel switching? client ID is the only way for us to identify them across channels and in this case that's borked. The other information can give a hint, but won't be definitive. (In reply to Georg Fritzsche [:gfritzsche] from comment #7) > +1, to adding checks both ways & the uplifts. > How does that work out when switching back and forth? beta -> release -> beta? If the fix is in beta, it will correct the client ID. The release one won't. Prefs are part of the profile, so are also shared.
Flags: needinfo?(jrediger)
(In reply to Jan-Erik Rediger [:janerik] from comment #8) > (In reply to Georg Fritzsche [:gfritzsche] from comment #7) > > +1, to adding checks both ways & the uplifts. > > How does that work out when switching back and forth? beta -> release -> beta? > > If the fix is in beta, it will correct the client ID. The release one won't. > Prefs are part of the profile, so are also shared. Can you expand a bit on what would happen from a Telemetry perspective when switching back and forth?
Blocks: 1490668
(In reply to Georg Fritzsche [:gfritzsche] from comment #9) > Can you expand a bit on what would happen from a Telemetry perspective when > switching back and forth? Scenario: current code + fix-up client ID on startup in beta. Release as is today (e.g. no optout ping, no canary client id code). 1. User launches Beta, disables Telemetry -> canary id persisted 2. User launches release, loads canary ID. User enables Telemetry. client ID unmodified. Telemetry is now send with canary ID 3. User launches Beta again. Startup detects: upload enabled && canary ID -> new client ID generated. Telemetry is send with new client ID 4. User launches Release again: valid client ID & upload enabled -> valid Telemetry ping
(In reply to Georg Fritzsche [:gfritzsche] from comment #6) > (In reply to Jan-Erik Rediger [:janerik] from comment #4) > > This would explain why we see consistent pings with the canary ID from > > probably the same clients. > > This does not fully explain why we see pings with a canary ID 500-800 times > > Is it possible to get numbers on how many of these are linked to channel > switching? Well, every coffee-ping coming from buildids before 63 is definitely due to channel switching :) ...we could count "optout" pings per channel and then coffee-pings per channel to get some handle on the scope of the issue. Maybe all we'll get is "This affects at most X clients.", but it could be handy to know whether X is measured in hundreds or hundreds of thousands.
Summary: My Client ID seems to have changed to an unexpected value on August 22nd → Canary client id not always resetting to new value
Points: 1 → 2
> We should monitor the rate of pings with the canary client ID on ingestions. Agreed. Should we monitor the overall count over time? Or break things down by channel, build, etc? What would be the most actionable thing to do here?
Flags: needinfo?(mreid)
chutten: Updated patch submitted. mreid: I think breaking it down by channel would be helpful for now. Unless there's another bug hidden, the above patch should bring down the numbers. I will try to run some queries on the current data as per the above suggestions to see if we can currently say a bit more about affected clients.
Flags: needinfo?(mreid)
Comment on attachment 9008753 [details] Bug 1489520 - Fix canary client ID on startup when Telemetry upload is enabled Chris H-C :chutten has approved the revision.
Attachment #9008753 - Flags: review+
Blocks: 1491819
Pushed by jrediger@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/961cfd2ee957 Fix canary client ID on startup when Telemetry upload is enabled r=chutten
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Comment on attachment 9010716 [details] Bug 1489520 - [beta] Fix canary client ID on startup when Telemetry upload is enabled Chris H-C :chutten has approved the revision.
Attachment #9010716 - Flags: review+
Comment on attachment 9010716 [details] Bug 1489520 - [beta] Fix canary client ID on startup when Telemetry upload is enabled Approval Request Comment [Feature/Bug causing the regression]: bug 1445921, "optout" ping [User impact if declined]: a small percentage of profiles will have an invalid client id [Is this code covered by automated tests?]: Yes, this patch contains the tests [Has the fix been verified in Nightly?]: Yes, see bug 1491819 [Needs manual test from QE? If yes, steps to reproduce]: No, we keep monitoring Telemetry data [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: No [Why is the change risky/not risky?]: Not risky because it's covered by tests and analysis on Telemetry nightly data. We will keep monitoring incoming data. The fix only applies if it detects the issue and corrects it. [String changes made/needed]: None
Attachment #9010716 - Flags: approval-mozilla-beta?
Comment on attachment 9010716 [details] Bug 1489520 - [beta] Fix canary client ID on startup when Telemetry upload is enabled P1 telemetry bug fix with tests, , uplift approved for 63 beta 9, thanks.
Attachment #9010716 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: needinfo?(mreid)
See Also: → 1499753
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: