Canary client id not always resetting to new value

RESOLVED FIXED in Firefox 63

Status

()

defect
P1
normal
RESOLVED FIXED
9 months ago
7 months ago

People

(Reporter: mreid, Assigned: janerik)

Tracking

(Blocks 1 bug)

Trunk
mozilla64
Points:
2
Dependency tree / graph

Firefox Tracking Flags

(firefox63 fixed, firefox64 fixed)

Details

Attachments

(2 attachments)

Reporter

Description

9 months ago
I was looking at about:telemetry and noticed that my client id has changed to:

c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0

This seems an unlikely UUID to say the least :)

I double checked main_summary, and there appear to be about 500 main pings per day from this particular client id (I only have 8 so far in my local datareporting archive, so they're not all from me).
Reporter

Comment 1

9 months ago
Hm, never mind. Looks like I disabled telemetry locally.  False alarm!
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → INVALID

Updated

9 months ago
See Also: → 1445921
Assignee

Comment 2

9 months ago
Pings with that client ID should never be sent. If it does happen that is indeed a bug.
I'll investigate.
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee

Updated

9 months ago
Assignee: nobody → jrediger
Assignee

Updated

9 months ago
Points: --- → 1
Priority: -- → P1
Assignee

Comment 3

9 months ago
With the help of :mreid I investigated this a bit.
The main pings with a client ID of c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0 were never send out from his Firefox instance (as expected). We always archive pings, even if not sending them.
I verified this using stmo and the specific ping document IDs.

I identified a small data race that could happen when flipping the flag:
https://searchfox.org/mozilla-central/rev/d4ef4e9747133aa2914aca2a15cf9df1e42a6aa0/toolkit/components/telemetry/app/TelemetryController.jsm#846

If something is submitting a ping right after the flag was flipped, but before the observer is executed,
it will use the cached c0ffee ID and, because the pref is enabled, send a ping. I guess that's a rare thing, but with that many clients out there, it can potentially happen more often.

I don't know (yet) if that is what we are seeing with the 500 pings/day in main_summary.
Reporter

Updated

9 months ago
Blocks: 1489560
Assignee

Comment 4

8 months ago
I investigated a bit further.

I looked into more details of pings with the canary client ID (c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0) here:
https://sql.telemetry.mozilla.org/queries/58858

This shows that the canary client ID turns up in pings from release/beta versions.
This can be explained like this:
1) A user tries out beta or nightly.
3) In beta/nightly they disable Telemetry upload, at which point the canary client ID is persisted.
4) They switch back to stable
5) They enable Telemetry.
5) A normal ping is send with the canary client ID.

This would explain why we see consistent pings with the canary ID from probably the same clients.
This does not fully explain why we see pings with a canary ID 500-800 times a day.

Possible solution:

At Telemetry init check the persisted client ID and if upload is enabled.
If client ID matches the canary and the upload is enabled, reset the client ID to a valid value.
(Probably in reverse if upload is disabled as well)

This should now be done regardless of other solutions in order to get those clients to valid client IDs again eventually.
It should probably be uplifted to beta to get it into release as soon as possible (not sure of direct stable uplifts).

Additionally:
We can extend Telemetry to avoid ever sending if the payload contains the canary clientID.
Though this would mean we might lose otherwise valid (and allowed-to-upload) data.

We should monitor the rate of pings with the canary client ID on ingestions.
Flags: needinfo?(mreid)
Flags: needinfo?(gfritzsche)
(In reply to Jan-Erik Rediger [:janerik] from comment #4)
> I looked into more details of pings with the canary client ID
> (c0ffeec0-ffee-c0ff-eec0-ffeec0ffeec0) here:
> https://sql.telemetry.mozilla.org/queries/58858
> 
> This shows that the canary client ID turns up in pings from release/beta
> versions.
> This can be explained like this:
> 1) A user tries out beta or nightly.
> 3) In beta/nightly they disable Telemetry upload, at which point the canary
> client ID is persisted.
> 4) They switch back to stable
> 5) They enable Telemetry.
> 5) A normal ping is send with the canary client ID.

Why would the client id not set to a new value between 5) & 6)?
Flags: needinfo?(gfritzsche) → needinfo?(jrediger)
(In reply to Jan-Erik Rediger [:janerik] from comment #4)
> This would explain why we see consistent pings with the canary ID from
> probably the same clients.
> This does not fully explain why we see pings with a canary ID 500-800 times

Is it possible to get numbers on how many of these are linked to channel switching?
(In reply to Jan-Erik Rediger [:janerik] from comment #4)
> At Telemetry init check the persisted client ID and if upload is enabled.
> If client ID matches the canary and the upload is enabled, reset the client
> ID to a valid value.
> (Probably in reverse if upload is disabled as well)
> 
> This should now be done regardless of other solutions in order to get those
> clients to valid client IDs again eventually.
> It should probably be uplifted to beta to get it into release as soon as
> possible (not sure of direct stable uplifts).

+1, to adding checks both ways & the uplifts.
How does that work out when switching back and forth? beta -> release -> beta?
Assignee

Comment 8

8 months ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #5)
> Why would the client id not set to a new value between 5) & 6)?

Because stable doesn't have that code yet.

(In reply to Georg Fritzsche [:gfritzsche] from comment #6)
> Is it possible to get numbers on how many of these are linked to channel switching?

client ID is the only way for us to identify them across channels and in this case that's borked. The other information can give a hint, but won't be definitive.

(In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> +1, to adding checks both ways & the uplifts.
> How does that work out when switching back and forth? beta -> release -> beta?

If the fix is in beta, it will correct the client ID. The release one won't. Prefs are part of the profile, so are also shared.
Flags: needinfo?(jrediger)
(In reply to Jan-Erik Rediger [:janerik] from comment #8)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #7)
> > +1, to adding checks both ways & the uplifts.
> > How does that work out when switching back and forth? beta -> release -> beta?
> 
> If the fix is in beta, it will correct the client ID. The release one won't.
> Prefs are part of the profile, so are also shared.

Can you expand a bit on what would happen from a Telemetry perspective when switching back and forth?
Assignee

Updated

8 months ago
Blocks: 1490668
Assignee

Comment 10

8 months ago
(In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> Can you expand a bit on what would happen from a Telemetry perspective when
> switching back and forth?

Scenario: current code + fix-up client ID on startup in beta. Release as is today (e.g. no optout ping, no canary client id code).


1. User launches Beta, disables Telemetry -> canary id persisted
2. User launches release, loads canary ID. User enables Telemetry. client ID unmodified.
   Telemetry is now send with canary ID
3. User launches Beta again. Startup detects: upload enabled && canary ID -> new client ID generated.
   Telemetry is send with new client ID
4. User launches Release again: valid client ID & upload enabled -> valid Telemetry ping
(In reply to Georg Fritzsche [:gfritzsche] from comment #6)
> (In reply to Jan-Erik Rediger [:janerik] from comment #4)
> > This would explain why we see consistent pings with the canary ID from
> > probably the same clients.
> > This does not fully explain why we see pings with a canary ID 500-800 times
> 
> Is it possible to get numbers on how many of these are linked to channel
> switching?

Well, every coffee-ping coming from buildids before 63 is definitely due to channel switching :)

...we could count "optout" pings per channel and then coffee-pings per channel to get some handle on the scope of the issue. Maybe all we'll get is "This affects at most X clients.", but it could be handy to know whether X is measured in hundreds or hundreds of thousands.
Summary: My Client ID seems to have changed to an unexpected value on August 22nd → Canary client id not always resetting to new value
Assignee

Updated

8 months ago
Points: 1 → 2
Reporter

Comment 13

8 months ago
> We should monitor the rate of pings with the canary client ID on ingestions.

Agreed. Should we monitor the overall count over time? Or break things down by channel, build, etc? What would be the most actionable thing to do here?
Flags: needinfo?(mreid)
Assignee

Comment 14

8 months ago
chutten: Updated patch submitted.

mreid: I think breaking it down by channel would be helpful for now. Unless there's another bug hidden, the above patch should bring down the numbers.

I will try to run some queries on the current data as per the above suggestions to see if we can currently say a bit more about affected clients.
Flags: needinfo?(mreid)
Comment on attachment 9008753 [details]
Bug 1489520 - Fix canary client ID on startup when Telemetry upload is enabled

Chris H-C :chutten has approved the revision.
Attachment #9008753 - Flags: review+
Assignee

Updated

8 months ago
Blocks: 1491819

Comment 16

8 months ago
Pushed by jrediger@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/961cfd2ee957
Fix canary client ID on startup when Telemetry upload is enabled r=chutten

Comment 17

8 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/961cfd2ee957
Status: REOPENED → RESOLVED
Last Resolved: 9 months ago8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Comment on attachment 9010716 [details]
Bug 1489520 - [beta] Fix canary client ID on startup when Telemetry upload is enabled

Chris H-C :chutten has approved the revision.
Attachment #9010716 - Flags: review+
Assignee

Comment 20

8 months ago
Comment on attachment 9010716 [details]
Bug 1489520 - [beta] Fix canary client ID on startup when Telemetry upload is enabled

Approval Request Comment
[Feature/Bug causing the regression]:
bug 1445921, "optout" ping
[User impact if declined]:
a small percentage of profiles will have an invalid client id
[Is this code covered by automated tests?]:
Yes, this patch contains the tests
[Has the fix been verified in Nightly?]:
Yes, see bug 1491819
[Needs manual test from QE? If yes, steps to reproduce]:
No, we keep monitoring Telemetry data 
[List of other uplifts needed for the feature/fix]:
None
[Is the change risky?]:
No
[Why is the change risky/not risky?]:
Not risky because it's covered by tests and analysis on Telemetry nightly data. We will keep monitoring incoming data. The fix only applies if it detects the issue and corrects it. 
[String changes made/needed]:
None
Attachment #9010716 - Flags: approval-mozilla-beta?
Comment on attachment 9010716 [details]
Bug 1489520 - [beta] Fix canary client ID on startup when Telemetry upload is enabled

P1 telemetry bug fix with tests, , uplift approved for 63 beta 9, thanks.
Attachment #9010716 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Reporter

Updated

8 months ago
Flags: needinfo?(mreid)
Assignee

Updated

7 months ago
See Also: → 1499753
You need to log in before you can comment on or make changes to this bug.