Closed Bug 1594201 Opened 3 years ago Closed 3 years ago

enrollmentId tagging causes users to crash out of enrolled experiments


(Firefox :: Normandy Client, defect, P5)




Tracking Status
firefox70 --- unaffected
firefox71 --- fixed
firefox72 --- fixed


(Reporter: tdsmith, Unassigned)




(Keywords: regression)

We saw a big decrease in the daily volume of main pings tagged with the "webrender-performance-tracking-study-nightly-1569972" slug in late September.

It's temporally correlated with the enrollment_id tagging from bug 1555176, which first landed in nightly build 20190927094817.

If we look at the highest build number that clients sent us tagged pings from, there's a big spike representing the immediately preceding build, 20190926213542.

We also saw a sharp drop in main ping volume from the beta channel experiment when beta 71 landed with this change around Oct 22.

I think this points to a problem where clients enrolled in experiments experience an error condition after upgrading to a version of Firefox that contains the enrollment_id tagging logic. If that's real, we should fix it before 71 goes to release, to avoid interrupting ongoing experiments.

I dug into this and bug 1594035. Although the immediate cause is a little different (since this is setExperimentActive and the other bug is sendEvent), the underlying cause is the same: it isn't valid to send telemetry with null values for enrollmentId. It needs to either be a string or be absent entirely.
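
To illustrate the "string or absent" rule, here is a minimal sketch of how an annotation object could be built so that a null enrollmentId is never passed along. This is not the actual Normandy patch: the `TelemetryExtra` type and the `buildExtra` helper are hypothetical names invented for this example, and the idea that an enrollment recorded before the tagging change has no enrollmentId is an assumption.

```typescript
// Hypothetical shape for the extra annotations sent with a telemetry call.
// Every value must be a string; a key is either present or left out.
type TelemetryExtra = Record<string, string>;

// Hypothetical helper: include enrollmentId only when it is a real string.
// A null or undefined value (assumed here to come from an enrollment that
// predates enrollment IDs) is omitted instead of being sent as null.
function buildExtra(
  branch: string,
  enrollmentId: string | null | undefined
): TelemetryExtra {
  const extra: TelemetryExtra = { branch };
  if (typeof enrollmentId === "string") {
    extra["enrollmentId"] = enrollmentId;
  }
  return extra;
}
```

With this shape, a caller that has no enrollment ID ends up with an object that lacks the key, so the same guard covers both the setExperimentActive case and the sendEvent case.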

The fix I have in mind for this will fix both issues together, so there won't be any activity in this bug. However, there is a fix coming soon. I think we should keep the bugs separate so we can verify them separately. Perhaps there will be some unique aspects to each bug.

Michael, is an uplift planned for 71 or will the fix be in 72 only? Thanks

Flags: needinfo?(mcooper)

Once this lands on 72, I plan to request uplift to 71.

Flags: needinfo?(mcooper)

The priority flag is not set for this bug.
:mythmon, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mcooper)

I'm setting this as a very low priority, not because it isn't important, but because I believe it is a dupe of bug 1594035. When that lands I'll verify the changes, and if both issues are fixed I'll close both bugs.

Flags: needinfo?(mcooper)
Priority: -- → P5

Michael, did the patch in bug 1594035 fix this bug, and can we mark it as a duplicate?

Flags: needinfo?(mcooper)

Since this issue is about effects on active experiments, I can't say for sure if it is fixed by the patch in bug 1594035. To be sure, we'd have to wait until the patch has been on Beta for long enough to observe Telemetry coming back.

That being said, after investigating and fixing the other bug, I'm confident that this is the same problem. I'm going to mark it as a dupe now. We can re-open it if that turns out not to be true.

Closed: 3 years ago
Flags: needinfo?(mcooper)
Resolution: --- → DUPLICATE
Duplicate of bug: 1594035
Has Regression Range: --- → yes