Bug 1653244 (Closed) · Opened 4 years ago · Closed 4 years ago

Experiment API enrollment info may not be showing up in some metrics and baseline pings.

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: travis_, Assigned: travis_)

Details

(Whiteboard: [telemetry:glean-rs:m?])

Attachments

(1 file)

Mako enrollment is missing about 13% of clients who should be eligible. When the experiment originally launched, a bad matcher for locale/language may have contributed to this: the matcher in the recipe requires an ISO 639 three-character language code, but full locales such as en-US, en-GB, etc. were used. This was corrected and enrollment began normally, but about 13% of clients in the baseline ping (after uptake) who should be eligible for the experiment still aren't enrolling, and about 95% of clients in the metrics ping.
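The locale mismatch can be illustrated with a minimal sketch. The regex below is a hypothetical stand-in for an ISO 639 three-letter language matcher; the real recipe matcher syntax is not reproduced here.

```python
import re

# Hypothetical matcher: an ISO 639 three-letter language code, e.g. "eng".
iso639_matcher = re.compile(r"^[a-z]{3}$")

# Full locale strings like "en-US" fail this matcher, which would have
# excluded those clients from enrollment under the original recipe.
results = {v: bool(iso639_matcher.match(v)) for v in ["eng", "en-US", "en-GB"]}
# results: {"eng": True, "en-US": False, "en-GB": False}
```

Once the recipe was corrected to use plain language codes, clients with full locale strings began matching again.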

I'm not certain whether the original "bad" recipe is affecting those clients, but if you look at the same query from the perspective of the metrics ping instead of the baseline ping, the really interesting/scary part shows up. Only 5% of clients show as enrolled based on the metrics ping, which leads me to believe that we are not always sending the experiment info in the ping_info section of the metrics pings.

When looking at baseline ping experiment enrollment by build_id, enrollment appears to improve over time, resembling an uptake curve, but the same query on the metrics ping shows less than 10% enrolled in the experiment.

Looks like we do have "holes" in reported experiment data from both the metrics ping and the baseline ping. I believe this means there is some set of conditions (maybe around the frequent upgrading of nightly?) that causes Glean to not send the experiment info in the ping.

Summary: Experiment API enrollment info may not be showing up in metrics pings. → Experiment API enrollment info may not be showing up in some metrics and baseline pings.
Assignee: nobody → tlong
Priority: P3 → P1

Thanks to Dexter for pointing me in the right direction on this. Since the metrics pings were being scheduled using executeTask, the collection and submission were actually happening after init completed and wiped the application lifetime metrics. This was fixed in the attached PR by calling the collection synchronously rather than executing it through the dispatchers. This should be fine, since the only place this happens is in the schedule() function in the MPS, and the only place that gets called is the portion of Glean.initialize that runs on the Dispatcher.
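The ordering bug can be modeled with a minimal sketch; all names here are hypothetical stand-ins for the real Glean dispatcher and metrics ping scheduler, not the actual SDK API. A collection task queued behind init runs only after the application-lifetime store has been wiped, while a synchronous collection snapshots the data first.

```python
from collections import deque

app_lifetime_metrics = {"experiments": {"mako": "treatment"}}
collected_pings = []
task_queue = deque()  # stand-in for the dispatcher's task queue

def collect_metrics_ping():
    # Snapshot whatever is in the application-lifetime store right now.
    collected_pings.append(dict(app_lifetime_metrics))

def schedule(synchronous):
    if synchronous:
        collect_metrics_ping()                   # the fix: collect immediately
    else:
        task_queue.append(collect_metrics_ping)  # the bug: deferred via executeTask

def initialize(synchronous_collection):
    schedule(synchronous_collection)
    app_lifetime_metrics.clear()   # init wipes application-lifetime metrics
    while task_queue:              # queued tasks only run after the wipe
        task_queue.popleft()()

# Buggy path: the deferred collection sees an already-wiped store.
initialize(synchronous_collection=False)
buggy = collected_pings.pop()      # {} -- experiment info lost

# Fixed path: synchronous collection snapshots the store before the wipe.
app_lifetime_metrics.update({"experiments": {"mako": "treatment"}})
initialize(synchronous_collection=True)
fixed = collected_pings.pop()      # experiment info present
```

This mirrors why the fix is safe: collection only happens synchronously inside the one schedule() call path, before init's cleanup runs.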

After filtering out the baseline pings that didn't have a start_time in the experiment window (old pings that hadn't been sent yet), rather than relying solely on submission_timestamp, the baseline pings show an overall enrollment of 96%, which is much closer to what I would expect. The remaining baseline ping "holes" seem to come from instances where we got two baseline pings very close together in time, such as the user opening and then immediately backgrounding or force-closing the app. I don't know that there is much we can do in these cases, and I could only find a very few clients in this state, so I'm leaving it as is for now.
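The filtering step can be sketched roughly like this; the dates, ping shapes, and window bounds below are placeholders, not the real experiment window or query.

```python
from datetime import datetime, timezone

# Placeholder experiment window (not the real Mako dates).
EXPERIMENT_START = datetime(2020, 7, 1, tzinfo=timezone.utc)
EXPERIMENT_END = datetime(2020, 8, 1, tzinfo=timezone.utc)

def in_experiment_window(ping):
    # Use the ping's own start_time rather than submission_timestamp,
    # so old pings that were only sent late get excluded.
    start_time = datetime.fromisoformat(ping["ping_info"]["start_time"])
    return EXPERIMENT_START <= start_time < EXPERIMENT_END

pings = [
    {"ping_info": {"start_time": "2020-07-15T10:00:00+00:00"}},  # in-window
    {"ping_info": {"start_time": "2020-05-02T09:00:00+00:00"}},  # old, sent late
]
eligible = [p for p in pings if in_experiment_window(p)]
# Only the in-window ping survives the filter.
```

Filtering on start_time answers "was this ping recorded during the experiment?", whereas submission_timestamp only answers "was it received during the experiment?", which admits stale pings that predate enrollment.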

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

PSA sent and downstream update of A-C in progress.

