Add environment.experiments annotation to main summary



Data Platform and Tools
Datasets: Main Summary
11 months ago
11 months ago


(Reporter: wlach, Assigned: wlach)




(1 attachment)

Follow-up from bug 1362520:

We should add the environment.experiments information to the main summary so people can correlate other information in the main ping with whether an experiment is active.

It looks like this is just an array of strings:

We can probably just store it as such, similar to what we do with the list of active addons?
mreid, does the above approach make sense?
Flags: needinfo?(mreid)

Comment 2

11 months ago
Let me tell you how I and others are typically going to want to query it: I want to select everyone who's in a particular experiment, and then group by experiment branch.

So in terms of Presto query semantics, I'd be looking for something like this:

SELECT element_at(experiments, 'flashrollout').branch AS branch
FROM main_summary
WHERE element_at(experiments, 'flashrollout') IS NOT NULL

That's assuming I'm reading this right and it's a map of structs. Finding a particular experiment in an array is harder than looking it up in a map: with an array you're more or less stuck exploding it or using something like filter(), which IIRC requires a Presto version we don't have deployed yet.
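A rough Python sketch (not Presto) of why the map shape is easier to query: a keyed lookup finds one experiment directly, while the array shape needs a scan over every element. The helper names and row data are hypothetical; `count_by_branch` mirrors the "select everyone in an experiment, then group by branch" use case.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional

# Hypothetical stand-ins for the two candidate shapes of "experiments".
ExperimentMap = Dict[str, Dict[str, str]]  # experiment id -> {"branch": ...}
ExperimentArray = List[Dict[str, str]]     # [{"id": ..., "branch": ...}, ...]


def find_in_map(experiments: Optional[ExperimentMap], experiment_id: str) -> Optional[dict]:
    """Keyed lookup; returns None when absent, like element_at on a map."""
    return (experiments or {}).get(experiment_id)


def find_in_array(experiments: Optional[ExperimentArray], experiment_id: str) -> Optional[dict]:
    """Linear scan over every element, the extra work filter() or exploding implies."""
    for entry in experiments or []:
        if entry.get("id") == experiment_id:
            return entry
    return None


def count_by_branch(rows, experiment_id: str, lookup: Callable) -> Dict[Optional[str], int]:
    """Keep rows enrolled in experiment_id, then count rows per branch."""
    counts: Counter = Counter()
    for experiments in rows:
        entry = lookup(experiments, experiment_id)
        if entry is not None:  # analogous to WHERE element_at(...) IS NOT NULL
            counts[entry.get("branch")] += 1
    return dict(counts)


# Illustrative data only; each item is one row's "experiments" value.
map_rows = [
    {"flashrollout": {"branch": "control"}},
    {"flashrollout": {"branch": "treatment"}, "other-exp": {"branch": "a"}},
    {},
    None,
    {"flashrollout": {"branch": "treatment"}},
]
# The same rows in the array-of-structs shape, with the id rolled in.
array_rows = [
    None if r is None else [{"id": k, **v} for k, v in r.items()]
    for r in map_rows
]

print(count_by_branch(map_rows, "flashrollout", find_in_map))
print(count_by_branch(array_rows, "flashrollout", find_in_array))
```

Both calls give the same counts; the difference is only in how much work each lookup has to do per row.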

Comment 3

11 months ago
The "core" ping is a bit of a special case. The one we want is from the "experiments" field of the "Environment" section shared by a bunch of ping types:

It is very much like addons: it is a map from experiment id to a set of properties (currently there's only one, "branch").

For addons, we converted the map to an array of structs, rolling the "id" in as a field.
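For illustration only, here is a Python sketch of what that map-to-array conversion would look like if applied to the experiments map. The function name and example ids are hypothetical; this is not the actual main summary code.

```python
from typing import Dict, List


def map_to_array_of_structs(experiments: Dict[str, Dict[str, str]]) -> List[Dict[str, str]]:
    """Turn {id: {props}} into [{props..., "id": id}], one struct per entry.

    The id is written last so a property that happens to be called "id"
    cannot shadow the real key.
    """
    # sorted() only makes the output order deterministic for this sketch.
    return [{**props, "id": exp_id} for exp_id, props in sorted(experiments.items())]


example = {"flashrollout": {"branch": "control"}, "another-study": {"branch": "b"}}
print(map_to_array_of_structs(example))
```

The map keys become an ordinary "id" field, which is what makes the array shape harder to query by id.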

Benjamin's comments make sense to me, and I think it's likely that most analyses will look at a particular experiment rather than at experiments in general. Addon analyses generally use both approaches, so it made sense to use an array there (as well as a special dataset with addons exploded into rows).

I'm inclined to mirror the structure in the ping and keep this as a map of string -> struct.

Sunah, do you have an opinion on this?
Flags: needinfo?(mreid) → needinfo?(ssuh)
I agree with Mark: structure it as a map. If we find we're using it often, we can then explode experiments out of this field into a separate dataset partitioned by experiment ID.
Flags: needinfo?(ssuh)
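A minimal Python sketch of that "explode" step, assuming each row has a client_id and the map-shaped experiments field (the function name and data are hypothetical). Grouping by experiment id here stands in for writing separate per-experiment partitions.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, Optional[str]]  # (client_id, branch)


def explode_by_experiment(rows: List[dict]) -> Dict[str, List[Record]]:
    """Emit one (client_id, branch) record per experiment a row is enrolled in,
    grouped by experiment id."""
    partitions: Dict[str, List[Record]] = defaultdict(list)
    for row in rows:
        # Rows with no experiments (None or empty map) contribute nothing.
        for exp_id, props in (row.get("experiments") or {}).items():
            partitions[exp_id].append((row["client_id"], props.get("branch")))
    return dict(partitions)


rows = [
    {"client_id": "c1", "experiments": {"flashrollout": {"branch": "control"}}},
    {"client_id": "c2", "experiments": {"flashrollout": {"branch": "treatment"},
                                        "other-exp": {"branch": "a"}}},
    {"client_id": "c3", "experiments": None},
]
print(explode_by_experiment(rows))
```

A client enrolled in two experiments shows up once in each partition, so a per-experiment query only has to read its own partition.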
Created attachment 8870559

I have a PR in progress for this.


11 months ago
Points: --- → 2
Priority: -- → P1
The PR is merged and should be available in the next deploy (it functions as described in comment 3).
Last Resolved: 11 months ago
Resolution: --- → FIXED