Mark all Glean-sent telemetry from Firefoxen running in CI as coming from automation
Categories
(Toolkit :: Telemetry, task, P1)
Tracking | Status | |
---|---|---|
firefox90 | --- | fixed |
People
(Reporter: chutten, Assigned: chutten)
Details
(Whiteboard: [telemetry:fog:m?])
Attachments
(1 file)
With bug 1664461 we're starting to send pings via the Glean SDK from Firefox Desktop. We managed to spy a few coming in with the channel nightly-autoland, which suggests there might be some Firefox instances trying to send data from automation.
We should probably mark these as coming from automation so
- The data is easier to find for automation-specific analyses
- The data is easier to exclude for non-automation analyses
- We show a good example for anyone else running Firefox in automation to follow
Luckily, the Glean SDK has a standard way of tagging the source of pings: GLEAN_SOURCE_TAGS. Just set that to automation (the traditional value) and it'll show up in a meta column in the resulting datasets for analysts to make use of.
Comment 1 (Assignee) • 4 years ago
Oh hey, a fun extra piece I only learned today: If you mark the data with the automation source tag, it will be filtered out from ever reaching the stable tables (like firefox_desktop.fog_validation), instead being stopped and hanging out in the Live tables (like firefox_desktop_live.fog_validation_v1) where it will disappear (along with all the rest of the live data) after 30 days.
This detail was implemented in bug 1657360, and if that's not a cool behaviour then we can find something else to tag it as.
Comment 2 • 4 years ago
Seems like the wrong component. As stated this has nothing to do with mach.
Comment 3 (Assignee) • 4 years ago
When I chatted with mhentges I was told to file it in Mach Core... ni?mhentges for confirmation.
Comment 4 • 4 years ago
Toolkit:Telemetry could work as well, but I believe that that's for defining and implementing in-browser telemetry, and not managing automation?
I wanted to avoid this ticket "hot potato-ing" around components until it finds its home. It's true that this probably won't be solved in mach itself, but I'm not sure where its "real" home would be (probably a component that manages CI configuration?)
Let's see what the Toolkit:Telemetry folks say here.
Comment 5 • 4 years ago
The original bug referenced in comment 0 is in Toolkit :: Telemetry, and nothing under discussion (as far as I know) even involves making any changes to Python code, let alone mach itself.
Comment 6 (Assignee) • 4 years ago
I was expecting there to be some sort of environment configuration template I could add a new envvar to that would ensure that, while running in automation, we also set the GLEAN_SOURCE_TAGS appropriately. I have no idea where that is, though.
Comment 7 (Assignee) • 4 years ago
Do you know of a common configuration template I could add a new envvar to? Or something that'd solve the same problem? (Or someone who might know more?)
Comment 8 • 4 years ago
Sorry for this falling off the radar :(
For C++ projects, does GLEAN_SOURCE_TAGS need to be set at compile-time or run-time?
If it's compile-time, you could add them to the mozconfig for the Glean build (just like how this mozconfig is the linux64 nightly one).
If it's at run-time (e.g.: GLEAN_SOURCE_TAGS=automation firefox --do-something), then I think you'll want to customize a taskcluster config. I'm guessing that this would affect all tasks running Firefox in CI, yeah?
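For the run-time case, a minimal Python sketch of what a harness might do before launching the browser: copy the current environment and add the tag. The helper name automation_env is hypothetical, and --do-something is the same placeholder as in the shell example above, not a real Firefox flag.

```python
import os


def automation_env(base_env=None, tag="automation"):
    """Return a copy of the environment with GLEAN_SOURCE_TAGS set.

    `automation_env` is a hypothetical helper for illustration only.
    The caller's mapping (or os.environ) is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GLEAN_SOURCE_TAGS"] = tag
    return env


# Example launch (not executed here; "firefox --do-something" is a placeholder):
#   import subprocess
#   subprocess.run(["firefox", "--do-something"], env=automation_env())
```

Building a fresh mapping instead of mutating os.environ keeps the tag scoped to the child process, so the harness itself is unaffected.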
Comment 9 (Assignee) • 4 years ago
This is a runtime thing, yeah. We'd like to catch all possible data that might come from automation (usually accidentally, but maybe we'll want to collect data on purpose at some point). Is there a root config?
Comment 10 • 4 years ago
That's a good question, I'm a little less familiar with in-tree taskgraph.
Let me NI Aki from Release Engineering, I think that he'll have a better idea where we can handle this generally.
Comment 11 • 4 years ago
Are we talking about this job in-tree in Gecko?
If so, the component is probably Firefox Build System :: Task Configuration. The easy way to do it is to add the env var here. (The env var will also be set on non-glean tasks, but if they ignore it, that may be ok. If that's not ok, we probably have to do something with a custom transform to update the env vars.) We can probably test to see if this patch works using ./mach try fuzzy to send this patch to try; if we select the glean/fog task then it'll run and we can verify its env vars.
(If not gecko, I probably need to know which repository we're talking about.)
Comment 12 • 4 years ago
(In reply to Aki Sasaki [:aki] (he/him) from comment #11)
If so, the component is probably Firefox Build System :: Task Configuration. The easy way to do it is to add the env var here.
We'd also need to add an env block for the default section below to add it to the mac+windows tasks.
Comment 13 • 4 years ago
I think that this applies more generally than the Glean Test task.
If I understand correctly, we want all tasks that are executing Firefox in-tree to have this environment variable.
I'm guessing that this is so we don't pollute our telemetry with information from Firefox running in CI.
Comment 14 • 4 years ago
If we need something set in all of moz automation, why not use MOZ_AUTOMATION, which is already set in most if not all automation tasks, rather than introduce a new env var we have to set everywhere?
Comment 15 • 4 years ago
Comment 16 • 4 years ago
We could, though I'm guessing that we don't want Mozilla-specific bits in Glean. However, given the potential difficulty of setting a new env var for all of CI, this might be the right tradeoff?
Eh, I'm not as valuable for this conversation because I'll be making assumptions for both perspectives here. I'll NI :chutten again here and let you two call the shots :)
Comment 17 (Assignee) • 4 years ago
It's true, the Glean SDK knows nothing about Firefox so putting MOZ_AUTOMATION handling in there wouldn't be the right call.
However, FOG is designed to speak Gecko on one side and Glean on the other, so maybe there's something we can do at that level. I could read MOZ_AUTOMATION in FOG's init and call Glean's set_source_tags manually... but I'm not sure how that'd interact with any env var that Glean itself might read (e.g. what if both MOZ_AUTOMATION and GLEAN_SOURCE_TAGS are set? Which should win?). ni?Jan-Erik who'll know whether this is a good angle to try.
Comment 18 • 4 years ago
(If fog is the only place we're gathering glean data in gecko, then https://hg.mozilla.org/try/rev/64348c03c36dd0819b02411933a849459c395b0a should be sufficient. That looks like the only place glean is referenced in all of gecko taskgraph, but I don't know if we're invisibly running glean elsewhere.)
Comment 19 • 4 years ago
Glean reads GLEAN_SOURCE_TAGS on init, and only then applies stuff set by set_source_tags. So if we naively call it when MOZ_AUTOMATION is set it would always override it.
I think GLEAN_SOURCE_TAGS should override MOZ_AUTOMATION though, so we would need to also check GLEAN_SOURCE_TAGS to not apply MOZ_AUTOMATION. Feels a bit icky to need to know about Glean stuff, but then again GLEAN_SOURCE_TAGS is public API to be used for debugging, so I guess it's fine.
:aki: overriding it for just that task won't help. FOG enables general data collection throughout the components, so might be triggered by any other piece of code.
Comment 20 (Assignee) • 4 years ago
Sounds like the bug lives in Toolkit::Telemetry after all : )
Work to be done: FOG needs to read the environment and set a source tag of automation if MOZ_AUTOMATION && !GLEAN_SOURCE_TAGS.
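The precedence rule above (an explicit GLEAN_SOURCE_TAGS wins, MOZ_AUTOMATION is only a fallback) can be modelled as a small pure function. This is a sketch of the decision only, not FOG's code: resolve_source_tags is a hypothetical name, and treating GLEAN_SOURCE_TAGS as a comma-separated list is an assumption about the variable's format.

```python
def resolve_source_tags(env):
    """Return the source tags that should end up in effect for `env`.

    Hypothetical helper modelling `MOZ_AUTOMATION && !GLEAN_SOURCE_TAGS`.
    Assumes GLEAN_SOURCE_TAGS holds a comma-separated list of tags.
    """
    explicit = env.get("GLEAN_SOURCE_TAGS")
    if explicit:
        # An explicitly configured value always takes priority.
        return [tag.strip() for tag in explicit.split(",") if tag.strip()]
    if env.get("MOZ_AUTOMATION"):
        # Fallback: running in automation with nothing set explicitly.
        return ["automation"]
    return []
```

Passing the environment in as a mapping keeps the rule testable without touching the real process environment.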
Comment 21 • 4 years ago
(In reply to Jan-Erik Rediger [:janerik] from comment #19)
Glean reads GLEAN_SOURCE_TAGS on init, and only then applies stuff set by set_source_tags. So if we naively call it when MOZ_AUTOMATION is set it would always override it.
I think GLEAN_SOURCE_TAGS should override MOZ_AUTOMATION though, so we would need to also check GLEAN_SOURCE_TAGS to not apply MOZ_AUTOMATION. Feels a bit icky to need to know about Glean stuff, but then again GLEAN_SOURCE_TAGS is public API to be used for debugging, so I guess it's fine.
For python,
import os
if os.environ.get("MOZ_AUTOMATION"):
    # setdefault leaves an explicitly set GLEAN_SOURCE_TAGS untouched
    os.environ.setdefault("GLEAN_SOURCE_TAGS", "automation")
should just set GLEAN_SOURCE_TAGS if it isn't already set.
Comment 22 (Assignee) • 4 years ago
Comment 23 • 4 years ago
Comment 24 • 4 years ago
Backed out changeset ff7488019a95 (Bug 1672455) for causing test crashes.
Backout link: https://hg.mozilla.org/integration/autoland/rev/683c2a81d1a3230a9b2ae93162277244a99d4921
Push with failures, failure log.
Comment 25 (Assignee) • 4 years ago
Oh great, there's a race condition. It doesn't happen if I run it slowly, but if set_source_tags wins the race over the glean.init thread in initialize, then we'll try to call glean_core's set_source_tags on a global glean object that hasn't yet been populated.
IOW was_initialize_called is not a sufficient guard for with_glean_mut. We need to wait until at least setup_glean was called (because that sets the global glean).
I'll be filing an RLB bug for this, and for this bug I'll reorder the calls to ensure set_source_tags always loses the race.
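To make the ordering problem concrete, here is a toy Python model of the "wait until the global object is populated" idea. It is not the RLB code and not necessarily how the fix landed: ToyGlean, ToyBinding and the Event-based guard are hypothetical stand-ins for the global glean object, the binding layer, and a stronger check than was_initialize_called.

```python
import threading


class ToyGlean:
    """Stand-in for the global glean object (hypothetical)."""

    def __init__(self):
        self.source_tags = []


class ToyBinding:
    """Stand-in for the binding layer that owns the global object."""

    def __init__(self):
        self._glean = None
        # Set only after the global object exists, which is a stronger
        # signal than "initialize was called".
        self._ready = threading.Event()

    def initialize(self):
        """Populate the global object on a background thread."""

        def worker():
            self._glean = ToyGlean()
            self._ready.set()

        thread = threading.Thread(target=worker)
        thread.start()
        return thread

    def set_source_tags(self, tags, timeout=5.0):
        """Block until the global object exists, then apply the tags."""
        if not self._ready.wait(timeout):
            raise RuntimeError("glean object was never set up")
        self._glean.source_tags = list(tags)


# The setter starts first, yet it cannot touch an unpopulated object.
binding = ToyBinding()
setter = threading.Thread(target=binding.set_source_tags, args=(["automation"],))
setter.start()
binding.initialize().join()
setter.join()
```

Because the setter waits on the event rather than on a flag set at the start of initialization, the outcome no longer depends on which thread is scheduled first.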
Comment 27 • 4 years ago
Comment 28 • 4 years ago
bugherder