Closed Bug 1672455 Opened 7 months ago Closed 16 days ago

Mark all Glean-sent telemetry from Firefoxen running in CI as coming from automation

Categories

(Toolkit :: Telemetry, task, P1)

Tracking

RESOLVED FIXED
90 Branch
Tracking Status
firefox90 --- fixed

People

(Reporter: chutten, Assigned: chutten)

References

Details

(Whiteboard: [telemetry:fog:m?])

Attachments

(1 file)

With bug 1664461 we're starting to send pings via the Glean SDK from Firefox Desktop. We spotted a few coming in with the channel nightly-autoland, which suggests that some Firefox instances are trying to send data from automation.

We should probably mark these as coming from automation so that:

  • The data is easier to find for automation-specific analyses
  • The data is easier to exclude for non-automation analyses
  • We show a good example for anyone else running Firefox in automation to follow

Luckily, the Glean SDK has a standard way of tagging the source of pings: GLEAN_SOURCE_TAGS. Just set that to automation (the traditional value) and it'll show up in a meta column in the resulting datasets for analysts to make use of.

See Also: → 1651110

Oh hey, a fun extra piece I only learned today: if you mark the data with the automation source tag, it will be filtered out from ever reaching the stable tables (like firefox_desktop.fog_validation). Instead it gets stopped and hangs out in the Live tables (like firefox_desktop_live.fog_validation_v1), where it will disappear (along with all the rest of the live data) after 30 days.

This detail was implemented in bug 1657360, and if that's not a cool behaviour then we can find something else to tag it as.

Seems like the wrong component. As stated this has nothing to do with mach.

Component: Mach Core → Telemetry
Product: Firefox Build System → Toolkit

When I chatted with mhentges I was told to file it in Mach Core... ni?mhentges for confirmation.

Flags: needinfo?(mhentges)

Toolkit:Telemetry could work as well, but I believe that that's for defining and implementing in-browser telemetry, and not managing automation?
I wanted to avoid this ticket "hot potato-ing" around components until it finds its home. It's true that this probably won't be solved in mach itself, but I'm not sure where its "real" home would be (probably a component that manages CI configuration?)
Let's see what the Toolkit:Telemetry folks say here.

Flags: needinfo?(mhentges)

The original bug referenced in comment 0 is in Toolkit :: Telemetry, and nothing under discussion (as far as I know) even involves making any changes to Python code, let alone mach itself.

I was expecting there to be some sort of environment configuration template I could add a new envvar to that would ensure that, while running in automation, we also set the GLEAN_SOURCE_TAGS appropriately. I have no idea where that is, though.

Do you know of a common configuration template I could add a new envvar to? Or something that'd solve the same problem? (Or someone who might know more?)

Flags: needinfo?(mhentges)

Sorry for this falling off the radar :(

For C++ projects, does GLEAN_SOURCE_TAGS need to be set at compile-time or run-time?
If it's compile-time, you could add them to the mozconfig for the Glean build (just like how this mozconfig is the linux64 nightly one).
If it's at run-time (e.g.: GLEAN_SOURCE_TAGS=automation firefox --do-something), then I think you'll want to customize a taskcluster config. I'm guessing that this would affect all tasks running Firefox in CI, yeah?
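
The run-time variant can be sketched in Python as below. This is only an illustration: the helper names automation_env and run_tagged are hypothetical, and a child Python process stands in for the Firefox command. The only detail taken from this thread is that GLEAN_SOURCE_TAGS=automation is placed in the environment of the launched process.

```python
import os
import subprocess
import sys


def automation_env(base_env=None):
    """Return a copy of the environment with GLEAN_SOURCE_TAGS=automation."""
    env = dict(os.environ if base_env is None else base_env)
    env["GLEAN_SOURCE_TAGS"] = "automation"
    return env


def run_tagged(command, base_env=None):
    """Run `command` (a list of arguments) with the tag set at run-time."""
    return subprocess.run(
        command,
        env=automation_env(base_env),
        capture_output=True,
        text=True,
        check=True,
    )


if __name__ == "__main__":
    # Stand-in for `firefox --do-something`: a child process that prints
    # the variable it inherited from the parent.
    child = [sys.executable, "-c", "import os; print(os.environ['GLEAN_SOURCE_TAGS'])"]
    print(run_tagged(child).stdout.strip())
```

Because the variable is only set on the copy passed to the child, the parent's own environment is left untouched.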

Flags: needinfo?(mhentges) → needinfo?(chutten)

This is a runtime thing, yeah. We'd like to catch all possible data that might come from automation (usually accidentally, but maybe we'll want to collect data on purpose at some point). Is there a root config?

Flags: needinfo?(chutten) → needinfo?(mhentges)

That's a good question, I'm a little less familiar with in-tree taskgraph.
Let me NI Aki from Release Engineering, I think that he'll have a better idea where we can handle this generally.

Flags: needinfo?(mhentges) → needinfo?(aki)

Are we talking about this job in-tree in Gecko?

If so, the component is probably Firefox Build System :: Task Configuration. The easy way to do it is to add the env var here. (The env var will also be set on non-glean tasks, but if they ignore it, that may be ok. If that's not ok, we probably have to do something with a custom transform to update the env vars.) We can probably test to see if this patch works using ./mach try fuzzy to send this patch to try; if we select the glean/fog task then it'll run and we can verify its env vars.

(If not gecko, I probably need to know which repository we're talking about.)

Flags: needinfo?(aki)

(In reply to Aki Sasaki [:aki] (he/him) from comment #11)

If so, the component is probably Firefox Build System :: Task Configuration. The easy way to do it is to add the env var here.

We'd also need to add an env block for the default section below to add it to the mac+windows tasks.

I think that this applies more generally than the Glean Test task.
If I understand correctly, we want all tasks that are executing Firefox in-tree to have this environment variable.
I'm guessing that this is so we don't pollute our telemetry with information from Firefox running in CI.

If we need something set in all of moz automation, why not use MOZ_AUTOMATION, which is already set in most if not all automation tasks, rather than introduce a new env var we have to set everywhere?

We could, though I'm guessing that we don't want Mozilla-specific bits in Glean. However, given the potential difficulty of setting a new env var for all of CI, this might be the right tradeoff?
Eh, I'm not as valuable for this conversation because I'll be making assumptions for both perspectives here. I'll NI :chutten again here and let you two call the shots :)

Flags: needinfo?(chutten)

It's true, the Glean SDK knows nothing about Firefox so putting MOZ_AUTOMATION handling in there wouldn't be the right call.

However, FOG is designed to speak Gecko on one side and Glean on the other, so maybe there's something we can do at that level. I could read MOZ_AUTOMATION in FOG's init and call Glean's set_source_tags manually... but I'm not sure how that'd interact with any env var that Glean itself might read (e.g. what if both MOZ_AUTOMATION and GLEAN_SOURCE_TAGS are set? Which should win?). ni?Jan-Erik who'll know whether this is a good angle to try.

Flags: needinfo?(chutten) → needinfo?(jrediger)

(If fog is the only place we're gathering glean data in gecko, then https://hg.mozilla.org/try/rev/64348c03c36dd0819b02411933a849459c395b0a should be sufficient. That looks like the only place glean is referenced in all of gecko taskgraph, but I don't know if we're invisibly running glean elsewhere.)

Glean reads GLEAN_SOURCE_TAGS on init, and only then applies stuff set by set_source_tags. So if we naively call it when MOZ_AUTOMATION is set it would always override it.
I think GLEAN_SOURCE_TAGS should override MOZ_AUTOMATION though, so we would need to also check GLEAN_SOURCE_TAGS to not apply MOZ_AUTOMATION. Feels a bit icky to need to know about Glean stuff, but then again GLEAN_SOURCE_TAGS is public API to be used for debugging, so I guess it's fine.

:aki: overriding it for just that task won't help. FOG enables general data collection throughout the components, so might be triggered by any other piece of code.

Flags: needinfo?(jrediger)

Sounds like the bug lives in Toolkit::Telemetry after all : )

Work to be done: FOG needs to read the environment and set a source tag of automation if MOZ_AUTOMATION && !GLEAN_SOURCE_TAGS.
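
As a rough sketch of that condition (not the actual FOG code), the decision can be written as a small pure function in Python. The function name should_tag_as_automation is hypothetical, and treating an empty value as "not set" is an assumption made here for illustration.

```python
import os


def should_tag_as_automation(env):
    """Return True when the "automation" source tag should be added.

    Mirrors MOZ_AUTOMATION && !GLEAN_SOURCE_TAGS.
    Assumption: an empty value counts as "not set" for both variables.
    """
    in_automation = bool(env.get("MOZ_AUTOMATION"))
    tags_already_chosen = bool(env.get("GLEAN_SOURCE_TAGS"))
    return in_automation and not tags_already_chosen


if __name__ == "__main__":
    if should_tag_as_automation(os.environ):
        print("would set the source tags to ['automation']")
    else:
        print("would leave the source tags alone")
```

Taking the environment as a parameter keeps the check easy to exercise for all four combinations of the two variables.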

Severity: -- → N/A
Priority: -- → P3
Whiteboard: [telemetry:fog:m?]

(In reply to Jan-Erik Rediger [:janerik] from comment #19)

Glean reads GLEAN_SOURCE_TAGS on init, and only then applies stuff set by set_source_tags. So if we naively call it when MOZ_AUTOMATION is set it would always override it.
I think GLEAN_SOURCE_TAGS should override MOZ_AUTOMATION though, so we would need to also check GLEAN_SOURCE_TAGS to not apply MOZ_AUTOMATION. Feels a bit icky to need to know about Glean stuff, but then again GLEAN_SOURCE_TAGS is public API to be used for debugging, so I guess it's fine.

For Python,

import os

if os.environ.get("MOZ_AUTOMATION"):
    os.environ.setdefault("GLEAN_SOURCE_TAGS", "automation")

should just set GLEAN_SOURCE_TAGS if it isn't already set.

Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: P3 → P1
Blocks: 1706626
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ff7488019a95
If FOG pings escape automation ensure they are tagged r=janerik
Flags: needinfo?(chutten)
Regressions: 1706712

Oh great, there's a race condition. It doesn't happen if I run it slowly, but if set_source_tags wins the race over the glean.init thread in initialize, then we'll try to call glean_core's set_source_tags on a global glean object that hasn't yet been populated.

In other words, was_initialize_called is not a sufficient guard for with_glean_mut. We need to wait until at least setup_glean has been called (because that sets the global glean).

I'll be filing an RLB bug for this, and for this bug I'll reorder the calls to ensure set_source_tags always loses the race.
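
To illustrate the ordering problem, here is a toy Python model. It is not the RLB or glean_core code: the ToyGlean class and its methods are hypothetical, and it uses a different technique from the reordering described above, namely queuing tags that arrive before setup and applying them once setup has run.

```python
import threading


class ToyGlean:
    """Toy stand-in for a global Glean object; not the real RLB API."""

    def __init__(self):
        self._lock = threading.Lock()
        self._ready = False        # becomes True once setup() has run
        self._pending_tags = None  # tags requested before setup finished
        self.source_tags = []

    def setup(self):
        """Mark the object as populated, then apply any queued tags."""
        with self._lock:
            self._ready = True
            if self._pending_tags is not None:
                self.source_tags = self._pending_tags
                self._pending_tags = None

    def set_source_tags(self, tags):
        """Apply tags now if setup has run, otherwise queue them."""
        with self._lock:
            if self._ready:
                self.source_tags = list(tags)
            else:
                self._pending_tags = list(tags)


def race_once():
    """Run setup and set_source_tags on separate threads, in either order."""
    glean = ToyGlean()
    threads = [
        threading.Thread(target=glean.setup),
        threading.Thread(target=glean.set_source_tags, args=(["automation"],)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return glean.source_tags


if __name__ == "__main__":
    print(race_once())
```

Whichever thread takes the lock first, the tags end up applied only after setup has run, so the caller never touches unpopulated state.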

Flags: needinfo?(chutten)
See Also: → 1706729
Duplicate of this bug: 1706712
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dd747d313704
If FOG pings escape automation ensure they are tagged r=janerik
Status: ASSIGNED → RESOLVED
Closed: 16 days ago
Resolution: --- → FIXED
Target Milestone: --- → 90 Branch