1672455 - Mark all Glean-sent telemetry from Firefoxen running in CI as coming from automation

Assignee

Description

•

4 years ago

With bug 1664461 we're starting to send pings via the Glean SDK from Firefox Desktop. We managed to spy a few coming in with the channel nightly-autoland, which suggests there might be some Firefox instances trying to send data from automation.

We should probably mark these as coming from automation so that:

The data is easier to find for automation-specific analyses
The data is easier to exclude for non-automation analyses
We show a good example for anyone else running Firefox in automation to follow

Luckily, the Glean SDK has a standard way of tagging the source of pings: GLEAN_SOURCE_TAGS. Just set that to automation (the traditional value) and it'll show up in a meta column in the resulting datasets for analysts to make use of.

Chris H-C :chutten

Assignee

Comment 1

•

4 years ago

Oh hey, a fun extra piece I only learned today: if you mark the data with the automation source tag, it will be filtered out before it ever reaches the stable tables (like firefox_desktop.fog_validation). Instead it stops in the Live tables (like firefox_desktop_live.fog_validation_v1), where it will disappear (along with all the rest of the live data) after 30 days.

This detail was implemented in bug 1657360, and if that's not a cool behaviour then we can find something else to tag it as.

Ricky Stewart

Comment 2

•

4 years ago

Seems like the wrong component. As stated this has nothing to do with mach.

Component: Mach Core → Telemetry

Product: Firefox Build System → Toolkit

Chris H-C :chutten

Assignee

Comment 3

•

4 years ago

When I chatted with mhentges I was told to file it in Mach Core... ni?mhentges for confirmation.

Flags: needinfo?(mhentges)

Mitchell Hentges [:mhentges] 🦀

Comment 4

•

4 years ago

Toolkit:Telemetry could work as well, but I believe that's for defining and implementing in-browser telemetry, and not managing automation?
I wanted to avoid this ticket "hot potato-ing" around components until it finds its home. It's true that this probably won't be solved in mach itself, but I'm not sure where its "real" home would be (probably a component that manages CI configuration?)
Let's see what the Toolkit:Telemetry folks say here.

Flags: needinfo?(mhentges)

Ricky Stewart

Comment 5

•

4 years ago

The original bug referenced in comment 0 is in Toolkit :: Telemetry, and nothing under discussion (as far as I know) even involves making any changes to Python code, let alone mach itself.

Chris H-C :chutten

Assignee

Comment 6

•

4 years ago

I was expecting there to be some sort of environment configuration template I could add a new envvar to that would ensure that, while running in automation, we also set the GLEAN_SOURCE_TAGS appropriately. I have no idea where that is, though.

Chris H-C :chutten

Assignee

Comment 7

•

4 years ago

Do you know of a common configuration template I could add a new envvar to? Or something that'd solve the same problem? (Or someone who might know more?)

Flags: needinfo?(mhentges)

Mitchell Hentges [:mhentges] 🦀

Comment 8

•

4 years ago

Sorry for this falling off the radar :(

For C++ projects, does GLEAN_SOURCE_TAGS need to be set at compile-time or run-time?
If it's compile-time, you could add them to the mozconfig for the Glean build (just like how this mozconfig is the linux64 nightly one).
If it's at run-time (e.g.: GLEAN_SOURCE_TAGS=automation firefox --do-something), then I think you'll want to customize a taskcluster config. I'm guessing that this would affect all tasks running Firefox in CI, yeah?
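
To make the run-time option concrete, here is a minimal sketch of launching a process with the tag set in its environment. The helper name automation_env is hypothetical, and the child process is a stand-in that only echoes the variable back; a real task would launch Firefox through whatever the taskcluster config uses.

```python
import os
import subprocess
import sys


def automation_env(base=None):
    """Return a copy of the environment with GLEAN_SOURCE_TAGS=automation.

    `automation_env` is a hypothetical helper for illustration only.
    """
    env = dict(os.environ if base is None else base)
    env["GLEAN_SOURCE_TAGS"] = "automation"
    return env


# Stand-in for the browser: a child process that prints the variable it sees.
child = [sys.executable, "-c", "import os; print(os.environ['GLEAN_SOURCE_TAGS'])"]
result = subprocess.run(
    child, env=automation_env(), capture_output=True, text=True, check=True
)
print(result.stdout.strip())
```

The parent's own environment is left untouched; only the child sees the tag, which is the same effect as prefixing the command with the variable in a shell.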

Flags: needinfo?(mhentges) → needinfo?(chutten)

Chris H-C :chutten

Assignee

Comment 9

•

4 years ago

This is a runtime thing, yeah. We'd like to catch all possible data that might come from automation (usually accidentally, but maybe we'll want to collect data on purpose at some point). Is there a root config?

Flags: needinfo?(chutten) → needinfo?(mhentges)

Mitchell Hentges [:mhentges] 🦀

Comment 10

•

4 years ago

That's a good question, I'm a little less familiar with in-tree taskgraph.
Let me NI Aki from Release Engineering, I think that he'll have a better idea where we can handle this generally.

Flags: needinfo?(mhentges) → needinfo?(aki)

Aki Sasaki (not active)

Comment 11

•

4 years ago

Are we talking about this job in-tree in Gecko?

If so, the component is probably Firefox Build System :: Task Configuration. The easy way to do it is to add the env var here. (The env var will also be set on non-glean tasks, but if they ignore it, that may be ok. If that's not ok, we probably have to do something with a custom transform to update the env vars.) We can probably test to see if this patch works using ./mach try fuzzy to send this patch to try; if we select the glean/fog task then it'll run and we can verify its env vars.

(If not gecko, I probably need to know which repository we're talking about.)

Flags: needinfo?(aki)

Aki Sasaki (not active)

Comment 12

•

4 years ago

(In reply to Aki Sasaki [:aki] (he/him) from comment #11)

If so, the component is probably Firefox Build System :: Task Configuration. The easy way to do it is to add the env var here.

We'd also need to add an env block for the default section below to add it to the mac+windows tasks.

Mitchell Hentges [:mhentges] 🦀

Comment 13

•

4 years ago

I think that this applies more generally than the Glean Test task.
If I understand correctly, we want all tasks that are executing Firefox in-tree to have this environment variable.
I'm guessing that this is so we don't pollute our telemetry with information from Firefox running in CI.

Aki Sasaki (not active)

Comment 14

•

4 years ago

If we need something set in all of moz automation, why not use MOZ_AUTOMATION, which is already set in most if not all automation tasks, rather than introduce a new env var we have to set everywhere?

Aki Sasaki (not active)

Comment 15

•

4 years ago

https://treeherder.mozilla.org/jobs?repo=try&revision=90ad0ac76d4c07c0e92d9435bac2e7be522be6ce&selectedTaskRun=ZDji0ju5TpOyK00mGjkPeQ.0 if comment 11 is more what we want.

Mitchell Hentges [:mhentges] 🦀

Comment 16

•

4 years ago

We could, though I'm guessing that we don't want Mozilla-specific bits in Glean. However, given the potential difficulty of setting a new env var for all of CI, this might be the right tradeoff?
Eh, I'm not as valuable for this conversation because I'll be making assumptions for both perspectives here. I'll NI :chutten again here and let you two call the shots :)

Mitchell Hentges [:mhentges] 🦀

Updated

•

4 years ago

Flags: needinfo?(chutten)

Chris H-C :chutten

Assignee

Comment 17

•

4 years ago

It's true, the Glean SDK knows nothing about Firefox so putting MOZ_AUTOMATION handling in there wouldn't be the right call.

However, FOG is designed to speak Gecko on one side and Glean on the other, so maybe there's something we can do at that level. I could read MOZ_AUTOMATION in FOG's init and call Glean's set_source_tags manually... but I'm not sure how that'd interact with any env var that Glean itself might read (e.g. what if both MOZ_AUTOMATION and GLEAN_SOURCE_TAGS are set? Which should win?). ni?Jan-Erik who'll know whether this is a good angle to try.

Flags: needinfo?(chutten) → needinfo?(jrediger)

Aki Sasaki (not active)

Comment 18

•

4 years ago

(If fog is the only place we're gathering glean data in gecko, then https://hg.mozilla.org/try/rev/64348c03c36dd0819b02411933a849459c395b0a should be sufficient. That looks like the only place glean is referenced in all of gecko taskgraph, but I don't know if we're invisibly running glean elsewhere.)

Jan-Erik Rediger [:janerik]

Comment 19

•

4 years ago

Glean reads GLEAN_SOURCE_TAGS on init, and only then applies stuff set by set_source_tags. So if we naively call it when MOZ_AUTOMATION is set it would always override it.
I think GLEAN_SOURCE_TAGS should override MOZ_AUTOMATION though, so we would need to also check GLEAN_SOURCE_TAGS to not apply MOZ_AUTOMATION. Feels a bit icky to need to know about Glean stuff, but then again GLEAN_SOURCE_TAGS is public API to be used for debugging, so I guess it's fine.

:aki: overriding it for just that task won't help. FOG enables general data collection throughout the components, so it might be triggered by any other piece of code.

Flags: needinfo?(jrediger)

Chris H-C :chutten

Assignee

Comment 20

•

4 years ago

Sounds like the bug lives in Toolkit::Telemetry after all : )

Work to be done: FOG needs to read the environment and set a source tag of automation if MOZ_AUTOMATION && !GLEAN_SOURCE_TAGS.
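
As a rough model of that rule (not the FOG implementation, which is not Python), the decision can be written as a pure function over an environment mapping. The function name is hypothetical, and two details are assumptions: any non-empty MOZ_AUTOMATION counts as set, and any GLEAN_SOURCE_TAGS entry, even an empty one, counts as an explicit choice that wins.

```python
from typing import List, Mapping, Optional


def automation_source_tags(env: Mapping[str, str]) -> Optional[List[str]]:
    """Return the tags to apply, or None to leave Glean's own handling alone.

    Hypothetical helper: models MOZ_AUTOMATION && !GLEAN_SOURCE_TAGS.
    """
    # Assumption: an explicit GLEAN_SOURCE_TAGS always takes precedence.
    if "GLEAN_SOURCE_TAGS" in env:
        return None
    # Assumption: any non-empty MOZ_AUTOMATION value means "in automation".
    if env.get("MOZ_AUTOMATION"):
        return ["automation"]
    return None


print(automation_source_tags({"MOZ_AUTOMATION": "1"}))
print(automation_source_tags({"MOZ_AUTOMATION": "1", "GLEAN_SOURCE_TAGS": "debug"}))
```

Returning None rather than an empty list keeps "do nothing" distinct from "clear the tags", so a caller only invokes set_source_tags when there is something to apply.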

Severity: -- → N/A

Priority: -- → P3

Whiteboard: [telemetry:fog:m?]

Aki Sasaki (not active)

Comment 21

•

4 years ago

(In reply to Jan-Erik Rediger [:janerik] from comment #19)

Glean reads GLEAN_SOURCE_TAGS on init, and only then applies stuff set by set_source_tags. So if we naively call it when MOZ_AUTOMATION is set it would always override it.
I think GLEAN_SOURCE_TAGS should override MOZ_AUTOMATION though, so we would need to also check GLEAN_SOURCE_TAGS to not apply MOZ_AUTOMATION. Feels a bit icky to need to know about Glean stuff, but then again GLEAN_SOURCE_TAGS is public API to be used for debugging, so I guess it's fine.

For python,

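import os
# Only applies when running in automation; an explicit GLEAN_SOURCE_TAGS is left untouched.
if os.environ.get("MOZ_AUTOMATION"):
    os.environ.setdefault("GLEAN_SOURCE_TAGS", "automation")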

should just set GLEAN_SOURCE_TAGS if it isn't already set.

Chris H-C :chutten

Assignee

Updated

•

4 years ago

Assignee: nobody → chutten

Status: NEW → ASSIGNED

Priority: P3 → P1

Chris H-C :chutten

Assignee

Comment 22

•

4 years ago

Attached file Bug 1672455 - If FOG pings escape automation ensure they are tagged r?janerik! — Details

Chris H-C :chutten

Assignee

Updated

•

4 years ago

Blocks: 1706626

Pulsebot

Comment 23

•

4 years ago

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ff7488019a95 If FOG pings escape automation ensure they are tagged r=janerik

Alexandru Michis [:malexandru]

Comment 24

•

4 years ago

Backed out changeset ff7488019a95 (Bug 1672455) for causing test crashes.
Backout link: https://hg.mozilla.org/integration/autoland/rev/683c2a81d1a3230a9b2ae93162277244a99d4921
Push with failures, failure log.

Flags: needinfo?(chutten)

Treeherder Bug Filer

Updated

•

4 years ago

Regressions: 1706712

Chris H-C :chutten

Assignee

Comment 25

•

4 years ago

Oh great, there's a race condition. It doesn't happen if I run it slowly, but if set_source_tags wins the race over the glean.init thread in initialize, then we'll try to call glean_core's set_source_tags on a global glean object that hasn't yet been populated.

IOW, was_initialize_called is not a sufficient guard for with_glean_mut. We need to wait until at least setup_glean has been called (because that sets the global glean).

I'll be filing an RLB bug for this, and for this bug I'll reorder the calls to ensure set_source_tags always loses the race.

Flags: needinfo?(chutten)

Chris H-C :chutten

Assignee

Updated

•

4 years ago

Comment 27

•

4 years ago

Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dd747d313704 If FOG pings escape automation ensure they are tagged r=janerik

Andreea Pavel [:apavel]

Comment 28

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/dd747d313704

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox90: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 90 Branch