New attention stream pings missing metrics.uuid.rally_id
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
People
(Reporter: whd, Assigned: rhelmer)
References
Details
(Whiteboard: [dataplatform])
Attachments
(1 file)
+++ This bug was initially created as a clone of Bug #1760255 +++
Stage automated deployments were paused last week and yesterday due to :frank working on https://mozilla-hub.atlassian.net/browse/DSRE-908, but it looks like some new pings were added on August 5th in https://github.com/mozilla-services/mozilla-pipeline-schemas/commit/49d2b1abd7783a60e1083783a8e15fe97919ca3e that are missing the field that is used by infra to shred data. I attempted a stage deploy with the new schemas and it failed due to the above.
Looks like :rhelmer handled this last time so NI'ing him: https://bugzilla.mozilla.org/show_bug.cgi?id=1760255#c7. This is at least the third time this has happened, see also https://bugzilla.mozilla.org/show_bug.cgi?id=1762997#c13. I think https://bugzilla.mozilla.org/show_bug.cgi?id=1695236 tracks making it easier to not forget to add the field to all rally pings.
Assignee | ||
Comment 1•2 years ago
|
||
Thanks! We haven't shipped these updates yet (0.4.5
), I'll fix the Glean configs and spin up a new release.
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 2•2 years ago
|
||
Comment 3•2 years ago
|
||
Another short term possibility to catch these earlier is use the great work :relud is doing on probe-scraper "push model".
Rally now uses the probe-scraper via a github action to validate the metrics. If we added a custom check for rally projects there, we'd have easy per PR feedback and prevent further problems like thise one. Would this be an acceptable solution for you?
@Jan-Erik, do you think that adding a custom check for all the Rally pings in the probe-scraper to see if they contain the "rally.id" metric is the best path forward? Can you see any other short term path to perform this check other than bug 1695236? Another potential option would be doing something like FOG does: having a custom script using the probe-scraper and performing such check into the script itself.
Comment 4•2 years ago
|
||
Is this only required for the Rally Core add-on or for all studies?
If the former we could special-case that easily (even if I sorta dislik special-cases). If it's the latter it might be a bit more difficult (guess the only thing how the probe-scraper would know it needs to look for it is if rally
is in the app name?)
Not sure what you mean by the FOG example, but yes, having an explicit check in the repository would be another way if it's only for the Core add-on (guaranteeing the check is copied to all studies is ... probably as buggy as remembering to add pings to begin with).
Comment 5•2 years ago
|
||
(In reply to Jan-Erik Rediger [:janerik] from comment #4)
Is this only required for the Rally Core add-on or for all studies?
If the former we could special-case that easily (even if I sorta dislik special-cases). If it's the latter it might be a bit more difficult (guess the only thing how the probe-scraper would know it needs to look for it is ifrally
is in the app name?)
It's required for all the studies. We could probably simply add a property (e.g. "is_rally_study") in repositories.yaml if we don't want to rely on the study names.
Not sure what you mean by the FOG example, but yes, having an explicit check in the repository would be another way if it's only for the Core add-on (guaranteeing the check is copied to all studies is ... probably as buggy as remembering to add pings to begin with).
Rally is moving to a monorepo, so this should ease some of the pain
Comment 6•2 years ago
|
||
I think doing the check in the monorepo would be great. Both rally-core and attention stream already live in the monorepo, and those are the two that we anticipate updating the most often. We also plan to move the rest of the studies into the monorepo relatively soon.
The only snag is that :Dexter and I were discussing adding Glean analytics to our website, which also lives in the monorepo, but will not have any rally_id attached to pings.
So as long as it's possible to do something like "if rally_id is defined as a metric for this app_id, then make sure it is sent in all pings from this app_id", I think adding the check to the monorepo would be a good approach.
Reporter | ||
Comment 7•2 years ago
|
||
Would this be an acceptable solution for you?
To answer broadly to the various options given here: I don't have a strong opinion on the manner in which this is addressed. I think if :relud's push model work will provide a generic mechanism to enforce constraints before they land in MPS generated-schemas
, then we should prefer that to any special case solution, and it's not worth putting any work into short term mitigation when probe scraper work is planned for Q3.
Assignee | ||
Comment 8•2 years ago
|
||
This should be fixed in https://github.com/mozilla-rally/rally/commit/7c66e8280c1de4bfdce8ae252a38e6e7041544ac which has landed on main
, is there anything else we need to do? Thanks!
Reporter | ||
Comment 9•2 years ago
|
||
is there anything else we need to do?
No. The probe scraper run today should pick up this change. Once that deploys successfully to stage I will close this ticket, and if it doesn't, I'll provide details here.
Reporter | ||
Updated•2 years ago
|
Reporter | ||
Comment 10•2 years ago
|
||
Reopening for the meta-pixel
ping introduced by https://github.com/mozilla-services/mozilla-pipeline-schemas/commit/c7259590c24d8f10c5f8dbbed860c2a665a3d7e8. There's no uuid
struct in the bq schemas at all in this case but that's probably because it's empty.
Comment 11•2 years ago
|
||
Ah!! I really thought I had this one, but it looks like it slipped through. Thank you for bringing this to our attention so quickly.
Actually, I am thinking of changing this ping name and structure now. Given that this is failing, will I be able to do that without messing up the scraper? i.e. remove meta-pixel
ping and related metrics, and instead replace it with tracking-pixel
and renamed metrics.
Assignee | ||
Updated•2 years ago
|
Comment 12•2 years ago
|
||
(In reply to Ashwin Agarwal from comment #11)
Ah!! I really thought I had this one, but it looks like it slipped through. Thank you for bringing this to our attention so quickly.
i.e. remove
meta-pixel
ping and related metrics, and instead replace it withtracking-pixel
and renamed metrics.
"Replacing" here really means adding a new metric, so yeah, no problem.
Not sure if the removal could cause problems, but worst case you can leave it around.
Reporter | ||
Comment 13•2 years ago
|
||
Actually, I am thinking of changing this ping name and structure now.
Deleting and replacing the ping requires commits like https://github.com/mozilla/mozilla-schema-generator/commit/4f636d7ea93bfb61cf78a463620a75310dc09d46 https://github.com/mozilla/mozilla-schema-generator/commit/dce35b19614e39439c98db57b3dba2e26b0edfc7, and https://github.com/mozilla/probe-scraper/pull/442/files (though probe-scraper push-model may be changing the semantics here, in which case I defer to :relud). See https://bugzilla.mozilla.org/show_bug.cgi?id=1769579#c20. To reiterate, this isn't a generally supported operation but since the stage deployment failed these pings never made it to production anyway and so deleting/recreating them is probably reasonable.
Comment 14•2 years ago
|
||
Ah yes I remember when we did that.
I still want to clarify: since this didn't make it to production, we don't need to go through that process, and I can instead delete the ping directly without issue (in just this specific instance)?
Reporter | ||
Comment 15•2 years ago
|
||
I still want to clarify: since this didn't make it to production, we don't need to go through that process, and I can instead delete the ping directly without issue (in just this specific instance)?
No, you need to go through that process because the safety checks to ensure we're not deleting tables occur in both CI and stage/production deployment.
Comment 16•2 years ago
|
||
Got it. We can track that process in this ticket, then. Sorry for gunking this up. I have some ideas about how we can avoid this in the future which I will surface to the team.
Comment 17•2 years ago
|
||
:relud , this is the commit where I added the ping that we want to get rid of: https://github.com/mozilla-rally/rally/commit/a3dacb30e198c5c19159678c6617064cf4ae1d77
Should I make a new dedicated commit that removes the ping, before adding it's replacement? Or is it better for that to be combined into one commit? i.e. if we skip the above commit, will the scraper get confused if it sees the meta-pixel
being removed in a later commit, given that it now isn't aware of it's existence.
Comment 18•2 years ago
|
||
removing individual metrics/pings/tags defined in a glean app is a valid thing to do, so long as the metric/ping/tag yaml file still exists. Doing so does not cause any tables or columns to be deleted, because our infrastructure assumes that a historical version of the app may still submit that data.
remove meta-pixel ping and related metrics, and instead replace it with tracking-pixel and renamed metrics.
Should I make a new dedicated commit that removes the ping, before adding it's replacement? Or is it better for that to be combined into one commit? i.e. if we skip the above commit, will the scraper get confused if it sees the meta-pixel being removed in a later commit, given that it now isn't aware of it's existence.
this is safe to do in a single commit. the scraper is (in theory) aware of all previous commits, so it will continue to be aware of the existence of the meta-pixel
ping, even though it was never actually sent in prod and is no longer present in the git HEAD.
Reporter | ||
Comment 19•2 years ago
•
|
||
It looks like the most recent schemas commit introduced some new pings: https://github.com/mozilla-services/mozilla-pipeline-schemas/commit/f45d6567e83f6412a96791dbb60cea81c5c62528: attribution
and tracking-pixel
, and I'm guessing that meta-pixel
is replaced by the later. However, the meta-pixel
schema still exists and is missing metrics.uuid.rally_id
and therefore BQ schemas deploys are still broken in stage. The main options are to delete the meta-pixel
ping or to update it to include metrics.uuid.rally_id
before we can proceed with a production deployment.
Comment 20•2 years ago
|
||
:whd that's correct, meta-pixel
is now tracking-pixel
.
:relud is it possible to delete meta-pixel
? I removed it from the YAML in this commit, but I guess the deploy is still unhappy since it already created the schema for meta-pixel
.
Comment 21•2 years ago
|
||
filed https://github.com/mozilla/probe-scraper/pull/495 and https://github.com/mozilla/mozilla-schema-generator/pull/229 to delete the ping.
Reporter | ||
Comment 22•2 years ago
|
||
Prod deploy successful. https://github.com/mozilla/mozilla-schema-generator/pull/230 for final cleanup. It's probably worth filing a bug for implementing CI checks with the push model for this specific case as it happens frequently.
Updated•2 years ago
|
Description
•