Open Bug 1899019 Opened 9 months ago Updated 3 days ago

Move FxMS ping and metrics out of newtab

Categories

(Firefox :: Messaging System, task, P1)

task
Points:
3

Tracking

()

REOPENED
137 Branch
Iteration:
137.2 - Feb 17 - Feb 28
Tracking Status
firefox137 --- affected

People

(Reporter: aminomancer, Assigned: pdahiya)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [hnt-trainhop])

Attachments

(2 files)

Our glean stuff is currently in newtab, but we should move it to asrouter, since both directories have herald rules tagging the respective teams for review. As it stands, we won't be tagged for review when our ping/metrics change, but the newtab team will be.

ni?chutten to let us know if it's fine to just do this, or if we need to perform additional migration steps to make sure glean data and the dictionary page do not get messed up or duplicated. Thanks!

Flags: needinfo?(chutten)
No longer blocks: 1877199
See Also: → 1877199
See Also: → 1899030

There are a few things that come to mind when thinking about how to do this without interrupting data flow:

  1. Make sure the destination and source files are in the same lists in metrics_index.py. This ought to keep the pipeline from even really noticing the difference.
  2. Ensure the destination and source files both already exist, and continue to exist after the definitions move, even if they're empty. The pipeline has a mechanism for recognizing new files and another mechanism for recognizing new/changed metrics and pings within those files. If those operations aren't externally synchronized, we could get into a weird situation requiring manual intervention.
    • If any files need to be added, add them early. And then don't move definitions to the new files until the day after that file appears in the list in repositories.yaml (usually just the day after the first build that has the new files in it).
    • As for the reasons why we're keeping them even if they're empty, there are two of them: 1) Saves someone some effort and some waiting for when they want to add metrics/pings to the source component later, 2) If there are backouts, even from autoland, file deletions can become annoying on the pipeline.
  3. If necessary, update the tags of these metrics and pings (if they don't have definition-specific tag metadata, they inherit the file-level $tags param) to point to the correct product&component so that in the dictionary they're associated with other similar instrumentation. This is a nice-to-have and can happen whenever you'd like.
  4. Test the data coming from the build using the Glean Debug Ping Viewer (via e.g. about:glean) and all of your automated suites. This will ensure the instrumentation didn't pick up a fault along the way.
  5. Monitor the data coming in after the swap reaches a build. Use client_info.app_build to see your data coming in from buildids that have your changes, and use the moz-fx-data-shared-prod.firefox_desktop_live.messaging_system_v1 table to read "messaging-system" pings that hit the pipeline within the past 15mins (this data is not deduplicated and is otherwise unfit for general analysis, but it's good for monitoring). You should ultimately see the rate of pings per client in builds with your change reach the rate from builds without your change, but more immediately we just want to make sure that the pings come through at all and they have the metrics you expect in the columns that you expect. If there aren't any "messaging-system" pings, or there aren't the right number of them or contents in them, reach out.

That should keep the data flowing and minimize the likelihood that we'll need to bug pipeline folks to manually intercede.

So long as things haven't changed since the last time I looked into it, "messaging-system" pings' downstream analyses are limited mostly to nightly batched derivations, so we have some wriggle room to fix things before they get too wrong and won't end up suffering too much if things do go wrong and we have to wait until things are made right.

Flags: needinfo?(chutten)

(In reply to Chris H-C :chutten from comment #1)

Thanks for the thorough instructions! This is really helpful.

Priority: -- → P3
Blocks: 1937151
Whiteboard: [hnt-trainhop]
Assignee: nobody → pdahiya
Iteration: --- → 137.2 - Feb 17 - Feb 28
Priority: P3 → P1
Points: --- → 3
Pushed by pdahiya@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d6f61409b245 Part 1 - Add FxMS metrics.yaml and pings.yaml files r=chutten,omc-reviewers,negin

NI @chutten for feedback and help debug above build failure on autoland about AsrouterMetrics.h API thanks!

Flags: needinfo?(pdahiya) → needinfo?(chutten)

Ohhhhhh, I know what's going on here. Florian mentioned having problems with empty files when he split the include files.

Florian, is there an easy way to support metrics files with no definitions? Turns out there is at least one legitimate use for them (this bug's use: adding a file to the pipeline before moving metrics into it to make absolutely certain the definitions aren't interrupted).

Flags: needinfo?(chutten) → needinfo?(florian)

(In reply to Chris H-C :chutten from comment #8)

Florian, is there an easy way to support metrics files with no definitions?

Not really easy, the task generating the header files takes as input the metrics_yaml.cached file, that only contains parsed metrics. The list of yaml files doesn't exist anymore at that point. It reappears later when we access metric.defined_in["filepath"].

If you really want to do it, you'll need to either:

  • add a dummy metric to make the metric.yaml file non-empty (and teach the code in glean_parser_ext to skip it, thus writing an empty header file.)
  • make the parsing code keep the list of input yaml metric files when generating the cache, and have the header output code generate empty header files for files that don't contain any metric.

Neither of these options sound great.

Turns out there is at least one legitimate use for them (this bug's use: adding a file to the pipeline before moving metrics into it to make absolutely certain the definitions aren't interrupted).

To me this sounds more like an obscure workaround for bugs than like a legitimate use case. More specifically, looking at your point '2.' in comment 1:

  • "we could get into a weird situation requiring manual intervention" doesn't sound good.
  • "don't move definitions to the new files until the day after that file appears in the list in repositories.yaml" I don't think we want developers to know or care for this.
  • "reasons why we're keeping them even if they're empty, there are two of them: 1) Saves someone some effort and some waiting for when they want to add metrics/pings to the source component later" There should be no waiting. Either the pipeline should update automatically as part of a new build being released, or submissions received from builds with a build id newer than the most recent update of the pipeline should be queued and only processed once the pipeline has been updated.
  • "If there are backouts, even from autoland, file deletions can become annoying on the pipeline." backouts should be painless. Sheriffs won't know that backing out a patch will be causing issues elsewhere, and shouldn't have to worry about it.

I understand that fixing all these issues requires significant effort, so we might need to take a temporary workaround. If so, I think putting a descriptive string in the 'empty' metrics.yaml file that says why it's there is better than leaving the file really empty, as a build error seems a reasonable behavior to me when processing an empty metric file.

Flags: needinfo?(florian)

The even more "wonderful" thing about this is that we already have all these issues fixed... for every project that isn't hosted in mozilla-central. Hopefully on the move to github we'll reevaluate that cost of turning probe-scraper "push mode" on for our biggest projects.

Okay, since this is temporary, let's do this.

Punam, can you add this to your patch?

diff --git a/toolkit/components/glean/moz.build b/toolkit/components/glean/moz.build
index 612ee7e1dafd1..dfe304f90e610 100644
--- a/toolkit/components/glean/moz.build
+++ b/toolkit/components/glean/moz.build
@@ -138,7 +138,11 @@ metrics_headers = sorted(
     {
         convert_yaml_path_to_header_name(path) + ".h"
         for path in metrics_yamls
-        if path != "toolkit/components/glean/metrics.yaml"
+        if path
+        not in (
+            "toolkit/components/glean/metrics.yaml",
+            "browser/components/asrouter/metrics.yaml", # TODO: Remove in bug 1899019
+        )
     }
 )
 EXPORTS.mozilla.glean += [("!" + filename) for filename in metrics_headers]

And then the inverse for the second patch. That'll give us the temporary permission to have that empty metrics.yaml.

Flags: needinfo?(pdahiya)

Thanks for feedback and help unblock, just updated the patch and started try run , please let me know if there is any concern around landing at the end of nightly cycle.

Flags: needinfo?(pdahiya)
Pushed by pdahiya@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/37e419149002 Part 1 - Add FxMS metrics.yaml and pings.yaml files r=chutten,omc-reviewers,negin
Status: NEW → RESOLVED
Closed: 3 days ago
Resolution: --- → FIXED
Target Milestone: --- → 137 Branch

Reopened for the unpublished changeset.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: