## TL;DR

After doing some digging on the action metric (documented below), I decided it would be a good idea to see if any other columns had the same issue, and found that:

* There are four Glean metrics which have empty BigQuery columns (i.e. columns with only `null` values in them) (going back to the start of FxMS Glean data collection in 115 Nightly on 2023-05-24).
* There does not appear to be any data loss or meaningful negative consequence here.
* Most of these are the result of a tactic we used when landing the telemetry to avoid data loss.

### Relevant details about Glean & BigQuery

* If a metric is defined for a Glean ping, it will be included with every instance of that ping, even if the metric has not been explicitly set.
* If nothing is received for a metric in a ping where it is sent, it gets ETLed into BigQuery in a way that causes it to appear as null in SELECTs.
* The above two lines mean that columns full of nulls are completely consistent with no telemetry being sent by the code for a particular metric.
* There is effectively no meaningful cost for BigQuery columns with only nulls.
* When we were creating `browser/newtab/components/metrics.yaml`, we decided that we wanted to err on the side of having places to put data that we might send (rather than on the side of finding out later that we had discarded sent data that there was no place to put). So, we made sure to add metrics for things in the existing docs that looked plausible, even if we weren't seeing them currently in the debugging output.

### Next steps

* Columns can't be removed from a table without versioning the schema, and this is rarely done for various reasons.
* After discussion with chutten, the appropriate way to handle this sort of thing is usually to remove unused metrics from metrics.yaml, which will mark them as obsolete in the Glean dictionary. I expect this will make sense to do with all of these except for (maybe) `msstoresignedin`, which I will spin off another bug for.
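As an aside on the "columns with only `null` values" check described in the TL;DR: the following is a minimal sketch of the idea in Python, not the query that was actually run against BigQuery. The function name `all_null_columns` and the sample rows are hypothetical, and the rows are only loosely shaped like FxMS pings.

```python
from typing import Any, Iterable


def all_null_columns(rows: Iterable[dict[str, Any]]) -> list[str]:
    """Return the column names whose value is None in every row that has them."""
    seen: set[str] = set()
    has_value: set[str] = set()
    for row in rows:
        for column, value in row.items():
            seen.add(column)
            if value is not None:
                has_value.add(column)
    # A column is "empty" if it was seen but never carried a non-null value.
    return sorted(seen - has_value)


# Hypothetical rows for illustration only; not real telemetry.
rows = [
    {"event": "IMPRESSION", "message_id": "CFR_FIREFOX_VIEW", "action": None, "page": None},
    {"event": "CLICK", "message_id": "CFR_FIREFOX_VIEW", "action": None, "page": None},
]
print(all_null_columns(rows))  # ['action', 'page']
```

A result like this is consistent with the metric being defined but never set by the sending code, which is the situation described above.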
The columns:

* .action
  * likely reason added to metrics.yaml: [existing telemetry docs](https://firefox-source-docs.mozilla.org/browser/components/newtab/docs/v2-system-addon/data_events.html#user-interaction-pings).
  * no data received, appears unused in source code (more details below)
  * proposal: mark as obsolete by removing from metrics.yaml
* .cfr_action
  * likely reason added to metrics.yaml: unclear, seems like just a bug
  * [current](https://searchfox.org/mozilla-central/search?q=cfr_action&path=&case=false&regexp=false_) & older searches have pretty much no uses of this string in telemetry.
  * proposal: mark as obsolete by removing from metrics.yaml
* .page
  * likely reason added to metrics.yaml: metrics collection docs about [basic shape of user event pings](https://firefox-source-docs.mozilla.org/browser/components/newtab/docs/v2-system-addon/data_events.html#user-event-pings)
  * user_events are used in both newtab code and [asrouter code](https://searchfox.org/mozilla-central/rev/57f6fbd39c0b5957e11b27b4db58b821d8e1607d/browser/components/newtab/lib/TelemetryFeed.jsm#591). However, having dug through searchfox and vscode, I haven't found instances of `page` keys being sent from ASRouter user_event pings, and I haven't found any columns in the PingCentre tables that collect it.
  * proposal: mark as obsolete by removing from metrics.yaml
* .attribution.msstoresignedin
  * likely reason added to metrics.yaml: [attribution code](https://searchfox.org/mozilla-central/source/browser/components/attribution/AttributionCode.sys.mjs#47)
  * no columns in the old telemetry tables collect this
  * null column in FxMS Glean tables presumably because it only gets sent if there's a Windows install campaign (e.g. [Test new install method for Windows 11 users](https://github.com/mozilla/bedrock/issues/11090)), and none of those have yet been done.
  * proposal: spin off a bug to decide what to do here, if anything, because a lot depends on how product and desktop-integration want to use this (or not)

More details for `.action`:

The [AS Router part of the data events docs](https://firefox-source-docs.mozilla.org/browser/components/newtab/docs/v2-system-addon/data_events.html#activity-stream-router-pings) and [user_event ping basic shape docs](https://firefox-source-docs.mozilla.org/browser/components/newtab/docs/v2-system-addon/data_events.html#basic-shape) show them as having `action` fields. An example of what actually gets sent:

```JSON
Submitting Glean ping for {"source":"CFR","message_id":"CFR_FIREFOX_VIEW","bucket_id":"CFR_FIREFOX_VIEW","event":"IMPRESSION","addon_version":"20230911154121","locale":"en-US","client_id":"a2d2550a-0255-4d3a-b330-983d15d7ae7a","pingType":"cfr"}
```

It appears that action fields used to get sent, but that changed when AS Router telemetry migrated to a new data pipeline in [bug 1585147](https://bugzilla.mozilla.org/show_bug.cgi?id=1585147) via [this PR](https://github.com/mozilla/activity-stream/commit/147fd74cecd71ed2f7f148475b356604786cd0a7), which added code that deleted the action field for a whole pile of telemetry types.
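To illustrate the kind of change described in that PR, here is a rough Python sketch of a policy step that drops `action` from a ping before it is sent. This is not the actual TelemetryFeed.jsm code (which is JavaScript); `PING_TYPES_WITHOUT_ACTION`, `apply_policy`, and the sample `action` value are all hypothetical names chosen for illustration.

```python
from typing import Any

# Hypothetical: the set of ping types whose `action` field is dropped before sending.
PING_TYPES_WITHOUT_ACTION = {"cfr"}


def apply_policy(ping: dict[str, Any], ping_type: str) -> dict[str, Any]:
    """Return a copy of the ping tagged with its type, minus `action` where the policy drops it."""
    result = dict(ping)  # copy so the caller's dict is left untouched
    if ping_type in PING_TYPES_WITHOUT_ACTION:
        result.pop("action", None)
    result["pingType"] = ping_type
    return result


# Hypothetical input; the `action` value here is made up for the example.
ping = {"source": "CFR", "event": "IMPRESSION", "action": "cfr_user_event"}
sent = apply_policy(ping, "cfr")
print("action" in sent)  # False
```

If the sending code works roughly like this, the `action` metric would never be set for those ping types, which would explain an all-null column.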
For more due diligence, I looked through code (via SearchFox and otherwise) for the string `action:`, finding lots of false positives (links to examples):

* [message actions from our JSON messages](https://searchfox.org/mozilla-central/rev/44a7ece8626d9bc418da7d13341f9163817d199b/browser/components/newtab/aboutwelcome/AboutWelcomeChild.jsm#384)
* ping keys for [non-about:welcome stuff](https://searchfox.org/mozilla-central/source/browser/components/newtab/content-src/asrouter/asrouter-content.jsx#83) which end up getting deleted in TelemetryFeed.jsm ([mostly in policy functions](https://searchfox.org/mozilla-central/rev/44a7ece8626d9bc418da7d13341f9163817d199b/browser/components/newtab/lib/TelemetryFeed.jsm#656)) before being sent

Adding a couple of needinfos for feedback on the analysis and proposals about what to do with these columns; comments from others are welcome too.
Bug 1850863 Comment 2 Edit History