Closed Bug 1814922 Opened 1 year ago Closed 1 year ago

Break down PingCentre send failures by "namespace"

Categories

(Toolkit :: Telemetry, task, P1)

task

Tracking

()

RESOLVED FIXED
111 Branch
Tracking Status
firefox111 --- fixed

People

(Reporter: chutten, Assigned: chutten)

References

Details

Attachments

(2 files)

We need a little more detail than the instrumentation in bug 1800079 offers.

This bug is about adding (and perhaps uplifting?) a new piece of instrumentation that counts failures to send standalone pings, split by their structured ingestion namespace.

If we can, while we're here, we should also instrument send successes so we have a denominator to compare against.
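
For illustration only, here is a minimal Python sketch of the idea: one counter for send successes and one for send failures, each keyed by namespace, so that a failure rate can be computed per namespace. This is not the patch that landed. The class name, method names, and the namespace strings used below are hypothetical example values.

```python
from collections import Counter
from typing import Optional


class SendCounters:
    """Hypothetical per-namespace tally of ping send outcomes."""

    def __init__(self) -> None:
        self.successes: Counter = Counter()
        self.failures: Counter = Counter()

    def record(self, namespace: str, ok: bool) -> None:
        # Increment the success or failure count for this namespace.
        target = self.successes if ok else self.failures
        target[namespace] += 1

    def failure_rate(self, namespace: str) -> Optional[float]:
        # Successes provide the denominator; return None if nothing was sent.
        total = self.successes[namespace] + self.failures[namespace]
        if total == 0:
            return None
        return self.failures[namespace] / total


counters = SendCounters()
# Example namespace strings, chosen only for this sketch.
for outcome in (True, True, True, False):
    counters.record("activity-stream", outcome)
for outcome in (True, False):
    counters.record("messaging-system", outcome)

print(counters.failure_rate("activity-stream"))
print(counters.failure_rate("messaging-system"))
```

Recording successes alongside failures is what makes the failure count interpretable: a raw failure count alone cannot distinguish a namespace that sends rarely from one that fails rarely.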

Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c08914932e48
Instrument PingCentre successes/failures by namespace r=nanj
Attachment #9315913 - Flags: data-review?(mmccorquodale)

Comment on attachment 9315913 [details]
data collection review request

  1. Is there or will there be documentation that describes the schema for the ultimate data set in a public, complete, and accurate way?
    Yes, this will be documented in the Glean dictionary.

  2. Is there a control mechanism that allows the user to turn the data collection on and off?
    Yes, users can opt out of telemetry collection.

  3. If the request is for permanent data collection, is there someone who will monitor the data over time?
    No, not permanent.

  4. Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
    Category 1, technical data.

  5. Is the data collection request for default-on or default-off?
    Default on.

  6. Does the instrumentation include the addition of any new identifiers?
    No new identifiers.

  7. Is the data collection covered by the existing Firefox privacy notice?
    Yes.

  8. Does the data collection use a third-party collection tool?
    No.


data-review +

Attachment #9315913 - Flags: data-review?(mmccorquodale) → data-review+
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 111 Branch
Regressions: 1815197

Comment on attachment 9315893 [details]
Bug 1814922 - Instrument PingCentre successes/failures by namespace r?nanj!

Beta/Release Uplift Approval Request

  • User impact if declined: No user impact, though it'll keep us (Mozilla) from getting release-channel data on the health of an important data collection system for one more month
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Low: instrumentation-only, test coverage
  • String changes made/needed: None
  • Is Android affected?: No
Attachment #9315893 - Flags: approval-mozilla-beta?

(The regression is a known, infrequent, test-only bug)

chutten, thanks for working on this so quickly (and thanks to all others involved!). When/where should I expect to see this data come in?

Flags: needinfo?(chutten)

We are in RC week and already built and shipped our Release Candidate, the mozilla-beta tree is closed and won't reopen before the merge and version bump next Monday. Did you mean to have this patch in our RC build?

Comment on attachment 9315893 [details]
Bug 1814922 - Instrument PingCentre successes/failures by namespace r?nanj!

I don't think this is quite worth wedging into an RC. I should've double-checked whattrainisitnow.com before I requested uplift. Thanks for the double-check, Pascal.

:skahmann, I've actually written a blog post on how long data takes. For this specific case, it made it into Nightly 20230204091116 (the first of two Feb 4 nightlies) and so has been reporting data from any Nightly users who updated Saturday or later. The data was going into additional_properties until early Monday morning when there was a schema deploy that would've caught that there were new metrics, adding a column.

Weirdly, I'd expect data to immediately start flowing into _live tables in the new column on Monday, meaning there'd be one partition (Monday's) available in the stable tables today (Tuesday) that I could query... but the data's not there. It is present in the live tables, though, ... but only today's data.

Maybe the schema deploy ending after midnight UTC on Monday means that the new schema doesn't create new columns until Tuesday? ...I may need to update my blog post. :relud, I feel I may have asked this before and gotten an excellent answer which I plumb forgot. Would you be willing to (re-)enlighten me about how schema deploys result in new columns?

Flags: needinfo?(chutten) → needinfo?(dthorn)
Attachment #9315893 - Flags: approval-mozilla-beta?

Oh, hey, maybe this is a one-off weirdness tracked in bug 1815297?

The timeline I would expect is:

  • metrics are added to nightly on 2023-02-04
  • next probe scraper run starts around 2023-02-06T00:00Z
  • schemas are generated and deployed within a few hours
  • ping tables for 2023-02-06 have metric in column and in additional properties
  • ping tables for 2023-02-07 have metric in column
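
The expected timeline above can be sketched as a small date calculation. This is a rough Python sketch under a stated assumption: the next probe scraper run starts at 00:00 UTC on the first weekday strictly after the landing date. That rule is inferred only from the dates in this list (a Saturday landing followed by a Monday run) and is not a confirmed schedule. The function name and dictionary keys are hypothetical.

```python
from datetime import date, timedelta


def expected_timeline(landed: date) -> dict:
    """Estimate when a new metric gets its own column.

    Assumption (not confirmed): the next probe scraper run starts at
    00:00 UTC on the first weekday strictly after the landing date.
    """
    run = landed + timedelta(days=1)
    while run.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        run += timedelta(days=1)
    return {
        # Schemas are generated and deployed within hours of this run.
        "probe_scraper_run": run,
        # That day's ping tables hold the metric both in the new column
        # and in additional properties.
        "column_and_additional_properties": run,
        # From the following day the metric is in the column only.
        "column_only": run + timedelta(days=1),
    }


timeline = expected_timeline(date(2023, 2, 4))
for step, day in timeline.items():
    print(step, day.isoformat())
```

For the 2023-02-04 landing this gives a run on 2023-02-06 and column-only tables from 2023-02-07, matching the expectation listed above rather than what was actually observed.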

Bug 1815297 is not related; that is a reactive bug from a Monday weekly meeting checking that schemas are matching up with reality.

Flags: needinfo?(dthorn)

Looking at schema generator logs, the new metric was detected in pine on the 2023-02-06 run, but was not detected for firefox-desktop until the 2023-02-07 run (today).

See Also: → 1827767