Open Bug 1950749 Opened 9 months ago Updated 3 months ago

[meta] Improve developer ergonomics of adding new crash annotations to glean pings

Tracking

()

Status:

ASSIGNED

People

(Reporter: afranchuk, Assigned: afranchuk)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: meta)

Attachments

(7 obsolete files)

Bug 1950749 p1 - Move annotations-related things to their own folder. r=gsvelto 3 months ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1950749 p2 - Refactor generation script to have one entry point. r=gsvelto 3 months ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1950749 p3 - Update the CrashAnnotations.yaml format to include Glean information. r=gsvelto 3 months ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review
WIP: Bug 1950749 p4 - Create the pingsender crate. r=gsvelto 3 months ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review
WIP: Bug 1950749 p5 - Use the pingsender crate in the crashreporter client. r=gsvelto 3 months ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review
WIP: Bug 1950749 p6 - Use the pingsender crate for firefox desktop crashes. r=gsvelto 3 months ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review
WIP: Bug 1950749 p7 - Use the pingsender crate for Android crashes. r=gsvelto 3 months ago Alex Franchuk [:afranchuk] 48 bytes, text/x-phabricator-request		Details \| Review

Alex Franchuk [:afranchuk]

Assignee

Description

•

9 months ago

•

Edited

Because Glean didn't support object metric types when the crash ping was originally created, annotations were manually translated to metrics. This is cumbersome. Ideally, a developer who wants to add an annotation can do so by simply editing CrashAnnotations.yaml and adding code to set the annotation. Instead, they must edit CrashAnnotations.yaml (some code paths will still use it for various purposes besides the legacy ping), the desktop and/or android metrics.yaml glean definitions, and the desktop/crashreporter client code to populate the metrics and/or the android client code to populate the metrics (all in different languages!).

There are a few solutions to this problem (numbered here for easy referencing):

Bug 1911301 would do a lot to mitigate the overhead here by centralizing the conversion code in one place and in one language. That would arguably help out a lot: asking developers to update multiple programming languages isn't great.
By adding a little more information to CrashAnnotations.yaml (where necessary), we may be able to generate the metrics files and/or conversions code.
We could change the crash ping to use an object metric for annotations. This makes querying annotation information in e.g. BigQuery slightly more involved, and also removes the ability to store particular annotations in more intelligent ways (quantity metrics with units, string lists, etc). However, it would allow the code that stores annotations in glean metrics to be generic. We could potentially also store some things more intelligently by adding extra information to CrashAnnotations.yaml. object metrics don't allow arbitrary keys in objects, however we could use a definition like:

type: object
properties:
  stringAnnotations:
    type: array
    items:
      type: object
      properties:
        key: string
        value: string
  boolAnnotations:
    type: array
    items:
      type: object
      properties:
        key: string
        value: string
...

Crash annotations could be stored using the glean API, dropping CrashAnnotations.yaml altogether. This is unlikely to be tenable since the annotation API is optimized for efficiency and for running in degraded circumstances. The API also features the ability to extract annotations from other processes, which would need to be reworked in some (likely significant) way.

Alex Franchuk [:afranchuk]

Assignee

Updated

•

9 months ago

Blocks: 1911301

Alex Franchuk [:afranchuk]

Assignee

Comment 1

•

7 months ago

The ping scope in CrashAnnotations.yaml is manually verified when adding Glean crash ping metrics (i.e., we don't have any way to programmatically verify that annotations put in the metrics are of the scope described in CrashAnnotations.yaml). We could add a lint for this, though it would require enumerating the mapping of annotations to metrics. This is something that would likely be fixed by reworking the ergonomics here.

Another possible solution:

Use the metrics.yaml as the new CrashAnnotations.yaml. I.e., it may be possible to define all the annotations within Glean definitions and change the crash annotations codegen to read from the metrics file rather than from the CrashAnnotations.yaml format. This is similar in spirit to solution 2: combining two separate definitions into one.

Alex Franchuk [:afranchuk]

Assignee

Updated

•

3 months ago

Assignee: nobody → afranchuk

Status: NEW → ASSIGNED

Alex Franchuk [:afranchuk]

Assignee

Comment 2

•

3 months ago

Plan

After some thought, I've come up with the following plan:

Change CrashAnnotations.yaml to instead be a Rust source file, so that the more complicated conversion logic from annotations to Glean metrics (e.g., stack traces) can be written once.
Use this new Rust source in:
- a build-time program to generate the language bindings for annotations (we still want language-native annotation enums),
- the crash reporter client,
- either linked into firefox or invoked through the crash reporter client binary, and
- linked into Android. This will be either combined into the same library that has minidump_analyzer, or the same one as crashhelper, to avoid yet another library (we will combine those in the future at some point anyway, when the crash helper does the minidump analysis itself).
Because Glean uses global statics, also send the crash ping from this new module.

Reasoning

This allows conversion code to be written in one language, rather than being generated in multiple languages (which would be higher maintenance to test, and more brittle to changes in Glean, language lints, etc).
Developers adding a crash annotation can do so by adding a definition to the Rust source file of annotations (which can be designed to be fairly declarative). If there is special logic to convert the annotation to a Glean metric, it can be written there, but otherwise default behaviors can handle the rest, and the developers would need only to then set the annotation in their code (no touching Glean metrics.yaml files, nor setting up conversions in many languages).
This is an incremental step toward an improved crash lifecycle. This code could be easily integrated into the crash helper in the future (which would be handling crash pings and reports).

Concession

I have a fairly clear idea of what should be done to implement this, however we could instead keep annotations in CrashAnnotations.yaml and generate all the language-specific code. I estimate this would take more time to develop/test (the only-Rust solution can piggyback off of the existing infrastructure like minidump-analyzer or crashhelper for "distribution" among the language runtimes), however the real cost would be future maintenance.

Flags: needinfo?(gsvelto)

Gabriele Svelto [:gsvelto]

Comment 3

•

3 months ago

I like this idea, and I especially like the fact that we could use the same rust code both at runtime and at compile time. Testing coverage is the cherry on top. We could use this to also further enforce static constraints on crash annotations (such as types and possibly even values, e.g. ensuring some annotation can only hold a subset of values, or the value must have certain constraints, etc...).

Flags: needinfo?(gsvelto)

Alex Franchuk [:afranchuk]

Assignee

Comment 4

•

3 months ago

•

Edited

I have been working on this, however the build system does not have much support for running compile-time Rust (aside from, perhaps, using a build script to produce side effects). The #build folks advised that I stick to using Python (at least for the time being) for the compile-time aspects. In the future there may be first-class support for running compile-time Rust: it sounded as if they were kicking around ideas to get it to work, because application-services developers need it, too. We might be able to get away with a cargo build script doing it, but for this bug I'm sticking with the Python. There still will be one Rust library that does the work across all platform environments, so we'll have reduced maintenance burden from that.

FYI the main issue #build were concerned with was supporting artifact builds.

Alex Franchuk [:afranchuk]

Assignee

Comment 5

•

3 months ago

Attached file Bug 1950749 p1 - Move annotations-related things to their own folder. r=gsvelto (obsolete) — Details

This leaves CrashAnnotations.yaml where it was to not confuse people
and tooling looking for it.

Alex Franchuk [:afranchuk]

Assignee

Comment 6

•

3 months ago

Attached file Bug 1950749 p2 - Refactor generation script to have one entry point. r=gsvelto (obsolete) — Details

Since the Java/Kotlin implementation worked based on the file extension,
I figured it could work that way for everthing.

Alex Franchuk [:afranchuk]

Assignee

Comment 7

•

3 months ago

Attached file Bug 1950749 p3 - Update the CrashAnnotations.yaml format to include Glean information. r=gsvelto (obsolete) — Details

Alex Franchuk [:afranchuk]

Assignee

Comment 8

•

3 months ago

Attached file WIP: Bug 1950749 p4 - Create the pingsender crate. r=gsvelto (obsolete) — Details

This crate handles converting crash annotations to Glean metrics and
sending the crash ping.

Glean metrics are generated directly from CrashAnnotations.yaml.

Alex Franchuk [:afranchuk]

Assignee

Comment 9

•

3 months ago

Attached file WIP: Bug 1950749 p5 - Use the pingsender crate in the crashreporter client. r=gsvelto (obsolete) — Details

Alex Franchuk [:afranchuk]

Assignee

Comment 10

•

•

3 months ago

Depends on: 1989439

Alex Franchuk [:afranchuk]

Assignee

Updated

•

3 months ago

Summary: Improve developer ergonomics of adding new crash annotations to glean pings → [meta] Improve developer ergonomics of adding new crash annotations to glean pings

•

3 months ago

Attachment #9513509 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.