Update json schema for build system telemetry

RESOLVED FIXED in Firefox 63

Status

enhancement
RESOLVED FIXED
a year ago
5 months ago

People

(Reporter: ted, Assigned: ted)

Tracking

3 Branch
mozilla63
Dependency tree / graph

Firefox Tracking Flags

(firefox63 fixed)

Details

Attachments

(5 attachments)

In bug 1244160 we created a JSON schema for build system telemetry data. Since then we've had discussions about what info we really need to collect, so we'll want to update it accordingly.
Posted file Proposed new schema
Here's a first draft at my proposed new schema. I based this on the discussion we had at the developer workflow work week.

I'm a little fuzzy on some of the details we settled on that weren't explicitly called out in the notes, so I would love some feedback! Specifically here are a few things I'm not 100% sure on:
* We discussed including the list of files changed since the previous invocation, but I remember saying that felt a bit intrusive. I don't remember if we settled on a specific end result in that discussion. I've included a `file_types_changed` field here, which is intended to be a list of file extensions.
* The notes say "Exception code if applicable". I know we went around on submitting errors, and I argued that telemetry wasn't the right place to try to do useful error reporting. I have included an "exception" field here so we can perhaps get a rough approximation of exceptions encountered, but I don't know exactly what we should put in that field. The simplest thing we could do would be to just turn it into a `boolean` to record whether an exception was encountered or not.
* The notes say "Sequence of commands" and I remember talking about this and deciding that we could derive this from the client id, but it feels like there's something I'm forgetting. Do we need another field that provides a sequence number, or should we give each record a UUID, and have a `last_record` field that contains the UUID of the previous record so we can chain them, or what?
* I tried to pare down the keys in the `system` section to a small set of sensible data that should hopefully be straightforward to collect. (I'm a *little* worried about `drive_is_ssd`, but I found some pointers on how to determine this for Linux and Windows, so I think it's doable.)
gps: I'd love your feedback on this schema proposal. Happy to hear from anyone else as well!
Flags: needinfo?(gps)
Random notes.

Capturing a mach command's arguments will have privacy implications, since paths can leak into there.

Capturing exceptions will be *very* useful. Capturing them with PII sanitized will be difficult. It will be useful to have an exception reporting mechanism though. We may want to make it more structured. e.g. trying to capture the exception type, message, and its traceback fields separately. This will make analysis much simpler.

For file types changed, my recollection is that we wanted to report the extensions changed, which is what you have implemented. A missing useful piece is the count of changed files for each extension. It's very useful to know if we changed 1 .cpp file or 1000.

For build options, do you think we could throw in the C/C++ compiler type?

Do we have a reliable way of determining whether a drive is an SSD? I like that we're reporting that. I'm just not sure if we can accurately detect it. I imagine there would be plenty of corner cases. e.g. I'm pretty sure that a lot of VMs don't expose drives as SSDs even though they may be backed by SSDs.
Flags: needinfo?(gps)
Oh, overall this looks pretty good!

Also, I think we want a wall time field. UNIX timestamp will be fine. That's needed to help us reconstruct a sequence of commands. Wall time isn't great due to clock skew. But I don't think we can easily use a monotonic clock here due to e.g. resets across system restarts.
(In reply to Gregory Szorc [:gps] from comment #4)
> Random notes.
> 
> Capturing a mach command's arguments will have privacy implications, since
> paths can leak into there.

This is a good point. I'll see if I can come up with a proposal to limit the set of data here, with filtering on path arguments.

> Capturing exceptions will be *very* useful. Capturing them with PII
> sanitized will be difficult. It will be useful to have an exception
> reporting mechanism though. We may want to make it more structured. e.g.
> trying to capture the exception type, message, and its traceback fields
> separately. This will make analysis much simpler.

I'll restate what I said at the work week: I don't think build system telemetry is the right place to try to do exception reporting. I'm fine with having a simple "exception happened" field containing the exception type and maybe code, but trying to capture more than that is out of scope and better suited for a purpose-built system like Sentry.

> For file types changed, my recollection is that we wanted to report the
> extensions changed, which is what you have implemented. A missing useful
> piece is the count of changed files for each extension. It's very useful to
> know if we changed 1 .cpp file or 1000.

Good point. I guess this should be a map of extension -> count then.

> For build options, do you think we could throw in the C/C++ compiler type?

That's easy to do, yes.

> Do we have a reliable way of determining whether a drive is an SSD? I like
> that we're reporting that. I'm just not sure if we can accurately detect it.
> I imagine there would be plenty of corner cases. e.g. I'm pretty sure that a
> lot of VMs don't expose drives as SSDs even though they may be backed by
> SSDs.

For non-virtual-machines there seem to be reasonable ways to get this info on Linux and Windows (I didn't look into macOS yet). We could additionally report whether the build is running in a VM, which seems to be doable. I might add that as a schema field but punt on actually reporting the SSD/VM info at first (unless it turns out to be easy to accomplish).

Comment 7

10 months ago
Ted:

You mentioned that you had something to pretty close to ready to land a while ago.  Would you be able to revisit this?
Flags: needinfo?(ted)
Yes, I have some updated patches here. I revisited this work and decided to convert the schema to a voluptuous schema to make it easier to work with in Python. I'll post patches as soon as I figure out my way around phabricator. :)
Flags: needinfo?(ted)
voluptuous 0.11.1 added support for a `description` argument for Required and
Optional objects, which is useful for adding descriptions in the schema that
we can persist when converting it to json-schema format. This patch vendors
the current version of voluptuous, which is 0.11.5.

MozReview-Commit-ID: 2qt1KE8MPYR
This change adds a voluptuous schema for build system telemetry, replacing
the existing json schema file. Using voluptuous will make it easier to work
with the schema from Python code in the build system. A future commit will
use a Python module to provide a mach command to convert the voluptuous
schema to json schema format for consumption by other systems.

MozReview-Commit-ID: 2GimpiOo7fa
External systems such as the generic ingestion service will want to work with
the more standard json-schema format. This commit adds a `mach telemetry-schema`
command to convert the voluptuous schema to json-schema format using the
`luscious` Python module. Since that module has not been updated recently,
we install and use a fork with some changes.

MozReview-Commit-ID: BOECX2tyq21
Comment on attachment 8998139 [details]
bug 1461992 - update vendored copy of voluptuous to 0.11.5. r?build

Gregory Szorc [:gps] has approved the revision.

https://phabricator.services.mozilla.com/D2839
Attachment #8998139 - Flags: review+
Comment on attachment 8998140 [details]
bug 1461992 - add a voluptuous schema for build system telemetry. r?build

Gregory Szorc [:gps] has approved the revision.

https://phabricator.services.mozilla.com/D2840
Attachment #8998140 - Flags: review+

Updated

10 months ago
Attachment #8998141 - Attachment description: bug 1461992 - Add `mach telemetry-schema` to output build system telemetry schema in json-schema format. r?build → bug 1461992 - Add a script to output build system telemetry schema in json-schema format. r?build
Comment on attachment 8998139 [details]
bug 1461992 - update vendored copy of voluptuous to 0.11.5. r?build

Tom Prince [:tomprince] has approved the revision.
Attachment #8998139 - Flags: review+
Comment on attachment 8998141 [details]
bug 1461992 - Add a script to output build system telemetry schema in json-schema format. r?build

Gregory Szorc [:gps] has approved the revision.
Attachment #8998141 - Flags: review+

Comment 17

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/a077bf122c05
https://hg.mozilla.org/mozilla-central/rev/0f9bae9652bb
https://hg.mozilla.org/mozilla-central/rev/0f4b8b37bf95
Status: NEW → RESOLVED
Last Resolved: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
:mreid sent a review of my pull request and informed me that the
JSON schemas used in the data pipeline are all 2-space indented.
If we want to use this script to generate any future changes to
the build system schema, we should make sure it outputs data
correctly.
Comment on attachment 9003601 [details]
Bug 1461992: only indent 2 spaces in export_telemetry_schema.py output r?ted

Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9003601 - Flags: review+

Comment 20

8 months ago
Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/666d75e6584b
only indent 2 spaces in export_telemetry_schema.py output r=ted
Version: Version 3 → 3 Branch
You need to log in before you can comment on or make changes to this bug.