Closed Bug 1237610 Opened 8 years ago Closed 6 years ago

Gather and report local build metrics

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox64 fixed)

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: dminor, Assigned: ted)

References

(Depends on 1 open bug, Blocks 5 open bugs)

Details

Attachments

(6 files)

We would like to gather and report build metrics from developers' local machines to help identify problems and quantify progress in making the build system faster. Because of the privacy implications, this system will be opt-in and will not collect PII (at least initially).

Data we're considering collecting:
    * mach command (or at least build target)
    * machine characteristics - processor, RAM, drive type, OS
    * build type - full, incremental, or files affected (or a count of files affected)
    * elapsed time
    * CPU usage (samples, by second?)
    * Disk usage (samples)
    * Memory usage (samples)
    * Swap usage (samples)

We already collect resource usage data and store it in resource_usage.json during a build, so we might be able to reuse that.
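As a rough illustration of the first few items in the list above, here is a minimal sketch using only the Python standard library. The function names (`machine_characteristics`, `timed`) and the dictionary keys are hypothetical and are not taken from mach. RAM and drive type are left out because the standard library has no portable way to query them.

```python
import os
import platform
import time


def machine_characteristics():
    """Return coarse, non-identifying facts about the build machine.

    Hypothetical helper: the key names are illustrative only.
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "arch": platform.machine(),
        "logical_cpus": os.cpu_count(),
    }


def timed(command, func):
    """Run func() and return (result, metrics) for the named command."""
    start = time.monotonic()
    result = func()
    elapsed = time.monotonic() - start
    metrics = {"command": command, "elapsed_seconds": elapsed}
    metrics.update(machine_characteristics())
    return result, metrics
```

Something like `timed("build", run_build)` would then yield the build's return value plus a small record that could be stored alongside the sampled resource data.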

Our initial plan is to report this data to Telemetry.
Depends on: 1237619
Assignee: nobody → dminor
Depends on: 1239296
Depends on: 1239719
Depends on: 1240059
Depends on: 1241944
Depends on: 1244143
Depends on: 1244160
Depends on: 1250624
Depends on: 1250656
Depends on: 1251076
Depends on: 1252601
Assignee: dminor → nobody
This is on me now.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Depends on: 1291053
Assignee: gps → ted
Until the generic ingestion service is available we're going to continue using ekyle's server. It stores data in a private s3 bucket, which should be good enough for now. This patch updates the address of the server and also flips submission from opt-in to opt-out. I'll post about this on dev-platform before landing.

Once this lands we should be able to start looking at some of the data and we can figure out if we need more information to answer the questions we'd like to ask.
Comment on attachment 8878612 [details]
bug 1237610 - make mach telemetry opt-out rather than opt-in.

https://reviewboard.mozilla.org/r/149928/#review155250

The code seems correct. But this gets a r- for policy.

I'm reasonably confident that we can't collect data from client installs without at least notifying the end user that this occurs. Doing so would likely violate Mozilla data collection policies and probably some laws. We need sign-off from someone authorized to give it. And I reckon that sign-off won't occur until we have some kind of notice that data collection occurs.

Also, submitting to an IP address via http:// is sketchy. Can we not get https:// or a hostname (I have access to some Mozilla-related DNS zones hosted on AWS that we could use)? Of course, https:// presents its own challenges, such as validating the certificate - which isn't guaranteed to work on all Python installs :/
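For reference, a minimal sketch of what a certificate-validating submission could look like with the Python 3 standard library. This is an assumption-laden illustration, not the patch under review: `build_submission` is a hypothetical name, and the URL and payload are supplied by the caller.

```python
import json
import ssl
import urllib.request


def build_submission(url, payload):
    """Prepare a POST request and a TLS context that verifies certificates.

    Hypothetical helper; nothing is sent until the caller opens the request.
    """
    if not url.startswith("https://"):
        raise ValueError("refusing to submit telemetry over a non-HTTPS URL")
    # create_default_context() enables certificate and hostname checks,
    # using the trust store available to this Python install.
    context = ssl.create_default_context()
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    return request, context
```

The caller would then do `urllib.request.urlopen(request, context=context, timeout=10)`. Whether validation succeeds still depends on the CA certificates the local Python can find, which is the portability concern raised above.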
Attachment #8878612 - Flags: review?(gps) → review-
Ah, I thought we had sorted out the privacy issues of submitting as opt-out, thanks for the info! We'll have to figure out what the right way to handle that is, then.

RE: SSL, good call. If you have a hostname to use I'm sure we can point it at ekyle's server and get a Let's Encrypt cert configured. I'll file a bug for that.
Depends on: 1374380
As I was writing a comment about the privacy issue, I realized the simplest way to do this might just be to put a prompt in `mach bootstrap`.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5)
> Ah, I thought we had sorted out the privacy issues of submitting as opt-out,
> thanks for the info! We'll have to figure out what the right way to handle
> that is, then.

I think I had a verbal agreement that a proposal I came up with sounded reasonable. Not sure if AutomatedTester or Coop ever got something more formal.
(In reply to Gregory Szorc [:gps] from comment #7)
> I think I had a verbal agreement that a proposal I came up with sounded
> reasonable. Not sure if AutomatedTester or Coop ever got something more
> formal.

If we want to wait for the generic ingestion service, we're covered, but if we want to collect the data before that service is ready on our own custom server, we'll need to address the concerns around PII and retention.

I'll grab bug 1291053 and start driving it.
We're still waiting on bug 1242017, so gps has set up a new, temporary ingestion point. See https://bugzilla.mozilla.org/show_bug.cgi?id=1374380#c5 for host details.
Product: Core → Firefox Build System
In our workweek last week, we decided that these are the initial characteristics we are looking to capture from the build telemetry work:

    * What mach commands are they running? mach build is most important
    * Basic timing information - time to run mach build
    * Basic hardware info - cpu brand string, memory, type of disk
    * Files were changed through invocation - mach watchman
    * Configure flags: Artifact build or not
    * Debug vs opt
    * Exception code if applicable
    * Sequence of commands
    * Persistent client id
    * Sccache and icecream usage
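
To make a few of these items concrete, here is a hedged sketch of a per-command record with a persistent client id. The field names, the `telemetry_client_id` file name, and both function names are hypothetical; they do not reflect the actual schema submitted to mozilla-pipeline-schemas.

```python
import uuid
from pathlib import Path


def get_client_id(state_dir):
    """Return a random id that persists across runs in state_dir.

    Assumption: the id lives in a plain text file; the file name is made up.
    """
    path = Path(state_dir) / "telemetry_client_id"
    if path.exists():
        return path.read_text().strip()
    client_id = str(uuid.uuid4())
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(client_id)
    return client_id


def build_record(state_dir, command, duration_seconds, exit_code,
                 artifact, debug):
    """Assemble one record per mach invocation (illustrative fields only)."""
    return {
        "client_id": get_client_id(state_dir),
        "command": command,
        "duration_seconds": duration_seconds,
        "exit_code": exit_code,
        "artifact": artifact,
        "debug": debug,
    }
```

Because the id is a random UUID rather than anything derived from the machine or user, records from one client can be grouped without identifying who that client is.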
Depends on: 1461992
Blocks: 1480362
Blocks: 1481612
Blocks: 1481613
Blocks: 1481614
Blocks: 1481617
Blocks: 1481624
Blocks: 1481774
A quick update from the data ingestion side: I have submitted a pull request of the updated JSON schema to the mozilla-pipeline-schemas repository at https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/191. I had an issue getting tests to pass against the Parquet schema format and after speaking with :frank in #datapipeline, I was told to submit the pull request with a comment asking for help finding the error. I am waiting for a review on that PR.
MozReview-Commit-ID: HJLO82QZQVO
This patch rewrites `gather_telemetry` to collect data matching the new schema.
This includes all required fields and most of the optional fields. Some fields
are not currently recorded and followup bugs have been filed to track their
implementation.
Comment on attachment 9005006 [details]
bug 1237610 - slight cleanup in should_skip_dispatch. r?build

Gregory Szorc [:gps] has approved the revision.
Attachment #9005006 - Flags: review+
Comment on attachment 9005007 [details]
bug 1237610 - don't call post_dispatch_handler when using debug-command. r?build

Gregory Szorc [:gps] has approved the revision.
Attachment #9005007 - Flags: review+
Comment on attachment 9005008 [details]
bug 1237610 - use a mach setting to control telemetry submission. r?build

Gregory Szorc [:gps] has approved the revision.
Attachment #9005008 - Flags: review+
This commit updates submit_telemetry_data.py to send data
to the Telemetry pipeline. The script assumes the presence
of a "telemetry" directory within the statedir, and an
"outgoing" directory within the "telemetry" directory (otherwise
there is no data to submit). The script will create a
"submitted" directory and "telemetry.log" file if absent,
making the assumption that this is the first build telemetry
submission for that user. UUID values for submitted data points
are seeded from the filename, without the ".json" suffix.
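
The directory handling described above could be sketched roughly as follows. This is not the contents of submit_telemetry_data.py: `submit_outgoing` and the `send` callback are hypothetical, the log format is invented, and treating the filename stem as the UUID itself is only one reading of "seeded from the filename".

```python
import uuid
from pathlib import Path


def submit_outgoing(state_dir, send):
    """Submit each outgoing/*.json file and move it to submitted/.

    send(doc_id, body) is a caller-supplied function that returns True on
    success. Returns the list of document ids that were submitted.
    """
    telemetry_dir = Path(state_dir) / "telemetry"
    outgoing = telemetry_dir / "outgoing"
    if not outgoing.is_dir():
        return []  # no telemetry/outgoing directory, so nothing to submit

    # Created on first submission if absent.
    submitted = telemetry_dir / "submitted"
    submitted.mkdir(exist_ok=True)
    log_path = telemetry_dir / "telemetry.log"

    sent = []
    with log_path.open("a") as log:
        for path in sorted(outgoing.glob("*.json")):
            try:
                # Assumption: the name without ".json" is a UUID string.
                doc_id = str(uuid.UUID(path.stem))
            except ValueError:
                log.write("skipped %s: name is not a UUID\n" % path.name)
                continue
            if send(doc_id, path.read_text()):
                path.rename(submitted / path.name)
                log.write("submitted %s\n" % doc_id)
                sent.append(doc_id)
            else:
                log.write("failed %s\n" % doc_id)
    return sent
```

Passing the transport in as `send` keeps the file bookkeeping testable without a network connection. Files that fail to send stay in "outgoing" and are retried on the next run.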
Comment on attachment 9008480 [details]
Bug 1237610: update `submit_telemetry_data.py` r?ted

Ted Mielczarek [:ted] [:ted.mielczarek] has approved the revision.
Attachment #9008480 - Flags: review+
Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ce05cf6d5e19
update `submit_telemetry_data.py` r=ted
Comment on attachment 9005009 [details]
bug 1237610 - Collect telemetry data matching the new schema. r?build

Gregory Szorc [:gps] has approved the revision.
Attachment #9005009 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/ce05cf6d5e19
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Depends on: 1493623
Blocks: 1497638
Depends on: 1887885