Open Bug 1425785 Opened 6 years ago Updated 2 years ago

Collect system resource metrics from run-task

Categories

(Firefox Build System :: Task Configuration, task)

task

Tracking

(Not tracked)

People

(Reporter: gps, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

run-task is our generic task wrapper executable. It is currently used in a number of places on Linux, and with a little effort we could also use it on macOS and Windows.

I've been floating a well-received idea with a number of people to introduce system resource metrics collection into run-task. Essentially, we'd automatically record CPU, disk, memory, swap, etc. metrics during tasks and upload this data automatically for each task. ActiveData, Perfherder, etc. could consume this data and do nice things with it (such as monitoring our overall CPU or I/O usage for tasks).
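As a rough illustration of what "recording CPU metrics" involves, here is a minimal sketch for Linux, assuming `/proc/stat` as the data source (this bug does not say which source the Rust monitor would use). Utilization over an interval is computed from two readings as 1 - Δidle/Δtotal. `CpuTimes`, `parse_cpu_line`, `cpu_utilization`, and `read_cpu_times` are hypothetical names.

```rust
use std::fs;
use std::io;

/// Aggregate CPU time counters from the "cpu" line of /proc/stat (USER_HZ ticks).
#[derive(Debug, Clone, Copy, PartialEq)]
struct CpuTimes {
    idle: u64,  // idle + iowait
    total: u64, // user + nice + system + idle + iowait + irq + softirq + steal
}

/// Parse the aggregate "cpu ..." line; per-core "cpuN" lines are rejected.
fn parse_cpu_line(line: &str) -> Option<CpuTimes> {
    let mut fields = line.split_whitespace();
    if fields.next()? != "cpu" {
        return None;
    }
    let values: Vec<u64> = fields
        .map(|f| f.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    if values.len() < 4 {
        return None;
    }
    // Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    let idle = values[3] + values.get(4).copied().unwrap_or(0);
    // guest/guest_nice are already counted in user/nice, so only sum the first 8.
    let total: u64 = values.iter().take(8).sum();
    Some(CpuTimes { idle, total })
}

/// Fraction of CPU time spent busy between two readings (None if no time passed).
fn cpu_utilization(prev: CpuTimes, cur: CpuTimes) -> Option<f64> {
    let d_total = cur.total.checked_sub(prev.total)?;
    let d_idle = cur.idle.checked_sub(prev.idle)?;
    if d_total == 0 {
        return None;
    }
    Some(1.0 - d_idle as f64 / d_total as f64)
}

/// Read the current aggregate counters from the live system.
#[allow(dead_code)]
fn read_cpu_times() -> io::Result<Option<CpuTimes>> {
    let stat = fs::read_to_string("/proc/stat")?;
    Ok(stat.lines().next().and_then(parse_cpu_line))
}
```

Disk, memory, and swap metrics would follow the same pattern: take periodic snapshots of cumulative or instantaneous counters and compute deltas or averages over each interval.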

Rather than doing an exec() to invoke the "main" process of the task, run-task would spawn it as a child process. run-task would remain alive so it can periodically collect resource usage and "monitor" the spawned process.
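A minimal sketch of that spawn-and-poll structure using only the Rust standard library. The sampler is passed in as a closure so any metric (for example, the CPU reading shown earlier in this bug) can be plugged in; `run_and_monitor` and `Sample` are hypothetical names, and the polling interval is an assumption.

```rust
use std::io;
use std::process::{Command, ExitStatus};
use std::thread;
use std::time::{Duration, Instant};

/// One resource sample, tagged with the time elapsed since the child was spawned.
#[derive(Debug)]
struct Sample<T> {
    elapsed: Duration,
    value: T,
}

/// Spawn `cmd` as a child (instead of exec()ing into it), then call `sample`
/// every `interval` until the child exits. stdout/stderr are inherited.
fn run_and_monitor<T, F>(
    cmd: &mut Command,
    interval: Duration,
    mut sample: F,
) -> io::Result<(ExitStatus, Vec<Sample<T>>)>
where
    F: FnMut() -> T,
{
    let start = Instant::now();
    let mut child = cmd.spawn()?;
    let mut samples = Vec::new();
    loop {
        // Sample before checking for exit so even very short tasks get a data point.
        samples.push(Sample { elapsed: start.elapsed(), value: sample() });
        // try_wait() does not block: it returns Ok(Some(status)) once the child exits.
        if let Some(status) = child.try_wait()? {
            return Ok((status, samples));
        }
        thread::sleep(interval);
    }
}
```

run-task would then propagate the task's result, e.g. with `std::process::exit(status.code().unwrap_or(1))` (on Unix, `code()` is `None` when the child was killed by a signal). Forwarding signals such as SIGTERM to the child is left out of this sketch.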

What we're talking about is essentially integrating the existing mozsystemmonitor functionality into run-task. However, mozsystemmonitor is kinda hacked together and is difficult to get running on all machines, since it depends on psutil, which is a compiled Python C extension. Supporting mozsystemmonitor has been a PITA. And you are left with less-than-robust Python code in the critical path of process execution, which almost certainly adds runtime overhead.

So, the plan is to re-implement mozsystemmonitor in Rust and run it as a statically linked binary.

In addition, we'll formulate a mechanism for invoked processes to communicate counters, point-in-time events, and phases to our new resource monitor. Today, the build system emits special lines like "BUILDSTATUS BEGIN_TIER export", and there is custom code in `mach build` to convert these special lines into API calls to mozsystemmonitor that register events of interest. This allows us to e.g. compute average CPU usage for different phases of the build system. We'll formalize and improve this "API" in the Rust implementation so any spawned process can emit specially formed lines of output to tell the resource monitor what's going on. Once deployed, we'll "teach" various processes to emit this syntax so the resource monitor can annotate events of interest and correlate resource usage with them.
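To make the idea concrete, here is a sketch that invents a hypothetical `MONITOR <kind> <name> [value]` line syntax (the actual syntax is not specified in this bug), parses it, and then uses phase annotations to compute average CPU usage per phase from timestamped samples. `Annotation`, `parse_annotation`, `average_by_phase`, and the `MONITOR` prefix are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// An annotation a monitored process could emit on stdout (hypothetical syntax).
#[derive(Debug, PartialEq)]
enum Annotation {
    PhaseBegin(String),
    PhaseEnd(String),
    Event(String),
    Counter { name: String, value: i64 },
}

const PREFIX: &str = "MONITOR";

/// Parse one output line; ordinary output and malformed lines yield None.
fn parse_annotation(line: &str) -> Option<Annotation> {
    let mut parts = line.split_whitespace();
    if parts.next()? != PREFIX {
        return None;
    }
    let kind = parts.next()?;
    let name = parts.next()?.to_string();
    let ann = match kind {
        "phase-begin" => Annotation::PhaseBegin(name),
        "phase-end" => Annotation::PhaseEnd(name),
        "event" => Annotation::Event(name),
        "counter" => Annotation::Counter { name, value: parts.next()?.parse().ok()? },
        _ => return None,
    };
    // Reject trailing tokens so malformed lines are not half-interpreted.
    if parts.next().is_some() {
        return None;
    }
    Some(ann)
}

/// Average the `(seconds, cpu_fraction)` samples falling inside each phase,
/// given `(seconds, annotation)` pairs in the order they were received.
fn average_by_phase(
    samples: &[(f64, f64)],
    annotations: &[(f64, Annotation)],
) -> HashMap<String, f64> {
    let mut open: HashMap<&str, f64> = HashMap::new();
    let mut result = HashMap::new();
    for (t, ann) in annotations {
        match ann {
            Annotation::PhaseBegin(name) => {
                open.insert(name.as_str(), *t);
            }
            Annotation::PhaseEnd(name) => {
                if let Some(begin) = open.remove(name.as_str()) {
                    let inside: Vec<f64> = samples
                        .iter()
                        .filter(|(ts, _)| *ts >= begin && *ts <= *t)
                        .map(|(_, cpu)| *cpu)
                        .collect();
                    if !inside.is_empty() {
                        let avg = inside.iter().sum::<f64>() / inside.len() as f64;
                        result.insert(name.clone(), avg);
                    }
                }
            }
            // Events and counters would be recorded alongside samples; not averaged here.
            _ => {}
        }
    }
    result
}
```

Lines that don't start with the prefix are passed through untouched, so existing task output is unaffected; only processes that opt in by printing the syntax get annotated.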

This bug will likely morph into a meta/tracking bug.
Depends on: 1425787
How is this related to the resource-usage.json that gets uploaded for each task?

https://public-artifacts.taskcluster.net/OHybBs2aSFy2Bk41LBO5pA/0/public/test_info//resource-usage.json
resource-usage.json is produced by mozharness, which uses mozsystemmonitor under the hood.

We're moving away from mozharness. Also, mozharness doesn't run during every "phase" of a task that we want to measure. So the plan is to move this functionality lower in the "stack" so it is usable by any task and for the duration of the entire task. This also means we can move tasks off mozharness without losing our resource metrics.
Ah, thanks for that info.
Product: TaskCluster → Firefox Build System
Severity: normal → S3