Stream resource monitor profile data incrementally to support crash/timeout recovery in CI
Categories
(Testing :: Mozbase, enhancement)
Tracking
(Not tracked)
People
(Reporter: florian, Assigned: florian)
References
(Regressed 1 open bug)
Details
Attachments
(2 files)
Currently, SystemResourceMonitor only writes the profile file at the end when as_profile() is called. If a CI worker gets killed due to a timeout, no profile data is saved.
This adds incremental streaming of profile data to a JSON Lines file while the monitor is running. The first line contains the meta object, and subsequent lines contain one marker each (test markers, resource measurements, events). On normal shutdown, the file is replaced with the full serialized JSON profile as before.
Changes:
- SystemResourceMonitor gains a start_streaming(path) method that begins writing profile data incrementally
- Measurement data from the child process is now sent through the pipe immediately and drained periodically by a timer, instead of being buffered until stop()
- Each streamed JSON line has a type field ("meta" or "marker") for future extensibility
- mozharness calls start_streaming so CI test runs produce partial profile data even on timeout
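The streaming behavior described in the list above can be sketched roughly as follows. This is a minimal illustration, not the actual SystemResourceMonitor code: the StreamingProfileWriter class, its method names, and the "data" payload key are all hypothetical; only the one-JSON-object-per-line layout and the "type" field ("meta" or "marker") come from the description.

```python
import json
import os


class StreamingProfileWriter:
    """Hypothetical helper illustrating JSON Lines streaming (assumed design)."""

    def __init__(self, path, meta):
        self.path = path
        self._fh = open(path, "w", encoding="utf-8")
        # The first line carries the meta object.
        self._write_line({"type": "meta", "data": meta})

    def _write_line(self, obj):
        # One compact JSON object per line, flushed right away so the
        # data is already on its way to disk if the worker is killed.
        self._fh.write(json.dumps(obj, separators=(",", ":")) + "\n")
        self._fh.flush()

    def add_marker(self, marker):
        # Every later line carries exactly one marker.
        self._write_line({"type": "marker", "data": marker})

    def finish(self, full_profile):
        # On normal shutdown, replace the streamed file with the full
        # serialized JSON profile. Writing to a temporary file first
        # means a crash here never leaves a half-written profile.
        self._fh.close()
        tmp_path = self.path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as fh:
            json.dump(full_profile, fh)
        os.replace(tmp_path, self.path)
```

Flushing after every line trades some I/O overhead for the guarantee that at most the last line is lost on a timeout; whether the real implementation flushes per line is an assumption here.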
Comment 1 (Assignee) • 5 days ago
Pure refactor: extract _build_meta(), _build_thread(), _measurement_markers(),
_format_percent(), _drain_pipe(), and class-level category constants from
as_profile() and stop() into reusable methods. No behavior change.
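The body of _drain_pipe() is not shown in this bug. As a rough sketch of the general technique only, a non-blocking drain of a multiprocessing pipe could look like the following; the drain_pipe function name and its return shape are assumptions made for illustration.

```python
import multiprocessing


def drain_pipe(conn):
    """Return every message currently waiting on `conn` without blocking.

    Hypothetical stand-in for a helper like _drain_pipe(); the real
    method's signature and behavior may differ.
    """
    items = []
    # poll() with no timeout returns immediately: True if data (or EOF)
    # is ready to be read, False otherwise.
    while conn.poll():
        try:
            items.append(conn.recv())
        except EOFError:
            # The sending end was closed and nothing is left to read.
            break
    return items
```

A periodic timer in the parent process could call such a function to pick up measurements as the child sends them, rather than waiting for stop().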
Comment 2 (Assignee) • 5 days ago
Currently, SystemResourceMonitor only writes the profile file at the end when
as_profile() is called. If a CI worker gets killed due to a timeout, no profile
data is saved.
This adds incremental streaming of profile data to a JSON Lines file while the
monitor is running. The first line contains the meta object, then a thread
object, then one line per marker (test markers, resource measurements, events).
On normal shutdown, the file is replaced with the full serialized JSON profile
as before.
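For a file left behind by a killed worker, a consumer has to tolerate a truncated final line. The sketch below shows one way to read such a partial file. It assumes each line is a JSON object with a "type" field and a hypothetical "data" payload key, and that the thread line uses type "thread"; none of these field names beyond "type" are confirmed by this bug. It also assumes the file is still in streamed form, not the full profile written on normal shutdown.

```python
import json


def read_streamed_profile(path):
    """Collect meta, thread, and markers from a partial JSON Lines file."""
    result = {"meta": None, "thread": None, "markers": []}
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                # A killed process can leave the last line incomplete;
                # keep everything read so far and stop.
                break
            if not isinstance(record, dict):
                continue
            kind = record.get("type")
            if kind == "marker":
                result["markers"].append(record.get("data"))
            elif kind in ("meta", "thread"):
                result[kind] = record.get("data")
            # Unknown types are skipped, which leaves room for the
            # format to grow.
    return result
```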
Comment 3 (Assignee) • 5 days ago
The profiler front-end needs to support the new format; there's a deploy preview at https://deploy-preview-5901--perf-html.netlify.app/
There's a try push at https://treeherder.mozilla.org/jobs?repo=try&revision=d015b5e0c9f74a6536a28e2ef3b7246adeb7cd63&selectedTaskRun=OpdbI0z2THOqvSq6unfEAA.0 with jobs that failed with a timeout and would usually not have uploaded resource usage profiles.