Closed
Bug 893391
Opened 12 years ago
Closed 8 years ago
Investigate CPU steal / unexpected 100% CPU utilization
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: gps, Unassigned)
Details
System resource monitoring has landed in mozharness in bug 859573! The early returns seem to show an unexpected CPU usage value on some platforms. Let's take a sample xpcshell test result:
From OS X 10.8 opt:
https://tbpl.mozilla.org/php/getParsedLog.php?id=25246533&tree=Cedar&full=1
17:38:28 INFO - Total resource usage - Wall time: 1441s; CPU: 12%; Read bytes: 37781504; Write bytes: 9456452608; Read time: 22955; Write time: 385785
17:38:28 INFO - install - Wall time: 11s; CPU: 12%; Read bytes: 90764288; Write bytes: 177738752; Read time: 11310; Write time: 32294
17:38:28 INFO - run-tests - Wall time: 1430s; CPU: 12%; Read bytes: 30744576; Write bytes: 9276365824; Read time: 19331; Write time: 351462
From Ubuntu64 opt:
https://tbpl.mozilla.org/php/getParsedLog.php?id=25251650&tree=Cedar&full=1
19:38:06 INFO - Total resource usage - Wall time: 1835s; CPU: 100%; Read bytes: 6754304; Write bytes: 5492326400; Read time: 1004; Write time: 3301420
19:38:06 INFO - install - Wall time: 14s; CPU: 100%; Read bytes: 0; Write bytes: 229376; Read time: 0; Write time: 76
19:38:06 INFO - run-tests - Wall time: 1821s; CPU: 100%; Read bytes: 1683456; Write bytes: 5492097024; Read time: 512; Write time: 3301344
OS X's run-tests step took 1430s and averaged only 12% CPU. Ubuntu64's took longer (1821s) yet reported 100%. That doesn't sound right!
Since the Ubuntu slave is a virtual machine (tst-linux64-ec2-382), I suspect the 100% includes CPU time consumed by other virtual machines on the same physical host. Linux reports the time this guest spends waiting for a physical CPU while the hypervisor services other virtual machines as "CPU steal." The underlying resource monitoring code does distinguish between CPU usage types (user, system, steal, etc.); we just don't report them yet.
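For reference, on Linux these per-category counters are exposed in the aggregate `cpu` line of `/proc/stat` as cumulative clock ticks, in the order user, nice, system, idle, iowait, irq, softirq, steal (followed by guest fields on newer kernels). Below is a minimal stdlib-only sketch, assuming two snapshots of that line, of splitting elapsed CPU time into categories so that steal shows up on its own. It is not the mozharness resource-monitoring code; `parse_proc_stat_cpu` and `cpu_percentages` are hypothetical names.

```python
# Field order of the aggregate "cpu" line in /proc/stat on Linux. Values are
# cumulative clock ticks since boot. Later fields ("guest", "guest_nice") are
# already counted inside "user" and "nice", so they are ignored here to avoid
# double counting.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_proc_stat_cpu(text):
    """Return cumulative tick counts from the aggregate 'cpu' line of /proc/stat."""
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            # Very old kernels report fewer fields; treat missing ones as zero.
            values += [0] * (len(FIELDS) - len(values))
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Percentage of elapsed ticks spent in each category between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * v / total for k, v in deltas.items()}
```

To sample a live machine, read `/proc/stat` twice a few seconds apart and pass both parsed results to `cpu_percentages`; a large `steal` share on an EC2 guest would support the suspicion that the 100% figure is inflated by other tenants.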
We should consider either reporting CPU steal separately or filtering it out of the output.
This is likely a temporary workaround until bug 893388 lands and we can analyze (and graph) the raw data.
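One way to report or filter steal, sketched under assumptions: given per-category percentages of elapsed time (a dict keyed by the `/proc/stat` names user, nice, system, idle, iowait, irq, softirq, steal), print steal as its own figure and also show utilization normalized to the time the guest actually had a physical CPU. The function name `format_cpu_summary` and the output format are hypothetical and do not reflect what mozharness actually logs.

```python
def format_cpu_summary(pct):
    """Render CPU usage with steal split out.

    `pct` maps /proc/stat category names to percentages of elapsed time that
    sum to roughly 100. Missing categories count as zero.
    """
    # Only time the guest spent doing work counts as busy; idle and iowait do not.
    busy = sum(pct.get(k, 0.0) for k in ("user", "nice", "system", "irq", "softirq"))
    steal = pct.get("steal", 0.0)
    available = 100.0 - steal
    # Share of the time the guest actually had a physical CPU that it spent busy.
    busy_excl_steal = 100.0 * busy / available if available > 0 else 0.0
    return "CPU: %d%%; Steal: %d%%; CPU excluding steal: %d%%" % (
        round(busy), round(steal), round(busy_excl_steal))
```

Counting iowait as idle rather than busy is itself a design choice; a different definition would shift the "CPU" figure but not the steal figure.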
Reporter
Comment 1•12 years ago
I accidentally compared a debug build with an opt build. Here are the correct values:
OS X:
19:37:52 INFO - Total resource usage - Wall time: 874s; CPU: 10%; Read bytes: 28880896; Write bytes: 10214718464; Read time: 16594; Write time: 464182
19:37:52 INFO - install - Wall time: 18s; CPU: 13%; Read bytes: 215960576; Write bytes: 302470144; Read time: 17683; Write time: 30979
19:37:52 INFO - run-tests - Wall time: 856s; CPU: 10%; Read bytes: 21164032; Write bytes: 9909498880; Read time: 14153; Write time: 430615
Ubuntu64:
19:38:06 INFO - Total resource usage - Wall time: 1835s; CPU: 100%; Read bytes: 6754304; Write bytes: 5492326400; Read time: 1004; Write time: 3301420
19:38:06 INFO - install - Wall time: 14s; CPU: 100%; Read bytes: 0; Write bytes: 229376; Read time: 0; Write time: 76
19:38:06 INFO - run-tests - Wall time: 1821s; CPU: 100%; Read bytes: 1683456; Write bytes: 5492097024; Read time: 512; Write time: 3301344
This is even worse: the Ubuntu VM runs the tests more than 2x slower (1821s vs. 856s for run-tests)! Part of the gap might be explained by I/O wait, although comparing the numbers directly across platforms isn't advisable because they are measured differently.
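Comparing runs like this by hand is easy to get wrong (as the debug/opt mix-up shows). A minimal sketch, assuming the summary lines always follow the `INFO - <phase> - Wall time: Ns; CPU: N%; ...` shape seen in these logs, that pulls per-phase metrics out of a log so the slowdown can be computed directly. The helper names are hypothetical, and because the log does not state units for the read/write time fields, they are kept as raw integers.

```python
import re

# Matches summary lines such as
#   "19:37:52 INFO - run-tests - Wall time: 856s; CPU: 10%; Read bytes: ..."
LINE_RE = re.compile(r"INFO - (?P<phase>.+?) - (?P<fields>Wall time: .*)$")


def parse_resource_line(line):
    """Return (phase, metrics) for one resource-usage summary line, or None."""
    m = LINE_RE.search(line.strip())
    if not m:
        return None
    metrics = {}
    for item in m.group("fields").split(";"):
        key, _, value = item.strip().partition(": ")
        # Drop the unit suffix ("s" for wall time, "%" for CPU) before converting.
        metrics[key] = int(value.rstrip("s%"))
    return m.group("phase"), metrics


def phase_wall_times(log_text):
    """Map phase name -> wall time in seconds for every summary line found."""
    times = {}
    for line in log_text.splitlines():
        parsed = parse_resource_line(line)
        if parsed:
            phase, metrics = parsed
            times[phase] = metrics["Wall time"]
    return times
```

Fed the two run-tests lines quoted in this comment, `ubuntu["run-tests"] / osx["run-tests"]` is 1821 / 856, roughly 2.13.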
Assignee
Updated•12 years ago
Product: mozilla.org → Release Engineering
Comment 2•8 years ago
We're not going to investigate this for buildbot AWS instances.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Assignee
Updated•7 years ago
Component: General Automation → General