Closed
Bug 975466
Opened 9 years ago
Closed 2 years ago
[meta] Spidermonkey builder resource load
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
INACTIVE
People
(Reporter: sfink, Assigned: sfink)
References
(Depends on 1 open bug)
Details
(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2199] )
Attachments
(1 file)
5.97 KB, text/plain
I thought I'd open up a meta bug for spidermonkey builders, to track changes that influence resource utilization. I'm also going to take a stab at measuring the current load, and will record the findings here.
Assignee
Comment 1•9 years ago
After massaging the data a bit (throwing out everything older than 90 days, categorizing builders, etc.), I found that there are a number of jobs that took a very long time (weeks, months). So I ignored anything that took longer than 2 hours. (That discards only 0.17% of the spidermonkey builds, but 15% of the other build jobs. If I leave them in, spidermonkey takes a much smaller percentage of the time, but I'm suspicious of that.)

select b.spidermonkey, sum(r.run_time)
from requests r
join buildrequests br on r.id = br.id
join builders b on br.buildername = b.buildername
where r.run_time < 7200 and r.type = 1
group by b.spidermonkey;

+--------------+-----------------+
| spidermonkey | sum(r.run_time) |
+--------------+-----------------+
|            0 |       965328186 |
|            1 |        18495002 |
+--------------+-----------------+

r.type=1 restricts the comparison to other build jobs only. So spidermonkey is 1.88% of the time spent on build jobs that take less than 2 hours. If you don't like the artificial 2-hour threshold, removing it means spidermonkey is 1.27% of the build load and 0.33% of the overall load.
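A single query could compute that percentage directly; this is a minimal sketch assuming the same requests/buildrequests/builders layout as the query above, not a query that was actually run against the database:

select 100.0 * sum(case when b.spidermonkey = 1 then r.run_time else 0 end)
             / sum(r.run_time) as spidermonkey_pct
from requests r
join buildrequests br on r.id = br.id
join builders b on br.buildername = b.buildername
where r.run_time < 7200 and r.type = 1;   -- build jobs under 2 hours

-- check against the table above: 100.0 * 18495002 / (965328186 + 18495002) ≈ 1.88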
Assignee
Comment 2•9 years ago
Unrelated, but here's a dump of the run time of the various job types:

select r.type, sum(r.run_time)
from requests r
join buildrequests br on r.id = br.id
join builders b on br.buildername = b.buildername
where r.run_time < 7200
group by r.type;

+----------+-----------------+
| type     | sum(r.run_time) |
+----------+-----------------+
| build    |       983823188 |
| test     |      3579278383 |
| talos    |       360559291 |
| valgrind |        14565336 |
| misc     |         3884796 |
| fuzzer   |        86438800 |
+----------+-----------------+

or in machine-years (this is still for the last 90 days):

+------+------------------------------+
| type | sum(r.run_time)/60/60/24/365 |
+------+------------------------------+
|    1 |                         31.2 |
|    2 |                        113.5 |
|    3 |                         11.4 |
|    4 |                          0.5 |
|    5 |                          0.1 |
|    6 |                          2.7 |
+------+------------------------------+

(types are in the same order as the previous table).
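The numeric type codes in the second table could be labeled directly in SQL; a minimal sketch assuming the codes map to the names in the order listed above:

select case r.type
         when 1 then 'build'
         when 2 then 'test'
         when 3 then 'talos'
         when 4 then 'valgrind'
         when 5 then 'misc'
         when 6 then 'fuzzer'
       end as type_name,
       sum(r.run_time) / 60 / 60 / 24 / 365 as machine_years
from requests r
join buildrequests br on r.id = br.id
join builders b on br.buildername = b.buildername
where r.run_time < 7200
group by r.type;

-- sanity check: build = 983823188 s / 31536000 s per machine-year ≈ 31.2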
Assignee
Comment 3•9 years ago
I recorded most of what I did, if you'd like to spot where I screwed up.
Assignee
Comment 4•9 years ago
Oh, right. Note that I am relying on buildsets.submitted_at rather than chasing through to the changes.when_timestamp. Hopefully that doesn't introduce too much inaccuracy? I had trouble getting that query to ever finish. I probably should have modified get_build_times_for_builders.py instead of doing everything in raw sql.
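For reference, a minimal sketch of the kind of wait-time calculation submitted_at would feed into; the join columns (buildrequests.buildsetid, builds.brid) are assumptions about the buildbot scheduler schema, not taken from this bug:

-- wait time: how long a request sat queued between submission and build start
select br.id as request_id,
       bl.start_time - bs.submitted_at as wait_time_seconds
from buildrequests br
join buildsets bs on br.buildsetid = bs.id   -- assumed join column
join builds bl on bl.brid = br.id;           -- assumed join column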
Comment 5•9 years ago
Steve, thank you for doing this. The data you have here is probably enough - it's nice to know that Spidermonkey is less than 2% of our load.

(In reply to Steve Fink [:sfink] from comment #1)
> After massaging the data a bit (throwing out everything older than 90 days,
> categorizing builders, etc.), I found that there are a number of jobs that
> took a very long time (weeks, months). So I ignored anything that took
> longer than 2 hours.

You probably want to throw out things that take longer than 6 hours. We have plenty of jobs that take more than 2, and I know of some that take over 4. 6 hours should get you all the normal jobs, and still throw away the crazy ones (which are almost always not real).

(In reply to Steve Fink [:sfink] from comment #4)
> Oh, right. Note that I am relying on buildsets.submitted_at rather than
> chasing through to the changes.when_timestamp. Hopefully that doesn't
> introduce too much inaccuracy? I had trouble getting that query to ever
> finish. I probably should have modified get_build_times_for_builders.py
> instead of doing everything in raw sql.

I actually just realized that you probably don't want either of these. They're useful for measuring turnaround times, as they represent push time, but if you just care about job run times you should use builds.finish_time-builds.start_time. It's tricky though, because at some point you have to account for coalesced builds due to peak load. My script (https://hg.mozilla.org/build/braindump/file/c10104f5f52c/buildbot-related/get_build_times_for_builders.py) knows how to do that.
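A minimal sketch of that run-time measurement (finish_time minus start_time, bucketed by the spidermonkey flag), assuming a builds table with start_time/finish_time and a brid column linking it to buildrequests; it does not account for coalesced builds, which is what the linked script handles:

select b.spidermonkey,
       sum(bl.finish_time - bl.start_time) as run_time_seconds
from builds bl
join buildrequests br on bl.brid = br.id            -- assumed join column
join builders b on br.buildername = b.buildername
where bl.finish_time - bl.start_time < 21600        -- 6-hour cutoff suggested above
group by b.spidermonkey;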
Assignee
Comment 6•9 years ago
(In reply to Ben Hearsum [:bhearsum] from comment #5)
> (In reply to Steve Fink [:sfink] from comment #4)
> > Oh, right. Note that I am relying on buildsets.submitted_at rather than
> > chasing through to the changes.when_timestamp. Hopefully that doesn't
> > introduce too much inaccuracy? I had trouble getting that query to ever
> > finish. I probably should have modified get_build_times_for_builders.py
> > instead of doing everything in raw sql.
>
> I actually just realized that you probably don't want either of these.
> They're useful for measuring turnaround times, as they represent push time,
> but if you just care about job run times you should use
> builds.finish_time-builds.start_time. It's tricky though, because at some
> point you have to account for coalesced builds due to peak load. My script
> (https://hg.mozilla.org/build/braindump/file/c10104f5f52c/buildbot-related/
> get_build_times_for_builders.py) knows how to do that.

Whoops, sorry. Although I do use buildsets.submitted_at, that's only for computing wait_time, and I'm not using it for the above data. So those results are already using what you suggest, finish_time-start_time. But without handling coalescing.

Here are the results for build-only jobs taking less than 6 hours:

+--------------+-----------------+
| spidermonkey | sum(r.run_time) |
+--------------+-----------------+
|            0 |      1425141881 |
|            1 |        18596579 |
+--------------+-----------------+

which comes to 1.29%.

(...and in related news, it looks like I've accidentally added several spidermonkey try jobs recently that I need to kill off...)
Updated•9 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2184]
Updated•9 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2184] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2195]
Updated•9 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2195] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2199]
Assignee
Comment 8•6 years ago
(In reply to Chris AtLee [:catlee] from comment #7)
> Steve, is there anything left to do here?

No. This is a meta bug. I've been using it to gather bugs that add more spidermonkey builds (or change their cost substantially), just from a vague sense of wanting to keep a handle on things and to notify you people when I'm running up your AWS bill. But the actual content of this bug comes from buildbot-only days, and hopefully there are now better ways to estimate the load imposed by a set of jobs (garndt has something for that, I believe?). So if this is being tracked in some other way now, then there is indeed no reason for this bug to exist.
Flags: needinfo?(sphink)
Updated•5 years ago
Assignee: nobody → sphink
Updated•5 years ago
Component: General Automation → General
Assignee
Comment 9•2 years ago
This bug isn't really being used anymore.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INACTIVE