Closed
Bug 1245823
Opened 9 years ago
Closed 9 years ago
Speedup runtime of granular pending builds nagios check
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: aselagea)
References
Details
Attachments
(5 files, 5 obsolete files)
512.95 KB,
text/plain
|
Details | |
1.57 KB,
text/plain
|
Details | |
1.62 KB,
text/plain
|
Details | |
4.25 KB,
patch
|
aselagea
:
checked-in+
|
Details | Diff | Splinter Review |
4.59 KB,
text/plain
|
Details |
Over in Bug 1204970] we made check pending builds use allthethings.json and parse that for actual pool depth/age.
However over the past 24 hours or so its been flapping as CRITICAL due to nagios timeouts in trying to run it.
When we first created the new check, amy bumped the timeout to 60s, but bumping it more than that isn't as useful.
There are two main options here:
* Speed up the overall runtime of the check
* Make the script run in a cron and emit to a static file the nagios check queries.
I'm partial to trying for speeding up first.
Assignee | ||
Comment 1•9 years ago
|
||
I uploaded a new version of the pending count check script. There are two major things that have been changed:
- cached the list of builders by counting how many times they repeat: in this way, we will avoid comparing each builder name with its correspondent from the allthethings.json file, thus saving a great amount of time
- use the local allthethings.json file (which is also computed on cruncher) instead of its secure.pub.m.o link
These combined can greatly reduce the time required to run the script.
Attachment #8717015 -
Flags: feedback?(bugspam.Callek)
Reporter | ||
Comment 2•9 years ago
|
||
Comment on attachment 8717015 [details]
new_check_pending_builds.py
giving feedback would be much easier for me in terms of a diff rather than a flat file.
Also I'm interested in the timing info from before and after the change, e.g.
stamp_start=$(date --utc +%s); for i in {1..20}; do sleep 1; done; stamp_end=$(date --utc +%s); echo "$(((stamp_end - stamp_start) / 20))"
where you replace "sleep 1" with the call to execute this script, and it outputs the average time [in seconds] to execute.
And you run it again with the change (you can s/20/10/g in there if you are concerned about waiting too long)
Attachment #8717015 -
Flags: feedback?(bugspam.Callek)
Assignee | ||
Comment 3•9 years ago
|
||
Since at the moment we only have a small number of pending jobs (under 20), I opted doing some local tests using a previously saved "builds-pending.json" file, containing 4530 pending jobs and a size of 512KB. Here are the two cases:
I. new version of the script: "builds-pending.json" --> local file, "allthethings.json" --> also a local file (the new script uses the latest version of "allthethings.json" file from cruncher, so it's pretty much a similar situation)
II. old version of the script: "builds-pending.json" --> local file, "allthethings.json" --> specified by secure.pub.m.o link
As it can be noticed, there's a significant difference between the two scripts in terms of running speed. Attached the patch that contains the changes with respect to the version that can be found on braindump and the "builds-json" file used for tests.
Assignee | ||
Comment 4•9 years ago
|
||
Attachment #8717015 -
Attachment is obsolete: true
Assignee | ||
Comment 5•9 years ago
|
||
Attachment #8717383 -
Flags: feedback?(bugspam.Callek)
Assignee | ||
Comment 6•9 years ago
|
||
Assignee | ||
Comment 7•9 years ago
|
||
Assignee | ||
Comment 8•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → alin.selagea
Reporter | ||
Comment 9•9 years ago
|
||
Comment on attachment 8717383 [details] [diff] [review]
bug_1245823.patch
Review of attachment 8717383 [details] [diff] [review]:
-----------------------------------------------------------------
::: nagios-related/check_pending_builds.py
@@ +28,5 @@
> + if k in jobpool_cache:
> + pending_builds[k] += 1
> + else:
> + pending_builds[k] = 1
> + jobpool_cache.append(k)
jobpool_cache is unused here, can save some processing time by removing it, and instead doing:
if k in pending_builds:
@@ +36,5 @@
> # get the builder names and their corresponding slavepools
> def get_builders_and_slavepools():
> builders_and_slavepools = []
> + with open("/var/www/html/builds/allthethings.json") as response:
> + result = json.loads(response.read())
using `response` here clobbers the global import, and is confusing.
`as f:` is a common format.
And since you're changing to a file object instead of response's library, you can switch to:
`result = json.load(f)` since .load() uses a file-object and avoid the resident memory useage of .read()-->json-object
@@ +78,4 @@
> count_by_slavepool = []
> for m in range(0, len(slavepools)):
> count_by_slavepool.append([slavepools[m][0], 0])
> + for i,j in pending_builds.items():
use .iteritems() here (its better on mem usage)
@@ +84,3 @@
> for m in range(0, len(count_by_slavepool)):
> + if builders_and_slavepools[k][1] == slavepools[m][0]:
> + count_by_slavepool[m][1] = count_by_slavepool[m][1] + j
perhaps `count_by_slavepool[m][1]` += j would read better?
(it's semantically the same, and has no perf benefit this new way)
Attachment #8717383 -
Flags: feedback?(bugspam.Callek) → feedback+
Assignee | ||
Comment 10•9 years ago
|
||
Attachment #8717383 -
Attachment is obsolete: true
Attachment #8718771 -
Flags: review?(bugspam.Callek)
Assignee | ||
Comment 11•9 years ago
|
||
Attachment #8717381 -
Attachment is obsolete: true
Assignee | ||
Comment 12•9 years ago
|
||
Thanks for your feedback.
I did the suggested changes, tested the script again (both locally and on cruncher) and worked without issues. Here's the new version of the patch and the new version of the script.
If everything is OK, should we land the patch and then replace the script on cruncher?
Reporter | ||
Comment 13•9 years ago
|
||
(In reply to Alin Selagea [:aselagea][:buildduty] from comment #12)
> Thanks for your feedback.
>
> I did the suggested changes, tested the script again (both locally and on
> cruncher) and worked without issues. Here's the new version of the patch and
> the new version of the script.
Reviewing now ;-)
> If everything is OK, should we land the patch and then replace the script on
> cruncher?
Yes.
Reporter | ||
Comment 14•9 years ago
|
||
Comment on attachment 8718771 [details] [diff] [review]
bug_1245823.patch
Review of attachment 8718771 [details] [diff] [review]:
-----------------------------------------------------------------
r+ with the following changes.
::: nagios-related/check_pending_builds.py
@@ +27,5 @@
> + k = request['buildername']
> + if k in pending_builds.keys():
> + pending_builds[k] += 1
> + else:
> + pending_builds[k] = 1
trailing whitespace
@@ +28,5 @@
> + if k in pending_builds.keys():
> + pending_builds[k] += 1
> + else:
> + pending_builds[k] = 1
> + pending_builds.update()
.update() does nothing, drop it ;-)
@@ +35,5 @@
>
> # get the builder names and their corresponding slavepools
> def get_builders_and_slavepools():
> builders_and_slavepools = []
> + with open("/var/www/html/builds/allthethings.json") as f:
Hrm, actually lets make this a global variable like the all_things_url was, that will make it easier to discover if this path ever changes.
Attachment #8718771 -
Flags: review?(bugspam.Callek) → review+
Assignee | ||
Comment 15•9 years ago
|
||
Okay, updated the patch.
Attachment #8718771 -
Attachment is obsolete: true
Assignee | ||
Comment 16•9 years ago
|
||
Attachment #8718772 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8718857 -
Flags: checked-in+
Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•