Closed Bug 1245823 Opened 9 years ago Closed 9 years ago

Speedup runtime of granular pending builds nagios check

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Product:

Component:

Type:

task

Priority:

Not set

Severity:

normal

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: Callek, Assigned: aselagea)

References

Details

Attachments

(5 files, 5 obsolete files)

new_check_pending_builds.py 9 years ago Alin Selagea [:aselagea] 4.72 KB, text/plain		Details
check_pending_builds.py 9 years ago Alin Selagea [:aselagea] 4.64 KB, text/plain		Details
bug_1245823.patch 9 years ago Alin Selagea [:aselagea] 4.31 KB, patch	Callek : feedback+	Details \| Diff \| Splinter Review
builds-pending.json 9 years ago Alin Selagea [:aselagea] 512.95 KB, text/plain		Details
results_new.txt 9 years ago Alin Selagea [:aselagea] 1.57 KB, text/plain		Details
results_old.txt 9 years ago Alin Selagea [:aselagea] 1.62 KB, text/plain		Details
bug_1245823.patch 9 years ago Alin Selagea [:aselagea] 4.29 KB, patch	Callek : review+	Details \| Diff \| Splinter Review
check_pending_builds.py 9 years ago Alin Selagea [:aselagea] 4.59 KB, text/plain		Details
bug_1245823.patch 9 years ago Alin Selagea [:aselagea] 4.25 KB, patch	aselagea : checked-in+	Details \| Diff \| Splinter Review
check_pending_builds.py 9 years ago Alin Selagea [:aselagea] 4.59 KB, text/plain		Details

Justin Wood (:Callek)

Reporter

Description

•

9 years ago

Over in Bug 1204970] we made check pending builds use allthethings.json and parse that for actual pool depth/age. However over the past 24 hours or so its been flapping as CRITICAL due to nagios timeouts in trying to run it. When we first created the new check, amy bumped the timeout to 60s, but bumping it more than that isn't as useful. There are two main options here: * Speed up the overall runtime of the check * Make the script run in a cron and emit to a static file the nagios check queries. I'm partial to trying for speeding up first.

Alin Selagea [:aselagea]

Assignee

Comment 1

•

9 years ago

Attached file new_check_pending_builds.py (obsolete) — Details

I uploaded a new version of the pending count check script. There are two major things that have been changed: - cached the list of builders by counting how many times they repeat: in this way, we will avoid comparing each builder name with its correspondent from the allthethings.json file, thus saving a great amount of time - use the local allthethings.json file (which is also computed on cruncher) instead of its secure.pub.m.o link These combined can greatly reduce the time required to run the script.

Attachment #8717015 - Flags: feedback?(bugspam.Callek)

Justin Wood (:Callek)

Reporter

Comment 2

•

9 years ago

Comment on attachment 8717015 [details] new_check_pending_builds.py giving feedback would be much easier for me in terms of a diff rather than a flat file. Also I'm interested in the timing info from before and after the change, e.g. stamp_start=$(date --utc +%s); for i in {1..20}; do sleep 1; done; stamp_end=$(date --utc +%s); echo "$(((stamp_end - stamp_start) / 20))" where you replace "sleep 1" with the call to execute this script, and it outputs the average time [in seconds] to execute. And you run it again with the change (you can s/20/10/g in there if you are concerned about waiting too long)

Attachment #8717015 - Flags: feedback?(bugspam.Callek)

Alin Selagea [:aselagea]

Assignee

Comment 3

•

9 years ago

Since at the moment we only have a small number of pending jobs (under 20), I opted doing some local tests using a previously saved "builds-pending.json" file, containing 4530 pending jobs and a size of 512KB. Here are the two cases: I. new version of the script: "builds-pending.json" --> local file, "allthethings.json" --> also a local file (the new script uses the latest version of "allthethings.json" file from cruncher, so it's pretty much a similar situation) II. old version of the script: "builds-pending.json" --> local file, "allthethings.json" --> specified by secure.pub.m.o link As it can be noticed, there's a significant difference between the two scripts in terms of running speed. Attached the patch that contains the changes with respect to the version that can be found on braindump and the "builds-json" file used for tests.

Alin Selagea [:aselagea]

Assignee

Comment 4

•

9 years ago

Attached file check_pending_builds.py (obsolete) — Details

Attachment #8717015 - Attachment is obsolete: true

Alin Selagea [:aselagea]

Assignee

Comment 5

•

9 years ago

Attached patch bug_1245823.patch (obsolete) — Details — Splinter Review

Attachment #8717383 - Flags: feedback?(bugspam.Callek)

Alin Selagea [:aselagea]

Assignee

Comment 6

•

9 years ago

Attached file builds-pending.json — Details

Alin Selagea [:aselagea]

Assignee

Comment 7

•

9 years ago

Attached file results_new.txt — Details

Alin Selagea [:aselagea]

Assignee

Comment 8

•

9 years ago

Attached file results_old.txt — Details

Alin Selagea [:aselagea]

Assignee

Updated

•

9 years ago

Assignee: nobody → alin.selagea

Justin Wood (:Callek)

Reporter

Comment 9

•

9 years ago

Comment on attachment 8717383 [details] [diff] [review] bug_1245823.patch Review of attachment 8717383 [details] [diff] [review]: ----------------------------------------------------------------- ::: nagios-related/check_pending_builds.py @@ +28,5 @@ > + if k in jobpool_cache: > + pending_builds[k] += 1 > + else: > + pending_builds[k] = 1 > + jobpool_cache.append(k) jobpool_cache is unused here, can save some processing time by removing it, and instead doing: if k in pending_builds: @@ +36,5 @@ > # get the builder names and their corresponding slavepools > def get_builders_and_slavepools(): > builders_and_slavepools = [] > + with open("/var/www/html/builds/allthethings.json") as response: > + result = json.loads(response.read()) using `response` here clobbers the global import, and is confusing. `as f:` is a common format. And since you're changing to a file object instead of response's library, you can switch to: `result = json.load(f)` since .load() uses a file-object and avoid the resident memory useage of .read()-->json-object @@ +78,4 @@ > count_by_slavepool = [] > for m in range(0, len(slavepools)): > count_by_slavepool.append([slavepools[m][0], 0]) > + for i,j in pending_builds.items(): use .iteritems() here (its better on mem usage) @@ +84,3 @@ > for m in range(0, len(count_by_slavepool)): > + if builders_and_slavepools[k][1] == slavepools[m][0]: > + count_by_slavepool[m][1] = count_by_slavepool[m][1] + j perhaps `count_by_slavepool[m][1]` += j would read better? (it's semantically the same, and has no perf benefit this new way)

Attachment #8717383 - Flags: feedback?(bugspam.Callek) → feedback+

Alin Selagea [:aselagea]

Assignee

Comment 10

•

9 years ago

Attached patch bug_1245823.patch (obsolete) — Details — Splinter Review

Attachment #8717383 - Attachment is obsolete: true

Attachment #8718771 - Flags: review?(bugspam.Callek)

Alin Selagea [:aselagea]

Assignee

Comment 11

•

9 years ago

Attached file check_pending_builds.py (obsolete) — Details

Attachment #8717381 - Attachment is obsolete: true

Alin Selagea [:aselagea]

Assignee

Comment 12

•

9 years ago

Thanks for your feedback. I did the suggested changes, tested the script again (both locally and on cruncher) and worked without issues. Here's the new version of the patch and the new version of the script. If everything is OK, should we land the patch and then replace the script on cruncher?

Justin Wood (:Callek)

Reporter

Comment 13

•

9 years ago

(In reply to Alin Selagea [:aselagea][:buildduty] from comment #12) > Thanks for your feedback. > > I did the suggested changes, tested the script again (both locally and on > cruncher) and worked without issues. Here's the new version of the patch and > the new version of the script. Reviewing now ;-) > If everything is OK, should we land the patch and then replace the script on > cruncher? Yes.

Justin Wood (:Callek)

Reporter

Comment 14

•

9 years ago

Comment on attachment 8718771 [details] [diff] [review] bug_1245823.patch Review of attachment 8718771 [details] [diff] [review]: ----------------------------------------------------------------- r+ with the following changes. ::: nagios-related/check_pending_builds.py @@ +27,5 @@ > + k = request['buildername'] > + if k in pending_builds.keys(): > + pending_builds[k] += 1 > + else: > + pending_builds[k] = 1 trailing whitespace @@ +28,5 @@ > + if k in pending_builds.keys(): > + pending_builds[k] += 1 > + else: > + pending_builds[k] = 1 > + pending_builds.update() .update() does nothing, drop it ;-) @@ +35,5 @@ > > # get the builder names and their corresponding slavepools > def get_builders_and_slavepools(): > builders_and_slavepools = [] > + with open("/var/www/html/builds/allthethings.json") as f: Hrm, actually lets make this a global variable like the all_things_url was, that will make it easier to discover if this path ever changes.

Attachment #8718771 - Flags: review?(bugspam.Callek) → review+

Alin Selagea [:aselagea]

Assignee

Comment 15

•

9 years ago

Attached patch bug_1245823.patch — Details — Splinter Review

Okay, updated the patch.

Attachment #8718771 - Attachment is obsolete: true

Alin Selagea [:aselagea]

Assignee

Comment 16

•

9 years ago

Attached file check_pending_builds.py — Details

Attachment #8718772 - Attachment is obsolete: true

Alin Selagea [:aselagea]

Assignee

Updated

•

9 years ago

Attachment #8718857 - Flags: checked-in+

Alin Selagea [:aselagea]

Assignee

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Updated

•

7 years ago

Product: Release Engineering → Infrastructure & Operations

Updated

•

5 years ago

Product: Infrastructure & Operations → Infrastructure & Operations Graveyard

You need to log in before you can comment on or make changes to this bug.