Closed Bug 1164055 Opened 9 years ago Closed 9 years ago

don't reclaim tasks so often in buildbot bridge

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: bhearsum)

References

Details

(Whiteboard: [bbb])

Attachments

(1 file)

Probably once every 5min or so is good.
The formal semantics are that you must reclaim before takenUntil expires.
takenUntil is a timestamp. To allow for a little clock drift, network delays,
it's often a good idea to reclaim ~5min before takenUntil expires.
This is very different from "every 5 min or so".

In particular when implemented with these semantics we can increase the duration of takenUntil
server-side, if we want to reduce the overhead of reclaimTask calls.

Hence use the condition: (takenUntil < now() - 5min)
If your loop is particularly slow, you can use 10 min instead.

See code sample, as suggested in review:
> + if int(t.takenUntil) < (arrow.utcnow().timestamp - 5 * 60):
>     try:
>       result = self.tc_queue.reclaimTask(t.taskId, int(t.runId))
>       ...

Insert it right here:
https://github.com/mozilla/buildbot-bridge/blob/62d122fff50eb4513fb89db89364c4dba1788b9d/bbb/services.py#L227-L228
Thanks for the clarifications around using 5min before expiry vs. every 5 min. There's three instances of the reflector running, so I _think_ we'll be able to keep up. We can keep an eye on this as we start moving schedulers over and adjust it if needed.

I gave this a test and the log output has it working correctly. We have the original claim (done by the BuildbotListener):
2015-05-13 08:52:04,674 - bbb.services - Claiming fHD3e-RnSze9JOZoTSwxIg

Which gets:
  "takenUntil": "2015-05-13T16:12:04.930Z"

We get a bunch of "in progress" messages from the reflector:
2015-05-13 08:52:21,976 - bbb.services - BuildRequest 67 is in progress
2015-05-13 08:53:22,037 - bbb.services - BuildRequest 67 is in progress
2015-05-13 08:54:22,088 - bbb.services - BuildRequest 67 is in progress
...
2015-05-13 09:06:22,844 - bbb.services - BuildRequest 67 is in progress

And eventually:
2015-05-13 09:07:22,910 - bbb.services - BuildRequest 67 is in progress
2015-05-13 09:07:22,910 - bbb.services - Claim for BuildRequest 67 is more than 5min old, reclaiming

Which gets:
  "takenUntil": "2015-05-13T16:27:23.206Z"

...and the cycle repeats.
Attachment #8605331 - Flags: review?(jopsen)
Comment on attachment 8605331 [details] [diff] [review]
slower-reclaim-bbb.diff

Review of attachment 8605331 [details] [diff] [review]:
-----------------------------------------------------------------

So, this line looks very sketchy to me:
> if arrow.now() > arrow.get(t.takenUntil).replace(minutes=-5):
But checking the docs .replace(minutes=-5) is relative.
While .replace(minute=5) is not...

So definitely works... it just looks very wrong, he he :)
(ie. if it replace minute with 5 we would have some very weird behaviour; for the record it does make a relative offset as we want)
Attachment #8605331 - Flags: review?(jopsen) → review+
Comment on attachment 8605331 [details] [diff] [review]
slower-reclaim-bbb.diff

I pushed this and bumped the version number. I'll get it rolled out soon, hopefully.
Attachment #8605331 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [bbb]
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: