Closed Bug 1149703 Opened 7 years ago Closed 2 years ago

Mozharness needs to reclaimTask while uploading

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: mshal, Unassigned)

Attachments

(1 file)

If taskcluster uploads take a long time in mozharness, it's possible we burn through the claim time (takenUntil) of our claimed task. We'll need to periodically issue reclaimTask() calls until the task deadline (currently 1 hour) is hit. Effectively this might mean a separate thread that does reclaimTask() until reclaimTask returns a failure response, indicating the task expired. Something like:

response = claimTask()
while True:
    sleep(response.takenUntil - now() - 5 min)
    response = reclaimTask()
    if bad response:
        raise an exception / break out of the loop
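
As a rough illustration of the threaded approach above, here is a minimal Python sketch. It is not mozharness code: `start_reclaimer`, `ReclaimError`, and the `reclaim` callable are hypothetical names, and `reclaim` is assumed to return the new takenUntil as a timezone-aware datetime or to raise once the task has expired.

```python
import threading
from datetime import datetime, timedelta, timezone


class ReclaimError(Exception):
    """Hypothetical error raised by `reclaim` when the claim cannot be renewed."""


def start_reclaimer(reclaim, taken_until, margin=timedelta(minutes=5),
                    min_delay=1.0, now=None):
    """Renew a claim in a daemon thread until `reclaim` fails or `stop` is set.

    reclaim:     hypothetical zero-argument callable that returns the new
                 takenUntil (timezone-aware datetime) or raises ReclaimError.
    taken_until: takenUntil from the initial claim.
    margin:      how long before takenUntil the reclaim is attempted.
    min_delay:   lower bound in seconds between attempts, to avoid spinning
                 when the returned takenUntil is closer than `margin`.
    """
    now = now or (lambda: datetime.now(timezone.utc))
    stop = threading.Event()
    state = {"taken_until": taken_until, "error": None}

    def loop():
        while not stop.is_set():
            delay = (state["taken_until"] - now() - margin).total_seconds()
            # Event.wait returns True if stop was set while we were waiting.
            if stop.wait(max(delay, min_delay)):
                return
            try:
                state["taken_until"] = reclaim()
            except ReclaimError as exc:
                # Bad response: record it and break out of the loop.
                state["error"] = exc
                return

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return stop, thread, state
```

The uploading code would call `stop.set()` once all files are uploaded and check `state["error"]` to see whether a reclaim failed along the way.
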

catlee suggested something simpler, so we don't have to deal with an extra thread for this. We can just reclaim in between uploads if the elapsed time is beyond a threshold:

for upload_file in files:
    createArtifact()
    if elapsed_time > threshold:
        reclaimTask()

The threshold is there so we don't have to issue a reclaim in between lots of small uploaded files. Each individual file would then have a larger timeout, rather than a single claim period for all files.
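
The simpler approach could look roughly like the sketch below. The names `upload_with_reclaim`, `create_artifact`, `reclaim_task`, and `threshold_s` are placeholders for illustration rather than existing mozharness or Taskcluster functions, and the sketch assumes the elapsed time is measured from the most recent claim or reclaim.

```python
import time


def upload_with_reclaim(files, create_artifact, reclaim_task, threshold_s,
                        clock=time.monotonic):
    """Upload each file, reclaiming the task between uploads when needed.

    create_artifact: hypothetical callable that uploads one file.
    reclaim_task:    hypothetical zero-argument callable that renews the claim.
    threshold_s:     seconds since the last (re)claim after which we reclaim.
    Returns the number of reclaims issued.
    """
    last_claim = clock()
    reclaims = 0
    for upload_file in files:
        create_artifact(upload_file)
        # Skip the reclaim while we are still well inside the claim period,
        # so many small files do not each trigger a request.
        if clock() - last_claim > threshold_s:
            reclaim_task()
            last_claim = clock()  # assumption: the timer restarts on reclaim
            reclaims += 1
    return reclaims
```

Passing the clock in as a parameter is only there so the behaviour can be checked without real waiting; a caller would normally rely on the default.
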

The work in bug 1124303 is aimed at improving the networks, and we would prefer not to go this route. However, as a backup, I took a look at this yesterday.

It's hard to test in staging. Ended up doing a try run.

This try run at least serves as a negative test, showing that I'm not making things worse: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e346c780909

For a positive test, we will need the claim to time out. I suppose I can add an 18-minute sleep before uploading files so that we go past the 20-minute expiration.

Will do another try run with the above-mentioned sleep.

FYI: I am on PTO until May 13th.

The above patch, https://bugzilla.mozilla.org/show_bug.cgi?id=1149703#c2, still needs a positive test by inserting a fake sleep in a try run.

Also, this is a last resort, as the submission should take only a few minutes and not exceed the default 20 minutes.

The uploading logic may be changing slightly to support l10n artifacts. Feel free to ping me when you're back in case I've made any progress on that front :)

I suspect things may have changed in the past five years.
reclaimTask now runs in the worker. We shouldn't need to add anything to mozharness.

Resolving; please reopen with a comment why if my read isn't correct.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → INVALID