Mozharness needs to reclaimTask while uploading

NEW
Unassigned

Status

Release Engineering
Applications: MozharnessCore
3 years ago

People

(Reporter: mshal, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
If taskcluster uploads take a long time in mozharness, we can burn through the claim time (takenUntil) of our claimed task. We'll need to periodically issue reclaimTask() calls until the task deadline (currently 1 hour) is hit. In practice this might mean a separate thread that calls reclaimTask() until it returns a failure response, indicating the task expired. Something like:

response = claimTask()
while True:
    sleep(response.takenUntil - now() - 5 min)
    response = reclaimTask()
    if bad response:
        raise an exception /  break out of the loop
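
A slightly more concrete sketch of the loop above, meant to run in a background thread. The names here are assumptions for illustration only: reclaim_task is a hypothetical callable that wraps the real reclaim call and returns the new takenUntil as a timezone-aware datetime, and ClaimExpired is a hypothetical exception it raises on a failure response. The 5 minute margin comes from the pseudocode above.

```python
import threading
from datetime import datetime, timedelta, timezone


class ClaimExpired(Exception):
    """Hypothetical error raised by reclaim_task on a failure response."""


def seconds_until_reclaim(taken_until, now, margin=timedelta(minutes=5)):
    """Return how long to wait before reclaiming; never negative."""
    return max((taken_until - now - margin).total_seconds(), 0.0)


def reclaim_loop(reclaim_task, taken_until, stop_event,
                 margin=timedelta(minutes=5)):
    """Reclaim shortly before each takenUntil until asked to stop.

    reclaim_task is a hypothetical callable returning the new takenUntil.
    If it raises ClaimExpired, the exception propagates and ends the loop.
    Returns the number of successful reclaims.
    """
    reclaims = 0
    while True:
        now = datetime.now(timezone.utc)
        wait = seconds_until_reclaim(taken_until, now, margin)
        # Event.wait returns True if a stop was requested during the wait.
        if stop_event.wait(wait):
            return reclaims
        taken_until = reclaim_task()
        reclaims += 1
```

The uploader would start this with threading.Thread(target=reclaim_loop, args=(...)) and set stop_event once the last upload finishes. Note that an exception raised inside the thread does not reach the main thread on its own, so the upload code would still need some way to notice an expired claim.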
(Reporter)

Comment 1

3 years ago
catlee suggested something simpler, so we don't have to deal with an extra thread for this. We can just reclaim in between uploads if the elapsed time is beyond a threshold:

for upload_file in files:
    createArtifact()
    if elapsed_time > threshold:
        reclaimTask()

The threshold is there so we don't have to issue a reclaim in between lots of small uploaded files. Each individual file would then have a larger timeout, rather than a single claim period for all files.
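
A minimal sketch of that single-threaded approach, under some assumptions: create_artifact and reclaim_task are hypothetical callables standing in for the real upload and reclaim calls, the 600 second default threshold is an arbitrary placeholder, and the elapsed time is measured from the most recent claim or reclaim (the pseudocode above does not say when the timer resets).

```python
import time


def upload_with_reclaim(files, create_artifact, reclaim_task,
                        threshold_seconds=600, clock=time.monotonic):
    """Upload each file, reclaiming the task once enough time has elapsed.

    create_artifact and reclaim_task are hypothetical callables.
    Returns the number of reclaims issued.
    """
    reclaims = 0
    last_claim = clock()
    for upload_file in files:
        create_artifact(upload_file)
        # Skip the reclaim after quick uploads; only reclaim once the
        # time since the last claim exceeds the threshold.
        if clock() - last_claim > threshold_seconds:
            reclaim_task()
            last_claim = clock()  # measure the next interval from here
            reclaims += 1
    return reclaims
```

The clock parameter is only there so the function can be exercised with a fake clock. One limitation of this approach is that a single upload that outlasts the whole claim period still expires the claim, because the reclaim only happens between files.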

Comment 2

3 years ago
Created attachment 8592621
150414_taskcluster_reclaim_task-mh.patch

Work in bug 1124303 is focused on improving the networks, and we would prefer not to go this route. However, as a backup, I took a look at this yesterday.

It's hard to test in staging, so I ended up doing a try run.

As a negative test, this try run at least suggests I'm not making things worse: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0e346c780909

For a positive test, we need the claim to time out. I suppose I can add a sleep of 18 minutes before uploading files so we go past the 20-minute expiration.

I will do another try run with that sleep added.

Comment 3

3 years ago
FYI: I am on PTO until May 13th.

The above patch, https://bugzilla.mozilla.org/show_bug.cgi?id=1149703#c2, still needs a positive test by inserting a fake sleep in a try run.

Also, this is a last resort: the submission should take only a few minutes and not exceed the default 20 minutes.
(Reporter)

Comment 4

3 years ago
The uploading logic may be changing slightly to support l10n artifacts. Feel free to ping me when you're back in case I've made any progress on that front :)