Closed Bug 1250458 Opened 4 years ago Closed 4 years ago
taskcluster upload should be able to cope with slow network
Revealed by the slow network in bug 1250374: if we take more than 20 minutes to upload to taskcluster we'll fail to reclaim the task, it'll expire, and everything will go pear-shaped. See bug 1250374 comment #8.

https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/building/buildbase.py#1541
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/taskcluster_helper.py#12

mshal set this up originally but I think he's on a work-week this week.
Dropping severity since this is not actively blocking anything; it only affects our resiliency to a slow network.
Severity: blocker → major
IMO the easiest thing to do is to call reclaimTask in between each file, which would mean the per-file limit is 20 minutes instead of a per-job limit of 20 minutes. It would be better still if there's an easy way to periodically call reclaimTask in a separate thread or something, but off-hand I don't know how hard that would be to do.
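For reference, the "separate thread" idea could look roughly like the sketch below. It is untested against mozharness. `PeriodicReclaimer`, `reclaim` and `interval` are hypothetical names, and `reclaim` stands in for whatever zero-argument callable ends up calling the queue's reclaimTask.

```python
import threading


class PeriodicReclaimer(object):
    """Call `reclaim` every `interval` seconds until stopped.

    Hypothetical helper, not part of mozharness. `reclaim` is any
    zero-argument callable that extends the task claim.
    """

    def __init__(self, reclaim, interval):
        self._reclaim = reclaim
        self._interval = interval
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run)
        # Daemon thread so a hung upload cannot keep the process alive.
        self._thread.daemon = True

    def _run(self):
        # Event.wait returns False on timeout and True once stop is set,
        # so this reclaims once per interval and exits promptly on stop.
        while not self._stop.wait(self._interval):
            self._reclaim()

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self._stop.set()
        self._thread.join()
        return False  # never swallow exceptions from the with-body
```

Wrapping the upload loop in `with PeriodicReclaimer(reclaim, 600):` would reclaim every ten minutes, comfortably inside the 20 minute window. Note that an exception raised by `reclaim` ends the background thread silently in this sketch, so a real version would probably want to log it.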
+1. It would help the most common failure modes without over-complicating this logic.
There are other things we could do here, but let's grab the low-hanging fruit.
Assignee: nobody → nthomas
Status: NEW → ASSIGNED
Attachment #8722797 - Flags: review?(mshal)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=073959365ab7 if you're interested.
Huh, I thought we would've needed to add a new reclaim_task method in taskcluster_helper.py to call http://docs.taskcluster.net/queue/api-docs/#reclaimTask

:jonasfj, does calling claimTask again effectively do the same thing here as reclaimTask as far as resetting the timer?
@mshal, you are right, claimTask != reclaimTask. Hmm, I see I can't refer to the docs as I didn't write any...

> :jonasfj, does calling claimTask again effectively do the same thing here as reclaimTask as far as
> resetting the timer?

Calling claimTask(taskId, runId) on a task and run that is already running will return 409, conflict. To postpone the takenUntil timestamp, call reclaimTask(taskId, runId).
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #7)
> Calling claimTask(taskId, runId) on a task and run that is already running
> will return 409, conflict.

Hmm, that doesn't seem to jibe with nthomas' try push - it looks like it is successful (or something is silently ignoring the error).

> To postpone the takenUntil timestamp call reclaimTask(taskId, runId)

So, I think we'll want a reclaim_task in taskcluster_helper that does something like:

    self.taskcluster_queue.reclaimTask(
        task['status']['taskId'],
        task['status']['runs'][-1]['runId'])

(untested)
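Fleshing that out a little, still untested against the real queue. `TaskclusterSketch`, `upload_files` and `upload_one` are stand-ins invented for illustration, not the actual mozharness code. The only real API assumed is the queue client's reclaimTask(taskId, runId) as described in comment #7.

```python
class TaskclusterSketch(object):
    """Stand-in for the helper class in taskcluster_helper.py."""

    def __init__(self, queue):
        # `queue` is expected to behave like a taskcluster Queue client,
        # i.e. to expose reclaimTask(taskId, runId).
        self.taskcluster_queue = queue

    def reclaim_task(self, task):
        """Extend takenUntil for the most recent run of `task`."""
        status = task['status']
        return self.taskcluster_queue.reclaimTask(
            status['taskId'],
            status['runs'][-1]['runId'],
        )


def upload_files(helper, task, paths, upload_one):
    """Reclaim before every file so the 20 minute limit applies per file.

    `upload_one` is a hypothetical callable that uploads a single path.
    """
    for path in paths:
        helper.reclaim_task(task)
        upload_one(path)
```

This keeps the time limit per file rather than per job, which is the simple option proposed earlier in this bug. A single file that takes longer than 20 minutes would still expire.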
Attachment #8722797 - Attachment is obsolete: true
Review commit: https://reviewboard.mozilla.org/r/45353/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/45353/
Attachment #8739729 - Flags: review?(nthomas)
Comment on attachment 8739729
MozReview Request: Bug 1250458 - Reclaim task before file uploads r=nthomas

The same approach works fine in create_reference_artifact() in the same helper.
Comment on attachment 8739729
MozReview Request: Bug 1250458 - Reclaim task before file uploads r=nthomas

https://reviewboard.mozilla.org/r/45353/#review41893

lgtm
Attachment #8739729 - Flags: review?(nthomas) → review+