Taskcluster client should retry "failed" requests (due to server failure not request error)



4 years ago


(Reporter: jlal, Assigned: jonasfj)





4 years ago
IMO we should steal the algorithm the AWS client uses.

Comment 1

4 years ago
We should retry requests on both connection errors and server errors.
Any 5xx response, loss of connection, network failure, etc. should be retried.

The retries should use exponential back-off that is configurable in taskcluster-client.

I started adding `idempotent: true,` as an argument to `api.declare` for the queue.
Currently this property is completely ignored and thrown away before references are generated, and not all API end-points have it.
But if there is interest, it would be near trivial to add an "idempotent: true || false" property to all API end-points, which could then be used to govern the taskcluster-client retry logic.

But we might also just want to retry non-idempotent operations anyway. At the moment I'm not really sure we have any such operations.
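To make the idea concrete, here is a minimal sketch of how an `idempotent` flag could govern the retry decision. It assumes the flag survives into the generated reference entry, which is not the case today. The names `shouldRetry`, `failure.connectionError`, `failure.statusCode` and `retryNonIdempotent` are hypothetical and not part of taskcluster-client.

```javascript
// Hypothetical sketch: decide whether a failed call should be retried.
// `entry` stands in for an API reference entry carrying an `idempotent`
// flag (assumed; currently dropped before references are generated).
// `failure` is an assumed shape: { connectionError: true } or { statusCode }.
function shouldRetry(entry, failure, { retryNonIdempotent = false } = {}) {
  const transient =
    failure.connectionError === true ||
    (failure.statusCode >= 500 && failure.statusCode < 600);

  if (!transient) {
    return false; // 4xx and other request errors are never retried
  }
  // Retry idempotent end-points; others only if explicitly allowed.
  return entry.idempotent === true || retryNonIdempotent;
}
```

Setting `retryNonIdempotent` to true corresponds to the alternative mentioned above of retrying every operation regardless of the flag.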


4 years ago
Assignee: nobody → jopsen

Comment 2

4 years ago
This was added in:

As of taskcluster-client 0.17.1 we now do 5 attempts with exponential back-off:
1st request waits    0 ms
2nd request waits  100 ms
3rd request waits  400 ms
4th request waits  900 ms
5th request waits 1600 ms

Accumulated time spent sleeping: 3000 ms,
plus the time necessary to do the 5 attempts.

The logic is inspired by AWS, and looks as follows:

retries = 0
do {
  sleep(retries * retries * 100 ms);

  execute request;

  if (connection error || 5xx error) {
    // Transient failure: fall through and try again
  } else {
    break; // We're done
  }

  retries += 1;
} while (retries < options.maxRetries)
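
For reference, the loop above could be written in JavaScript roughly as follows. This is only a sketch under assumptions, not the actual taskcluster-client code: the 100 ms factor and the `retries * retries` delay come from the pseudo-code, while `retryDelay`, `isTransientFailure`, `requestWithRetries`, the `{ error }` / `{ statusCode }` result shape and the injectable `sleep` option are names made up for illustration.

```javascript
// Delay in ms before attempt number `retries` (0-based): retries^2 * factor.
function retryDelay(retries, delayFactor = 100) {
  return retries * retries * delayFactor;
}

// Assumed result shape: { error } for connection failures,
// { statusCode } for completed HTTP requests.
function isTransientFailure(result) {
  return Boolean(result.error) ||
    (result.statusCode >= 500 && result.statusCode <= 599);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Calls `doRequest` up to `maxRetries` times, sleeping before each attempt.
// Returns the last result, whether it succeeded or not.
async function requestWithRetries(
  doRequest,
  { maxRetries = 5, sleep = defaultSleep } = {}
) {
  let result;
  for (let retries = 0; retries < maxRetries; retries += 1) {
    await sleep(retryDelay(retries));
    try {
      result = await doRequest();
    } catch (err) {
      result = { error: err }; // treat a thrown error as a connection error
    }
    if (!isTransientFailure(result)) {
      break; // success or a non-retryable (e.g. 4xx) response
    }
  }
  return result;
}
```

Passing a custom `sleep` makes it possible to exercise the loop in tests without actually waiting.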
Last Resolved: 4 years ago
Resolution: --- → FIXED
Component: TaskCluster → General
Product: Testing → Taskcluster
Target Milestone: --- → mozilla41
Version: unspecified → Trunk
Resetting Version and Target Milestone that accidentally got changed...
Target Milestone: mozilla41 → ---
Version: Trunk → unspecified