1133876 - python-client: Ensure that retries works with taskcluster.utils.putFile

Reporter

Description

•

9 years ago

The requests library docs is not clear on whether or not it seeks a file to 0,
before streaming it's body. From looking at the requests code I suspect it doesn't seek to zero.

As a result taskcluster.utils.putFile, will probably fail to put the
entire file if the first retry files. And since S3 requires content-length
to be correct, subsequent retries will also fail. So we get no file.

We should fix this, either in `makeSingleHttpRequest` by seeking to zero, if
the payload has a seek method. Or by reimplementing retry logic in
taskcluster.utils.putFile.

See:
https://github.com/taskcluster/taskcluster-client.py/blob/1b1ec3b2c3fe5d422bbc7cd9cb51914b16d2a561/taskcluster/utils.py#L140-L148


Note, requests might seek to zero, from the code I read I strongly suspect it only does so for multi-part uploads. Not simple PUT requests.
Either way, we should test that this works! And fix this.
It's extremely likely that putFile retry logic is broken.

Jonas Finnemann Jensen (:jonasfj)

Reporter

Comment 1

•

9 years ago

We really should test retry logic in our test suite. Both with 500 and connection errors.
If you run a localhost server, it's easy to make a method that fails every second time, don't know if httmock will let you do that.

Anyways, it's very important to test retry logic, because it's only used very rarely. And debugging these cases is very hard. So we should test it for both DNS errors, connection errors, 5xx errors, etc.

Michael Shal [:mshal]

Comment 2

•

9 years ago

I tested this locally by temporarily redirecting taskcluster-artifacts.s3-us-west-2.amazonaws.com to 127.0.0.1 to trigger bug 1133856, and then remove that line from /etc/hosts so that one of the retries succeeds. In this case, I see a 200 response for the first retry, and then 400 responses for the next several retries.

I added this to makeHttpRequest:

      if hasattr(payload, 'seek'):
        payload.seek(0)
      response = makeSingleHttpRequest(...)

And then I saw a string of 200's. So it seems jonasfj's intuition is correct and we need to add the seek on retries.

However, even in the case where subsequent puts got a 400 response, the artifact that was attached to the task was still good (ie: it had the full text in the file that I uploaded - it wasn't subsequently overwritten with nothing). So I don't think this fully explains the behavior I saw in task g-8Wo0oST5iRYfdGapGlrg where the artifact exists in the task but is empty, and returns a 404 when trying to download it.

Michael Shal [:mshal]

Comment 3

•

9 years ago

(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #1)
> We really should test retry logic in our test suite. Both with 500 and
> connection errors.
> If you run a localhost server, it's easy to make a method that fails every
> second time, don't know if httmock will let you do that.
> 
> Anyways, it's very important to test retry logic, because it's only used
> very rarely. And debugging these cases is very hard. So we should test it
> for both DNS errors, connection errors, 5xx errors, etc.

Agreed, though I don't know how to add these test cases into taskcluster-client.py's current architecture.

Jonas Finnemann Jensen (:jonasfj)

Reporter

Comment 4

•

9 years ago

@jhford,
Neither mshal or I are familiar with httmock, and/or the python testing framework.
If you have time please drop-in and save the day :)

Flags: needinfo?(jhford)

Michael Shal [:mshal]

Updated

•

9 years ago

Blocks: 1117960

Jonas Finnemann Jensen (:jonasfj)

Reporter

Comment 5

•

9 years ago

I've a fix for the issue here in a patch for bug 1134369.
I think it covers most, if not all the issues.

Jonas Finnemann Jensen (:jonasfj)

Reporter

Comment 6

•

9 years ago

I think/hope this is fixed with bug 1134369

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

John Ford [:jhford] CET/CEST Berlin Time

Updated

•

9 years ago

Flags: needinfo?(jhford)

Pete Moore [:pmoore][:pete]

Updated

•

9 years ago

Component: TaskCluster → General

Product: Testing → Taskcluster

Target Milestone: --- → mozilla41

Version: unspecified → Trunk

Pete Moore [:pmoore][:pete]

Comment 7

•

9 years ago

Resetting Version and Target Milestone that accidentally got changed...

Target Milestone: mozilla41 → ---

Version: Trunk → unspecified

Bugzilla

Quick Search

python-client: Ensure that retries works with taskcluster.utils.putFile

Categories

(Taskcluster :: General, defect)

Tracking

(Not tracked)

People

(Reporter: jonasfj, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Updated

Updated

Comment 7