Closed Bug 1316249 Opened 8 years ago Closed 8 years ago

Fails uploading to tooltool RuntimeError: Non-200 return from AWS: 400 Bad Request

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(firefox52 fixed)

RESOLVED FIXED
Tracking Status
firefox52 --- fixed

People

(Reporter: ting, Unassigned)

Details

Attachments

(2 files)

Ran into this in bug 1251936, I am located in TPO. $ python tooltool.py upload --authentication-file=.tooltool-token --message "Bug 1251936 - New Windows clang for x86 /x64, r283955" INFO - clang64.tar.bz2: starting upload INFO - clang32.tar.bz2: starting upload ERROR - clang32.tar.bz2: failed Traceback (most recent call last): File "tooltool.py", line 791, in _s3_upload (resp.status, resp.reason, resp_body)) RuntimeError: Non-200 return from AWS: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?><Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>20E806D572EAA723</RequestId><HostId>tIFXayLqhUbP7PZ17XSaKW+PyUMNFBtCbGiUb3QFT37phf7xQmSjo3gvDosD7FvYo1i7PR+jDB8yxOEfus9fuI51i/5OM3oV</HostId></Error> ERROR - clang64.tar.bz2: failed Traceback (most recent call last): File "tooltool.py", line 791, in _s3_upload (resp.status, resp.reason, resp_body)) RuntimeError: Non-200 return from AWS: 400 Bad Request <?xml version="1.0" encoding="UTF-8"?><Error><Code>RequestTimeout</Code><Message>Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.</Message><RequestId>CF7F00947BB490F2</RequestId><HostId>SpjvbaWXoeMW/vJTcNgMJ9p7gA9LAwkWvxq9P4sBzphpEHlehyytIg3ciDO8J7OhrHm8rijubqEDhVyMe9n1Y1QUCCSKktBh</HostId></Error>
Attached file tcpdump
Captured by command: ">WinDump.exe -s0 -v -w tcpdump.pcap", I tried but am bad at interpreting the log. :(
Flags: needinfo?(dustin)
Summary: Fails uploading to tooltool → Fails uploading to tooltool RuntimeError: Non-200 return from AWS: 400 Bad Request
BTW, I don't know how to delete the batches on https://api.pub.build.mozilla.org/tooltool/ from failed uploading.
How big is the file?
Don't worry about the failed uploads; since they have no associated files, nobody will ever find them :) Before diving in, I should qualify and say that I haven't heard of anyone else having this kind of issue. Here's the relevant bit of the pcap. I don't know how familiar you are with TCP, but I'll write this out here anyway :) look up api.pub: 06:05:08.687714 IP 10.247.30.97.63144 > 10.247.75.120.53: 47188+ A? api.pub.build.mozilla.org. (43) 06:05:08.688678 IP 10.247.75.120.53 > 10.247.30.97.63144: 47188 2/0/0 CNAME relengapi-zlb.vips.scl3.mozilla.com., A 63.245.215.27 (108) look up the S3 bucket URL: 06:05:09.336483 IP 10.247.30.97.64338 > 10.247.75.120.53: 32472+ A? mozilla-releng-usw2-tooltool.s3-us-west-2.amazonaws.com. (73) 06:05:09.361308 IP 10.247.30.97.64338 > 10.247.75.120.53: 32472+ A? mozilla-releng-usw2-tooltool.s3-us-west-2.amazonaws.com. (73) 06:05:09.444379 IP 10.247.75.120.53 > 10.247.30.97.64338: 32472 2/0/0 CNAME s3-us-west-2-r-w.amazonaws.com., A 54.231.169.13 (120) connect to S3: 06:05:09.445037 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [S], seq 1773387666, win 8192, options [mss 1460,nop,wscale 8,nop,nop,sackOK], length 0 06:05:09.620559 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [S.], seq 1941016628, ack 1773387667, win 8192, options [mss 1432,wscale 8,nop,sackOK,nop,nop], length 0 06:05:09.620644 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [.], ack 1, win 257, length 0 TLS setup: 06:05:09.620919 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [P.], seq 1:518, ack 1, win 257, length 517 06:05:09.795139 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 518, win 62, length 0 06:05:09.798327 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], seq 1:1433, ack 518, win 62, length 1432 06:05:09.798394 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], seq 1433:2865, ack 518, win 62, length 1432 06:05:09.798395 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [P.], seq 2865:3150, ack 518, win 62, length 285 06:05:09.798444 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [.], ack 3150, win 257, length 0 06:05:09.800839 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [P.], seq 518:668, ack 3150, win 257, length 150 06:05:09.975943 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [P.], seq 3150:3225, ack 668, win 66, length 75 HTTP request (I'm guessing here, it's encrypted after all) 06:05:09.977644 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [P.], seq 668:1121, ack 3225, win 257, length 453 Push data; these have a length of 0 because your system is letting the network interface do the calculation. Judging by the later ACK's, these are legitimate, full-sized PSH segments (of about 1.5k each). I've guessed at the sequence numbers based on the ethernet packet sizes (from tcpdump -e). 06:05:09.977874 IP bad-len 0 # seq 1121:2640 06:05:09.977956 IP bad-len 0 # seq ..:4117 06:05:09.978313 IP bad-len 0 # seq ..:5693 06:05:09.978488 IP bad-len 0 # seq ..:7366 06:05:09.978674 IP bad-len 0 # seq ..:9679 06:05:09.978972 IP bad-len 0 # seq ..:11959 06:05:09.979217 IP bad-len 0 # seq ..:13482 06:05:09.979426 IP bad-len 0 # seq ..:15208 After 15k (15655 bytes to be exact), your system sends a smaller segment, probably due to window size, as none of its pushes have been ACKed yet 06:05:09.979645 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [P.], seq 15208:15655, ack 3225, win 257, length 447 ACKs start coming back from S3, and your system continues sending data as the window advances 06:05:10.151978 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 2553, win 81, length 0 06:05:10.152024 IP bad-len 0 # seq ..:18519 06:05:10.152247 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 4072, win 92, length 0 06:05:10.152263 IP bad-len 0 # seq ..:21383 06:05:10.152708 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 5549, win 104, length 0 06:05:10.152709 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 7125, win 126, length 0 06:05:10.152732 IP bad-len 0 # seq ..:28543 06:05:10.152921 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 8798, win 148, length 0 06:05:10.152937 IP bad-len 0 # seq ..:31407 06:05:10.153022 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 11111, win 165, length 0 06:05:10.153039 IP bad-len 0 # seq ..:36538 And, that's the last byte of data we see from your system: 36.5k. I think the file you're uploading is larger than that! S3 proceeds to ACK all of that data (which has been re-segmented by the network interface, hence the different sequence numbers here) 06:05:10.153663 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 13391, win 165, length 0 06:05:10.153663 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 15208, win 158, length 0 06:05:10.195567 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 15655, win 165, length 0 06:05:10.326343 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 17087, win 165, length 0 06:05:10.326344 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 18519, win 160, length 0 06:05:10.326344 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 19951, win 155, length 0 06:05:10.326344 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 21383, win 150, length 0 06:05:10.326928 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 22815, win 165, length 0 06:05:10.326928 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 25679, win 158, length 0 06:05:10.326929 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 28543, win 147, length 0 06:05:10.327210 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 31407, win 158, length 0 06:05:10.327287 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 34271, win 158, length 0 06:05:10.327288 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [.], ack 36538, win 150, length 0 By now your system has stopped sending data after byte 36538, and S3 has ACKed everything up to that byte. After exactly 15 seconds, S3 sends an HTTP response with the XML error message about the timeout that you eventually see onscreen: 06:05:25.326949 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [P.], seq 3225:3950, ack 36538, win 165, length 725 06:05:25.326950 IP 54.231.169.13.443 > 10.247.30.97.49691: Flags [P.], seq 3950:4003, ack 36538, win 165, length 53 Your system immediately ACK's that response and then hangs up the connection: 06:05:25.326987 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [.], ack 4003, win 253, length 0 06:05:25.331795 IP 10.247.30.97.49691 > 54.231.169.13.443: Flags [R.], seq 36538, ack 4003, win 0, length 0 So I suppose the question is, why would your system stop sending data after only 36.5k? It's not *hung*, as it responds to the error message immediately, so it must be in some kind of wait state. The code that's sending the data is not complicated: https://github.com/mozilla/build-tooltool/blob/master/tooltool.py#L775 def _s3_upload(filename, file): # urllib2 does not support streaming, so we fall back to good old httplib url = urlparse.urlparse(file['put_url']) cls = httplib.HTTPSConnection if url.scheme == 'https' else httplib.HTTPConnection host, port = url.netloc.split(':') if ':' in url.netloc else (url.netloc, 443) port = int(port) conn = cls(host, port) try: req_path = "%s?%s" % (url.path, url.query) if url.query else url.path conn.request('PUT', req_path, open(filename), {'Content-type': 'application/octet-stream'}) resp = conn.getresponse() resp_body = resp.read() conn.close() I do see one potential issue, though: I didn't open in binary mode. Try changing `open(filename)` to `open(filename, "rb")`? I can see how that might cause an issue if, indeed, the file you're uploading is about 36k, as httplib would send a different content-length than the actual number of bytes transmitted. Want to try that?
I tried and it works!
Flags: needinfo?(dustin)
Comment on attachment 8809396 [details] Bug 1316249: tooltool: specify binary mode in all open(..) calls; https://reviewboard.mozilla.org/r/92016/#review92188
Attachment #8809396 - Flags: review?(gps) → review+
Pushed by gszorc@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/8d8337d3199c tooltool: specify binary mode in all open(..) calls; r=gps
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Tools → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: