Intermittent "400 Bad Request" errors when uploading to the TC queue causing Windows job failures
Categories
(Taskcluster :: General, defect)
Tracking
(Not tracked)
People
(Reporter: bogdan_tara, Unassigned)
Details
+++ This bug was initially created as a clone of Bug #1394557 +++
Comment 2•3 years ago
I'm not sure what to do here. We already try to work around this with the code added in bug 1394557.
Getting newer versions of generic-worker on the workers might help. We're currently running v16.5.1 on the t-win7-32 pool, and the latest is v36.
Comment 3•3 years ago
From the logs, it looks like the worker retried the artifact upload with exponential backoff for about 15 minutes before giving up:
[taskcluster 2020-07-20T23:36:49.591Z] Uploading artifact public/test_info/wpt_raw.log from file build\blobber_upload_dir\wpt_raw.log with content encoding "gzip", mime type "text/plain" and expiry 2021-07-20T21:58:48.435Z
[taskcluster:error] Error uploading artifact: S3 returned status code 400 which could be an intermittent issue - see https://bugzilla.mozilla.org/show_bug.cgi?id=1394557
[taskcluster 2020-07-20T23:51:59.223Z] Uploading artifact public/test_info/wptreport.json from file build\blobber_upload_dir\wptreport.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2021-07-20T21:58:48.435Z
I suspect this was a networking/connectivity issue between the Windows 7 worker and S3. As noted in bug 1394557, S3 returns a 400 HTTP status code due to connection inactivity, which could happen if there was a network blip that lasted 15 minutes.
I suspect this was a one-time thing during high load / poor performance of something in between the worker and S3.
If it doesn't happen again, I'd suggest closing this bug and considering it a flake. The worker acted appropriately: it tried for 15 minutes to publish the artifact before giving up.
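For anyone unfamiliar with the retry behaviour described above, here is a rough sketch of retrying an upload with exponential backoff under a 15-minute budget. This is not generic-worker's actual code (which is written in Go). The function name upload_with_backoff, the put callable, the delay values, and the set of retryable status codes other than 400 are all assumptions made for illustration.

```python
import time
from typing import Callable

# 400 is treated as retryable because S3 can return it intermittently
# (see bug 1394557). The 5xx codes are an assumption for this sketch.
RETRYABLE_STATUSES = {400, 500, 502, 503, 504}


def upload_with_backoff(
    put: Callable[[], int],          # hypothetical: performs one upload attempt, returns HTTP status
    max_elapsed: float = 15 * 60,    # total retry budget in seconds (15 minutes, as in the logs)
    base_delay: float = 1.0,         # assumed first delay
    max_delay: float = 60.0,         # assumed cap on any single delay
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Return True if an attempt succeeds, False once we give up."""
    deadline = clock() + max_elapsed
    attempt = 0
    while True:
        status = put()
        if 200 <= status < 300:
            return True
        if status not in RETRYABLE_STATUSES:
            # Non-retryable failure, give up immediately.
            return False
        # Double the delay on every failed attempt, up to max_delay.
        delay = min(max_delay, base_delay * 2 ** attempt)
        if clock() + delay > deadline:
            # The next wait would exceed the budget, so stop here.
            return False
        sleep(delay)
        attempt += 1
```

A real implementation would usually add random jitter to each delay so that many workers do not retry in lockstep; it is left out here to keep the sketch deterministic.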