Closed Bug 890302 Opened 6 years ago Closed 6 years ago

Increase timeout in test-mar-url.sh

Categories

(Release Engineering :: Release Automation: Other, defect)

x86
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 894368

People

(Reporter: jhopkins, Assigned: jhopkins)

Details

Attachments

(1 file)

At http://hg.mozilla.org/build/tools/file/2787a5b6689e/release/test-mar-url.sh#l6 there is this line:

curl --retry 5 --retry-max-time 30 -k -s -I -L "${mar_url}" > "${mar_headers_file}" 2>&1

This will only retry up to 5 times or 30 seconds _total_ time for all retries, whichever comes first.

We should bump this up to something like
 --retry 50 --retry-max-time 300
to avoid failing the whole verify operation on transient failures.
Assignee: nobody → pmoore
Hi John,

I've also updated retry attempts and timeout options for get-update-xml.sh, which is the script called by final-verification.sh that fetches the update.xml files, which are then parsed in order to retrieve the urls which are then tested in test-mar-url.sh.

Thanks,
Pete
Attachment #773233 - Flags: review?(jhopkins)
Assignee: pmoore → jhopkins
Attachment #773233 - Flags: review?(jhopkins) → review+
Comment on attachment 773233 [details] [diff] [review]
Increased timeouts and retry attempts for curl operations during final-verification.sh steps

https://hg.mozilla.org/build/tools/rev/32a4053c58e5
Attachment #773233 - Flags: checked-in+
Reopen if there are any problems.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
This change may not have done what was expected, because of this from the man page:
--retry <num>
If  a  transient  error  is  returned  when  curl tries to perform a transfer, it will retry this number of times before giving up. ... Transient error means either: a timeout, an FTP 5xx response code or an HTTP 5xx response code.

We're getting 408 responses instead.
I think we can try some extra tactics to reduce the http 408 errors that we are now getting:

1) detect http/408 errors, and retry automatically (as per Nick's comment above, curl is not natively retrying server timeouts, so we could do this either using retry.py or handle directly in the shell script(s))
2) reduce number of parallel processes, to reduce load on CDN (suggest 48 instead of 128)

I'd also like to simplify the error reporting when all update.xml files are consistent, so that not every identical copy is output in the failure report (only when they are not identical). In the case they are all identical, just one copy would be shown, with an explanation that they were all identical.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 894368
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.