Create a curl wrapper / helper to compensate for transient failures
Categories: Webtools :: Searchfox, enhancement
Tracking: Not tracked
People: asuth (reporter); unassigned
References: Blocks 2 open bugs
Details
In various bugs we've experienced transient failures (bug 1649453: a weird SSL error; bug 1662442: a short-lived server 500; bug 1694210 was motivated by intermittent connection edge cases caused by a legitimate but rare-ish curl bug), and we've considered adding a wrapper to provide some primitive, higher-level retry logic. We would only want to use this for fetches whose failure would otherwise be fatal, and we would want the failures to still appear in the email log.
There's no need to act on this right now; I'm just filing to clean up some prior bugs.
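A minimal sketch of such a helper, assuming bash. The name retry_fetch and its attempts/delay parameters are hypothetical, not existing searchfox code. Every failed attempt is written to stderr, so it still shows up in the email log even when a later attempt succeeds.

```shell
#!/usr/bin/env bash
# Hypothetical helper: retry_fetch ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND up to ATTEMPTS times, sleeping DELAY_SECONDS between tries.
# Each failure is logged to stderr so it still reaches the email log.
retry_fetch() {
  local attempts=$1 delay=$2
  shift 2
  local try status=1
  for ((try = 1; try <= attempts; try++)); do
    # "&&" keeps a failing attempt from tripping "set -e" in the caller.
    "$@" && return 0
    status=$?
    echo "retry_fetch: attempt $try/$attempts failed (exit $status): $*" >&2
    if ((try < attempts)); then
      sleep "$delay"
    fi
  done
  return "$status"
}
```

For example, retry_fetch 2 30 curl -SsfL --compressed -o linux64.mozsearch-rust.zip "$URL" waits 30 seconds and tries once more. Passing -o rather than redirecting the whole call with > matters here: a redirect is opened once for the function, so a partial first attempt would be followed by the second attempt's data in the same file.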
Comment 2 (Reporter) • 4 years ago
An important excerpt from :kats's comment at https://bugzilla.mozilla.org/show_bug.cgi?id=1649453#c0:
Had some discussion with :asuth on #searchfox-outage:mozilla.org - it might be reasonable to use a wrapper around curl that attempts to diagnose the error (by running ping or traceroute or something) and/or attempts a retry after a little bit, in case of transient errors.
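A sketch of the diagnostic half of that idea, assuming a Linux host with glibc getent and iputils ping (where -W is a timeout in seconds). The names url_host and diagnose_fetch_failure are hypothetical. The host is pulled out of the URL with plain bash parameter expansion, and the probe output goes to stderr next to the curl error.

```shell
#!/usr/bin/env bash
# Hypothetical: print the host name of an http(s) URL.
# (IPv6 literal hosts are not handled.)
url_host() {
  local rest=${1#*://}        # drop the scheme, e.g. "https://"
  rest=${rest%%/*}            # drop the path
  rest=${rest##*@}            # drop any "user:password@" prefix
  printf '%s\n' "${rest%%:*}" # drop any ":port" suffix
}

# Hypothetical: log cheap network diagnostics after a failed fetch.
diagnose_fetch_failure() {
  local url=$1 host
  host=$(url_host "$url")
  echo "fetch failed for $url; probing $host" >&2
  if command -v getent >/dev/null; then
    getent hosts "$host" >&2 || echo "DNS lookup failed for $host" >&2
  fi
  if command -v ping >/dev/null; then
    ping -c 3 -W 5 "$host" >&2 || echo "ping to $host failed" >&2
  fi
}
```

A failed ping is only weak evidence, since many hosts (including large cloud endpoints) drop ICMP; a DNS failure is the more conclusive signal.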
Comment 4 (Reporter) • 4 years ago
config4 just fell over here. That fetch used wget, but the same "hey, maybe we could retry" idea still applies (and we could/should be using curl here):
+ wget -nv https://s3-us-west-2.amazonaws.com/searchfox.repositories/whatwg-html.tar
https://s3-us-west-2.amazonaws.com/searchfox.repositories/whatwg-html.tar:
2021-04-13 12:10:02 ERROR 500: Internal Server Error.
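curl can cover part of this on its own. Below is a sketch of the same fetch using curl's built-in retry options, assuming curl 7.71.0 or newer (required for --retry-all-errors); fetch_with_curl_retry is a hypothetical name. Plain --retry only retries what curl classifies as transient (a timeout, or an HTTP 408, 429, 500, 502, 503 or 504), which would have covered this 500. --retry-all-errors extends that to every error, including connection resets and 404s, at the cost of also retrying genuine failures.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: fetch URL into OUT using curl's own retry logic.
# Requires curl >= 7.71.0 for --retry-all-errors.
fetch_with_curl_retry() {
  local url=$1 out=$2
  # Use -o rather than "> OUT": curl discards partial data in an -o file
  # before retrying, but it cannot reset output redirected by the shell.
  curl -SsfL --compressed \
    --retry 2 --retry-delay 30 --retry-all-errors \
    -o "$out" "$url"
}
```

Usage: fetch_with_curl_retry https://s3-us-west-2.amazonaws.com/searchfox.repositories/whatwg-html.tar whatwg-html.tar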
Comment 5 (Reporter) • 4 years ago
config4 fell over again with a similar 500 when fetching an artifact from AWS that should have existed. A few 404s preceded it, but it's always hard to tell which fetches the 404s correspond to because of how the output is emitted.
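One way to make such failures attributable, sketched under the assumption that each fetch goes through a shell function: report the URL together with curl's exit status. fetch_reporting_url is a hypothetical name. With -f, curl exits with status 22 for any HTTP error, so the URL is what distinguishes one 404 from another. GNU parallel's --tag option, which prefixes each output line with the job's arguments, is another route to similar attribution.

```shell
#!/usr/bin/env bash
# Hypothetical: fetch URL into OUT; on failure, name the URL in the error
# so a 404 or 500 can be matched to the artifact that caused it.
fetch_reporting_url() {
  local url=$1 out=$2 status
  curl -SsfL --compressed -o "$out" "$url" && return 0
  status=$?
  echo "FETCH FAILED (curl exit $status): $url -> $out" >&2
  return "$status"
}
```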
Comment 6 (Reporter) • 3 years ago
As part of bug 1782375 I added some helper functions to load-vars.sh as bash functions exported via export -f, and this is probably a good mechanism for implementing this helper, given that:
- we can tie the decision making, such as what to do on errors, to per-tree config settings
- trying to be clever with quotes inside variables does not work, and functions are the recommended solution to that. What we're doing with curl right now only works because we build the job list for parallel manually: the commands are turned back into plain strings, which then undergo fresh shell parsing.
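A minimal sketch of that mechanism, assuming bash and GNU parallel. fetch_artifact and SEARCHFOX_FETCH_RETRIES are hypothetical names, not what load-vars.sh actually defines. The URL and output file reach the function as separate arguments, so no quoting has to survive being stored in a variable, and a per-tree setting is read from an exported variable.

```shell
#!/usr/bin/env bash
# Hypothetical per-tree setting; load-vars.sh could derive it from tree config.
export SEARCHFOX_FETCH_RETRIES=${SEARCHFOX_FETCH_RETRIES:-2}

# Hypothetical fetch helper: "$1" and "$2" arrive as distinct arguments.
fetch_artifact() {
  local url=$1 out=$2
  curl -SsfL --compressed --retry "$SEARCHFOX_FETCH_RETRIES" \
    -o "$out" "$url"
}

# export -f makes the function visible to child bash processes, including
# the shells that GNU parallel starts for each job.
export -f fetch_artifact
```

Jobs could then be launched without building command strings, e.g. parallel --halt now,fail=1 --link fetch_artifact ::: "$URL_A" "$URL_B" ::: a.zip b.zip, where --link pairs the first URL with a.zip and the second with b.zip.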
Comment 7 (Reporter) • 3 years ago
Bug 1795328 is a config2 fetch failure for the blame, similar to the config4 failure in comment 4.
One thing I'm perhaps more willing to consider at this point is pushing more of this logic into Rust, configured by JSON and/or TOML, so that a single tool can perform a bunch of parallel downloads with more global awareness. If we assume that transient network problems can last up to a minute but are also rare, it's nicer to have a batch downloader that effectively applies a delay once, rather than N script invocations that each need their own delay/retry heuristics. This could also generally improve performance and decrease cost.
That said, a curl wrapper that waits 30 seconds on failure and tries again once would probably go a very long way.
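A bash approximation of the batch idea, for illustration only: the proposal itself is a Rust tool, and fetch_batch is a hypothetical name. Every fetch is started once in the background. If any fail, the helper sleeps a single time and then retries only the failed fetches, so a minute-long network blip costs one delay instead of one per invocation.

```shell
#!/usr/bin/env bash
# Hypothetical: fetch_batch DELAY_SECONDS URL1 OUT1 [URL2 OUT2 ...]
# Pass 1 starts every fetch as a background job. If any fail, sleep once
# for DELAY_SECONDS, then retry just the failed fetches, sequentially, once.
fetch_batch() {
  local delay=$1
  shift
  local -a urls=() outs=() pids=() failed=()
  while (($# >= 2)); do
    urls+=("$1")
    outs+=("$2")
    shift 2
  done

  local i
  for i in "${!urls[@]}"; do
    curl -SsfL --compressed -o "${outs[i]}" "${urls[i]}" &
    pids[i]=$!
  done
  for i in "${!urls[@]}"; do
    wait "${pids[i]}" || failed+=("$i")
  done
  ((${#failed[@]} == 0)) && return 0

  echo "fetch_batch: ${#failed[@]} fetch(es) failed; retrying once in ${delay}s" >&2
  sleep "$delay"
  local status=0
  for i in "${failed[@]}"; do
    if ! curl -SsfL --compressed -o "${outs[i]}" "${urls[i]}"; then
      echo "fetch_batch: giving up on ${urls[i]}" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: fetch_batch 30 "$URL_A" a.zip "$URL_B" b.zip. The shared delay is the point of the design. A Rust tool could additionally cap concurrency and produce structured error reports, which this sketch does not attempt.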
Comment 8 (Reporter) • 3 years ago
Got a 502 on config1 today from the following:
curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.ec2ad590b1485ab7d895d3d8ba9434739f8dd603.firefox.macosx64-searchfox-debug/artifacts/public/build/target.mozsearch-rust.zip > macosx64.mozsearch-rust.zip
The last 502 was also on config1, on Jan 8th I think, so this is not very common, but it's another data point for the retry logic, especially since re-running the fetch later worked without problems.
Comment 9 (Reporter) • 2 years ago
Got a weird 404 on config1 today, shown below. The revisions were pretty normal: the armv7 artifact definitely came into existence in a timely fashion, and the coverage job that allowed us to pick this revision was also timely.
+ grep moz_source_stamp
"moz_source_stamp": "e05ed4cedf9f0528e75958b2f9c7f87fff68ecae",
+ '[' -f android-armv7.mozsearch-index.zip ']'
+ TC_PREFIX=https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.e05ed4cedf9f0528e75958b2f9c7f87fff68ecae.firefox.android-armv7-searchfox-debug/artifacts/public/build
+ echo 'curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.e05ed4cedf9f0528e75958b2f9c7f87fff68ecae.firefox.android-armv7-searchfox-debug/artifacts/public/build/target.mozsearch-index.zip > android-armv7.mozsearch-index.zip'
+ echo 'curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.e05ed4cedf9f0528e75958b2f9c7f87fff68ecae.firefox.android-armv7-searchfox-debug/artifacts/public/build/target.mozsearch-rust.zip > android-armv7.mozsearch-rust.zip'
+ echo 'curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.e05ed4cedf9f0528e75958b2f9c7f87fff68ecae.firefox.android-armv7-searchfox-debug/artifacts/public/build/target.mozsearch-rust-stdlib.zip > android-armv7.mozsearch-rust-stdlib.zip'
+ echo 'curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.e05ed4cedf9f0528e75958b2f9c7f87fff68ecae.firefox.android-armv7-searchfox-debug/artifacts/public/build/target.generated-files.tar.gz > android-armv7.generated-files.tar.gz'
+ echo 'curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.e05ed4cedf9f0528e75958b2f9c7f87fff68ecae.firefox.android-armv7-searchfox-debug/artifacts/public/build/target.mozsearch-distinclude.map > android-armv7.distinclude.map'
+ parallel --halt now,fail=1
curl: (22) The requested URL returned error: 404
curl: (35) OpenSSL SSL_connect: Connection reset by peer in connection to s3.us-west-2.amazonaws.com:443
parallel: This job failed:
curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.e05ed4cedf9f0528e75958b2f9c7f87fff68ecae.firefox.android-armv7-searchfox-debug/artifacts/public/build/target.mozsearch-index.zip > android-armv7.mozsearch-index.zip
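This log shows that the exit status alone is a blunt signal. With -f, the 404 is curl exit 22 (as is any other HTTP error, 5xx included), while the connection reset is exit 35. Below is a hypothetical classifier, is_retryable_curl_status. Which codes to retry is a judgment call rather than an established policy; 22 is included because, as here, even a 404 can turn out to be spurious.

```shell
#!/usr/bin/env bash
# Hypothetical: succeed if a curl exit status is worth one retry.
#   6  couldn't resolve host      7  failed to connect
#   22 HTTP error >= 400 with -f  28 operation timed out
#   35 TLS/SSL connect error      52 empty reply from server
#   56 failure receiving data
is_retryable_curl_status() {
  case $1 in
    6 | 7 | 22 | 28 | 35 | 52 | 56) return 0 ;;
    *) return 1 ;;
  esac
}
```

Codes that point at a local or usage problem, such as 3 (malformed URL), fall through to the non-retryable branch.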
Comment 10 (Reporter) • 2 years ago
I also continue to think that my proposal in comment 7, to push more of this fetch logic into a Rust helper tool, is probably desirable. The added rationale is that I really want searchfox to be able to index Phabricator revisions / try runs, and a more formal ingestion pipeline with strong typing is much preferable for resiliency, error reporting, and security (for example, not passing arbitrary input that is hopefully a git/hg revision to a shell script that is probably not hardened against hostile input). This could also have nice introspection/documentation characteristics now that I've enhanced the "tracing" logging situation.
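Until such a tool exists, a shell-side guard could at least reject anything that is not a full 40-character lowercase hex hash (the form of SHA-1 git commit IDs and hg changeset IDs) before it is interpolated into a URL or command line. is_full_revision_hash is a hypothetical name, and this is a stopgap rather than the typed pipeline described above.

```shell
#!/usr/bin/env bash
# Hypothetical guard: succeed only for a full 40-character lowercase hex
# hash, such as a SHA-1 git commit ID or an hg changeset ID.
is_full_revision_hash() {
  [[ $1 =~ ^[0-9a-f]{40}$ ]]
}

# Hypothetical caller-side check before building a Taskcluster index URL.
require_revision() {
  if ! is_full_revision_hash "$1"; then
    printf 'refusing suspicious revision: %q\n' "$1" >&2
    return 1
  fi
}
```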
Comment 11 (Reporter) • 2 years ago
2023-01-06 utc22: a 502 on config1 for an armv7 fetch, even though all the armv7 assets exist:
+ parallel --halt now,fail=1
curl: (22) The requested URL returned error: 502
parallel: This job failed:
curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.7968ae37c117d4be0e81c8843cd1b2b283129791.firefox.android-armv7-searchfox-debug/artifacts/public/build/target.mozsearch-rust-stdlib.zip > android-armv7.mozsearch-rust-stdlib.zip
Comment 12 (Reporter) • 2 years ago
2023-01-13 utc10: a 502 on the linux64 rust artifact, next to some 404s that might be fine:
+ parallel --halt now,fail=1
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 404
curl: (22) The requested URL returned error: 502
parallel: This job failed:
curl -SsfL --compressed https://firefox-ci-tc.services.mozilla.com/api/index/v1/task/gecko.v2.mozilla-central.revision.31122740c39d5368ae280c3493fb7ca3b4e5c07d.firefox.linux64-searchfox-debug/artifacts/public/build/target.mozsearch-rust.zip > linux64.mozsearch-rust.zip