intermittent 502s from download.mozilla.org

Status

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: jgmize, Assigned: oremj)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 years ago
pasting from #ee-infra (timestamps in CDT):

11:31 AM <ee-jenkins> Project bedrock_test_stage_us_west build #955: FAILURE in 7 min 10 sec: https://ci.us-west.moz.works/job/bedrock_test_stage_us_west/955/
11:31 AM <ee-jenkins> Project bedrock_test_dev_us_west build #1133: FAILURE in 7 min 18 sec: https://ci.us-west.moz.works/job/bedrock_test_dev_us_west/1133/
11:34 AM <ee-jenkins> Project bedrock_test_stage_eu_west build #1032: FAILURE in 7 min 52 sec: https://ci.us-west.moz.works/job/bedrock_test_stage_eu_west/1032/
11:35 AM <ee-jenkins> Project bedrock_test_dev_eu_west build #1624: FAILURE in 8 min 1 sec: https://ci.us-west.moz.works/job/bedrock_test_dev_eu_west/1624/
11:42 AM <agibson> jgmize: hmm, lots of 502’s on the download links :/
11:44 AM <•jgmize> are they all on the same url? according to https://ci.us-west.moz.works/job/bedrock_integration_tests_runner/6096/testReport/junit/tests.functional/test_download_l10n/test_localized_download_links_https___download_mozilla_org__product_firefox_46_0_1_SSL_os_osx_lang_en_GB_/ the broken link is
11:44 AM <•jgmize> https://download.mozilla.org/?product=firefox-46.0.1-SSL&os=osx&lang=en-GB
11:44 AM <agibson> jgmize: looks like all the links? https://ci.us-west.moz.works/job/bedrock_integration_tests_runner/6098/#showFailuresLink
11:45 AM <•jgmize> yes, see that now :( https://ci.us-west.moz.works/job/bedrock_integration_tests_runner/6090/ also has problems on several languages


conversation continued in #stubby:

11:51 AM <agibson> the test reruns still seem to be failing, fwiw
11:54 AM <oremj> looks like the tests are following redirects, is it getting that response from the location returned by bouncer
11:54 AM <oremj> or bouncer itself?
11:56 AM <oremj> the ELBs say they haven't returned any 5xx errors in the last 12 hours
11:57 AM <oremj> alright, looks like the CDN
12:08 PM <oremj> jgmize: I submitted a case with amazon, hopefully they can provide me with more data on why these are failing
12:08 PM <oremj> it's failing at the cloudfront or s3 level
12:08 PM <oremj> so I don't have a view in to what might be happening
12:09 PM <jgmize> thanks for the update oremj. mind keeping me in the loop with whatever info amazon gives us?
12:09 PM <oremj> will do
12:10 PM <jgmize> thanks again
12:10 PM <jgmize> oremj: did you already file a bug for tracking purposes or would you like me to?
12:11 PM <oremj> I didn't
12:12 PM <oremj> looks like we had error rates up to 60% for 8 non-sequential minutes
12:13 PM <jgmize> ok, filing now
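
Since the tests follow redirects, the question raised in the chat (is the 502 coming from bouncer or from the location bouncer returns?) comes down to finding the first hop in the redirect chain that answered with a 5xx. A minimal sketch of that check, assuming the chain has already been collected as (url, status) pairs; the function names and the CDN hostname `cdn.example.net` are hypothetical and not taken from the bedrock test suite:

```python
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit

Hop = Tuple[str, int]  # (URL requested, HTTP status returned for it)


def first_failing_hop(hops: Iterable[Hop]) -> Optional[Hop]:
    """Return the first hop in the chain that answered with a 5xx, or None."""
    for url, status in hops:
        if 500 <= status <= 599:
            return (url, status)
    return None


def failing_host(hops: Iterable[Hop]) -> Optional[str]:
    """Return the hostname of the first hop that returned a 5xx, or None."""
    hop = first_failing_hop(hops)
    return urlsplit(hop[0]).hostname if hop else None


# Illustrative chain: bouncer redirects, then the redirect target fails.
# The second URL is a made-up placeholder, not the real CDN location.
chain = [
    ("https://download.mozilla.org/?product=firefox-46.0.1-SSL&os=osx&lang=en-GB", 302),
    ("https://cdn.example.net/firefox.dmg", 502),
]
print(failing_host(chain))
```

If the reported host is download.mozilla.org, bouncer (or its ELB) is at fault; if it is the redirect target, the failure is further downstream, which matches the ELBs reporting no 5xx errors.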
(Assignee)

Comment 1

2 years ago
Can we start logging the "X-Amz-Cf-Id" header on failed requests, if it exists?
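
For illustration, one way the test side could do this. This is only a sketch, assuming the test already has the final status code and response headers in hand; the logger name and the helper names are hypothetical, not existing bedrock code:

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("download_link_checks")  # hypothetical logger name


def cloudfront_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Look up X-Amz-Cf-Id case-insensitively; return None if it is absent."""
    for name, value in headers.items():
        if name.lower() == "x-amz-cf-id":
            return value
    return None


def log_failed_request(url: str, status: int,
                       headers: Mapping[str, str]) -> Optional[str]:
    """Log failed (4xx/5xx) responses along with the CloudFront request ID.

    Returns the logged message, or None when the response was not a failure.
    """
    if status < 400:
        return None
    request_id = cloudfront_request_id(headers)
    if request_id is None:
        message = f"{status} from {url} (no X-Amz-Cf-Id header)"
    else:
        message = f"{status} from {url} X-Amz-Cf-Id={request_id}"
    logger.error(message)
    return message
```

The logged ID is the value AWS support typically asks for when tracing an individual request through CloudFront.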
(Assignee)

Comment 2

2 years ago
There is something going on with CloudFront: when a distribution uses another CloudFront distribution as its origin with the "Match Viewer" SSL policy, a request for an uncached resource returns a 502. If the object is already in cache, a 200 is returned.

To alleviate the errors, I've put the distribution in HTTP only mode, so all requests are sent to the origin distribution via HTTP.

I will continue to work with AWS support about why we are getting 502s in Match Viewer mode.
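
A rough way to check the cached-versus-uncached pattern from the client side is to tally responses by CloudFront's "X-Cache" response header (values such as "Hit from cloudfront", "Miss from cloudfront" and "Error from cloudfront") together with the status code. The sketch below assumes the responses have already been collected as (status, headers) pairs; the function name and the sample data are made up for illustration and are not measurements from this incident:

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str]]  # (HTTP status, response headers)


def tally_by_cache_result(responses: Iterable[Response]) -> Counter:
    """Count responses per (X-Cache value, status) pair."""
    counts: Counter = Counter()
    for status, headers in responses:
        # Header names are case-insensitive, so normalise before the lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        cache_result = lowered.get("x-cache", "unknown")
        counts[(cache_result, status)] += 1
    return counts


# Made-up sample, only to show the shape of the output.
sample = [
    (200, {"X-Cache": "Hit from cloudfront"}),
    (200, {"X-Cache": "Hit from cloudfront"}),
    (502, {"X-Cache": "Error from cloudfront"}),
]
for (cache_result, status), count in sorted(tally_by_cache_result(sample).items()):
    print(f"{cache_result:24} {status} x{count}")
```

If the 502s only ever show up without a cache hit, that is consistent with the failure happening on the fetch from the origin distribution rather than at the edge cache itself.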
Assignee: nobody → oremj
Component: Bouncer → Operations: Product Delivery
Product: Webtools → Cloud Services
QA Contact: oremj
Version: Trunk → unspecified
(Assignee)

Comment 3

2 years ago
This was a temporary problem on some of the CloudFront nodes. Amazon says the problem is fixed.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED