Closed Bug 1275978 Opened 9 years ago Closed 9 years ago

intermittent 502s from download.mozilla.org

Categories

(Cloud Services :: Operations: Product Delivery, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jgmize, Assigned: oremj)

Details

pasting from #ee-infra (timestamps in CDT):
11:31 AM <ee-jenkins> Project bedrock_test_stage_us_west build #955: FAILURE in 7 min 10 sec: https://ci.us-west.moz.works/job/bedrock_test_stage_us_west/955/
11:31 AM <ee-jenkins> Project bedrock_test_dev_us_west build #1133: FAILURE in 7 min 18 sec: https://ci.us-west.moz.works/job/bedrock_test_dev_us_west/1133/
11:34 AM <ee-jenkins> Project bedrock_test_stage_eu_west build #1032: FAILURE in 7 min 52 sec: https://ci.us-west.moz.works/job/bedrock_test_stage_eu_west/1032/
11:35 AM <ee-jenkins> Project bedrock_test_dev_eu_west build #1624: FAILURE in 8 min 1 sec: https://ci.us-west.moz.works/job/bedrock_test_dev_eu_west/1624/
11:42 AM <agibson> jgmize: hmm, lots of 502’s on the download links :/
11:44 AM <•jgmize> are they all on the same url? according to https://ci.us-west.moz.works/job/bedrock_integration_tests_runner/6096/testReport/junit/tests.functional/test_download_l10n/test_localized_download_links_https___download_mozilla_org__product_firefox_46_0_1_SSL_os_osx_lang_en_GB_/ the broken link is
11:44 AM <•jgmize> https://download.mozilla.org/?product=firefox-46.0.1-SSL&os=osx&lang=en-GB
11:44 AM <agibson> jgmize: looks like all the links? https://ci.us-west.moz.works/job/bedrock_integration_tests_runner/6098/#showFailuresLink
11:45 AM <•jgmize> yes, see that now :( https://ci.us-west.moz.works/job/bedrock_integration_tests_runner/6090/ also has problems on several languages

conversation continued in #stubby:
11:51 AM <agibson> the test reruns still seem to be failing, fwiw
11:54 AM <oremj> looks like the tests are following redirects, is it getting that response from the location returned by bouncer
11:54 AM <oremj> or bouncer itself?
11:56 AM <oremj> the ELBs say they haven't returned any 5xx errors in the last 12 hours
11:57 AM <oremj> alright, looks like the CDN
12:08 PM <oremj> jgmize: I submitted a case with amazon, hopefully they can provide me with more data on why these are failing
12:08 PM <oremj> it's failing at the cloudfront or s3 level
12:08 PM <oremj> so I don't have a view in to what might be happening
12:09 PM <jgmize> thanks for the update oremj. mind keeping me in the loop with whatever info amazon gives us?
12:09 PM <oremj> will do
12:10 PM <jgmize> thanks again
12:10 PM <jgmize> oremj: did you already file a bug for tracking purposes or would you like me to?
12:11 PM <oremj> I didn't
12:12 PM <oremj> looks like we had error rates up to 60% for 8 non-sequential minutes
12:13 PM <jgmize> ok, filing now
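For reference, a minimal sketch of how a test could answer the question raised in the log: does the 5xx come from bouncer itself or from the location it redirects to? This is not how the bedrock tests are actually written. The names `trace_redirects`, `first_server_error`, and `http_fetch` are hypothetical, and it assumes each hop can be fetched without automatically following redirects.

```python
import http.client
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher takes a URL and returns (status, headers) WITHOUT following redirects.
Fetcher = Callable[[str], Tuple[int, Dict[str, str]]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def http_fetch(url: str) -> Tuple[int, Dict[str, str]]:
    """Hypothetical fetcher: one GET via http.client, which never follows redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = (parts.path or "/") + ("?" + parts.query if parts.query else "")
        conn.request("GET", path)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders())
    finally:
        conn.close()


def trace_redirects(url: str, fetch: Fetcher, max_hops: int = 5) -> List[Tuple[str, int]]:
    """Return [(url, status), ...] for each hop, stopping at the first non-redirect."""
    hops: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, headers = fetch(url)
        hops.append((url, status))
        # Header names are matched case-insensitively.
        lowered = {name.lower(): value for name, value in headers.items()}
        location = lowered.get("location")
        if status not in REDIRECT_CODES or not location:
            break
        url = urljoin(url, location)  # Location may be relative
    return hops


def first_server_error(hops: List[Tuple[str, int]]) -> Optional[Tuple[str, int]]:
    """Return the first hop that answered with a 5xx, or None if no hop did."""
    for url, status in hops:
        if 500 <= status <= 599:
            return url, status
    return None
```

Calling `first_server_error(trace_redirects(url, http_fetch))` on a failing download link would then name the hop that produced the 502, rather than only reporting the final status after redirects.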
Can we start logging the "X-Amz-Cf-Id" header on failed requests, if it exists?
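A rough sketch of what that logging could look like on the test side. The logger name and the `describe_failure` helper are hypothetical, and treating any status of 400 or above as a failure is an assumption. The only detail taken from this bug is the header name: `X-Amz-Cf-Id` is the opaque request ID CloudFront attaches to responses, which AWS support can use to look up a specific request.

```python
import logging
from typing import Mapping, Optional

log = logging.getLogger("download_link_tests")  # hypothetical logger name


def cloudfront_request_id(headers: Mapping[str, str]) -> Optional[str]:
    """Case-insensitive lookup of the X-Amz-Cf-Id header; None when it is absent."""
    for name, value in headers.items():
        if name.lower() == "x-amz-cf-id":
            return value
    return None


def describe_failure(url: str, status: int, headers: Mapping[str, str]) -> Optional[str]:
    """Log and return a message for a failed response; return None for successes.

    Assumption: any status >= 400 counts as a failure.
    """
    if status < 400:
        return None
    message = f"{status} from {url}"
    cf_id = cloudfront_request_id(headers)
    if cf_id is not None:
        # Only present when the response actually passed through CloudFront.
        message += f" (X-Amz-Cf-Id: {cf_id})"
    log.error(message)
    return message
```

Responses that never passed through CloudFront simply produce the message without the ID, so the helper is safe to call on every failed request.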
There is something going on with CloudFront: using another CloudFront distribution as an origin together with the "Match Viewer" origin protocol policy causes a 502 when hitting an uncached resource. If the object is already in cache, a 200 is returned. To alleviate the errors, I've switched the policy to "HTTP Only", so all requests are sent to the origin distribution via HTTP. I will continue to work with AWS support to find out why we are getting 502s in Match Viewer mode.
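To illustrate the workaround, here is a sketch of the same change expressed against a distribution config. This bug does not say how the setting was actually changed (console, API, or otherwise). The sketch assumes a dict shaped like the `DistributionConfig` returned by boto3's CloudFront `get_distribution_config`, where custom origins carry an `OriginProtocolPolicy` of `http-only`, `match-viewer`, or `https-only`. The function name is hypothetical.

```python
import copy
from typing import Any, Dict


def force_http_only_origins(distribution_config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with match-viewer custom origins set to http-only.

    Assumes the boto3 DistributionConfig shape:
    config["Origins"]["Items"][i]["CustomOriginConfig"]["OriginProtocolPolicy"].
    S3 origins have no CustomOriginConfig and are left untouched.
    """
    updated = copy.deepcopy(distribution_config)  # never mutate the caller's config
    for origin in updated.get("Origins", {}).get("Items", []):
        custom = origin.get("CustomOriginConfig")
        if custom and custom.get("OriginProtocolPolicy") == "match-viewer":
            custom["OriginProtocolPolicy"] = "http-only"
    return updated
```

With boto3, the returned dict would be passed back through `update_distribution` along with the distribution ID and the ETag from the earlier `get_distribution_config` call. That step is omitted here because it requires live credentials.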
Assignee: nobody → oremj
Component: Bouncer → Operations: Product Delivery
Product: Webtools → Cloud Services
QA Contact: oremj
Version: Trunk → unspecified
This was a temporary problem on some of the CloudFront nodes. Amazon says the problem is fixed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED