occasional 502s from balrog web cluster
Categories: Cloud Services :: Operations: Balrog, task
Tracking: Not tracked
People: Reporter: bhearsum, Assigned: oremj
Attachments: 1 file (347.73 KB, image/png)
We've noticed a small number of 502s coming from the Balrog web cluster in the past few weeks. The ones we've seen happen during one of our release automation tests, which gets a 502 on each of several consecutive retries. For example:
[task 2020-04-16T03:46:45.507Z] Downloading 'https://aus5.mozilla.org/update/3/Firefox/63.0/20180906162647/Darwin_x86_64-gcc3-u-i386-x86_64/ar/beta-localtest/default/default/default/update.xml?force=1' and placing in cache...
[task 2020-04-16T03:46:45.861Z] --2020-04-16 03:46:45-- https://aus5.mozilla.org/update/3/Firefox/63.0/20180906162647/Darwin_x86_64-gcc3-u-i386-x86_64/ar/beta-localtest/default/default/default/update.xml?force=1
[task 2020-04-16T03:46:45.863Z] Resolving aus5.mozilla.org (aus5.mozilla.org)... 13.224.13.123, 13.224.13.48, 13.224.13.125, ...
[task 2020-04-16T03:46:45.871Z] Connecting to aus5.mozilla.org (aus5.mozilla.org)|13.224.13.123|:443... connected.
[task 2020-04-16T03:46:54.929Z] HTTP request sent, awaiting response...
[task 2020-04-16T03:46:54.929Z] HTTP/1.1 502 Bad Gateway
[task 2020-04-16T03:46:54.929Z] Content-Type: text/html; charset=UTF-8
[task 2020-04-16T03:46:54.929Z] Content-Length: 332
[task 2020-04-16T03:46:54.929Z] Connection: keep-alive
[task 2020-04-16T03:46:54.929Z] Referrer-Policy: no-referrer
[task 2020-04-16T03:46:54.929Z] Date: Thu, 16 Apr 2020 03:46:54 GMT
[task 2020-04-16T03:46:54.929Z] Alt-Svc: clear
[task 2020-04-16T03:46:54.929Z] X-Cache: Error from cloudfront
[task 2020-04-16T03:46:54.929Z] Via: 1.1 1b74ccf4cb51eacf97a0e6d60ae46a3f.cloudfront.net (CloudFront)
[task 2020-04-16T03:46:54.929Z] X-Amz-Cf-Pop: SEA19-C2
[task 2020-04-16T03:46:54.929Z] X-Amz-Cf-Id: TBg_tucnh1e-0AXzuQDju2dNAGVxmQaxr13zAtWfpx6Mwrtze5DnsA==
[task 2020-04-16T03:46:54.929Z] 2020-04-16 03:46:54 ERROR 502: Bad Gateway.
[task 2020-04-16T03:46:54.929Z]
[task 2020-04-16T03:46:56.091Z] --2020-04-16 03:46:56-- https://aus5.mozilla.org/update/3/Firefox/63.0/20180906162647/Darwin_x86_64-gcc3-u-i386-x86_64/ar/beta-localtest/default/default/default/update.xml?force=1
[task 2020-04-16T03:46:56.093Z] Resolving aus5.mozilla.org (aus5.mozilla.org)... 13.224.13.66, 13.224.13.123, 13.224.13.48, ...
[task 2020-04-16T03:46:56.101Z] Connecting to aus5.mozilla.org (aus5.mozilla.org)|13.224.13.66|:443... connected.
[task 2020-04-16T03:46:56.198Z] HTTP request sent, awaiting response...
[task 2020-04-16T03:46:56.198Z] HTTP/1.1 502 Bad Gateway
[task 2020-04-16T03:46:56.198Z] Content-Type: text/html; charset=UTF-8
[task 2020-04-16T03:46:56.198Z] Content-Length: 332
[task 2020-04-16T03:46:56.198Z] Connection: keep-alive
[task 2020-04-16T03:46:56.198Z] Referrer-Policy: no-referrer
[task 2020-04-16T03:46:56.198Z] Date: Thu, 16 Apr 2020 03:46:54 GMT
[task 2020-04-16T03:46:56.198Z] Alt-Svc: clear
[task 2020-04-16T03:46:56.198Z] X-Cache: Error from cloudfront
[task 2020-04-16T03:46:56.199Z] Via: 1.1 f9d716a351f14a0ac1fac2449734849b.cloudfront.net (CloudFront)
[task 2020-04-16T03:46:56.199Z] X-Amz-Cf-Pop: SEA19-C2
[task 2020-04-16T03:46:56.199Z] X-Amz-Cf-Id: wpklRbsLkPFekLm33TLr5E50ywPq7SwqV7r0DXRreoKu1s9SKD1NKA==
[task 2020-04-16T03:46:56.199Z] Age: 2
[task 2020-04-16T03:46:56.199Z] 2020-04-16 03:46:56 ERROR 502: Bad Gateway.
[task 2020-04-16T03:46:56.199Z]
[task 2020-04-16T03:46:59.136Z] --2020-04-16 03:46:59-- https://aus5.mozilla.org/update/3/Firefox/63.0/20180906162647/Darwin_x86_64-gcc3-u-i386-x86_64/ar/beta-localtest/default/default/default/update.xml?force=1
[task 2020-04-16T03:46:59.261Z] Resolving aus5.mozilla.org (aus5.mozilla.org)... 13.224.13.123, 13.224.13.66, 13.224.13.48, ...
[task 2020-04-16T03:46:59.269Z] Connecting to aus5.mozilla.org (aus5.mozilla.org)|13.224.13.123|:443... connected.
[task 2020-04-16T03:46:59.304Z] HTTP request sent, awaiting response...
[task 2020-04-16T03:46:59.304Z] HTTP/1.1 502 Bad Gateway
[task 2020-04-16T03:46:59.304Z] Content-Type: text/html; charset=UTF-8
[task 2020-04-16T03:46:59.304Z] Content-Length: 332
[task 2020-04-16T03:46:59.304Z] Connection: keep-alive
[task 2020-04-16T03:46:59.304Z] Referrer-Policy: no-referrer
[task 2020-04-16T03:46:59.304Z] Date: Thu, 16 Apr 2020 03:46:54 GMT
[task 2020-04-16T03:46:59.304Z] Alt-Svc: clear
[task 2020-04-16T03:46:59.304Z] X-Cache: Error from cloudfront
[task 2020-04-16T03:46:59.304Z] Via: 1.1 7022a5bbf9872d4a09d63e6cdb457dfe.cloudfront.net (CloudFront)
[task 2020-04-16T03:46:59.304Z] X-Amz-Cf-Pop: SEA19-C2
[task 2020-04-16T03:46:59.304Z] X-Amz-Cf-Id: 0JZVEjW3TWORMswt7-1cIMiG9-Vetu-goUWUBrj8TUWZ2Cz922ZRdA==
[task 2020-04-16T03:46:59.304Z] Age: 5
[task 2020-04-16T03:46:59.304Z] 2020-04-16 03:46:59 ERROR 502: Bad Gateway.
[task 2020-04-16T03:46:59.304Z]
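One detail in the log above: all three responses carry the same Date header (03:46:54 GMT), and the second and third add Age: 2 and Age: 5. That pattern is consistent with the retries receiving a cached copy of the first error rather than three fresh origin failures, although the log alone does not prove it. A minimal Python sketch of that check follows; the function names are hypothetical and are not part of Balrog or the release automation.

```python
def parse_headers(lines):
    """Parse 'Name: value' header lines into a dict keyed by lower-cased name."""
    headers = {}
    for line in lines:
        # Split on the first colon only, so values containing colons survive.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def looks_like_cached_error(status, headers):
    """Heuristic: a 5xx that CloudFront marks as an error and that has an Age header."""
    return (
        500 <= status <= 599
        and headers.get("x-cache", "").lower() == "error from cloudfront"
        and "age" in headers
    )


# Header values copied from the first and second attempts in the log.
first_attempt = parse_headers([
    "Date: Thu, 16 Apr 2020 03:46:54 GMT",
    "X-Cache: Error from cloudfront",
])
second_attempt = parse_headers([
    "Date: Thu, 16 Apr 2020 03:46:54 GMT",
    "X-Cache: Error from cloudfront",
    "Age: 2",
])

print(looks_like_cached_error(502, first_attempt))
print(looks_like_cached_error(502, second_attempt))
```

If this reading is right, retries issued within a few seconds of the first failure may say little about whether the origin has recovered.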
Comment 1 (Reporter) • 5 years ago
We saw similar (but slightly different) errors in https://bugzilla.mozilla.org/show_bug.cgi?id=1622267 and https://bugzilla.mozilla.org/show_bug.cgi?id=1622266, so they could be related.
Comment 2 (Assignee) • 5 years ago
Here's a graph of the 5xx errors during that time period. We did see a few 502 spikes, but even those spikes maxed out at 0.075% of total requests during those windows.
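For a sense of scale, the percentage above is the error count divided by the total request count in a window. A small Python sketch of that arithmetic; the counts are made-up illustrative numbers, not values read from the graph.

```python
def error_rate_percent(error_count, total_count):
    """Return errors as a percentage of all requests in a window."""
    if total_count <= 0:
        raise ValueError("total_count must be positive")
    return 100.0 * error_count / total_count


# Hypothetical window: 75 responses with status 502 out of 100,000 requests.
rate = error_rate_percent(75, 100_000)
print(f"{rate:.3f}%")  # prints 0.075%
```

At that rate, roughly 1 request in 1,333 fails during a spike.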
Comment 3 • 5 years ago
The internet seems to be having a bad day. GitHub is throwing 500s as well.
Comment 4 (Reporter) • 5 years ago
(In reply to Hassan Ali (:hassan) from comment #3)
> The internet seems to be having a bad day. GitHub is throwing 500s as well.

I think today's issues are unrelated; the ones in comment #0 here are from 5 days ago.
Comment 5 (Assignee) • 5 years ago
I've adjusted the preStop hook on the balrog/nginx pods, which should ensure that they aren't shut down before the load balancer is done sending requests to them. Quick testing suggests this should solve the issue here.
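The actual change is not shown in this bug. As an illustration only, a common shape for this kind of fix is a preStop hook that delays container shutdown long enough for the load balancer to stop routing to the pod. The Python sketch below builds such a container spec fragment and prints it as JSON; the 15-second delay, the shell command, and the helper name are all assumptions.

```python
import json

# Assumed drain window; the real value depends on how quickly the
# load balancer deregisters a terminating pod.
DRAIN_SECONDS = 15


def lifecycle_with_prestop(drain_seconds):
    """Build a container 'lifecycle' fragment whose preStop hook sleeps first."""
    return {
        "lifecycle": {
            "preStop": {
                "exec": {
                    # Hypothetical command: wait out the drain window before
                    # Kubernetes sends SIGTERM to the container.
                    "command": ["/bin/sh", "-c", f"sleep {drain_seconds}"]
                }
            }
        }
    }


container_patch = lifecycle_with_prestop(DRAIN_SECONDS)
print(json.dumps(container_patch, indent=2))
```

For a hook like this to run to completion, the pod's terminationGracePeriodSeconds needs to be longer than the sleep.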