Intermittent requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://autograph-external.stage.autograph.services.mozaws.net/sign/file
Categories: Cloud Services :: Operations: Autograph, defect, P5
Tracking: Not tracked
People: Reporter: intermittent-bug-filer, Unassigned
Keywords: intermittent-failure
Filed by: csabou [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=254970506&repo=mozilla-inbound
Full log: https://queue.taskcluster.net/v1/task/MXY4VF-3SBSMPERlkzxrQw/runs/0/artifacts/public/logs/live_backing.log
2019-07-05 18:20:16,067 - urllib3.connectionpool - DEBUG - Starting new HTTPS connection (1): autograph-external.stage.autograph.services.mozaws.net:443
2019-07-05 18:20:16,953 - urllib3.connectionpool - DEBUG - https://autograph-external.stage.autograph.services.mozaws.net:443 "POST /sign/file HTTP/1.1" 502 166
2019-07-05 18:20:16,954 - iscript.autograph - DEBUG - Autograph response: <html>
<head><title>502 Bad Gateway</title></head>
<body bgcolor="white">
<center><h1>502 Bad Gateway</h1></center>
2019-07-05 18:20:16,955 - scriptworker_client.aio - WARNING - retry_async: call_autograph: too many retries!
Traceback (most recent call last):
File "/builds/dep2/bin/iscript", line 11, in <module>
load_entry_point('iscript', 'console_scripts', 'iscript')()
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/script.py", line 85, in main
return sync_main(async_main, default_config=get_default_config())
File "/builds/dep2/scriptworker-scripts/scriptworker_client/src/scriptworker_client/client.py", line 125, in sync_main
loop.run_until_complete(_handle_asyncio_loop(async_main, config, task))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
return future.result()
File "/builds/dep2/scriptworker-scripts/scriptworker_client/src/scriptworker_client/client.py", line 156, in _handle_asyncio_loop
await async_main(config, task)
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/script.py", line 55, in async_main
await sign_and_pkg_behavior(config, task)
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/mac.py", line 1234, in sign_and_pkg_behavior
await sign_all_apps(config, key_config, entitlements_path, all_paths)
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/mac.py", line 638, in sign_all_apps
await raise_future_exceptions(futures)
File "/builds/dep2/scriptworker-scripts/scriptworker_client/src/scriptworker_client/aio.py", line 55, in raise_future_exceptions
raise exceptions[0]
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/autograph.py", line 396, in sign_omnija_with_autograph
key_config, from_, "autograph_omnija", to=signed_out
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/autograph.py", line 318, in sign_file_with_autograph
await sign_with_autograph(key_config, input_bytes, fmt, "file")
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/autograph.py", line 289, in sign_with_autograph
sleeptime_kwargs={"delay_factor": 2.0},
File "/builds/dep2/scriptworker-scripts/scriptworker_client/src/scriptworker_client/aio.py", line 154, in retry_async
return await func(*args, **kwargs)
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/autograph.py", line 218, in call_autograph
r.raise_for_status()
File "/builds/dep2/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://autograph-external.stage.autograph.services.mozaws.net/sign/file
exit code: 1
Comment 2•6 years ago
Looks like autograph-external.stage.autograph.services.mozaws.net had an outage lasting a few hours, starting at 2019-07-05 18:20. It mostly affected dep-mac-v3-signing1a/1b, but also some of the depsigning-workerN. There are already a couple of retries for omnija signing.
Callek, should we be pointing at a stage host for dep omnija signing? I was expecting to see prod there, like we do for widevine.
Comment 5•6 years ago
Thanks! Let's close this bug out as 'autograph outage' and use bug 1564264 to move omnija signing to autograph prod.
Comment 6•6 years ago
Saw this again on today's beta simulations:
https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=256351305&revision=5999f810cba92a3a56cf034c9007a73804801e5a&searchStr=os%2Cx%2Ccross%2Ccompiled%2Cdevedition%2Copt%2Cbuild-signing-macosx64-devedition-nightly%2Fopt%2C%28ns%29
File "/builds/dep2/scriptworker-scripts/iscript/src/iscript/autograph.py", line 218, in call_autograph
r.raise_for_status()
File "/builds/dep2/lib/python3.7/site-packages/requests-2.22.0-py3.7.egg/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://autograph-external.stage.autograph.services.mozaws.net/sign/file
exit code: 1
Comment 7•6 years ago
Failure appeared on autoland too.
Log link: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=256375792&repo=autoland&lineNumber=462
Log snippet:
2019-07-13 19:52:52,396 - signingscript.sign - INFO - Creating zipfile /builds/scriptworker/work/public/build/target.zip...
2019-07-13 19:53:05,559 - signingscript.task - INFO - sign(): Signing /builds/scriptworker/work/public/build/target.zip with autograph_omnija...
2019-07-13 19:53:05,560 - signingscript.utils - INFO - mkdir /builds/scriptworker/work/ojzipbkqh8db9
2019-07-13 19:54:13,169 - scriptworker.utils - WARNING - retry_async: call_autograph: too many retries!
Traceback (most recent call last):
File "/builds/scriptworker/bin/signingscript", line 11, in <module>
sys.exit(main())
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/script.py", line 115, in main
return scriptworker.client.sync_main(async_main, default_config=get_default_config())
File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/client.py", line 164, in sync_main
loop.run_until_complete(_handle_asyncio_loop(async_main, context))
File "/tools/python3/lib/python3.6/asyncio/base_events.py", line 468, in run_until_complete
return future.result()
File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/client.py", line 204, in _handle_asyncio_loop
await async_main(context)
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/script.py", line 58, in async_main
context, os.path.join(work_dir, path), path_dict['formats']
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/task.py", line 208, in sign
output = await signing_func(context, output, fmt)
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/sign.py", line 479, in sign_omnija
return await signing_func(context, orig_path, fmt)
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/sign.py", line 519, in sign_omnija_zip
await raise_future_exceptions(tasks)
File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/utils.py", line 327, in raise_future_exceptions
succeeded_results, _ = await _process_future_exceptions(tasks, raise_at_first_error=True)
File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/utils.py", line 361, in _process_future_exceptions
raise exc
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/sign.py", line 1210, in sign_omnija_with_autograph
await sign_file_with_autograph(context, from_, "autograph_omnija", to=signed_out)
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/sign.py", line 965, in sign_file_with_autograph
signed_bytes = base64.b64decode(await sign_with_autograph(s, input_bytes, fmt, 'file'))
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/sign.py", line 931, in sign_with_autograph
attempts=3, sleeptime_kwargs={'delay_factor': 2.0})
File "/builds/scriptworker/lib/python3.6/site-packages/scriptworker/utils.py", line 261, in retry_async
return await func(*args, **kwargs)
File "/builds/scriptworker/lib/python3.6/site-packages/signingscript/sign.py", line 868, in call_autograph
r.raise_for_status()
File "/builds/scriptworker/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 502 Server Error: Bad Gateway for url: https://autograph-external.stage.autograph.services.mozaws.net/sign/file
exit code: 1
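The traceback above shows the signing call wrapped in retry_async with attempts=3 and a delay_factor of 2.0. The following is a minimal sketch of that kind of bounded retry with a growing delay. It is not the scriptworker implementation: the names retry_with_backoff and BadGateway, the base_delay value, and the injectable sleep parameter are all assumptions made for illustration.

```python
import asyncio


class BadGateway(Exception):
    """Hypothetical stand-in for the HTTP 502 error raised by the signing request."""


async def retry_with_backoff(func, attempts=3, base_delay=1.0, delay_factor=2.0,
                             retry_on=(BadGateway,), sleep=asyncio.sleep):
    """Await func() up to `attempts` times, multiplying the delay after each failure.

    The last failure is re-raised unchanged, which is how a persistent 502
    ends up as the task's final exception.
    """
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return await func()
        except retry_on:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            await sleep(delay)
            delay *= delay_factor
```

With these assumed defaults the whole retry budget is only a few seconds, so retries of this kind can absorb a brief blip but not an outage lasting hours.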
Comment 9•6 years ago
This seems to be permafailing on certain pushes, with green runs in between.
Comment 10•6 years ago
Greg, please check what's causing this frequent failure.
Comment 11•6 years ago
Looks like a stage CloudHSM disconnect on the "i-0ce34be64db461c14" instance. We have a bunch of these errors in the logs:
"pkcs11: 0x5: CKR_GENERAL_ERROR"
"2019/07/15 14:16:04 Failed to initialize PKCS#11 library: pkcs11: 0x5: CKR_GENERAL_ERROR"
Need to restart autograph on the instance.
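For spotting this condition in logs, a small scanner along the following lines could count PKCS#11 errors by return code. This is only a sketch: the function name and the regular expression are assumptions based on the two log lines quoted above, not something taken from Autograph or its tooling.

```python
import re
from collections import Counter

# Matches the "pkcs11: 0x5: CKR_GENERAL_ERROR" form seen in the quoted log lines.
PKCS11_ERROR = re.compile(r"pkcs11: (0x[0-9a-fA-F]+): (CKR_[A-Z0-9_]+)")


def count_pkcs11_errors(lines):
    """Return a Counter of PKCS#11 return-code names, one count per matching line."""
    counts = Counter()
    for line in lines:
        match = PKCS11_ERROR.search(line)
        if match:
            counts[match.group(2)] += 1
    return counts
```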
Comment 12•6 years ago
Autograph was already flapping on the node, so restarting it wouldn't have helped. Reconnecting with cloudhsm_mgmt_util worked, but didn't fix the problem. I detached the instance from the ASG and replaced it with a new one, so the error should be fixed until the next CloudHSM disconnect (see issue https://github.com/mozilla-services/autograph/issues/244 and bug 1564119; we also want lbheartbeat to reflect the HSM connection state).
Comment 15•6 years ago
We now have monitoring of HSM disconnects, and instances in a bad state will be taken out of rotation. Bobm is verifying this behavior this week, after which we can safely close this bug.
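The idea of taking bad instances out of rotation can be sketched as a heartbeat that reports unhealthy whenever the HSM connection check fails, so a load balancer polling it would stop routing to that instance. This is a hypothetical illustration, not Autograph's actual heartbeat code: the heartbeat function, the check_hsm callable, and the response strings are all assumptions.

```python
from typing import Callable, Tuple


def heartbeat(check_hsm: Callable[[], bool]) -> Tuple[int, str]:
    """Return an (HTTP status, body) pair for a load balancer health check.

    check_hsm is a hypothetical probe that returns True when the HSM
    connection is usable. A probe that raises is treated the same as one
    that returns False.
    """
    try:
        healthy = check_hsm()
    except Exception:
        healthy = False
    if healthy:
        return 200, "OK"
    return 503, "HSM unavailable"
```

Tying the health check to the HSM connection means a disconnected instance fails its check instead of serving requests that end in a 502.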
Comment 16•6 years ago
(In reply to Julien Vehent [:ulfr] from comment #15)
We now have monitoring of HSM disconnects and instances in a bad state will be taken out of rotation. Bobm is verifying this behavior this week, after which we can safely close this bug.
Verified.