Closed Bug 1598716 Opened 5 years ago Closed 5 years ago

bitbar-devicepool: failed Bitbar requests stop processing

Categories

(Infrastructure & Operations :: RelOps: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aerickson, Assigned: aerickson)

Details

Proposed solution: Make the exception fatal and have systemd restart the program until it's resolved.
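
A minimal sketch of the "make the exception fatal" half of this proposal, written for Python 3 using only the standard library. It is not the code from the merged PR. The names `run_fatal`, `step`, and `max_iterations` are hypothetical and exist only for illustration. It assumes the failing call raises an `OSError` subclass, which I believe covers `requests.exceptions.ConnectionError` under Python 3. The restart half would be left to systemd, for example with a `Restart=` setting in the service unit.

```python
import logging

logger = logging.getLogger("devicepool_sketch")

EXIT_OK = 0
EXIT_FAILURE = 1


def run_fatal(step, max_iterations=None):
    """Call step() in a loop and stop at the first network-level error.

    Returns the exit code the process should terminate with. Both `step`
    and `max_iterations` are hypothetical names used for this sketch;
    `max_iterations=None` means loop until an error occurs.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        try:
            step()
        except OSError as exc:
            # Assumption: the request failure surfaces as an OSError subclass
            # (the built-in ConnectionError is one). Instead of swallowing it
            # and idling, report it and give up so a supervisor can restart us.
            logger.error("fatal request failure, exiting: %s", exc)
            return EXIT_FAILURE
        iterations += 1
    return EXIT_OK
```

An entry point would pass the result to `sys.exit()`, so that the service manager sees a non-zero status and restarts the program instead of leaving it running but idle.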

Background: PagerDuty alerted me that android-hw worker health was very low. The workers had been idle for an hour because of the exception below. The process was still running, but it was not starting any jobs.

Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]: Traceback (most recent call last):
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/main.py", line 170, in <module>
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     main()
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/main.py", line 167, in main
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     args.func(args)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/main.py", line 58, in test_run_manager
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     manager.run()
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/test_run_manager.py", line 256, in run
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     self.get_bitbar_test_stats(project_name, projects_config[project_name])
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/test_run_manager.py", line 68, in get_bitbar_test_stats
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     temp_offline_devices = get_offline_devices(device_model=project_config.get('device_model', None))
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/devices.py", line 78, in get_offline_devices
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     device_problems = get_device_problems(device_model=device_model)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/devices.py", line 68, in get_device_problems
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     data = TESTDROID.get(path=path, payload=payload)['data']
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/testdroid/__init__.py", line 254, in get
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     res =  requests.get(url, params=payload, headers=headers)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/api.py", line 75, in get
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     return request('get', url, params=params, **kwargs)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/api.py", line 60, in request
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     return session.request(method=method, url=url, **kwargs)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     resp = self.send(prep, **send_kwargs)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     r = adapter.send(request, **kwargs)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]:     raise ConnectionError(e, request=request)
Nov 22 15:41:08 bitbar-devicepool-0 bash[17169]: requests.exceptions.ConnectionError: HTTPSConnectionPool(host='mozilla.testdroid.com', port=443): Max retries exceeded with url: /api/v2/admin/device-problems?limit=0 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f7360aa4bd0>: Failed to establish a new connection: [Errno 110] Connection timed out',))
Status: NEW → ASSIGNED

The PR has been merged and the active devicepool host is running the code.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED