Closed Bug 1574425 Opened 5 years ago Closed 5 years ago

bitbar Android performance machines stopped taking jobs

Categories

(Infrastructure & Operations :: RelOps: Hardware, defect)

Type: defect
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Assigned: aerickson)

Details

Keep in mind that :aerickson is the main point of contact.

Our automated bot detected that 72 devices have gone offline in the last 1.5 hours, so this is recent, and Bitbar has already been notified about it (although they come online in 4 hours).

I've rebooted devicepool0 and things appear to be coming back up and starting jobs.

Flags: needinfo?(bob)

Timeline:
2019/08/16 07:12 Taskcluster (TC) requests start being aborted or hitting connection errors. These uncaught exceptions cause worker threads to stop starting new jobs.
2019/08/16 14:23 BC reboots devicepool. Issues are resolved.
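
To illustrate the failure mode in the timeline, here is a minimal sketch of how one uncaught exception ends a polling thread while the rest of the process keeps running. It is written for Python 3 (the logs show Python 2.7), and `worker`, `run_worker`, and the scripted responses are hypothetical names for illustration, not code from mozilla-bitbar-devicepool.

```python
import threading


def worker(fetch_pending, started_jobs):
    # No try/except here: the first exception raised by fetch_pending
    # propagates out of the thread target and ends this thread for good.
    while True:
        pending = fetch_pending()
        if pending is None:
            return
        started_jobs.append(pending)


def run_worker(responses):
    """Run one worker against scripted responses (hypothetical helper).

    Any Exception instance in `responses` is raised instead of returned,
    standing in for a failed HTTP request to the queue.
    """
    it = iter(responses)

    def fetch_pending():
        item = next(it, None)
        if isinstance(item, Exception):
            raise item
        return item

    started = []
    t = threading.Thread(
        target=worker, args=(fetch_pending, started), name="hypothetical-worker"
    )
    t.start()
    t.join()  # returns once the thread has died; the main thread is unaffected
    return started
```

Calling `run_worker([1, 2, ConnectionError("Connection aborted."), 3])` prints an "Exception in thread" traceback to stderr and the third response is never handled, because nothing restarts the thread.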

First exception:

Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]: Exception in thread mozilla-gw-batttest-g5:
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]: Traceback (most recent call last):
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     self.run()
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/usr/lib/python2.7/threading.py", line 754, in run
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     self.__target(*self.__args, **self.__kwargs)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/test_run_manager.py", line 101, in handle_queue
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     pending_tasks = get_taskcluster_pending_tasks(taskcluster_provisioner_id, worker_type)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/taskcluster.py", line 10, in get_taskcluster_pending_tasks
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     r = requests.get(taskcluster_queue_url)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/api.py", line 75, in get
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     return request('get', url, params=params, **kwargs)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/api.py", line 60, in request
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     return session.request(method=method, url=url, **kwargs)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     resp = self.send(prep, **send_kwargs)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     r = adapter.send(request, **kwargs)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 498, in send
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]:     raise ConnectionError(err, request=request)
Aug 16 07:12:14 bitbar-devicepool-0 bash[15911]: ConnectionError: ('Connection aborted.', error(0, 'Error'))

Exception near the end of the incident:

Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]: Exception in thread mozilla-gw-perftest-g5:
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]: Traceback (most recent call last):
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     self.run()
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/usr/lib/python2.7/threading.py", line 754, in run
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     self.__target(*self.__args, **self.__kwargs)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/test_run_manager.py", line 101, in handle_queue
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     pending_tasks = get_taskcluster_pending_tasks(taskcluster_provisioner_id, worker_type)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/mozilla_bitbar_devicepool/taskcluster.py", line 10, in get_taskcluster_pending_tasks
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     r = requests.get(taskcluster_queue_url)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/api.py", line 75, in get
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     return request('get', url, params=params, **kwargs)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/api.py", line 60, in request
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     return session.request(method=method, url=url, **kwargs)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 533, in request
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     resp = self.send(prep, **send_kwargs)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/sessions.py", line 646, in send
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     r = adapter.send(request, **kwargs)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:   File "/home/bitbar/mozilla-bitbar-devicepool/venv/local/lib/python2.7/site-packages/requests/adapters.py", line 516, in send
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]:     raise ConnectionError(e, request=request)
Aug 16 14:21:46 bitbar-devicepool-0 bash[15911]: ConnectionError: HTTPSConnectionPool(host='queue.taskcluster.net', port=443): Max retries exceeded with url: /v1/pending/proj-autophone/geck
Assignee: nobody → aerickson
Status: NEW → ASSIGNED

I've rolled out a fix to devicepool0 that handles these exceptions.

https://github.com/bclary/mozilla-bitbar-devicepool/pull/36
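
For readers who do not want to open the PR, the general shape of "handling these exceptions" in a polling loop might look like the sketch below. This is an assumption about the approach, not the contents of the PR: `poll_pending_tasks`, `fetch_pending`, `handle_pending`, `max_iterations`, and `retry_delay` are hypothetical names, and the sketch is Python 3 with a broad `except` where the real code would likely catch the `requests` exception types seen in the tracebacks.

```python
import logging
import time

logger = logging.getLogger("devicepool-sketch")


def poll_pending_tasks(fetch_pending, handle_pending, max_iterations, retry_delay=0.0):
    """Poll for pending tasks; log and retry on failure instead of dying.

    Returns the number of failed fetches so callers can monitor error rates.
    """
    failures = 0
    for _ in range(max_iterations):
        try:
            pending = fetch_pending()
        except Exception as exc:  # assumption: real code would narrow this
            failures += 1
            logger.warning("fetching pending tasks failed: %s", exc)
            time.sleep(retry_delay)  # back off briefly, then try again
            continue
        handle_pending(pending)
    return failures
```

With this structure a transient connection error costs one polling cycle rather than the whole worker thread, so a reboot should not be needed to recover.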

Closing as the incident is over and I've deployed a fix.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED