Closed Bug 1563377 Opened 3 months ago Closed 2 months ago

Make bitbar device pool manager multi-threaded

Categories

(Testing :: Autophone, enhancement, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bc, Assigned: aerickson)

We currently iterate over the Bitbar projects, checking for pending tasks on Taskcluster and starting tests as needed. I recently deployed a hot-patch that queues up tests equal to twice the number of devices in a device group to help with the issue of short-lived superseded jobs. See bug 1563307.

Each Bitbar test takes 10-13 seconds to start, not counting the other auxiliary tasks involved. As the number of devices in a group grows, the time to start tests for every device grows as well; with the pre-population of tests it is now over 15 minutes for the perf p2 and g5 projects.
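A rough back-of-the-envelope sketch of why sequential start-up gets slow. The device count of 40 is hypothetical and not a figure from this bug; only the 10-13 second start time and the 2x queue factor come from the description above.

```python
def sequential_start_seconds(devices, seconds_per_start, queue_factor=2):
    """Time to start queue_factor * devices tests one after another."""
    return devices * queue_factor * seconds_per_start


# Hypothetical group of 40 devices (an assumption, not taken from this bug).
low = sequential_start_seconds(40, 10)
high = sequential_start_seconds(40, 13)
print(f"{low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumed numbers a single project would need somewhere between 13 and 18 minutes, and every project processed after it in the same loop waits at least that long.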

We should convert the test run manager into a multi-threaded script that runs each project on its own thread, so that no project has to wait for the others to be processed.
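A minimal sketch of the thread-per-project idea, not the actual device pool code. The names `process_project` and `run_all`, the pending-task counts, and the `time.sleep` placeholder for starting a Bitbar test are all assumptions made for illustration.

```python
import threading
import time


def process_project(name, pending, results, lock):
    """Stand-in for one project's loop: start a test for each pending task."""
    started = 0
    for _ in range(pending):
        time.sleep(0.01)  # placeholder for the 10-13 s Bitbar test start
        started += 1
    with lock:  # guard the shared dict while recording this project's count
        results[name] = started


def run_all(projects):
    """Run each project on its own thread and wait for all of them."""
    results = {}
    lock = threading.Lock()
    threads = [
        threading.Thread(
            target=process_project,
            args=(name, pending, results, lock),
            name=name,
        )
        for name, pending in projects.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Hypothetical pending-task counts per project.
    print(run_all({"unit-p2": 3, "perf-g5": 5, "perf-p2": 8}))
```

With one thread per project, the wall-clock time is bounded by the slowest project rather than by the sum of all projects, which is the point of the proposal. Because the work is dominated by waiting on network calls, plain threads are a reasonable fit here.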

Side Note: Using autophone's old component for android-hw @ bitbar since it is already available and why not let autophone live on even if in name only. ;-)

See Also: → 1562988

We should hit this first as it will do the most to alleviate the problem with idle devices.

Priority: -- → P1

Andrew and I have deployed a work-in-progress changeset that converts the test run manager to run each project on a separate thread. So far it is working well; however, it has surfaced an issue with Bitbar that causes our connection to the Bitbar API to fail after an hour or so of operation. The queue is now over 6100 and I do not know when we will be able to get it under control. Hopefully we will have a resolution for the Bitbar issue tomorrow, but the ETA is unknown at this time.

The Bitbar issue was related to an API call to device problems. When I stubbed that out, the system no longer caused back-end problems at Bitbar. We are currently running well, though the superseded jobs still pending are taking a while to work through.

unit-p2 4 hour backlog on production
perf-g5 3 hour backlog on production
perf-p2 21 hour backlog on production

Status: NEW → ASSIGNED
Blocks: 1483695

The PR has landed. devicepool0 is running the code.

Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED