Closed Bug 1578904 Opened 5 years ago Closed 5 years ago

[aws provider] Worker registration takes long

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: owlish, Assigned: owlish)


Worker registration for the aws provider takes a long time, which leads to the estimator not working correctly.

TODO:

  • Investigate how the registration time can be improved
  • Investigate other ways to alleviate the problem (ensuring idempotency of the requests, etc.)
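
As a rough illustration of the idempotency idea in the TODO list, here is a minimal sketch in which a repeated registration request for the same worker returns the stored result instead of creating a second record. The `Registry` class, its `register` method, and the worker ID and credential strings are all hypothetical stand-ins, not worker manager's actual API.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for the worker records (assumption)."""
    workers: dict = field(default_factory=dict)
    writes: int = 0

    def register(self, worker_id: str, credentials: str) -> str:
        # A repeated request for an already-registered worker returns the
        # stored result rather than writing a second record.
        if worker_id in self.workers:
            return self.workers[worker_id]
        self.workers[worker_id] = credentials
        self.writes += 1
        return credentials


registry = Registry()
first = registry.register("worker-1", "creds-a")
retry = registry.register("worker-1", "creds-b")  # e.g. a retry after a timeout
```

With this shape, a client that retries after a slow or lost response cannot create duplicate state: `retry` carries the same value as `first`, and only one write is recorded.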

Tests with 1 worker: the longest interval is between the worker record's creation time and the /register/worker request appearing in the logs, so the delay seems to occur outside of worker manager. Also, instances spend only seconds in the starting state, whereas the overall registration time is on the order of minutes.
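
One way to make this kind of breakdown repeatable is to compute the gap between consecutive lifecycle events and pick the largest. The sketch below assumes the timestamps have already been pulled from the DB record and the logs. The event names and the timestamp values are illustrative placeholders, not measurements from this bug.

```python
from datetime import datetime


def phase_durations(events):
    """Return seconds elapsed between consecutive events, ordered by time."""
    ordered = sorted(events.items(), key=lambda kv: kv[1])
    return {
        f"{a} -> {b}": (tb - ta).total_seconds()
        for (a, ta), (b, tb) in zip(ordered, ordered[1:])
    }


def slowest_phase(events):
    """Return the name of the phase with the largest duration."""
    durations = phase_durations(events)
    return max(durations, key=durations.get)


# Placeholder timestamps for illustration only (assumption, not real data).
events = {
    "record_created": datetime(2000, 1, 1, 12, 0, 0),
    "instance_running": datetime(2000, 1, 1, 12, 0, 5),
    "register_request": datetime(2000, 1, 1, 12, 3, 0),
}
durations = phase_durations(events)
slowest = slowest_phase(events)
```

Adding further events, such as worker-runner start, would narrow down where inside the longest phase the time is spent.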

Apparently the worker-runner and worker logs need to be examined.

There's also the matter of looping over an array of instances when creating worker records in the DB, which takes a number of writes linear in the number of instances. Given that registering a single instance already takes minutes, this doesn't seem to be the bottleneck; however, it can also be dealt with (saving the whole array of instances into a single record, for example, would reduce it to one write regardless of the number of instances).
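
The difference between the two approaches can be sketched by counting writes. `CountingStore`, the record shapes, and the instance IDs below are hypothetical and only stand in for the real DB layer.

```python
class CountingStore:
    """Hypothetical store that counts write operations (assumption)."""

    def __init__(self):
        self.records = {}
        self.writes = 0

    def put(self, key, value):
        self.records[key] = value
        self.writes += 1


def save_per_instance(store, instances):
    # One write per instance: the number of writes grows linearly.
    for instance_id in instances:
        store.put(instance_id, {"state": "requested"})


def save_as_batch(store, batch_id, instances):
    # One write for the whole array, regardless of its length.
    store.put(batch_id, {"instances": list(instances), "state": "requested"})


instances = ["i-1", "i-2", "i-3"]

per_instance = CountingStore()
save_per_instance(per_instance, instances)

batched = CountingStore()
save_as_batch(batched, "batch-1", instances)
```

The trade-off is that a single batched record makes per-worker lookups and updates less direct, so this would only be worth doing if the record-creation loop turned out to matter.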

Upon examining the worker logs, it appears that it takes around 1s or less from worker-runner start to worker registration. So the bottleneck seems to reside between instance start and worker-runner start; that's the part that takes the longest.

It seems like the bottleneck might be common to the aws provider and the gcp provider: if it's on our side, the gcp provider might have the same problem. If it comes from the cloud, it wouldn't necessarily manifest itself in other clouds. I am curious about the results of google provider load testing, if any. bstack?

Flags: needinfo?(bstack)
Status: NEW → ASSIGNED

After a meeting with bstack and dustin, it turns out that the delay is outside of our systems and our control. It also turns out that it's actually not that long.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Flags: needinfo?(bstack)
Resolution: --- → WORKSFORME