Closed Bug 1109871 Opened 10 years ago Closed 7 years ago

aws_watch_pending.py robustness improvements

Categories

(Release Engineering :: General, defect)

Platform: x86_64 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: rail, Unassigned)

References

Details

It turns out that process_timeout tries to kill the process, but probably not its process group, so the main process keeps running in parallel with a newly started one, which leads to duplicate instance names.
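One way to make a timeout kill everything is to start the watched command in its own session, so that it leads a new process group, and then signal the whole group with os.killpg() instead of only the direct child's PID. This is a minimal sketch, assuming the timeout wrapper launches the work through subprocess on Linux; run_with_timeout, _signal_group and the grace parameter are hypothetical names, not the script's actual process_timeout.

```python
import os
import signal
import subprocess


def _signal_group(pgid, sig):
    """Send sig to every process in the group; ignore an already-empty group."""
    try:
        os.killpg(pgid, sig)
    except ProcessLookupError:
        pass


def run_with_timeout(cmd, timeout, grace=5.0):
    """Run cmd (hypothetical helper), signalling its whole process group
    if it overruns. Returns the exit status, or None on timeout."""
    # start_new_session=True runs setsid() in the child, making it the
    # leader of a new process group whose PGID equals its PID.
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        pgid = proc.pid
        # SIGTERM the whole group, not just the direct child, so any
        # helpers it spawned are signalled as well.
        _signal_group(pgid, signal.SIGTERM)
        try:
            proc.wait(timeout=grace)
        except subprocess.TimeoutExpired:
            # Escalate if the group leader ignored SIGTERM.
            _signal_group(pgid, signal.SIGKILL)
            proc.wait()
        return None
```

Signalling the group only reaches processes that stay in it: a helper that calls setsid() itself escapes, which is one reason to pair this with a single-instance guard.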

It would also be great to get rid of pre-defined hostnames, but that's out of scope for this bug.
I haven't been working on this for a while. Back to the pool.
Assignee: rail → nobody
Re-purposing this bug as an offshoot of the tree closure tracked in bug 1192234.

As itemized in https://bugzilla.mozilla.org/show_bug.cgi?id=1192234#c8, I'm now looking for 4 separate improvements:

1) Alert #buildduty quickly on failure, via Papertrail-mediated SNS or a similar mechanism.
2) Decrease time to create an individual spot request, or parallelize as much as makes sense. I'm not sure whether general AWS throttling will prevent us from being too aggressive here.
3) Keep better track of free IP addresses when making requests so we can switch subnets when necessary, or at least alert sheriffs/buildduty that delays are imminent.
4) Enforce a single running version of aws_watch_pending.py, as was the original intent of this bug.
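Item 4 could be enforced with an exclusive, non-blocking flock(2) on a lock file that is held for the life of the process, so a second copy started by cron exits immediately instead of running alongside the first. This is a minimal sketch, assuming the script runs on a single Linux host; the lock path, acquire_single_instance_lock and main are hypothetical.

```python
import fcntl
import os
import sys
import tempfile

# Hypothetical lock location; the bug does not say where one should live.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "aws_watch_pending.lock")


def acquire_single_instance_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock on path, or None if
    another process (or another open of the same file) already holds it."""
    # Open in append mode: "w" would truncate the file, wiping the PID
    # written by a running instance, before we know we own the lock.
    lock_file = open(path, "a+")
    try:
        # LOCK_NB makes flock() fail at once instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        lock_file.close()
        return None
    # We own the lock: replace the file contents with our PID (informational).
    lock_file.seek(0)
    lock_file.truncate()
    lock_file.write(f"{os.getpid()}\n")
    lock_file.flush()
    return lock_file


def main():
    lock = acquire_single_instance_lock()
    if lock is None:
        sys.exit("aws_watch_pending.py is already running; exiting")
    try:
        pass  # the normal watch-pending pass would run here
    finally:
        lock.close()  # closing the file releases the lock
```

The kernel releases an flock lock when the holding process exits, even on SIGKILL, so a killed run never leaves a stale lock behind, which a plain PID-file existence check cannot guarantee.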
Summary: aws_watch_pending.py is killed but not really :) → aws_watch_pending.py robustness improvements
Depends on: 1166119
Depends on: 1192898
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Component: General Automation → General