Closed Bug 688403 Opened 13 years ago Closed 13 years ago

Deploy twisted log watching script to slave machines.

Categories

(Release Engineering :: General, defect)

defect
Not set
minor

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: Callek, Unassigned)

In Bug 526149 we created watch_twistd_log.py and deployed it to the build masters, where it watches for twisted exceptions and e-mails releng about them.

The summary was changed early on in that bug to include slaves, but there was never any slave deployment done.

This bug is about actually deploying/using the script on our slaves.

Things to be cautious of (that we know right now):
* An excessive volume of exception e-mail, adding to buildduty workload.
* Windows twistd.log file-access contention (especially for twisted log rotate)

The first point should become apparent shortly after deployment: because of how the timestamp file is used, the script will query all the logs present on the machine in one go, rather than only the last hour's worth.
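
For illustration, a rough sketch of the timestamp-file idea described above. This is not the actual watch_twistd_log.py: the function name `scan_new_exceptions`, the stamp-file format (a single float), and matching on the standard Python "Traceback" header line are all assumptions. It only shows why a machine without a timestamp file would have every log scanned on the first run.

```python
import os
import time


def scan_new_exceptions(log_paths, stamp_path, now=None):
    """Return (path, line) pairs for traceback headers in logs modified since the last run.

    Hypothetical helper: the stamp file holds the time of the previous run.
    """
    now = time.time() if now is None else now
    try:
        with open(stamp_path) as f:
            last_run = float(f.read().strip())
    except (OSError, ValueError):
        # No usable timestamp file yet: every log counts as new,
        # so the first run scans everything on the machine.
        last_run = 0.0

    hits = []
    for path in log_paths:
        if os.path.getmtime(path) <= last_run:
            continue  # not modified since the previous run
        with open(path, errors="replace") as f:
            for line in f:
                # Assumption: exceptions show up with the usual Python header.
                if "Traceback (most recent call last)" in line:
                    hits.append((path, line.rstrip("\n")))

    # Record this run so the next one skips logs that have not changed.
    with open(stamp_path, "w") as f:
        f.write(repr(now))
    return hits
```

A real deployment would presumably also need to remember its position within the live log, which this sketch does not attempt.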

Windows deployment would be a bit more complicated; if slave deployment is deemed worthwhile at all, the Windows part can be done in another bug if that simplifies things.
I'm not sure of the value of this given that most slave-side errors are raised in some way on the master side, or cause the slave to disconnect. I also don't think we can do this on test machines because it could impact performance. We'd have to see about that before implementing there.
(In reply to Ben Hearsum [:bhearsum] from comment #1)
> I'm not sure of the value of this given that most slave-side errors are
> raised in some way on the master side, or cause the slave to disconnect. I
> also don't think we can do this on test machines because it could impact
> performance. We'd have to see about that before implementing there.

Yes, most of these bubble up to masters already, or end up tanking a build, which prompts us to investigate. As Ben notes, we couldn't deploy this to test slaves anyway.

My more pressing concern is twistd.log accumulation on the slaves. I don't think we're properly culling logs everywhere yet.
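
As a sketch of what culling could look like (an assumption, not an existing releng tool): the helper below removes rotated logs older than a cutoff. The name `cull_rotated_logs`, the `twistd.log.*` naming of rotated files, and the 7-day default are illustrative only.

```python
import glob
import os
import time


def cull_rotated_logs(log_dir, max_age_days=7, now=None):
    """Delete rotated twistd logs older than max_age_days; return the removed paths.

    Hypothetical helper. The live twistd.log is never touched, because the
    pattern only matches names with a suffix after "twistd.log.".
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for path in sorted(glob.glob(os.path.join(log_dir, "twistd.log.*"))):
        if os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(path)
    return removed
```

On Windows, `os.remove` fails if another process still has the file open, so the same file-access caveat raised earlier in this bug would apply.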
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Product: mozilla.org → Release Engineering