Closed Bug 675975 Opened 13 years ago Closed 12 years ago

Increase maxRotatedFiles for masters, and set a value for slaves

Categories

(Release Engineering :: General, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rail, Assigned: coop)

Details

(Whiteboard: [buildmasters][cleanup][buildslaves])

Attachments

(1 file)

Masters (at least release masters) should keep twistd.log files for a longer period. At the moment we have maxRotatedFiles set to 10 and rotateLength set to 10M, which covers only ~2 days.
The flip side of this is that we don't bound the number of files on slaves at all. I've seen slaves with more than 1500 twistd.log.N files, at 1MB each.
(In reply to comment #1)
> The flip side of this is that we don't bound the number of files on slaves
> at all. I've seen slaves with more than 1500 twistd.log.N files, at 1MB each.

Can this be fixed via slavealloc, i.e. can we specify something in the buildbot.tac that will cause older logs to get blown away? Or will we need to prune old logs by hand even if we change a setting in buildbot.tac?

I've seen the log pile-up myself and try to manually clear things out when I'm investigating problems on a slave, but we really need something systemic here.
Priority: -- → P3
Whiteboard: [buildmasters][cleanup]
Yes, you can set (eg)
  maxRotatedFiles = 100
in buildbot.tac and twisted will delete any files in excess of that when the logs are next rotated.
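
As an illustration of the behaviour described here, the following is a minimal stdlib-only sketch of a rotation pass that prunes at rotation time. It is not Twisted's actual implementation. The function names `rotate` and `demo` are made up for this example, and it assumes rotated files are named twistd.log.N with the highest N being the oldest.

```python
import os
import re
import tempfile


def rotate(path, max_rotated_files=None):
    """Rotate `path` to `path.1`, shifting older files up by one.

    Files whose numeric suffix is >= max_rotated_files are deleted
    instead of renamed, so pruning only happens when a rotation runs.
    """
    directory, base = os.path.split(path)
    pattern = re.compile(re.escape(base) + r"\.(\d+)$")
    suffixes = sorted(
        (int(m.group(1))
         for m in map(pattern.match, os.listdir(directory or "."))
         if m),
        reverse=True,  # highest suffix first so renames never collide
    )
    for n in suffixes:
        old = "%s.%d" % (path, n)
        if max_rotated_files is not None and n >= max_rotated_files:
            os.remove(old)
        else:
            os.rename(old, "%s.%d" % (path, n + 1))
    os.rename(path, path + ".1")
    open(path, "w").close()  # start a fresh, empty live log


def demo():
    """Create a backlog of 25 rotated files, then rotate once with a cap of 10."""
    with tempfile.TemporaryDirectory() as d:
        live = os.path.join(d, "twistd.log")
        names = ["twistd.log"] + ["twistd.log.%d" % i for i in range(1, 26)]
        for name in names:
            open(os.path.join(d, name), "w").close()
        rotate(live, max_rotated_files=10)
        return sorted(os.listdir(d))
```

Under these assumptions a single rotation clears the whole backlog, leaving the live log plus at most 10 rotated files. Nothing is removed until a rotation actually happens.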

catlee, you sometimes use slave logs to compare step timing between slaves and masters. Do you have any input on what we should set maxRotatedFiles to on the slaves ?
Summary: Increase maxRotatedFiles for masters → Increase maxRotatedFiles for masters, and set a value for slaves
(In reply to comment #3)
> Yes, you can set (eg)
>   maxRotatedFiles = 100
> in buildbot.tac and twisted will delete any files in excess of that when the
> logs are next rotated.
> 
> catlee, you sometimes use slave logs to compare step timing between slaves
> and masters. Do you have any input on what we should set maxRotatedFiles to
> on the slaves ?

that's not necessary to do on the slave, so 100 is more than enough.
(In reply to Chris AtLee [:catlee] from comment #4)
> that's not necessary to do on the slave, so 100 is more than enough.

I'd advocate for 10 logs on the slave and 100 on the masters. That should be plenty.

Where's the right place to make this change for the masters? I recognize that I'll need to modify the running masters by hand, but I'd like any new masters to get the change automatically. https://hg.mozilla.org/build/buildbot-configs/file/2f57786d9c6b/setup-master.py#l50 perhaps?
Assignee: nobody → coop
Whiteboard: [buildmasters][cleanup] → [buildmasters][cleanup][buildslaves]
How long (in hours/days) does 100 logs correspond to? These can be very useful for debugging after the fact so if possible I'd like to keep a few days available on the masters.
(In reply to Ben Hearsum [:bhearsum] from comment #6)
> How long (in hours/days) does 100 logs correspond to? These can be very
> useful for debugging after the fact so if possible I'd like to keep a few
> days available on the masters.

The 10 logs we keep now correspond to about 2 days, so I reckon 100 logs will buy us 20 days.

Is that long enough for after-the-fact debugging? I could push it up to 150 logs to capture a month's worth of data. Even the busier test masters seem to have >10GB free space, so that shouldn't be an issue.
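
The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that assumes log volume stays constant (busier masters will rotate faster) and takes the 10M rotateLength from the original report as 10 MB per file. The function names are made up for this example.

```python
ROTATE_LENGTH_MB = 10   # rotateLength on the masters, per the original report
BASELINE_FILES = 10     # current maxRotatedFiles
BASELINE_DAYS = 2       # roughly what those 10 files cover today


def retention_days(max_rotated_files):
    """Scale the observed 10-files-per-2-days rate linearly."""
    return max_rotated_files * BASELINE_DAYS / BASELINE_FILES


def disk_usage_mb(max_rotated_files):
    """Upper bound on disk use: rotated files plus the live log."""
    return (max_rotated_files + 1) * ROTATE_LENGTH_MB


for files in (10, 100, 150):
    print(files, "files:", retention_days(files), "days,",
          disk_usage_mb(files), "MB at most")
```

On these assumptions 150 files works out to about 30 days and roughly 1.5 GB per master, well under the 10GB of free space mentioned above.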
I should note that on newer masters we have a cronjob doing cleanup on the master:

http://mxr.mozilla.org/build/source/buildbot-configs/Makefile.setup#74

which is http://mxr.mozilla.org/build/source/tools/buildfarm/maintenance/master_cleanup.py

If we increase how long twistd.log files are kept, that script will start cleaning up these logs on us too. Since the age limit there is 4 days, the solution might be to have the cleanup script exclude twistd.log files explicitly.
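
An explicit exclusion of that kind could look roughly like the sketch below. This is not the code in master_cleanup.py. The names `EXCLUDE_PATTERNS`, `is_excluded` and `stale_files` are hypothetical, and the sketch assumes the cleanup selects files by modification time.

```python
import fnmatch
import os
import time

MAX_AGE_DAYS = 4  # the cleanup age quoted in this comment
# Hypothetical exclusion list: the live log and its rotated copies.
EXCLUDE_PATTERNS = ("twistd.log", "twistd.log.*")


def is_excluded(filename):
    """True if the file name matches one of the protected patterns."""
    return any(fnmatch.fnmatch(filename, p) for p in EXCLUDE_PATTERNS)


def stale_files(root, max_age_days=MAX_AGE_DAYS, now=None):
    """Yield files under `root` older than the cutoff, skipping twistd logs."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_excluded(name):
                continue  # leave retention of these to maxRotatedFiles
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                yield path
```

With this split, twistd.log retention is governed only by maxRotatedFiles in buildbot.tac, and the cron job keeps handling everything else.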
Status: NEW → ASSIGNED
Priority: P3 → P2
Has anyone besides Dustin pushed a change to slavealloc before and knows what's actually involved?

FWIW, the running version of slavealloc has the patch applied. Slaves are getting buildbot.tac files with maxRotatedFiles set, but I haven't actually seen any twistd.log files get cleaned up yet. I'm wondering if a log rotation event needs to happen before the cleanup will occur.
Attachment #604519 - Flags: review?(nrthomas)
bear said he might try to get this landed and deployed next week while I'm away.

(In reply to Chris Cooper [:coop] from comment #5)
> Where's the right place to make this change for the masters? I recognize
> that I'll need to modify the running masters by hand, but I'd like any new
> masters to get the change automatically.
> https://hg.mozilla.org/build/buildbot-configs/file/2f57786d9c6b/setup-master.py#l50
> perhaps?

Still need to figure this part out.
Attachment #604519 - Flags: review?(nrthomas) → review+
I've got this running on slavealloc's staging instance and have tested that a call to 

http://slavealloc-staging.build.mozilla.org/gettac/bm-xserve08

returns a .tac file with the new value.
Comment on attachment 604519 [details] [diff] [review]
Set maxRotatedFiles=10 for slaves

committed changeset 2370:aaa8d9468d4b
Attachment #604519 - Flags: checked-in+
landed in production slavealloc and tested
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering