Closed Bug 856594 Opened 7 years ago Closed 6 years ago

keep buildbot master twistd logs longer

Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bhearsum, Assigned: catlee)

Attachments

(3 files, 1 obsolete file)

We only keep about 4 days' worth right now, which sometimes makes it difficult to go back and debug issues. Keeping at least a week's worth would be helpful, but the more the better. Can we keep a month's worth?
It can be worse than 4 days too; it just depends on how fast the log turns over, because we're keeping 10 x 10MB logs (see buildbot.tac). E.g. bm61:8001 has about 31 hours right now. The 4 days we pass to
 http://hg.mozilla.org/build/tools/file/default/buildfarm/maintenance/master_cleanup.py
only affects the builds in that case.
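As a rough sketch of why the window varies: the history covered by the twistd logs is the total rotated capacity divided by how fast the master writes. The function names below are hypothetical, and the write rate is an assumption back-computed from the bm61 observation (about 31 hours in 10 files of 10,000,000 bytes); real masters will differ.

```python
def retention_hours(rotate_length, max_rotated_files, bytes_per_hour):
    """Estimate hours of history held by the rotated twistd logs."""
    return float(rotate_length * max_rotated_files) / bytes_per_hour


def bytes_needed(hours, bytes_per_hour):
    """Estimate total log capacity needed to cover the given number of hours."""
    return hours * bytes_per_hour


# Assumed write rate, back-computed from bm61: ~31 hours in 10 x 10,000,000 bytes.
rate = (10000000 * 10) / 31.0

print("current window: %.0f hours" % retention_hours(10000000, 10, rate))
print("bytes for a week: %.0f" % bytes_needed(7 * 24, rate))
print("bytes for 30 days: %.0f" % bytes_needed(30 * 24, rate))
```

This treats the write rate as constant, which a busy master will not honor, so the numbers are only a ballpark for picking rotation settings.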
Assignee: nobody → catlee
Attached patch: keep moar logs (obsolete)
Attachment #775930 - Flags: review?(nthomas)
we also need to update buildbot.tac
Attached patch: keep moar logs
oh look, that's also managed by puppet!
Attachment #775935 - Flags: review?(nthomas)
Attachment #775930 - Attachment is obsolete: true
Attachment #775930 - Flags: review?(nthomas)
Comment on attachment 775983
master_cleanup shouldn't touch twistd.log files

lgtm, buildbot can manage this.
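For illustration only, a cleanup pass that leaves twistd.log files alone could look roughly like the sketch below. This is not the actual master_cleanup.py code; the function name, its parameters, and the "starts with twistd.log" match are all assumptions made for the example.

```python
import os
import time


def find_stale_files(master_dir, max_age_days, now=None):
    """Return files older than max_age_days, skipping twistd.log and its rotations."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    stale = []
    for root, _dirs, files in os.walk(master_dir):
        for name in files:
            # twistd.log, twistd.log.1, ... are left for buildbot's own rotation.
            if name.startswith("twistd.log"):
                continue
            path = os.path.join(root, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

The function only reports candidates instead of deleting them, so the skip rule can be checked before anything is removed.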
Attachment #775983 - Flags: review?(nthomas) → review+
Comment on attachment 775935
keep moar logs

>diff --git a/modules/buildmaster/files/buildbot.tac b/modules/buildmaster/files/buildbot.tac
>-rotateLength = 10000000
>-maxRotatedFiles = 10
>+rotateLength = 50000000
>+maxRotatedFiles = 200

I double checked that using up to 10G will be fine on all our current masters. 100 times moar logs should be sufficient. :-)

>diff --git a/modules/buildmaster/templates/buildmaster-cron.erb b/modules/buildmaster/templates/buildmaster-cron.erb

This part is no longer required now that the cleanup script ignores twistd.log, right?
Attachment #775935 - Flags: review?(nthomas) → review+
> >diff --git a/modules/buildmaster/templates/buildmaster-cron.erb b/modules/buildmaster/templates/buildmaster-cron.erb
> 
> This part is no longer required with the change to the cleanup script to
> ignore twistd.log, right ?

Correct. I've removed that from my local patch.
Attachment #775983 - Flags: checked-in+
Comment on attachment 775935
keep moar logs

https://hg.mozilla.org/build/puppet/rev/270a41ab5a3e

I think this requires a restart to pick up the change... unless I can figure out how to hack it via manhole.
Attachment #775935 - Flags: checked-in+
deployed via manhole
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
looks like we're not managing buildbot.tac with puppet
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Duplicate of this bug: 923163
I'm guessing the MAX_BROKER_REFS is safe for other master types?

The difference here for our masters looks to be:
--- buildbot.tac-prod   2014-01-31 15:39:49.000000000 -0500
+++ buildbot.tac        2014-01-31 15:40:58.000000000 -0500
@@ -7,9 +7,9 @@
 import twisted.spread.pb
 twisted.spread.pb.MAX_BROKER_REFS = 2048
 
-basedir = r'/builds/buildbot/build_scheduler/master'
-rotateLength = 10000000
-maxRotatedFiles = 10
+basedir = r'.'
+rotateLength = 50000000
+maxRotatedFiles = 200
 
 # if this is a relocatable tac file, get the directory containing the TAC
 if basedir == '.':
Attachment #8368776 - Flags: review?(bhearsum)
Comment on attachment 8368776
manage buildbot.tac with puppet

We're just increasing it, so it should be effectively a no-op for masters that are still under the original limit.
Attachment #8368776 - Flags: review?(bhearsum) → review+
Comment on attachment 8368776
manage buildbot.tac with puppet

this will need merging to production, and then masters will need to be restarted
Attachment #8368776 - Flags: checked-in+
This merged to production a while ago, but it also broke new master setup. Fixed in:

  https://hg.mozilla.org/build/puppet/rev/0a8babc0952d
  https://hg.mozilla.org/build/puppet/rev/4a491539e2df

error was like:

Wed Feb 19 11:43:38 -0800 2014 Puppet (err): Could not set 'file' on ensure: No such file or directory - /builds/buildbot/tests1-macosx/master/buildbot.tac.puppettmp_3151 at 62:/etc/puppet/production/modules/buildmaster/manifests/buildbot_master/mozilla.pp
Wed Feb 19 11:43:38 -0800 2014 Puppet (err): Could not set 'file' on ensure: No such file or directory - /builds/buildbot/tests1-macosx/master/buildbot.tac.puppettmp_3151 at 62:/etc/puppet/production/modules/buildmaster/manifests/buildbot_master/mozilla.pp
Wrapped exception:
No such file or directory - /builds/buildbot/tests1-macosx/master/buildbot.tac.puppettmp_3151
Wed Feb 19 11:43:38 -0800 2014 /Stage[main]//Node[buildbot-master107.srv.releng.scl3.mozilla.com]/Buildmaster::Buildbot_master::Mozilla[bm107-tests1-macosx]/File[/builds/buildbot/tests1-macosx/master/buildbot.tac]/ensure (err): change from absent to file failed: Could not set 'file' on ensure: No such file or directory - /builds/buildbot/tests1-macosx/master/buildbot.tac.puppettmp_3151 at 62:/etc/puppet/production/modules/buildmaster/manifests/buildbot_master/mozilla.pp
There's some fallout from this. 200 x 50,000,000 bytes is 9.4G, so add in the rest of a master and the system and it's possible to hit the nagios warning on free space, at about 1550MB left. Trimming the 200 down a bit would be one solution.
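A quick way to sanity-check the footprint is to compute the worst case for a given maxRotatedFiles. The helper below is a hypothetical sketch; it assumes the active twistd.log can also grow to rotateLength on top of the rotated copies, and the count of 100 is only an example of a trimmed setting.

```python
GIB = 1024.0 ** 3


def worst_case_bytes(rotate_length, max_rotated_files, include_active=True):
    """Upper bound on disk used by twistd.log plus its rotated copies."""
    # Assumption: the active file counts as one extra file of up to rotate_length.
    files = max_rotated_files + (1 if include_active else 0)
    return rotate_length * files


for count in (200, 100):
    used = worst_case_bytes(50000000, count)
    print("maxRotatedFiles=%d: %.1f GiB" % (count, used / GIB))
```

Comparing the result against the free space left on each master, minus the nagios warning threshold, shows how far the count needs to come down.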
Bug 1005133 got filed to handle the disk usage. Should we close this out on the principle that 1005133 will make sure we trim all the masters to 100?
Depends on: 1005133
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED
Component: General Automation → General