keep buildbot master twistd logs longer

RESOLVED FIXED

Status

Release Engineering
General Automation
5 years ago
3 years ago

People

(Reporter: bhearsum, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

5 years ago
We only keep about 4 days' worth right now, which sometimes makes it difficult to go back and debug issues. Keeping at least a week's worth would be helpful, but the more the better. Can we keep a month's worth?
It can be worse than 4 days too; it just depends on how fast the log turns over, because we're keeping 10 x 10MB logs (see buildbot.tac). E.g. bm61:8001 has about 31 hours right now. The 4 days we pass to
 http://hg.mozilla.org/build/tools/file/default/buildfarm/maintenance/master_cleanup.py
only affects the builds in that case.
(Assignee)

Updated

4 years ago
Assignee: nobody → catlee
(Assignee)

Comment 2

4 years ago
Created attachment 775930 [details] [diff] [review]
keep moar logs
Attachment #775930 - Flags: review?(nthomas)
(Assignee)

Comment 3

4 years ago
we also need to update buildbot.tac
(Assignee)

Comment 4

4 years ago
Created attachment 775935 [details] [diff] [review]
keep moar logs

oh look, that's also managed by puppet!
Attachment #775935 - Flags: review?(nthomas)
(Assignee)

Updated

4 years ago
Attachment #775930 - Attachment is obsolete: true
Attachment #775930 - Flags: review?(nthomas)
(Assignee)

Comment 5

4 years ago
Created attachment 775983 [details] [diff] [review]
master_cleanup shouldn't touch twistd.log files
Attachment #775983 - Flags: review?(nthomas)
Comment on attachment 775983 [details] [diff] [review]
master_cleanup shouldn't touch twistd.log files

lgtm, buildbot can manage this.
Attachment #775983 - Flags: review?(nthomas) → review+
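For readers without the patch to hand, the idea of "cleanup skips twistd.log" can be sketched roughly as below. This is not the actual master_cleanup.py code: the function name `find_stale_files`, its parameters, and the `twistd.log*` glob are all assumptions made for illustration, and the sketch only lists candidates instead of deleting anything.

```python
import fnmatch
import os
import time


def find_stale_files(root, max_age_days, skip_patterns=("twistd.log*",)):
    """Return files under root older than max_age_days, skipping protected names.

    Hypothetical helper: the real cleanup script may select files differently.
    """
    cutoff = time.time() - max_age_days * 86400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Leave twistd.log and its rotated siblings (twistd.log.1, ...)
            # alone so that buildbot's own rotation decides their lifetime.
            if any(fnmatch.fnmatch(name, pat) for pat in skip_patterns):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)
```

With this split, the age cutoff passed to the cleanup script only governs build artifacts, while twistd.log retention is controlled purely by rotateLength and maxRotatedFiles.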
Comment on attachment 775935 [details] [diff] [review]
keep moar logs

>diff --git a/modules/buildmaster/files/buildbot.tac b/modules/buildmaster/files/buildbot.tac
>-rotateLength = 10000000
>-maxRotatedFiles = 10
>+rotateLength = 50000000
>+maxRotatedFiles = 200

I double checked that using up to 10G will be fine on all our current masters. 100 times moar logs should be sufficient. :-)

>diff --git a/modules/buildmaster/templates/buildmaster-cron.erb b/modules/buildmaster/templates/buildmaster-cron.erb

This part is no longer required with the change to the cleanup script to ignore twistd.log, right ?
Attachment #775935 - Flags: review?(nthomas) → review+
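As a rough sanity check on what the new settings buy, the retention window can be estimated from the log turnover reported earlier in this bug (about 31 hours for bm61:8001 with the old 10 x 10MB settings). This is a back-of-the-envelope sketch: the function name is made up, and it assumes the master keeps logging at a steady rate, which real masters will not do exactly.

```python
def retention_hours(rotate_length, max_rotated_files, bytes_per_hour):
    """Hours of history kept, assuming a constant logging rate."""
    return rotate_length * max_rotated_files / bytes_per_hour


# Old settings from buildbot.tac: 10 files of 10,000,000 bytes each.
old_capacity = 10_000_000 * 10

# Assumed rate: the old capacity filled in roughly 31 hours on bm61:8001.
observed_rate = old_capacity / 31.0  # bytes per hour

# New settings from the patch: 200 files of 50,000,000 bytes each.
new_hours = retention_hours(50_000_000, 200, observed_rate)
new_days = new_hours / 24
```

Under those assumptions a master as busy as bm61:8001 would go from about 31 hours of history to roughly 129 days, comfortably past the month asked for in the description.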
(Assignee)

Comment 8

4 years ago
> >diff --git a/modules/buildmaster/templates/buildmaster-cron.erb b/modules/buildmaster/templates/buildmaster-cron.erb
> 
> This part is no longer required with the change to the cleanup script to
> ignore twistd.log, right ?

Correct. I've removed that from my local patch
(Assignee)

Updated

4 years ago
Attachment #775983 - Flags: checked-in+
(Assignee)

Comment 9

4 years ago
Comment on attachment 775935 [details] [diff] [review]
keep moar logs

https://hg.mozilla.org/build/puppet/rev/270a41ab5a3e

I think this requires a restart to pick up the change... unless I can figure out how to hack it via manhole.
Attachment #775935 - Flags: checked-in+
(Assignee)

Comment 10

4 years ago
deployed via manhole
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
(Assignee)

Comment 11

4 years ago
looks like we're not managing buildbot.tac with puppet
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

4 years ago
Duplicate of this bug: 923163
(Assignee)

Comment 13

4 years ago
Created attachment 8368776 [details] [diff] [review]
manage buildbot.tac with puppet

I'm guessing the MAX_BROKER_REFS is safe for other master types?

The difference here for our masters looks to be:
--- buildbot.tac-prod   2014-01-31 15:39:49.000000000 -0500
+++ buildbot.tac        2014-01-31 15:40:58.000000000 -0500
@@ -7,9 +7,9 @@
 import twisted.spread.pb
 twisted.spread.pb.MAX_BROKER_REFS = 2048
 
-basedir = r'/builds/buildbot/build_scheduler/master'
-rotateLength = 10000000
-maxRotatedFiles = 10
+basedir = r'.'
+rotateLength = 50000000
+maxRotatedFiles = 200
 
 # if this is a relocatable tac file, get the directory containing the TAC
 if basedir == '.':
Attachment #8368776 - Flags: review?(bhearsum)
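The switch to basedir = r'.' is what lets one puppet-managed buildbot.tac serve every master: the "relocatable" branch visible at the end of the diff replaces '.' with the directory that contains the tac file. A minimal sketch of that logic follows; the wrapper function `resolve_basedir` is hypothetical, since the real file does this inline using `__file__`.

```python
import os.path


def resolve_basedir(basedir, tac_file):
    """Mimic the relocatable-tac check: '.' means 'wherever the tac lives'."""
    if basedir == '.':
        # Same idea as os.path.dirname(__file__) inside buildbot.tac itself.
        return os.path.abspath(os.path.dirname(tac_file))
    # A hard-coded basedir (the old per-master style) is used unchanged.
    return basedir
```

For example, calling it with '.' and a tac file stored in a master's directory yields that directory, so no per-master path has to be templated into the file.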
(Reporter)

Comment 14

4 years ago
Comment on attachment 8368776 [details] [diff] [review]
manage buildbot.tac with puppet

Review of attachment 8368776 [details] [diff] [review]:
-----------------------------------------------------------------

We're just increasing it, so it should be effectively a no-op for masters that are still under the original limit.
Attachment #8368776 - Flags: review?(bhearsum) → review+
(Assignee)

Comment 15

4 years ago
Comment on attachment 8368776 [details] [diff] [review]
manage buildbot.tac with puppet

this will need merging to production, and then masters will need to be restarted
Attachment #8368776 - Flags: checked-in+
this merged to production a while ago, but also broke new master setup. Fixed in:

  https://hg.mozilla.org/build/puppet/rev/0a8babc0952d
  https://hg.mozilla.org/build/puppet/rev/4a491539e2df

error was like:

Wed Feb 19 11:43:38 -0800 2014 Puppet (err): Could not set 'file' on ensure: No such file or directory - /builds/buildbot/tests1-macosx/master/buildbot.tac.puppettmp_3151 at 62:/etc/puppet/production/modules/buildmaster/manifests/buildbot_master/mozilla.pp
Wrapped exception:
No such file or directory - /builds/buildbot/tests1-macosx/master/buildbot.tac.puppettmp_3151
Wed Feb 19 11:43:38 -0800 2014 /Stage[main]//Node[buildbot-master107.srv.releng.scl3.mozilla.com]/Buildmaster::Buildbot_master::Mozilla[bm107-tests1-macosx]/File[/builds/buildbot/tests1-macosx/master/buildbot.tac]/ensure (err): change from absent to file failed: Could not set 'file' on ensure: No such file or directory - /builds/buildbot/tests1-macosx/master/buildbot.tac.puppettmp_3151 at 62:/etc/puppet/production/modules/buildmaster/manifests/buildbot_master/mozilla.pp
There's some fallout from this. 200 x 50,000,000 bytes is about 9.3G, so add in the rest of a master and the system and it's possible to hit the nagios warning on free space, at about 1550MB left. Trimming the 200 down a bit would be one solution.
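The disk arithmetic can be reproduced with a few lines of Python. This is only a sketch: the helper names are invented here, the 5 GiB budget in the note after the block is an arbitrary example and not a figure from this bug, and the total counts rotated files only (the active twistd.log can presumably add up to one more rotateLength on top).

```python
GIB = 1024 ** 3


def rotated_log_bytes(rotate_length, max_rotated_files):
    """Worst-case space taken by the rotated twistd.log.N files."""
    return rotate_length * max_rotated_files


def max_files_within(budget_bytes, rotate_length):
    """Largest maxRotatedFiles value that fits in a given byte budget."""
    return budget_bytes // rotate_length


current = rotated_log_bytes(50_000_000, 200)  # settings from the patch
trimmed = rotated_log_bytes(50_000_000, 100)  # the proposed trim to 100

current_gib = round(current / GIB, 2)
trimmed_gib = round(trimmed / GIB, 2)
```

Halving maxRotatedFiles halves the worst case, from 10,000,000,000 bytes to 5,000,000,000 bytes. With a hypothetical 5 GiB budget, `max_files_within(5 * GIB, 50_000_000)` gives 107 files.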
Bug 1005133 was filed to handle the disk usage. Should we close this out, on the principle that bug 1005133 will make sure we trim all the masters to 100?
Depends on: 1005133
(Assignee)

Updated

3 years ago
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago → 3 years ago
Resolution: --- → FIXED