Closed Bug 510503 Opened 15 years ago Closed 14 years ago

fine tune cronjobs on staging-stage and stage to right amount of l10n installers

Categories

(Release Engineering :: General, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

(Whiteboard: [l10n])

Attachments

(1 file)

After turning on the l10n nightly updates, we might have the following on our ftp servers:
1) latest l10n
2) dated dir of the latest (we have to check that this is a symlink rather than having 2 copies stored)
3) dated dir of the previous nightly/ies (for now we only need one to generate the incremental updates)
4) repack-on-change (tinderbox-builds only keeps the last 24 hours)  

If we keep the last two dated dirs we can have incremental updates (one of our goals is to only need to store one in latest).

We have to check the cronjobs to make sure that we delete neither more nor less than we should.

For now, I believe 4 days (*) is the number to choose. We don't want to hit the situation in which a locale has not been generated for a few days (for whatever reason) and patch-packager then barks because it is trying to process a snippet that points to a missing MAR.

(*) I believe that patch-packager stops caring about snippets that are 4 days old
Whiteboard: [l10n]
No longer blocks: 480081
Depends on: 480081
Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Priority: -- → P3
Priority: P3 → P2
Nick, I think you are the only one who can modify the cronjobs on stage.
What do you think of this cronjob to remove the *-l10n dated dirs?
@daily find /home/ftp/pub/firefox/nightly/20*/*/20*mozilla*-l10n -mtime +5 -type d -maxdepth 0 -mindepth 0 -exec rm -rfv {} \;

I chose to go 5 days rather than 4 to keep a day longer than what prometheus-vm might need.
This would be a little better
@daily find /pub/mozilla.org/firefox/nightly/20?? -mindepth 2 -maxdepth 2 -type d -name '*-mozilla-*-l10n' -mtime +5  -exec rm -rf {} \;
because find does the hard work of selecting directories, rather than the shell (expanding those * wildcards) and find just doing a stat and removal. 

Did you want rm -v for a while to check it's not removing too much or little ? Causing output (and hence cron mail) for the "working normally" situation is usually avoided. It'd also be nicer to only look in 2009/07 and later (or maybe the last couple of months), but perhaps that doesn't really matter if this only runs once a day.

Could you remind me how the '-mtime +5' works ? If a locale breaks for a few days then patch-packager will discard the snippet for the old build, and we no longer need the old mar ? (And they'll need two new nightlies for updates to work again). Otherwise we hang on to a few more directories than we really need.
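On the '-mtime +5' question: GNU find measures age in whole 24-hour periods and ignores the fractional part, so '-mtime +5' only matches entries last modified at least 6 full days ago, and '-mtime +0' matches anything older than 24 hours. A minimal sketch to check this on a scratch directory, assuming GNU find and GNU touch (the age-*h directory names are made up for the test):

```shell
#!/bin/sh
# Sketch: how GNU find interprets -mtime +N.
# Assumes GNU touch (-d "N hours ago") and GNU find (-printf).
scratch=$(mktemp -d)

# Directories whose mtimes are a known number of hours in the past.
for hours in 12 36 132 150; do
    mkdir "$scratch/age-${hours}h"
    touch -d "$hours hours ago" "$scratch/age-${hours}h"
done

# 132h is 5.5 days, which truncates to 5, so it is NOT "+5".
# 150h is 6.25 days, which truncates to 6, so it is "+5".
older_than_5=$(find "$scratch" -mindepth 1 -maxdepth 1 -type d -mtime +5 \
    -printf '%f\n' | LC_ALL=C sort)

# -mtime +0 matches everything older than one full day.
older_than_0=$(find "$scratch" -mindepth 1 -maxdepth 1 -type d -mtime +0 \
    -printf '%f\n' | LC_ALL=C sort)

echo "matched by -mtime +5:"; echo "$older_than_5"
echo "matched by -mtime +0:"; echo "$older_than_0"

rm -rf "$scratch"
```

So with '-mtime +5' a dated dir survives until it is at least 6 days old, counted from the directory's own modification time rather than from the date in its name.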
(In reply to comment #2)
> This would be a little better
> @daily find /pub/mozilla.org/firefox/nightly/20?? -mindepth 2 -maxdepth 2 -type
> d -name '*-mozilla-*-l10n' -mtime +5  -exec rm -rf {} \;
> because find does the hard work of selecting directories, rather than the shell
> (expanding those * wildcards) and find just doing a stat and removal. 
> 
Cool; I will adjust it on staging-nightly-updates.

> Did you want rm -v for a while to check it's not removing too much or little ?
> Causing output (and hence cron mail) for the "working normally" situation is
> usually avoided. It'd also be nicer to only look in 2009/07 and later (or maybe
> the last couple of months), but perhaps that doesn't really matter if this only
> runs once a day.
> 
No need for verbose.

> Could you remind me how the '-mtime +5' works ? If a locale breaks for a few
> days then patch-packager will discard the snippet for the old build, and we no
> longer need the old mar ? (And they'll need two new nightlies for updates to
> work again). Otherwise we hang on to a few more directories than we really
> need.
The scenario that worries me is a complete.mar being deleted while its snippet is still unprocessed (for example, if a locale fails for 3 days in a row). prometheus-vm would then process the snippet and try to wget a complete.mar that has already been deleted.

# let's assume -mtime +1
day 0 -> locale success - uploaded complete.mar - uploaded snippet A - prometheus-vm does nothing with snippet A
day 1 -> locale fails - prometheus-vm does nothing with snippet A
day 2 -> complete.mar gets removed - locale succeeds - uploaded snippet B - prometheus-vm tries to grab the complete.mar that snippet A points to, but it is no longer on ftp, and this won't work until we delete snippet A
Added
0 23 * * * root find /pub/mozilla.org/firefox/nightly/20?? -mindepth 2 -maxdepth 2 -type d -name '*-mozilla-*-l10n' -mtime +5 -exec rm -rf {} \;
to stage:/etc/cron.d/ftp-staging-rw-server. Output mailed to me & root@stage. The symlink cleaner for firefox/nightly/ runs an hour later.
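A way to sanity-check what that find expression selects, without deleting anything, is to run it with -printf in place of -exec rm against a scratch tree. This is only a sketch: the scratch layout and ages are invented, it assumes GNU find and GNU touch, and the directory names are modelled on ones mentioned in this bug. Note that the '*-mozilla-*-l10n' pattern, as written, does not match names such as 2009-08-24-11-firefox3.0.14-l10n.

```shell
#!/bin/sh
# Sketch: dry run of the cleanup expression on a throwaway tree.
# Assumes GNU touch (-d "N days ago") and GNU find (-printf).
root=$(mktemp -d)

# Helper (hypothetical): create a dated dir and backdate its mtime.
mk() {
    mkdir -p "$root/$1"
    touch -d "$2" "$root/$1"
}

mk 2009/09/2009-09-01-03-mozilla-central-l10n "7 days ago"   # old l10n dir
mk 2009/09/2009-09-07-03-mozilla-central-l10n "1 day ago"    # recent l10n dir
mk 2009/09/2009-09-01-03-mozilla-central      "7 days ago"   # old, not l10n
mk 2009/08/2009-08-24-11-firefox3.0.14-l10n   "14 days ago"  # old, no "-mozilla-"

# Same tests as the cron line, but listing instead of removing.
would_delete=$(find "$root"/20?? -mindepth 2 -maxdepth 2 -type d \
    -name '*-mozilla-*-l10n' -mtime +5 -printf '%f\n')

echo "would delete:"
echo "$would_delete"

rm -rf "$root"
```

Only the old directory whose name contains "-mozilla-" and ends in "-l10n" is listed; the other three are left alone.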

Of the 224G in use by l10n nightlies, the first run cleared up about 80G. As the days go by it'll clean up more. There are some strange dates on these dirs:

Aug 24 12:31 2009-08-24-11-firefox3.0.14-l10n
Aug 24 13:34 2009-08-24-13-firefox3.0.14-l10n
Sep  1 11:16 2009-08-25-03-mozilla-1.9.2-l10n
Sep  1 11:16 2009-08-25-03-mozilla-central-l10n
I made the assumption that those funky dates were from bumps along the road, as the dates on the files inside those directories didn't match up (they were at most a day after the date in the dir name). After correcting the dates on the dirs the cleanup was run again, removing another 44G, leaving us with 95G in use for l10n nightlies.
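The bug does not record how the directory dates were corrected. One way it could be done is to copy the mtime of the newest file in each directory onto the directory itself with touch -r. The following is a sketch under that assumption, using GNU find and GNU date; the file names a.mar and b.mar and their timestamps are made up:

```shell
#!/bin/sh
# Sketch: reset a directory's mtime to that of its newest file.
# Assumes GNU find (-printf '%T@'), GNU touch (-d) and GNU date (-r FILE).
root=$(mktemp -d)
dir="$root/2009-08-25-03-mozilla-central-l10n"
mkdir "$dir"

# Hypothetical contents, dated close to the date in the dir name.
touch -d "2009-08-25 04:00" "$dir/a.mar"
touch -d "2009-08-26 02:00" "$dir/b.mar"

# Creating the files above left the directory's own mtime at "now".
# Find the newest file: print "epoch-seconds path", sort, keep the last.
newest=$(find "$dir" -type f -printf '%T@ %p\n' | sort -n | tail -n 1 | cut -d' ' -f2-)

# Copy that file's timestamps onto the directory.
touch -r "$newest" "$dir"

dir_date=$(date -r "$dir" +%Y-%m-%d)
echo "newest file: $newest"
echo "directory now dated: $dir_date"

rm -rf "$root"
```

After this, a cleanup based on the directory's -mtime sees the age of its contents rather than the time of the last change to the directory listing.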

The graph shows that in context. The upward jump on 29/09 was moving away old firefox/nightly/$version-candidates dirs, and the one on 05/09 is the cleanup from this bug.
(In reply to comment #4)
> There are some strange dates on these dirs:
> 
> Aug 24 12:31 2009-08-24-11-firefox3.0.14-l10n
> Aug 24 13:34 2009-08-24-13-firefox3.0.14-l10n
> Sep  1 11:16 2009-08-25-03-mozilla-1.9.2-l10n
> Sep  1 11:16 2009-08-25-03-mozilla-central-l10n
>
I have looked into it and yes this is very odd. For instance:
- how is the directory 2009-08-31-04-mozilla-central-l10n dated as last modified the 7th (Sep  7 13:25) if there is no file in there older than Sep 1 13:29?

Maybe we should be looking at the dates of the files inside the dated dirs instead of the dirs themselves? Then we would really be removing anything older than 5 days.
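One likely explanation for the odd dates: a directory's mtime changes whenever an entry is created, removed or renamed inside it, so it can be much newer than every file it holds. A sketch of selecting dated dirs by the age of their contents instead, assuming GNU find (for -quit and -printf) and GNU touch; the .mar file names are hypothetical, and an empty directory would also be reported as stale:

```shell
#!/bin/sh
# Sketch: pick l10n dirs with no file modified in the last 5 days,
# regardless of the directory's own mtime.
root=$(mktemp -d)

old="$root/2009/08/2009-08-31-04-mozilla-central-l10n"
new="$root/2009/09/2009-09-08-04-mozilla-central-l10n"
mkdir -p "$old" "$new"
touch -d "9 days ago" "$old/old.complete.mar"   # hypothetical file name
touch -d "1 day ago"  "$new/new.complete.mar"   # hypothetical file name
# Both directories now have an mtime of "now", because files were just added.

# Selecting on the directory mtime finds nothing to clean up.
by_dir_mtime=$(find "$root"/20?? -mindepth 2 -maxdepth 2 -type d \
    -name '*-mozilla-*-l10n' -mtime +5 -printf '%f\n')

# Selecting on contents: a dir is stale if it holds no file newer than 5 days.
stale=$(
    for dir in "$root"/20??/??/*mozilla-*-l10n; do
        [ -d "$dir" ] || continue
        recent=$(find "$dir" -type f -mtime -5 -print -quit)
        [ -n "$recent" ] || printf '%s\n' "${dir##*/}"
    done
)

echo "by directory mtime: ${by_dir_mtime:-<none>}"
echo "by file mtime:      ${stale:-<none>}"

rm -rf "$root"
```

The loop only prints candidates; any removal step would still need the same care as the cron line already in place.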

(In reply to comment #5)
> Created an attachment (id=398637) [details]
> Free space graph for last 6 weeks
> 
> After correcting the dates on the dirs the cleanup
> was run again, removing another 44G, leaving us with 95G in use for l10n
> nightlies.
> 
How did you correct the dates on the dirs?

How can I generate a graph like that?
(In reply to comment #4)
> Added
> 0 23 * * * root find /pub/mozilla.org/firefox/nightly/20?? -mindepth 2
> -maxdepth 2 -type d -name '*-mozilla-*-l10n' -mtime +5 -exec rm -rf {} \;
If we do the cleaning at that hour, I believe we can reduce it to -mtime +4: by the time nightly update generation runs, after midnight, it will be one more day, which is effectively -mtime +5.

(In reply to comment #6)
> (In reply to comment #4)
> > There are some strange dates on these dirs:
> > 
> > Aug 24 12:31 2009-08-24-11-firefox3.0.14-l10n
> > Aug 24 13:34 2009-08-24-13-firefox3.0.14-l10n
> > Sep  1 11:16 2009-08-25-03-mozilla-1.9.2-l10n
> > Sep  1 11:16 2009-08-25-03-mozilla-central-l10n
> >
> I have looked into it and yes this is very odd. For instance:
> - how is the directory 2009-08-31-04-mozilla-central-l10n dated as last
> modified the 7th (Sep  7 13:25) if there is no file in there older than Sep 1
> 13:29?
> 
> Maybe we should be looking at the date of the files inside of the dated dirs
> instead of the dirs in themselves? We would be for real removing anything older
> than 5 days.
> 
Nick, what do you think of this line for finding files older than 4 days?
find /pub/mozilla.org/firefox/nightly/20??/??/*mozilla-*-l10n -type f -mtime +4
Added this line to staging-stage's crontab for cleaning up L10n repackages on tinderbox-builds:

@hourly find /home/ftp/pub/firefox/tinderbox-builds/ -mtime +0 -type d -mindepth 1 -maxdepth 1 -name "*-l10n" -exec rm -rf {} \;
Priority: P2 → P3
ffxbld currently has 
0  * * * *        find /home/ftp/pub/firefox/tinderbox-builds/       -mmin +300 -depth -type d -mindepth 2 -maxdepth 2 -name 1????????? -exec rm -rf {} \;
5  * * * *        find /home/ftp/pub/firefox/tinderbox-builds/       -mmin +300 -depth -type f -mindepth 1 -exec rm -rf {} \; 
10 * * * *        find /home/ftp/pub/firefox/tinderbox-builds/*-l10n -mmin +300 -depth -type f -mindepth 1 -maxdepth 1 -exec rm -rf {} \;

The first and third ones we just have to live with for now, but the second one is removing files like bloat.log, codesize-auto.log, malloc.log, sdleak.tree etc. and causing false redness in staging. That makes it more difficult to figure out whether new slaves are working OK or not. I've disabled the second check for now, but speak up if it's doing something else that's useful.

In other cron news, I set the staging-stage:/data/symbols_ffx cleanup to run every hour and delete symbols more than 24 hours old (rather than once a day and more than 4 days old). That frees up a few gig until bug 525157 comes through with more disk space. FYI, this is the symbol server/Socorro symbolstore analogue, and not actually used for anything. Packaged unit test and talos use the zip of symbols uploaded alongside the firefox archive.
(In reply to comment #9)
> ffxbld currently has 
> 0  * * * *        find /home/ftp/pub/firefox/tinderbox-builds/       -mmin +300
> -depth -type d -mindepth 2 -maxdepth 2 -name 1????????? -exec rm -rf {} \;
> 5  * * * *        find /home/ftp/pub/firefox/tinderbox-builds/       -mmin +300
> -depth -type f -mindepth 1 -exec rm -rf {} \; 
> 10 * * * *        find /home/ftp/pub/firefox/tinderbox-builds/*-l10n -mmin +300
> -depth -type f -mindepth 1 -maxdepth 1 -exec rm -rf {} \;
> 
> The first and third ones we just have to live with for now, but the second one
> is removing files like bloat.log, codesize-auto.log, malloc.log, sdleak.tree
> etc and causing false redness in staging. And that makes it more difficult to
> figure if new slaves are working OK or not. I've disabled the second check for
> now, but speak up if it's doing something else that's useful.
> 
Good to know. If we don't run that cronjob we don't really pile up that much, right? I am fine with it.

> In other cron news, I set the staging-stage:/data/symbols_ffx cleanup to run
> every hour and delete symbols more than 24 hours old (rather than once a day
> and more than 4 days old). That frees up a few gig until bug 525157 comes
> through with more disk space. FYI, this is the symbol server/Socorro
> symbolstore analogue, and not actually used for anything. Packaged unit test
> and talos use the zip of symbols uploaded alongside the firefox archive.
>
I had no idea what these were for so glad to learn about it.
(In reply to comment #10)
> Good to know. If we don't run that cronjob we don't really pile up that much,
> right? I am fine with it.

The logs are _replaced_ every time the builds run, and are all < 50M in size. There's no build up. I'll go back in a day or so to check all is well.
Let's file other bugs if we need to or reopen this one if needed.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering