Closed
Bug 661891
Opened 14 years ago
Closed 14 years ago
Please update symbol cleanup script on dm-symbolpush01.mozilla.org
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ted, Assigned: nmaul)
References
()
Details
+++ This bug was initially created as a clone of Bug #657837 +++
The symbol cleanup script (in the URL field) is installed somewhere on dm-symbolpush01.mozilla.org, and runs in cron jobs as the ffxbld user (and other users) to clean up old symbols. I fixed a few things in bug 661845, and it needs to be updated.
Assignee
Updated•14 years ago
Assignee: server-ops → nmaul
Assignee
Comment 1•14 years ago
It's beginning to look like this script hasn't been run in a *long* time. We're not finding it in cron on that server, or any other that we've identified as mounting this share.
In running the 3 latest versions of this script in hg (2008, your previous change, and current), in dry-run mode, here's what they find:
[root@dm-symbolpush01 ~]# wc -l mod*
1516 mod1-symbols_ffx-list
82356 mod2-symbols_ffx-list
98711 mod3-symbols_ffx-list
The lists appear to be inclusive (that is, I don't see things in the 2008 list that aren't in the newer ones), as expected.
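A minimal sketch of that inclusiveness check, assuming each list file (e.g. mod1-symbols_ffx-list) holds one path per line. The cleanup script's actual dry-run output format isn't shown in this bug, and the function names are hypothetical.

```python
def entries(lines):
    """Normalize dry-run output lines into a set of paths (blank lines dropped)."""
    return {line.strip() for line in lines if line.strip()}


def only_in_older(older_lines, newer_lines):
    """Paths the older script would delete that the newer script would not.

    An empty result means the older list is fully contained in the newer one.
    """
    return entries(older_lines) - entries(newer_lines)
```

Opening mod1-symbols_ffx-list and mod3-symbols_ffx-list and passing the two file objects to only_in_older() should return an empty set if the 2008 list is a subset of the current one.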
The 2008 one finds files to remove dating back to December of last year... so it hasn't run in a while or those would be gone by now. Your previous revision (5/17) finds things dating back to *2008*, which should confirm that it's never been run at all, or else those would have been deleted already as well.
Note that I'm only looking in the symbols_ffx directory... presumably we'd want to run this for other directories as well?
I will run it by hand once for symbols_ffx. Let me know how you'd like the cron jobs to look for this (what frequency, which directories).
Thanks!
Status: NEW → ASSIGNED
Assignee
Comment 2•14 years ago
This has finished for symbols_ffx and reclaimed only about 1 GB of space. Let me know about the other directories.
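One way to estimate beforehand how much space a dry-run list will free is to sum the sizes of the listed files. This is a sketch assuming one path per line; reclaimable_bytes() is a hypothetical helper. It reports apparent size (st_size), which can differ from what du shows, and removing a file that still has other hard links frees nothing.

```python
import os


def reclaimable_bytes(paths):
    """Sum the apparent sizes (st_size) of the listed files.

    Entries that no longer exist are skipped. lstat is used so a symlink
    is counted as the link itself, not its target.
    """
    total = 0
    for p in paths:
        p = p.strip()
        if not p:
            continue
        try:
            total += os.lstat(p).st_size
        except FileNotFoundError:
            continue  # already removed, or never existed
    return total
```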
Assignee
Comment 3•14 years ago
On closer inspection, I don't think this script is working as effectively as it should.
For these examples, I'm looking specifically at /symbols_ffx/freebl3.pdb/... the filesystem is too big to get quick measurements on, so I just took the first dir in there as a representative sample.
The empty directory pruning doesn't seem to work: in that directory, I see 5699 directories, of which 5067 are empty.
Also in that directory, there are 252 files that are over a year old, going all the way back to 2008. For example:
Mar 26 2008 ./D54D35C678DF405C86C95415B2C614021/freebl3.sym
Apr 4 2008 ./47F6A1FB1/freebl3.pdb
May 12 2008 ./01A7E7FDFEB74A879BF483573FBD060C1/freebl3.pdb
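The measurements above (directory count, empty directories, files over a year old) can be reproduced with a short walk over a single subtree. This is a sketch: survey() is a hypothetical helper, "empty" here matches find's -empty (no entries at all), and the starting directory itself is not counted.

```python
import os
import time


def survey(root, max_age_days=365, now=None):
    """Walk `root`; return (dir_count, empty_dir_count, old_entries).

    dir_count excludes `root` itself. old_entries is a sorted list of
    (mtime, path) for non-directory entries older than max_age_days.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    dirs = empty = 0
    old = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath != root:
            dirs += 1
            if not dirnames and not filenames:
                empty += 1  # no subdirectories and no files
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.lstat(path).st_mtime
            if mtime < cutoff:
                old.append((mtime, path))
    old.sort()  # oldest first, like the 2008 examples above
    return dirs, empty, old
```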
If you'd like I can easily run a "find | xargs rmdir" command to prune all the empty directories... it would still be a good idea to figure out what's wrong with the script though.
Reporter
Comment 4•14 years ago
(In reply to comment #1)
> It's beginning to look like this script hasn't been run in a *long* time.
> We're not finding it in cron on that server, or any other that we've
> identified as mounting this share.
Huh, I wonder what happened here? It was definitely running at one point. I wonder if this server got reprovisioned or something?
> In running the 3 latest versions of this script in hg (2008, your previous
> change, and current), in dry-run mode, here's what they find:
>
> [root@dm-symbolpush01 ~]# wc -l mod*
> 1516 mod1-symbols_ffx-list
> 82356 mod2-symbols_ffx-list
> 98711 mod3-symbols_ffx-list
>
> The lists appear to be inclusive (that is, I don't see things in the 2008
> list that aren't in the newer ones), as expected.
Ouch. That's a lot of removals.
> The 2008 one finds files to remove dating back to December of last year...
> so it hasn't run in a while or those would be gone by now. Your previous
> revision (5/17) finds things dating back to *2008*, which should confirm
> that it's never been run at all, or else those would have been deleted
> already as well.
The latest revision of the script has some changes that will make it remove some older files that the other scripts wouldn't, so I wouldn't put too much stock in that.
> I will run it by hand once for symbols_ffx. Let me know how you'd like the
> cron jobs to look for this (what frequency, which directories).
Daily on each symbols_* directory (except symbols_os) should be fine.
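A sketch of how the set of directories for the daily job could be derived, assuming all the symbols_* directories sit under one base directory (the ${BASE} used by the cron job later in this bug; its value isn't given here). cleanup_targets() is a hypothetical helper.

```python
from pathlib import Path


def cleanup_targets(base, exclude=("symbols_os",)):
    """Return the symbols_* directories under `base` the daily cleanup should cover."""
    return sorted(
        p for p in Path(base).glob("symbols_*")
        if p.is_dir() and p.name not in exclude  # skip plain files and excluded dirs
    )
```

Globbing instead of hard-coding the list means a newly added symbols_* directory is picked up automatically; the flip side is that anything matching the pattern gets cleaned, so exclusions have to be explicit.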
Reporter
Comment 5•14 years ago
(In reply to comment #3)
> On closer inspection, I don't think this script is working as effectively as
> it should.
>
> For these examples, I'm looking specifically at /symbols_ffx/freebl3.pdb/...
> the filesystem is too big to get quick measurements on, so I just took the
> first dir in there as a representative sample.
>
> The empty directory pruning doesn't seem to work: in that directory, I see
> 5699 directories, of which 5067 are empty.
Ah, hm. It's possible my code broke at some point. I can take a look at that later.
> Also in that directory, There are 252 files that are over a year old. These
> go all the way back to 2008. For example:
> Mar 26 2008 ./D54D35C678DF405C86C95415B2C614021/freebl3.sym
> Apr 4 2008 ./47F6A1FB1/freebl3.pdb
> May 12 2008 ./01A7E7FDFEB74A879BF483573FBD060C1/freebl3.pdb
Right, that's not necessarily wrong. We intentionally don't ever delete symbols from release builds (including alphas, betas, and full releases), so there are symbols dating back to Firefox 3.
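The retention rule described here amounts to a predicate like the one below. It is a sketch only: how the real script recognizes release builds and which age cutoff it applies aren't stated in this bug, so is_release and max_age_days are hypothetical stand-ins.

```python
import time
from dataclasses import dataclass


@dataclass
class SymbolFile:
    path: str
    mtime: float       # seconds since the epoch
    is_release: bool   # hypothetical flag: True for alpha/beta/final release builds


def should_delete(f, max_age_days, now=None):
    """Release symbols are never deleted; other symbols expire after max_age_days."""
    if f.is_release:
        return False
    now = time.time() if now is None else now
    return now - f.mtime > max_age_days * 86400
```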
> If you'd like I can easily run a "find | xargs rmdir" command to prune all
> the empty directories... it would still be a good idea to figure out what's
> wrong with the script though.
Feel free to do that; I can also look into fixing the script.
(In reply to comment #2)
> This has finished for symbols_ffx, and got back only about 1GB of space. Let
> me know about the other directories.
Hrm. I would have expected more than that given the volume of things it deleted.
Assignee
Comment 6•14 years ago
(In reply to comment #5)
> Right, that's not necessarily wrong. We intentionally don't ever delete
> symbols from release builds (including alphas, betas, and full releases), so
> there are symbols dating back to Firefox 3.
Is there some cut-off for this? Otherwise I would expect to see symbols going back much further than 2008... Firefox 1.0, even.
>
> > If you'd like I can easily run a "find | xargs rmdir" command to prune all
> > the empty directories... it would still be a good idea to figure out what's
> > wrong with the script though.
>
> Feel free to do that, I can also look into fixing the script.
Running now, for symbols_ffx. I see the same thing in /symbols_tbrd/jar50.pdb/ (1561 dirs, 1501 are empty), so if/when the script is fixed we can double-check functionality on that.
> (In reply to comment #2)
> > This has finished for symbols_ffx, and got back only about 1GB of space. Let
> > me know about the other directories.
>
> Hrm. I would have expected more than that given the volume of things it
> deleted.
I'm trying to get a disk usage report, to see if we can get some idea of where all the space is going. Unfortunately the sheer number of files is making it very slow. I'll let it run over the weekend, and hopefully we'll have a report on Monday. We might find something unexpected eating up a lot of space somewhere.
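A rough sketch of such a report: per-subdirectory byte and file counts, walking with os.scandir and an explicit stack instead of recursion. It reports apparent sizes (st_size), so totals won't exactly match du, which counts allocated blocks; usage_report() and tree_size() are hypothetical names.

```python
import os


def tree_size(root):
    """Return (total_bytes, file_count) for everything under `root`.

    Directory entries from os.scandir carry type information, so
    subdirectories can usually be recognized without an extra stat call.
    Symlinks are not followed.
    """
    total = count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            it = os.scandir(path)
        except OSError:
            continue  # vanished or unreadable directory
        with it:
            for entry in it:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                else:
                    try:
                        total += entry.stat(follow_symlinks=False).st_size
                        count += 1
                    except FileNotFoundError:
                        pass  # removed while we were walking
    return total, count


def usage_report(base):
    """(bytes, files, name) for each immediate subdirectory of `base`, largest first."""
    rows = []
    with os.scandir(base) as it:
        for entry in it:
            if entry.is_dir(follow_symlinks=False):
                rows.append((*tree_size(entry.path), entry.name))
    return sorted(rows, reverse=True)
```

Because it also counts files, the report shows where the sheer number of entries, not just the bytes, is concentrated.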
Reporter
Comment 7•14 years ago
(In reply to comment #6)
> Is there some cut-off for this? Otherwise I would expect to see symbols
> going back much further than 2008... Firefox 1.0, even.
Firefox 3 is the first release that we shipped with Breakpad crash reporting, which would have been in 2008.
> I'm trying to get a disk usage report, to see if we can get some idea of
> where all the space is going. Unfortunately the sheer number of files is
> making it very slow. I'll let it run over the weekend, and hopefully we'll
> have a report on Monday. We might find something unexpected eating up a lot
> of space somewhere.
Yeah, even "du" is painfully slow there. We're definitely going to investigate alternatives to flat-file storage in the future.
Assignee
Comment 8•14 years ago
(In reply to comment #7)
> Firefox 3 is the first release that we shipped with Breakpad crash
> reporting, which would have been in 2008.
Ah, makes sense. Thanks.
> Yeah, even "du" is painfully slow there. We're definitely going to
> investigate alternatives to flat-file storage in the future.
Once we cut down on the number of files and directories (especially the empty ones, as there seem to be so many of them), this should get better.
Another issue is that dm-symbolpush01 itself is fairly limited. It has 1 CPU core and 512 MB of RAM (it's a VM, not a real server). This small amount of RAM really cuts into how much disk caching it can do, which means you get a lot of disk seeks that would have been cached on a beefier server. If it's a recurring problem, we can look into giving it more RAM... that would probably help.
Reporter
Comment 9•14 years ago
Ah, good point. In general, the only things that happen on this box are the scp/ssh upload/unzip of the symbols, and the cleanup script. The Socorro consumers of the data access the NFS mount elsewhere.
Assignee
Comment 10•14 years ago
This cron is now in place. Note that the cron itself runs as root, simply because then I only need one cron instead of 12. As I understand it, this only removes things, so this should not create a permissions problem.
Additionally, the cron also manually removes empty directories, like so:
find ${BASE}/symbols_${i} -depth -type d -empty -print0 | xargs -0 rmdir
The output from the script leads me to believe it should be doing this as well, but it appears to be failing to do so for some reason. This is just a quick fix for that. It would be good to either fix up the script, change the script's output (if I'm misinterpreting it somehow), or just remove that bit of code from the script altogether and rely on this find job.
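For comparison, here is a sketch of bottom-up pruning in Python, the kind of logic the script's own pruning step could use; it is not the script's actual code, and prune_empty_dirs() is a hypothetical name. One likely limitation of the find | xargs rmdir pipeline is that find tests -empty before xargs has removed anything, so a directory that contains only empty subdirectories typically survives until the next run. Walking with os.walk(topdown=False) and tracking what has been removed handles nested empty directories in one pass, supports a dry run, and never removes the top-level symbols_* directory itself.

```python
import os


def prune_empty_dirs(root, dry_run=True):
    """Remove empty directories under `root`, deepest first.

    Returns the sorted list of directories removed (or that would be
    removed when dry_run is True). `root` itself is never removed.
    """
    removed = set()
    # topdown=False yields children before their parents.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if dirpath == root or filenames:
            continue
        # Empty if every subdirectory listed here was itself removed.
        # Symlinks to directories never get removed, so they keep the parent.
        if all(os.path.join(dirpath, d) in removed for d in dirnames):
            if not dry_run:
                try:
                    os.rmdir(dirpath)
                except OSError:
                    continue  # something appeared meanwhile; leave it
            removed.add(dirpath)
    return sorted(removed)
```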
This has also been added to puppet, so it has a better chance of not falling off in the future should we need to recreate it.
For future reference, the cron is located in root's crontab (which is where puppet put it, not me). The wrapper script it calls is:
/mnt/netapp/breakpad/cleanup-breakpad-symbols.sh
and the cleanup script itself is:
/mnt/netapp/breakpad/cleanup-breakpad-symbols.py
If you need this updated in the future, it might be a good idea to reference or clone this bug, so whoever in infra handles it will have a better idea of what's going on; that will save some time. Thanks!
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard