Bug 695611
Opened 13 years ago
Closed 11 years ago
Out of space on tryserver symbol server
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jrmuizel, Assigned: dustin)
References
Details
(Whiteboard: [tryserver][cleanup][symbols])
I'm getting these errors during try builds:

  scp: /symbols/windows/a524b238173f101be1648b9bbdd275073f562f3e-firefox-10.0a1.en-US.win32.crashreporter-symbols-full.zip: No space left on device

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmuizelaar@mozilla.com-42ce777e5d97/try-win32/try-win32-build3238.txt.gz

These cause the build to be restarted, after which it fails again and again.
Comment 1•13 years ago (Assignee)
So from the look of the logfile, this is an scp to dm-wwwbuild01 (build.mozilla.org). Sure enough, its /symbols is full:

  /dev/sdc1 50G 50G 0 100% /symbols

This is in the 'make uploadsymbols' step. There is a cron task for user trybld that runs

  find /symbols/windows -mtime +14 -prune -exec rm -rf {} \;

daily, and indeed, the symbol files there are all <15 days old. I re-ran this by hand with +13, which should free up enough space to get through the morning. Now, over to releng to figure out what the underlying cause is here.

  /dev/sdc1 50G 44G 5.5G 89% /symbols

(and it's still rm'ing)

It wouldn't be a bad idea to file a bug to monitor disk space on that partition on dm-wwwbuild01.
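For readers less familiar with find, here is a minimal Python sketch of the same age-based pruning idea. It is an illustration only, not the script that ran on the host: the function name `prune_old_entries` is hypothetical, and it only inspects the top-level entries of the directory rather than walking the tree as find does. It mirrors the `-mtime +N` rule, where the age is truncated to whole days and must exceed N, so `+14` only matches entries at least 15 days old.

```python
import os
import shutil
import time


def prune_old_entries(root, max_age_days, now=None):
    """Remove top-level entries of root that are more than max_age_days whole days old.

    Returns the sorted names of the entries that were removed.
    """
    now = time.time() if now is None else now
    with os.scandir(root) as it:
        entries = list(it)  # snapshot first, since we delete while looping
    removed = []
    for entry in entries:
        mtime = entry.stat(follow_symlinks=False).st_mtime
        # Like find's -mtime +N: drop the fractional day, then require age > N.
        whole_days = int((now - mtime) // 86400)
        if whole_days <= max_age_days:
            continue
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            os.remove(entry.path)
        removed.append(entry.name)
    return sorted(removed)
```

Calling `prune_old_entries("/symbols/windows", 13)` would correspond roughly to the manual re-run with `+13` described above; lowering the number frees more space at the cost of a shorter retention window.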
Assignee: server-ops-releng → nobody
Severity: normal → major
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Updated•13 years ago
Summary: Out of space on symbol server → Out of space on tryserver symbol server
Comment 2•13 years ago
I would assume that if devs need symbols for their build, they're grabbing them ASAP. Do we have any idea about tryserver symbol usage to validate that belief? If that's true, I think we could get away with storing symbols for half as long (7 days) *and* adding some monitoring to watch the partition.
OS: Mac OS X → All
Priority: -- → P3
Hardware: x86 → All
Whiteboard: [tryserver][cleanup][symbols]
Comment 3•13 years ago
If only we could see the http logs on build.m.o. We don't have root there, files are the usual 644 owned by root.
Comment 4•13 years ago
Scumbag liar was telling porkies, the logs are visible. So the place to look is /var/log/httpd/access_log*. Since 18 Sep there have been
* precisely zero successful symbol downloads
* two attempts on Oct 17 to pull symbols for a mozilla-central nightly
* a bunch of requests against windows libraries
* a bunch of search engine spiders getting a 404 on the root symbol dir
The middle two probably just indicate that people have the try server URL earlier in the search path than our main symbol server and the MS one. At any rate try symbols is not a well-used service.
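A tally like the one above can be reproduced with a short script. The following is a rough sketch, assuming the logs are in Apache's common or combined format; the file suffixes used to recognise symbol requests (`.sym`, `.pdb`, `.pd_`) and the function name `summarize` are assumptions for illustration, not details taken from the host.

```python
import re
from collections import Counter

# Pulls the request path and status code out of an Apache common/combined log line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) ')


def summarize(lines, suffixes=(".sym", ".pdb", ".pd_")):
    """Count symbol-file requests, split into 2xx ("ok") and everything else ("failed")."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # not a request line we can parse
        path = match.group("path").split("?", 1)[0].lower()
        if not path.endswith(suffixes):
            continue  # assumed not to be a symbol file
        outcome = "ok" if match.group("status").startswith("2") else "failed"
        counts[outcome] += 1
    return counts
```

Feeding it the lines of each access_log file and summing the counters gives a quick view of whether anyone is actually fetching symbols successfully.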
Comment 5•13 years ago
We hit zero free space again today. In the meantime the cron has been set to '-mtime +10', so I bet it's not the first time. I've switched it to '-mtime +9' for a little more breathing space, and set the cron to run every two hours instead of once a day (less boom and bust).

dustin, is there a nagios check on this partition? I don't see one when I'm logged into the nagios web interface. The clock is wrong on the host too.
Comment 6•13 years ago
I wonder if it wouldn't be better to just upload the full symbol packages alongside the builds themselves, and get rid of the symbol server. Then the retention policy for symbols could match the builds, and we'd only have to worry about one storage volume.
Comment 7•13 years ago (Assignee)
I fixed the time - looks like ntp had fallen off the wagon again. I added the nagios check and set it to alert in #build (ne #buildduty). +1 to ted's suggestion. dm-wwwbuild01 is not a fileserver :)
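The actual nagios check is not shown in this bug. As a rough idea of what a disk-space check for a partition like /symbols involves, here is a minimal sketch that follows the standard Nagios plugin exit-code convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The thresholds and the function names `classify` and `check_path` are assumptions chosen for illustration.

```python
import shutil

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used, free, warn_pct=80.0, crit_pct=90.0):
    """Map used/free space to a Nagios-style status code and the percentage used.

    The percentage is used / (used + free), which is how df derives Use%.
    """
    if used + free <= 0:
        return UNKNOWN, 0.0
    pct = 100.0 * used / (used + free)
    if pct >= crit_pct:
        return CRITICAL, pct
    if pct >= warn_pct:
        return WARNING, pct
    return OK, pct


def check_path(path, warn_pct=80.0, crit_pct=90.0):
    """Check the filesystem holding path; returns (status code, message)."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {path}: {exc}"
    code, pct = classify(usage.used, usage.free, warn_pct, crit_pct)
    label = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}[code]
    return code, f"DISK {label} - {path} {pct:.1f}% used"
```

A wrapper script would print the message and exit with the returned code, for example `code, msg = check_path("/symbols"); print(msg); sys.exit(code)`. With the thresholds assumed here, the 44G-used, 5.5G-free state reported in comment 1 would already raise a WARNING.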
Comment 8•11 years ago
Hey! We fixed bug 702337! So now you can just get rid of this storage entirely.
Comment 9•11 years ago (Assignee)
Nick, can you do the 'rm' here, then hand to me and I'll clean any remaining Apache config out of the old and new clusters?
Comment 10•11 years ago
relengweb1.dmz.scl3:/symbols/ (aka relengweb1.dmz.scl3:/mnt/netapp/relengweb/oldstuff/symbols) is gone, Jim.
Assignee: nobody → dustin
Comment 11•11 years ago
Updated this too: https://developer.mozilla.org/en-US/docs/Using_the_Mozilla_symbol_server
Comment 12•11 years ago
We're getting cron mail from trybld@relengweb1.dmz.scl3 trying to clean up /symbols/windows so I added that directory back. Please remove the cron from puppet (it quotes Puppet Name: cleanup-tryserver-symbols) and so on.
Comment 13•11 years ago (Assignee)
Fixed, dirs and symlinks removed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering