Closed Bug 695611 Opened 13 years ago Closed 11 years ago

Out of space on tryserver symbol server

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jrmuizel, Assigned: dustin)

Details

(Whiteboard: [tryserver][cleanup][symbols])

I'm getting these errors during try builds:
scp: /symbols/windows/a524b238173f101be1648b9bbdd275073f562f3e-firefox-10.0a1.en-US.win32.crashreporter-symbols-full.zip: No space left on device

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmuizelaar@mozilla.com-42ce777e5d97/try-win32/try-win32-build3238.txt.gz

These errors cause the build to be restarted, after which it fails again and again.
From the look of the log file, this is an scp to dm-wwwbuild01 (build.mozilla.org). Sure enough, its /symbols partition is full:

/dev/sdc1              50G   50G     0 100% /symbols

This happens during 'make uploadsymbols'.

There is a cron job for user trybld that runs
  find /symbols/windows -mtime +14 -prune -exec rm -rf {} \;
daily, and indeed, the symbol files there are all less than 15 days old.

I re-ran this by hand with +13, which should free up enough space to get through the morning.  Now, over to releng to figure out what the underlying cause is here.

/dev/sdc1              50G   44G  5.5G  89% /symbols
  (and it's still rm'ing)
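The cleanup command can be sketched as a small shell function. This is a minimal sketch, not the actual trybld script. The function name `prune_symbols` is hypothetical. `-mindepth 1` is an addition so the top-level directory itself can never match; GNU and BSD find support it, but it is not POSIX. `{} +` batches paths into fewer `rm` calls instead of running one per match. With `-mtime +N`, find truncates each entry's age to whole 24-hour periods and matches only when that count exceeds N. So `+14` removes entries at least 15 days old, which is consistent with everything left being under 15 days old.

```shell
#!/bin/sh
# Hypothetical helper mirroring the trybld cron command: delete entries under
# DIR whose mtime is more than DAYS whole days old.
prune_symbols() {
    dir=$1
    days=$2
    # -mindepth 1: never consider DIR itself (added; not in the original cron).
    # -mtime +N:   age, truncated to whole days, is greater than N.
    # -prune:      don't descend into a matched directory; rm -rf takes it whole.
    # {} +:        pass many matched paths to each rm invocation.
    find "$dir" -mindepth 1 -mtime +"$days" -prune -exec rm -rf {} +
}

# Usage, matching the daily job and the manual +13 run described above:
#   prune_symbols /symbols/windows 14
#   prune_symbols /symbols/windows 13
```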

It wouldn't be a bad idea to file a bug to monitor disk space on that partition on dm-wwwbuild01.
Assignee: server-ops-releng → nobody
Severity: normal → major
Component: Server Operations: RelEng → Release Engineering
QA Contact: zandr → release
Summary: Out of space on symbol server → Out of space on tryserver symbol server
I would assume that if devs need symbols for their build, they're grabbing them ASAP. Do we have any idea about tryserver symbol usage to validate that belief?

If that's true, I think we could get away with storing symbols for half as long (7 days) *and* adding some monitoring to watch the partition.
OS: Mac OS X → All
Priority: -- → P3
Hardware: x86 → All
Whiteboard: [tryserver][cleanup][symbols]
If only we could see the HTTP logs on build.m.o. We don't have root there, and the files are the usual 644, owned by root.
Scumbag liar was telling porkies: the logs are visible.

So the place to look is /var/log/httpd/access_log*. Since 18 Sep there have been
* precisely zero successful symbol downloads
* two attempts on Oct 17 to pull symbols for a mozilla-central nightly
* a bunch of requests against windows libraries
* a bunch of search engine spiders getting a 404 on the root symbol dir

The middle two probably just indicate that people have the try server URL earlier in the search path than our main symbol server and the MS one.

At any rate, try symbols are not a well-used service.
We hit zero free space again today. In the meantime, the cron job had been set to '-mtime +10', so I bet it's not the first time.
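A tally like the one above can be reproduced with awk. This sketch makes several assumptions:

* Apache's default common or combined LogFormat, so the request path is field 7 and the status code is field 9.
* Symbol requests share a URL prefix such as `/symbols/`. The prefix is an assumption, as is the function name `symbol_status_counts`.
* Rotated logs that have been gzipped would need to go through `zcat` first.

```shell
#!/bin/sh
# Hypothetical helper: count HTTP status codes for requests whose path starts
# with PREFIX. Assumes Apache common/combined log format, where awk's default
# whitespace splitting puts the path in $7 and the status code in $9.
symbol_status_counts() {
    prefix=$1
    shift
    awk -v prefix="$prefix" '
        index($7, prefix) == 1 { count[$9]++ }
        END { for (s in count) print s, count[s] }
    ' "$@" | sort
}

# Usage (URL prefix assumed):
#   symbol_status_counts /symbols/ /var/log/httpd/access_log*
```

Each output line is a status code followed by its count. A run with no `200` line would correspond to "precisely zero successful symbol downloads".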

I've switched it to '-mtime +9' for a little more breathing space, and set the cron to run every two hours instead of once a day (less boom and bust).
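As a crontab line for user trybld, the new schedule might look like the following sketch. The minute field (0) is an assumption, since only "every two hours" is stated. The `*/2` step syntax is supported by the common Vixie-derived crons, though it is not part of POSIX crontab.

```shell
# Hypothetical crontab entry for trybld (edit with `crontab -e` as that user):
# at minute 0 of every second hour, remove entries more than 9 days old.
0 */2 * * * find /symbols/windows -mtime +9 -prune -exec rm -rf {} \;
```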

dustin, is there a nagios check on this partition? I don't see one when I'm logged into the nagios web interface. The clock is wrong on the host, too.
I wonder if it wouldn't be better to just upload the full symbol packages alongside the builds themselves, and get rid of the symbol server. Then the retention policy for symbols could match the builds, and we'd only have to worry about one storage volume.
I fixed the time - looks like ntp had fallen off the wagon again.

I added the nagios check and set it to alert in #build (ne #buildduty).
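A disk check of this kind boils down to comparing a partition's usage against thresholds and returning the Nagios plugin status codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below is a hypothetical stand-in, not the check that was actually deployed. Its 80%/90% thresholds and the name `check_partition_usage` are assumptions. In practice, the stock `check_disk` plugin from the standard plugins package does the same job.

```shell
#!/bin/sh
# Hypothetical Nagios-style usage check. Thresholds are percent used and are
# assumptions: warn at 80, critical at 90 unless overridden.
# Status codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
check_partition_usage() {
    mount=$1
    warn=${2:-80}
    crit=${3:-90}
    # df -P prints one line per filesystem; field 5 of the data line is "NN%".
    used=$(df -P "$mount" 2>/dev/null | awk 'NR == 2 { sub(/%/, "", $5); print $5 }')
    if [ -z "$used" ]; then
        echo "DISK UNKNOWN - cannot read usage for $mount"
        return 3
    fi
    if [ "$used" -ge "$crit" ]; then
        echo "DISK CRITICAL - $mount is ${used}% full"
        return 2
    elif [ "$used" -ge "$warn" ]; then
        echo "DISK WARNING - $mount is ${used}% full"
        return 1
    fi
    echo "DISK OK - $mount is ${used}% full"
    return 0
}

# Usage as a plugin body (path assumed):
#   check_partition_usage /symbols; exit $?
```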

+1 to ted's suggestion.  dm-wwwbuild01 is not a fileserver :)
Hey! We fixed bug 702337! So now you can just get rid of this storage entirely.
Nick, can you do the 'rm' here, then hand it to me, and I'll clean any remaining Apache config out of the old and new clusters?
relengweb1.dmz.scl3:/symbols/ (aka relengweb1.dmz.scl3:/mnt/netapp/relengweb/oldstuff/symbols) is gone, Jim.
Assignee: nobody → dustin
We're getting cron mail from trybld@relengweb1.dmz.scl3 as it tries to clean up /symbols/windows, so I added that directory back. Please remove the cron job from puppet (its crontab entry is marked "Puppet Name: cleanup-tryserver-symbols"), along with anything related.
Fixed, dirs and symlinks removed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering