Bug 558595 (Closed): Opened 14 years ago, Closed 14 years ago

tracemonkey 64-bit linux tinderboxes out of disk space

Categories: Release Engineering :: General
Type: defect
Hardware: x86_64 Linux
Importance: Priority not set, severity normal
Tracking: Not tracked
Status: RESOLVED WORKSFORME
People: Reporter: gal; Assignee: Unassigned
Whiteboard: [linux64]

It's a bummer that Andreas had to play meat Nagios here. Shouldn't disk provisioning work automatically? And, in the event that it fails, shouldn't it be reported automatically?
(In reply to comment #1)
> Shouldn't disk provisioning work automatically? 

What do you mean by provisioning (or automatically)?

> And, in the event that it fails, shouldn't it
> be reported automatically?

I think you're talking about Nagios?  If Nagios is monitoring disk usage, yes, it'll alert RelEng.
(In reply to comment #2)
> (In reply to comment #1)
> > Shouldn't disk provisioning work automatically? 
> 
> What do you mean by provisioning (or automatically)?

I don't see why any of these machines should ever run out of disk space... but they all seem to run at like 90% capacity for some reason, so of course they fail occasionally.

> 
> > And, in the event that it fails, shouldn't it
> > be reported automatically?
> 
> I think you're talking about Nagios?  If Nagios is monitoring disk usage, yes,
> it'll alert RelEng.

Why wouldn't Nagios monitor disk usage?
 
> I don't see why any of these machines should ever run out of disk space... but
> they all seem to run at like 90% capacity for some reason, so of course they
> fail occasionally.

I don't know the process - that's for RelEng.  Guessing they don't clean up. 

It has three drives: 9GB split between / & swap, 20GB split between /builds & /var, and a 30GB disk that I don't see mounted.

(RelEng - why aren't you using that 30GB disk?)
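
(Not RelEng tooling, just an aside: a rough standalone Python sketch of how one could spot an attached-but-unmounted disk from the slave itself, by comparing /proc/partitions against /proc/mounts. The disk-vs-partition heuristic is crude and purely illustrative.)

    # List block devices that have no mounted filesystem. Illustrative only.

    def mounted_devices():
        """Device paths currently backing a mounted filesystem."""
        with open("/proc/mounts") as f:
            return set(line.split()[0] for line in f if line.startswith("/dev/"))

    def whole_disks():
        """Whole-disk device names from /proc/partitions (e.g. sda, sdb)."""
        disks = []
        with open("/proc/partitions") as f:
            for line in list(f)[2:]:          # skip the header lines
                fields = line.split()
                if not fields:
                    continue
                name = fields[-1]
                if name[-1].isalpha():        # crude: 'sda' is a disk, 'sda1' a partition
                    disks.append(name)
        return disks

    if __name__ == "__main__":
        mounted = mounted_devices()
        for disk in whole_disks():
            used = any(dev.startswith("/dev/" + disk) for dev in mounted)
            print("%s: %s" % (disk, "mounted" if used else "no mounted partitions"))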

> > I think you're talking about Nagios?  If Nagios is monitoring disk usage, yes,
> > it'll alert RelEng.
> 
> Why wouldn't Nagios monitor disk usage?

The only reason it wouldn't is if it wasn't configured to do so.  In this case it is configured to do so, but oncall doesn't get paged or notified on these.  I believe RelEng does.
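
(For context, the check Nagios runs here amounts to comparing usage on each monitored mount point against warning/critical thresholds. Below is a minimal standalone Python sketch of that idea; the mount points and thresholds are made-up examples, not the actual monitoring configuration.)

    # Minimal sketch of a Nagios-style disk check: compare usage on a few
    # mount points against warn/critical thresholds and exit with the usual
    # status codes (0=OK, 1=WARNING, 2=CRITICAL). Paths and thresholds are
    # illustrative, not the real config.
    import os
    import sys

    MOUNTS = ["/", "/builds", "/var"]   # hypothetical mount points to watch
    WARN, CRIT = 80, 90                 # percent used

    def percent_used(path):
        st = os.statvfs(path)
        total = st.f_blocks * st.f_frsize
        free = st.f_bavail * st.f_frsize
        return 100.0 * (total - free) / total

    if __name__ == "__main__":
        status = 0
        for mount in MOUNTS:
            used = percent_used(mount)
            if used >= CRIT:
                status = max(status, 2)
            elif used >= WARN:
                status = max(status, 1)
            print("%s: %.0f%% used" % (mount, used))
        sys.exit(status)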

Anyways, all of these issues are RelEng issues so I'm punting this bug over to them.
Assignee: server-ops → nobody
Component: Server Operations: Tinderbox Maintenance → Release Engineering
QA Contact: mrz → release
(In reply to comment #4)
> Has three drives, 9GB split between / & swap, 20GB split between /builds & /var
> and 30GB disk that I don't see mounted.
> 
> (RelEng - why aren't you using that 30GB disk?)

Hm, where are you seeing this disk?
For moz2-linux64-slave12, in VI Edit Settings, I only see two disks.
I'd love to have a third disk lying there though -- then I could expand /builds without feeling guilty about taking even more space on the SAN :)

(In reply to comment #3)
> I don't see why any of these machines should ever run out of disk space... but
> they all seem to run at like 90% capacity for some reason, so of course they
> fail occasionally.

We keep builds around for:

a) debugging
b) faster depend builds

The way buildbot currently lays it out, there is a separate directory (separate checkout, separate objdir) per build type per branch.  So if there's a debug and opt build for 8 project/release branches, that's 16 directory trees.  If you include l10n and unit tests, there are even more directory trees.
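
To make the multiplication concrete, here's a small Python sketch of that kind of layout; the branch names and /builds paths are hypothetical examples, not the real buildbot configuration.

    # Sketch of how per-branch, per-build-type directories multiply on a slave.
    # Branch names and the /builds root are made-up examples.
    import itertools

    BRANCHES = ["mozilla-central", "tracemonkey", "branch-3", "branch-4",
                "branch-5", "branch-6", "branch-7", "branch-8"]    # 8 branches
    BUILD_TYPES = ["opt", "debug"]

    dirs = ["/builds/slave/%s-linux64-%s" % (branch, btype)
            for branch, btype in itertools.product(BRANCHES, BUILD_TYPES)]

    for d in dirs:
        print(d)
    print("%d separate checkout+objdir trees" % len(dirs))   # 8 branches * 2 types = 16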

We are working on smarter layouts as a longer-term fix... Recent versions of buildbot now allow build directories to be shared between builders, for example.  And moving unit tests off to the talos boxes should reduce the number of test binaries lying around.
Maybe we should buy some hard drives.


from bug 531675

(In reply to comment #8)
>
> Full disks took out Tinderbox right in the middle of my Try Server run, but
> what results I got look good to me.
> 
> I took the bug summary literally and just made eval ignore the 2nd argument
> entirely. Hope that's the right thing.
(In reply to comment #6)
> from bug 531675
> > Full disks took out Tinderbox right in the middle of my Try Server run, but
> > what results I got look good to me.

Unrelated. That was a disk space issue on the tinderbox server.

> maybe we should buy some hard drives.

The newer hardware slaves have larger drives, but the VMs (like this one, moz2-linux64-slave12) have smaller drives because the VMs are all sharing space on the network storage device.

(In reply to comment #4)
> > they all seem to run at like 90% capacity for some reason, so of course they
> > fail occasionally.
> 
> I don't know the process - that's for RelEng.  Guessing they don't clean up. 

We try not to clean up too aggressively *on purpose*. The alternative is sucking down a brand new hg clone every time, and we're already straining existing bandwidth.
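
As a rough illustration of that tradeoff: a cleanup pass that frees space without forcing a re-clone would delete only the rebuildable object directories and leave the hg checkouts alone. A sketch under assumed paths and naming (not actual RelEng tooling):

    # Delete object directories but keep source checkouts, so nothing has to be
    # re-cloned over the network. The /builds layout and "obj-*" naming are
    # assumptions for illustration.
    import glob
    import os
    import shutil

    BUILDS_ROOT = "/builds/slave"   # hypothetical slave build root

    def cleanup(dry_run=True):
        for objdir in glob.glob(os.path.join(BUILDS_ROOT, "*", "obj-*")):
            print("%s %s" % ("would remove" if dry_run else "removing", objdir))
            if not dry_run:
                shutil.rmtree(objdir, ignore_errors=True)

    if __name__ == "__main__":
        cleanup(dry_run=True)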
OS: Mac OS X → Linux
Hardware: x86 → x86_64
This doesn't seem to have recurred, but I'm marking it resolved and linking it to the linux64 tracking bug so it will be easy to find if it does.
Blocks: support-L64
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
Whiteboard: [linux64]
Product: mozilla.org → Release Engineering