Closed Bug 801607 Opened 12 years ago Closed 10 years ago

Make EC2 instances less susceptible to "abort: No space left on device"

Categories

(Release Engineering :: General, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Unassigned)

Details

(Keywords: sheriffing-untriaged)

Latest instance:

slave: bld-linux64-ec2-020
https://tbpl.mozilla.org/php/getParsedLog.php?id=16115372&tree=Mozilla-Inbound

{
========= Started clone build tools failed (results: 2, elapsed: 19 secs) (at 2012-10-15 05:45:08.363782) =========
hg clone http://hg.mozilla.org/build/tools tools
 in dir /builds/slave/m-in-lnx/. (timeout 1320 secs)
 watching logfiles {}
 argv: ['hg', 'clone', 'http://hg.mozilla.org/build/tools', 'tools']
 environment:
  CCACHE_HASHDIR=
  G_BROKEN_FILENAMES=1
  HISTCONTROL=ignoredups
  HISTSIZE=1000
  HOME=/home/cltbld
  HOSTNAME=bld-linux64-ec2-020.build.aws-us-west-1.mozilla.com
  LESSOPEN=|/usr/bin/lesspipe.sh %s
  LOGNAME=cltbld
  MAIL=/var/spool/mail/cltbld
  PATH=/usr/local/bin:/usr/lib64/ccache:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cltbld/bin
  PWD=/builds/slave/m-in-lnx
  SHELL=/bin/bash
  SHLVL=1
  TERM=linux
  USER=cltbld
  _=/tools/buildbot/bin/python
 using PTY: False
requesting all changes
adding changesets
adding manifests
adding file changes
added 3062 changesets with 6319 changes to 1063 files
updating to branch default
abort: No space left on device
program finished with exit code 255
elapsedTime=19.700516
========= Finished clone build tools failed (results: 2, elapsed: 19 secs) (at 2012-10-15 05:45:28.080959) =========
}
and:

slave: bld-linux64-ec2-033
https://tbpl.mozilla.org/php/getParsedLog.php?id=16113771&tree=Mozilla-Inbound

{
========= Started clone build tools failed (results: 2, elapsed: 20 secs) (at 2012-10-15 04:16:27.915769) =========
hg clone http://hg.mozilla.org/build/tools tools
 in dir /builds/slave/m-in-lnx-dbg/. (timeout 1320 secs)
 watching logfiles {}
 argv: ['hg', 'clone', 'http://hg.mozilla.org/build/tools', 'tools']
 environment:
  CCACHE_HASHDIR=
  G_BROKEN_FILENAMES=1
  HISTCONTROL=ignoredups
  HISTSIZE=1000
  HOME=/home/cltbld
  HOSTNAME=bld-linux64-ec2-033.build.aws-us-west-1.mozilla.com
  LESSOPEN=|/usr/bin/lesspipe.sh %s
  LOGNAME=cltbld
  MAIL=/var/spool/mail/cltbld
  PATH=/usr/local/bin:/usr/lib64/ccache:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/cltbld/bin
  PWD=/builds/slave/m-in-lnx-dbg
  SHELL=/bin/bash
  SHLVL=1
  TERM=linux
  USER=cltbld
  _=/tools/buildbot/bin/python
 using PTY: False
requesting all changes
adding changesets
adding manifests
adding file changes
added 3061 changesets with 6318 changes to 1063 files
updating to branch default
abort: No space left on device: /builds/slave/m-in-lnx-dbg/tools/lib/python
program finished with exit code 255
elapsedTime=20.961564
========= Finished clone build tools failed (results: 2, elapsed: 20 secs) (at 2012-10-15 04:16:48.912714) =========
}
Sounds like we need to bump the buildSpace requirements to account for mock overhead: the downloaded RPMs and the mock chroot.
Some initial cleanup on boot would also be a good idea.
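For illustration only, a minimal Python sketch of the kind of check and cleanup discussed above. The function names, the tuple layout, and the least-recently-used purge order are assumptions made for this example; this is not the actual purge tooling the build slaves run.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def pick_dirs_to_purge(candidates, free_now_gib, required_gib):
    """Choose build dirs to delete, least recently used first.

    `candidates` is a list of (name, last_used_timestamp, size_gib) tuples
    (a hypothetical layout). Directories are selected until the projected
    free space reaches `required_gib`. Returns [] if there is already
    enough space.
    """
    to_delete = []
    projected = free_now_gib
    for name, _last_used, size_gib in sorted(candidates, key=lambda c: c[1]):
        if projected >= required_gib:
            break
        to_delete.append(name)
        projected += size_gib
    return to_delete
```

A boot-time or pre-job hook could call `free_gib("/builds/slave")`, compare it with the job's requirement plus some allowance for the mock chroot and RPM cache, and delete whatever `pick_dirs_to_purge` returns before the first `hg clone` runs.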
Depends on: 712206
Severity: critical → major
Priority: -- → P2
Blocks: 807624, 807294, 798820
The EC2 instances have ~100G build space, and the ix hardware they're replacing has close to twice that. This could also be summarised as "update build space requirements to reality, since we've been getting away with them being wrong by having a lot of free space".
linux64 needs another GB: the value at http://mxr.mozilla.org/build/source/buildbot-configs/mozilla/config.py#190 should be 7 (inbound overrides the 6, and I don't think it's been starting these).
I bumped the build space for linux64 to 7 - http://hg.mozilla.org/build/buildbot-configs/rev/eebcc5ed460e
In production.
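As a rough sketch of how a per-platform build space value with a per-branch override might be resolved: the dictionary names, the `example-branch` entry, and its value of 9 are hypothetical and are not taken from buildbot-configs. Only the linux64 value of 7 comes from this bug.

```python
# Hypothetical layout, for illustration only; not the real config.py structure.
PLATFORM_BUILD_SPACE_GB = {
    "linux64": 7,  # value after the bump described in this bug
}

# Hypothetical per-branch overrides; a branch listed here wins over the default.
BRANCH_OVERRIDES_GB = {
    "example-branch": {"linux64": 9},
}


def effective_build_space(branch, platform):
    """Return the build space (GB) required for `platform` on `branch`.

    A branch-specific override takes precedence over the platform default.
    """
    override = BRANCH_OVERRIDES_GB.get(branch, {})
    return override.get(platform, PLATFORM_BUILD_SPACE_GB[platform])
```

With this shape, bumping the platform default has no effect on a branch that carries its own override, so both places need checking when a requirement changes.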
Whiteboard: [sheriff-want]
Product: mozilla.org → Release Engineering
It doesn't look like there's an overarching problem to fix here, just the normal individual buildSpace bumps.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: General Automation → General