Cleanup temporary files on boot

RESOLVED WORKSFORME

Status

Release Engineering
Platform Support
P3
normal
RESOLVED WORKSFORME
5 years ago
3 years ago

People

(Reporter: armenzg, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Depends on: 880774
Depends on: 880874

Updated

5 years ago
Summary: Determine why the Mac builders are loosing their ability to create in TMPDIR → Determine why the Mac builders are losing their ability to create in TMPDIR
Depends on: 881498
Depends on: 847778
Depends on: 881501
Depends on: 881502
Depends on: 881503
Depends on: 881506

Comment 1

5 years ago
Hm, it's not clear -- should this be blocking the above builders' bugs, or should we set up those builders to see if it still happens?
Depends on: 882686
My feeling is that we should:

* disable the affected builders (the Try ones in particular are all still enabled and failing job after job)

* pick one to give to the hypothetical investigator of this bug

* fix the rest
Depends on: 882781
Depends on: 871367
So, $TMPDIR is /var/folders/30/yq_p3wk15yb9wdsv6sm_m0v00000gn/T/ on bld-lion-r5-031, from the last build.  Part of that is dynamically generated, but a find on /var/folders doesn't show any huge number of files.

A find for cpp-unit-profd shows, along with a lot of permission-denied errors:
/Users/cltbld/.Trash/Recovered files/cpp-unit-profd
/Users/cltbld/.Trash/Recovered files #1/cpp-unit-profd
/Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd

and under that TemporaryItems directory, I also see ./cpp-unit-profd-{1..9999}.  So philor nailed it in bug 852429 comment 627:

> The something else being that the unique naming only planned on needing four
> digits of unique, and that wasn't enough. Hard to believe we've crashed
> 10,000 times on all of these broken Windows slaves, so I suspect (as is the
> case most of the time when we try to delete files on Windows) that the
> harness doesn't actually succeed at deleting the profile directory at all on
> Windows.

So the fix is to clean that up on startup - either in the builds, in puppet, or in the preflight scripts catlee's working on.
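
To check whether a given slave has hit the four-digit ceiling described above, one could count the leftover profile directories. This is only a sketch: the function name count_profile_dirs is made up for illustration, and the default path is the TemporaryItems directory reported in this comment.

```shell
#!/bin/sh
# Hypothetical helper: count leftover cpp-unit-profd* entries directly
# under a directory. The default path is the one reported in this bug.
count_profile_dirs() {
    dir="${1:-/Users/cltbld/Library/Caches/TemporaryItems}"
    # -mindepth/-maxdepth 1 restrict the search to direct children, so find
    # does not descend into each profile directory.
    # tr strips the padding that some wc implementations add.
    find "$dir" -mindepth 1 -maxdepth 1 -name 'cpp-unit-profd*' 2>/dev/null \
        | wc -l | tr -d ' '
}
```

A count in the region of 10,000 would suggest the slave has run out of unique names.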

I'm going to reimage all of the dependent machines for bug 760093 anyway.
Depends on: 883548
Depends on: 883549
Depends on: 884297

Updated

5 years ago
Summary: Determine why the Mac builders are losing their ability to create in TMPDIR → Cleanup temporary files on boot
Depends on: 884659

Comment 4

5 years ago
Removing the dependencies here. I think it's pretty safe to say that *all* lion slaves are at risk of hitting this bug until it's resolved.

Updated

5 years ago
Component: Release Engineering: Machine Management → Release Engineering: Platform Support
QA Contact: armenzg → coop

Updated

5 years ago
Product: mozilla.org → Release Engineering

Updated

5 years ago
Blocks: 874642

Comment 5

5 years ago
Please can we bump the priority on this? I've unfortunately just had to disable 9 lion try build pool slaves, leaving us slightly short :-s
It seems that ted might be fixing this with the bug he's going to be filing.

10:46 ted: edmorley: TestHarness.h is old :-/
10:47 edmorley: ted: could we disable whatever is making these directories and not cleaning them up, until we fix this (or move cpp unit tests out of make check, if that will also fix it)
10:47 ted: edmorley: we can probably just fix TestHarness.h to use the cwd for the profile dir now
10:47 ted: since runcppunittests.py runs them in a temp dir
10:48 edmorley: ted: ah
10:48 ted: http://mxr.mozilla.org/mozilla-central/source/testing/runcppunittests.py#41
10:48 ted: should be a really small fix
10:50 edmorley: ted: could I leave you to file a bug blocking bug 874642 with a quick explanation? (I've got to start getting organised for my doctor's appointment shortly)
10:50 bugbot: Bug https://bugzilla.mozilla.org/show_bug.cgi?id=874642 normal, --, ---, nobody, NEW , Intermittent TEST-UNEXPECTED-FAIL | TestSTSParser | test failed with return code 65280 | Couldn't get the profile directory.
10:50 armenzg_brb is now known as armenzg_buildduty
10:50 ted: edmorley: yeah
10:50 ted: can probably patch it too
10:50 edmorley: ty
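
The idea ted describes (run each test from a throwaway working directory, so a profile created in the cwd disappears with it) can be sketched in shell. This is an illustration only, not the code in runcppunittests.py or TestHarness.h; run_in_tempdir is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: run a command inside a fresh temporary directory
# and remove that directory afterwards, whatever the command's exit status.
run_in_tempdir() {
    workdir=$(mktemp -d) || return 1
    # The subshell keeps the cd from leaking into the caller.
    ( cd "$workdir" && "$@" )
    status=$?
    rm -rf "$workdir"
    return "$status"
}
```

Anything the command writes relative to its cwd is removed when the wrapper returns, so nothing accumulates between runs.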
Assignee: nobody → ted
I'm planning on fixing one of the causes of this, but I don't think it'll fix it in general. This is probably still a useful thing to do.
Assignee: ted → nobody
Is this all that is needed to fix the machines? (Using a single -* glob complained of too many entries to expand, I think.)

rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-1*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-2*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-3*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-4*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-5*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-6*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-7*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-8*
rm -rf /Users/cltbld/Library/Caches/TemporaryItems/cpp-unit-profd-9*
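
If the complaint about -* was the shell's argument-list limit (an assumption; the exact error is not recorded here), letting find match the names avoids the expansion altogether. A minimal sketch, with remove_profile_dirs as a made-up function name:

```shell
#!/bin/sh
# Hypothetical helper: remove leftover cpp-unit-profd-* directories without
# relying on shell glob expansion.
remove_profile_dirs() {
    dir="${1:-/Users/cltbld/Library/Caches/TemporaryItems}"
    [ -d "$dir" ] || return 0
    # find does the pattern matching itself, and "-exec ... {} +" passes the
    # matches to rm in batches that fit within the argument-list limit.
    find "$dir" -mindepth 1 -maxdepth 1 -name 'cpp-unit-profd-*' \
        -exec rm -rf {} +
}
```

Called with no argument it targets the same TemporaryItems directory as the nine rm commands above; like them, it leaves the unsuffixed cpp-unit-profd directory alone.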
Assignee: nobody → armenzg
Depends on: 906706
It seems that our current list of hosts to fix is:

bld-lion-r5-023
bld-lion-r5-024
bld-lion-r5-029
bld-lion-r5-030
bld-lion-r5-031
bld-lion-r5-033
bld-lion-r5-036
bld-lion-r5-037
bld-lion-r5-038
bld-lion-r5-069
The two machines that I tried this on took jobs appropriately.

I cleaned up and rebooted all remaining ones.
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Thank you for sorting out those machines.

Reopening this bug, since it's for the automated ongoing cleanup (e.g. via puppet/...; comment 3) vs the one-off cleanup needed to get those particular machines back in service :-)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
edmorley, you want me to keep this one open until bug 906706 is completed?

I thought that having bug 906706 was good enough to close this one.
No longer blocks: 901286
I've cleaned up *all* hosts.

It seems that Puppet doesn't make the cut.

We'll have to deal with it with pre-flight tasks.
Assignee: armenzg → nobody
Depends on: 712206

Updated

5 years ago
Assignee: nobody → sbruno
Priority: -- → P3

Updated

4 years ago
No longer blocks: 880774
See Also: → bug 880774
Created attachment 8420139
Cleanup temp files on Darwin before starting buildbot

This is a temporary fix to stop the bleeding on Mac builders until catlee has runner ready for pre-flight tasks.
Attachment #8420139 - Flags: review?(bugspam.Callek)
Comment on attachment 8420139
Cleanup temp files on Darwin before starting buildbot

Review of attachment 8420139:
-----------------------------------------------------------------

::: modules/buildslave/files/darwin-run-buildslave.sh
@@ +7,5 @@
>  rm -rf /var/tmp/semaphore/run-buildbot
>  
> +# Pre-flight cleanup
> +if [ "${TMPDIR}" != "" ]; then
> +  rm -rf ${TMPDIR}/*

I'm pretty sure this might kill off any ssh-agent type stuff, and possibly some plist startup things, I forget.

Can we do this with a find ... -delete say with an age > 2 hours?
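
An age-based variant along the lines suggested here might look like the following. This is a sketch only, not the patch that landed; clean_old_tmp is a hypothetical name and the 120-minute default simply mirrors the "2 hours" above.

```shell
#!/bin/sh
# Hypothetical helper: delete entries under a directory whose modification
# time is more than N minutes old (default 120).
clean_old_tmp() {
    dir="$1"
    minutes="${2:-120}"
    # Do nothing if the directory argument is empty or missing, mirroring
    # the TMPDIR != "" guard in the patch under review.
    [ -n "$dir" ] && [ -d "$dir" ] || return 0
    # -mindepth 1 protects the directory itself. -delete works depth-first;
    # a directory that still holds newer files is not empty, so it cannot be
    # removed and is left in place (the error is discarded).
    find "$dir" -mindepth 1 -mmin "+$minutes" -delete 2>/dev/null
    return 0
}
```

In the wrapper script this would be invoked as clean_old_tmp "$TMPDIR", with the variable quoted so that a path containing spaces is not split.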
Attachment #8420139 - Flags: review?(bugspam.Callek) → review+
(In reply to Justin Wood (:Callek) from comment #15)
> Comment on attachment 8420139
> Cleanup temp files on Darwin before starting buildbot
> 
> Review of attachment 8420139:
> -----------------------------------------------------------------
> 
> ::: modules/buildslave/files/darwin-run-buildslave.sh
> @@ +7,5 @@
> >  rm -rf /var/tmp/semaphore/run-buildbot
> >  
> > +# Pre-flight cleanup
> > +if [ "${TMPDIR}" != "" ]; then
> > +  rm -rf ${TMPDIR}/*
> 
> I'm pretty sure this might kill off any ssh-agent type stuff, and possibly
> some plist startup things, I forget.
> 
> Can we do this with a find ... -delete say with an age > 2 hours?

None of that stuff lives in $TMPDIR, but rather in /private/tmp.
Comment on attachment 8420139
Cleanup temp files on Darwin before starting buildbot

https://hg.mozilla.org/build/puppet/rev/c60a425448a7
Attachment #8420139 - Flags: checked-in+

Updated

4 years ago
Blocks: 1013511

Updated

4 years ago
Assignee: sbruno → nobody

Updated

4 years ago
No longer blocks: 847778

Updated

4 years ago
No longer blocks: 906677
No longer blocks: 882781

Updated

3 years ago
Status: REOPENED → RESOLVED
Last Resolved: 5 years ago → 3 years ago
Resolution: --- → WORKSFORME