Closed
Bug 862355
Opened 12 years ago
Closed 12 years ago
Clean up the tmpdir on Windows build slaves on reboot
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: benjamin, Assigned: coop)
References
Details
(Keywords: sheriffing-P1)
Attachments
(1 file)
1.01 KB, patch | armenzg: review+
Windows machines don't automatically clean out tmpdir on reboot, so we've discovered some machines with very large tmpdirs causing interesting test failures (in bug 852429). Please arrange the windows build slaves so that they delete everything in tmpdir on reboot.
Comment 1•12 years ago
found in triage.
Component: Release Engineering → Release Engineering: Machine Management
QA Contact: armenzg
Updated•12 years ago
Component: Release Engineering: Machine Management → Release Engineering: Automation (General)
QA Contact: armenzg → catlee
Whiteboard: [mozharness]
Updated•12 years ago
Keywords: sheriffing-P1
Comment 2•12 years ago
Is anyone working on this?
Comment 3•12 years ago
Though it's not obvious from the terrible reporting we have for make check, since a green run passes 54802 tests and one that hits this problem passes 50607 tests, we're apparently failing to even run around 8% of make check most of the time on Windows. If nobody is ever going to do anything about this, please say so, so we can work on some other approach to dealing with it.
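For reference, the "around 8%" figure in comment 3 can be checked from the two test counts quoted there. This is only the arithmetic on those two numbers; the variable names are illustrative.

```python
# Test counts quoted in comment 3.
green_run_passes = 54802    # a normal green make check run
affected_run_passes = 50607  # a run that hits the tmpdir problem

# Tests that never ran on an affected slave, and their share of a full run.
missing = green_run_passes - affected_run_passes
missing_percent = 100.0 * missing / green_run_passes

print(missing, "tests not run,", round(missing_percent, 1), "percent of a green run")
```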
Severity: normal → major
Comment 4•12 years ago
Armen, what options do we have here? AFAICT the issue is the w64 compile slaves, so can we use an on-boot scheduled task? Or perhaps it's better in runslave.py, or count_and_reboot.py?
Flags: needinfo?(armenzg)
Comment 5•12 years ago
Also, why is this tagged [mozharness]?
Updated•12 years ago
Summary: Clean up the tmpdir on Windows machines on reboot → Clean up the tmpdir on Windows build slaves on reboot
Whiteboard: [mozharness]
Comment 6•12 years ago
The good thing about the Win64 machines is that we can access them through SSH as Administrators. This means that we can add pretty much anything to them. An on-boot task should do the trick. Modifying runslave.py is also an option. What specific directories are we referring to? C:\Windows\Temp?
Flags: needinfo?(armenzg)
Comment 7•12 years ago
Doesn't look like it. It's the XPCOM TmpD, which gets set from the Windows function ::GetTempPathW, which sets it from the env var TMP, or the env var TEMP, or the env var USERPROFILE, or (sweet!) the Windows dir. Since the make check buildstep's env dump says that both TMP and TEMP are C:/Users/cltbld/AppData/Local/Temp, my expectation is that if you looked at that directory on an affected buildslave, like w64-ix-slave08 or any of the others listed in bug 861176, you would find that it contains whatever the maximum number of files/subdirectories Windows allows actually is (and while you are there, could you clear it out?).
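The lookup order described in comment 7 can be sketched as follows. This is a rough illustration in Python, not the Windows implementation: `resolve_temp_path` and the default `windows_dir` value are hypothetical names chosen here, and the real ::GetTempPathW applies further rules that this sketch ignores.

```python
def resolve_temp_path(env, windows_dir="C:\\Windows"):
    """Approximate the fallback order from comment 7.

    env is any mapping of environment variable names to values.
    windows_dir stands in for the Windows directory (assumed value).
    """
    # First non-empty variable wins: TMP, then TEMP, then USERPROFILE.
    for name in ("TMP", "TEMP", "USERPROFILE"):
        value = env.get(name)
        if value:
            return value
    # Nothing set: fall back to the Windows directory.
    return windows_dir
```

With the environment from the make check buildstep, where TMP and TEMP both point at C:/Users/cltbld/AppData/Local/Temp, this returns that directory, which matches the conclusion in the comment.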
Comment 8•12 years ago
On w64-ix-slave08, Windows reports a total of 109684 folders and 61899 files in C:/Users/cltbld/AppData/Local/Temp. Breaking that down a bit at the top level:
10000 cpp-unit-profd or cpp-unit-profd-nnnn (up to 9999, suspicious!)
7292 tmp-<6randchars>
3930 ssh-<10randchars>
Some of those date back to 2011; I have removed them now. philor, do you know of any similar issues on test slaves? It'd affect where we put the fix.
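A top-level breakdown like the one in comment 8 could be gathered with a short script. This is a minimal sketch, not the method actually used on the slave; `bucket` and `summarize` are hypothetical helper names, and the prefixes are the three mentioned in the comment.

```python
import os
from collections import Counter

def bucket(name):
    """Map a top-level entry name to one of the groups from comment 8."""
    if name.startswith("cpp-unit-profd"):
        return "cpp-unit-profd*"
    if name.startswith("tmp-"):
        return "tmp-*"
    if name.startswith("ssh-"):
        return "ssh-*"
    return "(other)"

def summarize(tmpdir):
    """Count the top-level entries of tmpdir per group (no recursion)."""
    counts = Counter()
    for entry in os.scandir(tmpdir):
        counts[bucket(entry.name)] += 1
    return counts
```

Calling `summarize` on the temp directory of a suspect slave and printing `most_common()` would show quickly whether the cpp-unit-profd count is sitting at the 10000 ceiling.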
Comment 9•12 years ago
I don't know of any active tmpdir problems on test slaves, but then I've never seen any part of them other than the crap we dump on the desktop. TmpD is XPCOM, so you can most certainly get it and drop stuff in it just as easily from any other sort of test as from a cppunittest, so even if we haven't already set ourselves up for a fail like only being able to run tests 10,000 times (or possibly 10,000 divided by the number of tests that create a profile times), we probably will in the future. If it doesn't overcomplicate the fix, I'd say fixing it for either all Windows slaves, or all slaves (are we not seeing this problem on not-Windows because the cppunittest harness tries to delete the profile but, just like all deleting of things on Windows, it fails every time, or are we not seeing it because we already clean up the tmpdir on not-Windows?) would be better.
Comment 10•12 years ago
And see also bug 870638 about wanting to have crash reports cleaned up on test slaves, which is a dupe of another one about wanting them cleaned up because... Firefox Health Reporter, maybe, was getting baffled by the way its tests had to deal with the surprise of seeing tens of thousands of crashes having happened.
Comment 11•12 years ago
Armen tells me the right place to do this is in this file: http://hg.mozilla.org/build/puppet-manifests/file/22b8f942937e/modules/buildslave/files/buildbot-win64.bat
So we want to clean out:
C:/Users/cltbld/AppData/Local/Temp
C:/Users/cltbld/Desktop
Anything else?
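The kind of cleanup discussed in comment 11 can be sketched as below. This is not the patch that landed (that was a change to the batch file and is not reproduced in this thread); it is a Python illustration, with `clean_directory` as a hypothetical helper, that empties a directory while keeping the directory itself and tolerates entries that cannot be deleted, for example because a process still holds them open.

```python
import os
import shutil

def clean_directory(path):
    """Delete everything inside path but keep path itself.

    Returns the entries that could not be removed, so the caller
    can log them instead of aborting the boot sequence.
    """
    failed = []
    try:
        entries = list(os.scandir(path))
    except OSError:
        # Directory missing or unreadable: report it and move on.
        return [path]
    for entry in entries:
        try:
            if entry.is_dir(follow_symlinks=False):
                shutil.rmtree(entry.path)
            else:
                os.remove(entry.path)
        except OSError:
            failed.append(entry.path)
    return failed
```

On a build slave this would be called at startup with the temp directory named in the comment. Returning the failures rather than raising is a deliberate choice here: a single locked file should not stop the rest of the cleanup or the slave from starting.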
Comment 12•12 years ago
Alas, we didn't mean only Windows build slaves, we just managed to fill them up faster - bld-lion-r5-042 apparently has 10000 cpp-unit-profd-NNNN directories, since it is now failing make check too.
Comment 13•12 years ago
Ah, apparently we did mean only Windows, it's just that when randomfoothing happens and we can't create a profile directory on a Mac, it has the same symptom as not being able to create cpp-unit-profd-1234 because there's already a 1234.
Assignee | ||
Updated•12 years ago
Assignee: nobody → coop
Status: NEW → ASSIGNED
Priority: -- → P2
Assignee | ||
Comment 14•12 years ago
This seemed to do the trick on the staging slave I tried this on. I'll start cobbling together an install script.
Attachment #750006 - Flags: review?(armenzg)
Comment 15•12 years ago
Comment on attachment 750006: Remove temp files on Windows
Review of attachment 750006:
-----------------------------------------------------------------
Why do we clobber the desktop?
FYI this would not work on test machines since on the Desktop we have startTalos.bat
Attachment #750006 - Flags: review?(armenzg) → review+
Assignee | ||
Comment 16•12 years ago
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #15)
> FYI this would not work on test machines since on the Desktop we have
> startTalos.bat
Right, but this bug is specifically about the build slaves. For hygiene reasons, I think we should also deploy a similar change to the test slaves, but that can be a follow-up.
Assignee | ||
Comment 17•12 years ago
This has been deployed to all build slaves now, modulo w64-ix-slave23 that needs a re-image in bug 873140.
Comment 18•12 years ago
Appears to have done the trick, thanks!
Assignee | ||
Updated•12 years ago
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Component: General Automation → General