Closed Bug 771687 Opened 13 years ago Closed 9 years ago

Fuzzing jobs are hitting file-in-use errors

Categories

(Release Engineering :: General, defect, P3)

x86
Windows Server 2003
defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: jruderman, Unassigned)

Details

(Whiteboard: [fuzzer])

One of the first things bot.py does is clean up any leftover wtmp1/ directory from previous runs. On Windows, it frequently hits an error because a file in wtmp1/ is in use. Do these machines get rebooted between jobs? Is indexing, antivirus, or something else running on the machines that could interfere with the file system? Do you know how to debug problems like this?
From /mnt/pvt_builds/fuzzing/tinderbox-builds/idle-win32fuzzer-win32-bm13-build1-build2560.txt.gz on pvtbuilds2:
wtmp1 shouldn't exist now. killing it.
Traceback (most recent call last):
  File "fuzzing/dom/automation/bot.py", line 15, in <module>
    bot.main()
  File "e:\builds\moz2_slave\fuzzer-win32\fuzzing\bot.py", line 208, in main
    shutil.rmtree("wtmp1")
  File "d:\mozilla-build\python25\lib\shutil.py", line 174, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "d:\mozilla-build\python25\lib\shutil.py", line 172, in rmtree
    os.remove(fullname)
WindowsError: [Error 13] The process cannot access the file because it is being used by another process: 'wtmp1\\w10-err.txt'
You can find more instances with:
zcat * | grep "because it is being used"
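A minimal sketch, assuming Python 3 (the traceback shows the bots running Python 2.5), of a cleanup step that retries shutil.rmtree a few times before giving up instead of failing on the first locked file. The helper name `rmtree_with_retries` and its `attempts`/`delay` parameters are hypothetical and not part of bot.py. Retrying can only ride out transient locks, such as an indexer or antivirus scanner briefly opening a file; if a leftover js process still holds w10-err.txt open, every attempt fails and the error is re-raised.

```python
import os
import shutil
import time


def rmtree_with_retries(path, attempts=5, delay=1.0):
    """Remove the directory tree at `path`, retrying on OSError.

    Hypothetical helper, not part of bot.py. On Windows, deleting a file
    that another process has open typically raises PermissionError in
    Python 3; waiting and retrying only helps if the lock is transient.
    """
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return
        except OSError:
            if not os.path.exists(path):
                # Already gone: never existed, or removed by someone else.
                return
            if attempt == attempts:
                # Out of retries: re-raise so the job log shows the real error.
                raise
            time.sleep(delay)
```

Re-raising after the last attempt keeps a persistent lock visible in the job log rather than letting the job continue with a stale wtmp1/.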
I found a couple of machines where the js executable was still running after the buildbot job finished. It's a known issue that buildbot isn't always successful at cleaning up on Windows, but I think I saw this on Linux too, so something fishy is going on. On one Windows box, wtmp1/ was using 93G of space and had filled up the builds partition. The culprit was w1-err.txt, which by the looks of it was full of repeated
e:\builds\moz2_slave\fuzzer-win32\fuzzing\js\jsfunfuzz.js:660: strict warning: jgipjb is read-only
lines. See pvtbuilds2:/tmp/fuzz.tar.gz for a copy of fuzzer-win32/. tar truncated the err log to the first 145MB; the fuzzing repo rev was 2f9ea46c14ff.
I don't think the machines get rebooted between fuzzing jobs. That might be an easy way to fix this issue.
Was the still-running js executable left over from a job that finished normally, or a job that hit a buildbot timeout?
(In reply to Chris AtLee [:catlee] from comment #2)
> I don't think the machines get rebooted between fuzzing jobs. That might be
> an easy way to fix this issue.
I think we should be doing that anyway.
I'm taking a stab at the platform based on the pattern in bug 692715.
Component: Release Engineering → Release Engineering: Automation (General)
OS: All → Windows 7
QA Contact: catlee
Hardware: All → x86
Whiteboard: [fuzzer]
Chris, actually our logs seem to show Windows Server (I think 2003).
OS: Windows 7 → Windows Server 2003
Priority: -- → P3
Product: mozilla.org → Release Engineering
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Component: General Automation → General