Closed
Bug 771687
Opened 13 years ago
Closed 9 years ago
Fuzzing jobs are hitting file-in-use errors
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: jruderman, Unassigned)
Details
(Whiteboard: [fuzzer])
One of the first things bot.py does is clean up any left-over wtmp1/ directory from previous runs. On Windows, it's frequently hitting an error where a file in wtmp1/ is in use.
Do these machines get rebooted between jobs? Is there indexing or antivirus or something else running on the machines that would interfere with the file system? Do you know how to debug problems like this?
From /mnt/pvt_builds/fuzzing/tinderbox-builds/idle-win32fuzzer-win32-bm13-build1-build2560.txt.gz on pvtbuilds2:
wtmp1 shouldn't exist now. killing it.
Traceback (most recent call last):
  File "fuzzing/dom/automation/bot.py", line 15, in <module>
    bot.main()
  File "e:\builds\moz2_slave\fuzzer-win32\fuzzing\bot.py", line 208, in main
    shutil.rmtree("wtmp1")
  File "d:\mozilla-build\python25\lib\shutil.py", line 174, in rmtree
    onerror(os.remove, fullname, sys.exc_info())
  File "d:\mozilla-build\python25\lib\shutil.py", line 172, in rmtree
    os.remove(fullname)
WindowsError: [Error 13] The process cannot access the file because it is being used by another process: 'wtmp1\\w10-err.txt'
You can find more instances with:
zcat * | grep "because it is being used"
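The cleanup step that fails in the traceback above could be made more tolerant of a file that is only briefly held open (for example by a scanner or a process that is still shutting down). The following is a minimal sketch, not the actual bot.py code: the function name `remove_tree_with_retries` and its `attempts` and `delay` parameters are hypothetical, and it assumes a modern Python 3 rather than the Python 2.5 shown in the log. It would not help if a leftover process keeps the file open indefinitely.

```python
import os
import shutil
import time


def remove_tree_with_retries(path, attempts=5, delay=1.0):
    """Try to delete a directory tree, retrying if a file is still in use.

    Returns True once the path no longer exists, False if every attempt failed.
    The name and parameters are illustrative, not taken from bot.py.
    """
    for attempt in range(attempts):
        if not os.path.exists(path):
            return True
        try:
            shutil.rmtree(path)
            return True
        except OSError:
            # On Windows a file-in-use failure surfaces as an OSError
            # (usually PermissionError on Python 3). Wait and try again,
            # unless this was the last attempt.
            if attempt + 1 < attempts:
                time.sleep(delay)
    return not os.path.exists(path)
```

A caller could treat a False return as a reason to skip the job or pick a fresh working directory instead of crashing at startup.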
Comment 1•13 years ago
I found a couple of machines where the js executable was still running after the buildbot job had finished. It's a known issue that buildbot doesn't always clean up successfully on Windows, but I think I saw this on Linux too, so something fishy is going on.
On one Windows box, wtmp1/ was using 93G of space and had filled up the builds partition. The w1-err.txt file was the problem; by the looks of it, it consisted of many repetitions of:
e:\builds\moz2_slave\fuzzer-win32\fuzzing\js\jsfunfuzz.js:660: strict warning: jgipjb is read-only
See pvtbuilds2:/tmp/fuzz.tar.gz for a copy of fuzzer-win32/. tar truncated the err log to the first 145MB. The fuzzing repo rev was 2f9ea46c14ff.
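Both symptoms in this comment (a js process outliving its job and an error log growing until the partition is full) could in principle be bounded by the code that launches the child process. The sketch below is only an illustration under assumptions: `run_with_limits`, its parameters, and the returned reason strings are hypothetical and not part of the fuzzing harness or buildbot, and it assumes Python 3. It writes the child's stderr to a file, polls the file size, and kills the child if it runs too long or logs too much.

```python
import os
import subprocess
import time


def run_with_limits(cmd, err_path, max_err_bytes, timeout, poll_interval=0.1):
    """Run cmd with stderr sent to err_path, enforcing a time and log-size limit.

    Returns (reason, returncode), where reason is "exited", "timeout",
    or "log-too-large". All names here are illustrative.
    """
    with open(err_path, "wb") as err_file:
        proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=err_file)
        deadline = time.monotonic() + timeout
        reason = "exited"
        while proc.poll() is None:
            if time.monotonic() > deadline:
                reason = "timeout"
            elif os.path.getsize(err_path) > max_err_bytes:
                reason = "log-too-large"
            if reason != "exited":
                # Kill the direct child and reap it so it does not linger.
                proc.kill()
                proc.wait()
                break
            time.sleep(poll_interval)
    return reason, proc.returncode
```

Note that `proc.kill()` only terminates the direct child; any grandchildren it spawned would need separate handling, and the log can overshoot the limit by whatever is written between two polls.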
Comment 2•13 years ago
I don't think the machines get rebooted between fuzzing jobs. That might be an easy way to fix this issue.
Comment 3•13 years ago (Reporter)
Was the still-running js executable left over from a job that finished normally, or a job that hit a buildbot timeout?
Comment 4•13 years ago
(In reply to Chris AtLee [:catlee] from comment #2)
> I don't think the machines get rebooted between fuzzing jobs. That might be
> an easy way to fix this issue.
I think we should be doing that anyway.
I'm taking a stab at the platform based on the pattern in bug 692715.
Component: Release Engineering → Release Engineering: Automation (General)
OS: All → Windows 7
QA Contact: catlee
Hardware: All → x86
Whiteboard: [fuzzer]
Comment 5•13 years ago
Chris, actually our logs seem to show Windows Server (I think 2003).
OS: Windows 7 → Windows Server 2003
Updated•13 years ago
Priority: -- → P3
Updated•12 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Updated•7 years ago (Assignee)
Component: General Automation → General