Closed
Bug 522804
Opened 12 years ago
Closed 12 years ago
ZipFile exception.WindowsError build problem
Categories
(Firefox Build System :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: ddahl, Assigned: ted)
Details
Attachments
(2 files)
708 bytes,
patch
|
Pike
:
review+
|
Details | Diff | Splinter Review |
1.98 KB,
patch
|
Pike
:
review+
|
Details | Diff | Splinter Review |
Looks like a python build issue? Exception exceptions.WindowsError: (32, 'The process cannot access the file because it is being used by another process', '../../dist/bin/chrome\\toolkit.jar.lck') in <bound method LockFile.__del__ of <utils.LockFile object at 0x00B24E50>> ignored Exception exceptions.AttributeError: "ZipFile instance has no attribute '_remove'" in <bound method ZipFile.__del__ of <MozZipFile.ZipFile instance at 0x00B2DC88>> ignored Exception exceptions.AttributeError: "ZipFile instance has no attribute '_remove'" in <bound method ZipFile.__del__ of <MozZipFile.ZipFile instance at 0x00B2EC88>> ignored make[6]: *** [libs] Error 1 make[6]: *** [libs] Error 1 make[5]: *** [generic_libs] Error 2 make[5]: *** Waiting for unfinished jobs.... make[5]: *** [mathml_libs] Error 2 make[4]: *** [libs_tier_gecko] Error 2 make[3]: *** [tier_gecko] Error 2 make[2]: *** [default] Error 2 make[1]: *** [build] Error 2 make: *** [profiledbuild] Error 2
Reporter | ||
Updated•12 years ago
|
Whiteboard: [orange]
Reporter | ||
Comment 1•12 years ago
|
||
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1255733733.1255734014.28630.gz WINNT 5.2 mozilla-central build [busted] Started 15:55, finished 16:01
Comment 2•12 years ago
|
||
ddahl, can you post the link to the error log for comment 0? Interestingly, the error message in comment 1 was slightly different from the one in Comment 0. In particular, Comment 0's error mentions "exceptions.WindowsError" and a file being "used by another process", but neither of those are in the log for comment 1. Here's the error for comment 1: ../../../dist/bin/chrome\toolkit.jar.lck has been locked for more than 600 seconds (PID 2348 ) Exception exceptions.AttributeError: "ZipFile instance has no attribute '_remove'" in <bound method ZipFile.__del__ of <MozZipFile.ZipFile instance at 0x00B2FBE8>> ignored make[7]: *** [libs] Error 1 make[6]: *** [libs] Error 2 make[5]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32/build/obj-firefox/intl' make[4]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32/build/obj-firefox' make[3]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32/build/obj-firefox' make[2]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32/build/obj-firefox' make[1]: Leaving directory `/e/builds/moz2_slave/mozilla-central-win32/build' make[5]: *** [libs] Error 2 make[4]: *** [libs_tier_necko] Error 2 make[3]: *** [tier_necko] Error 2 make[2]: *** [default] Error 2 make[1]: *** [build] Error 2 make: *** [profiledbuild] Error 2 program finished with exit code 2
Assignee | ||
Comment 3•12 years ago
|
||
Pike wrote the JarMaker...
Comment 4•12 years ago
|
||
I guess the lock itself was some slave problem or so, and should have busted the build, IMHO. The MozZipFile didn't deal with it too gracefully, hiding the original error. I'll check if that can be done better.
Comment 5•12 years ago
|
||
I'm having problems to reproduce this locally, if someone can create a test, that'd help a ton.
Reporter | ||
Comment 6•12 years ago
|
||
Here's a full log: http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1258497842.1258509206.14528.gz&fulltext=1 WINNT 5.2 mozilla-1.9.2 build [busted] Started 15:48, finished 16:27
Comment 7•12 years ago
|
||
This is clearly not blocking bug randoorange, this is an intermittent build failure.
No longer blocks: 438871
Comment 8•12 years ago
|
||
Hrm... Should this have [orange] status in whiteboard, then? IIUC, the [orange] whiteboard-status is supposed to exactly match the "bug 438871 blocking status". Support for this is on http://tinderbox.mozilla.org/Firefox/ : "Intermittent failures are tracked with a metabug and the [orange] list. " and also on the last comment of the metabug, bug 438871 comment 22. ("Adding all the bugs which have [orange] in their whiteboard but do not block this bug.") I'd argue that exactly one of the following is true: (A) intermittent build failures can be considered "test failures" for the purposes of the metabug and for [orange], to make things easier on sheriffs ... OR.... (B) we need a distinct whiteboard status (and/or metabug) for intermittent build failures. I'd been assuming that (A) was the case, but maybe (B) is instead the case?
Comment 9•12 years ago
|
||
No, it should not have [orange] on the whiteboard. Removed.
Whiteboard: [orange]
Comment 10•12 years ago
|
||
I don't think that the ZipFile exception is the real problem here. It seems that the locks just go bad for a variety of different reasons, and the JarMaker aborts the build on that. Which on the way down raises the zipfile thing, but the real problem is that the JarMaker can't work right on the locks on windows. I tried to setup a windows build to test this, but failed again, and I don't have the hours right now to fix that. If someone can step up and add some windows help, that'd be great.
Assignee | ||
Comment 11•12 years ago
|
||
I started hitting this pretty consistently on my Windows machine, probably because it was the first time I tried to do a full -j4 build since that parallel dirs patch landed. The problem appears to be that os.open can fail with EACCES as well as EEXIST on Windows, and we're not handling the former in lockFile. With this patch I don't hit this error at all.
Attachment #413665 -
Flags: review?(l10n)
Comment 12•12 years ago
|
||
Comment on attachment 413665 [details] [diff] [review] handle more exception types Cool, thanks for finding that.
Attachment #413665 -
Flags: review?(l10n) → review+
Assignee | ||
Comment 13•12 years ago
|
||
Since I haven't been able to land this patch yet, I'll just throw another patch on it. This fixes two more race conditions that I hit: 1) In LockFile.__del__, on Windows you can hit EACCES in os.remove if another process has the lockfile open. This adds a loop around the remove with a check for EACCES. 2) In the open for reading/stat, we can hit EACCES on Windows, so fix this like I did for the first open. (also I fixed a silly printing nit where we didn't strip the trailing newline from the PID that was read from the file.)
Assignee | ||
Comment 14•12 years ago
|
||
I can build on Windows -j4 reliably with these patches without hitting any JarMaker errors.
Comment 15•12 years ago
|
||
Comment on attachment 414232 [details] [diff] [review] fix two more race conditions Let's do a quick time.sleep in the __del__ loop, I don't mind if 10 or 100 ms, as long as it's drastically faster than the 1 sec for the retry of locking the file. Given that there are process switches and disk io, 10 might be too quick? r=me with that.
Attachment #414232 -
Flags: review?(l10n) → review+
Assignee | ||
Updated•12 years ago
|
Component: General → Build Config
QA Contact: general → build-config
Assignee | ||
Comment 16•12 years ago
|
||
I pushed these both to m-c despite the freeze because they fix intermittent build bustage: http://hg.mozilla.org/mozilla-central/rev/b85c66115cd2 http://hg.mozilla.org/mozilla-central/rev/941ad9d7d079
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•3 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•