Closed
Bug 809978
Opened 12 years ago
Closed 11 years ago
xpt.py link: race condition - unpack requires a string argument of length 32
Categories
(Firefox Build System :: General, defect)
Firefox Build System
General
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: joey, Assigned: joey)
Details
Attachments
(1 file, 3 obsolete files)
23.40 KB,
patch
|
ted
:
review-
|
Details | Diff | Splinter Review |
When idl files are compiled in parallel and a module is later created by "xpt.py link" the link action can periodically fail attempting to link files while they are being compiled. The error can be detected under a very specific condition so add a small retry block around the I/O loop when linking to give the compile thread a little extra time to finish writing content out to disk.
Assignee | ||
Updated•12 years ago
|
Assignee: nobody → joey
Assignee | ||
Comment 1•12 years ago
|
||
Comment 2•12 years ago
|
||
This really doesn't sound like the right solution: shouldn't makefile deps prevent attempting to link XPTs until the prerequisites are finished?
Assignee | ||
Comment 3•12 years ago
|
||
Comment on attachment 679770 [details] [diff] [review] retry link I/O on potential race conditions Added a 60-second (5x12) loop around the I/O block for reading source. If exception reported is: 'unpack requires a string arg of length x'. xpt.py operation is 'link' and the file was recently modified -- retry file I/O. Cleanups: o include filename when reporting exceptions. o removed whitespace embedded within blank lines.
Attachment #679770 -
Flags: review?(gps)
Assignee | ||
Comment 4•12 years ago
|
||
Try job: https://tbpl.mozilla.org/?tree=Try&rev=05faab407819
Assignee | ||
Comment 5•12 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #2) > This really doesn't sound like the right solution: shouldn't makefile deps > prevent attempting to link XPTs until the prerequisites are finished? Yes makefile deps would be the optimal answer but adding new logic in the makefiles tends to be frowned upon so this was a straightforward alternative.
Comment 6•12 years ago
|
||
This feels wrong. If our dependencies aren't handling this then our dependencies are broken.
Assignee | ||
Comment 7•12 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6) > This feels wrong. If our dependencies aren't handling this then our > dependencies are broken. Yes the rules for xpidl module linking is definitely broken. People have reported the issue a few times but the fix has been 'try and rebuild'. If there will not be an issue adding makefile logic proper deps would help resolve the problem. In the interim this patch provides a workaround that will reduce the chance of the transient failure affecting builds. Deploy it to improve stability, retain the bug/state=open. The patch will be easy enough to deprecate after makefile deps have been properly fixed.
Comment 8•12 years ago
|
||
Comment on attachment 679770 [details] [diff] [review] retry link I/O on potential race conditions Review of attachment 679770 [details] [diff] [review]: ----------------------------------------------------------------- I'm pretty busy right now and this seems like a somewhat controversial patch. I'd rather not get myself involved at this juncture. Ask me again in a week or two if you still need review. Sorry.
Attachment #679770 -
Flags: review?(gps)
Assignee | ||
Comment 9•12 years ago
|
||
Err msg reported: "struct.error: unpack requires a string argument of length 32"
Assignee | ||
Comment 10•12 years ago
|
||
(In reply to Joey Armstrong [:joey] from comment #9) > Err msg reported: "struct.error: unpack requires a string argument of length > 32" ps> error message was reported as part of the subject line
Assignee | ||
Comment 11•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #679770 -
Attachment is obsolete: true
Assignee | ||
Comment 12•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #719528 -
Attachment is obsolete: true
Assignee | ||
Comment 13•11 years ago
|
||
Comment on attachment 720780 [details] [diff] [review] retry link I/O on potential race conditions Revive patch so work is not lost. Note: this patch will not address the underlying build dependency problems, that is the topic of another bug. This patch intends to board up the broken window so people will no longer trip over the problem. o Add ArgParse support for formal command line arg support + usage/help text. Existing 'link' and 'dump' arguments retained. Added argument --workaround-bug-754358 to enable functionality. If argument is commented out or removed while porting to mozbuild problem behavior will revert until bug 754358 is addressed. o Race condition workaround - if --workaround-bug-754358 passed - and last error was a struct length error - and operation is (xpt.py) 'link' - and file was very recently modified attempt to re-read the source file a few times. If error persists or the initial error was something other than file size 0 fail as usual.
Attachment #720780 -
Flags: review?(ted)
Comment 14•11 years ago
|
||
The simpler fix here is to stop linking xpt files in each directory where there are several, now that we can do that. It just requires to write more "interface" lines in the manifest.
Assignee | ||
Comment 15•11 years ago
|
||
Comment on attachment 720780 [details] [diff] [review] retry link I/O on potential race conditions cancel review - argparse python module not available on android or fedora. Will need add a compatibility layer, fallback to using optparse when argparse not available until the transition to 2.7 as a default lands everywhere.
Attachment #720780 -
Flags: review?(ted)
Comment 16•11 years ago
|
||
(In reply to Joey Armstrong [:joey] from comment #15) > Comment on attachment 720780 [details] [diff] [review] > retry link I/O on potential race conditions > > cancel review - argparse python module not available on android or fedora. > Will need add a compatibility layer, fallback to using optparse when > argparse not available until the transition to 2.7 as a default lands > everywhere. or just do as comment 14 says.
Assignee | ||
Comment 17•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #720780 -
Attachment is obsolete: true
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #16) > (In reply to Joey Armstrong [:joey] from comment #15) > > Comment on attachment 720780 [details] [diff] [review] > > retry link I/O on potential race conditions > > > > cancel review - argparse python module not available on android or fedora. > > Will need add a compatibility layer, fallback to using optparse when > > argparse not available until the transition to 2.7 as a default lands > > everywhere. > > or just do as comment 14 says. That would be a good future solution but this problem has persisted for a few months so why not board up the broken window in the interim so people stop tripping over it until the problem can be fixed permanently ?
Assignee | ||
Comment 19•11 years ago
|
||
Comment on attachment 721204 [details] [diff] [review] retry link I/O on potential race conditions Python problem was due to virtualenv path being stripped from xpidl_link. Full command to xpt.py now passed. https://tbpl.mozilla.org/?tree=Try&rev=308414ae8d0f Revive patch so work is not lost. Note: this patch will not address the underlying build dependency problems, that is the topic of another bug. This patch intends to board up the broken window so people will no longer trip over the problem. o Add ArgParse support for formal command line arg support + usage/help text. Existing 'link' and 'dump' arguments retained. Added argument --workaround-bug-754358 to enable functionality. If argument is commented out or removed while porting to mozbuild problem behavior will revert until bug 754358 is addressed. o Race condition workaround - if --workaround-bug-754358 passed - and last error was a struct length error - and operation is (xpt.py) 'link' - and file was very recently modified attempt to re-read the source file a few times. If error persists or the initial error was something other than file size 0 fail as usual
Attachment #721204 -
Flags: review?(ted)
Assignee | ||
Comment 20•11 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #14) > The simpler fix here is to stop linking xpt files in each directory where > there are several, now that we can do that. It just requires to write more > "interface" lines in the manifest. Is there a bug open for these edits and an ETA when they are all expected to land ?
Comment 21•11 years ago
|
||
Comment on attachment 721204 [details] [diff] [review] retry link I/O on potential race conditions Review of attachment 721204 [details] [diff] [review]: ----------------------------------------------------------------- ::: config/rules.mk @@ +1358,5 @@ > # no need to link together if XPIDLSRCS contains only XPIDL_MODULE > ifneq ($(XPIDL_MODULE).idl,$(strip $(XPIDLSRCS))) > + > +# based on $(XPIDL_LINK) > +XPT_PY ?= $(PYTHON) $(LIBXUL_DIST)/sdk/bin/xpt.py I would feel better if you defined XPT_PY in config.mk, and then defined XPIDL_LINK as "$(XPT_PY) link". Also, this is sort of broken given the xpidl-module-deps below, isn't it? (you'll be adding a dependency on the python binary) ::: xpcom/typelib/xpt/tools/xpt.py @@ +99,5 @@ > class Type(object): > """ > Data type of a method parameter or return value. Do not instantiate > this class directly. Rather, use one of its subclasses. > + Thanks for cleaning up the extraneous whitespace. @@ +1129,5 @@ > + and len(sys.argv) > 1 > + and sys.argv[1] == 'link' > + and len(err.args) > 0 > + and err.args[0].startswith('unpack requires a string argument of length') > + ): This seems like a very awkward way to diagnose the condition "trying to read an empty file". I would instead just change the code above where it stats the input file to detect if the filesize is < 32 bytes, and do your retry loop there. @@ +1440,5 @@ > +def local_argparse(): > + """ > + Use argparse to parse & validate command line args. > + A dictionary of values will be returned. > + This feels really extraneous to this patch. I wouldn't bother putting in argparse, I'd just make this behavior the default until we fix things so that we can remove it.
Attachment #721204 -
Flags: review?(ted) → review-
Assignee | ||
Comment 22•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21) > Comment on attachment 721204 [details] [diff] [review] > retry link I/O on potential race conditions > > Review of attachment 721204 [details] [diff] [review]: > ----------------------------------------------------------------- > > @@ +1129,5 @@ > > + and len(sys.argv) > 1 > > + and sys.argv[1] == 'link' > > + and len(err.args) > 0 > > + and err.args[0].startswith('unpack requires a string argument of length') > > + ): > > This seems like a very awkward way to diagnose the condition "trying to read > an empty file". I would instead just change the code above where it stats > the input file to detect if the filesize is < 32 bytes, and do your retry > loop there. This condition is not simply testing for an empty file, it was written to specifically limit when the retry loop is allowed to run. If a non-standard record size is written to disk (32 > x > 0) the condition will be a false positive and failure is not immediate. The race condition is known to happen with empty files so that is why the conditional is testing for it.
Assignee | ||
Comment 23•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21) > Comment on attachment 721204 [details] [diff] [review] > retry link I/O on potential race conditions > > Review of attachment 721204 [details] [diff] [review]: > ----------------------------------------------------------------- > @@ +1440,5 @@ > > +def local_argparse(): > > + """ > > + Use argparse to parse & validate command line args. > > + A dictionary of values will be returned. > > + > > This feels really extraneous to this patch. I wouldn't bother putting in > argparse, I'd just make this behavior the default until we fix things so > that we can remove it. Question: why would adding formal argument parsing and help text within the script be a problem ?
Comment 24•11 years ago
|
||
Because it's not necessary? The script doesn't currently take any optional arguments, so you're adding this purely for the sake of adding the argument you're using, which I think you should just make the default behavior.
Comment 25•11 years ago
|
||
(In reply to Joey Armstrong [:joey] from comment #22) > This condition is not simply testing for an empty file, it was written to > specifically limit when the retry loop is allowed to run. If a non-standard > record size is written to disk (32 > x > 0) the condition will be a false > positive and failure is not immediate. The race condition is known to > happen with empty files so that is why the conditional is testing for it. This error condition happens because there are < 32 bytes in the file it's trying to read. You're testing this in a roundabout way by catching the specific struct.unpack error we hit in this condition. I think you should just test the result of os.stat directly, since it's testing the same thing but is more straightforward.
Assignee | ||
Comment 26•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21) > Comment on attachment 721204 [details] [diff] [review] > retry link I/O on potential race conditions > > Review of attachment 721204 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: config/rules.mk > @@ +1358,5 @@ > > # no need to link together if XPIDLSRCS contains only XPIDL_MODULE > > ifneq ($(XPIDL_MODULE).idl,$(strip $(XPIDLSRCS))) > > + > > +# based on $(XPIDL_LINK) > > +XPT_PY ?= $(PYTHON) $(LIBXUL_DIST)/sdk/bin/xpt.py > > I would feel better if you defined XPT_PY in config.mk, and then defined > XPIDL_LINK as "$(XPT_PY) link". Also, this is sort of broken given the > xpidl-module-deps below, isn't it? (you'll be adding a dependency on the > python binary) Oh also the deps are not broken the added dependency is correct. We should recompile when tools used to generate content are changed or it will create a hole for testing by leaving stale content on disk and/or not update timestamps. This is a formula for landmines. Stale content left on disk by not recompiling. Whomever follows along later and modifies source to force a recompile is the luck person who has to deal with the untested changes from xpt.py.
Assignee | ||
Comment 27•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #24) > Because it's not necessary? The script doesn't currently take any optional > arguments, so you're adding this purely for the sake of adding the argument > you're using, which I think you should just make the default behavior. Syntax for argunment handling in the script is a bit awkward as it stands. Arguments are positional and checking for values like 'link' at *hardcoded* offsets. If formal command line argument handling is added it will be a lot easier for people to enhance this script later on. Even trying to add arguments like --debug or --verbose can be difficult with the positional handling.
Assignee | ||
Comment 28•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #25) > (In reply to Joey Armstrong [:joey] from comment #22) > > This condition is not simply testing for an empty file, it was written to > > specifically limit when the retry loop is allowed to run. If a non-standard > > record size is written to disk (32 > x > 0) the condition will be a false > > positive and failure is not immediate. The race condition is known to > > happen with empty files so that is why the conditional is testing for it. > > This error condition happens because there are < 32 bytes in the file it's > trying to read. You're testing this in a roundabout way by catching the > specific struct.unpack error we hit in this condition. I think you should > just test the result of os.stat directly, since it's testing the same thing > but is more straightforward. Yes testing for <32 would widen the scope to detect problems but the majority of those conditions would not make sense in this loop. If I were to introduce a typo in xpt.py which generated a bad typedef size, falling into the retry loop attempting to detect a race condition would add needless overhead. If the record size were to change the compile should fail immediately, which happens now because of the empty file test.
Comment 29•11 years ago
|
||
I don't think you should be optimizing for the possibility of someone adding typos in the code. We have unit tests that should catch errors like that.
Assignee | ||
Comment 30•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #29) > I don't think you should be optimizing for the possibility of someone adding > typos in the code. We have unit tests that should catch errors like that. This is not optimization to detect typos it is restricting a condition to detect one very specific error condition. Widening the check to handle (32 > x > 0) implies a wildcard match that will force the logic block to handle failures it was never intended to. The only observed/reported cases for the race condition have been touch(file) + size(rec)==0. If we begin observing size(rec) != 0 that would be the time to expand the conditional into a catchall.
Comment 31•11 years ago
|
||
Backed out for landing with review-. Joey, why did you land this? remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/26e6a0b13beb remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/008214fd37cf
Comment 33•11 years ago
|
||
I /think/ bug 850380 will invalidate this one since all idls and xpts will be defined in a single make file and we won't have to worry about race conditions since all dependencies will be explicit. Perhaps we can close this bug WONTFIX?
Flags: needinfo?(gps)
Comment 34•11 years ago
|
||
I was hitting this error and rebuilds didn't resolve it - something had written 0-byte nsILocalFile.xpt and nsILocalFileWin.xpt files, so rebuilding just hit the error again. Perhaps I aborted the build (I don't think I did, but can no longer remember), but that shouldn't cause all future builds to fail. Could hitting the race condition per above leave the 0-byte files? *At Least* on a 0-byte file, throw a usable error that says what the file was and that it should be removed and a rebuild done. I was stuck on this for multiple days until I got annoyed enough to try to figure out why (and I didn't want to blow away a partially-done build on windows, given how long they take, and then find it happens again, so I let it sit).
Comment 35•11 years ago
|
||
We should probably never write target files directly but write to temporary files and then rename them.
Assignee | ||
Comment 36•11 years ago
|
||
patch with race workaround no longer needed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
Updated•6 years ago
|
Product: Core → Firefox Build System
You need to log in
before you can comment on or make changes to this bug.
Description
•