Closed Bug 396187 Opened 17 years ago Closed 15 years ago

nsinstall had problems running on new win ref platform VM

Categories

(Release Engineering :: General, defect, P3)

x86
Windows XP
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: joduinn, Unassigned)

References

()

Details

(Whiteboard: [win32 path limit])

Attachments

(4 files)

During the Gecko 1.9a8 release, we hit problems running nsinstall.

When the build directory was called "Fx-Mozilla1.9-Release", nsinstall failed with "unable to write" errors. When we renamed the directory to "Fx-Rel", nsinstall worked fine. The nightly machine, also running the new win ref platform, worked fine with a directory called "Fx-Trunk".

Theories so far:
- problem with total command line length being too long. A quick sample found some lines 2722 chars long; others were noticed to be longer.

- problem with directory names of more than 8 chars. The two working situations are both 8 chars or less.

- something about how MSYS handles command lines differently from Cygwin.



We need to at least understand what is causing this unexpected build failure, before we can decide what to do about it.
Priority: -- → P3
Per meeting with joduinn, taking these and setting P2.
Priority: P3 → P2
Whiteboard: eta 1 Oct.
D'oh, forgot to reassign these to me.
Assignee: build → preed
I've got a couple of test Tinderboxen running now to see where the line limit is.
So, after a bunch of testing, I think I've deduced that this doesn't seem to be a problem with directory length, path length, or command line length, but rather with the number of relative references to a directory (i.e. '..'), possibly tickling a path length limit with the way nsinstall constructs the paths (if it does so by getcwd() + all the '..' references + the rest of the path).
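
To illustrate the hypothesis above, here is a minimal Python sketch. The paths are hypothetical, and the "naive" concatenation is only an assumption about how nsinstall might build its paths, not confirmed behavior. It compares the length of cwd + the '..' references + the rest of the path with the length of the same path once the '..' components are collapsed.

```python
import ntpath  # Windows path rules; importable on any platform


def naive_join(cwd, relative):
    """Assumed nsinstall-style construction: cwd + separator + relative path as-is."""
    return cwd + "\\" + relative.replace("/", "\\")


def canonical_join(cwd, relative):
    """Same target, but with the '..' components collapsed."""
    return ntpath.normpath(ntpath.join(cwd, relative))


# Hypothetical objdir layout, much shorter than the real tinderbox paths.
cwd = r"e:\build\obj\dom\tests\unit"
relative = "../../../_tests/dom/tests/unit/a.html"

naive = naive_join(cwd, relative)
canonical = canonical_join(cwd, relative)

print(len(naive), naive)
print(len(canonical), canonical)
```

Both strings name the same file, but the uncollapsed form carries the full cwd plus every '..' segment, so it reaches any fixed path limit sooner.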

Anyway, this is a makefile change that I tested to fix this problem; I'll attach the log shortly. Theoretically, this could be applied to most of the makefiles in the mozilla/dom/tests/mochitest/ajax/ directories, since they could have this problem as well; we probably just didn't hit it because the file names were short enough.
Attachment #283280 - Flags: review?(benjamin)
Attached file Logfile from the build
Directory that had the initial problem. Now make leaves with success...
Builds are at the URL above; Linux and Mac were happy; Win32 build looks to be going (but I tested this in Win32, so I think it's fine).

Per a meeting with joduinn, I'm gonna spend more time on this to try and find nsinstall's *exact* path length limit.
Comment on attachment 283280 [details] [diff] [review]
Patch for the makefile in that directory

I'm sad... but ok
Attachment #283280 - Flags: review?(benjamin) → review+
Attachment #283282 - Attachment mime type: application/octet-stream → text/plain
Attached file test-fat32.log.gz
Attached file test-ntfs.log.gz
After yet more testing, I think I can pretty much say this is a path length problem, not a directory length problem, nor a command-line length problem.

I've attached two log files (one run on NTFS and one run on FAT32) that used the Makefile where the command originally failed, along with a bunch of instrumentation, to see at which point nsinstall fails.

We'll look at the first fat32 failure case in the logfile (since this was on FAT32 originally), as well as the command right after it (which succeeded):

cwd is: /e/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Command is /d/mozilla-build/moztools/bin/nsinstall /e/builds/tinderbox/Fx-Mozill
a1.9-Release/WINNT_5.2_Depend/mozilla/dom/tests/mochitest/ajax/scriptaculous/tes
t/unit/_ajax_inplaceeditor_result.html ../../../../../../../_tests/testing/mochi
test/tests/dom/tests/mochitest/ajax/scriptaculous/test/unit
Command arg len:289
Source file arg len:148
Dest dir arg len:101
cwd len:128
nsinstall: cannot copy e:\builds\tinderbox\Fx-Mozilla1.9-Release\WINNT_5.2_Depen
d\mozilla\dom\tests\mochitest\ajax\scriptaculous\test\unit\_ajax_inplaceeditor_r
esult.html to ..\..\..\..\..\..\..\_tests\testing\mochitest\tests\dom\tests\moch
itest\ajax\scriptaculous\test\unit\_ajax_inplaceeditor_result.html: The system c
annot find the path specified.^M

cwd is: /e/aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
Command is /d/mozilla-build/moztools/bin/nsinstall /e/builds/tinderbox/Fx-Mozill
a1.9-Release/WINNT_5.2_Depend/mozilla/dom/tests/mochitest/ajax/scriptaculous/tes
t/unit/_ajax_inplaceeditor_text.html ../../../../../../../_tests/testing/mochite
st/tests/dom/tests/mochitest/ajax/scriptaculous/test/unit
Command arg len:287
Source file arg len:146
Dest dir arg len:101
cwd len:128

What's interesting about this is that the combined cwd and destination directory arg lengths are the same in both cases: 229 characters. However, _ajax_inplaceeditor_result.html (32 characters) fails, while _ajax_inplaceeditor_text.html (30 characters) succeeds. Lucky for us, _autocomplete_result_nobr.html (31) succeeds.

So, it looks like the limitation on the path for nsinstall (however it's being used) is 260 characters.
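
The arithmetic behind that conclusion can be sketched in Python. The lengths 128 and 101 come from the log above. The way they are combined (cwd + destination dir + one separator + file name) is an assumption about how nsinstall builds the path. Python's `len()` gives 31, 29 and 30 for the three file names, one less than the counts quoted above, so the sketch assumes the quoted counts include the joining separator.

```python
# Values reported in the FAT32 log above.
CWD_LEN = 128
DEST_DIR_ARG_LEN = 101
ASSUMED_LIMIT = 260  # the limit proposed in this comment


def joined_length(filename):
    """Assumed construction: cwd + dest dir + one separator + file name."""
    return CWD_LEN + DEST_DIR_ARG_LEN + 1 + len(filename)


def fits(filename):
    """True if the assumed joined path stays within the assumed limit."""
    return joined_length(filename) <= ASSUMED_LIMIT


for name in (
    "_ajax_inplaceeditor_result.html",  # failed in the log
    "_ajax_inplaceeditor_text.html",    # succeeded
    "_autocomplete_result_nobr.html",   # succeeded
):
    print(name, joined_length(name), "ok" if fits(name) else "too long")
```

Under these assumptions the one failing file lands at 261 and the longest succeeding file at exactly 260, which matches the observed behavior.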

I also looked at the original 1.9a8 log file failure, and tried to do a character count of what I think the path would be considered internally by nsinstall:

for f in *.html; do echo -n $f: && echo e:/builds/tinderbox/Fx-Mozilla1.9-Release/WINNT_5.2_Depend/mozilla/obj-fx-trunk/dom/tests/mochitest/ajax/scriptaculous/test/unit/../../../../../../_tests/testing/mochitest/tests/dom/tests/mochitest/ajax/scriptaculous/src/$f | wc -c; done
_ajax_inplaceeditor_result.html:253
_ajax_inplaceeditor_text.html:251
_ajax_updater_result.html:247
_autocomplete_result.html:247
_autocomplete_result_nobr.html:252
ajax_autocompleter_test.html:250
ajax_inplaceeditor_test.html:250
bdd_test.html:235
builder_test.html:239
dragdrop_test.html:240
effects_test.html:239
element_test.html:239
loading_test.html:239
position_clone_test.html:246
slider_test.html:238
sortable_test.html:240
string_test.html:238
unit_test.html:236


Now, originally, only two files caused a build failure: _autocomplete_result_nobr.html and _ajax_inplaceeditor_result.html.

Again, lucky for us, the lengths of the filenames help here, and imply that 251 characters is the limit.

Now, what is interesting here is the discrepancy of 9 characters. I'm not entirely sure why that is; guesses include the way nsinstall converts "unixy" paths (using '/') to NT paths (using '\'), possible extra nulls floating around in the concatenation of the strings, or possibly something else entirely.

It's also possible that my guess at the real paths being used to construct a path in nsinstall is just off by 9 characters, and there's some other path fragment I'm missing.

Wikipedia claims that the maximum file *name* length on FAT32 is 255 UTF-16 characters. What is unclear is whether the full path is considered part of the name on UTF-16. It may also depend on which Win32 functions nsinstall is using to resolve the path. In any event, I'm thinking this is the limit we're running into.

joduinn: shall we consider this mystery resolved or is there more information you'd like me to dig up?

If not, I'll check in the fix that bsmedberg r+'d (despite the fact that we could be likely to run into this problem again in other mochitest directories, since their paths are quite long.)
(with preed in my office!)

These experiments show a 260-char path limit when running nsinstall on MSYS. Glad to hear the FAT32-vs-NTFS difference didn't seem relevant.

1) Does this 260 char limit apply if you run simple batch/scripts on MSYS (taking nsinstall and tinderbox out of the equation)? 

2) What is the equivalent path limit when running on cygwin?

Answering these remaining questions would answer the original question: what is the root cause of the breakage we hit when we tried to use the new win ref platform during the 1.9a8 release? We would then fix that root cause or at least clearly document exactly how to avoid hitting it again.


...oh, and yes, as best I understand it, the patch you have for absolute canonical pathnames seems like a good idea!
1) A 260-char limit is definitely not a generic problem with the shell: we have many, many commandlines in our build system that are far larger.

2) The patch here is not a good idea if there's a better way to avoid it... we should in general use relative paths for objdir installation and avoid extraneous calls to $(shell) which fork processes and make the build slower.
(In reply to comment #12)
> 1) A 260char limite is definitely not a generic problem with the shell: we have
> many, many commandlines in our build system that are far larger

It's not a commandline limit.

It's a *path* limit... that is, the guess is that when nsinstall constructs an absolute path from a relative path under MSYS, and it's longer than 260 characters, it causes a failure. There is a discrepancy of 4-5 characters (260 vs. 255), which I'm thinking may have to do with the way nsinstall constructs the path.

Now, an interesting question is why this didn't fail under cygwin. I'm gonna test with that and see what happens.
Not working on this anymore, so reassigning back into the build pool.
Assignee: mozbugs → build
Assignee: build → nobody
QA Contact: mozpreed → build
Priority: P2 → P3
This has hit us on 1.9 release automation as well as the new Mozilla2 Buildbot I'm setting up. Is it possible to get some traction on this?
Ted suggested switching to nsinstall.py instead; any thoughts? It needs a little work (to support symlinks), but I think it is feature complete besides that.
I think at least on Win32 (where we don't have symlinks) it would be worth trying, since it should solve this problem. jag was worried about the effect on build time, I don't know if that's perceptible or not.
This looks like the 260-character limit (MAX_PATH) in Windows (http://support.microsoft.com/kb/q177665/). Both Cygwin and MSYS limits.h have PATH_MAX less than this, and it looks like this is what nsinstall was compiled with.

Just bumping up MAX_PATH is not enough though, because the underlying Windows functions have this limit as well. It seems that the way to do this is call the Unicode versions of the Windows functions, and prepend "\\?\" to the path:
http://msdn2.microsoft.com/en-us/library/aa363852.aspx

For what it's worth, I can reproduce this problem on nsinstall both with MSYS and Cygwin; e.g.:

1) create 200 character directory "a..."
2) create 200 character file "b..."
3) nsinstall a... b...

Then nsinstall crashes. If I write a Python program to try to do the above, it raises a WindowsError "the filename or extension is too long" on step 3.
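
For reference, a guess at what such a Python reproduction might look like. This is a sketch and not the program actually used: the helper name `try_long_path_copy` is hypothetical, and it works in a temporary directory rather than the current one. On systems without the Windows path limit the copy is expected to succeed.

```python
import os
import shutil
import tempfile


def try_long_path_copy(name_len=200):
    """Copy a long-named file into a long-named directory.

    Returns None on success, or a short description of the OSError raised.
    """
    with tempfile.TemporaryDirectory() as root:
        dest_dir = os.path.join(root, "a" * name_len)   # step 1
        src_file = os.path.join(root, "b" * name_len)   # step 2
        try:
            os.mkdir(dest_dir)
            with open(src_file, "w") as handle:
                handle.write("test")
            shutil.copy(src_file, dest_dir)              # step 3
        except OSError as exc:
            return f"{type(exc).__name__}: {exc}"
    return None


if __name__ == "__main__":
    print(try_long_path_copy() or "copy succeeded")
```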

We probably bumped into this limit with the combination of extra-long test files, and creating "e:\builds\tinderbox\Fx-Mozilla1.9-Release" where it was just "e:\builds\tinderbox\Fx-Mozilla1.9" before.

I'm not really sure when nsinstall.exe was compiled, or with what compiler; the modtime on it is Apr 28 1998. It uses a bunch of unix headers so I don't think it could've been MSVC. Maybe brendan knows? I'm guessing GCC/cygwin.

Anyway, I don't think nsinstall.py alone is going to save the day, as Python has the same problem. Most of the advice I find is along the lines of "don't do that", e.g. http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/w2000Msgs/3763.mspx?mfr=true :)

If we want to support paths longer than MAX_PATH, then I guess we could teach nsinstall to use the Win32 API, and have it automatically expand relative paths and prepend "\\?\".
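
A rough Python sketch of that path transformation, for illustration only. The helper name `to_extended_length_path` is hypothetical, the cwd is passed in explicitly so the sketch runs on any platform, and nothing here is taken from nsinstall itself. As noted above, the prefixed form only helps when the path is handed to the Unicode versions of the Windows functions.

```python
import ntpath  # Windows path rules; importable on any platform

EXTENDED_PREFIX = "\\\\?\\"  # the four characters \\?\


def to_extended_length_path(path, cwd):
    """Make `path` absolute relative to `cwd`, collapse '..', and add the prefix."""
    if path.startswith(EXTENDED_PREFIX):
        return path  # already in extended-length form
    absolute = ntpath.normpath(ntpath.join(cwd, path))
    if absolute.startswith("\\\\"):
        # UNC paths (\\server\share\...) take the \\?\UNC\ form instead.
        return EXTENDED_PREFIX + "UNC\\" + absolute[2:]
    return EXTENDED_PREFIX + absolute


# Hypothetical example paths.
print(to_extended_length_path(r"..\..\_tests\a.html", r"e:\builds\obj\dom\unit"))
```

Collapsing the '..' components first matters because the prefixed form is passed through without the usual normalization, so relative components are not resolved.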
Whiteboard: eta 1 Oct.
(In reply to comment #18)
> If we want to support paths longer than 256 characters, then I guess we could
> teach nsinstall to use the Win32 API, and have it automatically expand relative
> paths and prepend "\\?\".

By "expand" I meant "convert to absolute path", as you must've guessed :)

Component: Release Engineering → Release Engineering: Future
QA Contact: build → release
No longer blocks: 394044
Whiteboard: [win32 path limit]
We've "fixed" this by consciously using shorter paths for our build dirs to avoid the limit. 

Given that this only affects our older cygwin-based builders, I don't see us ever implementing a real fix.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → WONTFIX
FWIW I probably fixed this in mozilla-central (and 1.9.2) in bug 463417
(In reply to comment #20)
> Given that this only affects our older cygwin-based builders, I don't see us
> ever implementing a real fix.

Actually it was MSYS slaves that had this problem, so all but the very oldest machines.
Moving closed Future bugs into Release Engineering in preparation for removing the Future component.
Component: Release Engineering: Future → Release Engineering
Product: mozilla.org → Release Engineering