Closed
Bug 963123
Opened 11 years ago
Closed 11 years ago
New Windows build slaves fail in NSS on branches below trunk
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Assigned: armenzg)
References
Details
Attachments
(3 files)
103.00 KB, text/plain
186.23 KB, text/plain
2.46 KB, patch (glandium: review+, armenzg: checked-in+)
For a few weeks I've seen this on my weekly Try pushes of Aurora as if it were already merged to Beta. Since non-trunk isn't really supported on Try, I just retriggered until I got an older slave. But last night b-2008-ix-0001 (which must be the first new Win build slave we've had in the non-try pool for a while) took the Windows XULRunner nightly on Aurora and failed the same way. So apparently we've got a systemic problem with newly created slaves, which some trunk-only build change prevents from happening there.
https://tbpl.mozilla.org/php/getParsedLog.php?id=33444843&tree=Mozilla-Aurora
Creating Resource file: module.res
nsinstall: failed to create directory c:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\obj-firefox\security\C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\security\nss\cmd\lib: [Error 123] The filename, directory name, or volume label syntax is incorrect: u'c:\\builds\\moz2_slave\\m-aurora-w32-xr-ntly-000000000\\build\\obj-firefox\\security\\C:'
Only on new slaves, only on branches below trunk, the NSS build system decides that C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\security\nss\cmd\lib is a path relative to the directory c:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\obj-firefox\security
Updated•11 years ago
|
Assignee: nobody → armenzg
Assignee | ||
Comment 1•11 years ago
|
||
jhopkins, can you think of anything that is causing this?
I assume there is a difference between the rev2 and rev1 Win64 imaging.
Assignee | ||
Comment 2•11 years ago
|
||
I can reproduce the problem if I do this:
C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\obj-firefox>python C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\config\nsinstall.py -D c:\\builds\\moz2_slave\\m-aurora-w32-xr-ntly-000000000\\build\\obj-firefox\\security\\C:
nsinstall: failed to create directory c:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\obj-firefox\security\C:: [Error 123] The filename, directory name, or volume label syntax is incorrect: u'c:\\builds\\moz2_slave\\m-aurora-w32-xr-ntly-000000000\\build\\obj-firefox\\security\\C:'
The message is printed on line 83 (line 68 prints the different "is not a directory" error).
http://mxr.mozilla.org/mozilla-aurora/source/config/nsinstall.py#63
What I have not been able to figure out is who is passing 'c:\\builds\\moz2_slave\\m-aurora-w32-xr-ntly-000000000\\build\\obj-firefox\\security\\C:' to nsinstall.py.
Any ideas?
63 # just create one directory?
64 def maybe_create_dir(dir, mode, try_again):
65     dir = os.path.abspath(dir)
66     if os.path.exists(dir):
67         if not os.path.isdir(dir):
68             print('nsinstall: {0} is not a directory'.format(dir), file=sys.stderr)
69             return 1
70         if mode:
71             os.chmod(dir, mode)
72         return 0
73
74     try:
75         if mode:
76             os.makedirs(dir, mode)
77         else:
78             os.makedirs(dir)
79     except Exception as e:
80         # We might have hit EEXIST due to a race condition (see bug 463411) -- try again once
81         if try_again:
82             return maybe_create_dir(dir, mode, False)
83         print("nsinstall: failed to create directory {0}: {1}".format(dir, e))
84         return 1
85     else:
86         return 0
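A side note on why the exception names only the prefix ending in 'C:' while nsinstall prints the full path: os.makedirs walks up to the first ancestor that does not exist and tries to create that one first. The following is a rough sketch of that walk, not the real implementation. It uses ntpath so it runs on any OS, and the shortened paths, the `existing` set and the helper names `ancestors` and `first_missing` are all made up for illustration.

```python
import ntpath

def ancestors(path):
    """Yield path and then each parent directory, deepest first."""
    while True:
        yield path
        head, tail = ntpath.split(path)
        if not tail or head == path:
            break
        path = head

def first_missing(path, existing):
    """Return the shallowest ancestor not in `existing`.

    This is the directory a recursive makedirs would try to create first.
    """
    missing = [p for p in ancestors(path) if p not in existing]
    return missing[-1]

# Hypothetical, shortened stand-ins for the objdir and the bogus joined path.
existing = {'c:\\', r'c:\obj', r'c:\obj\security'}
bad = r'c:\obj\security\C:\src\security\nss\cmd\lib'

print(first_missing(bad, existing))
```

Under these assumptions the first directory to be created is the one ending in 'C:', which is not a legal directory name on Windows, so the mkdir fails there and the deeper components are never attempted.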
Assignee | ||
Comment 3•11 years ago
|
||
All lines are trying to make the same directory:
u'c:\\builds\\moz2_slave\\m-aurora-w32-xr-ntly-000000000\\build\\obj-firefox\\security\\C:'
The line that differs the most is:
C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\obj-firefox\security\build\Makefile:486:0: command 'C:/mozilla-build/python27/python.exe c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/build/pymake/pymake/../make.py -C C:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/security/nss/lib/crmf libs CC=' cl' SOURCE_MD_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist SOURCE_MDHEADERS_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include/nspr DIST=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist NSPR_INCLUDE_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include/nspr NSPR_LIB_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/lib MOZILLA_CLIENT=1 NO_MDUPDATE=1 NSS_ENABLE_ECC=1 SQLITE_LIB_NAME=nss3 SQLITE_INCLUDE_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include ABS_topsrcdir='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build' BUILD='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/security/$(subst $(ABS_topsrcdir)/security/,,$(CURDIR))' BUILD_TREE='$(BUILD)' OBJDIR='$(BUILD)' DEPENDENCIES='$(BUILD)/.deps' SINGLE_SHLIB_DIR='$(BUILD)' SOURCE_XP_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist BUILD_OPT=1 OPT_CODE_SIZE=1 NS_USE_GCC= OS_TARGET=WIN95 NSS_ENABLE_ZLIB= PROGRAMS= CHECKLOC= FREEBL_NO_DEPEND=0 NSS_NO_PKCS11_BYPASS=1 PUBLIC_EXPORT_DIR='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include/$(MODULE)' SOURCE_XPHEADERS_DIR='$(SOURCE_XP_DIR)/include/$(MODULE)' MODULE_INCLUDES='$(addprefix -I$(SOURCE_XP_DIR)/include/,$(REQUIRES))' MAKE_OBJDIR='$(INSTALL) -D $(OBJDIR)' TARGETS='$(LIBRARY) $(SHARED_LIBRARY) $(PROGRAM)' PYTHON='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/_virtualenv/Scripts/python.exe' 
NSINSTALL_PY='C:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/config/nsinstall.py' NSINSTALL='$(PYTHON) $(NSINSTALL_PY)' INSTALL='$(NSINSTALL) -t' ' failed, return code 2nsinstall: failed to create directory c:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\obj-firefox\security\C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\security\nss\lib\dbm\src: [Error 123] The filename, directory name, or volume label syntax is incorrect: u'c:\\builds\\moz2_slave\\m-aurora-w32-xr-ntly-000000000\\build\\obj-firefox\\security\\C:'
Assignee | ||
Comment 4•11 years ago
|
||
I also see all of these failures (added new lines to help reading):
C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build\obj-firefox\security\build\Makefile:486:0: command
'C:/mozilla-build/python27/python.exe
c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/build/pymake/pymake/../make.py -C C:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/security/nss/lib/dbm libs
CC=' cl'
SOURCE_MD_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist
SOURCE_MDHEADERS_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include/nspr
DIST=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist
NSPR_INCLUDE_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include/nspr
NSPR_LIB_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/lib
MOZILLA_CLIENT=1
NO_MDUPDATE=1
NSS_ENABLE_ECC=1
SQLITE_LIB_NAME=nss3
SQLITE_INCLUDE_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include
ABS_topsrcdir='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build'
BUILD='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/security/$(subst $(ABS_topsrcdir)/security/,,$(CURDIR))'
BUILD_TREE='$(BUILD)'
OBJDIR='$(BUILD)'
DEPENDENCIES='$(BUILD)/.deps'
SINGLE_SHLIB_DIR='$(BUILD)'
SOURCE_XP_DIR=c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist
BUILD_OPT=1
OPT_CODE_SIZE=1
NS_USE_GCC=
OS_TARGET=WIN95
NSS_ENABLE_ZLIB=
PROGRAMS=
CHECKLOC=
FREEBL_NO_DEPEND=0
NSS_NO_PKCS11_BYPASS=1
PUBLIC_EXPORT_DIR='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/dist/include/$(MODULE)'
SOURCE_XPHEADERS_DIR='$(SOURCE_XP_DIR)/include/$(MODULE)'
MODULE_INCLUDES='$(addprefix -I$(SOURCE_XP_DIR)/include/,$(REQUIRES))'
MAKE_OBJDIR='$(INSTALL) -D $(OBJDIR)'
TARGETS='$(LIBRARY) $(SHARED_LIBRARY) $(PROGRAM)'
PYTHON='c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox/_virtualenv/Scripts/python.exe'
NSINSTALL_PY='C:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/config/nsinstall.py'
NSINSTALL='$(PYTHON) $(NSINSTALL_PY)'
INSTALL='$(NSINSTALL) -t' ' failed, return code 2
Assignee | ||
Comment 5•11 years ago
|
||
I'm suspicious of OS_TARGET=WIN95
Comment 6•11 years ago
|
||
No, that's normal.
Assignee | ||
Comment 7•11 years ago
|
||
I'm putting the slave on staging to see what it does.
I'm out of ideas. I don't know how to debug Makefiles et al.
Assignee | ||
Comment 8•11 years ago
|
||
I have confirmed that it compiles for mozilla-central:
http://dev-master01.build.scl1.mozilla.com:8040/builders/WINNT%205.2%20mozilla-central%20xulrunner%20nightly
but not for mozilla-aurora:
http://dev-master01.build.scl1.mozilla.com:8040/builders/WINNT%205.2%20mozilla-aurora%20xulrunner%20nightly/builds/0/steps/compile/logs/stdio
I believe there are code differences from one branch to the other which are causing the rev2 Win64 machines to fail in some weird way.
Assignee | ||
Comment 9•11 years ago
|
||
mshal is actually helping me look into this.
Comment 10•11 years ago
|
||
Here's what I know so far:
The incorrect nsinstall paths come from this line in security/build/Makefile.in:
DEFAULT_GMAKE_FLAGS += BUILD='$(MOZ_BUILD_ROOT)/security/$$(subst $$(ABS_topsrcdir)/security/,,$$(CURDIR))'
MOZ_BUILD_ROOT is c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox
ABS_topsrcdir is c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build
CURDIR is C:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/security/nss/cmd/lib
CURDIR is the weird one - it has a 'C:' instead of 'c:', so the $(subst) doesn't match. CURDIR is built-in to make, which here means pymake. pymake gets its CURDIR value either from os.getcwd(), or from the directory passed in with '$(MAKE) -C subdir'
Here's the weird part (to me) - if I print out os.getcwd() in pymake's main, when I run it on the command-line I get:
pymake getcwd is: c:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build
But the same thing from a builder gets me:
pymake getcwd is: C:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build
I tried wrapping this in os.path.normcase(), but due to the fact that paths also come from the $(MAKE) -C subdir route, it doesn't fix the problem. When nss does a $(MAKE) -C, it uses NSS_SUBDIR, which is set to $(topsrcdir), which comes from the configure-generated security/build/Makefile:
topsrcdir := C:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build
Note that in comparison, my own Windows machine generates topsrcdir with a lower-case "c:".
So I think we can try to fix the $(subst) to be more lenient, which might be a bit wonky to do in make. Or we can figure out why os.getcwd() gives us a different casing for "C:" in the command-line vs. in the builder. We would also need to figure out why topsrcdir gets generated with upper-case then (which would probably be the same root cause).
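The mismatch can be reproduced outside of make. Below is a minimal sketch in Python, assuming $(subst) is a plain case-sensitive string replacement. The helper names `make_subst` and `build_dir` are made up for illustration, the paths are the values quoted in this comment, and the lower-case CURDIR is a hypothetical "what an older slave would see" value.

```python
def make_subst(old, new, text):
    # Hypothetical stand-in for make's $(subst old,new,text):
    # a plain, case-sensitive replacement of every occurrence.
    return text.replace(old, new)

MOZ_BUILD_ROOT = 'c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/obj-firefox'
ABS_topsrcdir = 'c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build'

def build_dir(curdir):
    # Mirrors BUILD='$(MOZ_BUILD_ROOT)/security/$(subst $(ABS_topsrcdir)/security/,,$(CURDIR))'
    relative = make_subst(ABS_topsrcdir + '/security/', '', curdir)
    return MOZ_BUILD_ROOT + '/security/' + relative

# Assumed CURDIR on an older slave (lower-case drive letter): the prefix is stripped.
good = build_dir('c:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/security/nss/cmd/lib')
# CURDIR as seen on the new slave (upper-case drive letter): nothing matches.
bad = build_dir('C:/builds/moz2_slave/m-aurora-w32-xr-ntly-000000000/build/security/nss/cmd/lib')

print(good)
print(bad)
```

With the upper-case 'C:' the replacement is a no-op, so the whole absolute source path ends up appended to the objdir, which is the path nsinstall complains about.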
Comment 11•11 years ago
|
||
Actually I think we can just replace ABS_topsrcdir with topsrcdir, since both topsrcdir & CURDIR will use the same capitalization. It would also get rid of a $(shell)...
Comment 12•11 years ago
|
||
Try this in python:
import os
os.chdir(r'c:\Program Files')
print(os.getcwd())
os.chdir(r'C:\Users')
print(os.getcwd())
Yep, what os.getcwd() returns depends on how the current directory was set. And afaik, the scripts we run on slaves do change directory before launching commands.
So there's a simpler fix, then: fix the path used by the build slave scripts, which must contain a 'C', while the same path on other builders must have a 'c'.
Comment 13•11 years ago
|
||
That's strange - I tried a similar test in the command-line:
$ pwd
/c/Users/marf
$ python ok.py
c:\Users\marf
$ cd /C/Users/marf
$ pwd
/C/Users/marf
$ python ok.py
c:\Users\marf
In both cases, os.getcwd() gives a lower-case 'c' instead of what pwd shows. I can reproduce your test though, so I guess the 'c' vs. 'C' must be internal to python rather than what it gets from the OS.
Comment 14•11 years ago
|
||
I'm not sure where in the configuration the 'C:' setting is coming from (I pinged armenzg about it, but he's not sure either). Either way I don't think we should fail in this bizarre way if one path says 'c:' and another says 'C:' on a case-insensitive filesystem.
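To illustrate what "not failing" could look like, here is a small sketch of a prefix strip that ignores case and slash direction. It is not the attached patch (which works at the make level); it only shows the idea in Python. The function name `strip_prefix_insensitive` and the shortened paths are made up, and it assumes plain ASCII paths so that normalising does not change string lengths.

```python
import ntpath

def strip_prefix_insensitive(prefix, path):
    """Remove `prefix` from the start of `path`, ignoring case and / vs \\.

    Returns `path` unchanged when the prefix does not match.
    Assumes ASCII paths, so normcase() keeps the length of each string.
    """
    if ntpath.normcase(path).startswith(ntpath.normcase(prefix)):
        return path[len(prefix):]
    return path

# Hypothetical, shortened paths for readability.
print(strip_prefix_insensitive('c:/build/security/', 'C:/build/security/nss/cmd/lib'))
print(strip_prefix_insensitive('c:/build/security/', 'D:/other/nss/cmd/lib'))
```

The first call strips the prefix even though the drive letters differ in case; the second leaves the path alone because it really is somewhere else.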
Attachment #8366168 -
Flags: review?(mh+mozilla)
Comment 15•11 years ago
|
||
(In reply to Michael Shal [:mshal] from comment #13)
> In both cases, os.getcwd() gives a lower-case 'c' instead of what pwd shows.
> I can reproduce your test though, so I guess the 'c' vs. 'C' must be
> internal to python rather than what it gets from the OS.
And the harness is in python...
Comment 16•11 years ago
|
||
Comment on attachment 8366168 [details] [diff] [review]
0001-Bug-963123-Fix-case-sensitivity-to-C-vs-c-in-NSS.patch
Review of attachment 8366168 [details] [diff] [review]:
-----------------------------------------------------------------
I still think this should be worked around at the builder level, too. There's no reason that can't be made to work without the patch, if it works on the older slaves. And I'm not terribly thrilled at the idea of touching the (fragile) nss build system on branches to make them work with new slaves.
Attachment #8366168 -
Flags: review?(mh+mozilla) → review+
Assignee | ||
Comment 17•11 years ago
|
||
Oh wow, and I thought I could no longer be surprised by how one little thing could cause such a distant problem!
I really thought I would not be able to find anything after several hours of investigation.
When a machine starts, we grab a buildbot.tac file (this allows for allocating machines).
One of the values is basedir = 'C:\\\\builds\\\\moz2_slave'
Which runslave.py passes as cwd to buildbot when starting [1]:
rv = subprocess.call(
    self.options.twistd_cmd +
    ['--no_save',
     '--logfile', os.path.join(self.get_basedir(), 'twistd.log'),
     '--python', self.get_filename(),
    ],
    cwd=self.get_basedir())
It seems that 12 win64 rev2 machines have that value [2] as well.
I triggered a new build after fixing it [3] and I see this workdir:
in dir c:\builds\moz2_slave\m-aurora-w32-xr-ntly-000000000\build (timeout 7200 secs) (maxTime 16200 secs)
instead of
in dir C:\\builds\\moz2_slave\m-aurora-w32-xr-ntly-000000000\build (timeout 7200 secs) (maxTime 16200 secs)
I've fixed the 12 Win64 machines that had the issue and the b-2008-ix machines.
I will add b-2008-ix-0001 back into production.
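For anyone who wants to see the mechanism in isolation: the cwd argument of subprocess becomes the child's working directory, and that is what the child later gets back from os.getcwd(). This is a small sketch, not the runslave.py code. The helper name `child_cwd` is made up, and it uses a temporary directory so it runs anywhere; on Windows, as far as I can tell from this bug, the drive-letter case passed in is what the child ends up reporting.

```python
import subprocess
import sys
import tempfile

def child_cwd(cwd):
    """Return os.getcwd() as reported by a child Python started with `cwd`."""
    out = subprocess.check_output(
        [sys.executable, '-c', 'import os; print(os.getcwd())'],
        cwd=cwd)
    return out.decode().strip()

# A throwaway directory stands in for the slave's basedir.
workdir = tempfile.mkdtemp()
reported = child_cwd(workdir)
print(reported)
```

Every process started further down the chain (buildbot, pymake, configure) inherits that directory unless something changes it, which is how a single string in buildbot.tac reaches the NSS makefiles.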
[1] http://hg.mozilla.org/build/puppet-manifests/file/tip/modules/buildslave/files/runslave.py
[2]
mysql> select name, basedir from slaves where binary basedir like 'C:%' and name like 'w64-ix%';
+-----------------+------------------------+
| name            | basedir                |
+-----------------+------------------------+
| w64-ix-slave159 | C:\\builds\\moz2_slave |
| w64-ix-slave160 | C:\\builds\\moz2_slave |
| w64-ix-slave161 | C:\\builds\\moz2_slave |
| w64-ix-slave162 | C:\\builds\\moz2_slave |
| w64-ix-slave163 | C:\\builds\\moz2_slave |
| w64-ix-slave164 | C:\\builds\\moz2_slave |
| w64-ix-slave165 | C:\\builds\\moz2_slave |
| w64-ix-slave166 | C:\\builds\\moz2_slave |
| w64-ix-slave167 | C:\\builds\\moz2_slave |
| w64-ix-slave168 | C:\\builds\\moz2_slave |
| w64-ix-slave169 | C:\\builds\\moz2_slave |
| w64-ix-slave170 | C:\\builds\\moz2_slave |
+-----------------+------------------------+
12 rows in set (0.54 sec)
[3]
mysql> select name, basedir from slaves where name like 'b-2008-ix-0001';
+----------------+------------------------+
| name           | basedir                |
+----------------+------------------------+
| b-2008-ix-0001 | C:\\builds\\moz2_slave |
+----------------+------------------------+
1 row in set (0.42 sec)
mysql> update slaves set basedir='c:\\builds\\moz2_slave' where binary basedir like 'C:%' and name like 'w64-ix%';
Query OK, 12 rows affected (0.42 sec)
Rows matched: 12 Changed: 12 Warnings: 0
mysql> update slaves set basedir='c:\\builds\\moz2_slave' where name like 'b-2008-ix-0001';
Query OK, 1 row affected (0.40 sec)
Rows matched: 1 Changed: 1 Warnings: 0
mysql> select name, basedir from slaves where name like 'b-2008-ix-0001';
+----------------+----------------------+
| name           | basedir              |
+----------------+----------------------+
| b-2008-ix-0001 | c:\builds\moz2_slave |
+----------------+----------------------+
1 row in set (0.40 sec)
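The same find-and-fix can be tried out safely against a scratch database. This is only a sketch using Python's sqlite3 with an in-memory table; the two-column schema and the row 'w64-ix-slave001' are assumptions for the example, and SQLite's case-sensitive GLOB stands in for MySQL's `binary ... like`.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Assumed minimal schema; the real slaves table has more columns.
conn.execute('CREATE TABLE slaves (name TEXT, basedir TEXT)')
conn.executemany('INSERT INTO slaves VALUES (?, ?)', [
    ('w64-ix-slave159', r'C:\builds\moz2_slave'),
    ('w64-ix-slave001', r'c:\builds\moz2_slave'),  # hypothetical older slave
    ('b-2008-ix-0001', r'C:\builds\moz2_slave'),
])

# GLOB is case-sensitive, so this only matches an upper-case drive letter.
before = conn.execute(
    "SELECT name FROM slaves WHERE basedir GLOB 'C:*' ORDER BY name").fetchall()

# Lower-case just the drive letter and keep the rest of the path as is.
conn.execute(
    "UPDATE slaves SET basedir = 'c' || substr(basedir, 2) WHERE basedir GLOB 'C:*'")

after = conn.execute(
    "SELECT count(*) FROM slaves WHERE basedir GLOB 'C:*'").fetchone()[0]
print(before, after)
```

Rewriting only the first character avoids having to get the backslash escaping of the full path right in the UPDATE statement.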
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 18•11 years ago
|
||
:glandium, would you still be fine if we land the build fix? This would prevent any issues in the future.
Comment 19•11 years ago
|
||
I'm glad you were able to find the source of the path! Personally I'd prefer not to leave the case-sensitivity in the build system since I think it's likely we'll stumble on it again, but I'll defer to glandium.
Comment 20•11 years ago
|
||
Oh yes, please land, but let it ride the trains instead of uplifting.
Assignee | ||
Comment 21•11 years ago
|
||
great :)
I will land it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 22•11 years ago
|
||
Pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=c7c76fa42da0
Assignee | ||
Comment 23•11 years ago
|
||
Comment on attachment 8366168 [details] [diff] [review]
0001-Bug-963123-Fix-case-sensitivity-to-C-vs-c-in-NSS.patch
https://hg.mozilla.org/integration/mozilla-inbound/rev/877ea08fb1cf
Attachment #8366168 -
Flags: checked-in+
Comment 24•11 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard