Bug 670951 - Mac XULRunner builds fail intermittently during unification
Status: RESOLVED FIXED
Product: Core
Classification: Components
Component: Build Config
Version: unspecified
Platform: All All
Importance: -- critical
Target Milestone: mozilla9
Assigned To: Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
Duplicates: 679766
Depends on:
Blocks: 681448 681451 700688
Reported: 2011-07-12 09:30 PDT by Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary)
Modified: 2011-11-08 09:12 PST

Attachments
test patch (1.43 KB, patch)
2011-09-20 10:54 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags
s/else if/elsif/ (1.43 KB, patch)
2011-09-20 11:21 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags
new patch (1.56 KB, patch)
2011-09-20 11:31 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
logging (996 bytes, patch)
2011-09-20 13:29 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+
emorley: checkin+
proposed patch (696 bytes, patch)
2011-09-20 15:16 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags
proposed patch (708 bytes, patch)
2011-09-20 16:08 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
no flags
proposed patch (2.05 KB, patch)
2011-09-21 18:03 PDT, Rafael Ávila de Espíndola (:espindola) (not reading bugmail)
ted: review+

Description Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-07-12 09:30:28 PDT
The Mac XULRunner builds are failing intermittently during unify with 

sent 72078642 bytes  received 23648 bytes  6866884.76 bytes/sec
total size is 71997252  speedup is 1.00
Linking XPT files...
/tools/buildbot/bin/python2.6 /builds/slave/m-cen-osx64-xr/build/config/optimizejars.py --optimize /builds/slave/m-cen-osx64-xr/build/obj-firefox/x86_64/xulrunner/installer/mac/../../../jarlog//en-US ../../../dist/bin/chrome ../../../dist/xulrunner/XUL.framework/Versions/Current/chrome
Removing unpackaged files...
cd ../../../dist/xulrunner/XUL.framework/Versions/Current; rm -rf xulrunner-config regchrome* regxpcom* xpcshell* xpidl* xpt_dump* xpt_link*  core bsdecho gtscc js js-config jscpucfg nsinstall viewer TestGtkEmbed codesighs* elf-dynstr-gc mangle* maptsv* mfc* mkdepend* msdump* msmap* nm2tsv* nsinstall* res/samples res/throbber shlibsign* ssltunnel* certutil* pk12util* winEmbed.exe chrome/chrome.rdf chrome/app-chrome.manifest chrome/overlayinfo components/compreg.dat components/xpti.dat content_unit_tests necko_unit_tests *.dSYM 
Packaging JavaScript Shell...
rm -f ../../../dist/jsshell-mac64.zip
/usr/bin/zip -9j ../../../dist/jsshell-mac64.zip ../../../dist/bin/js  ../../../dist/bin/libnspr4.dylib ../../../dist/bin/libplds4.dylib ../../../dist/bin/libplc4.dylib 
  adding: js (deflated 67%)
  adding: libnspr4.dylib (deflated 65%)
  adding: libplds4.dylib (deflated 70%)
  adding: libplc4.dylib (deflated 70%)
rm -f obj-firefox/i386/dist/xulrunner/XUL.framework/Versions/Current/*.chk \
	      obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/Current/*.chk
/builds/slave/m-cen-osx64-xr/build/build/macosx/universal/fix-buildconfig file \
	  obj-firefox/i386/dist/xulrunner/XUL.framework/Versions/Current/chrome/toolkit/ \
	  obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/Current/chrome/toolkit/
mkdir -p obj-firefox/i386/dist/universal/xulrunner
rm -f obj-firefox/x86_64/dist/universal
ln -s obj-firefox/i386/dist/universal obj-firefox/x86_64/dist/universal
rm -rf obj-firefox/i386/dist/universal/xulrunner/XUL.framework
/builds/slave/m-cen-osx64-xr/build/build/macosx/universal/unify \
          --unify-with-sort "\.manifest$" \
          --unify-with-sort "components\.list$" \
	  obj-firefox/i386/dist/xulrunner/XUL.framework \
	  obj-firefox/x86_64/dist/xulrunner/XUL.framework \
	  obj-firefox/i386/dist/universal/xulrunner/XUL.framework
/builds/slave/m-cen-osx64-xr/build/build/macosx/universal/unify: warning: makeUniversalDirectory: only in x86 obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/8.0a1:
  xulrunner
Can't call method "path" on an undefined value at /builds/slave/m-cen-osx64-xr/build/build/macosx/universal/unify line 1092.
make[2]: *** [postflight_all] Error 255
make[1]: *** [realbuild] Error 2
make: *** [build] Error 2
program finished with exit code 2
Comment 1 Ted Mielczarek [:ted.mielczarek] 2011-07-18 10:23:03 PDT
I really don't know what's going on here. Maybe some bad ordering of packaging targets?
Comment 2 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-07-18 10:24:00 PDT
rs said that we ran into a similar problem for Firefox builds in the past, and that it was fixed by running some step twice.
Comment 3 Nick Thomas [:nthomas] 2011-07-19 15:41:44 PDT
Definitely looks like a build system problem. The m-c builds on 2011-07-17 through 19 all ran on moz2-darwin10-slave06, but the first two succeeded and the last one failed. This is also happening on Aurora.

The message
 only in x86 obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/8.0a1:
is a bit odd; it seems to be looking for the 32-bit file in the 64-bit dir.
Comment 4 Ted Mielczarek [:ted.mielczarek] 2011-07-20 05:16:54 PDT
I think that's a red herring. `unify` was originally written to splice together x86+ppc halves, and I think when we switched it to splice x86+x86_64 it retained a few strings explicitly referring to "x86" and "ppc":
http://mxr.mozilla.org/mozilla-central/source/build/macosx/universal/unify#724
Comment 5 Benjamin Smedberg [:bsmedberg] 2011-08-17 10:09:42 PDT
*** Bug 679766 has been marked as a duplicate of this bug. ***
Comment 6 Aki Sasaki [:aki] 2011-08-18 14:11:29 PDT
Bumping severity as this prevented us from shipping XULRunner 7.0b1.
Comment 7 Kyle Huey [:khuey] (Exited; not receiving bugmail, email if necessary) 2011-08-18 14:12:31 PDT
Can't you just rerun the build?
Comment 8 Aki Sasaki [:aki] 2011-08-18 14:17:58 PDT
That failed too.
How many times should I rerun it?
Comment 9 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-08-19 12:45:25 PDT
Sorry if it is a silly question, but how do I reproduce this?
Comment 10 Aki Sasaki [:aki] 2011-08-19 13:20:45 PDT
|make -f client.mk build| in mozilla-beta, with http://hg.mozilla.org/build/buildbot-configs/file/de8369b8cd85/mozilla2/macosx64/mozilla-beta/xulrunner/mozconfig as your mozconfig should probably do it.
Comment 11 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2011-08-23 13:21:28 PDT
Nominating for tracking even though no patch is ready yet, because this prevented us from generating xulrunner builds for 7.0beta1, and will be a problem for 7.0betaN and the 7.0 release.
Comment 12 Johnathan Nightingale [:johnath] 2011-08-23 14:55:03 PDT
Rafael - any luck reproducing this or thoughts on how to approach it (or thoughts on who we should punt it towards)?
Comment 13 John O'Duinn [:joduinn] (please use "needinfo?" flag) 2011-08-23 14:56:54 PDT
joey, ted: any suggestions?

espindola: does comment#10 answer your question?
Comment 14 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-08-24 06:42:57 PDT
(In reply to Johnathan Nightingale [:johnath] from comment #12)
> Rafael - any luck reproducing this or thoughts on how to approach it (or
> thoughts on who we should punt it towards)?

Not yet, sorry. Will try to take a look today. The analysis and opening the rdar's in relation to bug 678607 took some time.
Comment 15 Lukas Blakk [:lsblakk] use ?needinfo 2011-08-30 20:20:03 PDT
Failed for 7.0b3
Comment 16 christian 2011-09-15 13:39:01 PDT
Do we have a recent xulrunner for Firefox 7 beta working?
Comment 17 Nick Thomas [:nthomas] 2011-09-15 13:49:47 PDT
Between it sometimes working first time, and rebuilding when it doesn't, we've had mac SDKs for 7.0b2 through b5, eg
  https://ftp.mozilla.org/pub/mozilla.org/xulrunner/releases/7.0b5/sdk/
Comment 18 John Ford [:jhford] CET/CEST Berlin Time 2011-09-16 15:32:01 PDT
xulrunner failed to build and the rebuild also failed for Firefox 7.0b6
Comment 19 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 10:21:38 PDT
what OS X version is used for the build, 10.6?
Comment 20 Ben Hearsum (:bhearsum) 2011-09-20 10:23:45 PDT
System Version: Mac OS X 10.6.2 (10C540)

Darwin moz2-darwin10-slave09.build.mozilla.org 10.2.0 Darwin Kernel Version 10.2.0: Tue Nov  3 10:37:10 PST 2009; root:xnu-1486.2.11~1/RELEASE_I386 i386
Comment 21 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 10:54:43 PDT
Created attachment 561237
test patch

Looking at the code I noticed that the script would fail if only the x86 file was defined. It is really strange that it fails only sometimes.

I am trying to reproduce the bug with this patch. Should at least give a bit more information.
Comment 22 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 11:21:52 PDT
Created attachment 561250
s/else if/elsif/
Comment 23 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 11:31:12 PDT
Created attachment 561252
new patch

We already had a warning for a file existing in only one of the arches.
Comment 24 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 11:41:49 PDT
summary so far:

the unify script has a bug and crashes if a file exists only on X86 (as opposed to PPC, the internal names the script uses).

The attached patch fixes the crash, but the log of a failed build shows

/builds/slave/rel-m-beta-xr-osx64-bld/build/build/macosx/universal/unify: warning: makeUniversalDirectory: only in x86 obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/7.0:
  xulrunner

We probably still have to figure out why xulrunner was created only for x86_64.
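
The crash summarized above can be illustrated with a minimal sketch. It is written in Python purely for illustration (the unify script itself is Perl), and the `FileEntry` class and the function names are hypothetical, not taken from the script. It pairs files from two per-architecture listings by name and shows why calling a method on the missing side fails, alongside a guarded variant that warns instead.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FileEntry:
    """Hypothetical stand-in for the per-file object a unify-style tool tracks."""
    root: str
    name: str

    def path(self) -> str:
        return f"{self.root}/{self.name}"


Pair = Tuple[Optional[FileEntry], Optional[FileEntry]]


def pair_entries(first: Dict[str, FileEntry],
                 second: Dict[str, FileEntry]) -> List[Pair]:
    """Pair entries by name; a side is None when the file is missing there."""
    names = sorted(set(first) | set(second))
    return [(first.get(name), second.get(name)) for name in names]


def describe_unguarded(pair: Pair) -> str:
    """Assumes both sides exist, like the buggy code path."""
    a, b = pair
    return f"unify {a.path()} + {b.path()}"  # AttributeError if a or b is None


def describe_guarded(pair: Pair) -> str:
    """Checks which side is present before touching it."""
    a, b = pair
    if a is not None and b is not None:
        return f"unify {a.path()} + {b.path()}"
    present = a if a is not None else b
    return f"warning: only in one arch: {present.path()}"


# Illustrative listings: xulrunner is present on one side only.
i386 = {"XUL": FileEntry("i386", "XUL")}
x86_64 = {
    "XUL": FileEntry("x86_64", "XUL"),
    "xulrunner": FileEntry("x86_64", "xulrunner"),
}
pairs = pair_entries(i386, x86_64)
messages = [describe_guarded(p) for p in pairs]
```

With these listings, `describe_unguarded` fails on the `xulrunner` pair in the same spirit as the Perl "undefined value" error, while `describe_guarded` reports the one-sided file and carries on.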
Comment 25 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 13:27:40 PDT
In packager.mk we run:

 rsync -auv --copy-unsafe-links $(_APPNAME) $(MOZ_PKG_DIR)

doing a diff of both x86_64 and i386 output shows:

< XUL.framework/Versions/7.0/xpidl
< XUL.framework/Versions/7.0/xulrunner

but we do run nsinstall for xulrunner on both x86_64 and i386. Trying to figure out what went wrong between one and the other.

I cannot reproduce this on my laptop, trying to build on a bot.
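
A quick way to repeat the comparison described above is to list both packaged trees and take set differences. The following is a sketch under assumptions: the helper names are made up for illustration, and the two directory arguments stand in for the per-architecture `XUL.framework` trees.

```python
import os
from typing import List, Set, Tuple


def list_relative_files(root: str) -> Set[str]:
    """Return every file under root as a path relative to root."""
    found = set()
    # os.walk does not descend into symlinked directories by default.
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            found.add(os.path.relpath(full, root))
    return found


def compare_trees(first_root: str,
                  second_root: str) -> Tuple[List[str], List[str]]:
    """Return (files only under first_root, files only under second_root)."""
    first = list_relative_files(first_root)
    second = list_relative_files(second_root)
    return sorted(first - second), sorted(second - first)
```

Calling `compare_trees` on the i386 and x86_64 framework directories before the unify step would surface a one-sided file such as `xulrunner` without waiting for unify to trip over it.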
Comment 26 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 13:29:16 PDT
Created attachment 561282
logging

This patch makes an rsync invocation verbose to try to bisect where the problem is.
Comment 27 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 15:16:52 PDT
Created attachment 561311
proposed patch

I have not been able to reproduce the problem, but I think this missing dependency could be the cause.
Comment 28 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 15:17:48 PDT
Comment on attachment 561282
logging

Making this more verbose is probably a good thing anyway.
Comment 29 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 15:19:41 PDT
Comment on attachment 561252
new patch

I am not so sure about this one. While this patch fixes a bug, this bug is probably what saved us from shipping a package with a non universal xulrunner.

Can we just drop support for having files in one dir but not in the other?
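
The stricter policy suggested here, refusing to unify at all when the two trees differ, could look roughly like the sketch below. The function name and error type are hypothetical; this is not how the unify script is structured, only an illustration of failing deliberately with a readable message rather than through an incidental crash.

```python
from typing import Iterable


class ArchMismatchError(Exception):
    """Raised when the two per-architecture file sets differ (hypothetical)."""


def require_same_files(first: Iterable[str], second: Iterable[str]) -> None:
    """Raise ArchMismatchError unless both listings name the same files."""
    first_set, second_set = set(first), set(second)
    only_first = sorted(first_set - second_set)
    only_second = sorted(second_set - first_set)
    if only_first or only_second:
        raise ArchMismatchError(
            f"only in first: {only_first}; only in second: {only_second}"
        )
```

The trade-off is that any legitimately architecture-specific file would then need an explicit exception, which is why this is framed as a question in the comment rather than a decision.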
Comment 30 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-20 16:08:41 PDT
Created attachment 561334
proposed patch
Comment 31 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-21 08:14:14 PDT
Comment on attachment 561334
proposed patch

The joys of recursive makefiles :-(

I now realize that all this patch does is move the failure earlier in the build. If $(DIST)/bin/xulrunner already exists, everything works. If it doesn't, this particular makefile process has no idea how to create it:

make[7]: *** No rule to make target `../../dist/bin/xulrunner'.  Stop.
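
The behaviour described above, where the build works if the file already exists and fails otherwise, follows from how make treats a prerequisite it has no rule for. A toy model in Python (not make itself; the `rules` mapping and the `resolve` function are invented for illustration, and timestamps are ignored) shows the two outcomes.

```python
from typing import Dict, List, Set


def resolve(target: str,
            rules: Dict[str, List[str]],
            existing: Set[str]) -> List[str]:
    """Return the targets that would be built, in order, for `target`.

    Mirrors one aspect of make: a target with no rule is accepted only if
    the file already exists; otherwise resolution stops with an error.
    Unlike make, any target that has a rule is always rebuilt here.
    """
    if target not in rules:
        if target in existing:
            return []  # plain file that is already there, nothing to do
        raise RuntimeError(f"No rule to make target `{target}'.  Stop.")
    built: List[str] = []
    for prerequisite in rules[target]:
        built.extend(resolve(prerequisite, rules, existing))
    built.append(target)
    return built


# This sub-make only knows how to package; the binary is built by a
# different directory's makefile, so there is no rule for it here.
rules = {"package": ["../../dist/bin/xulrunner"]}
```

Whether `resolve("package", rules, existing)` succeeds depends entirely on whether another part of the build has already produced the file, which matches the intermittent pattern in this bug.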
Comment 32 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-21 18:03:35 PDT
Created attachment 561635
proposed patch

I think the correct way to fix this is to install the files directly in the location we will use and drop the use of rsync, but that is a fairly big change. We would have to change $(DEST)/bin to $(DEST_BIN) everywhere and have xulrunner on OS X set it to $(DIST)/$(FRAMEWORK_NAME).framework/Versions/$(FRAMEWORK_VERSION).

This patch is a really small step in that direction (but also a hack). The Makefile responsible for building xulrunner now also puts it in the framework directory. It may or may not be overwritten by rsync, but it will be there in the end.
Comment 33 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-22 08:49:17 PDT
Comment on attachment 561282
logging

Logging change checked in:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2986e795dbbc
Comment 35 Ted Mielczarek [:ted.mielczarek] 2011-09-23 11:57:13 PDT
Comment on attachment 561252
new patch

Review of attachment 561252:
-----------------------------------------------------------------

This is fine, but it doesn't really solve the underlying problem, so maybe we should not take this patch?
Comment 36 Ted Mielczarek [:ted.mielczarek] 2011-09-23 12:01:15 PDT
Comment on attachment 561635
proposed patch

Review of attachment 561635:
-----------------------------------------------------------------

::: xulrunner/app/Makefile.in
@@ +208,5 @@
> +FRAMEWORK_DIR = \
> +   $(DIST)/$(FRAMEWORK_NAME).framework/Versions/$(FRAMEWORK_VERSION)
> +
> +$(FRAMEWORK_DIR)/Resources:
> +	$(NSINSTALL) -D $@

n.b.: this could use the work from bug 680246 when that lands.

::: xulrunner/stub/Makefile.in
@@ +114,5 @@
> +$(FRAMEWORK_DIR):
> +	$(NSINSTALL) -D $@
> +
> +$(FRAMEWORK_DIR)/$(PROGRAM): $(PROGRAM) $(FRAMEWORK_DIR)
> +	$(NSINSTALL) $(PROGRAM) $(FRAMEWORK_DIR)

You could just write "$(NSINSTALL) $?" in the rule here, but perhaps that's not as clear.
Comment 37 Matt Brubeck (:mbrubeck) 2011-09-24 08:29:54 PDT
https://hg.mozilla.org/mozilla-central/rev/cf051f97c093
Comment 38 Rafael Ávila de Espíndola (:espindola) (not reading bugmail) 2011-09-24 08:39:13 PDT
One more patch to go.
Comment 40 Lukas Blakk [:lsblakk] use ?needinfo 2011-09-28 16:59:45 PDT
Just hit this on 7.0.1 macosx64 xulrunner

sent 72125488 bytes  received 23648 bytes  5344380.44 bytes/sec
total size is 72043998  speedup is 1.00
Linking XPT files...
/tools/buildbot/bin/python2.6 /builds/slave/rel-m-rel-xr-osx64-bld/build/config/optimizejars.py --optimize /builds/slave/rel-m-rel-xr-osx64-bld/build/obj-firefox/x86_64/xulrunner/installer/mac/../../../jarlog//en-US ../../../dist/bin/chrome ../../../dist/xulrunner/XUL.framework/Versions/Current/chrome
Removing unpackaged files...
cd ../../../dist/xulrunner/XUL.framework/Versions/Current; rm -rf xulrunner-config regchrome* regxpcom* xpcshell* xpidl* xpt_dump* xpt_link*  core bsdecho gtscc js js-config jscpucfg nsinstall viewer TestGtkEmbed codesighs* elf-dynstr-gc mangle* maptsv* mfc* mkdepend* msdump* msmap* nm2tsv* nsinstall* res/samples res/throbber shlibsign* ssltunnel* certutil* pk12util* winEmbed.exe chrome/chrome.rdf chrome/app-chrome.manifest chrome/overlayinfo components/compreg.dat components/xpti.dat content_unit_tests necko_unit_tests *.dSYM 
Packaging JavaScript Shell...
rm -f ../../../dist/jsshell-mac64.zip
/usr/bin/zip -9j ../../../dist/jsshell-mac64.zip ../../../dist/bin/js  ../../../dist/bin/libnspr4.dylib ../../../dist/bin/libplds4.dylib ../../../dist/bin/libplc4.dylib 
  adding: js (deflated 67%)
  adding: libnspr4.dylib (deflated 65%)
  adding: libplds4.dylib (deflated 70%)
  adding: libplc4.dylib (deflated 70%)
rm -f obj-firefox/i386/dist/xulrunner/XUL.framework/Versions/Current/*.chk \
	      obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/Current/*.chk
/builds/slave/rel-m-rel-xr-osx64-bld/build/build/macosx/universal/fix-buildconfig file \
	  obj-firefox/i386/dist/xulrunner/XUL.framework/Versions/Current/chrome/toolkit/ \
	  obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/Current/chrome/toolkit/
mkdir -p obj-firefox/i386/dist/universal/xulrunner
rm -f obj-firefox/x86_64/dist/universal
ln -s obj-firefox/i386/dist/universal obj-firefox/x86_64/dist/universal
rm -rf obj-firefox/i386/dist/universal/xulrunner/XUL.framework
/builds/slave/rel-m-rel-xr-osx64-bld/build/build/macosx/universal/unify \
          --unify-with-sort "\.manifest$" \
          --unify-with-sort "components\.list$" \
	  obj-firefox/i386/dist/xulrunner/XUL.framework \
	  obj-firefox/x86_64/dist/xulrunner/XUL.framework \
	  obj-firefox/i386/dist/universal/xulrunner/XUL.framework
/builds/slave/rel-m-rel-xr-osx64-bld/build/build/macosx/universal/unify: warning: makeUniversalDirectory: only in x86 obj-firefox/x86_64/dist/xulrunner/XUL.framework/Versions/7.0.1:
  xulrunner
Can't call method "path" on an undefined value at /builds/slave/rel-m-rel-xr-osx64-bld/build/build/macosx/universal/unify line 1092.
make[2]: *** [postflight_all] Error 255
make[1]: *** [realbuild] Error 2
make: *** [build] Error 2
program finished with exit code 2
elapsedTime=8399.688690
Comment 41 Ted Mielczarek [:ted.mielczarek] 2011-09-28 17:20:06 PDT
Right, because this only landed in time for Firefox 9. I'm not sure why this got marked status-firefox7: fixed.
