Closed Bug 715298 Opened 8 years ago Closed 8 years ago

Fennec Java builds succeed on "linux-ix-slave" builders, but fail on "try-linux-slave" builders

Categories

(Release Engineering :: General, defect, P2)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cpeterson, Assigned: coop)

References

Details

(Whiteboard: [mobile][try])

Attachments

(2 files)

I have pushed a couple of changesets (with Fennec Java changes) to the try servers. When my Fennec builds are (randomly?) assigned to a "linux-ix-slave" builder, they succeed; when they are assigned to a "try-linux-slave" builder, they fail.

The errors are that the ProGuard Java .class shrinker/optimizer can't find a couple of methods in the Android SDK libraries, but AFAICT the builders have the same Android SDK and JDK versions and paths.

> 1) specific changeset used in all cases

* CHANGESETS PUSHED TO TRY SERVER:
https://tbpl.mozilla.org/?tree=Try&rev=8ac71ab745e5
8ac71ab745e5
4e1d2cef3695
921bab94b9af
757fb03b09c3
0e422aa201ca
c9c6f7ed46a5
64be73b3b014


> 2) you say "builds on try machines" - I assume you are pushing to try from a local repo with the same changeset in all of these cases?

Yes, I have local patches in my hg patch queue that I pushed to try. I tested ~2 different patches while trying to debug my build problems, but those patches were functionally equivalent, and their try jobs ran on both "linux-ix-slave" and "try-linux-slave" builders.

hg qnew --currentdate --currentuser --edit --message "try: -b do -e -p android -u none -t none" TRY
hg push --force -rtip ssh://hg.mozilla.org/try


* BUILDS SUCCEEDED ON THESE BUILDERS:
linux-ix-slave08
linux-ix-slave10
linux-ix-slave10
linux-ix-slave11
linux-ix-slave11
mv-moz2-linux-ix-slave23
mv-moz2-linux-ix-slave23


* BUILDS FAILED ON THESE BUILDERS:
try-linux-slave06
try-linux-slave06
try-linux-slave19
try-linux-slave20
try-linux-slave28
try-linux-slave28
try-linux-slave23 

* BUILD ERROR:
ProGuard, version 4.4
Reading input...
Reading program directory [/builds/slave/try-andrd-dbg/build/obj-firefox/mobile/android/base/classes]
Reading library directory [/tools/android-sdk-r13/platforms/android-13]
Initializing...
Warning: org.mozilla.gecko.GeckoApp$30: can't find referenced method 'void setZOrderOnTop(boolean)' in class android.view.SurfaceView
Warning: org.mozilla.gecko.GeckoApp$30: can't find referenced method 'void setZOrderMediaOverlay(boolean)' in class android.view.SurfaceView
:cpeterson, could you attach your mozconfigs to the bug? (I assume, but am asking explicitly, that the same mozconfig was being used in both the successful and failing cases?)
I see this in the try build for:

builder: try-android-xul
slave: mv-moz2-linux-ix-slave22
starttime: 1325532788.85
results: success (0)
buildid: 20120102113320
builduid: 25fff2ef53714e968cd0f21242be42a6
revision: 8ac71ab745e5

mozconfig dump:

# Global options
mk_add_options MOZ_MAKE_FLAGS=-j4

# Nightlies only since this has a cost in performance
ac_add_options --enable-js-diagnostics

# Build Fennec
ac_add_options --enable-application=mobile
ac_add_options --disable-elf-hack

# Android
ac_add_options --target=arm-linux-androideabi
ac_add_options --with-endian=little
ac_add_options --with-android-ndk="/tools/android-ndk-r5c"
ac_add_options --with-android-sdk="/tools/android-sdk-r13/platforms/android-13"
ac_add_options --with-android-tools="/tools/android-sdk-r13/tools"
ac_add_options --with-android-toolchain=/tools/android-ndk-r5c/toolchains/arm-linux-androideabi-4.4.3/prebuilt/linux-x86
ac_add_options --with-android-platform=/tools/android-ndk-r5c/platforms/android-5/arch-arm
ac_add_options --with-system-zlib
ac_add_options --enable-update-channel=${MOZ_UPDATE_CHANNEL}

export JAVA_HOME=/tools/jdk6
export MOZILLA_OFFICIAL=1

ac_add_options --with-branding=mobile/xul/branding/nightly


I would like to compare that to a failed build on the non-try side to make sure that the ndk and sdk environment vars are matching
:bear, my change only affects the "try-android" builds, not "try-android-xul". The mozconfig you copied is for "builder: try-android-xul".

:joduinn, do you want the mozconfig I use to build locally? I'm not sure which mozconfig the try servers are using. I didn't specify a particular mozconfig when pushing. The android nightly builds use this mozconfig:

https://hg.mozilla.org/mozilla-central/file/tip/mobile/android/config/mozconfigs/android/nightly
I just want to compare the two mozconfigs. I took a look at two of the slaves you listed and saw that they had the proper Android SDKs in place, but one had some older info because it has been around longer than the other.

My hunch is that you may not be using the mozconfig on try that you think/assume you are and I just want to rule that out.
Here is the mozconfig I use to compile locally on my Mac:

ANDROID_MIN_VERSION=5
ANDROID_TARGET_VERSION=13

ANDROID_NDK_DIR="/Users/cpeterson/Code/google/android-ndk-r5c"
ANDROID_SDK_DIR="/Users/cpeterson/Code/google/android-sdk-macosx"

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/OBJDIR
mk_add_options MOZ_MAKE_FLAGS="-j9 -s"

ac_add_options  --disable-crashreporter
ac_add_options  --disable-pedantic
ac_add_options  --disable-tests

ac_add_options  --with-ccache

# Android SDK
ac_add_options   --with-android-version=$ANDROID_MIN_VERSION

ac_add_options       --with-android-ndk="$ANDROID_NDK_DIR"
ac_add_options  --with-android-platform="$ANDROID_NDK_DIR/platforms/android-$ANDROID_MIN_VERSION/arch-arm"
ac_add_options --with-android-toolchain="$ANDROID_NDK_DIR/toolchains/arm-linux-androideabi-4.4.3/prebuilt/darwin-x86"

ac_add_options       --with-android-sdk="$ANDROID_SDK_DIR/platforms/android-$ANDROID_TARGET_VERSION"
ac_add_options     --with-android-tools="$ANDROID_SDK_DIR/tools"

# Android options
ac_add_options --enable-application=mobile/android
ac_add_options --target=arm-linux-androideabi
ac_add_options --with-endian=little
> My hunch is that you may not be using the mozconfig on try that you think/assume you are 
> and I just want to rule that out.

:bear, how can I check which mozconfig is being used on the try servers? The changeset I am pushing to try does not change mozconfig or rely on any new mozconfig flags.

I wonder if the linux-ix-slave and try-linux-slave builders could be using different Java versions? The Java version on my dev machine is javac 1.6.0_29.
Blocks: Proguard
(In reply to Chris Peterson (:cpeterson) from comment #6)
> :bear, how can I check which mozconfig is being used on the try servers? The
> changeset I am pushing to try does not change mozconfig or rely on any new
> mozconfig flags.

Chris: if you check the build log for your try run, search for "mozconfig" and you should find two build steps early on in the log: one that fetches the mozconfig (got mozconfig), and then a second step that shows the contents of the .mozconfig that the build will use (cat .mozconfig).

Here's a recent example try log that contains those steps:

https://tbpl.mozilla.org/php/getParsedLog.php?id=8502559&tree=Try
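
For reference, here's a rough sketch of how to locate those two steps in a saved copy of the log (the file name is just an example, not something the build produces):

# Assuming the full log has been saved locally as try-build-log.txt:
grep -n "got mozconfig" try-build-log.txt
grep -n -A 30 "cat .mozconfig" try-build-log.txt   # prints the mozconfig contents that follow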

(In reply to Chris Peterson (:cpeterson) from comment #6) 
> I wonder if the linux-ix-slave and try-linux-slave builders could be using
> different Java versions? The Java version on my dev machine is javac
> 1.6.0_29.

Doubtful since in theory we control those things, but I will double-check to be sure.
Assignee: joduinn → coop
OS: Mac OS X → Linux
Priority: -- → P3
I compared two of the builders mentioned in comment #0. It makes me sad that the output is not identical:

[cltbld@linux-ix-slave08 ~]$ rpm -qa | grep jdk | sort
jdk1.5-1.5.0_10-0moz1
jdk1.6-1.6.0_17-0moz1
[cltbld@linux-ix-slave08 ~]$ cd /tools
[cltbld@linux-ix-slave08 tools]$ cd
[cltbld@linux-ix-slave08 ~]$ ls -ld /tools/jdk*
lrwxrwxrwx  1 root   root     19 Jan 15  2010 /tools/jdk -> /tools/jdk-1.5.0_10
drwxr-xr-x  9 root   root   4096 Nov  9  2006 /tools/jdk-1.5.0_10
drwxr-xr-x 10 cltbld cltbld 4096 Mar 19  2010 /tools/jdk-1.6.0_17
lrwxrwxrwx  1 root   root     19 May  5  2010 /tools/jdk6 -> /tools/jdk-1.6.0_17
[cltbld@linux-ix-slave08 ~]$ rpm -qa | grep android | sort
android-ndk5-r5c-0moz3
android-ndk-r4c-0moz3
android-sdk13-r13-0moz1
android-sdk-r8-0moz3
[cltbld@linux-ix-slave08 ~]$ ls -ld /tools/android*
lrwxrwxrwx 1 root root   22 Jun 17  2010 /tools/android-ndk -> /tools/android-ndk-r4c
drwxr-xr-x 6 root root 4096 Jul  1  2010 /tools/android-ndk-r4c
drwxr-xr-x 9 root root 4096 Jul 20 13:58 /tools/android-ndk-r5c
lrwxrwxrwx 1 root root   21 Jun 17  2010 /tools/android-sdk -> /tools/android-sdk-r8
drwxr-xr-x 8 root root 4096 Aug  9 09:29 /tools/android-sdk-r13
drwxr-xr-x 5 root root 4096 Jun 21  2010 /tools/android-sdk-r8

[cltbld@try-linux-slave06 ~]$ rpm -qa | grep jdk | sort
jdk1.5-1.5.0_10-0moz1
jdk1.6-1.6.0_17-0moz1
[cltbld@try-linux-slave06 ~]$ ls -ld /tools/jdk*
lrwxrwxrwx  1 root   root     19 Jul 10  2009 /tools/jdk -> /tools/jdk-1.5.0_10
drwxr-xr-x  9 root   root   4096 Nov  9  2006 /tools/jdk-1.5.0_10
drwxr-xr-x 10 cltbld cltbld 4096 Mar 19  2010 /tools/jdk-1.6.0_17
lrwxrwxrwx  1 root   root     19 May 18  2010 /tools/jdk6 -> /tools/jdk-1.6.0_17
[cltbld@try-linux-slave06 ~]$ rpm -qa | grep android | sort
android-ndk5-r5c-0moz3
android-ndk-r4c-0moz3
android-ndk-r5c-0moz1
android-sdk13-r13-0moz1
android-sdk-r8-0moz3
[cltbld@try-linux-slave06 ~]$ ls -ld /tools/android*
lrwxrwxrwx 1 root root   22 Jun 17  2010 /tools/android-ndk -> /tools/android-ndk-r4c
drwxr-xr-x 6 root root 4096 Jul  2  2010 /tools/android-ndk-r4c
lrwxrwxrwx 1 root root   22 Jul 14 14:18 /tools/android-ndk-r5 -> /tools/android-ndk-r5c
drwxr-xr-x 9 root root 4096 Jul 19 23:51 /tools/android-ndk-r5c
lrwxrwxrwx 1 root root   21 Jun 17  2010 /tools/android-sdk -> /tools/android-sdk-r8
drwxr-xr-x 8 root root 4096 Aug  9 11:38 /tools/android-sdk-r13
drwxr-xr-x 5 root root 4096 Jun 21  2010 /tools/android-sdk-r8

Chris: would the extra android-ndk (r5) on the try-linux-* slaves be responsible for what you're seeing?

I really need to look at two build logs (one pass, one fail) to find out what's going on, though. I can start mining the changesets you've provided to try to find matching build logs, but if you already have links to them, that would be appreciated.
Whiteboard: [mobile][try]
> Chris: would the extra android-ndk (r5) on the try-linux-* slaves be responsible for what you're seeing?

I don't think the android-ndk-r5 -> android-ndk-r5c symlink causes any problems.
:coop, I've attached two build logs: a successful build (from a linux-ix-slave) and a failed build (from a try-linux-slave).

The logs show that the same mozconfig is downloaded (mobile/android/config/mozconfigs/android/nightly) and the cat'd contents are the same.
(In reply to Chris Peterson (:cpeterson) from comment #12)
> :coop, I've attached two build logs: a successful build (from a
> linux-ix-slave) and a failed build (from a try-linux-slave).
> 
> The logs show that the same mozconfig is downloaded
> (mobile/android/config/mozconfigs/android/nightly) and the cat'd contents
> are the same.

OK, I've pulled try-linux-slave28 (the slave used in the BAD log) and am going to do some testing to find out what the java delta is between this machine and a GOOD machine.
Status: NEW → ASSIGNED
Priority: P3 → P2
(In reply to Chris Cooper [:coop] from comment #13)
> OK, I've pulled try-linux-slave28 (the slave used in the BAD log) and am
> going to do some testing to find out what the java delta is between this
> machine and a GOOD machine.

hg has been unavailable due to the downtime, but I did do a quick comparison of the jdk dirs between mv-moz2-linux-ix-slave23(GOOD) and try-linux-slave28(BAD):

[cltbld@mv-moz2-linux-ix-slave23 jdk]$ rsync -e ssh -nav /tools/jdk-1.5.0_10 try-linux-slave28.build.mozilla.org:/tools
cltbld@try-linux-slave28.build.mozilla.org's password: 
building file list ... done
jdk-1.5.0_10/jre/lib/
jdk-1.5.0_10/jre/lib/charsets.jar
jdk-1.5.0_10/jre/lib/deploy.jar
jdk-1.5.0_10/jre/lib/javaws.jar
jdk-1.5.0_10/jre/lib/jsse.jar
jdk-1.5.0_10/jre/lib/plugin.jar
jdk-1.5.0_10/jre/lib/rt.jar
jdk-1.5.0_10/jre/lib/ext/
jdk-1.5.0_10/jre/lib/ext/localedata.jar
jdk-1.5.0_10/jre/lib/i386/client/
jdk-1.5.0_10/jre/lib/i386/client/classes.jsa
jdk-1.5.0_10/lib/
jdk-1.5.0_10/lib/tools.jar
I add "java -version" to the Fennec makefile to log the exact JDK version installed on the builders. I ran a few try builds until I hit a try-linux-slave and a linux-ix-slave. They report the same JDK versions installed, so my build errors (on try-linux-slave) must be a different problem.

* slave: try-linux-slave13 (MY BUILD FAILED)
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)

* slave: linux-ix-slave07 (MY BUILD SUCCEEDED)
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)
(In reply to Chris Peterson (:cpeterson) from comment #15)
> I add "java -version" to the Fennec makefile to log the exact JDK version
> installed on the builders. I ran a few try builds until I hit a
> try-linux-slave and a linux-ix-slave. They report the same JDK versions
> installed, so my build errors (on try-linux-slave) must be a different
> problem.

cpeterson: I don't know much java myself. How about I set aside one known GOOD slave and one known BAD slave and let you poke around at them directly?
Sounds good.

btw, you mentioned earlier that the try-linux-* VMs would be phased out. What is the time frame of that phase-out? It may make this investigation unnecessary. <:)
(In reply to Chris Peterson (:cpeterson) from comment #17) 
> btw, you mentioned earlier that the try-linux-* VMs would be phased out.
> What is that time frame of the phase out? That may make this investigation
> unnecessary. <:)

Not soon enough to be relevant here, sadly.
Depends on: 719810
(In reply to Chris Cooper [:coop] from comment #16)
> cpeterson: I don't know much java myself. How about I set aside one known
> GOOD slave and one known BAD slave and let you poke around at them directly?

I've set aside the following slaves for you:
* try-linux-slave28 (BAD)
* linux-ix-slave07  (GOOD)

I already had them pulled for investigation, so that seemed easiest.

I'll send you connection details out-of-band.

Please re-assign the bug back to me once you've completed your investigation and I'll repatriate the slaves.
Assignee: coop → cpeterson
:coop, I am done with the try-linux-slave28 and linux-ix-slave07 builders. Please feel free to reimage them.


The ProGuard errors "went away" when I changed its -libraryjars classpath from "$(ANDROID_SDK)" to "$(ANDROID_SDK)/android.jar". ProGuard's documentation says both directory and .jar -libraryjars classpaths should work, but for some reason specifying only the directory causes the try-linux-slave builders to barf. I don't know why, but I have a reasonable workaround.
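
To illustrate the workaround as a standalone ProGuard invocation (a sketch only; the real build drives ProGuard from the Fennec makefiles, and the config file name here is hypothetical):

ANDROID_SDK=/tools/android-sdk-r13/platforms/android-13

# Failing form on try-linux-slave*: -libraryjars pointed at the SDK platform directory
#   java -jar proguard.jar @proguard.cfg -libraryjars "$ANDROID_SDK"

# Working form on both slave classes: name android.jar explicitly
java -jar proguard.jar @proguard.cfg -libraryjars "$ANDROID_SDK/android.jar"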


After poking around the try-linux-slave28 and linux-ix-slave07 builders, the only difference I can see between their JDK environments is that linux-ix-slave07's /tools/jdk6/bin/java is a "Server VM", but its /tools/jdk6/bin/javac invokes a "Client VM" java.

* try-linux-slave28 (BAD) Java versions:

$ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)

$ javac -J-version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)


* linux-ix-slave07 (GOOD) Java versions:

$ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)

$ javac -J-version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Client VM (build 14.3-b01, mixed mode, sharing)  <-- javac invokes "Client VM" java when default java is "Server VM"??
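
One way to chase that Client-vs-Server difference further (a sketch; paths assume the /tools/jdk6 layout shown above) would be to compare the launchers' VM configuration and force each VM explicitly:

# jvm.cfg lists the VMs the launcher can choose from; the default for 'java'
# also depends on HotSpot's server-class machine detection, which may explain
# why the hardware slave picks the Server VM:
cat /tools/jdk6/jre/lib/i386/jvm.cfg

# Force each VM for a one-off comparison:
/tools/jdk6/bin/java -server -version
/tools/jdk6/bin/java -client -version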
Assignee: cpeterson → coop
Blocks: 721395
Blocks: 721396
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering