Closed Bug 516858 Opened 10 years ago Closed 10 years ago

c-central + m-central MacOSX builds fail to compile after m-c changeset 32506 : 9c3a70ea7acf

Categories

(MailNews Core :: Build Config, defect, critical)

x86
macOS
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 3.0rc1

People

(Reporter: sgautherie, Assigned: standard8)

Attachments

(4 files)

http://hg.mozilla.org/mozilla-central/rev/9c3a70ea7acf
{
9c3a70ea7acf
2009-09-15 15:58 -0400
Josh Aas - Breakpad PPC bustage fix. r=ted
}

*****

{
http://tinderbox.mozilla.org/showlog.cgi?log=SeaMonkey/1253051783.1253055323.22710.gz
OS X 10.5 comm-central-trunk build on 2009/09/15 14:56:23
[...]
/builds/slave/comm-central-trunk-macosx/build/mozilla/toolkit/crashreporter/google-breakpad/src/client/mac/handler/minidump_generator.cc:392: error: 'struct ppc_thread_state' has no member named '__r31'
/builds/slave/comm-central-trunk-macosx/build/mozilla/toolkit/crashreporter/google-breakpad/src/client/mac/handler/minidump_generator.cc:396: error: 'struct ppc_thread_state' has no member named '__mq'
make[8]: *** [minidump_generator.o] Error 1
}

"Same" with
http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird/1253046599.1253047478.2650.gz
MacOSX 10.5 comm-central build on 2009/09/15 13:29:59
My bet would be that we haven't ported the configure changes from http://hg.mozilla.org/mozilla-central/rev/a4e2df0a6af5

Hence comm-central is configuring for a 10.4 build while mozilla-central wants 10.5, and I expect there is quite a mismatch there.

Did I see a patch around somewhere that will add a MOZILLA_1_9_2_BRANCH define?
My bustage fix there unfortunately makes Breakpad not compile for PPC with the 10.4 SDK. You can switch your m-c builds to the 10.5 SDK (you can do so in the mozconfig, or include the universal mozconfig from m-c, which sets it) which will fix this bustage. We're not planning on supporting 10.4 on m-c anymore, so you might hit other bustage anyway (like when we switch to Core Text).
Well for Thunderbird's mozconfigs we include the universal one:

http://hg.mozilla.org/build/buildbot-configs/file/8612f756fead/thunderbird/macosx/mozconfig

So I'm pretty sure this is just the fact our configure is doing something to mozilla-central's.
Ah. Might be a mismatch with --enable-macos-target. Try adding:
ac_add_options --enable-macos-target=10.5
to the mozconfig. That's the default in m-c's configure, but your configure is probably still setting 10.4.
Yes, I think we should just match mozilla-central and build with 10.5 as a minimum for non-1.9.1 builds. We can revisit 1.9.2 later if we decide to go with it, but we might just go with 1.9.3 in any case.
(In reply to comment #5)
> Yes, I think we should just go and match mozilla-central and build with 10.5 as
> a minimum for non-1.9.1 builds. We possibly can revisit 1.9.2 later if we might
> decide to go with it, but we might just go with 1.9.3 in any case.

I was thinking that I'd seen one of Serge's patches with a MOZILLA_1_9_2_BRANCH definition (although I could easily write that), hence we could easily detect our configuration based on the branch we're building with.
(In reply to comment #1)
> Did I see a patch around somewhere that will add a MOZILLA_1_9_2_BRANCH define?

Yes: bug 516195.
Depends on: 516195
Depends on: 501436
Assignee: nobody → bugzilla
Assignee: bugzilla → nobody
Severity: blocker → critical
Component: Breakpad Integration → Build Config
Product: Toolkit → MailNews Core
QA Contact: breakpad.integration → build-config
Target Milestone: --- → Thunderbird 3.0rc1
This should fix the bustage - I'm currently rebuilding and testing the patch, however it is basically a port of the configure.in part of the patch on bug 501436.
Assignee: nobody → bugzilla
Status: NEW → ASSIGNED
Attachment #401289 - Flags: review?(gozer)
Comment on attachment 401289 [details] [diff] [review]
The fix
[Checkin: Comment 10]

Looks safe to me, so +1. But I have to say, this bit makes me cringe a little:

if test "$MOZILLA_1_9_1_BRANCH$MOZILLA_1_9_2_BRANCH" = "1"; then

as opposed to actually testing for what is being tested. if $191 or $192; then

I understand they can never both be true, but still...
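
A minimal sketch of the more explicit form the review comment asks for, not the code that was checked in. The function name pick_sdk and its two arguments are hypothetical stand-ins for the MOZILLA_1_9_1_BRANCH and MOZILLA_1_9_2_BRANCH variables, and the 10.5 path is the one corrected later in this bug:

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: choose the SDK by testing
# each branch flag on its own instead of comparing their concatenation
# to "1".
pick_sdk() {
    branch_191=$1   # stands in for $MOZILLA_1_9_1_BRANCH
    branch_192=$2   # stands in for $MOZILLA_1_9_2_BRANCH
    if test -n "$branch_191" || test -n "$branch_192"; then
        # 1.9.1 and 1.9.2 still support 10.4.
        echo "/Developer/SDKs/MacOSX10.4u.sdk"
    else
        # 1.9.3 and later require 10.5.
        echo "/Developer/SDKs/MacOSX10.5.sdk"
    fi
}

pick_sdk 1 ""    # 1.9.1 branch
pick_sdk "" 1    # 1.9.2 branch
pick_sdk "" ""   # trunk
```

Note that test -n treats any non-empty value as set, which is slightly looser than the original comparison against "1".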
Attachment #401289 - Flags: review?(gozer) → review+
Ok, time for an update. I checked in the patch and a couple of bustage fixes:

http://hg.mozilla.org/comm-central/rev/8897768028ba
http://hg.mozilla.org/comm-central/rev/c10f2b5bb6c3
http://hg.mozilla.org/comm-central/rev/96649d27f85a

The bustage fixes should fix us for bustage from bug 516213.

However we're still broken on trunk - for some reason I think the build is timing out/crashing when we're doing the ppc part of the build, possibly in the LDAP code. No idea as to why yet.
I've been able to reproduce on one of the nightly builders.

It's when it's running:

gcc-4.2 -arch ppc -o ufn.o -c -gdwarf-2 -Wmost -fno-common -isysroot /Developer/SDKs/MacOSX10.5.sdk -pthread -O -UDEBUG -DMOZILLA_CLIENT=1 -DNDEBUG=1 -DXP_UNIX=1 -DDARWIN=1 -DHAVE_BSD_FLOCK=1 -Dppc=1 -DHAVE_LCHOWN=1 -DHAVE_STRERROR=1 -DHAVE_GETADDRINFO=1 -DHAVE_GETNAMEINFO=1 -DFORCE_PR_LOG -D_PR_PTHREADS -UHAVE_CVAR_BUILT_ON_SEM -DUSE_WAITPID -DNEEDPROTOS -DNET_SSL -DNO_LIBLCACHE -DLDAP_REFERRALS -DNS_DOMESTIC -UMOZILLA_CLIENT -DUSE_PTHREADS -I/Volumes/Build/comm-central-trunk-macosx-nightly/build/objdir-tb/ppc/mozilla/dist/public/ldap -I/Volumes/Build/comm-central-trunk-macosx-nightly/build/directory/c-sdk/ldap/include -I/Volumes/Build/comm-central-trunk-macosx-nightly/build/objdir-tb/ppc/mozilla/dist/./public /Volumes/Build/comm-central-trunk-macosx-nightly/build/directory/c-sdk/ldap/libraries/libldap/ufn.c

It just sits there. I've been able to re-trigger by running that compilation line by itself, and gcc just sits there. No CPU, no RAM being used, and dtruss shows absolutely no activity, so it must be stuck in gcc's c-land
0x942ce791 in __wait4 ()
(gdb) bt
#0  0x942ce791 in __wait4 ()
#1  0x942ce787 in waitpid$UNIX2003 ()
#2 ...

So for some reason, it's stuck waiting for a process to come back, but it never will, since it doesn't exist.
Managed to re-run gcc with dtruss, and here is the output. Unfortunately, OS X's dtrace doesn't ship a pid provider, so the observability into gcc is pretty much null.
Interestingly, gcc-4.0 has no issues with the file

momo-vm-osx-leopard-05:tmp cltbld$ gcc-4.0 -arch ppc -c -o ufn.o -gdwarf-2 -Wmost -fno-common -isysroot /Developer/SDKs/MacOSX10.5.sdk -pthread -O ufn.c

Works just fine

momo-vm-osx-leopard-05:tmp cltbld$ gcc-4.2 -arch ppc -c -o ufn.o -gdwarf-2 -Wmost -fno-common -isysroot /Developer/SDKs/MacOSX10.5.sdk -pthread -O ufn.c

Gets stuck. I tried the same on my Snow Leopard MacBook Pro, and got the same thing: gcc-4.0 okay, gcc-4.2 gets stuck.
Now that I can repro on my own box, I'll try and shrink ufn.c down some more.
(In reply to comment #14)
> gcc-4.0 okay, gcc-4.2 gets stuck.

Since the day I started to build my own Thunderbird I have only used Apple's gcc-4.2 for my builds. I don't know which version you or Mozilla are using, but I also had some build problems with older versions of Apple's gcc-4.2. The first gcc-4.2 version with no problems was gcc version 4.2.1 "Apple Inc. build 5566" (included in Xcode 3.1.2 Developer Tools). So updating the version of gcc-4.2 could be a possibility.
gozer: do you have XCode 3.1 installed? That's what our build slaves are using.
$ gcc --version
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5484)
$ gcc-4.2  --version
i686-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5564)

$ grep 'Xcode version' /Developer/Applications/Xcode.app/Contents/Info.plist
        <string>Xcode version 3.1</string>

Interesting, maybe we need to update to Xcode 3.1.2? https://wiki.mozilla.org/ReferencePlatforms/Mac-10.5 seems to indicate the OS X refplatform is at Xcode 3.1, just like mine. Ted?
It's also the case that our builds are busted in code that Firefox doesn't compile, we break in LDAP after all.
Ben could comment on exactly what version of XCode is installed. It is possible that there's a compiler problem with that LDAP code, sure. If you have a spare machine, you might try updating to the absolute latest XCode to see if that works. I know Josh mentioned that XCode 3.1.3 did contain some bug fixes in gcc 4.2.
We've got:
bm-xserve17:~ cltbld$ gcc --version 
i686-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5484)
Copyright (C) 2005 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

bm-xserve17:~ cltbld$ gcc-4.2 --version
i686-apple-darwin9-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5564)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

bm-xserve17:~ cltbld$ cat /Developer/Applications/Xcode.app/Contents/version.plist 
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
	<key>BuildVersion</key>
	<string>1</string>
	<key>CFBundleShortVersionString</key>
	<string>3.1</string>
	<key>CFBundleVersion</key>
	<string>1099</string>
	<key>ProjectName</key>
	<string>DevToolsIDE</string>
	<key>SourceVersion</key>
	<string>10990000</string>
</dict>
</plist>
Also, I can reproduce this bustage with Xcode 3.2 on my Snow Leopard box

i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5646)

So definitely has the smell of a gcc bug

Ben, can you try and compile attachment 401461 [details] on one of your build boxes? It's preprocessed, so it requires nothing besides gcc.

This should hang
$> gcc-4.2 -arch ppc -c -o ufn.o -gdwarf-2 -O1 ufn.c 

But not this
$> gcc-4.2 -arch ppc -c -o ufn.o -gdwarf-2 -O2 ufn.c 
or this
$> gcc-4.2 -arch ppc -c -o ufn.o -g -O1 ufn.c
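
When checking commands like these on a build box, a hang can be turned into an ordinary failure with a small watchdog. This is only a sketch: run_with_timeout is a hypothetical helper that does not appear anywhere in this bug, and the 60 second limit in the usage comment is an arbitrary assumption.

```shell
#!/bin/sh
# run_with_timeout LIMIT COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and sends it
# SIGTERM after LIMIT seconds, so a compiler that never returns fails
# the step instead of stalling it.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    cmd_pid=$!
    # Watchdog: once the limit expires, terminate the command if it is
    # still alive. Output is discarded so it never holds a pipe open.
    ( sleep "$limit"; kill "$cmd_pid" 2>/dev/null ) >/dev/null 2>&1 &
    watchdog_pid=$!
    # wait yields the command's exit status (non-zero if it was killed).
    wait "$cmd_pid"
    status=$?
    # Stop the watchdog so it cannot later signal a reused pid.
    kill "$watchdog_pid" 2>/dev/null
    return "$status"
}

# Example use with the reduced test case from this bug:
# run_with_timeout 60 gcc-4.2 -arch ppc -c -o ufn.o -gdwarf-2 -O1 ufn.c \
#     || echo "gcc-4.2 failed or was killed after 60s"
```

The helper only signals the process it started, so any child processes of that command are left to exit on their own.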
(In reply to comment #22)
> This should hang
> $> gcc-4.2 -arch ppc -c -o ufn.o -gdwarf-2 -O1 ufn.c 

It does

> 
> But not this
> $> gcc-4.2 -arch ppc -c -o ufn.o -gdwarf-2 -O2 ufn.c 

It doesn't


> or this
> $> gcc-4.2 -arch ppc -c -o ufn.o -g -O1 ufn.c

This one hangs
FYI, I've tested it today and on my Mac (Intel iMac) I can build TB 3.1a1pre with 10.5 SDK, gcc-4.2 and "-arch ppc" without any problems...
I suspect we could work around this issue by just having the LDAP sdk compiled with -O0, for instance.
I am going to try and spin a nightly with

--enable-optimize=-O2

to try and work around that issue, let's see what happens.
Duplicate of this bug: 518296
Any hope of at least a temporary workaround to get builds (even without ldap) again?
(In reply to comment #28)
> Any hope of at least a temporary workaround to get builds (even without ldap)
> again?

gozer is going to write up what tests he's done when he gets time. I'm then going to look at getting a fix into the LDAP code base.

All of which shouldn't really take long, but a b4 release and string freeze has just delayed it.

I'm not too concerned yet, as the Windows & Linux builds and all the test boxes are still running, so we have reasonable coverage there.
I believe I narrowed it down to a single line of code, but it's certainly very strange.

in ldap_ufn_expand

   if (( msgid = ldap_search( ld, dn, scope, filter, attrs,
       aonly )) == -1 ) {
    ldap_msgfree( tmpcand );
    *err = ldap_get_lderrno( ld, ((void *)0), ((void *)0) );
    return( ((void *)0) ); /* XXX */

That last return is causing the gcc hang for me. Commenting it out makes the bug disappear. Changing it is more interesting: return 1; works just fine, gcc is happy. Anything else that the optimizer can resolve to 0 seems to cause problems. Variants I've tried:

return 1; //WORKS
return 2; //WORKS
return 0; //HANGS
return i; //WORKS
return i-i; //HANGS
return 2-2; //HANGS

Absolutely not sure *why* the compiler is doing this, but definitely something tripping up the optimizer somehow.
Blocks: 520401
Given the original fix for this bug was to build with 10.5 not 10.4, and the issue we have now is ldap specific, I've spun the LDAP issue off into bug 520401.

Therefore I'm closing this bug as fixed, even though the builds won't work yet.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Depends on: 516213
(In reply to comment #10)

> http://hg.mozilla.org/comm-central/rev/c10f2b5bb6c3
> http://hg.mozilla.org/comm-central/rev/96649d27f85a
> 
> The bustage fixes should fix us for bustage from bug 516213.

Mark, should SeaMonkey copy these additional/WebGL changes?
Comment on attachment 401289 [details] [diff] [review]
The fix
[Checkin: Comment 10]

>diff --git a/configure.in b/configure.in
>@@ -383,8 +383,14 @@ if test -n "$CROSS_COMPILE" && test "$ta
>+        dnl 1.9.1 and 1.9.2 support 10.4, 1.9.3 and later don't.
>+        if test "$MOZILLA_1_9_1_BRANCH$MOZILLA_1_9_2_BRANCH" = "1"; then
>         CFLAGS="-isysroot /Developer/SDKs/MacOSX10.4u.sdk $CFLAGS"
>         CXXFLAGS="-isysroot /Developer/SDKs/MacOSX10.4u.sdk $CXXFLAGS"
>+        else
>+        CFLAGS="-isysroot /Developer/SDKs/MacOSX10.5u.sdk $CFLAGS"
>+        CXXFLAGS="-isysroot /Developer/SDKs/MacOSX10.5u.sdk $CXXFLAGS"

Previously, m-c and c-c used MacOSX10.4u.sdk.
Is it expected that they now use MacOSX10.5.sdk and MacOSX10.5u.sdk respectively?
(In reply to comment #32)
> Mark, should SeaMonkey copy these additional/WebGL changes?

If SeaMonkey is compiling fine without then it doesn't need it.

(In reply to comment #33)
> Previously, m-c and c-c used MacOSX10.4u.sdk.
> Is it expected that they now use MacOSX10.5.sdk and MacOSX10.5u.sdk
> respectively?

Oh yes, that's wrong. It might explain some of the problems we've been having as well. I'll attach a patch in a bit.
s/MacOSX10.5u.sdk/MacOSX10.5.sdk/ - could explain why our non-universal tinderboxes had a bit of trouble.
Attachment #405243 - Flags: review?(gozer)
Comment on attachment 405243 [details] [diff] [review]
[checked in] The fix

(In reply to comment #35)
> Created an attachment (id=405243) [details]
> The fix
> 
> s/MacOSX10.5u.sdk/MacOSX10.5.sdk/ - could explain why our non-universal
> tinderboxes had a bit of trouble.

Definitely, there is no such thing as MacOSX10.5u.sdk, it's MacOSX10.5.sdk. Does this also make the previous mozconfig change unnecessary?
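
A typo like this can be caught early by checking that the SDK directory exists before it is passed to -isysroot. The following is a rough sketch rather than anything from the checked-in patch, and check_sdk is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: report whether an SDK directory exists, so a
# misspelled path is flagged before it reaches the compiler flags.
check_sdk() {
    sdk_path=$1
    if test -d "$sdk_path"; then
        echo "found: $sdk_path"
    else
        echo "missing: $sdk_path"
        return 1
    fi
}

# Example use with the two paths discussed in this bug:
# check_sdk /Developer/SDKs/MacOSX10.5.sdk
# check_sdk /Developer/SDKs/MacOSX10.5u.sdk
```

A non-zero return lets a configure-style script stop with a clear message instead of failing later in the compile.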
Attachment #405243 - Flags: review?(gozer) → review+
Comment on attachment 405243 [details] [diff] [review]
[checked in] The fix

a=Standard8: minor configure fix to pick up the correct sdk version rather than one that doesn't exist.
Attachment #405243 - Flags: approval-thunderbird3+
Comment on attachment 405243 [details] [diff] [review]
[checked in] The fix

Checked in:

http://hg.mozilla.org/comm-central/rev/2330bc790d88

I've also backed out just the unit test change where we added --with-macosx-sdk to the mozconfig:

http://hg.mozilla.org/build/buildbot-configs/rev/3ef4faf32076

and clobbered the trunk unit test boxes.

If the builds still pass then I'll do the same to the bloat boxes.
Attachment #405243 - Attachment description: The fix → [checked in] The fix
Blocks: 522028
(In reply to comment #38)
> I've also backed out just the unit test change where we added --with-macosx-sdk
> to the mozconfig:
> 
> http://hg.mozilla.org/build/buildbot-configs/rev/3ef4faf32076
> 
> and clobbered the trunk unit test boxes.
> 
> If the builds still pass then I'll do the same to the bloat boxes.

The configure.in fix wasn't enough, so I've backed out (well, put back in) the mozconfig change:

http://hg.mozilla.org/build/buildbot-configs/rev/388a3e541e8f

I've raised bug 522028 for actually figuring out what we're getting wrong that these builds need the --with-macosx-sdk option.
Attachment #405243 - Attachment description: [checked in] The fix → The fix [Checkin: Comment 38]
Attachment #401289 - Attachment description: The fix → The fix [Checkin: Comment 10]
Attachment #405243 - Attachment description: The fix [Checkin: Comment 38] → [checked in] The fix
(In reply to comment #34)
> (In reply to comment #32)
> > Mark, should SeaMonkey copy these additional/WebGL changes?
> 
> If SeaMonkey is compiling fine without then it doesn't need it.

Eventually, the need for a SeaMonkey port became bug 523562 :-/
Blocks: 523562
Flags: in-litmus-
Flags: in-litmus- → in-testsuite-
Depends on: 492089