Closed Bug 455512 Opened 16 years ago Closed 16 years ago

dom_events.xpt differences between ppc and i386 architectures break builds

Categories

(Calendar :: Lightning Only, defect)

PowerPC
macOS
defect
Not set
blocker

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: gozer, Assigned: ted)

Attachments

(5 files, 1 obsolete file)

Since we started building against comm-central, OS X builds have failed to complete because of this build error:

[comm-central]/mozilla/build/macosx/universal/unify: copyIfIdentical: files differ:
objdir-tb/ppc/mozilla/dist/thunderbird/Shredder.app/Contents/MacOS/components/dom_events.xpt
objdir-tb/i386/mozilla/dist/thunderbird/Shredder.app/Contents/MacOS/components/dom_events.xpt
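
For readers unfamiliar with the universal build step, the failing check can be pictured roughly as follows. This is only a minimal Python sketch of the idea; the function name `copy_if_identical` and the `ValueError` are assumptions made for illustration, and this is not the code of the unify script itself.

```python
import filecmp
import shutil
from pathlib import Path


def copy_if_identical(ppc_file: Path, i386_file: Path, dest: Path) -> None:
    """Copy a file into the universal tree only if both arch copies match.

    Hypothetical stand-in for the check that the unify step reports on.
    """
    # shallow=False forces a byte-by-byte comparison instead of trusting
    # file size and modification time alone.
    if not filecmp.cmp(ppc_file, i386_file, shallow=False):
        raise ValueError(f"files differ: {ppc_file} vs {i386_file}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(ppc_file, dest)
```

Architecture-independent files such as .xpt typelibs are expected to pass this kind of check, which is why a one-entry difference in dom_events.xpt is enough to stop the whole build.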

Attached is a diff of the xpt_dump of both architectures, and it seems the nsIVariant uid is missing on i386.
Setting tb-integration flag to raise visibility.  I'd suggest boosting priority/severity as well, but I don't know the calendar rules.
Severity: normal → major
Flags: tb-integration+
cc'ing Ted in case he has any ideas.
bsmedberg suggests that we replace an "interface nsIVariant;" with a '#include "nsIVariant.idl"'. Looking in the DOM events directory, there are only two IDL files that reference nsIVariant, and one includes the IDL file, so I think you'd want to change the one here:

http://mxr.mozilla.org/mozilla-central/source/dom/public/idl/events/nsIDOMDataTransfer.idl#40
Being curious: Since the IDL files are the same for both builds, why do the resulting xpt files differ? Can anybody please explain why the include helps here? Thanks!
It's even worse: last time I checked, the individual .xpt files were identical, but after running xpt_link the results differed.
That's a great question, and I have no idea. Could be some oddity in glib, who knows. You're certainly welcome to try debugging libIDL on both platforms, or we can just see if this quick hack works, and wash our hands of it.
I've finally gotten around to trying what's suggested in comment 3, and it worked, pushing the problem down to another file:

layout_xul_tree.xpt and the symbol nsIDOMElement. xpt_dump diff is:

--- /tmp/layout_xul_tree.i386.dump      2008-09-17 22:23:29.000000000 -0700
+++ /tmp/layout_xul_tree.ppc.dump       2008-09-17 22:23:12.000000000 -0700
@@ -8,6 +8,8 @@
 Interface Directory:
    - ::nsIAtom (00000000-0000-0000-0000-000000000000):
       [Unresolved]
+   - ::nsIDOMElement (00000000-0000-0000-0000-000000000000):
+      [Unresolved]
    - ::nsIScriptableRegion (00000000-0000-0000-0000-000000000000):
       [Unresolved]
    - ::nsISupports (00000000-0000-0000-c000-000000000046):
@@ -121,8 +123,6 @@
          No Constants
    - ::nsISupportsArray (791eafa0-b9e6-11d1-8031-006008159b5a):
       [Unresolved]
-   - ::nsIDOMElement (a6cf9078-15b3-11d2-932e-00805f8add32):
-      [Unresolved]
    - ::nsITreeSelection (ab6fe746-300b-4ab4-abb9-1c0e3977874c):
       Parent: ::nsISupports
       Flags:
I wonder why Thunderbird and Firefox builds don't have that issue. Don't they do universal builds? Or is there maybe just a tinderbox setting missing, or is an update to the calendar makefiles required?
Of course they do Universal builds. I can't think of any settings that would cause this issue, unless you have a different version of glib/libIDL on the build machine.
cb-xserve03 (calendar build box):
 - OS X 10.4.7
 - libidl @0.8.6_0 (active)
 - glib2 @2.8.6_0 (active)

bm-xserve07 (thunderbird build box):
 - OS X 10.4.8
 - libidl @0.8.6_0 (active)
 - glib2 @2.10.3_0 (active)

Could the glib difference be the source of the problem?
It's possible, sure.
It would be good to track this back a little further: are the files in dom/public/idl/events/_xpidlgen/nsI*.xpt identical? If so, it's not xpidl that's being difficult, but xpt_link. Can we make sure we're using the host (x86) version of xpt_link on both halves of the compile process?
I could potentially try upgrading glib2. Thoughts?

I've also compared _xpidlgen/

$> diff -ru ppc/mozilla/dom/public/idl/events/_xpidlgen i386/mozilla/dom/public/idl/events/_xpidlgen
$>

So no differences there.
Wait... these two don't differ?
ppc/mozilla/dom/public/idl/events/dom_events.xpt
i386/mozilla/dom/public/idl/events/dom_events.xpt

But these do?
ppc/mozilla/dist/thunderbird/Shredder.app/Contents/MacOS/components/dom_events.xpt
i386/mozilla/dist/thunderbird/Shredder.app/Contents/MacOS/components/dom_events.xpt
No, the dom_events.xpt files do differ; the diff I ran covered only the contents of the _xpidlgen subdirectory.
I re-ran the diff after an unpatched build, and the result is that the dom_events.xpt files always differ between platforms, but consistently so:

MD5 (./i386/mozilla/dist/bin/components/dom_events.xpt) = def9b8501ac8d95891d541071bb5cd0d
MD5 (./i386/mozilla/dist/Shredder.app/Contents/MacOS/components/dom_events.xpt) = def9b8501ac8d95891d541071bb5cd0d
MD5 (./i386/mozilla/dist/thunderbird/Shredder.app/Contents/MacOS/components/dom_events.xpt) = def9b8501ac8d95891d541071bb5cd0d
MD5 (./i386/mozilla/dom/public/idl/events/_xpidlgen/dom_events.xpt) = def9b8501ac8d95891d541071bb5cd0d
MD5 (./ppc/mozilla/dist/bin/components/dom_events.xpt) = bb997fa40b902d3084f77fae0f8b2e7b
MD5 (./ppc/mozilla/dist/Shredder.app/Contents/MacOS/components/dom_events.xpt) = bb997fa40b902d3084f77fae0f8b2e7b
MD5 (./ppc/mozilla/dist/thunderbird/Shredder.app/Contents/MacOS/components/dom_events.xpt) = bb997fa40b902d3084f77fae0f8b2e7b
MD5 (./ppc/mozilla/dom/public/idl/events/_xpidlgen/dom_events.xpt) = bb997fa40b902d3084f77fae0f8b2e7b
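
A comparison like the MD5 listing above can also be scripted so that it is easy to repeat for other file patterns. The following is a minimal Python sketch under stated assumptions: the helper names `md5_of` and `differing_files` are hypothetical, and the two roots are assumed to be the per-architecture objdir trees.

```python
import hashlib
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file's contents."""
    return hashlib.md5(path.read_bytes()).hexdigest()


def differing_files(ppc_root: Path, i386_root: Path, pattern: str = "*.xpt") -> list:
    """Return relative paths that differ (or are missing) in the i386 tree."""
    differing = []
    for ppc_file in sorted(ppc_root.rglob(pattern)):
        rel = ppc_file.relative_to(ppc_root)
        i386_file = i386_root / rel
        # A file counts as differing if its counterpart is absent or
        # hashes to a different digest.
        if not i386_file.is_file() or md5_of(ppc_file) != md5_of(i386_file):
            differing.append(rel.as_posix())
    return differing
```

Called as `differing_files(Path("ppc/mozilla"), Path("i386/mozilla"), "nsI*.xpt")`, it would list only the per-interface typelibs that differ, which is one way to separate an xpidl problem from an xpt_link problem.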
Can we go back and re-try comparing the nsI*.xpt files in an unpatched build? I'd like to isolate the problem to either xpidl or xpt_link.
One note in advance: this machine is a ppc one. Maybe that's less tested.

Looking at the xpt_link binaries found in the output tree, I'm wondering why ./i386/mozilla/xpcom/typelib/xpt/tools/host_xpt_link and ./ppc/mozilla/xpcom/typelib/xpt/tools/xpt_link have different sizes. If the binaries are different, that might be a hint as to why they produce different output.
As far as I can tell from looking at the build logs, it looks like only the ppc xpt_link is ever invoked
Raising severity because this blocks development and testing for the entire Mac OS X platform.
Severity: major → blocker
(In reply to comment #18)
> As far as I can tell from looking at the build logs, it looks like only the ppc
> xpt_link is ever invoked

The build machine is a PPC machine? Is the Thunderbird box an x86 machine? I know the Firefox build machines are x86, so this could be the difference.
Correct, the Lightning box (cb-xserve03) is PPC, the Thunderbird box (bm-xserve07) is Intel. It's also 10.4.7 vs 10.4.8

Darwin cb-xserve03 8.7.0 Darwin Kernel Version 8.7.0: Fri May 26 15:20:53 PDT 2006; root:xnu-792.6.76.obj~1/RELEASE_PPC Power Macintosh powerpc
ProductName:    Mac OS X Server
ProductVersion: 10.4.7
BuildVersion:   8J135

Darwin bm-xserve07 8.8.4 Darwin Kernel Version 8.8.4: Sun Oct 29 15:26:54 PST 2006; root:xnu-792.16.4.obj~1/RELEASE_I386 i386 i386
ProductName:    Mac OS X Server
ProductVersion: 10.4.8
BuildVersion:   8N1215
As I noted in #build, apparently we use -O3 for some insane reason when building host tools:
http://mxr.mozilla.org/mozilla-central/source/configure.in#1558

gozer is trying a build with HOST_OPTIMIZE_FLAGS=-O2.
Attached patch: patch to test (obsolete)
This changes the sorting that xpt_link uses on the interfaces array to hopefully be more stable.
By debugging the main processing loop of xpt_link (http://mxr.mozilla.org/mozilla-central/source/xpcom/typelib/xpt/tools/xpt_link.c#369), I've been able to narrow the problem down somewhat further.

The initial sort of the elements (http://mxr.mozilla.org/mozilla-central/source/xpcom/typelib/xpt/tools/xpt_link.c#349) produces a slightly different order in the 2 versions of the program. nsIVariant has 2 entries in the list, one with IID 0000..., the other one with what looks like a defined value.

The order of these 2 elements is reversed depending on which xpt_link you end up running. Looking at the comparison (http://mxr.mozilla.org/mozilla-central/source/xpcom/typelib/xpt/tools/xpt_link.c#658), it sorts first by name, and when the names are equal, it compares the name pointers (definitely a strange choice).

It turns out that, correctly or not, the pointer addresses of the name elements are indeed reproducibly different, so one element ends up sorted before the other depending on which xpt_link you call.
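
To illustrate why a pointer tie-break makes the result depend on the binary, here is a minimal Python sketch. It does not reproduce xpt_link.c: the `Entry` type, the `name_addr` field standing in for a C pointer value, and the non-zero IID are all made up for the example.

```python
from functools import cmp_to_key
from typing import NamedTuple


class Entry(NamedTuple):
    name: str
    iid: str
    name_addr: int  # stand-in for the address of the name string in C


def compare_by_name_then_address(a: Entry, b: Entry) -> int:
    """Order by name first; on equal names, fall back to the name address."""
    if a.name != b.name:
        return -1 if a.name < b.name else 1
    return (a.name_addr > b.name_addr) - (a.name_addr < b.name_addr)


def sort_entries(entries):
    return sorted(entries, key=cmp_to_key(compare_by_name_then_address))


ZERO_IID = "00000000-0000-0000-0000-000000000000"
RESOLVED_IID = "11111111-2222-3333-4444-555555555555"  # made-up placeholder

# The same two nsIVariant entries, but with the name strings allocated
# at swapped addresses, as might happen in two different binaries.
run_a = [Entry("nsIVariant", ZERO_IID, 0x1000),
         Entry("nsIVariant", RESOLVED_IID, 0x2000)]
run_b = [Entry("nsIVariant", ZERO_IID, 0x2000),
         Entry("nsIVariant", RESOLVED_IID, 0x1000)]

first_a = sort_entries(run_a)[0].iid
first_b = sort_entries(run_b)[0].iid
```

In this sketch `first_a` and `first_b` end up being different IIDs even though both runs were given the same logical input, so any later step that keeps only the first of two same-named entries would produce different output files.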

Then, somewhere along the main loop (I haven't found where quite yet), one of these 2 instances is removed in favor of the other, leaving only one to make it into the generated dom_events.xpt.

I've only had time to look at this first candidate that might be deleting it (http://mxr.mozilla.org/mozilla-central/source/xpcom/typelib/xpt/tools/xpt_link.c#378) and it isn't it.

More debugging is needed.

Just as an experiment, I tried altering that ordering so that it does not depend on pointer values and instead falls back on a different property, such as the IID itself. That does end up picking the same nsIVariant, but the resulting dom_events.xpt files differ wildly, so something is certainly up with that particular ordering choice.

I suspect pointer ordering is not necessarily the intent, but that something depends on a side effect of it (initialization/creation order, for instance).
Attached patch: another test
This patch changes the behavior of xpt_link when it encounters unresolved interfaces with the same name. Previously, it would just pick one. This patch makes it double check to see if one of them has a non-zero IID, and prefer that one instead.
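
The preference rule described here can be pictured with a small Python sketch. It is a simplification under stated assumptions, not the patch itself: entries are reduced to hypothetical (name, IID) pairs, the function name `merge_duplicates` is invented, and the resolved/unresolved distinction is ignored.

```python
ZERO_IID = "00000000-0000-0000-0000-000000000000"


def merge_duplicates(entries):
    """Collapse same-named entries, preferring one with a non-zero IID.

    `entries` is a list of (name, iid) pairs. Each name keeps the
    position of its first appearance in the input.
    """
    chosen = {}
    for name, iid in entries:
        current = chosen.get(name)
        # Keep the first entry seen, unless it has an all-zero IID and
        # a later duplicate carries a real one.
        if current is None or (current == ZERO_IID and iid != ZERO_IID):
            chosen[name] = iid
    return list(chosen.items())
```

With a rule like this, the surviving entry no longer depends on which duplicate happened to sort first, which is the property the universal build needs.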
Attachment #340146 - Attachment is obsolete: true
Comment on attachment 340377
another test

Apologies for the diff, but my changes seem to confuse every 'diff -w' variant I tried here. A good chunk of it is just re-indentation. Pretty much everything from 
-            } else {
-                if (!IDE_array[i].interface_descriptor ||

down through

+            } else if (to_delete == i) {
+                /* Shrink the IDE_array to delete the duplicate interface.

until

+            } else {
+                /* XXX: error! */
Attachment #340377 - Flags: review?(benjamin)
Also, lightning/build/universal.mk and providers/gdata/universal.mk didn't take into account the slightly different objdir layout in comm-central:

dist/ vs. mozilla/dist/
Attachment #340594 - Flags: review?(kairo)
Attachment #340594 - Flags: review?(kairo) → review+
Comment on attachment 340594
[checked in] OBJDIR patch for comm-central

>diff -r d3427aaaa3f8 calendar/lightning/build/universal.mk
>--- a/calendar/lightning/build/universal.mk     Wed Sep 24 11:22:26 2008 +0200
>+++ b/calendar/lightning/build/universal.mk     Fri Sep 26 09:43:30 2008 -0700
>@@ -65,3 +65,4 @@
>                $(DIST_X86)/xpi-stage/lightning \
>                > $(DIST_UNI)/xpi-stage/lightning/install.rdf
>        cd $(DIST_UNI)/xpi-stage/lightning && $(ZIP) -qr ../lightning.xpi *
>+

What's the reason for this hunk?


The patch looks good to me, even though I think it sucks that calendar needs its own universal.mk stuff.
Comment on attachment 340594
[checked in] OBJDIR patch for comm-central

Checked in without the useless whitespace change, which I should have cleaned out before publishing.

http://hg.mozilla.org/comm-central/rev/441

changeset:   441:b865155974be
tag:         tip
user:        Philippe M. Chiasson <gozer@mozillamessaging.com>
date:        Fri Sep 26 13:01:56 2008 -0400
summary:     Bug 455512. Adjust lightning/gdata universal build to take into account the new directory layout in comm-central. r=KaiRo
Attachment #340594 - Attachment description: OBJDIR patch for comm-central → [checked in] OBJDIR patch for comm-central
Attachment #340377 - Flags: review?(benjamin) → review+
Looks like the checkin fixed macosx but broke the linux and win32 builds:

MacOSX 10.4 comm-central-calendar ltn build: ok
Linux comm-central-calendar ltn build: busted
Win2k3 comm-central-calendar ltn build: busted

The last entries in the linux and win32 log files are:
objdir-tb/mozilla/dist/universal/xpi-stage/lightning.xpi: No such file or directory
objdir-tb/mozilla/dist/universal/xpi-stage/gdata-provider.xpi: No such file or directory
Pushed:
http://hg.mozilla.org/mozilla-central/rev/a78f9bb9006e

Please file follow up bugs on any other issues.
Assignee: nobody → ted.mielczarek
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Pushed:
http://hg.mozilla.org/build/buildbot-configs/rev/28f776647d66

And all the nightly builders have turned green, so success.
Status: RESOLVED → VERIFIED
Target Milestone: --- → 1.0
Target Milestone: 1.0 → 1.0b1