Closed Bug 792085 Opened 7 years ago Closed 7 years ago

IonMonkey merge broke powerpc and sparc64

Categories

(Core :: JavaScript Engine, defect)

PowerPC
OpenBSD
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla19
Tracking Status
firefox17 --- unaffected
firefox18 --- fixed
firefox19 --- fixed

People

(Reporter: gaston, Assigned: gaston)

References

(Depends on 1 open bug)

Details

Attachments

(4 files, 4 obsolete files)

It seems the ionmonkey merge to m-c broke ppc and sparc64 (ie, non-methodjit ?) builds.

http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/408 is broken
http://buildbot.rhaalovely.net/builders/mozilla-central-sparc64/builds/407 was successful

pretty sure the new breakage is related to ionmonkey merge into js/src.

message is :
js/src/assembler/assembler/AssemblerBufferWithConstantPool.h:336:9: error: 'JaegerSpew' is not a member of 'js'

I'm afraid js/src now tries to use sparc v8 assembly (for solaris ?) even on sparc64/sparcv9. It was fine before.

The powerpc build failure is slightly different (my last m-c ppc build was on 31/7, so the failure is also probably ionmonkey-related)

js/src/assembler/assembler/MacroAssembler.h:62:2: error: #error "The MacroAssembler is not supported on this platform."

Looks like bug 670719 makes a comeback.
Don't realy know where to start... but now the ppc failure is the #error in MacroAssembler.h, which didnt change in ages, and all code in there is conditional to ENABLE_ASSEMBLER which is inconditionally set in js/src/Makefile.in. So my only guess would be that MacroAssembler.h is now included somewhere where it wasnt included previously on !ENABLE_JIT platforms. Could be the same breakage cause for sparc64.

Anyone can point me at a changeset for the IM->m-c merge so i can grep for it ? Or any better idea ?
These are the changesets for the IM->m-c merge (merge, then fix):

https://hg.mozilla.org/mozilla-central/rev/ca3fa3fbe62a
https://hg.mozilla.org/mozilla-central/rev/fdfaef738a00

PPC/Sparc64 should be able to build with IonMonkey disabled via --disable-ion. It's likely that some architecture-specific definitions leaked via includes, since we don't have PPC or Sparc machines on which to test. Fixing this should just be a matter of hunting down unnecessary inclusions and removing them, or adding more "#ifdef ION".
i've tried --disable-ion on PPC and the failure is the same. Will try on sparc64, but i guess i'll have to 'hunt down' those bad ionmonkey-only inclusions.

That said, i dont see ENABLE_ION being set or unset to a 'default sane value' on other than arm/x86/x64 (sparc v8 has it set to 0 but commented out). Wouldnt it be better (even if not fixing our current issue) to have it default to unset (even only for poor exotic/legacy archs) where we strive to get mozilla building for a longer timeframe ?
ENABLE_ION is only set for x86/x64/arm so it shouldn't be set for SPARC (as I read configure.in).
This WIP patch allows js/src to be built, sprinkling various #ifdef around (mostly JS_ION, but also a bunch of JS_METHODJIT around *LoopCount calls). Unfortunately the js shell built against that code segfaults, and trying to load it into gdb segfaults gdb. So i guess i'm doing smth wrong, but sending the diff anyway in case more eyes can look into that annoying issue.

note: i think the {,js/src/}configure.in chunk is unrelated, that was a gcc/llvm detection bug iirc and i think it's been fixed. dont remember the bug # atm
With that patch (and another one adding missing #include string.h under gfx/, to be filed) m-c builds but firefox segfaults at startup.
(In reply to Landry Breuil (:gaston) from comment #8)
> With that patch (and another one adding missing #include string.h under
> gfx/, to be filed) m-c builds but firefox segfaults at startup.

How about just the JS shell, in js/src? Build documentation is available at https://developer.mozilla.org/en-US/docs/SpiderMonkey/Build_Documentation
(In reply to Sean Stangl from comment #9)
> (In reply to Landry Breuil (:gaston) from comment #8)
> > With that patch (and another one adding missing #include string.h under
> > gfx/, to be filed) m-c builds but firefox segfaults at startup.
> 
> How about just the JS shell, in js/src? Build documentation is available at
> https://developer.mozilla.org/en-US/docs/SpiderMonkey/Build_Documentation

As i said in comment 7:

> This WIP patch allows js/src to be built, sprinkling various #ifdef around
> (mostly JS_ION, but also a bunch of JS_METHODJIT around *LoopCount calls).
> Unfortunately the js shell built against that code segfaults, and trying to
> load it into gdb segfaults gdb.
Updating patch, with this js compiles on my ppc, but is still broken at runtime, segfaults at startup, trying to build with debug syms to get a trace.

Any comment on the patch itself, if sprinkling various #if around is the way to go, or i should rather futze the ion headers to define stuff only on supported archs ?

This is getting annoying, i lost countless hours on getting the js engine on ppc unfucked in bug #691898, and the ionmonkey merge is basically giving me the finger after all those efforts.

Aurora is broken too on ppc atm. Do you guys want ffx 17 to be the last release working on powerpc, or are you please willing to help ?
Attachment #665539 - Attachment is obsolete: true
Attachment #676678 - Attachment is patch: true
(In reply to Landry Breuil (:gaston) from comment #11)
> Any comment on the patch itself, if sprinkling various #if around is the way
> to go, or i should rather futze the ion headers to define stuff only on
> supported archs ?

#ifdef seems reasonable - if it makes things easier, you could put the #ifdefs in the critical headers themselves. Making new dummy headers for unsupported platforms would be an option, but probably more work.

> This is getting annoying, i lost countless hours on getting the js engine on
> ppc unfucked in bug #691898, and the ionmonkey merge is basically giving me
> the finger after all those efforts.
> 
> Aurora is broken too on ppc atm. Do you guys want ffx 17 to be the last
> release working on powerpc, or are you please willing to help ?

I'm sorry to hear that it's been so much trouble. I don't really have any way of testing these builds, but I'm definitely willing to help diagnose what's going wrong. During IonMonkey development we ensured that --disable-ion as a build-time preference was working, but now that I think about it, we only tested this on platforms where the header files were supported.

But if you've gotten it building, it should be comparable to a --disable-ion build on x86/x64/ARM, and hopefully the startup crash is something like a NULL deref that just needs an #ifdef.
Debugging through gdb proves to be useless, even with a debug build :

[21:27] mikey:/usr/obj/m-c/js/src/ $egdb shell/js             
GNU gdb (GDB) 7.5
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "powerpc-unknown-openbsd5.2".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/obj/m-c/js/src/shell/js...DW_FORM_strp pointing outside of .debug_str section [in module /usr/obj/m-c/js/src/shell/js]
(gdb) r
Starting program: /usr/obj/m-c/js/src/shell/js 
Error while reading shared library symbols for ../../dist/bin/libnspr4.so.1.0:
Dwarf Error: wrong version in compilation unit header (is 0, should be 2, 3, or 4) [in module ../../dist/bin/libnspr4.so.1.0]

Program received signal SIGSEGV, Segmentation fault.
0x01b716e4 in ?? ()
(gdb) bt
#0  0x01b716e4 in ?? ()
#1  0x01b716d0 in ?? ()
#2  0x018663a8 in ?? ()
#3  0x018209a8 in ?? ()
#4  0x01813994 in ?? ()
#5  0x00000000 in ?? ()


Debugging with the standard gdb 6.3 provided by the basesystem doesnt even go there, it crashes right away.

This GDB was configured as "powerpc-unknown-openbsd5.2"...Segmentation fault (core dumped)
CC'ing Marty -- it turns out that he has a PPC machine and may be able to diagnose the crashing further near the end of the week.
I applied the earlier version of this patch, added some more JS_ION stuff,
got webrtc to compile and fixed the page size in the JS engine.

With all of this I am able to start up a ff18 build on my Linux 32 bit ppc machine.  It is working well enough to post this.

I haven't yet looked at Attachment #676678 [details] [diff] to see how it differs. 

The startup crash you are seeing is probably caused by the issue I fixed with in js/src/gc/Heap.h
I am not yet sure if 12 proper for ArenaShift or not but at least it works in a debug build
Hah, nice to know. Unfortunately PageShift is gone from mozilla-central so the same fix doesnt apply, but i'll test your patch on my mozilla-aurora checkout.
Aha! the def moved to js/public/HeapAPI.h !
Bingo! Replacing
#elif defined(__powerpc__)
by
#elif defined(__powerpc64__)   
in js/public/HeadAPI.h on top of att #676678, effectively setting PageShift back to 12, produces a working jsshell ! I'll build ffx with it, but i'm pretty sure that fixes the problem.

Apparently bug #746112 made the assumption that all powerpcs had a 64k pagesize, which might in fact be valid only for ppc64.

On OpenBSD/ppc (not ppc64!) we have :

hw.machine=macppc
hw.pagesize=4096
Depends on: 808282
So here's what i have now, basically previous patch adding #ifdef JS_ION around, and a bunch of #ifdef JS_METHODJIT on some resetloopcount() calls, which are mjit-only.
It also restores the 4k pagesize on ppc32, setting it to 64k on ppc64 - would like confirmation from billm on that chunk. Anyone with a ppc64 willing to test ?

That full patch produces a working js shell, ran a bunch of tests fine. I'm still fighting with libxul linking in bug 802282, but i have good hopes this produces a working firefox.
Assignee: general → landry
Attachment #676678 - Attachment is obsolete: true
Attachment #677997 - Flags: review?(dvander)
Attachment #677997 - Flags: feedback?(wmccloskey)
Equivalent patch for mozilla-aurora, in case this is the right direction. To be tested here, once m-c testing is okay.
Comment on attachment 677997 [details] [diff] [review]
Add a bunch of #ifdef JS_ION to let js build on ppc, restore 4k pages on ppc32

It looks like the page size is a configure-time option for PPC kernels. Eventually we might want to make this a configure-time option for SpiderMonkey.

We have assertion to check that the page size is correct, and it may be why you were crashing on startup. However, it should probably print something when it fails. Would you mind changing the InitMemorySubsystem implementations in js/src/gc/Memory.cpp so that they print a message before crashing? Maybe something like:
  "SpiderMonkey compiled with incorrect page size; please update js/public/HeapAPI.h."

Sorry for all the trouble this caused you.
Attachment #677997 - Flags: feedback?(wmccloskey) → feedback+
(In reply to Bill McCloskey (:billm) from comment #22)
> Comment on attachment 677997 [details] [diff] [review]
> Add a bunch of #ifdef JS_ION to let js build on ppc, restore 4k pages on
> ppc32
> 
> It looks like the page size is a configure-time option for PPC kernels.
> Eventually we might want to make this a configure-time option for
> SpiderMonkey.
> 
> We have assertion to check that the page size is correct, and it may be why
> you were crashing on startup. However, it should probably print something
> when it fails. Would you mind changing the InitMemorySubsystem
> implementations in js/src/gc/Memory.cpp so that they print a message before
> crashing? Maybe something like:
>   "SpiderMonkey compiled with incorrect page size; please update
> js/public/HeapAPI.h."

Yes, that's the place where it crashes . adding a debug printf and reverting the chunk that makes pagesize 32 on all ppcs indeeds shows that MOZ_CRASH is called there.

Do you mean i should append that debug printf to the diff, or it's just to confirm the hypothesis ?

   if (size_t(sysconf(_SC_PAGESIZE)) != PageSize) {
        printf("SpiderMonkey compiled with incorrect page size; please update js/public/HeapAPI.h.\n");
        MOZ_CRASH();
   }

Would do, or i shouldnt use printf directly ?
Yes, please add the printf to the diff. Or, better yet, use:
  fprintf(stderr, "...");
Same patch with fprintf(stderr); in case pagesize doesnt match the os value.
Attachment #677997 - Attachment is obsolete: true
Attachment #677997 - Flags: review?(dvander)
Attachment #678031 - Flags: review?(dvander)
I'm running fx 19.0a1 from m-c on openbsd/ppc(32), so the patch is definitely fixing things for that platform, and maybe other non-ionmonkey platforms. review ping ?
Comment on attachment 678031 [details] [diff] [review]
Add a bunch of #ifdef JS_ION to let js build on ppc, restore 4k pages on ppc32

Review of attachment 678031 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for doing this! r=me with the extra #ifdefs resolved

::: js/src/jsinterp.cpp
@@ +1125,3 @@
>      /* Reset the loop count on the script we're entering. */
>      script->resetLoopCount();
> +#endif

Why do these resetLoopCounts have to be #ifdef'd? It looks like a general function.
Attachment #678031 - Flags: review?(dvander) → review+
(In reply to David Anderson [:dvander] from comment #27)
> ::: js/src/jsinterp.cpp
> @@ +1125,3 @@
> >      /* Reset the loop count on the script we're entering. */
> >      script->resetLoopCount();
> > +#endif
> 
> Why do these resetLoopCounts have to be #ifdef'd? It looks like a general
> function.

Unless i'm mistaken, resetLoopCount() is defined only within #ifdef JS_METHODJIT in jsscript.h..
Ah, yup, you're right. It might be cleaner to move that out, and #ifdef its contents instead.
Well.. if we do that, for consistency we also need to do it for incrLoopCount(), and then what to do for getMaxLoopCount() called in isShortRunning() ? Move the 3 of them outside #ifdef JS_METHODJIT in jsscript.h, then add #ifdef JS_METHODJIT inside them ?
I'm not sure that'll simplify the whole code...
On a sidenote, aurora builds fine on OpenBSD/ppc with att #677998, but firefox segfaults at startup.
The js shell itself is fine, it's still running the jsapi tests. xpcshell segfaults too, standalone or when run by gmake package. aurora packages fine on OpenBSD/amd64, so this different issue looks to be only on ppc, and somehow related to js ?

Of course, this produces unusable coredumps which blows gdb. Should i look for another MOZ_CRASH() candidate for this segfault ? in xpcshell itself, js, or all mozilla/libxul ?
Pushed to try for extra safety, i'll take care of landing this on inbound once everything's green.

https://tbpl.mozilla.org/?tree=Try&rev=93d5344f8022

Anyone with a ppc could try the aurora patch and see if it works for them ? so that i know if the patches effectively fixes ppc and is a local openbsd issue, or if there's still smth broken.
All the jsapi tests pass fine on aurora/ppc, so the crash is located somewhere in xpcshell/firefox.
Keywords: checkin-needed
And it was a good idea, since a chunk burnt all platforms :

js/src/ion/Registers.h:23:8: error: macro names must be identifiers
#ifndef(JS_CPU_ARM)

i absolutely dont remember why this ended up in my diff. Will rebuild here without that chunk, then repush to try.
Ah, now i remember. MacroAssembler.h only can be included if JS_METHODJIT is defined otherwise build blows. So it should be :

-#ifndef JS_CPU_ARM
+#if !defined(JS_CPU_ARM) && defined(JS_METHODJIT)

https://tbpl.mozilla.org/?tree=Try&rev=b2af5ff4acbb has this version.
attachement #677998 builds and lets me start up firefox on aurora. My mchine is ppc32 debian linux. I don't get the crash on startup issue you are seeing.
(In reply to steve from comment #35)
> attachement #677998 builds and lets me start up firefox on aurora. My mchine
> is ppc32 debian linux. I don't get the crash on startup issue you are seeing.

Great (well, for you :) !

So i'll dig into this as a local OpenBSD issue; but the aurora patch should be good to go once the m-c patch has baked a while. I'll update it with latest JS_CPU_ARM change (and MOZ_CRASH fprintf's added) before resubmitting it.

All green on try, so pushed to inbound :
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d6ef2feaa93
https://hg.mozilla.org/mozilla-central/rev/5d6ef2feaa93
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
I finally managed to get a somewhat readable xpcshell coredump, but not with libxul symbols (rebuilding with debug, but i doubt it'll link) :

<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /usr/obj/m-a/js/xpconnect/shell/xpcshell...done.
(gdb) core dist/firefox/xpcshell.core 
[New process 11820]
Error while reading shared library symbols for /usr/obj/m-a/dist/bin/libnspr4.so.1.0:
Dwarf Error: wrong version in compilation unit header (is 0, should be 2, 3, or 4) [in module /usr/obj/m-a/dist/bin/libnspr4.so.1.0]
Core was generated by `xpcshell'.
Program terminated with signal 11, Segmentation fault.
#0  0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x91fc0d44 in ?? () from /usr/obj/m-a/dist/bin/libxul.so.1.0
#2  0x84c1d078 in ?? ()
#3  0x8ab227a4 in _dl_call_init_recurse (object=0x20cf, initfirst=0)
    at /usr/src/libexec/ld.so/loader.c:891
#4  0x8ab227a4 in _dl_call_init_recurse (object=0x8bfde0c8, initfirst=0)
    at /usr/src/libexec/ld.so/loader.c:891
#5  0x8ab22868 in _dl_call_init (object=0x8bfde0c8) at /usr/src/libexec/ld.so/loader.c:878
#6  0x8ab23a1c in _dl_boot (argv=0xfffcf32c, envp=<optimized out>, dyn_loff=-1968046080, 
    dl_data=0xfffcf2c8) at /usr/src/libexec/ld.so/loader.c:598
#7  0x8ab222e8 in _dl_start () at /usr/src/libexec/ld.so/powerpc/ldasm.S:110
Backtrace stopped: frame did not save the PC

(gdb) up
#1  0x91fc0d44 in ?? () from /usr/obj/m-a/dist/bin/libxul.so.1.0
(gdb) 
#2  0x84c1d078 in ?? ()
(gdb) 
#3  0x8ab227a4 in _dl_call_init_recurse (object=0x20cf, initfirst=0)
    at /usr/src/libexec/ld.so/loader.c:891
891                     _dl_call_init_recurse(n->data, initfirst);
(gdb) p n
$1 = (struct dep_node *) 0x93e18358 <c_fast_paths+2396>
(gdb) p *node
A syntax error in expression, near `'.
(gdb) p n->data
$2 = (elf_object_t *) 0x20030888
(gdb) p *(n->data)                                                                                    Cannot access memory at address 0x20030888


Dunno if the 'Dwarf error' is just a warning or not, but the problem seems to lie at the os level when ld.so tries load the binary and all its deps. And it if works on linux...
I'll try to compare the aurora and central build logs to see if there are differences at the linking level/scripts. mike, any idea on that point ?
(In reply to Landry Breuil (:gaston) from comment #38)
> the problem seems
> to lie at the os level when ld.so tries load the binary and all its deps.

Does --disable-elf-hack have any effect on this?
(In reply to John Schoenick [:johns] from comment #39)
> (In reply to Landry Breuil (:gaston) from comment #38)
> > the problem seems
> > to lie at the os level when ld.so tries load the binary and all its deps.
> 
> Does --disable-elf-hack have any effect on this?

elf-hack is only enabled on Linux, and only on x86, x86-64 and arm.
And here's the updated patch for aurora with the ifndef JS_CPU_ARM fixed + fprintf() moz_crash annotated. Still crashes for me on OpenBSD/ppc, but reported working fine on linux/ppc32. I'm considering a local OpenBSD toolchain issue.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): ionmonkey merge
User impact if declined: build failure on powerpc
Testing completed (on m-c, etc.): in progress
Risk to taking this patch (and alternatives if risky): NPOTB 
String or UUID changes made by this patch: none
Attachment #677998 - Attachment is obsolete: true
Attachment #679364 - Flags: review?(dvander)
Attachment #679364 - Flags: approval-mozilla-aurora?
Attachment #679364 - Flags: review?(dvander) → review+
Comment on attachment 679364 [details] [diff] [review]
Add a bunch of #ifdef JS_ION to let js build on ppc, restore 4k pages on ppc32 (aurora patch)

Approving since this is NPOTB and in support of community builds.
Attachment #679364 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.