Closed Bug 15399 Opened 20 years ago Closed 20 years ago

XPTCInvoke not implemented for linuxppc

Categories

(Core :: JavaScript Engine, defect, P3, critical)

PowerPC
Linux
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: b.judd, Assigned: shaver)

Details

Overview Description: mozilla crashes on startup with the error "Assertion
failure: 0 == rv, at ptsynch.c:168" or "Assertion failure: lock != NULL, at
ptsynch.c:166" or "Segmentation Fault" (The exact error message seems to depend
on the phase of the moon :)). This occurs with the latest (2/2/10) build of
Seamonkey on a Mac running linuxppc R5. This is a repeatable crash (it happens
every time).

Steps to reproduce: cd dist/bin; ./mozilla-apprunner

Actual results: the application crashes, stdout follows

./mozilla-apprunner.sh
MOZILLA_FIVE_HOME=/home/gwyn/sources/mozilla/dist/bin

LD_LIBRARY_PATH=/home/gwyn/sources/mozilla/dist/bin:/usr/local/qt/lib:/usr/local/qt/lib:/usr/local/qt/lib
      MOZ_PROGRAM=./apprunner
      MOZ_TOOLKIT=
        moz_debug=0
     moz_debugger=
nsNativeComponentLoader: autoregistering
/home/gwyn/sources/mozilla/dist/bin/components
*** Registering sample components
*** Registering net components
*** Registering about: components
*** Registering data: components
*** Registering file: components
*** Registering resource: components
*** Registering ftp: components
*** Registering http: components
*** Registering uconv components
*** Registering locale components
*** Registering lwbrk components
*** Registering CharDet components
*** Registering JSComponentLoader components
*** Register XPConnect test components
*** Registering pref components
*** Register libjar
*** Registering Security
*** Registering GTK timer
*** Registering GFX Postscript
*** Registering UnixToolkitService
*** Registering layout components
*** Registering rdf components
Registering DOM Viewer
*** XPInstall is being registered
nsBrowserInstance registration successful
nsFindComponent registration successful
nsUnknownContentTypeHandler registration successful
nsStreamTransfer registration successful
Registering the PrefsWindow
nsNativeComponentLoader: autoregistering succeeded
nsUnixToolkitService: Using 'gtk' for the Toolkit.
NS_SetupRegistry() MOZ_TOOLKIT=gtk, WIDGET_DLL=libwidget_gtk.so,
GFX_DLL=libgfx_gtk.so
started appcores
GFX: dpi=100 t2p=0.0694444 p2t=14.4 depth=16
Using '/home/gwyn/sources/mozilla/dist/bin' as the resource: base
initialized appshell
Got the event queue from the service
Calling gdk_input_add with event queue
Assertion failure: 0 == rv, at ptsynch.c:168
.//run-mozilla.sh: line 29: 11950 Aborted                 $prog ${1+"$@"}

Expected Results: ummm, probably the application should start running :)

Build Date and Platform: CVS source from today and yesterday (it happens with
both); today's date is 1 October 1999.
I can't test this configuration, Jan/Don... Can the reporter
please try this again on a current build?
Okay, I tried it again with today's build... same result :(
Some more info: if I run the viewer application it will display a window for a
few seconds, but generally not long enough to tell what is in the window. The
apprunner application will not get that far, however.
Assignee: don → pavlov
pav,

Looks like the call to "pthread_mutex_lock" is failing.  Any clue as to who
should get this bug?
Assignee: pavlov → beard
beard, you have a linux ppc box.
Status: NEW → ASSIGNED
I can't build on my linuxppc box. If you'd be so kind as to point me to a build,
I'll give it a whirl.
**First bug report from me**

I have this problem as well, on a Debian/PPC (potato) machine. First, is it safe to build Mozilla
with gcc 2.95.2? Just so that possibility is eliminated...

I can also do a little better than "the phase of the moon" for which error I see.
I see the segfault if running unprivileged (i.e., it dies when trying to write the component registry, I think).
I see "Assertion failure: 0 == rv, at ptsynch.c:168" if running as root, after registering components.
I've never seen the last one.

The behavior is similar between M10 and CVS from a few nights ago.

I've trashed it (disk space), but if beard still needs a build, it's simple enough to get back.
Would you prefer a build of M10 or a snapshot build?
Well, if it comes to that, I have a reasonably current build... My machine is
limited (only 64M RAM), so it takes about 4-5 hours to build completely, but if
someone wants access to this machine then let me know.
More Info... the new (M11) stuff gave me a more helpful error message. Sort of...
This is long, if there's somewhere other than the comments I should have put it, please say so.
I got this message from Mozilla itself now :

GFX: dpi=96 t2p=0.0666667 p2t=15 depth=32
Using '/usr/src/mozilla/dist/bin' as the resource: base
Got the event queue from the service
Calling gdk_input_add with event queue
Assertion: "nsXPTCStubBase::Stub called on unsupported platform" (0) at file ../../../../../../dist/include/xptcstubsdef.inc, line 5
Break: at file ../../../../../../dist/include/xptcstubsdef.inc, line 5
Assertion: "unexpected OnProgress failure" (NS_SUCCEEDED(rv)) at file nsFileTransport.cpp, line 815
Assertion: "nsXPTCStubBase::Stub called on unsupported platform" (0) at file ../../../../../../dist/include/xptcstubsdef.inc, line 6
Break: at file ../../../../../../dist/include/xptcstubsdef.inc, line 6
Assertion failure: 0 == rv, at ptsynch.c:168
.//run-mozilla.sh: line 29: 19255 Aborted                 $prog ${1+"$@"}

but there was more detail here:

./TestXPTCInvoke

        FooImpl::FooMethod2 called with i == 2, FooImpl part of a FooBarImpl
invoke calls:
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31
Calling Bar...
direct calls:
        BarImpl::BarMethod1 called with i == 1, BarImpl part of a FooBarImpl
        BarImpl::BarMethod2 called with i == 2, BarImpl part of a FooBarImpl
invoke calls:
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31
impl == 10016830
foo  == 10016830
bar  == 10016834
Calling Foo...
direct calls:
        FooBarImpl2::FooMethod1 called with i == 1, local value = 12345678
        FooBarImpl2::FooMethod2 called with i == 2, local value = 12345678
invoke calls:
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31
Calling Bar...
direct calls:
        FooBarImpl2::BarMethod1 called with i == 1, local value = 12345678
        FooBarImpl2::BarMethod2 called with i == 2, local value = 12345678
invoke calls:
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31
Assertion: "XPTC_InvokeByIndex called on unsupported platform" (0) at file xptcinvoke_unsupported.cpp, line 31
Break: at file xptcinvoke_unsupported.cpp, line 31

This is NOT quite the same behavior as M10 - M10 simply didn't work, and all it gave
was the assertion 0==rv failure. I don't know enough to know if these errors are related, but they
seem likely to be. Having looked into the XPTC stuff, I don't think I have enough experience
to proceed - I don't know assembler :-(. I'm debating the eevvviiillll hack of trying the AIX stuff;
I think the ABIs are similar, but I'll bet it doesn't work.

I have a slightly working M9 build (packaged by Debian). It runs only w/ root privileges, and
the UI doesn't actually respond at all - things highlight, but nothing happens. It does, however,
start up. When did this new XPTCInvoke stuff go in? I know Mozilla used to run on LinuxPPC a
while back, and this M9 shows more promise than anything I've built myself. I don't see any
patches in the source .deb, so maybe I should try building M9 myself, just as a reference.
Is there any progress on this? I'm back from Thanksgiving and have my machine
again (and a big new hard drive, so keeping code around is no longer a problem).
Anyone who needs access to a Linux/PPC box to work on this, I'll set up an
account.

I'm thinking maybe I should beat the debian-powerpc mailing list a bit and see
if I can stir up someone who knows PPC assembly and the ABI to help do this.
But I'd like to know that this isn't receiving active work before I try to
recruit som
is this a Linux that runs on Mac?  If so, we only support Redhat Linux 6.0 and
above.
QA Contact: leger → mcafee
Assignee: beard → shaver
Status: ASSIGNED → NEW
QA Contact: mcafee → jband
Summary: Browser Crash on startup with linuxppc R5 → xptcall on Linux/PPC appears to be broken
Don't mind Jan, I suspect she's thinking about Netscape again.  Mozilla supports
all platforms that have people willing to work on them, and as far as I know
there are a few people working on Linux/PPC support.  Pop into #mozilla and look
for ``annex''; he's one of the linuxppc.com guys doing so.

I'll take this bug, at least until the xptcall stuff is fixed.  jband gets to be
QA!  Stamped it!  No erasies!
QA Contact: jband → mcafee
Summary: xptcall on Linux/PPC appears to be broken → Browser Crash on startup with linuxppc R5
Yes, this is Linux on a PowerMac G3. I'm aware that it doesn't currently work
:-) (See above - OS: Linux, Platform: Macintosh)


LinuxPPC R5 is based off RH6 (I'm on Debian/PPC (potato), which is of course
based on debian potato, but we're nowhere near distro issues). Both use glibc
2.1, which was the problem with older Redhats (to my knowledge). I think this
_should_ work except for one thing - the XPTCInvoke stuff is not implemented for
this platform. As far as I know this is the only problem, though I can't see for
sure until I can get past this one. I have a powerpc debian package of M9 that
starts, though it doesn't work right (buttons in the UI do nothing). I don't
know whether this is because M9 is less dependent on XPTCInvoke or what the deal
is.

I don't know assembly language, so I can't write this needed implementation
myself - but I can go look for help (I can't be the only person who wants
mozilla!). Before I went searching for outside help, though, I wanted to make
sure that I wasn't going to be duplicating effort.
Actually, Netscape 4.6 and 4.7 work fine :-)

So there are people actively working on getting XPTCInvoke to work now? To my
knowledge, it's been broken ever since the code that used it originally went
in, since there is no implementation of it at all. (BTW, what compelled you to
write it in assembly? I don't know for sure how the system works, but it would
seem there should be some more portable way to code things. But I can't read
asm, so I guess I don't know what it does anyway). Mozilla on MacOS is now solid
enough to be my main browser (can finally post to slashdot, yay!) so I want it
on Linux/PPC more than I used to.
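On the question of why assembly: `XPTC_InvokeByIndex` has to call an arbitrary method on an object when the parameter list is only known at run time. The sketch below is not Mozilla code; `ICalc`, `Calc` and `InvokeByIndexSketch` are hypothetical names. It shows what a portable C++ version ends up looking like: every method index and signature has to be spelled out at compile time, and the index here is a position in a hand-written switch, not a real vtable slot. A general implementation has to build the call frame (registers and stack) itself according to the platform ABI, which is presumably why xptcall needs a per-platform assembly port.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an XPCOM interface whose methods are
// reached by index rather than by name.
struct ICalc {
    virtual ~ICalc() = default;
    virtual std::int32_t AddTwo(std::int32_t a, std::int32_t b) = 0;          // index 0
    virtual std::int32_t MultThree(std::int32_t a, std::int32_t b,
                                   std::int32_t c) = 0;                        // index 1
};

struct Calc : ICalc {
    std::int32_t AddTwo(std::int32_t a, std::int32_t b) override { return a + b; }
    std::int32_t MultThree(std::int32_t a, std::int32_t b, std::int32_t c) override {
        return a * b * c;
    }
};

// "Invoke by index" in portable C++: each (index, signature) pair must be
// written out by hand. Returns false for an unknown index or a parameter
// count that does not match the method's signature.
bool InvokeByIndexSketch(ICalc *obj, std::size_t index, const std::int32_t *params,
                         std::size_t count, std::int32_t *result) {
    switch (index) {
    case 0:
        if (count != 2) return false;
        *result = obj->AddTwo(params[0], params[1]);
        return true;
    case 1:
        if (count != 3) return false;
        *result = obj->MultThree(params[0], params[1], params[2]);
        return true;
    default:
        return false;  // roughly the "called on unsupported platform" case
    }
}
```

Usage: with `Calc calc;` and `std::int32_t p[] = {2, 3, 4};`, invoking index 1 with three parameters stores 24 in the result, while an unknown index simply returns false.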
Component: Browser-General → Javascript Engine
Summary: Browser Crash on startup with linuxppc R5 → XPTCInvoke not implemented for linuxppc
Adding rogerl, changing component.
Adding jdunn, who stole the MacPPC code for AIX,
maybe he can give us a hint here.
Somewhere in Oct, the xptcall became mandatory for running
'viewer' & apprunner/mozilla-bin so I am guessing someone
turned on the ASSERT if it wasn't implemented.

What is the OS_ARCH & OS_TEST for linuxppc?
I would copy the mac *.cpp & *.s from md/mac into md/unix
renaming them to
    xptcinvoke_ppc_linux.cpp, xptcinvoke_asm_ppc_linux.s
    xptcstubs_ppc_linux.cpp, xptcstubs_asm_ppc_linux.s

Add the appropriate ifdef to the cpp's (ifdef linux_ppc)

Edit xpcom/reflect/xptcall/md/unix/Makefile.in
and add the $(OS_ARCH)$(OS_TEST),LinuxPPC stanza
that has the above 4 files.

This should get you started.  Get the cpp's and .s's
to build and try out the TestXPTCInvoke test.  Then
once you get that running try the xpconnect JS one.
$(OS_ARCH)=Linux
$(OS_TEST)=ppc
:-)

But neither the Mac nor the AIX assembly will compile (different mnemonics;
I assumed it wouldn't).
okay I managed to get the XPTCInvoke stuff working with some help from beard but
I still get the (0==rv) error. It looks like the call to pthread_mutex_lock is
failing with "Invalid Argument" (EINVAL). Maybe the mutex is getting overwritten
from somewhere else?
Cool!
Well, M9 (pre-built and packaged) starts up, though nothing in the UI responds at all (presumably
the missing invoke functionality...). So if you have any idea where in the code (which files?) the
0==rv error might originate, the diffs between M9 (which started up, though uselessly) and M10
(which died with the 0==rv error, since the assert wasn't on yet) might be enlightening.
Or maybe the changes are too big to sift through and it would be better to just debug as if there
was no reference. I have no feel for the extent of differences between milestones...

I did a little superficial thinking - I preprocessed the ptsynch.c file from both versions,
diff'ed them, and grepped for various things (mutex, 0 == rv) on the diff. I didn't see
much of anything changed (line numbers, little else) but... it might be somewhere for a
person who actually understands what's going on in there to start.

BTW, how/when is the invoke code going to appear (so I can get it built and be ready to debug
the next crash). Attached here, checked in, something different? I don't know what the standard
procedure is for such things.
As far as I know the XPTCInvoke changes have been checked in. I didn't do the
ones that were checked in but the ones I came up with were virtually identical
to the ones that were checked in so either they are right or we're all wrong :)

As for the other question it's probably (in my thinking) not something that is
changed in ptsynch.c per se but maybe something else is using it wrong. The
"EINVAL" error is one that crops up when the mutex is not properly initialised
or (possibly) if some random piece of code writes over it. I am thinking the
latter because if the call to PR_NewLock failed somehow it would be obvious.
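For reference, here is a minimal C++ sketch of the kind of check that trips at ptsynch.c:168. It assumes (this bug does not confirm it) that NSPR's lock code calls `pthread_mutex_lock` and asserts `0 == rv` on the result. Note that pthread functions return the error number (such as EINVAL) directly rather than setting `errno`. `checked_lock` and `relock_errorcheck_mutex` are hypothetical names. An overwritten mutex is undefined behaviour and will not reliably produce EINVAL, so the demo instead uses an error-checking mutex, where relocking from the owning thread is a defined failure (EDEADLK).

```cpp
#include <cassert>
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <pthread.h>

// Lock a mutex and report the raw return code, i.e. the value an
// "assert(0 == rv)" check would see. pthread functions return an error
// number directly; they do not set errno.
int checked_lock(pthread_mutex_t *m) {
    int rv = pthread_mutex_lock(m);
    if (rv != 0) {
        std::fprintf(stderr, "pthread_mutex_lock failed: %s (rv=%d)\n",
                     std::strerror(rv), rv);
    }
    return rv;
}

// Relocking an error-checking mutex from the thread that already holds it
// is a defined failure (EDEADLK), giving a reproducible non-zero rv
// without relying on a corrupted mutex.
int relock_errorcheck_mutex() {
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);

    pthread_mutex_t m;
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    int first = checked_lock(&m);   // succeeds and returns 0
    int second = checked_lock(&m);  // fails and returns EDEADLK
    if (first == 0) {
        pthread_mutex_unlock(&m);
    }
    pthread_mutex_destroy(&m);
    return second;
}
```

Printing `strerror(rv)` next to the assertion, as `checked_lock` does, is one way to tell an EINVAL (bad or clobbered mutex) apart from other failures without a debugger.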
OK, bonsai shows the new files now. Wonder why I couldn't find it last night...
oh well.

If it's getting overwritten by stray code, that strikes me as an odd bug for it
to be platform specific. How very odd - I wonder if it's more a compiler issue.
What are people using on the 'supported' Linux boxes in tinderbox? gcc-2.8/2.91/2.95,
egcs (prior to merge), or what?

Do you have any hunch on where to start looking for the code that's overwriting
this? I'll have some time this weekend to search, if I can figure out some way
more efficient than single-stepping through mozilla :-). Is there a tool
(debugger command or otherwise) that will let me control, or at least log,
accesses to that memory location? I'm still learning the ropes for gdb, and
missing my Metrowerks CodeWarrior debugger...
Hmm... is this xptcinvoke code supposed to be semi-functional? Or is it
just in, and incomplete? I'm not even making it far enough to look
for the thread problem - TestXPTCInvoke segfaults on the first attempt
to call by index, and when stepping in DDD the reason is pretty clear - the
pointer that arrives in InvokeTestTarget::MultTwoLLs for the return value is completely
bogus, bearing no resemblance to the one from the caller. In the case I'm
currently looking at, 0x7ffffaa8 becomes 0xff99b84 (included just in case you
recognize this as some munging). It's remarkably consistent: I've seen the same
two pointers every single run through, which also strikes me as odd.

Another interesting thing (probably not relevant, though) is that 0xff99b84 is very close
to RAM + swap - free RAM in the box at the moment, within ~100k, which, given that the
machine is playing MP3s, is as close as it could be :-). I don't know if that's significant,
and I also don't know an accurate enough way to determine free RAM to know if this is
exactly the case (free changes the number in the act of reporting it). It could be coincidence,
but it's an odd pointer for the system to have invented on its own.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Fixed, according to the 12/13 status report and the xptcall porting status page.
Hmm... last I saw, there was code such that TestXPTCInvoke passed, but TestXPC
would still fail in various ways. I will try to grab tonight's nightly source drop and build
it to verify this as fixed.

Also, I haven't been able to get a CVS update that is consistent (i.e., builds) tonight. The
update runs for a long time (hours, rrr...), occasionally waiting to get a lock from 'cvsuser'
(presumably this means someone changed the repository during my download?). Is there
some right way to do this besides 'cvs -z3 update -d' with the local entries set to HEAD?
Probably a FAQ from silly people like me... or maybe I'm just unlucky.

If I'm just being dumb, feel free to tell me so via private email...
Good god, man.

Blow away your tree; cvs update -d has given you the _entire_ mozilla.org source
repository.  You want:

$ cd mozilla/..
$ cvs -d `cat mozilla/CVS/Root` -z3 co -P SeaMonkeyAll
I can confirm that TestXPTCInvoke now runs successfully, err at least it doesn't
segfault, I'm not sure if the output is correct. Also I don't get the 0==rv
error anymore :). Unfortunately it still doesn't run successfully, when I run
mozilla-viewer I get the following output (clipped to the last few lines for
brevity):

nsNativeComponentLoader: autoregistering succeeded
nsUnixToolkitService: Using 'gtk' for the Widget Toolkit.
nsUnixToolkitService: Using 'gtk' for the Gfx Toolkit.
NS_SetupRegistry() MOZ_TOOLKIT=gtk, WIDGET_DLL=libwidget_gtk.so,
GFX_DLL=libgfx_gtk.so
Going to create the event queue
GFX: dpi=96 t2p=0.0666667 p2t=15 depth=15
Using '/home/sources/mozilla/dist/bin' as the resource: base
Got the event queue from the service
Calling gdk_input_add with event queue
PreCondition: "bad param" (0) at file nsInterfaceInfo.cpp, line 127
Break: at file nsInterfaceInfo.cpp, line 127
Assertion: "no interface info" (info) at file xptcstubs_ppc_linux.cpp, line 56
Break: at file xptcstubs_ppc_linux.cpp, line 56
.//run-mozilla.sh: line 29: 31691 Segmentation fault      $prog ${1+"$@"}

Can anyone else confirm this? Could be my tree is horked as well, I don't know
much about cvs.
Does TestXPC run also?
TestXPTCInvoke is a very simple test that doesn't really test a lot
(sort of see if you are going in the right direction).  Passing
TestXPC is the tough one.
No go looks like:

[gwyn@thislove]$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:. ./TestXPC
Warning: MOZILLA_FIVE_HOME not set.
nsNativeComponentLoader: autoregistering begins.
nsNativeComponentLoader: autoregistering succeeded
XPC_GetXPConnect() returned NULL!
I get the following from TestXPC (I can also confirm TestXPTCInvoke as passing). This is from
this morning's CVS, about 9:00 I think...

kevin@puetz:/usr/local/src/mozilla/dist/bin $ ./TestXPC
nsNativeComponentLoader: autoregistering begins.
nsNativeComponentLoader: autoregistering succeeded
###!!! ASSERTION: bad param: '0', file nsInterfaceInfo.cpp, line 131
###!!! Break: at file nsInterfaceInfo.cpp, line 131
###!!! ASSERTION: no interface info: 'info', file xptcstubs_ppc_linux.cpp, line 56
###!!! Break: at file xptcstubs_ppc_linux.cpp, line 56
Segmentation fault

mozilla and mozilla-viewer produce the same crash. So it looks like invoke works and stubs
doesn't?

The first assertion seems to suggest that an out-of-bounds index is being requested - but I
can't get ddd to let me put a breakpoint on nsInterfaceInfo::GetMethodInfo, so I'm not sure
what's going on here. The Assertion failure and segfault are happening towards the end of
nsComponentManager::AutoRegister, after it's printed
> nsNativeComponentLoader: autoregistering begins.
> nsNativeComponentLoader: autoregistering succeeded
but before it returns. Again, for some reason (probably the dynamic load?), I can't set
breakpoints or single-step into this function (at least I don't know how to). Is there some way to
follow this? stepi (step instruction) seems to work, but that takes me into libc and malloc and all
kinds of places I'm not smart enough to be :-(

Adding a line to nsInterfaceInfo::GetMethodInfo in xpcom/reflect/xptinfo/src/nsInterfaceInfo.cpp
cerr << "crapped-up index: " << index << " >= " << mMethodBaseIndex << " + " <<
    mMethodCount << endl;
and the #include <iostream.h> (there are probably some NS debug macros for this, but I'm too lazy to find out)
above the assertion, I get the following:
> nsNativeComponentLoader: autoregistering succeeded
> crapped-up index: 13635 >= 3 + 4
> ###!!! ASSERTION: bad param: '0', file nsInterfaceInfo.cpp, line 134

I've also seen 21827, though, so I think it's picking up garbage, not mangling a value. They all give
the same value if I run more than one instance of TestXPC concurrently (or nearly so -
./TestXPC & ./TestXPC & ./TestXPC &). Hmm, maybe not garbage per se - I just noticed
another pattern: I get 13635 if it's run in the foreground relative to its shell and 21827 if it's
in the background. Investigating as I write this...
./TestXPC ^Z bg, sh -c "./TestXPC" and sh -c "./TestXPC" & give 13635
./TestXPC & fg, sh -c "./TestXPC &" sh -c "./TestXPC &" & give 21827
nohup ./TestXPC and nohup ./TestXPC & give 13603
nohup sh -c "..." give as if they weren't nohup'ed.
redirection doesn't change anything that I can tell on either input or output

That's an odd pattern, unless it's a fluke. Maybe it will mean something to someone. It's pretty
weird that it cares what its fg/bg status is.
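The "bad param" precondition is a range check, and the debug line's output shows how far off the index is. Below is a hypothetical model of that check. `ClassifyMethodIndex` and `MethodLookup` are illustrative names, and the rule that lower indices belong to the parent interface is inferred from the field names `mMethodBaseIndex` and `mMethodCount` rather than taken from nsInterfaceInfo.cpp. With a base index of 3 and a count of 4, only indices 3 to 6 belong to the interface, so 13635 or 21827 looks like a value read from the wrong place rather than an off-by-one.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the range check behind the "bad param" precondition.
// An interface owns method indices [base, base + count); lower indices belong
// to its parent interface, and anything at or above base + count is invalid.
enum class MethodLookup { Inherited, Own, OutOfRange };

MethodLookup ClassifyMethodIndex(std::uint32_t index, std::uint32_t methodBaseIndex,
                                 std::uint32_t methodCount) {
    if (index < methodBaseIndex) {
        return MethodLookup::Inherited;   // would be forwarded to the parent
    }
    if (index >= methodBaseIndex + methodCount) {
        return MethodLookup::OutOfRange;  // the "bad param" case
    }
    return MethodLookup::Own;
}
```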
Not the same numbers for index anymore. The numbers are still consistent across runs,
and still have those 3 states, but they aren't the same numbers anymore.
I have just got this running on AIX (ppc) and it took a bit.
What I found was that I wasn't getting the correct address for
the parameters from the 'Stubs' function, through to SharedStub
and then passed on to Dispatch.

To debug this, I narrowed it down to which 'stubs' call was eventually
causing the problem (for me it was Stubs9), so I put a break in there
and, once there, put a break in SharedStub.

What I found was that in SharedStub I wasn't pointing r5 to the
correct parameters (parameter 8 thru x).  I had been using the
Mac code and found that AIX was a bit different.

it might be helpful to put a printf in ::Stubn and just
print out 'n'
printf("::Stubn %d\n", n);
just to narrow it down.
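A hypothetical model of the job described above for SharedStub: gather the arguments that arrived in registers and the ones the caller left on the stack into one array for Dispatch. It assumes the 32-bit PowerPC convention of passing the first eight integer words (`this` plus seven arguments, r3 through r10) in registers and the rest on the stack. It also assumes that whether the stack parameter area reserves slots shadowing the register arguments differs between the AIX and Linux (SVR4) conventions; check the ABI supplements before relying on that. Floating-point arguments are ignored, and `GatherArgs` is an illustrative name, not the xptcall routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: merge register-passed and stack-passed argument words
// into one contiguous array.
//   gprArgs       - argument words that arrived in registers (after `this`)
//   nGpr          - how many argument words fit in registers
//   stackArea     - start of the caller's parameter area on the stack
//   reservedWords - words at the start of that area that shadow register
//                   arguments (ABI dependent; a wrong value here is the
//                   "pointing r5 at the wrong parameters" kind of bug)
//   total         - total number of argument words
std::vector<std::uint32_t> GatherArgs(const std::uint32_t *gprArgs, std::size_t nGpr,
                                      const std::uint32_t *stackArea,
                                      std::size_t reservedWords, std::size_t total) {
    std::vector<std::uint32_t> args;
    args.reserve(total);
    for (std::size_t i = 0; i < total; ++i) {
        if (i < nGpr) {
            args.push_back(gprArgs[i]);  // came in a register
        } else {
            // Overflow arguments start after any reserved shadow slots.
            args.push_back(stackArea[reservedWords + (i - nGpr)]);
        }
    }
    return args;
}
```

With nine argument words and seven register slots, the eighth and ninth words come from the stack, and reading them at the wrong offset silently yields other stack contents, which matches the garbage-index symptom seen earlier in this bug.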
To the best of our knowledge internally, this is fixed, and I am marking it
Verified. Please reopen if you disagree.
Status: RESOLVED → VERIFIED