Closed
Bug 151577
Opened 23 years ago
Closed 22 years ago
Mozilla (> 0.9.7) doesn't run on Familiar (ARM) Linux
Categories
(NSPR :: NSPR, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: dr, Assigned: wtc)
References
Details
Attachments
(1 file, 1 obsolete file)
20.59 KB, text/plain
I'm trying to run Mozilla on the iPaq, which uses an ARM architecture. The
operating system on my iPaq is Familiar 0.5.2 (http://familiar.handhelds.org). I
have been successful in cross-compiling Mozilla for ARM, but there is a problem
with more recent releases that prevents the compiled code from executing.
Facts:
- All my builds are embed builds (created by embedding/config/Makefile). I
execute using ./run-mozilla ./TestGtkEmbed.
- I have been able to get the 0.9.6 release to run successfully on the iPaq,
and I'm working on putting together some documentation on how I managed to do it.
- The 1.0 release does not work. I get exceptions from nsThread.cpp when I
execute the program. The gtk UI comes up after that, but when I try to load a
page, I die with a segfault.
- The 0.9.9 release doesn't work either (I tried it because it was the first
release to include freetype support). In this case, I don't get the exception
from nsThread, but the program seems to die in the same way.
I'm having trouble tracking down the actual location of the crash, because gdb
complains about not being able to grok how DLLs are loaded. What I'm going to
try instead is to build 0.9.7 (and maybe 0.9.8) to nail down the timeframe in
which the problem started occurring. There aren't many changes committed to
XPCOM and NSPR threads during that time, so (assuming the problem does lie in
threading code) it might help us see what the cause of the trouble is.
One other quick comment: Mozilla 0.9.6 works on the iPaq with no code
modification at all, besides the patch in bug 33364. That is, the code is
completely portable. The only trick was getting my environment and .mozconfig
set up correctly.
(Also, CC'ing some folks who've been involved with mozilla/ARM in other bugs.
Apologies if you don't want to be added here - just fishing for people who might
have looked at this already).
Comment 2•23 years ago
First, look at nspr and xpcom changes in between 0.9.6 and 0.9.7.
Two things:
1. Ignore my comment about bug 33364. The change I needed to make was similar,
in the file xpcom/reflect/xptcall/src/md/unix/Makefile. (change armv4l and arm32
to arm%).
2. Mozilla 0.9.7 seems to work. I'll try now with 0.9.8.
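A rough sketch of why `arm%` is a more forgiving choice than listing `armv4l` and `arm32` by name. This is Python, not the Makefile itself, and it assumes the Makefile compares the target CPU name against the pattern with the wildcard semantics of GNU make's text functions, where `%` matches any run of characters (including none). The function name and the sample CPU strings other than `armv4l` and `arm32` are illustrative only.

```python
def make_pattern_match(pattern: str, word: str) -> bool:
    """Approximate the '%' wildcard of GNU make text functions such as filter.

    Only the first '%' is treated as a wildcard and it may match zero or
    more characters. A pattern without '%' must equal the word exactly.
    """
    if "%" not in pattern:
        return pattern == word
    prefix, suffix = pattern.split("%", 1)
    return (
        len(word) >= len(prefix) + len(suffix)
        and word.startswith(prefix)
        and word.endswith(suffix)
    )


# CPU names a build might report for various targets (illustrative values).
cpu_names = ["armv4l", "arm32", "arm", "armv5tel", "i686"]

old_patterns = ["armv4l", "arm32"]  # the names the Makefile listed
new_patterns = ["arm%"]             # the change described above

matched_old = [c for c in cpu_names
               if any(make_pattern_match(p, c) for p in old_patterns)]
matched_new = [c for c in cpu_names
               if any(make_pattern_match(p, c) for p in new_patterns)]
```

Under these assumptions the explicit list only catches the two spelled-out names, while the pattern also catches other `arm`-prefixed targets and still rejects non-ARM ones.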
One more thing about 0.9.7. I get the following assertion:
nsNativeComponentLoader not thread-safe:
'owningThread == NS_CurrentThread()', file nsDebug.cpp, line 528
It doesn't seem to cause any problems here, though.
Comment 5•23 years ago
Did you take a look at bug 9519, bug 87965 and bug 106864 ?
I only have experience with a 'full system' build of mozilla (starting from
0.9.8) for ARM (and the result is a fully functional mozilla!)
The assertion in comment 4 should be harmless.
Comment 6•23 years ago
Question: which compiler do you use to compile mozilla?
I've experienced segmentation faults with a 'plain' gcc-3.0.4
because of 'floating point' registers being incorrectly reloaded
(or at least, their emulation).
Jeroen: I'm using a build of gcc-2.95.3 which I downloaded from arm.linux.org.uk
(ftp://ftp.arm.linux.org.uk/pub/armlinux/toolchain). Also, I looked at those
bugs you mentioned - bug 106864 might be the culprit for me. I'll try applying
your patch there and see what happens...
Also, I've finished the first round of my search. 0.9.7 works for me, and 0.9.8
does not. So that means I've narrowed down the segfault to code changed between
December 21, 2001 and February 4, 2002. I'm going to go hunt through bonsai for
relevant changes committed between those dates.
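The narrowing described here is a bisection over releases. A minimal sketch of that search in Python follows, assuming the oldest release in the list works, the newest fails, and the breakage never "un-happens" in between. The function name is hypothetical, and the lookup table stands in for the expensive real test (cross-compile, install on the iPaq, load a page); its values are the results reported in this bug.

```python
from typing import Callable, Sequence


def first_failing(releases: Sequence[str], works: Callable[[str], bool]) -> str:
    """Return the earliest release for which works() is False.

    Invariant: releases[lo] works and releases[hi] fails.
    """
    lo, hi = 0, len(releases) - 1
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(releases[mid]):
            lo = mid
        else:
            hi = mid
    return releases[hi]


RELEASES = ["0.9.6", "0.9.7", "0.9.8", "0.9.9", "1.0"]

# Results reported in this bug; each entry stands in for a full build-and-run.
OBSERVED = {"0.9.6": True, "0.9.7": True,
            "0.9.8": False, "0.9.9": False, "1.0": False}

culprit = first_failing(RELEASES, OBSERVED.__getitem__)
```

The same loop applies unchanged to a list of checkin dates between December 21, 2001 and February 4, 2002, if building from intermediate CVS pulls is practical.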
The patch in bug 106864 doesn't fix 0.9.8 for me. It fixes the strings problem
it was intended to fix, but that never caused me any grief in my embed build.
Jeroen: Would you be able to try creating an embed build of your own, to see if
it works? (To do this, cd into embedding/config and make. This results in
unstripped binaries located in dist/Embed). I'm wondering if the problem is
particular to embed builds...
->NSPR. This doesn't look like it's in XPCOM threads, judging from checkins to
mozilla/xpcom/threads between 12/21/2001 and 2/4/2002.
Assignee: rpotts → wtc
Component: Threading → NSPR
Product: Browser → NSPR
QA Contact: rpotts → wtc
Version: other → 4.2
| Reporter | ||
Comment 10•23 years ago
Looks like there are several checkins to NSPR threads contributed by
jeff@NerdOne.com during the 0.9.7 - 0.9.8 timeframe. (CC'ing Jeff). Jeff, any
idea what I ought to be looking for, or why I might be experiencing this problem?
Summary: Mozilla doesn't run on ARM arch → Mozilla doesn't run on Familiar (ARM) Linux
Comment 11•23 years ago
sorry, i can't be of help. i was tracking down
a bunch of memory leaks. and i was only running
on win32.
Comment 12•23 years ago
I've tested mozilla-1.0 release sources, without any patches, using
gcc-3.0.4 (containing a patch concerning floating point reloads),
glibc-2.2.5.
This is on an xscale based Xingu board.
The TestGtkEmbed seems to work fine: I succeeded in displaying slashdot
when exporting my X display to a linux pc, and the 'streams' test works fine
on the platform itself.
| Reporter | ||
Comment 13•23 years ago
*Whooff* <-- the sound of me throwing my hands in the air, being stumped.
Comment 14•23 years ago
Well, TestGtkEmbed works for me on my netwinder, mostly.
Built on a Debian/ARM woody system (gcc-2.95.4 Debian prerelease/glibc-2.2.5-6).
The only problem I face is that PSM does *not* work. Any https:// site will
crash mozilla with:
Assertion failure: lock != NULL, at ptsynch.c:206
The problem appears to be in NSPR, in nsprpub/pr/src/misc/prdtoa.c:Balloc. The
PR_Lock seems to go off without a hitch, but the PR_Unlock dies.
Jeroen, does PSM work on your build? If so, what gcc/glibc patches for ARM did
you use?
| Assignee | ||
Comment 15•23 years ago
That assertion failure means the 'freelist_lock' in prdtoa.c
is a null pointer.
Since PR_Lock goes off without a hitch but PR_Unlock dies,
and _PR_CleanupDtoa is the only function that sets freelist_lock
to NULL, this implies that _PR_CleanupDtoa is called, which
in turn implies that PR_Cleanup is called.
This conclusion doesn't make sense to me because PR_Cleanup
is only called before an application terminates and only
some applications call PR_Cleanup.
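The chain of inference above can be illustrated with a toy model. This is Python rather than NSPR's C, and the class and method names are hypothetical; only `freelist_lock`, `_PR_CleanupDtoa`, `PR_Lock`, `PR_Unlock` and the `lock != NULL` message come from this discussion. It only shows that a cleanup running between the lock and the unlock would produce the reported pattern; it does not show that this is what happens on ARM.

```python
import threading


class DtoaModel:
    """Toy stand-in for the lock handling around Balloc in prdtoa.c."""

    def __init__(self) -> None:
        # Counterpart of an initialised freelist_lock.
        self.freelist_lock = threading.Lock()

    def cleanup(self) -> None:
        # Counterpart of _PR_CleanupDtoa, the only place the lock becomes NULL.
        self.freelist_lock = None

    def balloc(self, cleanup_runs_midway: bool = False) -> str:
        lock = self.freelist_lock
        if lock is None:
            raise RuntimeError("Assertion failure: lock != NULL (in lock step)")
        lock.acquire()                      # the PR_Lock step succeeds
        if cleanup_runs_midway:
            self.cleanup()                  # hypothetical teardown in between
        if self.freelist_lock is None:      # the PR_Unlock step re-reads the global
            lock.release()                  # keep the toy model tidy
            raise RuntimeError("Assertion failure: lock != NULL (in unlock step)")
        self.freelist_lock.release()
        return "ok"
```

Calling `DtoaModel().balloc()` returns normally, while `DtoaModel().balloc(cleanup_runs_midway=True)` fails in the unlock step only, which is the pattern described in comment 14.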
Comment 16•23 years ago
I don't have PSM enabled... I'll try to spin a build where it is enabled...
| Reporter | ||
Comment 17•23 years ago
I don't have PSM enabled either, since it's a pain to cross-compile. My major
problem is just that the first page load results in a segfault (in releases
after 0.9.7).
I'm having some other difficulties as well: one I'm working on right now is, in
my "working" builds, I can't seem to submit forms. That's a different issue, though.
| Reporter | ||
Comment 18•23 years ago
I've attached a console log in bug 152955, which might help give a
bigger-picture view of everything that's going wrong in my 0.9.7 build.
Summary: Mozilla doesn't run on Familiar (ARM) Linux → Mozilla (> 0.9.7) doesn't run on Familiar (ARM) Linux
Comment 19•23 years ago
Dan,
maybe you should focus your effort on mozilla-1.0, as this
source tree contains all patches which are needed for arm.
Then we could try to focus on the differences you and I seem to get...
When you get a 'segfault' for the 1.0 release, is it inside mozilla or inside
one of the libraries ? (ever tried to debug with gdb ?)
Do you get different results when exporting the display to a remote pc and not
exporting the display ?
Did you try with gcc-3.0.4 ? (don't forget to add something like
<http://gcc.gnu.org/ml/gcc-patches/2002-03/msg00248.html> (note: this was not
the final patch going into cvs, but it fixes the problem))
| Reporter | ||
Comment 20•23 years ago
Jeroen:
I wish I could use mozilla 1.0! It's pretty embarrassing at this point, filing
bugs against 0.9.7 :) As for your suggestions:
- I've tried to debug with GDB. There are two difficulties with that. One is
that the iPaq has such limited resources that an unstripped build won't fit! But
I can get around that by stripping all the binaries except those I need to
debug. The other, more major problem, is that threads don't work in Familiar
Linux's build of GDB (see http://handhelds.org/bugzilla/show_bug.cgi?id=161). So
when I tried to track down where the segfault was, I found I was unable to trace
into any threads. Pretty useless, huh?!
- I get the same results when exporting the display to my desktop as I do when
displaying on the iPaq screen. Tried that one already :)
- As for gcc-3.0.4... That was a sad story. I tried very hard to build it as a
cross-compiler, but never managed to get it to completely build. So after two
weeks of banging my head against that problem, I gave up and downloaded a
pre-built 2.95.3 from ftp://ftp.arm.linux.org.uk/pub/armlinux/toolchain. Do you
have your 3.0.4 cross-compiling, or is it a native compiler running on a desktop
ARM machine? If it's an i386->ARM cross-compiler, I'd be very grateful if you
could send me a copy!
Also, regarding the gcc floating point patch: is this the final patch that went in?:
http://gcc.gnu.org/ml/gcc-patches/2002-03/msg00829.html
Anyway, there is definitely *some* problem that cropped up between 0.9.7 and
0.9.8. But if you have 1.0 running on ARM, then I'm really pretty stumped. Maybe
it's a mozilla bug, maybe it's a compiler bug, maybe it's a libraries bug...
| Reporter | ||
Comment 21•23 years ago
For your curiosity, here's the gdb problem I'm having:
(gdb) run
Starting program: /usr/local/Embed/./TestGtkEmbed
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
...
Cannot access memory at address 0x40016df0
This actually appears to be different from the thread issue I mentioned (which I
also recall having seen) but, dollars to doughnuts, it's again a gdb bug, and
not mozilla's problem.
This happened using mozilla 1.0 (source release), on gdb 5.0 (5.0-3-fam1).
| Reporter | ||
Comment 22•23 years ago
Woohah! I managed to get a new gdb from
http://handhelds.org/bugzilla/show_bug.cgi?id=161. I ran Mozilla in it, and lo
and behold, the Stack Trace of Justice!
Most of the binaries here are stripped. (Only TestGtkEmbed and
libgtkembedmoz.so are unstripped). Now that I can see what DLLs I'm in, I'll
crank out another stack trace with those binaries unstripped.
Also, I'm able to "cont" past this first crash, and experience several more:
loading url sleepy.at
Program received signal SIG32, Real-time event 32.
0x4047c82c in sigsuspend () from /lib/libc.so.6
(gdb) cont
Continuing.
Program received signal SIG32, Real-time event 32.
0x4047c82c in sigsuspend () from /lib/libc.so.6
(gdb) cont
Continuing.
Warning: MOZILLA_FIVE_HOME not set.
Program received signal SIG32, Real-time event 32.
0x4047c82c in sigsuspend () from /lib/libc.so.6
(gdb) cont
Continuing.
open_uri_cb http://sleepy.at/
load_started_cb
Program received signal SIGTRAP, Trace/breakpoint trap.
0x4050a5d4 in write () from /lib/libc.so.6
Each of the first three SIG32s involves necko trying to start a thread. The
first (attached) is in nsIOService::Init, the next is in nsDNSService::Init,
and the last is in nsHttpHandler::Init (trying to start a timer). I'll attach
each full stack trace when I have unstripped Necko, XPCOM and NSPR.
| Assignee | ||
Comment 23•23 years ago
What is the output of "ldd TestGtkEmbed"?
By the way, the stack trace doesn't look right.
#0 0x4047c82c in sigsuspend () from /lib/libc.so.6
#1 0x40436244 in pthread_getconcurrency () from /lib/libpthread.so.0
#2 0x404357d0 in pthread_create () from /lib/libpthread.so.0
#3 0x4041130c in PR_Select () from ./libnspr4.so
#4 0x404115dc in PR_CreateThread () from ./libnspr4.so
PR_CreateThread does not call PR_Select. So it's not
clear how much we can trust this stack trace.
| Reporter | ||
Comment 24•23 years ago
Agh, yuck! Sorry about the extra newlines. Minicom isn't too friendly with the X
clipboard. "ldd" returns what you'd expect it to:
root@midget2 /usr/local/Embed -> ldd ./TestGtkEmbed
./TestGtkEmbed:
libgtkembedmoz.so => ./libgtkembedmoz.so (0x4001f000)
libgtksuperwin.so => ./libgtksuperwin.so (0x40073000)
libdl.so.2 => /lib/libdl.so.2 (0x40081000)
libmozjs.so => ./libmozjs.so (0x4008c000)
libxpcom.so => ./libxpcom.so (0x40178000)
libplds4.so => ./libplds4.so (0x403b4000)
libplc4.so => ./libplc4.so (0x403bf000)
libnspr4.so => ./libnspr4.so (0x403cc000)
libpthread.so.0 => /lib/libpthread.so.0 (0x4042c000)
libc.so.6 => /lib/libc.so.6 (0x4044a000)
libgtk-1.2.so.0 => /usr/lib/libgtk-1.2.so.0 (0x40566000)
libgdk-1.2.so.0 => /usr/lib/libgdk-1.2.so.0 (0x406db000)
libgmodule-1.2.so.0 => /usr/lib/libgmodule-1.2.so.0 (0x4071c000)
libglib-1.2.so.0 => /usr/lib/libglib-1.2.so.0 (0x40727000)
libXi.so.6 => /usr/X11R6/lib/libXi.so.6 (0x40759000)
libXext.so.6 => /usr/X11R6/lib/libXext.so.6 (0x40768000)
libX11.so.6 => /usr/X11R6/lib/libX11.so.6 (0x4077e000)
libm.so.6 => /lib/libm.so.6 (0x40855000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
I'm about to attach a gdb log -- maybe the stack trace from that will make more
sense...
| Assignee | ||
Comment 25•23 years ago
libpthread.so is before libc.so, so the library linking order
is correct.
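The same ldd output can also be used to sanity-check which library each frame of the trace in comment 23 belongs to. The sketch below is Python and rests on two assumptions: that the libraries load at the same base addresses in the crashing process as in the ldd run, and that each library extends up to the next base address (ldd does not report sizes). The helper name is hypothetical.

```python
import bisect

# (base address, library) pairs copied from the ldd output above.
LOAD_MAP = sorted([
    (0x4001f000, "libgtkembedmoz.so"),
    (0x40073000, "libgtksuperwin.so"),
    (0x40081000, "libdl.so.2"),
    (0x4008c000, "libmozjs.so"),
    (0x40178000, "libxpcom.so"),
    (0x403b4000, "libplds4.so"),
    (0x403bf000, "libplc4.so"),
    (0x403cc000, "libnspr4.so"),
    (0x4042c000, "libpthread.so.0"),
    (0x4044a000, "libc.so.6"),
    (0x40566000, "libgtk-1.2.so.0"),
    (0x406db000, "libgdk-1.2.so.0"),
    (0x4071c000, "libgmodule-1.2.so.0"),
    (0x40727000, "libglib-1.2.so.0"),
    (0x40759000, "libXi.so.6"),
    (0x40768000, "libXext.so.6"),
    (0x4077e000, "libX11.so.6"),
    (0x40855000, "libm.so.6"),
    (0x40000000, "ld-linux.so.2"),
])
BASES = [base for base, _ in LOAD_MAP]


def library_for(address: int):
    """Return the library whose base is the closest one at or below address."""
    index = bisect.bisect_right(BASES, address) - 1
    return LOAD_MAP[index][1] if index >= 0 else None


# Frame addresses #0 to #4 from the stack trace in comment 23.
frames = [0x4047c82c, 0x40436244, 0x404357d0, 0x4041130c, 0x404115dc]
owners = [library_for(addr) for addr in frames]
```

Under those assumptions the library attribution of every frame agrees with what gdb printed, so any doubt about the trace concerns the function names within libnspr4.so rather than which library the addresses fall in.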
| Reporter | ||
Comment 26•23 years ago
Unfortunately I wasn't able to get Necko unstripped also - not enough space in
ROM on this damn iPaq (see also
http://handhelds.org/bugzilla/show_bug.cgi?id=418)!! I'll see if I can manage
to squeeze it on somehow anyway.
Attachment #88709 -
Attachment is obsolete: true
| Reporter | ||
Comment 27•23 years ago
wtc: Any particular variables I should be peeking into? Or does everything look
correct now? ... Should I try again with necko unstripped?
| Reporter | ||
Comment 28•23 years ago
Is it possible that this value isn't kosher on my architecture?:
if (0 == stackSize) stackSize = (64 * 1024); /* default == 64K */
(ptthread.c, line 357)
I'm guessing this is probably fine because, looking in bonsai at changes to
ptthread.c between 21 Dec 2001 and 4 Feb 2002, this line was not changed. I just
snooped into this because seeing stackSize go from 0 to 65536 made me suspicious.
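For reference, the quoted line only substitutes a default when the caller asks for a stack size of 0, which is why the value jumps from 0 to 65536 in the debugger. A small Python restatement of that logic (the function and constant names are hypothetical, not NSPR's):

```python
DEFAULT_STACK_SIZE = 64 * 1024  # the "default == 64K" from the quoted line


def effective_stack_size(requested: int) -> int:
    """Mirror the quoted check: 0 means 'use the default', anything else is kept."""
    return DEFAULT_STACK_SIZE if requested == 0 else requested


default_size = effective_stack_size(0)
explicit_size = effective_stack_size(128 * 1024)
```

Whether 64K is adequate for a given thread on this platform is a separate question that the snippet does not answer.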
| Reporter | ||
Comment 29•23 years ago
Another shot-in-the-dark, which maybe Jeroen could answer: might something be
wrong with memcpy? Wondering if bug 118135 introduced the problem... Seems
highly unlikely, but I'm trying to guess what sorts of things might not be
threadsafe.
Comment 30•23 years ago
Has anyone tried any other threaded apps on the ARM?
Comment 31•23 years ago
I must confess that I am confused about the stack traces :(
shouldn't you just ignore the SIG32 ?
and what about the 'SIGTRAP' :
did you put a breakpoint in the gdk library ?
(I guess not, but that's what the stacktrace seems to indicate...)
| Reporter | ||
Comment 32•23 years ago
Ok, I did a little more research into the signals I'm seeing, and you're right,
I shouldn't be seeing any of them. They don't indicate any problem in threads or
necko. References:
http://www.mozilla.org/unix/debugging-faq.html (of course)
http://sources.redhat.com/ml/gdb/2000-q1/msg00329.html
http://sources.redhat.com/ml/gdb/2000-q1/msg00336.html
http://sources.redhat.com/ml/bug-gdb/1999-10/msg00058.html
http://www.advogato.org/person/drunen/ ("Seems pthreads is using this to
change thread contexts, and gdb doesn't seem to get it.")
http://www.uwsg.iu.edu/hypermail/linux/kernel/0012.1/0232.html
etc. etc. etc.
Just using "prun" doesn't deal with the SIG32s, so I've added "handle SIG32
nostop noprint pass" to my .gdbinit. That works, of course, and gets me straight
to the SIGTRAP. The SIGTRAP comes up every time, but in different places in the
code. And no, I have not set any breakpoints in GDK, or anywhere else.
Blizzard: I know you dealt with this problem a while back. Can you tell me how
I'm supposed to get around it? That'll bring me a big step closer to the holy
grail of a Useful Stack Trace.
As for "other threaded apps on the ARM," Mozilla 0.9.7 does run for me (though
with many bugs). I assume that answers the question :)
Comment 33•23 years ago
gdb should be handling the signals. This means that either gdb is screwed up or
there's no thread debugging library on the arm that works. You shouldn't need
that SIG32 crap anywhere.
| Reporter | ||
Comment 34•23 years ago
Ok, gdb was screwed up: I needed a newer glibc to go with the homebrewed gdb. So
I'm getting results now that seem definitive. But they're weird. Apparently I'm
getting a segfault in nsCSSValue::GetUnit().
I don't have an unstripped content DLL, unfortunately, because I can't load it
onto the iPaq (not enough room). But nsCSSValue::GetUnit() is a one-liner:
nsCSSUnit GetUnit(void) const { return mUnit; };
The other weird thing here is that the function which supposedly calls GetUnit()
is CSSStyleRuleImpl::MapRuleInfoInto(). That function should indirectly call
GetUnit by way of one of the Map*ForDeclaration functions, but I can't see which
one.
Here's the stack I've got:
Program received signal SIGSEGV, Segmentation fault.
0x415d1760 in ?? () from /usr/local/mozilla/components/libgkcontent.so
(gdb) shar content
(no debugging symbols found)...Loaded symbols for
/usr/local/mozilla/components/libgkcontent.so
(gdb) where
#0 0x415d1760 in nsCSSValue::GetUnit ()
from /usr/local/mozilla/components/libgkcontent.so
#1 0x417b0724 in CSSStyleRuleImpl::MapRuleInfoInto ()
from /usr/local/mozilla/components/libgkcontent.so
#2 0x417af038 in CSSStyleRuleImpl::MapRuleInfoInto ()
from /usr/local/mozilla/components/libgkcontent.so
#3 0x41ac7c3c in nsRuleNode::WalkRuleTree ()
from /usr/local/mozilla/components/libgkcontent.so
#4 0x41ac72cc in nsRuleNode::GetBorderData ()
from /usr/local/mozilla/components/libgkcontent.so
#5 0x41ad1e70 in nsRuleNode::GetStyleData ()
from /usr/local/mozilla/components/libgkcontent.so
#6 0x41b00ed4 in nsStyleContext::GetStyleData ()
from /usr/local/mozilla/components/libgkcontent.so
#7 0x423e41e8 in ?? () from /usr/local/mozilla/components/libgklayout.so
I'm wondering if it might be a compiler bug relating to inlining...?
| Reporter | ||
Comment 35•23 years ago
Ok, I spent some time and have what seems to be a working GCC 3.1 cross-compiler
(with binutils 2.12.1, glibc 2.2.5). By "seems to be working," I mean that it
builds "hello world" using namespaces and iostreams... Not exactly a rigorous
test, and I suspect Mozilla will be significantly more taxing on the compiler.
Anyway, to make a long story short, I'm trying to cross-compile Mozilla with GCC
3.1 now. We'll see what happens.
Comment 36•23 years ago
If you're using GCC 3.1 you might have to play with the name mangling in the
xptstub code. You also might need to mess with the xptcall code since I'll bet
the ABI has changed.
Comment 37•23 years ago
And just as a start, it would be a good idea to first try it
at -O1 and only later at -O2, -O3 or -Os
(With gcc-3.0.4 I experienced one small glitch compiling nsDOMClassInfo
at -O2, resulting in some XUL information not being available. Compiling this file
at -O1 resolved the problem.)
| Reporter | ||
Comment 38•23 years ago
Well, here are my results from yesterday:
The build finished successfully (without --enable-optimize). The build also
runs, to the extent that it ran with gcc 2.95.3 (I didn't have to hack any of
the xpt stuff). But it dies on the first page-load, just like it did with the
old gcc.
Worse yet, I can't seem to get any remotely useful stack trace for the crash:
Program received signal SIGSEGV, Segmentation fault.
0x0015100c in ?? ()
(gdb) where
#0 0x0015100c in ?? ()
Cannot access memory at address 0x0
My glibc configuration is:
--host=arm-linux
--enable-add-ons=linuxthreads
My gcc configuration is:
--target=arm-linux
--enable-languages=c,c++
These also include --prefix, --with-headers, and --with-local-prefix, of course.
Jeroen: I looked at the configure bits you sent me... I'm wondering, should I
also be using:
--with-cpu=strongarm110
--without-fp
--with-softfloat-support=internal
--enable-threads=posix
or is that unnecessary for me?
Comment 39•23 years ago
--with-cpu=strongarm110 -> yes
(don't try xscale binaries on an ipaq ;) )
--without-fp -> no : use --with-fp (default)
--with-softfloat-support=internal -> not important without softfloat (leave it out)
--enable-threads=posix -> yes : needed for c++ with multiple threads
the softfloat part is only useful if your _complete_ system is compiled
for softfloat. The normal distributions (like familiar) let the kernel
do the work for emulating floating points.
Comment 40•23 years ago
So, has anyone looked into the ptsynch assert in comment #14?
I ran into some of the ARM Linux people, and got some debugging ideas from them.
However, it was all for naught. Has anyone else gotten PSM working on the ARM,
and if so, what was their toolchain setup? I have a hunch that this might be
a glibc/linuxthreads bug. Potentially identical to bug #14263, maybe?
| Reporter | ||
Comment 41•23 years ago
Hi Mark,
I'm wondering if it might be best to open another bug for PSM on ARM. My trouble
is simply that Mozilla post-0.9.7 doesn't work *at all*. Perhaps the bugs are
related, but perhaps not...
I don't think my problem is a toolchain issue, if that helps you... I've used
gcc 2.95.3 and 3.1, and glibc 2.2.3 and 2.2.5.
On the other hand, if you're seeing a problem in PR_Unlock then you might want
to have a look at the changes that jeff@NerdOne.com contributed to stop threads
from leaking memory. They were committed on 27 December 2001. See bugs:
bug 96112
bug 96122
bug 96197
bug 96198
bug 96199
I don't really have any idea if these are the cause of the problem, though. The
code added all looks more or less straightforward. The only thing I'm wary about
in any of the patches is the change from PR_MALLOC() to malloc() in some places
in 96122, but that's only a gut reaction... I guess the only way to really test
would be to back out the changes and see if it works.
Oh, actually, here's an idea for you: Since Mozilla 0.9.7 works for me, perhaps
you could try compiling that and seeing if PSM works in it.
Regarding compilation of PSM, you should have a look at bug 104541.
| Reporter | ||
Comment 42•23 years ago
Mark: I also should mention that there are a whole bunch of other changes that
were committed to NSPR threads between 0.9.7 and 0.9.8 (December 2001 - February
2002). Those are all equally worth looking at, assuming that PSM in 0.9.7 works
for you.
Comment 43•22 years ago
what is the current status on this? Is this something that is still an issue?
(trying to help focus)
Comment 44•22 years ago
WORKSFORME under Familiar 0.7 (and 0.7.1). For more info, see
http://www.mozilla.org/projects/minimo/
of course, you have to have the right toolchain ;)
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → WORKSFORME
Comment 45•22 years ago
re: PR_dtoa, see bug 209814