Closed Bug 403466 Opened 17 years ago Closed 8 years ago

xulrunner crashes or hangs on first startup

Categories

(Toolkit Graveyard :: XULRunner, defect)

x86
OpenSolaris
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: ginnchen+exoracle, Unassigned)

Details

(Keywords: crash, hang)

Attachments

(1 file, 1 obsolete file)

Build xulrunner, then run:
dist/bin/xulrunner dist/xpi-stage/simple/application.ini

It crashes.
Run the command again and it works.

If you remove dist/bin/components/compreg.dat, it crashes again.
Making the components directory read-only doesn't help.

I noticed this assertion with a debug build; it might be related.

###!!! ASSERTION: mozJSComponentLoader should be a singleton: '!sSelf', file mozJSComponentLoader.cpp, line 459

I got different core stacks.
Here's a typical one.

 fd6d4943 void XPCJSRuntime::TraceJS(JSTracer*,void*) (80463f0, 832f490) + 27
 fcd71a38 js_GC    (832e890, 0) + 354
 fcd2f1a9 JS_GC    (832e890) + 41
 fd7035a5 unsigned mozJSComponentLoader::Observe(nsISupports*,const char*,const unsigned short*) (83288b8, 828404c, fe645285, 0) + 65
 fe381eb7 void nsObserverList::NotifyObservers(nsISupports*,const char*,const unsigned short*) (8436760, 828404c, fe645285, 0) + 87
 fe3828cd unsigned nsObserverService::NotifyObservers(nsISupports*,const char*,const unsigned short*) (82626e0, 828404c, fe645285, 0) + 55
 fe37b06c NS_ShutdownXPCOM_P (828404c) + c0
 fd69fc7c ScopedXPCOMStartup::~ScopedXPCOMStartup #Nvariant 1() (8046950) + 38
 fd6a7323 XRE_main (1, 8046e8c, 8074d90) + 26f3
 08057701 main     (1, 8046e8c, 8046e94) + 8a9
 08056742 _start   (2, 804702c, 804702c, 0, 8047065, 80470a9) + 7a
Set a breakpoint in mozJSComponentLoader::mozJSComponentLoader; someone's probably doing something really wrong.
Something has changed.

First, for the testcase, compreg.dat is located at ~/.mozillatest/simple/XXXXXX.default/

Second, I got an assertion in pt_AttachThread().
The reason is that we're calling PR_GetCurrentThread() during _PR_ImplicitInitialization().

The stack is
 fee8641d abort    (fedeebe0, feda7fd9, 80465e0, fedd1aa4, feddde84, feddde8c) + cd
 feda803c PR_Assert (feddde84, feddde8c, 125) + 6c
 fedd1aa4 pt_AttachThread (fe2b0190, fedd2509, 0, fed70a38, 8046608, fc2ca486) + b4
 fedd255a PR_GetCurrentThread (fe2b0190, fc2ca469, 8046620, fc5a8c9a, fe9ad160, 0) + 5a
 fc2ca486 nsAutoOwningThread::nsAutoOwningThread #Nvariant 1() (fe9ad160, 0) + 26
 fc5a8c9a nsCategoryImp::nsCategoryImp() (fe9ad158, 0) + 5a
 fc5a8a9f __SLIP.INIT_A (fe2b0190, fc5a8d09, 804665c, fdd9682f, feffb7cc, fed70a38) + 2f
 fc5a8d26 void __STATIC_CONSTRUCTOR() (feffb7cc, fed70a38, 11, 8046694, fefd3876, feffb170) + 26
 fdd9682f __cplus_fini_at_exit (feffb170, fede0d38, feffb7cc, fefa0768, fefa0768, 33)
 fefd3876 call_init (fb2d1958, 1) + f6
 fefd3e0b load_completion (fede0d38) + ef
 fefd8b1a dlsym_intn (fed701f0, feddacf8, fefa0768, 8046764) + e2
 fefd8b92 dlsym_check (fed701f0, feddacf8, fefa0768, 8046764) + 6e
 fefd8c02 dlsym    (fed701f0, feddacf8) + 46
 fedb04a1 pr_FindSymbolInProg (feddacf8, 0) + 51
 fedb0505 _PR_InitZones (fedeebe0, fedb94c9, 80467e4, fedb9736, fedeebe0, fedb9719) + 35
 fedb9506 _PR_InitStuff (fedeebe0, fedb9719, 80467fc, fedb85d4, 807bbe0, fedb85a9) + 46
 fedb9736 _PR_ImplicitInitialization (807bbe0, fedb85a9, 8046d48, feffde58, 8046d48, 805c2dc) + 26
 fedb85d4 PR_GetEnv (806ab8c, 0) + 34
 0805c2dc main     (1, 8046d80, 8046d88, fef8e2c0) + 96c
 0805adcd _start   (1, 8046f3c, 0, 8046f53, 8046f6c, 8046fba) + 7d

We defined nspr_use_zone_allocator = PR_FALSE to avoid things like this, as explained in the comment at
http://lxr.mozilla.org/seamonkey/source/toolkit/xre/nsAppRunner.cpp#2383

That doesn't work for XULRunner, because it calls PR_GetEnv() before XRE_main().
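
For readers following the stack above, here is a minimal sketch of a "find symbol in the program" lookup, assuming it amounts to a dlopen(NULL)/dlsym pair. The helper name find_symbol_in_prog and the int flag are hypothetical stand-ins (the real flag is a PRBool set to PR_FALSE), and the actual NSPR code may differ. The idea is that when the executable itself defines the symbol, the lookup can be satisfied without searching other libraries.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical stand-in for the flag discussed in this bug.
   In the Mozilla tree it is a PRBool initialized to PR_FALSE. */
int nspr_use_zone_allocator = 0;

/* Look up `name` starting from the running program's global scope.
   Returns the symbol's address, or NULL if it cannot be found.
   This is an illustration, not the NSPR implementation. */
static void *find_symbol_in_prog(const char *name)
{
    void *self = dlopen(NULL, RTLD_LAZY); /* handle for the main program */
    void *sym;

    if (self == NULL)
        return NULL;
    sym = dlsym(self, name);
    dlclose(self);
    return sym;
}
```

Note that on Linux a symbol defined in the executable is only visible to dlsym if the executable exports it (for example, when linked with -rdynamic), so the lookup may legitimately return NULL there.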

Attached patch patch (obsolete)
Assignee: nobody → ginn.chen
Status: NEW → ASSIGNED
Attachment #314030 - Flags: review?
Assignee: ginn.chen → nobody
Status: ASSIGNED → NEW
Component: XPConnect → XULRunner
Product: Core → Toolkit
QA Contact: xpconnect → xulrunner
Version: unspecified → Trunk
Attachment #314030 - Flags: review? → review?(wtc)
Comment on attachment 314030 [details] [diff] [review]
patch

r=wtc.

Ginn, so the reason you define nspr_use_zone_allocator is
not to speed up startup performance?

"To avoid this happen" ==> "To prevent this from happening"
or simply "To prevent this".
Attachment #314030 - Flags: review?(wtc) → review+
(In reply to comment #4)
> (From update of attachment 314030 [details] [diff] [review])
> r=wtc.
> 
> Ginn, so the reason you define nspr_use_zone_allocator is
> not to speed up startup performance?

Right.
Comment on attachment 314030 [details] [diff] [review]
patch

BTW: It doesn't need approval1.9 flag, right?
Attachment #314030 - Flags: superreview?(benjamin)
Is this specific to Solaris? If so, I'd prefer an ifdef or something, since boy is it ugly.
+// NSPR searches "nspr_use_zone_allocator" during its initialization.

NSPR searches => NSPR searches _for_

I don't think it's specific to Solaris.
But on Linux the sequence is different: _PR_ImplicitInitialization is called by _init() from libxul.so before we get into main(), and pr_FindSymbolInProg works fine.

Since it's only reproducible on Solaris, I agree to add an ifdef.

timeless: Thank you.
Adding an ifdef makes the code uglier.  It's fine to explain in the
comment that only Solaris needs this global variable.  The
PRBool variable only takes up 4 bytes.
If we need to do this for the standalone glue, we should do it in libxpcomglue_s.a, not in this single file. I just feel that we really shouldn't need to do it.
We need to do this because on Solaris we get into main() before libxul.so is loaded.
We need to make sure NSPR can be initialized successfully in this case.

Attachment #314030 - Flags: superreview?(benjamin) → superreview-
This is a blocker of Solaris xulrunner.
Severity: critical → blocker
We need to make sure nspr_use_zone_allocator is defined and linked into firefox-bin, thunderbird-bin, xulrunner-bin, etc. on Solaris.
Otherwise, a slight change of nsBrowserApp.cpp will crash Firefox.

Which file should it be put in?

wtc, should it be fixed in NSPR instead?
Fixing this in NSPR means removing a (rarely used) feature
of NSPR.  In general we can't do that because of our backward
compatibility requirement.

If you really can't get your Mozilla patch checked in, let's
talk again.
In general, the problem here is that PR_GetCurrentThread() is called before _PR_InitStuff() has finished, so PR_Assert fails.
It is reproducible with some test programs in xpcom/test. Just run 'make check' on a debug build on Solaris.
It looks random: some of the test programs are fine, others always fail.
With LD_DEBUG=.init, we can see that if the xpcom library's .init runs before main(), it is fine; otherwise we hit this bug.
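
The ordering problem can be modeled in a few lines of C. This is only a sketch under stated assumptions: the runtime struct, rt_init, rt_get_current_thread, and static_constructor are hypothetical names standing in for NSPR's implicit initialization, PR_GetCurrentThread(), and a library's static constructor. Instead of aborting like PR_Assert, the model records that a call arrived while initialization was still in progress.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { RT_UNINIT, RT_INITIALIZING, RT_READY } rt_state;

typedef struct runtime runtime;
typedef void (*side_effect_fn)(runtime *);

struct runtime {
    rt_state state;
    bool early_call_seen;        /* set when an API call lands mid-init */
    side_effect_fn during_init;  /* models a library .init triggered by dlsym */
};

/* Model of the implicit initialization: it may trigger a library's
   initializers (the hook) before it has finished. */
static void rt_init(runtime *rt)
{
    rt->state = RT_INITIALIZING;
    if (rt->during_init != NULL)
        rt->during_init(rt);
    rt->state = RT_READY;
}

/* Model of an API entry point that initializes the runtime on first use.
   Returns false if it was called while initialization was in progress. */
static bool rt_get_current_thread(runtime *rt)
{
    if (rt->state == RT_UNINIT)
        rt_init(rt);
    if (rt->state == RT_INITIALIZING) {
        rt->early_call_seen = true;  /* the real code asserts here */
        return false;
    }
    return true;
}

/* Model of a static constructor that calls into the runtime. */
static void static_constructor(runtime *rt)
{
    (void)rt_get_current_thread(rt);
}
```

If static_constructor runs first on an uninitialized runtime (the library's .init ran before main()), initialization completes normally. If it runs as the during_init hook of an initialization started from main(), the nested call arrives mid-initialization, which corresponds to the failing case described above.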

On Solaris, we use -z lazyload to make Firefox start up faster.
Lazy loading does not work as expected if NSPR tries to dlsym nspr_use_zone_allocator.
So we define a fake nspr_use_zone_allocator symbol in nsAppRunner.cpp.

I don't think we can just add the fake symbol to every executable.
If we can't get it fixed in NSPR, the workaround I have is to disable lazyload for the xpcom library.
Attached patch workaround
Attachment #314030 - Attachment is obsolete: true
The workaround will not work if another lazy-loaded library calls _PR_InitStuff() in its .init.
(In reply to Ginn Chen from comment #13)
> This is a blocker of Solaris xulrunner.

Is this really blocking development?
Keywords: crash, hang
I haven't built xulrunner for a long time.
I don't know if this bug still exists.

Running xulrunner with LD_BIND_NOW=1 might help.

It doesn't block development, since we have a workaround.
Severity: blocker → normal
XULRunner has been removed from the Mozilla tree: see https://groups.google.com/forum/#!topic/mozilla.dev.platform/_rFMunG2Bgw for context.

I am closing all the bugs currently in the XULRunner bugzilla component, in preparation for moving this component to the graveyard. If this bug is still valid in a XULRunner-less world, it will need to be moved to a different bugzilla component to be reopened.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Toolkit → Toolkit Graveyard