xulrunner crashes or hangs on first startup

RESOLVED INCOMPLETE

Status

Toolkit Graveyard
XULRunner
RESOLVED INCOMPLETE
11 years ago
2 years ago

People

(Reporter: Ginn Chen, Unassigned)

Tracking

({crash, hang})

Trunk
x86
OpenSolaris
crash, hang

Details

Attachments

(1 attachment, 1 obsolete attachment)

687 bytes, patch
(Reporter)

Description

11 years ago
Build xulrunner, then run:
dist/bin/xulrunner dist/xpi-stage/simple/application.ini

It crashes on the first run.
Run the command again and it works.

If you remove dist/bin/components/compreg.dat, it will crash again.
Making the components directory read-only doesn't help.

I noticed this assertion with a debug build; it might be related.

###!!! ASSERTION: mozJSComponentLoader should be a singleton: '!sSelf', file mozJSComponentLoader.cpp, line 459

The core stacks differ from run to run.
Here's a typical one.

 fd6d4943 void XPCJSRuntime::TraceJS(JSTracer*,void*) (80463f0, 832f490) + 27
 fcd71a38 js_GC    (832e890, 0) + 354
 fcd2f1a9 JS_GC    (832e890) + 41
 fd7035a5 unsigned mozJSComponentLoader::Observe(nsISupports*,const char*,const unsigned short*) (83288b8, 828404c, fe645285, 0) + 65
 fe381eb7 void nsObserverList::NotifyObservers(nsISupports*,const char*,const unsigned short*) (8436760, 828404c, fe645285, 0) + 87
 fe3828cd unsigned nsObserverService::NotifyObservers(nsISupports*,const char*,const unsigned short*) (82626e0, 828404c, fe645285, 0) + 55
 fe37b06c NS_ShutdownXPCOM_P (828404c) + c0
 fd69fc7c ScopedXPCOMStartup::~ScopedXPCOMStartup #Nvariant 1() (8046950) + 38
 fd6a7323 XRE_main (1, 8046e8c, 8074d90) + 26f3
 08057701 main     (1, 8046e8c, 8046e94) + 8a9
 08056742 _start   (2, 804702c, 804702c, 0, 8047065, 80470a9) + 7a

Comment 1

10 years ago
Set a breakpoint in mozJSComponentLoader::mozJSComponentLoader; someone is probably doing something really wrong.
(Reporter)

Comment 2

10 years ago
Something has changed since the original report.

First, for the testcase, compreg.dat is now located at ~/.mozillatest/simple/XXXXXX.default/

Second, I got an assertion in pt_AttachThread().
The reason is that we're calling PR_GetCurrentThread() during _PR_ImplicitInitialization().

The stack is
 fee8641d abort    (fedeebe0, feda7fd9, 80465e0, fedd1aa4, feddde84, feddde8c) + cd
 feda803c PR_Assert (feddde84, feddde8c, 125) + 6c
 fedd1aa4 pt_AttachThread (fe2b0190, fedd2509, 0, fed70a38, 8046608, fc2ca486) + b4
 fedd255a PR_GetCurrentThread (fe2b0190, fc2ca469, 8046620, fc5a8c9a, fe9ad160, 0) + 5a
 fc2ca486 nsAutoOwningThread::nsAutoOwningThread #Nvariant 1() (fe9ad160, 0) + 26
 fc5a8c9a nsCategoryImp::nsCategoryImp() (fe9ad158, 0) + 5a
 fc5a8a9f __SLIP.INIT_A (fe2b0190, fc5a8d09, 804665c, fdd9682f, feffb7cc, fed70a38) + 2f
 fc5a8d26 void __STATIC_CONSTRUCTOR() (feffb7cc, fed70a38, 11, 8046694, fefd3876, feffb170) + 26
 fdd9682f __cplus_fini_at_exit (feffb170, fede0d38, feffb7cc, fefa0768, fefa0768, 33)
 fefd3876 call_init (fb2d1958, 1) + f6
 fefd3e0b load_completion (fede0d38) + ef
 fefd8b1a dlsym_intn (fed701f0, feddacf8, fefa0768, 8046764) + e2
 fefd8b92 dlsym_check (fed701f0, feddacf8, fefa0768, 8046764) + 6e
 fefd8c02 dlsym    (fed701f0, feddacf8) + 46
 fedb04a1 pr_FindSymbolInProg (feddacf8, 0) + 51
 fedb0505 _PR_InitZones (fedeebe0, fedb94c9, 80467e4, fedb9736, fedeebe0, fedb9719) + 35
 fedb9506 _PR_InitStuff (fedeebe0, fedb9719, 80467fc, fedb85d4, 807bbe0, fedb85a9) + 46
 fedb9736 _PR_ImplicitInitialization (807bbe0, fedb85a9, 8046d48, feffde58, 8046d48, 805c2dc) + 26
 fedb85d4 PR_GetEnv (806ab8c, 0) + 34
 0805c2dc main     (1, 8046d80, 8046d88, fef8e2c0) + 96c
 0805adcd _start   (1, 8046f3c, 0, 8046f53, 8046f6c, 8046fba) + 7d

We defined nspr_use_zone_allocator = PR_FALSE to avoid problems like this, as explained in the comment at
http://lxr.mozilla.org/seamonkey/source/toolkit/xre/nsAppRunner.cpp#2383

It doesn't work for XULRunner, because its main() calls PR_GetEnv() before XRE_main().
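
The failure mode in the stack above can be illustrated with a small simulation. This is only a sketch under stated assumptions: MiniRuntime, ensureInitialized, and currentThreadId are hypothetical names, not NSPR APIs, and the callback stands in for the dlsym() lookup that ends up running another library's static constructor.

```cpp
#include <cassert>
#include <functional>
#include <stdexcept>
#include <utility>

// Hypothetical miniature of a lazily initialized runtime.
// None of these names are real NSPR APIs; they only mirror the call order
// visible in the stack trace.
class MiniRuntime {
public:
    enum class State { Uninitialized, Initializing, Ready };

    // symbolLookup stands in for the dlsym() call made during initialization.
    explicit MiniRuntime(std::function<void(MiniRuntime&)> symbolLookup)
        : lookup_(std::move(symbolLookup)) {}

    // Counterpart of an implicit-initialization entry point.
    void ensureInitialized() {
        if (state_ != State::Uninitialized) return;
        state_ = State::Initializing;
        lookup_(*this);  // may run another library's static constructors
        state_ = State::Ready;
    }

    // Counterpart of a "get current thread" call: only valid once
    // initialization has finished.
    int currentThreadId() {
        if (state_ == State::Initializing)
            throw std::logic_error("thread lookup during initialization");
        ensureInitialized();
        return 1;
    }

    State state() const { return state_; }

private:
    std::function<void(MiniRuntime&)> lookup_;
    State state_ = State::Uninitialized;
};

// The lookup resolves immediately, as when the symbol lives in the executable.
inline bool initSucceedsWithLocalSymbol() {
    MiniRuntime rt([](MiniRuntime&) {});
    rt.ensureInitialized();
    return rt.state() == MiniRuntime::State::Ready && rt.currentThreadId() == 1;
}

// The lookup triggers a static constructor that calls back into the runtime.
inline bool initFailsWhenConstructorReenters() {
    MiniRuntime rt([](MiniRuntime& r) { r.currentThreadId(); });
    try {
        rt.ensureInitialized();
    } catch (const std::logic_error&) {
        return true;  // the analogue of the failed assertion
    }
    return false;
}
```

Both helper functions return true: initialization completes when the lookup has no side effects, and it fails when the lookup re-enters the runtime before initialization has finished.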

(Reporter)

Comment 3

10 years ago
Created attachment 314030
patch
Assignee: nobody → ginn.chen
Status: NEW → ASSIGNED
Attachment #314030 - Flags: review?
(Reporter)

Updated

10 years ago
Assignee: ginn.chen → nobody
Status: ASSIGNED → NEW
Component: XPConnect → XULRunner
Product: Core → Toolkit
QA Contact: xpconnect → xulrunner
Version: unspecified → Trunk
(Reporter)

Updated

10 years ago
Attachment #314030 - Flags: review? → review?(wtc)

Comment 4

10 years ago
Comment on attachment 314030
patch

r=wtc.

Ginn, so the reason you define nspr_use_zone_allocator is
not to speed up startup performance?

"To avoid this happen" ==> "To prevent this from happening"
or simply "To prevent this".
Attachment #314030 - Flags: review?(wtc) → review+
(Reporter)

Comment 5

10 years ago
(In reply to comment #4)
> (From update of attachment 314030)
> r=wtc.
> 
> Ginn, so the reason you define nspr_use_zone_allocator is
> not to speed up startup performance?

Right.
(Reporter)

Comment 6

10 years ago
Comment on attachment 314030
patch

BTW: it doesn't need the approval1.9 flag, right?
Attachment #314030 - Flags: superreview?(benjamin)

Comment 7

10 years ago
Is this specific to Solaris? If so, I'd prefer an ifdef or something, since boy is it ugly.

Comment 8

10 years ago
+// NSPR searches "nspr_use_zone_allocator" during its initialization.

NSPR searches => NSPR searches _for_

(Reporter)

Comment 9

10 years ago
I don't think it's specific to Solaris.
But on Linux the sequence is different: _PR_ImplicitInitialization is called by _init() from libxul.so before execution reaches main(), and pr_FindSymbolInProg works fine.

Since it's only reproducible on Solaris, I agree to add an ifdef.

timeless: Thank you.

Comment 10

10 years ago
Adding an ifdef makes the code uglier.  It's fine to explain in the
comment that only Solaris needs this global variable.  The
PRBool variable only takes up 4 bytes.

Comment 11

10 years ago
If we need to do this for the standalone glue, we should do it in libxpcomglue_s.a, not in this single file. I just feel that we really shouldn't need to do it.
(Reporter)

Comment 12

10 years ago
We need to do this because on Solaris we get into main() before libxul.so is loaded.
We need to make sure NSPR can be initialized successfully in this case.

Updated

10 years ago
Attachment #314030 - Flags: superreview?(benjamin) → superreview-
(Reporter)

Comment 13

10 years ago
This is a blocker of Solaris xulrunner.
Severity: critical → blocker
(Reporter)

Comment 14

10 years ago
We need to make sure nspr_use_zone_allocator is defined and linked into firefox-bin, thunderbird-bin, xulrunner-bin, etc. on Solaris.
Otherwise, a slight change to nsBrowserApp.cpp will crash Firefox.

Which file should it be put in?

wtc, should it be fixed in NSPR instead?

Comment 15

10 years ago
Fixing this in NSPR means removing a (rarely used) feature
of NSPR.  In general we can't do that because of our backward
compatibility requirement.

If you really can't get your Mozilla patch checked in, let's
talk again.
(Reporter)

Comment 16

9 years ago
In general, the problem here is that PR_GetCurrentThread() is called before _PR_InitStuff() has finished, so PR_Assert fails.
It is reproducible with some of the test programs in xpcom/test. Just run 'make check' with a debug build on Solaris.
Which programs fail looks random: some of the test programs are fine, the others always fail.
With LD_DEBUG=.init, we can see that if the xpcom library's .init runs earlier than main(), everything is fine; otherwise we hit this bug.

On Solaris, we use -z lazyload to make Firefox start up faster.
Lazy loading does not work as expected if NSPR tries to dlsym nspr_use_zone_allocator.
So we define a fake nspr_use_zone_allocator symbol in nsAppRunner.cpp.

I don't think we can just add the fake symbol to every executable.
If we can't get it fixed in NSPR, the workaround I have is to disable lazyload for the xpcom library.
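
Why a symbol defined in the executable avoids the problem can be sketched with a simplified model of symbol search. It assumes, as the dlsym frames in comment 2 suggest, that a lookup which misses in the already-loaded objects causes pending lazy-loaded dependencies to be loaded and initialized. The Object struct, the lookup and makeProcess functions, and the object names are hypothetical; this does not reproduce the actual runtime linker algorithm.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one object (executable or library) in a process.
struct Object {
    std::string name;
    std::set<std::string> symbols;
    bool lazy;    // true if loading is deferred until first needed
    bool loaded;  // true once its initializers have run
};

// Simplified symbol search: walk the objects in order and load any lazy
// object that has to be searched. Returns the names of the objects that
// were loaded as a side effect of the lookup.
std::vector<std::string> lookup(std::vector<Object>& objects,
                                const std::string& symbol) {
    std::vector<std::string> loadedDuringSearch;
    for (Object& obj : objects) {
        if (obj.lazy && !obj.loaded) {
            obj.loaded = true;  // its static constructors would run here
            loadedDuringSearch.push_back(obj.name);
        }
        if (obj.symbols.count(symbol) != 0) break;  // found: stop searching
    }
    return loadedDuringSearch;
}

// Builds an illustrative process: an executable, an already-loaded NSPR,
// and a lazy-loaded xpcom library that has not been loaded yet.
std::vector<Object> makeProcess(bool executableDefinesSymbol) {
    std::set<std::string> exeSymbols;
    exeSymbols.insert("main");
    if (executableDefinesSymbol) exeSymbols.insert("nspr_use_zone_allocator");

    std::vector<Object> objects;
    objects.push_back(Object{"executable", exeSymbols, false, true});
    objects.push_back(Object{"nspr", std::set<std::string>{"PR_GetEnv"}, false, true});
    objects.push_back(Object{"xpcom", std::set<std::string>{"NS_InitXPCOM"}, true, false});
    return objects;
}
```

In this model, looking up nspr_use_zone_allocator in makeProcess(false) loads the lazy xpcom object as a side effect, while in makeProcess(true) the search stops at the executable and nothing else is loaded.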
(Reporter)

Comment 17

9 years ago
Created attachment 413357
workaround
Attachment #314030 - Attachment is obsolete: true
(Reporter)

Comment 18

8 years ago
The workaround will not work if another lazy-loaded library calls _PR_InitStuff() in its .init.

Comment 19

7 years ago
(In reply to Ginn Chen from comment #13)
> This is a blocker of Solaris xulrunner.

is this really blocking development?
Keywords: crash, hang
(Reporter)

Comment 20

7 years ago
I haven't built xulrunner for a long time.
I don't know if this bug still exists.

Running xulrunner with LD_BIND_NOW=1 might help.

It doesn't block development, since we have a workaround.
Severity: blocker → normal

Comment 21

2 years ago
XULRunner has been removed from the Mozilla tree: see https://groups.google.com/forum/#!topic/mozilla.dev.platform/_rFMunG2Bgw for context.

I am closing all the bugs currently in the XULRunner bugzilla component, in preparation for moving this component to the graveyard. If this bug is still valid in a XULRunner-less world, it will need to be moved to a different bugzilla component to be reopened.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → INCOMPLETE
(Assignee)

Updated

2 years ago
Product: Toolkit → Toolkit Graveyard