Closed
Bug 14263
Opened 25 years ago
Closed 25 years ago
[DOGFOOD] Linux/Alpha: Assertion failure: 0 == rv, at ptsynch.c:168
Categories
(NSPR :: NSPR, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: niles, Assigned: jband_mozilla)
References
Details
(Whiteboard: [PDT-])
Attachments
(2 files)
115.71 KB, text/plain
562 bytes, patch
This bug prevents Mozilla-19990917 from running at all on Linux/Alpha.
I get:
> ./apprunner
nsNativeComponentLoader: autoregistering /home/niles/mozilla/dist/bin/components
nsNativeComponentLoader: autoregistering succeeded
nsUnixToolkitService: Unknown toolkit ' '. Using 'gtk'.
nsUnixToolkitService: Using 'gtk' for the Toolkit.
NS_SetupRegistry() MOZ_TOOLKIT=gtk, WIDGET_DLL=libwidget_gtk.so,
GFX_DLL=libgfx_gtk.so
started appcores
GFX: dpi=100 t2p=0.0694444 p2t=14.4 depth=16
Using '/home/niles/mozilla/dist/bin' as the resource: base
initialized appshell
ProfileName : mozProfile
ProfileDir : /home/niles/.mozilla/mozProfile
Initialized app shell component {4a85a5d0-cddd-11d2-b7f6-00805f05ffa5},
rv=0x00000000
Got the event queue from the service
Calling gdk_input_add with event queue
Assertion failure: 0 == rv, at ptsynch.c:168
when I try to run Mozilla on Linux/Alpha.
I've tried to run gdb on it, but I can't get GDB to use my
environment variables, so component registration fails.
I've tried "set environment" to no avail.
(Am I being stupid? How do I do this?) Perhaps this
is a bug in GDB for Linux/Alpha. This is what I get:
(gdb) run
Starting program: /home/niles/mozilla/dist/bin/./simplebrowser
Assertion: "Cannot obtain unix toolkit service." (rv == NS_OK) at file
../../../../webshell/tests/viewer/nsSetupRegistry.cpp, line 285
Break: at file ../../../../webshell/tests/viewer/nsSetupRegistry.cpp, line 285
NS_SetupRegistry() MOZ_TOOLKIT=error, WIDGET_DLL=error, GFX_DLL=error
I put in a bunch of print statements and found that
this problem happens when PR_Lock(lock) gets called with
lock set to some crazy pointer value that seems well outside
the range of the program space. But without a debugger I
can't tell where it's being called from. Obviously, this pretty
much indicates it's not an NSPR bug, but I didn't know where else
to send it.
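For reference, an assertion of the form 0 == rv fires whenever the underlying pthread call returns a nonzero error code. The sketch below is not NSPR code. It assumes that the line in ptsynch.c asserts on the return value of pthread_mutex_lock, which the log suggests but does not prove. Locking through a garbage pointer is undefined behavior, so the sketch produces a nonzero rv in a safe way instead: relocking an error-checking mutex from the same thread returns EDEADLK. The names try_lock_report and relock_result are made up for illustration.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700  /* expose PTHREAD_MUTEX_ERRORCHECK in strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper (not NSPR): lock and report the raw return value
 * instead of aborting, so the failing rv can be inspected. */
static int try_lock_report(pthread_mutex_t *m)
{
    int rv = pthread_mutex_lock(m);
    if (rv != 0)
        fprintf(stderr, "pthread_mutex_lock failed: rv=%d\n", rv);
    return rv;
}

/* Returns the rv of a second lock attempt by the same thread on an
 * error-checking mutex. */
static int relock_result(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rv;

    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&m, &attr);
    pthread_mutexattr_destroy(&attr);

    rv = try_lock_report(&m);   /* first lock is expected to succeed */
    assert(0 == rv);            /* same shape as the failing assertion */

    rv = try_lock_report(&m);   /* relock by the owner: nonzero rv */

    pthread_mutex_unlock(&m);   /* the mutex is still held exactly once */
    pthread_mutex_destroy(&m);
    return rv;
}
```

Logging rv before asserting, as try_lock_report does, would at least show which error code the failing call returned, even without a working debugger.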
Assignee: srinivas → chofmann
Summary: Linux/Alpha: Assertion failure: 0 == rv, at ptsynch.c:168 → Linux/Alpha: Assertion failure: 0 == rv, at ptsynch.c:168
Assigning the bug to Chris Hofmann. Chris, can you please re-assign this appropriately?
Reporter
Comment 2•25 years ago
If I run Mozilla under the debugger I get the behavior listed in Bug #14259. I would not mark this as a duplicate yet, as I believe there are two separate bugs happening here.
Updated•25 years ago
Assignee: chofmann → dp
Comment 3•25 years ago
niles, can you try a build closer to current? dp might be able to see something in the stack trace, but let's see if it has already been fixed first.
Reporter
Comment 4•25 years ago
Oh, I guess I should have made that clear: I just tried it with yesterday's nightly build and it still behaved the same. I'm not sure how to give you more info, since if I run it under gdb it behaves like Bug #14259.
Reporter
Comment 5•25 years ago
I believe this is an SMP+(create .mozilla config) bug. I got the exact same problem when I tried to run the M10 binary on an SMP x86 machine with no .mozilla directory. I rebooted in non-SMP mode and it ran fine. Once the .mozilla directory was present it ran fine in SMP mode too! Do you have any Linux/SMP machines which you can test M10 with? Make sure you delete the .mozilla directory. Please post back with the results. This seems like a more serious bug than I first thought if it affects x86/SMP/Linux as well as Alpha/SMP/Linux. It seems that the threads are getting out of order.
Updated•25 years ago
Status: NEW → ASSIGNED
Summary: Linux/Alpha: Assertion failure: 0 == rv, at ptsynch.c:168 → [DOGFOOD] Linux/Alpha: Assertion failure: 0 == rv, at ptsynch.c:168
Target Milestone: M12
Comment 6•25 years ago
Could you attach the xpcom log for the failing case? Maybe we can see where things get out of hand. To get a log:
setenv NSPR_LOG_MODULES nsComponentManager:5
setenv NSPR_LOG_FILE xpcom.log
apprunner
<now you should have a log file xpcom.log>
For every new run, please delete the log file.
Reporter
Comment 7•25 years ago
Comment 8•25 years ago
1024[12010cf00]: nsComponentManager: FindFactory({be761f00-a3b0-11d2-996c-0080c7cb1081})
1024[12010cf00]: found (null) as 120120ff0 in factory cache.
1024[12010cf00]: Factory CreateInstance() succeeded.
1024[12010cf00]: Factory CreateInstance() FAILED.
This part sounds scary. Why does Factory CreateInstance() FAIL? Why are there two Factory CreateInstance() outputs one after another? Thanks for the log output.
Current thinking is that we won't be able to support 64 bit processors by beta 1
Comment 10•25 years ago
This is silly: 64-bit Mozilla is built and run by many members of the Mozilla community every day. Bugs are found and fixed, usually based on patches that the 64-bit platform champions submit. Given NSPR, there is little excuse for breaking 64-bit ANSI-C platforms. But as dp points out, this bug is likely a thread safety problem, not a 64-bit bug. /be
Comment 11•25 years ago
DP had the following clarification, saying that it is not just an Alpha (64-bit) bug, but rather a multi-CPU problem. I think the position of the PDT was that this was not critical for 99.99% of the in-house use. I would certainly agree that this is a crasher to handle by FCS, and as soon as possible in the beta cycle, but it will not stop the bulk of the day-in-day-out dogfood use. If Brendan thinks that the threading infrastructure is horked, then we have a porkjockey problem... but otherwise this seems like a more obscure bug than the many crashers that we have categorized as dogfood.
The following is DP's email commentary:
Just wanted to give you the full scoop on this bug. This isn't just Alpha. This is for multiprocessor Linux too. This is a symptom of threading problems. Charlie Manske said he sees freezes on his multi-CPU Windows machine. Threading problems first show up on multi-CPU machines. They affect others under weird circumstances.
1. This is a symptom of incorrect threading happening. We won't know where it will bite us.
2. We know multi-CPU Linux machines will not eat dogfood.
(1) is the reason I marked it dogfood.
Comment 12•25 years ago
xpconnect isn't being built as a component on alpha. The pthread problem is because JS isn't getting initialized properly. I've attached a patch that fixes my box here.
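As a general illustration of how a missed initialization step can surface as a pthread failure: if a subsystem's lock is only created during initialization, any caller that runs before (or without) that step ends up locking through an invalid pointer. The sketch below is not Mozilla or NSPR code and does not describe the attached patch. It shows one common guard, pthread_once, under the assumption that all callers reach the lock through a single accessor. All names (g_lock, init_subsystem, subsystem_lock, locked_increment) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical subsystem state: the lock pointer is NULL until init runs. */
static pthread_mutex_t  g_storage;
static pthread_mutex_t *g_lock = NULL;
static pthread_once_t   g_once = PTHREAD_ONCE_INIT;
static int              g_init_calls = 0;

/* Runs at most once, no matter how many threads race to call it. */
static void init_subsystem(void)
{
    int rv = pthread_mutex_init(&g_storage, NULL);
    assert(0 == rv);
    (void)rv;                    /* silence unused warning under NDEBUG */
    g_lock = &g_storage;
    g_init_calls++;
}

/* Single accessor: callers can never observe the pointer before init. */
static pthread_mutex_t *subsystem_lock(void)
{
    pthread_once(&g_once, init_subsystem);
    return g_lock;
}

/* Example caller: increments a counter under the subsystem lock and
 * returns 0 on success or the pthread error code on failure. */
static int locked_increment(int *counter)
{
    pthread_mutex_t *m = subsystem_lock();
    int rv = pthread_mutex_lock(m);
    if (rv != 0)
        return rv;
    ++*counter;
    return pthread_mutex_unlock(m);
}
```

The point of the accessor is that correctness no longer depends on some other module having been loaded and initialized first.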
Comment 13•25 years ago
Comment 14•25 years ago
*** Bug 14259 has been marked as a duplicate of this bug. ***
Updated•25 years ago
Assignee: dp → jband
Status: ASSIGNED → NEW
Comment 15•25 years ago
xpconnect isn't a component. msw@gimp.org claims that if it becomes one, this bug is solved. Thanks a zillion to msw@gimp.org (I am Ccing you on this bug).
Comment 16•25 years ago
I've checked in the Makefile.in patch to build xpconnect as a component. I've had another Linux/Alpha user confirm that his build from after the fix works.
Comment 17•25 years ago
I'm not seeing these problems anymore in my Alpha builds. Assume fixed?
Updated•24 years ago
Target Milestone: M12 → ---