Closed Bug 26035 Opened 25 years ago Closed 25 years ago

solaris: Nightly build hangs after start

Categories

(Core Graveyard :: Tracking, defect, P3)

Sun
Solaris
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: digulla, Assigned: cls)

References

Details

Both solaris builds from Thu Jan 27 2000 (see nightly/latest/) hang shortly after being started:

------- cut ---------------------------------------------------------------
> ./mozilla
.//run-mozilla.sh ./mozilla-bin
MOZILLA_FIVE_HOME=/tmp/package
LD_LIBRARY_PATH=/tmp/package::/home/dia/rvplayer5.0:/home/dia/rvplayer5.0:/export/gnu/lib
SHLIB_PATH=/tmp/package
LIBPATH=/tmp/package
MOZ_PROGRAM=./mozilla-bin
MOZ_TOOLKIT=
moz_debug=0
moz_debugger=
/apl/sfitools/defaults/cshrc: No such file or directory
**************************************************
nsNativeComponentLoader: GetFactory(/builds/client/sol251/mozilla/mozilla/obj-sparc-sun-solaris2.5.1/dist/bin/components/libxpinstall.so) Load FAILED with error: <unknown; can't get error from NSPR>
**************************************************
*** Deferring registration of sample JS components
*** Registering sidebar JS components
*** Registering sample JS components
nNCL: registering deferred (0)
nNCL: registering deferred (0)
Gdk-WARNING **: shmat failed!
Gdk-WARNING **: shmat failed!
WEBSHELL+ = 1
WEBSHELL+ = 2
------- cut -----------------------------------------

I had to edit run-mozilla.sh to be able to enter a debugger:

------- cut -----------------------------------------
> ./mozilla -g -d /export/gnu/bin/gdb
.//run-mozilla.sh -g -d /export/gnu/bin/gdb ./mozilla-bin
MOZILLA_FIVE_HOME=/tmp/package
LD_LIBRARY_PATH=/tmp/package::/home/dia/rvplayer5.0:/home/dia/rvplayer5.0:/export/gnu/lib
SHLIB_PATH=/tmp/package
LIBPATH=/tmp/package
MOZ_PROGRAM=./mozilla-bin
MOZ_TOOLKIT=
moz_debug=1
moz_debugger=/export/gnu/bin/gdb
/apl/sfitools/defaults/cshrc: No such file or directory
Could not find a debugger on your system.
------- cut -----------------------------------------

then, finally:

(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...
Gdk-WARNING **: shmat failed!
Gdk-WARNING **: shmat failed!
WEBSHELL+ = 1
WEBSHELL+ = 2
^C
Program received signal SIGINT, Interrupt.
0xef2b7400 in poll () from /usr/lib/libc.so.1
(gdb) bt
#0 0xef2b7400 in poll () from /usr/lib/libc.so.1
#1 0xef53c900 in _MD_PauseCPU () from /tmp/package/libnspr3.so
#2 0xef5376a0 in _PR_InitCPUs () from /tmp/package/libnspr3.so
#3 0xef5394b4 in _PR_NativeRunThread () from /tmp/package/libnspr3.so
(gdb)
mcafee/pav, any ideas on this one?
Summary: Hangs after start → solaris: Hangs after start
The latest M13 from did not show this. Must have happened in the last two weeks of January.
Specifically, this build doesn't show this behavior:

7146006 Jan 31 14:03 mozilla-sparc-sun-solaris2.6-M13.tar.gz

------ cut ---------------------------------------------------
> ./run-mozilla.sh
MOZILLA_FIVE_HOME=/tmp/package
LD_LIBRARY_PATH=/tmp/package::/home/dia/rvplayer5.0:/home/dia/rvplayer5.0:/export/gnu/lib
SHLIB_PATH=/tmp/package
LIBPATH=/tmp/package
MOZ_PROGRAM=mozilla-bin
MOZ_TOOLKIT=
moz_debug=0
moz_debugger=
/apl/sfitools/defaults/cshrc: No such file or directory
nNCL: registering deferred (0)
Gdk-WARNING **: shmat failed!
Gdk-WARNING **: shmat failed!
WEBSHELL+ = 1
WEBSHELL+ = 2
failed to get the xpfe.dragdrop.enable pref, assuming it is off
nsXULKeyListenerImpl::Init()
nsCollationUnix::Initialize mLocale = /en_US/en_US/en_US/en_US/en_US/C
nsCollationUnix::Initialize mCharset = ISO-8859-1
WEBSHELL+ = 3
WEBSHELL+ = 4
title string = [Mozilla]
Setting content window
browser.startup.page = 1
startpage = http://www.mozilla.org/projects/seamonkey/release-notes/m13.html
title string = [Mozilla - Mozilla]
ld.so.1: mozilla-bin: fatal: relocation error: file /usr/openwin/lib/locale/iso8859-1/xomEuro.so.2: symbol _XlcCompileResourceList: referenced symbol not found
Killed
------ cut ---------------------------------------------------

Here is another stack trace from the latest build (which also hangs):

------ cut ---------------------------------------------------
#0 0xef2b7400 in poll () from /usr/lib/libc.so.1
#1 0xedd06960 in g_main_poll (timeout=-1, use_priority=3, priority=0) at gmain.c:1031
#2 0xedd060f0 in g_main_iterate (block=1, dispatch=1) at gmain.c:808
#3 0xedd065d4 in g_main_run (loop=0x113fb0) at gmain.c:932
#4 0xede3fd5c in gtk_main () at gtkmain.c:476
#5 0xeca37c98 in nsAppShell::Run () from /tmp/package/libwidget_gtk.so
#6 0xecafcaf0 in nsAppShellService::Run () from /tmp/package/libnsappshell.so
#7 0x17864 in NS_CanRun ()
#8 0x17c88 in main ()
------ cut ---------------------------------------------------

It seems that the threading is not working as expected. If you think that makes sense, I could try to compile Mozilla here (last time I tried, it failed with lots of errors :-/)
add wtc to cc list to see if he has any ideas.
This stack trace shows that you specified the wrong build variables:

(gdb) bt
#0 0xef2b7400 in poll () from /usr/lib/libc.so.1
#1 0xef53c900 in _MD_PauseCPU () from /tmp/package/libnspr3.so
#2 0xef5376a0 in _PR_InitCPUs () from /tmp/package/libnspr3.so
#3 0xef5394b4 in _PR_NativeRunThread () from /tmp/package/libnspr3.so

These functions are conditionally compiled only in the obsolete "global threads only" version of NSPR. You must have specified GLOBAL_THREADS_ONLY=1 to NSPR's build system. What are the contents of your mozilla/nsprpub/config/my_config.mk and mozilla/nsprpub/config/my_overrides.mk?
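For anyone comparing their own traces against the one above, here is a rough sketch of that check in Python. It only looks for the three NSPR function names quoted in this comment. The helper names (`frame_functions`, `looks_like_global_threads_only`) and the frame regex are made up for illustration and are not part of NSPR or gdb, and the regex assumes frames shaped like the ones pasted in this bug.

```python
import re
from typing import List

# Function names quoted in the comment above as being compiled only in the
# "global threads only" flavour of NSPR. The list is taken from this bug
# report, not from the NSPR sources.
GLOBAL_THREADS_ONLY_SYMBOLS = {"_MD_PauseCPU", "_PR_InitCPUs", "_PR_NativeRunThread"}

# Matches gdb frames such as
#   "#1 0xef53c900 in _MD_PauseCPU () from /tmp/package/libnspr3.so"
# The address part is optional; the function name must be followed by "(".
FRAME_RE = re.compile(r"#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_][\w:~]*)\s*\(")


def frame_functions(backtrace: str) -> List[str]:
    """Return the function names found in a gdb backtrace, in the order they appear."""
    return [match.group(2) for match in FRAME_RE.finditer(backtrace)]


def looks_like_global_threads_only(backtrace: str) -> bool:
    """True if any frame is one of the functions named in the comment above."""
    return any(name in GLOBAL_THREADS_ONLY_SYMBOLS for name in frame_functions(backtrace))
```

Pasting a trace into `looks_like_global_threads_only(...)` gives a quick yes/no. A trace that ends in `gtk_main` and `g_main_poll` with `timeout=-1` would come back negative, since that is simply the GTK main loop waiting for events.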
As I said: That was the official nightly build :-) It has been fixed in the nightly yesterday but the problem itself (the hang) is still there.
resummarize
Summary: solaris: Hangs after start → solaris: Nightly build hangs after start
The nightly from Feb. 6th shows some slightly different messages, but still hangs in the same place:

Profile Wizard and Manager activites : Begin
Profile Manager : Command Line Options : Begin
Profile Manager : Command Line Options : End
ProfileManager : GetProfileDir
ProfileManager : GetProfileDir
Profile Manager : Profile Wizard and Manager activites : End
WEBSHELL+ = 1
WEBSHELL+ = 2
^C
Program received signal SIGINT, Interrupt.
0xef2b7400 in poll () from /usr/lib/libc.so.1
(gdb) bt
#0 0xef2b7400 in poll () from /usr/lib/libc.so.1
#1 0xee2e6960 in g_main_poll (timeout=-1, use_priority=3, priority=0) at gmain.c:1031
#2 0xee2e60f0 in g_main_iterate (block=1, dispatch=1) at gmain.c:808
#3 0xee2e65d4 in g_main_run (loop=0x1912d0) at gmain.c:932
#4 0xee43fd5c in gtk_main () at gtkmain.c:476
#5 0xee577f04 in nsAppShell::Run () from /tmp/package/libwidget_gtk.so
#6 0xee7cb548 in nsAppShellService::Run () from /tmp/package/components/libnsappshell.so
#7 0x186bc in NS_CanRun ()
#8 0x18b7c in main ()
(gdb)
*** Bug 26566 has been marked as a duplicate of this bug. ***
The nightly from Feb. 14th also hangs (the one from Feb. 13th crashed, see #27680, fixed now). Interestingly, though: the build from the 13th showed a startup image (Mozilla breathing fire over a city) which I had never seen before and which is missing again in the build from the 14th. Moreover, when I delete ~/.mozilla, it hangs after WEBSHELL+ = 1 (WEBSHELL+ = 2 is not printed anymore). It now always hangs after this line (before that, it would always print WEBSHELL+ = 2 and then hang), at the same place (poll() in gmain.c:1031).
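Since the point at which the hang occurs is being tracked by the last WEBSHELL+ line printed, a small helper like the following could make reports from different machines easier to compare. This is only a sketch: `last_webshell_count` is a made-up name, and it assumes the console output uses exactly the "WEBSHELL+ = N" form seen in the logs in this bug.

```python
import re

# Matches the counter lines as they appear in the console logs in this bug,
# e.g. "WEBSHELL+ = 2".
WEBSHELL_RE = re.compile(r"WEBSHELL\+ = (\d+)")


def last_webshell_count(console_output: str) -> int:
    """Return the highest WEBSHELL+ counter printed, or 0 if none was printed."""
    counts = [int(n) for n in WEBSHELL_RE.findall(console_output)]
    return max(counts, default=0)
```

Run it over the captured console output of a hanging build. With the logs above, the builds that hang stop at 1 or 2, while the working M13 build reaches 4.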
*** Bug 28400 has been marked as a duplicate of this bug. ***
Comments from: dhouston@bio.ri.ccf.org

While trying to test the fix for another bug I found that the latest (2-17-2000) build of Mozilla will not run on my Solaris 7 box. I see the following in the console:

MOZILLA_FIVE_HOME=/home/dhouston/mozilla/package
LD_LIBRARY_PATH=/home/dhouston/mozilla/package:/usr/local/lib:/usr/dt/lib:/vol/oracle/product/7.3.2/lib
SHLIB_PATH=/home/dhouston/mozilla/package
LIBPATH=/home/dhouston/mozilla/package
MOZ_PROGRAM=mozilla-bin
MOZ_TOOLKIT=
moz_debug=0
moz_debugger=
stty: : Invalid argument
nNCL: registering deferred (0)
Profile Manager : Profile Wizard and Manager activites : Begin
Profile Manager : Command Line Options : Begin
Profile Manager : Command Line Options : End
WEBSHELL+ = 1

but the program never comes up. The mozilla-bin process is running on the machine and if I truss it I see the following ad infinitum:

lwp_sema_wait(0xFC5F1E78) (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
poll(0xFC611BE0, 1, 35000) = 0
poll(0x00164AC8, 3, -1) (sleeping...)
signotifywait() (sleeping...)
poll(0xFC611BE0, 1, 35000) (sleeping...)
lwp_sema_wait(0xFC5F1E78) (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)

I just applied the latest kernel patch (106541-09) and the libthread patch (106980-09) in an attempt to fix it, but to no avail. I also set shmsys in /etc/system. M13 worked on this machine. This bug blocks confirmation of bug 28187.
Blocks: 28187
I have been having this problem with all the nightly builds since they stopped crashing from bug 13202. The stack trace is the same as what was first shown for this bug.

#0 0xef2b788c in _poll ()
#1 0xef54e51c in _MD_PauseCPU ()
#2 0xef5492ac in _PR_InitCPUs ()
#3 0xef54b0c0 in _PR_NativeRunThread ()

My own build, from the source, works (most of the time). The nightlies never have.
I'll add that this problem also occurs on Solaris 8.
It seems to be looping in the following (truss output):

poll(0xFEC41BE0, 1, 35000) (sleeping...)
lwp_sema_wait(0xFE291E78) (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
poll(0xFEC41BE0, 1, 35000) = 0
poll(0x0018FAB0, 3, -1) (sleeping...)
signotifywait() (sleeping...)
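To see at a glance which calls a truss capture like this is stuck in, a tally along the following lines might help. This is a sketch under assumptions: `summarize_truss` is a hypothetical helper, and the regex only understands the two line shapes visible in this bug ("call(args) (sleeping...)" and "call(args) = value").

```python
import re
from collections import Counter

# One truss entry: a call name, its arguments, then either "(sleeping...)"
# or "= <return value>". The trailing part is optional in case a capture
# was cut off mid-line.
CALL_RE = re.compile(
    r"([A-Za-z_]\w*)\(([^)]*)\)\s*(\(sleeping\.\.\.\)|=\s*-?\w+)?"
)


def summarize_truss(output: str) -> Counter:
    """Count (syscall, state) pairs, where state is 'sleeping', 'returned' or 'unknown'."""
    counts = Counter()
    for match in CALL_RE.finditer(output):
        name, _args, result = match.groups()
        if result is None:
            state = "unknown"
        elif result.startswith("(sleeping"):
            state = "sleeping"
        else:
            state = "returned"
        counts[(name, state)] += 1
    return counts
```

In the captures posted here, the only call that ever returns is the poll() with the 35000 ms timeout, and it returns 0, which for poll() means the timeout expired with no descriptor ready. Everything else, including the poll() with timeout -1, stays asleep.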
The nightly from the 1st of March still hangs. I tried to build Mozilla from source, but then it crashes in gtk_set_locale() (somewhere deep inside X11). Also, after deleting ~/.mozilla, I don't get the WEBSHELL+ = 2 anymore (only WEBSHELL+ = 1). Does everyone else have the same problem?
This blocks testing on Solaris.
Severity: major → blocker
The nightly build from March 10th still has the bug but shows a slightly different output in GDB:

[New LWP 2 ]
[New LWP 3 ]
[New LWP 4 ]
Profile Manager : Profile Wizard and Manager activites : Begin
Profile Manager : Command Line Options : Begin
Profile Manager : Command Line Options : End
WEBSHELL+ = 1
[New LWP 5 ]
[Switching to LWP 4 ]
[Switching to LWP 1 ]

As you can see, it actually switches between LWP 4 and 1, which it did not do before. But after that, it still hangs in poll(), as usual.
Updating QA Contact. I do not have Solaris.
QA Contact: leger → mcafee
Current CVS builds (and runs) on Solaris 2.6/2.7. The problem appears to be that the nightly build is from MARCH.
[richb - 4/21/00] We (Sun) also have no problems building (and running) the latest Mozilla releases on Solaris, both with the Gnu compilers and the Sun Workshop 5.0 compilers. I suggest that this bug should be closed as fixed.
Since I have no Solaris box near me anymore, I cannot check this ATM. Since I won't have access to a Solaris box for some time, I've closed the bug :-) If it doesn't work, I'll complain when I can verify it.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
By no means should this bug be considered fixed. There has been no working nightly build for sparc-solaris since January. That it is possible to compile from the source does not address this bug at all. In fact, no new nightly builds have been delivered at all since March. I would not call that "FIXED". If there is no plan to ever support Solaris, you should say so, so that I can stop wasting my time.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The nightly build uses an old build script; this bug is orthogonal to whatever Sun is able to do in their backyard. The nightly build script is some weird way of building, and I have yelled at the build group repeatedly about this. chofmann: either put someone on this or stop the nightly builds. I would rather publish nothing than tease people with useless bits.
[richb - 5/17/00] Solaris nightlies are currently broken with Sun compilers because of the String API rewrite. See bug 39424. Tinderbox no longer seems to show some of the Sun builds (nebiros and bismarch seem to have disappeared). Because the Solaris O/S is of secondary importance to Netscape (i.e. they are concentrating on their Tier-1 platforms: Windows, Mac and Linux), we at Sun have swapped over to just building with Gnu compilers. This is the same compiler that is used for the Linux platform, so when changes break the build there, they usually get fixed pretty quickly, and we can benefit from this. We are a limited-size team, and we don't have the time to have an engineer continually trying to fix build breaks. This doesn't fix problems resulting from code that's checked in with bogus endian-ness or alignment problems on RISC architectures, but it should give us a greater chance of getting our nightly builds to completion.
->seawood
Assignee: chofmann → cls
Status: REOPENED → NEW
I just downloaded the latest nightly build from ftp://ftp.mozilla.org/pub/mozilla/nightly/latest/ and it works fine for me.

-rw-rw-r-- 1 22 12614112 Jun 15 15:34 mozilla-sparc-sun-solaris2.6.tar.gz

Nebiros is back up on the SeaMonkey-Ports page building with the WS5.0 compiler. The builds complete but it is currently failing one of the DOM Conversion tests.
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard