Closed Bug 26035 Opened 25 years ago Closed 24 years ago

solaris: Nightly build hangs after start

Categories

(Core Graveyard :: Tracking, defect, P3)

Sun
Solaris
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: digulla, Assigned: cls)

Both Solaris builds from Thu Jan 27 2000 (see nightly/latest/) hang
shortly after startup:

------- cut ---------------------------------------------------------------
> ./mozilla
.//run-mozilla.sh ./mozilla-bin
MOZILLA_FIVE_HOME=/tmp/package
 
LD_LIBRARY_PATH=/tmp/package::/home/dia/rvplayer5.0:/home/dia/rvplayer5.0:/export/gnu/lib
       SHLIB_PATH=/tmp/package
          LIBPATH=/tmp/package
      MOZ_PROGRAM=./mozilla-bin
      MOZ_TOOLKIT=
        moz_debug=0
     moz_debugger=
/apl/sfitools/defaults/cshrc: No such file or directory
**************************************************
nsNativeComponentLoader:
GetFactory(/builds/client/sol251/mozilla/mozilla/obj-sparc-sun-solaris2.5.1/dist/bin/components/libxpinstall.so)
Load FAILED with error: <unknown; can't get error from NSPR>
**************************************************
*** Deferring registration of sample JS components
*** Registering sidebar JS components
*** Registering sample JS components
nNCL: registering deferred (0)
nNCL: registering deferred (0)

Gdk-WARNING **: shmat failed!

Gdk-WARNING **: shmat failed!
WEBSHELL+ = 1
WEBSHELL+ = 2
------- cut -----------------------------------------

I had to edit run-mozilla.sh to be able to enter a debugger:

------- cut -----------------------------------------
> ./mozilla -g -d /export/gnu/bin/gdb 
.//run-mozilla.sh -g -d /export/gnu/bin/gdb ./mozilla-bin
MOZILLA_FIVE_HOME=/tmp/package
 
LD_LIBRARY_PATH=/tmp/package::/home/dia/rvplayer5.0:/home/dia/rvplayer5.0:/export/gnu/lib
       SHLIB_PATH=/tmp/package
          LIBPATH=/tmp/package
      MOZ_PROGRAM=./mozilla-bin
      MOZ_TOOLKIT=
        moz_debug=1
     moz_debugger=/export/gnu/bin/gdb
/apl/sfitools/defaults/cshrc: No such file or directory
Could not find a debugger on your system.
------- cut -----------------------------------------

then, finally:

(no debugging symbols found)...(no debugging symbols found)...
(no debugging symbols found)...
Gdk-WARNING **: shmat failed!

Gdk-WARNING **: shmat failed!
WEBSHELL+ = 1
WEBSHELL+ = 2
^C
Program received signal SIGINT, Interrupt.
0xef2b7400 in poll () from /usr/lib/libc.so.1
(gdb) bt
#0  0xef2b7400 in poll () from /usr/lib/libc.so.1
#1  0xef53c900 in _MD_PauseCPU () from /tmp/package/libnspr3.so
#2  0xef5376a0 in _PR_InitCPUs () from /tmp/package/libnspr3.so
#3  0xef5394b4 in _PR_NativeRunThread () from /tmp/package/libnspr3.so
(gdb)
mcafee/pav, any ideas on this one?
Summary: Hangs after start → solaris: Hangs after start
The latest M13 did not show this. It must have been introduced in the last two weeks
of January.
Specifically, this build doesn't show this behavior:

7146006 Jan 31 14:03 mozilla-sparc-sun-solaris2.6-M13.tar.gz

------ cut ---------------------------------------------------
> ./run-mozilla.sh 
MOZILLA_FIVE_HOME=/tmp/package
 
LD_LIBRARY_PATH=/tmp/package::/home/dia/rvplayer5.0:/home/dia/rvplayer5.0:/export/gnu/lib
       SHLIB_PATH=/tmp/package
          LIBPATH=/tmp/package
      MOZ_PROGRAM=mozilla-bin
      MOZ_TOOLKIT=
        moz_debug=0
     moz_debugger=
/apl/sfitools/defaults/cshrc: No such file or directory
nNCL: registering deferred (0)

Gdk-WARNING **: shmat failed!

Gdk-WARNING **: shmat failed!
WEBSHELL+ = 1
WEBSHELL+ = 2
failed to get the xpfe.dragdrop.enable pref, assuming it is off
nsXULKeyListenerImpl::Init()
nsCollationUnix::Initialize mLocale = /en_US/en_US/en_US/en_US/en_US/C
nsCollationUnix::Initialize mCharset = ISO-8859-1
WEBSHELL+ = 3
WEBSHELL+ = 4
title string = [Mozilla]
Setting content window
browser.startup.page = 1
startpage = http://www.mozilla.org/projects/seamonkey/release-notes/m13.html
title string = [Mozilla - Mozilla]
ld.so.1: mozilla-bin: fatal: relocation error: file
/usr/openwin/lib/locale/iso8859-1/xomEuro.so.2: symbol _XlcCompileResourceList:
referenced symbol not found
Killed
------ cut ---------------------------------------------------

Here is another stack trace from the latest build (which also hangs):

------ cut ---------------------------------------------------
#0  0xef2b7400 in poll () from /usr/lib/libc.so.1
#1  0xedd06960 in g_main_poll (timeout=-1, use_priority=3, priority=0)
    at gmain.c:1031
#2  0xedd060f0 in g_main_iterate (block=1, dispatch=1) at gmain.c:808
#3  0xedd065d4 in g_main_run (loop=0x113fb0) at gmain.c:932
#4  0xede3fd5c in gtk_main () at gtkmain.c:476
#5  0xeca37c98 in nsAppShell::Run () from /tmp/package/libwidget_gtk.so
#6  0xecafcaf0 in nsAppShellService::Run () from /tmp/package/libnsappshell.so
#7  0x17864 in NS_CanRun ()
#8  0x17c88 in main ()
------ cut ---------------------------------------------------

It seems that the threading is not working as expected. If you think that
makes sense, I could try to compile Mozilla here (last time I tried, it
failed with lots of errors :-/)
Adding wtc to the cc list to see if he has any ideas.
This stack trace shows that you specified the
wrong build variables:
    (gdb) bt
    #0  0xef2b7400 in poll () from /usr/lib/libc.so.1
    #1  0xef53c900 in _MD_PauseCPU () from /tmp/package/libnspr3.so
    #2  0xef5376a0 in _PR_InitCPUs () from /tmp/package/libnspr3.so
    #3  0xef5394b4 in _PR_NativeRunThread () from /tmp/package/libnspr3.so

These functions are conditionally compiled only in the
obsolete "global threads only" version of NSPR.
You must have specified GLOBAL_THREADS_ONLY=1 to NSPR's build
system.  What are the contents of your mozilla/nsprpub/config/my_config.mk
and mozilla/nsprpub/config/my_overrides.mk?
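
As a rough illustration of the check described in this comment, the following Python sketch scans a gdb backtrace for the NSPR frames quoted above. The helper names are hypothetical, and the symbol list is an assumption: it is simply the three NSPR frames from this trace, not an authoritative inventory of what GLOBAL_THREADS_ONLY compiles in.

```python
import re

# Assumption: these are only the NSPR frames quoted in the trace above,
# which the comment says exist solely in the "global threads only" build.
GLOBAL_THREADS_ONLY_SYMBOLS = {
    "_MD_PauseCPU",
    "_PR_InitCPUs",
    "_PR_NativeRunThread",
}

# Matches gdb frame lines such as
#   "#1  0xef53c900 in _MD_PauseCPU () from /tmp/package/libnspr3.so"
# and captures the function name. Continuation lines ("at gmain.c:1031")
# do not start with "#<n>" and are skipped.
FRAME_RE = re.compile(
    r"^\s*#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_][\w:]*)\s*\("
)


def frame_symbols(backtrace):
    """Return the function names of all frames in a gdb 'bt' listing."""
    symbols = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line)
        if match:
            symbols.append(match.group(1))
    return symbols


def looks_like_global_threads_only(backtrace):
    """True if any frame is one of the symbols listed above."""
    return any(
        symbol in GLOBAL_THREADS_ONLY_SYMBOLS
        for symbol in frame_symbols(backtrace)
    )
```

Run against the two traces in this report, a check like this would flag the first one (frames inside libnspr3.so) but not the second one (frames inside gtk_main), which is consistent with the two traces coming from differently built binaries.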
As I said: that was the official nightly build :-) It was fixed in
yesterday's nightly, but the problem itself (the hang) is still there.
resummarize
Summary: solaris: Hangs after start → solaris: Nightly build hangs after start
The nightly from Feb. 6th shows some slightly different messages, but still
hangs in the same place:

Profile Wizard and Manager activites : Begin
Profile Manager : Command Line Options : Begin
Profile Manager : Command Line Options : End
ProfileManager : GetProfileDir
ProfileManager : GetProfileDir
Profile Manager : Profile Wizard and Manager activites : End
WEBSHELL+ = 1
WEBSHELL+ = 2
^C
Program received signal SIGINT, Interrupt.
0xef2b7400 in poll () from /usr/lib/libc.so.1
(gdb) bt
#0  0xef2b7400 in poll () from /usr/lib/libc.so.1
#1  0xee2e6960 in g_main_poll (timeout=-1, use_priority=3, priority=0)
    at gmain.c:1031
#2  0xee2e60f0 in g_main_iterate (block=1, dispatch=1) at gmain.c:808
#3  0xee2e65d4 in g_main_run (loop=0x1912d0) at gmain.c:932
#4  0xee43fd5c in gtk_main () at gtkmain.c:476
#5  0xee577f04 in nsAppShell::Run () from /tmp/package/libwidget_gtk.so
#6  0xee7cb548 in nsAppShellService::Run ()
   from /tmp/package/components/libnsappshell.so
#7  0x186bc in NS_CanRun ()
#8  0x18b7c in main ()
(gdb) 
*** Bug 26566 has been marked as a duplicate of this bug. ***
The nightly from Feb. 14th also hangs (the one from Feb. 13th crashed, see #27680, fixed
now). Interestingly, though, the build from the 13th showed a startup image
(Mozilla breathing fire over a city) which I had never seen before and which is again
missing in the build from the 14th.

Moreover, when I delete ~/.mozilla, it hangs after WEBSHELL+ = 1 (WEBSHELL+ = 2
is not printed anymore). It now always hangs after this line (before that, it
would always print WEBSHELL+ = 2 and then hang), at the same place (poll() in
gmain.c:1031).
*** Bug 28400 has been marked as a duplicate of this bug. ***
Comments from: dhouston@bio.ri.ccf.org

While trying to test the fix for another bug I found that the latest (2-17-2000)
build of Mozilla will not run on my Solaris 7 box. I see the following in the
console:

MOZILLA_FIVE_HOME=/home/dhouston/mozilla/package
 
LD_LIBRARY_PATH=/home/dhouston/mozilla/package:/usr/local/lib:/usr/dt/lib:/vol/oracle/product/7.3.2/lib
       SHLIB_PATH=/home/dhouston/mozilla/package
          LIBPATH=/home/dhouston/mozilla/package
      MOZ_PROGRAM=mozilla-bin
      MOZ_TOOLKIT=
        moz_debug=0
     moz_debugger=
stty: : Invalid argument
nNCL: registering deferred (0)
Profile Manager : Profile Wizard and Manager activites : Begin
Profile Manager : Command Line Options : Begin
Profile Manager : Command Line Options : End
WEBSHELL+ = 1

but the program never comes up. The mozilla-bin process is running on the
machine and if I truss it I see the following ad infinitum:

lwp_sema_wait(0xFC5F1E78)       (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
poll(0xFC611BE0, 1, 35000)                      = 0
poll(0x00164AC8, 3, -1)         (sleeping...)
signotifywait()                 (sleeping...)
poll(0xFC611BE0, 1, 35000)      (sleeping...)
lwp_sema_wait(0xFC5F1E78)       (sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)

I just applied the latest kernel patch (106541-09) and the libthread patch
(106980-09) in an attempt to fix it, but to no avail. I also set shmsys in
/etc/system.

M13 worked on this machine. 

This bug blocks confirmation of bug 28187.
Blocks: 28187
I have been having this problem with all the nightly builds since
they stopped crashing from bug 13202. The stack trace is the same as
what was first shown for this bug.

#0  0xef2b788c in _poll ()
#1  0xef54e51c in _MD_PauseCPU ()
#2  0xef5492ac in _PR_InitCPUs ()
#3  0xef54b0c0 in _PR_NativeRunThread ()

My own build, from the source, works (most of the time).
The nightlies never have.
I'll add that this problem also occurs on Solaris 8.
It seems to be looping in the following (truss output):

poll(0xFEC41BE0, 1, 35000)	(sleeping...)
lwp_sema_wait(0xFE291E78)	(sleeping...)
door_return(0x00000000, 0, 0x00000000, 0) (sleeping...)
poll(0xFEC41BE0, 1, 35000)			= 0
poll(0x0018FAB0, 3, -1)		(sleeping...)
signotifywait()			(sleeping...)
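
One way to read the truss output above: a poll that returns `= 0` had its 35000 ms timeout expire with no descriptor ready, and a poll whose timeout argument is -1 waits indefinitely. The sketch below is a small hypothetical Python helper (not part of any Mozilla or Solaris tooling) that parses lines in the format shown and separates those two cases. The regular expression only covers the two line shapes that appear in this report.

```python
import re

# Covers the two shapes seen in this report:
#   name(args)   (sleeping...)
#   name(args)   = <integer return value>
TRUSS_RE = re.compile(
    r"^(\w+)\((.*?)\)\s*(?:\((sleeping\.\.\.)\)|=\s*(-?\d+))\s*$"
)


def parse_truss(text):
    """Parse truss lines into dicts with name, args, sleeping and ret."""
    calls = []
    for line in text.splitlines():
        match = TRUSS_RE.match(line.strip())
        if not match:
            continue  # ignore anything that is not in one of the two shapes
        name, args, sleeping, ret = match.groups()
        calls.append({
            "name": name,
            "args": [a.strip() for a in args.split(",")] if args else [],
            "sleeping": sleeping is not None,
            "ret": int(ret) if ret is not None else None,
        })
    return calls


def indefinite_polls(calls):
    """poll() calls whose last argument (the timeout) is -1."""
    return [
        c for c in calls
        if c["name"] == "poll" and c["args"] and c["args"][-1] == "-1"
    ]


def timed_out_polls(calls):
    """poll() calls that returned 0, i.e. the timeout expired."""
    return [c for c in calls if c["name"] == "poll" and c["ret"] == 0]
```

Applied to the six lines above, this separates the poll on three descriptors with timeout -1 from the single-descriptor poll that keeps timing out after 35 seconds. The former matches the `g_main_poll (timeout=-1, ...)` frame in the earlier gdb traces.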
The nightly from 1st of March still hangs. I tried to build Mozilla from source
but then it crashes in gtk_set_locale() (somewhere deep inside X11).

Also, after deleting ~/.mozilla, I don't get the WEBSHELL+ = 2 anymore (only
WEBSHELL+ = 1). Does everyone else have the same problem?
This blocks testing on Solaris.
Severity: major → blocker
The nightly build from March 10th still has the bug but shows a slightly
different output in GDB:

[New LWP    2        ]
[New LWP    3        ]
[New LWP    4        ]
Profile Manager : Profile Wizard and Manager activites : Begin
Profile Manager : Command Line Options : Begin
Profile Manager : Command Line Options : End
WEBSHELL+ = 1

[New LWP    5        ]
[Switching to LWP    4        ]
[Switching to LWP    1        ]

As you can see, it actually switches between LWP 4 and 1 which it did not do
before. But after that, it still hangs in poll(), as usual.
Updating QA Contact.  I do not have Solaris.
QA Contact: leger → mcafee
Current CVS builds (and runs) on Solaris 2.6/2.7.
The problem appears to be that the nightly build is from March.
[richb - 4/21/00]
We (Sun) also have no problems building (and running) the latest Mozilla
releases on Solaris, both with the Gnu compilers and the Sun Workshop 5.0
compilers. I suggest that this bug should be closed as fixed.
Since I have no Solaris box near me anymore, I cannot check this ATM.
Since I won't have access to a Solaris box for some time, I've closed the
bug :-) If it doesn't work, I'll complain when I can verify it.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
By no means should this bug be considered fixed.
There has been no working nightly build for sparc-solaris since January.
That it is possible to compile from the source does not address this bug at all.
In fact, no new nightly builds have been delivered at all since March. I would
not call that "FIXED". If there is no plan to ever support Solaris, you should
say so, so that I can stop wasting my time.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The nightly build uses an old build script; this bug
is orthogonal to whatever Sun is able to do in their
backyard.  The nightly build script is some weird way
of building; I have yelled at the build group repeatedly
about this.  chofmann:  either put someone on this or
stop the nightly builds. I would rather publish nothing
than tease people with useless bits.
[richb - 5/17/00]
Solaris nightlies are currently broken with Sun compilers because of the 
String API rewrite. See bug 39424. Tinderbox no longer seems to show some
of the Sun builds (nebiros and bismarch seem to have disappeared).
Because the Solaris O/S is of secondary importance to Netscape (i.e. they
are concentrating on their Tier-1 platforms; Windows, Mac and Linux), we
at Sun have swapped over to just building with Gnu compilers. This is the
same compiler that is used for the Linux platform, so when changes break
the build there, they usually get fixed pretty quickly, and we can benefit 
from this. We are a limited size team, and we don't have the time to have
an engineer continually trying to fix build breaks. This doesn't fix problems
resulting from code that's checked in with bogus endian-ness or alignment
problems on RISC architectures, but it should give us a greater chance of
getting our nightly builds to completion.
->seawood
Assignee: chofmann → cls
Status: REOPENED → NEW
I just downloaded the latest nightly build from
ftp://ftp.mozilla.org/pub/mozilla/nightly/latest/ and it works fine for me. 

 -rw-rw-r--   1 22     12614112   Jun 15 15:34  
mozilla-sparc-sun-solaris2.6.tar.gz

Nebiros is back up on the SeaMonkey-Ports page building with the WS5.0
compiler.  The builds complete but it is currently failing one of the DOM
Conversion tests.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
Product: Core → Core Graveyard