Closed
Bug 124377
Opened 23 years ago
Closed 23 years ago
LiveLock in Xsun
Categories
(Core Graveyard :: X-remote, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: animalfriend, Assigned: blizzard)
Details
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; SunOS sun4u; en-US; rv:0.9.8) Gecko/20020205
BuildID: 2002020508
(I have a workaround for this bug, see below)
When Mozilla 0.9.8 has been running for some time, the whole application freezes,
using 100% CPU, half of it spent in Xsun and the rest in mozilla-bin. There
is no error message, and this state continues until Mozilla is killed (SIGTERM,
Ctrl-C).
Unfortunately, there is no specific action that triggers this behaviour, though
it often happens when I change to a new page, but sometimes also when scrolling
a (rather simple) page.
I have seen this happen on SunRays (aka X-Terminals) and Ultra3 stations.
Reproducible: Sometimes
Steps to Reproduce:
1. Go to the gif-buster
2. Wait a bit, freeze normally occurs after 4-10 reloads
Actual Results: Freeze, have to kill process.
After restarting, I noticed:
- My history is blank (interrupted during page load when history file is updated?)
- I have a file with length 0 in my disk cache (lots of space free, so disk is
not the problem)
Expected Results: Anything but a lockup. ;-)
Workaround:
Start Mozilla with the --sync option (synchronized X commands). This seems to hurt
performance (networked X) but makes Mozilla rock solid.
Environment:
---------- sysinfo
Manufacturer (Short) is Sun
Manufacturer (Full) is Sun Microsystems
System Model is Fire 280R
Main Memory is 4.0 GB
Virtual Memory is 7.2 GB
Number of CPUs is 2
CPU Type is sparcv9+vis2
App Architecture is sparc
Kernel Architecture is sun4u
OS Name is SunOS
OS Version is 5.8
OS Distribution is Solaris 8 ... SPARC
Kernel Version is ..... 64-bit
----------
$DISPLAY is set to ":6.0" (SunRay Terminal)
Freezing appears less frequently on single-CPU systems (Ultra3), so maybe there
is a concurrency problem.
Comment 1 • 23 years ago (Reporter)
Maybe this bug is connected with bug 102552.
Comment 2 • 23 years ago
1. Wanna try to update your GDK/GTK+ library versions, please ?
2. Did you apply the recommended patches for Solaris 2.8 ?
(OT: SunRays are no Xterminals - they are completely different architectures...)
Comment 3 • 23 years ago (Reporter)
1. Cannot update GTK+, because I am a student and this is my University's server
(no admin). GTK is 1.2.6
2. I checked the patch status; nearly all are installed and up to date (5 lagging
behind by one version or so)
I could narrow down the bug a bit by running truss mozilla --no-shm (I suspected the
bug was connected with shared memory). Here's a snippet:
----------
18129: open("/tmp/.X11-pipe/X6", O_RDWR) = 6
18129: fstat(6, 0xFFBEEAA0) = 0
18129: uname(0xFFBEE900) = 1
18129: fcntl(6, F_SETFD, 0x00000001) = 0
18129: access("/home/cip/96/jnlukasc/.Xauthority", 4) = 0
18129: open("/home/cip/96/jnlukasc/.Xauthority", O_RDONLY) = 7
18129: fstat64(7, 0xFFBEE9C8) = 0
18129: ioctl(7, TCGETA, 0xFFBEE954) Err#25 ENOTTY
18129: read(7, "\0\0\00483BC1E u\001 6\0".., 8192) = 8192
18129: read(7, " s w s k 4 G j 2 K o / 3".., 8192) = 5777
18129: read(7, 0x0007FE4C, 8192) = 0
18129: llseek(7, 0, SEEK_CUR) = 13969
18129: close(7) = 0
18129: writev(6, 0xFFBEF008, 4) = 48
18129: fstat64(6, 0xFFBEEE98) = 0
18129: fcntl(6, F_SETFL, 0x00000080) = 0
18129: read(6, "01\0\0\v\0\0\0 *", 8) = 8
18129: read(6, "\0\019\n04\0\0\0\0 ?FFFF".., 168) = 168
18129: write(6, " 7\0\00504\0\0\0\0\0\0 %".., 64) = 64
18129: read(6, "01\0\002\0\0\0\0\0\0\0\0".., 32) = 32
18129: read(6, "01\b\003\0\00219\0\0\01F".., 32) = 32
18129: readv(6, 0xFFBEF020, 2) = 2148
18129: writev(6, 0xFFBEEEA4, 3) = 20
18129: read(6, "01\0\004\0\0\0\0\0\0\0\0".., 32) = 32
18129: getpid() = 18129 [18118]
18129: writev(6, 0xFFBEEF94, 3) = 20
18129: read(6, "01\0\005\0\0\0\00188\0\0".., 32) = 32
18129: writev(6, 0xFFBEEE64, 3) = 20
18129: read(6, "01\0\006\0\0\0\00188\0\0".., 32) = 32
18129: getuid() = 30165 [30165]
18129: writev(6, 0xFFBEEE54, 3) = 32
18129: read(6, "01\0\0\b\0\0\0\00194 [\0".., 32) = 32
18129: write(6, "9401\002\001\0\0", 8) = 8
18129: read(6, "01\0\0\t\0\0\0\0\0\0\010".., 32) = 32
18129: open("/tmp/.X11-sme16", O_RDWR) = 7
18129: mmap(0x00000000, 65536, PROT_READ|PROT_WRITE, MAP_SHARED, 7, 0) = 0xFE5B0000
18129: close(7) = 0
18129: unlink("/tmp/.X11-sme16") = 0
18129: write(6, "9402\002\001\0\0", 8) = 8
18129: read(6, "01\0\0\n\0\0\0\0\0\0\0\0".., 32) = 32
18129: uname(0xFFBEEA88) = 1
18129: write(6, " ", 1) = 1
18129: read(6, "019B\00F\0\0\0\0\0\0\091".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\011\0\0\0\0\0\0\095".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\012\0\0\0\0\0\0\0D6".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\013\0\0\0\0\0\0\096".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, "01\0\014\0\0\0\0\0\001 W".., 32) = 32
18129: write(6, " ", 1) = 1
18129: read(6, 0xFFBEF15C, 32) Err#11 EAGAIN
----------
I am not an X expert, but I'd say that MIT-SHM gets used (huh?). EAGAIN means
"out-of-process", strange in a read IMHO. In the LiveLock, there are lots of
read(6) operations with EAGAIN.
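To put a number on "lots of read(6) operations with EAGAIN", a saved truss log can be tallied with a small script. This is only a sketch: it assumes the log lines look like the snippet above (`PID: read(FD, ...) Err#11 EAGAIN`), and the function name `count_eagain_reads` is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape, taken from the truss snippet in this comment:
#   18129: read(6, 0xFFBEF15C, 32) Err#11 EAGAIN
EAGAIN_READ = re.compile(r"^\s*(\d+):\s+read\((\d+),.*\bErr#\d+\s+EAGAIN\s*$")


def count_eagain_reads(truss_lines):
    """Count read() calls that failed with EAGAIN, keyed by (pid, fd)."""
    counts = Counter()
    for line in truss_lines:
        match = EAGAIN_READ.match(line)
        if match:
            pid, fd = int(match.group(1)), int(match.group(2))
            counts[(pid, fd)] += 1
    return counts
```

Feeding it the lines of a log captured during the freeze (for example `count_eagain_reads(open(path))`, where `path` is wherever the truss output was saved) shows which descriptor is being polled and how often.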
Am I the only one with this problem? What else could cause this (KDE-2.2)?
With the workaround --sync Mozilla is rock-solid for me (better than <4.7),
therefore I'd say this is a minor glitch, maybe just document it in Release notes.
OT: Difference SunRay - XTerminal is IMHO only that there runs no Unix on the
appliance, just the XServer, but principle is the same.
Comment 4 • 23 years ago
> 1. Cannot update GTK+, because I am a student and this is my University's
> server (no admin). GTK is 1.2.6
AFAIK you can always install newer versions of GDK/GTK+ in your homedir and use
them via setting the LD_LIBRARY_PATH to the location of the newer libs.
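As a rough illustration of the LD_LIBRARY_PATH approach, the sketch below builds an environment in which a private library directory is searched first. The helper name `env_with_private_libs` and the directory paths are hypothetical; nothing here is part of Mozilla's own start script.

```python
import os


def env_with_private_libs(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    # Prepend so the runtime linker finds the private GDK/GTK+ build first.
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + existing if existing else "")
    return env
```

The result can be passed as `env=` to `subprocess.run(["mozilla"], ...)`; running `ldd` on the binary under the same environment shows which libraries are actually picked up.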
----
> I am not an X expert, but I'd say that MIT-SHM gets used (huh?)
Uhm, no.
The use of /tmp/.X11-sme16 means that Sun's shared memory _transport_ is being
used instead of CPU-intensive piping of data (packets are exchanged via shared
memory).
----
> EAGAIN means "out-of-process", strange in a read IMHO. In the LiveLock,
> there are lots of read(6) operations with EAGAIN.
From the read(2) manual page (% man -s2 read):
-- snip --
When attempting to read from an empty pipe (or FIFO):
[snip]
o If some process has the pipe open for writing and
O_NONBLOCK is set, read() returns -1 and sets errno to
EAGAIN.
-- snip --
This means the pipe simply has no new data - nothing special.
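The behaviour described in the manual page can be reproduced outside of Mozilla. The following is a minimal sketch in Python rather than anything taken from Xlib or Mozilla: it opens a pipe, keeps the write end open, sets O_NONBLOCK on the read end and reads while the pipe is empty. The function name is made up for illustration.

```python
import errno
import os


def read_empty_nonblocking_pipe():
    """Return the errno produced by reading an empty pipe with O_NONBLOCK set."""
    read_fd, write_fd = os.pipe()      # the write end stays open during the read
    os.set_blocking(read_fd, False)    # sets O_NONBLOCK on the read end
    try:
        os.read(read_fd, 32)           # nothing has been written yet
    except BlockingIOError as exc:
        return exc.errno
    finally:
        os.close(read_fd)
        os.close(write_fd)
    return None
```

A caller that gets EAGAIN is expected to wait for the descriptor to become readable (for example with select or poll) before trying again; retrying immediately in a loop only burns CPU.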
----
Do you know the current Xsun patch revision, e.g. what does
-- snip --
% showrev -p | fgrep 108652-
-- snip --
say ?
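If the showrev output needs to be checked on several machines, the revision can be pulled out with a short script. This sketch only looks for the `108652-NN` token and makes no other assumption about the output layout; the function name `patch_revisions` is hypothetical.

```python
import re


def patch_revisions(showrev_output, patch_id="108652"):
    """Return the sorted revision numbers of patch_id found in the given text."""
    pattern = re.compile(r"\b" + re.escape(patch_id) + r"-(\d+)\b")
    return sorted({int(rev) for rev in pattern.findall(showrev_output)})
```

An empty list means the patch does not show up in the output at all.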
Comment 5 • 23 years ago
I forgot to mention something:
Every glib/GDK/GTK+ version below 1.2.8 will result in a more or less unstable
mozilla (unless you use an Xlib toolkit Zilla, which does not rely on the GDK/GTK+
libraries) ...
Comment 6 • 23 years ago (Reporter)
1. Tried that, no improvement. I compiled 1.2.10 of glib and gtk, used the LD_
variable and checked with ldd that they are picked up; zilla was functional but
locked up after some time.
Please add the required gtk version to the README then (the requirements section for
Solaris only lists patches).
2. Problem must be linked with transport somehow. I used loopback and it ran
stable (is loopback faster than sync???).
3. Uh, sry about that read() thing (5 yrs study of CS wasted). I also saw a
SIGALRM on some mutex that is repeated in the truss output; would that help? Or
should I add the truss output as an attachment?
4. Patch is -47. Probably recommended patch ball, pretty much everything else up
to date.
5. It could be that the Solaris version was built on campus. I will try to get in contact
with the admins to see if they did something special (==wrong). Am I on the only
installation with this problem????
6. A friend just tested with KDE1, same error, so KDE2 is also not the culprit.
Comment 7 • 23 years ago
Just chiming in with a me too. I noticed this same behavior when upgrading
patches from 108652-40 to 108652-46. As Frankenstein would say: -40 good, -46
bad. When -47 came out, I tried that and it exhibits the same behavior. Am
willing to collect more info on machine state with either patches installed if
someone gives me some direction on what they want (truss, etc.).
Have backed out to patch level -40 and mozilla runs fine.
I should also say that this problem has persisted through three milestones of
mozilla, 0.9.6-.8 (I tend to use the precompiled one provided by mozilla.org).
Comment 8 • 23 years ago (Reporter)
Thank god someone else has this problem and I'm not crazy.
I can confirm your findings: 6 and 8 (self- and precompiled) both had same
problem, newest gtk didn't help either. I built gtk myself using POSIX threads
as stated on this site.
I cannot patch back XSun (without getting killed by root), but if that fixes
the prob, I guess we tracked it down to a Sun X bug. Agreed? Or are there
precompiled and proven to work gtk-libs somewhere we should test?
Add DISPLAY reroute code in mozilla.sh as patch for SUN architecture for 1.0???
Comment 9 • 23 years ago
Well, you could check whether the GDK/GTK+-free Xlib toolkit in Zilla also suffers
from this problem.
Simply build it with:
-- snip --
# unpack source tarball
% ./configure --enable-defaut-toolkit=xlib
% gmake
-- snip --
If this Zilla works then the GDK/GTK+ libraries (or the GDK/GTK+-specific code
in Mozilla) have a bug ...
Summary: LiveLock in XSun → LiveLock in Xsun
Comment 10 • 23 years ago
BTW: There is a minor typo in my last comment:
It's
-- snip --
configure --enable-default-toolkit=xlib
-- snip --
an 'l' was missing...
Comment 11 • 23 years ago
We too have been seeing this issue. We noticed it because a 10/1 MU
version (108652-38) was rock solid, while the latest recommended patches kept
hanging Xsun with both Mozilla and OpenOffice. We were at 108652-51 (the latest
on the Sun site), and users reported hangs frequently (sometimes several per
hour). We are backing down to -40 to see if that fixes the problem.
Comment 12 • 23 years ago (Reporter)
Brian, did you try changing the $DISPLAY to use local loopback
(like "hostname:0.0")? Did that fix the problem?
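For anyone who wants to script the $DISPLAY change, here is a small sketch. It rewrites a local display string such as ":6.0" into a host-qualified one and leaves displays that already name a remote host alone. The function name `host_qualified_display` is made up, and whether a host-qualified display really avoids the local transport depends on the Xlib in use, so treat that part as an assumption.

```python
import socket


def host_qualified_display(display, host=None):
    """Turn ':6.0' or 'unix:6.0' into '<host>:6.0'; keep other displays as-is."""
    host_part, sep, rest = display.rpartition(":")
    if not sep:
        raise ValueError("not a display string: %r" % display)
    if host_part and host_part != "unix":
        return display                 # already names a host
    return "%s:%s" % (host or socket.gethostname(), rest)
```

The returned value would be exported as DISPLAY before starting mozilla, for example from a wrapper script.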
What would be the downside of going back to pl 40? IMHO the performance impact
is not that big.
Can anyone check if the problem is known at SUN? I don't know how to get access
to their bug database.
Comment 13 • 23 years ago (Reporter)
This bug seems to be FIXED as of Mozilla 1.0!
I had the gif-buster URL running for several minutes without the $DISPLAY
workaround and everything ran smoothly.
Configuration:
- Mozilla 1.0
- gtk 1.2.10
- glib 1.2.10
- Xsun patch revision: 108652-53
I don't know which of these fixed the issue; maybe it was the combination.
Status: UNCONFIRMED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
Updated • 6 years ago
Product: Core → Core Graveyard