Closed Bug 206642 Opened 22 years ago Closed 19 years ago

Mozilla hangs on startup - strace shows it's spinning trying to read XUL.mfasl file

Categories

(Core :: XPCOM, defect)

Platform: x86 Linux
Type: defect
Priority: Not set
Severity: normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: mohit_aron, Unassigned)

Details

Attachments

(1 file)

User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.3) Gecko/20030312

Very frequently, Mozilla hangs when I attempt to start it: no browser window comes up. An strace of Mozilla in this state shows it spinning on the XUL.mfasl file. The problem goes away if I kill the spinning Mozilla process, delete the XUL.mfasl file, and restart Mozilla.

The problem happens especially after Mozilla crashes for some reason. But even after a clean shutdown, I've often seen this problem the next time I start Mozilla. Many others in my company have also observed it. In fact, the workaround (deleting the XUL.mfasl file) is now documented in a "helplist" that we maintain here for our engineers.

Reproducible: Sometimes

Steps to Reproduce:
I don't know how to reproduce this problem deterministically. However, if Mozilla is made to crash a couple of times and then restarted, the problem might show up.

Actual Results:
I cannot reproduce the problem deterministically, so I don't know what to say here.

Expected Results:
Mozilla should start up and show the browser window rather than showing nothing.
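The workaround described in the report (kill the hung process, then delete XUL.mfasl) is easy to script. This is a minimal Python sketch, not an official tool: it assumes the hung Mozilla process has already been killed, and the function name remove_fastload_files is hypothetical. It walks a directory tree and deletes every file named XUL.mfasl, leaving everything else in the profile untouched.

```python
import os


def remove_fastload_files(profile_root, name="XUL.mfasl"):
    """Delete every fastload cache file named `name` under profile_root.

    Assumes no Mozilla process is still using the profile.
    Returns the list of paths that were removed.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(profile_root):
        if name in filenames:
            path = os.path.join(dirpath, name)
            os.remove(path)  # only the cache file; prefs, bookmarks etc. stay
            removed.append(path)
    return removed
```

Usage might look like remove_fastload_files(os.path.expanduser("~/.mozilla")), where ~/.mozilla is an assumed profile location for Mozilla 1.x on Linux; adjust it to wherever the profile actually lives.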
Can you be extremely specific about which version of Mozilla you are experiencing this problem with? I see that the build ID of the browser you used to file this bug is "...; rv:1.3) Gecko/20030312", but have you actually experienced this problem with that browser (for sure)? This was a very common problem in builds prior to about 01/31/2003, but since then we have had no reports at all of people experiencing it (see bug 169777 and bug 189832). If you can in fact reproduce this with a build whose build ID is after 20030131, could you please (g)zip up the XUL.mfasl file and email it to me so that I can have a look at it? Thanks. (Note: there is no personal information in that file, aside from the path where you installed Mozilla.)
Status: UNCONFIRMED → NEW
Ever confirmed: true
XUL.mfasl file that causes Mozilla to hang on startup is attached.
> I see that the build id for the browser that you used to file this bug is
> "...; rv:1.3) Gecko/20030312", but have you actually experienced this problem
> with that browser (for sure).

I'm absolutely experiencing problems with this browser. So are others in the company. I ran into the problem again just 2 minutes ago. I've attached the corresponding XUL.mfasl file; as soon as I removed this file, Mozilla was back in business.
This fastload file is not from a 1.2 or earlier build, is it? (Those are known to be broken.)
Thanks for providing the XUL.mfasl file. That helps a lot. I hacked out some validity checks in my build so that it would use this fastload file, and I can reproduce this hang (or at least I can see that I am doomed, wrongly seeking past the EOF). I dumped out the format and noticed the following:

1) MFL_FILE_VERSION is 4, which means the build that generated this file is from after brendan landed the final touches to fix the previous problem.
2) There is no nsSystemPrincipal singleton serialized in the ID map of the fastload file.
3) There are no ".js" documents in the document map of the file.

I spoke with brendan about this, and given that the chrome path begins with "/auto/...", he suspects that this may be related to a problem with NFS. Can you provide further details about the OS types and versions running on the client and the (NFS) server, which version of NFS the client and server are running, and any other data about the mountd and nfsd binaries you have installed (e.g., on Red Hat, /usr/sbin/rpc.mountd --version)?
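The validity checks mentioned here are what normally stop a reader from acting on a file like this one. As an illustration only, here is a minimal Python sketch of that kind of check. It uses a made-up header layout (8-byte magic, then version, footer offset and total file size as big-endian 32-bit integers) that is NOT the real XUL.mfasl format, and the names HEADER, MAGIC, read_exact and validate are hypothetical. The idea is to reject the file when the recorded size or footer offset disagrees with the file actually on disk, and to fail on a short read instead of retrying, rather than seeking past EOF and spinning.

```python
import os
import struct

# Hypothetical header layout, for illustration only (not the real XUL.mfasl
# format): 8-byte magic, then version, footer offset and total file size.
HEADER = struct.Struct(">8sIII")
MAGIC = b"XFASTLD\0"  # hypothetical 8-byte magic


def read_exact(f, n):
    """Read exactly n bytes; raise instead of looping on a short read."""
    data = f.read(n)
    if len(data) != n:
        raise ValueError("truncated: wanted %d bytes, got %d" % (n, len(data)))
    return data


def validate(path, expected_version=4):
    """Return the footer bytes if the file is self-consistent, else raise ValueError."""
    actual_size = os.path.getsize(path)
    with open(path, "rb") as f:
        magic, version, footer_off, recorded_size = HEADER.unpack(
            read_exact(f, HEADER.size))
        if magic != MAGIC:
            raise ValueError("bad magic")
        if version != expected_version:
            raise ValueError("version mismatch")
        if recorded_size != actual_size:
            raise ValueError("recorded size %d != actual size %d"
                             % (recorded_size, actual_size))
        # Never seek to an offset outside the data that is really there.
        if not HEADER.size <= footer_off < actual_size:
            raise ValueError("footer offset %d outside file" % footer_off)
        f.seek(footer_off)
        return f.read()
```

A loader built this way would discard the cache and regenerate it when validate raises, which is the behavior users here are getting by hand when they delete the file.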
Assignee: dougt → jrgm
If on Red Hat Linux, maybe something like 'rpm -qa |egrep -i "(nfs|mount)"' and then 'rpm -qi' on each of the RPMs named in the output of the first command.
% rpm -qa | egrep -i "(nfs|mount)"
mount-2.10r-0.6.x
nfs-utils-0.3.1-0.6.x.1
% rpm -qi mount-2.10r-0.6.x
Name        : mount                        Relocations: (not relocateable)
Version     : 2.10r                             Vendor: Red Hat, Inc.
Release     : 0.6.x                         Build Date: Tue 10 Apr 2001 02:16:11 PM PDT
Install date: Thu 30 Jan 2003 07:45:45 AM PST      Build Host: porky.devel.redhat.com
Group       : System Environment/Base       Source RPM: mount-2.10r-0.6.x.src.rpm
Size        : 115615                           License: GPL
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : Programs for mounting and unmounting filesystems.
Description :
The mount package contains the mount, umount, swapon and swapoff programs. Accessible files on your system are arranged in one big tree or hierarchy. These files can be spread out over several devices. The mount command attaches a filesystem on some device to your system's file tree. The umount command detaches a filesystem from the tree. Swapon and swapoff, respectively, specify and disable devices and files for paging and swapping.
% rpm -qi nfs-utils-0.3.1-0.6.x.1
Name        : nfs-utils                    Relocations: (not relocateable)
Version     : 0.3.1                             Vendor: Red Hat, Inc.
Release     : 0.6.x.1                       Build Date: Tue 17 Apr 2001 09:56:16 AM PDT
Install date: Thu 30 Jan 2003 07:50:48 AM PST      Build Host: porky.devel.redhat.com
Group       : System Environment/Daemons    Source RPM: nfs-utils-0.3.1-0.6.x.1.src.rpm
Size        : 524367                           License: GPL
Packager    : Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>
Summary     : NFS utlilities and supporting daemons for the kernel NFS server.
Description :
The nfs-utils package provides a daemon for the kernel NFS server and related tools, which provides a much higher level of performance than the traditional Linux NFS server used by most users. This package also contains the showmount program. Showmount queries the mount daemon on a remote host for information about the NFS (Network File System) server on the remote host. For example, showmount can display the clients which are mounted on that host.
% uname -rs
Linux 2.2.19-6.2.1

The NFS server my company uses is from Netapp: their 820 filer.
Here is some information about problems with Red Hat clients and NFS servers (including Netapp), although it seems to be more about 2.4.x kernels and Red Hat 7.x. It suggests a possible improvement from setting NFS version 2, or from limiting rsize/wsize to 8KB. http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=65069 (Yet I don't quite grok why it appears to selectively drop the ".js" documents from the serialization, although some ".xul" documents are missing too.)
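For reference, the two mitigations suggested in that Red Hat report would look roughly like this as an /etc/fstab entry, using the Linux NFS mount options nfsvers, rsize and wsize. The server name, export path and mount point below are hypothetical; on a site where home directories come from the automounter (as the "/auto/..." path suggests), the same options would go into the automounter map entry instead.

```
# hypothetical server, export and mount point
filer:/vol/home   /auto/home   nfs   nfsvers=2,rsize=8192,wsize=8192   0 0
```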
By the way, what does 'cat /etc/redhat-release' say? And are there any suspicious errors in /var/log/messages? (Sorry to ask so many questions, but I just don't see how, under typical conditions, fastload would serialize only some of the streams.)
% cat /etc/redhat-release
Red Hat Linux release 6.2 (Zoot)

No, there's nothing in /var/log/messages that I can connect to this bug. Can you please explain how NFS affects this bug? Just out of curiosity.

Also, I routinely have to delete my XUL.mfasl file to start up Mozilla. I don't see a speed benefit anywhere. So why is this file even kept around? Given the number of problems surrounding it, it seems it's better to get rid of this file altogether.
Mohit Aron, the symptoms you are reporting have not been reported by others. They are not known bugs in FastLoad. From my years in the mid-late-80s hacking on NFS for SGI, they seem like client NFS buffered-write-vs.-seek bugs.

You won't see a FastLoad speedup without a local filesystem. Even then, on very fast hardware, you won't see as big a speedup with FastLoad as you would on older hardware. But FastLoad does improve Ts (startup time), which is a performance number we measure and keep ever-improving (mostly), via tinderbox. Some of our tinderboxes are older machines. None uses NFS for the filesystem containing the profile.

/be
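To make the buffered-write-vs.-seek point concrete: a common way to write a file whose header points at a trailing footer is to append the data and then seek back to patch the header. A client that mishandles a buffered write followed by a seek can leave the header and the data out of agreement. The Python sketch below shows that write pattern plus a consistency check. It uses a hypothetical 8-byte header (footer offset, total size) that is not the real XUL.mfasl layout, and write_with_backpatch and header_matches_file are illustrative names. Re-reading on the same client may be served from the client's cache, so a passing check does not prove the server holds the same bytes.

```python
import os
import struct

# Hypothetical 8-byte header: footer offset and total size, big-endian uint32.
HEADER = struct.Struct(">II")


def write_with_backpatch(path, body, footer):
    """Write a placeholder header, the body and the footer, then seek back
    and patch the header with the real footer offset and total size."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(0, 0))  # placeholder, patched below
        f.write(body)
        footer_off = f.tell()
        f.write(footer)
        total = f.tell()
        f.seek(0)                   # the seek-back step after buffered writes
        f.write(HEADER.pack(footer_off, total))
        f.flush()
        os.fsync(f.fileno())        # ask the OS to push dirty data to storage
    return footer_off, total


def header_matches_file(path):
    """Re-read the header and check it agrees with the file's actual size."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        raw = f.read(HEADER.size)
    if len(raw) != HEADER.size:
        return False
    footer_off, total = HEADER.unpack(raw)
    return total == size and HEADER.size <= footer_off <= size
```

If the header patch is lost or lands out of order, the recorded size or footer offset no longer matches the file, and a reader that trusts the header will seek to the wrong place.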
> Mohit Aron, the symptoms you are reporting have not been reported by others.
> They are not known bugs in FastLoad. From my years in the mid-late-80s
> hacking on NFS for SGI, they seem like client NFS buffered-write-vs.-seek
> bugs.
>
> You won't see a FastLoad speedup without a local filesystem. Even then, on
> very fast hardware, you won't see as big a speedup with FastLoad as you would
> on older hardware. But FastLoad does improve Ts, which is a performance
> number we measure and keep ever-improving (mostly), via tinderbox. Some of
> our tinderboxes are older machines. None uses NFS for the filesystem
> containing the profile.

I don't even remember the last time I had my home directory on a local filesystem rather than an NFS one, and I've been using computers for more than 10 years now (Bachelor's, PhD, and 3 years in industry). If bugs in fastload are triggered by commercial implementations of NFS, then it would be better to get rid of the fastload mechanism. I can't imagine anyone doing serious work with his/her home directory on a local filesystem.

You say other users haven't reported this bug; I don't see what I can do about that. I work at Google, and people routinely see this problem here. They've been seeing it for a long time; it's just that nobody cared to file a bug. I just took the effort of filing one. Please don't tell me that this is a non-problem because others haven't reported it.
Build-Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.2.1) Gecko/20030212 Debian/1.2.1-9woody1

Reading the comments, the developers seem to be focussing on possible NFS problems as the cause? I've also had this problem with this version (but not with earlier versions, I don't think). As you can see, I'm on 20030212 on Debian Woody. The only problem is, I have no NFS support on the system whatsoever, not even in the kernel (custom compiled, both settings set to No). I do have Samba support (server and client), but purely for documents, nothing related to either the Mozilla system files or the home directory.

Apart from that, the symptoms are exactly as described. In my case, on a Celeron 466 with 192M RAM, it effectively locks the system up if I don't kill it fairly quickly (i.e., when it starts using more than about 30M). It's not directly reproducible, but my Mozilla rarely crashes, so more often than not it happens after a clean shutdown.

HTH
Andrew
Yes, this was a common problem with "rv:1.2.1) Gecko/20030212". But since 1.3 final, aside from the situation noted in this bug, this has not been a known issue.
It's hard to fix a bug that people don't report. Thanks to Mohit for filing this one, but we're still not going to get far without a way to reproduce the problem here where jrgm and I sit. One thing that would help: use tcpdump, ethereal, or something similar to snoop NFS packets when you first start the browser *without* a XUL.mfasl file in your profile directory. Once the browser is up, if you then quit and start again and find the browser hanging as described here, I would love to see the packet trace (voluminous though it would be).

/be
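A capture along those lines might be started like this on the client, as root, before the first launch. The interface name eth0 and the server host name filer are assumptions. Filtering on the server host rather than on port 2049 keeps the IP fragments that NFS over UDP produces with large rsize/wsize values, since only the first fragment carries the port numbers. The file written with -w can be opened in ethereal afterwards.

```
# hypothetical interface and NFS server names
tcpdump -i eth0 -s 0 -w mozilla-nfs.pcap host filer
```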
I'm trying to understand why this bug is being blamed on NFS. If there really were such a blatant problem with NFS, I should be seeing it with a host of other files in my NFS directory. Why only Mozilla, then? Also, please note that the problem happens randomly, so I can't deterministically be running tcpdump when it happens.
Assignee: jrgmorrison → nobody
QA Contact: scc → xpcom
Mohit, can you reproduce this in builds dated 20060804 or newer (i.e., with bug 341595 fixed)?
Please REOPEN the bug if it can be reproduced in Firefox 2.0b2 or newer. -> WORKSFORME
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → WORKSFORME
