Open Bug 697963 Opened 13 years ago Updated 2 years ago

zombie Firefox process just after starting Firefox with no ~/.mozilla directory (just a new profile in existing .mozilla does NOT reproduce)

Categories

(Core :: Graphics, defect)

10 Branch
x86_64
Linux
defect

UNCONFIRMED

People

(Reporter: vincent-moz, Unassigned)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:10.0a1) Gecko/20111028 Firefox/10.0a1
Build ID: 20111028031044

Steps to reproduce:

1. Remove the .mozilla directory (to have a really fresh profile, with no add-ons).
2. Start Firefox with "./firefox" from a terminal.


Actual results:

A "ps -aef" shows:
vlefevre 22759 22445  9 13:49 pts/13   00:00:01 ./firefox
vlefevre 22760 22759  0 13:49 pts/13   00:00:00 [firefox] <defunct>


Expected results:

There should be no zombie (<defunct>) processes.

This is always reproducible.
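
For readers who want to check for a zombie programmatically rather than by reading "ps -aef" output, here is a minimal sketch. It assumes Linux with /proc mounted; the helper names process_state and wait_for_state are made up for this illustration and are not part of Firefox. A process that has exited but has not been waited for reports state "Z" in /proc/<pid>/stat, which is what ps prints as <defunct>.

```python
import os
import time


def process_state(pid):
    """Return the one-letter Linux process state, or None if the pid is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # Format is "pid (comm) state ..."; comm may itself contain ')' or spaces,
    # so locate the last ')' and take the field right after it.
    return stat[stat.rindex(")") + 2]


def wait_for_state(pid, wanted, timeout=5.0):
    """Poll until pid reaches the wanted state; return False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if process_state(pid) == wanted:
            return True
        time.sleep(0.01)
    return False


pid = os.fork()
if pid == 0:
    os._exit(0)  # child exits at once, like a short-lived helper process

became_zombie = wait_for_state(pid, "Z")  # exited, but not yet waited for
os.waitpid(pid, 0)                        # reaping removes the process entry
state_after_reap = process_state(pid)
print(became_zombie, state_after_reap)
```

The child stays in state "Z" for as long as the parent is alive and has not called waitpid() on it, which matches the <defunct> line in the report.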
Does it cause any problems? Does Firefox not start?
Firefox starts and I can use it. But I don't know whether this problem could be related to other bugs. I wonder whether it is due to an expected dead process that hasn't been waited for, or to some process that shouldn't have died.
Does the zombie go away after some time?
Summary: zombie just after starting firefox → zombie Firefox process just after starting Firefox with no ~/.mozilla directory
It didn't seem to go away. But I haven't tried for a long time since this was mainly for testing Firefox stability with a really clean profile.
I can reproduce this problem on another machine with:

  Mozilla/5.0 (X11; Linux x86_64; rv:10.0a1) Gecko/20111102 Firefox/10.0a1

I've attached the strace output concerning the dead process. One can see that it writes an error message "X error occurred in GLX probe, ...", but this message isn't visible anywhere (it seems to be transmitted to the pipe to the parent firefox process).
It could be the separate process that gets the details of your GPU and driver for the HW acceleration feature. There were problems with this detection, which is why it was moved to a separate process; otherwise it would kill the whole of Firefox. Maybe you hit one of those problems and the process goes stale.
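
The general pattern described here (run a risky probe in a child process, report through a pipe, reap the child afterwards) can be sketched as follows. This is only an illustration under stated assumptions, not the actual glxtest code: run_probe and failing_probe are hypothetical names, and the error text is borrowed from this bug.

```python
import os


def run_probe(probe):
    """Run probe() in a forked child; return (exit_status, text it reported)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: report through the pipe, then exit without ever returning.
        status = 1
        try:
            os.close(read_fd)
            try:
                message, status = probe(), 0
            except Exception as exc:
                message = str(exc)
            os.write(write_fd, message.encode())
        finally:
            os._exit(status)
    # Parent: read until the child closes its end of the pipe, then reap it.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, wait_status = os.waitpid(pid, 0)  # without this the child stays <defunct>
    if os.WIFEXITED(wait_status):
        exit_status = os.WEXITSTATUS(wait_status)
    else:
        exit_status = -os.WTERMSIG(wait_status)
    return exit_status, b"".join(chunks).decode()


def failing_probe():
    # Hypothetical stand-in for a GLX query that hits an X error.
    raise RuntimeError("X error occurred in GLX probe, error_code=9")


status, text = run_probe(failing_probe)
print(f"probe failed (exited with status {status}): {text}")
```

A failure inside the probe only takes down the child. The parent still gets the error text and the exit status, provided it reads the pipe and calls waitpid().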

Do you always get the zombie when you delete ~/.mozilla?
Do you never get the zombie when the profile is already created? The detection should be the same, unless you disabled HW acceleration manually.

Can you paste contents of Graphics section from about:support ?
Post one when you have the zombie, and one when you have a normal profile created and the zombie is not produced.
(In reply to :aceman from comment #6)
> Do you always get the zombie when you delete ~/.mozilla?

Yes, and the message "failed to create drawable" from libGL is output twice.

> Do you never get the zombie when the profile is already created?

Never, and I get only one message "failed to create drawable" from libGL.

> Can you paste contents of Graphics section from about:support ?

Adapter Description: GLXtest process failed (exited with status 1): X error occurred in GLX probe, error_code=9, request_code=55, minor_code=0

WebGL Renderer: Blocked for your graphics card because of unresolved driver issues.

GPU Accelerated Windows: 0/1. Blocked for your graphics driver version. Try updating your graphics driver to version <Anything with EXT_texture_from_pixmap support> or newer.

in both cases.
So can you give details of your card and driver? Try lspci (search for VGA) and glxinfo (lines with e.g. Driver and Renderer).
VGA compatible controller: nVidia Corporation G98 [Quadro NVS 295] (rev a1)

glxinfo gives:
OpenGL renderer string: Software Rasterizer
and no lines with driver.

The machine is a Debian/unstable one, without proprietary drivers.
OK, what is the version of Mesa installed?
Also, please look in /var/log/Xorg.0.log (check the file date to be sure it is the log of the currently running X server) and check which driver you are using for your card. Is it "nv" or "nouveau" or something else? Which version? I think the proprietary nvidia driver would show up as "nvidia" there.
nouveau compiled for 1.11.0, module version 0.0.16
driver date: 2011-03-24
And Mesa?
Mesa 7.11
Depends on: 677531
Component: General → Graphics
Product: Firefox → Core
QA Contact: general → thebes
I confirm we've had (and thought we fixed) similar bugs in the past around that glxtest process: bug 677531, bug 681026 .

This process is expected to die, so the problem is just that we're failing to waitpid() for it.
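
As a small illustration of what "waitpid() for it" means in practice, here is a hedged sketch using POSIX semantics through Python's os module. The helper name try_reap is hypothetical. A non-blocking waitpid() with WNOHANG returns pid 0 while the child is still running, returns the child's pid and status once it has exited, and fails with ECHILD if the child has already been reaped.

```python
import os
import time


def try_reap(pid):
    """Non-blocking reap: return the exit code, or None if still running."""
    reaped, status = os.waitpid(pid, os.WNOHANG)
    if reaped == 0:
        return None  # child has not exited yet
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)  # killed by a signal


pid = os.fork()
if pid == 0:
    os._exit(1)  # child exits with status 1, as the failed probe did here

exit_code = try_reap(pid)
while exit_code is None:
    time.sleep(0.01)
    exit_code = try_reap(pid)

# Once reaped, the kernel has no record left: a second wait fails with ECHILD.
try:
    os.waitpid(pid, 0)
    already_reaped = False
except ChildProcessError:
    already_reaped = True

print(exit_code, already_reaped)
```

Until one of these waitpid() calls succeeds, the exited child remains in the process table as <defunct>.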

I can't reproduce this problem here, but that could easily be because an X error occurs in the glxtest process on your system, which is not the case on mine.

The first step towards reproducing it would be to deliberately trigger an X error in toolkit/xre/glxtest.cpp and see whether that reproduces the problem.
Actually, this still doesn't make sense to me: the fact that about:support shows your X error is proof that we hit the GetData() function in GfxInfoX11.cpp which is also where we waitpid().
(In reply to Vincent Lefevre from comment #7)
> (In reply to :aceman from comment #6)
> > Do you always get the zombie when you delete ~/.mozilla?
> 
> Yes, and the message "failed to create drawable" from libGL is output twice.
> 
> > Do you never get the zombie when the profile is already created?
> 
> Never, and I get only one message "failed to create drawable" from libGL.

OK, this is the key to reproduce. I can reproduce by using an empty .mozilla, but I can't reproduce by using just a new profile in existing .mozilla.
Summary: zombie Firefox process just after starting Firefox with no ~/.mozilla directory → zombie Firefox process just after starting Firefox with no ~/.mozilla directory (just a new profile in existing .mozilla does NOT reproduce)
In toolkit/xre/nsAppRunner.cpp, we're forking the glxtest process very early, then we set up the crash reporter, and only later do we do SelectProfile() which apparently forks again.
Benjamin, do you have any clue as to why we are failing to waitpid() for the glxtest process specifically when there is no .mozilla directory, or whether this could be related to SelectProfile() forking after the glxtest process is forked? (comment 17)
When the user launches with -P or -profilemanager, we show the profile selection dialog in the first process and then exec to start a brand new process with the selected profile. In that case we do initialize gecko and graphics and presumably you should be waiting on it later.

This shouldn't affect *this* case, though, since a missing .mozilla directory shouldn't be showing the profile manager ever and we shouldn't be forking or execing anything, instead following this code path: http://mxr.mozilla.org/mozilla-central/source/toolkit/xre/nsAppRunner.cpp#2130

That should also kick off any profile migration code, but that also AFAIK doesn't cause any execs or forks: we've worked pretty hard to remove all of the exec/forking from the common-case startup path.
Actually, in the present case (no existing .mozilla directory), we do execv():

Breakpoint 1, LaunchChild (aNative=0x4bc630, aBlankCommandLine=false)
    at /home/bjacob/mozilla-inbound/toolkit/xre/nsAppRunner.cpp:1647
1647      if (execv(exePath.get(), gRestartArgv) == -1)
(gdb) bt
#0  LaunchChild (aNative=0x4bc630, aBlankCommandLine=false)
    at /home/bjacob/mozilla-inbound/toolkit/xre/nsAppRunner.cpp:1647
#1  0x00007ffff3eb22d9 in ImportProfiles (aPService=0x4cfaf0, aNative=0x4bc630)
    at /home/bjacob/mozilla-inbound/toolkit/xre/nsAppRunner.cpp:1919
#2  0x00007ffff3eb2fc3 in SelectProfile (aResult=0x7fffffffba60, aNative=0x4bc630, aStartOffline=0x7fffffffba5f, 
    aProfileName=0x7fffffffb7a0) at /home/bjacob/mozilla-inbound/toolkit/xre/nsAppRunner.cpp:2075
#3  0x00007ffff3eb656f in XRE_main (argc=2, argv=0x7fffffffe258, aAppData=0x424b10)
    at /home/bjacob/mozilla-inbound/toolkit/xre/nsAppRunner.cpp:3190
#4  0x0000000000401878 in do_main (exePath=0x7fffffffd020 "/home/bjacob/build/inbound/dist/bin/libxpcom.so", argc=2, 
    argv=0x7fffffffe258) at /home/bjacob/mozilla-inbound/browser/app/nsBrowserApp.cpp:198
#5  0x0000000000401a91 in main (argc=2, argv=0x7fffffffe258)
    at /home/bjacob/mozilla-inbound/browser/app/nsBrowserApp.cpp:281


I think that explains it.
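
To make the mechanism concrete: execv() replaces the program image but keeps the same PID, so a child forked before the exec is still a child of the process afterwards, while the variable that held its pid is gone. The sketch below demonstrates this with two throwaway Python stages; STAGE1 and STAGE2 are hypothetical names for this illustration and nothing here is taken from nsAppRunner.cpp. The second stage recovers the orphaned-in-memory child with waitpid(-1, ...) purely to show that the kernel still lists it as ours. This is not a proposal for how Firefox should fix the bug.

```python
import os
import subprocess
import sys

# Second stage: a fresh program image running under the same PID. It holds
# no variable with the child's pid, but waitpid(-1) still finds the child.
STAGE2 = r"""
import os
reaped, _status = os.waitpid(-1, 0)
print(os.getpid(), reaped)
"""

# First stage: fork a short-lived child, then exec without waiting for it.
STAGE1 = r"""
import os, sys
child = os.fork()
if child == 0:
    os._exit(1)                       # probe-like child exits early
print(os.getpid(), child, flush=True) # flush: exec discards unwritten buffers
os.execv(sys.executable, [sys.executable, "-c", sys.argv[1]])
"""

result = subprocess.run(
    [sys.executable, "-c", STAGE1, STAGE2],
    capture_output=True, text=True, check=True,
)
pid_before, child_pid, pid_after, reaped_pid = map(int, result.stdout.split())
print(pid_before == pid_after, child_pid == reaped_pid)
```

If the second stage never calls waitpid() on the inherited child, as happens when the new image simply forks its own fresh probe, the first child stays <defunct> for the lifetime of the process.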

Do we want to fix it? I'm tempted to WONTFIX this as this only affects an uncommon case (basically, first usage of firefox), the bug doesn't cause any harm besides using 1 PID and a small amount of memory while Firefox is running, and fixing it requires making non-local changes in this very 'core' part of the codebase. What do you think?
I don't think this bug is harmful, indeed. But we can also just remove the profile import code, since that code was only for going from Firefox <0.9.2 to Firefox >=0.9.3. We can avoid an extra process on first launch and remove some useless code at the same time...
I think I'm seeing this bug, and I *do* have a .mozilla directory. I only noticed when my Firefox hung for some unknown reason and I saw a whole bunch of defunct processes. I get an extra one each time I restart Firefox.

My graphics hardware and drivers are too old and useless to support hardware acceleration.  Everything is showing as blacklisted.
Severity: normal → S3