Closed Bug 658840 Opened 14 years ago Closed 13 years ago

Forked glxinfo process leaks strings

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla7

People

(Reporter: jruderman, Assigned: bjacob)

References

Details

(Keywords: regression, valgrind)

Attachments

(3 files)

Workaround #1: pass --child-silent-after-fork=yes to Valgrind, and ignore Valgrind's exit code.

Workaround #2: add these leaks to a Valgrind suppression file. This 

(How are we dealing with this on Tinderbox?)

(How did our other leak detection tools survive the insanity of having every Firefox startup correspond to two shutdowns?)
> Workaround #2: add these leaks to a Valgrind suppression file.

This doesn't work, because two of the leaks have busted stacks (???)
Can you explain me more about this link? I don't understand it.

Is this a leak whereby the forked glxinfo process stays around as a zombie? I'm afraid this problem currently happens, as the zombie only gets killed by GfxInfo::Init() and I'm not sure if that is always getting called (need do check).

Or are you talking about a different kind of leak?
This is a leak where the forked glxinfo process exits, but doesn't free some of its memory before exiting.  Some of this memory is associated with strings.
We fork() but don't exec(), so the forked process doesn't have separate memory mappings.  This could be the tools getting confused.  (Or it could be us doing things we're not supposed to.)
Returning to let main finish (instead of calling exit()) should mean fewer leaks.

(In reply to comment #4)
> We fork() but don't exec(),

AFAIK we still have only one thread at this point, so I think that's OK.
Summary: Forked glxinfo process leaks → Forked glxinfo process leaks strings
I don't think either of the workarounds are correct, because:

#1 would ignore all children, not just this one we don't care about
#2 would ignore these leaks in the main process (where they are a bug) and not just the glxtest forked process

I think that karlt's solution is probably fine: make fire_glxtest_process return a value instead of exiting, and "return" out of main if we're the forked process.

But I'm also a little worried that we're going to confuse the refcount/tracemalloc logs with the forked process, since they have duplicated file handles and are writing to the same log :-(
What if we came up with a way of telling valgrind "I'm exiting and I know I'm going to leak, ignore them". We could perhaps do this with a valgrind client request from the child process.
(In reply to comment #7)
We'd also need to tell our other leak tools to ignore leaks (but perhaps we have to do something there anyway to deal with the duplicate file handles issue).
Like this?
Attachment #535334 - Flags: review?(benjamin)
Note that even with this patch, we still exit() in the error cases, see fatal_error() and x_error_handler(). Should I fix that too?
Attachment #535334 - Flags: review?(benjamin) → review+
http://hg.mozilla.org/mozilla-central/rev/84636464bba8

Assuming this is all what we want to fix here, otherwise reopen.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Can't this approach cause crashes if some of the stuff cleaned up on the exit of the child process is still needed by the parent process (since they're still sharing memory space, since we've never called exec())?
(In reply to comment #12)
> (since they're still sharing memory space

Are they? I didn't know that. That's beyond my understanding of things. Other people please advise!
This is a fork (not clone) so in general they don't share memory space.
"Under Linux, fork() is implemented using copy-on-write pages."

The child has copies of file descriptors for the same files.
ah, ok.
Assignee: nobody → bjacob
Target Milestone: --- → mozilla7
Can anyone please provide a test case or some guidelines that I can use to verify this fix? Do I need to create any specific environment to test the fix?

Thanks!
@Mihaela: the bug was detected in valgrind. So the steps-to-reproduce were basically:
1. get a Linux optimized build with debug symbols (might also be possible on Mac, but not on Windows)
2. valgrind ./firefox

Notice that valgrind always reports some leaks that have to be ignored by adding them to a 'valgrind suppressions file'. See:
https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind

But frankly, I wouldn't worry too much about this bug. It never was a very big bug and the fix almost certainly works.
Not directly as reported in comment 0, but, in a debug build, glxinfo dumps strings stats on exit (if it doesn't crash).  This happens early while the browser is still running.  Something like:

 => mAllocCount:              5
 => mReallocCount:            3
 => mFreeCount:               5
 => mShareCount:              1
 => mAdoptCount:              0
 => mAdoptFreeCount:          0

mFreeCount should match mAllocCount, as above.
(In reply to Benoit Jacob from comment #10)
> Note that even with this patch, we still exit() in the error cases, see
> fatal_error() and x_error_handler(). Should I fix that too?
That would be appreciated, but now at least I know why I seem to get leaks.

(In reply to David Baron from comment #12)
> Can't this approach cause crashes if some of the stuff cleaned up on the
> exit of the child process is still needed by the parent process (since
> they're still sharing memory space, since we've never called exec())?
I thought that applied to vfork() rather than fork()?
(In reply to neil@parkwaycc.co.uk from comment #19)
> (In reply to Benoit Jacob from comment #10)
> > Note that even with this patch, we still exit() in the error cases, see
> > fatal_error() and x_error_handler(). Should I fix that too?
> That would be appreciated, but now at least I know why I seem to get leaks.

Probably even better would be if the leak code didn't report leaks on error exit codes.  on_exit() provides this info, but was dropped between SunOS 4 and 5, so atexit is recommended instead.  I don't know how to get the exit code with atexit().

I guess glxtest could abort() instead of exit().  (ISTR we now make abort perform a SIGSEGV-generating op, though.  That could cause surprises but I guess we already cope with that.)
(In reply to Karl Tomlinson (:karlt) from comment #20)
> (In reply to neil@parkwaycc.co.uk from comment #19)
> > (In reply to Benoit Jacob from comment #10)
> > > Note that even with this patch, we still exit() in the error cases, see
> > > fatal_error() and x_error_handler(). Should I fix that too?
> > That would be appreciated, but now at least I know why I seem to get leaks.
> 
> Probably even better would be if the leak code didn't report leaks on error
> exit codes.  on_exit() provides this info, but was dropped between SunOS 4
> and 5, so atexit is recommended instead.  I don't know how to get the exit
> code with atexit().

I really don't want to use atexit() in library (here, libxul) code. (FWIW, since any code is susceptible of moving to a library in the future, I never want to use atexit).

> 
> I guess glxtest could abort() instead of exit().  (ISTR we now make abort
> perform a SIGSEGV-generating op, though.  That could cause surprises but I
> guess we already cope with that.)

Great idea!
First I reproduced the string leak here on linux, by making the glxtest process trigger an artificial error. Then I replaced exit() by abort() in the fatal_error and X error handler functions, and XPCOM_MEM_LEAK_LOG didn't report any leak anymore. Patch coming.
Attachment #560845 - Flags: review?(karlt)
Attachment #560845 - Flags: review?(karlt) → review+
(In reply to Benoit Jacob [:bjacob] from comment #21)
> (In reply to Karl Tomlinson (:karlt) from comment #20)
> > Probably even better would be if the leak code didn't report leaks on error
> > exit codes.  on_exit() provides this info, but was dropped between SunOS 4
> > and 5, so atexit is recommended instead.  I don't know how to get the exit
> > code with atexit().
> 
> I really don't want to use atexit() in library (here, libxul) code. (FWIW,
> since any code is susceptible of moving to a library in the future, I never
> want to use atexit).

Yes, best avoid atexit wherever possible.
My comment was based on the assumption that the leak reporting code used atexit, but it looks like that is either no longer the case, or never was:
http://hg.mozilla.org/mozilla-central/rev/f4ddad2c0eb7
http://hg.mozilla.org/mozilla-central/annotate/5319b0100025/xpcom/string/src/nsSubstring.cpp#l105
Attachment #535334 - Flags: checkin+
Will abort() dump core or trigger crashreporter, or is it too early?
It doesn't, as the glxtest process is forked specifically before the point where we enable the crashreporter. See toolkit/xre/nsAppRunner.cpp.

This was necessary as the whole point of the glxtest process is that it must be able to crash without the user noticing.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: