Closed
Bug 658840
Opened 14 years ago
Closed 14 years ago
Forked glxinfo process leaks strings
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla7
People
(Reporter: jruderman, Assigned: bjacob)
References
Details
(Keywords: regression, valgrind)
Attachments
(3 files)
6.13 KB,
text/plain
|
Details | |
2.17 KB,
patch
|
benjamin
:
review+
karlt
:
checkin+
|
Details | Diff | Splinter Review |
1.08 KB,
patch
|
karlt
:
review+
|
Details | Diff | Splinter Review |
Workaround #1: pass --child-silent-after-fork=yes to Valgrind, and ignore Valgrind's exit code.
Workaround #2: add these leaks to a Valgrind suppression file. This
(How are we dealing with this on Tinderbox?)
(How did our other leak detection tools survive the insanity of having every Firefox startup correspond to two shutdowns?)
Reporter | ||
Comment 1•14 years ago
|
||
> Workaround #2: add these leaks to a Valgrind suppression file.
This doesn't work, because two of the leaks have busted stacks (???)
Assignee | ||
Comment 2•14 years ago
|
||
Can you explain me more about this link? I don't understand it.
Is this a leak whereby the forked glxinfo process stays around as a zombie? I'm afraid this problem currently happens, as the zombie only gets killed by GfxInfo::Init() and I'm not sure if that is always getting called (need do check).
Or are you talking about a different kind of leak?
Reporter | ||
Comment 3•14 years ago
|
||
This is a leak where the forked glxinfo process exits, but doesn't free some of its memory before exiting. Some of this memory is associated with strings.
We fork() but don't exec(), so the forked process doesn't have separate memory mappings. This could be the tools getting confused. (Or it could be us doing things we're not supposed to.)
Comment 5•14 years ago
|
||
Returning to let main finish (instead of calling exit()) should mean fewer leaks.
(In reply to comment #4)
> We fork() but don't exec(),
AFAIK we still have only one thread at this point, so I think that's OK.
Reporter | ||
Updated•14 years ago
|
Summary: Forked glxinfo process leaks → Forked glxinfo process leaks strings
Comment 6•14 years ago
|
||
I don't think either of the workarounds are correct, because:
#1 would ignore all children, not just this one we don't care about
#2 would ignore these leaks in the main process (where they are a bug) and not just the glxtest forked process
I think that karlt's solution is probably fine: make fire_glxtest_process return a value instead of exiting, and "return" out of main if we're the forked process.
But I'm also a little worried that we're going to confuse the refcount/tracemalloc logs with the forked process, since they have duplicated file handles and are writing to the same log :-(
Comment 7•14 years ago
|
||
What if we came up with a way of telling valgrind "I'm exiting and I know I'm going to leak, ignore them". We could perhaps do this with a valgrind client request from the child process.
Comment 8•14 years ago
|
||
(In reply to comment #7)
We'd also need to tell our other leak tools to ignore leaks (but perhaps we have to do something there anyway to deal with the duplicate file handles issue).
Assignee | ||
Comment 10•14 years ago
|
||
Note that even with this patch, we still exit() in the error cases, see fatal_error() and x_error_handler(). Should I fix that too?
Updated•14 years ago
|
Attachment #535334 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 11•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/84636464bba8
Assuming this is all what we want to fix here, otherwise reopen.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Can't this approach cause crashes if some of the stuff cleaned up on the exit of the child process is still needed by the parent process (since they're still sharing memory space, since we've never called exec())?
Assignee | ||
Comment 13•14 years ago
|
||
(In reply to comment #12)
> (since they're still sharing memory space
Are they? I didn't know that. That's beyond my understanding of things. Other people please advise!
Comment 14•14 years ago
|
||
This is a fork (not clone) so in general they don't share memory space.
"Under Linux, fork() is implemented using copy-on-write pages."
The child has copies of file descriptors for the same files.
ah, ok.
Updated•14 years ago
|
Assignee: nobody → bjacob
Target Milestone: --- → mozilla7
Comment 16•14 years ago
|
||
Can anyone please provide a test case or some guidelines that I can use to verify this fix? Do I need to create any specific environment to test the fix?
Thanks!
Assignee | ||
Comment 17•14 years ago
|
||
@Mihaela: the bug was detected in valgrind. So the steps-to-reproduce were basically:
1. get a Linux optimized build with debug symbols (might also be possible on Mac, but not on Windows)
2. valgrind ./firefox
Notice that valgrind always reports some leaks that have to be ignored by adding them to a 'valgrind suppressions file'. See:
https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind
But frankly, I wouldn't worry too much about this bug. It never was a very big bug and the fix almost certainly works.
Comment 18•14 years ago
|
||
Not directly as reported in comment 0, but, in a debug build, glxinfo dumps strings stats on exit (if it doesn't crash). This happens early while the browser is still running. Something like:
=> mAllocCount: 5
=> mReallocCount: 3
=> mFreeCount: 5
=> mShareCount: 1
=> mAdoptCount: 0
=> mAdoptFreeCount: 0
mFreeCount should match mAllocCount, as above.
Comment 19•14 years ago
|
||
(In reply to Benoit Jacob from comment #10)
> Note that even with this patch, we still exit() in the error cases, see
> fatal_error() and x_error_handler(). Should I fix that too?
That would be appreciated, but now at least I know why I seem to get leaks.
(In reply to David Baron from comment #12)
> Can't this approach cause crashes if some of the stuff cleaned up on the
> exit of the child process is still needed by the parent process (since
> they're still sharing memory space, since we've never called exec())?
I thought that applied to vfork() rather than fork()?
Comment 20•14 years ago
|
||
(In reply to neil@parkwaycc.co.uk from comment #19)
> (In reply to Benoit Jacob from comment #10)
> > Note that even with this patch, we still exit() in the error cases, see
> > fatal_error() and x_error_handler(). Should I fix that too?
> That would be appreciated, but now at least I know why I seem to get leaks.
Probably even better would be if the leak code didn't report leaks on error exit codes. on_exit() provides this info, but was dropped between SunOS 4 and 5, so atexit is recommended instead. I don't know how to get the exit code with atexit().
I guess glxtest could abort() instead of exit(). (ISTR we now make abort perform a SIGSEGV-generating op, though. That could cause surprises but I guess we already cope with that.)
Assignee | ||
Comment 21•14 years ago
|
||
(In reply to Karl Tomlinson (:karlt) from comment #20)
> (In reply to neil@parkwaycc.co.uk from comment #19)
> > (In reply to Benoit Jacob from comment #10)
> > > Note that even with this patch, we still exit() in the error cases, see
> > > fatal_error() and x_error_handler(). Should I fix that too?
> > That would be appreciated, but now at least I know why I seem to get leaks.
>
> Probably even better would be if the leak code didn't report leaks on error
> exit codes. on_exit() provides this info, but was dropped between SunOS 4
> and 5, so atexit is recommended instead. I don't know how to get the exit
> code with atexit().
I really don't want to use atexit() in library (here, libxul) code. (FWIW, since any code is susceptible of moving to a library in the future, I never want to use atexit).
>
> I guess glxtest could abort() instead of exit(). (ISTR we now make abort
> perform a SIGSEGV-generating op, though. That could cause surprises but I
> guess we already cope with that.)
Great idea!
First I reproduced the string leak here on linux, by making the glxtest process trigger an artificial error. Then I replaced exit() by abort() in the fatal_error and X error handler functions, and XPCOM_MEM_LEAK_LOG didn't report any leak anymore. Patch coming.
Assignee | ||
Comment 22•14 years ago
|
||
Attachment #560845 -
Flags: review?(karlt)
Updated•14 years ago
|
Attachment #560845 -
Flags: review?(karlt) → review+
Comment 23•14 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #21)
> (In reply to Karl Tomlinson (:karlt) from comment #20)
> > Probably even better would be if the leak code didn't report leaks on error
> > exit codes. on_exit() provides this info, but was dropped between SunOS 4
> > and 5, so atexit is recommended instead. I don't know how to get the exit
> > code with atexit().
>
> I really don't want to use atexit() in library (here, libxul) code. (FWIW,
> since any code is susceptible of moving to a library in the future, I never
> want to use atexit).
Yes, best avoid atexit wherever possible.
My comment was based on the assumption that the leak reporting code used atexit, but it looks like that is either no longer the case, or never was:
http://hg.mozilla.org/mozilla-central/rev/f4ddad2c0eb7
http://hg.mozilla.org/mozilla-central/annotate/5319b0100025/xpcom/string/src/nsSubstring.cpp#l105
Updated•14 years ago
|
Attachment #535334 -
Flags: checkin+
Comment 24•14 years ago
|
||
Will abort() dump core or trigger crashreporter, or is it too early?
Assignee | ||
Comment 25•14 years ago
|
||
It doesn't, as the glxtest process is forked specifically before the point where we enable the crashreporter. See toolkit/xre/nsAppRunner.cpp.
This was necessary as the whole point of the glxtest process is that it must be able to crash without the user noticing.
You need to log in
before you can comment on or make changes to this bug.
Description
•