Closed Bug 658840 Opened 14 years ago Closed 13 years ago

Forked glxinfo process leaks strings

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla7

People

(Reporter: jruderman, Assigned: bjacob)

References

Details

(Keywords: regression, valgrind)

Attachments

(3 files)

valgrind output (firefox debug) 14 years ago Jesse Ruderman 6.13 KB, text/plain		Details
return from main() in the glxtest process 14 years ago Benoit Jacob [:bjacob] (mostly away) 2.17 KB, patch	benjamin : review+ karlt : checkin+	Details \| Diff \| Splinter Review
use abort() for errors in glxtest 13 years ago Benoit Jacob [:bjacob] (mostly away) 1.08 KB, patch	karlt : review+	Details \| Diff \| Splinter Review

Jesse Ruderman

Reporter

Description

•

14 years ago

Attached file valgrind output (firefox debug) — Details

Workaround #1: pass --child-silent-after-fork=yes to Valgrind, and ignore Valgrind's exit code.

Workaround #2: add these leaks to a Valgrind suppression file. This 

(How are we dealing with this on Tinderbox?)

(How did our other leak detection tools survive the insanity of having every Firefox startup correspond to two shutdowns?)

Jesse Ruderman

Reporter

Comment 1

•

14 years ago

> Workaround #2: add these leaks to a Valgrind suppression file.

This doesn't work, because two of the leaks have busted stacks (???)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 2

•

14 years ago

Can you explain me more about this link? I don't understand it.

Is this a leak whereby the forked glxinfo process stays around as a zombie? I'm afraid this problem currently happens, as the zombie only gets killed by GfxInfo::Init() and I'm not sure if that is always getting called (need do check).

Or are you talking about a different kind of leak?

Jesse Ruderman

Reporter

Comment 3

•

14 years ago

This is a leak where the forked glxinfo process exits, but doesn't free some of its memory before exiting.  Some of this memory is associated with strings.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 4

•

14 years ago

We fork() but don't exec(), so the forked process doesn't have separate memory mappings.  This could be the tools getting confused.  (Or it could be us doing things we're not supposed to.)

Karl Tomlinson (:karlt)

Comment 5

•

14 years ago

Returning to let main finish (instead of calling exit()) should mean fewer leaks.

(In reply to comment #4)
> We fork() but don't exec(),

AFAIK we still have only one thread at this point, so I think that's OK.

Jesse Ruderman

Reporter

Updated

•

14 years ago

Summary: Forked glxinfo process leaks → Forked glxinfo process leaks strings

Benjamin Smedberg

Comment 6

•

14 years ago

I don't think either of the workarounds are correct, because:

#1 would ignore all children, not just this one we don't care about
#2 would ignore these leaks in the main process (where they are a bug) and not just the glxtest forked process

I think that karlt's solution is probably fine: make fire_glxtest_process return a value instead of exiting, and "return" out of main if we're the forked process.

But I'm also a little worried that we're going to confuse the refcount/tracemalloc logs with the forked process, since they have duplicated file handles and are writing to the same log :-(

Jeff Muizelaar [:jrmuizel]

Comment 7

•

14 years ago

What if we came up with a way of telling valgrind "I'm exiting and I know I'm going to leak, ignore them". We could perhaps do this with a valgrind client request from the child process.

Karl Tomlinson (:karlt)

Comment 8

•

14 years ago

(In reply to comment #7)
We'd also need to tell our other leak tools to ignore leaks (but perhaps we have to do something there anyway to deal with the duplicate file handles issue).

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 9

•

14 years ago

Attached patch return from main() in the glxtest process — Details — Splinter Review

Like this?

Attachment #535334 - Flags: review?(benjamin)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 10

•

14 years ago

Note that even with this patch, we still exit() in the error cases, see fatal_error() and x_error_handler(). Should I fix that too?

Benjamin Smedberg

Updated

•

14 years ago

Attachment #535334 - Flags: review?(benjamin) → review+

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 11

•

13 years ago

http://hg.mozilla.org/mozilla-central/rev/84636464bba8

Assuming this is all what we want to fix here, otherwise reopen.

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 12

•

13 years ago

Can't this approach cause crashes if some of the stuff cleaned up on the exit of the child process is still needed by the parent process (since they're still sharing memory space, since we've never called exec())?

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 13

•

13 years ago

(In reply to comment #12)
> (since they're still sharing memory space

Are they? I didn't know that. That's beyond my understanding of things. Other people please advise!

Karl Tomlinson (:karlt)

Comment 14

•

13 years ago

This is a fork (not clone) so in general they don't share memory space.
"Under Linux, fork() is implemented using copy-on-write pages."

The child has copies of file descriptors for the same files.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 15

•

13 years ago

ah, ok.

Dão Gottwald [:dao]

Updated

•

13 years ago

Assignee: nobody → bjacob

Target Milestone: --- → mozilla7

Mihaela Velimiroviciu (:mihaelav)

Comment 16

•

13 years ago

Can anyone please provide a test case or some guidelines that I can use to verify this fix? Do I need to create any specific environment to test the fix?

Thanks!

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 17

•

13 years ago

@Mihaela: the bug was detected in valgrind. So the steps-to-reproduce were basically:
1. get a Linux optimized build with debug symbols (might also be possible on Mac, but not on Windows)
2. valgrind ./firefox

Notice that valgrind always reports some leaks that have to be ignored by adding them to a 'valgrind suppressions file'. See:
https://developer.mozilla.org/en/Debugging_Mozilla_with_Valgrind

But frankly, I wouldn't worry too much about this bug. It never was a very big bug and the fix almost certainly works.

Karl Tomlinson (:karlt)

Comment 18

•

13 years ago

Not directly as reported in comment 0, but, in a debug build, glxinfo dumps strings stats on exit (if it doesn't crash).  This happens early while the browser is still running.  Something like:

 => mAllocCount:              5
 => mReallocCount:            3
 => mFreeCount:               5
 => mShareCount:              1
 => mAdoptCount:              0
 => mAdoptFreeCount:          0

mFreeCount should match mAllocCount, as above.

neil@parkwaycc.co.uk

Comment 19

•

13 years ago

(In reply to Benoit Jacob from comment #10)
> Note that even with this patch, we still exit() in the error cases, see
> fatal_error() and x_error_handler(). Should I fix that too?
That would be appreciated, but now at least I know why I seem to get leaks.

(In reply to David Baron from comment #12)
> Can't this approach cause crashes if some of the stuff cleaned up on the
> exit of the child process is still needed by the parent process (since
> they're still sharing memory space, since we've never called exec())?
I thought that applied to vfork() rather than fork()?

Karl Tomlinson (:karlt)

Comment 20

•

13 years ago

(In reply to neil@parkwaycc.co.uk from comment #19)
> (In reply to Benoit Jacob from comment #10)
> > Note that even with this patch, we still exit() in the error cases, see
> > fatal_error() and x_error_handler(). Should I fix that too?
> That would be appreciated, but now at least I know why I seem to get leaks.

Probably even better would be if the leak code didn't report leaks on error exit codes.  on_exit() provides this info, but was dropped between SunOS 4 and 5, so atexit is recommended instead.  I don't know how to get the exit code with atexit().

I guess glxtest could abort() instead of exit().  (ISTR we now make abort perform a SIGSEGV-generating op, though.  That could cause surprises but I guess we already cope with that.)

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 21

•

13 years ago

(In reply to Karl Tomlinson (:karlt) from comment #20)
> (In reply to neil@parkwaycc.co.uk from comment #19)
> > (In reply to Benoit Jacob from comment #10)
> > > Note that even with this patch, we still exit() in the error cases, see
> > > fatal_error() and x_error_handler(). Should I fix that too?
> > That would be appreciated, but now at least I know why I seem to get leaks.
> 
> Probably even better would be if the leak code didn't report leaks on error
> exit codes.  on_exit() provides this info, but was dropped between SunOS 4
> and 5, so atexit is recommended instead.  I don't know how to get the exit
> code with atexit().

I really don't want to use atexit() in library (here, libxul) code. (FWIW, since any code is susceptible of moving to a library in the future, I never want to use atexit).

> 
> I guess glxtest could abort() instead of exit().  (ISTR we now make abort
> perform a SIGSEGV-generating op, though.  That could cause surprises but I
> guess we already cope with that.)

Great idea!
First I reproduced the string leak here on linux, by making the glxtest process trigger an artificial error. Then I replaced exit() by abort() in the fatal_error and X error handler functions, and XPCOM_MEM_LEAK_LOG didn't report any leak anymore. Patch coming.

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 22

•

13 years ago

Attached patch use abort() for errors in glxtest — Details — Splinter Review

Attachment #560845 - Flags: review?(karlt)

Karl Tomlinson (:karlt)

Updated

•

13 years ago

Attachment #560845 - Flags: review?(karlt) → review+

Karl Tomlinson (:karlt)

Comment 23

•

13 years ago

(In reply to Benoit Jacob [:bjacob] from comment #21)
> (In reply to Karl Tomlinson (:karlt) from comment #20)
> > Probably even better would be if the leak code didn't report leaks on error
> > exit codes.  on_exit() provides this info, but was dropped between SunOS 4
> > and 5, so atexit is recommended instead.  I don't know how to get the exit
> > code with atexit().
> 
> I really don't want to use atexit() in library (here, libxul) code. (FWIW,
> since any code is susceptible of moving to a library in the future, I never
> want to use atexit).

Yes, best avoid atexit wherever possible.
My comment was based on the assumption that the leak reporting code used atexit, but it looks like that is either no longer the case, or never was:
http://hg.mozilla.org/mozilla-central/rev/f4ddad2c0eb7
http://hg.mozilla.org/mozilla-central/annotate/5319b0100025/xpcom/string/src/nsSubstring.cpp#l105

Karl Tomlinson (:karlt)

Updated

•

13 years ago

Attachment #535334 - Flags: checkin+

neil@parkwaycc.co.uk

Comment 24

•

13 years ago

Will abort() dump core or trigger crashreporter, or is it too early?

Benoit Jacob [:bjacob] (mostly away)

Assignee

Comment 25

•

13 years ago

It doesn't, as the glxtest process is forked specifically before the point where we enable the crashreporter. See toolkit/xre/nsAppRunner.cpp.

This was necessary as the whole point of the glxtest process is that it must be able to crash without the user noticing.

You need to log in before you can comment on or make changes to this bug.