Closed
Bug 573192
Opened 16 years ago
Closed 11 years ago
pt_SetMethods seems to have a tendency to leak
Categories
(NSPR :: NSPR, defect, P1)
NSPR
NSPR
Tracking
(Not tracked)
RESOLVED
FIXED
4.10.8
People
(Reporter: jruderman, Assigned: froydnj)
References
(Blocks 4 open bugs)
Details
(Keywords: memory-leak, valgrind, Whiteboard: [MemShrink:P2])
Attachments
(3 files, 2 obsolete files)
|
82.94 KB,
text/plain
|
Details | |
|
694 bytes,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
|
6.53 KB,
patch
|
froydnj
:
review+
wtc
:
review+
|
Details | Diff | Splinter Review |
I'm using a 64-bit Firefox built from mozilla-central on Ubuntu 10.04.
Valgrind reports a bunch of leaks of what I think are PRFileDesc objects. I looked at the CrashReporter leak (the first one in the Valgrind log) and saw that it calls PR_Close(fd), which I think is all it's supposed to do to clean up.
Comment 1•15 years ago
|
||
Jesse: thanks a lot for the bug fix.
NSPR has a freelist for PRFileDesc objects.
This freelist is destroyed by PR_Cleanup,
but Mozilla cannot call PR_Cleanup because
it has C++ static destructors that call NSPR.
It doesn't seem possible to disable NSPR's
PRFileDesc freelist. I found two environment
varibales NSPR_FD_CACHE_SIZE_LOW and
NSPR_FD_CACHE_SIZE_HIGH, but they seem to
merely change the behavior of the PRFileDesc
freelist, without fully disabling it.
You can try adding a
_PR_CleanupIO();
call to _PR_Fini in mozilla/nsprpub/pr/src/pthreads/ptthread.c:
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/nsprpub/pr/src/pthreads/ptthread.c&rev=3.88&mark=1010,1028#1010
If that doesn't work, we'll need to look into
adding an environment variable that disables
NSPR's PRFileDesc freelist completely.
Status: NEW → ASSIGNED
Comment 2•15 years ago
|
||
(In reply to comment #1)
> Jesse: thanks a lot for the bug fix.
I meant "thanks a lot for the bug report." Sorry about
the typo.
| Assignee | ||
Comment 3•11 years ago
|
||
We are seeing these leaks come up in LSan (Leak Sanitizer; similar ideas to valgrind, but lower overhead at the cost of slightly less precision) runs; we've had to write fairly broad suppressions to silence complaints about the leaks:
http://mxr.mozilla.org/mozilla-central/source/build/sanitizers/lsan_suppressions.txt#75
http://mxr.mozilla.org/mozilla-central/source/build/sanitizers/lsan_suppressions.txt#59
http://mxr.mozilla.org/mozilla-central/source/build/sanitizers/lsan_suppressions.txt#52
We're also seeing over 1MB allocated to file descriptors in some tests:
https://bugzilla.mozilla.org/show_bug.cgi?id=1039914#c6
This is an active leak while the browser is running, so just adding _PR_CleanupIO() calls doesn't help.
According to:
http://mxr.mozilla.org/nspr/source/pr/src/io/prfdcach.c#234
we ought to be disabling the cache when limit_high is 0, but that's clearly not happening:
http://mxr.mozilla.org/nspr/source/pr/src/io/prfdcach.c#57
http://mxr.mozilla.org/nspr/source/pr/src/io/prfdcach.c#131
Can we just change those blocks of code to not cache at all, rather than adding yet another environment variable that users of Firefox won't be able to set anyway?
Updated•11 years ago
|
Whiteboard: [MemShrink]
| Assignee | ||
Comment 4•11 years ago
|
||
Here's a patch which completely blows away the stack-based cache that was being used and arguably makes the code closer to what was intended (e.g. the comments about .limit_high being 0 disabling the cache). Modern malloc implementations should be fast enough to deal with repeated reallocations nowadays. There's no indication that this caching scheme helps on any workload, and plenty of evidence that leak-checking tools are quite confused by this cache.
Attachment #8464772 -
Flags: review?(wtc)
Comment 5•11 years ago
|
||
Comment on attachment 8464772 [details] [diff] [review]
nspr-filedesc.patch
Review of attachment 8464772 [details] [diff] [review]:
-----------------------------------------------------------------
This is a nice cleanup. Sorry for the drive-by, I was interested enough in this to take a look at the patch. I have a few comments on possible improvements and possibly some preexisting bugs that could be addressed as well.
::: pr/src/io/prfdcach.c
@@ +35,5 @@
> PRIntn limit_low, limit_high;
> } _PR_Fd_Cache;
>
> static _PR_Fd_Cache _pr_fd_cache;
> static PRFileDesc **stack2fd = &(((PRFileDesc*)NULL)->higher);
min: This can go as well.
@@ +120,5 @@
> {
> PR_ASSERT(PR_NSPR_IO_LAYER == fd->identity);
> fd->methods = &_pr_faulty_methods;
> fd->identity = PR_INVALID_IO_LAYER;
> fd->secret->state = _PR_FILEDESC_FREED;
min: This part is only needed in the caching branch.
@@ +127,1 @@
> {
--min: We _could_ get rid of this as well, the test below covers both cases.
@@ +163,5 @@
> if (low > high) low = high; /* sanity check the params */
>
> PR_Lock(_pr_fd_cache.ml);
> + _pr_fd_cache.limit_high = high;
> + _pr_fd_cache.limit_low = low;
med: We still should do two things here:
#1 - Going from caching to not caching: delete all the cached entries,
update the count to 0.
#2 - Going from more caching to less caching: delete all the cached
entries above the new threshold. Update the count to the new
threshold. This wasn't being done before, but it's the correct
thing to do.
maj: (Not your bug) It looks like |count| isn't actually initialized
in |_PR_InitFdCache|.
Attachment #8464772 -
Flags: feedback+
Comment 6•11 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #5)
> maj: (Not your bug) It looks like |count| isn't actually initialized
> in |_PR_InitFdCache|.
Nix that, I see now that it's a file level static.
Updated•11 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P2]
Comment 7•11 years ago
|
||
Comment on attachment 8464772 [details] [diff] [review]
nspr-filedesc.patch
Review of attachment 8464772 [details] [diff] [review]:
-----------------------------------------------------------------
Ted might have some useful feedback here.
Attachment #8464772 -
Flags: review?(ted)
| Assignee | ||
Comment 8•11 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #5)
> @@ +163,5 @@
> > if (low > high) low = high; /* sanity check the params */
> >
> > PR_Lock(_pr_fd_cache.ml);
> > + _pr_fd_cache.limit_high = high;
> > + _pr_fd_cache.limit_low = low;
>
> med: We still should do two things here:
>
> #1 - Going from caching to not caching: delete all the cached entries,
> update the count to 0.
>
> #2 - Going from more caching to less caching: delete all the cached
> entries above the new threshold. Update the count to the new
> threshold. This wasn't being done before, but it's the correct
> thing to do.
I support doing these, but I think these requests require a separate bug. I didn't want to complicate the patch review with bugfixes for separate things.
Comment 9•11 years ago
|
||
I'm sorry I forgot to review the patch yesterday. I will
review it tomorrow.
Comment 10•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #3)
>
> This is an active leak while the browser is running, so just adding
> _PR_CleanupIO() calls doesn't help.
Can you give me an LSan report of the active leak? I inspected the
relevant code but didn't see anything obviously wrong.
I did see that PR_Close may leak the fd when it returns PR_FAILURE.
PR_Close may return PR_FAILURE for two reasons:
1. The thread has been interrupted by a PR_Interrupt call. I will
attach a patch to fix this, but in mozilla-central I only found
a PR_Interrupt call in xpcom/threads/BackgroundHangMonitor.cpp,
and the hang monitor thread doesn't seem to call PR_Close. So
this should not affect Firefox.
2. The close() system call fails. But this can only happen if
the application has a bug.
So I'd like to understand this active leak before I approve
your patch to remove the use of PRStack in prfdcach.c. Thanks.
Flags: needinfo?(wtc)
Comment 11•11 years ago
|
||
Since PR_Close isn't a blocking operation, it doesn't really need
to check the thread interrupt status.
Also, most applications don't know what to do if PR_Close fails,
so a PR_Close failure will result in a leak of the PRFileDesc.
Attachment #8469756 -
Flags: review?(nfroyd)
Comment 12•11 years ago
|
||
Comment on attachment 452429 [details]
parts of a valgrind log
Nathan: this old valgrind log shows the allocation sites but
doesn't show where the memory blocks are leaked.
Can LSan show where the memory blocks are (actively) leaked?
Comment 13•11 years ago
|
||
(In reply to Wan-Teh Chang from comment #10)
> Can you give me an LSan report of the active leak? I inspected the
> relevant code but didn't see anything obviously wrong.
There are leak stacks in bug 1028456, bug 1021854, and bug 1021302.
> Can LSan show where the memory blocks are (actively) leaked?
Unfortunately, LSan only shows the allocation stacks. I do have an experimental tool that does conservative heap scanning at shutdown to give more information about what is holding things alive, if that's what you mean. I could try that.
| Assignee | ||
Comment 14•11 years ago
|
||
"active leak" is probably a bad term. See for instance bug 1039914, especially bug 1039914 comment 20, where there's ~1.5MB of unfreed memory allocated by _PR_Getfd. That memory will get reused if we open more file descriptors, but it'd be better to have the memory allocator managing that memory for all requests, rather than _PR_Getfd managing it for its own purposes.
| Assignee | ||
Comment 15•11 years ago
|
||
Comment on attachment 8469756 [details] [diff] [review]
PR_Close should not fail if the calling thread has been interrupted
Review of attachment 8469756 [details] [diff] [review]:
-----------------------------------------------------------------
Makes sense to me.
Attachment #8469756 -
Flags: review?(nfroyd) → review+
Comment 16•11 years ago
|
||
Andrew, Nathan: thanks for the info.
So, are these leaks unreachable memory, or are they the memory still
reachable from the NSPR fd cache?
If the latter, then I think the problem is that NSPR's fd cache, when
implemented using a PRStack (an atomic singly-linked list), has an
unbounded size. Do you agree?
Comment 17•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #14)
> ..., but it'd be better to have the memory allocator managing that
> memory for all requests, rather than _PR_Getfd managing it for its own
> purposes.
Free lists for commonly used objects were a common optimization technique
when this code was written. But that was before the era of jemalloc and
tcmalloc.
Is the current recommendation to just rely on the memory allocator?
Comment 18•11 years ago
|
||
> Is the current recommendation to just rely on the memory allocator?
Yes. I've removed from Firefox several such cases of heap blocks being cached and recycled over the past few years. Bug 814312 is one case, and the first part of https://blog.mozilla.org/nnethercote/2012/01/18/memshrink-progress-week-31/ describes another, and I know there were at least two more. In every case it was a simplification, a memory win, and didn't hurt speed at all.
In general, custom allocators rarely beat well-written general-purpose allocators. There's even academic research about it (http://emeryblogger.com/2012/10/28/most-influential-oopsla2012/). One common exception to this rule is that pool/arena allocators are often a win, but this case isn't one of those.
Comment 19•11 years ago
|
||
Comment on attachment 8464772 [details] [diff] [review]
nspr-filedesc.patch
It looks like Wan-Teh is already invested here, so it doesn't seem worthwhile for me to try to come up to speed. Feel free to re-request if you really need me.
Attachment #8464772 -
Flags: review?(ted)
Comment 20•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #18)
> > Is the current recommendation to just rely on the memory allocator?
>
> Yes. I've removed from Firefox several such cases of heap blocks being
> cached and recycled over the past few years.
Nathan, Nicholas: in that case, should we just remove the _PR_Fd_Cache
and the _PR_Getfd and _PR_Putfd functions completely?
| Assignee | ||
Comment 21•11 years ago
|
||
(In reply to Wan-Teh Chang from comment #20)
> (In reply to Nicholas Nethercote [:njn] from comment #18)
> > > Is the current recommendation to just rely on the memory allocator?
> >
> > Yes. I've removed from Firefox several such cases of heap blocks being
> > cached and recycled over the past few years.
>
> Nathan, Nicholas: in that case, should we just remove the _PR_Fd_Cache
> and the _PR_Getfd and _PR_Putfd functions completely?
I think removing _PR_Fd_Cache makes sense. Removing the other two functions seems like a bad idea, since they perform functions that would be tedious to reproduce at each callsite: basic clearing for _PR_Getfd and poisoning and secret-freeing for _PR_Putfd. Renaming the functions to something like _PR_AllocFD and _PR_FreeFD make more sense to me.
Comment 22•11 years ago
|
||
Nathan's suggestion sounds good to me.
| Assignee | ||
Comment 23•11 years ago
|
||
Wan-Teh, do you have an opinion on the patch?
Flags: needinfo?(wtc)
Updated•11 years ago
|
Comment 24•11 years ago
|
||
(In reply to Nathan Froyd (:froydnj) from comment #21)
>
> I think removing _PR_Fd_Cache makes sense. Removing [the _PR_Getfd and _PR_Putfd functions]
> seems like a bad idea, since they perform functions that would be tedious to
> reproduce at each callsite: basic clearing for _PR_Getfd and poisoning and
> secret-freeing for _PR_Putfd.
Poisoning will be done by the memory allocator. But I agree with the other points.
> Renaming the functions to something like _PR_AllocFD and _PR_FreeFD make more
> sense to me.
Yes. They could be confused with the exported functions PR_AllocFileDesc and
PR_FreeFileDesc though. I don't think we need _PR_FreeFD. We may not need
_PR_AllocFD -- we may be able to just change the callers of _PR_Getfd to
call PR_AllocFileDesc instead. I haven't looked into this though.
Flags: needinfo?(wtc)
Comment 25•11 years ago
|
||
Nathan, Nicholas:
Sorry about the late reply.
Ideally, I'd prefer that we make the current lock-free _PR_Fd_Cache have
a bounded size. We should be able to do this by incrementing and decrementing
an element count variable atomically. This way, applications that can't
use an alternative memory allocator will continue to benefit from the
_PR_Fd_Cache.
If you don't have time to do that, let's just get rid of _PR_Fd_Cache.
I won't be able to review your patches in the next two weeks. Please
have one of you review the other person's patches. Then, Kai Engert (:kaie)
or Ted can check in the patches for you. Thanks.
| Assignee | ||
Comment 26•11 years ago
|
||
This is a new patch, with the deletion of stack2fd as suggested by Eric. I've chosen to leave Eric's other suggestions regarding code simplification undone in this bug, because I think they make the patch easier to review. Eric's comment about _pr_fd_cache.count not being initialized in _PR_InitFdCache is, I think, superfluous, because the whole of _pr_fd_cache is initialized to zero as static data.
So I think this is ready for review by Nicholas, per Wan-Teh's comment 25.
Attachment #8464772 -
Attachment is obsolete: true
Attachment #8464772 -
Flags: review?(wtc)
Attachment #8485959 -
Flags: review?(n.nethercote)
Comment 27•11 years ago
|
||
Comment on attachment 8485959 [details] [diff] [review]
nspr-filedesc.patch
Review of attachment 8485959 [details] [diff] [review]:
-----------------------------------------------------------------
I'm giving f+ because this is the first time I've looked at this code. I suggest r?ing erahm once you've addressed my comments, because he has looked more closely at this code than I have.
::: pr/src/io/prfdcach.c
@@ +126,2 @@
> {
> + if (_pr_fd_cache.count < _pr_fd_cache.limit_high)
Shouldn't this be |<=| instead of |<|?
@@ +146,5 @@
> }
> }
> +
> + PR_Free(fd->secret);
> + PR_Free(fd);
These PR_Free() calls now execute if |0 == _pr_fd_cache.limit_high| is true. They didn't before. Is this intentional?
If not, given this issue and the |<=| issue above, I'd suggest changing the control flow as little as possible. Changing the |0 ==| to |0 !=| is fine, because that removes an otherwise-empty then-branch, but I'd undo the other control flow changes.
Attachment #8485959 -
Flags: review?(n.nethercote) → feedback+
| Assignee | ||
Comment 28•11 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #27)
> Comment on attachment 8485959 [details] [diff] [review]
> nspr-filedesc.patch
>
> Review of attachment 8485959 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> I'm giving f+ because this is the first time I've looked at this code. I
> suggest r?ing erahm once you've addressed my comments, because he has looked
> more closely at this code than I have.
>
> ::: pr/src/io/prfdcach.c
> @@ +126,2 @@
> > {
> > + if (_pr_fd_cache.count < _pr_fd_cache.limit_high)
>
> Shouldn't this be |<=| instead of |<|?
Let's say _pr_fd_cache.limit_high is 100.
* In the _pr_fd_cache.count = 99 case, we do want to add the entry to the cache;
* In the _pr_fd_cache.count = 100 case, we do not want to add the entry to the cache.
So I think |<| is correct and the previous code had an off-by-one error in it: it would add an entry to the cache when the cache already contained as many entries as limit_high specified. I could see the case for being conservative and making it |<=| for compatibility with the existing code.
> @@ +146,5 @@
> > }
> > }
> > +
> > + PR_Free(fd->secret);
> > + PR_Free(fd);
>
> These PR_Free() calls now execute if |0 == _pr_fd_cache.limit_high| is true.
> They didn't before. Is this intentional?
Yes, this is intentional. Elsewhere in the code, |_pr_fd_cache.limit_high == 0| is cited as the condition for disabling the cache. And when the cache is disabled, we always want to free the entries.
> If not, given this issue and the |<=| issue above, I'd suggest changing the
> control flow as little as possible. Changing the |0 ==| to |0 !=| is fine,
> because that removes an otherwise-empty then-branch, but I'd undo the other
> control flow changes.
I am confused as to how inverting the |0 ==| to |0 !=| eliminates an otherwise-empty then-case if you want to retain the rest of the control flow in that function. Because you still need to do something in the |0 == _pr_fd_cache.limit_high| case, whether you have control flow like:
if (0 != _pr_fd_cache.limit_high) {
if (_pr_fd_cache.count > _pr_fd_cache.limit_high) {
/* free the fd */
} else {
/* place fd in cache */
}
return;
}
/* free the fd */
or whether it's written as in the patch, combining the two /* free the fd */ cases. If you want maximal retaining of the control flow, I suppose you could write:
if (0 == _pr_fd_cache.limit_high) {
/* free the fd */
} else {
if (_pr_fd_cache.count > _pr_fd_cache.limit_high) {
/* free the fd */
} else {
/* place fd in cache */
}
}
Can you help clarify things for me here?
Flags: needinfo?(n.nethercote)
Comment 29•11 years ago
|
||
If the control flow changes are deliberate, that's fine. I just wanted to make sure they were deliberate.
Flags: needinfo?(n.nethercote)
| Assignee | ||
Comment 30•11 years ago
|
||
Comment on attachment 8485959 [details] [diff] [review]
nspr-filedesc.patch
Review of attachment 8485959 [details] [diff] [review]:
-----------------------------------------------------------------
OK, asking for Eric's review then.
Attachment #8485959 -
Flags: review?(erahm)
Comment 31•11 years ago
|
||
Comment on attachment 8485959 [details] [diff] [review]
nspr-filedesc.patch
Review of attachment 8485959 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm. Can we also make FD caching opt-in for debug builds [1] as well? Can you also file follow ups for the issues mentioned in comment 8?
[1] http://hg.mozilla.org/mozilla-central/annotate/6b8da5940f74/nsprpub/pr/src/io/prfdcach.c#l239
Attachment #8485959 -
Flags: review?(erahm) → review+
| Assignee | ||
Comment 32•11 years ago
|
||
Updating patch with r=erahm. Ted, Wan-Teh said you can you check this in; can you do that, please?
Attachment #8485959 -
Attachment is obsolete: true
Attachment #8489559 -
Flags: review+
Flags: needinfo?(ted)
| Assignee | ||
Comment 33•11 years ago
|
||
I filed bug 1067536 for the opt-in DEBUG setting and bug 1067537 for addressing the issues Eric pointed out in comment 8.
Comment 34•11 years ago
|
||
http://hg.mozilla.org/projects/nspr/rev/84acbf6789fb
Sorry for the delay. Do you need a new NSPR tag to sync this to mozilla-central?
Flags: needinfo?(ted)
| Assignee | ||
Comment 35•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #34)
> http://hg.mozilla.org/projects/nspr/rev/84acbf6789fb
>
> Sorry for the delay. Do you need a new NSPR tag to sync this to
> mozilla-central?
Yes please. Can you handle the tagging and the syncing to central?
Flags: needinfo?(ted)
Comment 36•11 years ago
|
||
I tagged NSPR_4_10_8_BETA1, you can push it yourself using
python client.py update_nspr NSPR_4_10_8_BETA1
Flags: needinfo?(ted)
Updated•11 years ago
|
Assignee: wtc → nfroyd
Priority: -- → P1
Target Milestone: --- → 4.10.8
Comment 37•11 years ago
|
||
Comment on attachment 8489559 [details] [diff] [review]
nspr-filedesc.patch
Review of attachment 8489559 [details] [diff] [review]:
-----------------------------------------------------------------
r=wtc. I have a question about a minor issue.
::: pr/src/io/prfdcach.c
@@ +126,2 @@
> {
> + if (_pr_fd_cache.count < _pr_fd_cache.limit_high)
To match the original code, this should be <= instead of < , right?
The original code tests _pr_fd_cache.count > _pr_fd_cache.limit_high
for the opposite condition. Is the original code wrong?
Attachment #8489559 -
Flags: review+
| Assignee | ||
Comment 39•11 years ago
|
||
(In reply to Wan-Teh Chang from comment #37)
> Comment on attachment 8489559 [details] [diff] [review]
> nspr-filedesc.patch
>
> Review of attachment 8489559 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> r=wtc. I have a question about a minor issue.
>
> ::: pr/src/io/prfdcach.c
> @@ +126,2 @@
> > {
> > + if (_pr_fd_cache.count < _pr_fd_cache.limit_high)
>
> To match the original code, this should be <= instead of < , right?
Yes.
> The original code tests _pr_fd_cache.count > _pr_fd_cache.limit_high
> for the opposite condition. Is the original code wrong?
I think so. Nicholas asked the same question in his review of the patch. I responded to that question in the first part of comment 28; assuming I understood what |limit_high| represented, I think the original code had an off-by-one error, because we could wind up adding an entry to the cache when count == limit_high. Which would mean that we could have more than limit_high entries in the cache, which seems wrong (although admittedly a minor mistake).
Flags: needinfo?(nfroyd)
Comment 40•11 years ago
|
||
This issue is in ASSIGNED state, but I can see a commit already against the 4.10.8 version and 4.10.8 is already released. Is this issue resolved or not?
Comment 41•11 years ago
|
||
This bug is fixed in NSPR 4.10.8, which was released on Jan 28, 2015.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•