Open Bug 1187421 Opened 9 years ago Updated 2 years ago

Leak in error_get_my_stack

Categories

(NSPR :: NSPR, defect)

defect

Tracking

(Not tracked)

People

(Reporter: mccr8, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: memory-leak)

Attachments

(2 files)

LeakSanitizer detected this small leak in a Firefox content process:

Direct leak of 72 byte(s) in 1 object(s).
    #0 0x4750f1 in calloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:90
    #1 0x7ff63c019890 in error_get_my_stack src/security/nss/lib/base/error.c:109
    #2 0x7ff63c019c38 in nss_ClearErrorStack src/security/nss/lib/base/error.c:245
    #3 0x7ff63c0182f8 in NSSArena_Create src/security/nss/lib/base/arena.c:352
    #4 0x7ff63bff452b in NSSTrustDomain_Create src/security/nss/lib/pki/trustdomain.c:34

error_get_my_stack allocates a new stack and then calls PR_SetThreadPrivate() on it, so I'm guessing the thread private data isn't being freed somehow. It looks like that's supposed to happen in nss_DestroyErrorStack().

This happens with dom/media/tests/mochitest/ipc/test_ipc.html, and in e10s Mochitest-1. I haven't narrowed it down any further.
This is still pretty widespread across other mochitest suites as well.
Tim, is there any chance you'd be able to look at this? It's the only NSS LSAN suppression we've got at this point.
Flags: needinfo?(ttaubert)
I've seen this before. The error stack is allocated whenever we hit errors. Looks like we're not calling PR_Cleanup() or whatever again the function was that cleans this up.
Flags: needinfo?(ttaubert)
I'm not sure where the right place to call PR_Cleanup() would be? As long as Firefox is still using NSPR too we can't call it right after NSS_Shutdown().
Assignee: nobody → nobody
Component: Libraries → NSPR
Product: NSS → NSPR
Version: 3.15 → other
I ran into this bug as well. It looks to me like there are multiple leaks here. This patch fixes them. (And hopefully doesn't break anything.)
Attachment #8838442 - Flags: review?(ted)
This is the NSS part of the leak fix.
Assignee: nobody → franziskuskiefer
Attachment #8838444 - Flags: review?(ttaubert)
Comment on attachment 8838444 [details] [diff] [review]
fix-error-leak-nss.patch

Review of attachment 8838444 [details] [diff] [review]:
-----------------------------------------------------------------

::: lib/base/error.c
@@ +265,5 @@
>  {
>      if (INVALID_TPD_INDEX != error_stack_index) {
>          PR_SetThreadPrivate(error_stack_index, NULL);
> +        error_stack_index = INVALID_TPD_INDEX;
> +        error_call_once = error_call_again; /* allow to init again */

I've seen memset(0) used to achieve the same, we do that in libssl at least. So maybe:

PORT_Memset(&error_call_once, 0, sizeof(error_call_once));
Attachment #8838444 - Flags: review?(ttaubert) → review+
Please remove the leak suppression for this bug from build/sanitizers/lsan_suppressions.txt, assuming that an ASan run is green with your patches.
(In reply to Franziskus Kiefer [:fkiefer or :franziskus] from comment #11)
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d177f0889c53

Looks like this doesn't fix the Firefox problem :( Something else must be broken (probably not on the NSS side as I can't reproduce the leak with only NSS anymore).
Thanks for checking! Maybe the whitelist could be changed to NSSTrustDomain_Create, which sounds less generic. It looks like the problem is much less common than described in comment 0, so I think a bunch of stuff was fixed.
See Also: → 1341354
I filed bug 1341354 for the residual leaks.
Attachment #8838442 - Flags: review?(ted) → review?(kaie)
Comment on attachment 8838442 [details] [diff] [review]
fix-nspr-leaks.patch

I suspect this might have been knowingly allowed to leak in the past.

Looking at _PR_DestroyThreadPrivate, it attempts to loop through TPD multiple times, and assumes that some items could have unpredictable side effects, and maybe even the TPD items could be set again, after the loop.

If there's anything pending that couldn't get destroyed, then destroying _pr_tpd_destructors could potentially result in a new shutdown crash, right?

If you want to try getting this cleaned up, please set the variables to NULL after freeing them (which the existing code already does for most other variables).
Additional comment:

You are using a mix of free and PR_Free. I checked if it matches the allocation.

$ grep -e "->name *=" pr/src/linking/prlink.c

This shows that it always gets allocated using strdup.
Please use "free" to destroy pr_exe_loadmap->name, not PR_Free.


pr_exe_loadmap is allocated using PR_NEWZAP, which uses PR_Calloc.
Using PR_Free to destroy it is correct.


_pr_tpd_destructors is allocated using PR_CALLOC.
Please use PR_Free to destroy it, not "free".
Attachment #8838442 - Flags: review?(kaie) → review-
I researched when this code was added, it was more than 15 years ago:
  https://hg.mozilla.org/projects/nspr/rev/9640a6cb2c22

Function _PR_ShutdownLinker is still in that original state.
See Also: → 1437404
Assignee: franziskuskiefer → nobody
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: