NSPR pthread_key_t leak and memory corruption

RESOLVED FIXED in 4.10.3

Status

defect
P1
normal
RESOLVED FIXED
7 years ago
6 years ago

People

(Reporter: elio.maldonado.batiz, Assigned: kaie)

Tracking

other
4.10.3
All
Linux

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: summary: comment 45)

Attachments

(8 attachments, 7 obsolete attachments)

1.06 KB, text/plain
Details
1.54 KB, text/plain
Details
2.00 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
2.65 KB, patch
Details | Diff | Splinter Review
1.27 KB, patch
Details | Diff | Splinter Review
737 bytes, patch
kaie
: review+
Details | Diff | Splinter Review
2.59 KB, patch
rrelyea
: review+
Details | Diff | Splinter Review
2.15 KB, patch
kaie
: review+
Details | Diff | Splinter Review
Reporter

Description

7 years ago
This bug was originally reported against nspr 4.8.6 on RHEL 5 with heavy usage but with the reproducers provided I could see on RHEL 6 and fedora-16.
Assignee

Updated

7 years ago
Attachment #619215 - Attachment mime type: text/x-csrc → text/plain
Assignee

Updated

7 years ago
Attachment #619217 - Attachment mime type: text/x-csrc → text/plain
Assignee

Comment 3

7 years ago
Comment on attachment 619218 [details] [diff] [review]
Proof of concept fix from Joe Orton (fix the leak)

Did you intend to ask Wan-Teh for review?
Attachment #619218 - Attachment is patch: true
Attachment #619218 - Flags: review?(wtc)
Reporter

Comment 4

7 years ago
This hasn't gotten attention for quite some time, I hope it will soon when Wan-Teh comes back. Red Hat folks doing planning are asking me if we can get it on time for an upcoming RHEL update.
Assignee

Comment 5

7 years ago
Elio, I propose to bring such issues to attention during the weekly conf call, if they are urgent. Bugmail is too easy to miss.
Assignee

Comment 6

7 years ago
Comment on attachment 619218 [details] [diff] [review]
Proof of concept fix from Joe Orton (fix the leak)

I agree with this patch.

The manual page for pthread_key_create says that it's the responsibility of the application to cleanup the resources associated to the key.

Because _PR_Fini is a function called at unloading a shared library, I agree this is the correct place to perform the cleanup. In particular the existing comment suggests to do it here.

r=kaie

However, I propose a minor enhancement for portability. We should define a symbol, in the same way as it's being done for pthread_key_create, because the name of that function may differ between platforms.
Attachment #619218 - Flags: review?(wtc) → review+
Assignee

Comment 7

7 years ago
Slightly improved patch for portability.
Assignee

Comment 8

7 years ago
Explanation what this bug is about and why we need it fixed:

If the NSPR shared libraries get repeatedly loaded and unloaded during the lifetime of a single process, NSPR might repeatedly allocate the same key again and again, with never freeing it up. Eventually the memory available for thread specific keys will be exhausted and the application will fail.

By cleaning up we avoid this problem and increase reliability.

Comment 9

7 years ago
Just a note, there are really two separate bugs here:

a) memory leak => NSPR never calls pthread_key_delete()
b) memory corruption => NSPR corrupts memory if pthread_key_create() fails by not handling the error, it then "steals" another application's TSD and writes over it.  This is arguably the more severe bug.

Calling pthread_key_delete() in _PR_Fini fixes (a) but not (b).
Assignee

Comment 10

7 years ago
Joe, I agree with you.

But I don't have a quick solution for (b).

If we run in a well behaved environment, and NSPR cleans up after itself (a),
then it's much less likely that we'll suffer from (b), do you agree?

I understand that (a) is the quick fix that we want to get done.
Assignee

Comment 11

7 years ago
I believe it's difficult to find a solution for (b)
that would allow the code to function anywhy.

It's basically a kind of out-of-memory error of critical coordination memory.

But maybe we should detect (b) and abort the application.
Assignee

Updated

7 years ago
Attachment #645312 - Attachment description: Patch v2 → Patch v2 (fix the leak)
Assignee

Updated

7 years ago
Attachment #619218 - Attachment description: Proof of concept fix from Joe Orton → Proof of concept fix from Joe Orton (fix the leak)

Comment 13

7 years ago
Comment on attachment 645312 [details] [diff] [review]
Patch v2 (fix the leak) [checked in]

r+ rrelyea.

This patch is safe. We are freeing our thread keyjust before unloading nspr (which will clear the state of _pr_initialized if we ever reload NSPR).

I'm r+ it but it's not complete. If we call PR_Cleanup() before we unload, we won't free the thread key.

bob
Attachment #645312 - Flags: review+
Assignee

Comment 14

7 years ago
Checking in include/md/_pth.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_pth.h,v  <--  _pth.h
new revision: 3.40; previous revision: 3.39
done
Checking in src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v  <--  ptthread.c
new revision: 3.97; previous revision: 3.96
done
Assignee

Updated

7 years ago
Attachment #645312 - Attachment description: Patch v2 (fix the leak) → Patch v2 (fix the leak) [checked in]
Assignee

Comment 15

7 years ago
> I'm r+ it but it's not complete. If we call PR_Cleanup() before we unload,
> we won't free the thread key.

... because PR_Cleanup() will set _pr_initialized to FALSE.

This makes it difficult to decide in _PR_FINI whether it's necessary
to call pthread_key_delete.

I propose that we avoid this difficulty and call pthread_key_delete
from inside PR_Cleanup().
Assignee

Comment 16

7 years ago
(In reply to Kai Engert (:kaie) from comment #15)
> > I'm r+ it but it's not complete. If we call PR_Cleanup() before we unload,
> > we won't free the thread key.
> 
> ... because PR_Cleanup() will set _pr_initialized to FALSE.
> 
> This makes it difficult to decide in _PR_FINI whether it's necessary
> to call pthread_key_delete.
> 
> I propose that we avoid this difficulty and call pthread_key_delete
> from inside PR_Cleanup().

Actually, maybe that's too risky if there's still another thread active.

It would be safer to add another bool to pt_book that says whether or not we must free the key.
Assignee

Comment 17

7 years ago
Assignee: wtc → kaie
Attachment #619218 - Attachment is obsolete: true
Attachment #645351 - Attachment is obsolete: true
Attachment #645381 - Flags: review?(rrelyea)

Comment 18

7 years ago
Comment on attachment 645381 [details] [diff] [review]
Incremental patch v4, cleanup scenario + abort on failure

r+ rrelyea, but I'd also like wtc to review since it's in NSPR.

Also, I think we can check pt_book.mustFreeThreadKey beforew we try to do a PT_PTHREAD_KEY_CREATE() and not call PT_PTHREAD_KEY_CREATE if pt_book.mustFreeThreadKey is true.

Also, I assume your change of the PR_ASSERTs are so that you blow up even if DEBUG is not on. I'd like wtc's opinion on whether or not this is correct, or if we should just return. I totally agree that we shouldn't set pt_book.mustFreeThreadKey if PT_PTHREAD_KEY_CREATE didn't succeed.

bob
Attachment #645381 - Flags: review?(rrelyea) → review+

Comment 19

7 years ago
Comment on attachment 645312 [details] [diff] [review]
Patch v2 (fix the leak) [checked in]

Review of attachment 645312 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla/nsprpub/pr/src/pthreads/ptthread.c
@@ +975,4 @@
>          rv = pthread_setspecific(pt_book.key, NULL);
>          PR_ASSERT(0 == rv);
>      }
> +    _PT_PTHREAD_KEY_DELETE(pt_book.key);

Please remove the _PT_PTHREAD_KEY_DELETE macro definition
from _pth.h and just use pthread_key_delete.  Thanks.

Comment 20

7 years ago
Comment on attachment 645381 [details] [diff] [review]
Incremental patch v4, cleanup scenario + abort on failure

Review of attachment 645381 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla/nsprpub/pr/src/pthreads/ptthread.c
@@ +42,5 @@
>      PRLock *ml;                 /* a lock to protect ourselves */
>      PRCondVar *cv;              /* used to signal global things */
>      PRInt32 system, user;       /* a count of the two different types */
>      PRUintn this_many;          /* number of threads allowed for exit */
>      pthread_key_t key;          /* private private data key */

Please change "private private" to "thread private".

@@ +43,5 @@
>      PRCondVar *cv;              /* used to signal global things */
>      PRInt32 system, user;       /* a count of the two different types */
>      PRUintn this_many;          /* number of threads allowed for exit */
>      pthread_key_t key;          /* private private data key */
> +    PRBool mustFreeThreadKey;

Please document this boolean member.  For example:
  /* whether 'key' should be deleted */

I suggest renaming this boolean member "keyCreated" or "keyValid",
to mean whether 'key' has been created or is valid.  The name
"mustFreeThreadKey" is a little long.

@@ +970,5 @@
> +         * or we already went through PR_Cleanup()
> +         */
> +        if (pt_book.mustFreeThreadKey)
> +        {
> +            _PT_PTHREAD_KEY_DELETE(pt_book.key);

You should set pt_book.mustFreeThreadKey to PR_FALSE after
this line.

You can do something like this:
  if (pt_book.mustFreeThreadKey) {
    pthread_key_delete(pt_book.key);
    pt_book.mustFreeThreadKey = PR_FALSE;
  }

  /* Either NSPR was never successfully initialized or PR_Cleanup has been called. */
  if (!_pr_initialized) return;

You need to remove the _PT_PTHREAD_KEY_DELETE call you added
in the previous patch.
Assignee

Comment 21

7 years ago
Wan-Teh, I'm trying to understand your proposal from comment 20.

The way I understand it doesn't make sense, because pt_book.key would be used after having destroying it already.

I will attach a new patch that addresses the renames and comments you had proposed.
Assignee

Comment 22

7 years ago
Posted patch Patch v5 (obsolete) — Splinter Review
Wan-Teh, is this ok?
Attachment #652387 - Flags: review?(wtc)

Updated

7 years ago
Status: NEW → ASSIGNED
Priority: -- → P1
Hardware: x86_64 → All
Target Milestone: --- → 4.9.3
Version: 4.9 → other
Reporter

Comment 23

7 years ago
Hi Wan-Teh, have had a chance to review Kai's attachment? I was hoping this would make to cut for 4.9.3.

Comment 24

7 years ago
Comment on attachment 652387 [details] [diff] [review]
Patch v5

Sorry, I thought this bug is already fixed. I will take a look
at this patch.

Comment 25

7 years ago
Posted patch Patch v6 (obsolete) — Splinter Review
Kai, please review this patch.

I take a simpler approach. I just add a pthread_key_delete call
to PR_Cleanup, but only when there is no other thread active.
(You pointed out this issue in comment 16.)

Let's see if this simpler approach is good enough in practice.
If not, then we will use the approach of your patch v5 (only
call pthread_key_delete in _PR_Fini and use a keyCreated boolean
to keep track of whether the key needs to be deleted).

I didn't fix problem b (memory corruption) that Joe Orton reported
in comment 9. We need to deal with that separately. Many things
break if NSPR initialization fails.

Note: I verified that when libnspr4.so is unloaded and reloaded,
_pr_initialized is 0, even if we don't call PR_Cleanup before
unloading libnspr4.so. This confirms what Bob said in comment 13.

I've checked in the patch on the NSPR trunk (NSPR 4.9.3). I will
address your review comments with a new patch.

Checking in pr/include/md/_pth.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_pth.h,v  <--  _pth.h
new revision: 3.41; previous revision: 3.40
done
Checking in pr/src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v  <--  ptthread.c
new revision: 3.98; previous revision: 3.97
done
Attachment #652387 - Attachment is obsolete: true
Attachment #652387 - Flags: review?(wtc)
Attachment #673474 - Flags: review?(kaie)

Comment 26

7 years ago
This cumulative patch summarizes all the patches that have
been checked in.

Updated

7 years ago
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED

Comment 27

7 years ago
(In reply to Kai Engert (:kaie) from comment #21)
> Wan-Teh, I'm trying to understand your proposal from comment 20.
> 
> The way I understand it doesn't make sense, because pt_book.key would be
> used after having destroying it already.

Just for the record: my suggestion was wrong. Thank you for catching my
mistake.
Assignee

Comment 28

7 years ago
Our test machine buildnss01 frequently reports a failure with a crashed selfserv.
I just looked at the crash, and it might be related to this bug:

#0  0x0000003582a30285 in raise () from /lib64/libc.so.6
#1  0x0000003582a31d30 in abort () from /lib64/libc.so.6
#2  0x00002b06d6561c11 in PR_Assert (s=0x2b06d65907af "0 == rv", file=0x2b06d6590788 "../../../../pr/src/pthreads/ptthread.c", ln=772) at ../../../../pr/src/io/prlog.c:554
#3  0x00002b06d658655f in _pt_thread_death (arg=0x1e32d650) at ../../../../pr/src/pthreads/ptthread.c:772
#4  0x0000003583a05b19 in __nptl_deallocate_tsd () from /lib64/libpthread.so.0
#5  0x0000003583a0678b in start_thread () from /lib64/libpthread.so.0
#6  0x0000003582ad325d in clone () from /lib64/libc.so.6

I marked the line containing the assert with !!!

766         /* PR_TRUE for: call destructors */
767         _pt_thread_death_internal(arg, PR_TRUE);
768
769         if (NULL == thred)
770         {
771             rv = pthread_setspecific(pt_book.key, NULL);
772  !!!        PR_ASSERT(0 == rv);
773         }
Assignee

Comment 29

7 years ago
... and this crash was reported 40 minutes ago, meaning we still crash after Wan-Teh's most recent change to CVS.
Assignee

Comment 30

7 years ago
reopening, because we're not done
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee

Comment 31

7 years ago
Actually, it seems this new crash was INTRODUCED by the changes since Friday.

Sorry, but I don't understand yet why patch v6 is supposed to work. It's a long time since I've worked on this, and I must re-read all involved code before I can make a judgement.

But given that we have a new crash (and this checkin seems the only one that could have introduced it since Friday), I want to back it out, in order to see if the crash goes away.

If this patch v6 was responsible, then I propose we try to check in my proposed patch, to see if it works better.
Assignee

Comment 32

7 years ago
Comment on attachment 673474 [details] [diff] [review]
Patch v6

backed out

Checking in include/md/_pth.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_pth.h,v  <--  _pth.h
new revision: 3.42; previous revision: 3.41
done
Checking in src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v  <--  ptthread.c
new revision: 3.99; previous revision: 3.98
done

Comment 33

7 years ago
Kai, thank you for your help.

The assertion failures that caused the test failures are:

ssl.sh: Stress TLS RC4 128 with MD5 (session ticket, client auth) ----
selfserv starting at Mon Oct 22 09:30:08 EDT 2012
selfserv -D -p 8543 -d ../ext_server -n buildnss01.localdomain -B -s \
         -e buildnss01.localdomain-ec -w nss -r -r -u -i ../tests_pid.14519  &
trying to connect to selfserv at Mon Oct 22 09:30:09 EDT 2012
tstclnt -p 8543 -h buildnss01.localdomain  -q \
        -d ../ext_client -v < /home/tinderbox/mozilla/security/tinderlight/data/buildnss01_trunk_64_DBG/mozilla/security/nss/tests/ssl/sslreq.dat
tstclnt: connecting to buildnss01.localdomain:8543 (address=127.0.0.1)
kill -0 15277 >/dev/null 2>/dev/null
selfserv with PID 15277 found at Mon Oct 22 09:30:09 EDT 2012
selfserv with PID 15277 started at Mon Oct 22 09:30:09 EDT 2012
strsclnt -q -p 8543 -d ../ext_client  -w nss -V ssl3: -c 100 -C c -n ExtendedSSLUser -u \
          buildnss01.localdomain
strsclnt started at Mon Oct 22 09:30:09 EDT 2012
strsclnt: -- SSL: Server Certificate Validated.
strsclnt: 0 cache hits; 1 cache misses, 0 cache not reusable
          0 stateless resumes
strsclnt: 99 cache hits; 1 cache misses, 0 cache not reusable
          99 stateless resumes
strsclnt completed at Mon Oct 22 09:30:09 EDT 2012
ssl.sh: #2079: Stress TLS RC4 128 with MD5 (session ticket, client auth) produced a returncode of 0, expected is 0.  - PASSED
trying to kill selfserv with PID 15277 at Mon Oct 22 09:30:09 EDT 2012
kill -USR1 15277
Assertion failure: 0 == rv, at ../../../../pr/src/pthreads/ptthread.c:979
selfserv: 99 cache hits; 1 cache misses, 0 cache not reusable
          99 stateless resumes, 0 ticket parse failures
selfserv: normal termination
1154582848[1e341f20]: Assertion failure: 0 == rv, at ../../../../pr/src/pthreads/ptthread.c:772
Assertion failure: 0 == rv, at ../../../../pr/src/pthreads/ptthread.c:772
./ssl.sh: line 173: 15277 Aborted                 (core dumped) ${PROFTOOL} ${BINDIR}/selfserv -D -p ${PORT} -d ${P_R_SERVERDIR} -n ${HOSTADDR} ${SERVER_OPTIONS} ${ECC_OPTIONS} -w nss ${sparam} -i ${R_SERVERPID} $verbose
selfserv -b -p 8543 2>/dev/null;
selfserv with PID 15277 killed at Mon Oct 22 09:30:09 EDT 2012
ssl.sh: #2080: kill_selfserv core detection step - Core file is detected - FAILED

The first assertion failure (much more common) is:

 960 void _PR_Fini(void)
 961 {
 962     void *thred;
 963     int rv;
 964 
 965     if (!_pr_initialized) return;
 966 
 967     _PT_PTHREAD_GETSPECIFIC(pt_book.key, thred);
 968     if (NULL != thred)
 969     {
 970         /*
 971          * PR_FALSE, because it is unsafe to call back to the 
 972          * thread private data destructors at final cleanup.
 973          */
 974         _pt_thread_death_internal(thred, PR_FALSE);
 975         rv = pthread_setspecific(pt_book.key, NULL);
 976         PR_ASSERT(0 == rv);
 977     }
 978     rv = pthread_key_delete(pt_book.key);
 979     PR_ASSERT(0 == rv);  <=== THIS FAILED.
 980     /* TODO: free other resources used by NSPR */
 981     /* _pr_initialized = PR_FALSE; */
 982 }  /* _PR_Fini */

The second assertion failure is:

750 static void _pt_thread_death(void *arg)
 751 {
 752     void *thred;
 753     int rv;
 754 
 755     _PT_PTHREAD_GETSPECIFIC(pt_book.key, thred);
 756     if (NULL == thred)
 757     {
 758         /*
 759          * Have PR_GetCurrentThread return the expected value to the
 760          * destructors.
 761          */
 762         rv = pthread_setspecific(pt_book.key, arg);
 763         PR_ASSERT(0 == rv);
 764     }
 765 
 766     /* PR_TRUE for: call destructors */
 767     _pt_thread_death_internal(arg, PR_TRUE);
 768 
 769     if (NULL == thred)
 770     {
 771         rv = pthread_setspecific(pt_book.key, NULL);
 772         PR_ASSERT(0 == rv);  <=== THIS FAILED.
 773     }
 774 }
Status: REOPENED → ASSIGNED

Comment 34

7 years ago
I checked in the code cleanup changes from Kai's patch v5
(attachment 652387 [details] [diff] [review]).

Checking in pr/include/md/_pth.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_pth.h,v  <--  _pth.h
new revision: 3.43; previous revision: 3.42
done
Checking in pr/src/pthreads/ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v  <--  ptthread.c
new revision: 3.100; previous revision: 3.99
done
Attachment #673474 - Attachment is obsolete: true
Attachment #673474 - Flags: review?(kaie)

Comment 35

7 years ago
Attachment #673476 - Attachment is obsolete: true

Comment 36

7 years ago
Posted patch The rest of Kai's patch v5 (obsolete) — Splinter Review

Comment 37

7 years ago
I want to see if this assertion fails, without any change to PR_Cleanup.

Checking in ptthread.c;
/cvsroot/mozilla/nsprpub/pr/src/pthreads/ptthread.c,v  <--  ptthread.c
new revision: 3.101; previous revision: 3.100
done

If this assertion fails, please back out this change only. Keep the
code cleanup changes.
Attachment #674082 - Flags: review?(kaie)
Assignee

Updated

7 years ago
Attachment #674082 - Flags: review?(kaie) → review+
Reporter

Comment 38

6 years ago
Why is this bug still in the assigned state?
Reporter

Comment 39

6 years ago
Kai, Wan-Teh,

This bug is still in the assigned state. The patches where approved and committed to cvs but were not picked up with the migration to mercurial. Was this intentional because may be extra work to do or or oversight due to never having been marked as fixed?  Red Hat needs to know soon we can pick it up for use on the upcoming update to RHEL-6.  I am inclined to pick it up and I need to make a recommendation very soon. Please, let me know if it will added to the current sources in the hg repository and its safe for me to use it. Thanks in advance for your answer.

-Elio
Flags: needinfo?(kaie)
Assignee

Comment 40

6 years ago
Elio, there are multiple patches in this bug, and it's not clear to me, which patches still need landing.

Elio, can you please make it clear, which patches are you interested in using? It's best if you quote the attachment numbers.
Flags: needinfo?(kaie)
Assignee

Comment 41

6 years ago
Comment on attachment 674074 [details] [diff] [review]
Code cleanup changes from Kai's patch v5 [checked in]

This patch from comment 34 is contained in HG.
Assignee

Comment 42

6 years ago
Comment on attachment 674082 [details] [diff] [review]
Assert that the pthread_key_delete call in _PR_Fini succeeds [checked in]

This patch from comment 37 is contained in HG
Assignee

Comment 43

6 years ago
Elio, unless you think any of the other patches still must be checked in to HG, this should be marked fixed.

I'm no longer certain what the patch from comment 36 is about. It might have been an alternative proposal that is no longer necessary, because of the patches that Wan-Teh had checked in?
Reporter

Comment 44

6 years ago
Kai, 

I must have been comparing the wrong local copies in my system. After redoing the diff of ptthread.c of last state from from CVS against the current from hg, I see that the current one one has everything that you checked in. Everything needed made in the switch. No other patches are needed. Sorry for the confusion. It seems to me that this bug should be switched to RESOLVED.

Thanks,

-Elio
Assignee

Updated

6 years ago
Status: ASSIGNED → RESOLVED
Closed: 7 years ago6 years ago
Resolution: --- → FIXED
Assignee

Comment 45

6 years ago
After reading more carefully. I'm reopening this bug.

This bug is still in an incomplete state. It's unfortunate we aren't able to finalize important correctness bugs in a timely manner, and stretch such issues over a long period of time, where each time going back requires a lot of time to read again and remember all the details.

I'm trying to describe the current state of this bug.

(A) We had noticed that we must call pthread_key_delete.
    This has been done (comment 14).

    So, *part* of this bug is fixed.

(B) Bob noticed we have some scenarios that still need fixing (comment 13).

(C) I created another patch to complete the fix (comment 17), which Bob r+ed (comment 18),
    but suggested that wtc should review, too.

(D) Wan-Teh did a review, but only focused on stylistic details.
    I addressed Wan-Teh's requests in comment 22.

(E) Wan-Teh followed up, judging that he didn't like my patch at all, 
    and wrote an alternative implementation (comment 25).
    It was checked in without review.
    However, it didn't work, and therefore it was backed out (comment 32).

(F) After Wan-Teh's patch didn't work, he checked in some cleanup from my patch (comment 34)
    and he merged the remaining functional portions from my earlier suggested patch
    (comment 36), and he checked in an experimental assertion (37).

At this point work on this bug had stalled.

Wan-Teh didn't follow up reviewing the functional changes from my patch 5 (which he attached in comment 36).


Summary:
- Portions of the bugs are fixed
- Portions are still unfixed
- Wan-Teh's experiment from comment 37 didn't cause problems

The remaining work:
- Either Wan-Teh or Bob must review attachment 674079 [details] [diff] [review] (which attempts to fix the issue mentioned by Bob in comment 13)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee

Comment 46

6 years ago
Comment on attachment 674079 [details] [diff] [review]
The rest of Kai's patch v5

This patch still needs review.

However, it no longer applies to NSPR trunk, because of a context change. I'll attach a merged version.
Assignee

Updated

6 years ago
Whiteboard: summary: comment 45
Assignee

Updated

6 years ago
Attachment #674082 - Attachment description: Assert that the pthread_key_delete call in _PR_Fini succeeds → Assert that the pthread_key_delete call in _PR_Fini succeeds [checked in]
Assignee

Updated

6 years ago
Attachment #645381 - Attachment is obsolete: true
Assignee

Updated

6 years ago
Attachment #674074 - Attachment description: Code cleanup changes from Kai's patch v5 → Code cleanup changes from Kai's patch v5 [checked in]
Assignee

Comment 47

6 years ago
This is my patch v5 merged to NSPR trunk (only one context line had changed).

Note that right now I don't remember any details about this patch any more. If you have questions on this patch, or request changes to this patch, I will have to reread the related code.

Either Bob or Wan-Teh, could you please review?
Attachment #674079 - Attachment is obsolete: true
Attachment #789149 - Flags: superreview?(wtc)
Attachment #789149 - Flags: review?(rrelyea)

Comment 48

6 years ago
Comment on attachment 789149 [details] [diff] [review]
Kai's patch v7 (ensure we'll free the thread key)

r+ rrelyea.

This looks like it handles wtc's style comments.

Wan-Teh, it's a relatively short patch that seems to address the issues raised, the only reservation would be if you saw something we missed when you did your rewrite. If not then I think we can drop this in and make this bug go away.

bob
Attachment #789149 - Flags: review?(rrelyea) → review+
Assignee

Updated

6 years ago
Attachment #789149 - Attachment description: Kai's patch v7 → Kai's patch v7 (ensure we'll free the thread key)
Assignee

Comment 49

6 years ago
Checked in:
https://hg.mozilla.org/projects/nspr/rev/08840926f193
Status: REOPENED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → FIXED
Target Milestone: 4.9.3 → 4.10.3

Comment 50

6 years ago
Comment on attachment 789149 [details] [diff] [review]
Kai's patch v7 (ensure we'll free the thread key)

Review of attachment 789149 [details] [diff] [review]:
-----------------------------------------------------------------

This patch may have caused Mozilla bug 946072: an assertion failure at
nspr/pr/src/pthreads/ptthread.c:848:

 826 static void _pt_thread_death(void *arg)
 827 {
 828     void *thred;
 829     int rv;
 830 
 831     _PT_PTHREAD_GETSPECIFIC(pt_book.key, thred);
 832     if (NULL == thred)
 833     {
 834         /*
 835          * Have PR_GetCurrentThread return the expected value to the
 836          * destructors.
 837          */
 838         rv = pthread_setspecific(pt_book.key, arg);
 839         PR_ASSERT(0 == rv);
 840     }
 841 
 842     /* PR_TRUE for: call destructors */
 843     _pt_thread_death_internal(arg, PR_TRUE);
 844 
 845     if (NULL == thred)
 846     {
 847         rv = pthread_setspecific(pt_book.key, NULL);
 848         PR_ASSERT(0 == rv);  <=== ASSERTION FAILED
 849     }
 850 }

Bob, do you have a test program for this bug? I'll run the test program
and see if I can come up with an alternative fix. It is very hard to
shut down NSPR cleanly.
Attachment #789149 - Flags: superreview?(wtc) → checked-in+

Comment 51

6 years ago
Comment on attachment 789149 [details] [diff] [review]
Kai's patch v7 (ensure we'll free the thread key)

Review of attachment 789149 [details] [diff] [review]:
-----------------------------------------------------------------

I will back out this patch in mozilla-central as an experiment.
If the assertion in Mozilla bug 946072 stops failing, then we
will know the assertion failure is caused by this patch.

Comment 52

6 years ago
1. Call PR_Assert (a function used by the PR_ASSERT macro) so that
an error message is printed before aborting the process.

2. We don't need to abort the process if the pthread_setspecific call
fails. (Also, that pthread_setspecific call is unlikely to fail.)
PR_ASSERT is sufficient.

3. Assert that the new pthread_key_delete call succeeds.

4. Always set pt_book.keyCreated to PR_FALSE after calling pthread_key_delete.
Attachment #8362138 - Flags: review?(kaie)
Assignee

Comment 53

6 years ago
Comment on attachment 8362138 [details] [diff] [review]
Call PR_Assert instead of PR_Abort

Good improvement, thanks.

r=kaie
Attachment #8362138 - Flags: review?(kaie) → review+

Comment 54

6 years ago
Comment on attachment 8362138 [details] [diff] [review]
Call PR_Assert instead of PR_Abort

Review of attachment 8362138 [details] [diff] [review]:
-----------------------------------------------------------------

Patch checked in: https://hg.mozilla.org/projects/nspr/rev/67602d9f63c2
Attachment #8362138 - Flags: checked-in+
You need to log in before you can comment on or make changes to this bug.