Closed
Bug 844784
Opened 11 years ago
Closed 10 years ago
TSan: Thread data race in PR_EnterMonitor
Categories
(NSPR :: NSPR, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
4.10.4
People
(Reporter: posidron, Assigned: wtc)
References
(Blocks 1 open bug)
Details
(Keywords: sec-want, Whiteboard: [tsan][tsan-test-blocker])
Attachments
(7 files, 7 obsolete files)
29.95 KB,
text/plain
|
Details | |
15.18 KB,
patch
|
laszio.bugzilla
:
review+
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
845 bytes,
patch
|
laszio.bugzilla
:
review+
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
6.73 KB,
patch
|
jcranmer
:
review+
laszio.bugzilla
:
superreview+
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
789 bytes,
patch
|
Details | Diff | Splinter Review | |
24.96 KB,
patch
|
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
621 bytes,
patch
|
jcranmer
:
review+
wtc
:
checked-in+
|
Details | Diff | Splinter Review |
During Firefox start-up with ThreadSanitizer (LLVM version), we get a data race reported as described in the attached log. Trace was created on mozilla-central with changeset c233837cce08. According to the TSan devs, most of the reported traces should be real data races, even though they can be "benign". We need to determine if the race can/should be fixed, or put on the ignore list. Even for benign races, TSan devs suggest to fix them (second priority), as they can also cause problems [1]. [1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
Comment 1•11 years ago
|
||
Comment says:
> * This is safe only if mon->owner (a pthread_t) can be
> * read in one instruction. Perhaps mon->owner should be
> * a "PRThread *"?
Not sure if that means the race is intentional? Maybe it can be fixed even if it is.
Assignee | ||
Comment 2•11 years ago
|
||
Yes, the code was intentionally written this way. I think it should be fixed. If "TraceMalloc" is still being used, the fix needs to avoid calling malloc() in PR_EnterMonitor. See bug 84035 comment 5 for a description of my last attempt at a fix. Note that we may consider the less efficient approach (the first approach in that comment), which I rejected last time.
Comment 3•11 years ago
|
||
This is still a source for a good number of very frequent races in TSan. @wtc: Is there anything actionable we can do here?
Flags: needinfo?(wtc)
Assignee | ||
Comment 4•11 years ago
|
||
decoder: could you test this patch? I'd appreciate it if you could also review the patch or suggest a reviewer. A good reviewer should be very familiar with the pthreads mutexes and condition variables, and know how Java's "synchronized" key word and "wait/notify/notifyAll" methods work. (NSPR's PRMonitor was originally intended to implement those features in the JVM.)
Attachment #818599 -
Flags: review?(choller)
Flags: needinfo?(wtc)
Assignee | ||
Comment 5•11 years ago
|
||
The code review tool showed some spaces at the end of lines. I deleted those spaces.
Attachment #818599 -
Attachment is obsolete: true
Attachment #818599 -
Flags: review?(choller)
Attachment #818645 -
Flags: review?(choller)
Comment 6•11 years ago
|
||
Comment on attachment 818645 [details] [diff] [review] Patch v1.1 I can test the patch, but unfortunately I'm not a peer for NSPR to review it. Ted might be able to review it or suggest the proper person. I'll provide feedback asap on the patch :) Thanks!
Attachment #818645 -
Flags: review?(choller) → review?(ted)
Comment 7•11 years ago
|
||
Comment on attachment 818645 [details] [diff] [review] Patch v1.1 I can confirm that this fixes the race for me :)
Attachment #818645 -
Flags: feedback+
Assignee | ||
Comment 8•11 years ago
|
||
Comment on attachment 818645 [details] [diff] [review] Patch v1.1 Review of attachment 818645 [details] [diff] [review]: ----------------------------------------------------------------- decoder: thank you for testing my patch. Searching for "pthread_cond_wait" in mozilla-central, I found that pthread_cond_wait is mainly used in third-party code. So I can't find a suitable Mozilla developer to review this patch this way. Perhaps we can ask the owner of mfbt/Atomics.h to review this patch? ::: pr/src/pthreads/ptsynch.c @@ +467,5 @@ > + if (0 != rv) > + { > + pthread_cond_destroy(&mon->entryCV); > + pthread_mutex_destroy(&mon->lock); > + PR_Free(mon); The way I free the allocated resources on failure can be improved. Too much code duplication!
Comment 9•11 years ago
|
||
I am definitely not a suitable reviewer. jcranmer is the original author of mfbt/Atomics.h, and froydnj has worked on it a lot, so either of them might be a suitable reviewer. I'd also suggest sewardj, who has a deep understanding of thread synchronization.
Comment 10•11 years ago
|
||
Comment on attachment 818645 [details] [diff] [review] Patch v1.1 Review of attachment 818645 [details] [diff] [review]: ----------------------------------------------------------------- ::: pr/src/pthreads/ptsynch.c @@ -529,5 @@ > - PR_Lock(&mon->lock); > - /* and now I have the lock */ > - PR_ASSERT(0 == mon->entryCount); > - PR_ASSERT(_PT_PTHREAD_THR_HANDLE_IS_INVALID(mon->owner)); > - _PT_PTHREAD_COPY_THR_HANDLE(self, mon->owner); So the original fault here looks to be the broken assumption of "all word-sized loads are atomic" and is basically a modification of the double-checked locking (anti)pattern. It looks to me, on a cursory glance, that the non-pthreads code also has the same race condition (it's just not getting picked up because we don't run thread sanitizers on those platforms), so I wonder if this fix should apply more widely.
Comment 11•11 years ago
|
||
Comment on attachment 818645 [details] [diff] [review] Patch v1.1 Removing this review request, I suggested three possible reviewers.
Attachment #818645 -
Flags: review?(ted)
Assignee | ||
Comment 12•11 years ago
|
||
Comment on attachment 818645 [details] [diff] [review] Patch v1.1 Review of attachment 818645 [details] [diff] [review]: ----------------------------------------------------------------- ::: pr/src/pthreads/ptsynch.c @@ -529,5 @@ > - PR_Lock(&mon->lock); > - /* and now I have the lock */ > - PR_ASSERT(0 == mon->entryCount); > - PR_ASSERT(_PT_PTHREAD_THR_HANDLE_IS_INVALID(mon->owner)); > - _PT_PTHREAD_COPY_THR_HANDLE(self, mon->owner); Yes, the non-pthreads code in nspr/pr/src/threads/prmon.c has the same data race in PR_EnterMonitor and can be fixed in a similar way. I'll do that in a separate patch.
Assignee | ||
Comment 13•11 years ago
|
||
I eliminated the code duplication during error cleanup in PR_NewMonitor. This requires the use of goto. Using goto for error cleanup is acceptable in C code. jcranmer, could you review this patch? Thank you for your help. A PRMonitor is a recursive mutex with an implicit condition variable. The |lock|, |owner|, |entryCV|, and |entryCount| members of PRMonitor are used to implement the recursive mutex. I use the algorithm described in Kleiman, Shah, and Smaalders, "Programming with Threads", pp. 259-260. The boost::recursive_mutex class uses the same algorithm: http://svn.boost.org/svn/boost/trunk/boost/thread/pthread/recursive_mutex.hpp It should be easy to review this part of the patch. The |waitCV| member of PRMonitor is used to implement the implicit condition variable of a PRMonitor. In general it is not safe to use a recursive pthread_mutex_t with a pthread_cond_t because pthread_cond_wait may only unlock the recursive once. This may be why I couldn't find another implementation of this on the Web. Hopefully it is not hard to review this part of the patch.
Attachment #818645 -
Attachment is obsolete: true
Attachment #821828 -
Flags: review?(Pidgeot18)
Assignee | ||
Comment 14•11 years ago
|
||
jcranmer: the Wikipedia article on reentrant mutex (http://en.wikipedia.org/wiki/Reentrant_mutex) has relevant info for this patch. In particular, the last paragraph describes the "synchronized" blocks and the wait(), notify(), and notifyAll() methods in Java. PRMonitor was originally used to implement those features in a JVM.
Comment 15•11 years ago
|
||
Comment on attachment 821828 [details] [diff] [review] Patch v2 Review of attachment 821828 [details] [diff] [review]: ----------------------------------------------------------------- ::: pr/src/pthreads/ptsynch.c @@ +506,5 @@ > > PR_IMPLEMENT(void) PR_AssertCurrentThreadInMonitor(PRMonitor *mon) > { > + PR_ASSERT(mon->entryCount != 0 && > + pthread_equal(mon->owner, pthread_self())); Should this and the access to mon->owner in PR_GetMonitorEntryCount() also be protected by mon->lock? @@ +633,3 @@ > PR_ASSERT(mon != NULL); > + rv = pthread_cond_broadcast(&mon->waitCV); > + PR_ASSERT(0 == rv); The original implementation requires that PR_Notify() and PR_NotifyAll() be called with lock held (at least in debug builds). Because pthread_cond_{signal, broadcase} don't have this limitation, the behavior will be different.
Comment 16•11 years ago
|
||
Comment on attachment 821828 [details] [diff] [review] Patch v2 Review of attachment 821828 [details] [diff] [review]: ----------------------------------------------------------------- From investigating the other implementations, it looks like they also have the underlying race condition and thus need to be fixed. Since we only run TSAN on pthreads-based implementations (I presume), I won't block review on fixing the other implementations. ::: pr/include/private/primpl.h @@ +1445,5 @@ > > struct PRMonitor { > const char* name; /* monitor name for debugging */ > #if defined(_PR_PTHREADS) > + pthread_mutex_t lock; I feel like it is worth mentioning that the lock is only held when accessing fields of the PRMonitor here instead of being held while the monitor is entered. ::: pr/src/pthreads/ptsynch.c @@ +501,3 @@ > pthread_t self = pthread_self(); > if (pthread_equal(mon->owner, self)) > return mon->entryCount; This method also needs to be protected by the lock (unless no one uses this method, in which case it ought to be deleted). @@ +506,5 @@ > > PR_IMPLEMENT(void) PR_AssertCurrentThreadInMonitor(PRMonitor *mon) > { > + PR_ASSERT(mon->entryCount != 0 && > + pthread_equal(mon->owner, pthread_self())); Echoing what Ting-Yuan says, the entryCount and owner fields need to be protected by the lock as well. (Although, strictly speaking, the data race only occurs if the assert would fire--but since there's a data race in that scenario, the assert may fail to fire). @@ -595,5 @@ > - PR_ASSERT(_PT_PTHREAD_MUTEX_IS_LOCKED(mon->lock.mutex)); > - /* and the entries better be positive */ > - PR_ASSERT(mon->entryCount > 0); > - /* and it better be by us */ > - PR_ASSERT(pthread_equal(mon->owner, pthread_self())); You've lost these we're-in-the-monitor-checks, which amount to a change in the API.
Attachment #821828 -
Flags: review?(Pidgeot18) → review+
Comment 17•10 years ago
|
||
What's the status here? This is one of the most frequent races I'm seeing on TSan.
Flags: needinfo?(wtc)
Assignee | ||
Comment 18•10 years ago
|
||
Christian: thanks for the reminder. I am terribly sorry that I forgot to finish this bug. What happened was that I wanted to preserve a a performance optimization of the original code that signals a condition variable after unlocking the mutex, but never got around to doing that. Joshua and Ting-Yuan: thank you very much for your thoughtful review comments. I made all the changes you suggested, plus the performance optimization described above. Please review this new patch. I will annotate this patch in the next comment.
Attachment #821828 -
Attachment is obsolete: true
Attachment #8365617 -
Flags: review?(Pidgeot18)
Flags: needinfo?(wtc)
Assignee | ||
Comment 19•10 years ago
|
||
Comment on attachment 8365617 [details] [diff] [review] Patch v3 Review of attachment 8365617 [details] [diff] [review]: ----------------------------------------------------------------- ::: pr/include/private/primpl.h @@ +1445,5 @@ > > struct PRMonitor { > const char* name; /* monitor name for debugging */ > #if defined(_PR_PTHREADS) > + PRIntn notifyTimes; /* number of pending notifies for waitCV */ notifyTimes is only accessed while the current thread owns the monitor, so it is not really protected by the |lock| field. I will point out where I take advantage of this. If you think this point is too subtle, I can also just require notifyTimes be accessed only when |lock| is held. ::: pr/src/pthreads/ptsynch.c @@ +188,5 @@ > } /* PR_Lock */ > > PR_IMPLEMENT(PRStatus) PR_Unlock(PRLock *lock) > { > + pthread_t self = pthread_self(); This is just a minor cleanup, to avoid calling pthread_self() twice in this function. @@ -312,5 @@ > notified->length += 1; > - > -finished: > - PR_ASSERT(PR_TRUE == cvar->lock->locked); > - PR_ASSERT(pthread_equal(cvar->lock->owner, pthread_self())); These two assertions are also present at the beginning of this function. It is clear that this function doesn't change the conditions asserted, so we don't need to assert again when exiting the function. @@ +432,5 @@ > +static void pt_PostNotifyToMonitor(PRMonitor *mon, PRBool broadcast) > +{ > + PRIntn rv; > + > + /* Assert that the monitor is owned by us. */ This function is called by both PR_Notify and PR_NotifyAll. So I do the monitor ownership check here. It does make the monitor ownership check for PR_Notify and PR_NotifyAll less obvious. Note: I just realized that I should use the PR_ASSERT_CURRENT_THREAD_IN_MONITOR macro here. @@ +443,5 @@ > + > + if (broadcast) > + mon->notifyTimes = -1; > + else if (-1 != mon->notifyTimes) > + mon->notifyTimes += 1; In this function, I am taking advantage of the fact that notifyTimes is protected by the monitor, so I am modifying notifyTimes after releasing mon->lock. This complicates the description of the thread-safe protection of the fields of a PRMonitor.
Attachment #8365617 -
Flags: review?(thuang)
Comment 20•10 years ago
|
||
Comment on attachment 8365617 [details] [diff] [review] Patch v3 Review of attachment 8365617 [details] [diff] [review]: ----------------------------------------------------------------- ::: pr/src/pthreads/ptsynch.c @@ +443,5 @@ > + > + if (broadcast) > + mon->notifyTimes = -1; > + else if (-1 != mon->notifyTimes) > + mon->notifyTimes += 1; I feel like a comment to this effect in the code would be worthwhile.
Attachment #8365617 -
Flags: review?(Pidgeot18) → review+
Assignee | ||
Comment 21•10 years ago
|
||
Joshua: thanks for the review. I added the comment that you suggested. I am a little worried that ThreadSanitizer may flag the modifications of mon->notifyTimes in pt_PostNotifyToMonitor as not thread-safe because mon->notifyTimes is modified in other places with mon->lock acquired. mon->notifyTimes is only accessed by the owner of PRMonitor, so different threads' accesses to mon->notifyTimes will be separated by a transfer of PRMonitor ownership. Hopefully ThreadSanitizer will consider this thread-safe. Patch checked in: https://hg.mozilla.org/projects/nspr/rev/b15b05fafb4f
Attachment #8365617 -
Attachment is obsolete: true
Attachment #8365617 -
Flags: review?(thuang)
Attachment #8367092 -
Flags: review?(thuang)
Attachment #8367092 -
Flags: checked-in+
Assignee | ||
Updated•10 years ago
|
Status: NEW → ASSIGNED
Priority: -- → P1
Target Milestone: --- → 4.10.4
Comment 22•10 years ago
|
||
Comment on attachment 8367092 [details] [diff] [review] Patch v4 Review of attachment 8367092 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me except for PR_Wait() with timeout. ::: pr/src/pthreads/ptsynch.c @@ +558,5 @@ > + rv = pthread_mutex_lock(&mon->lock); > + PR_ASSERT(0 == rv); > + PR_ASSERT(mon->entryCount != 0 && > + pthread_equal(mon->owner, pthread_self())); > + rv = pthread_mutex_unlock(&mon->lock); In normal/non-debug build, the net effect is to lock and immediately unlock. How about embracing the lock/unlock together with the assertions by #ifdef DEBUG ? Put the lock/unlock into PR_ASSERT() also make sense to me. I'm not sure which style is preferred in this code base. @@ +672,5 @@ > + if (timeout == PR_INTERVAL_NO_TIMEOUT) > + rv = pthread_cond_wait(&mon->waitCV, &mon->lock); > + else > + rv = pt_TimedWait(&mon->waitCV, &mon->lock, timeout); > + PR_ASSERT(0 == rv); Should a timed-out wait be regarded as fatal? In the original implementation (and other implementations I've seen), PR_Wait() returns E_XXX_YYY after acquiring the lock (not mon->lock in this patch, I mean the recursive mutex implemented with entryCV). Sorry I missed this in previous review.
Attachment #8367092 -
Flags: review?(thuang) → review-
Assignee | ||
Comment 23•10 years ago
|
||
Comment on attachment 8367092 [details] [diff] [review] Patch v4 Review of attachment 8367092 [details] [diff] [review]: ----------------------------------------------------------------- Ting-Yuan: thanks a lot for your comments. I responded to them below. ::: pr/src/pthreads/ptsynch.c @@ +558,5 @@ > + rv = pthread_mutex_lock(&mon->lock); > + PR_ASSERT(0 == rv); > + PR_ASSERT(mon->entryCount != 0 && > + pthread_equal(mon->owner, pthread_self())); > + rv = pthread_mutex_unlock(&mon->lock); The problem you pointed out in a non-debug build is acceptable, because people are not supposed to call this function directly. Here is the documentation from the prmon.h header: 79 /* 80 ** PR_ASSERT_CURRENT_THREAD_IN_MONITOR 81 ** If the current thread is in |mon|, this assertion is guaranteed to 82 ** succeed. Otherwise, the behavior of this function is undefined. 83 */ 84 #if defined(DEBUG) || defined(FORCE_PR_ASSERT) 85 #define PR_ASSERT_CURRENT_THREAD_IN_MONITOR(/* PRMonitor* */ mon) \ 86 PR_AssertCurrentThreadInMonitor(mon) 87 #else 88 #define PR_ASSERT_CURRENT_THREAD_IN_MONITOR(/* PRMonitor* */ mon) 89 #endif 90 91 /* Don't call this function directly. */ 92 NSPR_API(void) PR_AssertCurrentThreadInMonitor(PRMonitor *mon); I'll make the change you suggested. @@ +672,5 @@ > + if (timeout == PR_INTERVAL_NO_TIMEOUT) > + rv = pthread_cond_wait(&mon->waitCV, &mon->lock); > + else > + rv = pt_TimedWait(&mon->waitCV, &mon->lock, timeout); > + PR_ASSERT(0 == rv); This is a good question. The PR_WaitCondVar and PR_Wait functions do not report the timed-out error. This is documented in https://developer.mozilla.org/en-US/docs/PR_WaitCondVar and https://developer.mozilla.org/en-US/docs/PR_Wait. If pt_TimedWait times out, it still returns 0. This is done at the end of the pt_TimedWait function: rv = pthread_cond_timedwait(cv, ml, &tmo); /* NSPR doesn't report timeouts */ #ifdef _PR_DCETHREADS if (rv == -1) return (errno == EAGAIN) ? 0 : errno; else return rv; #else return (rv == ETIMEDOUT) ? 0 : rv; #endif The original implementation of PR_Wait returns the return value of the PR_WaitCondVar(mon->cvar, timeout) call, which doesn't report the timed-out error either.
Assignee | ||
Comment 24•10 years ago
|
||
Attachment #8367536 -
Flags: review?(thuang)
Comment 25•10 years ago
|
||
Comment on attachment 8367536 [details] [diff] [review] PR_AssertCurrentThreadInMonitor should be a no-op when assertions are disabled. Review of attachment 8367536 [details] [diff] [review]: ----------------------------------------------------------------- You are right. I should read the comments more carefully. Sorry about that.
Attachment #8367536 -
Flags: review?(thuang) → review+
Updated•10 years ago
|
Attachment #8367092 -
Flags: review- → review+
Assignee | ||
Comment 26•10 years ago
|
||
Comment on attachment 8367536 [details] [diff] [review] PR_AssertCurrentThreadInMonitor should be a no-op when assertions are disabled. No worries. Thanks again for the review. Patch checked in: https://hg.mozilla.org/projects/nspr/rev/f2aa2bf5abbb
Attachment #8367536 -
Flags: checked-in+
Assignee | ||
Comment 27•10 years ago
|
||
[Ting-Yuan: I am just using "superreview" as a second review.] When I pushed my new PRMonitor code to mozilla-inbound, it caused assertion failures on Mac OS X 10.6. (The assertion failures were reported in bug 958796 comment 29 because I cited that bug in the commit message.) The assertion that failed is this one at the end of PR_ExitMonitor, at ptsynch.c:637: 600 PR_IMPLEMENT(PRStatus) PR_ExitMonitor(PRMonitor *mon) 601 { ... 620 mon->entryCount -= 1; /* reduce by one */ 621 if (mon->entryCount == 0) 622 { 623 /* and if it transitioned to zero - notify an entry waiter */ 624 /* make the owner unknown */ 625 _PT_PTHREAD_INVALIDATE_THR_HANDLE(mon->owner); 626 notifyEntryWaiter = PR_TRUE; 627 notifyTimes = mon->notifyTimes; 628 mon->notifyTimes = 0; 629 } 630 rv = pthread_mutex_unlock(&mon->lock); 631 PR_ASSERT(0 == rv); 632 if (notifyTimes) 633 pt_PostNotifiesFromMonitor(&mon->waitCV, notifyTimes); 634 if (notifyEntryWaiter) 635 { 636 rv = pthread_cond_signal(&mon->entryCV); 637 PR_ASSERT(0 == rv); <=== FAILED 638 } 639 return PR_SUCCESS; 640 } /* PR_ExitMonitor */ This reminded me of the need to reference-count PRCondVar because of the delayed condition variable notifications. (The reference count in PRCondVar is the 'notify_pending' member.) When I implemented the same optimization for PRMonitor, I thought a reference count was not necessary because the mutex and condition variables of a PRMonitor are created and destroyed together. But I believe I was wrong. How to review this patch: 1. First, read the test program monref.c. I hope you will agree that the test program destroys the PRMonitor at a safe time. Note: By adding an artificial delay before the pthread_cond_signal(&mon->entryCV) call on line 636 for the 'ThreadFunc' thread, I can make the test program hang on Linux, and AddressSanitizer reports heap-use-after-free. 2. Then, review the changes to primpl.h and ptsynch.c that implement the reference count of PRMonitor. I know this problem is very subtle. Thank you for your patience in advance.
Attachment #8371192 -
Flags: superreview?(thuang)
Attachment #8371192 -
Flags: review?(Pidgeot18)
Assignee | ||
Comment 28•10 years ago
|
||
Comment on attachment 8371192 [details] [diff] [review] Add a reference count to PRMonitor because PR_ExitMonitor may access struct members after unlocking the internal mutex Review of attachment 8371192 [details] [diff] [review]: ----------------------------------------------------------------- ::: pr/src/pthreads/ptsynch.c @@ +539,2 @@ > #endif > + PR_Free(mon); I will remove the whitespace. @@ +640,4 @@ > if (notifyEntryWaiter) > { > + if (notifyTimes) > + pt_PostNotifiesFromMonitor(&mon->waitCV, notifyTimes); notifyTimes is nonzero only when notifyEntryWaiter is true, so it is correct to move this inside the "if (notifyEntryWaiter)" block. This makes it clear that the members of 'mon' are accessed after unlocking mon->lock only if notifyEntryWaiter is true, and therefore we only need to release a reference in the same "if" block.
Comment 29•10 years ago
|
||
Comment on attachment 8371192 [details] [diff] [review] Add a reference count to PRMonitor because PR_ExitMonitor may access struct members after unlocking the internal mutex Review of attachment 8371192 [details] [diff] [review]: ----------------------------------------------------------------- I can't find any problem in this patch. By the way, did you push to the try tree before inbound?
Attachment #8371192 -
Flags: superreview?(thuang) → superreview+
Assignee | ||
Comment 30•10 years ago
|
||
No, I didn't push to the try server before pushing to mozilla-inbound. Please review the new patch. I made the following changes. 1. I realized that PR_DestroyMonitor has become a "release reference" function, so PR_ExitMonitor can just call PR_DestroyMonitor to release a reference. With this change, PR_DestroyMonitor can compare refCount with 0 using == 0 instead of <= 0. 2. I added some comments.
Attachment #8371192 -
Attachment is obsolete: true
Attachment #8371192 -
Flags: review?(Pidgeot18)
Attachment #8371758 -
Flags: superreview?(thuang)
Attachment #8371758 -
Flags: review?(Pidgeot18)
Assignee | ||
Comment 31•10 years ago
|
||
I used this patch and the monref.c test program to reproduce the heap-use-after-free bug in PR_ExitMonitor. Note: the primordial thread (PT_THREAD_PRIMORD) is the thread that runs the main() function, also called the main thread.
Assignee | ||
Updated•10 years ago
|
Attachment #8371798 -
Attachment is patch: true
Assignee | ||
Comment 32•10 years ago
|
||
Comment on attachment 8371758 [details] [diff] [review] Add a reference count to PRMonitor because PR_ExitMonitor may access struct members after unlocking the internal mutex, v2 Review of attachment 8371758 [details] [diff] [review]: ----------------------------------------------------------------- Patch checked in: https://hg.mozilla.org/projects/nspr/rev/ef953c9ffbc4 The try server confirms this patch fixes the assertion failure. I would still appreciate a post-commit review. Thanks.
Attachment #8371758 -
Flags: checked-in+
Assignee | ||
Comment 33•10 years ago
|
||
Comment on attachment 8371758 [details] [diff] [review] Add a reference count to PRMonitor because PR_ExitMonitor may access struct members after unlocking the internal mutex, v2 Review of attachment 8371758 [details] [diff] [review]: ----------------------------------------------------------------- Here are the try results. Check the OS X 10.6 Debug results. Before: https://tbpl.mozilla.org/?tree=Try&rev=767d9ffc5d6c After: https://tbpl.mozilla.org/?tree=Try&rev=708b36ddb7d2
Updated•10 years ago
|
Attachment #8371758 -
Flags: superreview?(thuang) → superreview+
Assignee | ||
Comment 34•10 years ago
|
||
Comment on attachment 8371758 [details] [diff] [review] Add a reference count to PRMonitor because PR_ExitMonitor may access struct members after unlocking the internal mutex, v2 Review of attachment 8371758 [details] [diff] [review]: ----------------------------------------------------------------- Pushed to mozilla-inbound as part of NSPR_4_10_4_BETA3: https://hg.mozilla.org/integration/mozilla-inbound/rev/386a2183cc84
Assignee | ||
Updated•10 years ago
|
Whiteboard: [tsan][tsan-test-blocker] → [tsan][tsan-test-blocker][leave open]
Assignee | ||
Comment 36•10 years ago
|
||
Please review the patch in three steps. 1. prulock.c, prucv.c, and primpl.h (except the changes to struct PRMonitor): I added _PR_InitLock, _PR_FreeLock, _PR_InitCondVar, and _PR_FreeCondVar so that PRLock and PRCondVar structures can be initialized and cleaned up without memory allocation and deallocation. This allows PRMonitor to contain PRLock and PRCondVar members directly. I also cleaned up some function declarations in primpl.h. 2. ptsynch.c: I cleaned up some code when I ported the code to Windows (or rather, non-pthreads platforms). 3. prmon.c and primpl.h (the struct PRMonitor changes): there are two important differences: - PRCondVar must be notified while holding PRLock. However, it still helps for PR_Notify and PR_NotifyAll to delay the notifications because we only need to notify mon->waitCV when we fully exit the monitor (i.e., when mon->entryCount becomes 0). - The reference counting of PRMonitor is not necessary, even though a similar optimization of delayed condition variable notifications is done inside PR_Unlock. The difference is that this implementation of PR_Unlock does that without accessing members of the PRCondVar struct. If we ever change this implementation of PR_Unlock, we will need to re-evaluate if we need to add a reference to PRMonitor in PR_ExitMonitor.
Attachment #8372955 -
Flags: superreview?(Pidgeot18)
Attachment #8372955 -
Flags: review?(thuang)
Assignee | ||
Comment 37•10 years ago
|
||
I attached the wrong patch. Sorry.
Attachment #8372955 -
Attachment is obsolete: true
Attachment #8372955 -
Flags: superreview?(Pidgeot18)
Attachment #8372955 -
Flags: review?(thuang)
Attachment #8372956 -
Flags: superreview?(Pidgeot18)
Attachment #8372956 -
Flags: review?(thuang)
Updated•10 years ago
|
Attachment #8371758 -
Flags: review?(Pidgeot18) → review+
Comment 38•10 years ago
|
||
Comment on attachment 8372956 [details] [diff] [review] [Correct patch] Port the new PRMonitor code to non-pthreads platforms (e.g., Windows) Review of attachment 8372956 [details] [diff] [review]: ----------------------------------------------------------------- Sorry for the delay, these are not easy patches to keep in the brain.
Attachment #8372956 -
Flags: superreview?(Pidgeot18) → review+
Assignee | ||
Comment 39•10 years ago
|
||
Joshua: thank you for the review. It is worth the wait. Patch checked in, after fixing some coding style nits: https://hg.mozilla.org/projects/nspr/rev/cd5086f15d8d
Attachment #8372956 -
Attachment is obsolete: true
Attachment #8372956 -
Flags: review?(thuang)
Attachment #8381858 -
Flags: checked-in+
Assignee | ||
Comment 40•10 years ago
|
||
PR_OPERATION_NOT_SUPPORTED_ERROR ("The requested operation is not supported by the platform") seems like the wrong error code for the failure of pthread_mutex_init or pthread_cond_init. Since a nonzero return value of pthread_mutex_init and pthread_cond_init is the Unix error code, we can just map it to an NSPR error code.
Attachment #8381867 -
Flags: review?(Pidgeot18)
Assignee | ||
Comment 41•10 years ago
|
||
Comment on attachment 8381858 [details] [diff] [review] Port the new PRMonitor code to non-pthreads platforms (e.g., Windows), v2 Review of attachment 8381858 [details] [diff] [review]: ----------------------------------------------------------------- Patch pushed to mozilla-inbound as part of the NSPR_4_10_4_BETA4 update: https://hg.mozilla.org/integration/mozilla-inbound/rev/65f75ee1c34f
Assignee | ||
Updated•10 years ago
|
Whiteboard: [tsan][tsan-test-blocker][leave open] → [tsan][tsan-test-blocker]
Comment 42•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/65f75ee1c34f
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
|
Attachment #8381867 -
Flags: review?(Pidgeot18) → review+
Assignee | ||
Comment 43•10 years ago
|
||
Comment on attachment 8381867 [details] [diff] [review] Don't map pthread_mutex_init or pthread_cond_init error to PR_OPERATION_NOT_SUPPORTED_ERROR Patch checked in (for NSPR 4.10.5): https://hg.mozilla.org/projects/nspr/rev/b033da9a671e
Attachment #8381867 -
Flags: checked-in+
You need to log in
before you can comment on or make changes to this bug.
Description
•