Closed Bug 939786 Opened 9 years ago Closed 9 years ago

TSan: data race nsprpub/pr/src/pthreads/ptthread.c:137 _pt_root

Categories

(NSPR :: NSPR, defect, P1)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED
4.10.4

People

(Reporter: decoder, Assigned: wtc)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(3 files)

The attached logfile shows a thread/data race (mozilla-central revision beddd6d4bcdf) detected by TSan (ThreadSanitizer).

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause inacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
This is a frequent trace I'm seeing, and it's slipping through TSan's duplicate detection quite effectively. I suggest we either fix it, or add a compile-time blacklisting annotation, as a runtime suppression might cause performance issues. The comment seems to indicate that it's benign double-write:

>     * Both threads set thred->id without holding a lock.  Since
>     * they are writing the same value, this unprotected double
>     * write should be safe.
Component: General → NSPR
Product: Core → NSPR
Version: Trunk → other
Can someone suggest how we should best address this issue?
Flags: needinfo?(wtc)
Flags: needinfo?(Pidgeot18)
Christian:

The struct member is a pthread_t, which depends on the implementation
but is almost always an integer thread ID or a pointer to some struct.
My assumption is that it is OK for two threads to write the same value
to an integer or a pointer without synchronization, if they each read
only after their respective write.

We can use the lock pt_book.ml to protect the writes to this struct
member. Perhaps we should just do this, rather than spending time
analyzing whether this data race is benign. Note that the correctness
of the code, with lock protection, still depends on the fact that
the two threads each read that struct member only after their respective
write, which we will not change.
Flags: needinfo?(wtc)
If it's a benign double-write, then making both of the writes unordered atomic writes (and the subsequent read) should legitimize the data race according to tsan. I don't know the status of C11 atomic support or getting NSPR to optionally compile under C11, though.
Flags: needinfo?(Pidgeot18)
Attached patch PatchSplinter Review
Christian: could you test this patch under TSan?

Joshua: please review this patch. I decided to use the lock |pt_book.ml| and a new boolean thred->idSet to ensure we only set thred->id once. I am worried that if we still do two writes to thred->id, then a read of thred->id between the two writes may be reported as a data race.
Attachment #8368232 - Flags: review?(Pidgeot18)
Attachment #8368232 - Flags: feedback?(choller)
Comment on attachment 8368232 [details] [diff] [review]
Patch

Review of attachment 8368232 [details] [diff] [review]:
-----------------------------------------------------------------

::: pr/src/pthreads/ptthread.c
@@ -155,5 @@
> -
> -    PR_Lock(pt_book.ml);
> -    thred->tid = tid;
> -    PR_NotifyAllCondVar(pt_book.cv);
> -    PR_Unlock(pt_book.ml);

I'm not 100% sure of the consequences of moving this code later, but it doesn't seem to be too problematic.
Attachment #8368232 - Flags: review?(Pidgeot18) → review+
Comment on attachment 8368232 [details] [diff] [review]
Patch

Confirmed, this fixes the race reports, thanks!
Attachment #8368232 - Flags: feedback?(choller) → feedback+
Comment on attachment 8368232 [details] [diff] [review]
Patch

Review of attachment 8368232 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks a lot for reviewing and testing the patch.

Christian: I also ran TSan yesterday (to verify that the new boolean thred->idSet is necessary), but I forgot to tell you that. Sorry.

Patch checked in: https://hg.mozilla.org/projects/nspr/rev/4a8cec3c2696

::: pr/src/pthreads/ptthread.c
@@ -155,5 @@
> -
> -    PR_Lock(pt_book.ml);
> -    thred->tid = tid;
> -    PR_NotifyAllCondVar(pt_book.cv);
> -    PR_Unlock(pt_book.ml);

Good point. It's very important to verify that none of the code in between uses thred->id or thred->tid. I checked that when I wrote this patch, but since you asked about this, I just checked this again and realized that I missed one place.

Here is all the code in between:

=====
#ifdef _PR_NICE_PRIORITY_SCHEDULING
    /*
     * We need to know the kernel thread ID of each thread in order to
     * set its nice value hence we do it here instead of at creation time.
     */
    tid = gettid();
    errno = 0;
    rv = getpriority(PRIO_PROCESS, 0);

    /* If we cannot read the main thread's nice value don't try to change the
     * new thread's nice value. */
    if (errno == 0) {
        setpriority(PRIO_PROCESS, tid,
                    pt_RelativePriority(rv, thred->priority));
    }
#endif

    /*
    ** DCE Threads can't detach during creation, so do it late.
    ** I would like to do it only here, but that doesn't seem
    ** to work.
    */
#if defined(_PR_DCETHREADS)
    if (detached)
    {
        /* pthread_detach() modifies its argument, so we must pass a copy */
        pthread_t self = id;
        rv = pthread_detach(&self);
        PR_ASSERT(0 == rv);
    }
#endif /* defined(_PR_DCETHREADS) */

    /* Set up the thread stack information */
    _PR_InitializeStack(thred->stack);

    /*
     * Set within the current thread the pointer to our object.
     * This object will be deleted when the thread termintates,
     * whether in a join or detached (see _PR_InitThreads()).
     */
    rv = pthread_setspecific(pt_book.key, thred);
    PR_ASSERT(0 == rv);

    /* make the thread visible to the rest of the runtime */
    PR_Lock(pt_book.ml);
=====

What I missed was the PR_Lock call at the very end. Fortunately PR_Lock does not access |thred| at all. So we are still fine.

This does feel like a near miss. I probably should add a comment to PR_Lock to remind ourselves that it must not call PR_GetCurrentThread or pthread_getspecific(pt_book.key).
Attachment #8368232 - Flags: checked-in+
Assignee: general → wtc
Status: NEW → RESOLVED
Closed: 9 years ago
Priority: -- → P1
Resolution: --- → FIXED
Target Milestone: --- → 4.10.4
Attachment #8369605 - Flags: review?(Pidgeot18)
Comment on attachment 8369605 [details] [diff] [review]
Add a comment to PR_Lock

Review of attachment 8369605 [details] [diff] [review]:
-----------------------------------------------------------------

::: pr/src/pthreads/ptsynch.c
@@ +169,5 @@
>  
>  PR_IMPLEMENT(void) PR_Lock(PRLock *lock)
>  {
> +    /* Nb: PR_Lock must not call PR_GetCurrentThread to access the |id| or
> +     * |tid| field of the current thread's PRThread structure because 

Nit: trailing whitespace
Attachment #8369605 - Flags: review?(Pidgeot18) → review+
Comment on attachment 8369605 [details] [diff] [review]
Add a comment to PR_Lock

Review of attachment 8369605 [details] [diff] [review]:
-----------------------------------------------------------------

Patch checked in (after removing the trailing whitespace):
https://hg.mozilla.org/projects/nspr/rev/1b53c94eb7f8
Attachment #8369605 - Flags: checked-in+
Fix pushed to mozilla-central as part of NSPR_4_10_4_BETA3:
https://hg.mozilla.org/mozilla-central/rev/386a2183cc84
You need to log in before you can comment on or make changes to this bug.