Closed Bug 931792 Opened 9 years ago Closed 9 years ago

TSan: Thread data race on sOutputFD in Canary vs. ~Canary

Categories

(Core :: XPCOM, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla28

People

(Reporter: decoder, Assigned: blassey)

References

(Blocks 1 open bug)

Details

(Keywords: sec-want)

Attachments

(1 file, 2 obsolete files)

TSan shows a race in the Canary class of nsThread.cpp:


WARNING: ThreadSanitizer: data race (pid=521)
  Write of size 4 at 0x7f41cfe47948 by main thread:
    #0 Canary xpcom/threads/nsThread.cpp:507 (libxul.so+0x000002d375cf)
    #1 Canary xpcom/threads/nsThread.cpp:515 (libxul.so+0x000002d367c1)
    #2 NS_ProcessNextEvent xpcom/glue/nsThreadUtils.cpp:251 (libxul.so+0x000002cc7659)
    #3 Run ipc/glue/MessagePump.cpp:85 (libxul.so+0x0000022a34f0)
    #4 RunInternal ipc/chromium/src/base/message_loop.cc:220 (libxul.so+0x000002dc5900)
    #5 Run widget/xpwidgets/nsBaseAppShell.cpp:161 (libxul.so+0x000002184a86)
    #6 Run toolkit/components/startup/nsAppStartup.cpp:268 (libxul.so+0x000001e923e9)
    #7 XRE_mainRun toolkit/xre/nsAppRunner.cpp:3886 (libxul.so+0x0000008e6eac)
    #8 XRE_main toolkit/xre/nsAppRunner.cpp:3954 (libxul.so+0x0000008e72c2)
    #9 XRE_main toolkit/xre/nsAppRunner.cpp:4156 (libxul.so+0x0000008e7a40)
    #10 do_main browser/app/nsBrowserApp.cpp:275 (exe+0x000000082ec0)
    #11 __libc_start_main /build/buildd/eglibc-2.15/csu/libc-start.c:226 (libc.so.6+0x00000002176c)

  Previous read of size 4 at 0x7f41cfe47948 by thread T14:
    #0 ~Canary xpcom/threads/nsThread.cpp:518 (libxul.so+0x000002d36b11)
    #1 NS_ProcessNextEvent xpcom/glue/nsThreadUtils.cpp:251 (libxul.so+0x000002cc7659)
    #2 ThreadFunc xpcom/threads/nsThread.cpp:250 (libxul.so+0x000002d3543c)
    #3 _pt_root nsprpub/pr/src/pthreads/ptthread.c:213 (libnspr4.so+0x00000003df5d)

  Thread T14 'Cache I/O' (tid=546, running) created by main thread at:
[...]

SUMMARY: ThreadSanitizer: data race xpcom/threads/nsThread.cpp:507 Canary


Right now, this looks like a false positive to me, because one access is on the constructor, and the other is in the destructor:

>  Canary() {
> [...]
>        sOutputFD = env_var_flag ? (env_var_flag[0] ?
>                                    open(env_var_flag, flags, mode) :
>                                    STDERR_FILENO) : 0;
> [...]
>  }
>
>  ~Canary() {
>    if (sOutputFD != 0 && EventLatencyIsImportant())
>      ualarm(0, 0);
>  }


The tool seems to assume that we're racing on sOutputFD. I don't know if that's possible by the standard though. I'm filing this bug so we can track the issue and I'll ping the TSan developers.
As kcc pointed out on the TSan list, the sOutputFD field is static (duh!) and the constructor of one object is racing with the destructor of another object. This should be fixed.
Brad, you wrote this code long ago, could you take a look or suggest someone to fix it?
Flags: needinfo?(blassey.bugs)
(In reply to Christian Holler (:decoder) from comment #1)
> As kcc pointed out on the TSan list, the sOutputFD field is static (duh!)
> and the constructor of one object is racing with the destructor of another
> object. This should be fixed.

the obvious fix would be to move the EventLatencyMatters() checks before the sOutputFD checks, but that would be kind of sad because it would be a tad slower and I don't think there's a sane optimization that actually breaks this code.  The fd is only ever written on the main thread and the off main thread reads can race those writes but they don't matter because the off main thread check will fail.

On the other hand I wonder if we could just open and close this fd once at shutdown and startup when theyres only one thread.
However I wonder if we could just init and close the fd before and after we have any other threads.
Attached patch canary_tsan.patch (obsolete) — Splinter Review
Assuming this isn't an ASAN build, calling NS_IsNainThread() first shouldn't be a perf hit.

If we're really worried about that I think adding a 'if (sOutputFD == 0) return;' at the top of the constructor would be safe, though would probably still make Tsan complain.
Assignee: nobody → blassey.bugs
Attachment #823445 - Flags: review?(benjamin)
Flags: needinfo?(blassey.bugs)
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #4)
> Created attachment 823445 [details] [diff] [review]
> canary_tsan.patch
> 
> Assuming this isn't an ASAN build, calling NS_IsNainThread() first shouldn't
> be a perf hit.

Well, it will always be a load from tls instead of a normal load so I'm not really worried.
I don't see any reason we can't just deal with this if we need to when we create the main thread and thus not possible have any silly races.
Attachment #824249 - Flags: review?(benjamin)
Comment on attachment 824249 [details] [diff] [review]
bug 931792 - only init sCanaryOutputFD once

Since we're making this init at startup, can we put this in nsThreadManager::Init? Having the aMainThread check in the nsThread constructor seems a little odd.
Attachment #824249 - Flags: review?(benjamin) → review-
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #7)
> Comment on attachment 824249 [details] [diff] [review]
> bug 931792 - only init sCanaryOutputFD once
> 
> Since we're making this init at startup, can we put this in
> nsThreadManager::Init? Having the aMainThread check in the nsThread
> constructor seems a little odd.

it means making sCanaryOutputFD non-static, but if that's fine with you I have no objection.
Comment on attachment 823445 [details] [diff] [review]
canary_tsan.patch

I'm told this patch isn't useful.
Attachment #823445 - Flags: review?(benjamin)
I'm not sure I like this any more, but here you go.
Attachment #823445 - Attachment is obsolete: true
Attachment #824249 - Attachment is obsolete: true
Attachment #825414 - Flags: review?(benjamin)
Attachment #825414 - Flags: review?(benjamin) → review+
https://hg.mozilla.org/mozilla-central/rev/be317170d5ae
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
You need to log in before you can comment on or make changes to this bug.