Closed Bug 1093823 Opened 5 years ago Closed 5 years ago

ThreadLocal<bool> yields valgrind warning

Categories

(Core :: MFBT, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla36

People

(Reporter: tromey, Assigned: tromey)

Details

Attachments

(1 file, 2 obsolete files)

I ran some tests under valgrind and saw this in the logs:

==14094== Conditional jump or move depends on uninitialised value(s)
==14094==    at 0x4A2EBC0: pthread_setspecific (/usr/src/debug/glibc-2.18/nptl/pthread_setspecific.c:49)
==14094==    by 0x6EAAF2D: mozilla::ThreadLocal<bool>::set(bool) (/obj-x86_64-unknown-linux-gnu/xpcom/threads/../../dist/include/mozilla/ThreadLocal.h:142)
==14094==    by 0x6EA019C: NS_SetMainThread() (/xpcom/threads/nsThreadManager.cpp:43)
==14094==    by 0x6E3287A: NS_LogInit (/xpcom/base/nsTraceRefcnt.cpp:1016)
==14094==    by 0x41F814: NS_LogInit (/xpcom/glue/standalone/nsXPCOMGlue.cpp:884)
==14094==    by 0x403C3F: InitXPCOMGlue(char const*, nsIFile**) (/browser/app/nsBrowserApp.cpp:571)
==14094==    by 0x403CEE: main (/browser/app/nsBrowserApp.cpp:625)


After that, there are many instance of this:

==14094== Conditional jump or move depends on uninitialised value(s)
==14094==    at 0x4A2EB04: pthread_getspecific (/usr/src/debug/glibc-2.18/nptl/pthread_getspecific.c:57)
==14094==    by 0x6EAAE2C: mozilla::ThreadLocal<bool>::get() const (/obj-x86_64-unknown-linux-gnu/xpcom/threads/../../dist/include/mozilla/ThreadLocal.h:127)
==14094==    by 0x6EA0137: NS_IsMainThread() (/xpcom/threads/nsThreadManager.cpp:33)


The bug here is that ThreadLocal::set creates a Helper object
on the stack.  If the ThreadLocal's "T" type is narrower than
"void *", then some bits of this Helper are not initialized.
This leads valgrind to report the issue during set(), and to
report it again in get().


One possible fix is to memset the Helper to 0 in ThreadLocal::set.
GCC will optimize out this memset in the pointer cases and convert
it to a zero store in the bool case.  However, I am not sure what
other compilers might do.  I verified that this silences the valgrind
complaint.

Another approach would be to add this to one of the valgrind suppression
files.

I think it's worth fixing this as the extra noise makes the valgrind
logs harder to read.
A few more details.

I am using x86-64 Fedora 20.
I reproduced this with both the system valgrind and with a locally-built valgrind 3.10.0.

My mozconfig is:

ac_add_options --disable-optimize
ac_add_options --enable-debug
ac_add_options --enable-warnings-as-errors
ac_add_options --disable-crashreporter
ac_add_options --disable-jemalloc
ac_add_options --enable-valgrind


One way to reproduce:

./mach mochitest-devtools --debugger valgrind --debugger-args=--smc-check=all-non-file browser/devtools/profiler/test/browser_profiler_console-record-09.js
(In reply to Tom Tromey :tromey from comment #0)
> I think it's worth fixing this as the extra noise makes the valgrind
> logs harder to read.

Yes please!
(In reply to Nathan Froyd (:froydnj) from comment #2)
> (In reply to Tom Tromey :tromey from comment #0)
> > I think it's worth fixing this as the extra noise makes the valgrind
> > logs harder to read.
> 
> Yes please!

If you have a preferred way to fix it, I'm happy to implement it.
I wasn't sure if the memset thing is a good way or not.
(In reply to Tom Tromey :tromey from comment #3)
> If you have a preferred way to fix it, I'm happy to implement it.
> I wasn't sure if the memset thing is a good way or not.

Does something like:

  h.mPtr = 0;
  h.mValue = aValue;

do the right thing?  (I'd be fine with the memset version except that I don't know that some GCCs on some architectures (*cough* ARM) would a) inline the memset and eliminate it that way; or b) eliminate the memset without inlining.  This way doesn't involve calling memset if something goes wrong.)

A comment here is probably warranted, too.
I will test that.

Another couple ideas:

* Write get and set to cast via uintptr_t.  This will either be a no-op,
  or cause the compiler to extend the value, in the bool case.  We can
  remove Helper entirely.
  With a suitable comment I think this is both clean and clear; it's
  my preferred approach right now.

* Specialize ThreadLocal<bool>.  Seems grosser but it would work.
(In reply to Nathan Froyd (:froydnj) from comment #4)

> Does something like:
> 
>   h.mPtr = 0;
>   h.mValue = aValue;
> 
> do the right thing?

I tried this on my reduced test case and it does work -- valgrind
complains before the addition and is silent after.
(This is perhaps due to a lack of caffeine at this end, but ..)
IMO there's something strange about all this.
pthread_{set,get}_specific simply provide a thread-local key-value
storage facility, afaics.  I can see that the keys need to be defined
for it to work properly, but I don't get why pthread_setspecific
should care what the values contain or whether they are defined.  It
just copies them around, right?  In which case V should not complain.

I looked at __pthread_setspecific at glibc-2.19/nptl/pthread_setspecific.c:49
but am none the wiser.  What it is complaining about is this "if"

      /* Remember that we stored at least one set of data.  */
      if (value != NULL)
        THREAD_SETMEM (self, specific_used, true);

but I don't get why libc would special-case the value NULL.

If I had to guess, I'd say it's some hack which allows libc to avoid
storing the (k,v) pair at all if v == NULL, based on the observation
that getspecific(k) returns NULL if k isn't known.
> * Write get and set to cast via uintptr_t.  This will either be a no-op,
>   or cause the compiler to extend the value, in the bool case.  We can
>   remove Helper entirely.

I think this would be preferable (1) because it gets rid of the redundant
store and load and replaces it with a mov{s,z}{b,w,l}q widener, and (2)
as a result of (1) we don't wind up potentially messing with the processor's
store forwarding logic, which typically doesn't like wide-load-after-narrow-store.

But maybe it's not worth the hassle.  Nathan, any idea if this stuff gets
used on fast paths?
(In reply to Julian Seward [:jseward] from comment #7)
> (This is perhaps due to a lack of caffeine at this end, but ..)

Not at all -- thanks for digging deeper.

> I looked at __pthread_setspecific at glibc-2.19/nptl/pthread_setspecific.c:49
> but am none the wiser.  What it is complaining about is this "if"
> 
>       /* Remember that we stored at least one set of data.  */
>       if (value != NULL)
>         THREAD_SETMEM (self, specific_used, true);
> 
> but I don't get why libc would special-case the value NULL.

What is going on is that this line sets the "specific_used" flag in
the non-NULL case, so that later on pthread_create.c:__nptl_deallocate_tsd
can take a faster path when no thread-local values have been set.
It is a performance optimization to try to avoid the loop that looks
at values and calls destructors.

There's also another value-checking branch:

	  if (value == NULL)
	    /* We don't have to do anything.  The value would in any case
	       be NULL.  We can save the memory allocation.  */
	    return 0;

This one is more self-explanatory -- it is in the "level 2" block and
tries to avoid allocating the second level array when possible.

I think this all seems reasonable enough that I wouldn't try to pursue
it in glibc.


> But maybe it's not worth the hassle.  Nathan, any idea if this stuff gets
> used on fast paths?

I wonder that too.  There's only the one <bool> instantiation; all the other
ones, I believe, use pointers, and so wouldn't pay any price.
(In reply to Julian Seward [:jseward] from comment #8)
> > * Write get and set to cast via uintptr_t.  This will either be a no-op,
> >   or cause the compiler to extend the value, in the bool case.  We can
> >   remove Helper entirely.
> 
> I think this would be preferable (1) because it gets rid of the redundant
> store and load and replaces it with a mov{s,z}{b,w,l}q widener, and (2)
> as a result of (1) we don't wind up potentially messing with the processor's
> store forwarding logic, which typically doesn't like
> wide-load-after-narrow-store.
> 
> But maybe it's not worth the hassle.  Nathan, any idea if this stuff gets
> used on fast paths?

I do not know about hot paths here.  We're paying a through-the-PLT function call here anyway, I'm not going to get too worked up about this.  uintptr_t storage works for me, too.
The resulting patch looks pretty ugly, I'm afraid.  The problem is that
when writing "get", with pointers we want to use reinterpret_cast, but with
bool we want to use static_cast.  So we need a new template function.  And then
to get the compiler to pick the right one I used type traits... seems like a lot
of machinery!  Hopefully I'm missing something obvious.

A C-style cast would work but that seems pretty gross as well.
I thought of a prettier way to do this; attached.

However, I am not sure it is acceptable.  It occurs to me that with
the current approach it is fine to instantiate ThreadLocal using a
POD, provided the size of the struct is small enough.  That will not
work with the approach taken in this patch.

If that case is intended to work then I think we're back to
zero-initializing, or a suppression.
I am OK with disallowing ThreadLocal<PODStruct>.  If you want a structure, you should use a pointer.  If you really want a small structure, you can do the necessary dances with integers and casting.

Enforcing this with |static_assert(IsIntegral<T>::value || IsPointer<T>::value, ...);| would be good, though.
This version adds the static assert.

I'll send through "try" shortly.
Attachment #8518256 - Attachment is obsolete: true
Attachment #8518284 - Flags: review?(nfroyd)
Comment on attachment 8518284 [details] [diff] [review]
avoid valgrind report from ThreadLocal<bool>

Review of attachment 8518284 [details] [diff] [review]:
-----------------------------------------------------------------

WFM.  Thanks for fixing this.
Attachment #8518284 - Flags: review?(nfroyd) → review+
... this time with the missing header included.
Looking through some other try builds, I think the R-e10s orange there is
an intermittent failure.  Also I re-ran the one red and it worked the second
time.  So I'm marking this checkin-needed.  Thanks.
Keywords: checkin-needed
Attachment #8518284 - Attachment is obsolete: true
Assignee: nobody → ttromey
https://hg.mozilla.org/mozilla-central/rev/e6e89ea52d09
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
You need to log in before you can comment on or make changes to this bug.