Closed Bug 1348419 Opened 3 years ago Closed 3 years ago

Use thread_local for thread local storage on Windows now that we've dropped XP

Categories

(Core :: MFBT, defect)

All
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox55 --- fixed

People

(Reporter: mstange, Assigned: tromey)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

Jeff says this will give us performance for free.

https://msdn.microsoft.com/en-us/library/9w1sdazb(v=vs.140).aspx:
> On XP systems, thread may not function correctly if a DLL uses
> __declspec(thread) data and it is loaded dynamically via LoadLibrary.
Why not C++11 thread_local?
Indeed, the notes on https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code talk about XP and OSX 1.06 being the main problems. There might be a problem on Android, though...
Even if there is a problem on Android, there is no reason to use Windows specific __declspec(thread).
I'm doing a try run of a patch that changes everything to use thread_local:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=59e892d4a18c8e76f4fb5e2a64513e8b7cf3661b

This particular patch is very eager and removes ThreadLocal.h entirely.
A more targeted approach is certainly possible though.
If this works we should also update the aforementioned wiki page to reflect that thread_local is ok.
That was quick:

[task 2017-03-21T17:15:10.770759Z] /home/worker/workspace/gcc/bin/../lib/gcc/x86_64-unknown-linux-gnu/4.9.4/../../../../x86_64-unknown-linux-gnu/bin/ld: TraceLogging.o: relocation R_X86_64_PC32 against undefined hidden symbol `_ZTHN2js10TlsContextE' can not be used when making a shared object
That one seems like it should be fixable, though I didn't dig into it.
Subsequent try runs have found a few more spots to change that I missed.
The latest patch says:

c:/builds/moz2_slave/try_w32_sm-compacting-00000000/src/js/src/jsutil.cpp(41): error C2492: 'threadType': data with thread storage duration may not have dll interface

which is:

https://msdn.microsoft.com/en-us/library/40a45kxx.aspx

However, it's unclear to me that threadType must be public.  It's defined here:

https://dxr.mozilla.org/mozilla-central/rev/05bfa2831c0ba4a26fa72328ffe6a99aba9c356a/js/src/jsutil.cpp#41

and only referenced in that file.
Not yet r? due to the build problem with the spidermonkey build.
Latest try here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=df92ef0e5186101f5def64fe5f567b7b59895315
So it builds nearly everywhere at least.

Some spots might be able to be reduced a bit further by removing initialization
variables, e.g. search for sScriptSettingsTLSInitialized.  I think there are 2-3 of these,
but I didn't change them because doing so would require a bit more investigation.
Not to bike shed here but, since this removes the explicitness of thread local usage, I wonder if it would be good to adopt a naming convention to distinguish thread_local variables from static and global ones.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #11)
> Not to bike shed here but, since this removes the explicitness of thread
> local usage, I wonder if it would be good to adopt a naming convention to
> distinguish thread_local variables from static and global ones.

It would be fine by me.  Can you suggest a prefix?
Also, are prefixes like this generally used in SpiderMonkey?  I am under the impression that they are not,
but I figured I'd ask before adding on a change.
Flags: needinfo?(jmuizelaar)
A much simpler patch, pretty much just substituting "thread_local" for "__thread", also
failed in the same way as comment#6.  I can't reproduce that locally though and don't
really have a theory for what is going wrong.
https://gcc.gnu.org/gcc-4.8/changes.html#cxx says that __thread is a bit faster than thread_local still,
at the cost of not allowing constructors and destructors; but existing code can't use these already...
(In reply to Tom Tromey :tromey from comment #12)
> (In reply to Jeff Muizelaar [:jrmuizel] from comment #11)
> > Not to bike shed here but, since this removes the explicitness of thread
> > local usage, I wonder if it would be good to adopt a naming convention to
> > distinguish thread_local variables from static and global ones.
> 
> It would be fine by me.  Can you suggest a prefix?

If globals have a `g` prefix and statics an `s` prefix, maybe thread-local variables should have a `t` or `tls` prefix?

https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Coding_Style#Prefixes
(In reply to Tom Tromey :tromey from comment #14)
> https://gcc.gnu.org/gcc-4.8/changes.html#cxx says that __thread is a bit
> faster than thread_local still,
> at the cost of not allowing constructors and destructors; but existing code
> can't use these already...

Oh. :(
Sounds like we'll want to keep using __thread then.

Here's a recent profile of some JS running on Linux, and it shows that even __thread seems to have measurable overhead: https://perfht.ml/2o5FXAI
(I'm not sure if I trust the _dl_deallocate_tls symbol name...)
(In reply to Markus Stange [:mstange] from comment #16)
> (In reply to Tom Tromey :tromey from comment #14)
> > https://gcc.gnu.org/gcc-4.8/changes.html#cxx says that __thread is a bit
> > faster than thread_local still,
> > at the cost of not allowing constructors and destructors; but existing code
> > can't use these already...
> 
> Oh. :(
> Sounds like we'll want to keep using __thread then.
> 
> Here's a recent profile of some JS running on Linux, and it shows that even
> __thread seems to have measurable overhead: https://perfht.ml/2o5FXAI
> (I'm not sure if I trust the _dl_deallocate_tls symbol name...)

I definitely don't trust that _dl_deallocate_tls symbol name.  Here's the disassembly of BarriersAreAllowed on my machine:

0000000000000000 <js::gc::BarriersAreAllowedOnCurrentThread()>:
   0:	55                   	push   %rbp
   1:	48 89 e5             	mov    %rsp,%rbp
   4:	66 48 8d 3d 00 00 00 	data16 lea 0x0(%rip),%rdi        # c <js::gc::BarriersAreAllowedOnCurrentThread()+0xc>
   b:	00 
			8: R_X86_64_TLSGD	js::TlsContext-0x4
   c:	66 66 48 e8 00 00 00 	data16 data16 callq 14 <js::gc::BarriersAreAllowedOnCurrentThread()+0x14>
  13:	00 
			10: R_X86_64_PLT32	__tls_get_addr-0x4
  14:	5d                   	pop    %rbp
  15:	48 8b 00             	mov    (%rax),%rax
  18:	0f b6 80 b8 01 00 00 	movzbl 0x1b8(%rax),%eax
  1f:	c3                   	retq   

One of those _dl_deallocate_tls might be __tls_get_addr or something that it tail calls.  _dl_deallocate_tls doesn't get called from the TLS variable machinery in glibc.  Maybe you don't have symbol information for the dynamic loader available?
The problem with that build is definitely just from the difference between __thread and thread_local.
Still no real theory; I note that this particular build is using GCC 4.9, but I haven't found
any indications of 4.9 having a bug along these lines.

Anyway, a smaller and perhaps somewhat more disappointing patch will still do the job.
Assignee: nobody → ttromey
The patch I'm about to attach is the more minimal approach.
It will conflict with the patch in bug 1349655 (race!) but the conflict is obvious and readily resolved.
(In reply to Chris Peterson [:cpeterson] from comment #15)

> If globals have a `g` prefix and statics an `s` prefix, maybe thread-local
> variables should have a `t` or `tls` prefix?

Thanks, that makes sense to me; but in the end I kept MOZ_THREAD_LOCAL and the wrapper type,
so I think I won't be doing the rename.  Clearing the NI on this basis.
Flags: needinfo?(jmuizelaar)
Comment on attachment 8849730 [details]
Bug 1348419 - Use thread_local on XP_WIN and XP_MACOSX;

https://reviewboard.mozilla.org/r/122516/#review125080

Progress, at least.
Attachment #8849730 - Flags: review?(nfroyd) → review+
Duplicate of this bug: 1239805
Duplicate of this bug: 1239806
I'm not sure I understand what the point of a prefix is on these, to be honest.  Most variables are thread-local in reality -- every single automatic-duration variable is thread-local in the sense that other threads can't touch them without having committed some sort of memory safety violation.  Doesn't seem necessary to specifically distinguish these ones, when every thread will see its own version, and there's no sharing of copies.
Comment on attachment 8849730 [details]
Bug 1348419 - Use thread_local on XP_WIN and XP_MACOSX;

https://reviewboard.mozilla.org/r/122516/#review125126

::: commit-message-cde1a:6
(Diff revision 2)
> +Bug 1348419 - use thread_local when __thread is not available; r?froydnj
> +
> +This changes ThreadLocal.h to use C++11's thread_local when __thread is
> +not available.  __thread is preferred because, according to the GCC 4.8
> +release notes, it is faster in some situations; and because switching
> +fully to __thread caused some mysterious build failures with the

nit: to thread_local
As predicted (In reply to Mike Hommey [:glandium] from comment #2)
> There might be a problem on
> Android, though...

Yes indeed.

https://treeherder.mozilla.org/logviewer.html#?job_id=85740248&repo=try&lineNumber=20170

... which looks a bit like the ARM variant of the hazard build problem; many errors of the form:

[task 2017-03-22T21:30:11.201013Z] 21:30:11     INFO -  /home/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.9/../../../../arm-linux-androideabi/bin/ld: error: /home/worker/workspace/build/src/obj-firefox/toolkit/library/../../xpcom/base/Unified_cpp_xpcom_base1.o: requires unsupported dynamic reloc R_ARM_REL32; recompile with -fPIC

I spent a bit of time searching for answers today -- I know really nothing about Android -- but I've
been finding it difficult to understand what's going on.  It seems that Android perhaps doesn't have "real"
TLS but only "emultls", which is a GCC term for "we turn __thread into pthread_*specific calls for you".
If that's true (it's hard to be sure) then there's no real benefit to using __thread here anyway, and we could
just use an even simpler patch, basically just adding a definition of MOZ_THREAD_LOCAL for the benefit of Windows.
(In reply to Tom Tromey :tromey from comment #27)
> As predicted (In reply to Mike Hommey [:glandium] from comment #2)
> > There might be a problem on
> > Android, though...
> 
> Yes indeed.
> 
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=85740248&repo=try&lineNumber=20170
> 
> ... which looks a bit like the ARM variant of the hazard build problem; many
> errors of the form:
> 
> [task 2017-03-22T21:30:11.201013Z] 21:30:11     INFO - 
> /home/worker/workspace/build/src/android-ndk/toolchains/arm-linux-
> androideabi-4.9/prebuilt/linux-x86_64/bin/../lib/gcc/arm-linux-androideabi/4.
> 9/../../../../arm-linux-androideabi/bin/ld: error:
> /home/worker/workspace/build/src/obj-firefox/toolkit/library/../../xpcom/
> base/Unified_cpp_xpcom_base1.o: requires unsupported dynamic reloc
> R_ARM_REL32; recompile with -fPIC

Sigh. Yet again BFD ld is being painfully useless in its error reporting... if only it said *which* symbol the reloc was used for... You'd need to reproduce the build and inspect the .o file to know...
(In reply to Mike Hommey [:glandium] from comment #28)
> requires unsupported dynamic reloc
> > R_ARM_REL32; recompile with -fPIC
> 
> Sigh. Yet again BFD ld is being painfully useless in its error reporting...
> if only it said *which* symbol the reloc was used for... You'd need to
> reproduce the build and inspect the .o file to know...

I found this error message in gold, but not the BFD ld.
But I was only grepping git master, maybe it got removed in the intervening years.

Back in comment #6 I couldn't reproduce that failure at all.
Could be a GCC 4.9 bug, I didn't try digging up an old GCC either.
(In reply to Tom Tromey :tromey from comment #29)
> (In reply to Mike Hommey [:glandium] from comment #28)
> > requires unsupported dynamic reloc
> > > R_ARM_REL32; recompile with -fPIC
> > 
> > Sigh. Yet again BFD ld is being painfully useless in its error reporting...
> > if only it said *which* symbol the reloc was used for... You'd need to
> > reproduce the build and inspect the .o file to know...
> 
> I found this error message in gold, but not the BFD ld.

They're both useless for these kinds of errors.

> But I was only grepping git master, maybe it got removed in the intervening
> years.
> 
> Back in comment #6 I couldn't reproduce that failure at all.
> Could be a GCC 4.9 bug, I didn't try digging up an old GCC either.

It could just be a wrapper missing for some header.
If we turn this bug into Windows-only, we should reopen bug 1239805.
I installed the Android NDK and built locally, and I was able to reproduce the error.

Looking at some of the affected files I see symbols like:

         U __emutls_v._ZN2js10TlsContextE
         w _ZTHN2js10TlsContextE

I wasn't expecting that weak symbol, so theorizing that this was the problem,
I wrote a small test program.  This showed I can reproduce the weak symbol and the
link error using thread_local, but not __thread.

The build log shows:

 0:06.00 checking for __thread keyword for TLS variables... yes

However, obj-arm-linux-androideabi/mozilla-config.h does not mention HAVE_THREAD_TLS_KEYWORD
-- whereas obj-x86_64-pc-linux-gnu/mozilla-config.h does.

So, it seems like something is going wrong during configury.
There's a similar line in the top-level old-configure.in.
These were introduced in bug 617115.
It turns out that the linker check is what really circumvents the define:
https://dxr.mozilla.org/mozilla-central/rev/b5d8b27a753725c1de41ffae2e338798f3b5cacd/old-configure.in#1812
Here we have MOZ_LINKER=1.
I wonder if this is really needed, though, as emultls is in use.
Ah, IIRC TLS requires linker support for it, which isn't there yet.
(In reply to Mike Hommey [:glandium] from comment #36)
> Ah, IIRC TLS requires linker support for it, which isn't there yet.

I removed this test and rebuilt using the Android toolchain, and the resulting libxul.so
doesn't have any TLS sections that I can see -- so my theory is that emultls means
that they aren't emitted at all (which makes sense because there are no TLS symbols
either).  Tomorrow I'll push this version through try and see what happens.

But yes, your recollection is correct:
https://dxr.mozilla.org/mozilla-central/rev/81e37ef1360ba4505726ddf542ebdcc952a57578/mozglue/linker/CustomElf.cpp#157-162
Seeing this now:
/home/worker/workspace/build/src/memory/jemalloc/src/src/tsd.c:10:60: error: __emutls_t.je_tsd_tls causes a section type conflict with je_tsd_tls
Haven't dug into it yet.
After talking with Nathan today I gave up on making Android work, and instead I'll attach
a more minimal patch that just switches over XP_WIN and XP_MACOS.  If this is approved
I'll file a follow-up bug to convert Android.
Attachment #8849730 - Flags: review+ → review?(nfroyd)
Comment on attachment 8849730 [details]
Bug 1348419 - Use thread_local on XP_WIN and XP_MACOSX;

https://reviewboard.mozilla.org/r/122516/#review132576

Hm, mozreview is showing me that you have test failures on try (Android build bustage), but it's not the same patch that I'm reviewing here.  This patch looks right in any event.
Attachment #8849730 - Flags: review?(nfroyd) → review+
Pushed by ttromey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e39e02b11392
Use thread_local on XP_WIN and XP_MACOSX; r=froydnj
Backed out for failing at jsutil.cpp(42) with "data with thread storage duration may not have dll interface":

https://hg.mozilla.org/integration/autoland/rev/b17ddbfb0f551c6b4d24c6d2049f5405713697e7

Push showing the failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=f0fd2712e2c2e017e64ba494e2c8c918e35304d4&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=91772352&repo=autoland

jsutil.cpp
c:/builds/moz2_slave/autoland_w32-d_sm-plaindebug-0/src/js/src/jsutil.cpp(42): error C2492: 'threadType': data with thread storage duration may not have dll interface
c:/builds/moz2_slave/autoland_w32-d_sm-plaindebug-0/src/config/rules.mk:1022: recipe for target 'jsutil.obj' failed
mozmake[3]: *** [jsutil.obj] Error 2
mozmake[3]: *** Waiting for unfinished jobs....
Flags: needinfo?(ttromey)
Sorry about that.
I'm surprised to see that on an XP machine since I thought we'd dropped that.
That was the premise of this bug.
Also I got bitten again by the "all doesn't mean all" thing on try :(
Flags: needinfo?(ttromey)
This is mislabeled (don't ask me about the details why it's kept), but the build uses Windows 2008. Windows XP support has indeed been dropped.
> This is mislabeled

Thanks, I had no idea.
Ugh, so we'd have to use a different kind of thread-local storage interface depending on whether we were building JS standalone or not?  Ugh.

Is there a reason that we'd have exported thread-local variables?  Do consumers really need to get at those?  Could we just provide a standard (exported) API for accessing the variables instead, and keep the variables as private implementation details?
(In reply to Nathan Froyd [:froydnj] from comment #47)
> Ugh, so we'd have to use a different kind of thread-local storage interface
> depending on whether we were building JS standalone or not?  Ugh.
> 
> Is there a reason that we'd have exported thread-local variables?  Do
> consumers really need to get at those?  Could we just provide a standard
> (exported) API for accessing the variables instead, and keep the variables
> as private implementation details?

The variable in question is marked exported at its definition, but not declared in any
header, and there are already accessors.  So I think at least for this particular
variable there is no issue.  The only problem would be if we wanted to add new
exported thread-locals.
Comment on attachment 8849730 [details]
Bug 1348419 - Use thread_local on XP_WIN and XP_MACOSX;

Probably should re-review.
The only change is in jsutil.cpp.
I'll be sure to run more try jobs this time.
Attachment #8849730 - Flags: review+ → review?(nfroyd)
Comment on attachment 8849730 [details]
Bug 1348419 - Use thread_local on XP_WIN and XP_MACOSX;

https://reviewboard.mozilla.org/r/122516/#review133450

::: js/src/jsutil.cpp:42
(Diff revision 4)
>  mozilla::Atomic<AutoEnterOOMUnsafeRegion*> AutoEnterOOMUnsafeRegion::owner_;
>  
>  namespace oom {
>  
>  JS_PUBLIC_DATA(uint32_t) targetThread = 0;
> -JS_PUBLIC_DATA(MOZ_THREAD_LOCAL(uint32_t)) threadType;
> +MOZ_THREAD_LOCAL(uint32_t) threadType;

I always enjoy "XType" variables that are plain integers rather than some sort of enumeration type.
Attachment #8849730 - Flags: review?(nfroyd) → review+
> I always enjoy "XType" variables that are plain integers rather than some sort of enumeration type.

Yeah, that's funny.
Also I noticed that the ThreadLocal static assert is a bit restrictive, it could
really be IsPod instead:
https://dxr.mozilla.org/mozilla-central/rev/a374c35469935a874fefe64d3e07003fc5bc8884/mfbt/ThreadLocal.h#153-155
I think I did the try correctly (I added new jobs from the treeherder ui to ensure they were run),
so I'm going to land this again.
Pushed by ttromey@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/853da67e9bc3
Use thread_local on XP_WIN and XP_MACOSX; r=froydnj
https://hg.mozilla.org/mozilla-central/rev/853da67e9bc3
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Summary: Use __declspec(thread) for thread local storage on Windows now that we've dropped XP → Use thread_local for thread local storage on Windows now that we've dropped XP
Is this supposed to work on Mac OS X 10.10.5, Xcode 7.2.1 ?

I'm getting this error during compilation:
/Users/varga/Sources/Mozilla/js/src/threading/Mutex.h:70:10: error: 
      thread-local storage is not supported for the current target
  static MOZ_THREAD_LOCAL(MutexVector*) HeldMutexStack;
         ^
/Users/varga/Sources/Mozilla/obj-ff-dbg/dist/include/mozilla/ThreadLocal.h:185:32: note: 
      expanded from macro 'MOZ_THREAD_LOCAL'
#define MOZ_THREAD_LOCAL(TYPE) thread_local mozilla::detail::ThreadLocal<TYPE>
                               ^
In file included from /Users/varga/Sources/Mozilla/js/src/builtin/RegExp.cpp:7:
In file included from /Users/varga/Sources/Mozilla/js/src/builtin/RegExp.h:10:
In file included from /Users/varga/Sources/Mozilla/js/src/vm/RegExpObject.h:13:
/Users/varga/Sources/Mozilla/js/src/jscntxt.h:74:8: error: thread-local storage
      is not supported for the current target
extern MOZ_THREAD_LOCAL(JSContext*) TlsContext;
       ^
/Users/varga/Sources/Mozilla/obj-ff-dbg/dist/include/mozilla/ThreadLocal.h:185:32: note: 
      expanded from macro 'MOZ_THREAD_LOCAL'
#define MOZ_THREAD_LOCAL(TYPE) thread_local mozilla::detail::ThreadLocal<TYPE>
Depends on: 1357897
Depends on: 1358850
You need to log in before you can comment on or make changes to this bug.