Open Bug 1324316 Opened 7 years ago Updated 15 days ago

Consider dropping mozilla::ThreadLocal and using native thread_local instead

Categories

(Core :: MFBT, defect)

defect

Tracking

()

Tracking Status
firefox53 --- affected

People

(Reporter: xidorn, Unassigned)

References

Details

As noted in Using C++ in Mozilla code [1], thread_local has been supported in all compilers we use. It was not allowed to be used because of breaking Windows XP and OS X 10.6. However, now we have dropped support to both Windows XP and OS X 10.6, we should be able to use the native thread_local now, and drop mozilla::ThreadLocal.

[1] https://developer.mozilla.org/en-US/docs/Using_CXX_in_Mozilla_code
Depends on: xp-eol
IIRC, Android doesn't support thread local storage, which may or may not impact the availability of thread_local. So you should check what the story is on Android, too.
I searched "ndk thread_local" and didn't find many complaint about its not being supported. There is an issue reports that Clang with LTO fails to compile program using thread_local [1], and in that issue, the reporter said "Everything works without -flto." which indicates that Android is supposed to support thread_local.

IIRC we still uses GCC for Android build, so Clang LTO may not be an issue for us for now.

[1] https://github.com/android-ndk/ndk/issues/156
FWIW, I don't think there's any reason we wouldn't switch to thread_local once we 1) have compiler support everywhere, and 2) that support is "optimal" (if memory serves, this means memory for the thread-local is allocated at thread allocation time).

I think we'd want to (at a minimum) discourage use of thread_local with classes with non-trivial destructors, requiring people to manually manage thread_local pointers (or some similar scheme) for more complicated objects.  But given thread_local only supports pointer-like things now (and we don't really support things with destructors), that shouldn't be any barrier to switching.
Moreover, given that mozilla::ThreadLocal appears to be using pthread_getspecific/TlsGetValue, I think we'd get definite *speedup* switching to thread_local which, last I looked, had much better codegen.
(In reply to Luke Wagner [:luke] from comment #4)
> Moreover, given that mozilla::ThreadLocal appears to be using
> pthread_getspecific/TlsGetValue, I think we'd get definite *speedup*
> switching to thread_local which, last I looked, had much better codegen.

ThreadLocal uses __thread on POSIX-y systems if the compiler supports it nowadays.  I think now that we've dropped XP/Vista support, we could use __declspec(thread) on Windows and get the same benefits?
Oh, nice, n/m then :)
There was talk around TLS stuff at the quantum DOM meetings in Hawaii.  Bill might have opinions here.
I don't think this change should have much impact on Quantum DOM. If we need custom TLS functionality in certain places, we can just switch classes as needed. Whether we switch from ThreadLocal or thread_local isn't a big deal.
Blocks: xp-eol
No longer depends on: xp-eol
See bug 1348419 for some efforts.

In particular, there we found that GCC 4.8 release notes say that __thread is still faster than thread_local:
https://gcc.gnu.org/gcc-4.8/changes.html#cxx

says:

> Unfortunately, this support requires a run-time penalty for references to non-function-local thread_local variables defined in a different translation unit even if they don't need dynamic initialization, so users may want to continue to use __thread for TLS variables with static initialization semantics. 

Removing the mozilla::ThreadLocal wrapper is still possible.
FWIW in bug 1348419 it turned out that dealing with OSX and Android was too much of a pain,
in the end these still need ThreadLocal.  For Androi it is easy to test: just do a try run
using __thread or thread_local -- the bug was linker errors.  For OSX the problem was that
thread_local was not supported by a version of xcode a bit older than whatever is in use
on try.
NDK14+ (March 2017) supports thread_local now:
https://android-developers.googleblog.com/2017/03/introducing-android-native-development.html

For thread_local vs __thread, it sounds like if you use thread_local in places where __thread is valid, they have the same cost these days, so there should be no reason to use __thread if thread_local is supported:
https://stackoverflow.com/a/13123870

I'm adding a cpp test for ThreadLocal that tests thread_local, MOZ_THREAD_LOCAL, and ThreadLocal<T,KeyStorage>, and support looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=16c0b519be407842c7b74808822af2ac245d0da0

I think at this point we're clear to deprecate MOZ_THREAD_LOCAL, though we need to preserve ThreadLocal<T,KeyStorage> for some jemalloc (osx?) configurations:
https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/memory/build/mozjemalloc.cpp#1223
Assignee: nobody → jgilbert
Bug 1403715 is relevant for Android, except if thread_local doesn't use TLS segments under the hood.
I checked my try run and I hadn't enabled the test: It was building (and linking, I can only infer!), just not run. Doing another try push with the test running.
Trying to use thread_local everywhere did cause trouble though:

> 01:34:20     INFO -  /builds/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin/ld: error: ../../js/src/build/libjs_static.a(Unified_cpp_js_src_jit0.o): requires unsupported dynamic reloc R_ARM_REL32; recompile with -fPIC
> 01:34:20     INFO -  /builds/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin/ld: error: ../../js/src/build/libjs_static.a(Unified_cpp_js_src_jit0.o): requires unsupported dynamic reloc R_ARM_REL32; recompile with -fPIC
> 01:34:20     INFO -  /builds/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin/ld: error: read-only segment has dynamic relocations
> 01:34:20     INFO -  clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
> 01:34:20     INFO -  /builds/worker/workspace/build/src/config/rules.mk:710: recipe for target 'libxul.so' failed
> 01:34:20    ERROR -  make[4]: *** [libxul.so] Error 1

We appear to still be using gcc's binaries to link?
(In reply to Jeff Gilbert [:jgilbert] from comment #15)
> Trying to use thread_local everywhere did cause trouble though:
> 
> > 01:34:20     INFO -  /builds/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin/ld: error: ../../js/src/build/libjs_static.a(Unified_cpp_js_src_jit0.o): requires unsupported dynamic reloc R_ARM_REL32; recompile with -fPIC
> > 01:34:20     INFO -  /builds/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin/ld: error: ../../js/src/build/libjs_static.a(Unified_cpp_js_src_jit0.o): requires unsupported dynamic reloc R_ARM_REL32; recompile with -fPIC
> > 01:34:20     INFO -  /builds/worker/workspace/build/src/android-ndk/toolchains/arm-linux-androideabi-4.9/prebuilt/linux-x86_64/lib/gcc/arm-linux-androideabi/4.9.x/../../../../arm-linux-androideabi/bin/ld: error: read-only segment has dynamic relocations
> > 01:34:20     INFO -  clang-7: error: linker command failed with exit code 1 (use -v to see invocation)
> > 01:34:20     INFO -  /builds/worker/workspace/build/src/config/rules.mk:710: recipe for target 'libxul.so' failed
> > 01:34:20    ERROR -  make[4]: *** [libxul.so] Error 1
> 
> We appear to still be using gcc's binaries to link?

Yes, this is expected, but presumably the same error would show up with lld, since the text segment has dynamic relocations regardless.

It seems to pass try with at least one instance replace with thread_local:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8e0ea47ba9d0dc026c82bb071afc4ebe4d79f4f

(In reply to Jeff Gilbert [:jgilbert] from comment #17)

It seems to pass try with at least one instance replace with thread_local:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8e0ea47ba9d0dc026c82bb071afc4ebe4d79f4f

I don't see Android builds on that push? Or do Android builds not matter for that particular instance that was replaced?

You're right, that one missed the Android builds. Triggering those now.

Android builds, even API16.

The problem is not whether they build, it's whether they work. And if they do on all the supported versions of Android. And I'm sure we don't even need to go to the latter question.

Google's current C++ style guide entry sounds promising, though not free of caveats:
https://google.github.io/styleguide/cppguide.html#thread_local

thread_local variables that aren't declared inside a function must be initialized with a true compile-time constant, and this must be enforced by using the ABSL_CONST_INIT attribute. Prefer thread_local over other ways of defining thread-local data.

This thread sounds generally promising that issues are fixed for Chromium as of early 2019:
https://groups.google.com/a/chromium.org/forum/#!topic/chromium-dev/2msN8k3Xzgs

I'm not sure how to test this, since we don't have e.g. gtests for API16+ builds, seems like.

It sounds to me that this was an issue fixed by updating the NDK, and that old versions of Android shouldn't cause this to not work, so long as the NDK generally works with them still.

Except for comment 13...

(In reply to Mike Hommey [:glandium] from comment #24)

Except for comment 13...

Hm, Bug 1403715 is closed as WONTFIX. What's the status of that? Is the custom linker still used? Is there a bug tracking its removal?

Flags: needinfo?(mh+mozilla)

It is still used. But even without the custom linker, though, only Android versions >= 10 support thread local storage in the system linker. Google removed their version usage dashboard, and I don't have Android Studio handy, but this screenshot from April says more than 90% of devices have a version < 10. But it may or may not be relevant, because the question from comment 13 still applies: does thread_local require TLS on Android or not?

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #26)

because the question from comment 13 still applies: does thread_local require TLS on Android or not?

I am not familiar with the details here, but would that be achieved by adding -femulated-tls (https://reviews.llvm.org/D10522) ?

Probably.

Note that we do need to keep ThreadLocal at least for the memory allocator, and that the reason we do probably applies to emulated TLS, so the memory allocator would probably need some #ifdefs adjusted to include/exclude Android.

Offhand remark: MOZ_THREAD_LOCAL on macOS appears to be quite slow. Moving a thread-local access off a hot path onto that path slowed a microbenchmark down by 33% on macOS; comparable code on Linux actually appears to have sped up. (Bug 1742779 comment 4, still looking into this, including for Windows.)

MOZ_THREAD_LOCAL on macOS uses thread_local.

Is there any progress in verifying if we can fix the issues with thread_local on Android?

My app is running out of PTHREAD_KEYS_MAX keys when integrating GeckoView, as GeckoView makes heavy use of pthread_key_create (at least 27-32 keys from what I could tell), while on Android the limit is 128. This, coupled with Unity Engine using up to ~65 on its own too, and the OS itself using 15+ keys before my app runs, means it's bound to be a disaster for any non-trivial apps to integrate.

Locally forcing the thread_local or __thread branch in mfbt/ThreadLocal.h results in GeckoView silently stopping during initialization.

The C++ Portability Guide does mention it's not supported on Android, but isn't clear on why. It works in a simple C++ Android app.

https://firefox-source-docs.mozilla.org/code-quality/coding-style/using_cxx_in_firefox_code.html?highlight=c%2011#notes
"Thread locals: thread_local is not supported on Android."

Flags: needinfo?(mh+mozilla)

It works in a simple C++ Android app.

Does it work on Android 4.1?

Flags: needinfo?(mh+mozilla) → needinfo?(dvir.azulay)

(In reply to Mike Hommey [:glandium] from comment #33)

It works in a simple C++ Android app.

Does it work on Android 4.1?

I didn't test, and not sure how this goes, but I'd say it's super valuable to get it to work optionally in GeckoView so if someone does target only modern-ish Android OSes they can avoid this limitation by changing the flag and compiling GeckoView themselves :)

Flags: needinfo?(dvir.azulay)
Severity: normal → S3

It would be cool if we could do this, but I don't think it's actionable by me.

Assignee: jgilbert → nobody
You need to log in before you can comment on or make changes to this bug.