Very useful report, thank you Tyson.
In this case, we have:
- A thread is about to wait for a Monitor, and notifies the profiler about it through profiler_thread_sleep, which modifies an atomic value in the appropriately-named
- The main thread is shutting down and, near the end of that, when all other threads should already have shut down, calls profiler_shutdown, which destroys its list of RegisteredThreads and associated
In the report, the actual ordering seems fine: the atomic is modified, and only later destroyed.
But the main issue is that nothing guards against the reverse order, i.e.:
profiler_shutdown destroying the atomic while the thread still exists and then tries to modify that destroyed atomic.
I think in practice this should not happen, since non-main threads are supposed to have been stopped and "joined" by the time the main thread reaches that phase of its shutdown; but some long-lived threads could still survive (a bug elsewhere that makes us forget to terminate them, or threads started by external code, which we cannot control).
There have also been other reports that could be caused by such long-lived threads, and some bugs had to deal with uncertain thread-termination times by adding multi-ownership handles to profiling stacks (see bug 1445822).
Also bug 1701874 seems to be due to a thread shutting down after the main thread's
So I think we should certainly try to fix this race issue, and it could resolve a few related bugs.
I think one solution would be to let each thread own its
RegisteredThread, and register/unregister it with the profiler through properly-mutexed functions. This would require significant rework, while trying to keep the low overhead of some time-critical accesses.
It's not a security issue though, as it's happening at the end of process shutdown with no user-controllable code. (But I cannot remove the security flag.)
Also, in Firefox releases we actually call
exit(0) earlier, so most users won't even reach that point. But it would still be good to fix, to help with our internal tests that don't exit early.