Closed Bug 1061764 Opened 10 years ago Closed 10 years ago

Atomic<bool, Relaxed> should mean no synchronization on windows x86

Categories

(Core :: MFBT, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla35

People

(Reporter: h4writer, Assigned: h4writer)

References

Details

Attachments

(1 file, 1 obsolete file)

We use an atomic to check if there is an interrupt [0], but this is really important to be able to do that fast. When looking into assembly on linux, this generates a normal load. On windows this generates a "lock cmpxchg dword ptr [edx], ecx". This shouldn't be needed. We might have to change to "volatile bool/volatile int32" for interrupt if this persist, since it is hurting us on benchmarks [0] http://dxr.mozilla.org/mozilla-central/source/js/src/vm/Runtime.h#720
Blocks: 1061646
(In reply to Hannes Verschore [:h4writer] from comment #0) > We might have to change to "volatile bool/volatile int32" for interrupt if > this persist, since it is hurting us on benchmarks Please no, let's fix Atomics.h instead.
What version of MSVC are you compiling with? On MSVC 2010 and below [0], we don't get MOZ_HAVE_CXX11_ATOMICS and load looks like it shouldn't invoke any atomics [1]. On MSVC 2012 and up we use <atomic> directly [2]; I wouldn't expect that to be wrong, though. [0] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#43 [1] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#758 [2] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#228
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2) > What version of MSVC are you compiling with? On MSVC 2010 and below [0], we > don't get MOZ_HAVE_CXX11_ATOMICS and load looks like it shouldn't invoke any > atomics [1]. On MSVC 2012 and up we use <atomic> directly [2]; I wouldn't > expect that to be wrong, though. > > [0] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#43 > [1] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#758 > [2] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#228 I'm compiling on MSVC 2012 (32bit on 64bit host)
I wouldn't expect the original assembly to come from ::load, I'd expect it to come from ::exchange or ::compareExchange (probably the latter): http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#620 Hannes, can you verify where the actual call that's generating that instruction is?
Flags: needinfo?(hv1989)
It is here: http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#230 Apparently this is a known issue solved in VC12 (MSVC 2013) https://connect.microsoft.com/VisualStudio/feedback/details/770885/std-atomic-load-implementation-is-absurdly-slow And we want to start using MSVC 2013 in this release \0/
Depends on: VC12
Flags: needinfo?(hv1989)
Whoa, so their <atomic> is wrong. Should we adjust the conditional on http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#43 to |_MSC_VER >= 1800| (MSVC 2013 and up)?
See previous comment. So 2012 has a slow implementation on atomic. It is fixed in 2013 (not confirmed, I'll try to get a 2013 environment to check). So for now use the backup implementation for msvc beneath version 2013.
Assignee: nobody → hv1989
Attachment #8483535 - Flags: review?(nfroyd)
Comment by ehoogeveen ;)
Attachment #8483535 - Attachment is obsolete: true
Attachment #8483535 - Flags: review?(nfroyd)
Attachment #8483542 - Flags: review?(nfroyd)
Comment on attachment 8483542 [details] [diff] [review] Only use the MSVC atomic for 2013+ Review of attachment 8483542 [details] [diff] [review]: ----------------------------------------------------------------- ::: mfbt/Atomics.h @@ +42,5 @@ > # endif > +/* > + * Although Visual Studio 2012's CRT supports <atomic>, its atomic load > + * implementation unnecessarily uses an atomic intrinsic for the less > + * restrictive memory orderings, which can be prohibitively expensive. Optional: webpage link might be nice. @@ +43,5 @@ > +/* > + * Although Visual Studio 2012's CRT supports <atomic>, its atomic load > + * implementation unnecessarily uses an atomic intrinsic for the less > + * restrictive memory orderings, which can be prohibitively expensive. > + * Therefore, we require at least Visual Studio 2013 for using the CRT. Thank you for the explanatory comment!
Attachment #8483542 - Flags: review?(nfroyd) → review+
I didn't add the webpage, but the bugnumber instead. (Since webpages can change places or get removed). https://hg.mozilla.org/integration/mozilla-inbound/rev/52f69f7160e2
Question on the pre-existing self-hosted Relaxed implementation: I was surprised to see that the ValueType was just 'T' and Barrier<Relaxed> methods were all noops. It seems like this would free up the compiler to eliminate stores entirely (or hoist them out of a loop etc). I know Relaxed is the "anything goes" model, but I think that is w.r.t ordering, not whether there is a store at all. I would've expected either ValueType to be 'volatile T' or for us to do something like ACCESS_ONCE() in Linux: http://lwn.net/Articles/508991/.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
On first impression hoisting a constant-store out of a loop sounds like it would meet the Relaxed requirements. This would be as if all the stores implicitly occurred at one instantaneous moment, indivisibly separated from each other. Relaxed stores don't synchronize with anything else, whether it happens inside the loop or outside it, so this should be totally permissible. Just as long as at least one store happens, which it abstractly seems to me at least one would. I'll let jcranmer poke in here with his thoughts on this, too.
Flags: needinfo?(Pidgeot18)
Hoisting a store out of a loop is only one case, there are a bunch more (see lwn article above).
The guarantees on std::atomic for memory_order_relaxed amount to: * Cache coherency * Global well-ordering of variable access * Atomicity of rmw operations and no load/store splitting As to what a compiler can do with regular load/stores that it can't do with relaxed loads, I think the only class that might implemented in a compiler is introducing speculative loads. As a user, you really shouldn't be using relaxed order except for RMW operations. In the only cases where we might see behavioral differences, we would be having a "benign" data race, which scares me in general and we shouldn't have those in our codebase at all. Various other polyfills also use non-volatile T from my recollection, and I think our original implementation was slightly cribbed from libstdc++ 4.5 :-)
Flags: needinfo?(Pidgeot18)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: