Atomic<bool, Relaxed> should mean no synchronization on windows x86

RESOLVED FIXED in mozilla35

Status

()

RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: h4writer, Assigned: h4writer)

Tracking

unspecified
mozilla35
x86
Windows 7
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

4 years ago
We use an atomic to check if there is an interrupt [0], but this is really important to be able to do that fast. When looking into assembly on linux, this generates a normal load. On windows this generates a "lock cmpxchg dword ptr [edx], ecx". This shouldn't be needed.

We might have to change to "volatile bool/volatile int32" for interrupt if this persist, since it is hurting us on benchmarks

[0] http://dxr.mozilla.org/mozilla-central/source/js/src/vm/Runtime.h#720
(Assignee)

Updated

4 years ago
Blocks: 1061646
(In reply to Hannes Verschore [:h4writer] from comment #0)
> We might have to change to "volatile bool/volatile int32" for interrupt if
> this persist, since it is hurting us on benchmarks

Please no, let's fix Atomics.h instead.
What version of MSVC are you compiling with? On MSVC 2010 and below [0], we don't get MOZ_HAVE_CXX11_ATOMICS and load looks like it shouldn't invoke any atomics [1]. On MSVC 2012 and up we use <atomic> directly [2]; I wouldn't expect that to be wrong, though.

[0] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#43
[1] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#758
[2] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#228
(Assignee)

Comment 3

4 years ago
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2)
> What version of MSVC are you compiling with? On MSVC 2010 and below [0], we
> don't get MOZ_HAVE_CXX11_ATOMICS and load looks like it shouldn't invoke any
> atomics [1]. On MSVC 2012 and up we use <atomic> directly [2]; I wouldn't
> expect that to be wrong, though.
> 
> [0] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#43
> [1] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#758
> [2] http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#228

I'm compiling on MSVC 2012 (32bit on 64bit host)
I wouldn't expect the original assembly to come from ::load, I'd expect it to come from ::exchange or ::compareExchange (probably the latter):

http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#620

Hannes, can you verify where the actual call that's generating that instruction is?
Flags: needinfo?(hv1989)
(Assignee)

Comment 5

4 years ago
It is here:
http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#230

Apparently this is a known issue solved in VC12 (MSVC 2013)
https://connect.microsoft.com/VisualStudio/feedback/details/770885/std-atomic-load-implementation-is-absurdly-slow

And we want to start using MSVC 2013 in this release \0/
Depends on: 914596
Flags: needinfo?(hv1989)
Whoa, so their <atomic> is wrong. Should we adjust the conditional on http://dxr.mozilla.org/mozilla-central/source/mfbt/Atomics.h#43 to |_MSC_VER >= 1800| (MSVC 2013 and up)?
(Assignee)

Comment 7

4 years ago
Created attachment 8483535 [details] [diff] [review]
Only use the MSVC atomic for 2013+

See previous comment. So 2012 has a slow implementation on atomic. It is fixed in 2013 (not confirmed, I'll try to get a 2013 environment to check). So for now use the backup implementation for msvc beneath version 2013.
Assignee: nobody → hv1989
Attachment #8483535 - Flags: review?(nfroyd)
(Assignee)

Comment 8

4 years ago
Created attachment 8483542 [details] [diff] [review]
Only use the MSVC atomic for 2013+

Comment by ehoogeveen ;)
Attachment #8483535 - Attachment is obsolete: true
Attachment #8483535 - Flags: review?(nfroyd)
Attachment #8483542 - Flags: review?(nfroyd)
Comment on attachment 8483542 [details] [diff] [review]
Only use the MSVC atomic for 2013+

Review of attachment 8483542 [details] [diff] [review]:
-----------------------------------------------------------------

::: mfbt/Atomics.h
@@ +42,5 @@
>  #  endif
> +/*
> + * Although Visual Studio 2012's CRT supports <atomic>, its atomic load
> + * implementation unnecessarily uses an atomic intrinsic for the less
> + * restrictive memory orderings, which can be prohibitively expensive.

Optional: webpage link might be nice.

@@ +43,5 @@
> +/*
> + * Although Visual Studio 2012's CRT supports <atomic>, its atomic load
> + * implementation unnecessarily uses an atomic intrinsic for the less
> + * restrictive memory orderings, which can be prohibitively expensive.
> + * Therefore, we require at least Visual Studio 2013 for using the CRT.

Thank you for the explanatory comment!
Attachment #8483542 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 10

4 years ago
I didn't add the webpage, but the bugnumber instead. (Since webpages can change places or get removed).
https://hg.mozilla.org/integration/mozilla-inbound/rev/52f69f7160e2

Comment 11

4 years ago
Question on the pre-existing self-hosted Relaxed implementation: I was surprised to see that the ValueType was just 'T' and Barrier<Relaxed> methods were all noops.  It seems like this would free up the compiler to eliminate stores entirely (or hoist them out of a loop etc).  I know Relaxed is the "anything goes" model, but I think that is w.r.t ordering, not whether there is a store at all.  I would've expected either ValueType to be 'volatile T' or for us to do something like ACCESS_ONCE() in Linux: http://lwn.net/Articles/508991/.
https://hg.mozilla.org/mozilla-central/rev/52f69f7160e2
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
On first impression hoisting a constant-store out of a loop sounds like it would meet the Relaxed requirements.  This would be as if all the stores implicitly occurred at one instantaneous moment, indivisibly separated from each other.  Relaxed stores don't synchronize with anything else, whether it happens inside the loop or outside it, so this should be totally permissible.  Just as long as at least one store happens, which it abstractly seems to me at least one would.  I'll let jcranmer poke in here with his thoughts on this, too.
Flags: needinfo?(Pidgeot18)

Comment 14

4 years ago
Hoisting a store out of a loop is only one case, there are a bunch more (see lwn article above).
The guarantees on std::atomic for memory_order_relaxed amount to:
* Cache coherency
* Global well-ordering of variable access
* Atomicity of rmw operations and no load/store splitting

As to what a compiler can do with regular load/stores that it can't do with relaxed loads, I think the only class that might implemented in a compiler is introducing speculative loads.

As a user, you really shouldn't be using relaxed order except for RMW operations. In the only cases where we might see behavioral differences, we would be having a "benign" data race, which scares me in general and we shouldn't have those in our codebase at all.

Various other polyfills also use non-volatile T from my recollection, and I think our original implementation was slightly cribbed from libstdc++ 4.5 :-)
Flags: needinfo?(Pidgeot18)
You need to log in before you can comment on or make changes to this bug.