Closed Bug 859257 Opened 12 years ago Closed 12 years ago

Clang miscompiles MConstant::computeRange / ToInt64, hurts performance

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(3 files)

Below is a testcase reduced from a Marsaglia RNG written in JS. I get these numbers: Clang 3.2 debug: 144 ms Clang 3.2 opt : 2874 ms So the opt build is a lot slower. Nightly builds seem to have the same problem. It looks like Clang miscompiles this line in MConstant::computeRange(): int64_t integral = ToInt64(d); Integral ends up being 0 instead of 36969 and this messes up range analysis. I think in ToIntWidth, we end up reading du.s.lo here instead of du.s.hi: di_h = du.s.hi; I tried to move the ToInt64 call into a JS_NEVER_INLINE function, but that didn't help. function marsaglia(m_z) { for (var i = 0; i < 100000000; ++i) { m_z = (36969 * m_z)|0; } return m_z; } var t = new Date; var res = marsaglia(2); print(new Date - t); print(res);
Summary: Clang miscompiles MConstant::computeRange / Int64, hurts performance → Clang miscompiles MConstant::computeRange / ToInt64, hurts performance
Attached patch PatchSplinter Review
Marking DoublePun values as |volatile| seems to avoid the problem..
Assignee: general → jdemooij
Status: NEW → ASSIGNED
Attachment #734552 - Flags: review?(jwalden+bmo)
Nathan, do we need to upstream this to clang, or are we violating aliasing rules on our end somehow? I was told to ping you now that respindola is no longer around....
Flags: needinfo?(nfroyd)
(In reply to Boris Zbarsky (:bz) from comment #2) > Nathan, do we need to upstream this to clang, or are we violating aliasing > rules on our end somehow? I was told to ping you now that respindola is no > longer around.... I don't see any aliasing violations here; there's no pointer cast games, which usually is the source of aliasing woes. Jan, what platform are you seeing the miscompilation on?
Flags: needinfo?(nfroyd) → needinfo?(jdemooij)
Also, Jan, if you could provide preprocessed source for RangeAnalysis.cpp, that would be helpful. Something like: make -C js/src RangeAnalysis.i should do the trick.
(In reply to Nathan Froyd (:froydnj) from comment #3) > Jan, what platform are you seeing the miscompilation on? OS X, 32-bit build, Clang 3.2: $ clang++ --version clang version 3.2 (tags/RELEASE_32/final) Target: x86_64-apple-darwin11.4.2 Thread model: posix Nightly builds seem to have the same problem though.
Flags: needinfo?(jdemooij)
Attached file RangeAnalysis.i.bz2
(In reply to Nathan Froyd (:froydnj) from comment #4) > Also, Jan, if you could provide preprocessed source for RangeAnalysis.cpp, > that would be helpful. Attaching RangeAnalysis.i.bz2
Thanks Jan. Can you tell me what the double that gets misconverted is?
Flags: needinfo?(jdemooij)
(In reply to Nathan Froyd (:froydnj) from comment #7) > Thanks Jan. Can you tell me what the double that gets misconverted is? It's 36969.0 (int32 stored as double), note that this value is from the JS testcase in comment 0.
Flags: needinfo?(jdemooij)
Comment on attachment 734552 [details] [diff] [review] Patch Review of attachment 734552 [details] [diff] [review]: ----------------------------------------------------------------- Doesn't make me happy, but if it works, maybe it's enough for now. I'd like to see this random nuke-from-orbit volatility gone really quick, tho -- I'd be surprised if this doesn't hurt perf on To* operations in general, even if it makes them correct.
Attachment #734552 - Flags: review?(jwalden+bmo) → review+
I've been debugging this and I don't see anything clang is doing that is obviously wrong...but I did notice this in the preprocessed output: union DoublePun { struct { uint32_t hi, lo; } s; uint64_t u64; double d; }; which comes from:union DoublePun { struct { uint32_t lo, hi; uint32_t hi, lo; } s; uint64_t u64; double d; }; ...and that looks like we're not picking up the right definitions from jscpucfg.h. The breakage here is strictly limited to RangeAnalysis.cpp, since it includes vm/NumericConversions.h without including other js headers prior to it. On the plus (minus?) side, this means that it wasn't just a clang issue: all platforms should benefit from this fix. Testing with the fix on OS X shows the opt config to be slightly faster than debug: bld-lion-r5-054:~ cltbld$ ./build-js-debug/js test.js 176 869830658 bld-lion-r5-054:~ cltbld$ ./build-js-opt/js test.js 173 869830658 If you'd rather have a separate backout of Jan's patch followed by the There are other issues here (FPU_IS_ARM_FPA seems to be undefined by *anything*, including configury; we should be getting DoublePun definitions from One True Source, etc. etc.), but those can be handled in separate bugs. I'll file a followup for ditching FPU_IS_ARM_FPA; there's already open bugs on enhancing FloatingPoint.h and consolidating the DoublePun definitions.
Attachment #736918 - Flags: review?(jwalden+bmo)
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
(In reply to Nathan Froyd (:froydnj) from comment #11) > ...and that looks like we're not picking up the right definitions from > jscpucfg.h. Very nice find, thanks a lot. I do wonder though why adding "volatile" avoids the problem, any idea?
(In reply to Jan de Mooij [:jandem] from comment #13) > (In reply to Nathan Froyd (:froydnj) from comment #11) > > ...and that looks like we're not picking up the right definitions from > > jscpucfg.h. > > Very nice find, thanks a lot. I do wonder though why adding "volatile" > avoids the problem, any idea? That's a good question. I'm not really sure. "volatile" shouldn't be changing anything, since we have the wrong structure definition in the first place! My guess is that swapping the structure members around somehow enabled the compiler to do some aggressive-but-not-valid transformations. And that the results of those optimizations made RangeAnalysis exhibit particularly bad behavior. "volatile" probably prevented propagation of the facts the compiler needed to do those transformations. Which means that RangeAnalysis exhibited normal (if wrong?) behavior once again. It'd be interesting to see where the opt build--before any of the fixes from this bug--was spending its time running that testcase. I think it's a good bet something in RangeAnalysis was going haywire.
Comment on attachment 736918 [details] [diff] [review] include jscpucfg.h in NumericConversions.h so DoublePun definitions are correct Review of attachment 736918 [details] [diff] [review]: ----------------------------------------------------------------- Sigh. Good enough for now.
Attachment #736918 - Flags: review?(jwalden+bmo) → review+
Jan pointed out this morning that the fix I committed was quite different from the patch Waldo reviewed: my commit simply added the #include without removing the conditionals for __clang__. I blame my poor rebasing skills. I've fixed that up by backing out Jan's original commit: https://hg.mozilla.org/integration/mozilla-inbound/rev/f10f33fc2e93
Depends on: 882486
Depends on: 886246
Depends on: 888568
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: