Closed Bug 1004519 Opened 6 years ago Closed 5 years ago

consider turning on #pragma intrinsic for mem*/str* on MSVC, or just -Oi

Categories

(Firefox Build System :: General, defect)

All
Windows 7
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla32

People

(Reporter: froydnj, Assigned: froydnj)

Details

Attachments

(1 file)

A fair number of people expect memcpy/memset with constant sizes to be inlined.  For instance:

http://hg.mozilla.org/mozilla-central/file/35f9431188ca/mfbt/PodOperations.h#l35

is just stupid if memset isn't inlined.

Our default settings with MSVC cause this expected inlining not to happen; light testing with MSVC indicates that one needs either -Oi or -O2 to trigger inline code in these cases.  I know that I have written patches with some obtuse code on the assumption that MSVC did *not* inline these; I'm pretty sure there are a reasonable # of other instances where people do expect the compiler to perform the expected inlining and MSVC currently doesn't.

(PodOperations.h originally comes from the JS engine, and the JS engine compiles its sources with -O2.  That's likely the reason the above is written how it is, because the author expects the memset to be inlined everywhere.  It wouldn't be in non-JS engine code, which strikes me as a performance footgun waiting to happen.)

We should see what happens if we turn on -Oi, both in code size and performance, and/or investigate putting the appropriate #pragma intrinsic lines in mozilla-config.h.  It looks like MSVC attempts to use the string move instructions at -O1 -Oi; I'm not sure that's necessarily the best decision, but it's what we're getting.)

CC'ing Eric because I was discussing this with him yesterday; CC'ing and NI'ing Benjamin because he might know whether we had this on at one point and turned it off for size/space/performance reasons.
Flags: needinfo?(benjamin)
I don't remember anyone experimenting with -Oi... I didn't know it existed.

We did experiment with -O1/-O2 and -O1 produces better results in general for us.
Flags: needinfo?(benjamin)
According to the MSDN page [1], -O1 and -O2 are equivalent to using the following flags:

-O1: -Og         -Os -Oy -Ob2 -Gs -GF -Gy
-O2: -Og -Oi -Ot     -Oy -Ob2 -Gs -GF -Gy

So the difference is that -O1 implies -Os, whereas -O2 implies -Ot and -Oi. -Ot and -Os are mutually exclusive, favoring speed or code size respectively, but -Oi can be specified separately.

[1] http://msdn.microsoft.com/en-us/library/8f8h5cxt%28v=vs.100%29.aspx
I was curious, so I kicked off some try runs.  Ideally, I will have enabled just PGO on this one:

https://tbpl.mozilla.org/?tree=Try&rev=1b1e1070372b

and this one will flip -Oi on with PGO:

https://tbpl.mozilla.org/?tree=Try&rev=aa957bb8dd0e

We save a small amount of space:

@nightcrawler:~$ unzip -l ff-win-pgo-intrinsics.zip |grep xul
 24745200  2014-05-07 16:46   firefox/xul.dll
@nightcrawler:~$ unzip -l ff-win-pgo-baseline.zip |grep xul
 24782576  2014-05-07 16:36   firefox/xul.dll

~35K or so less space for using intrinsics, not too bad.

The Talos results appear somewhat mixed:

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=1b1e1070372b&newRev=aa957bb8dd0e&submit=true

It looks like we win more than we lose, and we gain about ~0.5-1% on most tests.  Given the variability of the numbers, I'm not sure if the losses (or the gains, for that matter) are real or not.  I'm in the process of retriggering all the tests multiple times; at least the tp suite gives some not-quite-steady numbers, and compare-talos appears to take multiple runs into account.  Maybe we'll have a little more information when that completes.
Some of the sub-tests seem to have a mind of their own, particularly under Dromaeo. If you get suspicious numbers, maybe try another pair of runs against a different baseline. I've seen consistent large gains become consistent large losses and vice versa.
(In reply to David Major [:dmajor] (UTC+12) from comment #4)
> Some of the sub-tests seem to have a mind of their own, particularly under
> Dromaeo. If you get suspicious numbers, maybe try another pair of runs
> against a different baseline. I've seen consistent large gains become
> consistent large losses and vice versa.

Retriggering seems to have helped the situation...I think.  At least there are no longer nearly as many +/- blobs.  Kraken seems to have regressed, but I think that's because many of the kraken builds died and the baseline is not a good one.  tp5n_main_normal_fileio_paint regressed 5%, but I'm not sure we really care about that.  Likewise, tp5n_main_normal_netio_paint shows a 23% improvement, which is probably bogus.

tp5o_shutdown_paint is still showing large regressions on windows xp and window 8, but I think they're not being flagged red/green because the variation is so large.

I think this is probably worth landing.
We get a small size win from doing this, we align our Windows builds with
developer expectations around mem*, and the patch appears to be fairly
performance-neutral.  What say you?
Attachment #8420337 - Flags: review?(mh+mozilla)
Comment on attachment 8420337 [details] [diff] [review]
enable intrinsic replacement on windows builds

Review of attachment 8420337 [details] [diff] [review]:
-----------------------------------------------------------------

http://msdn.microsoft.com/en-us/library/f99tchzc%28v=vs.100%29.aspx

"The intrinsic floating-point functions do not perform any special checks on input values and so work in restricted ranges of input, and have different exception handling and boundary conditions than the library routines with the same name. Using the true intrinsic forms implies loss of IEEE exception handling, and loss of _matherr and errno functionality; the latter implies loss of ANSI conformance. However, the intrinsic forms can considerably speed up floating-point-intensive programs, and for many programs, the conformance issues are of little practical value."

That could be a problem, but i don't know if we rely on those.

OTOH, http://msdn.microsoft.com/en-us/library/e7s85ffb%28v=vs.100%29.aspx says /fp:precise is the default and /fp:precise disables intrinsics.

Based on the documentation alone, it's not really clear what using -Oi alone does to the floating point functions that have intrinsics. Could you check? I'm almost tempted to say we should have a configure test to double check the compiler is not screwing with floating point intrinsics.
Attachment #8420337 - Flags: review?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #7)
> Based on the documentation alone, it's not really clear what using -Oi alone
> does to the floating point functions that have intrinsics. Could you check?
> I'm almost tempted to say we should have a configure test to double check
> the compiler is not screwing with floating point intrinsics.

The floating-point math functions that have intrinsics are |abs| and |fabs|.  -Oi just uses fabs for this, which I believe is all the standard C library uses.

The integer |abs| and |labs| instrinsics return the same values (including for *_MIN) as their library counterparts.

I guess we could have problems if -Oi ever added other functions.  But that seems somewhat unlikely (?).
(In reply to Nathan Froyd (:froydnj) from comment #8)
> (In reply to Mike Hommey [:glandium] from comment #7)
> > Based on the documentation alone, it's not really clear what using -Oi alone
> > does to the floating point functions that have intrinsics. Could you check?
> > I'm almost tempted to say we should have a configure test to double check
> > the compiler is not screwing with floating point intrinsics.
> 
> The floating-point math functions that have intrinsics are |abs| and |fabs|.
> -Oi just uses fabs for this, which I believe is all the standard C library
> uses.

There's also atan, exp, log10, sqrt, atan2, log, sin, tan, cos.
cf. http://msdn.microsoft.com/en-us/library/tzkfha43%28v=vs.100%29.aspx
(In reply to Mike Hommey [:glandium] from comment #9)
> (In reply to Nathan Froyd (:froydnj) from comment #8)
> > (In reply to Mike Hommey [:glandium] from comment #7)
> > > Based on the documentation alone, it's not really clear what using -Oi alone
> > > does to the floating point functions that have intrinsics. Could you check?
> > > I'm almost tempted to say we should have a configure test to double check
> > > the compiler is not screwing with floating point intrinsics.
> > 
> > The floating-point math functions that have intrinsics are |abs| and |fabs|.
> > -Oi just uses fabs for this, which I believe is all the standard C library
> > uses.
> 
> There's also atan, exp, log10, sqrt, atan2, log, sin, tan, cos.
> cf. http://msdn.microsoft.com/en-us/library/tzkfha43%28v=vs.100%29.aspx

Missed those the first time around.

I looked at the codegen for acos/asin/cosh/fmod/pow/sinh/tanh.  The documentation is correct here, the only differences are whether you're passing args on the stack or args in FP registers.

The intrinsics that are supported directly as x87 instructions are only generated when -fp:fast, which we don't do.  For -Oi and -fp:strict, these x87-implementable intrinsics pass their arguments in FP registers, just like the other group.

So I think we're OK here as far as FP stuff goes.
Comment on attachment 8420337 [details] [diff] [review]
enable intrinsic replacement on windows builds

Given comment 10, re-nominating for review.
Attachment #8420337 - Flags: review?(mh+mozilla)
Comment on attachment 8420337 [details] [diff] [review]
enable intrinsic replacement on windows builds

Review of attachment 8420337 [details] [diff] [review]:
-----------------------------------------------------------------

rs=me
Attachment #8420337 - Flags: review?(mh+mozilla) → review+
Assignee: nobody → nfroyd
https://hg.mozilla.org/mozilla-central/rev/b798c69993ed
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.