Open
Bug 928549
Opened 11 years ago
Updated 2 years ago
NS_SUCCEEDED/NS_FAILED could get better codegen with a sign check
Categories
(Core :: XPCOM, defect)
Tracking
()
NEW
People
(Reporter: away, Unassigned)
Details
Attachments
(1 file)
1.33 KB,
patch
|
Details | Diff | Splinter Review |
NS_SUCCEEDED/NS_FAILED are pretty common, and (at least on Windows) I've been seeing a ton of code that goes like this: e8a5feffff call FunctionThatReturnsNSResult 8bc8 mov ecx,eax c1e91f shr ecx,1Fh f6c101 test cl,1 7515 jne ErrorHandler Really that code should just check the sign bit: e8a5feffff call FunctionThatReturnsNSResult 85c0 test eax,eax 7815 js ErrorHandler The reduced instruction count is probably not important on modern processors. But saving six bytes is nice, and freeing up a register is even nicer. Changing to a sign-check shrinks opt xul.dll by 147KB. However, that gets completely erased by PGO, and in fact under PGO the binary actually grows by 24KB. Maybe the optimizer is using the extra register freedom to do smart things? I haven't taken the time to do perf measurements; I'd like to collect feedback first. Is there a good reason for keeping the existing bit mask? It has been unchanged for 15 years, and surely somebody must have proposed this change before.
Comment 1•11 years ago
|
||
Isn't the cast to int32_t technically undefined behavior?
Comment 2•11 years ago
|
||
(In reply to On vacation Oct 12 - Oct 27 from comment #1) > Isn't the cast to int32_t technically undefined behavior? It is technically implementation-defined ([conv.integral]p3); I'm pretty sure we rely on the expected behavior in a lot of places (even if we shouldn't). FWIW, GCC (and I assume clang) are able to optimize the existing code to the sign bit test. Maybe the PGO optimizer is just smarter (why?!). Does anything interesting happen if you change the function to: inline bool NS_FAILED_impl(nsresult _nsresult) { return (static_cast<uint32_t>(_nsresult) & 0x80000000) != 0; } ? (I assume you also checked the new xul.dll to see that the size reduction was because MSVC was now doing the sign-bit tests?)
(In reply to Nathan Froyd (:froydnj) from comment #2) > FWIW, GCC (and I assume clang) are able to optimize the existing code to the > sign bit test. No surprise there :-) > Does anything interesting happen if you change the function to: > > inline bool NS_FAILED_impl(nsresult _nsresult) { > return (static_cast<uint32_t>(_nsresult) & 0x80000000) != 0; > } I just tried and that doesn't seem to help. > ? (I assume you also checked the new xul.dll to see that the size reduction > was because MSVC was now doing the sign-bit tests?) I spot-checked a few places where I had noticed right-shifts before, and those shifts were gone.
Comment 4•11 years ago
|
||
It would actually be nice if someone takes a look at what clang does to this code, as the clang optimizer is vastly different than the gcc optimizer.
Comment 5•11 years ago
|
||
My opinions are: * we should probably not rely on undefined behavior unless there are big wins to be had * Otherwise we should do the thing that gets us the best-optimized code in the release build for each platform Given that, the only really interesting datapoint here is which is better in PGO builds in terms of speed and codesize (and I suspect codesize is the better metric here).
Comment 6•11 years ago
|
||
(In reply to comment #5) > My opinions are: > > * we should probably not rely on undefined behavior unless there are big wins > to be had > * Otherwise we should do the thing that gets us the best-optimized code in the > release build for each platform > > Given that, the only really interesting datapoint here is which is better in > PGO builds in terms of speed and codesize (and I suspect codesize is the better > metric here). Note that we don't PGO on Mac, which is why I suggested investigation with clang.
Comment 7•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #4) > It would actually be nice if someone takes a look at what clang does to this > code, as the clang optimizer is vastly different than the gcc optimizer. clang tests the sign bit as expected.
For my own notes: I did verify that current shipped PGO builds do use right-shifts. (In case there are concerns about red herrings in local developer builds)
I can't measure any performance improvement from this. It's well within noise on both PGO and non-PGO. And given that PGO doesn't even get a size improvement, I don't have much reason to spend more time on this. Oh well, at least I got an answer to my question of "why hasn't anyone done this before"
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•