Closed
Bug 334826
Opened 19 years ago
Closed 17 years ago
Replace some atomic functions with Win32 intrinsics
Categories
(NSPR :: NSPR, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
4.7
People
(Reporter: swsnyder, Assigned: wtc)
References
Details
(Keywords: perf)
Attachments
(3 files, 3 obsolete files)
|
2.70 KB,
patch
|
timeless
:
review-
|
Details | Diff | Splinter Review |
|
6.18 KB,
patch
|
nelson
:
superreview+
|
Details | Diff | Splinter Review |
|
2.40 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060413 Fedora/1.5.0.2-1fc4.sws Firefox/1.5.0.2
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060413 Fedora/1.5.0.2-1fc4.sws Firefox/1.5.0.2
The PR_AtomicIncrement() and PR_AtomicDecrement() routines are called very often. Bringing up Seamonkey v1.0.1 to a blank page involves a combined ~232,000 calls to these routines. The cost of these calls on Windows platforms can be greatly reduced by the use of inline intrinsic functions.
Reproducible: Always
| Reporter | ||
Comment 1•19 years ago
|
||
PR_AtomicIncrement() and PR_AtomicDecrement() act as wrappers, each encapulating a single call to a platform-specific inline ASM routine. But the PR_Atomic* calls themselves are not inlined. On an X86 platform the overhead (push param onto stack, call function, move param from stack to register, return from function, clean up stack) is greater than the few instructions of actual functionality (the platform-specific ASM code). Moveover the compiler cannot make optimum register allocations because the stack must be used for passing the parameter to the called routine.
I am submitting a patch which conditionally replaces PR_AtomicIncrement() and PR_AtomicDecrement() with their MSC intrinsic equivelents, _InterlockedIncrement() and _InterlockedDecrement().
Here is an example of the original code generation, calling PR_AtomicDecrement(). The code was generated by the MSVC 7.1 compiler from unmodified Seamonkey v1.0.1 source:
nsDirectoryService::Release:
007FE823 push esi
007FE824 mov esi,dword ptr [ebp+8]
007FE827 push edi
007FE828 lea edi,[esi+0Ch]
007FE82B push edi
007FE82C call dword ptr [__imp__PR_AtomicDecrement (8352B4h)]
PR_AtomicDecrement:
30009D53 mov ecx,dword ptr [val]
30009D56 mov eax,0FFFFFFFFh
30009D5B lock xadd dword ptr [ecx],eax
30009D5F dec eax
30009D60 ret
007FE832 add esp,4
007FE835 test eax,eax
007FE837 jne nsDirectoryService::Release+4Eh (7FE86Eh)
Here's an example, after patching, of the new code generated:
nsStorageInputStream::Release:
008058D3 push esi
008058D4 mov esi,dword ptr [this]
008058D7 lea ecx,[esi+8]
008058DA or eax,0FFFFFFFFh
008058DD lock xadd dword ptr [ecx],eax
008058E1 dec eax
008058E2 jne nsStorageInputStream::Release+46h (805916h)
The relevant code starts at 0x007FE828 in the first example and 0x008058D7 in the second. The number of instructions is reduced from 10 to 4, and the number of memory references is reduced from 5 to 1.
Note that the return values of the intrinsic functions different from the PR_Atomic* functions when run on Win95 and WinNT v3.51. I'm told that the minimum version of Windows supported for trunk builds is Win2K, so that shouldn't prevent the acceptance of this patch.
| Reporter | ||
Comment 2•19 years ago
|
||
This patch generally improves performance and reduces code size on x86 Win32 platform by replacing the PR_AtomicIncrement() and PR_AtomicDecrement() with their intrinsic equivalents.
| Reporter | ||
Comment 3•19 years ago
|
||
Edited summary for Win32-specific nature of the patch.
Summary: Replace some atomic funtions with intrinsics → Replace some atomic functions with Win32 intrinsics
Comment 4•19 years ago
|
||
Wouldn't it be simpler to ask the compiler to inline PR_AtomicIncrement() and PR_AtomicDecrement()?
Updated•19 years ago
|
Component: General → XPCOM
Keywords: perf
Product: Firefox → Core
QA Contact: general → xpcom
Version: unspecified → Trunk
Comment 5•19 years ago
|
||
Darin, bsmedberg, what do you think of Steve's patch?
Updated•19 years ago
|
Attachment #219147 -
Flags: review?(wtchang)
Comment 6•19 years ago
|
||
-> nspr
Assignee: nobody → wtchang
Status: UNCONFIRMED → NEW
Component: XPCOM → NSPR
Ever confirmed: true
Product: Core → NSPR
QA Contact: xpcom → wtchang
Target Milestone: --- → 4.7
Version: Trunk → 4.7
| Reporter | ||
Comment 7•19 years ago
|
||
Wouldn't it be simpler to ask the compiler to inline PR_AtomicIncrement() and
>PR_AtomicDecrement()?
If that were done we would benefit from the lack of stack handling. But we would still be using the existing ASM code, along with the hard-coding of registers that that implies, with the compiler having to work around those decisions.
With the intrinsics the compiler can select the registers and instructions available for code gen on the fly, based on the surrounding code.
Simply inlining the existing code would be a half-way point between the existing scheme and my use of intrinsics. On the other hard, it would be more portable, as my patch addresses only the x86 Win32 platform.
| Reporter | ||
Comment 8•19 years ago
|
||
I've discovered a problem resulting from my original patch. It seems that nspr4.dll is giving out the wrong value to an extension that is querying the version number.
With the patched code, the browser itself works great and the About page shows the correct version numbers (Gecko & browser). However, the Roboform extension for Seamonkey complains at runtime that only versions 1.0.0 through 1.0.1 are supported and I am running version "rv:1.8.0.2". Hmmm... Replacing nspr4.dll with one built from unpatched code satisfies the Roboform version query.
Perhaps the lack of references to the PR_AtomicIncrement() and PR_AtomicDecrement() routines means that something is not being pulled into the link of nspr4.dll?
More later.
Comment 9•19 years ago
|
||
your removal of the functions from the .c file means that everything that was linked with the old nspr will fail with the new one...
| Reporter | ||
Comment 10•19 years ago
|
||
Updated patch retains the PR_AtomicIncrement() and PR_AtomicDecrement() functions for compatibility with previous builds. These are not called by any Mozilla code; they just exist unused in the NSPR library to satisfy the expectations of 3rd-party code.
Comment 11•19 years ago
|
||
fwiw, NSPR is used by more than just Mozilla code.
Comment 12•19 years ago
|
||
Comment on attachment 219147 [details] [diff] [review]
Replace some atomic ops with Win32 intrinsics
breaking nspr compatibility is obviously not ok :)
Attachment #219147 -
Flags: review?(wtchang) → review-
| Reporter | ||
Comment 13•19 years ago
|
||
Comment #12: Jeez, you break a fundamental component and you never hear the end of it! :-)
I've been using the revised patch on 1.8.0 builds for the previous 5 months (run on Win2K and WinXP only, of course) and have experienced no problems since that initial "setback." My SeaMonkey builds have all been completely stable and I've had no further complaints from 3rd-party extensions.
It's not an improvement that users would notice, but it does provide a noteable general reduction in code size and clock cycles on Win32 builds. I'll take my performance gains wherever I can get 'em.
Comment 14•19 years ago
|
||
for reference, i meant to post a counter patch. but this got lost. i'll try to do that now.
Comment 15•19 years ago
|
||
instead of trying to hack the code, this offers a macro which either does the optimized thing or not.
all code built against the newer nspr could use the new macro.
a treewide s/r would upgrade a tree.
please test.
Comment 16•19 years ago
|
||
+#if defined(_WIN32) && defined(_M_IX86) && (_MSC_VER >= 1310) && defined(_MT)
+#define NSPR_USE_INTRINSIC_LOCKS
+#else
+#undef NSPR_USE_INTRINSIC_LOCKS
+
+#ifdef NSPR_USE_INTRINSIC_LOCKS
that's missing an #endif, right?
Updated•19 years ago
|
QA Contact: wtchang → nspr
Comment 17•18 years ago
|
||
Comment on attachment 219500 [details] [diff] [review]
Retains PR_Atomic* functions for compatibility
Wan-Teh, what do you think of this?
I'm inclined to give this r+, except that I think the symbol _USE_INTRIN_ATOMICS_ should be NSPR_USE_INTRIN_ATOMICS to avoid name space pollution.
I checked that nsprstub.c does (indirectly) include pratom.h.
Attachment #219500 -
Flags: review?(wtc)
| Assignee | ||
Comment 18•18 years ago
|
||
Comment on attachment 239519 [details] [diff] [review]
does this work
I was going to suggest this, and was pleasantly surprised that
it's exactly what timeless suggested.
One reason PR_AtomicXXX are functions is that we can fix bugs
or enhance the implementation. Macros will be expanded into the
caller's code, and we can't change them without recompiling
the caller's code.
We should use the compiler intrinsics. To prevent breaking
binary compatibility by accident, it's safer to add new macros
or functions and not touch the existing PR_AtomicXXX functions.
Comment 19•18 years ago
|
||
Would love to get this into NSPR in time for Gecko 1.9/FF3...
Flags: blocking1.9+
Comment 20•18 years ago
|
||
(In reply to comment #8)
> I've discovered a problem resulting from my original patch. It seems that
> nspr4.dll is giving out the wrong value to an extension that is querying the
> version number.
>
> With the patched code, the browser itself works great and the About page shows
> the correct version numbers (Gecko & browser). However, the Roboform extension
> for Seamonkey complains at runtime that only versions 1.0.0 through 1.0.1 are
> supported and I am running version "rv:1.8.0.2". Hmmm... Replacing nspr4.dll
> with one built from unpatched code satisfies the Roboform version query.
>
> Perhaps the lack of references to the PR_AtomicIncrement() and
> PR_AtomicDecrement() routines means that something is not being pulled into the
> link of nspr4.dll?
>
> More later.
I did a bunch of these intrinsics things quite some time ago and a user ran into the same problem with Roboform. I don't recall if I took the change out or if I just left it there and the user stopped using my builds.
I should go over my intrinsics patches at some point in case I have a few that haven't already been found and merged in.
| Reporter | ||
Comment 21•18 years ago
|
||
(In reply to comment #20)
[snip]
The problem was that my original patch completely removed the original functions in favor of inline intrinsics. The problem was resolved in my 2nd patch by leaving the original functions (for binary compatibility) and using the inline intrinsics for new builds.
Though I saw the problem with Roboform, it was correctly observed that my original patch broke the published API and was therefore unacceptable.
Comment 22•17 years ago
|
||
Attachment #239519 -
Attachment is obsolete: true
Attachment #298515 -
Flags: review?(wtc)
| Assignee | ||
Comment 23•17 years ago
|
||
Comment on attachment 298515 [details] [diff] [review]
matched if/endif
timeless: thanks for the patch.
Let's name the macro _PR_HAVE_INTRINSIC_ATOMIC_OPS. (There is
an existing internal macro called _PR_HAVE_ATOMIC_OPS.)
Please add an underscore (_) between ATOMIC and INCREMENT/DECREMENT.
Are there compiler intrinsics for _InterlockedAdd and _InterlockedExchange?
It is a little strange to only have intrinsics macros for two of the four
PR_AtomicXXX functions.
Comment 24•17 years ago
|
||
http://msdn2.microsoft.com/en-us/library/w5405h95.aspx
Alphabetical Listing of Intrinsic Functions
http://msdn2.microsoft.com/en-us/library/51s265a6.aspx
_InterlockedAdd Intrinsic Functions
http://msdn2.microsoft.com/en-us/library/1s26w950.aspx
_InterlockedExchange Intrinsic Functions
| Reporter | ||
Comment 25•17 years ago
|
||
(In reply to comment #23)
> Are there compiler intrinsics for _InterlockedAdd and _InterlockedExchange?
> It is a little strange to only have intrinsics macros for two of the four
> PR_AtomicXXX functions.
There's an argument to be made for using these 2 intrinsics for completeness/consistency with Inc & Dec, but there is really no real-world performance gain.
While the intrinsics are more flexible than the existing ASM code in terms of code generation, those 2 functions are called so rarely (at least by Firefox/SeaMonkey) that there is really no practical reason to prefer them over the existing code.
| Reporter | ||
Comment 26•17 years ago
|
||
Re comment #23 and naming conventions: is there any advantage to noting the compiler that provides the intrinsic functions? These functions are compiler-specific.
I'm thinking here of potential confusion of the analogous GCC intrinsics
+# define _MD_ATOMIC_INCREMENT(val) __sync_add_and_fetch(val, 1)
+# define _MD_ATOMIC_DECREMENT(val) __sync_sub_and_fetch(val, 1)
+# define _MD_ATOMIC_ADD(ptr,val) __sync_add_and_fetch(ptr, val)
+# define _MD_ATOMIC_SET(ptr,newval) __sync_lock_test_and_set(ptr, newval)
(or similar) as replacements for the functions in os_Linux_x86.s.
| Assignee | ||
Comment 27•17 years ago
|
||
I agree we should skip the intermediate HAVE_XXX macro and just do this:
+#if defined(_WIN32) && (_MSC_VER >= 1310)
+long __cdecl _InterlockedIncrement(PRInt32 volatile *Addend);
+#pragma intrinsic(_InterlockedIncrement)
+
+long __cdecl _InterlockedDecrement(PRInt32 volatile *Addend);
+#pragma intrinsic(_InterlockedDecrement)
+
+#define PR_ATOMIC_INCREMENT(val) _InterlockedIncrement(val)
+#define PR_ATOMIC_DECREMENT(val) _InterlockedDecrement(val)
+#else
+#define PR_ATOMIC_INCREMENT(val) PR_AtomicIncrement(val)
+#define PR_ATOMIC_DECREMENT(val) PR_AtomicDecrement(val)
+#endif
These intrinsics are available on all processor architectures and
in single-threaded programs, so we don't need to test the _M_IX86
and _MT macros. The return values of the intrinsics don't match
NSPR's specification on Windows 95 and NT 3.51, so we will be
dropping support for these two old versions of Windows.
Comment 28•17 years ago
|
||
(In reply to comment #25)
> While the intrinsics are more flexible than the existing ASM code in terms of
> code generation, those 2 functions are called so rarely (at least by
> Firefox/SeaMonkey) that there is really no practical reason to prefer them over
> the existing code.
The advantages to using Intrinsics over MSVC Inline Assembler are:
- Intrinsics are generally portable and typically easier to understand
- MSVC for x64 doesn't support MSVC inline assembler so using intrinsics simplifies the code
- I don't know if the intrinsics are the same between GCC and MSVC but having only one code set would be nice for multiple platforms.
| Assignee | ||
Comment 29•17 years ago
|
||
Add PR_ATOMIC_INCREMENT, PR_ATOMIC_DECREMENT, PR_ATOMIC_SET,
and PR_ATOMIC_ADD. Implement these macros as compiler intrinsics
for the latest versions of MSVC and GCC. Implement them as
the PR_AtomicXXX functions by default.
I've checked in this patch on the NSPR trunk (NSPR 4.7) for
testing.
Checking in pr/include/pratom.h;
/cvsroot/mozilla/nsprpub/pr/include/pratom.h,v <-- pratom.h
new revision: 3.9; previous revision: 3.8
done
Checking in pr/tests/atomic.c;
/cvsroot/mozilla/nsprpub/pr/tests/atomic.c,v <-- atomic.c
new revision: 3.9; previous revision: 3.8
done
Attachment #298515 -
Attachment is obsolete: true
Attachment #299498 -
Flags: superreview?(nelson)
Attachment #299498 -
Flags: review?(swsnyder)
Attachment #298515 -
Flags: review?(wtc)
Comment 30•17 years ago
|
||
Comment on attachment 299498 [details] [diff] [review]
Proposed patch (checked in)
The patch to atomic.c contains a pattern of 6 lines of code
that is repeated many times. I would have preferred that
those lines appear in a macro with several arguments, and
that each occurrence be a single macro invocation. Also,
I would have written the statements of this form:
result = result | ((EXPRESSION) ? 0 : 1);
as the equivalent and simpler statement:
result |= !(EXPRESSION);
But this patch will do for test code. r=nelson
Attachment #299498 -
Flags: superreview?(nelson) → superreview+
| Assignee | ||
Comment 32•17 years ago
|
||
Nelson, I agree with both of your suggestions. I don't have time
to implement them now, but I added a TODO comment.
I also think of an issue -- the PR_AtomicXXX functions may be
implemented with a lock even when the platform has atomic
instructions because we don't have the time or expertise to
write the assembly code. So if we define PR_ATOMIC_XXX with
compiler intrinsics, the macros and the functions can't be
used on the same variable because they use incompatible methods
to achieve atomicity. But people will expect that the macros
and the functions can be used interchangeably. So I documented
this issue in pratom.h and restrict the use of GCC atomic
intrinsics to the platforms where PR_AtomicXXX is truly atomic.
You don't need to review this patch carefully. This is just FYI.
I've checked in this patch on the NSPR trunk (NSPR 4.7).
Checking in pr/include/pratom.h;
/cvsroot/mozilla/nsprpub/pr/include/pratom.h,v <-- pratom.h
new revision: 3.10; previous revision: 3.9
done
Checking in pr/tests/atomic.c;
/cvsroot/mozilla/nsprpub/pr/tests/atomic.c,v <-- atomic.c
new revision: 3.10; previous revision: 3.9
done
Attachment #299608 -
Flags: review?(nelson)
Comment 33•17 years ago
|
||
Comment on attachment 219500 [details] [diff] [review]
Retains PR_Atomic* functions for compatibility
Obsoleting this patch, which has been superseded.
Attachment #219500 -
Attachment is obsolete: true
Attachment #219500 -
Flags: review?(wtc)
Comment 34•17 years ago
|
||
Comment on attachment 299608 [details] [diff] [review]
Supplemental patch 1 (checked in)
I think this bug can now be resolved as fixed.
Attachment #299608 -
Flags: review?(nelson) → review+
| Assignee | ||
Updated•17 years ago
|
Severity: normal → enhancement
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Version: 4.7 → other
| Reporter | ||
Comment 35•13 years ago
|
||
Comment on attachment 299498 [details] [diff] [review]
Proposed patch (checked in)
Removed myself from review in attempt to get Bugzilla to stop nagging me about this 4-year-old bug.
Attachment #299498 -
Flags: review?(swsnyder) → review?
Updated•13 years ago
|
Attachment #299498 -
Flags: review?
You need to log in
before you can comment on or make changes to this bug.
Description
•