Closed
Bug 334826
Opened 18 years ago
Closed 16 years ago
Replace some atomic functions with Win32 intrinsics
Categories
(NSPR :: NSPR, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
4.7
People
(Reporter: swsnyder, Assigned: wtc)
References
Details
(Keywords: perf)
Attachments
(3 files, 3 obsolete files)
2.70 KB,
patch
|
timeless
:
review-
|
Details | Diff | Splinter Review |
6.18 KB,
patch
|
nelson
:
superreview+
|
Details | Diff | Splinter Review |
2.40 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060413 Fedora/1.5.0.2-1fc4.sws Firefox/1.5.0.2 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.2) Gecko/20060413 Fedora/1.5.0.2-1fc4.sws Firefox/1.5.0.2 The PR_AtomicIncrement() and PR_AtomicDecrement() routines are called very often. Bringing up Seamonkey v1.0.1 to a blank page involves a combined ~232,000 calls to these routines. The cost of these calls on Windows platforms can be greatly reduced by the use of inline intrinsic functions. Reproducible: Always
Reporter | ||
Comment 1•18 years ago
|
||
PR_AtomicIncrement() and PR_AtomicDecrement() act as wrappers, each encapulating a single call to a platform-specific inline ASM routine. But the PR_Atomic* calls themselves are not inlined. On an X86 platform the overhead (push param onto stack, call function, move param from stack to register, return from function, clean up stack) is greater than the few instructions of actual functionality (the platform-specific ASM code). Moveover the compiler cannot make optimum register allocations because the stack must be used for passing the parameter to the called routine. I am submitting a patch which conditionally replaces PR_AtomicIncrement() and PR_AtomicDecrement() with their MSC intrinsic equivelents, _InterlockedIncrement() and _InterlockedDecrement(). Here is an example of the original code generation, calling PR_AtomicDecrement(). The code was generated by the MSVC 7.1 compiler from unmodified Seamonkey v1.0.1 source: nsDirectoryService::Release: 007FE823 push esi 007FE824 mov esi,dword ptr [ebp+8] 007FE827 push edi 007FE828 lea edi,[esi+0Ch] 007FE82B push edi 007FE82C call dword ptr [__imp__PR_AtomicDecrement (8352B4h)] PR_AtomicDecrement: 30009D53 mov ecx,dword ptr [val] 30009D56 mov eax,0FFFFFFFFh 30009D5B lock xadd dword ptr [ecx],eax 30009D5F dec eax 30009D60 ret 007FE832 add esp,4 007FE835 test eax,eax 007FE837 jne nsDirectoryService::Release+4Eh (7FE86Eh) Here's an example, after patching, of the new code generated: nsStorageInputStream::Release: 008058D3 push esi 008058D4 mov esi,dword ptr [this] 008058D7 lea ecx,[esi+8] 008058DA or eax,0FFFFFFFFh 008058DD lock xadd dword ptr [ecx],eax 008058E1 dec eax 008058E2 jne nsStorageInputStream::Release+46h (805916h) The relevant code starts at 0x007FE828 in the first example and 0x008058D7 in the second. The number of instructions is reduced from 10 to 4, and the number of memory references is reduced from 5 to 1. Note that the return values of the intrinsic functions different from the PR_Atomic* functions when run on Win95 and WinNT v3.51. I'm told that the minimum version of Windows supported for trunk builds is Win2K, so that shouldn't prevent the acceptance of this patch.
Reporter | ||
Comment 2•18 years ago
|
||
This patch generally improves performance and reduces code size on x86 Win32 platform by replacing the PR_AtomicIncrement() and PR_AtomicDecrement() with their intrinsic equivalents.
Reporter | ||
Comment 3•18 years ago
|
||
Edited summary for Win32-specific nature of the patch.
Summary: Replace some atomic funtions with intrinsics → Replace some atomic functions with Win32 intrinsics
Comment 4•18 years ago
|
||
Wouldn't it be simpler to ask the compiler to inline PR_AtomicIncrement() and PR_AtomicDecrement()?
Updated•18 years ago
|
Component: General → XPCOM
Keywords: perf
Product: Firefox → Core
QA Contact: general → xpcom
Version: unspecified → Trunk
Comment 5•18 years ago
|
||
Darin, bsmedberg, what do you think of Steve's patch?
Updated•18 years ago
|
Attachment #219147 -
Flags: review?(wtchang)
Comment 6•18 years ago
|
||
-> nspr
Assignee: nobody → wtchang
Status: UNCONFIRMED → NEW
Component: XPCOM → NSPR
Ever confirmed: true
Product: Core → NSPR
QA Contact: xpcom → wtchang
Target Milestone: --- → 4.7
Version: Trunk → 4.7
Reporter | ||
Comment 7•18 years ago
|
||
Wouldn't it be simpler to ask the compiler to inline PR_AtomicIncrement() and
>PR_AtomicDecrement()?
If that were done we would benefit from the lack of stack handling. But we would still be using the existing ASM code, along with the hard-coding of registers that that implies, with the compiler having to work around those decisions.
With the intrinsics the compiler can select the registers and instructions available for code gen on the fly, based on the surrounding code.
Simply inlining the existing code would be a half-way point between the existing scheme and my use of intrinsics. On the other hard, it would be more portable, as my patch addresses only the x86 Win32 platform.
Reporter | ||
Comment 8•18 years ago
|
||
I've discovered a problem resulting from my original patch. It seems that nspr4.dll is giving out the wrong value to an extension that is querying the version number. With the patched code, the browser itself works great and the About page shows the correct version numbers (Gecko & browser). However, the Roboform extension for Seamonkey complains at runtime that only versions 1.0.0 through 1.0.1 are supported and I am running version "rv:1.8.0.2". Hmmm... Replacing nspr4.dll with one built from unpatched code satisfies the Roboform version query. Perhaps the lack of references to the PR_AtomicIncrement() and PR_AtomicDecrement() routines means that something is not being pulled into the link of nspr4.dll? More later.
Comment 9•18 years ago
|
||
your removal of the functions from the .c file means that everything that was linked with the old nspr will fail with the new one...
Reporter | ||
Comment 10•18 years ago
|
||
Updated patch retains the PR_AtomicIncrement() and PR_AtomicDecrement() functions for compatibility with previous builds. These are not called by any Mozilla code; they just exist unused in the NSPR library to satisfy the expectations of 3rd-party code.
Comment 11•18 years ago
|
||
fwiw, NSPR is used by more than just Mozilla code.
Comment 12•17 years ago
|
||
Comment on attachment 219147 [details] [diff] [review] Replace some atomic ops with Win32 intrinsics breaking nspr compatibility is obviously not ok :)
Attachment #219147 -
Flags: review?(wtchang) → review-
Reporter | ||
Comment 13•17 years ago
|
||
Comment #12: Jeez, you break a fundamental component and you never hear the end of it! :-) I've been using the revised patch on 1.8.0 builds for the previous 5 months (run on Win2K and WinXP only, of course) and have experienced no problems since that initial "setback." My SeaMonkey builds have all been completely stable and I've had no further complaints from 3rd-party extensions. It's not an improvement that users would notice, but it does provide a noteable general reduction in code size and clock cycles on Win32 builds. I'll take my performance gains wherever I can get 'em.
Comment 14•17 years ago
|
||
for reference, i meant to post a counter patch. but this got lost. i'll try to do that now.
Comment 15•17 years ago
|
||
instead of trying to hack the code, this offers a macro which either does the optimized thing or not. all code built against the newer nspr could use the new macro. a treewide s/r would upgrade a tree. please test.
Comment 16•17 years ago
|
||
+#if defined(_WIN32) && defined(_M_IX86) && (_MSC_VER >= 1310) && defined(_MT) +#define NSPR_USE_INTRINSIC_LOCKS +#else +#undef NSPR_USE_INTRINSIC_LOCKS + +#ifdef NSPR_USE_INTRINSIC_LOCKS that's missing an #endif, right?
Updated•17 years ago
|
QA Contact: wtchang → nspr
Comment 17•16 years ago
|
||
Comment on attachment 219500 [details] [diff] [review] Retains PR_Atomic* functions for compatibility Wan-Teh, what do you think of this? I'm inclined to give this r+, except that I think the symbol _USE_INTRIN_ATOMICS_ should be NSPR_USE_INTRIN_ATOMICS to avoid name space pollution. I checked that nsprstub.c does (indirectly) include pratom.h.
Attachment #219500 -
Flags: review?(wtc)
Assignee | ||
Comment 18•16 years ago
|
||
Comment on attachment 239519 [details] [diff] [review] does this work I was going to suggest this, and was pleasantly surprised that it's exactly what timeless suggested. One reason PR_AtomicXXX are functions is that we can fix bugs or enhance the implementation. Macros will be expanded into the caller's code, and we can't change them without recompiling the caller's code. We should use the compiler intrinsics. To prevent breaking binary compatibility by accident, it's safer to add new macros or functions and not touch the existing PR_AtomicXXX functions.
Comment 19•16 years ago
|
||
Would love to get this into NSPR in time for Gecko 1.9/FF3...
Flags: blocking1.9+
Comment 20•16 years ago
|
||
(In reply to comment #8) > I've discovered a problem resulting from my original patch. It seems that > nspr4.dll is giving out the wrong value to an extension that is querying the > version number. > > With the patched code, the browser itself works great and the About page shows > the correct version numbers (Gecko & browser). However, the Roboform extension > for Seamonkey complains at runtime that only versions 1.0.0 through 1.0.1 are > supported and I am running version "rv:1.8.0.2". Hmmm... Replacing nspr4.dll > with one built from unpatched code satisfies the Roboform version query. > > Perhaps the lack of references to the PR_AtomicIncrement() and > PR_AtomicDecrement() routines means that something is not being pulled into the > link of nspr4.dll? > > More later. I did a bunch of these intrinsics things quite some time ago and a user ran into the same problem with Roboform. I don't recall if I took the change out or if I just left it there and the user stopped using my builds. I should go over my intrinsics patches at some point in case I have a few that haven't already been found and merged in.
Reporter | ||
Comment 21•16 years ago
|
||
(In reply to comment #20) [snip] The problem was that my original patch completely removed the original functions in favor of inline intrinsics. The problem was resolved in my 2nd patch by leaving the original functions (for binary compatibility) and using the inline intrinsics for new builds. Though I saw the problem with Roboform, it was correctly observed that my original patch broke the published API and was therefore unacceptable.
Comment 22•16 years ago
|
||
Attachment #239519 -
Attachment is obsolete: true
Attachment #298515 -
Flags: review?(wtc)
Assignee | ||
Comment 23•16 years ago
|
||
Comment on attachment 298515 [details] [diff] [review] matched if/endif timeless: thanks for the patch. Let's name the macro _PR_HAVE_INTRINSIC_ATOMIC_OPS. (There is an existing internal macro called _PR_HAVE_ATOMIC_OPS.) Please add an underscore (_) between ATOMIC and INCREMENT/DECREMENT. Are there compiler intrinsics for _InterlockedAdd and _InterlockedExchange? It is a little strange to only have intrinsics macros for two of the four PR_AtomicXXX functions.
Comment 24•16 years ago
|
||
http://msdn2.microsoft.com/en-us/library/w5405h95.aspx Alphabetical Listing of Intrinsic Functions http://msdn2.microsoft.com/en-us/library/51s265a6.aspx _InterlockedAdd Intrinsic Functions http://msdn2.microsoft.com/en-us/library/1s26w950.aspx _InterlockedExchange Intrinsic Functions
Reporter | ||
Comment 25•16 years ago
|
||
(In reply to comment #23) > Are there compiler intrinsics for _InterlockedAdd and _InterlockedExchange? > It is a little strange to only have intrinsics macros for two of the four > PR_AtomicXXX functions. There's an argument to be made for using these 2 intrinsics for completeness/consistency with Inc & Dec, but there is really no real-world performance gain. While the intrinsics are more flexible than the existing ASM code in terms of code generation, those 2 functions are called so rarely (at least by Firefox/SeaMonkey) that there is really no practical reason to prefer them over the existing code.
Reporter | ||
Comment 26•16 years ago
|
||
Re comment #23 and naming conventions: is there any advantage to noting the compiler that provides the intrinsic functions? These functions are compiler-specific. I'm thinking here of potential confusion of the analogous GCC intrinsics +# define _MD_ATOMIC_INCREMENT(val) __sync_add_and_fetch(val, 1) +# define _MD_ATOMIC_DECREMENT(val) __sync_sub_and_fetch(val, 1) +# define _MD_ATOMIC_ADD(ptr,val) __sync_add_and_fetch(ptr, val) +# define _MD_ATOMIC_SET(ptr,newval) __sync_lock_test_and_set(ptr, newval) (or similar) as replacements for the functions in os_Linux_x86.s.
Assignee | ||
Comment 27•16 years ago
|
||
I agree we should skip the intermediate HAVE_XXX macro and just do this: +#if defined(_WIN32) && (_MSC_VER >= 1310) +long __cdecl _InterlockedIncrement(PRInt32 volatile *Addend); +#pragma intrinsic(_InterlockedIncrement) + +long __cdecl _InterlockedDecrement(PRInt32 volatile *Addend); +#pragma intrinsic(_InterlockedDecrement) + +#define PR_ATOMIC_INCREMENT(val) _InterlockedIncrement(val) +#define PR_ATOMIC_DECREMENT(val) _InterlockedDecrement(val) +#else +#define PR_ATOMIC_INCREMENT(val) PR_AtomicIncrement(val) +#define PR_ATOMIC_DECREMENT(val) PR_AtomicDecrement(val) +#endif These intrinsics are available on all processor architectures and in single-threaded programs, so we don't need to test the _M_IX86 and _MT macros. The return values of the intrinsics don't match NSPR's specification on Windows 95 and NT 3.51, so we will be dropping support for these two old versions of Windows.
Comment 28•16 years ago
|
||
(In reply to comment #25) > While the intrinsics are more flexible than the existing ASM code in terms of > code generation, those 2 functions are called so rarely (at least by > Firefox/SeaMonkey) that there is really no practical reason to prefer them over > the existing code. The advantages to using Intrinsics over MSVC Inline Assembler are: - Intrinsics are generally portable and typically easier to understand - MSVC for x64 doesn't support MSVC inline assembler so using intrinsics simplifies the code - I don't know if the intrinsics are the same between GCC and MSVC but having only one code set would be nice for multiple platforms.
Assignee | ||
Comment 29•16 years ago
|
||
Add PR_ATOMIC_INCREMENT, PR_ATOMIC_DECREMENT, PR_ATOMIC_SET, and PR_ATOMIC_ADD. Implement these macros as compiler intrinsics for the latest versions of MSVC and GCC. Implement them as the PR_AtomicXXX functions by default. I've checked in this patch on the NSPR trunk (NSPR 4.7) for testing. Checking in pr/include/pratom.h; /cvsroot/mozilla/nsprpub/pr/include/pratom.h,v <-- pratom.h new revision: 3.9; previous revision: 3.8 done Checking in pr/tests/atomic.c; /cvsroot/mozilla/nsprpub/pr/tests/atomic.c,v <-- atomic.c new revision: 3.9; previous revision: 3.8 done
Attachment #298515 -
Attachment is obsolete: true
Attachment #299498 -
Flags: superreview?(nelson)
Attachment #299498 -
Flags: review?(swsnyder)
Attachment #298515 -
Flags: review?(wtc)
Comment 30•16 years ago
|
||
Comment on attachment 299498 [details] [diff] [review] Proposed patch (checked in) The patch to atomic.c contains a pattern of 6 lines of code that is repeated many times. I would have preferred that those lines appear in a macro with several arguments, and that each occurrence be a single macro invocation. Also, I would have written the statements of this form: result = result | ((EXPRESSION) ? 0 : 1); as the equivalent and simpler statement: result |= !(EXPRESSION); But this patch will do for test code. r=nelson
Attachment #299498 -
Flags: superreview?(nelson) → superreview+
Assignee | ||
Comment 32•16 years ago
|
||
Nelson, I agree with both of your suggestions. I don't have time to implement them now, but I added a TODO comment. I also think of an issue -- the PR_AtomicXXX functions may be implemented with a lock even when the platform has atomic instructions because we don't have the time or expertise to write the assembly code. So if we define PR_ATOMIC_XXX with compiler intrinsics, the macros and the functions can't be used on the same variable because they use incompatible methods to achieve atomicity. But people will expect that the macros and the functions can be used interchangeably. So I documented this issue in pratom.h and restrict the use of GCC atomic intrinsics to the platforms where PR_AtomicXXX is truly atomic. You don't need to review this patch carefully. This is just FYI. I've checked in this patch on the NSPR trunk (NSPR 4.7). Checking in pr/include/pratom.h; /cvsroot/mozilla/nsprpub/pr/include/pratom.h,v <-- pratom.h new revision: 3.10; previous revision: 3.9 done Checking in pr/tests/atomic.c; /cvsroot/mozilla/nsprpub/pr/tests/atomic.c,v <-- atomic.c new revision: 3.10; previous revision: 3.9 done
Attachment #299608 -
Flags: review?(nelson)
Comment 33•16 years ago
|
||
Comment on attachment 219500 [details] [diff] [review] Retains PR_Atomic* functions for compatibility Obsoleting this patch, which has been superseded.
Attachment #219500 -
Attachment is obsolete: true
Attachment #219500 -
Flags: review?(wtc)
Comment 34•16 years ago
|
||
Comment on attachment 299608 [details] [diff] [review] Supplemental patch 1 (checked in) I think this bug can now be resolved as fixed.
Attachment #299608 -
Flags: review?(nelson) → review+
Assignee | ||
Updated•16 years ago
|
Severity: normal → enhancement
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Version: 4.7 → other
Reporter | ||
Comment 35•12 years ago
|
||
Comment on attachment 299498 [details] [diff] [review] Proposed patch (checked in) Removed myself from review in attempt to get Bugzilla to stop nagging me about this 4-year-old bug.
Attachment #299498 -
Flags: review?(swsnyder) → review?
Updated•12 years ago
|
Attachment #299498 -
Flags: review?
You need to log in
before you can comment on or make changes to this bug.
Description
•