Closed Bug 405992 Opened 17 years ago Closed 16 years ago

Implement native atomic operations for Linux/ARM.

Categories

(NSPR :: NSPR, enhancement)

Other
Linux
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ilpo.ruotsalainen, Assigned: wtc)

References

Details

(Keywords: mobile)

Attachments

(1 file, 4 obsolete files)

User-Agent: Opera/9.24 (X11; Linux i686; U; en) Build Identifier: NSPR_HEAD_20071016 Native atomic operations for Linux/ARM should be implemented to increase performance. Reproducible: Always Steps to Reproduce: 1. Download sources. 2. Notice missing code.
This patch implements atomic operations for Linux/ARM. However, this implementation requires at kernel version 2.6.13 or later.
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Brad, Chris, Doug, could you help me review the patch in this bug? Ilpo, we need to add a compile-time check (either an autoconf test or a C preprocessor test using macros) to determine when we can call the __kernel_cmpxchg function at the hardcoded address 0xffff0fc0. It might be easier to implement NSPR's own cmpxchg function. It should be a couple of lines of inline assembly code if ARM has the right instruction.
(In reply to comment #2) > Ilpo, we need to add a compile-time check (either an autoconf test > or a C preprocessor test using macros) to determine when we can > call the __kernel_cmpxchg function at the hardcoded address 0xffff0fc0. Any suggestions what the check should be based on? We can't test if we can run the code in cross-compiling cases (does anyone do that?) and even checking kernel version on the build host is not really a good check for this. > It might be easier to implement NSPR's own cmpxchg function. It should > be a couple of lines of inline assembly code if ARM has the right instruction. Oversimplifying, there's no way to implement atomic cmpxchg on ARMv5 (and older, I think, but those might be outside the scope anyway).
Keywords: mobile
We could ignore cross compiles for now and just check the host. Many embedded teams are working with scratchbox anyway. In the case of a cross compiling we could have a config flag for now
wtc, a configure check would be hard. we are building in "scratchbox" which is a cross-compilation toolkit, not the real target device. It might need to just be a configure option that is passed by the developer (and we hope that they know what they are doing) on the patch, I do wonder what the address is all about; a nice big comment was to why we need to do it this way would be very helpful. Especially the requirement to call __kernel_cmpxchg_t repeatedly until success. For example, the text: The ARM __kuser_cmpxchg routine is meant to implement an atomic cmpxchg in user space. It however can produce spurious false negative if a processor exception occurs in the middle of the operation. works very nice as a comment (stolen from commit Nicolas Pitre checkin comment to the linux 2.6 kernel - b49c0f24cf6744a3f4fd09289fe7cade349dead5). You should probably change the routine name above to __kernel_cmpxchg_t. (I assume both of these fuctions are the same thing) I also wonder about the need for the do loop? Is it sufficient to simply set |ov| and/or |nv| once, then loop waiting for a successful __kernel_cmpxchg_t?
(In reply to comment #5) > on the patch, I do wonder what the address is all about; a nice big comment was > to why we need to do it this way would be very helpful. I'll post an updated patch with added comments soon. > I also wonder about the need for the do loop? Is it sufficient to simply set > |ov| and/or |nv| once, then loop waiting for a successful __kernel_cmpxchg_t? If you only set them once you end up in an (more or less) infinite loop the first time some other thread gets in between setting |ov| and calling the cmpxchg since then |ov| will not match the value of the atomic variable anymore and the cmpxchg will not succeed unless some other thread changes the atomic value back to what it was previously.
any update Ilpo?
Attached patch Patch v2 (obsolete) — Splinter Review
Attachment #290686 - Attachment is obsolete: true
Attachment #310211 - Flags: review?(wtc)
Ilpo, is see this in my nsprpub/config/autoconf.mk USE_ARM_KUSER = @USE_ARM_KUSER@ I am pretty sure that this should have either been set or not set depending on my configure flags.
(In reply to comment #9) > Ilpo, is see this in my nsprpub/config/autoconf.mk > > USE_ARM_KUSER = @USE_ARM_KUSER@ > > I am pretty sure that this should have either been set or not set depending on > my configure flags. Did you remember to regenerate nsprpub/configure?
yes, thanks. I have been using this for a few weeks now without any problem. Ilpo, did you happen to get any perf numbers? Wtc, any chance to look at this new patch?
(In reply to comment #11) > did you happen to get any perf numbers? I have some on an ARM1176 development platform (CPU 400 MHz) Linux version 2.6.21 (gcc version 4.2.1 (CodeSourcery Sourcery G++ Lite 2007q3-51)) Xulrunner (1.9b4pre) with a very simple UI ## Browser test suite based on iBench: No patch Ilpo's patch gain Total Time: 1713.095 sec 1635.665 sec 4.5% First time: 215.58 sec 205.89 sec 4.5% Cached time: 213.935 sec 204.255 sec 4.5% ## Dromaeo.com: With Ilpo's patch: http://dromaeo.com/?id=5263 Without patch: http://dromaeo.com/?id=6490 http://dromaeo.com/?id=6490,5263 Total: 259234.00ms ±4.86% 218554.80ms ±2.67% winning 19%
Because Ilpo's patch was not a CVS diff, a number of the patch reviewing tools on bugzilla don't work with it. This patch is exactly the same patch as the previous one, but in the form of a CVS unified diff.
Comment on attachment 310211 [details] [diff] [review] Patch v2 This patch evidently uses some non-standard non-portable extensions of the c language that (I gather) must be common to all gcc compilers, but are certainly not accepted by most other c compilers. There's precedent for doing this in this same header file, for alpha. I guess we don't foresee using any other compiler on Linux any time soon. I would think that using the single-variable expression statement at the end of the basic block would (or could) be optimized away. This code appears to rely on the assumption that it will not be optimized away. The alpha code does not appear to rely on that assumption. Perhaps declaring that variable as volatile would avoid that possibility. There's a chance of variable name collision causing unexpected results. If (for example) someone invoked one of these macros with a variable named vp or ov (or nv), the results would not be as expected. I recommend prepending the names of the local variables in these basic blocks with some number of underscores (e.g. __ov, __nv), as was done in the alpha code, to reduce the chance of such collisions. Fixing the potential variable name collision problem is a must. If you're confident that the single variable expression statement cannot be optimized away, then that can stay as is.
Attachment #310211 - Flags: review-
After thinking about this some more, it seems to me that there is another way to do this that does not rely on so many assumptions about optimizers or non-standard c language extensions. It is to declare these functions as __inline__ functions instead of them being macros. I'm thinking of something roughly like this: __inline__ PRInt32 _MD_ATOMIC_ADD(PRInt32 *ptr, PRInt32 n) { PRInt32 ov, nv; volatile PRInt32 *vp = (ptr); /* The whole sequence must be restarted if __kernel_cmpxchg() fails * because some other thread may have changed the value of *vp. */ do { ov = *vp; nv = ov + n; } while (__kernel_cmpxchg(ov, nv, vp)); return nv; } #define _MD_ATOMIC_INCREMENT(ptr) _MD_ATOMIC_ADD(ptr, 1) #define _MD_ATOMIC_DECREMENT(ptr) _MD_ATOMIC_ADD(ptr, -1) This approach doesn't depend on allowing executable basic blocks inside of expressions (e.g. ({ code }) and doesn't rely on a single-variable expression statement that is the last statement in a basic block not being optimized away or being the "return value" of the block/expression. And it avoids the variable name collision potential completely. If this approach produces comparable performance to the approach taken in the previous patch, I'd much prefer it.
Comment on attachment 310211 [details] [diff] [review] Patch v2 Thanks for the patch. Could you skip the USE_ARM_KUSER makefile variable and just do AC_DEFINE(_PR_ARM_KUSER) directly in configure.in? (It's fine to use the shell variable USE_ARM_KUSER in configure.in.) In NSPR, we put the closing brace } and while on the same line: } while (__kernel_cmpxchg(ov, (nv), vp));
Attachment #310211 - Flags: review?(wtc) → review-
I'd like to see this checked in as soon as we can; here's an updated patch based on the review comments. The generated code in my simple tests is identical to the macro version; I haven't run actual perf tests, though. Also fixed the configure/makefile issue; I just AC_DEFINE if the configure flag is specified, don't think we need the shell var anywhere.
Attachment #317821 - Flags: review?(nelson)
I changed --enable-arm-kuser to --with-arm-kuser. Let me know if --enable is more appropriate. I fixed the indentation offset (which is 4 in NSPR and NSS). I would have defined __kernel_cmpxchg_t and __kernel_cmpxchg as: typedef int (*__kernel_cmpxchg_t)(int oldval, int newval, volatile int *ptr); #define __kernel_cmpxchg ((__kernel_cmpxchg_t)0xffff0fc0) But I found that the definitions in this patch come from the example in the Linux/ARM kernel source code, so I kept them. I checked in the patch on the NSPR trunk (NSPR 4.7.1). Checking in configure.in; /cvsroot/mozilla/nsprpub/configure.in,v <-- configure.in new revision: 1.232; previous revision: 1.231 done Checking in configure; /cvsroot/mozilla/nsprpub/configure,v <-- configure new revision: 1.228; previous revision: 1.227 done Checking in pr/include/md/_linux.h; /cvsroot/mozilla/nsprpub/pr/include/md/_linux.h,v <-- _linux.h new revision: 3.49; previous revision: 3.48 done
Attachment #310211 - Attachment is obsolete: true
Attachment #316713 - Attachment is obsolete: true
Attachment #317821 - Attachment is obsolete: true
Attachment #317821 - Flags: review?(nelson)
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.7.1
(In reply to comment #18) > I changed --enable-arm-kuser to --with-arm-kuser. Let me > know if --enable is more appropriate. with should be fine. > I fixed the indentation offset (which is 4 in NSPR and NSS). Note that http://mxr.mozilla.org/mozilla/source/nsprpub/pr/include/md/_linux.h#1 indicates 2 -- my editor's just doing what the file says. :) You may want to update the modelines. > I checked in the patch on the NSPR trunk (NSPR 4.7.1). Thanks! I'll do a separate patch to get this flag passed down from Gecko's toplevel configure.
Vlad, I will fix the Emacs mode line in _linux.h. The mode line was given to us when Netscape released its browser source code in 1998. Since none of the NSPR developers use Emacs, we just added the mode line to our files without looking closely. In comment #18, I wrote: > I would have defined __kernel_cmpxchg_t and __kernel_cmpxchg as: > typedef int (*__kernel_cmpxchg_t)(int oldval, int newval, volatile int *ptr); > #define __kernel_cmpxchg ((__kernel_cmpxchg_t)0xffff0fc0) I now understand why __kernel_cmpxchg has to be defined as: #define __kernel_cmpxchg (*(__kernel_cmpxchg_t *)0xffff0fc0) I believe that 0xffff0fc0 is not the address of the __kernel_cmpxchg function but rather the address of a word that stores the address of the function.
Comment on attachment 317838 [details] [diff] [review] updated patch with review comments (as checked in) Thanks, Vlad and Wan-Teh, for incorporating my comments.
Attachment #317838 - Flags: review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: