Closed
Bug 405992
Opened 17 years ago
Closed 16 years ago
Implement native atomic operations for Linux/ARM.
Categories
(NSPR :: NSPR, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
4.7.1
People
(Reporter: ilpo.ruotsalainen, Assigned: wtc)
References
Details
(Keywords: mobile)
Attachments
(1 file, 4 obsolete files)
3.06 KB,
patch
|
nelson
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Opera/9.24 (X11; Linux i686; U; en)
Build Identifier: NSPR_HEAD_20071016
Native atomic operations for Linux/ARM should be implemented to increase performance.
Reproducible: Always
Steps to Reproduce:
1. Download sources.
2. Notice missing code.
Reporter | ||
Comment 1•17 years ago
|
||
This patch implements atomic operations for Linux/ARM. However, this implementation requires at kernel version 2.6.13 or later.
Assignee | ||
Updated•17 years ago
|
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee | ||
Comment 2•17 years ago
|
||
Brad, Chris, Doug, could you help me review the patch in this bug?
Ilpo, we need to add a compile-time check (either an autoconf test
or a C preprocessor test using macros) to determine when we can
call the __kernel_cmpxchg function at the hardcoded address 0xffff0fc0.
It might be easier to implement NSPR's own cmpxchg function. It should
be a couple of lines of inline assembly code if ARM has the right instruction.
Reporter | ||
Comment 3•17 years ago
|
||
(In reply to comment #2)
> Ilpo, we need to add a compile-time check (either an autoconf test
> or a C preprocessor test using macros) to determine when we can
> call the __kernel_cmpxchg function at the hardcoded address 0xffff0fc0.
Any suggestions what the check should be based on? We can't test if we can run the code in cross-compiling cases (does anyone do that?) and even checking kernel version on the build host is not really a good check for this.
> It might be easier to implement NSPR's own cmpxchg function. It should
> be a couple of lines of inline assembly code if ARM has the right instruction.
Oversimplifying, there's no way to implement atomic cmpxchg on ARMv5 (and older, I think, but those might be outside the scope anyway).
Comment 4•17 years ago
|
||
We could ignore cross compiles for now and just check the host. Many embedded teams are working with scratchbox anyway.
In the case of a cross compiling we could have a config flag for now
Comment 5•17 years ago
|
||
wtc, a configure check would be hard. we are building in "scratchbox" which is a cross-compilation toolkit, not the real target device. It might need to just be a configure option that is passed by the developer (and we hope that they know what they are doing)
on the patch, I do wonder what the address is all about; a nice big comment was to why we need to do it this way would be very helpful. Especially the requirement to call __kernel_cmpxchg_t repeatedly until success. For example, the text:
The ARM __kuser_cmpxchg routine is meant to implement an atomic cmpxchg
in user space. It however can produce spurious false negative if a
processor exception occurs in the middle of the operation.
works very nice as a comment (stolen from commit Nicolas Pitre checkin comment to the linux 2.6 kernel - b49c0f24cf6744a3f4fd09289fe7cade349dead5). You should probably change the routine name above to __kernel_cmpxchg_t. (I assume both of these fuctions are the same thing)
I also wonder about the need for the do loop? Is it sufficient to simply set |ov| and/or |nv| once, then loop waiting for a successful __kernel_cmpxchg_t?
Reporter | ||
Comment 6•17 years ago
|
||
(In reply to comment #5)
> on the patch, I do wonder what the address is all about; a nice big comment was
> to why we need to do it this way would be very helpful.
I'll post an updated patch with added comments soon.
> I also wonder about the need for the do loop? Is it sufficient to simply set
> |ov| and/or |nv| once, then loop waiting for a successful __kernel_cmpxchg_t?
If you only set them once you end up in an (more or less) infinite loop the first time some other thread gets in between setting |ov| and calling the cmpxchg since then |ov| will not match the value of the atomic variable anymore and the cmpxchg will not succeed unless some other thread changes the atomic value back to what it was previously.
Comment 7•17 years ago
|
||
any update Ilpo?
Reporter | ||
Comment 8•17 years ago
|
||
Attachment #290686 -
Attachment is obsolete: true
Updated•17 years ago
|
Attachment #310211 -
Flags: review?(wtc)
Comment 9•17 years ago
|
||
Ilpo, is see this in my nsprpub/config/autoconf.mk
USE_ARM_KUSER = @USE_ARM_KUSER@
I am pretty sure that this should have either been set or not set depending on my configure flags.
Reporter | ||
Comment 10•17 years ago
|
||
(In reply to comment #9)
> Ilpo, is see this in my nsprpub/config/autoconf.mk
>
> USE_ARM_KUSER = @USE_ARM_KUSER@
>
> I am pretty sure that this should have either been set or not set depending on
> my configure flags.
Did you remember to regenerate nsprpub/configure?
Comment 11•16 years ago
|
||
yes, thanks. I have been using this for a few weeks now without any problem. Ilpo, did you happen to get any perf numbers?
Wtc, any chance to look at this new patch?
Comment 12•16 years ago
|
||
(In reply to comment #11)
> did you happen to get any perf numbers?
I have some on an ARM1176 development platform (CPU 400 MHz)
Linux version 2.6.21 (gcc version 4.2.1 (CodeSourcery Sourcery G++ Lite 2007q3-51))
Xulrunner (1.9b4pre) with a very simple UI
## Browser test suite based on iBench:
No patch Ilpo's patch gain
Total Time: 1713.095 sec 1635.665 sec 4.5%
First time: 215.58 sec 205.89 sec 4.5%
Cached time: 213.935 sec 204.255 sec 4.5%
## Dromaeo.com:
With Ilpo's patch: http://dromaeo.com/?id=5263
Without patch: http://dromaeo.com/?id=6490
http://dromaeo.com/?id=6490,5263
Total: 259234.00ms ±4.86% 218554.80ms ±2.67% winning 19%
Comment 13•16 years ago
|
||
Because Ilpo's patch was not a CVS diff, a number of the patch reviewing
tools on bugzilla don't work with it. This patch is exactly the same
patch as the previous one, but in the form of a CVS unified diff.
Comment 14•16 years ago
|
||
Comment on attachment 310211 [details] [diff] [review]
Patch v2
This patch evidently uses some non-standard non-portable extensions of the
c language that (I gather) must be common to all gcc compilers, but are
certainly not accepted by most other c compilers. There's precedent for
doing this in this same header file, for alpha. I guess we don't foresee
using any other compiler on Linux any time soon.
I would think that using the single-variable expression statement at the
end of the basic block would (or could) be optimized away. This code
appears to rely on the assumption that it will not be optimized away.
The alpha code does not appear to rely on that assumption. Perhaps
declaring that variable as volatile would avoid that possibility.
There's a chance of variable name collision causing unexpected results.
If (for example) someone invoked one of these macros with a variable
named vp or ov (or nv), the results would not be as expected. I recommend
prepending the names of the local variables in these basic blocks with
some number of underscores (e.g. __ov, __nv), as was done in the alpha
code, to reduce the chance of such collisions.
Fixing the potential variable name collision problem is a must.
If you're confident that the single variable expression statement cannot
be optimized away, then that can stay as is.
Attachment #310211 -
Flags: review-
Comment 15•16 years ago
|
||
After thinking about this some more, it seems to me that there is another
way to do this that does not rely on so many assumptions about optimizers
or non-standard c language extensions. It is to declare these functions
as __inline__ functions instead of them being macros.
I'm thinking of something roughly like this:
__inline__ PRInt32 _MD_ATOMIC_ADD(PRInt32 *ptr, PRInt32 n)
{
PRInt32 ov, nv;
volatile PRInt32 *vp = (ptr);
/* The whole sequence must be restarted if __kernel_cmpxchg() fails
* because some other thread may have changed the value of *vp. */
do {
ov = *vp;
nv = ov + n;
} while (__kernel_cmpxchg(ov, nv, vp));
return nv;
}
#define _MD_ATOMIC_INCREMENT(ptr) _MD_ATOMIC_ADD(ptr, 1)
#define _MD_ATOMIC_DECREMENT(ptr) _MD_ATOMIC_ADD(ptr, -1)
This approach doesn't depend on allowing executable basic blocks
inside of expressions (e.g. ({ code })
and doesn't rely on a single-variable expression statement that is
the last statement in a basic block not being optimized away or
being the "return value" of the block/expression.
And it avoids the variable name collision potential completely.
If this approach produces comparable performance to the approach taken
in the previous patch, I'd much prefer it.
Assignee | ||
Comment 16•16 years ago
|
||
Comment on attachment 310211 [details] [diff] [review]
Patch v2
Thanks for the patch.
Could you skip the USE_ARM_KUSER makefile variable and just
do AC_DEFINE(_PR_ARM_KUSER) directly in configure.in? (It's
fine to use the shell variable USE_ARM_KUSER in configure.in.)
In NSPR, we put the closing brace } and while on the same line:
} while (__kernel_cmpxchg(ov, (nv), vp));
Attachment #310211 -
Flags: review?(wtc) → review-
I'd like to see this checked in as soon as we can; here's an updated patch based on the review comments. The generated code in my simple tests is identical to the macro version; I haven't run actual perf tests, though. Also fixed the configure/makefile issue; I just AC_DEFINE if the configure flag is specified, don't think we need the shell var anywhere.
Attachment #317821 -
Flags: review?(nelson)
Assignee | ||
Comment 18•16 years ago
|
||
I changed --enable-arm-kuser to --with-arm-kuser. Let me
know if --enable is more appropriate.
I fixed the indentation offset (which is 4 in NSPR and NSS).
I would have defined __kernel_cmpxchg_t and __kernel_cmpxchg as:
typedef int (*__kernel_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
#define __kernel_cmpxchg ((__kernel_cmpxchg_t)0xffff0fc0)
But I found that the definitions in this patch come from
the example in the Linux/ARM kernel source code, so I kept
them.
I checked in the patch on the NSPR trunk (NSPR 4.7.1).
Checking in configure.in;
/cvsroot/mozilla/nsprpub/configure.in,v <-- configure.in
new revision: 1.232; previous revision: 1.231
done
Checking in configure;
/cvsroot/mozilla/nsprpub/configure,v <-- configure
new revision: 1.228; previous revision: 1.227
done
Checking in pr/include/md/_linux.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_linux.h,v <-- _linux.h
new revision: 3.49; previous revision: 3.48
done
Attachment #310211 -
Attachment is obsolete: true
Attachment #316713 -
Attachment is obsolete: true
Attachment #317821 -
Attachment is obsolete: true
Attachment #317821 -
Flags: review?(nelson)
Assignee | ||
Updated•16 years ago
|
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.7.1
(In reply to comment #18)
> I changed --enable-arm-kuser to --with-arm-kuser. Let me
> know if --enable is more appropriate.
with should be fine.
> I fixed the indentation offset (which is 4 in NSPR and NSS).
Note that http://mxr.mozilla.org/mozilla/source/nsprpub/pr/include/md/_linux.h#1 indicates 2 -- my editor's just doing what the file says. :) You may want to update the modelines.
> I checked in the patch on the NSPR trunk (NSPR 4.7.1).
Thanks! I'll do a separate patch to get this flag passed down from Gecko's toplevel configure.
Assignee | ||
Comment 20•16 years ago
|
||
Vlad, I will fix the Emacs mode line in _linux.h. The mode line
was given to us when Netscape released its browser source code in
1998. Since none of the NSPR developers use Emacs, we just added
the mode line to our files without looking closely.
In comment #18, I wrote:
> I would have defined __kernel_cmpxchg_t and __kernel_cmpxchg as:
> typedef int (*__kernel_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
> #define __kernel_cmpxchg ((__kernel_cmpxchg_t)0xffff0fc0)
I now understand why __kernel_cmpxchg has to be defined as:
#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *)0xffff0fc0)
I believe that 0xffff0fc0 is not the address of the __kernel_cmpxchg
function but rather the address of a word that stores the address of
the function.
Comment 21•16 years ago
|
||
Comment on attachment 317838 [details] [diff] [review]
updated patch with review comments (as checked in)
Thanks, Vlad and Wan-Teh, for incorporating my comments.
Attachment #317838 -
Flags: review+
You need to log in
before you can comment on or make changes to this bug.
Description
•