Implement native atomic operations for Linux/ARM.

RESOLVED FIXED in 4.7.1

Status

enhancement
RESOLVED FIXED
12 years ago
11 years ago

People

(Reporter: ilpo.ruotsalainen, Assigned: wtc)

Tracking

({mobile})

Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

Reporter

Description

12 years ago
User-Agent:       Opera/9.24 (X11; Linux i686; U; en)
Build Identifier: NSPR_HEAD_20071016

Native atomic operations for Linux/ARM should be implemented to increase performance.

Reproducible: Always

Steps to Reproduce:
1. Download sources.
2. Notice missing code.
Reporter

Comment 1

12 years ago
This patch implements atomic operations for Linux/ARM. However, this implementation requires at kernel version 2.6.13 or later.
Assignee

Updated

12 years ago
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Assignee

Comment 2

12 years ago
Brad, Chris, Doug, could you help me review the patch in this bug?

Ilpo, we need to add a compile-time check (either an autoconf test
or a C preprocessor test using macros) to determine when we can
call the __kernel_cmpxchg function at the hardcoded address 0xffff0fc0.

It might be easier to implement NSPR's own cmpxchg function.  It should
be a couple of lines of inline assembly code if ARM has the right instruction.
Reporter

Comment 3

12 years ago
(In reply to comment #2)
> Ilpo, we need to add a compile-time check (either an autoconf test
> or a C preprocessor test using macros) to determine when we can
> call the __kernel_cmpxchg function at the hardcoded address 0xffff0fc0.

Any suggestions what the check should be based on? We can't test if we can run the code in cross-compiling cases (does anyone do that?) and even checking kernel version on the build host is not really a good check for this.

> It might be easier to implement NSPR's own cmpxchg function.  It should
> be a couple of lines of inline assembly code if ARM has the right instruction.

Oversimplifying, there's no way to implement atomic cmpxchg on ARMv5 (and older, I think, but those might be outside the scope anyway).

Updated

12 years ago
Keywords: mobile
We could ignore cross compiles for now and just check the host.  Many embedded teams are working with scratchbox anyway.

In the case of a cross compiling we could have a config flag for now
wtc, a configure check would be hard.  we are building in "scratchbox" which is a cross-compilation toolkit, not the real target device. It might need to just be a configure option that is passed by the developer (and we hope that they know what they are doing)

on the patch, I do wonder what the address is all about; a nice big comment was to why we need to do it this way would be very helpful.  Especially the requirement to call __kernel_cmpxchg_t repeatedly until success.  For example, the text:

    The ARM __kuser_cmpxchg routine is meant to implement an atomic cmpxchg
    in user space.  It however can produce spurious false negative if a
    processor exception occurs in the middle of the operation.

works very nice as a comment (stolen from commit Nicolas Pitre checkin comment to the linux 2.6 kernel - b49c0f24cf6744a3f4fd09289fe7cade349dead5).  You should probably change the routine name above to __kernel_cmpxchg_t.  (I assume both of these fuctions are the same thing)

I also wonder about the need for the do loop?  Is it sufficient to simply set |ov| and/or |nv| once, then loop waiting for a successful __kernel_cmpxchg_t?
Reporter

Comment 6

12 years ago
(In reply to comment #5)
> on the patch, I do wonder what the address is all about; a nice big comment was
> to why we need to do it this way would be very helpful.

I'll post an updated patch with added comments soon.

> I also wonder about the need for the do loop?  Is it sufficient to simply set
> |ov| and/or |nv| once, then loop waiting for a successful __kernel_cmpxchg_t?

If you only set them once you end up in an (more or less) infinite loop the first time some other thread gets in between setting |ov| and calling the cmpxchg since then |ov| will not match the value of the atomic variable anymore and the cmpxchg will not succeed unless some other thread changes the atomic value back to what it was previously.
any update Ilpo?
Reporter

Comment 8

11 years ago
Posted patch Patch v2 (obsolete) — Splinter Review
Attachment #290686 - Attachment is obsolete: true

Updated

11 years ago
Attachment #310211 - Flags: review?(wtc)
Ilpo, is see this in my nsprpub/config/autoconf.mk

USE_ARM_KUSER   = @USE_ARM_KUSER@

I am pretty sure that this should have either been set or not set depending on my configure flags.
Reporter

Comment 10

11 years ago
(In reply to comment #9)
> Ilpo, is see this in my nsprpub/config/autoconf.mk
> 
> USE_ARM_KUSER   = @USE_ARM_KUSER@
> 
> I am pretty sure that this should have either been set or not set depending on
> my configure flags.

Did you remember to regenerate nsprpub/configure?
yes, thanks.  I have been using this for a few weeks now without any problem.  Ilpo, did you happen to get any perf numbers?

Wtc, any chance to look at this new patch?
(In reply to comment #11)
> did you happen to get any perf numbers?

I have some on an ARM1176 development platform (CPU 400 MHz)
Linux version 2.6.21 (gcc version 4.2.1 (CodeSourcery Sourcery G++ Lite 2007q3-51))
Xulrunner (1.9b4pre) with a very simple UI
 
## Browser test suite based on iBench:
             No patch        Ilpo's patch     gain
Total Time:  1713.095 sec    1635.665 sec     4.5%
First time:   215.58  sec     205.89  sec     4.5%
Cached time:  213.935 sec     204.255 sec     4.5%

## Dromaeo.com:

With Ilpo's patch: http://dromaeo.com/?id=5263
Without patch: http://dromaeo.com/?id=6490

http://dromaeo.com/?id=6490,5263
Total:	259234.00ms ±4.86%	218554.80ms ±2.67%	winning 19%

Because Ilpo's patch was not a CVS diff, a number of the patch reviewing 
tools on bugzilla don't work with it.  This patch is exactly the same 
patch as the previous one, but in the form of a CVS unified diff.
Comment on attachment 310211 [details] [diff] [review]
Patch v2

This patch evidently uses some non-standard non-portable extensions of the 
c language that (I gather) must be common to all gcc compilers, but are 
certainly not accepted by most other c compilers.  There's precedent for 
doing this in this same header file, for alpha.  I guess we don't foresee 
using any other compiler on Linux any time soon.  

I would think that using the single-variable expression statement at the
end of the basic block would (or could) be optimized away.  This code 
appears to rely on the assumption that it will not be optimized away.
The alpha code does not appear to rely on that assumption.  Perhaps 
declaring that variable as volatile would avoid that possibility.

There's a chance of variable name collision causing unexpected results.
If (for example) someone invoked one of these macros with a variable 
named vp or ov (or nv), the results would not be as expected.  I recommend 
prepending the names of the local variables in these basic blocks with
some number of underscores (e.g. __ov, __nv), as was done in the alpha 
code, to reduce the chance of such collisions.  

Fixing the potential variable name collision problem is a must. 
If you're confident that the single variable expression statement cannot
be optimized away, then that can stay as is.
Attachment #310211 - Flags: review-
After thinking about this some more, it seems to me that there is another
way to do this that does not rely on so many assumptions about optimizers
or non-standard c language extensions.  It is to declare these functions
as __inline__ functions instead of them being macros.  

I'm thinking of something roughly like this:

__inline__ PRInt32 _MD_ATOMIC_ADD(PRInt32 *ptr, PRInt32 n)
{
    PRInt32 ov, nv;
    volatile PRInt32 *vp = (ptr);

    /* The whole sequence must be restarted if __kernel_cmpxchg() fails 
     * because some other thread may have changed the value of *vp. */  
    do { 
        ov = *vp; 
        nv = ov + n; 
    } while (__kernel_cmpxchg(ov, nv, vp)); 
    return nv;
}

#define _MD_ATOMIC_INCREMENT(ptr) _MD_ATOMIC_ADD(ptr, 1)
#define _MD_ATOMIC_DECREMENT(ptr) _MD_ATOMIC_ADD(ptr, -1)

This approach doesn't depend on allowing executable basic blocks 
inside of expressions (e.g. ({ code })
and doesn't rely on a single-variable expression statement that is 
the last statement in a basic block not being optimized away or 
being the "return value" of the block/expression.  
And it avoids the variable name collision potential completely.

If this approach produces comparable performance to the approach taken
in the previous patch, I'd much prefer it.
Assignee

Comment 16

11 years ago
Comment on attachment 310211 [details] [diff] [review]
Patch v2

Thanks for the patch.

Could you skip the USE_ARM_KUSER makefile variable and just
do AC_DEFINE(_PR_ARM_KUSER) directly in configure.in?  (It's
fine to use the shell variable USE_ARM_KUSER in configure.in.)

In NSPR, we put the closing brace } and while on the same line:

    } while (__kernel_cmpxchg(ov, (nv), vp));
Attachment #310211 - Flags: review?(wtc) → review-
I'd like to see this checked in as soon as we can; here's an updated patch based on the review comments.  The generated code in my simple tests is identical to the macro version; I haven't run actual perf tests, though.  Also fixed the configure/makefile issue; I just AC_DEFINE if the configure flag is specified, don't think we need the shell var anywhere.
Attachment #317821 - Flags: review?(nelson)
Assignee

Comment 18

11 years ago
I changed --enable-arm-kuser to --with-arm-kuser.  Let me
know if --enable is more appropriate.

I fixed the indentation offset (which is 4 in NSPR and NSS).

I would have defined __kernel_cmpxchg_t and __kernel_cmpxchg as:
typedef int (*__kernel_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
#define __kernel_cmpxchg ((__kernel_cmpxchg_t)0xffff0fc0)

But I found that the definitions in this patch come from
the example in the Linux/ARM kernel source code, so I kept
them.

I checked in the patch on the NSPR trunk (NSPR 4.7.1).

Checking in configure.in;
/cvsroot/mozilla/nsprpub/configure.in,v  <--  configure.in
new revision: 1.232; previous revision: 1.231
done
Checking in configure;
/cvsroot/mozilla/nsprpub/configure,v  <--  configure
new revision: 1.228; previous revision: 1.227
done
Checking in pr/include/md/_linux.h;
/cvsroot/mozilla/nsprpub/pr/include/md/_linux.h,v  <--  _linux.h
new revision: 3.49; previous revision: 3.48
done
Attachment #310211 - Attachment is obsolete: true
Attachment #316713 - Attachment is obsolete: true
Attachment #317821 - Attachment is obsolete: true
Attachment #317821 - Flags: review?(nelson)
Assignee

Updated

11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → 4.7.1
(In reply to comment #18)
> I changed --enable-arm-kuser to --with-arm-kuser.  Let me
> know if --enable is more appropriate.

with should be fine.

> I fixed the indentation offset (which is 4 in NSPR and NSS).

Note that http://mxr.mozilla.org/mozilla/source/nsprpub/pr/include/md/_linux.h#1 indicates 2 -- my editor's just doing what the file says.  :)  You may want to update the modelines.

> I checked in the patch on the NSPR trunk (NSPR 4.7.1).

Thanks!  I'll do a separate patch to get this flag passed down from Gecko's toplevel configure.

Assignee

Comment 20

11 years ago
Vlad, I will fix the Emacs mode line in _linux.h.  The mode line
was given to us when Netscape released its browser source code in
1998.  Since none of the NSPR developers use Emacs, we just added
the mode line to our files without looking closely.

In comment #18, I wrote:
> I would have defined __kernel_cmpxchg_t and __kernel_cmpxchg as:
> typedef int (*__kernel_cmpxchg_t)(int oldval, int newval, volatile int *ptr);
> #define __kernel_cmpxchg ((__kernel_cmpxchg_t)0xffff0fc0)

I now understand why __kernel_cmpxchg has to be defined as:
#define __kernel_cmpxchg (*(__kernel_cmpxchg_t *)0xffff0fc0)

I believe that 0xffff0fc0 is not the address of the __kernel_cmpxchg
function but rather the address of a word that stores the address of
the function.
Comment on attachment 317838 [details] [diff] [review]
updated patch with review comments (as checked in)

Thanks, Vlad and Wan-Teh, for incorporating my comments.
Attachment #317838 - Flags: review+
You need to log in before you can comment on or make changes to this bug.