Closed Bug 601914 Opened 10 years ago Closed 9 years ago

XPCOM does not work with ARM hardfp ABI

Categories

(Core :: XPCOM, defect)

Other
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0b2+ ---

People

(Reporter: tero.koskinen, Assigned: siarhei.siamashka)

References

Details

Attachments

(5 files, 2 obsolete files)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; fi-FI; rv:1.9.2.10) Gecko/20100915 Ubuntu/10.04 (lucid) Firefox/3.6.10
Build Identifier: 

XPCOM does not work with the hardware floating ABI of ARM.

Reproducible: Always

Steps to Reproduce:
1. Have a toolchain and a platform with hard floating point ARM ABI
2. Build mozilla-central
3. Build TestXPTCInvoke
4. Run TestXPTCInvoke on ARM device

Actual Results:  
Crash.


Expected Results:  
No crash.



Some output and backtrace:
P1 : 3
P2 : 5
P3 : 7
P4 : 11
P5 : 13
P6 : 17
P7 : 19
P8 : 23
P9 : 29
P10: 31
ret: 0xaed33b48
        3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29 + 31 = 158
        1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 = 66.000000
        moo cow = milk
should fail failed, returned 80004003
calling via invoke:
        1 + 1 = 2
        1L + 1L = 2
        2 * 2 = 4
        2L * 2L = 4
        1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55

Program received signal SIGSEGV, Segmentation fault.
0x00008e8c in InvokeTestTarget::AddTwoFloats (this=0x2d2b8, p1=0, 
    p2=3.2578125, retval=0x3f800000)
    at
/home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp:231
231    
/home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp:
No such file or directory.
        in
/home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp
(gdb) bt
#0  0x00008e8c in InvokeTestTarget::AddTwoFloats (this=0x2d2b8, p1=0, 
    p2=3.2578125, retval=0x3f800000)
    at
/home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp:231
#1  0x3ce8d710 in NS_InvokeByIndex_P ()
   from /usr/lib/xulrunner-2.0b5pre/libxul.so
#2  0x3ce8d710 in NS_InvokeByIndex_P ()
   from /usr/lib/xulrunner-2.0b5pre/libxul.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
Here is a partial patch for this issue.

The patch allows Fennec to run on ARM with hardware floating point ABI enabled.
I haven't yet tried, does the TestXPTCInvoke work fully.

Note: this patch unconditionally enables hard fp ABI for ARM.
OS: Linux → MeeGo
Hardware: Other → ARM
Version: unspecified → Trunk
Blocks: 583135
OS: MeeGo → Linux
Hardware: ARM → Other
Version: Trunk → unspecified
(In reply to comment #1)
> Note: this patch unconditionally enables hard fp ABI for ARM.

We could check __ARM_PCS/__ARM_PCS_VFP from http://gcc.gnu.org/viewcvs?view=revision&revision=162637 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45886)
(In reply to comment #1)
> Created attachment 480894 [details] [diff] [review]
> Partial fix for XPCOM (includes debug stuff and is missing some functionality)
> 
> Here is a partial patch for this issue.

I understand that bugs and missing functionality is expected at this stage. Here are some problems that I see in this patch at the moment: 

+        case nsXPTType::T_FLOAT  :
+            if (float_count < 16) {
+                copy_float_param(float_count, (PRUint32 *)&s->val.f);
+                d--;
+            } else {
+                *((float*)   d) = s->val.f;
+            }
+            float_count += 1;
+            break;

Probably it is better to postpone setting floating point registers until the
very function call. Just to be sure that the compiler will never corrupt them
while running the rest of C code of this function (for example by the code in
the else branch here, where float data type is used itself).

+            if ((float_count + 1) / 2 < 8) {
+                copy_double_param((float_count + 1) / 2, (PRUint64
*)&s->val.d);
+                d--;
+            } else {
+                d = copy_double_word(stk, d, end, (PRUint64 *)&s->val.d);
+            }
+            float_count += 2;

I'm afraid that the registers allocation part is a bit more tricky. For example
let's consider this code:

/*****************/
float x[3];

void f(float a, double b, float c)
{
        x[0] = a;
        x[1] = b;
        x[2] = c;
}
/*****************/

gcc -O2 -fomit-frame-pointer -c test.c
objdump -d test.o

00000000 <f>:
   0:   eef77bc1        vcvt.f32.f64    s15, d1
   4:   e3003000        movw    r3, #0
   8:   e3403000        movt    r3, #0
   c:   edc30a02        vstr    s1, [r3, #8]
  10:   ed830a00        vstr    s0, [r3]
  14:   edc37a01        vstr    s15, [r3, #4]
  18:   e12fff1e        bx      lr

Here we have the following registers mapping:
a : s0
b : d1
c : s1

Your code is going to assign them as:
a : s0
b : d1
c : s3
Calling conventions are described in the following document, in section "5.5 Parameter Passing": http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf
Comment on attachment 480894 [details] [diff] [review]
Partial fix for XPCOM (includes debug stuff and is missing some functionality)

A few comments:

----

(I think you know the ABI stuff, but I'm documenting an example for
other readers and for reference. This particular example won't work
with your current solution.)

Consider the following target function:
    void some_func( void *  this,   // Implicit in C++.
                    int32_t a,
                    int32_t b,
                    float   p,
                    int32_t c,
                    int32_t d,
                    int32_t e,
                    double  q);

Registers are assigned in order:

 this   -> r0       \
 a      -> r1       | As in softfp.
 b      -> r2       /
 p      -> s0       <- Goes into a VFP register.
 c      -> r3       <- Next consecutive integer register.
 d      -> sp[0]    <- No integer argument registers left, so stack.
 e      -> sp[1]    <- Next stack slot.
 q      -> d1(s3:s2)<- 'double', so use next aligned VFP register.

Your solution addresses this by hijacking the stack write at the
appropriate point in invoke_copy_to_stack. However, the VFP registers
may well be clobbered between the call to copy_double_param and the call
to "func" in NS_InvokeByIndex, so the procedure isn't safe.

The integer argument registers (r1-r3) would normally have the same
problem, and that is why this routine used to be written in assembly.
However, the call to "func" is given three integer arguments which will
always end up in r1-r3, so the compiler gives us the copy implicitly.

You might be able to solve the problem with a call like this:

    return func(that, stack_space[base_size * 2 - 3],
                      stack_space[base_size * 2 - 2],
                      stack_space[base_size * 2 - 1],
                      double_stack[0],
                      double_stack[1],
                      double_stack[2],
                      double_stack[3],
                      double_stack[4],
                      double_stack[5],
                      double_stack[6],
                      double_stack[7]);

That clearly requires creating an 8-double-long buffer somewhere. Your
hijacked copy_float_param function could write into that buffer, rather
than into float registers.

GCC may also generate non-optimal code for this, so hand-writing some
assembly might be a good idea. (That can be left for another day if your
priorities are to get things working.) The motivation behind porting to
pure C is so that it would build for ARMv4T whilst providing optimal
calling code for newer platforms. (ARMv4T doesn't have the "blx"
instruction required for efficient calls on newer platforms.) With the
"hard" ABI, you already have an implicit requirement for a modern
platform anyway.

----

Writing the code in a new file may be a good idea if you have to #ifdef
a large number of modifications. (It depends on how much you need to
make conditional, though.) Something like "xptcinvoke_arm_hardfp.cpp"
might be appropriate, if you need to do this.

----

It's probably made irrelevant by my previous suggestion, but the
following will almost certainly perform poorly:

+        "cmp    %1, #0\n\t"
+        "flddeq d0, [%0]\n\t"
+        "cmp    %1, #1\n\t"
+        "flddeq d1, [%0]\n\t"
+        "cmp    %1, #2\n\t"
+        "flddeq d2, [%0]\n\t"
+        "cmp    %1, #3\n\t"
+        "flddeq d3, [%0]\n\t"
+        "cmp    %1, #4\n\t"
+        "flddeq d4, [%0]\n\t"
+        "cmp    %1, #5\n\t"
+        "flddeq d5, [%0]\n\t"
+        "cmp    %1, #6\n\t"
+        "flddeq d6, [%0]\n\t"
+        "cmp    %1, #7\n\t"
+        "flddeq d7, [%0]\n\t"

I think you'd be better off writing to a buffer and using a single VLDM
to load the arguments. (You'll have to do that anyway, if you use the
"func" call modification that I proposed.)
(In reply to comment #3)
> [...]
(In reply to comment #4)
> [...]

Or, what Siarhei said! (I didn't check the bug before pushing "submit".)
The toolchain must support __ARM_PCS/__ARM_PCS_VFP predefined symbols for it to work. As a temporary solution for testing purposes, __ARM_PCS_VFP can be defined in some makefile.

I had to use naked function fully implemented in assembly because alloca stuff seems to be very fragile. It often breaks as a result of some more or less innocent looking changes in the code.

Jacob, could you please review this patch?
Attachment #481102 - Flags: review?(Jacob.Bramley)
Comment on attachment 481102 [details] [diff] [review]
Hopefully fully functional arm hardfloat support in xptcinvoke (try1)

Tested... TestXPTCInvoke pass all tests with this patch.
Attachment #481102 - Flags: feedback+
> make conditional, though.) Something like "xptcinvoke_arm_hardfp.cpp"
> might be appropriate, if you need to do this.
> 
new file require some Makefile stub define, and configure test for hardfp ABI.

Also some hardfp toolchain detection need to be done for JS JIT in js/src
Comment on attachment 481102 [details] [diff] [review]
Hopefully fully functional arm hardfloat support in xptcinvoke (try1)


>@@ -21,6 +21,7 @@
>  *
>  * Contributor(s):
>  * Mike Hommey <mh@glandium.org>
>+ * Nokia Corporation

I believe contributor here should be

>+ * Siarhei Siamashka <siarhei.siamashka@gmail.com>
Status: UNCONFIRMED → NEW
Ever confirmed: true
tracking-fennec: --- → 2.0+
(In reply to comment #10)
> I believe contributor here should be
> 
> >+ * Siarhei Siamashka <siarhei.siamashka@gmail.com>

There are examples of corporate contributions going purely under the company or group names (such as "Adobe AS3 Team"). Is it a general Mozilla recommendation to use individual names? (I just want to know for future reference.)
IIUC we need also change xptcstubs_arm.cpp... Do we have any tests for this one?
(In reply to comment #12)
> IIUC we need also change xptcstubs_arm.cpp... Do we have any tests for this
> one?

I couldn't find any directly, but there is js/src/xpconnect/tests/TestXPC(.cpp) which might test xptcstubs_arm.cpp code also.
Comment on attachment 481102 [details] [diff] [review]
Hopefully fully functional arm hardfloat support in xptcinvoke (try1)


In general, I like this approach, and it's clear what the code does.
I've made a few comments below on specific points. I'm giving "r-"
purely because there are a few points below and I'd like to see v2.

>+static inline void copy_word(PRUint32* &ireg_args,
>+                             PRUint32* &stack_args,
>+                             PRUint32* end,
>+                             PRUint32  data)
>+{
>+    if (ireg_args < end)
>+        *ireg_args++ = data;
>+    else
>+        *stack_args++ = data;
>+}

This looks correct, but I'm not sure about the coding style. I always
use braces, and I'd list the increment on a separate line for clarity.
Mozilla's style guide is here: https://developer.mozilla.org/En/Mozilla_Coding_Style_Guide
Having said that, style does vary from component to component, and I
don't do much work in XPCOM so others probably know better.

(There are other instances, but I don't want to dwell on it.)

>+static inline void copy_dword(PRUint32* &ireg_args,
>+                              PRUint32* &stack_args,
>+                              PRUint32* end,
>+                              PRUint64  data)
>+{

This function could do with a comment to explain the invariants and how
they are satisfied. For example, you know that 'end' is aligned to 8
bytes (despite being a PRUint32*), so it is safe to check just
ireg_args+1 and yet (potentially) write three words onto it.

>+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args,
>+                                   float* end, float data)
>+{
>+    if (vfp_s_args < end)
>+    {
>+        *vfp_s_args++ = data;
>+        if (vfp_s_args < (float *)vfp_d_args)
>+            vfp_s_args = (float *)vfp_d_args; /* back-filling case */
>+        else if (vfp_s_args > (float *)vfp_d_args)
>+            vfp_d_args++;
>+        return true;
>+    }
>+    return false;
>+}
>+
>+static inline bool copy_vfp_double(float* &vfp_s_args, double* &vfp_d_args,
>+                                   float* end, double data)
>+{
>+    if (vfp_d_args < (double *)end)
>+    {
>+        if (vfp_s_args == (float *)vfp_d_args)
>+            vfp_s_args += 2;
>+        *vfp_d_args++ = data;
>+        return true;
>+    }
>+    // failed to allocate register even once -> can't use vfp registers anymore
>+    vfp_s_args = end;
>+    return false;
>+}

The code would perhaps be clearer if you made (for example) copy_f32 and
copy_f64 functions, as you did for the integer code, and then maintain
only one vfp_args pointer. The alignment rules are the same as for
integer code. Currently, copy_vfp_double works differently to
copy_dword, yet it performs essentially the same function but with a
different data type.

>+EXPORT_XPCOM_API(nsresult)
>+NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex,
>+                   PRUint32 paramCount, nsXPTCVariant* params)

You've got a lot of casts in this function (and its associated macro),
but you've defined your own types. Since you're passing to assembly
anyway, why not just start with the types that already fit what you
want? (This looks like an artefact from the softfp variant.)

Actually, since you drive everything from your assembly function, you
could just put do_call_asm directly in NS_InvokeByIndex. (It doesn't
need to be naked, as long as you specify the appropriate asm
dependencies.)

Untested example:

EXPORT_XPCOM_API(nsresult)
NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex,
                   PRUint32 paramCount, nsXPTCVariant* params)
{
    register PRUint32   result asm("r0");
    asm (
#if defined(__GXX_ABI_VERSION) && __GXX_ABI_VERSION >= 100 /* G++ V3 ABI */
        "   ldr         FUNC, [THAT, METHODINDEX]\n"
#else /* non G++ V3 ABI */
        "   add         ip, METHODINDEX, THAT\n"
        "   ldr         FUNC, [ip, #8]\n"
#endif
        "   [...]       @ Implementation as before."
        :   [RESULT] "=&r" (result)
        :   [THAT] "r" (that),
            [METHODINDEX] "r" (methodIndex * sizeof(vtable)),
            [...]
        :   [...]
        );
    return result;
}

>+static void __attribute__((naked)) do_call_asm()
>+{
>+    asm volatile (
>+        /* define some aliases for better readability */
>+        "FUNC                 .req    r4\n"
>+        "THAT                 .req    r5\n"
>+        "PARAMS               .req    r6\n"
>+        "PARAM_COUNT_PLUS_2   .req    r7\n"
>+        "INVOKE_COPY_TO_STACK .req    r8\n"
>+        "STACK_SPACE_SIZE     .req    r9\n"
>+
>+        /* function entry */
>+        "ldr    ip, [sp]\n"
>+        "push   {r4-r11, ip, lr}\n" /* save all registers */

With the exception of dummy values for aligning the stack pointer, only
save what you use. You don't seem to use r10-r11.

If you put this in-line (as I suggested above), you should just add the
registers to the clobber list, and let the compiler stack them for you.
Putting the code in-line imposes a slight inconvenience, however, as the
compiler might not guarantee that the stack is 8-byte-aligned for the
asm block. This is easily fixed by adding (SP & 4) to STACK_SPACE_SIZE.

>+        /* save current vfp registers (don't want to load some garbage later) */
>+        "add    ip, sp, PARAM_COUNT_PLUS_2, lsl #3\n"
>+        "vstm   ip, {d0, d1, d2, d3, d4, d5, d6, d7}\n"

Why preserve these? They contain nothing useful at this point and are
caller-saved registers, so we can clobber them here.

>+        /* call func */
>+        "mov    r0, THAT\n"
>+        "add    ip, sp, PARAM_COUNT_PLUS_2, lsl #3\n"
>+        "vldm   ip, {d0, d1, d2, d3, d4, d5, d6, d7}\n"
>+        "ldmdb  ip!, {r1, r2, r3}\n"

There's no need for the update on IP. (It's harmless, but wasted effort.)
Attachment #481102 - Flags: review?(Jacob.Bramley) → review-
I've Added Mike Hommey on CC because he already went through the process of updating this code for ARMv4T and may have some insight.
Comment on attachment 481102 [details] [diff] [review]
Hopefully fully functional arm hardfloat support in xptcinvoke (try1)

>+#else /* __ARM_PCS_VFP */

Somewhere near here i'd like to see something like this:

Calling conventions are described in the following document, in section "5.5
Parameter Passing":
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf


>+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args,
>+                                   float* end, float data)
>+{
>+    if (vfp_s_args < end)
>+    {
>+        *vfp_s_args++ = data;
>+        if (vfp_s_args < (float *)vfp_d_args)
>+            vfp_s_args = (float *)vfp_d_args; /* back-filling case */

Could you possibly explain as a comment before this function what's going on here and how the back-filling case works/what it means?

>+        else if (vfp_s_args > (float *)vfp_d_args)
>+            vfp_d_args++;
>+        return true;
>+    }

Generally in mozilla code we try to return early on failure, it saves on indentation:
>+    return false;

so instead you'd have:

if (vfp_s_args > end)
  return false;

<main code path here>

>+}

>+static inline bool copy_vfp_double(float* &vfp_s_args, double* &vfp_d_args,
>+                                   float* end, double data)
>+{

again, having early returns for failure is generally preferred 

>+    if (vfp_d_args < (double *)end)
>+    {
>+        if (vfp_s_args == (float *)vfp_d_args)
>+            vfp_s_args += 2;
>+        *vfp_d_args++ = data;
>+        return true;
>+    }

>+    // failed to allocate register even once -> can't use vfp registers anymore

I can't understand this comment as it's written, please try to clarify.
Note that comments do not have to be terse.

>+    vfp_s_args = end;
>+    return false;
>+}

while this macro uses two space indentation (which doesn't match the file),
>+#define do_call(func, that, params, paramCount, invoke_copy_to_stack)    \
>+  ((PRUint32 (*)(void (*)(), void *, void *, PRUint32, void (*)()))      \
>+   do_call_asm)                                                          \
>+  (reinterpret_cast<void (*)()>(func), (that), (params), (paramCount),   \
>+  reinterpret_cast<void (*)()>(invoke_copy_to_stack))

the file is using four space indentation, and this is a function in it, i'd rather both the macro-function above and the function below match the file and uses 4 space.
>+EXPORT_XPCOM_API(nsresult)
>+NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex,
>+                   PRUint32 paramCount, nsXPTCVariant* params)
>+{
>+  vtable_func *vtable, func;
Attachment #481102 - Flags: feedback-
(In reply to comment #10)
> >  * Contributor(s):
> >  * Mike Hommey <mh@glandium.org>
> >+ * Nokia Corporation
> 
> I believe contributor here should be
> 
> >+ * Siarhei Siamashka <siarhei.siamashka@gmail.com>

Thanks for your comment. I'll try to find out whether Nokia actually wants the copyright/credit for this particular patch or I can keep it for myself (I'm a Nokia employee at the moment for what it matters).
Note that this is entirely about copyright, not "credit". I believe that Nokia is automatically the copyright owner unless it assigns it, so Nokia is correct.
(In reply to comment #14)
> >+static inline void copy_dword(PRUint32* &ireg_args,
> >+                              PRUint32* &stack_args,
> >+                              PRUint32* end,
> >+                              PRUint64  data)
> >+{
> 
> This function could do with a comment to explain the invariants and how
> they are satisfied. For example, you know that 'end' is aligned to 8
> bytes (despite being a PRUint32*), so it is safe to check just
> ireg_args+1 and yet (potentially) write three words onto it.

OK, such additional comments can be added.
 
> >+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args,
> >+                                   float* end, float data)

> >+static inline bool copy_vfp_double(float* &vfp_s_args, double* &vfp_d_args,
> >+                                   float* end, double data)
> 
> The code would perhaps be clearer if you made (for example) copy_f32 and
> copy_f64 functions, as you did for the integer code, and then maintain
> only one vfp_args pointer. The alignment rules are the same as for
> integer code. Currently, copy_vfp_double works differently to
> copy_dword, yet it performs essentially the same function but with a
> different data type.

There is a substantial difference. VFP registers allocation has to support "back-filling", while integer code does not need to do this.

For example "void f(int a, long long b, int c, int d)" will have:
a = r0
b = r2:r3 pair
c = [sp]
d = [sp, #4]

But "void f(float a, double b, float c, float d)" will have:
a = s0
b = d1 (s2:s3 pair)
c = s1 <--- THIS IS IMPORTANT (we are back-filling the gap)
d = s4

I have attached a test program which I used to compare VFP registers allocation code against a simple but slow implementation done according to the spec.

I wonder if it would be a good idea to actually add a simple reference implementation to the patch too (inactive, but switchable via ifdefs)?

Another important test case is function "void f(double d0, double d1, double d2, double d3, double d4, double d5, double d6, float s14, double this_goes_to_stack, float this_goes_to_stack_too_instead_of_s15)"

Actually current XPCOM test suite does not stress some of the interesting cases and may need to be improved.

> >+EXPORT_XPCOM_API(nsresult)
> >+NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex,
> >+                   PRUint32 paramCount, nsXPTCVariant* params)
> 
> You've got a lot of casts in this function (and its associated macro),
> but you've defined your own types. Since you're passing to assembly
> anyway, why not just start with the types that already fit what you
> want? (This looks like an artefact from the softfp variant.)

I guess these types have some meaning, so getting rid of them without a good reason (just because we can) may be not so intuitive.

> >+        /* save current vfp registers (don't want to load some garbage later) */
> >+        "add    ip, sp, PARAM_COUNT_PLUS_2, lsl #3\n"
> >+        "vstm   ip, {d0, d1, d2, d3, d4, d5, d6, d7}\n"
> 
> Why preserve these? They contain nothing useful at this point and are
> caller-saved registers, so we can clobber them here.

Yeah, appears it's a case of "cargo cult programming" :) I somehow initially thought that loading random garbage values found on stack via VLDM instruction later (for the gaps which are not filled) may be not a very good idea. But since VLDM itself is not going to trigger floating point exceptions, this indeed can be just dropped.

The rest of your comments about assembly code basically boils down to "you can do it with less instructions and less registers used". Yes, I agree :) Will improve (maybe also taking care of perfect instructions scheduling). I just thought that this part did not strictly require performance optimizations for the initial revision (unlike VFP registers allocation code which resides in the inner loop).


I'll also double check it for the potential quirks like theoretical big endian compatibility, proper thumb interworking on older ARM cores, etc. Though the patch should work quite fine at least on little endian armv7 compatible processors even now.
(In reply to comment #19)
> There is a substantial difference. VFP registers allocation has to support
> "back-filling", while integer code does not need to do this.

Oh, you are correct. Apologies!

> The rest of your comments about assembly code basically boils down to "you can
> do it with less instructions and less registers used". Yes, I agree :) Will
> improve (maybe also taking care of perfect instructions scheduling). I just
> thought that this part did not strictly require performance optimizations for
> the initial revision (unlike VFP registers allocation code which resides in the
> inner loop).

It's true that making it work is the priority, but I believe this is hot code so it's worth making it fast too if we can.

----

I think removing the do_call_asm macro could go some way to clarifying the code, but that can be picked up in an optimization phase anyway if you prefer.
(In reply to comment #16)
> >+#else /* __ARM_PCS_VFP */
> 
> Somewhere near here i'd like to see something like this:
> 
> Calling conventions are described in the following document, in section "5.5
> Parameter Passing":
> http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf

Unfortunately the web links to ARM documentation are (or used to be) "polymorphic" and it is quite likely that this link will be dead in a year or so. But I agree, at least the name of this document can be mentioned.

> >+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args,
> >+                                   float* end, float data)
> >+{
> >+    if (vfp_s_args < end)
> >+    {
> >+        *vfp_s_args++ = data;
> >+        if (vfp_s_args < (float *)vfp_d_args)
> >+            vfp_s_args = (float *)vfp_d_args; /* back-filling case */
> 
> Could you possibly explain as a comment before this function what's going on
> here and how the back-filling case works/what it means?

Just read the spec, Luke ;) But seriously, this code is indeed a bit difficult to read and review because it tries to cut some corners and do things faster (though I'm not claiming that it is the fastest possible implementation). See comment 19 for a bit more explanations and a test program.

> 
> >+        else if (vfp_s_args > (float *)vfp_d_args)
> >+            vfp_d_args++;
> >+        return true;
> >+    }
> 
> Generally in mozilla code we try to return early on failure, it saves on
> indentation:
> >+    return false;

Putting this code to inline functions was already mostly done for the purpose of keeping indentation sane. Just two levels of indentation for a very small function is IMHO more than fine. But as you insist, I can surely do this.

> >+    // failed to allocate register even once -> can't use vfp registers anymore
> 
> I can't understand this comment as it's written, please try to clarify.
> Note that comments do not have to be terse.

Yeah, maybe a direct quote from the spec would be a lot better here: "The back-filling continues only so long as no VFP CPRC has been allocated to a slot on the stack."

Thanks to everyone who found time to review the patch! The v2 is coming.
tracking-fennec: 2.0+ → 2.0-
tracking-fennec: 2.0- → 2.0+
(In reply to comment #14)
> Untested example:
> 
> EXPORT_XPCOM_API(nsresult)
> NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex,
>                    PRUint32 paramCount, nsXPTCVariant* params)
> {
>     register PRUint32   result asm("r0");

Looks like this is unsafe: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32820#c7
Probably it's better to avoid such fancy features as explicitly forcing variables into specific registers.
Let's try it this way:
- Coding style fixes
- More comments
- Naked function replaced with inline assembly block
Attachment #481102 - Attachment is obsolete: true
Attachment #482332 - Flags: review?(Jacob.Bramley)
(In reply to comment #22)
> (In reply to comment #14)
> >     register PRUint32   result asm("r0");
> 
> Looks like this is unsafe: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32820#c7

Oh, that's disappointing. It's what the manual recommends!
There are numerous bugs with inline assembly. The trick is to know them and avoid problems. One of the solutions would be to turn 'NS_InvokeByIndex' into a naked function, but such functions with arguments are also potentially unsafe:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290
There were other problems with 'naked' attribute other than this bug, generally it is only safe to use naked function without any arguments (and do some function type casting like in my first patch), otherwise gcc may easily screw it up especially with optimizations disabled, frame pointers enabled or in some other not quite typical configurations.

Another unfixed inline assembly bug to be aware of: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440
(In reply to comment #13)
> (In reply to comment #12)
> > IIUC we need also change xptcstubs_arm.cpp... Do we have any tests for this
> > one?
> 
> I couldn't find any directly, but there is js/src/xpconnect/tests/TestXPC(.cpp)
> which might test xptcstubs_arm.cpp code also.

This test uses stubs, but does not seem to use any floating point arguments there (as shown by debugging print added into 'PrepareAndDispatch' function). Trying "make -C objdirXXX/ xpcshell-tests" also does not reveal any obvious uses of floating point arguments.

Does anybody know what would be the easiest way to add a new test which could stress floating point arguments with XPCOM stubs?
One alternative is to just add hardfp support to the stubs code, artificially test it in some other way, and hope that it will be all right.
Assignee: nobody → siarhei.siamashka
Comment on attachment 482332 [details] [diff] [review]
ARM hardfloat support for xptcinvoke (try2)

Jacob could you review this patch?
Comment on attachment 482332 [details] [diff] [review]
ARM hardfloat support for xptcinvoke (try2)

Sorry for the slow review!

The patch looks good. The only thing missing is that you should specify d0-d7 and d15-d31 in the clobber list for the asm block. (They might be clobbered by the call.)

Adding d15-d31 may be tricky; I don't know if you can query for VFPv3-D32 support in the pre-processor. If it's not possible, I'd just leave it out. As long as the InvokeByIndex function is not in-lined, and as long as it doesn't use VFP value itselfs, d16-d31 should never be used by the compiler anyway, and it won't matter if the call clobbers them.

r+ with d0-d7. I'm not convinced that it'll be possible to specify d15-d31 reliably, so you can leave them off for now.
Attachment #482332 - Flags: superreview?(benjamin)
Attachment #482332 - Flags: review?(vladimir)
Attachment #482332 - Flags: review?(Jacob.Bramley)
Attachment #482332 - Flags: review+
(In reply to comment #29)
> As long as the InvokeByIndex function is not in-lined, and as long as it
> doesn't use VFP value itselfs, d16-d31 should never be used by the compiler 
> anyway, and it won't matter if the call clobbers them.

That's actually a good point. With LTO becoming more popular recently, taking special care to prevent inlining may be also needed. Would also adding __attribute__((noinline)) to InvokeByIndex function make sense?

And now I see that surely some more comments explaining why it is done this way would be a good addition.

> r+ with d0-d7. I'm not convinced that it'll be possible to specify d15-d31
> reliably, so you can leave them off for now.

Thanks. Should I attach v3 patch and request a review for it again?
(In reply to comment #29)
> Adding d15-d31 may be tricky; I don't know if you can query for VFPv3-D32
> support in the pre-processor. If it's not possible, I'd just leave it out.

Actually it seems that gcc currently accepts d15-d31 registers in the clobber list even for -mfpu=vfp and -mfpu=vfpv3-d16. If they are just added as is, in the worst case it would be a build failure (which is a lot better than some obscure runtime bug). So would it be the best to have d15-d31 registers in the clobber list unconditionally with a descriptive comment there? So that if gcc becomes more strict in the future, anyone even not familiar with arm assembler would still get the idea about what is going on and how to workaround it.

I have not found any gcc preprocessor symbol that could be used to detect the difference between d16 and d32 variants. Assuming that it does not exist yet, asking gcc developers to provide one may be a good idea.
(In reply to comment #31)
> So would it be the best to have d15-d31 registers in the
> clobber list unconditionally with a descriptive comment there?

That sounds like a good solution to me.

> I have not found any gcc preprocessor symbol that could be used to detect the
> difference between d16 and d32 variants. Assuming that it does not exist yet,
> asking gcc developers to provide one may be a good idea.

Agreed. I confirmed that there is no such symbol, but there really should be.

(In reply to comment #30)
> Thanks. Should I attach v3 patch and request a review for it again?

General policy is that minor changes — such as tweaks based on comments in a review or simple rebasing — do not need a re-review. However, if you don't have commit rights then you'll need to post a new patch anyway. Also, you've got a couple more reviews waiting yet so you might as well tag a v3 patch.
- added VFP registers to the inline assembly clobber list as requested by Jacob
- more comments
Attachment #482332 - Attachment is obsolete: true
Attachment #482332 - Flags: superreview?(benjamin)
Attachment #482332 - Flags: review?(vladimir)
Attachment #485722 - Flags: superreview?(benjamin)
Attachment #485722 - Flags: review?(vladimir)
Comment on attachment 485722 [details] [diff] [review]
ARM hardfloat support for xptcinvoke (v3)

I don't know ARM assembly and am not a qualified reviewer. I'm happy to delegate to vlad.
Attachment #485722 - Flags: superreview?(benjamin)
Comment on attachment 485722 [details] [diff] [review]
ARM hardfloat support for xptcinvoke (v3)

You probably want jacob to r+ this one as well
Attachment #485722 - Flags: review?(vladimir) → review?(Jacob.Bramley)
Comment on attachment 485722 [details] [diff] [review]
ARM hardfloat support for xptcinvoke (v3)

Jacob already checked try2, I have attached interdiff between try2/try3 for additional review
Attachment #485722 - Flags: review?(Jacob.Bramley) → review?(vladimir)
Attachment #482332 - Attachment is obsolete: false
Attachment #485996 - Flags: review?(Jacob.Bramley)
Comment on attachment 485996 [details] [diff] [review]
Diff between try2 which r+ and try3

Yep, I already looked at v3 too, though I didn't add an r+ tag to show it.
Attachment #485996 - Flags: review?(Jacob.Bramley) → review+
Attachment #485722 - Attachment is obsolete: true
Attachment #485722 - Flags: review?(vladimir)
HardFP can land for b2.
tracking-fennec: 2.0+ → 2.0b2+
http://hg.mozilla.org/mozilla-central/rev/dd50da0646a4
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
wow.  not sure wtf all of that js stuff is in the mc push.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
ignore me.  wrong bug.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
it sounds like some renaming in js haunting peoples merges.
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
backed out.  will reland when tree goes green.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Better push, no extra JS tests changes.
http://hg.mozilla.org/mozilla-central/rev/eaefbdbc2f2d
Status: REOPENED → RESOLVED
Closed: 9 years ago9 years ago
Resolution: --- → FIXED
Depends on: 644140
You need to log in before you can comment on or make changes to this bug.