601914 - XPCOM does not work with ARM hardfp ABI

Reporter

Description

•

15 years ago

User-Agent: Mozilla/5.0 (X11; U; Linux i686; fi-FI; rv:1.9.2.10) Gecko/20100915 Ubuntu/10.04 (lucid) Firefox/3.6.10 Build Identifier: XPCOM does not work with the hardware floating ABI of ARM. Reproducible: Always Steps to Reproduce: 1. Have a toolchain and a platform with hard floating point ARM ABI 2. Build mozilla-central 3. Build TestXPTCInvoke 4. Run TestXPTCInvoke on ARM device Actual Results: Crash. Expected Results: No crash. Some output and backtrace: P1 : 3 P2 : 5 P3 : 7 P4 : 11 P5 : 13 P6 : 17 P7 : 19 P8 : 23 P9 : 29 P10: 31 ret: 0xaed33b48 3 + 5 + 7 + 11 + 13 + 17 + 19 + 23 + 29 + 31 = 158 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 11 = 66.000000 moo cow = milk should fail failed, returned 80004003 calling via invoke: 1 + 1 = 2 1L + 1L = 2 2 * 2 = 4 2L * 2L = 4 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 = 55 Program received signal SIGSEGV, Segmentation fault. 0x00008e8c in InvokeTestTarget::AddTwoFloats (this=0x2d2b8, p1=0, p2=3.2578125, retval=0x3f800000) at /home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp:231 231 /home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp: No such file or directory. in /home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp (gdb) bt #0 0x00008e8c in InvokeTestTarget::AddTwoFloats (this=0x2d2b8, p1=0, p2=3.2578125, retval=0x3f800000) at /home/tkoskine/work/hardfp/mozilla-central/xpcom/reflect/xptcall/tests/TestXPTCInvoke.cpp:231 #1 0x3ce8d710 in NS_InvokeByIndex_P () from /usr/lib/xulrunner-2.0b5pre/libxul.so #2 0x3ce8d710 in NS_InvokeByIndex_P () from /usr/lib/xulrunner-2.0b5pre/libxul.so Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb)

Tero Koskinen

Reporter

Comment 1

•

15 years ago

Attached patch Partial fix for XPCOM (includes debug stuff and is missing some functionality) — Details — Splinter Review

Here is a partial patch for this issue. The patch allows Fennec to run on ARM with hardware floating point ABI enabled. I haven't yet tried, does the TestXPTCInvoke work fully. Note: this patch unconditionally enables hard fp ABI for ARM.

Tero Koskinen

Reporter

Updated

•

15 years ago

OS: Linux → MeeGo

Hardware: Other → ARM

Version: unspecified → Trunk

Jan Arne Petersen

Updated

•

15 years ago

Blocks: 583135

OS: MeeGo → Linux

Hardware: ARM → Other

Version: Trunk → unspecified

Jan Arne Petersen

Comment 2

•

15 years ago

(In reply to comment #1) > Note: this patch unconditionally enables hard fp ABI for ARM. We could check __ARM_PCS/__ARM_PCS_VFP from http://gcc.gnu.org/viewcvs?view=revision&revision=162637 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45886)

Siarhei Siamashka

Assignee

Comment 3

•

15 years ago

(In reply to comment #1) > Created attachment 480894 [details] [diff] [review] > Partial fix for XPCOM (includes debug stuff and is missing some functionality) > > Here is a partial patch for this issue. I understand that bugs and missing functionality is expected at this stage. Here are some problems that I see in this patch at the moment: + case nsXPTType::T_FLOAT : + if (float_count < 16) { + copy_float_param(float_count, (PRUint32 *)&s->val.f); + d--; + } else { + *((float*) d) = s->val.f; + } + float_count += 1; + break; Probably it is better to postpone setting floating point registers until the very function call. Just to be sure that the compiler will never corrupt them while running the rest of C code of this function (for example by the code in the else branch here, where float data type is used itself). + if ((float_count + 1) / 2 < 8) { + copy_double_param((float_count + 1) / 2, (PRUint64 *)&s->val.d); + d--; + } else { + d = copy_double_word(stk, d, end, (PRUint64 *)&s->val.d); + } + float_count += 2; I'm afraid that the registers allocation part is a bit more tricky. For example let's consider this code: /*****************/ float x[3]; void f(float a, double b, float c) { x[0] = a; x[1] = b; x[2] = c; } /*****************/ gcc -O2 -fomit-frame-pointer -c test.c objdump -d test.o 00000000 <f>: 0: eef77bc1 vcvt.f32.f64 s15, d1 4: e3003000 movw r3, #0 8: e3403000 movt r3, #0 c: edc30a02 vstr s1, [r3, #8] 10: ed830a00 vstr s0, [r3] 14: edc37a01 vstr s15, [r3, #4] 18: e12fff1e bx lr Here we have the following registers mapping: a : s0 b : d1 c : s1 Your code is going to assign them as: a : s0 b : d1 c : s3

Siarhei Siamashka

Assignee

Comment 4

•

15 years ago

Calling conventions are described in the following document, in section "5.5 Parameter Passing": http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf

Jacob Bramley [:jbramley]

Comment 5

•

15 years ago

Comment on attachment 480894 [details] [diff] [review] Partial fix for XPCOM (includes debug stuff and is missing some functionality) A few comments: ---- (I think you know the ABI stuff, but I'm documenting an example for other readers and for reference. This particular example won't work with your current solution.) Consider the following target function: void some_func( void * this, // Implicit in C++. int32_t a, int32_t b, float p, int32_t c, int32_t d, int32_t e, double q); Registers are assigned in order: this -> r0 \ a -> r1 | As in softfp. b -> r2 / p -> s0 <- Goes into a VFP register. c -> r3 <- Next consecutive integer register. d -> sp[0] <- No integer argument registers left, so stack. e -> sp[1] <- Next stack slot. q -> d1(s3:s2)<- 'double', so use next aligned VFP register. Your solution addresses this by hijacking the stack write at the appropriate point in invoke_copy_to_stack. However, the VFP registers may well be clobbered between the call to copy_double_param and the call to "func" in NS_InvokeByIndex, so the procedure isn't safe. The integer argument registers (r1-r3) would normally have the same problem, and that is why this routine used to be written in assembly. However, the call to "func" is given three integer arguments which will always end up in r1-r3, so the compiler gives us the copy implicitly. You might be able to solve the problem with a call like this: return func(that, stack_space[base_size * 2 - 3], stack_space[base_size * 2 - 2], stack_space[base_size * 2 - 1], double_stack[0], double_stack[1], double_stack[2], double_stack[3], double_stack[4], double_stack[5], double_stack[6], double_stack[7]); That clearly requires creating an 8-double-long buffer somewhere. Your hijacked copy_float_param function could write into that buffer, rather than into float registers. GCC may also generate non-optimal code for this, so hand-writing some assembly might be a good idea. (That can be left for another day if your priorities are to get things working.) The motivation behind porting to pure C is so that it would build for ARMv4T whilst providing optimal calling code for newer platforms. (ARMv4T doesn't have the "blx" instruction required for efficient calls on newer platforms.) With the "hard" ABI, you already have an implicit requirement for a modern platform anyway. ---- Writing the code in a new file may be a good idea if you have to #ifdef a large number of modifications. (It depends on how much you need to make conditional, though.) Something like "xptcinvoke_arm_hardfp.cpp" might be appropriate, if you need to do this. ---- It's probably made irrelevant by my previous suggestion, but the following will almost certainly perform poorly: + "cmp %1, #0\n\t" + "flddeq d0, [%0]\n\t" + "cmp %1, #1\n\t" + "flddeq d1, [%0]\n\t" + "cmp %1, #2\n\t" + "flddeq d2, [%0]\n\t" + "cmp %1, #3\n\t" + "flddeq d3, [%0]\n\t" + "cmp %1, #4\n\t" + "flddeq d4, [%0]\n\t" + "cmp %1, #5\n\t" + "flddeq d5, [%0]\n\t" + "cmp %1, #6\n\t" + "flddeq d6, [%0]\n\t" + "cmp %1, #7\n\t" + "flddeq d7, [%0]\n\t" I think you'd be better off writing to a buffer and using a single VLDM to load the arguments. (You'll have to do that anyway, if you use the "func" call modification that I proposed.)

Jacob Bramley [:jbramley]

Comment 6

•

15 years ago

(In reply to comment #3) > [...] (In reply to comment #4) > [...] Or, what Siarhei said! (I didn't check the bug before pushing "submit".)

Siarhei Siamashka

Assignee

Comment 7

•

15 years ago

Attached patch Hopefully fully functional arm hardfloat support in xptcinvoke (try1) (obsolete) — Details — Splinter Review

The toolchain must support __ARM_PCS/__ARM_PCS_VFP predefined symbols for it to work. As a temporary solution for testing purposes, __ARM_PCS_VFP can be defined in some makefile. I had to use naked function fully implemented in assembly because alloca stuff seems to be very fragile. It often breaks as a result of some more or less innocent looking changes in the code. Jacob, could you please review this patch?

Oleg Romashin (:romaxa)

Updated

•

15 years ago

Attachment #481102 - Flags: review?(Jacob.Bramley)

Oleg Romashin (:romaxa)

Comment 8

•

15 years ago

Comment on attachment 481102 [details] [diff] [review] Hopefully fully functional arm hardfloat support in xptcinvoke (try1) Tested... TestXPTCInvoke pass all tests with this patch.

Attachment #481102 - Flags: feedback+

Oleg Romashin (:romaxa)

Comment 9

•

15 years ago

> make conditional, though.) Something like "xptcinvoke_arm_hardfp.cpp" > might be appropriate, if you need to do this. > new file require some Makefile stub define, and configure test for hardfp ABI. Also some hardfp toolchain detection need to be done for JS JIT in js/src

Oleg Romashin (:romaxa)

Comment 10

•

15 years ago

Comment on attachment 481102 [details] [diff] [review] Hopefully fully functional arm hardfloat support in xptcinvoke (try1) >@@ -21,6 +21,7 @@ > * > * Contributor(s): > * Mike Hommey <mh@glandium.org> >+ * Nokia Corporation I believe contributor here should be >+ * Siarhei Siamashka <siarhei.siamashka@gmail.com>

Doug Turner (:dougt)

Updated

•

15 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Doug Turner (:dougt)

Updated

•

15 years ago

tracking-fennec: --- → 2.0+

Jacob Bramley [:jbramley]

Comment 11

•

15 years ago

(In reply to comment #10) > I believe contributor here should be > > >+ * Siarhei Siamashka <siarhei.siamashka@gmail.com> There are examples of corporate contributions going purely under the company or group names (such as "Adobe AS3 Team"). Is it a general Mozilla recommendation to use individual names? (I just want to know for future reference.)

Oleg Romashin (:romaxa)

Comment 12

•

15 years ago

IIUC we need also change xptcstubs_arm.cpp... Do we have any tests for this one?

Tero Koskinen

Reporter

Comment 13

•

15 years ago

(In reply to comment #12) > IIUC we need also change xptcstubs_arm.cpp... Do we have any tests for this > one? I couldn't find any directly, but there is js/src/xpconnect/tests/TestXPC(.cpp) which might test xptcstubs_arm.cpp code also.

Jacob Bramley [:jbramley]

Comment 14

•

15 years ago

Comment on attachment 481102 [details] [diff] [review] Hopefully fully functional arm hardfloat support in xptcinvoke (try1) In general, I like this approach, and it's clear what the code does. I've made a few comments below on specific points. I'm giving "r-" purely because there are a few points below and I'd like to see v2. >+static inline void copy_word(PRUint32* &ireg_args, >+ PRUint32* &stack_args, >+ PRUint32* end, >+ PRUint32 data) >+{ >+ if (ireg_args < end) >+ *ireg_args++ = data; >+ else >+ *stack_args++ = data; >+} This looks correct, but I'm not sure about the coding style. I always use braces, and I'd list the increment on a separate line for clarity. Mozilla's style guide is here: https://developer.mozilla.org/En/Mozilla_Coding_Style_Guide Having said that, style does vary from component to component, and I don't do much work in XPCOM so others probably know better. (There are other instances, but I don't want to dwell on it.) >+static inline void copy_dword(PRUint32* &ireg_args, >+ PRUint32* &stack_args, >+ PRUint32* end, >+ PRUint64 data) >+{ This function could do with a comment to explain the invariants and how they are satisfied. For example, you know that 'end' is aligned to 8 bytes (despite being a PRUint32*), so it is safe to check just ireg_args+1 and yet (potentially) write three words onto it. >+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args, >+ float* end, float data) >+{ >+ if (vfp_s_args < end) >+ { >+ *vfp_s_args++ = data; >+ if (vfp_s_args < (float *)vfp_d_args) >+ vfp_s_args = (float *)vfp_d_args; /* back-filling case */ >+ else if (vfp_s_args > (float *)vfp_d_args) >+ vfp_d_args++; >+ return true; >+ } >+ return false; >+} >+ >+static inline bool copy_vfp_double(float* &vfp_s_args, double* &vfp_d_args, >+ float* end, double data) >+{ >+ if (vfp_d_args < (double *)end) >+ { >+ if (vfp_s_args == (float *)vfp_d_args) >+ vfp_s_args += 2; >+ *vfp_d_args++ = data; >+ return true; >+ } >+ // failed to allocate register even once -> can't use vfp registers anymore >+ vfp_s_args = end; >+ return false; >+} The code would perhaps be clearer if you made (for example) copy_f32 and copy_f64 functions, as you did for the integer code, and then maintain only one vfp_args pointer. The alignment rules are the same as for integer code. Currently, copy_vfp_double works differently to copy_dword, yet it performs essentially the same function but with a different data type. >+EXPORT_XPCOM_API(nsresult) >+NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex, >+ PRUint32 paramCount, nsXPTCVariant* params) You've got a lot of casts in this function (and its associated macro), but you've defined your own types. Since you're passing to assembly anyway, why not just start with the types that already fit what you want? (This looks like an artefact from the softfp variant.) Actually, since you drive everything from your assembly function, you could just put do_call_asm directly in NS_InvokeByIndex. (It doesn't need to be naked, as long as you specify the appropriate asm dependencies.) Untested example: EXPORT_XPCOM_API(nsresult) NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex, PRUint32 paramCount, nsXPTCVariant* params) { register PRUint32 result asm("r0"); asm ( #if defined(__GXX_ABI_VERSION) && __GXX_ABI_VERSION >= 100 /* G++ V3 ABI */ " ldr FUNC, [THAT, METHODINDEX]\n" #else /* non G++ V3 ABI */ " add ip, METHODINDEX, THAT\n" " ldr FUNC, [ip, #8]\n" #endif " [...] @ Implementation as before." : [RESULT] "=&r" (result) : [THAT] "r" (that), [METHODINDEX] "r" (methodIndex * sizeof(vtable)), [...] : [...] ); return result; } >+static void __attribute__((naked)) do_call_asm() >+{ >+ asm volatile ( >+ /* define some aliases for better readability */ >+ "FUNC .req r4\n" >+ "THAT .req r5\n" >+ "PARAMS .req r6\n" >+ "PARAM_COUNT_PLUS_2 .req r7\n" >+ "INVOKE_COPY_TO_STACK .req r8\n" >+ "STACK_SPACE_SIZE .req r9\n" >+ >+ /* function entry */ >+ "ldr ip, [sp]\n" >+ "push {r4-r11, ip, lr}\n" /* save all registers */ With the exception of dummy values for aligning the stack pointer, only save what you use. You don't seem to use r10-r11. If you put this in-line (as I suggested above), you should just add the registers to the clobber list, and let the compiler stack them for you. Putting the code in-line imposes a slight inconvenience, however, as the compiler might not guarantee that the stack is 8-byte-aligned for the asm block. This is easily fixed by adding (SP & 4) to STACK_SPACE_SIZE. >+ /* save current vfp registers (don't want to load some garbage later) */ >+ "add ip, sp, PARAM_COUNT_PLUS_2, lsl #3\n" >+ "vstm ip, {d0, d1, d2, d3, d4, d5, d6, d7}\n" Why preserve these? They contain nothing useful at this point and are caller-saved registers, so we can clobber them here. >+ /* call func */ >+ "mov r0, THAT\n" >+ "add ip, sp, PARAM_COUNT_PLUS_2, lsl #3\n" >+ "vldm ip, {d0, d1, d2, d3, d4, d5, d6, d7}\n" >+ "ldmdb ip!, {r1, r2, r3}\n" There's no need for the update on IP. (It's harmless, but wasted effort.)

Attachment #481102 - Flags: review?(Jacob.Bramley) → review-

Jacob Bramley [:jbramley]

Comment 15

•

15 years ago

I've Added Mike Hommey on CC because he already went through the process of updating this code for ARMv4T and may have some insight.

timeless

Comment 16

•

15 years ago

Comment on attachment 481102 [details] [diff] [review] Hopefully fully functional arm hardfloat support in xptcinvoke (try1) >+#else /* __ARM_PCS_VFP */ Somewhere near here i'd like to see something like this: Calling conventions are described in the following document, in section "5.5 Parameter Passing": http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf >+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args, >+ float* end, float data) >+{ >+ if (vfp_s_args < end) >+ { >+ *vfp_s_args++ = data; >+ if (vfp_s_args < (float *)vfp_d_args) >+ vfp_s_args = (float *)vfp_d_args; /* back-filling case */ Could you possibly explain as a comment before this function what's going on here and how the back-filling case works/what it means? >+ else if (vfp_s_args > (float *)vfp_d_args) >+ vfp_d_args++; >+ return true; >+ } Generally in mozilla code we try to return early on failure, it saves on indentation: >+ return false; so instead you'd have: if (vfp_s_args > end) return false; <main code path here> >+} >+static inline bool copy_vfp_double(float* &vfp_s_args, double* &vfp_d_args, >+ float* end, double data) >+{ again, having early returns for failure is generally preferred >+ if (vfp_d_args < (double *)end) >+ { >+ if (vfp_s_args == (float *)vfp_d_args) >+ vfp_s_args += 2; >+ *vfp_d_args++ = data; >+ return true; >+ } >+ // failed to allocate register even once -> can't use vfp registers anymore I can't understand this comment as it's written, please try to clarify. Note that comments do not have to be terse. >+ vfp_s_args = end; >+ return false; >+} while this macro uses two space indentation (which doesn't match the file), >+#define do_call(func, that, params, paramCount, invoke_copy_to_stack) \ >+ ((PRUint32 (*)(void (*)(), void *, void *, PRUint32, void (*)())) \ >+ do_call_asm) \ >+ (reinterpret_cast<void (*)()>(func), (that), (params), (paramCount), \ >+ reinterpret_cast<void (*)()>(invoke_copy_to_stack)) the file is using four space indentation, and this is a function in it, i'd rather both the macro-function above and the function below match the file and uses 4 space. >+EXPORT_XPCOM_API(nsresult) >+NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex, >+ PRUint32 paramCount, nsXPTCVariant* params) >+{ >+ vtable_func *vtable, func;

Attachment #481102 - Flags: feedback-

Siarhei Siamashka

Assignee

Comment 17

•

15 years ago

(In reply to comment #10) > > * Contributor(s): > > * Mike Hommey <mh@glandium.org> > >+ * Nokia Corporation > > I believe contributor here should be > > >+ * Siarhei Siamashka <siarhei.siamashka@gmail.com> Thanks for your comment. I'll try to find out whether Nokia actually wants the copyright/credit for this particular patch or I can keep it for myself (I'm a Nokia employee at the moment for what it matters).

Benjamin Smedberg

Comment 18

•

15 years ago

Note that this is entirely about copyright, not "credit". I believe that Nokia is automatically the copyright owner unless it assigns it, so Nokia is correct.

Siarhei Siamashka

Assignee

Comment 19

•

15 years ago

Attached file Test program for VFP registers allocation code. — Details

(In reply to comment #14) > >+static inline void copy_dword(PRUint32* &ireg_args, > >+ PRUint32* &stack_args, > >+ PRUint32* end, > >+ PRUint64 data) > >+{ > > This function could do with a comment to explain the invariants and how > they are satisfied. For example, you know that 'end' is aligned to 8 > bytes (despite being a PRUint32*), so it is safe to check just > ireg_args+1 and yet (potentially) write three words onto it. OK, such additional comments can be added. > >+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args, > >+ float* end, float data) > >+static inline bool copy_vfp_double(float* &vfp_s_args, double* &vfp_d_args, > >+ float* end, double data) > > The code would perhaps be clearer if you made (for example) copy_f32 and > copy_f64 functions, as you did for the integer code, and then maintain > only one vfp_args pointer. The alignment rules are the same as for > integer code. Currently, copy_vfp_double works differently to > copy_dword, yet it performs essentially the same function but with a > different data type. There is a substantial difference. VFP registers allocation has to support "back-filling", while integer code does not need to do this. For example "void f(int a, long long b, int c, int d)" will have: a = r0 b = r2:r3 pair c = [sp] d = [sp, #4] But "void f(float a, double b, float c, float d)" will have: a = s0 b = d1 (s2:s3 pair) c = s1 <--- THIS IS IMPORTANT (we are back-filling the gap) d = s4 I have attached a test program which I used to compare VFP registers allocation code against a simple but slow implementation done according to the spec. I wonder if it would be a good idea to actually add a simple reference implementation to the patch too (inactive, but switchable via ifdefs)? Another important test case is function "void f(double d0, double d1, double d2, double d3, double d4, double d5, double d6, float s14, double this_goes_to_stack, float this_goes_to_stack_too_instead_of_s15)" Actually current XPCOM test suite does not stress some of the interesting cases and may need to be improved. > >+EXPORT_XPCOM_API(nsresult) > >+NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex, > >+ PRUint32 paramCount, nsXPTCVariant* params) > > You've got a lot of casts in this function (and its associated macro), > but you've defined your own types. Since you're passing to assembly > anyway, why not just start with the types that already fit what you > want? (This looks like an artefact from the softfp variant.) I guess these types have some meaning, so getting rid of them without a good reason (just because we can) may be not so intuitive. > >+ /* save current vfp registers (don't want to load some garbage later) */ > >+ "add ip, sp, PARAM_COUNT_PLUS_2, lsl #3\n" > >+ "vstm ip, {d0, d1, d2, d3, d4, d5, d6, d7}\n" > > Why preserve these? They contain nothing useful at this point and are > caller-saved registers, so we can clobber them here. Yeah, appears it's a case of "cargo cult programming" :) I somehow initially thought that loading random garbage values found on stack via VLDM instruction later (for the gaps which are not filled) may be not a very good idea. But since VLDM itself is not going to trigger floating point exceptions, this indeed can be just dropped. The rest of your comments about assembly code basically boils down to "you can do it with less instructions and less registers used". Yes, I agree :) Will improve (maybe also taking care of perfect instructions scheduling). I just thought that this part did not strictly require performance optimizations for the initial revision (unlike VFP registers allocation code which resides in the inner loop). I'll also double check it for the potential quirks like theoretical big endian compatibility, proper thumb interworking on older ARM cores, etc. Though the patch should work quite fine at least on little endian armv7 compatible processors even now.

Jacob Bramley [:jbramley]

Comment 20

•

15 years ago

(In reply to comment #19) > There is a substantial difference. VFP registers allocation has to support > "back-filling", while integer code does not need to do this. Oh, you are correct. Apologies! > The rest of your comments about assembly code basically boils down to "you can > do it with less instructions and less registers used". Yes, I agree :) Will > improve (maybe also taking care of perfect instructions scheduling). I just > thought that this part did not strictly require performance optimizations for > the initial revision (unlike VFP registers allocation code which resides in the > inner loop). It's true that making it work is the priority, but I believe this is hot code so it's worth making it fast too if we can. ---- I think removing the do_call_asm macro could go some way to clarifying the code, but that can be picked up in an optimization phase anyway if you prefer.

Siarhei Siamashka

Assignee

Comment 21

•

15 years ago

(In reply to comment #16) > >+#else /* __ARM_PCS_VFP */ > > Somewhere near here i'd like to see something like this: > > Calling conventions are described in the following document, in section "5.5 > Parameter Passing": > http://infocenter.arm.com/help/topic/com.arm.doc.ihi0042d/IHI0042D_aapcs.pdf Unfortunately the web links to ARM documentation are (or used to be) "polymorphic" and it is quite likely that this link will be dead in a year or so. But I agree, at least the name of this document can be mentioned. > >+static inline bool copy_vfp_single(float* &vfp_s_args, double* &vfp_d_args, > >+ float* end, float data) > >+{ > >+ if (vfp_s_args < end) > >+ { > >+ *vfp_s_args++ = data; > >+ if (vfp_s_args < (float *)vfp_d_args) > >+ vfp_s_args = (float *)vfp_d_args; /* back-filling case */ > > Could you possibly explain as a comment before this function what's going on > here and how the back-filling case works/what it means? Just read the spec, Luke ;) But seriously, this code is indeed a bit difficult to read and review because it tries to cut some corners and do things faster (though I'm not claiming that it is the fastest possible implementation). See comment 19 for a bit more explanations and a test program. > > >+ else if (vfp_s_args > (float *)vfp_d_args) > >+ vfp_d_args++; > >+ return true; > >+ } > > Generally in mozilla code we try to return early on failure, it saves on > indentation: > >+ return false; Putting this code to inline functions was already mostly done for the purpose of keeping indentation sane. Just two levels of indentation for a very small function is IMHO more than fine. But as you insist, I can surely do this. > >+ // failed to allocate register even once -> can't use vfp registers anymore > > I can't understand this comment as it's written, please try to clarify. > Note that comments do not have to be terse. Yeah, maybe a direct quote from the spec would be a lot better here: "The back-filling continues only so long as no VFP CPRC has been allocated to a slot on the stack." Thanks to everyone who found time to review the patch! The v2 is coming.

Brad Lassey [:blassey] (use needinfo?)

Updated

•

15 years ago

tracking-fennec: 2.0+ → 2.0-

Brad Lassey [:blassey] (use needinfo?)

Updated

•

15 years ago

tracking-fennec: 2.0- → 2.0+

Siarhei Siamashka

Assignee

Comment 22

•

15 years ago

(In reply to comment #14) > Untested example: > > EXPORT_XPCOM_API(nsresult) > NS_InvokeByIndex(nsISupports* that, PRUint32 methodIndex, > PRUint32 paramCount, nsXPTCVariant* params) > { > register PRUint32 result asm("r0"); Looks like this is unsafe: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32820#c7 Probably it's better to avoid such fancy features as explicitly forcing variables into specific registers.

Siarhei Siamashka

Assignee

Comment 23

•

15 years ago

Attached patch ARM hardfloat support for xptcinvoke (try2) — Details — Splinter Review

Let's try it this way: - Coding style fixes - More comments - Naked function replaced with inline assembly block

Attachment #481102 - Attachment is obsolete: true

Attachment #482332 - Flags: review?(Jacob.Bramley)

Jacob Bramley [:jbramley]

Comment 24

•

15 years ago

(In reply to comment #22) > (In reply to comment #14) > > register PRUint32 result asm("r0"); > > Looks like this is unsafe: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32820#c7 Oh, that's disappointing. It's what the manual recommends!

Siarhei Siamashka

Assignee

Comment 25

•

15 years ago

There are numerous bugs with inline assembly. The trick is to know them and avoid problems. One of the solutions would be to turn 'NS_InvokeByIndex' into a naked function, but such functions with arguments are also potentially unsafe: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44290 There were other problems with 'naked' attribute other than this bug, generally it is only safe to use naked function without any arguments (and do some function type casting like in my first patch), otherwise gcc may easily screw it up especially with optimizations disabled, frame pointers enabled or in some other not quite typical configurations. Another unfixed inline assembly bug to be aware of: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43440

Siarhei Siamashka

Assignee

Comment 26

•

15 years ago

(In reply to comment #13) > (In reply to comment #12) > > IIUC we need also change xptcstubs_arm.cpp... Do we have any tests for this > > one? > > I couldn't find any directly, but there is js/src/xpconnect/tests/TestXPC(.cpp) > which might test xptcstubs_arm.cpp code also. This test uses stubs, but does not seem to use any floating point arguments there (as shown by debugging print added into 'PrepareAndDispatch' function). Trying "make -C objdirXXX/ xpcshell-tests" also does not reveal any obvious uses of floating point arguments. Does anybody know what would be the easiest way to add a new test which could stress floating point arguments with XPCOM stubs?

Siarhei Siamashka

Assignee

Comment 27

•

15 years ago

One alternative is to just add hardfp support to the stubs code, artificially test it in some other way, and hope that it will be all right.

Oleg Romashin (:romaxa)

Updated

•

15 years ago

Assignee: nobody → siarhei.siamashka

Oleg Romashin (:romaxa)

Comment 28

•

15 years ago

Comment on attachment 482332 [details] [diff] [review] ARM hardfloat support for xptcinvoke (try2) Jacob could you review this patch?

Jacob Bramley [:jbramley]

Comment 29

•

15 years ago

Comment on attachment 482332 [details] [diff] [review] ARM hardfloat support for xptcinvoke (try2) Sorry for the slow review! The patch looks good. The only thing missing is that you should specify d0-d7 and d15-d31 in the clobber list for the asm block. (They might be clobbered by the call.) Adding d15-d31 may be tricky; I don't know if you can query for VFPv3-D32 support in the pre-processor. If it's not possible, I'd just leave it out. As long as the InvokeByIndex function is not in-lined, and as long as it doesn't use VFP value itselfs, d16-d31 should never be used by the compiler anyway, and it won't matter if the call clobbers them. r+ with d0-d7. I'm not convinced that it'll be possible to specify d15-d31 reliably, so you can leave them off for now.

Attachment #482332 - Flags: superreview?(benjamin)

Attachment #482332 - Flags: review?(vladimir)

Attachment #482332 - Flags: review?(Jacob.Bramley)

Attachment #482332 - Flags: review+

Siarhei Siamashka

Assignee

Comment 30

•

15 years ago

(In reply to comment #29) > As long as the InvokeByIndex function is not in-lined, and as long as it > doesn't use VFP value itselfs, d16-d31 should never be used by the compiler > anyway, and it won't matter if the call clobbers them. That's actually a good point. With LTO becoming more popular recently, taking special care to prevent inlining may be also needed. Would also adding __attribute__((noinline)) to InvokeByIndex function make sense? And now I see that surely some more comments explaining why it is done this way would be a good addition. > r+ with d0-d7. I'm not convinced that it'll be possible to specify d15-d31 > reliably, so you can leave them off for now. Thanks. Should I attach v3 patch and request a review for it again?

Siarhei Siamashka

Assignee

Comment 31

•

15 years ago

(In reply to comment #29) > Adding d15-d31 may be tricky; I don't know if you can query for VFPv3-D32 > support in the pre-processor. If it's not possible, I'd just leave it out. Actually it seems that gcc currently accepts d15-d31 registers in the clobber list even for -mfpu=vfp and -mfpu=vfpv3-d16. If they are just added as is, in the worst case it would be a build failure (which is a lot better than some obscure runtime bug). So would it be the best to have d15-d31 registers in the clobber list unconditionally with a descriptive comment there? So that if gcc becomes more strict in the future, anyone even not familiar with arm assembler would still get the idea about what is going on and how to workaround it. I have not found any gcc preprocessor symbol that could be used to detect the difference between d16 and d32 variants. Assuming that it does not exist yet, asking gcc developers to provide one may be a good idea.

Jacob Bramley [:jbramley]

Comment 32

•

15 years ago

(In reply to comment #31) > So would it be the best to have d15-d31 registers in the > clobber list unconditionally with a descriptive comment there? That sounds like a good solution to me. > I have not found any gcc preprocessor symbol that could be used to detect the > difference between d16 and d32 variants. Assuming that it does not exist yet, > asking gcc developers to provide one may be a good idea. Agreed. I confirmed that there is no such symbol, but there really should be. (In reply to comment #30) > Thanks. Should I attach v3 patch and request a review for it again? General policy is that minor changes — such as tweaks based on comments in a review or simple rebasing — do not need a re-review. However, if you don't have commit rights then you'll need to post a new patch anyway. Also, you've got a couple more reviews waiting yet so you might as well tag a v3 patch.

Siarhei Siamashka

Assignee

Comment 33

•

15 years ago

Attached patch ARM hardfloat support for xptcinvoke (v3) (obsolete) — Details — Splinter Review

- added VFP registers to the inline assembly clobber list as requested by Jacob - more comments

Attachment #482332 - Attachment is obsolete: true

Attachment #482332 - Flags: superreview?(benjamin)

Attachment #482332 - Flags: review?(vladimir)

Oleg Romashin (:romaxa)

Updated

•

15 years ago

Attachment #485722 - Flags: superreview?(benjamin)

Attachment #485722 - Flags: review?(vladimir)

Benjamin Smedberg

Comment 34

•

15 years ago

Comment on attachment 485722 [details] [diff] [review] ARM hardfloat support for xptcinvoke (v3) I don't know ARM assembly and am not a qualified reviewer. I'm happy to delegate to vlad.

Attachment #485722 - Flags: superreview?(benjamin)

Vladimir Vukicevic [:vlad] [:vladv] (needinfo me, slow to respond)

Comment 35

•

15 years ago

Comment on attachment 485722 [details] [diff] [review] ARM hardfloat support for xptcinvoke (v3) You probably want jacob to r+ this one as well

Attachment #485722 - Flags: review?(vladimir) → review?(Jacob.Bramley)

Oleg Romashin (:romaxa)

Comment 36

•

15 years ago

Attached patch Diff between try2 which r+ and try3 — Details — Splinter Review

Oleg Romashin (:romaxa)

Comment 37

•

15 years ago

Comment on attachment 485722 [details] [diff] [review] ARM hardfloat support for xptcinvoke (v3) Jacob already checked try2, I have attached interdiff between try2/try3 for additional review

Attachment #485722 - Flags: review?(Jacob.Bramley) → review?(vladimir)

Oleg Romashin (:romaxa)

Updated

•

15 years ago

Attachment #482332 - Attachment is obsolete: false

Oleg Romashin (:romaxa)

Updated

•

15 years ago

Attachment #485996 - Flags: review?(Jacob.Bramley)

Jacob Bramley [:jbramley]

Comment 38

•

15 years ago

Comment on attachment 485996 [details] [diff] [review] Diff between try2 which r+ and try3 Yep, I already looked at v3 too, though I didn't add an r+ tag to show it.

Attachment #485996 - Flags: review?(Jacob.Bramley) → review+

Oleg Romashin (:romaxa)

Comment 39

•

15 years ago

Attached patch Combined patch try2 + try3 interdiff, with header PUSHME — Details — Splinter Review

Attachment #485722 - Attachment is obsolete: true

Attachment #485722 - Flags: review?(vladimir)

Doug Turner (:dougt)

Comment 40

•

15 years ago

HardFP can land for b2.

tracking-fennec: 2.0+ → 2.0b2+

Doug Turner (:dougt)

Comment 41

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/dd50da0646a4

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Doug Turner (:dougt)

Comment 42

•

15 years ago

wow. not sure wtf all of that js stuff is in the mc push.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Doug Turner (:dougt)

Comment 43

•

15 years ago

ignore me. wrong bug.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Doug Turner (:dougt)

Updated

•

15 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Doug Turner (:dougt)

Comment 44

•

15 years ago

it sounds like some renaming in js haunting peoples merges.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Doug Turner (:dougt)

Comment 45

•

15 years ago

backed out. will reland when tree goes green.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Oleg Romashin (:romaxa)

Comment 46

•

15 years ago

Better push, no extra JS tests changes. http://hg.mozilla.org/mozilla-central/rev/eaefbdbc2f2d

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Mike Hommey [:glandium]

Updated

•

14 years ago

Depends on: 644140

Partial fix for XPCOM (includes debug stuff and is missing some functionality) 15 years ago Tero Koskinen 5.45 KB, patch		Details \| Diff \| Splinter Review
Hopefully fully functional arm hardfloat support in xptcinvoke (try1) 15 years ago Siarhei Siamashka 8.17 KB, patch	jbramley : review- romaxa : feedback+ timeless : feedback-	Details \| Diff \| Splinter Review
Test program for VFP registers allocation code. 15 years ago Siarhei Siamashka 3.32 KB, text/plain		Details
ARM hardfloat support for xptcinvoke (try2) 15 years ago Siarhei Siamashka 9.69 KB, patch	jbramley : review+	Details \| Diff \| Splinter Review
ARM hardfloat support for xptcinvoke (v3) 15 years ago Siarhei Siamashka 10.81 KB, patch		Details \| Diff \| Splinter Review
Diff between try2 which r+ and try3 15 years ago Oleg Romashin (:romaxa) 1.92 KB, patch	jbramley : review+	Details \| Diff \| Splinter Review
Combined patch try2 + try3 interdiff, with header PUSHME 15 years ago Oleg Romashin (:romaxa) 12.19 KB, patch		Details \| Diff \| Splinter Review