Closed Bug 532198 Opened 15 years ago Closed 15 years ago

Reimplement xptcinvoke for arm/linux in C

Categories

(Core :: XPCOM, defect)

ARM
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9.3a5

People

(Reporter: reed, Assigned: glandium)

References

Details

Attachments

(2 files, 7 obsolete files)

Attached patch patch - v1 (obsolete) — Splinter Review
Upstreaming a patch from Kevin Welton <Kevin.Welton@arm.com> to fix NS_InvokeByIndex() to use BLX for method invocations when compiling for Thumb. This comes from http://linux.onarm.com/bugzilla/show_bug.cgi?id=36.
Attachment #415476 - Flags: review?(timeless)
Attachment #415476 - Flags: review?(benjamin)
looks like that's patching a patch that was already backed out. http://hg.mozilla.org/mozilla-central/rev/daa5346f19d6 We can use the #ifdef _thumb_ for future work though.
(In reply to comment #1) > looks like that's patching a patch that was already backed out. > http://hg.mozilla.org/mozilla-central/rev/daa5346f19d6 That's the backout commit...
This patch applies fine to trunk with a few minor offsets.
Attachment #415476 - Flags: review?(vladimir)
Attachment #415476 - Flags: review?(timeless)
Attachment #415476 - Flags: review?(benjamin)
There's no reason to not use BLX all the time, no? I thought that's what the updated xptcinvoke patch did.
Attachment #415476 - Flags: review-
Comment on attachment 415476 [details] [diff] [review] patch - v1 for lazy people, that's bug 530087 >+ "blx ip \n\t" /* call it... */ > "mov lr, pc \n\t" /* call it... */ comments should match the conventions of our file, that means the exact same whitespace, no compression. >+ "blx ip \n\t" /* copy args to the stack like the >+ * compiler would */ i think: /* .... */ /* .... */ >+#ifdef __thumb__ >+ "blx ip \n\t" /* call method */ >+ "mov lr, pc \n\t" /* call method */
You can (and should) use BLX all the time in this case, but it won't then work on ARMv4. The JIT doesn't work on ARMv4 anyway, and I don't think we care about such old machines. Having said that, some distributions might care about ARMv4, and they might want support, even if it's slow. (Don't ask me why!) A solution here would be to use the old-style "MOV pc, ip" on ARMv4 (using a compile-time switch of some kind) and BLX everywhere else. We would also have to disable the JIT for ARMv4, but ARMv4 distributions must do that anyway because the JIT hasn't had ARMv4 support for a long time. NOTE: Using "MOV pc, ip" will break the return stack prediction on newer cores, so even if you don't want to interwork, use BLX <reg>. The performance difference should be significant.
Assignee: nobody → Jacob.Bramley
The new patch uses BLX all the time, and thus imposes an ARMv5T baseline requirement for XPCOM. (We previously had this requirement for the JIT anyway, but that could be trivially disabled for crazy people who wanted to compile for ARMv4.) I've (barely) tested this with the ARM build, but couldn't get GCC 4.3.3 to produce a Thumb-2 build. (The error looked like a compiler error in an unrelated module.) I could massage it into building just the bits we care about with -mthumb, but it takes a long time to build anyway, let alone with me hacking the build system. In any case, I've delayed enough for such a trivial patch, so here it is. Whilst it's trivial, it wants pushing to the try server or something like that before it's actually pushed.
Attachment #415476 - Attachment is obsolete: true
Attachment #428899 - Flags: review?(vladimir)
Attachment #415476 - Flags: review?(vladimir)
(In reply to comment #6) > Having said that, some distributions might care about ARMv4, and they might > want support, even if it's slow. (Don't ask me why!) Here is why: openmoko is armv4. Debian supports openmoko.
(In reply to comment #8) > (In reply to comment #6) > Here is why: openmoko is armv4. Debian supports openmoko. Hmm. The Neo FreeRunner uses an ARM920T, which is an ARMv4T device. However, it sits at 400MHz and has just 128MB of RAM. I admit that this is the closest ARMv4(T) device I've seen to being a realistic platform, but is it really worth going out of our way to support? We would have to include #ifdef blocks in our code to support ARMv4T, and these blocks will need testing.
Yeah, it's highly unlikely that we'll support any ARMv4 platforms going forward. (Even ARMv5 is borderline, but there's little cost in supporting it if we want to support ARMv6 anyway.)
Attached patch (Almost) C-only version (obsolete) — Splinter Review
How about "simply" replacing the assembly with (almost) C code only ? The compiler will take care of using the proper opcode for the target architecture. It will also create more efficient code and also inline the invoke_* functions. I get a 25% speed improvement on the TestXPTCInvoke speed test with an ARMv4T target. (note it ends up using bl on armv4t, and blx when using e.g. an armv5t target) Original assembly: direct took 3.78 seconds invoke took 50.84 seconds So, invoke overhead was ~ 47.06 seconds (~ 93%) With C, without inlining: direct took 3.62 seconds invoke took 38.90 seconds So, invoke overhead was ~ 35.28 seconds (~ 91%) With inlined invoke functions: direct took 3.62 seconds invoke took 37.53 seconds So, invoke overhead was ~ 33.91 seconds (~ 90%) Note I'm not entirely sure the attached patch is totally safe for any compiler, but it works properly when built with gcc 4.4. If this is not the case, I'm pretty sure this can be made safer. Also note invoke_count_words could also be removed, to improve speed even more, as per bug 530087.
Attachment #433119 - Flags: review?(vladimir)
(In reply to comment #11) > I get a 25% speed improvement on the TestXPTCInvoke speed test with an ARMv4T > target. (note it ends up using bl on armv4t, and blx when using e.g. an armv5t > target) (Note this was compiled for ARMv4T, but run on an ARMv5T)
That's a neat solution. The only worry I have is the fiddling of the stack pointer within C code; that is likely to break even in the same compiler with unrelated code changes causing it to prod the stack immediately before calling "func". Actually, I'd be interested in seeing the assembly for that, as I'd like to see what it's doing differently. (In reply to comment #11) > Note I'm not entirely sure the attached patch is totally safe for any compiler, > but it works properly when built with gcc 4.4. If this is not the case, I'm > pretty sure this can be made safer. That doesn't matter; the original assembly was extremely compiler-dependent! > Also note invoke_count_words could also be removed, to improve speed even more, > as per bug 530087. Not without sacrificing memory: invoke_copy_to_stack on WinCE fills the stack downwards and avoids calling invoke_count_words to work out how much space to reserve. On Linux, with EABI, 64-bit arguments must be 64-bit aligned, so it's impossible for invoke_copy_to_stack to know where to put each argument (with one pass) if it's filling them backwards; the alignment requirements on an argument early in the list can shift everything above it. The best solution I have is to reserve arg_count*2 words on the stack and waste stack space if we have 32-bit arguments. That will allow us to stick to one pass, skipping invoke_copy_to_stack entirely and filling the arguments forwards with proper alignment, at the cost of some stack space. (In reply to comment #12) > (In reply to comment #11) > > I get a 25% speed improvement on the TestXPTCInvoke speed test with an ARMv4T > > target. (note it ends up using bl on armv4t, and blx when using e.g. an armv5t > > target) > > (Note this was compiled for ARMv4T, but run on an ARMv5T) "bl" and "blx" will exhibit equivalent performance; the only difference is when interworking is involved. Return stack prediction is used for both (but maybe not on an ARMv5T device). There are two issues here. I think you know these anyway, but for the sake of documentation & clarity in the bug: * Correctness on a system with mixed ARM & Thumb code: "blx" _must_ be used here. * Performance: - ARMv4T interworking branches aren't return-stack-predicted on newer cores, and this costs a huge amount of performance. (I've seen figures ranging from 5% to 50%, but I don't know what would happen in this particular case.) - ARMv4 long-range branches involve doing "mov pc, <dst_reg>". That's what the old code used to do. ("bl" can't branch to an address in a register and is restricted to PC±32MB.) ---- I was planning to work on this (and 530087) further over the next few days because it's been in my "to do" list for ages, but you're welcome to take it if you want to.
(In reply to comment #13) > That's a neat solution. The only worry I have is the fiddling of the stack > pointer within C code; that is likely to break even in the same compiler with > unrelated code changes causing it to prod the stack immediately before calling > "func". I think it should be safe, actually, because the compiler knows the stack pointer is being fiddled with by the assembly, and avoids using it. Without forcing the stack pointer with the assembly, the compiled code modifies sp to read the first 3 arguments. (iirc, once with add and once with ldmia) While this would actually do the same thing, it wouldn't have the same guarantees. > Actually, I'd be interested in seeing the assembly for that, as I'd like to see > what it's doing differently. Will attach. > There are two issues here. I think you know these anyway, but for the sake of > documentation & clarity in the bug: > * Correctness on a system with mixed ARM & Thumb code: "blx" _must_ be used > here. > * Performance: > - ARMv4T interworking branches aren't return-stack-predicted on newer cores, > and this costs a huge amount of performance. (I've seen figures ranging from 5% > to 50%, but I don't know what would happen in this particular case.) > - ARMv4 long-range branches involve doing "mov pc, <dst_reg>". That's what > the old code used to do. ("bl" can't branch to an address in a register and is > restricted to PC±32MB.) The undeniable advantage of the C code is that the compiler will just use the right construct depending on the target architecture. No need to #ifdef __thumb__ or whatever. > I was planning to work on this (and 530087) further over the next few days > because it's been in my "to do" list for ages, but you're welcome to take it if > you want to. Since I've gone this far already, I think I'll just take it ;)
Assignee: Jacob.Bramley → mh+mozilla
(In reply to comment #14) > The undeniable advantage of the C code is that the compiler will just use the > right construct depending on the target architecture. No need to #ifdef > __thumb__ or whatever. Oh, certainly. C code is always preferable to assembly for that very reason, unless you need to do something that C code can't express efficiently (though that isn't the case here). I'm still not entirely convinced about the SP-fiddling though. It seems unsafe to me. For instance, how do you know that the stack_space array is at the bottom of your frame? C doesn't guarantee anything there (even though you declare it last). Yes, it probably works, but it feels unsafe. That's why the original was in assembly, so it was clear what was happening. In the case of writing to SP in the asm block: Yes, the compiler knows you've clobbered it, but C mandates that you have a stack and I'd be surprised if the compiler copes well with you doing that. If it isn't using a frame pointer, for example, it won't know how to wind back the stack pointer. Perhaps writing to SP in this way forces it to use a frame pointer, but none of that behaviour is documented (as far as I know) so it's likely to break between even minor compiler revisions. I like the C implementation, but it does make me feel nervous :-)
(In reply to comment #15) > (In reply to comment #14) > > The undeniable advantage of the C code is that the compiler will just use the > > right construct depending on the target architecture. No need to #ifdef > > __thumb__ or whatever. > > Oh, certainly. C code is always preferable to assembly for that very reason, > unless you need to do something that C code can't express efficiently (though > that isn't the case here). > > I'm still not entirely convinced about the SP-fiddling though. It seems unsafe > to me. For instance, how do you know that the stack_space array is at the > bottom of your frame? Because it is the only thing that is allocated on stack in the function. But we can probably use __builtin_alloca instead if you're uncomfortable with that, it generates the same code. > C doesn't guarantee anything there (even though you > declare it last). Yes, it probably works, but it feels unsafe. That's why the > original was in assembly, so it was clear what was happening. > > In the case of writing to SP in the asm block: Yes, the compiler knows you've > clobbered it, but C mandates that you have a stack and I'd be surprised if the > compiler copes well with you doing that. If it isn't using a frame pointer, for > example, it won't know how to wind back the stack pointer. Perhaps writing to > SP in this way forces it to use a frame pointer, but none of that behaviour is > documented (as far as I know) so it's likely to break between even minor > compiler revisions. I don't know if -fomit-frame-pointer is supposed to work on gcc arm, but i can't get gcc to stop using the frame pointer. The function is always marked with "frame_needed = 1", while other functions from the same source are marked as "frame_needed = 0", and as such don't use a frame pointer (even without -fomit-frame-pointer ; I guess it is included in -O2). Anyways, I can think of several other approaches using a bit more assembly, but still leaving most of the work to the C code, I'll check what kind of code get generated with that. Here is the assembly i get with the already attached C code (forcing -fno-inline), with an armv4t target: stmfd sp!, {r4, r5, r6, r7, r8, r9, fp, lr} mov r5, r0 add fp, sp, #28 mov r6, r1 mov r0, r2 mov r1, r3 mov r7, r2 mov r8, r3 bl _ZL18invoke_count_wordsjP13nsXPTCVariant(PLT) mov r3, r0, asl #2 add r3, r3, #18 bic r3, r3, #7 sub sp, sp, r3 mov r1, r7 mov r2, r8 add r0, sp, #4 bl _ZL20invoke_copy_to_stackPjjP13nsXPTCVariant(PLT) mov r4, sp add r3, sp, #16 #APP @ 200 "xptcinvoke_arm.cpp" 1 mov sp, r3 @ 0 "" 2 mov r0, r5 ldr r3, [r4, #12] ldr ip, [r5, #0] ldmib r4, {r1, r2} @ phole ldm ldr ip, [ip, r6, asl #2] mov lr, pc bx ip sub sp, fp, #28 ldmfd sp!, {r4, r5, r6, r7, r8, r9, fp, lr} bx lr Here it uses bl for the function calls and bx for the method invoke. With a armv5t target, the end changes to the following: ldmib r4, {r1, r2} @ phole ldm mov lr, pc ldr pc, [ip, r6, asl #2] sub sp, fp, #28 ldmfd sp!, {r4, r5, r6, r7, r8, r9, fp, pc} Without the assembly, this is how the first three 32-bits words are read from the stack: ldr r3, [sp, #12] ldr ip, [r5, #0] ldmib sp, {r1, r2} @ phole ldm In the end, sp would point on the third 32-bits word, instead of the fourth. Plus, the code is highly dependent on optimization level, while with the sp fiddling, the generated code is safe in all cases I tested.
> Because it is the only thing that is allocated on stack in the function. But we > can probably use __builtin_alloca instead if you're uncomfortable with that, it > generates the same code. Not necessarily: For one thing, the "register" qualfier is just a hint, but there may also be compiler temporaries that go on the stack. Admittedly, for a function of this size and complexity, it's a fairly safe assumption, but I think perhaps it needs a comment warning others from adding code without considering the consequences, as it would be dreadfully easy to break. Is there any way we can ASSERT this? I don't think there's an easy way, but I suppose in the assembly snippet we could assert that sp == stack_space before assigning it: #ifdef DEBUG __asm__ __volatile__ ( "cmp sp, %1" "moveq %0, #1" "movne %0, #0" "add sp, #(4*4)" : "=r" (stack_ok) : "r" (stack_space) : "sp" ); #else __asm__ __volatile__ ( "add sp, #(4*4)" /* sp is at stack_space[0], so advance it to [4]. * (This saves &stack_space[4] from being allocated * a register. */ : : : "sp" ); #endif > I don't know if -fomit-frame-pointer is supposed to work on gcc arm, [...] It sounds like touching the SP does force frame_needed to 1, so it's probably safe. > With a armv5t target, the end changes to the following: > ldmib r4, {r1, r2} @ phole ldm > mov lr, pc > ldr pc, [ip, r6, asl #2] > sub sp, fp, #28 > ldmfd sp!, {r4, r5, r6, r7, r8, r9, fp, pc} Ouch. I don't know why the compiler is emiting "ldr pc, [...]". It'd be preferable to load into a temporary (such as "ip") and then blx to that. Hmm. What do you get if you specify "-mcpu=cortex-a8"? It's possible that the compiler is assuming that it doesn't matter for ARMv5T devices (and it's probably right). (In any case, this is a compiler-related issue that will occur in all the compiled code.) > Without the assembly, this is how the first three 32-bits words are read from > the stack: > ldr r3, [sp, #12] > ldr ip, [r5, #0] > ldmib sp, {r1, r2} @ phole ldm > > In the end, sp would point on the third 32-bits word, instead of the fourth. Nope, sp isn't updated there. It's a strange way to load the values as a single ldm would suffice, but I suppose it would work and wouldn't affect performance much. ---- I just have one final concern, and then I'll leave you alone :-) We need to guarantee that the stack space is 64-bit aligned (for EABI compliance), but I'm not sure how you do that. Indeed, you removed some alignment code from invoke_count_words. Have you confirmed that this is actually correct? Any issues would only show up with 64-bit arguments (and only then if they don't fall across an alignment boundary).
I missed the fairly obvious call to "ASSERT(stack_ok)" in my DEBUG example :-)
(In reply to comment #17) > I just have one final concern, and then I'll leave you alone :-) We need to > guarantee that the stack space is 64-bit aligned (for EABI compliance), but I'm > not sure how you do that. Indeed, you removed some alignment code from > invoke_count_words. Have you confirmed that this is actually correct? Stack allocation already guarantees the stack pointer is going to be aligned. That's why the allocation on stack is space + 1, and the pointer passed to invoke_copy_to_stack is &space_stack[1], so that we start filling the stack on an unaligned word, and that the fourth argument is correctly aligned. Anyways, I am currently testing a rather different approach (combined with the removal of invoke_count_words, because it makes things easier) that doesn't do too much magic with the stack. I should be done (including full xpcshell-tests run) by tonight or tomorrow.
Attachment #433119 - Flags: review?(vladimir)
Right-o. This will also resolve bug 530087 :-)
I have a incomplete patch for the moment but the results are already good: (somehow the machine I'm using got faster since yesterday, so I reiterated all the benchmarks for comparison) original code: direct took 2.20 seconds invoke took 29.81 seconds So, invoke overhead was ~ 27.61 seconds (~ 93%) with attachment 433119 [details] [diff] [review]: direct took 2.20 seconds invoke took 22.66 seconds So, invoke overhead was ~ 20.45 seconds (~ 90%) with the WIP patch: direct took 2.22 seconds invoke took 16.94 seconds So, invoke overhead was ~ 14.72 seconds (~ 87%) We're looking at almost 2x improvement, though the speed test is actually biased, as it only tries one type of calls, that doesn't even need to use the stack. The current status of the WIP is that it is clean of any assembly and works properly, except that it lacks proper comments, and more importantly, breaks the non EABI case (which I unfortunately can't test, but I can at least try to make it theoretically work).
Some values for a modified speed test, using 10000000 iterations (10 times less) over AddMixedFloats instead of AddTwoInts: original code: direct took 10.72 seconds invoke took 17.41 seconds So, invoke overhead was ~ 6.69 seconds (~ 38%) with attachment 433119 [details] [diff] [review]: direct took 10.79 seconds invoke took 16.69 seconds So, invoke overhead was ~ 5.90 seconds (~ 35%) with the WIP patch: direct took 10.78 seconds invoke took 14.58 seconds So, invoke overhead was ~ 3.80 seconds (~ 26%)
I wouldn't worry about supporting non-EABI in Linux. It might be nice to have a conditional #error in case someone tries it. GCC still supports non-EABI for now, at least, though I don't know of anyone using it. A 2* improvement is fantastic! I'm guessing that's because you only make one pass of the argument list; I saw similar effects on the WinCE version with that modification (which was really simple for WinCE's calling convention).
Attached patch C-only version (obsolete) — Splinter Review
The patch is unfortunately not easily readable. It's probably better to apply it to see the resulting file. Please tell me if the comments are not clear enough. It is now free of assembly, and I think it is now perfectly safe. FYI, the main reason I went with the copy_double_word function is that somehow, gcc would generate 2 almost identical assembly snippets for the double word copies when I just inlined the code like it is in the original code: 1 for nsXPTType::T_I64 and nsXPTType::T_U64 and another one for nsXPTType::T_DOUBLE. And when I say almost identical, it's almost pathetic that it can't factor: add r0, r0, #7 add r0, r0, #7 bic r7, r0, #7 bic r7, r0, #7 cmp r7, r5 cmp r5, r7 moveq r7, lr moveq r7, lr sub r9, r3, #16 sub r9, r3, #16 ldmia r9, {r8-r9} ldmia r9, {r8-r9} add r0, r7, #4 add r0, r7, #4 stmia r7, {r8-r9} stmia r7, {r8-r9} b .L5 b .L5 As a nice side effect, it made it cleaner to keep non-eabi support, so I just implemented it. During the development of this patch, I hit some other pathetic cases of stupid gcc optimizations that I'll blog about.
Attachment #428899 - Attachment is obsolete: true
Attachment #433119 - Attachment is obsolete: true
Attachment #433581 - Flags: review?(Jacob.Bramley)
Mike - I will apply the patch and test on an N900 and N810. Are there tests apps you have been using to get your measurements? I was going to test using some talos suites too.
When building for N900: c++ -o xptcinvoke_arm.o -c -fvisibility=hidden -DMOZILLA_INTERNAL_API -DOSTYPE=\"Linux2.6\" -DOSARCH=Linux -DEXPORT_XPTC_API -D_IMPL_NS_COM -I/home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/../.. -I/home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/../../../../xptinfo/src -I/home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix -I. -I../../../../../../dist/include -I../../../../../../dist/include/nsprpub -I/home/mfinkle/mozilla-192/mobile/xulrunner/dist/include/nspr -I/home/mfinkle/mozilla-192/mobile/xulrunner/dist/include/nss -fPIC -fno-rtti -fno-exceptions -Wall -Wpointer-arith -Woverloaded-virtual -Wsynth -Wno-ctor-dtor-privacy -Wno-non-virtual-dtor -Wcast-align -Wno-invalid-offsetof -Wno-long-long -pedantic -fno-strict-aliasing -pthread -pipe -DNDEBUG -DTRIMMED -g -fno-inline -Os -freorder-blocks -fno-reorder-functions -finline-limit=50 -O2 -DMOZILLA_CLIENT -include ../../../../../../mozilla-config.h -Wp,-MD,.deps/xptcinvoke_arm.pp /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp: In function 'PRUint32* copy_double_word(PRUint32*, PRUint32*, PRUint32*, PRUint64*)': /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:75: warning: cast from 'PRUint32*' to 'PRUint64*' increases required alignment of target type /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp: In function 'nsresult NS_InvokeByIndex_P(nsISupports*, PRUint32, PRUint32, nsXPTCVariant*)': /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:158: error: ISO C++ forbids variable-size array 'stack_space' make[8]: *** [xptcinvoke_arm.o] Error 1
Attached patch Additional patch (obsolete) — Splinter Review
Mark, can you try squashing this patch ?
(In reply to comment #25) > Mike - I will apply the patch and test on an N900 and N810. Are there tests > apps you have been using to get your measurements? The measurements in e.g. comment 11 come from TestXPTCInvoke in xpcom/reflect/xptcinvoke/tests (you need to uncomment the DoSpeedTest call)
(In reply to comment #27) > Created an attachment (id=433641) [details] > Additional patch > > Mark, can you try squashing this patch ? With this patch applied on the first patch, I get a successful build. However, when attempting to launch Firefox Mobile on my N900, I get a Segmentation fault.
(In reply to comment #29) > However, when attempting to launch Firefox Mobile on my N900, I get a > Segmentation fault. Can you try building the TestXPTCInvoke program in xpcom/reflect/xptcinvoke/tests (just make -C xpcom/reflect/xptcinvoke/tests) and then run it with: dist/bin/run-mozilla.sh dist/bin/TestXPTCInvoke ? How far does it go ?
Also, is the N900 toolchain building EABI binaries ?
(In reply to comment #30) > Can you try building the TestXPTCInvoke program in > xpcom/reflect/xptcinvoke/tests (just make -C xpcom/reflect/xptcinvoke/tests) > and then run it with: > dist/bin/run-mozilla.sh dist/bin/TestXPTCInvoke ? How far does it go ? Completes the whole thing. Even with DoSpeedTests uncommented. > Also, is the N900 toolchain building EABI binaries ? I think so. The arch in the packaged filenames says "guneabi"
I have not been able to install gdb on my N900, but I have seen that the segfault is happening in invoke_copy_to_stack. I'll try to narrow it down.
Attachment #433581 - Flags: review?(Jacob.Bramley) → review+
Comment on attachment 433581 [details] [diff] [review] C-only version >+ current = (PRUint32 *)(((PRUint32)current + 7) & 0xfffffff8); '~7' is clearer to me than '0xfffffff8'. I suppose that's subjective, though. >+ *((PRUint64*) current) = *dw; I don't think that's valid in the non-EABI case where 'current' isn't 8-byte aligned. It will probably work anyway, but I think C assumes that 64-bit values have 64-bit alignment; LDRD and STRD had this restriction on older architectures. I think the non-EABI and EABI cases should be entirely contained in the #ifdef to avoid these issues. >+ /* Wrap when reaching the end of the stack buffer */ >+ if (d == end) d = stk; It might be wise to add a debug assertion here to check that 'd' is within 'stk' and '&stk[end]'. That will catch some logic errors that can be hard to spot, and easy to introduce in future patches. > * The routine will issue calls to count the number of words > * required for argument passing and to copy the arguments to > * the stack. We don't do this any more. Well, not explicitly anyway. Can you clobber this bit of the comment please? ---- I like the look of this patch (with the 'Additional patch' on top), and it appears correct so I've marked it 'r+'. However, the N900 crash obviously needs to diagnosed & fixed before this is pushed. Thanks, Jacob
(In reply to comment #34) > '~7' is clearer to me than '0xfffffff8'. I suppose that's subjective, > though. I prefer, too, but I just left what already was there. > >+ *((PRUint64*) current) = *dw; > > I don't think that's valid in the non-EABI case where 'current' isn't > 8-byte aligned. It will probably work anyway, but I think C assumes that > 64-bit values have 64-bit alignment; LDRD and STRD had this restriction > on older architectures. I think the non-EABI and EABI cases should be > entirely contained in the #ifdef to avoid these issues. I don't think there is a 64-bit alignment requirement on architectures that matter. FWIW, in Debian we target armv4t, and upstream Mozilla codebase already targets at least armv5. > >+ /* Wrap when reaching the end of the stack buffer */ > >+ if (d == end) d = stk; > > It might be wise to add a debug assertion here to check that 'd' is > within 'stk' and '&stk[end]'. That will catch some logic errors that can > be hard to spot, and easy to introduce in future patches. Fair enough. > I like the look of this patch (with the 'Additional patch' on top), and it > appears correct so I've marked it 'r+'. However, the N900 crash obviously needs > to diagnosed & fixed before this is pushed. Totally agreed. I'd like some more feedback from Mark if possible.
Any update, Mark ?
We are looking into this. Brian Crowder was able to get a stack from a build: #0 0xbe9ff4a8 in ?? () #1 0x43162a2c in invoke_copy_to_stack (stk=0x12, paramCount=3198154224, s=0x8) at /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:92 #2 0x42693488 in XPCWrappedNative::CallMethod (ccx=..., mode=<value optimized out>) at /home/mfinkle/mozilla-192/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2722 #3 0x4269c2c0 in XPCWrappedNative::GetAttribute (ccx=...) at /home/mfinkle/mozilla-192/mozilla/js/src/xpconnect/src/xpcprivate.h:2524 #4 0x4269993c in XPC_WN_GetterSetter (cx=0x41c05800, obj=<value optimized out>, argc=0, argv=0x40455608, vp=0xbe9ff708) at /home/mfinkle/mozilla-192/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1784 #5 0x406d7c0c in js_Invoke (cx=0x41c05800, argc=0, vp=0x40455600, flags=<value optimized out>)
I have been testing on mozilla-192. I will start testing on mozilla-central as well and report.
The xptcinvoke code is the same on 1.9.2 and trunk. It would be interesting to have the values for d, stk and paramCount on your stacktrace, but that would either need to break on the invoke_copy_to_stack function entry, or run an unoptimized build.
I made a non-optimized build from mozilla-central. The build works fine on my N900. I am now going to make a non-optimized build from mozilla-192 and try it.
Non optimized build from mozilla-192 segfaults on startup. Here is the stack: Starting program: /home/opt/fennec/fennec Program received signal SIGSEGV, Segmentation fault. 0xbe9efb70 in ?? () 0xbe9efb70: andeq r0, r0, r0 (gdb) bt #0 0xbe9efb70 in ?? () #1 0x43a98b30 in invoke_copy_to_stack (stk=0x421d2200, paramCount=1109287616, s=0x8) at /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:92 #2 0x426db3f8 in XPCWrappedNative::CallMethod (ccx=..., mode=XPCWrappedNative::CALL_GETTER) at /home/mfinkle/mozilla-192/mozilla/js/src/xpconnect/src/xpcwrappednative.cpp:2722 #3 0x426ebdf4 in XPCWrappedNative::GetAttribute (ccx=...) at /home/mfinkle/mozilla-192/mozilla/js/src/xpconnect/src/xpcprivate.h:2524 #4 0x426e5fa0 in XPC_WN_GetterSetter (cx=0x40446200, obj=0x421d2200, argc=0, argv=0x404b05a0, vp=0xbe9eff3c) at /home/mfinkle/mozilla-192/mozilla/js/src/xpconnect/src/xpcwrappednativejsops.cpp:1784 #5 0x4076948c in js_Invoke (cx=0x40446200, argc=0, vp=0x404b0598, flags=2) at /home/mfinkle/mozilla-192/mozilla/js/src/jsinterp.cpp:1360
I tried stepping through invoke_copy_to_stack: (gdb) step 1 80 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 88 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 86 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 87 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 92 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 nsXPTCVariant::IsPtrData (this=<value optimized out>) at ../../../../../../dist/include/xptcall.h:110 110 ../../../../../../dist/include/xptcall.h: No such file or directory. in ../../../../../../dist/include/xptcall.h (gdb) step 1 invoke_copy_to_stack (stk=0xbe809940, paramCount=1, s=0xbe809b70) at /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp:91 91 /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp: No such file or directory. in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 92 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 94 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 92 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 88 in /home/mfinkle/mozilla-192/mozilla/xpcom/reflect/xptcall/src/md/unix/xptcinvoke_arm.cpp (gdb) step 1 Program received signal SIGSEGV, Segmentation fault. 0xbe809b70 in ?? () 0xbe809b70: andeq r0, r0, r0 Looks like the if(s->IsPtrData()) test is true and we attempt to continue. I should have been checking locals after each step. More coming.
Mike - One thing I overlooked: This appears to be broken only on mozilla-192, but not on mozilla-central. So it should be OK to checkin the patches you have and we can continue to work on getting this to mozilla-192. Jacob & Brian - Do you guys agree?
I would personally prefer to understand the 192 broken-ness before landing it anywhere.
(In reply to comment #44) > I would personally prefer to understand the 192 broken-ness before landing it > anywhere. Especially considering that really, there should be no difference between both.
(In reply to comment #44) > I would personally prefer to understand the 192 broken-ness before landing it > anywhere. I certainly agree! The brokenness is probably still there in mozilla-central, but just not exercised. It might be fine, but until we understand what's happening, I don't trust it.
mfinkle, is this crashing on any/the first xptcall invocation? Or just a specific one? Stepping through it an instruction at a time and printing registers would be helpful; a gdb command that does 'si ; x/i $pc ; info reg' would probably be helpful, instead of 'step 1' above.
(In reply to comment #47) > mfinkle, is this crashing on any/the first xptcall invocation? Or just a > specific one? Happens on the first invocation > Stepping through it an instruction at a time and printing > registers would be helpful; a gdb command that does 'si ; x/i $pc ; info reg' > would probably be helpful, instead of 'step 1' above. I am collecting the data now. Will attach a dump of the session.
Attached file gdb session from N900
Here is the gdb session leading up to the crash - using the "si ; x/i $pc ; info reg" commands
Attached patch Patch v3 (obsolete) — Splinter Review
D'oh, the problem was so obvious and so problematic that I wonder how it could work without crashes on my builds, or with Mark's m-c builds O_o The problem was that the stack buffer was too small in the paramCount == 1 case, and invoke_copy_to_stack would copy outside the buffer boundaries (it would write *before* the buffer, where the saved pc is for function return, in non optimized builds at least). The added assertion would catch this if this ever happens again. Mark, can you check this patch ? I'll also try it on my end.
Attachment #433581 - Attachment is obsolete: true
Attachment #433641 - Attachment is obsolete: true
Attached patch Patch v3.1 (obsolete) — Splinter Review
This version actually builds. I'm still waiting for the build to fully finish to actually test.
Attachment #438688 - Attachment is obsolete: true
Attached patch Patch v3.2Splinter Review
With a brown paper bag fix, this one seems to work properly. More testing undergoing.
Attachment #438697 - Attachment is obsolete: true
Comment on attachment 438700 [details] [diff] [review] Patch v3.2 I verified this one works for me. I think it will work for Mark, too.
Attachment #438700 - Flags: review?(Jacob.Bramley)
Comment on attachment 438700 [details] [diff] [review] Patch v3.2 This patch works for me! Nice job Mike. Running Fennec on my N900. It launches fine and I am browsing pages and running sunspider.
I tested Ts on my own Fennec build with and without the patch: (averaged over 10 runs) Ts without: 5912ms Ts with: 5787ms Nice 14% improvement
Problem in Google Spreadsheet formula. Correct percentage: 2% Not as good, but still positive!
Comment on attachment 438700 [details] [diff] [review] Patch v3.2 Looks good to me. Nice job!
Attachment #438700 - Flags: review?(Jacob.Bramley) → review+
Summary: Fix NS_InvokeByIndex() to use BLX for method invocations when compiling for Thumb → Reimplement xptcinvoke for arm in C
Summary: Reimplement xptcinvoke for arm in C → Reimplement xptcinvoke for arm/linux in C
Attachment #438700 - Flags: review?(vladimir)
Comment on attachment 438700 [details] [diff] [review] Patch v3.2 Looks good to me! Note that I'm not an xpcom peer -- bsmedberg is though, so asking him just to confirm this patch. Code looks fine to me, and is much cleaner. Thanks!
Attachment #438700 - Flags: superreview?(benjamin)
Attachment #438700 - Flags: review?(vladimir)
Attachment #438700 - Flags: review+
Comment on attachment 438700 [details] [diff] [review] Patch v3.2 I'll delegate to people who know!
Attachment #438700 - Flags: superreview?(benjamin) → superreview+
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
Depends on: 560769
Comment on attachment 438700 [details] [diff] [review] Patch v3.2 Code is baking on trunk and would be helpful on 192 for Fennec 1.1
Attachment #438700 - Flags: approval1.9.2.5?
Comment on attachment 438700 [details] [diff] [review] Patch v3.2 dropping for fennec 1.1
Attachment #438700 - Flags: approval1.9.2.5?
Blocks: 1897635
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: