831611 - [META] Get perf working on b2g

Reporter

Description

•

12 years ago

The Gecko profiler is nice, but does not work well on B2G, principally because it can't walk native (i.e. C++) stacks: AIUI the stack-walking code we have doesn't work from within signal handlers.

Anyway, I think it should be easy to get perf working on b2g. I already got a somewhat-sketchy binary [1] and ran it. I now need to write fix_perf_stack.py to symbolicate the result.

[1] http://code.google.com/p/android-group-korea/downloads/detail?name=perf-binary-kwangwoo-for-android.zip&can=2&q=
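The address-translation step that a symbolication script such as fix_perf_stack.py needs can be sketched as follows. This is only an illustration, not the script from this bug: the names `Mapping`, `parse_maps` and `translate`, and the sample mappings, are hypothetical. It assumes the input is in the /proc/<pid>/maps format and that the library's first executable segment is loaded at file offset 0, so the computed offset can be handed to a tool like addr2line.

```python
from typing import List, NamedTuple, Optional, Tuple


class Mapping(NamedTuple):
    start: int   # first mapped address
    end: int     # one past the last mapped address
    offset: int  # file offset of the start of the mapping
    path: str    # backing file


def parse_maps(text: str) -> List[Mapping]:
    """Parse /proc/<pid>/maps-style lines, keeping file-backed executable mappings."""
    mappings = []
    for line in text.splitlines():
        fields = line.split()
        # Expected fields: range, perms, offset, dev, inode, path
        if len(fields) < 6 or "x" not in fields[1]:
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        mappings.append(Mapping(start, end, int(fields[2], 16), fields[5]))
    return mappings


def translate(pc: int, mappings: List[Mapping]) -> Optional[Tuple[str, int]]:
    """Turn a sampled PC into (library path, offset within that library)."""
    for m in mappings:
        if m.start <= pc < m.end:
            return m.path, pc - m.start + m.offset
    return None  # not in any known executable mapping


# Made-up mappings, for illustration only.
SAMPLE_MAPS = """\
40000000-40010000 r-xp 00000000 b3:01 100 /system/bin/linker
41000000-42800000 r-xp 00000000 b3:01 200 /system/b2g/libxul.so
42800000-42900000 rw-p 01800000 b3:01 200 /system/b2g/libxul.so
"""

maps = parse_maps(SAMPLE_MAPS)
hit = translate(0x41001234, maps)
if hit is not None:
    print("%s+0x%x" % hit)
```

A PC that falls outside every executable mapping (for example one in the kernel's vector page) comes back as `None` and has to be handled separately.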

Justin Lebar (not reading bugmail)

Reporter

Comment 1

•

12 years ago

Attached file Garbled perf report of starting the clock app (obsolete) — Details

It's possible that I've messed up the address translation here, but I checked a few of them by hand, and they seem right. This report looks pretty useless.

Justin Lebar (not reading bugmail)

Reporter

Comment 2

•

12 years ago

Oh, and this was on a DMD build, which has -funwind-tables. But perhaps perf isn't using that.

Justin Lebar (not reading bugmail)

Reporter

Comment 3

•

12 years ago

Heh, we apparently did this once.

https://github.com/andreasgal/B2G/issues/167

When I run perf report, I get

> Failed to open /system/bin/linker, continuing without symbols

but I presume that's not what's keeping me from getting good addresses out of perf which I can later translate.

Justin Lebar (not reading bugmail)

Reporter

Comment 4

•

12 years ago

Okay, so taking objdir-gecko/toolkit/library/libxul.so, running strip -g (strips out debug info), putting that on the phone, and then running perf gives me good top-level frames. The stacks are still useless, though, and it still shows us spending 75% startup time in a mystery function.

The mystery function's PC is 0xffff0fd4, which looks an awful lot like a stack address or something. I'm not sure what to make of that; I wouldn't expect even the JIT to run code off the stack.

Another reason not to believe it's JIT code is that it's the one thing the Gecko profiler is supposed to know how to show us, and gandalf's SPS profiles in bug 831135 comment 4 do not show time spent in the JIT.

Dave Hylands [:dhylands]

Comment 5

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #4)
> Okay, so taking objdir-gecko/toolkit/library/libxul.so, running strip -g
> (strips out debug info), putting that on the phone, and then running perf
> gives me good top-level frames. The stacks are still useless, though, and
> it still shows us spending 75% startup time in a mystery function.
>
> The mystery function's PC is 0xffff0fd4, which looks an awful lot like a
> stack address or something. I'm not sure what to make of that; I wouldn't
> expect even the JIT to run code off the stack.

I recognize that address. Take a look here:
https://github.com/mozilla-b2g/B2G/blob/master/scripts/profile-symbolicate.py#L5

0xffff0fd4 corresponds to __kernel_cmpxchg which is used to implement atomic operations in user space.
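A symbolication script can special-case such addresses. The sketch below is not taken from profile-symbolicate.py; it is a hypothetical lookup (`kuser_helper_for` is an invented name) built on the fixed entry points that the Linux kernel's documentation of ARM kernel-provided user helpers lists for the vector page. That documentation calls the helper `__kuser_cmpxchg`; `__kernel_cmpxchg` is another name for the same entry point.

```python
# Fixed entry points of the ARM Linux kernel-provided user helpers,
# as listed in the kernel's kernel_user_helpers documentation.
KUSER_HELPERS = [
    (0xFFFF0F60, "__kuser_cmpxchg64"),
    (0xFFFF0FA0, "__kuser_memory_barrier"),
    (0xFFFF0FC0, "__kuser_cmpxchg"),
    (0xFFFF0FE0, "__kuser_get_tls"),
]
# The word at this address holds __kuser_helper_version, not code.
KUSER_END = 0xFFFF0FFC


def kuser_helper_for(pc):
    """Return the helper whose slot contains pc, or None if pc is elsewhere."""
    if not (KUSER_HELPERS[0][0] <= pc < KUSER_END):
        return None
    name = None
    for start, helper in KUSER_HELPERS:
        # Entries are sorted, so the last start address <= pc wins.
        if start <= pc:
            name = helper
    return name


print(kuser_helper_for(0xFFFF0FD4))
```

The mystery PC lies 0x14 bytes past the 0xffff0fc0 entry point, which is why it resolves to the compare-and-exchange helper.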

Dave Hylands [:dhylands]

Comment 6

•

12 years ago

Actually, looking at the bionic source, it seems that we may be using something designed for ARMv5. If I do:

touch empty.c
prebuilt/linux-x86/toolchain/arm-linux-androideabi-4.4.x/bin/arm-linux-androideabi-gcc -E -dD empty.c | grep ARCH

it reports:

#define __ARM_ARCH_5TE__ 1

My unagi reports being an ARMv7, so theoretically, we should be able to rebuild the toolchain or add -mcpu=cortex-a8, in which case it reports:

#define __ARM_ARCH_7A__ 1

which would enable the presumably more efficient variant of the atomic operations. See: bionic/libc/arch-arm/bionic/atomics_arm.S

Mike Hommey [:glandium]

Comment 7

•

12 years ago

Note that perf doesn't like elfhack, so you have to disable elfhack if you have it enabled. You can also get debug symbols for system libraries by invoking a build.sh command I forget.

Dave Hylands [:dhylands]

Comment 8

•

12 years ago

So I disassembled ./out/target/product/otoro/obj/STATIC_LIBRARIES/libc_common_intermediates/arch-arm/bionic/atomics_arm.o and it seems to be using LDREX and STREX, so I wonder where the references to 0xffff0fd4 are coming from?

Justin Lebar (not reading bugmail)

Reporter

Comment 9

•

12 years ago

> so I wonder where the references to 0xffff0fd4 are coming from?

It could be being called from the kernel, for all we know. :-/

Justin Lebar (not reading bugmail)

Reporter

Comment 10

•

12 years ago

Attached file Perf report from main process (obsolete) — Details

elfhack disabled, building with -fno-omit-frame-pointer and -funwind-tables. I unfortunately still don't have useful stacks from Gecko. However, there is an interesting point here, if we believe it: It says that the main process spends ~50% of its time in finish_task_switch.clone.3 in the kernel. We don't have a Gecko caller for most of the context switches, but the one called out here (pt_PostNotifies, ptsynch.c:111) seems reasonable. Note that perf is a CPU time profiler, rather than a wall clock time profiler. (SPS profiles wall time.) So it's not clear how much of the main process's time is spent asleep, versus how much time is spent context-switching.
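The CPU-time versus wall-clock distinction can be illustrated with a small standalone sketch. It has nothing to do with perf or SPS internals; it only uses Python's standard `time` module, and the helper names (`measure`, `sleepy`, `busy`) are made up for the example.

```python
import time


def measure(work):
    """Run work() and return (wall-clock seconds, CPU seconds) it took."""
    wall_start = time.perf_counter()   # wall clock: keeps ticking while asleep
    cpu_start = time.process_time()    # CPU time: only advances while running
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


def sleepy():
    # Blocked in the kernel: costs wall time, almost no CPU time.
    time.sleep(0.2)


def busy():
    # Pure computation: wall time and CPU time advance together.
    total = 0
    for i in range(200000):
        total += i * i


for name, work in (("sleepy", sleepy), ("busy", busy)):
    wall, cpu = measure(work)
    print("%-6s wall=%.3fs cpu=%.3fs" % (name, wall, cpu))
```

A CPU-time sampler sees almost nothing of the sleeping task, while a wall-clock sampler attributes the full 0.2 seconds to it, which is why the two kinds of profile can disagree about where the main process's time goes.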

Justin Lebar (not reading bugmail)

Reporter

Comment 11

•

12 years ago

> 7.37% GL updater libdmd.so [.] __udivsi3

This is also interesting, because the only place DMD does division (I checked the object file) is in replace_aligned_alloc and replace_posix_memalign. I wouldn't expect those functions to be called often enough to show up here.

Justin Lebar (not reading bugmail)

Reporter

Comment 12

•

12 years ago

I found the code in the kernel which walks the userspace stacks for perf. It is not very intelligent. :)

http://lxr.linux.no/#linux+v3.7.3/arch/arm/kernel/perf_event.c#L528

It uses the frame pointer (regs->ARM_fp) and then expects that to point to the previous frame's fp, and so on.

What's interesting is that even when I compile a simple program with -fno-omit-frame-pointer, perf still can't walk the stack. Perhaps I'm not compiling it right, or perhaps the stack-walking code in the kernel on our devices is different from the code at kernel tip.

Justin Lebar (not reading bugmail)

Reporter

Comment 13

•

12 years ago

Okay, we discovered that perf seems to be able to walk stacks if you do

-marm -mapcs-frame -fno-omit-frame-pointer

Crucially, perf does /not/ seem happy with

-mthumb -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer

Which means that if we want perf stacks, we apparently have to compile as arm. Let's see if this works at all...

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

Depends on: 832379

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

Depends on: 832752

Gabriele Svelto [:gsvelto]

Comment 14

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #13)
> Okay, we discovered that perf seems to be able to walk stacks if you do
>
> -marm -mapcs-frame -fno-omit-frame-pointer
>
> Crucially, perf does /not/ seem happy with
>
> -mthumb -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer

That's the same issue I've encountered when trying to implement stack walking using _Unwind_Backtrace(). If you try generating a back-trace from GDB using the same flags you'll encounter the same problem. The issue is that when the kernel enters the signal handler it leaves what looks like a non-Thumb stack frame on the stack even when the signal handler has been invoked in a Thumb executable. Most unwinders get confused by this (understandably, the ABI is not the same) and will work only if your code is also compiled in non-Thumb mode.

> Which means that if we want perf stacks, we apparently have to compile as
> arm. Let's see if this works at all...

The issue is that you'll have some significant code-size differences between ARM- and Thumb-mode so the results will be a bit skewed. I was hoping to set aside enough free time to implement a minimal NS_StackWalk() that would deal with this if the breakpad-based profiler doesn't land soon but I'm not really sure how long it might take me.

Justin Lebar (not reading bugmail)

Reporter

Comment 15

•

12 years ago

> The issue is that when the kernel enters the signal handler it leaves what looks like a non-Thumb
> stack frame on the stack even when the signal handler has been invoked in a Thumb executable. Most
> unwinders get confused by this (understandably, the ABI is not the same) and will work only if your
> code is also compiled in non-Thumb mode.

I don't think this is the issue with the kernel unwinding code I linked to in comment 12.

First of all there's no user-space signal handler, as far as I can tell. It's a kernel interrupt. So the kernel does not have to unwind through a signal handler.

But second of all, the kernel does not appear to have any check for whether we're in ARM or Thumb mode. It's assuming that the FP is in the same place in both cases. And -mthumb -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer does not put the FP there, so the kernel can't unwind.

Justin Lebar (not reading bugmail)

Reporter

Comment 16

•

12 years ago

fwiw it would be /great/ if we could figure out how to get perf to be happy with thumb code. As you say, -marm will have different performance characteristics than -mthumb. But also, -marm doesn't work with gcc 4.4, so we have to compile with a different compiler! And -mapcs-frame will of course make things slower. At some point it becomes a different build... :)

But I'm still convinced that we want to get perf working one way or another, if we can; we have too many issues with "I have no idea where this time is going" with the built-in profiler, and it's not entirely because we can't walk the native stacks. See e.g. bug 831135.

Gabriele Svelto [:gsvelto]

Comment 17

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #15)
> First of all there's no user-space signal handler, as far as I can tell.
> It's a kernel interrupt. So the kernel does not have to unwind through a
> signal handler.

AFAIK the kernel leaves a dummy frame which will cause the userspace signal handler to invoke sigreturn() when returning. The only thing that changes is the status of the CPSR (Current Program Status Register), see setup_return():

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=blob_plain;f=arch/arm/kernel/signal.c;hb=HEAD

The code in _Unwind_Backtrace() or GDB however expects all the stack frames to conform to the Thumb ABI and this particular frame doesn't, thus breaking the whole process. I'm not sure if the kernel has the same issue but from your description I'd guess it does.

> But second of all, the kernel does not appear to have any check for whether
> we're in ARM or Thumb mode. It's assuming that the FP is in the same place
> in both cases. And -mthumb -mtpcs-frame -mtpcs-leaf-frame
> -fno-omit-frame-pointer does not put the FP there, so the kernel can't
> unwind.

Yeah, this sounds a lot like a problem caused by an ABI mismatch, i.e. the last Thumb frame before the handler points back as if the signal handler frame would also conform to the Thumb ABI. Since it doesn't, the whole procedure falls apart. That's what I meant when I said that the kernel leaves a non-Thumb frame; I now realize that I could have worded my post far better than I did :-)

Gabriele Svelto [:gsvelto]

Comment 18

•

12 years ago

To elaborate a bit further on this: most unwinders I've looked at try to look up the return address of the last frame before the signal handler in the EH to find the appropriate unwinding information. The return address however points to sigreturn() which is a syscall (thus in the 0xffff0000-0xffffffff area) and thus they can't find a matching EH entry; as a fallback they try using the FP but they don't find a valid one either and so they give up.

It might be that we just need to special-case the unwinding code by detecting that we're dealing with a signal handler stack frame and skipping over it to the last frame before the signal handler was invoked. After all the signal handler frame should always have the same size and layout.

Justin Lebar (not reading bugmail)

Reporter

Comment 19

•

12 years ago

Let me elaborate more on why I don't think signal handler frames are the problem here. Maybe I'm wrong and you can give us a new way forward towards getting perf to work.

Suppose we compile the following file:

> int bar(int);
>
> int foo() {
>   int i;
>   for (i = 0; i < 100; i++) {
>     bar(i);
>   }
>   return i;
> }

With -marm -O2, I get

> 00000000 <foo>:
>    0: e92d4010  push {r4, lr}
>    4: e3a04000  mov r4, #0
>    8: e1a00004  mov r0, r4
>    c: e2844001  add r4, r4, #1
>   10: ebfffffe  bl 0 <bar>
>   14: e3540064  cmp r4, #100 ; 0x64
>   18: 1afffffa  bne 8 <foo+0x8>
>   1c: e1a00004  mov r0, r4
>   20: e8bd8010  pop {r4, pc}

All that's relevant for our purposes are the push and pop statements, so we can ignore the rest.

Is the frame here unwindable by the kernel code in comment 12? Well, the kernel code is looking for a frame which contains three pointers, {fp, sp, lr}. This frame can't possibly work, because it pushes only two registers.

Suppose we compile instead with -marm -O2 -mapcs-frame. Then our function prelude is

> 00000000 <foo>:
>    0: e1a0c00d  mov ip, sp
>    4: e92dd818  push {r3, r4, fp, ip, lr, pc}

Now we have {fp, sp, lr} in the stack frame, and it turns out that the kernel can unwind this frame.

Compare this to the prelude when we compile with -mthumb -O2:

> 00000000 <foo>:
>    0: b510  push {r4, lr}

That's not going to work; it's the same as -marm -O2.

What about -mthumb -O2 -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer?

> 00000000 <foo>:
>    0: b084  sub sp, #16
>    2: b598  push {r3, r4, r7, lr}

In order for this to work, gcc must be storing the fp in r4, because the kernel is expecting {fp, sp, lr}. (The kernel doesn't do anything with the sp value, so it would be ok if r7 was used as a general-purpose register.)

Let's see if we can figure out which register is used as the fp in this disassembly (again, generated with -mthumb -O2 -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer):

> 00000000 <foo>:
>    0: b084       sub sp, #16
>    2: b598       push {r3, r4, r7, lr}
>    4: ab08       add r3, sp, #32
>    6: 9305       str r3, [sp, #20]
>    8: 467b       mov r3, pc
>    a: 9307       str r3, [sp, #28]
>    c: 465b       mov r3, fp
>    e: 9304       str r3, [sp, #16]
>   10: 4673       mov r3, lr
>   12: 9306       str r3, [sp, #24]
>   14: ab07       add r3, sp, #28
>   16: 469b       mov fp, r3
>   18: af00       add r7, sp, #0
>   1a: 2400       movs r4, #0
>   1c: 1c20       adds r0, r4, #0
>   1e: 3401       adds r4, #1
>   20: f7ff fffe  bl 0 <bar>
>   24: 2c64       cmp r4, #100 ; 0x64
>   26: d1f9       bne.n 1c <foo+0x1c>
>   28: 46bd       mov sp, r7
>   2a: 2064       movs r0, #100 ; 0x64
>   2c: bc98       pop {r3, r4, r7}
>   2e: bc0e       pop {r1, r2, r3}
>   30: 4693       mov fp, r2
>   32: 469d       mov sp, r3
>   34: 4708       bx r1

r4 is used as the loop control variable (cmp r4, #100), so it's not being used as a frame pointer.

Again, as far as I can tell from reading the source code, the kernel unwinder does not look for EH entries, or any elf magic. It looks only at stack frames. The kernel does not have both ARM and Thumb unwind code, so, while I don't think it's wrong to call the problem an ABI mismatch, the issue isn't that the kernel is unwinding through an ARM signal handler into Thumb code and not noticing the transition to Thumb. The issue, as far as I can tell, is that the kernel simply has no code whatsoever to unwind this sort of frame.

If we could convince gcc to push {fp, X, lr} in thumb mode, I think there would be a (good) chance the kernel could unwind these thumb frames.
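The register lists in the push instructions above can be read straight off the opcodes. The sketch below decodes only those two encodings (ARM-mode `push`, i.e. STMDB sp!, and the 16-bit Thumb PUSH) and checks whether a single push stores the {fp, ip, lr} triple that -mapcs-frame produces, with ip holding the caller's sp. The function names are invented for illustration, and this is a quick check of the disassembly, not a general disassembler.

```python
ARM_REG_NAMES = ["r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7",
                 "r8", "r9", "r10", "fp", "ip", "sp", "lr", "pc"]


def decode_arm_push(opcode):
    """Registers stored by an ARM-mode push (STMDB sp!, {...}): 0x?92dXXXX."""
    if opcode & 0x0FFF0000 != 0x092D0000:
        raise ValueError("not an ARM push: %#x" % opcode)
    mask = opcode & 0xFFFF  # one bit per register, bit 0 = r0
    return [name for bit, name in enumerate(ARM_REG_NAMES) if mask & (1 << bit)]


def decode_thumb_push(opcode):
    """Registers stored by a 16-bit Thumb PUSH: 1011 010M rrrrrrrr."""
    if opcode & 0xFE00 != 0xB400:
        raise ValueError("not a Thumb push: %#x" % opcode)
    regs = ["r%d" % bit for bit in range(8) if opcode & (1 << bit)]
    if opcode & 0x100:  # the M bit adds lr to the list
        regs.append("lr")
    return regs


def has_apcs_triple(regs):
    """True if one push stores fp, ip (the caller's sp) and lr together."""
    return all(name in regs for name in ("fp", "ip", "lr"))


# Opcodes copied from the disassembly in this comment.
for label, regs in (
    ("-marm -O2", decode_arm_push(0xE92D4010)),
    ("-marm -O2 -mapcs-frame", decode_arm_push(0xE92DD818)),
    ("-mthumb -O2", decode_thumb_push(0xB510)),
    ("-mthumb -O2 -mtpcs-frame ...", decode_thumb_push(0xB598)),
):
    print(label, regs, "APCS triple:", has_apcs_triple(regs))
```

Only the -mapcs-frame prelude pushes all three registers in one instruction. The 16-bit Thumb PUSH can only name r0-r7 and lr, so the -mtpcs-frame prelude has to store the other values with separate str instructions.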

Gabriele Svelto [:gsvelto]

Comment 20

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #19)
> Let me elaborate more on why I don't think signal handler frames are the
> problem here. Maybe I'm wrong and you can give us a new way forward towards
> getting perf to work.

Just a small premise, if my ARM-fu is not too rusty the frame-pointer should be r7 in Thumb mode and r11 in ARM mode.

> In order for this to work, gcc must be storing the fp in r4, because the
> kernel is expecting {fp, sp, lr}. (The kernel doesn't do anything with the sp
> value, so it would be ok if r7 was used as a general-purpose register.)

I can't remember right now what's the sp in Thumb mode but r7 is definitely the frame-pointer so this is {fp, lr}. I can't remember if r4 has a special role but it might.

> r4 is used as the loop control variable (cmp r4, #100), so it's not being
> used as a frame pointer.

I have to dig out the full ABI description but FP is definitely r7 in Thumb mode. The only thing I'm not sure of is if r4 has a special role or not.

> Again, as far as I can tell from reading the source code, the kernel unwinder
> does not look for EH entries, or any elf magic. It looks only at stack
> frames.

I see, so for this to work we need a valid fp-chain in the stack; things like _Unwind_Backtrace() or GDB also look at the EH entries so I assumed the kernel might want to do the same.

> The issue, as far as I can tell, is that
> the kernel simply has no code whatsoever to unwind this sort of frame.

It seems so, but it might also be possible that it's not looking for the FP where it's supposed to be (i.e. in r7 instead of r11).

> If we could convince gcc to push {fp, X, lr} in thumb mode, I think there
> would be a (good) chance the kernel could unwind these thumb frames.

Yes, pretty much every unwinder should work then.

Justin Lebar (not reading bugmail)

Reporter

Comment 21

•

12 years ago

> It seems so, but it might also be possible that it's not looking for the FP where it's
> supposed to be (i.e. in r7 instead of r11).

I encourage you to read the kernel code I've linked. It's quite straightforward, and the relevant bit is only about a hundred lines.

It would certainly be possible for the kernel to look for {fp, lr} on the stack when we're in thumb mode, but as far as I can tell, it is not doing so.

Justin Lebar (not reading bugmail)

Reporter

Comment 22

•

12 years ago

cjones, how hard exactly would it be to rebuild our kernel? If B2G works with -mthumb -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer (I need to check), then modifying the kernel so that it can walk these frames might be simpler than upgrading to gcc 4.6, fixing bug 832752, and fixing whatever roadblock I hit after that.

Flags: needinfo?(jones.chris.g)

Gabriele Svelto [:gsvelto]

Comment 23

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #21)
> I encourage you to read the kernel code I've linked. It's quite
> straightforward, and the relevant bit is only about a hundred lines.

OK, it looks very straightforward indeed. I'm not sure why it doesn't correctly walk the -mthumb -O2 -mtpcs-frame -mtpcs-leaf-frame -fno-omit-frame-pointer case; the fp should be on the stack so it must be looking for it at the wrong offset.

> It would certainly be possible for the kernel to look for {fp, lr} on the stack
> when we're in thumb mode, but as far as I can tell, it is not doing so.

Yes, BTW I agree that it would be better to put a fix in the kernel if possible as I don't think we'll be able to work around this with the compiler. Did you have the chance to check what more recent GCCs output for this case? If not shall I do it?

Justin Lebar (not reading bugmail)

Reporter

Comment 24

•

12 years ago

> Did you have the chance to check what more recent GCCs outputs for this case?

I didn't realize it at the time, but the output from comment 19 is with gcc 4.6. gcc 4.4 has the same preludes, though.

> I'm not sure why it doesn't walk correctly the -mthumb -O2 -mtpcs-frame -mtpcs-leaf-frame
> -fno-omit-frame-pointer case, the fp should be on the stack so it must be looking for it at the wrong offset.

I think you're probably right that it's pushing {r7, lr} as {fp, lr}. So the fp is on the stack. But then the kernel is unwinding the stack using

> struct frame_tail {
>     struct frame_tail __user *fp;
>     unsigned long sp;
>     unsigned long lr;
> } __attribute__((packed));

which expects there to be a word between fp and lr.
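To make the layout requirement concrete, here is a simplified user-space model of a frame_tail walk, written from a reading of the perf_event.c code linked in comment 12. It is a sketch, not the kernel's code: `walk_user_stack`, `put_frame`, the depth limit and all addresses are made up, and the stack is just a byte buffer. It assumes 32-bit little-endian words and that fp points one word past the {fp, sp, lr} triple, as it does with -mapcs-frame.

```python
import struct

WORD = 4
FRAME_TAIL = struct.Struct("<III")  # fp, sp, lr as three packed 32-bit words
MAX_DEPTH = 127  # stand-in for the kernel's depth limit; the value is an assumption


def walk_user_stack(memory, base, fp):
    """Collect return addresses by following the chain of saved frame pointers.

    memory holds a snapshot of the stack starting at address base.
    """
    callchain = []
    tail = fp - FRAME_TAIL.size  # the triple sits directly below fp
    while len(callchain) < MAX_DEPTH and tail % WORD == 0:
        offset = tail - base
        if offset < 0 or offset + FRAME_TAIL.size > len(memory):
            break  # models a failed read of user memory
        prev_fp, _saved_sp, lr = FRAME_TAIL.unpack_from(memory, offset)
        callchain.append(lr)
        # Frame pointers must strictly progress towards higher addresses.
        if tail + FRAME_TAIL.size >= prev_fp:
            break
        tail = prev_fp - FRAME_TAIL.size
    return callchain


def put_frame(memory, base, fp, prev_fp, lr):
    """Write one {fp, sp, lr} triple just below fp (the saved sp is unused here)."""
    FRAME_TAIL.pack_into(memory, fp - FRAME_TAIL.size - base, prev_fp, 0, lr)


BASE = 0x1000
stack = bytearray(0x100)
put_frame(stack, BASE, fp=0x1040, prev_fp=0x1080, lr=0x8010)  # innermost frame
put_frame(stack, BASE, fp=0x1080, prev_fp=0x10C0, lr=0x8020)
put_frame(stack, BASE, fp=0x10C0, prev_fp=0, lr=0x8030)       # outermost frame

print([hex(addr) for addr in walk_user_stack(stack, BASE, 0x1040)])
```

With a prelude that stores only {fp, lr} next to each other, the word this walk treats as the saved fp is some other value, so the chain breaks after the first frame.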

Justin Lebar (not reading bugmail)

Reporter

Comment 25

•

12 years ago

Attached file Perf report of starting up FM radio app (after preloading) (obsolete) — Details

We're getting there... This is a perf report of starting up the FM radio app. I attached perf to the idle preallocated process, then started the FM radio app. Once it finished loading, I detached perf. I stuck a strip -g'ed libxul.so on the phone, and if you scroll down a bit, you can see that we get pretty reasonable looking stacks from it. But the topmost stack is still totally opaque, and my addr2line tool isn't helping. I suspect we may be inside JS at the time and be unable to unwind out of it.

Attachment #703150 - Attachment is obsolete: true

Attachment #703293 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Reporter

Comment 26

•

12 years ago

Attached file Perf report of launching FM radio proc with apparently correct addr2line — Details

I'd forgotten to disable elfhack; doing so made this a lot better! This still isn't very useful, though. It seems like perf is unable to aggregate the top entry (the 70-something percenter), so it's just spitting out a bunch of random stack frames. The top entry is the one that's not addr2line'ed by perf, so my guess is that the fact that it can't aggregate these frames is related to the fact that it can't addr2line them.

Attachment #705432 - Attachment is obsolete: true

Zibi Braniecki [:zbraniecki][:gandalf]

Updated

•

12 years ago

Blocks: 831135

Zibi Braniecki [:zbraniecki][:gandalf]

Updated

•

12 years ago

Blocks: slim-fast

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 27

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #22)
> cjones, how hard exactly would it be to rebuild our kernel?
>
> If B2G works with -mthumb -mtpcs-frame -mtpcs-leaf-frame
> -fno-omit-frame-pointer (I need to check), then modifying the kernel so that
> it can walk these frames might be simpler than upgrading to gcc 4.6, fixing
> bug 832752, and fixing whatever roadblock I hit after that.

We can get new kernel builds. Please list exactly the config change you want and mwu or someone else can send out the request.

Flags: needinfo?(jones.chris.g)

Justin Lebar (not reading bugmail)

Reporter

Comment 28

•

12 years ago

> Please list exactly the config change you want and mwu or someone else can send out the
> request.

I'd need to actually patch the kernel source. That would be difficult if I couldn't compile it myself.

I could try writing a patch against a different version of the kernel, perhaps even targeting a different device (e.g. an Android phone).

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 29

•

12 years ago

A patch against the kernel tree would be even easier. The baseline tree used by downstreams is in your b2g checkout as kernel/. There should be some msm* config you can use to sanity check the patch.

Justin Lebar (not reading bugmail)

Reporter

Comment 30

•

12 years ago

Attached file perf report -G (starting with root) of launching FM radio app into preallocated process — Details

cjones asked on IRC to see this. It's pretty hard for me to read, but perhaps there's some interesting data here.

Justin Lebar (not reading bugmail)

Reporter

Updated

•

12 years ago

Attachment #705445 - Attachment is patch: false

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 31

•

12 years ago

It's a little hard to tell but it looks like we might be spending fairly significant time loading JS modules. At any rate, about 20% of the time at the top of the profile is going to JS execution stuff. Another 20% in the middle is painting. Below that and ~12% is more JS execution, this time definitely of content script. There's some HTML and CSS stuff going on in there too, maybe dynamically created content. Then we have 6% or so running onload. Below that is about 10% in screenshotting :(. 5% is showing up in layers code, but not in the tree where it should be. Not sure what's up there. Looks like the data is relatively useful. Just need to load it in a UI that lets it talk to us :(.

Justin Lebar (not reading bugmail)

Reporter

Comment 32

•

12 years ago

> Looks like the data is relatively useful.

Just to be clear, that's assuming that the data is correct, which I myself am not ready to give a lot of confidence to. :)

Justin Lebar (not reading bugmail)

Reporter

Comment 33

•

12 years ago

fwiw, cjones's latest profiling exploits have convinced me that there's a lot of value in being able to use SPS to profile. So I'm going to see if anything I learned from this perf work can apply towards getting stacks out of SPS.

Gabriele Svelto [:gsvelto]

Comment 34

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #33)
> fwiw, cjones's latest profiling exploits have convinced me that there's a
> lot of value in being able to use SPS to profile. So I'm going to see if
> anything I learned from this perf work can apply towards getting stacks out
> of SPS.

The existing bug for that is 810526 which is assigned to me. I've put it on hold as work on bug 779291 was going on though I don't think that we'll have anything soon. IMHO there's two ways forward, either we help out finishing the last dependencies for breakpad-based SPS unwinding or we put together something real quick for B2G only.

On this topic I had a discussion recently with :mwu and we came up with the idea of trying to come up with our own minimal stack-walking code by re-using libgcc's existing facilities to read EH tables. We didn't check how feasible it is but it could be a quick way of having a decent stack walker (we'd need to add workarounds for signal handlers but that shouldn't be too complicated). If we could modify _Unwind_Backtrace() directly this would all be much easier but unfortunately that's not an option.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 35

•

12 years ago

Note, the gecko profiler tells us what's going on in main threads, but if non-main threads are contributing to overhead, it will only be able to show that certain tasks are taking longer. So there's still value in having a working perf, even if it can't show us samples in .js code, e.g.

Justin Lebar (not reading bugmail)

Reporter

Comment 36

•

12 years ago

> IMHO there's two ways forward, either we help out finishing the last dependencies for
> breakpad-based SPS unwinding or we put together something real quick for B2G only.

I don't think this has to be particularly complicated. Instead, I think we can just do what the kernel does: Assume we're compiled with a standard frame (e.g. -mapcs-frame or -mtpcs-frame) and walk the stack manually.

Gabriele Svelto [:gsvelto]

Comment 37

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #36)
> I don't think this has to be particularly complicated. Instead, I think we
> can just do what the kernel does: Assume we're compiled with a standard
> frame (e.g. -mapcs-frame or -mtpcs-frame) and walk the stack manually.

In that case it should be a real quick fix to NS_StackWalk() with only a small addition for the {fp, lr} case you encountered.

Michael Vines [:m1] [:evilmachines]

Comment 38

•

12 years ago

(In reply to Justin Lebar [:jlebar] from comment #16)
> -marm doesn't work with gcc 4.4

What I'm seeing is that a libxul.so built with |--with-thumb=no| using gcc 4.4 fails in B2G sometime after the startup animation runs but before the homescreen appears. Is this the same "doesn't work"?

Justin Lebar (not reading bugmail)

Reporter

Comment 39

•

12 years ago

> a libxul.so built with |--with-thumb=no| using gcc 4.4 fails in B2G sometime after the
> startup animation runs but before the homescreen appears.

Yes; IIRC I was able to get to the lock screen but the device did something wrong when I tried to unlock (I forget if it crashed or simply didn't unlock).

In fact something is apparently miscompiled in the JS engine; we don't pass the JS tests with a JS shell built with gcc 4.4 -marm. See bug 832379.

Garbled perf report of starting the clock app 12 years ago Justin Lebar (not reading bugmail) 28.90 KB, text/plain		Details
Perf report from main process 12 years ago Justin Lebar (not reading bugmail) 50.02 KB, text/plain		Details
Perf report of starting up FM radio app (after preloading) 12 years ago Justin Lebar (not reading bugmail) 528.40 KB, text/plain		Details
Perf report of launching FM radio proc with apparently correct addr2line 12 years ago Justin Lebar (not reading bugmail) 1.21 MB, text/plain		Details
perf report -G (starting with root) of launching FM radio app into preallocated process 12 years ago Justin Lebar (not reading bugmail) 2.26 MB, text/plain		Details
perf(1) profile of CrystalSkull main thread converted to SPS format 12 years ago Jed Davis [:jld] ⟨⏰\|UTC-8⟩ ⟦he/him⟧ 104.09 KB, application/octet-stream		Details