Allow Linux perf_event sampling of context switches to get call stacks on B2G

RESOLVED WONTFIX

Status

RESOLVED WONTFIX
Opened 6 years ago
Closed 3 years ago

People

(Reporter: jld, Assigned: jld)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [perf-reviewed])

Attachments

(2 attachments)

This was looked at in bug 847592 (see bug 847592 comment 49 and the next few), but it needs its own bug. To summarize: we'll sometimes want to know what threads are doing when they're not on the CPU, that is, whether they're blocked (and, if so, on what) or runnable but not running due to scheduling decisions. The perf_event framework has an event type for probing context switches, which would give us this information, but it doesn't gather call stacks on ARM. It looks as if at least part of what's missing is perf_arch_fetch_caller_regs[1], whose implementations on other platforms are relatively simple. That function retrieves only the necessary kernel registers, so something else may be needed to get the user stack as well.

[1] http://lxr.free-electrons.com/ident?i=perf_arch_fetch_caller_regs
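
For reference, here is a minimal userspace sketch of the kind of event being discussed: a software context-switch event opened through perf_event_open(2) with call chains requested. It is an illustration only, not the tooling used in bug 847592. The helper names init_context_switch_attr and open_context_switch_event are made up for this example, and the choice of sample fields is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: describe an event that samples every context
 * switch and asks the kernel to record a call chain with each sample. */
void init_context_switch_attr(struct perf_event_attr *attr)
{
    assert(attr != NULL);
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;
    attr->config = PERF_COUNT_SW_CONTEXT_SWITCHES;
    attr->sample_period = 1; /* one sample per context switch */
    attr->sample_type = PERF_SAMPLE_TID | PERF_SAMPLE_TIME |
                        PERF_SAMPLE_CALLCHAIN;
    attr->disabled = 1;      /* enable later with PERF_EVENT_IOC_ENABLE */
}

/* Hypothetical helper: glibc provides no perf_event_open wrapper, so the
 * call goes through syscall(2). Returns a file descriptor, or -1 with
 * errno set (for example when perf_event_paranoid forbids access). */
int open_context_switch_event(pid_t pid, int cpu)
{
    struct perf_event_attr attr;

    init_context_switch_attr(&attr);
    return (int)syscall(SYS_perf_event_open, &attr, pid, cpu, -1, 0UL);
}
```

Calling open_context_switch_event(0, -1) would attach the event to the calling thread on any CPU. The samples then have to be read from an mmap'd ring buffer on the returned descriptor, which is omitted here. On a kernel without perf_arch_fetch_caller_regs for the architecture, the event can still be opened, but the recorded call chains would be expected to be missing or truncated, which is the problem this bug describes.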
Summary: Allow perf_event sampling of context switches to get call chain → Allow Linux perf_event sampling of context switches to get call stacks on B2G
Assignee: nobody → jld
Created attachment 762996
Linux kernel patch (against 3.10.0-rc5+) to add perf_arch_fetch_caller_regs on ARM

Here's a patch against Linux master.  Note that it hasn't been run past upstream yet, and it may not apply cleanly to other versions.
Created attachment 762997
Linux kernel patch (against 3.0.8-GP+) to add perf_arch_fetch_caller_regs on ARM

This one is for applying to the Geeksphone Keon kernel.
Attachment #762996 - Attachment description: Linux kernel patch (against 3.10.0-rc5+) to add perf_arch_caller_regs on ARM → Linux kernel patch (against 3.10.0-rc5+) to add perf_arch_fetch_caller_regs on ARM

Updated

6 years ago
Keywords: perf
Whiteboard: p= c= ,

Updated

6 years ago
Status: NEW → ASSIGNED
Whiteboard: p= c= , → [p=profiling c=]
Whiteboard: [p=profiling c=] → [p=5 c=]

Updated

6 years ago
Whiteboard: [p=5 c=] → [p=5 c=profiling]
Depends on: 904899
Whiteboard: [p=5 c=profiling] → [c=profiling s=2013.09.06 p=3]
Whiteboard: [c=profiling s=2013.09.06 p=3] → [c=profiling s= p=3]
I got a proof of concept working on Wednesday morning during the Oslo work week: https://people.mozilla.org/~bgirard/cleopatra/#report=619e178f5e8e20e78f9b07ab2dd4e5e0b7176345

Judging by the relatively large number of non-unwound samples there are probably some bugs, but it works.

The question is how this compares to the Gecko profiler. The Gecko profiler is gaining the ability to sample multiple threads in different processes and combine them into one profile (which was arguably perf's big advantage), and it should soon be able to trace past jitcode (which at this point perf seems unlikely ever to be able to do, at least not without even more kernel changes and special-case hacks).
I still think being able to tell why a thread is blocked (just switched out, stuck on I/O, waiting for a lock) is extremely valuable. If we can get that information from the Gecko profiler, then I'm happy.

Updated

5 years ago
Keywords: perf
Whiteboard: [c=profiling s= p=3]

Updated

5 years ago
Whiteboard: [perf-reviewed]
This is probably not happening in any form. I think the Gecko profiler can get roughly equivalent information by now, and it's possible that the upstream Linux kernel has already taken care of this.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WONTFIX