Bug 897769 (closed). Opened 11 years ago, closed 11 years ago.

Test/benchmark PERF_SAMPLE_STACK_USER on B2G

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: jld, Assigned: jld)

References

Details

(Keywords: perf, Whiteboard: [c=profiling p=3 s=2013.08.09])

Attachments

(1 file)

loop.c, 11 years ago, Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧, 535 bytes, text/x-csrc

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Description

11 years ago

Part of the story for using ARM exception handling tables instead of the current frame pointer hacks is being able to get userspace stacks for perf_event profiling.  There were Linux kernel changes in August 2012 to allow copying part of the sampled process's stack into the perf buffer as part of the sample so that a userland agent could perform table-driven stack unwinding instead of trying to embed that much complexity in the kernel.

So we'd need to backport it to the older kernels we're using for b2g, and get an idea of how well it performs.  This is, I think, the main item of missing information here.
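For reference, requesting a stack copy with each sample could look roughly like the sketch below. It assumes the mainline `perf_event_open` interface (`PERF_SAMPLE_STACK_USER` plus the `sample_stack_user` size field); a backport to an older kernel may differ. The helper name `fill_stack_sample_attr` and the choice of a CPU-clock software event are illustrative assumptions, not something taken from this bug. A real unwinder would also need `PERF_SAMPLE_REGS_USER` with an architecture-specific register mask, which is left out here.

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: fill in a perf_event_attr that samples at freq_hz
 * and asks the kernel to copy up to stack_bytes of user stack per sample.
 * Returns 0 on success, -1 if stack_bytes looks invalid.
 */
int fill_stack_sample_attr(struct perf_event_attr *attr,
                           uint64_t freq_hz, uint32_t stack_bytes)
{
    /* As far as I know, mainline rejects sizes that are not a multiple
     * of 8 or that do not fit the 16-bit sample size field. */
    if (stack_bytes % 8 != 0 || stack_bytes >= UINT16_MAX)
        return -1;

    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = PERF_TYPE_SOFTWARE;          /* assumption: CPU-clock event */
    attr->config = PERF_COUNT_SW_CPU_CLOCK;
    attr->freq = 1;                           /* sample_freq is in Hz */
    attr->sample_freq = freq_hz;
    attr->sample_type = PERF_SAMPLE_IP | PERF_SAMPLE_STACK_USER;
    attr->sample_stack_user = stack_bytes;    /* max bytes of stack to copy */
    attr->exclude_kernel = 1;
    return 0;
}
```

The filled structure would then be passed to the `perf_event_open` system call, and the samples read back from the mmap'd ring buffer.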

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Comment 1

11 years ago

Here's a small benchmark program running on my Keon:
    0m2.98s real     0m2.98s user     0m0.00s system

With perf running globally at 1 kHz, not copying the stack:
    0m3.01s real     0m3.00s user     0m0.00s system

Copying 512 bytes of stack per sample:
    0m3.01s real     0m3.00s user     0m0.00s system

Copying 32 KiB of stack (same as what the breakpad unwinder in Gecko does):
    0m3.30s real     0m3.14s user     0m0.00s system

Copying up to 32 KiB of stack (and allocating that much buffer space per sample), but using only 1184 bytes[1]:
    0m3.21s real     0m3.05s user     0m0.00s system

The "real" time includes fwrite()ing the full records to /dev/null, which appears to be slower than actually unwinding them will be[2].  The "user" time difference is the actual cost of the interrupt handler.  The empty space in the last case shouldn't cost anything directly (it's not zeroed or otherwise written), but it presumably has cache and/or TLB effects, and increases profiler wakeups.

For something to measure this against, here's an example of the current in-kernel frame pointer unwinding: 1 kHz, -mapcs-frame, 102 stack frames:

    0m3.05s real     0m3.04s user     0m0.00s system
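As a rough reading of the timings above, the extra "user" time can be turned into a per-sample cost. The sketch below assumes one sample per 1/rate seconds of CPU time in the profiled run, and takes the run with perf enabled but no stack copy as the baseline; the function name is made up for illustration.

```c
/*
 * Estimated extra user-mode cost per sample, in microseconds.
 * Assumption: the number of samples is user_profiled_s * sample_hz.
 */
double per_sample_cost_us(double user_profiled_s, double user_baseline_s,
                          double sample_hz)
{
    double samples = user_profiled_s * sample_hz;
    return (user_profiled_s - user_baseline_s) / samples * 1e6;
}
```

With the numbers above, `per_sample_cost_us(3.14, 3.00, 1000.0)` comes out at roughly 45 µs for the full 32 KiB copy, and `per_sample_cost_us(3.05, 3.00, 1000.0)` at roughly 16 µs for the 1184-byte case. The timings only have 10 ms resolution, so these are ballpark figures at best.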


[1] The stack dump proceeds until the specified size limit is reached or an access fails, so if we're on the main stack then the area with the process's arguments and initial environment will be copied.

[2] My work in progress on bug 810526 has been getting 50-60 µs/sample on somewhat deeply nested stacks, of which a large minority was the Gecko profiler infrastructure.  Additionally, there remains room for optimization, and it should be faster when it's not handling the "pop under bitmask" instructions needed for the frame pointers used for meta-profiling.
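To illustrate note [1]: the amount of stack actually copied can be modelled as the smaller of the requested limit and the readable bytes above the stack pointer. This is a simplified model rather than the kernel code; it treats "an access fails" as reaching the end of one contiguous mapped region, and the name `stack_dump_bytes` and its parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model of how many bytes a stack dump copies:
 * start at sp and stop at the size limit or at the end of the
 * mapped stack region (mapped_end), whichever comes first.
 */
size_t stack_dump_bytes(uintptr_t sp, uintptr_t mapped_end, size_t limit)
{
    if (sp >= mapped_end)
        return 0;                      /* nothing readable above sp */
    size_t readable = (size_t)(mapped_end - sp);
    return readable < limit ? readable : limit;
}
```

In this model, the 1184-byte result above corresponds to a stack pointer sitting 1184 bytes below the top of the main stack with a 32 KiB limit, while a 512-byte limit would cut the same dump off at 512 bytes.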

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Comment 2

11 years ago

The kernel source: https://github.com/jld/gp-keon-kernel/compare/perf-stackcopy-gp

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Comment 3

11 years ago

Attached file loop.c

My small "benchmark" program.  Creates a bunch of frames and runs a timing loop.
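The attachment itself isn't reproduced on this page. As a minimal sketch of a program of this shape (not the attached loop.c), the code below builds a chain of nested frames and then spins in a loop at the bottom; every name in it is illustrative.

```c
#include <stdint.h>

/* Busy loop that runs at the bottom of the nested frames. */
static uint32_t spin(uint32_t iters)
{
    volatile uint32_t acc = 0;         /* volatile so the loop is not removed */
    for (uint32_t i = 0; i < iters; i++)
        acc += i;
    return acc;
}

uint32_t nest(unsigned depth, uint32_t iters);

/* Calling through a volatile pointer keeps the compiler from
 * turning the recursion into a loop, so each level gets a frame. */
static uint32_t (*volatile nest_ptr)(unsigned, uint32_t) = nest;

/* Recurse depth times, then run the timing loop. */
uint32_t nest(unsigned depth, uint32_t iters)
{
    if (depth == 0)
        return spin(iters);
    /* The +1 after the call means this is not a tail call. */
    return nest_ptr(depth - 1, iters) + 1;
}
```

A `main` that calls something like `nest(102, some_large_count)` and is run under `time` would give a measurement of the same general kind as the ones in comment 1.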

Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧

Assignee

Comment 4

11 years ago

I'm going to say that the answer is “yes, fast enough”.  1 kHz should be enough for most uses.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Mike Lee [:mlee]

Updated

11 years ago

Keywords: perf

Whiteboard: [c=profiling,p=3] → [c=profiling p=3 s=2013.08.09]

Categories

(Firefox OS Graveyard :: General, defect)
