Closed Bug 1561704 Opened 5 years ago Closed 2 years ago

simpleperf symbols often are wrong for local builds when using elfhack

Categories

(GeckoView :: General, defect, P3)

Unspecified
All
defect

Tracking

(firefox69 affected)

RESOLVED INCOMPLETE
Tracking Status
firefox69 --- affected

People

(Reporter: jesup, Unassigned)

References

(Blocks 1 open bug)

Details

sefeng and I both get bad/offset symbols from simpleperf runs against GVE if we use a local build. Running it with GVE and symbols from acreskey's builds works, and using our local builds with Gecko Profiler works correctly, so it's likely related to differences in how addresses are translated to symbols between simpleperf and Gecko Profiler.

Bad example: https://perfht.ml/2LrBZkC

Not being able to use simpleperf will hurt startup speed/perf investigations (and perhaps other investigations like pageload)

[qf-triaging as qf:investigate, since this doesn't really fall into other categories.]

Whiteboard: [qf][geckoview] → [qf:investigate][geckoview]

Mike, do you have any suggestions why Gecko Profiler can symbolicate local builds of the geckoview_example app on Android but simpleperf cannot? Is this an elfhack problem?

Flags: needinfo?(mh+mozilla)

It could be. Try running elfhack -r on the libraries, like we do when running xpcshell tests.

Flags: needinfo?(mh+mozilla)

If elfhack -r makes it work, then it means there's a problem with simpleperf, making assumptions about PT_LOAD segments that it shouldn't do.

(In reply to Mike Hommey [:glandium] from comment #3)

It could be. Try running elfhack -r on the libraries, like we do when running xpcshell tests.
If elfhack -r makes it work, then it means there's a problem with simpleperf, making assumptions about PT_LOAD segments that it shouldn't do.

^ Randell, Mike suggests running elfhack -r on the libraries before running simpleperf.

Flags: needinfo?(rjesup)

If elfhack -r makes it work, then it means there's a problem with simpleperf, making assumptions about PT_LOAD segments that it shouldn't do.

^ Randell, Mike suggests running elfhack -r on the libraries before running simpleperf.

"Not elfhacked. Skipping" for all .so's copied over as part of running simpleperf (xul, glue, etc)

Any other ideas? code is in utils.py and annotate.py it appears in https://android.googlesource.com/platform/prebuilts/simpleperf

Flags: needinfo?(rjesup) → needinfo?(mh+mozilla)

At quick glance it seems to use addr2line, so it should work. But maybe the offsets are wrong in the first place.... But without a trace and the corresponding libraries, it's not possible to say much more.

Flags: needinfo?(mh+mozilla)

The libraries in the .apk that's somewhere in the data dump you sent me are elfhacked. That's likely the cause.

Whiteboard: [qf:investigate][geckoview] → [geckoview]

James says we could make simpleperf work with elfhack, but the simpler workaround is to just disable elfhack locally when using simpleperf.

Randell, is disabling elfhack locally an acceptable workaround for your simpleperf profiling? Is there a way we can make disabling elfhack easier for profiling?

Flags: needinfo?(rjesup)
Summary: simpleperf symbols often are wrong for local builds → simpleperf symbols often are wrong for local builds when using elfhack
Whiteboard: [geckoview]

We'd have to verify that removing elfhack solves the problem; I suspect the libraries that are copied to the binary_cache are non-elfhacked, but the ones (per glandium) in the apk are. I imagine it's not hard to turn off elfhack by changing the build scripts, but I have not idea if there's an option to do so.

snorp, who would know?

Flags: needinfo?(rjesup) → needinfo?(snorp)

--disable-elf-hack.

Priority: -- → P3
Flags: needinfo?(rjesup)

--disable-elf-hack does in fact solve the problem. I believe the issue was triggered by using --enable-release, which defaults elfhack to enabled.

Perhaps --enable-profiling could imply --disable-elf-hack? Or perhaps --enable-profiling can turn on all the other perf things from --enable-release, like the Rust optimization level. glandium?

Another bit of confusion is why Gecko Profiler can read correct symbols with an elfhacked libxul, but simpleperf can't. markus/greg?

Flags: needinfo?(rjesup)
Flags: needinfo?(mstange)
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(gtatum)

I do have ac_add_options --enable-release in my .mozconfig.

But I don't see the elfhack binary on my system.
If someone can point me to it I can check the elfhack state of my binaries. But I would guess that they're not elfhacked.

Markus is not currently available, and he would know the most about what's going on here. I can look into the (somewhat complicated) implementation of the symbolication path, but I'll need and STR to do so. I'm not familiar with all the tech here.

Flags: needinfo?(gtatum)

As per comment 7, simpleperf is using addr2line, so it should work... but what is producing the library+offsets it processes is likely assuming things that are wrong when elfhack is used.

Perhaps --enable-profiling could imply --disable-elf-hack? Or perhaps --enable-profiling can turn on all the other perf things from --enable-release, like the Rust optimization level. glandium?

Neither is desirable.

Flags: needinfo?(mh+mozilla)

(In reply to Andrew Creskey from comment #13)

I do have ac_add_options --enable-release in my .mozconfig.

But I don't see the elfhack binary on my system.
If someone can point me to it I can check the elfhack state of my binaries. But I would guess that they're not elfhacked.

I noticed that elfhack isn't compatible with AArch64; if your builds are AArch64 that would make sense.

(In reply to Mike Hommey [:glandium] from comment #15)

As per comment 7, simpleperf is using addr2line, so it should work... but what is producing the library+offsets it processes is likely assuming things that are wrong when elfhack is used.

Perhaps --enable-profiling could imply --disable-elf-hack? Or perhaps --enable-profiling can turn on all the other perf things from --enable-release, like the Rust optimization level. glandium?

Neither is desirable.

Perhaps we should have --enable-perf-testing then, which could imply --enable-release and --disable-elf-hack (on arm32), and maybe a few other things (turn off nightly GC poisoning, etc)

Though the description (likely badly out-of-date) for --enable-profiling is '{Set|Do not set} compile flags necessary for using sampling profilers (e.g. shark, perf)' -- that would imply to me that --enable-profiling should be what you set for using 'perf'

Rank: 17
Severity: normal → S3
Rank: 17 → 333

It doesn't look like we ever got to the bottom of this. But if we use simpleperf again, I would suggest using samply load perf.data to convert it to a Firefox Profiler profile; if that also gives bad symbols, then we can do some more investigation.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(mstange.moz)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.