simpleperf symbols often are wrong for local builds when using elfhack
Categories
(GeckoView :: General, defect, P3)
Tracking
(firefox69 affected)
Tracking | Status | |
---|---|---|
firefox69 | --- | affected |
People
(Reporter: jesup, Unassigned)
References
(Blocks 1 open bug)
Details
sefeng and I both get bad/offset symbols from simpleperf runs against GVE if we use a local build. Running it with GVE and symbols from acreskey's builds works, and using our local builds with Gecko Profiler works correctly, so it's likely related to differences in how addresses are translated to symbols between simpleperf and Gecko Profiler.
Bad example: https://perfht.ml/2LrBZkC
Not being able to use simpleperf will hurt startup speed/perf investigations (and perhaps other investigations like pageload)
Comment 1•5 years ago
|
||
[qf-triaging as qf:investigate, since this doesn't really fall into other categories.]
Comment 2•5 years ago
|
||
Mike, do you have any suggestions why Gecko Profiler can symbolicate local builds of the geckoview_example app on Android but simpleperf cannot? Is this an elfhack problem?
Comment 3•5 years ago
|
||
It could be. Try running elfhack -r on the libraries, like we do when running xpcshell tests.
Comment 4•5 years ago
|
||
If elfhack -r makes it work, then it means there's a problem with simpleperf, making assumptions about PT_LOAD segments that it shouldn't do.
Comment 5•5 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #3)
It could be. Try running elfhack -r on the libraries, like we do when running xpcshell tests.
If elfhack -r makes it work, then it means there's a problem with simpleperf, making assumptions about PT_LOAD segments that it shouldn't do.
^ Randell, Mike suggests running elfhack -r
on the libraries before running simpleperf.
Reporter | ||
Comment 6•5 years ago
|
||
If elfhack -r makes it work, then it means there's a problem with simpleperf, making assumptions about PT_LOAD segments that it shouldn't do.
^ Randell, Mike suggests running
elfhack -r
on the libraries before running simpleperf.
"Not elfhacked. Skipping" for all .so's copied over as part of running simpleperf (xul, glue, etc)
Any other ideas? code is in utils.py and annotate.py it appears in https://android.googlesource.com/platform/prebuilts/simpleperf
Comment 7•5 years ago
|
||
At quick glance it seems to use addr2line, so it should work. But maybe the offsets are wrong in the first place.... But without a trace and the corresponding libraries, it's not possible to say much more.
Comment 8•5 years ago
|
||
The libraries in the .apk that's somewhere in the data dump you sent me are elfhacked. That's likely the cause.
Updated•5 years ago
|
Comment 9•5 years ago
|
||
James says we could make simpleperf work with elfhack, but the simpler workaround is to just disable elfhack locally when using simpleperf.
Randell, is disabling elfhack locally an acceptable workaround for your simpleperf profiling? Is there a way we can make disabling elfhack easier for profiling?
Reporter | ||
Comment 10•5 years ago
|
||
We'd have to verify that removing elfhack solves the problem; I suspect the libraries that are copied to the binary_cache are non-elfhacked, but the ones (per glandium) in the apk are. I imagine it's not hard to turn off elfhack by changing the build scripts, but I have not idea if there's an option to do so.
snorp, who would know?
Comment 11•5 years ago
|
||
--disable-elf-hack.
Updated•5 years ago
|
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 12•5 years ago
|
||
--disable-elf-hack does in fact solve the problem. I believe the issue was triggered by using --enable-release, which defaults elfhack to enabled.
Perhaps --enable-profiling could imply --disable-elf-hack? Or perhaps --enable-profiling can turn on all the other perf things from --enable-release, like the Rust optimization level. glandium?
Another bit of confusion is why Gecko Profiler can read correct symbols with an elfhacked libxul, but simpleperf can't. markus/greg?
Comment 13•5 years ago
|
||
I do have ac_add_options --enable-release
in my .mozconfig.
But I don't see the elfhack
binary on my system.
If someone can point me to it I can check the elfhack state of my binaries. But I would guess that they're not elfhacked.
Comment 14•5 years ago
|
||
Markus is not currently available, and he would know the most about what's going on here. I can look into the (somewhat complicated) implementation of the symbolication path, but I'll need and STR to do so. I'm not familiar with all the tech here.
Comment 15•5 years ago
|
||
As per comment 7, simpleperf is using addr2line, so it should work... but what is producing the library+offsets it processes is likely assuming things that are wrong when elfhack is used.
Perhaps --enable-profiling could imply --disable-elf-hack? Or perhaps --enable-profiling can turn on all the other perf things from --enable-release, like the Rust optimization level. glandium?
Neither is desirable.
Reporter | ||
Comment 16•5 years ago
|
||
(In reply to Andrew Creskey from comment #13)
I do have
ac_add_options --enable-release
in my .mozconfig.But I don't see the
elfhack
binary on my system.
If someone can point me to it I can check the elfhack state of my binaries. But I would guess that they're not elfhacked.
I noticed that elfhack isn't compatible with AArch64; if your builds are AArch64 that would make sense.
(In reply to Mike Hommey [:glandium] from comment #15)
As per comment 7, simpleperf is using addr2line, so it should work... but what is producing the library+offsets it processes is likely assuming things that are wrong when elfhack is used.
Perhaps --enable-profiling could imply --disable-elf-hack? Or perhaps --enable-profiling can turn on all the other perf things from --enable-release, like the Rust optimization level. glandium?
Neither is desirable.
Perhaps we should have --enable-perf-testing then, which could imply --enable-release and --disable-elf-hack (on arm32), and maybe a few other things (turn off nightly GC poisoning, etc)
Though the description (likely badly out-of-date) for --enable-profiling is '{Set|Do not set} compile flags necessary for using sampling profilers (e.g. shark, perf)' -- that would imply to me that --enable-profiling should be what you set for using 'perf'
Updated•5 years ago
|
Updated•5 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 17•2 years ago
|
||
It doesn't look like we ever got to the bottom of this. But if we use simpleperf again, I would suggest using samply load perf.data
to convert it to a Firefox Profiler profile; if that also gives bad symbols, then we can do some more investigation.
Description
•