Closed Bug 1484828 Opened 6 years ago Closed 6 years ago

The linux shared library code doesn't preserve enough information to do accurate symbolication of Android libraries whose first mapping has a non-zero vaddr

Categories

(Core :: Gecko Profiler, enhancement)

All
Android
enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox63 --- wontfix
firefox64 --- fixed

People

(Reporter: mstange, Assigned: mstange)

References

Details

Attachments

(3 files)

There's a problem with how the code in shared-libraries-linux.cc fills the fields of the SharedLibrary objects. This currently makes accurate symbolication impossible for certain Android system libraries, like libESXGLESv2_adreno.so.

Here's our dl_iterate_callback:
https://searchfox.org/mozilla-central/rev/83562422ecb0f5683da7a9a26ce05ce62bc0c882/tools/profiler/core/shared-libraries-linux.cc#129-156

It records the range of addresses that are covered by a library's mappings, but it does not remember the offset to the library's base address. But during symbolication, for every pc address we need to subtract the library's base address in order to get a value that can be meaningfully compared to the addresses in the library's symbol table.

Or, more specifically: Our SharedLibrary object has these fields:
  uintptr_t mStart;
  uintptr_t mEnd;
  uintptr_t mOffset;
And our current dl_iterate_callback makes it so that mOffset is always zero.

It needs to not be zero for libraries whose first mapping has a non-zero vaddr.

Notes:

> Current status:
>
> filteredThread.libs.find(l => l.name == "libESXGLESv2_adreno.so")
> {
>   name: "libESXGLESv2_adreno.so",
> ​​  path: "/system/vendor/lib/egl/libESXGLESv2_adreno.so",
> ​​  debugName: "libESXGLESv2_adreno.so",
> ​​  debugPath: "/system/vendor/lib/egl/libESXGLESv2_adreno.so",
> ​​  breakpadId: "E83024E3490FD596F3EBD25DCE1A68370",
> ​​  arch: "",
> ​​  start: 0xa57b4000,
>   end: 0xa5c00ff9,
> ​​  offset: 0,
> ​​}
​​> 
> /proc/9201/maps: a57b4000-a5ab5000 r-xp 00000000 fd:00 3008       /system/vendor/lib/egl/libESXGLESv2_adreno.so
> /proc/9201/maps: a5ab5000-a5bf2000 r--p 00300000 fd:00 3008       /system/vendor/lib/egl/libESXGLESv2_adreno.so
> /proc/9201/maps: a5bf2000-a5c00000 rw-p 0043d000 fd:00 3008       /system/vendor/lib/egl/libESXGLESv2_adreno.so
> 
> ph [2] for /system/vendor/lib/egl/libESXGLESv2_adreno.so with base address 0xa57a2000, offset 0x0, vaddr 0x12000, memsz 0x300750
> ph [3] for /system/vendor/lib/egl/libESXGLESv2_adreno.so with base address 0xa57a2000, offset 0x3007d8, vaddr 0x3137d8, memsz 0x14b821
> 
> $ .mozbuild/android-ndk-r15c/toolchains/arm-linux-androideabi-4.9/prebuilt/darwin-x86_64/bin/arm-linux-androideabi-readelf -l libESXGLESv2_adreno.so
> [...]
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   PHDR           0x000034 0x00012034 0x00012034 0x00120 0x00120 R   0x4
>   INTERP         0x000154 0x00012154 0x00012154 0x00013 0x00013 R   0x1
>       [Requesting program interpreter: /system/bin/linker]
>   LOAD           0x000000 0x00012000 0x00012000 0x300750 0x300750 R E 0x1000
>   LOAD           0x3007d8 0x003137d8 0x003137d8 0x14a5dc 0x14b821 RW  0x1000
>   DYNAMIC        0x43b750 0x0044e750 0x0044e750 0x00150 0x00150 RW  0x4
>   NOTE           0x000168 0x00012168 0x00012168 0x00038 0x00038 R   0x4
>   GNU_STACK      0x000000 0x00000000 0x00000000 0x00000 0x00000 RW  0
>   EXIDX          0x1e71a0 0x001f91a0 0x001f91a0 0x08a30 0x08a30 R   0x4
>   GNU_RELRO      0x3007d8 0x003137d8 0x003137d8 0x13c828 0x13c828 RW  0x8
> 
> Example frame address in process virtual memory: 0xa584ede3
> 
> 0xa584ede3 - 0xa57a2000 = 0xacde3
> 
> 000acda0 T EsxContext::GlClear(unsigned int) <-- correct!
> 000ace00 T EsxContext::GlClearColor(float, float, float, float)
> 
> So, want:
> 
> {
>   name: "libESXGLESv2_adreno.so",
> ​​  path: "/system/vendor/lib/egl/libESXGLESv2_adreno.so",
> ​​  debugName: "libESXGLESv2_adreno.so",
> ​​  debugPath: "/system/vendor/lib/egl/libESXGLESv2_adreno.so",
> ​​  breakpadId: "E83024E3490FD596F3EBD25DCE1A68370",
> ​​  arch: "",
> ​​  start: 0xa57a2000 + 0x00012000,
>   end: 0xa57a2000 + 0x00012000 + 0x300750,
> ​​  offset: 0x00012000,
> ​​}
I don't understand what's happening in this code and would appreciate some
guidance.  It's possible that there are two different meanings of the word
"offset" that are clashing here.
SharedLibrary::GetOffset() means firstMappingStart - baseAddress.
I get a feeling that the "zero offset" check here cares more about the mapping
offset phdr.p_offset, but I'm not sure.

Anyway, the purpose of this patch is the following:
Now that the previous patch sets SharedLibrary::mOffset to something non-zero,
this check breaks unwinding for the affected libraries. With the check removed,
unwinding seems to work mostly fine, as evidenced by this profile:
https://perfht.ml/2MBKzON

But removing the check might break something else that I'm not seeing.

Depends on D3835
It took some time staring at this code and trying to remember what was going on with the profiler 5 years ago, but I think I've got it.

(In reply to Markus Stange [:mstange] from comment #3)
> I don't understand what's happening in this code and would appreciate some
> guidance.  It's possible that there are two different meanings of the word
> "offset" that are clashing here.
> SharedLibrary::GetOffset() means firstMappingStart - baseAddress.
> I get a feeling that the "zero offset" check here cares more about the
> mapping offset phdr.p_offset, but I'm not sure.

Yes, the problem I was having (and what that TODO comment about the dynamic linker was referring to) was with the case where no mapping covers file offset 0 (p_offset, not p_vaddr), meaning that I couldn't access the ELF headers to look for the PT_ARM_EXIDX.  And I couldn't just get the phdrs directly from dl_iterate_phdr because we didn't have dl_iterate_phdr on B2G[1].

There are magic number checks in the EHTable constructor, so if it gets a pointer to something that isn't an ELF file header it should skip the library.  It would be nice to explicitly avoid that case if possible, but it's not essential.

As for the case we're encountering here, where p_offset 0 is loaded at non-zero p_vaddr: I thought we had to deal with that on B2G (ICS libc, maybe?) and had gotten it working, but I suppose either it bit-rotted or I misremember.


[1] https://searchfox.org/mozilla-central/rev/42fe9dcb23ad00e54140dc5ae4ad3a068f040056/tools/profiler/shared-libraries-linux.cc#84-85
Comment on attachment 9002596 [details]
Bug 1484828 - Set SharedLibrary::mOffset to firstMappingStart - baseAddress. r=glandium

Mike Hommey [:glandium] has approved the revision.
Attachment #9002596 - Flags: review+
Thanks for the fast reviews! And sorry that I'm so slow to update these patches. I'm looking at them again now.
I've addressed the comments. Not sure if moz-phab re-requested review on the patches.
I haven't found out how to re-request review in Phabricator yet - Jed, could you take another look?
Flags: needinfo?(jld)
Sorry for the delay; I've re-reviewed.
Flags: needinfo?(jld)
Pushed by mstange@themasta.com:
https://hg.mozilla.org/integration/autoland/rev/9d52b6dad6f5
Set SharedLibrary::mOffset to firstMappingStart - baseAddress. r=glandium
https://hg.mozilla.org/integration/autoland/rev/5af47e5b67f0
Rename some fields in EHABIStackWalk.cpp. r=jld
https://hg.mozilla.org/integration/autoland/rev/9b246aa2b9cd
Don't skip libraries with non-zero offsets. r=jld
https://hg.mozilla.org/mozilla-central/rev/9d52b6dad6f5
https://hg.mozilla.org/mozilla-central/rev/5af47e5b67f0
https://hg.mozilla.org/mozilla-central/rev/9b246aa2b9cd
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: