The sharedLibraries returned by nsIProfiler have empty breakpadId fields on Android

RESOLVED FIXED in Firefox 61

Status

()

P1
normal
RESOLVED FIXED
2 years ago
4 months ago

People

(Reporter: mstange, Assigned: m_kato)

Tracking

Trunk
mozilla61
All
Android
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox61 fixed)

Details

Attachments

(3 attachments, 4 obsolete attachments)

(Reporter)

Description

2 years ago
This is going to make it hard to find the right symbol table for them.
(Reporter)

Comment 1

2 years ago
We try to obtain the breakpadId with the function getId at http://searchfox.org/mozilla-central/rev/7419b368156a6efa24777b21b0e5706be89a9c2f/tools/profiler/core/shared-libraries-linux.cc#26 by mapping the file at the supplied path into memory and parsing the elf structure.

I haven't debugged this, but here are my guesses about what causes this:
For system libraries, we're not going to find the file because we don't have its full path due to bug 1350501.
For Firefox libraries, either we're not able to open a path of the form .../base.apk!/.../libxul.so, or we're running into the fact that these libraries are compressed using xz and so breakpad can't parse them.

glandium, any ideas for what we could do about this? Are there other ways to obtain the breakpadId?
Flags: needinfo?(mh+mozilla)
(Reporter)

Comment 2

2 years ago
Having something like ElfLoader::GetMappableFromPath available and somehow being able to pass the result into breakpad's ElfFileIdentifierFromMappedFile would be nice.
(Reporter)

Comment 3

2 years ago
I can confirm that fixing bug 1350501 gives us breakpadIds for system libraries. So now we just need to handle the zip path / compressed binary case.
(Reporter)

Comment 4

2 years ago
Casting dl_info->dlpi_addr to const void* and passing that to FileID::ElfFileIdentifierFromMappedFile works for the firefox libraries, but not for most system libraries. So in combination with bug 1350501 I think I now have everything I need. This seems like a hack though.
(Reporter)

Comment 5

2 years ago
(In reply to Markus Stange [:mstange] from comment #4)
> but not for most system libraries.

I don't understand why, though. The first bytes seem to be as expected, for example /system/lib/libc.so (loaded at 0xb6dfe000) starts with 7f454c4601010100, the 454c46 being the letters ELF.
(In reply to Markus Stange [:mstange] from comment #5)
> (In reply to Markus Stange [:mstange] from comment #4)
> > but not for most system libraries.
> 
> I don't understand why, though. The first bytes seem to be as expected, for
> example /system/lib/libc.so (loaded at 0xb6dfe000) starts with
> 7f454c4601010100, the 454c46 being the letters ELF.

Can you produce a memory dump at that address, for, say, 1KB?

(In reply to Markus Stange [:mstange] from comment #2)
> Having something like ElfLoader::GetMappableFromPath available and somehow
> being able to pass the result into breakpad's
> ElfFileIdentifierFromMappedFile would be nice.

That already exists, and in fact, the profiler uses that to access the eh_frame data (or used to): __dl_mmap/__dl_munmap. In fact, that function works for system libraries too, as long as they are in /system/lib.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #6)
> eh_frame data (or used to)

seems it still does.
(Reporter)

Comment 8

2 years ago
Interesting! I'll take a look. Thanks!
So if the library start address returned by AutoObjectMapperFaultyLib::Map is an address that FileID::ElfFileIdentifierFromMappedFile can successfully dump a breakpadId from, then that means that for those libraries I don't need an absolute path *in order to compute the breakpadId*. However, I'm still interested in the absolute path of these files regardless. And as far as I can tell, __dl_mmap can't really help me with that problem.

But thanks for pointing me to AutoObjectMapper; at least I'll be able to re-use the code around get_installation_lib_dir() in order to get the correct absolute path for libmozglue.so, which was the last piece I needed in bug 1350501.
Comment hidden (mozreview-request)
(Reporter)

Updated

2 years ago
Assignee: nobody → mstange
Status: NEW → ASSIGNED

Comment 10

2 years ago
mozreview-review
Comment on attachment 8862605 [details]
Bug 1350500 - Compute the breakpadId for Firefox libraries on Android by mapping them into memory.

https://reviewboard.mozilla.org/r/134456/#review140412

::: tools/profiler/core/shared-libraries-linux.cc:56
(Diff revision 1)
>  # include <sys/types.h>
>  
>  #elif defined(GP_OS_android) && !defined(MOZ_WIDGET_GONK)
>  # define CONFIG_CASE_2 1
>  # include "ElfLoader.h" // dl_phdr_info
> +# include "../lul/AutoObjectMapper.h"

Why not add the lul directory to LOCAL_INCLUDES in tools/profiler/moz.build?

::: tools/profiler/core/shared-libraries-linux.cc:73
(Diff revision 1)
> +#if defined(USE_FAULTY_LIB)
> +static void DontLog(const char* aMsg) {}
> +#endif

I'd say it would be better to make AutoObjectMapperPOSIX allow aLog to be NULL, and avoid failedToMessage doing sprintfs altogether in that case.

::: tools/profiler/core/shared-libraries-linux.cc:99
(Diff revision 1)
>    PageAllocator allocator;
>    auto_wasteful_vector<uint8_t, sizeof(MDGUID)> identifier(&allocator);

Seems like you could move that before the USE_FAULTY_LIB section and remove the duplicate code from that section.
Attachment #8862605 - Flags: review?(mh+mozilla)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)

Comment 13

a year ago
mozreview-review
Comment on attachment 8862605 [details]
Bug 1350500 - Compute the breakpadId for Firefox libraries on Android by mapping them into memory.

https://reviewboard.mozilla.org/r/134456/#review143370

::: tools/profiler/core/shared-libraries-linux.cc:59
(Diff revision 2)
> +    AutoObjectMapperFaultyLib mapper(nullptr);
> +    void* image = nullptr;
> +    size_t size = 0;
> +    if (mapper.Map(&image, &size, bin_name) && image && size) {
> +      if (FileID::ElfFileIdentifierFromMappedFile(image, identifier)) {
> +        return FileID::ConvertIdentifierToUUIDString(identifier) + "0";

I'd feel better if there was only one line doing this. But it wouldn't be much readable, except using a goto. meh.
Attachment #8862605 - Flags: review?(mh+mozilla) → review+
(In reply to Markus Stange [:mstange] from comment #11)
> Created attachment 8866517 [details]
> Bug 1350500 - Allow passing nullptr for aLog in AutoObjectMapperPOSIX.

[per comments on irc] my concern about this is that it could hide useful
diagnostic info in case of map failure.  I'd be happier if it could be
redone so as to cause any failure messages to be sent to stderr, at least.
Attachment #8866517 - Flags: review?(jseward)
(Assignee)

Updated

8 months ago
Blocks: 1435934
Markus, do you have a time to handle this bug?   If no time, I can take this (this bug is very important for remote profiling on Android).
Flags: needinfo?(mstange)
attached fix will cause dead lock since dl_iterate_phdr seems be have the mutex and dlopen tries to get the mutex too.
(Reporter)

Comment 17

7 months ago
I do not have time to handle this bug, sorry. I'm assigning it to you.
Assignee: mstange → m_kato
Flags: needinfo?(mstange)
Created attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android
(Assignee)

Updated

7 months ago
Attachment #8866517 - Attachment is obsolete: true
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android

AutoObjectMapper doesn't build on Android/aarch64.  For part 1. fix, we should always build it on Android/aarch64 too.
Attachment #8963477 - Flags: review?(mstange)
Created attachment 8963478 [details] [diff] [review]
Elfloader uses read/write lock instead of mutex
Comment on attachment 8963478 [details] [diff] [review]
Elfloader uses read/write lock instead of mutex

When callback of dl_iterate_phdr uses dlopen (part 1 fix uses it), dlopen causes deadlock since __wrap_dl_iterate_phdr already acquires mutex.  So I would like to replace mutex of ElfLoader with read/write lock.
Attachment #8963478 - Flags: review?(mh+mozilla)
(Reporter)

Updated

7 months ago
Attachment #8963477 - Flags: review?(mstange) → review?(mh+mozilla)
Priority: -- → P1
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android

Review of attachment 8963477 [details] [diff] [review]:
-----------------------------------------------------------------

I guess you want that for arm64, which is not covered by the if clause, but is that really going to be useful without the rest of lul?
Attachment #8963477 - Flags: review?(mh+mozilla)
Comment on attachment 8963478 [details] [diff] [review]
Elfloader uses read/write lock instead of mutex

Review of attachment 8963478 [details] [diff] [review]:
-----------------------------------------------------------------

Without more information about what deadlock is happening how, I can't tell much, but it would seem you want a reentrant lock (PTHREAD_MUTEX_RECURSIVE), not a read-write one.
Attachment #8963478 - Flags: review?(mh+mozilla)
Created attachment 8963834 [details] [diff] [review]
Elfload should use PTHREAD_MUTEX_RECURSIVE

Use PTHREAD_MUTEX_RECURSIVE to avoid deadlock by dlopen.
Attachment #8963478 - Attachment is obsolete: true
Attachment #8963834 - Flags: review?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #23)
> Comment on attachment 8963477 [details] [diff] [review]
> Always build AutoObjectMapper on Android
> 
> Review of attachment 8963477 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I guess you want that for arm64, which is not covered by the if clause, but
> is that really going to be useful without the rest of lul?

Yes, this is for Android/aarch64.  aarch64 profiler support is unstable and lul doesn't implement aarch64 code now.

If I should turn off profiler on Android/aarch64 as default, I will update a patch.  How do you think?
by bug 1360322, Android/aarch64 turns on profiler without lul...
(In reply to Mike Hommey [:glandium] from comment #24)
> Comment on attachment 8963478 [details] [diff] [review]
> Elfloader uses read/write lock instead of mutex
> 
> Review of attachment 8963478 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Without more information about what deadlock is happening how, I can't tell
> much, but it would seem you want a reentrant lock (PTHREAD_MUTEX_RECURSIVE),
> not a read-write one.

Okay, I had missed this sentence "When callback of dl_iterate_phdr uses dlopen". Under those conditions, a recursive lock is actually dangerous. A read-write one would work better, but that would be a significant difference with the system linker, which doesn't guarantee you can call dlopen while dl_iterate_phdr'ing. How about filling an array with dl_iterate_phdr and then iterate that array and call dlopen?

Updated

7 months ago
Attachment #8963834 - Flags: review?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #28)
> (In reply to Mike Hommey [:glandium] from comment #24)
> > Comment on attachment 8963478 [details] [diff] [review]
> > Elfloader uses read/write lock instead of mutex
> > 
> > Review of attachment 8963478 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Without more information about what deadlock is happening how, I can't tell
> > much, but it would seem you want a reentrant lock (PTHREAD_MUTEX_RECURSIVE),
> > not a read-write one.
> 
> Okay, I had missed this sentence "When callback of dl_iterate_phdr uses
> dlopen". Under those conditions, a recursive lock is actually dangerous. A
> read-write one would work better, but that would be a significant difference
> with the system linker, which doesn't guarantee you can call dlopen while
> dl_iterate_phdr'ing. How about filling an array with dl_iterate_phdr and
> then iterate that array and call dlopen?

Hmm, if using copied array, for Android 4.4 suppoort, we need to create LibraryHandleList (for 5.0+) and DebuggingHelper's array list (for 4.1-4.3).  Original part 1 fix uses dlopen to get ElfHeader, but callback of dl_iterate_phdr have base address of image.  So I should rewrite markus's patch...
(Assignee)

Updated

7 months ago
Attachment #8963477 - Attachment is obsolete: true
(Assignee)

Updated

7 months ago
Attachment #8963834 - Attachment is obsolete: true
Created attachment 8965225 [details] [diff] [review]
Compute the breakpadId for Firefox libraries on Android by mapping them into memory

Update version of previous markus's fix without dlopen.

dlopen has mutex for custom linker, calling dlopen from callback of dl_iterate_phdr causes dead lock.  When using custom elf linker, module is already loaded into memory, so we can use ElfFileIdentifierFromMappedFile directly from base address.
Attachment #8965225 - Flags: review?(mh+mozilla)
(if r-, I will create lock-free version dl_iterate_phdr.)
Comment on attachment 8965225 [details] [diff] [review]
Compute the breakpadId for Firefox libraries on Android by mapping them into memory

Review of attachment 8965225 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/shared-libraries-linux.cc
@@ +39,5 @@
>  # error "Unexpected configuration"
>  #endif
>  
> +// Get the breakpad Id for the binary file pointed by bin name and base address
> +static std::string getId(const char *bin_name, unsigned long base)

libStart, like everywhere else?

@@ +49,5 @@
>    auto_wasteful_vector<uint8_t, sizeof(MDGUID)> identifier(&allocator);
>  
> +#if defined(GP_OS_android)
> + if (nsDependentCString(bin_name).Find("!/") != kNotFound) {
> +    // This moudule is in APK and compressed.  Since this is loaded into memory,

Typo: module.

@@ +51,5 @@
> +#if defined(GP_OS_android)
> + if (nsDependentCString(bin_name).Find("!/") != kNotFound) {
> +    // This moudule is in APK and compressed.  Since this is loaded into memory,
> +    // use it instead of loading again
> +    if (FileID::ElfFileIdentifierFromMappedFile((void*)base, identifier)) {

So the problem here is that while it might work now, there is no actual guarantee that ld is forever going to produce binaries where what we find a base points the the elf headers. As a matter of fact, iirc, lld breaks this assumption.
Attachment #8965225 - Flags: review?(mh+mozilla)
(In reply to Makoto Kato [:m_kato] from comment #31)
> (if r-, I will create lock-free version dl_iterate_phdr.)

You can't really do that, as long as the system linker is involved.

Why not just change dl_iterate_callback to collect dlpi_name and maybe other information, then change the loop that follows the  dl_iterate_phdr call to fill SharedLibraryInfo?
Created attachment 8967299 [details] [diff] [review]
Part 2. Don't call dlopen on callback
Attachment #8965225 - Attachment is obsolete: true
Comment on attachment 8967299 [details] [diff] [review]
Part 2. Don't call dlopen on callback

Since aarch64 doesn't implement native stack walker (bug 1450185), there is no AutoObjectMapperFaultyLib on Android/aarch64.
Attachment #8967299 - Flags: review?(mh+mozilla)
Comment on attachment 8967299 [details] [diff] [review]
Part 2. Don't call dlopen on callback

Review of attachment 8967299 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/shared-libraries-linux.cc
@@ +31,5 @@
>  #if defined(GP_OS_linux)
>  # include <link.h>      // dl_phdr_info
>  #elif defined(GP_OS_android)
> +#if defined(GP_PLAT_arm_android) || defined(GP_PLAT_x86_android)
> +// Android/aarch64 doesn't implement native stack walker yet.

If you're adding that because the previous patch breaks the build for aarch64, please fix the previous patch instead. If the build is otherwise not broken, please move this to a separate bug.

@@ +66,5 @@
>  
>    PageAllocator allocator;
>    auto_wasteful_vector<uint8_t, sizeof(MDGUID)> identifier(&allocator);
>  
> +#if defined(GP_PLAT_arm_android) || defined(GP_PLAT_x86_android)

same here

@@ +110,5 @@
>  
>  static int
>  dl_iterate_callback(struct dl_phdr_info *dl_info, size_t size, void *data)
>  {
> +  nsTArray<LoadedLibraryInfo>* libInfoList =

this is one of the cases where using auto is fine.
Attachment #8967299 - Flags: review?(mh+mozilla) → review+
(In reply to Mike Hommey [:glandium] from comment #36)
> Comment on attachment 8967299 [details] [diff] [review]
> Part 2. Don't call dlopen on callback
> 
> Review of attachment 8967299 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: tools/profiler/core/shared-libraries-linux.cc
> @@ +31,5 @@
> >  #if defined(GP_OS_linux)
> >  # include <link.h>      // dl_phdr_info
> >  #elif defined(GP_OS_android)
> > +#if defined(GP_PLAT_arm_android) || defined(GP_PLAT_x86_android)
> > +// Android/aarch64 doesn't implement native stack walker yet.
> 
> If you're adding that because the previous patch breaks the build for
> aarch64, please fix the previous patch instead. If the build is otherwise
> not broken, please move this to a separate bug.

So I added https://bugzilla.mozilla.org/attachment.cgi?id=8963477 at first time due to build break on android/aarch64.
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android

AutoObjectMapper works even if on aarch64 and this requires on all android by part 1.
Attachment #8963477 - Attachment is obsolete: false
Attachment #8963477 - Flags: review?(mh+mozilla)
If bug 1450185 is landed, all android has native stack walker, so comment #38's patch is unnecessary.
Comment on attachment 8963477 [details] [diff] [review]
Always build AutoObjectMapper on Android

cancel because I will be waiting for bug 1450185
Attachment #8963477 - Flags: review?(mh+mozilla)
(Assignee)

Updated

6 months ago
Depends on: 1450185

Comment 41

6 months ago
Pushed by m_kato@ga2.so-net.ne.jp:
https://hg.mozilla.org/integration/mozilla-inbound/rev/81e24b24a789
Compute the breakpadId for Firefox libraries on Android by mapping them into memory. r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdd70d25704e
Don't call dlopen on callback. r=glandium

Comment 42

6 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/81e24b24a789
https://hg.mozilla.org/mozilla-central/rev/cdd70d25704e
Status: ASSIGNED → RESOLVED
Last Resolved: 6 months ago
status-firefox61: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
(Reporter)

Comment 43

6 months ago
Thanks for driving this to the finish line! I just got a profile from today's Firefox Nightly for Android and it contained the expected breakpadIds.

Unfortunately, I didn't get any symbols in the profile. I think this is caused by a bug in the symbol API - the necessary symbols are on the server, but the API isn't handing them to us. I've filed bug 1457706 about this.
Depends on: 1458193
status-firefox55: affected → ---
You need to log in before you can comment on or make changes to this bug.