Closed Bug 916923 Opened 6 years ago Closed 6 years ago

Crash dumps not working for Android x86 emulator tests

Categories

(Testing :: General, defect)

x86_64
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
mozilla27

People

(Reporter: gbrown, Assigned: froydnj)

References

Details

Attachments

(3 files, 2 obsolete files)

From :ted in https://bugzilla.mozilla.org/show_bug.cgi?id=895186#c186:

> This isn't a symbols issue, this dump looks completely broken. If you can grab one of
> these minidumps off of a slave and attach it here or somewhere else we can poke at it.

Dump file attached. This was taken from an M3 test run against http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/gbrown@mozilla.com-3f363a773eed.
Assignee: nobody → ted
We're not writing out a good module list in this dump:
2013-09-16 15:04:46: minidump.cc:2706: ERROR: MinidumpModuleList could not store module 151/152, libmozalloc.so, 0xa9f0e000+0x4000
2013-09-16 15:04:46: minidump.cc:4613: ERROR: GetStream could not read stream type 4
2013-09-16 15:04:46: minidump_dump.cc:115: ERROR: minidump.GetModuleList() failed

Is this just elfhack screwing us up again? /proc/self/maps from the dump shows:
a9f0a000-a9f0e000 rw-p a9f0a000 00:00 0 
a9f0e000-a9f10000 r-xs 00000000 00:08 2977       /dev/ashmem/libmozalloc.so (del
eted)
a9f10000-a9f12000 rw-s 00001000 00:08 2977       /dev/ashmem/libmozalloc.so (del
eted)
a9f12000-a9f15000 rw-s 00000000 00:08 2977       /dev/ashmem/libmozalloc.so (del
eted)
Ah, that'd be on demand decompression messing with breakpad. It should happen on arm, too :-/
I guess it's time for ted to review those breakpad patches ;)
Aha, now i remember. This is something that did happen on arm and that we worked around here:
https://mxr.mozilla.org/mozilla-central/source/mozglue/linker/Mappable.cpp#185

But that only takes care of the increasing addresses case, and that doesn't work on x86 where the addresses are decreasing. The easy work around here (short of getting ted to review those breakpad patches) is to add another similar guard page at the beginning.

Nathan, would you mind writing the patch for this? I can't stay the only one writing code under mozglue/linker (and you the only one reviewing).
Flags: needinfo?(nfroyd)
(In reply to Mike Hommey [:glandium] from comment #3)
> Aha, now i remember. This is something that did happen on arm and that we
> worked around here:
> https://mxr.mozilla.org/mozilla-central/source/mozglue/linker/Mappable.
> cpp#185
> 
> But that only takes care of the increasing addresses case, and that doesn't
> work on x86 where the addresses are decreasing. The easy work around here
> (short of getting ted to review those breakpad patches) is to add another
> similar guard page at the beginning.
> 
> Nathan, would you mind writing the patch for this? I can't stay the only one
> writing code under mozglue/linker (and you the only one reviewing).

I can try writing the patch for this, but I'll either need to get an x86 Android device or have someone willing to test potentially broken patches for me.  (And I won't be able to get to it this week.)
Flags: needinfo?(nfroyd)
(In reply to Nathan Froyd (:froydnj) (AFK 16-20 September) from comment #4)
> (In reply to Mike Hommey [:glandium] from comment #3)
> I can try writing the patch for this, but I'll either need to get an x86
> Android device or have someone willing to test potentially broken patches
> for me.  (And I won't be able to get to it this week.)

I can test patches for you.
Android x86 test sets hidden on {mozilla-central, mozilla-inbound, b2g-inbound, fx-team, try} due to this bug (per bug 895186 comment 196).
Blocks: 917361
No longer blocks: 895186
Blocks: 914439, 907383
(In reply to Phil Ringnalda (:philor) from comment #7)
> Interestingly, this is intermittent -
> https://tbpl.mozilla.org/php/getParsedLog.php?id=28007387&tree=Mozilla-
> Inbound is busted,
> https://tbpl.mozilla.org/php/getParsedLog.php?id=27990168&tree=Mozilla-
> Inbound is what that crash should have been.

This is not entirely surprising, as this depends on the memory layout. It is very possible that in some cases, memory is laid in such a way that makes the library mappings and the decompression mappings not adjacent, and that would stop confusing breakpad.
From IRC:
ted: i have no plans to fix that, i think glandium or froydnj is going to

Any volunteers to own the bug?
Assignee: ted → nobody
Nathan said he'd write the patch in comment 4
Assignee: nobody → nfroyd
Mike, I think this is what you had in mind in comment 3.  I was going to try to
add just a single page on each platform as appropriate, but two pages is easier
to think about and read the code for.

Geoff, could you please test this patch?  I think it works ok on my local ARM
device.
Attachment #808727 - Flags: review?(mh+mozilla)
Attachment #808727 - Flags: feedback?(gbrown)
Hm, this page actually fails to load libraries properly; the headers of the libraries turn up to be all zeros...not a good thing.
This patch seems to work better; I can actually browse pages on ARM now.

The remapping of the area we want to use seemingly should have been done regardless;
the mmap(2) man page says:

       MAP_FIXED
              Don't interpret addr as a hint: place the mapping at
              exactly that address.  addr must be a multiple of the page
              size.  If the memory region specified by addr and len
              overlaps pages of any existing mapping(s), then the
              overlapped part of the existing mapping(s) will be
              discarded.  If the specified address cannot be used,
              mmap() will fail.  Because requiring a fixed address for a
              mapping is less portable, the use of this option is
              discouraged.

In particular, the remapping of the last page that we were doing previously ought to
have deleted the mapping we just set up.  Perhaps the kernel is only zealous about
checking when regions have identical start addresses (as the remap of the first page
here does) and is less worried about remapping within the region...?

Geoff, if you would test this; I think it works better than the last one. :)
Attachment #808727 - Attachment is obsolete: true
Attachment #808727 - Flags: review?(mh+mozilla)
Attachment #808727 - Flags: feedback?(gbrown)
Attachment #808822 - Flags: review?(mh+mozilla)
Attachment #808822 - Flags: feedback?(gbrown)
Blocks: 919784
Comment on attachment 808822 [details] [diff] [review]
work around crash reporter issues with adjacent memory mappings on x86 android

Review of attachment 808822 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/Mappable.cpp
@@ +191,5 @@
> +     * and on others, such mappings are placed at adjacent, decreasing
> +     * memory addresses (x86). It is more convenient to just do two pages
> +     * everywhere than to twiddle with platform-specific #defines.
> +     */
> +    void *buf = ::mmap(NULL, length + 2 * PAGE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

Since you're actually (re)mapping further below, you might as well map anonymous PROT_NONE memory here, and mmap MAP_FIXED for the actual buffer as you're doing now. And that'd remove the need for the two mmaps for the first and last page, since you'd be mapping them here.

@@ +196,2 @@
>      if (buf != MAP_FAILED) {
> +      DEBUG_LOG("Decompression buffer of size %x in ashmem \"%s\", mapped @%p",

0x%x

@@ +201,5 @@
> +      char *last_page = map_page + ((length + PAGE_SIZE - 1) & PAGE_MASK);
> +      int flags = MAP_FIXED | MAP_PRIVATE | MAP_ANONYMOUS;
> +      ::mmap(first_page, PAGE_SIZE, PROT_NONE, flags, -1, 0);
> +      ::mmap(last_page, PAGE_SIZE, PROT_NONE, flags, -1, 0);
> +      

trailing whitespaces.

@@ +204,5 @@
> +      ::mmap(last_page, PAGE_SIZE, PROT_NONE, flags, -1, 0);
> +      
> +      void *actual_buf = ::mmap(map_page, last_page - map_page, PROT_READ | PROT_WRITE,
> +                                MAP_FIXED | MAP_SHARED, fd, 0);
> +      DEBUG_LOG("Actual usable buffer of size %x in ashmem \"%s\", mapped @%p",

I'd just move the Decompression buffer... message here.
Attachment #808822 - Flags: review?(mh+mozilla) → feedback+
One large anonymous mmap with a fixed shared mmap in the middle coming right up.
Attachment #808822 - Attachment is obsolete: true
Attachment #808822 - Flags: feedback?(gbrown)
Attachment #808916 - Flags: review?(mh+mozilla)
Attachment #808916 - Flags: feedback?(gbrown)
Comment on attachment 808916 [details] [diff] [review]
work around crash reporter issues with adjacent memory mappings on x86 android

Review of attachment 808916 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/Mappable.cpp
@@ +201,5 @@
> +
> +      void *actual_buf = ::mmap(map_page, last_page - map_page, PROT_READ | PROT_WRITE,
> +                                MAP_FIXED | MAP_SHARED, fd, 0);
> +      if (actual_buf == MAP_FAILED) {
> +        DEBUG_LOG("Fixed allocation of decompression buffer at %p failed", map_page);

may want to unmap buf, here.
Attachment #808916 - Flags: review?(mh+mozilla) → review+
Comment on attachment 808916 [details] [diff] [review]
work around crash reporter issues with adjacent memory mappings on x86 android

Review of attachment 808916 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozglue/linker/Mappable.cpp
@@ +202,5 @@
> +      void *actual_buf = ::mmap(map_page, last_page - map_page, PROT_READ | PROT_WRITE,
> +                                MAP_FIXED | MAP_SHARED, fd, 0);
> +      if (actual_buf == MAP_FAILED) {
> +        DEBUG_LOG("Fixed allocation of decompression buffer at %p failed", map_page);
> +        retun NULL;

s/retun/return/
!!
(In reply to Geoff Brown [:gbrown] from comment #17)
> Comment on attachment 808916 [details] [diff] [review]
> work around crash reporter issues with adjacent memory mappings on x86
> android
> 
> Review of attachment 808916 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mozglue/linker/Mappable.cpp
> @@ +202,5 @@
> > +      void *actual_buf = ::mmap(map_page, last_page - map_page, PROT_READ | PROT_WRITE,
> > +                                MAP_FIXED | MAP_SHARED, fd, 0);
> > +      if (actual_buf == MAP_FAILED) {
> > +        DEBUG_LOG("Fixed allocation of decompression buffer at %p failed", map_page);
> > +        retun NULL;
> 
> s/retun/return/
> !!

I know, discovered that in my try run. :(  Fixed version running at: https://tbpl.mozilla.org/?tree=Try&rev=2dbed6097358
Operating system: Android
                  0.0.0 Linux 2.6.29 #2 PREEMPT Thu Nov 15 13:38:07 CST 2012 i686 generic_x86/sdk_x86/generic_x86:4.2/JOP40C/eng.and
roid-build.20121231.103448:eng/test-keys
CPU: x86
     GenuineIntel family 6 model 3 stepping 3
     1 CPU

Crash reason:  SIGSEGV
Crash address: 0x4

Thread 47 (crashed)
 0  libcutils.so + 0x7449
    eip = 0xb7f7e449   esp = 0x9b52ba4c   ebp = 0x9b52baac   ebx = 0xb7f75ff4
    esi = 0x00000000   edi = 0x9b52bb24   eax = 0x00000001   ecx = 0xb8061740
    edx = 0x00000004   efl = 0x00200282
    Found by: given as instruction pointer in context

Thread 0
 0  libc.so + 0x26997
    eip = 0xb7feb997   esp = 0xbfe989dc   ebp = 0xb9f41ab4   ebx = 0x0000001b
    esi = 0xffffffff   edi = 0xffffffff   eax = 0xfffffffc   ecx = 0xbfe98a7c
    edx = 0x00000010   efl = 0x00000246
    Found by: given as instruction pointer in context
 1  libutils.so + 0x25ba1
    eip = 0xb7f69ba2   esp = 0xbfe989f0   ebp = 0xb9f41ab4
    Found by: stack scanning
 2  libc.so + 0x132e9
    eip = 0xb7fd82ea   esp = 0xbfe98a10   ebp = 0xb9f41ab4
    Found by: stack scanning
 3  libc.so + 0x10cf9
    eip = 0xb7fd5cfa   esp = 0xbfe98a20   ebp = 0xb9f41ab4
    Found by: stack scanning
 4  libc.so + 0x10ca7
    eip = 0xb7fd5ca8   esp = 0xbfe98a30   ebp = 0xb9f41ab4
    Found by: stack scanning
 5  libnss3.so!pt_PostNotifies [ptsynch.c:59411322cbbb : 113 + 0x7]
    eip = 0xa73b3c08   esp = 0xbfe98a40   ebp = 0xb9f41ab4
    Found by: stack scanning
 6  libdvm.so + 0x7eac5
    eip = 0xb72c9ac6   esp = 0xbfe98a48   ebp = 0xb9f41ab4
    Found by: stack scanning
 7  libnss3.so!pt_PostNotifies [ptsynch.c:59411322cbbb : 70 + 0xa]
    eip = 0xa73b3b4b   esp = 0xbfe98aac   ebp = 0xb9f41ab4
    Found by: stack scanning
 8  libnss3.so!PR_Unlock [ptsynch.c:59411322cbbb : 208 + 0xb]
    eip = 0xa73b3cf5   esp = 0xbfe98ac0   ebp = 0xb9f41ab4
    Found by: stack scanning
 9  libxul.so!nsAString_internal::SetCapacity(unsigned int, mozilla::fallible_t const&) [nsTSubstring.cpp:59411322cbbb : 546 + 0xe]
    eip = 0xa23de09b   esp = 0xbfe98acc   ebp = 0xb9f41ab4
    Found by: stack scanning
10  libxul.so!mozilla::BaseAutoLock<mozilla::Mutex>::~BaseAutoLock() [Mutex.h:59411322cbbb : 79 + 0xb]
    eip = 0xa16586e9   esp = 0xbfe98ae0   ebp = 0xb9f41ab4
    Found by: stack scanning
11  libnss3.so!PR_NotifyCondVar [ptsynch.c:59411322cbbb : 413 + 0x7]
    eip = 0xa73b3e72   esp = 0xbfe98af0   ebp = 0xb9f41ab4
    Found by: stack scanning
12  libxul.so!mozilla::BaseAutoLock<mozilla::Mutex>::~BaseAutoLock() [Mutex.h:59411322cbbb : 176 + 0x8]
    eip = 0xa16586d3   esp = 0xbfe98af4   ebp = 0xb9f41ab4
    Found by: stack scanning
13  libxul.so!nsAppShell::NotifyNativeEvent() [nsAppShell.cpp:59411322cbbb : 163 + 0xa]
    eip = 0xa20073d7   esp = 0xbfe98b00   ebp = 0xb9f41ab4
    Found by: stack scanning
14  libxul.so!mozilla::BaseAutoLock<mozilla::Mutex>::~BaseAutoLock() [Mutex.h:59411322cbbb : 79 + 0xb]
    eip = 0xa16586e9   esp = 0xbfe98b10   ebp = 0xb9f41ab4
    Found by: stack scanning
15  libxul.so!nsAppShell::PostEvent(mozilla::AndroidGeckoEvent*) [nsAppShell.cpp:59411322cbbb : 769 + 0x7]
    eip = 0xa2007b56   esp = 0xbfe98b30   ebp = 0xb9f41ab4
    Found by: stack scanning
16  libutils.so + 0x25ad8
    eip = 0xb7f69ad9   esp = 0xbfe98b5c   ebp = 0xb9f41ab4
    Found by: stack scanning
17  libandroid_runtime.so + 0x11aff3
    eip = 0xb7ec8ff4   esp = 0xbfe98b60   ebp = 0xb9f41ab4
    Found by: stack scanning
18  libutils.so + 0x2616b
    eip = 0xb7f6a16c   esp = 0xbfe98b70   ebp = 0xb9f41ab4
    Found by: stack scanning
19  libbinder.so + 0x1f6ff
    eip = 0xb7ef1700   esp = 0xbfe98b80   ebp = 0xb9f41ab4
    Found by: stack scanning
...
Attachment #808916 - Flags: feedback?(gbrown) → feedback+
https://hg.mozilla.org/mozilla-central/rev/cdba23acd0e0
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
See Also: → 962139
You need to log in before you can comment on or make changes to this bug.