Closed Bug 1309708 Opened 8 years ago Closed 8 years ago

Crash report dialog on first startup after APK update

Tracking

(firefox51 fixed, firefox52 fixed)

Status:

RESOLVED FIXED

Milestone:

Firefox 52

Tracking Flags:

Tracking

Status

firefox51

---

fixed

firefox52

---

fixed

People

(Reporter: esawin, Assigned: esawin)

References

Details

Attachments

(4 files, 3 obsolete files)

0001-Bug-1309708-1.0-Don-t-close-native-library-handles-d.patch 8 years ago Eugen Sawin [:esawin] 734 bytes, patch		Details \| Diff \| Splinter Review
Linker logs (with __wrap_dlclose) 8 years ago Eugen Sawin [:esawin] 755.05 KB, text/plain		Details
Linker logs (w/o __wrap_dlclose) 8 years ago Eugen Sawin [:esawin] 566.84 KB, text/plain		Details
maps dump 8 years ago Eugen Sawin [:esawin] 42.54 KB, text/plain		Details
0001-Bug-1309708-2.0-Run-the-library-extraction-in-a-dedi.patch 8 years ago Eugen Sawin [:esawin] 791 bytes, patch	snorp : review-	Details \| Diff \| Splinter Review
0001-Bug-1309708-1.1-Delete-reported-library-mapping-on-c.patch 8 years ago Eugen Sawin [:esawin] 1.91 KB, patch	glandium : review+	Details \| Diff \| Splinter Review
0001-Bug-1309708-1.2-Delete-reported-library-mapping-on-c.patch 8 years ago Eugen Sawin [:esawin] 1.94 KB, patch	esawin : review+ gchang : approval-mozilla-beta+	Details \| Diff \| Splinter Review

Eugen Sawin [:esawin]

Assignee

Description

•

8 years ago

With bug 1298090 landed, we show the crash report dialog alongside Fennec on the first startup after an APK update.

Eugen Sawin [:esawin]

Assignee

Comment 1

•

8 years ago

Attached patch 0001-Bug-1309708-1.0-Don-t-close-native-library-handles-d.patch (obsolete) — Details — Splinter Review

A workaround is to force-stop Fennec via Android App info before startup after the APK update.

It looks like the PackageReplacedReceiver is being kept alive by Android during Fennec startup and we run into some conflict when trying to open the libraries closed by the receiver.

Keeping the handles open fixes the issue and gives us a little performance boost on first startup, too.

I can investigate more, unless you have an idea already what's happening exactly, glandium?

Attachment #8800425 - Flags: review?(mh+mozilla)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 2

•

8 years ago

Android can cache processes even if there is no activity/service/whatever currently running in it. That's probably what's happening here too. I think it makes sense to leave the process in a usable state when we are unpacking.

Eugen Sawin [:esawin]

Assignee

Comment 3

•

8 years ago

Attached file Linker logs (with __wrap_dlclose) — Details

Looks like we're getting a segfault when reading libnss.so.

Eugen Sawin [:esawin]

Assignee

Comment 4

•

8 years ago

Attached file Linker logs (w/o __wrap_dlclose) — Details

Here is a clean install and start without __wrap_dlclose for comparison.

Mike Hommey [:glandium]

Comment 5

•

8 years ago

(In reply to Eugen Sawin [:esawin] from comment #3)
> Created attachment 8801824 [details]
> Linker logs (with __wrap_dlclose)
> 
> Looks like we're getting a segfault when reading libnss.so.

Where do you see that in the attached log?

Eugen Sawin [:esawin]

Assignee

Comment 6

•

8 years ago

(In reply to Mike Hommey [:glandium] from comment #5)
> (In reply to Eugen Sawin [:esawin] from comment #3)
> > Created attachment 8801824 [details]
> > Linker logs (with __wrap_dlclose)
> > 
> > Looks like we're getting a segfault when reading libnss.so.
> 
> Where do you see that in the attached log?

10-17 20:16:28.777 17019 17019 I GeckoLinker: /data/app/org.mozilla.fennec_esawin-2/base.apk!/assets/armeabi-v7a/libnss3.so: Loading segment @0xe1de9000 r-x
...
10-17 20:16:28.779 17019 17019 I GeckoLinker: Relocate /data/app/org.mozilla.fennec_esawin-2/base.apk!/assets/armeabi-v7a/libnss3.so @0xe1de9000
...
10-17 20:16:35.375 17019 17089 I GeckoLinker: Caught segmentation fault @0xe1de9000

Mike Hommey [:glandium]

Comment 7

•

8 years ago

Why is the process still doing stuff after that segfault is caught?

10-17 20:16:35.375 17019 17089 I GeckoLinker: Redispatching to registered handler @0xeb75ffed

Obviously, that handler is somehow recovering from the crash, because the same thread does other stuff later:

10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit GetHandleByPtr: 0xd1c20f71
10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit GetHandleByPtr: /data/app/org.mozilla.fennec_esawin-2/base.apk!/assets/armeabi-v7a/libnss3.so
10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit Load: libsoftokn3.so
10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit Load [100]: libsoftokn3.so
10-17 20:16:36.582 17019 17089 I GeckoLinker: Mapped /data/app/org.mozilla.fennec_esawin-2/base.apk @0xc5dca000
10-17 20:16:36.582 17019 17089 I GeckoLinker: ZipCollection::Register("/data/app/org.mozilla.fennec_esawin-2/base.apk")
10-17 20:16:36.582 17019 17089 I GeckoLinker: /data/app/org.mozilla.fennec_esawin-2/base.apk - GetFile assets/armeabi-v7a/libsoftokn3.so

So it would be good to know what's at 0xeb75ffed, and I think that'd be dalvik. IMHO, this particular segfault message is a red herring.

A backtrace of the crash for which you get the crash report dialog would be a better thing to have.

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Comment 8

•

8 years ago

I should have been more clear, there is no actual crash, Fennec starts up alongside an (empty) crash report dialog. I suspect there is something unclean about closing the lib handle and re-opening it, but I've only got the logs as a clue.

Flags: needinfo?(esawin)

Mike Hommey [:glandium]

Comment 9

•

8 years ago

Can you run this under gdb with a breakpoint in SEGVHandler::handler (from mozglue/linker/ElfLoader.cpp), and find out what's trying to read at the original address of libnss3?

Mike Hommey [:glandium]

Comment 10

•

8 years ago

(Also, the content of /proc/self/maps when that happens would be good to have.)

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Comment 11

•

8 years ago

I've forced a crash in SEGVHandler::handler to get a stack (easier than attaching during startup), however this issue is not reproducing on debug builds.

On an optimized build, this is what I found:

4001d000-40060000 r-xp 00000000 b3:16 1228       /system/lib/libc.so
40060000-40063000 rw-p 00043000 b3:16 1228       /system/lib/libc.so
40063000-4006e000 rw-p 00000000 00:00 0 
4006e000-4007f000 r--s 00000000 00:0b 3080       /dev/__properties__ (deleted)
400cb000-400f0000 r-xp 00000000 b3:16 422        /system/bin/mksh
400f0000-400f1000 r--p 00024000 b3:16 422        /system/bin/mksh
400f1000-400f2000 rw-p 00025000 b3:16 422        /system/bin/mksh
400f2000-400f6000 rw-p 00000000 00:00 0 
4016e000-4016f000 r--p 00000000 00:00 0 
4017e000-40191000 r-xp 00000000 b3:16 406        /system/bin/linker
40191000-40192000 r--p 00012000 b3:16 406        /system/bin/linker
40192000-40193000 rw-p 00013000 b3:16 406        /system/bin/linker
40193000-401a3000 rw-p 00000000 00:00 0 
412c6000-412ce000 rw-p 00000000 00:00 0          [heap]
bed03000-bed24000 rw-p 00000000 00:00 0          [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

#0  0x5d8a2a98 in SEGVHandler::handler (signum=<optimized out>, info=<optimized out>, context=<optimized out>)
    at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259
#1  0xffff0514 in ?? ()
#2  0xffff0514 in ?? ()

That doesn't seem very helpful. I'm not sure why debug builds don't hit this issue, could be timing related?

Flags: needinfo?(esawin)

Mike Hommey [:glandium]

Comment 12

•

8 years ago

I don't expect a crash from a SEGFAULT signal handler to yield something useful, and the stack trace you got is proof of that. Can you try again with gdb/jimdb?

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Comment 13

•

8 years ago

I get the same stack when backtracing from SEGVHandler::handler without MOZ_CRASH:

40011000-40012000 r--p 00000000 00:00 0 
4002f000-40054000 r-xp 00000000 b3:16 422        /system/bin/mksh
40054000-40055000 r--p 00024000 b3:16 422        /system/bin/mksh
40055000-40056000 rw-p 00025000 b3:16 422        /system/bin/mksh
40056000-4005a000 rw-p 00000000 00:00 0 
4005a000-4006b000 r--s 00000000 00:0b 3398       /dev/__properties__ (deleted)
4010b000-4011e000 r-xp 00000000 b3:16 406        /system/bin/linker
4011e000-4011f000 r--p 00012000 b3:16 406        /system/bin/linker
4011f000-40120000 rw-p 00013000 b3:16 406        /system/bin/linker
40120000-40130000 rw-p 00000000 00:00 0 
40130000-40173000 r-xp 00000000 b3:16 1228       /system/lib/libc.so
40173000-40176000 rw-p 00043000 b3:16 1228       /system/lib/libc.so
40176000-40181000 rw-p 00000000 00:00 0 
41e9e000-41ea6000 rw-p 00000000 00:00 0          [heap]
beed4000-beef5000 rw-p 00000000 00:00 0          [stack]
ffff0000-ffff1000 r-xp 00000000 00:00 0          [vectors]

SEGVHandler::handler (signum=11, info=0x5db1cc90, context=0x5db1cd10)
    at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1258
1258      DEBUG_LOG("Caught segmentation fault @%p", info->si_addr);
(gdb) bt
#0  SEGVHandler::handler (signum=11, info=0x5db1cc90, context=0x5db1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1258
#1  0xffff0514 in ?? ()
#2  0xffff0514 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Flags: needinfo?(esawin)

Mike Hommey [:glandium]

Comment 14

•

8 years ago

It's weird that the addresses for info and context are not mapped according to your /proc/$pid/maps dump. Are you sure it's a dump for the right process?

Can you print *info and print *(ucontext_t*)context?

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Comment 15

•

8 years ago

Attached file maps dump — Details

You're right, it was the wrong process. I've attached the full maps dump.

SEGVHandler::handler (signum=11, info=0x5da1cc90, context=0x5da1cd10)
    at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259
1259      DEBUG_LOG("Caught segmentation fault @%p", info->si_addr);
(gdb) bt
#0  SEGVHandler::handler (signum=11, info=0x5da1cc90, context=0x5da1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259
#1  0xffff0514 in ?? ()
#2  0xffff0514 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

(gdb) p *info
$1 = {si_signo = 11, si_errno = 0, si_code = 2, _sifields = {_pad = {1571815424, 0 <repeats 28 times>}, _kill = {_pid = 1571815424, 
      _uid = 0}, _timer = {_tid = 1571815424, _overrun = 0, _pad = 0x5da1cca4 "", _sigval = {sival_int = 0, sival_ptr = 0x0}, 
      _sys_private = 0}, _rt = {_pid = 1571815424, _uid = 0, _sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {_pid = 1571815424, 
      _uid = 0, _status = 0, _utime = 0, _stime = 0}, _sigfault = {_addr = 0x5db00000}, _sigpoll = {_band = 1571815424, _fd = 0}}}

(gdb) p *(ucontext_t*)context
No symbol "ucontext_t" in current context.

(gdb) x/40xb context                                                                                                                      
0x5da1cd10:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cd18:     0x00    0x90    0xa1    0x5d    0x00    0x00    0x00    0x00
0x5da1cd20:     0x00    0x40    0x00    0x00    0x0e    0x00    0x00    0x00
0x5da1cd28:     0x17    0x00    0x00    0x00    0x04    0x12    0x00    0x00
0x5da1cd30:     0x00    0x00    0xb0    0x5d    0x21    0x95    0xc4    0x61


libxul.so had been mapped to 0x5db00000 before the dlclose afaics.

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Comment 16

•

8 years ago

By libxul.so I mean /data/app/org.mozilla.fennec_esawin-2.apk!/assets/armeabi-v7a/libxul.so.

Mike Hommey [:glandium]

Comment 17

•

8 years ago

(In reply to Eugen Sawin [:esawin] from comment #15)
> (gdb) x/40xb context                                                        
> 
> 0x5da1cd10:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
> 0x5da1cd18:     0x00    0x90    0xa1    0x5d    0x00    0x00    0x00    0x00
> 0x5da1cd20:     0x00    0x40    0x00    0x00    0x0e    0x00    0x00    0x00
> 0x5da1cd28:     0x17    0x00    0x00    0x00    0x04    0x12    0x00    0x00
> 0x5da1cd30:     0x00    0x00    0xb0    0x5d    0x21    0x95    0xc4    0x61

Sadly, this is cut right when it begins to be interesting...  need at least 64 more bytes. Make that 200 for good measure.

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Comment 18

•

8 years ago

#0  SEGVHandler::handler (signum=11, info=0x5da1cc90, context=0x5da1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259
#1  0xffff0514 in ?? ()
#2  0xffff0514 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) p *info
$1 = {si_signo = 11, si_errno = 0, si_code = 2, _sifields = {_pad = {1571815424, 0 <repeats 28 times>}, _kill = {_pid = 1571815424, 
      _uid = 0}, _timer = {_tid = 1571815424, _overrun = 0, _pad = 0x5da1cca4 "", _sigval = {sival_int = 0, sival_ptr = 0x0}, 
      _sys_private = 0}, _rt = {_pid = 1571815424, _uid = 0, _sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {_pid = 1571815424, 
      _uid = 0, _status = 0, _utime = 0, _stime = 0}, _sigfault = {_addr = 0x5db00000}, _sigpoll = {_band = 1571815424, _fd = 0}}}
(gdb) x/200xb context
0x5da1cd10:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cd18:     0x00    0x90    0xa1    0x5d    0x00    0x00    0x00    0x00
0x5da1cd20:     0x00    0x40    0x00    0x00    0x0e    0x00    0x00    0x00
0x5da1cd28:     0x17    0x00    0x00    0x00    0x04    0x12    0x00    0x00
0x5da1cd30:     0x00    0x00    0xb0    0x5d    0x21    0x35    0x07    0x61
0x5da1cd38:     0x04    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cd40:     0x00    0x00    0x00    0x00    0xec    0xf8    0xdf    0x5d
0x5da1cd48:     0xf0    0xf8    0xdf    0x5d    0x04    0x00    0x00    0x00
0x5da1cd50:     0xf4    0xf8    0xdf    0x5d    0xfc    0x60    0xa0    0x5d
0x5da1cd58:     0x08    0x28    0x62    0x61    0x30    0xd3    0x61    0x61
0x5da1cd60:     0xfd    0xb8    0x2e    0x5f    0xb4    0xf8    0xdf    0x5d
0x5da1cd68:     0xa9    0xf8    0x5c    0x60    0x7e    0xf8    0x5c    0x60
0x5da1cd70:     0x30    0x00    0x00    0x80    0x00    0x00    0xb0    0x5d
0x5da1cd78:     0x04    0x12    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cd80:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cd88:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cd90:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cd98:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cda0:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cda8:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cdb0:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cdb8:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cdc0:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cdc8:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x5da1cdd0:     0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00

Flags: needinfo?(esawin)

Mike Hommey [:glandium]

Comment 19

•

8 years ago

So the interesting data in there is:
lr: 0x605cf8a9
pc: 0x605cf87e

Assuming /proc/pid/maps isn't changed, that means the crash is happening from somewhere within /data/data/org.mozilla.fennec_esawin/cache/libxul.so, which starts at 0x5fb0e000.
That makes pc at 0xac187e and lr at 0xac18a9 within it. You'd need to run addr2line to know what those are.

But that's assuming /proc/pid/maps is unchanged.

Mike Hommey [:glandium]

Updated

•

8 years ago

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Updated

•

8 years ago

Flags: needinfo?(esawin)

Eugen Sawin [:esawin]

Assignee

Comment 20

•

8 years ago

Attached patch 0001-Bug-1309708-2.0-Run-the-library-extraction-in-a-dedi.patch (obsolete) — Details — Splinter Review

Since troubleshooting on IRC didn't get us closer to a proper solution of this issue, I would like to isolate the broadcast receiver into its own process for now to prevent any collisions with Fennec's linker.

With this patch we run the receiver in a separate process (":extract") private to Fennec which avoids any linker issues.

Attachment #8808808 - Flags: review?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 21

•

8 years ago

Comment on attachment 8808808 [details] [diff] [review]
0001-Bug-1309708-2.0-Run-the-library-extraction-in-a-dedi.patch

Review of attachment 8808808 [details] [diff] [review]:
-----------------------------------------------------------------

I was initially ok with this as a band-aid, but it introduces a race if we are starting fennec and unpacking after an update at the same time. I think I prefer the original patch on this bug, but we should still try to figure out the real problem.

Attachment #8808808 - Flags: review?(snorp) → review-

Eugen Sawin [:esawin]

Assignee

Comment 22

•

8 years ago

Attached patch 0001-Bug-1309708-1.1-Delete-reported-library-mapping-on-c.patch (obsolete) — Details — Splinter Review

When opening a library via CustomElf::Load, we report the mapping for the crash reporter.
We don't remove the mapping when closing the library handle, which results in stale (and duplicate) reported mappings (when re-opening).

Attachment #8800425 - Attachment is obsolete: true

Attachment #8808808 - Attachment is obsolete: true

Attachment #8800425 - Flags: review?(mh+mozilla)

Attachment #8809229 - Flags: review?(mh+mozilla)

Mike Hommey [:glandium]

Comment 23

•

8 years ago

Comment on attachment 8809229 [details] [diff] [review]
0001-Bug-1309708-1.1-Delete-reported-library-mapping-on-c.patch

Review of attachment 8809229 [details] [diff] [review]:
-----------------------------------------------------------------

This makes so much more sense now. Thanks for persevering and finding the root cause. Yet again, bug 689178 is greatly needed.

::: mozglue/android/APKOpen.cpp
@@ +206,5 @@
> +    struct mapping_info *info = &lib_mapping[pos];
> +    if (!strcmp(info->name, name)) {
> +      struct mapping_info *last = &lib_mapping[--mapping_count];
> +      free(info->name);
> +      *info = *last;

You shouldn't decrease mapping_count before this.

Attachment #8809229 - Flags: review?(mh+mozilla) → review+

Eugen Sawin [:esawin]

Assignee

Comment 24

•

8 years ago

Attached patch 0001-Bug-1309708-1.2-Delete-reported-library-mapping-on-c.patch — Details — Splinter Review

Addressed comment.

Attachment #8809229 - Attachment is obsolete: true

Attachment #8809390 - Flags: review+

Pulsebot

Comment 25

•

8 years ago

Pushed by esawin@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/459230d2ab75
[1.2] Delete reported library mapping on close. r=glandium

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 26

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/459230d2ab75

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox52: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 52

Eugen Sawin [:esawin]

Assignee

Comment 27

•

8 years ago

Comment on attachment 8809390 [details] [diff] [review]
0001-Bug-1309708-1.2-Delete-reported-library-mapping-on-c.patch

Approval Request Comment
[Feature/regressing bug #]: Old issue, exposed by bug 1298090.
[User impact if declined]: Potential crash on startup with broken/empty crash report.
[Describe test coverage new/current, TreeHerder]: Nightly, Aurora.
[Risks and why]: Low, affects only crash reporter integration.
[String/UUID change made/needed]: None.

Attachment #8809390 - Flags: approval-mozilla-beta?

Gerry Chang [:gchang]

Updated

•

8 years ago

status-firefox51: --- → affected

Gerry Chang [:gchang]

Comment 28

•

8 years ago

Comment on attachment 8809390 [details] [diff] [review]
0001-Bug-1309708-1.2-Delete-reported-library-mapping-on-c.patch

Fix an issue related to crash report handling. Beta51+. Should be in 51 beta 4.

Attachment #8809390 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Carsten Book [:Tomcat]

Comment 29

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/99d833f2f864

status-firefox51: affected → fixed

Nobody; OK to take it and work on it

Updated

•

3 years ago

Product: Firefox for Android → Firefox for Android Graveyard

You need to log in before you can comment on or make changes to this bug.