Closed
Bug 1309708
Opened 7 years ago
Closed 6 years ago
Crash report dialog on first startup after APK update
Categories
(Firefox for Android Graveyard :: General, defect)
Tracking
(firefox51 fixed, firefox52 fixed)
RESOLVED
FIXED
Firefox 52
People
(Reporter: esawin, Assigned: esawin)
References
Details
Attachments
(4 files, 3 obsolete files)
755.05 KB,
text/plain
|
Details | |
566.84 KB,
text/plain
|
Details | |
42.54 KB,
text/plain
|
Details | |
1.94 KB,
patch
|
esawin
:
review+
gchang
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
With bug 1298090 landed, we show the crash report dialog alongside Fennec on the first startup after an APK update.
Assignee | ||
Comment 1•7 years ago
|
||
A workaround is to force-stop Fennec via Android App info before startup after the APK update. It looks like the PackageReplacedReceiver is being kept alive by Android during Fennec startup and we run into some conflict when trying to open the libraries closed by the receiver. Keeping the handles open fixes the issue and gives us a little performance boost on first startup, too. I can investigate more, unless you have an idea already what's happening exactly, glandium?
Attachment #8800425 -
Flags: review?(mh+mozilla)
Android can cache processes even if there is no activity/service/whatever currently running in it. That's probably what's happening here too. I think it makes sense to leave the process in a usable state when we are unpacking.
Assignee | ||
Comment 3•7 years ago
|
||
Looks like we're getting a segfault when reading libnss.so.
Assignee | ||
Comment 4•7 years ago
|
||
Here is a clean install and start without __wrap_dlclose for comparison.
Comment 5•7 years ago
|
||
(In reply to Eugen Sawin [:esawin] from comment #3) > Created attachment 8801824 [details] > Linker logs (with __wrap_dlclose) > > Looks like we're getting a segfault when reading libnss.so. Where do you see that in the attached log?
Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #5) > (In reply to Eugen Sawin [:esawin] from comment #3) > > Created attachment 8801824 [details] > > Linker logs (with __wrap_dlclose) > > > > Looks like we're getting a segfault when reading libnss.so. > > Where do you see that in the attached log? 10-17 20:16:28.777 17019 17019 I GeckoLinker: /data/app/org.mozilla.fennec_esawin-2/base.apk!/assets/armeabi-v7a/libnss3.so: Loading segment @0xe1de9000 r-x ... 10-17 20:16:28.779 17019 17019 I GeckoLinker: Relocate /data/app/org.mozilla.fennec_esawin-2/base.apk!/assets/armeabi-v7a/libnss3.so @0xe1de9000 ... 10-17 20:16:35.375 17019 17089 I GeckoLinker: Caught segmentation fault @0xe1de9000
Comment 7•7 years ago
|
||
Why is the process still doing stuff after that segfault is caught? 10-17 20:16:35.375 17019 17089 I GeckoLinker: Redispatching to registered handler @0xeb75ffed Obviously, that handler is somehow recovering from the crash, because the same thread does other stuff later: 10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit GetHandleByPtr: 0xd1c20f71 10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit GetHandleByPtr: /data/app/org.mozilla.fennec_esawin-2/base.apk!/assets/armeabi-v7a/libnss3.so 10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit Load: libsoftokn3.so 10-17 20:16:36.582 17019 17089 E GeckoLinker: rabbit Load [100]: libsoftokn3.so 10-17 20:16:36.582 17019 17089 I GeckoLinker: Mapped /data/app/org.mozilla.fennec_esawin-2/base.apk @0xc5dca000 10-17 20:16:36.582 17019 17089 I GeckoLinker: ZipCollection::Register("/data/app/org.mozilla.fennec_esawin-2/base.apk") 10-17 20:16:36.582 17019 17089 I GeckoLinker: /data/app/org.mozilla.fennec_esawin-2/base.apk - GetFile assets/armeabi-v7a/libsoftokn3.so So it would be good to know what's at 0xeb75ffed, and I think that'd be dalvik. IMHO, this particular segfault message is a red herring. A backtrace of the crash for which you get the crash report dialog would be a better thing to have.
Flags: needinfo?(esawin)
Assignee | ||
Comment 8•7 years ago
|
||
I should have been more clear, there is no actual crash, Fennec starts up alongside an (empty) crash report dialog. I suspect there is something unclean about closing the lib handle and re-opening it, but I've only got the logs as a clue.
Flags: needinfo?(esawin)
Comment 9•7 years ago
|
||
Can you run this under gdb with a breakpoint in SEGVHandler::handler (from mozglue/linker/ElfLoader.cpp), and find out what's trying to read at the original address of libnss3?
Comment 10•7 years ago
|
||
(Also, the content of /proc/self/maps when that happens would be good to have.)
Flags: needinfo?(esawin)
Assignee | ||
Comment 11•7 years ago
|
||
I've forced a crash in SEGVHandler::handler to get a stack (easier than attaching during startup), however this issue is not reproducing on debug builds. On an optimized build, this is what I found: 4001d000-40060000 r-xp 00000000 b3:16 1228 /system/lib/libc.so 40060000-40063000 rw-p 00043000 b3:16 1228 /system/lib/libc.so 40063000-4006e000 rw-p 00000000 00:00 0 4006e000-4007f000 r--s 00000000 00:0b 3080 /dev/__properties__ (deleted) 400cb000-400f0000 r-xp 00000000 b3:16 422 /system/bin/mksh 400f0000-400f1000 r--p 00024000 b3:16 422 /system/bin/mksh 400f1000-400f2000 rw-p 00025000 b3:16 422 /system/bin/mksh 400f2000-400f6000 rw-p 00000000 00:00 0 4016e000-4016f000 r--p 00000000 00:00 0 4017e000-40191000 r-xp 00000000 b3:16 406 /system/bin/linker 40191000-40192000 r--p 00012000 b3:16 406 /system/bin/linker 40192000-40193000 rw-p 00013000 b3:16 406 /system/bin/linker 40193000-401a3000 rw-p 00000000 00:00 0 412c6000-412ce000 rw-p 00000000 00:00 0 [heap] bed03000-bed24000 rw-p 00000000 00:00 0 [stack] ffff0000-ffff1000 r-xp 00000000 00:00 0 [vectors] #0 0x5d8a2a98 in SEGVHandler::handler (signum=<optimized out>, info=<optimized out>, context=<optimized out>) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259 #1 0xffff0514 in ?? () #2 0xffff0514 in ?? () That doesn't seem very helpful. I'm not sure why debug builds don't hit this issue, could be timing related?
Flags: needinfo?(esawin)
Comment 12•7 years ago
|
||
I don't expect a crash from a SEGFAULT signal handler to yield something useful, and the stack trace you got is proof of that. Can you try again with gdb/jimdb?
Flags: needinfo?(esawin)
Assignee | ||
Comment 13•7 years ago
|
||
I get the same stack when backtracing from SEGVHandler::handler without MOZ_CRASH: 40011000-40012000 r--p 00000000 00:00 0 4002f000-40054000 r-xp 00000000 b3:16 422 /system/bin/mksh 40054000-40055000 r--p 00024000 b3:16 422 /system/bin/mksh 40055000-40056000 rw-p 00025000 b3:16 422 /system/bin/mksh 40056000-4005a000 rw-p 00000000 00:00 0 4005a000-4006b000 r--s 00000000 00:0b 3398 /dev/__properties__ (deleted) 4010b000-4011e000 r-xp 00000000 b3:16 406 /system/bin/linker 4011e000-4011f000 r--p 00012000 b3:16 406 /system/bin/linker 4011f000-40120000 rw-p 00013000 b3:16 406 /system/bin/linker 40120000-40130000 rw-p 00000000 00:00 0 40130000-40173000 r-xp 00000000 b3:16 1228 /system/lib/libc.so 40173000-40176000 rw-p 00043000 b3:16 1228 /system/lib/libc.so 40176000-40181000 rw-p 00000000 00:00 0 41e9e000-41ea6000 rw-p 00000000 00:00 0 [heap] beed4000-beef5000 rw-p 00000000 00:00 0 [stack] ffff0000-ffff1000 r-xp 00000000 00:00 0 [vectors] SEGVHandler::handler (signum=11, info=0x5db1cc90, context=0x5db1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1258 1258 DEBUG_LOG("Caught segmentation fault @%p", info->si_addr); (gdb) bt #0 SEGVHandler::handler (signum=11, info=0x5db1cc90, context=0x5db1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1258 #1 0xffff0514 in ?? () #2 0xffff0514 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Flags: needinfo?(esawin)
Comment 14•7 years ago
|
||
It's weird that the addresses for info and context are not mapped according to your /proc/$pid/maps dump. Are you sure it's a dump for the right process? Can you print *info and print *(ucontext_t*)context?
Flags: needinfo?(esawin)
Assignee | ||
Comment 15•7 years ago
|
||
You're right, it was the wrong process. I've attached the full maps dump. SEGVHandler::handler (signum=11, info=0x5da1cc90, context=0x5da1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259 1259 DEBUG_LOG("Caught segmentation fault @%p", info->si_addr); (gdb) bt #0 SEGVHandler::handler (signum=11, info=0x5da1cc90, context=0x5da1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259 #1 0xffff0514 in ?? () #2 0xffff0514 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) p *info $1 = {si_signo = 11, si_errno = 0, si_code = 2, _sifields = {_pad = {1571815424, 0 <repeats 28 times>}, _kill = {_pid = 1571815424, _uid = 0}, _timer = {_tid = 1571815424, _overrun = 0, _pad = 0x5da1cca4 "", _sigval = {sival_int = 0, sival_ptr = 0x0}, _sys_private = 0}, _rt = {_pid = 1571815424, _uid = 0, _sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {_pid = 1571815424, _uid = 0, _status = 0, _utime = 0, _stime = 0}, _sigfault = {_addr = 0x5db00000}, _sigpoll = {_band = 1571815424, _fd = 0}}} (gdb) p *(ucontext_t*)context No symbol "ucontext_t" in current context. (gdb) x/40xb context 0x5da1cd10: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cd18: 0x00 0x90 0xa1 0x5d 0x00 0x00 0x00 0x00 0x5da1cd20: 0x00 0x40 0x00 0x00 0x0e 0x00 0x00 0x00 0x5da1cd28: 0x17 0x00 0x00 0x00 0x04 0x12 0x00 0x00 0x5da1cd30: 0x00 0x00 0xb0 0x5d 0x21 0x95 0xc4 0x61 libxul.so had been mapped to 0x5db00000 before the dlclose afaics.
Flags: needinfo?(esawin)
Assignee | ||
Comment 16•7 years ago
|
||
By libxul.so I mean /data/app/org.mozilla.fennec_esawin-2.apk!/assets/armeabi-v7a/libxul.so.
Comment 17•7 years ago
|
||
(In reply to Eugen Sawin [:esawin] from comment #15) > (gdb) x/40xb context > > 0x5da1cd10: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 > 0x5da1cd18: 0x00 0x90 0xa1 0x5d 0x00 0x00 0x00 0x00 > 0x5da1cd20: 0x00 0x40 0x00 0x00 0x0e 0x00 0x00 0x00 > 0x5da1cd28: 0x17 0x00 0x00 0x00 0x04 0x12 0x00 0x00 > 0x5da1cd30: 0x00 0x00 0xb0 0x5d 0x21 0x95 0xc4 0x61 Sadly, this is cut right when it begins to be interesting... need at least 64 more bytes. Make that 200 for good measure.
Flags: needinfo?(esawin)
Assignee | ||
Comment 18•7 years ago
|
||
#0 SEGVHandler::handler (signum=11, info=0x5da1cc90, context=0x5da1cd10) at /home/esawin/dev/g/mozglue/linker/ElfLoader.cpp:1259 #1 0xffff0514 in ?? () #2 0xffff0514 in ?? () Backtrace stopped: previous frame identical to this frame (corrupt stack?) (gdb) p *info $1 = {si_signo = 11, si_errno = 0, si_code = 2, _sifields = {_pad = {1571815424, 0 <repeats 28 times>}, _kill = {_pid = 1571815424, _uid = 0}, _timer = {_tid = 1571815424, _overrun = 0, _pad = 0x5da1cca4 "", _sigval = {sival_int = 0, sival_ptr = 0x0}, _sys_private = 0}, _rt = {_pid = 1571815424, _uid = 0, _sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {_pid = 1571815424, _uid = 0, _status = 0, _utime = 0, _stime = 0}, _sigfault = {_addr = 0x5db00000}, _sigpoll = {_band = 1571815424, _fd = 0}}} (gdb) x/200xb context 0x5da1cd10: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cd18: 0x00 0x90 0xa1 0x5d 0x00 0x00 0x00 0x00 0x5da1cd20: 0x00 0x40 0x00 0x00 0x0e 0x00 0x00 0x00 0x5da1cd28: 0x17 0x00 0x00 0x00 0x04 0x12 0x00 0x00 0x5da1cd30: 0x00 0x00 0xb0 0x5d 0x21 0x35 0x07 0x61 0x5da1cd38: 0x04 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cd40: 0x00 0x00 0x00 0x00 0xec 0xf8 0xdf 0x5d 0x5da1cd48: 0xf0 0xf8 0xdf 0x5d 0x04 0x00 0x00 0x00 0x5da1cd50: 0xf4 0xf8 0xdf 0x5d 0xfc 0x60 0xa0 0x5d 0x5da1cd58: 0x08 0x28 0x62 0x61 0x30 0xd3 0x61 0x61 0x5da1cd60: 0xfd 0xb8 0x2e 0x5f 0xb4 0xf8 0xdf 0x5d 0x5da1cd68: 0xa9 0xf8 0x5c 0x60 0x7e 0xf8 0x5c 0x60 0x5da1cd70: 0x30 0x00 0x00 0x80 0x00 0x00 0xb0 0x5d 0x5da1cd78: 0x04 0x12 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cd80: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cd88: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cd90: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cd98: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cda0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cda8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cdb0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cdb8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cdc0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cdc8: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x5da1cdd0: 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00
Flags: needinfo?(esawin)
Comment 19•7 years ago
|
||
So the interesting data in there is: lr: 0x605cf8a9 pc: 0x605cf87e Assuming /proc/pid/maps isn't changed, that means the crash is happening from somewhere within /data/data/org.mozilla.fennec_esawin/cache/libxul.so, which starts at 0x5fb0e000. That makes pc at 0xac187e and lr at 0xac18a9 within it. You'd need to run addr2line to know what those are. But that's assuming /proc/pid/maps is unchanged.
Updated•7 years ago
|
Flags: needinfo?(esawin)
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(esawin)
Assignee | ||
Comment 20•7 years ago
|
||
Since troubleshooting on IRC didn't get us closer to a proper solution of this issue, I would like to isolate the broadcast receiver into its own process for now to prevent any collisions with Fennec's linker. With this patch we run the receiver in a separate process (":extract") private to Fennec which avoids any linker issues.
Attachment #8808808 -
Flags: review?(snorp)
Comment on attachment 8808808 [details] [diff] [review] 0001-Bug-1309708-2.0-Run-the-library-extraction-in-a-dedi.patch Review of attachment 8808808 [details] [diff] [review]: ----------------------------------------------------------------- I was initially ok with this as a band-aid, but it introduces a race if we are starting fennec and unpacking after an update at the same time. I think I prefer the original patch on this bug, but we should still try to figure out the real problem.
Attachment #8808808 -
Flags: review?(snorp) → review-
Assignee | ||
Comment 22•6 years ago
|
||
When opening a library via CustomElf::Load, we report the mapping for the crash reporter. We don't remove the mapping when closing the library handle, which results in stale (and duplicate) reported mappings (when re-opening).
Attachment #8800425 -
Attachment is obsolete: true
Attachment #8808808 -
Attachment is obsolete: true
Attachment #8800425 -
Flags: review?(mh+mozilla)
Attachment #8809229 -
Flags: review?(mh+mozilla)
Comment 23•6 years ago
|
||
Comment on attachment 8809229 [details] [diff] [review] 0001-Bug-1309708-1.1-Delete-reported-library-mapping-on-c.patch Review of attachment 8809229 [details] [diff] [review]: ----------------------------------------------------------------- This makes so much more sense now. Thanks for persevering and finding the root cause. Yet again, bug 689178 is greatly needed. ::: mozglue/android/APKOpen.cpp @@ +206,5 @@ > + struct mapping_info *info = &lib_mapping[pos]; > + if (!strcmp(info->name, name)) { > + struct mapping_info *last = &lib_mapping[--mapping_count]; > + free(info->name); > + *info = *last; You shouldn't decrease mapping_count before this.
Attachment #8809229 -
Flags: review?(mh+mozilla) → review+
Assignee | ||
Comment 24•6 years ago
|
||
Addressed comment.
Attachment #8809229 -
Attachment is obsolete: true
Attachment #8809390 -
Flags: review+
Comment 25•6 years ago
|
||
Pushed by esawin@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/459230d2ab75 [1.2] Delete reported library mapping on close. r=glandium
Comment 26•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/459230d2ab75
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 52
Assignee | ||
Comment 27•6 years ago
|
||
Comment on attachment 8809390 [details] [diff] [review] 0001-Bug-1309708-1.2-Delete-reported-library-mapping-on-c.patch Approval Request Comment [Feature/regressing bug #]: Old issue, exposed by bug 1298090. [User impact if declined]: Potential crash on startup with broken/empty crash report. [Describe test coverage new/current, TreeHerder]: Nightly, Aurora. [Risks and why]: Low, affects only crash reporter integration. [String/UUID change made/needed]: None.
Attachment #8809390 -
Flags: approval-mozilla-beta?
Updated•6 years ago
|
status-firefox51:
--- → affected
Comment 28•6 years ago
|
||
Comment on attachment 8809390 [details] [diff] [review] 0001-Bug-1309708-1.2-Delete-reported-library-mapping-on-c.patch Fix an issue related to crash report handling. Beta51+. Should be in 51 beta 4.
Attachment #8809390 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 29•6 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/99d833f2f864
Updated•2 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•