Closed
Bug 1460989
Opened 4 years ago
Closed 4 years ago
Galaxy S8 SEGV_ACCERR crash in v60+
Categories
(Firefox for Android Graveyard :: General, defect, P1)
Tracking
(firefox60+ wontfix, firefox61+ fixed, firefox62+ fixed)
RESOLVED
FIXED
Firefox 62
People
(Reporter: kbrosnan, Assigned: jchen)
Details
Attachments
(1 file)
59 bytes,
text/x-review-board-request
|
glandium
:
review+
RyanVM
:
approval-mozilla-release+
|
Details |
[Tracking Requested - why for this release]: This crash continues to show up in 60+. 05-11 16:05:47.874 +0000 15126 15126 F DEBUG : Build fingerprint: 'samsung/dreamqlteue/dreamqlteue:8.0.0/R16NW/G950U1UEU2CRB9:user/release-keys' 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : Revision: '12' 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : ABI: 'arm' 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : pid: 15101, tid: 15113, name: Binder:15101_2 >>> org.mozilla.firefox.PasswordsProvider <<< 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xf4669124 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : r0 f4669114 r1 00001000 r2 c0000000 r3 f466a000 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : r4 d309f91c r5 d2d0707c r6 d2d08040 r7 d2d0805c 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : r8 00000000 r9 f22bd0b4 sl d46a261c fp 00000000 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : ip f085a634 sp d46a25e8 lr f081c515 pc d2f9c2ba cpsr 000f0030 05-11 16:05:47.876 +0000 15126 15126 F DEBUG : 05-11 16:05:47.876 +0000 15126 15126 F DEBUG : backtrace: 05-11 16:05:47.876 +0000 15126 15126 F DEBUG : #00 pc 000142ba /data/app/org.mozilla.firefox-rV4vVmDoVTl9JHHpgaa4dw==/lib/arm/libmozglue.so (offset 0x3c000) 05-11 16:05:47.876 +0000 15126 15126 F DEBUG : #01 pc 00052511 /system/lib/libc.so (fclose+160) 05-11 16:05:47.876 +0000 15126 15126 F DEBUG : #02 pc 00006fff [anon:jemalloc:d2d00000] Captured a couple logcats https://gist.github.com/kbrosnan/1c26e5ca2e5f58cb2fa7484497aa8a18 and https://gist.github.com/kbrosnan/888a15dfcb6be383fc4348ec73c43cb5
Updated•4 years ago
|
Reporter | ||
Comment 1•4 years ago
|
||
https://gist.github.com/kbrosnan/fae70da91cdac9096badd26281f019ab
There are a couple versions of this I see in the Google Play console. This first one has 12k reports in the last week and the stack seems to be consistent among the reports I looked at: signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), __dl__ZN6soinfo19increment_ref_countEv #00 pc 0000000000013cd6 /system/bin/linker (__dl__ZN6soinfo19increment_ref_countEv+9) #01 pc 0000000000005ef3 /system/bin/linker (__dl__ZL12find_libraryP19android_namespace_tPKciPK17android_dlextinfoP6soinfo+54) #02 pc 0000000000005c6f /system/bin/linker (__dl__Z9do_dlopenPKciPK17android_dlextinfoPKv+1110) #03 pc 000000000000343b /system/bin/linker (__dl__ZL10dlopen_extPKciPK17android_dlextinfoPKv+38) #04 pc 000000000000103d /system/lib/libdl.so (dlopen+4) #05 pc 000000000000eddf /system/lib/libEGL.so (_ZN7androidL12load_wrapperEPKc+46) #06 pc 000000000000e437 /system/lib/libEGL.so (_ZN7android6Loader4openEPNS_16egl_connection_tE+286) #07 pc 0000000000009c0b /system/lib/libEGL.so (_ZN7android16egl_init_driversEv+54) #08 pc 0000000000009de7 /system/lib/libEGL.so (eglGetDisplay+66) #09 pc 00000000000a6acb /system/lib/libandroid_runtime.so (_ZL24android_eglGetDisplayIntP7_JNIEnvP8_jobjecti+26) #10 pc 0000000000c148cb /system/framework/arm/boot-framework.oat (android.hardware.usb.UsbDevice.native_get_device_name [DEDUPED]+98) #11 pc 0000000000e61fb1 /system/framework/arm/boot-framework.oat (android.os.-$Lambda$6x30vPJhBKUfNY8tswxuZo3DCe0.run+56) #12 pc 00000000001ee141 /system/framework/arm/boot.oat (java.lang.Thread.run+64) #13 pc 00000000003dcee1 /system/lib/libart.so (art_quick_invoke_stub_internal+64) #14 pc 00000000003e1455 /system/lib/libart.so (art_quick_invoke_stub+228) #15 pc 00000000000ac605 /system/lib/libart.so (_ZN3art9ArtMethod6InvokeEPNS_6ThreadEPjjPNS_6JValueEPKc+140) #16 pc 000000000033246d /system/lib/libart.so (_ZN3artL18InvokeWithArgArrayERKNS_33ScopedObjectAccessAlreadyRunnableEPNS_9ArtMethodEPNS_8ArgArrayEPNS_6JValueEPKc+52) #17 pc 00000000003332e1 /system/lib/libart.so (_ZN3art35InvokeVirtualOrInterfaceWithJValuesERKNS_33ScopedObjectAccessAlreadyRunnableEP8_jobjectP10_jmethodIDP6jvalue+320) #18 pc 00000000003508f9 /system/lib/libart.so (_ZN3art6Thread14CreateCallbackEPv+892) #19 pc 000000000004881f /system/lib/libc.so (_ZL15__pthread_startPv+22) #20 pc 000000000001b2a1 /system/lib/libc.so (__start_thread+32) There are several variations of this second one which has about 1200 reports in the last week: signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), libmozglue.so (Galaxy S8 and S9) #00 pc 00000000000142ba /data/app/org.mozilla.firefox-RbXXu2v-RzvP6BI9F1vlIA==/lib/arm/libmozglue.so #01 pc 000000000001499b /data/app/org.mozilla.firefox-RbXXu2v-RzvP6BI9F1vlIA==/lib/arm/libmozglue.so signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), libmozglue.so (Galaxy S6) #00 pc 000000000001299a /data/app/org.mozilla.firefox-1/lib/arm/libmozglue.so #01 pc 000000000001298f /data/app/org.mozilla.firefox-1/lib/arm/libmozglue.so signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), libmozglue.so (Galaxy J7) #00 pc 0000000000012ab2 /data/app/org.mozilla.firefox-1/lib/arm/libmozglue.so #01 pc 0000000000012aa7 /data/app/org.mozilla.firefox-1/lib/arm/libmozglue.so signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), libmozglue.so (Galaxy S8) #00 pc 000000000000fdec /data/app/org.mozilla.firefox-1/lib/arm/libmozglue.so #01 pc 000000000000fd9f /data/app/org.mozilla.firefox-1/lib/arm/libmozglue.so
The attached logcat shows that we are launching Gecko inside GeckoService, which is probably a good clue. We start up Gecko there for a few reasons[0], but I think the most likely is that we're warming up in GeckoCustomTabsService[1]. Maybe that's racing when the Activity startup? [0] https://dxr.mozilla.org/mozilla-release/source/mobile/android/base/java/org/mozilla/gecko/GeckoService.java?q=GeckoService.java&redirect_type=direct#197 [1] https://dxr.mozilla.org/mozilla-release/source/mobile/android/base/java/org/mozilla/gecko/customtabs/GeckoCustomTabsService.java#51
Reporter | ||
Comment 4•4 years ago
|
||
This crash happened in the background while using another app https://gist.github.com/kbrosnan/8c9cfc656a7267c45666ae4c70911f03
(In reply to Kevin Brosnan [:kbrosnan] from comment #4) > This crash happened in the background while using another app > https://gist.github.com/kbrosnan/8c9cfc656a7267c45666ae4c70911f03 snorp: you can see the Android Application instance doesn't exist so onCreate is getting called: https://searchfox.org/mozilla-central/source/mobile/android/base/java/org/mozilla/gecko/GeckoApplication.java#265 and that's the log message 05-16 14:42:35.050 16503 16503 I GeckoApplication: zerdatime 452443751 - application start This is a separate process, mind you, since what's about to happen is that Sync is going to ask for passwords: 05-16 14:42:35.144 16503 16515 E GeckoLibLoad: Load sqlite start 05-16 14:42:35.167 16503 16515 F libc : Fatal signal 11 (SIGSEGV), code 2, fault addr 0xf4669124 in tid 16515 (Binder:16503_2) For whatever reason, we can't unpack and load the libraries in the PasswordProvider process on these devices. Maybe it's when Gecko is running already; maybe it's when Gecko is not yet running. This is all very racy; it's hard to remember that we can have _two copies of the App Java code_ in play.
Comment 6•4 years ago
|
||
Good discussion and some hints but still mysterious - maybe Glandium has thoughts?
Flags: needinfo?(mh+mozilla)
Priority: -- → P1
Comment 7•4 years ago
|
||
what does addr2line say the corresponding symbols are in libmozglue.so?
Flags: needinfo?(mh+mozilla)
Updated•4 years ago
|
Flags: needinfo?(kbrosnan)
Comment 9•4 years ago
|
||
Do you have an answer to comment 7, or at least the exact versions of fennec for which we have crash reports/logs?
Reporter | ||
Comment 10•4 years ago
|
||
I don't know what addr2line is. There is not a regression range as this bug is the result of Samsung pushing the Android 8.0 release and finding one with ~1/day reproducibility would be quite difficult. See previous issues bug 1455662 and bug 1450793. Email context from around April 2nd. Liz Henry: From the Google Play interface, our crash rate seems to have gone rocketing up recently with release 59. Most of these crashes are with Android 8.0 on Galaxy 8 devices. I'm wondering if this is a 8.0 specific bug, and if we have contacts at Samsung to talk to. We also might want to start testing on 8.0 and up, if we aren't already. It doesn't look like this crash spike is reflected in crash-stats reports. Any ideas for next steps? Me: It looks like the problem is mostly with the Galaxy S8. About 2 weeks ago Samsung and carriers started pushing out Android 8.0 (Oreo API 26 & 27) to the Samsung Galaxy S8 [1]. There would not be a way to catch this in Firefox beta as those phones were running 7.0 or 7.1 at the time. Getting our hands on an upgraded North American S8 would be useful. [1] https://news.google.com/news/search/section/q/galaxy%20s8%20oreo/galaxy%20s8%20oreo?hl=en&gl=US&ned=us
Reporter | ||
Comment 11•4 years ago
|
||
The logs I have attached are from 60.0 or 60.0.1 ARMv7 API 16 build. The timestamps from the logs should be accurate.
Comment 12•4 years ago
|
||
(In reply to Kevin Brosnan [:kbrosnan] from comment #11) > The logs I have attached are from 60.0 or 60.0.1 ARMv7 API 16 build. The > timestamps from the logs should be accurate. Do you know which is which?
Reporter | ||
Comment 13•4 years ago
|
||
From looking at archive.mozilla.org they all predate the 60.0.1 release. Builds did not exist until 16-May-2018 00:56 and were pushed to the release folder around 16-May-2018 17:56.
Comment 14•4 years ago
|
||
So, the crash is at: 05-11 16:05:47.876 +0000 15126 15126 F DEBUG : #00 pc 000142ba /data/app/org.mozilla.firefox-rV4vVmDoVTl9JHHpgaa4dw==/lib/arm/libmozglue.so (offset 0x3c000) which would correspond to this: 502ae: b570 push {r4, r5, r6, lr} 502b0: b086 sub sp, #24 502b2: 4604 mov r4, r0 502b4: 460d mov r5, r1 502b6: e9d4 0100 ldrd r0, r1, [r4] -->502ba: 1a08 subs r0, r1, r0 502bc: 2101 movs r1, #1 502be: eb01 01a0 add.w r1, r1, r0, asr #2 502c2: 4620 mov r0, r4 And we have the following register values and crash address: 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0xf4669124 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : r0 f4669114 r1 00001000 r2 c0000000 r3 f466a000 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : r4 d309f91c r5 d2d0707c r6 d2d08040 r7 d2d0805c 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : r8 00000000 r9 f22bd0b4 sl d46a261c fp 00000000 05-11 16:05:47.875 +0000 15126 15126 F DEBUG : ip f085a634 sp d46a25e8 lr f081c515 pc d2f9c2ba cpsr 000f0030 ... which doesn't seem to be plausible. Even assuming the actual crash is on the deref of r4 on the previous instruction, then the fault address should be at 0xd309f91c (or close), not 0xf4669124. If the crash is really happening at this location (0x502ba), that's in std::__ndk1::vector<const elf32_phdr *, std::__ndk1::allocator<const elf32_phdr *> >::__push_back_slow_path<const elf32_phdr *const &>, which disassembly says is only called from CustomElf::Load here: https://dxr.mozilla.org/mozilla-central/rev/51f2535c797495f1f4e864072c2449b7c28669de/mozglue/linker/CustomElf.cpp#141 This is not making any sense. I'm afraid the only way forward is to be able to reproduce it on a device and get more data. Who has access to a S8?
Reporter | ||
Comment 15•4 years ago
|
||
I do. On the mobile team Randal Barker and Susheel Daswani also have this device. Dave Miller (Justdave) has an S8 as a primary device and says that he encounters the crash up to a few times a day. Though he is a sysadmin.
Kevin - can you provide the Samsung model number for the S8 you're seeing the issue on?
Flags: needinfo?(kbrosnan)
Comment 17•4 years ago
|
||
My S8 was extremely unstable last month, but then it became good again. Over the past two days it has been crashing again: here are the last 2 crash reports: https://crash-stats.mozilla.com/report/index/22f3397d-4732-400f-b545-3be460180522 https://crash-stats.mozilla.com/report/index/b562e850-9a86-49a9-9bee-293ae0180522
Comment 18•4 years ago
|
||
If anyone wants my S8 feel free to ask, I'll ask for a S9 replacement of course :)
Reporter | ||
Comment 19•4 years ago
|
||
That info is in bug 1450793 comment 19. jya your issue is a different bug. This only happens with North American models of the S8. Your phone has completely different hardware.
Flags: needinfo?(kbrosnan)
Reporter | ||
Comment 20•4 years ago
|
||
60.0.1 crash https://gist.github.com/kbrosnan/060e3fd2bb0a985b8ae2bea7c97f2eba
Reporter | ||
Comment 21•4 years ago
|
||
Another 60.0.1 https://gist.github.com/kbrosnan/211eb76dce7cdaab6f08b300c3e32e0c
Comment 22•4 years ago
|
||
Are those logs from your device? If they are, in those logs, there are lines like "05-22 15:37:36.620 1074 1074 E /system/bin/tombstoned: Tombstone written to: /data/tombstones//tombstone_00" Could you attach those tombstore files?
Flags: needinfo?(kbrosnan)
Reporter | ||
Comment 23•4 years ago
|
||
Getting permission denied trying to access the files. Looks like it requires root to access the tombstone files. From a quick search there are some options for rooting the Galaxy S8 w/ Android 8.0.
Flags: needinfo?(kbrosnan)
Assignee | ||
Comment 24•4 years ago
|
||
The crash from comment 0 is happening at [1]. We think mprotect() is lying about whether it succeeded or not. [1] https://searchfox.org/mozilla-central/rev/dc6d85680539cb7e5fe2843d38a30a0581bfefe1/mozglue/linker/ElfLoader.cpp#1007
Comment 25•4 years ago
|
||
(In reply to Jim Chen [:jchen] [:darchons] from comment #24) > The crash from comment 0 is happening at [1]. We think mprotect() is lying > about whether it succeeded or not. > > [1] > https://searchfox.org/mozilla-central/rev/ > dc6d85680539cb7e5fe2843d38a30a0581bfefe1/mozglue/linker/ElfLoader.cpp#1007 That certainly makes more sense than comment 14, but how did you get that line number?
Updated•4 years ago
|
Assignee: nobody → nchen
Assignee | ||
Comment 26•4 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #25) > (In reply to Jim Chen [:jchen] [:darchons] from comment #24) > > The crash from comment 0 is happening at [1]. We think mprotect() is lying > > about whether it succeeded or not. > > > > [1] > > https://searchfox.org/mozilla-central/rev/ > > dc6d85680539cb7e5fe2843d38a30a0581bfefe1/mozglue/linker/ElfLoader.cpp#1007 > > That certainly makes more sense than comment 14, but how did you get that > line number? I added the address (0x142ba) and file offset (0x3c000) from the crash dump plus an elfhack offset (0x1000) to get the virtual offset (0x512ba), and looked that up in the breakpad symbols.
Comment hidden (mozreview-request) |
Comment 28•4 years ago
|
||
(In reply to Jim Chen [:jchen] [:darchons] from comment #26) > (In reply to Mike Hommey [:glandium] from comment #25) > > (In reply to Jim Chen [:jchen] [:darchons] from comment #24) > > > The crash from comment 0 is happening at [1]. We think mprotect() is lying > > > about whether it succeeded or not. > > > > > > [1] > > > https://searchfox.org/mozilla-central/rev/ > > > dc6d85680539cb7e5fe2843d38a30a0581bfefe1/mozglue/linker/ElfLoader.cpp#1007 > > > > That certainly makes more sense than comment 14, but how did you get that > > line number? > > I added the address (0x142ba) and file offset (0x3c000) from the crash dump > plus an elfhack offset (0x1000) to get the virtual offset (0x512ba), and > looked that up in the breakpad symbols. Aaaaaah right, 0x3c000 is the *file* offset.
Comment 29•4 years ago
|
||
mozreview-review |
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; https://reviewboard.mozilla.org/r/246834/#review253066 ::: mozglue/linker/ElfLoader.cpp:914 (Diff revision 1) > return; > } > > page = firstPage; > int ret = mprotect(page, length, prot | PROT_WRITE); > - success = ret == 0; > + int newProt = getProt(start, &end); Please skip this in the case where ret is not 0. Also consider how much of a wtf this is (mprotect pretends there was no error while not doind what it's asked for), please add an explicit comment. We don't want someone looking at the code later and think "this looks useless, let's remove it".
Attachment #8980661 -
Flags: review?(mh+mozilla)
Comment 30•4 years ago
|
||
mozreview-review |
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; https://reviewboard.mozilla.org/r/246834/#review253070 ::: mozglue/linker/ElfLoader.cpp:919 (Diff revision 1) > - success = ret == 0; > + int newProt = getProt(start, &end); > + success = (ret == 0) && (newProt != -1) && (newProt & PROT_WRITE); > if (!success) { > ERROR("mprotect(%p, %zu, %d) = %d (errno=%d; %s)", > page, length, prot | PROT_WRITE, ret, > errno, strerror(errno)); errno shouldn't be displayed when ret is 0. I'd be interested in the output here. I wonder if the root cause is because of prot containing PROT_EXEC, and some OS-level protection forbidding executable pages to be turned writable, although that shouldn't be silent.
Comment 31•4 years ago
|
||
While you're there, please change the %d for prot to %o. I guess snorp missed the subtle difference in bug 1455662 comment 6.
Comment hidden (mozreview-request) |
Comment 33•4 years ago
|
||
mozreview-review |
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; https://reviewboard.mozilla.org/r/246834/#review253746 ::: mozglue/linker/ElfLoader.cpp:927 (Diff revision 2) > + // success even after _failing_ to make the page writable. Therefore, check > + // for write access again instead of relying on the mprotect return value. > + int newProt = getProt(start, &end); > + success = (newProt != -1) && (newProt & PROT_WRITE); > if (!success) { > - ERROR("mprotect(%p, %zu, %d) = %d (errno=%d; %s)", > + WARN("mprotect(%p, %zu, %o) returned 0 but page is not writable: %o", It should be ERROR like the other, or the other WARN like this one.
Attachment #8980661 -
Flags: review?(mh+mozilla) → review+
Comment hidden (mozreview-request) |
Comment 35•4 years ago
|
||
Pushed by nchen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a4465713555b Check page protection flags again after mprotect(); r=glandium
Comment 36•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a4465713555b
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 62
Comment 37•4 years ago
|
||
Please nominate this for Beta uplift so we can hopefully include it in Monday's b11 build.
Flags: needinfo?(nchen)
Assignee | ||
Comment 38•4 years ago
|
||
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; Approval Request Comment [Feature/Bug causing the regression]: N/A [User impact if declined]: Crash on some devices like Galaxy S8 [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: No [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Not really [Why is the change risky/not risky?]: The patch adds an additional protection against the crash, but doesn't otherwise affect behavior. [String changes made/needed]: None
Flags: needinfo?(nchen)
Attachment #8980661 -
Flags: approval-mozilla-beta?
Comment 40•4 years ago
|
||
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; More mprotect sanity checking to hopefully work around Galaxy S8 crashes. Approved for 61.0b11.
Flags: needinfo?(nchen)
Attachment #8980661 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•4 years ago
|
Flags: needinfo?(nchen)
Comment 41•4 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/6fafce03df13
Assignee | ||
Comment 42•4 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #39) > Do we have the log of a failed madvise? I haven't seen any. I think snorp has gone through more logs.
Flags: needinfo?(nchen) → needinfo?(snorp)
Updated•4 years ago
|
We are seeing an even bigger crash spike on the Play Store, again not reflected in crash-stats, for S8 devices. I am not sure how to pin down the cause, but if we see a similar spike on beta in a day or two then this might be the cause. Jim can you see those crash results? I can email you the link if you have the permissions.
Flags: needinfo?(nchen)
(In reply to Jim Chen [:jchen] [:darchons] from comment #42) > (In reply to Mike Hommey [:glandium] from comment #39) > > Do we have the log of a failed madvise? > > I haven't seen any. I think snorp has gone through more logs. I also haven't seen any. Play Store doesn't give us logcat, and I don't think there are any failed mprotect() in kbrosnan's logs.
Flags: needinfo?(snorp)
Liz, can we get Jim access to the play store console? I think it may help if he can look through that directly.
Flags: needinfo?(lhenry)
Sylvestre, can you help with that since you have the permissions to add a new user?
Flags: needinfo?(lhenry) → needinfo?(sledru)
Comment 48•4 years ago
|
||
Sure, I sent an invite to nchen@mozilla.com FYI, I have a Samsung S8 and with this bug, it is an unusable application.
Flags: needinfo?(sledru)
Reporter | ||
Comment 49•4 years ago
|
||
I will bring this device to SF and maybe we can get this in front of a developer.
Comment 50•4 years ago
|
||
backout |
Backed out for increasing the crash rate as reported in the Play Console. https://hg.mozilla.org/mozilla-central/rev/f7fd9b08c0156be5b5cd99de5ed0ed0b98d93051
Status: RESOLVED → REOPENED
Flags: needinfo?(nchen)
Resolution: FIXED → ---
Target Milestone: Firefox 62 → ---
Comment 51•4 years ago
|
||
backout |
https://hg.mozilla.org/releases/mozilla-beta/rev/275a429f33cf
Updated•4 years ago
|
Attachment #8980661 -
Flags: approval-mozilla-beta+ → approval-mozilla-beta-
Comment 52•4 years ago
|
||
(In reply to Kevin Brosnan [:kbrosnan] from comment #49) > I will bring this device to SF and maybe we can get this in front of a > developer. Kevin: Can you update this bug if you are able to do this?
Flags: needinfo?(kbrosnan)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 55•4 years ago
|
||
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; Basically the crash is caused by our linker racing with the system linker in changing the page protection flags. This patch fixes that by holding the system linker lock while changing the debug map.
Flags: needinfo?(nchen)
Attachment #8980661 -
Flags: review+ → review?(mh+mozilla)
Updated•4 years ago
|
Attachment #8980661 -
Flags: approval-mozilla-beta-
Comment 56•4 years ago
|
||
mozreview-review |
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; https://reviewboard.mozilla.org/r/246834/#review257582 ::: commit-message-d36cd:10 (Diff revision 4) > +flags. To fix the race, we need to take the system linker's internal > +lock when we perform any kind of modification on the debug map. > + > +One way to hold the system linker lock is to call dl_iterate_phdr, and > +perform our actions inside the callback, which is invoked with the > +locked being held. However, dl_iterate_phdr is only present on Android s/locked/lock/ ::: mozglue/linker/ElfLoader.cpp:321 (Diff revision 4) > +/** > + * Return the current Android version, or 0 on failure. > + */ > +int > +GetAndroidSDKVersion() { > + static int version; maybe explicitly initialize to 0? (although that's what it is implicitly) ::: mozglue/linker/ElfLoader.cpp:632 (Diff revision 4) > { > Register(static_cast<LibHandle *>(handle)); > if (dbg) { > - dbg.Add(handle); > + // We could race with the system linker when modifying the debug map, so > + // only do so while holding the system linker's internal lock. > + RunWithSystemLinkerLock([this, handle] { dbg.Add(handle); }); Ideally, the logging should be in DebuggerHelper::Add, but there is only this caller, and lambda'ing the whole function would not be much better, so I guess it's okay like this. ::: mozglue/linker/ElfLoader.cpp:661 (Diff revision 4) > { > Forget(static_cast<LibHandle *>(handle)); > if (dbg) { > - dbg.Remove(handle); > + // We could race with the system linker when modifying the debug map, so > + // only do so while holding the system linker's internal lock. > + RunWithSystemLinkerLock([this, handle] { dbg.Remove(handle); }); Likewise
Attachment #8980661 -
Flags: review?(mh+mozilla) → review+
Comment 57•4 years ago
|
||
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/mozilla-central/rev/a7384267f89c Hold system linker lock while modifying debug map. r=glandium, a=RyanVM
Comment 58•4 years ago
|
||
I landed this with glandium's first two nits addressed. I've also triggered new Fennec nightlies so we can get crash data over the weekend. Jim, can you please request Beta approval on this when you get a chance? I'll wait to approve until after reviewing the Play Store crash data on Monday, but it would be great to get it into the b15 build that morning. Thanks!
Status: REOPENED → RESOLVED
Closed: 4 years ago → 4 years ago
Flags: needinfo?(nchen)
Resolution: --- → FIXED
Target Milestone: --- → Firefox 62
Assignee | ||
Comment 59•4 years ago
|
||
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; Approval Request Comment [Feature/Bug causing the regression]: N/A [User impact if declined]: Random crashes on Galaxy S8/S9 [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Tested locally [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Slightly [Why is the change risky/not risky?]: We are adding a new code path on every startup, but that is fairly trivial and should be very stable. [String changes made/needed]: None
Flags: needinfo?(nchen)
Attachment #8980661 -
Flags: approval-mozilla-beta?
Comment 60•4 years ago
|
||
Unfortunately, due to bug 1464034, we haven't been able to ship this fix to Nightly users yet :-\
Comment 61•4 years ago
|
||
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; I'm not going to be able to take this for today's b15 build due to the lack of crash data from over the weekend, so I'm moving the approval request to release to keep it on the radar for RC inclusion. We were able to manually upload a Nightly APK to the Play Store yesterday, so that at least gets us unblocked here.
Attachment #8980661 -
Flags: approval-mozilla-beta? → approval-mozilla-release?
Comment 62•4 years ago
|
||
Comment on attachment 8980661 [details] Bug 1460989 - Hold system linker lock while modifying debug map; The Google Play console is showing a big drop-off in crash rate on Nightly builds \m/. Approved for the Fennec 61 RC.
Attachment #8980661 -
Flags: approval-mozilla-release? → approval-mozilla-release+
Comment 63•4 years ago
|
||
bugherderuplift |
https://hg.mozilla.org/releases/mozilla-release/rev/c323b0b175c9
Updated•2 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•