Closed
Bug 1451891
Opened 6 years ago
Closed 6 years ago
More crashes in __fixunsdfsi
Categories
(Firefox for Android Graveyard :: General, defect, P1)
Tracking
(firefox-esr52 unaffected, firefox-esr60 unaffected, firefox59 unaffected, firefox60 unaffected, firefox61+ fixed, firefox62 verified, firefox63 verified)
VERIFIED
FIXED
Firefox 63
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox59 | --- | unaffected |
firefox60 | --- | unaffected |
firefox61 | + | fixed |
firefox62 | --- | verified |
firefox63 | --- | verified |
People
(Reporter: mccr8, Assigned: jchen)
References
Details
(Keywords: crash, regression, Whiteboard: [geckoview:p2])
Crash Data
Attachments
(1 file)
59 bytes,
text/x-review-board-request
|
glandium
:
review+
pascalc
:
approval-mozilla-beta+
|
Details |
I'm still seeing crashes with this signature from bug 1447607 in the April 4 Android Nightlies. It is still the top crash, though the volume is greatly reduced. It is "only" 27% of all Android crashes. Here's an example: bp-978a316e-ec20-4fc6-9ef2-ee23f0180405 I don't know if this bug should be reopened or a new bug filed, so I'll just file a new bug and it can be duped over as appropriate.
Reporter | ||
Comment 1•6 years ago
|
||
This is still a top crash, so I'm nominating it for tracking.
tracking-firefox61:
--- → ?
Updated•6 years ago
|
status-firefox59:
--- → unaffected
status-firefox60:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Reporter | ||
Comment 2•6 years ago
|
||
David, is there somebody who can investigate this while snorp is on PTO? Thanks.
Flags: needinfo?(dbolter)
Comment 3•6 years ago
|
||
This could be related to ElfLoader not using thread-local-storage to store its errors, as raised by :snorp at https://bugzilla.mozilla.org/show_bug.cgi?id=1447607#c37. Unfortunately, the lower level dlerror()'s returned strings aren't just C string literals that permanently exist in the process's address space, but stuff sprintf'ed into a per-thread buffer. If the thread goes away, the buffer is no longer legal memory to read. Of course, timing-wise, it does seem somewhat unlikely that this is directly caused by a race. More likely is that the TLS issue lets mismatched "cause error that sets the error; call dlerror()" pairs become a problem. Looking at the wrapped dlsym, it looks like there's a hazard still, namely at https://searchfox.org/mozilla-central/rev/2ce99e8054b0ff6ed1adf484aeaacacf2fea084c/mozglue/linker/ElfLoader.cpp#86 this block: if (handle != RTLD_DEFAULT && handle != RTLD_NEXT) { LibHandle *h = reinterpret_cast<LibHandle *>(handle); return h->GetSymbolPtr(symbol); } If GetSymbolPtr could possibly return null, then we can expect someone to call dlerror() after calling this and get whatever lastError is, but nowhere in the branch of this wrapped-dlsym did we set lastError. And lastError gets set all over the place in cases that don't seem to guarantee that the caller will invoke dlerror(). Making lastError something that's stored in TLS is probably the simplest option. A more hacky option is to maintain a singleton char buffer on the ElfLoader itself and strncpy whatever dlerror() returns into that at the time, since that ensures per-thread pointers never leave the thread. Further auditing of ElfLoader also sounds good.
Updated•6 years ago
|
Flags: needinfo?(dbolter) → needinfo?(snorp)
I don't see any crashes with a build ID later than the 4th, so maybe we're ok now?
Flags: needinfo?(snorp)
Reporter | ||
Updated•6 years ago
|
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Updated•6 years ago
|
Comment 5•6 years ago
|
||
This appears to be back again in 62 nightly, at least I spotted one signature in crash stats - https://bit.ly/2JGevrN.
Reporter | ||
Comment 6•6 years ago
|
||
A dozen of these in the June 27th build, from a number of different install times, making it the #2 Android, after a crash that was all from a single install time.
Comment 7•6 years ago
|
||
__fixunsdfsi is the top crash in Fennec Nightly 63.0a1.
status-firefox63:
--- → affected
status-firefox-esr60:
--- → unaffected
Comment 8•6 years ago
|
||
I found this crash today testing 62 Beta 7 following the steps: Case A: 1." Always restore" off; 2. From Recently closed I restored all tabs; 3. Close the browser and opened again. Case B: 1. Switching between languages. 2. Go to Mozilla -> About Firefox beta; 3. Close the browser and re-open. Note: from 16 languages tested, I had 8 crashes.
Comment 9•6 years ago
|
||
Crash also reproduced with the following steps and setup: Sony Xperia Z3 (Android 5.1.1) Firefox Nightly 63.0a1 (2018-07-10) 1. Select Tools > Downloads 2. Close Firefox using its Quit option or by Force Stopping it (Android Settings > Apps > Firefox > Force Stop). 3. Reopen Firefox, downloads tab restored
Updated•6 years ago
|
Whiteboard: [geckoview]
Comment 10•6 years ago
|
||
Marcia, James says __fixunsdfsi is a generic function and Socorro should skip __fixunsdfsi stack frames to find a useful function name for the crash signature. Who should I ask about add __fixunsdfsi to Socorro's skip list?
Flags: needinfo?(mozillamarcia.knous)
Whiteboard: [geckoview] → [geckoview:p2]
Reporter | ||
Comment 11•6 years ago
|
||
(In reply to Chris Peterson [:cpeterson] from comment #10) > Marcia, James says __fixunsdfsi is a generic function and Socorro should > skip __fixunsdfsi stack frames to find a useful function name for the crash > signature. Who should I ask about add __fixunsdfsi to Socorro's skip list? You can file a bug in the Socorro::Signature component. It also only takes a few minutes to do the pull request yourself. Bug 1471379 is an example.
Comment 12•6 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #11) > You can file a bug in the Socorro::Signature component. It also only takes a > few minutes to do the pull request yourself. Thanks! I filed bug 1475482.
Flags: needinfo?(mozillamarcia.knous)
Comment 13•6 years ago
|
||
We still have a lot of crashes in pr_FindSymbolInLib on beta 62. Is that also something we should add to the skip list?
Flags: needinfo?(nchen)
Comment hidden (obsolete) |
Assignee | ||
Comment 15•6 years ago
|
||
It looks like `DLLErrorInternal` and `pr_FindSymbolInLib` are actually involved in this crash so I guess they shouldn't go on the skip list. I'm looking at writing a speculative fix.
Assignee: nobody → nchen
Flags: needinfo?(nchen)
Comment hidden (mozreview-request) |
Assignee | ||
Comment 17•6 years ago
|
||
Comment on attachment 8994681 [details] Bug 1451891 - Fix race conditions in __wrap_dlerror; Actually this is a race condition and there is a better, proper fix.
Attachment #8994681 -
Flags: review?(jorendorff)
Comment hidden (mozreview-request) |
Comment 19•6 years ago
|
||
mozreview-review |
Comment on attachment 8994681 [details] Bug 1451891 - Fix race conditions in __wrap_dlerror; https://reviewboard.mozilla.org/r/259198/#review266506 Also, why does having a null or incorrect dlerror message cause a crash? DLLInternalError appears handle the null error case: https://searchfox.org/mozilla-central/rev/bdfd20ef30d521b57d5b6feeda71325e8b4cad66/nsprpub/pr/src/linking/prlink.c#118,125,135 ::: mozglue/linker/ElfLoader.cpp:99 (Diff revision 2) > - void* sym = dlsym(handle, symbol); > - ElfLoader::Singleton.lastError = dlerror(); > + ElfLoader::Singleton.lastError = nullptr; // Use system dlerror. > + return dlsym(handle, symbol); Could there still be a race where some other thread sets lastError to a custom/non-system error pointer in between setting `lastError = nullptr` here and __wrap_dlerror fetching lastError.exchange(nullptr)?
Assignee | ||
Comment 20•6 years ago
|
||
Null is fine, but a dlerror message from another thread is potentially bad, because the thread the message came from could have exited. In that case the buffer that the dlerror message points to would have been freed, and reading the dlerror message would result in a crash.
Comment 21•6 years ago
|
||
mozreview-review |
Comment on attachment 8994681 [details] Bug 1451891 - Fix race conditions in __wrap_dlerror; https://reviewboard.mozilla.org/r/259198/#review266576
Attachment #8994681 -
Flags: review?(mh+mozilla) → review+
Comment 22•6 years ago
|
||
Pushed by nchen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c95b51e4a996 Fix race conditions in __wrap_dlerror; r=glandium
Comment 23•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c95b51e4a996
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 63
Assignee | ||
Comment 25•6 years ago
|
||
Comment on attachment 8994681 [details] Bug 1451891 - Fix race conditions in __wrap_dlerror; Approval Request Comment [Feature/Bug causing the regression]: N/A [User impact if declined]: Random crashes [Is this code covered by automated tests?]: No [Has the fix been verified in Nightly?]: Yes [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Not really [Why is the change risky/not risky?]: Patch makes the affected code more thread-safe but should have no effect on functionality [String changes made/needed]: None
Flags: needinfo?(nchen)
Attachment #8994681 -
Flags: approval-mozilla-beta?
Comment 26•6 years ago
|
||
Comment on attachment 8994681 [details] Bug 1451891 - Fix race conditions in __wrap_dlerror; Crashes stopped on nightly since the landing of the patch 4 days ago and no regression reported, approving for the next Beta.
Attachment #8994681 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 27•6 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/c43d2dbaf59e
Comment 28•6 years ago
|
||
Device: - Samsung Galaxy Note 8 (Android 8.0) Verified this following the steps provided in comment 8 and comment 9 and FF did not crash.
Status: RESOLVED → VERIFIED
Hardware: Unspecified → ARM
Updated•3 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•