Closed
Bug 884300
Opened 11 years ago
Closed 11 years ago
CFI records broken on all non-Windows platforms
Categories
(Toolkit :: Crash Reporting, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla24
People
(Reporter: benjamin, Assigned: ted)
References
Details
Attachments
(1 file)
55.30 KB,
patch
|
glandium
:
review+
lsblakk
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Stackwalking is very broken on x86-android. Symbols are present, but it looks like we're only stack-scanning. I haven't had time to investigate yet, but I'd love help.
In the meantime treat stacks on android-x86 as vague guides for issues.
Does anyone whom regularly looks at breakpad stuff have an android-x86 device?
See e.g. bp-6f72b2a0-ac35-440d-aafb-2a6b42130617 from bug 884200
Comment 1•11 years ago
|
||
Do we actually have the CFI info on the symbol server?
Assignee | ||
Comment 2•11 years ago
|
||
[tmielczarek@symbols1.dmz.phx1 symbols_mob]$ grep nsXPCWrappedJS::CallMethod libxul.so/200331396232091C8697FA6B30D2083A0/libxul.so.sym
FUNC c24b3e 125 0 nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*)
[tmielczarek@symbols1.dmz.phx1 symbols_mob]$ grep "STACK CFI INIT c24b3e" libxul.so/200331396232091C8697FA6B30D2083A0/libxul.so.sym
STACK CFI INIT c24b3e 125 .cfa: 0x3bd5e430 4 + .ra: 0x15af3200 -4 + ^
The CFI records look pretty weird. There are...constants in there for some reason. The symbol files for this libxul are:
http://symbols.mozilla.org/fennec/libxul.so/200331396232091C8697FA6B30D2083A0/libxul.so.sym
http://symbols.mozilla.org/fennec/libxul.so/200331396232091C8697FA6B30D2083A0/libxul.so.dbg.gz
...although, looking at an ARM symbol file, I see the same weirdness:
[tmielczarek@symbols1.dmz.phx1 symbols_mob]$ grep nsXPCWrappedJS::CallMethod libxul.so/350963275103BEEF7243723E5CF7551A0/libxul.so.sym
FUNC 7a9160 a0 0 nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*)
[tmielczarek@symbols1.dmz.phx1 symbols_mob]$ grep "STACK CFI INIT 7a9160" libxul.so/350963275103BEEF7243723E5CF7551A0/libxul.so.sym
STACK CFI INIT 7a9160 a0 .cfa: 0x10fadbc0 0 + .ra: 0x10fae0d8 0 +
Comment 3•11 years ago
|
||
I know that if there's something to do on the product side, this has no real chance for 22, but I think we should track nevertheless in case all we need there is server-side anyhow.
Bascially, many crash reports on Android x86 are not very meaningful as long as this bug exists (some reports might at least tell us something, and the Java-exception-type crash reports should still be OK).
tracking-firefox22:
--- → ?
tracking-firefox23:
--- → ?
Assignee | ||
Comment 4•11 years ago
|
||
This is fallout from the prerequisite patches from bug 779291. It wasn't actually caught by the breakpad unit tests either. :-(
Assignee | ||
Updated•11 years ago
|
OS: Android → Linux
Summary: Stackwalking broken on x86-android → CFI records broken on Linux platforms
Assignee | ||
Comment 5•11 years ago
|
||
I added a new check in the Breakpad unit tests to catch this, and verified that it fails without the changes in module.cc, but passes with those changes.
Attachment #764185 -
Flags: review?(mh+mozilla)
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → ted
Comment 6•11 years ago
|
||
Comment on attachment 764185 [details] [diff] [review]
fix DWARF CFI record serialization
Review of attachment 764185 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/crashreporter/breakpad-patches/03-unique-string.patch
@@ +1681,5 @@
> + }
> +
> + TEST(Write, OmitUnusedFiles) {
> + Module m(MODULE_NAME, MODULE_OS, MODULE_ARCH, MODULE_ID);
> +@@ -229,21 +239,23 @@
That entire block moving in the patch looks weird. Was it outdated?
Attachment #764185 -
Flags: review?(mh+mozilla) → review+
Assignee | ||
Comment 7•11 years ago
|
||
I think something might have confused diff. The actual patch looks fine.
Assignee | ||
Comment 8•11 years ago
|
||
Comment 9•11 years ago
|
||
We'll be landing this to all branches including mozilla-release, once ready. If we don't take this fix, it will hinder our ability to diagnose and resolve Android x86 crashes.
Assignee | ||
Comment 10•11 years ago
|
||
Alex: I tried to clarify in the summary, but I guess I didn't make it clear enough. this breaks stackwalking on all of our non-Windows platforms: Mac, Linux, Android. Some may be broken worse than others, so it may not have been as noticeable.
Assignee | ||
Updated•11 years ago
|
Summary: CFI records broken on Linux platforms → CFI records broken on all non-Windows platforms
Comment 11•11 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> Alex: I tried to clarify in the summary, but I guess I didn't make it clear
> enough. this breaks stackwalking on all of our non-Windows platforms: Mac,
> Linux, Android. Some may be broken worse than others, so it may not have
> been as noticeable.
If this was significantly impacting our crash investigations, we'd have noticed over the past couple of cycles though right? It's not that signature reports are being dropped or something critical in that way?
As long as we don't think this is hindering our investigations, we only want to respin Mobile to be super cautious around Android x86.
Assignee | ||
Comment 12•11 years ago
|
||
Note that the patch here is very tiny and only impacts symbol dumping, not anything else. (The fix itself was two lines, the rest is unit tests + updating a copy of the local patch against upstream.)
Comment 13•11 years ago
|
||
(In reply to Alex Keybl [:akeybl] from comment #11)
> If this was significantly impacting our crash investigations, we'd have
> noticed over the past couple of cycles though right?
Not necessarily. We sometimes have bogus stacks due to corruption, even if breakpad works fine, and our crash investigations are very much centered around Windows as that's where most of our crashes come from, and on Android a lot of it is Java exceptions which are not impacted.
Do we know if all stacks look corrupt because of this, and if all frames are bogus? Might the signatures or even some stacks be still good on some platforms?
Reporter | ||
Comment 14•11 years ago
|
||
The top frame should always be correct. This will only affect signatures when the top frame hits the skiplist. If that happens, the signatures could be really weird or totally bogus.
Assignee | ||
Updated•11 years ago
|
Keywords: helpwanted
Assignee | ||
Comment 15•11 years ago
|
||
Note that we shipped this broken in Firefox 21, if that changes your desire to take the fix on release. This is a very safe patch that I'd be personally fine with landing anywhere, but since we shipped it broken in 21 and didn't really notice, it's hard to argue that we desperately need it for 22.
status-firefox21:
--- → affected
status-firefox22:
--- → affected
status-firefox23:
--- → affected
status-firefox24:
--- → affected
Comment 16•11 years ago
|
||
Let's punt to FF23 given comment 14 and comment 15. I'd rather we not take any non-critical change, and this doesn't feel critical at this point.
Comment 17•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla24
Updated•11 years ago
|
Assignee | ||
Comment 18•11 years ago
|
||
I was going to check the symbols on the symbol server, but this didn't land in time for today's nightly builds. I settled for grabbing the symbol package from a m-c build:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1371648262/firefox-24.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
Spot checking that looks good:
STACK CFI INIT 34cc 9a .cfa: $rsp 8 + .ra: .cfa -8 + ^
Assignee | ||
Comment 19•11 years ago
|
||
Comment on attachment 764185 [details] [diff] [review]
fix DWARF CFI record serialization
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Originally landed as part of bug 779291 (landed prior to the final patches there, so it made Firefox 21)
User impact if declined: Crash reports on non-Windows platforms can have bad stacks.
Testing completed (on m-c, etc.): Landed on m-c, has unit tests in the Breakpad tree (we don't run these on Firefox builds).
Risk to taking this patch (and alternatives if risky): Very little risk, the fix is a two-line change that only impacts serializing symbols to disk, so only the symbol dumping process.
String or IDL/UUID changes made by this patch: None
Attachment #764185 -
Flags: approval-mozilla-aurora?
Updated•11 years ago
|
Attachment #764185 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Assignee | ||
Comment 20•11 years ago
|
||
You need to log in
before you can comment on or make changes to this bug.
Description
•