Closed
Bug 1485759
Opened 7 years ago
Closed 7 years ago
No symbolication in Fennec Android Nightly since Aug 14
Categories
(Toolkit :: Crash Reporting, defect, P1)
Tracking
()
RESOLVED
FIXED
mozilla63
Release | Tracking | Status |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | unaffected |
firefox61 | --- | unaffected |
firefox62 | --- | unaffected |
firefox63 | blocking | fixed |
People
(Reporter: mccr8, Assigned: glandium)
References
Details
(Keywords: regression)
Attachments
(1 file: patch, 2.47 KB, froydnj: review+)
Crashes from the 8-22 Android Nightly have signatures like
OOM | large | libxul.so@0xffeda4 | libxul.so@0xffcfdf | libxul.so@0xffa1b9 | libxul.so@0x1f1247f | libxul.so@0x1f13a7f | libxul.so@0x1f13a37 | libxul.so@0x1f13a37 | libxul.so@0x102f6fb | libxul.so@0x1f13a37 | libxul.so@0x102f50f | libxul.so@0x1f13e1d
example: bp-283d1876-52e1-483d-9893-7632f0180823
This seems to affect every C++ frame in the signatures.
Comment 1•7 years ago (Reporter)
Linux looks okay for that Nightly.
Comment 2•7 years ago
Are these possibly some of the Geckoview crashes? cpeterson might know.
Flags: needinfo?(cpeterson)
Comment 3•7 years ago
Socorro has a processor rule that fixes the product for crashes incorrectly marked as Fennec that should be Focus. The example crash in the description isn't a content process crash. Unless I'm misunderstanding things, I'm pretty sure it's not Focus and probably not a GeckoView crash.
Comment 4•7 years ago
OK, based on comment 3, clearing the needinfo for Chris.
Flags: needinfo?(cpeterson)
Comment 5•7 years ago (Reporter)
This is continuing in the 8-23 build, it looks like: https://crash-stats.mozilla.com/search/?build_id=20180823100113&release_channel=nightly&product=FennecAndroid&platform=Android&date=%3E%3D2018-08-22T17%3A00%3A00.000Z&date=%3C2018-08-24T11%3A19%3A46.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Comment 6•7 years ago
These unsymbolicated crash reports (or at least bp-283d1876-52e1-483d-9893-7632f0180823 from comment 0) are not from GeckoView because:
1. Focus is testing GeckoView 62.0b Beta and the above crash report is Gecko version 63.0a1 Nightly.
2. The above crash report has the "gws-and-facebook-spoof%40mozilla.org:1.0.0,webcompat%40mozilla.org:2.0.1" extension installed, which is a Fennec 63.0a1 Nightly experiment.
@ James: do you have any theories why we are getting unsymbolicated crash reports from Fennec 63.0a1 Nightly starting around August 22 this week?
status-firefox63: --- → affected
Flags: needinfo?(snorp)
OS: Unspecified → Android
Summary: No symbolication for the 8-22 Android Nightly → No symbolication for the 8-22 Fennec Android Nightly
Comment 7•7 years ago (Reporter)
I went back through the older builds, and the first one that isn't symbolicated is 20180814100103. For example: bp-26813110-78e1-4c0f-9333-f7c210180822
The prior build, 20180813100105, looks okay to me. For example: bp-54f60dc1-2547-4d77-b294-2166f0180817
The range for commits added to the 20180814100103 build is:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=bf79440c1376b1e1114ba653917e1577d7b1007b&tochange=914b3b370ad059a04ad751642b74e013f8e3ad08
I see "Bug 1480006 - Enable LTO on Android CI builds." in that range. glandium, can you take a look please?
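Nightly build IDs like the ones above read as timestamps (YYYYMMDDHHMMSS), which makes it easy to order builds when narrowing a regression range. A small Python sketch of that comparison; `parse_build_id` is a hypothetical helper name, not part of any Mozilla tooling:

```python
from datetime import datetime


def parse_build_id(build_id: str) -> datetime:
    """Interpret a 14-digit build ID as a YYYYMMDDHHMMSS timestamp."""
    return datetime.strptime(build_id, "%Y%m%d%H%M%S")


last_good = parse_build_id("20180813100105")
first_bad = parse_build_id("20180814100103")

# The regression landed somewhere between these two builds.
print(f"regression window: {last_good} .. {first_bad}")
```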
Updated•7 years ago (Reporter)
tracking-firefox63: --- → ?
Summary: No symbolication for the 8-22 Fennec Android Nightly → No symbolication in Fennec Android Nightly since LTO was enabled
Updated•7 years ago (Reporter)
Summary: No symbolication in Fennec Android Nightly since LTO was enabled → No symbolication in Fennec Android Nightly since Aug 14
Comment 9•7 years ago (Assignee)
If I take the crashreporter symbols from the nightly build corresponding to the crash in comment 0, that is:
https://queue.taskcluster.net/v1/task/ODjSM4BUSmOVwnkRxjSE2g/runs/0/artifacts/public/build/target.crashreporter-symbols.zip
That archive contains the file libxul.so/4296B0626A41F5C300000000000000000/libxul.so.sym, which looks fine.
OTOH, the Module tab of the crash report says "missing symbols" for libxul.so debug identifier 4296B0626A41F5C301000000900000000, which matches.
Ted, any idea what's up there?
Flags: needinfo?(mh+mozilla) → needinfo?(ted)
Comment 10•7 years ago (Assignee)
Also, if I manually symbolicate with that libxul.so.sym, the top frames are:
NS_ABORT_OOM(unsigned int) xpcom/base/nsDebugImpl.cpp:624
nsTSubstring<char16_t>::SetCapacity(unsigned int) xpcom/string/nsTSubstring.h:818
nsTSubstring<char16_t>::SetLength(unsigned int) xpcom/string/nsTSubstring.cpp:809
mozilla::dom::XMLHttpRequestMainThread::AppendToResponseText(char const*, unsigned int, bool) dom/xhr/XMLHttpRequestString.cpp:244
Comment 11•7 years ago
Some combo of glandium/ted probably has this under control :)
Flags: needinfo?(snorp)
Comment 12•7 years ago
For the crash in comment 0, looking at the raw dump tab shows in the modules list:
{ "base_addr": "0xc7889000", "code_id": "62b09642416ac3f50100000090000000", "debug_file": "libxul.so", "debug_id": "4296B0626A41F5C301000000900000000", "end_addr": "0xcb2cf000", "filename": "libxul.so", "missing_symbols": true, "version": "" },
I downloaded the matching build:
https://queue.taskcluster.net/v1/task/ODjSM4BUSmOVwnkRxjSE2g/runs/0/artifacts/public/build/en-US/target.apk
And (after decompressing it with xz) the libxul.so in there shows a very short build id:
Displaying notes found at file offset 0x009100a0 with length 0x00000018:
Owner Data size Description
GNU 0x00000008 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 62b09642416ac3f5
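For reference, the sizes in that readelf output follow from the ELF note layout: a 12-byte header (name size, descriptor size, type), the padded owner name, then the descriptor, which is the build ID itself. A minimal Python sketch, assuming a little-endian 32-bit note blob that has already been extracted from the file; `parse_build_id_note` is a hypothetical helper, not something from Breakpad or the Mozilla tree:

```python
import struct

NT_GNU_BUILD_ID = 3  # note type used for GNU build IDs


def parse_build_id_note(blob: bytes) -> str:
    """Return the build ID (as hex) from a single little-endian ELF note."""
    namesz, descsz, note_type = struct.unpack_from("<III", blob, 0)
    name_start = 12
    # The owner name is padded to a 4-byte boundary before the descriptor.
    desc_start = name_start + ((namesz + 3) & ~3)
    name = blob[name_start:name_start + namesz].rstrip(b"\x00")
    if name != b"GNU" or note_type != NT_GNU_BUILD_ID:
        raise ValueError("not a GNU build ID note")
    return blob[desc_start:desc_start + descsz].hex()


# Synthetic note carrying the 8-byte ID reported above (not read from a real file).
short_note = (
    struct.pack("<III", 4, 8, NT_GNU_BUILD_ID)
    + b"GNU\x00"
    + bytes.fromhex("62b09642416ac3f5")
)
print(hex(len(short_note)), parse_build_id_note(short_note))
```

The synthetic note is 12 + 4 + 8 bytes long, which is the 0x18 length readelf reports for the short ID.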
dump_syms on that file produces:
$ dump_syms -i ./libxul.so
MODULE Linux arm 4296B0626A41F5C300000000000000000 libxul.so
INFO CODE_ID 62B09642416AC3F5
For comparison, I downloaded a nightly from 2018-08-13 (right before enabling LTO):
https://index.taskcluster.net/v1/task/gecko.v2.mozilla-central.nightly.2018.08.13.latest.mobile.android-api-16-opt/artifacts/public/build/en-US/target.apk
and it has a Build ID that's the length I would expect:
Displaying notes found at file offset 0x000001ec with length 0x00000024:
Owner Data size Description
GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring)
Build ID: 2c23f3cb5ea39f0db6cade029c8217dc9696d647
dump_syms on that file produces:
$ dump_syms -i ./libxul.so
MODULE Linux arm CBF3232CA35E0D9FB6CADE029C8217DC0 libxul.so
INFO CODE_ID 2C23F3CB5EA39F0DB6CADE029C8217DC9696D647
I think the Breakpad minidump writing code might have a bug with Build IDs that are this short. It looks like it's reading off the end of the array or something.
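The relationship between the Build IDs and the MODULE lines above can be sketched in Python. This assumes, based only on the outputs quoted in this comment, that the debug ID is the first 16 bytes of the build ID (zero-padded if shorter), with the first three fields byte-swapped as in a GUID, followed by a trailing `0`. `debug_id_from_build_id` is a hypothetical name, not a Breakpad function:

```python
def debug_id_from_build_id(build_id_hex: str) -> str:
    """Derive a Breakpad-style debug ID from an ELF build ID (hex string)."""
    raw = bytes.fromhex(build_id_hex)
    # Use the first 16 bytes, zero-padding IDs that are shorter than that.
    guid = raw[:16].ljust(16, b"\x00")
    # Swap the 4-byte, 2-byte and 2-byte leading fields; keep the last 8 bytes.
    swapped = guid[3::-1] + guid[5:3:-1] + guid[7:5:-1] + guid[8:]
    return swapped.hex().upper() + "0"


# Good build (20-byte sha1 ID), short LTO build ID, and the code_id from the crash report.
print(debug_id_from_build_id("2c23f3cb5ea39f0db6cade029c8217dc9696d647"))
print(debug_id_from_build_id("62b09642416ac3f5"))
print(debug_id_from_build_id("62b09642416ac3f50100000090000000"))
```

Under that assumption, the 8-byte ID pads out to the all-zero tail that dump_syms printed, while the code_id from the minidump carries extra non-zero bytes after the real ID and so yields a different debug ID, which is why the symbol lookup misses.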
Flags: needinfo?(ted)
Comment 13•7 years ago (Assignee)
aha! with bfd ld, --build-id is equivalent to --build-id=sha1. with lld, it's equivalent to --build-id=fast. We "just" need to be more explicit.
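To illustrate what "more explicit" means, here is a hedged Python sketch that rewrites a bare `--build-id` in a list of linker flags into `--build-id=sha1`. It is not the attached patch; `make_build_id_explicit` and the flag list are hypothetical, and only the two flag spellings shown are handled:

```python
def make_build_id_explicit(ldflags, style="sha1"):
    """Replace a bare --build-id with an explicit --build-id=<style>."""
    fixed = []
    for flag in ldflags:
        if flag == "--build-id":
            # Flag passed directly to the linker.
            fixed.append(f"--build-id={style}")
        elif flag == "-Wl,--build-id":
            # Flag passed through the compiler driver.
            fixed.append(f"-Wl,--build-id={style}")
        else:
            # Anything else, including an already explicit style, is left alone.
            fixed.append(flag)
    return fixed


print(make_build_id_explicit(["-Wl,--build-id", "-Wl,-z,relro"]))
```

With an explicit style, bfd ld and lld should both emit the 20-byte ID seen in the pre-LTO nightly rather than the 8-byte one.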
Comment 14•7 years ago (Assignee)
Assignee: nobody → mh+mozilla
Attachment #9004538 - Flags: review?(core-build-config-reviews)
Updated•7 years ago
Attachment #9004538 - Flags: review?(core-build-config-reviews) → review+
Comment 15•7 years ago (Assignee)
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #12)
> For the crash in comment 0, looking at the raw dump tab shows in the modules
> list:
> { "base_addr": "0xc7889000", "code_id": "62b09642416ac3f50100000090000000",
> "debug_file": "libxul.so", "debug_id": "4296B0626A41F5C301000000900000000",
> "end_addr": "0xcb2cf000", "filename": "libxul.so", "missing_symbols": true,
> "version": "" },
Heh, I actually already pasted that number in comment 9, but failed to realize that there was a 9 in between the zeros.
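A character-by-character comparison makes that kind of mismatch hard to miss. A small Python sketch using the two identifiers from this bug, written as prefix plus tail so the zero runs are easy to check; `differing_positions` is a hypothetical helper:

```python
def differing_positions(a: str, b: str):
    """List (index, char_in_a, char_in_b) wherever two equal-length IDs differ."""
    if len(a) != len(b):
        raise ValueError("identifiers have different lengths")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


prefix = "4296B0626A41F5C3"
symbols_id = prefix + "0" * 17                      # directory name in the symbols archive
crash_id = prefix + "01000000" + "90000000" + "0"   # debug_id in the crash report

for index, in_symbols, in_crash in differing_positions(symbols_id, crash_id):
    print(f"position {index}: symbols has {in_symbols!r}, crash report has {in_crash!r}")
```

The two IDs differ at two positions, the `1` and the `9`.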
> I think the Breakpad minidump writing code might have a bug with Build IDs
> that are this short. It looks like it's reading off the end of the array or
> something.
I think it's still desirable to have sha1s as build-ids (or at least something larger), but it seems like it would be a good thing to fix that bug indeed.
Comment 16•7 years ago
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b045052d4aa
Pass --build-id=sha1 to the linker instead of --build-id. r=froydnj
Comment 17•7 years ago
bugherder
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Updated•7 years ago
Severity: normal → blocker
status-firefox61: --- → unaffected
status-firefox62: --- → unaffected
status-firefox-esr52: --- → unaffected
status-firefox-esr60: --- → unaffected
Priority: -- → P1
Comment 18•7 years ago
(In reply to Mike Hommey [:glandium] from comment #13)
> aha! with bfd ld, --build-id is equivalent to --build-id=sha1. with lld,
> it's equivalent to --build-id=fast. We "just" need to be more explicit.
It would be nice if lld had a page on llvm.org that listed its commandline options. :-/
https://github.com/llvm-mirror/lld/blob/b771e1958601a28fafce682708530b493d0c89a6/ELF/Options.td#L30
Spelunking through blame shows:
https://github.com/llvm-mirror/lld/commit/3408d8720ddc65f867e6046f0ebd898feaae1075
"We made a deliberate choice to not use a secure hash function for the
sake of performance. Computing a secure hash is slow -- for example,
MD5 throughput is usually 400 MB/s or so. SHA1 is slower than that."
(In reply to Mike Hommey [:glandium] from comment #15)
> I think it's still desirable to have sha1s as build-ids (or at least
> something larger), but it seems like it would be a good thing to fix that
> bug indeed.
I filed bug 1487197 on that.
Comment 19•7 years ago (Reporter)
Looks like Android symbolication is working again: bp-57c5ddd9-6275-4e4e-8854-cf2340180829
Thanks for the fix!