Closed Bug 1485759 Opened 6 years ago Closed 6 years ago

No symbolication in Fennec Android Nightly since Aug 14

Categories

(Toolkit :: Crash Reporting, defect, P1)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- unaffected
firefox61 --- unaffected
firefox62 --- unaffected
firefox63 blocking fixed

People

(Reporter: mccr8, Assigned: glandium)

References

Details

(Keywords: regression)

Attachments

(1 file)

Crashes from the 8-22 Android Nightly have signatures like

OOM | large | libxul.so@0xffeda4 | libxul.so@0xffcfdf | libxul.so@0xffa1b9 | libxul.so@0x1f1247f | libxul.so@0x1f13a7f | libxul.so@0x1f13a37 | libxul.so@0x1f13a37 | libxul.so@0x102f6fb | libxul.so@0x1f13a37 | libxul.so@0x102f50f | libxul.so@0x1f13e1d

example: bp-283d1876-52e1-483d-9893-7632f0180823

This seems to affect every C++ frame in the signatures.
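One quick way to check "every C++ frame" is to count the frames in a signature that are still bare `module@0xOFFSET` pairs. This is a minimal sketch, assuming frames are separated by `|` as in the signature above. The helper name `unsymbolicated_frames` is hypothetical and not part of Socorro.

```python
import re

# A frame that could not be symbolicated appears as "module@0xOFFSET".
UNSYMBOLICATED = re.compile(r"^[\w.+-]+@0x[0-9a-fA-F]+$")


def unsymbolicated_frames(signature: str) -> list[str]:
    """Return the frames of a crash signature that are raw module offsets."""
    frames = [f.strip() for f in signature.split("|")]
    # Prefixes such as "OOM" and "large" do not match the pattern.
    return [f for f in frames if UNSYMBOLICATED.match(f)]
```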
Linux looks okay for that Nightly.
Are these possibly some of the GeckoView crashes? cpeterson might know.
Flags: needinfo?(cpeterson)
Socorro has a processor rule that fixes the product for crashes incorrectly marked as Fennec that should be Focus. The example crash in the description isn't a content process crash. Unless I'm misunderstanding things, I'm pretty sure it's not Focus and probably not a GeckoView crash.
OK, based on comment 3, clearing the ni for Chris.
Flags: needinfo?(cpeterson)
These unsymbolicated crash reports (or at least bp-283d1876-52e1-483d-9893-7632f0180823 from comment 0) are not from GeckoView because:

1. Focus is testing GeckoView 62.0b Beta and the above crash report is Gecko version 63.0a1 Nightly.
2. The above crash report has the "gws-and-facebook-spoof%40mozilla.org:1.0.0,webcompat%40mozilla.org:2.0.1" extension installed, which is a Fennec 63.0a1 Nightly experiment.

@ James: do you have any theories why we are getting unsymbolicated crash reports from Fennec 63.0a1 Nightly starting around August 22 this week?
Flags: needinfo?(snorp)
OS: Unspecified → Android
Summary: No symbolication for the 8-22 Android Nightly → No symbolication for the 8-22 Fennec Android Nightly
I went back to the old builds, and it looks like the first build that isn't symbolicated is the 20180814100103 build.

20180814100103 looks like the first bad build. For example: bp-26813110-78e1-4c0f-9333-f7c210180822
The prior build, 20180813100105, looks okay to me. For example: bp-54f60dc1-2547-4d77-b294-2166f0180817

The range for commits added to the 20180814100103 build is:
  https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=bf79440c1376b1e1114ba653917e1577d7b1007b&tochange=914b3b370ad059a04ad751642b74e013f8e3ad08

I see "Bug 1480006 - Enable LTO on Android CI builds." in that range. glandium, can you take a look please?
Blocks: android-lto
Flags: needinfo?(mh+mozilla)
Keywords: regression
Summary: No symbolication for the 8-22 Fennec Android Nightly → No symbolication in Fennec Android Nightly since LTO was enabled
Summary: No symbolication in Fennec Android Nightly since LTO was enabled → No symbolication in Fennec Android Nightly since Aug 14
Tracking 63+ for this issue since we need crash symbols for Fennec.
If I take the crashreporter symbols from the nightly build corresponding to the crash in comment 0, that is:
https://queue.taskcluster.net/v1/task/ODjSM4BUSmOVwnkRxjSE2g/runs/0/artifacts/public/build/target.crashreporter-symbols.zip

That archive contains the file libxul.so/4296B0626A41F5C300000000000000000/libxul.so.sym, which looks fine.

OTOH, the Module tab of the crash report says "missing symbols" for libxul.so debug identifier 4296B0626A41F5C301000000900000000, which matches.

Ted, any idea what's up there?
Flags: needinfo?(mh+mozilla) → needinfo?(ted)
Also, if I manually symbolicate with that libxul.so.sym, the top frames are:

NS_ABORT_OOM(unsigned int) xpcom/base/nsDebugImpl.cpp:624
nsTSubstring<char16_t>::SetCapacity(unsigned int) xpcom/string/nsTSubstring.h:818
nsTSubstring<char16_t>::SetLength(unsigned int) xpcom/string/nsTSubstring.cpp:809
mozilla::dom::XMLHttpRequestMainThread::AppendToResponseText(char const*, unsigned int, bool) dom/xhr/XMLHttpRequestString.cpp:244
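Manual symbolication essentially maps each module offset from the signature (e.g. `0xffeda4`) onto the FUNC record in the .sym file whose address range contains it. This is a minimal sketch, assuming the Breakpad text format `FUNC [m] address size parameter_size name` with hex fields. It ignores line, PUBLIC, and STACK records, and the function names are hypothetical.

```python
import bisect


def load_funcs(sym_text: str) -> list[tuple[int, int, str]]:
    """Parse FUNC records from Breakpad .sym text into a sorted list."""
    funcs = []
    for line in sym_text.splitlines():
        parts = line.split()
        if not parts or parts[0] != "FUNC":
            continue
        fields = parts[1:]
        if fields[0] == "m":  # optional "multiple" flag
            fields = fields[1:]
        addr, size = int(fields[0], 16), int(fields[1], 16)
        name = " ".join(fields[3:])  # names may contain spaces
        funcs.append((addr, size, name))
    funcs.sort()
    return funcs


def lookup(funcs: list[tuple[int, int, str]], offset: int):
    """Return the function containing a module-relative offset, or None."""
    starts = [f[0] for f in funcs]
    i = bisect.bisect_right(starts, offset) - 1
    if i >= 0:
        addr, size, name = funcs[i]
        if addr <= offset < addr + size:
            return name
    return None
```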
Some combo of glandium/ted probably has this under control :)
Flags: needinfo?(snorp)
For the crash in comment 0, looking at the raw dump tab shows in the modules list:
 { "base_addr": "0xc7889000", "code_id": "62b09642416ac3f50100000090000000", "debug_file": "libxul.so", "debug_id": "4296B0626A41F5C301000000900000000", "end_addr": "0xcb2cf000", "filename": "libxul.so", "missing_symbols": true, "version": "" },

I downloaded the matching build:
https://queue.taskcluster.net/v1/task/ODjSM4BUSmOVwnkRxjSE2g/runs/0/artifacts/public/build/en-US/target.apk

And (after decompressing it with xz) the libxul.so in there shows a very short build id:

Displaying notes found at file offset 0x009100a0 with length 0x00000018:
  Owner                 Data size	Description
  GNU                  0x00000008	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 62b09642416ac3f5
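The note "length" that readelf reports follows from the ELF note layout. Each note has a 12-byte header (namesz, descsz, type), then the name `GNU\0`, then the descriptor, with the name and descriptor each padded to 4 bytes. That gives 0x18 = 12 + 4 + 8 here, and 0x24 = 12 + 4 + 20 for a 20-byte SHA-1 ID. The descriptor size comes from `descsz`, so a correct reader returns 8 bytes for this build. This is a minimal sketch that assumes the raw note section bytes have already been extracted from the file; `gnu_build_id` is a hypothetical name.

```python
import struct

NT_GNU_BUILD_ID = 3


def _align4(n: int) -> int:
    return (n + 3) & ~3


def gnu_build_id(notes: bytes, little_endian: bool = True):
    """Walk an ELF note section and return the NT_GNU_BUILD_ID payload."""
    fmt = "<III" if little_endian else ">III"
    off = 0
    while off + 12 <= len(notes):
        namesz, descsz, ntype = struct.unpack_from(fmt, notes, off)
        name_off = off + 12
        desc_off = name_off + _align4(namesz)
        name = notes[name_off:name_off + namesz]
        if ntype == NT_GNU_BUILD_ID and name == b"GNU\x00":
            # Use descsz rather than assuming a fixed ID length.
            return notes[desc_off:desc_off + descsz]
        off = desc_off + _align4(descsz)
    return None
```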

dump_syms on that file produces:
$ dump_syms -i ./libxul.so 
MODULE Linux arm 4296B0626A41F5C300000000000000000 libxul.so
INFO CODE_ID 62B09642416AC3F5

For comparison, I downloaded a nightly from 2018-08-13 (right before enabling LTO):
https://index.taskcluster.net/v1/task/gecko.v2.mozilla-central.nightly.2018.08.13.latest.mobile.android-api-16-opt/artifacts/public/build/en-US/target.apk

and it has a Build ID that's the length I would expect:
Displaying notes found at file offset 0x000001ec with length 0x00000024:
  Owner                 Data size	Description
  GNU                  0x00000014	NT_GNU_BUILD_ID (unique build ID bitstring)
    Build ID: 2c23f3cb5ea39f0db6cade029c8217dc9696d647


dump_syms on that file produces:
$ dump_syms -i ./libxul.so 
MODULE Linux arm CBF3232CA35E0D9FB6CADE029C8217DC0 libxul.so
INFO CODE_ID 2C23F3CB5EA39F0DB6CADE029C8217DC9696D647
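The MODULE IDs above can be reproduced from the build IDs. Take the first 16 bytes, zero-padding a shorter ID. Byte-swap the first three GUID-style fields (4, 2, and 2 bytes), uppercase the result, and append an age of "0". This is a sketch of the convention as observed in this bug's dump_syms output, not Breakpad's actual code.

```python
def breakpad_debug_id(build_id_hex: str) -> str:
    """Derive a Breakpad-style debug ID from an ELF build ID (hex)."""
    # Truncate to 16 bytes; shorter IDs are padded with zero bytes.
    raw = bytes.fromhex(build_id_hex)[:16].ljust(16, b"\x00")
    # Reverse the byte order of the first three fields (4, 2, 2 bytes).
    guid = raw[3::-1] + raw[5:3:-1] + raw[7:5:-1] + raw[8:]
    return guid.hex().upper() + "0"  # trailing "0" is the age
```

Applying the same function to the crash report's code_id `62b09642416ac3f50100000090000000` yields its debug_id `4296B0626A41F5C301000000900000000`. The two IDs differ only in the 8 bytes beyond the real build ID, so the symbol lookup misses.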

I think the Breakpad minidump writing code might have a bug with Build IDs that are this short. It looks like it's reading off the end of the array or something.
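The suspected failure mode is a reader that copies a fixed 16 bytes for the identifier instead of honoring the note's `descsz`. The sketch below is a hypothetical illustration of that bug class, not the actual Breakpad code. The trailing bytes in any test image are placeholders, because what really followed the 8-byte ID in memory is not known from this bug.

```python
GUID_SIZE = 16


def read_id_fixed(image: bytes, desc_off: int) -> bytes:
    # Hypothetical buggy reader: always takes 16 bytes, whatever descsz says.
    # In C, a fixed-size copy like this would run past an 8-byte descriptor.
    return image[desc_off:desc_off + GUID_SIZE]


def read_id_sized(image: bytes, desc_off: int, descsz: int) -> bytes:
    # Bounded reader: takes exactly the descriptor length from the note header.
    return image[desc_off:desc_off + descsz]
```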
Flags: needinfo?(ted)
aha! with bfd ld, --build-id is equivalent to --build-id=sha1. with lld, it's equivalent to --build-id=fast. We "just" need to be more explicit.
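With lld's default, libxul.so ended up with an 8-byte build ID, whereas `--build-id=sha1` gives 20 bytes. A post-link sanity check could catch this kind of regression by parsing `readelf -n` output. The sketch below is hypothetical and not part of the Mozilla build system. The 16-byte threshold is an assumption, chosen because the Breakpad debug IDs seen here are GUID-sized.

```python
import re

BUILD_ID_RE = re.compile(r"Build ID:\s*([0-9a-fA-F]+)")


def build_id_bytes(readelf_notes: str):
    """Return the build ID length in bytes from `readelf -n` text, or None."""
    m = BUILD_ID_RE.search(readelf_notes)
    return len(m.group(1)) // 2 if m else None


def check_build_id(readelf_notes: str, min_bytes: int = 16) -> bool:
    """Hypothetical guard: require a build ID of at least min_bytes."""
    n = build_id_bytes(readelf_notes)
    return n is not None and n >= min_bytes
```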
Assignee: nobody → mh+mozilla
Attachment #9004538 - Flags: review?(core-build-config-reviews)
Attachment #9004538 - Flags: review?(core-build-config-reviews) → review+
(In reply to Ted Mielczarek [:ted] [:ted.mielczarek] from comment #12)
> For the crash in comment 0, looking at the raw dump tab shows in the modules
> list:
>  { "base_addr": "0xc7889000", "code_id": "62b09642416ac3f50100000090000000",
> "debug_file": "libxul.so", "debug_id": "4296B0626A41F5C301000000900000000",
> "end_addr": "0xcb2cf000", "filename": "libxul.so", "missing_symbols": true,
> "version": "" },

Heh, I actually already pasted that number in comment 9, but failed to realize that there was a 9 in between the zeros.

> I think the Breakpad minidump writing code might have a bug with Build IDs
> that are this short. It looks like it's reading off the end of the array or
> something.

I think it's still desirable to have sha1s as build-ids (or at least something larger), but it seems like it would be a good thing to fix that bug indeed.
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2b045052d4aa
Pass --build-id=sha1 to the linker instead of --build-id. r=froydnj
https://hg.mozilla.org/mozilla-central/rev/2b045052d4aa
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Severity: normal → blocker
Priority: -- → P1
(In reply to Mike Hommey [:glandium] from comment #13)
> aha! with bfd ld, --build-id is equivalent to --build-id=sha1. with lld,
> it's equivalent to --build-id=fast. We "just" need to be more explicit.

It would be nice if lld had a page on llvm.org that listed its commandline options. :-/

https://github.com/llvm-mirror/lld/blob/b771e1958601a28fafce682708530b493d0c89a6/ELF/Options.td#L30

Spelunking through blame shows:
https://github.com/llvm-mirror/lld/commit/3408d8720ddc65f867e6046f0ebd898feaae1075

"We made a deliberate choice to not use a secure hash function for the
sake of performance. Computing a secure hash is slow -- for example,
MD5 throughput is usually 400 MB/s or so. SHA1 is slower than that."

(In reply to Mike Hommey [:glandium] from comment #15)
> I think it's still desirable to have sha1s as build-ids (or at least
> something larger), but it seems like it would be a good thing to fix that
> bug indeed.

I filed bug 1487197 on that.
Looks like Android symbolication is working again: bp-57c5ddd9-6275-4e4e-8854-cf2340180829

Thanks for the fix!