Many crash reports for "official" builds are missing symbols for Mozilla-specific modules
Categories
(Toolkit :: Crash Reporting, defect)
People
(Reporter: smichaud, Unassigned)
I first noticed the problem on macOS, but it seems worst on Linux and Android:
Look at the "Modules" tab in each crash report, in the list above. The ones matching this report are those where the Mozilla-specific modules have no symbols.
These crash reports are for "official" builds -- those on the "release" or "nightly" release channels. They're not for try builds or local builds. You'd think they should all have symbols on the symbol server.
For some reason the Windows builds aren't affected -- or at least they're much less affected.
Reporter | ||
Comment 1•4 years ago
Here's one example of a build whose Mozilla-specific symbols are consistently missing from all crash reports:
Build id 20211109190508 (for macOS, on trunk):
And another of a build where the Mozilla-specific symbols are missing from some reports but not others:
Build id 20211105093421 (for macOS, on trunk):
Comment 2•4 years ago
I looked at one of the crash reports in the first query for build id 20211109190508. Here's a sample crash report:
https://crash-stats.mozilla.org/report/index/4c3204f6-f990-4928-85cc-d3da20211115
The xul file has debug id 0645105DF0F537C99BA632680734B5FC0. It's for an amd64 build.
For the second query, here's a sample crash report with a frame from xul that was symbolicated:
https://crash-stats.mozilla.org/report/index/c370102d-a909-4515-8b3a-d4cd90211110
The debug id for the xul symbol is 3FB2F572076032AFAEDC9D6E7F9C6A2F0. This is an arm64 build.
Here's one that wasn't symbolicated:
https://crash-stats.mozilla.org/report/index/bd1439dc-f93c-48a3-844e-18e260211109
The debug id for the xul symbol is 35C78809BBC53CCCA7E32AF90FC46E730. This is an amd64 build.
Does anyone know if we are having difficulties uploading symbols? If we were, is there anything that notifies someone that symbols weren't uploaded successfully?
Reporter | ||
Comment 3•4 years ago
> Does anyone know if we are having difficulties uploading symbols?
Not me :-)
I've been manually scraping symbols for macOS system files for the last two years, and sending them to Gabriele Svelto or Marco Castelluccio, who've been uploading them to the symbol server. As far as I know there've been no problems with that. But I have nothing to do with getting Mozilla-specific symbols to the symbol server. I have the impression that's done (more or less) automatically, as part of the building process.
Reporter | ||
Comment 4•4 years ago
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #2)
> For the second query, here's a sample crash report with a frame from xul that was symbolicated:
> https://crash-stats.mozilla.org/report/index/c370102d-a909-4515-8b3a-d4cd90211110
> The debug id for the xul symbol is 3FB2F572076032AFAEDC9D6E7F9C6A2F0. This is an arm64 build.
> Here's one that wasn't symbolicated:
> https://crash-stats.mozilla.org/report/index/bd1439dc-f93c-48a3-844e-18e260211109
> The debug id for the xul symbol is 35C78809BBC53CCCA7E32AF90FC46E730. This is an amd64 build.
Yes, the crash reports for build id 20211105093421 seem to consistently be missing Mozilla-specific symbols for crashes on amd64 builds:
But not for those on arm64 builds:
Reporter | ||
Comment 5•4 years ago
Another interesting case, where the same build sometimes is missing Mozilla-specific symbols in crash reports, but this is not determined by "cpu architecture":
The build id is 20210927210923 (Firefox 93.0). Most crash reports for this build are fully symbolicated:
But a few crash reports for Firefox 93.0 are missing Mozilla-specific symbols:
bp-77475b68-5c1a-4ee4-8274-daf010211104
bp-aa3ab071-02f5-4f1c-96fb-f8e2b0211101
bp-d9b1d5a6-ca67-4b19-b409-0f0eb0211021
bp-2c66223d-cac2-4fff-a1ef-ea3460211013
bp-d062ad57-890e-443f-a88d-e38cf0211013
Reporter | ||
Comment 6•4 years ago
(Following up comment #5)
Oops, I goofed. All the crash reports that are missing Mozilla-specific symbols are on the "default" branch -- meaning that they're try builds or local builds.
Reporter | ||
Comment 7•4 years ago
(Following up comment #6)
Actually, two of them are on the release channel:
bp-aa3ab071-02f5-4f1c-96fb-f8e2b0211101
bp-d9b1d5a6-ca67-4b19-b409-0f0eb0211021
It may be significant that both are for the content process.
Reporter | ||
Comment 8•4 years ago
> It may be significant that both are for the content process.
It probably is significant.
I can't find any Firefox crash reports matching this bug report for relatively recent Firefox releases (90.0+) for the parent process:
But I do find some for the content process:
Reporter | ||
Comment 9•4 years ago
If I'm right about the content process being (sometimes) significant, we may have two (unrelated) bugs here:
1. The build process for official builds sometimes fails to copy Mozilla-specific symbols for that build to the symbol server.
2. Something (probably on the server side) sometimes fails to find Mozilla-specific symbols for content-process crashes.
Comment 10•4 years ago
Running those last two crash-reports (bp-aa3ab071-02f5-4f1c-96fb-f8e2b0211101 and bp-d9b1d5a6-ca67-4b19-b409-0f0eb0211021) in socc-pair: rust-minidump agrees there are no symbols, and the module entry in the minidump for XUL is specifically dummy/corrupt:
318: {
base_addr: 0x10bcf2000
- did not match
+ code_id: id
- code_id: 0000000074a7000
debug_file: XUL
debug_id: 000000000000000000000000000000000
end_addr: 0x113199000
filename: XUL
missing_symbols: true
version: 0.1.0.0
}
319: {
base_addr: 0x7fff3da7a000
- did not match
+ code_id: id
- code_id: 000000004000
debug_file: WebInspectorUI
debug_id: D8D7C00FBE023C7796030ECB13AB881D0
end_addr: 0x7fff3da7e000
filename: WebInspectorUI
version: 0.611.3.10
}
(socc-pair produces a diff between Socorro and rust-minidump; the + indicates what Socorro produced. id is a dummy value it produces when the code_id is invalid, whereas rust-minidump exposes the raw value. On macOS there isn't a valid value for code_id, so we put random garbage in there, which rust-minidump and Socorro currently handle differently.)
The important detail is that the debug_id for XUL is 000000000000000000000000000000000, which is obviously wrong. debug_id is a necessary piece to resolve the URL of the symbols. So either the minidump generator is broken or whatever sets up the values it's reading is wrong (I don't know that code well enough to speculate on this).
For comparison, if we look at a good crash report we see a proper debug_id for XUL:
base_addr: 0x10303a000
- did not match
+ code_id: id
- code_id: 00000000728f000
debug_file: XUL
debug_id: 130C715E724B369FBF16E15168C3BCD40
end_addr: 0x10a2c9000
filename: XUL
loaded_symbols: true
~ ignoring field symbol_disk_cache_hit: false
~ ignoring field symbol_fetch_time: 989.7459716796876
~ ignoring field symbol_url: https://symbols.mozilla.org/try/XUL/130C715E724B369FBF16E15168C3BCD40/XUL.sym
version: null
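To illustrate why a zeroed debug_id blocks symbolication, here is a minimal Python sketch of how a symbol URL could be assembled from a module entry. The path layout (debug_file / debug_id / debug_file.sym) is inferred only from the symbol_url shown in the output above; the helper names and the default base URL are assumptions for illustration, not Socorro's or rust-minidump's actual code.

```python
from urllib.parse import quote

# Assumed default base; the symbol_url in the output above uses a "try/" prefix.
SYMBOL_BASE = "https://symbols.mozilla.org"


def is_null_debug_id(debug_id: str) -> bool:
    """Return True when the debug_id is empty or consists only of zeros."""
    return not debug_id or set(debug_id) == {"0"}


def symbol_url(debug_file: str, debug_id: str, base: str = SYMBOL_BASE):
    """Build <base>/<debug_file>/<debug_id>/<debug_file>.sym.

    Returns None when the debug_id is unusable, because no meaningful
    lookup path can be formed from an all-zero identifier.
    """
    if is_null_debug_id(debug_id):
        return None
    name = quote(debug_file)
    return f"{base}/{name}/{debug_id.upper()}/{name}.sym"


if __name__ == "__main__":
    # Values copied from the two XUL module entries above.
    print(symbol_url("XUL", "130C715E724B369FBF16E15168C3BCD40",
                     base="https://symbols.mozilla.org/try"))
    print(symbol_url("XUL", "000000000000000000000000000000000"))
```

With the good entry's values, the sketch rebuilds the symbol_url shown above; with the zeroed debug_id it yields no URL at all, which matches the missing_symbols result.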
Comment 11•4 years ago
I haven't looked closely at whether the version ranges overlap, but about a month ago there was a big mess with the Firefox updater that created "zombie" Firefox builds where the debug_ids were valid but for the wrong version. This caused some chaos, as the backtraces seemed to have full CFI/symbols but were complete gibberish. Is it possible this is an echo of the same issue?
Reporter | ||
Comment 12•4 years ago
> The important detail is that the debug_id for XUL is 000000000000000000000000000000000, which is obviously wrong. debug_id is a necessary piece to resolve the URL of the symbols. So either the minidump generator is broken or whatever sets up the values it's reading is wrong (I don't know that code well enough to speculate on this).
Thanks, Alexis, for this information. At some point I'll look through Breakpad and Socorro to try to find out how this is happening.
So, to revise what I said in comment #9:
We have two (probably) unrelated bugs here:
1. The build process for official builds sometimes fails to copy Mozilla-specific symbols for that build to the symbol server.
2. Sometimes the debug_id for Mozilla-specific modules (like XUL) is zeroed out in crash reports (perhaps only with content-process crashes). This prevents these modules from being symbolicated in those crash reports.
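As a rough way to tell the two issues apart when triaging a report, the following Python sketch classifies a single module entry. It assumes the entry is a dict with the debug_id and missing_symbols fields seen in the socc-pair output in comment #10; the function name and the returned labels are hypothetical, and a non-zero debug_id with missing symbols is only a candidate for issue #1, not proof of it.

```python
def classify_module(module: dict) -> str:
    """Classify one module entry from a processed crash report.

    Field names follow the socc-pair output quoted in this bug;
    the returned labels are made up for this sketch.
    """
    debug_id = module.get("debug_id", "")
    if not debug_id or set(debug_id) == {"0"}:
        # Issue #2: the identifier itself is unusable, so no lookup is possible.
        return "zeroed-debug-id"
    if module.get("missing_symbols"):
        # Candidate for issue #1: the identifier looks valid, but no symbols were found.
        return "symbols-not-found"
    return "ok"


if __name__ == "__main__":
    print(classify_module({"debug_file": "XUL",
                           "debug_id": "000000000000000000000000000000000",
                           "missing_symbols": True}))
    print(classify_module({"debug_file": "XUL",
                           "debug_id": "35C78809BBC53CCCA7E32AF90FC46E730",
                           "missing_symbols": True}))
```

The zero check comes first on purpose: a zeroed debug_id also produces missing symbols, so testing missing_symbols first would lump both issues together.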
Reporter | ||
Comment 13•4 years ago
(In reply to Alexis Beingessner [:Gankra] from comment #11)
> I haven't looked closely at whether the version ranges overlap, but about a month ago there was a big mess with the Firefox updater that created "zombie" Firefox builds where the debug_ids were valid but for the wrong version. This caused some chaos, as the backtraces seemed to have full CFI/symbols but were complete gibberish. Is it possible this is an echo of the same issue?
I don't think it's related to this bug. It might, though, be related to bug 1737201.
Comment 14•4 years ago
The severity field is not set for this bug.
:gsvelto, could you have a look please?
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 15•4 years ago
(In reply to Steven Michaud [:smichaud] (Retired) from comment #12)
> We have two (probably) unrelated bugs here:
> 1. The build process for official builds sometimes fails to copy Mozilla-specific symbols for that build to the symbol server.
> 2. Sometimes the debug_id for Mozilla-specific modules (like XUL) is zeroed out in crash reports (perhaps only with content-process crashes). This prevents these modules from being symbolicated in those crash reports.
Could we get someone to work on issue #1? It shouldn't be too hard to find out where the build failures are happening, and at least rig up a way to repeat the failed step until success is achieved.
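For what "repeat the failed step until success is achieved" might look like, here is a minimal Python sketch of a bounded retry with exponential backoff. The retry helper and its parameters are hypothetical and not part of any existing build or upload tooling, and whether retrying would actually help depends on why the step fails.

```python
import time


def retry(step, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call step() until it succeeds or the attempts are used up.

    Waits base_delay * 2**n seconds after the n-th failure (n starts at 0)
    and re-raises the last error if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return step()
        except Exception as error:  # broad on purpose: this is only a sketch
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

A caller would wrap the failing step in a function and pass it in, for example retry(upload_symbols) where upload_symbols is a placeholder for whatever performs the upload. Bounding the number of attempts keeps a persistently failing step from blocking the build forever.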
Here's a recent failure to work on: http://ftp.mozilla.org/pub/firefox/nightly/2021/12/2021-12-06-21-54-35-mozilla-central/firefox-97.0a1.en-US.mac.dmg (build id 20211206233400) is missing symbols on the symbol server for at least the amd64 architecture:
Reporter | ||
Comment 16•4 years ago
> Here's a recent failure to work on: http://ftp.mozilla.org/pub/firefox/nightly/2021/12/2021-12-06-21-54-35-mozilla-central/firefox-97.0a1.en-US.mac.dmg (build id 20211206233400) is missing symbols on the symbol server for at least the amd64 architecture:
I just noticed that this is wrong. Build 20211206233400 is a nightly-pine build, which (as best I can tell) isn't available at http://ftp.mozilla.org/pub/.
Sigh. I'll look for the most recent instance of issue #1 on a mozilla-central nightly.
Edit: Here it is: http://ftp.mozilla.org/pub/firefox/nightly/2021/11/2021-11-05-09-34-21-mozilla-central/firefox-96.0a1.en-US.mac.dmg (build id 20211105093421). Note that the Mozilla-specific modules don't have NULL debug-ids (a sign of issue #2). The symbols are only missing for the amd64 cpu architecture.
This was a while ago. Could the problem have already been fixed?
Comment 17•4 years ago
The task for that build failed: https://firefox-ci-tc.services.mozilla.com/tasks/DdctdbUmTb200i-DA3tlKQ
The log shows that we failed to upload the file, possibly because of a timeout: https://firefoxci.taskcluster-artifacts.net/DdctdbUmTb200i-DA3tlKQ/0/public/logs/live_backing.log. IIRC we discussed a similar issue recently where we thought this might be starting to happen.
Will, this particular symbol file was likely to be humongous. The compressed native debug info is over a gigabyte in size. Could we be timing out here?
Comment 18•4 years ago
Oh, never mind, we already discussed this in bug 1658531. I need to find time to address this. Dup'ing against that bug because that's where the problem lies. Judging by the intermittent reports, it's affecting other platforms too.
Reporter | ||
Comment 19•4 years ago
As I said above (notably in comment #12), this report covers two distinct, unrelated bugs:
1. The build process for official builds sometimes fails to copy Mozilla-specific symbols for that build to the symbol server.
2. Sometimes the debug_id for Mozilla-specific modules (like XUL) is zeroed out in crash reports (perhaps only with content-process crashes). This prevents these modules from being symbolicated in those crash reports.
Bug 1658531 only covers issue #1. So I've opened bug 1745224 to cover issue #2.