Closed Bug 1855135 Opened 2 years ago Closed 2 years ago

Fetching crash annotations from non-content child processes doesn't work on Ubuntu Snap builds

Categories

(Release Engineering Graveyard :: Release Automation: Snap, defect)

Unspecified
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gsvelto, Unassigned)

Attachments

(2 files)

This is what came out of the analysis of bug 1854179. Fetching crash annotations works fine on content processes but not on non-content child processes (gpu, rdd, utility, socket). This is probably a Snap sandbox issue.

I managed to repro this issue even with a content process, which is unexpected. Installing the Firefox snap in developer mode and running snappy-debug shows these two entries right before we crash:

= AppArmor =
Time: Sep 26 12:10:48
Log: apparmor="ALLOWED" operation="open" class="file" profile="snap.firefox.firefox" name="/proc/6820/environ" pid=6661 comm=427265616B70616420536572766572 requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
File: /proc/6820/environ (read)
Suggestion:
* adjust program to not access '@{PROC}/@{pid}/environ'

= AppArmor =
Time: Sep 26 12:10:48
Log: apparmor="ALLOWED" operation="open" class="file" profile="snap.firefox.firefox" name="/proc/6817/environ" pid=7362 comm=427265616B70616420536572766572 requested_mask="r" denied_mask="r" fsuid=1000 ouid=1000
File: /proc/6817/environ (read)
Suggestion:
* adjust program to not access '@{PROC}/@{pid}/environ'

I don't think these are the cause of the crash, given that in developer mode the accesses are let through (the log shows apparmor="ALLOWED"). Still, this is an issue with Ubuntu's sandbox, as those files are read when generating every crash report.
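The comm field in the AppArmor entries above is hex-encoded, which audit-style logs commonly do when a process name contains a space. A minimal sketch for decoding it, assuming the value is plain UTF-8 (decode_comm is a hypothetical helper name, not part of any tool mentioned here):

```python
def decode_comm(hex_comm: str) -> str:
    """Decode a hex-encoded comm= value from an AppArmor/audit log line."""
    # bytes.fromhex accepts both upper- and lower-case hex digits.
    return bytes.fromhex(hex_comm).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Value copied from the log entries above.
    print(decode_comm("427265616B70616420536572766572"))  # Breakpad Server
```

The decoded name suggests that the process reading /proc/<pid>/environ is the crash reporter's server side rather than the crashing process itself.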

It could be my mistake, since I was used to all the plugs being connected, but while reproducing the lack of annotations and investigating it, I found that running sudo snap connect firefox:browser-sandbox :browser-support brings back the crash from bug 1854179 (my build does not have the fix).

At some point, with local snap builds including the patch from bug 1854179, I observed:

  • repro on stable and beta opt builds: the .extra file is missing (the .dmp is present)
  • does not repro on the same builds when running under strace, but then the .dmp is missing (the .extra is present)
  • does not repro on the debug counterparts of the same builds (crash reporting seems completely OK)
  • does not repro on nightly opt and debug

I could get a beta opt build (with bug 1854179) to correctly report crashes when I removed:

--enable-linker=gold
--enable-lto=cross
MOZ_PGO=1

Removing just --enable-linker=gold makes the build fail at link time with:

2023-09-29 18:55:39.334 :: 2023-09-29 18:55:32.215 :: 10:40.71 js/xpconnect/shell/xpcshell
2023-09-29 18:55:39.334 :: 2023-09-29 18:55:32.574 :: 10:41.07 /snap/gnome-42-2204-sdk/current/usr/bin/ld: ../../toolkit/library/build/libxul.so: undefined reference to `SSL_ClientCertCallbackComplete@NSS_3.80'
2023-09-29 18:55:39.334 :: 2023-09-29 18:55:32.576 :: 10:41.07 /snap/gnome-42-2204-sdk/current/usr/bin/ld: ../../toolkit/library/build/libxul.so: undefined reference to `SECMOD_LockedModuleHasRemovableSlots@NSS_3.79'
2023-09-29 18:55:39.334 :: 2023-09-29 18:55:32.619 :: 10:41.11 /snap/gnome-42-2204-sdk/current/usr/bin/ld: ../../../toolkit/library/build/libxul.so: undefined reference to `SSL_ClientCertCallbackComplete@NSS_3.80'
2023-09-29 18:55:39.334 :: 2023-09-29 18:55:32.621 :: 10:41.11 /snap/gnome-42-2204-sdk/current/usr/bin/ld: ../../../toolkit/library/build/libxul.so: undefined reference to `SECMOD_LockedModuleHasRemovableSlots@NSS_3.79'

With --enable-linker=gold and MOZ_PGO=1 kept (so removing only --enable-lto=cross), we get a winner: https://crash-stats.mozilla.org/report/index/44bb1049-f22a-42fe-a102-c80900230929

Locally built upstream beta opt snap, with bug 1854179 applied.

The steps to reproduce (STR) are:

  • make sure you have firefox installed from the store so its connections are OK, and verify them with snap connections firefox; the manual install should keep those
  • snap install --dangerous --name firefox ./firefox.snap
  • snap run firefox
  • in about:crashes, clean up everything
  • open youtube, play something
  • pgrep -af utility, then kill -BUS <utilityPID>
  • reload about:crashes; you should have one crash listed, and it should submit when you click on Submit
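The pgrep/kill step above can be scripted. The following is only a sketch: parse_pgrep, firefox_candidates and send_sigbus are hypothetical helper names, and the "firefox" substring filter is an assumption meant to avoid signalling an unrelated process whose command line happens to contain "utility".

```python
import os
import signal
import subprocess


def parse_pgrep(output: str) -> list[tuple[int, str]]:
    """Split `pgrep -af` output lines into (pid, command line) pairs."""
    procs = []
    for line in output.splitlines():
        pid, _, cmdline = line.strip().partition(" ")
        if pid.isdigit():
            procs.append((int(pid), cmdline))
    return procs


def firefox_candidates(output: str) -> list[int]:
    """Keep only PIDs whose command line mentions firefox (assumed filter)."""
    return [pid for pid, cmdline in parse_pgrep(output) if "firefox" in cmdline]


def list_utility_pids() -> list[int]:
    """Run `pgrep -af utility`, as in the STR, and return candidate PIDs."""
    result = subprocess.run(
        ["pgrep", "-af", "utility"], capture_output=True, text=True
    )
    return firefox_candidates(result.stdout)


def send_sigbus(pid: int) -> None:
    """Equivalent of `kill -BUS <pid>`."""
    os.kill(pid, signal.SIGBUS)
```

Calling list_utility_pids() and then send_sigbus() on the chosen PID corresponds to the manual step; the remaining steps (playing a video, checking about:crashes) still have to be done by hand.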

Expected:
Crash is submitted successfully

Actual:
"Failure"

The failure on submit correlates with ~/snap/firefox/common/.mozilla/firefox/Crash\ Reports/pending/ containing only a .dmp file and no .extra file (we should have both for each UUID).
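Checking the pending directory by eye gets tedious across repeated runs. Here is a small sketch that reports which crash IDs are missing one half of the .dmp/.extra pair; find_incomplete is a hypothetical helper, and the default path is simply the one quoted above.

```python
from collections import defaultdict
from pathlib import Path

# Path taken from the comment above; adjust for other profiles or installs.
PENDING = Path.home() / "snap/firefox/common/.mozilla/firefox/Crash Reports/pending"
EXPECTED = {".dmp", ".extra"}


def find_incomplete(pending_dir) -> dict[str, list[str]]:
    """Map each crash ID to the list of suffixes it is missing."""
    pending_dir = Path(pending_dir)
    if not pending_dir.is_dir():
        return {}
    seen = defaultdict(set)
    for entry in pending_dir.iterdir():
        if entry.suffix in EXPECTED:
            seen[entry.stem].add(entry.suffix)
    return {
        crash_id: sorted(EXPECTED - suffixes)
        for crash_id, suffixes in seen.items()
        if suffixes != EXPECTED
    }


if __name__ == "__main__":
    for crash_id, missing in find_incomplete(PENDING).items():
        print(f"{crash_id}: missing {', '.join(missing)}")
```

An entry reported as missing .extra matches the failing case described here, while one missing .dmp matches what was seen under strace.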

Quick summary of the findings we made with :gerard-majax: this seems to be a linking issue. The offset that we use to access the MOZANNOTATIONS variable is off by several megabytes in both beta and release. As a result, the code that reads the in-memory annotations tries to read from outside of libxul.so's mapping. The issue has been fixed on Ubuntu's nightly snap, but not on beta/release. As :gerard-majax found out, disabling cross-LTO seems to fix the problem there. Let's keep this bug open until his PRs have landed.
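To illustrate what "outside of libxul.so's mapping" means in practice, here is a minimal sketch that checks whether an address falls inside any mapping of a given library, using text in the /proc/<pid>/maps format. It is not the crash reporter's actual code; the helper names, sample paths and addresses are all made up for illustration.

```python
# Hypothetical sample in /proc/<pid>/maps format (addresses and paths invented).
SAMPLE_MAPS = """\
7f0000000000-7f0000400000 r-xp 00000000 08:01 1234 /hypothetical/path/libxul.so
7f0000400000-7f0000500000 rw-p 00400000 08:01 1234 /hypothetical/path/libxul.so
7f0000600000-7f0000700000 r-xp 00000000 08:01 5678 /hypothetical/path/libother.so
"""


def module_ranges(maps_text: str, name: str) -> list[tuple[int, int]]:
    """Return (start, end) address ranges of mappings whose path ends with name."""
    ranges = []
    for line in maps_text.splitlines():
        # Fields: address range, perms, offset, dev, inode, pathname.
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[5].strip().endswith(name):
            continue
        start, end = (int(part, 16) for part in fields[0].split("-"))
        ranges.append((start, end))
    return ranges


def address_in_module(maps_text: str, name: str, address: int) -> bool:
    """True if address lies inside one of the module's mappings."""
    return any(start <= address < end for start, end in module_ranges(maps_text, name))


if __name__ == "__main__":
    base = 0x7F0000000000
    print(address_in_module(SAMPLE_MAPS, "libxul.so", base + 0x1000))
    # An offset that is several megabytes too large lands outside the mapping.
    print(address_in_module(SAMPLE_MAPS, "libxul.so", base + 8 * 1024 * 1024))
```

With the sample data, the first address is inside the library and the second, 8 MiB past the base, is not, which is the kind of out-of-range read described above.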

The PRs have landed.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
See Also: → 1858390
Component: Crash Reporting → Release Automation: Snap
Product: Toolkit → Release Engineering
Product: Release Engineering → Release Engineering Graveyard