Closed Bug 1472589 Opened Last year Closed 9 months ago

Snap sandbox breaks parent process crash reporting

Categories

(Toolkit :: Crash Reporting, defect, P1)

Unspecified
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla67
Tracking Status
firefox65 --- wontfix
firefox66 --- verified
firefox67 --- fixed

People

(Reporter: jld, Assigned: jld)

References

Details

(Keywords: regression)

Attachments

(1 file)

I have a patch for bug 1461848 that fixes crash reporting for child processes, but not for the parent process; the dynamic linker can't find `libgtk-3.so.0` when launching the crash reporter GUI.  My guess is that snapd starts tearing down the sandbox when the main process exits, but I haven't investigated.
I took a look, and problem might be a lot more boring than I thought: within the Snap container there is no GTK in /usr/lib or the other default paths; instead it's in /snap/firefox/N/usr/lib (where N is the package revision) and this is communicated via $LD_LIBRARY_PATH.  But we unset $LD_LIBRARY_PATH when launching the crash reporter, for reasons explained in bug 407229 and the comment at [1].

It looks like we dynamically load libcurl, so maybe we could do the unsetenv after the exec (in the crashreporter executable itself) instead of before?

[1] https://searchfox.org/mozilla-central/rev/007b66c1f5f7a1b9a900a2038652a16a020f010c/toolkit/crashreporter/nsExceptionHandler.cpp#831
Component: Release Automation: Snap → Crash Reporting
Product: Release Engineering → Toolkit
QA Contact: jlorenzo
See Also: → 407229

Ubuntu 18.04 has the fix for bug 1461848 now, and the fix for this should be simple….

Assignee: nobody → jld

…for some values of “fix”. There doesn't appear to be libcurl anywhere in the container, so we'd have to (1) add a dependency on it, and (2) restore the original LD_LIBRARY_PATH instead of just clearing so that (2a) we can find that libcurl and (2b) successfully restart Firefox.

It is possible to submit a crash that way, by manually restarting Firefox and going into about:crashes (example: bp-c30a3268-c727-41d8-b26a-522fa0190201), but we should try to have UX parity with non-Snap.

Except that I'm pretty sure we don't need to touch LD_LIBRARY_PATH at all in this code, because we're not setting it in the parent process anymore, because we're not using that run-mozilla.sh script anymore. (It's set for child processes in GeckoChildProcessHost, in the environment passed to execve.)

We probably also don't need to unset DYLD_LIBRARY_PATH on Mac but that's not necessary for this bug.

So this seems to be fixable with a two-line change: delete the unsetenv and add a package to the list in snapcraft.yaml.in.

Priority: -- → P1
  1. The unsetting of LD_LIBRARY_PATH is removed, because it's no longer
    necessary and interferes with environments where it's necessary to find
    "system" libraries like GTK; see bug 1472589 comment #1 through #4.

  2. The Snap package manifest adds a dependency on the libcurl package,
    so that the crash reporter can send the report. This uses the GnuTLS
    variant because we're already pulling in GnuTLS as a dependency of some
    other packages (FFmpeg and CUPS, but also the non-GnuTLS cURL packages
    depend on it anyway via OpenLDAP).

Pushed by jedavis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dbf72abf5597
Fix parent process crash reporting in the Snap package environment. r=ted,jlorenzo

Note to self to request uplift.

Flags: needinfo?(jld)
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Comment on attachment 9041341 [details]
Bug 1472589 - Fix parent process crash reporting in the Snap package environment.

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

Bug 1297513

User impact if declined

No crash reports for the parent process when using the Snap package.

Is this code covered by automated tests?

Unknown

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

Yes

If yes, steps to reproduce

This is complicated. The Snap repack job doesn't work on Try and I don't know if it would even work with an official Nightly build. After the patch is uplifted and shipped in a beta build, verification should be simple: snap install --beta firefox, then snap run firefox about:crashparent. (Use Ubuntu 18.04 with the latest OS updates applied, because that's known to have the fix for bug 1461848.)

jlorenzo might know more about whether we can verify the fix without shipping; I have a manual process to approximate the Snap repack for my own testing, but QE probably ought to use something closer to what we actually ship.

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

One part of this change just adds a library to the Snap package. The other could theoretically cause bug 407229 to regress but I confirmed that we're no longer doing the environment variable change that caused that bug.

String changes made/needed

none

Flags: needinfo?(jld)
Attachment #9041341 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9041341 [details]
Bug 1472589 - Fix parent process crash reporting in the Snap package environment.

We definitely want to get crash reports.
Let's uplift for beta 10 and verify the fix in beta.

Attachment #9041341 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [qa-triaged]

Verified as fixed on the latest Beta build (66b11) installed from the Snap Store on Ubuntu 18.04LTS.

Based on Comment 9 this issue will not be verified on the Nightly build.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
QA Whiteboard: [qa-triaged]
Whiteboard: [qa-triaged]
You need to log in before you can comment on or make changes to this bug.