Closed Bug 1698718 Opened 3 years ago Closed 4 months ago

Crash in [@ mozilla::gmp::GMPChild::RecvPreloadLibs]

Categories: Core :: Audio/Video: GMP, defect, P3
Platform: Unspecified, Linux
Type: defect
Status: RESOLVED WORKSFORME
People: Reporter: gsvelto; Assignee: Unassigned
Keywords: crash
Attachments: 1 file

Crash report: https://crash-stats.mozilla.org/report/index/8d617ac1-55fc-43df-ad1e-94c9c0210308

MOZ_CRASH Reason: MOZ_CRASH(Couldn't load lib needed by NSS)

Top 10 frames of crashing thread:

0 libxul.so mozilla::gmp::GMPChild::RecvPreloadLibs [clone .cold] 
1 libxul.so mozilla::gmp::PGMPChild::OnMessageReceived build-browser/ipc/ipdl/build-browser/ipc/ipdl/PGMPChild.cpp:500
2 libxul.so mozilla::ipc::MessageChannel::DispatchAsyncMessage 
3 libxul.so mozilla::ipc::MessageChannel::DispatchMessage 
4 libxul.so mozilla::ipc::MessageChannel::MessageTask::Run 
5 libxul.so MessageLoop::RunTask build-browser/ipc/chromium/ipc/chromium/src/base/message_loop.cc:465
6 libxul.so MessageLoop::DoWork [clone .part.0] build-browser/ipc/chromium/ipc/chromium/src/base/message_loop.cc:548
7 libxul.so base::MessagePumpDefault::Run 
8 libxul.so MessageLoop::Run build-browser/ipc/chromium/ipc/chromium/src/base/message_loop.cc:309
9 libxul.so XRE_InitChildProcess 

This is a Linux-specific issue; it seems that we're crashing here. Don't be fooled by the elevated volume in ESR: that's caused by the fact that we don't throttle crash report processing for the ESR channel. Actual crash volume on release should be ~10 times higher than the numbers show.

This is the code that runs to load libs before the sandbox goes up. The sender should just be sending through the libs[0] we have on the allow list in the code we're crashing in, i.e. this code is trying to load "libfreeblpriv3.so" and "libsoftokn3.so", failing to do so, and crashing as a result. Specifically, our dlopen call[1] is returning NULL instead of a library handle. I'm not sure why this would happen. We could add some handling here with dlerror(), though since that returns a string I'm not sure of the best way to link it to crash reports (we could search the string for known phrases and use a different MOZ_CRASH for each case?).

[0] https://searchfox.org/mozilla-central/rev/526a5089c61db85d4d43eb0e46edaf1f632e853a/dom/media/gmp/GMPParent.cpp#856
[1] https://searchfox.org/mozilla-central/rev/526a5089c61db85d4d43eb0e46edaf1f632e853a/dom/media/gmp/GMPChild.cpp#233
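A minimal sketch of the dlerror() idea above, assuming glibc-style error strings: it sorts the message into a few buckets so that each bucket could map to its own MOZ_CRASH() reason. The enum, its values, and ClassifyDlError are hypothetical names, not Gecko code, and the matched substrings are assumptions about glibc's wording.

```cpp
#include <cstring>

// Hypothetical failure buckets; names are illustrative, not from the Gecko tree.
enum class LibLoadFailure { NotFound, PermissionDenied, WrongElfClass, Other };

// Maps the text returned by dlerror() to a coarse category by looking for
// substrings that glibc typically includes in its messages (assumption).
// dlerror() can return nullptr, which is treated as Other.
LibLoadFailure ClassifyDlError(const char* msg) {
  if (msg == nullptr) {
    return LibLoadFailure::Other;
  }
  if (std::strstr(msg, "No such file or directory") != nullptr) {
    return LibLoadFailure::NotFound;
  }
  if (std::strstr(msg, "Permission denied") != nullptr) {
    return LibLoadFailure::PermissionDenied;
  }
  if (std::strstr(msg, "wrong ELF class") != nullptr) {
    return LibLoadFailure::WrongElfClass;
  }
  return LibLoadFailure::Other;
}
```

At the call site, dlerror() would need to be read immediately after the failing dlopen(), since it reports only the most recent error and clears it once read. A switch on the result could then select one of several MOZ_CRASH() calls, each with its own literal reason string.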

Severity: -- → S3
Priority: -- → P3

You could add a note to the crash report using AppendAppNotesToCrashReport(). You'd need to inspect the crashes manually afterwards but it usually does the job if you don't want to add a dedicated crash annotation (and if it's something you plan on removing after you've solved the issue).
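For the note approach, a sketch of just the string-building part, assuming a hypothetical note format. FormatLibLoadNote is an illustrative helper; in Gecko its result would be passed to AppendAppNotesToCrashReport(), which is left out here so the snippet compiles on its own.

```cpp
#include <string>

// Builds the text of a temporary crash-report note for a failed library load.
// The "GMP preload of ... failed: ..." wording is a hypothetical format.
// dlerror() may return nullptr, so a placeholder is used in that case.
std::string FormatLibLoadNote(const std::string& libName,
                              const char* dlerrorMsg) {
  std::string note = "GMP preload of ";
  note += libName;
  note += " failed: ";
  note += (dlerrorMsg != nullptr) ? dlerrorMsg : "(no dlerror message)";
  return note;
}
```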

Neat, I wasn't aware of that. Will cook up a patch.

Keywords: leave-open

Add a temporary note to this crash path to help diagnose why lib loads are
failing.

Assignee: nobody → bvandyk
Status: NEW → ASSIGNED
Pushed by bvandyk@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ac3a26e86df0
Add note to GMP lib load crash to diagnose reason. r=gsvelto

The leave-open keyword is there and there is no activity for 6 months.
:bryce, maybe it's time to close this bug?

Flags: needinfo?(bvandyk)
libfreeblpriv3.so: cannot open shared object file: No such file or directory

That's the error we're hitting. I assume libfreeblpriv3.so should always be packaged and present, so I don't know why it would be missing. A busted install? OS security preventing GMP from seeing the file? I'm not sure where to start; :jld, any ideas?

Flags: needinfo?(bvandyk) → needinfo?(jld)

The only thing I can think of is: GMP is the only remaining normal process type that still uses the plugin-container executable instead of running firefox with a special flag. (Bug 1114647 and related; this is where it's currently specified in the source. Originally all child processes used plugin-container, which was confusing.) I wonder if there's some OS-level security policy that's allowing only the firefox executable to see those libraries for some reason, although it would have to have allowed libxul.so and a few others to be loaded before getting to this point, so that's still a little confusing. Relatedly, these are probably Mozilla's builds (no distribution ID in the telemetry environment in the crashes I looked at), so they may be downloaded into somewhere in a user's home directory, which might be part of why a security policy would block things.

I also notice that the crashes are all from Debian or Debian-based distros (and an unusually large number from Kali), but regular Debian doesn't have any problems with this, so I don't know what the connection is there.

(Incidentally, it's not clear if there was ever any technical requirement to continue using plugin-container for plugins on Linux — I vaguely recall that it was needed for Flash on Windows for annoying reasons — so if its existence is causing problems then it's possible we could just get rid of it.)

Flags: needinfo?(jld)

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #9)

Relatedly, these are probably Mozilla's builds (no distribution ID in the telemetry environment in the crashes I looked at), so they may be downloaded into somewhere in a user's home directory, which might be part of why a security policy would block things.

I think I can find that out for you by looking at the crashes. In the past we found some crazy issues related to that, including a guy who had it installed under /root.

FYI I opened a bunch of crashes and Firefox seems to be installed where it's supposed to be installed by the package manager (under /usr).

(In reply to Gabriele Svelto [:gsvelto] from comment #11)

FYI I opened a bunch of crashes and Firefox seems to be installed where it's supposed to be installed by the package manager (under /usr).

If it's a downstream build, then I have another idea: they typically use the distro's packages for dependencies like NSS. And, looking at Debian's libnss3 package, the internal libraries like libfreebl3.so are in a different directory:

/usr/lib/x86_64-linux-gnu/libnss3.so
/usr/lib/x86_64-linux-gnu/libnssutil3.so
/usr/lib/x86_64-linux-gnu/libsmime3.so
/usr/lib/x86_64-linux-gnu/libssl3.so
/usr/lib/x86_64-linux-gnu/nss/libfreebl3.so
/usr/lib/x86_64-linux-gnu/nss/libfreeblpriv3.so
/usr/lib/x86_64-linux-gnu/nss/libnssckbi.so
/usr/lib/x86_64-linux-gnu/nss/libnssdbm3.so
/usr/lib/x86_64-linux-gnu/nss/libsoftokn3.so

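To make the mismatch concrete, here is a sketch of the "guess the paths" approach: try the bare library name first, then the Debian-style private nss/ directory from the listing above. NssLibCandidates is a hypothetical helper, and hard-coding the Debian layout is an assumption that would not cover other distros.

```cpp
#include <string>
#include <vector>

// Returns the locations to try, in order, for an NSS-internal library such as
// libfreeblpriv3.so. The bare name lets dlopen() apply its normal search
// rules; the second entry is the Debian-style /usr/lib/<triplet>/nss/ path.
// The multiarch triplet (e.g. "x86_64-linux-gnu") is passed in because it
// differs per architecture. Hypothetical helper, not Gecko code.
std::vector<std::string> NssLibCandidates(const std::string& libName,
                                          const std::string& multiarchTriplet) {
  std::vector<std::string> candidates;
  candidates.push_back(libName);
  if (!multiarchTriplet.empty()) {
    candidates.push_back("/usr/lib/" + multiarchTriplet + "/nss/" + libName);
  }
  return candidates;
}
```

Each candidate would be handed to dlopen() in turn until one returns a handle, all before the sandbox is enabled.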
And it looks like Debian's builds don't set distributionId (I'd misremembered that they did).

So now I have STR: using Debian's build of Firefox, load https://cpearce.github.io/mse-eme/. Example: bp-12fcc0d1-0464-4268-a2bc-3ab930210917

Also, a workaround: LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/nss (adjust as needed for arch to avoid the mistake seen in bp-b90fd437-7f52-4406-bccc-9305b0210917)
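The "adjust as needed for arch" part of the workaround can be sketched as a mapping from the "uname -m" machine name to Debian's multiarch triplet, followed by building the LD_LIBRARY_PATH value. The helper names and the short architecture table are illustrative assumptions; the table only covers x86_64, aarch64, and 32-bit x86.

```cpp
#include <optional>
#include <string>

// Maps a kernel machine name (as printed by "uname -m") to the Debian
// multiarch triplet. Deliberately partial; unknown machines yield nullopt.
std::optional<std::string> DebianTriplet(const std::string& machine) {
  if (machine == "x86_64") return "x86_64-linux-gnu";
  if (machine == "aarch64") return "aarch64-linux-gnu";
  if (machine == "i686" || machine == "i386") return "i386-linux-gnu";
  return std::nullopt;
}

// Builds the LD_LIBRARY_PATH value for the workaround: the arch-specific NSS
// directory first, followed by any existing value so other entries still work.
std::optional<std::string> NssWorkaroundLibraryPath(const std::string& machine,
                                                    const std::string& existing) {
  std::optional<std::string> triplet = DebianTriplet(machine);
  if (!triplet) return std::nullopt;
  std::string path = "/usr/lib/" + *triplet + "/nss";
  if (!existing.empty()) {
    path += ":" + existing;
  }
  return path;
}
```

Returning nullopt for unknown machines avoids silently pointing at the x86_64 directory on another architecture.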

What I don't know yet is how to fix this properly. If we could start NSS and get it to load the libraries (before starting the sandbox, but after we know the plugin is clearkey), that would hopefully be easier than trying to guess the paths.

I just had the same crash and I'm using Kali Linux with Firefox ESR installed from the distro's repository.

https://crash-stats.mozilla.org/report/index/932c1f77-58d2-49bc-85a7-ea4d40210923#tab-details

The library directory change appears to be part of this Debian patch to NSS. glandium, as the author of that patch, do you think we should try to change how we preload NSS for the clearkey EME plugin, or would it make more sense for Debian to carry a patch for Firefox to extend the sandbox policy, given that Debian's packaging system should know the exact library paths?

Flags: needinfo?(mh+mozilla)

On one hand, if LD_LIBRARY_PATH works around it, there shouldn't be a need to extend the sandbox policy; only the preloading in GMP* would need to change, which would be a Debian-specific thing.

On the other hand, shouldn't the preloading use NSS's own mechanisms rather than dlopen? (Especially in light of possibly linking NSS entirely statically, which I suppose is still a possibility some day.)

Flags: needinfo?(mh+mozilla)

The leave-open keyword is there and there is no activity for 6 months.
:bryce, maybe it's time to close this bug?

Flags: needinfo?(bvandyk)

Unassigning bugs assigned to Bryce because he no longer works at Mozilla.

Assignee: brycebugemail → nobody
Status: ASSIGNED → NEW

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → WORKSFORME