Closed Bug 1539188 Opened 5 years ago Closed 5 years ago

Tab Rendering Issues and Tab Crashes

Categories

(Core :: Security: PSM, defect)

66 Branch
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: carlo, Unassigned)

Details

Attachments

(3 files, 1 obsolete file)

Attached file firefox.log

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0

Steps to reproduce:

Use Firefox for an extended period of time. I have not found any set of steps that reproduces the issue directly, aside from long-term usage.

Actual results:

Tabs start to render incorrectly, often displaying only background colors or a loading symbol.
After a period of time, the tabs crash.
Restarting Firefox fixes the issue.

Expected results:

Tabs should not render incorrectly or crash.

Thanks for filing. Can you kindly paste one of your crash reports from about:crashes? Thanks.

Flags: needinfo?(carlo)

Thanks for the report. Based on the signature and the fact you are running Linux, this appears to be a dupe of Bug 1514734.

Status: UNCONFIRMED → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE

I don't think this is a dupe. Most of the crashes in bug 1514734 happen in the parent process, when we first try to allocate shared memory, most likely either due to OOM or limits on the size of /dev/shm.

This crash happens in a content process, when trying to map the shared memory. It's possible that could happen because of OOM, but not likely. It should really only happen if we don't have a contiguous virtual memory region large enough to map it. And that really shouldn't happen so soon after content process startup.

And it doesn't explain the other odd behavior reported in comment 0.

Jed, does anything about this stand out to you?

Status: RESOLVED → REOPENED
Ever confirmed: true
Flags: needinfo?(jld)
Resolution: DUPLICATE → ---

A lot of things about that log file stand out to me; here's the first one:

Sandbox: Unexpected EOF, op 0 flags 02400302 path /dev/shm/org.mozilla.ipc.17966.1
Sandbox: bad read from pid 17966: Message too long
Sandbox: Unexpected EOF, op 0 flags 02400302 path /dev/shm/org.mozilla.ipc.17966.1
Sandbox: bad read from pid 17966: Message too long
[Child 17966, Main Thread] WARNING: failed to open shm: Input/output error: file /build/firefox/src/mozilla-unified/ipc/chromium/src/base/shared_memory_posix.cc, line 142
[Child 17966, Main Thread] WARNING: failed to open shm: Input/output error: file /build/firefox/src/mozilla-unified/ipc/chromium/src/base/shared_memory_posix.cc, line 142
[Child 17966, Main Thread] WARNING: failed to open shm: Broken pipe: file /build/firefox/src/mozilla-unified/ipc/chromium/src/base/shared_memory_posix.cc, line 142

This looks like file descriptor exhaustion in the parent process (see bug 1401774 for why the error is reported as "Message too long"). And farther down in the log:

[Parent 12434, Main Thread] WARNING: failed to open shm: Too many open files: file /build/firefox/src/mozilla-unified/ipc/chromium/src/base/shared_memory_posix.cc, line 142

Also this (which is the same thing that happened to the sandbox broker, but for normal IPC):

[Parent 12434, Gecko_IOThread] WARNING: Message needs unreceived descriptors channel:6006064f1000 message-type:65531 header()->num_fds:1 num_fds:0 fds_i:0: file /build/firefox/src/mozilla-unified/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 536

And I'm kind of belaboring the point by now but here's an interestingly large file descriptor:

[Parent 12434, Gecko_IOThread] WARNING: pipe error (3263): Connection reset by peer: file /build/firefox/src/mozilla-unified/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 357

The usual diagnostic for this is to get the parent process's pid from ps x or similar and do lsof -n -p [insert pid here] to see what kind of fd is leaking.
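On Linux, roughly the same information that lsof reports can be read straight from /proc. The following is a minimal sketch, not a Firefox or lsof tool: the function name count_fd_targets is made up for illustration, and it assumes a Linux /proc filesystem and permission to inspect the target process.

```python
import os
import sys
from collections import Counter


def count_fd_targets(pid="self"):
    """Count a process's open descriptors, grouped by what they point to.

    Hypothetical helper: reads the symlinks under /proc/<pid>/fd (Linux only).
    """
    fd_dir = f"/proc/{pid}/fd"
    counts = Counter()
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
        counts[target] += 1
    return counts


if __name__ == "__main__":
    # Usage: python fdcount.py <pid>   (defaults to the current process)
    pid = sys.argv[1] if len(sys.argv) > 1 else "self"
    for target, n in count_fd_targets(pid).most_common(10):
        print(f"{n:6d}  {target}")
```

A leak shows up as one target whose count keeps growing between runs while the others stay roughly flat.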

Flags: needinfo?(jld)

Hm. Interesting. I guess if we've exhausted file descriptors in the parent process we'd fail to dup the file descriptors when we try to send them to the child. It's interesting that it would fail in this way, though...
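To illustrate the failure mode guessed at here, the sketch below lowers the soft descriptor limit of the current process and calls dup() until the kernel refuses with EMFILE, the errno behind "Too many open files". It is an assumption-laden illustration rather than Firefox's IPC code: the function name and the limit of 64 are arbitrary, and it assumes a POSIX system where the hard limit is at least 64.

```python
import errno
import os
import resource


def dup_until_exhausted(limit=64):
    """Lower the soft fd limit, then dup() until the kernel refuses.

    Returns (number of successful dups, errno of the failure).
    Assumes `limit` does not exceed the hard RLIMIT_NOFILE.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    src = os.open(os.devnull, os.O_RDONLY)  # open before lowering the limit
    dups = []
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
        while True:
            dups.append(os.dup(src))
    except OSError as exc:
        return len(dups), exc.errno
    finally:
        # Release everything and restore the original limit.
        for fd in dups:
            os.close(fd)
        os.close(src)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


if __name__ == "__main__":
    n, err = dup_until_exhausted()
    print(n, errno.errorcode[err], os.strerror(err))
```

Once a process sits at its limit, every operation that needs a new descriptor fails the same way, whether it is opening a shared memory segment or duplicating a descriptor to pass to another process.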

Has STR: --- → yes

(see comment 5)

Flags: needinfo?(carlo)
Attached file lsof (obsolete)

not sure if I left it running long enough but here is the lsof output

Flags: needinfo?(carlo)

(In reply to carlo from comment #8)

> Created attachment 9053993 [details]
> lsof
>
> not sure if I left it running long enough but here is the lsof output

Is this after you've started seeing problems? This is a fairly reasonable number of file descriptors for the parent process. Though there are a lot of open copies of /etc/ca-certificates/trust-source/anchors, which is surprising.

No. Unfortunately I am fighting another bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1539594) which results in crashes after prolonged usage. I will post another lsof once this issue occurs again. I figured I would send along output without the issue initially just in case anything looked fishy.

Flags: needinfo?(carlo)

(In reply to Kris Maglione [:kmag] from comment #9)

> (In reply to carlo from comment #8)
>
> > Created attachment 9053993 [details]
> > lsof
> >
> > not sure if I left it running long enough but here is the lsof output
>
> Is this after you've started seeing problems? This is a fairly reasonable number of file descriptors for the parent process. Though there are a lot of open copies of /etc/ca-certificates/trust-source/anchors, which is surprising.

Keeler, do you have any idea how the trust anchors file gets opened, or why so many copies of it are opened in this instance?

I suppose it comes from some sort of OS-specific API, but I don't even know where to start looking.

Flags: needinfo?(dkeeler)

Some Linux distros (notably Fedora) change how the default softoken works by having it also look in /etc/ca-certificates/... for trust information. So if opening that file a lot is causing problems, it's not directly due to our code. My advice would be to run Fedora's version of Firefox under gdb with the relevant debug packages installed and break when that file gets opened.

Flags: needinfo?(dkeeler)

Well, opening the file a lot should be fine. Keeping multiple copies open, on the other hand...

Anyway, this is way out of my bailiwick. If that turns out to be the problem, I'm not sure what component to move the bug to.

Probably PSM or NSS.

Carlo, are you using a version of Firefox downloaded from mozilla.org or one provided with your Linux distro? If the latter, which one?

Component: Untriaged → Security: PSM
Product: Firefox → Core

(In reply to Dana Keeler (she/her) (use needinfo) (:keeler for reviews) from comment #16)

Looking at Arch Linux's nss package, it might be a problem with p11-kit: https://git.archlinux.org/svntogit/packages.git/tree/trunk/PKGBUILD?h=packages/nss#n79

I think that's probably the case. There's at least one place where the current version leaks a directory descriptor if there's a failure:

https://github.com/p11-glue/p11-kit/blob/793cc3b78f17bb5a3c151eba1144b73a5d51be3e/trust/token.c#L244-L286

And another one was fixed recently:

https://github.com/p11-glue/p11-kit/commit/e2170b295992cb7fdf115227a78028ac3780619f#diff-a98a07a4da044ade50fcbd24dd618fdc

Oh, C...

I'm going to wait until we get confirmation that this is the cause of the file descriptor exhaustion, but it seems the most likely explanation. If it is, I'll close this and we'll need to file an upstream bug.
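The kind of failure-path leak described above can be sketched as follows. This is a Python stand-in for the general pattern, not p11-kit's actual C code: load_all_leaky, load_all_fixed, open_fd_count, and the load_entry callback are all hypothetical names, and the descriptor counter assumes a Linux /proc filesystem.

```python
import os


def load_all_leaky(path, load_entry):
    """Open a directory and load each entry; leaks the fd if one fails."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    for name in os.listdir(fd):
        if not load_entry(name):
            return False  # BUG: early return skips os.close(fd)
    os.close(fd)
    return True


def load_all_fixed(path, load_entry):
    """Same logic, but the descriptor is closed on every exit path."""
    fd = os.open(path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        return all(load_entry(name) for name in os.listdir(fd))
    finally:
        os.close(fd)


def open_fd_count():
    """Number of entries in /proc/self/fd (Linux only)."""
    return len(os.listdir("/proc/self/fd"))
```

If a single entry in the directory consistently fails to load, every call to the leaky variant leaves one more descriptor open on that directory, which is consistent with many open copies of the same path accumulating over a long session.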

Attached file lsof

Reproduced the error message, but interestingly I have not gotten the strange rendering or a crash yet. Definitely looks like there is a file descriptor leak...

Attachment #9053993 - Attachment is obsolete: true
Flags: needinfo?(carlo)
Attached file file.html

For reference, it seems that opening new HTTPS sites causes the leak, so using this file produced the error messages pretty quickly.

Yup. You have over 3,000 open file descriptors for /etc/ca-certificates/trust-source/anchors there. That's definitely the problem.

I suppose that means you have a corrupt or unreadable file in that directory. Fixing that might fix your immediate problem, but you should probably file upstream bugs for p11-kit and the Arch Linux NSS package in case others run into it.

Closing as invalid, since this is an upstream bug that we can't do anything about here.

Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → INVALID