Closed Bug 1461875 Opened 2 years ago Closed 2 years ago

fd exhaustion from large number of Unix-domain stream sockets

Categories

(Core :: IPC, defect)

60 Branch
defect
Not set

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: karlt, Unassigned)

References

Details

JavaScript error: undefined, line 0: Error: No matching message handler

###!!! [Parent][MessageChannel] Error: (msgtype=0x10003B,name=PBackground::Msg_DispatchLocalStorageChange) Channel closing: too late to send/recv, messages will be lost

JavaScript error: undefined, line 0: Error: No matching message handler
JavaScript error: undefined, line 0: Error: No matching message handler
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: Missing host permission for the tab
JavaScript error: undefined, line 0: Error: No matching message handler
JavaScript error: undefined, line 0: Error: No matching message handler
JavaScript error: undefined, line 0: Error: No matching message handler
JavaScript error: undefined, line 0: Error: No matching message handler
Sandbox: bad read from pid 2057: Message too long
Sandbox: Unexpected EOF, op 0 flags 00 path /sys/devices/system/cpu/present
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to lock new back buffer. (t=871384) [GFX1-]: Failed to lock new back buffer.
[Parent 1868, Main Thread] WARNING: file /var/tmp/portage/www-client/firefox-60.0_beta14/work/mozilla-beta-98d536130349/ipc/chromium/src/base/shared_memory_posix.cc, line 216
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to lock new back buffer. (t=871384) |[1][GFX1-]: Failed to lock new back buffer. (t=871384) [GFX1-]: Failed to lock new back buffer.
[Parent 1868, Main Thread] WARNING: file /var/tmp/portage/www-client/firefox-60.0_beta14/work/mozilla-beta-98d536130349/ipc/chromium/src/base/shared_memory_posix.cc, line 216
Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to lock new back buffer. (t=871384) |[1][GFX1-]: Failed to lock new back buffer. (t=871384) |[2][GFX1-]: Failed to lock new back buffer. (t=871384) [GFX1-]: Failed to lock new back buffer.

The Sandbox: errors are the first "non-normal" messages, after which some tabs fail to draw, presumably due to the graphics errors.

This is with a 60.0b14 build that had been running for three weeks.
2057 is a Web Content process.

    0.00 MB ── gfx-surface-image
    0.00 MB ── gfx-textures
    0.00 MB ── gfx-textures-peak
    0.00 MB ── gfx-tiles-waste
         67 ── ghost-windows
1,074.36 MB ── heap-allocated
    1.00 MB ── heap-chunksize
1,622.00 MB ── heap-mapped
         34 ── host-object-urls
    2.27 MB ── imagelib-surface-cache-estimated-locked
    2.27 MB ── imagelib-surface-cache-estimated-total
          0 ── imagelib-surface-cache-overflow-count
   11.89 MB ── js-main-runtime-temporary-peak
         68 ── page-faults-hard
305,336,509 ── page-faults-soft
3,339.43 MB ── resident-peak
    0.00 MB ── system-heap-allocated
    0.00 MB ── tracelogger
> Sandbox: bad read from pid 2057: Message too long

So, EMSGSIZE is what I mapped MSG_CTRUNC to when I wrote the sandbox file broker, because I failed to predict that usually when that happens it's fd exhaustion.  See bug 1401774 about trying to fail more gracefully, or at least guess EMFILE instead.

Assuming that's what it is, also see bug 1401776, where the fd limit was raised.
See Also: → 1401776, 1401774
Thank you, Jed.  fd exhaustion seems likely here.

% limit descriptors
descriptors     1024
% limit -h descriptors
descriptors     4096

% lsof -p 2057 | tail
Web\x20Co 2057 karl  172u     unix 0xffff880337ff8000      0t0 10779158 type=STREAM
Web\x20Co 2057 karl  173u     unix 0xffff8807cfc51800      0t0 10788356 type=STREAM
Web\x20Co 2057 karl  174u     sock                0,8      0t0 10213542 protocol: UNIX
Web\x20Co 2057 karl  175u     sock                0,8      0t0 10217489 protocol: UNIX
Web\x20Co 2057 karl  176u     sock                0,8      0t0 10208080 protocol: UNIX
Web\x20Co 2057 karl  177u     sock                0,8      0t0 10215516 protocol: UNIX
Web\x20Co 2057 karl  178u     unix 0xffff8805f8036c00      0t0 11026170 type=STREAM
Web\x20Co 2057 karl  179u     unix 0xffff8804571be000      0t0 11051859 type=STREAM
Web\x20Co 2057 karl  180u     unix 0xffff8805c49ce400      0t0 11216348 type=STREAM
Web\x20Co 2057 karl  181u     unix 0xffff8805c49c8800      0t0 11239458 type=STREAM
% lsof -p 1868 | tail
firefox 1868 karl  821u     unix 0xffff8805ced1bc00      0t0   8381672 type=STREAM
firefox 1868 karl  831u     unix 0xffff88076a535400      0t0  10134347 type=STREAM
firefox 1868 karl  834u     unix 0xffff8805c49ca400      0t0  11216349 type=STREAM
firefox 1868 karl  838u     unix 0xffff8805a4f58000      0t0  10812764 type=STREAM
firefox 1868 karl  841u     unix 0xffff880102f22c00      0t0  11323987 type=STREAM
firefox 1868 karl  845u     unix 0xffff8805d36ba800      0t0   8366973 type=STREAM
firefox 1868 karl  846u     unix 0xffff88022627e400      0t0  11329039 type=STREAM
firefox 1868 karl  865u     unix 0xffff8808156cb400      0t0   8382617 type=STREAM
firefox 1868 karl  894u     unix 0xffff88053047a400      0t0  10812144 type=STREAM
firefox 1868 karl  935u     unix 0xffff88024de38000      0t0  10248396 type=STREAM

Probably at least 95% of parent process fds would be unix type=STREAM, and not much less in the content process.
Those stream sockets are probably IPC top-level actors.  I filed bug 1462168 for better accounting.

I also took a look at the implementation of the StreamFilter WebExtension API (a nonportable Mozilla addition used by NoScript, among others), because that could create a lot of concurrent top-level actors… but if I understand correctly those shouldn't persist when the browser is idle unless there's a leak, and WebExtensions aren't out-of-process on Linux so those probably wouldn't be using socketpairs anyway.  But that's all guesswork.
Just two WebExtensions enabled:

Bugzilla Crash Stop
Invert Colors
Component: Security: Process Sandboxing → IPC
See Also: → 1462168
Summary: Sandbox: bad read from pid NNNN: Message too long → fd exhaustion from large number of Unix-domain stream sockets
Karl, are you still seeing this?

If so, the “ipc-channels” and “ipc-channels-peak” subtrees in about:memory should be informative.
Flags: needinfo?(karlt)
I have had a 62.0b18 build running for more than 7 weeks and have not seen similar symptoms, so I assume either the issue has been fixed, or I haven't been doing whatever was required to trigger this.

(This build doesn't have the new about:memory support for bug 1462168.)

% psu | grep firefox
 1662  1249 S  1 Aug 21 /usr/lib64/firefox/firefox --sm-client-id 10dce46972000149852055200000026080018
 1857  1662 S  0 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 1 -isForBrowser -prefsLen 69680 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab
 2399  1662 S  1 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser -prefsLen 73865 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab
 2846  1662 S  1 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser -prefsLen 73865 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab
 2861  1662 S  0 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser -prefsLen 73865 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab
18181  1558 S  0  15:47 grep firefox
% lsof -p 2861 | tail
Web\x20Co 2861 karl   40u  a_inode               0,11        0     3075 [eventpoll]
Web\x20Co 2861 karl   41u  a_inode               0,11        0     3075 [eventfd]
Web\x20Co 2861 karl   42r     FIFO               0,10      0t0   184144 pipe
Web\x20Co 2861 karl   43u  a_inode               0,11        0     3075 [eventfd]
Web\x20Co 2861 karl   44w     FIFO               0,10      0t0   184144 pipe
Web\x20Co 2861 karl   51u     sock                0,8      0t0  9162372 protocol: UNIX
Web\x20Co 2861 karl   52u     sock                0,8      0t0 76834942 protocol: UNIX
Web\x20Co 2861 karl   53u     sock                0,8      0t0 72243058 protocol: UNIX
Web\x20Co 2861 karl   54u     sock                0,8      0t0 76836896 protocol: UNIX
Web\x20Co 2861 karl   57u     sock                0,8      0t0 72255790 protocol: UNIX
% lsof -p 1662 | tail
firefox 1662 karl  211w     FIFO               0,10      0t0    128627 pipe
firefox 1662 karl  214u     sock                0,8      0t0  76836895 protocol: UNIX
firefox 1662 karl  217u  a_inode               0,11        0      3075 [eventfd]
firefox 1662 karl  224u      REG               0,44      451  11607395 /home/karl/.cache/mozilla/firefox/0napnj6f.default/cache2/entries/7FC6F59E551AD926EB1B63FB3B1EAF3557965A43
firefox 1662 karl  225u     unix 0xffff88084c867400      0t0    128630 type=STREAM
firefox 1662 karl  226u  a_inode               0,11        0      3075 [eventfd]
firefox 1662 karl  236u      REG               0,44     6432   5905765 /home/karl/.cache/mozilla/firefox/0napnj6f.default/cache2/entries/09666A2D197560C2734894AFBE9F106109153E63
firefox 1662 karl  240u      REG               0,44      101  11592700 /home/karl/.cache/mozilla/firefox/0napnj6f.default/cache2/entries/A10DC5302AB850826103AD0DAC5197B406D93778
firefox 1662 karl  306u     sock                0,8      0t0   9162371 protocol: UNIX
firefox 1662 karl  425u     sock                0,8      0t0  22395573 protocol: UNIX
% lsof -p 1662 | grep STREAM | wc
     37     333    3071

I haven't been using InvertColors so much, but it is still installed.

Let's close this WFM for now at least, but I can try to remember to check those subtrees in a future build (if that's helpful).
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(karlt)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.