Closed Bug 1461875 Opened 7 years ago Closed 7 years ago

fd exhaustion from large number of Unix-domain stream sockets

Categories

(Core :: IPC, defect)

60 Branch
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: karlt, Unassigned)

References

Details

JavaScript error: undefined, line 0: Error: No matching message handler ###!!! [Parent][MessageChannel] Error: (msgtype=0x10003B,name=PBackground::Msg_DispatchLocalStorageChange) Channel closing: too late to send/recv, messages will be lost JavaScript error: undefined, line 0: Error: No matching message handler JavaScript error: undefined, line 0: Error: No matching message handler JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: Missing host permission for the tab JavaScript error: undefined, line 0: Error: No matching message handler JavaScript error: undefined, line 0: Error: No matching message handler JavaScript error: undefined, line 0: Error: No matching message handler JavaScript error: undefined, line 0: Error: No matching message handler Sandbox: bad read from pid 2057: Message too long Sandbox: Unexpected EOF, op 0 flags 00 path /sys/devices/system/cpu/present Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to lock new back buffer. (t=871384) [GFX1-]: Failed to lock new back buffer. [Parent 1868, Main Thread] WARNING: file /var/tmp/portage/www-client/firefox-60.0_beta14/work/mozilla-beta-98d536130349/ipc/chromium/src/base/shared_memory_posix.cc, line 216 Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to lock new back buffer. (t=871384) |[1][GFX1-]: Failed to lock new back buffer. (t=871384) [GFX1-]: Failed to lock new back buffer. [Parent 1868, Main Thread] WARNING: file /var/tmp/portage/www-client/firefox-60.0_beta14/work/mozilla-beta-98d536130349/ipc/chromium/src/base/shared_memory_posix.cc, line 216 Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to lock new back buffer. (t=871384) |[1][GFX1-]: Failed to lock new back buffer. (t=871384) |[2][GFX1-]: Failed to lock new back buffer. (t=871384) [GFX1-]: Failed to lock new back buffer. The Sandbox: errors are the first "non-normal" messages, after which some tabs fail to draw, presumably due to the graphics errors. This is with a 60.0b14 build that had been running for three weeks. 2057 is a Web Content process. 0.00 MB ── gfx-surface-image 0.00 MB ── gfx-textures 0.00 MB ── gfx-textures-peak 0.00 MB ── gfx-tiles-waste 67 ── ghost-windows 1,074.36 MB ── heap-allocated 1.00 MB ── heap-chunksize 1,622.00 MB ── heap-mapped 34 ── host-object-urls 2.27 MB ── imagelib-surface-cache-estimated-locked 2.27 MB ── imagelib-surface-cache-estimated-total 0 ── imagelib-surface-cache-overflow-count 11.89 MB ── js-main-runtime-temporary-peak 68 ── page-faults-hard 305,336,509 ── page-faults-soft 3,339.43 MB ── resident-peak 0.00 MB ── system-heap-allocated 0.00 MB ── tracelogger
> Sandbox: bad read from pid 2057: Message too long So, EMSGSIZE is what I mapped MSG_CTRUNC to when I wrote the sandbox file broker, because I failed to predict that usually when that happens it's fd exhaustion. See bug 1401774 about trying to fail more gracefully, or at least guess EMFILE instead. Assuming that's what it is, also see bug 1401776, where the fd limit was raised.
See Also: → 1401776, 1401774
Thank you, Jed. fd exhaustion seems likely here. % limit descriptors descriptors 1024 % limit -h descriptors descriptors 4096 % lsof -p 2057 | tail Web\x20Co 2057 karl 172u unix 0xffff880337ff8000 0t0 10779158 type=STREAM Web\x20Co 2057 karl 173u unix 0xffff8807cfc51800 0t0 10788356 type=STREAM Web\x20Co 2057 karl 174u sock 0,8 0t0 10213542 protocol: UNIX Web\x20Co 2057 karl 175u sock 0,8 0t0 10217489 protocol: UNIX Web\x20Co 2057 karl 176u sock 0,8 0t0 10208080 protocol: UNIX Web\x20Co 2057 karl 177u sock 0,8 0t0 10215516 protocol: UNIX Web\x20Co 2057 karl 178u unix 0xffff8805f8036c00 0t0 11026170 type=STREAM Web\x20Co 2057 karl 179u unix 0xffff8804571be000 0t0 11051859 type=STREAM Web\x20Co 2057 karl 180u unix 0xffff8805c49ce400 0t0 11216348 type=STREAM Web\x20Co 2057 karl 181u unix 0xffff8805c49c8800 0t0 11239458 type=STREAM % lsof -p 1868 | tail firefox 1868 karl 821u unix 0xffff8805ced1bc00 0t0 8381672 type=STREAM firefox 1868 karl 831u unix 0xffff88076a535400 0t0 10134347 type=STREAM firefox 1868 karl 834u unix 0xffff8805c49ca400 0t0 11216349 type=STREAM firefox 1868 karl 838u unix 0xffff8805a4f58000 0t0 10812764 type=STREAM firefox 1868 karl 841u unix 0xffff880102f22c00 0t0 11323987 type=STREAM firefox 1868 karl 845u unix 0xffff8805d36ba800 0t0 8366973 type=STREAM firefox 1868 karl 846u unix 0xffff88022627e400 0t0 11329039 type=STREAM firefox 1868 karl 865u unix 0xffff8808156cb400 0t0 8382617 type=STREAM firefox 1868 karl 894u unix 0xffff88053047a400 0t0 10812144 type=STREAM firefox 1868 karl 935u unix 0xffff88024de38000 0t0 10248396 type=STREAM Probably at least 95% of parent process fds would be unix type=STREAM, and not much less in the content process.
Those stream sockets are probably IPC top-level actors. I filed bug 1462168 for better accounting. I also took a look at the implementation of the StreamFilter WebExtension API (a nonportable Mozilla addition used by NoScript, among others), because that could create a lot of concurrent top-level actors… but if I understand correctly those shouldn't persist when the browser is idle unless there's a leak, and WebExtensions aren't out-of-process on Linux so those probably wouldn't be using socketpairs anyway. But that's all guesswork.
Just two WebExtensions enabled: Bugzilla Crash Stop Invert Colors
Component: Security: Process Sandboxing → IPC
See Also: → 1462168
Summary: Sandbox: bad read from pid NNNN: Message too long → fd exhaustion from large number of Unix-domain stream sockets
Karl, are you still seeing this? If so, the “ipc-channels” and “ipc-channels-peak” subtrees in about:memory should be informative.
Flags: needinfo?(karlt)
I have had a 62.0b18 build running for more than 7 weeks and have not seen similar symptoms, so I assume either the issue has been fixed, or I haven't been doing whatever was required to trigger this. (This build doesn't have the new about:memory support for bug 1462168.) % psu | grep firefox 1662 1249 S 1 Aug 21 /usr/lib64/firefox/firefox --sm-client-id 10dce46972000149852055200000026080018 1857 1662 S 0 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 1 -isForBrowser -prefsLen 69680 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab 2399 1662 S 1 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser -prefsLen 73865 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab 2846 1662 S 1 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser -prefsLen 73865 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab 2861 1662 S 0 Aug 21 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser -prefsLen 73865 -schedulerPrefs 0001,2 -parentBuildID 20180821113748 -greomni /usr/lib64/firefox/omni.ja -appomni /usr/lib64/firefox/browser/omni.ja -appdir /usr/lib64/firefox/browser 1662 tab 18181 1558 S 0 15:47 grep firefox % lsof -p 2861 | tail Web\x20Co 2861 karl 40u a_inode 0,11 0 3075 [eventpoll] Web\x20Co 2861 karl 41u a_inode 0,11 0 3075 [eventfd] Web\x20Co 2861 karl 42r FIFO 0,10 0t0 184144 pipe Web\x20Co 2861 karl 43u a_inode 0,11 0 3075 [eventfd] Web\x20Co 2861 karl 44w FIFO 0,10 0t0 184144 pipe Web\x20Co 2861 karl 51u sock 0,8 0t0 9162372 protocol: UNIX Web\x20Co 2861 karl 52u sock 0,8 0t0 76834942 protocol: UNIX Web\x20Co 2861 karl 53u sock 0,8 0t0 72243058 protocol: UNIX Web\x20Co 2861 karl 54u sock 0,8 0t0 76836896 protocol: UNIX Web\x20Co 2861 karl 57u sock 0,8 0t0 72255790 protocol: UNIX % lsof -p 1662 | tail firefox 1662 karl 211w FIFO 0,10 0t0 128627 pipe firefox 1662 karl 214u sock 0,8 0t0 76836895 protocol: UNIX firefox 1662 karl 217u a_inode 0,11 0 3075 [eventfd] firefox 1662 karl 224u REG 0,44 451 11607395 /home/karl/.cache/mozilla/firefox/0napnj6f.default/cache2/entries/7FC6F59E551AD926EB1B63FB3B1EAF3557965A43 firefox 1662 karl 225u unix 0xffff88084c867400 0t0 128630 type=STREAM firefox 1662 karl 226u a_inode 0,11 0 3075 [eventfd] firefox 1662 karl 236u REG 0,44 6432 5905765 /home/karl/.cache/mozilla/firefox/0napnj6f.default/cache2/entries/09666A2D197560C2734894AFBE9F106109153E63 firefox 1662 karl 240u REG 0,44 101 11592700 /home/karl/.cache/mozilla/firefox/0napnj6f.default/cache2/entries/A10DC5302AB850826103AD0DAC5197B406D93778 firefox 1662 karl 306u sock 0,8 0t0 9162371 protocol: UNIX firefox 1662 karl 425u sock 0,8 0t0 22395573 protocol: UNIX % lsof -p 1662 | grep STREAM | wc 37 333 3071 I haven't been using InvertColors so much, but it is still installed. Let's close this WFM for now at least, but I can try to remember to check those subtrees in a future build (if that's helpful).
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(karlt)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.