Closed Bug 1400532 Opened 2 years ago Closed 2 years ago

Crash in mozilla::wr::ShmSegmentsWriter::Write

Categories

(Core :: Graphics: WebRender, defect, P1, critical)

57 Branch
Unspecified
Windows 10
defect

Tracking


RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- unaffected
firefox56 --- unaffected
firefox57 --- fixed

People

(Reporter: calixte, Assigned: nical)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [clouseau][gfx-noted][wr-mvp])

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-b786ef22-888a-4954-a503-cbf070170916.
=============================================================

There are 9 crashes in nightly 57 starting with buildid 20170915100121. Analyzing the backtrace suggests the regression may have been introduced by patch [1], which fixed bug 1393031.

[1] https://hg.mozilla.org/mozilla-central/rev?node=0b78bddcd706d1c029699a304cc33a5e73b8188b
Flags: needinfo?(nical.bugzilla)
I think that this is caused by us running out of file descriptors because the default chunk size in IpcResourceUpdateQueue is too small.

I am looking into adjusting the chunk size by browsing around and looking for a good compromise between the number of chunks and the wasted space at the end of the last chunk. It wouldn't hurt to multiply the default chunk size by at least 4.
Assignee: nobody → nical.bugzilla
Status: NEW → ASSIGNED
Flags: needinfo?(nical.bugzilla)
Priority: -- → P3
Whiteboard: [clouseau] → [clouseau][gfx-noted][wr-mvp]
Ouch. So most of the IpcResourceUpdateQueues don't have any allocations (great, right now we don't allocate any shmems for those), and the majority of the update queues that do have allocations are within a few kilobytes, but every now and then we receive a very large blob image (several megabytes for a single allocation), which allocates hundreds of shmem segments at once.

I had a look at the shared memory implementation on Linux and Windows and I think that shmem pages that are never written to or read are not committed (I'd love to have confirmation of that on Windows), so we should be able to overshoot the default chunk size a little so that most update queues fit in a handful of chunks (I am thinking maybe 32k as the default chunk size).
For very large allocations (> 1 MB) we need a separate mechanism. I think that the ShmSegmentsWriter should put them in a separate list of large shmems that are allocated specifically for that purpose and are not shared with other allocations (for the sake of simplicity).

Another way could be to have variable length chunks, but I think that it would be more complicated and not that much better.

Separating out large allocations and having them in one contiguous block opens the door to keeping the large shmem around longer and having WebRender read directly from it, avoiding the copy in the future.

The other thing that would help is to make sure we move images out of blob image recordings, since I suspect they are the source of the large blobs, but we should still be able to handle large blobs for whatever other reason.
This adds a list of dedicated shmems for large allocations. Unlike small allocs, the large ones get one shmem per allocation. This also bumps the default chunk size for small allocations to 32k.

With this, the number of shmems used in each transaction is more "under control". I browsed around and got up to a dozen shmems allocated in a single transaction (vs hundreds before this patch in some cases).
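The small/large split described above can be sketched roughly as follows. This is a hypothetical illustration, not Gecko's actual ShmSegmentsWriter: plain vectors stand in for shmem segments, and the names (ShmWriterSketch, kChunkSize, kLargeThreshold) are made up for the example. Small writes are bump-allocated into fixed-size chunks and may span chunk boundaries; anything bigger than twice the chunk size gets its own dedicated segment.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr size_t kChunkSize = 32 * 1024;
constexpr size_t kLargeThreshold = 2 * kChunkSize;  // bigger blobs get a dedicated segment

struct ShmWriterSketch {
  std::vector<std::vector<uint8_t>> smallChunks;  // stand-ins for fixed-size shmems
  std::vector<std::vector<uint8_t>> largeAllocs;  // one "shmem" per large blob
  size_t cursor = 0;                              // bump pointer within the last small chunk

  void Write(const uint8_t* data, size_t size) {
    if (size > kLargeThreshold) {
      // Large blobs stay contiguous in their own segment, easy to hand off later.
      largeAllocs.emplace_back(data, data + size);
      return;
    }
    size_t written = 0;
    while (written < size) {
      if (smallChunks.empty() || cursor == kChunkSize) {
        smallChunks.emplace_back(kChunkSize);  // "allocate" a fresh fixed-size chunk
        cursor = 0;
      }
      size_t n = std::min(size - written, kChunkSize - cursor);
      std::copy(data + written, data + written + n,
                smallChunks.back().begin() + cursor);
      cursor += n;
      written += n;
    }
  }
};
```

A 200 KB blob would land in `largeAllocs` as a single segment, while a stream of kilobyte-sized writes packs into a handful of 32k chunks instead of one shmem each.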
Attachment #8909304 - Flags: review?(jmuizelaar)
Comment on attachment 8909304 [details] [diff] [review]
Separate small and large shmem allocations.

Review of attachment 8909304 [details] [diff] [review]:
-----------------------------------------------------------------

I think it's worth getting feedback from an IPC peer at this point.
Whiteboard: [clouseau][gfx-noted][wr-mvp] → [clouseau][gfx-noted][wr-mvp] [triage]
Comment on attachment 8909304 [details] [diff] [review]
Separate small and large shmem allocations.

David, Jeff and I would like some feedback from an ipc peer on the approach here.

A bit of background: this is on top of a change in bug 1393031 that introduced "IpcResourceUpdateQueue", which is a queue of webrender resource updates that have to be sent from the content to the parent process and have to refer to potentially large blobs of memory (moz2d recordings, font data, etc.). The problem with simply serializing these blobs with ipdl is that they get copied around a bunch of times, and on the parent side we need to put them into rust-owned memory which adds another copy. IPDL has a magic fast path for nsTArray that allows them to be moved and avoids some (but not all) of the copies, but I couldn't get something similar to work with rust-compatible vectors.

So, the solution I came up with was to just use shmems and try to move towards something closer to optimal with that. The way IpcResourceUpdateQueue works is to push blobs of bytes into a list of fixed size (32k) shmems using a bump allocator, and on the compositor side copy from the shmems to rust-owned vectors and deallocate the shmems as soon as we receive them. Blobs can end up split into several shmems. That's the current state of things in mozilla-central.

We occasionally run into very large blobs (5 MB) that will generate a lot of shmems and blow up the fd limit. So the next step (this patch) is to special-case large allocations (anything bigger than twice the default chunk size) and put them in a separate shmem of the size of the large allocation.

Let us know if it sounds like we are going in the wrong direction and/or if there is a way (in the short term) to send large blobs of bytes through IPDL with as few copies as possible.
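The compositor-side copy described above can be sketched like this: a logical byte range that may span several fixed-size chunks is reassembled into one contiguous vector (standing in for the Rust-owned Vec&lt;u8&gt; the parent builds). This is an illustrative sketch, not the real IpcResourceUpdateQueue reader; all names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Copy `length` bytes starting at logical `offset` out of a sequence of
// fixed-size chunks into one contiguous buffer, crossing chunk boundaries
// as needed. The bump allocator on the writer side guarantees the bytes
// are laid out back-to-back across chunks.
std::vector<uint8_t> ReadRange(const std::vector<std::vector<uint8_t>>& chunks,
                               size_t chunkSize, size_t offset, size_t length) {
  std::vector<uint8_t> out;
  out.reserve(length);
  while (length > 0) {
    size_t chunk = offset / chunkSize;          // which chunk holds this byte
    size_t within = offset % chunkSize;         // position inside that chunk
    size_t n = std::min(length, chunkSize - within);
    const auto& src = chunks[chunk];
    out.insert(out.end(), src.begin() + within, src.begin() + within + n);
    offset += n;
    length -= n;
  }
  return out;
}
```

Each `ReadRange` call costs one copy per blob; dedicated large shmems open the door to skipping even that copy by reading in place.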
Attachment #8909304 - Flags: feedback?(dvander)
Priority: P3 → P1
Whiteboard: [clouseau][gfx-noted][wr-mvp] [triage] → [clouseau][gfx-noted][wr-mvp]
Target Milestone: --- → mozilla57
So I looked again at what Mojo is doing, and they use a single shared memory buffer that's allocated at DataPipe creation time. This shared memory buffer is used as a ring buffer, and they block if there's not enough room inside of it.
https://cs.chromium.org/chromium/src/mojo/edk/system/data_pipe_producer_dispatcher.cc?q=DataPipeProducerDispatcher::WriteData&sq=package:chromium&l=99

This seems like a safer approach than doing allocations on demand. Having the blocking behaviour will make things more complicated, but we effectively have this complication with the current IPC I/O approach, so it seems like it could still be made workable.
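The Mojo-style alternative above amounts to a single fixed-capacity ring buffer. Here is a minimal single-threaded sketch (hypothetical names, no real shared memory or blocking): where Mojo's DataPipe would block the producer, this sketch just returns false so the caller can retry.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// One buffer allocated up front, reused as a ring: no per-transaction shmem
// allocation, so the fd count stays constant regardless of traffic.
class RingBufferSketch {
 public:
  explicit RingBufferSketch(size_t capacity) : buf_(capacity) {}

  // Returns false when there isn't enough free room; a real DataPipe-style
  // producer would block here until the consumer drains some bytes.
  bool Write(const uint8_t* data, size_t size) {
    if (size > buf_.size() - used_) return false;
    for (size_t i = 0; i < size; ++i) {
      buf_[(writePos_ + i) % buf_.size()] = data[i];
    }
    writePos_ = (writePos_ + size) % buf_.size();
    used_ += size;
    return true;
  }

  // Drains up to `size` bytes, returning how many were actually read.
  size_t Read(uint8_t* out, size_t size) {
    size_t n = std::min(size, used_);
    for (size_t i = 0; i < n; ++i) {
      out[i] = buf_[(readPos_ + i) % buf_.size()];
    }
    readPos_ = (readPos_ + n) % buf_.size();
    used_ -= n;
    return n;
  }

 private:
  std::vector<uint8_t> buf_;
  size_t readPos_ = 0, writePos_ = 0, used_ = 0;
};
```

The trade-off is visible even in the sketch: capacity is fixed at creation, so a blob larger than the buffer can never fit and the producer must either block in pieces or fall back to another path.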
Attachment #8909304 - Flags: review?(jmuizelaar) → review+
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d5e1533786ee
Separate small and large shmem allocations in IpcResourceUpdateQueue. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/d5e1533786ee
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
(In reply to Nicolas Silva [:nical] from comment #5)
> Let us know if it sounds like we are going in the wrong direction and/or if
> there is a way (in the short term) to send large blobs of bytes through IPDL
> with as few copies as possible.

Starting small and scaling up (or even back) based on allocation patterns seems reasonable. Even with moves fully supported in IPDL I don't think that would be the right route; shmems will work better. Do they get recycled back to the client?

On Windows, shmems should be allocated in 64KB chunks. That's the minimum size VirtualAlloc will return and anything else wastes address space.
Attachment #8909304 - Flags: feedback?(dvander) → feedback+
Blocks: 1402274
Shmems don't get recycled in the client at the moment. It shouldn't be too hard to add a bit of logic that does that, unless we want to implement something similar to Chromium's data pipe in the short term.

> On Windows, shmems should be allocated in 64KB chunks. That's the minimum size VirtualAlloc will return and anything else wastes address space.

Ah, this is good to know! Beyond 64k, are allocations rounded up to a multiple of something other than page sizes? Would we fall into some sort of sweet spot if we ask for 1024 * (64 - 2) bytes to account for ipdl's two guard pages?
IIRC, the page size is 4KB on Windows but the "allocation granularity" is 64KB. So yeah, the sweet spot would be 64KB minus 2 pages. It'd be nice if IPDL exposed a function to give you back an ideal size >= requested bytes.
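The helper suggested above might look like the following sketch. The constants and the two-guard-page overhead are assumptions taken from this discussion (4KB pages, 64KB VirtualAlloc granularity, two guard pages per shmem); `IdealShmemSize` is a hypothetical name, not an actual IPDL API.

```cpp
#include <cstddef>

constexpr size_t kPageSize = 4 * 1024;           // assumed Windows page size
constexpr size_t kAllocGranularity = 64 * 1024;  // VirtualAlloc granularity
constexpr size_t kGuardPages = 2;                // assumed IPDL guard pages

// Return the largest usable size >= `requested` such that the usable bytes
// plus the guard pages fill whole 64KB granules, wasting no address space.
size_t IdealShmemSize(size_t requested) {
  size_t overhead = kGuardPages * kPageSize;
  // Round (requested + overhead) up to a multiple of the granularity,
  // then give the caller everything but the guard pages.
  size_t total = ((requested + overhead + kAllocGranularity - 1) /
                  kAllocGranularity) * kAllocGranularity;
  return total - overhead;
}
```

Under these assumptions the sweet spot for one granule is 64KB minus two 4KB pages, i.e. 57344 bytes, rather than the 1024 * (64 - 2) guess above, which assumed smaller pages.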