Open Bug 1594768 Opened 5 years ago Updated 1 year ago

Too many blob layers on https://www.goodfour.us/

Tracking

()

Status:

NEW

People

(Reporter: nical, Unassigned)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(4 files, 3 obsolete files)

Somewhat reduced: HTML file that includes the slow animation from github 4 years ago Jim Blandy :jimb 505 bytes, text/html		Details
A single layer from the animation 4 years ago Jim Blandy :jimb 480.75 KB, application/json		Details
Attempt to make the single-layer version viewable. 4 years ago Jim Blandy :jimb 415 bytes, text/html		Details
Attempt to make the single-layer version viewable. 4 years ago Jim Blandy :jimb 424 bytes, text/html		Details
Attempt to make the single-layer version viewable. 4 years ago Jim Blandy :jimb 386 bytes, text/html		Details
Attempt to make the single-layer version viewable. 4 years ago Jim Blandy :jimb 398 bytes, text/html		Details
A single frame of the animation. 4 years ago Jim Blandy :jimb 1.76 KB, image/svg+xml		Details

Nicolas Silva [:nical]

Reporter

Description

•

5 years ago

Just above halfway on the page, "Why Four Hours?" animation).

Nicolas Silva [:nical]

Reporter

Updated

•

5 years ago

Blocks: wr-blob-perf

Jim Blandy :jimb

Comment 1

•

4 years ago

Attached file Somewhat reduced: HTML file that includes the slow animation from github — Details

Jim Blandy :jimb

Comment 2

•

4 years ago

Attached file A single layer from the animation — Details

This is a reduced version of the animation on the original page, including only a single layer. This is a Lottie animation that decodes to an SVG consisting of a large square of solid color, masked to a small circle. The two seem to move together.

The full animation includes many layers, all using a similar technique. Firefox doesn't recognize that although the path in the layer is large, the visible portion of it is much smaller, so it ends up allocating a much larger backing pixel buffer than necessary, and drawing more area than necessary.

(Masking is expensive because it ostensibly requires us to fully render the underlying image (the large square), render the mask, compute a luminance value for each pixel, and then trim the underlying image according to the luminance values, a per-pixel multiplication.)

Firefox could render this faster by computing a bounding box for the portion of the mask with non-zero luminance, intersecting that with the bounding box of the underlying image, and using the result to decide what portion to render, cache, and draw.

Jim Blandy :jimb

Comment 3

•

4 years ago

Attached file Attempt to make the single-layer version viewable. (obsolete) — Details

Jim Blandy :jimb

Comment 4

•

4 years ago

Attached file Attempt to make the single-layer version viewable. (obsolete) — Details

Attachment #9121163 - Attachment is obsolete: true

Jim Blandy :jimb

Comment 5

•

4 years ago

Attached file Attempt to make the single-layer version viewable. (obsolete) — Details

Attachment #9121164 - Attachment is obsolete: true

Jeff Muizelaar [:jrmuizel]

Comment 6

•

4 years ago

So I believe we could get a significant improvement here if the computed geometry for the nsDisplayItemMasksAndClip included the bounds of the mask as is computed by ComputeMaskGeometry (https://searchfox.org/mozilla-central/source/layout/painting/nsDisplayList.cpp#9803).

Miko do you think intersecting the bounds of the mask in nsDisplayItemMasksAndClip is practical?

Jeff Muizelaar [:jrmuizel]

Updated

•

4 years ago

Flags: needinfo?(mikokm)

Jeff Muizelaar [:jrmuizel]

Comment 7

•

4 years ago

It would also be interesting to try disabling active layers to see how much worse they're making performance here.

Jim Blandy :jimb

Comment 8

•

4 years ago

Attached file Attempt to make the single-layer version viewable. — Details

Attachment #9121165 - Attachment is obsolete: true

Miko Mynttinen

Comment 9

•

4 years ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

Miko do you think intersecting the bounds of the mask in nsDisplayItemMasksAndClip is practical?

Unfortunately I'm not familiar enough with masking to answer this. Markus probably knows this code better.

Flags: needinfo?(mikokm) → needinfo?(mstange)

Jim Blandy :jimb

Updated

•

4 years ago

Comment 10

•

4 years ago

My laptop has a high-DPI screen and integrated Intel graphics. We observed the WRWorkerLP threads spending most of their time waiting for malloc locks, so we doubled the size of the tiles (to quarter the number of allocations) and lowered the number of worker threads to match physical cpus, since hyperthreading is unlikely to benefit blitting tasks, and they're spending their time waiting for the malloc lock anyway.

This lowers the time a WRWorkerLP spends blocked on malloc to 20%, but still gives poor frame rates, with 28% of the time spent on mask operations. The way to improve that time is to shrink the area of the mask, so we don't need to create the mask, compute its luminance, and so on. If we can use the mask region to choose smaller bounding rectangles into the blob image recordings for the SVG, that would reduce these costs.

Jim Blandy :jimb

Comment 11

•

4 years ago

If we force jemalloc to use a thread-local arena even for large allocations, the malloc lock overhead disappears from the profiles.

Perhaps we should change mozjemalloc.cpp's choose_arena to let each thread specify its own size threshold for redirecting requests to the default arena. Then WRWorkerLP threads could specify a large threshold, to remove contention over allocations like this, whereas other threads could continue to operate under the same parameters they do now.

Bert Peers [:bpeers]

Comment 12

•

4 years ago

Interesting. How large is 'large'? One would expect large allocations to be infrequent unless we're consuming enormous amount of memory, and therefor not conflict a lot, assuming all the small cruft goes to a per-thread SBA. Is the problem perhaps that we're constantly allocating and freeing these large chunks? If so, a per-thread re-use / delayed-free strategy could also be an idea?

Jeff Muizelaar [:jrmuizel]

Updated

•

4 years ago

Depends on: 1610409

Markus Stange [:mstange]

Comment 13

•

4 years ago

•

Edited

(In reply to Jeff Muizelaar [:jrmuizel] from comment #6)

So I believe we could get a significant improvement here if the computed geometry for the nsDisplayMasksAndClip included the bounds of the mask as is computed by ComputeMaskGeometry (https://searchfox.org/mozilla-central/source/layout/painting/nsDisplayList.cpp#9803).

So the way this is supposed to work is the following:

The nesting in the display list is the following: We have a nsDisplayMasksAndClip item, and inside of that we have the display items for the masked contents.
When we build the display items for the mask and the contents, we try to compute the mask bounds and apply them as a clip to the masked contents.
Now there is a clip on the masked contents, but not on the wrapping nsDisplayMasksAndClip item.
However, the clip on the contents affects the bounds of the nsDisplayMasksAndClip item. This happens as follows: The bounds of the nsDisplayMasksAndClip item is computed by nsDisplayWrapList::UpdateBounds, which calls nsDisplayList::GetClippedBoundsWithRespectToASR which unions the clipped bounds of its child items. This is where the intersection with the clip we computed above is performed.

We should verify that the call to ComputeClipForMaskItem in BuildDisplayListForStackingContext is hit and computes an appropriate rectangle.

Flags: needinfo?(mstange)

Markus Stange [:mstange]

Comment 14

•

4 years ago

The bounds and clips of the Gecko display items are printed as part of a Gecko display list dump. You can get one from your testcase by setting the pref layout.display-list.dump-content to true and running the testcase; this pref works in any Firefox build.

Jim Blandy :jimb

Comment 15

•

4 years ago

Attached image A single frame of the animation. — Details

Markus Stange [:mstange]

Comment 16

•

4 years ago

Jeff traced the rect computation to a call to GetBBox. GetBBox takes into account clip-paths but not masks, as per spec (see the "If clipped is true" pieces). So if we want to take masks into account we have to additionally intersect the result of GetBBox with the mask bounds, and we may need a way to compute those mask bounds if we don't already have one.

Jim Blandy :jimb

Comment 17

•

4 years ago

Okay, this makes sense. I'd observed GetBBox behaving that way but didn't realize it was per spec.

Jim Blandy :jimb

Comment 18

•

4 years ago

ComputeClipForMaskItem, however, is calling GetAndObserveMasks, suggesting that it already intends to take masks into account; however, it only consults the masks of the top item; in the example, the top item has a clip, but its sole child has a mask.

Nicolas Silva [:nical]

Reporter

Updated

•

4 years ago

No longer blocks: wr-perf

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Gregory Pappas [:gregp]

Updated

•

1 year ago

Blocks: lottie

You need to log in before you can comment on or make changes to this bug.