Bug 1110530 (opened 10 years ago, closed 10 years ago)

GiBs of cumulative heap allocations under nsRegion::Or() when viewing Treeherder

Component: Core :: Layout
Type: defect
Severity: normal
Status: RESOLVED FIXED
Target Milestone: mozilla37
Reporter: n.nethercote
Assignee: jrmuizel
Attachments: 2 files
I did cumulative heap profiling with DMD while viewing several Treeherder tabs, which is a good stress test for layout stuff.

I see GiBs of allocations under nsRegion::Or().

> Cumulative {
>   ~340,938 blocks in heap block record 1 of 18,221
>   ~1,974,381,515 bytes (~1,568,500,507 requested / ~405,881,008 slop)
>   Individual block sizes: 12,288 x 29,986; 8,192 x 81,198; 4,096 x 118,803; ~4,093 x 110,951
>   22.81% of the heap (22.81% cumulative)
>   Allocated at {
>     #01: alloc_data (/home/njn/moz/mi4/co64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:214)
>     #02: pixman_rect_alloc (/home/njn/moz/mi4/co64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:465)
>     #03: pixman_op (/home/njn/moz/mi4/co64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:828)
>     #04: _moz_pixman_region32_union (/home/njn/moz/mi4/co64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:1440)
>     #05: _moz_pixman_region32_union_rect (/home/njn/moz/mi4/co64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:1368)
>     #06: nsRegion::Or(nsRegion const&, nsRect const&) (/home/njn/moz/mi4/o64dmd/layout/mathml/../../dist/include/nsRegion.h:140)
>     #07: nsIntRegion::Or(nsIntRegion const&, nsIntRect const&) (/home/njn/moz/mi4/o64dmd/layout/svg/../../dist/include/nsRegion.h:532)
>     #08: mozilla::PaintedLayerData::Accumulate(mozilla::ContainerState*, nsDisplayItem*, nsIntRegion const&, nsIntRect const&, nsIntRect const&, mozilla::DisplayItemClip const&) (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/FrameLayerBuilder.cpp:2356)
>     #09: mozilla::ContainerState::ProcessDisplayItems(nsDisplayList*) (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/FrameLayerBuilder.cpp:3077)
>     #10: mozilla::FrameLayerBuilder::BuildContainerLayerFor(nsDisplayListBuilder*, mozilla::layers::LayerManager*, nsIFrame*, nsDisplayItem*, nsDisplayList*, mozilla::ContainerLayerParameters const&, mozilla::gfx::Matrix4x4 const*, unsigned int) (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/FrameLayerBuilder.cpp:4037)
>     #11: nsDisplayList::PaintRoot(nsDisplayListBuilder*, nsRenderingContext*, unsigned int) (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/nsDisplayList.cpp:1467)
>     #12: nsLayoutUtils::PaintFrame(nsRenderingContext*, nsIFrame*, nsRegion const&, unsigned int, unsigned int) (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/nsLayoutUtils.cpp:3129)
>     #13: PresShell::Paint(nsView*, nsRegion const&, unsigned int) (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/nsPresShell.cpp:6343)
>     #14: nsViewManager::ProcessPendingUpdatesPaint(nsIWidget*) (/home/njn/moz/mi4/o64dmd/view/../../view/nsViewManager.cpp:443)
>     #15: nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) (/home/njn/moz/mi4/o64dmd/view/../../view/nsViewManager.cpp:380)
>     #16: nsRefreshDriver::Tick(long, mozilla::TimeStamp) (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/nsRefreshDriver.cpp:1358)
>     #17: mozilla::RefreshDriverTimer::Tick() (/home/njn/moz/mi4/o64dmd/layout/base/../../../layout/base/nsRefreshDriver.cpp:160)
>     #18: nsTimerImpl::Fire() (/home/njn/moz/mi4/co64dmd/xpcom/threads/../../../xpcom/threads/nsTimerImpl.cpp:637)
>     #19: nsTimerEvent::Run() (/home/njn/moz/mi4/co64dmd/xpcom/threads/../../../xpcom/threads/nsTimerImpl.cpp:719)
>     #20: nsThread::ProcessNextEvent(bool, bool*) (/home/njn/moz/mi4/co64dmd/xpcom/threads/../../../xpcom/threads/nsThread.cpp:830)
>     #21: NS_ProcessNextEvent(nsIThread*, bool) (/home/njn/moz/mi4/xpcom/glue/nsThreadUtils.cpp:265)
>     #22: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (/home/njn/moz/mi4/o64dmd/ipc/glue/../../../ipc/glue/MessagePump.cpp:99)
>     #23: MessageLoop::Run() (/home/njn/moz/mi4/co64dmd/ipc/chromium/../../../ipc/chromium/src/base/message_loop.cc:201)
>   }
> }

Note that the count of ~340,938 blocks is probably a significant underestimate
due to sampling, because lots of these allocations are small.

I did some ad hoc instrumentation of the cases where pixman_op() calls
pixman_rect_alloc(). From a different run, here are the most common frequencies:

> 5415811 counts:
> (  1)  2765520 (51.1%, 51.1%): alloc: 1, 1 --> 2 (48 bytes)
> (  2)   861066 (15.9%, 67.0%): alloc: 2, 1 --> 4 (80 bytes)
> (  3)   334603 ( 6.2%, 73.1%): alloc: 4, 1 --> 8 (144 bytes)
> (  4)   270705 ( 5.0%, 78.1%): alloc: 3, 1 --> 6 (112 bytes)
> (  5)    41636 ( 0.8%, 78.9%): alloc: 5, 1 --> 10 (176 bytes)
> (  6)    35948 ( 0.7%, 79.6%): alloc: 1, 2 --> 4 (80 bytes)
> (  7)    30061 ( 0.6%, 80.1%): alloc: 11, 1 --> 22 (368 bytes)
> (  8)    29318 ( 0.5%, 80.7%): alloc: 7, 1 --> 14 (240 bytes)
> (  9)    28716 ( 0.5%, 81.2%): alloc: 19, 1 --> 38 (624 bytes)
> ( 10)    26825 ( 0.5%, 81.7%): alloc: 9, 1 --> 18 (304 bytes)
> ( 11)    26640 ( 0.5%, 82.2%): alloc: 6, 1 --> 12 (208 bytes)
> ( 12)    23776 ( 0.4%, 82.6%): alloc: 15, 1 --> 30 (496 bytes)
> ( 13)    22651 ( 0.4%, 83.0%): alloc: 13, 1 --> 26 (432 bytes)
> ( 14)    19350 ( 0.4%, 83.4%): alloc: 17, 1 --> 34 (560 bytes)
> ( 15)    13720 ( 0.3%, 83.7%): alloc: 8, 1 --> 16 (272 bytes)
> ( 16)    13417 ( 0.2%, 83.9%): alloc: 25, 1 --> 50 (816 bytes)
> ( 17)    12150 ( 0.2%, 84.1%): alloc: 1, 3 --> 6 (112 bytes)
> ( 18)    11147 ( 0.2%, 84.3%): alloc: 29, 1 --> 58 (944 bytes)
> ( 19)    10806 ( 0.2%, 84.5%): alloc: 23, 1 --> 46 (752 bytes)
> ( 20)    10727 ( 0.2%, 84.7%): alloc: 21, 1 --> 42 (688 bytes)
> ( 21)    10545 ( 0.2%, 84.9%): alloc: 18, 1 --> 36 (592 bytes)
> ( 22)    10374 ( 0.2%, 85.1%): alloc: 10, 1 --> 20 (336 bytes)
> ( 23)    10153 ( 0.2%, 85.3%): alloc: 213, 1 --> 426 (6832 bytes)
> ( 24)     9583 ( 0.2%, 85.5%): alloc: 1, 4 --> 8 (144 bytes)
> ( 25)     9563 ( 0.2%, 85.7%): alloc: 27, 1 --> 54 (880 bytes)
> ( 26)     9114 ( 0.2%, 85.8%): alloc: 217, 1 --> 434 (6960 bytes)
> ( 27)     8814 ( 0.2%, 86.0%): alloc: 16, 1 --> 32 (528 bytes)
> ( 28)     8327 ( 0.2%, 86.1%): alloc: 31, 1 --> 62 (1008 bytes)
> ( 29)     8203 ( 0.2%, 86.3%): alloc: 33, 1 --> 66 (1072 bytes)
> ( 30)     8005 ( 0.1%, 86.4%): alloc: 215, 1 --> 430 (6896 bytes)

The first line says that 5.4 million allocations were done.

The second line says that for 51.1% of those calls both input regions had one
rectangle, the result was sized for two, and the allocation for the result was
48 bytes.

The third line says that for 15.9% of those calls the first region had two
rectangles, the second had one, and the result was sized for four, which cost
80 bytes. And so on.
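
(For reference, a minimal sketch of the kind of ad hoc counting used here --
hypothetical, since the actual instrumentation patch wasn't posted. One line
is printed at the point where pixman_op() calls pixman_rect_alloc(), and the
lines are then aggregated and sorted by frequency externally. All four
parameter names are made up for the sketch.)

  #include <cstdio>

  // numRects1/numRects2: box counts of the two input regions.
  // newSize: pre-computed box capacity of the result.
  // allocBytes: the size passed to malloc.
  static void LogRectAlloc(int numRects1, int numRects2, int newSize,
                           unsigned long allocBytes)
  {
      std::fprintf(stderr, "alloc: %d, %d --> %d (%lu bytes)\n",
                   numRects1, numRects2, newSize, allocBytes);
  }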

The vast majority of cases involved at least one trivial region (i.e. a
region with a single rectangle), and more than half involved two trivial
regions.

Here's the same data, but with each line weighted by the allocation size:

> 3455668241 counts:
> (  1) 132744960 ( 3.8%,  3.8%): alloc: 1, 1 --> 2 (48 bytes)
> (  2) 69365296 ( 2.0%,  5.8%): alloc: 213, 1 --> 426 (6832 bytes)
> (  3) 68885280 ( 2.0%,  7.8%): alloc: 2, 1 --> 4 (80 bytes)
> (  4) 63433440 ( 1.8%,  9.7%): alloc: 217, 1 --> 434 (6960 bytes)
> (  5) 55202480 ( 1.6%, 11.3%): alloc: 215, 1 --> 430 (6896 bytes)
> (  6) 53954688 ( 1.6%, 12.8%): alloc: 223, 1 --> 446 (7152 bytes)
> (  7) 53382400 ( 1.5%, 14.4%): alloc: 219, 1 --> 438 (7024 bytes)
> (  8) 48182832 ( 1.4%, 15.8%): alloc: 4, 1 --> 8 (144 bytes)
> (  9) 44983120 ( 1.3%, 17.1%): alloc: 227, 1 --> 454 (7280 bytes)
> ( 10) 42826960 ( 1.2%, 18.3%): alloc: 225, 1 --> 450 (7216 bytes)
> ( 11) 40841040 ( 1.2%, 19.5%): alloc: 224, 1 --> 448 (7184 bytes)
> ( 12) 38347488 ( 1.1%, 20.6%): alloc: 211, 1 --> 422 (6768 bytes)
> ( 13) 36496800 ( 1.1%, 21.7%): alloc: 205, 1 --> 410 (6576 bytes)
> ( 14) 35151440 ( 1.0%, 22.7%): alloc: 222, 1 --> 444 (7120 bytes)
> ( 15) 33723648 ( 1.0%, 23.7%): alloc: 229, 1 --> 458 (7344 bytes)
> ( 16) 33543744 ( 1.0%, 24.6%): alloc: 226, 1 --> 452 (7248 bytes)
> ( 17) 33367056 ( 1.0%, 25.6%): alloc: 198, 1 --> 396 (6352 bytes)
> ( 18) 32548096 ( 0.9%, 26.5%): alloc: 221, 1 --> 442 (7088 bytes)
> ( 19) 31314096 ( 0.9%, 27.4%): alloc: 238, 1 --> 476 (7632 bytes)
> ( 20) 30666528 ( 0.9%, 28.3%): alloc: 228, 1 --> 456 (7312 bytes)
> ( 21) 30318960 ( 0.9%, 29.2%): alloc: 3, 1 --> 6 (112 bytes)
> ( 22) 29604864 ( 0.9%, 30.1%): alloc: 209, 1 --> 418 (6704 bytes)
> ( 23) 29549520 ( 0.9%, 30.9%): alloc: 214, 1 --> 428 (6864 bytes)
> ( 24) 28934160 ( 0.8%, 31.8%): alloc: 232, 1 --> 464 (7440 bytes)
> ( 25) 28670512 ( 0.8%, 32.6%): alloc: 230, 1 --> 460 (7376 bytes)
> ( 26) 28249504 ( 0.8%, 33.4%): alloc: 239, 1 --> 478 (7664 bytes)
> ( 27) 28005264 ( 0.8%, 34.2%): alloc: 220, 1 --> 440 (7056 bytes)
> ( 28) 25973072 ( 0.8%, 35.0%): alloc: 216, 1 --> 432 (6928 bytes)
> ( 29) 25902880 ( 0.7%, 35.7%): alloc: 242, 1 --> 484 (7760 bytes)
> ( 30) 25822016 ( 0.7%, 36.5%): alloc: 236, 1 --> 472 (7568 bytes)

That's 3.46 GB (about 3.2 GiB) of allocations!

The number of rectangles pre-allocated for the result is computed like this:

> max(numRects1, numRects2) * 2

I don't understand the region-combining operations in detail, but this seems
pessimistic. I wonder if we can allocate smaller sizes, and sometimes avoid
allocating at all (e.g. if we know the result will be trivial).
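
(A sketch of that heuristic, with hypothetical names -- not the actual pixman
source:)

  #include <algorithm>

  // Size the result buffer to twice the larger of the two inputs'
  // box counts.
  static int PrecomputeResultSize(int numRects1, int numRects2)
  {
      return std::max(numRects1, numRects2) * 2;
  }

  // E.g. unioning two trivial (1-box) regions pre-allocates room for two
  // boxes, even when the result collapses back to a single rectangle.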
Here are some more measurements. This time I added the number of rectangles
we actually ended up with in the result, in square brackets.

> 785992 counts:
> (  1)   176759 (22.5%, 22.5%): final: 1, 1 --> 2 [1] (48 bytes)
> (  2)    65387 ( 8.3%, 30.8%): final: 1, 1 --> 2 [2] (48 bytes)
> (  3)    55482 ( 7.1%, 37.9%): final: 4, 1 --> 8 [4] (144 bytes)
> (  4)    54464 ( 6.9%, 44.8%): final: 2, 1 --> 4 [0] (80 bytes)
> (  5)    35276 ( 4.5%, 49.3%): final: 3, 1 --> 6 [3] (112 bytes)
> (  6)    32790 ( 4.2%, 53.5%): final: 2, 1 --> 4 [2] (80 bytes)
> (  7)    24674 ( 3.1%, 56.6%): final: 2, 1 --> 4 [1] (80 bytes)
> (  8)    15314 ( 1.9%, 58.5%): final: 1, 2 --> 4 [1] (80 bytes)
> (  9)    13779 ( 1.8%, 60.3%): final: 6, 1 --> 12 [6] (208 bytes)
> ( 10)    12176 ( 1.5%, 61.8%): final: 3, 1 --> 6 [1] (112 bytes)
> ( 11)    10224 ( 1.3%, 63.1%): final: 1, 1 --> 2 [0] (48 bytes)
> ( 12)    10166 ( 1.3%, 64.4%): final: 4, 1 --> 8 [1] (144 bytes)
> ( 13)     8448 ( 1.1%, 65.5%): final: 1, 3 --> 6 [1] (112 bytes)
> ( 14)     7992 ( 1.0%, 66.5%): final: 7, 1 --> 14 [7] (240 bytes)
> ( 15)     7198 ( 0.9%, 67.4%): final: 5, 1 --> 10 [5] (176 bytes)
> ( 16)     6780 ( 0.9%, 68.3%): final: 5, 1 --> 10 [1] (176 bytes)
> ( 17)     6389 ( 0.8%, 69.1%): final: 9, 1 --> 18 [1] (304 bytes)
> ( 18)     5849 ( 0.7%, 69.9%): final: 3, 3 --> 6 [0] (112 bytes)
> ( 19)     5779 ( 0.7%, 70.6%): final: 9, 1 --> 18 [9] (304 bytes)
> ( 20)     4817 ( 0.6%, 71.2%): final: 11, 1 --> 22 [11] (368 bytes)
> ( 21)     4394 ( 0.6%, 71.8%): final: 1, 9 --> 18 [1] (304 bytes)
> ( 22)     4144 ( 0.5%, 72.3%): final: 1, 5 --> 10 [1] (176 bytes)
> ( 23)     3716 ( 0.5%, 72.8%): final: 8, 1 --> 16 [8] (272 bytes)
> ( 24)     3618 ( 0.5%, 73.2%): final: 1, 4 --> 8 [1] (144 bytes)
> ( 25)     3612 ( 0.5%, 73.7%): final: 1, 1 --> 2 [3] (48 bytes)
> ( 26)     3556 ( 0.5%, 74.1%): final: 17, 1 --> 34 [17] (560 bytes)
> ( 27)     3542 ( 0.5%, 74.6%): final: 15, 1 --> 30 [15] (496 bytes)
> ( 28)     3094 ( 0.4%, 75.0%): final: 20, 1 --> 40 [20] (656 bytes)
> ( 29)     2981 ( 0.4%, 75.4%): final: 13, 1 --> 26 [13] (432 bytes)
> ( 30)     2875 ( 0.4%, 75.7%): final: 19, 1 --> 38 [19] (624 bytes)

So the majority of the time when we combine two trivial regions and allocate,
we end up with a non-trivial region. But sometimes we end up with a trivial
region. So that seems tricky.

But when we combine a trivial region with a non-trivial one, the resulting
size looks like it's always equal to the size of the non-trivial input. So
perhaps we can do better there? It seems the initial result size can be an
underestimate -- e.g. look at line (25) -- so in the non-1,1 case perhaps we
should use max(numRects1, numRects2), i.e. omit the doubling?
Or if we could avoid doing so many nsRegion::Or() calls in the first place, that'd be even better :)
> So the majority of the time when we combine two trivial regions and allocate,
> we end up with a non-trivial region. But sometimes we end up with a trivial
> region.

Sorry, I switched those two cases. |1,1 --> [1]| is more common than |1,1 --> [2]|. |1,1 --> [0]| also happens, as does |1,1 --> [3]|!
I tried removing the doubling from the result-size pre-computation. Things
appeared to work OK. Here are the numbers. The lines marked "!!!" are the
cases where the pre-computation underestimates the final result count (again
shown in square brackets).

> 2696156 counts:
> (  1)   939179 (34.8%, 34.8%): final: 1, 1 --> 1 [1] (32 bytes)
> (  2)   322206 (12.0%, 46.8%): final: 1, 1 --> 1 [2] (32 bytes) !!!
> (  3)   300965 (11.2%, 57.9%): final: 2, 1 --> 2 [0] (48 bytes)
> (  4)   161216 ( 6.0%, 63.9%): final: 4, 1 --> 4 [4] (80 bytes)
> (  5)    88461 ( 3.3%, 67.2%): final: 3, 1 --> 3 [3] (64 bytes)
> (  6)    73121 ( 2.7%, 69.9%): final: 2, 1 --> 2 [1] (48 bytes)
> (  7)    53959 ( 2.0%, 71.9%): final: 2, 1 --> 2 [2] (48 bytes)
> (  8)    43591 ( 1.6%, 73.5%): final: 1, 2 --> 2 [1] (48 bytes)
> (  9)    19423 ( 0.7%, 74.3%): final: 1, 1 --> 1 [0] (32 bytes)
> ( 10)    19245 ( 0.7%, 75.0%): final: 3, 1 --> 3 [1] (64 bytes)
> ( 11)    19187 ( 0.7%, 75.7%): final: 5, 1 --> 5 [5] (96 bytes)
> ( 12)    16253 ( 0.6%, 76.3%): final: 1, 3 --> 3 [1] (64 bytes)
> ( 13)    15936 ( 0.6%, 76.9%): final: 6, 1 --> 6 [6] (112 bytes)
> ( 14)    15667 ( 0.6%, 77.5%): final: 10, 1 --> 10 [10] (176 bytes)
> ( 15)    14749 ( 0.5%, 78.0%): final: 9, 1 --> 9 [9] (160 bytes)
> ( 16)    13977 ( 0.5%, 78.5%): final: 7, 1 --> 7 [7] (128 bytes)
> ( 17)    12882 ( 0.5%, 79.0%): final: 4, 1 --> 4 [1] (80 bytes)
> ( 18)    11466 ( 0.4%, 79.4%): final: 13, 1 --> 13 [13] (224 bytes)
> ( 19)    11286 ( 0.4%, 79.8%): final: 15, 1 --> 15 [15] (256 bytes)
> ( 20)    10992 ( 0.4%, 80.3%): final: 11, 1 --> 11 [11] (192 bytes)
> ( 21)    10339 ( 0.4%, 80.6%): final: 19, 1 --> 19 [19] (320 bytes)
> ( 22)     9949 ( 0.4%, 81.0%): final: 17, 1 --> 17 [17] (288 bytes)
> ( 23)     8215 ( 0.3%, 81.3%): final: 1, 4 --> 4 [1] (80 bytes)
> ( 24)     7957 ( 0.3%, 81.6%): final: 1, 1 --> 1 [3] (32 bytes) !!!
> ( 25)     7279 ( 0.3%, 81.9%): final: 29, 1 --> 29 [29] (480 bytes)
> ( 26)     7022 ( 0.3%, 82.1%): final: 6, 1 --> 6 [1] (112 bytes)
> ( 27)     6886 ( 0.3%, 82.4%): final: 21, 1 --> 21 [21] (352 bytes)
> ( 28)     6115 ( 0.2%, 82.6%): final: 25, 1 --> 25 [25] (416 bytes)
> ( 29)     6033 ( 0.2%, 82.8%): final: 217, 1 --> 217 [217] (3488 bytes)
> ( 30)     5573 ( 0.2%, 83.0%): final: 31, 1 --> 31 [31] (512 bytes)
> ( 31)     5501 ( 0.2%, 83.3%): final: 23, 1 --> 23 [23] (384 bytes)
> ( 32)     5355 ( 0.2%, 83.5%): final: 27, 1 --> 27 [27] (448 bytes)
> ( 33)     5342 ( 0.2%, 83.6%): final: 7, 1 --> 7 [1] (128 bytes)
> ( 34)     5275 ( 0.2%, 83.8%): final: 4, 1 --> 4 [5] (80 bytes) !!!
> ( 35)     5228 ( 0.2%, 84.0%): final: 33, 1 --> 33 [33] (544 bytes)

Line (2) shows the only common case of underestimation. I'm assuming the
algorithm manages to correctly grow the buffer as needed in that case.
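
(If so, the guard presumably looks something like this grow-on-append
pattern -- a hypothetical sketch, not the actual pixman code:)

  #include <algorithm>
  #include <cstddef>
  #include <vector>

  struct Box { int x1, y1, x2, y2; };

  // Before appending `needed` more boxes to the result, grow the buffer
  // if the pre-computed capacity turned out to be too small.
  static void EnsureRoom(std::vector<Box>& boxes, std::size_t needed)
  {
      if (boxes.size() + needed > boxes.capacity()) {
          boxes.reserve(std::max(boxes.capacity() * 2,
                                 boxes.size() + needed));
      }
  }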
If this first Or accounts for the bulk of the allocations, this should remove them completely. I expect there might be more than just this one, though.
Attachment #8535322 - Flags: review?(roc)
Comment on attachment 8535322 [details] [diff] [review]
Use a rectangle instead of a region for computing the layer bounds

Review of attachment 8535322 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch.
Attachment #8535322 - Flags: review?(roc) → review+
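
(The idea of the patch, as a minimal sketch with hypothetical names,
simplified from the attachment description: accumulate the layer bounds in a
single nsIntRect, so each display item costs a rectangle union instead of a
pixman region union plus a heap allocation.)

  nsIntRect itemDrawRect = /* the item's drawing bounds */ nsIntRect(0, 0, 100, 100);

  // Before (simplified): a region union per display item, which goes
  // through pixman and may heap-allocate each time.
  //   nsIntRegion bounds;
  //   bounds.Or(bounds, itemDrawRect);

  // After (simplified): a plain bounding-rectangle union; no allocation.
  nsIntRect bounds;
  bounds.UnionRect(bounds, itemDrawRect);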
Looks like the patch helps a bit, but not a lot.

Before:

> 694023 counts:
> (  1)   188923 (27.2%, 27.2%): final: 1, 1 --> 2 [1] (48 bytes)
> (  2)    65567 ( 9.4%, 36.7%): final: 1, 1 --> 2 [2] (48 bytes)
> (  3)    58486 ( 8.4%, 45.1%): final: 2, 1 --> 4 [0] (80 bytes)
> (  4)    50483 ( 7.3%, 52.4%): final: 4, 1 --> 8 [4] (144 bytes)
> (  5)    37731 ( 5.4%, 57.8%): final: 2, 1 --> 4 [2] (80 bytes)
> (  6)    22565 ( 3.3%, 61.1%): final: 2, 1 --> 4 [1] (80 bytes)
> (  7)    17330 ( 2.5%, 63.6%): final: 3, 1 --> 6 [3] (112 bytes)
> (  8)    14639 ( 2.1%, 65.7%): final: 1, 2 --> 4 [1] (80 bytes)
> (  9)     9890 ( 1.4%, 67.1%): final: 1, 1 --> 2 [0] (48 bytes)
> ( 10)     6676 ( 1.0%, 68.1%): final: 5, 1 --> 10 [1] (176 bytes)
> ( 11)     6192 ( 0.9%, 68.9%): final: 4, 1 --> 8 [1] (144 bytes)
> ( 12)     5716 ( 0.8%, 69.8%): final: 5, 1 --> 10 [5] (176 bytes)
> ( 13)     4467 ( 0.6%, 70.4%): final: 15, 1 --> 30 [15] (496 bytes)
> ( 14)     4441 ( 0.6%, 71.1%): final: 13, 1 --> 26 [13] (432 bytes)
> ( 15)     4223 ( 0.6%, 71.7%): final: 9, 1 --> 18 [9] (304 bytes)

After:
 
> 578222 counts:
> (  1)   181940 (31.5%, 31.5%): final: 1, 1 --> 2 [1] (48 bytes)
> (  2)    63264 (10.9%, 42.4%): final: 1, 1 --> 2 [2] (48 bytes)
> (  3)    59153 (10.2%, 52.6%): final: 2, 1 --> 4 [0] (80 bytes)
> (  4)    39473 ( 6.8%, 59.5%): final: 4, 1 --> 8 [4] (144 bytes)
> (  5)    28889 ( 5.0%, 64.5%): final: 2, 1 --> 4 [2] (80 bytes)
> (  6)    27559 ( 4.8%, 69.2%): final: 2, 1 --> 4 [1] (80 bytes)
> (  7)    18165 ( 3.1%, 72.4%): final: 1, 2 --> 4 [1] (80 bytes)
> (  8)    17460 ( 3.0%, 75.4%): final: 3, 1 --> 6 [3] (112 bytes)
> (  9)    11496 ( 2.0%, 77.4%): final: 1, 1 --> 2 [0] (48 bytes)
> ( 10)     7521 ( 1.3%, 78.7%): final: 4, 1 --> 8 [1] (144 bytes)
> ( 11)     5489 ( 0.9%, 79.6%): final: 1, 3 --> 6 [1] (112 bytes)
> ( 12)     4767 ( 0.8%, 80.4%): final: 1, 4 --> 8 [1] (144 bytes)
> ( 13)     4193 ( 0.7%, 81.2%): final: 6, 1 --> 12 [1] (208 bytes)
> ( 14)     4140 ( 0.7%, 81.9%): final: 5, 1 --> 10 [1] (176 bytes)
> ( 15)     3874 ( 0.7%, 82.6%): final: 10, 1 --> 20 [10] (336 bytes)

And I don't know how noisy these measurements are, so even the small win may be
illusory.
(In reply to Nicholas Nethercote [:njn] from comment #7)
> Looks like the patch helps a bit, but not a lot.
> [...]
> And I don't know how noisy these measurements are, so even the small win
> may be illusory.

Is there a way to narrow down which call to nsIntRegion::Or is responsible for the majority of the allocations?
Flags: needinfo?(n.nethercote)
> Is there a way to narrow down which call to nsIntRegion::Or is responsible
> for the majority of the allocations?

The stack trace in comment 0 indicates where most of the problem lies. And
you've actually fixed much of it... I forgot to weight each line in comment 8
by its size. If I do that, and tweak the instrumentation so that it only
considers |union| calls, I get this:

Before:

> 608942272 counts:
> (  1) 11084816 ( 1.8%,  1.8%): union-alloc: 245, 1 --> 490 (7856 bytes)
> (  2) 10486896 ( 1.7%,  3.5%): union-alloc: 241, 1 --> 482 (7728 bytes)
> (  3) 10347024 ( 1.7%,  5.2%): union-alloc: 1, 1 --> 2 (48 bytes)
> (  4)  9507600 ( 1.6%,  6.8%): union-alloc: 4, 1 --> 8 (144 bytes)
> (  5)  9342144 ( 1.5%,  8.3%): union-alloc: 220, 1 --> 440 (7056 bytes)
> (  6)  9066800 ( 1.5%,  9.8%): union-alloc: 237, 1 --> 474 (7600 bytes)
> (  7)  8933408 ( 1.5%, 11.3%): union-alloc: 225, 1 --> 450 (7216 bytes)
> (  8)  8833104 ( 1.5%, 12.7%): union-alloc: 190, 1 --> 380 (6096 bytes)
> (  9)  8094160 ( 1.3%, 14.1%): union-alloc: 207, 1 --> 414 (6640 bytes)
> ( 10)  7931680 ( 1.3%, 15.4%): union-alloc: 222, 1 --> 444 (7120 bytes)
> ( 11)  7635040 ( 1.3%, 16.6%): union-alloc: 200, 1 --> 400 (6416 bytes)
> ( 12)  7490560 ( 1.2%, 17.9%): union-alloc: 192, 1 --> 384 (6160 bytes)
> ( 13)  7431776 ( 1.2%, 19.1%): union-alloc: 236, 1 --> 472 (7568 bytes)
> ( 14)  7046160 ( 1.2%, 20.2%): union-alloc: 93, 1 --> 186 (2992 bytes)
> ( 15)  6887904 ( 1.1%, 21.4%): union-alloc: 228, 1 --> 456 (7312 bytes)

After:

> 46061424 counts:
> (  1) 12260400 (26.6%, 26.6%): union-alloc: 1, 1 --> 2 (48 bytes)
> (  2)  6860160 (14.9%, 41.5%): union-alloc: 4, 1 --> 8 (144 bytes)
> (  3)  4691456 (10.2%, 51.7%): union-alloc: 3, 1 --> 6 (112 bytes)
> (  4)  3285840 ( 7.1%, 58.8%): union-alloc: 2, 1 --> 4 (80 bytes)
> (  5)  1784880 ( 3.9%, 62.7%): union-alloc: 18, 1 --> 36 (592 bytes)
> (  6)  1715376 ( 3.7%, 66.4%): union-alloc: 19, 1 --> 38 (624 bytes)
> (  7)  1650528 ( 3.6%, 70.0%): union-alloc: 16, 1 --> 32 (528 bytes)
> (  8)  1602288 ( 3.5%, 73.5%): union-alloc: 13, 1 --> 26 (432 bytes)
> (  9)  1389408 ( 3.0%, 76.5%): union-alloc: 20, 1 --> 40 (656 bytes)
> ( 10)  1265792 ( 2.7%, 79.3%): union-alloc: 15, 1 --> 30 (496 bytes)
> ( 11)  1218928 ( 2.6%, 81.9%): union-alloc: 14, 1 --> 28 (464 bytes)
> ( 12)  1215760 ( 2.6%, 84.5%): union-alloc: 17, 1 --> 34 (560 bytes)
> ( 13)  1204800 ( 2.6%, 87.2%): union-alloc: 7, 1 --> 14 (240 bytes)
> ( 14)  1007584 ( 2.2%, 89.3%): union-alloc: 11, 1 --> 22 (368 bytes)
> ( 15)   878640 ( 1.9%, 91.3%): union-alloc: 10, 1 --> 20 (336 bytes)

That's a 13.2x reduction in cumulative allocation size. The number of
allocations only drops by 1.26x, so it looks like this callsite must be
responsible for all the larger allocations. (And looking at the individual
lines it's clear this is the case.)

I also re-ran DMD and I'm seeing a similar reduction there. So you've actually
fixed most of the problem. Thanks! And sorry for my misleading measurements in
comment 8.
Flags: needinfo?(n.nethercote)
Thanks, Jeff! I'll dig a little more next week and see if I can identify other places where the smaller allocations might be coming from.
Assignee: nobody → jmuizelaar
(In reply to Nicholas Nethercote [:njn] from comment #11)
> Thanks, Jeff! I'll dig a little more next week and see if I can identify
> other places where the smaller allocations might be coming from.

I did another run of Treeherder under DMD, with your patch applied, this time with sampling off, which gives details about every heap allocation.

We still have a *lot* of small allocations:

> Cumulative {
>   1,552,122 blocks in heap block record 2 of 8,348
>   271,205,008 bytes (238,975,424 requested / 32,229,584 slop)
>   Individual block sizes: 4,096 x 58; 2,048 x 7,821; 1,024 x 68,846; 496 x 13,144; 464 x 12,681; 432 x 17,113; 400 x 25,061; 368 x 18,743; 336 x 39,172; 304 x 31,250; 272 x 33,955; 240 x 66,841; 208 x 20,867; 176 x 45,576; 144 x 146,758; 112 x 114,653; 80 x 336,197; 48 x 553,386
>   7.05% of the heap (14.41% cumulative)
>   Allocated at {
>     #01: alloc_data (/home/njn/moz/mi4/o64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:214)
>     #02: pixman_rect_alloc (/home/njn/moz/mi4/o64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:465)
>     #03: pixman_op (/home/njn/moz/mi4/o64dmd/gfx/cairo/libpixman/src/../../../../../gfx/cairo/libpixman/src/pixman-region.c:828)
>   }
> }

1,552,122 allocations -- that's more than 10% of all heap allocations from this
run (going by count, not by size).

There's more detailed data in the attached file. AccumulateRectDifference(),
ProcessRemovedDisplayItems(), ProcessDisplayItems(), and
RecomputeVisibilityForItems() all feature prominently.
(In reply to Nicholas Nethercote [:njn] from comment #12)
> Created attachment 8536283 [details]
> 1,552,122 allocations -- that's more than 10% of all heap allocations from
> this run (going by count, not by size).
> 
> There's more detailed data in the attached file. AccumulateRectDifference(),
> ProcessRemovedDisplayItems(), ProcessDisplayItems(), and
> RecomputeVisibilityForItems() all feature prominently.

So AccumulateRectDifference calls Xor(A, B), which pixman doesn't currently support natively and which is emulated as (A - B) U (B - A). If we actually need this kind of thing to be well implemented, it shouldn't be too hard to add support to pixman. I've filed bug 1111432 for that work.
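
(For reference, a sketch of that emulation in nsRegion-style API --
hypothetical code, not the actual implementation:)

  // Xor(A, B) emulated as (A - B) U (B - A): two region subtractions
  // plus a union, i.e. three pixman operations, each of which may
  // allocate.
  nsIntRegion XorEmulated(const nsIntRegion& a, const nsIntRegion& b)
  {
      nsIntRegion aMinusB, bMinusA, result;
      aMinusB.Sub(a, b);            // A - B
      bMinusA.Sub(b, a);            // B - A
      result.Or(aMinusB, bMinusA);  // (A - B) U (B - A)
      return result;
  }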
https://hg.mozilla.org/mozilla-central/rev/c306922867aa
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37