Unbounded memory growth from PrimitiveKey instances created during interning
Categories
(Core :: Graphics: WebRender, defect, P2)
Tracking
()
People
(Reporter: mayankleoboy1, Assigned: bholley)
References
(Depends on 1 open bug)
Details
Attachments
(7 files)
+++ This bug was initially created as a clone of Bug #1506435 +++ User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0 Steps to reproduce: 1.enable WR 2. unzip the attachment (a large perf-html profile)) 3. Go to https://perf-html.io/ 4. Drag and drop the profile there 5. Wait till the profile loads 6. Show only "content process 3" 7. Invert the call stack 8. Show only "Javascript" Actual results: Large memoru increase. Most of it is heap-unclassified Expected results: Memory should be reported, i guess
Reporter | ||
Comment 1•5 years ago
|
||
Reporter | ||
Updated•5 years ago
|
Reporter | ||
Comment 2•5 years ago
|
||
You might need to check these options in about:preferences : "search for text when you start typing" , and "always use cursor to navigate between pages" . Not very sure, but check these options, and restart the browser if you cant repro the first time.
Updated•5 years ago
|
Assignee | ||
Comment 3•5 years ago
|
||
I can reproduce this. Memory climbs quickly interacting with the page. I loaded it quickly and then tabbed away to generate a DMD report. This was the dominant stack: Unreported { 6,194 blocks in heap block record 1 of 9,787 140,439,872 bytes (139,799,736 requested / 640,136 slop) Individual block sizes: 24,576 x 5,696; 20,480; 8,192 x 12; 4,096 x 55; 2,048 x 24; 1,024 x 18; 480 x 4; 448; 400; 384 x 4; 368 x 3; 336 x 3; 320 x 2; 304 x 2; 288 x 6; 272 x 5; 256 x 3; 240 x 4; 224 x 3; 208 x 7; 192 x 10; 176 x 10; 160 x 7; 144 x 16; 128 x 3; 112 x 79; 96 x 61; 80 x 6; 64 x 12; 48 x 81; 32 x 32; 16 x 23 32.31% of the heap (32.31% cumulative) 62.89% of unreported (62.89% cumulative) Allocated at { #01: replace_malloc(unsigned long) (in libmozglue.dylib) + 106 #02: _$LT$webrender..prim_store..PrimitiveKey$u20$as$u20$core..clone..Clone$GT$::clone::hb485db375d6c85a1 (in XUL) + 449 #03: webrender::display_list_flattener::DisplayListFlattener::create_primitive::hbcfb07f5168fdc7f (in XUL) + 3496 #04: webrender::display_list_flattener::DisplayListFlattener::add_primitive::h0924a26649204af3 (in XUL) + 3531 #05: webrender::display_list_flattener::DisplayListFlattener::flatten_item::h836d20b4bd584f54 (in XUL) + 18517 #06: webrender::display_list_flattener::DisplayListFlattener::flatten_items::he22f987981c469c7 (in XUL) + 323 #07: webrender::display_list_flattener::DisplayListFlattener::flatten_item::h836d20b4bd584f54 (in XUL) + 21031 #08: webrender::display_list_flattener::DisplayListFlattener::flatten_items::he22f987981c469c7 (in XUL) + 323 } } So looks like the interner is getting huge.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 4•5 years ago
|
||
Glenn thinks this is some sort of silly bug, since we shouldn't be allocating anywhere near that many keys (which are generally small).
Assignee | ||
Updated•5 years ago
|
Comment 5•5 years ago
|
||
It'd be worth testing this again. We did fix a couple of bugs where some structures were 10x the size they should have been, due to one enum field having a really large struct in it. That wouldn't fix the unbounded memory growth, if that is occurring, but might make the memory totals a lot better than previously.
Comment 6•5 years ago
|
||
I think I inadvertently stumbled upon what the problem is here - from the description of https://github.com/servo/webrender/pull/3381: --- The current picture caching code invalidates cached regions more than expected, due to how gecko handles scroll offsets. If a display list is received, then a scroll event occurs, and then a new display list is sent, the local rects will contain different coordinates, since the scroll offsets are baked into the local rects rather than in the scroll / reference frames. This results in primitives being interned with a different value, and cached surfaces being incorrectly invalidated. Since the way this works is non-trivial to change in Gecko, we need to handle this in WR. If we change primitive and clip keys and templates to store their rectangles in a true local primitive space (where the origin is always zero) then the interning will work as expected. To do this, we'll store the origin of a primitive in the instance, and only the size in the template. A side benefit of this approach is that test cases like dl_mutate will become much faster, since the only difference between the thousands of rectangles is the origin. This will effectively allow true instancing of primitives with common data. This is the first step towards that - by removing reliance of the primitive local rect from the primitive template code. The rest will be done as a series of incremental patches. --- The changes I'm planning to make to avoid these extra invalidations will almost certainly fix this issue too.
Updated•5 years ago
|
Comment 7•5 years ago
|
||
I haven't tested locally yet (will try to tomorrow), but this should hopefully be resolved with the next nightly build - the changes that make interned primitives exist in a true local space have landed now, which is what I suspect to be the cause of this.
Reporter | ||
Comment 8•5 years ago
|
||
you can also get a failure log like this : GP+[GFX1-]: Attempting to allocate a texture of size 1024x8704 above the limit, trimming Thw window turns white, and then recovers. The compositor changes to "D3D11 (advanced layers)". But no crash report is generated. (In reply to Glenn Watson [:gw] from comment #7) > I haven't tested locally yet (will try to tomorrow), but this should > hopefully be resolved with the next nightly build - the changes that make > interned primitives exist in a true local space have landed now, which is > what I suspect to be the cause of this. will retest when the next nightly arrives, presumably with https://github.com/servo/webrender/pull/3395 landed
Reporter | ||
Comment 9•5 years ago
|
||
as of buildconfig https://hg.mozilla.org/mozilla-central/rev/5c892a6147ae856f208e77c265eda4d7b677ac52 , this isnt fixed
Reporter | ||
Comment 10•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 11•5 years ago
|
||
I re-ran the DM instrumentation in comment 3 on an unoptimized, and got this stack. This reveals that the unreported memory is actually the glyph vec in TextRunKey, which wasn't at all clear in comment 3 because duplicate code folding.
So I could add reporters for those, but that's starting to feel a bit like whack-a-mole. I'm going to have a look at adding a conditional dependency onto malloc_size_of, and using it in a few targeted cases like interners.
Assignee | ||
Comment 12•5 years ago
|
||
Here's a memory report using my new machinery in bug 1519454, which shows the problem plain as day. I'll get back to actually look at this bug next week.
Assignee | ||
Comment 14•5 years ago
|
||
Assignee | ||
Comment 15•5 years ago
|
||
This does two things: * Ensures that the T gets dropped when the item is removal, which is important for the TextRunKey case where it holds heap memory. * Eliminates the epoch handling while still ensuring that we panic on stale lookups. We also remove the Item usage for local_data, but don't bother with the Option in that case.
Assignee | ||
Comment 16•5 years ago
|
||
Depends on D16629
Assignee | ||
Comment 17•5 years ago
|
||
The patches here fix the broad strokes of the memory issues. The testcase still heavily janks the renderer thread by issuing very large draw calls to the GPU, for the gazillions of off-screen text runs.
Bug 1520063 would fix that issue for text runs. We might also consider a general backstop that prevents us from issuing huge draw calls if they're taking too long. I'll add a note to discuss it in tomorrow's meeting.
Comment 18•5 years ago
|
||
Pushed by bholley@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/53a5c9148120 Use an Option<T> instead of Item<T> in DataStore. r=gw https://hg.mozilla.org/integration/autoland/rev/932ddb2ffb5e Arc the glyph arrays for text runs. r=gw
Comment 19•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/53a5c9148120
https://hg.mozilla.org/mozilla-central/rev/932ddb2ffb5e
Updated•5 years ago
|
Reporter | ||
Comment 20•5 years ago
|
||
I can still get 820-700MB on the testcase as of https://hg.mozilla.org/mozilla-central/rev/24f969298ec5c1a7cbc444bfc8f3d77b07f46bbe
It doesnt go above 1GB though. Is this the expected improvement from the patches in this bug?
Assignee | ||
Comment 21•5 years ago
|
||
(In reply to Mayank Bansal from comment #20)
I can still get 820-700MB on the testcase as of https://hg.mozilla.org/mozilla-central/rev/24f969298ec5c1a7cbc444bfc8f3d77b07f46bbe
It doesnt go above 1GB though. Is this the expected improvement from the patches in this bug?
Bug 1519454 measures the memory so that the increase is now accounted for and not heap-unclassifed.
The patches in this bug should do two things:
(1) Reduce the memory overhead by ~half
(2) Prevent the memory overhead from persisting when you switch tabs.
Bug 1520063 would eliminate all the memory overhead for this particular test-case, but we may or may not fix that for the initial release.
Can you confirm that what you see matches the above, and perhaps attach a memory report?
Reporter | ||
Comment 22•5 years ago
|
||
I tried this today. And holy cow, this is so much better now. Memory doesnt go above 250-300MB. Scrolling the page vertically and horizontally is super smooth. opening the nested JS lines is super smooth.
I am not sure which bug(s) fixed this after my last comment, but would be good to know.
This is definitely FIXED for me!
Assignee | ||
Comment 23•5 years ago
|
||
Great! It might be interesting to bisect to figure out what fixed it, but it's not a very high priority, so don't feel obligated. Glad it's working well now!
Reporter | ||
Comment 24•5 years ago
|
||
I think what made the biggest difference was not something in gecko, but changes on perf-html page via :
https://github.com/devtools-html/perf.html/pull/1695 (or https://github.com/devtools-html/perf.html/issues/1502)
Assignee | ||
Comment 25•5 years ago
|
||
Ah right! They were going to fix that on the perf-html side as well. So effectively, the testcase changed, and we don't have an easy way to access the old testcase. :-)
Description
•