Closed Bug 1506449 Opened 5 years ago Closed 5 years ago

Unbounded memory growth from PrimitiveKey instances created during interning

Categories

(Core :: Graphics: WebRender, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox65 --- wontfix
firefox66 --- fixed

People

(Reporter: mayankleoboy1, Assigned: bholley)

References

(Depends on 1 open bug)

Details

Attachments

(7 files)

+++ This bug was initially created as a clone of Bug #1506435 +++

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0

Steps to reproduce:

1.enable WR
2. unzip the attachment (a large perf-html profile))
3. Go to https://perf-html.io/
4. Drag and drop the profile there
5. Wait till the profile loads
6. Show only "content process 3"
7. Invert the call stack
8. Show only "Javascript"


Actual results:

Large memoru increase. Most of it is heap-unclassified



Expected results:

Memory should be reported, i guess
Attached file memory-report.json.gz
See Also: → 1506437, 1506435
You might need to check these options in about:preferences :  "search for text when you start typing" , and "always use cursor to navigate between pages" . Not very sure, but check these options, and restart the browser if you cant repro the first time.
Priority: -- → P3
I can reproduce this. Memory climbs quickly interacting with the page. I loaded it quickly and then tabbed away to generate a DMD report. This was the dominant stack:

Unreported {
  6,194 blocks in heap block record 1 of 9,787
  140,439,872 bytes (139,799,736 requested / 640,136 slop)
  Individual block sizes: 24,576 x 5,696; 20,480; 8,192 x 12; 4,096 x 55; 2,048 x 24; 1,024 x 18; 480 x 4; 448; 400; 384 x 4; 368 x 3; 336 x 3; 320 x 2; 304 x 2; 288 x 6; 272 x 5; 256 x 3; 240 x 4; 224 x 3; 208 x 7; 192 x 10; 176 x 10; 160 x 7; 144 x 16; 128 x 3; 112 x 79; 96 x 61; 80 x 6; 64 x 12; 48 x 81; 32 x 32; 16 x 23
  32.31% of the heap (32.31% cumulative)
  62.89% of unreported (62.89% cumulative)
  Allocated at {
    #01: replace_malloc(unsigned long) (in libmozglue.dylib) + 106
    #02: _$LT$webrender..prim_store..PrimitiveKey$u20$as$u20$core..clone..Clone$GT$::clone::hb485db375d6c85a1 (in XUL) + 449
    #03: webrender::display_list_flattener::DisplayListFlattener::create_primitive::hbcfb07f5168fdc7f (in XUL) + 3496
    #04: webrender::display_list_flattener::DisplayListFlattener::add_primitive::h0924a26649204af3 (in XUL) + 3531
    #05: webrender::display_list_flattener::DisplayListFlattener::flatten_item::h836d20b4bd584f54 (in XUL) + 18517
    #06: webrender::display_list_flattener::DisplayListFlattener::flatten_items::he22f987981c469c7 (in XUL) + 323
    #07: webrender::display_list_flattener::DisplayListFlattener::flatten_item::h836d20b4bd584f54 (in XUL) + 21031
    #08: webrender::display_list_flattener::DisplayListFlattener::flatten_items::he22f987981c469c7 (in XUL) + 323
  }
}

So looks like the interner is getting huge.
Flags: needinfo?(gwatson)
Priority: P3 → P2
Glenn thinks this is some sort of silly bug, since we shouldn't be allocating anywhere near that many keys (which are generally small).
Summary: Heap-unclassified memory on perf-html with large data:text/javascript URL → Unbounded memory growth from PrimitiveKey instances created during interning
It'd be worth testing this again.

We did fix a couple of bugs where some structures were 10x the size they should have been, due to one enum field having a really large struct in it.

That wouldn't fix the unbounded memory growth, if that is occurring, but might make the memory totals a lot better than previously.
Flags: needinfo?(gwatson)
I think I inadvertently stumbled upon what the problem is here - from the description of https://github.com/servo/webrender/pull/3381:

---

The current picture caching code invalidates cached regions more
than expected, due to how gecko handles scroll offsets.

If a display list is received, then a scroll event occurs, and then
a new display list is sent, the local rects will contain different
coordinates, since the scroll offsets are baked into the local
rects rather than in the scroll / reference frames. This results
in primitives being interned with a different value, and cached
surfaces being incorrectly invalidated.

Since the way this works is non-trivial to change in Gecko, we need
to handle this in WR.

If we change primitive and clip keys and templates to store their
rectangles in a true local primitive space (where the origin is
always zero) then the interning will work as expected. To do this,
we'll store the origin of a primitive in the instance, and only
the size in the template.

A side benefit of this approach is that test cases like dl_mutate
will become much faster, since the only difference between the
thousands of rectangles is the origin. This will effectively
allow true instancing of primitives with common data.

This is the first step towards that - by removing reliance of the
primitive local rect from the primitive template code. The rest
will be done as a series of incremental patches.

---

The changes I'm planning to make to avoid these extra invalidations will almost certainly fix this issue too.
Assignee: nobody → gwatson
I haven't tested locally yet (will try to tomorrow), but this should hopefully be resolved with the next nightly build - the changes that make interned primitives exist in a true local space have landed now, which is what I suspect to be the cause of this.
you can also get a failure log like this : 
GP+[GFX1-]: Attempting to allocate a texture of size 1024x8704 above the limit, trimming

Thw window turns white, and then recovers. The compositor changes to "D3D11 (advanced layers)". But no crash report is generated.


(In reply to Glenn Watson [:gw] from comment #7)
> I haven't tested locally yet (will try to tomorrow), but this should
> hopefully be resolved with the next nightly build - the changes that make
> interned primitives exist in a true local space have landed now, which is
> what I suspect to be the cause of this.


will retest when the next nightly arrives, presumably with https://github.com/servo/webrender/pull/3395 landed
Attached file memory-report.json.gz
Assignee: gwatson → bobbyholley
Attached file unoptimized DMD stack

I re-ran the DM instrumentation in comment 3 on an unoptimized, and got this stack. This reveals that the unreported memory is actually the glyph vec in TextRunKey, which wasn't at all clear in comment 3 because duplicate code folding.

So I could add reporters for those, but that's starting to feel a bit like whack-a-mole. I'm going to have a look at adding a conditional dependency onto malloc_size_of, and using it in a few targeted cases like interners.

Depends on: 1519454
Attached file memory-report.json.gz

Here's a memory report using my new machinery in bug 1519454, which shows the problem plain as day. I'll get back to actually look at this bug next week.

Depends on: 1520063
This does two things:
* Ensures that the T gets dropped when the item is removal, which is
  important for the TextRunKey case where it holds heap memory.
* Eliminates the epoch handling while still ensuring that we panic on
  stale lookups.

We also remove the Item usage for local_data, but don't bother with the
Option in that case.
Depends on D16629

The patches here fix the broad strokes of the memory issues. The testcase still heavily janks the renderer thread by issuing very large draw calls to the GPU, for the gazillions of off-screen text runs.

Bug 1520063 would fix that issue for text runs. We might also consider a general backstop that prevents us from issuing huge draw calls if they're taking too long. I'll add a note to discuss it in tomorrow's meeting.

Pushed by bholley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/53a5c9148120
Use an Option<T> instead of Item<T> in DataStore. r=gw
https://hg.mozilla.org/integration/autoland/rev/932ddb2ffb5e
Arc the glyph arrays for text runs. r=gw
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

I can still get 820-700MB on the testcase as of https://hg.mozilla.org/mozilla-central/rev/24f969298ec5c1a7cbc444bfc8f3d77b07f46bbe

It doesnt go above 1GB though. Is this the expected improvement from the patches in this bug?

(In reply to Mayank Bansal from comment #20)

I can still get 820-700MB on the testcase as of https://hg.mozilla.org/mozilla-central/rev/24f969298ec5c1a7cbc444bfc8f3d77b07f46bbe

It doesnt go above 1GB though. Is this the expected improvement from the patches in this bug?

Bug 1519454 measures the memory so that the increase is now accounted for and not heap-unclassifed.

The patches in this bug should do two things:
(1) Reduce the memory overhead by ~half
(2) Prevent the memory overhead from persisting when you switch tabs.

Bug 1520063 would eliminate all the memory overhead for this particular test-case, but we may or may not fix that for the initial release.

Can you confirm that what you see matches the above, and perhaps attach a memory report?

Flags: needinfo?(mayankleoboy1)

I tried this today. And holy cow, this is so much better now. Memory doesnt go above 250-300MB. Scrolling the page vertically and horizontally is super smooth. opening the nested JS lines is super smooth.
I am not sure which bug(s) fixed this after my last comment, but would be good to know.

This is definitely FIXED for me!

Great! It might be interesting to bisect to figure out what fixed it, but it's not a very high priority, so don't feel obligated. Glad it's working well now!

Flags: needinfo?(mayankleoboy1)

I think what made the biggest difference was not something in gecko, but changes on perf-html page via :

https://github.com/devtools-html/perf.html/pull/1695 (or https://github.com/devtools-html/perf.html/issues/1502)

Ah right! They were going to fix that on the perf-html side as well. So effectively, the testcase changed, and we don't have an easy way to access the old testcase. :-)

You need to log in before you can comment on or make changes to this bug.