Closed Bug 1838350 Opened 1 years ago Closed 7 months ago

Add regression tests preventing "Animated theme produces steady memory leak in occluded or minimized windows [leaking render-texture-hosts via RecvAddSharedSurface]"

Categories

(Core :: Graphics, defect, P1)

defect

Tracking

()

RESOLVED FIXED
127 Branch
Tracking Status
firefox127 --- fixed

People

(Reporter: jgilbert, Assigned: tnikkel)

References

Details

Attachments

(1 file)

Because bug 1828587 needed uplift to Release, we need to try to add tests to prevent it in the future, so we're making this S2+gfx-triage.

I talked some with @tnikkel about how we might do this, such as:

  • Add animated (hidden?) item to are-we-slim-yet, and notice it that way.
  • Add a hidden pref to MOZ_RELEASE_ASSERT a max number of RenderTextureHosts.
  • Track the number of RenderTextureHosts in Telemetry, and add an alert if we start seeing a distribution spike.

I've done a little bit more investigation on this bug, I think writing a test that recreates the problem shouldn't be too hard, so "Add a hidden pref to MOZ_RELEASE_ASSERT a max number of RenderTextureHosts" should be feasible.

This is sort of the same as bug 1829494 right?

That one seems to be specifically about AWSY. This one is more general, just to get something going.

I don't think AWSY is the right mechanism, unless you are actually looking for patches to get backed out if they regress this memory usage by a few percent. A regular test could hook into the memory reporting infrastructure to check that the memory remains relatively steady in whatever page minimized/occluded scenarios. I'd have to dig around a bit to remember how that might work.

Assignee: nobody → tnikkel
No longer blocks: gfx-triage

Tim, considering this is flagged as an "S2", do we have any idea when we might have cycles to address it?

Flags: needinfo?(tnikkel)

I would like to, there's just more high priority work around here then time to work on them.

Flags: needinfo?(tnikkel)
Blocks: gfx-triage
Severity: S2 → S3

This has been relegated to S3 because though we (ie Jeff) decided that this should be the same priority as S2, it is rules-as-written not an S3. Our method for keeping de-facto priority high is to make it S3 but leave it in gfx-triage.

While attempting to repro a different user-reported issue, I enabled the dark space theme again and can confirm it isn't leaking currently, at least for that common case. But we should have a regression test for this.

I started writing a browser chrome test here back around when this bug was filed. It seems like a relatively simple task to just write a browser chrome test that reproduces the issue (and then use something like the suggestion in comment 1 "Add a hidden pref to MOZ_RELEASE_ASSERT a max number of RenderTextureHosts." to turn it into something that reports a failure if memory increases too much).

The requirements to reproduce should be straight forward:
-an animated images large enough to trigger partial decode mode (where we don't keep around every frame of the image), this can be adjusted by prefs so a smaller image triggers it
-load the image in a content page OR load the image in chrome, both scenarios should be tested as they required separate fixes
-minimize the window that contains the image
-optionally have a copy of the image still visible and animating to make sure further decoding happens
-let that run for x iterations to ensure we had enough time to trip over the limit set by the hidden pref

I started by finding two existing tests that minimized a window, then massaged one of them into loading an animated image in a content document and then minimized that window. However that did not reproduce the increasing memory. At that point I got pulled away to other tasks and haven't had a chance to come back to this.

So it does not seem hard or time consuming at all, but I always seem to run into issues when writing tests. Either it's hard to get a reliable test, or you write a test that seems to reproduce exactly the manual steps of the issue but the issue doesn't turn up due to something about the testing environment that causes the test to take much more time to write and land.

One thing to mention is that we've had this same basic bug still existing in the browser but in a slightly different place, bug 1875100. The test that I would have written here would not have caught that.

See Also: → 1875100

I revived my test and started looking into why it doesn't reproduce the bug.

To reproduce the bug we require one RasterImage object to be shared in two (or more) places, with one of the places visible and animating so that we keep generating new frames, and one of the places in a minimized window so that it receives those new frames but never deals with them so they pile up. When I load the image in the second window in my mochitest-browser-chrome we do not find the existing image in the image cache, and so we load a fresh one. The image either isn't in the image cache anymore or the key we use to look it up in the hash table must differ, but I'm not sure why yet (the same steps performed manually in a normal browser profile do reproduce of course).

(In reply to Timothy Nikkel (:tnikkel) from comment #11)

The image either isn't in the image cache anymore or the key we use to look it up in the hash table must differ

Neither of these was true. The test was loading an image at the top level to a path in the objdir, this path is a symlink, so it redirects to the file path in the src tree. When loading a uri with a redirect at the top level we search the cache using the current uri (ie after the redirect)

https://searchfox.org/mozilla-central/rev/b5b551329484a67f2abb4bae50411b93d5851733/image/imgLoader.cpp#2611

but we store the image in the cache using the original uri (before the redirect)

https://searchfox.org/mozilla-central/rev/b5b551329484a67f2abb4bae50411b93d5851733/image/imgLoader.cpp#2724

(you can see a comment mentioning this issue there).

This behaviour was introduced in bug 121084 21 years ago where they were aware that the patch was introducing this problem but accepted the patch because it solved more problems then it caused (bug 121084, comment 214, bug 121084, comment 216).

I got a test that recreates the leak for 2 of the 3 ways we've encountered it (animated images in content tab in minimized window, animated image in chrome in minimized window). The 3rd is bug 1875100 for which I'm attempted to adapt an existing test.

I got a test to reproduce the 3rd way now. What remains to do: integrate this into one test, add an pref-controlled assert that we stay below a certain render texture count, and make it stable on try.

Component: Graphics: WebRender → Graphics

I filed the problem of comment 12 as bug 1889840.

See Also: → 1839109
See Also: → 1830753
No longer blocks: gfx-triage
Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/6e45ff49c9ae Add test for animated images in minimized window leaking. r=layout-reviewers,emilio
Regressions: 1892686
Status: NEW → RESOLVED
Closed: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → 127 Branch
Regressions: 1895625
No longer regressions: 1895625
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: