Closed Bug 1453801 Opened 2 years ago Closed 2 years ago

In SourceSurfaceSharedDataWrapper::SourceSurfaceSharedDataWrapper, mConsumer being initialized to 0 causes an assertion to trigger


(Core :: Graphics: WebRender, defect, P1)




Tracking Status
firefox-esr52 --- unaffected
firefox59 --- disabled
firefox60 --- disabled
firefox61 --- fixed


(Reporter: tristanbourvon, Assigned: aosmond)


(Blocks 1 open bug)



(3 files, 3 obsolete files)

This hunk caused an assertion to trigger (, presumably because RemoveConsumer is called before AddConsumer somewhere, and currently mConsumer gets positive garbage so this is invisible.

The hunk has since been backed out but should be a correct fix to land.
Ever confirmed: true
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(aosmond)
Well that's embarrassing.
Assignee: nobody → aosmond
Flags: needinfo?(jmuizelaar)
Flags: needinfo?(aosmond)
I guess this also means that in SharedSurfacesParent::Release we never do sInstance->mSurfaces.Remove(id) ?
I'm not sure if I even need the consumer code anymore, the code has evolved a lot since it was first introduced. Also this bug only affects WebRender enabled builds.
Should be fine just ripping it out. The image cache in the content process keeps the surfaces alive in the GPU process as long as necessary anyways. The races I feared should be very uncommon, and now WebRender accepts an external image ID not mapping to anything (it will log a critical warning).

Attachment #8967829 - Flags: review?(jmuizelaar)
Attachment #8967829 - Flags: review?(jmuizelaar) → review+
This should be unmarked as a security bug, given it will only affect WebRender, and the consequence is that we won't release images from the cache in the GPU process when we should (memory leak).
Depends on: 1454112
I think we need to back this out, it caused a huge AWSY regression w/ WebRender (bug 1454346) as well as frequent crashes (bug 1454112, bug 1454114).
Flags: needinfo?(aryx.bugmail)
Flags: needinfo?(aryx.bugmail)
Resolution: FIXED → ---
Target Milestone: mozilla61 → ---
Group: core-security → gfx-core-security
My latest attempt to fix this has yielded fruit.

I haven't looked into why the the Images measurement went up by 30% / 1.5MB, but the Resident Memory went down by 6% / 70MB. Finally results that are starting to make some sense.
(In reply to Andrew Osmond [:aosmond] from comment #9)
> My latest attempt to fix this has yielded fruit.
> central&originalRevision=0e45c13b34e815cb42a9f08bb44142d1a81e186e&newProject=
> try&newRevision=385b2b7f02edc4049f1bf48fc61299d174a39beb&framework=4&showOnly
> Important=1
> I haven't looked into why the the Images measurement went up by 30% / 1.5MB,
> but the Resident Memory went down by 6% / 70MB. Finally results that are
> starting to make some sense.

Looking at the subtests it feels like you're keeping images alive longer [1]. The after 30s / forced GC numbers look okay.

After some consideration, I think this is what is happened before:

1) Load a page. Surface cache touched to create image container. Image container cached.
2) Tab switch away to prepare for a new page. Image container remains cached by the caller.
3) Repeat.
4a) Return to tabs in order to close. Surface cache not touched as the image container was kept alive. Doesn't matter if imgIContainer::GetImageContainer(AtSize) is called because it exits before hitting the cache if the container already exists.
4b) In parallel, items in cache are expired and removed from reporting. The surfaces may still be kept alive by the image container, but no longer reported.

With my new changes:

2) Tab switch away to prepare for a new page. Image container is discarded.
4a). Return to tabs in order to close. Hits surface cache in order to get the image container for the last brief display before closing.
4b). Same as before, but the timer got reset, so the images live a bit longer.
Component: Graphics: Layers → Graphics: WebRender
This reveals a shortcoming in imagelib as well. Ideally we wouldn't expire cache entries that still have an image container alive. This would also make reporting more accurate, e.g. not underreporting.
I don't believe linux64 suffers much from this misreporting. Images are less likely to be layerized. That would cause them to go via the imgIContainer::Draw path, which will touch the surface cache again and keep the images alive longer.

While the patch appears to regress linux64-qr, in reality it just brings us back in line with what one would expect. Since we are showing the same images, why are we would imagelib report less memory usage? There isn't any obvious reason why WebRender would be more efficient here.
Group: gfx-core-security
This fixes how we would hold onto the image containers for too long. Tab switching away would keep the WebRenderUserData objects, while scrolling would toss them.
Attachment #8967829 - Attachment is obsolete: true
This sort of does a 180 on the consumer code and actually uses it. I didn't run into any failures without it, but unfortunately there is a small chance of a race...
And finally, add the missing shutdown code that should have been there all along. It is a bit tricky because we want to ensure the render thread has finished all its tasks before we remove any lingering textures.

Attachment #8969372 - Attachment is obsolete: true
Attachment #8969745 - Flags: review?(sotaro.ikeda.g)
Attachment #8969375 - Attachment is obsolete: true
Attachment #8969747 - Flags: review?(sotaro.ikeda.g)
Comment on attachment 8969371 [details] [diff] [review]
0001-Bug-1453801-Part-1.-Ensure-WebRenderUserData-objects.patch, v1

Attachment #8969371 - Flags: review?(sotaro.ikeda.g)
I see why the original change made things worse instead of fixing them. Correcting the image lifetime issue on the content process caused us to redecode the images more frequently. Since the rendering thread was keeping the old images alive, more images decoded will mean the memory footprint keeps getting higher. Hence why I needed to fix all of the bugs to see the improvement.
Attachment #8969371 - Flags: review?(sotaro.ikeda.g) → review+
Attachment #8969745 - Flags: review?(sotaro.ikeda.g) → review+
Comment on attachment 8969747 [details] [diff] [review]
0003-Bug-1453801-Part-3.-Fix-race-condition-shutting-down.patch, v2

Looks good!
Attachment #8969747 - Flags: review?(sotaro.ikeda.g) → review+
Pushed by
Part 1. Ensure WebRenderUserData objects are freed after a tab switch. r=sotaro
Part 2. Ensure shared surfaces are properly released from render texture cache. r=sotaro
Part 3. Fix race condition shutting down the render thread and shared surfaces. r=sotaro
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.