Closed Bug 990174 Opened 8 years ago Closed 3 years ago

SkiaGL runs out of memory when drawing lots of shadowed images

Categories

(Core :: Canvas: 2D, defect, P5)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
fennec + ---

People

(Reporter: snorp, Assigned: snorp)

References

Details

(Whiteboard: [skia-upstream])

Attachments

(2 files, 1 obsolete file)

We have a test that draws a shadowed image in a loop. We run out of memory on Android here fairly quickly. I've done some investigation, and it appears that Skia does some save/restore layer stuff for the blur filter. It creates a texture (and associated render buffer) for each of those, which are then cached. Either our cache settings are too large, or the cache eviction is not working for these types of textures, because we run out of memory (GL_OUT_OF_MEMORY) at some point. 

The test in question is test_2d_shadow_blur_low()[1], part of test_canvas.html.

[1] http://dxr.mozilla.org/mozilla-central/source/content/canvas/test/test_canvas.html#16430
One issue I see is that the cache lookup should probably be succeeding, at least on some of these, but it never does.
We might be leaking the SkImageFilter. SkAutoTUnref might be the solution here.
(In reply to George Wright (:gw280) from comment #2)
> We might be leaking the SkImageFilter. SkAutoTUnref might be the solution
> here.

While you are right, it doesn't look like that fixed this particular issue.
I modified the ResourceCacheTest test and was able to reproduce it within the Skia test suite.
Whiteboard: [skia-upstream]
Hi all, I'm not too familiar with the SkDropShadowImageFilter. You may want to loop in senorblanco@chromium.org. He is the owner of most of the image filter code in Skia.
Flags: needinfo?(senorblanco)
There's still a leak in the test code.

Objects in Skia are created with a refcount of 1. SkRefPtr refs its argument on creation, and derefs it on destruction. So this code will leave the filter with a refcount of 1 at end of execution.

I recommend the use of SkAutoTUnref<> instead, which does not ref on construction. It is not as powerful as a full-on smart pointer, but it tends to work better with skia semantics.
Just because I'm anal, here's valgrind to back me up:

==8536== 14,400 (12,800 direct, 1,600 indirect) bytes in 200 blocks are definitely lost in loss record 20 of 20
==8536==    at 0x4C2B1C7: operator new(unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==8536==    by 0x4EA07D: ResourceCache(skiatest::Reporter*, GrContextFactory*) (SkDropShadowImageFilter.h:16)
==8536==    by 0x4499F5: skiatest::Test::run() (Test.cpp:108)
==8536==    by 0x40B04C: SkTestRunnable::run() (skia_test.cpp:108)
==8536==    by 0x40A7F0: tool_main(int, char**) (skia_test.cpp:217)
==8536==    by 0x6A5776C: (below main) (libc-start.c:226)
(In reply to Stephen White from comment #6)
> There's still a leak in the test code.
> 
> Objects in Skia are created with a refcount of 1. SkRefPtr refs its argument
> on creation, and derefs it on destruction. So this code will leave the
> filter with a refcount of 1 at end of execution.
> 
> I recommend the use of SkAutoTUnref<> instead, which does not ref on
> construction. It is not as powerful as a full-on smart pointer, but it tends
> to work better with skia semantics.

Well that's embarrassing. I always get confused between our own ref count behavior and yours.

I changed it to:

SkAutoTUnref<SkDropShadowImageFilter> filter(SkDropShadowImageFilter::Create(5, 5, (float)i / 2.0, SK_ColorGRAY));

And I get the same out-of-memory behavior. I'm on Mac, so don't have valgrind handy at the moment, but I'm relatively certain the image filter is getting destroyed now.
Flags: needinfo?(senorblanco)
The cache never seems to think it has much stuff in it. The stats right before crash:

D/skia    (21276): Budget: 1000 items 12595200 bytes
D/skia    (21276): 		Entry Count: current 6 (4 locked) high 14
D/skia    (21276): 		Entry Bytes: current 10675161 high 27452377
D/skia    (21276): 		Detached Entry Count: current 1 high 6
D/skia    (21276): 		Detached Bytes: current 2395536 high 25550692
Alright, I ran this through the Mac OpenGL Profiler, and after each iteration I see 6 additional shader objects created. Presumably this is where the leak is coming from.
Call stack where the shaders are created:

	0: 0x1005fe4f5 in attach_shader(GrGLContext const&, unsigned int, unsigned int, SkString const&) at GrGLShaderBuilder.cpp: 642 
	1: 0x1005ffbd5 in GrGLFullShaderBuilder::compileAndAttachShaders(unsigned int, SkTDArray<unsigned int>*) const at GrGLShaderBuilder.cpp: 891 
	2: 0x1005fd669 in GrGLShaderBuilder::finish(unsigned int*) at GrGLShaderBuilder.cpp: 581 
	3: 0x1005ea6d5 in GrGLProgram::genProgram(GrGLShaderBuilder*, GrEffectStage const**, GrEffectStage const**) at GrGLProgram.cpp: 154 
	4: 0x1005e9b89 in GrGLProgram::GrGLProgram(GrGpuGL*, GrGLProgramDesc const&, GrEffectStage const**, GrEffectStage const**) at GrGLProgram.cpp: 53 
	5: 0x1005e9955 in GrGLProgram::GrGLProgram(GrGpuGL*, GrGLProgramDesc const&, GrEffectStage const**, GrEffectStage const**) at GrGLProgram.cpp: 63 
	6: 0x1005e989b in GrGLProgram::Create(GrGpuGL*, GrGLProgramDesc const&, GrEffectStage const**, GrEffectStage const**) at GrGLProgram.cpp: 27 
	7: 0x100620bac in GrGpuGL::ProgramCache::getProgram(GrGLProgramDesc const&, GrEffectStage const**, GrEffectStage const**) at GrGpuGL_program.cpp: 128 
	8: 0x100621529 in GrGpuGL::flushGraphicsState(GrGpu::DrawType, GrDeviceCoordTexture const*) at GrGpuGL_program.cpp: 246 
	9: 0x100586138 in GrGpu::setupClipAndFlushState(GrGpu::DrawType, GrDeviceCoordTexture const*, GrDrawState::AutoRestoreEffects*, SkRect const*) at GrGpu.cpp: 335 
	10: 0x10058646a in GrGpu::onDraw(GrDrawTarget::DrawInfo const&) at GrGpu.cpp: 374 
	11: 0x10058fa42 in GrDrawTarget::executeDraw(GrDrawTarget::DrawInfo const&) at GrDrawTarget.h: 495 
	12: 0x10058ca00 in GrInOrderDrawBuffer::flush() at GrInOrderDrawBuffer.cpp: 610 
	13: 0x1005618cb in GrContext::flush(int) at GrContext.cpp: 1213 
	14: 0x10056d5a5 in GrContext::AutoCheckFlush::~AutoCheckFlush() at GrContext.cpp: 74 
	15: 0x10056a975 in GrContext::AutoCheckFlush::~AutoCheckFlush() at GrContext.cpp: 74 
	16: 0x1005653ef in GrContext::drawRectToRect(GrPaint const&, SkRect const&, SkRect const&, SkMatrix const*, SkMatrix const*) at GrContext.cpp: 901 
	17: 0x10062acef in SkGpuDevice::drawDevice(SkDraw const&, SkBaseDevice*, int, int, SkPaint const&) at SkGpuDevice.cpp: 1722 
	18: 0x10027a20a in SkCanvas::internalDrawDevice(SkBaseDevice*, int, int, SkPaint const*) at SkCanvas.cpp: 1269 
	19: 0x100277f3e in SkCanvas::internalRestore() at SkCanvas.cpp: 1053 
	20: 0x1002838ce in AutoDrawLooper::~AutoDrawLooper() at SkCanvas.cpp: 371 
	21: 0x100282d45 in AutoDrawLooper::~AutoDrawLooper() at SkCanvas.cpp: 374 
	22: 0x10027af0b in SkCanvas::internalDrawBitmap(SkBitmap const&, SkMatrix const&, SkPaint const*) at SkCanvas.cpp: 1237 
	23: 0x10027f0a8 in SkCanvas::drawBitmap(SkBitmap const&, float, float, SkPaint const*) at SkCanvas.cpp: 2121 
	24: 0x1003e8751 in SkDropShadowImageFilter::onFilterImage(SkImageFilter::Proxy*, SkBitmap const&, SkImageFilter::Context const&, SkBitmap*, SkIPoint*) const at SkDropShadowImageFilter.cpp: 93 
	25: 0x1002ba4c5 in SkImageFilter::filterImage(SkImageFilter::Proxy*, SkBitmap const&, SkImageFilter::Context const&, SkBitmap*, SkIPoint*) const at SkImageFilter.cpp: 106 
	26: 0x100279f69 in SkCanvas::internalDrawDevice(SkBaseDevice*, int, int, SkPaint const*) at SkCanvas.cpp: 1262 
	27: 0x100277f3e in SkCanvas::internalRestore() at SkCanvas.cpp: 1053 
	28: 0x1002838ce in AutoDrawLooper::~AutoDrawLooper() at SkCanvas.cpp: 371 
	29: 0x100282d45 in AutoDrawLooper::~AutoDrawLooper() at SkCanvas.cpp: 374 
	30: 0x10027af0b in SkCanvas::internalDrawBitmap(SkBitmap const&, SkMatrix const&, SkPaint const*) at SkCanvas.cpp: 1237 
	31: 0x10027f0a8 in SkCanvas::drawBitmap(SkBitmap const&, float, float, SkPaint const*) at SkCanvas.cpp: 2121
Apparently we generate a new program for each changing blur sigma, and those are never removed.
Looks like that was a red herring, the program generation does eventually seem to stop. Not sure where we are running out of resources.

On Mac, I die inside the NVIDIA driver due to some assertion failure after 605 iterations.
According to tracing, there aren't a bunch of textures or FBOs or anything running around. At this point I think maybe something is technically destroyed (so tracing doesn't show it), but still attached to something that's not, and therefore actually alive. We were having some trouble with the Android emulator due to some usage like that in Skia, but I can't seem to find the bug.
According to OpenGL Driver Monitor, free GPU memory goes from about ~400M to 50M right before the crash, so there is definitely some wonky retention going on.
Apitrace all the things \o/

(I'll take a look later today)
That's Chrome/Blink-specific, but perhaps the same cleanup is required in Gecko.
(In reply to Stephen White from comment #20)
> That's Chrome/Blink-specific, but perhaps the same cleanup is required in
> Gecko.

Interesting bug, but we don't ever call GrContext::contextLost() right now. Also, this particular leak occurs just by looping some drawBitmap() calls, so I don't think GrContext destruction plays into it at all.
tracking-fennec: --- → ?
Assignee: nobody → snorp
tracking-fennec: ? → +
Stephen, any update here?
Flags: needinfo?(elsenorblanco)
It looks like the shader cache is generating the shaders each draw, and failing to cache them properly. I'm not really sure how the ownership of the shaders is handled, to be honest. Brian, any clues?
filter on [mass-p5]
Priority: -- → P5

SkiaGL was removed by bug 1530471.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(elsenorblanco)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.