Closed Bug 1412554 Opened 2 years ago Closed 2 years ago

Intermittent GECKO(5440) | Assertion failed: mAllocatedResourceDeviceMemory[ResourceTypeIndex(resourceType)] >= memorySize, file z:/build/build/src/gfx/angle/src/libANGLE/renderer/d3d/d3d11/ResourceManager11.cpp, line 397

Categories

(Core :: Canvas: WebGL, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox57 --- wontfix
firefox58 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: cleu)

References

()

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted][stockwell fixed:product])

Attachments

(2 files)

Priority: P5 → P3
Whiteboard: [gfx-noted]
Whiteboard: [gfx-noted] → [gfx-noted][stockwell needswork]
This had 54 failures in 1 day and bug 1412477 had 38 failures in 1 day. It looks to be failing about 50% of the time.
Flags: needinfo?(milan)
See Also: → 1412383
It seems that this bug is related to bug 1412383, since both crash inside memory-management code.

Possibly another OOM failure that cannot be mitigated by enabling virtual memory.
Michael, can you keep following up here?  How can we resolve this?
Flags: needinfo?(milan) → needinfo?(cleu)
See Also: → 1413371
I am investigating this.

It seems that sometimes, when losing a GL context, ANGLE releases a 1GB texture, which is bigger than the total recorded allocation, so the assertion is triggered.

I compared a successful and a failed gl8 mochitest run and found that there is no 1GB texture release when the test succeeds.

So I suspect the 1GB release is bogus, but for now I have no idea why it happens.
Flags: needinfo?(cleu)
OK, it seems that the 1GB-sized texture is totally legit.

I added logging on every resource allocate/release, tracking the inventory's memory size before and after each call.

The two numbers represent the number of objects and the total size.

In one of the incrResource calls, one more object is allocated but the total size decreases significantly, which is weird.

Also, the log inside ComputeMippedMemoryUsage said the object size is 16384*16384*4 = 1GB.

So if nothing went wrong, the correct size should be 3,982,035,420 + (16,384 * 16,384 * 4) = 3,982,035,420 + 1,073,741,824 = 5,055,777,244

Taking the base-2 logarithm:

=>  log(5,055,777,244)/log(2) ~= 32.24

That needs more than 32 bits, so the counter simply overflowed.

So the problem is not OOM, it is actually an unsigned int overflow.
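
The arithmetic above can be reproduced in a few lines. This is a minimal sketch, assuming the counter is a 32-bit unsigned integer (as `size_t` would be in a 32-bit build); `AddWrapping32` and the constant names are made up for illustration and are not ANGLE code.

```cpp
#include <cstdint>

// Hypothetical helper: adds two sizes the way a 32-bit unsigned counter would,
// i.e. modulo 2^32. Unsigned overflow is well defined in C++ and wraps around.
constexpr std::uint32_t AddWrapping32(std::uint32_t counter, std::uint32_t size)
{
    return static_cast<std::uint32_t>(counter + size);
}

// Size of the texture as logged: 16384 * 16384 * 4 bytes = 1,073,741,824.
constexpr std::uint32_t kTextureBytes = 16384u * 16384u * 4u;

// Total size before the allocation, taken from the comment above.
constexpr std::uint32_t kBefore = 3982035420u;

// The true sum, computed in 64 bits so it cannot wrap.
constexpr std::uint64_t kTrueSum = std::uint64_t{kBefore} + kTextureBytes;

// What a 32-bit counter ends up holding after the same addition.
constexpr std::uint32_t kStored = AddWrapping32(kBefore, kTextureBytes);
```

Under that assumption the stored value is 5,055,777,244 - 4,294,967,296 = 760,809,948, which is smaller than the total before the allocation. That matches the observation that the total size dropped even though one more object was allocated.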

However, sometimes it does not allocate a texture that large.

Now I think we should investigate how ANGLE decides the texture size and why this size varies between mochitest runs.
I found that if we see the warning below, the test will not fail.
JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-mochitest/test_canvas_size.html, line 24: Error: WebGL warning: Requested size 16384x16384 was too large, but resize to 8192x8192 succeeded.

So this is why the size varies between mochitest runs: when the 16384x16384 request fails and is resized to 8192x8192, the large allocation never happens.

Interestingly, all the tests share the same ANGLE context, and the overflow does not crash until we release our WebGL context, so I think the crashes inside test_capture are actually caused by an earlier test rather than by test_capture itself.
We can work around this bug by limiting the maximum texture/renderbuffer size to 8192x8192 when running on 32-bit Windows, around here:
http://searchfox.org/mozilla-central/source/gfx/gl/GLContext.cpp#907

However, ANGLE itself should return OOM instead of silently overflowing.
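
For illustration, the size-limiting workaround mentioned above might look roughly like the following. This is only a sketch: `ClampMaxSize`, `kMaxSizeOn32BitWindows`, and the `is32BitWindows` flag are hypothetical names, not existing Gecko code, and how the platform would be detected is left out.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical cap taken from the comment above; not an existing Gecko constant.
constexpr std::uint32_t kMaxSizeOn32BitWindows = 8192;

// Hypothetical helper: clamps a reported maximum texture/renderbuffer
// dimension when running on 32-bit Windows, and leaves it alone otherwise.
constexpr std::uint32_t ClampMaxSize(std::uint32_t reportedMax, bool is32BitWindows)
{
    return is32BitWindows ? std::min(reportedMax, kMaxSizeOn32BitWindows)
                          : reportedMax;
}
```

With the cap in place, the largest 4-bytes-per-pixel surface would be 8192 * 8192 * 4 = 256MB instead of 1GB.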
Assignee: nobody → cleu
Comment on attachment 8926241 [details]
Bug 1412554 - Return GL_OOM if memory counter in ResourceManager11 overflow

https://reviewboard.mozilla.org/r/197502/#review202716


C/C++ static analysis found 0 defects in this patch.

You can run this analysis locally with: `./mach static-analysis check path/to/file.cpp`
This assert is actually caused by test_canvas_size.html in gl8, which attempts to allocate a very large canvas that may overflow the memory counter in ANGLE's D3D11 ResourceManager. ANGLE does not check for overflow when allocating resources; it only checks the memory size on release, so the assert fires in later tests when contexts are lost.
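
The failure mode described above can be modelled with a toy counter. This is a sketch under the assumption that the counter is 32 bits wide; `ToyResourceCounter` and `ReleaseWouldAssert` are hypothetical stand-ins, not ANGLE's classes.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-type memory counter; not ANGLE's class.
struct ToyResourceCounter
{
    std::uint32_t allocatedBytes = 0;

    // Models the unpatched behaviour: no overflow check on allocation.
    void incr(std::uint32_t size) { allocatedBytes += size; }

    // The invariant that the failing ASSERT checks before a release.
    bool canRelease(std::uint32_t size) const { return allocatedBytes >= size; }
};

// Returns true if releasing the large texture would trip the assertion.
inline bool ReleaseWouldAssert()
{
    ToyResourceCounter counter;
    counter.incr(3982035420u);           // total size before the large allocation
    const std::uint32_t big = 1u << 30;  // 16384 * 16384 * 4 bytes
    counter.incr(big);                   // wraps around instead of failing
    return !counter.canRelease(big);     // counter is now smaller than `big`
}
```

The allocation itself goes through without complaint; the inconsistency only surfaces later, when the texture is released.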

I submitted a patch that checks whether the memory counter would overflow, blocking the allocation and returning GL_OOM if it would.
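
A check of that kind can be sketched as follows. This is not the patch itself: `AllocResult` and `CheckedIncr` are hypothetical names, the counter is assumed to be 32 bits wide, and how the real change reports GL_OOM through ANGLE's error handling is not shown.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical result type standing in for ANGLE's error reporting.
enum class AllocResult { Ok, OutOfMemory };

// Adds `size` to `counter` only if the sum still fits in 32 bits.
// On a would-be overflow the counter is left untouched and the caller
// is told to treat the allocation as out of memory.
inline AllocResult CheckedIncr(std::uint32_t &counter, std::uint32_t size)
{
    if (size > std::numeric_limits<std::uint32_t>::max() - counter)
    {
        return AllocResult::OutOfMemory;
    }
    counter += size;
    return AllocResult::Ok;
}
```

The comparison is written as `size > max - counter` rather than `counter + size > max` because the latter would itself wrap and never be true.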

https://treeherder.mozilla.org/#/jobs?repo=try&revision=030c3f88759612e45f009e26dfae51be293dce6a&selectedJob=142933938

It seems that those intermittent failures have disappeared after this change.
Attachment #8926241 - Flags: review?(jgilbert)
I reported this issue to Google as well.

https://bugs.chromium.org/p/angleproject/issues/detail?id=2234
If :jgilbert is not available to review, can we get someone else to review this? Right now the failure rate of >600/week is pretty high.
I'll review today.
Comment on attachment 8926241 [details]
Bug 1412554 - Return GL_OOM if memory counter in ResourceManager11 overflow

https://reviewboard.mozilla.org/r/197502/#review203870

::: gfx/angle/src/libANGLE/renderer/d3d/d3d11/ResourceManager11.cpp:391
(Diff revision 1)
>      {
>          ANGLE_TRY(ClearResource(renderer, desc, resource));
>      }
>  
>      ASSERT(resource);
>      incrResource(GetResourceTypeFromD3D11<T>(), ComputeMemoryUsage(desc));

Calculate the memory only once, store it in a local:
`const auto resourceSize = ComputeMemoryUsage(desc);`


Honestly this should probably just be uint64, but we can take this fix now, and recommend ANGLE do that.
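
As a rough illustration of the uint64 suggestion, a counter that is 64 bits wide holds the sum from this bug without wrapping. `WideResourceCounter` is a hypothetical name for this sketch, not a proposed ANGLE type.

```cpp
#include <cstdint>

// Hypothetical 64-bit variant of the memory counter.
struct WideResourceCounter
{
    std::uint64_t allocatedBytes = 0;

    void incr(std::uint64_t size) { allocatedBytes += size; }

    // Returns false instead of asserting if the release is larger than
    // what has been recorded, so the sketch stays easy to test.
    bool decr(std::uint64_t size)
    {
        if (allocatedBytes < size)
        {
            return false;
        }
        allocatedBytes -= size;
        return true;
    }
};
```

With this width, 3,982,035,420 + 1,073,741,824 is stored as 5,055,777,244 and the later release of the 1GB texture balances out.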
Attachment #8926241 - Flags: review?(jgilbert) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/1c459f5c05e8
Return GL_OOM if memory counter in ResourceManager11 overflow r=jgilbert
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/1c459f5c05e8
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Duplicate of this bug: 1413371
Duplicate of this bug: 1412477
Whiteboard: [gfx-noted][stockwell needswork:owner] → [gfx-noted][stockwell fixed:product]
Duplicate of this bug: 1414703
Duplicate of this bug: 1412383
Sweet, this was only hit on try in the last 7 days.