Closed Bug 1641594 Opened 4 months ago Closed 4 months ago

Assertion failure: mParent->mRecycleLockCount > 0, at /builds/worker/checkouts/gecko/image/imgFrame.cpp:967


(Core :: ImageLib, defect)




Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- verified


(Reporter: jkratzer, Assigned: aosmond)


(Blocks 1 open bug)


(Keywords: assertion, testcase, Whiteboard: [bugmon:bisected,confirmed])


(4 files)

Attached file testcase.html

Testcase found while fuzzing mozilla-central rev cfa4bd8e6f78 (built with --enable-debug). Testcase must be served via HTTP in order to reproduce.

Assertion failure: mParent->mRecycleLockCount > 0, at /builds/worker/checkouts/gecko/image/imgFrame.cpp:967

rax = 0x00007fd4d19a3db1   rdx = 0x0000000000000000
rcx = 0x000055c439289a58   rbx = 0x00000000000082dc
rsi = 0x00007fd4e2d748b0   rdi = 0x00007fd4e2d73680
rbp = 0x00007fd4e0044460   rsp = 0x00007fd4e0044440
r8 = 0x00007fd4e2d748b0    r9 = 0x00007fd4e0045700
r10 = 0x0000000000000002   r11 = 0x0000000000000000
r12 = 0x00007fd480007068   r13 = 0x00007fd4e0044690
r14 = 0x000055c43b0dd260   r15 = 0x000055c43b0dd288
rip = 0x00007fd4ca4a3815
OS|Linux|0.0.0 Linux 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64
CPU|amd64|family 6 model 94 stepping 3|8
14|7||mozilla::UniquePtr<mozilla::layers::PaintTask, mozilla::DefaultDelete<mozilla::layers::PaintTask> >::reset(mozilla::layers::PaintTask*)||302|0xe
14|8||mozilla::detail::RunnableFunction<mozilla::layers::PaintThread::QueuePaintTask(mozilla::UniquePtr<mozilla::layers::PaintTask, mozilla::DefaultDelete<mozilla::layers::PaintTask> >&&)::$_7>::~RunnableFunction()||565|0x2b
14|10||nsThread::ProcessNextEvent(bool, bool*)||415|0x9
14|11||NS_ProcessNextEvent(nsIThread*, bool)||501|0xc

Flags: in-testsuite?
Whiteboard: [bugmon:confirm] → [bugmon:bisected,confirmed]
Bugmon Analysis:
Verified bug as reproducible on mozilla-central 20200528032513-cfa4bd8e6f78.
Failed to bisect testcase (Start build crashes!):
> Start: 73c98da145a7c0ef518404493b23a979f328768e (20190530034755)
> End: cfa4bd8e6f789fcca1de2272f0d5b11c0ded913f (20200528032513)
> BuildFlags: BuildFlags(asan=False, tsan=False, debug=True, fuzzing=False, coverage=False, valgrind=False)
Severity: normal → S3
Flags: needinfo?(aosmond)

I have tried several times without success to reproduce this. The counter logic is pretty straightforward -- increment in the constructor (while holding the lock) and decrement in the deconstructor (while holding the lock). Those are the only places it is modified. There is some funny business with ClearCommands where we explicitly call the destructors, perhaps there are cases we manage to call it twice. I'd love to dig further but hard to prioritize unless we also see this with WebRender (completely different code path for managing the lifetimes).

One question I have is, where there any special prefs set? Much of Linux will get WebRender by default on nightly/developer builds, so either the machine which ran the fuzzing is a little special (is it in CI with no real GPU?) or maybe some prefs got set.

Flags: needinfo?(aosmond) → needinfo?(jkratzer)
Attached file prefs.js
Flags: needinfo?(jkratzer)

Andrew, I've attached the prefs that I used to reproduce this testcase. Note that I can reproduce this on my local machine with a standard graphics card.

Interesting I never thought to set both the threshold and the batch size to 0 and 1 respectively. I played with them but not specifically those values. I can reproduce now. Thanks!

So it appears RecyclingSourceSurface destructor never gets called and so the lock count goes up and up and up. Eventually it rolls over. That would do it!

I could paper over this by increasing the counter size, but I think the real fix is to remove RecyclingSourceSurface. Otherwise we end up making apparently thousands upon thousands of short term allocations we don't need and blowing up the memory usage during animations.

  1. We can use SourceSurface::hasOneRef to determine if anyone else outside of imagelib holds a reference to mLockedSurface, that way we can hand it out directly.

  2. WebRender does extra tracking work if it is a recycling shared surface. We need to ensure we mark the type of mLockedSurface as DATA_SHARED_RECYCLING to replace the existing check.

This should not be too much work but I don't really want to land this during the soft freeze, so I'll get it ready for the next cycle.

Assignee: nobody → aosmond
OS: Unspecified → All
Hardware: Unspecified → All

We can perform the same function as RecyclingSourceSurface by checking
the ref count of the underlying surface directly. We need to ensure
WebRender is explicitly aware that it is a recycled surface, but that is
easily achieved by changing the type of the surface. This avoids
unnecessary heap allocations, particularly in the case where many
elements on the same page refer to the same animation (and thus
duplicating RecyclingSourceSurface objects).

Pushed by
Remove the wrapper around recycled surfaces. r=tnikkel
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79

The patch landed in nightly and beta is affected.
:aosmond, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(aosmond)
Keywords: bugmon
Bugmon Analysis:
Verified bug as fixed on rev mozilla-central 20200615092624-f05a0084c5f2.
Removing bugmon keyword as no further action possible.
Please review the bug and re-add the keyword for further analysis.
Flags: needinfo?(aosmond)
You need to log in before you can comment on or make changes to this bug.