Closed Bug 1389988 Opened 2 years ago Closed 2 years ago

Intermittent Assertion failure: !aState.GetIsCurrentlyDecoded(), at FrameAnimator.cpp:471

Categories

(Core :: ImageLib, defect, P4)

defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- fixed

People

(Reporter: aryx, Assigned: aosmond)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files, 1 obsolete file)

https://treeherder.mozilla.org/logviewer.html#?job_id=122856153&repo=autoland

09:49:58     INFO - REFTEST TEST-LOAD | file:///Users/cltbld/tasks/task_1502638480/build/tests/reftest/tests/layout/reftests/invalidation/test-animated-image-layers-background.html | 898 / 996 (90%)
09:49:58     INFO - ++DOMWINDOW == 313 (0x124d34800) [pid = 1644] [serial = 2723] [outer = 0x10fe5d000]
09:49:58     INFO - [Child 1644] WARNING: stylo: ServoStyleSets cannot respond to document state changes yet (only matters for chrome documents). See bug 1290285.: file /home/worker/workspace/build/src/layout/base/PresShell.cpp, line 4327
09:49:58     INFO - Assertion failure: !aState.GetIsCurrentlyDecoded(), at /home/worker/workspace/build/src/image/FrameAnimator.cpp:471
09:49:58     INFO - #01: mozilla::image::RasterImage::LookupFrameInternal(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, unsigned int, mozilla::image::PlaybackType) [image/RasterImage.cpp:294]
09:49:58     INFO - 
09:49:58     INFO - #02: mozilla::image::RasterImage::LookupFrame(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, unsigned int, mozilla::image::PlaybackType) [image/RasterImage.cpp:358]
09:49:58     INFO - 
09:49:58     INFO - #03: mozilla::image::RasterImage::GetFrameInternal(mozilla::gfx::IntSizeTyped<mozilla::gfx::UnknownUnits> const&, unsigned int, unsigned int) [image/RasterImage.cpp:582]
09:49:58     INFO - 
09:49:58     INFO - #04: mozilla::image::RasterImage::GetCurrentImage(mozilla::layers::ImageContainer*, unsigned int) [mfbt/Tuple.h:329]
09:49:58     INFO - 
09:49:58     INFO - #05: mozilla::image::RasterImage::GetImageContainer(mozilla::layers::LayerManager*, unsigned int) [mfbt/Tuple.h:329]
09:49:58     INFO - 
09:49:58     INFO - #06: nsDisplayImageContainer::GetContainer(mozilla::layers::LayerManager*, nsDisplayListBuilder*) [xpcom/base/nsCOMPtr.h:402]
09:49:58     INFO - 
09:49:58     INFO - #07: nsDisplayBackgroundImage::BuildLayer(nsDisplayListBuilder*, mozilla::layers::LayerManager*, mozilla::ContainerLayerParameters const&) [mfbt/AlreadyAddRefed.h:121]
09:49:58     INFO - 
09:49:58     INFO - #08: mozilla::ContainerState::ProcessDisplayItems(nsDisplayList*) [mfbt/AlreadyAddRefed.h:121]
09:49:58     INFO - 
09:49:58     INFO - #09: mozilla::FrameLayerBuilder::BuildContainerLayerFor(nsDisplayListBuilder*, mozilla::layers::LayerManager*, nsIFrame*, nsDisplayItem*, nsDisplayList*, mozilla::ContainerLayerParameters const&, mozilla::gfx::Matrix4x4Typed<mozilla::gfx::UnknownUnits, mozilla::gfx::UnknownUnits> const*, unsigned int) [layout/painting/FrameLayerBuilder.cpp:5704]
09:49:58     INFO - 
09:49:58     INFO - #10: nsDisplayList::PaintRoot(nsDisplayListBuilder*, gfxContext*, unsigned int) [layout/painting/nsDisplayList.cpp:2203]
09:49:58     INFO - 
09:49:58     INFO - #11: nsLayoutUtils::PaintFrame(gfxContext*, nsIFrame*, nsRegion const&, unsigned int, nsDisplayListBuilderMode, nsLayoutUtils::PaintFrameFlags) [mfbt/AlreadyAddRefed.h:121]
09:49:58     INFO - 
09:49:58     INFO - #12: mozilla::PresShell::Paint(nsView*, nsRegion const&, unsigned int) [layout/base/PresShell.cpp:6473]
09:49:58     INFO - 
09:49:58     INFO - #13: nsDOMWindowUtils::UpdateLayerTree() [gfx/src/nsRegion.h:75]
09:49:58     INFO - 
09:49:59     INFO - ###!!! [Parent][MessageChannel] Error: (msgtype=0x150078,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
09:49:59     INFO - Assertion failure: mDestroyed, at /home/worker/workspace/build/src/gfx/layers/composite/ImageLayerComposite.cpp:44
09:49:59     INFO - #01: <name omitted> [memory/mozalloc/mozalloc.h:224]
09:49:59     INFO - 
09:49:59     INFO - #02: mozilla::layers::ContainerLayer::RemoveChild(mozilla::layers::Layer*) [gfx/layers/Layers.h:827]
09:49:59     INFO - 
09:49:59     INFO - #03: mozilla::layers::ContainerLayerComposite::~ContainerLayerComposite() [gfx/layers/composite/ContainerLayerComposite.cpp:666]
09:49:59     INFO - 
09:49:59     INFO - #04: mozilla::layers::ContainerLayerProperties::~ContainerLayerProperties() [memory/mozalloc/mozalloc.h:224]
09:49:59     INFO - 
09:49:59     INFO - #05: nsTArray_Impl<mozilla::UniquePtr<mozilla::layers::LayerPropertiesBase, mozilla::DefaultDelete<mozilla::layers::LayerPropertiesBase> >, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned long, unsigned long) [xpcom/ds/nsTArray.h:2011]
09:49:59     INFO - 
09:49:59     INFO - #06: mozilla::layers::ContainerLayerProperties::~ContainerLayerProperties() [xpcom/ds/nsTArray-inl.h:21]
09:49:59     INFO - 
09:49:59     INFO - #07: mozilla::layers::ContainerLayerProperties::~ContainerLayerProperties() [memory/mozalloc/mozalloc.h:224]
09:49:59     INFO - 
09:49:59     INFO - #08: nsTArray_Impl<mozilla::UniquePtr<mozilla::layers::LayerPropertiesBase, mozilla::DefaultDelete<mozilla::layers::LayerPropertiesBase> >, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned long, unsigned long) [xpcom/ds/nsTArray.h:2011]
09:49:59     INFO - 
09:49:59     INFO - #09: mozilla::layers::ContainerLayerProperties::~ContainerLayerProperties() [xpcom/ds/nsTArray-inl.h:21]
09:49:59     INFO - 
09:49:59     INFO - #10: mozilla::layers::ContainerLayerProperties::~ContainerLayerProperties() [memory/mozalloc/mozalloc.h:224]
09:49:59     INFO - 
09:49:59     INFO - #11: mozilla::layers::LayerManagerComposite::UpdateAndRender() [mfbt/UniquePtr.h:340]
09:49:59     INFO - 
09:49:59     INFO - #12: mozilla::layers::LayerManagerComposite::EndTransaction(mozilla::TimeStamp const&, mozilla::layers::LayerManager::EndTransactionFlags) [mfbt/RefPtr.h:314]
09:49:59     INFO - 
09:49:59     INFO - #13: mozilla::layers::CompositorBridgeParent::CompositeToTarget(mozilla::gfx::DrawTarget*, mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const*) [gfx/layers/ipc/CompositorBridgeParent.cpp:1044]
09:49:59     INFO - 
09:49:59     INFO - #14: mozilla::layers::CompositorVsyncScheduler::Composite(mozilla::TimeStamp) [gfx/layers/ipc/CompositorVsyncScheduler.cpp:263]
09:49:59     INFO - 
09:49:59     INFO - #15: mozilla::detail::RunnableMethodImpl<mozilla::layers::CompositorVsyncScheduler*, void (mozilla::layers::CompositorVsyncScheduler::*)(mozilla::TimeStamp), true, (mozilla::RunnableKind)1, mozilla::TimeStamp>::Run() [xpcom/threads/nsThreadUtils.h:1122]
09:49:59     INFO - 
09:49:59     INFO - #16: MessageLoop::RunTask(already_AddRefed<nsIRunnable>) [xpcom/base/nsCOMPtr.h:374]
09:49:59     INFO - 
09:49:59     INFO - #17: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask&&) [ipc/chromium/src/base/message_loop.cc:460]
09:49:59     INFO - 
09:49:59     INFO - #18: MessageLoop::DoWork() [ipc/chromium/src/base/message_loop.cc:535]
09:49:59     INFO - 
09:49:59     INFO - #19: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_default.cc:36]
09:49:59     INFO - 
09:49:59     INFO - #20: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:599]
09:49:59     INFO - 
09:49:59     INFO - #21: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:184]
09:49:59     INFO - 
09:49:59     INFO - #22: ThreadFunc(void*) [ipc/chromium/src/base/platform_thread_posix.cc:39]
09:49:59     INFO - 
09:49:59     INFO - #23: libsystem_pthread.dylib + 0x405a
09:49:59     INFO - 
09:49:59     INFO - #24: libsystem_pthread.dylib + 0x3fd7
Priority: -- → P4
Summary: Intermittent Assertion failure: !aState.GetIsCurrentlyDecoded(), at FrameAnimator.cpp:471 → stylo: Intermittent Assertion failure: !aState.GetIsCurrentlyDecoded(), at FrameAnimator.cpp:471
[Tracking Requested - why for this release]: regression

bughunter sees this on https://www.stickmanbangkok.com/readers-submissions/
you may need to reload. occurs in Windows, Linux. Reproduced manually on Windows 7 64bit.
Assignee: nobody → bwerth
Attached file Reduced testcase
The greatly reduced testcase shows that this is just an animated GIF that is being manipulated by script in some way. The script is minified jQuery and so it's not at all obvious to me what is being done to the image.

This is still intermittent. I can reproduce on macOS by repeatedly opening a new tab with the testcase. Reloading an existing tab does not trigger the crash for me. When the crash occurs, it seems to do so before the image has been rendered, which may explain the intermittent. If the script runs before the first paint: a crash.

I'm still working on understanding the rendering state when the crash happens. The assert that is failing is that both mCompositedFrameInvalid and mHasBeenDecoded are true on the AnimationState when it's time to draw the frame in FrameAnimator::GetCompositedFrame.
Thanks for reducing this!

Can you confirm that this only happens with stylo disabled? Looking at the history, it seems like this bug was tagged as a stylo bug because it appeared one on the stylo-only CI. But that might just be chance, and given that this is sounding pretty image-y I'm wondering if it's unrelated to stylo.
Flags: needinfo?(bwerth)
Bughunter is running with the default prefs and should have had stylo enabled when it was reproduced there. I just reproduced locally on Fedora definitely with stylo enabled.
Reproduces without servo enabled for me on macOS. Does not appear to be a stylo bug.
Flags: needinfo?(bwerth)
Summary: stylo: Intermittent Assertion failure: !aState.GetIsCurrentlyDecoded(), at FrameAnimator.cpp:471 → Intermittent Assertion failure: !aState.GetIsCurrentlyDecoded(), at FrameAnimator.cpp:471
I believe attachment 8944820 [details] solves the problem; it at least prevents my reduced testcase attachment 8910537 [details] from crashing. I'm not exactly sure why this fix works, but the change is a commonsense optimization that possibly removes a race condition. I hope that bbirtles can verify whether or not this is sensible in a review.
Attachment #8944820 - Flags: review?(bbirtles)
Attachment #8944820 - Flags: review?(bbirtles) → review?(aosmond)
(In reply to Brad Werth [:bradwerth] from comment #12)
> I believe attachment 8944820 [details] solves the problem; it at least
> prevents my reduced testcase attachment 8910537 [details] from crashing. I'm
> not exactly sure why this fix works, but the change is a commonsense
> optimization that possibly removes a race condition. I hope that bbirtles
> can verify whether or not this is sensible in a review.

Sorry for the delay on this. I have been looking at this on and off, but I haven't convinced myself about the fix yet. These assertions were tricky to get right in my head the first time :).
There is a race condition here related to sync decoding. I'll see what I can dig up from rr.
Comment on attachment 8944820 [details]
Bug 1389988 Part 1: Prevent AnimationState::UpdateStateInternal from repeatedly setting mIsCurrentlyDecoded true every frame; avoiding race condition.

The problem is very different and sort of complicated. I put together something that will (should?) fix the intermittent, assuming it is the same as the attached test case.

I could reproduce easily if I changed dom.ipc.processCount from 4 to 1 to ensure the additional tabs are in the same content process.
Attachment #8944820 - Flags: review?(aosmond)
Comment on attachment 8944820 [details]
Bug 1389988 Part 1: Prevent AnimationState::UpdateStateInternal from repeatedly setting mIsCurrentlyDecoded true every frame; avoiding race condition.

https://reviewboard.mozilla.org/r/214978/#review222084

This shows there is a data race elsewhere.
Attachment #8944820 - Flags: review-
Comment on attachment 8944820 [details]
Bug 1389988 Part 1: Prevent AnimationState::UpdateStateInternal from repeatedly setting mIsCurrentlyDecoded true every frame; avoiding race condition.

Happy to have a real fix. Thanks!
Attachment #8944820 - Attachment is obsolete: true
Assignee: bwerth → nobody
(In reply to Brad Werth [:bradwerth] from comment #19)
> Comment on attachment 8944820 [details]
> Bug 1389988 Part 1: Prevent AnimationState::UpdateStateInternal from
> repeatedly setting mIsCurrentlyDecoded true every frame; avoiding race
> condition.
> 
> Happy to have a real fix. Thanks!

And thank you for the reduced test case and making me stare at the review for a while, I wouldn't have solved it otherwise.
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Attachment #8946348 - Flags: review?(tnikkel)
Component: Layout → ImageLib
Comment on attachment 8946348 [details] [diff] [review]
0001-Bug-1389988-Handle-a-data-race-between-a-new-sync-de.patch, v1

Review of attachment 8946348 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/DecoderFactory.cpp
@@ +135,5 @@
>    decoder->SetOutputSize(aOutputSize);
>    decoder->SetDecoderFlags(aDecoderFlags | DecoderFlags::FIRST_FRAME_ONLY);
>    decoder->SetSurfaceFlags(aSurfaceFlags);
>  
> +  nsresult rv = decoder->Init();

This is super nit picky, but this could also return already initialized and we don't want to confuse that with the surface cache insertion failure. So I guess just return a generic failure here?

::: image/DecoderFactory.h
@@ +63,5 @@
>     *                    image.
>     * @param aDecoderFlags Flags specifying the behavior of this decoder.
>     * @param aSurfaceFlags Flags specifying the type of output this decoder
>     *                      should produce.
> +   * @param aTask Task representing the decoder.

I think the style is to use aOutTask for this, no?

@@ +68,2 @@
>     */
> +  static nsresult

Since the already initialized return value has a special meaning you should document that please.
Attachment #8946348 - Flags: review?(tnikkel) → review+
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e8e9b5a102ac
Handle a data race between a new sync decode request and a pending decoder. r=tnikkel
Duplicate of this bug: 1398316
https://hg.mozilla.org/mozilla-central/rev/e8e9b5a102ac
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Andrew, do you think this is applicable to 59 and worth uplift to beta?
Flags: needinfo?(aosmond)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #25)
> Andrew, do you think this is applicable to 59 and worth uplift to beta?

This was introduced in bug 1343341 and release 55, so it has been there for a while. I think it is safe to uplift, but there isn't any urgency for an end user perspective. It will only assert in debug builds, and probably flicker once at worst in a release build. Happy to put in the request if it helps for automated tests.
Flags: needinfo?(aosmond)
No sign of this actively affecting mozilla-beta, so let's have this ride the trains.
You need to log in before you can comment on or make changes to this bug.