Closed Bug 1530657 Opened 5 years ago Closed 5 years ago

Intermittent GECKO(10828) | Assertion failure: aCount == 0 || aStart < Length() (Invalid aStart index), at z:/build/build/src/obj-firefox/dist/include/nsTArray.h:2222

Categories

(Core :: Layout, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 --- fixed
firefox67 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: tnikkel)

References

Details

(Keywords: assertion, intermittent-failure, regression)

Attachments

(2 files, 1 obsolete file)

#[markdown(off)]
Filed by: cbrindusan [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=230513528&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/KCQKYSOoS0Cr1gVkkmzrDA/runs/0/artifacts/public/logs/live_backing.log

11:31:22 INFO - TEST-START | layout/svg/tests/test_hover_near_text.html
11:31:22 INFO - GECKO(10828) | Assertion failure: aCount == 0 || aStart < Length() (Invalid aStart index), at z:/build/build/src/obj-firefox/dist/include/nsTArray.h:2222
11:31:22 INFO - GECKO(10828) | #01: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #02: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #03: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #04: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #05: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #06: static class nsView * BeginSwapDocShellsForViews(class nsView *) [layout/generic/nsSubDocumentFrame.cpp:1046]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #07: nsSubDocumentFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsSubDocumentFrame.cpp:952]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #08: nsLineBox::DeleteLineList(nsPresContext *,nsLineList &,nsIFrame *,nsFrameList *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsLineBox.cpp:365]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #09: nsBlockFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsBlockFrame.cpp:329]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #10: nsLineBox::DeleteLineList(nsPresContext *,nsLineList &,nsIFrame *,nsFrameList *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsLineBox.cpp:365]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #11: nsBlockFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsBlockFrame.cpp:329]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #12: nsFrameList::DestroyFramesFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsFrameList.cpp:50]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #13: nsContainerFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsContainerFrame.cpp:214]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #14: nsCanvasFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsCanvasFrame.cpp:218]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #15: nsFrameList::DestroyFramesFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsFrameList.cpp:50]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #16: nsContainerFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsContainerFrame.cpp:214]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #17: nsFrameList::DestroyFramesFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsFrameList.cpp:50]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #18: nsContainerFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsContainerFrame.cpp:214]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #19: nsFrameManager::Destroy() [layout/base/nsFrameManager.cpp:53]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #20: mozilla::PresShell::Destroy() [layout/base/PresShell.cpp:1351]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #21: nsDocumentViewer::DestroyPresShell() [layout/base/nsDocumentViewer.cpp:4249]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #22: nsDocumentViewer::Destroy() [layout/base/nsDocumentViewer.cpp:1799]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #23: nsDocumentViewer::Show() [layout/base/nsDocumentViewer.cpp:2088]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #24: nsPresContext::EnsureVisible() [layout/base/nsPresContext.cpp:1751]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #25: nsIPresShell::UnsuppressAndInvalidate() [layout/base/PresShell.cpp:3798]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #26: mozilla::PresShell::sPaintSuppressionCallback(nsITimer *,void *) [layout/base/PresShell.cpp:1835]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #27: nsTimerImpl::Fire(int) [xpcom/threads/nsTimerImpl.cpp:559]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #28: nsTimerEvent::Run() [xpcom/threads/TimerThread.cpp:260]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #29: mozilla::SchedulerGroup::Runnable::Run() [xpcom/threads/SchedulerGroup.cpp:292]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #30: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1149]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #31: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/threads/nsThreadUtils.cpp:474]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #32: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:88]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #33: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:309]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #34: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #35: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:139]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #36: nsAppShell::Run() [widget/windows/nsAppShell.cpp:411]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #37: XRE_RunAppShell() [toolkit/xre/nsEmbedFunctions.cpp:908]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #38: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:238]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #39: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:309]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #40: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #41: XRE_InitChildProcess(int,char * * const,XREChildData const *) [toolkit/xre/nsEmbedFunctions.cpp:750]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #42: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:265]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #43: wmain [toolkit/xre/nsWindowsWMain.cpp:131]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #44: static int __scrt_common_main_seh() [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #45: KERNEL32.DLL + 0x13034
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #46: ntdll.dll + 0x71461

An intermittent array-out-of-bounds access seems bad.

Component: XPCOM → Layout
Priority: P5 → --

Out of bounds in DestroyDisplayItemDataForFrames... I don't know anything in there looking terribly suspicious so far...

Ryan has done some subdoc work recently for Fission, and maybe Miko or Matt have some ideas?

Could we get this retriggered ASAP please to see when it started?

Flags: needinfo?(sheriffs)

Huh, so this looks like a recent-ish regression then.

Also this is on the Clear() call for DisplayItemData in DestroyDisplayItemDataFor, I guess (the assertion is on RemoveElementsAt). Or in RemoveFrameFromLayerManager I guess, though that's more unlikely to be inlined.

First fail is for a Try push for bug 1505871 which landed earlier today on inbound where the 3 failures from today can be found.

Flags: needinfo?(sheriffs)

Timothy, please investigate that assertion from bug 1505871.

Blocks: 1505871
Flags: needinfo?(tnikkel)

Surprisingly this reproduces almost every time for me locally. I also get

Assertion failure: IsIdle(oldState) || IsRead(oldState), at /Users/tim/ffwho/src/xpcom/ds/PLDHashTable.h:125

sometimes instead, but I think it's probably the same issue hitting a different assert.

Looks like the problem happens when destroying the webrenderuserdata property.

(In reply to Timothy Nikkel (:tnikkel) from comment #7)

Surprisingly this reproduces almost every time for me locally. I also get

Assertion failure: IsIdle(oldState) || IsRead(oldState), at /Users/tim/ffwho/src/xpcom/ds/PLDHashTable.h:125

sometimes instead, but I think it's probably the same issue hitting a different assert.

FWIW, this is PLDHashTable's thread safety assert, which probably (partly) explains the intermittency.

Attached file stack.txt

So looks like whats happening here is we are destroying a parent document, when we destroy the subdocument frame we also clear the painting related frame properties including the webrender properties on the frames inside the subdocument. So we are destroying the WebRenderUserDataTable set as a frame property on a frame in the child document. This finds its way into BlobItemData::ClearFrame which deletes the BlobGroupDataProperty property on the same frame, whilst we are still deleting the WebRenderUserData property. This leaves the frame property table for that frame in a bad state. Then later we destroy the child document, and try to clear all it's frame properties and assert cause the table is not in a consistent state.

Flags: needinfo?(tnikkel)
Attached patch webrenderframepropdelrecurse (obsolete) — Splinter Review

This fix suggested by Matt.

Not sure if you want to go with this approach or some changes to the data structures involved.

Assignee: nobody → tnikkel
Attachment #9047009 - Flags: review?(jmuizelaar)
Attachment #9047009 - Attachment is obsolete: true
Attachment #9047009 - Flags: review?(jmuizelaar)
Attachment #9047011 - Flags: review?(jmuizelaar)
Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Review of attachment 9047011 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +910,5 @@
> +  if (userDataTable) {
> +    for (auto iter = userDataTable->Iter(); !iter.Done(); iter.Next()) {
> +      iter.UserData()->RemoveFromTable();
> +    }
> +    delete userDataTable;

Can we just declare a FramePropertyDescriptor for WebRenderUserDataTable, so we don't have to remember to copy and paste this code everywhere we remove a WebRenderUserDataTable property?

(In reply to Nathan Froyd [:froydnj] from comment #13)

Can we just declare a FramePropertyDescriptor for WebRenderUserDataTable, so
we don't have to remember to copy and paste this code everywhere we remove a
WebRenderUserDataTable property?

That's what we do right now which is kind of the problem

https://searchfox.org/mozilla-central/rev/dbddac86aadf1d4871fb350bbe66db43728a9f81/gfx/layers/wr/WebRenderUserData.h#251

The DestroyWebRenderUserDataTable function is called while we are in the middle of modifying the frame properties table, which causes the destruction of some objects, one of which modifies the frame properties table for the same frame in it's destructor, and thus leaving the frame properties table in an inconsistent state.

I agree this solution is not great, but it seems like it is just propagating the existing fix for this same problem to a new frame property type.

Matt said he ran into the same problem with retained display lists, which I think is why we call RemoveDisplayItemDataForDeletion in nsIPresShell::NotifyDestroyingFrame unconditionally (ie even if mIgnoreFrameDestruction is true, which is set if we are tearing down the whole frame tree so we don't have to worry about each frame individually because everything is going away to speed up destruction). This call means the display item frame properties get removed 'properly' and not by the DeleteAll properties call in nsFrame::DestroyFrom.

Oh, and as for why bug 1505871 caused this, the frame properties that get messed up are on a frame with a component transfer filter applied, so that would surely change the structure of the web render user data frame properties on that frame, and I guess the structure of layout/svg/tests/test_filter_crossorigin.html had just the right structure to tickle this bug once that change was made.

Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Review of attachment 9047011 [details] [diff] [review]:
-----------------------------------------------------------------

Not my favorite but I have no immediate suggestions for how to do better.
Attachment #9047011 - Flags: review?(jmuizelaar) → review+

(In reply to Jeff Muizelaar [:jrmuizel] from comment #16)

Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Review of attachment 9047011 [details] [diff] [review]:

Not my favorite but I have no immediate suggestions for how to do better.

We should at least add a comment about why it's being done this way though.

Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8fafeb13445
Remove webrender user data properties from frames first, then destory them. r=jrmuizel
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: 1505871 caused this to show up on tests, but i dont see any reason we couldn't hit it before that
  • User impact if declined: webrender crashes
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): only affects the order that we destroy things to be in a safer manner. should have no effect if webrender is disabled.
  • String changes made/needed: none
Attachment #9047011 - Flags: approval-mozilla-beta?
Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Fix for WebRender issue, should be helpful for 66 experiments (and limited to them).
OK to uplift for beta 14.
Attachment #9047011 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: