Intermittent GECKO(10828) | Assertion failure: aCount == 0 || aStart < Length() (Invalid aStart index), at z:/build/build/src/obj-firefox/dist/include/nsTArray.h:2222

RESOLVED FIXED in Firefox 66

Status

()

defect
RESOLVED FIXED
3 months ago
3 months ago

People

(Reporter: intermittent-bug-filer, Assigned: tnikkel)

Tracking

({assertion, intermittent-failure, regression})

unspecified
mozilla67
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox65 unaffected, firefox66 fixed, firefox67 fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

Reporter

Description

3 months ago
treeherder

#[markdown(off)]
Filed by: cbrindusan [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=230513528&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/KCQKYSOoS0Cr1gVkkmzrDA/runs/0/artifacts/public/logs/live_backing.log

11:31:22 INFO - TEST-START | layout/svg/tests/test_hover_near_text.html
11:31:22 INFO - GECKO(10828) | Assertion failure: aCount == 0 || aStart < Length() (Invalid aStart index), at z:/build/build/src/obj-firefox/dist/include/nsTArray.h:2222
11:31:22 INFO - GECKO(10828) | #01: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #02: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #03: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #04: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #05: static void DestroyDisplayItemDataForFrames(class nsIFrame *) [layout/generic/nsSubDocumentFrame.cpp:0]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #06: static class nsView * BeginSwapDocShellsForViews(class nsView *) [layout/generic/nsSubDocumentFrame.cpp:1046]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #07: nsSubDocumentFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsSubDocumentFrame.cpp:952]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #08: nsLineBox::DeleteLineList(nsPresContext *,nsLineList &,nsIFrame *,nsFrameList *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsLineBox.cpp:365]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #09: nsBlockFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsBlockFrame.cpp:329]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #10: nsLineBox::DeleteLineList(nsPresContext *,nsLineList &,nsIFrame *,nsFrameList *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsLineBox.cpp:365]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #11: nsBlockFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsBlockFrame.cpp:329]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #12: nsFrameList::DestroyFramesFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsFrameList.cpp:50]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #13: nsContainerFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsContainerFrame.cpp:214]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #14: nsCanvasFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsCanvasFrame.cpp:218]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #15: nsFrameList::DestroyFramesFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsFrameList.cpp:50]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #16: nsContainerFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsContainerFrame.cpp:214]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #17: nsFrameList::DestroyFramesFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsFrameList.cpp:50]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #18: nsContainerFrame::DestroyFrom(nsIFrame *,mozilla::layout::PostFrameDestroyData &) [layout/generic/nsContainerFrame.cpp:214]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #19: nsFrameManager::Destroy() [layout/base/nsFrameManager.cpp:53]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #20: mozilla::PresShell::Destroy() [layout/base/PresShell.cpp:1351]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #21: nsDocumentViewer::DestroyPresShell() [layout/base/nsDocumentViewer.cpp:4249]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #22: nsDocumentViewer::Destroy() [layout/base/nsDocumentViewer.cpp:1799]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #23: nsDocumentViewer::Show() [layout/base/nsDocumentViewer.cpp:2088]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #24: nsPresContext::EnsureVisible() [layout/base/nsPresContext.cpp:1751]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #25: nsIPresShell::UnsuppressAndInvalidate() [layout/base/PresShell.cpp:3798]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #26: mozilla::PresShell::sPaintSuppressionCallback(nsITimer *,void *) [layout/base/PresShell.cpp:1835]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #27: nsTimerImpl::Fire(int) [xpcom/threads/nsTimerImpl.cpp:559]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #28: nsTimerEvent::Run() [xpcom/threads/TimerThread.cpp:260]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #29: mozilla::SchedulerGroup::Runnable::Run() [xpcom/threads/SchedulerGroup.cpp:292]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #30: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1149]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #31: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/threads/nsThreadUtils.cpp:474]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #32: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:88]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #33: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:309]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #34: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #35: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:139]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #36: nsAppShell::Run() [widget/windows/nsAppShell.cpp:411]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #37: XRE_RunAppShell() [toolkit/xre/nsEmbedFunctions.cpp:908]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #38: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:238]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #39: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:309]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #40: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:291]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #41: XRE_InitChildProcess(int,char * * const,XREChildData const *) [toolkit/xre/nsEmbedFunctions.cpp:750]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #42: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:265]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #43: wmain [toolkit/xre/nsWindowsWMain.cpp:131]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #44: static int __scrt_common_main_seh() [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288]
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #45: KERNEL32.DLL + 0x13034
11:31:22 INFO -
11:31:22 INFO - GECKO(10828) | #46: ntdll.dll + 0x71461

An intermittent array-out-of-bounds access seems bad.

Component: XPCOM → Layout
Priority: P5 → --

Out of bounds in DestroyDisplayItemDataForFrames... I don't know anything in there looking terribly suspicious so far...

Ryan has done some subdoc work recently for Fission, and maybe Miko or Matt have some ideas?

Could we get this retriggered ASAP please to see when it started?

Flags: needinfo?(sheriffs)

Huh, so this looks like a recent-ish regression then.

Also this is on the Clear() call for DisplayItemData in DestroyDisplayItemDataFor, I guess (the assertion is on RemoveElementsAt). Or in RemoveFrameFromLayerManager I guess, though that's more unlikely to be inlined.

First fail is for a Try push for bug 1505871 which landed earlier today on inbound where the 3 failures from today can be found.

Flags: needinfo?(sheriffs)

Timothy, please investigate that assertion from bug 1505871.

Blocks: 1505871
Flags: needinfo?(tnikkel)
Assignee

Comment 7

3 months ago

Surprisingly this reproduces almost every time for me locally. I also get

Assertion failure: IsIdle(oldState) || IsRead(oldState), at /Users/tim/ffwho/src/xpcom/ds/PLDHashTable.h:125

sometimes instead, but I think it's probably the same issue hitting a different assert.

Looks like the problem happens when destroying the webrenderuserdata property.

(In reply to Timothy Nikkel (:tnikkel) from comment #7)

Surprisingly this reproduces almost every time for me locally. I also get

Assertion failure: IsIdle(oldState) || IsRead(oldState), at /Users/tim/ffwho/src/xpcom/ds/PLDHashTable.h:125

sometimes instead, but I think it's probably the same issue hitting a different assert.

FWIW, this is PLDHashTable's thread safety assert, which probably (partly) explains the intermittency.

Assignee

Comment 9

3 months ago
Posted file stack.txt
Assignee

Comment 10

3 months ago

So looks like whats happening here is we are destroying a parent document, when we destroy the subdocument frame we also clear the painting related frame properties including the webrender properties on the frames inside the subdocument. So we are destroying the WebRenderUserDataTable set as a frame property on a frame in the child document. This finds its way into BlobItemData::ClearFrame which deletes the BlobGroupDataProperty property on the same frame, whilst we are still deleting the WebRenderUserData property. This leaves the frame property table for that frame in a bad state. Then later we destroy the child document, and try to clear all it's frame properties and assert cause the table is not in a consistent state.

Flags: needinfo?(tnikkel)
Assignee

Comment 11

3 months ago
Posted patch webrenderframepropdelrecurse (obsolete) — Splinter Review

This fix suggested by Matt.

Not sure if you want to go with this approach or some changes to the data structures involved.

Assignee: nobody → tnikkel
Attachment #9047009 - Flags: review?(jmuizelaar)
Assignee

Comment 12

3 months ago
Attachment #9047009 - Attachment is obsolete: true
Attachment #9047009 - Flags: review?(jmuizelaar)
Attachment #9047011 - Flags: review?(jmuizelaar)
Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Review of attachment 9047011 [details] [diff] [review]:
-----------------------------------------------------------------

::: layout/generic/nsFrame.cpp
@@ +910,5 @@
> +  if (userDataTable) {
> +    for (auto iter = userDataTable->Iter(); !iter.Done(); iter.Next()) {
> +      iter.UserData()->RemoveFromTable();
> +    }
> +    delete userDataTable;

Can we just declare a FramePropertyDescriptor for WebRenderUserDataTable, so we don't have to remember to copy and paste this code everywhere we remove a WebRenderUserDataTable property?
Assignee

Comment 14

3 months ago

(In reply to Nathan Froyd [:froydnj] from comment #13)

Can we just declare a FramePropertyDescriptor for WebRenderUserDataTable, so
we don't have to remember to copy and paste this code everywhere we remove a
WebRenderUserDataTable property?

That's what we do right now which is kind of the problem

https://searchfox.org/mozilla-central/rev/dbddac86aadf1d4871fb350bbe66db43728a9f81/gfx/layers/wr/WebRenderUserData.h#251

The DestroyWebRenderUserDataTable function is called while we are in the middle of modifying the frame properties table, which causes the destruction of some objects, one of which modifies the frame properties table for the same frame in it's destructor, and thus leaving the frame properties table in an inconsistent state.

I agree this solution is not great, but it seems like it is just propagating the existing fix for this same problem to a new frame property type.

Matt said he ran into the same problem with retained display lists, which I think is why we call RemoveDisplayItemDataForDeletion in nsIPresShell::NotifyDestroyingFrame unconditionally (ie even if mIgnoreFrameDestruction is true, which is set if we are tearing down the whole frame tree so we don't have to worry about each frame individually because everything is going away to speed up destruction). This call means the display item frame properties get removed 'properly' and not by the DeleteAll properties call in nsFrame::DestroyFrom.

Assignee

Comment 15

3 months ago

Oh, and as for why bug 1505871 caused this, the frame properties that get messed up are on a frame with a component transfer filter applied, so that would surely change the structure of the web render user data frame properties on that frame, and I guess the structure of layout/svg/tests/test_filter_crossorigin.html had just the right structure to tickle this bug once that change was made.

Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Review of attachment 9047011 [details] [diff] [review]:
-----------------------------------------------------------------

Not my favorite but I have no immediate suggestions for how to do better.
Attachment #9047011 - Flags: review?(jmuizelaar) → review+

(In reply to Jeff Muizelaar [:jrmuizel] from comment #16)

Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Review of attachment 9047011 [details] [diff] [review]:

Not my favorite but I have no immediate suggestions for how to do better.

We should at least add a comment about why it's being done this way though.

Comment 18

3 months ago
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8fafeb13445
Remove webrender user data properties from frames first, then destory them. r=jrmuizel

Comment 19

3 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Comment hidden (Intermittent Failures Robot)
Assignee

Comment 21

3 months ago

Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: 1505871 caused this to show up on tests, but i dont see any reason we couldn't hit it before that
  • User impact if declined: webrender crashes
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): only affects the order that we destroy things to be in a safer manner. should have no effect if webrender is disabled.
  • String changes made/needed: none
Attachment #9047011 - Flags: approval-mozilla-beta?
Comment on attachment 9047011 [details] [diff] [review]
webrenderframepropdelrecurse

Fix for WebRender issue, should be helpful for 66 experiments (and limited to them).
OK to uplift for beta 14.
Attachment #9047011 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.