Closed Bug 1634595 Opened 2 years ago Closed 2 years ago

Crash in [@ InvalidArrayIndex_CRASH | mozilla::layers::RecordCompositionPayloadsPresented]


(Core :: Graphics: WebRender, defect, P3)

Windows 10



Tracking Status
firefox-esr68 --- unaffected
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- fixed


(Reporter: mccr8, Assigned: sefeng)



(Keywords: crash, regression)

Crash Data


(1 file)

This bug is for crash report bp-8692b5db-644c-40a1-846d-cc7650200322.

Top 10 frames of crashing thread:

0 xul.dll InvalidArrayIndex_CRASH xpcom/ds/nsTArray.cpp:27
1 xul.dll mozilla::layers::RecordCompositionPayloadsPresented gfx/layers/Layers.cpp:2312
2 xul.dll mozilla::layers::CompositorBridgeParent::NotifyPipelineRendered gfx/layers/ipc/CompositorBridgeParent.cpp:2381
3 xul.dll mozilla::wr::NotifyDidRender gfx/webrender_bindings/RenderThread.cpp:433
4 xul.dll RunnableFunction<void  ipc/chromium/src/base/task.h:324
5 xul.dll MessageLoop::DoWork ipc/chromium/src/base/
6 xul.dll base::MessagePumpForUI::DoRunLoop ipc/chromium/src/base/
7 xul.dll base::MessagePumpWin::Run ipc/chromium/src/base/message_pump_win.h:79
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/
9 xul.dll MessageLoop::Run ipc/chromium/src/base/

This signature previously showed up in bug 1617432 when bug 1600793 landed, and went away when it was backed out. Then this signature came back again in the 20200318041411 build. Bug 1600793 relanded around March 13th, so the timing doesn't quite line up, but I figured I'd mention it.

This is the set of patches added in the 20200318041411 build:

Some of the indexes are quite a bit off from the length, so this isn't the usual "forgot to check if an array is empty" I see with invalid array index crashes. For instance: ElementAt(aIndex = 9105, aLength = 2), ElementAt(aIndex = 4, aLength = 2).

So the majority of them are coming from Fenix with a 0x0 crash address. There is where we access the array

I wonder if this is still related to the race condition we found before. There was a race condition when using GetPendingScrollPayload and RemovePendingScrollPayload thus a lock was added, however the lock was released at the end of the function, not at when we finish using the returned array.

Based on these crash reports, looks like it's possible that the array was removed after we acquired it by using GetPendingScrollPayload, thus the crash. I think we should extend the life time of the lock to the end of the array usage.

What do you think Botond?

Flags: needinfo?(botond)

I agree, iterating over the array without holding the lock is suspicious and could potentially cause issues like this, since another thread could modify the array using AddPendingScrollPayload() in the meantime.

To solve this, we could replace GetPendingScrollPayload() and RemovePendingScrollPayload() with a single API, TakePendingScrollPayload(), which returns the nsTArray<CompositionPayload> by value, moving the array's contents from the hash table into the return value while the lock is held. Then, the caller can iterate over the array safely (without needing to hold the lock) since it's already been removed from the hash table.

Flags: needinfo?(botond)
Severity: -- → S3
Priority: -- → P3

The lock we added in GetPendingScrollPayload and
RemovePendingScrollPayload are released at the end of the function,
not at when we finish using the underlying data, thus we may still
end up recording a payload that has been deleted.

To fix this, we introduce a new method to move the data from the
hashtable into the array argument while the lock is held, then the
caller can iterate over the array safely.

Assignee: nobody → sefeng
Pushed by
Move the element out of pendingScrollPayload before recording r=botond
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78
You need to log in before you can comment on or make changes to this bug.