Closed Bug 1634595 Opened 4 years ago Closed 4 years ago

Crash in [@ InvalidArrayIndex_CRASH | mozilla::layers::RecordCompositionPayloadsPresented]

Categories

(Core :: Graphics: WebRender, defect, P3)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla78
Tracking Status
firefox-esr68 --- unaffected
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- fixed

People

(Reporter: mccr8, Assigned: sefeng)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-8692b5db-644c-40a1-846d-cc7650200322.

Top 10 frames of crashing thread:

0 xul.dll InvalidArrayIndex_CRASH xpcom/ds/nsTArray.cpp:27
1 xul.dll mozilla::layers::RecordCompositionPayloadsPresented gfx/layers/Layers.cpp:2312
2 xul.dll mozilla::layers::CompositorBridgeParent::NotifyPipelineRendered gfx/layers/ipc/CompositorBridgeParent.cpp:2381
3 xul.dll mozilla::wr::NotifyDidRender gfx/webrender_bindings/RenderThread.cpp:433
4 xul.dll RunnableFunction<void  ipc/chromium/src/base/task.h:324
5 xul.dll MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:523
6 xul.dll base::MessagePumpForUI::DoRunLoop ipc/chromium/src/base/message_pump_win.cc:203
7 xul.dll base::MessagePumpWin::Run ipc/chromium/src/base/message_pump_win.h:79
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
9 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:290

This signature previously showed up in bug 1617432 when bug 1600793 landed, and went away when it was backed out. Then this signature came back again in the 20200318041411 build. Bug 1600793 relanded around March 13th, so the timing doesn't quite line up, but I figured I'd mention it.

This is the set of patches added in the 20200318041411 build: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=7c83f04c82e9ef6c1d2823a64aedd34a79a90eb9&tochange=70f8ce3e2d394a8c4d08725b108003844abbbff9

Some of the indexes are quite a bit off from the length, so this isn't the usual "forgot to check if an array is empty" I see with invalid array index crashes. For instance: ElementAt(aIndex = 9105, aLength = 2), ElementAt(aIndex = 4, aLength = 2).

So the majority of them are coming from Fenix with a 0x0 crash address. There is where we access the array https://hg.mozilla.org/mozilla-central/annotate/262c8adb52655e2028595d9838903b8b5a0a87da/gfx/layers/Layers.cpp#l2299.

I wonder if this is still related to the race condition we found before. There was a race condition when using GetPendingScrollPayload and RemovePendingScrollPayload thus a lock was added, however the lock was released at the end of the function, not at when we finish using the returned array.

Based on these crash reports, looks like it's possible that the array was removed after we acquired it by using GetPendingScrollPayload, thus the crash. I think we should extend the life time of the lock to the end of the array usage.

What do you think Botond?

Flags: needinfo?(botond)

I agree, iterating over the array without holding the lock is suspicious and could potentially cause issues like this, since another thread could modify the array using AddPendingScrollPayload() in the meantime.

To solve this, we could replace GetPendingScrollPayload() and RemovePendingScrollPayload() with a single API, TakePendingScrollPayload(), which returns the nsTArray<CompositionPayload> by value, moving the array's contents from the hash table into the return value while the lock is held. Then, the caller can iterate over the array safely (without needing to hold the lock) since it's already been removed from the hash table.

Flags: needinfo?(botond)
Severity: -- → S3
Priority: -- → P3

The lock we added in GetPendingScrollPayload and
RemovePendingScrollPayload are released at the end of the function,
not at when we finish using the underlying data, thus we may still
end up recording a payload that has been deleted.

To fix this, we introduce a new method to move the data from the
hashtable into the array argument while the lock is held, then the
caller can iterate over the array safely.

Assignee: nobody → sefeng
Status: NEW → ASSIGNED
Pushed by sefeng@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e450d28d57a0
Move the element out of pendingScrollPayload before recording r=botond
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78
You need to log in before you can comment on or make changes to this bug.