Closed Bug 1583105 Opened 5 years ago Closed 5 years ago

Crash in [@ `anonymous namespace'::wasapi_stream_render_loop]

Categories

(Core :: Audio/Video: cubeb, defect, P2)

All
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox-esr60 --- wontfix
firefox-esr68 --- wontfix
firefox67 --- wontfix
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- fixed

People

(Reporter: gsvelto, Assigned: kinetik)

References

Details

(Keywords: crash)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-354eb52b-7449-48d6-a8fc-33ca60190922.

Top 5 frames of crashing thread:

0 xul.dll static unsigned int `anonymous namespace'::wasapi_stream_render_loop media/libcubeb/src/cubeb_wasapi.cpp:1189
1 ucrtbase.dll thread_start<unsigned int , 1> 
2 kernel32.dll BaseThreadInitThunk 
3 mozglue.dll static void patched_BaseThreadInitThunk mozglue/dllservices/WindowsDllBlocklist.cpp:562
4 ntdll.dll RtlUserThreadStart 

This is not a new bug as I can find reports going back months though there's been an increase in September. While 64-bit crashes seem to be hitting the 0xffffffffffffffff address most 32-bit ones have the memory poison pattern as the crashing address so this might be an use-after-free bug on those platforms.

The recent crashes all point to the emergency_bailout check before we enter WaitForMultipleObjects, assuming the crash stacks are reliable in this case. I can't immediately find a way for the render thread's on-stack copy of emergency_bailout to be anything but a valid pointer (or nullptr if new is fallible). I can't see any recent changes to cubeb_wasapi.cpp that'd explain a spike in Firefox 69, either.

Priority: -- → P2

If the WaitForSingleObject waiting on the render thread to exit in stop_and_join_render_thread returns anything other than the explicitly handled results WAIT_TIMEOUT or WAIT_FAILED, we assume it was successful (WAIT_OBJECT_0) and will delete the emergency_bailout pointer. The only other documented result for WFSO is WAIT_ABANDONED, which I don't think applies in this case, so I don't think this is the cause but I don't have any better ideas so far.

https://social.msdn.microsoft.com/Forums/vstudio/en-US/6455b373-6975-493c-9676-d4e8da1fc70d/how-to-check-if-a-thread-is-suspended?forum=vcgeneral#a0f6a57f-a748-4517-8a7d-5ad89ff7b336 suggests WFSO can return WAIT_ABANDONED for thread handle waits if the target thread is suspended. So, even if this doesn't explain the crash, we should probably fix the code to handle this case.

Since the access shows a memory poison pattern it's likely that the cubeb_stream object has already been freed when wasapi_stream_render_loop() is called. Without knowing the code in detail that smells like a race to me. Is it possible for stop_and_join_render_thread() to fail so that the thread keeps running? That would explain what's happening here as the next call to wasapi_stream_render_loop() would be running with a dead cubeb_stream object.

Yes, this is why there is the emergency bailout boolean, but comment 2 is onto something maybe.

It seems like WAIT_ABANDONED is only for Mutex, though, as Matthew says.

dmajor sat with me today and found that this probably wasn't the best counter measure: if stm was freed in between creating the thread and the thread actually starting, this would read a poisoning patter and thus not exiting immediately here.

Does that make sense?

Comment 3 doesn't seem to be true, at least on Windows 10 (1903). I tried suspending the render thread programmatically (via SuspendThread(stm->thread)) and externally (via Process Explorer) - WFSO returns WAIT_TIMEOUT after the specified 5s timeout. Same result for 0 and INFINITE timeouts.

(In reply to Paul Adenot (:padenot) from comment #6)

dmajor sat with me today and found that this probably wasn't the best counter measure: if stm was freed in between creating the thread and the thread actually starting, this would read a poisoning patter and thus not exiting immediately here.

This is the only issue I can find after spending more time staring at this.

In the binaries I looked at, wasapi_stream_render_loop uses rbx to hold emergency_bailout. In every crash report I looked at, the actual value of rbx was either 0xe5e5e5e5e5e5e5e5 or 0xe5e5e5e5e5e50000. The only way for that to happen is as you describe above, i.e. we call wasapi_stream_start/wasapi_stream_destroy (hit the 5s WFSO timeout and trigger bailout) all before wasapi_stream_render_loop had a chance to copy stm->emergency_bailout to a local variable.

It seems unusual that wasapi_stream_render_loop would be stalled for >5s, so I don't have an explanation for why that would've started happening more frequently in September.

Assignee: nobody → kinetik
Status: NEW → ASSIGNED
Attached file GitHub Pull Request

This landed via the libcubeb resync in bug 1575883.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Depends on: 1575883
Target Milestone: --- → mozilla71
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: