Closed Bug 1706261 Opened 3 years ago Closed 3 years ago

fix race conditions leading to tab crashes with multiple connecting webrtc clients

Categories

(Core :: WebRTC, defect, P2)

All
OpenBSD
defect

Tracking

()

RESOLVED FIXED
91 Branch
Tracking Status
firefox91 --- fixed

People

(Reporter: gaston, Assigned: bwc)

References

()

Details

Attachments

(1 file)

see https://marc.info/?t=161787032500003&r=1&w=2 for the history of this report (im not the reporter, just the messenger)

from my understanding of the thread, libwebrtc is sloppy in that it uses thread_ in https://searchfox.org/mozilla-central/source/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#363 before it might get initialized.

this leads to race conditions, and on OpenBSD leads to webrtc tab crashing a lot when there are more than two users in a jitsi chat.

replacing thread_ by pthread_self() would seem to fix the issue, as proposed in https://marc.info/?l=openbsd-bugs&m=161890243910121&w=2

The fundamental problem is that the thread_ member of the PlatformThread class may be used in the child thread before it has been set in the parent thread. POSIX says that:

"Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by <thread>."

which means that pthread_create() should not update the location referenced by <thread> before it knows that the thread has been created sucessfully. But at that point the thread that has been created may already be running. So there is a potential race. On Linux the race doesn't happen because the NPTL code that is part of glibc does update the location referenced by <thread> before it creates the new thread. But that ignores the "upon successful completion" condition imposed by POSIX.

Using pthread_self() instead of thread_ as proposed in https://marc.info/?l=openbsd-bugs&m=161890243910121&w=2 fixes the immediate problem but doesn't eliminate any races with using the thread_ member of the PlatformThread class in the run function of the created thread. In order to fix that issue it may be better to do thread_ = pthread_self() early on in PlatformThread::run().

It appears that the fix proposed in the openbsd thread has already been applied to upstream:

https://source.chromium.org/chromium/chromium/src/+/master:third_party/webrtc/rtc_base/platform_thread.cc;l=80;drc=221e331b49dfefadbc6fa40b0c68e6f97606d0b3;bpv=1;bpt=1

It is not fixed in the version of libwebrtc we are updating to in bug 1654112, however. We could probably cherry-pick the fix.

Severity: -- → S2
Priority: -- → P2

that fix has been tested by multiple openbsd users with jitsi (and was commited to our packaging system), allowing seamless conferences with a dozen of persons. without the patch the tab would crash all the time..from the analysis of pthread_create() implementations, the crash would only affect OpenBSD and NetBSD.

i havent been able to find which upstream commit as i didnt find annotate/blame in the chromium source browser.

but doesn't eliminate any races with using the thread_ member of the PlatformThread class in the run function of the created thread. In order to fix that issue it may be better to do thread_ = pthread_self() early on in PlatformThread::run().

byron, how about that suggestion too ?

Flags: needinfo?(docfaraday)

It does not appear that |thread_| is ever used by the created thread in the current chromium code, so that is probably fine. The commit that fixed this is here:

https://source.chromium.org/chromium/_/webrtc/src.git/+/97c4458c8fd7b6e81a95f3a0c268d9fe6a335311

It covers a bunch more than just this bug, so the cherry-pick might not be trivial. The patch from the openbsd community looks mostly sufficient, although we might want to also take this line:

https://source.chromium.org/chromium/chromium/src/+/master:third_party/webrtc/rtc_base/platform_thread.cc;l=38;drc=221e331b49dfefadbc6fa40b0c68e6f97606d0b3;bpv=1;bpt=0

Flags: needinfo?(docfaraday)

thread_ is exposed through PlatformThread::GetThreadRef(), so if that ends being used in the newly created thread, there still is race, but fixing that probably doesn't justify deviating from the upstream codebase.

:bwc, can you pull the patch ? we're still applying locally, and it really helps webrtc stability.

Flags: needinfo?(docfaraday)
Blocks: 1714115

Try is a horrorshow, but that just seems to be baseline right now.

Flags: needinfo?(docfaraday)
Assignee: nobody → docfaraday
Status: NEW → ASSIGNED

I'm pretty sure this should do the trick, but it might not be a bad idea for you to test the changeset on the try pushes in comment 8 and verify.

Flags: needinfo?(landry)
Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4f00ba3efcdb
Avoid racy accesses to thread_ r=ng
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: