fix race conditions leading to tab crashes with multiple connecting webrtc clients
Categories
(Core :: WebRTC, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox91 | --- | fixed |
People
(Reporter: gaston, Assigned: bwc)
References
()
Details
Attachments
(1 file)
see https://marc.info/?t=161787032500003&r=1&w=2 for the history of this report (im not the reporter, just the messenger)
from my understanding of the thread, libwebrtc is sloppy in that it uses thread_ in https://searchfox.org/mozilla-central/source/third_party/libwebrtc/webrtc/rtc_base/platform_thread.cc#363 before it might get initialized.
this leads to race conditions, and on OpenBSD leads to webrtc tab crashing a lot when there are more than two users in a jitsi chat.
replacing thread_
by pthread_self()
would seem to fix the issue, as proposed in https://marc.info/?l=openbsd-bugs&m=161890243910121&w=2
Comment 1•3 years ago
|
||
The fundamental problem is that the thread_ member of the PlatformThread class may be used in the child thread before it has been set in the parent thread. POSIX says that:
"Upon successful completion, pthread_create() shall store the ID of the created thread in the location referenced by <thread>."
which means that pthread_create() should not update the location referenced by <thread> before it knows that the thread has been created sucessfully. But at that point the thread that has been created may already be running. So there is a potential race. On Linux the race doesn't happen because the NPTL code that is part of glibc does update the location referenced by <thread> before it creates the new thread. But that ignores the "upon successful completion" condition imposed by POSIX.
Using pthread_self() instead of thread_ as proposed in https://marc.info/?l=openbsd-bugs&m=161890243910121&w=2 fixes the immediate problem but doesn't eliminate any races with using the thread_ member of the PlatformThread class in the run function of the created thread. In order to fix that issue it may be better to do thread_ = pthread_self() early on in PlatformThread::run().
Assignee | ||
Comment 2•3 years ago
|
||
It appears that the fix proposed in the openbsd thread has already been applied to upstream:
It is not fixed in the version of libwebrtc we are updating to in bug 1654112, however. We could probably cherry-pick the fix.
Reporter | ||
Comment 3•3 years ago
|
||
that fix has been tested by multiple openbsd users with jitsi (and was commited to our packaging system), allowing seamless conferences with a dozen of persons. without the patch the tab would crash all the time..from the analysis of pthread_create() implementations, the crash would only affect OpenBSD and NetBSD.
i havent been able to find which upstream commit as i didnt find annotate/blame in the chromium source browser.
but doesn't eliminate any races with using the thread_ member of the PlatformThread class in the run function of the created thread. In order to fix that issue it may be better to do thread_ = pthread_self() early on in PlatformThread::run().
byron, how about that suggestion too ?
Assignee | ||
Comment 4•3 years ago
•
|
||
It does not appear that |thread_| is ever used by the created thread in the current chromium code, so that is probably fine. The commit that fixed this is here:
https://source.chromium.org/chromium/_/webrtc/src.git/+/97c4458c8fd7b6e81a95f3a0c268d9fe6a335311
It covers a bunch more than just this bug, so the cherry-pick might not be trivial. The patch from the openbsd community looks mostly sufficient, although we might want to also take this line:
Comment 5•3 years ago
|
||
thread_ is exposed through PlatformThread::GetThreadRef(), so if that ends being used in the newly created thread, there still is race, but fixing that probably doesn't justify deviating from the upstream codebase.
Reporter | ||
Comment 6•3 years ago
|
||
:bwc, can you pull the patch ? we're still applying locally, and it really helps webrtc stability.
Assignee | ||
Comment 7•3 years ago
|
||
Assignee | ||
Comment 8•3 years ago
•
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=60145f7db5e573448da6ced868fdea7bca2b580e https://treeherder.mozilla.org/jobs?repo=try&revision=9eea5928725de98b64f219d73d9b1fe0342ead4b
Assignee | ||
Comment 9•3 years ago
|
||
Try is a horrorshow, but that just seems to be baseline right now.
Assignee | ||
Comment 10•3 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 11•3 years ago
|
||
I'm pretty sure this should do the trick, but it might not be a bad idea for you to test the changeset on the try pushes in comment 8 and verify.
Reporter | ||
Comment 12•3 years ago
|
||
https://hg.mozilla.org/try/rev/bdfe40b55317a7fc64541ec6ccecd22cddaef0f3 matches what we've been shipping to -current users (cf http://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/www/mozilla-firefox/patches/patch-third_party_libwebrtc_webrtc_rtc_base_platform_thread_cc?rev=1.1&content-type=text/x-cvsweb-markup) and backported to 6.9 since a month (cf http://cvsweb.openbsd.org/cgi-bin/cvsweb/ports/www/mozilla-firefox/patches/patch-third_party_libwebrtc_webrtc_rtc_base_platform_thread_cc?rev=1.1.2.1&content-type=text/x-cvsweb-markup) and it really improved the webrtc stability with multiple clients (tested by multiple ppl), so i'm confident this works :) thanks !
Comment 13•3 years ago
|
||
Pushed by bcampen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4f00ba3efcdb Avoid racy accesses to thread_ r=ng
Comment 14•3 years ago
|
||
bugherder |
Description
•