Crash in WinSqmSetIfMaxDWORD called from SCTP sowakeup() on association abort
Categories
(Core :: WebRTC: Networking, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | wontfix |
firefox-esr68 | --- | fixed |
firefox55 | --- | wontfix |
firefox56 | --- | wontfix |
firefox57 | --- | wontfix |
firefox58 | --- | wontfix |
firefox59 | + | wontfix |
firefox60 | --- | wontfix |
firefox61 | --- | wontfix |
firefox62 | --- | wontfix |
firefox63 | --- | wontfix |
firefox64 | --- | wontfix |
firefox65 | --- | wontfix |
firefox66 | --- | wontfix |
firefox67 | --- | wontfix |
firefox68 | --- | fixed |
firefox69 | --- | fixed |
People
(Reporter: jesup, Assigned: ng)
References
Details
(5 keywords, Whiteboard: [adv-main68+])
Crash Data
Attachments
(7 files, 2 obsolete files)
2.80 KB,
patch
|
drno
:
review+
ritu
:
approval-mozilla-beta+
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
6.26 KB,
patch
|
drno
:
review+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
8.73 KB,
patch
|
drno
:
review+
ritu
:
approval-mozilla-beta+
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
2.33 KB,
patch
|
froydnj
:
review+
ritu
:
approval-mozilla-beta+
dveditz
:
sec-approval+
|
Details | Diff | Splinter Review |
1.14 KB,
patch
|
padenot
:
review+
|
Details | Diff | Splinter Review |
20.23 KB,
patch
|
Details | Diff | Splinter Review | |
47 bytes,
text/x-phabricator-request
|
abillings
:
sec-approval+
|
Details | Review |
Reporter | ||
Comment 3•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 4•7 years ago
|
||
Reporter | ||
Comment 5•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 7•7 years ago
|
||
Updated•7 years ago
|
Comment 8•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 9•7 years ago
|
||
Updated•7 years ago
|
Comment 10•7 years ago
|
||
Comment 11•7 years ago
|
||
Reporter | ||
Comment 12•7 years ago
|
||
Reporter | ||
Comment 13•7 years ago
|
||
Reporter | ||
Comment 14•7 years ago
|
||
Comment 15•7 years ago
|
||
Reporter | ||
Comment 16•7 years ago
|
||
Reporter | ||
Comment 17•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Comment 18•7 years ago
|
||
Reporter | ||
Comment 19•7 years ago
|
||
Comment 20•7 years ago
|
||
Reporter | ||
Comment 21•7 years ago
|
||
Updated•7 years ago
|
Comment 22•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 23•7 years ago
|
||
Reporter | ||
Comment 24•7 years ago
|
||
Comment 25•7 years ago
|
||
Updated•7 years ago
|
Comment 26•7 years ago
|
||
Reporter | ||
Comment 27•7 years ago
|
||
Reporter | ||
Comment 28•7 years ago
|
||
Reporter | ||
Comment 29•7 years ago
|
||
Reporter | ||
Comment 30•7 years ago
|
||
Reporter | ||
Comment 31•7 years ago
|
||
Comment 33•7 years ago
|
||
uplift |
Reporter | ||
Comment 34•7 years ago
|
||
Reporter | ||
Comment 35•7 years ago
|
||
Comment 36•7 years ago
|
||
Reporter | ||
Comment 37•7 years ago
|
||
Reporter | ||
Comment 38•7 years ago
|
||
Comment 39•7 years ago
|
||
Updated•7 years ago
|
Reporter | ||
Comment 40•7 years ago
|
||
Reporter | ||
Comment 41•7 years ago
|
||
Reporter | ||
Comment 42•7 years ago
|
||
Comment 43•7 years ago
|
||
Comment 44•7 years ago
|
||
Reporter | ||
Comment 45•7 years ago
|
||
Comment 46•7 years ago
|
||
Reporter | ||
Comment 47•7 years ago
|
||
Reporter | ||
Comment 48•7 years ago
|
||
Reporter | ||
Comment 49•7 years ago
|
||
![]() |
||
Comment 50•7 years ago
|
||
Reporter | ||
Comment 51•7 years ago
|
||
Comment 52•7 years ago
|
||
Updated•7 years ago
|
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 53•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Updated•7 years ago
|
Updated•7 years ago
|
Reporter | ||
Comment 55•7 years ago
|
||
Reporter | ||
Comment 56•7 years ago
|
||
Comment 58•7 years ago
|
||
uplift |
Comment 59•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Comment 60•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Updated•7 years ago
|
Comment 61•7 years ago
|
||
uplift |
Reporter | ||
Comment 62•7 years ago
|
||
Reporter | ||
Comment 63•7 years ago
|
||
Comment 66•7 years ago
|
||
Reporter | ||
Comment 67•7 years ago
|
||
Reporter | ||
Comment 68•7 years ago
|
||
Updated•7 years ago
|
Updated•7 years ago
|
Comment 70•7 years ago
|
||
Reporter | ||
Comment 71•7 years ago
|
||
Reporter | ||
Updated•7 years ago
|
Reporter | ||
Updated•7 years ago
|
Updated•7 years ago
|
Comment 73•7 years ago
|
||
Comment 75•7 years ago
|
||
Updated•7 years ago
|
Updated•7 years ago
|
Updated•7 years ago
|
Comment 77•6 years ago
|
||
Updated•6 years ago
|
Reporter | ||
Comment 78•6 years ago
|
||
Very probably caused Bug 1521304
Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 79•6 years ago
|
||
Does not appear fixed by bug 1521304, as we have no drop in crashes in beta 8 or 9.
This may well be a race condition within the SCTP library, where we're tearing down the stack from socketthread or mainthread (with our lock), and it's racing against an SCTP timer-thread abort.
The e5e5 crashes on accessing sb_cond imply it was deleted. sodealloc() deletes the sb_cond, and also destroys the lock (DeleteCriticalSection, which requires the lock not be owned), so we know sodealloc assumes all access is ended.
in sofree():
/*
* From this point on, we assume that no other references to this
* socket exist anywhere else in the stack. Therefore, no locks need
* to be acquired or held.
*
* We used to do a lot of socket buffer and socket locking here, as
* well as invoke sorflush() and perform wakeups. The direct call to
* dom_dispose() and sbrelease_internal() are an inlining of what was
* necessary from sorflush().
*
* Notice that the socket buffer and kqueue state are torn down
* before calling pru_detach. This means that protocols shold not
* assume they can perform socket wakeups, etc, in their detach code.
*/
sodealloc(so);
When a DataChannel closes, we call usrsctp_close()->soabort(so) (locks) ->sofree(so)->sctp_close(so) (not under lock anymore), which calls SCTP_OS_TIMER_STOP(hb_timer).
sofree() then calls sodealloc(), which frees the sb_cond and then frees so.
SCTP_OS_TIMER_STOP does:
SCTP_TIMERQ_LOCK();
/*
* Don't attempt to delete a callout that's not on the queue.
*/
if (!(c->c_flags & SCTP_CALLOUT_PENDING)) {
c->c_flags &= ~SCTP_CALLOUT_ACTIVE;
SCTP_TIMERQ_UNLOCK();
return (0);
}
c->c_flags &= ~(SCTP_CALLOUT_ACTIVE | SCTP_CALLOUT_PENDING);
if (c == sctp_os_timer_next) {
sctp_os_timer_next = TAILQ_NEXT(c, tqe);
}
TAILQ_REMOVE(&SCTP_BASE_INFO(callqueue), c, tqe);
SCTP_TIMERQ_UNLOCK();
sctp_handle_tick() does:
c->c_flags &= ~SCTP_CALLOUT_PENDING;
SCTP_TIMERQ_UNLOCK();
c_func(c_arg);
SCTP_TIMERQ_LOCK();
so if the timer is sitting in handle_tick() at c_func(), and on another thread we call SCTP_OS_TIMER_STOP(), the OS_TIMER_STOP can get the lock, and finds that the timer is no longer pending, and so unlocks returns (and per above, we can then call sodealloc() and free the cond var and the so. And then the timer thread wakes up, and calls c_func, which is sctp_timeout_handler() (etc...)
We cannot delete a socket (and free it's condvars) until we have positive confirmation that the timer has been removed and cannot be called - including being in the process of being called.
This could be a lock (though I'm sure there are reasons why the timer code unlocks before calling the callbacks; likely because that can manipulate the timer list). There are other ways you could synchronize, but they all amount to "if it's currently processing the timer you're removing, you have to wait".
It's possible I have some details wrong but this seems like a clear hole to me, and almost certainly the cause here.
NI various, including dminor and drno as FYI, since this is a top-UAF bug.
Comment 80•6 years ago
|
||
Nico has been looking at DataChannels lately.
Comment 81•6 years ago
|
||
Would love to help but I don't fully comprehend the threading model in Firefox.
Reporter | ||
Comment 82•6 years ago
|
||
This problem shouldn't depend on the threading model; any case where these occur on different threads and aren't covered by locks can cause this problem. The timer code runs on a different thread. Hitting this bug requires unfortunate timing, as is the norm with this sort of bug.
reiterating:
"We cannot delete a socket (and free it's condvars) until we have positive confirmation that the timer has been removed and cannot be called - including being in the process of being called.
This could be a lock (though I'm sure there are reasons why the timer code unlocks before calling the callbacks; likely because that can manipulate the timer list). There are other ways you could synchronize, but they all amount to "if it's currently processing the timer you're removing, you have to wait"."
Updated•6 years ago
|
Comment 83•6 years ago
|
||
To give a status update: a patch for the problem (which we should be able to cheery pick) is currently actively being worked on by our SCTP friends.
Comment 84•6 years ago
|
||
... to be more precise: Peter Lei (he is on this bug report) is working on it and testing to make sure it doesn't break anything.
Once that is finished, he will provide the link to the patches required to integrate.
Comment 85•6 years ago
|
||
The latest sctp library code should hopefully fix this- if you're not taking the full library update, you'll want to cherry pick the following commits (in "git log" order, so apply bottom up):
21943f4 Fix callout stop deadlock
faf6330 Fix for missing timer wait wakeups
d225e88 Fix for rescheduling a timer that is currently in progress.
164ca60 Pull in missing guard for the currently executing callout (v2)
93fe1b3 Fix windows build break
a7c2902 Pull in missing guard for the currently executing callout
Reporter | ||
Comment 86•6 years ago
|
||
Peter - thanks for diving in on this!
One typo:
+#define SCTP_TIMERWIT_UNLOCK() KASSERT(pthread_mutex_unlock(&sctp_os_timerwait_mtx) == 0, ("%s: sctp_os_timerwait_mtx not locked", func))
I think "TIMERWAIT"
Assignee | ||
Comment 87•6 years ago
|
||
This patch can be applied under the cherry pick patch set to get it to cleanly apply (but not compile).
Assignee | ||
Comment 88•6 years ago
|
||
Apply under the cherry pick patch queue.
Comment 89•6 years ago
|
||
(In reply to Randell Jesup [:jesup] (needinfo me) from comment #86)
Peter - thanks for diving in on this!
One typo:
+#define SCTP_TIMERWIT_UNLOCK() KASSERT(pthread_mutex_unlock(&sctp_os_timerwait_mtx) == 0, ("%s: sctp_os_timerwait_mtx not locked", func))I think "TIMERWAIT"
Ah, yes thanks! I don't think we normally build with INVARIANTS (hence missed that) but will fix that.
Assignee | ||
Comment 90•6 years ago
|
||
Peter, is the only additional patch that needs to be pulled in dfe89db Fix compile breakage under INVARIANTS
?
Comment 91•6 years ago
|
||
Nico, I think that is all you need.
We are currently improving our continuous integration tests for usrsctp to catch such kind of compile errors and adding other tests. That is why the status of the tests might be reported as failed on our website.
Comment 92•6 years ago
|
||
(In reply to Nico Grunbaum [:ng] from comment #90)
Peter, is the only additional patch that needs to be pulled in
dfe89db Fix compile breakage under INVARIANTS
?
Not sure if you absolutely require it; if INVARIANTS is defined, you'd need it as the build would be currently breaking. But yes, this is the only other commit that I know of w.r.t. the thread race fixes beyond the original set I posted.
Assignee | ||
Comment 93•6 years ago
|
||
Assignee | ||
Comment 94•6 years ago
|
||
One of the patches in this set mentions fixing a timer deadlock, we could perhaps use that as a cover patch.
Assignee | ||
Comment 95•6 years ago
|
||
Assignee | ||
Comment 96•6 years ago
|
||
timer deadlock fix
Assignee | ||
Comment 97•6 years ago
|
||
Comment on attachment 9059326 [details]
Bug 1400563 - timer deadlock fix
Security Approval Request
- How easily could an exploit be constructed based on the patch?: It is probably difficult to create an exploit with knowledge of the bug. There are a number of comments in the patch that indicate that there was a timer shutdown race. Removing those comments from the patch wouldn't have much effect as the patch comes from the upstream 3rd party library, and one can see them on GitHub.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Yes
- Which older supported branches are affected by this flaw?: All
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: It should be easy as the effected code is altered vary rarely.
- How likely is this patch to cause regressions; how much testing does it need?: It is possible that it may cause regressions, it needs time to bake on Nightly.
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Comment 98•6 years ago
|
||
Comment on attachment 9059326 [details]
Bug 1400563 - timer deadlock fix
sec-approval+ for trunk so it can get some bake time.
![]() |
||
Comment 99•6 years ago
|
||
Updated•6 years ago
|
Reporter | ||
Comment 100•6 years ago
|
||
Note, however, that we see few of these in Nightly; many more in beta:
But if you restrict to 68, there's only 1 crash other than OOM.
Comment 101•6 years ago
|
||
With not a single crash in any of the 68 beta builds I'm tempted to close this. But we probably want to wait until 68 becomes release before we know for sure.
Updated•6 years ago
|
Comment 102•6 years ago
|
||
Not a single reported crash since 68 (which contains the patches) went into release. Looks like we are getting pretty close to calling this fixed!
Comment 103•5 years ago
|
||
I think we can now call this fixed, checking all the signatures above, no crashes in 68 or beyond for the last month.
Comment 104•5 years ago
|
||
Al did this ever get a security advisory? I just looked in https://www.mozilla.org/en-US/security/advisories/mfsa2019-21/ and didn't see it there.
Updated•5 years ago
|
Comment 105•5 years ago
|
||
(In reply to Liz Henry (:lizzard) from comment #104)
Al did this ever get a security advisory? I just looked in https://www.mozilla.org/en-US/security/advisories/mfsa2019-21/ and didn't see it there.
It did not because the fields were all changed after we shipped on July 9. I've added it to the roll up, where it should go.
Updated•5 years ago
|
Updated•5 years ago
|
Description
•