When debugging DataChannels, if one side is stalled (in gdb) for a long time, the other side may crash in a sctp-timer-processing routing accessing freed memory. Crashes in user_sctp_timer_iterate.c:81: if (c->c_time <= ticks) with c = 0xa5a5a5a5a5a5a5a5 Suspicion is that in processing one timer (and it drops to lock to process the timer) it freed another item on the list. Michael Tuexen thinks he knows the actual cause. Several people have seen this (and I'm fairly sure I've hit it before but never got a useful stack from it). sec-crit because it uses the freed-memory-pointer to load a function pointer and call it. Likely affects multiple releases as we haven't changed the SCTP library recently.
For the record, CR with FF34: https://crash-stats.mozilla.com/report/index/e6def404-4f1b-43bc-b4d0-301192141008
Crash Signature: [@ user_sctp_timer_iterate ]
This code hasn't changed pretty much since initial landing.
There's a fix in upstream (likely should update the library on m-c), and I can cherry-pick the fix for uplifts.
(In reply to Randell Jesup [:jesup] from comment #3) > There's a fix in upstream (likely should update the library on m-c), and I > can cherry-pick the fix for uplifts. Has this upstream fix been uplifted to our repo yet? We're nearing the end of a cycle so that would be good to have.
Michael, any updates on this before our cycle ends?
Not sure what you are asking for. I fixed the bug upstream. Integrating that fix is normally done by Randell.
Al: sorry, I wanted to build a more-upliftable patch than a full library import. I started working on that again last night when I realized it had fallen off the radar. I hope to land one today (and import a new lib to 36)
Michael, the bug is assigned to you, which is why I was asking you the question. We saw this during critsmash and I assumed the assignee was the person to ask for status. Randell, thanks.
Ahh, OK. So maybe the assignment is suboptimal...
Comment on attachment 8521747 [details] [diff] [review] update iteration code from upstream Review of attachment 8521747 [details] [diff] [review]: ----------------------------------------------------------------- imported rev 9038 from upstream, updated our moz.build file to match and remove the user_sctp_timer_iterate.c Imported as a single patch, it applies cleanly
Attachment #8521747 - Flags: review?(tuexen)
Comment on attachment 8521747 [details] [diff] [review] update iteration code from upstream [Security approval request comment] How easily could an exploit be constructed based on the patch? Very unlikely Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? no Which older supported branches are affected by this flaw? Pretty much all If not all supported branches, which bug introduced the flaw? Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be? Should apply directly How likely is this patch to cause regressions; how much testing does it need? fairly unlikely to regress, as this makes it use the existing Mac kernel timer loop, and this upstream patch is basically a clean import. I've done local testing where I freeze one side using the testcase in See Also here and two browsers. no problems in multiple tests.
Comment on attachment 8521747 [details] [diff] [review] update iteration code from upstream Many approvals given! Go!
Attachment #8521747 - Flags: sec-approval?
Attachment #8521747 - Flags: sec-approval+
Attachment #8521747 - Flags: approval-mozilla-esr31?
Attachment #8521747 - Flags: approval-mozilla-esr31+
Attachment #8521747 - Flags: approval-mozilla-beta?
Attachment #8521747 - Flags: approval-mozilla-beta+
Attachment #8521747 - Flags: approval-mozilla-aurora?
Attachment #8521747 - Flags: approval-mozilla-aurora+
Actually: Tuexen wrote the upstream patch; I imported it, so I really should be the reviewer.
Attachment #8521747 - Flags: review?(tuexen) → review+
Whiteboard: [webrtc-uplift] → [webrtc-uplift][adv-main34+][adv-esr31.3+]
Going to mark this as qe‑verify- as this was tested via bug # 1099388 comment # 7. I tried reproducing the issue with the test case from bug # 1099388 using the instructions from bug # 1099388 comment # 4 without any luck. I let the test case run for more than 2 hours on my asan build without any crashes. Using the information from comment #0, I wasn't sure how to proceed with testing as the issue was discovered while debugging DataChannels. If there's anything else that QE can do here, please let me know and I'll gladly take another look!
You need to log in before you can comment on or make changes to this bug.