Closed
Bug 1049291
Opened 10 years ago
Closed 10 years ago
b2g process crashes while calling select()
Categories
(Core :: WebRTC, defect, P1)
Tracking
()
People
(Reporter: tkundu, Assigned: ekr)
References
Details
(Keywords: crash, Whiteboard: [caf-crash 250][caf priority: p2][b2g-crash] [CR 705517])
Crash Data
Attachments
(3 files)
161.38 KB,
text/plain
|
Details | |
2.35 KB,
patch
|
ehugg
:
review+
tkundu
:
feedback+
|
Details | Diff | Splinter Review |
2.35 KB,
patch
|
Details | Diff | Splinter Review |
b2g process uses more than 1024 fds. We should not use select syscall inside gecko code which is called by b2g process. Reason is that select syscall has limitation of 1024 fd and it will crash during stability test . We are seeing webrtc is using select() and b2g is crashing
Reporter | ||
Comment 1•10 years ago
|
||
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.0?
Reporter | ||
Updated•10 years ago
|
Summary: Replace select syscall with poll() → b2g process crashes while calling select()
Comment 2•10 years ago
|
||
Randall, this crash seems WebRTC-related; can you take a look?
blocking-b2g: 2.0? → 2.0+
Component: Stability → WebRTC
Flags: needinfo?(rjesup)
Keywords: crash
Product: Firefox OS → Core
Assignee | ||
Comment 3•10 years ago
|
||
Can you describe the STR for this? Is this is Mochitests? I wonder if we're repeatedly spinning up instances of SIPCC.
Updated•10 years ago
|
Whiteboard: [b2g-crash]
Updated•10 years ago
|
Flags: needinfo?(rjesup)
Comment 4•10 years ago
|
||
The issue here is simply the existence of the select() call. It needs to be rewritten to poll() instead. Using select() with more than 1024 open fds can cause memory corruption and hard-to-diagnose stability issues.
Assignee | ||
Comment 5•10 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #4) > The issue here is simply the existence of the select() call. It needs to be > rewritten to poll() instead. Using select() with more than 1024 open fds > can cause memory corruption and hard-to-diagnose stability issues. Can you elaborate on this? select() shouldn't be causing memory corruption with any arguments that point to valid memory regions.
Comment 6•10 years ago
|
||
FD_SET doesn't do any bounds checking. If you pass in a file descriptor whose integer value is >= 1024, then it will corrupt memory.
Reporter | ||
Updated•10 years ago
|
Whiteboard: [b2g-crash] → [b2g-crash] [CR 705517]
Comment 7•10 years ago
|
||
Ekr's going to take this. We realize we need a very fast turn around (less than 1 day), but this time frame for a fix is asking a lot.
Assignee: nobody → ekr
Comment 8•10 years ago
|
||
Upon deeper investigation (thanks :jesup), there is no bug here. The select() in question passes in no fds so there's no risk of memory corruption by an FD_SET.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → INVALID
Assignee | ||
Comment 9•10 years ago
|
||
Hmm... (gdb) l 427 struct cpr_timeval * RESTRICT timeout) 428 { 429 int16_t rc; 430 struct timeval t, *t_p; 431 432 if (timeout != NULL) { 433 t.tv_sec = timeout->tv_sec; 434 t.tv_usec = timeout->tv_usec; 435 t_p = &t; 436 } else { (gdb) n 437 t_p = NULL; (gdb) n 440 rc = (int16_t) select(nfds, read_fds, write_fds, except_fds, t_p); (gdb) p read_fds $5 = (fd_set *) 0x117c15e78 (gdb) p *write_fds $6 = { fds_bits = {0 <repeats 32 times>} } (gdb) p *read_fds $7 = { fds_bits = {0, 32, 0 <repeats 30 times>} } (gdb) p nfds $8 = 38 (gdb)
Comment 10•10 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #8) > Upon deeper investigation (thanks :jesup), there is no bug here. The > select() in question passes in no fds so there's no risk of memory > corruption by an FD_SET. That was one specific one in DataChannels; the one in cprSelect() does pass fd's, and that was the one that triggered the stacktrace on this bug
Updated•10 years ago
|
Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Assignee | ||
Comment 11•10 years ago
|
||
Assignee | ||
Comment 12•10 years ago
|
||
The WIP patch above removes the call to cprSelect() and the FD_* macros in sip_platform_task and replaces it with a single blocking read to the IPC socket. This should be OK since I believe the other sockets are used for SIP stuff and we are not doing SIP. The other known use of select is in cpr_*_timers_using_select.c but that is in: start_timer_service_loop() called from: timerThread() called from: cpr_timer_pre_init() called from: cprPreInit() http://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/src/sipcc/cpr/linux/cpr_linux_init.c#183 but only if CPR_TIMERS_ENABLED is set, which it is not: http://dxr.mozilla.org/mozilla-central/source/media/webrtc/signaling/signaling.gyp?from=signaling.gyp&case=true#671 This is all pretty quick and dirty so it needs a bunch of testing and careful review to make sure I didn't miss anything.
Assignee | ||
Comment 13•10 years ago
|
||
Comment on attachment 8468942 [details] [diff] [review] Remove uses of select() in SIPCC SIP task Review of attachment 8468942 [details] [diff] [review]: ----------------------------------------------------------------- ehugg, can you please review. This works in pc_test.html and signaling_unittests pass.
Attachment #8468942 -
Flags: review?(ethanhugg)
Attachment #8468942 -
Flags: feedback?(tkundu)
Assignee | ||
Comment 14•10 years ago
|
||
Comment on attachment 8468942 [details] [diff] [review] Remove uses of select() in SIPCC SIP task Review of attachment 8468942 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/signaling/src/sipcc/core/sipstack/sip_platform_task.c @@ +409,5 @@ > * - Forever-loop exits in sip_process_int_msg()::THREAD_UNLOAD > */ > while (TRUE) { > + sip_process_int_msg(); > + continue; I guess we don't actually need this continue. I wrote it before I decided to #if 0 out everything.
Comment 15•10 years ago
|
||
Comment on attachment 8468942 [details] [diff] [review] Remove uses of select() in SIPCC SIP task Review of attachment 8468942 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me. Built and ran unittests and a few ad-hoc tests. I found instances of FD_SET and FD_CLR in ccsip_platform_tcp.c but I'm pretty sure that code is never called since we don't make our own tcp connections from this code.
Attachment #8468942 -
Flags: review?(ethanhugg) → review+
Reporter | ||
Updated•10 years ago
|
Attachment #8468942 -
Flags: feedback?(tkundu) → feedback+
Comment 16•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ab18eda31e4
Target Milestone: --- → mozilla34
Comment 17•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/eb93c3dfc36d
status-b2g-v2.0:
--- → fixed
status-firefox34:
--- → affected
Updated•10 years ago
|
Whiteboard: [b2g-crash] [CR 705517] → [caf priority: p2][b2g-crash] [CR 705517]
Updated•10 years ago
|
Whiteboard: [caf priority: p2][b2g-crash] [CR 705517] → [caf-crash 250][caf priority: p2][b2g-crash] [CR 705517]
Comment 18•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/7ab18eda31e4
Status: REOPENED → RESOLVED
Closed: 10 years ago → 10 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 19•10 years ago
|
||
Updated•10 years ago
|
Comment 20•10 years ago
|
||
Observed on: Device: msm8610 Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.054 Moz BuildID: 20140804000204 B2G Version: 2.0 Gecko Version: 32.0 Gaia: http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=9e5907995c9327f14cb5d182cee5ff16b1743ed4 Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=ed87f0a54baf646eb0b20b4debc090c8016d2104
Comment 21•10 years ago
|
||
(^ that was without the patch here)
Updated•10 years ago
|
Flags: in-moztrap?(bzumwalt)
Comment 22•10 years ago
|
||
No STR is present to create test case to address bug.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(bzumwalt)
Flags: in-moztrap-
You need to log in
before you can comment on or make changes to this bug.
Description
•