Closed
Bug 947928
Opened 10 years ago
Closed 10 years ago
[RTSP] System crash when opening RTSP for the second time (NAT problem)
Categories
(Firefox OS Graveyard :: General, defect)
Tracking
(blocking-b2g:1.3+, firefox27 wontfix, firefox28 fixed, firefox29 fixed, b2g-v1.3 fixed)
People
(Reporter: ethan, Assigned: ethan)
References
Details
(Whiteboard: [FT:RIL])
Attachments
(10 files, 4 obsolete files)
151.62 KB,
text/x-log
|
Details | |
8.07 KB,
image/png
|
Details | |
128.34 KB,
application/vnd.tcpdump.pcap
|
Details | |
3.73 KB,
text/x-log
|
Details | |
3.54 KB,
text/plain
|
Details | |
85.66 KB,
application/vnd.tcpdump.pcap
|
Details | |
134.17 KB,
application/vnd.tcpdump.pcap
|
Details | |
20.55 KB,
patch
|
Details | Diff | Splinter Review | |
9.21 MB,
application/vnd.tcpdump.pcap
|
Details | |
1.30 KB,
patch
|
ethan
:
review+
|
Details | Diff | Splinter Review |
*** Reproduce Steps *** 1. Open a simple app using "rtsp" as video source. Such as: <video id=home_video controls preload=metadata autoplay=true width="100%"> <source src="rtsp://184.72.239.149/vod/mp4:BigBuckBunny_115k.mov"/> </video> 2. Click the play button in the control panel. The video was played normally. 3. Close the app. 4. Open the app and click the play button again. 5. After a few seconds, the whole system crashed. *** Note *** 1. This bug can be stably reproduced in certain network environment but not in others. 2. In the logcat, the PIDs of the simple app are: 4022 (normal) and 4085 (crashed).
Assignee | ||
Comment 1•10 years ago
|
||
When the app was opened for the second time, by touching the screen we can see the duration was already calculated and displayed, which means the metadata was loaded successfully.
Assignee | ||
Comment 2•10 years ago
|
||
*** GDB Backtrace Logs *** Gdb attached to the b2g process. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 109.649] __libc_android_abort () at bionic/libc/unistd/abort.c:82 82 *((char*)0xdeadbaad) = 39; (gdb) bt #0 __libc_android_abort () at bionic/libc/unistd/abort.c:82 #1 0x4000944c in __android_log_assert (cond=<value optimized out>, tag=0x41e3909c "AMPEG4ElementaryAssembler", fmt=0x41de1a54 "%s") at system/core/liblog/logd_write.c:246 #2 0x40e6b914 in android::AMPEG4ElementaryAssembler::addPacket (this=0x475ffc18, source=<value optimized out>) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp:239 #3 0x40e6bcc4 in android::AMPEG4ElementaryAssembler::assembleMore (this=0xdeadbaad, source=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp:384 #4 0x40e6d672 in android::ARTPAssembler::onPacketReceived (this=0x4972e6d0, source=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPAssembler.cpp:41 #5 0x40e6ee24 in android::ARTPSource::processRTPPacket (this=0x47368f60, buffer=<value optimized out>) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPSource.cpp:86 #6 0x40e6dcea in android::ARTPConnection::parseRTP (this=<value optimized out>, s=<value optimized out>, buffer=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:491 #7 0x40e6ded4 in android::ARTPConnection::receive (this=0x446d0b20, s=0x43951c40, receiveRTP=true) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:402 #8 0x40e6e130 in android::ARTPConnection::onPollStreams (this=0x446d0b20) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:287 #9 0x40e6e636 in android::ARTPConnection::onMessageReceived (this=0x446d0b20, msg=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:161 #10 0x408982bc in android::ALooperRoster::deliverMessage (this=0x4089b678, msg=...) at frameworks/base/media/libstagefright/foundation/ALooperRoster.cpp:133 #11 0x40897e26 in android::ALooper::loop (this=<value optimized out>) at frameworks/base/media/libstagefright/foundation/ALooper.cpp:212 #12 0x40897f9c in android::ALooper::LooperThread::threadLoop (this=<value optimized out>) ---Type <return> to continue, or q <return> to quit--- at frameworks/base/media/libstagefright/foundation/ALooper.cpp:47 #13 0x40156e58 in android::Thread::_threadLoop (user=<value optimized out>) at frameworks/base/libs/utils/Threads.cpp:834 #14 0x4015749e in thread_data_t::trampoline (t=<value optimized out>) at frameworks/base/libs/utils/Threads.cpp:127 #15 0x40091e18 in __thread_entry ( func=0x40157409 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x461f3890, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217 #16 0x4009196c in pthread_create (thread_out=<value optimized out>, attr=0xbefc4d1c, start_routine=0x40157409 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x461f3890) at bionic/libc/bionic/pthread.c:357 #17 0x00000000 in ?? ()
Comment 3•10 years ago
|
||
#2 0x40e6b914 in android::AMPEG4ElementaryAssembler::addPacket (this=0x475ffc18, source=<value optimized out>) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp:239 CHECK_GE(buffer->size(), 2u); It looks like the buffer content size is less than 2. Hi Ethan: Please help to dump more detail information in |ARTPSource::queuePacket|. In the function, we can dump the buffer->size() and the rtp sequence number. By using rtp sequence number, we can track the packet captured by wireshark.
Comment 4•10 years ago
|
||
Can you also put some debug message in line 490.... http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp#490 uint32_t seqNum = (uint32_t)buffer->int32Data(); size_t size = buffer->size(); dump("buffer size = %d seqNum = %d range = %d", size, seqNum, size - payloadOffset); Also here, http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp#391 dump("errno = %s receive bytes = %d", strerror(errno), nbytes);
Assignee | ||
Updated•10 years ago
|
Assignee: nobody → ettseng
Assignee | ||
Comment 5•10 years ago
|
||
This pcap file was captured while reproducing this bug.
Assignee | ||
Comment 6•10 years ago
|
||
Gdb backtrace log. We printed the content of buffer and it looks fine.
Assignee | ||
Comment 7•10 years ago
|
||
Another gdb backtrace log. Note that the function call stacks while segmentation fault are different from the previous log. The only difference of reproducing bug is, this time we ran tcpdump at the same time.
Assignee | ||
Comment 8•10 years ago
|
||
This pcap was captured by slightly changing the reproduction steps: 1. Connect to another streaming source (music service for AMX). <a href="rtsp://m.streaming.imusica.com.br/vod/mp4:takemeout?=3529931==web.local=1262=0"> 2. Open the media through the video app (the b2g system still crashed).
Comment 9•10 years ago
|
||
Can we set this to 1.3+, as it looks quite important?
Flags: needinfo?(ettseng)
Updated•10 years ago
|
Whiteboard: [FT:RIL]
Assignee | ||
Comment 10•10 years ago
|
||
Yes I think so. We just identified the concrete scenario, which will be described in detail in the comment below. Although this bug would not happen in the normal case, however, when it happens, the result (crash) is undesirable. I agree to set it as 1.3+.
Flags: needinfo?(ettseng)
Assignee | ||
Comment 11•10 years ago
|
||
A very concrete scenario to reproduce this bug is to drop RTP packets, such as using iptables command: (Suppose our RTSP client uses port 15550-15553 as RTP listen ports.) iptables -I INPUT -i wlan0 -p udp --dport 15550 -j DROP iptables -I INPUT -i wlan0 -p udp --dport 15551 -j DROP iptables -I INPUT -i wlan0 -p udp --dport 15552 -j DROP iptables -I INPUT -i wlan0 -p udp --dport 15553 -j DROP When these firewall rules are set up, the behavior of RTSP app would be like this: 1) RTSP session establishment C->S: Describe, S->C: Replay 200 OK C->S: Setup, S->: Reply 200 OK 2) RTSP play request C->S: Play, S->C: Replay 200 OK Meanwhile, RTSP app starts a timer to count down 10 seconds to determine RTP availability. 3) Since RTP packets are dropped, the timer timed out and the app would try to "reconnect". This "reconnect" would teardown the previous RTSP session and describe/setup a new one. However, at this moment an implementation bug would cause the entire parent process crash (segmentation fault). In summary, when RTSP session is established and play is requested, if there is no RTP packet received, the app would definitely crash 10 seconds after play request was sent. We got a quick solution to fix the reconnect bug. Unfortunately, even if this is fixed and reconnect succeeded, if somehow RTP packet is coming now, the app would still crash while parsing the RTP packet. This would happen under network environment. We suspect there is some factors related the network devices (Wirelsss AP, Routers, etc.)
Assignee | ||
Comment 12•10 years ago
|
||
As described above, we have to follow up with more tests, observations and analysis. AFAIK, we should clarify at least the following issues: 1. Why RTP packets would not be received by app when we open the app for 2nd time? 2. After we fixed the timeout-reconnect bug, why would the app still crash while parsing RTP packet? After we clarified these issue, another question would be: Should we replace the assertions "CHECK()" by MOZ_ASSERT(), which will be turned off in release build?
Updated•10 years ago
|
blocking-b2g: --- → 1.3+
Updated•10 years ago
|
Blocks: b2g-RTSP-1.3
Assignee | ||
Comment 13•10 years ago
|
||
Packets captured when b2g system crashed by parsing RTP packet.
Assignee | ||
Comment 14•10 years ago
|
||
A patch to print debugging messages.
Assignee | ||
Comment 15•10 years ago
|
||
** Crash Point ** The crash point is here: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp#333 In AMPEG4ElementaryAssembler::addPacket() CHECK_LE(offset + header.mSize, buffer->size()); (offset + header.mSize) is larger than buffer->size() which is the packet payload size, and the assertion CHECK_LE() causes the abort/crash. The reason why we get here is because the ARTPSource constructor creates a wrong assembler. For video track, it should be AAVCAssembler, not AMPEG4ElementaryAssembler. ** Incorrect RTP Port Number ** We are not 100% sure why we create a wrong assembler. However, the captured packets provide some clues about this issue. When the crash happens, we found something goes wrong with the "client_port" parameter in RTSP Setup/Response. Note in the following packet snippets, RTSP client (our app) specified port 15552/15553 as RTP/RTCP ports. But RTSP server replied port 15550/15551. 1) RTSP session establishment C->S: Describe, S->C: Reply 200 OK C->S: Setup for trackID 1 (audio) S->C: Reply 200 OK C->S: Setup for trackID 2 (video) 192.168.1.115 184.72.239.149 RTSP 291 SETUP rtsp://184.72.239.149/vod/mp4:BigBuckBunny_115k.mov/trackID=2 RTSP/1.0 Transport: RTP/AVP/UDP;unicast;client_port=15552-15553 S->C: Reply 200 OK 184.72.239.149 192.168.1.115 RTSP 389 Reply: RTSP/1.0 200 OK Transport: RTP/AVP/UDP;unicast;client_port=15550-15551;source=184.72.239.149;server_port=7194-7195;ssrc=68A7E392 2) RTSP play request C->S: Play, S->C: Reply 200 OK 3) RTP packet coming S->C: RTP (video) 184.72.239.149 192.168.1.115 RTP 836 PT=DynamicRTP-Type-97, SSRC=0x68A7E392, Seq=1, Time=0, Mark User Datagram Protocol, Src Port: 7194 (7194), Dst Port: 15550 (15550) Payload type: DynamicRTP-Type-97 (97) Synchronization Source identifier: 0x68a7e392 (1755833234)
Comment 16•10 years ago
|
||
The captured packet for Samsung Galaxy S2 in the same nat network environment.
Comment 17•10 years ago
|
||
Can you add "srand((unsigned int)time(NULL));" before the line we use rand() to get the start port ? http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp#118 I would like to make sure that two RTSP sessions use different client_ports.
Assignee | ||
Updated•10 years ago
|
Summary: [RTSP] System crash when opening rtsp media element → [RTSP] System crash when opening RTSP for the second time (NAT problem)
Assignee | ||
Comment 18•10 years ago
|
||
This patch is just a trial. Originally the RTP/RTCP client ports would stick on 15550-15553. We make these client ports as random integers between 15550 to 20000.
Assignee | ||
Comment 19•10 years ago
|
||
(In reply to Vincent Chang[:vchang] from comment #17) > Can you add "srand((unsigned int)time(NULL));" before the line we use rand() > to get the start port ? > http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ > ARTPConnection.cpp#118 > I would like to make sure that two RTSP sessions use different client_ports. Vincent, I am almost sure this is a NAT traversal problem (that's why I edited the title). First, in conclusion. Yes, using random client ports could resolve this bug! Once we use random ports, the scenario of this bug never happen again. Secondly, I think the issue arise with the implementation of NAT traversal. It seems the NAT implementation determines the 2nd-time "poke a hole" packets as invalid and replies those ICMP port unreachable messages itself. Actually I think it's a bug in the original code. As you know, the intention of the code is to use random ports. unsigned start = (rand() * 1000)/ RAND_MAX + 15550; However, this line would not generate random results since (rand() * 1000) may overflow int type. I found Android 4.4 already fixed this bug. We should just fix that. :)
Assignee | ||
Comment 20•10 years ago
|
||
BTW, we don't need to call srand() ourselves. It seems it was already done somewhere in our system.
Assignee | ||
Comment 21•10 years ago
|
||
Fix the bug of generating random client ports.
Attachment #8351226 -
Attachment is obsolete: true
Assignee | ||
Comment 22•10 years ago
|
||
Attachment #8351308 -
Attachment is obsolete: true
Assignee | ||
Comment 23•10 years ago
|
||
Comment on attachment 8351309 [details] [diff] [review] bug-947928-fix.patch Vincent, could you help to review this patch?
Attachment #8351309 -
Flags: review?(vchang)
Comment 24•10 years ago
|
||
Comment on attachment 8351309 [details] [diff] [review] bug-947928-fix.patch Review of attachment 8351309 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp @@ +116,5 @@ > > bumpSocketBufferSize(*rtcpSocket); > > + /* rand() * 1000 may overflow int type, use long long */ > + unsigned start = (unsigned)((rand() * 1000ll) / RAND_MAX) + 15550; Rather than call srand(seed) somewhere in the code, I would prefer to do it here. So that we can reduce the chance to use duplicate port numbers....
Attachment #8351309 -
Flags: review?(vchang)
Assignee | ||
Comment 25•10 years ago
|
||
Add srand() to reduce the chance of using duplicate port numbers.
Attachment #8351309 -
Attachment is obsolete: true
Assignee | ||
Updated•10 years ago
|
Attachment #8351317 -
Flags: review?(vchang)
Comment 26•10 years ago
|
||
Comment on attachment 8351317 [details] [diff] [review] bug-947928-fix.patch Review of attachment 8351317 [details] [diff] [review]: ----------------------------------------------------------------- Hi sworkman, we found this NAT problem under the firewall gateway environment. Not sure if you have time to review the patch.
Attachment #8351317 -
Flags: review?(vchang)
Attachment #8351317 -
Flags: review?(sworkman)
Attachment #8351317 -
Flags: review+
Comment 27•10 years ago
|
||
Comment on attachment 8351317 [details] [diff] [review] bug-947928-fix.patch Review of attachment 8351317 [details] [diff] [review]: ----------------------------------------------------------------- Oh this looked like a tricky one. Good job on the analysis! r=me.
Attachment #8351317 -
Flags: review?(sworkman) → review+
Assignee | ||
Comment 28•10 years ago
|
||
Update the comment.
Attachment #8351317 -
Attachment is obsolete: true
Attachment #8355800 -
Flags: review+
Assignee | ||
Comment 29•10 years ago
|
||
(In reply to Steve Workman [:sworkman] from comment #27) > Oh this looked like a tricky one. Good job on the analysis! r=me. Steve, thanks for your review. :)
Assignee | ||
Comment 30•10 years ago
|
||
The result of try server: https://tbpl.mozilla.org/?tree=Try&rev=4a276518d9ae
Assignee | ||
Updated•10 years ago
|
Keywords: checkin-needed
Comment 31•10 years ago
|
||
https://hg.mozilla.org/integration/b2g-inbound/rev/68437a442e03
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/68437a442e03
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Comment 33•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/bad5fd00645e
status-b2g-v1.3:
--- → fixed
status-firefox27:
--- → wontfix
status-firefox28:
--- → fixed
status-firefox29:
--- → fixed
Target Milestone: --- → 1.3 C2/1.4 S2(17jan)
You need to log in
before you can comment on or make changes to this bug.
Description
•