Closed Bug 947928 Opened 12 years ago Closed 12 years ago

[RTSP] System crash when opening RTSP for the second time (NAT problem)

Tracking

(blocking-b2g:1.3+, firefox27 wontfix, firefox28 fixed, firefox29 fixed, b2g-v1.3 fixed)

Status:

RESOLVED FIXED

Milestone:

1.3 C2/1.4 S2(17jan)

Project Flags:

blocking-b2g

1.3+

Tracking Flags:

Tracking

Status

firefox27

---

wontfix

firefox28

---

fixed

firefox29

---

fixed

b2g-v1.3

---

fixed

People

(Reporter: ethan, Assigned: ethan)

References

Details

(Whiteboard: [FT:RIL])

Attachments

(10 files, 4 obsolete files)

rtsp_error.log 12 years ago Ethan Tseng [:ethan] 151.62 KB, text/x-log		Details
device-2013-12-09-230741.png 12 years ago Ethan Tseng [:ethan] 8.07 KB, image/png		Details
rtsp.pcap 12 years ago Ethan Tseng [:ethan] 128.34 KB, application/vnd.tcpdump.pcap		Details
rtsp_gdb.log 12 years ago Ethan Tseng [:ethan] 3.73 KB, text/x-log		Details
rtsp_gdb.log.2 12 years ago Ethan Tseng [:ethan] 3.54 KB, text/plain		Details
rtsp_amx_music.pcap 12 years ago Ethan Tseng [:ethan] 85.66 KB, application/vnd.tcpdump.pcap		Details
rtsp_crash_at_rtp.pcap 12 years ago Ethan Tseng [:ethan] 134.17 KB, application/vnd.tcpdump.pcap		Details
rtsp-crash-dbg.patch 12 years ago Ethan Tseng [:ethan] 20.55 KB, patch		Details \| Diff \| Splinter Review
Rtsp ok for Galaxy S2 12 years ago Vincent Chang[:vchang][changyihsin] 9.21 MB, application/vnd.tcpdump.pcap		Details
bug-947928-fix.patch 12 years ago Ethan Tseng [:ethan] 1.68 KB, patch		Details \| Diff \| Splinter Review
bug-947928-fix.patch 12 years ago Ethan Tseng [:ethan] 1.17 KB, patch		Details \| Diff \| Splinter Review
bug-947928-fix.patch 12 years ago Ethan Tseng [:ethan] 1.17 KB, patch		Details \| Diff \| Splinter Review
bug-947928-fix.patch 12 years ago Ethan Tseng [:ethan] 1.25 KB, patch	vchang : review+ sworkman : review+	Details \| Diff \| Splinter Review
bug-947928-fix.patch 12 years ago Ethan Tseng [:ethan] 1.30 KB, patch	ethan : review+	Details \| Diff \| Splinter Review

Ethan Tseng [:ethan]

Assignee

Description

•

12 years ago

Attached file rtsp_error.log — Details

*** Reproduce Steps *** 1. Open a simple app using "rtsp" as video source. Such as: <video id=home_video controls preload=metadata autoplay=true width="100%"> <source src="rtsp://184.72.239.149/vod/mp4:BigBuckBunny_115k.mov"/> </video> 2. Click the play button in the control panel. The video was played normally. 3. Close the app. 4. Open the app and click the play button again. 5. After a few seconds, the whole system crashed. *** Note *** 1. This bug can be stably reproduced in certain network environment but not in others. 2. In the logcat, the PIDs of the simple app are: 4022 (normal) and 4085 (crashed).

Ethan Tseng [:ethan]

Assignee

Comment 1

•

12 years ago

Attached image device-2013-12-09-230741.png — Details

When the app was opened for the second time, by touching the screen we can see the duration was already calculated and displayed, which means the metadata was loaded successfully.

Ethan Tseng [:ethan]

Assignee

Comment 2

•

12 years ago

*** GDB Backtrace Logs *** Gdb attached to the b2g process. Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 109.649] __libc_android_abort () at bionic/libc/unistd/abort.c:82 82 *((char*)0xdeadbaad) = 39; (gdb) bt #0 __libc_android_abort () at bionic/libc/unistd/abort.c:82 #1 0x4000944c in __android_log_assert (cond=<value optimized out>, tag=0x41e3909c "AMPEG4ElementaryAssembler", fmt=0x41de1a54 "%s") at system/core/liblog/logd_write.c:246 #2 0x40e6b914 in android::AMPEG4ElementaryAssembler::addPacket (this=0x475ffc18, source=<value optimized out>) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp:239 #3 0x40e6bcc4 in android::AMPEG4ElementaryAssembler::assembleMore (this=0xdeadbaad, source=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp:384 #4 0x40e6d672 in android::ARTPAssembler::onPacketReceived (this=0x4972e6d0, source=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPAssembler.cpp:41 #5 0x40e6ee24 in android::ARTPSource::processRTPPacket (this=0x47368f60, buffer=<value optimized out>) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPSource.cpp:86 #6 0x40e6dcea in android::ARTPConnection::parseRTP (this=<value optimized out>, s=<value optimized out>, buffer=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:491 #7 0x40e6ded4 in android::ARTPConnection::receive (this=0x446d0b20, s=0x43951c40, receiveRTP=true) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:402 #8 0x40e6e130 in android::ARTPConnection::onPollStreams (this=0x446d0b20) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:287 #9 0x40e6e636 in android::ARTPConnection::onMessageReceived (this=0x446d0b20, msg=...) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp:161 #10 0x408982bc in android::ALooperRoster::deliverMessage (this=0x4089b678, msg=...) at frameworks/base/media/libstagefright/foundation/ALooperRoster.cpp:133 #11 0x40897e26 in android::ALooper::loop (this=<value optimized out>) at frameworks/base/media/libstagefright/foundation/ALooper.cpp:212 #12 0x40897f9c in android::ALooper::LooperThread::threadLoop (this=<value optimized out>) ---Type <return> to continue, or q <return> to quit--- at frameworks/base/media/libstagefright/foundation/ALooper.cpp:47 #13 0x40156e58 in android::Thread::_threadLoop (user=<value optimized out>) at frameworks/base/libs/utils/Threads.cpp:834 #14 0x4015749e in thread_data_t::trampoline (t=<value optimized out>) at frameworks/base/libs/utils/Threads.cpp:127 #15 0x40091e18 in __thread_entry ( func=0x40157409 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x461f3890, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217 #16 0x4009196c in pthread_create (thread_out=<value optimized out>, attr=0xbefc4d1c, start_routine=0x40157409 <thread_data_t::trampoline(thread_data_t const*)>, arg=0x461f3890) at bionic/libc/bionic/pthread.c:357 #17 0x00000000 in ?? ()

Benjamin Chen [:bechen]

Comment 3

•

12 years ago

#2 0x40e6b914 in android::AMPEG4ElementaryAssembler::addPacket (this=0x475ffc18, source=<value optimized out>) at /home/ethan/workspace/mozilla/mozilla-central/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp:239 CHECK_GE(buffer->size(), 2u); It looks like the buffer content size is less than 2. Hi Ethan: Please help to dump more detail information in |ARTPSource::queuePacket|. In the function, we can dump the buffer->size() and the rtp sequence number. By using rtp sequence number, we can track the packet captured by wireshark.

Vincent Chang[:vchang][changyihsin]

Comment 4

•

12 years ago

Can you also put some debug message in line 490.... http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp#490 uint32_t seqNum = (uint32_t)buffer->int32Data(); size_t size = buffer->size(); dump("buffer size = %d seqNum = %d range = %d", size, seqNum, size - payloadOffset); Also here, http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp#391 dump("errno = %s receive bytes = %d", strerror(errno), nbytes);

Ethan Tseng [:ethan]

Assignee

Updated

•

12 years ago

Assignee: nobody → ettseng

Ethan Tseng [:ethan]

Assignee

Comment 5

•

12 years ago

Attached file rtsp.pcap — Details

This pcap file was captured while reproducing this bug.

Ethan Tseng [:ethan]

Assignee

Comment 6

•

12 years ago

Attached file rtsp_gdb.log — Details

Gdb backtrace log. We printed the content of buffer and it looks fine.

Ethan Tseng [:ethan]

Assignee

Comment 7

•

12 years ago

Attached file rtsp_gdb.log.2 — Details

Another gdb backtrace log. Note that the function call stacks while segmentation fault are different from the previous log. The only difference of reproducing bug is, this time we ran tcpdump at the same time.

Ethan Tseng [:ethan]

Assignee

Comment 8

•

12 years ago

Attached file rtsp_amx_music.pcap — Details

This pcap was captured by slightly changing the reproduction steps: 1. Connect to another streaming source (music service for AMX). <a href="rtsp://m.streaming.imusica.com.br/vod/mp4:takemeout?=3529931==web.local=1262=0"> 2. Open the media through the video app (the b2g system still crashed).

Wesley Huang [:wesley_huang] (EPM) (NI me)

Comment 9

•

12 years ago

Can we set this to 1.3+, as it looks quite important?

Flags: needinfo?(ettseng)

Wesley Huang [:wesley_huang] (EPM) (NI me)

Updated

•

12 years ago

Whiteboard: [FT:RIL]

Ethan Tseng [:ethan]

Assignee

Comment 10

•

12 years ago

Yes I think so. We just identified the concrete scenario, which will be described in detail in the comment below. Although this bug would not happen in the normal case, however, when it happens, the result (crash) is undesirable. I agree to set it as 1.3+.

Flags: needinfo?(ettseng)

Ethan Tseng [:ethan]

Assignee

Comment 11

•

12 years ago

A very concrete scenario to reproduce this bug is to drop RTP packets, such as using iptables command: (Suppose our RTSP client uses port 15550-15553 as RTP listen ports.) iptables -I INPUT -i wlan0 -p udp --dport 15550 -j DROP iptables -I INPUT -i wlan0 -p udp --dport 15551 -j DROP iptables -I INPUT -i wlan0 -p udp --dport 15552 -j DROP iptables -I INPUT -i wlan0 -p udp --dport 15553 -j DROP When these firewall rules are set up, the behavior of RTSP app would be like this: 1) RTSP session establishment C->S: Describe, S->C: Replay 200 OK C->S: Setup, S->: Reply 200 OK 2) RTSP play request C->S: Play, S->C: Replay 200 OK Meanwhile, RTSP app starts a timer to count down 10 seconds to determine RTP availability. 3) Since RTP packets are dropped, the timer timed out and the app would try to "reconnect". This "reconnect" would teardown the previous RTSP session and describe/setup a new one. However, at this moment an implementation bug would cause the entire parent process crash (segmentation fault). In summary, when RTSP session is established and play is requested, if there is no RTP packet received, the app would definitely crash 10 seconds after play request was sent. We got a quick solution to fix the reconnect bug. Unfortunately, even if this is fixed and reconnect succeeded, if somehow RTP packet is coming now, the app would still crash while parsing the RTP packet. This would happen under network environment. We suspect there is some factors related the network devices (Wirelsss AP, Routers, etc.)

Ethan Tseng [:ethan]

Assignee

Comment 12

•

12 years ago

As described above, we have to follow up with more tests, observations and analysis. AFAIK, we should clarify at least the following issues: 1. Why RTP packets would not be received by app when we open the app for 2nd time? 2. After we fixed the timeout-reconnect bug, why would the app still crash while parsing RTP packet? After we clarified these issue, another question would be: Should we replace the assertions "CHECK()" by MOZ_ASSERT(), which will be turned off in release build?

Wesley Huang [:wesley_huang] (EPM) (NI me)

Updated

•

12 years ago

blocking-b2g: --- → 1.3+

Wesley Huang [:wesley_huang] (EPM) (NI me)

Updated

•

12 years ago

Blocks: b2g-RTSP-1.3

Ethan Tseng [:ethan]

Assignee

Comment 13

•

12 years ago

Attached file rtsp_crash_at_rtp.pcap — Details

Packets captured when b2g system crashed by parsing RTP packet.

Ethan Tseng [:ethan]

Assignee

Comment 14

•

12 years ago

Attached patch rtsp-crash-dbg.patch — Details — Splinter Review

A patch to print debugging messages.

Ethan Tseng [:ethan]

Assignee

Comment 15

•

12 years ago

** Crash Point ** The crash point is here: http://mxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/AMPEG4ElementaryAssembler.cpp#333 In AMPEG4ElementaryAssembler::addPacket() CHECK_LE(offset + header.mSize, buffer->size()); (offset + header.mSize) is larger than buffer->size() which is the packet payload size, and the assertion CHECK_LE() causes the abort/crash. The reason why we get here is because the ARTPSource constructor creates a wrong assembler. For video track, it should be AAVCAssembler, not AMPEG4ElementaryAssembler. ** Incorrect RTP Port Number ** We are not 100% sure why we create a wrong assembler. However, the captured packets provide some clues about this issue. When the crash happens, we found something goes wrong with the "client_port" parameter in RTSP Setup/Response. Note in the following packet snippets, RTSP client (our app) specified port 15552/15553 as RTP/RTCP ports. But RTSP server replied port 15550/15551. 1) RTSP session establishment C->S: Describe, S->C: Reply 200 OK C->S: Setup for trackID 1 (audio) S->C: Reply 200 OK C->S: Setup for trackID 2 (video) 192.168.1.115 184.72.239.149 RTSP 291 SETUP rtsp://184.72.239.149/vod/mp4:BigBuckBunny_115k.mov/trackID=2 RTSP/1.0 Transport: RTP/AVP/UDP;unicast;client_port=15552-15553 S->C: Reply 200 OK 184.72.239.149 192.168.1.115 RTSP 389 Reply: RTSP/1.0 200 OK Transport: RTP/AVP/UDP;unicast;client_port=15550-15551;source=184.72.239.149;server_port=7194-7195;ssrc=68A7E392 2) RTSP play request C->S: Play, S->C: Reply 200 OK 3) RTP packet coming S->C: RTP (video) 184.72.239.149 192.168.1.115 RTP 836 PT=DynamicRTP-Type-97, SSRC=0x68A7E392, Seq=1, Time=0, Mark User Datagram Protocol, Src Port: 7194 (7194), Dst Port: 15550 (15550) Payload type: DynamicRTP-Type-97 (97) Synchronization Source identifier: 0x68a7e392 (1755833234)

Vincent Chang[:vchang][changyihsin]

Comment 16

•

12 years ago

Attached file Rtsp ok for Galaxy S2 — Details

The captured packet for Samsung Galaxy S2 in the same nat network environment.

Vincent Chang[:vchang][changyihsin]

Comment 17

•

12 years ago

Can you add "srand((unsigned int)time(NULL));" before the line we use rand() to get the start port ? http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp#118 I would like to make sure that two RTSP sessions use different client_ports.

Ethan Tseng [:ethan]

Assignee

Updated

•

12 years ago

Summary: [RTSP] System crash when opening rtsp media element → [RTSP] System crash when opening RTSP for the second time (NAT problem)

Ethan Tseng [:ethan]

Assignee

Comment 18

•

12 years ago

Attached patch bug-947928-fix.patch (obsolete) — Details — Splinter Review

This patch is just a trial. Originally the RTP/RTCP client ports would stick on 15550-15553. We make these client ports as random integers between 15550 to 20000.

Ethan Tseng [:ethan]

Assignee

Comment 19

•

12 years ago

(In reply to Vincent Chang[:vchang] from comment #17) > Can you add "srand((unsigned int)time(NULL));" before the line we use rand() > to get the start port ? > http://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/rtsp/rtsp/ > ARTPConnection.cpp#118 > I would like to make sure that two RTSP sessions use different client_ports. Vincent, I am almost sure this is a NAT traversal problem (that's why I edited the title). First, in conclusion. Yes, using random client ports could resolve this bug! Once we use random ports, the scenario of this bug never happen again. Secondly, I think the issue arise with the implementation of NAT traversal. It seems the NAT implementation determines the 2nd-time "poke a hole" packets as invalid and replies those ICMP port unreachable messages itself. Actually I think it's a bug in the original code. As you know, the intention of the code is to use random ports. unsigned start = (rand() * 1000)/ RAND_MAX + 15550; However, this line would not generate random results since (rand() * 1000) may overflow int type. I found Android 4.4 already fixed this bug. We should just fix that. :)

Ethan Tseng [:ethan]

Assignee

Comment 20

•

12 years ago

BTW, we don't need to call srand() ourselves. It seems it was already done somewhere in our system.

Ethan Tseng [:ethan]

Assignee

Comment 21

•

12 years ago

Attached patch bug-947928-fix.patch (obsolete) — Details — Splinter Review

Fix the bug of generating random client ports.

Attachment #8351226 - Attachment is obsolete: true

Ethan Tseng [:ethan]

Assignee

Comment 22

•

12 years ago

Attached patch bug-947928-fix.patch (obsolete) — Details — Splinter Review

Attachment #8351308 - Attachment is obsolete: true

Ethan Tseng [:ethan]

Assignee

Comment 23

•

12 years ago

Comment on attachment 8351309 [details] [diff] [review] bug-947928-fix.patch Vincent, could you help to review this patch?

Attachment #8351309 - Flags: review?(vchang)

Vincent Chang[:vchang][changyihsin]

Comment 24

•

12 years ago

Comment on attachment 8351309 [details] [diff] [review] bug-947928-fix.patch Review of attachment 8351309 [details] [diff] [review]: ----------------------------------------------------------------- ::: netwerk/protocol/rtsp/rtsp/ARTPConnection.cpp @@ +116,5 @@ > > bumpSocketBufferSize(*rtcpSocket); > > + /* rand() * 1000 may overflow int type, use long long */ > + unsigned start = (unsigned)((rand() * 1000ll) / RAND_MAX) + 15550; Rather than call srand(seed) somewhere in the code, I would prefer to do it here. So that we can reduce the chance to use duplicate port numbers....

Attachment #8351309 - Flags: review?(vchang)

Ethan Tseng [:ethan]

Assignee

Comment 25

•

12 years ago

Attached patch bug-947928-fix.patch (obsolete) — Details — Splinter Review

Add srand() to reduce the chance of using duplicate port numbers.

Attachment #8351309 - Attachment is obsolete: true

Ethan Tseng [:ethan]

Assignee

Updated

•

12 years ago

Attachment #8351317 - Flags: review?(vchang)

Vincent Chang[:vchang][changyihsin]

Comment 26

•

12 years ago

Comment on attachment 8351317 [details] [diff] [review] bug-947928-fix.patch Review of attachment 8351317 [details] [diff] [review]: ----------------------------------------------------------------- Hi sworkman, we found this NAT problem under the firewall gateway environment. Not sure if you have time to review the patch.

Attachment #8351317 - Flags: review?(vchang)

Attachment #8351317 - Flags: review?(sworkman)

Attachment #8351317 - Flags: review+

Steve Workman [:sworkman] (INACTIVE)

Comment 27

•

12 years ago

Comment on attachment 8351317 [details] [diff] [review] bug-947928-fix.patch Review of attachment 8351317 [details] [diff] [review]: ----------------------------------------------------------------- Oh this looked like a tricky one. Good job on the analysis! r=me.

Attachment #8351317 - Flags: review?(sworkman) → review+

Ethan Tseng [:ethan]

Assignee

Comment 28

•

12 years ago

Attached patch bug-947928-fix.patch — Details — Splinter Review

Update the comment.

Attachment #8351317 - Attachment is obsolete: true

Attachment #8355800 - Flags: review+

Ethan Tseng [:ethan]

Assignee

Comment 29

•

12 years ago

(In reply to Steve Workman [:sworkman] from comment #27) > Oh this looked like a tricky one. Good job on the analysis! r=me. Steve, thanks for your review. :)

Ethan Tseng [:ethan]

Assignee

Comment 30

•

12 years ago

The result of try server: https://tbpl.mozilla.org/?tree=Try&rev=4a276518d9ae

Ethan Tseng [:ethan]

Assignee

Updated

•

12 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 31

•

12 years ago

https://hg.mozilla.org/integration/b2g-inbound/rev/68437a442e03

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 32

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/68437a442e03

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Ryan VanderMeulen [:RyanVM]

Comment 33

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/bad5fd00645e

status-b2g-v1.3: --- → fixed

status-firefox27: --- → wontfix

status-firefox28: --- → fixed

status-firefox29: --- → fixed

Target Milestone: --- → 1.3 C2/1.4 S2(17jan)

You need to log in before you can comment on or make changes to this bug.