Cannot send a blob from a FxOS phone to a connected remote peer via Data Channels

RESOLVED FIXED in Firefox 28

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: jsmith, Assigned: slee)

Tracking

(Depends on 1 bug, Blocks 1 bug)

29 Branch
mozilla29
ARM
Gonk (Firefox OS)
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:1.3+, firefox27 wontfix, firefox28 verified, firefox29 fixed, b2g-v1.3 verified, b2g-v1.4 fixed)

Details

Attachments

(3 attachments, 2 obsolete attachments)

Build - 12/19/2013 Leo 1.4

STR

1. On the FxOS phone - go to http://mysecondwebrtc.appspot.com/ using a wifi network with low activity
2. On a Desktop Firefox or FxAndroid Nightly, - go to http://mysecondwebrtc.appspot.com/ using the same wifi network as [1]
3. Register the Remote Client ID either on either one of the two peers involved (FxOS or the other peer)
4. After the data channel is confirmed open on both sides, try to send a mp3 file from the FxOS phone to the remote peer

Expected

The remote blob should be received at the remote peer eventually.

Actual

The remote blob fails to be received at the remote peer.

Additional Notes

Interestingly enough, sending a blob from the remote peer to the FxOS phone works fine. Additionally, sending string messages between each peer in either direction works fine as well.
Posted file Logcat
So rtccopy works in this case. So the question is why is my test app failing on FxOS, even though it works on Desktop & FxAndroid fine & works with the remote peer (Desktop or FxAndroid) sending data to FxOS.

Logcat shows:

12-19 21:11:04.230: I/Gecko(17575): ###!!! [Child][MessageChannel] Error: Channel closing: too late to send/recv, messages will be lost
(In reply to Jason Smith [:jsmith] from comment #2)
> So rtccopy works in this case. So the question is why is my test app failing
> on FxOS, even though it works on Desktop & FxAndroid fine & works with the
> remote peer (Desktop or FxAndroid) sending data to FxOS.
> 
> Logcat shows:
> 
> 12-19 21:11:04.230: I/Gecko(17575): ###!!! [Child][MessageChannel] Error:
> Channel closing: too late to send/recv, messages will be lost

Actually that logcat message is irrelevant. The sending of the file happens after these logcat statements:

12-19 21:11:09.140: E/GeckoConsole(17579): Content JS LOG at http://mysecondwebrtc.appspot.com/:302 in submitDCFile: Sending a file
12-19 21:11:09.140: E/GeckoConsole(17579): Content JS LOG at http://mysecondwebrtc.appspot.com/:303 in submitDCFile: [object File]
Confirmed this also reproduces on a 1.4 Buri device as well.
Can you repeat with NSPR_LOG_MODULES=datachannel:5,sctp:5 (requires a debug build!)
Also: does the file picker work as "normal" on B2G?  I can't find a file to select to send... (no memory card, so no photos)
(In reply to Randell Jesup [:jesup] from comment #6)
> Also: does the file picker work as "normal" on B2G?  I can't find a file to
> select to send... (no memory card, so no photos)

The file picker should work fine. It was implemented 1.1, so input=file elements should work correctly.
(In reply to Randell Jesup [:jesup] from comment #5)
> Can you repeat with NSPR_LOG_MODULES=datachannel:5,sctp:5 (requires a debug
> build!)

Hmm, this is going to take sometime to get, as only a few of us have debug environments setup. I'll try to look into this when I get back in January with Nils.

Note - it might be worthwhile to see if there's any workaround we can implement in the WebRTC logging system to allow for the necessary NSPR logging to work on production builds (or get as close as possible with the logging needed). Most of the Moz testers, contractor testers, & certification testers rely primarily on production builds, as that's what we have available for testing in the b2g build directories. We would need to manually generate device debug builds to allow for getting the logging here, which will have a significant overhead cost.
Keywords: qawanted
QA Contact: jsmith
A bunch of this stuff is already forced on for mtransport, for instance.... So, we could extend that...
(In reply to Eric Rescorla (:ekr) from comment #9)
> A bunch of this stuff is already forced on for mtransport, for instance....
> So, we could extend that...

Let me file a bug for this.
Pulling blocking bug here as this doesn't happen on all possible Data Channel sites, but it does happen consistently on my test site, which works fine on Desktop & FxANdroid.
No longer blocks: 945256
Attached - wireshark trace from the FxOS phone when it fails to send a blob after three attempts
Randell - does the wireshark trace give any input into what the problem here is?
Flags: needinfo?(rjesup)
blocking-b2g: --- → 1.3?
I'm putting 1.3? on this because this appears to be consistent across multiple networks & sounds like an obvious blocker for shipping data channels on FxOS.
I think the best course of action is to get these logs enabled as much as possible on production builds, as I don't see a streamlined way of getting our contractors or generally anyone in Moz QA of doing investigation on device debug builds within a short period of time.

If you want the logs needed here, then we'll need bug 952542 fixed.
Depends on: 952542
Keywords: qawanted
blocking-b2g: 1.3? → 1.3+
When sending blob on unagi, it gets the size of the blob first, [1]. But we are running this function on main thread(because of OOP), it returns failed, [2].

[1] http://dxr.mozilla.org/mozilla-central/source/netwerk/sctp/datachannel/DataChannel.cpp#2274
[2] http://dxr.mozilla.org/mozilla-central/source/dom/ipc/Blob.cpp#235
Flags: needinfo?(rjesup)
Posted patch patch (obsolete) — Splinter Review
Hi Randell,

Can you help review this patch?
The problem is resulted from that some operations of the incoming nsIInputStream cannot be ran on main-thread. My patch is to create another IO thread to read the blob. After read all the data, it will pass the data to main-thread and send to another peer.

Thanks.
Attachment #8359637 - Flags: review?(rjesup)
Assignee: nobody → slee
Comment on attachment 8359637 [details] [diff] [review]
patch

Review of attachment 8359637 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/sctp/datachannel/DataChannel.h
@@ +330,5 @@
>      , mFlags(flags)
>      , mIsRecvBinary(false)
>      {
>        NS_ASSERTION(mConnection,"NULL connection");
> +      NS_NewThread(getter_AddRefs(mInternalIOThread));

We can't add a a thread-per-channel (we can have MANY channels).  We could add a thread to DataChannelConnection, a global thread, or create ephemeral threads for sending a blob (though that might have bad edge-case performance when you want to send a lot of blobs).

If there is a global or DataChannelConnection thread, it probably should be created lazily (i.e. not just because we've created a DataChannelCOnnection, since most connections won't send blobs).  We shouldn't need a lock on the global as we'd do the create of the thread from mainthread always.

We'd also need to make sure the thread(s) get shut down at the right time!  See other code, or consult with bsmedberg.

Or alternatively as the source of the problem on B2G has been determined, I can take over the patch to free you to work on the emulator test issue.

@@ +444,5 @@
> +public:
> +  SendBlobRunnable(DataChannel* aChannel, nsCString& aData) :
> +    mChannel(aChannel),
> +    mData(aData) {
> +  }

{} instead of a line break

@@ +451,5 @@
> +    mChannel->SendBinaryMsg(mData);
> +    return NS_OK;
> +  }
> +private:
> +  DataChannel* mChannel;

I'm concerned that this is a Raw Pointer in a runnable to an object that can be garbage collected....!
Attachment #8359637 - Flags: review?(rjesup) → review-
(In reply to Randell Jesup [:jesup] from comment #18)
> We'd also need to make sure the thread(s) get shut down at the right time! 
> See other code, or consult with bsmedberg.
Sure, I checked the code of WifiProxyService. I will create the thread when DataChannelConnection sends the blob in the first time and shutdown the thread in the destructor.
> 
> Or alternatively as the source of the problem on B2G has been determined, I
> can take over the patch to free you to work on the emulator test issue.
The emulator test issues are handled by SC. I can work on this topic.
Posted patch patch -v2 (obsolete) — Splinter Review
Attachment #8359637 - Attachment is obsolete: true
Attachment #8360923 - Flags: review?(rjesup)
Comment on attachment 8360923 [details] [diff] [review]
patch -v2

Review of attachment 8360923 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with the error-check added.  One thread per DataChannel Connection (that is sending Blobs) isn't perfect, but it's ok.  Please file a follow-up bug to reduce this to one shared thread for all connections.

::: netwerk/sctp/datachannel/DataChannel.cpp
@@ +2287,5 @@
>    DataChannel *channel = mStreams[stream];
>    NS_ENSURE_TRUE(channel, 0);
>    // Spawn a thread to send the data
> +  if (!mInternalIOThread) {
> +    NS_NewThread(getter_AddRefs(mInternalIOThread));

Check the result and return failure if we can't spawn the thread.
Attachment #8360923 - Flags: review?(rjesup) → review+
Also file a bug (or add a patch here) to add a testcase for sending and receiving a blob.
Depends on: 961562
(In reply to Randell Jesup [:jesup] from comment #22)
> Also file a bug (or add a patch here) to add a testcase for sending and
> receiving a blob.

I found there are send-blob test case in [1]. If bug 950317 is landed, we can test it, is it right?

[1] http://mxr.mozilla.org/mozilla-central/source/dom/media/tests/mochitest/templates.js#390
It seems that the patch causes memory leak problem. In every mochitest-3, it detects memory leak. On my laptop, it has the same problem. I will figure out the problem then try again.
 
https://tbpl.mozilla.org/?tree=Try&rev=525852720481
(In reply to StevenLee[:slee] from comment #23)
> (In reply to Randell Jesup [:jesup] from comment #22)
> > Also file a bug (or add a patch here) to add a testcase for sending and
> > receiving a blob.
> 
> I found there are send-blob test case in [1]. If bug 950317 is landed, we
> can test it, is it right?
> 
> [1]
> http://mxr.mozilla.org/mozilla-central/source/dom/media/tests/mochitest/
> templates.js#390

Don't think so - the try runs with the patch from that bug didn't show evidence of this bug being triggered. We're probably missing something in our mochitest coverage here that would be needed to trigger this bug.
Posted patch patch - v3Splinter Review
Found the memory leak problem.
I sent the patch to try server and wait for the test results.
https://tbpl.mozilla.org/?tree=Try&rev=27c23855cffa
Attachment #8360923 - Attachment is obsolete: true
Attachment #8364249 - Flags: review+
(In reply to Jason Smith [:jsmith] from comment #25)
> Don't think so - the try runs with the patch from that bug didn't show
> evidence of this bug being triggered. We're probably missing something in
> our mochitest coverage here that would be needed to trigger this bug.
While I was tracing the memory leak problem of my patch, I found that DataChannelConnection::SendBlob(where the function that has problem) is called. I guess the reason why the bug is not triggered before is that the test cases input different file types, [2]. 

[1] http://dxr.mozilla.org/mozilla-central/source/netwerk/sctp/datachannel/DataChannel.cpp#2256
[2] http://dxr.mozilla.org/mozilla-central/source/dom/media/tests/mochitest/templates.js#390
(In reply to StevenLee[:slee] from comment #26)
> Created attachment 8364249 [details] [diff] [review]
> patch - v3
> 
> Found the memory leak problem.
> I sent the patch to try server and wait for the test results.
> https://tbpl.mozilla.org/?tree=Try&rev=27c23855cffa

Ugh, we really should remove the NS_GetMainThread overload that lets you do that.
https://hg.mozilla.org/mozilla-central/rev/d25dbecf3769
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Keywords: verifyme
Sanity tests look good - blobs are working fine on my test site now.
You need to log in before you can comment on or make changes to this bug.