Closed Bug 843857 Opened 11 years ago Closed 10 years ago

Fix connection error handling in UnixSocket

Categories

(Firefox OS Graveyard :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
1.4 S1 (14feb)

People

(Reporter: qdot, Assigned: qdot)

References

Details

(Whiteboard: [systemsfe][p=8])

Attachments

(3 files, 4 obsolete files)

UnixSocketImpl::Connect() can fail in multiple ways, but we'll never fire back an error event or even clean up the implementation.
Been cleaning up old bugs, figured I'd do this one real quick, and realized that we've got more problems than just not reporting back to the main thread at every point where failure happens. The places we call reset() on the socket are pretty much random, and we never clean up our watchers when we fail as part of listen/connect finishes in onFileCan*. Should we be calling the CloseSocket function there instead of just reset()?

Also, copy/pasting the NS_WARNING and task creation/send everywhere is messy, but I did it that way to keep the line numbers for the NS_WARNINGs valid.

Thomas: Any thoughts on how to improve this situation? Or am I worrying about the wrong things? :)
Attachment #8345454 - Flags: feedback?(tzimmermann)
Comment on attachment 8345454 [details] [diff] [review]
0001-Bug-843857-Make-sure-all-socket-errors-fire-events-b.patch

Review of attachment 8345454 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Kyle,

You're right, this code still has lots of problems. I think we should take a cleaned up version of you're patch to improve things a bit. After an error happened, the user will (hopefully) call UnixSocketConsumer::CloseSocket, which will remove the file descriptor.

For 1.4, I have plans for improving the overall implementation of UnixSocket and fixing many of its problems. IMHO UnixSocketImpl should be split up into different classes for accept, listen, connect and data transfer, with each only handling its own operation. When an impl finishes, either with success or failure, it removes its watchers from the I/O loop and informs UnixSocketConsumer about the result. UnixSocketConsumer is then responsible for creating a new impl for the next phase. The overhead is probably slightly higher than now, but the code should be much cleaner and state handling should be simplified.

::: ipc/unixsocket/UnixSocket.cpp
@@ +490,4 @@
>  }
>  
>  void
> +UnixSocketImpl::FireSocketError(const nsAString& aErrorMsg)

I would leave the error messages were they are now and only do the work here.

@@ +494,5 @@
> +{
> +  NS_WARNING(aErrorMsg);
> +  nsRefPtr<OnSocketEventTask> t =
> +    new OnSocketEventTask(this, OnSocketEventTask::CONNECT_ERROR);
> +  NS_DispatchToMainThread(t);

Make sure you set mConnectionStatus to SOCKET_DISCONNECTED, because of bug 936402. And this method could also remove the watchers from the I/O loop.

@@ +502,5 @@
>  UnixSocketImpl::Accept()
>  {
>    MOZ_ASSERT(!NS_IsMainThread());
>  
>    if (!mConnector) {

I think all tests for mConnector can be turned to asserts. There are already MOZ_ASSERTs for aConnector when creating UnixSocketImpl in ConnectSocket or ListenSocket, so mConnector should never be null.

@@ +516,5 @@
>    if (!mConnector->CreateAddr(true, mAddrSize, mAddr, nullptr)) {
>      NS_WARNING("Cannot create socket address!");
> +    nsRefPtr<OnSocketEventTask> t =
> +      new OnSocketEventTask(this, OnSocketEventTask::CONNECT_ERROR);
> +    NS_DispatchToMainThread(t);

Again, mConnectionStatus = SOCKET_DISCONNECTED. Here and in all instances below.
Or just replace all of these patterns with calls to FireSocketError.
Attachment #8345454 - Flags: feedback?(tzimmermann) → feedback+
OMG, sorry for the awful grammar and orthography. :(
Implemented suggestions from feedback.
Attachment #8345454 - Attachment is obsolete: true
Attachment #8347862 - Flags: review?(tzimmermann)
Comment on attachment 8347862 [details] [diff] [review]
Patch 1 (v2) - Make sure all connect errors fire to main thread and clean up

Review of attachment 8347862 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, thanks!
Attachment #8347862 - Flags: review?(tzimmermann) → review+
https://hg.mozilla.org/mozilla-central/rev/f3c055585a4f
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → 1.3 C3/1.4 S3(31jan)
Backed out for blowing up B2G emulator tests. Please run this through Try before landing again.
https://hg.mozilla.org/mozilla-central/rev/eb8c4ec52bc0

https://tbpl.mozilla.org/php/getParsedLog.php?id=33546421&tree=B2g-Inbound
https://tbpl.mozilla.org/php/getParsedLog.php?id=33546743&tree=B2g-Inbound
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 1.3 C3/1.4 S3(31jan) → ---
Comment on attachment 8347862 [details] [diff] [review]
Patch 1 (v2) - Make sure all connect errors fire to main thread and clean up

Review of attachment 8347862 [details] [diff] [review]:
-----------------------------------------------------------------

Hi Kyle,
I've tried to trace the root cause of Mochitest failure with this patch.
https://tbpl.mozilla.org/php/getParsedLog.php?id=33546421&tree=B2g-Inbound

I believe the reason of the testing timeout is that B2G emulator failed to initialize.
The emulator can't be initiated correctly by "./run-emulator.sh" or Mochitest since it would file a socket error at line 527 (as I comment below).
The Mochitest would fail if the emulator is not ready.
I'm not familiar with B2G emulator boot process, it seems like we can't create socket file descriptor during B2G booting.
Best regards

::: ipc/unixsocket/UnixSocket.cpp
@@ +523,5 @@
>    if (mFd.get() < 0) {
>      mFd = mConnector->Create();
>      if (mFd.get() < 0) {
> +      NS_WARNING("Cannot create socket fd!");
> +      FireSocketError();

It would file a socket error during the emulator boot process.
Hi Jamin,

Thanks for leaving a comment. Could you provide us with the test case that triggers the bug?
Flags: needinfo?(jaliu)
Hi Thomas,

Step 1. ./config.sh emulator;./build.sh
Step 2. cd $B2G_DIR
Step 3. ./mach mochitest-remote

Step 3 would run every Mochitests on B2G emulator and take a lot of time to complete. (more than one hour)
You can find the complete list of test set in gecko/objdir-gonk/_tests/testing/mochitest/tests/mochitest.ini 

In normal case, the the emulator would turn on and show the Mochitest UI to display the testing progress.
However, the B2G emulator can't get to homescreen with change f3c055585a4f.

You can use ./run-emulator.ph to trigger the same behavior too.
Flags: needinfo?(jaliu)
I did some debugging and found that the emulator does not support sockets of type EL2CAP. With some extra debugging output added, the logcat says

I/Gecko   (   45): [45] WARNING: Could not open bluetooth socket! errno=94 'Socket type not supported' mType=4: file /home/mozilla/Projects/mozilla/src/mozilla-central/dom/bluetooth/bluez/BluetoothUnixSocketConnector.cpp, line 207
I/Gecko   (   45): [45] WARNING: Cannot create socket fd!: file /home/mozilla/Projects/mozilla/src/mozilla-central/ipc/unixsocket/UnixSocket.cpp, line 526
I/Gecko   (   45): [45] WARNING: NS_ENSURE_TRUE(gDBusConnection) failed: file /home/mozilla/Projects/mozilla/src/mozilla-central/ipc/dbus/DBusThread.cpp, line 320
I/GeckoBluetooth(   45): OnSocketConnectError: [server]
I/GonkDBus(   45): ...bind(61) gave errno 98
I/Gecko   (   45): [45] WARNING: Could not open bluetooth socket! errno=94 'Socket type not supported' mType=4: file /home/mozilla/Projects/mozilla/src/mozilla-central/dom/bluetooth/bluez/BluetoothUnixSocketConnector.cpp, line 207
I/Gecko   (   45): [45] WARNING: Cannot create socket fd!: file /home/mozilla/Projects/mozilla/src/mozilla-central/ipc/unixsocket/UnixSocket.cpp, line 526
I/Gecko   (   45): [45] WARNING: Bluetooth has already been enabled/disabled beforeor the toggling is failed.: file /home/mozilla/Projects/mozilla/src/mozilla-central/dom/bluetooth/BluetoothService.cpp, line 551
I/Gecko   (   45): [45] WARNING: No observer registered for path /B2G/bluetooth/manager: file /home/mozilla/Projects/mozilla/src/mozilla-central/dom/bluetooth/BluetoothService.cpp, line 453
I/GeckoBluetooth(   45): OnSocketConnectError: [server]
I/GonkDBus(   45): ...bind(62) gave errno 98
I/Gecko   (   45): [45] WARNING: Could not open bluetooth socket! errno=94 'Socket type not supported' mType=4: file /home/mozilla/Projects/mozilla/src/mozilla-central/dom/bluetooth/bluez/BluetoothUnixSocketConnector.cpp, line 207
I/Gecko   (   45): [45] WARNING: Cannot create socket fd!: file /home/mozilla/Projects/mozilla/src/mozilla-central/ipc/unixsocket/UnixSocket.cpp, line 526
I/GeckoBluetooth(   45): OnSocketConnectError: [server]
I/GonkDBus(   45): ...bind(62) gave errno 98
I/Gecko   (   45): [45] WARNING: Could not open bluetooth socket! errno=94 'Socket type not supported' mType=4: file /home/mozilla/Projects/mozilla/src/mozilla-central/dom/bluetooth/bluez/BluetoothUnixSocketConnector.cpp, line 207

and so on.
According to the blog post at [1], eL2CAP support requires a kernel version of at least 2.6.36. But the ICS emulator runs 2.6.29, so there probably isn't a way to simply 'switch on' the protocol and be done with this problem. Let's rather fix the error handling in BT.

[1] http://www.bluez.org/l2cap-extended-features-on-2-6-36/
BluetoothOppManager calls Listen in its error handler for failed connecting attempts. Calling Listen consequently fails with the same problem (i.e., EL2CAP not available) and triggers the same error handler.

This is a q'n'd fix that works for me; no idea if this has side effects. ;)
Attachment #8366029 - Flags: review?(echou)
Attachment #8366029 - Flags: feedback?(kyle)
Comment on attachment 8366029 [details] [diff] [review]
[01] Bug 843857: Don't try to listen on OPP socket if connecting failed

I am soooo not qualified to even fb OOP stuff at this point so I'm just gonna let eric review this. :)
Attachment #8366029 - Flags: feedback?(kyle)
Comment on attachment 8366029 [details] [diff] [review]
[01] Bug 843857: Don't try to listen on OPP socket if connecting failed

Review of attachment 8366029 [details] [diff] [review]:
-----------------------------------------------------------------

Unfortunately the patch must not work. Imagine that the remote device BT is disabled. Then we won't listen to incoming connection anymore once the patch applies. :(

Moreover, I'm not sure if listening to el2cap is the root cause since Listen() is called in BluetoothOppManager::Init() as well, which should be called once Bluetooth is up.
Attachment #8366029 - Flags: review?(echou) → review-
Hi Eric

> Unfortunately the patch must not work. Imagine that the remote device BT is
> disabled. Then we won't listen to incoming connection anymore once the patch
> applies. :(

This confuses me. OnSocketConnectError will only be called during while the connection gets established. If the remote device disconnects, you should see a OnSocketDisconnect instead.

> Moreover, I'm not sure if listening to el2cap is the root cause since
> Listen() is called in BluetoothOppManager::Init() as well, which should be
> called once Bluetooth is up.

It is. Init calls Listen. Listen fails because of EL2CAP and calls OnSocketConnectError. OnSocketConnectError calls Listen, which again fails because of EL2CAP, and so on.
FYI: This patch adds some debugging noise on top of 'Patch 1'. I used this yesterday, but don't have the logcat any longer. :(
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #18)
> Hi Eric
> 
> > Unfortunately the patch must not work. Imagine that the remote device BT is
> > disabled. Then we won't listen to incoming connection anymore once the patch
> > applies. :(
> 
> This confuses me. OnSocketConnectError will only be called during while the
> connection gets established. If the remote device disconnects, you should
> see a OnSocketDisconnect instead.
> 

The scenario I mentioned is not remote device disconnects actively. Imagine that you have a Bluetooth headset, which has been already paired with your FxOS device. Keep your headset off, and then turn on the FxOS device and try to use it to connect with the remote device (which means mSocket->Connect() is called). OnSocketConnectError should be fired instead of OnSocketDisconnect, IIRC.

> > Moreover, I'm not sure if listening to el2cap is the root cause since
> > Listen() is called in BluetoothOppManager::Init() as well, which should be
> > called once Bluetooth is up.
> 
> It is. Init calls Listen. Listen fails because of EL2CAP and calls
> OnSocketConnectError. OnSocketConnectError calls Listen, which again fails
> because of EL2CAP, and so on.

Got it. Now I understand the problem. ;)
What about this approach? If we run as BT client (i.e., !mIsServer) and connect fails, we'll switch over to server mode and start listening. If this fails in server mode, we give up.
Attachment #8366029 - Attachment is obsolete: true
Attachment #8366627 - Flags: review?(echou)
(In reply to Kyle Machulis [:kmachulis] [:qdot] from comment #5)
> Created attachment 8347862 [details] [diff] [review]
> Patch 1 (v2) - Make sure all connect errors fire to main thread and clean up
> 
> Implemented suggestions from feedback.

BTW, there is a semicolon missing in this patch.
Uploading new patch version with semicolon fixed (the backed out version that I landed was the fixed version but forgot to update bug)
Attachment #8347862 - Attachment is obsolete: true
Comment on attachment 8366627 [details] [diff] [review]
[01] Bug 843857: Don't listen if connect fails on BT server

Review of attachment 8366627 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, we can do this. :)
Attachment #8366627 - Flags: review?(echou) → review+
Target Milestone: --- → 1.4 S1 (14feb)
Whiteboard: [systemsfe]
V3 is missing a header. This patch (v4) is v2 + the missing semicolon; still attributed to Kyle.
Attachment #8366775 - Attachment is obsolete: true
Attachment #8370706 - Flags: review+
Thanks everyone. I'll land the patches as soon as b2g-inbound is open again.
(In reply to Thomas Zimmermann [:tzimmermann] [:tdz] from comment #25)
> Created attachment 8370706 [details] [diff] [review]
> [02] Make sure all connect errors fire to main thread and clean up (v4)
> 
> V3 is missing a header. This patch (v4) is v2 + the missing semicolon; still
> attributed to Kyle.

Oops. Thanks for fixing.
https://hg.mozilla.org/mozilla-central/rev/3d8caa7ec5eb
https://hg.mozilla.org/mozilla-central/rev/d5907ae0d95a
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Whiteboard: [systemsfe] → [systemsfe][p=8]
You need to log in before you can comment on or make changes to this bug.