Closed Bug 969518 Opened 10 years ago Closed 10 years ago

Autophone - attempt to use reverse tethering so we can use ethernet over usb instead of wifi

Categories: Testing Graveyard :: Autophone

Type: enhancement; Priority: Not set; Severity: normal

Tracking: Not tracked

Status: RESOLVED FIXED

People: Reporter: bc, Assigned: bc

Attachments: 3 files, 1 obsolete file

Using multiple phones and tablets in close proximity, all on the same wifi network, is an invitation to networking issues. We should attempt to use reverse tethering so we can use ethernet over usb, which will not be subject to the same network congestion as wifi.

I have left the wifi connection active on the phones so I can use it to control them via the SUTAgent. Handling notifications of phone reboots, so that the usb network could be set up without an already running SUTAgent, was a bridge too far. It will be interesting to see whether moving the s1s2 traffic off the wifi network improves SUTAgent reliability. The SUTAgent activities will still be subject to possible wifi interference.

I've used a shell script to set up the usb network using ppp over usb. This requires the phone to run the shell as root and will require 're-rooting' the phones/devices used in production. The script uses Linux-specific commands such as sysctl and iptables, so the autophone host will have to be converted from OS X to Linux. On the first invocation, the script sets up the ppp device and the iptables rules. On subsequent invocations, it skips creating the iptables rules if the ppp device has already been created. The devices can access the entire local network over the usb network. I've used ip route on the phone to make sure that requests for the remote test pages travel over the usb connection.
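For readers unfamiliar with reverse tethering over adb, the steps described above can be sketched in Python as an ordered list of commands. This is a minimal sketch, not the actual usbnet.sh: it assumes the commonly used `adb ppp` recipe, a NAT (MASQUERADE) rule on the Linux host, a phone-side interface named `ppp0`, and a hypothetical `test_server` route destination. The function names are illustrative only; `ppp_exists` models the "skip the iptables rules if the ppp device already exists" behavior.

```python
# Hypothetical sketch of the usbnet set-up steps; not the real usbnet.sh.

def host_nat_commands(usb_network_cidr):
    """Host-side (Linux) commands: enable forwarding and NAT the usb subnet."""
    return [
        ["sysctl", "-w", "net.ipv4.ip_forward=1"],
        ["iptables", "-t", "nat", "-A", "POSTROUTING",
         "-s", usb_network_cidr, "-j", "MASQUERADE"],
    ]


def adb_ppp_command(serial, host_ip, phone_ip):
    """Start pppd on both ends of the adb link (needs adbd running as root)."""
    device_pppd = "shell:pppd nodetach noauth noipdefault /dev/tty"
    return ["adb", "-s", serial, "ppp", device_pppd,
            "nodetach", "noauth", "noipdefault", "notty",
            # Refuse deflate and BSD compression on the ppp link.
            "nodeflate", "nobsdcomp",
            f"{host_ip}:{phone_ip}"]


def usbnet_plan(serial, host_ip, phone_ip, usb_network_cidr,
                test_server, ppp_exists):
    """Return the commands to run, in order; NAT rules are only added once."""
    commands = []
    if not ppp_exists:
        commands.extend(host_nat_commands(usb_network_cidr))
    commands.append(adb_ppp_command(serial, host_ip, phone_ip))
    # Send traffic for the remote test pages over the usb link (assumes ppp0).
    # "replace" keeps this idempotent if the route already exists.
    commands.append(["adb", "-s", serial, "shell",
                     "ip", "route", "replace", test_server, "dev", "ppp0"])
    return commands
```

Returning argument lists instead of running them keeps the sketch testable; a caller would pass each list to `subprocess.check_call`.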

I have a sample running on an old X41 laptop with Fedora 20, posting results to <http://phonedash-dev.allizom.org/#/org.mozilla.fennec/totalthrobber/local-blank/norejected/2014-03-04/2014-03-05/notcached/errorbars/standarderror>. It is testing 2014-03-04 mozilla-central builds on a gs2 and a nexus one, with debug-level logging turned on in Autophone. It should be done in a couple of hours.

I plan to use iproute2's tc command for traffic shaping. It is available on Linux and on the phones, though I haven't tried it out yet. I likewise haven't measured the raw network throughput available over the usb network connection.
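Since the traffic shaping is still untried, here is only a hedged sketch of the kind of `tc` invocations that might be used, built as argument lists from Python. It assumes the usb link appears as `ppp0` and uses the standard `tbf` (token bucket) and `netem` (delay) queueing disciplines; the rates and delays are placeholders, not measured values. `replace` is used instead of `add` so that repeated runs do not fail on an existing root qdisc.

```python
def tc_rate_limit(dev, rate_kbit, burst_kbit=32, latency_ms=400):
    """Cap bandwidth on dev with a token bucket filter (tbf) root qdisc."""
    return ["tc", "qdisc", "replace", "dev", dev, "root", "tbf",
            "rate", f"{rate_kbit}kbit", "burst", f"{burst_kbit}kbit",
            "latency", f"{latency_ms}ms"]


def tc_add_delay(dev, delay_ms, jitter_ms=0):
    """Add fixed (optionally jittered) latency on dev with a netem root qdisc."""
    cmd = ["tc", "qdisc", "replace", "dev", dev, "root", "netem",
           "delay", f"{delay_ms}ms"]
    if jitter_ms:
        cmd.append(f"{jitter_ms}ms")
    return cmd


def tc_clear(dev):
    """Remove the root qdisc, restoring the default queueing on dev."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]
```

A device has a single root qdisc, so these two helpers replace each other rather than stack; combining a rate cap with added latency would need a qdisc hierarchy.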

Some observations:

* Linux *may* produce more stable results than OS X without any use of usb networking.

* Physical proximity to other phones may affect the wifi remote results more than I appreciated.

* The production Autophone system has been running but has not been very active this morning; depending on its load, it may affect the results.

* It is unknown how running more phones over usb will scale.
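One way to make "more stable" concrete when comparing OS X vs. Linux, or wifi vs. usb runs, is to compare each configuration's mean and its standard error, which is what the `standarderror` option in the phonedash URLs appears to plot. A minimal sketch, assuming the standard error of the mean (sample standard deviation divided by √n); `mean_and_stderr` is a hypothetical helper.

```python
import math
import statistics


def mean_and_stderr(samples):
    """Return (mean, standard error of the mean) for a list of timings."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, stderr
```

For the same test and build, the configuration with the smaller standard error relative to its mean is the more stable one, though a handful of runs per configuration is a small sample.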
Attachment #8386166 - Flags: review?(mcote)
Comment on attachment 8386166
bug-969518-usbnet.patch

Review of attachment 8386166:
-----------------------------------------------------------------

Looks good to me, just tiny nits.

::: USAGE.md
@@ +32,5 @@
> +               If specified, set up adb ppp over usb
> +               connections 'so that all traffic from the
> +               devices to the host or network 'specified by
> +               usb_network passes through the 'ppp over usb
> +               connection. Otherwise, use the default 'network.

Looks like you have some quotes from copying the options help.

::: autophone.py
@@ +363,5 @@
>                  self.logger.debug('Received registration message for known phone '
>                                    '%s.' % phoneid)
>                  worker = self.phone_workers[phoneid]
> +                if worker.phone_cfg == phone_cfg:
> +                    if phone_cfg['usb_network']:

I think you meant to use USB_NETWORK.

@@ +370,5 @@
> +                                               worker.worker_num + 1)
> +                        phone_usb_ip = '.'.join(usb_ip_parts)
> +                        output = subprocess.check_output([
> +                            usbnet_script,
> +                            '-s', phone_cfg['serial'],

SERIAL
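The hunk above is only partly visible, but it appears to derive each phone's usb IP address from usb_network by offsetting it with `worker.worker_num + 1`. A standalone sketch of that idea, assuming usb_network holds a single IPv4 address such as the `192.168.1.50` in autophone.ini.example; `phone_usb_ip` is a hypothetical name, and the range check is an addition the original may not have.

```python
import ipaddress


def phone_usb_ip(usb_network, worker_num):
    """Offset usb_network by worker_num + 1 to give each phone its own address."""
    base = ipaddress.IPv4Address(usb_network)
    offset = worker_num + 1
    # Stay inside the same /24 and avoid the .255 broadcast address.
    if (int(base) & 0xFF) + offset > 254:
        raise ValueError(f"worker {worker_num} does not fit after {usb_network}")
    return str(base + offset)
```

With `usb_network=192.168.1.50`, worker 0 gets `192.168.1.51`, worker 1 gets `192.168.1.52`, and so on.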
Attachment #8386166 - Flags: review?(mcote) → review+
damn, I saw the coloring in USAGE.md but was so tired I missed fixing it. I fixed the USB_NETWORK issue, but I don't currently have a SERIAL variable, or variables for some of the other phone_cfg attributes. Is that something you would want for 'phoneid', 'serial', 'ip', etc.?
https://github.com/mozilla/autophone/commit/14cdbd778803b0026c001c44a0e3627da6e12868

I'll leave this open while I finish the investigation and the switch over for the host from OSX to Linux.
Attached patch bug-969518-followup-1.patch (obsolete)
* autophone.ini.example
** Update with usbnet examples.
* autophone.py
** Make the host port distinct for each ppp connection.
** Add debug logging about usbnet set up and catch errors from
   calling usbnet.sh.
* usbnet.sh
** Fix typo in usage.
** Restrict usbnet to devices where we can run adbd on device as
   root so we can create the ppp device on the phone.
** If we have udev rules set up we do not need suid adb except
   for when creating the ppp devices.
** Explicitly turn off deflate and bsd compression on the ppp
   connection.
Attachment #8393564 - Flags: review?(mcote)
Comment on attachment 8393564
bug-969518-followup-1.patch

Review of attachment 8393564:
-----------------------------------------------------------------

::: autophone.ini.example
@@ +2,5 @@
>  #clear_cache = False
>  #ipaddr = ...
>  #port = 28001
> +#usb_network=192.168.1.50
> +#usb_gateway=br0

Nit: spacing is inconsistent with the rest of the file.

::: usbnet.sh
@@ +48,5 @@
>  fi
>  
>  echo "waiting for device $serialno"
> +adb -s $serialno wait-for-device
> +adb -s $serialno root

Should you maybe abort (loudly) on error here?
Attachment #8393564 - Flags: review?(mcote) → review+
(In reply to Mark Côté ( :mcote ) from comment #6)
> Comment on attachment 8393564
> bug-969518-followup-1.patch

> Should you maybe abort (loudly) on error here?

Good idea.
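The later followup-2 patch replaces the blocking `adb wait-for-device` with a `wait_for_device` shell function built on `adb get-state` that gives up after 30 seconds. The same idea, sketched in Python to match autophone.py rather than reproducing the actual shell function: poll until the state is `device`, and raise loudly on timeout. The injectable `get_state`, `clock` and `sleep` parameters are assumptions added so the logic can be tested without a phone.

```python
import subprocess
import time


def adb_get_state(serial):
    """Return the output of `adb -s SERIAL get-state` (e.g. 'device')."""
    result = subprocess.run(["adb", "-s", serial, "get-state"],
                            capture_output=True, text=True)
    return result.stdout.strip()


def wait_for_device(serial, timeout=30.0, poll=1.0, get_state=adb_get_state,
                    clock=time.monotonic, sleep=time.sleep):
    """Poll until the device is online; raise TimeoutError instead of hanging."""
    deadline = clock() + timeout
    while True:
        if get_state(serial) == "device":
            return
        if clock() >= deadline:
            raise TimeoutError(f"device {serial} not ready after {timeout}s")
        sleep(poll)
```

Unlike `adb wait-for-device`, which can block indefinitely, this returns control to the caller so it can abort with a clear error.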
Not sure how much effort you want to put into reviewing this, since it will all be ripped out in our move to adb instead of SUTAgent.

* autophone.ini.example
** Update with usbnet examples.
* autophone.py
** Do not lower case registration url data as device serial numbers
   are case sensitive.
** Make the host port distinct for each ppp connection.
** Add debug logging about usbnet set up and catch errors from
   calling usbnet.sh. Terminate Autophone and send email notification
   if an error occurs calling usbnet.sh.
** In Autophone.stop, call stop on the workers before calling shutdown
   on the Autophone instance. This along with the change to worker.py,
   helps prevent deadlocks when shutting down the server.
* worker.py
** Call terminate on worker process when stopping the worker.
* usbnet.sh
** Fix typo in usage.
** Implement a shell function wait_for_device using adb get-state
   which will time out and return an error after 30 seconds.
** Restrict usbnet to devices where we can run adbd on device as
   root so we can create the ppp device on the phone. Terminate
   with a non-zero exit code if adb root is not supported by a device.
** If we have udev rules set up we do not need suid adb except
   for when creating the ppp devices.
** Explicitly turn off deflate and bsd compression on the ppp
   connection.
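The error handling described above (catch failures from usbnet.sh, log them, send a notification, and stop) might look roughly like this in Python. This is a sketch, not the code in followup-2: only the `-s <serial>` option of usbnet.sh appears in this bug, so any other arguments are passed through as `extra_args`, and `notify` stands in for Autophone's email notification. `UsbNetError` and `setup_usbnet` are hypothetical names.

```python
import logging
import subprocess

logger = logging.getLogger("autophone.usbnet")


class UsbNetError(RuntimeError):
    """Raised when usbnet.sh fails, so the caller can shut Autophone down."""


def setup_usbnet(script, serial, extra_args=(), run=subprocess.check_output,
                 notify=None):
    """Run the usbnet script for one device and fail loudly on any error."""
    cmd = [script, "-s", serial, *extra_args]
    logger.debug("usbnet: running %s", cmd)
    try:
        output = run(cmd, stderr=subprocess.STDOUT, text=True)
    except (subprocess.CalledProcessError, OSError) as err:
        message = f"usbnet setup failed for {serial}: {err}"
        logger.error(message)
        if notify is not None:
            notify(message)  # e.g. send the email notification
        raise UsbNetError(message) from err
    logger.debug("usbnet: %s", output)
    return output
```

Raising a dedicated exception leaves the decision to terminate Autophone with the caller, instead of burying `sys.exit` inside the helper.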

I'd rather not spend much more time on this since it is a dead end. I intend to tag the repo with this revision so we can easily identify the last SUTAgent based revision.
Attachment #8393564 - Attachment is obsolete: true
Attachment #8400006 - Flags: review?(mcote)
The usb networking is not reliable. Comparing an original usbnet run

<http://phonedash.mozilla.org/#/org.mozilla.fennec/throbberstop/remote-twitter/norejected/2014-03-26/2014-03-26/notcached/noerrorbars/standarderror>

to an additional usbnet run and a wifi based run

<http://phonedash-dev.allizom.org/#/org.mozilla.fennec/throbberstop/remote-twitter/norejected/2014-03-26/2014-03-26/notcached/errorbars/standarderror>

shows that the usbnet runs are not reproducible. There appears to be a secular trend across repeated reboots of the device. It may be related to the increasing 'device number' assigned to the device on each reboot, but I'm not certain.
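A simple way to check for a suspected secular trend is to fit a least-squares line to the measurements in run order (for example, one value per reboot) and look at the slope: a slope clearly different from zero suggests steady drift rather than random noise. A minimal sketch with a hypothetical `drift_slope` helper; it does not test statistical significance.

```python
def drift_slope(values):
    """Least-squares slope of values against their run index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

The slope is in the measurement's units per run, so it can be compared directly against the run-to-run standard error.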

Mark and I discussed the situation and decided that usbnet using ppp/adb is not workable, and that wifi is also not viable due to its variability and its unsuitability for hosting in a colo environment. Once this is checked in, we will move toward a solution that runs only local tests (from the sdcard), using adb instead of SUTAgent over tcp/ip.
Comment on attachment 8400006
bug-969518-followup-2.patch

Review of attachment 8400006:
-----------------------------------------------------------------

Looks good although I wonder about the terminate().

::: worker.py
@@ +206,4 @@
>          """Call from main process."""
>          if self.is_alive():
>              self.cmd_queue.put_nowait(('stop', None))
> +            self.p.terminate()

Hm isn't it nicer to try to let the process stop gracefully rather than immediately SIGTERMing it?  Pretty much no point in sending the stop command if you're going to terminate it right after.
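The graceful alternative the review hints at is to send `stop`, wait a bounded time for the worker to exit, and only then fall back to `terminate()`. A sketch of that pattern for `multiprocessing.Process`-style objects (`is_alive()`, `join(timeout)`, `terminate()`); `stop_worker` and the 5-second default are assumptions, and, as the replies point out, it only helps if the worker actually honours `stop`.

```python
def stop_worker(proc, cmd_queue, timeout=5.0):
    """Ask a worker process to stop; SIGTERM it only if it does not exit in time.

    proc is a multiprocessing.Process (or anything with is_alive/join/terminate);
    cmd_queue is the queue the worker reads its commands from.
    """
    if not proc.is_alive():
        return "not running"
    cmd_queue.put_nowait(("stop", None))
    proc.join(timeout)  # give the worker a chance to exit cleanly
    if proc.is_alive():
        proc.terminate()  # sends SIGTERM on POSIX
        proc.join()
        return "terminated"
    return "stopped"
```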
Attachment #8400006 - Flags: review?(mcote) → review+
Not really. It would just keep going and start running the tests. The whole stop/disable/etc. mechanism is, and has been, broken. If we want to be able to handle a failure to start the ppp/adb networking, we need to keep the device(s) from running the test and submitting results, then rebooting and doing the same thing over and over again. That is why I was finally forced to just stop Autophone altogether, but that wasn't enough since the phone would just keep running the next test. It would crap out when it tried to get the next one since Autophone would be 'sort of down', but it would always submit one set. I finally got tired of wasting time on the broken control system since we are going to have to change a lot of this anyway.
Okay, I'm just saying that you might as well take the 'stop' out of there, then, since I doubt it would be processed before terminate() acts.
Ah, right. Good point.

https://github.com/mozilla/autophone/commit/850f6a2f783737d8d62a86f0425d11afadc2cb80

We tried and failed, so I'm marking this fixed as filed. Maybe we can revisit it some time.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I've finally gotten my nexus 4, nexus 5 and nexus 7 devices flashed back to factory and rooted by changing default.prop in the ramdisk.img to enable adb root, so I *could* run a usb networking test with them to see whether they exhibit the same secular behavior. I'll try to pick a quiet time, rerun the same 2014-03-26 day against phonedash-dev, and see how they behave.
Product: Testing → Testing Graveyard