I'd like to propose the removal of use of 'connect_to_network' in functional tests and replace them with a more explicit call (dependent upon what the test requires). The reason is that it creates several test paths which usually we try to avoid because they are the enemy of determinism. In addition, when we are using content from the local resource server we need to be on the same network and (on device) that can only be reached with wifi. Serving from the internet when wifi fails creates another branch of that logic in the test. To have more clear test paths I think it'd be better to have it fail hard when there is just no wifi at all - lack of wifi is likely to be symptomatic of something big we need to address anyway so using cell data may mask issues. Even though we are often frustrated by unrelated failures in Gaia tests, in this case I feel we should feel comfortable for the tests to fail as we've been getting much better responses to regressions caught by ui test failures recently, we can afford to fail aggressively. As we do now, where we explicitly need to check wifi or cell data code during a test then the test will do that. It's not really intuitive that it falls back to cell data - you have to know that is doing that and we have already tried to address by adding the log message when it falls back to cell data. The argument for using with cell data when wifi is unavailable I don't think would arise very often (ie, it's an edge case) and in that scenario the tester can modify the test to suit their local hardware configuration. It's a real edge case. An alternative could be to keep the use of connect_to_network, but remove the cell data step.
The reason this was added was to allow tests that require a network connection to not explicitly depend on WiFi. This meant that tests could indicate that they needed to have a connection without explicitly determining the type (at the time we had Panda boards on ethernet, desktop using the host's connection, and devices using either WiFi or cell data). This was not originally intended to be used as a failover, but it did have the advantage of allowing tests to pass if a WiFi connection was attempted but not available. Personally, I used this while running tests from coffee shops or on the train, when a WiFi connection is not always possible. I don't see the benefit to restricting our tests to only run on WiFi unless we don't expect them to run anywhere other than in a controlled CI environment.
I think the points you both raise are valid. If a test is not specifically meant to be testing wifi or cell data then it seems fine to me to allow it to use whatever means it can to connect to the network. To have it fail if a wifi connection isn't available, even though the test is not meant to test wifi, doesn't seem advantageous. On the other hand, it can make debugging tests more difficult if you don't know if the problem was based on communication, or which communication means was used. I think adding the logging which indicates that it had to fail over to cell data somewhat mitigates this issue. It really seems to me that the issue is that we are still having problems with wifi connectivity but hopefully that will get addressed eventually. Setting up a specific suite of tests to test wifi and which can be used to investigate wifi failures seems like a better approach to me than forcing each test to specify exactly which communication method it should use.
> If a test is not specifically meant to be testing wifi or cell data then it seems fine to me to allow it to use whatever means it can to connect to the network. Well the thing is we're needing to be more specific about the network because of the use of local resources. In the case of say test_browser_navigation.py, it can be: 1) on device, using wifi, loading webpage from local resources 2) on device, using cell data, loading webpage from internet 3) on desktopb2g, using lan, loading webpage from local resources What I'm suggesting is we just remove option 2 which avoids issues like slow cell data and determinism. In my observation the MV wifi has been pretty solid lately and I think we can trust it. If we can't then we need to do more work in either b2g or in the server room. We need to trust it 100% if we report results to TBPL. Which really just leaves Dave and his trains and coffee shops! Maybe we can use a command line override the default network connection to use cell data? Otherwise I think it's quite an edge case scenario. We already are quite focussed on a controlled CI environment in the way we use PyBluez and Plivo packages.
Here's my two cents, but really it's just reiterating the points above. Determinism is really the only way we are going to survive debugging and maintaining these tests as the test suites grow rapidly. While it's helpful to have the tests just grab whatever data connection they can, especially when they do not care what type of connection it is, it just makes it that much more painful when the tests cannot be debugged and repeated easily. This could just mask certain wifi vs. cell problems. One particular test that comes to mind is test_connect_to_network.py. Was wifi unavailable? Was cell data? Can we easily repeat it if it's a problem with only connecting to a cell data network? This is a small example, but could get out of hand quickly. I like the idea of being able to some how specify the data connection to use and overriding it depending on what you're testing. It also would allow you to run through specific tests both through wifi and cell and compare the outcomes.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 1084306
QA Whiteboard: [fxosqa-auto-backlog+]
You need to log in before you can comment on or make changes to this bug.