Closed
Bug 1026259
Opened 10 years ago
Closed 6 years ago
wpt tests sometimes lose wifi when restarting
Categories
(Firefox OS Graveyard :: Certification Suite, defect)
Firefox OS Graveyard
Certification Suite
Tracking
(firefox32 fixed, b2g-v2.0 fixed)
RESOLVED
WONTFIX
People
(Reporter: jgriffin, Unassigned)
References
Details
Attachments
(2 files)
1.79 KB,
patch
|
ahal
:
review+
|
Details | Diff | Splinter Review |
887 bytes,
patch
|
jgraham
:
review+
|
Details | Diff | Splinter Review |
The wpt tests restart B2G when a test fails. Sometimes, when this restart occurs, the device doesn't establish a wifi connection, and the restart times out. It's then retried. It looks like what's happening is that we reboot the device, and then immediately try to restart B2G with a new profile. We do this sometimes before the device has completely booted, which can have some bad side effects. Perhaps we need to add a check to wait for Marionette here? https://github.com/w3c/wptrunner/blob/jgraham/initial/wptrunner/browsers/b2g.py#L193
Comment 1•10 years ago
|
||
It seems like this check should be in mozrunner, perhaps? Putting in a Marionette().wait_for_port() in B2GRunner.start() before _setup_remote_profile() at the end, perhaps? (I thought this was working well, but as I typed this the phone rebooted a couple of times and failed to connect to WiFi, so I'm not so sure it is)
Reporter | ||
Comment 2•10 years ago
|
||
Yes, we could try putting it in B2GRunner.start().
Reporter | ||
Comment 3•10 years ago
|
||
The line that causes this problem is this: http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/remote.py?from=remote.py#291 We "wait_for_net" until we see something with the pattern of an IP address that's "UP". The problem is, this matches 0.0.0.0. We need to wait until this has a non-zero IP before continuing. Changing that regex locally to: if (re.search(r'UP\s+[1-9]\d{0,2}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)): solves the problem. Unfortunately, this code was completely ripped out in the recent mozrunner refactor, and we no longer appear to have a device runner. We'll have to put some of that code back (i.e., create a generic, non-emulator B2G device runner) before we can solve this problem correctly. Or, potentially, we could use mozilla-aurora to create a point release to mozrunner 5.37 with this fix. Thoughts, ahal, jgraham?
Reporter | ||
Comment 4•10 years ago
|
||
Actually, given that mozrunner has changed so significantly, I'm leading toward making a 5.37.1 release off of mozilla-aurora.
Comment 5•10 years ago
|
||
Is the code in aurora identical to that in m-c before the refactor? In any case it seems like we should pin the certsuite to 5.37 for now and ahal should release 6.0. Then I can assess the difficulty of porting to 6.0 and we can decide whether it's worthwhile to make a new 5.x series release based on that.
Comment 6•10 years ago
|
||
You should be able to create a non-emulator runner using the mozrunner.devices.Device base class. There isn't currently a B2GDeviceRunner shortcut in the runners.py file, but it would be easy to add one. If you want I can put up a patch for it.
Comment 7•10 years ago
|
||
I tried modifying the regexp and I am still having difficulty getting net connections.
Reporter | ||
Comment 8•10 years ago
|
||
My concern is that the current behavior is non-deterministic and will probably cause problems with OEM's that will come back to haunt us. I agree that the long-term fix is to add missing abilities to mozrunner 6, and update wpt to work with that. In the shorter term, if we can improve the situation by releasing a 5.37.1, I think we should do that; anything that reduces confusion about the cert suite results is likely very worthwhile. I'll investigate this option locally, and post a patch here if I'm successful.
Comment 9•10 years ago
|
||
OK, thanks. Maybe the WiFi issue I see is different to the one that you see, but waiting longer didn't seem to help me. When the WiFi connection here fails I see something like 1:28.71 LOG: Thread-Log INFO STDOUT: -*- WifiWorker component: Determined that scanning is stuck, turning on background scanning! 1:28.71 LOG: Thread-Log INFO STDOUT: -*- WifiWorker component: Event coming in: CTRL-EVENT-STATE-CHANGE id=-1 state=0 BSSID=00:00:00:00:00:00 1:28.71 LOG: Thread-Log INFO STDOUT: -*- WifiWorker component: State change: SCANNING -> DISCONNECTED 1:28.71 LOG: Thread-Log INFO STDOUT: -*- WifiWorker component: Event coming in: CTRL-EVENT-STATE-CHANGE id=-1 state=2 BSSID=00:00:00:00:00:00 1:28.71 LOG: Thread-Log INFO STDOUT: -*- WifiWorker component: State change: DISCONNECTED -> INACTIVE in the logs (with WiFi logging enabled), and WiFi never comes up. Additionally I have started working on the mozrunner update and I have something that runs a little, but it's not yet at parity with the old code.
Reporter | ||
Comment 10•10 years ago
|
||
I wonder if it depends on the characteristics of the wifi network? Also, in addition to modifying the regex, I put in an extra call to self._wait_for_net() here: http://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozrunner/mozrunner/remote.py?from=remote.py#169 (as you suggested in comment #1) But, I haven't done enough testing to verify if that addition was necessary or not. I'll re-run the tests without my patch and see if I get the same wifi error messages.
Reporter | ||
Comment 11•10 years ago
|
||
(In reply to James Graham [:jgraham] from comment #5) > Is the code in aurora identical to that in m-c before the refactor? > I just checked, and yes it is.
Reporter | ||
Comment 12•10 years ago
|
||
This patch is for mozilla-aurora, and creates a mozrunner 5.37.1 which completely resolves the wifi problems in wpt for me locally. Eventually, we can incorporate similar logic into mozrunner 6.0 on trunk.
Attachment #8446188 -
Flags: review?(ahalberstadt)
Reporter | ||
Updated•10 years ago
|
Assignee: nobody → jgriffin
Comment 13•10 years ago
|
||
Comment on attachment 8446188 [details] [diff] [review] Improve wait_for_net to reduce the chances of dropping wifi when calling B2GRunner.start(), Review of attachment 8446188 [details] [diff] [review]: ----------------------------------------------------------------- Lgtm! James was trying to get his wptrunner working with trunk, but his tests weren't starting.. this might be what he was missing!
Attachment #8446188 -
Flags: review?(ahalberstadt) → review+
Comment 14•10 years ago
|
||
I don't think it is the (only) problem that I was having with the new mozrunner, although we should certainly forward-port the change. jgriffin: thanks for this. If you make a release I'll pin the version of wptrunner and mozrunner in the certsuite until the port to trunk is stable.
Reporter | ||
Comment 15•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-aurora/rev/e82b42e059af
Updated•10 years ago
|
status-b2g-v2.0:
--- → fixed
status-firefox32:
--- → fixed
Reporter | ||
Comment 16•10 years ago
|
||
I've release mozrunner 5.37.1 with the above fix. James, I'll pass this bug to you for the dependency updates.
Reporter | ||
Updated•10 years ago
|
Assignee: jgriffin → james
Comment 17•10 years ago
|
||
Hmm, so how important is it to call wait_for_net() before rebooting the device? That breaks the assumption in wptrunner that if there is no net connection we can call B2GRunner.start(), which will reboot the device and try again.
Reporter | ||
Comment 18•10 years ago
|
||
(In reply to James Graham [:jgraham] from comment #17) > Hmm, so how important is it to call wait_for_net() before rebooting the > device? That breaks the assumption in wptrunner that if there is no net > connection we can call B2GRunner.start(), which will reboot the device and > try again. Currently, if you reboot the device, and then reboot the device again or restart B2G before a net connection has been established, you'll fail to get a connection after B2G or the device restarts. (Instead, you'll have to reboot/restart a 3rd time....apparently there is some weird state persistence at work here.) I haven't seen the case locally where we fail to get a connection other than by the above, and rebooting fixes this. However, to protect against this, we could not fail if the initial wait_for_net fails, and continue with rebooting. I'll submit a patch.
Reporter | ||
Comment 19•10 years ago
|
||
Attachment #8448096 -
Flags: review?(james)
Reporter | ||
Updated•10 years ago
|
Assignee: james → jgriffin
Comment 20•10 years ago
|
||
Comment on attachment 8448096 [details] [diff] [review] Don't throw if wait_for_net fails before rebooting, So I had a thought this morning; maybe the right* thing to do here is check if we have already loaded the homescreen and, if so, skip waiting for net and go straight to setting up the profile. * Or at least a better.
Updated•10 years ago
|
Attachment #8448096 -
Flags: review?(james) → review+
Reporter | ||
Comment 21•10 years ago
|
||
The problem with checking for the homescreen is that the code will become coupled with specific versions of Gaia, and I'm not sure we want to do that, if we can avoid that. The way we do it for the emulator will likely not work on real devices, since the emulator code relies on the fact that the emulator is so slow. Sadly, we don't have a great, generic way of determining whether Gaia is loaded.
Comment 22•10 years ago
|
||
We already rely on checking for the homescreen in other places, at least. I'm just worried that with this setup when you get no wifi for whatever reason you have to wait a long time for an operation that you know will fail.
Reporter | ||
Comment 23•10 years ago
|
||
Do you know where we check for the homescreen currently? (The only place I was aware of it is emulator.py.) If we already have code that is sufficiently robust, that definitely makes sense.
Comment 24•10 years ago
|
||
There's code in wptrunner[1] modified from the code in mozrunner. I'm not sure how robust it is really, but apart from the logging-related bug, it isn't one of the parts has suffered from intermittent breakage. [1] https://github.com/w3c/wptrunner/blob/jgraham/initial/wptrunner/browsers/b2g.py#L286
Reporter | ||
Comment 25•10 years ago
|
||
(In reply to James Graham [:jgraham] from comment #24) > There's code in wptrunner[1] modified from the code in mozrunner. I'm not > sure how robust it is really, but apart from the logging-related bug, it > isn't one of the parts has suffered from intermittent breakage. > > [1] > https://github.com/w3c/wptrunner/blob/jgraham/initial/wptrunner/browsers/b2g. > py#L286 I took a stab at integrating this into the version of mozrunner on mozilla-aurora, but no luck. The wptrunner doesn't pass a marionette object to B2GRunner, so the runner has no way (without big changes to the code) of using marionette to wait for the homescreen. Have you seen an instance of the wifi not coming up after rebooting with the current patch? I'm pretty tempted to leave that patch as-is...the only bad thing that would happen is that we'd wait 40 seconds unnecessarily.
Comment 26•10 years ago
|
||
Yeah, the WiFi in the office here still seems to be really problematic with B2G devices for some reason. So I will likely hit the 40s wait reasonably often. But I agree it's not obvious how to integrate things that require marionette into this code without creating a terrible mess.
Reporter | ||
Comment 27•10 years ago
|
||
Landed as https://hg.mozilla.org/releases/mozilla-aurora/rev/afea0cb928b5. Bumped the mozrunner version and released as: https://pypi.python.org/pypi/mozrunner/5.37.2
Reporter | ||
Comment 28•10 years ago
|
||
I don't know that this is fixed, but I've fixed it enough so that I'm no longer working on it.
Assignee: jgriffin → nobody
Comment 29•10 years ago
|
||
I've heard that the problems I see are likely a bad interaction between bugs in 1.4 and bad network settings in the office. Hopefully this situation will improve when we switch to 2.x or fix the office network.
Comment 30•6 years ago
|
||
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•