Closed Bug 987760 Opened 6 years ago Closed 5 years ago

[v1.4][v2.0] Investigate test_settings_wifi.py failure

Categories

(Firefox OS Graveyard :: Gaia::UI Tests, defect, P1, major)

ARM
Gonk (Firefox OS)

Tracking

(tracking-b2g:backlog)

RESOLVED FIXED
tracking-b2g backlog

People

(Reporter: RobertC, Unassigned)

References

Details

(Keywords: qablocker, regression, Whiteboard: [fromAutomation][p=2], Priority 2)

Attachments

(23 files)

115.49 KB, text/plain
Details
1.75 MB, text/html
Details
4.23 MB, application/zip
Details
9.40 MB, application/zip
Details
29.10 KB, image/png
Details
6.31 MB, text/plain
Details
5.12 MB, application/x-gzip
Details
4.71 KB, text/plain
Details
2.14 MB, text/plain
Details
412.46 KB, text/plain
Details
5.32 MB, application/octet-stream
Details
391.84 KB, text/plain
Details
428.88 KB, text/html
Details
358.48 KB, text/plain
Details
312.41 KB, text/plain
Details
511.55 KB, text/plain
Details
2.41 MB, text/x-log
Details
1.36 MB, text/x-log
Details
875.89 KB, text/html
Details
60.11 KB, application/x-executable
Details
8.13 MB, text/plain
Details
475.59 KB, text/html
Details
1.23 MB, text/plain
Details
The test test_settings_wifi.py is failing intermittently because the expected wifi is not present in the available list. We could not reproduce this issue on the local devices. The earliest failure with this issue on Jenkins is from March 18.

Is there an issue with the local wifi network? Can someone from MV confirm this issue?

Stacktrace:
Traceback (most recent call last):
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 163, in run
testMethod()
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/tests/python/gaia-ui-tests/gaiatest/tests/functional/settings/test_settings_wifi.py", line 21, in test_connect_to_wifi_via_settings_app
wifi_settings.connect_to_network(self.testvars['wifi'])
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/tests/python/gaia-ui-tests/gaiatest/apps/settings/regions/wifi.py", line 32, in connect_to_network
this_network = self.wait_for_element_present(*this_network_locator)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 29, in wait_for_element_present
lambda m: m.find_element(by, locator))
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/wait.py", line 143, in until
cause=last_exc)
TimeoutException: Traceback (most recent call last):
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/wait.py", line 122, in until
rv = condition(self.marionette)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 29, in <lambda>
lambda m: m.find_element(by, locator))
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 1189, in find_element
response = self._send_message('findElement', 'value', **kwargs)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 613, in _send_message
self._handle_error(response)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-aurora.ui/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 634, in _handle_error
raise NoSuchElementException(message=message, status=status, stacktrace=stacktrace)
TimeoutException: Timed out after 30.3313939571 seconds, caused by <class 'marionette.errors.NoSuchElementException'>

Jenkins builds link:
http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-aurora.ui/70/testReport/junit/test_settings_wifi/TestSettingsWifi/test_connect_to_wifi_via_settings_app/history/?


Latest build on which this issue is reproduced:
Gaia      a2a88d0638594a6510f878d2c5e99a6ead7520ad
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/67bdb575d833
BuildID   20140325000201
Version   30.0a2
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013

Also reproduced on:
Gaia      730670951e40b2317a167fcd07c398bb662d6e87
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/a44f8b39c2c8
BuildID   20140324000202
Version   30.0a2
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
We cannot rerpoduce this locally, but we are seeing conatnt failures on Jenkins. Can someone from MV office verify this manually?
Flags: needinfo?(stephen.donner)
Flags: needinfo?(mozillamarcia.knous)
Raymond, can you try, today before you leave:

1) multiple (at least 15 or so) runs of test_settings_wifi.py on mozilla-central/master build
2) the same, but against the Aurora build for 1.4?

Let us know here what you see, please, and help us try to understand whether this is an automation/manual bug with the build(s), or perhaps a Wi-Fi issue with that node on Mozilla Guest; thanks!
Flags: needinfo?(stephen.donner) → needinfo?(mozbugs.retornam)
This was from a replacement candidate (but not-yet in rotation) B2G-11 phone.
The command I ran was: gaiatest --testvars=/Users/sdonner/Desktop/ui-tests/b2g-7.json --address=localhost:2828 --xml-output=results/results.xml --html-output=results/index.html --type=b2g-unit --restart --repeat=9  --timeout=30000 gaiatest/tests/functional/settings/test_settings_wifi.py

sdonner-10779:B2G-flash-tool sdonner$ ./check_versions.sh 
Gaia      80af23f8c74d9d2e9388d8ed3c204040b5c528ec
Gecko     https://hg.mozilla.org/mozilla-central/rev/c69c55582faa
BuildID   20140326040202
Version   31.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
:dlee, can you please take a look?  Thanks!
Flags: needinfo?(dlee)
Flags: needinfo?(mozillamarcia.knous)
Flags: needinfo?(mozbugs.retornam)
:bebe, can you try a good # of runs on mozilla-central builds /before/ we branched for 1.4?

(Note: it's getting a bit convoluted -- at least to me!-- to know what's where, with bug 986349 and bug 988117 in the mix.)
(In reply to Stephen Donner [:stephend] from comment #6)
> :dlee, can you please take a look?  Thanks!

Hi, Stephen,
  I tried to reproduce this issue on my platform(unagi) but i cannot reproduce it.
From the log in description it seems after scanning it can't find the ssid specified in testvars files.
But for scan result, we cannot promise that we could get all the wifi-network each time, it depends on driver, device.
  So if this issue is not always reproducible and it is not only happened after specific version, i will suggest maybe we can retry this(scan and get the wifi network) multiple times.
Flags: needinfo?(dlee)
Hi Dimi -

Luckily, Eric Chang helped reproduce this issue in the Taipei office:

gaiatest --testvars=testvars.json --address=localhost:2828 --xml-output=results.xml --html-output=index.html --type=b2g-unit --restart --repeat=9 --timeout=30000 gaiatest/tests/functional/settings/test_settings_wifi.py

GAIA_REV=c8bdb8fdd176f794d703a9ba24cca8364313cd45
GECKO_REV=68f47b1cc64d415ac77785a264257bc7a16ba09e
GECKO_BRANCH=mozillaorg/master
GAIA_BRANCH=mozillaorg/master
BUILD_TAG=jenkins-B2G.master.hamachi-206
BuildID=20140326051552

Can you work with him/the QA team?  Thanks to both of you!
Severity: normal → major
I checked again on this case and my failed one, the timeout on my case is as follows.


> GAIA_REV=c8bdb8fdd176f794d703a9ba24cca8364313cd45
> GECKO_REV=68f47b1cc64d415ac77785a264257bc7a16ba09e
> GECKO_BRANCH=mozillaorg/master
> GAIA_BRANCH=mozillaorg/master
> BUILD_TAG=jenkins-B2G.master.hamachi-206
> BuildID=20140326051552
> File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 145, in run
> self.setUp()
> File "/home/ericcc/repo/mozilla-b2g/gaia/tests/python/gaia-ui-tests/gaiatest/tests/functional/settings/test_settings_wifi.py", line 13, in setUp
> self.data_layer.disable_wifi()
> File "/home/ericcc/repo/mozilla-b2g/gaia/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 263, in disable_wifi
> result = self.marionette.execute_async_script("return GaiaDataLayer.disableWiFi()", special_powers=True)
> File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 1162, in execute_async_script
> filename=os.path.basename(frame[0]))
> File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 613, in _send_message
> self._handle_error(response)
> File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 662, in _handle_error
> raise ScriptTimeoutException(message=message, status=status, stacktrace=stacktrace)
> ScriptTimeoutException: ScriptTimeoutException: timed out


Let me try to run again of this test, change my setup to - Buri, v1.4 and run again.
v1.4 
Gaia      7e705dd4718d528974d99ac31866318d7e201152         
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/4889124accfa
BuildID   20140326000201                                   
Version   30.0a2                                           
ro.build.version.incremental=eng.archermind.20131114.105818
ro.build.date=Thu Nov 14 10:58:33 CST 2013  

This is what I've got out of 100 tests, not easy to reproduce..
 2/100 Timeouts of the same as above
 1/100 Timeout when waiting for X to show up
97/100 Pass
blocking-b2g: --- → 1.4?
Component: Gaia::UI Tests → Wifi
Whiteboard: [fromAutomation]
Recent automation regression - needed to ensure that we have stable on device automation runs.
Dimi, can you or someone on your team look at this ASAP, please?  Thx :-)
Flags: needinfo?(dlee)
Assignee: nobody → dlee
Flags: needinfo?(dlee)
Hi Eric
  Could you help run the test with adb log ? it will be very helpful ! thanks
Flags: needinfo?(echang)
Got a small script to run test with adb, for 100 runs, I need maybe 90 minutes
Flags: needinfo?(echang)
1.4+ for qablocker issue
blocking-b2g: 1.4? → 1.4+
Currently I am trying to reproduce this again with adb logcat on, first 100 tests(a for loop of 100 times, each test with separate adb logcat), another 100 times with a single logcat and gaiatest --repeat=99

200 PASS.


V1.4 AURORA, Buri, Engineering build
Gaia      a2a88d0638594a6510f878d2c5e99a6ead7520ad         
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/67bdb575d833
BuildID   20140325000201                                   
Version   30.0a2                                           
ro.build.version.incremental=eng.archermind.20131114.105818
ro.build.date=Thu Nov 14 10:58:33 CST 2013
                                              
wifi with wpa
Are you able to try both authenticated and open nodes?
V1.4 AURORA, Buri, Engineering build
Gaia      a2a88d0638594a6510f878d2c5e99a6ead7520ad         
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/67bdb575d833
BuildID   20140325000201                                   
Version   30.0a2                                           
ro.build.version.incremental=eng.archermind.20131114.105818
ro.build.date=Thu Nov 14 10:58:33 CST 2013
                                              
wifi without authentication

99 PASS
 1 FAIL (but not a timeout)

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 163, in run
    testMethod()
  File "/home/ericcc/repo/mozilla-b2g/gaia/tests/python/gaia-ui-tests/gaiatest/tests/functional/settings/test_settings_wifi.py", line 21, in test_connect_to_wifi_via_settings_app
    wifi_settings.connect_to_network(self.testvars['wifi'])
  File "/home/ericcc/repo/mozilla-b2g/gaia/tests/python/gaia-ui-tests/gaiatest/apps/settings/regions/wifi.py", line 32, in connect_to_network
    this_network = self.wait_for_element_present(*this_network_locator)
  File "/home/ericcc/repo/mozilla-b2g/gaia/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 29, in wait_for_element_present
    lambda m: m.find_element(by, locator))
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/wait.py", line 143, in until
    cause=last_exc)
TimeoutException: Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/wait.py", line 122, in until
    rv = condition(self.marionette)
  File "/home/ericcc/repo/mozilla-b2g/gaia/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 29, in <lambda>
    lambda m: m.find_element(by, locator))
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 1189, in find_element
    response = self._send_message('findElement', 'value', **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 613, in _send_message
    self._handle_error(response)
  File "/usr/local/lib/python2.7/dist-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 634, in _handle_error
    raise NoSuchElementException(message=message, status=status, stacktrace=stacktrace)
TimeoutException: Timed out after 30.5163249969 seconds, caused by <class 'marionette.errors.NoSuchElementException'>
Need to remove the part says "(but not a timeout)" @@"
(In reply to Eric Chang [:ericcc] [:echang] from comment #19)
> Created attachment 8400474 [details]
> 20140402-3-open-wifi.zip
>                                               
> wifi without authentication
> 
> 99 PASS
>  1 FAIL (but not a timeout)
Is the firmware a factor here? I have the v1.2_US_20131115.cfg on this Buri.
(In reply to Eric Chang [:ericcc] [:echang] from comment #21)
> Is the firmware a factor here? I have the v1.2_US_20131115.cfg on this Buri.

I don't think so - I can/could reproduce this manually (thought it's easier/more problematic with automation, b/c it feels like a race condition or something), and with the latest from TeleWeb, which should be v1.2-device.cfg - are you on that?
Thanks for the info, I will try that. Currently I am waiting for dimi's private build for getting more debugging output.
(In reply to Eric Chang [:ericcc] [:echang] from comment #23)
> Thanks for the info, I will try that. Currently I am waiting for dimi's
> private build for getting more debugging output.

Thanks; can you also be sure you're both testing on Hamachi or Buri?  We haven't used Unagis since before October or so of last year, IIRC...
Sure, I use Buri only.
Depends on: 991499
One of the root cause that case this test case fail will be fixed in Bug 991499.
But there still other issues related to this bug, i will keep checking it.
(In reply to Dimi Lee[:dimi][:dlee] from comment #26)
> One of the root cause that case this test case fail will be fixed in Bug
> 991499.
> But there still other issues related to this bug, i will keep checking it.

Should bug 991499 be nominated to block then?
Attached file 20140405-1.zip
Hi Dimi, Got 1 with private build, could you help to check the log, thanks..
Flags: needinfo?(dlee)
(In reply to Eric Chang [:ericcc] [:echang] from comment #28)
> Created attachment 8402435 [details]
> 20140405-1.zip
> 
> Hi Dimi, Got 1 with private build, could you help to check the log, thanks..

Hi Eric,
  This error happened because wpa_supplicant did not find the ap during scan, this is the same problem as i described in Comment 8, could you help test again in office environment ? thanks
Flags: needinfo?(dlee)
Running for 5 hours now, all pass, will leave that in the office for the night.
Running for 13+ hours now, 800 runs all pass, (59~61 sec/a run) with private build from Dimi and open wifi.
Flags: needinfo?(dlee)
According to Eric test result, maybe we can check if this bug is still reproducible after bug 991499 land to 1.4
Flags: needinfo?(dlee)
According to comment 31 and comment 32, set to RESOLVED WORKSFORME
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Stephen - Can you check to see if this has been running green here in our CI? Just want to double check that this is truly fixed.
Flags: needinfo?(stephen.donner)
Reopen it because this requires bug 991499 to be uplifted to 1.4.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
(In reply to Jason Smith [:jsmith] from comment #34)
> Stephen - Can you check to see if this has been running green here in our
> CI? Just want to double check that this is truly fixed.

Still having problems on master, too (see http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.unittests.adhoc/17/).  Likely, they are related to bug 990580.
Flags: needinfo?(stephen.donner)
This issue is reproducible on Jenkins, on v1.3 latest build. From the screenshot we can notice that the wireless network is present, but the phone is not able to connect to it before timeout. 

Traceback (most recent call last):
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/.env/local/lib/python2.7/site-packages/marionette_client-0.7.1-py2.7.egg/marionette/marionette_test.py", line 143, in run
testMethod()
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/tests/python/gaia-ui-tests/gaiatest/tests/functional/settings/test_settings_wifi.py", line 21, in test_connect_to_wifi_via_settings_app
wifi_settings.connect_to_network(self.testvars['wifi'])
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/tests/python/gaia-ui-tests/gaiatest/apps/settings/regions/wifi.py", line 48, in connect_to_network
timeout = max(self.marionette.timeout and self.marionette.timeout / 1000, 60))
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/tests/python/gaia-ui-tests/gaiatest/apps/base.py", line 103, in wait_for_condition
raise TimeoutException(message)
TimeoutException: Condition timed out


Gaia      100a159af11281d965e9d70d0f3d3111a38775ee
Gecko     https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/ea2a473b122f
BuildID   20140409004002
Version   28.0
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013

I couldn't reproduce this locally. I attached a screenshot of the failure.
Attached image screenshot.png
(In reply to Viorela Ioia [:viorela] from comment #37)
> This issue is reproducible on Jenkins, on v1.3 latest build. From the
> screenshot we can notice that the wireless network is present, but the phone
> is not able to connect to it before timeout. 
> 
> Traceback (most recent call last):
> File
> "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/.env/local/lib/
> python2.7/site-packages/marionette_client-0.7.1-py2.7.egg/marionette/
> marionette_test.py", line 143, in run
> testMethod()
> File
> "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/tests/python/
> gaia-ui-tests/gaiatest/tests/functional/settings/test_settings_wifi.py",
> line 21, in test_connect_to_wifi_via_settings_app
> wifi_settings.connect_to_network(self.testvars['wifi'])
> File
> "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/tests/python/
> gaia-ui-tests/gaiatest/apps/settings/regions/wifi.py", line 48, in
> connect_to_network
> timeout = max(self.marionette.timeout and self.marionette.timeout / 1000,
> 60))
> File
> "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g28_v1_3.ui.2/tests/python/
> gaia-ui-tests/gaiatest/apps/base.py", line 103, in wait_for_condition
> raise TimeoutException(message)
> TimeoutException: Condition timed out
> 
> 
> Gaia      100a159af11281d965e9d70d0f3d3111a38775ee
> Gecko     https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/ea2a473b122f
> BuildID   20140409004002
> Version   28.0
> ro.build.version.incremental=eng.tclxa.20131223.163538
> ro.build.date=Mon Dec 23 16:36:04 CST 2013
> 
> I couldn't reproduce this locally. I attached a screenshot of the failure.

What is the reproduce rate of this failure ? And is it possible to have log for this error ? Thanks!
Flags: needinfo?(viorela.ioia)
We still fail on master, and -- oddly enough -- it's now passing on the 1st 3 tries, but the 4th fails (see http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.unittests.adhoc/20/).

I'll attach a logfile, next, too.

19:28:41 Showing version details
19:28:45 application_buildid: 20140409142348
19:28:45 application_changeset: 690c810c8e3e
19:28:45 application_name: B2G
19:28:45 application_repository: https://hg.mozilla.org/mozilla-central
19:28:45 application_version: 31.0a1
19:28:45 build_changeset: 1ad48c4be51b279f7f63c1a13025b52fe087d231
19:28:45 device_firmware_date: 1387433095
19:28:45 device_firmware_version_incremental: 324
19:28:45 device_firmware_version_release: 4.0.4
19:28:45 device_id: msm7627a
19:28:45 gaia_changeset: 9d0b1bdf746823a94b13e6574c1d8304dc584763
19:28:45 gaia_date: 1397017099
19:28:45 platform_buildid: 20140409142348
19:28:45 platform_changeset: 690c810c8e3e
19:28:45 platform_repository: https://hg.mozilla.org/mozilla-central
The 4th test fails, here -- the 1st 3 pass, oddly enough!
Depends on: 993821
Depends on: 986365
No longer depends on: 993821
Depends on: 993821
(In reply to Stephen Donner [:stephend] from comment #41)
> Created attachment 8404424 [details]
> logcat.txt from latest failed run on master/m-c
> 
> The 4th test fails, here -- the 1st 3 pass, oddly enough!
WifiWorker cannot retrieve dhcp info, but log info is not enough to know it stucked in gecko, dhcpcd or AP side.
Hi, Stephen,
  Can we test connect to different AP ?
  And is it possible to provide an engineer build to you to get more information from log ? 
  Thanks for help.
Flags: needinfo?(viorela.ioia) → needinfo?(stephen.donner)
Whiteboard: [fromAutomation] → [fromAutomation][p=2]
Target Milestone: --- → 1.4 S6 (25apr)
(In reply to Dimi Lee[:dimi][:dlee] from comment #42)
> (In reply to Stephen Donner [:stephend] from comment #41)
> > Created attachment 8404424 [details]
> > logcat.txt from latest failed run on master/m-c
> > 
> > The 4th test fails, here -- the 1st 3 pass, oddly enough!
> WifiWorker cannot retrieve dhcp info, but log info is not enough to know it
> stucked in gecko, dhcpcd or AP side.
> Hi, Stephen,
>   Can we test connect to different AP ?
>   And is it possible to provide an engineer build to you to get more
> information from log ? 
>   Thanks for help.

I will try this from home tonight.  As for the 2nd question, I'm not quite clear on what you're asking -- do you want us to try a special build, for more debug info?  If so, sure, that'd be fine by me.  Please make sure it's got Marionette enabled, too.
Flags: needinfo?(stephen.donner)
Flags: needinfo?(dlee)
Attached file debug_987769.tar.gz
Hi Stephen,
  Attach debug files using latest mozilla-central
Detail information is included in README.
Please ping me if there are any problem, thanks
Flags: needinfo?(dlee)
Flags: needinfo?(stephen.donner)
Whiteboard: [fromAutomation][p=2] → [fromAutomation][p=2], Priority 2
Here's the output (and command to run) this test, with 9 repeats (10 total), /without/ your patch.  Logcat, also /without/ patch, for comparison, attaching next.
Flags: needinfo?(stephen.donner)
Depends on: 997791
Depends on: 990580, 991262
This issue now may related to many other bugs cause test case fail.
We should check this after other bugs is fixed then we will have a more clear environment to investigate.

Suggest set to non-blocking now.
blocking-b2g: 1.4+ → 1.4?
Flags: needinfo?(praghunath)
Target Milestone: 1.4 S6 (25apr) → ---
(In reply to Dimi Lee[:dimi][:dlee] from comment #47)
> This issue now may related to many other bugs cause test case fail.
> We should check this after other bugs is fixed then we will have a more
> clear environment to investigate.
> 
> Suggest set to non-blocking now.

Why? As long as there's intermittent failures due to wifi regressions, this remains as a blocker. If there's nothing else to do here that's related to wifi, then this should be closed out.
(In reply to Jason Smith [:jsmith] from comment #48)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #47)
> > This issue now may related to many other bugs cause test case fail.
> > We should check this after other bugs is fixed then we will have a more
> > clear environment to investigate.
> > 
> > Suggest set to non-blocking now.
> 
> Why? As long as there's intermittent failures due to wifi regressions, this
> remains as a blocker. If there's nothing else to do here that's related to
> wifi, then this should be closed out.

Agreed; I suggested we clear through bug 990580 (and perhaps bug 991262) because it appears far-reaching -- this should still block, though (esp. if we can still reproduce, at all, after bug 990580's fix lands).
(In reply to Jason Smith [:jsmith] from comment #48)
> (In reply to Dimi Lee[:dimi][:dlee] from comment #47)
> > This issue now may related to many other bugs cause test case fail.
> > We should check this after other bugs is fixed then we will have a more
> > clear environment to investigate.
> > 
> > Suggest set to non-blocking now.
> 
> Why? As long as there's intermittent failures due to wifi regressions, this
> remains as a blocker. If there's nothing else to do here that's related to
> wifi, then this should be closed out.

We can not reproduce the failure caused by wifi at Taipei, Eric also had do lots of test (Comment 31). Stephen provided us failure logs and some of them are not related to wifi. But we did see some of logs show that gecko cannot retrieve ip information from DHCP. From the log it is not related to any wifi regression and we think it is more likely related to the environment.

We still want to keep track on this bug so it should not be closed out.
No longer depends on: 990580
No longer depends on: 991262
(In reply to Dimi Lee[:dimi][:dlee] from comment #50)
> (In reply to Jason Smith [:jsmith] from comment #48)
> > (In reply to Dimi Lee[:dimi][:dlee] from comment #47)
> > > This issue now may related to many other bugs cause test case fail.
> > > We should check this after other bugs is fixed then we will have a more
> > > clear environment to investigate.
> > > 
> > > Suggest set to non-blocking now.
> > 
> > Why? As long as there's intermittent failures due to wifi regressions, this
> > remains as a blocker. If there's nothing else to do here that's related to
> > wifi, then this should be closed out.
> 
> We can not reproduce the failure caused by wifi at Taipei, Eric also had do
> lots of test (Comment 31). Stephen provided us failure logs and some of them
> are not related to wifi. But we did see some of logs show that gecko cannot
> retrieve ip information from DHCP. From the log it is not related to any
> wifi regression and we think it is more likely related to the environment.

The reality is that this environment is where our devices are running CI automation, so it needs to work. We can't just point at the environment with no idea of the solution in mind, as that doesn't provide firm evidence that there are no more problems on the gecko side. We either need to prove firmly about what is failing in our wifi setup or we need to continue investigating what gecko bugs are present here. As such, this stays as a blocker, as this still is a regression in existing on device automation.

Note - The issues not related to wifi are being fixed separately and are unrelated to this bug, so I'm removing as dependencies.
blocking-b2g: 1.4? → 1.4+
Flags: needinfo?(praghunath)
Can we please check if this issue is reproducible once bug 979130 is uplifted?
Keywords: qawanted
(In reply to Preeti Raghunath(:Preeti) from comment #52)
> Can we please check if this issue is reproducible once bug 979130 is
> uplifted?

That's unrelated to this bug.
Keywords: qawanted
(In reply to Jason Smith [:jsmith] from comment #51)
> The reality is that this environment is where our devices are running CI
> automation, so it needs to work. We can't just point at the environment with
> no idea of the solution in mind, as that doesn't provide firm evidence that
> there are no more problems on the gecko side. We either need to prove firmly
> about what is failing in our wifi setup or we need to continue investigating
> what gecko bugs are present here.
Agree with you. Because Dimi wants some helps to reproduce this bugs with his debugging patch and Dimi can not reproduce this bug with local network, can someone help him?
Keywords: qawanted
(In reply to Ken Chang[:ken] from comment #54)
> (In reply to Jason Smith [:jsmith] from comment #51)
> > The reality is that this environment is where our devices are running CI
> > automation, so it needs to work. We can't just point at the environment with
> > no idea of the solution in mind, as that doesn't provide firm evidence that
> > there are no more problems on the gecko side. We either need to prove firmly
> > about what is failing in our wifi setup or we need to continue investigating
> > what gecko bugs are present here.
> Agree with you. Because Dimi wants some helps to reproduce this bugs with
> his debugging patch and Dimi can not reproduce this bug with local network,
> can someone help him?

What patch? Which attachment?
(In reply to Jason Smith [:jsmith] from comment #55)

> What patch? Which attachment?

https://bugzilla.mozilla.org/attachment.cgi?id=8406080, from comment 44.  I'll try to get this today.
(In reply to Stephen Donner [:stephend] from comment #56)
> (In reply to Jason Smith [:jsmith] from comment #55)
> 
> > What patch? Which attachment?
> 
> https://bugzilla.mozilla.org/attachment.cgi?id=8406080, from comment 44. 
> I'll try to get this today.

Tried this today, on a just-flashed Hamachi, but get the following:

sdonner-10779:Desktop sdonner$ adb push omni.ja /system/b2g
failed to copy 'omni.ja' to '/system/b2g/omni.ja': No space left on device

I've done an adb remount, but still no luck; what am I doing wrong?
(In reply to Stephen Donner [:stephend] from comment #58)
> Created attachment 8410013 [details]
> wifi_logcat.txt

This was done like so:

1) I pushed:

sdonner-10779:Desktop sdonner$ sudo adb remount
Password:
remount succeeded
sdonner-10779:Desktop sdonner$ sudo adb push libnetutils.so /system/lib/

to:

2) sdonner-10779:B2G-flash-tool sdonner$ ./check_versions.sh 
Gaia      84f4835833c4cb8ac7e5d9e0c94738b1cb0ef45a
Gecko     https://hg.mozilla.org/mozilla-central/rev/c962bde5ac0b
BuildID   20140421160201
Version   31.0a1
ro.build.version.incremental=324
ro.build.date=Thu Dec 19 14:04:55 CST 2013

3) Then ran:

gaiatest --testvars=/Users/sdonner/Desktop/ui-tests/b2g-7.json --address=localhost:2828 --xml-output=results/results.xml --html-output=results/index.html --type=b2g-unit --restart gaiatest/tests/functional/settings/test_settings_wifi.py

4) What I see when I run the automation, is that it's just "Searching..." indefinitely, under "Available networks."  Also, manually (after automation has run), I see that under the Settings app top-level, "Wi-Fi" is in the "Initializing..." state.  When I try this manually with the same build (after a restart), I don't get it into this state.
(In reply to Dimi Lee[:dimi][:dlee] from comment #60)
> Created attachment 8410035 [details]
> omni.ja with debug message

Unfortunately, this didn't work with my build; it causes the phone to boot into a blank/black screen and never finishes its startup.
Attached file logcat.txt
This should show a single test_settings_wifi.py failure, with Debug turned on, with the appropriate, new omni.ja replaced!
(In reply to Stephen Donner [:stephend] from comment #62)
> Created attachment 8410049 [details]
> logcat.txt
> 
> This should show a single test_settings_wifi.py failure, with Debug turned
> on, with the appropriate, new omni.ja replaced!

Hi, Stephen
 Could you help upload the output html (index.html), thanks !
Attached file index.html
Keywords: qawanted
Hi Dimi, I run this for 200*30 times, 3 days, I got tons of reports and logs, and the b2g starts but it stays in the blue flash screen. Will send you the logs soon.
:davehunt, any thoughts on the failures between a run of test_settings.wifi.py with and without --restart?

For me, locally, it passes without --restart, but fails _with_ --restart.  However, http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.unittests.adhoc/ fares better than I, and it uses --restart (but I flash with ./fullflash.sh -n and also don't do the stuff our Jenkins box does).
Flags: needinfo?(dave.hunt)
How good is the wifi signal in the testing env.? For us in Taipei, the signal of APs we are testing with are quite strong.
Flags: needinfo?(stephen.donner)
(In reply to Stephen Donner [:stephend] from comment #66)
> :davehunt, any thoughts on the failures between a run of
> test_settings.wifi.py with and without --restart?
> 
> For me, locally, it passes without --restart, but fails _with_ --restart. 
> However,
> http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.
> unittests.adhoc/ fares better than I, and it uses --restart (but I flash
> with ./fullflash.sh -n and also don't do the stuff our Jenkins box does).

Could you update GAIA to latest one and test again ?
In my testing environment, the always "Searching..." issue is fixed after updating GAIA
(In reply to Stephen Donner [:stephend] from comment #66)
> :davehunt, any thoughts on the failures between a run of
> test_settings.wifi.py with and without --restart?
> 
> For me, locally, it passes without --restart, but fails _with_ --restart. 
> However,
> http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-central.
> unittests.adhoc/ fares better than I, and it uses --restart (but I flash
> with ./fullflash.sh -n and also don't do the stuff our Jenkins box does).

With --restart we stop b2g and clear a bunch of files from the device. Without it, we try to achieve the same in software - like using the API to forget WiFi networks, etc. Usually --restart is better at achieving a clean state, but sometimes we need to tweak the collection of files we delete. Perhaps something has changed in the way WiFi details are cached?
Flags: needinfo?(dave.hunt)
Duplicate of this bug: 997197
Whiteboard: [fromAutomation][p=2], Priority 2 → [fromAutomation][p=2], Priority 2,[ETA:4/30]
(In reply to Eric Chang [:ericcc] [:echang] from comment #67)
> How good is the wifi signal in the testing env.? For us in Taipei, the
> signal of APs we are testing with are quite strong.

The Wi-Fi router (Juniper) is literally right above the devices, in the lab.
Flags: needinfo?(stephen.donner)
Update current status of this issue :
1. We are tracking the issue that test case fail when trying to obtain IP address (Comment 42).
So I provide some components with debug information to Stephen to get more debug message because we cannot reproduce this in Taipei (Comment 31).

2. Stephen encounter a problem that wifi always show Searching when running test case (Comment 59). 
This problem may caused by unsync version of GAIA and GECKO because it is fixed in my side if update to latest GAIA and GECKO (mozilla-central).
Stephen, Could you help confirm this in your side ? Thanks
Flags: needinfo?(stephen.donner)
(In reply to Dimi Lee[:dimi][:dlee] from comment #72)
> Update current status of this issue :
> 1. We are tracking the issue that test case fail when trying to obtain IP
> address (Comment 42).
> So I provide some components with debug information to Stephen to get more
> debug message because we cannot reproduce this in Taipei (Comment 31).
> 
> 2. Stephen encounter a problem that wifi always show Searching when running
> test case (Comment 59). 
> This problem may caused by unsync version of GAIA and GECKO because it is
> fixed in my side if update to latest GAIA and GECKO (mozilla-central).
> Stephen, Could you help confirm this in your side ? Thanks
I raise "qawanted" again because we still not get the log for original problem, the item 1 what Dimi mentioned and we can not reproduce it.
Keywords: qawanted
Zac, can you help get a local run which has failed, and a logcat?  Thx.
Flags: needinfo?(stephen.donner) → needinfo?(zcampbell)
(In reply to Stephen Donner [:stephend] from comment #71)
> (In reply to Eric Chang [:ericcc] [:echang] from comment #67)
> > How good is the wifi signal in the testing env.? For us in Taipei, the
> > signal of APs we are testing with are quite strong.
> 
> The Wi-Fi router (Juniper) is literally right above the devices, in the lab.

So, we thought this was true, but we are now told that the "Mozilla Guest" node to which we connect is, in fact, "somewhere else" -- likely in the ceiling in the adjacent kitchen to this lab.

As a next step on our side, we'll also try connecting to an authenticated, WPA2 Personal node: the one that the A-team uses (for which we'll need to isolate a device, add its MAC address to the Access Control List, and then change a credentials file so our automation can use it).

Raymond, please slot this as #2 on your priorities, just behind launching Australis tomorrow; thanks!
Flags: needinfo?(mozbugs.retornam)
(In reply to Ken Chang[:ken] from comment #73)

> I raise "qawanted" again because we still not get the log for original
> problem, the item 1 what Dimi mentioned and we can not reproduce it.

Actually, on Thursday the 24th, I sent Dimi the logfiles of runs with failures and without, as he requested.  I'll post them here, for posterity.
Here's the case without specifying "--restart"; it passes.
Attached file wifilogcat-restart.txt
Here's a logcat where we pass in "--restart", and fail to pass the test.
Additionally, I was told this isn't failing in Tiapei's environment, but yet:

http://release1-qa.corp.tpe1.mozilla.com:8080/view/B2G.Test.Hamachi/job/B2G.master.pvt.hamachi.gaiatest.unit.wifi/4/console

Eric, can you get that job running with HTML Reports (w/screenshots) and logcat (with debug Wi-Fi output) enabled?  Thanks!
Flags: needinfo?(echang)
(And sorry, one last spam for me tonight, but I notice Taipei QA's job is running the gaiatest/tests/unit/settings/test_wifi_settings.py/ *test_set_wifi* [1] test; we're after gaiatest/tests/functional/settings/test_settings_wifi.py/ *test_connect_to_wifi_via_settings_app* [2] test.

Thanks again for helping us all try to nail this down!

[1] https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/tests/unit/settings/test_wifi_settings.py#L10
[2] https://github.com/mozilla-b2g/gaia/blob/master/tests/python/gaia-ui-tests/gaiatest/tests/functional/settings/test_settings_wifi.py#L15
Keywords: qawanted
Ok, worked quite a bit again with Dimi, and we've got a better handle on this, now.

1) we're no longer seeing -- locally or in Jenkins -- the "Searching..." problem
2) The real problem (and for which this bug was filed) is still present: with --restart --repeat=n, we're still seeing problems with two access points 1) "Mozilla Guest" and 2) "ateam" taking a while to dole out DHCP leases/wireless sessions.

We're looking into that, tomorrow, after Australis launches :-)  We'll report back with more as we have it.
Actually, that was happening when we have build problem last week, the build was large, cannot be installed onto phone completely, and the build inside was messed up.

(In reply to Stephen Donner [:stephend] from comment #79)
> Additionally, I was told this isn't failing in Tiapei's environment, but yet:
> 
> http://release1-qa.corp.tpe1.mozilla.com:8080/view/B2G.Test.Hamachi/job/B2G.
> master.pvt.hamachi.gaiatest.unit.wifi/4/console
> 
> Eric, can you get that job running with HTML Reports (w/screenshots) and
> logcat (with debug Wi-Fi output) enabled?  Thanks!
Flags: needinfo?(echang)
FWIW, the connect_to_wifi() failure that underlies this and other tests (such as in test_cleanup_gaia) is also experienced -- ocassionally -- by Taipei, so I've asked over in bug 997552 comment 9 for help looking into that as another possible method of reproducing and nailing this issue.
Additionally, test_connect_and_forget_all_networks has also now started failing in our unittests; see http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-central.unittests/549/testReport/test_wifi/TestWiFi/test_connect_and_forget_all_networks/
Summary: [v1.4] Investigate test_settings_wifi.py failure → [v1.4][v2.0] Investigate test_settings_wifi.py failure
(In reply to Stephen Donner [:stephend] from comment #81)
> Ok, worked quite a bit again with Dimi, and we've got a better handle on
> this, now.
> 
> 1) we're no longer seeing -- locally or in Jenkins -- the "Searching..."
> problem
> 2) The real problem (and for which this bug was filed) is still present:
> with --restart --repeat=n, we're still seeing problems with two access
> points 1) "Mozilla Guest" and 2) "ateam" taking a while to dole out DHCP
> leases/wireless sessions.
> 
> We're looking into that, tomorrow, after Australis launches :-)  We'll
> report back with more as we have it.

Hi, Stephen,
  Have any update or test result on this bug ? Thanks
Flags: needinfo?(stephen.donner)
I ran tests with --restart and it failed. Attached is the logcat output

Gaia      d23e479e8a4ce0bc620acb2d7e2f82801aa4d0ea
Gecko     https://hg.mozilla.org/releases/mozilla-aurora/rev/36f67ce46855
BuildID   20140428000206
Version   30.0a2
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
Flags: needinfo?(mozbugs.retornam)
Attached file locat-apr-30-run-2.log
I ran again with the Omni.ja and libnetutils.so stephend sent me and it failed after 7 tries. 

Attached is the logcat for all the runs
Attached file locat-apr-30-run-3.log
Logcat from 3rd run
Flags: needinfo?(zcampbell)
Flags: needinfo?(stephen.donner)
Dimi, Ken: do you now have the information you need, from us, at this point?

FWIW, we asked IT, and learned the following:

6:22:12 PM <adam> well, the DHCP server is seeing it.
6:22:16 PM <adam> Apr 30 22:09:28 admin1a dhcpd: DHCPREQUEST for 10.252.33.54 (10.252.75.6) from e4:2d:02:2b:da:4b via 10.252.32.1
6:22:16 PM <adam> Apr 30 22:09:28 admin1a dhcpd: DHCPACK on 10.252.33.54 to e4:2d:02:2b:da:4b via 10.252.32.1
6:23:04 PM <adam> don't see an error yet, not rate limiting.
6:46:44 PM <adam> It appears to try to send it that 33.54 addr
6:51:06 PM <adam> I'm not seeing errors.
Flags: needinfo?(dlee)
Hi Stephen,
  Yes, it did provide some useful information, but we still need more test to investigate this issue.
From log, we can confirm that gecko part is OK and the problem may occur in dhcpcd(dhcp client) or dhcp server.

Could your help check following:
1. The libnetutils.so seems not replace correctly because i cannot see any debug information from latest log. Could you help reproduce this with pushing it to system/lib
2. In the lab environment, we tried both "ateam" and "mozilla guest". Do they both connect to the same dhcp server ?
3. For question 2, if they both connect to the same dhcp server, is it possible to try this case with different dhcp server ? For example, use your cell phone as wifi hotspot and try to connect to your phone.

  Thanks for help!
Flags: needinfo?(dlee) → needinfo?(stephen.donner)
(In reply to Dimi Lee[:dimi][:dlee] from comment #90)
> Hi Stephen,
>   Yes, it did provide some useful information, but we still need more test
> to investigate this issue.
> From log, we can confirm that gecko part is OK and the problem may occur in
> dhcpcd(dhcp client) or dhcp server.
> 
> Could your help check following:
> 1. The libnetutils.so seems not replace correctly because i cannot see any
> debug information from latest log. Could you help reproduce this with
> pushing it to system/lib

Fixed; sorry about that.  The logs from today should include it.

> 2. In the lab environment, we tried both "ateam" and "mozilla guest". Do
> they both connect to the same dhcp server ?

According to #it, no; we have "admin1a" and "admin1b", but both have, or should have, the same dhcpd.conf, which has:

5:08:09 PM <limed> we have these 2 values in dhcpd.conf
5:08:09 PM <limed>  max-response-delay 60;
5:08:09 PM <limed>  max-unacked-updates 10;

The full log of a Gaia UI Test run is here: https://app.box.com/s/dmk40286u1obssfdje4t.  I'll attach the HTML report next.

> 3. For question 2, if they both connect to the same dhcp server, is it
> possible to try this case with different dhcp server ? For example, use your
> cell phone as wifi hotspot and try to connect to your phone.
> 
>   Thanks for help!

I'll work again with them tomorrow on this issue, and one of them will be here in-person, to help us debug it, too.
Flags: needinfo?(stephen.donner)
Also, according to them, sometimes the device keeps requesting a lease even after it's been given a lease:

    May  5 23:49:32 admin1a dhcpd: DHCPREQUEST for 10.252.120.213 from e4:2d:02:38:d9:c4 via 10.252.120.1
    May  5 23:49:32 admin1a dhcpd: DHCPACK on 10.252.120.213 to e4:2d:02:38:d9:c4 via 10.252.120.1
    May  5 23:51:09 admin1a dhcpd: DHCPREQUEST for 10.252.120.213 from e4:2d:02:38:d9:c4 via 10.252.120.1
    May  5 23:51:09 admin1a dhcpd: DHCPACK on 10.252.120.213 to e4:2d:02:38:d9:c4 via 10.252.120.1
(In reply to Stephen Donner [:stephend] from comment #91)

> The full log of a Gaia UI Test run is here:
> https://app.box.com/s/dmk40286u1obssfdje4t.  I'll attach the HTML report
> next.

Sorry, in some spring cleaning, I accidentally a) uploaded the wrong file b) deleted the one I meant to upload :-(

Here's a reproduction of at least some of the problems, manually, though:

https://app.box.com/s/fslrqtubceap5bqx7h48
(In reply to Stephen Donner [:stephend] from comment #94)
> (In reply to Stephen Donner [:stephend] from comment #91)
> 
> > The full log of a Gaia UI Test run is here:
> > https://app.box.com/s/dmk40286u1obssfdje4t.  I'll attach the HTML report
> > next.
> 
> Sorry, in some spring cleaning, I accidentally a) uploaded the wrong file b)
> deleted the one I meant to upload :-(
> 
> Here's a reproduction of at least some of the problems, manually, though:
> 
> https://app.box.com/s/fslrqtubceap5bqx7h48

Thanks for providing the log.
From log we can know that the problem is between dhcp client and server.
dhcp client(dhcpcd) could not get ip address so the testcase fail.

Could you help check following :
1. Try to use your personal phone as wifi hotspot, and modify jenkins testcase config file to connect to your phone. Then we can know if this issue is related to network environment.

2. Replace uploaded dhcpcd to /system/bin/ and dump the debug message if testcase fail

Thanks!
Flags: needinfo?(stephen.donner)
Push this file to /system/bin/  to get dhcpcd debug message
Attached file wifi_logcat.txt
Flags: needinfo?(stephen.donner)
The above logcat is from the same run as the following HTML Report, which had two failures: 1 "AssertionError: WiFi was not connected via Settings app" and 1 "TimeoutException: TimeoutException: Timed out after 60.1 seconds" which timed out looking for "Connected".
test_settings_wifi.py testcase fail (Comment 37) on jenkins is because of gecko could not retrieve ip information from DHCP server.

After replacing gecko and libnetutils to get more debug information we found that libnetutils dump the following debug message:

"Timed out waiting for DHCP to finish"  (in system/core/libnetutils/dhcp_utils.c::dhcp_do_request)

This debug message means that this issue is not related to gecko and it may cause by either dhcpcd (dhcp client) ,dhcp server or network environment. Also according to comment 92, it did have some problem between client and server.

I will keep checking this issue, but from current investigate result, this issue is neither a wifi regression nor a gecko bug so i will suggest to set this bug to non-blocking b2g.
blocking-b2g: 1.4+ → 1.4?
Removing the blocking flag as the conclusion on this bug indicates this is not a gaia/gecko bug. We'll keep investigating though.
blocking-b2g: 1.4? → ---
Component: Wifi → Gaia::UI Tests
Attached file mywifi-woes_logcat.txt
This is from a manual run, where I try to connect to "ateam", but also connect a few times to "Mozilla Guest" with some issues with both SSIDs :(
(In reply to Stephen Donner [:stephend] from comment #102)
> Created attachment 8419111 [details]
> mywifi-woes_logcat.txt
> 
> This is from a manual run, where I try to connect to "ateam", but also
> connect a few times to "Mozilla Guest" with some issues with both SSIDs :(

And here's the video capture that goes along with it: https://drive.google.com/file/d/0B_XMhxrTPIrWWnJTYy1EMVVLd3M/edit?usp=sharing

Dimi, does this help?
(In reply to Stephen Donner [:stephend] from comment #103)
> (In reply to Stephen Donner [:stephend] from comment #102)
> > Created attachment 8419111 [details]
> > mywifi-woes_logcat.txt
> > 
> > This is from a manual run, where I try to connect to "ateam", but also
> > connect a few times to "Mozilla Guest" with some issues with both SSIDs :(
> 
> And here's the video capture that goes along with it:
> https://drive.google.com/file/d/0B_XMhxrTPIrWWnJTYy1EMVVLd3M/edit?usp=sharing
> 
> Dimi, does this help?

Yes, it is useful, I checked the timeline of viedo between
1:29(log time 20:25:44) - 1:41(log time 20:25:55) (waiting for ip address -> connected)

And following is the debug message of dhcpcd

05-07 20:25:43.709   990   990 E dhcpcd  : version 5.2.10 starting
05-07 20:25:43.719   990   990 F dhcpcd  : wlan0: using hwaddr e4:2d:02:38:d9:c4
05-07 20:25:43.779   990   990 E dhcpcd  : wlan0: rebinding lease of 10.252.33.142
05-07 20:25:43.779   990   990 F dhcpcd  : wlan0: sending REQUEST (xid 0x583817f), next in 4.99 seconds
05-07 20:25:48.789   990   990 F dhcpcd  : wlan0: sending REQUEST (xid 0x583817f), next in 7.91 seconds
05-07 20:25:53.789   990   990 E dhcpcd  : wlan0: broadcasting for a lease
05-07 20:25:53.789   990   990 F dhcpcd  : wlan0: sending DISCOVER (xid 0xab7ef042), next in 3.24 seconds
05-07 20:25:54.619   990   990 E dhcpcd  : wlan0: offered 10.252.120.213 from 10.252.75.6
05-07 20:25:54.619   990   990 F dhcpcd  : wlan0: sending REQUEST (xid 0xab7ef042), next in 4.43 seconds
05-07 20:25:54.619   990   990 E dhcpcd  : wlan0: ignoring offer of 10.252.120.213 from 10.252.75.7
05-07 20:25:54.679   990   990 E dhcpcd  : wlan0: acknowledged 10.252.120.213 from 10.252.75.7

From the log message, we can see that dhcpcd wait for 11 sec then get server ack. I think it is related to the testcase fail we encountered, because if dhcpcd takes too long to get server ack, the test case will fail cause of timeout.
blocking-b2g: --- → backlog
Whiteboard: [fromAutomation][p=2], Priority 2,[ETA:4/30] → [fromAutomation][p=2], Priority 2
Priority: -- → P1
(In reply to Dimi Lee[:dimi][:dlee] from comment #104)

> And following is the debug message of dhcpcd
> 
> 05-07 20:25:43.709   990   990 E dhcpcd  : version 5.2.10 starting
> 05-07 20:25:43.719   990   990 F dhcpcd  : wlan0: using hwaddr
> e4:2d:02:38:d9:c4
> 05-07 20:25:43.779   990   990 E dhcpcd  : wlan0: rebinding lease of
> 10.252.33.142
> 05-07 20:25:43.779   990   990 F dhcpcd  : wlan0: sending REQUEST (xid
> 0x583817f), next in 4.99 seconds
> 05-07 20:25:48.789   990   990 F dhcpcd  : wlan0: sending REQUEST (xid
> 0x583817f), next in 7.91 seconds
> 05-07 20:25:53.789   990   990 E dhcpcd  : wlan0: broadcasting for a lease
> 05-07 20:25:53.789   990   990 F dhcpcd  : wlan0: sending DISCOVER (xid
> 0xab7ef042), next in 3.24 seconds
> 05-07 20:25:54.619   990   990 E dhcpcd  : wlan0: offered 10.252.120.213
> from 10.252.75.6
> 05-07 20:25:54.619   990   990 F dhcpcd  : wlan0: sending REQUEST (xid
> 0xab7ef042), next in 4.43 seconds
> 05-07 20:25:54.619   990   990 E dhcpcd  : wlan0: ignoring offer of
> 10.252.120.213 from 10.252.75.7
> 05-07 20:25:54.679   990   990 E dhcpcd  : wlan0: acknowledged
> 10.252.120.213 from 10.252.75.7
> 
> From the log message, we can see that dhcpcd wait for 11 sec then get server
> ack. I think it is related to the testcase fail we encountered, because if
> dhcpcd takes too long to get server ack, the test case will fail cause of
> timeout.

James, is it possible to play with the default timeout value(s), specifically, the threshold for the time between the DHCPREQUEST and the DHCPACK?  Is that value 10 seconds, currently?  Thanks!
Flags: needinfo?(jbarnell)
Stephen,  we're using the default values from ISC dhcpd, those are globals which would impact many portions of the environment.  This looks like a problem similar to an ateam issue we're troubleshooting and I don't beleive its dhcp related.  You should reasonably expect dhcp to work is less than 10 seconds on a normal exhcange.  Can we match/link the logs above to the system start and wlan0 association events?  If I were to guess the dhcp request is going out before association completes.  Again this is 100% guess so take it with a grain of salt. 

There are a couple of items I've read recently that indicate certain hardware combinations with android/linux work great with a manual connect, but not when started via rc.d.  You're the best judge for that and I'll try to locate the links.

When can we coordinate a time to be in front of the devices?  I'd like to see whats happening and turn up the log level on the wifi controllers.  I also have a laptop with wireless packet capture to use at MTV2.  I do need the captures as we may need to engage Juniper on this.
Flags: needinfo?(jbarnell)
Can you try setting up a static IP and see if this issue still happens? That way we will be sure it's related to the wireless/DHCP combo.
Also is "e4:2d:02:38:d9:c4" a good mac address to look at the logs on the wireless side? Or you have a better test device?
Duplicate of this bug: 1010218
(In reply to Arzhel Younsi [:XioNoX] from comment #107)
> Can you try setting up a static IP and see if this issue still happens? That
> way we will be sure it's related to the wireless/DHCP combo.
> Also is "e4:2d:02:38:d9:c4" a good mac address to look at the logs on the
> wireless side? Or you have a better test device?

Not sure how we can set up a static IP on these Firefox OS devices; the problem doesn't seem to be in the dhcpd, though, now -- what we're now seeing is that the "ateam" node doesn't always appear after the wpa_supplicant's scan finds the rest of the SSIDs.

I've filed bug 1011358 for that.  We're seeing it on both Hamachis and Flames, as well.  I can get you the MAC address of the Flame tomorrow.
Depends on: 1011358
We have 2 failing tests in the latest v1.4 build, because of not being able to connect to wifi in time:
http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-b2g30_v1_4.ui.non-smoketest/44/HTML_Report/
http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/45/HTML_Report/

Gaia      17b102ee8d9a724b62285547cc5f1c5d30bfb01c
Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/95be84248033
BuildID   20140520000201
Version   30.0
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
See Also: → 1005775
See Also: → 1001897
Tests that are using wifi connection have been failing in each of the latest v1.4 builds.
This is the most recent build report: http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-b2g30_v1_4.ui.non-smoketest/49/HTML_Report/
http://selenium.qa.mtv2.mozilla.com:8080/job/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/50/HTML_Report/
 
Gaia      16f7ef17feb08562aa6a3cfd6d561b2fcca652c0
Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/9cc96d59cc30
BuildID   20140523000202
Version   30.0
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
I started an adhoc run of one of the test that uses wifi connection on v1.4, and it failed each time, with 3 different errors: no wireless networks are found on the list, ateam SSID is not found, or the connection to the ateam fails.
See report below:
http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.mozilla-b2g30_v1_4.adhoc/10/HTML_Report/
I ran the same test locally several times and it passes each time.

Any updates on the wifi issues?
Flags: needinfo?(stephen.donner)
Flags: needinfo?(mozbugs.retornam)
(In reply to Viorela Ioia [:viorela] from comment #112)
> I started an adhoc run of one of the test that uses wifi connection on v1.4,
> and it failed each time, with 3 different errors: no wireless networks are
> found on the list, ateam SSID is not found, or the connection to the ateam
> fails.
> See report below:
> http://selenium.qa.mtv2.mozilla.com:8080/view/B2G%20Hamachi/job/b2g.hamachi.
> mozilla-b2g30_v1_4.adhoc/10/HTML_Report/
> I ran the same test locally several times and it passes each time.
> 
> Any updates on the wifi issues?

The network engineer we are working with in the Mountain View office is out this week. We'll have to pick this up next week.
Flags: needinfo?(mozbugs.retornam)
We are going to potentially be having Wi-Fi issues (esp. on master) for a bit, as bug 1001897 and bug 1011358 are still pending, and, as Raymond says, James Barnell was out most of last week, and will be back to work on this with us next week.
Flags: needinfo?(stephen.donner)
We had several wifi related failures in today's test run.

Build info:
Gaia      8f4201a44676eb70926a3d2645d94bf92fcd6718
Gecko     https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/4985bf79f425
BuildID   20140526000201
Version   30.0
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013

http://selenium.qa.mtv2.mozilla.com:8080/view/B2G/job/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/56/HTML_Report/
http://selenium.qa.mtv2.mozilla.com:8080/view/B2G/job/b2g.hamachi.mozilla-b2g30_v1_4.ui.non-smoketest/55/HTML_Report/

Traceback (most recent call last):
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 145, in run
self.setUp()
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/tests/python/gaia-ui-tests/gaiatest/tests/functional/ftu/test_ftu_skip_tour.py", line 92, in setUp
self.wait_for_condition(lambda m: self.data_layer.is_wifi_enabled)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 1058, in wait_for_condition
Wait(self.marionette, timeout).until(method, message=message)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/wait.py", line 122, in until
rv = condition(self.marionette)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/tests/python/gaia-ui-tests/gaiatest/tests/functional/ftu/test_ftu_skip_tour.py", line 92, in <lambda>
self.wait_for_condition(lambda m: self.data_layer.is_wifi_enabled)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/tests/python/gaia-ui-tests/gaiatest/gaia_test.py", line 254, in is_wifi_enabled
return self.marionette.execute_script("return window.navigator.mozWifiManager.enabled;")
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 1114, in execute_script
filename=os.path.basename(frame[0]))
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 613, in _send_message
self._handle_error(response)
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.smoketest/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette.py", line 648, in _handle_error
raise JavascriptException(message=message, status=status, stacktrace=stacktrace)
JavascriptException: JavascriptException: TypeError: window.navigator.mozWifiManager is null
stacktrace:
execute_script @gaia_test.py, line 254
inline javascript, line 1084
src: "let __marionetteFunc = function(){return window.navigator.mozWifiManager.enabled;};__marionetteFunc.apply(null, __marionetteParams);"


Traceback (most recent call last):
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.non-smoketest/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/marionette_test.py", line 163, in run
testMethod()
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.non-smoketest/tests/python/gaia-ui-tests/gaiatest/tests/functional/settings/test_settings_wifi.py", line 21, in test_connect_to_wifi_via_settings_app
wifi_settings.connect_to_network(self.testvars['wifi'])
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.non-smoketest/tests/python/gaia-ui-tests/gaiatest/apps/settings/regions/wifi.py", line 50, in connect_to_network
lambda m: m.find_element(*self._connected_message_locator).text == "Connected")
File "/var/jenkins/workspace/b2g.hamachi.mozilla-b2g30_v1_4.ui.non-smoketest/.env/local/lib/python2.7/site-packages/marionette_client-0.7.5-py2.7.egg/marionette/wait.py", line 143, in until
cause=last_exc)
TimeoutException: TimeoutException: Timed out after 60.1016900539 seconds
Team, we will update you as we have progress; no need to continue posting more stacktraces - they just add to the length of the bug.  We know there is an issue with, especially, the "ateam" Wi-Fi node, so we are looking at, now, understanding the difference between our new test "qalab" node and the problems with the "ateam" node.
Assignee: dlee → nobody
We haven't seen wifi issues for some time now.
I guess this is no longer an issue in our tests.
I am going to close this bug.
Thanks to all for the investigation!
Status: REOPENED → RESOLVED
Closed: 6 years ago5 years ago
Resolution: --- → FIXED
blocking-b2g: backlog → ---
You need to log in before you can comment on or make changes to this bug.