Closed Bug 1340004 Opened 7 years ago Closed 7 years ago

App on local arm device is ignoring Robocop:Quit during 'mach robocop'

Categories

(Firefox for Android Graveyard :: Testing, defect, P2)

defect

Tracking

(firefox54 affected)

RESOLVED DUPLICATE of bug 1346083
Tracking Status
firefox54 --- affected

People

(Reporter: twointofive, Assigned: gbrown)

References

Details

Attachments

(5 files, 1 obsolete file)

Running './mach robocop testPictureLinkContextMenu' on a local ARM device gives different results based on whether wifi is turned on or off (this phone doesn't have a data plan, so it's wifi or nothing).

(I presume (some?) other tests are affected similarly, but I haven't tested.)
Attached file Test output with wifi off (obsolete) —
With wifi off the test doesn't appear to run at all: fennec loads on screen and then within a second disappears.
With wifi on I'm seeing two different results.  In both cases it appears that the tests are running correctly, but the output still reports errors.
Here the final summary report actually reports all successes, but there's still a mozdevice timeout error and a lot of debug output.
The test harness - running on your laptop or whatever - needs to be on the same network as your test device - your phone or emulator. If you are testing with a phone, the phone and the laptop should be using wifi, and each should be ping-able from the other. You definitely need wifi on the phone.
(In reply to Tom Klein from comment #3)
> Created attachment 8837880 [details]
> Test output with wifi on: version 2
> 
> Here the final summary report actually reports all successes, but there's
> still a mozdevice timeout error and a lot of debug output.

I see:

mozdevice ERROR | Timeout exceeded for _checkCmd call 'adb shell am instrument -w -e quit_and_finish 1 -e deviceroot /storage/sdcard0/tests -e class org.mozilla.gecko.tests.testPictureLinkContextMenu org.mozilla.roboexample.test/org.mozilla.gecko.FennecInstrumentationTestRunner'

That just means that the test did not complete in the expected maximum time (about 5 minutes). That could be a bug in the test. Debug by looking at the phone to see how far it has gotten in the test and what it is waiting for. Or, get more information from logcat. Or figure out what that anr trace is saying (not by specialty!).
(In reply to Geoff Brown [:gbrown] from comment #4)
> The test harness - running on your laptop or whatever - needs to be on the
> same network as your test device - your phone or emulator. If you are
> testing with a phone, the phone and the laptop should be using wifi, and
> each should be ping-able from the other. You definitely need wifi on the
> phone.

Ah, I had forgotten about that (from the documentation: https://wiki.mozilla.org/Mobile/Fennec/Android/Testing#Test_Environment) since I've always used emulators in the past.  Thanks for the reminder.  Not sure what to change the bug title to yet :)
I tried running the entire test suite 'mach robocop'; it made it through 7 tests (all passing!) plus a final crashing one in 28 minutes (that seems slow?).  I noticed every test has an anr report, and that every anr report starts with the same
  ----- pid 6777 at 2016-11-28 10:59:57 -----
Note the old date from 2016.  Maybe this is an issue with the state of my device?  It's not rooted, so I don't know how to delete that file ... :/  I didn't notice any anr screens while the tests were running, but I wasn't watching closely.

I'll try running on an arm emulator to see if that works any better, though my experience has been that arm emulator on x86 device is agonizingly slow...
Here's the full logcat output after running testPictureLinkContextMenu on my phone.  The console test output is essentially the same as comment 3.  The logcat shows some of the same permissions errors as in bug 1340001 comment 6 (lines 1078-1079 and 1745-1786 for example).

Looking at the timing here, we have:
SimpleTest START: 22:51:47
SimpleTest FINISHED 22:55:57
FINISHED - START = 4:10
SIG 9 sent at 22:58:58, 3:01 later

So it looks like the test completes normally (as far as I can tell: lines 3408-3417) and we get a 'SimpleTest FINISHED' in the log, but then nothing happens until we get a timeout 3 minutes later and the browser shuts down with a SIG 9.  That matches what I see happening on screen; we're testing a bunch of menu stuff, then nothing for several minutes, and then fennec stops.

Looking at the log in bug 1340001 comment 6 I see the same pattern.
Attachment #8837876 - Attachment is obsolete: true
Woops, not sure what happened with those line numbers; I might have been in a different version.  The permissions errors  are 213,216 (search on browser.db) and starting at 234 (search on addons.xpi).
(In reply to Tom Klein from comment #8)
> Looking at the log in bug 1340001 comment 6 I see the same pattern.

Double woops: sorry, too many tests and log files on different devices and emulators floating around, please ignore that line (the log in that comment for the x86 emulator shows a different failure).
I started running 'mach mochitest' on my phone again, but stopped it after a dozen or so tests since I was seeing consistent results.  I watched logcat for the first five tests, and all of them showed the comment 8 behavior where the test seemingly runs normally to its finish, but then you see a 'I/BaseTest: Requesting force quit.' line which apparently doesn't work, and then 3 minutes later there's a SIG 9.  Also all of the tests where I checked had the permissions errors from comment 8.  As in comment 7 they all have the same 2016-dated anr report as well.

So to mini-summarize the issues I'm seeing running robocop tests on my phone:
1) Every test run that I've looked at has some permissions errors/warnings in logcat (comment 8 and comment 9).
2) Every test run that I've watched fails to 'force quit' and instead quits after it times out attempting the force quit, three minutes later (comment 8).  (That explains why all my 'mach robocop' runs have been so slow I guess.)
3) Every test I've looked at shows the same anr report from 2016 (comment 7).  (Wild guess: do we display an anr report as long as one exists at the end of a test, and maybe permissions issues are preventing deleting that 2016 report, so we just always display that old report for every test?)
(In reply to Tom Klein from comment #11)
> 2) Every test run that I've watched fails to 'force quit' and instead quits
> after it times out attempting the force quit, three minutes later (comment
> 8).  (That explains why all my 'mach robocop' runs have been so slow I
> guess.)

That's the most concerning issue to me. Compare to a normal robocop run in automation:

https://public-artifacts.taskcluster.net/W-KSIqpNQFaQi_u2G0fDDQ/0/public/test_info//logcat-emulator-5554.log

02-17 05:04:55.548   807   820 I Robocop : {"action":"log","message":"SimpleTest FINISHED","time":1487336695551,"pid":null,"level":"info","source":"robocop","thread":null}
02-17 05:04:55.557   807   820 I BaseTest: Requesting force quit.
02-17 05:04:55.607   807   807 D GeckoToolbar: onTabChanged: MENU_UPDATED
02-17 05:04:55.607   807   807 D GeckoBrowserApp: BrowserApp.onTabChanged: 0: MENU_UPDATED
02-17 05:04:55.657   807   882 D GeckoSuggestedSites: No cached sites, refreshing.
02-17 05:04:55.677   807   882 D GeckoSuggestedSites: Refreshing suggested sites from file
02-17 05:04:55.727   807   807 I MediaRouter: Found default route: MediaRouter.RouteInfo{ uniqueId=android/.support.v7.media.SystemMediaRouteProvider:DEFAULT_ROUTE, name=Phone, description=null, iconUri=null, enabled=true, connecting=false, connectionState=0, canDisconnect=false, playbackType=0, playbackStream=3, deviceType=0, volumeHandling=1, volume=11, volumeMax=15, presentationDisplayId=-1, extras=null, settingsIntent=null, providerPackageName=android }
02-17 05:04:55.837   807   827 I GeckoDump: Robocop:Quit received -- requesting quit
02-17 05:04:55.878   807   807 D PassiveStumbler: Stopping StumblerService | isDebug:false
02-17 05:04:55.897   807   807 D GeckoToolbar: onTabChanged: MENU_UPDATED
02-17 05:04:55.897   807   807 D GeckoBrowserApp: BrowserApp.onTabChanged: 0: MENU_UPDATED
02-17 05:04:55.897   807   895 D GeckoFeedService: Service started with action: org.mozilla.fennec.FEEDS.SETUP
02-17 05:04:55.897   807   895 D GeckoFeedService: Not in content notifications experiment. Skipping.
02-17 05:04:56.058   807   882 D GeckoSuggestedSites: Number of suggested sites: 8
02-17 05:04:56.128   807   821 I GeckoPushService: Handling event: PushServiceAndroidGCM:Uninitialized
02-17 05:04:56.208   807   827 I nsScreenManagerAndroid: nsWindow[0x54395250]::Show 0
02-17 05:04:57.207   807   885 D GeckoSuggestedSites: Number of suggested sites: 8
02-17 05:05:02.357   807   827 I nsScreenManagerAndroid: nsWindow[0x54395f70]::Show 0
02-17 05:05:02.357   807   827 I nsScreenManagerAndroid: trying to show invisible window! ignoring..
02-17 05:05:03.707   807   827 I nsScreenManagerAndroid: nsWindow 0x54395f70 destructor
02-17 05:05:05.998   807   872 I Gecko   : 
02-17 05:05:05.998   807   872 I Gecko   : ###!!! [Parent][MessageChannel] Error: (msgtype=0x3E003D,name=PCompositorBridge::Msg_ReleaseSharedCompositorFrameMetrics) Closed channel: cannot send/recv
02-17 05:05:05.998   807   872 I Gecko   : 
02-17 05:05:05.998   807   872 I Gecko   : 
02-17 05:05:05.998   807   872 I Gecko   : ###!!! [Parent][MessageChannel] Error: (msgtype=0x3E003D,name=PCompositorBridge::Msg_ReleaseSharedCompositorFrameMetrics) Closed channel: cannot send/recv
02-17 05:05:05.998   807   872 I Gecko   : 

Note especially:

02-17 05:04:55.557   807   820 I BaseTest: Requesting force quit.
...
02-17 05:04:55.837   807   827 I GeckoDump: Robocop:Quit received -- requesting quit

In the logcat from your phone, I see:

02-16 22:55:57.842 9694-9720/org.mozilla.fennec_klein I/BaseTest: Requesting force quit.

but nothing about Robocop:Quit being received.


:jchen -- Thoughts on what might be going wrong with Robocop:Quit or what Tom can do to debug this further?

https://dxr.mozilla.org/mozilla-central/rev/0eef1d5a39366059677c6d7944cfe8a97265a011/mobile/android/tests/browser/robocop/src/org/mozilla/gecko/tests/BaseRobocopTest.java#220
Flags: needinfo?(nchen)
Robocop:Quit is implemented in roboextender. I think somehow roboextender is not being loaded.
Flags: needinfo?(nchen)
I was able to bisect in m-c to find last good is part one of bug 1328301 (e78066796efe), first bad is part two (92f215ed4df3).  I've verified both the good and the bad commit on multiple tests: testPictureLinkContextMenu and then the first couple tests run by 'mach robocop' (so by multiple I mean three).

Bob, any ideas what might be causing this?  See comment 12 for the issue.

My phone is a Galaxy S4 at 5.0.1, adb version is 1.0.36 (but see my next comment for some context on that).

(The errors in comment 7 1) now seem to be unrelated to the Quit issue: I've seen them occur on both good and failing commits, and I've seen them show up and not show up for a given commit which shows consistent pass/fail behavior in terms of quitting correctly or incorrectly.  Also comment 7 3) occurs even on good commits, so that does appear to be an unrelated issue.)
Depends on: 1328301
Flags: needinfo?(bob)
Summary: Local 'mach robocop' wifi-related issues on arm device → App on local arm device is ignoring Robocop:Quit during 'mach robocop'
As I mentioned, my adb version is 1.0.36, but once I discovered that might be an issue, I looked into where adb was coming from and found that I had not one, not two, not three, but four different SDK directories containing an adb: ~/Android, ~/.android-sdk, ~/.mozbuild/and, and ~/.mozbuild/android-sdk-linux.

I think the first two were installed when I installed Studio (or by/via Studio at some point), and I guess the last two were installed by mach bootstrap ('and' looks to be older).  It turns out I had Studio's SDK location set to ~/.android-sdk, but adb in my path was pointing to ~/.mozbuild/android-sdk-linux/platform-tools/adb.  ~/.android-sdk/platform-tools/adb version is 1.0.32, so if different parts of the build/test process were somehow using different versions of adb (is that possible?), then I could see there being issues...

But then I reset Studio's SDK to point to ~/.mozbuild/android-sdk-linux and moved the other three SDK directories to new locations (outside my PATH), and then rebuilt everything, did 'adb kill-server', and reran everything, and got the same results.  So as far as I can tell there should only be one possible SDK folder and one possible adb that are reachable.  But if anyone thinks this still might just be a configuration issue and has any suggestions, I'd be happy to try out other suggestions.
mid-air!

Tom: You say in comment 7 that your phone is not rooted. That may be the issue right there. The only locations in https://hg.mozilla.org/mozilla-central/rev/92f215ed4df3 where I would expect root status to come into play is where it checks for self.dirExists(remoteDir) which if we are not running as root may not detect the remoteDir and the actual push/pull to remoteDir. Due to the way I work around the changed semantics in adb 1.0.36, we can push to the remoteDir's parent which may be a problem.

It sounds like you do you have an old copy of adb 1.0.32 laying around that you can put in your path in front of adb 1.0.36. Try that and see if that makes a difference? If it doesn't then it isn't the adb 1.0.36 patch.

gbrown: How can we get the logger output from a mochitest/robocop run? I've always had issues not being able to see the actual INFO and WARNING output from the logger.
Flags: needinfo?(bob)
So I ran 'mach robocop testPictureLinkContextMenu' on 92f215ed4df3 (part 2) with adb version 1.0.32, then with 1.0.36, then with 1.0.32 again.  On both 1.0.32 runs I got the same results (test output and logcat attached) - it looks like as soon as the test tries to load a page it gets:

02-18 22:50:46.082 6164-6244/org.mozilla.fennec_klein I/Gecko: FATAL ERROR: Non-local network connections are disabled and a connection attempt to mochi.test (198.105.244.130) was made.
                                                               You should only access hostnames available via the test networking proxy (if running mochitests) or from a test-specific httpd.js server (if running xpcshell tests). Browser services should be disabled or redirected to a local server.
02-18 22:50:46.082 6164-6244/org.mozilla.fennec_klein A/MOZ_CRASH: Hit MOZ_CRASH(Attempting to connect to non-local address!) at /home/worker/workspace/build/src/netwerk/base/nsSocketTransport2.cpp:1279

When I run with version 1.0.36 I get the result I'm used to (it runs the test but then Quit doesn't quit).

Then I tried e78066796efe (part 1) with adb version 1.0.32 and everything worked, no networking error, no Quit error.  So maybe there's also a 1.0.32 issue with unrooted phones with the part 2 patch?
Part 1 of that bug is for adb*.py which is not used by the unit tests so it shouldn't have any effect.

The fact that you get the Non-local crash probably means the prefs aren't being set in the profile which would be consistent with the push failing. That may be what 0:18.23 LOG: MainThread INFO Unable to copy remote profile; falling back to push is telling us. Unfortunately, we

I'm not sure what to tell you. We pretty much require the device be rooted. You can work around the problem by reverting the patch and only using adb 1.0.32, but I'm not sure if other issues would arise.
(In reply to Bob Clary [:bc:] from comment #18)
> We pretty much require the device be rooted.

Ah, okay.  Could the doc at https://wiki.mozilla.org/Mobile/Fennec/Android/Testing#robocop_UI_tests be updated then?  It says "A rooted device is NOT required." :)  (https://wiki.mozilla.org/Mobile/Fennec/Android/Testing#Running_tests_on_the_Android_emulator seems out of date as well, referring to API11.)
Let's hold off until some other people can look at this. Perhaps we can get some help to not lose the ability to run unrooted. Most people in the US and Canada are off until Tuesday. Let's touch base then and revisit.
gbrown: I think at least part of the problem may due to the way I push a local directory to the parent directory on the device to get around the changed semantics of adb push. We might be able to change that but it's not clear to me that we ever really supported unrooted devices especially with more modern releases of Android where the permissions are even more locked down.

Do you think that if we work around this issue there will be other issues which arise on unrooted devices? I hate to tell Tom that we can't help him but I don't have the time to try to solve this long standing problem on an ad hoc, whack a mole approach. Patches accepted of course.
Flags: needinfo?(gbrown)
(In reply to Tom Klein from comment #19)
> (In reply to Bob Clary [:bc:] from comment #18)
> > We pretty much require the device be rooted.
> 
> Ah, okay.  Could the doc at
> https://wiki.mozilla.org/Mobile/Fennec/Android/Testing#robocop_UI_tests be
> updated then?  It says "A rooted device is NOT required." :) 
> (https://wiki.mozilla.org/Mobile/Fennec/Android/
> Testing#Running_tests_on_the_Android_emulator seems out of date as well,
> referring to API11.)

Thanks - I updated the API levels on the wiki and changed the rooting advice to "A rooted device is recommended, but may not be required. Test harnesses may need to kill processes, copy and delete files, or perform other operations which may require special permissions on some devices." I don't think we can be more definitive than that, in general.

Still looking into some other issues...
https://hg.mozilla.org/mozilla-central/rev/a9b6d5dfdfa7 may have helped here. Tom, could you try running some tests again, against a current mozilla-central revision?
Flags: needinfo?(gbrown)
Assignee: nobody → gbrown
Priority: -- → P2
(In reply to Geoff Brown [:gbrown] from comment #23)
> https://hg.mozilla.org/mozilla-central/rev/a9b6d5dfdfa7 may have helped
> here. Tom, could you try running some tests again, against a current
> mozilla-central revision?

Much better now :D  I let it run the full suite and as far as I can tell everything was running fine and on time - the couple tests I peeked at in logcat included the "Robocop:Quit received -- requesting quit" in response to the force quit request.  I checked after about half an hour and the usb connection had been lost (no devices from 'adb devices'), but 157 tests had completed in 27 minutes before the lost connection, which seems like it was on pace to finish the full suite in the expected amount of time.

I was a little surprised at first because there's no indication in the summary report that it hadn't run all the tests; if I wasn't paying attention it would be easy to miss that.  Like a fool I didn't pipe the output for saving, but here's the bottom in case you're interested (all the failing tests here have always been permafail for me locally):

[end of huge ANR report (I still get that on every test)]
Error deleting /data/anr/traces.txt
MOZ_UPLOAD_DIR not defined; tombstone check skipped
Traceback (most recent call last):
  File "/mnt/h2/klein/sandbox/ff/objdir-frontend-arm/_tests/testing/mochitest/runrobocop.py", line 564, in run_test_harness
    runResult = robocop.runTests()
  File "/mnt/h2/klein/sandbox/ff/objdir-frontend-arm/_tests/testing/mochitest/runrobocop.py", line 523, in runTests
    result = self.runSingleTest(test)
  File "/mnt/h2/klein/sandbox/ff/objdir-frontend-arm/_tests/testing/mochitest/runrobocop.py", line 426, in runSingleTest
    self.setupRemoteProfile()
  File "/mnt/h2/klein/sandbox/ff/objdir-frontend-arm/_tests/testing/mochitest/runrobocop.py", line 271, in setupRemoteProfile
    self.dm.pushDir(self.localProfile, self.remoteProfile)
  File "/mnt/h2/klein/sandbox/ff/mozilla-central/testing/mozbase/mozdevice/mozdevice/devicemanagerADB.py", line 306, in pushDir
    (newLocal, localDir, remoteDir))
DMError: failed to push /tmp/tmpB1E6qS/profile (copy of /tmp/tmpL6WaRj.mozrunner) to /storage/sdcard0/tests
27:07.68 LOG: MainThread ERROR runrobocop.py | Received unexpected exception while running tests
27:07.68 LOG: MainThread INFO Stopping web server
27:07.71 LOG: MainThread INFO Stopping web socket server
27:07.73 LOG: MainThread INFO Stopping ssltunnel
27:08.06 LOG: MainThread INFO Buffered messages finished
27:08.06 SUITE_END: MainThread 
Summary
=======

Ran 157 tests (37 parents, 120 subtests)
Expected results: 125
Unexpected results: 12 (FAIL: 12)

Unexpected Results
==================

testAddSearchEngine
-------------------
FAIL The action mode was opened
FAIL Exception caught
testAddonManager
----------------
FAIL Waiting for menu item ^Add-ons$
FAIL Exception caught
testAdobeFlash
--------------
FAIL Pixel at 0, 0
FAIL Exception caught
testBookmarkFolders
-------------------
FAIL Checking that the correct tab is displayed
FAIL Exception caught
testBookmarksPanel
------------------
FAIL Checking that the correct tab is displayed
FAIL Exception caught
testBrowserProvider - TestInsertUrlAnnotations
----------------------------------------------
FAIL Test org.mozilla.gecko.tests.testBrowserProvider$TestInsertUrlAnnotations threw exception: java.lang.SecurityException: The authority of the uri content://org.mozilla.fennec_klein.db.browser/urlannotations?profile= does not match the one of the contentProvider: null
FAIL Exception caught
(In reply to Tom Klein from comment #24)
> Much better now :D

That's excellent! Thanks for checking, Tom.

>  I let it run the full suite and as far as I can tell
> everything was running fine and on time - the couple tests I peeked at in
> logcat included the "Robocop:Quit received -- requesting quit" in response
> to the force quit request.

Good. I think bug 1346083 was the key: our code to push directories to the device was faulty. That may have been silently failing to create a proper test profile on your device.

>  I checked after about half an hour and the usb
> connection had been lost (no devices from 'adb devices')

That's a little troubling. I wonder if some Android service crashed. Running the entire robocop suite in one go may be a bit ambitious. It ought to work, but of course the continuous integration tests break up the suite into 4 chunks, and I don't know how many developers would wait over an hour to run all the tests, and of course this is your device...so you are in uncharted territory.

> I was a little surprised at first because there's no indication in the
> summary report that it hadn't run all the tests; if I wasn't paying
> attention it would be easy to miss that.

You did get:

> 27:07.68 LOG: MainThread ERROR runrobocop.py | Received unexpected exception while running tests

I hear you, that's easy to miss, but that's all I would expect.


I'm not surprised that you are seeing some individual test failures. There are a lot of known bugs in robocop tests, which we have "managed" largely by skipping them with robocop.ini "skip-if = android_version == '18'" entries. I see that several of your test failures have open bugs and are skipped in continuous integration Android 4.3 emulator tests.


It sounds like you are able to run robocop tests on your device now and the issues originally reported are resolved, so I'm going to close this bug. Please re-open if I've misinterpreted, or if those problems return.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: