Closed Bug 756440 Opened 7 years ago Closed 7 years ago

Intermittent Android robocop or mochitest TEST-UNEXPECTED-FAIL | DeviceManager: pulling file '/mnt/sdcard/tests/logs/mochitest.log' unsuccessful: No such file or directory | Exception caught while running (robocop) tests

Categories

(Testing :: General, defect)

ARM
Android
defect
Not set

Tracking

(firefox18 fixed)

RESOLVED FIXED
mozilla19
Tracking Status
firefox18 --- fixed

People

(Reporter: emorley, Assigned: wlach)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

Android Tegra 250 mozilla-inbound opt test robocop on 2012-05-18 05:21:32 PDT for push 1d43642295b1

slave: tegra-273

https://tbpl.mozilla.org/php/getParsedLog.php?id=11860773&tree=Mozilla-Inbound

{
INFO | runtests.py | Running tests: start.

FIRE PROC: '"MOZ_CRASHREPORTER=1,XPCOM_DEBUG_BREAK=stack,MOZ_CRASHREPORTER_NO_REPORT=1,NO_EM_RESTART=1,MOZ_PROCESS_LOG=/tmp/tmp1rsn_Lpidlog,XPCOM_MEM_BLOAT_LOG=/tmp/tmp5zkYpC/runtests_leaks.log" am instrument -w -e class org.mozilla.fennec.tests.testAwesomebar org.mozilla.roboexample.test/android.test.InstrumentationTestRunner'
INFO | automation.py | Application pid: 0
DeviceManager: error pulling file '/mnt/sdcard/tests/logs/mochitest.log': No such file or directory

INFO | automation.py | Application ran for: 0:00:51.691433
INFO | automation.py | Reading PID log: /tmp/tmp1rsn_Lpidlog
getting files in '/mnt/sdcard/tests/profile/minidumps/'
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!

INFO | runtests.py | Running tests: end.
DeviceManager: error pulling file '/mnt/sdcard/tests/logs/mochitest.log': No such file or directory
removing file: /mnt/sdcard/tests/logs/mochitest.log
TEST-UNEXPECTED-FAIL | [Errno 2] No such file or directory: 'mochitest.log' | Exception caught while running robocop tests.
}
https://tbpl.mozilla.org/php/getParsedLog.php?id=15827298&tree=Mozilla-Inbound
tegra-102
Summary: Intermittent Android robocop TEST-UNEXPECTED-FAIL | [Errno 2] No such file or directory: 'mochitest.log' | Exception caught while running robocop tests. → Intermittent Android robocop TEST-UNEXPECTED-FAIL | DeviceManager: pulling file '/mnt/sdcard/tests/logs/mochitest.log' unsuccessful: No such file or directory | Exception caught while running robocop tests
Summary: Intermittent Android robocop TEST-UNEXPECTED-FAIL | DeviceManager: pulling file '/mnt/sdcard/tests/logs/mochitest.log' unsuccessful: No such file or directory | Exception caught while running robocop tests → Intermittent Android robocop or mochitest TEST-UNEXPECTED-FAIL | DeviceManager: pulling file '/mnt/sdcard/tests/logs/mochitest.log' unsuccessful: No such file or directory | Exception caught while running (robocop) tests
Hey Joel/Team,

I'm told this is near-perma-orange for Android/Opt right now... If there is something relatively obvious that we can try to rectify this, I'll love to know.

Just to be clear, we're seeing:

getting files in '/mnt/sdcard/tests/profile/minidumps/'
INFO | runtests.py | Received unexpected exception while running application
Traceback (most recent call last):
  File "/builds/tegra-178/test/build/tests/mochitest/runtests.py", line 731, in runTests
    timeout = timeout)
  File "/builds/tegra-178/test/build/tests/mochitest/automation.py", line 1020, in runApp
    self.checkForCrashes(profileDir, symbolsPath)
  File "/builds/tegra-178/test/build/tests/mochitest/remoteautomation.py", line 78, in checkForCrashes
    self._devicemanager.getDirectory(self._remoteProfile + '/minidumps/', dumpDir)
  File "/builds/tegra-178/test/build/tests/mochitest/devicemanagerSUT.py", line 668, in getDirectory
    remoteDir)
DMError: Automation Error: Error getting directory: /mnt/sdcard/tests/profile/minidumps/ not a directory
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!

INFO | runtests.py | Running tests: end.
TEST-UNEXPECTED-FAIL | DeviceManager: pulling file '/mnt/sdcard/tests/logs/mochitest.log' unsuccessful: No such file or directory | Exception caught while running robocop tests.
program finished with exit code 1
Hmm, looks like a pretty obvious problem with the merge of bug 795456. Not sure why this wasn't caught earlier by all the try stuff I was doing. I guess we only check for crashes in very specific circumstances? Anyway, have to run now, but I'll do up a patch later today and run it through try.
Ok, on further inspection the problems getting the minidumps was a bit of a red herring. This directory should always exist so long as we create the profile, but it looks like in at least of the last orange that wasn't happening, so we were throwing an exception when that happened. 

What I suspect is really going on is that the app isn't starting properly at all for some reason. If you look closely at the last log file (https://bugzilla.mozilla.org/show_bug.cgi?id=756440#c131), you'll see that fennec was only actually running for about 3 seconds. Clearly something is going very very wrong on startup, but we'll need something like a logcat to better diagnose what that is. I think the right thing here is just to print a warning if this directory doesn't exist. It's an indication that's something wrong, but not a definite problem.

The other exception that we're seeing in the log about being unable to retrieve the log file is almost certainly the same issue: we didn't get far enough with running the app to actually write anything to the remote log, so we're throwing an exception when we try to get it. I think the right thing here is similar: print a warning that we had a problem here, but don't outright fail. It's likely the real problem is elsewhere.

I'll attach patches to behave better in the two cases I mention, but we still need to spend some time finding the underlying cause...
See patch for further details on what's going on here.
Attachment #671117 - Flags: review?(jmaher)
I worry that there might be some side effect to what I'm doing here that I don't understand. However, I think the overall idea here is right.

(as before, more detail in the patch and in the bug)
Attachment #671118 - Flags: review?(jmaher)
(In reply to William Lachance (:wlach) from comment #138)

> I'll attach patches to behave better in the two cases I mention, but we
> still need to spend some time finding the underlying cause...

On that note, I'm looking at the source code to runtestsremote.py (mochitest remote) and am seeing quite a few things that are standing in the way of such diagnose. We don't collect/print the logcat in the case of failure, for example. We should try to fix that.
Comment on attachment 671117 [details] [diff] [review]
Only try to download crash directory on device if it exists

Review of attachment 671117 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/mobile/remoteautomation.py
@@ +85,5 @@
> +            except:
> +                print "WARNING: unable to remove directory: %s" % dumpDir
> +        else:
> +            print "WARNING: No crash directory (%s) on remote " \
> +                "device" % remoteCrashDir

i don't think we need a message here.
Attachment #671117 - Flags: review?(jmaher) → review+
Comment on attachment 671118 [details] [diff] [review]
Only warn when we fail cleanup steps with mobile mochitest/reftest

Review of attachment 671118 [details] [diff] [review]:
-----------------------------------------------------------------

thanks!!
Attachment #671118 - Flags: review?(jmaher) → review+
Did a try run over the weekend, looks good except for some unrelated oranges:

https://tbpl.mozilla.org/?tree=Try&rev=63ff0b435078
(In reply to Joel Maher (:jmaher) from comment #165)
> Comment on attachment 671117 [details] [diff] [review]
> Only try to download crash directory on device if it exists

> ::: build/mobile/remoteautomation.py
> @@ +85,5 @@
> > +            except:
> > +                print "WARNING: unable to remove directory: %s" % dumpDir
> > +        else:
> > +            print "WARNING: No crash directory (%s) on remote " \
> > +                "device" % remoteCrashDir
> 
> i don't think we need a message here.

On IRC we agreed that since the crash directory should be present on normal startup (even if no crashes occurred), we should still warn in this circumstance
So I talked a bit with :jmaher on irc about determining the "root cause", and it came up that fennec not starting up properly is *not* a new problem, just the particular orange messages we're seeing in this bug. Thus it's probably ok to mark this one as fixed-- we'll still error out in this case, and we'll star oranges under other existing bugs which correspond to this problem.

Inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/1a7ebe9779c5
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ec5186031b0
https://hg.mozilla.org/mozilla-central/rev/1a7ebe9779c5
https://hg.mozilla.org/mozilla-central/rev/9ec5186031b0
Assignee: nobody → wlachance
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla19
https://tbpl.mozilla.org/php/getParsedLog.php?id=16207019&tree=Mozilla-Inbound

(In reply to William Lachance (:wlach) from comment #169)
> So I talked a bit with :jmaher on irc about determining the "root cause",
> and it came up that fennec not starting up properly is *not* a new problem,
> just the particular orange messages we're seeing in this bug. Thus it's
> probably ok to mark this one as fixed-- we'll still error out in this case,
> and we'll star oranges under other existing bugs which correspond to this
> problem.

Maybe I would, except that when other things hit bug 722166, they don't have a big fat TEST-UNEXPECTED-FAIL pointing right square at this bug, and instead they have logcat making me feel slightly guilty about not trying to read and interpret it.
Can we request Aurora uplift for this please?
(In reply to Ryan VanderMeulen from comment #208)
> Can we request Aurora uplift for this please?

a=test-only

https://hg.mozilla.org/releases/mozilla-aurora/rev/639ca47ee94c
https://hg.mozilla.org/releases/mozilla-aurora/rev/58b9e97c5b2d
Whiteboard: [orange]
Component: New Frameworks → General
You need to log in before you can comment on or make changes to this bug.