Closed
Bug 719377
Opened 13 years ago
Closed 12 years ago
Android reftests fail with a bunch of "error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory"
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: mak, Unassigned)
References
Details
(Keywords: intermittent-failure)
Attachments
(1 file)
1.14 KB, patch, jmaher: review+
https://tbpl.mozilla.org/php/getParsedLog.php?id=8662106&tree=Mozilla-Inbound
pushing directory: /tmp/tmpwo0Ora to /mnt/sdcard/tests/reftest/profile
pushing directory: /tmp/tmpwo0Ora to /mnt/sdcard/tests/reftest/profile
REFTEST INFO | runreftest.py | Running tests: start.
FIRE PROC: '"MOZ_CRASHREPORTER=1,XPCOM_DEBUG_BREAK=stack,MOZ_CRASHREPORTER_NO_REPORT=1,NO_EM_RESTART=1,MOZ_PROCESS_LOG=/tmp/tmpZ5RpHmpidlog,XPCOM_MEM_BLOAT_LOG=/tmp/tmpwo0Ora/runreftest_leaks.log" org.mozilla.fennec -no-remote -profile /mnt/sdcard/tests/reftest/profile/'
INFO | automation.py | Application pid: 1489
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
...
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
INFO | automation.py | Application ran for: 1:01:20.529375
INFO | automation.py | Reading PID log: /tmp/tmpZ5RpHmpidlog
getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/'
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!
REFTEST INFO | runreftest.py | Running tests: end.
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
program finished with exit code 0
elapsedTime=3688.847225
TinderboxPrint: jsreftest-1<br/><em class="testfail">T-FAIL</em>
Comment 1•13 years ago
Comment 2•13 years ago
Comment 3•13 years ago
Comment 4•13 years ago
Comment 5•13 years ago
Comment 6•13 years ago
It seems that sometimes we can't pull back the reftest log containing the results.
For releng, this means we should have turned the job purple.
The log is pulled back here [1]:
    if self.remoteLogFile:
        self._devicemanager.getFile(self.remoteLogFile, self.localLogName)
Perhaps wrap that in a try/except and call sys.exit(5)?
[1] http://mxr.mozilla.org/mozilla-central/source/layout/tools/reftest/remotereftest.py#370
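A minimal sketch of that suggestion, for illustration only. It assumes, as the repeated "error pulling file" lines suggest, that a failed pull prints an error and returns None rather than raising, and it also catches exceptions in case getFile raises instead. `FakeDeviceManager` and `pull_reftest_log` are hypothetical names, the error messages are made up, and exit code 5 is simply the value proposed above.

```python
import sys


class FakeDeviceManager(object):
    """Hypothetical stand-in for DeviceManager, for illustration only.

    Assumption: getFile() prints an error and returns None when the remote
    file does not exist, instead of raising.
    """

    def __init__(self, remote_files):
        self.remote_files = remote_files  # remote path -> file contents
        self.pulled = {}                  # local path -> contents pulled

    def getFile(self, remoteFile, localFile):
        if remoteFile not in self.remote_files:
            print("DeviceManager: error pulling file '%s': "
                  "No such file or directory" % remoteFile)
            return None
        self.pulled[localFile] = self.remote_files[remoteFile]
        return self.remote_files[remoteFile]


def pull_reftest_log(dm, remoteLogFile, localLogName):
    """Pull the remote reftest log; exit with status 5 if it cannot be fetched."""
    try:
        data = dm.getFile(remoteLogFile, localLogName)
    except Exception as e:  # in case a DeviceManager raises instead of returning None
        print("ERROR: could not pull %s: %s" % (remoteLogFile, e))
        sys.exit(5)
    if data is None:
        print("ERROR: %s is missing on the device" % remoteLogFile)
        sys.exit(5)
    return data
```

Whether a non-zero exit code actually changes the job color depends on how buildbot evaluates the step, which is discussed further down in this bug.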
python reftest/remotereftest.py --deviceIP 10.250.49.12 --xre-path ../hostutils/xre --utility-path ../hostutils/bin --app org.mozilla.fennec --http-port 30025 --ssl-port 31025 --pidfile /builds/tegra-025/test/../remotereftest.pid --enable-privilege --bootstrap --total-chunks 3 --this-chunk 1 reftest/tests/testing/crashtest/crashtests.list --symbols-path=../http://stage.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1331649309/fennec-13.0a1.en-US.android-arm.crashreporter-symbols.zip
in dir /builds/tegra-025/test/build/tests (timeout 2400 secs)
watching logfiles {}
argv: ['python', 'reftest/remotereftest.py', '--deviceIP', '10.250.49.12', '--xre-path', '../hostutils/xre', '--utility-path', '../hostutils/bin', '--app', 'org.mozilla.fennec', '--http-port', '30025', '--ssl-port', '31025', '--pidfile', '/builds/tegra-025/test/../remotereftest.pid', '--enable-privilege', '--bootstrap', '--total-chunks', '3', '--this-chunk', '1', 'reftest/tests/testing/crashtest/crashtests.list', '--symbols-path=../http://stage.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/mozilla-central-android/1331649309/fennec-13.0a1.en-US.android-arm.crashreporter-symbols.zip']
environment:
MINIDUMP_SAVE_PATH=/builds/tegra-025/test/minidumps
MINIDUMP_STACKWALK=/builds/tegra-025/test/tools/breakpad/osx/minidump_stackwalk
PATH=/opt/local/bin:/opt/local/sbin:/opt/local/Library/Frameworks/Python.framework/Versions/2.6/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
PWD=/builds/tegra-025/test/build/tests
SUT_IP=10.250.49.12
SUT_NAME=tegra-025
__CF_USER_TEXT_ENCODING=0x1F6:0:0
closing stdin
using PTY: False
unable to execute ADB: ensure Android SDK is installed and adb is in your $PATH
restarting as root failed
reconnecting socket
args: ['../hostutils/bin/xpcshell', '-g', '/builds/tegra-025/test/build/hostutils/xre', '-v', '170', '-f', '/builds/tegra-025/test/build/tests/reftest/reftest/components/httpd.js', '-e', "const _PROFILE_PATH = '/tmp/tmpMjgE9k';const _SERVER_PORT = '30025'; const _SERVER_ADDR ='10.250.48.200';", '-f', '/builds/tegra-025/test/build/tests/reftest/server.js']
INFO | remotereftests.py | Server pid: 25579
pushing directory: /tmp/tmplmBwig to /mnt/sdcard/tests/reftest/profile
pushing directory: /tmp/tmplmBwig to /mnt/sdcard/tests/reftest/profile
REFTEST INFO | runreftest.py | Running tests: start.
FIRE PROC: '"MOZ_CRASHREPORTER=1,XPCOM_DEBUG_BREAK=stack,MOZ_CRASHREPORTER_NO_REPORT=1,NO_EM_RESTART=1,MOZ_PROCESS_LOG=/tmp/tmpFdkHrBpidlog,XPCOM_MEM_BLOAT_LOG=/tmp/tmplmBwig/runreftest_leaks.log" org.mozilla.fennec -no-remote -profile /mnt/sdcard/tests/reftest/profile/'
INFO | automation.py | Application pid: 1539
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
...
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
INFO | automation.py | Application ran for: 1:01:19.236892
INFO | automation.py | Reading PID log: /tmp/tmpFdkHrBpidlog
getting files in '/mnt/sdcard/tests/reftest/profile/minidumps/'
WARNING | automationutils.processLeakLog() | refcount logging is off, so leaks can't be detected!
REFTEST INFO | runreftest.py | Running tests: end.
DeviceManager: error pulling file '/mnt/sdcard/tests/reftest/reftest.log': No such file or directory
Comment 7•13 years ago
Attachment #605744 - Flags: review?(jmaher)
Comment 8•13 years ago
Comment on attachment 605744
try exception for missing remote reftest.log
looks good.
I think we can optimize our tests a bit more: if we don't get a log file within 10 minutes, we fail. That way we don't spend an hour waiting for nothing. Also, a maximum timeout of 60 minutes doesn't seem useful; since the tests are split into chunks, we should make it a 30-minute limit. Maybe that belongs in another bug.
Attachment #605744 - Flags: review?(jmaher) → review+
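A minimal sketch of the early-exit idea from this review, assuming the harness could be given two hypothetical callables: `pull_log()`, returning the log text or None, and `is_finished()`, reporting whether the browser process has exited. The 10-minute no-log limit and 30-minute overall limit are the values suggested above; the 60-second polling interval is an arbitrary assumption. The clock and sleep functions are injectable so the logic can be tested without waiting.

```python
import time


def wait_for_reftest_log(pull_log, is_finished,
                         no_log_timeout=10 * 60,
                         total_timeout=30 * 60,
                         poll_interval=60,
                         clock=time.monotonic,
                         sleep=time.sleep):
    """Poll for the remote reftest log and give up early if it never appears.

    pull_log() -> str or None   (hypothetical: None means the file is missing)
    is_finished() -> bool       (hypothetical: True once the browser has exited)
    Returns "ok", "no-log" or "timeout".
    """
    start = clock()
    seen_log = False
    while True:
        if pull_log() is not None:
            seen_log = True
        if is_finished():
            return "ok" if seen_log else "no-log"
        elapsed = clock() - start
        # No log at all after the short deadline: stop wasting device time.
        if not seen_log and elapsed >= no_log_timeout:
            return "no-log"
        # Overall cap for a run that is producing output but never finishes.
        if elapsed >= total_timeout:
            return "timeout"
        sleep(poll_interval)
```

Keeping the no-log deadline separate from the overall limit means a run that is producing output is not cut off at 10 minutes.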
Updated•13 years ago
Whiteboard: [orange] → [orange][autoland-try: -b do -p android,android-xul -u all -t all]
Comment 9•13 years ago
philor, how often do we see the reftest.log missing problem? Do you know?
Whiteboard: [orange][autoland-try: -b do -p android,android-xul -u all -t all] → [orange][autoland-try:-b do -p android,android-xul -u all -t all]
Comment 10•13 years ago
Autoland Failure
There are no patches to run.
Updated•13 years ago
Whiteboard: [orange][autoland-try:-b do -p android,android-xul -u all -t all] → [orange]
Updated•13 years ago
Attachment #605744 - Attachment is patch: true
Updated•13 years ago
Whiteboard: [orange] → [orange][autoland-try:-b do -p android,android-xul -u all -t all]
Updated•13 years ago
Whiteboard: [orange][autoland-try:-b do -p android,android-xul -u all -t all] → [orange][autoland-in-queue]
Comment 11•13 years ago
Autoland Patchset:
Patches: 605744
Branch: mozilla-central => try
Destination: http://hg.mozilla.org/try/pushloghtml?changeset=062184b8c098
Try run started, revision 062184b8c098. To cancel or monitor the job, see: https://tbpl.mozilla.org/?tree=Try&rev=062184b8c098
Comment 12•13 years ago
(In reply to Armen Zambrano G. [:armenzg] - Release Engineer from comment #9)
> philor, how often do we see the reftest.log missing problem? do you know?
With code from Sunday, or with code from Monday?
That's just the reftest-harness flavor of bug 722166: either the browser doesn't start, or it crashes on startup before we notice it crashed. Something that landed on inbound on Monday made us hit that around 2-5 times per run (plus a few more in the Talos runs, which fail with their own messages).
Comment 13•13 years ago
We should turn the job red. I believe these jobs are orange, which is not good at all.
> reftest-1: T-FAIL
> program finished with exit code 0
Orange :S ==> https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=beb93f812874&jobname=Android Tegra 250 mozilla-inbound opt test reftest-1
See Also: → 722166
Comment 14•13 years ago
So this shouldn't be red, because the cause can be:
* browser hanging
* browser not starting properly
* losing network connectivity
I suspect network connectivity is a big offender here, and that would be a 'red' offense, but the others should be orange.
Comment 15•13 years ago
Even if it is red, the developer has no log or minidumps to fix it with.
My patch would turn it purple, which means an infrastructure problem and a retry.
Perhaps that is incorrect, since we might get into an infinite loop of retried jobs if the crash is permanent.
jmaher, is there a bug filed where we can handle these crashes better and recover those logs and minidumps?
IIUC, that is the root problem, and we're just putting make-up on the corpse.
Comment 16•13 years ago
We rarely get minidumps; I would say in about 1 in 1000 failures. There really is no log file: if one exists, we usually manage to pull it.
For example, if the test stops halfway through, we get this message a lot, but the tbpl log up to that point already contains the contents of reftest.log, and there is no other information. If the run shows no output other than 'fire proc' and 'unable to find file', then there is no file.
I would like to detect a missing log sooner and terminate sooner to free up tegra time!
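The distinction drawn here (output after launch means a partial log existed; nothing but pull errors means no log at all) can be expressed as a small post-processing check over the harness output. This is a hypothetical helper, not harness code: it assumes the line prefixes seen in the logs pasted in this bug ("FIRE PROC:", "Application pid:", the DeviceManager error, "Application ran for:") and treats any other line during the run as evidence that reftest.log did exist.

```python
import re

# The error the harness printed on every poll while reftest.log was absent.
MISSING_LOG = re.compile(
    r"^DeviceManager: error pulling file '[^']*reftest\.log': "
    r"No such file or directory$")


def reftest_log_never_appeared(lines):
    """Return True if, once the browser was launched, the harness printed
    nothing but missing-log errors before the application exited.

    Hypothetical helper; the line prefixes are taken from the logs in this bug.
    """
    in_run = False
    saw_missing = False
    for line in lines:
        if line.startswith("FIRE PROC:"):
            in_run = True
            continue
        if not in_run:
            continue
        if line.startswith("INFO | automation.py | Application ran for:"):
            break
        if line.startswith("INFO | automation.py | Application pid:"):
            continue
        if MISSING_LOG.match(line):
            saw_missing = True
            continue
        # Any other output during the run means reftest.log existed at some point.
        return False
    return in_run and saw_missing
```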
Comment 17•13 years ago
Actually, your patch would leave it the same orange it already is, which is a good thing :)
Automatic retry is blue rather than purple, and it's not nearly that easy to set. Even if it were, it wouldn't be a matter of "we might get in an infinite loop". This is a symptom of a browser that won't start up or crashes on startup, which code pushes cause all the time, so it would be a matter of "how many days will it be before we get an infinite loop from this, and will it be on a tree philor watches, where he might see it and kill the loop, or on try, where it will just continue forever unless a reconfig, a catastrophic master restart or something else kills it?"
You're returning an exit code of 5 in http://mxr.mozilla.org/build/source/buildbotcustom/steps/unittest.py#954, which gets its evaluateCommand from http://mxr.mozilla.org/build/source/buildbotcustom/steps/unittest.py#369. So first your super_class (eventually a ShellCommand, by way of ShellCommandReportTimeout in http://mxr.mozilla.org/build/source/buildbotcustom/steps/unittest.py#384) evaluates the status and says "the exit code was non-zero, I'll set the status for the step to FAILURE"; then evaluateReftest in http://mxr.mozilla.org/build/source/buildbotcustom/steps/unittest.py#287 turns that FAILURE into WARNINGS, and the job is still orange.
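The chain of status evaluation described above can be modelled in a few lines. This is a deliberately simplified, hypothetical model, not the buildbotcustom code: it uses Buildbot's result constants (SUCCESS=0 through RETRY=5) and the tbpl colors mentioned in this bug, treats any non-zero exit code as FAILURE the way ShellCommand does, and assumes evaluateReftest simply demotes FAILURE to WARNINGS. The real evaluateReftest also inspects the log output, which is presumably how a run that exited with code 0 could still be reported as T-FAIL.

```python
# Buildbot's step result constants (SUCCESS=0 ... RETRY=5).
SUCCESS, WARNINGS, FAILURE, SKIPPED, EXCEPTION, RETRY = range(6)

# Colors tbpl used for each result, as discussed in this bug.
TBPL_COLOR = {
    SUCCESS: "green",
    WARNINGS: "orange",
    FAILURE: "red",
    EXCEPTION: "purple",
    RETRY: "blue",
}


def shell_command_status(exit_code):
    """ShellCommand-style rule: any non-zero exit code marks the step FAILURE."""
    return SUCCESS if exit_code == 0 else FAILURE


def evaluate_reftest(status):
    """Simplified, hypothetical evaluateReftest: demote FAILURE to WARNINGS."""
    return WARNINGS if status == FAILURE else status


def job_color(exit_code):
    """Color a reftest job would get for a given harness exit code in this model."""
    return TBPL_COLOR[evaluate_reftest(shell_command_status(exit_code))]
```

Under this model, `job_color(5)` is "orange", so no exit code alone can make the job red or purple, which is the point made above.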
Comment 18•13 years ago
Try run for 062184b8c098 is complete.
Detailed breakdown of the results available here:
https://tbpl.mozilla.org/?tree=Try&rev=062184b8c098
Results (out of 58 total builds):
exception: 2
success: 51
warnings: 4
failure: 1
Builds (or logs if builds failed) available at:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/autolanduser@mozilla.com-062184b8c098
Updated•13 years ago
Whiteboard: [orange][autoland-in-queue] → [orange]
Comment 19•13 years ago
Comment 20•13 years ago
Comment 21•13 years ago
Comment 22•13 years ago
Comment 23•13 years ago
Comment 24•12 years ago
Updated•12 years ago
Keywords: intermittent-failure
Updated•12 years ago
Whiteboard: [orange]
Updated•12 years ago
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME