Closed Bug 1101133 Opened 10 years ago Closed 9 years ago

Intermittent Jit tests fail with "No tests run or test summary not found"

Categories

(Core :: JavaScript Engine: JIT, defect)

36 Branch
defect
Not set
normal

Tracking

RESOLVED FIXED
mozilla37
Tracking Status
firefox35 --- fixed
firefox36 --- fixed
firefox37 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: KWierso, Assigned: dminor)

Attachments

(2 files)

I've seen this a couple times lately, where the Windows Debug Jit tests fail to run anything:
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=4004418&repo=mozilla-inbound

After a push or two, these failures just seem to go away.
I think it could be new validation that's catching things that are supposed to be failures.

see cset added Nov 12th: http://hg.mozilla.org/build/mozharness/rev/bea2df1c0276

cc'ing jgriffin to sanity-check that it is working as intended.
Flags: needinfo?(jgriffin)
Yes, this seems to be working as intended.  As you can see from the linked log, no tests were actually run there.  This may be flagging errors which were silently being ignored before.
Flags: needinfo?(jgriffin)
It can't find jit_test.py or some other file? Pretty weird but it seems more of a build issue than a JS issue... Who can we ask about this? Maybe we can get access to one of the slaves to see what's going on?

07:47:26     INFO - Calling ['C:\\slave\\test\\build\\venv\\Scripts\\python', '-u', 'C:\\slave\\test\\build\\tests\\jit-test\\jit-test/jit_test.py', 'tests/bin/js', '--no-slow', '--no-progress', '--tinderbox', '--tbpl'] with output_timeout 1000
07:47:26     INFO -  C:\slave\test\build\venv\Scripts\python: can't open file 'C:\slave\test\build\tests\jit-test\jit-test/jit_test.py': [Errno 2] No such file or directory
07:47:26    ERROR - Return code: 2
07:47:26    ERROR - No tests run or test summary not found
(In reply to Jan de Mooij [:jandem] from comment #33)
> It can't find jit_test.py or some other file? Pretty weird but it seems more
> of a build issue than a JS issue... Who can we ask about this? Maybe we can
> get access to one of the slaves to see what's going on?
> 
> 07:47:26     INFO - Calling
> ['C:\\slave\\test\\build\\venv\\Scripts\\python', '-u',
> 'C:\\slave\\test\\build\\tests\\jit-test\\jit-test/jit_test.py',
> 'tests/bin/js', '--no-slow', '--no-progress', '--tinderbox', '--tbpl'] with
> output_timeout 1000
> 07:47:26     INFO -  C:\slave\test\build\venv\Scripts\python: can't open
> file 'C:\slave\test\build\tests\jit-test\jit-test/jit_test.py': [Errno 2] No
> such file or directory
> 07:47:26    ERROR - Return code: 2
> 07:47:26    ERROR - No tests run or test summary not found

Note the forward slash in the path C:\slave\test\build\tests\jit-test\jit-test/jit_test.py
(In reply to Benjamin Bouvier [:bbouvier] from comment #40)
> Note the forward slash in the path
> C:\slave\test\build\tests\jit-test\jit-test/jit_test.py

Most Windows internal path-parsing accepts either backwards *or* forward slashes.  Dunno what python does here, but this could easily be a red herring.
Flags: needinfo?(dminor)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #45)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #40)
> > Note the forward slash in the path
> > C:\slave\test\build\tests\jit-test\jit-test/jit_test.py
> 
> Most Windows internal path-parsing accepts either backwards *or* forward
> slashes.  Dunno what python does here, but this could easily be a red
> herring.

I found the reason for the forward slash:
https://hg.mozilla.org/build/mozharness/diff/ca3b37a3b8ff/scripts/desktop_unittest.py#l1.307

That first appeared Thu Sep 6 09:13:01 2012 -0700, so I suspect it is not the cause, although I'll add a patch to change it, to avoid future confusion:

base_cmd.append(dirs["abs_%s_dir" % suite_category] + "/" + run_file)
=>
base_cmd.append(os.path.join(dirs["abs_%s_dir" % suite_category], run_file))

Also since this is intermittent, I think it cannot be the cause.
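
For illustration, the difference between the two forms can be checked on any platform with Python's ntpath module (the Windows implementation of os.path). This is only a sketch: the directory and file name are copied from the failing log, and the way they are split between suite_dir and run_file is an assumption, not something taken from the mozharness config.

```python
import ntpath  # Windows flavour of os.path; importable on any platform

# Values copied from the failing log; the split into two parts is assumed.
suite_dir = r"C:\slave\test\build\tests\jit-test\jit-test"
run_file = "jit_test.py"

concatenated = suite_dir + "/" + run_file   # what the script does today
joined = ntpath.join(suite_dir, run_file)   # what os.path.join does on Windows

print(concatenated)
print(joined)
# After normalisation both spellings name the same file.
print(ntpath.normpath(concatenated) == joined)
```

Both spellings normalise to the same path, which is consistent with the mixed separator being cosmetic rather than the cause of the failure.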

This also doesn't seem to be specific to a particular slave or set of slaves, since multiple slaves are affected, but it does seem to be limited to Windows slaves (note the jit test suite is also running on Mac). So:
  * something Windows-specific
  * not the path issue
  * not any particular Windows slave or set of slaves
  * intermittent

I'll see if I can catch a slave red-handed, and poke around on its disk to see if the file is genuinely missing.
Pete's idea of trying to catch a slave red-handed makes good sense to me.
Flags: needinfo?(dminor)
OK, here goes, I'm looking at slave from comment 81...
Comment on attachment 8525948 [details] [diff] [review]
bug1101133_mozharness_forward-slash-to-backslash-fix_v1.patch

ship it! os.path.join all the paths!
Attachment #8525948 - Flags: review?(jlund) → review+
Pete, find anything interesting looking at that slave?
Flags: needinfo?(pmoore)
Ah Dan, I do apologise. I tried on Friday to connect and had nonstop VPN issues. Today it passed me by, and I totally forgot to try again. Will try again...
Flags: needinfo?(pmoore)
(In reply to TBPL Robot from comment #152)
> submit_timestamp: 2014-11-24T01:52:19
> log:
> https://treeherder.mozilla.org/ui/logviewer.html#?repo=fx-team&job_id=1271306
> repository: fx-team
> who: tomcat[at]mozilla[dot]com
> machine: t-w732-ix-037
> buildname: Windows 7 32-bit fx-team debug test jittest
> revision: a7bd9b15a071
> 
> Return code: 2
> No tests run or test summary not found

On T-W732-IX-037, C:\slave\test\build\tests\jit-test is completely missing:

cltbld@T-W732-IX-037 /c/slave/test/build/tests
$ ls
bin  certs  config  mochitest  modules  mozbase

cltbld@T-W732-IX-037 /c/slave/test/build/tests
$

:philor pointed out this could be a problem with packaging the tests up in build phase, rather than a problem when running the test.

Will look into build logs for this run...
OK the zip upload looks fine - i.e. there is a jit-test directory (rather too many, if anything), e.g.:

$ curl -s -L 'http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-win32-debug/1416819851/firefox-36.0a1.en-US.win32.tests.zip' | jar tv | grep jit_test.py
 11932 Mon Nov 24 01:40:00 CET 2014 jit-test/jit-test/jit-test/jit_test.py

I'll now see if I can work out why this no longer is on the filesystem...
So it looks like the zip was downloaded and extracted ok:

01:44:39     INFO - retry: Calling <bound method Proxxy._download_file of <mozharness.mozilla.proxxy.Proxxy object at 0x01D2E630>> with args: ('https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-win32-debug/1416819851/firefox-36.0a1.en-US.win32.tests.zip', 'C:\\slave\\test\\build\\firefox-36.0a1.en-US.win32.tests.zip'), kwargs: {}, attempt #1
01:44:41     INFO - Downloaded 132598778 bytes.
01:44:41     INFO - mkdir: C:\slave\test\build\tests
01:44:41     INFO - Running command: ['unzip', '-q', '-o', 'C:\\slave\\test\\build\\firefox-36.0a1.en-US.win32.tests.zip', 'bin/*', 'certs/*', 'modules/*', 'mozbase/*', 'config/*', 'jit-test/*'] in C:\slave\test\build\tests
01:44:41     INFO - Copy/paste: unzip -q -o C:\slave\test\build\firefox-36.0a1.en-US.win32.tests.zip bin/* certs/* modules/* mozbase/* config/* jit-test/*
01:44:48     INFO - Return code: 0

So options I can think of:

1) it had a 0 exit code but failed for some reason
2) it was downloaded and extracted ok, but before the test ran, got deleted somehow
3) for some reason it was downloaded to the wrong place, or the zip file was structured badly
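
Options 1 and 2 could be told apart by checking the extraction directory immediately after unzip returns. A minimal sketch, assuming a hypothetical helper missing_paths (not part of mozharness) and the path the harness later tries to run:

```python
import os
import tempfile


def missing_paths(extract_dir, expected):
    """Return the entries of `expected` that do not exist under extract_dir."""
    return [
        rel for rel in expected
        if not os.path.exists(os.path.join(extract_dir, *rel.split("/")))
    ]


# Demo: mimic the state found on the slave, where jit-test was absent.
with tempfile.TemporaryDirectory() as tests_dir:
    for name in ("bin", "certs", "config", "modules", "mozbase"):
        os.mkdir(os.path.join(tests_dir, name))
    expected = ["bin", "certs", "config", "modules", "mozbase",
                "jit-test/jit-test/jit_test.py"]
    demo_missing = missing_paths(tests_dir, expected)
    print(demo_missing)
```

Running such a check right after extraction and again just before the test starts would show whether the file was never extracted or was removed in between.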
Bingo!!!


$ curl -s -L 'https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-win32-debug/1416819851/firefox-36.0a1.en-US.win32.tests.zip' | jar tv | grep jit_test.py
 11932 Mon Nov 24 01:40:00 CET 2014 jit-test/jit-test/jit-test/jit_test.py

<== bad run has an extra jit-test subdirectory


$ curl -s -L 'https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/fx-team-win32-debug/1416605775/firefox-36.0a1.en-US.win32.tests.zip' | jar tv | grep jit_test.py
 11932 Fri Nov 21 14:41:22 CET 2014 jit-test/jit-test/jit_test.py

<== good run has one less jit-test subdirectory
So the question is: why do we sometimes get an extra directory in the packaged tests, e.g.

jit-test/jit-test/jit-test/jit_test.py
instead of
jit-test/jit-test/jit_test.py
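
The same comparison can be scripted so that a badly nested archive is flagged automatically. This is a sketch only: find_entries and make_zip are hypothetical helpers, and the two in-memory archives stand in for the real tests.zip files, which are not downloaded here.

```python
import io
import zipfile

EXPECTED = "jit-test/jit-test/jit_test.py"  # layout seen in the good run


def find_entries(zip_bytes, basename="jit_test.py"):
    """Return archive paths whose final component equals basename."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return [n for n in archive.namelist() if n.split("/")[-1] == basename]


def make_zip(paths):
    """Build a tiny in-memory zip with empty files at the given paths."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for path in paths:
            archive.writestr(path, "")
    return buf.getvalue()


good = make_zip(["jit-test/jit-test/jit_test.py"])
bad = make_zip(["jit-test/jit-test/jit-test/jit_test.py"])

for label, data in (("good", good), ("bad", bad)):
    entries = find_entries(data)
    verdict = "OK" if EXPECTED in entries else "unexpected layout"
    print(label, entries, verdict)
```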
(In reply to Release Engineering SlaveAPI Service from comment #153)
> In production: https://hg.mozilla.org/build/mozharness/rev/6c660d6a19de

No failures after this comment but maybe that's just a coincidence...
Not sure if this is significant or not, but every build log for a starred run I've looked at so far has had this:

05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test\jit-test\tests\pic": The directory is not empty.
05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test\jit-test\tests\symbol": The directory is not empty.
05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test\jit-test\tests\truthiness": The directory is not empty.
05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test\jit-test\tests": The directory is not empty.
05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test\jit-test": The directory is not empty.
05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test": The directory is not empty.

and I have never seen those errors in a build that results in a successful jit test run.
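
To check that correlation across more build logs, the lines could be picked out mechanically. A minimal sketch, assuming the log format shown above; failed_removals is a hypothetical helper and the sample lines are copied from this comment:

```python
import re

# Matches the quoted directory followed by the error text shown above.
NOT_EMPTY = re.compile(r'"([^"]+)": The directory is not empty\.')


def failed_removals(log_lines):
    """Return the directories a log reports as 'not empty'."""
    found = []
    for line in log_lines:
        match = NOT_EMPTY.search(line)
        if match:
            found.append(match.group(1))
    return found


sample = [
    r'05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test\jit-test": The directory is not empty.',
    r'05:47:40     INFO -  "c:\builds\moz2_slave\fx-team-w32-d-0000000000000000\build\src\obj-firefox\dist\test-stage\jit-test": The directory is not empty.',
    '01:44:48     INFO - Return code: 0',
]
print(failed_removals(sample))
```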
Oh, interesting! Is this the result of a shell cp/mv or something the harness is doing? Are the semantics for overwriting on windows unexpectedly different from the development platform or is a removal/cleanup script not running when it was expected it would?
(In reply to Terrence Cole [:terrence] from comment #179)
> Oh, interesting! Is this the result of a shell cp/mv or something the
> harness is doing? Are the semantics for overwriting on windows unexpectedly
> different from the development platform or is a removal/cleanup script not
> running when it was expected it would?

On the build machine we're doing a rm -rf on the test staging dir, and it looks like this is failing in some cases on Windows. I chatted with pmoore on IRC, and one possibility is that Windows sometimes thinks these files are still in use, so there is a race as to whether or not they will be removed properly. We use msys for shell commands on Windows, so maybe there is something in its implementation as well.

I have a patch to the jit-test packaging rules that will hopefully work around this problem for now.
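
For reference, one generic way to guard against a removal that silently leaves files behind is to retry and then verify that the directory is really gone. This is not the attached patch, just a sketch of the idea; remove_tree and its parameters are hypothetical.

```python
import os
import shutil
import tempfile
import time


def remove_tree(path, attempts=5, delay=0.5):
    """Try to delete path; return True once it no longer exists."""
    for attempt in range(attempts):
        shutil.rmtree(path, ignore_errors=True)
        if not os.path.exists(path):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)  # give another process time to release files
    return False


# Demo on a throwaway directory shaped like the test staging area.
stage = tempfile.mkdtemp()
os.makedirs(os.path.join(stage, "jit-test", "jit-test", "tests"))
print(remove_tree(stage))
```

A caller can fail the packaging step when this returns False, instead of copying new files on top of a stale tree.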
Comment on attachment 8529830 [details] [diff] [review]
Change packaging for jit-tests

Review of attachment 8529830 [details] [diff] [review]:
-----------------------------------------------------------------

We should probably file a followup on trying to figure out why things aren't getting removed properly, but I think this will at least stop the immediate problem.
Attachment #8529830 - Flags: review?(ted) → review+
Filed bug 1106162 for the intermittent failure removing the test-stage directory.
https://hg.mozilla.org/mozilla-central/rev/d101d9574541
Assignee: nobody → dminor
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Removing intermittent-failure keyword so this doesn't show up as a suggestion when starring because we've had a few process crashes starred to this bug.