Open Bug 1318597 Opened 8 years ago Updated 2 years ago

Intermittent make: *** [check] Error 245

Categories

(Core :: General, defect)

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [ele:1a])

Attachments

(1 file)

See Also: → 1308472
All recent failures here are OSX and end in:

 01:06:53     INFO - /builds/slave/autoland-m64-00000000000000000/build/src/config/tests/test_mozbuild_reading.py
 01:06:53     INFO - WARNING: Not a supported OS_TARGET for NSPR in moz.build: "". Use --with-system-nspr
 01:06:53     INFO - TEST-PASS | /builds/slave/autoland-m64-00000000000000000/build/src/config/tests/test_mozbuild_reading.py | TestMozbuildReading.test_filesystem_traversal_no_config
 01:06:53     INFO - TEST-SKIP | /builds/slave/autoland-m64-00000000000000000/build/src/config/tests/test_mozbuild_reading.py | TestMozbuildReading.test_filesystem_traversal_reading
 01:06:53     INFO - TEST-PASS | /builds/slave/autoland-m64-00000000000000000/build/src/config/tests/test_mozbuild_reading.py | TestMozbuildReading.test_orphan_file_patterns
 01:06:53     INFO - make: *** [check] Error 245
 01:06:53     INFO - make -C tests/src-simple check-jar

See :mshal's comments in bug 1308472.
Hey Mike, see also bug 1308472 comment 52 for more details.

But the gist is, I tried running these tests in a loop for several hours and I wasn't able to reproduce the intermittent. So I'm not sure if this patch will fix anything or not. It's an educated guess at the problem.

I don't want to retrigger hundreds of builds to see if it still happens on a try push, so I think we have two options:

1. Land blind and see if the intermittent goes away (I think it's an acceptable change to land even if it doesn't fix anything).

2. Wait for your work in bug 1331663. Once that lands, I'll be able to pull the mozlint unittests out of the build and into their own task. This will both make this intermittent much less frequent, and also make it easier for us to isolate and debug the problem.

Unless you object, I'm happy to do option 1 as it probably won't hurt anything.
Comment on attachment 8829956 [details]
Bug 1318597 - Explicitly call manager.shutdown() after running mozlint,

https://reviewboard.mozilla.org/r/106916/#review108000

I think this is reasonable to land. We should see pretty quickly whether or not it helps.
Attachment #8829956 - Flags: review?(mshal) → review+
FYI the easiest way I found to reproduce it is to include this in testing/testsuite-targets.mk for a try push:

diff --git a/testing/testsuite-targets.mk b/testing/testsuite-targets.mk
index f5364807..530c13b 100644
--- a/testing/testsuite-targets.mk
+++ b/testing/testsuite-targets.mk
@@ -279,6 +279,104 @@ check::
        $(eval cores=$(shell $(PYTHON) -c 'import multiprocessing; print(multiprocessing.cpu_count())'))
        @echo "Starting 'mach python-test' with -j$(cores)"
        @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
+       @$(topsrcdir)/mach --log-no-times python-test -j$(cores)
...
(100 entries of this)

I'm sure there's a better way to do it in shell, but I didn't want to accidentally hide the return code.
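One possible alternative to pasting the command 100 times is a small wrapper that stops at the first failure and reports its return code unchanged. This is only a sketch: `run_repeatedly` and the `repeat.py` name in the usage comment are hypothetical and not part of the tree.

```python
import subprocess
import sys


def run_repeatedly(cmd, times):
    """Run cmd up to `times` times.

    Returns the first non-zero return code seen, or 0 if every run passed.
    The failing code is returned as-is so it is never hidden by the loop.
    """
    for attempt in range(1, times + 1):
        returncode = subprocess.call(cmd)
        if returncode != 0:
            print("attempt %d failed with return code %d" % (attempt, returncode),
                  file=sys.stderr)
            return returncode
    return 0


# Hypothetical usage from a script saved as repeat.py:
#   sys.exit(run_repeatedly(["./mach", "--log-no-times", "python-test"], 100) & 0xFF)
```

Note that `subprocess.call` reports a child killed by a signal as a negative number, so the usage comment masks the value to a single byte before exiting.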

That way, for 1 build you get 100 test tries, and then you can do, say, 20 retriggers and get 2000 runs of the tests. Unfortunately I did this on Taskcluster so I didn't get a try high score :(

For some reason I never managed to reproduce it by just running 'mach python-test python/mozlint', only the full 'mach python-test' suite. I'm not sure if that's just bad luck or a clue that these tests are conflicting with something else running at the same time.

Bug 1331663 turned out to be trickier than I thought, but hopefully soon...
Ah thanks, that's a good idea! Yeah, I had run:
while ./mach python-test python/mozlint; do :; done

from the shell, but I guess I should have run the full suite. I think I'm going to land the patch without checking, though, because I have a feeling that even if it doesn't help with this problem, it could prevent other problems.
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7c8c355ed805
Explicitly call manager.shutdown() after running mozlint, r=mshal
Leaving open until we can verify whether this fixes anything or not.
Whiteboard: [leave-open]
That looks like a pretty significant drop in the intermittent rate! I'm not comfortable calling this resolved yet, though. Let's wait another week so we have a full week of data with the fix landed before closing this.
Unfortunately, looking at bug 1308472's orange report, I still see a few failures. It's possible that the fix ended up changing how OSX reports the return code - now on OSX I see this:

10:44:43     INFO - Setting retcode to 1 from /builds/slave/autoland-m64-00000000000000000/build/src/python/mozlint/test/test_types.py

Whereas before we got:

05:25:35     INFO - Setting retcode to -11 from /builds/slave/m-in-m64-000000000000000000000/build/src/python/mozlint/test/test_types.py

(the -11 goes from signed to unsigned somewhere along the way, giving us the Error 245)
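The signed-to-unsigned arithmetic can be checked with a short sketch. It assumes the logged retcode follows Python's `subprocess` convention, where a negative value -N means the child was killed by signal N; the helper names are made up for illustration.

```python
import signal


def as_exit_status(returncode):
    """Truncate a return code to the single unsigned byte a parent process sees."""
    return returncode & 0xFF


def describe_retcode(returncode):
    """Name the signal behind a negative subprocess-style return code."""
    if returncode < 0:
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


print(as_exit_status(-11))  # 245, since 256 - 11 = 245
print(describe_retcode(-11))
```

Under that assumption, signal 11 is SIGSEGV on Linux and OSX, so the old "Error 245" would correspond to a crash in the test process rather than an ordinary test failure.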

Here are the two recent ones I saw on the other bug (there are still a bunch of false positives there):

OSX: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=73050965&lineNumber=14285
Linux: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=72505125&lineNumber=45803
Keywords: leave-open
Whiteboard: [leave-open][ele:1a] → [ele:1a]
Severity: normal → S3