<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Dustin J. Mitchell [:dustin] (he/him)

Comment 2

•

8 years ago

there's been no recent changes to the builders but i note the referenced log file is for a pgo build which is something we've only been doing since bug 1304998 so if that's where all the failures are, it may be a problem with pgo configs.

Flags: needinfo?(rthijssen)

Comment 3

•

8 years ago

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1308472&tree=all The OS X failure is mis-starred. But the rest are win32 pgo and win32 opt. All I can find for ERRORs is: 08:36:59 ERROR - Return code: 1 09:34:05 ERROR - Return code: 2 09:34:05 ERROR - 'make -k check' did not run successfully. Please check log for errors. so basically it looks like make is bombing out on its own, rather than reflecting an error from something it invoked. At any rate, this is beyond my ability to debug with this limited information.

Component: General → Build Config

Flags: needinfo?(pmoore)

Product: Taskcluster → Core

(not currently active) Ted Mielczarek

Comment 4

•

8 years ago

Looking at one of those logs: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=37243123#L40422 The log parser isn't finding some of the errors. Specifically, there's stuff like: 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-unittest=- 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-unittest-level=error 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-html=- 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-html-level=error 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-tbpl=- 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-tbpl-level=error 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-xunit=- 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-xunit-level=error 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-errorsummary=- 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-errorsummary-level=error 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-mach=- 12:15:12 INFO - usage: test.py [-h] [--log-raw LOG_RAW] [--log-raw-level LOG_RAW_LEVEL] 12:15:12 INFO - test.py: error: unrecognized arguments: --log-mach-level=error 12:15:12 INFO - Exception WindowsError: (32, 'The process cannot access the file because it is being used by another process', 'c:\\users\\task_1475834533\\appdata\\local\\temp\\tmpqpdska') in <bound method NamedTemporaryFile.__del__ of <mozfile.mozfile.NamedTemporaryFile object at 0x034503D0>> ignored 12:15:12 INFO - Exception WindowsError: (32, 'The process cannot access the file because it is being used by another process', 'c:\\users\\task_1475834533\\appdata\\local\\temp\\tmpro5zzh') in <bound method NamedTemporaryFile.__del__ of <mozfile.mozfile.NamedTemporaryFile object at 0x034CADB0>> ignored 12:15:12 INFO - Exception WindowsError: (32, 'The process cannot access the file because it is being used by another process', 'c:\\users\\task_1475834533\\appdata\\local\\temp\\tmp9jlr4s') in <bound method NamedTemporaryFile.__del__ of <mozfile.mozfile.NamedTemporaryFile object at 0x034CA3D0>> ignored 12:15:12 INFO - Exception WindowsError: (32, 'The process cannot access the file because it is being used by another process', 'c:\\users\\task_1475834533\\appdata\\local\\temp\\tmpq86gzl') in <bound method NamedTemporaryFile.__del__ of <mozfile.mozfile.NamedTemporaryFile object at 0x034CA830>> ignored <...> 12:15:24 INFO - z:/task_1475834533/build/src/testing/testsuite-targets.mk:340: recipe for target 'check' failed 12:15:24 INFO - mozmake: *** [check] Error 1 Some of this might be fallout from bug 1304593.

Comment 5

•

8 years ago

Or bug 1304593 didn't fix the entire intermittent as "make -k check return code 2" was happening over there too. But these look like errors in the mozbase tests, so could be differences in how it's run. Given this is intermittent, I'm guessing the "unrecognized arguments" errors are not the problem (though should likely be fixed anyway). It's the WindowsError at the bottom that is causing the orange.

Comment hidden (Intermittent Failures Robot)

Comment 7

•

8 years ago

it's occurred to me that some of the race conditions and file locks we see in windows builds (like this one) might be due to problems with performance on the c: drive (because of the first-read ebs problems: bug 1305174). often when this comes up, the file in question is a temp file on the c: drive. whereas the build working directory for tc win builds, is on the faster ephemeral volumes (z:). if this is the cause (or perhaps one of the causes) we could possibly address it in a couple different ways. we already have bug 1306989 against the generic worker to set the values of the temp environment variables to paths on z: but it's also possible to make the change in-tree with something like this: https://hg.mozilla.org/try/diff/48a500082516/taskcluster/taskgraph/transforms/job/mozharness.py which just sets the vars and creates the temp folders for the task user before the build starts.

Comment 8

•

8 years ago

Actually, here is a green windows log that has the same messages as those in comment 4: https://archive.mozilla.org/pub/firefox/try-builds/ahalberstadt@mozilla.com-b5bd0b383510b9afb5120bb90e2e2017762577c3/try-win32/try-win32-bm75-try1-build1403.txt.gz So I think that's unrelated.

Comment hidden (Intermittent Failures Robot)

Comment 11

•

8 years ago

I've now seen several failure logs from jobs where temp data was not on the c: drive, so I no longer think my theory in comment 7 is worth exploring.

Comment hidden (Intermittent Failures Robot)

Comment 14

•

8 years ago

:grenade, are you investigating this? 94 occurrences last week seems like a lot.

Flags: needinfo?(rthijssen)

Comment 15

•

8 years ago

i'm not looking at this. if it had been pgo only, then it could have been explained by a problem with pgo config, which i recently changed, but for make, we'll need build team expertise.

Flags: needinfo?(rthijssen)

David Burns :automatedtester

Comment 16

•

8 years ago

this is an error setting up hg: 13:49:45 INFO - [mozharness: 2016-10-15 13:49:45.751000Z] Skipping clobber step. 13:49:45 INFO - [mozharness: 2016-10-15 13:49:45.751000Z] Running clone-tools step. 13:49:45 INFO - Running pre-action listener: influxdb_recording_pre_action 13:49:45 INFO - Running main action method: clone_tools 13:49:45 INFO - retry: Calling _get_revision with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0x0315AEB0>, 'z:\\task_1476538045\\build\\tools'), kwargs: {}, attempt #1 13:49:45 WARNING - did not specify revision or branch; assuming "default" 13:49:45 INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', '--version'] 13:49:45 INFO - Copy/paste: hg --config ui.merge=internal:merge --version 13:49:45 INFO - Mercurial Distributed SCM (version 3.7.3) 13:49:45 INFO - (see https://mercurial-scm.org for more information) 13:49:45 INFO - Copyright (C) 2005-2016 Matt Mackall and others 13:49:45 INFO - This is free software; see the source for copying conditions. There is NO 13:49:45 INFO - warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 13:49:45 INFO - Return code: 0 13:49:45 INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'debuginstall'] 13:49:45 INFO - Copy/paste: hg --config ui.merge=internal:merge debuginstall 13:49:46 INFO - checking encoding (cp1252)... 13:49:46 INFO - checking Python executable (C:\mozilla-build\python\Scripts\hg.exe) 13:49:46 INFO - checking Python version (2.7.11) 13:49:46 INFO - checking Python lib (C:\mozilla-build\python\lib)... 13:49:46 INFO - checking installed modules (C:\mozilla-build\python\Lib\site-packages\mercurial)... 13:49:46 INFO - checking templates (C:\mozilla-build\python\Lib\site-packages\mercurial\templates)... 13:49:46 INFO - checking commit editor... 13:49:46 INFO - No commit editor set and can't find vi in PATH 13:49:46 INFO - (specify a commit editor in your configuration file) 13:49:46 INFO - checking username... 13:49:46 INFO - no username supplied 13:49:46 INFO - (specify a username in your configuration file) 13:49:46 INFO - 1 problems detected, please check your install! 13:49:46 ERROR - Return code: 1 13:49:46 INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', '--config', 'extensions.robustcheckout=Z:\\task_1476538045\\build\\src\\testing\\mozharness\\external_tools\\robustcheckout.py', 'robustcheckout', 'https://hg.mozilla.org/build/tools', 'z:\\task_1476538045\\build\\tools', '--sharebase', 'C:\\builds\\hg-shared', '--branch', 'default'] 13:49:46 INFO - Copy/paste: hg --config ui.merge=internal:merge --config extensions.robustcheckout=Z:\task_1476538045\build\src\testing\mozharness\external_tools\robustcheckout.py robustcheckout https://hg.mozilla.org/build/tools z:\task_1476538045\build\tools --sharebase C:\builds\hg-shared --branch default 13:49:55 INFO - 13:49:55 INFO - clone [=============================> ] 18057764/28698371 02s 13:49:55 INFO - clone [=========================================> ] 25260729/28698371 01s 13:49:55 INFO - 13:49:55 INFO - ensuring https://hg.mozilla.org/build/tools@default is available at z:\task_1476538045\build\tools 13:49:55 INFO - (sharing from new pooled repository 7ae7fb134bf7aec6ec96a062ff47a69053dd2973) 13:49:55 INFO - applying clone bundle from https://s3-us-west-1.amazonaws.com/moz-hg-bundles-us-west-1/build/tools/78748858da198ca43c54765522a91ecebee0e075.packed1.hg 13:49:55 INFO - 2021 files to transfer, 27.4 MB of data 13:49:55 INFO - transferred 27.4 MB in 4.1 seconds (6.66 MB/sec) 13:49:55 INFO - finished applying clone bundle 13:49:55 INFO - searching for changes 13:49:55 INFO - no changes found 13:49:55 INFO - searching for changes 13:49:55 INFO - no changes found 13:49:55 INFO - (pulling to obtain default) 13:49:55 INFO - (remote resolved default to 78748858da198ca43c54765522a91ecebee0e075; result is not deterministic) 13:49:55 INFO - (revision already present locally; not pulling) David, can you find a build peer to look at this? I suspect it isn't build related but taskcluster related given the fact that this only fails on taskcluster and not on buildbot.

Flags: needinfo?(dburns)

Comment 17

•

8 years ago

Greg, since you have been working on TC lately, and your HG knowledge is great, can you have a quick look.

Flags: needinfo?(dburns) → needinfo?(gps)

Dustin J. Mitchell [:dustin] (he/him)

Comment 18

•

8 years ago

i had no idea that the make error was hg related, but if it is, and seeing an hg 3.7.3 invocation in the log in comment 16, i would like to add here that there are two versions of hg on tc builders: - 3.7.3 installed with mozilla-build at c:\mozilla-build\python\scripts\hg.exe - 3.9.1 installed with mercurial inno-setup at c:\Program Files\Mercurial\hg.exe currently the older version has precedence in the PATH environment variable and will be used if no explicit path to hg is provided in the hg command.

Comment 19

•

8 years ago

I don't think the hg error is related -- that's always occurred, and is only an issue when creating a commit. But greg will know more.

Gregory Szorc [:gps]

Comment 20

•

8 years ago

The exceptions from comment #4 seem like the real intermittent failure. Possibly a poorly written test. I'm not sure what file/test they are occurring in, however.

Comment 21

•

8 years ago

Those are also red herrings, that sequence of errors show up in every log (see comment 8 for example).

Comment 22

•

8 years ago

I am unclear of the next steps here to resolve this- it appears I found an error in the logs which isn't an error (not sure why we don't fix that?) but the real failure is in something called test.py which from what I can figure out is related to mozbase unittests? why are these part of the build? it appears test.py is here: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/test.py we run a series of tests defined here: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/test-manifest.ini and it seems that we make it to the end of the tests (test_apk.ApkTest.test_with_package_name). all of this is run from the command: z:/task_1475834533/build/src/obj-firefox\_virtualenv\Scripts\python.exe z:/task_1475834533/build/src/testing/mozbase/test.py :ahal, I see you reviewed a patch for pylint which touched a lot of mozbase code recently (possibly it landed a few days prior to this issue showing up): https://hg.mozilla.org/mozilla-central/rev/f7b6a16eb1f6

Flags: needinfo?(ahalberstadt)

Comment 23

•

8 years ago

No, the mozbase linting landed well after this was filed. I could be wrong, but I don't think this is related to test.py. What makes you say that's where the real error is?

Flags: needinfo?(ahalberstadt)

Comment 24

•

8 years ago

comment 4 seems to be what ted and gps think is the root cause, which is test.py. this bug is a bit silly since so many people have looked at the failure logs and cannot determine the root cause of the failure. looking at what is failing, this is taskcluster windows builds only, not buildbot- If we cannot figure out the root cause, that is the place to start looking.

Comment 25

•

8 years ago

As I've said before, comment 4 happens in every single job, even green ones. Here's an arbitrary successful windows taskcluster build: https://treeherder.mozilla.org/logviewer.html#?job_id=3656791&repo=autoland#L0-L34278 Ctrl-F for the error in comment 4 and you'll find it :). I believe the true cause of the error is not showing up in the logs.

David Burns :automatedtester

Comment 26

•

8 years ago

got it, I thought that was in reference to the hg error I found. going on the theory that we cannot find the root cause seems to be a valid summary. Since this is build related, I would like :automatedtester to help find a build peer to find the root cause.

Flags: needinfo?(dburns)

Comment 27

•

8 years ago

clearing the n-i since there already is a n-i on a build peer

Flags: needinfo?(dburns)

Comment hidden (Intermittent Failures Robot)

Comment 31

•

8 years ago

There are now osx and linux failures too! https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-macosx64-debug/1479887795/mozilla-inbound-macosx64-debug-bm86-build1-build6103.txt.gz 01:37:13 INFO - TEST-PASS | /builds/slave/m-in-m64-d-0000000000000000000/build/src/python/mozbuild/mozbuild/test/configure/test_toolchain_configure.py | WindowsToolchainTest.test_unsupported_msvc 01:37:43 INFO - /builds/slave/m-in-m64-d-0000000000000000000/build/src/build/compare-mozconfig/compare-mozconfigs-wrapper.py 01:37:43 INFO - WARNING:__main__:if test `uname -s` != Linux; then not in macosx-universal [u'ac_add_options --enable-update-channel=nightly', u'ac_add_options --with-branding=browser/branding/nightly', u'ac_add_options --enable-profiling', u'mk_add_options CLIENT_PY_ARGS="--hg-options=\'--verbose --time\' --hgtool=../tools/buildfarm/utils/hgtool.py --skip-chatzilla --skip-comm --skip-inspector --tinderbox-print"', u'mk_add_options MOZ_MAKE_FLAGS="-j4"', u'if test "${MOZ_UPDATE_CHANNEL}" = "nightly"; then', u'ac_add_options --with-macbundlename-prefix=Firefox', u'fi', u'mk_add_options MOZ_MAKE_FLAGS="-j12"', u'ac_add_options --with-ccache', u'. "$topsrcdir/build/mozconfig.cache"', u'ac_add_options --disable-install-strip', u'ac_add_options --enable-instruments', u'ac_add_options --enable-dtrace']! 01:37:43 INFO - ERROR:__main__:found in /builds/slave/m-in-m64-d-0000000000000000000/build/src/browser/config/mozconfigs/macosx-universal/nightly but not in /builds/slave/m-in-m64-d-0000000000000000000/build/src/browser/config/mozconfigs/macosx-universal/beta: if test `uname -s` != Linux; then 01:37:43 INFO - ERROR:root:Mozconfig check failed! 01:37:43 INFO - TEST-PASS | /builds/slave/m-in-m64-d-0000000000000000000/build/src/build/compare-mozconfig/compare-mozconfigs-wrapper.py | TestCompareMozconfigs.test_compare_mozconfigs 01:37:43 INFO - /builds/slave/m-in-m64-d-0000000000000000000/build/src/config/tests/test_mozbuild_reading.py 01:37:43 INFO - WARNING: Not a supported OS_TARGET for NSPR in moz.build: "". Use --with-system-nspr 01:37:43 INFO - TEST-PASS | /builds/slave/m-in-m64-d-0000000000000000000/build/src/config/tests/test_mozbuild_reading.py | TestMozbuildReading.test_filesystem_traversal_no_config 01:37:43 INFO - TEST-SKIP | /builds/slave/m-in-m64-d-0000000000000000000/build/src/config/tests/test_mozbuild_reading.py | TestMozbuildReading.test_filesystem_traversal_reading 01:37:43 INFO - TEST-PASS | /builds/slave/m-in-m64-d-0000000000000000000/build/src/config/tests/test_mozbuild_reading.py | TestMozbuildReading.test_orphan_file_patterns 01:37:43 INFO - make: *** [check] Error 245

Comment 32

•

8 years ago

And on linux, https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=6952708#L20535 19:21:02 INFO - PROCESS-CRASH | test_assert.js | application crashed [@ xpcshell + 0x4b2a]

Comment 33

•

8 years ago

:mshal - The failure in comment 31 looks like a regression from bug 1183613; can you have a look?

Flags: needinfo?(mshal)

Updated

•

8 years ago

Comment 35

•

8 years ago

It looks like there are a few things going on here. 1) The compare-mozconfigs check fails - I introduced the OSX one in bug 1183613 as :gbrown noted - The Linux one is from bug 1278445 (back in July) 2) 'make check' still returns success even when compare-mozconfigs fails, which is why the errors above went unnoticed. - This was partly fixed I think by bug 1255479, but it missed one of the failure codepaths. The 'return ret_code' in compare-mozconfigs-wrapper.py should be an assert. 3) 'make -k check' randomly fails with strange error codes for no discernible reason (aka: this bug and bug 1318597). I still don't know the reason for this, nor can I reproduce it locally. I'll file separate bugs for 1) & 2) and fix those, but I don't think that alone will fix this issue.

Flags: needinfo?(mshal)

Updated

•

8 years ago

Comment 39

•

8 years ago

it has been a few weeks, this still is continuing- :mshal can you have a fresh look at this?

Flags: needinfo?(mshal)

Comment 40

•

8 years ago

Sure, I'll take another look. I haven't had much luck reproducing it locally yet, but I'm still trying. If not, the next best thing may be to just add some more debugging to see if we can get an actual error message to pinpoint the problem.

Flags: needinfo?(mshal)

Comment 41

•

8 years ago

I still haven't had any success in reproducing this locally. I'll file a bug to add some debugging, and hopefully on the next report we'll have some logs that give additional clues.

Updated

•

8 years ago

Depends on: 1325197

Comment 42

•

8 years ago

I did finally hit this locally, though it took a couple hundred runs. I had the patch in bug 1325197 applied, and saw this: ============================= test session starts ============================== platform linux2 -- Python 2.7.12, pytest-2.9.2, py-1.4.31, pluggy-0.3.1 -- /home/marf/mozilla-central-git/obj-x86_64-pc-linux-gnu/_virtualenv/bin/python cachedir: ../python/mozlint/.cache rootdir: /home/marf/mozilla-central-git/python/mozlint, inifile: collecting ... collected 5 items ../python/mozlint/test/test_types.py::test_linter_types[string.lint] PASSED ../python/mozlint/test/test_types.py::test_linter_types[regex.lint] PASSED ../python/mozlint/test/test_types.py::test_linter_types[external.lint] PASSED ../python/mozlint/test/test_types.py::test_linter_types[structured.lint] PASSED ../python/mozlint/test/test_types.py::test_no_filter PASSED =========================== 5 passed in 0.16 seconds =========================== Setting retcode to -139 from /home/marf/mozilla-central-git/python/mozlint/test/test_types.py Which looks like test_types.py ran successfully, but then somehow the test harness gets a return code of -139. I was able to run test_types.py several thousand times successfully in isolation, so I'm not certain that this specific test is to blame. Maybe there's some funky interaction with running these in parallel. In any case, if I hit it again I'll post more info.

Comment hidden (Intermittent Failures Robot)

Comment 45

•

8 years ago

Locally I've now hit this both in test_types.py and test_roller.py, which seems to match the logs since bug 1325197 landed. Specifically, test_roller.py failed in 7 cases and test_types.py failed in 13 cases. I'm not sure if it's relevant, but these are 2 (out of 4) files that use pytest (instead of mozunit). In all cases the file that ends up setting the bad return code reports that all tests ran successfully, as in #c42.

Comment 46

•

8 years ago

On unix systems the return code comes from mozprocess here: https://dxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/mozprocess/processhandler.py#609 I think there are several issues being rolled up into this bug, because clicking a bunch of the windows failures in orangefactor show that make check fails before the python tests even get a chance to run. We should probably make those log lines from bug 1325197 into errors if the return code is non-zero, then we could make a new bug and star those failures over there.

Comment 47

•

8 years ago

Yeah, unfortunately the error message in this bug catches anything that causes 'make -k check' to fail. The ones I'm looking at specifically are the ones where we got no error message aside from make reporting the error, which now look to be specific to test_roller.py and test_types.py. Bug 1318597 catches these a lot better, but is specific to OSX since the error codes differ on other platforms. I've been adding some more debug to try to figure out where the error code goes from 0 to something non-zero (since the test cases look like they pass). I also made a wrapper for the virtualenv python so I could see what the python subprocess itself was returning. In one instance where I hit a test_types.py failure, it showed the return code as 0 right before calling sys.exit(rc). The python wrapper shell script then reports that the python process segfaults, and has a return value of 139 (on Linux - this varies per platform). So it looks like we're doing something funky in these two tests that triggers a rare python bug during cleanup. ahal, any idea what test_types.py / test_roller.py would be doing differently than the other two mozlint tests? I still haven't seen a failure in test_formatters.py or test_parser.py

Flags: needinfo?(ahalberstadt)

Comment hidden (Intermittent Failures Robot)

Comment 49

•

8 years ago

Huh, that is bizarre. But I think your assessment might be correct. Both test_types.py and test_roller.py create a LintRoller() instance, whereas test_formatters.py and test_parser.py don't. The LintRoller() object in turn uses the multiprocessing stdlib, so that must be related. I'll put aside some time this week to dig into this further.

Flags: needinfo?(ahalberstadt)

Comment hidden (Intermittent Failures Robot)

Comment 52

•

8 years ago

First, for some reason the ERROR log isn't showing up in treeherder as an error: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=69175359&lineNumber=16851 If we can get treeherder to highlight that line, it will be a lot easier to file a bug that can catch/isolate this error from the other stuff. Second, I suspect that the multiprocessing manager is somehow involved: https://dxr.mozilla.org/mozilla-central/source/python/mozlint/mozlint/roller.py#127 Mostly because we are segfaulting, and the manager is responsible for handling shared memory access. The only thing I can think to try is to explicitly call manager.shutdown() at the end (it is supposed to shutdown() when gc'ed though). I plan to move the mozlint tests into their own separate task once bug 1331663 is completed.. I'm tempted to wait until that happens to make testing this fix on try easier (no need to rebuild hundreds of times). Alternatively, we could attempt to push that "fix" blindly and see if it works.

Flags: needinfo?(gps)

Comment 53

•

8 years ago

Where's the treeherder logic that determines what is an error? I'm totally find with just changing that message to match a filter better. Though note that you'll see that error message if any of the python tests fail, so you'd have to specifically look for test_roller.py / test_types.py in order to better organize the bug reporting. The "Error 245" one seems to get the OSX ones very consistently at least.

Gregory Szorc [:gps]

Comment 54

•

8 years ago

(In reply to Michael Shal [:mshal] from comment #53) > Where's the treeherder logic that determines what is an error? I'm totally > find with just changing that message to match a filter better. https://github.com/mozilla/treeherder/blob/master/treeherder/log_parser/parsers.py

Comment 55

•

8 years ago

I think the bug is that we *are* logging that at the logging.ERROR level, but by the time it gets to the log, it's only an INFO. There may be other error messages scattered around the tree being hidden this way too. Eventually I'd also like to use mozlog there instead of mach's logging mechanism, but I have no timetable for that yet.

Comment hidden (Intermittent Failures Robot)