Closed Bug 1191877 Opened 10 years ago Closed 7 years ago

Intermittent OSX build Automation Error: mozprocess timed out after 2400 seconds running ['/tools/buildbot/bin/python', 'mach', '--log-no-times', 'build', '-v']

Categories

(Release Engineering :: General, defect)

x86_64
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure)

+++ This bug was initially created as a clone of Bug #1165763 +++
+++ This bug was initially created as a clone of Bug #1154377 +++
+++ This bug was initially created as a clone of Bug #1145507 +++
this looks like a buildsymbols problem
Flags: needinfo?(ted)
From comment #22:

11:09:46 INFO - 44623: Worker processing files: ('dist/universal/test-stage/cppunittest/TestJSONWriter.dSYM', 'dist/universal/test-stage/cppunittest/TestJSONWriter')
11:09:46 INFO - 44622: Worker processing files: ('dist/universal/test-stage/cppunittest/TestLineBreak.dSYM', 'dist/universal/test-stage/cppunittest/TestLineBreak')
11:09:47 INFO - 44623: Worker processing files: ('dist/universal/test-stage/cppunittest/TestLineBreak.dSYM', 'dist/universal/test-stage/cppunittest/TestLineBreak')
11:09:47 INFO - 44620: Worker processing files: ('dist/universal/test-stage/cppunittest/TestMacroArgs.dSYM', 'dist/universal/test-stage/cppunittest/TestMacroArgs')
11:09:47 INFO - Exception in thread Thread-2:
11:09:47 INFO - Traceback (most recent call last):
11:09:47 INFO - File "/tools/python/lib/python2.7/threading.py", line 551, in __bootstrap_inner
11:09:47 INFO - self.run()
11:09:47 INFO - File "/tools/python/lib/python2.7/threading.py", line 504, in run
11:09:47 INFO - self.__target(*self.__args, **self.__kwargs)
11:09:47 INFO - File "/tools/python/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
11:09:47 INFO - put(task)
11:09:47 INFO - RuntimeError: dictionary changed size during iteration
11:21:43 INFO - 44613: Submitting jobs for files: ('dist/universal/firefox/Nightly.app/Contents/MacOS/XUL.dSYM', 'dist/universal/firefox/Nightly.app/Contents/MacOS/XUL')
12:01:43 INFO - Automation Error: mozprocess timed out after 2400 seconds running ['/tools/buildbot/bin/python', 'mach', '--log-no-times', 'build', '-v']
12:01:43 ERROR - timed out after 2400 seconds of no output
12:01:43 ERROR - Return code: -9
12:01:43 WARNING - setting return code to 2
12:01:43 FATAL - 'mach build' did not run successfully. Please check log for errors.

This is a bug in symbolstore.py.
So there is a bug where multiple threads (or something else) modify a dictionary during iteration. This causes an exception in a thread. And chances are the code that waits on threads is also buggy, in that it doesn't detect aborted threads.
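To make the failure mode described above concrete, here is a minimal Python 3 sketch. It is not taken from symbolstore.py or from multiprocessing; the names `feed_jobs`, `run_feeder`, and `pending` are hypothetical. It shows a dictionary being modified while it is iterated, and one way a waiter could notice that the thread failed instead of waiting forever.

```python
import threading


def feed_jobs(pending, errors):
    """Iterate over `pending` while also inserting into it (the bug)."""
    try:
        for name in pending:
            # Inserting a new key during iteration changes the dict's size,
            # so the next iteration step raises RuntimeError.
            pending[name + ".dSYM"] = "queued"
    except RuntimeError as exc:
        # Without this handler the thread would die with only a traceback,
        # and anything waiting on its output would never hear from it.
        errors.append(str(exc))


def run_feeder():
    pending = {"XUL": "queued"}  # hypothetical job table
    errors = []
    worker = threading.Thread(target=feed_jobs, args=(pending, errors))
    worker.start()
    worker.join(timeout=5)  # bounded wait rather than blocking indefinitely
    return worker.is_alive(), errors
```

A caller that checks `errors` after `join()` can fail fast, rather than sitting idle until an external 2400 second no-output timeout kills the build.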
This smells like https://bugs.python.org/issue23051. The bug was fixed in Python 2.7.10.
Actually, I don't think this is Python issue 23051. But the stack does appear to come from inside multiprocessing, so this does look like a bug in multiprocessing.
The error is not the same as the one in issue 23051. However, the fix likely solves this problem as well. https://hg.python.org/cpython/rev/311d52878a65 is the fix. It added an additional try..except around the iteration of taskseq. This will almost certainly catch the exception we're seeing. I have no clue why we're now suddenly seeing this though. Weird.
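The idea of wrapping the task iteration in a try..except can be sketched roughly as follows. This is an illustration of the general pattern only, not the CPython change itself; `handle_tasks`, `broken_tasks`, and the tuple format placed on the queue are assumptions made up for this example.

```python
import queue


def handle_tasks(taskseq, outqueue):
    """Feed tasks from `taskseq` to `outqueue`, surviving iteration errors."""
    sent = 0
    try:
        for task in taskseq:
            outqueue.put(("task", task))
            sent += 1
    except Exception as exc:
        # Report the failure instead of letting the handler thread die.
        outqueue.put(("error", repr(exc)))
    finally:
        # Always send a final marker so consumers stop waiting.
        outqueue.put(("done", sent))


def broken_tasks():
    """Hypothetical task source that fails partway through iteration."""
    yield "TestJSONWriter"
    raise RuntimeError("dictionary changed size during iteration")
```

With this shape, an error raised while producing tasks becomes a message the consumer can act on, and the "done" marker is sent whether or not iteration finished cleanly.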
So bug 1164816 covers the "background task in symbolstore.py raises and the script hangs" situation. I think that patch just got hung up on a broken unit test; I suspect I could get that finished up and landed without much effort. As to the error at hand here, I'm really at a loss as to why it'd start showing up now. The multiprocessing code hasn't changed in a long time, and the recent changes to the script all seem to be pretty innocuous: https://hg.mozilla.org/mozilla-central/filelog/d6ea652c579992daa9041cc9718bb7c6abefbc91/toolkit/crashreporter/tools/symbolstore.py
Flags: needinfo?(ted)
Could we hack around this by throwing a multiprocessing.Lock around the code that inserts jobs?
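A rough sketch of what the lock workaround suggested above might look like. It does not reflect how symbolstore.py actually stores or submits its jobs; `JobTable` and `fill` are hypothetical. It uses `threading.Lock` to stay self-contained; `multiprocessing.Lock` supports the same `with` usage.

```python
import threading


class JobTable:
    """Hypothetical job registry whose inserts and reads share one lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._jobs = {}

    def add(self, key, files):
        # Only one thread may modify the dictionary at a time.
        with self._lock:
            self._jobs[key] = files

    def snapshot(self):
        # Copy under the lock so callers iterate over a stable list.
        with self._lock:
            return list(self._jobs.items())


def fill(table, prefix, count):
    """Insert `count` jobs while repeatedly reading the table."""
    for i in range(count):
        table.add("%s-%d" % (prefix, i), ("file.dSYM", "file"))
        table.snapshot()
```

The lock only helps if every reader and writer of the dictionary goes through it, which is why iteration here happens on a copy taken while the lock is held.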
Possibly, yeah, but the patch in that bug was basically done except for one dumb error that I hadn't fixed, so I just fixed that and pushed it to try to sanity check that it still works on all the other platforms. That try push is looking pretty green, so I should be able to get those patches landed today.
Pretty sure bug 1164816 fixed this.
Depends on: 1164816
Component: General Automation → General
I think this bug outlived its usefulness. Closing this, feel free to reopen if I'm wrong.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX