Closed
Bug 1191877
Opened 9 years ago
Closed 6 years ago
Intermittent OSX build Automation Error: mozprocess timed out after 2400 seconds running ['/tools/buildbot/bin/python', 'mach', '--log-no-times', 'build', '-v']
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: RyanVM, Unassigned)
References
Details
(Keywords: intermittent-failure)
+++ This bug was initially created as a clone of Bug #1165763 +++ +++ This bug was initially created as a clone of Bug #1154377 +++ +++ This bug was initially created as a clone of Bug #1145507 +++
Comment 23•9 years ago
From comment #22:
11:09:46 INFO - 44623: Worker processing files: ('dist/universal/test-stage/cppunittest/TestJSONWriter.dSYM', 'dist/universal/test-stage/cppunittest/TestJSONWriter')
11:09:46 INFO - 44622: Worker processing files: ('dist/universal/test-stage/cppunittest/TestLineBreak.dSYM', 'dist/universal/test-stage/cppunittest/TestLineBreak')
11:09:47 INFO - 44623: Worker processing files: ('dist/universal/test-stage/cppunittest/TestLineBreak.dSYM', 'dist/universal/test-stage/cppunittest/TestLineBreak')
11:09:47 INFO - 44620: Worker processing files: ('dist/universal/test-stage/cppunittest/TestMacroArgs.dSYM', 'dist/universal/test-stage/cppunittest/TestMacroArgs')
11:09:47 INFO - Exception in thread Thread-2:
11:09:47 INFO - Traceback (most recent call last):
11:09:47 INFO - File "/tools/python/lib/python2.7/threading.py", line 551, in __bootstrap_inner
11:09:47 INFO - self.run()
11:09:47 INFO - File "/tools/python/lib/python2.7/threading.py", line 504, in run
11:09:47 INFO - self.__target(*self.__args, **self.__kwargs)
11:09:47 INFO - File "/tools/python/lib/python2.7/multiprocessing/pool.py", line 319, in _handle_tasks
11:09:47 INFO - put(task)
11:09:47 INFO - RuntimeError: dictionary changed size during iteration
11:21:43 INFO - 44613: Submitting jobs for files: ('dist/universal/firefox/Nightly.app/Contents/MacOS/XUL.dSYM', 'dist/universal/firefox/Nightly.app/Contents/MacOS/XUL')
12:01:43 INFO - Automation Error: mozprocess timed out after 2400 seconds running ['/tools/buildbot/bin/python', 'mach', '--log-no-times', 'build', '-v']
12:01:43 ERROR - timed out after 2400 seconds of no output
12:01:43 ERROR - Return code: -9
12:01:43 WARNING - setting return code to 2
12:01:43 FATAL - 'mach build' did not run successfully. Please check log for errors.
This is a bug in symbolstore.py.
Comment 24•9 years ago
So, there is a bug where multiple threads (or something else) modify a dictionary during iteration. This causes an exception in a thread. And chances are the code that waits on threads is also buggy, in that it doesn't detect aborted threads.
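For reference, the error message in the log can be reproduced without any threads at all. The snippet below is only a minimal sketch of the failure mode, not the code in symbolstore.py or multiprocessing; the function and dictionary names are made up for illustration.

```python
def mutate_while_iterating():
    """Return the error text raised when a dict grows during iteration."""
    jobs = {"a": 1, "b": 2}  # hypothetical stand-in for a shared job table
    try:
        for key in jobs:
            # Adding a key changes the dict's size while the loop is active.
            jobs[key + "_copy"] = 0
    except RuntimeError as exc:
        return str(exc)
    return None


if __name__ == "__main__":
    print(mutate_while_iterating())
```

In the threaded case the insert happens on another thread, which is why the failure is intermittent rather than deterministic.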
Comment 25•9 years ago
This smells like https://bugs.python.org/issue23051. The bug was fixed in Python 2.7.10.
Comment 26•9 years ago
Actually, I don't think this is Python issue 23051. But the stack does appear to come from inside multiprocessing, so this does look like a bug in multiprocessing.
Comment 27•9 years ago
The error is not the same as the one in issue 23051. However, the fix likely solves this problem as well. https://hg.python.org/cpython/rev/311d52878a65 is the fix. It added an additional try..except around the iteration of taskseq. This will almost certainly catch the exception we're seeing. I have no clue why we're now suddenly seeing this though. Weird.
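To illustrate the general idea of wrapping the task iteration in a try..except, here is a simplified sketch. It is not the CPython patch: handle_tasks, put and errors are hypothetical names, and the real _handle_tasks does considerably more bookkeeping.

```python
def handle_tasks(taskseq, put, errors):
    """Feed each task to `put`; record an iteration failure instead of dying.

    Returns True if every task was sent, False if iterating `taskseq` raised.
    """
    index = -1
    try:
        for index, task in enumerate(taskseq):
            put(task)
    except Exception as exc:
        # Without this guard the exception would escape and kill the
        # handler thread, leaving the caller waiting forever.
        errors.append((index + 1, exc))
        return False
    return True


if __name__ == "__main__":
    def flaky_tasks():
        # Hypothetical task source that fails partway through.
        yield "task-0"
        yield "task-1"
        raise RuntimeError("dictionary changed size during iteration")

    sent, errors = [], []
    print(handle_tasks(flaky_tasks(), sent.append, errors), sent, errors)
```

The point of the guard is that the failure gets reported to whoever is waiting on the results, rather than silently ending the thread.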
Comment 31•9 years ago
So bug 1164816 covers the "background task in symbolstore.py raises and the script hangs" situation. I think that patch just got hung up on a broken unit test; I suspect I could get that finished up and landed without much effort. As to the error at hand here, I'm really at a loss as to why it'd start showing up now. The multiprocessing code hasn't changed in a long time, and the recent changes to the script all seem pretty innocuous: https://hg.mozilla.org/mozilla-central/filelog/d6ea652c579992daa9041cc9718bb7c6abefbc91/toolkit/crashreporter/tools/symbolstore.py
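As a rough illustration of how a "background task raises and the script hangs" situation can be avoided in general, the sketch below has the waiting code check each worker for a stored exception and for a timeout. This is not the patch from bug 1164816; CheckedThread and wait_all are hypothetical names, and plain threads are used here instead of a multiprocessing pool to keep the example small.

```python
import threading


class CheckedThread(threading.Thread):
    """Thread that remembers an exception raised by its target."""

    def __init__(self, target, args=()):
        super().__init__()
        self._checked_target = target
        self._checked_args = args
        self.error = None

    def run(self):
        try:
            self._checked_target(*self._checked_args)
        except Exception as exc:
            # Keep the failure so the waiting code can see it.
            self.error = exc


def wait_all(threads, timeout=5.0):
    """Join every thread; raise if one failed or did not finish in time."""
    for thread in threads:
        thread.join(timeout)
        if thread.is_alive():
            raise RuntimeError("worker did not finish within timeout")
        if thread.error is not None:
            raise RuntimeError("worker failed: %r" % (thread.error,))


if __name__ == "__main__":
    def broken_task():
        raise ValueError("simulated background failure")

    worker = CheckedThread(broken_task)
    worker.start()
    try:
        wait_all([worker])
    except RuntimeError as exc:
        print("detected:", exc)
```

With a check like this the build would fail promptly with the underlying error instead of sitting idle until the 2400 second timeout.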
Flags: needinfo?(ted)
Comment 32•9 years ago
Could we hack around this by throwing a multiprocessing.Lock around the code that inserts jobs?
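A minimal sketch of what that kind of lock could look like, assuming the shared state is a plain dictionary of jobs. JobTable and its methods are hypothetical and do not exist in symbolstore.py. The example defaults to threading.Lock as a stand-in so that it runs anywhere; a multiprocessing.Lock() could be passed in instead, since it is used through the same context-manager interface.

```python
import threading


class JobTable:
    """Dictionary of jobs where inserts and iteration share one lock."""

    def __init__(self, lock=None):
        # Assumption: any lock usable as a context manager is acceptable.
        self._lock = lock if lock is not None else threading.Lock()
        self._jobs = {}

    def insert(self, key, job):
        with self._lock:
            self._jobs[key] = job

    def snapshot(self):
        # Copy under the lock so callers iterate a list that cannot change.
        with self._lock:
            return list(self._jobs.items())


if __name__ == "__main__":
    table = JobTable()

    def producer(start):
        for i in range(start, start + 200):
            table.insert(i, "job-%d" % i)

    workers = [threading.Thread(target=producer, args=(n * 200,)) for n in range(4)]
    for worker in workers:
        worker.start()
    while any(worker.is_alive() for worker in workers):
        table.snapshot()  # safe to iterate while inserts are in flight
    for worker in workers:
        worker.join()
    print(len(table.snapshot()))
```

Note that a lock in the calling script only helps if the dictionary being iterated is one the script controls; it would not protect state internal to multiprocessing.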
Comment 37•9 years ago
Possibly, yeah, but the patch in that bug was basically done except for one dumb error that I hadn't fixed, so I just fixed that and pushed it to try to sanity check that it still works on all the other platforms. That try push is looking pretty green, so I should be able to get those patches landed today.
Updated•6 years ago
Component: General Automation → General
Comment 48•6 years ago
I think this bug has outlived its usefulness. Closing this; feel free to reopen if I'm wrong.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX