Intermittent AssertionError: can only join a started process
Categories
(Testing :: Mozbase, defect, P5)
Tracking
(firefox109 fixed)
Tracking | Status | |
---|---|---|
firefox109 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: gbrown)
References
Details
(Keywords: assertion, intermittent-failure)
Attachments
(1 file)
Filed by: sstanca [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=396146170&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/dblngbTBRSmAvHuL7LvUKg/runs/0/artifacts/public/logs/live_backing.log
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'PUBLIC': 'C:\\Users\\Public',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'PYTHON': 'c:\\mozilla-build\\python3\\python3.exe',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'PYTHONPATH': 'C:\\Users\\task_1668063117\\build\\tests\\raptor',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'REQUIRE_GPU': '0',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'RUST_BACKTRACE': 'full',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'SCCACHE_DISABLE': '1',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'SCRIPTSPATH': 'C:\\Users\\task_1668063117\\mozharness',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'SYSTEMDRIVE': 'C:',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'SYSTEMROOT': 'C:\\windows',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'TASKCLUSTER_PROXY_URL': 'http://localhost:80',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'TASKCLUSTER_ROOT_URL': 'https://firefox-ci-tc.services.mozilla.com',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'TASK_ID': 'dblngbTBRSmAvHuL7LvUKg',
[task 2022-11-10T10:34:24.097Z] 10:34:24 INFO - 'TASK_STOP_TIME': '1668075766',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'TEMP': 'C:\\Users\\task_1668063117\\AppData\\Local\\Temp',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'TMP': 'C:\\Users\\task_1668063117\\AppData\\Local\\Temp',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'TOOLTOOL_CACHE': 'C:\\builds\\tooltool_cache',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'USERDOMAIN': 'T-W1064-MS-269',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'USERDOMAIN_ROAMINGPROFILE': 'T-W1064-MS-269',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'USERNAME': 'task_1668063117',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'USERPROFILE': 'C:\\Users\\task_1668063117',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'VIRTUAL_ENV': 'venv',
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - 'WINDIR': 'C:\\windows'}
[task 2022-11-10T10:34:24.098Z] 10:34:24 INFO - Calling ['C:\\Users\\task_1668063117\\build\\venv\\Scripts\\python', 'C:\\Users\\task_1668063117\\build\\tests\\raptor\\raptor\\raptor.py', '--extra-profiler-run', '--chimera', '--collect-perfstats', '--browsertime-node', 'C:/Users/task_1668063117/fetches/node/node.exe', '--browsertime-browsertimejs', 'C:/Users/task_1668063117/fetches/browsertime/node_modules/browsertime/bin/browsertime.js', '--browsertime-chromedriver', 'C:/Users/task_1668063117/fetches/{}chromedriver.exe', '--browsertime-ffmpeg', 'C:/Users/task_1668063117/fetches/ffmpeg-4.4.1-full_build/bin/ffmpeg.exe', '--browsertime-geckodriver', 'C:/Users/task_1668063117/fetches/geckodriver.exe', '--browsertime-video', '--browsertime-visualmetrics', '--browsertime-no-ffwindowrecorder', '--browsertime', '--app', 'firefox', '--binary', 'C:\\Users\\task_1668063117\\build\\application\\firefox\\firefox.exe', '--test', 'fandom', '--symbolsPath', 'https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/LscU5UvvQFqv70C9yjwjmA/artifacts/public/build/target.crashreporter-symbols.zip', '--conditioned-profile', 'settled', '--host', '127.0.0.1', '--log-tbpl-level=debug'] with output_timeout 3600
[task 2022-11-10T10:34:24.246Z] 10:34:24 ERROR - Traceback (most recent call last):
[task 2022-11-10T10:34:24.246Z] 10:34:24 INFO - File "C:\Users\task_1668063117\build\tests\raptor\raptor\raptor.py", line 30, in <module>
[task 2022-11-10T10:34:24.247Z] 10:34:24 INFO - from browsertime import BrowsertimeDesktop, BrowsertimeAndroid
[task 2022-11-10T10:34:24.247Z] 10:34:24 INFO - File "C:\Users\task_1668063117\build\tests\raptor\raptor\browsertime\__init__.py", line 8, in <module>
[task 2022-11-10T10:34:24.247Z] 10:34:24 INFO - from .base import Browsertime
[task 2022-11-10T10:34:24.248Z] 10:34:24 INFO - File "C:\Users\task_1668063117\build\tests\raptor\raptor\browsertime\base.py", line 20, in <module>
[task 2022-11-10T10:34:24.248Z] 10:34:24 INFO - from benchmark import Benchmark
[task 2022-11-10T10:34:24.249Z] 10:34:24 INFO - File "C:\Users\task_1668063117\build\tests\raptor\raptor\benchmark.py", line 12, in <module>
[task 2022-11-10T10:34:24.249Z] 10:34:24 INFO - from wptserve import server, handlers
[task 2022-11-10T10:34:24.249Z] 10:34:24 INFO - ModuleNotFoundError: No module named 'wptserve'
[task 2022-11-10T10:34:24.274Z] 10:34:24 ERROR - Return code: 1
[task 2022-11-10T10:34:24.274Z] 10:34:24 WARNING - setting return code to 1
[task 2022-11-10T10:34:24.274Z] 10:34:24 INFO - Copying Raptor results to upload dir:
[task 2022-11-10T10:34:24.274Z] 10:34:24 INFO - C:\Users\task_1668063117\build\blobber_upload_dir\perfherder-data.json
[task 2022-11-10T10:34:24.274Z] 10:34:24 INFO - Copying raptor results from C:\Users\task_1668063117\build\raptor.json to C:\Users\task_1668063117\build\blobber_upload_dir\perfherder-data.json
[task 2022-11-10T10:34:24.275Z] 10:34:24 CRITICAL - Error copying results C:\Users\task_1668063117\build\raptor.json to upload dir C:\Users\task_1668063117\build\blobber_upload_dir\perfherder-data.json
[task 2022-11-10T10:34:24.275Z] 10:34:24 INFO - [Errno 2] No such file or directory: 'C:\\Users\\task_1668063117\\build\\raptor.json'
[task 2022-11-10T10:34:24.275Z] 10:34:24 INFO - Running post-action listener: _package_coverage_data
[task 2022-11-10T10:34:24.275Z] 10:34:24 INFO - Running post-action listener: _resource_record_post_action
[task 2022-11-10T10:34:24.275Z] 10:34:24 INFO - Running post-action listener: process_java_coverage_data
[task 2022-11-10T10:34:24.275Z] 10:34:24 INFO - Running post-action listener: stop_device
[task 2022-11-10T10:34:24.275Z] 10:34:24 INFO - [mozharness: 2022-11-10 10:34:24.275515Z] Finished run-tests step (success)
[task 2022-11-10T10:34:24.276Z] 10:34:24 INFO - Running post-run listener: _resource_record_post_run
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - Exception when reporting resource usage: Traceback (most recent call last):
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - File "C:\Users\task_1668063117\mozharness\mozharness\base\python.py", line 852, in _resource_record_post_run
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - self._resource_monitor.stop()
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - File "C:\Users\task_1668063117\build\venv\Lib\site-packages\mozsystemmonitor\resourcemonitor.py", line 354, in stop
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - self._process.join(10)
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - File "c:\mozilla-build\python3\lib\multiprocessing\process.py", line 123, in join
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - assert self._popen is not None, 'can only join a started process'
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - AssertionError: can only join a started process
[task 2022-11-10T10:34:24.376Z] 10:34:24 WARNING - returning nonzero exit status 1
[taskcluster 2022-11-10T10:34:24.412Z] Exit Code: 1
[taskcluster 2022-11-10T10:34:24.412Z] User Time: 15.625ms
[taskcluster 2022-11-10T10:34:24.412Z] Kernel Time: 0s
[taskcluster 2022-11-10T10:34:24.412Z] Wall Time: 1m37.2862379s
[taskcluster 2022-11-10T10:34:24.412Z] Result: FAILED
[taskcluster 2022-11-10T10:34:24.412Z] === Task Finished ===
[taskcluster 2022-11-10T10:34:24.412Z] Task Duration: 1m37.2862379s
[taskcluster 2022-11-10T10:34:26.418Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2023-11-10T08:57:14.563Z
[taskcluster 2022-11-10T10:34:26.711Z] Uploading redirect artifact public/logs/live.log to URL https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/dblngbTBRSmAvHuL7LvUKg/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2023-11-10T08:57:14.563Z
[taskcluster:error] exit status 1
Assignee | ||
Comment 1•2 years ago
|
||
It looks like this task failed because of the exception in raptor. There was a subsequent problem in resourcemanager which caused some warnings, including the bug title, "AssertionError: can only join a started process". In this bug, let's focus on improved handling in resourcemanager (maybe mozharness?), following early task termination.
Assignee | ||
Comment 2•2 years ago
|
||
There was trouble stopping because there was trouble starting:
[task 2022-11-10T10:34:18.302Z] 10:34:18 INFO - Starting resource monitoring.
[task 2022-11-10T10:34:18.309Z] 10:34:18 WARNING - Unable to start resource monitor: Traceback (most recent call last):
[task 2022-11-10T10:34:18.309Z] 10:34:18 WARNING - File "C:\Users\task_1668063117\mozharness\mozharness\base\python.py", line 822, in _start_resource_monitoring
[task 2022-11-10T10:34:18.309Z] 10:34:18 WARNING - self._resource_monitor.start()
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - File "C:\Users\task_1668063117\build\venv\Lib\site-packages\mozsystemmonitor\resourcemonitor.py", line 300, in start
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - self._process.start()
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - File "c:\mozilla-build\python3\lib\multiprocessing\process.py", line 105, in start
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - self._popen = self._Popen(self)
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - File "c:\mozilla-build\python3\lib\multiprocessing\context.py", line 223, in _Popen
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - return _default_context.get_context().Process._Popen(process_obj)
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - File "c:\mozilla-build\python3\lib\multiprocessing\context.py", line 322, in _Popen
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - return Popen(process_obj)
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - File "c:\mozilla-build\python3\lib\multiprocessing\popen_spawn_win32.py", line 48, in __init__
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - None, None, False, 0, None, None, None)
[task 2022-11-10T10:34:18.310Z] 10:34:18 WARNING - FileNotFoundError: [WinError 2] The system cannot find the file specified
Assignee | ||
Comment 3•2 years ago
|
||
mozsystemmonitor is in mozbase, but unlike most mozbase packages, it is installed from the mozilla pypi:
[task 2022-11-10T10:33:26.537Z] 10:33:26 INFO - Looking in links: https://pypi.pub.build.mozilla.org/pub/
[task 2022-11-10T10:33:27.414Z] 10:33:27 INFO - Collecting mozsystemmonitor==0.4
[task 2022-11-10T10:33:27.418Z] 10:33:27 INFO - Downloading https://pypi.pub.build.mozilla.org/pub/mozsystemmonitor-0.4.tar.gz (8.1 kB)
[task 2022-11-10T10:33:27.673Z] 10:33:27 INFO - Requirement already satisfied: psutil>=3.1.1 in c:\users\task_1668063117\build\venv\lib\site-packages (from mozsystemmonitor==0.4) (5.7.0)
[task 2022-11-10T10:33:27.675Z] 10:33:27 INFO - Using legacy 'setup.py install' for mozsystemmonitor, since package 'wheel' is not installed.
[task 2022-11-10T10:33:27.698Z] 10:33:27 INFO - Installing collected packages: mozsystemmonitor
[task 2022-11-10T10:33:27.700Z] 10:33:27 INFO - Running setup.py install for mozsystemmonitor: started
[task 2022-11-10T10:33:27.983Z] 10:33:27 INFO - Running setup.py install for mozsystemmonitor: finished with status 'done'
[task 2022-11-10T10:33:27.990Z] 10:33:27 INFO - Successfully installed mozsystemmonitor-0.4
[task 2022-11-10T10:33:28.055Z] 10:33:28 INFO - Return code: 0
...
[task 2022-11-10T10:33:30.220Z] 10:33:30 INFO - Processing c:\users\task_1668063117\build\tests\tools\geckoprocesstypes_generator
[task 2022-11-10T10:33:30.454Z] 10:33:30 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\manifestparser
[task 2022-11-10T10:33:30.739Z] 10:33:30 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozcrash
[task 2022-11-10T10:33:30.982Z] 10:33:30 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozdebug
[task 2022-11-10T10:33:31.228Z] 10:33:31 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozdevice
[task 2022-11-10T10:33:31.475Z] 10:33:31 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozfile
[task 2022-11-10T10:33:31.734Z] 10:33:31 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozgeckoprofiler
[task 2022-11-10T10:33:31.975Z] 10:33:31 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozhttpd
[task 2022-11-10T10:33:32.220Z] 10:33:32 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozinfo
[task 2022-11-10T10:33:32.461Z] 10:33:32 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozinstall
[task 2022-11-10T10:33:32.707Z] 10:33:32 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozleak
[task 2022-11-10T10:33:32.948Z] 10:33:32 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozlog
[task 2022-11-10T10:33:33.230Z] 10:33:33 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\moznetwork
[task 2022-11-10T10:33:33.468Z] 10:33:33 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozpower
[task 2022-11-10T10:33:33.718Z] 10:33:33 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozprocess
[task 2022-11-10T10:33:33.975Z] 10:33:33 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozprofile
[task 2022-11-10T10:33:34.250Z] 10:33:34 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozproxy
[task 2022-11-10T10:33:34.542Z] 10:33:34 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozrunner
[task 2022-11-10T10:33:34.809Z] 10:33:34 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozscreenshot
[task 2022-11-10T10:33:35.047Z] 10:33:35 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\moztest
[task 2022-11-10T10:33:35.313Z] 10:33:35 INFO - Processing c:\users\task_1668063117\build\tests\mozbase\mozversion
[task 2022-11-10T10:33:35.564Z] 10:33:35 INFO - Processing c:\users\task_1668063117\build\tests\condprofile
There have been several minor changes to mozsystemmonitor since version 0.4 -- none obviously relevant to this bug.
Assignee | ||
Comment 4•2 years ago
|
||
I checked a variety of tasks for m-c. mozsystemmonitor is working reliably on most platforms, but on windows-hw, it fails consistently as seen in comment 2; usually this does not result in a task failure - that seems reasonable to me, as the only consequence is that resource usage stats are not available.
Assignee | ||
Comment 5•2 years ago
|
||
I suspect _collect() is raising somewhere, on windows-hw only.
Assignee | ||
Comment 6•2 years ago
|
||
The direct cause of "can only join a started process" is that these two lines are in the wrong order:
https://searchfox.org/mozilla-central/rev/dd216c1307a2bf1b0d2465b9749fa86dac44303a/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py#329-330
_running guards calls involving self._process; as written, if start() raises, _running is still True, causing trouble on teardown.
Now, why is start() failing?
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 8•2 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #6)
Now, why is start() failing?
That seems to happen when multiprocessing cannot find the python executable. Docs say it takes that from sys.executable. For talos on win-hw, sys.executable is the script from the venv, C:\Users\task_1668127320\build\venv\Scripts\python.
I changed the executable in https://treeherder.mozilla.org/jobs?repo=try&revision=e0a3727ab8358a4bbe32f1385f9278c9d4b74443 and that worked great -- start() succeeded and system resource stats were collected and reported.
Now, comment 7 shows this is happening on regular windows as well, and on non-talos tests...
Assignee | ||
Comment 9•2 years ago
|
||
Comment 10•2 years ago
|
||
Pushed by gbrown@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/70abb5595f9f Improve error handling in mozsystemmonitor and enable multiprocessing in some mozharness scripts; r=jmaher
Comment 11•2 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Description
•