Closed Bug 1800055 Opened 2 years ago Closed 2 years ago

Intermittent AssertionError: can only join a started process

Categories

(Testing :: Mozbase, defect, P5)

defect

Tracking

(firefox109 fixed)

RESOLVED FIXED
109 Branch
Tracking Status
firefox109 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

References

Details

(Keywords: assertion, intermittent-failure)

Attachments

(1 file)

Filed by: sstanca [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=396146170&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/dblngbTBRSmAvHuL7LvUKg/runs/0/artifacts/public/logs/live_backing.log


[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'PUBLIC': 'C:\\Users\\Public',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'PYTHON': 'c:\\mozilla-build\\python3\\python3.exe',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'PYTHONPATH': 'C:\\Users\\task_1668063117\\build\\tests\\raptor',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'REQUIRE_GPU': '0',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'RUST_BACKTRACE': 'full',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'SCCACHE_DISABLE': '1',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'SCRIPTSPATH': 'C:\\Users\\task_1668063117\\mozharness',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'SYSTEMDRIVE': 'C:',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'SYSTEMROOT': 'C:\\windows',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'TASKCLUSTER_PROXY_URL': 'http://localhost:80',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'TASKCLUSTER_ROOT_URL': 'https://firefox-ci-tc.services.mozilla.com',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'TASK_ID': 'dblngbTBRSmAvHuL7LvUKg',
[task 2022-11-10T10:34:24.097Z] 10:34:24     INFO -  'TASK_STOP_TIME': '1668075766',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'TEMP': 'C:\\Users\\task_1668063117\\AppData\\Local\\Temp',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'TMP': 'C:\\Users\\task_1668063117\\AppData\\Local\\Temp',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'TOOLTOOL_CACHE': 'C:\\builds\\tooltool_cache',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'USERDOMAIN': 'T-W1064-MS-269',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'USERDOMAIN_ROAMINGPROFILE': 'T-W1064-MS-269',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'USERNAME': 'task_1668063117',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'USERPROFILE': 'C:\\Users\\task_1668063117',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'VIRTUAL_ENV': 'venv',
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO -  'WINDIR': 'C:\\windows'}
[task 2022-11-10T10:34:24.098Z] 10:34:24     INFO - Calling ['C:\\Users\\task_1668063117\\build\\venv\\Scripts\\python', 'C:\\Users\\task_1668063117\\build\\tests\\raptor\\raptor\\raptor.py', '--extra-profiler-run', '--chimera', '--collect-perfstats', '--browsertime-node', 'C:/Users/task_1668063117/fetches/node/node.exe', '--browsertime-browsertimejs', 'C:/Users/task_1668063117/fetches/browsertime/node_modules/browsertime/bin/browsertime.js', '--browsertime-chromedriver', 'C:/Users/task_1668063117/fetches/{}chromedriver.exe', '--browsertime-ffmpeg', 'C:/Users/task_1668063117/fetches/ffmpeg-4.4.1-full_build/bin/ffmpeg.exe', '--browsertime-geckodriver', 'C:/Users/task_1668063117/fetches/geckodriver.exe', '--browsertime-video', '--browsertime-visualmetrics', '--browsertime-no-ffwindowrecorder', '--browsertime', '--app', 'firefox', '--binary', 'C:\\Users\\task_1668063117\\build\\application\\firefox\\firefox.exe', '--test', 'fandom', '--symbolsPath', 'https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/LscU5UvvQFqv70C9yjwjmA/artifacts/public/build/target.crashreporter-symbols.zip', '--conditioned-profile', 'settled', '--host', '127.0.0.1', '--log-tbpl-level=debug'] with output_timeout 3600
[task 2022-11-10T10:34:24.246Z] 10:34:24    ERROR -  Traceback (most recent call last):
[task 2022-11-10T10:34:24.246Z] 10:34:24     INFO -    File "C:\Users\task_1668063117\build\tests\raptor\raptor\raptor.py", line 30, in <module>
[task 2022-11-10T10:34:24.247Z] 10:34:24     INFO -      from browsertime import BrowsertimeDesktop, BrowsertimeAndroid
[task 2022-11-10T10:34:24.247Z] 10:34:24     INFO -    File "C:\Users\task_1668063117\build\tests\raptor\raptor\browsertime\__init__.py", line 8, in <module>
[task 2022-11-10T10:34:24.247Z] 10:34:24     INFO -      from .base import Browsertime
[task 2022-11-10T10:34:24.248Z] 10:34:24     INFO -    File "C:\Users\task_1668063117\build\tests\raptor\raptor\browsertime\base.py", line 20, in <module>
[task 2022-11-10T10:34:24.248Z] 10:34:24     INFO -      from benchmark import Benchmark
[task 2022-11-10T10:34:24.249Z] 10:34:24     INFO -    File "C:\Users\task_1668063117\build\tests\raptor\raptor\benchmark.py", line 12, in <module>
[task 2022-11-10T10:34:24.249Z] 10:34:24     INFO -      from wptserve import server, handlers
[task 2022-11-10T10:34:24.249Z] 10:34:24     INFO -  ModuleNotFoundError: No module named 'wptserve'
[task 2022-11-10T10:34:24.274Z] 10:34:24    ERROR - Return code: 1
[task 2022-11-10T10:34:24.274Z] 10:34:24  WARNING - setting return code to 1
[task 2022-11-10T10:34:24.274Z] 10:34:24     INFO - Copying Raptor results to upload dir:
[task 2022-11-10T10:34:24.274Z] 10:34:24     INFO - C:\Users\task_1668063117\build\blobber_upload_dir\perfherder-data.json
[task 2022-11-10T10:34:24.274Z] 10:34:24     INFO - Copying raptor results from C:\Users\task_1668063117\build\raptor.json to C:\Users\task_1668063117\build\blobber_upload_dir\perfherder-data.json
[task 2022-11-10T10:34:24.275Z] 10:34:24 CRITICAL - Error copying results C:\Users\task_1668063117\build\raptor.json to upload dir C:\Users\task_1668063117\build\blobber_upload_dir\perfherder-data.json
[task 2022-11-10T10:34:24.275Z] 10:34:24     INFO - [Errno 2] No such file or directory: 'C:\\Users\\task_1668063117\\build\\raptor.json'
[task 2022-11-10T10:34:24.275Z] 10:34:24     INFO - Running post-action listener: _package_coverage_data
[task 2022-11-10T10:34:24.275Z] 10:34:24     INFO - Running post-action listener: _resource_record_post_action
[task 2022-11-10T10:34:24.275Z] 10:34:24     INFO - Running post-action listener: process_java_coverage_data
[task 2022-11-10T10:34:24.275Z] 10:34:24     INFO - Running post-action listener: stop_device
[task 2022-11-10T10:34:24.275Z] 10:34:24     INFO - [mozharness: 2022-11-10 10:34:24.275515Z] Finished run-tests step (success)
[task 2022-11-10T10:34:24.276Z] 10:34:24     INFO - Running post-run listener: _resource_record_post_run
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING - Exception when reporting resource usage: Traceback (most recent call last):
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING -   File "C:\Users\task_1668063117\mozharness\mozharness\base\python.py", line 852, in _resource_record_post_run
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING -     self._resource_monitor.stop()
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING -   File "C:\Users\task_1668063117\build\venv\Lib\site-packages\mozsystemmonitor\resourcemonitor.py", line 354, in stop
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING -     self._process.join(10)
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING -   File "c:\mozilla-build\python3\lib\multiprocessing\process.py", line 123, in join
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING -     assert self._popen is not None, 'can only join a started process'
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING - AssertionError: can only join a started process
[task 2022-11-10T10:34:24.376Z] 10:34:24  WARNING - returning nonzero exit status 1
[taskcluster 2022-11-10T10:34:24.412Z]    Exit Code: 1
[taskcluster 2022-11-10T10:34:24.412Z]    User Time: 15.625ms
[taskcluster 2022-11-10T10:34:24.412Z]  Kernel Time: 0s
[taskcluster 2022-11-10T10:34:24.412Z]    Wall Time: 1m37.2862379s
[taskcluster 2022-11-10T10:34:24.412Z]       Result: FAILED
[taskcluster 2022-11-10T10:34:24.412Z] === Task Finished ===
[taskcluster 2022-11-10T10:34:24.412Z] Task Duration: 1m37.2862379s
[taskcluster 2022-11-10T10:34:26.418Z] Uploading artifact public/logs/localconfig.json from file logs\localconfig.json with content encoding "gzip", mime type "application/octet-stream" and expiry 2023-11-10T08:57:14.563Z
[taskcluster 2022-11-10T10:34:26.711Z] Uploading redirect artifact public/logs/live.log to URL https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/dblngbTBRSmAvHuL7LvUKg/runs/0/artifacts/public/logs/live_backing.log with mime type "text/plain; charset=utf-8" and expiry 2023-11-10T08:57:14.563Z
[taskcluster:error] exit status 1

It looks like this task failed because of the exception in raptor. There was a subsequent problem in resourcemanager which caused some warnings, including the bug title, "AssertionError: can only join a started process". In this bug, let's focus on improved handling in resourcemanager (maybe mozharness?), following early task termination.

Assignee: nobody → gbrown
See Also: → 895388

There was trouble stopping because there was trouble starting:

[task 2022-11-10T10:34:18.302Z] 10:34:18     INFO - Starting resource monitoring.
[task 2022-11-10T10:34:18.309Z] 10:34:18  WARNING - Unable to start resource monitor: Traceback (most recent call last):
[task 2022-11-10T10:34:18.309Z] 10:34:18  WARNING -   File "C:\Users\task_1668063117\mozharness\mozharness\base\python.py", line 822, in _start_resource_monitoring
[task 2022-11-10T10:34:18.309Z] 10:34:18  WARNING -     self._resource_monitor.start()
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -   File "C:\Users\task_1668063117\build\venv\Lib\site-packages\mozsystemmonitor\resourcemonitor.py", line 300, in start
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -     self._process.start()
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -   File "c:\mozilla-build\python3\lib\multiprocessing\process.py", line 105, in start
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -     self._popen = self._Popen(self)
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -   File "c:\mozilla-build\python3\lib\multiprocessing\context.py", line 223, in _Popen
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -     return _default_context.get_context().Process._Popen(process_obj)
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -   File "c:\mozilla-build\python3\lib\multiprocessing\context.py", line 322, in _Popen
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -     return Popen(process_obj)
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -   File "c:\mozilla-build\python3\lib\multiprocessing\popen_spawn_win32.py", line 48, in __init__
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING -     None, None, False, 0, None, None, None)
[task 2022-11-10T10:34:18.310Z] 10:34:18  WARNING - FileNotFoundError: [WinError 2] The system cannot find the file specified

mozsystemmonitor is in mozbase, but unlike most mozbase packages, it is installed from the mozilla pypi:

[task 2022-11-10T10:33:26.537Z] 10:33:26     INFO -  Looking in links: https://pypi.pub.build.mozilla.org/pub/
[task 2022-11-10T10:33:27.414Z] 10:33:27     INFO -  Collecting mozsystemmonitor==0.4
[task 2022-11-10T10:33:27.418Z] 10:33:27     INFO -    Downloading https://pypi.pub.build.mozilla.org/pub/mozsystemmonitor-0.4.tar.gz (8.1 kB)
[task 2022-11-10T10:33:27.673Z] 10:33:27     INFO -  Requirement already satisfied: psutil>=3.1.1 in c:\users\task_1668063117\build\venv\lib\site-packages (from mozsystemmonitor==0.4) (5.7.0)
[task 2022-11-10T10:33:27.675Z] 10:33:27     INFO -  Using legacy 'setup.py install' for mozsystemmonitor, since package 'wheel' is not installed.
[task 2022-11-10T10:33:27.698Z] 10:33:27     INFO -  Installing collected packages: mozsystemmonitor
[task 2022-11-10T10:33:27.700Z] 10:33:27     INFO -      Running setup.py install for mozsystemmonitor: started
[task 2022-11-10T10:33:27.983Z] 10:33:27     INFO -      Running setup.py install for mozsystemmonitor: finished with status 'done'
[task 2022-11-10T10:33:27.990Z] 10:33:27     INFO -  Successfully installed mozsystemmonitor-0.4
[task 2022-11-10T10:33:28.055Z] 10:33:28     INFO - Return code: 0
...
[task 2022-11-10T10:33:30.220Z] 10:33:30     INFO -  Processing c:\users\task_1668063117\build\tests\tools\geckoprocesstypes_generator
[task 2022-11-10T10:33:30.454Z] 10:33:30     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\manifestparser
[task 2022-11-10T10:33:30.739Z] 10:33:30     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozcrash
[task 2022-11-10T10:33:30.982Z] 10:33:30     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozdebug
[task 2022-11-10T10:33:31.228Z] 10:33:31     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozdevice
[task 2022-11-10T10:33:31.475Z] 10:33:31     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozfile
[task 2022-11-10T10:33:31.734Z] 10:33:31     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozgeckoprofiler
[task 2022-11-10T10:33:31.975Z] 10:33:31     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozhttpd
[task 2022-11-10T10:33:32.220Z] 10:33:32     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozinfo
[task 2022-11-10T10:33:32.461Z] 10:33:32     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozinstall
[task 2022-11-10T10:33:32.707Z] 10:33:32     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozleak
[task 2022-11-10T10:33:32.948Z] 10:33:32     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozlog
[task 2022-11-10T10:33:33.230Z] 10:33:33     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\moznetwork
[task 2022-11-10T10:33:33.468Z] 10:33:33     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozpower
[task 2022-11-10T10:33:33.718Z] 10:33:33     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozprocess
[task 2022-11-10T10:33:33.975Z] 10:33:33     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozprofile
[task 2022-11-10T10:33:34.250Z] 10:33:34     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozproxy
[task 2022-11-10T10:33:34.542Z] 10:33:34     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozrunner
[task 2022-11-10T10:33:34.809Z] 10:33:34     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozscreenshot
[task 2022-11-10T10:33:35.047Z] 10:33:35     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\moztest
[task 2022-11-10T10:33:35.313Z] 10:33:35     INFO -  Processing c:\users\task_1668063117\build\tests\mozbase\mozversion
[task 2022-11-10T10:33:35.564Z] 10:33:35     INFO -  Processing c:\users\task_1668063117\build\tests\condprofile

There have been several minor changes to mozsystemmonitor since version 0.4 -- none obviously relevant to this bug.

See Also: → 1800121

I checked a variety of tasks for m-c. mozsystemmonitor is working reliably on most platforms, but on windows-hw, it fails consistently as seen in comment 2; usually this does not result in a task failure - that seems reasonable to me, as the only consequence is that resource usage stats are not available.

The direct cause of "can only join a started process" is that these two lines are in the wrong order:
https://searchfox.org/mozilla-central/rev/dd216c1307a2bf1b0d2465b9749fa86dac44303a/testing/mozbase/mozsystemmonitor/mozsystemmonitor/resourcemonitor.py#329-330

_running guards calls involving self._process; as written, if start() raises, _running is still True, causing trouble on teardown.

Now, why is start() failing?

(In reply to Geoff Brown [:gbrown] from comment #6)

Now, why is start() failing?

That seems to happen when multiprocessing cannot find the python executable. Docs say it takes that from sys.executable. For talos on win-hw, sys.executable is the script from the venv, C:\Users\task_1668127320\build\venv\Scripts\python.

I changed the executable in https://treeherder.mozilla.org/jobs?repo=try&revision=e0a3727ab8358a4bbe32f1385f9278c9d4b74443 and that worked great -- start() succeeded and system resource stats were collected and reported.

Now, comment 7 shows this is happening on regular windows as well, and on non-talos tests...

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/70abb5595f9f
Improve error handling in mozsystemmonitor and enable multiprocessing in some mozharness scripts; r=jmaher
See Also: → 1800123
See Also: → 1776048
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 109 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: