Closed Bug 1368241 Opened 7 years ago Closed 6 years ago

Intermittent ImportError: No module named mozsystemmonitor.resourcemonitor

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

https://treeherder.mozilla.org/logviewer.html#?job_id=102550183&repo=autoland

09:22:03     INFO - Fetch https://queue.taskcluster.net/v1/task/KZ5Et2QORNCOm3ERAAePdA/artifacts/public/build/target.crashreporter-symbols.zip into memory
09:22:04     INFO - Content-Length response header: 67759832
09:22:04     INFO - Bytes received: 67759832
09:22:09     INFO - Running post-action listener: _resource_record_post_action
09:22:09     INFO - Running post-action listener: set_extra_try_arguments
09:22:09     INFO - [mozharness: 2017-05-27 09:22:09.824000Z] Finished download-and-extract step (success)
09:22:09     INFO - [mozharness: 2017-05-27 09:22:09.824000Z] Running create-virtualenv step.
09:22:09     INFO - Running pre-action listener: _install_mozbase
09:22:09     INFO - Running pre-action listener: _pre_create_virtualenv
09:22:09     INFO - Running pre-action listener: _resource_record_pre_action
09:22:09     INFO - Running main action method: create_virtualenv
09:22:09     INFO - Creating virtualenv Z:\task_1495876836\build\venv
09:22:09     INFO - mkdir: Z:\task_1495876836\build\venv\Scripts
09:22:09     INFO - Copying c:\mozilla-build\python\python27.dll to Z:\task_1495876836\build\venv\Scripts\python27.dll
09:22:09     INFO - Running command: ['c:\\mozilla-build\\python\\python.exe', 'c:\\mozilla-build\\python\\Lib\\site-packages\\virtualenv.py', '--no-site-packages', '--distribute', 'Z:\\task_1495876836\\build\\venv'] in Z:\task_1495876836\build
09:22:09     INFO - Copy/paste: c:\mozilla-build\python\python.exe c:\mozilla-build\python\Lib\site-packages\virtualenv.py --no-site-packages --distribute Z:\task_1495876836\build\venv
09:22:13     INFO -  New python executable in Z:\task_1495876836\build\venv\Scripts\python.exe
09:22:23     INFO -  Installing setuptools, pip, wheel...done.
09:22:23     INFO - Return code: 0
09:22:23     INFO - Getting output from command: ['Z:\\task_1495876836\\build\\venv\\Scripts\\pip', '--version']
09:22:23     INFO - Copy/paste: Z:\task_1495876836\build\venv\Scripts\pip --version
09:22:23     INFO - Running post-action listener: _resource_record_post_action
09:22:23     INFO - Running post-action listener: _start_resource_monitoring
09:22:23  WARNING - Unable to start resource monitor: Traceback (most recent call last):
09:22:23  WARNING -   File "Z:\task_1495876836\mozharness\mozharness\base\python.py", line 583, in _start_resource_monitoring
09:22:23  WARNING -     from mozsystemmonitor.resourcemonitor import SystemResourceMonitor
09:22:23  WARNING - ImportError: No module named mozsystemmonitor.resourcemonitor
09:22:23     INFO - [mozharness: 2017-05-27 09:22:23.380000Z] Finished create-virtualenv step (failed)
09:22:23    FATAL - Uncaught exception: Traceback (most recent call last):
09:22:23    FATAL -   File "Z:\task_1495876836\mozharness\mozharness\base\script.py", line 2066, in run
09:22:23    FATAL -     self.run_action(action)
09:22:23    FATAL -   File "Z:\task_1495876836\mozharness\mozharness\base\script.py", line 2005, in run_action
09:22:23    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
09:22:23    FATAL -   File "Z:\task_1495876836\mozharness\mozharness\base\script.py", line 1945, in _possibly_run_method
09:22:23    FATAL -     return getattr(self, method_name)()
09:22:23    FATAL -   File "Z:\task_1495876836\mozharness\mozharness\base\python.py", line 436, in create_virtualenv
09:22:23    FATAL -     halt_on_failure=True)
09:22:23    FATAL -   File "Z:\task_1495876836\mozharness\mozharness\base\script.py", line 1563, in get_output_from_command
09:22:23    FATAL -     cwd=cwd, stderr=tmp_stderr, env=env)
09:22:23    FATAL -   File "c:\mozilla-build\python\lib\subprocess.py", line 710, in __init__
09:22:23    FATAL -     errread, errwrite)
09:22:23    FATAL -   File "c:\mozilla-build\python\lib\subprocess.py", line 958, in _execute_child
09:22:23    FATAL -     startupinfo)
09:22:23    FATAL - WindowsError: [Error 5] Access is denied
09:22:23    FATAL - Running post_fatal callback...
we have 44 instances in the last week,almost all on windows 7.

:catlee, can you help find someone to look at this high frequency failure?
Flags: needinfo?(catlee)
Whiteboard: [stockwell needswork]
glandium, you're more familiar with mozsystemmonitor. any ideas?
Flags: needinfo?(catlee) → needinfo?(mh+mozilla)
Whatever is causing the job to fail, it's not mozsystemmonitor.

The code is:
        try:
            from mozsystemmonitor.resourcemonitor import SystemResourceMonitor

            self.info("Starting resource monitoring.")
            self._resource_monitor = SystemResourceMonitor(poll_interval=1.0)
            self._resource_monitor.start()
        except Exception:
            self.warning("Unable to start resource monitor: %s" %
                         traceback.format_exc())

IOW, the WARNING lines are all output from that except branch. Everything that follows is normal execution with that exception ignored. That's what fails, and what the traceback says is that it's:

        output = self.get_output_from_command([pip, '--version'],
                                              halt_on_failure=True)

that fails.

09:22:23    FATAL - WindowsError: [Error 5] Access is denied

suggests pip is not readable/executable/whatever.
Flags: needinfo?(mh+mozilla)
:catlee, is this something that the relops team would look at or maybe the taskcluster team?  The failure rate is remaining high.
Flags: needinfo?(catlee)
Greg, any ideas? Looks like this is happening on the TC win7 instances.
Flags: needinfo?(catlee) → needinfo?(garndt)
Rob, do you know what could cause pip to not be readable/executable? (see comment 22)

Could this be related to the access denied messages we saw before with hg-shared?
Flags: needinfo?(garndt) → needinfo?(rthijssen)
sounds very similar to the hg problem. investigating now...
i believe the problem (on Windows 7) was caused by the missing Y: drive. On tc win instances the PIP_DOWNLOAD_CACHE environment variable is set to Y:\pip-cache and appropriate permissions are set on that directory (but only if creation of that directory was successful). On windows 7 we had a problem with the script which mounts the Y: drive (now fixed), while the Y: drive didn't exist, the env var wasn't being set causing pip calls to fall back to the buildbot default pip directory at c:\builds\pip_cache. Since this directory didn't have any permissions set, i think this was the cause of the access errors.

the errors should have already stopped occuring on win7 since the y: drive fix last week.
https://github.com/mozilla-releng/OpenCloudConfig/commit/28dcf8c9d4370aae5b138382da085f48b7ad8469
Flags: needinfo?(rthijssen)
they errors have gone way down, thanks for getting this fix.  Since we do still have errors and they are not all on win7, we either need to keep this bug open, or duplicate it so we can track the new failures.
should now be fixed with the same patch as was used for bug 1343049 where we now prevent generic worker from starting when the user environment is incomplete
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Component: General Automation → General
Blocks: 1461762
You need to log in before you can comment on or make changes to this bug.