esr78 CI is broken
Categories
(Release Engineering :: Release Automation: Other, defect)
Tracking
(firefox-esr78 fixed)
Tracking | Status
---|---
firefox-esr78 | fixed
People
(Reporter: mtabara, Assigned: glandium)
References
Details
Attachments
(1 file, 2 obsolete files)
47 bytes, text/x-phabricator-request | RyanVM: approval-mozilla-esr78+ | Details | Review
Seems like esr78 is currently broken. On Friday, the tip of esr78 was this and everything was green. Today, things are broken, but we've only pushed these changes.
At first glance, looking in the logs, I spotted the following:
[task 2020-09-01T03:47:37.248Z] 03:47:37 WARNING - check> Warning: Your Pipfile requires python_version 2.7, but you are using 3.5.3 (/builds/worker/w/o/_/o/bin/python).
[task 2020-09-01T03:47:37.248Z] 03:47:37 INFO - check> $ pipenv check will surely fail.
[task 2020-09-01T03:47:37.248Z] 03:47:37 ERROR - check> Traceback (most recent call last):
[task 2020-09-01T03:47:37.248Z] 03:47:37 INFO - check> File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/virtualenv.py", line 766, in <module>
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> verify_python_version(sys.stdout)
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/virtualenv.py", line 685, in verify_python_version
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> from distutils.version import LooseVersion
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "<frozen importlib._bootstrap>", line 969, in _find_and_load
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "<frozen importlib._bootstrap>", line 577, in module_from_spec
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "/builds/worker/workspace/obj-build/_virtualenvs/obj-build-1hbI4qbY-python3/lib/python3.5/site-packages/_distutils_hack/__init__.py", line 82, in create_module
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> return importlib.import_module('._distutils', 'setuptools')
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "/builds/worker/workspace/obj-build/_virtualenvs/obj-build-1hbI4qbY-python3/lib/python3.5/importlib/__init__.py", line 126, in import_module
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> return _bootstrap._gcd_import(name[level:], package, level)
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "<frozen importlib._bootstrap>", line 981, in _gcd_import
[task 2020-09-01T03:47:37.249Z] 03:47:37 INFO - check> File "<frozen importlib._bootstrap>", line 931, in _sanity_check
[task 2020-09-01T03:47:37.250Z] 03:47:37 INFO - check> SystemError: Parent module 'setuptools' not loaded, cannot perform relative import
[task 2020-09-01T03:47:37.250Z] 03:47:37 INFO - check> Error running mach:
This might be Python 3 fallout at first glance. Weirdly, none of the changes that landed touched any Python files, so it might be something different that I'm not currently seeing.
Continuing investigation.
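The "Parent module 'setuptools' not loaded" error in the traceback above can be reproduced in isolation. This is a hedged sketch, not the actual CI code: a relative import through importlib requires the parent package to be loadable first. Python 3.5 (as in the log) raised SystemError from `_sanity_check`; newer versions raise an ImportError subclass instead, but the root cause is the same. The parent name below is made up for illustration.

```python
import importlib

# Attempt a relative import whose parent package does not exist / is not
# loaded, mirroring importlib.import_module('._distutils', 'setuptools')
# from the log above. Python 3.5 raised SystemError here; newer versions
# raise an ImportError subclass (e.g. ModuleNotFoundError).
try:
    importlib.import_module("._distutils", "not_a_real_parent")  # hypothetical parent name
except (SystemError, ImportError) as exc:
    failure = exc

# Either way, the relative import cannot proceed without the parent.
print(type(failure).__name__)
```

This matches the log's failure mode: setuptools' `_distutils_hack` shim tried a relative import at a point where the `setuptools` package itself was not in `sys.modules`.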
Reporter | ||
Comment 1•4 years ago
|
||
Okay, to recap:
On Fri, Aug 28, 18:51:00, this pushlog changeset worked like a charm on ESR78; all builds were green on Treeherder.
On Tue, Sep 1, 02:34:13, this follow-up pushlog changeset is broken on Linux/Windows platforms, with a handful of jobs red on Treeherder.
Looking at the files, only C++- and JavaScript-related files changed, so at this point I'm suspecting some infra docker deployment might've caused this.
Next steps:
a) is this failing on esr78 only?
b) what kind of workers are failing?
c) compare 1-2 jobs from Friday vs Tuesday to see if the logs speak for themselves in terms of potential underlying infra changes
Reporter | ||
Comment 2•4 years ago
|
||
Let's take them one by one:
1) Windows 2012 x64 asan: the debug, optimized, and fuzzy builds are all broken. They work on central, so it's esr78-only.
Issues raised in the TH log summary: bug 1616074, bug 1598844, and bug 1545973.
prov/workertype: gecko-3/b-win2012
2) Linux x64 shippable opt: the opt build is broken, same as above. Works smoothly on central today, so it's isolated to esr78.
prov/workertype: gecko-3/b-linux
3) Linux x64 debug: the debug, base-toolchain, base-toolchain-clang, and fuzzy-debug builds are all broken for apparent bug 1598845, bug 1607333, and bug 1530613.
4) TODO: similarly to 3), for Linux x64 tsan, Linux x64 asan, Linux x64 opt, Linux shippable opt, and Linux debug. All work on central.
One of the failing jobs is also Valgrind, with
prov/workertype: gecko-3/b-linux-aws
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 3•4 years ago
|
||
I took the Linux debug build from Friday and compared the logs against today's, and found an interesting thing.
Friday:
[task 2020-08-28T18:17:23.859Z] 18:17:23 INFO - check> Virtualenv location: /builds/worker/workspace/obj-build/_virtualenvs/obj-build-1hbI4qbY-python3
[task 2020-08-28T18:17:23.859Z] 18:17:23 WARNING - check> Warning: Your Pipfile requires python_version 2.7, but you are using 3.5.3 (/builds/worker/w/o/_/o/bin/python).
[task 2020-08-28T18:17:23.859Z] 18:17:23 INFO - check> $ pipenv check will surely fail.
[task 2020-08-28T18:17:23.859Z] 18:17:23 WARNING - check> Warning: Your Pipfile requires python_version 2.7, but you are using 3.5.3 (/builds/worker/w/o/_/o/bin/python).
[task 2020-08-28T18:17:23.860Z] 18:17:23 INFO - check> $ pipenv check will surely fail.
[task 2020-08-28T18:17:23.860Z] 18:17:23 INFO - check> Error processing command. Ignoring because optional. (optional:setup.py:third_party/python/psutil:build_ext:--inplace)
[task 2020-08-28T18:17:23.860Z] 18:17:23 INFO - check> Error processing command. Ignoring because optional. (optional:packages.txt:comm/build/virtualenv_packages.txt)
[task 2020-08-28T18:17:23.860Z] 18:17:23 INFO - check> /builds/worker/checkouts/gecko/xpcom/idl-parser/xpidl/runtests.py
[task 2020-08-28T18:17:23.860Z] 18:17:23 INFO - check> TEST-PASS | /builds/worker/checkouts/gecko/xpcom/idl-parser/xpidl/runtests.py |
A couple of days later, same build, same spot:
[task 2020-09-01T02:20:03.420Z] 02:20:03 INFO - check> Virtualenv location: /builds/worker/workspace/obj-build/_virtualenvs/obj-build-1hbI4qbY-python3
[task 2020-09-01T02:20:03.420Z] 02:20:03 WARNING - check> Warning: Your Pipfile requires python_version 2.7, but you are using 3.5.3 (/builds/worker/w/o/_/o/bin/python).
[task 2020-09-01T02:20:03.420Z] 02:20:03 INFO - check> $ pipenv check will surely fail.
[task 2020-09-01T02:20:03.420Z] 02:20:03 WARNING - check> Warning: Your Pipfile requires python_version 2.7, but you are using 3.5.3 (/builds/worker/w/o/_/o/bin/python).
[task 2020-09-01T02:20:03.420Z] 02:20:03 INFO - check> $ pipenv check will surely fail.
[task 2020-09-01T02:20:03.420Z] 02:20:03 ERROR - check> Traceback (most recent call last):
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/virtualenv.py", line 766, in <module>
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> verify_python_version(sys.stdout)
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "/builds/worker/checkouts/gecko/python/mozbuild/mozbuild/virtualenv.py", line 685, in verify_python_version
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> from distutils.version import LooseVersion
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "<frozen importlib._bootstrap>", line 969, in _find_and_load
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "<frozen importlib._bootstrap>", line 666, in _load_unlocked
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "<frozen importlib._bootstrap>", line 577, in module_from_spec
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "/builds/worker/workspace/obj-build/_virtualenvs/obj-build-1hbI4qbY-python3/lib/python3.5/site-packages/_distutils_hack/__init__.py", line 82, in create_module
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> return importlib.import_module('._distutils', 'setuptools')
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "/builds/worker/workspace/obj-build/_virtualenvs/obj-build-1hbI4qbY-python3/lib/python3.5/importlib/__init__.py", line 126, in import_module
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> return _bootstrap._gcd_import(name[level:], package, level)
[task 2020-09-01T02:20:03.421Z] 02:20:03 INFO - check> File "<frozen importlib._bootstrap>", line 981, in _gcd_import
[task 2020-09-01T02:20:03.422Z] 02:20:03 INFO - check> File "<frozen importlib._bootstrap>", line 931, in _sanity_check
[task 2020-09-01T02:20:03.422Z] 02:20:03 INFO - check> SystemError: Parent module 'setuptools' not loaded, cannot perform relative import
Somewhat weird that they behaved differently. I'll check to see what we have on central on a green build.
Comment 4•4 years ago
|
||
Interestingly, this fails even with bug 1594914 grafted onto ESR78:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b9ff2f6bf291414655caa3bd184c95acd8d5187
I also did verify that previously-green changesets are failing now, so this definitely wasn't due to an in-tree change.
Reporter | ||
Comment 5•4 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #4)
Interestingly, this fails even with bug 1594914 grafted onto ESR78:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0b9ff2f6bf291414655caa3bd184c95acd8d5187
I also did verify that previously-green changesets are failing now, so this definitely wasn't due to an in-tree change.
Yeah, sounds like an infra change, but I don't get how central is still working, since the pool of workers is shared in gecko-3. If it was failing as a side effect of an infra change, it would have to fail for central too.
Comment 6•4 years ago
|
||
I don't have a cogent linear explanation of what is going on here, but there are a couple of virtualenv changes I've made in recent history that we've already known are suspect (namely bug 1660351), and those could be an issue.
If we're still caching objdirs between runs in automation (something that has caused CI incidents before), then this seems like a safe bet.
Comment 7•4 years ago
|
||
Kind of a shot in the dark because again, I don't understand this fully, but I suspect that backporting bug 1656614 into ESR78 would fix this.
Reporter | ||
Comment 8•4 years ago
|
||
RyanVM did a try push on top of esr78, and it seems bug 1594914 doesn't solve this. He's testing bug 1659575 and bug 1656614 the same way to see if either fixes things.
Reporter | ||
Comment 9•4 years ago
|
||
One of the thoughts I have is that https://hg.mozilla.org/mozilla-central/rev/33979c798b55 from bug 1661637 landed on Saturday, causing docker images to rebuild for level-3 workers, which are shared across central/esr78. So I'm wondering whether this change needs grafting to esr78, but I'm not 100% sure I'm right; I'm checking some of the taskgraph internal logic now. I think the change might've impacted just the decision tasks and forced the downstream dependencies (such as toolchains) to rebuild, but hasn't impacted the actual Linux/Windows workers.
Reporter | ||
Comment 10•4 years ago
|
||
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #9)
One of the thoughts I have is that https://hg.mozilla.org/mozilla-central/rev/33979c798b55 from bug 1661637 landed on Saturday, causing docker images to rebuild for level-3 workers, which are shared across central/esr78. So I'm wondering whether this change needs grafting to esr78, but I'm not 100% sure I'm right; I'm checking some of the taskgraph internal logic now. I think the change might've impacted just the decision tasks and forced the downstream dependencies (such as toolchains) to rebuild, but hasn't impacted the actual Linux/Windows workers.
The theory doesn't stand; build tasks should not be affected. Also, we should be seeing this fail on beta/release too, not just esr78.
(In reply to Mihai Tabara [:mtabara]⌚️GMT from comment #8)
RyanVM did a try push on top of esr78, and it seems bug 1594914 doesn't solve this. He's testing bug 1659575 and bug 1656614 the same way to see if either fixes things.
Neither worked, all three patches failed via try pushes on top of esr78.
Comment 11•4 years ago
|
||
I think my try pushes prove that https://github.com/pypa/setuptools/issues/2350 is to blame.
The real fix is probably to wait for setuptools to release a fixed version. If we can live with a busted esr78 for some period of time, that may be preferable.
This try push contains this fix, which forces builds not to use the new broken behavior. However, it appears to negatively affect, or not fix, some tests, e.g. source-test-python-tryselect.
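One workaround for the setuptools issue referenced above was an environment knob. This is a sketch under the assumption that `SETUPTOOLS_USE_DISTUTILS` is the variable setuptools >= 50 consults in `_distutils_hack`; setting it to `stdlib` opts back into the standard-library distutils, sidestepping the broken shim. It has to be set before setuptools is first imported, e.g. at the top of a CI wrapper.

```python
import os

# Opt back into the stdlib distutils (assumption: this is the knob the
# setuptools-50 _distutils_hack checks). Must happen before setuptools
# or anything that imports it is loaded in this process.
os.environ["SETUPTOOLS_USE_DISTUTILS"] = "stdlib"

print("SETUPTOOLS_USE_DISTUTILS =", os.environ["SETUPTOOLS_USE_DISTUTILS"])
```

In a CI context this would typically be exported in the task's environment rather than set in-process, so that every child Python process inherits it.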
Comment 12•4 years ago
|
||
Setuptools 50.0.1 is out. The changelog appears promising. This may Just Work now.
Comment 13•4 years ago
|
||
This patch should only land on esr78, and only until we get an upstream setuptools fix.
Comment 14•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Comment 15•4 years ago
|
||
The real fix is probably to wait for setuptools to release a fixed version.
It seems to me the real fix is not to have automation install whatever the latest version of setuptools happens to be when it runs. We use static version numbers to avoid the very problem we're facing right now.
Assignee | ||
Comment 16•4 years ago
|
||
It also seems all the failures are only happening during make check when running python-tests with python3. For esr78, it might be fair game to just not run them.
Assignee | ||
Comment 17•4 years ago
|
||
It's also worth noting we do have a wheel for setuptools 41.6, and we should be using it already, so how do we get hit by the setuptools 50 thing?
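A quick way to approach the question above is to check, without importing them, whether setuptools and its distutils shim are even visible in a given environment. This diagnostic sketch uses only the stdlib; `find_spec` returns None for absent modules, so it shows whether the setuptools-50 `_distutils_hack` could trigger at all in that virtualenv.

```python
import importlib.util

# Probe for setuptools and the shim installed by setuptools >= 50,
# without actually importing either (importing would itself trigger
# the hack). True means the module is importable in this environment.
status = {
    mod: importlib.util.find_spec(mod) is not None
    for mod in ("setuptools", "_distutils_hack")
}

print(status)
```

Running this inside the virtualenv that pipenv created (versus the mach one) would show which environment picked up the newer setuptools.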
Assignee | ||
Comment 18•4 years ago
|
||
So... I took a build job that failed on esr78, edited it on Taskcluster to get an interactive task, and... it didn't fail; I can't reproduce. So it looks like it might have fixed itself, which is not really reassuring. I'm going to retrigger a few jobs on esr78 and see what happens.
Assignee | ||
Comment 19•4 years ago
|
||
So the core problem is that pipenv is installing the latest setuptools from the network, rather than the in-tree wheel we use for virtualenv or a static version. It now installs 50.0.3.
Assignee | ||
Comment 20•4 years ago
|
||
Linux builds have apparently fixed themselves with that new version, but not the Windows builds.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 22•4 years ago
|
||
By setting PIP_NO_INDEX when running pipenv, we make it populate the
virtualenv with the in-tree wheels for e.g. pip and setuptools, instead of
whatever happens to be the latest version the day pipenv runs. This
makes it match what we already do for other virtualenvs (with
--no-download in VirtualenvManager.create).
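The mechanism described above can be sketched as follows. This is illustrative only: the wheel directory path is hypothetical, not the real in-tree layout. With `PIP_NO_INDEX` set, pip refuses to contact PyPI, and `PIP_FIND_LINKS` points it at local wheels instead, so pipenv seeds its virtualenv from pinned local pip/setuptools rather than the newest release on the network that day.

```python
import os

# Build the environment a pipenv invocation would inherit. PIP_NO_INDEX
# disables the PyPI index entirely; PIP_FIND_LINKS supplies a local
# directory of wheels to resolve from instead.
env = os.environ.copy()
env["PIP_NO_INDEX"] = "1"
env["PIP_FIND_LINKS"] = "/builds/worker/wheels"  # hypothetical wheel dir

# subprocess.run(["pipenv", "install"], env=env) would now resolve
# pip/setuptools offline, matching what --no-download does for the
# other virtualenvs.
print(env["PIP_NO_INDEX"], env["PIP_FIND_LINKS"])
```

The design point is that pip reads these as environment variables equivalent to its `--no-index` and `--find-links` flags, so the restriction applies to every pip invocation pipenv makes internally, not just the top-level command.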
Comment 23•4 years ago
|
||
I think my phabricator revision will fix this. More details in https://phabricator.services.mozilla.com/D89088#2800138
(In reply to Mike Hommey [:glandium] from comment #22)
Created attachment 9173567 [details]
Bug 1662381 - Don't let pipenv initialize the virtualenv with packages from pypi.
By setting PIP_NO_INDEX when running pipenv, we make it populate the virtualenv with the in-tree wheels for e.g. pip and setuptools, instead of whatever happens to be the latest version the day pipenv runs. This makes it match what we already do for other virtualenvs (with --no-download in VirtualenvManager.create).
Thank you for this!
Reporter | ||
Comment 24•4 years ago
|
||
Comment on attachment 9173567 [details]
Bug 1662381 - Don't let pipenv initialize the virtualenv with packages from pypi.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Fixing release automation for a bunch of builds, given the setuptools saga.
- User impact if declined:
- Fix Landed on Version:
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String or UUID changes made by this patch:
Reporter | ||
Updated•4 years ago
|
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 26•4 years ago
|
||
Note that esr78 fixed itself entirely, because even newer setuptools releases reverted the problematic changes. That said, we should still land the fix: esr78 is going to be supported for another year, and we don't want similar surprises happening in the future.
Comment 27•4 years ago
|
||
Comment on attachment 9173567 [details]
Bug 1662381 - Don't let pipenv initialize the virtualenv with packages from pypi.
Approved for 78.3esr. Is this going to land on m-c still?
Comment 28•4 years ago
|
||
bugherder uplift
Assignee | ||
Comment 29•4 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #27)
Comment on attachment 9173567 [details]
Bug 1662381 - Don't let pipenv initialize the virtualenv with packages from pypi.
Approved for 78.3esr. Is this going to land on m-c still?
It's not needed on m-c anymore, as we have removed pipenv entirely.
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|