Closed Bug 1262760 Opened 4 years ago Closed 3 years ago

OSError: [Errno 40] Too many levels of symbolic links

Categories

(Release Engineering :: General, defect, critical)

x86_64
Linux
defect
Not set
critical

Tracking

(firefox-esr45 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr45 --- fixed

People

(Reporter: ishikawa, Assigned: ewong)

References

Details

Attachments

(3 files, 11 obsolete files)

7.10 KB, patch
jlund
: review+
Details | Diff | Splinter Review
635 bytes, patch
glandium
: review+
Details | Diff | Splinter Review
3.44 KB, patch
jlund
: review+
rail
: review+
Details | Diff | Splinter Review
Just so people who should be aware are notified:
I have seen submissions of linux build to try-comm-central failing
with the following issue:

OSError: [Errno 40] Too many levels of symbolic links
Failed to create virtualenv: /builds/slave/tb-try-c-cen-lx-00000000000000/build/objdir-tb/_virtualenv

several times. (Not only mine which saw this error for the first time on
Tuesday,
https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=cfd9ec7c43bd3430425b830576357220b92bfe15

but some subsequent submissions by ewong, aleth, etc. also suffered from this
same error (not all of them, but they also seemed to have failed due to other
infrastructure issue).

OS X and Windows seem to be unaffected with such issues, and so
I suspect something is very amiss with linux platform for try-comm-central.

I don't think this is related to some script issue such as
bug 1262057.

TIA
> I suspect something is very amiss with linux platform for try-comm-central.

I meant to say that something is very wrong with linux platform.

TIA
The failure is during the get buildID stage, which from :glandium's
suggestion stems from a mismatch use of python.  

I'm still looking at it.  I am a bit concerned that it could be related
to my patch.  :(
Attached patch [buildbotcustom] wip patch (obsolete) — Splinter Review
Jordan, this is a prelim patch.   Not exactly sure if this is the right approach.
|make buildsymbols| stage is busted as well, but I don't know if it's related
to the previous steps' bustages since it uses "make" and not "pymake", so it's
getting the python from the virtualenv.
Attachment #8739298 - Flags: feedback?(jlund)
Comment on attachment 8739298 [details] [diff] [review]
[buildbotcustom] wip patch

Review of attachment 8739298 [details] [diff] [review]:
-----------------------------------------------------------------

1) in comment 2, which patch do you think could be related?
2) when was the last time this step worked?
3) what other products/branches share this factory and build step?

I would get the final r? of someone like glandium. I'm not too familiar with mach and using calls like: `/some/abs/path/to/python mach /some/abs/path/to/python $args`. I get what you are doing but I'd be a bad reviewer :)
Attachment #8739298 - Flags: feedback?(jlund) → feedback+
(In reply to Jordan Lund (:jlund) from comment #4)
> Comment on attachment 8739298 [details] [diff] [review]
> [buildbotcustom] wip patch
> 
> Review of attachment 8739298 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> 1) in comment 2, which patch do you think could be related?

:glandium (for lack of a better word) deconfirmed that my patch is
related to the bustage.  I'm looking at what change might have
happened to have caused this.

> 2) when was the last time this step worked?

widr, I have no idea.  I hardly pay attention to the treeherder c-c
tree.  it's only because that I'm fighting the suite tree and
came across the bug 1262057, which I was hoping to fix the
bustage of the added mozilla/ in the gcc path; but that opened
this bug. ;/  And Linux* has been busted for so long, I can't
see any of the very old good 

> 3) what other products/branches share this factory and build step?

I'm looking but with a prelim scan, it looks like Firefox also use
this (judging from bug 1232466).  

> 
> I would get the final r? of someone like glandium. I'm not too familiar with
> mach and using calls like: `/some/abs/path/to/python mach
> /some/abs/path/to/python $args`. I get what you are doing but I'd be a bad
> reviewer :)

tbh, it was one way to consistently use the same python executable;
however, I agree.. having glandium's input would be better.
(In reply to Edmund Wong (:ewong) from comment #5)
> (In reply to Jordan Lund (:jlund) from comment #4)
> > Comment on attachment 8739298 [details] [diff] [review]
> > [buildbotcustom] wip patch
> > 
> > Review of attachment 8739298 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > 1) in comment 2, which patch do you think could be related?
> 
> :glandium (for lack of a better word) deconfirmed that my patch is
> related to the bustage.  I'm looking at what change might have
> happened to have caused this.
> 
> > 2) when was the last time this step worked?
> 
> widr, I have no idea.  I hardly pay attention to the treeherder c-c
> tree.  it's only because that I'm fighting the suite tree and
> came across the bug 1262057, which I was hoping to fix the
> bustage of the added mozilla/ in the gcc path; but that opened
> this bug. ;/  And Linux* has been busted for so long, I can't
> see any of the very old good 

(clicked save by mistake).

(helps to get next 50).  anyway.  Last good Linux* build was Mar 8th.
Comment on attachment 8739298 [details] [diff] [review]
[buildbotcustom] wip patch

:glandium, is this something that would make mach consistently
use the same python executable?
Attachment #8739298 - Flags: feedback?(mh+mozilla)
Comment on attachment 8739298 [details] [diff] [review]
[buildbotcustom] wip patch

Review of attachment 8739298 [details] [diff] [review]:
-----------------------------------------------------------------

I don't know.
Attachment #8739298 - Flags: feedback?(mh+mozilla)
Attachment #8739298 - Flags: review?(jlund)
Comment on attachment 8739298 [details] [diff] [review]
[buildbotcustom] wip patch

Review of attachment 8739298 [details] [diff] [review]:
-----------------------------------------------------------------

still prefer not to review this but as mentioned over irc, we can try this out together tomorrow and hopefully catch any fallout before sheriffs. This is hard to test as it is used extensively on mozilla-release but not much else firefox wise.
Attachment #8739298 - Flags: review?(jlund) → review+
Comment on attachment 8739298 [details] [diff] [review]
[buildbotcustom] wip patch

I'm suspicious curmudgeon so I tried 
 /tools/buildbot/bin/python /builds/slave/tb-try-c-cen-lx-00000000000000/build/mozilla/mach /tools/buildbot/bin/python /builds/slave/tb-try-c-cen-lx-00000000000000/build/mozilla/config/printconfigsetting.py /builds/slave/tb-try-c-cen-lx-00000000000000/build/objdir-tb/dist/bin/application.ini App BuildID
on a slave, it yields:

It looks like you are trying to run an unknown mach command: /tools/buildbot/bin/python

Run |mach help| to show a list of commands.
----

I suggest a loaner to someone who knows the commm build system.
Attachment #8739298 - Flags: review+ → review-
Blocks: 1263805
(In reply to Jordan Lund (:jlund) from comment #9)
> Comment on attachment 8739298 [details] [diff] [review]
> [buildbotcustom] wip patch
> 
> Review of attachment 8739298 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> still prefer not to review this but as mentioned over irc, we can try this
> out together tomorrow and hopefully catch any fallout before sheriffs. This
> is hard to test as it is used extensively on mozilla-release but not much
> else firefox wise.

Looking at the customcode, I don't believe m-r even uses
this set of code.

Given:
http://hg.mozilla.org/build/buildbotcustom/file/tip/process/release.py#l585

which is what I'm assuming m-r uses, graphServer isn't stated in the
ReleaseBuildFactory argument list.  As such, 
http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l2451
doesn't have graphServer in the argument list in the MercurialBuildFactory
init line and so:

http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l817

graphServer=None,

and therefore:
http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l1430

is ignored in m-r builds.

And since Firefox doesn't use buildbot in (m-c, m-a and m-b), therefore
Firefox doesn't even use addBuildinfoSteps().

However, for TB, it uses buildbot for all the trees.

Looking at the bustages and non-bustages since c-c and c-a uses the
same buildbot code, I think I understand what's going on.

Firstly, http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l1386
essentially runs |/tools/buildbot/bin/python mach python ...| and it jumps to:

http://hg.mozilla.org/mozilla-central/file/tip/python/mach_commands.py#l62

which goes:

http://mxr.mozilla.org/comm-central/source/mozilla/python/mozbuild/mozbuild/base.py#665
which runs self.virtualenv_manager.ensure()[1] and self.virtualenv_manager.activate().

self.virtualenv_manager initializes self.python_path by getting
self.bin_path  (http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l70)
and basically concatenates that bin_path(basically the _virtualenv/bin path in the objdir) with
'python', so self.python_path becomes <objdir/_virtualenv/bin/python>

Then it gets info on this self.python_path via:

http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l92


It is busting in the ensure() step, which is:
http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l152

Since [1] runs without python= argument, ensure uses the default "sys.executable".

Then it calls |self.up_to_date(python)| which is here: 
http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l111

And here is the condition where it bails out:

http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l133

because python = sys.executable  (/tools/buildbot/bin/python) and
self.python_path = <objdir/_virtualenv/bin/python).

The sizes are the same, but since the paths are different the first condition:
  (/tools/buildbot/bin/python, 4896) vs. (<objdir/_virtualenv/bin/python>, 4896)

and the second condition
  (34014192, 4896) vs (34015216, 4896)

So with both conditions failing, it returns False.

At this point, I'm not sure what the correct way of fixing this is.  As I see
it, these are these options:

1) Patch virtualenv.py's check to make sure it's comparing the right
   python executable.  I'm assuming the check was to prevent something
   corrupting the _virtualenv's python; but the check currently assumes
   that |mach python| is called with the virtualenv's python, which
   in this case it isn't.

2) Use the objdir's virtualenv to call |mach python|

    - This assumes that the objdir's virtualenv path exists when
      calling |mach python| and assumes that no steps prior to this
      one blows away the virtualenv's path (even though |mach python|
      will build the virtualenv, the call to objdir's virtualenv python
      will fail).

3) Add a new step before addBuildInfos() to blow away the existing objdir's _virtualenv
   and have addBuildInfos() build it from scratch.

   - I'm not exactly sure this is a good idea as it might get rid of
     some 'stuff' pertinent to future steps.

:glandium, :jlund, any idea?
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(jlund)
starting with a simple change.
Attachment #8739298 - Attachment is obsolete: true
Attachment #8740778 - Flags: review?(jlund)
(In reply to Edmund Wong (:ewong) from comment #11)
> And here is the condition where it bails out:
> 
> http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/
> virtualenv.py#l133
> 
> because python = sys.executable  (/tools/buildbot/bin/python) and
> self.python_path = <objdir/_virtualenv/bin/python).
> 
> The sizes are the same, but since the paths are different the first
> condition:
>   (/tools/buildbot/bin/python, 4896) vs. (<objdir/_virtualenv/bin/python>,
> 4896)
> 
> and the second condition
>   (34014192, 4896) vs (34015216, 4896)
> 
> So with both conditions failing, it returns False.

Come to think of it, that condition is weird, and would be a regression from bug 1255585. Chris, why did you add the python path in the condition?

That said, even with that weirdness out of the way, you *do* have a problem where you're running mach with a python that is definitely not the same as the one used to create the virtualenv, because 34014192 != 34015216 (which means the python version is not exactly the same!). It's fine that you run mach with a python that is not the one from the virtualenv, but then it has the be the exact same one that was used to create the virtualenv.
Flags: needinfo?(mh+mozilla) → needinfo?(cmanchester)
The intention of the new check is to enforce exactly that -- either we're already running the virtualenv python, or we're using a python that's compatible with the python that created the virtualenv (so the caller can activate the virtualenv and expect it to work).

The first part of the condition is intending to catch the case we're already in the virtualenv. The second part would fail in this case, because on OS X, virtualenv will modify the interpreter binary when it creates the virtualenv, so the file size would not match the size of the interpreter used to create the virtualenv.
Flags: needinfo?(cmanchester)
(In reply to Edmund Wong (:ewong) from comment #11)
> Looking at the customcode, I don't believe m-r even uses
> this set of code.
> 
> Given:
> http://hg.mozilla.org/build/buildbotcustom/file/tip/process/release.py#l585
> 
> which is what I'm assuming m-r uses, 

we use this for actual release automation (automation for shipping firefox). However, we have CI builds just like m-c,m-a,m-b that uses the same entry point: https://dxr.mozilla.org/build-central/source/buildbotcustom/misc.py#1469

you can see proof of this by looking at a continuous integration build here: https://treeherder.mozilla.org/#/jobs?repo=mozilla-release&selectedJob=164811

e.g.

log: http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-linux/1460197253/mozilla-release-linux-bm72-build1-build24.txt.gz

log_snippet: /tools/buildbot/bin/python /builds/slave/m-rel-lx-000000000000000000000/build/mach python /builds/slave/m-rel-lx-000000000000000000000/build/config/printconfigsetting.py /builds/slave/m-rel-lx-000000000000000000000/build/obj-firefox/dist/bin/application.ini App BuildID

as to the other part of actually debugging the issue, I'll leave it to others here who are more familiar with the inner details to help diagnose
Flags: needinfo?(jlund)
Comment on attachment 8740778 [details] [diff] [review]
[buildbotcustom] proposed patch (v2)

Review of attachment 8740778 [details] [diff] [review]:
-----------------------------------------------------------------

again, I don't think I should review this. hopefully something comes out of the comments between you, glandium and chris in comment 11-14. if you get a r+ from someone, I can help integrate it into production.
Attachment #8740778 - Flags: review?(jlund)
(In reply to Chris Manchester (:chmanchester) from comment #14)
> The intention of the new check is to enforce exactly that -- either we're
> already running the virtualenv python, or we're using a python that's
> compatible with the python that created the virtualenv (so the caller can
> activate the virtualenv and expect it to work).
> 
> The first part of the condition is intending to catch the case we're already
> in the virtualenv. The second part would fail in this case, because on OS X,
> virtualenv will modify the interpreter binary when it creates the
> virtualenv, so the file size would not match the size of the interpreter
> used to create the virtualenv.

Ah okay, rereading it, it makes sense now. So we're left with the problem that the job is running mach with a *different* python than mach created the virtualenv with.
Is the issue here that the python for the virtualenv comes from tooltool, as opposed to /tools/buildbot/bin/python?
Your python comes from tooltool? O_o
Is there a symlink or are symlinks created during virtualenv setup?
If so, that is where I would focus attention although I know nothing about virtualenv creation :-(
Having done some tests, the issue I'm seeing is that the |compile|
step creates the virtualenv from /usr/local/bin/python which is
v 2.7.7.   In subsequent steps, it uses sys.executable which
is /tools/buildbot/bin/python which is 2.7.3.

Here is how I think it's being run:

1)
http://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py#1386

2)
http://hg.mozilla.org/mozilla-central/file/tip/python/mach_commands.py#l62

3)
http://mxr.mozilla.org/comm-central/source/mozilla/python/mozbuild/mozbuild/base.py#665

4)
http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l152

5)
http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l111

6)
http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l133

However, for some reasons, it completely bypasses all the above and
goes directly to:

http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l163

and then to

http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l178

http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l196

which then goes to virtualenv/virtualenv.py and tries
to create_environment via:

http://hg.mozilla.org/mozilla-central/file/tip/python/virtualenv/virtualenv.py#l698

and inside which, at:

http://hg.mozilla.org/mozilla-central/file/tip/python/virtualenv/virtualenv.py#l919

which install_python() overwrites the current 2.7.7 python
(but not overwriting the python_exe.ext file, thus the difference
in hexversion) with the 2.7.3 version.

But during the tests, I'm getting confused by the logs as given by:

http://archive.mozilla.org/pub/thunderbird/try-builds/ewong@pw-wspx.org-4f2a53e915fe2c5edb9f2015bd964d8d9fc99f3e/try-comm-central-linux64/try-comm-central-linux64-bm79-try1-build70.txt.gz

(the patch which is : https://hg.mozilla.org/try-comm-central/rev/08ad60045e69ab9c897149bc1a0ce61785cc3f99)
[do pardon the haphazard logging..]

the log shows it runs ensure AFTER it writes the new python,
but I haven't quite figured out how it is going directly
from buildbot's http://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py#1392
to running virtualenv/virtualenv.py without even showing
the logs for ensure().

I suspect I am misunderstanding how the logging works.

As for the solution, I'm guessing that there should be a check
before http://hg.mozilla.org/mozilla-central/file/tip/python/virtualenv/virtualenv.py#l698 to
see if a python exists in home_dir. If so, completely skip the
create_environment.

:glandium, can you clarify why I'm seeing such a strange result in
that while it probably does run ensure(), it just skips the 
logging? (is it because the logging works differently, and that I
need Logger to do the logging, instead of self.logx.write(..)?)
[yes, I could've just used self.log_handler.write() but I'm just
lazy :P)
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #19)
> Your python comes from tooltool? O_o
Umm, no, I completely misread some code.
(In reply to Edmund Wong (:ewong) from comment #21)
> Having done some tests, the issue I'm seeing is that the |compile|
> step creates the virtualenv from /usr/local/bin/python which is
> v 2.7.7.   In subsequent steps, it uses sys.executable which
> is /tools/buildbot/bin/python which is 2.7.3.

Which of these pythons should it be using, and why?
(In reply to Edmund Wong (:ewong) from comment #21)
> Having done some tests, the issue I'm seeing is that the |compile|
> step creates the virtualenv from /usr/local/bin/python which is
> v 2.7.7.   In subsequent steps, it uses sys.executable which
> is /tools/buildbot/bin/python which is 2.7.3.

sys.executable is whatever the path of the python that was used to run the script was. So it's only /tools/buildbot/bin/python if that's what you used to run the python script. And apparently, running mach build makes you use /usr/local/bin/python. I'll repeat myself, but this is exactly what you shouldn't be doing (although virtualenv should be able to switch between both without screwing up symlinks)
Flags: needinfo?(mh+mozilla)
(In reply to OrangeFactor Robot from comment #25)
> 6 automation job failures were associated with this bug in the last 7 days.
> 
> Repository breakdown:
> * comm-central: 6
> 
> Platform breakdown:
> * linux64: 3
> * linux32: 3
> 
> For more details, see:
> https://brasstacks.mozilla.com/orangefactor/
> ?display=Bug&bugid=1262760&startday=2016-04-18&endday=2016-04-24&tree=all

I think the issue more serious than this report indicates.
I have not bothered to submit compilation jobs to tryserver even for checking the sanity of my patches on pristine tree due to the bug. :-(
After comparing the logs and chatting with :jlund, here are my findings:

m-c (and now m-a) has this code:

http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l132

but why is it busted with TB and not FF?

The difference that I came up with are the following:

1) FF uses mozharness, which when running the buildsymbols step,
   it searches for the python2.7 exe via:

   http://hg.mozilla.org/releases/mozilla-aurora/file/tip/testing/mozharness/mozharness/mozilla/building/buildbase.py#l733

   And it happens to find the _virtualenv/bin/python2.7 one.

   But TB uses:
   
   http://hg.mozilla.org/build/buildbotcustom/file/tip/process/factory.py#l1380

   As mentioned earlier, /tools/buildbot/bin/python is 2.7.3, and the python in
   _virtualenv is 2.7.7.

2) Why isn't FF busted?

   because FF doesn't use buildbot anymore aside for ESR branches and so
   it won't bust.  If FF used buildbot steps w/o the mozharness, I am
   sure it will break assuming that on the slaves, /tools/buildbot/bin/python
   is anything but 2.7.7 (the version which created the virtualenv in the 
   compile step).

   Why?

   As mentioned earlier, during the compile step, /usr/local/bin/python2.7 (2.7.7)is
   used to create the virtualenv.  In the buildsymbols step, the buildbot step
   runs |/tools/buildbot/bin/python mach python ...| which eventually runs the
   following line:

   http://mxr.mozilla.org/mozilla-central/source/python/virtualenv/virtualenv.py#1214
   which overwrites the existing python with /tools/buildbot/bin/python, but
   it doesn't update the information in the objdir/_virtualenv/python_exe.txt
   (especially it doesn't update the sys.hexversion). From that:

   http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l133

   this fails the check and so it goes to create the virtualenv again.

   However, by now, the virtualenv's python has been overwritten and since it fails
   line 133 in virtualenv.py, it goes to :

   http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l163

   which calls:

   http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l178

   but with the confusion of python 2.7.3 being used on a 2.7.7-created virtualenv,
   things go bonkers.

   I made a test.  I commented out:

   http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l132

   to 135

   and I'm waiting for the results; but I expect it to go green, like
   https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&selectedJob=18104


   So, the option is to if we are to install a new virtualenv, we need to make sure that
   aside for overwriting the python (and its symlink), we also need to fix the
   information within python_exe.txt.  I am not certain that is a good idea
   as changing the virtualenv's python which was used to build TB might
   cause issues later on.

   So since FF doesn't use buildbot (aside for m-esr*), I'm guessing the possible
   way is to apply the attached patch, which explicitly uses the non-modified
   virtualenv's python.  

Can someone clarify whether I got anything wrong?
(In reply to Edmund Wong (:ewong) from comment #28)
> > http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/
> > virtualenv.py#l132
> > 
> >    to 135
> > 
> >    and I'm waiting for the results; but I expect it to go green, like
> >   
> > https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&selectedJob=18104
> > 
> 
> result:
> https://treeherder.mozilla.org/#/jobs?repo=try-comm-
> central&fromchange=132eb2d29eee69324708536617348601f6ab26ee&selectedJob=18106

This looks very promising.
Thank you for investigating this!
Comment on attachment 8740778 [details] [diff] [review]
[buildbotcustom] proposed patch (v2)

I just did a test with a loaner.

Indeed, at the beginning of the |make buildsymbols| step, it overwrites
the python binaries via:

http://mxr.mozilla.org/mozilla-central/source/python/virtualenv/virtualenv.py#1214

and 

http://mxr.mozilla.org/mozilla-central/source/python/virtualenv/virtualenv.py#1215

and it overwrites the symlinks:

http://mxr.mozilla.org/mozilla-central/source/python/virtualenv/virtualenv.py#1348

Which screws up the virtualenv.

The option is to add to the virtualenv.py code, prior to 
http://mxr.mozilla.org/mozilla-central/source/python/virtualenv/virtualenv.py#919

is to check if the home_dir exists *and* has already python bins.
If it does exist and has existing python bins, don't overwrite them.

Another option:

Use this buildbotcustom patch, which isn't used in m-c to m-r,
and if someone can correct me, the current m-a code won't hit
m-esr* (at least, not the following lines):

http://hg.mozilla.org/mozilla-central/file/tip/python/mozbuild/mozbuild/virtualenv.py#l133

to 135.

:glandium, with the 2nd option, it won't touch the tree; but, there
is a possible future bug with that, especially if the aforementioned
code hits m-esr* *AND* we still use /tools/buildbot/bin/python *and*
/tools/buildbot/bin/python is 2.7.3.  

:jlund,  given my stated findings, any opinions?
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(jlund)
(In reply to aleth [:aleth] from comment #23)
> (In reply to Edmund Wong (:ewong) from comment #21)
> > Having done some tests, the issue I'm seeing is that the |compile|
> > step creates the virtualenv from /usr/local/bin/python which is
> > v 2.7.7.   In subsequent steps, it uses sys.executable which
> > is /tools/buildbot/bin/python which is 2.7.3.
> 
> Which of these pythons should it be using, and why?

I believe the objdir/_virtualenv/bin/python is the right one to use,
regardless of what the sys.prefix is.
(In reply to Edmund Wong (:ewong) from comment #31)
> The option is to add to the virtualenv.py code, prior to 
> http://mxr.mozilla.org/mozilla-central/source/python/virtualenv/virtualenv.
> py#919
> 
> is to check if the home_dir exists *and* has already python bins.
> If it does exist and has existing python bins, don't overwrite them.

Overwriting is pretty much the desired effect. The problem is the fuckup with symlinks.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #33)
> (In reply to Edmund Wong (:ewong) from comment #31)
> > The option is to add to the virtualenv.py code, prior to 
> > http://mxr.mozilla.org/mozilla-central/source/python/virtualenv/virtualenv.
> > py#919
> > 
> > is to check if the home_dir exists *and* has already python bins.
> > If it does exist and has existing python bins, don't overwrite them.
> 
> Overwriting is pretty much the desired effect. The problem is the fuckup
> with symlinks.

In that case, I'll try to figure out how to untangle this mess, which means
the buildbotcustom patch can wait.
Flags: needinfo?(jlund)
Attached patch proposed patch (v1) (obsolete) — Splinter Review
While I am quite pleased with the following results for the
proposed patch:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a2f7bdb83ab370163ea4a83388259efb89a9637d

I am not certain it is right.  The only thing it does is prevent it
from doing symlinks when objdir/_virtualenv/bin/python is already
a symlink (to what, I don't check.  Should I?)
Attachment #8747394 - Flags: review?(mh+mozilla)
(In reply to Edmund Wong (:ewong) from comment #36)
> but try-c-c is a different story:
> https://treeherder.mozilla.org/#/jobs?repo=try-comm-
> central&revision=607e41e1c4e102f0a7f412cdd03d535177c046e6

Looking at the above you'd think that Linux was green and somehow
linux debug is busted.  It's all a lie.  Strictly speaking, Linux *
should all be red because the buildid is set to gibberish and
for some reasons, the graph_server post step accepts this
garbage and sends it off its merry way.

During the printconfigsettings for the BuildID, we get the output:
Using real prefix '/tools/python27'
New python executable in /builds/slave/tb-try-c-cen-lx-00000000000000/build/objdir-tb/_virtualenv/bin/python
Installing setuptools, pip, wheel...done.
running build_ext
copying build/lib.linux-x86_64-2.7/psutil/_psutil_linux.so -> psutil
copying build/lib.linux-x86_64-2.7/psutil/_psutil_posix.so -> psutil

20160429181908
program finished with exit code 0
elapsedTime=2.660079
buildid: "Using real prefix '/tools/python27'\nNew python executable in /builds/slave/tb-try-c-cen-lx-00000000000000/build/objdir-tb/_virtualenv/bin/python\nInstalling setuptools, pip, wheel...done.\nrunning build_ext\ncopying build/lib.linux-x86_64-2.7/psutil/_psutil_linux.so -> psutil\ncopying build/lib.linux-x86_64-2.7/psutil/_psutil_posix.so -> psutil\n\n20160429181908"

^^ please do note the buildid.

Then the next step, it sends that whole thing "Using real ... " to the graph server.

In the printconfigsetting step, the output of
that printconfig setting python command is sent to the property buildid.
With the additional "Using real ..." stuff, that's also being included.
(I suspect the graph server accepts the input from this step and sanitizes it.)

Both Linux and Windows debug fail(well, windows is an exception issue) because
post-get buildid/appID step (in which the buildid is set to gibberish), 
subsequent steps that require the buildid, chokes as it tries to use the
bogus buildid in the parameters, i.e.:

mock_mozilla -r mozilla-centos6-x86_64 --cwd /builds/slave/tb-try-c-cen-lx-d-000000000000/build/objdir-tb --unpriv --shell '/usr/bin/env HG_SHARE_BASE_DIR="/builds/hg-shared" XPCOM_DEBUG_BREAK="stack-and-abort" MOZ_CRASHREPORTER_NO_REPORT="1" UPLOAD_USER="trybld" MOZ_AUTOMATION="1" UPLOAD_HOST="upload.trybld.productdelivery.prod.mozaws.net" MOZ_OBJDIR="/builds/slave/tb-try-c-cen-lx-d-000000000000/build/objdir-tb" LD_LIBRARY_PATH="/tools/gcc-4.3.3/installed/lib:objdir-tb/mozilla/dist/bin" PATH="/tools/buildbot/bin:/usr/local/bin:/usr/lib/ccache:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/tools/git/bin:/tools/python27/bin:/tools/python27-mercurial/bin:/home/cltbld/bin" UPLOAD_SSH_KEY="~/.ssh/trybld_dsa" TINDERBOX_OUTPUT="1" CCACHE_DIR="/builds/ccache" POST_UPLOAD_CMD="post_upload.py --tinderbox-builds-dir ewong@pw-wspx.org-607e41e1c4e1/ -p thunderbird -i Using real prefix '"'"'/tools/python27'"'"'
New python executable in /builds/slave/tb-try-c-cen-lx-d-000000000000/build/objdir-tb/_virtualenv/bin/python
Installing setuptools, pip, wheel...done.
running build_ext
copying build/lib.linux-x86_64-2.7/psutil/_psutil_linux.so -> psutil
copying build/lib.linux-x86_64-2.7/psutil/_psutil_posix.so -> psutil

20160430014126 --revision 607e41e1c4e102f0a7f412cdd03d535177c046e6 --who ewong@pw-wspx.org --builddir try-comm-central-linux-debug --release-to-try-builds" LC_ALL="C" CCACHE_COMPRESS="1" UPLOAD_TO_TEMP="1" DISPLAY=":2" CCACHE_UMASK="002" python '"'"'/builds/slave/tb-try-c-cen-lx-d-000000000000/tools/buildfarm/utils/retry.py'"'"' -s 1 -r 5 -t 2460 make upload'


This is theoretically a separate bug.  :jlund, Should I just fix it also in this bug?
(if so, the options are:
   1) fix it such that it doesn't output the 'unneeded logging output'
   2) add another step between the get_buildid step and the next step which basically
      takes the buildid property and sanitize it.
   3) change the SetProperty() to SetPropertyFromCommand() and that will allow us
      the flexibility of taking whatever is generated by the printconfigsetting and
      sanitize it before outputting it to the system which sets the property.

:glandium, despite the errors, is the patch right?
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(jlund)
I'll sound like a broken record, but can't you just use the same python for everything?
Flags: needinfo?(mh+mozilla)
Comment on attachment 8740778 [details] [diff] [review]
[buildbotcustom] proposed patch (v2)

Given comment 38, perhaps this patch is the option that we can use.
Attachment #8740778 - Flags: review?(jlund)
Comment on attachment 8747394 [details] [diff] [review]
proposed patch (v1)

Review of attachment 8747394 [details] [diff] [review]:
-----------------------------------------------------------------

virtualenv is third-party, you'll want to discuss this upstream.
Attachment #8747394 - Flags: review?(mh+mozilla)
Comment on attachment 8740778 [details] [diff] [review]
[buildbotcustom] proposed patch (v2)

Review of attachment 8740778 [details] [diff] [review]:
-----------------------------------------------------------------

I've lost touch with this bug and how things work. But I'll trust that the path you're setting this to is real across all of osx/linux.

however, I do have 2 q's. clearing r? until questions are addressed.

::: process/factory.py
@@ +1376,5 @@
>              # hax https://bugzilla.mozilla.org/show_bug.cgi?id=1232466#c10
>              if self.platform.startswith('win'):
>                  python = ['c:/mozilla-build/python27/python', '-u']
>              else:
> +                python = ['%s/_virtualenv/bin/python' % self.mozillaObjdir]

1) judging by this patch, windows works as is?


2) this suggests that for FF, we assume self.mozillaObjdir to hold a different path value than TB. do you suspect this to be a problem: https://dxr.mozilla.org/build-central/source/buildbotcustom/process/factory.py#967
Attachment #8740778 - Flags: review?(jlund)
> This is theoretically a separate bug.  :jlund, Should I just fix it also in
> this bug?
> (if so, the options are:
>    1) fix it such that it doesn't output the 'unneeded logging output'
>    2) add another step between the get_buildid step and the next step which
> basically
>       takes the buildid property and sanitize it.
>    3) change the SetProperty() to SetPropertyFromCommand() and that will
> allow us
>       the flexibility of taking whatever is generated by the
> printconfigsetting and
>       sanitize it before outputting it to the system which sets the property.

does this still apply if you go with the patch you just asked me to r?
Flags: needinfo?(jlund)
(In reply to Jordan Lund (:jlund) from comment #41)
> Comment on attachment 8740778 [details] [diff] [review]
> [buildbotcustom] proposed patch (v2)
> 
> Review of attachment 8740778 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I've lost touch with this bug and how things work. But I'll trust that the
> path you're setting this to is real across all of osx/linux.
> 
> however, I do have 2 q's. clearing r? until questions are addressed.
> 
> ::: process/factory.py
> @@ +1376,5 @@
> >              # hax https://bugzilla.mozilla.org/show_bug.cgi?id=1232466#c10
> >              if self.platform.startswith('win'):
> >                  python = ['c:/mozilla-build/python27/python', '-u']
> >              else:
> > +                python = ['%s/_virtualenv/bin/python' % self.mozillaObjdir]
> 
> 1) judging by this patch, windows works as is?

Yes, windows works as is.

> 
> 
> 2) this suggests that for FF, we assume self.mozillaObjdir to hold a
> different path value than TB. do you suspect this to be a problem:
> https://dxr.mozilla.org/build-central/source/buildbotcustom/process/factory.
> py#967

tl;dr  (probably can become a disertation. ;P )

self.mozillaObjdir is  build/objdir  regardless of which application. The
line you linked no longer holds true (for TB and SM because of 
the flattening of the objdir in bug 1059511 (I believe), which means it goes to line 970.

Why: In TB's buildconfigs, mozillaDir is not defined except here:
http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/thunderbird_config.py#l933

which affects only "cypress" as:
http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/thunderbird_config.py#l933
ACTIVE_PROJECT_BRANCHES is from:

http://hg.mozilla.org/build/buildbot-configs/file/tip/mozilla/thunderbird_config.py#l7

which is only cypress and even then, it's defined as ''.
And since mozilla_dir isn't defined in the thunderbird_config branches,

http://hg.mozilla.org/build/buildbotcustom/file/tip/misc.py#l1832

will turn out to hand mozillaDir to NightlyBuildFactory and NightlyRepackFactory
as '', which essentially is None (I know, not really).

So in the NightlyBuildFactory and NightlyRepackFactory, this
check no longer holds true.  Which means, it can essentially be changed
to the following:

       # SeaMonkey/Thunderbird make use of mozillaDir. Firefox does not.
       self.mozillaDir = ''
       self.mozillaObjdir = self.objdir

       # Thunderbird now doesn't have mozillaDir, but still has a
       # mozillaSrcDir
       if mozillaSrcDir:
           self.mozillaSrcDir = '/%s' % (mozillaSrcDir)
       else:
           self.mozillaSrcDir = ''

Further note that I am 'trying' to get that virtualenv patch applied
to upstream but I've kinda gotten stuck as it breaks the virtualenv
tests. ;/
(In reply to Jordan Lund (:jlund) from comment #42)
> > This is theoretically a separate bug.  :jlund, Should I just fix it also in
> > this bug?
> > (if so, the options are:
> >    1) fix it such that it doesn't output the 'unneeded logging output'
> >    2) add another step between the get_buildid step and the next step which
> > basically
> >       takes the buildid property and sanitize it.
> >    3) change the SetProperty() to SetPropertyFromCommand() and that will
> > allow us
> >       the flexibility of taking whatever is generated by the
> > printconfigsetting and
> >       sanitize it before outputting it to the system which sets the property.
> 
> does this still apply if you go with the patch you just asked me to r?

No, not really.  In any event, point #1 requires an upstream patch,
so if the patch I asked you to r? isn't right, I make it right
or it's up to #2 or #3.  However, #3 assumes the buildbot version supports
SetPropertyFromCommand()  which is a 0.8.8+ feature. If the buildbot on the
builders don't support #3, then it's totally a #2 option.
See Also: → 1270045
Bug 1270045 is going to disable graph server post of ctor data, since graphs.m.o is being decommissioned. That will fix this bug because buildbot will get to http://hg.mozilla.org/build/buildbotcustom/file/7d70b0149b14/process/factory.py#l1430 and never go down that block, and so never try to run in printconfigsetting.py in addBuildInfoSteps().
I was probably too optimistic about bug 1270045 fixing this. It does remove the lookup of buildID etc for ctors, but later in the build there's another place it does that again. The code looks very similar so it will probably fail there instead, see
 http://hg.mozilla.org/build/buildbotcustom/file/default/process/factory.py#l1669

The net result is likely to be the build will now do buildsymbols and packaging, but fail just before uploading. We'll see from
  https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=4c4bfb67c29380ae4102566bbee076b370ba8f92
(In reply to Nick Thomas [:nthomas] from comment #46)
> I was probably too optimistic about bug 1270045 fixing this. It does remove
> the lookup of buildID etc for ctors, but later in the build there's another
> place it does that again. The code looks very similar so it will probably
> fail there instead, see
>  http://hg.mozilla.org/build/buildbotcustom/file/default/process/factory.
> py#l1669

Thanks Nick. Thought I had 'em all. Need to update my patch.
:jlund, since self.mozillaObjdir is just build/objdir (for both FF
and TB),  this would point to the same objdir.
Assignee: nobody → ewong
Attachment #8740778 - Attachment is obsolete: true
Attachment #8747394 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8749232 - Flags: review?(jlund)
Comment on attachment 8749232 [details] [diff] [review]
[buildbotcustom] use virtualenv's bin/python and not /tools/buildbot/bin/python [checked-in]

Review of attachment 8749232 [details] [diff] [review]:
-----------------------------------------------------------------

sure. let's try it :)
Attachment #8749232 - Flags: review?(jlund) → review+
Comment on attachment 8749232 [details] [diff] [review]
[buildbotcustom] use virtualenv's bin/python and not /tools/buildbot/bin/python [checked-in]

Pushed to buildbotcustom:
http://hg.mozilla.org/build/buildbotcustom/rev/a6b085c766b5
Attachment #8749232 - Attachment description: [buildbotcustom] use virtualenv's bin/python and not /tools/buildbot/bin/python → [buildbotcustom] use virtualenv's bin/python and not /tools/buildbot/bin/python [checked-in]
The buildid now reads:
Upon execvpe objdir-tb/_virtualenv/bin/python ['objdir-tb/_virtualenv/bin/python', '/builds/slave/tb-c-cen-lx-000000000000000000/build/mozilla/mach', 'python', '/builds/slave/tb-c-cen-lx-000000000000000000/build/mozilla/config/printconfigsetting.py', '/builds/slave/tb-c-cen-lx-000000000000000000/build/objdir-tb/dist/bin/application.ini', 'App', 'BuildID'] in environment id 50550336
:Traceback (most recent call last):
  File "/tools/buildbot/lib/python2.7/site-packages/twisted/internet/process.py", line 414, in _fork
    executable, args, environment)
  File "/tools/buildbot/lib/python2.7/site-packages/twisted/internet/process.py", line 460, in _execChild
    os.execvpe(executable, args, environment)
  File "/tools/buildbot/lib/python2.7/os.py", line 353, in execvpe
    _execvpe(file, args, env)
  File "/tools/buildbot/lib/python2.7/os.py", line 368, in _execvpe
    func(file, *argrest)
OSError: [Errno 2] No such file or directory
What is the protocol here for getting this backed out?
Flags: needinfo?(jlund)
Flags: needinfo?(ewong)
(In reply to Kent James (:rkent) from comment #54)
> What is the protocol here for getting this backed out?

I'll back it out now and reconfig (put it out of production). will update this bug once that is done
Flags: needinfo?(jlund)
Flags: needinfo?(ewong)
backed out. reconfig will start in 2m and take ~30min before taking effect. After 16:30 PT new builds can be triggered.
Sorry guys. My bad. Was supposed to be the full objdir path
Attachment #8749232 - Attachment is obsolete: true
Attachment #8749909 - Flags: review?(jlund)
FYI, we're now (since Saturday) seeing the same failure on OS X builds, but so far not on OS X nightlies:
https://treeherder.mozilla.org/logviewer.html#?job_id=37330&repo=comm-central#L16
Flags: needinfo?(ewong)
Comment on attachment 8749909 [details] [diff] [review]
[buildbotcustom] uses absMozillaObjDir

Review of attachment 8749909 [details] [diff] [review]:
-----------------------------------------------------------------

::: process/factory.py
@@ +1673,5 @@
>          # hax https://bugzilla.mozilla.org/show_bug.cgi?id=1232466#c10
>          if self.platform.startswith('win'):
>              python = ['c:/mozilla-build/python27/python', '-u']
>          else:
> +            python = ['%s/bin/python' % self.absMozillaObjDir]

I'm a little confused how you have both of the following in this patch:

   python = ['%s/bin/python' % self.absMozillaObjDir]

and

   python = ['%s/_virtualenv/bin/python' % self.absMozillaObjdir]


are you expecting self.absMozillaObjdir to change?
Attachment #8749909 - Flags: review?(jlund)
(In reply to Edmund Wong (:ewong) from comment #58)
> Sorry guys. My bad. Was supposed to be the full objdir path

Are you sure absMozillaObjdir is an absolute path?
https://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py#2195
(In reply to aleth [:aleth] from comment #62)
> (In reply to Edmund Wong (:ewong) from comment #58)
> > Sorry guys. My bad. Was supposed to be the full objdir path
> 
> Are you sure absMozillaObjdir is an absolute path?
> https://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py#2195

My bad.  Totally thought that absMozillaObjdir and completely failed to
see that comment.  :(
Flags: needinfo?(ewong)
Attachment #8749909 - Attachment is obsolete: true
Attachment #8750131 - Flags: review?(jlund)
Summary: try-comm-central issue: OSError: [Errno 40] Too many levels of symbolic links → OSError: [Errno 40] Too many levels of symbolic links
Comment on attachment 8750131 [details] [diff] [review]
[buildbotcustom] uses the absolute Mozilla Objdir.

Review of attachment 8750131 [details] [diff] [review]:
-----------------------------------------------------------------

r- until typos are fixed.

::: process/factory.py
@@ +1377,5 @@
>              if self.platform.startswith('win'):
>                  python = ['c:/mozilla-build/python27/python', '-u']
>              else:
> +                python = [WithProperty('%(basedir)s/' +
> +                          '%s/_virtualenv/bin/python' % self.absMozillaObjdir)]

WithProperty and self.absMozillaObjdir don't exist.

I think you mean

WithProperties and self.absMozillaObjDir (MercurialBuildFactory's)

interestingly, SingleSourceFactory has a self.absMozillaObjdir, and I bet that was a typo: https://hg.mozilla.org/build/buildbotcustom/rev/2c7eb46074b8

once you fix those typos, I'm worried about this patch in general. I would prefer it to just affect what is broken: TB. But this patch as I've mentioned before will touch a lot of Firefox (esr) and Fennec (esr, m-r, m-b). Some values for absMozillaObjdir I've determined will be:

WithProperties "%(basedir)s/build/obj-firefox/_virtualenv/bin/python
WithProperties "%(basedir)s/build/objdir-tb/_virtualenv/bin/python
WithProperties "%(basedir)s/build/comm-aurora/objdir-tb/_virtualenv/bin/python
WithProperties "%(basedir)s/build/comm-central/objdir-tb/i386/_virtualenv/bin/python

hopefully this is what you expect. We can try this but I'd like to do a Fennec m-r re-trigger and a Firefox esr re-trigger in treeherder after it lands on prod so we catch new regressions before next release..
Attachment #8750131 - Flags: review?(jlund) → review-
(In reply to Jordan Lund (:jlund) from comment #65)
> Comment on attachment 8750131 [details] [diff] [review]
> [buildbotcustom] uses the absolute Mozilla Objdir.
> 
> Review of attachment 8750131 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r- until typos are fixed.
> 
> ::: process/factory.py
> @@ +1377,5 @@
> >              if self.platform.startswith('win'):
> >                  python = ['c:/mozilla-build/python27/python', '-u']
> >              else:
> > +                python = [WithProperty('%(basedir)s/' +
> > +                          '%s/_virtualenv/bin/python' % self.absMozillaObjdir)]
> 
> WithProperty and self.absMozillaObjdir don't exist.
> 
> I think you mean
> 
> WithProperties and self.absMozillaObjDir (MercurialBuildFactory's)

Ah yes. My bad.  

> 
> interestingly, SingleSourceFactory has a self.absMozillaObjdir, and I bet
> that was a typo: https://hg.mozilla.org/build/buildbotcustom/rev/2c7eb46074b8
> 

Right.  Do you want me to fix that?  Btw, while scanning through the code,
I've also noticed references to '/pub/mozilla.org/'...  which no longer
exists (since everything's been moved to S3)? 

Or did I mistake it?


> once you fix those typos, I'm worried about this patch in general. I would
> prefer it to just affect what is broken: TB. But this patch as I've
> mentioned before will touch a lot of Firefox (esr) and Fennec (esr, m-r,
> m-b). Some values for absMozillaObjdir I've determined will be:
> 
> WithProperties "%(basedir)s/build/obj-firefox/_virtualenv/bin/python
> WithProperties "%(basedir)s/build/objdir-tb/_virtualenv/bin/python
> WithProperties
> "%(basedir)s/build/comm-aurora/objdir-tb/_virtualenv/bin/python
> WithProperties
> "%(basedir)s/build/comm-central/objdir-tb/i386/_virtualenv/bin/python
> 
> hopefully this is what you expect. We can try this but I'd like to do a
> Fennec m-r re-trigger and a Firefox esr re-trigger in treeherder after it
> lands on prod so we catch new regressions before next release..

Yes, those objdir are expected.  last one is OSX64.
Attachment #8750131 - Attachment is obsolete: true
Attachment #8750560 - Flags: review?(jlund)
Attachment #8750560 - Flags: review?(jlund) → review+
:jlund, I've seen the c-c looks green but how are the fennec and esr trees?
Looks like it worked! :-)

Thanks ewong for fixing this.

I suspect the typo mentioned in comment 65 should be fixed too, probably in a separate bug.
:ewong, can you comment on this change and how it could affect linux hardware machines in scl3?  We have talos data becoming unstable on May 9th (bug 1271948), and this went live about the same time.  Right now it appears the load on the machines is much higher- could it be that changing python is causing us problems or causing the spawned processes that run test jobs to be consuming more resources somehow?

I need to know:
* does this affect linux hardware machines?
* does this affect all test jobs and all branches?
* what testing was done on this to validate results?
Flags: needinfo?(ewong)
(In reply to aleth [:aleth] from comment #68)
> https://hg.mozilla.org/build/buildbotcustom/rev/
> a15cf826c0cb2facb526acdc568aac416c170733
> Bug 1262760 - Use the objdir's virtualenv's python executable. r=jlund

(In reply to Release Engineering SlaveAPI Service from comment #69)
> In production: https://hg.mozilla.org/build/buildbotcustom/rev/a15cf826c0cb

And... backed out again :(

https://hg.mozilla.org/build/buildbotcustom/rev/c8883ca15b03

Caused Bug 1272666 


(In reply to Joel Maher (:jmaher) from comment #72)
> :ewong, can you comment on this change and how it could affect linux
> hardware machines in scl3?  We have talos data becoming unstable on May 9th
> (bug 1271948), and this went live about the same time.  Right now it appears
> the load on the machines is much higher- could it be that changing python is
> causing us problems or causing the spawned processes that run test jobs to
> be consuming more resources somehow?
> 
> I need to know:
> * does this affect linux hardware machines?
> * does this affect all test jobs and all branches?
> * what testing was done on this to validate results?

That would be odd, we're working on getting this patch's backout into production about now. Can you confirm/deny if this affects your metrics there in anyway?
Depends on: 1272666
Flags: needinfo?(jmaher)
ok, after collecting a bunch of data, I don't believe we have issues with this patch to affect the talos performance numbers.
Flags: needinfo?(jmaher)
Flags: needinfo?(ewong)
(In reply to Joel Maher (:jmaher) from comment #74)
> ok, after collecting a bunch of data, I don't believe we have issues with
> this patch to affect the talos performance numbers.

Fwiw, I tested the step on a linux64 loaner
Attached patch [buildbotcustom] proposed patch (obsolete) — Splinter Review
This pretty much avoids screwing around with the actual code, but
uses that if self.mozillaSrcDir check (only TB uses this (and SeaMonkey uses a different branch)).
Attachment #8750560 - Attachment is obsolete: true
Attachment #8756664 - Flags: review?(jlund)
Comment on attachment 8756664 [details] [diff] [review]
[buildbotcustom] proposed patch

Incidentally, this patch is thanks to :jlund's brilliant suggestion, so
when it works, kudos to him.
Comment on attachment 8756664 [details] [diff] [review]
[buildbotcustom] proposed patch

Review of attachment 8756664 [details] [diff] [review]:
-----------------------------------------------------------------

::: process/factory.py
@@ +1294,1 @@
>              if self.mozillaSrcDir:

looks good but I think we should make an elif out of these

if windows:

elif thunderbird:

else:  # everything else
Attachment #8756664 - Flags: review?(jlund) → review-
Attached patch [buildbotcustom] proposed patch (obsolete) — Splinter Review
Attachment #8756664 - Attachment is obsolete: true
Attachment #8756694 - Flags: review?(jlund)
Comment on attachment 8756694 [details] [diff] [review]
[buildbotcustom] proposed patch

Review of attachment 8756694 [details] [diff] [review]:
-----------------------------------------------------------------

I think this will work though, after discussion in irc, I think the best solution is to take ewong's suggestion and put the `if windows` at the end:

if self.mozillaSrcDir:  # thunderbird
    machPath = '%(basedir)s/build/mozilla/mach'
    python = [WithProperties('%(basedir)s/' +
            '%s/_virtualenv/bin/python' % self.absMozillaObjDir)]
else:
    machPath = '%(basedir)s/build/mach'
    python = ['/tools/buildbot/bin/python']

if self.platform.startswith('win'):
    # all windows platforms, including tb, should use this python path
    python = ['c:/mozilla-build/python27/python', '-u']
Attachment #8756694 - Attachment is obsolete: true
Attachment #8756694 - Flags: review?(jlund)
Attachment #8756697 - Flags: review?(jlund)
Comment on attachment 8756697 [details] [diff] [review]
[buildbotcustom] proposed patch (version lost count)

Review of attachment 8756697 [details] [diff] [review]:
-----------------------------------------------------------------

looks good, fingers crossed!
Attachment #8756697 - Flags: review?(jlund) → review+
Call me clueless in Seattle; but, can someone point out the results of
the push?  Any bustages?  (rhetorical question really, since the patch
has not yet been backed out).  Anything need fixing?
Ok, apparently the tb esr builds aren't happy; but c-c is building.
this patch (albeit a hack) makes sure the printconfigsetting env has the pythonpath set to where xpcshell module is.  Also, (though I'm not
100% sure it's needed (it was needed on the loaner), I set the
I_PREFER_A_SUBOPTIMAL_MERCURIAL_EXPERIENCE env to bypass the
requirement of running |mach mercurial-setup|.

I've tested this on m-esr45 so this works.  it doesn't choke.
Attachment #8757196 - Flags: review?(jlund)
Attachment #8756697 - Attachment description: [buildbotcustom] proposed patch (version lost count) → [buildbotcustom] proposed patch (version lost count) [checked-in]
(In reply to Edmund Wong (:ewong) from comment #87)
> Created attachment 8757196 [details] [diff] [review]
> [buildbotcustom] proposed fix patch (v1)
> 
> this patch (albeit a hack) makes sure the printconfigsetting env has the
> pythonpath set to where xpcshell module is.  Also, (though I'm not
> 100% sure it's needed (it was needed on the loaner), I set the
> I_PREFER_A_SUBOPTIMAL_MERCURIAL_EXPERIENCE env to bypass the
> requirement of running |mach mercurial-setup|.
> 
> I've tested this on m-esr45 so this works.  it doesn't choke.

let me clarify the last statement here.  

I know why Firefox ESR45 broke, because it couldn't find the xpcshell module.
Had I set the PYTHONPATH to the xpcshell path, it works; but this
is only needed with ESR45.  ESR48(or whatever the next ESR is) will have had this 
fixed, so once Firefox/TB hits the next ESR,  this patch probably should be backed
out or removed, since I_PREFER.. was changed.  IF it is needed, it'd be
NO_MERCURIAL_SETUP_CHECK.
(In reply to Edmund Wong (:ewong) from comment #87)
> Created attachment 8757196 [details] [diff] [review]
> [buildbotcustom] proposed fix patch (v1)
> 
> this patch (albeit a hack) makes sure the printconfigsetting env has the
> pythonpath set to where xpcshell module is.

Why not just apply the build/virtualenv_packages.txt part of bug 1216817 to esr? it's a trivial change that should do no harm.

> Also, (though I'm not
> 100% sure it's needed (it was needed on the loaner), I set the
> I_PREFER_A_SUBOPTIMAL_MERCURIAL_EXPERIENCE env to bypass the
> requirement of running |mach mercurial-setup|.

The code in build/mach_bootstrap.py explicitly skips the mercurial-setup barking when MOZ_AUTOMATION is set. This is true on esr45 too. So you don't need this.
I got pinged by TB drivers asking about this bustage. I just backed out the buildbotcustom patch...

https://hg.mozilla.org/build/buildbotcustom/rev/c543cd26f4ad
(In reply to Mike Hommey [:glandium] from comment #89)
> (In reply to Edmund Wong (:ewong) from comment #87)
> > Created attachment 8757196 [details] [diff] [review]
> > [buildbotcustom] proposed fix patch (v1)
> > 
> > this patch (albeit a hack) makes sure the printconfigsetting env has the
> > pythonpath set to where xpcshell module is.
> 
> Why not just apply the build/virtualenv_packages.txt part of bug 1216817 to
> esr? it's a trivial change that should do no harm.

ewong, do you still have a loaner with which you could test that this does the trick? The relevant commit containing this change is https://hg.mozilla.org/mozilla-central/rev/e9e84784daf9.
(In reply to aleth [:aleth] from comment #91)
> (In reply to Mike Hommey [:glandium] from comment #89)
> > (In reply to Edmund Wong (:ewong) from comment #87)
> > > Created attachment 8757196 [details] [diff] [review]
> > > [buildbotcustom] proposed fix patch (v1)
> > > 
> > > this patch (albeit a hack) makes sure the printconfigsetting env has the
> > > pythonpath set to where xpcshell module is.
> > 
> > Why not just apply the build/virtualenv_packages.txt part of bug 1216817 to
> > esr? it's a trivial change that should do no harm.
> 
> ewong, do you still have a loaner with which you could test that this does
> the trick? The relevant commit containing this change is
> https://hg.mozilla.org/mozilla-central/rev/e9e84784daf9.

clearing review until this suggestion is either actioned or ruled out.
Attachment #8757196 - Flags: review?(jlund)
the first buildbotcustom patch is 'right' (for some future definition, i.e. when it doesn't burn trees) but it required what glandium mentioned..  adding an entry into the esr45 build/virtual... file.  which will require probably lots of
coordination with relman etc..   the complexity increases..
(In reply to Edmund Wong (:ewong) from comment #93)
> the first buildbotcustom patch is 'right' (for some future definition, i.e.
> when it doesn't burn trees) but it required what glandium mentioned.. 
> adding an entry into the esr45 build/virtual... file.  which will require
> probably lots of
> coordination with relman etc..   the complexity increases..

what's one more bug mail?   I actually meant, "The plot thickens..."
This patch adds the xpcshell.pth to the build/virtualenv_packages.txt so that during the printconfigsetting step, it finds the xpcshellcommandline module.
Attachment #8757196 - Attachment is obsolete: true
Attachment #8758505 - Flags: review?(mh+mozilla)
:jlund, I've attached the esr45 build config patch.  

I've tested it on a loaner with mozilla-esr45 and it works with the 
required buildbotcustom change.
Flags: needinfo?(jlund)
Comment on attachment 8758505 [details] [diff] [review]
proposed patch (V1)

Review of attachment 8758505 [details] [diff] [review]:
-----------------------------------------------------------------

Another way to fix this is bug 1277087, FWIW.
Attachment #8758505 - Flags: review?(mh+mozilla) → review+
(In reply to Edmund Wong (:ewong) from comment #96)
> :jlund, I've attached the esr45 build config patch.  
> 
> I've tested it on a loaner with mozilla-esr45 and it works with the 
> required buildbotcustom change.

okay cool. once you get approval from someone in relman: add a approval-esr45? flag for ritu or lizzard here: https://bugzilla.mozilla.org/attachment.cgi?id=8758505&action=edit we can try this buildbot patch again. put a note showing where the esr45 patch has been tested and which rel branches it is already on.

note: esr45 ff uses moharness to build now so we have reduced scope risk to from two products to one: tb
Flags: needinfo?(jlund)
Comment on attachment 8758505 [details] [diff] [review]
proposed patch (V1)

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration:
User impact if declined:  building esr45 fails when the previous
buildbotcustom patch is applied.
Fix Landed on Version: 45
Risk to taking this patch (and alternatives if risky): none
String or UUID changes made by this patch:  None

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8758505 - Flags: approval-mozilla-esr45?
Comment on attachment 8758505 [details] [diff] [review]
proposed patch (V1)

could you have a look at this. I wouldn't be surprised if this bug will also fix recent post merge fennec beta builds: comment 100
Flags: needinfo?(rkothari)
Hi Edmund, is this being nominated for ESR 45.3.0? The doors are shut for ESR 45.2.0. Please confirm.
Flags: needinfo?(rkothari) → needinfo?(ewong)
Comment on attachment 8758505 [details] [diff] [review]
proposed patch (V1)

RelEng flagged this as something we need for handling post-merge issues. I am approving this assuming this will land for ESR45.3.0.
Attachment #8758505 - Flags: approval-mozilla-esr45? → approval-mozilla-esr45+
(In reply to Ritu Kothari (:ritu) from comment #102)
> Hi Edmund, is this being nominated for ESR 45.3.0? The doors are shut for
> ESR 45.2.0. Please confirm.

Hi :ritu,  Whichever ESR 45 is opened for nominations I guess. I don't
know anything about the ESR branch.
Flags: needinfo?(ewong)
Attachment #8756697 - Attachment description: [buildbotcustom] proposed patch (version lost count) [checked-in] → [buildbotcustom] proposed patch (version lost count)
Keywords: checkin-needed
(In reply to Edmund Wong (:ewong) from comment #104)
> (In reply to Ritu Kothari (:ritu) from comment #102)
> > Hi Edmund, is this being nominated for ESR 45.3.0? The doors are shut for
> > ESR 45.2.0. Please confirm.
> 
> Hi :ritu,  Whichever ESR 45 is opened for nominations I guess. I don't
> know anything about the ESR branch.

esr 45.3.0 is fine I think
(In reply to Jordan Lund (:jlund) from comment #101)
> Comment on attachment 8758505 [details] [diff] [review]
> proposed patch (V1)
> 
> could you have a look at this. I wouldn't be surprised if this bug will also
> fix recent post merge fennec beta builds: comment 100

I wasn't very explicit here. The fix for broken fennec beta and the tb jobs that are hitting this bug, I believe, is similar to the buildbotcustom patch attached here: https://bugzilla.mozilla.org/attachment.cgi?id=8756697

however, the buildbotcustom patch breaks esr so ewong needed to get approval and land that first.

ewong: so now that fennec is affected, I'm thinking we should remove the `if self.mozillaSrcDir:` from your bbotcustom patch so it fixes fennec too. what do you think?
Flags: needinfo?(ewong)
(In reply to Jordan Lund (:jlund) from comment #106)
> (In reply to Jordan Lund (:jlund) from comment #101)
> > Comment on attachment 8758505 [details] [diff] [review]
> > proposed patch (V1)
> > 
> > could you have a look at this. I wouldn't be surprised if this bug will also
> > fix recent post merge fennec beta builds: comment 100
> 
> I wasn't very explicit here. The fix for broken fennec beta and the tb jobs
> that are hitting this bug, I believe, is similar to the buildbotcustom patch
> attached here: https://bugzilla.mozilla.org/attachment.cgi?id=8756697
> 
> however, the buildbotcustom patch breaks esr so ewong needed to get approval
> and land that first.
> 
> ewong: so now that fennec is affected, I'm thinking we should remove the `if
> self.mozillaSrcDir:` from your bbotcustom patch so it fixes fennec too. what
> do you think?

Having done some manual teasing of the broken android stuff, I was initially under the impression the attached buildbot patch won't fix it (due to running it after the brokenness already hit)

HOWEVER it appears that the broken python invoke that this patch fixes, actually recreates (in a broken way) that venv to begin with, so I think this is what we need to do...
Component: Build Config → General Automation
Product: Thunderbird → Release Engineering
QA Contact: catlee
Version: Trunk → unspecified
I did no proper testing here, but afaict this should work.

The only question to be is if we want `absMozilla` or the non-absolute-version (or if we do want absMozilla do we want the baseDir)

Either way here you go, review at will.
Attachment #8760846 - Flags: review?(jlund)
Comment on attachment 8760846 [details] [diff] [review]
[buildbotcustom] Ontop of ewongs patch above

Review of attachment 8760846 [details] [diff] [review]:
-----------------------------------------------------------------

::: process/factory.py
@@ +1287,5 @@
>              printconfig_workdir = WithProperties('%(basedir)s/build/' + self.objdir)
>  
> +            machPath = '%(basedir)s/build/mozilla/mach'
> +            python = [WithProperties('%(basedir)s/' +
> +                      '%s/_virtualenv/bin/python' % self.absMozillaObjDir)]

I fear using self.absMozillaObjDir here might be tb specific..

investigating...
Comment on attachment 8760846 [details] [diff] [review]
[buildbotcustom] Ontop of ewongs patch above

Review of attachment 8760846 [details] [diff] [review]:
-----------------------------------------------------------------

::: process/factory.py
@@ +1287,5 @@
>              printconfig_workdir = WithProperties('%(basedir)s/build/' + self.objdir)
>  
> +            machPath = '%(basedir)s/build/mozilla/mach'
> +            python = [WithProperties('%(basedir)s/' +
> +                      '%s/_virtualenv/bin/python' % self.absMozillaObjDir)]

I guess because of https://bugzilla.mozilla.org/show_bug.cgi?id=1059511 this will work for either tb and ff.

I don't see the harm in using abs paths for it either.
Attachment #8760846 - Flags: review?(jlund) → review+
(In reply to Jordan Lund (:jlund) from comment #110)
> Comment on attachment 8760846 [details] [diff] [review]
> [buildbotcustom] Ontop of ewongs patch above
> 

> 
> I don't see the harm in using abs paths for it either.

might have to be actually judging by comment 58 and the backout comments before it
(will take the first to look)
Attachment #8760846 - Attachment is obsolete: true
Attachment #8760865 - Flags: review?(rail)
Attachment #8760865 - Flags: review?(jlund)
Comment on attachment 8760865 [details] [diff] [review]
[buildbotcustom] ontop of ewongs patch take 2

lgtm
Attachment #8760865 - Flags: review?(rail) → review+
Comment on attachment 8760865 [details] [diff] [review]
[buildbotcustom] ontop of ewongs patch take 2

Review of attachment 8760865 [details] [diff] [review]:
-----------------------------------------------------------------

actually, does this patch assume ewong's has landed?
Comment on attachment 8760865 [details] [diff] [review]
[buildbotcustom] ontop of ewongs patch take 2

Review of attachment 8760865 [details] [diff] [review]:
-----------------------------------------------------------------

works for me. trusting on rail and you that using abs paths makes it so we don't need two different values for tb vs fennec.
Attachment #8760865 - Flags: review?(jlund) → review+
landed for post-land review, so that I can get the reconfig going before the hour is up...

https://hg.mozilla.org/build/buildbotcustom/rev/6f6a9824ca4d

r? rail?
Flags: needinfo?(rail)
lgtm!
Flags: needinfo?(rail)
jlund, callek does something need to still land here on mozilla-beta ?
Flags: needinfo?(jlund)
Flags: needinfo?(bugspam.Callek)
(In reply to Carsten Book [:Tomcat] from comment #121)
> jlund, callek does something need to still land here on mozilla-beta ?

nothing on beta afaik. fennec beta is unblocked. not sure about tb. ewong: how is tb looking? If everything is good. please resolve this bug. If you haven't landed the esr45 patch, and esr45 tb tests are broken, you might want to land now seeing you have approval: https://bugzilla.mozilla.org/attachment.cgi?id=8758505
Flags: needinfo?(jlund)
Flags: needinfo?(bugspam.Callek)
(In reply to Jordan Lund (:jlund) from comment #122)
> (In reply to Carsten Book [:Tomcat] from comment #121)
> > jlund, callek does something need to still land here on mozilla-beta ?
> 
> nothing on beta afaik. fennec beta is unblocked. not sure about tb. ewong:
> how is tb looking? If everything is good. please resolve this bug. If you
> haven't landed the esr45 patch, and esr45 tb tests are broken, you might
> want to land now seeing you have approval:
> https://bugzilla.mozilla.org/attachment.cgi?id=8758505

TB looks good on c-c.  ESR45 is busted on nightly and will probably
be busted on regular build, so I'll push the patch to esr45.

Thanks
Flags: needinfo?(ewong)
https://hg.mozilla.org/releases/mozilla-esr45/rev/508243893e0c


Does this need to land anywhere else?
Flags: needinfo?(ewong)
(In reply to Wes Kocher (:KWierso) from comment #124)
> https://hg.mozilla.org/releases/mozilla-esr45/rev/508243893e0c
> 
> 
> Does this need to land anywhere else?

I don't believe so, since the issue lies in the missing of
the xpcshell.pth in the packages file, and this occurs only
with um 45 and less.  So I believe the other trees are ok.
Flags: needinfo?(ewong)
Closing then (the checkin-needed patch looks like it already landed). Thx a lot Edmund!!!
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Umm, ewong, I'm not sure this actually worked on comm-esr? https://treeherder.mozilla.org/#/jobs?repo=comm-esr45
Flags: needinfo?(ewong)
(In reply to aleth [:aleth] from comment #127)
> Umm, ewong, I'm not sure this actually worked on comm-esr?
> https://treeherder.mozilla.org/#/jobs?repo=comm-esr45

ah right.  comm-esr45 needs to have a change pushed.  it's basically
pullling from a not-so-updated relbranch.
Status: RESOLVED → REOPENED
Flags: needinfo?(ewong)
Resolution: FIXED → ---
Basically what needs to be done are the following:

create a tag/relbranch on mozilla-esr45 that points to http://hg.mozilla.org/releases/mozilla-esr45/rev/508243893e0c,
then change client.py to point to that relbranch.

Or

Move the THUNDERBIRD_45_VERBRANCH to that cset.

Since this is release territory, I think I need to ask tb-drivers
and relman (at the least).

:rkothari, :rkent..  any advice as how to proceed?
Status: REOPENED → ASSIGNED
Flags: needinfo?(rkothari)
Flags: needinfo?(rkent)
I think what you are saying is that we need to do a merge from default of THUNDERBIRD_45_VERBRANCH which we would typically do prior to any release. I'll do that soon and see it it fixes the issue.
Flags: needinfo?(rkent)
OK I did the merge to default on comm-esr45 and triggered a build, and that is happening successfully. So I think that we can close this again.
Status: ASSIGNED → RESOLVED
Closed: 3 years ago3 years ago
Flags: needinfo?(rkothari)
Resolution: --- → FIXED
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.