Closed Bug 935997 Opened 6 years ago Closed 6 years ago

mach talos-test doesn't work, complains about mozinfo==0.4

Categories

(Testing :: Talos, defect)

x86_64
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Unassigned)

Attachments

(1 file)

We need to make mach run talos again. I suspect that, with mozbase modules being updated in talos, in the tree, and as mozharness dependencies, we have a version conflict somewhere.

We have mozinfo==0.7 available, but talos is pegged to 0.4.  We can update that in talos with no dependency chain, I am just not sure if that will fix the problem or not.
Hmm, I modified testing/talos/talos.json to point to a private repository (as I do successfully on try server), and it appears that we don't use it.

here is a link to mach_commands.py for talos: http://mxr.mozilla.org/mozilla-central/source/testing/talos/mach_commands.py

It clearly states in there to use talos.json and the path looks correct, but instead of downloading a custom talos repository, it installs the production talos tip:
    /home/jmaher/mozilla/inbound/obj-x86_64-unknown-linux-gnu/mozharness/venv/bin/pip install --pypi-url http://pypi.python.org/simple --download-cache /home/jmaher/mozilla/inbound/obj-x86_64-unknown-linux-gnu/mozharness/venv/cache http://hg.mozilla.org/build/talos/archive/tip.tar.gz


aki: do you know why that mozharness script would be using the production talos instead of the one specified in talos.json?
Flags: needinfo?(aki)
(In reply to Joel Maher (:jmaher) from comment #1)
> aki: do you know why that mozharness script would be using the production
> talos instead of the one specified in talos.json?

I ran into this just recently.  It happens because the mach command sets the 'python_webserver' option:
http://mxr.mozilla.org/mozilla-central/source/testing/talos/mach_commands.py#99

which causes mozharness to bail out of the clone_talos step:
http://hg.mozilla.org/build/mozharness/file/fbb800d7643e/mozharness/mozilla/testing/talos.py#l474

instead of running _populate_webroot which is where it would normally read the talos.json options:
http://hg.mozilla.org/build/mozharness/file/fbb800d7643e/mozharness/mozilla/testing/talos.py#l434
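For readers following along, the behaviour described in this comment can be sketched roughly as below. This is an illustration only, not the mozharness source: `clone_talos_step` and the `populate_webroot` callback are hypothetical stand-ins, and only the `python_webserver` option name and the role of talos.json come from the discussion above.

```python
def clone_talos_step(config, populate_webroot):
    """Hypothetical stand-in for the clone step discussed above."""
    if config.get("python_webserver"):
        # Early exit: the webroot is never populated, so the repository
        # named in talos.json is never read on this path.
        return "skipped"
    # Normal path: this is where the talos.json options would be consulted.
    populate_webroot(config)
    return "populated"


calls = []
# mach sets python_webserver, so the talos.json handling is bypassed.
mach_result = clone_talos_step({"python_webserver": True}, calls.append)
# Without the option, the populate step runs.
default_result = clone_talos_step({}, calls.append)
```

The point of the sketch is only that the option check sits in front of the talos.json handling, so editing talos.json has no effect while mach sets that option.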
Ok, it looks like we're not supporting the python_webserver option fully, which makes sense since we were pushing hard to get this live in production and skimped on some standalone stuff, namely talos.json.

What should the python_webserver setup look like?
Flags: needinfo?(aki)
From looking at the scripts, it appears that if we are using python webserver (--develop cli option to talos), then we probably don't have access to the internal server with the .zip files. All of that is set up at the same time.

For this specific bug, I am going to test mozinfo 0.7 with talos and see if that sticks. To solve the mozharness problem, we would need to split out the downloading/setup of talos vs downloading the .zip files (only used for tp5o/tp5n-xperf)
(In reply to Joel Maher (:jmaher) from comment #4)
> From looking at the scripts, it appears that if we are using python
> webserver (--develop cli option to talos), then we probably don't have
> access to the internal server with the .zip files. All of that is set up at
> the same time.
>
> For this specific bug, I am going to test mozinfo 0.7 with talos and see if
> that sticks. To solve the mozharness problem, we would need to split out the
> downloading/setup of talos vs downloading the .zip files (only used for
> tp5o/tp5n-xperf)

I'm pretty sure we only download the zip files if the suite we're running specifies we need them in talos.json, so I'm not sure we need to split that out.
I think all of _populate_webroot() assumes you're on a Talos production mini:
http://hg.mozilla.org/build/mozharness/file/fbb800d7643e/mozharness/mozilla/testing/talos.py#l407
which is why we make sure python_webserver isn't set here:
http://hg.mozilla.org/build/mozharness/file/fbb800d7643e/mozharness/mozilla/testing/talos.py#l474

If we want mach to run talos properly, I think we need to make the _populate_webroot() method standalone-friendly... with the correct paths, etc.
For instance, rmtree() might be the wrong approach if a developer has locally-changed, un-backed-up files.  And as I mentioned above, the default here is the production Talos mini directory paths, which probably won't work with python_webserver without changes.
We skimped on this, I think, because adding mach support was a lightly-tested afterthought; the main push was getting this into production.

Do you want to take a stab at this?  Or are you waiting for someone else to pick this up?
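One possible shape for a more standalone-friendly cleanup, as a minimal sketch: move the existing directory aside instead of calling rmtree() on it, so locally changed files survive. The function name, the `.bak-` naming scheme, and the idea of applying this to the webroot are all assumptions for illustration; nothing here is taken from mozharness.

```python
import os
import shutil
import time


def reset_dir_with_backup(path, suffix=None):
    """Recreate `path` empty, moving any existing copy aside first.

    Returns the backup path, or None if there was nothing to back up.
    """
    backup = None
    if os.path.isdir(path):
        # Hypothetical naming scheme: <path>.bak-<timestamp or given suffix>
        suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
        backup = "%s.bak-%s" % (path, suffix)
        if os.path.exists(backup):
            # Refuse to merge into an older backup of the same name.
            raise OSError("backup already exists: %s" % backup)
        shutil.move(path, backup)
    os.makedirs(path)
    return backup
```

A developer's edited files then end up in the backup directory rather than being deleted, at the cost of old backups piling up until someone removes them.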
Joel - any rough ETR on this? It would be very helpful in quantifying any performance differences when the mac binary is built on linux (bug 921040).
Flags: needinfo?(jmaher)
It isn't on my plate and won't be anytime soon. I don't think this will help us validate talos numbers; we would need to compare build A to build B on the same machine. mach talos-test will not work with a mac binary built on linux.
Flags: needinfo?(jmaher)
vikstrous is hitting this issue too.

This is not a perfect solution (it doesn't address the python_webserver issue), but maybe we can just explicitly specify the version of mozinstall that we use in production. That way we won't blow away talos's mozinfo dep (0.4) when we install the latest mozinstall's mozinfo dep (0.7).

vikstrous: can you try testing this for me? 

Can you change from the following http://mxr.mozilla.org/mozilla-central/source/testing/talos/mach_commands.py#129 ->
    mozharness_repo = 'https://hg.mozilla.org/build/mozharness'
    mozharness_rev = 'production'

to ->
    mozharness_repo = 'https://github.com/lundjordan/mozharness'
    mozharness_rev = 'mach-talos'

then run again as normal and post findings? If it works, I think we should throw the quick patch in mozharness. Eventually, I think we are going to break production if we are not explicit about module versions (since talos and mozinstall have conflicting deps)
Flags: needinfo?(vstanchev)
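The conflict described in the comment above (talos wanting mozinfo 0.4 while the latest mozinstall pulls in 0.7) can be illustrated with a small check for packages pinned to more than one version. This is a sketch under assumptions: `find_pin_conflicts` is a hypothetical helper, and the `name==version` strings are illustrative, since the real packages may declare their requirements differently.

```python
def find_pin_conflicts(requirement_sets):
    """Map each package pinned to more than one version to {version: [owners]}."""
    seen = {}
    for owner, pins in requirement_sets.items():
        for pin in pins:
            name, sep, version = pin.partition("==")
            if not sep:
                continue  # not an exact pin, so it cannot conflict here
            owners = seen.setdefault(name.strip().lower(), {})
            owners.setdefault(version.strip(), []).append(owner)
    return {name: vers for name, vers in seen.items() if len(vers) > 1}


# Illustrative data based on the versions mentioned in this bug.
conflicts = find_pin_conflicts({
    "talos": ["mozinfo==0.4"],
    "mozinstall (latest)": ["mozinfo==0.7"],
})

# Pinning to a (hypothetical) mozinstall release that also wants 0.4
# leaves nothing to report.
no_conflicts = find_pin_conflicts({
    "talos": ["mozinfo==0.4"],
    "mozinstall (pinned)": ["mozinfo==0.4"],
})
```

Either fix discussed in this bug (pinning mozinstall, or moving talos to mozinfo 0.7) amounts to making the two sets agree.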
Looks like it still tried to fetch the repo with hg and not git:

Running Talos test suite svgr
 0:00.06 /usr/bin/hg pull -r mach-talos -u
Process executed with non-0 exit code: [u'/usr/bin/hg', u'pull', u'-r', u'mach-talos', u'-u']
Flags: needinfo?(vstanchev) → needinfo?(jlund)
woops! sorry, /me should have realized mach would not figure out hg vs git :)

can you try again but with:
    mozharness_repo = 'http://hg.mozilla.org/users/jlund_mozilla.com/mozharness-mach-talos'
    mozharness_rev = 'default'
Flags: needinfo?(jlund)
working on a patch to get mozinfo 0.7 for talos, seems to be a few other hiccups that need to get worked out.
I am not 100% sure this will solve the problem, but it will get us on the same mozbase module versions!
Attachment #8382245 - Flags: review?(jlund)
Comment on attachment 8382245 [details] [diff] [review]
talos_machVersion.patch

cool,

I should note that Viktor had success when I explicitly set the mozinstall version in mozharness to 1.6. Version 1.6 requires mozinfo 0.4. Since we keep our venv outside of our clobber scope, we have been using ~1.6 in production.

But if 0.7 works for talos then that is a solution too. Plus it will allow us to use a higher mozinstall version in the future anyway.
Attachment #8382245 - Flags: review?(jlund) → review+
Cool. I still need mozlog 1.5 updated before I can land this. We will get there, and thanks for helping out with this.
https://hg.mozilla.org/build/talos/rev/e2dd770e2d4c

Running "mach talos-test chromez" on my inbound tree succeeds. We don't need to deploy this to production ASAP; I am fine waiting until the next rollout.

thanks for the help on this so far.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED