Closed Bug 787200 Opened 12 years ago Closed 9 years ago

Move the Talos code into mozilla-central

Category: Testing :: Talos, defect
Priority: Not set
Severity: normal
Status: RESOLVED FIXED
Target milestone: mozilla43
Tracking status: firefox43 --- fixed
People: Reporter: ehsan.akhgari; Assigned: parkouss
Attachments: 4 files, 1 obsolete file

CCing Joel since he knows what needs to be done here.  :-)
Historically it made great sense to leave talos outside of mozilla-central. We now run different versions of talos on different branches, as we change talos and the tests so frequently.

There might be some difficulty in getting the releng scripts retrofitted to pull talos out of tests.zip (or similar) instead of downloading from build.mozilla.com/talos.zip.

We also store some .zip files (ts_places profiles and tp5n.zip pages) outside of the repository (for legal and size reasons). Those would need to be documented better.

The last piece I can think of that might be problematic is that talos depends on external resources when we do a create_talos_zip. Most of these resources live in m-c already, but we would either have to add some in there, or require a virtualenv to run them (like we already do now; it would just be different from other harnesses).
OS: Mac OS X → All
Hardware: x86 → All
We pull Talos from build.mozilla.org based on the contents of https://mxr.mozilla.org/mozilla-central/source/testing/talos/talos.json. I agree, Talos is a completely separate thing. It's really up to ateam where it lives IMO.
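The lookup described above (finding which talos.zip a build uses from testing/talos/talos.json) can be sketched in a few lines. This is an illustration, not the real automation code; the exact key layout of talos.json is an assumption, with only the "url" field quoted later in this bug.

```python
import json

# Illustrative sketch: pull the talos.zip URL out of talos.json.
# The key layout is an assumption; we simply walk the structure
# looking for the first "url" entry.
def talos_zip_url(talos_json_text):
    data = json.loads(talos_json_text)

    def find_url(node):
        if isinstance(node, dict):
            if "url" in node:
                return node["url"]
            for value in node.values():
                found = find_url(value)
                if found is not None:
                    return found
        return None

    return find_url(data)

example = '{"talos.zip": {"url": "http://build.mozilla.org/talos/zips/talos.38e088867f7b.zip"}}'
print(talos_zip_url(example))  # prints the zip URL for that revision
```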
I'm fairly against this and would like to move more of our testing code and standalone tools out of mozilla central.  One frustration is that there is no clear guideline for what should be where:  there's a lot mirrored to mozilla-central, there's a lot that lives there, and there's a lot that isn't mirrored and doesn't live there.  ABICT, there is no clear policy about what and why.  While I feel that in practice modularity benefits from not having everything live in a giant tree, if we do want e.g. all testing related code to live in mozilla-central, then we should decide that and do it.  This affects mozbase amongst other things.

For talos specifically, we will want pyyaml in-tree. pywin32 will be very challenging, as witnessed by the fact that it has held up upgrading the windows talos slaves from Python 2.4 for over six months now.
(In reply to comment #3)
> I'm fairly against this and would like to move more of our testing code and
> standalone tools out of mozilla central.  One frustration is that there is no
> clear guideline for what should be where:  there's a lot mirrored to
> mozilla-central, there's a lot that lives there, and there's a lot that isn't
> mirrored and doesn't live there.  ABICT, there is no clear policy about what
> and why.  While I feel that in practice modularity benefits from not having
> everything live in a giant tree, if we do want e.g. all testing related code to
> live in mozilla-central, then we should decide that and do it.  This affects
> mozbase amongst other things.

From the perspective of Mozilla developers, most people treat the things that live outside of mozilla-central as either non-existent or second class.  Examples are mozmill and Jetpack tests.  It would be very beneficial for us to be able to easily see what's inside the Talos tests and improve/fix them, and also to be able to run Talos in our check-outs against our builds for local measurements.  All of this we can already do with the rest of our automated test suites (reftests/xpcshell/mochitest/etc.)
(In reply to Ehsan Akhgari [:ehsan] from comment #4)
> (In reply to comment #3)
> > I'm fairly against this and would like to move more of our testing code and
> > standalone tools out of mozilla central.  One frustration is that there is no
> > clear guideline for what should be where:  there's a lot mirrored to
> > mozilla-central, there's a lot that lives there, and there's a lot that isn't
> > mirrored and doesn't live there.  ABICT, there is no clear policy about what
> > and why.  While I feel that in practice modularity benefits from not having
> > everything live in a giant tree, if we do want e.g. all testing related code to
> > live in mozilla-central, then we should decide that and do it.  This affects
> > mozbase amongst other things.
> 
> From the perspective of Mozilla developers, most people treat the things
> that live outside of mozilla-central as either non-existent or second class.
> Examples are mozmill and Jetpack tests.  It would be very beneficial for us
> to be able to easily see what's inside the Talos tests and improve/fix them,
> and also to be able to run Talos in our check-outs against our builds for
> local measurements.  All of this we can already do with the rest of our
> automated test suites (reftests/xpcshell/mochitest/etc.)

b2g also lives outside mozilla-central. This isn't a scalable attitude.
what if we had a make target:
make talos-tsvg <- or other test name?

That could check out the code and run the tests against the build in the tree.  Just trying to look at all options before committing one way or another.
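A hypothetical sketch of what such a target might chain together, expressed as the commands it would run. The repo URL appears elsewhere in this bug; the virtualenv step and the talos command-line flags are assumptions, not the real build glue.

```python
# Hypothetical sketch of a `make talos-tsvg`-style target, expressed as
# the shell commands it would chain together.  Flag names on the talos
# invocation are illustrative only.
def talos_local_commands(test_name, objdir="objdir"):
    return [
        "hg clone http://hg.mozilla.org/build/talos/ talos-src",
        "virtualenv talos-env",
        "talos-env/bin/python talos-src/setup.py develop",
        # Assumed flags: point talos at the in-tree build and pick a test.
        "talos-env/bin/talos --executablePath %s/dist/bin/firefox "
        "--activeTests %s" % (objdir, test_name),
    ]

for command in talos_local_commands("tsvg"):
    print(command)
```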
I would not mind mirroring talos to m-c.  We already mirror several pieces to m-c, from NSPR to mozbase.  I would personally prefer to devote effort to unify how we mirror the different pieces.

If talos were in m-c, `make talos-tsvg` would require (presumably) some subset of buildbot-configs in order to get results comparable to production. We'd need to isolate what lived in buildbot-configs and move that to m-c.

We would also have to ship pywin32 with the mozilla-build, ABICT.
(In reply to comment #5)
> (In reply to Ehsan Akhgari [:ehsan] from comment #4)
> > (In reply to comment #3)
> > > I'm fairly against this and would like to move more of our testing code and
> > > standalone tools out of mozilla central.  One frustration is that there is no
> > > clear guideline for what should be where:  there's a lot mirrored to
> > > mozilla-central, there's a lot that lives there, and there's a lot that isn't
> > > mirrored and doesn't live there.  ABICT, there is no clear policy about what
> > > and why.  While I feel that in practice modularity benefits from not having
> > > everything live in a giant tree, if we do want e.g. all testing related code to
> > > live in mozilla-central, then we should decide that and do it.  This affects
> > > mozbase amongst other things.
> > 
> > From the perspective of Mozilla developers, most people treat the things
> > that live outside of mozilla-central as either non-existent or second class.
> > Examples are mozmill and Jetpack tests.  It would be very beneficial for us
> > to be able to easily see what's inside the Talos tests and improve/fix them,
> > and also to be able to run Talos in our check-outs against our builds for
> > local measurements.  All of this we can already do with the rest of our
> > automated test suites (reftests/xpcshell/mochitest/etc.)
> 
> b2g also lives outside mozilla-central. This isn't a scalable attitude.

Well, the Gecko specific bits used to live outside of mozilla-central, and the b2g team tried hard to merge them back in (and I think they succeeded, since they're doing their Gecko development on m-c.)
(In reply to comment #6)
> what if we had a make target:
> make talos-tsvg <- or other test name?
> 
> That could check out the code and run the tests against the build in the tree. 
> Just trying to look at all options before committing one way or another.

That would definitely be better than the current situation if we decide that we don't want to move Talos inside m-c. (In that case I suggest we use mach as opposed to a new make target.)
hg subrepos could also conceivably be used here: http://www.selenic.com/hg/help/subrepos
The problem we want to solve here isn't "it's hard to run Talos locally" but "it's hard to reproduce remote Talos numbers locally."  Of course running Talos locally is a necessary condition, but in my experience, it's not sufficient.

Your hardware does not match the testing machines'.  Your system is not configured the same as the testing machines.  Your toolchain does not match the testing machines'.  You do not have all the same daemons running on your machine as the testing machines.  So your results are likely not to match the testing machines'.

For big regressions, of course we'll likely be able to see them locally.  But big regressions aren't the problem, IME; our existing tools don't have much difficulty pointing out the offender in those cases.  Or at least, they wouldn't, if we actually ran Talos on every m-i push.

Separately, I thought that we could not redistribute the Talos files we run on our builders.  That's why the file lives at build.mozilla.com.

Anyway, a make target (or script or mach target or whatever) can't hurt, and I think that would make a lot more sense than checking 100 MB of test files (IIRC) into m-c. But I don't think it will help much, either.
We can run all tests from the talos repository except for tp5 and ts_places_* (the dirty tests).  Those tests require additional files which live on the build network.

The main motivation for this bug is to solve these questions:
* Where is the talos code, how are these tests run?
* I want to investigate a performance regression, how do I run talos?

While it is very true that we cannot reproduce locally the numbers that talos generates, I still think we can reproduce a regression locally by running two builds and comparing results. Running tests locally will allow somebody to see the impact code changes have on the numbers, even if they are off by a small factor from what we would see on tbpl.

The big gaping hole is how do you compare results when running them locally, or more specifically how do you translate the reported numbers into something that gets reported to graph server allowing you to see the difference your patch makes.
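For the simple local-comparison case, the arithmetic is just comparing means across two sets of runs. An illustrative sketch with made-up numbers (not talos code, and not how graph server computes anything):

```python
import statistics

# Illustrative only: comparing the same test run locally against an old
# and a new build.  The run values below are made-up milliseconds.
def percent_change(old_runs, new_runs):
    old = statistics.mean(old_runs)
    new = statistics.mean(new_runs)
    return (new - old) / old * 100.0

old_runs = [105.2, 103.8, 104.9, 106.1]
new_runs = [112.4, 111.0, 113.2, 112.8]
print("%.1f%% change" % percent_change(old_runs, new_runs))  # prints "7.0% change"
```

Even with the absolute numbers off from production hardware, the relative change between two local builds is the signal being discussed here.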

If the common scenario is to run talos tests locally to investigate a regression (known, or suspected based on local patches), then make targets could be useful. But in either case we will need to run against both old and new builds to ensure we can see the difference locally, which means that a make target alone isn't going to be enough.

To solve the releng dilemma, we can continue to generate talos.zip files and make our system work the same way it does no matter where the talos code lives.
> I still think we can reproduce a regression locally by running two builds and comparing 
> results.

Indeed, that's a logical conclusion.  But in my experience, this is much easier said than done.  I spent two months trying to understand why I couldn't reproduce a talos regression I observed on m-i using tryserver, and I eventually gave up (bug 653961).  Trying to do the same on your local machine will be ten times harder.

Rafael had a great post on dev.planning about how incredibly hard this can be in practice.  Unfortunately it's not on Google Groups yet, afaict.  But to quote the problems he encountered trying to reproduce a regression locally:

> * our builds are *really* hard to reproduce. The build I was downloading from
>   try was faster than the one I was doing locally. In despair I decided to fix
>   at least part of this first. I found that our build was depending on the way
>   the bots use ccache (they set CCACHE_BASEDIR which changes __FILE__), the
>   build directory (shows up on debug info that is not stripped), and the file
>   system being case sensitive or not.
> 
> * testing on linux showed even more bizarre cases where small changes cause
>   performance problems. In particular, adding a nop *after the last ret* in
>   a function would make the js interpreter faster on sunspider. The nop was just
>   enough to make the function size cross the next 16 bytes boundary and that
>   changed the address of every function linked after it.
> 
> * the histogram of some benchmarks don't look like a normal distribution
>   (https://plus.google.com/u/0/108996039294665965197/posts/8GyqMEZHHVR). I
>   still have to read the paper mentioned in the comments.
That raises the question of why should we run tests that give such noisy results in automation? If it's so hard to reproduce a regression locally, why do we think that the automated results are any better?
(In reply to Chris AtLee [:catlee] from comment #14)
> That raises the question of why should we run tests that give such noisy
> results in automation? If it's so hard to reproduce a regression locally,
> why do we think that the automated results are any better?

ISTM that Justin has not been lamenting the noisiness of the tests themselves, but other environmental factors.  Maybe "able to run Talos locally" shouldn't be the goal, but "able to run Talos locally with a similar build environment" should be the goal, of which the former is only a stepping stone to the latter.  That would require downloadable build tools, some way to automagically use them, etc. etc.

Of course, the tests may be noisy too!  But that seems like a separate concern from what's been discussed thus far.
I think part of the problem is, we have no way to distinguish between "small" and "large" regressions.  Any detectable downward change is considered a regression.

> If it's so hard to reproduce a regression locally, why do we think that the automated 
> results are any better?

In one sense, the problem is that some of our tests have /too little/ noise when run on automation.  This causes us to freak out about what are small but detectable changes in the result which we can't reproduce on other systems.

Put another way, part of the problem is that the noise introduced by different builds/systems/etc is larger than the noise in the tests when run on automation.
Performance tests will always have noise even if it is very small and almost unmeasurable.  Running on different hardware and OS configs is the biggest factor.  On my local desktop I have all types of things running which we do not run on the talos slaves, likewise the versions of talos and other libraries could be different.

I consider the noise in performance runs the equivalent of the random oranges we see in our unittests; it is a fact of life unless we are testing "hello world".

We have been working on reducing the noise in the tests and making sure the tests are useful.  If we start backing patches out due to performance regressions we need to give developers the ability to attempt to reproduce the problem locally.  Just like random oranges, this is not always possible.  There are good ways to run Talos right now, but it is out of the normal workflow for builds and unittests.  

The real question is: if we run it on tbpl, should it all be included in mozilla-central? Talos is the major exception to that rule.

Logistically we can make this work; it might take a while to get our automation using it from a new location. Let's focus on the bug at hand here and weigh in on the pros/cons of talos living in m-c. Right now it is pretty neutral.
> I consider the noise in performance runs the equivalent of the random oranges we see in 
> our unittests; it is a fact of life unless we are testing "hello world".

Yes, of course.

> If we start backing patches out due to performance regressions we need to give developers 
> the ability to attempt to reproduce the problem locally.

I don't think anyone disagrees with the idea of making it easier to run Talos locally.  The disagreement is only over the degree to which this may be futile.

Anyway, to the question at hand about whether Talos belongs in m-c: How large is the Talos zip file?
> I don't think anyone disagrees with the idea of making it easier to run Talos locally.  
> The disagreement is only over the degree to which this may be futile.

And the degree to which it's futile informs the priority of fixing this bug, because if running tests were always futile, we wouldn't care at all about making it easy to run Talos locally.

But if the priority of this bug isn't in question, then the futility isn't really relevant to this bug, I agree.
(In reply to comment #13)
> > I still think we can reproduce a regression locally by running two builds and comparing 
> > results.
> 
> Indeed, that's a logical conclusion.  But in my experience, this is much easier
> said than done.  I spent two months trying to understand why I couldn't
> reproduce a talos regression I observed on m-i using tryserver, and I
> eventually gave up (bug 653961).  Trying to do the same on your local machine
> will be ten times harder.

In the past there was a time when I watched dev.tree-management very closely and bugged people when they regressed stuff. They kept telling me that Talos regressions are very hard to reproduce locally. I started to wonder whether there was some truth to that in the average case, so I tried to reproduce the regressions and improvements locally (on the tests that did not require non-public files, obviously). And I realized that it is actually very easy to reproduce many Talos regressions locally. I don't have that data around any more, but I clearly remember the number being much larger than 50%.
Currently talos.zip is about 9MB in size.  

It sounds like we should revisit our dev.tree-management emailer program and poll developers on whether running talos from an external repository is too much of an out-of-band process. I know personally that when I have to commit to a github project it is a learning curve every time (about once every 6 weeks). I would rather remove as much of the learning curve as possible.
(In reply to comment #21)
> Currently talos.zip is about 9MB in size.  

That is not very big.  I land 20+MB patches without a blink!
(In reply to Joel Maher (:jmaher) from comment #21)
> Currently talos.zip is about 9MB in size.  
> 
> It sounds like we should adjust revisit our dev.tree-management emailer
> program and poll developers if running talos from an external repository is
> too much of an OOB process.  I know personally when I have to commit to a
> github project it is a learning curve every time (about once every 6 weeks).
> Would rather remove as much of the learning curve as possible.

To be fair, talos lives in hg, not in github: http://hg.mozilla.org/build/talos/
Focusing just on why I think we should move talos to m-c: I agree that the impact it would have on reproducing remote regressions locally would be small, but it should have other benefits:

* One place to look for information. I am trying to create a script to compare talos runs. This is something that should really be in m-c, as it is a basic developer tool; if the rest of talos were there, information could be shared.
* Easier to reproduce old results. If something changes in talos, that change is recorded in m-c. This is analogous to the move of mozconfigs to m-c which was an awesome improvement.
* Easier to test changes. Do you want to propose a new benchmark? Do you want to change one to remove some source of noise? All you need is a try run modifying talos. This is even true if all you want to do is check if a test is sensitive to measurement bias: you can change the talos script to create a dummy env var for example.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #24)
> * One place to look for information. I am trying to create a script to
> compare talos runs. This is something that should really be on m-c as it is
> a basic developer tool, if the rest of talus was there information could be
> shared.

I don't buy this as a scalable solution to the problem of "where is what I want"

> * Easier to reproduce old results. If something changes in talos, that
> change is recorded in m-c. This is analogous to the move of mozconfigs to
> m-c which was an awesome improvement.

We already have a .json that identifies what talos zip our "old builds" are based off of, you already conceded that reproducing this locally doesn't work.

> * Easier to test changes. Do you want to propose a new benchmark? Do you
> want to change one to remove some source of noise? All you need is a try run
> modifying talos. This is even true if all you want to do is check if a test
> is sensitive to measurement bias: you can change the talos script to create
> a dummy env var for example.

This alone won't make testing local talos changes easier. What does work *now* is creating a new talos.zip uploading it somewhere our infra can reach, and then running a try run with the json changed.

To make our infra use something from in tree is a much different/bigger problem, and would need to change whether we did that from in talos or in m-c as far as "where does the code live".
> I don't buy this as a scalable solution to the problem of "where is what I
> want"

I have seen trees *way* larger than m-c, and it scales really well.

> We already have a .json that identifies what talos zip our "old builds" are
> based off of, you already conceded that reproducing this locally doesn't
> work.

And remotely? Is the zip on m-c? Is the zip named with a hash of its contents?

> This alone won't make testing local talos changes easier.

Again, I am not discussing that. I am discussing why I think we should move talos to m-c, and as I said on the first paragraph, that was not one of the reasons.

> What does work
> *now* is creating a new talos.zip uploading it somewhere our infra can
> reach, and then running a try run with the json changed.
>

Which is *way* worse than testing other changes in firefox.

> To make our infra use something from in tree is a much different/bigger
> problem, and would need to change whether we did that from in talos or in
> m-c as far as "where does the code live".

If we are going to take arguments on this line we would never make infrastructure changes. I was not here at the time, but I am sure it was hell to change cvs to hg. More recently we have also changed how we do some things to make our lives easier:

* mozconfigs are now in tree
* compilers (for now b2g and os x) can be fetched from a manifest in m-c.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #26)
> > I don't buy this as a scalable solution to the problem of "where is what I
> > want"
> 
> I have seen trees *way* larger than m-c, and it scales really well.

I have seen trees way *smaller* than m-c and (depending on context) I find this strategy doesn't scale well at all.  Namely, while everyone agrees that modularity is important, and most people agree that it is good to have software that (not to be taken too literally) "does one thing and one thing well", my experience with "put all the things in the tree" is that software that should be modular components end up getting (usually needlessly) intertwined with other software.  Meaningful dependencies are not maintained. The giant tree is used as a replacement for a real deployment strategy.  Our tree is a great example of this. I'm not going to dig up examples right now, but if you look at our existing testing software and try to decipher what depends on what (and even better, the location of what in the tree) and can come back and say that that is *good* practice....well, I'd be somewhat shocked.

While I also think several aspects of putting talos in the tree would be problematic (e.g. pywin32) and (very) time-consuming (e.g. changing all the build infrastructure to use the in-tree builds), my foremost objection is that programmers -- even very talented programmers -- seem to be unable to keep things modular when you have a giant soup of all the things.

> > We already have a .json that identifies what talos zip our "old builds" are
> > based off of, you already conceded that reproducing this locally doesn't
> > work.
> 
> And remotely? Is the zip on m-c? Is the zip named with a hash of its
> contents?

Yes, in fact it is named with the changeset hash of its contents: https://hg.mozilla.org/mozilla-central/file/e3e7f8f7796d/testing/talos/talos.json

 "url": "http://build.mozilla.org/talos/zips/talos.38e088867f7b.zip"

So if you wanted to run the exact version 38e088867f7b of talos, that is perfectly possible (and scriptable). However, for most cases on m-c (not necessarily aurora or beta), it's probably reasonable to check out the tip of talos from http://hg.mozilla.org/build/talos/
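Extracting that revision from the zip URL is trivially scriptable. A minimal sketch, assuming the zip names always follow the `talos.<hg changeset>.zip` pattern shown in the URL above:

```python
import re

# Sketch based on the URL format quoted above; assumes zip names always
# follow the talos.<hg changeset>.zip pattern.
def talos_revision_from_url(url):
    match = re.search(r"talos\.([0-9a-f]+)\.zip$", url)
    return match.group(1) if match else None

url = "http://build.mozilla.org/talos/zips/talos.38e088867f7b.zip"
# The result is the changeset to check out from http://hg.mozilla.org/build/talos/
print(talos_revision_from_url(url))  # prints "38e088867f7b"
```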

> > What does work
> > *now* is creating a new talos.zip uploading it somewhere our infra can
> > reach, and then running a try run with the json changed.
> >
> 
> Which is *way* worse than testing other changes in firefox.

Interestingly, I've got this down to a script I can invoke in one line: http://k0s.org/mozilla/talos/update_talos.py:

~/mozilla/talos/update_talos.py url/or/file/path/to/test.diff

While this is tailored to my machine (mostly due to lack of interest in anyone else using it), I'd certainly be happy to check this into m-c and/or improve it if anyone was interested in using this script.
 
> > To make our infra use something from in tree is a much different/bigger
> > problem, and would need to change whether we did that from in talos or in
> > m-c as far as "where does the code live".
> 
> If we are going to take arguments on this line we would never make
> infrastructure changes. I was not here at the time, but I am sure it was
> hell to change cvs to hg. More recently we have also changed how we do some
> things to make our lives easier:
> 
> * mozconfigs are now in tree
> * compilers (for now b2g and os x) can be fetched from a manifest in m-c.

But you don't store the compilers themselves in the tree? Why not?

Without broadening the scope too much, I'd like to know criteria for:
- what can/should live in the tree
- what can/should be mirrored to the tree
- what can/should not live in the tree

Obviously there are very strong opinions in all directions here.  I'm fairly pro-mirroring/subrepos and fairly against having a giant monolithic repo of all the things.  That said, if we actually had a strategy here, instead of variations on this theme coming up every three months and having this argument, I would at least be somewhat happier even if I disagreed with the outcome.  Now, as best I can tell we have no real strategy and what our strategy is seems to change from project to project, test framework to test framework, etc. If there is a pattern, I can neither discern it nor have I seen it documented.

As far as technical concerns, I haven't heard anyone come up with a strategy about what to do with the pywin32 dependency (outside of the fact we don't want it anyway) or any cost-estimate on how much releng manhours it would take to revamp all the infrastructure (as a talos developer, outside of concern for trying to preserve sanity in Mozilla's infrastructure, I shouldn't really care; I just have to check out another repo.  If I was with releng...I would probably feel much more frustrated). And there's still the matter of the pagesets which can't legally live in tree.

If I saw that developer convenience to run talos tests outweighed these concerns, I would be more sympathetic. I've personally done a lot in the last year to change Talos from a system that was very hard to install, required an apache setup, and otherwise required production configuration, into a real piece of software that can be installed in the usual python way. I would like to continue to make this easier. I don't think it is a huge amount to ask to check out talos in a virtualenv, run setup.py develop, and run the tests (and hopefully complain to me if something goes wrong). And as I said, I am perfectly open to mirroring or subrepositories as solutions if `make tsvg` is worth the high but unestimated number of manhours it would take to get our infrastructure to use in-tree talos.
>But you don't store the compilers themselves in the tree? Why not?
>

hg sucks.
OK, there seems to be a huge disconnect between the two sides of the argument here.  Let's see if I can make things a bit clearer.

(In reply to Jeff Hammel [:jhammel] from comment #27)
> I have seen trees way *smaller* than m-c and (depending on context) I find
> this strategy doesn't scale well at all.  Namely, while everyone agrees that
> modularity is important, and most people agree that it is good to have
> software that (not to be taken too literally) "does one thing and one thing
> well", my experience with "put all the things in the tree" is that software
> that should be modular components end up getting (usually needlessly)
> intertwined with other software.  Meaningful dependencies are not
> maintained. The giant tree is used as a replacement for a real deployment
> strategy.  Our tree is a great example of this. I'm not going to dig up
> examples right now, but if you look at our existing testing software and try
> to decipher what depends on what (and even better, the location of what in
> the tree) and can come back and say that that is *good* practice....well,
> I'd be somewhat shocked.

You and Ben have said that this is not scalable, and before reading this I thought that you're talking about things like the size of the repository, etc.  As far as those concerns go, I refer you to the fact that Talos both in terms of the number of changesets (518) and the on-disk size is tiny compared to m-c, so it will not change the performance characteristics of hg handling m-c in any meaningful way.  And while I appreciate discussions about the future directions on how scalable merging certain things into m-c is, I don't think that this is the right forum for that discussion, and it is definitely out of the scope of the current discussion (especially since nobody has a list of the "things" that people may want integrated into m-c in the future.)  Coming up with a strategy on whether a thing needs to live in m-c or not is something best done based on the merits of doing so on a case by case basis, not by blanket strategies which are bound to miss important points about individual cases.

As far as the modularity argument goes, I really fail to see why the location of the source code has anything to do with this.  This is something which should be enforced by the module owner and reviewers.  The reason that many interdependencies exist in many parts of m-c today could well be attributed to history and pragmatic concerns in some cases "having _something_ which works tomorrow, rather than having something ideal six months from now."  Now it can be argued that those trade-offs have been wrong in the past, but that doesn't really have anything to do with the location of the source code.

> While I also think several aspects of putting talos in the tree would be
> problematic (e.g. pywin32) and (very) time-consuming (e.g. changing all the
> build infrastructure to use the in-tree builds), my foremost objection is
> that programmers -- even very talented programmers -- seem to be unable to
> keep things modular when you have a giant soup of all the things.

And that is why we have review requirements.  Note that I have never proposed any changes in the module ownership or review requirements for Talos.

> > > To make our infra use something from in tree is a much different/bigger
> > > problem, and would need to change whether we did that from in talos or in
> > > m-c as far as "where does the code live".
> > 
> > If we are going to take arguments on this line we would never make
> > infrastructure changes. I was not here at the time, but I am sure it was
> > hell to change cvs to hg. More recently we have also changed how we do some
> > things to make our lives easier:
> > 
> > * mozconfigs are now in tree
> > * compilers (for now b2g and os x) can be fetched from a manifest in m-c.
> 
> But you don't store the compilers themselves in the tree? Why not?

Because people are not expected to understand the internals of the source code for compilers, nor to modify them. You can't use the fact that we don't put the source code for our compilers in the tree as a premise to conclude that we should do the same for Talos.

> Without broadening the scope too much, I'd like to know criteria for:
> - what can/should live in the tree
> - what can/should be mirrored to the tree
> - what can/should not live in the tree

As I said above, this is scope-creep.  We *don't* need to have an answer for these points in order to decide where Talos should live.  And it is very possible in my experience that we as Mozilla will _never_ have a blanket answer for those (which I would support, as I think these forms of abstract questions are impossible to answer well in practice.)

> Obviously there are very strong opinions in all directions here.  I'm fairly
> pro-mirroring/subrepos and fairly against having a giant monolithic repo of
> all the things.

Hmm, why would you prefer m-c/talos to be a subrepo as opposed to a normal subdirectory of m-c?

>  That said, if we actually had a strategy here, instead of
> variations on this theme coming up every three months and having this
> argument, I would at least be somewhat happier even if I disagreed with the
> outcome.  Now, as best I can tell we have no real strategy and what our
> strategy is seems to change from project to project, test framework to test
> framework, etc. If there is a pattern, I can neither discern it nor have I
> seen it documented.

No, you're right.  I'm not aware of any global strategy on this either.  Which is why we're having this discussion.  :-)

> As far as technical concerns, I haven't heard anyone come up with a strategy
> about what to do with the pywin32 dependency (outside of the fact we don't
> want it anyway) or any cost-estimate on how much releng manhours it would
> take to revamp all the infrastructure (as a talos developer, outside of
> concern for trying to preserve sanity in Mozilla's infrastructure, I
> shouldn't really care; I just have to check out another repo.  If I was with
> releng...I would probably feel much more frustrated). And there's still the
> matter of the pagesets which can't legally live in tree.

I don't know what the pywin32 dependency issue is, since you have only mentioned that it exists.  I don't also know what the RelEng side of work will look like, but in a small unofficial chat that I had with a few of the RelEng folks the other day at the office, they seemed to indicate that they just use the zip file on the build server, so they don't really care where the code lives.

Given that the above is actually true (and it would be great if someone from RelEng can confirm that please), and that we can find a solution to the pywin32 dependency problem, would you still object to moving the source code?

> If I saw that developer convenience to run talos tests outweighed these
> concerns, I would be more empathetic.  I've personally done a lot in the
> last year from changing Talos from a system that was very hard to install,
> required an apache setup, and otherwise required production configuration,
> into a real piece of software that can be installed in the usual python way.
> I would like to continue to make this easier.  I don't think it is a huge
> amount to ask to checkout talos in a virtualenv, run setup.py develop, and
> run the tests (and hopefully complain to me if something goes wrong).  And
> as said I am perfectly open to mirroring or subrepositories as solutions if
> `make tsvg` is worth the high but unestimated number of manhours to get our
> infrastructure to use in-tree talos

The amount of work that has gone into Talos is indeed astonishing, and I can attest to that as someone who has personally gone through the pain of running the initial versions of "Standalone Talos"... something which IIRC took me a few days to accomplish back in those days!  :-)

But speaking as a developer, it is not out of laziness that I request that the Talos code be moved to m-c.  Here are the reasons why I think it would be a useful change:

1. More visibility to all of the developers working on Core and Firefox, as opposed to only the people who work specifically on Talos.
2. Easier tracking of changes alongside m-c changes.  For example, if I want to know how the changes in Talos and the layout module have played together over the past month, I need to be able to issue a command like |hg log layout/ talos/|.  Knowing the snapshot-level history of Talos changes in the talos.json file is definitely better than not detecting any dependency at all, but it is not enough, since it hides the history of changes, which is probably more important than major diffs between snapshots.  The only alternative today is to log both repositories in two terminal windows, and hack your way around by matching change dates in your head and doing guesswork.
3. Making it easier for more developers to run Talos locally by adding build system support for running those tests.  Note that we do have historical data on how well keeping test suites outside of mozilla-central works.  We have had long debates about merging in the jetpack and mozmill test suites, and the teams working on those projects have resisted for various reasons.  The current state of those tests is that people hide them on TBPL, and when somebody lands something that breaks them, the usual answer from developers is shrugging, and pointing out that it takes too much effort to figure out where to get the test suite from and how to run it; the fact that this would be the first time they have ever looked at that test suite does not help either.  You may dislike this reaction, and I can sympathize, but this is the fact of the matter as history has taught us.  And those test suites are not taken seriously by developers to this day, which is sad.
(In reply to Ehsan Akhgari [:ehsan] from comment #29)
> OK, there seems to be a huge disconnect between the two sides of the
> argument here.  Let's see if I can make things a bit clearer.

Responses selectively inline.

> (In reply to Jeff Hammel [:jhammel] from comment #27)
> > I have seen trees way *smaller* than m-c and (depending on context) I find
> > this strategy doesn't scale well at all.

<snip/>

> As far as the modularity argument goes, I really fail to see why the
> location of the source code has anything to do with this.  This is something
> which should be enforced by the module owner and reviewers.  The reason that
> many interdependencies exist in many parts of m-c today could well be
> attributed to history and pragmatic concerns in some cases "having
> _something_ which works tomorrow, rather than having something ideal six
> months from now."  Now it can be argued that those trade-offs have been
> wrong in the past, but that doesn't really have anything to do with the
> location of the source code.

In theory, I mostly agree.  In practice, both in mozilla-central and
in the other large (though not m-c-large) monoliths I have worked
with, I find that this gives developers perverse incentives to take
short-cuts rather than figure out more rigorous ways of resolving
dependencies.  In general, I think "put XXX in YYY's monolithic
repository" is a way to work around figuring out how software YYY can
depend on software XXX, not a way of solving the problem.  In the
interest of not bikeshedding or creeping scope, I will leave this
here, but am open to discussing it in less off-topic forums.

> > While I also think several aspects of putting talos in the tree would be
> > problematic (e.g. pywin32) and (very) time-consuming (e.g. changing all the
> > build infrastructure to use the in-tree builds), my foremost objection is
> > that programmers -- even very talented programmers -- seem to be unable to
> > keep things modular when you have a giant soup of all the things.
>
> And that is why we have review requirements.  Note that I have never
> proposed any changes in the module ownership or review requirements for
> Talos.

I would be much more inclined to agree if the state of python
packaging and interdependencies in mozilla-central were better than
it is.  Seeing these problems persist without prioritization does not
convince me that the review process is enough to catch this.

> > > > To make our infra use something from in tree is a much different/bigger
> > > > problem, and would need to change whether we did that from in talos or in
> > > > m-c as far as "where does the code live".
> > > 
> > > If we are going to take arguments on this line we would never make
> > > infrastructure changes. I was not here at the time, but I am sure it was
> > > hell to change cvs to hg. More recently we have also changed how we do some
> > > things to make our lives easier:
> > > 
> > > * mozconfigs are now in tree
> > > * compilers (for now b2g and os x) can be fetch from a manifest in m-c.
> > 
> > But you don't store the compilers themselves in the tree? Why not?
> 
> Because people are not expected to understand the internals of the source
> code for compilers, and to modify them.  You can't use the fact that we
> don't put the source code for our compilers in the tree as a precursor to
> conclude that we should do the same for Talos.
>
> > Without broadening the scope too much, I'd like to know criteria for:
> > - what can/should live in the tree
> > - what can/should be mirrored to the tree
> > - what can/should not live in the tree
>
> As I said above, this is scope-creep.  We *don't* need to have an answer for
> these points in order to decide where Talos should live.  And it is very
> possible in my experience that we as Mozilla will _never_ have a blanket
> answer for those (which I would support, as I think these forms of abstract
> questions are impossible to answer well in practice.)

What I don't like to see is not only this logic being applied
inconsistently (through inference), but also us coming up with N
different solutions to the same problem.  Not only do we mirror a lot
of stuff, we mirror it differently.  Or we copy and paste code.  I
also don't believe that one blanket strategy "solves everything", but
from my POV we don't seem to have a strategy on this question at all.

> > Obviously there are very strong opinions in all directions here.  I'm fairly
> > pro-mirroring/subrepos and fairly against having a giant monolithic repo of
> > all the things.
>
> Hmm, why would you prefer m-c/talos to be a subrepo as opposed to a normal
> subdirectory of m-c?

If talos is a subrepository or is mirrored to m-c, development
continues to take place in hg.m.o/build/talos.

Another reason I tend against the "move all the things to m-c"
strategy is that it discourages collaboration and makes reuse harder.
From the point of view of "Talos as a standalone piece of software
(for performance testing of Firefox)", anyone who wants to
collaborate on the project can work on Talos as a standalone tool.
Personally, when I have been interested in adding functionality to
standalone tools for other OSS projects, if the first step is "check
out a large repository", I lose interest.  While the feeling is not
universal, I have often heard from others who feel the same.

For reuse, I currently have one active project and several past
projects that have consumed Talos code as a library.  While releasing
Talos to pypi (something we don't do, and I don't know how :jmaher
feels about it) would make this at least possible, it is (effectively)
impossible to use "Talos the python package" as a (normal) dependency
in downstream code.

Personally, I would like to see Mozilla aim for more modularity and
more of "Mozilla as a platform", not less.  But that again is a larger topic.

<snip/>

> >  That said, if we actually had a strategy here, instead of
> > variations on this theme coming up every three months and having this
> > argument, I would at least be somewhat happier even if I disagreed with the
> > outcome.  Now, as best I can tell we have no real strategy and what our
> > strategy is seems to change from project to project, test framework to test
> > framework, etc. If there is a pattern, I can neither discern it nor have I
> > seen it documented.
> 
> No, you're right.  I'm not aware of any global strategy on this either. 
> Which is why we're having this discussion.  :-)
> 
> > As far as technical concerns, I haven't heard anyone come up with a strategy
> > about what to do with the pywin32 dependency (outside of the fact we don't
> > want it anyway) or any cost-estimate on how much releng manhours it would
> > take to revamp all the infrastructure (as a talos developer, outside of
> > concern for trying to preserve sanity in Mozilla's infrastructure, I
> > shouldn't really care; I just have to check out another repo.  If I was with
> > releng...I would probably feel much more frustrated). And there's still the
> > matter of the pagesets which can't legally live in tree.
> 
> I don't know what the pywin32 dependency issue is, since you have only
> mentioned that it exists.  I don't also know what the RelEng side of work
> will look like, but in a small unofficial chat that I had with a few of the
> RelEng folks the other day at the office, they seemed to indicate that they
> just use the zip file on the build server, so they don't really care where
> the code lives.
> 
> Given that the above is actually true (and it would be great if someone from
> RelEng can confirm that please), and that we can find a solution to the
> pywin32 dependency problem, would you still object to moving the source code?

Yes, again mostly for reasons of modularity.  But I'm not the owner of Talos.

> > If I saw that developer convenience to run talos tests outweighed these
> > concerns, I would be more empathetic.  I've personally done a lot in the
> > last year from changing Talos from a system that was very hard to install,
> > required an apache setup, and otherwise required production configuration,
> > into a real piece of software that can be installed in the usual python way.
> > I would like to continue to make this easier.  I don't think it is a huge
> > amount to ask to checkout talos in a virtualenv, run setup.py develop, and
> > run the tests (and hopefully complain to me if something goes wrong).  And
> > as said I am perfectly open to mirroring or subrepositories as solutions if
> > `make tsvg` is worth the high but unestimated number of manhours to get our
> > infrastructure to use in-tree talos
>
> The amount of work that has gone through Talos is indeed astonishing, and I
> can attest to that as someone who has personally gone through the pain of
> running the initial versions of "Standalone Talos"... something which IIRC
> took me a few days to accomplish back in those days!  :-)

> But speaking as a developer, it is not a matter of laziness that I request
> for the Talos code to be moved to m-c.  Here are the reasons why I think it
> would be a useful change:

> 1. More visibility to all of the developers working on Core and Firefox, as
> opposed to only the people who work specifically on Talos.

I would tend to solve this problem through improved communication and tools.

> 2. Easier tracking of changes alongside with m-c changes.  For example, if I
> want to know how the changes in Talos and the layout module have played
> together over the past month, I need to be able to issue a command like |hg
> log layout/ talos/|.  Knowing the snapshot level history of Talos changes in
> the talos.json file is definitely better than not detecting any dependency
> at all, but is not enough at all, since it hides the history of changes,
> which is probably more important than major diffs of snapshots.  The current
> alternative today is to log both repositories in two terminal windows, and
> hack your way around by matching change dates in your head and doing
> guesswork.

Again, I would tend towards tools to help with this problem.  FWIW, I
do believe tests should live in mozilla-central, and would encourage
that.  Test harnesses... not so much.

> 3. Making it easier for more developers to run Talos locally by adding build
> system support for making it easier for developers to run those tests.  Note
> that we do have historical data on how well keeping test suites outside of
> mozilla-central works.  We have had long debates about merging in the
> jetpack and mozmill test suites, and the teams working on those projects
> have resisted for various reasons.  The current state of those tests is that
> people hide them on TBPL, and when somebody lands something that breaks
> them, the usual answer from the developers is shrugging, and pointing out
> that it takes them too much effort to figure out where to get the test suite
> from and how to run it, and the fact that this would be the first time that
> they've ever looked at that test suite does not help either.  You may
> dislike this reaction, and I can sympathize, but this is the fact of the
> matter as history has taught us.  And those test suites are not taken
> seriously by developers to this day, which is sad.

I think it will be hard to take Talos data seriously until we have
more rigorous statistics in place and make the results self-evident
to developers.  While it may be easy to figure out from e.g. the
datazilla json format whether a changeset has caused a significant
performance regression, we do not currently have easy (and rigorous)
ways of analyzing this.

I'm going to de-CC myself from this bug as I fear I am derailing the
discussion, which, believe it or not, is not my intent, and because
most of what I have to say has to do with the direction Mozilla is
going as a whole with respect to "Mozilla as a platform".  I am happy
to discuss all of these issues in (possibly too great) depth in other
channels.
I think the two sides of the argument have mostly said what they have to say, I personally don't have a lot to add to this discussion.

According to https://wiki.mozilla.org/Modules/Core#Testing_Infrastructure, Clint is the owner of Talos.  I'm assigning this bug to him since I believe he is the one who needs to make the call here.  :-)
Assignee: nobody → ctalbert
(In reply to Ehsan Akhgari [:ehsan] from comment #29)
> 3. Making is easier for more developers to run Talos locally by adding build
> system support for making it easier for developers to run those tests.

I'm not sure this actually requires having talos in m-c; if we had a

./mach talos dromaeojs

(say), that would transparently pull the right files from wherever they live, I think that would be sufficient for most developers.
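A `mach` front-end like this would mostly be a matter of reading the talos.json snapshot already in the tree and fetching the matching Talos bits. A minimal sketch of that resolution step, assuming a talos.json schema with `talos_repo`/`talos_revision` properties under a `global` key (the exact layout is an assumption, not the real file):

```python
import json

def resolve_talos_source(talos_json_text):
    """Return (repo, revision) describing the Talos checkout a
    hypothetical `mach talos` command would fetch and run."""
    manifest = json.loads(talos_json_text)
    # Field names mirror the talos_repo/talos_revision properties
    # discussed in this bug; the nesting under "global" is assumed.
    repo = manifest["global"]["talos_repo"]
    rev = manifest["global"]["talos_revision"]
    return repo, rev

example = '''
{
  "global": {
    "talos_repo": "https://hg.mozilla.org/build/talos",
    "talos_revision": "abcdef123456"
  }
}
'''

repo, rev = resolve_talos_source(example)
print(repo, rev)
```

The point of the indirection is exactly what jlebar describes: developers type one command, and the snapshot pinned in the tree decides which Talos they get.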
After looking over this thread yet again and thinking about the problem we are trying to solve, I don't see any value in putting talos into m-c.  While ctalbert is the owner of talos, :jhammel and myself have been the two people driving the refactoring and expansion of talos with the help of a lot of community members and a few folks from inside of Mozilla.

Talos is fairly easy to run if you follow the instructions.  We can figure out how to make it run from 'make talos dromaeojs' which will solve the core problem here.  Hacking on talos and the test cases is not the common use case and in the rare case that is needed you will have the source code available from where you are running it locally.
I'm sorry for the late reply. I forgot to hit submit.

(In reply to Ehsan Akhgari [:ehsan] from comment #29)
> Given that the above is actually true (and it would be great if someone from
> RelEng can confirm that please), and that we can find a solution to the
> pywin32 dependency problem, would you still object to moving the source code?
> 

Yes, we download a talos.zip that the a-team gives us to upload to build.mozilla.org.
Where the source lives does not affect us.
We're hoping to have
(broken comment - probably that is why I did not hit submit)
I don't know what we were hoping for :P if I remember I will reply.
(In reply to comment #33)
> After looking over this thread yet again and thinking about the problem we are
> trying to solve, I don't see any value in putting talos into m-c.  While
> ctalbert is the owner of talos, :jhammel and myself have been the two people
> driving the refactoring and expansion of talos with the help of a lot of
> community members and a few folks from inside of Mozilla.
> 
> Talos is fairly easy to run if you follow the instructions.  We can figure out
> how to make it run from 'make talos dromaeojs' which will solve the core
> problem here.  Hacking on talos and the test cases is not the common use case
> and in the rare case that is needed you will have the source code available
> from where you are running it locally.

At the risk of repeating myself yet again, the ability of looking at the history of Talos intertwined with the history of m-c is also valuable.
talos is run as a released version from a point in time.  Not every checkin we do gets deployed.  So looking at what changes are in talos for a given time range does not tell you what was actually being run on tbpl.  

Remember, buildbot uses talos.zip as a released version of talos, downloaded from a private link on the build.mozilla.org network.

On the point of talos being the cause of regressions: probably only 1% of all talos regressions have been caused by the harness/toolchain.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
At the risk of starting another war, I filed bug 789266 on making it easier to run Talos locally.
Sorry I'm late to the party. I've had an even worse fire to fight this week.

There are lots of good reasons here from both sides. The way we run Talos has nothing to do with what is in any tree. Talos is a released software project. We stage and QA version x as a release, we upload it to a place the build infra can get to it, and then that release is used as the bits that run the tests on the changesets being tested.

Because of that ^ reason alone, I *don't* want to put Talos in m-c. I think that having the Talos code in m-c will spawn even more confusion: a new developer comes along, sees the code in m-c, modifies it, and then nothing happens in the automation. That would only serve to make them more confused.  

And because Talos is so critical and even small changes can have an impact on the noise (in the current version), I don't want every single commit in m-c to be treated as a "new version" of Talos. Every version of Talos is run through many runs on staging servers before it is ever deployed. And I don't want to switch to a system where we don't stage our changes to Talos before we deploy them.

What I *do* think we should do, which is exactly what jlebar filed the next bug for is make this process easier for developers to run Talos locally and on try.  Testing on Try is a critical piece to how we develop Talos (because then we can see the numbers, trust they run the same environments etc), but the method we use to do that is not your standard method because we don't want the things we push to try to be picked up by the tests that are happening on inbound etc.

So, I agree with Joel here that the value vs. the trouble of putting Talos in m-c isn't worth it.  What I think *is* worth it, and *is* long overdue is a developer focused design on how to run talos both locally on your machine and on try. There is no contention around making this system easier to use. Jeff and Joel have been working on that for months as time allows. And the entire thrust behind the Datazilla/Signal From Noise project has been to make Talos easier to use and less noisy.

I'm hoping we can use the new mach utility in both cases to give developers a simple interface for working with Talos.  Let's follow up on bug 789266 and ensure that our changes to make Talos seamlessly easy to use are amenable to your day to day work flows.
Based on experience and various advancements since this was last discussed, I believe the going consensus is now that we do, in fact, want to do this. The only question is when, as many things need to change. Probably we want to migrate Android away from Talos first (this is already planned), as it would be the source of additional complexity in tackling this. In any case, reopening to reflect the current status.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
See Also: → 1139487
this is a good idea, and the current thought is to put talos in-tree and backport changes needed for android to the existing repo as needed.

we need to sort out:
page_load_test/
  - dromaeo/*
  - kraken/*
  - v8_7/*
  - canvasmark/*
  - webgl/*
startup_test/media/tools

in addition we should remove:
* talos/mobile_profile
* talos/places_generated_max
* talos/places_generated_med
* talos/specialpowers
* talos/startup_test/fennecmark
* talos/remotePerfConfigurator.py
* talos/test.py (test definitions for android)
* talos/PerfConfigurator.py (remote specific code)
* talos/ (references to mozdevice)
* requirements.txt (remove mozdevice, cache_flusher)
* create_talos_zip.py (whole file)
* talos/ffprocess_remote.py (might need other cleanup here)
* talos/breakpad/* (need to figure out if we need it, or can use in tree)

other things to sort out:
* how to run talos from './mach talos' using in-tree
* updating mozharness to get talos from in-tree (maybe the new archiver bits)
* do we need to include in a tests.zip style format?
Thanks Joel for sorting this out! A few comments:

(In reply to Joel Maher (:jmaher) from comment #41)
> 
> other things to sort out:
> * how to run talos from './mach talos' using in-tree
> * updating mozharness to get talos from in-tree (maybe the new archiver bits)

I suspect that will be done at the same time, since we have to tell mozharness to use the in-tree code for desktop, and './mach talos' uses mozharness.

> * do we need to include in a tests.zip style format?

If this was android specific, I'd vote no, so we don't maintain stuff we have no use for yet (maybe we'll never need it).


Also, I'm thinking about starting cleanup (removing android support) on another cloned repo - I can use my bitbucket or github account for that. So we could do the initial merge in tree based on something cleaner.
I wonder if we could start using a new 'no-android' talos branch already?

I thought the android versions of the harness were launched in another way (using the talos.zip property in talos.json, and run by another harness script - maybe testing/mozharness/scripts/android_panda_talos.py). So they don't use the talos_repo and talos_revision properties.

In this case even without in-tree we can do a no-android branch, and use that in talos.json for the talos_repo property.

Am I missing something?
a couple things:
* create_talos_zip.py - this requires a bit of hacks to run from in tree
* if we move in tree, then our way of running will change- possibly slight changes to talos
* android code makes talos messy- it would be nicer to have it somewhat clean when we start out

but you bring up a good point: if we can somehow continue to use the existing panda scripts to run android talos from the .zip like we currently do, then we don't have to worry about running from in-tree vs external-repo vs .zip.  I would like to double check create_talos_zip.py; if it doesn't require many (or any) changes, getting this in tree would be much more realistic.

Still, many of the files we have outlined above can be removed, it would be great to remove what we can.
(In reply to Joel Maher (:jmaher) from comment #44)
> but you bring up a good point, if we can somehow continue to use the
> existing panda scripts to run android talos from the .zip like we currently
> do, then we don't have to worry about running from in-tree vs external-repo
> vs .zip.  I would like to double check create_talos_zip.py, if that doesn't
> require much or any changes, it would be much more realistic to get this in
> tree.
> 
> Still, many of the files we have outlined above can be removed, it would be
> great to remove what we can.

Yep, I started a new branch for removing android from talos, see bug 1187684. The thing is that maybe we can work on this, merge it into the official talos repo, and use it in talos.json. We could separate the in-tree work from the cleaning this way.
Attached patch talos_into_common.tests.zip.diff (obsolete) — Splinter Review
Well, Talos now uses the latest mozbase packages, so we are ready to move it into m-c.

Here is how I see that:

 - copy talos from hg.mozilla.org/build/talos into testing/talos.
 - make changes to the build system to package talos inside a test zip file, maybe inside the already existing common.tests.zip, or maybe into a new talos.tests.zip file.
 - adapt testing/mozharness/mozharness/mozilla/testing/talos.py to use that instead of cloning from hg to run talos (I suspect we should inherit from MozbaseMixin - testing/mozharness/mozharness/mozilla/mozbase.py).
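The packaging step above is essentially "walk testing/talos and write it into an archive under a prefix". A minimal, self-contained sketch of what a stage-package style target does (paths and the helper name are illustrative, not the actual build-system code):

```python
import os
import tempfile
import zipfile

def stage_talos_zip(source_dir, zip_path, prefix="talos"):
    """Recursively pack source_dir into zip_path under the given
    prefix - roughly what a package-tests / stage-package target
    would do for testing/talos."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, source_dir)
                zf.write(full, os.path.join(prefix, rel))

# Tiny demo on a throwaway directory standing in for testing/talos:
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "talos-src")
os.makedirs(src)
with open(os.path.join(src, "INSTALL.py"), "w") as f:
    f.write("# placeholder\n")
out = os.path.join(tmp, "talos.tests.zip")
stage_talos_zip(src, out)
with zipfile.ZipFile(out) as zf:
    names = zf.namelist()
print(names)
```

Whether this lands in common.tests.zip or its own archive only changes `zip_path`; the staging logic is the same.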

:chmanchester, I'm asking you for feedback here:

 - does the overall process seem good to you?
 - the attached patch should put talos inside common.tests.zip (from what I understood of the code). Is this a good start?
Attachment #8646295 - Flags: feedback?(cmanchester)
Comment on attachment 8646295 [details] [diff] [review]
talos_into_common.tests.zip.diff

Review of attachment 8646295 [details] [diff] [review]:
-----------------------------------------------------------------

This is totally on the right track (and thank you for tackling this!), but it's a little hard to imagine without the code already moved to testing/talos and the stage-package target implemented there.

How big is testing/talos going to be? If it's more than very small we can just move it to talos.tests.zip.
Attachment #8646295 - Flags: feedback?(cmanchester) → feedback+
testing/talos is around 40M uncompressed, so I suspect we should move it into its own package. :)
So, this creates a talos.tests.zip file.

I pushed to try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b1ef7f1d990

You can see in results dir that the zip file is here and seems good:
http://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/j.parkouss@gmail.com-5b1ef7f1d990/

This patch assumes that the talos code has been copied into testing/talos. I did that on try, but it will be done later in m-c with full talos history.

Also this is half of the work; the next step is to make mozharness use that zip file instead of cloning talos. I am working on that, waiting for results here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=529cdc20b980

(I'll ask later for this mozharness patch to be reviewed)
Assignee: cmtalbert → j.parkouss
Attachment #8646295 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Attachment #8653643 - Flags: review?(cmanchester)
Oh, I think I got the first talos in-tree job run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=746b83150c6e

The mozharness patch should probably be cleaned a bit - still, great news!
we really need to test this on all platforms- odd things show up sometimes!
So this patch assumes that we have a talos test zip file, and uses that in mozharness instead of cloning (see previous patch).

I tested locally with "./mach talos-test chromez" and it works fine.

Also pushed to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e021bfb27bd

So far, so good. :)
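On the mozharness side, the change boils down to fetching talos.tests.zip and extracting it where the old code used to `hg clone`. A rough, self-contained sketch of that extract step, with a hypothetical helper name (not the actual mozharness API):

```python
import os
import tempfile
import zipfile

def install_talos_from_zip(zip_path, work_dir):
    """Extract a talos test archive into work_dir and return the
    directory that would previously have been the hg clone target."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(work_dir)
    return os.path.join(work_dir, "talos")

# Demo with a synthetic archive standing in for talos.tests.zip:
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "talos.tests.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("talos/run_tests.py", "# placeholder\n")

talos_dir = install_talos_from_zip(zip_path, os.path.join(tmp, "build"))
print(os.path.exists(os.path.join(talos_dir, "run_tests.py")))
```

The rest of the harness (virtualenv setup, running the suite) can stay as it is; only the "where does the talos source come from" step changes.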
Attachment #8654235 - Flags: review?(jlund)
Comment on attachment 8653643 [details] [diff] [review]
787200_talos_test_zip.patch

Review of attachment 8653643 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me! We should get ted or another build peer to sign off on this before landing.
Attachment #8653643 - Flags: review?(cmanchester) → feedback+
Some (stripped) conversation that can help me because the patch in attachment 8653643 [details] [diff] [review] is failing on OSX 10.7.

21:52 <parkouss> jmaher: hm, it looks like my talos build started on a bad osx base https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e021bfb27bd
21:54 <chmanchester> parkouss: that actually might be your patch :/
21:54 <jmaher> oh
21:54 <parkouss> jmaher: chmanchester ah ?
21:54 <chmanchester> there's some crazy code for unified builds that invoke make package multiple times
21:55 <chmanchester> s/package/package-tests/
21:55 <jmaher> chmanchester: why do you remember all this stuff; good catch
21:56 <parkouss> chmanchester: but I tried the zip creation on another try with success
21:56 <chmanchester> parkouss: another try on 10.7 builds?
21:56 <parkouss> hm
21:58 <parkouss> chmanchester: ah, you may be right
22:00 <chmanchester> parkouss: https://dxr.mozilla.org/mozilla-central/source/build/macosx/universal/flight.mk?offset=400#24 is where all that happens
22:01 <parkouss> chmanchester: erf, I thought I was there;
22:02 <chmanchester> parkouss: I guess it's choking on the minidump_stackwalk binary checked in to the talos repo
22:04 <chmanchester> parkouss: np, hope that's helpful! a way to hack around it might be to not run the unify script on the talos-stage subdir
22:04 <chmanchester> since, as we learned, there's no talos on 10.6 anymore
Depends on: 1200294
Comment on attachment 8654235 [details] [diff] [review]
787200_mozharness_talos_test_zip.patch

Review of attachment 8654235 [details] [diff] [review]:
-----------------------------------------------------------------

this code looks good. couple impl questions:

1) are we not going to pursue https://bugzilla.mozilla.org/show_bug.cgi?id=1188043 ? and instead put talos in test zip?
2) what's going to happen to talos.json. iirc - it managed the revision of talos we were cloning and mobile/desktop behaved differently.

sorry to ask questions that have obviously been solved; I'm just getting up to speed.

r+ assuming you figure out https://bugzilla.mozilla.org/show_bug.cgi?id=1200294
Attachment #8654235 - Flags: review?(jlund) → review+
(In reply to Jordan Lund (:jlund) from comment #56)

> 1) are we not going to pursue
> https://bugzilla.mozilla.org/show_bug.cgi?id=1188043 ? and instead put talos
> in test zip?

No, you're right; this is no longer required. I'll check with jmaher, but I think we can close that bug (i.e. WONTFIX).

> 2) what's going to happen to talos.json. iirc - it managed the revision of
> talos we were cloning and mobile/desktop behaved differently.

Well, it needs to stay for now; it still manages Talos for Android, and the suite definitions.
But I should get rid of the line for the Talos hg revision. :) I can update the patch for that.
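To illustrate what would change, talos.json has roughly this shape (revision hash and suite contents here are illustrative, not the real values); the talos_revision line is the one that becomes unnecessary for desktop once the code lives in-tree:

```json
{
  "global": {
    "talos_repo": "https://hg.mozilla.org/build/talos",
    "talos_revision": "c0de097a7159"
  },
  "suites": {
    "chromez": {
      "tests": ["tresize"]
    }
  }
}
```

The "suites" block (and the repo/revision for Android) would remain until those consumers also move to the in-tree copy.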

> sorry to ask questions that have obviously been solved; I'm just getting up
> to speed.

NP, this helps! Thanks. :)
No longer depends on: 1188043
Comment on attachment 8653643 [details] [diff] [review]
787200_talos_test_zip.patch

So now that we have gotten rid of the minidump Talos binaries, I hope we will be fine!

Pushed to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d44fe00454e
Attachment #8653643 - Flags: review?(ted)
Attachment #8653643 - Flags: review?(ted) → review+
Unfortunately, unify on OS X now chokes on testing/talos/talos/profiler/dump_syms_mac.
I filed bug 1201224 with a patch to try to fix that, but unfortunately it ran into bug 1201345.
Depends on: 1201224
OK, looks good except for the xperf test (?). jmaher, any thoughts? The in-tree copy is based on https://hg.mozilla.org/build/talos/rev/be34538a5581.
I think I see the problem with xperf: it is not finding:
'{talos}\talos\page_load_test\tp5n\tp5n.manifest'

in this file:
http://hg.mozilla.org/build/talos/file/3b94adbd66f1/talos/xtalos/xperf_whitelist.json#l8

We are missing the \talos\ there; that should be easy to add. We should add the new entries rather than replace the existing ones, for both tp5n.manifest and tp5n.manifest.develop.
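The resulting whitelist keys would look something like the sketch below (a guess based on the xperf_whitelist.json linked above; the empty objects stand in for whatever per-file data the real entries carry). Both the old and the new path layouts are kept, since the in-tree copy adds an extra \talos\ level:

```json
{
  "{talos}\\page_load_test\\tp5n\\tp5n.manifest": {},
  "{talos}\\talos\\page_load_test\\tp5n\\tp5n.manifest": {},
  "{talos}\\talos\\page_load_test\\tp5n\\tp5n.manifest.develop": {}
}
```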
Ok, thanks!

So, pushed to try one more time, with Talos tip and the fix mentioned above:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=61d7bbe76b18
Bug 787200 - Move the Talos code into mozilla-central: add talos in-tree, add items in xperf whitelist. r=jmaher
Attachment #8658344 - Flags: review?(jmaher)
Comment on attachment 8658344 [details]
MozReview Request: Bug 787200 - Move the Talos code into mozilla-central: add talos in-tree, add items in xperf whitelist. r=jmaher

https://reviewboard.mozilla.org/r/18585/#review16607

thanks!
Attachment #8658344 - Flags: review?(jmaher) → review+
Comment on attachment 8658343 [details]
MozReview Request: Bug 787200 - Move the Talos code into mozilla-central: add talos in-tree, using https://hg.mozilla.org/build/talos/rev/c0de097a7159. r=jmaher

https://reviewboard.mozilla.org/r/18583/#review16609

crazy stuff, but really great!
Attachment #8658343 - Flags: review?(jmaher) → review+
Well, my last try push was not using the fix from bug 1201224... So here is a new try push, now that it has landed in m-c:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dad13e46d6be