Closed Bug 787200 Opened 12 years ago Closed 9 years ago

Move the Talos code into mozilla-central

Category: Testing :: Talos, defect
Priority: Not set
Severity: normal
Status: RESOLVED FIXED
Target milestone: mozilla43
Tracking status: firefox43 --- fixed
People: Reporter: ehsan.akhgari; Assigned: parkouss
Attachments: 4 files, 1 obsolete file

CCing Joel since he knows what needs to be done here.  :-)
Historically it made great sense to leave talos outside of mozilla-central. We now run different versions of talos on different branches, as we change talos and the tests so frequently.

There might be some difficulty in getting the releng scripts retrofitted to pull talos out of tests.zip (or similar) instead of downloading from build.mozilla.com/talos.zip.

We also store some .zip files (ts_places profiles and tp5n.zip pages) outside of the repository (for legal and size reasons). Those would need to be documented better.

The last piece I can think of that might be problematic is that talos depends on external resources when we do a create_talos_zip. Most of these resources live in m-c already, but we would either have to add some in there, or require a virtualenv to run them (like we already do now; it would just be different from other harnesses).
OS: Mac OS X → All
Hardware: x86 → All
We pull Talos from build.mozilla.org based on the contents of https://mxr.mozilla.org/mozilla-central/source/testing/talos/talos.json. I agree, Talos is a completely separate thing. It's really up to ateam where it lives IMO.
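The lookup described above (finding which talos.zip a build uses from testing/talos/talos.json) can be sketched in a few lines. This is an illustration, not the real automation code; the exact key layout of talos.json is an assumption, with only the "url" field quoted later in this bug.

```python
import json

# Illustrative sketch: pull the talos.zip URL out of talos.json.
# The key layout is an assumption; we simply walk the structure
# looking for the first "url" entry.
def talos_zip_url(talos_json_text):
    data = json.loads(talos_json_text)

    def find_url(node):
        if isinstance(node, dict):
            if "url" in node:
                return node["url"]
            for value in node.values():
                found = find_url(value)
                if found is not None:
                    return found
        return None

    return find_url(data)

example = '{"talos.zip": {"url": "http://build.mozilla.org/talos/zips/talos.38e088867f7b.zip"}}'
print(talos_zip_url(example))  # prints the zip URL for that revision
```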
I'm fairly against this and would like to move more of our testing code and standalone tools out of mozilla central.  One frustration is that there is no clear guideline for what should be where:  there's a lot mirrored to mozilla-central, there's a lot that lives there, and there's a lot that isn't mirrored and doesn't live there.  ABICT, there is no clear policy about what and why.  While I feel that in practice modularity benefits from not having everything live in a giant tree, if we do want e.g. all testing related code to live in mozilla-central, then we should decide that and do it.  This affects mozbase amongst other things.

For talos specifically, we will want pyyaml in-tree. pywin32 will be very challenging, as witnessed by the fact that it has held up upgrading the windows talos slaves from Python 2.4 for over six months now.
(In reply to comment #3)
> I'm fairly against this and would like to move more of our testing code and
> standalone tools out of mozilla central.  One frustration is that there is no
> clear guideline for what should be where:  there's a lot mirrored to
> mozilla-central, there's a lot that lives there, and there's a lot that isn't
> mirrored and doesn't live there.  ABICT, there is no clear policy about what
> and why.  While I feel that in practice modularity benefits from not having
> everything live in a giant tree, if we do want e.g. all testing related code to
> live in mozilla-central, then we should decide that and do it.  This affects
> mozbase amongst other things.

From the perspective of Mozilla developers, most people treat the things that live outside of mozilla-central as either non-existent or second class.  Examples are mozmill and Jetpack tests.  It would be very beneficial for us to be able to easily see what's inside the Talos tests and improve/fix them, and also to be able to run Talos in our check-outs against our builds for local measurements.  All of this we can already do with the rest of our automated test suites (reftests/xpcshell/mochitest/etc.)
(In reply to Ehsan Akhgari [:ehsan] from comment #4)
> (In reply to comment #3)
> > I'm fairly against this and would like to move more of our testing code and
> > standalone tools out of mozilla central.  One frustration is that there is no
> > clear guideline for what should be where:  there's a lot mirrored to
> > mozilla-central, there's a lot that lives there, and there's a lot that isn't
> > mirrored and doesn't live there.  ABICT, there is no clear policy about what
> > and why.  While I feel that in practice modularity benefits from not having
> > everything live in a giant tree, if we do want e.g. all testing related code to
> > live in mozilla-central, then we should decide that and do it.  This affects
> > mozbase amongst other things.
> 
> From the perspective of Mozilla developers, most people treat the things
> that live outside of mozilla-central as either non-existent or second class.
> Examples are mozmill and Jetpack tests.  It would be very beneficial for us
> to be able to easily see what's inside the Talos tests and improve/fix them,
> and also to be able to run Talos in our check-outs against our builds for
> local measurements.  All of this we can already do with the rest of our
> automated test suites (reftests/xpcshell/mochitest/etc.)

b2g also lives outside mozilla-central. This isn't a scalable attitude.
what if we had a make target:
make talos-tsvg <- or other test name?

That could check out the code and run the tests against the build in the tree.  Just trying to look at all options before committing one way or another.
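A hypothetical sketch of what such a target might chain together, expressed as the commands it would run. The repo URL appears elsewhere in this bug; the virtualenv step and the talos command-line flags are assumptions, not the real build glue.

```python
# Hypothetical sketch of a `make talos-tsvg`-style target, expressed as
# the shell commands it would chain together.  Flag names on the talos
# invocation are illustrative only.
def talos_local_commands(test_name, objdir="objdir"):
    return [
        "hg clone http://hg.mozilla.org/build/talos/ talos-src",
        "virtualenv talos-env",
        "talos-env/bin/python talos-src/setup.py develop",
        # Assumed flags: point talos at the in-tree build and pick a test.
        "talos-env/bin/talos --executablePath %s/dist/bin/firefox "
        "--activeTests %s" % (objdir, test_name),
    ]

for command in talos_local_commands("tsvg"):
    print(command)
```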
I would not mind mirroring talos to m-c.  We already mirror several pieces to m-c, from NSPR to mozbase.  I would personally prefer to devote effort to unify how we mirror the different pieces.

If talos were in m-c, `make talos-tsvg` would require (presumably) some subset of buildbot-configs in order to get results comparable to production. We'd need to isolate what lived in buildbot-configs and move that to m-c.

We would also have to ship pywin32 with the mozilla-build, ABICT.
(In reply to comment #5)
> (In reply to Ehsan Akhgari [:ehsan] from comment #4)
> > (In reply to comment #3)
> > > I'm fairly against this and would like to move more of our testing code and
> > > standalone tools out of mozilla central.  One frustration is that there is no
> > > clear guideline for what should be where:  there's a lot mirrored to
> > > mozilla-central, there's a lot that lives there, and there's a lot that isn't
> > > mirrored and doesn't live there.  ABICT, there is no clear policy about what
> > > and why.  While I feel that in practice modularity benefits from not having
> > > everything live in a giant tree, if we do want e.g. all testing related code to
> > > live in mozilla-central, then we should decide that and do it.  This affects
> > > mozbase amongst other things.
> > 
> > From the perspective of Mozilla developers, most people treat the things
> > that live outside of mozilla-central as either non-existent or second class.
> > Examples are mozmill and Jetpack tests.  It would be very beneficial for us
> > to be able to easily see what's inside the Talos tests and improve/fix them,
> > and also to be able to run Talos in our check-outs against our builds for
> > local measurements.  All of this we can already do with the rest of our
> > automated test suites (reftests/xpcshell/mochitest/etc.)
> 
> b2g also lives outside mozilla-central. This isn't a scalable attitude.

Well, the Gecko specific bits used to live outside of mozilla-central, and the b2g team tried hard to merge them back in (and I think they succeeded, since they're doing their Gecko development on m-c.)
(In reply to comment #6)
> what if we had a make target:
> make talos-tsvg <- or other test name?
> 
> That could check out the code and run the tests against the build in the tree. 
> Just trying to look at all options before committing one way or another.

That would definitely be better than the current situation if we decide that we don't want to move Talos inside m-c. (In that case I suggest we use mach as opposed to a new make target.)
hg subrepos could also conceivably be used here: http://www.selenic.com/hg/help/subrepos
The problem we want to solve here isn't "it's hard to run Talos locally" but "it's hard to reproduce remote Talos numbers locally."  Of course running Talos locally is a necessary condition, but in my experience, it's not sufficient.

Your hardware does not match the testing machines'.  Your system is not configured the same as the testing machines.  Your toolchain does not match the testing machines'.  You do not have all the same daemons running on your machine as the testing machines.  So your results are likely not to match the testing machines'.

For big regressions, of course we'll likely be able to see them locally.  But big regressions aren't the problem, IME; our existing tools don't have much difficulty pointing out the offender in those cases.  Or at least, they wouldn't, if we actually ran Talos on every m-i push.

Separately, I thought that we could not redistribute the Talos files we run on our builders.  That's why the file lives at build.mozilla.com.

Anyway, a make target (or script or mach target or whatever) can't hurt, and I think that would make a lot more sense than checking 100 MB of test files (IIRC) into m-c. But I don't think it will help much, either.
We can run all tests from the talos repository except for tp5 and ts_places_* (the dirty tests).  Those tests require additional files which live on the build network.

The main motivation for this bug is to solve these questions:
* Where is the talos code, how are these tests run?
* I want to investigate a performance regression, how do I run talos?

While it is very true that we cannot reproduce locally the numbers that talos generates, I still think we can reproduce a regression locally by running two builds and comparing results. Running tests locally will allow somebody to see the impact code changes have on the numbers, even if they are off by a small factor from what we would see on tbpl.

The big gaping hole is how do you compare results when running them locally, or more specifically how do you translate the reported numbers into something that gets reported to graph server allowing you to see the difference your patch makes.
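For the simple local-comparison case, the arithmetic is just comparing means across two sets of runs. An illustrative sketch with made-up numbers (not talos code, and not how graph server computes anything):

```python
import statistics

# Illustrative only: comparing the same test run locally against an old
# and a new build.  The run values below are made-up milliseconds.
def percent_change(old_runs, new_runs):
    old = statistics.mean(old_runs)
    new = statistics.mean(new_runs)
    return (new - old) / old * 100.0

old_runs = [105.2, 103.8, 104.9, 106.1]
new_runs = [112.4, 111.0, 113.2, 112.8]
print("%.1f%% change" % percent_change(old_runs, new_runs))  # prints "7.0% change"
```

Even with the absolute numbers off from production hardware, the relative change between two local builds is the signal being discussed here.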

If the common scenario is to run talos tests locally to investigate a regression (known, or suspected based on local patches), then make targets could be useful. But in either case we will need to run against both old and new builds to ensure we can see the difference locally, which means that a make target alone isn't going to be enough.

To solve the releng dilemma, we can continue to generate talos.zip files and make our system work the same way it does no matter where the talos code lives.
> I still think we can reproduce a regression locally by running two builds and comparing 
> results.

Indeed, that's a logical conclusion.  But in my experience, this is much easier said than done.  I spent two months trying to understand why I couldn't reproduce a talos regression I observed on m-i using tryserver, and I eventually gave up (bug 653961).  Trying to do the same on your local machine will be ten times harder.

Rafael had a great post on dev.planning about how incredibly hard this can be in practice.  Unfortunately it's not on Google Groups yet, afaict.  But to quote the problems he encountered trying to reproduce a regression locally:

> * our builds are *really* hard to reproduce. The build I was downloading from
>   try was faster than the one I was doing locally. In despair I decided to fix
>   at least part of this first. I found that our build was depending on the way
>   the bots use ccache (they set CCACHE_BASEDIR which changes __FILE__), the
>   build directory (shows up on debug info that is not stripped), and the file
>   system being case sensitive or not.
> 
> * testing on linux showed even more bizarre cases where small changes cause
>   performance problems. In particular, adding a nop *after the last ret* in
>   a function would make the js interpreter faster on sunspider. The nop was just
>   enough to make the function size cross the next 16 bytes boundary and that
>   changed the address of every function linked after it.
> 
> * the histogram of some benchmarks don't look like a normal distribution
>   (https://plus.google.com/u/0/108996039294665965197/posts/8GyqMEZHHVR). I
>   still have to read the paper mentioned in the comments.
That raises the question of why should we run tests that give such noisy results in automation? If it's so hard to reproduce a regression locally, why do we think that the automated results are any better?
(In reply to Chris AtLee [:catlee] from comment #14)
> That raises the question of why should we run tests that give such noisy
> results in automation? If it's so hard to reproduce a regression locally,
> why do we think that the automated results are any better?

ISTM that Justin has not been lamenting the noisiness of the tests themselves, but other environmental factors.  Maybe "able to run Talos locally" shouldn't be the goal, but "able to run Talos locally with a similar build environment" should be the goal, of which the former is only a stepping stone to the latter.  That would require downloadable build tools, some way to automagically use them, etc. etc.

Of course, the tests may be noisy too!  But that seems like a separate concern from what's been discussed thus far.
I think part of the problem is, we have no way to distinguish between "small" and "large" regressions.  Any detectable downward change is considered a regression.

> If it's so hard to reproduce a regression locally, why do we think that the automated 
> results are any better?

In one sense, the problem is that some of our tests have /too little/ noise when run on automation.  This causes us to freak out about what are small but detectable changes in the result which we can't reproduce on other systems.

Put another way, part of the problem is that the noise introduced by different builds/systems/etc is larger than the noise in the tests when run on automation.
Performance tests will always have noise even if it is very small and almost unmeasurable.  Running on different hardware and OS configs is the biggest factor.  On my local desktop I have all types of things running which we do not run on the talos slaves, likewise the versions of talos and other libraries could be different.

I consider the noise in performance runs the equivalent of the random oranges we see in our unittests; it is a fact of life unless we are testing "hello world".

We have been working on reducing the noise in the tests and making sure the tests are useful.  If we start backing patches out due to performance regressions we need to give developers the ability to attempt to reproduce the problem locally.  Just like random oranges, this is not always possible.  There are good ways to run Talos right now, but it is out of the normal workflow for builds and unittests.  

The real question is: if we run it on tbpl, should it all be included in mozilla-central? Talos is the major exception to that rule.

Logistically we can make this work; it might take a while to get our automation using it from a new location. Let's focus on the bug at hand here and weigh in on the pros/cons of talos living in m-c. Right now it is pretty neutral.
> I consider the noise in performance runs the equivalent of the random oranges we see in 
> our unittests; it is a fact of life unless we are testing "hello world".

Yes, of course.

> If we start backing patches out due to performance regressions we need to give developers 
> the ability to attempt to reproduce the problem locally.

I don't think anyone disagrees with the idea of making it easier to run Talos locally.  The disagreement is only over the degree to which this may be futile.

Anyway, to the question at hand about whether Talos belongs in m-c: How large is the Talos zip file?
> I don't think anyone disagrees with the idea of making it easier to run Talos locally.  
> The disagreement is only over the degree to which this may be futile.

And the degree to which it's futile informs the priority of fixing this bug, because if running tests were always futile, we wouldn't care at all about making it easy to run Talos locally.

But if the priority of this bug isn't in question, then the futility isn't really relevant to this bug, I agree.
(In reply to comment #13)
> > I still think we can reproduce a regression locally by running two builds and comparing 
> > results.
> 
> Indeed, that's a logical conclusion.  But in my experience, this is much easier
> said than done.  I spent two months trying to understand why I couldn't
> reproduce a talos regression I observed on m-i using tryserver, and I
> eventually gave up (bug 653961).  Trying to do the same on your local machine
> will be ten times harder.

In the past there was a time when I watched dev.tree-management very closely and bugged people when they regressed stuff. They kept telling me that Talos regressions are very hard to reproduce locally. I started to wonder whether there was some truth to that in the average case, so I tried to reproduce the regressions and improvements locally (on the tests that did not require non-public files, obviously). And I realized that it is actually very easy to reproduce many Talos regressions locally. I don't have that data around any more, but I clearly remember the number being much larger than 50%.
Currently talos.zip is about 9MB in size.  

It sounds like we should revisit our dev.tree-management emailer program and poll developers on whether running talos from an external repository is too much of an out-of-band process. I know personally that when I have to commit to a github project it is a learning curve every time (about once every 6 weeks). I would rather remove as much of the learning curve as possible.
(In reply to comment #21)
> Currently talos.zip is about 9MB in size.  

That is not very big.  I land 20+MB patches without a blink!
(In reply to Joel Maher (:jmaher) from comment #21)
> Currently talos.zip is about 9MB in size.  
> 
> It sounds like we should adjust revisit our dev.tree-management emailer
> program and poll developers if running talos from an external repository is
> too much of an OOB process.  I know personally when I have to commit to a
> github project it is a learning curve every time (about once every 6 weeks).
> Would rather remove as much of the learning curve as possible.

To be fair, talos lives in hg, not in github: http://hg.mozilla.org/build/talos/
Focusing just on why I think we should move talos to m-c: I agree that the impact it would have on reproducing remote regressions locally would be small, but it should have other benefits:

* One place to look for information. I am trying to create a script to compare talos runs. This is something that should really be in m-c, as it is a basic developer tool; if the rest of talos were there, information could be shared.
* Easier to reproduce old results. If something changes in talos, that change is recorded in m-c. This is analogous to the move of mozconfigs to m-c which was an awesome improvement.
* Easier to test changes. Do you want to propose a new benchmark? Do you want to change one to remove some source of noise? All you need is a try run modifying talos. This is even true if all you want to do is check if a test is sensitive to measurement bias: you can change the talos script to create a dummy env var for example.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #24)
> * One place to look for information. I am trying to create a script to
> compare talos runs. This is something that should really be on m-c as it is
> a basic developer tool, if the rest of talus was there information could be
> shared.

I don't buy this as a scalable solution to the problem of "where is what I want"

> * Easier to reproduce old results. If something changes in talos, that
> change is recorded in m-c. This is analogous to the move of mozconfigs to
> m-c which was an awesome improvement.

We already have a .json that identifies what talos zip our "old builds" are based off of, you already conceded that reproducing this locally doesn't work.

> * Easier to test changes. Do you want to propose a new benchmark? Do you
> want to change one to remove some source of noise? All you need is a try run
> modifying talos. This is even true if all you want to do is check if a test
> is sensitive to measurement bias: you can change the talos script to create
> a dummy env var for example.

This alone won't make testing local talos changes easier. What does work *now* is creating a new talos.zip uploading it somewhere our infra can reach, and then running a try run with the json changed.

To make our infra use something from in tree is a much different/bigger problem, and would need to change whether we did that from in talos or in m-c as far as "where does the code live".
> I don't buy this as a scalable solution to the problem of "where is what I
> want"

I have seen trees *way* larger than m-c, and it scales really well.

> We already have a .json that identifies what talos zip our "old builds" are
> based off of, you already conceded that reproducing this locally doesn't
> work.

And remotely? Is the zip on m-c? Is the zip named with a hash of its contents?

> This alone won't make testing local talos changes easier.

Again, I am not discussing that. I am discussing why I think we should move talos to m-c, and as I said on the first paragraph, that was not one of the reasons.

> What does work
> *now* is creating a new talos.zip uploading it somewhere our infra can
> reach, and then running a try run with the json changed.
>

Which is *way* worse than testing other changes in firefox.

> To make our infra use something from in tree is a much different/bigger
> problem, and would need to change whether we did that from in talos or in
> m-c as far as "where does the code live".

If we are going to take arguments on this line we would never make infrastructure changes. I was not here at the time, but I am sure it was hell to change cvs to hg. More recently we have also changed how we do some things to make our lives easier:

* mozconfigs are now in tree
* compilers (for now b2g and os x) can be fetched from a manifest in m-c.
(In reply to Rafael Ávila de Espíndola (:espindola) from comment #26)
> > I don't buy this as a scalable solution to the problem of "where is what I
> > want"
> 
> I have seen trees *way* larger than m-c, and it scales really well.

I have seen trees way *smaller* than m-c and (depending on context) I find this strategy doesn't scale well at all.  Namely, while everyone agrees that modularity is important, and most people agree that it is good to have software that (not to be taken too literally) "does one thing and one thing well", my experience with "put all the things in the tree" is that software that should be modular components end up getting (usually needlessly) intertwined with other software.  Meaningful dependencies are not maintained. The giant tree is used as a replacement for a real deployment strategy.  Our tree is a great example of this. I'm not going to dig up examples right now, but if you look at our existing testing software and try to decipher what depends on what (and even better, the location of what in the tree) and can come back and say that that is *good* practice....well, I'd be somewhat shocked.

While I also think several aspects of putting talos in the tree would be problematic (e.g. pywin32) and (very) time-consuming (e.g. changing all the build infrastructure to use the in-tree builds), my foremost objection is that programmers -- even very talented programmers -- seem to be unable to keep things modular when you have a giant soup of all the things.

> > We already have a .json that identifies what talos zip our "old builds" are
> > based off of, you already conceded that reproducing this locally doesn't
> > work.
> 
> And remotely? Is the zip on m-c? Is the zip named with a hash of its
> contents?

Yes, in fact it is named with the changeset hash of its contents: https://hg.mozilla.org/mozilla-central/file/e3e7f8f7796d/testing/talos/talos.json

 "url": "http://build.mozilla.org/talos/zips/talos.38e088867f7b.zip"

So if you wanted to run the exact version 38e088867f7b of talos, that is perfectly possible (and scriptable). However, for most cases on m-c (not necessarily aurora or beta), it's probably reasonable to check out the tip of talos from http://hg.mozilla.org/build/talos/
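Extracting that revision from the zip URL is trivially scriptable. A minimal sketch, assuming the zip names always follow the `talos.<hg changeset>.zip` pattern shown in the URL above:

```python
import re

# Sketch based on the URL format quoted above; assumes zip names always
# follow the talos.<hg changeset>.zip pattern.
def talos_revision_from_url(url):
    match = re.search(r"talos\.([0-9a-f]+)\.zip$", url)
    return match.group(1) if match else None

url = "http://build.mozilla.org/talos/zips/talos.38e088867f7b.zip"
# The result is the changeset to check out from http://hg.mozilla.org/build/talos/
print(talos_revision_from_url(url))  # prints "38e088867f7b"
```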

> > What does work
> > *now* is creating a new talos.zip uploading it somewhere our infra can
> > reach, and then running a try run with the json changed.
> >
> 
> Which is *way* worse than testing other changes in firefox.

Interestingly, I've got this down to a script I can invoke in one line: http://k0s.org/mozilla/talos/update_talos.py:

~/mozilla/talos/update_talos.py url/or/file/path/to/test.diff

While this is tailored to my machine (mostly due to lack of interest in anyone else using it), I'd certainly be happy to check this into m-c and/or improve it if anyone was interested in using this script.
 
> > To make our infra use something from in tree is a much different/bigger
> > problem, and would need to change whether we did that from in talos or in
> > m-c as far as "where does the code live".
> 
> If we are going to take arguments on this line we would never make
> infrastructure changes. I was not here at the time, but I am sure it was
> hell to change cvs to hg. More recently we have also changed how we do some
> things to make our lives easier:
> 
> * mozconfigs are now in tree
> * compilers (for now b2g and os x) can be fetched from a manifest in m-c.

But you don't store the compilers themselves in the tree? Why not?

Without broadening the scope too much, I'd like to know criteria for:
- what can/should live in the tree
- what can/should be mirrored to the tree
- what can/should not live in the tree

Obviously there are very strong opinions in all directions here.  I'm fairly pro-mirroring/subrepos and fairly against having a giant monolithic repo of all the things.  That said, if we actually had a strategy here, instead of variations on this theme coming up every three months and having this argument, I would at least be somewhat happier even if I disagreed with the outcome.  Now, as best I can tell we have no real strategy and what our strategy is seems to change from project to project, test framework to test framework, etc. If there is a pattern, I can neither discern it nor have I seen it documented.

As far as technical concerns, I haven't heard anyone come up with a strategy about what to do with the pywin32 dependency (outside of the fact we don't want it anyway) or any cost-estimate on how much releng manhours it would take to revamp all the infrastructure (as a talos developer, outside of concern for trying to preserve sanity in Mozilla's infrastructure, I shouldn't really care; I just have to check out another repo.  If I was with releng...I would probably feel much more frustrated). And there's still the matter of the pagesets which can't legally live in tree.

If I saw that developer convenience to run talos tests outweighed these concerns, I would be more sympathetic. I've personally done a lot in the last year to change Talos from a system that was very hard to install, required an apache setup, and otherwise required production configuration, into a real piece of software that can be installed in the usual python way. I would like to continue to make this easier. I don't think it is a huge amount to ask to check out talos in a virtualenv, run setup.py develop, and run the tests (and hopefully complain to me if something goes wrong). And as I said, I am perfectly open to mirroring or subrepositories as solutions if `make tsvg` is worth the high but unestimated number of manhours it would take to get our infrastructure to use in-tree talos.
>But you don't store the compilers themselves in the tree? Why not?
>

hg sucks.
OK, there seems to be a huge disconnect between the two sides of the argument here.  Let's see if I can make things a bit clearer.

(In reply to Jeff Hammel [:jhammel] from comment #27)
> I have seen trees way *smaller* than m-c and (depending on context) I find
> this strategy doesn't scale well at all.  Namely, while everyone agrees that
> modularity is important, and most people agree that it is good to have
> software that (not to be taken too literally) "does one thing and one thing
> well", my experience with "put all the things in the tree" is that software
> that should be modular components end up getting (usually needlessly)
> intertwined with other software.  Meaningful dependencies are not
> maintained. The giant tree is used as a replacement for a real deployment
> strategy.  Our tree is a great example of this. I'm not going to dig up
> examples right now, but if you look at our existing testing software and try
> to decipher what depends on what (and even better, the location of what in
> the tree) and can come back and say that that is *good* practice....well,
> I'd be somewhat shocked.

You and Ben have said that this is not scalable, and before reading this I thought that you're talking about things like the size of the repository, etc.  As far as those concerns go, I refer you to the fact that Talos both in terms of the number of changesets (518) and the on-disk size is tiny compared to m-c, so it will not change the performance characteristics of hg handling m-c in any meaningful way.  And while I appreciate discussions about the future directions on how scalable merging certain things into m-c is, I don't think that this is the right forum for that discussion, and it is definitely out of the scope of the current discussion (especially since nobody has a list of the "things" that people may want integrated into m-c in the future.)  Coming up with a strategy on whether a thing needs to live in m-c or not is something best done based on the merits of doing so on a case by case basis, not by blanket strategies which are bound to miss important points about individual cases.

As far as the modularity argument goes, I really fail to see why the location of the source code has anything to do with this.  This is something which should be enforced by the module owner and reviewers.  The reason that many interdependencies exist in many parts of m-c today could well be attributed to history and pragmatic concerns in some cases "having _something_ which works tomorrow, rather than having something ideal six months from now."  Now it can be argued that those trade-offs have been wrong in the past, but that doesn't really have anything to do with the location of the source code.

> While I also think several aspects of putting talos in the tree would be
> problematic (e.g. pywin32) and (very) time-consuming (e.g. changing all the
> build infrastructure to use the in-tree builds), my foremost objection is
> that programmers -- even very talented programmers -- seem to be unable to
> keep things modular when you have a giant soup of all the things.

And that is why we have review requirements.  Note that I have never proposed any changes in the module ownership or review requirements for Talos.

> > > To make our infra use something from in tree is a much different/bigger
> > > problem, and would need to change whether we did that from in talos or in
> > > m-c as far as "where does the code live".
> > 
> > If we are going to take arguments on this line we would never make
> > infrastructure changes. I was not here at the time, but I am sure it was
> > hell to change cvs to hg. More recently we have also changed how we do some
> > things to make our lives easier:
> > 
> > * mozconfigs are now in tree
> > * compilers (for now b2g and os x) can be fetched from a manifest in m-c.
> 
> But you don't store the compilers themselves in the tree? Why not?

Because people are not expected to understand the internals of the source code for compilers, nor to modify them. You can't use the fact that we don't put the source code for our compilers in the tree as a premise to conclude that we should do the same for Talos.

> Without broadening the scope too much, I'd like to know criteria for:
> - what can/should live in the tree
> - what can/should be mirrored to the tree
> - what can/should not live in the tree

As I said above, this is scope-creep.  We *don't* need to have an answer for these points in order to decide where Talos should live.  And it is very possible in my experience that we as Mozilla will _never_ have a blanket answer for those (which I would support, as I think these forms of abstract questions are impossible to answer well in practice.)

> Obviously there are very strong opinions in all directions here.  I'm fairly
> pro-mirroring/subrepos and fairly against having a giant monolithic repo of
> all the things.

Hmm, why would you prefer m-c/talos to be a subrepo as opposed to a normal subdirectory of m-c?

>  That said, if we actually had a strategy here, instead of
> variations on this theme coming up every three months and having this
> argument, I would at least be somewhat happier even if I disagreed with the
> outcome.  Now, as best I can tell we have no real strategy and what our
> strategy is seems to change from project to project, test framework to test
> framework, etc. If there is a pattern, I can neither discern it nor have I
> seen it documented.

No, you're right.  I'm not aware of any global strategy on this either.  Which is why we're having this discussion.  :-)

> As far as technical concerns, I haven't heard anyone come up with a strategy
> about what to do with the pywin32 dependency (outside of the fact we don't
> want it anyway) or any cost-estimate on how much releng manhours it would
> take to revamp all the infrastructure (as a talos developer, outside of
> concern for trying to preserve sanity in Mozilla's infrastructure, I
> shouldn't really care; I just have to check out another repo.  If I was with
> releng...I would probably feel much more frustrated). And there's still the
> matter of the pagesets which can't legally live in tree.

I don't know what the pywin32 dependency issue is, since you have only mentioned that it exists.  I don't also know what the RelEng side of work will look like, but in a small unofficial chat that I had with a few of the RelEng folks the other day at the office, they seemed to indicate that they just use the zip file on the build server, so they don't really care where the code lives.

Given that the above is actually true (and it would be great if someone from RelEng can confirm that please), and that we can find a solution to the pywin32 dependency problem, would you still object to moving the source code?

> If I saw that developer convenience to run talos tests outweighed these
> concerns, I would be more empathetic.  I've personally done a lot in the
> last year from changing Talos from a system that was very hard to install,
> required an apache setup, and otherwise required production configuration,
> into a real piece of software that can be installed in the usual python way.
> I would like to continue to make this easier.  I don't think it is a huge
> amount to ask to checkout talos in a virtualenv, run setup.py develop, and
> run the tests (and hopefully complain to me if something goes wrong).  And
> as said I am perfectly open to mirroring or subrepositories as solutions if
> `make tsvg` is worth the high but unestimated number of manhours to get our
> infrastructure to use in-tree talos

The amount of work that has gone into Talos is indeed astonishing, and I can attest to that as someone who has personally gone through the pain of running the initial versions of "Standalone Talos"... something which IIRC took me a few days to accomplish back in those days!  :-)

But speaking as a developer, it is not out of laziness that I request that the Talos code be moved to m-c.  Here are the reasons why I think it would be a useful change:

1. More visibility to all of the developers working on Core and Firefox, as opposed to only the people who work specifically on Talos.
2. Easier tracking of changes alongside m-c changes.  For example, if I want to know how the changes in Talos and the layout module have played together over the past month, I need to be able to issue a command like |hg log layout/ talos/|.  Knowing the snapshot-level history of Talos changes in the talos.json file is definitely better than not detecting any dependency at all, but it is not enough, since it hides the history of changes, which is probably more important than major diffs between snapshots.  The only alternative today is to log both repositories in two terminal windows, and hack your way around by matching change dates in your head and doing guesswork.
3. Making it easier for more developers to run Talos locally by adding build system support for running those tests.  Note that we do have historical data on how well keeping test suites outside of mozilla-central works.  We have had long debates about merging in the jetpack and mozmill test suites, and the teams working on those projects have resisted for various reasons.  The current state of those tests is that people hide them on TBPL, and when somebody lands something that breaks them, the usual answer from developers is shrugging, and pointing out that it takes too much effort to figure out where to get the test suite from and how to run it; the fact that this would be the first time they have ever looked at that test suite does not help either.  You may dislike this reaction, and I can sympathize, but this is the fact of the matter as history has taught us.  And those test suites are not taken seriously by developers to this day, which is sad.
(In reply to Ehsan Akhgari [:ehsan] from comment #29)
> OK, there seems to be a huge disconnect between the two sides of the
> argument here.  Let's see if I can make things a bit clearer.

Responses selectively inline.

> (In reply to Jeff Hammel [:jhammel] from comment #27)
> > I have seen trees way *smaller* than m-c and (depending on context) I find
> > this strategy doesn't scale well at all.

<snip/>

> As far as the modularity argument goes, I really fail to see why the
> location of the source code has anything to do with this.  This is something
> which should be enforced by the module owner and reviewers.  The reason that
> many interdependencies exist in many parts of m-c today could well be
> attributed to history and pragmatic concerns in some cases "having
> _something_ which works tomorrow, rather than having something ideal six
> months from now."  Now it can be argued that those trade-offs have been
> wrong in the past, but that doesn't really have anything to do with the
> location of the source code.

In theory, I mostly agree.  In practice, both in mozilla-central and
in the other large (though not m-c-large) monoliths I have worked
with, I find that this gives developers perverse incentives to take
short-cuts rather than figure out more rigorous ways of resolving
dependencies.  In general, I think "put XXX in YYY's monolithic
repository" is a way to work around figuring out how software YYY can
depend on software XXX, not a way of solving the problem.  In the
interest of not bikeshedding or creeping scope, I will leave this
here, but am open to discussing it in less off-topic forums.

> > While I also think several aspects of putting talos in the tree would be
> > problematic (e.g. pywin32) and (very) time-consuming (e.g. changing all the
> > build infrastructure to use the in-tree builds), my foremost objection is
> > that programmers -- even very talented programmers -- seem to be unable to
> > keep things modular when you have a giant soup of all the things.
>
> And that is why we have review requirements.  Note that I have never
> proposed any changes in the module ownership or review requirements for
> Talos.

I would be much more inclined to agree if the state of python
packaging and interdependencies in mozilla-central were better than
it is.  Seeing these problems persist without prioritization does not
convince me that the review process is enough to catch this.

> > > > To make our infra use something from in tree is a much different/bigger
> > > > problem, and would need to change whether we did that from in talos or in
> > > > m-c as far as "where does the code live".
> > > 
> > > If we are going to take arguments on this line we would never make
> > > infrastructure changes. I was not here at the time, but I am sure it was
> > > hell to change cvs to hg. More recently we have also changed how we do some
> > > things to make our lives easier:
> > > 
> > > * mozconfigs are now in tree
> > > * compilers (for now b2g and os x) can be fetch from a manifest in m-c.
> > 
> > But you don't store the compilers themselves in the tree? Why not?
> 
> Because people are not expected to understand the internals of the source
> code for compilers, and to modify them.  You can't use the fact that we
> don't put the source code for our compilers in the tree as a precursor to
> conclude that we should do the same for Talos.
>
> > Without broadening the scope too much, I'd like to know criteria for:
> > - what can/should live in the tree
> > - what can/should be mirrored to the tree
> > - what can/should not live in the tree
>
> As I said above, this is scope-creep.  We *don't* need to have an answer for
> these points in order to decide where Talos should live.  And it is very
> possible in my experience that we as Mozilla will _never_ have a blanket
> answer for those (which I would support, as I think these forms of abstract
> questions are impossible to answer well in practice.)

What I don't like to see is not only this logic being applied
inconsistently (through inference), but also us coming up with N
different solutions to the same problem.  Not only do we mirror a lot
of stuff, we mirror it differently.  Or we copy and paste code.  I
also don't believe that one blanket strategy "solves everything", but
from my POV we don't seem to have a strategy on this question at all.

> > Obviously there are very strong opinions in all directions here.  I'm fairly
> > pro-mirroring/subrepos and fairly against having a giant monolithic repo of
> > all the things.
>
> Hmm, why would you prefer m-c/talos to be a subrepo as opposed to a normal
> subdirectory of m-c?

If talos is a subrepository or is mirrored to m-c, development
continues to take place in hg.m.o/build/talos.

Another reason I tend against the "move all the things to m-c"
strategy is that it discourages collaboration and makes reuse harder.
From the point of view of "Talos as a standalone piece of software
(for performance testing of Firefox)", anyone who wants to
collaborate on the project can work on Talos as a standalone tool.
Personally, when I have been interested in adding functionality to
standalone tools for other OSS projects, if the first step is "check
out a large repository", I lose interest.  While the feeling is not
universal, I have often heard from others who feel the same.

For reuse, I currently have one active project and several past
projects that have consumed Talos code as a library.  While releasing
Talos to pypi (something we don't do, and I don't know how :jmaher
feels about it) would make this at least possible, it is (effectively)
impossible to use "Talos the python package" as a (normal) dependency
in downstream code.

Personally, I would like to see Mozilla aim for more modularity and
more of "Mozilla as a platform", not less.  But that again is a larger topic.

<snip/>

> >  That said, if we actually had a strategy here, instead of
> > variations on this theme coming up every three months and having this
> > argument, I would at least be somewhat happier even if I disagreed with the
> > outcome.  Now, as best I can tell we have no real strategy and what our
> > strategy is seems to change from project to project, test framework to test
> > framework, etc. If there is a pattern, I can neither discern it nor have I
> > seen it documented.
> 
> No, you're right.  I'm not aware of any global strategy on this either. 
> Which is why we're having this discussion.  :-)
> 
> > As far as technical concerns, I haven't heard anyone come up with a strategy
> > about what to do with the pywin32 dependency (outside of the fact we don't
> > want it anyway) or any cost-estimate on how much releng manhours it would
> > take to revamp all the infrastructure (as a talos developer, outside of
> > concern for trying to preserve sanity in Mozilla's infrastructure, I
> > shouldn't really care; I just have to check out another repo.  If I was with
> > releng...I would probably feel much more frustrated). And there's still the
> > matter of the pagesets which can't legally live in tree.
> 
> I don't know what the pywin32 dependency issue is, since you have only
> mentioned that it exists.  I don't also know what the RelEng side of work
> will look like, but in a small unofficial chat that I had with a few of the
> RelEng folks the other day at the office, they seemed to indicate that they
> just use the zip file on the build server, so they don't really care where
> the code lives.
> 
> Given that the above is actually true (and it would be great if someone from
> RelEng can confirm that please), and that we can find a solution to the
> pywin32 dependency problem, would you still object to moving the source code?

Yes, again mostly for reasons of modularity.  But I'm not the owner of Talos.

> > If I saw that developer convenience to run talos tests outweighed these
> > concerns, I would be more empathetic.  I've personally done a lot in the
> > last year from changing Talos from a system that was very hard to install,
> > required an apache setup, and otherwise required production configuration,
> > into a real piece of software that can be installed in the usual python way.
> > I would like to continue to make this easier.  I don't think it is a huge
> > amount to ask to checkout talos in a virtualenv, run setup.py develop, and
> > run the tests (and hopefully complain to me if something goes wrong).  And
> > as said I am perfectly open to mirroring or subrepositories as solutions if
> > `make tsvg` is worth the high but unestimated number of manhours to get our
> > infrastructure to use in-tree talos
>
> The amount of work that has gone through Talos is indeed astonishing, and I
> can attest to that as someone who has personally gone through the pain of
> running the initial versions of "Standalone Talos"... something which IIRC
> took me a few days to accomplish back in those days!  :-)

> But speaking as a developer, it is not a matter of laziness that I request
> for the Talos code to be moved to m-c.  Here are the reasons why I think it
> would be a useful change:

> 1. More visibility to all of the developers working on Core and Firefox, as
> opposed to only the people who work specifically on Talos.

I would tend to solve this problem through improved communication and tools.

> 2. Easier tracking of changes alongside with m-c changes.  For example, if I
> want to know how the changes in Talos and the layout module have played
> together over the past month, I need to be able to issue a command like |hg
> log layout/ talos/|.  Knowing the snapshot level history of Talos changes in
> the talos.json file is definitely better than not detecting any dependency
> at all, but is not enough at all, since it hides the history of changes,
> which is probably more important than major diffs of snapshots.  The current
> alternative today is to log both repositories in two terminal windows, and
> hack your way around by matching change dates in your head and doing
> guesswork.

Again, I would tend towards tools to help with this problem.  FWIW, I
do believe tests should live in mozilla-central, and would encourage
that.  Test harnesses... not so much.

> 3. Making it easier for more developers to run Talos locally by adding build
> system support for making it easier for developers to run those tests.  Note
> that we do have historical data on how well keeping test suites outside of
> mozilla-central works.  We have had long debates about merging in the
> jetpack and mozmill test suites, and the teams working on those projects
> have resisted for various reasons.  The current state of those tests is that
> people hide them on TBPL, and when somebody lands something that breaks
> them, the usual answer from the developers is shrugging, and pointing out
> that it takes them too much effort to figure out where to get the test suite
> from and how to run it, and the fact that this would be the first time that
> they've ever looked at that test suite does not help either.  You may
> dislike this reaction, and I can sympathize, but this is the fact of the
> matter as history has taught us.  And those test suites are not taken
> seriously by developers to this day, which is sad.

I think it will be hard to take Talos data seriously until we have
more rigorous statistics in place and make the results self-evident
to developers.  While it may be easy to figure out from e.g. the
datazilla json format whether a changeset has caused a significant
performance regression, we do not currently have easy (and rigorous)
ways of analyzing this.

I'm going to de-CC myself from this bug as I fear I am derailing the
discussion, which, believe it or not, is not my intent, and because
most of what I have to say has to do with the direction Mozilla is
going as a whole with respect to "Mozilla as a platform".  I am happy
to discuss all of these issues in (possibly too great) depth in other
channels.
I think the two sides of the argument have mostly said what they have to say, I personally don't have a lot to add to this discussion.

According to https://wiki.mozilla.org/Modules/Core#Testing_Infrastructure, Clint is the owner of Talos.  I'm assigning this bug to him since I believe he is the one who needs to make the call here.  :-)
Assignee: nobody → ctalbert
(In reply to Ehsan Akhgari [:ehsan] from comment #29)
> 3. Making is easier for more developers to run Talos locally by adding build
> system support for making it easier for developers to run those tests.

I'm not sure this actually requires having talos in m-c; if we had a

./mach talos dromaeojs

(say), that would transparently pull the right files from wherever they live, I think that would be sufficient for most developers.
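A `mach` front-end like this would mostly be a matter of reading the talos.json snapshot already in the tree and fetching the matching Talos bits. A minimal sketch of that resolution step, assuming a talos.json schema with `talos_repo`/`talos_revision` properties under a `global` key (the exact layout is an assumption, not the real file):

```python
import json

def resolve_talos_source(talos_json_text):
    """Return (repo, revision) describing the Talos checkout a
    hypothetical `mach talos` command would fetch and run."""
    manifest = json.loads(talos_json_text)
    # Field names mirror the talos_repo/talos_revision properties
    # discussed in this bug; the nesting under "global" is assumed.
    repo = manifest["global"]["talos_repo"]
    rev = manifest["global"]["talos_revision"]
    return repo, rev

example = '''
{
  "global": {
    "talos_repo": "https://hg.mozilla.org/build/talos",
    "talos_revision": "abcdef123456"
  }
}
'''

repo, rev = resolve_talos_source(example)
print(repo, rev)
```

The point of the indirection is exactly what jlebar describes: developers type one command, and the snapshot pinned in the tree decides which Talos they get.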
After looking over this thread yet again and thinking about the problem we are trying to solve, I don't see any value in putting talos into m-c.  While ctalbert is the owner of talos, :jhammel and myself have been the two people driving the refactoring and expansion of talos with the help of a lot of community members and a few folks from inside of Mozilla.

Talos is fairly easy to run if you follow the instructions.  We can figure out how to make it run from 'make talos dromaeojs' which will solve the core problem here.  Hacking on talos and the test cases is not the common use case and in the rare case that is needed you will have the source code available from where you are running it locally.
I'm sorry for the late reply. I forgot to hit submit.

(In reply to Ehsan Akhgari [:ehsan] from comment #29)
> Given that the above is actually true (and it would be great if someone from
> RelEng can confirm that please), and that we can find a solution to the
> pywin32 dependency problem, would you still object to moving the source code?
> 

Yes, we download a talos.zip that the a-team gives us to upload to build.mozilla.org.
Where the source lives does not affect us.
We're hoping to have
(broken comment - probably that is why I did not hit submit)
I don't know what we were hoping for :P if I remember I will reply.
(In reply to comment #33)
> After looking over this thread yet again and thinking about the problem we are
> trying to solve, I don't see any value in putting talos into m-c.  While
> ctalbert is the owner of talos, :jhammel and myself have been the two people
> driving the refactoring and expansion of talos with the help of a lot of
> community members and a few folks from inside of Mozilla.
> 
> Talos is fairly easy to run if you follow the instructions.  We can figure out
> how to make it run from 'make talos dromaeojs' which will solve the core
> problem here.  Hacking on talos and the test cases is not the common use case
> and in the rare case that is needed you will have the source code available
> from where you are running it locally.

At the risk of repeating myself yet again, the ability of looking at the history of Talos intertwined with the history of m-c is also valuable.
talos is run as a released version from a point in time.  Not every checkin we do gets deployed.  So looking at what changes are in talos for a given time range does not tell you what was actually being run on tbpl.  

Remember, buildbot uses talos.zip as a released version of talos, downloaded from a private link on the build.mozilla.org network.

On the point of talos being the cause of regressions: probably only 1% of all talos regressions have been caused by the harness/toolchain.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
At the risk of starting another war, I filed bug 789266 on making it easier to run Talos locally.
Sorry I'm late to the party. I've had an even worse fire to fight this week.

There are lots of good reasons here from both sides. The way we run Talos has nothing to do with what is in any tree. Talos is a released software project. We stage and QA version x as a release, we upload it to a place the build infra can get to it, and then that release is used as the bits that run the tests on the changesets being tested.

Because of that ^ reason alone, I *don't* want to put Talos in m-c. I think that having the Talos code in m-c will spawn even more confusion: a new developer comes along, sees the code in m-c, modifies it, and then nothing happens in the automation. That would only serve to make them more confused.  

And because Talos is so critical and even small changes can have an impact on the noise (in the current version), I don't want every single commit in m-c to be treated as a "new version" of Talos. Every version of Talos is run through many runs on staging servers before it is ever deployed. And I don't want to switch to a system where we don't stage our changes to Talos before we deploy them.

What I *do* think we should do, which is exactly what jlebar filed the next bug for is make this process easier for developers to run Talos locally and on try.  Testing on Try is a critical piece to how we develop Talos (because then we can see the numbers, trust they run the same environments etc), but the method we use to do that is not your standard method because we don't want the things we push to try to be picked up by the tests that are happening on inbound etc.

So, I agree with Joel here that the value vs. the trouble of putting Talos in m-c isn't worth it.  What I think *is* worth it, and *is* long overdue is a developer focused design on how to run talos both locally on your machine and on try. There is no contention around making this system easier to use. Jeff and Joel have been working on that for months as time allows. And the entire thrust behind the Datazilla/Signal From Noise project has been to make Talos easier to use and less noisy.

I'm hoping we can use the new mach utility in both cases to give developers a simple interface for working with Talos.  Let's follow up on bug 789266 and ensure that our changes to make Talos seamlessly easy to use are amenable to your day to day work flows.
Based on experience and various advancements since this was last discussed, I believe the going consensus is now that we do, in fact, want to do this. The only question is when, as many things need to change. Probably we want to migrate Android away from Talos first (this is already planned), as it would be the source of additional complexity in tackling this. In any case, reopening to reflect the current status.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
See Also: → 1139487
this is a good idea, and the current thought is to put talos in-tree and backport changes needed for android to the existing repo as needed.

we need to sort out:
page_load_test/
  - dromaeo/*
  - kraken/*
  - v8_7/*
  - canvasmark/*
  - webgl/*
startup_test/media/tools

in addition we should remove:
* talos/mobile_profile
* talos/places_generated_max
* talos/places_generated_med
* talos/specialpowers
* talos/startup_test/fennecmark
* talos/remotePerfConfigurator.py
* talos/test.py (test definitions for android)
* talos/PerfConfigurator.py (remote specific code)
* talos/ (references to mozdevice)
* requirements.txt (remove mozdevice, cache_flusher)
* create_talos_zip.py (whole file)
* talos/ffprocess_remote.py (might need other cleanup here)
* talos/breakpad/* (need to figure out if we need it, or can use in tree)

other things to sort out:
* how to run talos from './mach talos' using in-tree
* updating mozharness to get talos from in-tree (maybe the new archiver bits)
* do we need to include in a tests.zip style format?
Thanks Joel for sorting this out! A few comments:

(In reply to Joel Maher (:jmaher) from comment #41)
> 
> other things to sort out:
> * how to run talos from './mach talos' using in-tree
> * updating mozharness to get talos from in-tree (maybe the new archiver bits)

I suspect that will be done at the same time, since we have to tell mozharness to use the in-tree code for desktop, and './mach talos' uses mozharness.

> * do we need to include in a tests.zip style format?

If this was android specific, I'd vote no, so we don't maintain stuff we have no use for yet (maybe we'll never need it).


Also, I'm thinking about starting cleanup (removing android support) on another cloned repo - I can use my bitbucket or github account for that. So we could do the initial merge in tree based on something cleaner.
I wonder if we could start using a new 'no-android' talos branch already?

I thought the android versions of the harness were launched in another way (using the talos.zip property in talos.json, and run by another harness script - maybe testing/mozharness/scripts/android_panda_talos.py). So they don't use the talos_repo and talos_revision properties.

In this case even without in-tree we can do a no-android branch, and use that in talos.json for the talos_repo property.

Am I missing something?
a couple things:
* create_talos_zip.py - this requires a bit of hacks to run from in tree
* if we move in tree, then our way of running will change- possibly slight changes to talos
* android code makes talos messy- it would be nicer to have it somewhat clean when we start out

but you bring up a good point: if we can somehow continue to use the existing panda scripts to run android talos from the .zip like we currently do, then we don't have to worry about running from in-tree vs external-repo vs .zip.  I would like to double check create_talos_zip.py; if it doesn't require many (or any) changes, getting this in tree would be much more realistic.

Still, many of the files we have outlined above can be removed, it would be great to remove what we can.
(In reply to Joel Maher (:jmaher) from comment #44)
> but you bring up a good point, if we can somehow continue to use the
> existing panda scripts to run android talos from the .zip like we currently
> do, then we don't have to worry about running from in-tree vs external-repo
> vs .zip.  I would like to double check create_talos_zip.py, if that doesn't
> require much or any changes, it would be much more realistic to get this in
> tree.
> 
> Still, many of the files we have outlined above can be removed, it would be
> great to remove what we can.

Yep, I started a new branch for removing android from talos, see bug 1187684. The thing is that maybe we can work on this, merge it into the official talos repo, and use it in talos.json. We could separate the in-tree work from the cleaning this way.
Attached patch talos_into_common.tests.zip.diff (obsolete) — Splinter Review
Well, Talos now uses the latest mozbase packages, so we are ready to move it into m-c.

Here is how I see that:

 - copy talos from hg.mozilla.org/build/talos into testing/talos.
 - make changes to the build system to package talos inside a test zip file, maybe inside the already existing common.tests.zip, or maybe into a new talos.tests.zip file.
 - adapt testing/mozharness/mozharness/mozilla/testing/talos.py to use that instead of cloning from hg to run talos (I suspect we should inherit from MozbaseMixin - testing/mozharness/mozharness/mozilla/mozbase.py).
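The packaging step above is essentially "walk testing/talos and write it into an archive under a prefix". A minimal, self-contained sketch of what a stage-package style target does (paths and the helper name are illustrative, not the actual build-system code):

```python
import os
import tempfile
import zipfile

def stage_talos_zip(source_dir, zip_path, prefix="talos"):
    """Recursively pack source_dir into zip_path under the given
    prefix - roughly what a package-tests / stage-package target
    would do for testing/talos."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full = os.path.join(root, name)
                rel = os.path.relpath(full, source_dir)
                zf.write(full, os.path.join(prefix, rel))

# Tiny demo on a throwaway directory standing in for testing/talos:
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "talos-src")
os.makedirs(src)
with open(os.path.join(src, "INSTALL.py"), "w") as f:
    f.write("# placeholder\n")
out = os.path.join(tmp, "talos.tests.zip")
stage_talos_zip(src, out)
with zipfile.ZipFile(out) as zf:
    names = zf.namelist()
print(names)
```

Whether this lands in common.tests.zip or its own archive only changes `zip_path`; the staging logic is the same.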

:chmanchester, I'm asking you for feedback here:

 - does the overall process seem good to you?
 - the attached patch should put talos inside common.tests.zip (from what I understood of the code). Is this a good start?
Attachment #8646295 - Flags: feedback?(cmanchester)
Comment on attachment 8646295 [details] [diff] [review]
talos_into_common.tests.zip.diff

Review of attachment 8646295 [details] [diff] [review]:
-----------------------------------------------------------------

This is totally on the right track (and thank you for tackling this!), but it's a little hard to imagine without the code already moved to testing/talos and the stage-package target implemented there.

How big is testing/talos going to be? If it's more than very small we can just move it to talos.tests.zip.
Attachment #8646295 - Flags: feedback?(cmanchester) → feedback+
testing/talos is around 40M uncompressed, so I suspect we should move it into its own package. :)
So, this creates a talos.tests.zip file.

I pushed to try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5b1ef7f1d990

You can see in results dir that the zip file is here and seems good:
http://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/j.parkouss@gmail.com-5b1ef7f1d990/

This patch assumes that the talos code has been copied into testing/talos. I did that on try, but it will be done later in m-c with full talos history.

Also this is half of the work; the next step is to make mozharness use that zip file instead of cloning talos. I am working on that, waiting for results here:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=529cdc20b980

(I'll ask later for this mozharness patch to be reviewed)
Assignee: cmtalbert → j.parkouss
Attachment #8646295 - Attachment is obsolete: true
Status: REOPENED → ASSIGNED
Attachment #8653643 - Flags: review?(cmanchester)
Oh, I think I got the first talos in-tree job run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=746b83150c6e

The mozharness patch should probably be cleaned a bit - still, great news!
we really need to test this on all platforms- odd things show up sometimes!
So this patch assumes that we have a talos test zip file, and uses that in mozharness instead of cloning (see previous patch).

I tested locally with "./mach talos-test chromez" and it works fine.

Also pushed to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e021bfb27bd

So far, so good. :)
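On the mozharness side, the change boils down to fetching talos.tests.zip and extracting it where the old code used to `hg clone`. A rough, self-contained sketch of that extract step, with a hypothetical helper name (not the actual mozharness API):

```python
import os
import tempfile
import zipfile

def install_talos_from_zip(zip_path, work_dir):
    """Extract a talos test archive into work_dir and return the
    directory that would previously have been the hg clone target."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(work_dir)
    return os.path.join(work_dir, "talos")

# Demo with a synthetic archive standing in for talos.tests.zip:
tmp = tempfile.mkdtemp()
zip_path = os.path.join(tmp, "talos.tests.zip")
with zipfile.ZipFile(zip_path, "w") as zf:
    zf.writestr("talos/run_tests.py", "# placeholder\n")

talos_dir = install_talos_from_zip(zip_path, os.path.join(tmp, "build"))
print(os.path.exists(os.path.join(talos_dir, "run_tests.py")))
```

The rest of the harness (virtualenv setup, running the suite) can stay as it is; only the "where does the talos source come from" step changes.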
Attachment #8654235 - Flags: review?(jlund)
Comment on attachment 8653643 [details] [diff] [review]
787200_talos_test_zip.patch

Review of attachment 8653643 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me! We should get ted or another build peer to sign off on this before landing.
Attachment #8653643 - Flags: review?(cmanchester) → feedback+
Some (stripped) conversation that can help me because the patch in attachment 8653643 [details] [diff] [review] is failing on OSX 10.7.

21:52 <parkouss> jmaher: hm, it looks like my talos build started on a bad osx base https://treeherder.mozilla.org/#/jobs?repo=try&revision=2e021bfb27bd
21:54 <chmanchester> parkouss: that actually might be your patch :/
21:54 <jmaher> oh
21:54 <parkouss> jmaher: chmanchester ah ?
21:54 <chmanchester> there's some crazy code for unified builds that invoke make package multiple times
21:55 <chmanchester> s/package/package-tests/
21:55 <jmaher> chmanchester: why do you remember all this stuff; good catch
21:56 <parkouss> chmanchester: but I tried the zip creation on another try with success
21:56 <chmanchester> parkouss: another try on 10.7 builds?
21:56 <parkouss> hm
21:58 <parkouss> chmanchester: ah, you may be right
22:00 <chmanchester> parkouss: https://dxr.mozilla.org/mozilla-central/source/build/macosx/universal/flight.mk?offset=400#24 is where all that happens
22:01 <parkouss> chmanchester: erf, I thought I was there;
22:02 <chmanchester> parkouss: I guess it's choking on the minidump_stackwalk binary checked in to the talos repo
22:04 <chmanchester> parkouss: np, hope that's helpful! a way to hack around it might be to not run the unify script on the talos-stage subdir
22:04 <chmanchester> since, as we learned, there's no talos on 10.6 anymore
Depends on: 1200294
Comment on attachment 8654235 [details] [diff] [review]
787200_mozharness_talos_test_zip.patch

Review of attachment 8654235 [details] [diff] [review]:
-----------------------------------------------------------------

this code looks good. couple impl questions:

1) are we not going to pursue https://bugzilla.mozilla.org/show_bug.cgi?id=1188043 ? and instead put talos in test zip?
2) what's going to happen to talos.json. iirc - it managed the revision of talos we were cloning and mobile/desktop behaved differently.

sorry to ask questions that have obviously been solved; I'm just getting up to speed.

r+ assuming you figure out https://bugzilla.mozilla.org/show_bug.cgi?id=1200294
Attachment #8654235 - Flags: review?(jlund) → review+
(In reply to Jordan Lund (:jlund) from comment #56)

> 1) are we not going to pursue
> https://bugzilla.mozilla.org/show_bug.cgi?id=1188043 ? and instead put talos
> in test zip?

No, you're right; this is no longer required. I'll check with jmaher, but I think we can close that bug (i.e. WONTFIX).

> 2) what's going to happen to talos.json. iirc - it managed the revision of
> talos we were cloning and mobile/desktop behaved differently.

Well, it needs to stay for now; it still manages Talos for Android, and the suite definitions.
But I should get rid of the line for the Talos hg revision. :) I can update the patch for that.
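To illustrate what would change, talos.json has roughly this shape (revision hash and suite contents here are illustrative, not the real values); the talos_revision line is the one that becomes unnecessary for desktop once the code lives in-tree:

```json
{
  "global": {
    "talos_repo": "https://hg.mozilla.org/build/talos",
    "talos_revision": "c0de097a7159"
  },
  "suites": {
    "chromez": {
      "tests": ["tresize"]
    }
  }
}
```

The "suites" block (and the repo/revision for Android) would remain until those consumers also move to the in-tree copy.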

> sorry to ask questions that have obviously been solved; I'm just getting up
> to speed.

NP, this helps! Thanks. :)
No longer depends on: 1188043
Comment on attachment 8653643 [details] [diff] [review]
787200_talos_test_zip.patch

So now that we have gotten rid of the minidump Talos binaries, I hope we will be fine!

Pushed to try:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d44fe00454e
Attachment #8653643 - Flags: review?(ted)
Attachment #8653643 - Flags: review?(ted) → review+
Unfortunately, unify on OS X now chokes on testing/talos/talos/profiler/dump_syms_mac.
I filed bug 1201224 with a patch to try to fix that, but unfortunately it ran into bug 1201345.
Depends on: 1201224
OK, looks good except for the xperf test (?). jmaher, any thoughts? The in-tree copy is based on https://hg.mozilla.org/build/talos/rev/be34538a5581.
I think I see the problem with xperf: it is not finding:
'{talos}\talos\page_load_test\tp5n\tp5n.manifest'

in this file:
http://hg.mozilla.org/build/talos/file/3b94adbd66f1/talos/xtalos/xperf_whitelist.json#l8

We are missing the \talos\ there; that should be easy to add. We should add the new entries rather than replace the existing ones, for both tp5n.manifest and tp5n.manifest.develop.
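The resulting whitelist keys would look something like the sketch below (a guess based on the xperf_whitelist.json linked above; the empty objects stand in for whatever per-file data the real entries carry). Both the old and the new path layouts are kept, since the in-tree copy adds an extra \talos\ level:

```json
{
  "{talos}\\page_load_test\\tp5n\\tp5n.manifest": {},
  "{talos}\\talos\\page_load_test\\tp5n\\tp5n.manifest": {},
  "{talos}\\talos\\page_load_test\\tp5n\\tp5n.manifest.develop": {}
}
```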
Ok, thanks!

So, pushed to try one more time, with Talos tip and the fix mentioned above:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=61d7bbe76b18
Bug 787200 - Move the Talos code into mozilla-central: add talos in-tree, add items in xperf whitelist. r=jmaher
Attachment #8658344 - Flags: review?(jmaher)
Comment on attachment 8658344 [details]
MozReview Request: Bug 787200 - Move the Talos code into mozilla-central: add talos in-tree, add items in xperf whitelist. r=jmaher

https://reviewboard.mozilla.org/r/18585/#review16607

thanks!
Attachment #8658344 - Flags: review?(jmaher) → review+
Comment on attachment 8658343 [details]
MozReview Request: Bug 787200 - Move the Talos code into mozilla-central: add talos in-tree, using https://hg.mozilla.org/build/talos/rev/c0de097a7159. r=jmaher

https://reviewboard.mozilla.org/r/18583/#review16609

crazy stuff, but really great!
Attachment #8658343 - Flags: review?(jmaher) → review+
Well, my last try push was not using the fix from bug 1201224... So here is a new try push, now that it has landed in m-c:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=dad13e46d6be