Cannot run any test suites on linux64-jsdcov or linux64-ccov.

RESOLVED FIXED

Status

Testing
Code Coverage
8 months ago
5 months ago

People

(Reporter: sparky, Unassigned)

Tracking

(Blocks: 2 bugs)

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

8 months ago
Recently, while attempting to fix some other problems, I discovered that we can no longer specify any tests to run at all. Tests seem to run only when '-p all -u all' is specified (which is why they are still running on m-c). For any other specification I cannot start a single test, whether I use 'mochitest-bc', 'all', etc.

Here's the latest revision that I've used that fails: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4baf70869cbb65912c558096101b32986663b128

This is the last good revision that I have (about a month and a half ago): https://hg.mozilla.org/try/rev/5bf30443017e2616ee14bf38423d5ba0eae9c2ba
(Reporter)

Comment 1

8 months ago
Dustin, would you know of any recent changes that could have caused this problem? 

At first I thought that the change from using an empty run-on-projects array to one with 'mozilla-central' could've caused this, but it seems that we have this change in the revision that is working. We also have the talos changes which are the most recent changes that we've made, IIRC.
Flags: needinfo?(dustin)
Nothing comes to mind.  You might experiment with some `./mach taskgraph target-graph` on various revisions, using a parameters.yml from one of those try pushes, to see when the change occurred.
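
A rough sketch of how the comparison between revisions could be done in Python. It assumes the output of `./mach taskgraph target-graph` for each revision has already been saved as plain text with one task label per line (an assumption about the output format). The function names and the sample labels are made up for illustration.

```python
def parse_labels(text):
    """Return the set of non-empty, stripped lines (one task label per line)."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def missing_tasks(good_output, bad_output, needle=""):
    """List labels present in the good graph but absent from the bad one.

    `needle` optionally restricts the result to labels containing a substring.
    """
    good = parse_labels(good_output)
    bad = parse_labels(bad_output)
    return sorted(label for label in good - bad if needle in label)


# Made-up labels, for illustration only.
good = """build-linux64-ccov/opt
test-linux64-ccov/opt-mochitest-browser-chrome-1
test-linux64/opt-xpcshell-1
"""
bad = """build-linux64-ccov/opt
test-linux64/opt-xpcshell-1
"""

print(missing_tasks(good, bad, needle="ccov"))
```

Running this over the saved output of successive revisions would show the first revision at which the coverage test tasks drop out of the target graph.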
Flags: needinfo?(dustin)
(Reporter)

Comment 3

8 months ago
Thanks for the quick reply Dustin, will do.
(Reporter)

Comment 4

8 months ago
So far, I've managed to find that this regression is coming from one of the changes made in this bug: https://bugzilla.mozilla.org/show_bug.cgi?id=1384905

Right now, I suspect that the problem is coming either from an additional check_run_on_projects call added on line 583 of 'try_option_syntax.py' (https://reviewboard.mozilla.org/r/162046/diff/7#index_header), or from something in the 'take 2' patch.
(Reporter)

Comment 5

8 months ago
Created attachment 8899135 [details] [diff] [review]
test_fix.diff

So I've managed to find that the problem is coming from the addition of the 'run_by_default' variable which changed the logic slightly. There's also the last return which was changed from 'True' in the old revision's file to 'False' in the current file revision.

The patch I have attached fixes the problem and it seems to be working so far but I'll test it a bit more first to make sure it doesn't introduce any other regressions.
(Reporter)

Comment 6

8 months ago
I've tested the patch given in comment 5 and it fails when I do '-b do -p all -u all' but passes for everything else. The main problem area is here: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/try_option_syntax.py#588-589

When I change the return shown there to True, everything works except for what was fixed in bug 1384905. So, I'm wondering if there is a way we can differentiate between build platforms (linux64-jsdcov/ccov and their tests) and platforms that seem to ride along with other platform names (for example, Windows 8 being specified with '-b d -p win64 -u xpcshell[Windows 8]'; I'm not sure what the technical term for this is). I noticed a newer flag, 'built-projects', in the test schema [1] that isn't used in 'match_test', but I'm not sure if using that would be the right way to go about this.

Dustin, would you have any ideas about how these cases could be differentiated? 


[1]: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/tests.py#177
Flags: needinfo?(dustin)
Try is such a tangled mess.  A "ridealong" is a build platform that comes along when another build platform is specified.  At the moment these are mostly things that the author thought "when someone types -p linux64 they should get my stuff too".

In the case you give, Windows 8 is a test platform (specifically, it's aliased to the windows8-64 test platform - https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/try_option_syntax.py#136).
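
To make the two terms concrete, here is a minimal Python sketch. It is not the actual code in try_option_syntax.py: the table names, the helper functions, and the ridealong entry are hypothetical, and only the 'Windows 8' to 'windows8-64' alias is taken from the explanation above.

```python
# Hypothetical tables; only the "Windows 8" alias comes from this bug.
TEST_PLATFORM_ALIASES = {"Windows 8": "windows8-64"}
RIDEALONGS = {"linux64": ["linux64-example-ridealong"]}  # made-up entry


def expand_build_platforms(requested):
    """Add the ridealong build platforms for each requested one, keeping order."""
    expanded = []
    for platform in requested:
        for name in [platform, *RIDEALONGS.get(platform, [])]:
            if name not in expanded:
                expanded.append(name)
    return expanded


def resolve_test_platform(name):
    """Map a user-facing alias to its test platform; other names pass through."""
    return TEST_PLATFORM_ALIASES.get(name, name)


print(expand_build_platforms(["linux64", "win64"]))
print(resolve_test_platform("Windows 8"))
```

The point of the sketch is that a ridealong widens the set of build platforms, while an alias only renames a test platform.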

built-projects is a way of specifying a run_on_projects for a test job that matches what the run_on_projects value is for the build.  But run_on_projects is ignored if `-p` is given in try.  So, yes, I think this comes down to something broken in bug 1384905.
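
The rule that an explicit `-p` overrides run_on_projects can be sketched roughly as follows. This is a simplified, hypothetical stand-in written for illustration, not the real matching code; the function name, its parameters, and the use of `None` to mean "no explicit platform list" are all assumptions.

```python
def task_selected(task_platform, run_on_projects, project, requested_platforms):
    """Hypothetical simplification of the rule described above.

    requested_platforms is None when no explicit platform list was given.
    """
    if requested_platforms is not None:
        # An explicit platform list wins; run_on_projects is not consulted.
        return task_platform in requested_platforms
    # No explicit list: fall back to the task's run_on_projects value.
    return "all" in run_on_projects or project in run_on_projects


# A coverage task restricted to mozilla-central, pushed to try:
print(task_selected("linux64-ccov", ["mozilla-central"], "try", ["linux64-ccov"]))
print(task_selected("linux64-ccov", ["mozilla-central"], "try", None))
```

Under this rule, naming the platform explicitly should be enough to select the task on try, which is the behaviour that appears to have regressed.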

I don't see a bright future for try option syntax (-p, -b, etc.).  The easiest fix might be to use `./mach try fuzzy` here instead.  I expect something like `./mach try fuzzy -q 'linux64-ccov mochitest'` would find what you want.
Flags: needinfo?(dustin)
(Reporter)

Comment 8

8 months ago
Ah, thanks for the explanations! I was thinking that we could hard-code a list of build platforms that don't run by default and use that list at [1], but that may be far from ideal.

Does the 'try fuzzy' command change the tests that are run and/or the number of times they are run, or will everything stay the same?

Another thing we can do here is let linux64-jsdcov/ccov run by default but prevent them from uploading artifacts, so that Active Data won't get overloaded.

[1]: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/try_option_syntax.py#588-589
Flags: needinfo?(dustin)
Andrew is working on adding ways to modify the tests run with mach try fuzzy -- bug 1387135.  For the moment it just runs each task once with the "normal" arguments.
Flags: needinfo?(dustin)
(Reporter)

Comment 10

8 months ago
Ok, thank you. Would you also by any chance know if this works on Windows?

I'm getting this error when I try to use it with any build and test combination: https://pastebin.mozilla.org/9030207
(Reporter)

Comment 11

8 months ago
Dustin, the ni? is for comment 10.
Flags: needinfo?(dustin)
I think that's a question for :ahal.
Flags: needinfo?(dustin) → needinfo?(ahalberstadt)
I think you are seeing that because you're using a new parameters.yml (by default it downloads the latest one from mozilla-central) on an old version of m-c. If you rebase onto the latest m-c tip that error should go away.

Alternatively, you can take the parameters.yml from the m-c revision that you're currently based on (look under artifacts in Treeherder for that push) and pass it in via -p.
Flags: needinfo?(ahalberstadt)
To clarify, yes I expect it to work on Windows, so if you still have problems please let me know!
(Reporter)

Comment 15

8 months ago
Thanks for the help :ahal! I managed to get around that last error but I'm getting another one now: https://pastebin.mozilla.org/9030358

It's telling me that I am missing a commit with try syntax. At first, I thought we didn't need one, so I didn't have it, but even after I added a try syntax commit, I still see the error. Would you have any ideas for how I can fix this (or what I'm doing wrong)?
(Reporter)

Comment 16

8 months ago
ni? for comment 15.
Flags: needinfo?(ahalberstadt)
Sorry, forgot to mention you also need a recent copy of version-control-tools (./mach mercurial-setup -u)
Flags: needinfo?(ahalberstadt)
(Reporter)

Comment 18

8 months ago
Thanks again :ahal, it worked: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e06a73c0803ed15a7545fb46666e9c48882142f7
So, we can definitely continue with using this tool instead. 

We'll also have to update our code coverage docs (quick-start guide and official MDN docs) because of this change.


Joel, Kyle, Marco, Armen, do you have any thoughts or concerns about going this route? There is one other solution that I can think of, and that is to let jsdcov and ccov run on try by default.
This is perfectly fine. Thanks for looking into it!
I also agree, this seems reasonable. If we run into many problems we can either move more people to fuzzy, or try to solve this in a different way.
Just to note, you can use -q <query> to avoid the interactive interface, so you could craft some pre-built queries that select the things you want and share them with people. I'm also close to landing bug 1390969, which will let you use the --preset/--save arguments (like try syntax has). So you'll be able to do, e.g.:
./mach try --preset jsdcov
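
Until presets land, shared pre-built queries could be kept in a small helper like the Python sketch below. The preset names and the 'linux64-jsdcov' query are hypothetical; the 'linux64-ccov mochitest' query and the `-q` option are the ones mentioned earlier in this bug. The helper only builds the command line and does not run mach.

```python
import shlex

# Hypothetical preset names mapped to fuzzy queries.
PRESET_QUERIES = {
    "ccov-mochitest": "linux64-ccov mochitest",  # query suggested in this bug
    "jsdcov": "linux64-jsdcov",                  # made-up query
}


def fuzzy_command(preset):
    """Build the argument list for a non-interactive `mach try fuzzy` run."""
    return ["./mach", "try", "fuzzy", "-q", PRESET_QUERIES[preset]]


# Print a copy-pasteable shell command for one preset.
print(shlex.join(fuzzy_command("ccov-mochitest")))
```

Keeping the queries in one shared table means everyone pushes the same task selection, and the table can be dropped once --preset/--save is available.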
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #20)
> I also agree, this seems reasonable- if we run into many problems we can
> either move more people to fuzzy, or try to solve this in a different way.

I'm very much in favor of moving to fuzzy (well, more generally to using tryselect mechanisms, of which fuzzy is a great example) as quickly as possible.
I forgot about this and was struggling on bug 1335518 to get a build with tests running.
If we can't support "./mach try -b o -p linux64-ccov -u all -t none" anymore, we should really update the documentation.
(In reply to Greg Mierzwinski [:gmierz] from comment #24)
> Sorry about that Marco, I've updated the documents:
> https://developer.mozilla.org/en-US/docs/Mozilla/Testing/Measuring_Code_Coverage_on_Firefox

No problem and thanks for updating the docs! I was already aware of this bug, but totally forgot about it since it happened while I was on vacation :)
(Reporter)

Comment 26

7 months ago
No problem! :)

:ahal, would you have a link to some documentation on these tryselect mechanisms that you're working on? (i.e. --preset/--save and fuzzy). I'd like to link to them in the Mozilla code coverage documentation.
Flags: needinfo?(ahalberstadt)
Unfortunately it's only documented in |mach try fuzzy --help|, so for now I guess you'll just have to tell people to run that. Bug 1397433 is on file to add proper docs though. I hope to get around to this sooner rather than later.
Flags: needinfo?(ahalberstadt)
(Reporter)

Comment 28

5 months ago
Closed because the solution works for us. See the documentation here: https://developer.mozilla.org/en-US/docs/Mozilla/Testing/Measuring_Code_Coverage_on_Firefox
Status: NEW → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED