Closed Bug 1254964 Opened 4 years ago Closed 3 years ago

Enable manual taskcluster scheduling for valgrind-mochitest runs

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set

Tracking

(firefox53 fixed)

RESOLVED FIXED
Tracking Status
firefox53 --- fixed

People

(Reporter: jseward, Assigned: jseward)

References

Details

Attachments

(1 file, 3 obsolete files)

This bug builds on bug 1245566.  That added much of the framework
needed to allow taskcluster valgrind/mochitest runs using special
try syntax, but stopped short of completely solving the problem,
due to concerns described at
https://bugzilla.mozilla.org/show_bug.cgi?id=1245566#c8

This bug aims to finish the job, so that, once landed, it is possible
to schedule such a run merely by supplying the correct try syntax,
for example: -b o -p linux64_tc -u mochitest-valgrind -t none
Attached patch Patch for discussion (obsolete) — Splinter Review
https://bugzilla.mozilla.org/show_bug.cgi?id=1245566#c8
appears to list two areas of concern for the uncommitted sections:

[1] the "linux64" build got renamed to "linux64_tc"

[2] adding a "mochitest-valgrind" test description would cause the
    tests to get run for "-u all", which is not desired.

This patch attempts to address [1] by cloning the two "linux64" parts
so as to create "linux64_tc" equivalents, rather than simply renaming
them.

I do not know how to solve [2].  Any suggestions?
Attachment #8728390 - Flags: feedback?(armenzg)
I will be away until March 23rd.
Please find another reviewer before that, otherwise, I will get to it at that point.
My apologies for the inconvenience.
Attached patch bug1254964-2.cset (obsolete) — Splinter Review
With slight revisions after discussing w/ jmaher and dustin on irc.
Attachment #8728390 - Attachment is obsolete: true
Attachment #8728390 - Flags: feedback?(armenzg)
Attachment #8730715 - Flags: review?(dustin)
Comment on attachment 8730715 [details] [diff] [review]
bug1254964-2.cset

Review of attachment 8730715 [details] [diff] [review]:
-----------------------------------------------------------------

From bug 1245566 I think what you're trying to do here is to define a test job that does not run on try by default, but can be run on-demand.  I believe that the test job requires a specific build of Firefox (with Valgrind), and not the normal linux64 build.  Again from that bug, it sounds like the not-by-default is because the suite isn't green yet.  Is that about right?

For the not-green-yet bit, Armen's answer in that bug is good: keep a patch around that adds the job, and include that in your try pushes until the job is green.  Another (more expensive) option is to land that patch but hide the job in treeherder.  I think it's possible to define a job but not schedule it by default -- I just don't see it as a great option.

Aside from that, I'm confused by this patch -- it looks like linux64-mochitest-valgrind is an exact duplicate of linux64.  How would the resulting build differ?
Attachment #8730715 - Flags: review?(dustin)
(In reply to Dustin J. Mitchell [:dustin] from comment #4)

Thank you for looking at this in some detail.

> From bug 1245566 I think what you're trying to do here is to define a test
> job that does not run on try by default, but can be run on-demand.

Yes, correct.

> I believe that the test job requires a specific build of Firefox (with
> Valgrind), and not the normal linux64 build.

No, I don't think that the test job requires a (new) specific build of
Firefox.  As far as I can see, the standard linux64 opt build is done
with the relevant flags (--enable-valgrind, --disable-jemalloc).  I
think that is because there is already a short on-every-push Valgrind
run on try, "Linux x64 opt Valgrind Build (V)" and so the build needs
to be configured to make that work anyway.

And I believe that this test job just uses the "Linux x64 opt" build.

> Again from that bug, it sounds like the not-by-default is because the
> suite isn't green yet.  Is that about right?

Partially, but there are other reasons.  The not-by-default is because
a complete mochitest-on-valgrind-on-a-slow-EC2-instance run requires
about 200 CPU hours.  It was never the intention to have this be a
default on-every-push job.  Rather, the intention was for it to be a
Tier-2 job that is run once every 12 or 24 hours.

Making it green is difficult to say the least.  The runs (it is split
into 40 pieces) go orange not only because of Valgrind errors, but
also because of any oranges that would have occurred anyway even
running not on Valgrind.

Also there are timeout issues that occur more frequently than on
"normal" runs, because Valgrind slows down execution so much.  I have
dealt with some of them, but there is more work to do here.

> For the not-green-yet bit, Armen's answer in that bug is good: keep a patch
> around that adds the job, and include that in your try pushes until the job
> is green.

I understand that .. but for the reasons above, getting to a
completely green status will be difficult.

> Aside from that, I'm confused by this patch -- it looks like
> linux64-mochitest-valgrind is an exact duplicate of linux64.  How would the
> resulting build differ?

I think it won't differ, and we can just use the linux64 build, per
comments above.  Does that make sense?  I am slightly hazy about this
myself, but I _think_ what I say is correct.
OK, I think in that case that you just want the third hunk from your patch.  The second hunk is adding a new build job, which you don't need, and the first hunk is adding your job to the default try runs.

With that in place, you should be able to test a try push to run your new task.

Once you've done that, look at the "D" task for your try push, and you'll see your try syntax injected into a `mach taskcluster-graph` command.  You can verify the behavior for "-u all" by copy/pasting that command and adjusting the try syntax.  If your job appears in the resulting task graph (grep for it), then it has not been correctly excluded from -u all.
Attached patch bug1254964-7.cset (obsolete) — Splinter Review
Rebased.  Required try syntax: -b o -p linux64 -u mochitest-valgrind -t none
Attachment #8730715 - Attachment is obsolete: true
Attachment #8809782 - Flags: review?(dustin)
Comment on attachment 8809782 [details] [diff] [review]
bug1254964-7.cset

Review of attachment 8809782 [details] [diff] [review]:
-----------------------------------------------------------------

<y-u-no-mozreview obligatory />

::: taskcluster/ci/desktop-test/test-sets.yml
@@ +36,5 @@
>      - web-platform-tests-wdspec
>      - xpcshell
>  
> +# This is the same as all-tests-debug,
> +# with the addition of "mochitest-valgrind".

I filed bug 1316877 to make this easier to express.  Fine to land as-is for now, and I'll clean it up in that bug.

::: taskcluster/ci/desktop-test/tests.yml
@@ +539,5 @@
> +                    - remove_executables.py
> +        extra-options:
> +            - --mochitest-suite=valgrind-plain
> +    # Bug 1281241: migrating to m3.large instances
> +    instance-size: legacy

Removing this would move you from m1.medium to m3.large, which would likely get better performance.
Attachment #8809782 - Flags: review?(dustin) → review+
* switches instance size from m1.medium to m3.large, for extra perf
  (anti-timeout) headroom and lower latency

* adds "run-on-projects: []"
Attachment #8809782 - Attachment is obsolete: true
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4513f4dad100
Enable manual taskcluster scheduling for valgrind-mochitest runs.  r=dustin@mozilla.com.
https://hg.mozilla.org/mozilla-central/rev/4513f4dad100
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.