Closed Bug 1229348 Opened 4 years ago Closed 4 years ago

Add a "valgrind-plain" suite to all_mochitest_suites

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set

Tracking

(firefox47 fixed)

RESOLVED FIXED
Tracking Status
firefox47 --- fixed

People

(Reporter: jseward, Assigned: jseward)

References

Details

Attachments

(4 files, 4 obsolete files)

This is a step in the direction of having a Tier-2 Valgrind Mochitest
run done in automation.  It builds on top of
bug 1185244 ("Improve mach support for running mochitests on Valgrind")

It also takes as prereq, JGraham's patch to add a |mozharness| command
to |mach|, in bug 1207377.

Aims of this bug are:

* Add a target |mochitest-valgrind| to |./mach mozharness|

* Select suitable suppression files and hand off to
  testing/mochitest/runtests.py

* Detect Valgrind errors in the output.

Most of the heavy lifting is taken care of already by bug 1185244.
Attached patch WIP patch (obsolete) — Splinter Review
WIP patch.  Makes |mach mozharness mochitest-valgrind| work.
Two concerns:

* The path for valgrind is hardwired as /usr/bin/valgrind.
  This does not seem good to me.  Is there a better way?

* Error (failure) detection does not seem to work.  The output
  contains lines like this, as created by the V output parser in
  bug 1185244
    13:39:44     INFO -  TEST-VALGRIND-ERROR ...
  but I still see, at the end:
    13:39:44     INFO -  4 INFO SimpleTest FINISHED
    13:39:44     INFO -  SUITE-END | took 152s
    13:39:44     INFO - Return code: 0
    13:39:44     INFO - TinderboxPrint: mochitest-valgrind-plain<br/>6/0/0
    13:39:44     INFO - # TBPL SUCCESS #
    13:39:44     INFO - The mochitest suite: valgrind-plain ran with return status: SUCCESS
  which doesn't seem right to me.
If this is a tier 2 type of test jobs + only Linux64 we can use the Taskcluster Linux builds to schedule this (rather than adding it to Buildbot).
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=tc%20linux

Please let me know when you have this working and we can look into scheduling it (atm only for Linux builds)
(In reply to Julian Seward [:jseward] from comment #1)
> Created attachment 8694178 [details] [diff] [review]
> WIP patch
> 
> WIP patch.  Makes |mach mozharness mochitest-valgrind| work.
> Two concerns:
> 
> * The path for valgrind is hardwired as /usr/bin/valgrind.
>   This does not seem good to me.  Is there a better way?

I'm not sure how this is usually done in mozharness, jlund may be able to give us a hint here.

> 
> * Error (failure) detection does not seem to work.  The output
>   contains lines like this, as created by the V output parser in
>   bug 1185244
>     13:39:44     INFO -  TEST-VALGRIND-ERROR ...
>   but I still see, at the end:
>     13:39:44     INFO -  4 INFO SimpleTest FINISHED
>     13:39:44     INFO -  SUITE-END | took 152s
>     13:39:44     INFO - Return code: 0
>     13:39:44     INFO - TinderboxPrint: mochitest-valgrind-plain<br/>6/0/0
>     13:39:44     INFO - # TBPL SUCCESS #
>     13:39:44     INFO - The mochitest suite: valgrind-plain ran with return
> status: SUCCESS
>   which doesn't seem right to me.

I think I see what's up here, let me look a bit more and upload something to try.
Attached patch vg_errorSplinter Review
This should case a failure when any line starts with "TEST-VALGRIND-ERROR". It will do so for all the harnesses, but I think that's ok.
(In reply to Chris Manchester [:chmanchester] from comment #5)
> Created attachment 8694461 [details] [diff] [review]
> vg_error
> 
> This should case a failure when any line starts with "TEST-VALGRIND-ERROR".
> It will do so for all the harnesses, but I think that's ok.

In other words, the second to last line should be "# TBPL FAILURE #".
(In reply to Chris Manchester (away Dec. 3-7) from comment #6)
> > This should case a failure when any line starts with "TEST-VALGRIND-ERROR".
> > It will do so for all the harnesses, but I think that's ok.
> 
> In other words, the second to last line should be "# TBPL FAILURE #".

Almost.  With the patch I get this:

13:12:35  WARNING - # TBPL WARNING #
13:12:35  WARNING - setting return code to 1
13:12:35  WARNING - The mochitest suite: valgrind-plain ran with return status: WARNING
Depends on: 1207377
(In reply to Julian Seward [:jseward] from comment #7)
I fiddled around with the "harness_error" clause in errors.py but failed
to figure out how to get TBPL FAILURE from it.
(In reply to Julian Seward [:jseward] from comment #8)
> (In reply to Julian Seward [:jseward] from comment #7)
> I fiddled around with the "harness_error" clause in errors.py but failed
> to figure out how to get TBPL FAILURE from it.

Sorry, that's my mistake in comment 6. TBPL WARNING will turn treeherder orange, and is the usual job status for a test failure and what we should expect from that patch, so I think that's actually appropriate here. In practice it's nearly synonymous with TBPL FAILURE, which turns the job red in treeherder.
> > 
> > * The path for valgrind is hardwired as /usr/bin/valgrind.
> >   This does not seem good to me.  Is there a better way?
> 
> I'm not sure how this is usually done in mozharness, jlund may be able to
> give us a hint here.
> 

hm, off the top of my head, we could leverage mozharness's which[1] to add some resiliency. problem there is that this hardcoded bit is in all_mochitest_suites. So, it be a little nasty but we could

1) change "--valgrind=/usr/bin/valgrind" to "--valgrind=%(valgrind)s" in all_mochitest_suites

2) add `"valgrind": "/usr/bin/valgrind"` to here[2] (so we have a fallback default value

3) then prior to using  all_mochitest_suites, do something like: `all_mochitest_suites % {'valgrind' self.which("valgrind")}`

the above would then search for valgrind exe in the PATH and use that if found. If not found, it will fall back to the hard coded path in self.config['exes']

However, the added complexity might not be worth the reward of resiliency. particularly if this is likely going to be the common valgrind location on all machines.

hope this helps

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#766
[2] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/unittests/linux_unittest.py#22
(In reply to Jordan Lund (:jlund) from comment #10)
> hope this helps

Thank you.  Per subsequent irc chat, I'll just stick with hardwiring /usr/bin/valgrind
for now, and we can switch later to the more complex variable-location approach if
that turns out to be necessary.
Attached patch Cleaned up patch (obsolete) — Splinter Review
Attachment #8694178 - Attachment is obsolete: true
Attachment #8695256 - Flags: review?(cmanchester)
Hi Julian,
You can try this on try in the following manner:
1) Add an entry to the build job you want to suscribe like this in try/job_flags.yml [1]
 * The key of your entry should match to the name in step 2
2) Add you job name in here [2] and an alias to match your job [3]
2) Create a yml file like this [4] - make your changes
NOTE: the next two steps would not be necessary if my patch being reviewed would have landed. You cannot land your patch with the next two patches. If my patch does not land on time we can talk again and see what changes you will need to land.
3) Add fx_unittest_generic_command.yml file to your tree [5]
4) Patch fx_unittest_base.yml in this way [6]
5) Apply this hack to create a "linux64_tc" platform [7]. This will prevent Buildbot jobs from being scheduled

Before you push to try, you can test your patch to see your job being added (at root level of gecko tree):
> ./mach taskcluster-graph --pushlog-id=99894 --project=try '--message=try: -b do -p linux64_tc -u all -t none' --owner=armenzg@mozilla.com --revision-hash=edc4eabdbcf8c4e88c003f2ffec47a3a60796eca | grep '\"symbol'

It doesn't matter that you use my name, pushlog-id and others. What you care about is that your job would get added.
When satisfied, push to try and you should see your job being added.

[1] https://hg.mozilla.org/try/diff/d7d2db6e8bc7/testing/taskcluster/tasks/branches/try/job_flags.yml#l1.83
[2] https://hg.mozilla.org/try/diff/d7d2db6e8bc7/testing/taskcluster/tasks/branches/base_job_flags.yml#l1.81
[3] https://hg.mozilla.org/try/diff/d7d2db6e8bc7/testing/taskcluster/tasks/branches/base_job_flags.yml#l1.29
[4] https://hg.mozilla.org/try/diff/d7d2db6e8bc7/testing/taskcluster/tasks/tests/fx_linux64_luciddream.yml
[5] https://hg.mozilla.org/try/diff/d7d2db6e8bc7/testing/taskcluster/tasks/tests/fx_unittest_generic_command.yml
[6] https://hg.mozilla.org/try/diff/d7d2db6e8bc7/testing/taskcluster/tasks/tests/fx_unittest_base.yml
[7] https://hg.mozilla.org/try/rev/ac118b1d519c
(In reply to Armen Zambrano Gasparnian [:armenzg] from comment #13)
> You can try this on try in the following manner:

Armen, thanks for the details.  There is one thing that I unfortunately
failed to mention.  These runs need to be done on the Valgrind-enabled
builds that already get made.  The normal linux64_{opt,debug} builds
won't work for this.  chmanchester mentioned in irc that it would be
necessary to upload those builds before running them on try.  So, I am 
unclear how that affects your instructions in comment #13.

I will try them any, though.
Flags: needinfo?(armenzg)
(In reply to Julian Seward [:jseward] from comment #14)
> Armen, thanks for the details.  There is one thing that I unfortunately
> failed to mention.  These runs need to be done on the Valgrind-enabled
> builds that already get made.  The normal linux64_{opt,debug} builds
> won't work for this.

Ignore all that.  The standard "Linux x64 opt" builds should be OK to use,
because they are done with --enable-valgrind --disable-jemalloc, at least
if I am reading the logs right.  Sorry for the noise.
I wrote this comment before I read comment 15. In any case, here they're are for posterity :)

Those instructions won't work without having a build.
I thought they would be triggered with the Linux build.
You will have to run the valgrind build to TaskCluster first (or at least run it side by side).

We can either (option A) run the Buildbot job through TaskCluster or (option B) make it worker inside of docker (also TaskCluster).

For now, let's focus on adding the build to your Try push through Buildbot/TaskCluster.
1- You can create a .yml file like this [1]; You will have to add the name "Linux x86-64 mozilla-central valgrind" instead and better filename
2- You will also need this .yml file [2]
3- Add valgrind_build to this [3]
4- Also add a reference to the file representing the build in here [4]
5- Also add your test from comment 13 in here [5] and allow your build to trigger this test

When you push to try you will have to use "valgrind_build" as the way to indicate that you want to trigger the TaskCluster valgrind_build you're defining (rather than the one that would be triggered automatically by the Buildbot system).

Again, you can use the ./mach command to determine what would get scheduled since it will give a json dump of what tasks would have been scheduled.

[1] https://hg.mozilla.org/projects/alder/file/tip/testing/taskcluster/tasks/buildbot_bridge_build/linux64_firefox_opt.yml
[2] https://hg.mozilla.org/projects/alder/file/tip/testing/taskcluster/tasks/buildbot_bridge_build.yml
[3] https://hg.mozilla.org/projects/alder/file/tip/testing/taskcluster/tasks/job_flags.yml#l7
[4] https://hg.mozilla.org/projects/alder/file/tip/testing/taskcluster/tasks/job_flags.yml#l61
[5] https://hg.mozilla.org/projects/alder/file/tip/testing/taskcluster/tasks/job_flags.yml#l119
Flags: needinfo?(armenzg)
Attachment #8695256 - Flags: review?(cmanchester) → review+
(In reply to On PTO until Jan. 4th - Armen Zambrano Gasparnian [:armenzg] from comment #13)
> Hi Julian,
> You can try this on try in the following manner:

Results are as follows.  Looks promising to my untutored eye.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=44a74eed308f

Looks OK to you?  The test job doesn't appear to have run, but afaics
that is the intention.
Attachment #8695910 - Attachment is obsolete: true
It seems you got most things right. Congrats on achieving scheduling on TaskCluster! :)

You need to change the mozharness script to the right one:
https://hg.mozilla.org/try/rev/44a74eed308f#l3.15
> python2.7: can't open file '/home/worker/workspace/mozharness/scripts/linux_unittest.py': [Errno 2] No such file or directory

I don't know if desktop_unittest.py [1] would be the right one for you or if you would need to write a new one.

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/desktop_unittest.py
Depends on: 1242979
Rationalisation of current patch set.

001-bug1229348-7.cset       -- provide "mach mozharness mochitest-valgrind"
                            -- r+ chmanchester already, but not landed

002-armenzg-sched-006.cset  -- enable scheduling on taskcluster

003-dont_use_test_media_devices.diff  -- temp hack, needs to be combined into
                                      -- 002- somehow

How to use:

(1) check out m-c, as usual
(2) apply all 3
(3) ./mach build && ./mach package && ./mach build package-tests
(4) ./mach try -b o -p linux64_tc -u mochitest-valgrind -t none

With just 001 and 003, it should be possible to run locally:

./mach mozharness mochitest-valgrind

With all 3 patches applied, it should be possible to run locally:

(python mozharness/scripts/desktop_unittest.py --config-file \
  mozharness/configs/unittests/linux_unittest.py --config-file \
  mozharness/configs/remove_executables.py --no-read-buildbot-config \
   --installer-url=file:///home/sewardj/MOZ/MC-AUTO2/ff-Og-linux64/dist/firefox-47.0a1.en-US.linux-x86_64.tar.bz2 \
   --test-packages-url=file:///home/sewardj/MOZ/MC-AUTO2/ff-Og-linux64/dist/test_packages.json \
   --mochitest-suite=valgrind-plain) 2>&1 | cat
Attachment #8695256 - Attachment is obsolete: true
See description in comment 21.
Attachment #8702373 - Attachment is obsolete: true
See description in comment 21.
Current results with steps (1) .. (4) as shown in comment 21 are

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6cf22d15e392

It bundles everything up, does a build, and AFAIKS correctly
starts to run mochitests on valgrind on the node.  Which completely
fails because valgrind isn't actually installed on the node :-)
Comment on attachment 8712167 [details] [diff] [review]
001-bug1229348-7.cset

Review of attachment 8712167 [details] [diff] [review]:
-----------------------------------------------------------------

This
Attachment #8712167 - Flags: review+
Blocks: 1245566
(In reply to Julian Seward [:jseward] from comment #22)
> Created attachment 8712168 [details] [diff] [review]
> 002-armenzg-sched-006.cset

This is not really part of this bug.  It has been re-homed to bug 1245566.
https://hg.mozilla.org/mozilla-central/rev/1df6bf08633c
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.