Closed Bug 631842 Opened 13 years ago Closed 11 years ago

Add and use a mach command to run the Valgrind test job

Categories

(Firefox Build System :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla29

People

(Reporter: philor, Assigned: n.nethercote)

References

Details

Attachments

(3 files, 10 obsolete files)

5.40 KB, patch
ted
: review+
Details | Diff | Splinter Review
1.94 KB, patch
catlee
: review+
Details | Diff | Splinter Review
4.68 KB, patch
gps
: review+
Details | Diff | Splinter Review
Rather than having the instructions for imitating the tinderbox Valgrind build be something like

1. Look at http://hg.mozilla.org/build/tools/file/894a93c77304/scripts/valgrind/valgrind.sh and figure out which parts are specific to buildbot and which aren't
2. Do the parts that aren't

it would be nice to have the instructions be

1. Build with --enable-valgrind --disable-jemalloc
(1.5 ??? what am I going to do about suppression since I'm not using some old CentOS with a random set of upgraded libs?!)
2. make -C objdir valgrind
Probably should have a mach command, too.
Summary: Add a make target for Valgrind → Make Valgrind-on-TBPL runs easy to emulate locally (via a make target and/or mach command)
Let's just do a mach command. Make targets are so out of fashion. "mach valgrind" seems a little too generic, so maybe something like "mach valgrind-test"? It would simply invoke the same command as "make pgo-profile-run" but with the Valgrind args baked in:
http://hg.mozilla.org/mozilla-central/annotate/7476bb2b8e9c/testing/testsuite-targets.mk#l403

(we could plausibly replace the "make pgo-profile-run" invocation in valgrind.sh with this mach invocation then, which would be interesting.)

It might be sensible to have the mach command error out if the build you're running it on isn't --enable-valgrind --disable-jemalloc.
Blocks: 946011
Attached patch Implement |mach valgrind-test|. (obsolete) — Splinter Review
This works.  It re-implements the code from
http://hg.mozilla.org/build/tools/file/default/scripts/valgrind/valgrind.sh#l73
onwards.

It's my first mach command, so I'm sure to have committed some
style atrocities.  I ended up putting it in python/mach_commands.py because
that let me use self.python_executable.

ted: in profileserver.py I changed it to use default builds rather than
packaged builds, because that makes sense for local use.  But if you think it's
important to use packaged builds on the test machines I can add an argument to
|mach valgrind-test| to allow either behaviour.

I'll change valgrind.sh to use this new command in a later patch.
Attachment #8347045 - Flags: review?(ted)
Attachment #8347045 - Flags: review?(gps)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
(In reply to Nicholas Nethercote [:njn] from comment #3)
> But if you think it's important to use packaged builds on the test machines (...)

It is. There are significant differences between packaged and non-packaged builds, and they do have an effect on the profile data.
I'm happy to allow for packaged and non-packaged, but note that this new command will only be used for the Valgrind jobs.  The normal PGO builds will still use the |pgo-profile-run| make target.
> I'm happy to allow for packaged and non-packaged, but note that this new
> command will only be used for the Valgrind jobs.  The normal PGO builds will
> still use the |pgo-profile-run| make target.

Oh, but they both use the profileserver.py script.  So you're right -- I'll need to make it to work both ways, and expose that in the mach command.
Comment on attachment 8347045 [details] [diff] [review]
Implement |mach valgrind-test|.

I've removed the r? request for ted, because glandium has answered that question.
Attachment #8347045 - Flags: review?(ted)
Attachment #8347045 - Flags: review?(gps)
This version adds the following features:

- |mach test v| and |mach test V| work.

- I added a --unstaged option to profileserver.py which causes it to not be run
  from the staged package, and I use that for |mach valgrind-test|.  All the
  other uses of profileserver.py are unchanged.

r?ing gps for the mach command, and ted for the profileserver.py and local.py
changes.
Attachment #8347820 - Flags: review?(ted)
Attachment #8347820 - Flags: review?(gps)
Attachment #8347045 - Attachment is obsolete: true
I haven't actually tested this because I don't have a build slave.  I figure
I'll debug it as necessary on the production servers, since (a) the jobs aren't
visible on TBPL, and (b) I've been doing plenty of that lately anyway. O_o

The good news is that this patch moves the Valgrind invocation from the tools
repo to mozilla-central.  So once this patch is working, future changes to the
Valgrind invocation can be tested via try server.

The invocation should be the same as it currently is, with the exception that
Firefox is no longer run out of the staged package, because |mach
valgrind-test| uses the unstaged build.
Attachment #8347822 - Flags: review?(catlee)
> - I added a --unstaged option to profileserver.py which causes it to not be
> run
>   from the staged package, and I use that for |mach valgrind-test|.  All the
>   other uses of profileserver.py are unchanged.

Actually, I wonder if I should just copy profileserver.py and make a new valgrind.py file.  It's more a historical accident than anything else that the Valgrind jobs runs the PGO code, and in the future I'd like to explore making the Valgrind job run more stuff, to try to catch more errors.  That would also make bug 823787 a lot easier, because I wouldn't have to understand why Windows PGO jobs are returning a non-zero exit code.  Yeah, I'll do that.
Attachment #8347820 - Flags: review?(ted)
Attachment #8347820 - Flags: review?(gps)
Product: Firefox → Core
This patch clones build/pgo/profileserver.py as
build/valgrind/valgrind_test.py, with two changes.

- Unstaged builds are used.

- Exit failure is detected.

The test files are also cloned.  They will change in the future in order to
improve the Valgrind coverage.
Attachment #8347840 - Flags: review?(ted)
Attachment #8347841 - Flags: review?(gps)
Attachment #8347820 - Attachment is obsolete: true
Attachment #8347822 - Attachment is obsolete: true
Attachment #8347822 - Flags: review?(catlee)
Summary: Make Valgrind-on-TBPL runs easy to emulate locally (via a make target and/or mach command) → Add and use a mach command to run the Valgrind test job
> (part 2) - Implement |mach valgrind-test|.

One problem with this is that redirecting the output doesn't always work.  Basic redirects work, but this doesn't:

  mach valgrind-test 2>&1 | tee x

Nothing gets printed the terminal or to the file named 'x'.  I suspect this is related to the use of subprocess.call to call the valgrind_test.py script?
This revised version doesn't copy all the js-input/ and blueprint/ files.
Attachment #8347940 - Flags: review?(ted)
Attachment #8347840 - Attachment is obsolete: true
Attachment #8347840 - Flags: review?(ted)
No longer blocks: 946011
gps, ted, catlee:  this is the last bug that needs to be fixed before the Valgrind test job can be made visible on TBPL!  I'd *love* to get that done before the Christmas shutdown.
Two minor changes:

- I added a --suppressions arg to |mach valgrind-test|, which gets passed
  through to Valgrind.  This is useful for suppressing errors in system
  libraries that aren't covered by the built-in suppression files.

  Note that |mach test v --suppressions=<file>| doesn't work;  I guess the
  aliasing doesn't handle arguments?

- I move the setting of G_SLICE into the mach command.

I'm not planning to change this patch again :)
Attachment #8348459 - Flags: review?(gps)
Attachment #8347841 - Attachment is obsolete: true
Attachment #8347841 - Flags: review?(gps)
No need to set G_SLICE now that the mach command does it.
Attachment #8348460 - Flags: review?(catlee)
Attachment #8347842 - Attachment is obsolete: true
Attachment #8347842 - Flags: review?(catlee)
(In reply to Nicholas Nethercote [:njn] from comment #14)
> > (part 2) - Implement |mach valgrind-test|.
> 
> One problem with this is that redirecting the output doesn't always work. 
> Basic redirects work, but this doesn't:
> 
>   mach valgrind-test 2>&1 | tee x
> 
> Nothing gets printed the terminal or to the file named 'x'.  I suspect this
> is related to the use of subprocess.call to call the valgrind_test.py script?

Output from mach can be... special.

There is a run_process() helper in mach.mixin.process.ProcessExecutionMixin (inherit that class) that runs processes in an "ordained" manner and ensures output is captured appropriately.
I don't want coal in my stocking, so I'm going to try to get this reviewed this week. It's not helping that others have the same before-Christmas aspirations as you and my review queue is nice and clogged.
I was just testing on a build slave.  We need $PYTHON because the default
python is 2.6.6, and mach needs 2.7.  And I fixed the path to mach.
Attachment #8349135 - Flags: review?(catlee)
Attachment #8348460 - Attachment is obsolete: true
Attachment #8348460 - Flags: review?(catlee)
I tried using ProcessExecutionMixin.run_process but failed miserably.  Oh well,
follow-up fodder.

I also added an |is_firefox| condition, just for laughs.
Attachment #8349147 - Flags: review?(gps)
Attachment #8348459 - Attachment is obsolete: true
Attachment #8348459 - Flags: review?(gps)
Attachment #8349135 - Flags: review?(catlee) → review+
Comment on attachment 8347940 [details] [diff] [review]
(part 1) - Clone the PGO profile test script and data for use in the Valgrind test.

Review of attachment 8347940 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry, didn't realize how trivial this was!

::: build/valgrind/index.html
@@ +1,4 @@
>  <script>
>  /* This Source Code Form is subject to the terms of the Mozilla Public
>   * License, v. 2.0. If a copy of the MPL was not distributed with this
>   * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

Is there any reason to copy this file currently? Seems like you might as well just leave it where it is until you want to make actual changes.

::: build/valgrind/valgrind_test.py
@@ +71,5 @@
> +
> +    # Note that the |finally| block below will always run.
> +    # http://docs.python.org/2/library/sys.html#sys.exit says that sys.exit
> +    # "is implemented by raising the SystemExit exception, so cleanup actions
> +    # specified by finally clauses of try statements are honored".

You don't need to put this comment here unless you really want to. I don't think we need to document language features like this.
Attachment #8347940 - Flags: review?(ted) → review+
> Is there any reason to copy this file currently? Seems like you might as
> well just leave it where it is until you want to make actual changes.

Fair enough.  I've reverted that part.

> You don't need to put this comment here unless you really want to. I don't
> think we need to document language features like this.

Ok!
Two down, one to go :)
Comment on attachment 8349147 [details] [diff] [review]
(part 2) - Implement |mach valgrind-test|.

Review of attachment 8349147 [details] [diff] [review]:
-----------------------------------------------------------------

This looks mostly good. Just some minor comments.

::: python/mach_commands.py
@@ +142,5 @@
> +        metavar='FILENAME',
> +        help='Specify a suppression file for Valgrind to use. Multiple '
> +            '--suppression args can be used to specify multiple suppression '
> +            'files.')
> +    def valgrind_test(self, suppressions):

I think this should be in build/valgrind/mach_commands.py or even testing/mach_commands.py. python/mach_commands.py definitely seems like the wrong place for it.

@@ +160,5 @@
> +            '--show-possibly-lost=no',
> +            '--track-origins=yes'
> +        ]
> +        for s in suppressions:
> +            debugger_args.append('--suppressions=' + s)

What if s has spaces in it? Would it be better to just use the multiple arguments (no =) form?

@@ +166,5 @@
> +        build_dir = os.path.join(self.topsrcdir, 'build')
> +        supps_dir = os.path.join(build_dir, 'valgrind')
> +        debugger_args.append('--suppressions=' + os.path.join(supps_dir, 'cross-architecture.sup'))
> +
> +        machtype = subprocess.check_output(['bash', '-c', 'echo $MACHTYPE']).rstrip()

Why can't we get this from os.environ? Please document if there is a good reason.

@@ +182,5 @@
> +
> +        script = os.path.join(build_dir, 'valgrind', 'valgrind_test.py')
> +        return subprocess.call([self.python_executable, script,
> +                                '--debugger=valgrind',
> +                                '--debugger-args=' + ' '.join(debugger_args) + ''],

This seems like a fragile escaping problem.

You can also consider using imp.load_source() to obtain a module object for a .py file and calling some method on that module natively. I haven't looked at valgrant_test.py, but presumably you could avoid the whole argument parsing dance altogether.
Attachment #8349147 - Flags: review?(gps) → feedback+
> I think this should be in build/valgrind/mach_commands.py or even
> testing/mach_commands.py. python/mach_commands.py definitely seems like the
> wrong place for it.

As per comment 3, I ended up putting it in python/mach_commands.py because that let me use self.python_executable.


> > +        for s in suppressions:
> > +            debugger_args.append('--suppressions=' + s)
> 
> What if s has spaces in it? Would it be better to just use the multiple
> arguments (no =) form?

I don't understand if the second question pertains to mach or Valgrind.  Valgrind doesn't allow more than one suppression file per --suppression option, and the '=' is mandatory.


> > +        machtype = subprocess.check_output(['bash', '-c', 'echo $MACHTYPE']).rstrip()
> 
> Why can't we get this from os.environ? Please document if there is a good
> reason.

It's an odd bash-only variable.  I'll add a comment.


> @@ +182,5 @@
> > +
> > +        script = os.path.join(build_dir, 'valgrind', 'valgrind_test.py')
> > +        return subprocess.call([self.python_executable, script,
> > +                                '--debugger=valgrind',
> > +                                '--debugger-args=' + ' '.join(debugger_args) + ''],
> 
> This seems like a fragile escaping problem.
> 
> You can also consider using imp.load_source() to obtain a module object for
> a .py file and calling some method on that module natively. I haven't looked
> at valgrant_test.py, but presumably you could avoid the whole argument
> parsing dance altogether.

I didn't want to deviate from the current structure too much.  It's *really* awkward at the moment because the valgrind invocations are currently done via the valgrind.sh script in the tools/ repo, which means that I can't do any kind try run that modifies them.  So I desperately want to get this mach command landed so I can change valgrind.sh to use it, and then the valgrind invocation is controlled within mozilla-central, which makes everything relating to this ten times easier.
This version just removes the --suppression argument.  It's a nicety that's
only needed for local runs.  It's not needed for TBPL runs, so can be deferred.

I've left the command in python/mach_commands.py due to the
self.python_executable requirement.
Attachment #8349788 - Flags: review?(gps)
Attachment #8349147 - Attachment is obsolete: true
(In reply to Nicholas Nethercote [:njn] from comment #28)
> > I think this should be in build/valgrind/mach_commands.py or even
> > testing/mach_commands.py. python/mach_commands.py definitely seems like the
> > wrong place for it.
> 
> As per comment 3, I ended up putting it in python/mach_commands.py because
> that let me use self.python_executable.

Ahhh. That code from python/mach_commands.py needs to DIAF... because on your MozbuildObject, you can now do:

self.virtualenv_manager.python_path

You can also do crazy stuff like activate the virtualenv and install packages from pip from mach commands. See e.g. tools/docs/mach_commands.py.

> > @@ +182,5 @@
> > > +
> > > +        script = os.path.join(build_dir, 'valgrind', 'valgrind_test.py')
> > > +        return subprocess.call([self.python_executable, script,
> > > +                                '--debugger=valgrind',
> > > +                                '--debugger-args=' + ' '.join(debugger_args) + ''],
> > 
> > This seems like a fragile escaping problem.
> > 
> > You can also consider using imp.load_source() to obtain a module object for
> > a .py file and calling some method on that module natively. I haven't looked
> > at valgrant_test.py, but presumably you could avoid the whole argument
> > parsing dance altogether.
> 
> I didn't want to deviate from the current structure too much.  It's *really*
> awkward at the moment because the valgrind invocations are currently done
> via the valgrind.sh script in the tools/ repo, which means that I can't do
> any kind try run that modifies them.  So I desperately want to get this mach
> command landed so I can change valgrind.sh to use it, and then the valgrind
> invocation is controlled within mozilla-central, which makes everything
> relating to this ten times easier.

I like how you think.

Perfect is the enemy of landed. Get me a new patch to review so I can rubber stamp this for you.
This version uses self.virtualenv_manager.python_path, and moves the command to
build/valgrind/mach_commands.py.
Attachment #8349790 - Flags: review?(gps)
Attachment #8349788 - Attachment is obsolete: true
Attachment #8349788 - Flags: review?(gps)
Comment on attachment 8349790 [details] [diff] [review]
(part 2) - Implement |mach valgrind-test|.

Review of attachment 8349790 [details] [diff] [review]:
-----------------------------------------------------------------

Land it!
Attachment #8349790 - Flags: review?(gps) → review+
A follow-up (part 2b) to fix a minor error in the mach command exposed on the build slaves:
https://hg.mozilla.org/mozilla-central/rev/73c93e4a4d30
And another follow-up for the tools repo (part 3b) to use an appropriate python and print an error message on failure for TBPL to pick up:
https://hg.mozilla.org/build/tools/rev/a6ee69bf24cf
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
And part 3c, to avoid mach's initialize-then-quit behaviour:
https://hg.mozilla.org/build/tools/rev/41396c0f77c6
And with that all done, it's now actually working.  Runs with errors show up red;  runs without errors show up green.  Yay.

The current structure isn't ideal -- e.g. it would be better for the mach command to print the TEST-UNEXPECTED-FAILURE message -- but at least it's working.  I'll clean things up in the new year.
Blocks: 952397
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: