Closed Bug 1612345 Opened 4 years ago Closed 4 years ago

Better document retrigger with logging, fix broken cases, and support more test suites

Categories

(Testing :: General, enhancement, P2)

Version 3
enhancement

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

(Whiteboard: dev-prod-2020)

Attachments

(10 files)

Currently we have the ability, via Treeherder and custom actions, to retrigger jobs with special flags (such as --gecko-profile). There are other flags as well; I would like to find ways to simplify this for developers who are investigating failures.

This might be something to add to treeherder, or in-tree action tasks. Here is an example of an action task to retrigger a task with --gecko-profile:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/actions/gecko_profile.py

I think moving forward here are some steps:

  • find flags to add to env/browser/etc. for getting more logging
  • add support to do it via custom tricks
  • make shortcuts in pushhealth :)

Alex, could you outline some logging flags you would like to see for your use case and indicate where to add the flags?

Flags: needinfo?(achronop)

I would like to see Android included, as there have been similar requests from the geckoview team, and that case is slightly more complicated, at least for environment variables (setting the host env does not affect the device or test app).

:gbrown, are there some use cases you could identify here (or cc/needinfo users)?

We have several log flags that are all activated the same way so I will give an example with just one.

In a local run, all you need to do is to set the MOZ_LOG env. You can do it per run with something like:
MOZ_LOG=timestamps,MediaTrackGraph:4 ./mach run
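
The MOZ_LOG value in the command above is a comma-separated list of module:level pairs plus options such as timestamps. Below is a minimal Python sketch of building that string and an environment for a child process. The helper names build_moz_log and env_with_moz_log are hypothetical and are not part of mach.

```python
import os


def build_moz_log(modules, timestamps=True):
    """Return a MOZ_LOG string such as 'timestamps,MediaTrackGraph:4'.

    `modules` maps a log module name to a numeric log level.
    """
    parts = ["timestamps"] if timestamps else []
    parts += ["{}:{}".format(name, level) for name, level in modules.items()]
    return ",".join(parts)


def env_with_moz_log(modules, base_env=None):
    """Copy an environment (os.environ by default) and set MOZ_LOG in the copy."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = build_moz_log(modules)
    return env


if __name__ == "__main__":
    print(build_moz_log({"MediaTrackGraph": 4}))
```

The returned environment could then be handed to something like subprocess.run(["./mach", "run"], env=env), assuming a built tree.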

On a try run, in order to capture logs I modify specific files by appending the flag(s) that I would like to activate here and here. I am not sure if this is the best way to do it but it is reliable.

In general, I would expect that the logic will allow the user to specify the desired flags.

Flags: needinfo?(achronop)

(In reply to Alex Chronopoulos [:achronop] from comment #4)

We have several log flags that are all activated the same way so I will give an example with just one.

Which kind of jobs can take advantage of such flags? Builds? Tests? Perf tests?
I would like to determine the context.

In a local run, all you need to do is to set the MOZ_LOG env. You can do it per run with something like:
MOZ_LOG=timestamps,MediaTrackGraph:4 ./mach run

I tried that command locally and it is reported as an unknown command. Is this instead ./mach run-desktop?

On a try run, in order to capture logs I modify specific files by appending the flag(s) that I would like to activate here and here. I am not sure if this is the best way to do it but it is reliable.

Do you mean that you make code changes and then push to try?

Flags: needinfo?(achronop)

(In reply to Armen [:armenzg] from comment #5)

Which kind of jobs can take advantage of such flags? Builds? Tests? Perf tests?
I would like to determine the context.

I am not familiar with the context; I believe the answer here is Tests.

I've tried such command locally and it is an unknown command. Is this instead ./mach run-desktop?

You need to build firefox.

[17:36:34 firefox]$ ./mach run -h
usage: mach [global arguments] run [command arguments]

Run the compiled program, possibly under a debugger or DMD.
...

Do you mean that you make code changes and then push to try?

Yes, I change the files I pointed out and push on try.

Flags: needinfo?(achronop)

I believe we have the ability to rerun a task with customizations of the env variables, however, it is restricted to mochitests and reftests.

After evaluating the steps below, could you please answer the following?

  • Should this be better documented? and where?
  • Is the current feature sufficient to support your workflow?

I assume it might just be sufficient to reduce the restriction of just mochitests and reftests. The action is defined here.

STR:

  • Load a mochitest or reftest. For instance this
  • In the panel details look for the 3 dots icon ("Other job details")
  • Click "Custom actions..."
  • I modified the payload: changed MOZ_LOG, repeat of 1 and no runUntilFail [1]
  • You can see both jobs in here
  • The live log is here (until the job actually completes)

If you search for "MOZ_LOG" in the live log you will see that on line 760 the MOZ_LOG value is set.

Is this the kind of output you would expect to see?

[task 2020-01-30T18:17:17.345Z] GECKO(1752) | [Child 1831: GraphRunner]: D/MediaTrackGraph Moving tracks between suspended and runningstate: mTracks: 0, mSuspendedTracks: 1
I don't see such output in the non-custom mochitest.

[1]

environment:
  MOZ_LOG: 'timestamps,MediaTrackGraph:4'
logLevel: debug
path: ''
preferences:
  mygeckopreferences.pref: myvalue2
repeat: 1
runUntilFail: false
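
For anyone preparing this payload programmatically rather than typing it into the "Custom actions..." dialog, here is a rough Python sketch that builds a dict with the same keys as [1]. The function name make_retrigger_input and its defaults are hypothetical. Only the key names are taken from the payload above, and the action's real schema may accept or require more than this.

```python
def make_retrigger_input(moz_log=None, prefs=None, repeat=0,
                         run_until_fail=False, path="", log_level="debug"):
    """Build a dict shaped like the custom action payload shown in [1]."""
    if repeat < 0:
        raise ValueError("repeat must not be negative")
    payload = {
        "environment": {},
        "logLevel": log_level,
        "path": path,
        "preferences": dict(prefs or {}),
        "repeat": repeat,
        "runUntilFail": run_until_fail,
    }
    if moz_log:
        # Only set MOZ_LOG when the caller asked for extra logging.
        payload["environment"]["MOZ_LOG"] = moz_log
    return payload


if __name__ == "__main__":
    print(make_retrigger_input(moz_log="timestamps,MediaTrackGraph:4", repeat=1))
```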

That's awesome, thank you. I am pretty happy with this workflow. It is not easy to find if you don't know where to look, but now that I know how to do it, it is more than enough.

(In reply to Armen [:armenzg] from comment #7)

I believe we have the ability to rerun a task with customizations of the env variables, however, it is restricted to mochitests and reftests.

That's most of our tests. They are 70% of our test collection. If we can enable gtest, which is the remaining 30%, it will be everything.

After evaluating the steps below, could you please answer the following?

  • Should this be better documented? and where?

Please do. A separate page specific to logging (so that it shows up in a Google or wiki search), linked from the general try wiki page, would benefit a lot of people; that is the first place I would have looked. I am thinking of creating a wiki page for my teammates, so if you do not mind non-generic instructions we can go with that.

On top of that it would be equally beneficial to mention in the general wiki try page a handy way to create a new try run with logs activated, without modifying files, if possible.

  • Is the current feature sufficient to support your workflow?

Absolutely, I've verified the logs in the custom retrigger and that is the outcome that I was looking for.

I assume it might just be sufficient to reduce the restriction of just mochitests and reftests. The action is defined here.

As I said, if you consider adding gtests it will be golden for my workflow.

I will handle it next week. Happy to help :)

Assignee: nobody → armenzg
Status: NEW → ASSIGNED
Summary: expose and allow for "retrigger" with logging → Better document retrigger with logging & add gtest

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

:gbrown, are there some use cases you could identify here (or cc/needinfo users)?

My main goal is to achieve the same level of support for Android/geckoview as for desktop. All of the existing features in mochitest_retrigger_action (MOZ_LOG, --repeat, --runUntilFailure, prefs, etc.) are equally useful for geckoview. Some of those features probably already work on geckoview, but I'm sure anything involving environment variables will not.

Also, geckoview devs rely heavily on the geckoview-junit test suite, so it would be great if that suite was supported.

My first patch aims to include test-type tasks for gtest and geckoview-junit; however, those tasks don't have a test-type value defined.

Here's the code that sets test-type as tags

@transforms.add
def set_test_type(config, tests):
    for test in tests:
        for test_type in ['mochitest', 'reftest', 'talos', 'raptor']:
            if test_type in test['suite'] and 'web-platform' not in test['suite']:
                test.setdefault('tags', {})['test-type'] = test_type
        yield test

This second patch aims to permit this action for all tasks defined as kind: test.

We can either adjust the def set_test_type(...) to add test-type to a larger set of suites OR we make this action applicable for all kinds of tasks marked as kind:test.

The former option requires opting in each suite that will permit this kind of action, which means more trial and error.
The latter option will permit triggering this action on tasks that may not actually apply the requested changes, which could confuse developers who try it. Its advantage is that it does not require opting in specific suites.

Please specify the preferred approach. I prefer the latter.
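
To make the two options concrete, here is a small standalone Python sketch. It re-implements the quoted transform as a plain generator (no @transforms.add) and adds two hypothetical predicates, allowed_by_test_type and allowed_by_kind. The assumption that each task carries a "kind" entry in its tags is mine and should be checked against the real task definitions.

```python
TAGGED_SUITES = ["mochitest", "reftest", "talos", "raptor"]


def set_test_type(tests, test_types=TAGGED_SUITES):
    """Plain-generator version of the transform quoted above."""
    for test in tests:
        for test_type in test_types:
            if test_type in test["suite"] and "web-platform" not in test["suite"]:
                test.setdefault("tags", {})["test-type"] = test_type
        yield test


def allowed_by_test_type(task, supported=("mochitest", "reftest")):
    """Former option: opt in suite by suite through the test-type tag."""
    return task.get("tags", {}).get("test-type") in supported


def allowed_by_kind(task):
    """Latter option: accept every task tagged as kind 'test' (assumed tag)."""
    return task.get("tags", {}).get("kind") == "test"


if __name__ == "__main__":
    tasks = list(set_test_type([
        {"suite": "gtest", "tags": {"kind": "test"}},
        {"suite": "mochitest-media", "tags": {"kind": "test"}},
    ]))
    for task in tasks:
        print(task["suite"], allowed_by_test_type(task), allowed_by_kind(task))
```

With the current suite list, a gtest task never receives a test-type tag, so only the kind-based predicate accepts it.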

The current state of affairs can be seen in this push. It seems that I need to fix one more thing.

For the curious, here's the documentation for the context section.

achronop, gbrown, Could you please look around in this push and try some custom scheduling?

Flags: needinfo?(gbrown)
Flags: needinfo?(achronop)

I tried but none of my attempts actually ran any tests. I see --no-run-tests specified and then an attempt to run mach which results in "It looks like you are trying to run an unknown mach command:". :(

Flags: needinfo?(gbrown)

I've just tried a new run with a different log flag. We don't have that many gtests for MediaTrackGraph. I'll let you know when it is finished.

Flags: needinfo?(achronop)

achronop: Could you please check again? Thanks!

Flags: needinfo?(achronop)

Nvm. I need to look into something first.

Flags: needinfo?(achronop)

I copy here the output when I run locally one gtest with logs on:

$ MOZ_LOG=MediaTrackGraph:4 ./mach gtest TestAudioCallbackDriver.*
Running GTest tests...
Note: Google Test filter = TestAudioCallbackDriver.*
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from TestAudioCallbackDriver
[ RUN      ] TestAudioCallbackDriver.StartStop
[(null) 10493: Main Thread]: D/MediaTrackGraph 0x7f3025e9bbc0: AudioCallbackDriver ctor
[(null) 10493: Main Thread]: D/MediaTrackGraph 0x7f3025e9bbc0: AudioCallbackDriver 0x7f2ffd62e200 Falling back to SystemClockDriver.
[(null) 10493: Main Thread]: D/MediaTrackGraph Starting thread for a SystemClockDriver  0x7f2ffd25d6c0
[(null) 10493: Main Thread]: D/MediaTrackGraph Starting new audio driver off main thread, to ensure it runs after previous shutdown.
[(null) 10493: MediaTrackGrph]: D/MediaTrackGraph Starting a new system driver for graph 0x7f2ffd25d6c0
[(null) 10493: MediaTrackGrph]: W/MediaTrackGraph 0x7f2ffd25d6c0: Global underrun detected
[(null) 10493: MediaTrackGrph]: D/MediaTrackGraph 0x7f2ffd25d6c0: Time did not advance
[(null) 10493: CubebOperation #1]: D/MediaTrackGraph 0x7f3025e9bbc0: AsyncCubebOperation::INIT driver=0x7f2ffd62e200
[(null) 10493: CubebOperation #1]: D/MediaTrackGraph Effective latency in frames: 512
[(null) 10493: CubebOperation #1]: D/MediaTrackGraph AudioCallbackDriver State: STARTED
[(null) 10493: CubebOperation #1]: D/MediaTrackGraph 0x7f3025e9bbc0: AudioCallbackDriver started.
[(null) 10493: Main Thread]: D/MediaTrackGraph 0x7f3025e9bbc0: Releasing audio driver off main thread (GraphDriver::Shutdown).
[(null) 10493: CubebOperation #1]: D/MediaTrackGraph 0x7f3025e9bbc0: AsyncCubebOperation::SHUTDOWN driver=0x7f2ffd62e200
Couldn't convert chrome URL: chrome://branding/locale/brand.properties
[10493, Main Thread] WARNING: Could not get the program name for a cubeb stream.: 'NS_SUCCEEDED(rv)', file /home/achronop/repos/mozilla/firefox/dom/media/CubebUtils.cpp, line 381
[(null) 10493: CubebOperation #1]: D/MediaTrackGraph AudioCallbackDriver State: STOPPED
[       OK ] TestAudioCallbackDriver.StartStop (202 ms)
[----------] 1 test from TestAudioCallbackDriver (202 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (203 ms total)
[  PASSED  ] 1 test.
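
When comparing a local run like this one against a CI live log, it can help to pull out only the lines logged by one module. The following Python sketch does that with a regular expression written against the line shapes visible in this bug ("[process pid: thread]: LEVEL/module message"). The pattern and the helper name filter_module are assumptions based on these samples, not a specification of the MOZ_LOG format.

```python
import re

# Matches e.g. "[(null) 10493: Main Thread]: D/MediaTrackGraph message".
# re.search is used so that CI prefixes such as "[task ...] GECKO(1752) | "
# are skipped.
MOZ_LOG_LINE = re.compile(
    r"\[(?P<process>[^\[\]\s]+) (?P<pid>\d+): (?P<thread>[^\]]+)\]: "
    r"(?P<level>[A-Z])/(?P<module>\S+) (?P<message>.*)"
)


def filter_module(lines, module):
    """Yield (thread, level, message) for lines logged by `module`."""
    for line in lines:
        match = MOZ_LOG_LINE.search(line)
        if match and match.group("module") == module:
            yield (match.group("thread"), match.group("level"),
                   match.group("message"))


if __name__ == "__main__":
    sample = [
        "[(null) 10493: MediaTrackGrph]: W/MediaTrackGraph 0x7f2ffd25d6c0: Global underrun detected",
        "Couldn't convert chrome URL: chrome://branding/locale/brand.properties",
    ]
    for entry in filter_module(sample, "MediaTrackGraph"):
        print(entry)
```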

I tried using the custom re-trigger for mochitests but it did not work. The run is in [1]. The log file is very small and there is a failure message at the end.

[1]https://treeherder.mozilla.org/#/jobs?repo=try&author=achronop%40gmail.com&selectedJob=288524781

Whiteboard: dev-prod-2020

Armen is moving to another project. gbrown will work to figure out next steps on this for specific test harnesses, and Sarah (in a few weeks) can help out with Treeherder-related work items.

Status: ASSIGNED → NEW
Assignee: armenzg → gbrown
Priority: -- → P2

:armenzg - I intend to continue where you left off in comment 12; if you have any other work-in-progress, tips, thoughts, etc. please let me know.

Flags: needinfo?(armenzg)

Would you mind meeting with me next week to go over it?
I am available every day except Wednesday.

My suggestion is moving to one action script per suite rather than trying to use one for many.
It would also be good to talk about how to make these kinds of changes easy to test locally without having to use the CI.

Flags: needinfo?(armenzg)
See Also: → 1499673, 1502034
Summary: Better document retrigger with logging & add gtest → Better document retrigger with logging, fix broken cases, and support more test suites

Simple update to strings and names for the custom retrigger action, in preparation
for the addition of more tasks.

Add a --setpref option to geckoview-junit with the same meaning and
help description as used in mochitest.
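
As a rough illustration of what a --setpref option involves, here is a Python sketch of a repeatable PREF=VALUE argument parsed into a dict. It is not the geckoview-junit code: the dest name extra_prefs, the help text, and the parse_prefs helper are hypothetical, and values are kept as plain strings.

```python
import argparse


def parse_prefs(pairs):
    """Turn ['a.b=1', 'c.d=x'] into {'a.b': '1', 'c.d': 'x'}."""
    prefs = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError("expected PREF=VALUE, got {!r}".format(pair))
        prefs[name] = value
    return prefs


def make_parser():
    parser = argparse.ArgumentParser()
    # action="append" lets the option be given several times.
    parser.add_argument("--setpref", action="append", dest="extra_prefs",
                        default=[], metavar="PREF=VALUE",
                        help="set a Gecko preference for the test run")
    return parser


if __name__ == "__main__":
    args = make_parser().parse_args(
        ["--setpref", "mygeckopreferences.pref=myvalue2"])
    print(parse_prefs(args.extra_prefs))
```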

The retrigger custom action is busted for Android tasks, failing with "KeyError: u'remote_webserver'",
because it assumes a mozharness configuration format that was changed long ago. This patch brings
things up to date.

Keywords: leave-open
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1e4b56bff5a1
Generalize the retrigger-mochitest action; r=bc
https://hg.mozilla.org/integration/autoland/rev/2caf817caafd
Support --setpref in geckoview-junit; r=bc
https://hg.mozilla.org/integration/autoland/rev/524ea269c239
Fix retrigger support for android mochitest variable parameters; r=bc

Still to do:

  • add geckoview-junit
  • add gtest
  • better defaults (or new UI?)
  • add other suites that might already be mostly supported
  • environment pass-through for android
  • chunk support (keep chunk arguments of original task)
  • allow override of harness arguments
  • crashreporting in new task (esp symbols-path)
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/99aa249db142
Add custom retrigger support for geckoview-junit; r=bc

Convert the gtest option parser from optparse to argparse. mochitest, reftest,
and other suites use argparse. Using argparse will simplify the integration
of gtest with the custom retrigger action.
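
For readers unfamiliar with the two modules, this Python sketch shows the general shape of such a conversion. The option names are illustrative only and are not claimed to match the real gtest runner. The comments show the optparse call each argparse line would replace.

```python
import argparse


def make_gtest_parser():
    """Illustrative argparse parser; the options shown are hypothetical."""
    parser = argparse.ArgumentParser(description="Illustrative gtest options")
    # optparse: parser.add_option("--cwd", dest="cwd", default=".")
    parser.add_argument("--cwd", default=".",
                        help="working directory for the test run")
    # optparse: parser.add_option("--symbols-path", dest="symbols_path")
    parser.add_argument("--symbols-path", dest="symbols_path", default=None,
                        help="location of crash symbols")
    # optparse returned leftover positionals in a separate list;
    # argparse declares them explicitly.
    parser.add_argument("gtest_filter", nargs="?", default="*",
                        help="Google Test filter expression")
    return parser


if __name__ == "__main__":
    print(make_gtest_parser().parse_args(["TestAudioCallbackDriver.*"]))
```

One practical benefit of a shared argparse style is that a caller can build any suite's parser and inspect or override its defaults in the same way.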

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/53ad80b01022
Convert gtest argument parser to argparse; r=bc

Update the default values to avoid common pitfalls, such as trying to repeat
a 30-minute task 30 times with extra logging!
The new defaults allow a simple re-run of most tasks with no changes.
While we are here, tweak the parameter descriptions.

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fdbb08f3e279
Change defaults for custom retrigger action; r=bc

Add test package mach support for gtest and hook into the custom retrigger
action. Some existing custom retrigger features, like setting gecko prefs,
are not (easily) applicable to gtest, which doesn't use mozprofile; for
this reason, use a separate action context with items suitable for gtest.

See Also: → 1623635
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9e60199e3597
Add custom retrigger support for gtest; r=bc

Various updates to the custom retrigger action so that, without any custom changes to
parameters, the retriggered task runs with the same parameters as the original task.
Several issues were found and corrected, notably:

  • parameters like --allow-software-gl-layers were ignored
  • MOZHARNESS_TEST_PATHS was ignored
  • using repeat=1 by default meant that each test ran twice
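
The repeat issue in the last bullet comes down to simple arithmetic: the repeat value counts extra runs on top of the first one. This Python sketch shows that, together with the idea of keeping the original task's arguments and appending only explicit overrides. The function names are hypothetical and the flag spellings are assumptions modelled on mochitest. This is not the action's actual code.

```python
def total_runs(repeat):
    """Repeat counts additional runs, so repeat=1 means two runs in total."""
    if repeat < 0:
        raise ValueError("repeat must not be negative")
    return repeat + 1


def harness_args(original_args, repeat=0, run_until_fail=False):
    """Keep the original arguments; append flags only for explicit overrides."""
    args = list(original_args)
    if repeat > 0:
        args.append("--repeat={}".format(repeat))
    if run_until_fail:
        args.append("--run-until-failure")
    return args


if __name__ == "__main__":
    print(total_runs(1))
    print(harness_args(["--allow-software-gl-layers"]))
```

With repeat defaulting to 0, an unmodified retrigger passes through exactly the original arguments and runs each test once.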

(In reply to Geoff Brown [:gbrown] from comment #28)

Still to do:

  • add geckoview-junit (done)
  • add gtest (done)
  • better defaults (or new UI?) (done)
  • add other suites that might already be mostly supported
  • environment pass-through for android
  • chunk support (keep chunk arguments of original task)
  • allow override of harness arguments
  • crashreporting in new task (esp symbols-path)
  • windows/mac/android-hw (generic-worker) support
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/27f290b2a3e9
Ensure that most custom retriggers repeat the original task by default; r=bc

The custom retrigger actions work well on linux and android-em, but fail
on windows, osx, and android-hw. At least part of the problem seems to be
the worker implementation, but I am not entirely clear on what goes wrong.
It looks like I won't have much more time for retrigger improvements in the
near future, so I'd prefer to "turn off" the actions on tasks known to fail.
I found helpful examples for the 'context' parameter in
https://searchfox.org/mozilla-central/source/taskcluster/docs/actions.rst
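
As I understand the 'context' parameter, it is a list of tag-sets, and an action is offered on a task when the task's tags contain every key/value pair of at least one tag-set. Here is a minimal Python sketch of that matching rule. The tag name "worker-implementation" and the example context are assumptions for illustration and are not copied from the patch.

```python
def action_applies(context, task_tags):
    """Return True if task_tags satisfies at least one tag-set in context."""
    return any(
        all(task_tags.get(key) == value for key, value in tag_set.items())
        for tag_set in context
    )


# Hypothetical context: offer the action only on docker-worker tasks.
RETRIGGER_CONTEXT = [
    {"test-type": "mochitest", "worker-implementation": "docker-worker"},
    {"test-type": "reftest", "worker-implementation": "docker-worker"},
]


if __name__ == "__main__":
    linux = {"test-type": "mochitest", "worker-implementation": "docker-worker"}
    windows = {"test-type": "mochitest", "worker-implementation": "generic-worker"}
    print(action_applies(RETRIGGER_CONTEXT, linux))
    print(action_applies(RETRIGGER_CONTEXT, windows))
```

Under this rule, a context of [{}] matches every task, while an empty list matches none.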

Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/818a39ddca16
Restrict custom retrigger to docker-worker tasks; r=bc

I'm declaring victory here. The main remaining element I was waiting for was environment pass-through for android, but there is a separate bug for that. Otherwise, I think the retrigger action is in good shape and reasonably documented now.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Keywords: leave-open
Blocks: 1690174