Closed Bug 1046694 Opened 7 years ago Closed 7 years ago

Split Gij and Gip tests into several chunks and run on Cedar

Categories

(Release Engineering :: General, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: julienw, Assigned: sbruno)

References

Details

Attachments

(4 files, 11 obsolete files)

23.25 KB, patch
kmoir
: review+
sbruno
: checked-in+
Details | Diff | Splinter Review
5.55 KB, patch
kmoir
: review+
sbruno
: checked-in+
Details | Diff | Splinter Review
1019 bytes, patch
kmoir
: review+
sbruno
: checked-in+
Details | Diff | Splinter Review
1.93 KB, patch
massimo
: review+
sbruno
: checked-in+
Details | Diff | Splinter Review
Gij and Gip already support splitting, and they take a lot of time.

It's done differently for each though.

Gij:
We need 2 env variable:
* NBPARTS : how many chunks should we split this list with?
* PART : which part to run in _this_ job (1 to NBPARTS)

(We can rename it using "CHUNK" terminology if necessary)

These days, I think we can split in 4 parts.

Gip:
We have 3 different suites, that use the SUITE env variable:

SUITE=unit
SUITE=functional
SUITE=accessibility

Then we can split them further using TOTAL_CHUNKS and THIS_CHUNK env variables, for example: THIS_CHUNK=1 TOTAL_CHUNKS=3

Only the "functional" suite would need a chunk split for now. We can say 3 chunks for this suite would be useful.


Please tell me if you need anything from the test runners.
Hey Jonathan, I don't know if you know the right persons to copy here, to make this happen?

Thanks!
Flags: needinfo?(jgriffin)
I'll needinfo releng managers; catlee, coop, do you have anyone that can do this?  I think we want to schedule these chunks on cedar first to make sure they run ok, before doing them everywhere.

AFAIK, Gip will need to be chunked using command-line args (--this-chunk and --total-chunks, like mochitest) rather than env variables.
Flags: needinfo?(jgriffin)
Flags: needinfo?(coop)
Flags: needinfo?(catlee)
I'm quite sure the env variables work too, because that's how we do it on Travis :) but maybe both works !
(In reply to Julien Wajsberg [:julienw] from comment #3)
> I'm quite sure the env variables work too, because that's how we do it on
> Travis :) but maybe both works !

I see, because of this:  https://github.com/mozilla-b2g/gaia/blob/master/tests/travis_ci/gaia_ui_tests/script

It would happen in buildbot a little differently, via manipulation of the command-line args directly.
Oh, I see, good catch !
Is there any chance we could make this move forward?

* This impairs productivity because it takes ~90 min on TBPL.
* when we need to respin to find if a test is intermittent, each respin takes ~90min, which is not really resource-friendly.
(In reply to Julien Wajsberg [:julienw] from comment #6)
> Is there any chance we could make this move forward?

Coop has been on PTO (and therefore unable to answer) - will be back today.
Sorry for the delay. Simone should be able to pick this up early next week.
Assignee: nobody → sbruno
Flags: needinfo?(coop)
Flags: needinfo?(catlee)
Priority: -- → P3
(In reply to Chris Cooper [:coop] from comment #8)
> Sorry for the delay. Simone should be able to pick this up early next week.

No problem, you've the right to take holidays :) sorry for pushing too hard ;)
I had a look to the required changes to buildbot-configs and mozharness.
For the Gij part, I believe the steps are:

- define a new suite gaia-js-integration in mozilla-tests/b2g_config.py
- add that suite to all relevant platforms (linux64, linux64 Debug, OsX)
- add configuration for each chunk to those patforms
- add logic to initially limit this to cedar

I believe also some changes to mozharness are needed, in particular to gaia_integration.py:

- add "this_chunk" and "total_chunks" to the set of supported config_options
- add logic that translates those configs to env variable used when running "make test-integration".

Jordan (as mozharness expert), Massimo (as you worked on test chunks and mozharness in the past), does this plan sound ok to you? Did I miss some parts, or misunderstood something? Thanks!
Flags: needinfo?(mgervasini)
Flags: needinfo?(jlund)
(In reply to Simone Bruno [:simone] from comment #10)
> I had a look to the required changes to buildbot-configs and mozharness.
> For the Gij part, I believe the steps are:
> 
> - define a new suite gaia-js-integration in mozilla-tests/b2g_config.py
> - add that suite to all relevant platforms (linux64, linux64 Debug, OsX)
> - add configuration for each chunk to those patforms
> - add logic to initially limit this to cedar

yup. sounds like you have a handle on it. IIUC, you can copy the way we do things here:
http://mxr.mozilla.org/build/source/buildbot-configs/mozilla-tests/b2g_config.py#500
http://mxr.mozilla.org/build/source/buildbot-configs/mozilla-tests/b2g_config.py#743


> I believe also some changes to mozharness are needed, in particular to
> gaia_integration.py:
> 
> - add "this_chunk" and "total_chunks" to the set of supported config_options
> - add logic that translates those configs to env variable used when running
> "make test-integration".
> 

you got it. here's the mozharness equivalent to the buildbot-config links above:
http://mxr.mozilla.org/build/source/mozharness/scripts/b2g_desktop_unittest.py#53
http://mxr.mozilla.org/build/source/mozharness/scripts/b2g_desktop_unittest.py#156 # here, if ENV vars is the only thing that Gip or Gij know how to recognize for chunking, you could add to the env instead of cmdline options in the respective scripts you need to change.

Is this helpful? Let me know if I can assist further :)
Flags: needinfo?(jlund)
For Gip, this happens through command line parameters (see comment 2).
For Gij, this happens with environment variables.

Hope this helps, thanks!
Simone, your plan looks good, if you need any help with the implementation, feel free to ping me anytime.
Flags: needinfo?(mgervasini)
Attached patch mozharness_001_WIP_GIJ-part-only (obsolete) — Splinter Review
Work in progress - not asking for review yet since I need to run some tests first.
Attachment #8472251 - Flags: feedback?
Work in progress - not asking for review yet since I need to run some tests first.
Julien, Jonathan, I need some clarifications on the GIP part:

> Gip:
> We have 3 different suites, that use the SUITE env variable:
> 
> SUITE=unit
> SUITE=functional
> SUITE=accessibility
>  [...]
> Only the "functional" suite would need a chunk split for now. We can say 3
> chunks for this suite would be useful.
> 

At the moment the GIP tests are invoked without setting any SUITE env variable; nor I can find any other configuration specifying whether the unit/functional/accessibility suite is actually executed.

It would be relatively easy for me to simply add the "--this-chunk" and "--total-chunks" parameters to the current invocation of the cli.py script, but this way we would not take into account the SUITE parameter you wrote about. Would that be enough for you, or we need to implement something like splitting the current GIP suite in three different suites (GIP-unit, GIP-functional, GIP-accessibility), and chunk the functional one?

Thanks!
Flags: needinfo?(jgriffin)
Flags: needinfo?(felash)
Oh, I forgot to thank Jordan and Massimo for their feedback and help. Thanks!
(In reply to Simone Bruno [:simone] from comment #16)
> Julien, Jonathan, I need some clarifications on the GIP part:
> 
> > Gip:
> > We have 3 different suites, that use the SUITE env variable:
> > 
> > SUITE=unit
> > SUITE=functional
> > SUITE=accessibility
> >  [...]
> > Only the "functional" suite would need a chunk split for now. We can say 3
> > chunks for this suite would be useful.
> > 
> 
> At the moment the GIP tests are invoked without setting any SUITE env
> variable; nor I can find any other configuration specifying whether the
> unit/functional/accessibility suite is actually executed.
> 
> It would be relatively easy for me to simply add the "--this-chunk" and
> "--total-chunks" parameters to the current invocation of the cli.py script,
> but this way we would not take into account the SUITE parameter you wrote
> about.

I don't know if --this-chunk and --total-chunks work if we don't specify a suite.

> Would that be enough for you, or we need to implement something like
> splitting the current GIP suite in three different suites (GIP-unit,
> GIP-functional, GIP-accessibility), and chunk the functional one?

Ideally I'd rather want the latter :)

I'm gonna needinfo :zac on this for having more information.
Flags: needinfo?(felash) → needinfo?(zcampbell)
Attachment #8472251 - Flags: feedback?
Splitting Gip into suites would involve passing a different manifest to each suite; the SUITE var is used only in Travis.
Flags: needinfo?(jgriffin)
For chunking Gip (either the entire thing or the functional suite), you should use --total-chunks and --this-chunk.
(In reply to Jonathan Griffin (:jgriffin) from comment #19)
> Splitting Gip into suites would involve passing a different manifest to each
> suite; the SUITE var is used only in Travis.

for unit tests, the manifest is $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/unit/manifest.ini
for accessibility, $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/accessibility/manifest.ini
for functional, $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/functional/manifest.ini
Combining what Julien and Jonathan said, the plan is now:

- Split the existing GIP suite into three suites (unit, accessibility, functional) by passing the appropriate manifest to each of them
- chunk the functional suite via command line parameters
Attached patch mozharnes_GIP_WIP_01 (obsolete) — Splinter Review
Changes to mozharness to execute single GIP suites and add parameters governing tests chunks.
Without changes to buildbot-configs, the overall behavior in TBPL should be exactly what it is now.
Next patch will be the buildbot-configs changes to correctly invoke marionette to run distinct GIP suites; the functional one will be executed in different chunks.
(In reply to Julien Wajsberg [:julienw] from comment #18)
> 
> > Would that be enough for you, or we need to implement something like
> > splitting the current GIP suite in three different suites (GIP-unit,
> > GIP-functional, GIP-accessibility), and chunk the functional one?
> 
> Ideally I'd rather want the latter :)
> 
> I'm gonna needinfo :zac on this for having more information.

Seems fine to me and clearer than chunking because the contents of the chunk change each time, the history for that suite can be inconsistent.
Flags: needinfo?(zcampbell)
(In reply to Jonathan Griffin (:jgriffin) from comment #21)
> (In reply to Jonathan Griffin (:jgriffin) from comment #19)
> > Splitting Gip into suites would involve passing a different manifest to each
> > suite; the SUITE var is used only in Travis.
> 
> for unit tests, the manifest is
> $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/unit/manifest.ini
> for accessibility,
> $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/accessibility/manifest.
> ini
> for functional,
> $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/functional/manifest.ini

Yes you can just use those manifest files directly.

The tests will pass, *but* some tests listed in functional/manifest.ini will call internet resources and thus be a breach of TBPL requirements.

I have been working separately to remove tbpl-manifest.ini (bug 1044484) anyway so this might give the final impetus.
Attached patch buildbot-configs_002_WIP_GIP (obsolete) — Splinter Review
Attached patch buildbot-configs_003_WIP_GIP (obsolete) — Splinter Review
Attachment #8473667 - Attachment is obsolete: true
Simone your wip patches looks good so far.  (you asked me earlier in irc to look at these patches) Interested to see how it works out on your dev-master.
Depends on: 1056007
(In reply to Zac C (:zac) from comment #25)
> (In reply to Jonathan Griffin (:jgriffin) from comment #21)
> > (In reply to Jonathan Griffin (:jgriffin) from comment #19)
> > > Splitting Gip into suites would involve passing a different manifest to each
> > > suite; the SUITE var is used only in Travis.
> > 
> > for unit tests, the manifest is
> > $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/unit/manifest.ini
> > for accessibility,
> > $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/accessibility/manifest.
> > ini
> > for functional,
> > $gaia_dir/tests/python/gaia-ui-tests/gaiatest/tests/functional/manifest.ini
> 
> Yes you can just use those manifest files directly.
> 
> The tests will pass, *but* some tests listed in functional/manifest.ini will
> call internet resources and thus be a breach of TBPL requirements.
> 
> I have been working separately to remove tbpl-manifest.ini (bug 1044484)
> anyway so this might give the final impetus.

In bug 1056007 I'm resolving this problem.
Attached file mozharness_002_WIP_GIJ (obsolete) —
New patch is needed because the underlying codebase has been changed
Attachment #8472251 - Attachment is obsolete: true
Attached file buildbot-configs_002_WIP_GIJ (obsolete) —
Bitrot! As for mozharness, new patch needed because the underlying code changed.
Attachment #8472252 - Attachment is obsolete: true
Attached file buildbot-configs_003_WIP_GIJ (obsolete) —
* added extra args for linux64 platform (I had omitted them in previous WIP patch)
* removed extra arg "--test-suite", which is not used by gaia_integration.py
Attachment #8476969 - Attachment is obsolete: true
Status update:
- I tested the GIJ part on my staging environment.
- After my last changes, GIJ tests seem to be correctly defining the test suites and invoking mozharness with the correct arguments.
- at the moment tests on my staging environment are red because I should trigger them via sendchange, thus providing all required configuration (the tests builds have just been triggered manually via buildbot interface, e.g.: http://dev-master1.srv.releng.scl3.mozilla.com:8401/builders/b2g_ubuntu64_vm%20cedar%20opt%20test%20gaia-js-integration-1/builds/7). In other words, this is an issue related to how I triggered the tests, but not related to the code changes needed for this Bug, so I am close to provide the GIJ patches for review.
Attached patch buildbot-configs_004_GIJ (obsolete) — Splinter Review
Attachment #8477033 - Attachment is obsolete: true
Attached patch mozharness_003_GIJ (obsolete) — Splinter Review
Attachment #8476951 - Attachment is obsolete: true
Attachment #8477398 - Flags: review?(kmoir)
Attachment #8477399 - Flags: review?(kmoir)
Attached patch mozharness_004 (obsolete) — Splinter Review
This is the final patch for mozharness, including both changes for Gij and Gip
Attachment #8472958 - Attachment is obsolete: true
Attachment #8477399 - Attachment is obsolete: true
Attachment #8477399 - Flags: review?(kmoir)
Attachment #8477435 - Flags: review?(kmoir)
Final patch for buildbot-configs, including changes for both GIP and Gij
Attachment #8473672 - Attachment is obsolete: true
Attachment #8477398 - Attachment is obsolete: true
Attachment #8477398 - Flags: review?(kmoir)
Attachment #8477438 - Flags: review?(kmoir)
:simone, It would be better to split Gip up by its test categories if you can. 

The configuration for each part would be:

manifest file , --type= command
1. gaiatest/tests/unit/manifest.ini, --type=b2g
2. gaiatest/tests/functional/manifest.ini --type=b2g-external
3. gaiatest/tests/accessibility/manifest.ini --type=b2g

and you would resolve bug 1056007 at the same time!
Pardon I didn't see your next patch Simone! 

Can you just note change to --type=b2g-external for functional suite? this is to exclude external/internet resources.
Attached patch mozharness_005Splinter Review
Addressing Comment 38 and Comment 39.

I added the following code:

            # Bug 1046694
            # using a different manifest if a specific gip-suite is specified
            # --type parameter depends on gip-suite
            if self.config.get('gip_suite'):
                manifest = os.path.join(dirs['abs_gaiatest_dir'], 'gaiatest', 'tests', self.config.get('gip_suite'),
                                        'manifest.ini')
                if self.config.get('gip_suite') == "functional":
                    cmd.extend(self._build_arg('--type', "b2g-external"))
                else:
                    cmd.extend(self._build_arg('--type', "b2g"))
:zac
Does this look ok (or I totally misunderstood your comments :-))?
Attachment #8477435 - Attachment is obsolete: true
Attachment #8477435 - Flags: review?(kmoir)
Attachment #8477482 - Flags: review?(kmoir)
Flags: needinfo?(zcampbell)
Simone: The patches look good so far to me but it would be good to see a builder diff because the change is quite complicated

https://wiki.mozilla.org/ReleaseEngineering:TestingTechniques#builder_list.py_.2F_dump_master.py

(builder_list.py part)

Also, did the tests run green on your test master?
Simone, that looks correct to me. Thanks! :)
Flags: needinfo?(zcampbell)
Kim:

This is the diff between old and new output of builder_list.py on my test environment: http://people.mozilla.org/~sbruno/builder_list_diff_1046694.html

I did not get green builds on my test environment because I did not trigger the tests issuing sendchange commands with all the required parameters, but I verified in the logged output that the mozharness calls are all correct ("correct" here meaning that each call adds the parameters or sets env variables as requested in previous comments).
Attachment #8477438 - Flags: review?(kmoir) → review+
Attachment #8477482 - Flags: review?(kmoir) → review+
Any movement on this? The Gij suite is starting to fail sometimes because it simply takes too long (over 2 hours). And we are adding more integration tests every day.

https://tbpl.mozilla.org/php/getParsedLog.php?id=47411564&tree=Gaia-Try&full=1#error0
Flags: needinfo?(sbruno)
Hi mhenretty,

I found a problem on the patch which has been reviewed, I am going to submit a new patch for review with some slight further code changes and land it, let's say, by Monday.
Flags: needinfo?(sbruno)
Attached patch mozharness_005_bSplinter Review
Hi Kim,

This extra code change is needed on top of the previous patch not to break old-style gaia tests, which will still run for an interim period on all-but-cedar branches.

It's just a simple patch to make sure that env variable governing test chunking are set only when passed from the calling script.
Attachment #8487138 - Flags: review?(kmoir)
Attachment #8487138 - Flags: review?(kmoir) → review+
Hey Simone,

what's missing here ? :)

Thanks a lot for this work!
Flags: needinfo?(sbruno)
Sorry julienw, nothing missing - landing today.
Flags: needinfo?(sbruno)
Patches landed, will be rolled out to prod with next reconfig.
Attachment #8477438 - Flags: checked-in+
Attachment #8477482 - Flags: checked-in+
Attachment #8487138 - Flags: checked-in+
In prod with reconfig on 2014-09-15 08:30 PT
How soon before we start seeing this in tbpl? Will it be in gaia-try?
Flags: needinfo?(sbruno)
Apparently some pieces are missing here...

Ryan,

This patch (in prod since yesterday) is supposed to chunk the gaia js integration into 4 chunks (cedar only):

gaia-js-integration-1
gaia-js-integration-2
gaia-js-integration-3
gaia-js-integration-4

 ... and spilt the gaia ui tests as follows (functional tests are also chunked in 3 parts)(also, cedar only):

gaia-ui-test-functional-1
gaia-ui-test-functional-2
gaia-ui-test-functional-3
gaia-ui-test-unit
gaia-ui-test-accessibility

These suites are not visible in TBPL - are some changes needed to TBPL code as well to show them?

Thanks!
Flags: needinfo?(sbruno) → needinfo?(ryanvm)
TBPL doesn't require changes in general to show jobs - they'll appear, but just potentially with the wrong symbol or under the "other" row etc. Jobs will occasionally be hidden, but on cedar this is unlikely since even known broken jobs are visible.

The jobs just aren't being scheduled - there must be something wrong with the patches here:
https://tbpl.mozilla.org/?tree=Cedar&jobname=gaia&showall=1

Note also that the job doesn't appear via buildapi (which is TBPL's datasource - a good place to look first):
https://secure.pub.build.mozilla.org/buildapi/self-serve/cedar/rev/d9c83fef7864

The only other thing I can suggest checking is that the reconfig occurred before the last push to cedar - if no one has pushed since, then the new jobs won't be there.
Flags: needinfo?(ryanvm)
I see no change to the Gip job on any branches, but also there have been no pushes to Cedar since the reconfig.
I've used self-serve to request new dep builds on cedar d9c83fef7864; they should be scheduled shortly.
Jobs seem to be scheduled now. Bug 1067878 deals with displaying meaningful test job names for the newly added chunked jobs.
Depends on: 1067878
Depends on: 1067892
Attached patch mozharness_005_cSplinter Review
Syntax for extending list of command line arguments needs to be changed to fix marionette execution exceptions:

Uncaught exception: Traceback (most recent call last):
 File "/builds/slave/test/scripts/mozharness/base/script.py", line 1258, in run
   self.run_action(action)
 File "/builds/slave/test/scripts/mozharness/base/script.py", line 1200, in run_action
   self._possibly_run_method(method_name, error_if_missing=True)
 File "/builds/slave/test/scripts/mozharness/base/script.py", line 1141, in _possibly_run_method
   return getattr(self, method_name)()
 File "scripts/scripts/marionette.py", line 420, in run_marionette
   cmd.extend(self._build_arg('--this-chunk', self.config.get('this_chunk')))
AttributeError: 'MarionetteTest' object has no attribute '_build_arg'
Running post_fatal callback...
Exiting -1
Attachment #8489945 - Flags: review?(mgervasini)
Comment on attachment 8489945 [details] [diff] [review]
mozharness_005_c

Simone, the patch looks good for me. Can you remove the extra spaces at the end of this line: 

   cmd.extend(['--total-chunks', self.config.get('total_chunks')]) 

before landing it?
Attachment #8489945 - Flags: review?(mgervasini) → review+
In prod with reconfig on 2014-09-16 07:23 PT
Same questions as before: How soon before we start seeing this in tbpl? Will it be in gaia-try?
Flags: needinfo?(sbruno)
The tests are currently being displayed in tbpl (so far, they are enabled on cedar only: https://tbpl.mozilla.org/?tree=Cedar), but with such names as "U1 U2" etc.

Bugs 1067878 and 1067892 deal with displaying meaningful names instead of the default ones cuurrently showed on the page.
Flags: needinfo?(sbruno)
No longer depends on: 1056007, 1067878, 1067892
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
(In reply to Michael Henretty [:mhenretty] from comment #62)
> Same questions as before: How soon before we start seeing this in tbpl? Will
> it be in gaia-try?

To answer this question, after they are running correctly on cedar (with the correct TBPL letters) another patch will be required to get them running on trunk branches like gaia-try; this should be much faster than the original patch.
Summary: Splitting Gij and Gip tests in several chunks → Split Gij and Gip tests into several chunks and run on Cedar
Depends on: 1075787
No longer depends on: 1075787
Blocks: 1081246
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.