Closed Bug 994920 Opened 10 years ago Closed 6 years ago

Run media mochitests on B2G emulators on faster VMs

Categories: Firefox OS Graveyard :: General, defect, P2
Tracking: (Not tracked)
Status: RESOLVED WONTFIX
People: (Reporter: jgriffin, Unassigned)
Keywords: ateam-b2g-big, Whiteboard: [leave open]
Attachments: (5 files)

Currently, media mochitests (and possibly some others) on B2G emulators are very CPU-bound when run on the existing EC2 instances.  This causes timeouts and other problems.

There are a couple of potential solutions to this.  One is to be able to run those tests on beefier EC2 instances (bug 985650), but according to Armen, this is a lot of work and not likely to happen soon.

Therefore, in the short term, I propose we run the media mochitests using IX hardware slaves, instead of EC2 instances.  There are several pieces here:  splitting the media tests into their own chunk (similar, probably, to how we split devtools mochitests for desktop in bug 984930), and then landing the mozharness and buildbot-configs changes that will be needed to schedule these in TBPL.
Maire, can you provide some context to help determine priority?
Joel, do you think the subsuite approach you're using in bug 984930 is the best way to go here, as far as splitting out media tests into a separate chunk?
Flags: needinfo?(jmaher)
We have this problem with mochitest-gl (the WebGL sub test harness in mochitest-plain) on Android.

caveat:
* subsuite is on/off; there are no conditions for build type or platform right now.  This means that if we defined subsuite=media, then none of those tests would run by default on desktop

ideas:
* we can skip-if = b2g, then create a specific job type for b2g-media (mozharness target using --test-path or --manifest) and a buildbot builder that only runs that on hardware.  This isn't a reality until we get the build to stop filtering tests based on skip-if conditions (bug 989583)
* we can add platform conditions to the subsuite (subsuite-if) and then skip this for b2g only.  The problem here is that it takes somewhat clean manifest syntax and muddies the water.  (A rough sketch of this kind of manifest filtering follows below.)

Whichever route we take, we would need to create a mozharness target and a buildbot builder to run it.
Flags: needinfo?(jmaher)
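[Editor's sketch] To make the two options above concrete, here is a toy Python illustration of the filtering being discussed. The dict-based entries, test names, and helper names are assumptions for illustration only, not the actual manifestparser or mozharness code.

# Toy illustration of the two options above; not the real harness code.
# Each entry mimics a parsed manifest.ini section (keys are assumptions).
entries = [
    {"name": "test_peerConnection_basicAudio.html", "skip-if": "b2g", "dir": "media"},
    {"name": "test_getUserMedia_basics.html", "skip-if": "b2g", "dir": "media"},
    {"name": "test_unrelated.html", "skip-if": "", "dir": "other"},
]

def default_mochitest_run(entries, platform):
    """Option 1: the regular mochitest job honours skip-if, so the media
    tests simply drop out of the normal b2g run."""
    return [e["name"] for e in entries if platform not in e["skip-if"]]

def b2g_media_job(entries):
    """Option 1 (continued): a dedicated b2g-media job re-collects those
    tests by path, e.g. via --test-path media/ or a separate manifest."""
    return [e["name"] for e in entries if e["dir"] == "media"]

print(default_mochitest_run(entries, "b2g"))  # ['test_unrelated.html']
print(b2g_media_job(entries))                 # the two media tests

Option 2 (subsuite-if) would instead tag the media entries with a subsuite that only takes effect on b2g; either way the selection ends up looking like the filters above, driven by a new mozharness target and buildbot builder.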
Another option is to run _all_ of the B2G mochitests on IX slaves, until we either have the ability to run them on faster EC2 nodes or skip-if filtering is addressed.

From an e-mail from Maire, I think this is needed for B2G 2.0, so it's not a fire drill, and we may be able to wait for a good solution.  Needinfo'ing her so she can comment on deadlines.
Flags: needinfo?(mreavy)
Priority: -- → P2
WebRTC is the headline feature for v2.0.  We're targeting getting all our patches landed by mid May so that we are solid by FC on June 9.  This week the "media quality" part of our schedule was badly hurt by B2G emulator timeouts that were caused not by problems in our code but by the slowness of the current emulator.  Anything (even hackish) that you guys can do in the very near term to improve the B2G emulator perf will get the full support (and many thanks!) from me and the WebRTC team.

Randell Jesup posted to dev-platform detailing the issues he ran into: https://groups.google.com/forum/#!topic/mozilla.dev.platform/qzyz-NzLqT0

Thanks!
Flags: needinfo?(mreavy)
For the record, the plan is:

* we can skip-if = b2g, then create a specific job type for b2g-media (mozharness target using --test-path or --manifest) and a buildbot builder that only runs that on hardware.  This isn't a reality until we get the build to stop filtering tests based on skip-if conditions (bug 989583)

If bug 989583 gets bogged down, we'll go with the subsuite approach, which is more work on the infrastructure side.
Depends on: 989583
FYI, I think bug 989583 should be resolved this week, which will allow us to proceed with this.
Thanks, Jonathan.  Do you have an ETA for this bug?
To give some more context about the urgency:

In a comment on bug 1016498 I described that the logs show random delays between two WebRTC API calls in our test, ranging from 0.5s up to 13s with an average of 5.5s.
We are getting so many oranges related to these problems on the B2G emulator that I'm inclined to deactivate all of our B2G emulator tests which involve a network connection.

The two other options I see are:
- run the tests on some form of dedicated hardware
- create special builds for the B2G emulator with extremely long timeout values to avoid test failures - but I don't like this approach, as we would then no longer be testing what we ship to customers
Once bug 989583 is resolved (ETA: ~1 week) it should take us about a week to move these tests to IX slaves, which are real hardware slaves.  So, about 2 weeks total.
Assignee: nobody → jgriffin
Steps we need to do here:

1 - create a separate mochitest-media job on cedar (on emulators initially)
2 - land a patch on cedar to add skip-if = b2g to the media mochitests, and verify it doesn't prevent the mochitest-media job from running normally
3 - move the mochitest-media job to IX hardware slaves
4 - green up tests, as needed
5 - schedule the mochitest-media job on all trunk branches on IX hardware slaves
6 - land the patch from step 2 on trunk branches
Jonathan let me know if I can help in any way with any of the steps you outlined.
Comment on attachment 8438077 [details] [diff] [review]
Add --test-path to in-tree B2G mochitest config,

Review of attachment 8438077 [details] [diff] [review]:
-----------------------------------------------------------------

I'm confused by this, didn't you say we were going to create a new mochitest job? This will affect the other jobs too. How will we specify different test_path config variables while re-using mochitest_options for the new job?
Attachment #8438077 - Flags: review?(ahalberstadt) → review-
Comment on attachment 8438078 [details] [diff] [review]
Add --test-path support to B2G mochitest mozharness script,

Review of attachment 8438078 [details] [diff] [review]:
-----------------------------------------------------------------

This part looks good.
Attachment #8438078 - Flags: review?(ahalberstadt) → review+
(In reply to Andrew Halberstadt [:ahal] from comment #16)
> Comment on attachment 8438077 [details] [diff] [review]
> Add --test-path to in-tree B2G mochitest config,
> 
> Review of attachment 8438077 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'm confused by this, didn't you say we were going to create a new mochitest
> job? This will affect the other jobs too. How will we specify different
> test_path config variables while re-using mochitest_options for the new job?

By default, test_path will be None, so it will be excluded from other jobs here:

http://hg.mozilla.org/build/mozharness/file/aa104dcaf661/scripts/b2g_desktop_unittest.py#l185

This is similar to how we handle the browser_arg argument, which isn't used for most runs.

See the third patch for the related buildbot config that actually creates and schedules the new job.
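[Editor's sketch] As a rough illustration of the "default to None, only pass the option when set" pattern described above; the function, flags, and script invocation here are simplified assumptions, not the actual mozharness code.

# Hypothetical sketch; the real B2G mochitest mozharness script is structured differently.
def build_mochitest_command(config):
    cmd = ["python", "runtestsb2g.py"]

    # Options left as None in the config are simply omitted, so jobs that
    # don't define test_path (or browser_arg) keep their existing command line.
    if config.get("test_path"):
        cmd.extend(["--test-path", config["test_path"]])
    if config.get("browser_arg"):
        cmd.extend(["--browser-arg", config["browser_arg"]])
    return cmd

print(build_mochitest_command({}))                       # existing mochitest jobs
print(build_mochitest_command({"test_path": "media/"}))  # the new mochitest-media job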
Comment on attachment 8438077 [details] [diff] [review]
Add --test-path to in-tree B2G mochitest config,

Review of attachment 8438077 [details] [diff] [review]:
-----------------------------------------------------------------

Oh heh, that's kind of dirty :p. I was thinking that the string interpolation would convert None to 'None' and therefore be True, but I guess that has already been taken care of. In that case, this looks good too!
Attachment #8438077 - Flags: review- → review+
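[Editor's note] For reference, the pitfall discussed above is that interpolating None into a string produces the truthy string 'None', so the emptiness check has to happen on the raw value before formatting. A minimal stand-alone Python illustration:

test_path = None

# Naive interpolation turns None into the truthy string 'None':
arg = "--test-path=%s" % test_path
print("%r is truthy: %s" % (arg, bool(arg)))  # '--test-path=None' is truthy: True

# Checking the raw config value first avoids that:
if test_path:
    print("--test-path=%s" % test_path)
else:
    print("option omitted")  # this branch runs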
Comment on attachment 8438079 [details] [diff] [review]
Schedule mochitest-media on B2G emulators on cedar,

Review of attachment 8438079 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/b2g_config.py
@@ +1068,5 @@
> +                'mochitest-media': {
> +                    'extra_args': [
> +                        '--cfg', 'b2g/emulator_automation_config.py',
> +                        '--test-suite', 'mochitest',
> +                        '--test-path', 'media/',

so I think overall the whole patch works. However, just want to sanity check that we do not want to call a specific chunk here. As in, I'm not sure what happens when you add the '--test-path' to the mochi run_tests.py call but I noticed we are not chunking so I'd assume this will run all the chunks.
Attachment #8438079 - Flags: review?(jlund) → review+
(In reply to Jordan Lund (:jlund) from comment #20)
> Comment on attachment 8438079 [details] [diff] [review]
> Schedule mochitest-media on B2G emulators on cedar,
> 
> Review of attachment 8438079 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mozilla-tests/b2g_config.py
> @@ +1068,5 @@
> > +                'mochitest-media': {
> > +                    'extra_args': [
> > +                        '--cfg', 'b2g/emulator_automation_config.py',
> > +                        '--test-suite', 'mochitest',
> > +                        '--test-path', 'media/',
> 
> so I think overall the whole patch works. However, just want to sanity check
> that we do not want to call a specific chunk here. As in, I'm not sure what
> happens when you add the '--test-path' to the mochi run_tests.py call but I
> noticed we are not chunking so I'd assume this will run all the chunks.

Right, without chunks, we will just run all the tests in the specified path in one chunk.
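[Editor's sketch] To illustrate why that is (purely a conceptual sketch; mochitest's real chunking and path filtering live in the harness and are more involved):

# Illustrative only: how a test path and chunking combine conceptually.
def select_tests(all_tests, test_path=None, this_chunk=None, total_chunks=None):
    tests = [t for t in all_tests if test_path is None or t.startswith(test_path)]
    if this_chunk and total_chunks:
        # Naive round-robin split; the real harness chunks differently
        # (e.g. optionally by directory with --chunk-by-dir).
        tests = tests[this_chunk - 1::total_chunks]
    return tests

all_tests = ["media/test_a.html", "media/test_b.html", "dom/test_c.html"]
# mochitest-media: --test-path with no chunk args runs everything under media/ in one job.
print(select_tests(all_tests, test_path="media/"))
# A chunked run without a test path splits the full list across jobs instead.
print(select_tests(all_tests, this_chunk=1, total_chunks=2))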
Comment on attachment 8438078 [details] [diff] [review]
Add --test-path support to B2G mochitest mozharness script,

https://hg.mozilla.org/build/mozharness/rev/8cb7108d657e
(In reply to Jonathan Griffin (:jgriffin) from comment #22)
> Comment on attachment 8438078 [details] [diff] [review]
> Add --test-path support to B2G mochitest mozharness script,
> 
> https://hg.mozilla.org/build/mozharness/rev/8cb7108d657e

pushed to production
In prod with reconfig on 2014-06-12 10:46 PT
Whiteboard: [leave open]
Comment on attachment 8438079 [details] [diff] [review]
Schedule mochitest-media on B2G emulators on cedar,

https://hg.mozilla.org/build/buildbot-configs/rev/b7ae1079631d
buildbot-config patch live in production :)
Depends on: 1026802
I've run into a problem here: the emulators don't currently run on IX slaves.  Although we may be able to resolve that problem, IX slaves are already somewhat overloaded.  Instead, we're going to experiment with running the tests on faster VM nodes; see bug 1026802.
https://tbpl.mozilla.org/php/getParsedLog.php?id=41912476&tree=Try&full=1 (duration=112mins, while others are about 1 hour)
Is that the reason why sometimes mochitest-3 on b2g opt runs much slower?
(In reply to JW Wang [:jwwang] from comment #30)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=41912476&tree=Try&full=1
> (duration=112mins, while others are about 1 hour)
> Is that the reason why sometimes mochitest-3 on b2g opt runs much slower?

Very likely that's a significant contributing factor.
Depends on: 1031083
Summary: Run media mochitests on B2G emulators on IX hardware slaves → Run media mochitests on B2G emulators on faster VMs
Hi Jonathan -- What's the reason for changing from hardware slaves to faster VMs?  Have we tested faster VMs already? Thanks.
Flags: needinfo?(jgriffin)
Sounds like we don't have enough hardware. Obviously faster VMs should be the first attempt here.
But are there any fallback plans/ideas... e.g. execute only a sub-set of tests on real hardware?
I have experimented with both real hardware and faster VM's; see bug 1026800.

The emulator does not currently run on the real hardware we have.  Although we could probably fix this, the pool of available machines is not large, and adding additional load is undesirable because it will increase wait times.

We don't have this problem with VM's.  I've tried the tests on a faster VM and they seem to run OK.  Since the timeouts for the media tests are intermittent, we'll have to wait to see how they perform in production, but we have the option of moving to even faster VM's if the problem persists.
Flags: needinfo?(jgriffin)
(In reply to Nils Ohlmeier [:drno] from comment #33)
> Sounds like we don't have enough hardware. Obviously faster VMs should be
> the first attempt here.
> But are there any fallback plans/ideas... e.g. execute only a sub-set of
> tests on real hardware?

Yes, if we can't get consistently green tests on very fast VMs, we can fall back to real hardware, which will take some additional configuration work.
Depends on: 1034055
Where are we at with this?
The names of the test slaves for the B2G emulator suggest that everything still runs on small machines.

We are disabling the WebRTC tests in bug 1059867 and discussing alternatives for the future in bug 1059878.
See Also: → 1059867, 1059878
I didn't realize this was still pressing.

I have a set of media tests running on faster VM's on cedar, but they're actually not getting triggered; I think support for --test-path in the B2G mochitest runner may not be working.

I'll expedite a fix for this, and then we can see if the faster VM's solve this problem.
We were passing the wrong test-path to the media mochitest job on cedar; I've fixed this in https://hg.mozilla.org/build/buildbot-configs/rev/917020f08255, but it won't roll out until the next buildbot reconfig.
patch(es) in production for this bug :)
Maire:  can I get a list of tests we want run on the faster VM type?  I was thinking it was the tests in content/media, but looking at bug 1059867, I think it may be dom/media instead!  Or maybe it's both...
Flags: needinfo?(mreavy)
I'll attach some lists, but basically: most webrtc (dom/media/tests/mochitest) tests, some content/media/tests (and I'm guessing a bit there without reading each one and understanding them - jwwang likely can do a better job filtering that list), and all of content/media/webaudio/tests (some don't need to be, likely, but it's far simpler to take all of them).

We could take all of dom/media/tests/mochitests - most of the CPU time is in the tests I've already nominated to move.  Side note: dom/media/tests/crashtests may want to move as well
FYI, Maire asked me to filter the list, so clearing needinfo to her
Flags: needinfo?(mreavy)
I am making progress in enabling content/media/tests on B2G debug by removing per-token-exactGC in manifest.js (see https://tbpl.mozilla.org/?tree=Try&rev=319879ff1197).

Most of them run well now. Please exclude content/media/tests from the list for now.
So I think the thing to do here is:

- point the media mochitest job on cedar (which is now running green) from content/media/tests to dom/media/tests, so we can see if we can reproduce the failures that led to the recent test disabling there

- implement conditional subsuites so we can make the media job contain tests from multiple directories without resorting to extra manifests that would eventually create confusion (a rough sketch of the idea is below)
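[Editor's sketch] Here is a toy illustration of what a "conditional subsuite" could look like, i.e. a subsuite annotation that only applies when a platform condition holds. The syntax and parsing below are assumptions for illustration; the actual design and implementation are tracked in bug 1061982.

# Hypothetical "subsuite = name,condition" handling; not the real manifestparser code.
def parse_subsuite(value, info):
    """Return the subsuite name if its condition (if any) matches the run info."""
    if "," not in value:
        return value.strip()
    name, condition = value.split(",", 1)
    # A real implementation would evaluate a full manifest expression; this
    # naive version only handles a single key == 'value' comparison.
    key, _, expected = condition.partition("==")
    if str(info.get(key.strip())) == expected.strip().strip("'\""):
        return name.strip()
    return None

print(parse_subsuite("media,os == 'b2g'", {"os": "b2g"}))    # 'media'
print(parse_subsuite("media,os == 'b2g'", {"os": "linux"}))  # None
print(parse_subsuite("media", {"os": "linux"}))              # unconditional: 'media'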
(In reply to Jonathan Griffin (:jgriffin) from comment #45)
> So I think the thing to do here is:
> 
> - point the media mochitest job on cedar (which is now running green) from
> content/media/tests to dom/media/tests, so we can see if we can reproduce
> the failures that led to the recent test disabling there

https://hg.mozilla.org/build/buildbot-configs/rev/dd784899e53c
Depends on: 1061982
(In reply to Jonathan Griffin (:jgriffin) from comment #45)
> 
> - implement conditional subsuites so we can make the media job contain tests
> from multiple directories without resorting to extra manifests that would
> eventually create confusion

bug 1061982
Merged to production, and deployed.
So we can run the dom/media tests now on cedar on the faster VM.  Strangely, they're all perma-fail with bug 1035011.  See:  https://tbpl.mozilla.org/?tree=Cedar&showall=1&rev=11c440c3fec3&jobname=emulator

The media tests are also running again in chunk 7 in the regular mochitest chunks on cedar; they also exhibit this error.  So, this faster VM doesn't seem to be helping any.  I'm curious though why these tests are perma-fail; were they also perma-fail on trunk before being disabled?

Logfile:
https://tbpl.mozilla.org/php/getParsedLog.php?id=47499158&tree=Cedar&full=1

If nothing fishy appears to be happening here, we can try to bump the VM size again to see if it makes any difference.  Going the real hardware route is going to be more time consuming.  :(
Checking if these tests are also perma-fail on try:  https://tbpl.mozilla.org/?tree=Try&rev=b23bd3185420
(In reply to Jonathan Griffin (:jgriffin) from comment #50)
> Checking if these tests are also perma-fail on try: 
> https://tbpl.mozilla.org/?tree=Try&rev=b23bd3185420

These are similar to the failures on cedar. Nils, I thought the problem we were trying to solve was clearing frequent intermittents caused by CPU contention; is it actually to solve a perma-fail?  Or is it resolving a set of frequent intermittents that cumulatively amount to a perma-fail?  Or...?

Right now, based on these results, it looks like the faster VM's have had no effect on these, but I'd like someone else to confirm.
Flags: needinfo?(drno)
Maire, do you have any input on comment #51?
Flags: needinfo?(mreavy)
When fixing bug 707777 (test_bug493187.html), I came to realize the test requires a faster machine (for faster decoding) in order to pass. Can we put the whole folder of content/media/test/ on a faster VM?
Hi Jonathan, thanks for your help with this!

When we started talking about moving to faster machines (about 6-9 months ago), it was to solve frequent media/WebRTC intermittents that indeed were shown to be caused by CPU contention.   When we ran the same suite of tests on faster hardware, we didn't see failures.  AFAIK no one has measured how much faster the hardware needs to be to avoid failures.

A number of tests were disabled over the last 6 months after clear indication in the logs that connection attempts were timing out.  There are links in the mochitest.ini file to the bugs that led to disabling a specific test.

NOTE: Due to the slowness of the emulator, we worked around some problems by dramatically reducing the generation rate for fake audio.  We'd love to get rid of this type of kludge.

About a month ago, as we were trying to land Bug 991037, we noticed that the refactored code in the patches of that bug caused many of the existing frequent intermittents to permafail.  This is when we decided to disable the WebRTC tests on the B2G emulator (roughly 3 weeks ago).

Our goal is to get all of these tests re-enabled (and remove any kludges) if we believe they can consistently give us accurate results.
Flags: needinfo?(mreavy)
I'm adding James into the loop here, since his plans impact decisions here.

It looks like faster VMs aren't sufficient here, unless we move to _much faster_ VMs.  If the tests were going to remain in buildbot long-term, we might do the work needed to run these on real hardware.

However, James wants to transition B2G tests to TaskCluster soon, and TaskCluster doesn't support real hardware atm, only VM's.

Given that's the case, I propose trying to find a fast VM that this works on (apparently the work on bug 991037 rendered earlier attempts at this invalid), and doing the buildbot work needed to support that, so that these can be transitioned to TaskCluster eventually.

Doing this might require a much more expensive VM than we currently use, but I think this is the only viable option here.  Needinfo'ing James and Catlee for their opinions.
Flags: needinfo?(jlal)
Flags: needinfo?(drno)
Flags: needinfo?(catlee)
Hrm- Have we tried instances with gpus yet?
Flags: needinfo?(jlal)
(In reply to James Lal [:lightsofapollo] from comment #56)
> Hrm- Have we tried instances with gpus yet?

We haven't; those are roughly 6x more expensive than the "faster" VM's currently in use.  But I agree, that's probably what we need to get these tests running well.
I see two options at this point.

1) run these tests on real hardware in buildbot for now. This would be on the same machines that do our Linux performance tests. This will obviously delay getting these tests moved over to Task Cluster. These machines aren't particularly powerful, so there's no guarantee this will work. Perhaps worth a quick experiment to verify.

2) investigate reducing the CPU requirements of these tests. It's always better to have tests that are less resource intensive. Is there some fundamental aspect of these tests that make them expensive to run? If so, we could also look at not running them per-push to keep costs under control.
Flags: needinfo?(catlee)
I don't think that reducing the CPU requirements of these tests is a viable option.  Unfortunately, the emulator itself is very CPU-intensive, and it's quite easy to bump into a CPU-bound state with the VM's we're currently using.

I'd like to propose trying these on the g2.2xlarge node type (I'll file a bug for this) and, if it works well there, moving the tests to that VM type.  Although this instance type is relatively expensive, there is only one job that would need to run on it (at least for now), so the price per push shouldn't change much.  And, we can always look at scheduling these less frequently if needed.
Depends on: 1067521
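[Editor's sketch] To put the price-per-push argument in rough numbers (all rates and job counts below are made-up placeholders, not actual AWS pricing or real job counts):

# Back-of-the-envelope cost-per-push estimate; every number here is an
# illustrative assumption, NOT a real AWS price or a real job count.
standard_rate = 0.10   # $/hour, current test instance type (assumed)
gpu_rate = 0.65        # $/hour, g2.2xlarge-class instance (assumed)
jobs_per_push = 40     # B2G test jobs per push (assumed)
job_hours = 1.0        # average job duration in hours (assumed)

baseline = jobs_per_push * job_hours * standard_rate
with_gpu_media_job = (jobs_per_push - 1) * job_hours * standard_rate + job_hours * gpu_rate

print("baseline per push:   $%.2f" % baseline)
print("with one GPU-VM job: $%.2f" % with_gpu_media_job)
print("relative increase:   %.0f%%" % (100 * (with_gpu_media_job / baseline - 1)))

Under these assumptions, a single job on a markedly pricier instance adds only a modest fraction to the per-push total.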
(In reply to Chris AtLee [:catlee] from comment #58)
> I see two options at this point.
> 
> 1) run these tests on real hardware in buildbot for now. This would be on
> the same machines that do our Linux performance tests. This will obviously
> delay getting these tests moved over to Task Cluster. These machines aren't
> particularly powerful, so there's no guarantee this will work. Perhaps worth
> a quick experiment to verify.

Unfortunately, the emulator doesn't run on the current Linux hardware slaves, and it will take some work to figure out why and how to fix that.  So a quick experiment here won't be very quick.
Good news.  The dom/media tests that have been disabled (with the exception of test_dataChannel_bug1013809.html) work well on an AWS VM of instance type g2.2xlarge.  I'll file a bug to stand this up as a platform within releng infra.
Depends on: 1090612
I'm not actively working on this, since it requires bug 1090612 to be implemented, so I'm unassigning.
Assignee: jgriffin → nobody
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX