Closed Bug 818968 Opened 10 years ago Closed 8 years ago

B2G emulator reftests should be runnable on AWS

Categories

(Testing :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: jrmuizel)

Details

(Whiteboard: status-in-comment-110)

Attachments

(8 files)

...to reduce the impact on linux32 wait times.
The original idea of moving to linux64 was to help offload our physical linux32 slaves. However now that we have some linux32 tests running on AWS in production, we'd prefer to run B2G tests on AWS also. 

From irc, seems this has been discussed already, and some work already in progress, so I've modified summary to match reality.

ahal, per ctalbert, I think this should be assigned to you, as you are working on this? If not, please kick it back to me!
Assignee: nobody → ahalberstadt
Summary: B2G emulator tests should also run on Linux64 slaves, not just Linux32 → B2G emulator tests should be runnable on AWS
Yep, the B2G tests are running on the Ubuntu 32 pool on Cedar already. There's quite a few regular failures/crashes though. I have a loaner VM currently and am trying to figure out why.

I suspect we'll either have to disable the crashing tests, or try running these on the 64 bit pool with ia32-libs installed (as it seemed to work fine there when I tested it).
Status: NEW → ASSIGNED
The good news is I can reproduce the results we see on cedar.

The bad news is I see:
A) Intermittent socket connection failures
B) Intermittent b2g process crashes (with varying logcat output accompanying them)
C) Reliable emulator crashes on certain tests

Rail, how much work would it be to re-image the Ubuntu 64 pool with a few additional libraries? I'd hate to make you do a bunch of extra work only to run into the same problems though.
Flags: needinfo?(rail)
We don't need to reimage the whole farm, puppet will install needed packages after a reboot.
Flags: needinfo?(rail)
Blocks: 837268
(In reply to Andrew Halberstadt [:ahal] from comment #2)
> Yep, the B2G tests are running on the Ubuntu 32 pool on Cedar already.
> There's quite a few regular failures/crashes though. I have a loaner VM
> currently and am trying to figure out why.
> 
> I suspect we'll either have to disable the crashing tests, or try running
> these on the 64 bit pool with ia32-libs installed (as it seemed to work fine
> there when I tested it).

Not a requirement, but my preference would be just run the tests on linux32, instead of running on linux64-with-32bit-libraries installed. Somehow it sounds more stable! :-)


(In reply to Andrew Halberstadt [:ahal] from comment #3)
> The good news is I can reproduce the results we see on cedar.
cool!

> The bad news is I see:
> A) Intermittent socket connection failures
> B) Intermittent b2g process crashes (with varying logcat output accompanying
> them)
> C) Reliable emulator crashes on certain tests

Are these intermittent problems different to any intermittent problems we already see on physical linux32 test machines?

Where can we see details of these 3 crashes/failures?
(In reply to John O'Duinn [:joduinn] from comment #5)
> Not a requirement, but my preference would be just run the tests on linux32,
> instead of running on linux64-with-32bit-libraries installed. Somehow it
> sounds more stable! :-)

Well the odd thing is that I originally tested on the 64 bit VM and they all worked perfectly fine!

> Are these intermittent problems different to any intermittent problems we
> already see on physical linux32 test machines?

Nope, these problems do not exist on the fedora-32 pool

> Where can we see details of these 3 crashes/failures?

Sorry, should have pasted some logs (there is a logcat dump after the failure):
https://tbpl.mozilla.org/php/getParsedLog.php?id=19668211&tree=Cedar&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=19668227&tree=Cedar&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=19668419&tree=Cedar&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=19668897&tree=Cedar&full=1
https://tbpl.mozilla.org/php/getParsedLog.php?id=19641485&tree=Cedar&full=1

Sadly each of those logs is a slightly different failure (though I imagine at least some of them share the same root cause).
(In reply to Rail Aliiev [:rail] from comment #4)
> We don't need to reimage the whole farm, puppet will install needed packages
> after a reboot.

So do you think we could try this? Or should we really push to get them working on 32 bit?

I'm not really sure how to solve the problems in those logs. It isn't a missing package/dependency problem as the tests run and sometimes even pass. I can try disabling tests but due to the intermittent nature of the failures I think we might just hit them on new tests once I do. Also disabling tests is as much of a band-aid solution as switching to 64 bit machines.
Before we try to run those on a 64-bit platform, I'm going to add some 32-bit test VMs with a different kernel. Maybe it helps...
The generic kernel didn't seem to do anything. Can we try installing ia32-libs and running on the 64 bit pool?
I'll test how the 64-bit slaves behave with ia32-libs installed first, then we can try this option.
Depends on: 843179
Depends on: 843201
These are running on the 64 bit slaves now on cedar and already look much better. There was 1 failure in mochitest-1, but the other 8 chunks and the marionette tests were green. I re-triggered them a few times to see if they're stable or not.
From what I see on cedar (https://tbpl.mozilla.org/?tree=Cedar&rev=81d021bb66df) we have all green opt and all orange (except X) for debug.
Blocks: 844989
I'm no longer working on this now that mochitests, marionette and xpcshell tests are running on the ubuntu 64 vm's. However there's still:

1. The failures on the 32 bit pool
2. Debug failures on both
3. Reftests

So we can leave the bug open for those.
Assignee: ahalberstadt → nobody
Status: ASSIGNED → NEW
Blocks: 850105
(In reply to Andrew Halberstadt [:ahal] from comment #14)
> I'm no longer working on this now that mochitests, marionette and xpcshell
> tests are running on the ubuntu 64 vm's. However there's still:
> 
> 1. The failures on the 32 bit pool
Totally not certain why we need to run emulators on both 64bit OS's and 32bit OS's. Running them *only* on 64bit OS's should be plenty. If they run there and are stable, then let's save ourselves the headache.  Can anyone tell me why we should run them on 32bit OS's?
> 2. Debug failures on both
Yes, we need to fix these, I don't see bugs filed for this. Do they exist?
> 3. Reftests
Likewise, I don't see bugs here. Reftests actually use physical hardware for some tests. How many can we actually put in AWS? For things we can't get in AWS can we run them on Ubuntu ix machines?

> 
> So we can leave the bug open for those.
(In reply to Clint Talbert ( :ctalbert ) from comment #15)
> Totally not certain why we need to run emulators on both 64bit OS's and
> 32bit OS's. Running them *only* on 64bit OS's should be plenty. If they run
> there and are stable, then let's save ourselves the headache.  Can anyone
> tell me why we should run them on 32bit OS's?

Added capacity of being able to hand out B2G emulator tasks to either 32bit or 64bit machines, iiuc.
(In reply to Ed Morley [:edmorley UTC+1] from comment #16)
> (In reply to Clint Talbert ( :ctalbert ) from comment #15)
> > Totally not certain why we need to run emulators on both 64bit OS's and
> > 32bit OS's. Running them *only* on 64bit OS's should be plenty. If they run
> > there and are stable, then let's save ourselves the headache.  Can anyone
> > tell me why we should run them on 32bit OS's?
> 
> Added capacity of being able to hand out B2G emulator tasks to either 32bit
> or 64bit machines, iiuc.

I don't know that we should be trying to do that. It introduces another source of platform differences and possible test failures. We also can add more EC2 machines as we need to, so it's not like we're constrained by 64-bit capacity.
Oh ha yeah true.

I'm still in the 'fedora physical machine and limited in number' mentality :-)
Anyone planning on looking into this?
https://tbpl.mozilla.org/?tree=Cedar&showall=1&jobname=b2g_emulator_vm.*reftest

We are still running these jobs on the Fedora machines on trunk.
https://tbpl.mozilla.org/?showall=1&jobname=b2g_emulator.*reftest

If we could fix this in Q1 it would be great.
No longer depends on: 843100
I believe we need two different branches:
* one to work on OOP & reftests
* another to work on switching Fedora b2g reftests to Ubuntu EC2 b2g reftests

Just when I thought I was starting to understand the big picture a little:
<ahal> armenzg: on cedar those reftests are running OOP, so some of the failures might actually be because of that and not because of running on ec2
<catlee> ahal: aaah
<armenzg> ahal, is that because of some code on the actual Cedar?
<ahal> armenzg: yes, and that's intentional
<armenzg> or some parameters on the unit tests?
<ahal> I just pointed the graphics team at those failures today
<armenzg> ahal, would we need a different branch to fix those reftests?
<ahal> armenzg: it's a pref being set in the harness
<armenzg> *the b2g reftests
<ahal> armenzg: ideally, or if we could figure out how to only have oop enabled for the fed-r3 pool, that would work too
<armenzg> ahal, where is that pref specified?
<ahal> armenzg: http://hg.mozilla.org/projects/cedar/rev/de1a3b3fbf4e
<ahal> we could add a --enable-oop command line option or something
<armenzg> ahal, I think separating branches will make it cleaner
<ahal> armenzg: agreed
<ahal> armenzg: catlee: though keep in mind that we are trying to switch reftests to be oop, so eventually they will need to work oop and on ec2
Summary: B2G emulator tests should be runnable on AWS → B2G emulator reftests should be runnable on AWS
Repeating the status update that I mentioned on the blocked bug:

== Status update ==
We run b2g reftests on rev3 minis on trunk:
https://tbpl.mozilla.org/?showall=1&jobname=b2g_emulator.*reftest
These minis are currently *not* managed due to security issues with the old puppet setup.
These minis are to be decommissioned: they are currently incapable of holding the load (bad wait times), and the data center they are racked in (SCL1) is to be shut down in H1 2014, IIUC.

We could run these jobs on EC2/AWS, however, it is not currently succeeding and no one is currently assigned to fix them (see dependent bug 818968 for details):
https://tbpl.mozilla.org/?tree=Cedar&showall=1&jobname=b2g_emulator_vm.*reftest

As catlee mentions on comment 7, we might be able to run them on in-house machines, however, it is likely that we won't have enough machines and would require a hardware purchase.
Whiteboard: status-in-comment-21
We probably need the help of the GFX team as well to investigate some of these problems.

As a last resort, we could potentially get there by disabling tests, but that's a slippery slope...
How many tests are failing on AWS?
I can't get a hard figure atm, because we're only running these on EC2 on cedar, which we're also using for making the tests run OOP.  It looks like several dozen failures, but there are some crashes, so the actual number may be higher once all the tests are run.
(In reply to Jonathan Griffin (:jgriffin) from comment #25)
> It looks like
> several dozen failures, but there are some crashes, so the actual number may
> be higher once all the tests are run.

I'm sure there are patterns here so we can distribute these failures and crashes and get them fixed quickly.  I'll help find owners, just send me the details.
(In reply to Andrew Overholt [:overholt] from comment #26)
> (In reply to Jonathan Griffin (:jgriffin) from comment #25)
> > It looks like
> > several dozen failures, but there are some crashes, so the actual number may
> > be higher once all the tests are run.
> 
> I'm sure there are patterns here so we can distribute these failures and
> crashes and get them fixed quickly.  I'll help find owners, just send me the
> details.

And Andrew will ping me for the graphics issues.
For some reason I can't understand, it seems that I only mentioned this on email threads, but it is worth mentioning here as well: agal said that mlee should be able to help with this.
I haven't heard back from him.
Flags: needinfo?(mlee)
Depends on: 957767
(In reply to Andrew Overholt [:overholt] from comment #26)
> (In reply to Jonathan Griffin (:jgriffin) from comment #25)
> > It looks like
> > several dozen failures, but there are some crashes, so the actual number may
> > be higher once all the tests are run.
> 
> I'm sure there are patterns here so we can distribute these failures and
> crashes and get them fixed quickly.  I'll help find owners, just send me the
> details.

Andrew, the tests are now running on pine on EC2, side-by-side the Fedora HW slaves; you can see there are 100+ failures on the EC2 runs:  https://tbpl.mozilla.org/?tree=Pine&jobname=reftest

Can you and/or Milan find some developers to look at these failures?  If it would help, we could file some bugs for the failures, but at this point I'm guessing that would just be noise.
Flags: needinfo?(overholt)
Flags: needinfo?(milan)
To record what was tripping me here = I needed to add &showall=1 in order to see the runs.
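For anyone else tripping over this, here is a small sketch of building such a TBPL query URL with hidden jobs included. The helper name `tbpl_url` is made up for illustration; only the `tree`, `jobname` and `showall` query parameters come from the URLs quoted in this bug.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

TBPL_BASE = "https://tbpl.mozilla.org/"

def tbpl_url(tree, jobname, showall=True):
    """Build a TBPL URL (hypothetical helper); showall=1 also lists hidden jobs."""
    params = {"tree": tree, "jobname": jobname}
    if showall:
        params["showall"] = "1"
    return TBPL_BASE + "?" + urlencode(params)

url = tbpl_url("Pine", "reftest")
print(url)
# Round-trip the query string to check what the server would receive
print(parse_qs(urlsplit(url).query))
```

Note that `urlencode` percent-encodes characters such as `*` in job name patterns, which decodes back to the same value on the other end.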
Flags: needinfo?(milan)
A large number of failed tests have the maximum difference of 1 or 2.
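To make those numbers concrete, here is a rough sketch of the two figures reported for a failed image comparison (max difference and number of differing pixels). It assumes both images are given as equal-length lists of RGBA tuples; the function name is hypothetical and this is not the reftest harness's actual code.

```python
def compare_pixels(test_pixels, ref_pixels):
    """Return (max per-channel difference, number of differing pixels)."""
    if len(test_pixels) != len(ref_pixels):
        raise ValueError("images must have the same number of pixels")
    max_diff = 0
    differing = 0
    for test_px, ref_px in zip(test_pixels, ref_pixels):
        # Largest absolute difference across the R, G, B, A channels
        channel_diff = max(abs(t - r) for t, r in zip(test_px, ref_px))
        if channel_diff:
            differing += 1
            max_diff = max(max_diff, channel_diff)
    return max_diff, differing

# Made-up sample data: two of the three pixels are slightly off
ref = [(255, 255, 255, 255), (128, 128, 128, 255), (0, 0, 0, 255)]
test = [(255, 255, 255, 255), (127, 128, 129, 255), (0, 2, 0, 255)]
print(compare_pixels(test, ref))
```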
Flags: needinfo?(mlee)
(In reply to Jonathan Griffin (:jgriffin) from comment #29)
> (In reply to Andrew Overholt [:overholt] from comment #26)
> > (In reply to Jonathan Griffin (:jgriffin) from comment #25)
> > > It looks like
> > > several dozen failures, but there are some crashes, so the actual number may
> > > be higher once all the tests are run.
> > 
> > I'm sure there are patterns here so we can distribute these failures and
> > crashes and get them fixed quickly.  I'll help find owners, just send me the
> > details.
> 
> Andrew, the tests are now running on pine on EC2, side-by-side the Fedora HW
> slaves; you can see there are 100+ failures on the EC2 runs: 
> https://tbpl.mozilla.org/?tree=Pine&jobname=reftest
> 
> Can you and/or Milan find some developers to look at these failures?  If it
> would help, we could file some bugs for the failures, but at this point I'm
> guessing that would just be noise.

Yes, definitely.  I'll work on this.
Flags: needinfo?(overholt)
Bas and I had an exchange on IRC, and he had some thoughts on the subject of a great majority of tests showing a 1 or 2 pixel difference around the SVG characters.  I think AA, but I'll let Bas comment.
Flags: needinfo?(bas)
(In reply to Jonathan Griffin (:jgriffin) from comment #29)
> (In reply to Andrew Overholt [:overholt] from comment #26)
> > (In reply to Jonathan Griffin (:jgriffin) from comment #25)
> > > It looks like
> > > several dozen failures, but there are some crashes, so the actual number may
> > > be higher once all the tests are run.
> > 
> > I'm sure there are patterns here so we can distribute these failures and
> > crashes and get them fixed quickly.  I'll help find owners, just send me the
> > details.
> 
> Andrew, the tests are now running on pine on EC2, side-by-side the Fedora HW
> slaves; you can see there are 100+ failures on the EC2 runs: 
> https://tbpl.mozilla.org/?tree=Pine&jobname=reftest
> 
> Can you and/or Milan find some developers to look at these failures?

I'm only seeing reftest pixel differences failures.  Is that correct?

Did you also want help with bug 905177, bug 918754, bug 942111, bug 926264, and bug 937897?
Flags: needinfo?(jgriffin)
> I'm only seeing reftest pixel differences failures.  Is that correct?

Yes.  Milan has already had an initial look at these failures as well.

> Did you also want help with bug 905177, bug 918754, bug 942111, bug 926264, and bug 937897?

We'll take any help we can get, but none of these are frequent enough to be a high priority (most have only a few or no occurrences this year so far).
Flags: needinfo?(jgriffin)
I asked in the other bug, but let me ask here as well - is there a dependency or a preferred ordering between this and bug 922680?
Good question...Armen, is there a drop dead date for getting rid of the Fedora slaves?
Flags: needinfo?(armenzg)
(In reply to Jonathan Griffin (:jgriffin) from comment #35)
> > Did you also want help with bug 905177, bug 918754, bug 942111, bug 926264, and bug 937897?
> 
> We'll take any help we can get, but none of these are frequent enough to be
> a high priority (most have only a few or no occurrences this year so far).

The low occurrence might be because we only run them on Cedar and we don't start things over there.

(In reply to Milan Sreckovic [:milan] from comment #36)
> I asked in the other bug, but let me ask here as well - is there a
> dependency or a preferred ordering between this and bug 922680?

Right now, we're under water as-is.
If this bug gets fixed, we will fix the capacity problem and have a working state before tackling running them as out of process.
Nevertheless, the ideal would be to 1) run b2g reftests out of process on 2) AWS, however, we can't tell how much doing both at the same time would delay the whole thing.
I would want to *only* focus on moving them to AWS or talos-linux64 ix machines as fallback (since we have a lot and can actually buy more).

That said, the minis are in SCL1 and we are shutting down the colo in July. That means that, if worst comes to worst, we have to be out of there by May. If we still required the minis by then, we would have to coordinate moving all of them in one shot and have a very long downtime until they're all back up again in SCL3. We should avoid this at all costs.

If we could aim at having these running in production by the end of Q1, it would be ideal.

Side note, we will need to uplift some patches to b2g26 branches (where b2g reftests might be running).
Flags: needinfo?(armenzg)
Milan and I just discussed this and decided that the work for this bug will get into the gfx team schedule and will happen before any work required by them on bug 922680.
(In reply to Andrew Overholt [:overholt] from comment #39)
> Milan and I just discussed this and decided that the work for this bug will
> get into the gfx team schedule and will happen before any work required by
> them on bug 922680.

Right - it may happen in parallel, but only if working on bug 922680 does not slow down working on this.  There is a good chance to parallelize, by splitting between Taipei and non-Taipei gfx teams...
(In reply to Milan Sreckovic [:milan] from comment #33)
> Bas and I had an exchange on IRC, and he had some thoughts on the subject of
> a great majority of tests showing 1 or 2 pixel difference around the SVG
> characters.  I think AA, but I'll let Bas comment.

My gut instinct was that there was a rounding difference in blending (it wasn't just around characters), and by coincidence the tests I looked at just happened to have all their partially opaque pixels as a result of AA. I only looked at two tests though, and it would take looking at a little more to say with more certainty.
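A quick illustration of how a rounding difference in blending alone can produce off-by-one or off-by-two results. This is only a sketch: the two formulas below are common ways of normalizing an 8-bit alpha blend, picked for illustration, and are not taken from mesa or Gecko.

```python
def blend_rounded(src, dst, alpha):
    """8-bit blend of one channel, dividing by 255 with round-to-nearest."""
    return (src * alpha + dst * (255 - alpha) + 127) // 255

def blend_shifted(src, dst, alpha):
    """Same blend, but approximating the division by 255 with a truncating >> 8."""
    return (src * alpha + dst * (255 - alpha)) >> 8

def max_blend_mismatch():
    """Largest per-channel disagreement between the two variants.

    src and dst are sampled in steps of 5 (0, 5, ..., 255) to keep this fast.
    """
    worst = 0
    for alpha in range(256):
        for src in range(0, 256, 5):
            for dst in range(0, 256, 5):
                diff = abs(blend_rounded(src, dst, alpha) - blend_shifted(src, dst, alpha))
                worst = max(worst, diff)
    return worst

# Fully opaque white over white already disagrees between the two variants
print(blend_rounded(255, 255, 255), blend_shifted(255, 255, 255))
print(max_blend_mismatch())
```

The mismatch stays tiny per channel, which is consistent with a maximum difference of 1 or 2 showing up wherever partially opaque pixels are blended.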
Flags: needinfo?(bas)
Thanks for looking into it. It is great to see some progress at looking at this and being included into the schedule.
Where does this fit inside the gfx schedule?

Please file a loan request when ready to tackle fixing this. [1]

Just as a reminder, we have a limited pool and it affects getting results on tbpl due to backlogs on test results.

[1] https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_slave
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #42)
> Thanks for looking into it. It is great to see some progress at looking at
> this and being included into the schedule.
> Where does this fit inside the gfx schedule?

We're aiming for Gecko 30, but it hasn't been assigned yet.
As I mentioned earlier at the office to Milan, thanks for letting us know the schedule. It works with our data-center-evacuation deadlines (IIUC).
Best of luck with it. Let us know how we can help.
Assignee: nobody → jmuizelaar
Depends on: 965429
Depends on: 965484
Jeff, is this the config file you're using?
http://hg.mozilla.org/build/mozharness/file/6951cbba436d/configs/b2g/emulator_automation_config.py

https://pastebin.mozilla.org/4149410

Did you download the emulator locally? I can't tell from the pastebin you gave me.
Maybe you can specify it with a similar option like --installer-url.

FTR, you can use --no-clobber to prevent local clobbering.

Here's the mozinstall code if you want to look at it:
http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozinstall/mozinstall/mozinstall.py
It would be great if we could get a test run with the environment variable GALLIUM_DRIVER=softpipe

That seems to help with quite a few of the tests.
Not sure where this is getting tested, but you can add it here:
http://mxr.mozilla.org/mozilla-central/source/build/mobile/b2gautomation.py#83
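A minimal sketch of what that could look like, assuming the emulator is launched through `subprocess` with an explicit environment. The function name and the child command are placeholders, not the actual code in b2gautomation.py or mozharness; only the `GALLIUM_DRIVER=softpipe` setting comes from this bug.

```python
import os
import subprocess
import sys

def emulator_env(base_env=None, gallium_driver="softpipe"):
    """Return a copy of the environment with GALLIUM_DRIVER set (hypothetical helper)."""
    env = dict(os.environ if base_env is None else base_env)
    # Asks Mesa's Gallium stack to use the named software driver
    env["GALLIUM_DRIVER"] = gallium_driver
    return env

# Placeholder command: a child Python process that just echoes the variable,
# standing in for the real emulator launch
cmd = [sys.executable, "-c", "import os; print(os.environ['GALLIUM_DRIVER'])"]
result = subprocess.run(cmd, env=emulator_env(), capture_output=True, text=True, check=True)
print(result.stdout.strip())
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the launched process, so other suites run from the same harness are not affected.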
Depends on: 968199
Depends on: 967799
I landed a mozharness change that sets GALLIUM_DRIVER=softpipe for the b2g reftest jobs.

I have triggered a b2g ics emulator build on Cedar and we should see the results in a couple of hours:
https://tbpl.mozilla.org/?tree=Cedar&jobname=emulator&rev=8001db135ba5
It seems we need to back out the change since it is making the crashtest jobs run for more than 100 minutes (and time out), while they used to run for around 50 minutes.

I've also started seeing some mochitest timeouts.
The patch landed again with a fix.

I've re-triggered the crashtest jobs in here to prove that things are now fixed (2nd & 3rd re-triggered jobs):
https://tbpl.mozilla.org/?tree=Cypress&jobname=b2g_emulator.*crashtest

I will post a new comment once I know the right place to look for our current status.
I figured it out.

Do not use Cedar to check on status for this bug since OOP is running there.

I'm going to use Elm for status on where we are. I hope to have it ready by tomorrow.
Comment on attachment 8374088 [details] [diff] [review]
[checked-in] configure Elm to run b2g reftests on Fedora and Ubuntu

lgtm
Attachment #8374088 - Flags: review?(rail) → review+
Comment on attachment 8374090 [details] [diff] [review]
show builders differences after attachment 8374088 [details] [diff] [review]

ship it!
Attachment #8374090 - Flags: feedback?(rail) → feedback+
Comment on attachment 8374088 [details] [diff] [review]
[checked-in] configure Elm to run b2g reftests on Fedora and Ubuntu

http://hg.mozilla.org/build/buildbot-configs/rev/433520990836

I will be updating the buildbot masters to make this change live.
Attachment #8374088 - Attachment description: configure Elm to run b2g reftests on Fedora and Ubuntu → [checked-in] configure Elm to run b2g reftests on Fedora and Ubuntu
Live in production.
If we want to see the Fedora tests:
https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator%20elm

If we want to see the Ubuntu tests:
https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm

Jeff, this is back to you to evaluate where we're at.

I believe you landed something on m-i which I don't believe is contained on my push from this morning.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #58)
> If we want to see the Fedora tests:
> https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator%20elm
> 
> If we want to see the Ubuntu tests:
> https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm
> 
> Jeff, this is back to you to evaluate where we're at.
> 
> I believe you landed something on m-i which I don't believe is contained on
> my push from this morning.

Can we try relanding the softpipe change once bug 967799 has been merged to m-c?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #59)
> 
> Can we try relanding the softpipe change once bug 967799 has been merged to
> m-c?

Pushed:
https://hg.mozilla.org/build/mozharness/rev/7853577f5492

The Cedar branch reads from the default branch. Elm reads from production.
I will re-trigger the jobs.
Jeff, the jobs are now running with your change
https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #61)
> Jeff, the jobs are now running with your change
> https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm

This doesn't seem to have bug 967799 yet.
That change just landed in m-c today. Someone will need to merge m-c to elm before it hits there.
(In reply to Jeff Muizelaar [:jrmuizel] from comment #62)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #61)
> > Jeff, the jobs are now running with your change
> > https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm
> 
> This doesn't seem to have bug 967799 yet.

Can you please take care of merging to Elm?

I mentioned it on comment 58:
> I believe you landed something on m-i which I don't believe is contained on my push from this morning.
For the curious, we started seeing some green:
https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm
jrmuizel, I see that the slave has been returned in bug 965429.

What are the next steps in here?
(In reply to (Wed-Thu. Feb.19-20th in TRIBE) Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #66)
> jrmuizel, I see that the slave has been returned in bug 965429.
> 
> What are the next steps in here?

I'm going to put together a patch against mesa to fix llvm and then you guys are going to deploy that mesa and we will switch back to llvmpipe.
Thank you jrmuizel.
Depends on: 975034
We deployed the patched mesa today.
While we find out whether the change will stick or not, I have merged m-i to elm so we get fresh test results based on the patched libraries.
We should know by tomorrow if any oddities are found on other test suites run on these hosts.

The only worry I have is a single test failure on mozilla-inbound for b2g desktop on Linux64.
I don't see it on b2g-inbound but it could be a matter of timing.
Let's re-trigger to get more results.
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=b2g_ubuntu64.*gaia
https://tbpl.mozilla.org/?tree=B2g-Inbound&jobname=b2g_ubuntu64.*gaia
The test failures are not related to our deployment. It was a bad merge.
Green! Green!
https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm

It seems that R5 is the only stubborn one.

Jeff, what do you think is needed for that last suite?
Status update
#############

* Running side by side b2g reftests on Elm on minis and ec2
** https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm
** https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator%20 (the last space is intentional)
* All reftests except R5 are running green
Whiteboard: status-in-comment-21 → status-in-comment-72
Depends on: 981856
Depends on: 981865
hrmmm... after merging m-c to Elm we have started seeing a crash on all of the suites.
https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm

PROCESS-CRASH | Shutdown | application crashed [@ 0x0]
Bug 930884 - Intermittent PROCESS-CRASH | Shutdown | application crashed [@ 0x0] [@ egl_window_surface_t::~egl_window_surface_t] [@ GLContextEGL::~GLContextEGL] Return code: 1
https://tbpl.mozilla.org/php/getParsedLog.php?id=35953998&tree=Elm&full=1

I've re-triggered the previous changesets to see if it was merge related or not.
I will merge once more.
FTR we upload artifacts to blobber to help determine issues e.g.:
12:55:56     INFO -  (blobuploader) - INFO - TinderboxPrint: Uploaded qemu.log to http://mozilla-releng-blobs.s3.amazonaws.com/blobs/elm/sha512/8904641cb2dce54e269d4afab27948f42ea0295d1445f5c7289d5a592a0c243c58e78dba9e55d45ab72a8dbbaa5b6b3cbda75e149e5ce62a663575ffdd270cc4
It was the merge from m-c.
Re-triggers on https://hg.mozilla.org/projects/elm/rev/7b8c2a48d08b did not have the crash.
Yeah you just got unlucky and merged between bustage and backout:
https://hg.mozilla.org/projects/elm/rev/1bcd1fea2e73

That being said, you must have merged inbound in, I'd recommend not doing that :)
We're back to R5 failing (webgl).
R5 has run green. One of the runs failed 3 tests:
https://tbpl.mozilla.org/?tree=Elm&jobname=b2g

I'm going to write a patch to enable these jobs side by side on m-i.
I want to see them run a lot side-by-side.
On Monday we should look at enabling on more branches depending on how it looks.
FTR, I believe the only things we will have to uplift are bug 981856 and bug 967799.
Attachment #8391172 - Flags: review?(bhearsum) → review+
Depends on: 983650
Live.

We will start seeing them in a bit:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=b2g_emulator_vm.*elm%20opt%20test%20reftest&showall=1

We might need to hide if we see intermittent oranges.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #81)
> Live.
> 
> We will start seeing them in a bit:
> https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=b2g_emulator_vm.*elm%20opt%20test%20reftest&showall=1
> 
> We might need to hide if we see intermittent oranges.

Many of these are failing, so I've hidden all b2g_emulator_vm reftest jobs on mozilla-inbound.
They're now running:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=b2g_emulator_vm.*opt%20test%20reftest&showall=1

I thought the patch from bug 981856 would have landed by now.
jrmuizel, can you please land it when the trees re-open?

R1 failed once:
https://tbpl.mozilla.org/php/getParsedLog.php?id=36152212&tree=Mozilla-Inbound&full=1
I've landed bug 981856 and bug 983650 which should be enough to make these tests go green.
Depends on: 984468
= Status update =
* Running hidden side-by-side on Mozilla-Inbound:
** All jobs are running green
** https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=b2g_emulator_vm.*reftest&showall=1
** I will ask sheriffs if we can go ahead and enable across the board (bug 984468)
Depends on: 985556
This patch depends on bug 985556 and bug 984468 being fixed first.
Hopefully by tomorrow.
Attachment #8393664 - Flags: review?(rail)
Attachment #8391172 - Attachment description: enable b2g reftests on EC2 for m-i → [checked-in] enable b2g reftests on EC2 for m-i
Attachment #8393664 - Flags: review?(rail) → review+
Comment on attachment 8393667 [details]
change of list of builders from attachment 8393664 [details] [diff] [review]

lgtm
Attachment #8393667 - Flags: feedback?(rail) → feedback+
Depends on: 985650
Here is a list of intermittent failures that I've seen. The R1 failure happens more often than the other two (only 1 instance each); however, it is a known intermittent.

All jobs:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=b2g_emulator_vm.*reftest&showall=1

R1 - Known intermittent on the Fedora machines - bug 930894
R3 - Known intermittent on the Fedora machines - bug 948389
R6 - Not known

R1) REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/image/test/reftest/ico/ico-bmp-corrupted/wrapper.html?invalid-compression-RLE4.ico | image comparison (==), max difference: 255, number of differing pixels: 4
https://tbpl.mozilla.org/php/getParsedLog.php?id=36451088&tree=Mozilla-Inbound&full=1
(blobuploader) - INFO - TinderboxPrint: Uploaded qemu.log to http: //mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/cd586370f83e777665329ad43e580eebdc1696cb5a909b74c07bd72d0080d9927656c25f822ce21b0c4f82f94542781eee1b64ec2ac5706eac0b7a40852ade50
(blobuploader) - INFO - TinderboxPrint: Uploaded emulator-5554.log to http: //mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/52325c8a1abacaad7c4edecb75d6c08c20c0769feebeb88856d4858e48c81eb7ecd1cf094fa7260f394fa7bee4a48c8ff69158a4d1077040c665f62a1ce4296e

R3) REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/bugs/405577-1.html | image comparison (==), max difference: 16, number of differing pixels: 73
Bug 948389 - Intermittent REFTEST TEST-UNEXPECTED-FAIL | tests/layout/reftests/bugs/405577-1.html | image comparison (==), max difference: 16, number of differing pixels: 73
https://tbpl.mozilla.org/php/getParsedLog.php?id=36461344&tree=Mozilla-Inbound&full=1
(blobuploader) - INFO - TinderboxPrint: Uploaded qemu.log to http: //mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/61ea59f1f1ebe9f9d1d69066ea43ab18ef3a68f19974238e0a8d5603b966a056990594d9a3b27637a4587b556a01e154f4a1f11111b36fb3d6fb6167ba208f38
(blobuploader) - INFO - TinderboxPrint: Uploaded emulator-5554.log to http: //mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/72914f85ade6e4b102af7e55a7ee57264a45036e9fac269e4c67d01495146d01d2856539671159999eb0e762c038b1510f434c5d1a79918872b6f19c0110b22e

R6) TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/position-dynamic-changes/horizontal/fromauto-leftN-widthA-rightN-2.html?border_parent | application timed out after 330 seconds with no output
https://tbpl.mozilla.org/php/getParsedLog.php?id=36440356&tree=Mozilla-Inbound&full=1
(blobuploader) - INFO - TinderboxPrint: Uploaded qemu.log to http: //mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/9d1f67257efb69e3fd5a58163286bf73434da66f86e7d06d81a48a96da5fdada4f41a0194b3e9cb3b08904d19cfd26ecefeb2ffd78b021fc7f182fd73d924ae2
(blobuploader) - INFO - TinderboxPrint: Uploaded emulator-5554.log to http: //mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/9e2e975c57f6b1691351c274e62c4177c216294aaa91399d236d5b5c5c983de05dadafa99cf36d636fbe0ae453777f5820dcc1260040e7d286d9287b1c88d00e
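For anyone triaging these by hand, a minimal sketch of pulling the numbers out of the image-comparison lines quoted above. The field layout is copied from those log lines; the function name `parse_failure` and the dict keys are made up for illustration, and this is not part of any harness.

```python
import re

# Pattern for the image-comparison failures quoted above. The field layout
# is taken from those log lines; anything else in a real log is ignored.
FAILURE = re.compile(
    r"REFTEST TEST-UNEXPECTED-FAIL \| (?P<url>\S+) \| "
    r"image comparison \((?P<op>==|!=)\), "
    r"max difference: (?P<max_diff>\d+), "
    r"number of differing pixels: (?P<pixels>\d+)"
)


def parse_failure(line):
    """Return a dict describing an image-comparison failure, or None.

    parse_failure and the dict keys are hypothetical names for this sketch.
    """
    match = FAILURE.search(line)
    if match is None:
        return None
    return {
        "url": match.group("url"),
        "op": match.group("op"),
        "max_diff": int(match.group("max_diff")),
        "pixels": int(match.group("pixels")),
    }
```

Timeout lines such as the R6 one do not match the pattern and come back as None, which makes it easy to separate rendering differences from hangs.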
Whiteboard: status-in-comment-72 → status-in-comment-90
Today, we can say that b2g reftests are runnable on EC2.
I'm moving bug 985650 to bug 864866 to make it easier to see what is left to move away from the minis.

Jeff, what is left for bug 983650?
No longer depends on: 985650
Jeff, can you please help with the uplift process?

We have to investigate what is required to get these trees green:
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1
v1.3t is pretty broken right now and b2g18 doesn't matter anymore.
Blocks: 864866
Waiting on uplift requests to be reviewed:
https://bugzilla.mozilla.org/show_bug.cgi?id=981856#c10
https://bugzilla.mozilla.org/show_bug.cgi?id=967799#c12

If I hear nothing by tomorrow, I will ask around to find out who reviews them and whether they could review them soon.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #93)
> Waiting on uplift requests to be reviewed:
> https://bugzilla.mozilla.org/show_bug.cgi?id=981856#c10
> https://bugzilla.mozilla.org/show_bug.cgi?id=967799#c12
> 
> If I hear nothing by tomorrow I will ask about as to who reviews them and if
> they could review it soon.

Is anyone around who has those 4 repos checked out and could land the 2 approved patches on them?
https://tbpl.mozilla.org/?tree=Mozilla-B2g18&jobname=b2g_emulator_vm.*reftest&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3t&jobname=b2g_emulator_vm.*reftest&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1

I can check the repos out myself if no one is able to help by tomorrow.
Thank you jgriffin for pushing those changes to 1.3.

I see some issues for R7 (perma-orange), R8 (perma-orange) & R11 (high freq. intermittent).
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1&rev=c6c7b01cdb8e
== Status ==
* Waiting for uplift on bug 983650 to happen on b2g26/28 branches
* Once that lands, we hope to get green
* If we do, then we can unhide the jobs and start giving people heads up that we're going to switch over

FTR: I will be traveling on Wednesday and away on Friday for a conference.
Whiteboard: status-in-comment-90 → status-in-comment-97
I've unhidden the b2g reftests that are green on the b2g 26 & 28 trees so we don't regress.

How do we move forward with this? Keep investigating the issues on the b2g release branches one by one?

We have various perma-oranges on the b2g release branches.
Some of them are 1-test failures, which we can disable.
Some of them are T-FAIL and I see this message:
12:02:33     INFO -  System JS : ERROR chrome://marionette/content/marionette-listener.js:173
12:02:33     INFO -                       TypeError: content is null

On another note, do I need to do something special to look at the emulator.log that gets uploaded to blobber? It is all gibberish:
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-b2g26_v1_2/sha512/2e3ec79ae08c9561d368764adadabc0a144643cd98fb9a1c14fa7cd731c49da0e5cf84de43939941166095f0839fb21681320dd9aa8f533f21c2c7d39b506025

I'm looking at the logcat output in the logs (after "INFO - dumping logcat"), and it might have info about the T-FAIL instances (there are all sorts of warnings, and I can't tell whether they're important or not).

On B2G26-v1.2 [1]:
* We have R8, R11 & R13 misbehaving

R8: T-FAIL or 28 tests failing
TypeError: content is null
TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/font-inflation/text-1.html | application timed out after 330 seconds with no output

R11: The regression was introduced with one of the changesets by jrmuizel (I think):
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/e04883daad8a

R13:
TypeError: content is null
TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/svg/text/simple-multiline-number.svg | application timed out after 330 seconds with no output

On B2G28-v1.3 [2]:
* We have R7, R8 & R11 misbehaving

R7: T-FAIL:
TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/content/canvas/test/reftest/webgl-color-test.html?__&alpha&_____&_______&preserve&_______ | application timed out after 330 seconds with no output

R8: T-FAIL
TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/font-inflation/textarea-3.html | application timed out after 330 seconds with no output
OR
TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/content/canvas/test/reftest/webgl-color-test.html?aa&_____&depth&_______&preserve&_______ | application timed out after 330 seconds with no output

On B2G28-v1.3t [3]:
* We have R7, R11 & R15 misbehaving

R7: T-FAIL

R11: Intermittent 1-test failure

R15: 1 test failing:
REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8888/tests/layout/reftests/text/wordbreak-4a.html | image comparison (==), max difference: 255, number of differing pixels: 3437

[1] https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1
[2] https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1
[3] https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3t&jobname=b2g_emulator_vm.*reftest&showall=1
It seems like a lot of these are timeouts.
They're permanent timeouts rather than sporadic. Re-triggers don't green up.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #100)
> They're permanent timeouts rather than sporadic. Re-triggers don't green up.

Any guesses as to why we're timing out on these branches and not on trunk?
(In reply to Jeff Muizelaar [:jrmuizel] from comment #101)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4)
> from comment #100)
> > They're permanent timeouts rather than sporadic. Re-triggers don't green up.
> 
> Any guesses as to why we're timing out on these branches and not on trunk?

They're Gecko 26 & 28 instead of 31; besides that difference, I don't know.
(In reply to comment #101)
> (In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from
> comment #100)
> > They're permanent timeouts rather than sporadic. Re-triggers don't green up.
> 
> Any guesses as to why we're timing out on these branches and not on trunk?

There's a thread on dev-platform discussing weird emulator timeouts for mochitests right now.  Perhaps that's relevant?
Perhaps.  In any case, we need to be completely off the rev3 minis by the end of April, so we'll likely need to disable the tests on 26 & 28 unless someone has time to investigate them.  We probably can wait about 1 week before we start doing this, to make sure we meet our deadline.
Depends on: 994936
Looks like all the relevant patches have been uplifted, and reftests on AWS are green everywhere except 1.3T, where there are a few intermittents.  That branch is not sheriffed, so we don't need to green it up right now (there are some intermittents on the Fedora slaves as well).

I think we can likely declare victory here, and disable the Fedora tests on non-trunk branches next week.  Thanks for the hard work everyone.
doh!  I didn't notice that.  I guess aggressive disabling will start next week then.
I've sent a notice to dev.b2g and dev.platform to let them know that we will be looking at disabling those tests on those branches.
v1.3 merges into v1.3t so we shouldn't need to land there separately.
= Status update =
* We have perma failures on b2g26 and b2g28 branches [1]
* Today I will meet with jgriffin and ahal to decide how to proceed with disabling those failing tests
* Once we get green we will be disabling the minis on m-a and b2g release branches

[1]
https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3t&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1
Whiteboard: status-in-comment-97 → status-in-comment-109
Whiteboard: status-in-comment-109 → status-in-comment-110
So all the failures except for R8 on b2g26 are consistent. I vote we start by disabling those and then see what happens. The R8 on b2g26 has a different failure each time, so it seems like something timing-related caused by a prior test. That being said, some of the b2g26 failing R8 tests are the same ones as in b2g28 R8. So we can try disabling them on 26 anyway to see what happens.
No longer depends on: 994936
I decided to use skip-if instead of random-if for all the webgl-color-test entries because they were showing up orange even with the random-if. I guess we can't have two random-if statements on the same line?

Anyway, random-if is essentially the same as skip-if in terms of who looks at results, and skip-if has less chance of affecting other tests :).
Attachment #8406362 - Flags: review?(jgriffin)
Comment on attachment 8406362 [details] [diff] [review]
disable failures on b2g28, pass 1

Review of attachment 8406362 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds good to me.
Attachment #8406362 - Flags: review?(jgriffin) → review+
Here's the b2g26 initial pass. I changed some random-if's to skip-if's for the same reason. They are failing even though they are set to random-if because they are hitting an exception and erroring out. Skipping will hopefully prevent this (though it's possible the exception is unrelated to a specific test, let's hope not).
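To spell out the random-if versus skip-if difference, here is a toy model of how each annotation changes what gets reported. It is a simplified sketch based only on the behaviour described in this bug, not the real reftest harness; the function `report` and the status strings are invented for illustration.

```python
# Toy model of how one manifest annotation changes the reported status.
# Assumption: simplified semantics, not the real reftest harness.
def report(annotation, outcome):
    """Return the status a job would show for one test.

    annotation: None, "random-if" or "skip-if" (condition assumed true).
    outcome: "pass", "fail" or "timeout" (what happens if the test runs).
    """
    if annotation == "skip-if":
        return "skipped"          # the test never runs at all
    if outcome == "timeout":
        return "unexpected-fail"  # a hang or exception is not a comparison result
    if annotation == "random-if":
        return "random"           # pass or fail, either result is accepted
    return "pass" if outcome == "pass" else "unexpected-fail"
```

In this model a random-if test that times out still turns the job orange, while the same test under skip-if never gets the chance to.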
Attachment #8406381 - Flags: review?(jgriffin)
Attachment #8406381 - Flags: review?(jgriffin) → review+
The 26 patch had a space between skip-if and (B2G), which seems to have caused a parse error:
https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/fb09ba297b43
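A rough sketch of why that space matters, assuming manifest lines are split on whitespace so that an annotation and its condition must form a single token. The checker below is hypothetical (the name `check_annotations` and the short list of annotation names are only for illustration) and is not the actual manifest parser.

```python
import re

# Hypothetical, simplified check; not the real reftest manifest parser.
# Assumption: a manifest line is split on whitespace and each annotation
# must be a single token of the form name(condition).
ANNOTATION = re.compile(r"^(skip-if|random-if|fails-if)\((.+)\)$")
BARE_NAME = re.compile(r"^(skip-if|random-if|fails-if)$")


def check_annotations(line):
    """Return the (name, condition) pairs at the start of a manifest line.

    Raises ValueError if an annotation name is separated from its
    parenthesised condition by whitespace.
    """
    found = []
    for token in line.split():
        match = ANNOTATION.match(token)
        if match:
            found.append((match.group(1), match.group(2)))
        elif BARE_NAME.match(token):
            raise ValueError("annotation %r has no attached condition" % token)
        else:
            break  # first non-annotation token: the rest is the test itself
    return found
```

With this check, `skip-if(B2G) == a.html a-ref.html` is accepted, while `skip-if (B2G) == a.html a-ref.html` raises because `skip-if` arrives as a token on its own.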
b2g28 now seems green, so I think we can go ahead and stop scheduling the Fedora tests there. b2g28t still has a few failing reftests, but since 28 merges into 28t I think these are due to something other than running on AWS.
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #118)
> R15 is perma-failing on b2g28t:
> https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3t&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1

Right, but b2g28t is not sheriffed and there's a chance that failure is caused by something that landed specifically on b2g28t, so I think we should leave it be (for now at least).
Should we then leave R15 hidden there? I'm happy to.
Down to R8 & R15:
https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1

Perhaps we're surfacing some issues with WebGL and SVG?
Or do we need to chunk further for this branch (since we don't have faster VMs at the moment)?
If you'd like I can try a few more rounds of disabling, but my intuition is telling me that new tests will just start failing instead.
At least R13 is a consistent failure, we might have a chance at greening that up.
(In reply to Andrew Halberstadt [:ahal] from comment #122)
> If you'd like I can try a few more rounds of disabling, but my intuition is
> telling me that new tests will just start failing instead.

I get the same vibe.

Perhaps we just disable R8 completely for that branch and live with it.
1.2 isn't very important; I think we could get away with disabling perma-orange chunks.
WFM. I will work on the patches.

FYI we announced this at the platform meeting a few minutes ago:
https://wiki.mozilla.org/Platform/2014-04-15#RelEng_.28catlee.29
Attachment #8407097 - Flags: review?(rail) → review+
Comment on attachment 8407097 [details] [diff] [review]
[checked-in] disable some chunks for b2g26 and some for b2g28t

Review of attachment 8407097 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not 100% sure, but I think we should just leave 1.3t alone. It's not being sheriffed so oranges aren't a cause for backout. Though I guess they still do introduce noise, so maybe we should just stop running them. Either way I'm not really the person to make that call, so I'll leave it up to you :)
Attachment #8407097 - Flags: feedback?(ahalberstadt)
Attachment #8407097 - Attachment description: disable some chunks for b2g26 and some for b2g28t → [checked-in] disable some chunks for b2g26 and some for b2g28t
We're done in here.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED