Closed Bug 864866 Opened 11 years ago Closed 10 years ago

[tracker] Move away from the rev3 minis

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
All
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: armenzg, Assigned: armenzg)

References

Details

(Whiteboard: status-in-comment-47)

Attachments

(12 files, 2 obsolete files)

* 6.06 KB, patch (bhearsum: review+, armenzg: checked-in+)
* 5.89 KB, patch (bhearsum: review+, armenzg: checked-in+)
* 160.26 KB, patch (coop: review+)
* 2.28 KB, patch (jgriffin: review+, armenzg: checked-in+)
* 5.91 KB, patch (rail: review+, armenzg: checked-in+)
* 5.75 KB, text/plain (jgriffin: feedback+)
* 4.78 KB, patch (rail: review+, armenzg: checked-in+)
* 4.51 KB, patch (rail: review+, armenzg: checked-in+)
* 7.48 KB, patch (rail: review+, armenzg: checked-in+)
* 52.14 KB, patch (coop: review+, armenzg: checked-in+)
* 43.72 KB, patch (rail: review+, armenzg: checked-in+)
* 1.27 KB, patch (rail: review+, armenzg: checked-in+)
This bug will track moving away from the rev3 minis.

What do we currently run?
* Fed32 - talos (bug 863903) and b2g (bug 850105)
* Fed64 - talos (bug 863903)
* XP (bug 770579) - unittests and talos
* Win7 (bug 770578) - unittests and talos

For some things we will also need to ride the trains.

Current machine distributions:
fed32 - 100 machines
fed64 -  39 machines
winxp - 121 machines
win7  - 129 machines
Depends on: 837017
Depends on: 850105
These 3 bugs track disabling rev3 jobs on Fedora, Fedora64, WinXP and Win7 on m-b, m-r and m-esr17.
Depends on: 877336, 877337, 877342
Component: Release Engineering: Machine Management → Release Engineering: Platform Support
QA Contact: armenzg → coop
Product: mozilla.org → Release Engineering
For WinXP, Win7, Fed & Fed64:
* we still run desktop tests on https://tbpl.mozilla.org/?tree=Mozilla-Esr17
** once esr17 dies, it will be taken care of
** we're one merge day away (6 weeks)
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #4)
> I also found these b2g jobs:
> ############################
...
> https://tbpl-dev.allizom.org/?tree=Mozilla-B2g26-v1.2&showall=1&jobname=2g_emulator%20mozilla-b2g26_v1_2%20opt%20test%20marionette-webapi

I believe bug 932988 will take care of this.
Depends on: 932988
I will help drive this to completion.
Assignee: nobody → armenzg
Depends on: 863236
No longer depends on: 877342
Depends on: 948135
Bug 948135 will deal with decommissioning the talos-r3-w7 machines. Doing that bug now means we do not have to wait until b2g18 dies in March.
Attachment #8346598 - Flags: review?(bhearsum)
Attachment #8346600 - Flags: review?(bhearsum)
Attachment #8346598 - Flags: review?(bhearsum) → review+
Attachment #8346600 - Flags: review?(bhearsum) → review+
Comment on attachment 8346600 [details] [diff] [review]
[tools] remove talos-r3-w7 plus some old platforms

https://hg.mozilla.org/build/tools/rev/86744be92bed
Attachment #8346600 - Flags: checked-in+
mysql> select count(*) from slaves where name like 'talos-r3-w7-%';
+----------+
| count(*) |
+----------+
|      133 |
+----------+
1 row in set (0.00 sec)

mysql> delete from slaves where name like 'talos-r3-w7-%';
Query OK, 133 rows affected (0.00 sec)
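The count-then-delete check above can also be scripted so that the delete is only committed when the number of affected rows matches the preview count. This is a minimal sketch, not the slavealloc tooling: it assumes an in-memory SQLite table standing in for the real MySQL database, and the table layout, sample names and the helper `delete_slaves_like` are all hypothetical.

```python
import sqlite3


def delete_slaves_like(conn, pattern):
    """Delete rows whose name matches `pattern`, committing only if the
    number of deleted rows equals the number counted beforehand."""
    cur = conn.cursor()
    cur.execute("SELECT COUNT(*) FROM slaves WHERE name LIKE ?", (pattern,))
    expected = cur.fetchone()[0]
    cur.execute("DELETE FROM slaves WHERE name LIKE ?", (pattern,))
    if cur.rowcount != expected:
        # Something changed between the count and the delete; undo it.
        conn.rollback()
        raise RuntimeError("expected %d rows, deleted %d" % (expected, cur.rowcount))
    conn.commit()
    return expected


# Hypothetical stand-in data; the real table lives in MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE slaves (name TEXT PRIMARY KEY)")
names = ["talos-r3-w7-%03d" % i for i in range(1, 4)] + ["talos-r3-fed-001"]
conn.executemany("INSERT INTO slaves (name) VALUES (?)", [(n,) for n in names])
conn.commit()

deleted = delete_slaves_like(conn, "talos-r3-w7-%")
remaining = [row[0] for row in conn.execute("SELECT name FROM slaves")]
```

Doing the count and the delete in the same transaction is a design choice here: it guards against the pattern matching more rows than were reviewed.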
Depends on: 949582
Attachment #8346735 - Flags: review?(coop)
Depends on: 948427
No longer depends on: 863236
Comment on attachment 8346735 [details] [diff] [review]
slave_health.cleanup.diff

Review of attachment 8346735 [details] [diff] [review]:
-----------------------------------------------------------------

Landed as part of:

https://hg.mozilla.org/users/coop_mozilla.com/slave_health/rev/70ca2a0e8e5f
Attachment #8346735 - Flags: review?(coop)
Attachment #8346735 - Flags: review+
Attachment #8346735 - Flags: checked-in+
I will start looking into the Fedora machines:
* bug 850101
* bug 818968
* bug 948551
== Status update ==
We have dealt with xp and win7.
We have to deal with Fed and Fed64.
We will first make sure that the right jobs are being run somewhere (even if orange).
We need to stop running on Rev3 machines for *any* trunk tree.
We will then look at release dates and see if they work with scl1 colo moves.

* Bug 850101 - Linux 64-bit *debug* mochitest-browser-chrome
https://tbpl.mozilla.org/?jobname=Rev3&showall=1
https://tbpl.mozilla.org/?tree=Cedar&showall=1&jobname=Ubuntu%20VM%2012.04%20cedar%20debug%20test%20mochitest-browser-chrome

It seems that the A-team is also trying to run it as a split job. I'm not 100% sure.

* Bug 850105 - Run b2g emulator unit tests on Ubuntu
https://tbpl.mozilla.org/?showall=1&jobname=b2g_emulator.*reftest
https://tbpl.mozilla.org/?tree=Cedar&showall=1&jobname=b2g_emulator.*reftest

The Cedar jobs should have been running either on the EC2 instances or the in-house machines; however, they're running on Cedar :'(

On another note, the jit tests are running on the rev3 minis as well as the Firefox desktop jobs:
https://tbpl-dev.allizom.org/?tree=Cedar&showall=1&jobname=Rev3
Whiteboard: status-in-comment-16
Something here is in production
Depends on: 950111
== Status update ==
We need owners from b2g, a-team or fx devs to drive the following two bugs to completion:

* Bug 850101 - Run desktop mochitests-browser-chrome on Ubuntu
https://bugzilla.mozilla.org/showdependencytree.cgi?id=850101&hide_resolved=1
==> Status https://bugzilla.mozilla.org/show_bug.cgi?id=850101#c12

* Bug 818968 - B2G emulator tests should be runnable on AWS
==> Status https://bugzilla.mozilla.org/show_bug.cgi?id=850105#c8
It seems that bug 850101 at least has some action in the dep bugs.
Back in December we heard that Mike Lee will be helping with the b2g reftests and Felipe Gomes will be helping us with mochitest-browser-chrome.
== Status update ==
We got owners: fgomes & jrmuizel

They're still working on figuring out how to make the trees go green.

* Bug 850101 - Run desktop mochitests-browser-chrome on Ubuntu
https://bugzilla.mozilla.org/showdependencytree.cgi?id=850101&hide_resolved=1
==> Status https://bugzilla.mozilla.org/show_bug.cgi?id=850101#c12

* Bug 818968 - B2G emulator tests should be runnable on AWS
==> Status https://bugzilla.mozilla.org/show_bug.cgi?id=850105#c8
Whiteboard: status-in-comment-16 → status-in-comment-21
(In reply to Armen Zambrano [:armenzg] (Release Engineering) (EDT/UTC-4) from comment #21)
> == Status update ==
> We got owners: fgomes & jrmuizel
> 
> They're still working on figuring how to make the trees go green.
> 
> * Bug 850101 - Run desktop mochitests-browser-chrome on Ubuntu
> https://bugzilla.mozilla.org/showdependencytree.cgi?id=850101&hide_resolved=1
> ==> Status https://bugzilla.mozilla.org/show_bug.cgi?id=850101#c12
I've heard nothing new from fgomes.

> 
> * Bug 818968 - B2G emulator tests should be runnable on AWS
> ==> Status https://bugzilla.mozilla.org/show_bug.cgi?id=850105#c8

We got some green test jobs on Elm.
I've got bug 975034 blocking it. I hope to make a breakthrough this week.
Whiteboard: status-in-comment-21 → status-in-comment-22
The A-team has also filed a tracking bug for this initiative: bug 981775.
== Status update ==
* B2g reftests on AWS
** https://tbpl.mozilla.org/?tree=Elm&jobname=b2g_emulator_vm%20elm
** only R5 is giving us trouble
** Jeff is to investigate it in bug 981856

* Debug mochitest-browser-chrome
** https://tbpl.mozilla.org/?tree=Cedar&jobname=Ubuntu.*mochitest-browser-chrome-
** Splitting into 3 chunks might be the way to go
** We're enabling the job on m-i tomorrow - bug 982225
** The work of fgomes might not be needed
Whiteboard: status-in-comment-22 → status-in-comment-24
Depends on: 984480
Depends on: 982225
Depends on: 987892, 985650
Tracking bugs in here instead:
* Bug 982225 - Run additional hidden 3 debug mochitest-browser-chrome chunks and b2g reftests
* Bug 987892 - Ubuntu debug mochitest-browser-chrome-1 is crashing on mozilla-aurora
* Bug 985650 - Allow certain jobs to run on beefier EC2 instances even if more pricey
== Status update ==
We're running side-by-side on the minis and on EC2:
* chunked debug mochitest-browser-chrome jobs
** https://tbpl.mozilla.org/?jobname=debug.*mochitest-browser-chrome
** NOTE: They will show up later today
* b2g reftests
** https://tbpl.mozilla.org/?jobname=b2g_emulator.*reftest

To finish this up we need to:
* stop running Fedora jobs
** bug 982225 and bug 985650
* uplift patches where required
** bug 987892
Whiteboard: status-in-comment-24 → status-in-comment-26
Depends on: 988432
It seems that the EC2 b2g reftests on 15 chunks are on par with the Fedora ones on 10 chunks wrt wall time.
I would like to propose disabling the minis where it applies.

Gecko 31 - https://tbpl.mozilla.org/?jobname=b2g_emulator.*reftest
Gecko 30 - https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=b2g_emulator.*reftest

We have to investigate what is required to get these trees green:
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1
v1.3t is pretty broken right now and b2g18 doesn't matter anymore.
Depends on: 818968
Disabling the minis does not make sense until we fix the b2g26 & b2g28 trees since we risk introducing oranges on those branches that the minis would have caught on m-c & m-a.
Can we disable these? Who is working on them?
Attachment #8399517 - Flags: review?(jgriffin)
Comment on attachment 8399517 [details] [diff] [review]
disable jsreftest on rev3

Review of attachment 8399517 [details] [diff] [review]:
-----------------------------------------------------------------

No one is working on them, and in any case we have them running on EC2 on cedar, so when we do work on them, we should work on them in that context.
Attachment #8399517 - Flags: review?(jgriffin) → review+
Attachment #8399517 - Flags: checked-in+
Blocks: 985650
No longer depends on: 985650
== Status update ==
We're running side-by-side on the minis and on EC2:
* chunked debug mochitest-browser-chrome jobs
** https://tbpl.mozilla.org/?jobname=debug.*mochitest-browser-chrome
* b2g reftests
** https://tbpl.mozilla.org/?jobname=b2g_emulator.*reftest

We need to:
* bug 987892 - green up m-a, m-b, m-r & esr24 for debug mochitest-browser-chrome
* bug 818968 - land approved patches for b2g18, b2g26 and b2g28
** we would hope those would be sufficient to get them green
Whiteboard: status-in-comment-26 → status-in-comment-31
jgriffin, I see some fedora jobs running on b2g18:
https://tbpl.mozilla.org/?tree=Mozilla-B2g18&jobname=b2g_emulator%20mozilla-b2g18
I don't want to have to enable EC2 replacements for that job. It's only sanity reftests.
B2g18 is in EOL and I want to ask relman to give explicit consent to simply disable those jobs.

Does this work for you?
and even some Linux desktop tests!
https://tbpl.mozilla.org/?tree=Mozilla-B2g18&jobname=Rev3

I think I had deleted b2g18 from my brain past 3-17.
I agree wrt removing reftests from b2g18.  For the others, let's move them to EC2 and then simply disable whatever doesn't work.  We should spend as little time as possible trying to fix this there.
Notice that "b2g_emulator" runs on the minis and "b2g_emulator_vm" on EC2.
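That naming convention can be sketched as a small check on the first word of a builder name. This is only an illustration under assumptions: the function name `pool_for_builder` and the labels it returns are hypothetical and are not part of buildbot-configs.

```python
def pool_for_builder(builder_name):
    """Guess where a b2g emulator builder runs from its name prefix."""
    first_word = builder_name.split(" ", 1)[0]
    if first_word == "b2g_emulator_vm":
        return "ec2"
    if first_word == "b2g_emulator":
        return "minis"
    return "unknown"


# Example builder names; the "_vm" one is made up for illustration.
on_minis = pool_for_builder("b2g_emulator mozilla-b2g18 opt test reftest-1")
on_ec2 = pool_for_builder("b2g_emulator_vm mozilla-b2g18 opt test reftest-1")
other = pool_for_builder("Rev3 Fedora 12 mozilla-b2g18 opt test xpcshell")
```

Comparing the whole first word rather than using a prefix match avoids classifying "b2g_emulator_vm" as a minis job.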
Attachment #8401941 - Flags: feedback?(jgriffin)
Comment on attachment 8401940 [details] [diff] [review]
On mozilla-b2g18 and v1_1_0hd (which are EOL): disable Linux desktop tests on minis, run marionette-web-api on EC2 (instead of minis) and run b2g emulator reftest sanity on EC2  (instead of minis)

woooo!
Attachment #8401940 - Flags: review?(rail) → review+
Comment on attachment 8401941 [details]
changes from previous patch

Seems reasonable for b2g18.
Attachment #8401941 - Flags: feedback?(jgriffin) → feedback+
Comment on attachment 8401940 [details] [diff] [review]
On mozilla-b2g18 and v1_1_0hd (which are EOL): disable Linux desktop tests on minis, run marionette-web-api on EC2 (instead of minis) and run b2g emulator reftest sanity on EC2  (instead of minis)

https://hg.mozilla.org/build/buildbot-configs/rev/ad21288639b2
Attachment #8401940 - Flags: checked-in+
== Status update ==
We're running side-by-side on the minis and on EC2:
* chunked debug mochitest-browser-chrome jobs
** green up to m-a
** we need to green up m-b, m-r & m-esr24
* b2g reftests
** RyanVM will be landing this morning one of our last patches needing an uplift

Bugs involved:
* bug 987892 - green up m-b, m-r & esr24 for debug mochitest-browser-chrome
* bug 983650 - land approved patches for b2g26 and b2g28

* We need to see these green before switching over:
https://tbpl.mozilla.org/?tree=Mozilla-Beta&jobname=Ubuntu.*debug.*mochitest-browser-chrome&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-Release&jobname=Ubuntu.*debug.*mochitest-browser-chrome&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-Esr24&jobname=Ubuntu.*debug.*mochitest-browser-chrome&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3t&jobname=b2g_emulator_vm.*reftest&showall=1
Whiteboard: status-in-comment-31 → status-in-comment-41
Depends on: 992219
something is in production
Depends on: 992690
WRT b2g reftests, the uplift cleared some oranges but it is not completely green (see bug 818968 for summary).

WRT debug mochitest-browser-chrome, we got m-b green and need some more work for m-r & m-esr24.
== Status update ==
We're running side-by-side on the minis and on EC2:
* chunked debug mochitest-browser-chrome jobs
** we have disabled the jobs on the minis up to m-a
** only m-r is orange and we will leave it like that (merge day in 3 weeks)
** tomorrow/Monday we will only run the jobs on the minis for m-r
* b2g reftests
** bug 994936 - tomorrow/Monday we will have disabled the jobs on the minis for the trunk trees
** next week we will disable some tests on b2g26/b2g28
** at that point we will disable *all* b2g reftests on the minis

* Last orange jobs:
https://tbpl.mozilla.org/?tree=Mozilla-Release&jobname=Ubuntu.*debug.*mochitest-browser-chrome&showall=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g26-v1.2&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1
https://tbpl.mozilla.org/?tree=Mozilla-B2g28-v1.3t&jobname=b2g_emulator_vm.*reftest&showall=1&onlyunstarred=1
Whiteboard: status-in-comment-41 → status-in-comment-45
Attachment #8405043 - Flags: review?(rail) → review+
Comment on attachment 8405043 [details] [diff] [review]
Only add debug mochitest-browser-chrome for Fedora/Fedora64 on m-r

Something has landed somewhere which makes this patch unnecessary.
Attachment #8405043 - Attachment is obsolete: true
Attachment #8405043 - Flags: checked-in-
This is how close we are [1]
* [DONE] debug mochitest-browser-chrome
* b2g reftests failing on b2g26 and b2g28
** we will either disable tests or fix the root cause this week

[1]
MacAir buildbot-configs hg:[default!] $ grep "Rev3" old_builders_list.txt
MacAir buildbot-configs hg:[default!] $ grep "emulator" old_builders_list.txt | grep -v "_vm" | grep "reftest-10"
b2g_emulator mozilla-aurora opt test reftest-10 ScriptFactory
b2g_emulator mozilla-b2g26_v1_2 opt test reftest-10 ScriptFactory
b2g_emulator mozilla-b2g28_v1_3t opt test reftest-10 ScriptFactory
b2g_emulator mozilla-b2g28_v1_3 opt test reftest-10 ScriptFactory
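The same filter as the grep pipeline in [1] could be written in Python, which may be handier when the builder list is already loaded in a script. This is a rough sketch under assumptions: the function name `legacy_reftest_builders` is hypothetical, and only the first sample line comes from the output above.

```python
def legacy_reftest_builders(builder_lines, chunk="reftest-10"):
    """Return builders that mention 'emulator' and `chunk` but not '_vm'."""
    return [
        line
        for line in builder_lines
        if "emulator" in line and "_vm" not in line and chunk in line
    ]


# First line is taken from the output above; the other two are made-up
# examples of entries the filter should drop.
sample = [
    "b2g_emulator mozilla-aurora opt test reftest-10 ScriptFactory",
    "b2g_emulator_vm mozilla-aurora opt test reftest-10 ScriptFactory",
    "b2g_emulator mozilla-aurora opt test mochitest-1 ScriptFactory",
]
matches = legacy_reftest_builders(sample)
```

Like grep, this does plain substring matching, so a chunk of "reftest-10" would also match a name containing "reftest-100".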
Whiteboard: status-in-comment-45 → status-in-comment-47
Depends on: 994936
Attachment #8406901 - Flags: review?(rail)
Attachment #8406902 - Flags: review?(rail)
Attached patch fedora.diff (obsolete)
Attachment #8406903 - Flags: review?(rail)
Attachment #8406912 - Flags: review?(rail)
Attachment #8406924 - Flags: review?(coop)
Attached patch fedora.diff
This is a no-op change from the previous patch.
It only differs in removing 30+ lines that I had left inside a commented-out block of code.
Attachment #8406903 - Attachment is obsolete: true
Attachment #8406903 - Flags: review?(rail)
Attachment #8406929 - Flags: review?(rail)
Comment on attachment 8406924 [details] [diff] [review]
fedora.slavehealth.diff

Review of attachment 8406924 [details] [diff] [review]:
-----------------------------------------------------------------

You should remove the html bits here too:

https://hg.mozilla.org/build/slave_health/file/1ef894f23768/index.html#l136
Attachment #8406924 - Flags: review?(coop) → review+
Comment on attachment 8406902 [details] [diff] [review]
fedora.tools.diff

As we discussed on IRC, http://hg.mozilla.org/build/tools/file/0d316748aadd/trychooser/index.html#l158 also needs to be removed.
Attachment #8406902 - Flags: review?(rail) → review+
Comment on attachment 8406901 [details] [diff] [review]
fedora.buildbotcustom.diff

Can you also update http://hg.mozilla.org/build/buildbotcustom/file/9ced1444fbb2/test/test_try_parser.py? A separate patch WFM since this one passes trial tests.
Attachment #8406901 - Flags: review?(rail) → review+
Attachment #8406912 - Flags: review?(rail) → review+
Attachment #8406966 - Flags: review?(rail)
Comment on attachment 8406929 [details] [diff] [review]
fedora.diff

Review of attachment 8406929 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/b2g_config.py
@@ -1777,5 @@
>      for platform in BRANCHES[branch]['platforms']:
>          if 'slave_platforms' not in BRANCHES[branch]['platforms'][platform]:
>              BRANCHES[branch]['platforms'][platform]['slave_platforms'] = list(PLATFORMS[platform]['slave_platforms'])
>  
>  NON_UBUNTU_BRANCHES = set([name for name, branch in items_before(BRANCHES, 'gecko_version', 22)])

Can you delete NON_UBUNTU_BRANCHES as well? It's not used

::: mozilla-tests/config.py
@@ -2329,5 @@
> -        for suite_type, ubuntu_tests in [('opt_unittest_suites',
> -                                          get_ubuntu_unittests(branch, 'opt_unittest_suites')),
> -                                         ('debug_unittest_suites',
> -                                          get_ubuntu_unittests(branch, 'debug_unittest_suites'))]:
> -            if nested_haskey(BRANCHES[branch]['platforms'], p, ubuntu,

since this is the only place where we use nested_haskey in this file, can you also remove its import?
Attachment #8406929 - Flags: review?(rail) → review+
Attachment #8406966 - Flags: review?(rail) → review+
Attachment #8406929 - Flags: checked-in+
Attachment #8406901 - Flags: checked-in+
Attachment #8406924 - Flags: checked-in+
Attachment #8406966 - Flags: checked-in+
Attachment #8406912 - Flags: checked-in+
Attachment #8406902 - Flags: checked-in+
Blocks: 997213
We no longer run any jobs on the minis.

It is time to do any remaining clean up and close this bug.
At slavealloc:
mysql> delete from slaves where name like 'talos-r3-fed%';
Query OK, 165 rows affected (0.01 sec)
I only left code for the fedora slaves on buildapi for the sake of looking at the past on our reports.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I forgot to go and disable buildbot on all of the instances. Most of them had stopped.

This is the command to deal with them:
/tools/buildbot/bin/buildslave stop talos-slave
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard