Use ec2 instances with more memory when running linux gtests

RESOLVED FIXED

Status

Release Engineering
Buildduty
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: chmanchester, Assigned: kmoir)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(5 attachments, 11 obsolete attachments)

2.52 KB, patch
rail
: review+
kmoir
: checked-in+
Details | Diff | Splinter Review
5.98 KB, text/plain
Details
858 bytes, patch
rail
: review+
kmoir
: checked-in+
Details | Diff | Splinter Review
14.13 KB, patch
kmoir
: checked-in+
Details | Diff | Splinter Review
19.23 KB, patch
kmoir
: checked-in+
Details | Diff | Splinter Review
(Reporter)

Description

2 years ago
We found in bug 1204148 that a few gtests require a lot of memory. The path forward seems to be to allow these to run on instances with more memory.

It looks like the tst-linux64 instance type has 3.75 GiB, the next level up, 7.5 GiB, should be enough. I have a loaner out in bug 1204153 we can change the type of to confirm this is sufficient.
(Reporter)

Comment 1

2 years ago
rail, is this something you can help me figure out? If it's something I could conceivably do myself I'm more than happy to write a patch with a few pointers. Thank you.
Flags: needinfo?(rail)
We tried to switch to a different instance type in bug 945981 for all tst-linux machines, but failed due to some flaky tests.

Maybe we can use tst-emulator64 slaves for this? Kim added them at some point IIRC. We can ask our buildduty to work on this I think.

ATM we use c3.xlarge (7.5G of RAM) and m3.xlarge (15G of RAM) (see https://github.com/mozilla/build-cloud-tools/blob/master/configs/watch_pending.cfg#L76-L84) for these slaves. 

If this approach works, we will need to 1) use "tst-emulator64" slaves for these tests, 2) update regexes at https://github.com/mozilla/build-cloud-tools/blob/master/configs/watch_pending.cfg#L7 and 3) probably increase the pool. 

If we need linux32 as well, we may need to setup a different slave type.
Flags: needinfo?(rail)
(Reporter)

Comment 3

2 years ago
If tst-emulator64 is a lot tst-linux64, but with more memory, that should work great.

The tests in question are disabled on linux32, so that would not be an immediate need.

I will ask in bug 1204153 to switch to this instance type to confirm.
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
(Assignee)

Comment 4

2 years ago
What is the priority on this bug?  Right now the buildduty folks are working on testing our new 10.10 machines.  I can add it to their queue

As for expanding the emulator pool, we have to bring up new masters first (bug 1205409) before we increase the pool size again (bug 1204756)
Depends on: 1205409
(Reporter)

Comment 5

2 years ago
Thank you for the additional information.

I think this is "normal" priority. This is the last step in what has been a quarter-long effort to get the GTests running and green on test machines. There isn't a single factor I'm aware of that would make this time sensitive in a way that would make it a priority over ongoing work, but there is some urgency in the sense that the longer these tests are run on builders the more likely it is issues will creep in that keep them from running and passing on test machines, and the process will be prolonged yet again.

With this in mind, what kind of time frame is realistic for this change?
(Reporter)

Comment 6

2 years ago
I confirmed in bug 1204153 that simply increasing the instance size is enough for these tests.
(Reporter)

Comment 7

2 years ago
I see the dependent bug was marked as fixed, is there anything I can do to help here? I think this needs at least an entry in watch_pending.cfg. We want to match builders like:

Ubuntu VM 12.04 x64 mozilla-inbound debug test gtest
Ubuntu VM 12.04 x64 mozilla-inbound opt test gtest
Ubuntu VM 12.04 x64 mozilla-inbound pgo test gtest

so I think we can go with

"^Ubuntu VM 12.04 .+ test gtest$"

as our regex.
Flags: needinfo?(kmoir)
(Assignee)

Comment 8

2 years ago
Thanks for the regexp for watch pending.  We need to make some changes in buildbot-configs to add this as a platform. Is this just for linux64? (not linux which may be deprecated anyways - see bug 1209932) I can look at this now, it should be a small patch.
Flags: needinfo?(kmoir)
(Assignee)

Updated

2 years ago
Assignee: nobody → kmoir
(Reporter)

Comment 9

2 years ago
Great, thank you! The issue in question is only a problem for linux64, so we can change that to

"^Ubuntu VM 12.04 x64 .+ test gtest$"

Different instance types for different architectures might cause some confusion for people trying to debug these tests, but if 32 bit linux is on the way out anyway, it's probably ok.
(Assignee)

Comment 10

2 years ago
Created attachment 8669085 [details] [diff] [review]
bug1205785.patch

I'm having some problems enabling these tests on ubuntu64_vm_large (which exists an a attached machine type on the same masters that provide ubuntu64_vm)

I tried to enable this machine type for linux64 builds with the attached patch only this builder is enabled.  

Ubuntu ASAN VM 12.04 x64 <branch> opt test gtest ScriptFactory

Versus before the patch where
Ubuntu ASAN VM 12.04 x64 <branch> opt test gtest ScriptFactory
Ubuntu VM 12.04 x64 <branch> opt test gtest ScriptFactory
Ubuntu VM 12.04 x64 <branch> pgo test gtest ScriptFactory
Ubuntu VM 12.04 x64 <branch> debug test gtest ScriptFactory

Rail do you have any suggestions on what I might be missing?
Attachment #8669085 - Flags: feedback?(rail)
Comment on attachment 8669085 [details] [diff] [review]
bug1205785.patch

Review of attachment 8669085 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/config.py
@@ +139,5 @@
>  PLATFORMS['linux64']['talos_slave_platforms'] = ['ubuntu64_hw']
>  PLATFORMS['linux64']['env_name'] = 'linux-perf'
>  PLATFORMS['linux64']['ubuntu64_vm'] = {'name': 'Ubuntu VM 12.04 x64'}
>  PLATFORMS['linux64']['ubuntu64_hw'] = {'name': 'Ubuntu HW 12.04 x64'}
> +PLATFORMS['linux64']['ubuntu64_vm_large'] = {'name': 'Ubuntu HW 12.04 x64'}

Maybe because of this line? It says "HW", but it is a "VM".

@@ +2466,5 @@
> +                continue
> +    
> +            if platform in BRANCHES[name]['platforms']:
> +                if slave_platform in BRANCHES[name]['platforms'][platform]:
> +                    if platform in "linux64" and slave_platform in "ubuntu64_vm":

This condition is a bit confusing. I'd rather use:
  platform == "linux64"
or
  platform in ["linux64"]

This way you do exact matching instead of partial (`"linux" in "linux64"` returns True). This may lead to hidden issues in the future.

@@ +2468,5 @@
> +            if platform in BRANCHES[name]['platforms']:
> +                if slave_platform in BRANCHES[name]['platforms'][platform]:
> +                    if platform in "linux64" and slave_platform in "ubuntu64_vm":
> +                        continue
> +                    elif platform in "linux64" and slave_platform in "ubuntu64_vm_large":

The same here.

@@ +2469,5 @@
> +                if slave_platform in BRANCHES[name]['platforms'][platform]:
> +                    if platform in "linux64" and slave_platform in "ubuntu64_vm":
> +                        continue
> +                    elif platform in "linux64" and slave_platform in "ubuntu64_vm_large":
> +                        BRANCHES[name]['platforms'][platform][slave_platform]['debug_unittest_suites'] = []

Looks like this line is redundant. Just set the value to GTEST instead of += GTEST.

@@ +2471,5 @@
> +                        continue
> +                    elif platform in "linux64" and slave_platform in "ubuntu64_vm_large":
> +                        BRANCHES[name]['platforms'][platform][slave_platform]['debug_unittest_suites'] = []
> +                        BRANCHES[name]['platforms'][platform][slave_platform]['debug_unittest_suites'] += GTEST
> +                        BRANCHES[name]['platforms'][platform][slave_platform]['opt_unittest_suites'] = []

The same here.
Attachment #8669085 - Flags: feedback?(rail) → feedback+
(Assignee)

Comment 12

2 years ago
Thanks for the feedback rail.  The reason that builder wasn't appearing was that
1) had to create a new platform type to avoid duplicate builder names
2) add the new platform type to master_config.json

I have some new patches which I'll attach soon
(Assignee)

Comment 13

2 years ago
Created attachment 8670317 [details] [diff] [review]
bug1205785bb.patch
Attachment #8669085 - Attachment is obsolete: true
(Assignee)

Comment 14

2 years ago
Created attachment 8670318 [details] [diff] [review]
bug1205785puppet.patch

puppet changes
(Assignee)

Comment 15

2 years ago
Created attachment 8670354 [details] [diff] [review]
bug1205785bb.patch
Attachment #8670317 - Attachment is obsolete: true
(Assignee)

Comment 16

2 years ago
Created attachment 8670563 [details] [diff] [review]
bug1205785bb.patch
Attachment #8670354 - Attachment is obsolete: true
(Assignee)

Comment 17

2 years ago
Created attachment 8670567 [details]
bug1205785builder.diff

builder diff
(Assignee)

Comment 18

2 years ago
Created attachment 8670577 [details]
bug1205785builder.diff
Attachment #8670567 - Attachment is obsolete: true
(Assignee)

Comment 19

2 years ago
Created attachment 8670578 [details] [diff] [review]
bug1205785bb.patch
Attachment #8670563 - Attachment is obsolete: true
(Assignee)

Comment 20

2 years ago
Created attachment 8670585 [details] [diff] [review]
bug1205785tools.patch
(Assignee)

Comment 21

2 years ago
Created attachment 8671005 [details] [diff] [review]
bug1205785builder.diff
Attachment #8670577 - Attachment is obsolete: true
(Assignee)

Comment 22

2 years ago
Created attachment 8671020 [details] [diff] [review]
bug1205785cloud.patch
(Assignee)

Updated

2 years ago
Attachment #8670318 - Flags: review?(rail)
(Assignee)

Comment 23

2 years ago
Comment on attachment 8670578 [details] [diff] [review]
bug1205785bb.patch

I don't like this patch.  It feels so twisted and complicated.  

But it enables emulator64 instances for linux64 + asan + tsan + code coverage gtests
(Assignee)

Updated

2 years ago
Attachment #8670578 - Flags: review?(rail)
(Assignee)

Updated

2 years ago
Attachment #8670585 - Flags: review?(rail)
(Assignee)

Updated

2 years ago
Attachment #8671020 - Flags: review?(rail)
Comment on attachment 8670318 [details] [diff] [review]
bug1205785puppet.patch

Review of attachment 8670318 [details] [diff] [review]:
-----------------------------------------------------------------

::: modules/buildmaster/templates/BuildSlaves-tests.py.erb
@@ +17,5 @@
>      'ubuntu64_vm-mulet': """<%=scope.function_secret(["linux_tests_password"])%>""",
>      'ubuntu64_hw': """<%=scope.function_secret(["linux_tests_password"])%>""",
>      'ubuntu64_vm_mobile': """<%=scope.function_secret(["linux_tests_password"])%>""",
>      'ubuntu64_vm_large': """<%=scope.function_secret(["linux_tests_password"])%>""",
> +    'ubuntu64_vm_lnx_large': """<%=scope.function_secret(["linux_tests_password"])%>""",    

A nit: trailing whit espace.
Attachment #8670318 - Flags: review?(rail) → review+
Comment on attachment 8670585 [details] [diff] [review]
bug1205785tools.patch

Review of attachment 8670585 [details] [diff] [review]:
-----------------------------------------------------------------

::: buildfarm/maintenance/production-masters.json
@@ +547,5 @@
>      "limit_fx_slave_platforms": {
>        "linux64": [
> +        "ubuntu64_vm",
> +        "ubuntu64_vm_lnx_large"
> +        

A nit: extra line with trailing whitespace

@@ +1961,5 @@
>        ],
>        "linux64-cc": [
> +        "ubuntu64_vm",
> +        "ubuntu64_vm_lnx_large"
> +        

the same here
Attachment #8670585 - Flags: review?(rail) → review+
Comment on attachment 8671020 [details] [diff] [review]
bug1205785cloud.patch

Review of attachment 8671020 [details] [diff] [review]:
-----------------------------------------------------------------

Instead of attaching a patch you can submit a PR on github :)

::: configs/watch_pending.cfg
@@ +29,5 @@
>          "^(TB )?WINNT (5.2|6.1|6.2)( x86-64)? try": "y-2008",
>          "^b2g_try_linux(32|64)_gecko(-debug)?": "try-linux64",
> +        "^Ubuntu VM large 12.04 x64.* (gtest).*" : "tst-emulator64",
> +        "^Ubuntu Code Coverage VM large 12.04 x64.* test (gtest).*": "tst-emulator64",
> +        "^Ubuntu ASAN VM large 12.04 x64.* test (gtest).*": "tst-emulator64",

(gtest).* looks confusing, it should be either without parenthesis or (gtest)?.* to indicate that whatever it's in parenthesis is optional (question mark).
Attachment #8671020 - Flags: review?(rail) → review-
Comment on attachment 8670578 [details] [diff] [review]
bug1205785bb.patch

Review of attachment 8670578 [details] [diff] [review]:
-----------------------------------------------------------------

Only one question:

::: mozilla-tests/config.py
@@ +2604,5 @@
> +                    BRANCHES[name]['platforms'][platform][slave_platform]['opt_unittest_suites'] += GTEST
> +
> +trunk_branches = []
> +for name, branch in items_at_least(BRANCHES, 'gecko_version', 44):
> +    trunk_branches.append(name)

Hmm, "trunk_branches" sounds confusing. Do you want the tests to ride the trains or want them stay only on the trunk branches? For the latter you may want to use BRANCHES['mozilla-central']['gecko_version'] instead of 44.
(Assignee)

Comment 28

2 years ago
Comment on attachment 8670318 [details] [diff] [review]
bug1205785puppet.patch

and merged to production
Attachment #8670318 - Flags: checked-in+
(Assignee)

Comment 29

2 years ago
Created attachment 8673130 [details] [diff] [review]
bug1205785bb-2.patch
Attachment #8670578 - Attachment is obsolete: true
Attachment #8670578 - Flags: review?(rail)
(Assignee)

Comment 30

2 years ago
Created attachment 8673132 [details]
bug1205785builder-2.diff
Attachment #8671005 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8673132 - Attachment is patch: false
(Assignee)

Comment 31

2 years ago
Created attachment 8673134 [details] [diff] [review]
bug1205785cloud-2.patch
Attachment #8671020 - Attachment is obsolete: true
Attachment #8673134 - Flags: review?(rail)
(Assignee)

Updated

2 years ago
Attachment #8673130 - Attachment is patch: true
Attachment #8673130 - Attachment mime type: text/x-patch → text/plain
(Assignee)

Updated

2 years ago
Attachment #8673130 - Flags: review?(rail)
Attachment #8673134 - Flags: review?(rail) → review+
Comment on attachment 8673130 [details] [diff] [review]
bug1205785bb-2.patch

Review of attachment 8673130 [details] [diff] [review]:
-----------------------------------------------------------------

\o/

r+ with 3 fixes below

::: mozilla-tests/config.py
@@ +2610,5 @@
> +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-tsan': 'ubuntu64_vm' }, branch_exclusions=["try"])
> +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-tsan': 'ubuntu64_vm_lnx_large'}, branch_exclusions=["try"])
> +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-asan': 'ubuntu64_vm'}, branch_exclusions=ride_train_branches)
> +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-asan': 'ubuntu64-asan_vm_lnx_large'}, branch_exclusions=ride_train_branches)
> +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64': 'ubuntu64_vm_lnx_large'}, branch_exclusions=ride_train_branches)

Typo in 3 places:
s/ride_train_branches/ride_trains_branches/g
Attachment #8673130 - Flags: review?(rail) → review+
(Assignee)

Comment 33

2 years ago
Created attachment 8673156 [details] [diff] [review]
bug1205785tools-2.patch

tools patch with fixed white space
Attachment #8670585 - Attachment is obsolete: true
Attachment #8673156 - Flags: checked-in+
(Assignee)

Comment 34

2 years ago
Created attachment 8673157 [details] [diff] [review]
bug1205785bb-3.patch

thanks for the reviews rail. I had the variable name correct on my master but didn't export the correct patch in my workspace ...monday...I mean tuesday
Attachment #8673130 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8673157 - Flags: checked-in+
(Assignee)

Updated

2 years ago
Attachment #8673134 - Flags: checked-in+
(Assignee)

Comment 35

2 years ago
This is deployed and the old gtest builders are removed but the new ones aren't appearing.  I am wondering if this is a problem where the master needs to be restarted to pick up the new platform.  I'm going to graceful bm125 and restart to test this theory.
(Assignee)

Comment 36

2 years ago
restarted bm125, this did not fix my builder problem, investigating further....
(Assignee)

Comment 37

2 years ago
My current theory on why the builders are not enabled.

I think the new platform names are too long and are being truncated as part of the builder dir.
My test master did not have the same master_config.json as a production master (to save reconfig time).  If I add the same platform limitations to my test master, the builders disappear.  Continuing debugging....
(Assignee)

Comment 38

2 years ago
no, this is not it, the problem is the master_config.json doesn't have the new platforms
(Assignee)

Comment 39

2 years ago
I have fixed this issue.  It seems that a reconfig doesn't update the master_config.py.  You need to run

cd /builds/buildbot/tests1-linux64/
export PRODUCTION_MASTERS=tools/buildfarm/maintenance/production-masters.json
python buildbot-configs/update-master-json.py $PRODUCTION_MASTERS master/master_config.json
make checkconfig
make reconfig

on the masters to update it which I did on bm51-54, 67,68, 113-125  I didn't run a reconfig on all the masters because I wanted the regular reconfig process to run. I did run on on bm125 and can see gtests are running yet failing.

Are the gtests not defined in tree yet?  I noticed they were failing before the instance type was changed.  Looking at the results, I can stuff like this in the logs which suggests the test scripts are not in place yet.  This is for a Ubuntu VM large 12.04 x64 try opt test gtest job.

13:24:39     INFO - copying tree: /builds/slave/test/build/tests/gtest/gtest_bin to /builds/slave/test/build/application/firefox
13:24:39     INFO - Running post-action listener: _package_coverage_data
13:24:39     INFO - Running post-action listener: _resource_record_post_action
13:24:39    FATAL - Uncaught exception: Traceback (most recent call last):
13:24:39    FATAL -   File "/builds/slave/test/scripts/mozharness/base/script.py", line 1718, in run
13:24:39    FATAL -     self.run_action(action)
13:24:39    FATAL -   File "/builds/slave/test/scripts/mozharness/base/script.py", line 1660, in run_action
13:24:39    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
13:24:39    FATAL -   File "/builds/slave/test/scripts/mozharness/base/script.py", line 1601, in _possibly_run_method
13:24:39    FATAL -     return getattr(self, method_name)()
13:24:39    FATAL -   File "scripts/scripts/desktop_unittest.py", line 503, in run_tests
13:24:39    FATAL -     preflight_run_method=self.preflight_gtest)
13:24:39    FATAL -   File "scripts/scripts/desktop_unittest.py", line 596, in _run_category_suites
13:24:39    FATAL -     preflight_run_method(suites)
13:24:39    FATAL -   File "scripts/scripts/desktop_unittest.py", line 563, in preflight_gtest
13:24:39    FATAL -     os.path.join(abs_app_dir))
13:24:39    FATAL -   File "/builds/slave/test/scripts/mozharness/base/script.py", line 619, in copytree
13:24:39    FATAL -     files = os.listdir(src)
13:24:39    FATAL - OSError: [Errno 2] No such file or directory: '/builds/slave/test/build/tests/gtest/gtest_bin'
13:24:39    FATAL - Running post_fatal callback...
Flags: needinfo?(cmanchester)
(Reporter)

Comment 40

2 years ago
Great, thank you for working on this! I need to land bug 992983 to move these to the test job (they're currently running during make check, so when we can't leave a gap when moving them).

I'll check these on try today, and if everything looks good there land the conversion.
Flags: needinfo?(cmanchester)
(Reporter)

Comment 41

2 years ago
Green gtests in https://treeherder.mozilla.org/#/jobs?repo=try&revision=d88785b73494&exclusion_profile=false

Thanks again!
Wooot!
(Assignee)

Updated

2 years ago
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.