Closed
Bug 1205785
Opened 9 years ago
Closed 9 years ago
Use ec2 instances with more memory when running linux gtests
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Infrastructure & Operations Graveyard
CIDuty
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: chmanchester, Assigned: kmoir)
References
Details
Attachments
(5 files, 11 obsolete files)
2.52 KB,
patch
|
rail
:
review+
kmoir
:
checked-in+
|
Details | Diff | Splinter Review |
5.98 KB,
text/plain
|
Details | |
858 bytes,
patch
|
rail
:
review+
kmoir
:
checked-in+
|
Details | Diff | Splinter Review |
14.13 KB,
patch
|
kmoir
:
checked-in+
|
Details | Diff | Splinter Review |
19.23 KB,
patch
|
kmoir
:
checked-in+
|
Details | Diff | Splinter Review |
We found in bug 1204148 that a few gtests require a lot of memory. The path forward seems to be to allow these to run on instances with more memory. It looks like the tst-linux64 instance type has 3.75 GiB, the next level up, 7.5 GiB, should be enough. I have a loaner out in bug 1204153 we can change the type of to confirm this is sufficient.
Reporter | ||
Comment 1•9 years ago
|
||
rail, is this something you can help me figure out? If it's something I could conceivably do myself I'm more than happy to write a patch with a few pointers. Thank you.
Flags: needinfo?(rail)
Comment 2•9 years ago
|
||
We tried to switch to a different instance type in bug 945981 for all tst-linux machines, but failed due to some flaky tests. Maybe we can use tst-emulator64 slaves for this? Kim added them at some point IIRC. We can ask our buildduty to work on this I think. ATM we use c3.xlarge (7.5G of RAM) and m3.xlarge (15G of RAM) (see https://github.com/mozilla/build-cloud-tools/blob/master/configs/watch_pending.cfg#L76-L84) for these slaves. If this approach works, we will need to 1) use "tst-emulator64" slaves for these tests, 2) update regexes at https://github.com/mozilla/build-cloud-tools/blob/master/configs/watch_pending.cfg#L7 and 3) probably increase the pool. If we need linux32 as well, we may need to setup a different slave type.
Flags: needinfo?(rail)
Reporter | ||
Comment 3•9 years ago
|
||
If tst-emulator64 is a lot tst-linux64, but with more memory, that should work great. The tests in question are disabled on linux32, so that would not be an immediate need. I will ask in bug 1204153 to switch to this instance type to confirm.
Component: General Automation → Buildduty
QA Contact: catlee → bugspam.Callek
Assignee | ||
Comment 4•9 years ago
|
||
What is the priority on this bug? Right now the buildduty folks are working on testing our new 10.10 machines. I can add it to their queue As for expanding the emulator pool, we have to bring up new masters first (bug 1205409) before we increase the pool size again (bug 1204756)
Depends on: 1205409
Reporter | ||
Comment 5•9 years ago
|
||
Thank you for the additional information. I think this is "normal" priority. This is the last step in what has been a quarter-long effort to get the GTests running and green on test machines. There isn't a single factor I'm aware of that would make this time sensitive in a way that would make it a priority over ongoing work, but there is some urgency in the sense that the longer these tests are run on builders the more likely it is issues will creep in that keep them from running and passing on test machines, and the process will be prolonged yet again. With this in mind, what kind of time frame is realistic for this change?
Reporter | ||
Comment 6•9 years ago
|
||
I confirmed in bug 1204153 that simply increasing the instance size is enough for these tests.
Reporter | ||
Comment 7•9 years ago
|
||
I see the dependent bug was marked as fixed, is there anything I can do to help here? I think this needs at least an entry in watch_pending.cfg. We want to match builders like: Ubuntu VM 12.04 x64 mozilla-inbound debug test gtest Ubuntu VM 12.04 x64 mozilla-inbound opt test gtest Ubuntu VM 12.04 x64 mozilla-inbound pgo test gtest so I think we can go with "^Ubuntu VM 12.04 .+ test gtest$" as our regex.
Flags: needinfo?(kmoir)
Assignee | ||
Comment 8•9 years ago
|
||
Thanks for the regexp for watch pending. We need to make some changes in buildbot-configs to add this as a platform. Is this just for linux64? (not linux which may be deprecated anyways - see bug 1209932) I can look at this now, it should be a small patch.
Flags: needinfo?(kmoir)
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → kmoir
Reporter | ||
Comment 9•9 years ago
|
||
Great, thank you! The issue in question is only a problem for linux64, so we can change that to "^Ubuntu VM 12.04 x64 .+ test gtest$" Different instance types for different architectures might cause some confusion for people trying to debug these tests, but if 32 bit linux is on the way out anyway, it's probably ok.
Assignee | ||
Comment 10•9 years ago
|
||
I'm having some problems enabling these tests on ubuntu64_vm_large (which exists an a attached machine type on the same masters that provide ubuntu64_vm) I tried to enable this machine type for linux64 builds with the attached patch only this builder is enabled. Ubuntu ASAN VM 12.04 x64 <branch> opt test gtest ScriptFactory Versus before the patch where Ubuntu ASAN VM 12.04 x64 <branch> opt test gtest ScriptFactory Ubuntu VM 12.04 x64 <branch> opt test gtest ScriptFactory Ubuntu VM 12.04 x64 <branch> pgo test gtest ScriptFactory Ubuntu VM 12.04 x64 <branch> debug test gtest ScriptFactory Rail do you have any suggestions on what I might be missing?
Attachment #8669085 -
Flags: feedback?(rail)
Comment 11•9 years ago
|
||
Comment on attachment 8669085 [details] [diff] [review] bug1205785.patch Review of attachment 8669085 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozilla-tests/config.py @@ +139,5 @@ > PLATFORMS['linux64']['talos_slave_platforms'] = ['ubuntu64_hw'] > PLATFORMS['linux64']['env_name'] = 'linux-perf' > PLATFORMS['linux64']['ubuntu64_vm'] = {'name': 'Ubuntu VM 12.04 x64'} > PLATFORMS['linux64']['ubuntu64_hw'] = {'name': 'Ubuntu HW 12.04 x64'} > +PLATFORMS['linux64']['ubuntu64_vm_large'] = {'name': 'Ubuntu HW 12.04 x64'} Maybe because of this line? It says "HW", but it is a "VM". @@ +2466,5 @@ > + continue > + > + if platform in BRANCHES[name]['platforms']: > + if slave_platform in BRANCHES[name]['platforms'][platform]: > + if platform in "linux64" and slave_platform in "ubuntu64_vm": This condition is a bit confusing. I'd rather use: platform == "linux64" or platform in ["linux64"] This way you do exact matching instead of partial (`"linux" in "linux64"` returns True). This may lead to hidden issues in the future. @@ +2468,5 @@ > + if platform in BRANCHES[name]['platforms']: > + if slave_platform in BRANCHES[name]['platforms'][platform]: > + if platform in "linux64" and slave_platform in "ubuntu64_vm": > + continue > + elif platform in "linux64" and slave_platform in "ubuntu64_vm_large": The same here. @@ +2469,5 @@ > + if slave_platform in BRANCHES[name]['platforms'][platform]: > + if platform in "linux64" and slave_platform in "ubuntu64_vm": > + continue > + elif platform in "linux64" and slave_platform in "ubuntu64_vm_large": > + BRANCHES[name]['platforms'][platform][slave_platform]['debug_unittest_suites'] = [] Looks like this line is redundant. Just set the value to GTEST instead of += GTEST. @@ +2471,5 @@ > + continue > + elif platform in "linux64" and slave_platform in "ubuntu64_vm_large": > + BRANCHES[name]['platforms'][platform][slave_platform]['debug_unittest_suites'] = [] > + BRANCHES[name]['platforms'][platform][slave_platform]['debug_unittest_suites'] += GTEST > + BRANCHES[name]['platforms'][platform][slave_platform]['opt_unittest_suites'] = [] The same here.
Updated•9 years ago
|
Attachment #8669085 -
Flags: feedback?(rail) → feedback+
Assignee | ||
Comment 12•9 years ago
|
||
Thanks for the feedback rail. The reason that builder wasn't appearing was that 1) had to create a new platform type to avoid duplicate builder names 2) add the new platform type to master_config.json I have some new patches which I'll attach soon
Assignee | ||
Comment 13•9 years ago
|
||
Attachment #8669085 -
Attachment is obsolete: true
Assignee | ||
Comment 14•9 years ago
|
||
puppet changes
Assignee | ||
Comment 15•9 years ago
|
||
Attachment #8670317 -
Attachment is obsolete: true
Assignee | ||
Comment 16•9 years ago
|
||
Attachment #8670354 -
Attachment is obsolete: true
Assignee | ||
Comment 17•9 years ago
|
||
builder diff
Assignee | ||
Comment 18•9 years ago
|
||
Attachment #8670567 -
Attachment is obsolete: true
Assignee | ||
Comment 19•9 years ago
|
||
Attachment #8670563 -
Attachment is obsolete: true
Assignee | ||
Comment 20•9 years ago
|
||
Assignee | ||
Comment 21•9 years ago
|
||
Attachment #8670577 -
Attachment is obsolete: true
Assignee | ||
Comment 22•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Attachment #8670318 -
Flags: review?(rail)
Assignee | ||
Comment 23•9 years ago
|
||
Comment on attachment 8670578 [details] [diff] [review] bug1205785bb.patch I don't like this patch. It feels so twisted and complicated. But it enables emulator64 instances for linux64 + asan + tsan + code coverage gtests
Assignee | ||
Updated•9 years ago
|
Attachment #8670578 -
Flags: review?(rail)
Assignee | ||
Updated•9 years ago
|
Attachment #8670585 -
Flags: review?(rail)
Assignee | ||
Updated•9 years ago
|
Attachment #8671020 -
Flags: review?(rail)
Comment 24•9 years ago
|
||
Comment on attachment 8670318 [details] [diff] [review] bug1205785puppet.patch Review of attachment 8670318 [details] [diff] [review]: ----------------------------------------------------------------- ::: modules/buildmaster/templates/BuildSlaves-tests.py.erb @@ +17,5 @@ > 'ubuntu64_vm-mulet': """<%=scope.function_secret(["linux_tests_password"])%>""", > 'ubuntu64_hw': """<%=scope.function_secret(["linux_tests_password"])%>""", > 'ubuntu64_vm_mobile': """<%=scope.function_secret(["linux_tests_password"])%>""", > 'ubuntu64_vm_large': """<%=scope.function_secret(["linux_tests_password"])%>""", > + 'ubuntu64_vm_lnx_large': """<%=scope.function_secret(["linux_tests_password"])%>""", A nit: trailing whit espace.
Attachment #8670318 -
Flags: review?(rail) → review+
Comment 25•9 years ago
|
||
Comment on attachment 8670585 [details] [diff] [review] bug1205785tools.patch Review of attachment 8670585 [details] [diff] [review]: ----------------------------------------------------------------- ::: buildfarm/maintenance/production-masters.json @@ +547,5 @@ > "limit_fx_slave_platforms": { > "linux64": [ > + "ubuntu64_vm", > + "ubuntu64_vm_lnx_large" > + A nit: extra line with trailing whitespace @@ +1961,5 @@ > ], > "linux64-cc": [ > + "ubuntu64_vm", > + "ubuntu64_vm_lnx_large" > + the same here
Attachment #8670585 -
Flags: review?(rail) → review+
Comment 26•9 years ago
|
||
Comment on attachment 8671020 [details] [diff] [review] bug1205785cloud.patch Review of attachment 8671020 [details] [diff] [review]: ----------------------------------------------------------------- Instead of attaching a patch you can submit a PR on github :) ::: configs/watch_pending.cfg @@ +29,5 @@ > "^(TB )?WINNT (5.2|6.1|6.2)( x86-64)? try": "y-2008", > "^b2g_try_linux(32|64)_gecko(-debug)?": "try-linux64", > + "^Ubuntu VM large 12.04 x64.* (gtest).*" : "tst-emulator64", > + "^Ubuntu Code Coverage VM large 12.04 x64.* test (gtest).*": "tst-emulator64", > + "^Ubuntu ASAN VM large 12.04 x64.* test (gtest).*": "tst-emulator64", (gtest).* looks confusing, it should be either without parenthesis or (gtest)?.* to indicate that whatever it's in parenthesis is optional (question mark).
Attachment #8671020 -
Flags: review?(rail) → review-
Comment 27•9 years ago
|
||
Comment on attachment 8670578 [details] [diff] [review] bug1205785bb.patch Review of attachment 8670578 [details] [diff] [review]: ----------------------------------------------------------------- Only one question: ::: mozilla-tests/config.py @@ +2604,5 @@ > + BRANCHES[name]['platforms'][platform][slave_platform]['opt_unittest_suites'] += GTEST > + > +trunk_branches = [] > +for name, branch in items_at_least(BRANCHES, 'gecko_version', 44): > + trunk_branches.append(name) Hmm, "trunk_branches" sounds confusing. Do you want the tests to ride the trains or want them stay only on the trunk branches? For the latter you may want to use BRANCHES['mozilla-central']['gecko_version'] instead of 44.
Assignee | ||
Comment 28•9 years ago
|
||
Comment on attachment 8670318 [details] [diff] [review] bug1205785puppet.patch and merged to production
Attachment #8670318 -
Flags: checked-in+
Assignee | ||
Comment 29•9 years ago
|
||
Attachment #8670578 -
Attachment is obsolete: true
Attachment #8670578 -
Flags: review?(rail)
Assignee | ||
Comment 30•9 years ago
|
||
Attachment #8671005 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8673132 -
Attachment is patch: false
Assignee | ||
Comment 31•9 years ago
|
||
Attachment #8671020 -
Attachment is obsolete: true
Attachment #8673134 -
Flags: review?(rail)
Assignee | ||
Updated•9 years ago
|
Attachment #8673130 -
Attachment is patch: true
Attachment #8673130 -
Attachment mime type: text/x-patch → text/plain
Assignee | ||
Updated•9 years ago
|
Attachment #8673130 -
Flags: review?(rail)
Updated•9 years ago
|
Attachment #8673134 -
Flags: review?(rail) → review+
Comment 32•9 years ago
|
||
Comment on attachment 8673130 [details] [diff] [review] bug1205785bb-2.patch Review of attachment 8673130 [details] [diff] [review]: ----------------------------------------------------------------- \o/ r+ with 3 fixes below ::: mozilla-tests/config.py @@ +2610,5 @@ > +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-tsan': 'ubuntu64_vm' }, branch_exclusions=["try"]) > +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-tsan': 'ubuntu64_vm_lnx_large'}, branch_exclusions=["try"]) > +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-asan': 'ubuntu64_vm'}, branch_exclusions=ride_train_branches) > +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64-asan': 'ubuntu64-asan_vm_lnx_large'}, branch_exclusions=ride_train_branches) > +delete_slave_platform(BRANCHES, PLATFORMS, {'linux64': 'ubuntu64_vm_lnx_large'}, branch_exclusions=ride_train_branches) Typo in 3 places: s/ride_train_branches/ride_trains_branches/g
Attachment #8673130 -
Flags: review?(rail) → review+
Assignee | ||
Comment 33•9 years ago
|
||
tools patch with fixed white space
Attachment #8670585 -
Attachment is obsolete: true
Attachment #8673156 -
Flags: checked-in+
Assignee | ||
Comment 34•9 years ago
|
||
thanks for the reviews rail. I had the variable name correct on my master but didn't export the correct patch in my workspace ...monday...I mean tuesday
Attachment #8673130 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Attachment #8673157 -
Flags: checked-in+
Assignee | ||
Updated•9 years ago
|
Attachment #8673134 -
Flags: checked-in+
Assignee | ||
Comment 35•9 years ago
|
||
This is deployed and the old gtest builders are removed but the new ones aren't appearing. I am wondering if this is a problem where the master needs to be restarted to pick up the new platform. I'm going to graceful bm125 and restart to test this theory.
Assignee | ||
Comment 36•9 years ago
|
||
restarted bm125, this did not fix my builder problem, investigating further....
Assignee | ||
Comment 37•9 years ago
|
||
My current theory on why the builders are not enabled. I think the new platform names are too long and are being truncated as part of the builder dir. My test master did not have the same master_config.json as a production master (to save reconfig time). If I add the same platform limitations to my test master, the builders disappear. Continuing debugging....
Assignee | ||
Comment 38•9 years ago
|
||
no, this is not it, the problem is the master_config.json doesn't have the new platforms
Assignee | ||
Comment 39•9 years ago
|
||
I have fixed this issue. It seems that a reconfig doesn't update the master_config.py. You need to run cd /builds/buildbot/tests1-linux64/ export PRODUCTION_MASTERS=tools/buildfarm/maintenance/production-masters.json python buildbot-configs/update-master-json.py $PRODUCTION_MASTERS master/master_config.json make checkconfig make reconfig on the masters to update it which I did on bm51-54, 67,68, 113-125 I didn't run a reconfig on all the masters because I wanted the regular reconfig process to run. I did run on on bm125 and can see gtests are running yet failing. Are the gtests not defined in tree yet? I noticed they were failing before the instance type was changed. Looking at the results, I can stuff like this in the logs which suggests the test scripts are not in place yet. This is for a Ubuntu VM large 12.04 x64 try opt test gtest job. 13:24:39 INFO - copying tree: /builds/slave/test/build/tests/gtest/gtest_bin to /builds/slave/test/build/application/firefox 13:24:39 INFO - Running post-action listener: _package_coverage_data 13:24:39 INFO - Running post-action listener: _resource_record_post_action 13:24:39 FATAL - Uncaught exception: Traceback (most recent call last): 13:24:39 FATAL - File "/builds/slave/test/scripts/mozharness/base/script.py", line 1718, in run 13:24:39 FATAL - self.run_action(action) 13:24:39 FATAL - File "/builds/slave/test/scripts/mozharness/base/script.py", line 1660, in run_action 13:24:39 FATAL - self._possibly_run_method(method_name, error_if_missing=True) 13:24:39 FATAL - File "/builds/slave/test/scripts/mozharness/base/script.py", line 1601, in _possibly_run_method 13:24:39 FATAL - return getattr(self, method_name)() 13:24:39 FATAL - File "scripts/scripts/desktop_unittest.py", line 503, in run_tests 13:24:39 FATAL - preflight_run_method=self.preflight_gtest) 13:24:39 FATAL - File "scripts/scripts/desktop_unittest.py", line 596, in _run_category_suites 13:24:39 FATAL - preflight_run_method(suites) 13:24:39 FATAL - File "scripts/scripts/desktop_unittest.py", line 563, in preflight_gtest 13:24:39 FATAL - os.path.join(abs_app_dir)) 13:24:39 FATAL - File "/builds/slave/test/scripts/mozharness/base/script.py", line 619, in copytree 13:24:39 FATAL - files = os.listdir(src) 13:24:39 FATAL - OSError: [Errno 2] No such file or directory: '/builds/slave/test/build/tests/gtest/gtest_bin' 13:24:39 FATAL - Running post_fatal callback...
Flags: needinfo?(cmanchester)
Reporter | ||
Comment 40•9 years ago
|
||
Great, thank you for working on this! I need to land bug 992983 to move these to the test job (they're currently running during make check, so when we can't leave a gap when moving them). I'll check these on try today, and if everything looks good there land the conversion.
Flags: needinfo?(cmanchester)
Reporter | ||
Comment 41•9 years ago
|
||
Green gtests in https://treeherder.mozilla.org/#/jobs?repo=try&revision=d88785b73494&exclusion_profile=false Thanks again!
Comment 42•9 years ago
|
||
Wooot!
Assignee | ||
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Product: Release Engineering → Infrastructure & Operations
Updated•4 years ago
|
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•