Closed
Bug 990173
Opened 11 years ago
Closed 8 years ago
Move b2g bumper to a dedicated host
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: jhopkins, Unassigned)
Attachments
(1 file: 752.07 KB, image/png)
b2g bumper is currently running on buildbot-master66 which is being deprecated (bug 990172).
Let's move b2g bumper to a dedicated host and call it git-hg-bumpers or similar.
Comment 1•11 years ago
Hal: did this already get moved? I see a bunch of git commands running on bm66 dating from May 19, but nothing else.
Flags: needinfo?(hwine)
Not to my knowledge. Puppet still believes b2g_bumper lives on bm66.
Flags: needinfo?(hwine)
Comment 3•10 years ago
buildbot-master66 has been complaining about load all day today. I think it's time to find a beefier instance for this, or split the processes across multiple instances.
Severity: normal → major
Comment 4•10 years ago
The b2g bumper runs every 5 minutes, and takes about 3 minutes to run. On 10/31 it took 2:30 - 2:45 to run. Nagios alerts with WARNING when any of the short/mid/longterm load are over 10 (CRITICAL is 25). We're getting alerts because the long-term load is just about always over 10 (see inset graph).
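As a rough illustration of the alerting rule described above, here is a minimal sketch. It assumes "over" means strictly greater than the threshold; the function name and the sample values are hypothetical, and this is not the actual Nagios check.

```python
# Thresholds taken from the comment above; everything else is illustrative.
WARNING_LOAD = 10.0
CRITICAL_LOAD = 25.0


def classify_load(short_term, mid_term, long_term):
    """Return the alert state implied by the worst of the three load averages."""
    worst = max(short_term, mid_term, long_term)
    if worst > CRITICAL_LOAD:
        return "CRITICAL"
    if worst > WARNING_LOAD:
        return "WARNING"
    return "OK"
```

With a long-term load that sits just above 10, this returns WARNING even when the short-term and mid-term loads are low, which matches the behaviour we are seeing.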
The longer-term trend comes from graphite. Its values differ quite a bit for the short-term load, but the long-term load seems reasonably similar. Something changed late on 11/04 and again on 11/08. I can't see any big changes in the manifests then, so I'm going to guess something external is making all the 'git ls-remote' operations we do slower. So the options are splitting the work or relaxing the nagios threshold.
I've downtimed the alert for 12 hours.
Comment 5•10 years ago
I've noticed that we are not caching the results of git ls-remote across branches, which means we are creating far more load than we need to. I would propose resolving this first: with 5 branches, we might be able to reduce load by e.g. 70%-80% (depending on how much commonality exists between the manifests).
I will take a look at this.
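A minimal sketch of the idea, assuming each lookup is identified by a (url, ref) pair. The names resolve_ref and resolve_all and the lookup callable are hypothetical stand-ins, not the b2g bumper code.

```python
def resolve_ref(url, ref, cache, lookup):
    """Return the revision for (url, ref), calling lookup only on a cache miss."""
    key = (url, ref)
    if key not in cache:
        cache[key] = lookup(url, ref)
    return cache[key]


def resolve_all(branch_manifests, lookup):
    """Resolve every (url, ref) pair of every branch through one shared cache.

    branch_manifests maps a branch name to a list of (url, ref) pairs.
    """
    cache = {}
    resolved = {}
    for branch, pairs in branch_manifests.items():
        resolved[branch] = {
            (url, ref): resolve_ref(url, ref, cache, lookup) for url, ref in pairs
        }
    return resolved
```

The more the manifests overlap, the fewer remote lookups the later branches need, which is where the hoped-for saving would come from.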
Comment 6•10 years ago
I created a very crude script to run them in sequence, and ran locally.
The first branch took approximately 20 mins to run. The other branches took approximately 12 seconds to run, when sharing a cache across branches. In other words, it looks like with 5 branches, when sharing a cache, we should hit about 20% of the resource usage we were previously hitting.
The very crude script for testing was:
#!/usr/bin/env python
import sys
import copy
from b2g_bumper import B2GBumper

if __name__ == '__main__':
    # One cache dict, shared by the bumper instance of every branch.
    git_ref_cache = {}
    # This script is named like the real one plus '_combined'.
    b2g_bumper_script = sys.argv[0].replace('_combined', '')
    config_files = copy.copy(sys.argv[1:])
    for config in config_files:
        # Rebuild argv so B2GBumper parses it as its own command line.
        sys.argv = [b2g_bumper_script, '-c', config, '-c', '/Users/pmoore/work/b2g_bumper/pete.py', '--checkout-manifests', '--massage-manifests']
        bumper = B2GBumper()
        bumper._git_ref_cache = git_ref_cache
        print(bumper._git_ref_cache)
        bumper.run()
        print(bumper._git_ref_cache)
This essentially cached the bumper._git_ref_cache between runs.
I'll implement this differently for this bug, as this approach messed up mozharness logging. Instead I'll write out the cache as a JSON file and have a script which just runs through the branches in sequence, loading the cache at the beginning of each run and storing it at the end.
Each time the cron runs, it will start with an empty cache, so the cache is just used once per branch, and then a new cache is created.
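The load/store/clear steps could look roughly like the sketch below. It assumes the cache maps (url, ref) tuples to revisions; since JSON objects cannot have tuple keys, the file holds a list of [url, ref, revision] triples. The function names and file layout are illustrative only.

```python
import json
import os


def load_cache(path):
    """Load the cache from path, or return an empty cache if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return {(url, ref): rev for url, ref, rev in json.load(handle)}


def save_cache(path, cache):
    """Write the cache to path as a sorted list of [url, ref, revision] triples."""
    rows = [[url, ref, rev] for (url, ref), rev in sorted(cache.items())]
    with open(path, "w") as handle:
        json.dump(rows, handle, indent=2)


def clear_cache(path):
    """Remove the cache file so that the next cron run starts from empty."""
    if os.path.exists(path):
        os.remove(path)
```

A wrapper would call clear_cache once when the cron job starts, then load_cache before and save_cache after each branch.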
Hope this makes sense!
Comment 7•10 years ago
I've written the code to clear/import/export the cache, and tested locally, and it is working. However, I believe the parallelisation is not working properly in b2g bumper, so I want to fix this too.
I've also noticed that we query for a single revision when using git ls-remote, but for the 1017 distinct requests we make for (url, head/tag), there are only 302 unique urls, i.e. on average each git url is queried about 3.37 times for different tags/heads. I believe it will be much more efficient if we only query once per url and get back all heads/tags.
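To sketch the "query once per url" idea: group the requests by url, run one git ls-remote per url, and look the wanted refs up in the parsed output. This relies on git ls-remote printing one "<sha><TAB><refname>" line per ref; the helper names are hypothetical and fetch_all_refs needs git and network access.

```python
import subprocess
from collections import defaultdict


def parse_ls_remote(output):
    """Turn 'sha<TAB>refname' lines into a {refname: sha} dict."""
    refs = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split("\t", 1)
        refs[ref] = sha
    return refs


def fetch_all_refs(url):
    """Ask the remote once for all of its heads and tags."""
    output = subprocess.check_output(["git", "ls-remote", "--heads", "--tags", url])
    return parse_ls_remote(output.decode("utf-8"))


def group_refs_by_url(requests):
    """Collapse (url, ref) requests into {url: set of refs}."""
    grouped = defaultdict(set)
    for url, ref in requests:
        grouped[url].add(ref)
    return dict(grouped)
```

For the numbers above, this would mean 302 remote calls instead of 1017. Annotated tags also show up as peeled entries ending in ^{}, which a real implementation would need to handle.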
Another point that occurs to me is that we already have, in vcs sync, all the data we are hammering git.m.o to collect. We know the state of all the heads/tags because only vcs sync pushes to git.m.o. Therefore, if we simply stored the state of the heads/tags in vcs sync when pushing, we would not need to keep querying git.m.o. At the moment we are polling the source repos in vcs sync to look for changes, and only pushing to git.m.o when there are changes, but with b2g bumper we are now polling git.m.o too, looking for changes.
Lastly, I don't think the parallelisation is working as expected. A b2g bumper run takes on average a couple of seconds per git ls-remote command when parallelised, but it takes just as long without parallelisation.
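One way to check this kind of suspicion is to time the same lookups sequentially and through a thread pool. The sketch below uses a sleeping stand-in instead of a real git ls-remote call; all names and the delay are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ls_remote(url, delay=0.05):
    """Stand-in for a slow network-bound lookup."""
    time.sleep(delay)
    return "rev-for-" + url


def run_sequential(urls, lookup):
    """Call lookup on each url, one after another."""
    return [lookup(url) for url in urls]


def run_parallel(urls, lookup, workers=8):
    """Call lookup on each url through a thread pool, preserving order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, urls))


def timed(func, *args):
    """Return (elapsed seconds, result) for one call of func."""
    start = time.perf_counter()
    result = func(*args)
    return time.perf_counter() - start, result
```

If the lookups really overlap, the parallel run should finish in a fraction of the sequential time. If the two timings come out about the same, the work is effectively being serialised somewhere.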
So I'm going to try to tackle these issues together, as there are massive performance improvements that can be made, which will allow us to scale much better. And we've seen that our vcs usage is very high in general, so anything we can do that will reduce load on vcs systems can only be a good thing.
Comment 8•10 years ago
I moved (most of) the above concerns (comments 5,6,7) into a separate bug (1097784).
Comment 9•10 years ago
I'm migrating this host to a new VLAN and CentOS 6.5.
If I just halt the old host and start up the new, freshly puppetized one, will b2g-bumper run?
Flags: needinfo?(pmoore)
Comment 10•10 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #9)
> I'm migrating this host to a new VLAN and CentOS 6.5.
>
> If I just halt the old host and start up the new, freshly puppetized one,
> will b2g-bumper run?
Yes. To explain my reasoning:
1) Halting the old host should do no damage: if it dies halfway through pushing, it should not be able to leave any corruption on the git server.
2) Mozharness should take care of creating fresh working directories for b2g bumper on the first run, assuming the jobs are correctly configured in a crontab, and all prerequisite packages are installed by puppet to get mozharness on its feet.
3) The first run should not take particularly long, as there is not massive data processing to do, like there is in vcs sync, so this is also not a concern.
In any case, when you start the new machine up, it would be worth checking that mozharness is running. If anything turns out not to be automatic, please report back in the bug.
Thanks Dustin for picking this up!
Pete
Flags: needinfo?(pmoore)
Comment 11•10 years ago
We went forward with this today since trees were closed anyway. It's taking its sweet time cloning repositories, but nothing seems to have failed yet.
Comment 12•10 years ago
Note that I had to add an additional 65G EBS volume to this host and extend the VG into that PV -- its disk was getting dangerously full without that (and the old host had a 100GB root volume).
Comment 14•8 years ago
b2g bumper has been decommissioned
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(hwine)
Resolution: --- → WONTFIX
Updated•8 years ago
Component: Tools → General