Closed
Bug 989492
Opened 11 years ago
Closed 11 years ago
tool to compare different sources of slave and master data
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: bhearsum)
Attachments
(6 files, 4 obsolete files)
80.92 KB, text/plain | catlee: feedback+
139.33 KB, text/plain
16.23 KB, patch | catlee: review+, bhearsum: checked-in+
3.10 KB, patch | catlee: review+, bhearsum: checked-in+
2.11 KB, patch | catlee: review+, bhearsum: checked-in+
7.00 KB, patch | catlee: review+, bhearsum: checked-in+
We've got a bunch of different places containing slave and master data, and they often get out of sync. Having something that compares them and reports differences would help us keep things in line. We have data in at least the following places:
* AWS (source of truth for AWS machines)
* Inventory System list (source of truth for hardware machines)
* Inventory DNS/SREG
* Slavealloc
* Buildbot-configs

Here's a first stab at something that compares AWS instance lists+inventory systems vs. inventory dns/sreg. I'm focused on masters and slaves here (for now at least), so it has a ton of excludes to ignore other things. I'm not really sure where the best place for this is, so it's in tools for now. I almost want a new repo for it. Suggestions welcome.

There's more to do still, including:
* compare against buildbot-configs
* ||ize (parallelize)
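The comparison described in this comment boils down to set differences between the host names gathered from each source. A minimal sketch, assuming each source has already been reduced to a set of short host names; the function name `compare_sources` and the sample data are illustrative and not taken from the attached script:

```python
def compare_sources(truth, other):
    """Return (missing_from_other, unknown_to_truth) as sorted lists.

    truth: names from a source of truth (e.g. AWS plus inventory).
    other: names from a secondary source (e.g. slavealloc).
    """
    truth, other = set(truth), set(other)
    return sorted(truth - other), sorted(other - truth)


# Hypothetical sample data, for illustration only.
aws_and_inventory = {"bld-linux64-ec2-001", "panda-0882", "tst-linux64-ec2-390"}
slavealloc = {"bld-linux64-ec2-001", "old-ec2-machine"}

missing, stale = compare_sources(aws_and_inventory, slavealloc)
```

Here `missing` holds machines a source of truth knows about but slavealloc does not, and `stale` holds the reverse.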
Attachment #8398742 -
Flags: feedback?(catlee)
Assignee | ||
Comment 1•11 years ago
|
||
Assignee | ||
Comment 2•11 years ago
|
||
This report was run before I finalized the exclude list, so it has things about signing and some other machines that are now excluded. It's caught a bunch of interesting things so far including:
* A few machines that exist in a SoT but not slavealloc (eg, panda-0882, tst-linux64-ec2-390)
* Masters in slavealloc whose FQDN is an IP (may or may not be valid)
* Old EC2 machine names in slavealloc that should probably be removed
* Some weird DNS inconsistency (eg bld-linux64-spot-389, servo machines)

There's also a ton of complaints about spot instances that are probably invalid. Haven't dug into those much yet.
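One of the findings above, masters whose FQDN field is really an IP address, is easy to check for mechanically. A rough sketch using the standard `ipaddress` module (Python 3); the `masters` mapping and its host names are made up for illustration, and the real slavealloc data may be shaped differently:

```python
import ipaddress


def fqdn_is_ip(fqdn):
    """Return True if the 'FQDN' field actually holds an IP address."""
    try:
        ipaddress.ip_address(fqdn)
        return True
    except ValueError:
        return False


# Hypothetical master name -> FQDN mapping.
masters = {"bm01": "buildbot-master01.example.com", "bm02": "10.0.0.12"}
suspicious = sorted(name for name, fqdn in masters.items() if fqdn_is_ip(fqdn))
```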
Attachment #8398751 -
Flags: feedback?(catlee)
Assignee | ||
Comment 3•11 years ago
|
||
Some enhancements in this version:
* Use network interfaces to find spot instances. Rail tells me this is the right way to find the list of possible spot instances.
* Generate list of machines in Buildbot. This part of the script sucks because of our configs. I'm happy to change it if someone has a better idea.

This script now depends on a million packages too, mostly because cloudtools and buildbot depend on a ton of stuff. It also depends on invtool (by way of cloudtools), which doesn't seem to be installable unless you're root. This might make deploying it a little tough.
Attachment #8398742 -
Attachment is obsolete: true
Attachment #8398742 -
Flags: feedback?(catlee)
Attachment #8399584 -
Flags: feedback?(catlee)
Assignee | ||
Updated•11 years ago
|
Attachment #8399584 -
Flags: feedback?(rail)
Assignee | ||
Updated•11 years ago
|
Attachment #8399584 -
Flags: feedback?(rail) → feedback?(rail)
Comment 5•11 years ago
|
||
Comment on attachment 8399584 [details] [diff] [review]
fix spot instances; compare against buildbot

Review of attachment 8399584 [details] [diff] [review]:
-----------------------------------------------------------------

Overall it looks great. I have 2 concerns unrelated to the code:
1) boto may not return all objects, see https://github.com/boto/boto/pull/2189
2) the part responsible for the spot DNS check may go away once we switch to the puppetless approach.
Attachment #8399584 -
Flags: feedback?(rail) → feedback+
Comment 6•11 years ago
|
||
Comment on attachment 8398751 [details]
sample report
For the "machines missing from AWS or inventory" section, it would be nice to know where the machines ARE defined. e.g. are they coming from slavealloc, or buildbot-configs, or ???
Attachment #8398751 -
Flags: feedback?(catlee) → feedback+
Assignee | ||
Comment 7•11 years ago
|
||
(In reply to Chris AtLee [:catlee] from comment #6)
> Comment on attachment 8398751 [details]
> sample report
>
> For the "machines missing from AWS or inventory" section, it would be nice
> to know where the machines ARE defined. e.g. are they coming from
> slavealloc, or buildbot-configs, or ???

Ah, I think I addressed this in my updated patch already (but I didn't attach an updated report). The new report has:
report.write("Machines in AWS/Inventory but not in Slavealloc:\n")
report.write("Machines in AWS/Inventory but not in Buildbot configs:\n")
report.write("Machines in Slavealloc but not in AWS or inventory:\n")
report.write("Machines in Buildbot configs but not in AWS or inventory:\n")

And for each type of dns record:
report.write("Machines with errors in their %s DNS records:\n" % type_)
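To illustrate how per-source sections like these could be emitted, here is a small sketch that writes a title followed by one machine per line. Only the section titles come from the comment above; the helper `write_section`, the indentation, and the blank line between sections are assumptions about the layout:

```python
import io


def write_section(report, title, machines):
    """Write one report section: a title line, then one machine per line."""
    report.write(title + "\n")
    for name in sorted(machines):
        report.write("  %s\n" % name)
    report.write("\n")


# An in-memory file stands in for the real report file.
report = io.StringIO()
write_section(report, "Machines in AWS/Inventory but not in Slavealloc:", {"panda-0882"})
write_section(report, "Machines in Slavealloc but not in AWS or inventory:", set())
text = report.getvalue()
```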
Assignee | ||
Comment 8•11 years ago
|
||
Assignee | ||
Comment 9•11 years ago
|
||
No changes to the actual report here, just replacing the command line interface with something compatible with reportor. I wasn't quite sure how to test it, but I was able to run it from the command line after setting REPORTOR_CREDS. Is there more testing I can/should do? I also got rid of most of the dependencies by copying in the parts of cloudtools that I need (crappy) and fixing bug 991056. The remaining few I added to setup.py. I'm not 100% sure if I'm done working on the report yet, but I think the best way forward is to get it running, fix up all of the obvious things it whines about, and see where we stand.
Attachment #8398743 -
Attachment is obsolete: true
Attachment #8399584 -
Attachment is obsolete: true
Attachment #8399584 -
Flags: feedback?(catlee)
Attachment #8400724 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8400724 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 10•11 years ago
|
||
Comment on attachment 8400724 [details] [diff] [review]
run the report in reportor
Checked this in, tested it on cruncher, and now I've deployed it to the "production" reportor spot.
Attachment #8400724 -
Flags: checked-in+
Assignee | ||
Comment 11•11 years ago
|
||
Whoops, I forgot to install the newly required deps (boto, etc.). Did that now, we should have a report out tomorrow.
Assignee | ||
Comment 12•11 years ago
|
||
The report is running now, but it's a tad noisy because check_call doesn't suppress the command's output. This should shut it up.
Attachment #8403506 -
Flags: review?(catlee)
Assignee | ||
Comment 13•11 years ago
|
||
This patch also gets us ignoring a few more things:
* Servo (because its set-up is pretty static, and it's a PITA to get its slavelist)
* All dev machines (because loaners aren't useful to this report, and non-loaners come in and out of existence)
* aws-manager (not a master or slave)
* buildbot-master81 (not in slavealloc)
Attachment #8403506 -
Attachment is obsolete: true
Attachment #8403506 -
Flags: review?(catlee)
Attachment #8404163 -
Flags: review?(catlee)
Assignee | ||
Comment 14•11 years ago
|
||
I went through and fixed up most of the machines that were in slavealloc but not other places. I also fixed most of the invalid dns. bug 994267 is fixing up buildbot-configs to remove dead machines.
Comment 15•11 years ago
|
||
Comment on attachment 8404163 [details] [diff] [review]
even more quieter

Review of attachment 8404163 [details] [diff] [review]:
-----------------------------------------------------------------

::: reports/machine_sanity/machine_sanity.py
@@ +140,5 @@
> +    null = open(devnull, 'w')
> +    try:
> +        check_call(["hg", "clone", buildbot_configs, bbdir], stdout=null)
> +    finally:
> +        null.close()

a bit cleaner as

with open(devnull, 'w') as null:
    check_call(["hg", "clone", buildbot_configs, bbdir], stdout=null)

3 fewer lines! SAVE THE NEWLINES!!!!
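For reference, on Python 3.3 and later the same silencing can be done without opening `devnull` by hand, using `subprocess.DEVNULL`. This is only a sketch of that alternative, not what the patch does; `run_quietly` is a made-up helper and a trivial child process stands in for the `hg clone` call:

```python
import subprocess
import sys


def run_quietly(cmd):
    """Run cmd, discarding its stdout; raise CalledProcessError on failure."""
    subprocess.check_call(cmd, stdout=subprocess.DEVNULL)


# Stand-in for the noisy `hg clone`: a child process that prints to stdout.
run_quietly([sys.executable, "-c", "print('cloning...')"])
```

stderr is left alone here, so genuine errors from the command would still show up in the report run.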
Attachment #8404163 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 16•11 years ago
|
||
Comment on attachment 8404163 [details] [diff] [review]
even more quieter
Landed with the suggested change.
Attachment #8404163 -
Flags: checked-in+
Assignee | ||
Comment 18•11 years ago
|
||
Watch for skip patterns in slavealloc/buildbot names, not just inventory/aws.
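The idea is to run every source's names through the same skip filter before comparing them. A minimal sketch, assuming the skip list is a set of regular expressions; the patterns and host names below are hypothetical, and the real exclude list lives in the script:

```python
import re

# Hypothetical skip patterns, loosely modelled on the exclusions in this bug.
SKIP_PATTERNS = [re.compile(p) for p in (r"^dev-", r"^servo-", r"^aws-manager")]


def is_skipped(name):
    """Return True if name matches any skip pattern."""
    return any(p.search(name) for p in SKIP_PATTERNS)


def filter_names(names):
    """Drop skipped names from any source before comparing."""
    return {n for n in names if not is_skipped(n)}


# Apply the same filter to slavealloc and buildbot names, not just inventory/AWS.
slavealloc = filter_names({"dev-linux64-ec2-foo", "bld-linux64-ec2-001"})
buildbot = filter_names({"servo-linux64-01", "bld-linux64-ec2-001"})
```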
Attachment #8406136 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8406136 -
Flags: review?(catlee) → review+
Assignee | ||
Updated•11 years ago
|
Attachment #8406136 -
Flags: checked-in+
Assignee | ||
Comment 19•11 years ago
|
||
Per IRC, this patch provides a JSON file with all of the valid slaves listed in it. Also, turns out I broke stuff in my last patch. This fixes that.
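As a rough illustration of producing such a file, the sketch below treats a slave as valid when every source knows about it and writes the result as a sorted JSON list. Both the definition of "valid" and the JSON layout are assumptions made for this example; the patch may use different criteria and a different structure:

```python
import io
import json


def usable_slaves(truth, slavealloc, buildbot):
    """Assumed rule: a slave is usable if every source lists it."""
    return set(truth) & set(slavealloc) & set(buildbot)


def dump_usable_slaves(names, fp):
    """Serialize the slave names to fp as a sorted JSON list."""
    json.dump(sorted(names), fp, indent=2)


# Hypothetical sample data; an in-memory buffer stands in for the output file.
names = usable_slaves(
    {"bld-linux64-ec2-001", "panda-0882"},
    {"bld-linux64-ec2-001", "old-ec2-machine"},
    {"bld-linux64-ec2-001", "panda-0882"},
)
buf = io.StringIO()
dump_usable_slaves(names, buf)
```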
Attachment #8408412 -
Flags: review?(catlee)
Updated•11 years ago
|
Attachment #8408412 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 20•11 years ago
|
||
Comment on attachment 8408412 [details] [diff] [review]
provide listing of all valid slaves
Landed and updated cruncher.
Attachment #8408412 -
Flags: checked-in+
Assignee | ||
Comment 21•11 years ago
|
||
With the latest patch checked in, we've now got a list of all of the usable slaves: https://secure.pub.build.mozilla.org/builddata/reports/reportor/daily/machine_sanity/usable_slaves.json I think we're done here now?
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•7 years ago
|
Component: Tools → General