Bug 1461379 (bmo-dbi-connector)

API DB Availability Exceptions on recurring BMO scripts

RESOLVED FIXED

Status


P1
normal
RESOLVED FIXED
9 months ago
4 months ago

People

(Reporter: claudijd, Assigned: dylan)

Tracking

Production

Details

Attachments

(2 attachments)

We have a scheduled job set up whereby we periodically poll the BMO API and look for unassigned bugs in a component where we have a round-robin assignment script running.

I've noticed that periodically (roughly once every few days) the API throws an exception. Although we are taking steps to handle that on our side, I suspect the BMO team cares a lot about how robust and reliable the API is, so I'm sharing some stack traces for transparency and comment (maybe something can be done to make this more reliable):

$ /bin/bash -e /tmp/jenkins2189971386254713860.sh
[00:00:01] INFO [__main__.autoassign:213] No unassigned bugs for component
[00:00:02] INFO [__main__.autoassign:213] No unassigned bugs for component
[00:00:02] DEBUG [__main__.autocasa:89] Analyzing 10 closed bugs...
[00:00:04] WARNING [__main__.autocasa:113] Project 10829 already has a security status set (done), skipping!
[00:00:06] WARNING [__main__.autocasa:119] Project 11513 is already in status 'none' and will not be modified
[00:00:08] WARNING [__main__.autocasa:113] Project 11518 already has a security status set (done), skipping!
[00:00:09] WARNING [__main__.autocasa:113] Project 11524 already has a security status set (done), skipping!
[00:00:11] WARNING [__main__.autocasa:113] Project 11535 already has a security status set (done), skipping!
[00:00:13] WARNING [__main__.autocasa:119] Project 11558 is already in status 'none' and will not be modified
[00:00:14] WARNING [__main__.autocasa:113] Project 11618 already has a security status set (done), skipping!
[00:00:16] WARNING [__main__.autocasa:119] Project 11619 is already in status 'none' and will not be modified
Traceback (most recent call last):
  File "./assigner.py", line 220, in <module>
    main()
  File "./assigner.py", line 66, in main
    autocasa(bapi, capi, bcfg, ccfg, args.dry_run)
  File "./assigner.py", line 93, in autocasa
    comments = bapi.get_comments(bug.get('id'))['bugs'][str(bug.get('id'))]['comments']
  File "/usr/lib/python3.6/site-packages/bugzilla.py", line 55, in get_comments
    return self._get('bug/{bugid}/comment'.format(bugid=bugid))
  File "/usr/lib/python3.6/site-packages/bugzilla.py", line 123, in _get
    raise Exception(r.url, r.reason, r.status_code, r.json())
Exception: ('https://bugzilla.mozilla.org/rest/bug/1456277/comment?api_key=****', 'OK', 200, {'documentation': 'https://bmo.readthedocs.org/en/latest/api/', 'error': True, 'code': 100500, 'message': "\nCan't connect to the database.\nError: Lost connection to MySQL server at 'reading initial communication packet', system error: 104\n  Is your database installed and up and running?\n  Do you have the correct username and password selected in localconfig?\n\n"})
Build step 'Execute shell' marked build as failure
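Note that the traceback shows BMO answering HTTP 200 ("OK") while the JSON body carries `'error': True` and `'code': 100500`, so a client has to inspect the body, not just the status code. (On Linux, "system error: 104" is ECONNRESET, i.e. the connection was reset by the peer.) Below is a minimal sketch of how a caller could ride out such intermittent failures with a bounded retry and exponential backoff. It is not the code in `bugzilla.py` or `assigner.py`: `fetch`, `get_with_retry`, `BugzillaAPIError` and the retry parameters are hypothetical, and treating code 100500 as retryable is an assumption based only on this one traceback.

```python
import time
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: takes a URL, returns (http_status, parsed_json_body).
Fetch = Callable[[str], Tuple[int, Dict[str, Any]]]

# Assumption: only the code seen in the traceback above is treated as transient.
TRANSIENT_CODES = {100500}


class BugzillaAPIError(Exception):
    def __init__(self, url: str, status: int, body: Dict[str, Any]) -> None:
        super().__init__(url, status, body.get("code"), body.get("message"))
        self.url = url
        self.status = status
        self.code = body.get("code")
        self.body = body


def check_response(url: str, status: int, body: Dict[str, Any]) -> Dict[str, Any]:
    """Raise if either the HTTP status or the JSON body signals an error.

    BMO can answer HTTP 200 with "error": true in the body, so the
    status code alone is not a reliable success signal.
    """
    if status >= 400 or body.get("error"):
        raise BugzillaAPIError(url, status, body)
    return body


def get_with_retry(fetch: Fetch, url: str, attempts: int = 3,
                   base_delay: float = 2.0,
                   sleep: Callable[[float], None] = time.sleep) -> Dict[str, Any]:
    """Call fetch(url); retry transient API errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = fetch(url)
        try:
            return check_response(url, status, body)
        except BugzillaAPIError as err:
            is_last = attempt == attempts - 1
            if err.code not in TRANSIENT_CODES or is_last:
                raise  # permanent error, or out of retries
            sleep(base_delay * 2 ** attempt)  # waits 2s, then 4s, ...
    raise AssertionError("unreachable")
```

Injecting `fetch` and `sleep` keeps the retry policy independent of the HTTP library and easy to test. Since the job already runs every 10 minutes, a small number of attempts is probably enough; anything that still fails can be left to the next scheduled run.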

The error seems to suggest intermittent communication problems between BMO's web front end and its backend MySQL database.
Here's the issue for our project that is triggering these exceptions every so often (twice in a few days time, running once every 10min or so):

https://github.com/mozilla/infosec-risk-management-bugzilla/issues/11

Code that triggers this:

https://github.com/mozilla/infosec-risk-management-bugzilla/blob/master/assigner.py
Dylan: is this expected behavior on BMO API endpoints?
Flags: needinfo?(dylan)
(In reply to Jonathan Claudius [:claudijd] (use NEEDINFO) from comment #1)
> Here's the issue for our project that is triggering these exceptions every
> so often (twice in a few days time, running once every 10min or so):
> 
> https://github.com/mozilla/infosec-risk-management-bugzilla/issues/11
> 
> Code that triggers this:
> 
> https://github.com/mozilla/infosec-risk-management-bugzilla/blob/master/assigner.py

How many bugs are you editing at a time?

I'm wondering if this is related to the bulk editing issue I've run into.
Flags: needinfo?(jclaudius)
Priority: -- → P1
:emceeaich - The tool does a couple of things:

1.) READ/WRITE - Looks for unassigned bugs in two security components (Enterprise Information Security::Vulnerability Assessment and Enterprise Information Security::Rapid Risk Analysis) and assigns them in round-robin order.
2.) READ - Looks at all bugs in the same two security components and checks whether their status is in sync with Mozilla's CASA system. If the statuses are out of sync, it corrects them in CASA. I don't believe it goes the other direction, so there is no WRITE to Bugzilla here.
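A minimal sketch of the two jobs listed above, assuming plain Python data in place of real Bugzilla and CASA API calls. `round_robin_assign`, `casa_updates`, the assignee names and the status values are hypothetical and not taken from `assigner.py`; in particular, the real CASA step also skips projects that already have a status set (as the log earlier in this bug shows), which this simplification does not model.

```python
from typing import Dict, Sequence, Tuple


def round_robin_assign(bug_ids: Sequence[int], assignees: Sequence[str],
                       start: int = 0) -> Tuple[Dict[int, str], int]:
    """Give each unassigned bug to the next person in the rotation.

    Returns the assignments plus the rotation index to start from on the
    next run, so a scheduled job can persist its place between runs.
    """
    if not assignees:
        raise ValueError("need at least one assignee")
    n = len(assignees)
    assignments = {bug_id: assignees[(start + i) % n]
                   for i, bug_id in enumerate(bug_ids)}
    return assignments, (start + len(bug_ids)) % n


def casa_updates(bugzilla_status: Dict[int, str],
                 casa_status: Dict[int, str]) -> Dict[int, str]:
    """One-way sync: the statuses CASA should be set to so it matches Bugzilla.

    Bugzilla is only read; the result describes writes to CASA alone.
    """
    return {key: wanted for key, wanted in bugzilla_status.items()
            if casa_status.get(key) != wanted}
```

For example, `round_robin_assign([101, 102, 103], ["alice", "bob"], start=1)` returns `({101: "bob", 102: "alice", 103: "bob"}, 0)`. Returning the next index matters for a job that runs every 10 minutes as a fresh process: an in-memory iterator would restart at the first assignee on every run.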

Let us know if you would prefer a real-time troubleshooting session in which we trigger several runs in succession to see whether they reproduce your issue.

The job is scheduled to run every 10 minutes, 24/7, so that new security issues are triaged as soon as possible.
Flags: needinfo?(jclaudius)
Please note that we are still experiencing these issues; more context and times of occurrence are here:

https://github.com/mozilla/infosec-risk-management-bugzilla/issues/13

Note that we run this every 10 minutes, around the clock.
:digi - making you aware of this, as our latest error (documented in issue #13 above) seems to suggest proxy tunnel issues
(In reply to Jonathan Claudius [:claudijd] (use NEEDINFO) from comment #6)
> :digi - making you aware of this, as our latest error (documented in issue
> #13 above) seems to suggest proxy tunnel issues

Thanks! What host(s) run this job?
Either pentest-master.private.mdc1.mozilla.com or pentest-slave1.private.mdc1.mozilla.com (I can't log in to check at the moment, but I think it's the slave).
(Assignee)

Updated

9 months ago
Assignee: nobody → dylan
Flags: needinfo?(dylan)
(Assignee)

Comment 9

9 months ago
Created attachment 8980038
Refactor Bugzilla::DB to not subclass DBI
(Assignee)

Comment 10

9 months ago
Created attachment 8980039
Use DBIx::Connector to maintain fewer connections to mysql
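BMO itself is written in Perl, so the following is not the landed patch. It is a minimal Python sketch, loosely modelled on DBIx::Connector's "fixup" mode, of the pattern the two attachments describe: the DB layer holds a connector (composition) instead of subclassing the driver, and the connector keeps one cached connection and rebuilds it only when a call fails because that connection has actually gone away. The `Connector` class and the `connected()` method on the connection object are hypothetical stand-ins for a DBI handle.

```python
from typing import Any, Callable, Optional, TypeVar

T = TypeVar("T")


class Connector:
    """Cache one connection; reconnect only when it has been lost.

    Like DBIx::Connector's fixup mode, the code runs on the cached handle
    without pinging first. If it raises and the handle reports that it is
    no longer connected, a new connection is made and the code runs once more.
    """

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect          # factory returning a new connection
        self._conn: Optional[Any] = None

    def _handle(self) -> Any:
        if self._conn is None:
            self._conn = self._connect()
        return self._conn

    def run(self, code: Callable[[Any], T]) -> T:
        conn = self._handle()
        try:
            return code(conn)
        except Exception:
            if conn.connected():
                raise                    # a genuine error, not a dropped connection
            self._conn = self._connect()
            return code(self._conn)      # retry exactly once on a fresh connection
```

Because the code may run twice, this pattern is only safe for work that is idempotent or wrapped in a transaction that rolls back when the connection drops.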
(Assignee)

Comment 11

8 months ago
I'm not entirely sure the landed changes will fix the bug, but I'm going to resolve this bug and hope that next week we see a difference.
Status: NEW → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → FIXED
:dylan - thank you, I'll report back / reopen as needed.
(Assignee)

Updated

4 months ago
Alias: bmo-dbi-connector
(Assignee)

Updated

4 months ago
Blocks: 1492926