Closed Bug 1126879 Opened 10 years ago Closed 8 years ago

slaveapi fails at filing tracking bugs when it wants to file an unreachable bug for a slave without a problem tracking bug

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: aobreja)

Details

Attachments

(3 files)

See (until the next time we toss history) https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=t-snow-r4&name=t-snow-r4-0011: two reboots with the result "400 Client Error: Bad Request" before I figured out that it wanted to call the slave unreachable but was failing to file the tracking bug that would make it blocked by the unreachable bug; then, after I filed the tracker for it, the successful "Failed. Filed IT bug for reboot".
this is a regression in bmo, i'm sure. a seamonkey irc bot (that i don't control) had issues yesterday as well.
> 400 Client Error: Bad Request

In order to look at this, we'll need to know what request the bot is making against bmo -- what webservice endpoint is it hitting, what method, and what parameters?
Flags: needinfo?(bugspam.Callek)
for example:

2015-01-28 09:35:01,262 - INFO - panda-0524 - Sending request: POST https://bugzilla.mozilla.org/rest/bug
2015-01-28 09:35:01,879 - ERROR - panda-0524 - Something went wrong while processing!
2015-01-28 09:35:01,879 - ERROR - panda-0524 - Traceback (most recent call last):
2015-01-28 09:35:01,880 - ERROR - panda-0524 -   File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/processor.py", line 64, in _worker
2015-01-28 09:35:01,880 - ERROR - panda-0524 -     res, msg = action(slave, *args, **kwargs)
2015-01-28 09:35:01,880 - ERROR - panda-0524 -   File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/actions/reboot.py", line 116, in reboot
2015-01-28 09:35:01,880 - ERROR - panda-0524 -     slave.reboot_bug = file_reboot_bug(slave)
2015-01-28 09:35:01,880 - ERROR - panda-0524 -   File "/builds/slaveapi/prod/lib/python2.7/site-packages/slaveapi/clients/bugzilla.py", line 76, in file_reboot_bug
2015-01-28 09:35:01,880 - ERROR - panda-0524 -     resp = bugzilla_client.create_bug(data)
2015-01-28 09:35:01,881 - ERROR - panda-0524 -   File "/builds/slaveapi/prod/lib/python2.7/site-packages/bzrest/client.py", line 55, in create_bug
2015-01-28 09:35:01,881 - ERROR - panda-0524 -     return self.request("POST", "bug", data)
2015-01-28 09:35:01,881 - ERROR - panda-0524 -   File "/builds/slaveapi/prod/lib/python2.7/site-packages/bzrest/client.py", line 40, in request
2015-01-28 09:35:01,881 - ERROR - panda-0524 -     r.raise_for_status()
2015-01-28 09:35:01,881 - ERROR - panda-0524 -   File "/builds/slaveapi/prod/lib/python2.7/site-packages/requests/models.py", line 683, in raise_for_status
2015-01-28 09:35:01,881 - ERROR - panda-0524 -     raise HTTPError(http_error_msg, response=self)
2015-01-28 09:35:01,881 - ERROR - panda-0524 - HTTPError: 400 Client Error: Bad Request

Which is: http://mxr.mozilla.org/build/source/slaveapi/slaveapi/clients/bugzilla.py#66

Which is calling into https://github.com/bhearsum/bzrest/blob/master/bzrest/client.py -- specifically, it's just calling a POST with that data: https://github.com/bhearsum/bzrest/blob/master/bzrest/client.py#L54
Flags: needinfo?(bugspam.Callek) → needinfo?(glob)
Specifically, I suspect this is a regression from bug 1124437 ("Backport upstream bug 1090275 to bmo/4.2 to whitelist webservice api methods").
this doesn't appear to be related to bug 1124437 - i'm able to create bugs via rest without issue.

bugzilla will be returning the reason for the failure in its json response:

{"documentation":"http://www.bugzilla.org/docs/tip/en/html/api/","code":32000,"error":true,"message":"The version value 'other' is not active."}

however the library used here is catching and dealing with the http/400 result first, which results in it dropping the error message in favour of a generic "bad request" one.

my guess is the bot is setting the "blocks" field to a bug which doesn't exist.
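The failure mode described here -- requests' raise_for_status() masking Bugzilla's structured error -- can be avoided by inspecting the JSON body before giving up. A minimal sketch (the helper name and fallback text are mine, not part of bzrest or slaveapi):

```python
def bugzilla_error_message(status_code, body):
    """Return Bugzilla's own error message from a parsed JSON response
    body, falling back to a generic HTTP message when the body carries
    no structured error. The expected shape matches the response quoted
    above: {"error": true, "code": ..., "message": ...}."""
    if isinstance(body, dict) and body.get("error"):
        return "Bugzilla error %s: %s" % (body.get("code"), body.get("message"))
    return "HTTP %s Client Error" % status_code

# The exact response body quoted in the comment above:
resp = {
    "documentation": "http://www.bugzilla.org/docs/tip/en/html/api/",
    "code": 32000,
    "error": True,
    "message": "The version value 'other' is not active.",
}
print(bugzilla_error_message(400, resp))
# → Bugzilla error 32000: The version value 'other' is not active.
```

With a helper like this, the log line would have pointed straight at the rejected "version" field instead of a bare "400 Client Error: Bad Request".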
Flags: needinfo?(glob)
Should be fixed with bzrest 0.9, which I just updated on prod slaveapi.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
It would be sweet to finally get this fixed, since I've generally rebooted a slave two or three times, and thus lost 24-48 hours of its life, before I finally notice it's not actually getting anywhere.
It would be sweet to finally get this fixed, since we now have employees doing buildduty, including doing my non-job when I'm on non-PTO, and since I didn't train them they don't know this bug exists.
Alin or Andrei should be able to tackle this.
Assignee: bugspam.Callek → nobody
Component: Tools → Buildduty
QA Contact: hwine → bugspam.Callek
Assignee: nobody → aobreja
This patch upgrades bzrest to version 0.9, which could solve the problem with the POST call ("return self.request("POST", "bug", data)"). A recent example of this problem can be found on t-yosemite-r7-0387. The Puppet repository is here: https://github.com/mozilla/build-puppet. Callek, what could be the risks if we do this upgrade in puppet for bzrest?
Comment on attachment 8784864 [details] [diff] [review]
bug1126879_puppet.patch

Review of attachment 8784864 [details] [diff] [review]:
-----------------------------------------------------------------

As I said in c#6 -- I updated bzrest on prod and thought that fixed it. I didn't realize we had bzrest==0.7 pinned here. so do eet.
Attachment #8784864 - Flags: review+
(In reply to Justin Wood (:Callek) from comment #12)
> As I said in c#6 -- I updated bzrest on prod and thought that fixed it. I
> didn't realize we had bzrest==0.7 pinned here. so do eet.

Ahh, and based on that comment, we also need to update: https://github.com/mozilla/build-slaveapi/blob/master/setup.py#L20 Otherwise we'll fail to install things right.

So, steps:
* Update github's slaveapi repo (version bump for bzrest and slaveapi itself).
* Package it up and deploy to relengweb's pypi and puppet's pypi mirrors.
* Deploy this puppet patch + a version bump for slaveapi.
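The pin bump in the first step can be sketched as below. This is a demonstration on a throwaway copy, not the real change: the setup.py contents and pin string here are assumptions, and the actual edit lands in build-slaveapi's setup.py (the install_requires line referenced above).

```shell
# Work on a scratch copy so nothing real is touched.
mkdir -p /tmp/slaveapi-demo && cd /tmp/slaveapi-demo

# A minimal stand-in for build-slaveapi's setup.py (illustrative only).
cat > setup.py <<'EOF'
from setuptools import setup
setup(
    name="slaveapi",
    install_requires=["bzrest==0.7"],
)
EOF

# Bump the bzrest pin from 0.7 to 0.9, as the steps above call for.
sed -i 's/bzrest==0.7/bzrest==0.9/' setup.py
grep bzrest setup.py
```

After the bump, the sdist would be rebuilt and uploaded to the internal pypi mirrors per the remaining steps.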
Attachment #8786738 - Flags: review?(bugspam.Callek) → review+
Callek, I don't have merge rights for this patch; I get "Only those with write access to this repository can merge pull requests." Can you merge this patch for me? Thanks.
Flags: needinfo?(bugspam.Callek)
Done, thanks.
Flags: needinfo?(bugspam.Callek)
> * Package it up and deploy to relengweb's pypi and puppet's pypi mirrors
> * Deploy this puppet patch + a version bump for slaveapi.

Done this part.
If we expect this to work now, it doesn't.
Attached file logs_bug1126879.txt
It seems that upgrading bzrest from 0.7 to 0.9 did not solve the issue; it's the same issue as in comment 3, with: Sending request: POST https://bugzilla.mozilla.org/rest/bug. The problem can be seen on t-w864-ix-230 and t-w864-ix-199. Callek, do you have any suggestions here?
Flags: needinfo?(bugspam.Callek)
Two thoughts:
* I thought there was a puppet issue with the version bumps; did that get sorted out? If not, then we're not actually running the new code.
* Slaveapi needs to be manually restarted after the version bumps, since there is no soft-reset and it retains state in memory (i.e. the history is not flushed to disk anywhere).
Flags: needinfo?(bugspam.Callek)
Callek, do you have any other suggestions on this bug? Andrei mentioned this morning that he was still stuck on it.
Flags: needinfo?(bugspam.Callek)
Nothing offhand. Tracebacks would be useful, if any exist in the logs. Also, running similar commands by hand within the slave's venv of slaveapi+bzrest would validate that it can indeed reach bmo with the creds it has and is able to submit a bug in a similar fashion. If we feel this is important enough and our buildduty team can't decipher the app, I can look into it, but it's a big context switch, so I'd like :coop to confirm with me that he does want me to look in for debugging's sake, if I am to do so.
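A hand-run check along the lines suggested above could look like the sketch below. Everything here is an assumption rather than slaveapi's actual code: the payload fields, product/component values, and api_key auth are illustrative, and should be checked against slaveapi/clients/bugzilla.py before use. The network call is wrapped in a function that is never invoked here, so loading this submits nothing.

```python
import json

# Endpoint seen in the slaveapi logs earlier in this bug.
BMO = "https://bugzilla.mozilla.org/rest/bug"

def build_reboot_bug_data(slave_name, blocks_bug):
    """Assemble a bug payload shaped like the one slaveapi POSTs.
    Every field value below is illustrative, not copied from
    slaveapi's source."""
    return {
        "product": "Infrastructure & Operations",   # assumed
        "component": "CIDuty",                      # assumed
        "version": "other",                         # the value BMO rejected in comment 5
        "summary": "%s is unreachable" % slave_name,
        "blocks": [blocks_bug],
    }

def smoke_test_bmo(payload, api_key):
    """POST the payload by hand and print BMO's full JSON response,
    including the structured error that a generic HTTPError would hide.
    Uses the requests library already present in the slaveapi venv."""
    import requests
    r = requests.post(BMO, json=payload, params={"api_key": api_key})
    print(r.status_code, json.dumps(r.json(), indent=2))

# Inspect the payload locally without touching the network:
data = build_reboot_bug_data("t-w864-ix-230", 1126879)
print(json.dumps(data, indent=2))
```

Running smoke_test_bmo(data, "<api key>") from the venv would show whether the creds work and, on failure, which field BMO is actually rejecting.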
Flags: needinfo?(bugspam.Callek)
Apparently it just needed dhouse to restart slaveapi (a couple of times) after he did a kernel upgrade on it, since it just filed some tracking bugs for the first time in just over two years.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard