Closed
Bug 1279556
Opened 8 years ago
Closed 8 years ago
Wait times emails stopped on May 10
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1293659
People
(Reporter: coop, Unassigned)
Details
The last wait times email I received came on May 10. Something is wrong here.
Reporter
Comment 1•8 years ago
Looking in papertrail, buildapi is returning 500:

Jun 10 06:06:04 relengwebadm.private.scl3.mozilla.com buildapi_waittimes: Error: fetching wait times from location http://buildapi.pvt.build.mozilla.org/buildapi/reports/waittimes/buildpool?maxb=480&endtime=1465542000&mpb=15&format=json : HTTP Error 500: Internal Server Error

No code has changed in buildapi since March 11. I tried re-deploying the existing buildapi code, and saw the following errors:

[2016-06-10 09:36:43] Running push_www
[2016-06-10 09:36:43] [web1.releng.webapp.scl3.mozilla.com] running: /data/bin/update-www.sh buildapi
[2016-06-10 09:36:43] [web2.releng.webapp.scl3.mozilla.com] running: /data/bin/update-www.sh buildapi
[2016-06-10 09:36:43] [celery1.srv.releng.scl3.mozilla.com] running: /data/bin/update-www.sh buildapi
[2016-06-10 09:36:43] [celery1.srv.releng.scl3.mozilla.com] failed: /data/bin/update-www.sh buildapi (0.086s)
[celery1.srv.releng.scl3.mozilla.com] err: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
[2016-06-10 09:36:43] [web2.releng.webapp.scl3.mozilla.com] failed: /data/bin/update-www.sh buildapi (0.091s)
[web2.releng.webapp.scl3.mozilla.com] err: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
[2016-06-10 09:36:43] [web1.releng.webapp.scl3.mozilla.com] failed: /data/bin/update-www.sh buildapi (0.103s)
[web1.releng.webapp.scl3.mozilla.com] err: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
[2016-06-10 09:36:43] Finished push_www (0.104s)
[2016-06-10 09:36:43] Starting new HTTPS connection (1): changelog.allizom.org

So we can't talk to celery from the webheads. This is likely related to the recent key changes, so NI-ing Callek for potential insight.
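For anyone triaging a similar push, the failing hosts in the log above can be pulled out mechanically. This is only a sketch: the regex is inferred from the handful of log lines quoted in this comment, and `failed_hosts` is a hypothetical helper name, not part of the deploy tooling.

```python
import re

# Pattern inferred from the push_www lines quoted above, e.g.
#   [2016-06-10 09:36:43] [host] failed: /data/bin/update-www.sh buildapi (0.086s)
# The host bracket may not contain whitespace, so the timestamp bracket is skipped.
FAILED_RE = re.compile(
    r"\[(?P<host>[^\]\s]+)\] failed: (?P<cmd>.+?) \((?P<secs>[\d.]+)s\)"
)


def failed_hosts(log_text):
    """Return the sorted, de-duplicated hosts whose command is reported as failed."""
    return sorted({m.group("host") for m in FAILED_RE.finditer(log_text)})
```

Run against the log in this comment, this should list all three hosts (web1, web2 and celery1), which shows the push failed everywhere rather than on a single machine.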
Flags: needinfo?(bugspam.Callek)
Priority: -- → P3
Comment 2•8 years ago
(In reply to Chris Cooper [:coop] from comment #1)
> Looking in papertrail, buildapi is returning 500:
>
> Jun 10 06:06:04 relengwebadm.private.scl3.mozilla.com buildapi_waittimes:
> Error: fetching wait times from location
> http://buildapi.pvt.build.mozilla.org/buildapi/reports/waittimes/
> buildpool?maxb=480&endtime=1465542000&mpb=15&format=json : HTTP Error 500:
> Internal Server Error

I'd be interested in what the buildapi error was/is....

> No code has changed in buildapi since March 11. I tried re-deploying the
> existing buildapi code, and saw the following errors:
>
> [celery1.srv.releng.scl3.mozilla.com] err: Permission denied
> (publickey,gssapi-keyex,gssapi-with-mic).
> [web2.releng.webapp.scl3.mozilla.com] err: Permission denied
> (publickey,gssapi-keyex,gssapi-with-mic).
> [web1.releng.webapp.scl3.mozilla.com] err: Permission denied
> (publickey,gssapi-keyex,gssapi-with-mic).
>
> So we can't talk to celery from the webheads. This is likely related to the
> recent key changes, so NI-ing Callek for potential insight.

These wouldn't be from my key changes, this would be from webops changes with the rebuilds recently. They are also the issues that require us to file bugs lately about other relengweb pushes (like trychooser)
Flags: needinfo?(bugspam.Callek)
Comment 3•8 years ago
[root@web1.releng.webapp.scl3 ~]# cat /var/log/httpd/buildapi.pvt.build.mozilla.org/error_log
[Fri Jun 10 10:05:01 2016] [error] [client 10.22.81.211] Timeout when reading response headers from daemon process 'buildapi': /data/www/buildapi/buildapi.wsgi

/var/log/buildapi/buildapi.log itself looks fine. I wonder if this is some sort of fallout from the py2.7 upgrade that broke relengweb during the rebuild.
Flags: needinfo?(hwine)
Comment 4•8 years ago
I'm not aware of the py27 issues mentioned in comment 3.

The trychooser push issue should be fixed as of this morning -- I can push, but I have extra rights atm.
Flags: needinfo?(hwine)
Comment 5•8 years ago
FWIW, the entire relengweb cluster went down around May 10th during the rebuilds and a mod_wsgi/python update. See http://logs.glob.uno/?c=mozilla%23releng&s=10+May+2016&e=10+May+2016 from ~15:38 onward.
Comment 6•8 years ago
Okay, after some back and forth on #irc for me to remember the past -- a little more detail.

Some of our releng webcluster apps used to rely on data flows authorized with ssh keys. Unfortunately, some of those keys were not documented and/or were "loose" (not in puppet or ldap). Recent cleanups happened in 2 steps: the first was around May 10 iirc, and the second happened later in May.

As of this morning, the keys have been recovered and properly inserted into puppet for trychooser. (See bug 1278585 comment 5 and on.) If the wait times report used the same flows, then it should start working. It appears not to, as http://buildapi.pvt.build.mozilla.org/buildapi/reports/waittimes still returns a 500.

Next steps would be to identify what credentials are being used in this flow, and to work with webops to get that flow re-enabled and/or switch to the keys that do work.

ni: :coop to report back next week on whether the emails are still a no-show, and whether this is worth pursuing.
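To re-check the endpoint without waiting for the next email, something like the following could be run from a host that can reach buildapi. It is a rough sketch: the URL and query parameters are copied from the papertrail line in comment 1, the meaning of `maxb` and `mpb` is not stated in this bug, and `build_waittimes_url` and `probe` are hypothetical helper names rather than anything in buildapi itself.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlencode
from urllib.request import urlopen

# Taken from the failing request logged in comment 1.
BASE_URL = "http://buildapi.pvt.build.mozilla.org/buildapi/reports/waittimes/buildpool"


def build_waittimes_url(endtime, maxb=480, mpb=15, fmt="json", base=BASE_URL):
    """Rebuild the report URL with the same parameter order as the logged request."""
    query = urlencode(
        [("maxb", maxb), ("endtime", endtime), ("mpb", mpb), ("format", fmt)]
    )
    return f"{base}?{query}"


def probe(url, timeout=30):
    """Return (status_code, detail); status_code is None if no HTTP response arrived."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status, "ok"
    except HTTPError as err:  # server answered with an error status, e.g. 500
        return err.code, str(err.reason)
    except URLError as err:  # DNS failure, refused connection, timeout, ...
        return None, str(err.reason)
```

Calling `probe(build_waittimes_url(1465542000))` repeats the request from comment 1; a result other than 500 would suggest the flow has been restored.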
Flags: needinfo?(coop)
Reporter
Comment 7•8 years ago
(In reply to Hal Wine [:hwine] (use NI) from comment #6)
> ni: :coop to report back next week on whether the emails are still a
> no-show. And whether this is worth pursuing.

Emails are still broken. I think this is still worth pursuing, because even if these particular reports aren't useful, we can't use the data source to build an alternative until this is fixed.

The long-term plan should probably involve moving to Heroku.
Flags: needinfo?(coop)
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Assignee
Updated•7 years ago
Component: Tools → General