Closed
Bug 1293544
Opened 9 years ago
Closed 9 years ago
alertmanager.allizom.org is returning 500 errors and needs to get back to a stable changeset
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
VERIFIED
FIXED
People
(Reporter: nthomas, Assigned: armenzg)
References
Details
e.g. http://alertmanager.allizom.org/data/setadetails/?date=2016-08-09&buildbot=1&branch=fx-team&inactive=1
We first noticed this on Aug 8 at 14:00 PDT, and it seems to be pretty consistent. It caused a tree closure because buildbot didn't cope very well.
Comment 1•9 years ago
I no longer have access to that machine (and never worked on alertmanager :) )
jmaher is the primary contact, but he is on PTO. Adding some other people who might be able to help.
Comment 2•9 years ago
Last time, jgriffin was able to log in and fix it; I don't have credentials.
Flags: needinfo?(jgriffin)
Comment 3•9 years ago
Also, jgriffin, would it be possible to share the credentials with releng so we can look at it when ateam members are unavailable?
| Assignee | ||
Comment 4•9 years ago
It seems that a security change was deployed to alertmanager in the last day or so:
https://github.com/mozilla/ouija/commit/8ab444e1c80e603758759c1c71417b0a53ede131
Unfortunately, the change came in with other development changes that are meant for the Heroku/TaskCluster work and were not meant to be deployed to alertmanager.
glob is currently working on getting us back into a good state.
So far, he installed the sqlalchemy python package which was missing for the latest changes.
Instead of trying to fix the latest development changes, we're going to try to roll back as best we can.
glob is now backing up the database just in case rolling back fails.
We're hoping that reverting to 8ab444e1c80e603758759c1c71417b0a53ede131 might get us back into a good state.
We might need to go further back in the commit history and cherry-pick that change.
Assignee: nobody → glob
Flags: needinfo?(jgriffin)
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #4)
> It seems that a security change was deployed to alertmanager in the last day
> So far, he installed the sqlalchemy python package which was missing for the
> latest changes.
for the record..
ouija doesn't use a virtualenv, which made this all the more interesting.
there were two versions of MySQL-python installed, one by pip and one by apt-get. i had to purge the apt-installed package to resolve versioning mismatches between MySQLdb and _mysql.
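A rough sketch of how a duplicate install like this could be spotted before it bites. This is not what was run on the host: it assumes Python 3.8+ (for `importlib.metadata`), and the helper names `installed_distributions` and `find_duplicates` are made up for illustration.

```python
from collections import defaultdict
from importlib import metadata


def installed_distributions():
    """Yield (name, version, location) for each distribution found on sys.path."""
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip entries with unreadable metadata
            yield name, dist.version, str(dist.locate_file(""))


def find_duplicates(dists):
    """Return {normalized name: sorted [(version, location), ...]} for names seen more than once."""
    seen = defaultdict(set)
    for name, version, location in dists:
        # pip and apt may spell the same project differently, so normalize the name
        seen[name.lower().replace("_", "-")].add((version, location))
    return {name: sorted(copies) for name, copies in seen.items() if len(copies) > 1}
```

Calling `find_duplicates(installed_distributions())` on a machine without a virtualenv would list any package that is visible from more than one location, such as a pip copy next to an apt copy.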
Comment 6•9 years ago
re: comment 4
Armen: is there a way we can pin the revision that the existing seta server uses so that it is not brought down by code deploys meant for the new server?
Flags: needinfo?(armenzg)
(In reply to Kim Moir [:kmoir] from comment #6)
> Armen: is there a way we can pin the revision that the existing seta server
> uses so that it is not brought down by code deploys meant for the new server?
i found no evidence of automated code deployment, so there shouldn't be a problem.
i set up a nightly backup of the database (to /mnt/mysql/backups), and checked out revision 8ab444e1c80e603758759c1c71417b0a53ede131.
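For reference, a minimal sketch of what a nightly dump into that directory might look like. The actual backup job is not shown in this bug, so everything here is an assumption: the database name `ouija`, the dated file naming, and credentials coming from a MySQL option file.

```python
import datetime


def backup_command(database, backup_dir="/mnt/mysql/backups", today=None):
    """Build a mysqldump invocation that writes a dated dump file.

    The database name and the file naming scheme are hypothetical;
    credentials are assumed to come from an option file such as ~/.my.cnf.
    """
    today = today or datetime.date.today()
    target = "{}/{}-{}.sql".format(backup_dir, database, today.isoformat())
    # --single-transaction gives a consistent snapshot for InnoDB tables
    return ["mysqldump", "--single-transaction", "--result-file=" + target, database]
```

A cron entry could then run a small script that passes `backup_command("ouija")` to `subprocess.check_call`.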
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(armenzg)
Resolution: --- → FIXED
| Assignee | ||
Comment 8•9 years ago
I will create a new branch on mozilla/ouija to be used for alertmanager.
We will stay on that branch.
| Assignee | ||
Comment 9•9 years ago
Thanks glob! I will take it from here since I'm getting access.
It seems that 8ab444e1c80e603758759c1c71417b0a53ede131 causes the dates returned in the JSON data to be a day in the future.
I'm going to revert to eed18721a3c993e251c2f0708a557073adc62193 and cherry-pick the SQL injection change.
Assignee: glob → armenzg
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: alertmanager.allizom.org is returning 500 errors → alertmanager.allizom.org is returning 500 errors and need to get back to a stable release
| Assignee | ||
Comment 10•9 years ago
This is what I've done so far.
* Created a new branch with only the change we want [1]
* Updated the host and restarted apache [2]
Unfortunately, I still see the bad date in here:
http://alertmanager.allizom.org/data/setadetails/?date=2016-08-09&buildbot=1&branch=fx-team&inactive=1
> {"jobtypes": {"2016-08-10":
[1]
https://github.com/mozilla/ouija/commits/production-alertmanager
armenzg@armenzg-thinkpad:~/repos/ouija$ git checkout -b production-alertmanager
Switched to a new branch 'production-alertmanager'
armenzg@armenzg-thinkpad:~/repos/ouija$ git reset --hard eed18721a3c993e251c2f0708a557073adc62193
HEAD is now at eed1872 merged updated and cleaned up updatedb.py
armenzg@armenzg-thinkpad:~/repos/ouija$ git push origin production-alertmanager
Total 0 (delta 0), reused 0 (delta 0)
To git@github.com:mozilla/ouija.git
* [new branch] production-alertmanager -> production-alertmanager
armenzg@armenzg-thinkpad:~/repos/ouija$ git cherry-pick 8ab444e1c80e603758759c1c71417b0a53ede131
[production-alertmanager 948a182] ensure we avoid common SQL injection attacks
Author: Joel Maher <joel.maher@gmail.com>
Date: Mon Aug 8 17:29:54 2016 +0000
1 file changed, 31 insertions(+), 11 deletions(-)
armenzg@armenzg-thinkpad:~/repos/ouija$ git push
Counting objects: 4, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 823 bytes | 0 bytes/s, done.
Total 4 (delta 3), reused 0 (delta 0)
To git@github.com:mozilla/ouija.git
eed1872..948a182 production-alertmanager -> production-alertmanager
[2]
ubuntu@ouija:~/ouija$ git checkout -b production-alertmanager origin/production-alertmanager
M tools/seta.cfg
Branch production-alertmanager set up to track remote branch production-alertmanager from origin.
Switched to a new branch 'production-alertmanager'
ubuntu@ouija:~/ouija$ sudo apache2ctl restart
AH00558: apache2: Could not reliably determine the server's fully qualified domain name, using ouija.allizom.org. Set the 'ServerName' directive globally to suppress this message
| Assignee | ||
Comment 11•9 years ago
I also cherry-picked the following change [1].
So far kmoir says that checkconfig works.
[1]
armenzg@armenzg-thinkpad:~/repos/ouija$ git cherry-pick 4e8f99e8191d372e1abaa2f3863022a6803e607c
[production-alertmanager 8699ce0] fix date off by 1 for setadetails/ query
Author: Joel Maher <joel.maher@gmail.com>
Date: Mon Aug 8 19:30:02 2016 +0000
1 file changed, 1 insertion(+), 1 deletion(-)
armenzg@armenzg-thinkpad:~/repos/ouija$ git push
Counting objects: 4, done.
Delta compression using up to 8 threads.
Compressing objects: 100% (4/4), done.
Writing objects: 100% (4/4), 410 bytes | 0 bytes/s, done.
Total 4 (delta 3), reused 0 (delta 0)
To git@github.com:mozilla/ouija.git
948a182..8699ce0 production-alertmanager -> production-alertmanager
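One way to confirm the off-by-one fix took effect is to compare the date keys in the setadetails response with the date that was requested. The sketch below assumes the response keeps the `{"jobtypes": {"<date>": ...}}` shape seen in comment 10; the helper name `unexpected_dates` and the empty-list value in the example are hypothetical, since the real payload is truncated in this bug.

```python
import json


def unexpected_dates(requested_date, body):
    """Return the 'jobtypes' date keys in a response body that differ from the requested date."""
    payload = json.loads(body)
    return sorted(key for key in payload.get("jobtypes", {}) if key != requested_date)


# With the bad changeset, a request for 2016-08-09 came back keyed by 2016-08-10:
print(unexpected_dates("2016-08-09", '{"jobtypes": {"2016-08-10": []}}'))
```

An empty result means every date key matches the `date=` query parameter.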
| Assignee | ||
Comment 12•9 years ago
To make it extra clear, we can now merge anything to master since alertmanager does not use `master` anymore.
Summary: alertmanager.allizom.org is returning 500 errors and need to get back to a stable release → alertmanager.allizom.org is returning 500 errors and needs to get back to a stable changeset
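Since the host is now meant to stay on its own branch, a small guard could verify that before anything is restarted. This is only a sketch: the checkout path `/home/ubuntu/ouija` is inferred from the shell prompt in comment 10, and the function names and the injectable `run` parameter are illustrative, not part of ouija.

```python
import subprocess


def current_branch(repo_dir, run=subprocess.check_output):
    """Return the name of the branch checked out in repo_dir."""
    out = run(["git", "rev-parse", "--abbrev-ref", "HEAD"], cwd=repo_dir)
    return out.decode("utf-8").strip()


def on_expected_branch(repo_dir, expected="production-alertmanager",
                       run=subprocess.check_output):
    """True if the checkout is on the branch the host is supposed to be pinned to."""
    return current_branch(repo_dir, run=run) == expected
```

For example, `on_expected_branch("/home/ubuntu/ouija")` would return False if someone had switched the checkout back to `master`.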
| Assignee | ||
Comment 13•9 years ago
I believe this is fixed.
I'm waiting for confirmation that the reconfigurations have completed.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Flags: needinfo?(bugspam.Callek)
Resolution: --- → FIXED
Comment 14•9 years ago
All but the last master have reconfig'd (it's in progress).
Status: RESOLVED → VERIFIED
Updated•9 years ago
Flags: needinfo?(bugspam.Callek)