Stage new tbpl (the MySQL version) on generic cluster

RESOLVED FIXED

Status

mozilla.org Graveyard
Server Operations
7 years ago
2 years ago

People

(Reporter: laura, Assigned: cshields)

Details

(Reporter)

Description

7 years ago
The code we need to stage is located here:
hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/

rhelmer will attach a sample puppet config to this bug.

(In reply to Laura Thomson :laura from comment #0)
> rhelmer will attach a sample puppet config to this bug.

I've done a very simple set of files and manifests for a vagrant setup:

https://github.com/rhelmer/tbpl/tree/vagrant-bug-682591/vagrant

I'd like this to be as close to stage/prod as possible, so if we can work together to get this up to snuff and then use something similar for stage that'd be ideal.
(Assignee)

Updated

7 years ago
Assignee: server-ops → cshields
(Assignee)

Comment 2

7 years ago
woot!  rhelmer++

got this on -dev right now:

https://tbpl-dev.allizom.org/

user/pass is in https://intranet.mozilla.org/Websites/Stage_Passwords

The dev site is auto-updating the TBPL data with the Python cron job, but I do not have the site code set up to auto-update yet.  I'll try to get that after dinner tonight.

Stage will come tomorrow and we can hand that side off to infrasec/qa.

Kudos to rhelmer, got this site up in -dev with about an hour and a half of heavily preempted work.
(Assignee)

Comment 3

7 years ago
I see joduinn is on the bug, so let me add this request for releng:

The tbpl-dev and (eventually) tbpl.allizom.org sites are not to be "used".  Please don't use them for build duty or share the links with developers to use them.  They are not on servers designed for any kind of load beyond development purposes, nor do we want to give the impression that they can be relied on for actual use.  The production site is right around the corner, we promise!

Comment 4

7 years ago
tbpl.allizom.org is 403'ing for me right now (e.g. http://tbpl.allizom.org/?usebuildbot=1&tree=Try&rev=e9351f9ab296), which means I cannot access Try results, since the old tbpl.mozilla.org doesn't show Try any more now that the tinderbox emails have been turned off.

I'm presuming it was broken by something here?
(Assignee)

Comment 5

7 years ago
I haven't touched tbpl.allizom.org - that is the stage site (see https://mana.mozilla.org/wiki/display/websites/Home for an explanation) and that is an "old" stage site.

I'll have it replaced tomorrow.

Group: infra

Comment 6

7 years ago
Yeah, I realise it's the old stage site; it just seemed enough of a coincidence that it went down in the last 30 mins, not long after tbpl-dev.allizom.org was rolled out, so I just wanted to check either way.

I'm getting a 403 with:
You don't have permission to access /error/noindex.html on this server.

There weren't any .htaccess changes by any chance that could have caused it?

(In reply to Corey Shields [:cshields] from comment #5)
> (see https://mana.mozilla.org/wiki/display/websites/Home for an explanation)

My LDAP login fails for that URL; I presume I don't have the required permissions to view it.
(Assignee)

Comment 7

7 years ago
Correct...

SO, background here:

I did some prep work on TBPL stage (tbpl.allizom.org), which consisted of a DNS pointer that duplicated the existing tbpl.allizom.org entry that is currently in use.

According to #build, tbpl.allizom.org is being used in "production".  This is unacceptable.  The whole purpose of the allizom domain is to denote that it is a stage environment and will be prone to service interruptions.

So now I'm at the point where we need to build the real stage environment for this tomorrow, yet the old site is already serving a critical role.  We need to fix that ASAP.
(Assignee)

Comment 8

7 years ago
Sorry, got this and IRC going at the same time.

I've reverted the dns change for tbpl.allizom.org - but sometime in the next 24 hrs I want to revert the fact that it is being USED so that we can setup the real stage and the webdevs, QA, and infrasec can use it for their purposes.

Comment 9

7 years ago
(In reply to Corey Shields [:cshields] from comment #7)
> According to #build, tbpl.allizom.org is being used in "production".  This
> is unacceptable.  

The production site does not work for Try (as the tinderbox emails had to be turned off due to load), so there isn't much other choice at present.
(Assignee)

Comment 10

7 years ago
per IRC:

Tomorrow we will be reclaiming the tbpl.allizom.org name for the new (proper) stage site.

releng still needs to use the old tbpl.allizom.org site, so I will rename that tbpl-never-use-allizom-again.allizom.org and they can continue using that until the real production site is available.  This will allow us to point tbpl.allizom.org to the real stage.

This is an inconvenience to people who may have bookmarks to tbpl.allizom.org; they will have to be informed of the new TEMPORARY location, with our apologies that tbpl.allizom.org was never meant for actual use.

We will get tbpl.mozilla.org out ASAP and this will all be behind us.
Awesome, thanks Corey :-)

Note that the UI on tbpl-dev.allizom.org (and presumably tbpl.allizom.org when it is updated) currently displays: "Loading failed: error".

This is due to URLs like this not working (/php/starcomment.php is 404):
http://tbpl.mozilla.org/php/starcomment.php?tree=mozilla-central&dates%5B%5D=2011-08-29

I was thinking we could just make this a relative URL in the config, but after looking up the original change (bug 668992), it turns out this only makes sense on the production server, or on some other server that has access to the War On Orange ElasticSearch server.

tl;dr - there's an error in the UI on dev/stage that will go away when prod is updated.
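As background for the failing URL: the `dates%5B%5D` key is just the percent-encoded form of `dates[]`, the PHP convention for an array-valued query parameter. The minimal sketch below uses only Python's standard `urllib.parse` to show that decoding, and why a relative URL in the config would send the request to whichever host served the page (here the dev site, which per the comment above lacks access to the WOO ElasticSearch server) instead of to tbpl.mozilla.org. The hostnames are simply the ones from this bug; nothing here reflects TBPL's actual config code.

```python
from urllib.parse import parse_qs, urlencode, urljoin, urlsplit

# The request that currently fails with a 404 on dev/stage.
failing_url = ("http://tbpl.mozilla.org/php/starcomment.php"
               "?tree=mozilla-central&dates%5B%5D=2011-08-29")

parts = urlsplit(failing_url)
# parse_qs percent-decodes the keys, so "dates%5B%5D" comes back as "dates[]",
# the PHP convention for an array-valued parameter.
params = parse_qs(parts.query)
print(params)  # {'tree': ['mozilla-central'], 'dates[]': ['2011-08-29']}

# Going the other way, urlencode percent-encodes the square brackets again.
query = urlencode([("tree", "mozilla-central"), ("dates[]", "2011-08-29")])
print(query)  # tree=mozilla-central&dates%5B%5D=2011-08-29

# A relative path resolves against whichever host served the page, so on the
# dev site the request would go to tbpl-dev.allizom.org, not tbpl.mozilla.org.
resolved = urljoin("https://tbpl-dev.allizom.org/", parts.path + "?" + query)
print(resolved)
```

This is why making the URL relative would not fix the error by itself: the request would simply move to a host that cannot reach ES.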
(In reply to Corey Shields [:cshields] from comment #10)
> Tomorrow we will be reclaiming the tbpl.allizom.org name for the new
> (proper) stage site.
> 
> releng still needs to use the old tbpl.allizom.org site, so I will rename
> that tbpl-never-use-allizom-again.allizom.org and they can continue using
> that until the real production site is available.

Could you please update the VirtualHost definition at the bottom of httpd.conf on dm-tbpl01.m.o when you make that change.

jgriffin knows what's what on the other end of that connection to WOO.
(In reply to Robert Helmer [:rhelmer] from comment #12)
> Note that the UI on tbpl-dev.allizom.org (and presumably tbpl.allizom.org
> when it is updated) currently displays: "Loading failed: error".
> 
> This is due to URLs like this not working (/php/starcomment.php is 404):
> http://tbpl.mozilla.org/php/starcomment.php?tree=mozilla-central&dates%5B%5D=2011-08-29
> 
> I was thinking we could just make this a relative URL in the config, but
> looked up the original change (bug 668992) and this only makes sense to have
> on the production server, or some server that has access to the War On
> Orange elastic search server.
> 
> tl;dr - there's an error in the UI on dev/stage that will go away when prod
> is updated.

In order for that connection to ES to work, we'd need to file an IT request to open the network path from the machine hosting that URL to the machine hosting ES.  Currently that path is open from the machine hosting http://tbpl.mozilla.org.  If the 'new' TBPL is going to be hosted on a different piece of hardware, let me know and I can file an IT request to open the relevant network path.
(Assignee)

Comment 16

7 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #15)
> In order for that connection to ES to work, we'd need to file an IT request
> to open the network path from the machine hosting that URL to the machine
> hosting ES.  Currently that path is open from the machine hosting
> http://tbpl.mozilla.org.  If the 'new' TBPL is going to be hosted on a
> different piece of hardware, let me know and I can file an IT request to
> open the relevant network path.

Thanks. What is the destination for ES? I'll file the bug.  We will want to add the flow to our docs for the generic cluster, and since this cluster is 100 SeaMicro nodes, it would be easier for us to file it.

thanks again!
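Once the flow is opened, a plain TCP connect test from one of the cluster nodes is a quick way to confirm the path to ES. The sketch below is a minimal, hypothetical helper using Python's standard `socket` module; the ES destination was not named in this bug, and port 9200 (ElasticSearch's default HTTP port) is an assumption about how the WOO server is set up.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that the network path (firewall flow) is open; it does
    not check that ElasticSearch is actually answering requests.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass) and DNS failures (socket.gaierror).
        return False
```

For example, `can_reach(es_host, 9200)` run from a SeaMicro node, where `es_host` is a placeholder for the destination once it is known. A connect test is deliberately minimal: it isolates the firewall question from any application-level problems on the ES side.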
(Assignee)

Updated

7 years ago
Depends on: 683401
(In reply to Robert Helmer [:rhelmer] from comment #12)
> tl;dr - there's an error in the UI on dev/stage that will go away when prod
> is updated.

Actually, since prod is now https-only with an http->https redirect, this is going to fall afoul of the same-origin policy (requesting the https URL should be OK if the server sends cross-site XHR headers, i.e. CORS).

We also get mixed http/https content warnings due to the Google Calendar API being used; I filed bug 683418 to switch that over.
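Both problems in this comment come down to a few browser rules: an origin is the (scheme, host, port) triple, so `http://tbpl.mozilla.org` and `https://tbpl.mozilla.org` are different origins; a cross-origin XHR only succeeds if the response carries an `Access-Control-Allow-Origin` header admitting the requesting page; and an https page loading an http resource triggers mixed-content warnings. The Python sketch below models just those rules with the standard library. It is a simplification (no preflight, credentials or redirect handling), not how Firefox implements them, and the helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url: str) -> tuple:
    """Return the (scheme, host, port) triple that identifies a web origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def same_origin(a: str, b: str) -> bool:
    """Two URLs share an origin only if scheme, host and port all match."""
    return origin(a) == origin(b)


def serialize_origin(url: str) -> str:
    """Serialize an origin the way it appears in an Origin header."""
    scheme, host, port = origin(url)
    if port == DEFAULT_PORTS.get(scheme):
        return f"{scheme}://{host}"
    return f"{scheme}://{host}:{port}"


def cors_allows(page_url: str, allow_origin: Optional[str]) -> bool:
    """Simplified check of an Access-Control-Allow-Origin response header.

    Ignores preflight requests, credentials and the other Access-Control-*
    headers; it only asks whether the header admits the requesting page.
    """
    if allow_origin is None:
        return False
    value = allow_origin.strip()
    return value == "*" or value == serialize_origin(page_url)


def is_mixed_content(page_url: str, resource_url: str) -> bool:
    """An https page fetching an http resource triggers mixed-content warnings."""
    return (urlsplit(page_url).scheme.lower() == "https"
            and urlsplit(resource_url).scheme.lower() == "http")
```

For instance, `same_origin("http://tbpl.mozilla.org/", "https://tbpl.mozilla.org/")` is `False`, and `cors_allows("https://tbpl-dev.allizom.org/", "https://tbpl.mozilla.org")` is `False`, so a dev page would need the server to echo its own origin (or `*`) for the starcomment request to succeed.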
(Assignee)

Comment 19

7 years ago
update:

Dev: done, as mentioned earlier (tbpl-dev.allizom.org)

Stage: pending the release of tbpl.allizom.org, since that name is still being used

Prod: is up!!!  https://tbpl.mozilla.org

Basically, I didn't want to type out the long URL I mentioned earlier, so I just worked on prod instead.

Thanks to rhelmer and philor for the testing and help on IRC.  Given the urgent need for this, I haven't done all of the testing I would have liked to, and infrasec signed off on the MySQL changes today.  So there may still be bugs, but now we have environments to test them in and a way to easily push those changes out to the new production environment.

This will get marked RESOLVED FIXED after I move stage in.
(Assignee)

Comment 20

7 years ago
(In reply to Corey Shields [:cshields] from comment #19)

> Prod: is up!!!  https://tbpl.mozilla.org

FYI, according to philor, "you can use this for everything except retriggering".

I've adjusted our buildapi to accept https://tbpl.mozilla.org for retriggering. philor reports it working now.
Depends on: 683430
(Reporter)

Comment 22

7 years ago
This is done.
Status: NEW → RESOLVED
Last Resolved: 7 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard

Updated

2 years ago
Group: infra