Closed Bug 1344364 Opened 3 years ago Closed 3 years ago

Need to open port 5432 (postgresql) for on relengapi webheads

Categories

(Infrastructure & Operations :: NetOps: DC ACL Request, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: garbas, Assigned: jbarnell)

Details

Hi,

RelengAPI is moving its database to PostgreSQL (Heroku). In Bug 1342251 I managed to get the needed system packages installed, but now I need port 5432 opened so I can connect to the PostgreSQL database.

These are the logs I'm seeing:

Mar 01 16:01:45 web1.stage.releng.webapp.scl3.mozilla.com relengapi: OperationalError: (psycopg2.OperationalError) could not connect to server: Connection timed out
Mar 01 16:01:45 web1.stage.releng.webapp.scl3.mozilla.com relengapi: #011Is the server running on host "ec2-184-73-202-229.compute-1.amazonaws.com" and accepting
Mar 01 16:01:45 web1.stage.releng.webapp.scl3.mozilla.com relengapi: #011TCP/IP connections on port 5432? 


[root@web1.releng.webapp.scl3 ~]# traceroute -n -T -p 5432 ec2-184-73-202-229.compute-1.amazonaws.com
traceroute to ec2-184-73-202-229.compute-1.amazonaws.com (184.73.202.229), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
...
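
For anyone reproducing the timeout above without traceroute, here is a minimal sketch of a plain TCP reachability check in Python. The function name `tcp_port_open` is made up for illustration and is not part of relengapi. It only tests whether the TCP handshake completes, so it says nothing about PostgreSQL authentication or SSL.

```python
import socket


def tcp_port_open(host: str, port: int = 5432, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout.

    Hypothetical helper for illustration; it checks the network path only.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, and DNS lookup failures.
        return False
```

Called from a webhead as `tcp_port_open("ec2-184-73-202-229.compute-1.amazonaws.com")`, this would be expected to return False while the ACL is missing, since the connection attempt times out as in the log above.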
:justdave I got your contact from :coop. Did I assign the ticket to the right group? I'm hoping to be able to deploy this week.
Flags: needinfo?(justdave)
Can you let me know if web1.releng.webapp.scl3 is configured to use the proxy servers? I'm just picking this up but that should get you up and running.
Flags: needinfo?(rgarbas)
Assignee: network-operations → jbarnell
:jbarnell I'm not really sure where/how to check this. Is there any mana/wiki page I can read about this?
Flags: needinfo?(rgarbas)
Flags: needinfo?(justdave)
Flags: needinfo?(jbarnell)
That's a question for someone in webops.
Flags: needinfo?(jbarnell) → needinfo?(smani)
No, this system is not using the proxy servers yet, so a normal network ACL to open the port should be good.
Flags: needinfo?(smani)
libpq/psycopg2 do not offer any configurable option for using an HTTP proxy that I can find accessible at the Python level, so this application is probably not capable of using the outbound proxies without rewriting upstream DB code.
Also, I believe this is the list of releng web servers that run the relengapi app and would likely need this ACL; feel free to correct me, Rok:

web1.stage.releng.webapp.scl3.mozilla.com
web1.releng.webapp.scl3.mozilla.com
web2.releng.webapp.scl3.mozilla.com
relengwebadm.private.scl3.mozilla.com

Do the celery servers need the same port open, Rok? If so, add celery[1-2].srv.releng.scl3.mozilla.com and celery1.stage.srv.releng.scl3.mozilla.com.
Rok, is this a permanent long-term ACL, or merely a temporary ACL as part of exiting to the cloud?
Flags: needinfo?(rgarbas)
:atoll: this is needed until we migrate all traffic to the new services. That will probably take us a whole year, but the majority of the traffic will be redirected in the next month.

:ericz: celery servers also need the same port open.
Flags: needinfo?(rgarbas)
Okay. If it were possible to minimize the amount of time we keep this ACL open, that would be lovely, so please let us know as soon as you've migrated the bare minimum necessary for us to close it.
Sources:

web1.stage.releng.webapp.scl3.mozilla.com
web1.releng.webapp.scl3.mozilla.com
web2.releng.webapp.scl3.mozilla.com
relengwebadm.private.scl3.mozilla.com
celery1.srv.releng.scl3.mozilla.com
celery2.srv.releng.scl3.mozilla.com
celery1.stage.srv.releng.scl3.mozilla.com

For the record, relengwebadm may already have the necessary ACL, but the workers surely don't. These could be tagged "Relengapi" or "Relengweb" if you need a shortname for the routers, and there might already be an ACL group leftover from the past for them.
:atoll: 

(1) I don't think I need the shortname.
(2) To confirm, I can already connect to postgresql from relengwebadm.
(3) For reverting the ACL later, I filed Bug 1351073 as a reminder.

Thank you.
Hi there,

Just checking in on the status of this bug. It's blocking the migration of releng services.

atoll, do you think you could give us an ETA of completion? Should this bug be assigned to you? I understand you have your own priorities so if we could just get an ETA, we can adjust our own goals and targets accordingly. Thanks in advance. Feel free to poke rok on vidyo or irc if it means less investigation and overall time for you. :)
Flags: needinfo?(rsoderberg)
Netops can help here. 

James, 

Comment #11 has the sources, they all need access to ec2-184-73-202-229.compute-1.amazonaws.com port 5432.

Jordan,

Please correct me if I'm wrong. Thanks!
Flags: needinfo?(rsoderberg) → needinfo?(jbarnell)
To correct :fox2mike:

Staging env needs to be able to access ec2-184-73-202-229.compute-1.amazonaws.com on port 5432
This means the following sources:
 - web1.stage.releng.webapp.scl3.mozilla.com
 - celery1.stage.srv.releng.scl3.mozilla.com
 - relengwebadm.private.scl3.mozilla.com

Production environment needs to be able to access ec2-54-87-55-241.compute-1.amazonaws.com on port 5432
This means the following sources:
 - web1.releng.webapp.scl3.mozilla.com
 - web2.releng.webapp.scl3.mozilla.com
 - celery1.srv.releng.scl3.mozilla.com
 - celery2.srv.releng.scl3.mozilla.com
 - relengwebadm.private.scl3.mozilla.com


At this time only `relengwebadm.private.scl3.mozilla.com` has port 5432 open and can access both PostgreSQL databases.
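
To make the staging/production split above easier to check, here is a small sketch that writes the requested flows down as data. The hostnames and port are copied from this comment; the names `REQUESTED_ACLS` and `required_flows` are hypothetical and only for illustration.

```python
POSTGRES_PORT = 5432

# Sources and destinations exactly as listed in the comment above.
REQUESTED_ACLS = {
    "staging": {
        "destination": "ec2-184-73-202-229.compute-1.amazonaws.com",
        "sources": [
            "web1.stage.releng.webapp.scl3.mozilla.com",
            "celery1.stage.srv.releng.scl3.mozilla.com",
            "relengwebadm.private.scl3.mozilla.com",
        ],
    },
    "production": {
        "destination": "ec2-54-87-55-241.compute-1.amazonaws.com",
        "sources": [
            "web1.releng.webapp.scl3.mozilla.com",
            "web2.releng.webapp.scl3.mozilla.com",
            "celery1.srv.releng.scl3.mozilla.com",
            "celery2.srv.releng.scl3.mozilla.com",
            "relengwebadm.private.scl3.mozilla.com",
        ],
    },
}


def required_flows(acls=REQUESTED_ACLS, port=POSTGRES_PORT):
    """Flatten the request into (source, destination, port) tuples."""
    return [
        (source, env["destination"], port)
        for env in acls.values()
        for source in env["sources"]
    ]
```

Note that relengwebadm appears in both environments, so it needs access to both database hosts.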
Did anything change here today (Monday) around 13:15-13:55 Pacific?

A number of SQL-dependent services, notably archiver and tooltool, went down and we received 500s, e.g. Bug 1355205 - builds fail to download resources.

<garbas> jlund: connection to mysql is a problem
13:52:16 jlund: https://papertrailapp.com/groups/1421834/events?q=(program%3Arelengapi+OR+api.pub.build.mozilla.org)+-web1.stage+-celery1.stage
13:53:24 <jlund> we should raise this in #moc and loop in webops
13:54:04 <garbas> it looks like mysql connection is back
13:54:11 maybe it was only temporary
13:56:44 <garbas> jlund: it shows tooltool is now serving
I looked in newrelic at api.pub.build.mozilla.org and it looks like it was a db auth issue rather than a network flow issue; lots of tracebacks end in:

OperationalError: (_mysql_exceptions.OperationalError) (1045, "Access denied for user 'REDACTED'@'AAA.BBB.CCC.DDD' (using password: YES)")
Is this the cause of bug 1350273?
(In reply to Chris AtLee [:catlee] from comment #18)
> Is this the cause of bug 1350273?

Not 17 days ago, certainly. Did bug 1350273 recur during today's 13:15-13:55 window?
hi jbarnell, can you confirm if you can resolve this bug and what the ETA is for doing so?
pecking away at this (part 1):

[edit security policies from-zone srv to-zone untrust]
      policy puppet--rsync { ... }
+     policy postgressql {
+         match {
+             source-address celery1.stage;
+             destination-address ec2-184-73-202-229.compute-1;
+             application PostgresSQL;
+         }
+         then {
+             permit;
+         }
+     }
[edit security zones security-zone untrust address-book]
       address orange-antelope-static3.rmq.cloudamqp.com { ... }
+      address ec2-184-73-202-229.compute-1 184.73.202.229/32;
[edit applications]
    application releng_to_puppet { ... }
+   application PostgresSQL {
+       protocol tcp;
+       destination-port 5432;
+   }
Part 2

{primary:node1}[edit]
jbarnell@fw1.ops.releng.scl3.mozilla.net# show | compare 
[edit security policies from-zone srv to-zone untrust]
      policy postgressql { ... }
+     policy PostGres_prod {
+         match {
+             source-address [ celery1 celery2 ];
+             destination-address ec2-54-87-55-241.compute-1;
+             application PostgresSQL;
+         }
+         then {
+             permit;
+         }
+     }
[edit security zones security-zone untrust address-book]
       address ec2-184-73-202-229.compute-1 { ... }
+      address ec2-54-87-55-241.compute-1 54.87.55.241/32;
Flags: needinfo?(jbarnell)
Final bits

jbarnell@fw1.ops.scl3.mozilla.net# show | compare 
[edit security policies from-zone private to-zone untrust]
      policy inet-permit { ... }
+     policy releng_postgres {
+         match {
+             source-address relengwebadm.private.scl3;
+             destination-address [ ec2-184-73-202-229.compute-1 ec2-54-87-55-241.compute-1 ];
+             application postgres;
+         }
+         then {
+             permit;
+         }
+     }
[edit security policies from-zone webapp to-zone untrust]
      policy mozreview_pulse { ... }
+     policy releng_postgres_stage {
+         match {
+             source-address web1.stage.releng.webapp.scl3;
+             destination-address ec2-184-73-202-229.compute-1;
+             application postgres;
+         }
+         then {
+             permit;
+         }
+     }
+     policy releng_postgres_prod {
+         match {
+             source-address [ web1.releng.webapp.scl3 web2.releng.webapp.scl3 ];
+             application postgres;     
+             ## Warning: missing mandatory statement(s): 'destination-address'
+         }
+         then {
+             permit;
+         }
+     }
[edit security zones security-zone untrust address-book]
       address contegix-118 { ... }
+      address ec2-54-87-55-241.compute-1 54.87.55.241/32;
+      address ec2-184-73-202-229.compute-1 184.73.202.229/32;

{primary:node1}[edit]
jbarnell@fw1.ops.scl3.mozilla.net#
Please check. Sorry for the wait.
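
The `show | compare` output above carries a `## Warning: missing mandatory statement(s)` line inside one of the policies. Below is a rough sketch of how such warnings could be picked out of pasted output before committing. It is a text scan written for this thread, not a Junos tool, and the function name `policies_with_warnings` is hypothetical. It assumes each policy opens on its own `policy NAME {` line and attributes a warning to the most recently opened policy.

```python
from __future__ import annotations

import re


def policies_with_warnings(compare_output: str) -> dict[str, list[str]]:
    """Map policy name -> '## Warning:' messages that appear after it opens."""
    found: dict[str, list[str]] = {}
    current = None
    for raw in compare_output.splitlines():
        # Drop the leading +/- diff markers and indentation.
        line = raw.lstrip("+- \t")
        opened = re.match(r"policy\s+(\S+)\s*\{\s*$", line)
        if opened:
            current = opened.group(1)
            continue
        warning = re.match(r"##\s*Warning:\s*(.*)$", line)
        if warning and current is not None:
            found.setdefault(current, []).append(warning.group(1).strip())
    return found
```

Collapsed entries such as `policy inet-permit { ... }` are skipped on purpose, because they do not end in an opening brace.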
Flags: needinfo?(jlund)
I've checked that port 5432 is open on all staging and production relengapi servers.

:jbarnell thank you!
Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(jlund)
Resolution: --- → FIXED
Summary: Need to open port 5432 (postgresql) for on relenapi webheads → Need to open port 5432 (postgresql) for on relengapi webheads
Hi,

I forgot to add "web3.releng.webapp.scl3.mozilla.com" to the list of production environments that have port 5432 open. Could we add this webhead to the list?

Here is the list of servers in the production environment that need access to ec2-54-87-55-241.compute-1.amazonaws.com on port 5432:
 - web1.releng.webapp.scl3.mozilla.com
 - web2.releng.webapp.scl3.mozilla.com
 - web3.releng.webapp.scl3.mozilla.com
 - celery1.srv.releng.scl3.mozilla.com
 - celery2.srv.releng.scl3.mozilla.com
 - relengwebadm.private.scl3.mozilla.com

Pinging :justdave: on :jbarnell:'s recommendation.
Status: RESOLVED → REOPENED
Flags: needinfo?(justdave)
Resolution: FIXED → ---
justdave@fw1.ops.scl3.mozilla.net# show | compare
[edit security policies from-zone webapp to-zone untrust policy releng_postgres_prod match]
-      source-address [ web1.releng.webapp.scl3 web2.releng.webapp.scl3 ];
+      source-address [ web1.releng.webapp.scl3 web2.releng.webapp.scl3 web3.releng.webapp.scl3 ];

CHG0011750
Status: REOPENED → RESOLVED
Closed: 3 years ago → 3 years ago
Flags: needinfo?(justdave)
Resolution: --- → FIXED