[bedrock] stage deployment is failing to rsync with webheads

RESOLVED FIXED

Status

Infrastructure & Operations Graveyard
WebOps: Product Delivery
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: pmac, Assigned: cyliang)

Tracking

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2261] )

The staging deployment is failing for two reasons:

1. ssh can't resolve the hostname bedrock1.stage.webapp.phx1.mozilla.com.
2. bedrock1.stage.webapp.scl3.mozilla.com can't connect via rsync to bedrockadm.private.phx1.mozilla.com (the connection times out).
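The two failures can be checked separately: one is a DNS lookup problem, the other a TCP connectivity problem. Below is a minimal Python sketch of both checks. It assumes the rsync transfer uses daemon mode on the default TCP port 873 (the `clientserver.c` error in the push log points to daemon-mode rsync rather than rsync over ssh). The function names `resolves` and `can_connect` are hypothetical helpers, not part of any deployment tooling.

```python
import socket

# Assumption: default TCP port for the rsync daemon protocol (rsync://).
RSYNC_DAEMON_PORT = 873


def resolves(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        # Same condition as ssh's "Name or service not known".
        return False


def can_connect(host: str, port: int = RSYNC_DAEMON_PORT, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A missing firewall ACL typically shows up here as a timeout, matching
    rsync's "Connection timed out (110)".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks.
        return False
```

The connectivity check only means something when run from the host that performs the rsync. For example, `can_connect("bedrockadm.private.phx1.mozilla.com")` would have to be run on bedrock1.stage.webapp.scl3.mozilla.com, while `resolves("bedrock1.stage.webapp.phx1.mozilla.com")` belongs on the host that pushes the code.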

Relevant section from our push logs:

> [2015-11-25 06:39:32] Running deploy_app
> pushing code to bedrock1.stage.webapp.phx1.mozilla.com
> [2015-11-25 06:39:32] [bedrock1.stage.webapp.phx1.mozilla.com] running: /data/bin/update-www.sh www.allizom.org-django
> [2015-11-25 06:39:33] [bedrock1.stage.webapp.phx1.mozilla.com] failed: /data/bin/update-www.sh www.allizom.org-django (0.021s)
> [bedrock1.stage.webapp.phx1.mozilla.com] err: ssh: Could not resolve hostname bedrock1.stage.webapp.phx1.mozilla.com: Name or service not known
> pushing code to bedrock1.stage.webapp.scl3.mozilla.com
> [2015-11-25 06:39:33] [bedrock1.stage.webapp.scl3.mozilla.com] running: /data/bin/update-www.sh www.allizom.org-django
> [2015-11-25 06:40:36] [bedrock1.stage.webapp.scl3.mozilla.com] finished: /data/bin/update-www.sh www.allizom.org-django (63.181s)
> [bedrock1.stage.webapp.scl3.mozilla.com] err: rsync: failed to connect to bedrockadm.private.phx1.mozilla.com (10.8.75.50): Connection timed out (110)
> [bedrock1.stage.webapp.scl3.mozilla.com] err: rsync error: error in socket IO (code 10) at clientserver.c(122) [Receiver=3.0.9]
> [2015-11-25 06:40:36] [bedrock1.stage.webapp.scl3.mozilla.com] running: service httpd graceful
> [2015-11-25 06:40:36] [bedrock1.stage.webapp.scl3.mozilla.com] finished: service httpd graceful (0.474s)
> [2015-11-25 06:40:41] Finished deploy_app (68.683s)
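The log shows `update-www.sh` as "finished" on the scl3 host even though rsync reported errors there, so looking only for "failed:" lines would miss the second problem. Below is a minimal Python sketch that groups the `err:` lines of a push log by host. It assumes the `[host] err: message` line shape seen in the excerpt above; `errors_by_host` is a hypothetical helper, not part of the push tooling.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the excerpt: "[host] err: message".
ERR_LINE = re.compile(r"^\[(?P<host>[^\]]+)\] err: (?P<msg>.*)$")


def errors_by_host(log_text: str) -> dict[str, list[str]]:
    """Group 'err:' lines in a push log by the host that reported them."""
    errors: dict[str, list[str]] = defaultdict(list)
    for line in log_text.splitlines():
        # Drop leading '>' / space quote markers (as in the bug comment).
        line = line.lstrip("> ").rstrip()
        match = ERR_LINE.match(line)
        if match:
            errors[match.group("host")].append(match.group("msg"))
    return dict(errors)
```

Timestamped lines such as `[2015-11-25 06:40:36] [host] finished: ...` do not match, because the first bracketed field is not followed by `err:`.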

Depends on: 1227967

Updated

2 years ago
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2261]
(Assignee)

Comment 1

2 years ago
Issue #1 was caused by PHX1 references in the Commander config file. Those references have been deleted.
Issue #2 was caused by a missing ACL. That has been fixed, and a good push went out (verified via IRC).
Assignee: server-ops-webops → cliang
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard