Closed
Bug 1484950
Opened 7 years ago
Closed 7 years ago
mdc1/mdc2: Cannot reach signing*.srv.releng.mdc{1,2}.mozilla.com:9120 from mobil-signing-linux-1.srv.releng.use1.mozilla.com
Categories
(Infrastructure & Operations Graveyard :: NetOps, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlorenzo, Assigned: van)
Attachments (1 file)
47.93 KB, text/html
Since bug 1409091, mobil-signing-linux-1.srv.releng.use1.mozilla.com has been talking to the signing servers in order to sign Firefox Focus (for Android); see [1] for an example. These tasks worked because they reached the signing servers in scl3.
Four days ago, we deactivated scl3 on mobil-signing-linux-1[2]. The task has been failing ever since[3]. Per these logs, it seems we can't reach any signing*.mdc*:9120 from it.
I connected to the machine and ran the following:
> $ ping signing9.srv.releng.mdc1.mozilla.com
> PING signing9.srv.releng.mdc1.mozilla.com (10.49.48.42) 56(84) bytes of data.
> 64 bytes from signing9.srv.releng.mdc1.mozilla.com (10.49.48.42): icmp_seq=1 ttl=62 time=79.7 ms
> 64 bytes from signing9.srv.releng.mdc1.mozilla.com (10.49.48.42): icmp_seq=2 ttl=62 time=80.4 ms
> $ nc signing9.srv.releng.mdc1.mozilla.com 9120 ; echo $?
> 1
It seems port 9120 is blocked for mobil-signing-linux-1. I know I had to whitelist it for scl3 in [4], but I don't know how to do it for mdc*.
Could you guys help me with that?
[1] https://tools.taskcluster.net/groups/bT-ak0LPRZCfzRszu0zxmA/tasks/YiL7td5aSgSahUD_qmdM0w/runs/0/logs/public%2Flogs%2Flive_backing.log#L6734
[2] https://github.com/mozilla-releng/build-puppet/pull/170
[3] https://tools.taskcluster.net/groups/dfmoWdgmQTSP6tjLQn8D7Q/tasks/UhEtSPJ3QR-zIhSvHD9vlQ/runs/0/logs/public%2Flogs%2Flive_backing.log#L6
[4] https://bug1409091.bmoattachments.org/attachment.cgi?id=8970896
Reporter
Updated•7 years ago
Severity: normal → blocker
Assignee
Comment 1•7 years ago
:jlorenzo, so it looks like you need this flow:
mobil-signing-linux-1.srv.releng.use1.mozilla.com > signing*.mdc*:9120
>we deactivated scl3 on mobil-signing-linux-1[2]
Do you need mobil-signing-linux2, or is that 12? If so, can I have the FQDN? I'll add the above flow shortly.
Assignee: network-operations → vle
Assignee
Comment 2•7 years ago
Added policy 272 mobil-signing-linux-1--buildbot. Can you test it and let me know if that's all you needed?
Assignee
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 3•7 years ago
Johan is out on PTO and asked me to shepherd this bug while he's away, so let me see if I get this right...
Based on what I can see in the TC provisioners, there is only one worker ID defined within the mobile-signing-v1 workerGroup [1], and that's mobil-signing-linux-1 [2]. The full FQDN of the worker ID is "mobil-signing-linux-1.srv.releng.use1.mozilla.com".
Until a couple of days ago, this instance was successfully talking to the following signing servers in SCL3:
signing4.srv.releng.scl3.mozilla.com:9120
signing5.srv.releng.scl3.mozilla.com:9120
signing6.srv.releng.scl3.mozilla.com:9120
Since that's no longer possible due to the migration to MDC1 [3], the machine now attempts to communicate with:
signing7.srv.releng.mdc1.mozilla.com:9120
signing8.srv.releng.mdc1.mozilla.com:9120
signing9.srv.releng.mdc1.mozilla.com:9120
and, as Johan said, fails to do so, presumably because of missing netflows.
(In reply to Van Le [:van] from comment #1)
> :jlorenzo, so it looks like you need this flow:
>
> mobil-signing-linux-1.srv.releng.use1.mozilla.com > signing*.mdc*:9120
Yes, indeed! We need mobil-signing-linux-1.srv.releng.use1.mozilla.com > signing{7,8,9}.srv.releng.mdc1.mozilla.com:9120 (the narrowest option) or, more generically, "signing*.mdc*:9120".
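To spell out the narrow variant, the brace notation above stands for three explicit destinations. A minimal Python sketch of that expansion, assuming simple comma-separated groups only (no shell ranges such as {7..9}); the helper name `expand_braces` is hypothetical:

```python
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand shell-style {a,b,c} groups into every concrete string."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        # No brace group left: the pattern is already a concrete destination.
        return [pattern]
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    results = []
    for option in match.group(1).split(","):
        # Recurse so that any further brace groups in the suffix are expanded too.
        results.extend(expand_braces(prefix + option + suffix))
    return results
```

For example, `expand_braces("signing{7,8,9}.srv.releng.mdc1.mozilla.com:9120")` yields the signing7, signing8 and signing9 host:port pairs, i.e. exactly the three destinations the narrow flow has to allow.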
(In reply to Van Le [:van] from comment #1)
> >we deactivated scl3 on mobil-signing-linux-1[2]
>
> Do you need mobil-signing-linux2, or is that 12? If so, can I have the FQDN?
> I'll add the above flow shortly.
Just the mobil-signing-linux-1.srv.releng.use1.mozilla.com.
There is *no* mobil-signing-linux2 or mobil-signing-linux12. The "[2]" in "...linux-1[2]" was a footnote marker referencing the puppet PR listed as [2] in bug 1484950 comment 0, which was https://github.com/mozilla-releng/build-puppet/pull/170.
(In reply to Van Le [:van] from comment #2)
> Created attachment 9002820 [details]
> 1484950.html
>
> Added policy 272 mobil-signing-linux-1--buildbot. Can you test it and let me
> know if that's all you needed?
The policy looks good to me, modulo perhaps the "buildbot" naming :) I'm not familiar with the other policy names, so I don't know whether that's a general convention, but AFAIK Buildbot is going away soon, both automation-wise and infra-wise. Either way, just a tiny nit.
It sounds like it enables traffic from mobil-signing-linux-1.srv.releng.use1 > releng_signing_mdc1, which seems fine.
I tried to test it following Johan's steps from comment 0, but I don't get a proper TCP connection.
> [mtabara@mobil-signing-linux-1.srv.releng.use1.mozilla.com ~]$ ping signing9.srv.releng.mdc1.mozilla.com
> PING signing9.srv.releng.mdc1.mozilla.com (10.49.48.42) 56(84) bytes of data.
> 64 bytes from signing9.srv.releng.mdc1.mozilla.com (10.49.48.42): icmp_seq=1 ttl=62 time=79.6 ms
So the host is reachable. But when I try to connect to that specific port, it hangs.
> [mtabara@mobil-signing-linux-1.srv.releng.use1.mozilla.com ~]$ nc signing9.srv.releng.mdc1.mozilla.com 9120
> (nothing .. it just hangs here)
I'd normally rerun one of the signing jobs that run on this particular instance, but the current graph (encompassing build, signing, and pushing) is still blocked by a failing build task. Since tasks run in that order, the graph never reaches signing, so I can't run or rerun it. AFAIK the Focus team is aware of this and working on a fix. A recent graph was triggered earlier here [4].
@van: the changes on your side look good to me; I'm not sure why netcat isn't reporting a successful connection to that host:port. Do you have another method in mind for testing the netflow policy you added?
Thank you!
[1]: https://tools.taskcluster.net/provisioners/scriptworker-prov-v1/worker-types/mobile-signing-v1
[2]: https://tools.taskcluster.net/provisioners/scriptworker-prov-v1/worker-types/mobile-signing-v1/workers/mobile-signing-v1/mobil-signing-linux-1
[3]: https://tools.taskcluster.net/groups/dfmoWdgmQTSP6tjLQn8D7Q/tasks/UhEtSPJ3QR-zIhSvHD9vlQ/runs/0/logs/public%2Flogs%2Flive_backing.log#L6
[4]: https://tools.taskcluster.net/groups/GOQgxRSdSquZ651kfavubg
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
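On the question above about another way to test the netflow policy: a plain `nc host port` that hangs is ambiguous. netcat may have completed the TCP handshake and simply be waiting for input on stdin, or its SYN packets may be silently dropped. A minimal Python sketch that makes the outcome explicit follows; the `probe` helper, its 5-second default timeout and the outcome labels are illustrative assumptions, not an existing releng tool.

```python
import socket


def probe(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a single TCP connection to host:port and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"  # three-way handshake completed
    except socket.gaierror:
        return "unresolved"  # DNS lookup for host failed
    except ConnectionRefusedError:
        return "refused"  # something answered with a TCP RST
    except socket.timeout:
        return "timeout"  # no answer at all, e.g. packets silently dropped
    except OSError as exc:
        return f"error: {exc}"  # e.g. host or network unreachable
```

For example, run on mobil-signing-linux-1, `probe("signing9.srv.releng.mdc1.mozilla.com", 9120)` should return "open" once the flow is in place. "timeout" would point at a firewall dropping packets, while "refused" would suggest either a rejecting firewall or nothing listening on 9120.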
Comment 4•7 years ago
It sounds like signing jobs are now working! Based on the worker history [1], it turns out we finally have a green signing job [2].
@van: I'll reopen if things go south again, thanks a lot for the help!
[1]: https://tools.taskcluster.net/provisioners/scriptworker-prov-v1/worker-types/mobile-signing-v1/workers/mobile-signing-v1/mobil-signing-linux-1
[2]: https://tools.taskcluster.net/groups/AVADqfB-QzW_k0aqBlx8vA/tasks/ahEeCzdLRFeXhVExQqgd0A/runs/0
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Resolution: --- → FIXED
Updated•2 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard