Closed Bug 600039 Opened 14 years ago Closed 14 years ago

build.m.o not responding to requests from Toronto office

Categories

(Infrastructure & Operations Graveyard :: NetOps, task)

x86
All
task
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: ravi)

Details

This should return a text file
 http://build.mozilla.org/tryserver-symbols/crashinject.pdb/5038C745A23F4CCC93977EA88C0BC0FB2/crashinject.sym
starting with 
 MODULE windows x86 5038C745A23F4CCC93977EA88C0BC0FB2 crashinject.pdb
 FILE 1 d:/sdks/v7.0/include/WinNT.h
 FILE 2 d:/msvs8/vc/include/ctype.h

This should return a page with graphs
 http://build.mozilla.org/builds/pending/

They work for me if I route via the build VPN, but not if I hit the external IP. Ehsan says this is preventing his work on a security bug and is a blocker.
Assignee: server-ops → network-operations
Component: Server Operations → Server Operations: Netops
An ETA here would be much appreciated.
Blocks: 286389
Blocks: 286382
No longer blocks: 286389
From an external source, that's exactly what it returns.  What do you get if you hit it externally?

I tested from Amsterdam and China.
WFM from two other external hosts, as well.
Ehsan, is something wrong with your networking? I think you said you were in the Toronto office.
What IP address are you getting when you resolve build.mozilla.org?  From my external test host I'm able to get the data as you describe without issues.

ravi@neu:[ttype][6:43pm](4):111:~> host build.mozilla.org
build.mozilla.org is an alias for dm-wwwbuild01.mozilla.org.
dm-wwwbuild01.mozilla.org has address 63.245.208.186
(In reply to comment #5)
> What IP address are you getting when you resolve build.mozilla.org?  From my
> external test host I'm able to get the data as you describe without issues.
> 
> ravi@neu:[ttype][6:43pm](4):111:~> host build.mozilla.org
> build.mozilla.org is an alias for dm-wwwbuild01.mozilla.org.
> dm-wwwbuild01.mozilla.org has address 63.245.208.186

OK, this is the problem.  build.mozilla.org resolves to 10.2.74.128 for me.  I also tried to connect to the MV office VPN, and I get the same IP when I'm on VPN as well.
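
The mismatch between the two answers (63.245.208.186 from the external host, 10.2.74.128 from Toronto) can be checked mechanically. Below is a minimal Python sketch, not anything used in this bug: the function name is hypothetical, and it only compares two addresses that have already been resolved from different vantage points.

```python
import ipaddress


def is_split_horizon(answer_a: str, answer_b: str) -> bool:
    """Return True when one answer is a private address and the other is public.

    Both arguments are IP address strings obtained by resolving the same
    hostname from two different vantage points (hypothetical helper).
    """
    a = ipaddress.ip_address(answer_a)
    b = ipaddress.ip_address(answer_b)
    # is_private covers the 10.0.0.0/8 block that 10.2.74.128 falls in.
    return a.is_private != b.is_private


# Addresses taken from the comments above.
external_answer = "63.245.208.186"
toronto_answer = "10.2.74.128"
print(is_split_horizon(external_answer, toronto_answer))
```

A True result only says the two views disagree about public versus private; it does not say which view is the intended one.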
ehsanakhgari:~/moz/src [07:01:05]$ traceroute build.mozilla.org
traceroute to build.mozilla.org (10.2.74.128), 64 hops max, 52 byte packets
 1  ca-gw1.ca.mozilla.com (10.240.2.1)  4.415 ms  3.829 ms  1.692 ms
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * *
12  * * *
13  * * *
14  * * *
15  * * *
16  * * *
17  * * *
18  * * *
19  * * *
20  * * *
21  * * *
22  * * *
23  * * *
24  * * *
25  * * *
26  * * *
27  * * *
28  * * *
29  * * *
30  * * *
31  * * *
32  * * *
33  * * *
34  * * *
35  * * *
36  * * *
37  * * *
38  * * *
39  * * *
40  * * *
41  * * *
42  * * *
43  * * *
44  * * *
45  * * *
46  * * *
47  * * *
48  * * *
49  * * *
50  * * *
51  * * *
52  * * *
53  * * *
54  * * *
55  * * *
56  * * *
57  * * *
58  * * *
^C
Summary: build.m.o not responding to requests from external IPs → build.m.o not responding to requests from Toronto office
I was all but assured build.mozilla.org would never need to resolve externally.  I was not aware there was an externally resolvable address, so when build.mozilla.org was moved to the new DNS tree (and became an SOA) to help facilitate the Internap move and VPN/hostname continuity, it brought this split horizon issue to light.

Currently httpd on build.mozilla.org (10.2.74.128) returns an HTTP 401 internally and there is no restriction externally.  There are 2 fixes:

1) The quickest would be to rename the build.mozilla.org (63.245.208.186) virtual server to something else (e.g. www.build.mozilla.org).

2) Alternatively (and more time-consuming), a combination of removing the authentication on 10.2.74.128 and opening ACLs should allow the flow.
Whichever of the two solutions would resolve this more quickly would be super-awesome!
I did #1.  As seen from:

[root@caadm01 bin]# host www.build.mozilla.org
www.build.mozilla.org has address 63.245.208.186
Lowering the priority as a fix has been provided.  If this is good for a permanent fix please close.
Severity: blocker → normal
Status: NEW → ASSIGNED
I think that build.mozilla.org should resolve externally again, too.

Sorry for wherever the wires were crossed, Ravi. I think one of us probably confused build.mozilla.org the host with .build.mozilla.org the network when you asked about it.
build.mozilla.org (63.245.208.186) does resolve externally.  www.build.mozilla.org is just an internal record under the mozilla.org namespace that points to the same IP to get around your issue.  If we want to make www.build.mozilla.org also publicly available I can do that.

ravi@neu:[ttyq0][9:14pm](5):106:~> host build.mozilla.org
build.mozilla.org is an alias for dm-wwwbuild01.mozilla.org.
dm-wwwbuild01.mozilla.org has address 63.245.208.186

ravi@neu:[ttyq0][9:21pm](5):107:~> host www.build.mozilla.org
Host www.build.mozilla.org not found: 3(NXDOMAIN)
oh, sorry! I misunderstood. This seems fine.
Okay then!
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → WORKSFORME
When I switched the symbol server DNS name to www.build.mozilla.org, it could successfully load the symbols, but the source server still doesn't work.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Could you add more information here? Presumably you can reach http://hg.mozilla.org.
(In reply to comment #17)
> Could you add more information here ? Presumably you can reach
> http://hg.mozilla.org.

Yes, I can.  What I meant was the source server integration as explained here: <https://developer.mozilla.org/en/Using_the_Mozilla_source_server>
The sym file from your latest push to try looks plausible:
 http://build.mozilla.org/tryserver-symbols/js.pdb/91F3AF71FF1348A3B19CD6ECC63013A91/js.sym
and this returns data
 http://hg.mozilla.org/try/raw-file/c6870be9ecaf/js/src/jslong.h

So not a bug for IT to worry about, please file it separately. Ted is the master of knowledge in this area.
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
(In reply to comment #19)
> The sym file from your latest push to try looks plausible:
> 
> http://build.mozilla.org/tryserver-symbols/js.pdb/91F3AF71FF1348A3B19CD6ECC63013A91/js.sym
> and this returns data
>  http://hg.mozilla.org/try/raw-file/c6870be9ecaf/js/src/jslong.h
> 
> So not a bug for IT to worry about, please file it separately. Ted is the
> master of knowledge in this area.

I'm *not* talking about the source code from hg.  I'm talking about the source server integration in Visual Studio which allows you to view the source code when you're debugging an executable without having the source code locally available...
I was just trying to point out that the information that Visual Studio needs to request the source is in the sym file, and that if I construct a synthetic request for a source file from hg then it works (despite the try repo having so many heads).
(In reply to comment #21)
> I was just trying to point out that the information that Visual Studio needs to
> request the source is in the sym file, and that if I construct a synthetic
> request for a source file from hg then it works (despite the try repo having so
> many heads).

So, does Visual Studio get the source files directly from hg?
OK, the workaround of using www.build.mozilla.org is not good enough any more.  The new TBPL code tries to load these URLs:

http://build.mozilla.org/builds/builds-pending.js
http://build.mozilla.org/builds/builds-running.js

both of which fail from inside the Toronto office.  Therefore, none of the developers inside the Toronto office can access TBPL any more.  This is a blocker for us.
Severity: normal → blocker
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
No longer blocks: 286382
Netops will look at this when they get into the office. tbpl went live *today*, so while this is a blocker, I'm sure you have other standby tools to work with until this is resolved :)
Assignee: network-operations → ravi
(In reply to comment #24)
> Netops will look at this when they get into the office. tbpl went live *today*,
> so I'm sure while this is a blocker, eventually, you have other standby tools
> to work with till this is resolved :)

He's talking about the non-mozilla hosted TBPL, which developers have been relying on for months now.
(In reply to comment #25)
 
> He's talking about the non-mozilla hosted TBPL, which developers have been
> relying on for months now.

He specifically mentioned the new TBPL code, which is what led me to think of tbpl.mozilla.org.

Either way, netops will look at this ASAP.
(In reply to comment #26)
> (In reply to comment #25)
> 
> > He's talking about the non-mozilla hosted TBPL, which developers have been
> > relying on for months now.
> 
> He specifically mentioned the new TBPL code..which is what led me to think
> tbpl.mozilla.org

I'm actually talking about the new TBPL _code_, which is vastly improved using the buildbot data.  The domain name (tbpl.mozilla.org) has no relevance here, and it just happens that we switched to that domain name at the same time as the underlying code changed.

Until this is resolved, we have to use an old version of TBPL which is nowhere near as capable as the current version.
Contemplating possible resolutions...
(In reply to comment #8)
> 2) Alternatively (and more time-consuming), a combination of removing the
> authentication on 10.2.74.128 and opening ACLs should allow the flow.

Can we do this instead? Seems like the mountain view office is also getting the internal IP, and isn't having any issues connecting. Is it not just a matter of adding 10.240.* to the same whitelist that 10.250.* is on?
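
To make the whitelist question concrete, here is a small Python sketch of the membership check being proposed. It is an illustration only: the real ACLs live on the firewalls and httpd in whatever syntax they use, and reading "10.250.*" and "10.240.*" as /16 networks is an assumption.

```python
import ipaddress


def is_allowed(client_ip: str, allowed_networks: list) -> bool:
    """Return True if client_ip falls inside any of the allowed CIDR networks."""
    ip = ipaddress.ip_address(client_ip)
    return any(ip in ipaddress.ip_network(net) for net in allowed_networks)


# Assumed CIDR reading of the "10.250.*" entry mentioned above.
current_whitelist = ["10.250.0.0/16"]
# Proposed change: also admit "10.240.*" (again assumed to mean a /16).
proposed_whitelist = current_whitelist + ["10.240.0.0/16"]

# 10.240.2.1 is the Toronto gateway (ca-gw1) seen in the traceroute.
print(is_allowed("10.240.2.1", current_whitelist))
print(is_allowed("10.240.2.1", proposed_whitelist))
```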
It appears the data served from dm-wwwbuild01.mozilla.org (63.245.208.186) a/k/a build.mozilla.org is different from the data from build.mozilla.org (10.2.74.128).

This split horizon is what is not compatible.  My fix of creating a new hostname was under the assumption the content was distinctly different.  If it is the same and there is only authentication on the public side, is there any way, on the server side, to only ask for user auth if you are not coming from an internal IP?
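
The logic of that question (skip user auth for internal clients, require it for everyone else) can be sketched as a small WSGI wrapper in Python. This is not how the httpd on build.mozilla.org is configured; the 10.0.0.0/8 range, the realm name, and the `credentials_ok` callback are all assumptions made for illustration.

```python
import ipaddress

# Assumption: everything in 10.0.0.0/8 counts as "internal".
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal(addr: str) -> bool:
    """Return True if addr parses as an IP inside an internal network."""
    try:
        ip = ipaddress.ip_address(addr)
    except ValueError:
        return False
    return any(ip in net for net in INTERNAL_NETWORKS)


def require_auth_unless_internal(app, credentials_ok):
    """Wrap a WSGI app so only non-internal clients must authenticate.

    credentials_ok is a hypothetical caller-supplied function that takes the
    WSGI environ and returns True when the request carries valid credentials.
    """
    def wrapped(environ, start_response):
        client = environ.get("REMOTE_ADDR", "")
        if is_internal(client) or credentials_ok(environ):
            return app(environ, start_response)
        start_response(
            "401 Unauthorized",
            [("WWW-Authenticate", 'Basic realm="build"'),
             ("Content-Type", "text/plain")],
        )
        return [b"authentication required\n"]
    return wrapped
```

Trusting REMOTE_ADDR is only reasonable when no proxy or load balancer sits in front of the server and rewrites the client address, which is a separate assumption here.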
I really don't have any understanding of how things are set up, so I'm making wild guesses, but the data points I have are this:

- can't connect from Toronto (resolves to 10.2.74.128)
- can't connect from MV-VPN (resolves to 10.2.74.128)
- *can* connect from Mountain View office (resolves to 10.2.74.128)

By "can't connect" I mean that I get no response at all for requests to that IP - HTTP requests just time out, ping/traceroute just gets nothing, etc. - traffic appears to be entirely blocked. Is there a firewall in the way?
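
A timeout and a refusal usually point at different causes, so it can help to classify the failure rather than just note that it failed. Here is a minimal Python sketch of such a probe; the function name and return labels are hypothetical, and it only tests whether a TCP connection can be opened, not what the server answers.

```python
import socket


def probe(host: str, port: int = 80, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except socket.timeout:
        # No reply at all within the timeout: consistent with packets being
        # silently dropped somewhere on the path.
        return "timeout"
    except ConnectionRefusedError:
        # The host (or a device in front of it) actively rejected the attempt.
        return "refused"
    except OSError:
        # Anything else, e.g. name resolution failure or no route to host.
        return "error"


if __name__ == "__main__":
    # 10.2.74.128 is the internal address reported in this bug.
    print(probe("10.2.74.128", 80))
```

Running it from Toronto, the MV VPN, and the Mountain View office would give one label per vantage point to compare.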
I will revisit this after the firewall is switched out in MTV in 2 days (Sunday).

There are many firewalls in the way.  3 of them to be exact.
Are there any updates here?
No.  The collateral issues from upgrading this weekend were only fixed 40m ago.  Once we reassess where we stand, we can begin to look into this.

I expect this to still happen today, but at the moment I cannot offer any data.
Any updates now?
Too many issues with the network as a whole, which have since cooled down or been resolved, preempted this and other routine bugs.  This is at the top of my queue for today to resolve.
This should not be resolved as it was tested successfully from TOR's admin host.

[root@caadm01 ~]# curl -H 'build.mozilla.org' 'build.mozilla.org/builds/builds-pending.js'
{"pending": {"try": {"fd6942bda3b7": [{"submitted_at": 1287516512, "id": 1099664, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver opt test reftest"}, {"submitted_at": 1287516512, "id": 1099666, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver opt test mochitests-3/5"}, {"submitted_at": 1287516512, "id": 1099667, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver opt test mochitests-1/5"}, {"submitted_at": 1287516512, "id": 1099669, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver opt test mochitests-5/5"}, {"submitted_at": 1287516512, "id": 1099671, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver opt test jsreftest"}], "1e51437e17b4": [{"submitted_at": 1287516696, "id": 1099701, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver debug test xpcshell"}, {"submitted_at": 1287517592, "id": 1100025, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver opt test xpcshell"}], "e72516c6cc0b": [{"submitted_at": 1287518252, "id": 1100172, "buildername": "Rev3 MacOSX Leopard 10.5.8 tryserver opt test xpcshell"}, {"submitted_at": 1287519335, "id": 1100571, "buildername": "Rev3 Fedora 12x64 tryserver debug test xpcshell"}], "26ecd3da7d51": [{"submitted_at": 1287512492, "id": 1098780, "buildername": "Rev3 Fedora 12 tryserver opt test crashtest"}, {"submitted_at": 1287512492, "id": 1098782, "buildername": "Rev3 Fedora 12 tryserver opt test mochitests-2/5"}, {"submitted_at": 1287512492, "id": 1098783, "buildername": "Rev3 Fedora 12 tryserver opt test mochi
[...]
Status: REOPENED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
(In reply to comment #37)
> This should not be resolved as it was tested successfully from TOR's admin
> host.

I bet you meant "now".
That is an even money bet there :)
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard