Closed Bug 450645 Opened 16 years ago Closed 7 years ago

hg.mozilla.org should disable http fetch, make https mandatory

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: BenB, Unassigned)

References

Details

(Whiteboard: [relsec])

Attachments

(1 obsolete file)

When devs (and even more so distributors and build machines) fetch via http, they make themselves vulnerable to interception (MITM etc.) and allow attackers to tamper with the downloaded source code.

The stakes are very high for Mozilla (even 30000 users are a lot to root at once). Even more so when a developer commits an inserted backdoor / security hole. I think the review process would not catch that, as review happens before checkin and relies on the committer to do the right thing; the committed diffs are very rarely read carefully and scrutinized. There is a small, but real, chance that a backdoor / deliberate security hole is inserted into official Mozilla code this way.

This actually did happen in the past. Usually, tarballs on insecure FTP servers were modified, but intercepting VCS traffic is just another vector for doing the same. So the likelihood may be small, but the risk is real.

Given that hardware HTTPS accelerators exist, and that HTTPS is often fine on server CPU alone, turning on HTTPS is feasible even at large scale.

Please turn off http, to ensure that all devs and distributors use https.
Compare bug 450648 about moving http webapps away from hg.m.o.
http could either be completely disabled or just redirect to hgweb.mozilla.org or similar.
OS: Linux → All
Hardware: PC → All
I disagree.  I think it's fine to leave http on for folks that are browsing casually or just looking at things.  We already have https available for folks that want a secure channel.

I don't think we should shut down a service because there is potential for abuse.
The other flavor of this is if the MITM injection is performed with the intention of pwning the developer's machine directly. Once an attacker is into your box, it's a small task for them to steal account logins and such. [E.g., install a keysniffer, grab SSH passphrases and keys, then commit to Hg/CVS at your leisure.]
(In reply to comment #3)
> The other flavor of this is if the MITM injection is performed with the
> intention of pwning the developer's machine directly. Once an attacker is into
> your box, it's a small task for them to steal account logins and the such. [Eg,
> install keysniffer, grab SSH passphrase and keys, then commit to Hg/CVS at your
> leisure.]
> 

We had this risk with CVS, too, right? With anonymous pserver checkouts?

Not saying this is necessarily a reason to keep it, but I don't think we're in a worse position.
(In reply to comment #4)
> We had this risk with CVS, too, right? With anonymous pserver checkouts?

No, all the build machines pulled the source over ssh from cvs.mozilla.org, so no, we didn't have this problem with CVS.

> Not saying this is necessarily a reason to keep it, but I don't think we're in
> a worse position.

We are in a worse position currently with Hg. ;)
(In reply to comment #5)
> (In reply to comment #4)
> > We had this risk with CVS, too, right? With anonymous pserver checkouts?
> 
> No, all the build machines pulled the source over ssh from cvs.mozilla.org, so
> no, we didn't have this problem with CVS.
> 
> > Not saying this is necessarily a reason to keep it, but I don't think we're in
> > a worse position.
> 
> We are in a worse position currently with Hg. ;)
> 

We're not talking about build machines exclusively, afaik
aravind, casual browsing via http webapps should not happen on hg.mozilla.org anyway; see the other bug filed and mentioned above.
(In reply to comment #2)

> I don't think we should shut down a service because there is potential for
> abuse.

Visiting http://hg.mozilla.org in a browser is fine; the issue here is about "hg clone http://hg.mozilla.org".

(In reply to comment #4)

> We had this risk with CVS, too, right? With anonymous pserver checkouts?

I don't think this matters. (If anything, it might merit another bug to secure CVS and other repos used! :-) DNS attacks are all the rage these days, which makes the odds of being MITM'd much higher.
Assignee: server-ops → aravind
We should not disable http fetch.  We should use https fetch for our build systems.
Comment on attachment 333921 [details] [diff] [review]
[backed out] build machines should pull sources via https, not http

ps, feel free to land this post-review.
(In reply to comment #9)
> We should not disable http fetch.

Can you expand on this assertion?
Attachment #333921 - Flags: review?(nthomas) → review?(ccooper)
Attachment #333921 - Flags: review?(ccooper) → review+
Comment on attachment 333921 [details] [diff] [review]
[backed out] build machines should pull sources via https, not http

changeset:   237:a3d8c3284ac7
Attachment #333921 - Attachment description: build machines should pull sources via https, not http → [checked in] build machines should pull sources via https, not http
This patch ended up breaking some functionality because the host machine didn't have Python SSL modules on it. I think I've fixed it; if not, I'll have to back it out.
Comment on attachment 333921 [details] [diff] [review]
[backed out] build machines should pull sources via https, not http

I've backed this out for now, until I have time to fix the failures.
Attachment #333921 - Attachment description: [checked in] build machines should pull sources via https, not http → [backed out] build machines should pull sources via https, not http
This is a WONTFIX for now.  I don't really think blocking http is the right thing to do.  For everyone that wants to use it, https is available.  In any case, we don't allow any kind of checkins over http or https, so this should hopefully not be that big a risk.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX
I'd kind of like a better explanation of why this isn't something we'll do (see also comment 12). Doing insecure pulls is essentially just as bad as doing insecure nightly updates. It seems kind of silly to give people a useless choice.

Also reopening to at least have the build machines pull securely.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Depends on: 457100
Now, hg via https entirely broke, see bug 457100.
Never mind my last comment, it was local horkage, sorry.
No longer depends on: 457100
bhearsum:  Are all the build machines pulling builds over https?

dolske: I am not sure that the possibility of someone snooping (or feeding you junk) should stop us from serving traffic on http.  As I mentioned earlier, we do have https enabled for folks that want to go that route.

Once bhearsum confirms that build machines have been switched over to https, I don't see a reason to keep this open.
(In reply to comment #20)
> dolske: I am not sure that the possibility of someone snooping (or feeding you
> junk) should stop us from serving traffic on http.  As I mentioned earlier, we
> do have https enabled for folks that want to go that route.

The point is that if there's both a secure and insecure way of doing something, the insecure way should be disabled unless there's a compelling reason to keep it. So far, no one has presented any argument for why we should retain the ability to pull without SSL.
(In reply to comment #21)
> The point is that if there's both a secure and insecure way of doing something,
> the insecure way should be disabled unless there's a compelling reason to keep
> it. So far, no one has presented any argument for why we should retain the
> ability to pull without SSL.

I am okay with forcing users to use https.  Do we want to redirect folks visiting us on http to https?  I am not sure if that (redirect) is vulnerable to all the pitfalls of http.

Shaver: any objections to this?
If you want security, and not just security theater, you're going to need a slew of dependencies. Assuming the rumors are true that Python 2.6 will actually verify certs, rather than accepting anything (including domain mismatches) like 2.5 did, you'll need a version of Mercurial that only runs on 2.6 and won't allow pulling or cloning by any previous version, plus bugs to update the build machines and MozillaBuild to 2.6 plus that Mercurial. As someone (I think Gavin) pointed out the other day in #hg, unless you use one of the third-party Python wrappers for OpenSSL instead of Python 2.5's "support", the only difference between http and https is that the person who wants to pwn you needs a cert, any cert, including a self-signed one for a different domain.

Even if you just want security theater, we still probably ought to have some documentation about how to get the fake-SSL support for 2.5 on various platforms - it took me a couple of tries to finally find a blog post saying that you need to install "py25-socket-ssl" from macports to get even the illusion of SSL support.
I honestly don't think this matters either way.  I am fine disabling http and forcing folks to use https.  However, I am not planning on upgrading Python versions on the OS for this.  We run on RHEL 5 (Python 2.4).  It's a pain to run custom-compiled versions of Python, hg, SSL support libraries, etc.

So if merely disabling http isn't good enough, then I'd be tempted to WONTFIX this bug.
Aravind, Phil is speaking about the client. The server-side Python would not need to be upgraded.
(In reply to comment #20)
> bhearsum:  Are all the build machines pulling builds over https?

Not yet. Busy with other things. When you're done with this bug feel free to toss it into mozilla.org:RelEng and I'll pick it up when I can.
Depends on: 460020
> When you're done with this bug feel free to
> toss it into mozilla.org:RelEng and I'll pick it up when I can.

The change in build machines has to happen *first*, so we can't be done until the build machines are switched. Thus, I filed blocking bug 460020.
No longer depends on: 460020
No, you misunderstand me: I am speaking about both the client and the server. You filed this bug to make it impossible for anyone pulling from hg.mozilla.org to be MITMed, whether or not they want that protection. If https plus current Python/hg offers absolutely no protection against MITM, the only way that forcing https will force MITM protection is if the server is changed to not accept connections from clients in that state. Whether or not that will be possible without upgrading Python and hg on the server nobody knows, since nobody knows how it will be done.

If you want to force everyone to not be at risk of MITM attacks, your steps have to be:

1. Find a combination of Python (possibly plus extensions) and hg for the client side which will actually verify certs.
2. Find a way to make either hg or the server itself reject connections from anything which is not using that (which might or might not require server upgrades).
3. Get that combo into MozillaBuild.
4. Get that combo on the build machines.
5. Start making the server reject both http and https unless it's using your actually-secure combo.

By way of contrast, if instead of forcing everyone to be secure you want to make it possible for people to be secure, you need just step 1 plus a wiki page.
If you want to reject clients running Python <2.6, which does not verify certs, probably all you'd need to do is reject their User-Agent strings. That's probably just an Apache mod_rewrite rule or something like that.

I think at least your steps 1, 3 and 4 are a good idea, though. Step 4 is already filed as bug 460020 (as noted above). I just filed bug 460052 for Step 3 - fixing MozillaBuild.
In other words, no server-side Python upgrade is needed; mod_rewrite or similar would be enough even if you want to block. Apache handles HTTPS on the server, so the server should not be affected by the Python problem.

Note that a whitelist, which you suggest, is not an option, as I'll want to download the bz2 tarballs via HTTPS using wget or the browser.
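For what it's worth, a minimal sketch of the mod_rewrite approach, assuming the plain-HTTP side is served by an Apache vhost (the file path and vhost layout here are assumptions, not the real hg.mozilla.org config). It rejects Mercurial wire-protocol clients by User-Agent while leaving browser/wget traffic alone; note the hg User-Agent identifies the Mercurial version, not the Python version, so distinguishing "old Python" clients this way is approximate at best:

  # Hypothetical vhost fragment; hg clients send a User-Agent starting with "mercurial/"
  cat >> /etc/httpd/conf.d/hg-http.conf <<'EOF'
  RewriteEngine On
  RewriteCond %{HTTP_USER_AGENT} ^mercurial/ [NC]
  RewriteRule ^ - [F]
  EOF
  apachectl configtest && apachectl graceful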
https://www.g4v.org/hg/ serves an empty hg repository with a mismatched cert, in case that's useful for testing (cert only matches the non-"www." version).

|hg clone https://www.g4v.org/hg/| succeeds using mercurial 1.0.
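For contrast, a hedged way to see the difference against a client that does verify (the test URL is the one above and may no longer exist; exact error text varies by tool version):

  # hg 1.0 on Python 2.5 accepts the mismatched cert silently:
  hg clone https://www.g4v.org/hg/ g4v-test
  # curl verifies certificates by default, so it should refuse the same host
  # with a hostname-mismatch / verification error instead of connecting:
  curl -v https://www.g4v.org/hg/ -o /dev/null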
http://www.heikkitoivonen.net/blog/2008/10/14/ssl-in-python-26/ - see the "Clients" section, where Heikki talks about how Python 2.6's ssl module leaves hostname checking to the client application. So you need a dependency on a new version of Mercurial which either only runs on 2.6+ and does hostname checking itself, or bundles a third-party alternative, and in either case doesn't provide any fallback to insecure "Secure"SL, which is likely to be a hard sell.
See also http://www.selenic.com/mercurial/bts/issue1174, and possibly http://www.selenic.com/mercurial/bts/issue643. In any case, you'd need all the clients to install an extra dependency or Python 2.6 and a newer version of hg. Seems exceedingly unlikely for now.
Ugh, what a disappointing situation. :(

So, it sounds like there are two possible routes to take:

1) Effectively wontfix this bug (at least for now), since Hg/Python are broken. Revisit the issue in the future when the client software can correctly detect MITM attacks.

2) Go ahead and switch the server to require SSL-only now, and wait for the client software to catch up.

While there's no immediate security benefit to #2, perhaps it's cheaper to switch now rather than later (certainly that would have been true ~6 months ago, before most developers and build systems started using Hg).

At the very least, we should probably update documentation to show using https://hg.mozilla.org. [Hmm: If a tree is already cloned from http://hg.mo, can a user just edit .hg/hgrc and change the "default" path to https://hg.mo?]
I'd try to get the client fixed (see bug 460052 comment 3), and then fix this one, i.e. a bug dependency.
Depends on: 460020, 460052
That doesn't work, if only because it's unrealistic to require people (for example those who spend their time in scratchbox, the Fennec platform, or those who want to stick to a Debian release) to use a very recent client version.
(In reply to comment #34)
> [Hmm: If a tree is already cloned from http://hg.mo,
> can a user just edit .hg/hgrc and change the "default" path to https://hg.mo?]

Yep.
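Concretely, a minimal sketch of that edit (the clone directory name is hypothetical and GNU sed syntax is assumed):

  cd mozilla-central            # an existing clone made over http
  cat .hg/hgrc                  # typically shows: [paths] default = http://hg.mozilla.org/mozilla-central
  sed -i 's|http://hg.mozilla.org|https://hg.mozilla.org|' .hg/hgrc
  hg pull                       # subsequent pulls now go over https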
I'm trying, but not getting too far. E.g., for comm-central, the URLs are in the scripts. (Admittedly, this is technically a different issue: "make the scripts use https".)
(In reply to comment #36)
> That doesn't work, if only because it's unrealistic to require people (for
> example those who spend their time in scratchbox, the Fennec platform, or those
> who want to stick to a Debian release) to use a very recent client version.

I don't know what scratchbox is, but the Fennec people generally develop on normal desktop machines that will have whatever toolset we have for Firefox builds. People who want to stick to a stock anything, Debian or otherwise, are generally out of luck. We've always required specific versions of various tools at various times and people just have to go get them (service pack this, XCode that, specific make version, etc.). What you're suggesting is to extend the length of time for which the code repo is potentially vulnerable to injection and the risk of backdooring a couple hundred million people so as not to inconvenience a handful of developers who can't get the required version of an open-source tool. A dozen? twenty? a hundred? That's a bad tradeoff -- the developers can suck it up.
So, from comment 28 and the linked URLs, doing this correctly on the server side means upgrading Python to 2.6, using non-standard SSL libraries, and rewriting some of the hg server code to use those libraries.

Supporting non-standard libraries is a non-trivial task for us (and I'm not even sure it'd all just work), and would require extensive testing.  All that work, and we'd still be vulnerable until all clients can be upgraded to some future version of Mercurial that addresses these issues.

This is a WONTFIX for now; please re-open at some future date (probably when we are on the next RHEL major release).
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → WONTFIX
Well, FTR, I don't think it's a given that this requires any server-side modifications (other than the config change to disable HTTP). Phil's premise in comment 28 is that the server should *enforce* using a client that does proper cert checking. While that's an interesting idea, I'm not sure that's required or as complicated as made out to be.

In any case, this is a bit academic (seeing as there isn't a suitable Mercurial client yet), so deferring any action until that happens seems fair.
I'm sorry, what did I miss in comment 28 that makes it anything other than security theater to require https to prevent MITM attacks, when we know there's no current client for which that will actually prevent MITM attacks, and to then allow continued use of those clients even after one exists for which it will?
aravind, there are no software upgrades at all needed on the server side. I already said that, with reasons, in comment 30. REOPENing.

Phil, I don't think this bug needs to block on the client. *Even if* it does, it's a classic blocker bug, bug 460052. We don't close bugs just because they have blockers; we mark the blockers in the Bugzilla field.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Works for me.  I will disable http when those two blockers have been resolved.
Assignee: aravind → nobody
Component: Server Operations → Server Operations: Projects
Can I wontfix this now, now that build has decided to wontfix their bug?
Yes! :)
Status: REOPENED → RESOLVED
Closed: 16 years ago15 years ago
Resolution: --- → WONTFIX
Now that our hg install is fixed and the certificate can be checked, we need to revisit this.

This is important for the integrity of our source code base, and for the security of our developers' machines. If a developer fetches source code over http, and a MITM introduces a source code change, the developer will compile and run that malicious code without noticing. In turn, their hg commits can be modified, bypassing all reviews, because we usually don't comb through committed patches after submission. It will be tough to detect that before we ship the code to our users.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Component: Server Operations: Projects → Mercurial: hg.mozilla.org
Product: mozilla.org → Developer Services
QA Contact: mzeier → hwine
Whiteboard: [relsec]
CVE-2016-3630 makes this more important.

We should consider rolling out the SHA-2 x509 cert to hg.mozilla.org before we force people to https. The reason is that a lot of people pin cert fingerprints in Mercurial (because Python TLS support is broken on many Python installs). If we force https and then change the cert, a lot of people will pin the current cert after the forced switch and then get caught up in the fingerprint change. This is avoidable churn. Doing the fingerprint change and then forcing https is a better end-user experience.

IMO we should announce a date for the certificate change then force TLS connections at the same time or a week or two later.

FWIW, Mercurial 3.8 (to be released May 1) supports pinning multiple certificate fingerprints. So if we get that deployed in automation, we can announce the new fingerprint so downstream systems (like Firefox automation) can install it and they will transition to the new cert transparently.
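For illustration, the kind of pinning configuration being referred to; the fingerprint values below are placeholders, not hg.mozilla.org's real ones, and which section applies depends on the Mercurial version (the older [hostfingerprints] form is SHA-1 only, while newer releases add algorithm-prefixed, multi-value pinning):

  cat >> ~/.hgrc <<'EOF'
  [hostfingerprints]
  # older clients: SHA-1 fingerprint of the server certificate (placeholder value)
  hg.mozilla.org = af:27:...:99

  [hostsecurity]
  # newer clients: several algorithm-prefixed fingerprints may be listed, which
  # lets a cert rotation happen without breaking pinned clients (placeholders)
  hg.mozilla.org:fingerprints = sha256:8e:ad:...:01, sha256:17:38:...:02
  EOF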
(In reply to Gregory Szorc [:gps] from comment #50)
> We should consider rolling out the SHA-2 x509 cert to hg.mozilla.org before
> we force people to https.

Is there a bug filed for this? :-)
Depends on: 1277714
Flags: needinfo?(gps)
There is a bug somewhere to move away from SHA-1 certs on hg.mozilla.org. We'll want to deploy Mercurial 3.8 to automation first, so we can configure the SHA-2 fingerprint before the new cert is deployed and automation doesn't blow up when we switch certs.
Flags: needinfo?(gps)
(In reply to Gregory Szorc [:gps] from comment #52)
> There is a bug somewhere to move away from SHA-1 certs on hg.mozilla.org.

Do you have the number? Having all bugs added to a dependency tree makes it much easier to see state at a glance.

> We'll want to deploy Mercurial 3.8 to automation first so we can configure
> the SHA-2 fingerprint before it is deployed so automation doesn't blow up
> when we switch certs.

Indeed (this is why I added bug 1277714 as a dependency as part of comment 51).
Bug 1147548 tracks the certificate upgrade.
Depends on: 1147548
Now that we have a SHA-256 cert deployed to hg.mozilla.org and we've done the hard work of upgrading automation to use Mercurial 3.9, this bug is unblocked! I think we should proceed with making hg.mozilla.org TLS only by the end of 2016.

I think the next step here is assessing who is still using port 80 and get them transitioned to secure connections.

atoll: could you please assist me with obtaining the load balancer logs (I don't think I have access)? I'd like to analyze who is connecting to port 80 to help identify high volume consumers by repo/URL and source IP. I'm especially interested in high volume consumers coming from IP addresses that Mozilla uses.
Flags: needinfo?(rsoderberg)
atoll: also, you may want to look at relative traffic levels for :80 versus :443 on the load balancer. If we'll be shifting a lot of traffic to TLS, the load balancer CPU may not take kindly to that. An historical comparison from several months back might be useful: I believe we've shifted 1+ TB/day off hg.mozilla.org to Amazon S3 and a CDN as part of the "clone bundles" work that started in bug 1041173.
We keep 14 days of logs for these VIPs as no special retention arrangement has been made otherwise. These numbers are short by approximately 100 requests per day:

09/Sep/2016 288078
10/Sep/2016 309327
11/Sep/2016 283886
12/Sep/2016 312045
13/Sep/2016 364613
14/Sep/2016 331701
15/Sep/2016 384095
16/Sep/2016 361511
17/Sep/2016 355823
18/Sep/2016 324166
19/Sep/2016 520418
20/Sep/2016 336958
21/Sep/2016 331948
22/Sep/2016 317656
23/Sep/2016 311451
24/Sep/2016 300618
25/Sep/2016 340177
26/Sep/2016 260148

This works out to an additional 5 requests/sec over the day, which is acceptable from a load balancer perspective. The user-agent breakdown of the above time range, which I can email you privately if you need the full un-truncated output, has the following top hitters:

  10891 mercurial/proto-1.0 (Mercurial 3.9)
  15091 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  16334 YisouSpider
  22319 mercurial/proto-1.0 (Mercurial 3.9.1)
  24103 Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.0; trendictionbot0.5.0; trendiction search; http://www.trendiction.de/bot; please let us know of any problems; web at trendiction.com) Gecko/20071127 Firefox/3.0.0.11
  33158 Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
  34371 Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
  37894 Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
  51274 ltx71 - (http://ltx71.com/)
  55847 Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; Win64; x64; Trident/4.0; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729
  70248 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
  77742 Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)
 104115 Mozilla/5.0 (compatible; MegaIndex.ru/2.0; +http://megaindex.com/crawler)
 192388 mercurial/proto-1.0 (Mercurial 3.9+60-cfa543f6c331)
 424680 mercurial/proto-1.0
 767538 -
 775358 Python-urllib/2.7
2976553 Twisted PageGetter

I checked a random urllib request:

"GET /releases/mozilla-release/raw-file/59f461d36b4a133f5045a628f08628f9c48919d2/toolkit/components/telemetry/Histograms.json HTTP/1.1"

And a couple of random "-" requests:

"GET /mozilla-central/raw-file/tip/testing/docker/desktop-test/dot-files/config/pip/pip.conf HTTP/1.1"
"GET /releases/mozilla-b2g44_v2_5/log/fddffdeab17035827778431387dcff9256e136a9 HTTP/1.0"

And a couple of random mercurial/proto requests:

"GET /build/mozharness?cmd=batch HTTP/1.1"
"GET /mozilla-central?cmd=capabilities HTTP/1.1"
Flags: needinfo?(rsoderberg)
Oh, and Twisted PageGetter:

"GET /releases/l10n/mozilla-aurora/ro/json-pushes?startID=941&endID=1141 HTTP/1.0"

Which is originating requests from nat-fw1.scl3, so look inward on that particular user-agent as a first-step.
(In reply to Gregory Szorc [:gps] from comment #55)

We have not upgraded to 3.9 everywhere, so this is still blocked. 2008 is blocked on hg/AWS performance issues which have not been solved yet.
Depends on: 1305174
The EBS bug is a generic bug that will likely turn into a tracker. The bug we want to track is the one where our Windows automation is moving off the root EBS volume.
Depends on: 1305485
No longer depends on: 1305174
Axel: is that your l10n automation not using TLS? If so, please switch things to use https://hg.mozilla.org/.
Flags: needinfo?(l10n)
The twisted one is surely mine, filed bug 1305973 to track that.
Flags: needinfo?(l10n)
Depends on: 1312135
Depends on: 1312797
QA Contact: hwine → klibby
The automation for l10n.m.o is out of the way now.
(In reply to Amy Rich [:arr] [:arich] from comment #59)
> (In reply to Gregory Szorc [:gps] from comment #55)
> 
> We have not upgraded to 3.9 everywhere, so this is still blocked. 2008 is
> blocking on hg/aws performance issues which have not been solved yet.

:arr, :gps, is this still an issue? All of the dependent bugs are marked as resolved.
Attachment #333921 - Attachment is obsolete: true
Flags: needinfo?(gps)
I /think/ we've got the world upgraded to 3.9+. I can pull the logs on the hgweb machines to verify. needinfo me for that.

atoll: could you please do a repeat of comment #57 and see who our top remaining port 80 consumers are? The past 24-48 hours is preferred, as comment #63 indicates removal of port 80 traffic on 2017-01-16.
Flags: needinfo?(gps) → needinfo?(rsoderberg)
User agent analysis of the past 14 days.

  11954 curl
  12688 Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.0; trendictionbot0.5.0; trendiction search; http://www.trendiction.de/bot; please let us know of any problems; web at trendiction.com) Gecko/20071127 Firefox/3.0.0.11
  14702 ltx71 - (http://ltx71.com/)
  14720 Twitterbot/1.0
  14839 Mozilla/5.0 (compatible; SemrushBot/1.2~bl; +http://www.semrush.com/bot.html)
  16258 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  18291 Mozilla/5.0 (compatible; Linux x86_64; Mail.RU_Bot/2.0; +http://go.mail.ru/help/robots)
  20398 BoogleBot 2.0
  26244 mercurial/proto-1.0 (Mercurial 4.0.1)
  27270 Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.71 Safari/537.36
  30819 Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
  43521 Mozilla/5.0 (compatible; SemrushBot/1.1~bl; +http://www.semrush.com/bot.html)
  52918 mercurial/proto-1.0 (Mercurial 3.9.2)
  63535 Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)
  69021 Python-urllib/2.7
 119252 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
 149165 Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, help@moz.com)
 227889 mercurial/proto-1.0
 237981 mercurial/proto-1.0 (Mercurial 4.0.1+304-60a40b3827ce)
 239363 -
 254360 Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
3465184 Twisted PageGetter

(All below numbers are for one day or less, rather than 7 days)

Twisted is still there, asking for json-pushes data, but I don't think it's l10n, I think it's SeaMonkey builds:

11.223.245.63.in-addr.arpa domain name pointer sea-master1.community.scl3.mozilla.com.

User-Agent '-' means no user-agent header was sent. Lots of security hole scans, but also (of course) yet another way people are direct-fetching effective_tld_names.dat (seriously, how many different ways *are* there to access this file?!)

     64 GET /mozilla-central/raw-file/1ad9af3a2ab8/netwerk/dns/effective_tld_names.dat HTTP/1.0
    189 GET /mozilla-central/raw-file/1ad9af3a2ab8/netwerk/dns/effective_tld_names.dat HTTP/1.1

But also what appears to be some sort of thing that cares about pushlogs:

     30 GET /chatzilla/json-pushes?full=true&startdate=2017-01-08%2009%3A06%3A16&enddate=Now HTTP/1.0
     30 GET /chatzilla/log/c6ddca598a5398ca9a58b14d91bce82b8878bc7b HTTP/1.0
     30 GET /chatzilla/pushloghtml?startdate=2+days+ago&enddate=now HTTP/1.0
     30 GET /comm-central/pushloghtml?startdate=2+days+ago&enddate=now HTTP/1.0
     30 GET /dom-inspector/json-pushes?full=true&startdate=2017-01-08%2009%3A08%3A36&enddate=Now HTTP/1.0
     30 GET /dom-inspector/log/ca57c796b6228d020925c3ce299ff0056739a18b HTTP/1.0
     30 GET /dom-inspector/pushloghtml?startdate=2+days+ago&enddate=now HTTP/1.0
     30 GET /mozilla-central/pushloghtml?startdate=2+days+ago&enddate=now HTTP/1.0

I checked some random source IPs here and they're all offsite resources we don't control, so not much we can do there.

The mercurial/proto-1.0 one was interesting, because it says Callek over and over, but it turns out that it's just SeaMonkey build hosts depending critically on /users/Callek_gmail.com/tools to build SeaMonkey. So if I exclude the entire community VLAN, "mercurial/proto-1.0" is seeing traffic from an EC2 instance (that we may or may not control) and from the SCL3 DC NAT as the two topmost hits, with a huge long tail with scatterings of 63.245 after that. Whatever's running out of SCL3 is probably related to these three repos:

    204 GET /qa/testcase-data/?cmd=listkeys HTTP/1.1
    111 GET /SeaMonkey/seamonkey-project-org?cmd=listkeys HTTP/1.1
     68 GET /webtools/telemetry-experiment-server?cmd=listkeys HTTP/1.1

And excluding community VLAN and SCL3 NAT, the top repos queried are:

    934 GET /build/mozharness?cmd=listkeys HTTP/1.1
    245 GET /mozilla-unified?cmd=known HTTP/1.1
    151 GET /mozilla-central/?cmd=known HTTP/1.1
    102 GET /users/hwine_mozilla.com/repo-sync-tools?cmd=capabilities HTTP/1.1

Oh, of course, I did my grep wrong, so repeating now with *both* mercurial user-agents (no-version and 4.0.1 version):

   4978 GET /mozilla-central?cmd=capabilities HTTP/1.1
    934 GET /build/mozharness?cmd=listkeys HTTP/1.1
    340 GET /build/buildbot-configs/?cmd=capabilities HTTP/1.1
    336 GET /build/tools?cmd=capabilities HTTP/1.1
    336 GET /build/buildbotcustom?cmd=capabilities HTTP/1.1
    336 GET /build/braindump/?cmd=capabilities HTTP/1.1
    164 GET /integration/mozilla-inbound?cmd=capabilities HTTP/1.1
    102 GET /users/hwine_mozilla.com/repo-sync-tools?cmd=capabilities HTTP/1.1

And it trails off from there. Onward to Python-urllib:

   2957 GET /build/tools/raw-file/default/buildfarm/maintenance/production-branches.json HTTP/1.1
     42 GET /releases/mozilla-release/raw-file/59f461d36b4a133f5045a628f08628f9c48919d2/toolkit/components/telemetry/Histograms.json HTTP/1.1
     14 GET /projects/kraken/archive/tip.zip HTTP/1.0

All those buildfarm requests are from the SCL3 firewall, so I can't tell you what the backend is, but that's a really arcane URL to care about, so I bet someone knows. Histograms.json is some random EC2 instance, and half of the histograms it asks for are 404, so who even knows what's up there. The kraken requests are from some offsite source that I don't recognize.

Every curl request for a time window, because there are so few we can just direct-inspect them. I elided all but one of the '1 request' URLs, of which there were a handful. I kept this one for irony:

      1 GET /releases/mozilla-beta/filelog/c03e51cec3b5f6b8821687c8db8be309727d5470/devtools/client/netmonitor/test/html_curl-utils.html HTTP/1.1

And the rest are:

      2 GET / HTTP/1.1
      8 GET /mozilla-central/atom-log/aec6bf932306/security/nss/lib/ckfw/builtins/certdata.txt HTTP/1.1
      8 GET /releases/l10n/mozilla-aurora/id/atom-log HTTP/1.1
     17 GET /gaia-l10n/en-US/atom-log HTTP/1.1
     17 GET /mozilla-central/atom-log/default/netwerk/dns/effective_tld_names.dat HTTP/1.1
     70 GET /hgcustom/version-control-tools/atom-log HTTP/1.1
     93 GET /hgcustom/version-control-tools/rss-log HTTP/1.1
     95 GET /releases/mozilla-release/atom-log HTTP/1.1
     98 GET /releases/mozilla-release/rss-log HTTP/1.1
    110 GET /releases/mozilla-beta/atom-log HTTP/1.1
    142 GET /releases/mozilla-beta/rss-log HTTP/1.1
Flags: needinfo?(rsoderberg)
Callek: could you please triage the SeaMonkey and Callek references in comment #66 and file bugs blocking this one to transition things to https://hg.mozilla.org?
Flags: needinfo?(bugspam.Callek)
Callek: I suspect you may also know what's up with that /build/tools/raw-file/default/buildfarm/maintenance/production-branches.json request. There's a reference to that path in a few repos under hg.mo/build. But the URLs are https://. The only http:// reference to that URL I can find was changed way back in bug 960571. Perhaps there are some machines running with a really old config? Or more likely, I'm not grepping all the code I need to be (I wish we had a monorepo).
Note that while the l10n automation is using https now, it's still using mercurial 3.7.3 in a few places. Upgrading that will depend on bug 1323771, I think.
(needinfoing Kyle for ActiveData-ETL, Bob for autophone, Mark for telemetry-*)

I see a number of non-https references in the build-central set of repos, found using DXR:
https://dxr.mozilla.org/build-central/source/buildbotcustom/process/factory.py#1054
https://dxr.mozilla.org/build-central/source/braindump/buildbot-related/create-staging-master.pl#164
https://dxr.mozilla.org/build-central/source/braindump/update-related/create-channel-switch-mar.py#24
https://dxr.mozilla.org/build-central/source/mozharness/configs/developer_config.py#46
https://dxr.mozilla.org/build-central/source/mozharness/mozharness/mozilla/taskcluster_helper.py#60
https://dxr.mozilla.org/build-central/source/puppet/modules/cruncher/templates/reportor_credentials.ini.erb#9
https://dxr.mozilla.org/build-central/source/puppet/modules/slaveapi/templates/slaveapi.ini.erb#37
https://dxr.mozilla.org/build-central/source/slave_health/scripts/slave_health_cron.sh#20
https://dxr.mozilla.org/build-central/source/tools/buildfarm/maintenance/end_to_end_reconfig.sh#482
https://dxr.mozilla.org/build-central/source/tupperware/buildapi-app/Dockerfile#9

Similarly for mozilla-central:
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/developer_config.py#45
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/mozilla/taskcluster_helper.py#64
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/firefox_ui_tests/update_release.py#49
https://dxr.mozilla.org/mozilla-central/source/security/nss/tests/run_niscc.sh#262

And using GitHub code search:
https://github.com/klahnakoski/ActiveData-ETL/blob/master/resources/settings/mercurial_settings.json
https://github.com/klahnakoski/ActiveData-ETL/blob/35705ffe1ade8fbdf3188b12d873a0b4f1d833c8/mohg/hg_mozilla_org.py#L263
https://github.com/mozilla/autophone/blob/master/builds.py#L37
https://github.com/mozilla/telemetry-server/blob/5bfd3131426d89fa99cea55ab61a3ce7f32bf9c5/telemetry/revision_cache.py#L96
https://github.com/mozilla/telemetry-server/blob/5bfd3131426d89fa99cea55ab61a3ce7f32bf9c5/bin/get_histogram_tools.sh
https://github.com/mozilla/telemetry-tools/blob/19393e0eabc87aa0deb5a4c819ea301fabc8e439/telemetry/revision_cache.py#L96
https://github.com/mozilla/telemetry-tools/blob/19393e0eabc87aa0deb5a4c819ea301fabc8e439/scripts/get_histogram_tools.sh
https://github.com/mozilla-releng/funsize/blob/master/funsize/data/generate_update_platforms.sh
https://github.com/mozilla-l10n/mozilla-l10n-query/blob/86099e3e7d78c0fc71a7b4830a57a3ef20a66df5/app/scripts/update_sources.py
https://github.com/mozilla/elmo/blob/master/apps/life/fixtures/hg_mozilla_org.json


I wonder if it would also be useful to post to a handful of newsgroups even now, to encourage people to update local scripts/check their projects etc to at least reduce the number of cases that have to be followed up manually in this bug?
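A rough local equivalent of the DXR/GitHub searches above, for anyone checking their own scripts and configs (the pattern and file extensions are just a starting point):

  grep -rn 'http://hg\.mozilla\.org' . \
    --include='*.py' --include='*.sh' --include='*.json' --include='*.erb' --include='*.ini'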
Flags: needinfo?(mreid)
Flags: needinfo?(klahnakoski)
Flags: needinfo?(bob)
(In reply to Gregory Szorc [:gps] from comment #67)
> Callek: could you please triage the SeaMonkey and Callek references in
> comment #66 and file bugs blocking this one to transition things to
> https://hg.mozilla.org?

Pushing this n-i to :ewong

I can help track down specifics (I just don't have as much time available as I'd like), but there are very likely a handful of these that are SeaMonkey related...

I would say all /users/Callek../tools are likely SeaMonkey infra.

The chatzilla/domi uses are also probably SeaMonkey, at least insofar as they are internal.

Some of the mozilla-* repos with pushlog and l10n repos are also likely SeaMonkey, I'm less confident we make up a significant portion of polls there.

... As for:

111 GET /SeaMonkey/seamonkey-project-org?cmd=listkeys HTTP/1.1

There is one primary consumer of that repo, and that's our internal webops-controlled server (that the SeaMonkey team doesn't have direct access to), which pulls and rebuilds the website.

But iirc there was also a person on the SeaMonkey team doing near-identical work as the webops server to give us a staging site.

(In reply to Gregory Szorc [:gps] from comment #68)
> Callek: I suspect you may also know what's up with that
> /build/tools/raw-file/default/buildfarm/maintenance/production-branches.json
> request. There's a reference to that path in a few repos under hg.mo/build.
> But the URLs are https://. The only http:// reference to that URL I can find
> was changed way back in bug 960571. Perhaps there are some machines running
> with a really old config? Or more likely, I'm not grepping all the code I
> need to be (I wish we had a monorepo).

I would not be shocked at all if there was a lingering http:// reference in SeaMonkey automation to this file somewhere.
Flags: needinfo?(bugspam.Callek) → needinfo?(ewong)
I just cleaned up the telemetry-server code to use only https in
https://github.com/mozilla/telemetry-server/pull/159
Flags: needinfo?(mreid)
I'll handle it in bug 1332314
Depends on: 1332314
Flags: needinfo?(bob)
Thanks for all the follow-up work, Ed, Justin, Mark, and Bob!

I think we should pick a date for 301'ing the http endpoint and announce it. I'll send out an email when we pick a date.

arr: would you care to pick a date? I'll throw out February 7. Keep in mind I disappear for a while after February 10. (Although I don't think my presence is critical since this change is more about downstream consumers and not hg.mozilla.org itself.)
Flags: needinfo?(arich)
Question: Will we redirect http to https so that old links continue to work?
Yes, we will HTTP 301 the requests.
Will we prohibit HG clients, or will they follow the redirect?
The ActiveData ETL tries to use https first.
Flags: needinfo?(klahnakoski)
(In reply to Kyle Lahnakoski [:ekyle] from comment #78)
> The ActiveData ETL tries to use https first.

Can it be updated to *not* ever try http://?
(In reply to Richard Soderberg [:atoll] from comment #77)
> Will we prohibit HG clients, or will they follow the redirect?

hg clients will follow the redirect. The answer to whether they'll accept the x509 certificate is "it depends" with most answers being "probably."

If we're concerned about impact to automation, I suppose we could first add a redirect for some user agents, such as anything with "mozilla" or "bot" in it. Then we can chase the long tail of non-human, non-indexer clients.
I'm more inclined to suggest blocking it; there's an inherent risk in permitting clients to continue operating when http:// works transparently. If we redirect HG clients for now, can we also set a date at which we'll block them instead?
:dividehex, could you please catch up with gps to coordinate any needed changes here? I want to make sure that we have a quick and easy roll back plan in case we find that this breaks things substantially.

:hwine: NI you to verify that all the releng infra will work after this cutover, to help pick a date, and for general IT coordination (if needed).
Flags: needinfo?(jwatkins)
Flags: needinfo?(hwine)
Flags: needinfo?(arich)
(In reply to Amy Rich [:arr] [:arich] from comment #82)
> :hwine: NI you to verify that all the releng infra will work after this
> cutover, to help pick a date, and for general IT coordination (if needed).

First look says we're okay -- I'll double check out of band (no news is good news).
  https://dxr.mozilla.org/build-central/search?q=http%3A%2F%2Fhg.mozilla.org&redirect=false

Happy to work on date -- probably should go through CAB anyway to get visibility.
Flags: needinfo?(hwine)
(In reply to Justin Wood (:Callek) from comment #71)
> (In reply to Gregory Szorc [:gps] from comment #67)
> > Callek: could you please triage the SeaMonkey and Callek references in
> > comment #66 and file bugs blocking this one to transition things to
> > https://hg.mozilla.org?
> 
> Pushing this n-i to :ewong

> I would not be shocked at all if there was a lingering http:// reference in
> SeaMonkey automation to this file somewhere.

Freaky... I think I had already filed bug 1305911 for this.

I'll get that fixed.
Flags: needinfo?(ewong)
Depends on: 1305911
Depends on: 1332964
RelEng has a few items to check, detailed in bug 1332964.
RelEng has completed its sanity check, and we see no blockers to moving forward with disabling HTTP. That said, the usual caveats about releases, merges, & chemspills apply, so let's coordinate on that date. :)
(In reply to Amy Rich [:arr] [:arich] from comment #82)
> :dividehex, could you please catch up with gps to coordinate any needed
> changes here? I want to make sure that we have a quick and easy roll back
> plan in case we find that this breaks things substantially.

I emailed :gps.  We'll make sure there is a rollback procedure.
Flags: needinfo?(jwatkins)
The cutover should not cause any service disruption; however, it will be useful to have plenty of eyes around "just in case". That makes the cutover a good candidate for our Wednesday morning (PT) window. Based on comment 74, the 2 upcoming dates are Feb 1 & Feb 8.

I'll file a CAB ticket for Feb 1 (as the CAB for that will be Jan 25). If someone hollers, we can push to Feb 8.

:dividehex has confirmed rollback plan is done (comment 87)

:ewong - looks like you're ready to land on bug 1305911 -- any concerns with a Feb 1 date?
forgot to ni ewong for comment 88
Flags: needinfo?(ewong)
:atoll reminded me that "best practice" for introducing a 301 redirect is to first do a dance with a 302 redirect, tuning cache expiry times down and then returning them to 1 hour. This prevents "locking" a client into the HTTPS arrangement if we need to revert it. That makes sense in most situations.

For this case though:
 a) we know HTTPS works just fine with all modern clients, so there is no need to "roll back" a modern browser client
 b) if we rolled back (extremely low risk), it would be due to deficiencies in older implementations of the hg client
 c) afaik no older hg client ever attempted to update the URL on receipt of a 301, so there is no lock-in risk.

:gps - can you confirm or invalidate my logic please? I'm no expert on hg clients.
Flags: needinfo?(gps)
Mercurial doesn't retain or update settings if it encounters an HTTP 301. Non-Mercurial automated clients generally behave this way as well. Generally speaking, only browsers and HTTP caches/proxies will retain HTTP 301. These could interfere with Mercurial clients, however.

I think starting with an HTTP 302 before jumping to HTTP 301 is advised. Just in case. The difference between 301 and 302 for most clients in this case is semantic.

Also, we have an HSTS policy on hg.mozilla.org. So once a modern web browser follows an HTTP redirect and hits https://hg.mozilla.org, it shouldn't load http://hg.mozilla.org/ - even if the user types that in the address bar.
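Once the redirect is in place, a couple of hedged spot checks (the header names are standard; exact output will vary):

  curl -sI http://hg.mozilla.org/ | grep -iE '^(HTTP|Location)'
  curl -sI https://hg.mozilla.org/ | grep -i '^Strict-Transport-Security'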
Flags: needinfo?(gps)
TIL - thanks! I'll update the plan to be:
 - switch to 302 initially, shortening the cache retention policy to 10min
 - 2 days later, switch to 301 (assuming no issues) and return cache retention policy to current values.

There will be a 2nd CAB ticket for the 2nd action, but that's an even less visible non-event.
For those that want to follow the CAB process:

- 1st change: add 302 redirect HTTP->HTTPS: CHG0011207
- 2nd change: convert 302 redirect to 301 redirect (one week later): CHG0011215
Manual test case to be done by releng staff:
 - log into any buildbot master. e.g. buildbot-master87
 - activate the buildbot virtual env. e.g. source /builds/buildbot/*/bin/activate
 - cd /tmp && hg clone http://hg.mozilla.org/build/mozharness

Test passes if
 - Clone is successful
 - 2 warning messages are presented regarding "warning: connecting to <host> using legacy security technology (TLS 1.0)"
    - one for bundle from S3, one from hg.mozilla.org
(In reply to Hal Wine [:hwine] (use NI) from comment #88)
> The cutover should not cause any service disruption, however it will be
> useful to have plenty of eyes around "just in case". That makes the cutover
> a good candidate for our Web morning (PT) window. Based on comment 74, the 2
> upcoming dates are Feb 1 & Feb 8.
> 
> I'll file a CAB ticket for Feb 1 (as the CAB for that will be Jan 25). If
> someone hollers, we can push to Feb 8.
> 
> :dividehex has confirmed rollback plan is done (comment 87)
> 
> :ewong - looks like you're ready to land on bug 1305911 -- any concerns with
> a Feb 1 date?

No concerns,  thanks for the heads up Hal!

Edmund
Flags: needinfo?(ewong)
(:gcox noticed the 301/302 thing, actually, I just wrote a longer reply about it :)
Using a 302 will require a new TrafficScript rule in Zeus; all of the existing httpd redirects use http.changeSite() or the http redirect action, both of which use a 301. http.redirect() doesn't preserve any path or query string, it just expects a URL as an argument, so we'll need to cobble something together. :atoll, any thoughts?
Flags: needinfo?(rsoderberg)
Decision reached and approved by :gps via email: we will go with a 301 redirect HTTP->HTTPS in the load balancer.
Thanks all!
Flags: needinfo?(rsoderberg)
Blocks: 1324148
Blocks: 1335626
No longer blocks: 1324148
HTTP -> HTTPS redirect has been enabled on the ZLB.

└─▪ curl -I http://hg.mozilla.org
HTTP/1.1 301 Moved Permanently
Content-Type: text/html
Date: Wed, 01 Feb 2017 16:00:31 GMT
Location: https://hg.mozilla.org/
Connection: Keep-Alive
Content-Length: 0
Status: REOPENED → RESOLVED
Closed: 15 years ago7 years ago
Resolution: --- → FIXED
See Also: → 1336275
See Also: → 1336299
See Also: → 1336300
See Also: → 1336302
See Also: → 1336311
See Also: → 1336359