Closed Bug 1229640 Opened 9 years ago Closed 7 years ago

[tracker] Transfer thunderbird.net to the Thunderbird project

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rkent, Assigned: Atoll)

References

Details

(Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2290] )

Given Mitchell's recent post (https://groups.google.com/forum/#!msg/mozilla.governance/kAyVlhfEcXg/Eqyx1X62BQAJ) concerning the separation of Thunderbird infrastructure from Mozilla, the domain thunderbird.net is needed for use by the Thunderbird team as the root domain for thunderbird use. That domain should not be used for any shared Mozilla purpose.

As far as I have been able to tell, all in-product links have used mozillamessaging.com as the domain, with redirects to servers defined using thunderbird.net. Please change the autoconfig servers to use some other non-thunderbird domain, such as mozillamessaging.com, instead of thunderbird.net, and free thunderbird.net for use independently of Mozilla. autoconfig is currently used by the Gaia email app.
I think this component is about the abandoned effort to create a Django app to create ISPDB database entries.  Although looking at the other bugs, it also seems like a ton of bugs that should have been in the ISPDB database entries component are mis-filed in here too.

Which is to say I don't think this bug will catch the attention of anyone in here.
Andrew, can you suggest an appropriate component for this?
Unfortunately, I have no idea.  I usually cargo cult existing bugs when I need to request server infra stuff to happen and the component descriptions aren't clear enough.  I'd email :gozer or that other person you'd contacted about ispdb server infra directly.
Let me make a stab at the component. Usually I get it wrong, but then I get corrected and progress is made.
Assignee: nobody → server-ops-webops
Component: ISPDB Server → WebOps: Other
Product: Webtools → Infrastructure & Operations
QA Contact: smani
Version: Trunk → unspecified
Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/2290]
:gozer, I'm not sure any of us are qualified to understand the next-steps here, so as a tentative next-step I'm asking you how to proceed - or who can help us next.
Flags: needinfo?(gozer)
Okay, I've searched the old memory banks and I can give you the what's up here.

A long long time ago, in a galaxy far far away, there was mozillamessaging, so I made sure that all
in-product links baked into thunderbird pointed at live.mozillamessaging.com, see:

http://mxr.mozilla.org/comm-central/search?string=mozillamessaging.com&tree=comm-central

At some point later, mozillamessaging was no more, and we wanted to get rid of that domain, so I grabbed thunderbird.net, moved all services there, put in place redirects from mozillamessaging.com to there, then filed Thunderbird bug 1169782 to get rid of the references to mozillamessaging.com in favor of the more generic and nice looking thunderbird.net

As one can see, none of that was done:

http://mxr.mozilla.org/comm-central/search?string=thunderbird.net&tree=comm-central

So the current situation is that all TB installations in the wild still talk to mozillamessaging.com, not thunderbird.net, however, that redirects to the thunderbird.net flavors of the backend services.

You can't kill any of that without breaking thunderbird, unfortunately.

Hope that makes more sense now?
Flags: needinfo?(gozer)
(In reply to Philippe M. Chiasson (:gozer) from comment #6)
> So the current situation is that all TB installations in the wild still talk
> to mozillamessaging.com, not thunderbird.net, however, that redirects to the
> thunderbird.net flavors of the backend services.

So, if I read this right, to satisfy comment 0, we would need to move services *back* to mozillamessaging.com, deprecating/removing the redirects to thunderbird.net in the process, and then everything would be good?
(In reply to Richard Soderberg [:atoll] from comment #7)
> So, if I read this right, to satisfy comment 0, we would need to move
> services *back* to mozillamessaging.com, deprecating/removing the redirects
> to thunderbird.net in the process, and then everything would be good?

This is consistent with my understanding of the situation.

Restating the use-cases as I understand them (as someone working on the Firefox OS Gaia Mail app):

1) The ISPDB Autoconfig mechanism used by Thunderbird, Firefox OS Gaia Mail, and some other mail app consumers needs to be hosted somewhere:
1a) If it was ever changed, we'd still want the word "mozilla" in the name so that consumers of the Firefox OS mail app and others understand why their devices are connecting to a domain that's not their mail server.  The "thunderbird" name/brand has nothing to do with the Firefox OS mail app and would be confusing, which is why we avoided making the transition in the gaia mail app.
1b) For legacy reasons, this domain is already (live.)mozillamessaging.com in many, many shipped mail clients, so changing it to anything else is not a clear win in terms of being able to retire things.  Changing it to something more explicitly Mozilla, like "mailconfig.mozilla.org", would arguably be an improvement but would also need cookie-auditing, etc.

2) The Thunderbird project wants to be able to use thunderbird.net without having to hassle or depend on MoCo resources.  (I'm not sure how Mozilla Community IT intersects?)  Presumably, if Thunderbird moves entirely away from the Mozilla umbrella and trademarks were appropriately transferred, root domain ownership might also transfer elsewhere, but for now it seems like MoCo IT would end up just pointing the nameservers for thunderbird.net elsewhere or to a self-serve infrastructure once this bug is resolved consistent with comment 7?  (I'm just stating these things for clarity; presumably one of the outcomes of whatever's happening with MoFo and Thunderbird discussions would be clear delineations of ownership and control of the trademarks and the domains.)
Assignee: server-ops-webops → nmaul
What would it take to get some progress made on this?
If I could wish for one thing, it would be for someone who used to code on the app to help us adjust it to the new domain (and verify that it works, once we've done so). None of us in Webops have ever worked on this app, and it's just been silently left untouched for years. So if you can locate anyone who's actually *coded* on it, or *set it up*, or any of that, and inspire them to spend an hour or two of time in a meeting with us to do this work, that's what would get it done the fastest.
:gozer is the only expert that I know is available. Is there any code involved? I thought it was all static pages.
If it helps to stand things up fresh, that's something I can probably help doing.

As I understand it, the ISPDB server has 2 main things going on:
1) Dynamic DNS MX lookups.  This is presumably some type of minimal magic CGI script.
2) The static ISPDB XML files, which are also passed through a script to support the v1 spec and to let a single file like gmail.com be symlinked from the aliases (googlemail.com, etc.) defined in the XML file, so that lookups of those can happen statically without any CGI.
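
To make the symlink idea in 2) concrete, here is a minimal sketch of how alias domains could be read out of one config file. It is not the actual convert.py. The element names (`clientConfig`, `emailProvider`, `domain`) follow the published autoconfig file format, and the sample document and function names are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, shaped like an ISPDB entry (not copied from the real DB).
SAMPLE = """<clientConfig version="1.1">
  <emailProvider id="googlemail.com">
    <domain>gmail.com</domain>
    <domain>googlemail.com</domain>
  </emailProvider>
</clientConfig>"""


def alias_domains(xml_text):
    """Return every <domain> listed directly under <emailProvider>."""
    root = ET.fromstring(xml_text)
    return [d.text.strip() for d in root.findall("./emailProvider/domain") if d.text]


def plan_symlinks(canonical_name, xml_text):
    """Map each alias domain to the canonical file it would be linked to."""
    return {
        domain: canonical_name
        for domain in alias_domains(xml_text)
        if domain != canonical_name
    }
```

With the sample above, `plan_symlinks("gmail.com", SAMPLE)` maps `googlemail.com` to `gmail.com`; a build step could then create one symlink per entry so every lookup stays a static file fetch.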

Especially if pointed at some existing modern mozilla infra that does things like this (simple CGI, simple static build scripts either as part of deploy or initial check-in as a 'binary' byproduct), I can help modernize things.  (Like a docker image or a chef/whatever script, etc.)

We could maybe stand that up then with some test domain, test Thunderbird and gaia email against it briefly, then declare victory, swap the domain over, then burn down the old infrastructure.
(This would let us get rid of subversion as well.)
(In reply to Kent James (:rkent) from comment #11)
> :gozer is the only expert that I know is available. Is there any code
> involved? I thought it was all static pages.

It's all static pages? Interesting!

(In reply to Andrew Sutherland [:asuth] from comment #12)
> If it helps to stand things up fresh, that's something I can probably help
> doing.
> 
> As I understand it, the ISPDB server has 2 main things going on:
> 1) Dynamic DNS MX lookups.  This is presumably some type of minimal magic
> CGI script.

As in, it's serving DNS requests? Or.. a REST-like API?

> 2) The static ISPDB xml files that are also passed through a script to
> support the v1 spec and allow a single file like gmail.com to be symlinked
> to from googlemail.com/etc. defined in the XML file so that the lookups of
> those can happen statically without any CGI.

Okay, definitely a REST-ish API.

> Especially if pointed at some existing modern mozilla infra that does things
> like this (simple CGI, simple static build scripts either as part of deploy
> or initial check-in as a 'binary' byproduct), I can help modernize things. 
> (Like a docker image or a chef/whatever script, etc.)

We do actually have specifically this. We auto-update from a Git repository, 'you' deploy code at will and make things awesome.

> We could maybe stand that up then with some test domain, test Thunderbird
> and gaia email against it briefly, then declare victory, swap the domain
> over, then burn down the old infrastructure.

That is certainly an appealing choice. Ping me in a couple days and we'll see what we can work out here.
Blocks: 1169782
(In reply to Richard Soderberg [:atoll] from comment #14)
> > As I understand it, the ISPDB server has 2 main things going on:
> > 1) Dynamic DNS MX lookups.  This is presumably some type of minimal magic
> > CGI script.
> 
> As in, it's serving DNS requests? Or.. a REST-like API?

Hitting the following with a simple GET: https://live.mozillamessaging.com/dns/mx/mozilla.com
nets you the following:
"""
aspmx.l.google.com
alt1.aspmx.l.google.com
alt2.aspmx.l.google.com
aspmx2.googlemail.com
aspmx3.googlemail.com
"""

I think this might just be "dig +short -t mx mozilla.com" with the mx priorities stripped off the front and the terminal domain period stripped.  Because when I run that locally, I get:
"""
10 aspmx2.googlemail.com.
5 alt2.aspmx.l.google.com.
10 aspmx3.googlemail.com.
5 alt1.aspmx.l.google.com.
1 aspmx.l.google.com.
"""

> > 2) The static ISPDB xml files that are also passed through a script to
> > support the v1 spec and allow a single file like gmail.com to be symlinked
> > to from googlemail.com/etc. defined in the XML file so that the lookups of
> > those can happen statically without any CGI.
> 
> Okay, definitely a REST-ish API.

The actual accesses by the client are all simple GETs.
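
As a rough illustration of the `dig +short` guess above (priorities stripped, trailing dot stripped), here is a small sketch. It is an assumption about what the server-side script does, not its actual code, and sorting by priority and then by hostname is only inferred from the order of the sample output.

```python
def normalize_mx(dig_output):
    """Turn `dig +short -t mx` output into bare hostnames, best priority first."""
    records = []
    for line in dig_output.splitlines():
        parts = line.split()
        # Expect "<priority> <hostname.>"; skip anything else.
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        records.append((int(parts[0]), parts[1].rstrip(".")))
    # Sort by priority, then hostname (assumed from the observed ordering).
    return [host for _, host in sorted(records)]
```

Feeding it the five lines from the local `dig` run reproduces the five hostnames in the order returned by the /dns/mx/ endpoint.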

The flow for things is:
1) I check changes into http://viewvc.svn.mozilla.org/vc/mozillamessaging.com/sites/autoconfig.mozillamessaging.com/trunk/
2) this script here runs every 30 minutes or so, I think there's a cron job somewhere: http://svn.mozilla.org/mozillamessaging.com/sites/ispdb.mozillamessaging.com/trunk/tools/convert.py
3) It spews its output at https://live.mozillamessaging.com/autoconfig/v1.1/ and https://live.mozillamessaging.com/autoconfig/v1.0/

Note that this whole bug is mainly just about the fact that even though both Thunderbird and Gaia Email try and just use live.mozillamessaging.com, the redirect bounces them to autoconfig.thunderbird.net.
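
For reference, the two client-side GETs discussed in this comment can be sketched as plain URL construction. The base host and path layout are taken from the URLs quoted in this bug; the function names are hypothetical, and real clients do more (fallbacks, error handling) than this shows.

```python
BASE = "https://live.mozillamessaging.com"


def domain_of(email):
    """Return the lowercased domain part of an email address."""
    return email.rsplit("@", 1)[1].lower()


def autoconfig_url(email, version="v1.1"):
    """URL of the static ISPDB entry for the address's domain."""
    return f"{BASE}/autoconfig/{version}/{domain_of(email)}"


def mx_lookup_url(email):
    """URL of the MX lookup endpoint for the address's domain."""
    return f"{BASE}/dns/mx/{domain_of(email)}"
```

Both URLs are what gets bounced by the redirect to the thunderbird.net hosts.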

> That is certainly an appealing choice. Ping me in a couple days and we'll
> see what we can work out here.

:rkent, feel free to ping me to ping further, or ping here, etc.
Although I recognize that getting off of subversion and modernizing this is a worthwhile goal, isn't the question of freeing thunderbird.net just a matter of redoing some redirects to a different server name? It is a purely intermediary identifier that could be anything you wanted it to be, since it is not used publicly.
See Also: → 1166192
Just as a heads up:

  <VirtualHost *:80 *:81>
      ServerName broker-live.mozillamessaging.com
      Redirect permanent / https://broker.thunderbird.net/
  </VirtualHost>
  
  <VirtualHost *:80 *:81>
      ServerName autoconfig-live.mozillamessaging.com
      Redirect permanent / https://autoconfig.thunderbird.net/
  </VirtualHost>

Once Thunderbird receives these redirects, it will permanently cache them until the user resets various caches or starts a new profile. So we can point them elsewhere, but all Thunderbird instances that previously visited this redirect *one or more times* will attempt to contact Thunderbird.net directly, rather than honoring any change we make today.

Is that acceptable?
broker-live.mozillamessaging.com contains information that is specific to Thunderbird (where we are suggesting an email provider based on business relationships), so this link should remain on thunderbird.net. We should change the in-product broker-live.mozillamessaging.com link to reference thunderbird.net directly. It sounds like you should continue to maintain the thunderbird.net content for that. Can we agree on that and get rid of broker-live.mozillamessaging.com in-product for Thunderbird 45?

Where does the content for https://broker.thunderbird.net/ come from?

For https://autoconfig.thunderbird.net/ this is the general configuration information that should be moved to a new domain, presumably mozillamessaging.com

Though one new wrinkle is that the original motivation was that we shared this with FirefoxOS email app. Not sure if the FirefoxOS status change should cause us to rethink this? Let's NI asuth on this question.

Assuming that we go forward though, isn't it possible to add a new redirect in thunderbird.net that sends https://autoconfig.thunderbird.net/ to whatever we agree for the autoconfig database? Then thunderbird.net would need to maintain that redirect for an extended period of time.

I suppose the simple answer to "Is that acceptable?" is No.
Flags: needinfo?(bugmail)
Permanent is a misnomer in this case.  The 301 response is cached and honored according to the Expires directive.  Once the response that was a 301 expires, we'll talk to the server again.

When I hit https://live.mozillamessaging.com/autoconfig/v1.1/mozilla.com just now I got:
Expires: Fri, 12 Feb 2016 02:25:06 GMT

And when I hit https://live.mozillamessaging.com/dns/mx/mozilla.com I got:
Expires: Fri, 12 Feb 2016 02:28:18 GMT

I retrieved these via "wget -S"
Depends on: 1247721
(In reply to Kent James (:rkent) from comment #18)
> broker-live.mozillamessaging.com contains information that is specific to
> Thunderbird (where we are suggesting an email provider based on business
> relationships), so this link should remain on thunderbird.net We should
> in-product change the broker-live.mozillamessaging.com to directly reference
> thunderbird.net  It sounds like you should continue to maintain the
> thunderbird.net content for that. Can we agree on that and get rid of
> broker-live.mozillamessaging.com in-product for Thunderbird 45?

Tentative OK here, with the usual caveat about "if we need developer resources, we might reach out".

> Where does the content for https://broker.thunderbird.net/ come from?

      WSGIScriptAlias /provider /data/www/broker.thunderbird.net/broker/broker.wsgi process-group=broker-app application-group=broker-app

[remote "origin"]
	fetch = +refs/heads/*:refs/remotes/origin/*
	url = git@github.com:mozilla/broker.git

commit 22b81424d335afea65848712c86526a8b268801b
Author: Andrei Hajdukewycz <sancus@off.net>
Date:   Mon May 12 06:23:56 2014 -0400

    remove hover

> For https://autoconfig.thunderbird.net/ this is the general configuration
> information that should be moved to a new domain, presumably
> mozillamessaging.com

Sure - but as noted, we can't "move" it for any existing Thunderbird profile on any existing release. We'd have to ship an update that changes the URL to one *without* a permanent redirect, and that won't help the users that don't update.

> Though one new wrinkle is that the original motivation was that we shared
> this with FirefoxOS email app. Not sure if the FirefoxOS status change
> should cause us to rethink this? Let's NI asuth on this question.

Yes, we should.

> Assuming that we go forward though, isn't it possible to add a new redirect
> in thunderbird.net that sends https://autoconfig.thunderbird.net/ to
> whatever we agree for the autoconfig database? Then thunderbird.net would
> need to maintain that redirect for an extended period of time.

Yep, this is fine, as long as it's not the original redirect destination (autoconfig-live.mome).

> I suppose the simple answer to "Is that acceptable?" is No.

Not a problem, this is how this stuff goes with ancient sites.
(In reply to Andrew Sutherland [:asuth] from comment #19)
> Permanent is a misnomer in this case.  The 301 response is cached and
> honored according to the Expires directive.  Once the response that was a
> 301 expires, we'll talk to the server again.

Nope, sorry. I wish that was the case. But it's not:

https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpResponseHead.cpp#446

    // These responses can be cached indefinitely.
    if ((mStatus == 300) || (mStatus == 410) || nsHttp::IsPermanentRedirect(mStatus)) {
        LOG(("nsHttpResponseHead::ComputeFreshnessLifetime [this = %p] "
             "Assign an infinite heuristic lifetime\n", this));
        *result = uint32_t(-1);
        return NS_OK;
    }

https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpChannel.cpp#4731

    if (nsHttp::IsPermanentRedirect(mRedirectType))
        redirectFlags = nsIChannelEventSink::REDIRECT_PERMANENT;

https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttp.cpp#315

bool
nsHttp::IsPermanentRedirect(uint32_t httpStatus)
{
  return httpStatus == 301 || httpStatus == 308;
}
I could absolutely be wrong, I'm mainly going off of :bzbarsky's reply to http://stackoverflow.com/questions/6980192/firefox-5-caching-301-redirects which seems like a reasonable thing.

http://httpd.apache.org/docs/2.2/mod/mod_alias.html#redirect says "permanent" means 301, not 300.  So it seems like our freshness won't be boosted to infinity.  Also the responses I got from my wget's are actually 302's, not 301's, suggesting there's some other override in place making things even more fine for us.  Pasting below, but only the redirect stages, not the final results:


$ wget -S https://live.mozillamessaging.com/autoconfig/v1.1/mozilla.com
--2016-02-11 15:25:06--  https://live.mozillamessaging.com/autoconfig/v1.1/mozilla.com
Resolving live.mozillamessaging.com (live.mozillamessaging.com)... 63.245.213.24
Connecting to live.mozillamessaging.com (live.mozillamessaging.com)|63.245.213.24|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 302 Found
  Server: Apache
  X-Backend-Server: pp-web03
  Cache-Control: max-age=21600
  Content-Type: text/html; charset=iso-8859-1
  Date: Thu, 11 Feb 2016 20:25:06 GMT
  Location: https://autoconfig.thunderbird.net/v1.1/mozilla.com
  Keep-Alive: timeout=20, max=998
  Expires: Fri, 12 Feb 2016 02:25:06 GMT
  Connection: Keep-Alive
  X-Cache-Info: caching
  Content-Length: 235


$ wget -S https://live.mozillamessaging.com/dns/mx/mozilla.com
--2016-02-11 15:28:17--  https://live.mozillamessaging.com/dns/mx/mozilla.com
Resolving live.mozillamessaging.com (live.mozillamessaging.com)... 63.245.213.24
Connecting to live.mozillamessaging.com (live.mozillamessaging.com)|63.245.213.24|:443... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 302 Found
  Server: Apache
  X-Backend-Server: pp-web04
  Cache-Control: max-age=21600
  Content-Type: text/html; charset=iso-8859-1
  Date: Thu, 11 Feb 2016 20:28:18 GMT
  Location: https://mx.thunderbird.net/dns/mx/mozilla.com
  Keep-Alive: timeout=20, max=996
  Expires: Fri, 12 Feb 2016 02:28:18 GMT
  Connection: Keep-Alive
  X-Cache-Info: caching
  Content-Length: 229
Ugh, sorry, I missed the "|| nsHttp::IsPermanentRedirect(mStatus)" part in Thunderbird because of line-wrapping; freshness will be updated. That's super dumb.  But yeah, we're actually serving 302's for the things that matter.
(In reply to Kent James (:rkent) from comment #18)
> Though one new wrinkle is that the original motivation was that we shared
> this with FirefoxOS email app. Not sure if the FirefoxOS status change
> should cause us to rethink this? Let's NI asuth on this question.

Yeah, let's rethink.  And not just because of the redirect dumbness.

== Short Answer / Proposal:

Let's go back to the pre-Gaia plan.  Let's:
- have Thunderbird switch its URLs directly over to thunderbird.net
- leave these redirects in place for Firefox OS for now since it means less server changes.
- have Thunderbird eventually stand up its own instance of the ISPDB as part of the cut-over.
- have any successor/continued Firefox OS mail that is MoCo supported (not a given) stand up its own ISPDB solution which may or may not resemble the current setup.  (But will still be a collaborative effort backed by the same canonical repository/database.)

== Long Answer / Rationale:

Although many things are confusingly up in the air with Firefox OS, one thing we do know is that no more commercial phones will be shipped.  This is nice because the big problem was we could never update the email app so the URLs we used were frozen in stone.  (Also, there were a lot of potentially confusing branding things going on where using a Thunderbird domain might cause privacy-conscious users to reasonably be confused and concerned.)  Note that devices are still supported for some time yet, so live.mozillamessaging.com does want its ISPDB-ish URLs to continue to work with or without redirects until then.  And the MX lookup is really the most important thing.

Going forward, whatever form Gaia Mail and its friends take on will be easily updated and may indeed have some hosted form where it has its own domain and the ISPDB can just have its own checkout on that domain and it can be JSON or whatever.

My rationale is:
- Thunderbird disentangling.  Thunderbird should be accessing domains owned by Thunderbird.
- This lets Mozilla hypothetically pull the plug on the mozillamessaging.com domain once Firefox OS is no longer supported maybe.
- Some other consumers like Gnome Evolution already replicated the DB onto their own infra.
- Any consumers that don't do that can change their URLs to the Thunderbird infra when appropriate.  Presumably we'd announce that the thunderbird.net is the more guaranteed to live-on endpoint so that lurkers have time to make changes to their apps, etc.
Flags: needinfo?(bugmail)
Oh, hooray, we're not that dumb.  Starting from https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/nsHttpResponseHead.cpp#427 we'll try the "max-age" header then the "expires" header.  We only set things to infinity if the server is super dumb.  Ours is not.  So 6 hours expiration is the word (max-age=21600, which matches the gap between the Date and Expires headers.)
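
The lookup order described here can be modeled in a few lines. This is a simplified sketch and not a port of nsHttpResponseHead::ComputeFreshnessLifetime: the function name is hypothetical, headers are a plain dict, and the Last-Modified heuristic that the real code also applies is left out (this sketch returns 0 instead).

```python
import re
from email.utils import parsedate_to_datetime

INFINITE = float("inf")
# Statuses the quoted Gecko code treats as cacheable indefinitely.
INDEFINITE_STATUSES = {300, 301, 308, 410}


def freshness_lifetime(status, headers):
    """Seconds a cached response stays fresh, in the order described above."""
    # 1) An explicit max-age wins.
    match = re.search(r"max-age=(\d+)", headers.get("Cache-Control", ""))
    if match:
        return int(match.group(1))
    # 2) Otherwise fall back to Expires minus Date.
    if "Expires" in headers and "Date" in headers:
        delta = parsedate_to_datetime(headers["Expires"]) - parsedate_to_datetime(headers["Date"])
        return max(0, int(delta.total_seconds()))
    # 3) Only with no explicit lifetime does a permanent redirect go to infinity.
    if status in INDEFINITE_STATUSES:
        return INFINITE
    return 0
```

Applied to the 302 responses pasted earlier (`Cache-Control: max-age=21600`), this gives 21600 seconds, i.e. six hours; a bare 301 with no caching headers is the case that would be kept indefinitely.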
(In reply to Andrew Sutherland [:asuth] from comment #25)
> Oh, hooray, we're not that dumb.  Starting from
> https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> nsHttpResponseHead.cpp#427 we'll try the "max-age" header then the "expires"
> header.  We only set things to infinity if the server is super dumb.  Ours
> is not.  So 6 hours expiration is the word (or whatever the date math is.)

Yesssss. Thank you so much for finding this. Is that "all the way back to Firefox 5" code?
(In reply to Richard Soderberg [:atoll] from comment #26)
> Yesssss. Thank you so much for finding this. Is that "all the way back to
> Firefox 5" code?

Yeah, the logic has been doing both those checks first since the initial hg import in 2007, rev 1.  That corresponds to Firefox 2.0.x it looks like.  Presumably it was that way in CVS for a while too.
Interesting. So the net result is we are going to WONTFIX this bug, and do changes in bug 1169782 to remove the mozillamessaging links from Thunderbird, right?

Instead, may I suggest that we morph this bug into a request to provide thunderbird.net versions of anything we are missing that is currently mozillamessaging.com? Or should we file a new bug for that? There are a number of mozillamessaging.com links in current Thunderbird that just need to be either hardlinked, or have an equivalent thunderbird.net redirect set up.
I should also indicate I don't really have a preference what we do.  Like, not at all.  The plan before made sense when 1) Firefox OS was shipping devices and lively and 2) Thunderbird was presumed to still get to use MoCo resources and to be living under the Mozilla umbrella.  Which is why I was for keeping live.mozillamessaging.com alive as a shared resource.  Now, from my perspective, it just has to stay alive until Firefox OS is no longer supported, and Thunderbird should do whatever is best for Thunderbird.  I'm still happy to try and provide assistance in whatever makes sense, since I think the ISPDB is a good community/web resource.  It just now only really has to be 1 thing to 1 client.
How about this plan, then:

1. Someday, help the Thunderbird project set up non-IT instances of autoconfig/broker .thunderbird.net.
2. Then, tell the IT instances to accept autoconfig/broker .mozillamessaging.com traffic.
3. Finally, point the .thunderbird.net domains to the Thunderbird non-IT instances.

That seems like it addresses every point above - whenever TB project is ready to host the production sites, it's very easy to implement the split - and until then, there's no prep work required.

Does that meet everyone's needs?
(Buried in #2 is "remove the redirects", which I forgot to state explicitly.)
Blocks: 552333
Assignee: nmaul → server-ops-webops
Assignee: server-ops-webops → rsoderberg
Assignee: rsoderberg → server-ops-webops
(In reply to Richard Soderberg [:atoll] from comment #30)
> How about this plan, then:
> 
> 1. Someday, help the Thunderbird project set up non-IT instances of
> autoconfig/broker .thunderbird.net.
> 2. Then, tell the IT instances to accept autoconfig/broker
> .mozillamessaging.com traffic.
> 3. Finally, point the .thunderbird.net domains to the Thunderbird non-IT
> instances.
> 
> That seems like it addresses every point above - whenever TB project is
> ready to host the production sites, it's very easy to implement the split -
> and until then, there's no prep work required.
> 
> Does that meet everyone's needs?

hey all, I'm going to be the primary ops contact on the Thunderbird side for this. I want to get these services duplicated and taken off your hands as soon as possible.

I don't know precisely what kind of hardware is running this stuff, but it seems like it's all pretty simple apache services, so I'm guessing something like a 4 or 6 core Linode VPS would be more than sufficient.

I think all I will really need is a bit of guidance on making sure I set up everything perfectly identically before we start changing DNS, since all these services are based on htaccess/other stuff in the public SVN, right?
I've scheduled the meeting to get work/time allocation assigned to this task. Will report back.
Assignee: server-ops-webops → rsoderberg
To hopefully speed this up, I believe there's only a couple things I still need.

https://broker-live.sancus.ca and https://live.sancus.ca/thunderbird/start/?locale=en-GB&version=45.7.1&os=WINNT&buildid=20170206074256 are working now.

I don't have the script/code for mx.thunderbird.net, doesn't seem to be in SVN anywhere I can see, and while I can probably write a replacement, I'd rather have the actual file from the existing servers to 100% ensure that I don't miss anything weird or silly.

The script https://svn.mozilla.org/mozillamessaging.com/sites/ispdb.mozillamessaging.com/trunk/tools/convert.py needed to produce the v1.0 config files for autoconfig doesn't seem to run with modern python2.7 and the newest lxml. I can probably fix it, but it would be easier to just get that information and the cron script that runs it from the existing servers.

In any case, all of these things are implemented with ServerName *.thunderbird.net aliased to *.sancus.ca for testing, so as soon as we get mx.thunderbird.net and autoconfig.thunderbird.net sorted, we can do a DNS switchover.
On IRC, you said you found the 'live-mx' repository. I'll still provide you code snapshots from all three servers. There's also this cron:

/etc/cron.d/updates:*/30  * * * *   root cd /data/static; svn-up.sh src/autoconfig.thunderbird.net/autoconfig; svn-up.sh src/autoconfig.thunderbird.net/tools; src/autoconfig.thunderbird.net/update
After testing with https://github.com/Sancus/thundernest-ansible/blob/master/testsites.py I'm confident that these services are outputting the same stuff, with one exception, which is that the DNS MX results for mozilla.com are all capitalized when output by the perl script, but lowercase with my python script. Fortunately, this isn't important.
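
To show why the capitalization difference is harmless, here is a sketch of a case-insensitive comparison of two MX responses. It is not taken from testsites.py; the function names are hypothetical, and it assumes one hostname per line as in the /dns/mx/ output shown earlier in this bug.

```python
def normalize_body(body):
    """Lowercase each non-empty line, keeping the original order."""
    return [line.strip().lower() for line in body.splitlines() if line.strip()]


def same_mx_answer(old_body, new_body):
    """True if two MX responses list the same hosts in the same order."""
    return normalize_body(old_body) == normalize_body(new_body)
```

DNS hostnames are case-insensitive, so comparing the lowercased lines is enough to treat the perl and python outputs as equivalent.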

mx.sancus.ca, live.sancus.ca, and broker-live.sancus.ca all match.

The only remaining issue is autoconfig seems to have a huge list of mime types or something to set a content-type and content-language for each of the xml files. I'm not sure if this is actually important at all, but if I can get that, everything should be the same.
The content-type issue is resolved. The following servers:
https://mx.sancus.ca/dns/mx/{domain}
https://live.sancus.ca/{redirects}
https://broker-live.sancus.ca/provider/list
https://autoconfig.sancus.ca/v1.1/

output the same content as Mozilla servers. I've tested autoconfig and broker-live with Thunderbird, and it works with both as expected: it is able to look up email servers and talk to the gandi mail account suggestion service. The script mentioned in the previous comment confirms that web server output is the same, and most of the headers are as well, aside from some caching and apache differences.

I believe next steps are that we:
1) Copy the DNS configurations for thunderbird.net and mozillamessaging.com into the DNS we are using (Cloudflare).
2) Change the DNS servers for the domains with the registrar.
3) Run the lets-encrypt script on our side to make SSL certs for those domains.
4) Change the server IPs in DNS to make the final switchover.

Registrar transfer of the thunderbird.net domain can happen later and is not important to this process. Also, it seems good to keep the Mozilla servers running for a few weeks, as in case of disaster we can always change the DNS back very quickly.
Summary: Remove thunderbird.net domain from use in email autoconfiguration → [tracker] Remove thunderbird.net domain from use in email autoconfiguration
Depends on: 1341780
Depends on: 1341783
The new DNS servers for thunderbird.net will be kia.ns.cloudflare.com and nick.ns.cloudflare.com when ready.

After discussion on IRC, there is no particular need for us to take over DNS control of mozillamessaging.com. There are no plans or reason to ever create any new subdomains on it or to use the domain in the future, and all we need are a few CNAMEs and redirects, which are covered in Bug 1341783.
I see no discussion of continuous testing of the availability of the new servers. I do not think we should consider switching to them until there is some availability notification configured and operational. I have used site24x7.com for this service in the past, but there are other similar providers. Comment 37 "output the same content as Mozilla servers" is the sort of thing that needs to be built into a continuous test, not just done with manual one-off testing.

In addition to the continuous testing, we also need procedures of what to do in the case of failure, along with more than one person capable of responding to the issues. One of the project goals is to build capacity to manage this sort of thing.
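
A continuous check of this kind could look roughly like the sketch below. It is only an illustration under stated assumptions: the function names are hypothetical, the expected bodies would have to be captured from the existing Mozilla servers, and alerting/scheduling (for example via a hosted monitoring provider) is left out entirely.

```python
from urllib.request import urlopen


def http_fetch(url, timeout=10):
    """Fetch a URL and return (status, body). Raises OSError on failure."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def check_endpoints(expected, fetch=http_fetch):
    """Compare each URL's body to its expected bytes and list any failures.

    `expected` maps URL -> expected body; `fetch` is injectable for testing.
    """
    failures = []
    for url, expected_body in expected.items():
        try:
            status, body = fetch(url)
        except OSError as exc:  # covers URLError/HTTPError and timeouts
            failures.append((url, f"unreachable: {exc}"))
            continue
        if status != 200:
            failures.append((url, f"status {status}"))
        elif body != expected_body:
            failures.append((url, "content mismatch"))
    return failures
```

Run periodically, an empty list means every endpoint is up and still returns the reference content; anything else is what the on-call procedure would need to respond to.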
Depends on: 1341798
(In reply to Kent James (:rkent) from comment #39)
> I see no discussion of continuous testing of the availability of the new
> servers. I do not think we should consider switching to them until there is
> some availability notification configured and operational. I have used
> site24x7.com for this service in the past, but there are other similar
> providers.

I know that Cloudflare is involved for the DNS here but I don't know if it's being used as a caching / uptime layer or not. :sancus, can you discuss your architecture with :rkent and see if it passes muster here?

> Comment 37 "output the same content as Mozilla servers" is the
> sort of thing that needs to be built into a continuous test, not just done
> with manual one-off testing.

Agreed in general. IT does not do any such testing today, so I don't consider this essential for the migration. Please file this as a separate bug for the Thunderbird team - they can work on it in parallel, and ship it regardless of whether we've migrated or not.

> In addition to the continuous testing, we also need procedures of what to do
> in the case of failure, along with more than one person capable of
> responding to the issues. One of the project goals is to build capacity to
> manage this sort of thing.

We can delay handing this off for a while. When is the soonest that the project could finish building the necessary capacity? The clusters hosting these apps are being EOL'd by us, and it's essential that we migrate service.
Flags: needinfo?(sancus)
Flags: needinfo?(rkent)
I am setting up the requested monitoring and documentation. Will respond in the thread when done, should be done by tomorrow. I don't think this should delay us significantly.
Flags: needinfo?(sancus)
OK, https://github.com/Sancus/thundernest-ansible/blob/master/README.md now has complete setup instructions. I've destroyed the whole setup and rebuilt it following those instructions twice now, so they should be pretty solid for anyone who has some basic technical knowledge.

Also, we have monitoring set up with the site24x7 API: https://github.com/Sancus/thundernest-ansible/blob/master/monitoring/health_tests.py

Should be good to move forward now.
Andrei and I had extensive talks about this today. The switchover needs to occur in two steps.

First, we need to switch the thunderbird.net domain administration to be managed by Thunderbird, but initially with the same information that is included in the existing name servers.

Second, we need to add servers to support the various uses of thunderbird.net and mozillamessaging.com. We cannot really test those instructions adequately as there are embedded domain names in the instructions, and ideally those should point to thunderbird.net (but that requires that we can control thunderbird.net to even do the tests).

There is also the parallel issue of switching the servers pointed to in the Thunderbird product for the upcoming release 52 to use thunderbird.net rather than mozillamessaging.com.

So for now, let's focus on the transfer of DNS servers from mozilla servers to a provider controlled by Thunderbird.

The main issue is evaluating whether the DNS server on a free Cloudflare account is a reasonable choice for hosting the thunderbird.net DNS. If I understand comment 25 correctly, DNS expiration is effectively 4 hours, so for users who start up daily, typical DNS usage would be one hit per day for any URL that is routinely hit once per startup.

So, which DNS entries are likely to be called at startup? Those would include the start page, blocklist, and any update URLs. The configuration URLs are only going to be called on initial install, so their volumes are 2-3 orders of magnitude lower.

Looking at usage, live.mozillamessaging.com is used in the start page URL, and is slated for conversion to thunderbird.net in bug 1341783. See pref("mailnews.start_page.url","https://live.mozillamessaging.com/%APP%/start?locale=%LOCALE%&version=%VERSION%&os=%OS%&buildid=%APPBUILDID%");

I have not investigated the update URLs.

With 10,000,000 daily users (per blocklist hits, lower on weekends) let's say we have 60,000,000 hits per week and 250,000,000 hits per month. Although cloudflare claims they have no traffic limits for free accounts, I see some skepticism for that in looking at online comments, and difficulties in comparison since most look at bandwidth issues, not DNS hits. That many hits would be 100x the limit for the $153/year EasyDNS enterprise plan, and would cost $100 per month on AWS Route 53.

But I am not an expert here, just an accomplished skeptic. Can I get other opinions about the risk that our traffic levels here are beyond levels that cloudflare would consider appropriate for a free account?

From a pure financial standpoint as Thunderbird Treasurer, donation income from the startpage is about $2500 per day, so that is the direct cost of a day of DNS downtime to the start page.
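The figures in this comment can be cross-checked with a little arithmetic. This is only a sanity check on the numbers as stated above; the per-million rate is what the quoted Route 53 figure implies, not a price taken from any price list.

```python
# All inputs are the estimates given in the comment above.
DAILY_USERS = 10_000_000
WEEKLY_HITS = 60_000_000
MONTHLY_HITS = 250_000_000
ROUTE53_MONTHLY_USD = 100
STARTPAGE_DONATIONS_PER_DAY_USD = 2500

# If every day were a weekday, the weekly total would be 7x the daily figure.
upper_bound_weekly = DAILY_USERS * 7

# Ratio of the monthly to the weekly estimate (weeks per month).
weeks_per_month = MONTHLY_HITS / WEEKLY_HITS

# Per-million-query rate implied by the quoted monthly cost.
implied_usd_per_million = ROUTE53_MONTHLY_USD / (MONTHLY_HITS / 1_000_000)

# Donation income at risk per hour of start page downtime.
downtime_cost_per_hour = STARTPAGE_DONATIONS_PER_DAY_USD / 24

print(upper_bound_weekly)
print(round(weeks_per_month, 2))
print(implied_usd_per_million)
print(round(downtime_cost_per_hour, 2))
```

The weekly estimate of 60,000,000 sits below the 70,000,000 weekday-only bound, which is consistent with lower weekend usage.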
Flags: needinfo?(rkent)
Well I'm obviously on the side of it being fine because I made the choice to use Cloudflare. Cloudflare has a lot invested in their model of not caring about bandwidth charges *for their CDN/Cache* which consumes several orders of magnitude more bandwidth than DNS queries would consume. It seems irrational that they would be concerned about DNS queries relative to that.

All concerns about bandwidth on the free plans that I've found on the internet have to do with people under major DDoS attacks who are using Cloudflare to cache full websites. We're not doing that; even with Cloudflare caching turned on for all these services, it's all extremely tiny redirects or requests serving 2-10KB of text.
https://support.cloudflare.com/hc/en-us/articles/203502550-Does-Cloudflare-charge-for-or-limit-DNS-queries-

They explicitly state (and recently so) that they do not care about DNS queries as well, I should state for the record (although I'm sure you already looked this up).
I'm willing to be convinced that cloudflare DNS is appropriate, but let's see if any of the Mozilla folks have comments on this. Otherwise your analysis seems OK, though I am still a little nervous about using a free service with minimal direct support for critical DNS services. But not nervous enough to say no.

I'm really not the best person to be reviewing all of this in the long run.
(In reply to Kent James (:rkent) from comment #43)
> With 10,000,000 daily users (per blocklist hits, lower on weekends) let's
> say we have 60,000,000 hits per week and 250,000,000 hits per month.

Well, only a small fraction of those hits would hit Cloudflare. Once queried, the entries would be cached locally in the DNS servers of the users' ISPs.
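To put a rough number on the caching effect: with the 4-hour expiration mentioned earlier, a caching resolver needs to refetch a given record at most 24 / 4 = 6 times per day, no matter how many users sit behind it. The sketch below assumes one cache per resolver; the resolver count and the number of records looked up per startup are invented for illustration only.

```python
def max_authoritative_queries_per_day(resolvers, records, ttl_hours):
    """Upper bound: each resolver refetches each record once per TTL window."""
    refreshes_per_day = 24 / ttl_hours
    return int(resolvers * records * refreshes_per_day)


TTL_HOURS = 4              # effective expiration discussed above
DAILY_USERS = 10_000_000   # estimate from comment #43
RECORDS_PER_STARTUP = 3    # assumption: start page, blocklist, update URL
RESOLVERS = 100_000        # assumption: number of distinct caching resolvers

client_lookups = DAILY_USERS * RECORDS_PER_STARTUP
authoritative_bound = max_authoritative_queries_per_day(
    RESOLVERS, RECORDS_PER_STARTUP, TTL_HOURS)

print(client_lookups)
print(authoritative_bound)
```

Under these assumed inputs, the authoritative servers would see at most a few percent of the lookups that clients make; the real ratio depends on how many resolvers are actually involved.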
(In reply to Magnus Melin from comment #47)
> (In reply to Kent James (:rkent) from comment #43)
> > With 10,000,000 daily users (per blocklist hits, lower on weekends) let's
> > say we have 60,000,000 hits per week and 250,000,000 hits per month.
> 
> Well only a small fraction of those hits would hit cloudflare. Once queried,
> the entries would be locally cached in user's ISPs local DNS server.

True. OK with no more comments, I am comfortable with switching the thunderbird.net DNS servers to cloudflare.

After that, I'll look one more time at the revised instructions to create the servers.
(In reply to Richard Soderberg [:atoll] from comment #40) 
> We can delay handing this off for a while.

Based on the above comments we should be ready to switch the DNS servers for thunderbird.net over to the cloudflare ones. Just to add that information here so you don't have to hunt for it: 'kia.ns.cloudflare.com' and 'nick.ns.cloudflare.com'.

http://i.imgur.com/jlYNBpe.png is the setup in cloudflare for confirmation.

This can happen whenever, really; I won't change the CNAME/A records until a later time that we agree on in this bug.
Any idea when we can plan to take that first step? :)
Flags: needinfo?(rsoderberg)
Flags: needinfo?(rsoderberg)
Summary: [tracker] Remove thunderbird.net domain from use in email autoconfiguration → [tracker] Transfer thunderbird.net to the Thunderbird project
Depends on: 1348343
Depends on: 1348349
Depends on: 1348367
Depends on: 1348389
Depends on: 1348391
To summarize the progress so far: once bug 1348389 is resolved, :sancus will have complete authority to cut over individual endpoints from Mozilla to TB at any time. More bugs will be filed and closed as part of this process.
Depends on: 1348395
Depends on: 1348417
Depends on: 1348418
Alright! We're handed off at an administrative level. Sancus has control over the migration away from our hosting to your hosting for each of the apps. We have a hard deadline of 14 days, unexpectedly, to complete this migration (third-party EOL of our cluster's OS) and are available for any help that remains to be offered here.
Since we need to be able to generate certificates via Let's Encrypt for the mozillamessaging.com legacy domains, I need to write a hook to do this via the web server instead of DNS auth. I will do that on Monday. Once that is done, we:

1) Point DNS at our load balancer for one super low risk domain only (support.thunderbird.net, say, which is just a redirect to support.mozilla.org). This will let us run Let's Encrypt for it and confirm that we can acquire the certificates.

2) At this point, we change all DNS to point to our load balancer, and then acquire a new Let's Encrypt cert for all the domains we need.

3) Once we have those certs, I can change our load balancer to use them and we can free up the temporary SAN cert from bug 1348395 to be refunded.

At that point, this transfer will be fully complete!

I think we can do #1-3 on Tuesday or Wednesday at the latest. Overall risk is very low since we can always quickly revert DNS changes, and the earlier we do this, the safer it is, since we'll be furthest away from the 14-day deadline for the Mozilla servers to be decommissioned.
(I am not available next Wednesday.)
It turns out that the SSL cert being served by broker-live.mozillamessaging.com somehow became incorrect during the work we did on Friday.

broker-live.mozillamessaging.com:443 uses an invalid security certificate.

The certificate is only valid for the following names:
  generic-san.mozilla.org, air.mozilla.org, outgoing.mozilla.org, securitywiki.mozilla.org, wiki.mozilla.org, www.itisatrap.org, itisatrap.org, calendar.mozilla.org, moztrap.mozilla.org, pto.mozilla.org, mx.thunderbird.net, broker.thunderbird.net, intranet.mozilla.org, iplimit.irc.mozilla.org, m.wiki.mozilla.org, fb-affiliates.mozilla.org, getfirebug.com, www.getfirebug.com, phonebook.mozilla.org, passwordreset.mozilla.org, www.webfwd.org, webfwd.net, www.webfwd.net, webfwd.org, planet.mozilla.org, planet.mozilla.de, planet.firefox.com, planet.webmademovies.org, flashblock.itisatrap.org, except.flashallow.itisatrap.org, except.flashblock.itisatrap.org, flashallow.itisatrap.org, flashsubdoc.itisatrap.org, except.flashsubdoc.itisatrap.org

I wasn't sure how long it would take to resolve this problem by paging webops, and since we already have the servers set up and working, I just switched the CNAME for broker.thunderbird.net to an A record for my load balancer. The CNAME on broker-live.mozillamessaging.com points to that, so we are actually serving broker-live.mozillamessaging.com now. Our server also has monitoring on it that would have caught this issue within 60 seconds (it was actually caught by rkent, thanks!), so I think it's safer to just leave it as-is than to fix the Mozilla server and try to swap back over. I can also then just use this domain (broker.thunderbird.net) for the Let's Encrypt work/testing tomorrow.
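For reference, the failure above is a plain name mismatch: the hostname being served does not appear among the names the certificate is valid for. A simplified sketch of that check follows; the helper names are hypothetical, the wildcard handling is deliberately minimal, and only a few of the names from the list above are used.

```python
def name_matches(hostname, pattern):
    """Simplified match: exact name, or '*.' covering exactly one leftmost label."""
    hostname, pattern = hostname.lower(), pattern.lower()
    if pattern.startswith("*."):
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return hostname == pattern


def covered(hostname, san_names):
    """True if any certificate name covers the hostname."""
    return any(name_matches(hostname, name) for name in san_names)


# A subset of the names listed on the certificate above.
SAN_SUBSET = ["generic-san.mozilla.org", "mx.thunderbird.net", "broker.thunderbird.net"]

print(covered("broker-live.mozillamessaging.com", SAN_SUBSET))
print(covered("broker.thunderbird.net", SAN_SUBSET))
```

A monitoring test pointed at the real in-product hostname, rather than at a stand-in domain, is what lets this kind of mismatch show up quickly.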
Also, all of our site monitoring tests are now pointed at the appropriate real domains (thunderbird.net/mozillamessaging.com, depending on the TB in-product URL) instead of at sancus.ca, which should prevent any further issues like this.
Depends on: 1349342
Traffic is now on our servers. It was more significant than expected, and briefly overwhelmed a Linode load balancer. Now there are two! This seems to be stable; however, I'd like to put live.mozillamessaging.com behind Cloudflare caching to reduce the traffic on our servers. Bug 1349342 is filed for the DNS changes this will require. However, this isn't urgent.
I believe we're complete now. Cheers!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard