Closed Bug 1594366 Opened 4 months ago Closed 3 months ago

emailwizard shouldn't send repeated OPTIONS requests

Categories

(Thunderbird :: Account Manager, defect)

defect
Not set

Tracking

(thunderbird_esr6871+ fixed, thunderbird71 fixed, thunderbird72 fixed)

RESOLVED FIXED
Thunderbird 72.0
Tracking Status
thunderbird_esr68 71+ fixed
thunderbird71 --- fixed
thunderbird72 --- fixed

People

(Reporter: mkmelin, Assigned: BenB)

References

(Blocks 1 open bug, Regression)

Details

Attachments

(2 files)

There are some rather pointless OPTIONS requests in the emailWizard: https://searchfox.org/comm-central/rev/41db3d110be306f546940455eb03d5982b0fa163/mail/components/accountcreation/content/emailWizard.js#434

(And to make matters worse, these get triggered a lot since it's called at the end of switchToMode.)

Testing what the supposed win would be, there doesn't appear to be any.

Without OPTIONS: autoconfig.thunderbird.net -
GET: Connecting 188ms + TLS Setup 193ms, Waiting 188ms

With OPTIONS:

  • OPTIONS: connecting 189ms, TLS Setup 200ms, waiting 495ms
  • GET: connecting 187ms + TLS Setup 191ms. waiting 188ms

I also notice these now go to live.thunderbird.net, although that just ends up redirecting to autoconfig.thunderbird.net. Andrei, can we just use autoconfig.thunderbird.net directly or is there a good reason to go through live?

Flags: needinfo?(sancus)

(In reply to Magnus Melin [:mkmelin] from comment #0)

I also notice these now go to live.thunderbird.net, although that just ends up redirecting to autoconfig.thunderbird.net. Andrei, can we just use autoconfig.thunderbird.net directly or is there a good reason to go through live?

Most client requests have always gone through live.thunderbird.net, the reason is so that we can change the server domain/urls whenever necessary simply by changing the redirects on live without making any client changes. It's proven to have some convenience value in the long term, so I don't see any reason to change it. Yes, you can just redirect individual domains, but then over the years you end up with a bunch of redirects to maintain on every random domain the way Mozilla has. Also, live.thunderbird.net is cached globally by Cloudflare so it should be quite fast for everyone.

Flags: needinfo?(sancus)
Status: NEW → ASSIGNED

Remove the unneeded OPTIONS requests

Attachment #9107125 - Flags: review?(alta88)
Attachment #9107125 - Flags: review-

Magnus, as you can imagine, these requests were not added for fun, but for an actual reason. So, please don't just remove stuff.

These pre-cache requests fixed an actual bug 1572418, where the HTTPS requests timed out and the users got no response. We noticed because that this prevented our addon to be installed due to addons.json request failing, but while investigating the bug, I also noticed that the ISPDB requests also failed for the very same reason. So, this does not only affect Owl.

This bug had devastating effects on our installations, but was extremely hard to find. We were looking for months for the cause. In our own tests, these requests were fast, as you saw as well. What happened was:

  • The DNS lookup takes time
  • The SSL requires a OSCP fetch, which is needed only for the very first request (at a given day, for that CA). This is another network request, which needs to complete before the SSL setup can finish and the original call can continue
  • The redirects you mentioned (but they alone are not a problem, and they are no problem anymore since we make these pre-flights)
  • Many end users do not have a network that is as fast as yours

All that in combination meant that we didn't see this problem, because we had DNS cached on our machines (even for new profiles), fast network etc., but a huge percentage of users had this problem.

Exotic enough was that you couldn't see the problem from the server logs, because the call arrived at the server, and logged there, but was aborted by the client due to the timeout. Yes, it sounds strange, but that's what happened.

Before we found this, I couldn't figure out why the install rates were so bad. After the fix shipped, install rates dramatically increased. I don't have exact numbers on ISPDB, and server logs are false as mentioned, but this would also mean that the ISPDB might have failed as well, so the effects were far beyond and broke the autoconfig for a large amount of end users.

So, this pre-cache is absolutely needed, in this case. We should have done this much earlier. If only we had known.

And to make matters worse, these get triggered a lot since it's called at the end of switchToMode

That's a bug, indeed. They should be triggered exactly once at the start of the dialog. We could fix that. It's sufficient just to move these calls.

Attachment #9107125 - Attachment is obsolete: true
Attachment #9107125 - Flags: review?(alta88)
Attachment #9107131 - Flags: review?(mkmelin+mozilla)
Attachment #9107131 - Attachment is patch: true
Comment on attachment 9107125 [details] [diff] [review]
bug1594366_OPTIONS.patch

There will be no network requests if manual config is chosen, period.
Attachment #9107125 - Attachment is obsolete: false
Attachment #9107125 - Flags: review- → review+
Attachment #9107131 - Flags: review?(mkmelin+mozilla) → review-
Attachment #9107131 - Attachment is obsolete: true

@jorg - this has been approved by two peers, I expect it to land and I expect any further review flipping to be ignored and to be a violation of rules, not to mention etiquette.

Attachment #9107125 - Flags: review-
Comment on attachment 9107125 [details] [diff] [review]
bug1594366_OPTIONS.patch

alta88, this is not your module. You have no right to make decisions here, or make statements like "There will be no ..., period". You are not a module owner or peer here, but only peer of RSS, which is irrelevant here.

I am the primary author of this dialog, and the effective module owner (the dialog does not have its own module), since 10 years.

I don't know where your "2 peers" comes from. I see no other peers or modules owners that have weighed in or even reviewed the patch. I see only Magnus' patch, and no reviews other than from you.

As effective module owner, I explained why Magnus' patch causes real world problems and cannot land. This causes real world breakage. Not only for Exchange, but also ISPDB requests were failing without the fix for bug 1272418. Even for users where it would not fail, the precache still makes them faster. This was a very important bug fix and there is no reason to remove that fix.
Comment on attachment 9107131 [details] [diff] [review]
Move calls up to avoid repeated requests

Ugh, this code was in the correct place in my original patch, but it got moved by the rebase; presumably some context changed which threw off the patch command. So yes, this where the code needs to be to fix bug 1572418, but without generating unnecessary further requests.
Attachment #9107131 - Flags: review+
Attachment #9107131 - Flags: review?(acelists)
Attachment #9107131 - Attachment is obsolete: false

The mail/ module owner submitted a patch (1 peer approval). The mail/ module owner trumps whatever you think you are. A peer also has the right to ask anyone else to delegate a review, and I am a peer in the project (2nd peer).

You are listed nowhere as a peer or owner of anything, not even as a core developer. That you may have written some code here (the contribution varies based on whether it's convenient for you to claim it) doesn't give you any rights, either codified or implied or 'effective'.

You and neil are paid commercial interests pushing a third party product. You and neil have engaged in Self Dealing by reviewing each others code. You have engaged in Self Dealing for the purpose of Self Enrichment. Yes these are legal terms for your actions in the jurisdiction of Thunderbird's legal domicile. You have diminished the integrity of this project, as well stated in Bug 1592258.

I want to see you both removed from having any bugzilla or other rights on this project.

Let's take a look at the history here. Exchange support was implemented in bug 1500105 and the patches that landed were approved by Magnus and Aceman (https://phabricator.services.mozilla.com/D9215). There was consensus, including the the Thunderbird Council, that Exchange support would be integrated into Thunderbird. Sadly, IMHO, the terms weren't clearly defined, but we can assume that the functionality is consensus-based. That also means it needs to function reasonably well for the 3rd party which provides that support.

A few more bugs got landed which fine-tuned (or distorted, depending on the point of view) the functionality, including bug 1572418 and bug 1571772.

The code being discussed here was introduced in bug 1572418. That bug and comment #3 explain the technical background. So apparently the code had a bug which lead to multiple requests, so that needs to be fixed. Just removing the code would regress bug 1572418 and is hence counter-productive. I have no reason to believe that Ben's fix doesn't fix the issue. If there are technical issues with the patch, please state them here. In comment #5 we read (quote): "no network requests if manual config is chosen". I guess that's still a principal we should follow. So the question is whether Ben's patch addresses that or not. We also note that manual config currently doesn't even exist on trunk, that's bug 1590636.

As for the general conduct here: Please refrain from telling us who has got the bigger stick or is the better peer. Formally, this code is in mail/ where Magnus is the module owner and Aceman and myself are peers. If you consider this to belong to Mailnews, then the owner is Joshua, and Neil is a module peer. However, that's all pretty irrelevant since we work on a consensus basis.

I certainly won't land any patch which is infringing the consensus principle here.

So let's do this:
Ben/Neil to answer how Ben's patch plays out with manual config and Magnus to look at Ben's proposal.

If you can't reach a consensus here, I will personally take the issue up with the Thunderbird Council as the ultimate authority in absence of the much talked about but never fully established engineering steering committee. I will ignore any further non-technical discussion.

And off-topic: The same goes for bug 1592258 where we will address some shortcomings introduced in bug 1571772.

Flags: needinfo?(neil)
Flags: needinfo?(ben.bucksch)
Comment on attachment 9107131 [details] [diff] [review]
Move calls up to avoid repeated requests

Magnus, you filed the bug, can you please read comment #10 and check whether the patch does the right thing. I'm not sure that comment #5 "no network requests if manual config is chosen" is meant to mean, also in view of bug 1590636.
Attachment #9107131 - Flags: feedback?(mkmelin+mozilla)

And while we're here: Ben, please rebase your patch to trunk.

Jörg wrote:

In comment #5 we read (quote): "no network requests if manual config is chosen".

That was a statement from alta88 himself, and does not reflect a project decision. Quite the contrary, that statement has always been wrong. I have a Thunderbird 60.9 and a TB Trunk autoconfig dialog open here that does quite the opposite. Manual config is just a way to configure accounts for which the autoconfig failed. That "no network requests" is a mantra that alta88 made and not reflected by the software. In fact, immediately after you start Thunderbird, it already loads the startpage - a webpage, which is a whole lot more intrusive than a JSON or XML request. If you want to avoid network requests, ...

you have to start by removing the start page.

I don't think that's what Thunderbird wants, given that many donations come from there and many project member livelihoods hang on them.

I really don't see the problem with an OPTIONS pre-flight request to a thunderbird server.

Ben/Neil to answer how Ben's patch plays out with manual config

There is no way to do manual config without going through the autoconfig. This is intentional and by design. Just open the dialog in any TB build and you'll see. It's always been like that, very intentionally, and it's been approved a thousand times.

This really was just a bug fix. I just want to work in peace.

Flags: needinfo?(ben.bucksch)
Flags: needinfo?(neil)

While I don't think it's a huge problem to do the pre-flight request, this all seems technically guaranteed to be pointless for the thing you intended it. See bug 1572418 comment 34.

Since you wrote that bug 1572418 made a huge difference in Owl uptake, I'd suggest you look at the timeline here: bug 1572418 landed the same time as bug 1571772. Coincidence? I think any jury would have the verdict pretty quickly on which one of them is the probable cause for the spike in Owl installs.

Regressed by: 1572418

You mean: bug 1572418 landed the same time as bug 1571772. Both landed on 60.9 and 68.1.

Yes, thanks for the correction. I'll fix comment 14.

Magnus, I did actually see that bug here on my system. Neil saw it was well. It's very difficult to reproduce on a developer system, due to DNS caches and other caches, but I did see it repeatedly, and so did Neil. That bug was real. I am not just hypothesizing.

It's not an edge case. Real users are much more prone to run into this than a TB developer, because they usually set up accounts first thing after installing Thunderbird. That means their DNS caches are cold, and the OSCP call needs to happen.

You wrote in bug 1572418 comment 9, you did NOT see it on your system.

I can imagine there could be fringe cases where it would take more than 5sec, but I don't see how that could be more than a tiny fraction of a percentage, especially as live.thunderbird.net is cached globally per comment 1. Digicert (used by the server) has a reported average response time of 19ms for OSCP requests. It all just seems that much more probable that exchange being shown and used by default in bug 1571772 is what, unsurprisingly, made any difference.

How about fixing the repeated requests and then you can discuss more in a follow-up. TB 68.3 ships in less than a month.

Digicert (used by the server) has a reported average response time of 19ms for OSCP requests

I don't know who said that or who measured. Maybe Digicert itself, from the server perspective. But that's definitely not the time of the OSCP request from when the client fires it, with cold DNS caches, TCP setup, TLS setup (!), HTTP request sent, server itself responding (that's maybe those 19ms), and HTTP response transmitted. Such total request times are typically at least 1 second on a normal DSL line, and can be much longer. DNS plays a big factor here.

You wrote in bug 1572418 comment 9, you did NOT see it on your system.

OK, I see how that might look contra-dictionary. At the time when I wrote this comment, I didn't see it on my system, but Neil did on his. Then, during the course of reviews and further tests, I did see it repeatedly on my machine as well.

That's exactly why I said repeatedly that this bug is really hiding itself and really difficult to see on a developer system, due to caches, but affects a lot of end users.

I can know who of our users use Office365 or outlook.com or on-premise Exchange. Based on the numbers I have, I know that's not the cause of the spike, but this bug here was.

I would please ask you not to speculate on the causes. I spent 6 months trying to figure out why the statistics don't match up. I correlated the numbers with other data. They just didn't correlate as they normally would. I analyzed them back and forth and nothing made sense. Until I found this bug. Once it was fixed, the numbers fit together. With this fix, the numbers fit together exactly as expected. That's why this bug here caused 60-80% lookup failures and consequently missing installs.

What reason would I have to make this up? If I'm wrong, the OPTIONS pre-flight request to a Thunderbird server can't possibly help me in any way. What good would it do me to add this? If I'm right, this was a serious bug that broke the configs even from ISPDB.

Either way, the pre-flight definitely makes the subsequent request to ISPDB faster, by pre-warming DNS caches and the OCSP request is already done. So, this pre-flight request makes sense even in the hypothetical case that the bug did not exist. Compare http://bitsup.blogspot.com/2008/11/dns-prefetching-for-firefox.html .

I had a long discussion with Magnus about bug 1185366, bug 1592258 and bug 1594366. Here's a summary for this bug:

As Magnus said in comment #14, priming the connection to live.thunderbird.net isn't a huge problem. We're not in a position to prove or disprove Ben's findings, so the pragmatic approach would be to "just fix" the repetition identified.

That leaves the privacy concern that a network request without any "form data" to a domain controlled by Thunderbird happens when the account setup panel is first displayed.

There are various aspects to that argument: In TB 68, there is actually no "manual" button on the panel to start with, so if the user continues, the network request will inevitably happen. However, a manual button might be brought back in bug 1590636.

In the context of this it might be useful to investigate whether instead of making a connection to live.thunderbird.net, instead only a connection to the DNS server could be done.

I suggest the following:
Land attachment 9107131 [details] [diff] [review] now to fix the repetition issue. In a follow-up do one of the following:

  1. Disprove Ben's data and remove the call altogether
  2. Do some "DNS priming", that is, talking to the DNS server only, and not running a full request to the site.

If I don't hear a resounding veto, I'll proceed to landing on Tuesday morning.

(In reply to Ben Bucksch (:BenB) from comment #20)

You wrote in bug 1572418 comment 9, you did NOT see it on your system.

OK, I see how that might look contra-dictionary. At the time when I wrote this comment, I didn't see it on my system, but Neil did on his. Then, during the course of reviews and further tests, I did see it repeatedly on my machine as well.

Glad to hear it's reproducible (in the 1½h following your review statement it wasn't reproducible, until landing). I'd like to try to reproduce it then! Please give exact steps.

Flags: needinfo?(neil)
Flags: needinfo?(ben.bucksch)

(In reply to Ben Bucksch (:BenB) from comment #21)

What reason would I have to make this up? If I'm wrong, the OPTIONS pre-flight request to a Thunderbird server can't possibly help me in any way. What good would it do me to add this?

Since you're asking. Seems convenient to "blame" the apparent huge increase in Owl uptake on this change, and not on the fact that you just promoted Owl to being offered (at all), and offered by default for the worlds second largest email server (bug 1592258) which landed at the same time. Do you really not see a problem with this reasoning?

I'd like to try to reproduce it then! Please give exact steps.

Like I said many times, it is very hard to reproduce. There are no exact steps, because caches are involved. I sometimes saw it, sometimes not.

Flags: needinfo?(ben.bucksch)

In the context of this it might be useful to investigate whether instead of making a connection to live.thunderbird.net, instead only a connection to the DNS server could be done.

The problem is both the DNS and the OCSP. Both are slow, see comment 20. Both need to be pre-warmed. The actual bug 1572418 was caused by the OSCP request. DNS is one component of that, but not everything. The OSCP request was the other part. The pre-warming of the OSCP is what fixed the bug.

Seems convenient to "blame" the apparent huge increase in Owl uptake on this change, and not on the fact that you just promoted Owl to being offered (at all), and offered by default for the worlds second largest email server (bug 1592258) which landed at the same time.

I told you that I know who uses Office365 and who uses outlook.com and how many use on-premise Exchange. Your explanation does not explain the numbers.

Why would I fight so hard to keep this trivial bug fix, if I didn't honestly believe this myself? Do you think that I deliberately pick fights with you? No, not at all. I hate arguing with you. I argue because your proposed changes will actually concretely break stuff.

  1. Disprove Ben's data and remove the call altogether

That is entirely backwards. In an honest attempt at technical resolution, we have now been tricked (logical fallacy that a negative cannot be proven) into having to disprove something that bucksch admits he cannot actually prove/reproduce yet claims will 'concretely break stuff'. Even if there were a provable smidgen of validity here, the 'remedy' is foisted on the entire population of users.

Given the problem can't even be reproduced, this is completely out of proportion. No bug about ispdb access has been reported (has it?). Has anyone ever failed to setup a new account due to ispdb access?

This code path is very noncritical. However, every new account setup must exercise it and send user domains/ips into the network, currently with no opt out. Unacceptable.

Finally, I dispute every single assertion bucksch has made in this bug as being unproven/unprovable, data free, exaggerated, inconsistent, or downright false. Sadly, I've encountered such dissembling before and credibility is gone. Some are catching on but not all. Here's an example of the mailnews/ owner catching on.

https://lists.thunderbird.net/pipermail/maildev_lists.thunderbird.net/2019-August/001958.html

I vowed not to joint any non-technical discussion here :-(

However, every new account setup must exercise it and send user domains/ips into the network, currently with no opt out.

What domains/ips is sent into the network when accessing live.thunderbird.net? This is a HTTP request which carries the clients IP address like any network packet. The client does have an internet connection, so it will occasionally send packets. Please clarify where the big privacy infringement is.

Next to the scientific argument here: The issue was actually nicely described in bug 1572418 comment #0 and the patch put forward came from a mailnews peer, Neil, who has probably more experience than some of the current people working in the area combined, myself included.

The better doctor is the one who heals the patient, scientifically proven or not, so if the patient claims to be healed when before he was suffering, the fix - as little as we understand it - must work. Surely no one has a hidden benefit from making such a priming network call other than to fix the problem that was described.

So scientifically speaking it is really on those who deny the problem to prove that it doesn't exist in the first place or that the fix or even placebo is useless.

Only if the privacy concern were attributed such a weight, it could be justified to remove the code altogether. I don't think this is the case.

All we've done so far is doing "desk checks" of code, whereas the people who reported the problem have real life empirical data. Computer software is not only about checking code, it's also about see how code behaves under imperfect real life conditions. As an anecdote I invite you to read: https://www.ibiblio.org/harris/500milemail.html

Further to the privacy discussion: In all the time autoconfig has existed, so at least since I started using TB many years ago, there has never been a manual button available on the first panel. The manual button became available after Continue was clicked and after a bazillion network queries had already happened. Complaining about a privacy issue for a single request to a site controlled by the TB project appears vastly overblown. Yesterday we landed bug 1590636 to provide a manual button on the first panel, so whoever is concerned now, please investigate further in another bug.

And on a personal note: "I dispute every single assertion bucksch has made in this bug ..." is really not a constructive approach for dialogue or constructive problem solving.

Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/7c97fc48c124
Move OPTIONS requests for connection priming to avoid repeated execution. r=Neil DONTBUILD

Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED

Please file a follow-up bug to address the (minor) privacy issue as well as further investigation and fine-tuning. I'm hijacking this bug to fix the repetition issue.

Flags: needinfo?(neil)
Summary: emailwizard shouldn't send pointless OPTIONS requests → emailwizard shouldn't send repeated OPTIONS requests
Target Milestone: --- → Thunderbird 72.0
Comment on attachment 9107131 [details] [diff] [review]
Move calls up to avoid repeated requests

Certainly it's consensus to fix the branches as well.
Attachment #9107131 - Flags: review?(acelists)
Attachment #9107131 - Flags: feedback?(mkmelin+mozilla)
Attachment #9107131 - Flags: approval-comm-esr68+
Attachment #9107131 - Flags: approval-comm-beta+
Assignee: mkmelin+mozilla → ben.bucksch
Type: task → defect
You need to log in before you can comment on or make changes to this bug.