Closed Bug 1400359 Opened 7 years ago Closed 7 years ago

Mail pickup via POP fails silently if server is misconfigured (wrong name or address does not resolve)

Categories

(MailNews Core :: Networking: POP, defect)

defect
Not set
normal

Tracking

(thunderbird_esr5257+ fixed, thunderbird56 fixed, thunderbird57 fixed)

RESOLVED FIXED
Thunderbird 57.0
Tracking Status
thunderbird_esr52 57+ fixed
thunderbird56 --- fixed
thunderbird57 --- fixed

People

(Reporter: alaric, Assigned: jorgk-bmo)

References

Details

(Whiteboard: TB 56 beta 4 => TB 52.5 ESR)

Attachments

(2 files, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.9) Gecko/20100101 Goanna/3.2 Firefox/45.9 PaleMoon/27.4.2
Build ID: 20170825013029

Steps to reproduce:

I have two domains, which we will call b.com and c.net.  I no longer accept mail at b.com, but I have one mail account set up in Thinderbird which was set up when I did, and which fetches its mail from pop3.b.com.  Every other account I or anyone else in the household has retrieves its mail from pop3.c.net, which resolves to the same host.

A couple of days ago, I was doing DNS maintenance and saw the A record for pop3.b.com.  "Oh," I thought to myself, "nobody uses that any more, I should drop that A record."  And so I did, unaware that one of my accounts was still using it.


Actual results:

What HAPPENED was that mail pickup for that account failed, totally silently.  So far as I could see, my mail server was telling me mail was being delivered to me, but Thunderbird was never showing it appearing in my mailbox.  There was no indication whatsoever that Thunderbird was suddenly unable to CHECK my mailbox.


Expected results:

What SHOULD have happened is that Thunderbird should have given me some kind of notification that it could not check my mail because the mail server was suddenly not resolving.  Obviously it shouldn't spam me with notifications every time it tries to fetch mail, but it should at least let me know what there is a problem when the problem first occurs, and preferably offer me some options of what to do about it.  (For example - retry, let me reconfigure the server address, ignore for an hour and then retry, ignore until restarted.)
Component: Untriaged → Networking: POP
Product: Thunderbird → MailNews Core
Whiteboard: [dupme]
Version: 52 Branch → 52
Hmm, I'm not a network administrator but dropping the A record means that pop3.b.com can't be resolved any more, right?

That's like configuring a non-existent server, no? If I change the server name in the account config to something invalid, I get "Failed to connect to server", at least for IMAP.

Oh, I see, you reported POP, so I did the same for a POP account and no message at all, it fails silently. We should fix that.

Aceman, could you take a look, please.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(acelists)
Summary: Mail pickup fails silently if server address does not resolve → Mail pickup via POP fails silently if server is misconfigured (wrong name or address does not resolve)
I looked at this a bit. For IMAP, the error comes from nsImapProtocol::CreateNewLineFromSocket() where we check the return code for NS_ERROR_UNKNOWN_HOST or NS_ERROR_UNKNOWN_PROXY_HOST. There is no such check in the POP3 protocol, but there is in nsMsgProtocol::OnStopRequest(). However, that doesn't see to be triggered since we most likely don't even start the transfer.

It's a bit of a mystery where the error gets eaten up. Needs further looking into.
(In reply to Phil Stracchino from comment #0)
> Obviously it shouldn't spam me with notifications
> every time it tries to fetch mail,

So I think this is why it isn't complaining about the broken server. When you click Get all new mail it silently skips servers that TB does not have a password for so maybe also those that aren't reachable. The same could be if you just wait for the automatic fetch from server in the interval you have set in your account settings.

> but it should at least let me know what
> there is a problem when the problem first occurs, and preferably offer me
> some options of what to do about it.  (For example - retry, let me
> reconfigure the server address, ignore for an hour and then retry, ignore
> until restarted.)

OK, does TB remain silent when you click Get Mail on that specific server/account?
Flags: needinfo?(acelists)
Always silent, there is never any error, regardless what you do: Not at start up, not getting mail, never.
so, the older bug reports I was thinking of are 
bug 1278519 insufficient error notification
bug 227551  No feedback/error message when connection to POP3 server lost, or server settings not valid
With regard to password
 Bug 533006 - We don't notice that the server closed the connection after failed passwor
 Bug 490140 - No error message or password prompt when POP password is incorrect
Thanks Wayne, yes this seems to be an old problem so it's about time to fix it. I didn't know it since I have a reliable server which I've configured correctly. I believe I've seen password re-prompts in cases I have changed the password, so I'm not sure that bug 490140 is still relevant. Let me dupe bug 227551 and bug 1278519 here.

Bug 227551 comment #6 mau have some clues, so I'll try logging to see whether the error is detected and where it gets lost instead of being reported. So people have been complaining in bug 227551 for years, but no one has done anything.
Yup, forward duping is perfectly fine. Initially, I had trouble finding them myself.
Whiteboard: [dupme]
Check the content of bug 1278519 to see whether the failure to report anything at all ( and the error condition being mysteriously "swallowed" at some point ) is not rather that the error indication is so subtle that it is going totally unnoticed by the user. 

This was my point in that bug report: changing status line from "looking up ... " to "looked up ..." is so inconspicuous that is may as well not be there. 

If it said "FAILED to look up ..."  there may be half a chance of it getting noticed. That is a five minute fix: change the text in the status bar. 

From there is would be a simple task to make if more visible either with a colour change or something more in your face for a critical error than a little note on the edge of peripheral vision.
Thanks for the hint. I can't see neither "looking up" nor "looked up" in any string anywhere in our code base:
http://searchfox.org/comm-central/search?q=looked&path=*.dtd
http://searchfox.org/comm-central/search?q=looking&case=false&regexp=false&path=*.dtd
http://searchfox.org/comm-central/search?q=looked&case=false&regexp=false&path=*.properties
http://searchfox.org/comm-central/search?q=looking&case=false&regexp=false&path=.properties

unless this message is produced in M-C Core::Netwerk

mozilla/netwerk/locales/en-US/necko.properties
5	#ResolvingHost=Looking up
11	3=Looking up %1$S…
mozilla/netwerk/locales/en-US/necko.properties
19	11=Looked up %1$S…

Then with 100% certainty we can't change it. But we should report an errors somehow. IMAP shows a little alert like the "new mail" alert, not a dialogue you need to acknowledge. That's what I'm aiming for here.
With domain names redacted, the text I get in the status bar when connection is down reads: 

example.com: Looked up SSL0.MYISP.NET

I can not find when I got the "Looking up" msg. It seems that now all I get is "connecting to ..." if the network resolves the domain.

That may reflect a change since I reported. or that there is some other additional condition is required the circumstance during which it displays "looking up". 


If you can not find "Looked up" maybe this is returned by a lower level. It seems to at the initial level of resolving the domain name that these messages are generated. Clearly "Looked up" is NOT appropriate if it FAILS to look up the domain name. 

some kind of visually obvious, non-modal message would seen a good solution.
Sorry, correction. It is not the domain name example.com that starts the message. It is the "account name" as shown in Account Settings dialogue.
I note that you are searching a path based on en-US translations, just in case it matters I am using ;

LANG=en_GB.UTF-8
Capturing the screen in a video I see:
Connecting to <server>...
Connented to <sever>...
Host contacted, sending login information...

The first two are again from Necko:
https://dxr.mozilla.org/comm-central/source/mozilla/netwerk/locales/en-US/necko.properties#12

I haven't seen the "Looking up" or "Looked up", but no doubt it will shown on a slow connection. So all this happens in Core::Netwerk and I need to see where the error is passed back to TB and ignored.

The third message is from TB:
mail/locales/en-US/chrome/messenger/localMsgs.properties
32 hostContact=Host contacted, sending login information…
and are used here:
https://dxr.mozilla.org/comm-central/rev/65567743280f48caf53b55ebd423896b84bcb34a/mailnews/local/src/nsPop3Protocol.cpp#3888 and 18 lines down from there.

I'll do an SMTP log next when I find the time.

I also noticed that in the configuration there are:
mail.server.server1.hostname and
mail.server.server1.realhostname

Changing the server name in the account configuration changes "realhostname" but not "hostname". The lookup happens on "hostname". And when I set this to something invalid in the config editor, I indeed see the
  Looked up <server>...
in the status bar and nothing else.

But next time around, it seems to do the lookup with realhostname.

Aceman, the account manager is your territory, I'm really confused by those two hostnames. I can even see that in nsPop3Protocol::LoadUrlInternal() one hostname is used, and the status message is referring to the other.

A little hard to debug if you don't understand which hostname the system is using or whether perhaps one it's trying two different ones.

Summarising: nsMsgProtocol::OpenNetworkSocketWithInfo() is called with the real name, but in nsPop3Protocol::LoadUrlInternal() when logging and for calling net_pop3_load_state(), it uses server->GetHostName(hostName) which is the "plain" name.

This is particularly confusing when the edits the real name and the system keeps using some other previous value.
Flags: needinfo?(acelists)
I found the problem. nsPop3Protocol::OnStopRequest() gets called with aStatus==NS_ERROR_UNKNOWN_HOST and then exits early returning NS_OK without calling nsMsgProtocol::OnStopRequest() which would have displayed a message.
This shows the desired error message. That's the solution to 14 y/o bug 227551 :-)
Assignee: nobody → jorgk
Status: NEW → ASSIGNED
Attachment #8909334 - Flags: review?(acelists)
Removed include which is no longer needed.
Attachment #8909334 - Attachment is obsolete: true
Attachment #8909334 - Flags: review?(acelists)
Attachment #8909335 - Flags: review?(acelists)
Attachment #8909335 - Attachment description: 1400359-invalid-pop-server.patch → 1400359-invalid-pop-server.patch (v1b)
Comment on attachment 8909335 [details] [diff] [review]
1400359-invalid-pop-server.patch (v1b)

Stealing review as discussed via email.

Tested with SeaMonkey 2.53 (comm-beta). Shared code so should be good for TB too.
Attachment #8909335 - Flags: review?(acelists) → review+
(In reply to Jorg K (GMT+2) from comment #7)
> Thanks Wayne, yes this seems to be an old problem so it's about time to fix
> it. I didn't know it since I have a reliable server which I've configured
> correctly.

The only reason I discovered it was because I deleted an obsolete A record during DNS cleanup without realizing that *one* - and only one - of the mail accounts I have configured in Thunderbird hadn't been updated to the new record.  (I made that change *years* ago...)

> I believe I've seen password re-prompts in cases I have changed
> the password, so I'm not sure that bug 490140 is still relevant. Let me dupe
> bug 227551 and bug 1278519 here.

Invalid (or missing) password for the new server hostname alerted just fine.  It was only the non-resolving server address that did not generate an alert.
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/d0f95b20e4c1
show user alert if pop server cannot be reached. r=frg
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
This is fixed now. I wasn't aware that this bug was in fact from 2003. Sometimes it needs the N-th duplicate to wake someone up :-(

I mainly use POP and I've never seen the bug, but then, maybe I never noticed. I'll include this in TB 52 ESR.
Flags: needinfo?(acelists)
Target Milestone: --- → Thunderbird 57.0
Comment on attachment 8909335 [details] [diff] [review]
1400359-invalid-pop-server.patch (v1b)

14 y/o bug, should be fixed one day and the fix is pretty simple and low-risk.
Attachment #8909335 - Flags: approval-comm-esr52+
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/b182a7a28b12
Follow-up: Don't try to report NS_BINDING_ABORTED. r=me
Adding the alert message cause an intermittent failure in mailnews/local/test/unit/test_pop3PasswordFailure3.js.

Sometimes but not always nsPop3Protocol::OnStopRequest() gets called with aStatus==804b0014 (NS_ERROR_NET_RESET) which triggers an assert in nsMsgProtocol::ShowAlertMessage().

Looking at https://dxr.mozilla.org/comm-central/rev/65567743280f48caf53b55ebd423896b84bcb34a/mailnews/imap/src/nsImapProtocol.cpp#4830
NS_ERROR_NET_TIMEOUT, NS_ERROR_NET_RESET and NS_ERROR_NET_INTERRUPT are also treated, so I'll add those cases.
Pushed by mozilla@jorgk.com:
https://hg.mozilla.org/comm-central/rev/175495e6ca8e
Follow-up: Add NS_ERROR_NET_RESET and NS_ERROR_NET_INTERRUPT in nsMsgProtocol::ShowAlertMessage(). r=me DONTBUILD
This does not just happen in tests, otherwise you would not have multiple bug reports on this issue. Please don't assume that everyone using TB has the same fast, reliable network connectivity that you have, and leaves it live all the time.

The bug I filed, which got duped here occurs fairly regularly because I use networkmanager widget to disconnect when I'm not using the network. ( Yes I do use the PC for other things that do not require net access and I also sleep. ) 

If I attempt to recover POP3 mail having forgotten to activate the link to the router, I get the error condition reported. 

Thanks for the effort to fix this bug, it was nearly old enough to go out and vote ! ;)
The strings I added were:
+# LOCALIZATION NOTE(netResetError): %S is the server name
+netResetError=Connection to server %S was reset.
+# LOCALIZATION NOTE(netInterruptError): %S is the server name
+netInterruptError=Connection to server %S was interrupted.

Are you saying that those alerts are now displayed? Then we have a problem. I'm not allowed to make string changes in a beta and even less in the ESR version. For ESR I will have to land a different patch so those errors *will not* show.

If only the pre-existing error
# LOCALIZATION NOTE(unknownHostError): %S is the server name
unknownHostError=Failed to connect to server %S.
shows, we're cool.
Flags: needinfo?(ffux)
My comment relates the my bug which was duped to this one. I'm assuming that action indicates that this is the same problem. 

I have not tried the patched dev. version all my comments relate to the Fedora distro version, currently 52.3.0

"Failed to connect to server %S." is  a great improvement over "looked up %S" when the look up failed. 

Sorry , don't have time to build and test. 

It should be quite trivial to reproduce the error condition either via the network widget; pulling a cable or unplugging a router. Then attempt to get POP mail. 

I hope that clarifies my earlier comment.
Flags: needinfo?(ffux)
Whiteboard: TB 56 beta 4 => TB 52.4.1 ESR
Attachment #8909335 - Flags: approval-comm-esr52+ → approval-comm-esr52?
Whiteboard: TB 56 beta 4 => TB 52.4.1 ESR → TB 56 beta 4 => TB 52.5 ESR
Attachment #8909335 - Flags: approval-comm-esr52? → approval-comm-esr52+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: