Open Bug 263142 Opened 20 years ago Updated 2 years ago

popstate.dat file is not updated while messages are deleted or downloaded from the server, resulting in incorrect state relative to server after a crash

Categories

(MailNews Core :: Networking: POP, defect)

defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: aceman, Unassigned)

References

Details

(Whiteboard: [notacrash])

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7) Gecko/20040626 Firefox/0.9.1
Build Identifier: Mozilla/1.7.3

The file doesn't reflect the actual state of the mbox files while messages are
being downloaded. If mozilla/OS crashes while deleting/downloading messages, the
file will not contain the work already done. I.e. the same messages will be
deleted again (maybe causing bug 237131). Or the same messages will be downladed
again, causing duplicates (and then problems in bug 238365). I haven't really
tried it, but I expect this to happen from my knowledge of this file.

Reproducible: Always
Steps to Reproduce:
1.Connect to the pop3 server and download new messages.
2.Watch the file.
3.The status bar counts up messages and they are appended to the mbox files
(size increases)
Actual Results:  
popstate.dat isn't changing while this is done.
It is updated only after all operations are finished (or the connection is closed?).

Expected Results:  
Update the file after each message is processed - when we get a confirmation of
deletion or a complete message is retrieved. It should append new uidls or
remove deleted ones after each message done.
Blocks: 166111
Product: MailNews → Core
Here's a proposal how we should handle the POP state tracking. Its goal is to be
more secure in case of crashes or server hiccups.

-Messages not listed as available on server at time of retrieval should be
removed from the list in memory.
-Receiving of new messages and adding to the list in memory. If not leave on
server, issue delete command and if responded with +OK marking with "d" in the
list in memory. Then writing this entry to disk (appending to file) in any case.
-Delete old messages in the list marked with "d".
-If connection closed correctly (QUIT issued with subsequent +OK response),
write out whole list.

That way we can prevend already retrieved from being retrieved again if the
client crashed while retrieving messages last time and the whole list wasn't
written out (completely).
It will also be prevented that messages marked as deleted will be retrieved
again if not really deleted because a) the session wasn't closed correctly or b)
the server had a hiccup (or backup replayed).

Drawback: popstate.dat will actually be wrong insofar as messages that have
really been deleted on the server when session was closed correctly, will still
have an entry.
This won't cause any harm besides it uses more space on disk than really
necessary. This is sort of security vs. truth.

Problem will be appending of new entries to an existing popstate.dat since one
popstate can hold the status of various hosts.


Actually bug 237131, but want to note it here:
Currently popstate.dat will be cleared in such a situation

C: UIDL
S: +OK 3 messages
S: 1 42acab37dcf1@lunix
S: 2 42acab37dcf3@lunix
S: 3 42acab37dcf4@lunix
S: .
C: DELE 1
S: *** Connection closed
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows 98 → All
Hardware: PC → All
please be careful. gmail offers me 371 of my messages when i first download
mail. i have 130k+ conversations. (not sure what's going on with gmail, i have a
protocol log but i haven't figured out what to do with it.)
(In reply to comment #1)
> -Delete old messages in the list marked with "d".
> -If connection closed correctly (QUIT issued with subsequent +OK response),
> write out whole list.
What if it was not closed correctly? What is the difference causing us to NOT
write out the list? Some operations could be successful. I understand the 'write
out whole list' operation as just some optimization of the file (because you
wrote the entries quickly to the end of file, and now you just clean it up). Is
there something more to it?

> It will also be prevented that messages marked as deleted will be retrieved
> again if not really deleted because a) the session wasn't closed correctly or b)
> the server had a hiccup (or backup replayed).
How do we know this? If we sent a dele, the server confirmed, and the connetion
was closed correctly, we delete the entry completelly. If the message appears on
the server again (backup), we think it is new and get it.

> Drawback: popstate.dat will actually be wrong insofar as messages that have
> really been deleted on the server when session was closed correctly, will still
> have an entry.
I don't understand it. Currently, they don't have an entry anymore. Why would
they prevail in your scenario?

> This won't cause any harm besides it uses more space on disk than really
> necessary. This is sort of security vs. truth.
No problem. popstate.dat is probably the smallest, but very critical file in the
profile:) Enlarge it 100 times and be more safe, I will still be happy (my is
2kb) :)

> Problem will be appending of new entries to an existing popstate.dat since one
> popstate can hold the status of various hosts.
How can this be? Is this the current implementation or your planned one?

> Actually bug 237131, but want to note it here:
> Currently popstate.dat will be cleared in such a situation
Thanks for still keeping that bug in mind :)

I would also welcome a process to recreate a lost popstate.dat file from the
messages in the mailbox. E.g. extract all UIDLs from the headers, write them to
the file as 'keep on server'. Could be an extension...
That would help with Bug 238365.
(In reply to comment #3)
> > -Delete old messages in the list marked with "d".
> > -If connection closed correctly (QUIT issued with subsequent +OK response),
> > write out whole list.
> What if it was not closed correctly? What is the difference causing us to NOT
> write out the list? Some operations could be successful.

Not writing out the list is if we don't send QUIT and/or server gives no +OK
response. Some operations on the server could be successful, yes. But doing what
the RFC says (that is assuming the server didn't finally update the states) is
the most secure way in a "don't know what happended" state.

> I understand the 'write out whole list' operation as just some optimization
> of the file (because you wrote the entries quickly to the end of file, and
> now you just clean it up). Is there something more to it?

It's an optimization in security. Nothing less and nothing more is needed.
a) Immediately updating that a new message has been downloaded makes sure we
don't download it again, whatever happens.
b) Change two is that messages we issued a DELE for aren't removed from the list
(neither while connected nor after it in the complete write) until proven as
deleted.

I forgot to mention in my previous comment that what you wrote about deleting
("I.e. the same messages will be deleted again (maybe causing bug 237131).")
doesn't matter. Trying to delete an already deleted message doesn't harm, but
with step 1 from my proposal the possibility of reaching that point is nearly
impossible.

> How do we know this? If we sent a dele, the server confirmed, and the
> connetion was closed correctly, we delete the entry completelly.

No, in the leave on server case we don't and we will not. And as I wrote in step
two, if we don't leave them on server, we'll only mark them with "d" but keep
the entry in the list until we retrieve the next time and it isn't there anymore.

> If the message appears on the server again (backup), we think it is new and
> get it.

If they throw in the backup after we did some retrieves in the meantime, you're
right. Prevending this would mean to leave all ever retrieved messages in the
popstate (or even for x days).

> I don't understand it. Currently, they don't have an entry anymore. Why would
> they prevail in your scenario?

Hm, maybe we understand my proposal different. 
In step 2 we don't delete them and in step 3 "-Delete old messages in the list
marked with "d"." we don't delete the either since with old I mean entries that
had "d" before this connection. So the they'll be still in the list.

> > Problem will be appending of new entries to an existing popstate.dat since
> > one popstate can hold the status of various hosts.
> How can this be? Is this the current implementation or your planned one?

To be honest, I don't know how this can be or for what. But if I understand the
current source correctly, it's possible in theory.

> > Actually bug 237131, but want to note it here:
> > Currently popstate.dat will be cleared in such a situation
> Thanks for still keeping that bug in mind :)

To be honest again, currently I can't say why this problem clears the file. But
before I change any popstate related code I'll.

> I would also welcome a process to recreate a lost popstate.dat file from the
> messages in the mailbox. E.g. extract all UIDLs from the headers, write them
> to the file as 'keep on server'. Could be an extension...

I'd say extension or separate command line tool. Actually that would be the
right problem for a perl script.
Flags: blocking1.8a6?
(In reply to comment #5)
> > How do we know this? If we sent a dele, the server confirmed, and the
> > connetion was closed correctly, we delete the entry completelly.
> 
> No, in the leave on server case we don't and we will not.
I am not sure about this, my popstate.dat doesn't keep 'd' marked lines after a
successful connection.

> And as I wrote in step
> two, if we don't leave them on server, we'll only mark them with "d" but keep
> the entry in the list until we retrieve the next time and it isn't there anymore.

This would need a second state for the message: 'd' is for 'to be deleted', 'c'
would be for 'check if it was deleted'. For 'd' we send a DELE, for 'c' we
shouldn't.
(In reply to comment #6)
> > No, in the leave on server case we don't and we will not.
> I am not sure about this, my popstate.dat doesn't keep 'd' marked lines after
> a successful connection.

Oh right, sorry. But following this proposal we will.

> This would need a second state for the message: 'd' is for 'to be deleted',
> 'c' would be for 'check if it was deleted'. For 'd' we send a DELE, for 'c'
> we shouldn't.

Since in step one, entries for messages not existing anymore have been deleted,
there's no necessity for doing so.
But if we decide to leave messages stated as deleted by the server in popstate
for more than the next retrieval (would be more secure in case of backup replays
but would also cost more space and slow down processing). e.g. 3 days, another
state would be necessary, yes.
(In reply to comment #7)
> > This would need a second state for the message: 'd' is for 'to be deleted',
> > 'c' would be for 'check if it was deleted'. For 'd' we send a DELE, for 'c'
> > we shouldn't.
> 
> Since in step one, entries for messages not existing anymore have been deleted,
> there's no necessity for doing so.

Hm, this is how I understand the new handling:
1. message A is received, it is marked 'k'.
2. I decide to delete it.
3. it is marked 'd'.
4. I connect, the message is deleted in the server, but we keep it and mark 'c'.
5. another message B is received.
6. I delete it, it becomes 'd'.
7. Another connect, A has 'c', we see it isn't on the server so we permanently
delete its reference.
8. Message B has 'd' so we issue a DELE for it and mark it as 'c'.
9. etc.
Is something wrong with this? I don't see how we could manage without the 'c'
state (besides leaving the message in 'd', issuing DELE again on next connect,
getting and error and this will indicate the message was already successfully
deleted at the previous attempt. But this is ugly).

> But if we decide to leave messages stated as deleted by the server in popstate
> for more than the next retrieval (would be more secure in case of backup replays
> but would also cost more space and slow down processing). e.g. 3 days, another
> state would be necessary, yes.
For low volume (but high value messages) accounts this would be acceptable. I
would personally set the period to months (if configurable; actually it could
take the value from 'keep on server for x days', that would make sense). As to
the slow processing - it would cost only CPU and not too much I think. Comparing
which of the 1000 UIDLs (strings) on the server aren't already in the 10000
UIDLs on the local disk... I think a matter of seconds on the slowest machine.
No special network requests involved.
(In reply to comment #8)

> Is something wrong with this? I don't see how we could manage without the 'c'
> state (besides leaving the message in 'd', issuing DELE again on next connect
> [...]

'c' isn't wrong but unnecessary (in the "don't keep deleted for longer"-case!).
I took your example to show you why:

1. Message A is received, it is marked 'k'.
2. You decide to delete it.
3. It is marked 'd'.
4. You connect, all 'd' messages not on the server (UIDL not listed) are removed
from our list, message A stays because upon connect it's still on the server.
5. Message A is deleted on the server, it stays 'd'.
6. Another message B is received.
7. You delete it, it becomes 'd'.
8. Another connect, all 'd' messages not on the server (UIDL not listed) are
removed from our list, message A goes away from our list, message B stays.
9. Message B has 'd' so we issue a DELE for it and leave it 'd'.
10. etc.
Now that's nice:) It is even more robust in cases where the messages get deleted
on the server without Mozilla knowing, e.g. through a web interface.
David, what do you think about this one? Do you have any cycles to look at it? 
Asa, this would involve a substantial re-working of the popstate.dat code (I
believe the code has worked this way since Netscape 2.0 - it's not a regression
or anything like that...)
Thanks David. Minusing for 1.8. 
Flags: blocking1.8a6? → blocking1.8a6-
Is there any progress on this or is it still in the design state?

One more aspect of this: what happens when the new file is moved to an older
version of mozilla? The syntax is not changed (as you plan to have the same 2
states). But there will be messages (marked 'd') in the file that are not on the
server anymore. Will older versions cope with this?
Assignee: sspitzer → nobody
Do we have a feeling to what extent this might mitigate user pain, and real or user perceived bugs?  Any significant impact to hotmail, gmail & webmail users, which is/has been on the rise?
QA Contact: networking.pop
This would help very much on dialup, due to unstable connections and on unstable OS. But any user of POP3 can benefit (won't get duplicate messages problems) because of mozilla bugs, like mozilla stopping downloading even when the link is still up. I have seen many of such situations on my win98 dialup machine.
Product: Core → MailNews Core
Is there perhaps a middle ground solution that is more tractable and not so radical as noted in comment 9?  Baby steps possible?
Severity: normal → major
We already mitigated the losing connection issue. The crashing issue during download is the only one left, as far as I know.
Nice to hear that. Which bug fixes the connection issue?
(In reply to comment #20)
> The crashing issue during download is the only one left, as far as I know.

bienvenu, is the crash you refer to one of these ...
Bug 536800 crash [@ nsPop3Sink::CheckPartialMessages(nsIPop3Protocol*)]
bug 536802 crash [@ net_pop3_load_state]
bug 557123 crash [@ nsPop3Sink::IncorporateComplete(nsIMsgWindow*, int)]
bug 558049 crash [@ nsPop3Sink::AbortMailDelivery(nsIPop3Protocol*)]
Depends on: 352998
I think we talk about a theoretical scenario of crashing between download of new messages and writing the new popstate.dat file. In this case the file stays out of sync with the mbox file. But if there are any real crash cases, then even better.
(In reply to comment #24)
> I think we talk about a theoretical scenario of crashing between download of
> new messages and writing the new popstate.dat file. In this case the file stays
> out of sync with the mbox file. But if there are any real crash cases, then
> even better.

so comment 8 and comment 9 (I think) summarize what's left to mitigate against all crashes.  (although fixing crash bugs would help with user pain)

bienvenu recently commented in a related bug but I couldn't identify it. 
https://bugzilla.mozilla.org/buglist.cgi?type1-0-0=substring&emaillongdesc1=1&field0-0-0=short_desc&bug_severity=blocker&bug_severity=critical&bug_severity=major&bug_severity=normal&type0-0-1=substring&field0-0-1=component&type1-0-1=allwordssubstr&resolution=---&classification=Client%20Software&classification=Components&emailtype1=substring&query_format=advanced&longdesc=duplicate&value0-0-1=pop&email1=bienvenu&type0-0-0=anywords&value0-0-0=pop%20pop3&field1-0-0=short_desc&product=MailNews%20Core&product=Thunderbird&longdesc_type=allwordssubstr&field1-0-1=short_desc


fwiw, r3acer, using v3.1.4 at the time, says "haven't experienced this bug in a very long time and I am still using pop."
Summary: popstate.dat file is not updated while messages are deleted or downloaded from the server. → popstate.dat file is not updated while messages are deleted or downloaded from the server, resulting in incorrect state relative to server after a crash
Whiteboard: [notacrash]
For those looking for a temporary solution, I have written a simple PHP script to rebuild the popstate.dat file. Please check:

https://github.com/Magentron/rebuild_thunderbird_popstate

Fork it!
How does it work? From reading the source it seems it connects to the server and gets the UIDLs and writes them all with the 'k' state as the new contents popstate.dat . I do not see a check which of those UIDLs really are downloaded in the mbox.
If that is the case, then it is not really complete. The bug is about TB crashing halfway through downloading email, so it means some messages are left on the server, that are not in the mbox file. So we don't want those in the popstate.dat. Also, some new messages may have arrived between the last connection and running this tool. We must first download them before putting them into popstate.dat. Can you update this? Compare UIDLS from server to UIDLS in mbox files (may take a while, maybe .msf files could help, but they may not be reliable in case of crash) and only put those into popstate.dat that already exist locally.
Also, put a warning somewhere, that the script must be run with Thunderbird closed. Thanks for working on this.
@Aceman: it's a simple script that helped me get my Thunderbird running again as expected without having to download everything. The scope of this script was not to parse all mail files of Thunderbird, but to generate it from the messages present on the server. Since I have web access to the mail on the server I could easily determine how many messages were not yet in my Thunderbird files, to exclude these I added the -i (ignore n last messages) parameter. After running the script I could check whether new messages had arrived and if so, I could make some manual changes to the generated popstate.dat file. I changed the default filename and added a message/doc about Thunderbird being closed to prevent problems when Thunderbird is running.
If you need more improvements, please fork the project and add them yourself.

Ping, will this have been helped, or totally fixed by your rewrite?

Flags: needinfo?(remotenonsense)
See Also: → 1762327

It was improved a little by bug 1762327 to update popstate.dat every 20 messages. But the root problem is not fixed. Instead of relying on popstate.dat, I think we should use msg folder as source of truth.

Flags: needinfo?(remotenonsense)
See Also: → 801443
See Also: → 1815146
You need to log in before you can comment on or make changes to this bug.