downloaded mails are lost when disk is full

VERIFIED FIXED in mozilla0.9

Status

P2
critical
VERIFIED FIXED
18 years ago
10 years ago

People

(Reporter: martin.sperl, Assigned: naving)

Tracking

({dataloss})

Trunk
mozilla0.9
x86
Windows 2000
dataloss

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [nsbeta1+]relnote-user [1.0 stop ship?])

Attachments

(1 attachment)

(Reporter)

Description

18 years ago
I tested Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; m18) Gecko/20000929
Netscape6/6.0b3 and stored the Netscape profile and mail on a Samba drive where the
available space is limited by quotas (Digital Unix). When the drive is nearly
full and I download a mail which is bigger than the available disk space (via
POP3), the mail is downloaded to my workstation and the subject is shown. But
when I want to view the mail I get an unknown error.
After I closed the mail client (and browser) I deleted some files to free up
space and opened the mail client again.

the downloaded mail(s) are neither in my local mailbox nor on the pop3 server
--> mail lost !

If you have further questions please mail martin.sperl@gmx.net

thanks for your great work

Comment 1

18 years ago
not a database issue. Jeff has worked on this in the past, and a mozilla
contributor checked in a fix for pre-flighting the disk space available. I
wonder if FE_DiskSpaceAvailable is returning the right value for the Samba drive.
Assignee: bienvenu → putterman
Component: Mail Database → Mail Back End

Comment 2

18 years ago
reassigning to jefft.  We should verify that mail is really lost. In the past
we've minused these bugs because we thought mail wasn't being lost.
Assignee: putterman → jefft

Comment 3

18 years ago
Investigating...
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → M18

Comment 4

18 years ago
Will be mentioned in general low disk space release note item
Related: bug 57902, bug 49868, bug 32443
Keywords: relnoteRTM
QA Contact: esther → laurel
(Reporter)

Comment 5

18 years ago
Windows through Samba limited by quotas reports the free space of the whole disk
and not the space available to the specific user under their quota. A
serious check for available disk space has to allocate the disk storage
needed for the next operation to see whether it will work or not.
Whiteboard: relnote-user

Comment 6

18 years ago
reassigning jefft's bugs to naving
Assignee: jefft → naving
Status: ASSIGNED → NEW

Updated

18 years ago
Severity: normal → critical
Keywords: mozilla0.8
Whiteboard: relnote-user → relnote-user [1.0 stop ship?]

Comment 7

18 years ago
Losing mails is the worst thing a mail client can do :) If this bug is real, it
should be evaluated as a possible stop ship for Mozilla 1.0. Nominating for
mozilla 0.8 for a start. I expect this to get pushed out a little more, but
let's avoid a "but now it's too late for this" dilemma for 1.0. Are there any
"arch" or "highrisk" issues with this?
Upping severity because of dataloss. Please correct me if I'm wrong.

Comment 8

18 years ago
Damn it, is there a path that I missed through GetMsg that doesn't hit my disk
space checking routine?

Adding a dependency on that bug.

Depends on: 32443

Comment 9

18 years ago
Unless I'm totally misunderstanding the code, reserving the disk space won't do
anything. I got pointed to this from bug 62480, which is POP3, as is the
original reporter's problem (so I haven't actually experienced this, I've just
been looking at the code)

nsPop3Sink::WriteLineToMailbox does *m_outFileStream << buffer, which has no way
of checking for disk full errors (technically, you have to flush before you know
that for sure - maybe this should be done?) The patch to 62480 actually checks
for error status, but nothing's setting them, so it doesn't make a difference.

I can't see anything being reserved. Mozilla checks that space is available, but
this is subject to races (nsPop3Protocol.cpp#1803 is where you're talking about,
right?) if someone else is writing to the disk at the same time you are.
Checking beforehand is OK, but Mozilla should deal with unexpected failures (IO
failure, NFS server dying, etc.). Anything else just causes dataloss. Also, if
Inbox is a symlink, checking the amount of space available in the folder won't
help.

NB - the reporter mentions using samba. If you don't compile samba correctly
(--enable-quota, IIRC), then some older versions won't tell windows that
something went wrong (if hard quota != soft quota, but occasionally anyway,
especially if the server isn't linux), and it will happily keep writing. From
personal experience, I know that MS Access and Word onto a quota'd samba drive
will corrupt their own files quite happily.  In this case, there isn't anything
mozilla can do about it. This doesn't mean that there isn't a problem though.

Comment 10

18 years ago
This currently isn't planned for mozilla0.8.  I've changed the nomination to
mozilla0.9 so it gets properly triaged for the next milestone.

It looks like others besides naving are looking at this.  If someone else is
able to fix this by 0.8, then please do so!
Keywords: mozilla0.8 → mozilla0.9

Comment 11

18 years ago
putterman: The nomination for mozilla 0.8 was only to evaluate if there are any
"arch" or "highrisk" issues with this bug, to avoid a "but now it's too late for
moz 1.0" dilemma. Please consider doing this evaluation early in the 0.9 cycle.

Comment 12

18 years ago
yeah, I sort of fell down on the job when it came to looking at mozilla0.8
nominations (I was busy with all of the nsbeta1 nominations).  It turns out
there weren't too many that weren't looked at already and I've gone through and
made sure that they get nominated for mozilla0.9 so that they can be evaluated
rather than lost.

Comment 13

18 years ago
Ideally Mozilla wouldn't delete messages from the POP server without first
confirming the messages have been successfully written to the mail file on disk.
Having two copies of the same message is a lot better than losing a message
forever. I assumed this was what Moz did, but some of the comments above and on
bug 62480 lead me to doubt this. Can anyone clarify?

Comment 14

18 years ago
mozilla.org@pidgin.org: That's what this bug is about. Currently there is no way
for the pop code to know that the write failed. Adding dataloss keyword.

The full solution for this bug is more complicated though, because if we run out
of space halfway through the message, we have to delete the half of the message
we have already written, or the mailbox may be left corrupted. Actually noticing
the error will mean that we don't delete the message, which is a start.
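
The "delete the half we have already written" step could look roughly like the following. This is a sketch under assumptions, not the patch that was checked in: it uses C++17 std::filesystem on a plain mbox-style file, and the helper name appendMessageOrRollback is hypothetical.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper, not Mozilla code: append `message` to the mailbox
// file. If the write fails partway, cut the file back to its previous
// size so no half-written message is left behind.
bool appendMessageOrRollback(const fs::path& mailbox, const std::string& message) {
    std::error_code ec;
    // Remember the size before the append; a missing file counts as empty.
    std::uintmax_t before = fs::exists(mailbox, ec) ? fs::file_size(mailbox, ec) : 0;
    if (ec) return false;

    bool ok = false;
    {
        std::ofstream out(mailbox, std::ios::binary | std::ios::app);
        out << message;
        out.flush();
        ok = out.good();
    }  // stream is closed here, before any truncation

    if (!ok) {
        fs::resize_file(mailbox, before, ec);  // drop the partial message
        return false;
    }
    return true;
}
```

Only a true return would justify deleting the message from the POP3 server; on false the mailbox is back to its earlier length and the message can be fetched again later.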

FWIW, I've had the same mailloss in NS4.x, when the linux machines netscape was
running on got the wrong sizes for quota from the sunos NFS drives.
Keywords: dataloss

Updated

18 years ago
Target Milestone: M18 → ---

Comment 15

18 years ago
nominating for beta1
Keywords: nsbeta1

Comment 16

18 years ago
marking nsbeta1+
Priority: P3 → P2
Whiteboard: relnote-user [1.0 stop ship?] → [nsbeta1+]relnote-user [1.0 stop ship?]
Target Milestone: --- → mozilla0.9
(Assignee)

Comment 17

18 years ago
We don't begin downloading messages until we make sure that the disk space
is available. From the comments in the code, GetDiskSpaceAvailable() may not 
work on all platforms. I believe this may be true for samba drive (digital unix) 

Reporter, you can verify this if you have a debug build. You should look for

"Call to GetDiskSpaceAvailable FAILED! " on the console

Comment 18

18 years ago
The problem is that GetDiskSpaceAvailable is not atomic - it tells you what is
available now, not what may be available by the time we've finished downloading.

The GetDiskSpaceAvailable call should be there as an optimisation ("if we know
we can't finish, don't bother starting"), not as a solution.
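
A pre-flight check used purely as an optimisation might be sketched as below. It assumes C++17 std::filesystem::space in place of GetDiskSpaceAvailable; the enum and the function name preflightSpace are hypothetical. Like the check under discussion, the figure it reads is a snapshot and may not reflect per-user quotas, so the result is advisory only.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

enum class Preflight { LikelyFits, KnownTooSmall, Unknown };

// Hypothetical helper, not Mozilla code: advisory free-space check.
// It can say "don't bother starting", but it can never promise that the
// later writes will succeed.
Preflight preflightSpace(const fs::path& dir, std::uintmax_t bytesNeeded) {
    std::error_code ec;
    const fs::space_info info = fs::space(dir, ec);
    if (ec || info.available == static_cast<std::uintmax_t>(-1)) {
        return Preflight::Unknown;  // query failed: do not read this as "enough"
    }
    return info.available >= bytesNeeded ? Preflight::LikelyFits
                                         : Preflight::KnownTooSmall;
}
```

Only KnownTooSmall would skip the download. LikelyFits and Unknown both proceed, and every write still has to be checked for errors.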

That check is also totally broken anyway, at least on unix - consider quotas:

nsLocalFileUnix.cpp:
    // The number of Bytes free = The number of free blocks available to
    // a non-superuser, minus one as a fudge factor, multiplied by the size
    // of the beforementioned blocks.

This may or may not be a bug in the nsLocalFileUnix implementation - it is
returning the amount of free disk space available. The only comment in the idl
file is: // maybe we should put this somewhere else.
Which isn't much help as to the official definition of the attribute.

Regardless, the code _must_ get the return value - even if you consider this
amount + the slack checked for to be sufficient, think disk IO error, network
drive going offline, etc. The problem is that nsPop3Sink::WriteLineToMailbox
just uses the nsIOFileStream, which doesn't report errors.
(Assignee)

Comment 19

18 years ago
The available space can change as we begin downloading, only due to external
factors, like downloading some other app/file. 

What you suggest about failed writes may work in this case. 

However this case may not occur very often.

Comment 20

18 years ago
under multi-user operating systems, other users can affect the available space
on disk through normal usage. I think the case would occur as the norm on most
shared systems.

am I misreading something?
(Assignee)

Comment 21

18 years ago
Under multi user OS, I think we have quotas so one user's memory usage should 
not affect others.

The best we can do right now is do GetDiskSpaceAvailable before we download each
message but it will slow down getting messages.

Comment 22

18 years ago
> Under multi user OS, I think we have quotas so one user's memory usage should 
> not affect others.

1. GetDiskSpaceAvailable doesn't take quotas into account on (at least) unix. I
filed bug 72892 on that.

2. Besides, that's only true if numUsers*diskSpacePerUser <= diskSize. Which is
not true on the two sets of machines (from two different unis) I have access to,
and I doubt it's true generally, although I have no data to back that up. (On
some of the machines, TAs and staff have no quotas at all, but students do. And
root never has a quota, so a large log file could take up all the disk space.
This happened this morning on one machine, and the unix cmd-line mail gave a
semi-informative error and quit.)

3. This is _mail_. Error codes must be checked. Retrieving and sending mail
without losing it is the only essential purpose of a mail client.

Comment 23

18 years ago
at least for all the multi-user machines around here, we don't have quotas and
are free to chew up as much disk with builds as we like.

It's still our responsibility to not eat the user's mail, regardless of the
setup by a sysadmin.

Comment 24

18 years ago
As an end user, I would much rather take a small performance hit every time I
retrieve messages if the alternative is to lose messages every so often, and I
imagine many users feel the same way. Mozilla or any other POP client should get
a return value from whatever is writing the message to disk, and then delete the
message only if the return value indicates the write succeeded. It's not enough
to just guard against common cases like disk full, quota full, bad permissions etc.
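
The ordering asked for here, store first and delete from the server only on confirmed success, can be sketched independently of any real POP3 code. Everything below is hypothetical: PendingMessage, storeThenDelete and the two callbacks are illustration names, not Mozilla interfaces.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

struct PendingMessage {
    int serverIndex;   // POP3 message number on the server
    std::string body;  // full message text as retrieved
};

// Hypothetical sketch, not Mozilla code: a message is deleted from the
// server only after storeLocally reported success for it. The first
// failed store stops the run, leaving that message and all later ones
// on the server. Returns the number of messages stored and deleted.
std::size_t storeThenDelete(const std::vector<PendingMessage>& messages,
                            const std::function<bool(const std::string&)>& storeLocally,
                            const std::function<void(int)>& deleteOnServer) {
    std::size_t stored = 0;
    for (const PendingMessage& msg : messages) {
        if (!storeLocally(msg.body)) {
            break;  // any failure, whatever the cause: keep the server copy
        }
        deleteOnServer(msg.serverIndex);
        ++stored;
    }
    return stored;
}
```

Because the decision depends only on the return value of the store step, it covers disk full, quota, permission and I/O errors alike, at the cost of an occasional duplicate if the delete itself fails.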

Bug 71025 reports a case where messages were lost when writing messages to disk
failed for a reason other than the disk being full, and Mozilla deleted the
message from the server anyway.

Comment 25

18 years ago
Just another case study from reality: Our working group at a university pays per
GB for (university-wide-)centralized backups. Thus our home partition is very
limited in size. In the past, every two months or so it suddenly happened that
the disk was full. There sometimes seems to be a (mysterious) process writing to
disk continuously, until no space is left at all. So this occurs immediately,
while working, even if there was plenty of space an hour ago. Usually we
discover it when we try to use mail. (We are mostly using emacs or exmh for
mail, in a linux/solaris environment. Also, we are still using NN4.6/4.7.)
We are not using quotas, so one user's memory usage _does_ affect others.
(Assignee)

Comment 26

18 years ago
I am working on it and will have a fix soon, but how do I test it?
(Assignee)

Comment 27

18 years ago
Created attachment 29209 [details] [diff] [review]
patch that was checked in
(Assignee)

Comment 28

18 years ago
fixed
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Comment 30

18 years ago
*** Bug 74321 has been marked as a duplicate of this bug. ***

Comment 31

18 years ago
OK with 2001-06-07-13-0.9.1 commercial branch (beta1) build and linux rh6.2.

On a POP account, server settings have "leave messages on server" disabled.
When disk is full, Get Msg (there are indeed several messages to retrieve) gives
no "unknown error".  Status bar text shows there are indeed messages to
retrieve, but doesn't retrieve them -- text is "receiving 0 of N messages". 
Mail window is not left in an unusable state, no hang, etc.  After freeing some
disk space, Get Msg is able to retrieve the new messages just fine; properly
downloaded to inbox and able to be displayed.

Comment 32

18 years ago
Same results on same scenario using 2001-06-07-13-0.9.1 with win98.
Since this is such a time-consuming process to reproduce, I'm going to go out on
a limb and assume mac is okay, too.

Marking this verified.
Status: RESOLVED → VERIFIED
Product: MailNews → Core
Product: Core → MailNews Core