Closed Bug 32443 Opened 24 years ago Closed 22 years ago

GetMsg currently does NOT detect out of disk space conditions

Categories

(MailNews Core :: Backend, defect, P3)

x86
Linux
defect

Tracking

(Not tracked)

VERIFIED WORKSFORME

People

(Reporter: jce2, Assigned: jce2)

References

Details

(Keywords: dataloss, Whiteboard: Cannot reproduce since checkin. relnote-user)

Attachments

(2 files)

I ran out of space on the partition where my mail gets downloaded,
and instead of giving me an error, it acted like it downloaded everything
perfectly. Then when I went to read the last 10 messages, I kept getting
"Unexpected error" dialogs popping up. Of course, the "unexpected error"
was that I had run out of disk space, and the body to those messages
had NOT been downloaded.

You need to check for out of disk space conditions while downloading and
saving every message.
Reassign to jefft
Assignee: phil → jefft
Summary: Mail/News currently does NOT detect out of disk space conditions. → GetMsg currently does NOT detect out of disk space conditions
Status: NEW → ASSIGNED
Target Milestone: M17
M17!??? This needs to be fixed NOW..and it should be an easy fix too!!



(at least if you isolate disk reads in one procedure, like you

should be doing).



Assign it to ME if you're that swamped.

If you want to fix this and have the fix be x-platform, please work with jefft 
to assign this bug report to you.
jce2, thanks for offering help. Let me know when you have a fix. Appreciated.
Assignee: jefft → jce2
Status: ASSIGNED → NEW
Adding jefft to cc list...
Well, there WAS an out of disk check in nsPop3Protocol.cpp,

but it seems to be ifdefed out.



#if 0 // not yet

        {

            const char* dir =

MSG_GetFolderDirectory(MSG_GetPrefs(m_pop3ConData->pane));



            /* When checking for disk space available, take in consideration

             * possible database

             * changes, therefore ask for a little more than what the message

             * size is. Also, due to disk sector sizes, allocation blocks,

             * etc. The space "available" may be greater than the actual space

             * usable. */

            if((m_totalDownloadSize > 0)

               && ((PRUint32)m_totalDownloadSize + (PRUint32)

                   3096) > FE_DiskSpaceAvailable(ce->window_id, dir))

            {

                return(Error(MK_POP3_OUT_OF_DISK_SPACE));

            }



        }

#endif



So, supposedly, we need to re-write this check using nsIFile instead

of whatever this FE thing was.



BTW: When I re-write that, I'm going to make it a bit clearer what the hell
a "ce" is. :P

Well, I've just found that nsLocalFile::GetAvailableDiskSpace has NOT
been implimented, and we pretty much need that implimented before we
can fix this bug. :P

I've filed that as 32942.

Depends on: 32942
I've resolved bug #32942, now I'm getting ready to fix this bug.

Therefore, assigning it to myself.
Status: NEW → ASSIGNED
This bug just bit me again. I've been too busy because of my relocation and
looking for a job to fix it, but I'll try to do it this weekend.

I'm trying to get a nsbeta2+ to check the fix in.
Keywords: nsbeta2
Putting on [nsbeta2+] radar for beta2 fix. 
Whiteboard: [nsbeta2+]
*sigh* after getting GetDiskSpaceAvailable implimented in nsIFile, I suddenly
realized that nsPop3Protocol.cpp still uses nsFileSpec! 

Anyways, I still should be able to impliment this.
QA Contact: lchiang → laurel
I'm wondering if we should hold nsbeta2+ for this. I'd be more than happy to
take in beta3. I don't think I'd pull beta2 off the wire if we have this problem.

anyone mind if I nominate for beta3 and take if off the beta2 radar?
Ok, ok, I'll get off my ass and fix this one today.

What error code should I return if there isn't enough disk space? NS_NODISKSPACE ?
Hey jce2, we're going to take this off the beta2 radar. We don't think we'd pull
the beta off the wire to snag this fix. But don't worry. We'll be branching the
beta tomorrow. So you can check this into the tip (which won't be beta2) as soon
as tomorrow if you have a fix. Not trying to rush you, if anything this just
gives you more time. 
Keywords: nsbeta3
Whiteboard: [nsbeta2+] → [nsbeta2-]
Keywords: relnote2
Almost ready to submit my patch. But I have two questions first:

1) I've written my procedure based on nsIFileSpec, because mail/news is still
using nsIFileSpec. Should
I also write a procedure based on nsIFile, and make an ifdef switch out of  it?

2) More important:

Are we ever going to have more then one pop process running at the same time? My
code
does not contain the semaphores or any of the other code needed to accurately
handle two
pop processes running within the same partition.

I'm also returning MK_POP3_OUT_OF_DISK_SPACE, instead of a XP error code.
Ok, here's my first patch to fix this bug. I've stepped through this code about
20 times, so I'm very confident with it.

Now we need a procedure that will handle the out of disk space error
appropriately. Right now, the mail download will simply abort if there isn't
enough disk space, and a debug message will be printed to the console saying
that there isn't enough disk space. Of course, we should pop up a dialog
box explaining why the mail download has been aborted.
Adding patch keyboard and removing [nsbeta2-] from whiteboard for re-evaluation.
(I'm hoping that you might be able to check this into the branch).

We should check this in as soon as possible, even though we don't have a dialog
box handler for this yet, since this bug causes loss of data. 	
Keywords: patch
Whiteboard: [nsbeta2-]
Putting on [nsbeta2-] radar. Not critical to beta2.  Adding "nsbeta3" keyword 
for consideration of a fix for that milestone. 
Whiteboard: [nsbeta2-]
jce2 - I think you need to email waterson@mozilla.org or brendan@mozilla.org for 
permission to checkin your patch.  thanks.  Do you think you'll be able to get 
to a dialog to display to the user why Get Message fails for them?  
I need the mail-news  people to review this patch first, right?

I'll add the dialog box code as soon as I figure out where that should reside.
Hey, jefft, would you give me a r= on this one?

And since this was a "relnote2", this should definately be a nsbeta3+
Keywords: review
Target Milestone: M17 → M18
mscott, bienvenu: Could you review the attached patch please? It has been
waiting for about 10 days now. Thanks!
Whiteboard: [nsbeta2-] → [nsbeta2-] [FIX IN HAND]
It doesn't matter if nsbeta3+ or not since this is not written by a Netscape 
employee (ie. patch).  What dialog box code are you adding?  What will the 
dialog box say, look like?  Thanks

BTW, jefft is on sabbatical, hence the lack of response from him for a review.
If you have a fix, please get a review and approval by going through the Mozilla 
review/approval process.

But, not holding Netscape PR3 for, so - per mail triage
Keywords: mailtrack
Whiteboard: [nsbeta2-] [FIX IN HAND] → [nsbeta3-][nsbeta2-] [FIX IN HAND]
OK, r=bienvenu. Could you find someone to try this on the mac for you? You've
tried it on Linux, and I can try it on windows, but we should try it on the mac.
Hey! The patch is right there, I've been waiting for ages for the module owners
to review it (They DO get this bug report e-mailed to them, right?). All I need
is some guy to test it on the mac, and then I can ask brendan to check it in. I
have complete confidence in this code, I stepped through it literally 25 times.

Second of all, this bug causes DATA LOSS (I should know, I lost 75 of my e-mail
messages when it bit me in march). This NEEDS to be in nsbeta3. It needed to be
in nsbeta1. We're not Microsoft (are we?) There's no excuse for our software to
lose potentially important e-mail
messages.

Third of all, it would help if module owners would make a daily search for bugs
with the keyword "patch" on them in their module. The netscape PDT is being so
restrictive about marking bugs nsbeta3 that you're only going to get some (VERY
important) things fixed through patches from the outside world. If these patches
don't make it in, netscape 6 is going to (bluntly) suck.

Adding "dataloss" keyword to bug and clearing nsbeta3- for reconsideration
(after all, the fix is RIGHT THERE, just review it and check it in!)
Keywords: dataloss
Whiteboard: [nsbeta3-][nsbeta2-] [FIX IN HAND] → [nsbeta2-] [FIX IN HAND]
I agree this should be nsbeta3+, and again, Jeff is on sabbatical, along with a
couple other people, and those of us not lucky enough to take a vacation have
been scrambling to cover for them; hence the lack of response. The main reason I
wanted this tested on the mac is that GetDiskSpaceAvailable is notoriously
platform dependent. I'm sure brendan or waterson would be happy to approve this.
Sorry for the delay, i was sick. Now, I am working on it and so far I found two compile errors with the way you 
manage PRInt64 variables which on Mac are in fact a struct of two int. I will attach the new patch when I am done 
testing it...
Here is the two modifications I add to to in order to compile on Mac:

1) replace:
				PRInt64 mailboxSpaceLeft = 0;
    by:
				PRInt64 mailboxSpaceLeft = LL_Zero();

2) replace:
             	// The big if statement
             	if ((m_totalDownloadSize + (PRInt64) EXTRA_SAFETY_SPACE)
                 	> mailboxSpaceLeft )
    by:
            	// The big if statement            	
            	PRInt64 llResult;
            	PRInt64 llExtraSafetySpace;
            	PRInt64 llTotalDownloadSize;
            	LL_I2L(llExtraSafetySpace, EXTRA_SAFETY_SPACE);
            	LL_I2L(llTotalDownloadSize, m_totalDownloadSize);
            	
            	LL_ADD(llResult, llTotalDownloadSize, llExtraSafetySpace);
            	if (LL_CMP(llResult, >, mailboxSpaceLeft))            	
The patch works fine on Mac except than when my disk is full (yes, I tried this case), the function GetMsg doesn't 
download the messages and return an error but the User is not notified of the problem! No alert, no status, just 
looks like GetMsg doesn't work or you don't have new messages. 
First of all, I'm GLAD that we tested this on mac first. It would have been very
embarrasing to be the one who broke the build on mac.

ducarroz: Could you e-mail me and give me a quick explanation about how you use
LL_I2L on mac, and what I need to do differently when I do mathmatical
operations? I was completely clueless of the compile error over in Linux land.
(Either that, or give me a good url where all of this is explained)

Basically, you're saying that PRint64 = struct { int a; int b } on Mac?
(Yes, I know that isn't the correct typedef syntax, It's too early in the
morning for that).

Anyways, this patch WON'T pop up a dialog box when the Out of disk space
condition is detected. That needs to be done by the procedure that calls
nsIPop3Sink (and I don't quite know what object that is yet).

As I said in an earlier comment:

"Now we need a procedure that will handle the out of disk space error
appropriately. Right now, the mail download will simply abort if there isn't
enough disk space, and a debug message will be printed to the console saying
that there isn't enough disk space. Of course, we should pop up a dialog
box explaining why the mail download has been aborted."

I'm sure this is VERY easily done, as soon as I can find:
a) The procedure that is supposed to handle MK_POP3_OUT_OF_DISK_SPACE.
b) The xul file that contains all the dialog definitions.

I'll look at finding these things when I get home this evening (I'm in New
Zealand).

In the meantime, we should apply this patch to the main tree, even without the
dialog box handler present, because this is a HORRIBLE dataloss bug. Which of
the two outcomes would you prefer if a user runs out of disk space?

1) The message download aborts, the user thinks that he has downloaded all of
his messages even though he really hasn't, and the remaining messages sit on the
pop3 server until he realizes that mozilla suddenly isn't downloading any more
messages and either thinks to check his debug console (where it says "Out of
disk space! Raising error!"), or he comes on #mozilla and asks "Why isn't it
downloading any new messages?". Anyways, on Windows (since this will probably be
located on his "C:" drive, he's going to realize that he's run out of disk space
soon enough).

Or

2) (What CURRENTLY happens): Mozilla appears to download ALL of the messages
located on the server (and DELETES them from the server). The user has no idea
that something is wrong until he clicks on a message header from a message that
was lost due to no disk space and gets an "Unknown error" dialog. THOSE MESSAGES
ARE LOST FOREVER! Especially now that we have "biff" capability, and the ability
to automatically check messages, this bug is DEADLY! Mozilla could run on a
computer all day, happily automatically losing ALL of the e-mail messages that
were sent to that user that day (and maybe the next day).

(3) Mozilla detects the out of disk space condition and pops up a dialog box
telling the user. We'll get that as soon as I find those objects, or ben helps
me out a bit. :P)

So PLEASE check this in, even without the dialog box handler (It will follow in
the next few days). We NEED to get rid of this HORRIBLE data-losing bug!
I should add that my earlier patch obviously compiled for me on Linux and Windows.
And THANK YOU, ducarroz, for fixing my patch so it does compile on Mac :-)

We might want to write up a quick page on what we need to do to make sure that
mac builds don't break...
removed keyword review as this patch has been reviewed. R=bienvenu,ducarroz.

jce2, do you want me checkin the fix for you?
Keywords: review
a=brendan@mozilla.org -- who is gonna make a dialog or status message alert the
user to the problem?  Is that this bug, with another patch forthcoming, or is it
another bug?

Please use diff -u format for patches.

The prlong.h macros, for which I'm originally to blame, are necessary in Mozilla
for any PR*Int64 arithmetic or bitwise-logical operations.  This is "known" but
probably underdocumented.  There are other platforms than Mac that lack native
compiler support for 64-bit integers.

/be
I have filled bug 49868 about the missing error message when the disk is full.
Yes, please check the fix in for me (I don't have cvs write access). (Maybe I
should start the process of getting write access).

I think we should create another bug for popping up the dialog, since this bug
already has enough descriptions added to it, and the other patch should go under
another component anyways.

And yes, those macros are WELL underdocumented. I had no idea about them.
I will check it in a soon the tree open... and I have already open a new bug for 
the error message (bug 49868).
jce2: out of curiosity, and maybe it'll do some good: what are the doc(s) that
you in fact have perused, and in which some statement about using prlong.h's
macros as well as its types would have done some good?  I'd genuinely like to
know, but I'm also skeptical -- and I feel that if you #include "prlong.h" and
use its typedef names, you should investigate their operational semantics a bit
more -- say by reading that header file.

/be
Fixed and checked in
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
OK, I'm very confused by all the conversation in this bug and am not certain
what everyone expected the final fix to be...

Using the oct10 branch commercial build with linux rh6.0:
POP user without "leave on server" enabled:
1. Does Get Msg when out of disk space, the header is received and displayed in
the thread pane.
2. When user selects the message they get the "Unknown Error" alert, which tells
them nothing (I know there's a futured bug about the text, but 49868). The body
is not downloaded.
3. User exits and re-launches, logs in to server, gets messages.  The header(s)
previously downloaded which hit out of space condition are never seen again,
even when disk space is freed and other new messages are received.

POP user with "leave on server" enabled:
All of the above happens, but once disk space is freed, the message is
gotten/recieved again and body is downloadable.

Using NT 4.0 with today's commercial branch:
User never sees any problem, never gets the header downloaded, It just appears
that Get Msg isn't working or there aren't any messages.  Once disk space is
freed, they will get the message(s) whether or not "leave on server" was enabled
or disabled.

So, I haven't tried mac yet (BIGer disk to fill), but I assume from ducarroz'
testing when he tested the fix that mac should work like NT.

I am reopening this based on the Linux case -- data loss in the event that you
didn't leave the messages on the server.  I still think this is crappy that the
user (who would never look at a console even if there was one appearing in the
regular user world) doesn't know what is up until they discover in some other
way that the disk is full... that argument is covered in the dialog bug 49868.

I am also going to add keyword rtm so that the PDT team puts their two cents in
(again). Cleared the [FIX IN HAND] text from whiteboard, too.
Status: RESOLVED → REOPENED
Keywords: relnote2rtm
Resolution: FIXED → ---
Whiteboard: [nsbeta2-] [FIX IN HAND] → [nsbeta2-]
Severity: blocker → critical
Keywords: relnoteRTM
mail team, this is a dataloss bug on Linux.  Can anyone give a hand and see why
Linux didn't get fixed along with the other platforms?
Whiteboard: [nsbeta2-] → [nsbeta2-][rtm need info]
I'm sorry. I've tried and I've tried and I've tried, and I can't reproduce the
data-loss described in this bug since I checked in my fix many months ago.

Is it possible that "GetDiskSpaceAvailable" is broken in some cases and not
broken in other cases?

I repeat, I can not reproduce what laurel has commented about. I have tested
this in two ways:

1) Run mozilla in a debugger, set a break point to right after
GetDiskSpaceAvailable() is called and changed the value to less then what
it needs to download, exactly what it needs to download, slightly over the
amount it needed to download, and zero, and a negative number. Results (as
expected): Doesn't download, downloads, downloads, doesn't download, doesn't
download.

2) I filled up a partion with junk, then mapped my mail file to that test
partition. It wouldn't download, and the message said "Not enough disk space".

Therefore, I can't reproduce this bug at all. When it DOES happen, it MUST be a
failure of the GetDiskSpaceAvailable procedure (as shown by my testing of #1).

Therefore, this bug should drop off of the RTM radar, unless it can be
reproduced, and even then, the problem is probably (most definately) with
GetDiskSpaceAvailable and not GetMsg.
Whiteboard: [nsbeta2-][rtm need info] → [nsbeta2-][rtm need info] Cannot reproduce.
I'm taking this to rtm- since I own the bug.
Whiteboard: [nsbeta2-][rtm need info] Cannot reproduce. → [nsbeta2-][rtm-] Cannot reproduce since checkin.
I read somewhere (I guess it was in another bug) that on some Unix flavor, if
the user has a quota, GetDiskSpaceAvailable returns the amount of physical disk
space available, not the amount of space available in your quota. So, the only
way to really guarantee that you have enough disk space available is to actually
try to allocate it on disk. I'll go try to find that bug. I'm not sure if that's
what's going on here. I don't know how Laurel recreated the out of disk space
situation.
"Low disk space" relnote...

Gerv
Whiteboard: [nsbeta2-][rtm-] Cannot reproduce since checkin. → [nsbeta2-][rtm-] Cannot reproduce since checkin. relnote-user
LAST CALL to everybody:

Can anyone come up with any situation other than running out of disk quota that
causes loss of data like this? If not, then this bug should be closed, and
another one filed, because the bug is then in GetDiskSpaceAvailable, and not GetMsg.
I saw the described behavior on a late December build when running out of disk
space on an x86 linux 2.2 system.  No quotas or anything like that.  I wound up
with an InBox that acted super-stupid when I was trying to file or delete
messages.  The Inbox mbox file wound up with a segment that looks like this:

0000000   F   r   o   m       -       S   a   t       D   e   c       3
0000020   0       1   3   :   3   8   :   0   3       2   0   0   0  \n
0000040   X   -   U   I   D   L   :       5   e   0   0   d   8   c   a
0000060   6   9   7   b   6   f   a   9   0   9   0   b   1   5   2   8
0000100   f   8   c   c   9   4   7   2  \n   X   -   M   o   z   i   l
0000120   l   a   -   S   t   a   t   u   s   :       0   0   0   1  \n
0000140   X   -   M   o   z   i   l   l   a   -   S   t   a   t   u   s
0000160   2   :       0   0   0   0   0   0  \0  \0  \0  \0  \0  \0  \0
0000200  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0075000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   F   r   o
0075020   m       -       T   u   e       J   a   n           2       1
0075040   9   :   4   6   :   3   3       2   0   0   1  \n   X   -   U
0075060   I   D   L   :       1   c   5   6   4   2   3   8   e   d   7
0075100   f   9   8   1   7   d   2   e   9   b   c   e   8   b   3   e
0075120   a   f   8   8   4
0075125

Notice the line numbers in the od output - a whole lot of zeros.  It looks like
something got smacked trying to write the X-Mozilla-Status2: line.

Hope that helps.
Hmm...maybe another procedure other then GetMsg decided to write out status
information, but couldn't.

Only GetMsg grows the size of the file (except for copying/moving a msg into a
folder).
Blocks: 55814
Blowing away nsbeta2- and rtm- since those are ancient.

Bill: Do you remember if you did any copying or moving messages in or out of
that folder before this happened?
Whiteboard: [nsbeta2-][rtm-] Cannot reproduce since checkin. relnote-user → Cannot reproduce since checkin. relnote-user
Keywords: mailtrack
Target Milestone: M18 → ---
Keywords: nsenterprise
I'm sorry if this'll sound stupid, but if no mail is deleted on the server
_before_ we know for sure it's been written out correctly, how can data-loss
occur then? It would make a failing disk-space checking procedure (not saying it
is, just _if_, for whatever reason) a lot less painful...

In 0.9.2 Release notes:
[... If that option ("Leave messages on Server") was not enabled, those messages
are lost. ...]

This couldn't be possible anymore then, since a message would have to be written
fully before the 'dele' or whatever command. (the possibility of loosing mail -
anyhow, even by randomly pulling the modem's plug- sends a shiver down my spine,
 and I'm sure I'm not the only one ... :) ).
Removing nsenterprise nomination.
Keywords: nsenterprise
I'm not sure why this is still at an open state. 
I've verified with apr8-9 commercial trunk build for bug 49868.
Marking verified worksforme
Status: REOPENED → RESOLVED
Closed: 24 years ago22 years ago
Resolution: --- → WORKSFORME
.
Status: RESOLVED → VERIFIED
Blocks: 49868
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: