GetMsg currently does NOT detect out of disk space conditions

VERIFIED WORKSFORME

Status

P3
critical
VERIFIED WORKSFORME
19 years ago
11 years ago

People

(Reporter: jce2, Assigned: jce2)

Tracking

({dataloss})

Trunk
x86
Linux
dataloss
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: Cannot reproduce since checkin. relnote-user)

Attachments

(2 attachments)

(Assignee)

Description

19 years ago
I ran out of space on the partition where my mail gets downloaded,
and instead of giving me an error, it acted like it downloaded everything
perfectly. Then when I went to read the last 10 messages, I kept getting
"Unexpected error" dialogs popping up. Of course, the "unexpected error"
was that I had run out of disk space, and the body to those messages
had NOT been downloaded.

You need to check for out of disk space conditions while downloading and
saving every message.

Comment 1

19 years ago
Reassign to jefft
Assignee: phil → jefft
Summary: Mail/News currently does NOT detect out of disk space conditions. → GetMsg currently does NOT detect out of disk space conditions

Updated

19 years ago
Status: NEW → ASSIGNED
Target Milestone: M17
(Assignee)

Comment 2

19 years ago
M17!??? This needs to be fixed NOW..and it should be an easy fix too!!



(at least if you isolate disk reads in one procedure, like you

should be doing).



Assign it to ME if you're that swamped.

Comment 3

19 years ago
If you want to fix this and have the fix be x-platform, please work with jefft 
to assign this bug report to you.

Comment 4

19 years ago
jce2, thanks for offering help. Let me know when you have a fix. Appreciated.
Assignee: jefft → jce2
Status: ASSIGNED → NEW

Comment 5

19 years ago
Adding jefft to cc list...
(Assignee)

Comment 6

19 years ago
Well, there WAS an out of disk check in nsPop3Protocol.cpp,

but it seems to be ifdefed out.



#if 0 // not yet

        {

            const char* dir =

MSG_GetFolderDirectory(MSG_GetPrefs(m_pop3ConData->pane));



            /* When checking for disk space available, take in consideration

             * possible database

             * changes, therefore ask for a little more than what the message

             * size is. Also, due to disk sector sizes, allocation blocks,

             * etc. The space "available" may be greater than the actual space

             * usable. */

            if((m_totalDownloadSize > 0)

               && ((PRUint32)m_totalDownloadSize + (PRUint32)

                   3096) > FE_DiskSpaceAvailable(ce->window_id, dir))

            {

                return(Error(MK_POP3_OUT_OF_DISK_SPACE));

            }



        }

#endif



So, supposedly, we need to re-write this check using nsIFile instead

of whatever this FE thing was.



(Assignee)

Comment 7

19 years ago
BTW: When I re-write that, I'm going to make it a bit clearer what the hell
a "ce" is. :P

(Assignee)

Comment 8

19 years ago
Well, I've just found that nsLocalFile::GetAvailableDiskSpace has NOT
been implimented, and we pretty much need that implimented before we
can fix this bug. :P

I've filed that as 32942.

Depends on: 32942
(Assignee)

Comment 9

19 years ago
I've resolved bug #32942, now I'm getting ready to fix this bug.

Therefore, assigning it to myself.
Status: NEW → ASSIGNED
(Assignee)

Comment 10

19 years ago
This bug just bit me again. I've been too busy because of my relocation and
looking for a job to fix it, but I'll try to do it this weekend.

I'm trying to get a nsbeta2+ to check the fix in.
Keywords: nsbeta2

Comment 11

19 years ago
Putting on [nsbeta2+] radar for beta2 fix. 
Whiteboard: [nsbeta2+]
(Assignee)

Comment 12

19 years ago
*sigh* after getting GetDiskSpaceAvailable implimented in nsIFile, I suddenly
realized that nsPop3Protocol.cpp still uses nsFileSpec! 

Anyways, I still should be able to impliment this.

Updated

19 years ago
QA Contact: lchiang → laurel

Comment 13

19 years ago
I'm wondering if we should hold nsbeta2+ for this. I'd be more than happy to
take in beta3. I don't think I'd pull beta2 off the wire if we have this problem.

anyone mind if I nominate for beta3 and take if off the beta2 radar?
(Assignee)

Comment 14

19 years ago
Ok, ok, I'll get off my ass and fix this one today.

What error code should I return if there isn't enough disk space? NS_NODISKSPACE ?

Comment 15

19 years ago
Hey jce2, we're going to take this off the beta2 radar. We don't think we'd pull
the beta off the wire to snag this fix. But don't worry. We'll be branching the
beta tomorrow. So you can check this into the tip (which won't be beta2) as soon
as tomorrow if you have a fix. Not trying to rush you, if anything this just
gives you more time. 
Keywords: nsbeta3
Whiteboard: [nsbeta2+] → [nsbeta2-]

Updated

19 years ago
Keywords: relnote2
(Assignee)

Comment 16

19 years ago
Almost ready to submit my patch. But I have two questions first:

1) I've written my procedure based on nsIFileSpec, because mail/news is still
using nsIFileSpec. Should
I also write a procedure based on nsIFile, and make an ifdef switch out of  it?

2) More important:

Are we ever going to have more then one pop process running at the same time? My
code
does not contain the semaphores or any of the other code needed to accurately
handle two
pop processes running within the same partition.

I'm also returning MK_POP3_OUT_OF_DISK_SPACE, instead of a XP error code.
(Assignee)

Comment 17

19 years ago
Ok, here's my first patch to fix this bug. I've stepped through this code about
20 times, so I'm very confident with it.

Now we need a procedure that will handle the out of disk space error
appropriately. Right now, the mail download will simply abort if there isn't
enough disk space, and a debug message will be printed to the console saying
that there isn't enough disk space. Of course, we should pop up a dialog
box explaining why the mail download has been aborted.
(Assignee)

Comment 19

19 years ago
Adding patch keyboard and removing [nsbeta2-] from whiteboard for re-evaluation.
(I'm hoping that you might be able to check this into the branch).

We should check this in as soon as possible, even though we don't have a dialog
box handler for this yet, since this bug causes loss of data. 	
Keywords: patch
Whiteboard: [nsbeta2-]

Comment 20

19 years ago
Putting on [nsbeta2-] radar. Not critical to beta2.  Adding "nsbeta3" keyword 
for consideration of a fix for that milestone. 
Whiteboard: [nsbeta2-]

Comment 21

19 years ago
jce2 - I think you need to email waterson@mozilla.org or brendan@mozilla.org for 
permission to checkin your patch.  thanks.  Do you think you'll be able to get 
to a dialog to display to the user why Get Message fails for them?  
(Assignee)

Comment 22

19 years ago
I need the mail-news  people to review this patch first, right?

I'll add the dialog box code as soon as I figure out where that should reside.
Hey, jefft, would you give me a r= on this one?

And since this was a "relnote2", this should definately be a nsbeta3+
Keywords: review
Target Milestone: M17 → M18
mscott, bienvenu: Could you review the attached patch please? It has been
waiting for about 10 days now. Thanks!
Whiteboard: [nsbeta2-] → [nsbeta2-] [FIX IN HAND]

Comment 24

19 years ago
It doesn't matter if nsbeta3+ or not since this is not written by a Netscape 
employee (ie. patch).  What dialog box code are you adding?  What will the 
dialog box say, look like?  Thanks

BTW, jefft is on sabbatical, hence the lack of response from him for a review.

Comment 25

19 years ago
If you have a fix, please get a review and approval by going through the Mozilla 
review/approval process.

But, not holding Netscape PR3 for, so - per mail triage
Keywords: mailtrack
Whiteboard: [nsbeta2-] [FIX IN HAND] → [nsbeta3-][nsbeta2-] [FIX IN HAND]

Comment 26

19 years ago
OK, r=bienvenu. Could you find someone to try this on the mac for you? You've
tried it on Linux, and I can try it on windows, but we should try it on the mac.
(Assignee)

Comment 27

19 years ago
Hey! The patch is right there, I've been waiting for ages for the module owners
to review it (They DO get this bug report e-mailed to them, right?). All I need
is some guy to test it on the mac, and then I can ask brendan to check it in. I
have complete confidence in this code, I stepped through it literally 25 times.

Second of all, this bug causes DATA LOSS (I should know, I lost 75 of my e-mail
messages when it bit me in march). This NEEDS to be in nsbeta3. It needed to be
in nsbeta1. We're not Microsoft (are we?) There's no excuse for our software to
lose potentially important e-mail
messages.

Third of all, it would help if module owners would make a daily search for bugs
with the keyword "patch" on them in their module. The netscape PDT is being so
restrictive about marking bugs nsbeta3 that you're only going to get some (VERY
important) things fixed through patches from the outside world. If these patches
don't make it in, netscape 6 is going to (bluntly) suck.

Adding "dataloss" keyword to bug and clearing nsbeta3- for reconsideration
(after all, the fix is RIGHT THERE, just review it and check it in!)
Keywords: dataloss
Whiteboard: [nsbeta3-][nsbeta2-] [FIX IN HAND] → [nsbeta2-] [FIX IN HAND]

Comment 28

19 years ago
I agree this should be nsbeta3+, and again, Jeff is on sabbatical, along with a
couple other people, and those of us not lucky enough to take a vacation have
been scrambling to cover for them; hence the lack of response. The main reason I
wanted this tested on the mac is that GetDiskSpaceAvailable is notoriously
platform dependent. I'm sure brendan or waterson would be happy to approve this.
Sorry for the delay, i was sick. Now, I am working on it and so far I found two compile errors with the way you 
manage PRInt64 variables which on Mac are in fact a struct of two int. I will attach the new patch when I am done 
testing it...
Here is the two modifications I add to to in order to compile on Mac:

1) replace:
				PRInt64 mailboxSpaceLeft = 0;
    by:
				PRInt64 mailboxSpaceLeft = LL_Zero();

2) replace:
             	// The big if statement
             	if ((m_totalDownloadSize + (PRInt64) EXTRA_SAFETY_SPACE)
                 	> mailboxSpaceLeft )
    by:
            	// The big if statement            	
            	PRInt64 llResult;
            	PRInt64 llExtraSafetySpace;
            	PRInt64 llTotalDownloadSize;
            	LL_I2L(llExtraSafetySpace, EXTRA_SAFETY_SPACE);
            	LL_I2L(llTotalDownloadSize, m_totalDownloadSize);
            	
            	LL_ADD(llResult, llTotalDownloadSize, llExtraSafetySpace);
            	if (LL_CMP(llResult, >, mailboxSpaceLeft))            	
The patch works fine on Mac except than when my disk is full (yes, I tried this case), the function GetMsg doesn't 
download the messages and return an error but the User is not notified of the problem! No alert, no status, just 
looks like GetMsg doesn't work or you don't have new messages. 
(Assignee)

Comment 33

19 years ago
First of all, I'm GLAD that we tested this on mac first. It would have been very
embarrasing to be the one who broke the build on mac.

ducarroz: Could you e-mail me and give me a quick explanation about how you use
LL_I2L on mac, and what I need to do differently when I do mathmatical
operations? I was completely clueless of the compile error over in Linux land.
(Either that, or give me a good url where all of this is explained)

Basically, you're saying that PRint64 = struct { int a; int b } on Mac?
(Yes, I know that isn't the correct typedef syntax, It's too early in the
morning for that).

Anyways, this patch WON'T pop up a dialog box when the Out of disk space
condition is detected. That needs to be done by the procedure that calls
nsIPop3Sink (and I don't quite know what object that is yet).

As I said in an earlier comment:

"Now we need a procedure that will handle the out of disk space error
appropriately. Right now, the mail download will simply abort if there isn't
enough disk space, and a debug message will be printed to the console saying
that there isn't enough disk space. Of course, we should pop up a dialog
box explaining why the mail download has been aborted."

I'm sure this is VERY easily done, as soon as I can find:
a) The procedure that is supposed to handle MK_POP3_OUT_OF_DISK_SPACE.
b) The xul file that contains all the dialog definitions.

I'll look at finding these things when I get home this evening (I'm in New
Zealand).

In the meantime, we should apply this patch to the main tree, even without the
dialog box handler present, because this is a HORRIBLE dataloss bug. Which of
the two outcomes would you prefer if a user runs out of disk space?

1) The message download aborts, the user thinks that he has downloaded all of
his messages even though he really hasn't, and the remaining messages sit on the
pop3 server until he realizes that mozilla suddenly isn't downloading any more
messages and either thinks to check his debug console (where it says "Out of
disk space! Raising error!"), or he comes on #mozilla and asks "Why isn't it
downloading any new messages?". Anyways, on Windows (since this will probably be
located on his "C:" drive, he's going to realize that he's run out of disk space
soon enough).

Or

2) (What CURRENTLY happens): Mozilla appears to download ALL of the messages
located on the server (and DELETES them from the server). The user has no idea
that something is wrong until he clicks on a message header from a message that
was lost due to no disk space and gets an "Unknown error" dialog. THOSE MESSAGES
ARE LOST FOREVER! Especially now that we have "biff" capability, and the ability
to automatically check messages, this bug is DEADLY! Mozilla could run on a
computer all day, happily automatically losing ALL of the e-mail messages that
were sent to that user that day (and maybe the next day).

(3) Mozilla detects the out of disk space condition and pops up a dialog box
telling the user. We'll get that as soon as I find those objects, or ben helps
me out a bit. :P)

So PLEASE check this in, even without the dialog box handler (It will follow in
the next few days). We NEED to get rid of this HORRIBLE data-losing bug!
(Assignee)

Comment 35

19 years ago
I should add that my earlier patch obviously compiled for me on Linux and Windows.
And THANK YOU, ducarroz, for fixing my patch so it does compile on Mac :-)

We might want to write up a quick page on what we need to do to make sure that
mac builds don't break...
removed keyword review as this patch has been reviewed. R=bienvenu,ducarroz.

jce2, do you want me checkin the fix for you?
Keywords: review
a=brendan@mozilla.org -- who is gonna make a dialog or status message alert the
user to the problem?  Is that this bug, with another patch forthcoming, or is it
another bug?

Please use diff -u format for patches.

The prlong.h macros, for which I'm originally to blame, are necessary in Mozilla
for any PR*Int64 arithmetic or bitwise-logical operations.  This is "known" but
probably underdocumented.  There are other platforms than Mac that lack native
compiler support for 64-bit integers.

/be
I have filled bug 49868 about the missing error message when the disk is full.
(Assignee)

Comment 39

19 years ago
Yes, please check the fix in for me (I don't have cvs write access). (Maybe I
should start the process of getting write access).

I think we should create another bug for popping up the dialog, since this bug
already has enough descriptions added to it, and the other patch should go under
another component anyways.

And yes, those macros are WELL underdocumented. I had no idea about them.
I will check it in a soon the tree open... and I have already open a new bug for 
the error message (bug 49868).
jce2: out of curiosity, and maybe it'll do some good: what are the doc(s) that
you in fact have perused, and in which some statement about using prlong.h's
macros as well as its types would have done some good?  I'd genuinely like to
know, but I'm also skeptical -- and I feel that if you #include "prlong.h" and
use its typedef names, you should investigate their operational semantics a bit
more -- say by reading that header file.

/be
Fixed and checked in
Status: ASSIGNED → RESOLVED
Last Resolved: 19 years ago
Resolution: --- → FIXED

Comment 43

19 years ago
OK, I'm very confused by all the conversation in this bug and am not certain
what everyone expected the final fix to be...

Using the oct10 branch commercial build with linux rh6.0:
POP user without "leave on server" enabled:
1. Does Get Msg when out of disk space, the header is received and displayed in
the thread pane.
2. When user selects the message they get the "Unknown Error" alert, which tells
them nothing (I know there's a futured bug about the text, but 49868). The body
is not downloaded.
3. User exits and re-launches, logs in to server, gets messages.  The header(s)
previously downloaded which hit out of space condition are never seen again,
even when disk space is freed and other new messages are received.

POP user with "leave on server" enabled:
All of the above happens, but once disk space is freed, the message is
gotten/recieved again and body is downloadable.

Using NT 4.0 with today's commercial branch:
User never sees any problem, never gets the header downloaded, It just appears
that Get Msg isn't working or there aren't any messages.  Once disk space is
freed, they will get the message(s) whether or not "leave on server" was enabled
or disabled.

So, I haven't tried mac yet (BIGer disk to fill), but I assume from ducarroz'
testing when he tested the fix that mac should work like NT.

I am reopening this based on the Linux case -- data loss in the event that you
didn't leave the messages on the server.  I still think this is crappy that the
user (who would never look at a console even if there was one appearing in the
regular user world) doesn't know what is up until they discover in some other
way that the disk is full... that argument is covered in the dialog bug 49868.

I am also going to add keyword rtm so that the PDT team puts their two cents in
(again). Cleared the [FIX IN HAND] text from whiteboard, too.
Status: RESOLVED → REOPENED
Keywords: relnote2 → rtm
Resolution: FIXED → ---
Whiteboard: [nsbeta2-] [FIX IN HAND] → [nsbeta2-]

Updated

19 years ago
Severity: blocker → critical

Updated

19 years ago
Keywords: relnoteRTM

Comment 44

19 years ago
mail team, this is a dataloss bug on Linux.  Can anyone give a hand and see why
Linux didn't get fixed along with the other platforms?
Whiteboard: [nsbeta2-] → [nsbeta2-][rtm need info]
(Assignee)

Comment 45

19 years ago
I'm sorry. I've tried and I've tried and I've tried, and I can't reproduce the
data-loss described in this bug since I checked in my fix many months ago.

(Assignee)

Comment 46

19 years ago
Is it possible that "GetDiskSpaceAvailable" is broken in some cases and not
broken in other cases?

(Assignee)

Comment 47

19 years ago
I repeat, I can not reproduce what laurel has commented about. I have tested
this in two ways:

1) Run mozilla in a debugger, set a break point to right after
GetDiskSpaceAvailable() is called and changed the value to less then what
it needs to download, exactly what it needs to download, slightly over the
amount it needed to download, and zero, and a negative number. Results (as
expected): Doesn't download, downloads, downloads, doesn't download, doesn't
download.

2) I filled up a partion with junk, then mapped my mail file to that test
partition. It wouldn't download, and the message said "Not enough disk space".

Therefore, I can't reproduce this bug at all. When it DOES happen, it MUST be a
failure of the GetDiskSpaceAvailable procedure (as shown by my testing of #1).

Therefore, this bug should drop off of the RTM radar, unless it can be
reproduced, and even then, the problem is probably (most definately) with
GetDiskSpaceAvailable and not GetMsg.
Whiteboard: [nsbeta2-][rtm need info] → [nsbeta2-][rtm need info] Cannot reproduce.
(Assignee)

Comment 48

19 years ago
I'm taking this to rtm- since I own the bug.
Whiteboard: [nsbeta2-][rtm need info] Cannot reproduce. → [nsbeta2-][rtm-] Cannot reproduce since checkin.

Comment 49

19 years ago
I read somewhere (I guess it was in another bug) that on some Unix flavor, if
the user has a quota, GetDiskSpaceAvailable returns the amount of physical disk
space available, not the amount of space available in your quota. So, the only
way to really guarantee that you have enough disk space available is to actually
try to allocate it on disk. I'll go try to find that bug. I'm not sure if that's
what's going on here. I don't know how Laurel recreated the out of disk space
situation.
"Low disk space" relnote...

Gerv
Whiteboard: [nsbeta2-][rtm-] Cannot reproduce since checkin. → [nsbeta2-][rtm-] Cannot reproduce since checkin. relnote-user
(Assignee)

Comment 52

19 years ago
LAST CALL to everybody:

Can anyone come up with any situation other than running out of disk quota that
causes loss of data like this? If not, then this bug should be closed, and
another one filed, because the bug is then in GetDiskSpaceAvailable, and not GetMsg.
I saw the described behavior on a late December build when running out of disk
space on an x86 linux 2.2 system.  No quotas or anything like that.  I wound up
with an InBox that acted super-stupid when I was trying to file or delete
messages.  The Inbox mbox file wound up with a segment that looks like this:

0000000   F   r   o   m       -       S   a   t       D   e   c       3
0000020   0       1   3   :   3   8   :   0   3       2   0   0   0  \n
0000040   X   -   U   I   D   L   :       5   e   0   0   d   8   c   a
0000060   6   9   7   b   6   f   a   9   0   9   0   b   1   5   2   8
0000100   f   8   c   c   9   4   7   2  \n   X   -   M   o   z   i   l
0000120   l   a   -   S   t   a   t   u   s   :       0   0   0   1  \n
0000140   X   -   M   o   z   i   l   l   a   -   S   t   a   t   u   s
0000160   2   :       0   0   0   0   0   0  \0  \0  \0  \0  \0  \0  \0
0000200  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
0075000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0   F   r   o
0075020   m       -       T   u   e       J   a   n           2       1
0075040   9   :   4   6   :   3   3       2   0   0   1  \n   X   -   U
0075060   I   D   L   :       1   c   5   6   4   2   3   8   e   d   7
0075100   f   9   8   1   7   d   2   e   9   b   c   e   8   b   3   e
0075120   a   f   8   8   4
0075125

Notice the line numbers in the od output - a whole lot of zeros.  It looks like
something got smacked trying to write the X-Mozilla-Status2: line.

Hope that helps.
(Assignee)

Comment 54

18 years ago
Hmm...maybe another procedure other then GetMsg decided to write out status
information, but couldn't.

Comment 55

18 years ago
Only GetMsg grows the size of the file (except for copying/moving a msg into a
folder).
(Assignee)

Updated

18 years ago
Blocks: 55814
(Assignee)

Comment 56

18 years ago
Blowing away nsbeta2- and rtm- since those are ancient.

Bill: Do you remember if you did any copying or moving messages in or out of
that folder before this happened?
Whiteboard: [nsbeta2-][rtm-] Cannot reproduce since checkin. relnote-user → Cannot reproduce since checkin. relnote-user

Updated

18 years ago
Keywords: mailtrack

Updated

18 years ago
Target Milestone: M18 → ---

Updated

18 years ago
Keywords: nsenterprise

Comment 57

18 years ago
I'm sorry if this'll sound stupid, but if no mail is deleted on the server
_before_ we know for sure it's been written out correctly, how can data-loss
occur then? It would make a failing disk-space checking procedure (not saying it
is, just _if_, for whatever reason) a lot less painful...

In 0.9.2 Release notes:
[... If that option ("Leave messages on Server") was not enabled, those messages
are lost. ...]

This couldn't be possible anymore then, since a message would have to be written
fully before the 'dele' or whatever command. (the possibility of loosing mail -
anyhow, even by randomly pulling the modem's plug- sends a shiver down my spine,
 and I'm sure I'm not the only one ... :) ).
Removing nsenterprise nomination.
Keywords: nsenterprise

Comment 59

17 years ago
I'm not sure why this is still at an open state. 
I've verified with apr8-9 commercial trunk build for bug 49868.
Marking verified worksforme
Status: REOPENED → RESOLVED
Last Resolved: 19 years ago17 years ago
Resolution: --- → WORKSFORME

Comment 60

17 years ago
.
Status: RESOLVED → VERIFIED

Updated

17 years ago
Blocks: 49868
Product: MailNews → Core
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.