Closed Bug 794303 Opened 12 years ago Closed 11 years ago

Compact folder fails if local mail folder size is near 4GB(less than 4GB) or over 4GB, (Compact still fails, even after fix of bug 608449 and bug 462665)

Categories

(MailNews Core :: Backend, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 29.0

People

(Reporter: World, Assigned: ishikawa)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 6 obsolete files)

Compact folder fails if local mail folder size is near 4GB(less than 4GB) or over 4GB, even though bug 462665 was changed to FIXED after patch for bug 402392 was landed on Tb 12.0.

Build: Tb 15.0.1(en-US, Win-XP) official release.

(Case-1) Mail folder file size = 3.99 GB (4,286,726,750 bytes)
Each mail size=16MB (255 x 16MB mails + some small mails which is deleted)
( 4,286,726,750 bytes == 3GB + 1016 MB + 144 KB + 606 bytes) 
( 4 GB == 4,294,967,296 bytes, so actually less than 4GB, less than 4GB-4MB )

nstmp file size when error dialog was shown == 2.34 GB (2,419,883,776 bytes)
( 2,419,883,776 bytes == 2 GB + 259 MB + 799 KB + 768 bytes )

(Case-2) Mail folder file size = 4.39 GB (4,722,786,304 bytes)
Each mail size=4MB (many 4MB mails, a mail at offset=0 is deleted)
Due to bug 789679, file size larger than 4GB-4MB can't be generated by Tb, so mail folder file is manually created using small Script.

nstmp file size when error dialog was shown == over 2GB, far less than 3GB.

Note-1:
In any case, after "OK" to error dialog, nstmp/nstmp.msf was deleted.
Note-2:
In any case, "auto/internal rebuild-index by manual .msf deletion" worked pretty well, and no problem occurred in mail viewing, mail copy/move(except bug 789679) etc. "Order received" column value was shown as expected.
Blocks: 462665
Severity: normal → major
See Also: → 789679
My bug 793865 is one possible reason for this, but I would have to investigate to make sure all of the issues are sorted.
Depends on: 793865
WADA, what is the error dialog you mention in the description? What does it say?
(In reply to :aceman from comment #2)
> WADA, what is the error dialog you mention in the description? What does it say?

I don't remember detail, but alert like one which says "Error occured".
QA Contact: ishikawa
Off by one field :)

Chiaki identified one possible problem in compaction code, in bug 789679 comment 64:

ISHIKAWA, chiaki 2013-03-17 14:15:24 CET 
I saw the following messages dumped in the console terminal :

--DOMWINDOW == 33 (0xa89c420) [serial = 41] [outer = (nil)] [url = about:blank]
WARNING: temp file not of expected size in compact: 'tempFileRightSize', file ../../../../mailnews/base/src/nsMsgFolderCompactor.cpp, line 432
WARNING: compact failed: 'msfRenameSucceeded', file ../../../../mailnews/base/src/nsMsgFolderCompactor.cpp, line 488
++DOMWINDOW == 34 (0x96c18190) [serial = 44] [outer = 0xa90b890]

Line 432 of nsMsgFolderCompactor.cpp: 

  if (NS_SUCCEEDED(rv))
    rv = cloneFile->GetFileSize(&fileSize);
  bool tempFileRightSize = (fileSize == m_totalMsgSize);
* NS_WARN_IF_FALSE(tempFileRightSize, "temp file not of expected size in compact");

So (filesize != m_totalMsgSize) somehow. But why?

I looked for m_totalMsgSize:

grep -n  -e m_totalMsgSize *.[chp]* /dev/null
nsMsgFolderCompactor.cpp:272:  m_totalMsgSize = 0;
nsMsgFolderCompactor.cpp:431:  bool tempFileRightSize = (fileSize == m_totalMsgSize);
nsMsgFolderCompactor.cpp:1156:    m_totalMsgSize += msgSize;
nsMsgFolderCompactor.h:65:  uint32_t m_totalMsgSize;  <=== AGAH!!!

OK, it seems that we missed increasing the size of m_totalMsgSize to uint64_t?
Assignee: nobody → ishikawa
QA Contact: ishikawa
Depends on: 789679
OS: Windows XP → All
Hardware: x86 → All
I am running Debian GNU/Linux 32-bit version.

Compaction seems to work with the one line change of 
uint32_t -> uint64_t m_totalMsgSize in addition to :aceman's patch reported elsewhere 
(Of course, we need more testing, but it is encouraging.)

Local build with the one line change to declare m_totalMsgSize as
uint64_t in addition to :aceman's patches, compiled clean.

(However, it did not compile on the tryserver.  Maybe the system-wide
build structure change is not properly propagated to c-c source tree
and m-c subtree used by TryServer yet. Or I am simply using it
incorrectly.)

Short Summary: With the one line patch, once we get 4+ GB Inbox
folder, we can compactify the e-mails after we delete (or move) enough
e-mail messages NEAR THE BEGINNING of the folder so that there
internal offset are within 32 bits, and then compact. 

NG: But now I realize that there are no guarantee that the offsets
used for the articles near the beginning may not mix and match (offset
module (2^32)) of articles toward the end of large Inbox folder (> 4GB
size). So we SHOULD NOT MAKE a large Inbox folder (>4GB - 10MB, say) IN
THE FIRST PLACE (until internal index is all 64 bits).

That said,. here is a simple test I performed for compacting after
the one-line patch.

1. Started new patched TB (uint32_t m_totalMsgSize -> uinit64_t
m_totalMsgSize)

2. Got the full Inbox warning again like before upon startup.

3. The extra column addon showed the following.
   unread Total Size
   8275   8299  4098MB

4. Now let me compact the Inbox folder.

I pushed the "Compact" button in the context menu.

The intermediate file seems to be created in the Mail directory.  (I
didn't set TMP, TMPDIR for this session run).

So I monitored the size of "nstmp" file for a minute or two

  /TEST-MAIL-DIR/Mail/127.0.0.1:
  ...
  -rw-rwx--- 1 mtest2 ishikawa 4296912495  Mar 17 11:55 Inbox
  -rw-r--r-- 1 mtest2 mtest2      3728173  Mar 18 19:52 Inbox.msf
  -rw-rwx--- 1 mtest2 ishikawa    5247690  Mar 17 11:51 Sent
  -rw-r--r-- 1 mtest2 mtest2         4964  Mar 17 20:56 Sent.msf
  -rw------- 1 mtest2 mtest2            0  Mar 17 12:02 Trash
  -rw-r--r-- 1 mtest2 mtest2         2649  Mar 17 20:56 Trash.msf
  -rw-rwxr-- 1 mtest2 ishikawa         25  Mar 17 11:29 msgFilterRules.dat
  -rw------- 1 mtest2 mtest2   2090680320  Mar 18 19:55 nstmp
  -rw-r--r-- 1 mtest2 mtest2            0  Mar 18 19:53 nstmp.msf
  -rw-rwx--- 1 mtest2 ishikawa         61  Mar 17 11:41 popstate.dat

  ... after a while ...

  -rw-rwxr-- 1 mtest2 ishikawa         25  Mar 17 11:29 msgFilterRules.dat
  -rw------- 1 mtest2 mtest2   3076878336  Mar 18 19:55 nstmp
  -rw-r--r-- 1 mtest2 mtest2            0  Mar 18 19:53 nstmp.msf

  ... after a while ...

  -rw-rwxr-- 1 mtest2 ishikawa         25  Mar 17 11:29 msgFilterRules.dat
  -rw------- 1 mtest2 mtest2   3472994304  Mar 18 19:55 nstmp
  -rw-r--r-- 1 mtest2 mtest2            0  Mar 18 19:53 nstmp.msf

  ... after a while ...

  -rw------- 1 mtest2 mtest2   3824140288  Mar 18 19:55 nstmp
  -rw-r--r-- 1 mtest2 mtest2            0  Mar 18 19:53 nstmp.msf

  ... after a while ...

  -rw------- 1 mtest2 mtest2   4078563328  Mar 18 19:56 nstmp
  -rw-r--r-- 1 mtest2 mtest2            0  Mar 18 19:53 nstmp.msf

  ... after a while ...

5. TB said "Done compacting" at the lower-left corner of the window pane.
Or did it for real?

The final size of Inbox is:

  -rw------- 1 mtest2 mtest2   4295876569  Mar 18 19:56 Inbox
  -rw-r--r-- 1 mtest2 mtest2      3649370  Mar 18 19:56 Inbox.msf

2^32=4294967296  

Inbox is now smaller, but  still larger than 4GB.  But come to think of it, I did not delete enough e-mail articles to make it much smaller yet.

6. I see the subject lines 3521, 3522, 3523 at the end of
   Inbox subject listing..

OK, let us remove articles with subject 3521, 3522, 3523.
I select subject line with 3523 (the last e-mail)
I hit delete.

This time the selected article showed up in the Trash.
(Unlike the last time I tested such deletion for the fist time 
after Inbox became full. That time, an article near the
start of Inbox got thrown into Trash instead :-(

7. Let us delete 3521 (skipped the now last one) and try to delete it.
Worked. 3521 went into Trash, and Inbox showed 3520, 3522.  (as
expected.)

8. Let us delete 3512, 3513, 3514, ..., 3520, 3522 (at the end.)  OK
the deleted one showed up in the trash. (as expected)

9. After checking the deletion of 10 e-mails at the end one at a time, I
deleted about 120 e-mail articles, and compacted again.

Now, extra column showed:
8154 8173     4035MB

The file Inbox intself became smaller than 2^32.

  -rw------- 1 mtest2 mtest2   4230613231  Mar 18 20:06 Inbox
  -rw-r--r-- 1 mtest2 mtest2      3599001  Mar 18 20:06 Inbox.msf

Now this is smaller than 2^32.

So this one line patch works with :aceman's other patches for
compaction of Inbox that is slightly larger 4GB.

The REAL REMAINING ISSUE ought to be why POP3 download allows the
Inbox to become larger than (4GB - 10MB) in the first place.

Going over 4GB still triggers the 32bit offset limit of internal
article indexing and if you try to remove an article at the offset (>
4GB) in larger than 4GB Inbox folder, I think we delete an article
near the beginning (because of the (offset mod (2 ^ 32)) is used for
indexing the internal database.

After a few more testings, I can propose the attached patch for final review.

TIA
(In reply to ISHIKAWA, chiaki from comment #5)
> One line patch to declare m_totalMsgSize uint64_t instead of uint32_t

This bug looks merely a "32bits integer" issue...

> The REAL REMAINING ISSUE ought to be why POP3 download allows
> the Inbox to become larger than (4GB - 10MB) in the first place.

If this bug will be resolved by the "One line patch", phenomenon of "larger than 4GB file is created by POP3 download even though '4GB-4MB-1MB' file size check" itself is never critical problem, because "Berkley Mbox file larger than 4GB" is already supported well by Backend.
Regardless of current "'4GB-4MB' file size check", "file larger than 4GB" will be generated upon download once the "'4GB-4MB' file size check" will be removed.
(In reply to ISHIKAWA, chiaki from comment #5)
> So this one line patch works with :aceman's other patches for
> compaction of Inbox that is slightly larger 4GB.
Great!

> The REAL REMAINING ISSUE ought to be why POP3 download allows the
> Inbox to become larger than (4GB - 10MB) in the first place.
Bug 640371.

> Going over 4GB still triggers the 32bit offset limit of internal
> article indexing and if you try to remove an article at the offset (>
> 4GB) in larger than 4GB Inbox folder, I think we delete an article
> near the beginning (because of the (offset mod (2 ^ 32)) is used for
> indexing the internal database.
Maybe bug 793865 so you do not need to fix it here as rkent is looking at this kind of issues.

> After a few more testings, I can propose the attached patch for final review.
I think go for it. I'll test the patch too (on linux) and give you feedback.
Chiaki, how do you know which emails are "at the beginning" of the folder? Is it the number in the "order received" field?
(In reply to :aceman from comment #8)
> Chiaki, how do you know which emails are "at the beginning" of the folder?
> Is it the number in the "order received" field?

I created the test e-mail messages received during POP3 session
with ascending numbers in the subject line.

But you are right. If I copy/move e-mail articles to Inbox from other
folders, these are appended to "Inbox" folder irrespective of
its Subject and Date field values.
So we can not tell reliably which articles are near the beginning of Inbox folder.

So we can only tell that some articles are near the beginning in only
in artificial setting. My observation is not that useful in real life...

TIA
(In reply to WADA from comment #6)
> (In reply to ISHIKAWA, chiaki from comment #5)
> > One line patch to declare m_totalMsgSize uint64_t instead of uint32_t
> 
> This bug looks merely a "32bits integer" issue...
> 
> > The REAL REMAINING ISSUE ought to be why POP3 download allows
> > the Inbox to become larger than (4GB - 10MB) in the first place.
> 
> If this bug will be resolved by the "One line patch", phenomenon of "larger
> than 4GB file is created by POP3 download even though '4GB-4MB-1MB' file
> size check" itself is never critical problem, because "Berkley Mbox file
> larger than 4GB" is already supported well by Backend.
> Regardless of current "'4GB-4MB' file size check", "file larger than 4GB"
> will be generated upon download once the "'4GB-4MB' file size check" will be
> removed.

Mbox larger than 4GB seems to be supported, all right.

But I am afraid the indexing of internal articles may still have
32 bit issues. (For example, when I tried to delete the last article in 4+ GB Inbox folder, somehow an article near the beginning of the Inbox folder gets
deleted and moved to Trash. The last article in Inbox disappeared completely
after compaction. So this is a serious data loss problem.
Bug 793865.

We seem to attack all these related problems in one batch.

TIA
(In reply to ISHIKAWA, chiaki from comment #10)
> (For example, when I tried to delete the last article in 4+ GB Inbox folder, 
> somehow an article near the beginning of the Inbox folder gets deleted and moved to Trash.

Mail of what MsgKey/Size was deleted when you requested delete on mail of what MgKey/Size?
MsgKey is shown in "Order Received" column.
  When IMAP, UID of mail (faked key/temporary UID, while Offline Copy)
  When Berkley, Offset of mail, except after filter move upon pop3 mail
                download(MsgKey of previous mail + 1)
  When non-Berkley(maildir), I don't know.

"32bits signed integer for Order Received column display" was already resolved on Win, but I don't know about problem on Linux, Mac OS X.
And, some code may not correctly distingush messageKey and messageOffset in Berkley Mbox, because clear separation of messageKey and messageOffset was recently done by 4GB support in Backend for pluggable message store.
Comment on attachment 726176 [details] [diff] [review]
One line patch to declare m_totalMsgSize uint64_t instead of uint32_t

Chiaki, can you create a try build with patches from bug 789679 and this patch so that WADA can try it out on Windows as the reporter of this bug? Thanks.

I was not successful in trying this out. I probably did not delete messages from the start of the file. So it always failed with the error Chiaki was seeing before the patch. Or I am doing something wrong.
(In reply to :aceman from comment #12)
> Comment on attachment 726176 [details] [diff] [review]
> One line patch to declare m_totalMsgSize uint64_t instead of uint32_t
> 
> Chiaki, can you create a try build with patches from bug 789679 and this
> patch so that WADA can try it out on Windows as the reporter of this bug?
> Thanks.
> 
> I was not successful in trying this out. I probably did not delete messages
> from the start of the file. So it always failed with the error Chiaki was
> seeing before the patch. Or I am doing something wrong.

I will try, but I have never created win binary before (locally and TryServer).
The last time I tried on the TryServer several days ago with a different patch, the patch failed. I suspect the end of line marker mismatch (CRLF vs LF). 
Also I am seeing build-related problems :-(
But I will try. Keep your fingers crossed. :-)
Aryx, could you please help Chiaki if he gets into any problems with the try builds?
m_totalMsgSize is used at following three lines only.
(i) Before start, initialize by ZERO.
http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#272
272   m_totalMsgSize = 0;
(ii) After copy of each mail, add msgSize.
http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#1156
1156     m_totalMsgSize += msgSize;
(iii) After copy of all messages, compare with actual nstmp file size.
http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#431
431   bool tempFileRightSize = (fileSize == m_totalMsgSize);
432   NS_WARN_IF_FALSE(tempFileRightSize, "temp file not of expected size in compact");

This size check logic was already applied when I tested(comment #0).
If m_totalMsgSize is cause of Compact failue of comment #0, all mails should be copied to nstmp and nstmp file size should be near 4GB when Compact fails.
However, Compact failure always occured at "2GB < nstmp size << 3GB".

Following is Process Moitor log for read/write of Archives and nstmp whe Archives is compacted.
(A) Before Compact
    Mbox=Archives :  prettiestName = Archives, sizeOnDisk = 12950 + 5955, expungedBytes = 5955
      messageKey =     0 + 5955, messageOffset =     0 + 5955, messageSize = 6995 
      messageKey =  6995 + 5955, messageOffset =  6995 + 5955, messageSize = 1024 
      messageKey =  8019 + 5955, messageOffset =  8019 + 5955, messageSize = 1194 
      messageKey =  9213 + 5955, messageOffset =  9213 + 5955, messageSize = 1552 
      messageKey = 10765 + 5955, messageOffset = 10765 + 5955, messageSize = 2185 
(B) After Compact
    Mbox=Archives :  prettiestName = Archives, sizeOnDisk = 12950, expungedBytes = 5955
      messageKey =     0, messageOffset =     0, messageSize = 6995 
      messageKey =  6995, messageOffset =  6995, messageSize = 1024 
      messageKey =  8019, messageOffset =  8019, messageSize = 1194 
      messageKey =  9213, messageOffset =  9213, messageSize = 1552 
      messageKey = 10765, messageOffset = 10765, messageSize = 2185 
(Process Monitor log)
> 00:05:08.5581226  15:13:46.0396938 C:\wada\@@@\Mail0\Archives  Read   IRP_MJ_READ               SUCCESS  Offset: 5,955, Length: 6,995  2124  3760
> 00:05:08.5647089  15:13:46.0462801 C:\wada\@@@\Mail0\nstmp     Write  IRP_MJ_WRITE              SUCCESS  Offset: 0, Length: 4,096  2124  1496
> 00:05:08.5660610  15:13:46.0476322 C:\wada\@@@\Mail0\Archives  Read   FASTIO_READ               SUCCESS  Offset: 12,950, Length: 1,024  2124  3760
> 00:05:08.5660786  15:13:46.0476498 C:\wada\@@@\Mail0\Archives         FASTIO_CHECK_IF_POSSIBLE  SUCCESS  Operation: Read, Offset: 12,950, Length: 1,024  2124  3760
> 00:05:08.5667581  15:13:46.0483293 C:\wada\@@@\Mail0\Archives  Read   FASTIO_READ               SUCCESS  Offset: 13,974, Length: 1,194  2124  3760
> 00:05:08.5667737  15:13:46.0483449 C:\wada\@@@\Mail0\Archives         FASTIO_CHECK_IF_POSSIBLE  SUCCESS  Operation: Read, Offset: 13,974, Length: 1,194  2124  3760
> 00:05:08.5671824  15:13:46.0487536 C:\wada\@@@\Mail0\nstmp     Write  FASTIO_WRITE   FAST IO DISALLOWED  Offset: 4,096, Length: 4,096  2124  1496
> 00:05:08.5674774  15:13:46.0490486 C:\wada\@@@\Mail0\nstmp     Write  IRP_MJ_WRITE              SUCCESS  Offset: 4,096, Length: 4,096  2124  1496
> 00:05:08.5682163  15:13:46.0497875 C:\wada\@@@\Mail0\Archives  Read   FASTIO_READ               SUCCESS  Offset: 15,168, Length: 1,552  2124  3760
> 00:05:08.5682331  15:13:46.0498043 C:\wada\@@@\Mail0\Archives         FASTIO_CHECK_IF_POSSIBLE  SUCCESS  Operation: Read, Offset: 15,168, Length: 1,552  2124  3760
> 00:05:08.5688427  15:13:46.0504139 C:\wada\@@@\Mail0\Archives  Read   FASTIO_READ               SUCCESS  Offset: 16,720, Length: 2,185  2124  3760
> 00:05:08.5688586  15:13:46.0504298 C:\wada\@@@\Mail0\Archives         FASTIO_CHECK_IF_POSSIBLE  SUCCESS  Operation: Read, Offset: 16,720, Length: 2,185  2124  3760
> 00:05:08.5692626  15:13:46.0508338 C:\wada\@@@\Mail0\nstmp     Write  FASTIO_WRITE  FAST IO DISALLOWED   Offset: 8,192, Length: 4,096  2124  1496
> 00:05:08.5692810  15:13:46.0508522 C:\wada\@@@\Mail0\nstmp            FASTIO_CHECK_IF_POSSIBLE  FAST IO DISALLOWED  Operation: Write, Offset: 8,192, Length: 4,096  2124  1496
> 00:05:08.5695545  15:13:46.0511257 C:\wada\@@@\Mail0\nstmp     Write  IRP_MJ_WRITE              SUCCESS  Offset: 8,192, Length: 4,096  2124  1496
> 00:05:08.5698808  15:13:46.0514520 C:\wada\@@@\Mail0\Archives         IRP_MJ_CLEANUP            SUCCESS    2124  1496
> 00:05:08.5706608  15:13:46.0522320 C:\wada\@@@\Mail0\nstmp     Write  FASTIO_WRITE              FAST IO DISALLOWED  Offset: 12,288, Length: 662  2124  1496
> 00:05:08.5706784  15:13:46.0522496 C:\wada\@@@\Mail0\nstmp            FASTIO_CHECK_IF_POSSIBLE  FAST IO DISALLOWED  Operation: Write, Offset: 12,288, Length: 662  2124  1496
> 00:05:08.5709513  15:13:46.0525225 C:\wada\@@@\Mail0\nstmp     Write  IRP_MJ_WRITE              SUCCESS  Offset: 12,288, Length: 662  2124  1496

As seen in log, each mails is read in buffer(perhaps 8KB), and data is written 4KB buffer, and 4KB buffer is continuously written to nstmp.
I guess "32bits signed integer" problem in "file weite to nstmp" used by Compact.
- 4KB buffer write is repeated
- nstmp file size exceeds 2GB => 32bits signed integer value==negative
- negative == I/O error, then file I/O is terminated.
- (a) Compact fails due to the unknown I/O error,
  or (b) Compact is aborted by file size check with m_totalMsgSize.

If Compact is successful in Try Server build, I think (a) is resolved by patch for bug 789679, and problem of (b) which occurs after the patch is resolved by "one liner patch" for this bug.
Blocks: 608449
Writing to nstmp is similar to;
  Create nstmp, Open nstmp with Write mode,
  While(buffer data available){ write 4KB buffer data to nstmp; }
  Close nstmp
Because total size of "data to be copied to nstmp" is larger than 2GB, and entire data is written during single open/close, "total written count" is greater than 2GB. 
And, during write operation, if SEEK is requested with "offset from start of file", and if seek position is 32bits integer, problem occurs when total written data size exceeds 2GB. In this case, "total written bytes" == "current file size", but this is not problem with "file size". This is problem with "offset in file". 

This is same as known problem in Object REXX Stream function.
  In Stream function for file I/O of Rexx, Rexx uses 32bits signed
  integer for SEEK position, so wrong seek position is returned
  if over 2GB, and absolute seek request to over 2GB is impossible.
  IIRC, relative seek request over 2GB was successful although wrong
  position is returned.
  In REXX, .stream Object class for I/O is implemented with 64bits
  signed integer, so no problem occurs with file larger than 2GB, 4GB.  

"32bits signed integer" problem in NSPR file I/O function?
Summary: Compact folder fails if local mail folder size is near 4GB(less than 4GB) or over 4GB, even though bug 462665 was changed to FIXED after patch for bug 402392 was landed on Tb 12.0. → Compact folder fails, if nstmp file size is larger than 2GB (Compact still fails, even though bug 462665 was changed to FIXED after patch for bug 402392 was landed on Tb 12.0.)
I think we'll definitely want a test for this bug. You should be able to base one on:

http://mxr.mozilla.org/comm-central/source/mailnews/local/test/unit/test_over2GBMailboxes.js

I'm not sure if we have one for compact about, but I think we might do (or I suspect it should be fairly easy to hook up).
OK, if this actually gets some review we could maybe make a test.
However the execution of the test may take several minutes. Is that acceptable? But probably better than have datalosses at >4GB.

But these is one problem. We actually do what we can in bug 789679 to prevent TB growing over 4GB (for now, while fixing this infrastructure to allow it to handle 4B+). So the test should not be able to do that via TB apis. Do you suggest to somehow create a 4GB+ file via filesystem operations (e.g. appending to the mbox file via shell commands)?
Flags: needinfo?(mbanner)
Blocks: 789679
No longer depends on: 789679
Removing my small protest against "closing bug of 'Remove the 4GB(MS Win)/2GB(*nix inxluding Mac OS X) mailbox limit' as FIXED" from this bug's summary, accordig to bug summary change of bug 462665.
Summary: Compact folder fails, if nstmp file size is larger than 2GB (Compact still fails, even though bug 462665 was changed to FIXED after patch for bug 402392 was landed on Tb 12.0.) → Compact folder fails, if nstmp file size is larger than 2GB (Compact still fails, even after fix of bug 608449 and bug 462665)
(In reply to :aceman from comment #19)
> OK, if this actually gets some review we could maybe make a test.
> However the execution of the test may take several minutes. Is that
> acceptable? But probably better than have datalosses at >4GB.

Yes, although re-using the existing test might help here.

> But these is one problem. We actually do what we can in bug 789679 to
> prevent TB growing over 4GB (for now, while fixing this infrastructure to
> allow it to handle 4B+). So the test should not be able to do that via TB
> apis. Do you suggest to somehow create a 4GB+ file via filesystem operations
> (e.g. appending to the mbox file via shell commands)?

Well we already have one 4GB test that creates an mbox > 4GB....
Flags: needinfo?(mbanner)
OK, I see there is http://mxr.mozilla.org/comm-central/source/mailnews/local/test/unit/test_over4GBMailboxes.js . (And one for 2GB). And it grows the file using FS operations (or stream, not proper mailbox APIs) as I said to do. And it seems to not catch that >4GB compact is not working for some cases. And that is it's main purpose. So yes, we can look at it and extend.
Checked Compact with Tb 17.0.3 on MS Win again.
(1) Before Compact : greater than 4GB due to bug 640371.
                   delete 100 mails at top (400MB)
MessageKey at bottom after POP3 download of (10 * 7MB mail).
    ( 7340184 bytes == 7 MB + 0 KB + 152 Bytes ) 
> Mail #       :  Property value of msgDBHder of each downloaded mail in Inbox
> 7MB-mail-#01 :  messageKey = 4288430080, messageOffset = 4288430080, messageSize = 7340184
>          4288430080 == 3 GB + 1017 MB + 784 KB + 0 Bytes < 0xFFFFFF00
>          4GB - 4288430080 == 6 MB + 240 KB + 0 Bytes
>          => Download is initiated. messageKey==messageOffset 
> 7MB-mail-#02 :  messageKey = 4288430081, messageOffset = 4295770264, messageSize = 7340184
>          0xFFFFFF00 < 4295770264 == 4 GB + 0 MB + 784 KB + 152 Bytes
>          => messageKey is incremeted by 1
> 7MB-mail-#03 :  messageKey = 4288430082, messageOffset = 4303110448, messageSize = 7340184
> 7MB-mail-#04 :  messageKey = 4288430083, messageOffset = 4310450632, messageSize = 7340184
> 7MB-mail-#05 :  messageKey = 4288430084, messageOffset = 4317790816, messageSize = 7340184
> 7MB-mail-#06 :  messageKey = 4288430085, messageOffset = 4325131000, messageSize = 7340184
> 7MB-mail-#07 :  messageKey = 4288430086, messageOffset = 4332471184, messageSize = 7340184
> 7MB-mail-#08 :  messageKey = 4288430087, messageOffset = 4339811368, messageSize = 7340184
> 7MB-mail-#09 :  messageKey = 4288430088, messageOffset = 4347151552, messageSize = 7340184
> 7MB-mail-#10 :  messageKey = 4288430089, messageOffset = 4354491736, messageSize = 7340184
(2) After Compact : Compact successfull. 3758 MB

Problem in Compact when "simply 2GB<nstmp<4GB-α" looks already resolved.
Change bug summary back to original. Sorry for spam.

Following is a part of bug 793865 comment #12 from :aceman.
> With this patch I could finally successfully compact a folder >4GB
> that was not compactable with only the patch in bug 794303.
> And the folder still stayed above 4GB as I did not remove enough messages. But that is fine.

Current problem in Compact looks;
- If file size after Compact exceeds 4GB(or 4GB-4MB-1MB, or around it),
  Compact fails.
- If messageKey overflow by bug 640371 exists, Compact fails,
  or Compact breaks mail folder.
Summary: Compact folder fails, if nstmp file size is larger than 2GB (Compact still fails, even after fix of bug 608449 and bug 462665) → Compact folder fails if local mail folder size is near 4GB(less than 4GB) or over 4GB, (Compact still fails, even after fix of bug 608449 and bug 462665)
Checked Compact with Tb 17.0.3 on MS Win:
   Folder size after Compact==4GB-4MB 
   => Compact is executed normally.
      If less than 4GB, Compact works well currenltly.
   => However, [CRLF] of last mail was truncated ater Compact,
      and mesasgeSize was reduced by 2 after Compact.
      No problem when digital signed mail or cyphered mail?

[Test Procedure]
(1) Initial Inbox
    Just 4MB mail x 1024 mails == Just 4GB
    Bottom of each mail (multipart/mixed mail, [CRLF]=0x0D0A) :
     ...[CRLF]
     --bdy--[CRLF]
     [CRLF]
(2) Before Compact : Delete one mail at offset=0
(3) Compact => Successful
Inbox file size after Compact :
>   4290772990 bytes     == 3 GB + 1019 MB + 1023 KB + 1022 Bytes
>   4GB-4290772990 bytes ==           4 MB +    0 KB +    2 Bytes <= [CRLF] at bottom of last mail is truncated
nsIMsgFolder properties of Inbox
>    Mbox=Inbox :  URI = mailbox://soarex%40ops.dti.ne.jp@pop.ops.dti.ne.jp/Inbox,
>       prettiestName = Inbox, expungedBytes = 0, sizeOnDisk = 4290772990, isSpecialFolder = ( Inbox )
nsIMsgDBHdr properies of mails at bottom part after Compact
>  Extracted msgHdr data of all messages in current View of Inbox
>   messageOffset     Key, Offset, Size
>    Before Compact    After Compact
>    4269801472+4MB : messageKey = 4269801472, messageOffset = 4269801472, messageSize = 4194304
>    4273995776+4MB : messageKey = 4273995776, messageOffset = 4273995776, messageSize = 4194304
>    4278190080+4MB : messageKey = 4278190080, messageOffset = 4278190080, messageSize = 4194304
>    4282384384+4MB : messageKey = 4282384384, messageOffset = 4282384384, messageSize = 4194304
>    4286578688+4MB : messageKey = 4286578688, messageOffset = 4286578688, messageSize = 4194302  <= two bytes(CRLF at bottom) is truncated
After test of comment #25, delete one mail at offset=0, Compact 
nsIMsgDBHdr properies of mails at bottom part after Compact
>   messageOffset     Key, Offset, Size
>    Before Compact    After Compact
>    4269801472+4MB : messageKey = 4269801472, messageOffset = 4269801472, messageSize = 4194304
>    4273995776+4MB : messageKey = 4273995776, messageOffset = 4273995776, messageSize = 4194304
>    4278190080+4MB : messageKey = 4278190080, messageOffset = 4278190080, messageSize = 4194304
>    4282384384+4MB : messageKey = 4282384384, messageOffset = 4282384384, messageSize = 4194304
"Truncate of [CRLF] at bottom" was "mail end at 4GB" only phenomenon.
Perhaps "32bits envelopePos" relevant phenomenon. (next data of the last [CRLF] exceeds 4GB boundary, so offset wraps to ZERO.)
(In reply to WADA from comment #25)
> Inbox file size after Compact :
> >   4290772990 bytes     == 3 GB + 1019 MB + 1023 KB + 1022 Bytes
> >   4GB-4290772990 bytes ==           4 MB +    0 KB +    2 Bytes <= [CRLF] at bottom of last mail is truncated
> nsIMsgFolder properties of Inbox
> >    Mbox=Inbox :  URI = mailbox://soarex%40ops.dti.ne.jp@pop.ops.dti.ne.jp/Inbox,
> >       prettiestName = Inbox, expungedBytes = 0, sizeOnDisk = 4290772990, isSpecialFolder = ( Inbox )
> nsIMsgDBHdr properies of mails at bottom part after Compact
> >  Extracted msgHdr data of all messages in current View of Inbox
> >   messageOffset     Key, Offset, Size
> >    Before Compact    After Compact
> >    4269801472+4MB : messageKey = 4269801472, messageOffset = 4269801472, messageSize = 4194304
> >    4273995776+4MB : messageKey = 4273995776, messageOffset = 4273995776, messageSize = 4194304
> >    4278190080+4MB : messageKey = 4278190080, messageOffset = 4278190080, messageSize = 4194304
> >    4282384384+4MB : messageKey = 4282384384, messageOffset = 4282384384, messageSize = 4194304
> >    4286578688+4MB : messageKey = 4286578688, messageOffset = 4286578688, messageSize = 4194302  <= two bytes(CRLF at bottom) is truncated

This seems to be a good find. Please keep the testcase at hand, we should look at it.
FYI.
Reason why "problem when 2GB=<nstmp<4GB" does't occur in Tb 17 may be bug 215450 which was fixed on 2012-08-13(Target Milestone: Mozilla 17).
See bug 789679 comment #31.
For disk space check before Compact.
totalMsgSize looks incremented at EndCopy of each mail. And string of "space" is not found in nsMsgFolderCompactor.cpp. 
Disk space check is not done before start Compact?
Checked with try server build with patch from bug 793865.
> http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/
> archaeopteryx@coole-files.de-1359bd2452f9/
Following is copy of bug 640371 comment #44.

(1) Inbox : after test of bug 640371 comment #42
(2) Delete one mail at offset=0 (4194304 bytes == just 4 MB)
    File size after Compact > 4GB
(3) Do Compact
    => All mails including offset>4GB mail are normally shifted
       4194304 bytes.
    sizeOnDisk = 4420954336
    4420954336 = 4 GB + 120 MB + 154 KB + 224 Bytes

Position switch from "messageKey = messageOffset" to "messageKey = highest messageKey + 1" :
Last message of "messageKey = messageOffset"
> messageKey = 2143289344, messageOffset = 2143289344, messageSize = 4194304
> 2143289344 = 1 Giga + 1020 Mega + 0 Kilo + 0 / 2143289344 = 1 GB + 1020 MB + 0 KB + 0 Bytes
Message where starts "messageKey = highest messageKey + 1"
> messageKey = 2143289345, messageOffset = 2147483648, messageSize = 4194304
> 2143289345 = 1 Giga + 1020 Mega + 0 Kilo + 1 / 2147483648 = 2 GB +    0 MB + 0 KB + 0 Bytes
Current logic;
  messageOffset < 2GB   => messageKey = messageOffset
  2GB <= messageOffset  => messageKey = highest messageKey + 1
(In reply to WADA from comment #29)
> For disk space check before Compact.
> totalMsgSize looks incremented at EndCopy of each mail. And string of
> "space" is not found in nsMsgFolderCompactor.cpp. 
> Disk space check is not done before start Compact?

How can we know how much space is needed for whole compact?
(In reply to :aceman from comment #31)
> (In reply to WADA from comment #29)
> > For disk space check before Compact.
> > totalMsgSize looks incremented at EndCopy of each mail. And string of
> > "space" is not found in nsMsgFolderCompactor.cpp. 
> > Disk space check is not done before start Compact?
> 
> How can we know how much space is needed for whole compact?

Twice the existing usage should not be unreasonable.  ie the files will only get smaller. 

Anyone with a drive less that 4Gb free needs more help than compacting can give.
If somebody has 4GB folder and deletes 2GB messages from it, we can't request 4GB free space. But hey, we actually know, how much is in those deleted messages (expungedBytes attribute) so we can ask for size - expungedBytes. But it would be strange if such a check would not be already there. WADA if there is not yet a bug for this please make one. After you have tested that if you have stable disk space (nothing else is writing to it) the compact starts even when there clearly is not enough space and then aborts.
(In reply to :aceman from comment #33)
> But hey, we actually know, how much is in those deleted messages 
> (expungedBytes attribute) so we can ask for size - expungedBytes.

expungedBytes is not mandatory.
> Assume msgFolder contains currently processed folder object.
> var required_size=0;
> try{
> var Enumerator=msgFolder.messages ; // if no messages, Enumerator may be undefined
> var cnt_max = 9999999999 ; // avoid infinite loop by my coding error
> for(var cnt=1; Enumerator && Enumerator.hasMoreElements() && cnt<cnt_max; cnt++)
> {
>    var msgDBHdr = Enumerator.getNext().QueryInterface(Components.interfaces.nsIMsgDBHdr) ;
>    if( true==msgDBHdr.IMAPDeleted ){ continue; }
>    required_size ++= = msgDBHdr.messageSize;
> }
> }catch(e){ alert("Compact Error"+"\n"+e); }
Current "m_totalMsgSize += msgSize; in EndCopy" is for;
(a) surely detect 4KB buffer write error during "copy a message". 
But it does do next at same time.
(b) count increase of size by adding X-Mozilla-Status:, X-Mozilla-Status2: and by adding/expanding X-Mozilla-Keys:, and by other size changes during Compact.
So, "m_totalMsgSize(==final SizeOnDisk value) - required_size" is usable to inform taken actions by Compact to Activity Manager.

> But it would be strange if such a check would not be already there.
> WADA if there is not yet a bug for this please make one.

You can't find code who checks required disk space before start of Compact? If so, I'll open bug for it.
(In reply to WADA from comment #34)
> (In reply to :aceman from comment #33)
> > But hey, we actually know, how much is in those deleted messages 
> > (expungedBytes attribute) so we can ask for size - expungedBytes.
> 
> expungedBytes is not mandatory.

I actually suggest very careful review of any data from the MSF file that is to be used.  Get satisfaction has a regular stream of people with corrupt MSF files with no apparent commonality.  If the message index can be out of symc as these problems evidence, the entire contents of this little database is of questionable value and really needs to be evaluated as suspect before anything else.
Perhaps generate another MSF and compare them to determine validity.
(In reply to :aceman from comment #33)
> After you have tested that if you have stable disk space (nothing else is writing to it)
> the compact starts even when there clearly is not enough space and then aborts.

I filled my VALLUABLE(only 9.5GB free!) HDD by 4GB, 2GB, 2GB files(each contains only "hello" at end, all other is 0x00 padded by MS Win). Free space is 1.5GB. Then deleted one 4MB mail at offset=0 of more than 4GB Tb 17.0.2's Inbox, and requested Tb to Do Compact. 
As expected, Tb started Compact blindly, and Compact failed with next dialog because of HDD full as expected.
> !
> The folder 'Inbox' could not be compacted because writing to folder failed.
> Verify that you have enough disk space, and that you have write privileges to the file system, then try again.
>    [ OK ]

I'll open bug for "Free disk space check before start Compact".
(In reply to Matt from comment #35)
> I actually suggest very careful review of any data from the MSF file that is
> to be used.  Get satisfaction has a regular stream of people with corrupt
> MSF files with no apparent commonality.  If the message index can be out of
> symc as these problems evidence, the entire contents of this little database
> is of questionable value and really needs to be evaluated as suspect before
> anything else.
> Perhaps generate another MSF and compare them to determine validity.

Generating a new msf is almost as slow as actually attempting to do a compact. With the difference that it does not take so much disk space.
So I think this is not much of a help.
Blocks: 854791
No longer blocks: 854791
Blocks: 854798
FYI. I've opened two bugs relevant to Compact.
  bug 854791 : free disk space check before Compact
  bug 854798 : don't alter messageKey by Compact.
               messageKey change should be done by rebuild index only
Attachment #726176 - Flags: review?(irving)
Attachment #726176 - Flags: review?(irving) → review+
Chiaki, can you please include a proper commit message into the patch (instead of "uint32_t m_totalMsgSize; -> uint64_t") so that this can be checked in?
Status: NEW → ASSIGNED
Just as requested.
This is the same content, and I only edited the git-like comment block.

TIA

PS: I thought we don't need a review here, do we?
Sorry for a quick repost.
I mistyped the type name in the comment. Fixed these.

TIA
Attachment #732879 - Attachment is obsolete: true
No, you don't need to request re-review for fixing the patch comment; you can set the r+ flag yourself and note "carrying forward r=irving" in the bugzilla comment.

Make sure you set "[leave open]" in the Whiteboard field for the bug when you request check-in, unless this is the only change needed for this bug.
(In reply to :Irving Reid from comment #42)
> Make sure you set "[leave open]" in the Whiteboard field for the bug when
> you request check-in, unless this is the only change needed for this bug.

We are not sure yet. For Ishikawa this patch was enough to get compact working. For me it needed also the other patches to get it going (I probably got hit by wrapping of msgkey as I had more than 256 messages above offset 0xFFFFFF00).

So we don't know, but there is no pending work in this bug we would know about.
But let's keep it open for further observations.
Keywords: checkin-needed
Whiteboard: [please leave open]
Comment on attachment 732882 [details] [diff] [review]
One line patch to declare m_totalMsgSize uint64_t instead of uint32_t (with proper checkin message) [check-in in comment 45]

Carrying over r=irving.
Attachment #732882 - Attachment description: Fixed typos: (Just changed the comment for the patch) One line patch to declare m_totalMsgSize uint64_t instead of uint32_t → One line patch to declare m_totalMsgSize uint64_t instead of uint32_t (with proper checkin message)
Attachment #732882 - Flags: review+
Attachment #726176 - Attachment is obsolete: true
Thank you. Let's hope the whole patch set will soon be in the repository.
Now that I have my test environment working (gcc 4.7 and "ld" blew up exhausting memory during link. Now "gold" fixes the problem.), I hope to track 4GB boundary issues more closely.
You could disable a lot of stuff in TB build, that is not needed so that ld does not exhaust memory. Mine does peak at only 2.7GB.
(In reply to :aceman from comment #47)
> You could disable a lot of stuff in TB build, that is not needed so that ld
> does not exhaust memory. Mine does peak at only 2.7GB.
Maybe you are right, but I have been debugging TB using valgrind/memcheck, and since
I found a few issues using DEBUG BUILD output, I have enabled it: valgrind/memcheck+DEBUG+testing since I do local testing, and the whole thing became rather large. Believe it or not, without all these, I may not have found subtle bugs
caused by the bad interaction of EXTERNAL libraries, which made using TB/FF very difficult for Japanese/Korean/Chinese users.

Anyway, now that I found the working combination of compiler/linker("gold") and know the subtle difference of compiler revisions on my PC and Trybuilder, I should be able to test 4GB patches more easily (except that I was sidetracked with a few more bugs on the side. Bug fixing of the current TB seems a never-ending story just for using a mail-client peacefully :-) 

Seriously, if you can list the set of latest 4GB patches somewhere, 
I will appreciate it very much. Maybe one of the 4GB-related bugzilla?

To be honest, I was surprised that maybe I am one of the handful people compiling TB using GCC 4.7 locally, and experienced this memory exhaustion with the stock "ld". 
This I am saying because I could not find any blogs, mail-list postings, etc. using google when it hit me.
That is worrisome from the viewpoint of supposed community support of TB. There dont seem to be enough people compiling the code locally.

We need to make local compilation a little easier so that more people will experiment with TB code, and fix bugs there.

For example, this large libxul.so needs to be split, I think, but this is another bug/wish. (It seems from the news post about comm-central build change, that all the mailer/newsreader specific objects are dumped into lixul.so to be linked with m-c bits?! No wonder it gets big if my reading is correct.)

Again, the 4GB patches are really appreciated.

TIA
Of course, I have DEBUG and such stuff enabled too. I meant to disable features. Like video, audio, printing, plugins, webrtc, GAMEPAD(!!!), those are not needed to make useful TB changes. Of course, if you want to debug those, you need them. But those are mostly in the Gecko core/toolkit so you could as well not build TB but Firefox if you want those.

I'm also with GCC 4.7 on linux. The linker memory problems are known and there are plans to fix it. But you need to search into Gecko Core bugs, not TB.

The latest state of 4GB patches, that can be built it in https://bugzilla.mozilla.org/show_bug.cgi?id=793865#c52
(In reply to :aceman from comment #49)
> Of course, I have DEBUG and such stuff enabled too. I meant to disable
> features. Like video, audio, printing, plugins, webrtc, GAMEPAD(!!!), those
> are not needed to make useful TB changes. Of course, if you want to debug
> those, you need them. But those are mostly in the Gecko core/toolkit so you
> could as well not build TB but Firefox if you want those.

:aceman, have you been able to compile TB with these features disabled!?
I tried this about two-three years ago, and it worked. I could disalbe
audio/video media support. But then it no longer worked when I 
tried maybe about a year later.
I have not bothered to eliminate them, but if I can, I will!

> 
> I'm also with GCC 4.7 on linux. The linker memory problems are known and
> there are plans to fix it. But you need to search into Gecko Core bugs, not
> TB.

I see.
 
> The latest state of 4GB patches, that can be built it in
> https://bugzilla.mozilla.org/show_bug.cgi?id=793865#c52

I will try.

TIA
Blocks: 765514
Attachment #732882 - Attachment description: One line patch to declare m_totalMsgSize uint64_t instead of uint32_t (with proper checkin message) → One line patch to declare m_totalMsgSize uint64_t instead of uint32_t (with proper checkin message) [check-in in comment 45]
I have a test for this that plugs into the modified test file from bug 640371.
Depends on: 640371
Flags: in-testsuite?
Attached patch WIP test (obsolete) — Splinter Review
To be applied on top of bug 640371.
Uploading as a proof of concept. Will probably need update for review comments and patch changes in bug 640371.
Attachment #742706 - Flags: review?(irving)
Comment on attachment 742706 [details] [diff] [review]
WIP test

Review of attachment 742706 [details] [diff] [review]:
-----------------------------------------------------------------

Unfortunately, this worked on one test run and timed out on another; I'm concerned that it's going to be unreliable in our automated test environment because compacting the 4GiB file takes so long that sometimes the test harness will kill the test before it finishes. Unfortunately, I don't have any good ideas about how to avoid this problem.

::: mailnews/local/test/unit/test_over4GBMailboxes.js
@@ +338,5 @@
>      copyIntoOver4GiB();
>    }
>  };
>  
> +var ParseListener3 =

Use relevant names for these, rather than just numbers - for example, ParseListenerOver4GiB

@@ +351,5 @@
> +    compactOver4GiB();
> +  }
> +};
> +
> +var CompactListener1 =

CompactListenerOver4GiB
Attachment #742706 - Flags: review?(irving) → review-
Can we inform the test infrastructure that we need more time for this test?

Maybe Ted could know? (Ted: the problem here is that the test runs over a 4GB file and it is too slow.)
Flags: needinfo?(ted)
Blocks: 616559
As long as we're not hanging with no output for more than approx 5 mins (iirc), we don't have an issue with overall time. Additionally given that our other 4GB tests do the same, we shouldn't have an issue here.
Flags: needinfo?(ted)
Which other 4GB tests? The other ones are fast as they rely on the sparse flag on files. But this test loses it so it may really take 4GB on any filesystem (unless it is some compressing one) and may take long, as Irving says. I also don't think we output anything while compacting.
Flags: needinfo?(mbanner)
xpcshell tests do not monitor the output as it happens, they simply have a global timeout. Right now there's no way to indicate that a test takes longer. What we could do is add an additional entry to manifests that would function similarly to Mochitest's requestLongerTimeout():
http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/SimpleTest.js#458

Perhaps something like:
[test_foo.js]
requestlongertimeout = 2 # 2x longer timeout

If you wanted to implement such a thing, you ought to be able to pull it out of the test dict in the run_test method:
http://mxr.mozilla.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#833

Note how we do "if 'disabled' in test:", etc, you should be able to just do "if 'requestlongertimeout' in test:". Then you'd simply multiply HARNESS_TIMEOUT by that value when passing it to the timeout Timer:
http://mxr.mozilla.org/mozilla-central/source/testing/xpcshell/runxpcshelltests.py#894
Attached patch patch for xpcshelltest.py (obsolete) — Splinter Review
Thanks Ted, great hints. So I created a patch like you suggested. If it works for Irving I will file a separate bug for Core to make the change officially.
Flags: needinfo?(mbanner)
Attached patch test v2 (obsolete) — Splinter Review
Irving please try this again, with the other patch to /mozilla applied too.
Attachment #742706 - Attachment is obsolete: true
Attachment #749005 - Flags: review?(irving)
Comment on attachment 749005 [details] [diff] [review]
test v2

Standard8, can you please look at this? Maybe the timeout irving sees is caused by the whole test not working on Windows (as followed in another bug). But maybe this could be landed so that it runs at least on other platforms. If it runs for you and on Tb-try, I can remove the xpcshell.ini change.
Attachment #749005 - Flags: review?(mbanner)
Comment on attachment 749005 [details] [diff] [review]
test v2

Review of attachment 749005 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see any problems with the code at first glance, but the patch has suffered serious bitrot. I had a bit of time, so I took a run at fixing the bitrot and running the test. I'll attach a copy of my changes.

It passes, but it prints out some strange log messages; for example

TEST-PASS | /Users/ireid/tbird/obj/comm-central/mozilla/_tests/xpcshell/mailnews/local/test/unit/test_over4GBMailboxes.js | [downloadOver4GiB : 181] true == true

TEST-INFO | (xpcshell/head.js) | test pending (2)
GetDiskSpaceAvailable returned: 180679979008 bytes
Begin mail message delivery.
Not enough disk space! Raising error!
Abort mail message delivery.

TEST-PASS | ../../../resources/POP3pump.js | [OnStopRunningUrl : 65] 2147500037 == 2147500037

If those warnings are the way they're supposed to be, and my bitrot fixes are correct, I'm happy to turn the f+ to r+.
Attachment #749005 - Flags: review?(irving) → feedback+
Fixed simple merge failures in xpcshell.ini and test_over4GBMailboxes.js, and replace all instances of gLocalInboxFolder with localAccountUtils.inboxFolder.
Attachment #787248 - Flags: feedback?(acelists)
Comment on attachment 787248 [details] [diff] [review]
Tests v2, with Irving's attempt at removing bitrot

(In reply to :Irving Reid from comment #61)
> It passes, but it prints out some strange log messages; for example
> TEST-PASS |
> /Users/ireid/tbird/obj/comm-central/mozilla/_tests/xpcshell/mailnews/local/
> test/unit/test_over4GBMailboxes.js | [downloadOver4GiB : 181] true == true
> 
> TEST-INFO | (xpcshell/head.js) | test pending (2)
> GetDiskSpaceAvailable returned: 180679979008 bytes
> Begin mail message delivery.
> Not enough disk space! Raising error!
> Abort mail message delivery.
> 
> TEST-PASS | ../../../resources/POP3pump.js | [OnStopRunningUrl : 65]
> 2147500037 == 2147500037
> 
> If those warnings are the way they're supposed to be, and my bitrot fixes
> are correct, I'm happy to turn the f+ to r+.
Yes, these messages are expected as the test specifically tests those error conditions to appear when needed (e.g. downloadOver4GiB is trying to download messages crossing the 4GB size of the mailbox). After your changes the test still passes for me on linux. Thanks. If you didn't need the timeout increase support from xpcshell, then we can drop that from xpcshell.ini.
Attachment #787248 - Flags: feedback?(acelists) → feedback+
Comment on attachment 787248 [details] [diff] [review]
Tests v2, with Irving's attempt at removing bitrot

Review of attachment 787248 [details] [diff] [review]:
-----------------------------------------------------------------

OK, this looks ready to land.
Attachment #787248 - Flags: review+
Thanks. Can we drop the change from xpcshell.ini? Did you need to use the "patch for xpcshelltest.py" to run the 4Gb test successfully?
I don't remember. Probably best to run a try build and see what happens across platforms. I'll run that for you.
Thank you for your try submission. It's the best!

Results will be displayed on TBPL as they come in:
https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=0c756b7b3d98

Once completed, builds and logs will be available at:
http://ftp.mozilla.org/pub/mozilla.org/thunderbird/try-builds/ireid@mozilla.com-0c756b7b3d98
Interesting results on the try run. The test passed on OS X (is it 64bit?) and all 64bit linux. It failed on Linux 32bit due to timeout/crash, maybe killed by xpcshell. The test is disabled on Windows so no failure there.

Can we assume the 64bit machines are a bit faster so they allowed the test to finish before the time out of xpcshell? That would mean the test is OK, we just need to proceed with adding the timeout extension functionality into xpcshell.

We can test the theory by pushing "patch for xpcshelltest.py" and "Tests v2, with Irving's attempt at removing bitrot" again and see what it does on linux. Is it possible to do this for a try run? The first patch is for mozilla-central. Ted, Irving, do you know?
Depends on: 927550
Comment on attachment 749005 [details] [diff] [review]
test v2

Removing obsolete review request.
Attachment #749005 - Flags: review?(mbanner)
(In reply to :aceman from comment #68)
> Interesting results on the try run. The test passed on OS X (is it 64bit?)
> and all 64bit linux. It failed on Linux 32bit due to timeout/crash, maybe
> killed by xpcshell. The test is disabled on Windows so no failure there.
> 
> Can we assume the 64bit machines are a bit faster so they allowed the test
> to finish before the time out of xpcshell? That would mean the test is OK,
> we just need to proceed with adding the timeout extension functionality into
> xpcshell.
> 
> We can test the theory by pushing "patch for xpcshelltest.py" and "Tests v2,
> with Irving's attempt at removing bitrot" again and see what it does on
> linux. Is it possible to do this for a try run? The first patch is for
> mozilla-central. Ted, Irving, do you know?

Hi, aceman,

I think I can do this after a clean up of local files.
(Use of hg to take care of both c-c, and m-c AND
to live with my own local patches complicates things a little bit.)
But if you need to run a tryserver run for linux tell me which patches I should use.

TIA
Of course I can do this locally and it works. But I needed to push the 2 patches to tryserver. However it seems the m-c patch is being accepted and checked in so we do not need anything special now (see bug 927550). So after it lands we push the " Tests v2, with Irving's attempt at removing bitrot" patch again to the c-c try server and see if it works now. You do not need to do anything for now, thanks.
Comment on attachment 749004 [details] [diff] [review]
patch for xpcshelltest.py

(moved to other bug)
Attachment #749004 - Attachment is obsolete: true
Comment on attachment 749005 [details] [diff] [review]
test v2

(obsoleted by new version of the patch)
Attachment #749005 - Attachment is obsolete: true
Yay, with the last try run under
https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=6b593fad925d , the test seems to have passed. That is with m-c having the "timeout extensions patch" already and so the patch here using the feature.

TEST-INFO | /builds/slave/test/build/xpcshell/tests/mailnews/local/test/unit/test_over4GBMailboxes.js | running test ...
TEST-PASS | /builds/slave/test/build/xpcshell/tests/mailnews/local/test/unit/test_over4GBMailboxes.js | test passed (time: 458337.732ms)

Irving, is there anything to do here before checkin?
Flags: needinfo?(irving)
I'm a bit nervous about adding a very slow test to the suite that runs on every check-in; our build/test infrastructure is already overloaded. I'm not sure if there's anything we can do about that at this point other than disable the test on Windows and then run it manually on occasion.

Mark, would you rather keep the slow test in every build or disable it?

Aside from that, I think this is ready to land.
Flags: needinfo?(irving) → needinfo?(mbanner)
The test is already disabled on Windows due to other bug (I'm working on it). But it will be this slow on all systems. What about if we ran the slow part of the test based on a random number (e.g. in 1 in 10 or 100)? In that case it will not be disabled permanently (and nobody remembers to enable it when needed). It will not catch problems immediatelly, but at least after the 10 or 100 runs (just several checkins). We need this test for further patches on the >4GB front.
(In reply to :aceman from comment #76)
> The test is already disabled on Windows due to other bug (I'm working on
> it). But it will be this slow on all systems. What about if we ran the slow
> part of the test based on a random number (e.g. in 1 in 10 or 100)? In that
> case it will not be disabled permanently (and nobody remembers to enable it
> when needed). It will not catch problems immediatelly, but at least after
> the 10 or 100 runs (just several checkins). We need this test for further
> patches on the >4GB front.

It seems to be a good idea: with the large number of check-ins, maybe we could target 1/200 as a possible frequency of execution of the test, and I think we still get several tests per week or something?
Running it constantly (albeit not always) is the key here to catch any regressions, etc.

TIA
Attachment #8335601 - Flags: review?(mbanner)
Attachment #8335601 - Flags: review?(irving)
Comment on attachment 8335601 [details] [diff] [review]
test v3, with probabilistic run of the slow part

Review of attachment 8335601 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is OK.
Attachment #8335601 - Flags: review?(irving) → review+
Attachment #787248 - Attachment is obsolete: true
Ok, I think this is probably fine to go with the random runs. We can always change our minds if it adversely affects things.
Flags: needinfo?(mbanner)
Comment on attachment 8335601 [details] [diff] [review]
test v3, with probabilistic run of the slow part

As irving has already reviewed this I don't need to.
Attachment #8335601 - Flags: review?(mbanner)
Thanks. I wanted you to see it just due to the random runs.
So let's try to push it.
Keywords: checkin-needed
Whiteboard: [please leave open]
https://hg.mozilla.org/comm-central/rev/aa63e2ff56a3
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Flags: in-testsuite? → in-testsuite+
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 29.0
Why do the devs assert the bug is fixed when it is not?

I am running the latest TB and I tested filling my inbox
close to 4GB or messages (I did this by first exiting TB
and then catenating my older Inbox.sav into Inbox, restarted
TB, which rebuilt the Inbox index file, then I ran compact.
It failed!!!
JD, the bug fix was checked in an hour ago; in our development process, that's when we mark the bug 'fixed'. That means it will appear in tonight's Thunderbird Daily builds, and after appropriate testing, will progress through Earlybird (alpha) and Beta releases. See https://wiki.mozilla.org/Releases for the timelines, and https://wiki.mozilla.org/RapidRelease for a description of our release process.

Depending on the result of testing, and the user impact, we can nominate this bug fix for fast-track inclusion in the release builds of Thunderbird, but we won't do that until it's been in Daily for a while to make sure it doesn't cause any problems.
Thnx for theheads up re: the deployment of the bugfix.
I hope Fedora will include it in the FC18 updates.
Actually the real code fix was checked in long ago (comment 45), this time only a test was checked in.

JD, if you have new evidence/scenario where the compact fails, please open a new bug (with summary "compact of folder near 4GB fails even after bug 794303"). In that bug please specify how small the mssages in the inbox are. First read bug 793865 as you may be running out of message keys as discussed there (which is a different problem).
As soon as I actually update my TB via Fedora updates, and the update contains these recent fixes, you can bet I will be testing.

Most of the messages are rather small. A few that contain photos can be as large as 10 to 20 MB.
Blocks: 1144020
No longer blocks: 1144020
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: