Last Comment Bug 674742 - (destroys_encrypted) Compacting a Mail folder can delete all copies of the folder in certain error conditions
(destroys_encrypted)
: Compacting a Mail folder can delete all copies of the folder in certain error...
Status: RESOLVED FIXED
[2 bugs in one. 1 file handling bug t...
: dataloss, main-thread-io, regression
Product: MailNews Core
Classification: Components
Component: Backend (show other bugs)
: 5.0
: All All
: -- critical (vote)
: Thunderbird 18.0
Assigned To: Mark Banner (:standard8, limited time in Dec)
:
:
Mentors:
http://getsatisfaction.com/mozilla_me...
: 625953 777473 789754 (view as bug list)
Depends on: 784872
Blocks: 398151 199473 498814 541256 545650 575202 628255 646111 788212 789754 797215
  Show dependency treegraph
 
Reported: 2011-07-27 15:40 PDT by Andrew
Modified: 2015-03-16 23:25 PDT (History)
29 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
+
fixed
fixed
fixed


Attachments
html of Andrew's comment 23 (12.96 KB, text/html)
2012-08-06 13:26 PDT, Wayne Mery (:wsmwk, NI for questions)
no flags Details
Description of the 2 bugs. (Formerly comment #23.) (15.48 KB, text/html)
2012-08-06 17:53 PDT, Andrew
no flags Details
Description of the 2 bugs. (Formerly comment #23.) (16.51 KB, text/html)
2012-08-07 05:48 PDT, Andrew
no flags Details
Description of the 2 bugs. (Formerly comment #23.) (17.62 KB, text/html)
2012-08-07 10:42 PDT, Andrew
no flags Details
Description of the 2 bugs. (Formerly comment #23.) (18.81 KB, text/html)
2012-08-07 13:25 PDT, Andrew
no flags Details
Description of the 2 bugs. (Formerly comment #23.) (23.48 KB, text/html)
2012-08-08 18:40 PDT, Andrew
no flags Details
Description of the 2 bugs. (Formerly comment #23.) (30.13 KB, text/html)
2012-08-09 10:19 PDT, Andrew
no flags Details
An xpcshell test simulating the failure in nsIMsgFolderCompactor.compact (22.93 KB, patch)
2012-08-16 16:32 PDT, Hiroyuki Ikezoe (:hiro)
no flags Details | Diff | Splinter Review
Fix (6.61 KB, patch)
2012-08-17 03:02 PDT, Hiroyuki Ikezoe (:hiro)
no flags Details | Diff | Splinter Review
Simple fix (2.04 KB, patch)
2012-08-19 18:15 PDT, Hiroyuki Ikezoe (:hiro)
no flags Details | Diff | Splinter Review
Updated test (23.38 KB, patch)
2012-08-19 18:17 PDT, Hiroyuki Ikezoe (:hiro)
no flags Details | Diff | Splinter Review
Description of the 2 bugs. (Formerly comment #23.) (31.99 KB, text/html)
2012-08-20 12:11 PDT, Andrew
no flags Details
Patch destructive compaction (1.88 KB, patch)
2012-08-21 10:18 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction (2.13 KB, patch)
2012-08-21 11:10 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling (5.56 KB, patch)
2012-08-22 08:24 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling (5.56 KB, patch)
2012-08-22 08:33 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling (5.57 KB, patch)
2012-08-22 08:41 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling (5.54 KB, patch)
2012-08-22 08:48 PDT, Andrew
rkent: review-
Details | Diff | Splinter Review
Description of the 2 bugs. (Formerly comment #23.) (41.34 KB, text/html)
2012-08-24 08:42 PDT, Andrew
no flags Details
Patch destructive compaction, and file handling (7.26 KB, patch)
2012-08-24 18:56 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling. Warnings on web sites. (8.53 KB, patch)
2012-09-04 07:18 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling. Warnings on web sites. (8.57 KB, patch)
2012-09-04 07:54 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling. Warnings on web sites. (8.65 KB, patch)
2012-09-04 08:23 PDT, Andrew
no flags Details | Diff | Splinter Review
xpcom patch v.1 (5.22 KB, patch)
2012-09-04 11:07 PDT, Jim Mathies [:jimm]
no flags Details | Diff | Splinter Review
andrew's fix v.1 (4.31 KB, patch)
2012-09-04 11:24 PDT, Jim Mathies [:jimm]
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling. Warnings on web sites. (9.05 KB, patch)
2012-09-05 12:02 PDT, Andrew
no flags Details | Diff | Splinter Review
Description of the 2 bugs. (Formerly comment #23.) (44.92 KB, text/html)
2012-09-05 12:24 PDT, Andrew
no flags Details
Patch destructive compaction, and file handling. Warnings on web sites. (9.70 KB, patch)
2012-09-08 09:31 PDT, Andrew
no flags Details | Diff | Splinter Review
Patch destructive compaction, and file handling. Warnings on web sites. (9.69 KB, patch)
2012-09-08 10:33 PDT, Andrew
no flags Details | Diff | Splinter Review
Description of the 2 bugs. (Formerly comment #23.) (50.96 KB, text/html)
2012-09-08 10:51 PDT, Andrew
no flags Details
Description of the 2 bugs. (Formerly comment #23.) (65.92 KB, text/html)
2012-09-17 18:12 PDT, Andrew
no flags Details
Fix compaction of folders (8.03 KB, patch)
2012-09-26 08:27 PDT, Mark Banner (:standard8, limited time in Dec)
rkent: review+
standard8: approval‑comm‑beta+
Details | Diff | Splinter Review
Description of the 2 bugs. (Formerly comment #23.) (86.92 KB, text/html)
2012-10-08 03:41 PDT, Andrew
no flags Details

Description Andrew 2011-07-27 15:40:08 PDT
User Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en; rv:1.9.1.3) Gecko/20090824 Firefox/3.5.3 (.NET CLR 3.5.30729)
Build ID: 20110706120824

Steps to reproduce:

Manually compress a file folder of storage.


Actual results:

Destroyed both the main file and "*.msf".

The indicator in the tree on the folder continues to show the number for unread messages. The preview pane goes blank.

Low level recovery not possible, because it immediately writes onto the same location on disk another file: localstore.rdf

Compression works well in simple folders of the OS.
If the folder of the OS is encrypted, then it destroys the file always.

Version 2.2 is affected.

Version 2.0 and earlier did not have this defect, although compression in them did sometimes delete a folder, when this coincided with other activity in it.


Expected results:

SeaMonkey 2.2 should save the purged (compressed) file.

<ul>Workarounds:<li>Back up your data.</li><li>Turn off compression in preferences.</li><li>Remove encryption from containing folders in the OS. Do not forget to apply encryption later, when programmers resolve this error in software.</li><li>Refrain from invoking compression of folders manually.</li></ul>
Comment 1 Andrew 2011-07-27 15:48:21 PDT
<ul>Workarounds:<li>Back up your data.</li><li>Turn off compression in preferences.</li><li>Remove encryption from containing folders in the OS. Do not forget to apply encryption later, when programmers resolve this error in software.</li><li>Refrain from invoking compression of folders manually.</li><li>Immediately post a warning on the web site of SeaMoneky, and other parts of Mozilla, which might have this bug.</li></ul>
Comment 2 Andrew 2011-07-27 16:13:42 PDT
Steps to reproduce:

Manually compress a file of storage (mail folder).
I must reside in an encrypted folder of the OS (Windows).

Actual results:

Destroyed both the main file and "*.msf".

The indicator in the tree on the folder continues to show the number for unread messages. The preview pane goes blank.

Low level recovery not possible, because it immediately writes onto the same location on disk another file: localstore.rdf

Compression works well in simple folders of the OS.
If the folder of the OS is encrypted, then it destroys the file always.

Version 2.2 is affected.

Version 2.0 and earlier did not have this defect, although compression in them did sometimes delete a folder, when this coincided with other activity in it.

Expected results:

SeaMonkey 2.2 should save the purged (compressed) file.
Comment 3 Andrew 2011-07-27 16:16:50 PDT
Sorry for the duplicate. I can't edit the previous comment. So I post another. Administrators please remove the first comment, and this one.
Comment 4 Andrew 2011-07-27 17:35:53 PDT
Another defect of a similar sort, though less deadly, occurs at installation.
nsIPrefServer.SavePref(...) fails, if the file "prefs.js" goes to an encrypted file folder.

As a consequence the Extension Manager complains the most. At start-up it fails to create some database (does not report which), and "rolls back" something, though it does report what. See the Error Console.
Yet add-ons do continue to work.
But SeaMonkey does not save changes to preferences, to tabs, etc.

The workaround:
remove encryption on the folder, in which the file "prefs.js' resides.
When later versions of SeaMonkey begin to work well, you may reapply encryption.
Comment 5 therube 2011-07-27 18:01:41 PDT
How, what are you using to "manually compress a file folder"?

How, what are you using to "encrypt a file folder"?

(I have [Windows] BitLocker in W7, but no way I'm going to touch that!)
Comment 6 therube 2011-07-27 21:22:45 PDT
> How, How

Figured that part out. W7 (Ultimate, if it matters), right-click a folder, General tab, Advanced box.  You can either compress or encrypt.

(I either compress or encrypted then entire Profile folder.)

I don't have Mail to check with, so I used News (news.mozilla.org) & (from the little or none that I know about Mail/News) I'm not seeing a problem, compressed or encrypted or neither.  Everything appears to be working correctly.

> Turn off compression in preferences.

What preference, where?

Comment 2 ...

Encrypt Profile folder
Compress News folder (residing within Profile folder, so initially was Encrypted too)

Same results, I'm not seeing any issues (with News).

Comment 4 ...

Quit
Takes existing, Encrypted Profile, delete all the files
Start SeaMonkey

Profile populates, including prefs.js
about:config, change a preference (I used browser.sessionstore.max_concurrent_tabs,9)
Verify the change in prefs.js on disk (SeaMonkey still open)
Quit

Verify the change in prefs.js on disk still retained (SeaMonkey closed)

Restart
about:config
Verify the Preference change has been retained

Change to another setting & repeat with same results


(I don't see a way to make a folder/files both Encrypted & Compressed.  Maybe I'm not understanding, or maybe use of Mail is required, but I'm not seeing any issues?  Run a CHKDSK on your drive?)

Mozilla/5.0 (Windows NT 6.1; rv:5.0) Gecko/20110706 Firefox/5.0 SeaMonkey/2.2
Comment 7 Andrew 2011-07-28 04:56:57 PDT
This has nothing to do with the health of the file system in the OS. But of course I checked it before further tests. It is not corrupt.

Regarding the mail folder.

Perhaps you are trying a different thing from what I suggest.
I mean Compacting not in the OS, but in SeaMonkey.
I mean Encryption not in SeaMonkey, but in the OS.

Maybe you got confused by my mistake in the term. Initially I wrote "Compression". While actually in SeaMonkey it is called "Compacting". - "Compact This Folder" on the pop up menu, and "Compact Folders" in the File  Menu.
Now I have noticed this mistake in terms, and corrected the title of this report appropriately.

"Compact Folders" is the most deadly command now, because it would destroy ALL folders, not just one.

Nobody cares about News folders. They may get lost. No damage perceived in that.

Check this BUG with Mail.

Create a dummy account, if you have none.
Create a dummy mail folder.
Create a dummy message in it.
Go to Windows Explorer, right-click on the folder, which contains your mail folder file, select "Advanced", and encrypt it. Again: no need to encrypt all files in it, but you may for completeness.
Go back to SeaMonkey Mail.
Right-click on the dummy folder. 
Select "Compress this folder".
Watch you dummy folder go blank. This means it has disintegrated.
On restart of SeaMonkey Mail it also disappears from the tree.

For completeness you may encrypt the containing folder before operation.

Repeat the same with the containing folder unencrypted in the OS. It would work well.

Regarding "prefs.js".

You do not have to go to the pains of starting and checking, and repeating all that.

Simply...
Encrypt the folder in the OS. You do not have to encrypt all the files under it, but you may for completeness.
Start SeaMonkey.
Start Error Console, and look among the first reports. You will see about Addon manager failing through nsIPrefServer, and alike.

Remove the encryption check-mark in the OS, start SeaMonkey, and watch all errors clear on startup of SeaMonkey 2.2.

When nsIPrefServer fails, then thereafter any change to any current settings is not permanent. Only cache of web pages persists.
For example, multiple tabs in the last closed windows are not saved, even when you press "Save and  quit" in an exit confirmation dialog box.
When you restart SeaMonkey, the number of tabs reopening is again what it was before the change.
This is an enough indicator. No need for difficult checks.

SeaMonkey 2.0 is not affected. I did not try SeaMonkey 2.3.
Try with different versions of Windows. I will too, when I get my hands on the other versions.
Comment 8 Siddharth Agarwal [:sid0] (inactive) 2011-07-28 11:58:43 PDT
So in theory encryption's supposed to be completely transparent to the application, so this really shouldn't be happening...
Comment 9 Philip Chee 2011-07-28 12:00:53 PDT
Does this also happen with Thunderbird 5 or 6b?
Comment 10 Philip Chee 2011-07-28 12:02:50 PDT
Moving to MailNews Core as Compacting is shared code.
Comment 11 Andrew 2011-07-28 14:32:19 PDT
Yes. It must be transparent.
I think somebody used a function from Windows API, which one fancied to be very attractive, without full understanding of what it actually does.
Or perhaps the function has a bug, which makes it fail on encrypted folders.

I presume the mechanism is this: Mozilla makes a copy of the compacted folder in a different location from the original, or maybe in the same one, but by another name. It is unencrypted. Then it overwrites the encrypted file. But it takes some time for the OS to encrypt the file. Mozilla does not wait and does something more to the file, while the OS has not completed it. Hence the collision, and disintegration.

Perhaps for this mechanism to work another bug of compaction must be present too. Then one less prominent, which I mentioned in earlier posts, and which many others complain: compaction fails, and deletes it, when there's another activity on the file.

So, it might not be a new API function in use, but some new programming trick, which is careless with the OS.

Even if you do not reproduce the BUG, it would be nice to eliminate the already long lived trouble with compaction anyway.
The best way to prevent data loss is to make any changes to file rely on transaction. The program should always check that the new file, which it creates instead of the the old one, does become available, and that the email messages do appear within it, and their quantity is sane. Only after that it may remove the old file.
Or better allow the user to configure how many back-ups of each file to keep alongside the new file.

This would solve this BUG, even if you do not find it. And it would prevent many other BUGs discovered and undiscovered yet.

Remember the old issue with Netscape/Mozilla/SeaMonkey destroying its Bookmarks?
Only recently it ceased behaving so, and makes backups.
I wonder: why file destruction is so inherent with SeaMonkey?

I did not try this BUG with Thunderbird. But I guess it is there. Just the version of the core should be the same as that of SeaMonkey 2.2.
Comment 12 David :Bienvenu 2011-07-28 14:37:52 PDT
(In reply to comment #11)
> Yes. It must be transparent.
> I think somebody used a function from Windows API, 

We don't use any Windows API functions directly. We use nsIFile methods, which don't use anything fancy afaik. Here's the windows-specific stuff:

http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/io/nsLocalFileWin.cpp
Comment 13 Andrew Sutherland [:asuth] 2011-07-28 14:42:02 PDT
The mozilla platform in general tries to perform safe operations by doing as you say:
- create a temporary file, write to it, finish writing
- perform what should be an atomic move

Whether this is Mailnews' bad or Windows' bad really depends on where we are creating the temporary file.  In general, file moves are only atomic within a single file-system.  As long as we are creating the temporary file within the profile dir hierarchy, Mailnews should be in the clear.  If we are creating the temporary file in the TMP directory, that could be a different file-system and we should fix it.
Comment 14 David :Bienvenu 2011-07-28 14:49:17 PDT
we create the nstmp file in the same directory as the folder being compacted to avoid any possibly expensive operations copying to an other file system or partition. Hence the bug where you shutdown during the middle of compaction and wake up with an nstmp folder.
Comment 15 Andrew Sutherland [:asuth] 2011-07-28 14:54:28 PDT
(In reply to comment #14)
> we create the nstmp file in the same directory as the folder being compacted
> to avoid any possibly expensive operations copying to an other file system
> or partition. Hence the bug where you shutdown during the middle of
> compaction and wake up with an nstmp folder.

Okay, then this sounds like windows' compressed file/folder/encrypted file/folder is semantics affecting and should not be used.

Reporter/Andrew, I would suggest using something like truecrypt (http://www.truecrypt.org/) that does not affect file-system semantics in its operation.  I'm not aware of compression products that operate in a similar fashion, although they certainly could exist.
Comment 16 Andrew 2011-07-28 15:32:40 PDT
The fact that you use nsIFile does make sense of the character of this BUG.
Remember the problem with "prefs.js", which I mentioned in the now redundant initial comment, and in its latest version in comment #2, and there after.
It too emerged only in SeaMonkey 2.2. Both BUGs relate to encryption.
So they must be linked.

The difference in behaviour by upper level functions saves the "prefs.js" from destruction.
Probably that's because the functions in nsIPrefServer are already corrected, and roll back the file, if unable to save.

But Compaction does not do this check. So it simply destroys the old file.

This BUG is not in "shutdown" or "wake up" in the middle of compaction.
Though I presume that situation might arise too. But in such situation at least some file, like "nstmp" should exist, not deleted.

This BUG destroys data, and wipes the disk at its former location with another file "localstore.rdf", so one can't recover the deleted file.

If you say you create a file "nstmp", then perhaps I should be looking for a deleted file by that name instead of "Inbox" or whatever is destroyed.

Regarding the possible location for the temporary "nstmp". I think it does not matter. You may put it in same folder, or in another, or on another file system.
I do not see the difference.
If Compaction want's to destroy data, it will anyway.

As I mentioned, the trick should be to check the file before deletion of the old version.
This way you would prevent many bugs.
You may also catch those new bugs, as the program detects failure, it may report the failure to the user, and suggest to report it to the programmers.

I forgot to remind you about how to check for the Mail BUG.
Creating a dummy message is not enough. Compaction will not start, unless there's any space to recover from the Mail file.
So you have to create at least 2 dummy messages.
One you keep. Another - delete.
Then launch Compaction. And watch the second message also "disappear along with the whole Mail file.
Take every precaution to compact just this file (right button menu, Compact This Folder), or risk losing all.

Using alternative encryptors does not solve the BUG.

Perhaps you do not use Windows, so you haven't known that encryption is inherent to Windows OS. It is "transparent" to user software. It does not prevent user access to the data. It only encrypts the data on disk.
So the software must be able to work regardless.

An unsuspecting user would check the tick box in Windows OS (encrypt), and innocently risk lose all mail. This may happen, if the unused space in mail files exceeds what is in the preferences, and automatic Compaction (named "purge" in prefs) is enabled.
Mozilla would selectively destroy the largest files, which are the most valuable.

Compression is also built into the file system of Windows.
I did not try if the BUG also manifests itself on compressed data, or in compressed folders. I will try.
Comment 17 Andrew 2011-07-29 07:59:56 PDT
I have tested in a compressed folder (sub-folder of Mail storage). Compaction does work in it well. It takes an uncompressed file, and creates a compressed copy of it.

By the way, this means that it does not operate on the original file, it creates a copy (maybe before, maybe after compaction). If it were to work on the original, then the file would remain uncompressed, because it was not. Only the folder was marked to be compressed in the OS. So only new files are compressed in it.

Compaction succeeds even if the original file is encrypted. The thing is that the new file must become unencrypted. For that to happen its folder must be marked as unencrypted in the OS.

Compaction only fails in encrypted folders, because the OS perhaps tries and fails to create an encrypted file. Or Compaction interferes with this.

Maybe Mozilla keeps the file open in non exclusive mode, which it maybe did not do previously - a programming mistake.

I inspected the disk. There's no trace of "nstmp" file (either deleted or not). This might mean that Compaction fails even to create it. Not to speak of the further processing.
Comment 18 David :Bienvenu 2011-07-29 08:04:50 PDT
When the compaction is done, we rename nstmp to the original folder name, so I wouldn't expect to see a deleted version.
Comment 19 Wayne Mery (:wsmwk, NI for questions) 2011-09-10 18:05:52 PDT
isn't this either
a) not an encryption issue, i.e. a view bug or a garden variety compact issue [1], or
b) invalid, because by design there's no way Thunderbird can be affecting with encryption state of a file.

[1] but not something locking out the files because that was fixed, no?  Except that we still have open bugs like
Bug 498814 - "Compact Folder" silently fails and deletes .msf, if mail folder file is opened by other software (in the worst case, generates null mail folder file or deletes mail folder file)
Bug 628255 - messages deleted from inbox; compaction started on nothing; all lost
Comment 20 Andrew 2011-09-12 08:13:13 PDT
I made more tests:
Can't reproduce on Windows XP.
Always reproduces on Windows 2000.

This does not mean that XP is immune to the bug. Maybe the bug simply did not manifest itself.
As I write below, compaction is not robust. This is one more and a bigger bug in itself, which persists regardless of the OS.

Tests show, that compaction does create "nstmp" file. It is probably valid.
Creation of this file succeeds even in encrypted folder.

It deletes the file at some later moment, which I did not yet detect.
It deletes the original file.
And tries to rename "nstmp". That's perhaps when it disappears.
Seems like someone has used a wrong switch for a function call from the OS.

Or perhaps it renames the copy without deleting the original. Maybe Some OS do allow this, and some do not and rather fail.

The other cases, which you point to, are different.
One makes receiving mail disappear while deleting already stored one.
Another destroys the original storage, but keeps the "nstmp" file.

The bug with encryption is different. It does create "nstmp". And then destroys it and the original.

The failure does not relate to another software keeping the file open.
In my tests all other software is off.

It is possible, that the "Mail" itself keeps the file open, while it tries to overwrite it with a new one. Then it perhaps interferes with itself, and fails.

I do not believe this bug directly relates to encryption.
Maybe encryption interferes with compaction indirectly. Maybe it introduces some delay in the OS, which compaction does not care to wait or check for, and goes ahead regardless.

Remember that this is 2 bugs in one.
Or 1 bug affecting 2 actions:
- compaction;
- saving of prefs.js.

So it might be a low level bug in parts common for both actions.

It is a new bug. Previous versions never had it. So, I think you may quickly track down the guilty change.
I suggest you check the changes in code down to the actual point of calling the OS for renaming the file.
Someone has put some strange bit there.

Another more general suggestion remains, as I mentioned already:
make compaction robust.
It should not delete the original before it makes sure that renaming the copy does succeed.

This method would solve or at least eliminate the adverse effects of this and any other bugs reported about compaction, which you mentioned, and did not mention yet.

On failure the user would be alerted, and recover data. Then the bug would remain, but it would not be critical.
Comment 21 Wayne Mery (:wsmwk, NI for questions) 2011-09-12 09:03:36 PDT
> Previous versions never had it. So, I think you may quickly track down the guilty change.

I doubt that anyone else can reproduce this. But given that it's easy for you to reproduce, your skills should easily be able to narrow the regression range to within a day or two of the failure. The steps are a binary search and are pretty easy:
1. you know it works on date A but fails on date Z
2. get the build for date of the build in the middle of A and Z, and test it.   
3. If M fails, then get the build in the middle of A and M. If M works, then get the build in the middle of M and Z.   
-- repeat the process.

The builds are at ftp://ftp.mozilla.org/pub/thunderbird/nightly/2011/
The process is also described at http://www.rumblingedge.com/2009/02/24/howto-find-regression-windows-through-manual-binary-search/
Comment 22 Wayne Mery (:wsmwk, NI for questions) 2011-09-12 09:39:02 PDT
In addition, the following makes no sense ...

> Can't reproduce on Windows XP.
> Always reproduces on Windows 2000.
And always fails in win7, if comment 0 is accurate

(Failing all 3 OS does not prove it is caused by TB, but ...)
If thunderbird were definitely at fault, would it not have to fail on all 3 OS?
Comment 23 Andrew 2012-08-06 10:38:41 PDT
''Sorry for the HTML. Ask me to supply plain text, if unbearable.''
<html><body><h1>Introduction</h1>No need to grind through tests of different versions.
Simply look at the code.

<b>I found the error.</b>

It took me several hours. Because I am not familiar with this code.
For you it would take about an hour. Any developer should look at the code in the first place. But you rather engaged in discussions about how it could not be.
You neglected your duties. I did your job.<h2>Summary</h2>The compaction begins in chrome://messenger/content/widgetglue.js
The function MsgCompactFolder uses GetSelectedMsgFolders to obtain a single nsIMsgFolder. It does nothing, if many folders are selected. Although it will compact all folders in an "Account", if asked to.
Hence it would destroy all.

There I examined this call:
<code>selectedFolder.compact(null, msgWindow)</code>

It is a "native" function. Not in JS code. Hence it might be different for different OS.

It is defined only here:
http://mxr.mozilla.org/comm-central/source/mailnews/base/public/nsIMsgFolder.idl#202

I failed to definitely find what this function maps to from JS to Native code.
This is the inconvenience of your system. It only searches for all occurrences of text. It does not allow to immediately go to the definition of the function.
The search is case insensitive. The only match is "Compact". Note that it differs in case. This upper case brings up plenty of matches.

I skip straight to:
http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp

It <b>does create a compact "<i>nstmp</i>"</b> file.
Then it <b>destroys the original</b>, and tries to <b>move</b> "<i>nstmp</i>" to the original name.
The command to <b>move</b> actually uses a command to <b>copy</b>. A copy <b>fails</b>. Then <b>FolderCompactor</b> interprets the failure flag as an instruction to <b>destroy</b> the "<b>nstmp</b>" too.
That is how <b>both copies</b> are destroyed. - A <b>complete loss of data</b>.

It never should destroy it. If "<b>move</b>" (<i>actually "<b>copy</b>"</i>) <b>fails</b>, then the original is already gone. So we need to keep the copy.
If "move" succeeds, then there's no need to remove the copy anyway.

Another <b>Bug</b> is in the underlying <b>nsILocalFileWin</b>:nsILocalFile:nsIFile.
This is where the "<b>copy</b>" actually <b>fails</b>.

The <b>Bug</b> in <b>nsILocalFileWin</b> is new. You introduced it in 2011.
The <b>Bug</b> in <b>nsFolderCompactState</b> is old. It simply waited to manifest itself.

The <b>Bug</b> in <b>nsILocalFileWin</b> also <b>affects</b> the behaviour, which I described with "<b>prefs</b>" (<b>nsIPrefServer.SavePref(...)</b>).

But SavePref(...) behaves differently during roll-back. It is programmed correctly to <b>preserve the copy</b>.
While <b>nsFolderCompactState</b> is programmed to <b>always destroy</b> it.

Both functions are flawed. But one is at least <b>safe</b>.

The <b>best</b> way would be <b>not</b> to <b>delete</b> the original <b>before</b> you <b>succeed</b> to rename a copy. Retain the data for as long as possible.
That is: rename the original, rename the copy, only after that delete the original.

Now I will examine <b>nsFolderCompactState</b>, and later <b>nsILocalFileWin</b>.<h2>Bug in nsFolderCompactState</h2>The mistake occurs in the function "<b>FinishCompact</b>".

Note that there are 2 functions by that name:
nsFolderCompactState::<b>FinishCompact()</b>
nsOfflineStoreCompactState::<b>FinishCompact()</b>
The former <b>does have</b> the <b>Bug</b>.
The latter is <b>free</b> of the <b>Bug</b>, because it does not check for errors, and hence does not remove the copy. It simply moves it.
Perhaps you should introduce the <b>Bug</b> into both. I see no reason why they are different.

The faulty code:
<code>443	rv = m_file-><b>MoveToNative</b>((nsIFile *) nullptr, leafName);
444	NS_ASSERTION(NS_SUCCEEDED(rv), "error renaming compacted folder");
445	if (NS_SUCCEEDED(rv))
446	{
447	<b>folderRenameSucceeded</b> = true;
448           rv = newSummaryFile-><b>MoveToNative</b>((nsIFile *) nullptr, dbName);
449           NS_ASSERTION(NS_SUCCEEDED(rv), "error renaming compacted folder's db");
450           <b>msfRenameSucceeded</b> = NS_SUCCEEDED(rv);
451	}
452	}
453     }
454     NS_ASSERTION(msfRenameSucceeded && folderRenameSucceeded, "rename failed in compact");
455	}
456	if (!<b>folderRenameSucceeded</b>)
457	m_file->Remove(false);
458	if (!<b>msfRenameSucceeded</b>)
459	newSummaryFile->Remove(false);</code>

"<b>MoveToNative</b>" internally calls a function from the Operating System named "<b>CopyFileExW</b>". It call it with a wrong flag. So the latter fails. See below about the second <b>Bug</b> in <b>nsILocalFileWin</b>.

On this failure the code does not try to copy "<i>msf</i>". It simply destroys "<i>nstmp</i>", and "<i>nstmp.msf</i>". See lines 456-459, where it checks for <b>folderRenameSucceeded</b>, and <b>msfRenameSucceeded</b> is <b>false</b>. 

Note that you do <b>not check</b> for success the call for <b>MoveToNative</b> at line 448. This is which moves the "<i>nstmp.msf</i>".
This is <b>inconsistent</b> with your common practice, which you introduced for the main file "<i>nstmp</i>".
You simply and blindly mark success.
You do take <b>care not to remove</b> the <b>unimportant</b> file "<i>nstmp.msf</i>". Or be it only in case if you did try to move it.
But you do take care to <b>always remove</b> the <b>important</b> file "<i>nstmp</i>" (<i>it contains the vital data to lose</i>).

At lines 457 and 459 you again <b>fail to check</b> the result of removal of the copies for success or <b>failure</b>.<h2>Bug in nsILocalFileWin</h2>The indirect <b>call</b> to it occurs in in another file of code, in function "<b>FinishCompact</b>":
<code>443	rv = m_file->MoveToNative((nsIFile *) nsnull,nullptr, leafName);</code>
I gave the bigger excerpt above.
This call happens after you delete the original file with mail messages.

It indirectly calls the "<b>CopySingleFile</b>" with a flag <b>move</b>.

The "<b>CopySingleFile</b>" has a <b>new Bug</b>.
Compare the 2 versions:
http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/io/nsLocalFileWin.cpp#1814
http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/io/nsLocalFileWin.cpp?rev=7ef2a4732afe#1465

If "nstmp" is encrypted, it calls "CopyFileExW", instead of "MoveFileExW" as in another case (unencrypted).

<code>1841	if (FileEncryptionStatusW(filePath.get(), &status)
1842		&& status == FILE_IS_ENCRYPTED)
1843	{
1844	dwCopyFlags |= <b>COPY_FILE_ALLOW_DECRYPTED_DESTINATION</b>;
1845	copyOK = <b>CopyFileExW</b>(filePath.get(), destPath.get(), NULL, NULL, NULL, dwCopyFlags);
1846
1847	if (copyOK)
1848		DeleteFileW(filePath.get());
1849	}
1850	else
1851	{
1852	copyOK = ::<b>MoveFileExW</b>(filePath.get(), destPath.get(),
1853		MOVEFILE_REPLACE_EXISTING);
1854
1855	// Check if copying the source file to a different volume,
1856	// as this could be an SMBV2 mapped drive.
1857	if (!copyOK && GetLastError() == ERROR_NOT_SAME_DEVICE)
1858	{
1859	copyOK = CopyFileExW(filePath.get(), destPath.get(), NULL, NULL, NULL, dwCopyFlags);
1860
1861	if (copyOK)
1862		DeleteFile(filePath.get());
1863	}
1864	}</code>

<i>By the way there is posibly a mistake in file at line <i>1862</i>. Perhaps it should read "DeleteFile<b>W</b>", instead of "<b>DeleteFile</b>". This mistake rarely manifests itself. It is does not pose any danger. It simply retains the duplicate data.</i>

The old version of the code used to call:
<code>1470	copyOK = ::MoveFileExW(filePath.get(), destPath.get(),
1471		MOVEFILE_REPLACE_EXISTING |
1472		MOVEFILE_COPY_ALLOWED |
1473		MOVEFILE_WRITE_THROUGH);</code>

"<b>CopyFileExW</b>" (<i>line 1845</i>) fails, because the code supplies the wrong <b>dwCopyFlags</b>.
The return code is 0.
<code>GetLastError</code> reports <samp>ERROR_INVALID_PARAMETER 87 (0x57)</samp>.

When the file "<b>nstmp</b>" (<i>or "nstmp.msf"</i>) is <b>not compressed</b>, you use "<b>MoveFileExW</b>" (<i>line 1852</i>). You give it <b>no</b> chance to <b>fail</b>. No bad flag.

Note that on failure you choose <b>not</b> to <b>delete</b> the file. (<i>line 1847</i>) - Quite <b>unlike</b> you do in the <b>faulty</b> calling function "<b>FinishCompact</b>", which I discussed earlier.

The first version, which introduced the check for encryption:
http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/io/nsLocalFileWin.cpp?rev=f4f9d45743a8#1465
It did <b>not fail</b>. It only introduced the potential for a <b>Bug</b>, a risk, a predisposition, an exposure to wrong flags.
Namely: the function "<b>CopyFileExW</b>", which it now uses instead of "<b>MoveFileExW</b>", if the file is compressed.

The first version, which introduced a new flag for "<b>CopyFileExW</b>":
http://mxr.mozilla.org/comm-central/source/mozilla/xpcom/io/nsLocalFileWin.cpp?rev=315cbee8f00c#1460
The date is March 28, 2011. After a few months it went into the final build of Mozilla, and out to the users. That's when they noticed it, if at all.<h3>The failure propagates up the call stack.</h3>
As I mentioned, on that failure the file "<b>nstmp</b>" still <b>exists</b>, <b>not renamed</b>.
But the code already <b>deleted</b> the <b>original</b>.
The function "<b>FinishCompact</b>" makes a wrong check on <b>return</b> from "<b>MoveToNative</b>", and <b>deletes</b> it:
<code>456	if(!folderRenameSucceeded)
457	m_file-><b>Remove</b>(false);</code>
See the larger excerpt above.

This removes the copy. Hence <b>no copy</b>, and <b>no original</b>.<h1>Solution.</h1>You should <b>never remove</b> the copy. Because a failed copy always means that this is <b>the only</b> copy. <b>No original exists</b> already (<i>deleted</i>).
It should remove the copy, only when a copy succeeds. Or use "<b>MoveFileExW</b>", to spare of the need to remove the copy at all.

"<b>Move</b>" has one more <b>advantage</b>: it simply <b>renames</b> a file. So it <b>spares</b> of the need to <b>copy</b> its data. This is especially good on remote files. Usually both files reside on the <b>same volume</b>, whether local or remote.
Why use "<b>copy</b>" in "<b>CopySingleFile</b>", when the mode of operation in the function is already set to "<b>move</b>"?

<ul><li>Avoid using "<b>CopyFileExW</b>".
Use "<b>MoveFileExW</b>" instead.</li><li>If using "<b>COPY_FILE_ALLOW_DECRYPTED_DESTINATION</b>", make it conditional on the Operating System (OS). You do already check for the OS above the excerpt in order to insert other flags (<i>see the source code</i>). You should do the same for this flag too.</li>
<li>Reprogram:<ul><li><b>FinishCompact</b> to <b>retain</b> data on failure.</li><li><b>nsIPrefServer.SavePref(...)</b> to tolerate failure better.</li></ul><h1>Impact</h1>The <b>mistake</b> is in how the code feeds a <b>flag</b> to, and <b>interprets</b> the <b>result</b> code from the OS.

Strictly speaking, the condition for the error is the file "<i>nstmp</i>" to be <b>encrypted</b>. The fact that the folder is marked as <b>encrypted</b> is not the main condition. In theory "<i>nstmp</i>" may become <b>encrypted</b> during operation.

A different code compiles for each OS.
For OS other than Windows the software may be correct. For some - wrong.
You should check the code for all the OS.

The <b>Bug</b> in <b>nsILocalFileWin</b> <b>affects all</b> functions, which use <b>files</b>. If any is programmed to fail, this <b>new Bug</b> will trigger that failure.
It is a "<b>timed bomb</b>" waiting for both: existing <i><b>Bug</b>s</i> to manifest, and for new <b>Bug</b>s to hit it.
Potentially it affects the entire software, and all OS.

That is why I changed the <b>mark</b> on this <b>bug report</b> to pertain to "<b>All</b>".

Both <b>Bug</b>s may, and do cause <b>huge data loss</b>. Consider that they may <b>destroy multiple folders</b>.<h2><b>Encryption</b> and <b>version</b> of the <b>OS</b> are <b>not</b> the <b>only</b> causes for faults.</h2>The <b>Bug</b>s may <b>manifest</b> not only, when the mentioned <samp>ERROR_INVALID_PARAMETER</samp> occurs. During <b>copying</b> of a file <b>any</b> other minor or even false <b>failure</b>, in any <b>operating systems</b> may occur. It too would <b>cause</b> the <b>Bug</b> to <b>reek havoc</b>.
Then <b>data disappears</b>.<h1>Dereliction of duties</h1>
The mistake is introduced between the 2 versions of the file. It was not present initially.

No need to <b>try</b> different versions of Mozilla to see which exactly it was, as you suggested. It would be a <b>waste of time</b>. It would not expose the mistake itself. You may find it by simply <b>looking</b> at it.
Instead of writing all that stuff you should <b>look</b> at the <b>code</b>. Or at least show it to me. I would show you the mistake.
Now I did so by myself. <b>Give me a salary for my work.</b> Someone took it, while I worked in their stead.

There is one more implication: the mistakes are so blatant, that they amount to gross <b>incompetence</b>. It is best if the programmers, who made them, would be <b>expelled</b>, and prohibited from contribution.</body></html>
Comment 24 Wayne Mery (:wsmwk, NI for questions) 2012-08-06 12:43:52 PDT
> You neglected your duties. I did your job.
thanks for your help. but I think you rush to judgement ... I am a volunteer. And I am not a developer.

Can you repost in plain text please? (It's not readable now, so please do always post in plain text.)

You've invested quite a lot of time. Might you be interested in proposing a patch?  Even if it's just a draft that someone else could further refine.
Comment 25 Andrew 2012-08-06 13:11:46 PDT
Well, somebody is paid out there. Not everyone is a volunteer.
And these bugs are in the core. At least the one in "FinishCompact" might have been there since Netscape - the paid times.

Copy the HTML into a HTML editor. It will become nicely formatted.
I may also repost HTML via e-mail.
But you can do the same in you e-mail software by copying the text yourself.
If you can't, just ask.

Reposting in plain text loses formatting. Makes it even less readable.

Maybe there is a separate area, where I can post it in HTML?
Or a concealed switch to allow HTML posts here?

I have already proposed a patch, a draft. See the section "Solution" in my latest HTML.
I could go and change the code myself.
But I am not sure why somebody made all that mess. Maybe I would disrupt something that was intended.
It is best if people, who introduced that nonsense, would repair it too. As a punishment.

I did not locate the code for "prefs".
But that one is not critical. It seems to prevent data loss.
The most critical now is "FinishCompact".
I think I could remove just the lines, which remove the copies. That alone would make the bug not critical, and allow time for other changes.

Mainly I mean the faulty flag for "CopyFileEx". I could change that bit too, and then let the one, who introduced it to see what I disrupt.

If that is fine, then give me access to code. I would do the work. Would be quicker than writing here about it.
Comment 26 Wayne Mery (:wsmwk, NI for questions) 2012-08-06 13:26:07 PDT
Created attachment 649389 [details]
html of Andrew's comment 23

(In reply to Andrew from comment #25)
> Copy the HTML into a HTML editor. It will become nicely formatted.
> ... but you can do the same in you e-mail software by copying the text yourself.

I had already started to, but ended the attempt when I saw that my copying of the html page (attached) doesn't preserve your paragraph breaks.  So I asked :)

> Maybe there is a separate area, where I can post it in HTML?
> Or a concealed switch to allow HTML posts here?
nope

> I could go and change the code myself. But I am not sure why somebody made all that mess.
> Maybe I would disrupt something that was intended.
> It is best if people, who
> introduced that nonsense, would repair it too. As a punishment.
they are long gone.  


> Mainly I mean the
> faulty flag for "CopyFileEx". I could change that bit too, and then let the
> one, who introduced it to see what I disrupt.
bienvenue or someone who understands the code can perhaps help

> If that is fine, then give me access to code. I would do the work. Would be quicker than writing here about it.

Indeed. See https://developer.mozilla.org/en-US/docs/Simple_Thunderbird_build
Comment 27 Wayne Mery (:wsmwk, NI for questions) 2012-08-06 13:30:34 PDT
see also https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch

Thanks
Comment 29 Andrew 2012-08-06 17:53:15 PDT
Created attachment 649500 [details]
Description of the 2 bugs. (Formerly comment #23.)
Comment 30 Andrew 2012-08-06 18:00:03 PDT
Sorry for spam.
Please remove the old comment #23, and its duplicate #28, and the obsolete attachment.
Then remove this note too.
Comment 31 Andrew 2012-08-07 05:48:19 PDT
Created attachment 649607 [details]
Description of the 2 bugs. (Formerly comment #23.)

You refer me to the process of submission of a patch.

But it is so complicated. It is quicker to simply change the faulty lines.
Just like they did so easily before. They just did it. And I am supposed to go through a process.

And you refer me to the same system, which has already failed. The same people, who supposedly supervise the code, have introduced the malicious code before, despite all the supposedly rigorous testing.
Until the culprits are removed, the system is not safe, regardless of this particular progress.

Perhaps first you should identify who will be the supervisor of the patch.
Comment 33 Wayne Mery (:wsmwk, NI for questions) 2012-08-07 06:05:38 PDT
(crap. sorry for comment 32 spam - I don't know why that posted in comment)

(In reply to Andrew from comment #31)
> But it is so complicated. It is quicker to simply change the faulty lines.
> Just like they did so easily before. They just did it. And I am supposed to
> go through a process.

If you prefer, then just post the code changes ... then someone else to make it into a proper patch.

If you would like additional advice/help from volunteers and developers, you might visit the #maildev channel at irc.mozilla.org - details at https://wiki.mozilla.org/IRC
Comment 34 Wayne Mery (:wsmwk, NI for questions) 2012-08-07 07:04:20 PDT
Andrew, I'm helped as much as I can. And I've marked two of the big html comments private, which is the most thing I can do to "delete" them.

Someone from #maildev may comment in a couple days. Thanks for your continued interest in making Thunderbird better.
Comment 35 Andrew 2012-08-07 10:42:52 PDT
Created attachment 649688 [details]
Description of the 2 bugs. (Formerly comment #23.)

Remove also the #23. Less clutter.

I have updated the description with regard to when the trigger for the bug of destruction first appeared.

The bug of destruction is in "Mail Folder" (nsFolderCompactState).
The trigger - in nsILocalFileWin.

They were trying to fix a bug 199473. Yet they introduced a new one.
So the person to blame for the bug is the one who made that patch, and his supervisors.

The change at fault is here:
http://hg.mozilla.org/mozilla-central/rev/f4f9d45743a8

There you need to investigate whether indeed you need to use CopyFileEx, instead of MoveFileEx, and whether the flag COPY_FILE_ALLOW_DECRYPTED_DESTINATION is required.

Regardless of this investigation, we can readily move that clause into one place together with the check for the version of windows.
This would immediately remove the trigger of destruction, and various other faults.
Later you may take time to seek other solutions.

Removing the destruction itself should be simple: just remove the lines 456-459, as per excerpt in my attachment. The ones, which remove the compact file "nstmp", and "nstmp.msf".

You could retain destruction only, if deletion of the original file failed.
But it is not nice to rob the user of the produce of compaction. Let the user check the failure, and choose which file one wants to keep.
It is best to remove the destruction completely.

Later you may take time to discuss other ways, if you like.
Comment 36 Wayne Mery (:wsmwk, NI for questions) 2012-08-07 10:55:32 PDT
Jim, Steve, can you assess potential for patch in bug 199473 causing this regression?
Comment 37 Jim Mathies [:jimm] 2012-08-07 11:52:14 PDT
> It does create a compact "nstmp" file.
> Then it destroys the original, and tries to move "nstmp" to the original name.
> The command to move actually uses a command to copy. A copy fails (see a few
> lines below about nsILocalFileWin). Then FolderCompactor interprets the failure
> flag as an instruction to destroy the "nstmp" too.
> That is how both copies are destroyed. - A complete loss of data.

http://mxr.mozilla.org/comm-central/source/mailnews/base/src/nsMsgFolderCompactor.cpp#426

I don't know mail code well enough to comment on what's going on here, but this does seem strange - 

bool folderRenameSucceeded = false;
bool msfRenameSucceeded = false;

rv = m_file->MoveToNative((nsIFile *) nullptr, leafName);

if (NS_SUCCEEDED(rv)) {
  folderRenameSucceeded = true;
  rv = newSummaryFile->MoveToNative((nsIFile *) nullptr, dbName);
  msfRenameSucceeded = NS_SUCCEEDED(rv);
}

if (!folderRenameSucceeded)
  m_file->Remove(false);
if (!msfRenameSucceeded)
  newSummaryFile->Remove(false);

If the first MoveToNative fails, m_file and newSummaryFile are deleted. If the second MoveToNative fails, newSummaryFile is deleted.


As far as the COPY_FILE_ALLOW_DECRYPTED_DESTINATION flag is concerned, I don't see the issue. The point of the fix was to allow copies from encrypted to non-encrypted volumes. If the call fails when both volumes are encrypted we should file a separate bug for that.
Comment 38 Andrew 2012-08-07 13:25:46 PDT
Created attachment 649774 [details]
Description of the 2 bugs. (Formerly comment #23.)

Jim, I do not see what new are you trying to say in the first part.
I wrote the same: if moving o "nstmp" fails, then you delete it. This is a bug.
In nsFolderCompactState you should keep, what you have failed to move.

Now about nsILocalFileWin.

The second part of your message about encryption means that you did not read what I wrote.
The failure is not in the OS being unable to copy or to move.
The failure is in Mozilla. It fails any time the OS does not accept the flag.

In nsILocalFileWin for certain versions of the OS the function CopyFileExe does not accept certain flags. nsILocalFileWin does check the OS for some flags, but fails to do so for this one.
You should move the flag into the part of code just a few lines above it, which does check the version.

Or you may remove the flag about encryption, and the CopyFileEx altogether.
I have tested MoveFileExW. It works fine between file systems, one of which does support encryption, and the other does not.
There was no need for the patch.
Comment 39 Jim Mathies [:jimm] 2012-08-07 14:34:21 PDT
(In reply to Andrew from comment #38)
> 
> In nsILocalFileWin for certain versions of the OS the function CopyFileExe
> does not accept certain flags. nsILocalFileWin does check the OS for some
> flags, but fails to do so for this one.

Which version of Windows doesn't support COPY_FILE_ALLOW_DECRYPTED_DESTINATION? According to msdn it is valid on XP and up.
Comment 40 Hiroyuki Ikezoe (:hiro) 2012-08-07 15:45:56 PDT
(In reply to Jim Mathies [:jimm] from comment #39)
> (In reply to Andrew from comment #38)
> > 
> > In nsILocalFileWin for certain versions of the OS the function CopyFileExe
> > does not accept certain flags. nsILocalFileWin does check the OS for some
> > flags, but fails to do so for this one.
> 
> Which version of Windows doesn't support
> COPY_FILE_ALLOW_DECRYPTED_DESTINATION? According to msdn it is valid on XP
> and up.

The reporter said:
(from comment #20)
> I made more tests:
> Can't reproduce on Windows XP.
> Always reproduces on Windows 2000.
Comment 41 Jim Mathies [:jimm] 2012-08-07 16:03:21 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #40)
> > Which version of Windows doesn't support
> > COPY_FILE_ALLOW_DECRYPTED_DESTINATION? According to msdn it is valid on XP
> > and up.
> 
> The reporter said:
> (from comment #20)
> > I made more tests:
> > Can't reproduce on Windows XP.
> > Always reproduces on Windows 2000.

We don't support 2K. However if the community would like to put together a patch that one-offs this flag on 2k, feel free, we'll still find the time to review and land it. If anyone cares to do that please post it to a new bug specific to the issue.
Comment 42 Andrew 2012-08-08 09:41:19 PDT
In comment #39 Hiroyuki has noted correctly my point: the current code does check the version already. But it fails to do so for this flag.
So, it is illogical to leave the flag outside the check.
As is, the code is inconsistent with itself.
So it is a bug in itself, regardless of your attitude to 2K.

Astonishing is the fact that you introduced the check for version after you introduced the bad flag. The bad flag was already there, but you failed to embrace it in the check.

You introduced the bug of a bad flag long before you renounced 2K.
After that you laid one more bug on top of it - the check for version, which fails to embrace the flag.

If you indeed intend not to support 2K, then you would remove the check.
But this is very unsafe. Users are still likely to try to use the software on old versions of OS. You should not hinder them just because you deem their version "wrong".

And it is not about the OS. It is about the version of the function CopyFileEx only. In theory it may be different from the version of an OS.
CopyFileEx may fail in any OS. - Even more so, because later you may introduce more bugs, and flags the same careless way.

It is far better to check the return value of the function, and call it again with less flags.
But a check for version of OS is a good approximation.

You missed one more of my points that the bad flag is actually not necessary at all.
It is a patch for the OS, not for the Mozilla. Since then the OS was perhaps patched already.
Hence my test on MoveFileEx works.

So, simply removing the flag, and replacing CopyFileEx with MoveFileEx is a patch enough.
It would revert the patch for the bug the flag was designed for. It would even cure one more bug, which that patch introduced by patching that bug incompletely.
See my comments there:
https://bugzilla.mozilla.org/show_bug.cgi?id=199473#c15

Do not forget that this bug is actually 2 in 1:
nsFolderCompactState
nsILocalFileWin

nsFolderCompactState destroys data in all OS. It is waiting for just any failure. The failure does not have to be limited to nsILocalFileWin, or to version of the OS. Anything unexpected may cause it.

nsILocalFileWin has a potential to fail not only in W2k. As I mentioned, the code is bad. It predisposes programmers for new mistakes, new bugs.
And it is inefficient, because it uses Copy instead of Move. As I mentioned, Move does work. Improving this would improve all OS.

So, Hiroyuki has incorrectly marked the bug as W2K only.
Comment 43 Andrew 2012-08-08 18:40:48 PDT
Created attachment 650407 [details]
Description of the 2 bugs. (Formerly comment #23.)

In the attachment in section named "Solution" I added an incomplete example of how to patch nsILocalFileWin conservatively. So that you retain the patches for bugs 199473, and 545650.

Although I suggest to radically revert (discard) those 2 patches, because they patch not Mozilla, but the new Windows OS, which you tend to praise so much.
The old OS seems to work well. And the new ones perhaps are already patched themselves by now.

Read the attachment for more. It is my main description of the bugs.

I also add 3 more bugs to the "Blocks" list.
They all seem to result from a bug in nsFolderCompactState.
In some this bug triggers by an error return from a call nsILocalFileWin.
In others it seems to receive some other trigger, which we did not yet discover.
Note that some OS there are the ones, which you do claim to "support". - Not 2K.

Can someone solve the bug in bugzilla? Every time it fails to recognize the type of attachment "text/html". Instead it marks it as "text/plain".
Comment 44 Andrew 2012-08-09 10:19:58 PDT
Created attachment 650605 [details]
Description of the 2 bugs. (Formerly comment #23.)

3 more versions of patches for nsFolderCompactState (the bug of destruction), and nsILocalFileWin (the bug in a patch for a bug of encryption).

The latter patches also patch the earlier patches for bug 199473 (bug of encryption), and bug 545650 (bug of network).
The patch for 199473 actually introduced 2 bugs:
One is the trigger for the bug of destruction. - The one we try to fix.
Another - incomplete application of the patch.
Comment 45 Hiroyuki Ikezoe (:hiro) 2012-08-16 16:32:07 PDT
Created attachment 652619 [details] [diff] [review]
An xpcshell test simulating the failure in nsIMsgFolderCompactor.compact

This test needs a patch against mozilla-central because this test creates a Test-Double of nsIFile which causes NS_ERROR_FAILURE in moveToNative method. To create the Test-Double nsIFile needs non-builtinclass.
This test also needs the fix for bug 782868.
Comment 46 Hiroyuki Ikezoe (:hiro) 2012-08-16 16:37:21 PDT
I should note that the test works fine (i.e. it fails now) Windows XP and Linux. But I can not confirm the test works on Mac OSX.
Comment 47 Hiroyuki Ikezoe (:hiro) 2012-08-16 16:39:12 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #45)
> Created attachment 652619 [details] [diff] [review]
> An xpcshell test simulating the failure in nsIMsgFolderCompactor.compact
> 
> This test needs a patch against mozilla-central because this test creates a
> Test-Double of nsIFile which causes NS_ERROR_FAILURE in moveToNative method.

Opened a new bug for the patch against mozilla-central. bug 783414.
Comment 48 Kent James (:rkent) 2012-08-16 17:12:13 PDT
Hiro, there is so much boilerplate code in your patch that I can't really figure out what you are trying to do here. Could you give a brief verbal description of the essence of this?

Also, is this meant as a demo or as a real contribution that would be reviewed and landed?
Comment 49 Hiroyuki Ikezoe (:hiro) 2012-08-16 18:00:06 PDT
(In reply to Kent James (:rkent) from comment #48)
> Hiro, there is so much boilerplate code in your patch that I can't really
> figure out what you are trying to do here. Could you give a brief verbal
> description of the essence of this?

The test does just check that message folder file still exists when the folder compaction is failed.
Currently the test fails because there is no fix against this issue.

> Also, is this meant as a demo or as a real contribution that would be
> reviewed and landed?

The test will be reviewed when the fix is reviewed.
Comment 50 Hiroyuki Ikezoe (:hiro) 2012-08-17 03:02:44 PDT
Created attachment 652714 [details] [diff] [review]
Fix

This patch depends on the fix for bug 782868.

Unfortunately the patch against mozilla-central (bug 783414) failed to get review+. I have no other idea to write test for this issue. So I attach the fix for this issue anyway.

This patch assumes that nsIFile.moveToNative method is an atomic operation. This assumption is not problem in most cases of both on Linux and Mac OSX because rename(2), which is invoked in moveToNative on these platforms, guarantees to leave the target file whenever the operation fails.
In case of Windows, I suppose XPCOM should provide an atomic 'moveTo'.
Comment 51 Andrew 2012-08-18 10:27:18 PDT
Comment on attachment 652714 [details] [diff] [review]
Fix

Review of attachment 652714 [details] [diff] [review]:
-----------------------------------------------------------------

I suggest you do not create separate functions.
Please simply remove them, and add lines to old code.

Otherwise it is difficult to track the changes.
Hence, now I only provide a review at first glance.

Please do by yourself a comprehensive comparison to my patch.
Or remove the new functions. Place the new code where it belongs directly. Then I could compare myself easily.
It seems that there's no new code at all. You simply separated it into new function.

See the section "Solution" in my attachment. It simply reduces code. No need for complexity.
It seems that your patch simply duplicates mine there. But as I mentioned it is difficult for me to compare now.

The main part is to remove the lines 456-459:
456 if(!folderRenameSucceeded)
457 m_file->Remove(false);
Because they remove the last copy of data: the file "nstmp".

Could you please simply produce a patch from my patch? Attach it as another version.

Do not remove more than you need to remove. For example you removed the comment at line 375. It is a trivial one. Maybe it deserves removal. But make sure that you take into the new code as much as there is useful in the old.

You also lost the identifier "dbName". Old code uses it to move "nstmp.msf". You instead use "summaryLeafName". I do not know if this is good. I am not familiar with the purpose of "dbName".
You should provide a reason for this change anyway.
Comment 52 WADA 2012-08-19 17:10:48 PDT
Bug 498814 is for phenomena when mbox file(XXX, not XXX.msf, if mail folder named XXX) is opened by other software just before Tb's Compact starts. In this case, "Sharing Violation" occurs upon Tb's "open read/write with shared mode".

Even after "file size of XXX=0 by Compact failure" was resolved by bug 491303(file size check of nstmp was introduced), and even after critical problems are resolved by Tb 14, problem of "undeleted nstmp/nstmp.msf" still remains.
And, because external software can open/close any Tb's file at any time, external software can interfere any Tb's open step of XXX/XXX.msf/nstmp/nstmp.msf and Tb's "Move nstmp/nstmp.msf to XXX/XXX.msf" request(rename with replace mode).

Will patch for this bug resolve such problems due to interfere by other software?
Comment 53 Hiroyuki Ikezoe (:hiro) 2012-08-19 18:15:41 PDT
Created attachment 653236 [details] [diff] [review]
Simple fix
Comment 54 Hiroyuki Ikezoe (:hiro) 2012-08-19 18:17:23 PDT
Created attachment 653237 [details] [diff] [review]
Updated test

This test also checks temporary files surely removed when moveToNative fails.
Comment 55 Andrew 2012-08-20 10:35:19 PDT
Please note the comment #1. Please implement it.
Post warning on all sites.

I tested my own patches. They do work.

Now you need to test them with other 2:
bug 199473;
bug 545650.
Because I implemented my patch, which completely removes patches for those bugs.
Patch of bug 199473 was necessary to bypass.
Removal of patch for bug 545650 is an optimization with the occasion. Not vital for the bug we patch now.

See my attachment, section "Discard the patch of encryption 199473, and of network 545650".

So much for a bug in nsLocalFileWin::CopySingleFile().

A patch in nsFolderCompactState::FinishCompact()
is more straightforward, as it seems to remove no other patch.

(In reply to comment #52)

Please read my reports carefully, and have a look at excerpts of guilty code in my attachment.

There I already noted the essence of the bug of destruction. It is in 4 lines: 456-459.
They remove the last remaining copy of data: "nstmp", and "nstmp.msf".
You need to remove these 4 lines completely. See my patch in attachment.

By the way, Hiroyuki failed to remove them in attachment 653236 [details] [diff] [review]. He only optimized inessential code.

The process, which you report in your bug 498814 seems to be this:
- Initially Mozilla keeps the "original" file open.
- Another program opens the same.
- Nothing bad happens. Because Mozilla already has the file open.
- Mozilla succeeds to compact.
- It removes "original.msf". It succeeds, because it is not open in another program.
- Fails to remove the "original". Because it is open in another program, and not shared.
- So, it triggers the bug of destruction: removes "nstmp", and "nstmp.msf".

Result: only the "original" file remains.
No "original.msf".

On the second turn:
- The "original" remains closed in Mozilla. Maybe because it is closed after failure during previous compaction. Or maybe because Mozilla does not open a folder before you use it.
The folder file is not open.
You do not see it so, as a user.
- You open the same file in another program again.
- Compact creates empty "nstmp", and "nstmp.msf" (both of size 0).
- Then it tries to actually compact.
- But cannot open the "original". Because it had not opened it before another program did.
- Mozilla abandons compaction. It does not call "FinishCompact".
So, it does not have a chance to destroy "original", "original.msf", "nstmp", or "nstmp.msf".
(I did not confirm this stage by actually looking at code. But I presume the code was so at least at the time of your report.)

Result:
Both originals remain: "original", "original.msf".
In addition you have "nstmp", and "nstmp.msf" both 0 in size.

The circumstance that there was 1 not deleted message, or more seems to be irrelevant, a mere coincidence.

I derived all this from simply looking at my excerpts of code.
Please do so for yourself too.
See if I am correct.

Here I presume that Mozilla would not fail to read a file, which is opened in another program without sharing, if it had that file already open.
I think you may test that assumption quite easily.

But even if I am wrong, the code may still reach "FinishCompact", and destroy 3 files:
"original.msf",
"nstmp",
"nstmp.msf".
Just look at the code closer.

My conclusion is that this bug is a very likely cause of bug 498814, which you report about.

(In reply to comment #50)
The atomicity of MoveFileExW is not an issue to solve here. It is handled by the OS.

Hopefully, the OS handles the renaming securely. So, on failure at least one piece of data remains.

Of course you may produce a fool proof code: rename the "original", rename "nstmp", delete the original only on success.
Now Mozilla does the reverse. It is unsafe by design.
(In reply to comment #45)
I think there's no need to produce a test to simulate a failure.
There's a great need to examine the code. Do test in the mind.

It would be a great mess, if you wrote code only by tests. You should write it by design.
Comment 56 Andrew 2012-08-20 12:11:13 PDT
Created attachment 653469 [details]
Description of the 2 bugs. (Formerly comment #23.)

Corrected and clarified description.

Correction to my comment in reply to comment #52.
This bug is not in itself the cause of your bug 498814.
The bug itself is only the bug of destruction in 4 lines 456-459.

While you complain not about destruction, but about failure to compact, and to destroy "nstmp".
You do actually want to destroy them. Am I correct?
I want to prevent destruction.

I meant that the same code causes both bugs.
Perhaps we should treat the whole code as one bug, rather than split it into separate bugs.

My patch would retain "nstmp" files by design. Because they contain essential data.

It is another matter that Mozilla should not produce empty files in the first place. You may remove them indeed.
But if Mozilla succeeds in compaction, places data in "nstmp", then why remove the files? Why not allow the user to pick the file one likes?

I would agree to remove "nstmp", only if they are empty. But even that decision may be prone to various bugs, which might remove valid files full of data.
I would rather remove those 4 lines of destruction.

The removal of empty files should be done in another way:
- Compact created "nstmp" with size 0.
- Fails to read the original.
- So it fails to copy any messages to "nstmp".
- Remove "nstmp" without any condition, and terminate compaction.

Such removal can be done anywhere, including in the function "FinishCompact", which we try to patch now.
The only condition is that condition for removal must be not the size of file, but absence of any action to copy messages - failure to start copying.

If copying of at least one message was initiated (and failed or not), then it is another kind of failure, and "nstmp" should remain.

You remove it only if copying did not start. For example, if you fail to open a file, or to read from it.
Comment 57 Andrew 2012-08-20 12:21:51 PDT
Comment on attachment 653236 [details] [diff] [review]
Simple fix

Review of attachment 653236 [details] [diff] [review]:
-----------------------------------------------------------------

You have lost one line:
444 NS_ASSERTION ...
Perhaps you are using some old source file for diff.
Look at the current nsMsgFolderCompactor.cpp.

You failed to remove the 4 lines: 456-459 (as per original).
They are the essence of the bug.
I mentioned the same many times:
See my comment 35, comment 51, comment 55.

If not for those failures, the patch would be good.

Make an extra check, if it does not remove code, introduced by other patches. I mention this in comment 55.
Please confirm this.
Comment 58 Hiroyuki Ikezoe (:hiro) 2012-08-20 14:51:11 PDT
Andrew, If you really want to review the patch, please use review tool. Thanks.
Comment 59 Andrew 2012-08-21 06:53:46 PDT
Comment on attachment 653237 [details] [diff] [review]
Updated test

Review of attachment 653237 [details] [diff] [review]:
-----------------------------------------------------------------

Consider my comment #55 as a review.

There I mention that it is not necessary to simulate failure.
It is necessary to analyze the code. It would be far better.

I advise not to proceed with this patch. It is not a patch anyway. Is it?
It is just a temporary code to simulate failure in nsILocalFile in order to cause the bug in nsIMsgFolderCompactor::FinishCompact to manifest itself.

If you want to proceed with the code for simulation, then you should better explain what you are doing. Kent asked the same in comment #48.

I did not fill a review form for this attachment. I just posted it as a simple comment #55 about your comment #45 about a test to simulate failure.

A review is just one more way to post a comment. I just "publish" it in a form, attached to a specific patch. It automatically appears in comments as well.
Or is there a special way to do a review?

When you post an advice, please note which comment of mine do you mean.
My comment #57 is properly filled as a review for your attachment 653236 [details] [diff] [review].
My comment #55 is indeed filled as a mere comment, rather than a form of review.
Comment 60 Andrew 2012-08-21 09:19:08 PDT
Comment on attachment 653236 [details] [diff] [review]
Simple fix

Review of attachment 653236 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/src/nsMsgFolderCompactor.cpp
@@ +439,5 @@
>    }
>    if (!folderRenameSucceeded)
>      m_file->Remove(false);
>    if (!msfRenameSucceeded)
>      newSummaryFile->Remove(false);

These 4 lines are the bug.
Remove them all.
Lines 440-443 as per patch.
456-459 as per original.
See the overall review.
Comment 61 Andrew 2012-08-21 10:18:12 PDT
Created attachment 653826 [details] [diff] [review]
Patch destructive compaction

Patch for nsFolderCompactState::FinishCompact.
As per proposal in my attachment in section "Solution".
I remove the 4 lines 456-459 (as per original). They destroy data for no reason.

I remove the lines, which remove the original file, before renaming the copy.
This is one more destruction.
Both instances of destruction leave a user with no data.

I simply rename the compact copy "nstmp" into the original.
This is simultaneously an optimization. Less code.

If move fails, then a user has both copies.
More choice is better.

There is always more to do.
You may want to add code, which opens the failed file "nstmp" in user interface. So, a user may pick messages from it there immediately. View it as a mail folder.
As is the patch allows the user to manipulate the file in the Operating System only.
If one wants to view the file as a mail folder, one would have to restart Mozilla.
Comment 62 Andrew 2012-08-21 11:10:55 PDT
Created attachment 653851 [details] [diff] [review]
Patch destructive compaction
Comment 63 Hiroyuki Ikezoe (:hiro) 2012-08-21 16:16:10 PDT
(In reply to Andrew from comment #60)
> Comment on attachment 653236 [details] [diff] [review]
> Simple fix
> 
> Review of attachment 653236 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: mailnews/base/src/nsMsgFolderCompactor.cpp
> @@ +439,5 @@
> >    }
> >    if (!folderRenameSucceeded)
> >      m_file->Remove(false);
> >    if (!msfRenameSucceeded)
> >      newSummaryFile->Remove(false);
> 
> These 4 lines are the bug.
> Remove them all.
> Lines 440-443 as per patch.
> 456-459 as per original.
> See the overall review.

Well, as I wrote in comment #50, the patch assumes that nsIFile.moveToNative method is an atomic operation. So if the method fails, the target file (in this case it's the original folder file before compaction) is left there. And those temporary files, which are the result of compaction failure, should be removed.
Comment 64 Hiroyuki Ikezoe (:hiro) 2012-08-21 16:19:55 PDT
(In reply to Andrew from comment #59)
> Comment on attachment 653237 [details] [diff] [review]
> Updated test
> 
> Review of attachment 653237 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Consider my comment #55 as a review.
> 
> There I mention that it is not necessary to simulate failure.
> It is necessary to analyze the code. It would be far better.

Disagree. Auto-mated test is more useful especially once regression happens.
Comment 65 Andrew 2012-08-22 07:46:18 PDT
To comment #63.

Please read my description carefully.
The call chain is very long.
Failure may happen anywhere. Not only in CopyFileExW. Even that function may fail extraordinarily. Especially that the nsILocalFileWin::CopySingleFile() uses it separately from DeleteFileW.

Both files may be missing.

An assumption about the best outcome is dangerous.
Why not use my assumption about the worst?
We are talking about failure, after-all.
In normal operation these 4 lines are not required anyway.

There are too many reports about destruction. This precludes me from making good assumptions.
The assumption is only based on your assumption. This contradicts yourself, when you state that test is better. Anyway, you can't possibly test for a failure, which you are not aware yet of. Especially so that you are not willing to analyze the code. Instead, you rely on tests, which you never conduct.
There's no test that proves your assumption. You yourself did acknowledge that it is only an assumption.

You also missed my point that a user has to have a choice which file to pick after failure.

I already mentioned that it would be best to even let a user one open the new "nstmp" as a mail folder in mail window.

Comment #64.

You can test only known conditions.
And you only can test if those conditions occur. For example for bug 199473 and bug 545650 you need to test encrypted, and network folders. How do you access them? Do you have all the equipment in the world?

And you are not testing for those bug at all. You are not testing if those bugs in the OS are still present, or if my patch does not make them manifest again.

You only simulate a failure in nsIFile in order to trigger this bug of destruction. In other words the test is designed to prove the bug.
But why?
Look at the code, and you readily see why compaction fails and then destroys data.

Anyway, you never explained what the test is about, and how you do it.

By the way, you yourself failed to use a "review tool" now.

I add those 2 bugs (bug 199473 and bug 545650) to list named "See Also". So, they are distinct from the other dependents.
I also add them to the list named "Blocks". They triggered our bug.
Our patch would patch them too, because they were patched improperly. Especially the bug 199473.
Comment 66 Andrew 2012-08-22 08:24:37 PDT
Created attachment 654222 [details] [diff] [review]
Patch destructive compaction, and file handling

I add the patch for nsILocalFile::CopySingleFile().
A single check for the version of the OS bypasses patches for bug 199473 and bug 545650.
Discussions in those bugs indicate that old OS just works without those patches.

I already mentioned would be useful to optimise CopySingleFile further to use a return code of CopyFileEx, and MoveFileEx, rather than version of the OS.
See my attachment for explanation.
A return code explicitly indicates that a flag is not acceptable, or that a function failed.
Refrain from setting the flags in advance before you know that OS fails without them.
Call MoveFileEx, or CopyFileEx. If it succeeds, then there's no need for extra flags. Hence more optimal execution of code.
You already do so partly in one of these blocks of code.

I complete the patch for the flag, which allows decryption.
I introduce it for the case, where it was missing for no reason. One case had the flag. Another had not.
See my attachment with description of the bug.

Further optimisation to do would be to simply set this flag for all cases, without checking for it on the file.
If file is not encrypted, it would not harm.
Comment 67 Andrew 2012-08-22 08:33:20 PDT
Created attachment 654223 [details] [diff] [review]
Patch destructive compaction, and file handling

Sorry. Bugzilla does not tolerate tabs in patches. Please correct.
Comment 68 Andrew 2012-08-22 08:41:30 PDT
Created attachment 654226 [details] [diff] [review]
Patch destructive compaction, and file handling
Comment 69 Andrew 2012-08-22 08:48:27 PDT
Created attachment 654229 [details] [diff] [review]
Patch destructive compaction, and file handling
Comment 70 Kent James (:rkent) 2012-08-22 09:17:38 PDT
Hiro, really great work in trying to sort this out and move this forward. And also thanks Andrew - though I sure wish you would learn to be less aggressive and negative in your style here.

I've spent a couple of hours trying to sort this out, and let me summarize here and also comment on the latest patches.

I tried to reproduce the problems on Windows 7 using the STR in comment 7, and cannot (which is not surprising since comment 20 implies that there is an underlying OS issue that was fixed in Windows XP). But as the other comments state, and Hiro's test code shows, there is a possible failure at a very critical moment in compacting that needs to be addressed, as follows:

In nsFolderCompactState::FinishCompact line 437, the uncompacted mail file, which contains our critical data, is removed:

     rv = folderPath->Remove(false);

The encryption issue is one way this might fail, but there could be other ways as well. Once that occurs, then we must take heroic efforts to ensure that the compact completes. But in fact, if any error occurs after that, we both fail AND delete the compacted version of the files. This is what hiro's test does, it simulates a failure in the move that occurs after this, and shows that the critical data is destroyed.

So, how to fix this? hiro's approach in "Simple Fix" is to trust that the following code will do the Right Thing:

rv = m_file->MoveToNative((nsIFile *) nullptr, leafName);

Namely, it will either succeed (deleting the uncompacted file, and replacing it with the compacted file), or it will fail cleanly (leaving the original uncompacted file intact). Andrew, if I understand correctly, does not have full confidence in this.

So I see the choice as follows:

1) Accept hiro's patch as is, trusting the move "is atomic" as he puts it.
2) Don't trust the move, and add additional heroic efforts to recover.

What would 2) look like?

- If the move reports to have succeeded, all is well (so we trust success).
- If the move fails, then we test if the original file is intact. If it is, delete the unsuccessful compacted file. If not, attempt to create a unique file, then move to that file. If even that fails, then leave the temporary file.

This will have the unfortunate effect (in the hopefully rare case that the move fails in a non-atomic way) of possibly creating two file message folders with similar names, differentiated by a uniqueness number. But the alternatives are worse - leaving the cryptic temporary name, or deleting the data completely.

This is ugly, but given that problems we had had in this area, and the unpredictablity of failures, I think it is the correct approach.
Comment 71 Kent James (:rkent) 2012-08-22 09:27:47 PDT
Comment on attachment 654229 [details] [diff] [review]
Patch destructive compaction, and file handling

Review of attachment 654229 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/src/nsMsgFolderCompactor.cpp
@@ +442,5 @@
>     NS_ASSERTION(msfRenameSucceeded && folderRenameSucceeded, "rename failed in compact");
>   }
> +  // On failure do not destroy the compact copy "nstmp", and "nstmp.msf".
> +  // Let the user choose by hand which file to retain: the possibly remaining original,
> +  // or this compact copy.

If !folderRenameSucceeded, attempt to create a unique file and move to that. If even that fails, leave the temporary file.

If !msfRenameSucceeded, delete the temporary db file, as it can be mostly recovered from a valid mail file, and the temporary version is worthless junk at this point.

::: mozilla/xpcom/io/nsLocalFileWin.cpp
@@ +1819,1 @@
>     DWORD dwVersion = GetVersion();

You cannot mix patches in this part of the code with patches in mailnews. If you want to pursue this issue, file a separate bug.
Comment 72 Andrew 2012-08-22 17:28:43 PDT
Comment on attachment 654229 [details] [diff] [review]
Patch destructive compaction, and file handling

Review of attachment 654229 [details] [diff] [review]:
-----------------------------------------------------------------

::: mailnews/base/src/nsMsgFolderCompactor.cpp
@@ +442,5 @@
>     NS_ASSERTION(msfRenameSucceeded && folderRenameSucceeded, "rename failed in compact");
>   }
> +  // On failure do not destroy the compact copy "nstmp", and "nstmp.msf".
> +  // Let the user choose by hand which file to retain: the possibly remaining original,
> +  // or this compact copy.

No. Keep both files.
They are useful.
It is failure already. No need to keep it neat.

The file name is already unique. Compaction uses only unique names.
No need to make it even more unique.
Next compaction would create a file like "nstmp-1", and so on. I've seen this many times.

A user would not have any problem with "cryptic" file names, as you term them. Even if they appear as folders in mail.

A "cryptic" name is better. A user easier notices an unusual name.
If you name it "compact-1", it may get lost, despite its appearance.
Perhaps a name "original-compact" would suffice.
Perhaps you could name the file so from the start. So, you do not have to rename it later.

No need to try to rename again. It is failure anyway. More failure is likely. Let the user decide now.

A user would be grateful to have one's data safe. And still have compact data. Job done even on failure. A user is not stuck with whatever one tries to do, which we have no idea of, and have no right to assume.

That's a great additional benefit of deleting these 4 lines.
What is better?
To waste the whole job of compaction?
Or only the tiny bit - the failed renaming of files.

Remember: we are talking only about failure. It is not neat already. Why demand neat removal of "nstmp"? Why disable the user even more?

Good if original remains. What if it does not?..

Leave the "nstmp". I see no problem in that. Only benefits.

That's not to say that Kent advocates removal. He suggested quite the opposite.
I just clarify for completeness.

What to do with the "orphan" files "nstmp", and "nstmp.msf"?
On failure you should not make any assumption of what is junk.
"nstmp.msf" is a valid file for use as a folder. Let's keep it. Otherwise why has the software worked at all?

A user may open it in mail window.

To open without any further programming, one would simply have to restart Mozilla.

Later you may enhance the patch to open the file as a folder without restart.
I noted this a few times already.
But that may wait.
You may commit this patch now. And later produce a new version. No need to close the bug immediately.

Or perhaps you could open a new bug specific to "orphaned" compact files.

Meanwhile some message to user would be useful.
For example a note on the status bar:
'Compact succeeded, but failed to replace the original folder. Restart Mozilla. The compact folder will show under name "nstmp".'
Comment 73 Hiroyuki Ikezoe (:hiro) 2012-08-22 17:50:45 PDT
(In reply to Andrew from comment #72)
 
> Leave the "nstmp". I see no problem in that. Only benefits.

Unfortunately those garbage in temporary directory makes the machine slower. 

Please see bug 389132#c15
Comment 74 Hiroyuki Ikezoe (:hiro) 2012-08-22 18:11:41 PDT
Comment on attachment 653237 [details] [diff] [review]
Updated test

I just opened a new bug for the test to avoid confusion. See bug 784888.
Comment 75 Andrew 2012-08-22 19:09:46 PDT
(In reply to Kent from comment #70)
> Hiro, really great work in trying to sort this out and move this forward.

Big work - yes. But not great.
He only kept the attention on the bug.

> And also thanks Andrew

I've done a far greater job. I looked at the code. Which no one has done before, as you see. Even the people, who introduced the bad flag, forgot about it, when they heard of this bug. That's if they heard of it at all, or cared to.

> less aggressive and negative
> in your style here.

We are talking about gross incompetence here.
No way aggressive.
I just calmly stated the facts: Hiroyuki did the test. But never told us what it was, and how. And it is not a test of a patch. It is a simulation of a bug. - Two different things.

Why simulate what we see in code already?
Is this question aggressive?

Perhaps what seems to you as aggressive is the cascade of true statements, not diluted with fantasy.

> I've spent a couple of hours trying to sort this out

I spent a lot more looking for the bugs, and then describing both.

You read "aggressive" statements in 2 hours, instead of "polite" dilutions in 20 hours.

> I tried to reproduce the problems on Windows 7

Wrong approach.
You should not try only encryption.

There are many triggers for the bug. Any failure during the cal chain will trigger it. See how many dependent bugs there are listed. And many more not listed. Each delete data under different circumstances.

> an underlying OS issue that was fixed in Windows XP).

Not exactly right. WinXP introduced a new version of CopyFileEx. It accepts the new flag about encryption.
This change in itself meant nothing to Mozilla.

WinXP introduced a bug of encryption (the mentioned bug 199473).
That is why you produced a patch for it.
The new OS did not solve anything. Instead you had to patch it. But your patch was a bug in itself. Big and careless.

> Hiro's test code shows

No. It showed nothing. Maybe Hiroyuki saw something. But he never told us.
I presume his simulation was simply to insert a failure code somewhere in the call stack - a distinct instruction to return such code.
He did not provoke any natural failure. Did he? He simply inserted a result code.

> In nsFolderCompactState::FinishCompact line 437, the uncompacted mail file,
> which contains our critical data, is removed:
> 
>      rv = folderPath->Remove(false);
> 
> The encryption issue is one way this might fail, but there could be other
> ways as well.

You understood correctly.
No need for a test, or simulation.

> Once that occurs, then we must take heroic efforts to ensure
> that the compact completes.

Incorrect. Compact is already done. "nstmp" contains correct, compact mail.

> But in fact, if any error occurs after that

After what? What do you mean?
Do you mean a code to make, or do you mean the current code?

Current code does not attempt to complete compact. It simply destroys it. No heroism. No further fatal failure possible, because on failure code does nothing (except destruction).

The future code may indeed attempt something "heroic". But why bother?
You are correct that more failures may follow, if there was one already. It is better to simply leave, and retain "nstmp" with "nstmp.msf".

> we both fail AND delete the compacted version of the files.

For clarity: perhaps by the word "both" you mean not both attempts at recovery of a fault (which never takes place), but another kind of "both": failure to rename, and then destruction.

> rv = m_file->MoveToNative((nsIFile *) nullptr, leafName);

Imagine that some more incompetent programmers appear. They would decide to change this function so that it performs in 2 stages: remove the destination if it exists, then move the file.
They may decide to do so for many irrational reasons. Just like they did when they introduced a new bug, instead of the mentioned patch for bug 199473 of encryption.
How do they know that up the call stack someone "trusts" them?
All trust would go bust.

> Andrew, if I understand correctly, does not have
> full confidence in this.

Absolutely no confidence.
Above I gave one more possibility of failure.

It does not really matter what it results from. Old OS, gross incompetence, small mistakes, hardware failures, or anything else may trigger the bug here.

It is only natural to keep data as long as it is not safe.

> - If the move reports to have succeeded, all is well (so we trust success).
> - If the move fails, then we test if the original file is intact.

No need to do that. Because there's no need to delete the compact copy of the file.

> If it is, delete the unsuccessful compacted file.

Surely you can't call this: "additional heroic efforts to recover".
It is a "heroic" effort to destroy the good part of the result, and thus fail the user even more.

Consider that an already faulty OS may wrongly report status of files. Or some wrong flags may be present in the function calls from Mozilla to the OS, like they were this time.
You wrongly decide that the file exists, while it is not.
So, you blissfully delete the copy.

During a fault you better be more careful.

> If not, attempt to create a unique
> file, then move to that file. If even that fails, then leave the temporary
> file.

Not necessary to bother.
As I noted in review of my patch, you may do this at the start of compaction: name the file "original-compact".

But "nstmp" is good enough.

> in the hopefully rare case that the
> move fails in a non-atomic way

Data loss does not care about chances. When it comes, it is a disaster.

> creating two file message
> folders with similar names, differentiated by a uniqueness number.

No problem. Quite the opposite. It is good.
On restart of Mozilla a user would see 2 folders one beside the other:
"original"
"original-compact"
So, one knows what happened, and where to look.

> But the alternatives are worse
> - leaving the cryptic temporary name

The name "nstmp" is not worse really. Very nice, and standing out.

> or deleting the data completely.
>
> This is ugly, but given that problems we had had in this area, and the
> unpredictablity of failures

Yes. It does not matter if disaster is ugly or pretty.
Once failed do not disturb things more.

Hopefully, disasters are rare. Most of the time we see a pretty Mozilla.

> I think it is the correct approach.

So, my version is best.
Comment 76 Andrew 2012-08-22 19:20:09 PDT
(In reply to Hiroyuki from comment #73)
> (In reply to Andrew from comment #72)
> > Leave the "nstmp". I see no problem in that. Only benefits.
> 
> Unfortunately those garbage in temporary directory makes the machine slower. 
> 
> Please see bug 389132#c15

You are not familiar with compaction.
Mozilla creates "nstmp" in the same folder where original is.
If it remains there, a restart of Mozilla is enough to make it appear to user as a mail folder.

What garbage?
It is not as if I suggested to leave "nstmp" during normal operation. Normally, if you rename it successfully, it is removed anyway.

We are talking about failure.
Such cases are rare. Abnormal circumstances justify abnormal situation.
A user would be insane to ignore this symptom, and produce more files.
Before proceeding one would solve the failure.
Comment 77 Hiroyuki Ikezoe (:hiro) 2012-08-22 19:36:30 PDT
(In reply to Andrew from comment #76)
> (In reply to Hiroyuki from comment #73)
> > (In reply to Andrew from comment #72)
> > > Leave the "nstmp". I see no problem in that. Only benefits.
> > 
> > Unfortunately those garbage in temporary directory makes the machine slower. 
> > 
> > Please see bug 389132#c15
> 
> You are not familiar with compaction.
> Mozilla creates "nstmp" in the same folder where original is.

Thanks, I misunderstood. I supposed that temporary file is always created in temporary directory.

So, if the nstmp file is the same folder as the original and left there, the nstmp will be appeared in folder pane on Thunderbird window. Is it ok for you?
Comment 78 Kent James (:rkent) 2012-08-22 21:01:17 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #77)

> So, if the nstmp file is the same folder as the original and left there, the
> nstmp will be appeared in folder pane on Thunderbird window. Is it ok for
> you?

In what I expect to be by far the most common case, the move is indeed atomic, and the failure will result in the original mail folder file intact. In that case, the correct thing to do is to delete the nstmp file, as it is junk at this point that is of no value since the compact failed. The original mail folder file is fine and can continue to be used successfully. An extra nstmp file, while perhaps of interest to the power user, is only junk from the perspective of the average use. It should only be left there after you have verified that it is necessary. IMHO.

Andrew is correct that if the nstmp file is left around, it will be detected as a mail folder file, and a summary database file will be automatically created to use it.
Comment 79 Kent James (:rkent) 2012-08-22 22:00:00 PDT
(In reply to Andrew from comment #75)
> 
> > less aggressive and negative
> > in your style here.
> 
> We are talking about gross incompetence here.
> No way aggressive.
> I just calmly stated the facts.

Andrew, while you have good things to offer technically, your communication style here prevents your skills from being used as effectively as they could be. If you have any interest in discussing this further, please email me privately. I have no interest in debating this further in this bug, so I will make no more public comments on this issue, nor respond to further comments on this issue.

> 
> Why simulate what we see in code already?
>

Because it is now official policy to add automated testing for issues whenever possible, and hiro is following that policy. Most of us also agree with that policy, though at times it is quite inconvenient.

>
> You should not try only encryption.
>

I understand that now, but at first I was confused why hiro's test did not have anything about encryption in it. I was trying to summarize the situation for future reviewers, so they would not have to spend the same time that I have spent pouring through the comments and code. I did not add any new insights, only summary.
 
> 
> > Hiro's test code shows
> 
> No. It showed nothing.

It shows in an automated test that error loss occurs with a file error at a particular point. "Automated test" is considered a superior solution to simply talking though the code.

> 
> > Once that occurs, then we must take heroic efforts to ensure
> > that the compact completes.
> 
> Incorrect. Compact is already done.
>
> "nstmp" contains correct, compact mail

True, but SOMETHING is still undone, else we would just exit before renaming the file. If you would rather I say "The complete compact process is not done" then I will say that. If you are unhappy with my words, them please propose what you would like to call that-which-is-not-done.

> 
> > But in fact, if any error occurs after that
> 
> After what? What do you mean?

Any errors that occur after the between the ->Remove and the complete of that-which-is-not-done (the compact process).  Like a failure of the move.

> 
> Current code does not attempt to complete compact. It simply destroys it. No
> heroism.

Exactly - and I am saying (in agreement with you) that we need to add what I am calling "heroic efforts" to recover from the failure. Simply leaving the temp file as you propose is one possible "heroic effort".

> 
> For clarity: perhaps by the word "both" you mean not both attempts at
> recovery of a fault (which never takes place), but another kind of "both":

I believe you are referring to "we both fail AND delete" This is the current behavior. The "fail" is an error return from ->MoveToNative, and the "delete" is m_file->Remove(false);

> 
> Imagine that some more incompetent programmers appear. They would decide to
> change this function so that it performs in 2 stages: remove the destination
> if it exists, then move the file.

This is why, if possible, we try to add automated tests, to catch future failures.

> 
> > - If the move reports to have succeeded, all is well (so we trust success).
> > - If the move fails, then we test if the original file is intact.
> 
> No need to do that. Because there's no need to delete the compact copy of
> the file.
> 
> > If it is, delete the unsuccessful compacted file.
> 
> Surely you can't call this: "additional heroic efforts to recover".
> It is a "heroic" effort to destroy the good part of the result, and thus
> fail the user even more.

At this point, we have a perfectly valid old (but uncompacted) mail file. We can leave the old mail file, the old .msf file, and everything is as it was before we tried to do our ill-fated compact. Leaving any remnants of our failed attempt is just introducing unnecessary, confusing junk into the mail folder. The compact will be tried again later, and hopefully it will succeed next time.

> 
> Consider that an already faulty OS may wrongly report status of files. Or
> some wrong flags may be present in the function calls from Mozilla to the
> OS, like they were this time.
> You wrongly decide that the file exists, while it is not.
> So, you blissfully delete the copy.
> 
> During a fault you better be more careful.

I can see that your willingness to be heroic here exceeds mine. There are limits to the number of cascading failures that we need to consider, particular when the cost is to leave possibly 2 copies of the same mail folder around to confuse the user.

> 
> > If not, attempt to create a unique
> > file, then move to that file. If even that fails, then leave the temporary
> > file.
> 
> Not necessary to bother.
> As I noted in review of my patch, you may do this at the start of
> compaction: name the file "original-compact".
> 
> But "nstmp" is good enough.

There would be merits in choosing a better temp name originally, but I don't believe it is necessary. "original-compact" though would introduce localization issues, while "original-1" would not.
Comment 80 Andrew 2012-08-23 04:34:22 PDT
(In reply to Kent from comment #78)
> In what I expect to be by far the most common case, the move is indeed
> atomic, and the failure will result in the original mail folder file intact.

As I just have proved the move was not "atomic". Hence the data loss.
Now you start the same argument again.
This time the bug was in the same code. (Removal of original file.) Next time it will be elsewhere.

> In that case, the correct thing to do is to delete the nstmp file, as it is
> junk at this point

Why? It is correct and compact. Just what the user wanted.

> of no value since the compact failed.

No. Compact succeeded. What failed is file handling.

> The original mail folder file is fine

No guarantee of that.
And please heed my previous point that a user want to be in control. Let the user decide which file to delete by hand: the compact, or the original.

> and can continue to be used successfully.

No. A user wants it compact.

That's unless compact starts automatically without any warning, and knowledge of the user. Which is a bad default setting in preferences.
You should allow a user to make settings of compaction explicitly, so one is at least aware of it.
That's one more bug.

There are many reasons why one would not use the folder successfully. Some may be objecive. Some - subjective. You never know.
For example: a full disk.

Do not take away control from a user.

> An extra nstmp file, while perhaps of interest to the power user

An extra "nstmp" file is an extra opportunity to work around the problem.
You want to take it away.

No. "nstmp" is of interest to every user, who loses data.

You want to keep the user stuck with the problem.

> junk from the perspective of the average 

An average user feels powerless in the face of failure. You leave a user alone with one's problem.

It is the average user, who benefits the most from having a simple and easy choice immediately in their mail window:
- Delete original.
- Delete compact.

> It should only be left there
> after you have verified that it is necessary.

The only verification, is to ask the user.
Remember that "nstmp" is a successful compact folder.

Automatic verification is not reliable, because the calls to OS fail. A failing system cant be reliable enough for such decisions.

> a summary database file will be automatically
> created to use it.

It will not create it, if it already exists. That's one more reason to keep the summary. Although its removal is not harmful. It is only inefficiency both in code of compaction, and in subsequent creation of summary anew.
Comment 81 Andrew 2012-08-23 04:47:09 PDT
(In reply to Hiroyuki from comment #77)
> nstmp will be appeared in folder pane.
> Is it ok for you?

It is not only OK, it is what a user wants, when something fails. A user wants the product of the operation, which one tried.

Reminder: it will appear as a folder only after restart of Mozilla.
Expert users know this.
Novices may panic, if they see no result of compaction.

Or they may think that compaction succeeded, while it actually failed.
This is one more reason why you should not silently delete "nstmp". Because it provides a clue to the user.

Automatic display of the folder without restart is an option for further development of the patch later. First let's do the essence.
Comment 82 Andrew 2012-08-23 06:06:47 PDT
(In reply to Kent from comment #79)
> your communication style

You never explained what bothers you. So, you never should have made it public.

> > Why simulate what we see in code already?
> 
> Because it is now official policy to add automated testing for issues
> whenever possible

It is impossible to justify every line of code by tests.
Coding is logic. It is not testing.

There's no use to test whether a code returns failure, if you write a deliberate code to return failure.
There's no use to test whether the calling code fails on failure, if the code is written to fail.

> would not have to spend the same
> time that I have spent pouring through the comments and code.

So, you agree that the test is a distraction.
Its author still has not explained the test. This makes it even more so.

Comments tend to grow in numbers. That is because participants often fail to read what others mean. There's a lot of repetition.

That is why I attached a single description of the bug.

> It shows in an automated test that error loss occurs with a file error at a
> particular point. "Automated test" is considered a superior solution to
> simply talking though the code.

Perhaps you are talking about something I do not know.
I have not seen any "Automated test", and how did it show where is the error.
No report on that. Hiroyuki only wrote that something worked.

It is not a superior rocket science to notice that an operator "if" causes the error.

Do you create an "Automated test" to test the results of your "Automated test"?

> True, but SOMETHING is still undone

Only renaming.

Please be consistent with yourself. Here you mean that if "something" is not done, then it is an overall failure, and needs to be rolled back.

But you do not treat the failure itself in the same way. If "something" has failed, then it is a failure to proceed with anything.

> else we would just exit before renaming
> the file.

What do you mean?

By the way, in my comment about a report by WADA I noted exactly that: if compaction fails before reaching the function FinishCompact, then it seems to simply abandons the file, just as you speculate now.

> If you would rather I say "The complete compact process is not
> done"

Useless to say that.
The purpose of compaction is not its process. the purpose is the result.

You propose to destroy a valid result, just because of some minor side issue with something else, not with compaction itself.

> Simply leaving the
> temp file as you propose is one possible "heroic effort".

Many users will thank me for heroism in defending their right to have their data, and control, and to be able progress with compaction as far as possible, despite of failures.

> This is why, if possible, we try to add automated tests, to catch future
> failures.

You have not read my description, and my comments.
"automated tests" failed to prevent this bug, as they should. How do they decide what is a failure "automatically"?

Remember that you introduced the bad flag in nsLocalFileWin.cpp before you renounced old OS. (Bug 199473 in "See also" list.)
It should have failed.

Later you applied a patch for another bug 545650 (in "See also" list.)
There you did check for the version of the OS. But failed to embrace the bad flag. You only embraced the bad flags which you introduced with that patch.

A test of that patch should fail again. And it did not.

That is gross incompetence.

What's worse, is that nobody is paying any attention to this fact "automatically".
Nobody tries to assess the abilities of the responsible programmers, and to trust them less, or to restrict their access to production of code.

> Leaving any remnants of our
> failed attempt is just introducing unnecessary, confusing junk

Please do not drive the discussion in circles.
I already answered this point both in this message, and earlier.

A user is confused only by failure, and data loss.
One would not be confused by extra data, which you kindly give to the user in case of failure.

Again you are talking as if "confusion" was constant. It is only on failure. And it is a quite good indicator of failure.
A bad indicator is silence, as it is now.

> tried again later, and hopefully
> it will succeed next time.

No. A failure is persistent.
So you constantly keep a user in a dead loop, unable to proceed even that bit.

Only some failures may be intermittent, like the bug in the "Blocks" list.
But you never try to determine the type of failure. So you should not make any assumption, or hope for anything good.

A user would have to solve the problem by hand. Or work around it by renaming "nstmp" by hand. Hand usually succeeds. Software usually fails.

> I can see that your willingness to be heroic here exceeds mine. There are
> limits to the number of cascading failures that we need to consider

You are writing, as if you are analyzing a failure, which you know nothing of.
This bug would not be filed, had it been not persistent, and cascading.

If a bad flag is there, it is not going to disappear.
A once failed move will indeed cascade, and will fail twice, and ever more.

I repeat that you are inconsistent:
You treat one failure as a total failure of compact.
Then you should treat one failure as a total failure to operate at all.

Instead you propose to treat a failure to move with one more call to move. Two calls of a kind are very likely to fail.

> There would be merits in choosing a better temp name originally, but I don't
> believe it is necessary.

Be consistent.
At one point you say you want a better name.
Now you say you don't.

> "original-compact" though would introduce
> localization issues

We are talking about failure. The purpose of the appendix "-compact" is not to participate in mail. The purpose is to allow the user to detect abnormality. Those, who do not understand English will notice the new folder even sooner.
Consider the appendix as a mere collection of letters, "cryptic", as you termed it.

I do not think it is necessary to translate it to other languages. Let alone making a preference for it.

> while "original-1" would not.

I see no difference.
"original-compact" looks just the same.
So, the name can be either of these. I do not much care.

It would be useful to name the temporary file after the original in some way.
Comment 83 Wayne Mery (:wsmwk, NI for questions) 2012-08-23 06:33:52 PDT
(In reply to Andrew from comment #82)
> (In reply to Kent from comment #79)
> > your communication style
> 
> You never explained what bothers you. So, you never should have made it
> public.

I will offer this starting point, accusing/calling people incompetent is rude at best.  It is sufficient to point out technical errors, i.e. what you feel is incorrect code.


> > > Why simulate what we see in code already?
> > 
> > Because it is now official policy to add automated testing for issues
> > whenever possible
> 
> It is impossible to justify every line of code by tests.
> Coding is logic. It is not testing.
> 
> There's no use to test whether a code returns failure, if you write a
> deliberate code to return failure.
> There's no use to test whether the calling code fails on failure, if the
> code is written to fail.

The tests are required. REQUIRED.  And a patch will not be accepted without tests.

Of course, any proposed test would in fact need to be correct such that it catches the failure it presumes to test.


> > Simply leaving the
> > temp file as you propose is one possible "heroic effort".
> 
> Many users will thank me for heroism in defending their right to have their
> data, and control, and to be able progress with compaction as far as
> possible, despite of failures.

many users, and I dare say vast majority, won't have a clue what to do with an nstmp folder or file if they see it.
Comment 84 Wayne Mery (:wsmwk, NI for questions) 2012-08-23 07:54:24 PDT
hiro, perhaps asking feedback or review from bienvenu on a proposed patch would be useful

And, it's not anyone's fault here, but we BADLY need a working test. The amount of dataloss this is causing users is non-trivial, i.e. relatively high.  I'm not suggesting all of http://getsatisfaction.com/mozilla_messaging/tags/compactlost are caused by this bug, but I'd think a high percentage are

Thanks for your efforts.
Comment 85 Andrew 2012-08-23 10:36:19 PDT
(In reply to Wayne from comment #83)
> accusing/calling people incompetent is
> rude at best.

Then calling people criminals is rude as well.
Yet a result of every conviction is exactly this.

I gave an example of blatant incompetence of makers of patches for bug 199473 and bug 545650.
That's the proof.
Can you prove that it was competence?

People made a code, which obviously fails.
Later they added one more code, and still failed to notice that a couple of lines below the code needs the same check, which they introduced.

I am not even speaking of people, who later hear of a bug, and fail to look at the code.
For example, David Bienvenu, whom you mention. In comment #12, soon after my initial report in 2011, he did point to nsLocalFileWin.cpp. But failed to look into the file to find anything about encryption.
Instead he encouraged us to do the opposite: not to look into the file. 

Makers of that patch of encryption failed to notice that my report is about encryption too. I do not know if they did see the bug, which we discuss now. If they did, they did not care.

That is what is impolite. I am very polite in pointing these issues.

The result of this impolite behaviour is a delay, and lost time.
Instead of looking at that file I started a search for a bug from scratch. I came to that file separately out of my own conclusions.

Initially I failed to track the call stack correctly. So, I might have failed to locate the bug. This would mean a prolonged disaster for someone.

> It is sufficient to point out technical errors,

Why?
I do not know what is sufficient then...
Then anything is allowed...

These are not small errors. These are bad design.
I have mentioned this already: the mentioned other 2 bugs ("See also" list) patch the OS, rather than Mozilla. Yet they introduced one more bug.

> The tests are required.

You are not reading what I mean.
I do not accept "required" for an argument.

To start with: I wrote that Hiroyuki does test whether a new patch disturbs past patches. He only "tests" the bug in this report.
And he certainly cant test for unknown. He can't test the new patch for unknown failures.

He only can foresee some failures at best. And he would do that not by testing, but by looking at code.

The fact that you do not apply patches without tests would mean that you do not apply tests as well. Because you never test them.

I point out that you spend huge energy on minors things, yet you spare every drop of energy on looking at code.
And your response is that it is required.

Mozilla is stagnant. No change possible. However bad things would be.

> many users, and I dare say vast majority, won't have a clue what to do with
> an nstmp folder or file if they see it.

Such user would not know what is a computer anyway.
If I see a mail folder nstmp, I look at it, and find duplicate messages.
That's what any user would do, even unreasonable one.
If one decides to delete "nstmp" - that's one's choice. You can't change that, and should not.
But you should help users, who want to do something about it.
Unreasonable users would find out quite quickly why "nstmp" appear. So, they know there's a fault with compaction.

Especially if compaction reports so in status bar, or else where (perhaps in JS Error log)

The current code gives no user no clue as to what happens at all.
You don't like users to have no clue. So, you should not like that either.

I have thought why nsFolderCompactState::FinishCompact() removes the original.
I think this is a heritage of the old function MoveFile in the OS. It used to fail, if file exists. It had no flag to overwrite.
That is why it removed the original before copy.

A programmer should have known that this is inherently unsafe. One should keep the original until its compact copy ("nstmp") takes its name.
A programmer should also know that this job of moving a file belongs to file handling functions - not to compaction.

Long ago you started using MoveFileEx. It does allow overwriting. But you never changed FinishCompact to take advantage of this.
Comment 86 Andrew 2012-08-23 11:16:06 PDT
(In reply to Wayne from comment #84)
> And, it's not anyone's fault here

What do you mean?
Were those 2 bugs a natural disaster? Or were someone's fault?

If we now fail to make a good patch, whose fault would that be?
Yours? Mine? Or again it would be "anonymous"?

> but we BADLY need a working test.

Until now nobody said what exactly do you test, and how.
Kent  only made asumptions about Hiroyuki's test. But Hiroyuki himself never answered.
Do you test a bug?
Or a patch?

The bug is obvious.
The patch is not approved yet.

> amount of dataloss this is causing users is non-trivial, i.e. relatively
> high.

Then please tend to the warning in my comment #1. As I remind you in comment #55, please post warnings on all main Mozilla sites, which feature the download of the mail program (TB, SM, and perhaps other).

A year ago I was already asking keepers of those sites to post a warning. Nobody cares.

> http://getsatisfaction.com/mozilla_messaging/tags/compactlost are caused by
> this bug, but I'd think a high percentage are

Post a warning to them too.
Although, it is belated. Data is already lost. But they may repeat the loss.

One of those links to a draft article about compaction on wiki.mozilla.org.
There a warning would be useful too.

You made the "Whiteboard" mark:
[regression bug 199473][gs]
What does it mean?
Comment 87 Kent James (:rkent) 2012-08-23 11:25:21 PDT
Hiro, do you still want your patch in https://bugzilla.mozilla.org/attachment.cgi?id=653236 to go forward, or would you like to try to incorporate first my suggestions from comment 70?

If you still want to move forward, I suggest you change the reviewer to bienvenu. He is still keeping up with reviews on the weekend, and this is more his area than Standard8.
Comment 88 Magnus Melin 2012-08-23 11:27:31 PDT
(In reply to Andrew from comment #85)
> (In reply to Wayne from comment #83)
> > accusing/calling people incompetent is
> > rude at best.
> 
> Then calling people criminals is rude as well.
> Yet a result of every conviction is exactly this.

Oh come on, you know it's rude. With programming there's often unexpected situations you wouldn't think of when you write the code. That + time constraints doesn't make anyone "grossly incompetent". After all, the vast majority hasn't hit this bug.

As for automated tests, they are run continuously (https://tbpl.mozilla.org/?tree=Thunderbird-Trunk). Automated testing is however relatively new, so there's not tests for everything.
Comment 89 Andrew 2012-08-23 12:14:57 PDT
(In reply to Magnus from comment #88)
> Oh come on, you know it's rude.

Not in the slightest.
It is straight talk.
Where would we be, if to every blame we would have an answer that blame is rude?

Every source code in Mozilla has a link named "blame".
It is rude then.

> With programming there's often unexpected
> situations

Then every sweeping man can be a programmer.
Animals too.

> incompetent". After all, the vast
> majority

How do you know? From count? Or from imagination?

> hasn't hit this bug.

Perhaps they did not even notice. Because most people receive and send junk anyway.

Wayne just showed you the extent of the problem.
Most users hit it one day.
This code is so old that gigabytes upon gigabytes are lost by now.
And nobody cares.

Sorry to be so rude. Perhaps I should "politely" lie: everybody is trying one's best to avoid mistakes.

There were multiple reports even on bugzilla.
All the reporters got was an endless talk about reproducing the problem.
Instead, you should simply look at the code.

Sorry to be rude again. I should "politely" lie: everyone tried ones best to test.

> As for automated tests, they are run continuously

Thank you for the link.
Now I know it. But I still have no clue about what the link does. And what the test by Hiroyuki does.
Is there an explanation somewhere? No link to explanation from there.

> there's not tests for everything.

And there can't be.
Comment 90 Magnus Melin 2012-08-23 12:47:08 PDT
(In reply to Andrew from comment #89)
> It is straight talk.

Well the tone of that talk isn't doing anything good.

> > incompetent". After all, the vast
> > majority
> 
> How do you know? From count? Or from imagination?

Given a user base of tens of millions, there would be a lot more complaints filed.

> Now I know it. But I still have no clue about what the link does. And what
> the test by Hiroyuki does.

There are builds generated constantly from the latest hg tip - to make sure it builds and tests pass (tests run automatically for all builds). tbpl shows the results of those builds and test output, in combination with the hg checkins. 

AFAIKT, Hiro's test creates a folder, fills it with mail, deletes some, calls compation (and having it fail), then checks that even on (this case of) failure the target folder still exists.

> > there's not tests for everything.
> And there can't be.

Of course not, but it's still a goal to test as much as possible. I have no idea why one would be against that, except for that it's much work to create tests.
Comment 91 Hiroyuki Ikezoe (:hiro) 2012-08-23 16:32:15 PDT
Please let this issue separate into several bugs.

As Andrew has mentioned, this issue includes two issues. The one is in compactor code, the other is in XPCOM.

To step forward this issue step by step, I'd suggest separating this issue into more small pieces.

1. The issue of original folder removal when neIFile.moveToNative fails
2. Disposition of temporary files (removed or not)
3. The failure of nsIFile.moveToNative on encrypted disk on Windows XP
4. Atomic-ness of nsIFile.moveToNative on Windows
5. Test

-- #1 --
IMHO, #1 is the most critical issue, should be fixed as soon as possible. My patch has the fix only for this issue.

-- #2 --
We need to consent about #2.

-- #3 --
Andrew's patch already includes the fix for #3. The part should be separated since the XPCOM module owner is totally different from comm-central.

-- #4 --
nsIFile.moveToNative is atomic both on Linux and MacOS. The method should be atomic on Windows too. (I just googled a bit, Windows Vista might provide atomic operation of renaming file.)

-- #5 --
Already filed bug 784888.
Comment 92 Hiroyuki Ikezoe (:hiro) 2012-08-23 17:19:40 PDT
Removed the dependency for the test.
Comment 93 WADA 2012-08-23 23:03:58 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #91)
> To step forward this issue step by step, I'd suggest separating this issue
> into more small pieces.
> 1. The issue of original folder removal when nsIFile.moveToNative fails
> 2. Disposition of temporary files (removed or not)
>(snip)
> -- #1 --
> IMHO, #1 is the most critical issue, should be fixed as soon as possible. My
> patch has the fix only for this issue.

Bug 498814 is for next problem.
(case-A) Interfere of Tb's first open of "original" as first step of Compact,
         by other software such as auto-backup.
         Note: manual forcing of condition is always possible and easy.
  "Sharing Violation" => null "nstmp" => null "original"
  This null Mbox file was avoided by introducing m_totalMsgSize.
(case-B) Interfere of Tb's nsIFile.moveToNative after successful nstmp/nstmp.msf
         creation, by other software such as auto-backup.
         Note: manual forcing of condition is impossible.
  "Encrypted original file on Win-2K of this bug" looks a way to force 
  nsIFile.moveToNative failure.

Because nstmp/nstmp.msf, original/original.msf is closed by Tb before nsIFile.moveToNative, any other software can open any of nstmp/nstmp.msf, original/original.msf before Tb's nsIFile.moveToNative.
So, following can occur even after your patch.
  moveToNative(nstmp     -> original)     == successful
  moveToNative(nstmp.msf -> original.msf) == unsuccessful
In this case, content of remaining original.msf is still old but original is already updated, so "Outdated-MSF condition" happens after end of Compact, and garbage of nstmp.msf is kept unless nstmp.msf is deleted.

A far as "internal rebuild-index" is normally invoked upon next mail folder open, problem is "unexpected rebuild-index after Compact" only.
In some bugs, phenomenon of "rebuild-index after Compact" is seen.
However, some codes appends mail data to mail box file without explicit mail folder open, then "Outdated-MSF condition in .msf" is kept until next explicit mbox open like "Mail folder click". This may produce unwanted problem, and may be a cause of some funny phenomena after silent auto-Compact in some bugs.

Ikezoe san, do you have solution of this kind of issue?

Another question.
As seen in prefs.js corruption report even after "Safe file writing of prefs.js",
  write to prefs-n.js, delete prefs.js, rename prefs-n.js to prefs.js
power failure/system crash can occur while OS is executing "rename".
Because rename is "update of file system's directory content", it is affected by power failure etc.
Is there any protection from this kind of data loss issue in Compact folder?
Comment 94 Hiroyuki Ikezoe (:hiro) 2012-08-23 23:32:19 PDT
(In reply to WADA from comment #93)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #91)
> > To step forward this issue step by step, I'd suggest separating this issue
> > into more small pieces.
> > 1. The issue of original folder removal when nsIFile.moveToNative fails
> > 2. Disposition of temporary files (removed or not)
> >(snip)
> > -- #1 --
> > IMHO, #1 is the most critical issue, should be fixed as soon as possible. My
> > patch has the fix only for this issue.
> 
> Bug 498814 is for next problem.
> (case-A) Interfere of Tb's first open of "original" as first step of Compact,
>          by other software such as auto-backup.
>          Note: manual forcing of condition is always possible and easy.
>   "Sharing Violation" => null "nstmp" => null "original"
>   This null Mbox file was avoided by introducing m_totalMsgSize.
> (case-B) Interfere of Tb's nsIFile.moveToNative after successful
> nstmp/nstmp.msf
>          creation, by other software such as auto-backup.
>          Note: manual forcing of condition is impossible.
>   "Encrypted original file on Win-2K of this bug" looks a way to force 
>   nsIFile.moveToNative failure.
> 
> Because nstmp/nstmp.msf, original/original.msf is closed by Tb before
> nsIFile.moveToNative, any other software can open any of nstmp/nstmp.msf,
> original/original.msf before Tb's nsIFile.moveToNative.
> So, following can occur even after your patch.
>   moveToNative(nstmp     -> original)     == successful
>   moveToNative(nstmp.msf -> original.msf) == unsuccessful
> In this case, content of remaining original.msf is still old but original is
> already updated, so "Outdated-MSF condition" happens after end of Compact,
> and garbage of nstmp.msf is kept unless nstmp.msf is deleted.

To solve this case, a new msf should be re-created from the folder which has been compacted. Right?

> Another question.
> As seen in prefs.js corruption report even after "Safe file writing of
> prefs.js",
>   write to prefs-n.js, delete prefs.js, rename prefs-n.js to prefs.js
> power failure/system crash can occur while OS is executing "rename".
> Because rename is "update of file system's directory content", it is
> affected by power failure etc.
> Is there any protection from this kind of data loss issue in Compact folder?

Though I can not understand what is the problem you mentioned, if the OS provides "rename" as an atomic operation, application have no choice about it?
Comment 95 WADA 2012-08-24 00:21:20 PDT
(In reply to Hiroyuki Ikezoe (:hiro) from comment #94)
> To solve this case, a new msf should be re-created from the folder which has
> been compacted. Right?

Right. "Internal rebuild-index upon outdated msf condition" is similar to "read important data in .msf" + "delete .msf" + "reparse mail data and reconstruct .msf data". In order to avoid loss of important data such as "tag data where IMAP server doesn't support user keyword", old .msf is needed for expected rebuild index.

> Though I can not understand what is the problem you mentioned, if the OS
> provides "rename" as an atomic operation, application have no choice about
> it?

When I saw loss of prefs.js after system crash on Win-XP, CHKDSK by OS was invoked upon power-on/reboot, and removal of incomplete prefs.js from file system was observed. IIRC, prefs-n.js was not kept by OS.
i.e. Intermittent status while "rename from prefs-n.js to prefs.js with replace mode" looks not atomic if power failure or system crash happens while renaming from prefs-n.js to prefs.js.
If similar thing happens while "rename from nstmp to original", loss of nstmp and original may occur.
To protect from this kind of issue, "rename current to old, rename new to current, delete old" with "undo or redo if intermittent status" is needed to protect from file loss due to power failure. This is needed in current "safe file writing of Mozilla" to protext from outstandig loss of prefs.js.
And, "rename nstmp to original + rename nstmp.msf to original.msf" with "undo or redo if intermittent status of two consecutive rename" is needed, if "avoiding rebuild index upon rename nstmp.msf to original.msf failure" is mandatory.
Comment 96 Hiroyuki Ikezoe (:hiro) 2012-08-24 02:01:26 PDT
(In reply to WADA from comment #95)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #94)
> > To solve this case, a new msf should be re-created from the folder which has
> > been compacted. Right?
> 
> Right. "Internal rebuild-index upon outdated msf condition" is similar to
> "read important data in .msf" + "delete .msf" + "reparse mail data and
> reconstruct .msf data". In order to avoid loss of important data such as
> "tag data where IMAP server doesn't support user keyword", old .msf is
> needed for expected rebuild index.
> 
> > Though I can not understand what is the problem you mentioned, if the OS
> > provides "rename" as an atomic operation, application have no choice about
> > it?
> 
> When I saw loss of prefs.js after system crash on Win-XP, CHKDSK by OS was
> invoked upon power-on/reboot, and removal of incomplete prefs.js from file
> system was observed. IIRC, prefs-n.js was not kept by OS.

I suppose this is an evidence that Windows XP does not provide atomic rename.
Comment 97 Andrew 2012-08-24 08:42:39 PDT
Created attachment 655022 [details]
Description of the 2 bugs. (Formerly comment #23.)

Updated the section about the "History of bad code" in nsFolderCompactState.
There I make an interesting revelation about safe ("atomic" as you cal it) file handling, which Mozilla had previously. I do not suggest to return that feature though. The current OS handles renaming well enough, and fast.

Updated the sections:
- about "Impact" and the not importance of the quantity of reports.
- "Dereliction of duties". Including your remarks about rudeness. But mainly about responsibility, and meritocracy, which is a rule in Mozilla (not optional).

WADA, nothing cant prevent you from interfering with compaction.
See comment #14. There's no cure against brute force.

You make various suggestions of how to create, or recreate "nstmp.msf", and move this and other files on failure in some different way.
But every such subsequent action may encounter the same failure, because other software may keep opening files at will.
For example, how do you recreate "nstmp.msf", if the original is still open by other software, and Mozilla cannot open it?
You can't possibly program to prevent this mess.
One solution might be to "lock" files during compaction. Maybe open them in "not shared" mode. But then maybe the OS would not allow a "move" of another file to your open file. You would rather have to write to file, rather than move it. Maybe other ways are possible.
Maybe defer the move till destination becomes available.
To think about this requires lots of time.

You should not patch the OS in Mozilla, like you did with the bugs in the list named "See also".
The OS should have a patch itself. That is what I prefer.

Here you only cure Mozilla, so it is usable, and does not loose data.
I think your request does not belong here.
You did not answer my remarks in comment #55.
Your request is not about loss of data, isn't it?

As I mentioned, it is the same function of code. So, we could solve your bug and this bug in one go.
But isn't it better to separate them? Because your bug may wait. It is not critical.
That's unless you may design a patch quickly. Then there's no problem in doing all at once.

I think that making other software read files of Mozilla is solely your responsibility.
You could program your software to access the files safely.
You may also program Mozilla to wait for a safe opportunity, when "nobody is looking" at the files. Maybe you can develop such an extension in JavaScript. No need to mess with low level code.

Frankly, I did not analyze your comments in detail. It is too difficult to follow your thought, and infer your wishes from he pile of data. Not clear. I only glanced at them. Maybe I misunderstood you. You may try again.
If they are not related to data loss (either itself something that triggers it), then it is better to put your comments in your bug, where they belong. Less clutter.

You may create a summary of your bug, like I did here. Make an attachment, and update the text sometimes.

To Kent in comment #71.
My 2 patches in one attachment do indeed span 2 different files.
But the rules of Bugzilla do not forbid this way. Rules actually allow it.
Just we would need a "super-reviewer" in this case, in addition to normal reviews.

We could think of this patch as new. Then it may belong here.
Or we could think of it as an extension to patches in the "See also" list (bug 199473, bug 545650).
Then you should reopen one of those.
Commit my patch. Later we could refine it as per my suggestions in attachment about "Description". Namely, to use return error codes rather than version of OS. So, if the OS is patched already, the patch in Mozilla does not act automatically.

To Magnus in comment #90.
Kent in comment #70 speculated differently:
That Hiroyuki's test tests the bug.
You speculate that it tests the patch.
This means that Bugzilla does not know.

To Hiroyuki in comment #91.
Your list of solutions is incomplete.
Add my list in my attachment named "Description of the 2 bugs".
You missed the need to post warnings on all sites.

Nobody proved that removal of "nstmp" is necessary.
I provided plenty of arguments against it.
This means that we may act now.
The bug is critical, hence - urgent.
Later you may take time to refine it.

Please commit my patch of "FinishCompact" on my behalf.

I could do that myself. But I do not understand the rules well. Does the patch need a test, or only bugs are tested?
The patch is trivial. Little test is necessary anyway, if at all.

The rules are a too complex mess. Explanation of the rules, which I read, is a mess in itself.
I write about this in my attachment as well.
Comment 98 Hiroyuki Ikezoe (:hiro) 2012-08-24 17:34:40 PDT
(In reply to Andrew from comment #97)

> To Hiroyuki in comment #91.
> Your list of solutions is incomplete.
> Add my list in my attachment named "Description of the 2 bugs".
> You missed the need to post warnings on all sites.

6. Post a warning on all sites of Mozilla, which distribute the mail software

So can you agree that this bug is separated into these 6 bugs?
Comment 99 Andrew 2012-08-24 18:56:22 PDT
Created attachment 655246 [details] [diff] [review]
Patch destructive compaction, and file handling

Additional patch.
I named "nstmp" as "original-1" to begin with.
So, you do not have to bother to rename it later. Renaming may fail not just to original, but to any name due to unexpected bugs.
In that case of failure a user would still be able to proceed.

(To Hiroyuki from comment #98)
What do you mean "separated"?
There are no 6 separate bugs.

There are 2 bugs in 1: compaction, and file handling.
As I just noted, it is better to patch them together. Because it is easier to read about them in one place. Programmers will learn that code is not separate. Pieces of it interact.

It is possible to separate nsILocalFileWin.
It is your point 3.
But then it is better to put it into bug 199473.
I think your rules state that bug 199473 and perhaps bug 545650 should be rolled back immediately now. Then my patch could patch them again.
Simultaneously authors of those patches would have a chance to test again if the patches are still necessary. Maybe the bugs in OS are gone.

Your points 1, 2 are just one bug.
I do not agree to delete "nstmp" (or "original-1", as per my patch). 
Safety - first. Remove all dangerous lines of code.
Please commit point 2.
Lets do it, and see what happens.
Later you may argue about it.
I think nothing bad will happen. Users will not complain about extra folders. Because failure is rare. If failure persists, they would not complain, because they would still have folders compact.
Perhaps no discussion would be necessary.

By the way, this my latest patch solves the point 2, by removing the need to rename.

Your point 4 is not a bug.
No proof of it. Only speculation.
No need to file a bug.
Note my attachment named "Description ...".
See section "History of bad code".
There I explain that "atomicity", as you term it, initially was in the same nsILocalFileWin.
If you really want to make Mozilla paranoid, then you simply can find which patch removed that, and revert it.
But this reduces efficiency.
I think it is best to leave it till you find evidence.

Your point 5 about the test is already separate. I presume the system does not allow to merge bugs, or delete them. So, it is pointless now to discuss.
Otherwise, I think that all related stuff must be in one place.
But maybe I do not know something about how you usually handle tests, and what they are.
I can't say definitely whether you should merge them or not.

Point 6 (the warning) can't be a separate bug. It is just one of the jobs to do.
Comment 100 Hiroyuki Ikezoe (:hiro) 2012-08-24 19:10:34 PDT
(In reply to Andrew from comment #99)

> It is possible to separate nsILocalFileWin.
> It is your point 3.
> But then it is better to put it into bug 199473.
> I think your rules state that bug 199473 and perhaps bug 545650 should be
> rolled back immediately now. Then my patch could patch them again.

If you really want to do so, please file a new bug for it.
As you can see the top of this page, there is "Product:" field which is "Mailnews Core" for this issue. So XPCOM guys don't see your comment here and then your patch will not be accepted forever.

> Point 6 (the warning) can't be a separate bug. It is just one of the jobs to
> do.

On this bugzilla we usually handle many kinds of jobs. Writing document, updating web sites, setting up servers. etc. etc.
Comment 101 Andrew 2012-09-04 07:18:21 PDT
Created attachment 658071 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Add a warning to web sites.

(To Hiroyuki comment #100)
>
> If you really want to do so

You have to want to do so too. It is not just my issue.
But I see no progress.
You disappoint all users of Mozilla.

> please file a new bug for it.

I explained that splitting one issue into many bugs is not effective.

> So XPCOM guys don't see your comment here

They did see my comment there: bug 199473.
But they don't care anyway.

> On this bugzilla we usually handle many kinds of jobs.

Your word "usually" implies inaction, and bureaucracy, rather than action.

I explained that rules of Mozilla, which all of you so much defend do actually allow joining many jobs into one bug.

They also make production of bugs easy. And their repair next to impossible.
Look at the link named "Web site" in the head of this bug. People are suffering. And you do nothing. That's what I call rude. I am not rude.

Act now.
Comment 102 Andrew 2012-09-04 07:54:45 PDT
Created attachment 658088 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Cosmetic to make "diff" display correctly. It misunderstands incomplete line breaks (LF without CR).
Comment 103 Ludovic Hirlimann [:Usul] 2012-09-04 08:00:54 PDT
(In reply to Andrew from comment #102)
> Created attachment 658088 [details] [diff] [review]
> Patch destructive compaction, and file handling. Warnings on web sites.
> 
> Cosmetic to make "diff" display correctly. It misunderstands incomplete line
> breaks (LF without CR).

YOu'll need to set reviewers too.
Comment 104 Jim Mathies [:jimm] 2012-09-04 08:21:52 PDT
Comment on attachment 658088 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Review of attachment 658088 [details] [diff] [review]:
-----------------------------------------------------------------

Please break the patch up into two patches - one for mailnews and one for xpcom.

::: mozilla/xpcom/io/nsLocalFileWin.cpp
@@ +1821,5 @@
>     if (dwMajorVersion > 5) {
> +    //Patch the bad new OS (bug 199473, bug 545650).
> +    //This patch should be redundant, after the OS itself is patched.
> +    //Set the flags only for new OS.
> +    //Old OS would fail to accept them. It works well without them anyway.

Please delete all these comments. The if condition explains the behavior.

@@ +1822,5 @@
> +    //Patch the bad new OS (bug 199473, bug 545650).
> +    //This patch should be redundant, after the OS itself is patched.
> +    //Set the flags only for new OS.
> +    //Old OS would fail to accept them. It works well without them anyway.
> +        DWORD dwCopyFlags;

nit - bad spacing.

@@ +1827,5 @@
>         bool path1Remote, path2Remote;
>         if (!IsRemoteFilePath(filePath.get(), path1Remote) ||
>             !IsRemoteFilePath(destPath.get(), path2Remote) ||
>             path1Remote || path2Remote) {
>             dwCopyFlags = COPY_FILE_NO_BUFFERING;

Why are you excluding 'COPY_FILE_NO_BUFFERING' treatment for dwMajorVersion == 5?

@@ -1825,5 @@
>             !IsRemoteFilePath(destPath.get(), path2Remote) ||
>             path1Remote || path2Remote) {
>             dwCopyFlags = COPY_FILE_NO_BUFFERING;
>         }
> -    }

You seem to have removed the wrong brace or something. This doesn't compile.
Comment 105 Andrew 2012-09-04 08:23:08 PDT
Created attachment 658100 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

I already explained:
- I don't know where to find the reviewers.
- I am not able to commit the patch.
Anyone please help to progress. Don't just wait for me.

Kent has already reviewed part of my patches (comment #71).
But he failed to heed my counterarguments.
Comment 106 Andrew 2012-09-04 09:24:27 PDT
Comment on attachment 658100 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

(To Jim comment #104)
> Please break the patch up into two patches - one for mailnews and one for
> xpcom.

Is it really necessary?
Why?
Does something not work?

> Please delete all these comments. The if condition explains the behavior.


The note about bug numbers is necessary. The reader can't always know the reason.
The note that the code may be redundant in the future attracts attention. So, you never forget to check when it is safe to remove the entire case.
The note about failure is necessary. Because the case is not benign optimisation for some OS. This note would prevent introduction of this bug anew. Fool proof. We have seen that this is necessary.

> nit - bad spacing.

4 spaces before comments.
4 - before "if (dwMajorVersion > 5)"
Comments refer to that line, and to the entire case. Rather than to a single line of code below comment.
I could place the comment above "if". But it is better to put it inside the case to which it refers.

8 spaces before "DWORD dwCopyFlags;".
8 - before the line that follows it.

> Why are you excluding 'COPY_FILE_NO_BUFFERING'

It is not me. I do not change that. I only embraced the flag of encryption (COPY_FILE_ALLOW_DECRYPTED_DESTINATION). I made an entirely separate case for new OS.

Makers makers of the patch for bug 545650 did so. They excluded old OS. They report that old OS uses an old network protocol, which does not exhibit the bug.
But they forgot to embrace the bad flag of encryption.
How did they manage to miss it? It was only a few lines below.
They missed the opportunity to inoculate the bug 199473. I would say this is a bug in itself.

> You seem to have removed the wrong brace or something. This doesn't compile.

It is not me.

There's a problem with "diff". It loses, hides part of the patch.
Look at the raw text of the patch. You will see that all braces are in place.

Before I fixed line feeds, both "diff" and "Splinter review" displayed less lines for nsLocalFileWin.cpp, than I actually included.
So, it misses, hides, does not display the terminating brace at line 69 as per my new version. - And many lines above it.

After that brace I restored the entire case, as it was before the bug of bad flags. Namely: using solely MoveFileExW, instead of juggling with CopyFileExW, with its bad flags, and MoveFileExW.

Now "diff" displays correctly. But "Splinter review" still hides most lines.

Alternatively I suggest to completely remove the case with flags, and leave only my new (old) case without them. This would revert patches for bug 199473, and bug 545650. Because they cause failure.
If they want to apply those patches, let them try again, and waste at least the same time, as I have wasted.
See section "Solution" in my attachment with "description" of the bugs. The see a "radical" patch for "nsILocalFileWin". Namely: that new (old) case after "if" with the opposite case removed.
Comment 107 Jim Mathies [:jimm] 2012-09-04 10:38:28 PDT
Comment on attachment 658100 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

There's no need for a superreview on this patch, it hasn't been reviewed yet and I'm sure Brendan has better things to do with his time.

The purpose for splitting the patch up between mailnews and xpcom is that each part will be reviewed by two different people.

I will help by doing this for you.
Comment 108 Jim Mathies [:jimm] 2012-09-04 11:07:39 PDT
Created attachment 658155 [details] [diff] [review]
xpcom patch v.1
Comment 109 Jim Mathies [:jimm] 2012-09-04 11:24:18 PDT
Created attachment 658161 [details] [diff] [review]
andrew's fix v.1

This is Andrew's proposed solution to for the mailnews code. I do not know this code. The maintainers of mailnews can battle over which solution to review and land. The patch applies correctly to cc.
Comment 110 Hiroyuki Ikezoe (:hiro) 2012-09-04 14:42:12 PDT
(In reply to Andrew from comment #101)
> Created attachment 658071 [details] [diff] [review]
> Patch destructive compaction, and file handling. Warnings on web sites.
> 
> Add a warning to web sites.
> 
> (To Hiroyuki comment #100)
> >
> > If you really want to do so
> 
> You have to want to do so too. It is not just my issue.

Thanks. I just wanted the word.
And thanks jimm for opening bug 788212.
Comment 111 Andrew 2012-09-05 11:49:01 PDT
(Jim comment #107)
> There's no need for a superreview on this patch

Your rules say the opposite: that there is a need.

> it hasn't been reviewed yet

It has.

Kent reviewed it (comment #71), but failed to approve for no valid reason.
Others reviewed it too. But failed to provide arguments against it.

You too made comment #37. Although a scant one.

Rules do not prohibit concurrent reviews anyway.

> I'm sure Brendan has better things to do with his time.

So do you too.

I decided that I need his review.

And you do indeed spend time on other things, instead of attending to this bug.
You spent over a month doing nothing. Your review is still welcome. But I asked it from someone else.

Instead you split this bug report into new pieces. Perhaps to ensure people see less failure than there actually is.

Consider your comment #37. It is about to mail - the bug of destruction.
Later you comment #41 about W2K, as if your previous comment related to file handling (the bug of encryption).
Not a very reliable reviewer I would say.

Why you are so keen to remain the sole reviewer of the bugs?
Let other people do it, just like you yourself suggested.

> The purpose for splitting ...
> each part will be reviewed by two different people.

Who told so?

I would request reviews of 2 supervisors anyway.
Just the system of Bugzilla did not allow me to choose both at a time.
I wonder how Hiroyuki was able to set 2 reviewers for his patch?

> I will help by doing this for you.

No splitting, thank you.
I'd like this to be one bug.

I asked for help only with committing the patch.

In comment #106 I asked you: why split?
Can you compile separate parts of the patch?
Or do you compile it only as a whole?
You never explained what does not work.
Hence, there's no reason to split.

You made my patch obsolete. So my warning for the web sites is no longer visible.
Not very helpful either. Instead of publishing that warning, you hide it.
I already asked many people to publish the warning. Nobody wants to do that.
Is that on purpose?

I restore my patch, and invite super reviewers.
Later I will decide who will be a reviewer.

But I suggest not to wait. Commit the patch at least for the nsIMsgFolderCompactor immediately. Review it later, if you want.

As I write in bug 199473#c19, and in your duplicate of it - bug 788212#c13: you failed to follow your own bureaucratic rules already, when you made a buggy patch for bug 199473.
Don't shy away from again bypassing the rules now.
Comment 112 Andrew 2012-09-05 12:02:16 PDT
Created attachment 658575 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

I found that Splinter "diff failed to work, because it stops on empty lines. Put at least one space on such lines.
The other "diff" in does not fail in this way.
Please fix Bugzilla?

I use less lines of replacement. So, the view is cleaner that that of Jim's patch. There's less code change, than it seems from his patch.

I can split the patch into files, modules, of functions. But if it compiles anyway, then no need to do that. At least you did not explain.
Comment 113 Andrew 2012-09-05 12:24:22 PDT
Created attachment 658585 [details]
Description of the 2 bugs. (Formerly comment #23.)

Clarified a little about the need to check the return codes of functions, like  CopyFileExW, rather than checking solely the version of the OS.
Also about my "radical" patch. - The one, which discard patches for bug 199473, and 545650.
(Section: "Solution".)

Added a few more rants about responsibility.
(Section: "Dereliction of duties").

Please fix Bugzilla. It automatically does not detect file type HTML. I have to set it after I attach.
Comment 114 Andrew 2012-09-08 09:22:11 PDT
Comment on attachment 658161 [details] [diff] [review]
andrew's fix v.1

Review of attachment 658161 [details] [diff] [review]:
-----------------------------------------------------------------

Your patch is functionally the same as the corresponding part of mine.

Please commit any of them.
It is urgent. Remember? Don't wait for another year.
If my patch does not compile, commit yours.

We would discuss further optimisation later.
See the section "Solution" in my attachment named "Description ...".

The difference between your patch and mine is that yours has more lines to add. They appear because, you change the indent. Perhaps your system requires to mark any change, even the space.
Then your patch is less readable than mine.
Can you remove indent without marking a change?

Bugzilla does not allow me to add any review mark to your patch.

::: mailnews/base/src/nsMsgFolderCompactor.cpp
@@ +438,5 @@
>    bool folderRenameSucceeded = false;
>    bool msfRenameSucceeded = false;
>    if (tempFileRightSize)
>    {
> +    // rename the copied folder and database to be the original folder and

Here you missed my comment
// Move compact "original-1" (formerly known as "nstmp") to the original file, and overwrite it.
I think it is necessary, because it clarifies the situation.
Comment 115 Andrew 2012-09-08 09:31:25 PDT
Created attachment 659494 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Extended warning for Web sites.
Edited a comment in Finishcompact.

Cosmetic workaround to make "Splinter diff" in Bugzilla work: added spaces to one more empty line, so "Splinter diff" does not stop on it.
Please patch Bugzilla.
Comment 116 Andrew 2012-09-08 10:33:24 PDT
Created attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.
Comment 117 Andrew 2012-09-08 10:51:34 PDT
Created attachment 659506 [details]
Description of the 2 bugs. (Formerly comment #23.)

Extended analysis of a bug in function "Finishcompact".
At line 453 there is a missing instruction to bypass destruction of compact copies. At least this should be present.
At that point the original is guaranteed to not exist.
This proves the gross incompetence of the programmer.

I do not say that the current code would be good, if it contained the bypass. I say that the bypass was obligatory, if the programmer wanted to make the code as is.

Extended the section named "Solution". The list of optimisations to do after the urgent part of the patch is committed.
Comment 118 Mark Banner (:standard8, limited time in Dec) 2012-09-08 14:07:04 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Andrew, if you are not sure about our processes, please ask. Our process for review patches is here:

https://developer.mozilla.org/en/Mailnews_and_Mail_code_review_requirements

Your patch does not need superreview, if Kent or another peer/owner grants review, that is good enough.

If you do not agree with the fact Kent denied your review, then I suggest that you work with him, to resolve his concerns - I agree with those concerns, we shouldn't be leaving temp files around on a system just in case unless it is absolutely unavoidable, which in this case, I don't believe it is.

Please also keep comments on this bug to just those necessary to discussion the bug itself and to resolving a fix for the bug and getting it landed. Extra comments about your impressions and feelings do not help in getting this resolved.

I've tried to be concise here, so if you have any discussion/issues about what I've said that don't relate to fixing this bug, please email me direct.
Comment 119 Andrew 2012-09-08 17:04:12 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

(Mark comment #118)
> Andrew, if you are not sure about our processes, please ask.

I did. No valid answer.
For example, see my comment #97 about the 2 opposite interpretations of test by Hiroyuki expressed by reviewers Magnus and Kent.

In most other cases I get only statements, and no reasons.
Most of the time replies contradict your own rules.

> Our process for review patches is here

Thank you.
Now I know that I need one more review: about user experience with user interface.
He will tell you whether a user wants to keep compact mail, or not, when failure occurs.

I mean your mentioned "temporary" files. I leave them as "original-1". And let the user choose what to do with them.

Later you may enhance the patch to open the compact "original-1" as a mail folder for user without the need to restart Mozilla.
Provide messages to the user about failure.
See the plan in section "Solution" in attachment named "Description ...".

> Your patch does not need superreview

Read your own rules please. My patch spans two different modules. So, super is mandatory.
Did you read this bug at all?

I also would like to escalate the issue of dereliction of duties. A super reviewer may help with that.
Is this what you are trying to avoid.

The super-reviewer will speak for himself. Do not shut him up.
You gave no reason why he must "shut up".

For these 2 reasons (rules and absence of reason) I restore the super-reviewer.

> if Kent or another peer/owner grants
> review, that is good enough.

We would wait for that another 7 years.

> If you do not agree with the fact Kent denied your review, then I suggest
> that you work with him

I gave my reasons, counterarguments. Nobody has denied them yet. No need to work on them privately. They are all public.

> I agree with those concerns, we shouldn't be leaving temp files around

Have you read the bug?
What temp files? The files are the only and last copies of data.
Who are you to decide for the user which data to destroy?

In normal operation you would be absolutely right.
But we are talking about failure. In that case give control to user.
I renamed the files after their original names. They are no longer anonymous "nstmp".
No mess.

> unless it is absolutely unavoidable, which in this case, I don't believe it
> is.

No beliefs, please.
Provide only reasons.
Can you counter argue mine?

> Please also keep comments on this bug to just those necessary to discussion

Look at this bug. It is full of unnecessary comments, not made by me. Lets start with those, who asked me to test different versions of Mozilla, instead of looking at code.

What discussion? About beliefs?
Nobody discusses my arguments.

You do not tell what is "off topic" anyway. So, I dismiss your comment.

> your impressions and feelings

Do not try to portray, and dismiss me as "bad feelings".
I only mean meritocracy.
I explain this in my section "Dereliction of duties" in attachment named "Description ...".

> do not help in getting this resolved.

How your comment helps resolution?
You removed the super-reviewer.
You did not commit the patch.
What did you do?

Take the patch, and commit it now.
Jim has already started that. This means that he agrees with my patch. He agreed that later you may argue more.
He even compiled something. But he did not commit it.

> don't relate to fixing this bug, please email me direct.

If there were anything personal, I would.
All this is public. Everyone will see how much little worth is your declared care about users.
You should commit the patch immediately. Later argue about cosmetics.
Instead, you drag your feet, and argue about bureaucracy.
Aren't you ashamed any little bit? The bug is over 1 year old. And nobody cares. Yet you ask me to keep private, lest anyone sees the shame.

You don't mention what is "off topic". You try to divert the talk off the public view. Such comments of yours are off topic themselves.

I could simply revert your change silently.
But I do comment. So, you know the reason. Unlike you. I always reason.

Please provide a proper review, if you would. But before that please read the bug.

Please post warnings about data loss on all sites. Take the text from my patch.
Comment 120 Gervase Markham [:gerv] 2012-09-09 01:43:39 PDT
Andrew: your bug is valid, but your insistence that you know Mozilla's way of doing things better than the long-term Mozillians who are commenting in this bug is strongly working against your goal of getting it fixed.

I would also suggest that being rude to people, or at the very least brusque, and attributing bad motives to them, is not normally a good way to get them to do what you want.

I don't want to ban you from Bugzilla, but the sanity of our long-term contributors has a high value placed upon it and I will take steps to protect it if necessary. So please, please, adjust your approach so we can all work together on solving this.

Gerv
Comment 121 Andrew 2012-09-09 07:01:11 PDT
(Gervase comment #120)
> your bug is valid

It is your bug. Maybe even it is you, who made it.
It is very, very old. - "Long-term", as you say.

People lose data every day.

Their sanity does matter.
Yours - does not. You can always go elsewhere for peace of mind.

Commit the patch now.
And post warnings now.

If you fail to do that, I will publish the rest of my reply to your rant. I will include it in my attachment named "Description ...", section "Dereliction of duties".
Comment 122 Hiroyuki Ikezoe (:hiro) 2012-09-10 16:37:18 PDT
(In reply to Andrew from comment #121)
> (Gervase comment #120)
> > your bug is valid
> 
> It is your bug. Maybe even it is you, who made it.

As you may know, 'your bug' means the bug you opened. i.e. bug 674742 itself.
Comment 123 :aceman 2012-09-11 07:26:05 PDT
Andrew, I think you should split the patch and separate the code changes from the webpage changes. I don't think brendan will give you sr+ on a patch like this.
Please split it and request review from rkent or irving again.
But first email rkent (as he requested in comment 79) if he can look at it and actually has the needed authority to give you r+.
Request review from brendan only on the html page part.
Comment 124 Andrew 2012-09-11 07:51:15 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

- Commit the patch now. Mine, or Jim's are equivalent with respect to code of mail at least (comment #114).

- Post warnings now. I invited a peer reviewer for that.

Hiroyuki, and others, please do not post useless comments.
No duplicates, no distraction.
Comment 125 Robert Kaiser 2012-09-11 08:13:50 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

I cannot review anything here, not sure why you included me. If it was for the SeaMonkey website stuff, Jens Hatlak (InvisibleSmiley) is the person to contact there. And you'll need an actual reviewer for the actual code here (also not me, as I don't understand C++ code).
Comment 126 Mark Banner (:standard8, limited time in Dec) 2012-09-11 08:20:51 PDT
Hi everyone, thanks for the inputs so far. I am going to be moving this bug forward to getting a patch landed. I am, however, in travelling and engaged on other activities for the next few days, so I may not be as responsive as I would like to be.

Therefore, please allow me a few days to analyse this fully. We have plenty of time to get something in for the next releases, so I will try and ensure that happens.
Comment 127 Andrew 2012-09-11 08:30:54 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

(:aceman comment #123)
> I think you should split the patch

I already asked this many times: why split?
Nobody explained. Yet everyone asks me to ask questions.
I would split it, only if my patch does not compile, or does not work otherwise.
See comment #111 for example.

Approval of the patch would not mean the approval of the entire patch (all modules). Each approval would only mean its part. - The one, which someone reviews. Other parts would have to be approved by others.
There would be several approvals.

> rkent or irving again

Kent has already in part agreed with me.
See comment #79. He only failed to approve the patch (or its part).
I invite him again to complete his job.

> But first email rkent (as he requested in comment 79)

No. There he only asked to email him on matters relating to responsibility, not on technical matters.
I describe responsibility of Bugzillians in section "Dereliction of duties", in my attachment named "Description ...".

> Request review from brendan only on the html page part.

I invite him, because the patch spans several parts. Specifically as a super-reviewer. I also want an assessment of the mentioned responsibility of Bugzillians (meritocracy).

But at least the warnings should be on the web already a year ago.
Do it now.
Do bureaucracy later. Why not bypass it now?
Your web sites suggest that you may edit pages either way: via bug reports, and directly.
I only suggest what is in your rules already.

(Robert Comment #125)
You are listed here as a peer:
https://wiki.mozilla.org/Modules/SeaMonkey#Web_pages

You are also the SeaMonkey Council member.
So, you should be obliged to judge the mentioned meritocracy (bureaucracy, dereliction of duties).

Jens has already failed. I did invite him privately several times.
I also invited Justin privately. He ignored me too.
But I do invite Justin again, as a peer. Jens is marked as an owner. Peers come first.

I explained that in email to you, Robert. And not once. I wrote you back in 2011. You refused to warn users.
Comment 128 Kent James (:rkent) 2012-09-11 15:20:05 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

There has been much discussion of the plans for this bug. Standard8 (who outranks me in the hierarchy) has decided to take control of this, so I an deferring any reviews to him.
Comment 129 Justin Wood (:Callek) 2012-09-11 15:28:52 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

I'm not qualified for the mailnews code, for the seamonkey website change:

(a) it needs a different patch/bug
(b) I veto any warning on the front-page in the news section.
(c) I'd rather defer to Jens (InvisibleSmiley) on that review.
Comment 130 Gervase Markham [:gerv] 2012-09-12 07:56:07 PDT
I think it is highly, highly unlikely that we will be posting data loss warnings on any Mozilla website. We have a number of possible responses to a dataloss bug, up to and including a firedrill release. But a website warning, without an updated version available, does very little constructive.

Gerv
Comment 131 Andrew 2012-09-12 16:33:06 PDT
Such a warning already shows on SM for many releases for another defect (although not filed as a bug).

I suggest to post a warning on the first page, rather than on release notes of a certain version, because the bug has been there forever. It affects all releases, no just one, and still does.

Its function is very constructive. It allows a user to mitigate, or eliminate danger. This is obvious. How can you deny that?

Your argument that a warning must accompany a patched version is not valid. The purpose is to warn about existing danger, not about past danger.

You suggest that a user does not need to know. This is nonsense.
There's no justification for the delay in publishing the warning.

When you give a dangerous thing to someone, you must always warn.
If you keep danger a secret, this makes your conscience dirty, and may indicate bad motives.
I am not even speaking about sanity of a user in face of disaster.

> Firedrill release

To me this bug seems like a "firewall" (against action, and against bad publicity). Over a year nobody bothered. Not even now, when I showed the bad code.

Data loss bugs must be processed in reverse order:
patch first, later discuss and test endlessly.
That's what I would consider as a "Firedrill".
The code is quite obvious. Compile it immediately. I do not see any cause for bureaucracy, and delay.
Comment 132 Andrew 2012-09-17 18:12:34 PDT
Created attachment 661991 [details]
Description of the 2 bugs. (Formerly comment #23.)

Added: Human form of bug report ("Summary"). Seems like a human virus.
Extended list of culprits ("Who failed").
Tips on Recovery of mail.
Comment 133 Thomas D. (needinfo?me) 2012-09-18 00:32:42 PDT
(In reply to Gervase Markham [:gerv] from comment #130)
> I think it is highly, highly unlikely that we will be posting data loss
> warnings on any Mozilla website. We have a number of possible responses to a
> dataloss bug, up to and including a firedrill release. But a website
> warning, without an updated version available, does very little constructive.
> Gerv

Fwiw, regardless of the particulars of this bug, I find that statement rather worrying in general. What happened to Mozilla's attitude of transparency? I'd think that "Known issues" of major dataloss should be made public until they are fixed (unless doing so adds a security risk of exploitation), so that users are informed, can take precautions as necessary or opt for another product. It's obvious not all bugs can be fixed immediately, and we can't expect coders to be super-human and error-free. But we should put our users and the integrity of their data first. So imho, with or without an updated version, having a warning somewhere would do a lot constructive (I would not expect a banner on the front page, but a note under known issues). If publishing the warning would be too painful for Mozilla, it might be a good indication that the problem needs more attention along those firedrill lines.

So while Andrew's style of communication cannot be approved of at all, and is certainly a social problem which he needs to address, perhaps there's a point somewhere.

Notwithstanding, against Andrew's black & white blame games, of course there might be other real-world factors to consider (as we don't live in an ideal world...):

1) If Mozilla were to warn against all their software bugs while commercial competitors never file such warnings, it would create unjustified disadvantages of public perception for Mozilla.
2) If as a result of too much transparency about Mozilla bugs that only occur under certain conditions, Mozilla products lose market share, it means that proprietary closed-source commercial competitor products with a lot more security and dataloss bugs might gain market share.

The net result of that might not make the world better. And nobody knows about the real risks because it is closed-source...
Comment 134 Andrew 2012-09-18 05:25:08 PDT
Microsoft is full of bug reports. Some of them are exploited at the time of submission. Most - are not.
Microsoft is not afraid of making bugs public. They are not afraid of exploitation.
Most are certainly exploited afterwards. But then a user, who failed to update, is to blame.

Mozilla has far fewer bug reports. That's really worrying.
Worst is that Mozilla does not warn the public.

What's the point of the abstinence, if the bug report is already here anyway?
You achieve the opposite: anyone is free to exploit the bug.
But innocent users are denied protection.

Tom, and others. If you can, please help me make Mozilla change, and demote the culprits of this cover up, and of the bad code.
Instead, promote good people into decision making.

Please do not rant about my style. My reply to you about that is in attachment named "Description ...", section "Dereliction of duties".
Rude is not me, but the people who deliberately fail to act, and cover up.
They want fame without blame.
Do good. I won't be "rude".
Comment 135 Jim Mathies [:jimm] 2012-09-18 05:46:35 PDT
(In reply to Thomas D. from comment #133)
> Fwiw, regardless of the particulars of this bug, I find that statement
> rather worrying in general. What happened to Mozilla's attitude of
> transparency? I'd think that "Known issues" of major dataloss should be made
> public until they are fixed (unless doing so adds a security risk of
> exploitation), so that users are informed, can take precautions as necessary
> or opt for another product. 

This bug is 100% open to the public, and there are various sources that link back to it. I think what you are really questioning here is our communication stance vs. our transparency. As far as communication of issues goes, we weigh our response based on the severity of the issue. We do this a lot. It's not an easy job, sometimes we don't make the right call. I do however feel that Mozilla's record of responding to and communicating issues is very good and can stand on it's own under scrutiny.

Which brings me to an important point, the core issue with compaction data loss is better described by bug 498814. If you wrap everything in this bug together (data loss in TB via compaction + nsILocalFileWin's incompatibility with 2K + running TB on 2K) technically this bug is invalid and should be closed as such.

We do have a corner-case data loss issue that can be triggered through unique circumstances which we need to fix. But IMHO it's not an issue we need to stand on the top of a mountain and scream "fire" over. This bug, it's dependents, and various support related pages and sites that detail the issue are currently acting as our notice to the users of Thunderbird.
Comment 136 Jim Mathies [:jimm] 2012-09-18 06:03:41 PDT
Also just to clarify the status here in case people are curious - Standard8 took this bug comment in 126. He will sift through the various fixes posted on various bugs to figure out which fix is appropriate. We'll then land it before the next train leaves, which is on 2012-10-08. Assuming there are no issues we'll likely nominate and land the same fix on release branches if it's deemed appropriate by drivers, which will get the fix out to users sooner.
Comment 137 Justin Wood (:Callek) 2012-09-20 16:13:18 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

clearing review from jens, as I said elsewhere the SeaMonkey Website change needs *its own bug/patch*
Comment 138 Andrew 2012-09-21 22:01:57 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

(Justin comment #129)
> I veto any warning on the front-page in the news section.

I veto your veto.
You have already failed users.
A warning must be out there. You failed to post it at least somewhere.
It does not matter why, and how.

You also failed to read this bug report. Here I explain why it must be one.
You never explained why split.
You never provide any arguments.
Learn to argue. So far you only learnt to do nothing, and to fail.

The rest of explanation of my veto to you is in attachment, named "Description", section "Who failed".

You keep up obstruction of publication of the warning.
You veto it.
You constantly remove reviewers.

This time I did not ask for your review. You have already failed.
I asked for Jens. So, I rightfully restore the invitation for him. I give him one more chance to fail. He has already failed privately. Let him fail publicly too.

A few more arguments in favour of a warning I give below in my reply to Jim.

(Jim comment #135)
You wrote a pack of lies and demagoguery.
You try to make an impression as if there are a lot of warnings on every site.
No. The main sites of TB, and SM are free of the warning.

You attempt to portray this bug as a minor issue, not worthy the attention of users.
But the site of SM contradicts you. As I wrote in comment #131, it does show warnings for far lesser bugs, and for one bug of data loss.
You never argued against those.
You only are very insistent against a warning about this particular bug.

The only person, who posted any warning to users was me. That was on some support site, which is linked at the top of this bug report (as "URL").
Nobody of Mozilla, and Bugzilla here ever cared.

They did the opposite: guys of Mozilla keep deleting my messages there. They keep distracting users by calling my reports "inaccurate", "misleading". They try to discredit me. They urge users not to read this bug report.

So, it is not "open" to all. Nothing like what you suggest.

Quite a few guys, who participate, or just watch this bug report here (Wayne, Robert T., cameleon), also watch reports of incidents there.
But none of them have warned any user. They silently let users suffer.

Do you suggest that posting a warning there is a good thing?
Then they have failed to do a good thing.

Or do you suggest that a warning there, or anywhere is not necessary?
Then you yourself are not necessary to Mozilla.

Both of you, and all the rest keep up the cover up of this bug.
You split it, and divert attention.
You try to present another bug 498814 report as a "better explanation" of this bug. Although that bug is completely different.
I do not know if you do realize the difference. Maybe you did not even read that bug report. To you it does not matter, as long as it helps to conceal this bug.

You will keep pouring such useless comments here, making this bug report longer and longer.
Then you will complain that it is long. So, you may justify diversion of readers to mockups.

The rest of my response to you will be in the same attachment, named "Description", section "Who failed".
The section "Dereliction of duties" already contains some insights into bad ways of Mozilla, including you.
Comment 139 Mark Banner (:standard8, limited time in Dec) 2012-09-26 08:27:43 PDT
Created attachment 664987 [details] [diff] [review]
Fix compaction of folders

This is an extended version of the patch, it adds in further checks and hardening so that if anything fails, we do our absolute best to put everything back as it was, or leave it untouched.

This patch does not:

- Change SeaMonkey's website. As has been mentioned many times already, a separate bug is required for that, as is standard process in Mozilla. It is not going to be covered by this bug.

- Change xpcom/io. This was a separate, but related issue, that has been wontfixed as it affects Windows 2k which we do not run on (we also have blocks to stop it starting up). Again, any other changes to xpcom will not be covered by this bug, and require a separate bug under the Core product in an appropriate XPCOM component.
Comment 140 Mark Banner (:standard8, limited time in Dec) 2012-09-26 08:29:00 PDT
*** Bug 789754 has been marked as a duplicate of this bug. ***
Comment 141 Kent James (:rkent) 2012-09-27 07:53:00 PDT
Comment on attachment 664987 [details] [diff] [review]
Fix compaction of folders

Review of attachment 664987 [details] [diff] [review]:
-----------------------------------------------------------------

I'm afraid that I see one little tweak that this code needs. r=me with this one little tweak.

::: mailnews/base/src/nsMsgFolderCompactor.cpp
@@ +437,5 @@
> +    // We don't delete it yet, as we want to keep the files in sync.
> +    nsCOMPtr<nsIFile> tempSummaryFile;
> +    rv = oldSummaryFile->Clone(getter_AddRefs(tempSummaryFile));
> +    if (NS_SUCCEEDED(rv))
> +      rv = tempSummaryFile->CreateUnique(nsIFile::NORMAL_FILE_TYPE, 0600);

If this fails, then tempSummaryFile will point to the actual db name, and then we will possibly delete that later during cleanup. Prevent this by adding after this line:

else tempSummaryFile = nullptr;
Comment 142 Andrew 2012-09-27 09:18:23 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

My patch is much simpler, better, and much more efficient.

It leaves the compact copy for the user to work with.
On failure the user would have two folders:
"original" - not compact.
"original-1" - compact.
A user then may delete any by hand, or keep both.

Instead Mark wants to delete the compact copy, so a user is stuck if failure persists. A user would not be able to produce a compact folder in any way. This is wrong.

The "ui‑review" is necessary for this user experience.
When Mark removed my patch, he also removed this reviewer. Users do not matter to him.

The patch by Mark tries to back the file up before overwriting. There's no need for that, because the OS does so quite well. If the OS, or the hardware is defective for some reason, then any such operation would be defective anyway. More manipulation might destroy even more data.

My patch also contains 2 more patches:
- File handling. Reverts the unnecessary 2 previous patches of the OS - it is not just a fix for W2K. You may revert (discard) patches for bug 199473, and bug 545650 for any version of Windows. Nobody proved they are necessary.
- The warning on web sites. You do not want to post it, but still must.

Nobody answered why split my patch. Does its compilation fail?

I asked for a review by Mark. But I did not ask him to remove my patch instead, or to interfere with other requests for reviews.
He failed to provide a review. I will replace him with another reviewer later.

So, I restore the patch (destruction of mail, renaming of file, warning on web sites).
And I restore the reviewers.
Comment 143 Jim Mathies [:jimm] 2012-09-27 09:26:38 PDT
(In reply to Andrew from comment #142)
> Comment on attachment 659504 [details] [diff] [review]
> - File handling. Reverts the unnecessary 2 previous patches of the OS - it
> is not just a fix for W2K. You may revert (discard) patches for bug 199473,
> and bug 545650 for any version of Windows. Nobody proved they are necessary.

This is simply not true, both patches were tested and deemed necessary. If you would like to discuss further please file a follow up bug cc myself, or email me directly. I'd be happy to discuss the reasoning, sample code, etc. But I would like to request we avoid this bug, which has grown too far out of scope.
Comment 144 Jens Hatlak (:InvisibleSmiley) 2012-09-27 12:02:20 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

You have been told this already, but I'll tell you one final time: We *require* separate bugs for website changes. This is non-negotiable, so please stop requesting review from me on this bug and/or this patch. Thanks.
Comment 145 Mark Banner (:standard8, limited time in Dec) 2012-09-27 13:10:30 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Andrew, 

> Instead Mark wants to delete the compact copy, so a user is stuck if failure
> persists. A user would not be able to produce a compact folder in any way.
> This is wrong.

It is not acceptable to leave copies of files in a user's profile unless we really cannot avoid it. Users are not necessarily going to detect this failure, if it does occur, and they won't know what to do with the files either.

The biggest, and most likely risk here, is that the user will then end up with a disk full of attempted compacts, and they won't know what to do with all the files.

As we cannot complete the compact, by far the best thing is to leave things in the state prior to attempting the compact.

> The patch by Mark tries to back the file up before overwriting. There's no
> need for that, because the OS does so quite well. If the OS, or the hardware
> is defective for some reason, then any such operation would be defective
> anyway. More manipulation might destroy even more data.

I intentionally went for the safer side. As you have pointed out, this is a situation that can cause dataloss, so therefore I think being safer is the best thing. Even if the OS has its own checks, something in-between or in our code could fail, so we should handle that acceptably as well.

> Nobody answered why split my patch. Does its compilation fail?

Please read comment 139. I have explained it there already.

> I asked for a review by Mark. But I did not ask him to remove my patch
> instead, or to interfere with other requests for reviews.
> He failed to provide a review. I will replace him with another reviewer
> later.

I'm sorry I obsoleted your patch, that's a practice we have for some things, but wasn't appropriate here. I will mark it as reviewed denied, for the reasons stated above - it is not acceptable to leave unused/untracked data in a user's profile.
Comment 146 Mark Banner (:standard8, limited time in Dec) 2012-10-01 13:37:24 PDT
Checked in:

https://hg.mozilla.org/comm-central/rev/7d385acf54e4
Comment 147 :aceman 2012-10-07 07:23:05 PDT
A reason for reopening would be in order.
Comment 148 rsx11m 2012-10-07 07:59:29 PDT
Andrew, if the only reason for reopening this bug is that you don't agree with the patch that was eventually committed, that's not an acceptable reason. If you have any indication in a current nightly build that the patch does not work as intended and described in comment #139, please state so with steps to reproduce.
Comment 149 Andrew 2012-10-07 14:11:02 PDT
As I instructed you many times:
- Keep spam out. 
- Look at code.
- Follow your own rules. Don't force them on others.
- Read the bug report.

This is not negotiable.
I wish you followed those instructions with the same vigour as you follow, and attack any of my actions.
Why is all the work on me? And all the fun with span - on you.

For the reasons above you cannot commit this patch.
One more reason: it introduces a few more bugs. Some are critical.
- Gross incompetence on all sides.

Later I may submit my explanation.
Or do it by yourself, if you are really capable of anything else but spam.
By that time I suggest you commit my patch, or any part of it urgently.
Argue later.
Comment 150 Andrew 2012-10-07 16:06:08 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Every reviewer has failed to provide a review.
Some have rudely and audaciously removed invitations for others.
I invite new reviewers.

Philip replaces lazy Jens, and Justin for web (warnings, and workarounds).
MN I invite for file handling.
AS and Neil replace the lazy Kent, Jim, and Mark for mail.
BW - will evaluate user experience about "original-1" (very clear to users on failure) versus "nstmp" (strange name).

Anyone else is welcome to raise the issue of responsibility of culprits, and lack of competence. These guys, and some more, like Roland Tanglao, Wayne, cameleon, Matt.
Roland is particularly vicious: deleted over 20 warnings to users. In an earlier comment #135 I called him Robert T. by mistake.
Investigate:
- this entire bug report;
- their malicious activity on the support site;
- and my attachment named "Description ...", section "Dereliction of duties".

This will be a task also for a super-reviewer, in additional to a review of all the parts of the patch. He would review this "meritocracy" (bureaucracy, mediocracy, betrayal of users).

I do not restrict reviews to parts.
Everyone may review the whole patch, or any of its 4 parts (mail, file, web, experience).
I describe the purpose of reviews in comments: #119, #127, and others. Read the whole bug report.
Comment 151 neil@parkwaycc.co.uk 2012-10-07 17:16:44 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

As this is your first patch, I did at least read through it. However in future I would appreciate patches that actually apply and compile. For instance, a patch should only apply to a single version control repository.

>-    bool summaryFileExists;
>-    // remove the old folder and database
>-    rv = summaryFile->Remove(false);
>-    summaryFile->Exists(&summaryFileExists);
>-    if (NS_SUCCEEDED(rv) && !summaryFileExists)
>-    {
>-      bool folderPathExists;
>-      rv = folderPath->Remove(false);
>-      folderPath->Exists(&folderPathExists);
>-      if (NS_SUCCEEDED(rv) && !folderPathExists)
>-      {
Certainly removing the old files seems unnecessary because MoveToNative already does that. I'm not 100% convinced about leaving behind the temporary files though.

>  m_file->InitWithFile(path);
...
>+  nsCString leafName;
>+  path->GetNativeLeafName(leafName);
>+  m_file->SetNativeLeafName(leafName);
No need to do this because InitWithFile(path) already copies the leaf name.

>diff --git a/mozilla/xpcom/io/nsLocalFileWin.cpp b/mozilla/xpcom/io/nsLocalFileWin.cpp
Changes outside of mailnews are outside the scope of this bug. If you feel that there is a deficiency in the nsLocalFileWin code then you should file a separate bug and request review from a suitable peer.

>Index: seamonkeyproject-org/src/index.en.html
(I didn't look at this because I can't help you with website updates.)
Comment 152 Justin Wood (:Callek) 2012-10-07 17:32:44 PDT
Comment on attachment 659504 [details] [diff] [review]
Patch destructive compaction, and file handling. Warnings on web sites.

Andrew, you have been told many times here the reasons why we do things the way we do, you have been told many times not to reopen this bug, and that you need[ed] seperate patches.

Please do not reopen this bug without an indication that the patch as-landed, doesn't do what it claims to do.

Please do not re-request review from *anyone* on this attachment.

Please do not treat us as not caring about our users.

Please DO read https://bugzilla.mozilla.org/page.cgi?id=etiquette.html.

Failure to follow the above statements could result in me getting you banned from Bugilla. -- Thank you.
Comment 153 Andrew 2012-10-08 03:41:43 PDT
Created attachment 669090 [details]
Description of the 2 bugs. (Formerly comment #23.)

(Justin comment #152)
> Please do not reopen this bug without an indication that the patch
> as-landed, doesn't do what it claims to do.

The patch by Mark is one more bug.
If you are incompetent in making of code, why are you writing here?
If you are so confident in that patch, then you must be prepared to resign from Mozilla, when I prove you wrong. But I am sure you do not have courage for that.

You have been told many times to look at code.
Instead you keep looking at me.

> do not re-request review from *anyone* on this attachment.

What rule prohibits this?
Rules encourage requests for reviews. They do not allow you to take away an invitation for others.

> do not treat us as not caring about our users.

Of course you want to portray me as a lone fighter.
But many people have already expressed their disgust at Mozilla over this matter. They did so both here, and on the support site.
If you really cared, you would reply to at least one of them here in this bug report. His name is Thomas.
Please DO read from him:
comment #133

Failure to follow the above statements could result in you, Justin, getting banned.

Now to the same effect read also my comment #150 again.
Thank you for nothing.
Comment 154 Gervase Markham [:gerv] 2012-10-08 06:59:16 PDT
I don't know how close others are to the ends of their tethers, but I've reached mine. Andrew's account has been disabled.

Gerv
Comment 155 Mark Banner (:standard8, limited time in Dec) 2012-10-10 04:25:05 PDT
Comment on attachment 664987 [details] [diff] [review]
Fix compaction of folders

[Triage Comment]
I discussed this with Kent soon after it landed, and we agreed we wanted it for TB 17 after a bit of baking. It has been baking for a week or so now, so taking into 17 ready for the first beta.
Comment 156 Mark Banner (:standard8, limited time in Dec) 2012-10-10 04:31:18 PDT
https://hg.mozilla.org/releases/comm-beta/rev/f521dad4966d
Comment 157 Wayne Mery (:wsmwk, NI for questions) 2013-11-13 14:58:54 PST
*** Bug 777473 has been marked as a duplicate of this bug. ***
Comment 158 WADA 2015-03-16 23:25:54 PDT
*** Bug 625953 has been marked as a duplicate of this bug. ***

Note You need to log in before you can comment on or make changes to this bug.