Closed Bug 472446 Opened 15 years ago Closed 15 years ago

Unrecoverable corruption if "Rebuild Index" function used on unfocused folder with corrupt .msf file

Categories

(MailNews Core :: Database, defect)

x86
Windows XP
defect
Not set
critical

Tracking

(Not tracked)

VERIFIED FIXED
Thunderbird 3.0b2

People

(Reporter: fehe, Assigned: rkent)

References

Details

(Keywords: dataloss, qawanted)

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9) Gecko/2008052906 Firefox/3.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b3pre) Gecko/20090106 Lightning/1.0pre

There's a very nasty bug with the "Rebuild Index" function.  This might have been introduced by the fix for Bug 471307.  Not sure and can't check yet.

[WARNING: BACKUP YOUR PROFILE FIRST]

If I select a folder and then right-click and choose "Rebuild Index" on a different folder (one having a corrupt index file), it looks like the operation succeeded; however, when I then switch to the folder I just performed the
rebuild operation on, there are no messages listed.  Everything for that folder is blank.  Right-clicking the folder and choosing Properties..., shows "Default Character Encoding" has been reset to "Arabic (IBM-864)" and the
following appears in Error Console:

Error: Component returned failure code: 0x80550005 [nsIMsgFolder.charset] =
<unknown>
Source file: chrome://messenger/content/folderProps.js
Line: 237
 ----------
Error: Component returned failure code: 0x8000ffff (NS_ERROR_UNEXPECTED)
[nsIMsgDBView.hdrForFirstSelectedMessage]
Source file: chrome://messenger/content/mailWindowOverlay.js
Line: 1651

If I open the messages file, I see that my messages are still there, but
neither rerunning "Rebuild Index" nor deleting the .msf file and restarting Shredder will allow the messages to be displayed.  It's irrecoverably messed up.


Reproducible: Always

Steps to Reproduce:
1. [WARNING: BACKUP YOUR PROFILE FIRST]
2. Locate a folder with a corrupted .msf index file.  Sending email can generate one for you.  See Bug 471682.
3. Select a folder other than the one with the corrupt .msf file
4. Right-click the now *non-selected* folder with the corrupt .msf file and choose "Properties... --> General Information --> Rebuild Index".  Click OK.
5. Select the folder that had the corrupt .msf file.
6. Notice that everything is blank now.
7. Notice the error messages in Error Console
8. At this point neither running "Rebuild Index" again nor deleting the corrupt .msf file and restarting will bring the messages back into view, even though the messages themselves still exist.
Blocks: 437886, 471307
Component: General → Database
Flags: blocking-thunderbird3?
Keywords: mail4
Product: Thunderbird → MailNews Core
QA Contact: general → database
QA Contact: database → general
QA Contact: general → database
Again, opening folder and having missing mail is fixed in Bug 471130

Also, this is similar to what is documented in some of my other bugs; I will try to fix it here.

The problem is that this js is calling getMsgDatabase(msgWindow), which returns a null db on error.  It needs to get to updateFolder to work.  The js just needs to do a try and test like commandglue.js does.

It would be nice if someone could grep all those getMsgDatabase(msgWindow) calls and make sure they play nice.
This is in widgetglue.js in tb2 and has the same problem. But it has a fix per my ref bug.
@Phil:  Are you absolutely sure Bug 471130 will fix this?  Why does deleting the .msf file no longer work once this occurs?  Shredder is creating a new .msf file but it's not listing the messages.  Where is the information stored?  Bug 471130 seems to deal with the case where the .msf file exists and is corrupt but Shredder does not recreate it. 

Another thing I've just noticed: While this bug is in effect, if Shredder processes new content in the affected folder (e.g. you have a "Sent" folder that should display hundreds of email but is presently completely blank, and then you send an email), all other messages (existing but invisible) are wiped and the messages file now contains nothing but the newly processed item(s).  This is a serious DATA LOSS issue.

If you're sure, let me know and I'll dupe to Bug 471130
Keywords: dataloss
Version: unspecified → Trunk
That bug fixes a summary file that is missing or out of date when you left-click the folder to open it.

Right-clicking an unopened folder and rebuilding bypasses the fix but has the same problem, so I will investigate and try to fix the js. This is not a dupe yet but can be helped partially by 471130.

You may want to try again and be 100% sure that when you instigate this problem, *close* shredder, delete the *.msf and restart and left click to open folder.  I didn't think that was an issue but I may be wrong.

Also you can't dupe yet because sent folder is a special folder and may have other issues that need looking.  Hopefully with all these pings, they will check in my 471130 so you can see what it does here.
(In reply to comment #3)
> You may want to try again and be 100% sure that when you instigate this
> problem, *close* shredder, delete the *.msf and restart and left click to open
> folder.  I didn't think that was an issue but I may be wrong.

As detailed, that does not work in this case.  The .msf file is recreated but no longer works with the message file.  Everything remains blank.
 
> Also you can't dupe yet because sent folder is a special folder and may have
> other issues that need looking.  Hopefully with all these pings, they will
> check in my 471130 so you can see what it does here.

This bug affects all folders.  It's happened already on my "Sent" and "Trash" folders.
ok, still don't dupe because the other bug doesn't fix the index rebuild.  We'll do it here.
checks for null db like commandglue.js does. Enables it to get to updatefolder() to do a rebuild.
Attachment #355890 - Flags: review?
Comment on attachment 355890 [details] [diff] [review]
fixes rebuild of bad summary file

I'm not sure about this at all. I guess I'm going to have to dig into this...
This fixes the rebuild index. but we still got problems.
Doesn't fix the default char set thing.

I imported my OExpress files which do not have good .msf files (another problem
with threads)

I posted on mozilla.dev.tb and my sent file went from 623 total files (correct)
to 621 and it does not show the ng sent message.  The actual db file has the
message so the .msf is not fixing it.

I think my patch should be committed and this bug cloned to fix more
problems, or some way found to get to the .msf problem.
This bug seems to keep coming up with multiple problems of the .msf.
@Phil:  Were you able to reproduce my steps in Comment #0 or is more info needed?

By the way, the default character set bug is pre bug 437886 but still would be nice if it could get fixed, as it happens occasionally.  Something obviously gets reset for some reason.
David,

There's some overlap with 471130.  There are bad .msf files not being fixed.  Can we do that here and let 471130 do the marshalling code for bad msf files?

This bug should depend on 471130 if that is the case
comment 7
David, sorry for confusing things. I'm patching this with my 471130 patch in place, so I'm not helping here until 471130 gets solved.
You can forget about my patch here until we get 471130 fixed. But there's still some corruption of .msf files that needs attention here, and I haven't a clue where those are built. I see maybe the parse code.  If you can direct me to the code to look at for msf building, I will work on this.
Iu: comment 9
No
Sorry, but my patches fix this particular problem and I can't reproduce your bug.  I can't reproduce any of my problems either at this time. I rebuilt my profile, and going back and forth with TB2 and TB3 they seem to be ok with the sent folder.  I really suspect I was looking at the 'sent items' folder that gets created from oexpress mail import with local folder in place.

I did an import without the local folders and 'sent items' converts to 'sent' special tb folder. And the problem seems to have gone.  So my reports here may have been operator error.
Again I'm speaking with my own patches installed which are not on trunk.
I was able to reproduce reporter's steps 1-7 (didn't try step 8).
I was able to reproduce this again with Phil's help. I got it to happen once by replacing Sent.msf with a non-mork file to REALLY corrupt it. Then I did a rebuild of Sent using a right click while a different folder was selected, being careful not to open the corrupt .msf file first by being quick so that the folder tooltip did not have time to fire.

When I did, I ended up with a file where I could not recover the Sent folder even by rebuilding. The reason for this is that all of the messages in the file now have a X-Mozilla-Status of 0009, meaning they were marked deleted.

I'll look at this again tomorrow.
Status: UNCONFIRMED → NEW
Ever confirmed: true
The reason that X-Mozilla-Status is being changed to 0009 is that the message retention settings are messed up when the folder properties dialog is loaded. You can see that in the Retention Policy tab of the Folder Properties dialog. When the reindexed folder is opened again, it sees "Always delete read messages" as set, and the retention code deletes all of the messages in the Sent folder (since they are marked as Read). Tracing through the code, I can see X-Mozilla-Status being set to 0009 in the message retention code.
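
For reference, the 0009 value can be read as a bitmask. The following is a minimal sketch in Python, assuming the low-order flag values from Thunderbird's message flag definitions (Read = 0x0001, Replied = 0x0002, Marked = 0x0004, Expunged = 0x0008); the helper name decode_status is made up for illustration and is not part of any Thunderbird code.

```python
# Low-order X-Mozilla-Status bits (assumed from Thunderbird's message flag definitions).
FLAGS = {
    0x0001: "Read",
    0x0002: "Replied",
    0x0004: "Marked",
    0x0008: "Expunged",
}


def decode_status(hex_text):
    """Return the names of the known flags set in a 4-digit hex status value."""
    value = int(hex_text, 16)
    return [name for bit, name in FLAGS.items() if value & bit]


print(decode_status("0009"))  # status written by the retention code
print(decode_status("0001"))  # status of an ordinary read message
```

Under those assumptions, 0009 decodes to Read plus Expunged, while 0001 is Read alone, which matches the observation that the messages were marked deleted rather than removed from the file.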

I ran an older version of TB (circa 2007-12-3) and had the following: the folderProps.js error message mentioned in Comment 0 is still present, the retention settings in the folder properties dialog menu are still messed up, but the corrupt .msf file for the folder is deleted when the folderProps dialog is opened, and so the reindexed file is OK. (The reindexing failure only occurs when a corrupt file is present during reindexing, and does not happen when there is no .msf file present).

I also ran a patch that I have done, where I attempted to blindly restore behaviour of the DB opening routines to pre bug 437886 behaviour (as agreed with Bienvenu on IRC). With that patch, everything worked fine - there was no folderProps.js error, and the reindexing worked fine.

Unfortunately I largely failed in my patch to restore behaviour to pre bug 437886, as there was too much obscure and strange behaviour that I could not resist "fixing". I'm encouraged that my "fixed" approach helped in this bug, even when I did not know at the time I did the patch of the reason for this bug, but I need to think some more before I want to promote my patch as the right solution.

I'll post that patch, which should be considered preliminary, as reference. I also want to trace out why, exactly, current trunk fails with the reindex, but old trunk does not. Why does the old trunk delete the corrupt .msf file, but the new one leaves it? And why does the old trunk successfully reindex even when the retention settings are incorrect, while the new one does not?
In this patch, I convert openDBFolder to use a database status return code rather than error codes. This then allows the OUT_OF_DATE db to be returned in js (which was not previously possible).

There are more issues as well which I will discuss if I decide to promote this patch.
Assignee: nobody → kent
Status: NEW → ASSIGNED
Target Milestone: --- → Thunderbird 3.0b2
I tested this with TB 3.0 beta 1, and I did not get the irrecoverable loss of the data. In beta1, there is an additional error return that occurs when you try to rebuild the database, and because of that error the rebuild fails, as well as the OK button no longer works on the folderProps dialog - and it is during the response to OK that the bad retention property is saved that causes the message deletions.

Why is trunk different than beta1? In trunk openFolderDB we have:

  // Don't try to create the database yet--let the createNewDB call do that.
  rv = msgDB->Open(folderPath, PR_FALSE, aLeaveInvalidDB);
  if (NS_FAILED(rv) && rv != NS_MSG_ERROR_FOLDER_SUMMARY_OUT_OF_DATE)
    return rv;

That "if" fails, and the code falls through to save the value of the database in the folder. In beta1, the equivalent check succeeds, and the database is not saved. That is the start of the sequence of events that allows the messages to be deleted on trunk, but not on beta1.
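
The difference can be illustrated with a toy model. This is a Python sketch, not Thunderbird code: the status names mirror the C++ constants quoted above, while open_folder_db and the tolerate_out_of_date switch (standing in for "trunk" versus "beta1" behaviour) are assumptions made only for illustration.

```python
from enum import Enum


class Status(Enum):
    OK = "NS_OK"
    OUT_OF_DATE = "NS_MSG_ERROR_FOLDER_SUMMARY_OUT_OF_DATE"
    OTHER_FAILURE = "some other failure"


def open_folder_db(open_status, tolerate_out_of_date):
    """Model the early-return check; return (status, db_saved_in_folder)."""
    failed = open_status is not Status.OK
    tolerated = tolerate_out_of_date and open_status is Status.OUT_OF_DATE
    if failed and not tolerated:
        return open_status, False  # early return: database is not saved in the folder
    return open_status, True       # fall through: database is saved in the folder


# Hypothetical "trunk" behaviour: an out-of-date summary falls through.
trunk = open_folder_db(Status.OUT_OF_DATE, tolerate_out_of_date=True)
# Hypothetical "beta1" behaviour: the same status returns early.
beta1 = open_folder_db(Status.OUT_OF_DATE, tolerate_out_of_date=False)
print(trunk, beta1)
```

In this model the only difference between the two cases is whether the out-of-date status is let through, which is enough to decide whether the invalid database ends up cached in the folder.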

So I could restore the old behaviour I guess - but that means making the reindex fail, and because it fails the effect of another bug is not seen. This is not pretty ...

I have a larger plan to try to prevent the database from blowing away metadata that is not stored in the message. I want to make sure that any major changes that I recommend are compatible with that requirement, and I'm not far enough along in that to know if this current patch is or is not compatible. Rather than do some big changes now followed by big changes a few weeks later, I think it would be better to try to just address the immediate symptoms of the current regressions along the lines of what Phil proposed. I'll look at his suggestions, and perhaps make my own as well.
nsMsgLocalMailFolder::UpdateFolder
That needs review. I may be missing something but I see this not reparsing the database if it is invalid or out-of-date.
Unless of course like TB2 you don't send it a mDatabase if it is out-of-date.
I updated my debug trunk build on Jan 14, and the behaviour is now different compared to my previous builds, which were based on a Jan 8 version. When I get properties of a folder after corrupting the .msf file, I still see the problem of the messed up retention settings - but when I reindex the folder, it is not deleting the email in it. I'm not sure why that is, and I'm not going to pursue it. I'll be testing a patch that simply tries to open the database at the beginning of the call to folderProps.js, and does an updateFolder if that fails. The point of this, Phil, is actually not to reparse the database, but to provide a working database so that all of the other folder properties that are part of folderProps.js are not messed up. But if the data loss is now gone, then this is a much less important bug, as the situation to cause it is fairly unusual.
In attachment 355890 [details] [diff] [review] I would get a hang at that point because there was no database; I believe it was when I clicked the index button.  Even if a database is always expected, should we still have a safety try at that spot before we start setting properties?
(In reply to comment #19)
> But if the data loss is now gone,
> then this is a much less important bug, as the situation to cause it is fairly
> unusual.

I hope you're right about that, because someone just reported completely losing emails.  Start reading from: http://forums.mozillazine.org/viewtopic.php?p=5493445#p5493445
I posted a response to that post.

I'll try to get my simple patch posted tonight.
To see what this does, you need to corrupt the database file, and make sure that you do not open the folder after you corrupt the file. You can corrupt the file by, for example, changing the version from 1.4 to 1.5. Right click the folder, and select properties. Before the patch, the retention settings should be incorrect. After, they should be back at defaults. If you reindex with the corrupt retention settings (Jan 8 or prior build at least for me) then you will see the original symptoms.
Attachment #356444 - Attachment is obsolete: true
Attachment #357315 - Flags: superreview?(bienvenu)
Attachment #357315 - Flags: review?(bienvenu)
(In reply to comment #22)
> I posted a response to that post.
> 
> I'll try to get my simple patch posted tonight.

I read your response to him, but I do not believe his messages are recoverable.  What he is experiencing is similar to what I reported in Comment #2.  Specifically, after "Rebuild Index" messes up the mbox file, the messages are recoverable only as long as no new messages are processed.  Once new messages are processed, the mbox file is completely wiped, a new .msf file is successfully created and the new messages replace whatever was there before.

Try it for yourself and see:

1. Backup your inbox then corrupt its .msf file.
2. With a different folder selected, run "Rebuild Index" on the inbox, to achieve the state where your inbox file contents are now marked for deletion.
3. Send an email to the account having the inbox in Step 1.
4. Check for new mail for that account.
5. Notice that the new mail is successfully received.  Your message is viewable.
6. Now check the actual inbox file.  Notice that all previous messages are gone.
I can confirm the "Default Character Encoding" resetting to "Arabic (IBM-864)" when trying to rebuild indexes (https://bugzilla.mozilla.org/show_bug.cgi?id=472446#c0). 

Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1b3pre) Gecko/20090116 Shredder/3.0b2pre
(In reply to comment #25)
> I can confirm the "Default Character Encoding" resetting to "Arabic (IBM-864)"
> when trying to rebuild indexes
> (https://bugzilla.mozilla.org/show_bug.cgi?id=472446#c0). 
> 
> Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9.1b3pre) Gecko/20090116
> Shredder/3.0b2pre

That happens when the folder database does not exist when opening the folder properties. The proposed patch is supposed to reload the database if it fails on entering the folder properties dialog.
It has been suggested that Bug 474091 is related to this. I'm not sure about that, since I was seeing a different symptom of the corruption - i.e. I could see the headers of the messages, but not the contents.

- Mitra
I don't know if this is relevant, but the Apple Crash reports for the crash I see show a failure in _CFBundleGetLanguageSearchList + 18

I'm not clear what that is - but notice the bug here is suggested as being a Character Encoding problem  - which suggests a relevance (or it could just be coincidence).
(In reply to comment #27)
> It has been suggested that Bug 474091 is related to this, I'm not sure about
> that since I was seeing a different symptom of the corruption - i.e. I could
> see the headers of the messages, but not the contents. 
> 

That would be a dupe of bug 471682.  Though the bug is most noticeable after sending email, other folders do also get affected.
Ok - that would explain the inability to read messages - and if I can get back into TB I can test by deleting the .msf files for corrupt folders. 

I'm still stuck now, in that neither unzipping my zipped Profile nor restoring from TimeMachine produces a Profile that TB can even open.
Flags: blocking-thunderbird3? → blocking-thunderbird3+
(In reply to comment #15)
> The reason that X-Mozilla-Status is being changed to 0009, is that the message
> retention settings are messed up when the folder properties dialog is loaded.
> You can see that in the Retention Policy tab of the Folder Properties dialog.
> When the reindexed folder is opened again, it sees "Always delete read
> messages" as set, and the retention code deletes all of the messages in the
> Sent folder (since they are marked as Read). Tracing through the code, I can see
> X-Mozilla-Status being set to 0009 in the message retention code.
> 
> I ran an older version of TB (circa 2007-12-3) and had the following: the
> folderProps.js error message mentioned in Comment 0 is still present, the
> retention settings in the folder properties dialog menu are still messed up,
> but the corrupt .msf file for the folder is deleted when the folderProps dialog
> is opened, and so the reindexed file is OK. (The reindexing failure only occurs
> when a corrupt file is present during reindexing, and does not happen when
> there is no .msf file present).
> 
> I also ran a patch that I have done, where I attempted to blindly restore
> behaviour of the DB opening routines to pre bug 437886 behaviour (as agreed
> with Bienvenu on IRC). With that patch, everything worked fine - there was no
> folderProps.js error, and the reindexing worked fine.

Does this mean there is a chance of recovering lost inboxes? Is this something people should wait for?

Thank you.
(In reply to comment #31)
> Does this mean there is a chance of recovering lost inboxes? Is this something
> people should wait for?
> 
> Thank you.

Only if you have not received new email since the error occurred.  If you receive new email, your inbox file will be completely overwritten with the new email.  Therefore, you have to immediately take Thunderbird offline and manually fix the inbox file (search/replacing X-Mozilla-Status from 0009 back to 0001).
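
As a rough illustration of that search/replace, here is a minimal Python sketch. It assumes the status header is written as "X-Mozilla-Status: " followed by four hex digits and that the deleted state corresponds to the 0x0008 bit (so 0009 becomes 0001); the function name clear_expunged and the sample data are made up, and this is not an official repair tool. It should only ever be run on a copy of the mbox file, with Thunderbird closed.

```python
import re

EXPUNGED = 0x0008  # assumed value of the "deleted" bit in X-Mozilla-Status

# Matches the 4-digit status header only, not X-Mozilla-Status2.
STATUS_LINE = re.compile(
    rb"^(X-Mozilla-Status: )([0-9A-Fa-f]{4})(?=\r?$)", re.MULTILINE
)


def clear_expunged(mbox_bytes):
    """Return (new_bytes, count) with the Expunged bit cleared in status headers."""
    count = 0

    def fix(match):
        nonlocal count
        value = int(match.group(2), 16)
        if not value & EXPUNGED:
            return match.group(0)  # leave untouched headers byte-for-byte identical
        count += 1
        return match.group(1) + b"%04x" % (value & ~EXPUNGED)

    new_bytes = STATUS_LINE.sub(fix, mbox_bytes)
    return new_bytes, count


# Tiny made-up mbox fragment for demonstration.
sample = (
    b"From - Mon Jan 05 10:00:00 2009\n"
    b"X-Mozilla-Status: 0009\n"
    b"X-Mozilla-Status2: 00000000\n"
    b"Subject: example\n"
    b"\n"
    b"body\n"
)
fixed, changed = clear_expunged(sample)
print(changed)
print(fixed.decode("ascii"))
```

To apply it to a real folder, one would read a backup copy of the mbox file in binary mode, pass the bytes through clear_expunged, and write the result to a new file. A line in a message body that happens to look exactly like the header would also be rewritten, so the result is worth inspecting before it replaces anything.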
Comment on attachment 357315 [details] [diff] [review]
load folder if db open fails

I'll r/sr this because it fixes a bad dataloss bug, but there are still issues - if I do this on a large folder, and then try to do a rebuild index, I get told that the folder is locked. I also don't get any feedback that the update is going on, so I'm going to change it to updateFolder(window.arguments[0].msgWindow) so that the user gets some feedback...
Attachment #357315 - Flags: superreview?(bienvenu)
Attachment #357315 - Flags: superreview+
Attachment #357315 - Flags: review?(bienvenu)
Attachment #357315 - Flags: review+
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
for reference purposes, bug 437886 checked in 2008-12-19 and bug 471307 checked in 2008-12-29

do we have test(s) for rebuild index?
Flags: in-testsuite?
(In reply to comment #33)
> (From update of attachment 357315 [details] [diff] [review])
> ... but there are still issues

At first, I tried to do a patch that would allow the folder properties dialog to function with an invalid database so that we could get to the rebuild index function. But it was a lost cause, because folder properties is all about setting "folder properties" which are database values.

The cleanest fix would be one where the rebuild index function could be made available from some place other than the properties dialog. That could be separate XUL, or a separate error dialog that pops up if the database open fails. Any of these things would have UI impact.
(In reply to comment #35)
>
> do we have test(s) for rebuild index?

I am not aware of any. Also, this bug is really more about issues in the folder properties dialog, rather than issues with rebuild index itself.
The issue in Comment #0 no longer happens (Rebuild Index successfully rebuilds the corrupted .msf file produced by Bug 471682). Fix confirmed in today's build.

Thanks
Status: RESOLVED → VERIFIED
(In reply to comment #32)
> (In reply to comment #31)
> > Does this mean there is a chance of recovering lost inboxes? Is this something
> > people should wait for?
> > 
> > Thank you.
> 
> Only if you have not received new email since the error occurred.  If you
> receive new email, your inbox file will be completely overwritten with the new
> email.  Therefore, you have to immediately take Thunderbird offline and
> > manually fix the inbox file (search/replacing X-Mozilla-Status from 0009 back
> to 0001).

I closed it immediately, so it may still be OK. I will just shut off my Internet connection so nothing new comes in. How is this 0009 to 0001 thing done? Is there a writeup anywhere? Is there a "fixit" tool? Thank you.
(In reply to comment #39)
> I closed it immediately, so may still be OK. I will just shut off my Internet
> connection so nothing new comes in. How is this 0009 to 0001 thing done? Is there
> a writeup anywhere? Is there a "fixit" tool? Thank you.

To avoid ongoing bug spam, I have responded to your question here: http://forums.mozillazine.org/viewtopic.php?p=5531385#p5531385
Attachment #355890 - Flags: review?