Last Comment Bug 371672 - assert in morkConfig, invalid mork db
: assert in morkConfig, invalid mork db
: fixed1.8.1.3, fixed1.8.1.4
Product: MailNews Core
Classification: Components
Component: Database (show other bugs)
: Trunk
: x86 Windows XP
-- normal (vote)
: ---
Assigned To: David :Bienvenu
Depends on:
  Show dependency treegraph
Reported: 2007-02-25 18:11 PST by David :Bienvenu
Modified: 2008-07-31 04:30 PDT (History)
3 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---

possible fix (1.44 KB, patch)
2007-02-25 18:12 PST, David :Bienvenu
no flags Details | Diff | Splinter Review
a different approach (2.38 KB, patch)
2007-02-27 11:08 PST, David :Bienvenu
no flags Details | Diff | Splinter Review
proposed fix (1.20 KB, patch)
2007-02-28 07:33 PST, David :Bienvenu
mscott: superreview+
dveditz: approval1.8.1.4+
Details | Diff | Splinter Review

Description User image David :Bienvenu 2007-02-25 18:11:05 PST
I've started to see an actual mork db corruption, caused by an actual mork bug - at least, I think it's a mork bug. The situation usually arises when we have a saved search on a folder, and we make a change to a message that results in the message no longer belonging to the saved search. We cut the message from the table corresponding to the saved search. Sometimes one or more of the cells in the row is dirty, so we write out the dirty cells. But, we don't write out the dirty atoms for the cells, and in some situations, the cell atoms are dirty. When we try to reload the db, it fails because the atoms for the row are not in any dictionary we've loaded. Most of the corruptions I've seen have to do with flag values that aren't in any dictionary.

Here's an example of a bad bit of db. This is a change transaction...

{189A611:^80 {(k^B1:c)(s=9u)} -

In particular, we're cutting the row 48784D from the saved search, in this snippet here:

{189A611:^80 {(k^B1:c)(s=9u)} -

But, the flags cell (88) is dirty, so we're writing out the dirty cell as well. But the cell value is an atom, AE0, that we've never written out in any dictionary. 

I've changed the code so that if we're writing a row, and some of the cells have  dirty atoms, we write out the dirty atoms in a dictionary before the row cell changes.

I'm not sure why this isn't usually more of a problem in mork - I suspect it's because the atoms usually end up in a dictionary long before the cut happens.

I'll attach a patch in a second. I'm still debugging this, to see if it fixes the problem, and indeed, to see if I ever hit the code I've added.
Comment 1 User image David :Bienvenu 2007-02-25 18:12:10 PST
Created attachment 256409 [details] [diff] [review]
possible fix
Comment 2 User image David :Bienvenu 2007-02-27 09:07:46 PST
Comment on attachment 256409 [details] [diff] [review]
possible fix

this doesn't work - the dictionary has to be written out earlier.
Comment 3 User image David :Bienvenu 2007-02-27 11:08:43 PST
Created attachment 256655 [details] [diff] [review]
a different approach

this might work better - my thinking is that we have rows in the change list that are not in the table, so their dirty atoms don't get written out in the normal table dict writing pass. This patch makes it so we also look for dirty atoms in the change list rows, and write them, if any.

The Tell calls and the assertion are just for me to see if we ever write out any data from the new code.

I left the previous change in this patch - it's an easy way to see if this code doesn't work, because I'll hit a breakpoint in that code.
Comment 4 User image David :Bienvenu 2007-02-28 07:33:46 PST
Created attachment 256790 [details] [diff] [review]
proposed fix

OK, this fix seems to work - I ran into a situation where the new code did write out some dirty items in the dictionary, and I was able to load the DB afterwards.

While it's scary to change Mork at this late date, this code only writes out data in relatively rare situations, and prior to this patch, in those situations, we'd have a corrupt db, so it's hard for this patch to make things worse. Plus, it seems to work.
Comment 5 User image David :Bienvenu 2007-02-28 08:32:48 PST
Comment on attachment 256790 [details] [diff] [review]
proposed fix

we need this for TB 2.0
Comment 6 User image Daniel Veditz [:dveditz] 2007-02-28 15:30:19 PST
Comment on attachment 256790 [details] [diff] [review]
proposed fix

approved for 1.8 branch, a=dveditz for drivers
Comment 7 User image David :Bienvenu 2007-02-28 15:35:10 PST
fixed on trunk and branch.

This one will be pretty difficult for QA to verify
Comment 8 User image Daniel Veditz [:dveditz] 2007-03-30 11:43:33 PDT
adding fixed keyword to match renamed approval flag
Comment 9 User image Marcia Knous [:marcia - use ni] 2007-06-11 12:19:31 PDT
David: Do you have a suggestion as to how we can verify this on the QA side?
Comment 10 User image David :Bienvenu 2007-06-11 12:33:37 PDT
Marcia, no - it's very hard to verify. I know I haven't seen any of the corruptions that I was seeing on a semi-regular basis.

Note You need to log in before you can comment on or make changes to this bug.