Open Bug 1717113 Opened 3 years ago Updated 1 month ago

Mork (.msf) file not immediately sync'd to disk when copy-filter rule, which affects biff message counts

Categories: MailNews Core :: Database, defect
Tracking: Not tracked
Status: UNCONFIRMED
People: Reporter: pablo; Assignee: Unassigned
Attachments: 5 files

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36

Steps to reproduce:

::: Environment/Set up :::
o POP account (non-Gmail), checking for new mail once a minute
o Create a shell script to email four or more simple messages to the test account:

echo foo | mailx ...

o Create a folder in the account named "Received"
o Create a copy-all filter rule to copy all email to "Received"
o Run the shell script
o Get all new messages manually (or wait for the automatic check)
o tail -f .../Inbox.msf
o I wrote a wee monitoring script that runs md5sum on Inbox.msf and makes a copy of it whenever it detects a change. I'll attach the time-stamped files.

Actual results:

More often than not, Inbox.msf is not fully flushed when a copy-all filter is active and more than four emails are sent in a burst.

I tried to replicate the issue using a Gmail account but could not. Gmail seems noticeably laggier than my non-Gmail account. Who knows what Google is doing.

The end result is that the ^A2= value (column A2 = numNewMsgs) lags behind. I am working on /nbiff/, a (new)biff systray tool for Linux, which depends on ^A2= being up to date. :)
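For a reader such as nbiff, "up to date" means taking the most recent ^A2= cell in the file. A sketch under loudly stated assumptions: that Mork appends incremental changes to the end of the file (which is what makes tail -f useful here), that column id A2 maps to numNewMsgs as in the reporter's Inbox.msf (column ids come from each file's own dictionary and should really be looked up there), that the value is written inline as (^A2=<hex>) rather than as a reference to a dictionary atom, and that it is hexadecimal. The helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent numNewMsgs value in an .msf file.
# Assumptions: column id A2 maps to numNewMsgs in this file's dictionary,
# the value is written inline as (^A2=<hex>), and newer writes appear later
# in the file.
last_num_new_msgs() {
    # grep -o prints every match on its own line; tail keeps the last one;
    # sed strips everything but the hex digits.
    hex=$(grep -o '(\^A2=[0-9A-Fa-f]*)' "$1" | tail -n 1 | sed 's/^(\^A2=\(.*\))$/\1/')
    # No (non-empty) match: report failure instead of printing a bogus 0.
    [ -n "$hex" ] || return 1
    # POSIX printf accepts 0x-prefixed hexadecimal for %d.
    printf '%d\n' "0x$hex"
}
```

For example, `last_num_new_msgs /path/to/Inbox.msf` (placeholder path). Because of the behaviour reported in this bug, a value read this way can trail what Thunderbird shows until the .msf is flushed.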

For those suffering from insomnia, more on nbiff is here - https://github.com/pablo-blueoakdb/nbiff

Expected results:

The Mork file should always be sync'd to disk on a change.

Syncing on every change would also keep TB from re-receiving emails should it abort unexpectedly. I know because I tried sending TB signals to see whether one would force it to flush its .msf to disk. :) I ended up killing it a few times.

Doesn't Mork call |fdatasync()| at key places?
If I am not mistaken, I don't see a direct call to it.

|fsync| is not called directly either.
There is a direct call in a gloda-related JavaScript file, but that is all.

Of course, we may be calling fdatasync or fsync through other code (most likely in the Mozilla portion of the code).

Hi Chiaki,

Thank you for investigating so far.

Would it be helpful if I ran an strace on the parent and all the children?

(In reply to Pablo from comment #7)

> Would it be helpful if I ran an strace on the parent and all the children?

It certainly would be helpful.
At this moment, I am not sure who can fix mork code, though.

In any case, the lack of fdatasync or fsync at key places should concern many parties, so a trace showing that such calls are missing would, I think, alert more of them.

Also, my check was a very brief one using Searchfox, so it may not be complete. We will know once your trace shows whether fdatasync or fsync is called.

The sqlite3 code, by contrast, certainly calls fdatasync, etc., where available.
I thought there was a transition movement from mork to sqlite3. But maybe I was wrong.

(In reply to ISHIKAWA, Chiaki from comment #8)

> I thought there was a transition movement from mork to sqlite3. But maybe I was wrong.

Not yet.

(In reply to ISHIKAWA, Chiaki from comment #8)

> (In reply to Pablo from comment #7)
>
> > Would it be helpful if I ran an strace on the parent and all the children?
>
> It certainly would be helpful.

I attached to a running Thunderbird process (and its children) and captured two sets of results, placed in two sub-directories:

  • delay-sync/
  • immediate-sync/

At the top level of the tarball is a file named timing: a very rough record of when I did each task (e.g. Get new email, Click on Inbox, etc.). The idea is to provide an index into the time-stamped strace files. They're voluminous! :p

I don't think any sync calls (fsync, fdatasync, etc.) are being made. Using the timing file, I couldn't find any around the time of the forced sync:

pablo@oreo:/usr2/tmp/trace/delay-sync
└─▬ $ fgrep 10:04:4[0-5] * | fgrep -i sync
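Traces of this size can be narrowed to just the calls in question. A sketch, assuming Linux strace output in its default format: strace can record only sync-family system calls, and a grep that matches the syscall name followed by "(" counts them without also catching lines that merely contain "sync" (as fgrep -i sync does). TB_PID, the output path, and the helper name are placeholders.

```shell
#!/bin/sh
# Sketch: count sync-family system calls in strace output.
#
# A capture limited to those calls might look like this (TB_PID is a
# placeholder for Thunderbird's parent PID; add one -p per existing child
# process; -f follows new children/threads, -tt adds microsecond timestamps):
#   strace -f -tt -e trace=fsync,fdatasync,sync_file_range,syncfs,msync \
#          -p "$TB_PID" -o /tmp/tb-sync.trace

# count_sync_calls TRACE_FILE
# Prints how many calls to fsync, fdatasync, sync_file_range, syncfs or msync
# start in TRACE_FILE. The name must be at line start or after whitespace
# (the PID/timestamp prefix) and be followed by "(", so "<... fsync resumed>"
# continuation lines and text that merely contains "sync" (e.g. O_DSYNC) are
# not counted.
count_sync_calls() {
    grep -E -c '(^|[[:space:]])(fsync|fdatasync|sync_file_range|syncfs|msync)\(' "$1"
}
```

The same helper works on the existing full traces in delay-sync/ and immediate-sync/; a count of 0 (grep then exits with status 1) would support the observation above.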

> At this moment, I am not sure who can fix mork code, though.

As a database person, applying the ACID principles (Durability in particular), I wonder whether we could open() the .msf files with O_DSYNC.
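At the shell level, GNU dd can illustrate what O_DSYNC means: oflag=dsync opens the output file with O_DSYNC, so each write() returns only once the data is on stable storage. A sketch only, assuming GNU coreutils dd; the helper name is hypothetical, and this is not how Thunderbird or Mork currently writes .msf files.

```shell
#!/bin/sh
# Sketch only: append stdin to a file opened with O_DSYNC, using GNU dd.
# oflag=dsync makes dd open the output with O_DSYNC, so each write() returns
# only after the data (and the metadata needed to read it back) reaches
# stable storage. oflag=append plus conv=notrunc appends instead of
# truncating; status=none suppresses dd's transfer statistics.
# The function name is hypothetical.
append_dsync() {
    # $1: target file; the data to append is read from stdin.
    dd of="$1" oflag=append,dsync conv=notrunc status=none
}
```

One design caveat: with O_DSYNC every write waits on the device, so a writer that issues many small writes pays that cost each time. Calling fdatasync() once after a complete commit, as the sqlite3 code mentioned earlier in this thread does where available, is the more common pattern.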

I'm also thinking that I may need a special build of Thunderbird. The goal is to have some user-level debug switches that create a human-readable log to post.

Attached file trace.tar.bz2
Component: Untriaged → Database
Product: Thunderbird → MailNews Core

Pablo,
Still seeing this with version 91?

Flags: needinfo?(pablo)
Summary: Mork (.msf) file not immediately sync'd to disk when copy-filter rule → Mork (.msf) file not immediately sync'd to disk when copy-filter rule, which affects biff message counts

Hi Wayne,
Unfortunately, I still see the problem on version 91.

I ran the test loop below with version 91.3.2 (64-bit) and still see it. I've redacted some of the information:

for i in $(seq 1 6) ; do echo hey | mailx -r devnull@wnXXX -s "[WN] test $i" pablo@wnXXX; done
Flags: needinfo?(pablo)

If I understand bug 418551 correctly, MORK has been removed for Thunderbird 93+, and replaced with jsoncpp.

See Also: → 418551

(In reply to Thomas D. (:thomas8) from comment #14)

> If I understand bug 418551 correctly, MORK has been removed for Thunderbird 93+, and replaced with jsoncpp.

Spoke too soon. Looks like the real deal for messages is Bug 11050 / Meta bug 1572000. Overall de-mork is Meta Bug 453975.

See Also: 418551 → 1572000
