Closed Bug 1306914 Opened 8 years ago Closed 8 years ago

Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change.

Categories

(MailNews Core :: Networking: POP, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1116055

People

(Reporter: ishikawa, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

+++ This bug was initially created as a clone of Bug #1116055 +++

I have spotted a very conspicuous offender of not using buffered
output where it can in comm-central thunderbird.

I think as long as a buffer of reasonable size, even if it is 4KB or
16KB, is used, I/O performance will  be much better and reasonably good.
(especially for a remote file system, or imap case.)
It does not have to be 128KB or much larger something, I think,
although it would help for a local disk write and read.

However, here I report that I have found a routine is copying message
data WITHOUT ANY BUFFERING at all when it WRITES the output LINE BY
LINE.

... the quote from the original bug continues at the end of this comment.

This bug entry was created to see if users may encounter possible strange seek position issues when unnecessary |Seek()| that had been issued for each line of message saving (POP3 case), thus negating any effort to use buffering for output, have been removed.

A set of patches has been created in bug 1116055 and friends that disables the extra |Seek()| calls. Unless we do this, buffering does not happen at all.
In the patch, it was finally decided only ONE |Seek()| per message saving is
absolutely necessary, but no more.

However, to make sure the code is correct on all platforms, the patch
inserts a probe to see if the implicit assumption about the file position is
correct.
Should a user experience a bug possibly related to the removing of |Seek()| issue, and see the message 
WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug THISBUGNUMBER,
the user is encouraged to report to this bugzilla entry.

TIA

The quote from the original bug continues here.

The culprit routine I found is

NS_IMETHODIMP nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream,
int32_t aLength)
in the file
comm-central/mailnews/local/src/nsLocalMailFolder.cpp

And the problematic unbuffered write in question is

http://mxr.mozilla.org/comm-central/source/mailnews/local/src/nsLocalMailFolder.cpp#2111

2111     rv = mCopyState->m_fileStream->Write(start, lineLength,
&bytesWritten);

This does not seem to use buffering at all.
(Please check the excerpt of system call trace
from strace under linux
attached in the following.).

I agree that, since this function seems to be in need of scanning the
data for "From" at the beginning, and escaping "From" at the beginning
of line, etc., it needs to SCAN the INPUT line by line, but there is
NO REASON AT ALL TO USE UNBUFFERED *OUTPUT*!

Repeat-by:

Monitor and record the system calls issued by TB.
I used strace under linux.
strace -o /tmp/t.out TB-BINARY

Start TB, and then copy a message with reasonably large binary
attachment in one folder to another folder.

I found the repetition of small write(fd, bug, len) with length
parameter 74 MANY times.

E.g.: Excerpt from a typical strace record.

I only excerpted the interesting lines.  There are many other system
calls.

The particular message I copied had an attachment of 1.2MB size.

The repeated writes() have the first parameter (file descriptor) as
54.  I searched backward in trace file and found that FD 54 is for the
target folder. (folder-C)

   ...

open("/home/ishikawa/.thunderbird/u1ku1i3y.default/Mail/Local
Folders/folder-C", O_RDWR|O_CREAT, 0664) = 54
   ...

Extraction of the initial part of the writes to this file descriptor:
(I think I created the message using HTML format mail. That is why you
see a few lines of HTML main message body. But that is not of any
concern. Concern is the long repetition of mime data that follows.)
...

write(54, "From - Sun Nov 22 11:35:01 2009\r"..., 33) = 33
clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0
clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0
write(54, "X-Mozilla-Status: 0001\r\n", 24) = 24
clock_gettime(CLOCK_MONOTONIC, {2572825, 299989570}) = 0
clock_gettime(CLOCK_MONOTONIC, {2572825, 300253008}) = 0
write(54, "X-Mozilla-Status2: 10000000\r\n", 29) = 29
      .... I omit clock_gettime() below ...
write(54, "X-Mozilla-Keys: $label3         "..., 97) = 97
write(54, "FCC: mailbox://nobody@Local%20Fo"..., 44) = 44
write(54, "X-Identity-Key: id1\r\n", 21) = 21
write(54, "Message-ID: <4B08A02E.7080201@ex"..., 44) = 44
write(54, "Date: Sun, 22 Nov 2009 11:35:01 "..., 39) = 39
write(54, "From: ishikawa <ishikawa@example"..., 39) = 39
write(54, "X-Mozilla-Draft-Info: internal/d"..., 70) = 70
write(54, "User-Agent: Thunderbird 2.0.0.23"..., 49) = 49
write(54, "MIME-Version: 1.0\r\n", 19)  = 19
write(54, "To: nobody@example.com\r\n", 24) = 24
write(54, "Subject: libc-2.7.so 11:35\r\n", 28) = 28
write(54, "Content-Type: multipart/mixed;\r\n", 32) = 32
write(54, " boundary=\"------------000003080"..., 50) = 50
write(54, "\r\n", 2)                    = 2
write(54, "This is a multi-part message in "..., 46) = 46
write(54, "--------------000003080301040107"..., 40) = 40
write(54, "Content-Type: text/html; charset"..., 45) = 45
write(54, "Content-Transfer-Encoding: 7bit\r"..., 33) = 33
write(54, "\r\n", 2)                    = 2
write(54, "<!DOCTYPE html PUBLIC \"-//W3C//D"..., 65) = 65
write(54, "<html>\r\n", 8)              = 8
write(54, "<head>\r\n", 8)              = 8
write(54, "</head>\r\n", 9)             = 9
write(54, "<body bgcolor=\"#ffffff\" text=\"#0"..., 41) = 41
write(54, "<br>\r\n", 6)                = 6
write(54, "<br>\r\n", 6)                = 6
write(54, "</body>\r\n", 9)             = 9
write(54, "</html>\r\n", 9)             = 9
write(54, "\r\n", 2)                    = 2
write(54, "--------------000003080301040107"..., 40) = 40
write(54, "Content-Type: application/octet-"..., 41) = 41
write(54, " name=\"libc-2.7.so\"\r\n", 21) = 21
write(54, "Content-Transfer-Encoding: base6"..., 35) = 35
write(54, "Content-Disposition: attachment;"..., 34) = 34
write(54, " filename=\"libc-2.7.so\"\r\n", 25) = 25
write(54, "\r\n", 2)                    = 2
write(54, "f0VMRgEBAQAAAAAAAAAAAAMAAwABAAAA"..., 74) = 74
write(54, "AAA0AAAANAAAADQAAABAAQAAQAEAAAUA"..., 74) = 74
write(54, "BAAAAAEAAAABAAAAAAAAAAAAAAAAAAAA"..., 74) = 74
write(54, "EwCcJwAAcFQAAAYAAAAAEAAAAgAAAJyd"..., 74) = 74

      ... MANY lines of write with 74 octets....

write(54, "--------------000003080301040107"..., 42) = 42
write(54, "\r\n", 2)                    = 2
close(54)                               = 0


Observation:

For this message with 1.2MB attachment, I counted 24016 write
calls. (fgrep "write(54," /tmp/t.out | wc)

Granted that the routine seems to be doing something about seeing
"From", escaping "From" with ">From", etc., but it has *NO* reason to
use UNBUFFERED WRITE here.  It can buffer the output and then finally
at the end of the loop it can flush or something.

HOW TO FIX?

I am not familiar with I/O routine inside mozilla code base.  I am not
sure where this file stream is opened [See the stack backtrace at the
end.] and why it is not using buffered
mode.

    rv = mCopyState->m_fileStream->Write(start, lineLength, &bytesWritten);

Is there a way to set the buffer-mode to m_fileStream inside CopyData
(just in case there are multiple wayward callers)?
Or can we simply replace this Write with a buffered output version, but
what that function would be?
[The stack trace is shown below for those who want to chase what is
calling this function.]

The above is for copying an existing message to another folder.  (So it
can happen offline.)

When I look up CopyData in mxr database,
I found another instance of  CopyData (for imap)
and that seems to have the same unbuffered output issue.
(Or at least superficially the code looks the same.
Unless "->Write" semantics is different (i.e., unbuffered vs buffered)
we have the crawling I/O performance when we copy / download large
message (when we write to a local folder) when imap is used, too!

MXR: E.g. Other place(s) where CopyData was found:

I searched the identifier CopyData using mxr:

MXR is slightly buggy, I think: I only left mail/mailnews-related entries.

Defined as a function in:

    mailnews/imap/src/nsImapMailFolder.cpp (View Hg log or Hg annotations)
        line 3343, as member of class nsImapMailFolder -- NS_IMETHODIMP
nsImapMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength)

    This function, CopyData  calls another function
    CopyDataToOutputStreamForAppend() and in it, we have

3303     {
3304       rv = outputStream->Write(start,
3305                                              end-start,
3306                                              &writeCount);
3307       rv = outputStream->Write(CRLF, 2, &writeCount);
3308     }

       We should make sure that these writes are buffered!
       (and flush at the end of the loop/return.)
       From what I read before about the slowness of I/O in a comment
       posted by IMAP user, probably these are NOT buffered, either.

       AND, we SHOULD CHECK the return value rv: there is NO ERROR
       checking here !? No wonder there are reports of IMAP users
       losing messages, etc. in the past?
       [Added comment: even if these UNBUFFERED
       writes failed, close will succeed! So checking here
       is very important! ]

Defined as a function prototype in:

    mailnews/base/public/nsICopyMessageListener.idl (View Hg log or Hg
annotations)
        line 20 -- void copyData(in nsIInputStream aIStream, in long
aLength);

Defined as a variable in:

    mailnews/local/src/nsLocalMailFolder.cpp (View Hg log or Hg annotations)
        line 1830 -- rv = CopyData(inputStream, (int32_t) fileSize);
        line 2026 -- NS_IMETHODIMP
nsMsgLocalMailFolder::CopyData(nsIInputStream *aIStream, int32_t aLength)

    This is the function I found out about and report here.

The questions I have are:

Is there a way to set the buffer-mode to m_fileStream inside CopyDate (just in case there are other wayward callers)?

Or can we simply replace this Write with a buffered output version?

See Related bugs:

Bug 558528 - Larger buffer size in file I/O by Mail&News to improve
performance on slower drives (eg network drives/shares). Use of
Necko's buffer size, buffer size user can easily customize.

    Well, I have not used the aceman's patch in there yet since I
    wanted to find out where the offending code for calling write()
    repeatedly with smallish 74 octet length originated. I noticed
    this repeated write() of smallish size earlier this year when
    someone reported a serious issue when his mail store is on a
    remote file system (CIFS) and he seems to have used imap.  I tried
    to re-create his problem locally by mounting Windows 7 file system
    from my local linux installation and in so doing noticed a few
    issues with CIFS code itself during a transient network error
    (which seems to have been fixed nicely),
    and a missing error code or two on TB's side, and in so doing noticed
    a trace of MANY write system calls with short (70+ octets) on wire.
    Such many small write calls over wire can kill performance easily,
    and the missing error handling was bad 

    BTW, :aceman's patch there seems to handle many "read" side of the
    issue. The problem I am reporting is "write" side.

    Also, note that the above bug deals with buffer size when 
    buffering is enabled. This bug is about a place where buffering
    is not used at all when it should.

Bug 494238 - (jetpack-panel-apps) ability for a jetpack to open a panel

    I wanted to use the great profiler ( see

https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Reporting_a_Thunderbird_Performance_Problem_with_G
    ). But it  never seemed to work for me when I tried to
    use it against C-C TB until lately.

    But finally, thanks to
    https://bugzilla.mozilla.org/show_bug.cgi?id=494238#c47 it seems
    to work!
    (But I have not yet used that profiler for this particular
performance issue because
    it has been only a day since I could get it working after a long fiasco
    with bug regarding loadinfo, Bug 1111304 - assertion failure
    loadinfo, this month. )

    I wanted to get out the words
    fast so  that something can be done during the holidays for
    this serious performance issue.
    But once I become familiar with the profiler and can
    offer performance figures using the profiler, I will.
    It looks the profiler could have saved me in
    my efforts to capture the traceback mentioned below.
    I see a traceback in the profile window somehow.


This is the stack trace of C-C TB when the write with this small chunk
is invoked.

(gdb) where
#0  write () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007ffff7fb8740 in pt_Write (fd=<optimized out>, buf=<optimized out>,
    amount=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/pthreads/ptio.c:1315
#2  0x00007ffff04beccc in (anonymous namespace)::interposedWrite (
    aFd=<optimized out>, aBuf=<optimized out>, aAmt=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/xpcom/build/NSPRInterposer.cpp:68
#3  0x00007ffff7f9f0b9 in PR_Write (fd=<optimized out>, buf=<optimized out>,
    amount=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/nsprpub/pr/src/io/priometh.c:114
#4  0x00007ffff009ce03 in nsMsgFileStream::Write (this=<optimized out>,
    buf=<optimized out>, count=<optimized out>, result=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgFileStream.cpp:154
#5  0x00007ffff02ace8d in nsMsgLocalMailFolder::CopyData (
    this=<optimized out>, aIStream=<optimized out>, aLength=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsLocalMailFolder.cpp:2111
#6  0x00007fffeffa089f in nsCopyMessageStreamListener::OnDataAvailable (
    this=<optimized out>, ctxt=<optimized out>, aIStream=<optimized out>,
    sourceOffset=<optimized out>, aLength=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mailnews/base/src/nsCopyMessageStreamListener.cpp:86
#7  0x00007ffff02c8a48 in nsMailboxProtocol::ReadMessageResponse (
    this=<optimized out>, inputStream=<optimized out>,
    sourceOffset=<optimized out>, length=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:594
#8  0x00007ffff02c8f0b in nsMailboxProtocol::ProcessProtocolState (
    this=<optimized out>, url=<optimized out>, inputStream=<optimized out>,
    offset=<optimized out>, length=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mailnews/local/src/nsMailboxProtocol.cpp:679
#9  0x00007ffff00c1399 in nsMsgProtocol::OnDataAvailable (
    this=<optimized out>, request=<optimized out>, ctxt=<optimized out>,
    inStr=<optimized out>, sourceOffset=<optimized out>,
    count=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mailnews/base/util/nsMsgProtocol.cpp:353
#10 0x00007ffff0572bfe in nsInputStreamPump::OnStateTransfer (
    this=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:609
#11 0x00007ffff0573f5d in nsInputStreamPump::OnInputStreamReady (
    this=<optimized out>, stream=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/netwerk/base/src/nsInputStreamPump.cpp:436
#12 0x00007ffff04751d8 in nsInputStreamReadyEvent::Run (this=<optimized
out>)
    at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/io/nsStreamUtils.cpp:88
#13 0x00007ffff049cf6d in nsThread::ProcessNextEvent (this=<optimized out>,
    aMayWait=<optimized out>, aResult=<optimized out>)
    at /REF-COMM-CENTRAL/comm-central/mozilla/xpcom/threads/nsThread.cpp:855
#14 0x00007ffff04cfba3 in NS_ProcessNextEvent (aThread=<optimized out>,
    aMayWait=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/xpcom/glue/nsThreadUtils.cpp:265
#15 0x00007ffff08c4356 in mozilla::ipc::MessagePump::Run (
    this=<optimized out>, aDelegate=<optimized out>)
    at /REF-COMM-CENTRAL/comm-central/mozilla/ipc/glue/MessagePump.cpp:99
#16 0x00007ffff0885029 in MessageLoop::RunInternal (this=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:233
#17 0x00007ffff08853fc in RunHandler (this=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:226
#18 MessageLoop::Run (this=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/ipc/chromium/src/base/message_loop.cc:200
#19 0x00007ffff28827b3 in nsBaseAppShell::Run (this=<optimized out>)
    at /REF-COMM-CENTRAL/comm-central/mozilla/widget/nsBaseAppShell.cpp:164
#20 0x00007ffff33d61f9 in nsAppStartup::Run (this=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/toolkit/components/startup/nsAppStartup.cpp:281
#21 0x00007ffff345e1ee in XREMain::XRE_mainRun (this=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4153
#22 0x00007ffff345f9ab in XREMain::XRE_main (this=<optimized out>,
    argc=<optimized out>, argv=<optimized out>, aAppData=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4229
#23 0x00007ffff345fd7d in XRE_main (argc=<optimized out>,
    argv=<optimized out>, aAppData=<optimized out>, aFlags=<optimized out>)
    at
/REF-COMM-CENTRAL/comm-central/mozilla/toolkit/xre/nsAppRunner.cpp:4449
#24 0x000000000040378e in do_main (argc=argc@entry=1,
    argv=argv@entry=0x7fffffffd9f8, xreDirectory=<optimized out>)
    at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:195
#25 0x00000000004038b5 in main (argc=1, argv=0x7fffffffd9f8)
    at /REF-COMM-CENTRAL/comm-central/mail/app/nsMailApp.cpp:380
(gdb)
info register

Yeah, right. I had to set a breakpoint to "write" system call and
checked the registers (this is under Debian GNU/Linux 64-bit) and when
the passed length parameter is 74 in, I hit the breakpoint and
voila.

It took me a while to figure out gdb's esoteric command expression
syntax, and I have to skip the major part of the execution without
setting up the break point (because with the conditional breakpoint, the
execution speed is much slower [maybe error message written to stderr
causes one |write| call per character and thus slows down TB execution
very much], and hitting control-C to revert back to gdb prompt when TB
is finally waiting for user interaction after showing the three window
panes, and THEN set the conditional breakpoint

break write if ($rdx == 74)

to gdb prompt and continue,
then hit the breakpoint to obtain the above stack trace.
The great profiler I mentioned above seems to record the repeated write
nicely (since it is sampled many times, and at the sampled time, the
profiler seems to record the stack trace, and so it would have been
much easier for me to figure out which higher level routine invokes
short writes)


TIA
See Also: → 1116055
Blocks: 1242046
Sorry, this bugzilla entry might have been made by mistake.
Please ignore it for the moment.
I will close it if this was indeed was created by accident...
Sorry I was not clear enough (and I almost forgot myself !).

This bugzilla is to collect the report of 
"WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914." which is printed if there is a system that does not honor the tacit assumption about the |Seek()| position in nsPop3Sink.cpp.
This is crucial in the enabling-buffering patches.
I am going to change the title a little bit.
When the user sees a seek position inconsistency during the beta testing period of patches to enable write buffering (see bug 1116055), C-C TB nightly would print the following message:

WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914.

Please report the type of OS, 32-bit vs 64-bit, and CPU type and report the exact line you see in the error log or in the console.

Thank you!
Summary: Report a file position issue: unnecessary |Seek()| is removed (re: Performance issue: Failure to use buffered write (comm-central thunderbird)) → Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914.
We're landing this in bug 1116055.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
Summary: Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change. If you can reliably reproduce this message, please report the steps you used to bug 1306914. → Collecting file seek position problem report: WriteLineToMailbox detected an unexpected file position change.
We will collect user feedback in bug 1308335.
You need to log in before you can comment on or make changes to this bug.