Last Comment Bug 365751 - Deleting an attachment with \n is O(N^2)
: Deleting an attachment with \n is O(N^2)
Status: RESOLVED FIXED
: perf
Product: MailNews Core
Classification: Components
Component: Backend (show other bugs)
: Trunk
: All All
: -- normal with 1 vote (vote)
: ---
Assigned To: Andrew Schultz
:
:
Mentors:
http://bonsai.mozilla.org/cvsblame.cg...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2007-01-02 22:19 PST by Andrew Schultz
Modified: 2008-07-31 04:30 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
stacktrace (3.10 KB, text/plain)
2007-01-02 22:19 PST, Andrew Schultz
no flags Details
use PL_strnpbrk once instead of memchr twice (1.84 KB, patch)
2007-01-02 22:24 PST, Andrew Schultz
mozilla: review+
mozilla: superreview+
Details | Diff | Splinter Review
patch v2 (5.50 KB, patch)
2007-01-13 23:29 PST, Andrew Schultz
no flags Details | Diff | Splinter Review
patch v2.1 (5.55 KB, patch)
2007-01-13 23:33 PST, Andrew Schultz
mozilla: review+
mozilla: superreview+
Details | Diff | Splinter Review

Description Andrew Schultz 2007-01-02 22:19:16 PST
Created attachment 250299 [details]
stacktrace

I tried to delete a 10MB attachment with current trunk and it took 5 or 10 minutes.  The code

1. looks through the whole attachment for \r (fails)
2. looks for a \n (succeeds)
3. advances past the \n
4. looks through the rest of the whole attach for \r (fails!)

It should just use PL_strnpbrk so it can look for both characters at once.
Comment 1 Andrew Schultz 2007-01-02 22:24:21 PST
Created attachment 250302 [details] [diff] [review]
use PL_strnpbrk once instead of memchr twice
Comment 2 David :Bienvenu 2007-01-03 07:44:11 PST
Comment on attachment 250302 [details] [diff] [review]
use PL_strnpbrk once instead of memchr twice

looks good, thx for the patch!
Comment 3 Andrew Schultz 2007-01-04 20:27:43 PST
FIXED
Comment 4 David :Bienvenu 2007-01-08 10:39:46 PST
this caused a regression - bug 366151. My inclination is to back this out since it's an easy patch to reconstruct, and the regression is so severe...
Comment 5 David :Bienvenu 2007-01-08 10:48:16 PST
re-opening - I backed this out. I don't have time to figure out what went wrong with the patch right this minute - my guess is that it didn't handle something like a line with just a linebreak, i.e., an empty line, because that's what triggers the parsing of the headers, and that parsing is never triggered.
Comment 6 neil@parkwaycc.co.uk 2007-01-08 16:53:57 PST
Comment on attachment 250302 [details] [diff] [review]
use PL_strnpbrk once instead of memchr twice

>+    end = PL_strnprbrk(start, "\r\n", endBuffer - start + 1);
For a start this needs to be PL_strnpbrk (both times!), but this function is truly awful, I feel like submitting it to http://thedailywtf.com/, apart from the wiry flow control the two bugs I spotted were a) >From lines are forced to end with MSG_LINEBREAK while the rest is copied literally b) no check for a CRLF split over two reads (although I can't see an easy cure for the latter though).
Comment 7 Andrew Schultz 2007-01-08 18:36:15 PST
> For a start this needs to be PL_strnpbrk (both times!)

right!  :(

> a) >From lines are forced to end with MSG_LINEBREAK while the rest is copied 
> literally

Why is >From special cased?  Is the sole purpose of that block to prepend ">" to the line?  

Why not just do
  mCopyState->m_fileStream->write(">", 1);
?

> b) no check for a CRLF split over two reads

Actually, this is easy to fix.  Just tell PL_strnpbrk to stop short of the last character.
Comment 8 Andrew Schultz 2007-01-13 23:29:31 PST
Created attachment 251429 [details] [diff] [review]
patch v2

This
1. uses PL_strnpbrk instead of PL_strnrprbrk
2. uses memmove instead of memcpy since the memory regions could (potentially) be overlapping)
3. doesn't leak memory on realloc OOM
4. doesn't special-case "From " lines so much, using Neil's (clever but evil) *--start = '>'
5. ensures there are 2 extra bytes in the buffer... the old code ensured 0 and used 1.

It doesn't handle CRLF that split over multiple calls.  As it is, the method doesn't know if it will be called again.  My only idea would be to assume that the buffer has the same type of line breaks everywhere.
Comment 9 Andrew Schultz 2007-01-13 23:33:22 PST
Created attachment 251430 [details] [diff] [review]
patch v2.1

forgot to save before attaching
Comment 10 David :Bienvenu 2007-01-14 09:04:30 PST
Did the old code handle CRLF's split over multiple calls?  And what happens with the new code when CRLF's span blocks? Will we corrupt messages, and/or attachments not getting stripped?
Comment 11 Andrew Schultz 2007-01-14 09:27:29 PST
The old code also did not handle that case.  With both new and old code, the data written to m_fileStream is correct.  The problem is that 
  mCopyState->m_parseMsgState->ParseAFolderLine(start, lineLength);
receives the following:

text textCRLF
text textCRLF
text textCR
LF
text textCRLF

It looks like nsParseMailMessageState::ParseFolderLine might
1. Think the headers were finished if the break happened while still reading headers (and ignore the rest of the headers).
2. Think there were 1 too many lines in the body if the headers had already been read.

And for the 10MB attachment I'm looking at now, the attachment doesn't get deleted anyway.  :(  That's bug 350803.  I'd guess that if the attachment deleting code was actually working, it wouldn't be invoking this method to copy the attachment.
Comment 12 David :Bienvenu 2007-01-14 09:33:11 PST
OK, thx, Andrew, I'll give this patch a try
Comment 13 David :Bienvenu 2007-01-14 12:48:16 PST
Comment on attachment 251430 [details] [diff] [review]
patch v2.1

OK, I've convinced myself that the *--start = '>' thing is OK - but a couple comments in the code might help:-) Something to the effect that if the start of the block is "From ", we've allowed space at the beginning of mCopyState->m_dataBuffer, and if the "From " is in the middle of the block, we've already written out the previous line, so it's OK that we're overwriting the end of the previous line.
Comment 14 Andrew Schultz 2007-01-14 13:24:34 PST
OK.  landed with an additional comment and actually doing #5 from comment 8 (the comment was correct in the attached patch, but the code only reserved 1 byte).
Comment 15 neil@parkwaycc.co.uk 2007-01-14 16:41:53 PST
(In reply to comment #8)
>5. ensures there are 2 extra bytes in the buffer... the old code ensured 0 and used 1.
Well, it wrote it, but I'm not sure it ever read it again ;-)
[My guess is that it's useful for debugging]

Note You need to log in before you can comment on or make changes to this bug.