Closed Bug 365751 Opened 13 years ago Closed 13 years ago

Deleting an attachment with \n is O(N^2)

Categories

(MailNews Core :: Backend, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ajschult784, Assigned: ajschult784)

References

()

Details

(Keywords: perf)

Attachments

(2 files, 2 obsolete files)

Attached file stacktrace
I tried to delete a 10MB attachment with current trunk and it took 5 or 10 minutes.  The code

1. looks through the whole attachment for \r (fails)
2. looks for a \n (succeeds)
3. advances past the \n
4. looks through the rest of the whole attach for \r (fails!)

It should just use PL_strnpbrk so it can look for both characters at once.
Attachment #250302 - Flags: superreview?(bienvenu)
Attachment #250302 - Flags: review?(bienvenu)
Comment on attachment 250302 [details] [diff] [review]
use PL_strnpbrk once instead of memchr twice

looks good, thx for the patch!
Attachment #250302 - Flags: superreview?(bienvenu)
Attachment #250302 - Flags: superreview+
Attachment #250302 - Flags: review?(bienvenu)
Attachment #250302 - Flags: review+
FIXED
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
this caused a regression - bug 366151. My inclination is to back this out since it's an easy patch to reconstruct, and the regression is so severe...
re-opening - I backed this out. I don't have time to figure out what went wrong with the patch right this minute - my guess is that it didn't handle something like a line with just a linebreak, i.e., an empty line, because that's what triggers the parsing of the headers, and that parsing is never triggered.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 250302 [details] [diff] [review]
use PL_strnpbrk once instead of memchr twice

>+    end = PL_strnprbrk(start, "\r\n", endBuffer - start + 1);
For a start this needs to be PL_strnpbrk (both times!), but this function is truly awful, I feel like submitting it to http://thedailywtf.com/, apart from the wiry flow control the two bugs I spotted were a) >From lines are forced to end with MSG_LINEBREAK while the rest is copied literally b) no check for a CRLF split over two reads (although I can't see an easy cure for the latter though).
> For a start this needs to be PL_strnpbrk (both times!)

right!  :(

> a) >From lines are forced to end with MSG_LINEBREAK while the rest is copied 
> literally

Why is >From special cased?  Is the sole purpose of that block to prepend ">" to the line?  

Why not just do
  mCopyState->m_fileStream->write(">", 1);
?

> b) no check for a CRLF split over two reads

Actually, this is easy to fix.  Just tell PL_strnpbrk to stop short of the last character.
Status: REOPENED → ASSIGNED
Attached patch patch v2 (obsolete) — Splinter Review
This
1. uses PL_strnpbrk instead of PL_strnrprbrk
2. uses memmove instead of memcpy since the memory regions could (potentially) be overlapping)
3. doesn't leak memory on realloc OOM
4. doesn't special-case "From " lines so much, using Neil's (clever but evil) *--start = '>'
5. ensures there are 2 extra bytes in the buffer... the old code ensured 0 and used 1.

It doesn't handle CRLF that split over multiple calls.  As it is, the method doesn't know if it will be called again.  My only idea would be to assume that the buffer has the same type of line breaks everywhere.
Attachment #250302 - Attachment is obsolete: true
Attachment #251429 - Flags: superreview?(bienvenu)
Attachment #251429 - Flags: review?(bienvenu)
Attached patch patch v2.1Splinter Review
forgot to save before attaching
Attachment #251429 - Attachment is obsolete: true
Attachment #251430 - Flags: superreview?(bienvenu)
Attachment #251430 - Flags: review?(bienvenu)
Attachment #251429 - Flags: superreview?(bienvenu)
Attachment #251429 - Flags: review?(bienvenu)
Did the old code handle CRLF's split over multiple calls?  And what happens with the new code when CRLF's span blocks? Will we corrupt messages, and/or attachments not getting stripped?
The old code also did not handle that case.  With both new and old code, the data written to m_fileStream is correct.  The problem is that 
  mCopyState->m_parseMsgState->ParseAFolderLine(start, lineLength);
receives the following:

text textCRLF
text textCRLF
text textCR
LF
text textCRLF

It looks like nsParseMailMessageState::ParseFolderLine might
1. Think the headers were finished if the break happened while still reading headers (and ignore the rest of the headers).
2. Think there were 1 too many lines in the body if the headers had already been read.

And for the 10MB attachment I'm looking at now, the attachment doesn't get deleted anyway.  :(  That's bug 350803.  I'd guess that if the attachment deleting code was actually working, it wouldn't be invoking this method to copy the attachment.
OK, thx, Andrew, I'll give this patch a try
Comment on attachment 251430 [details] [diff] [review]
patch v2.1

OK, I've convinced myself that the *--start = '>' thing is OK - but a couple comments in the code might help:-) Something to the effect that if the start of the block is "From ", we've allowed space at the beginning of mCopyState->m_dataBuffer, and if the "From " is in the middle of the block, we've already written out the previous line, so it's OK that we're overwriting the end of the previous line.
Attachment #251430 - Flags: superreview?(bienvenu)
Attachment #251430 - Flags: superreview+
Attachment #251430 - Flags: review?(bienvenu)
Attachment #251430 - Flags: review+
OK.  landed with an additional comment and actually doing #5 from comment 8 (the comment was correct in the attached patch, but the code only reserved 1 byte).
Status: ASSIGNED → RESOLVED
Closed: 13 years ago13 years ago
Resolution: --- → FIXED
(In reply to comment #8)
>5. ensures there are 2 extra bytes in the buffer... the old code ensured 0 and used 1.
Well, it wrote it, but I'm not sure it ever read it again ;-)
[My guess is that it's useful for debugging]
Product: Core → MailNews Core
You need to log in before you can comment on or make changes to this bug.