Closed Bug 365751 Opened 14 years ago Closed 14 years ago
Deleting an attachment with \n is O(N^2)
I tried to delete a 10MB attachment with current trunk and it took 5 or 10 minutes. The code 1. looks through the whole attachment for \r (fails) 2. looks for a \n (succeeds) 3. advances past the \n 4. looks through the rest of the whole attach for \r (fails!) It should just use PL_strnpbrk so it can look for both characters at once.
Comment on attachment 250302 [details] [diff] [review] use PL_strnpbrk once instead of memchr twice looks good, thx for the patch!
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
this caused a regression - bug 366151. My inclination is to back this out since it's an easy patch to reconstruct, and the regression is so severe...
re-opening - I backed this out. I don't have time to figure out what went wrong with the patch right this minute - my guess is that it didn't handle something like a line with just a linebreak, i.e., an empty line, because that's what triggers the parsing of the headers, and that parsing is never triggered.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment on attachment 250302 [details] [diff] [review] use PL_strnpbrk once instead of memchr twice >+ end = PL_strnprbrk(start, "\r\n", endBuffer - start + 1); For a start this needs to be PL_strnpbrk (both times!), but this function is truly awful, I feel like submitting it to http://thedailywtf.com/, apart from the wiry flow control the two bugs I spotted were a) >From lines are forced to end with MSG_LINEBREAK while the rest is copied literally b) no check for a CRLF split over two reads (although I can't see an easy cure for the latter though).
> For a start this needs to be PL_strnpbrk (both times!) right! :( > a) >From lines are forced to end with MSG_LINEBREAK while the rest is copied > literally Why is >From special cased? Is the sole purpose of that block to prepend ">" to the line? Why not just do mCopyState->m_fileStream->write(">", 1); ? > b) no check for a CRLF split over two reads Actually, this is easy to fix. Just tell PL_strnpbrk to stop short of the last character.
Status: REOPENED → ASSIGNED
This 1. uses PL_strnpbrk instead of PL_strnrprbrk 2. uses memmove instead of memcpy since the memory regions could (potentially) be overlapping) 3. doesn't leak memory on realloc OOM 4. doesn't special-case "From " lines so much, using Neil's (clever but evil) *--start = '>' 5. ensures there are 2 extra bytes in the buffer... the old code ensured 0 and used 1. It doesn't handle CRLF that split over multiple calls. As it is, the method doesn't know if it will be called again. My only idea would be to assume that the buffer has the same type of line breaks everywhere.
forgot to save before attaching
Did the old code handle CRLF's split over multiple calls? And what happens with the new code when CRLF's span blocks? Will we corrupt messages, and/or attachments not getting stripped?
The old code also did not handle that case. With both new and old code, the data written to m_fileStream is correct. The problem is that mCopyState->m_parseMsgState->ParseAFolderLine(start, lineLength); receives the following: text textCRLF text textCRLF text textCR LF text textCRLF It looks like nsParseMailMessageState::ParseFolderLine might 1. Think the headers were finished if the break happened while still reading headers (and ignore the rest of the headers). 2. Think there were 1 too many lines in the body if the headers had already been read. And for the 10MB attachment I'm looking at now, the attachment doesn't get deleted anyway. :( That's bug 350803. I'd guess that if the attachment deleting code was actually working, it wouldn't be invoking this method to copy the attachment.
OK, thx, Andrew, I'll give this patch a try
Comment on attachment 251430 [details] [diff] [review] patch v2.1 OK, I've convinced myself that the *--start = '>' thing is OK - but a couple comments in the code might help:-) Something to the effect that if the start of the block is "From ", we've allowed space at the beginning of mCopyState->m_dataBuffer, and if the "From " is in the middle of the block, we've already written out the previous line, so it's OK that we're overwriting the end of the previous line.
OK. landed with an additional comment and actually doing #5 from comment 8 (the comment was correct in the attached patch, but the code only reserved 1 byte).
Status: ASSIGNED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
(In reply to comment #8) >5. ensures there are 2 extra bytes in the buffer... the old code ensured 0 and used 1. Well, it wrote it, but I'm not sure it ever read it again ;-) [My guess is that it's useful for debugging]
You need to log in before you can comment on or make changes to this bug.