On a debug windows PII-450 MHZ machine with 256 MB of RAM, it takes 3 minutes to insert 130K of text into the compose window. That is, I'm replying to a 100K text message and it's taking about that long before I see the body inserted into the editor instance. I've done a lot of analysis on where the hotspots were for this operation and one of the big ones was here in WillInsertText. I sent a more detailed message to Joe and some others but here's the gist of where we can kill some low hanging fruit: 1) The method is taking an nsString which is over 130K (corresponding to the message body) as an input argument and it isn't passing the nsString by reference. This causes an extra allocation of the string. 2) This method extracts one line at a time, copying that line into a new nsString before calling DoTextInsertion (which appears to be an expensive operation). A couple things we do here, searching for the next line is actually expensive using nsString::FindCharInSet on a string as large as this one. Can DoTextInsertion take larger chunks than one line at time? Also, I replaced the search code with FindChar which is a little faster and this helped a bit. Also, we don't need to copy each extracted line into a new nsString and passing that string to DoTextInsertion. Instead, we could pass an index into the original string into DoTextInsertion. 3) We make a couple copies of the input String in this method. At 130K per copy they are expensive. I wonder if we can't somehow re-examine if they are all necessary. Thanks!
Nominating for beta1, adding perf to the keyword.
Keywords: beta1, perf
Just for analysis sake (I'm definetly not saying this is the right thing to do). I modified nsTextEditRules::WillInsertText to insert the entire message body in one step instead of trying to find a line, copy that line out into a new nsString then insert that string into the document and repeat. Time BEFORE: 3-4 minutes Time AFTER: 20 seconds The remaining 20 seconds is time spent in nsHTMLToTXTSinkStream::Write searching for new line characters in this huge nsString. This gives us an idea of how much of a performance win we are looking at in here. Keep in mind that my numbers for a high end machine too.
m15; cc sfraser
Target Milestone: M15
Actually I'm nominating this for beta1. I'm not sure we can triage it to m15 which is after beta1 until PDT determines if they will rule it PDT+.
*** Bug 26388 has been marked as a duplicate of this bug. ***
Status: NEW → ASSIGNED
*** Bug 19273 has been marked as a duplicate of this bug. ***
pasting in my comments from email: Scott MacGregor wrote: > We've got a really bad performance problem [...] > 1) nsTextEditRules:: WillInsertText and DoTextInsertion are costing us > a lot. [...] Mailnews is giving the editor instance a nsString, > in my example this string was 130K. WillInsertText makes several > copies of the string . You correctly point out that this routine is slow. We can certainly speed it up, though if it's really taking 4 minutes for you then we may be doomed. > First, we should pass the nsString by reference in the method > signature for WillInsertText to reduce an extra copy of the string > when calling the method. Second, we immediately copy the entire string > into aOutString as part of the intialization work even though later we > may not end up using that copied value. Can we avoid that? I would like to get rid of the outString (I've wanted to do this before). In some cases outString simply makes no sense. But there are other cases where some old code depends on it. I'm hopeful that we can ditch it with a little work, though. > However I think the most expensive part of this method is the while > loop which determines how we call DoTextInsertion. [...] Ideally, > it'd be great if DoTextInsertion could handle chunks larger than just > one line a time. Any way you slice it, I have to break it up into runs of text separated by newlines. This is because I have to use <break> nodes instead of newlines, even when inside of preformatted text. The selection & caret code cant deal with blank lines caused by newlines in preformatted text, but it can deal with break nodes. At the bottom level, we have to insert a separate text node for each run of chars between breaks. > I'll file a bug so we can get the ball rolling on this part for beta1 > if we can. I can cut down on string copying for beta1. And we can tune the rest. But the basic problem (needing to scan through/break up huge text strings) remains. :-( If you can talk the layout folks into generating frames for blank lines caused by newlines in preformatted text, then we can avoid this problem.
>If you can talk the layout folks into generating frames for blank lines caused >by newlines in preformatted text, then we can avoid this problem. I agree completely. I've been running with that little hack in my tree where i just insert the entire string in WillInsertText instead of slicing line by line and it is so much faster. Although I did it as just a hack, I don't see any side effects! I must be missing something =) In any case I will definetly help push on the layout folks with you after beta1 so you can get out of the business of reading out line by line. Even if you have to insert one line at a time, you could get a pretty big performance boost by not copying each line into a new nsString. If you just passed a ptr into the string buffer and a length parameter into DoTextInsertion instead of a new nsString you'd save a lot. I briefly looked at doing this and it looks like it would involve changing several method signatures underneath DoTextInsertion to also take a PRUnichar * + a length instead of a nsString. But I didn't see that any of these methods were modifying the input nsString in any way so it may be as easy as just changing the signatures. This would help save a lot as then you are only bounded by: 1) the time it takes to search out the position of the next new line 2) the time it takes for the right frame creation and text insertion for each line of text processed by DoTextInsertion. Just a few thoughts...
Reassigning to firstname.lastname@example.org. I'll take a look at rewriting some of the code to use nsSubsumeStr to see if that speeds some things up.
Assignee: jfrancis → kin
Status: ASSIGNED → NEW
Must have a fix by 03/03 or will punt for beta1.
Whiteboard: [PDT+] → [PDT+] Need fix by 03/03
Is this performance an end user issue? If not, can we push this to post beta 1?
bijals: yes it is. Please read my comments at the top of the bug report where I report that inserting a 100K document into an ender widget takes over 4 minutes on a high end machine. This is very common when replying to large emails. I'd say this effects the end user =).
Status: NEW → ASSIGNED
I think think the searching, dup'ing, and splitting of the 160k nsString is the least of our worries. I commented out the calls to InsertBreak() and DoTextInsertion() and found that the searching, dup'ing, and splitting of the 160k string only took about 4 seconds of the total time ... I was able to get this down to less than a second by rewriting the code to use offsets and nsSubsumeStr. I'm currently trying get Quantify to work to see where the hotspots in the InsertBreak() and DoTextInsertion() calls are.
I agree completely with the numbers you saw =). I've been running with a hacked patch in my tree where we always insert the entire nsString in one shot instead of extracting line by line. I know this is supposed to be wrong but I've been running with it in my tree all week and haven't seen anything bad happen. Granted I'm only testing mailnew compose and not Editing documents.
I'll take a look at this.
Assignee: kin → sfraser
Status: ASSIGNED → NEW
OK, we've found 3-4 performance hotspots that contribute to slowness here: 1. The implementation of nsEditor::GetChildOffset() is slow; each call could take from 2ms upwards, depending on the number of children. We can fix this method to use nsIContent::IndexOf(), which cuts the call time down to around 0.021 ms 2. nsIDOMSelection::Collapse is being called a bunch of times, and some of those calls can be time-consuming; 1.7ms or more. Those that take significant time spent most of that time in selectFrames(), which does 2 CreateInstance calls and some iterating through frames. Some of this is avoidable. 3. The call to nsHTMLEditor::InsertFormattingForNode() on each new text node calls nsIDOMElement::SetAttribute(), which is slow for new nodes (because it calls GetPrimaryFrameFor() on an uncached node; see bug 29730). One call to SetAttribute takes around 1.8ms, of which 1.5ms is in GetPrimaryFrameFor(). 4. We don't currently recycle nsRange objects. We could. 5. nsEditor::InsertTextImpl() was playing funny games with inserting a " " string, then selecting it, then replacing it with the inserted text. This can probably be removed.
So with provisionary fixes to 1, 2, 3 and 5 we halved the time taken to insert 1000 lines of text, from 20 seconds to around 10 seconds. This is not good enough, but a start.
simon and i thought the numbers here sounded suspicious, since kin was reporting 4 minutes to paste with a 2000 line file (using the same lines we were using in the shorter test). So I independently did the comarison in my tree. I confirm Simon's numbers. I did a 500 line test, and got 5.1 seconds with our speedups, and 9.5 seconds without them. With the full 2200+ lines in Kin's test file, my numbers were 57.5 seconds vs 130 seconds. As Simon said, we can do better.
just filed bug 29737 on seelction being slow after you paste in the 2200 line file. I suspect it's the same GetPrimaryFrameFor() problem, but I haven't investigated.
I've got pasting the 500 lines down to about 3 seconds, and 2200+ lines down to about 27 seconds on my PII 450 Mhz machine. I'll attatch the diffs that I'm using. My diffs contain the following changes: 1. GetChildOffset() calls nsIContent::IndexOf(). 2. Commented out the setting of " " in InsertTextImpl(). 3. Call InsertFormattingForNode() before InsertBefore() in CreateElementTxn::Do(). 4. Rewrite of nsHTMLEditRules::WillInsertText() and nsTextEditRules::WillInsertText() code to use nsSubsumeString() instead of duping and splitting. 5. Reordered the CID checks in TransactionFactory::GetNewTransaction()so that the most commonly used during editing and pasting are at the top. 6. nsDOMSelection::selectFrames() bails early if the selection is collapsed AND we are currently selecting the range passed in.
Created attachment 5957 [details] [diff] [review] Patch set containing fixes from kin, mjudge, jfrancis, and sfraser
Some things I'm noticing: 1. Pasting in HTML Mail Compose and Composer are faster than pasting in PlainText Mail Compose. Pasting 500 lines in HTML Mail Compose and Composer is about 3 seconds, and 10 seconds in PlainText Mail. 2. In Composer, if I paste 2200+ lines, then undo or SelectAll-Delete, and then paste again, 10 seconds is added to the total amount of time it takes for the content to appear. If I repeat this again, another 10 seconds is added to the total time.
cc: troy, since I'm going to be mentioning GetPrimaryFrameFor
Status: NEW → ASSIGNED
I don't want to hear any mention of GetPrimaryFrameFor(). It's exactly the same issue as for BR frames. We don't add text frames to the map by default, because it drastically increases the size of the map, and text nodes are anonymous so normal DOM usage doesn't involve doing a content->frame lookup If you want text nodes added to the map by default in editable documents, then go ahead
Here's a rundown of where I think we're spending time, after applying fixes for the performance problems described above. These data are for pasting a single line of text (text, then a <br>) into the editor (taken from the middle of pasting 100 lines). Times are microseconds. Two main top-level calls themselves make up almost 100% of the total pasting time: Num calls Time (us) -------------------------------------------------------------------------- nsHTMLEditor::InsertBreak 1 1690 nsEditor::InsertTextImpl (enclosing call only) 1 1862 -------------------------------------------------------------------------- Total 3352 Here are the (non-inclusive) calls that take significant time within the above calls (and together account for around 50% of the time) -------------------------------------------------------------------------- PresShell::ContentInserted (frame construction) 2 434 (12%) nsDOMSelection::Collapse 5 395 (11%) nsEditor::GetPriorNode ( ->IsEditable -> GetPrimaryFrameFor) 3 359 (10%) nsGenericDOMDataNode::ReplaceData ( -> GetPrimaryFrameFor) 1 373 (10%) InsertTextTxn::Init (a string dup) 1 203 (6%)
Here are those data non-mangled: Num calls Time (us) -------------------------------------------------------------------------- nsHTMLEditor::InsertBreak 1 1690 nsEditor::InsertTextImpl (enclosing call only) 1 1862 -------------------------------------------------------------------------- Total 3352 Here are the (non-inclusive) calls that take significant time within the above calls (and together account for around 50% of the time) -------------------------------------------------------------------------- PresShell::ContentInserted (frame construction) 2 434 (12%) nsDOMSelection::Collapse 5 395 (11%) nsEditor::GetPriorNode ( ->IsEditable -> GetPrimaryFrameFor) 3 359 (10%) nsGenericDOMDataNode::ReplaceData ( -> GetPrimaryFrameFor) 1 373 (10%) InsertTextTxn::Init (a string dup) 1 203 (6%)
removing PDT+ for reconsideration. I recommend we go with M15 for this. We've identified a lot of performance improvements, but there is no silver bullet. We need a lot of small and medium sized changes. I'd feel better about running with this code throughout M15 + M16 and having it be in beta2.
Whiteboard: [PDT+] w/b minus on 03/03
i'm expirementing with a rewrite of much of the insert text codepath, designed to avoid the pitfalls that we have discovered with the current code. Right now my first quick+dirty cut at the rewrite takes around the same amount of time as "fast" version of the old code (ie, the old code with all the improvements we have discovered). I'm trying to write the new version in such a manner as to avoid triggering expensive GetPrimaryFrameFor() calls and also avoid any selection related calls until the paste is completely finished.
I know Joe is rewriting the editor code to not call collapse so much ... but I went ahead and modified nsDOMSelection so that it caches the iterators used in selectFrames(). This seems to shave about a second off the time it takes to paste the 500 and 2250 line case. I'll attatch the updated diffs for nsRangeList.cpp.
Created attachment 6072 [details] [diff] [review] Updated nsRangeList.cpp diff with cached iterators.
reassigning to self
Assignee: sfraser → jfrancis
Status: ASSIGNED → NEW
moving to M17
Target Milestone: M15 → M17
per my recent work and re-profiling, adding dependencies on 5 bugs.
moving to future
Target Milestone: M17 → Future
moving this back to m19 since it blocks 29584 -- I didn't see the dependency earlier
Target Milestone: Future → M19
35292 has been futured 35293 is futured (just now) 35294 is dependent on 43863 (which is futured) and 35294 is futured 35295 is nsbeta3+ 35296 is nsbeta3+ since over half of the dependency bugs are futured, moving this to future
Target Milestone: M19 → Future
Removing dependencies on 35292,35293,35294,35295,35296 since they really aren't dependencies. Bug 53252 is now the tracking bug for all areas that need tuning.
Severity: normal → major
Priority: P3 → P1
Target Milestone: Future → M19
removed future now that a parent paste bug has been opened, paste performance must be fixed before rtm
per beppe removing status whiteboard
Target Milestone: M19 → Future
Mozilla 0.9 this for joe. Joe, if you disagree, please retriage.
Target Milestone: Future → mozilla0.9
the core problem here is 35294. adding dependency. note that bug is blocked by 43863. i'm pushing this off to 0.9.1, not because it isn't critical, but because i don't anticipate movement on the dependencies.
although Beth has moved this out, I think we will make some progress on this in 092.
updating summary, milestone, dependencies
awaiting 35294 work.
the swami says: things that will not land in 099!
Target Milestone: mozilla0.9.9 → mozilla1.0
Moving to Mozilla1.1. Engineers are overloaded working on higher priority bugs.
Target Milestone: mozilla1.0 → mozilla1.1
a lot of work has been done here. We need to start from scratch on perf investigation since many of the hotspots have received attention. On fix that hasn't yet landed is at the front of the first attached patch: kin speeded things up by reordering the switch statement in the editor factory method.
The trunk is the wave of the future!
Target Milestone: mozilla1.1alpha → mozilla1.1beta
The days of having a half dozen milestones out in front of us to divide bugs between seem to be gone, though I dont know why. Lumping everything together as far out as I can. I'll pull back things that I am working on as I go.
Target Milestone: mozilla1.1beta → mozilla1.2beta
migrating milestones now that there are more available. i'll pull these back as I work on them
Target Milestone: mozilla1.2beta → mozilla1.4beta
Assignee: mozeditor → nobody
Status: ASSIGNED → NEW
old bug. down priority to P3.
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.