Using Build ID: 2002052402
Steps to reproduce problem:
1. Open HTML editor.
2. Start a bulleted list. Type "Test 1" and press Enter.
3. Press TAB. Type "Test 2" and press Enter.
4. Press TAB. Type "Test 3".
5. Select All, Copy.
6. Paste into a plain text field.
Expected results:
* Test 1
o Test 2
+ Test 3
Actual results:
* Test 1
o Test 2
+ Test 3
Additional information:
While working out the above steps, I noticed that if you replace step 5 with Select All, Cut, Paste, Select All, Copy, it pastes correctly!
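The expected results use * for the top level, o for the second level and + for the third. A minimal sketch of that rendering, assuming a hypothetical ListItem type, two spaces of indentation per level (the report's original whitespace was not preserved) and that deeper levels reuse +. This is an illustration of the expected output, not the serializer's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of one list item: its text and nesting depth (0 = top level).
struct ListItem {
  std::string text;
  std::size_t depth;
};

// Bullet per nesting level, taken from the expected results (* then o then +).
// Deeper levels reuse the last bullet (an assumption, not stated in the report).
inline char BulletForDepth(std::size_t depth) {
  static const char kBullets[] = {'*', 'o', '+'};
  const std::size_t n = sizeof(kBullets) / sizeof(kBullets[0]);
  return kBullets[depth < n ? depth : n - 1];
}

// Renders items as plain text, indenting each level by indentWidth spaces
// (the default width of 2 is an assumption).
inline std::string RenderList(const std::vector<ListItem>& items,
                              std::size_t indentWidth = 2) {
  std::string out;
  for (const ListItem& item : items) {
    out.append(item.depth * indentWidth, ' ');  // indentation for this level
    out += BulletForDepth(item.depth);
    out += ' ';
    out += item.text;
    out += '\n';
  }
  return out;
}
```

For the three items in the report, RenderList yields one line per item, each indented one level further than the previous one.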
Created attachment 89682 Patch V1.0
I tested the patch with all the build test cases and quite a few others that I could think of. I'm wondering whether we really need to add a space for the first character when the nsIDocumentEncoder::OutputSelectionOnly flag is set, i.e. when (!mStartedOutput && mFlags & nsIDocumentEncoder::OutputSelectionOnly) is true.
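The clause is meant to fire only before any output and only when the OutputSelectionOnly bit is set in mFlags. A minimal sketch of that bit test, using hypothetical flag values (the real nsIDocumentEncoder constants are not given in this bug) and a hypothetical ShouldKeepLeadingSpace helper. It also shows why writing `|` instead of `&` would make the flag check always true:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag bits standing in for nsIDocumentEncoder output flags.
constexpr std::uint32_t kOutputSelectionOnly = 1u << 0;
constexpr std::uint32_t kOutputFormatted = 1u << 1;

// flags | bit is nonzero whenever bit is nonzero, so as a condition it is
// always true and does not test the flag at all.
constexpr bool OrTest(std::uint32_t flags, std::uint32_t bit) {
  return (flags | bit) != 0;
}

// flags & bit is nonzero only when that bit is set in flags.
constexpr bool AndTest(std::uint32_t flags, std::uint32_t bit) {
  return (flags & bit) != 0;
}

// Simplified form of the clause (hypothetical name). Because & binds tighter
// than &&, the expression !started && flags & bit parses the same way.
constexpr bool ShouldKeepLeadingSpace(bool startedOutput, std::uint32_t flags) {
  return !startedOutput && AndTest(flags, kOutputSelectionOnly);
}
```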
Comment on attachment 89682 Patch V1.0
I think I see why this was there. This does fix the problem for lists because, even though mStartedOutput is false, we have actually already output something (the bullet). But for text that has no bullet, starts with a space and isn't inside a PRE, the space will be lost. Try loading any page (e.g. the start page), select a space followed by a word, then paste the plain text somewhere: with the patch, the space is lost.
Perhaps the solution is to output the space when we output the bullet, and set mStartedOutput at that time, so that we won't get into this clause? Or would that cause problems elsewhere?
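A minimal sketch of the proposal, using a hypothetical MiniSerializer class that only models this interaction (it is not nsPlainTextSerializer). It assumes the indentation whitespace reaches the serializer as a leading space on the item text; the bulletStartsOutput switch contrasts the current behavior with the proposed one, where emitting the bullet and its space also sets mStartedOutput:

```cpp
#include <cassert>
#include <string>

// Hypothetical, heavily simplified serializer for illustration only.
class MiniSerializer {
 public:
  MiniSerializer(bool selectionOnly, bool bulletStartsOutput)
      : mSelectionOnly(selectionOnly), mBulletStartsOutput(bulletStartsOutput) {}

  // Emits a bullet and its separating space. With the proposed change
  // (bulletStartsOutput == true) this also counts as having started output.
  void OpenListItem(char bullet) {
    mOutput += bullet;
    mOutput += ' ';
    if (mBulletStartsOutput) {
      mStartedOutput = true;
    }
  }

  // Leading whitespace is normally dropped, except that the first output of a
  // selection keeps one space (e.g. selecting " word" on a page).
  void AddText(const std::string& text) {
    const std::string::size_type first = text.find_first_not_of(" \n");
    if (first == std::string::npos) {
      return;  // whitespace-only text is ignored in this sketch
    }
    if (first > 0 && !mStartedOutput && mSelectionOnly) {
      mOutput += ' ';
    }
    mOutput += text.substr(first);
    mStartedOutput = true;
  }

  const std::string& Output() const { return mOutput; }

 private:
  bool mSelectionOnly;
  bool mBulletStartsOutput;
  bool mStartedOutput = false;
  std::string mOutput;
};
```

In this model the current behavior produces a doubled space after the bullet, while the proposed behavior keeps a single space and still preserves the leading space of a bullet-less selection.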
In fact, PlainTextSerializer is working properly, and the bug seems to be in Composer. When Mozilla Composer creates a list, it adds a newline and two spaces after every UL and OL, and this indentation keeps increasing with nesting. For HTML generated by following the steps in this bug (i.e. through the Mozilla Editor), the DUMP CONTENT output is as follows:

body@02A2A320 refcount=4<
  Text@02A3A7A0 refcount=3<\n>
  ul@02A3A6F0 refcount=3<
    Text@02A3A6A0 refcount=3<\n >
    li@02A3A5B0 refcount=4<
      Text@02A3A560 refcount=3<test1>
    >
    Text@02A3A470 refcount=3<\n >
    ul@02A3A410 refcount=3<
      Text@02A3A3C0 refcount=3<\n >
      li@02A3A320 refcount=4<
        Text@02A3A2D0 refcount=3<test2>
      >
      Text@02A3A1E0 refcount=3<\n >
      ul@02A3A140 refcount=3<
        Text@02A3A0F0 refcount=3<\n >
        li@02A3A050 refcount=4<
          Text@02A3BFD0 refcount=3<test3>
          br@02A3BEA0 refcount=3<>
          Text@02A3BDF0 refcount=3<\n >
        >
        Text@02A3BCC0 refcount=3<\n >
      >
      Text@02A3BBF0 refcount=3<\n >
    >
    Text@02A3BB50 refcount=3<\n>
  >
  Text@02A3A7F0 refcount=3<\n>
>
>

However, if the HTML file is written with a text editor and no additional spaces or newlines are added, the serializer works perfectly. If there is some reason for those spaces (though it doesn't look like there is), we can try a band-aid fix.
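One possible shape for such a band-aid is to drop whitespace-only text nodes that are direct children of a UL or OL, which is where the Composer indentation appears in the dump. A minimal sketch over a hypothetical Node type (not Mozilla's content model) with a hypothetical StripListIndentation function; whitespace inside an LI is left alone. Note that this cannot tell indentation apart from a space the author meant to keep:

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical DOM-like node: an element with a tag name, or a text node.
struct Node {
  std::string tag;   // "ul", "ol", "li", "br", ...; empty for text nodes
  std::string text;  // content of a text node
  std::vector<Node> children;
};

inline bool IsWhitespaceOnly(const std::string& s) {
  return s.find_first_not_of(" \t\r\n") == std::string::npos;
}

inline bool IsListContainer(const Node& n) {
  return n.tag == "ul" || n.tag == "ol";
}

// Removes whitespace-only text children of UL/OL elements, recursively.
inline void StripListIndentation(Node& node) {
  if (IsListContainer(node)) {
    std::vector<Node> kept;
    for (Node& child : node.children) {
      if (child.tag.empty() && IsWhitespaceOnly(child.text)) {
        continue;  // indentation between list tags
      }
      kept.push_back(std::move(child));
    }
    node.children = std::move(kept);
  }
  for (Node& child : node.children) {
    StripListIndentation(child);
  }
}
```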
Oh, those are the indentation spaces, and I don't know how to distinguish them from a genuine space. However, if parser bug 15378 can be fixed to ignore these newlines and whitespace, this bug will simply go away.