Open Bug 62188 Opened 19 years ago Updated 11 years ago

Support start number for lists in plain text output

Categories

(Core :: DOM: Serializers, defect, P3, minor)

defect

Tracking

()

REOPENED

People

(Reporter: bratell, Assigned: t_mutreja)

References

(Depends on 1 open bug)

Details

(Whiteboard: [#140541 will fix this])

Attachments

(1 file, 3 obsolete files)

Spinoff of bug 61173. We will start all numbered lists on 1 regardless if 
someone copy'n'pastes the number 4711 to 4717 in a list. There is support in 
html to specify number of an item in a list. We should look at that in the 
plaintext converter.
Bratell, would you please provide some more information about this bug. I tried 
with following:

I copied a list of following from the browser :
   1.  one
   2. two
   3. three
   4. four

then pasted it on notepad and got the following :
   1.  one
   2. two
   3. three
   4. four


but when I copied :
   2. two
   3. three
   4. four

 and pasted it on notepad I got :
* two
* three
* four

whereas if I pasted the same on composer I got: 
   1. two
   2. three
   3. four

Do you mean that the numbering is maintained during HTML serialization but not 
in PlainTextSerialization. Precisely, do we want 

   1. two
   2. three
   3. four

to appear while copying on the notepad.


What I meant this bug for was to add support to nsPlainTextSerializer for the
start attribute in <ol>. That way the editor/copy code could add that attribute
to the copied list items so that they would have the same numbers in the
cut'n'paste. 

If the list is a ranking and I copy the part between 80 and 89 into a list to
show how bad my friend is, it's bad that the output list is numbered 1 to 10. 

That you got an unordered list must be because the copy code doesn't output a
<ol> around the copied <li>:s. Maybe this is the reason they don't? 

So to answer your question, I want notepad to get the following:

   2. two
   3. three
   4. four


Anyway, it's problably quite simple and safe to add the support in
nsPlainTextSerializer. May be a one-liner almost, but I have bigger fish to
catch right now so I will probably not look at it sometime soon and when that is
done the copy code have to be changed to use the start attribute and I have no
idea how simple or difficult that is.


Attached patch Patch to fix this. (obsolete) — Splinter Review
Currently OL and LI tags are not being passed to the serializer so we have no
clue about the ordinal number of LI elements. Problem got more complicated in
order to take care of nested lists, partially selected ordered lists (where we
may completely loose track of the ordinal number for LI if "value" attribute is
not explicitly set with that LI) and to keep the view source cleaner to maximum
possible extent. Also we could not set any other temporary attribute for
internal processing of LI value as that attribute would not be interpreted
correctly if we paste the content to any other HTML editor.

I made following modifications to the code to take care of copying list items
with 'n' levels of nesting and pasting them to some plain text editor / HTML
editor / MSWord etc. :

1. Added OL and LI in nsHTMLCopyEncoder::IncludeInContext().
2. Added a member variable |mIsCopying| to nsDocumentEncoder, which is true
only when the content is coming via nsHTMLCOPYEncoder.
3. While copying the partial list (more complicated when the selected part is
having nested lists too) as we loose track of the actual ordinal number of LI
elements if it's not explicitly set, we added a temporary "value" attribute to
the first LI element of an OL within the selected range. Based on this, ordinal
numbers for the remaining LIs are calculated during serialization.
4. To get the value of this temporary attribute, we start from the first
selected LI item and traverse the DOM Tree back till we get either a previous
sibling LI node with an explicit "value" attribute or there is no more previous
sibling to traverse. In case we don't get any sibling node with a value
attribute, we need to decide the ordinal number based on "start" attribute of
the parent OL node. If parent OL does not contain explicitly mentioned "start"
attribute, set it to a default value of 0. Also to get the right ordinal, we
need to keep track of the number of previous siblings we traversed.
5. Now as the "value" attributes are added to the original DOM Nodes, we have
to avoid displaying them while "view source". To take care of this, as soon as
we add that temporary "value" attribute we push the node in a stack and later
when the destructor of nsDocumentEncoder is called after the copy part is done,
we clean up nodes kept in the stack for removal of the added "value" attribute.
Remove the stack once this clean up is done.
6. By this time, clipboard content is supposed to contain clearly mentioned
ordinal number wherever required, so during HTMLSerialization ordinal numbers
are correctly generated.
7. In PlainText Serializer, we defined a data structure a keep start attribute
of an OL and number of	LI nodes traversed. Later whenever LI tag is
encountered, we check if it contains explicit "value" attribute otherwise we
read the stack to get the start value and number of siblings traversed
before(For that OL). Addition of these two values gives the right ordinal value
for the LI node.
Comment on attachment 69038 [details] [diff] [review]
Patch to fix this.

I can only comment on the nsPlainTextSerializer part but I think you're making
things a little more complicated than needed. 

In the nsPlainTextSerializer there is already a stack that is used to keep
track of the next number for the current ordered list, mOLStack. 

you see the mOLStack[mOLStackIndex-1]++, where the number is increased (removed
by your patch). When you encounter the value attribute you can just set that
value directly instead of increasing by one. I don't see you ever using the
startval number you're saving, except for copying it to other state structs.   
Just using the existing stack should make your patch much smaller.
Giving the bug to Tanu, since he/she has made good progress on it.
Assignee: bratell → tmutreja
RemoveElementAt might be a little cleaner than ReplaceElementAt(nsnull...),
though I expect the implementations are pretty much the same.

I agree with Daniel about using the existing stack; or if you can't use it,
please delete it or integrate it into your new variable, so we don't have two
different OL stacks being maintained.
Thanks Daniel & Akkana! I think I really complicated the stuff. Modifying
nsPlainTextSerializer here, as per your suggestions.
Keywords: patch
Whiteboard: [patch need r/sr=]
Comment on attachment 69222 [details] [diff] [review]
Incorporating the suggestions... 

Much simpler, thanks!  Looks fine to me, r=akkana.
Personally I'd probably move the declaration of startAttr inside the next block
along with startVal, but I'll leave the decision on that to you.
Attachment #69222 - Flags: review+
Comment on attachment 69222 [details] [diff] [review]
Incorporating the suggestions... 

Yes, this looks fine.

If I may ask for a favour, that would be to comment this:
 if (mULCount + mOLStackIndex == 0)
-      EnsureVerticalSpace(1);
+      EnsureVerticalSpace(0);

Those constants are fragile since the serializer is used in so many places.
Comment on attachment 69222 [details] [diff] [review]
Incorporating the suggestions... 

I'm not at all happy about us using a "value" attribute in the DOM for storing
the number of an list item, this is something that should be dealt with in the
text output code, w/o touching the DOM tree. For all we know someone else
could've stored valuable information in the "value" attribute on the list
items.

Keep this information in the text output code.
Attachment #69222 - Flags: needs-work+
Johnny, here we are not setting "value" attribute if LI is already having it. 
But I see no other way to overcome special cases like the one below:

If the original list is something like:

<OL>
<LI>ONE</LI>
<LI VALUE = 100>TWO</LI>
<LI>THREE</LI>
<OL>
<LI>FOUR</LI>
</LI>FIVE</LI>
</OL>
<LI>SIX</LI>
</OL>

and then if I make a selection for copy starting from the LI having "THREE", we 
almost loose the context information for this node. I could not see any other 
way to set an ordinal as 101 for this node. Only if I set a value attribute to 
this node and pass further, serializer can know the actual ordinal for this one 
otherwise serializer will always take this with an ordinal as 3.

So, as far as I understand your concern here, we are not doing anything with an 
LI having "value" attribute so there is no possibility of loosing any 
information. 
Moreover, as we are manipulating this additional attribute only while copying 
and cleaning the node as soon as we are done with copy, there should be no 
eventual impact of this.
Once I'm clear with this part, I'll update the patch for incorporating other 
suggestions too.
Even if we're sure we don't loose value attributes when copying LI's, modifying
the DOM tree while doing the copy is IMO not a good design.

Why not simply have a trivial hash (plhash, or pldhash) that you store the
"value"'s in as you see them (i.e. key on the LI's nsIDOMNode pointer value or
something, and store the "value" in the hash based on that key), and then use
that when you serialize the document? That would leave the DOM tree alone (and
not cause reflows, mutation events, and who knows what as a result of setting
the "value" attributes) and isolate this logic completely inside the serializer
code. This is not hard, and is IMO well worth doing.
Johnny & Akkana, when we use nsPlainTextSerializer for pasting the content on 
text editor, we have Parser Nodes instead of DOM nodes. So putting the DOM 
nodes / elements on hash may not work because while pasting it we will not be  
having handle to the DOM. (Correct me, if I'm wrong here !!!) 

Other solution that I can think now is to set the value attribute to LIs while 
doing HTML serialization(during "copy") in nsHTMLContentSerializer. If 
feasible, that would just be shifting the logic from nsDocumentEncoder--> 
nsHTMLContentSerializer. 

Please suggest if you have anything better in mind and see if the above makes 
some sense. 
If we're serializing to plaintext from parser nodes there is no problem here,
right? In that case we'll be outputting every LI that the parser hands us, and
we'll have every LI so we can easily compute the numbers for every LI as we
serialize them. However, if we're serializing a partial DOM tree into HTML,
we'll have a DOM tree, and we can use the hash ahd output the value atributes
that are needed to make the output correct.
In this patch I've shifted the logic of setting "value" attributes to LI from
nsDocumentEncoder --> nsHTMLContentSerializer and hence we are no more
modifying the original DOM tree.

Daniel, I'm still continuing with the EnsureVerticalSpace(0), as on skipping
this I see 2 new line characters whenever I paste an OL to a plain text editor.
Please let me know if I'm missing some cases where you think it mandatory to
comment it out.
Please test sending OL:s in plain text mails to. If that looks ok, it's fine
with me and r=bratell for the nsPlaintextSerializer part. Just make sure this
matches the tests so we don't break tinderbox. 
Attachment #70251 - Flags: superreview+
Comment on attachment 70251 [details] [diff] [review]
Moving the logic from nsDocumentEncoder  to nsHTMLContentSerializer...

- In nsHTMLContentSerializer::IsFirstChildOfOL():

+    if (state->isFirstListItem)
+      return PR_TRUE;
+    else
+      return PR_FALSE;
+  }
+  else
+    return PR_FALSE;

two else-after-return's, doing it this way is much cleaner:

+    if (state->isFirstListItem)
+      return PR_TRUE;
+
+    return PR_FALSE;
+  }
+
+  return PR_FALSE;

Other than that, sr=jst
Status: NEW → ASSIGNED
Whiteboard: [patch need r/sr=] → [patch need a=]
In nsHTMLContentSerializer::IsFirstChildOfOL(), replaced
+    if (state->isFirstListItem)
+      return PR_TRUE;
+    else
+      return PR_FALSE;
+  }
+  else
+    return PR_FALSE;

by

+    if (state->isFirstListItem)
+      return PR_TRUE;
+
+    return PR_FALSE;
+  }
+
+  return PR_FALSE;
Attachment #69038 - Attachment is obsolete: true
Attachment #69222 - Attachment is obsolete: true
Attachment #70251 - Attachment is obsolete: true
Target Milestone: --- → mozilla1.0
Comment on attachment 71463 [details] [diff] [review]
Including jst's comments.

a=asa (on behalf of drivers) for checkin to 0.9.9 and the mozilla trunk.  

Also, moving reviews forward to the current patch. Let's try to keep the
relevant reviews attached to the relevant patches. Thanks.
Attachment #71463 - Flags: superreview+
Attachment #71463 - Flags: review+
Attachment #71463 - Flags: approval+
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+,
topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword.  Please send any
questions or feedback about this to adt@netscape.com.  You can search for
"Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
Whiteboard: [patch need a=] → [has-approval]
Fixed with checkin
D:\mozilla\content\base>cvs commit
cvs commit: Examining .
cvs commit: Examining public
cvs commit: Examining src
Checking in public/nsIContentSerializer.h;
/cvsroot/mozilla/content/base/public/nsIContentSerializer.h,v  <--  nsIContentSe
rializer.h
new revision: 1.10; previous revision: 1.9
done
Checking in src/nsDocumentEncoder.cpp;
/cvsroot/mozilla/content/base/src/nsDocumentEncoder.cpp,v  <--  nsDocumentEncode
r.cpp
new revision: 1.63; previous revision: 1.62
done
Checking in src/nsHTMLContentSerializer.cpp;
/cvsroot/mozilla/content/base/src/nsHTMLContentSerializer.cpp,v  <--  nsHTMLCont
entSerializer.cpp
new revision: 1.40; previous revision: 1.39
done
Checking in src/nsHTMLContentSerializer.h;
/cvsroot/mozilla/content/base/src/nsHTMLContentSerializer.h,v  <--  nsHTMLConten
tSerializer.h
new revision: 1.14; previous revision: 1.13
done
Checking in src/nsPlainTextSerializer.cpp;
/cvsroot/mozilla/content/base/src/nsPlainTextSerializer.cpp,v  <--  nsPlainTextS
erializer.cpp
new revision: 1.48; previous revision: 1.47
done
Checking in src/nsPlainTextSerializer.h;
/cvsroot/mozilla/content/base/src/nsPlainTextSerializer.h,v  <--  nsPlainTextSer
ializer.h
new revision: 1.18; previous revision: 1.17
done
Checking in src/nsXMLContentSerializer.cpp;
/cvsroot/mozilla/content/base/src/nsXMLContentSerializer.cpp,v  <--  nsXMLConten
tSerializer.cpp
new revision: 1.21; previous revision: 1.20
done
Checking in src/nsXMLContentSerializer.h;
/cvsroot/mozilla/content/base/src/nsXMLContentSerializer.h,v  <--  nsXMLContentS
erializer.h
new revision: 1.10; previous revision: 1.9
done
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
This commit added a compiler warning

content/base/src/nsHTMLContentSerializer.cpp:1147
 `struct nsHTMLContentSerializer::olState * state' might be used uninitialized
in this function

Would not it be better to change it :

--- nsHTMLContentSerializer.cpp 2002/03/12 08:21:51     1.40
+++ nsHTMLContentSerializer.cpp 2002/03/12 20:47:53
@@ -1144,13 +1144,13 @@

   if (parentName.EqualsIgnoreCase("OL")) {
     olState defaultOLState(0, PR_FALSE);
-    olState* state;
+    olState* state=nsnull;
     if (mOLStateStack.Count() > 0)
       state = (olState*)mOLStateStack.ElementAt(mOLStateStack.Count()-1);
     /* Though we should never reach to a "state" as null at this point as
     all LI are supposed to be inside some OL and OL tag should have pushed
     a state to the mOLStateStack.*/
-    if (!state || mOLStateStack.Count() == 0)
+    if (!state)
       state = &defaultOLState;

     if (state->isFirstListItem)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
This should've been filed as a new bug, but let's not worry about that now.
sr=jst for the above patch.
a=asa (on behalf of drivers) for checkin to the 1.0 trunk of the patch in comment 22
Fix checked in for ayn2@cornell.edu

D:\mozilla\content\base\src>cvs commit
cvs commit: Examining .
Checking in nsHTMLContentSerializer.cpp;
/cvsroot/mozilla/content/base/src/nsHTMLContentSerializer.cpp,v  <-- 
nsHTMLContentSerializer.cpp
new revision: 1.42; previous revision: 1.41
done
Status: REOPENED → RESOLVED
Closed: 18 years ago18 years ago
Resolution: --- → FIXED
I had to back out part of the patch.  reopening
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
clearing milestone
Target Milestone: mozilla1.2alpha → ---
Does bug 136595 fit into this discussion at all, or is the list-style-type/CSS
stuff another layer on top of it all which should be dealt with separately?
Backed part of the patch:

@@ -1300,7 +1303,9 @@
       tag.get() == nsHTMLAtoms::h3       ||
       tag.get() == nsHTMLAtoms::h4       ||
       tag.get() == nsHTMLAtoms::h5       ||
-      tag.get() == nsHTMLAtoms::h6) {
+      tag.get() == nsHTMLAtoms::h6       ||
+      tag.get() == nsHTMLAtoms::ol       ||
+      tag.get() == nsHTMLAtoms::li) {
     return PR_TRUE;
   }

Though the remaining patch has not much meaning without this, we did not back 
the whole patch out to make sure that the problem to be fixed in future is just 
a smart addition of OL, LI, UL in nsDocumentEncoder::IncludeInContext().

I'm opening a new bug for that and marking it dependent on that.
Depends on: 140541
Whiteboard: [has-approval] → [#140541 will fix this]
*** Bug 149956 has been marked as a duplicate of this bug. ***
Any update on this? Just running through some old bugs i'm cc'd on and see no
movement in almost 7 months
QA Contact: sujay → dom-to-text
You need to log in before you can comment on or make changes to this bug.