Closed Bug 57047 Opened 24 years ago Closed 24 years ago

We are missing <pre> when there are other tags inside the <pre>

Categories

(Core :: DOM: Serializers, defect, P3)

x86
Windows 2000
defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: bratell, Assigned: bratell)

Details

(Keywords: dataloss, Whiteboard: fix in hand)

Attachments

(1 file)

If you for instance cut'n'paste from a page looking like below we drop the line 
breaks. That is because we, in the plaintext serializer (nsHTMLToTXTSinkStream 
mk.2) only look for <pre> at the top of the stack and in this case the <font> is 
above the <pre>. 

I've no idea if this is legal HTML but it is used and we fail on it. I think it 
might be too expensive to look at the whole stack, but maybe look at the top 
3-4?


<pre>
  <font color="green">// The system has IP address from Private Address 
  // area,only. Internet from the computer can be accessable 
  // via Proxy. If user turn on proxy connection flag, the 
  // function believe Internet accessable.</font>
</pre>
Marking dataloss since we drop the text's new lines and sometimes those are 
really important.
Whiteboard: dataloss
Yes, I've often wondered when/if this might become a problem.  Easy enough to
look at the top few.  I had previously written a routine that looked all the way
up to the top of the stack (IsInPre), but someone else did a rewrite and got rid
of it.
Status: NEW → ASSIGNED
Keywords: dataloss
Whiteboard: dataloss
Target Milestone: --- → Future
Are these hard numbers (look only for the nenxt few) really necessary? <pre> can
contain line tags, but not block tags. So, I propose to look (down) into the
stack until you find a block-level tag. Just exactly this, if it is <pre> or not.
It might be much more expensive to check agains a list of inline tags than to 
just traverse the whole stack looking for <pre>. Easiest might be to, in 
OpenContainer, count the number of <pre>:s on the stack and keep the number in a 
varable which is decreased which <pre>:s are closed in CloseContainer. Then we 
only have to check if "m_pre_on_stack>0". 
I've constructed a fix for this. It's to look for a pre in the stack until we 
reaches another block level tag. Since the check if a tag is block level or not 
is quite expensive it reduces performance by about 2%. Nothing that anyone will 
notice I guess.

I've also rearranged the two if statements to make the IsPre the final condition 
since it's the most expensive.

I'm looking for a review of this. Akkana? Ben?
Assignee: akkana → bratell
Status: ASSIGNED → NEW
Whiteboard: fix in hand
Target Milestone: Future → mozilla0.9
r=akkana.  Looks good, thanks!
Wouldn't it be a bit cheaper to just maintain state in the serializer indicating
whether a PRE element had been opened? In fact, shouldn't that be what
mPreformatted indicates? That's not the case currently - mPreformatted is set
based on checks for the existence of style properties related to preformatted
block. It seems, though, that it could be subsumed to generally indicate that we
are in a preformatted blocks. It might need to change from a boolean to an
integer count for this.
I thought about that too, but that would be difficult to implement if there are
for instance a table or something else non-pre inside a <pre>. My reasoning was
that this was the way that would break least. If I'm wrong I can do it the other
way. I'm sure you know those things better than me.

For performance my measurements showed about 90 ms to make plain text from the
url in this bug and this code added somewhere close to 2 ms to that.
The other reason not to subsume mPreformatted is that it could be toggled off
prematurely: for instance, if the style says the whole document is formatted, or
if the whole document is wrapped in a <pre>, then somewhere inside there is a
<pre> tag (legal and not too unlikely in the first case, illegal in the second
but could still happen), when the inner <pre> was closed we would prematurely
toggle off mPreformatted.  With Daniel's stack check, this can't happen.
I think that was why Vidur said we would have to switch from a bool to an 
integer (counter). Then we increase the count every time we see a <pre> and 
decrease when we see a </pre>. If it's preformatted from the beginning we start 
the count at 1 and it will never reach zero. But I still like my patch better 
unless someone has any more arguments against it.

Oh, sorry, I missed the integer counter part of the comment.  Yes, that would
work, and in fact, I think one of the many iterations of this code was
implemented that way (as are list counters).  It's really just a matter of
taste.  I tend to prefer Daniel's way because it's straightforward and a little
more reliable (no worry about forgetting to add the increment or decrement code
where it needs to be).  And besides, we have his fix in hand, another advantage
over the other solutions. :-)
I can live with Daniel's fix as well. sr=vidur.
Fix checked in.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Please verify and mark VERIFIED-FIXED...thanks!
Verified on win32.
Status: RESOLVED → VERIFIED
Just FYI,
this bug also exists in MSIE 5.5. Bad Microsoft. :-)
1. the font tag requires a close tag and
2. should be interpreted as a font change inside pre because they are HTML tags (albeit old). if < and > are desired to be seen, they should be encoded with &lt; and &gt; some groups do not understand this yet and new beginners to HTML find this a difficulty. it should be in the manual on the pre element.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: