Open Bug 347689 Opened 14 years ago Updated 3 months ago

Improve plaintext serialization and editing of spaces

Categories: Core :: DOM: Serializers (defect)
Platform: x86, Linux
Type: defect
Priority: Not set
Severity: critical
Reporter: bzbarsky
Assignee: Unassigned
Keywords: dataloss

See bug 218277 comment 59 for David's proposal (which sounds pretty reasonable to me).  See the rest of that bug for related discussion.
See also bug 218277 comment 57 (a couple comments earlier) for an alternate proposal which would be a lot cleaner, as well as easier to implement, *if* we could find a character that no one expects to see in serialized output (I don't know if zwsp is usable for that or not).
No, it's not really cleaner to use a reserved character to do nbsp/space tricks.

It's better to implement 218277#c59, but what would be the lightest way of doing that?

When you use View Selection Source on a text/plain page, it reveals that text/plain pages are in fact handled internally as HTML wrapped in a <pre> tag.

So I think that, similarly, the easiest approach would be not to create a new plain-text editor component, but to make sure that whenever text mode is needed, the text edited in the HTML editor is wrapped as a whole in <pre>, and that pressing the space key inside a <pre> node generates only real spaces.
The change in the converter would then be to convert nbsp only when outside a <pre> node, and never touch nbsp inside <pre> nodes.
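
As a rough illustration of that converter rule (not the actual Gecko serializer, which is C++), here is a minimal Python sketch built on the standard-library html.parser. The class and function names are hypothetical, and the sketch deliberately ignores everything a real serializer also does (whitespace collapsing, line wrapping, block elements); it only shows nbsp being converted outside <pre> and left alone inside it.

```python
from html.parser import HTMLParser

NBSP = "\u00a0"


class PreAwareTextExtractor(HTMLParser):
    """Hypothetical helper: collects text, converting nbsp only outside <pre>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.pre_depth = 0   # how many <pre> elements we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "pre":
            self.pre_depth += 1

    def handle_endtag(self, tag):
        if tag == "pre" and self.pre_depth > 0:
            self.pre_depth -= 1

    def handle_data(self, data):
        # Outside <pre>: treat nbsp as an editing artifact and emit a space.
        # Inside <pre>: never touch nbsp.
        if self.pre_depth == 0:
            data = data.replace(NBSP, " ")
        self.chunks.append(data)


def html_to_text(markup):
    parser = PreAwareTextExtractor()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)
```

With this sketch, html_to_text("a&nbsp; b<pre>c&nbsp;d</pre>") yields "a  b" followed by "c", a preserved nbsp, and "d".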

And you know what? I tested with trunk SeaMonkey Composer, and inside <pre> tags the space key already generates only actual spaces. The nbsp trick is used elsewhere, but is already disabled inside <pre>.
=> dataloss, critical as this blocks another dataloss bug
Severity: normal → critical
Keywords: dataloss
Assignee: dom-to-text → nobody
QA Contact: dom-to-text
(In reply to Boris Zbarsky [:bz] (vacation until 7/7) from comment #0)
> See bug 218277 comment 59 for David's proposal (which sounds pretty
> reasonable to me).  See the rest of that bug for related discussion.

Bugs whose description is buried in some other bug are less likely to be understood and get attention...
So here's the description of this bug as envisioned by reporter:

(In reply to David Baron :dbaron: ⌚️UTC-7 from bug 218277 comment 59)
> It seems like what we should really be doing is:
>  * when we're editing plain text, store multiple presses of space as spaces (this is a change)
>  * when we're editing HTML, store multiple presses of space using non-breaking spaces for all but the last press (tricky with deletion) (we probably do this fine already)
>  * when serializing HTML to text, convert runs of non-breaking spaces terminated by a space to spaces
>  * when using nsPlainTextSerializer to convert text to text (if we need to use it at all, although we seem to now), don't mess with spaces
>
> Does this make sense?
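
For readers who want the HTML-editing and HTML-to-text rules of the quoted proposal in concrete form, here is a small Python sketch. It is only an illustration under stated assumptions: the function names are hypothetical, it works on plain strings rather than a DOM, and it does not attempt the deletion handling that the proposal itself calls tricky.

```python
import re

NBSP = "\u00a0"

# One or more nbsp characters immediately followed by a regular space.
_NBSP_RUN_THEN_SPACE = re.compile(NBSP + "+ ")


def spaces_for_html_editing(presses):
    """Editing HTML: n presses of space -> (n - 1) nbsp plus one regular space."""
    if presses <= 0:
        return ""
    return NBSP * (presses - 1) + " "


def html_text_to_plain(text):
    """Serializing HTML to text: nbsp runs terminated by a space become spaces."""
    return _NBSP_RUN_THEN_SPACE.sub(lambda m: " " * len(m.group(0)), text)
```

Three presses of space between "a" and "b" are stored as two nbsp plus a space and come back out as three regular spaces. An nbsp that is not followed by a space (say, between "10" and "km") is left untouched. Note that in this sketch a deliberately typed nbsp that happens to precede a regular space cannot be told apart from an editor-generated one.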

Well, NO.
As a last resort if we can't agree on anything better.

The main reason why we're currently wrongly eliminating non-breaking spaces seems to be that we're needlessly creating them in the first place: when the user inputs multiple spaces in the HTML editor, we automagically convert them into a sequence of non-breaking spaces followed by one breaking space. WHY!?

I'm totally convinced that we must get rid of all that automagical conversion circus. Whatever reason made us convert user input of multiple regular spaces into non-breaking spaces when editing HTML, that reason is now both BAD and OBSOLETE. I'm actually surprised that such non-standard hacks are still haunting us today, and even more surprised that they come from Mozilla, an advocate for standards compliance and for teaching the web. An HTML editor which silently converts multiple regular spaces into non-breaking spaces is not only violating the standards, but also teaching users the wrong thing. They'll get used to our deviant automagical behaviour and assume that regular spaces "just work" to do the trick (whichever trick that might be, I'm not sure), a fatally wrong conclusion.

Instead, Mozilla should be at the forefront of teaching the difference between normal spaces and non-breaking spaces, so that users are aware and can use the right one for their purposes, and know that their favorite editor / mail composer / whatever app will just render whichever flavor exactly as entered. Surely non-breaking spaces are no longer an HTML design tool that should be encouraged for general whitespace layout. This seems to be a relic from times immemorial when CSS styling was not yet the order of the day.
(In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment #4)
> (In reply to David Baron :dbaron: ⌚️UTC-7 from bug 218277 comment 59)
> > It seems like what we should really be doing is:
> > ...
> Well, NO.
> ...
> I'm totally convinced that we must get rid of all that automagical
> conversion circus. Whatever reason made us convert user input of multiple
> regular spaces into non-breaking spaces when editing HTML, that reason is
> now both BAD and OBSOLETE.

Ummm, I spoke too soon. Among so many use cases where we fail, I overlooked that there's indeed a use case where the automagical conversion of multiple spaces into nbsp's with a trailing space makes sense: it preserves the WYSIWYG spacing typed by the user in word-processor fashion. In HTML messages, multiple consecutive spaces would otherwise be collapsed into a single space, which is undesired. There may be better ways to preserve this whitespace, along the lines of bug 218277 comment 23:
> .moznbsp { white-space: pre; display: inline; }
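
To make that alternative a bit more concrete, here is a hedged Python sketch of how typed space runs might be marked up for such a rule. Only the class name moznbsp comes from the quoted CSS; wrapping runs in a <span>, the two-space threshold, and the function name are assumptions made for illustration, not something the referenced comment specifies.

```python
import html
import re

# Runs of two or more regular spaces, which HTML would otherwise collapse.
_SPACE_RUN = re.compile(r" {2,}")


def wrap_space_runs(text):
    """Hypothetical: keep real spaces, but wrap runs in a white-space: pre span."""
    escaped = html.escape(text)  # escapes &, <, > and quotes; spaces are untouched
    return _SPACE_RUN.sub(
        lambda m: '<span class="moznbsp">' + m.group(0) + "</span>",
        escaped,
    )
```

Under these assumptions the stored text contains only regular spaces, so a plain-text serializer would have nothing to convert back; it would just drop the span markup.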

So FTR, David's proposal in comment 59 now looks pretty good to me for overall direction.