Open
Bug 215700
Opened 21 years ago
Updated 2 years ago
In plaintext, U+2028 and U+2029 each display as "?" but should display as newlines
Categories
(Core :: Layout: Text and Fonts, defect)
Tracking
()
NEW
People
(Reporter: sburke, Unassigned)
References
(Blocks 1 open bug, )
Details
(Keywords: testcase)
Attachments
(1 file)
103 bytes,
text/plain
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Build Identifier: Mozilla/5.0 (Windows; U; Win98; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6
Unicode has two newline characters, U+2028 and U+2029, besides the
normal \n and \r that we all know so well.
2028 (that's hex) is "LINE SEPARATOR" and
2029 (that's hex) is "PARAGRAPH SEPARATOR".
Currently Mozilla doesn't seem to implement these, so viewing the plaintext file
at the above URL (which has several instances of these characters, and no \n's
or \r's) shows as just one line, whereas it should show as four.
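For reference, the names and general categories of the two characters can be read from the Unicode Character Database. Below is a minimal sketch in Python, used here only for illustration and not part of the original report; the helper name `describe` is hypothetical.

```python
import unicodedata


def describe(ch: str) -> tuple[str, str, str]:
    """Return (code point, Unicode name, general category) for one character."""
    # unicodedata.name() and unicodedata.category() read from the
    # Unicode Character Database bundled with Python.
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


for ch in ("\u2028", "\u2029"):
    print(*describe(ch))
```

The categories come out as Zl (line separator) and Zp (paragraph separator), each of which has exactly one member in Unicode.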
If the above URL is unreachable, you can reproduce it with this Perl program:
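use strict;
use warnings;
# Write the test file as UTF-8; it contains no \n or \r at all.
open my $out, '>:encoding(UTF-8)', 'uninl.txt' or die $!;
print {$out} "\x{FEFF}",                  # BOM
  "First paragraph.\x{2029}",             # U+2029 PARAGRAPH SEPARATOR
  "Second paragraph\x{2029}",
  "Third paragraph, first line.\x{2028}", # U+2028 LINE SEPARATOR
  "Third paragraph, second line\x{2028}";
close $out or die $!;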
Also, these Unicode newline characters occur quite frequently in
ftp://www.unicode.org/Public/TEXT/FIVEBOOKS
Reproducible: Always
Steps to Reproduce:
1. Start browser
2. View http://interglacial.com/~sburke/uninl.txt
Actual Results:
It shows a single line of text that looks like this:
First paragraph.?Second paragraph?Third paragraph, first line.?Third paragraph,
second line?
Expected Results:
It should instead display as:
First paragraph.
Second paragraph
Third paragraph, first line.
Third paragraph, second line
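The expected line structure can be illustrated outside the browser. As a rough sketch (Python, chosen only for illustration; `SAMPLE` and `expected_lines` are hypothetical names), `str.splitlines()` already treats U+2028 and U+2029 as line boundaries, so it yields the four lines a plaintext viewer would be expected to show:

```python
# Same content as the test file: a BOM, then four runs of text
# terminated by U+2029 or U+2028, with no \n or \r anywhere.
SAMPLE = (
    "\ufeff"
    "First paragraph.\u2029"
    "Second paragraph\u2029"
    "Third paragraph, first line.\u2028"
    "Third paragraph, second line\u2028"
)


def expected_lines(text: str) -> list[str]:
    """Split text into display lines, honouring U+2028 and U+2029."""
    # Drop a leading BOM, then let str.splitlines() break on every
    # line boundary it knows, which includes U+2028 and U+2029.
    return text.lstrip("\ufeff").splitlines()


for line in expected_lines(SAMPLE):
    print(line)
```

This only shows the intended splitting of the text; it says nothing about how Gecko's layout code would implement it.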
I don't know how these characters should be treated if they occur in HTML
(inside or outside of PRE and the like), whether raw or &-encoded. But that
seems a larger and quite separate issue from what I'm reporting.
Presumably the issue of U+2028 and U+2029 in plaintext has a clearer and simpler
solution.
Comment 1•21 years ago
This could well be a dupe of bug 33032, which is about all the whitespace
characters in this Unicode range. However not a lot happening on that bug :-(
See also bug 138215.
Comment 2•21 years ago
Could you attach a testcase, please?
Reporter | ||
Comment 3•21 years ago
Comment 4•21 years ago
marking as duplicate of bug 33032
transferring over the testcase
*** This bug has been marked as a duplicate of 33032 ***
Status: UNCONFIRMED → RESOLVED
Closed: 21 years ago
Resolution: --- → DUPLICATE
It's worth reading http://www.w3.org/TR/unicode-xml/#Line
Comment 6•21 years ago
Status: RESOLVED → UNCONFIRMED
Resolution: DUPLICATE → ---
Updated•20 years ago
Updated•15 years ago
Assignee: layout.fonts-and-text → nobody
QA Contact: ian → layout.fonts-and-text
Still exists in:
Mozilla/5.0 (X11; U; Linux i686; ru; rv:1.9.2.3) Gecko/20100401 Firefox/3.6.3
Comment 8•13 years ago
- The CSS 2.1 test suite recently got http://test.csswg.org/source/approved/css2.1/src/bidi-text/bidi-breaking-003.xht, which tests for this in the context of <pre>, where (as in <textarea>) there is absolutely no excuse not to support LINE SEPARATOR and PARAGRAPH SEPARATOR.
- Furthermore, HTML 4.01 (http://www.w3.org/TR/html401/struct/text.html#h-9.1) explicitly excluded LINE SEPARATOR and PARAGRAPH SEPARATOR from the categories of line breaks and whitespace. (Similarly, HTML5 leaves it all up to CSS, and CSS3 (http://dev.w3.org/csswg/css3-text/#white-space-rules) simply does not include LINE SEPARATOR and PARAGRAPH SEPARATOR in these categories.) It is the handling of these categories that constitutes the basic difference between the text inside <pre> and <textarea> and the text under other elements. Since LINE SEPARATOR and PARAGRAPH SEPARATOR are not in these categories, I do not see why they have to be handled any differently in <pre> and <textarea> than in other elements.
- Re http://www.w3.org/TR/unicode-xml/#Line, it is a set of guidelines for document authors. It is *not* a set of guidelines for what browsers should and should not support. Its recommendation to "use <xhtml:br /> instead of U+2028 and surround paragraphs by <xhtml:p> and </xhtml:p> instead of separating them with U+2029" never really held water. It certainly needs to be updated now that the HTML5 spec for <br> changed it from being a line separator to being a paragraph separator. If it really hates recommending using LINE SEPARATOR, it can recommend using <bdi><br/></bdi>, I guess. But I personally would prefer to use LINE SEPARATOR (as &#x2028;).
- It would be great if someone could mark this bug as blocking 613154; I don't have the rights.
Comment 9•13 years ago
BTW, I hope that when this bug is fixed, it will also result in support for LINE SEPARATOR and PARAGRAPH SEPARATOR in alert() and confirm() (where there is also no good excuse not to support them). Do I remember correctly that alert() and confirm() are implemented via an element with preformatted whitespace?
Updated•2 years ago
Severity: normal → S3