Closed
Bug 128181
Opened 23 years ago
Closed 23 years ago
ncr between 128-159 does not work in html attribute value
Categories: Core :: Internationalization, defect, P3
Tracking: VERIFIED FIXED, mozilla1.0
People: Reporter: morse, Assigned: shanjian
Details: Keywords: intl, Whiteboard: [ADT3]
Attachments: 8 files, 2 obsolete files
Bring up the following HTML. You will notice that the dash (character code hex 96,
decimal 150) displays fine by itself. But inside a button it becomes either a
vertical bar or a question mark, depending on whether it was written as a single
8-bit character or escaped using the ampersand notation.
<html>
<body>
Here's a hyphen as a single ascii character (2D): "-"<br>
Here's same hyphen in a button: <input type="button" value=" - "><br>
<br>
Here's a dash as a single ascii character (96): "–"<br>
Here's same dash in a button: <input type="button" value=" – "><br>
<br>
Here's a dash in escaped notation (ampersand #150;): "&#150;"<br>
Here's same dash in a button: <input type="button" value=" &#150; "><br>
</body>
</html>
Reporter
Comment 1•23 years ago
Reporter
Comment 2•23 years ago
Note: Although the dash character in my sample code above looks identical
to the hyphen character, they are really two different character codes.
Comment 3•23 years ago
Worked fine on my machine but I can verify that the problem exists on Steve's
NT4 machine. madhur, would you be able to test this and see if it's a problem
specific to NT4?
Comment 4•23 years ago
BuildId 2002022603 NT4
I see the same as Stephen P. Morse.
Reporter
Comment 5•23 years ago
I just tried this on Linux. This time the single 8-bit character display (hex
96) inside the button was correct. But the ampersand notation inside the button
was not correct -- it gave the question mark, just as it did on WinNT.
Reporter
Comment 6•23 years ago
Comment 7•23 years ago
Ok, I do see the problem on my Linux machine. I have no idea why the text would
render differently inside a button. Tentatively guessing that this is an i18n
issue... can you guys shed some light on why this renders as a question mark?
(Changing OS to All)
Assignee: bryner → yokoyama
Component: HTML Form Controls → Internationalization
OS: Windows NT → All
QA Contact: madhur → ruixu
Hardware: PC → All
Reporter
Comment 8•23 years ago
Here's an even more interesting test case. It generates all characters with
codes between decimal 100 ("d") and decimal 168. For each case it shows what the
character looks like normally, what it looks like inside a button using a single
8-bit character code, and what it looks like inside a button using the ampersand
notation.
Note that there doesn't appear to be any pattern as to which characters are
displayed correctly, nor is there a pattern as to which incorrect character it
is displayed as when it is not displayed correctly.
Will attach screen shots.
<html>
<body>
<form>
ampersand #100; "&#100;" <input type="button" value=" d "> <input
type="button" value=" &#100; "><br>
ampersand #101; "&#101;" <input type="button" value=" e "> <input
type="button" value=" &#101; "><br>
ampersand #102; "&#102;" <input type="button" value=" f "> <input
type="button" value=" &#102; "><br>
ampersand #103; "&#103;" <input type="button" value=" g "> <input
type="button" value=" &#103; "><br>
ampersand #104; "&#104;" <input type="button" value=" h "> <input
type="button" value=" &#104; "><br>
ampersand #105; "&#105;" <input type="button" value=" i "> <input
type="button" value=" &#105; "><br>
ampersand #106; "&#106;" <input type="button" value=" j "> <input
type="button" value=" &#106; "><br>
ampersand #107; "&#107;" <input type="button" value=" k "> <input
type="button" value=" &#107; "><br>
ampersand #108; "&#108;" <input type="button" value=" l "> <input
type="button" value=" &#108; "><br>
ampersand #109; "&#109;" <input type="button" value=" m "> <input
type="button" value=" &#109; "><br>
ampersand #110; "&#110;" <input type="button" value=" n "> <input
type="button" value=" &#110; "><br>
ampersand #111; "&#111;" <input type="button" value=" o "> <input
type="button" value=" &#111; "><br>
ampersand #112; "&#112;" <input type="button" value=" p "> <input
type="button" value=" &#112; "><br>
ampersand #113; "&#113;" <input type="button" value=" q "> <input
type="button" value=" &#113; "><br>
ampersand #114; "&#114;" <input type="button" value=" r "> <input
type="button" value=" &#114; "><br>
ampersand #115; "&#115;" <input type="button" value=" s "> <input
type="button" value=" &#115; "><br>
ampersand #116; "&#116;" <input type="button" value=" t "> <input
type="button" value=" &#116; "><br>
ampersand #117; "&#117;" <input type="button" value=" u "> <input
type="button" value=" &#117; "><br>
ampersand #118; "&#118;" <input type="button" value=" v "> <input
type="button" value=" &#118; "><br>
ampersand #119; "&#119;" <input type="button" value=" w "> <input
type="button" value=" &#119; "><br>
ampersand #120; "&#120;" <input type="button" value=" x "> <input
type="button" value=" &#120; "><br>
ampersand #121; "&#121;" <input type="button" value=" y "> <input
type="button" value=" &#121; "><br>
ampersand #122; "&#122;" <input type="button" value=" z "> <input
type="button" value=" &#122; "><br>
ampersand #123; "&#123;" <input type="button" value=" { "> <input
type="button" value=" &#123; "><br>
ampersand #124; "&#124;" <input type="button" value=" | "> <input
type="button" value=" &#124; "><br>
ampersand #125; "&#125;" <input type="button" value=" } "> <input
type="button" value=" &#125; "><br>
ampersand #126; "&#126;" <input type="button" value=" ~ "> <input
type="button" value=" &#126; "><br>
ampersand #127; "&#127;" <input type="button" value=" "> <input
type="button" value=" &#127; "><br>
ampersand #128; "&#128;" <input type="button" value=" € "> <input
type="button" value=" &#128; "><br>
ampersand #129; "&#129;" <input type="button" value=" � "> <input
type="button" value=" &#129; "><br>
ampersand #130; "&#130;" <input type="button" value=" ‚ "> <input
type="button" value=" &#130; "><br>
ampersand #131; "&#131;" <input type="button" value=" ƒ "> <input
type="button" value=" &#131; "><br>
ampersand #132; "&#132;" <input type="button" value=" „ "> <input
type="button" value=" &#132; "><br>
ampersand #133; "&#133;" <input type="button" value=" … "> <input
type="button" value=" &#133; "><br>
ampersand #134; "&#134;" <input type="button" value=" † "> <input
type="button" value=" &#134; "><br>
ampersand #135; "&#135;" <input type="button" value=" ‡ "> <input
type="button" value=" &#135; "><br>
ampersand #136; "&#136;" <input type="button" value=" ˆ "> <input
type="button" value=" &#136; "><br>
ampersand #137; "&#137;" <input type="button" value=" ‰ "> <input
type="button" value=" &#137; "><br>
ampersand #138; "&#138;" <input type="button" value=" Š "> <input
type="button" value=" &#138; "><br>
ampersand #139; "&#139;" <input type="button" value=" ‹ "> <input
type="button" value=" &#139; "><br>
ampersand #140; "&#140;" <input type="button" value=" Œ "> <input
type="button" value=" &#140; "><br>
ampersand #141; "&#141;" <input type="button" value=" � "> <input
type="button" value=" &#141; "><br>
ampersand #142; "&#142;" <input type="button" value=" Ž "> <input
type="button" value=" &#142; "><br>
ampersand #143; "&#143;" <input type="button" value=" � "> <input
type="button" value=" &#143; "><br>
ampersand #144; "&#144;" <input type="button" value=" � "> <input
type="button" value=" &#144; "><br>
ampersand #145; "&#145;" <input type="button" value=" ‘ "> <input
type="button" value=" &#145; "><br>
ampersand #146; "&#146;" <input type="button" value=" ’ "> <input
type="button" value=" &#146; "><br>
ampersand #147; "&#147;" <input type="button" value=" “ "> <input
type="button" value=" &#147; "><br>
ampersand #148; "&#148;" <input type="button" value=" ” "> <input
type="button" value=" &#148; "><br>
ampersand #149; "&#149;" <input type="button" value=" • "> <input
type="button" value=" &#149; "><br>
ampersand #150; "&#150;" <input type="button" value=" – "> <input
type="button" value=" &#150; "><br>
ampersand #151; "&#151;" <input type="button" value=" — "> <input
type="button" value=" &#151; "><br>
ampersand #152; "&#152;" <input type="button" value=" ˜ "> <input
type="button" value=" &#152; "><br>
ampersand #153; "&#153;" <input type="button" value=" ˜ "> <input
type="button" value=" &#153; "><br>
ampersand #154; "&#154;" <input type="button" value=" š "> <input
type="button" value=" &#154; "><br>
ampersand #155; "&#155;" <input type="button" value=" › "> <input
type="button" value=" &#155; "><br>
ampersand #156; "&#156;" <input type="button" value=" œ "> <input
type="button" value=" &#156; "><br>
ampersand #157; "&#157;" <input type="button" value=" � "> <input
type="button" value=" &#157; "><br>
ampersand #158; "&#158;" <input type="button" value=" ž "> <input
type="button" value=" &#158; "><br>
ampersand #159; "&#159;" <input type="button" value=" Ÿ "> <input
type="button" value=" &#159; "><br>
ampersand #160; "&#160;" <input type="button" value=" "> <input
type="button" value=" &#160; "><br>
ampersand #161; "&#161;" <input type="button" value=" ¡ "> <input
type="button" value=" &#161; "><br>
ampersand #162; "&#162;" <input type="button" value=" ¢ "> <input
type="button" value=" &#162; "><br>
ampersand #163; "&#163;" <input type="button" value=" £ "> <input
type="button" value=" &#163; "><br>
ampersand #164; "&#164;" <input type="button" value=" ¤ "> <input
type="button" value=" &#164; "><br>
ampersand #165; "&#165;" <input type="button" value=" ¥ "> <input
type="button" value=" &#165; "><br>
ampersand #166; "&#166;" <input type="button" value=" ¦ "> <input
type="button" value=" &#166; "><br>
ampersand #167; "&#167;" <input type="button" value=" § "> <input
type="button" value=" &#167; "><br>
ampersand #168; "&#168;" <input type="button" value=" ¨ "> <input
type="button" value=" &#168; "><br>
</form>
</body>
</html>
Reporter
Comment 9•23 years ago
Comment 10•23 years ago
Steve, the HTML testcase would be easier to work with if you could attach it as
an HTML attachment, instead of pasting it in a comment.
Reporter
Comment 11•23 years ago
Comment 12•23 years ago
ftang: this is bad, but I need your help on this.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.0
Updated•23 years ago
Comment 13•23 years ago
&#128;
&#133;
&#150;
&#142;
&#149;
&#158;
have this issue. These are invalid NCRs which refer to the range 0x80 - 0x9f.
nsbeta1+
Comment 14•23 years ago
ftang: I am a little confused.
For example, why is #137 invalid?
I can see '0/00' in plain text, but not in a button.
http://bugzilla.mozilla.org/attachment.cgi?id=72340&action=view
Comment 16•23 years ago
I think this is one issue we may consider cutting.
Giving to shanjian since this seems related to text display in XUL buttons.
Assignee: yokoyama → shanjian
Assignee
Comment 17•23 years ago
The testcase revealed 2 problems:
1) We have 2 different interpretations of NCRs, which is incorrect. It also reveals that
we have a problem with NCR surrogate support, because surrogate support is only
available in one place.
2) We assume all characters in the current code page, from 0 to 255, are available in a
raster font. That's incorrect. Different raster fonts vary in the glyph sets they cover.
In almost all raster fonts for ANSI, 127, 128 to 144, and 147 to 159 are not available.
I need information from various localized Windows versions to confirm this.
Status: NEW → ASSIGNED
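For readers following along: the surrogate support mentioned in 1) comes down to writing any NCR value above U+FFFF as two UTF-16 code units. The following is a minimal C sketch of that step, not code from the patch. `encode_utf16` is a hypothetical name, and substituting U+FFFD for values that cannot be encoded is an assumption made here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not from the Mozilla tree: encode one Unicode
 * code point as UTF-16. Writes 1 or 2 code units into out[] and
 * returns how many were written. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    /* Surrogate values and anything past U+10FFFF cannot be encoded;
     * this sketch assumes U+FFFD REPLACEMENT CHARACTER is wanted. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        out[0] = 0xFFFD;
        return 1;
    }
    if (cp < 0x10000) {                          /* in the BMP: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate */
    out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate */
    return 2;
}
```

A BMP value such as 0x2030 stays a single unit, while a value such as 0x1D11E is split into a high and a low surrogate.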
Assignee
Comment 18•23 years ago
Comment 19•23 years ago
I think we should split these two problems:
file one bug for the raster font issue and another bug for the NCR issue. I think
the raster font problem is more important than the NCR issue.
Assignee
Comment 20•23 years ago
Bug 134733 is filed for the raster font cmap problem. This bug will be dedicated to the NCR problem.
Summary: 8-bit characters don't display properly in buttons → we should have only one NCR parsing path
Assignee
Comment 21•23 years ago
Assignee
Updated•23 years ago
Attachment #76657 - Attachment is obsolete: true
Comment 22•23 years ago
shanjian- if we don't take the fix for this bug, what may break? Under what
conditions will we hit the missing part?
Comment 23•23 years ago
shanjian- I need an impact analysis: in what cases will the newly added ncr_hack
handling be needed?
Comment 24•23 years ago
shanjian- I cannot judge whether this issue is important without
detailed information. Please tell me what will break without your fix.
Comment 25•23 years ago
shanjian said the problem shows up when the NCR is in an attribute value without his patch.
shanjian- could you make the patch smaller by 1. not moving the
+#define PA_REMAP_128_TO_160_ILLEGAL_NCR 1
+
+#ifdef PA_REMAP_128_TO_160_ILLEGAL_NCR
+/**
+ * Map some illegal but commonly used numeric entities into their
+ * appropriate unicode value.
+ */
+#define NOT_USED 0xfffd
+
+static PRUint16 PA_HackTable[] = {
+  0x20ac, /* EURO SIGN */
+  NOT_USED,
+  0x201a, /* SINGLE LOW-9 QUOTATION MARK */
+  0x0192, /* LATIN SMALL LETTER F WITH HOOK */
+  0x201e, /* DOUBLE LOW-9 QUOTATION MARK */
+  0x2026, /* HORIZONTAL ELLIPSIS */
+  0x2020, /* DAGGER */
+  0x2021, /* DOUBLE DAGGER */
+  0x02c6, /* MODIFIER LETTER CIRCUMFLEX ACCENT */
+  0x2030, /* PER MILLE SIGN */
+  0x0160, /* LATIN CAPITAL LETTER S WITH CARON */
+  0x2039, /* SINGLE LEFT-POINTING ANGLE QUOTATION MARK */
+  0x0152, /* LATIN CAPITAL LIGATURE OE */
+  NOT_USED,
+  0x017D, /* LATIN CAPITAL LETTER Z WITH CARON */
+  NOT_USED,
+  NOT_USED,
+  0x2018, /* LEFT SINGLE QUOTATION MARK */
+  0x2019, /* RIGHT SINGLE QUOTATION MARK */
+  0x201c, /* LEFT DOUBLE QUOTATION MARK */
+  0x201d, /* RIGHT DOUBLE QUOTATION MARK */
+  0x2022, /* BULLET */
+  0x2013, /* EN DASH */
+  0x2014, /* EM DASH */
+  0x02dc, /* SMALL TILDE */
+  0x2122, /* TRADE MARK SIGN */
+  0x0161, /* LATIN SMALL LETTER S WITH CARON */
+  0x203a, /* SINGLE RIGHT-POINTING ANGLE QUOTATION MARK */
+  0x0153, /* LATIN SMALL LIGATURE OE */
+  NOT_USED,
+  0x017E, /* LATIN SMALL LETTER Z WITH CARON */
+  0x0178 /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
+};
+#endif /* PA_REMAP_128_TO_160_ILLEGAL_NCR */
+
+#define H_SURROGATE(s) ((PRUnichar)(((PRUint32)s - (PRUint32)0x10000) >> 10) + (PRUnichar)0xd800)
+#define L_SURROGATE(s) ((PRUnichar)(((PRUint32)s - (PRUint32)0x10000) & 0x3ff) + (PRUnichar)0xdc00)
+#define IS_IN_BMP(ucs4) ((ucs4) < 0x10000)
+
part,
2. only declaring, but not implementing, your
static void AppendNCR(nsString& aString, PRInt32 aNCRValue)
around line 1549,
3. implementing your static void AppendNCR(nsString& aString, PRInt32 aNCRValue)
after line 2168.
This way, you can probably reduce the size of the patch to about 10 lines of
changes.
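To make the table quoted in this comment concrete: it has 32 entries, so entry i would hold the replacement for NCR value 128 + i, and the lookup is a range check plus an index. A minimal C sketch under that assumption follows. `remap_illegal_ncr` and `hack_table` are hypothetical names, the values are copied from PA_HackTable as quoted here, and the real AppendNCR in the patch may well be structured differently.

```c
#include <assert.h>
#include <stdint.h>

#define NOT_USED 0xFFFD  /* U+FFFD REPLACEMENT CHARACTER, as in the patch */

/* Same 32 values as the quoted PA_HackTable: entry i is assumed to be
 * the replacement for NCR value 128 + i. */
static const uint16_t hack_table[32] = {
    0x20AC, NOT_USED, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, NOT_USED, 0x017D, NOT_USED,
    NOT_USED, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, NOT_USED, 0x017E, 0x0178
};

/* Hypothetical helper: remap an NCR value in 128..159 through the
 * table; every other value is returned unchanged. */
static uint32_t remap_illegal_ncr(uint32_t ncr)
{
    if (ncr >= 128 && ncr <= 159) {
        uint32_t index = ncr - 128;
        /* Defensive check that the index stays inside the table. */
        assert(index < sizeof hack_table / sizeof hack_table[0]);
        return hack_table[index];
    }
    return ncr;
}
```

With this table, 150 comes out as U+2013 (EN DASH) and 137 as U+2030 (PER MILLE SIGN), while values outside 128-159 pass through untouched.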
Assignee
Comment 26•23 years ago
Comment 27•23 years ago
Comment 28•23 years ago
This is before the change. shanjian, please get sr= and attach an image from after
the patch.
Comment 29•23 years ago
Comment on attachment 77710
the same patch, but there is no code relocation as suggested by ftang.
r=ftang, makes sense. Please ask for sr=
Attachment #77710 - Flags: review+
Comment 30•23 years ago
It looks like another low-risk, mid-impact change to me. This will certainly
make our surrogate support better and make sure that when users use those hacky NCRs in
an attribute value (such as in a form or alt text), they behave the same as when
the NCR is in the content.
Updated•23 years ago
Attachment #77123 - Attachment is obsolete: true
Assignee
Comment 31•23 years ago
Assignee
Comment 32•23 years ago
jst, could you sr=?
Comment 33•23 years ago
Comment on attachment 77710
the same patch, but there is no code relocation as suggested by ftang.
sr=jst
Attachment #77710 - Flags: superreview+
Comment 34•23 years ago
Low-risk fix. The real impact is unknown, probably not very high. But judging from
the screenshot, I think we should consider this for adt1.0.0.
Keywords: adt1.0.0
Comment 35•23 years ago
change subject to "ncr between 128-159 does not work in html attribute value"
Summary: we should have only one NCR parsing path → ncr between 128-159 does not work in html attribute value
Updated•23 years ago
Whiteboard: adt3 → [ADT3]
Comment 36•23 years ago
adt1.0.0- Ftang will document the reason.
Comment 37•23 years ago
We decided not to take this one for the following reasons:
1. These NCRs are not legal HTML NCRs, and usually people won't use them.
2. We have had this behavior for several years already, and we haven't received
any complaints from external customers about it.
Therefore, it is not worth taking right now. Please check it into the trunk
after the m1.0 branch-off.
Comment 38•23 years ago
Comment on attachment 77710
the same patch, but there is no code relocation as suggested by ftang.
a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #77710 - Flags: approval+
Assignee
Comment 39•23 years ago
Fix checked in to trunk. Closing this bug.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 40•23 years ago
Um, Sorry... but why was this checked in to the trunk now? Will it make it to
1.0? According to comment #37 from ftang it will not because it is non-standard
and nobody has really complained... so... why should we accept non-standard
behavior into the trunk, *especially* if we are not going to have this behavior
in our biggest, and most important release (1.0)? Accepting this fix in the
trunk seems a bit strange -- it only encourages bad behavior in webpages, at the
cost of our standards compliance.
Answering Roy's question in comment 14, &#137; is invalid because it is not in
the HTML specification. For example, HTML defines the per mille symbol as:
<!ENTITY permil CDATA "&#8240;" -- per mille sign, U+2030 ISOtech -->
and not &#137;. See http://www.w3.org/TR/html4/sgml/entities.html for more
definitions.
We should be looking to remove non-standard behaviour and not add it. Please
back this patch out and mark this bug wontfix.
Assignee
Comment 41•23 years ago
Simply saying it is non-standard is not enough reason for me to back out this
change. In real life we have to strike a balance between standards and other
factors, like compatibility with old browsers. As long as it does not violate
the standard, we have to be a little bit forgiving. Besides, we don't want to keep 2 code
paths, which give users different behaviors when text appears in text nodes and in
attribute values. If it is decided that this is really evil, we should back
them all out.
Comment 42•23 years ago
> Simply saying it is non-standard is not enough reason for me to back out this
> change.
It was enough reason for ftang and ADT to reject this fix for 1.0. If this is
an important issue for webmasters, then this fix should also go into the 1.0
branch, because that is the important release, not the trunk.
> In real life we have to strike a balance between standards and other
> factors, like compatibility with old browsers.
The fix is _NOT_ compatible with old browsers. Netscape 4.77 does not handle
this testcase very well either. Most items display "?" for me, and only a
few of them display anything at all. We add more illegal characters than
Netscape 4 displayed. At worst we should only support what NS 4.7 does. We
should not add any extra violations.
> As long as it does not violate the standard, we have to be a little bit forgiving.
But it _does_ violate the standard...
> Besides, we don't want to keep 2 code paths, which give users different behaviors
> when text appears in text nodes and in attribute values. If it is decided that this is
> really evil, we should back them all out.
I agree. It is bad that different things happen in attributes and regular text
nodes. Should we consider discontinuing support for it in all places? It seems
that hardly anyone uses these characters: if our behavior was "broken"
for several years, as ftang says (comment #37), and nobody complained, then it
could be safe to assume that support for these can be removed...
Assignee
Comment 43•23 years ago
We have had the "feature" there for HTML text for quite some time. The hack has been
there since 9/22/1998. We just don't have it for attribute values.
Personally, I don't have a strong opinion on either approach, i.e. keeping or removing
the hack. As long as it is acceptable to users and makes our product better, I
will do it. I would suggest you file a separate bug about this issue and have the
discussion there. In this bug, I just want to use a single code path for all NCR
parsing. Besides the NCR hack, I added surrogate support as well.
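The single code path idea can be pictured as one decoding routine that both the text caller and the attribute-value caller would share, so the two cannot drift apart. The C below is only an illustration under stated assumptions: `parse_decimal_ncr` is a hypothetical name, it handles decimal NCRs only, and it requires the terminating semicolon, which the actual parser may not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parse a decimal NCR such as "&#150;" at the
 * start of s. On success it stores the numeric value in *value and
 * returns the number of characters consumed; it returns 0 if s does
 * not start with a well-formed decimal NCR. */
static size_t parse_decimal_ncr(const char *s, uint32_t *value)
{
    assert(s != NULL && value != NULL);
    if (s[0] != '&' || s[1] != '#')
        return 0;

    size_t i = 2;
    uint32_t v = 0;
    while (s[i] >= '0' && s[i] <= '9') {
        v = v * 10 + (uint32_t)(s[i] - '0');
        if (v > 0x10FFFF)   /* past the Unicode range; also keeps v from overflowing */
            return 0;
        i++;
    }
    if (i == 2 || s[i] != ';')   /* need at least one digit and the ';' */
        return 0;

    *value = v;
    return i + 1;
}
```

Any remapping of the 128-159 range or surrogate splitting would then be applied to the returned value in one place, whichever caller asked for it.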
Comment 44•23 years ago
Fix verified on 04-11 trunk build / Linux RH7.2.
Since we are not going to check this into the branch (per comment #37), I'll mark
this one as verified.
Status: RESOLVED → VERIFIED