Closed Bug 17962 Opened 25 years ago Closed 24 years ago

Display all HTML 4 character entities in browser correctly

Categories

(Core :: Internationalization, defect, P3)

Tracking

VERIFIED FIXED
mozilla0.8.1

People

(Reporter: sidr, Assigned: ftang)

Details

(Keywords: html4, intl)

Attachments

(7 files)

All of the HTML 4 character entities should display a useful and meaningful
glyph when referenced in an HTML file.
Depends on: 17958
At present, there are at least 3 issues outstanding.

1. On Windows NT (and 95) the "Miscellaneous Technical" char entities
   testcase crashes the browser, and the browser hangs instead of displaying
   the "ISO 8859-1" char entities, preventing inspection of those testcases
   on Win32. This is bug 17958.

2. On Linux, &bull; displays as "•" instead of as a bullet. This is
   bug 16872. Other character entities may be affected; checking the
   testcases linked from the attachment above will tell the tale.

3. On Linux, &ldquo; and &rdquo; (left and right double quotes) delay loading
   of straightforward pages for tens of seconds on at least some machines,
   if they end up being found in an ISO 10646-1 (Unicode) character set,
   due to the sheer size of the Unicode set. This is bug 14961. On Windows and
   Mac, these characters are part of the first 256 characters in any set. This
   could be a general problem for other platforms that use ISO 8859-1, rather
   than the Windows or Mac adaptations, as their default character set. For
   practical purposes, if character entities take tens of seconds to find,
   that can't be considered adequate support. Also, even if this took less
   time, it is less than ideal to use a different font for the glyphs for
   only a few characters.

Issue 1 is waiting on bug 17958.

Issues 2 & 3 are of unknown severity at present. The only simple way
to find out their severity would be to view the character entity testcases
on several platforms other than Windows and Mac.

BTW, the component is set to "internationalization" not because this is
of consequence only for i18n (this is really a "Browser-General" problem),
but because presumably that team is already working with character entities
and character sets.
Assignee: ftang → erik
Depends on: 14961, 16872
Target Milestone: M20
This looks like a tracking bug rather than a real bug to me. All the problems
mentioned in this bug report have separate bug numbers associated with me. Marking
this M20 since it is a tracking bug. The separate bugs should have different
milestones since they should be fixed earlier.

Reassigning this bug to erik, even though it is a tracking bug. Most of the issues
mentioned here are GFX issues.
Sorry, yes, this is *also* a tracking bug, but it probably shouldn't be.
Resolving the three bugs referred to won't invalidate this report.
The only thing that can invalidate this report is testing.

At its core, this bug is the general case for bug 16872, where
&bull; displays incorrectly on at least one platform. The reasoning:
where there's smoke, there may be fire. The real question is, do all
of the HTML 4 character entities that should display something display
something useful on all Platform/OS combos?
Status: NEW → ASSIGNED
Bug 16872, which this bug depends on, was resolved as a dup of bug 454. 
Updating dependencies.
Depends on: 454
No longer depends on: 16872
Depends on: 32412
Depends on: 33498
Depends on: 33501
We need to add transliterations for all of the HTML4 CERs to the transliteration
table. See also bug 33498 and bug 33501, which were created to track the
addition of transliteration to the Windows and Mac versions of the font engine.
(The Unix version already calls that API.)
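As a rough illustration of the transliteration idea above, here is a minimal sketch in C. It is not the actual Mozilla table: the names `translit_entry`, `translit_table`, and `transliterate_fallback`, and the particular ASCII fallbacks chosen, are assumptions made only for illustration.

```c
#include <stddef.h>

/* Hypothetical fallback table: maps a few HTML 4 entity code points to
 * ASCII strings for use when no font supplies a glyph. The entries are
 * illustrative choices, not the contents of the real Mozilla table. */
struct translit_entry {
    unsigned int code_point;
    const char  *fallback;
};

static const struct translit_entry translit_table[] = {
    { 0x2018, "'"    },  /* &lsquo;  */
    { 0x2019, "'"    },  /* &rsquo;  */
    { 0x201C, "\""   },  /* &ldquo;  */
    { 0x201D, "\""   },  /* &rdquo;  */
    { 0x2022, "*"    },  /* &bull;   */
    { 0x2026, "..."  },  /* &hellip; */
    { 0x2122, "(TM)" },  /* &trade;  */
};

/* Returns the ASCII fallback for u, or NULL if the table has none. */
const char *transliterate_fallback(unsigned int u)
{
    size_t i;
    for (i = 0; i < sizeof translit_table / sizeof translit_table[0]; i++) {
        if (translit_table[i].code_point == u)
            return translit_table[i].fallback;
    }
    return NULL;
}
```

A caller would presumably try the font first and consult a table like this only when no glyph is found, so that a NULL result still falls through to the usual missing-glyph handling.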
Depends on: 36163
Many entities being displayed correctly in M16 have "broken" in the nightly I'm
using now (ID 2000071620).  They appear as inverted solid triangles.

There are more complete entity reference pages, but mine is at
http://www.r5i.com/~tim/symbols.shtml, or see the letterlike symbols, math
symbols, and arrow in the test case attachment.

Sorry if this is the wrong place for this.  It's the closest I could find in my
Bugzilla search.
These looked OK for me with last week's 2000071108 build on US Win95, but I
see the inverted triangles with today's build, 2000071709.

Reassigned to ftang because erik just left for sabbatical.
Assignee: erik → ftang
Status: ASSIGNED → NEW
See comments in bug 45543.
Testing with the 2000-07-17-09-M17 nightly binary on WinNT, 53 of the HTML 4
character entities display the same, incorrect, glyph, one that looks like a 
bold, bold left single quote mark. 

None of the ISO-8859-1 characters are affected. The affected symbols are mostly
mathematical or quasi-mathematical. They are found in:

http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2604
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2606
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2607
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2608
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2610
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2616

Nominating for nsbeta3 - surely reviewers will subject it to test suites,
and this is a very basic test to be failing. Looking at comments in bug
45543, the full fix for this will probably be waiting until Erik gets back:
see the Additional Comments From rbs@maths.uq.edu.au 2000-07-17 11:18.
This bug would almost certainly depend on bug 45543 except that there are no 
Korean characters in the HTML 4 set; the root problem looks to be the same.

rbs@maths.uq.edu.au, is the current problem with the math glyphs a blocker for 
you?

Updating Platform/OS to All/All, as this will need to be verified everywhere
for HTML 4.0 compliance.
Keywords: nsbeta3
OS: Windows NT → All
Hardware: PC → All
> is the current problem with the math glyphs a blocker for you?

No, it isn't. The hack I indicated on 45543 is temporarily doing the trick.
When I visit the links you gave above, they look okay (with missing glyphs
represented by '?' as expected). Also, MathML-enabled builds include the ucvmath 
module which gives access to more mathematical/scientific symbols for those who 
have the corresponding fonts.

(Notice that the default Mozilla can display many of the HTML4 symbols if
the user has the "Lucida Unicode Sans" font.)
Reassigning back to erik. It seems the fix for 45543 is good for the short term,
and we should wait for erik to come back to fix the rest.
Assignee: ftang → erik
This now WORKSFORME completely on Windows 2000 commercial build 6.0.17.2000080104.
Should this be closed, or are there remaining issues?
Whiteboard: WORKSFORME?
 > Should this be closed, or are there remaining issues?
Not quite yet, and who knows? To date the greatest number of font display
problems have occurred on Linux, but this also needs testing on Mac to
be sure that all of the character entities display properly on the Tier 1
builds. For fonts issues, testing on Windows only is not enough.
Whiteboard: WORKSFORME? → Win: WFM; Linux: ???; Mac: ???;
Agreed.

Eli, can you verify that this is WORKSFORME on all three tier 1 platforms?
The attachment is a set of links to the comprehensive testcases you were
after the other day...
QA Contact: teruko → elig
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2608
does not seem to be the right testcase

http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2610
I get '?'s for the first 4 entities

http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2616
I get '?' for zwnj through rlm.

Linux build 2000.08.02.08 on RH 6.2
The second attachment shows all of the HTML 4 character entities in named
and numeric form in one testcase; now that random entities aren't crashing
Mozilla, that's feasible and convenient.

New testcase for spacing and zero-width characters in text: 
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=12289
Testing with 2000-08-02-08-M18 shows the "Windows" results in the next paragraph

Remaining problem character entities:
  On Linux:   &lceil; &rceil; &lfloor; &rfloor; &zwnj; &zwj; &lrm; &rlm;
  On Windows:                                   &zwnj; &zwj; &lrm; &rlm;
  On Mac: as yet unknown

Richard: yeah, 2608 is not a character entity testcase; never was; 
mea culpa typoa: should have been 2609.
Whiteboard: Win: WFM; Linux: ???; Mac: ???; → Win: problems; Linux: problems; Mac: ???;
bug 47714 is about Mac and Symbol entity set.
Updated remaining problem character entities:
  On Linux:   &lceil; &rceil; &lfloor; &rfloor; &zwnj; &zwj; &lrm; &rlm;
  On Windows:                                   &zwnj; &zwj; &lrm; &rlm;
  On Mac:     &lceil; &rceil; &lfloor; &rfloor; (Miscellaneous Technical)

It is entirely possible that the problems with the Miscellaneous Technical
glyphs on Linux have a similar cause to the codepoint translation
problem on Macs (bug 47714).

Note that &zwnj;, &zwj;, &lrm;, and &rlm; are displaying properly as "nothing"
in the "General Punctuation" table in the second attachment, but in the 
"Spacing and Zero-width Characters", either a "?" or a thin vertical bar is
appearing when they are placed in text - and these characters' normal habitat
is in the midst of printable text.
Depends on: 47714
Keywords: html4
Whiteboard: Win: problems; Linux: problems; Mac: ???; → Win: problems; Linux: problems; Mac: problems;
It always helps, when evaluating testcase results, to know what to expect.
For &lrm; and &rlm;, it appears that visible glyphs looking almost like
thin vertical bars, with tiny right- and left- pointing arrows at the top,
should be expected.

To see this clearly, view http://www.hclrss.demon.co.uk/demos/ent4_frame.html ,
scroll down in the left frame to _left-to-right mark_, and click on that link.
Looking carefully at the in-text testcase (end of second attachment), the same 
glyphs are shown on WinNT testing with 2000-08-09-08-M18 -- they are smaller, 
and butted against the adjacent text characters, but the characters are clearly 
*not* just thin vertical bars. On the other hand these characters mysteriously
do not appear when they are the only content of a table cell.
I don't think it is reasonable to fix
&zwnj; &zwj; &lrm; &rlm;
for any platform. These characters are control characters, and their rendering
should not be tested alone; instead, their appearance should change depending on
the surrounding characters. There is no visual requirement for how to display them
ALONE. Some applications and OSes display them one way or the other. What matters
is how they change the rendering of surrounding characters. For example, they
should change the display behavior of Arabic and Indic scripts.
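To make the point concrete, here is a minimal sketch in C of a check for the four characters in question (&zwnj;, &zwj;, &lrm;, &rlm;). The function name `is_format_control` is hypothetical, and the idea that a renderer would call such a check before looking for a glyph is an assumption for illustration, not a description of the Mozilla code.

```c
/* Returns 1 if u is one of the four HTML 4 entities that are format
 * controls rather than printable characters, 0 otherwise. */
int is_format_control(unsigned int u)
{
    switch (u) {
    case 0x200C: /* &zwnj; zero width non-joiner */
    case 0x200D: /* &zwj;  zero width joiner     */
    case 0x200E: /* &lrm;  left-to-right mark    */
    case 0x200F: /* &rlm;  right-to-left mark    */
        return 1;
    default:
        return 0;
    }
}
```

A testcase built around such a check would look at how neighboring Arabic or Indic text is shaped and ordered, rather than at whether the control character itself draws anything.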

The &lceil; &rceil; &lfloor; &rfloor; issue should be possible to fix if we
remap according to the HTML mapping instead of the Adobe mapping. We need to remap
the Mac code also, but it should be easy.
&lceil; is U+2308 in Unicode. Looking at the Symbol font, it appears to be code
point 0xE9, which means that on Macintosh the font encodes it as U+F8EE.
&rceil; is U+2309 in Unicode. Looking at the Symbol font, it appears to be code
point 0xF9, which means that on Macintosh the font encodes it as U+F8F9.
&lfloor; is U+230A in Unicode. Looking at the Symbol font, it appears to be code
point 0xEB, which means that on Macintosh the font encodes it as U+F8F0.
&rfloor; is U+230B in Unicode. Looking at the Symbol font, it appears to be code
point 0xFB, which means that on Macintosh the font encodes it as U+F8FB.

Therefore, the way to fix this bug is to change the Unicode-to-Symbol-font mapping
table, changing the entries for 0xE9, 0xEB, 0xF9, 0xFB to
2308, 230a, 2309, 230b.

To fix Mac, we should add four checks:
if ((0x2308 <= u) && (u <= 0x230b)) {
  /* remap ceiling/floor brackets to the Symbol font's Mac code points */
  if (u == 0x2308)
    u = 0xf8ee;
  else if (u == 0x2309)
    u = 0xf8f9;
  else if (u == 0x230a)
    u = 0xf8f0;
  else if (u == 0x230b)
    u = 0xf8fb;
}
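The same remapping can also be sketched as a small self-contained C function. This is only an illustration under stated assumptions: the name `remap_ceil_floor_for_mac` is hypothetical and is not the function that was checked in. A `switch` is used so that each case is a plain constant and there is no comparison operator to mistype.

```c
/* Maps the four ceiling/floor code points to the code points given in
 * this comment for the Mac Symbol font; other values pass through. */
unsigned int remap_ceil_floor_for_mac(unsigned int u)
{
    switch (u) {
    case 0x2308: return 0xF8EE; /* &lceil;  */
    case 0x2309: return 0xF8F9; /* &rceil;  */
    case 0x230A: return 0xF8F0; /* &lfloor; */
    case 0x230B: return 0xF8FB; /* &rfloor; */
    default:     return u;      /* anything else is left unchanged */
    }
}
```

Any code point outside U+2308..U+230B is returned as-is, so the function can be applied to every character without a separate range test.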
nsbeta3- per bug meeting (ekrock)
Whiteboard: Win: problems; Linux: problems; Mac: problems; → [nsbeta3-]Win: problems; Linux: problems; Mac: problems;
Accepting bug, but marking Future, since it's nsbeta3-.
Status: NEW → ASSIGNED
Target Milestone: M20 → Future
QA Contact: elig → teruko
Keywords: intl
Nominating for Mozilla1.0 as a polish/compliance issue.
Keywords: mozilla1.0
Frank, I'm reassigning this to you since you seem to know what to do, and I
don't know my way around your Unicode conversion tables.

Should this be marked nsbeta1?
Assignee: erik → ftang
Status: ASSIGNED → NEW
We probably should fix the Mac. That should be easy to do. Marking this bug as P3
moz9 for only the Mac enhancement part.
Status: NEW → ASSIGNED
Keywords: nsbeta3 → nsbeta1
Whiteboard: [nsbeta3-]Win: problems; Linux: problems; Mac: problems;
Target Milestone: Future → mozilla0.9
Changed QA contact to andreasb@netscape.com for now.
QA Contact: teruko → andreasb
The fix for the &lceil; &rceil; &lfloor; &rfloor; display problem on Mac was checked in
on 8/15/2000. It seems the only remaining issues are these in Gtk.
OK, I also fixed Gtk. Here is the patch.
bstell- can you review this ?
Target Milestone: mozilla0.9 → mozilla0.8.1
sr=erik

Looks good.
Depends on: 67374
since this is only used for converting from Unicode to Adobe code for display
this is okay.

r=bstell@netscape.com
Fixed Linux lceil/rceil/rfloor/lfloor.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Verifying this bug, however see new bug report (bug 75059) which narrows down
problematic characters.
Status: RESOLVED → VERIFIED