Closed Bug 17962 Opened 25 years ago Closed 24 years ago

Display all HTML 4 character entities in browser correctly

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

VERIFIED FIXED
mozilla0.8.1

People

(Reporter: sidr, Assigned: ftang)

Details

(Keywords: html4, intl)

Attachments

(7 files)

All of the HTML 4 character entities should display a useful and meaningful glyph when referenced in an HTML file.
Depends on: 17958
At present, there are at least three outstanding issues.
1. On Windows NT (and 95), the "Miscellaneous Technical" character entities testcase crashes the browser, and the browser hangs instead of displaying the "ISO 8859-1" character entities. This prevents inspection of those testcases on Win32. This is bug 17958.
2. On Linux, &bull; does not display as a bullet. This is bug 16872. Other character entities may be affected as well; testing with the testcases linked from the attachment above will tell the tale.
3. On Linux, &ldquo; and &rdquo; (left and right double quotes) delay loading of straightforward pages by tens of seconds on at least some machines if they end up being found in an ISO 10646-1 (Unicode) character set, due to the sheer size of the Unicode set. This is bug 14961. On Windows and Mac, these characters are among the first 256 characters of the platform's default character set. This could be a general problem on other platforms that use ISO 8859-1, rather than the Windows or Mac adaptations, as their default character set. For practical purposes, if character entities take tens of seconds to find, that can't be considered adequate support. Also, even if this took less time, it is less than ideal to use a different font for the glyphs of only a few characters.
Issue 1 is waiting on bug 17958. The severity of issues 2 and 3 is unknown at present; the only simple way to find out would be to view the character entity testcases on several platforms other than Windows and Mac.
BTW, the component is set to "Internationalization" not because this matters only for i18n (this is really a "Browser-General" problem), but because presumably that team is already working with character entities and character sets.
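Issue 3 above turns on a difference between character sets: ISO 8859-1 covers only U+0000 to U+00FF, while windows-1252 and Mac OS Roman place the curly double quotes and the bullet inside their 256 byte values. The C sketch below illustrates that difference for these three entities only. The function names are hypothetical, and the tiny tables are not Mozilla's actual converters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ISO 8859-1 maps its 256 byte values one-to-one onto U+0000..U+00FF,
   so any higher code point simply has no Latin-1 byte. */
static bool fits_iso8859_1(uint32_t u) {
    assert(u <= 0x10FFFF); /* caller passes a valid code point */
    return u <= 0xFF;
}

/* windows-1252 byte for ASCII plus three HTML 4 entities;
   -1 if this sketch does not cover the code point. */
static int to_windows_1252(uint32_t u) {
    switch (u) {
    case 0x201C: return 0x93; /* &ldquo; */
    case 0x201D: return 0x94; /* &rdquo; */
    case 0x2022: return 0x95; /* &bull;  */
    default:     return u <= 0x7F ? (int)u : -1;
    }
}

/* Mac OS Roman byte for the same subset; -1 if not covered. */
static int to_mac_roman(uint32_t u) {
    switch (u) {
    case 0x201C: return 0xD2; /* &ldquo; */
    case 0x201D: return 0xD3; /* &rdquo; */
    case 0x2022: return 0xA5; /* &bull;  */
    default:     return u <= 0x7F ? (int)u : -1;
    }
}
```

For example, fits_iso8859_1(0x201C) is false while to_windows_1252(0x201C) yields 0x93. A Latin-1 default font therefore cannot supply these quotes, and the font lookup has to fall back to a much larger Unicode font.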
Assignee: ftang → erik
Depends on: 14961, 16872
Target Milestone: M20
This looks like a tracking bug rather than a real bug to me. All the problems mentioned in this report have separate bug numbers associated with them. Marking this M20 since it is a tracking bug; the separate bugs should get earlier milestones since they should be fixed earlier. Reassigning this bug to erik, even though it is a tracking bug; most of the issues mentioned here are GFX issues.
Sorry, yes, this is *also* a tracking bug, but it probably shouldn't be. Resolving the three bugs referred to won't invalidate this report; only testing can do that. At its core, this bug is the general case of bug 16872, where &bull; displays incorrectly on at least one platform. The reasoning: where there's smoke, there may be fire. The real question is: do all of the HTML 4 character entities that should display something display something useful on all platform/OS combinations?
Status: NEW → ASSIGNED
Bug 16872, which this bug depends on, was resolved as a dup of bug 454. Updating dependencies.
Depends on: 454
No longer depends on: 16872
Depends on: 32412
Depends on: 33498
Depends on: 33501
We need to add transliterations for all of the HTML4 CERs to the transliteration table. See also bug 33498 and bug 33501, which were created to track the addition of transliteration to the Windows and Mac versions of the font engine. (The Unix version already calls that API.)
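To illustrate what transliteration entries for the HTML 4 CERs might look like, the C sketch below pairs a few HTML 4 code points with ASCII stand-ins and looks them up with the standard bsearch(). The struct, the table contents, and the transliterate() name are all hypothetical; they are not the format of Mozilla's actual transliteration table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical entry: a code point and an ASCII stand-in to draw
   when no installed font has a glyph for it. */
typedef struct {
    uint32_t    code_point;
    const char *ascii;
} translit_entry;

/* Illustrative subset of HTML 4 entities. Must stay sorted by
   code_point, because lookups use bsearch(). */
static const translit_entry kTranslit[] = {
    { 0x2013, "-"    },  /* &ndash;  */
    { 0x2014, "--"   },  /* &mdash;  */
    { 0x201C, "\""   },  /* &ldquo;  */
    { 0x201D, "\""   },  /* &rdquo;  */
    { 0x2022, "*"    },  /* &bull;   */
    { 0x2026, "..."  },  /* &hellip; */
    { 0x2122, "(TM)" },  /* &trade;  */
    { 0x2190, "<-"   },  /* &larr;   */
    { 0x2192, "->"   },  /* &rarr;   */
    { 0x2260, "!="   },  /* &ne;     */
    { 0x2264, "<="   },  /* &le;     */
    { 0x2265, ">="   },  /* &ge;     */
};

/* bsearch() comparator: key is a uint32_t, element is an entry. */
static int cmp_entry(const void *key, const void *elem) {
    uint32_t k = *(const uint32_t *)key;
    uint32_t c = ((const translit_entry *)elem)->code_point;
    return (k > c) - (k < c);
}

/* Returns the ASCII fallback for u, or NULL if the table has none. */
static const char *transliterate(uint32_t u) {
    assert(u <= 0x10FFFF); /* caller passes a valid code point */
    const translit_entry *e = bsearch(&u, kTranslit,
                                      sizeof kTranslit / sizeof kTranslit[0],
                                      sizeof kTranslit[0], cmp_entry);
    return e ? e->ascii : NULL;
}
```

A sorted array with binary search keeps lookups cheap even if the table grows to cover all of the HTML 4 entities; the cost is that every new entry must be inserted in code point order.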
Depends on: 36163
Many entities that displayed correctly in M16 have "broken" in the nightly I'm using now (build ID 2000071620). They appear as inverted solid triangles. There are more complete entity reference pages, but mine is at http://www.r5i.com/~tim/symbols.shtml; or see the letterlike symbols, math symbols, and arrows in the testcase attachment. Sorry if this is the wrong place for this. It's the closest I could find in my Bugzilla search.
These looked OK for me with last week's 2000071108 build on US Win95, but I see the inverted triangles with today's build, 2000071709. Reassigned to ftang because erik just left for sabbatical.
Assignee: erik → ftang
Status: ASSIGNED → NEW
See comments in bug 45543.
Testing with the 2000-07-17-09-M17 nightly binary on WinNT, 53 of the HTML 4 character entities display the same incorrect glyph, one that looks like a bold left single quote mark. None of the ISO-8859-1 characters are affected. The affected symbols are mostly mathematical or quasi-mathematical. They are found in:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2604
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2606
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2607
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2608
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2610
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2616
Nominating for nsbeta3: surely reviewers will subject it to test suites, and this is a very basic test to be failing. Judging from the comments in bug 45543, the full fix for this will probably have to wait until Erik gets back; see the Additional Comments From rbs@maths.uq.edu.au 2000-07-17 11:18. This bug would almost certainly depend on bug 45543, except that there are no Korean characters in the HTML 4 set; the root problem looks to be the same.
rbs@maths.uq.edu.au, is the current problem with the math glyphs a blocker for you?
Updating Platform/OS to All/All, as this will need to be verified everywhere for HTML 4.0 compliance.
Keywords: nsbeta3
OS: Windows NT → All
Hardware: PC → All
> is the current problem with the math glyphs a blocker for you? No, it isn't. The hack I indicated on 45543 is temporarily doing the trick. When I visit the links you gave above, they look okay (with missing glyphs represented by '?' as expected). Also, MathML-enabled builds include the ucvmath module, which gives access to more mathematical/scientific symbols for those who have the corresponding fonts. (Notice that the default Mozilla can display many of the HTML4 symbols if the user has the "Lucida Sans Unicode" font.)
Reassigning back to erik. The fix for 45543 seems good for the short term, and we should wait for erik to come back to fix the rest.
Assignee: ftang → erik
This now WORKSFORME completely on Windows 2000 commercial build 6.0.17.2000080104. Should this be closed, or are there remaining issues?
Whiteboard: WORKSFORME?
> Should this be closed, or are there remaining issues? Not quite yet, and who knows? To date, the greatest number of font display problems have occurred on Linux, but this also needs testing on Mac to be sure that all of the character entities display properly on the Tier 1 builds. For font issues, testing on Windows alone is not enough.
Whiteboard: WORKSFORME? → Win: WFM; Linux: ???; Mac: ???;
Agreed. Eli, can you verify that this is WORKSFORME on all three tier 1 platforms? The attachment is a set of links to the comprehensive testcases you were after the other day...
QA Contact: teruko → elig
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2608 does not seem to be the right testcase.
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2610: I get '?'s for the first 4 entities.
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2616: I get '?' for zwnj through rlm.
Linux build 2000.08.02.08 on RH 6.2
The second attachment shows all of the HTML 4 character entities in named and numeric form in one testcase; now that random entities aren't crashing Mozilla, that's feasible and convenient.
New testcase for spacing and zero-width characters in text: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=12289
Testing with 2000-08-02-08-M18 gives the "On Windows" results below.
Remaining problem character entities:
On Linux: &lceil; &rceil; &lfloor; &rfloor; &zwnj; &zwj; &lrm; &rlm;
On Windows: &zwnj; &zwj; &lrm; &rlm;
On Mac: as yet unknown
Richard: yeah, 2608 is not a character entity testcase and never was; mea culpa, typo: it should have been 2609.
Whiteboard: Win: WFM; Linux: ???; Mac: ???; → Win: problems; Linux: problems; Mac: ???;
bug 47714 is about Mac and Symbol entity set.
Updated remaining problem character entities:
On Linux: &lceil; &rceil; &lfloor; &rfloor; &zwnj; &zwj; &lrm; &rlm;
On Windows: &zwnj; &zwj; &lrm; &rlm;
On Mac: &lceil; &rceil; &lfloor; &rfloor; (Miscellaneous Technical)
It is entirely possible that the problems with the Miscellaneous Technical glyphs on Linux have a cause similar to the codepoint translation problem on Macs (bug 47714).
Note that &zwnj;, &zwj;, &lrm;, and &rlm; display properly as "nothing" in the "General Punctuation" table in the second attachment. In the "Spacing and Zero-width Characters" testcase, however, either a "?" or a thin vertical bar appears when they are placed in text, and these characters' normal habitat is in the midst of printable text.
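The eight remaining problem entities listed above can be emitted into a testcase in named form, numeric form, and embedded in running text, much as the second attachment does for the full set. Below is a minimal C sketch of such a generator. The entity and format_row names and the row layout are hypothetical; only the code points are taken from the HTML 4 entity definitions.

```c
#include <assert.h>
#include <stdio.h>

typedef struct {
    const char *name;       /* entity name without '&' and ';' */
    unsigned    code_point; /* Unicode value defined by HTML 4  */
} entity;

/* The entities still reported as problems on at least one platform. */
static const entity kProblemEntities[] = {
    { "lceil",  0x2308 }, { "rceil",  0x2309 },
    { "lfloor", 0x230A }, { "rfloor", 0x230B },
    { "zwnj",   0x200C }, { "zwj",    0x200D },
    { "lrm",    0x200E }, { "rlm",    0x200F },
};
#define N_PROBLEM_ENTITIES (sizeof kProblemEntities / sizeof kProblemEntities[0])

/* Formats one table row: the name, the named reference, the decimal
   numeric reference, and the named reference between two letters
   (zwnj/zwj/lrm/rlm normally occur inside text, not alone in a cell).
   Returns the snprintf() result. */
static int format_row(char *buf, size_t len, const entity *e) {
    assert(buf != NULL && e != NULL);
    return snprintf(buf, len,
                    "<tr><td>%s</td><td>&%s;</td><td>&#%u;</td>"
                    "<td>a&%s;b</td></tr>",
                    e->name, e->name, e->code_point, e->name);
}

/* Writes every row to out, skipping any row that failed or was truncated. */
void write_rows(FILE *out) {
    char row[128];
    for (size_t i = 0; i < N_PROBLEM_ENTITIES; i++) {
        int n = format_row(row, sizeof row, &kProblemEntities[i]);
        if (n >= 0 && n < (int)sizeof row)
            fprintf(out, "%s\n", row);
    }
}
```

Putting the named and numeric forms side by side helps separate entity-parsing problems from font and glyph-mapping problems: if both columns show the same wrong glyph, the parser is probably not at fault.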
Depends on: 47714
Keywords: html4
Whiteboard: Win: problems; Linux: problems; Mac: ???; → Win: problems; Linux: problems; Mac: problems;
It always helps, when evaluating testcase results, to know what to expect. For &lrm; and &rlm;, it appears that visible glyphs looking almost like thin vertical bars, with tiny right- and left-pointing arrows at the top, should be expected. To see this clearly, view http://www.hclrss.demon.co.uk/demos/ent4_frame.html , scroll down in the left frame to _left-to-right mark_, and click on that link. Looking carefully at the in-text testcase (end of the second attachment), the same glyphs are shown on WinNT when testing with 2000-08-09-08-M18: they are smaller and butted against the adjacent text characters, but they are clearly *not* just thin vertical bars. On the other hand, these characters mysteriously do not appear when they are the only content of a table cell.
I don't think it is reasonable to "fix" &zwnj; &zwj; &lrm; &rlm; for any platform. These are control characters, and they should not be tested by rendering them alone; their effect is to change the appearance of the surrounding characters. There is no visual requirement for how to display them ALONE; some applications/OSes display them one way, some the other. What matters is how they change the rendering of the surrounding characters; for example, they should change the display behavior of Arabic and Indic scripts. The &lceil; &rceil; &lfloor; &rfloor; issue should be possible to fix if we remap according to the HTML mapping instead of the Adobe mapping. We need to remap the Mac code also, but it should be easy.
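One way to act on this is sketched in C below, assuming a simple measure-then-draw text path. The function names are hypothetical and are not Mozilla's GFX API. The four controls stay in the text run, so that shaping and bidirectional ordering could still see them, but a font is never asked for a glyph for them. The sketch does not implement the shaping or bidi effects themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* U+200C ZWNJ, U+200D ZWJ, U+200E LRM, U+200F RLM: invisible
   formatting characters that act on their neighbours. */
static bool is_zero_width_control(uint32_t u) {
    return u >= 0x200C && u <= 0x200F;
}

/* Hypothetical measuring step: a control keeps its place in the run
   but contributes no width; other characters use the font's advance. */
static int advance_for(uint32_t u, int font_advance) {
    return is_zero_width_control(u) ? 0 : font_advance;
}

/* Counts how many glyphs a hypothetical draw step would request from
   a font for this run. Controls never get a glyph, so no '?' or box
   can appear for them. */
static size_t count_drawn_glyphs(const uint32_t *run, size_t n) {
    assert(run != NULL || n == 0);
    size_t drawn = 0;
    for (size_t i = 0; i < n; i++)
        if (!is_zero_width_control(run[i]))
            drawn++;
    return drawn;
}
```

This choice draws nothing for the controls, rather than a visible marker like the arrowed bars described in the previous comment. That fits the point that they have no required appearance on their own.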
&lceil; is U+2308 in Unicode. In the Symbol font, it looks like code point 0xE9, which means that on Macintosh the font encodes it as U+F8EE.
&rceil; is U+2309 in Unicode. In the Symbol font, it looks like code point 0xF9, which means that on Macintosh the font encodes it as U+F8F9.
&lfloor; is U+230A in Unicode. In the Symbol font, it looks like code point 0xEB, which means that on Macintosh the font encodes it as U+F8F0.
&rfloor; is U+230B in Unicode. In the Symbol font, it looks like code point 0xFB, which means that on Macintosh the font encodes it as U+F8FB.
Therefore, the way to fix this bug is to change the Unicode-to-Symbol-font mapping table so that the entries for 0xE9, 0xEB, 0xF9, and 0xFB map to U+2308, U+230A, U+2309, and U+230B respectively. To fix Mac, we should add four if-tests (note the comparisons must use ==, not =):
    if ((0x2308 <= u) && (u <= 0x230B)) {
        if (u == 0x2308)      u = 0xF8EE;  /* &lceil;  */
        else if (u == 0x2309) u = 0xF8F9;  /* &rceil;  */
        else if (u == 0x230A) u = 0xF8F0;  /* &lfloor; */
        else if (u == 0x230B) u = 0xF8FB;  /* &rfloor; */
    }
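The four mappings above could also be kept in one table, so that the Symbol-font change and the Mac remapping cannot drift apart. The C sketch below assumes that design. The struct and function names are hypothetical; only the code point and byte values come from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t unicode;  /* HTML 4 code point                */
    uint8_t  symbol;   /* byte in the Adobe Symbol font    */
    uint32_t mac_pua;  /* private-use code point on Mac    */
} bracket_map;

/* Indexed by (unicode - 0x2308), so it must stay in code point order. */
static const bracket_map kBrackets[] = {
    { 0x2308, 0xE9, 0xF8EE },  /* &lceil;  */
    { 0x2309, 0xF9, 0xF8F9 },  /* &rceil;  */
    { 0x230A, 0xEB, 0xF8F0 },  /* &lfloor; */
    { 0x230B, 0xFB, 0xF8FB },  /* &rfloor; */
};

static const bracket_map *find_bracket(uint32_t u) {
    if (u < 0x2308 || u > 0x230B)
        return NULL;
    const bracket_map *m = &kBrackets[u - 0x2308];
    assert(m->unicode == u); /* catches a mis-ordered table */
    return m;
}

/* Symbol-font byte for u, or -1 if u is not one of the four. */
static int symbol_byte_for(uint32_t u) {
    const bracket_map *m = find_bracket(u);
    return m ? m->symbol : -1;
}

/* Code point to hand to the Mac Symbol font; other characters pass through. */
static uint32_t mac_code_point_for(uint32_t u) {
    const bracket_map *m = find_bracket(u);
    return m ? m->mac_pua : u;
}
```

Indexing by offset from U+2308 replaces the if-chain with a single range check, and the assertion flags the case where someone reorders the rows.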
nsbeta3- per bug meeting (ekrock)
Whiteboard: Win: problems; Linux: problems; Mac: problems; → [nsbeta3-]Win: problems; Linux: problems; Mac: problems;
Accepting bug, but marking Future, since it's nsbeta3-.
Status: NEW → ASSIGNED
Target Milestone: M20 → Future
QA Contact: elig → teruko
Keywords: intl
Nominating for Mozilla1.0 as a polish/compliance issue.
Keywords: mozilla1.0
Frank, I'm reassigning this to you since you seem to know what to do, and I don't know my way around your Unicode conversion tables. Should this be marked nsbeta1?
Assignee: erik → ftang
Status: ASSIGNED → NEW
We probably should fix the Mac. That should be easy to do. Marking this bug P3, mozilla0.9, for the Mac enhancement part only.
Status: NEW → ASSIGNED
Keywords: nsbeta3 → nsbeta1
Whiteboard: [nsbeta3-]Win: problems; Linux: problems; Mac: problems;
Target Milestone: Future → mozilla0.9
Changed QA contact to andreasb@netscape.com for now.
QA Contact: teruko → andreasb
The fix for the &lceil; &rceil; &lfloor; &rfloor; display problem on Mac was checked in on 8/15/2000. It seems the only remaining issues are these on GTK.
OK, I also fixed GTK. Here is the patch.
bstell, can you review this?
Target Milestone: mozilla0.9 → mozilla0.8.1
sr=erik Looks good.
Depends on: 67374
Since this is only used for converting from Unicode to Adobe codes for display, this is okay. r=bstell@netscape.com
Fixed Linux lceil/rceil/rfloor/lfloor.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Verifying this bug, however see new bug report (bug 75059) which narrows down problematic characters.
Status: RESOLVED → VERIFIED