Closed
Bug 17962
Opened 25 years ago
Closed 24 years ago
Display all HTML 4 character entities in browser correctly
Categories
(Core :: Internationalization, defect, P3)
Tracking
()
VERIFIED
FIXED
mozilla0.8.1
People
(Reporter: sidr, Assigned: ftang)
Details
(Keywords: html4, intl)
Attachments
(7 files)
All of the HTML 4 character entities should display a useful and meaningful glyph when referenced in an HTML file.
Reporter | ||
Comment 1•25 years ago
|
||
Reporter | ||
Comment 2•25 years ago
|
||
At present, there are at least 3 issues outstanding.
1. On Windows NT (and 95) the "Miscellaneous Technical" char entities testcase crashes the browser, and the browser hangs instead of displaying the "ISO 8859-1" char entities, preventing inspection of those testcases on Win32. This is bug 17958.
2. On Linux, &bull; displays as "&bull;" instead of as a bullet. This is bug 16872. Other character entities may be affected as well; testing the testcases linked from the attachment above will tell the tale.
3. On Linux, &ldquo; and &rdquo; (left and right double quotes) delay loading of straightforward pages for tens of seconds on at least some machines, if they end up being found in an ISO 10646-1 (Unicode) character set, due to the sheer size of the Unicode set. This is bug 14961. On Windows and Mac, these characters are part of the first 256 characters in any set. This could be a general problem for other platforms that use ISO 8859-1, rather than the Windows or Mac adaptations, as their default character set. For practical purposes, if character entities take tens of seconds to find, that can't be considered adequate support. Also, even if this took less time, it is less than ideal to use a different font for the glyphs for only a few characters.
Issue 1 is waiting on bug 17958. Issues 2 & 3 are of unknown severity at present. The only simple way to find out their severity would be to view the character entity testcases on several platforms other than Windows and Mac.
BTW, the component is set to "internationalization" not because this is of consequence only for i18n (this is really a "Browser-General" problem), but because presumably that team is already working with character entities and character sets.
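Issue 3 turns on which single-byte character sets contain the curly quotes and the bullet at all. The following is a minimal sketch in C of that distinction. The function names are hypothetical, only the three characters discussed above are covered, and the byte values are the standard Windows-1252 and MacRoman positions rather than anything taken from Mozilla's code.

```c
#include <stdint.h>

/* ISO 8859-1 maps bytes 0x00-0xFF directly onto U+0000-U+00FF, so any
 * code point above U+00FF has no Latin-1 byte at all. */
static int in_iso_8859_1(uint32_t cp)
{
    return cp <= 0xFF;
}

/* Windows-1252 byte for the three characters discussed above, or -1.
 * This is not a full converter. */
static int windows_1252_byte(uint32_t cp)
{
    switch (cp) {
    case 0x201C: return 0x93; /* &ldquo; */
    case 0x201D: return 0x94; /* &rdquo; */
    case 0x2022: return 0x95; /* &bull;  */
    default:     return -1;
    }
}

/* MacRoman byte for the same three characters, or -1. */
static int mac_roman_byte(uint32_t cp)
{
    switch (cp) {
    case 0x201C: return 0xD2; /* &ldquo; */
    case 0x201D: return 0xD3; /* &rdquo; */
    case 0x2022: return 0xA5; /* &bull;  */
    default:     return -1;
    }
}
```

On a platform whose default set is plain ISO 8859-1, all three characters fail the first check, which is why the lookup has to fall through to a much larger Unicode font.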
Assignee | ||
Updated•25 years ago
|
Assignee | ||
Comment 3•25 years ago
|
||
This looks like a tracking bug rather than a real bug to me. All the problems mentioned in this bug report have separate bug numbers associated with me. Marking this M20 since it is a tracking bug. The separate bugs should have different M numbers since they should be fixed earlier. Reassigning this bug to erik, even though it is a tracking bug. Most of the issues mentioned here are GFX issues.
Reporter | ||
Comment 4•25 years ago
|
||
Sorry, yes, this is *also* a tracking bug, but it probably shouldn't be. Resolving the three bugs referred to won't invalidate this report. The only thing that can invalidate this report is testing. At its core, this bug is the general case for bug 16872, where &bull; displays incorrectly on at least one platform. The reasoning: where there's smoke, there may be fire. The real question is, do all of the HTML 4 character entities that should display something display something useful on all Platform/OS combos?
Updated•25 years ago
|
Status: NEW → ASSIGNED
Comment 5•25 years ago
|
||
Bug 16872, which this bug depends on, was resolved as a dup of bug 454. Updating dependencies.
Comment 6•24 years ago
|
||
We need to add transliterations for all of the HTML4 CERs to the transliteration table. See also bug 33498 and bug 33501, which were created to track the addition of transliteration to the Windows and Mac versions of the font engine. (The Unix version already calls that API.)
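As a rough illustration of what a transliteration table for character entity references might look like, here is a minimal sketch in C. The struct, the function name, and every fallback string are assumptions made for the example; none of them is taken from Mozilla's actual table or API.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative entries only; these fallbacks are assumptions, not the
 * contents of Mozilla's actual transliteration table. */
struct translit_entry {
    uint32_t    code_point;
    const char *ascii;
};

static const struct translit_entry kTranslit[] = {
    { 0x2014, "--"   },  /* &mdash; */
    { 0x201C, "\""   },  /* &ldquo; */
    { 0x201D, "\""   },  /* &rdquo; */
    { 0x2022, "*"    },  /* &bull;  */
    { 0x2122, "(TM)" },  /* &trade; */
};

/* Returns an ASCII stand-in for cp, or NULL if the table has no entry. */
static const char *transliterate(uint32_t cp)
{
    size_t i;
    for (i = 0; i < sizeof kTranslit / sizeof kTranslit[0]; i++) {
        if (kTranslit[i].code_point == cp)
            return kTranslit[i].ascii;
    }
    return NULL;
}
```

A font engine would presumably consult such a table only after failing to find a real glyph for the code point in any available font.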
Comment 7•24 years ago
|
||
Many entities being displayed correctly in M16 have "broken" in the nightly I'm using now (ID 2000071620). They appear as inverted solid triangles. There are more complete entity reference pages, but mine is at http://www.r5i.com/~tim/symbols.shtml, or see the letterlike symbols, math symbols, and arrow in the test case attachment. Sorry if this is the wrong place for this. It's the closest I could find in my Bugzilla search.
These looked OK for me with last week's 2000071108 build on US Win95, but I see the inverted triangles with today's build, 2000071709. Reassigned to ftang because erik just left for sabbatical.
Assignee: erik → ftang
Status: ASSIGNED → NEW
Reporter | ||
Comment 10•24 years ago
|
||
Testing with the 2000-07-17-09-M17 nightly binary on WinNT, 53 of the HTML 4 character entities display the same, incorrect, glyph, one that looks like a bold, bold left single quote mark. None of the ISO-8859-1 characters are affected. The affected symbols are mostly mathematical or quasi-mathematical. They are found in:
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2604
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2606
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2607
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2608
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2610
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2616
Nominating for nsbeta3 - surely reviewers will subject it to test suites, and this is a very basic test to be failing.
Looking at comments in bug 45543, the full fix for this will probably be waiting until Erik gets back: see the Additional Comments From rbs@maths.uq.edu.au 2000-07-17 11:18. This bug would almost certainly depend on bug 45543 except that there are no Korean characters in the HTML 4 set; the root problem looks to be the same.
rbs@maths.uq.edu.au, is the current problem with the math glyphs a blocker for you?
Updating Platform/OS to All/All, as this will need to be verified everywhere for HTML 4.0 compliance.
Comment 11•24 years ago
|
||
> is the current problem with the math glyphs a blocker for you?
No, it isn't. The hack I indicated on 45543 is temporarily doing the trick.
When I visit the links you gave above, they look okay (with missing glyphs
represented by '?' as expected). Also, MathML-enabled builds include the ucvmath
module which gives access to more mathematical/scientific symbols for those who
have the corresponding fonts.
(Notice that the default Mozilla can display many of the HTML4 symbols if
the user has the "Lucida Unicode Sans" font.)
Assignee | ||
Comment 12•24 years ago
|
||
Reassigning back to erik. It seems the fix for 45543 is good for the short term, and we should wait for erik to return to fix the rest.
Assignee: ftang → erik
Comment 13•24 years ago
|
||
This now WORKSFORME completely on Windows 2000 commercial build 6.0.17.2000080104. Should this be closed, or are there remaining issues?
Whiteboard: WORKSFORME?
Reporter | ||
Comment 14•24 years ago
|
||
> Should this be closed, or are there remaining issues?
Not quite yet, and who knows? To date the greatest number of font display problems have occurred on Linux, but this also needs testing on Mac to be sure that all of the character entities display properly on the Tier 1 builds. For font issues, testing on Windows only is not enough.
Whiteboard: WORKSFORME? → Win: WFM; Linux: ???; Mac: ???;
Comment 15•24 years ago
|
||
Agreed. Eli, can you verify that this is WORKSFORME on all three tier 1 platforms? The attachment is a set of links to the comprehensive testcases you were after the other day...
QA Contact: teruko → elig
Comment 16•24 years ago
|
||
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2608 does not seem to be the right testcase
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2610 I get '?'s for the first 4 entities
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2616 I get '?' for zwnj through rlm.
Linux build 2000.08.02.08 on RH 6.2
Reporter | ||
Comment 17•24 years ago
|
||
Reporter | ||
Comment 18•24 years ago
|
||
The second attachment shows all of the HTML 4 character entities in named and numeric form in one testcase; now that random entities aren't crashing Mozilla, that's feasible and convenient.
New testcase for spacing and zero-width characters in text: http://bugzilla.mozilla.org/showattachment.cgi?attach_id=12289
Testing with 2000-08-02-08-M18 shows the "Windows" results below.
Remaining problem character entities:
On Linux: &lceil; &rceil; &lfloor; &rfloor; &zwnj; &zwj; &lrm; &rlm;
On Windows: &zwnj; &zwj; &lrm; &rlm;
On Mac: as yet unknown
Richard: yeah, 2608 is not a character entity testcase; never was; mea culpa typoa: should have been 2609.
Whiteboard: Win: WFM; Linux: ???; Mac: ???; → Win: problems; Linux: problems; Mac: ???;
Assignee | ||
Comment 19•24 years ago
|
||
bug 47714 is about Mac and Symbol entity set.
Reporter | ||
Comment 20•24 years ago
|
||
Updated remaining problem character entities:
On Linux: &lceil; &rceil; &lfloor; &rfloor; &zwnj; &zwj; &lrm; &rlm;
On Windows: &zwnj; &zwj; &lrm; &rlm;
On Mac: &lceil; &rceil; &lfloor; &rfloor; (Miscellaneous Technical)
It is entirely possible that the problems with the Miscellaneous Technical glyphs on Linux have a similar cause to the codepoint translation problem on Macs (bug 47714).
Note that &zwnj;, &zwj;, &lrm;, and &rlm; are displaying properly as "nothing" in the "General Punctuation" table in the second attachment, but in the "Spacing and Zero-width Characters", either a "?" or a thin vertical bar appears when they are placed in text, and these characters' normal habitat is in the midst of printable text.
Reporter | ||
Comment 21•24 years ago
|
||
It always helps, when evaluating testcase results, to know what to expect. For &lrm; and &rlm;, it appears that visible glyphs looking almost like thin vertical bars, with tiny right- and left-pointing arrows at the top, should be expected. To see this clearly, view http://www.hclrss.demon.co.uk/demos/ent4_frame.html , scroll down in the left frame to _left-to-right mark_, and click on that link. Looking carefully at the in-text testcase (end of second attachment), the same glyphs are shown on WinNT testing with 2000-08-09-08-M18 -- they are smaller, and butted against the adjacent text characters, but the characters are clearly *not* just thin vertical bars. On the other hand, these characters mysteriously do not appear when they are the only content of a table cell.
Assignee | ||
Comment 22•24 years ago
|
||
I don't think it is reasonable to fix &zwnj; &zwj; &lrm; &rlm; for any platform. These are control characters, and their rendering should not be tested alone; instead, their appearance should change depending on the surrounding characters. There is no visual requirement for how to display them ALONE. Some applications and OSes display them one way or the other. What matters is how they change the rendering of the surrounding characters. For example, they should change the display behavior of Arabic and Indic scripts.
The &lceil; &rceil; &lfloor; &rfloor; issue should be possible to fix if we remap according to the HTML mapping instead of the Adobe mapping. We need to remap the Mac code as well, but that should be easy.
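For reference, the four entities in question occupy one contiguous Unicode range: &zwnj; is U+200C, &zwj; is U+200D, &lrm; is U+200E, and &rlm; is U+200F. The following is a minimal sketch in C of how a drawing step might leave them out of the visible output. The function names are hypothetical, and the sketch assumes that shaping and bidi processing have already seen the unfiltered text, since that is where these characters do their work.

```c
#include <stddef.h>
#include <stdint.h>

/* True for &zwnj; (U+200C), &zwj; (U+200D), &lrm; (U+200E), &rlm; (U+200F). */
static int is_zero_width_format_char(uint32_t cp)
{
    return cp >= 0x200C && cp <= 0x200F;
}

/* Copies in[0..n) to out, leaving out the four format characters.
 * out must have room for n code points. Returns the number copied.
 * Hypothetical helper: intended only for the final glyph-drawing step. */
static size_t copy_drawable(const uint32_t *in, size_t n, uint32_t *out)
{
    size_t i, kept = 0;
    for (i = 0; i < n; i++) {
        if (!is_zero_width_format_char(in[i]))
            out[kept++] = in[i];
    }
    return kept;
}
```

Filtering any earlier than the drawing step would defeat the purpose described above, because the surrounding Arabic or Indic text would no longer be shaped differently.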
Assignee | ||
Comment 23•24 years ago
|
||
&lceil; is U+2308 in Unicode. Looking at the Symbol font, it appears to be at code point 0xE9, which means that on Macintosh the font encodes it as U+F8EE.
&rceil; is U+2309 in Unicode. Looking at the Symbol font, it appears to be at code point 0xF9, which means that on Macintosh the font encodes it as U+F8F9.
&lfloor; is U+230A in Unicode. Looking at the Symbol font, it appears to be at code point 0xEB, which means that on Macintosh the font encodes it as U+F8F0.
&rfloor; is U+230B in Unicode. Looking at the Symbol font, it appears to be at code point 0xFB, which means that on Macintosh the font encodes it as U+F8FB.
Therefore, the way to fix this bug is to change the Unicode-to-Symbol-font mapping table so that the entries for 0xE9, 0xEB, 0xF9, 0xFB map to 2308, 230a, 2309, 230b.
To fix Mac, we should add four ifs:
/* Remap the ceiling/floor code points to the values the Mac Symbol font uses. */
if ((0x2308 <= (u)) && ((u) <= 0x230b)) {
  if (u == 0x2308) u = 0xf8ee;
  else if (u == 0x2309) u = 0xf8f9;
  else if (u == 0x230A) u = 0xf8f0;
  else if (u == 0x230B) u = 0xf8fb;
}
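The two mappings described in this comment can also be kept in a single table. The following is a minimal sketch in C under that assumption: the struct and function names are hypothetical, and the values are exactly the ones listed in the comment. It is not the patch that was checked in.

```c
#include <stddef.h>
#include <stdint.h>

/* One row per ceiling/floor character, using the values from the comment. */
struct ceil_floor_map {
    uint16_t unicode;      /* standard Unicode code point */
    uint8_t  symbol_byte;  /* position in the Symbol font */
    uint16_t mac_private;  /* value the Mac Symbol font is encoded with */
};

static const struct ceil_floor_map kCeilFloor[] = {
    { 0x2308, 0xE9, 0xF8EE },  /* &lceil;  */
    { 0x2309, 0xF9, 0xF8F9 },  /* &rceil;  */
    { 0x230A, 0xEB, 0xF8F0 },  /* &lfloor; */
    { 0x230B, 0xFB, 0xF8FB },  /* &rfloor; */
};

#define CEIL_FLOOR_COUNT (sizeof kCeilFloor / sizeof kCeilFloor[0])

/* Symbol font byte for u, or -1 if u is not one of the four characters. */
static int unicode_to_symbol_byte(uint16_t u)
{
    size_t i;
    for (i = 0; i < CEIL_FLOOR_COUNT; i++) {
        if (kCeilFloor[i].unicode == u)
            return kCeilFloor[i].symbol_byte;
    }
    return -1;
}

/* Mac remapping: returns the private-use value for the four characters
 * and passes every other code point through unchanged. */
static uint16_t unicode_to_mac_private(uint16_t u)
{
    size_t i;
    for (i = 0; i < CEIL_FLOOR_COUNT; i++) {
        if (kCeilFloor[i].unicode == u)
            return kCeilFloor[i].mac_private;
    }
    return u;
}
```

Keeping all three columns in one place makes it harder for the Mac and Gtk code paths to drift apart, at the cost of a linear scan that is negligible for four entries.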
Assignee | ||
Comment 24•24 years ago
|
||
nsbeta3- per bug meeting (ekrock)
Whiteboard: Win: problems; Linux: problems; Mac: problems; → [nsbeta3-]Win: problems; Linux: problems; Mac: problems;
Comment 25•24 years ago
|
||
Accepting bug, but marking Future, since it's nsbeta3-.
Status: NEW → ASSIGNED
Target Milestone: M20 → Future
Updated•24 years ago
|
QA Contact: elig → teruko
Comment 26•24 years ago
|
||
Nominating for Mozilla1.0 as a polish/compliance issue.
Keywords: mozilla1.0
Comment 27•24 years ago
|
||
Frank, I'm reassigning this to you since you seem to know what to do, and I don't know my way around your Unicode conversion tables. Should this be marked nsbeta1?
Assignee: erik → ftang
Status: ASSIGNED → NEW
Assignee | ||
Comment 28•24 years ago
|
||
We should probably fix the Mac. That should be easy to do. Marking this bug as P3 mozilla0.9 for only the Mac enhancement part.
Comment 29•24 years ago
|
||
Changed QA contact to andreasb@netscape.com for now.
QA Contact: teruko → andreasb
Assignee | ||
Comment 30•24 years ago
|
||
The fix for the &lceil; &rceil; &lfloor; &rfloor; display problem on Mac was checked in on 8/15/2000. It seems the only remaining issues are these on Gtk.
Assignee | ||
Comment 31•24 years ago
|
||
OK, I also fixed Gtk. Here is the patch.
Assignee | ||
Comment 32•24 years ago
|
||
Assignee | ||
Comment 33•24 years ago
|
||
Assignee | ||
Comment 34•24 years ago
|
||
Assignee | ||
Comment 35•24 years ago
|
||
Assignee | ||
Comment 36•24 years ago
|
||
Assignee | ||
Comment 37•24 years ago
|
||
bstell, can you review this?
Target Milestone: mozilla0.9 → mozilla0.8.1
Comment 38•24 years ago
|
||
sr=erik Looks good.
Comment 39•24 years ago
|
||
Since this is only used for converting from Unicode to Adobe code for display, this is okay. r=bstell@netscape.com
Assignee | ||
Comment 40•24 years ago
|
||
Fixed Linux lceil/rceil/rfloor/lfloor.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 41•23 years ago
|
||
Verifying this bug, however see new bug report (bug 75059) which narrows down problematic characters.
Status: RESOLVED → VERIFIED