Closed Bug 6585 Opened 23 years ago Closed 22 years ago

missing glyphs should use substitute

Categories

(Core :: Layout, defect, P1)

x86
Windows NT
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: erik, Assigned: erik)

References

()

Details

(Keywords: css1, Whiteboard: [PDT-])

Attachments

(5 files)

If we cannot find any font that contains a glyph for the particular character
we are trying to measure/draw, then we should pop up a window that explains
where you can download such a font, similar to the window that comes up when
you visit a page containing a plug-in that you don't have yet.

Currently, the measuring code returns zero, and the drawing code doesn't
attempt to draw anything for such characters. (We might want to at least draw
some boxes in this case, like Windows does.)

For the pop-up window, there are various issues to consider:

What do we say to the user? "You don't have a font for the character U+1234"?
"Would you like to download a Japanese font?" Etc.

Do we automatically re-render the page after the user has downloaded the font?

How often do we trigger the pop-up? Once per page? Once per session? Once per
session for each character? For each "script" (Unicode range)? Etc.
Status: NEW → ASSIGNED
Target Milestone: M15
*** Bug 3852 has been marked as a duplicate of this bug. ***
OS: other → All
Summary: pop up window when no font found for particular character → {css1} missing glyphs should use substitute (U+FFFD)
The following page:
   http://www.bath.ac.uk/%7Epy8ieh/internet/eviltests/glyphs.html
...should help with this bug (it has some obscure maths codes in it).

You should definitely be drawing something for unknown glyphs. That is a {css1}
compliance issue. I am increasing the severity for the moment, since the missing
glpyh is an error, not an enhancement.

Technically, if you cannot display a glyph then you should be displaying the
glyph with unicode code point U+FFFD, REPLACEMENT CHARACTER. If that doesn't
map to anything, then, well... a space (U+0020) would be better than nothing!

A popup window would be a possibility, but see also the suggestion in bug 6211:
a window detailing all errors in the document (HTML, CSS, image downloading
problems, and so on). I will add "glyphs not found" to the list of errors that
should be reported.
Adding lyecies@netscape.com to cc list.
*** Bug 17567 has been marked as a duplicate of this bug. ***
Depends on: 19494
Here is the test case from bug 17567:
   http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2486
BTW, on my system IE5 has a really amusing attempt at this: it displays three
different "missing glyph" glyphs!!! :-) I'm hoping we will be slightly more
consistent than that...
Bug 17567 had a note saying that the code to fix this bug (or at least the part
covered by 17567) has already landed, but is #ifdef MOZ_MATHML.
Actually, the font code that Roger checked in for MathML was mostly written by
me, and was designed to be extensible, so that we could extend it to cover this
bug too. So, it isn't just a matter of making the #ifdef MATHML code be the
default. We need to write some more code.
Keywords: css1
Migrating from {css1} to css1 keyword. The {css1}, {css2}, {css3} and {css-moz}
radars should now be considered deprecated in favour of keywords.
I am *really* sorry about the spam...
Moving all of my M15s to M16. Please add comments if you disagree.
Target Milestone: M15 → M16
Summary: {css1} missing glyphs should use substitute (U+FFFD) → missing glyphs should use substitute (U+FFFD)
I would hope to see this fixed before M16.  How hard is it?  I would nominate it
for beta1 if I were more sure that it isn't that complicated...  It's not a
minor issue, especially when reading texts in certain languages (that are not
the user's default language).
Beta1 candidacy is not so much dependent on how hard it is to fix it, but how
much we need it for "beta quality", which is of course difficult to define, but
the PDT team has a pretty good sense of what is important and what isn't. So,
how about just nominating for Beta1, by adding the beta1 keyword above?
OK, nominating for beta1, because I think it's important that the user knows
when bits of text are missing.
Keywords: beta1
Agreed ... it's frustrating that there's content on the page that mozilla just
doesn't show at all.  4.x shows them wrong, but at least it shows that there's
something there.
PDT indicated it would not hold beta for this... so PDT-
Whiteboard: [PDT-]
Perhaps a tooltip can be displayed when hovering over a missing glyph (U+FFFD), e.g. _Latin small letter e with inverted breve_ for U+0207.
How can you hover over something that's not there at all?  And how would the
user know where to hover?
> How can you hover over something that's not
> there at all?  And how would the
> user know where to hover?

When a glyph can't be found, the REPLACEMENT CHARACTER U+FFFD should be displayed. When hovering over this, the name of the missing glyph can be shown as a tooltip. Hope this makes things clearer.
Erik, now that I am wading into fonts again for MathML add-ons, things are
pretty fresh in my memory, and I may lay a helping hand w.r.t. this bug in
the Win32 corner.

The fix has to be in FindGlobalFont(), right? Rather than return nsnull, return
the first font that has the Unicode's REPLACEMENT CHAR?
Thinking again, just returning a font there is not enough! Since this doesn't 
tell to other rendering/measuring functions that that particular char has to be
re-mapped to the missing char.
Right, you can't just return a font that contains a glyph for U+FFFD. I would
rather make it more general, and return an object that implements the GetWidth
and DrawString APIs, but isn't actually a font on the inside. It is just a
couple of methods that deal with rectangular boxes on the screen. I.e. the
GetWidth method returns the width of the boxes that would be drawn by the
DrawString method.

Also, we want to create another subclass that maps Unicodes to other character
codes for which we *do* have glyphs on the system. E.g. Unix, with limited
iso8859-1 fonts, where windows-1252 docs can't be displayed normally.

But a box-drawing subclass is a good first step, and I think we need it anyway.
Got a fix in my tree. As you suggested, I added another subclass
(nsFontWinSubstitute) which acts as a fall-back for chars without glyphs.
Great job with this foundation, erik!

I didn't implement a drawing box font. I am simply using an existing
font (out of the global list) that has the replacement char. 
This font is added in mLoadedFonts[], but its map only contains chars
without glyphs, and it is not added in the hashtable (this way, it cannot
be looked up, and therefore it will not interfere with the other font
switching tricks, and its name doesn't matter). 

The way to get hold of the font is with something like
mLoadedFonts[mIndexOfSubstituteFont], which is then used to 
instantiate a nsFontWinSubstitute that has its own GetWidth/DrawString.
[e.g., GetWidth('xyz') actually does GetWidth('???'),
DrawString('xyz') actually does DrawString('???')];

Since the font is kept in mLoadedFonts[], subsequent searches for the
same chars without glyphs are speedy, and return the substitute font
straight away.
Should this go in M14?
The PDT team has already looked at this and marked it PDT- (i.e. not a
candidate for Beta 1).

Your work *is* appreciated though, especially by me!
While the tree is still open (i.e., today), why should not being marked PDT+
prevent someone who doesn't work for Netscape from checking in the fix (assuming
it's reviewed, etc.)?  Some of us do think this should definitely be in beta1.
I wondered that as well.  The tree isn't closed to non-PDT+ fixes, it's just
that Netscape engineers aren't supposed to be spending time tracking down
non-PDT+ fixes.  Or at least, that's been my understanding.
I don't have time to review the code since I'm working on PDT+ bugs. If
somebody else does have time to review it, and both Roger and the reviewer can
promise that this code will not cause other bugs in M14 or cause performance
problems, then I may give my reluctant "OK". I'm not trying to be mean here.
I've had years of experience shipping betas, and this is not the right time to
introduce significant numbers of lines of code in sensitive areas for features
that are not even deemed PDT+ by the PDT team. When you are close to the
deadline, a fix must be for something that is *both* actually needed *and*
very safe and understood.
By the way, Windows GFX changes now require Troy's review, and Kevin McCluskey
is the GFX owner, so his review is also required.
By the way, as Jim Roskind said in mozilla.seamonkey, M14 may not become
Netscape's Beta 1. If M14 has too many beta-stoppers, M15 will be the target.
So there is still some chance that this fix will make Netscape's Beta 1, and
even if it doesn't we can still get it into Beta 2.
If you can post the diffs and get Troy and Kevin McCluskey to review the
changes, go for it...
cc'd troy and kmcclusk
Attached file diff in gfx
Attached file diff in intl
Ok guys, now you have everything. Upon review, the decision is yours.
I am pretty confident that the code is now solid. It makes the browsing
experience quite nice. Erik has been very helpful in providing insights
and a great deal of feedback during development. I understand his prudence,
and if the code has to wait, no problemo.

The part in intl consists in adding extra mapping tables to enable
the conversion of further math fonts.

These fonts can be enabled by default in gfx by removing the
#ifdef MOZ_MATHML that is in gFontsHaveConverters, if you deem
appropriate. Screenshots of the testcases when these were enabled
and disabled were attached.

The fact that some math chars "overflow" in the testcase is due to the 
FindFont('a') hack --it returns the "ascii" metrics that are used as the
default metrics. This is a recorded bugzilla bug 20394 which erik plans to 
address with a GetDimensions() method that will supersede the current 
GetHeight/GetWidth.
Of note, the patch for gfx can be tried in *isolation*.
So, feel free to apply it on gfx, and report what you see (on win32)...
Severity: normal → major
Priority: P3 → P1
Target Milestone: M16 → M15
The bug is scheduled for M15, meaning there is time for improvements on the
present solution, if worthwhile. Currently, since "viewsource" goes through the
same rendering process, it means that chars without glyphs are still renderered
with the REPLACEMENT CHAR, even within "viewsource". Is it, instead, desirable 
to display the unicode points there?

Implementing this is not really demanding, because the font engine knows 
everything, and can either choose to replace the chars, or render their
unicode points. However, "viewsource" is currently known to be notoriously slow
and the question is whether it is worthwhile doing these things that will make
it even slower. Also, the font engine will then have to know that the consumer 
is "viewsource"

BTW, is there a common unicode font that actually has a glyph for the Unicode's
replacement char U+FFFD?
nsFont.decorations is a PRUint8 bit-mask that could be used for your View Source
idea. If we had 2 bits, we could have 4 modes:

(1) show glyphs as usual (like browser)
(2) unavailable glyphs are shown as Unicodes
(3) all non-ASCII chars (> 0x7F) are shown as Unicodes
(4) all chars are shown as Unicodes

Number (4) is not that useful though...
Sounds good. Some comments/clarifications:

(1) show glyphs as usual (like browser). All missing chars show up as the 
replacement char (this is the current behavior --not really helpful,
but can be left as an option, as it is the fatest mode, and is a sort of
WYSIWYG too!).

(2) unavailable glyphs are shown as Unicodes -- this mode allow
copy-pasting of the screen content of viewsource.

(3) all non-ASCII chars (> 0x7F) are shown as Unicodes -- Not very
sure how helpful is this. So the reformulation:

(3) all non-ASCII chars (> 0x7F) are shown as both Unicodes and 
Glyphs, in the format "0xUnicode{Glyph}". Hence a char withouht glyph
will show up as "0xUnicode{REPLACEMENT_CHAR}".
I now have all this coded up in my tree, and there are in sync with the changes
Troy made recently --bug 27056, to optimize the use of SelectObject().

For (1), that was the original code. I didn't have to do anything --save
bringing the code in sync.

For (3), this is coded in nsRenderingContextWin. I added a convenience function
SubstituteNonAsciiChars() that checks to see if some non-ascii chars have to be
substituted. And... it is the output of this transformation that is used in the
body of GetWidth and DrawString. (This way, the font-swithing machinery will
pick the glyphs... For example, "Å" will appear as "ÅU+000C5". "Å" could be any
strange glyph that could be picked in any font.) The SubstituteNonAsciiChars() 
function is carefully optimized and does nothing if nothing has to be done 
(e.g., when the current text run consists only of ascii chars).

Concretely, the code of GetWidth(aString) and DrawString(aString) now loos like:
  PRUnichar* pstr = aString;
  if (shouldSubstituteNonAsciiChars) {
    pstr = SubstituteNonAsciiChars(aString);
  }
  ... use the *same* code as before to do the measurement/rendering using pstr

For (2), this is coded by nsFontWinSubstitute in nsFontMetricsWin.
(Doing so fulfills the intended purpose because nsFontWinSubstitute only
applies to chars without glyphs.) The code of GetWidth(aString) and
DrawString(aString) there now loos like:
  PRUnichar* pstr = aString;
  if (shouldDisplayUnicode) {
    ... sprintf Unicode points
  }
  else {
    ... default to the REPLACEMENT CHAR
  }
  ... use the *same* code as before to do the measurement/rendering using pstr

====
It is interesting to see the HEX (huh, Unicode) modes in action, and as you can
see from the description above, the code is solidly standing upon the existing
foundation and should therefore be as robust as the foundation :-)
That's why I am pretty confident that the code is OK and should go in M15 as
early as possible to enable further testing and enjoyement by others.

What is missing in the code is how to activate these 'shouldDisplayUnicode'
or 'shouldSubstituteNonAsciiChars'.
I just ran across this font problem while porting to the Photon UI,
which may be the same as this problem, on articles at 
www.abcnews.com I see HTML that looks like:

<font face= geneva,arial,helvetica size=5><b>
<a href="/sections/us/DailyNews/bettyloubeets000218.html" >
Weighing &#145;Black Widow&#146;s&#146; Fate
</a>
</b></font><br>

The "&#145;" character looks like an arrow pointing upward
under Photon. It is correct under Windows, Mac and Linux
Netscape 4.7 but it is not correct under Photon or GTK Mozilla.

I found out that numeric entities between 130-159 are special
and need to be translated. Check out this link for more info:

http://www.hut.fi/u/jkorpela/www/windows-chars.html

I guess I can put some sort of translation in my own GFX code
to do the translation but I was not sure if the purpose and fix
for this bug would also fix this problem.

Does this bug encompass this problem??
Those numeric entities should not be supported in strict mode, since they are
undefined.  (However, as characters in the document, they could be supported if
it's in a windows encoding, rather than iso-8859-1.)
I am not sure if these codes are treated as special cases somewhere, and
re-mapped to their proper unicode values --At the level of GFX they are not
treated as special cases.
NCRs (numeric character references) like &#145; are indeed illegal, but very
common on the Net. Our parser converts them to the correct Unicodes, and GFX
then tries to find fonts with glyphs for those Unicodes. On Unix we have for a
long time had the problem that Windows extensions in the 0x80-0x9F range could
not be displayed. This bug report does indeed refer to that kind of problem.
There are several bug reports assigned to me in this area, some Unix-specific.
BTW, we still need additional API to really nail this bug 6585 down.
The substitution mode is not yet defined and I saw in bug 5100
that peterl didn't like seeing nsFont used for extra things.
So we will need some other ways to pass rendering options in GFX.
There could be a SetFlags() (or whatever) to specify rendering
policies. 

RFE: going back to the initial questions raised in this bug report, an 
interesting option could be to pool/sort all the missing chars, in view of
possibly sending a one-off feedback to the user (bug 6211, with a listing of
names?), or in view of determining the fonts that can be auto-downloaded (or
suggested for download). Note that it would be hard to implement a 
tooltip --compared to improving the viewsource trick to show their unicode 
in an ERROR-type color, but even achieving this is not straightforward.
This bug is now for Windows only, and assigned to Roger for M16 (i.e. after
Beta 1 branch). I created bug 31252 for Unix. Mac already has substitutes.
Assignee: erik → rbs
Status: ASSIGNED → NEW
OS: All → Windows NT
Hardware: All → PC
Summary: missing glyphs should use substitute (U+FFFD) → missing glyphs should use substitute
Target Milestone: M15 → M16
Code checked-in. Chars with missing glyphs are now substituted.

Currently only one replacement mode is activated.
Additional API is needed to switch to different modes on the fly.
Roger, thanks for taking on this work, and for checking it in. This bug report
is out of control, and we ought to create separate bug reports for the
remaining issues. Re-assign the bug back to me if you'd like me to do that.
Re-assigning to erik to better organise where to head to:

* Use substitute as per bug title (done).
* API to flag different rendering policies (e.g., could be with 
a general struct, or for a start, a bitarray with some bits for
the replacement mode)
* Visual feedback of missing chars (Error-color for ViewSource? popup?)
* Auto-download of fonts ?
* Error log ? (collect all missing chars -- bug 6211) ?
Assignee: rbs → erik
At some point, I will look into this bug, and create one or more new bugs to
cover the remaining issues. For now, setting milestone to M17.
Status: NEW → ASSIGNED
Target Milestone: M16 → M17
I'd prefer we think about the design sooner, so we can define and prioritize
tasks and possibly delegate them as well...
I did similar thing on Mac already. However, we should do special process for
"space" characters in the range of U+2000-U+206F. (some of them, not all of
them). There are no reason we cannot "render"/"measure" a space characters, even
non of the font have these glyph (a glyph for a space character is simply a
matrix without any countor....)
This bug is *way* out of control. Please do not add any more comments to it.
I have created a new bug for the Unicode space character issue:

  http://bugzilla.mozilla.org/show_bug.cgi?id=33032
Please do not add any more comments to this bug. Marking FIXED (since we now at
least get question marks for unavailable glyphs). Other issues have been
separated out. Thanks, Shrirang!

From: shrir@netscape.com (Shrirang Khanzode)
  
I have seperated the other issues from bug 6585. 
Bugs logged: 
Bug 41251 : API to flag rendering policies 
Bug 41249: Visual feedback for missing characters 
Bug 41250 : Auto download of fonts 
(Filed already) Bug 6211 - Error log to collect missing characters
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Target Milestone: M17 → M16
Marking fixed in the July 6th build.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.