6585 - missing glyphs should use substitute

Assignee

Description

•

25 years ago

If we cannot find any font that contains a glyph for the particular character
we are trying to measure/draw, then we should pop up a window that explains
where you can download such a font, similar to the window that comes up when
you visit a page containing a plug-in that you don't have yet.

Currently, the measuring code returns zero, and the drawing code doesn't
attempt to draw anything for such characters. (We might want to at least draw
some boxes in this case, like Windows does.)

For the pop-up window, there are various issues to consider:

What do we say to the user? "You don't have a font for the character U+1234"?
"Would you like to download a Japanese font?" Etc.

Do we automatically re-render the page after the user has downloaded the font?

How often do we trigger the pop-up? Once per page? Once per session? Once per
session for each character? For each "script" (Unicode range)? Etc.

Erik van der Poel

Assignee

Updated

•

25 years ago

Status: NEW → ASSIGNED

Target Milestone: M15

Hixie (not reading bugmail)

Comment 1

•

25 years ago

*** Bug 3852 has been marked as a duplicate of this bug. ***

Hixie (not reading bugmail)

Updated

•

25 years ago

URL: http://www.bath.ac.uk/%7Epy8ieh/inter...

OS: other → All

Summary: pop up window when no font found for particular character → {css1} missing glyphs should use substitute (U+FFFD)

Hixie (not reading bugmail)

Comment 2

•

25 years ago

The following page:
   http://www.bath.ac.uk/%7Epy8ieh/internet/eviltests/glyphs.html
...should help with this bug (it has some obscure maths codes in it).

You should definitely be drawing something for unknown glyphs. That is a {css1}
compliance issue. I am increasing the severity for the moment, since the missing
glpyh is an error, not an enhancement.

Technically, if you cannot display a glyph then you should be displaying the
glyph with unicode code point U+FFFD, REPLACEMENT CHARACTER. If that doesn't
map to anything, then, well... a space (U+0020) would be better than nothing!

A popup window would be a possibility, but see also the suggestion in bug 6211:
a window detailing all errors in the document (HTML, CSS, image downloading
problems, and so on). I will add "glyphs not found" to the list of errors that
should be reported.

Erik van der Poel

Assignee

Comment 3

•

25 years ago

See also:

  http://www.w3.org/TR/REC-html40/charset.html#h-5.4

leger

Comment 4

•

25 years ago

Adding lyecies@netscape.com to cc list.

Hixie (not reading bugmail)

Comment 5

•

25 years ago

*** Bug 17567 has been marked as a duplicate of this bug. ***

Hixie (not reading bugmail)

Updated

•

25 years ago

Depends on: 19494

Hixie (not reading bugmail)

Comment 6

•

25 years ago

Here is the test case from bug 17567:
   http://bugzilla.mozilla.org/showattachment.cgi?attach_id=2486
BTW, on my system IE5 has a really amusing attempt at this: it displays three
different "missing glyph" glyphs!!! :-) I'm hoping we will be slightly more
consistent than that...

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 7

•

25 years ago

Bug 17567 had a note saying that the code to fix this bug (or at least the part
covered by 17567) has already landed, but is #ifdef MOZ_MATHML.

Erik van der Poel

Assignee

Comment 8

•

25 years ago

Actually, the font code that Roger checked in for MathML was mostly written by
me, and was designed to be extensible, so that we could extend it to cover this
bug too. So, it isn't just a matter of making the #ifdef MATHML code be the
default. We need to write some more code.

Hixie (not reading bugmail)

Updated

•

25 years ago

Keywords: css1

Hixie (not reading bugmail)

Comment 9

•

25 years ago

Migrating from {css1} to css1 keyword. The {css1}, {css2}, {css3} and {css-moz}
radars should now be considered deprecated in favour of keywords.
I am *really* sorry about the spam...

Erik van der Poel

Assignee

Comment 10

•

25 years ago

Moving all of my M15s to M16. Please add comments if you disagree.

Target Milestone: M15 → M16

Michael Lowe

Updated

•

25 years ago

Summary: {css1} missing glyphs should use substitute (U+FFFD) → missing glyphs should use substitute (U+FFFD)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 11

•

25 years ago

I would hope to see this fixed before M16.  How hard is it?  I would nominate it
for beta1 if I were more sure that it isn't that complicated...  It's not a
minor issue, especially when reading texts in certain languages (that are not
the user's default language).

Erik van der Poel

Assignee

Comment 12

•

25 years ago

Beta1 candidacy is not so much dependent on how hard it is to fix it, but how
much we need it for "beta quality", which is of course difficult to define, but
the PDT team has a pretty good sense of what is important and what isn't. So,
how about just nominating for Beta1, by adding the beta1 keyword above?

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 13

•

25 years ago

OK, nominating for beta1, because I think it's important that the user knows
when bits of text are missing.

Keywords: beta1

Akkana Peck

Comment 14

•

25 years ago

Agreed ... it's frustrating that there's content on the page that mozilla just
doesn't show at all.  4.x shows them wrong, but at least it shows that there's
something there.

Jim Roskind

Comment 15

•

25 years ago

PDT indicated it would not hold beta for this... so PDT-

Whiteboard: [PDT-]

Karl Ove Hufthammer

Comment 16

•

25 years ago

Perhaps a tooltip can be displayed when hovering over a missing glyph (U+FFFD), e.g. _Latin small letter e with inverted breve_ for U+0207.

Akkana Peck

Comment 17

•

25 years ago

How can you hover over something that's not there at all?  And how would the
user know where to hover?

Karl Ove Hufthammer

Comment 18

•

25 years ago

> How can you hover over something that's not
> there at all?  And how would the
> user know where to hover?

When a glyph can't be found, the REPLACEMENT CHARACTER U+FFFD should be displayed. When hovering over this, the name of the missing glyph can be shown as a tooltip. Hope this makes things clearer.

rbs

Comment 19

•

25 years ago

Erik, now that I am wading into fonts again for MathML add-ons, things are
pretty fresh in my memory, and I may lay a helping hand w.r.t. this bug in
the Win32 corner.

The fix has to be in FindGlobalFont(), right? Rather than return nsnull, return
the first font that has the Unicode's REPLACEMENT CHAR?

rbs

Comment 20

•

25 years ago

Thinking again, just returning a font there is not enough! Since this doesn't 
tell to other rendering/measuring functions that that particular char has to be
re-mapped to the missing char.

Erik van der Poel

Assignee

Comment 21

•

25 years ago

Right, you can't just return a font that contains a glyph for U+FFFD. I would
rather make it more general, and return an object that implements the GetWidth
and DrawString APIs, but isn't actually a font on the inside. It is just a
couple of methods that deal with rectangular boxes on the screen. I.e. the
GetWidth method returns the width of the boxes that would be drawn by the
DrawString method.

Also, we want to create another subclass that maps Unicodes to other character
codes for which we *do* have glyphs on the system. E.g. Unix, with limited
iso8859-1 fonts, where windows-1252 docs can't be displayed normally.

But a box-drawing subclass is a good first step, and I think we need it anyway.

rbs

Comment 22

•

25 years ago

Got a fix in my tree. As you suggested, I added another subclass
(nsFontWinSubstitute) which acts as a fall-back for chars without glyphs.
Great job with this foundation, erik!

I didn't implement a drawing box font. I am simply using an existing
font (out of the global list) that has the replacement char. 
This font is added in mLoadedFonts[], but its map only contains chars
without glyphs, and it is not added in the hashtable (this way, it cannot
be looked up, and therefore it will not interfere with the other font
switching tricks, and its name doesn't matter). 

The way to get hold of the font is with something like
mLoadedFonts[mIndexOfSubstituteFont], which is then used to 
instantiate a nsFontWinSubstitute that has its own GetWidth/DrawString.
[e.g., GetWidth('xyz') actually does GetWidth('???'),
DrawString('xyz') actually does DrawString('???')];

Since the font is kept in mLoadedFonts[], subsequent searches for the
same chars without glyphs are speedy, and return the substitute font
straight away.

rbs

Comment 23

•

25 years ago

Should this go in M14?

Erik van der Poel

Assignee

Comment 24

•

25 years ago

The PDT team has already looked at this and marked it PDT- (i.e. not a
candidate for Beta 1).

Your work *is* appreciated though, especially by me!

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 25

•

25 years ago

While the tree is still open (i.e., today), why should not being marked PDT+
prevent someone who doesn't work for Netscape from checking in the fix (assuming
it's reviewed, etc.)?  Some of us do think this should definitely be in beta1.

Akkana Peck

Comment 26

•

25 years ago

I wondered that as well.  The tree isn't closed to non-PDT+ fixes, it's just
that Netscape engineers aren't supposed to be spending time tracking down
non-PDT+ fixes.  Or at least, that's been my understanding.

Erik van der Poel

Assignee

Comment 27

•

25 years ago

I don't have time to review the code since I'm working on PDT+ bugs. If
somebody else does have time to review it, and both Roger and the reviewer can
promise that this code will not cause other bugs in M14 or cause performance
problems, then I may give my reluctant "OK". I'm not trying to be mean here.
I've had years of experience shipping betas, and this is not the right time to
introduce significant numbers of lines of code in sensitive areas for features
that are not even deemed PDT+ by the PDT team. When you are close to the
deadline, a fix must be for something that is *both* actually needed *and*
very safe and understood.

Erik van der Poel

Assignee

Comment 28

•

25 years ago

By the way, Windows GFX changes now require Troy's review, and Kevin McCluskey
is the GFX owner, so his review is also required.

Erik van der Poel

Assignee

Comment 29

•

25 years ago

By the way, as Jim Roskind said in mozilla.seamonkey, M14 may not become
Netscape's Beta 1. If M14 has too many beta-stoppers, M15 will be the target.
So there is still some chance that this fix will make Netscape's Beta 1, and
even if it doesn't we can still get it into Beta 2.

bobj

Comment 30

•

25 years ago

If you can post the diffs and get Troy and Kevin McCluskey to review the
changes, go for it...
cc'd troy and kmcclusk

rbs

Comment 31

•

25 years ago

Attached image screenshot of the replacements (stylable as well...) — Details

rbs

Comment 32

•

25 years ago

Attached image screenshot of the testcase with extra math fonts enabled — Details

rbs

Comment 33

•

25 years ago

Attached image screenshot of the testcase with extra math fonts disabled — Details

rbs

Comment 34

•

25 years ago

Attached file diff in gfx — Details

rbs

Comment 35

•

25 years ago

Attached file diff in intl — Details

rbs

Comment 36

•

25 years ago

Ok guys, now you have everything. Upon review, the decision is yours.
I am pretty confident that the code is now solid. It makes the browsing
experience quite nice. Erik has been very helpful in providing insights
and a great deal of feedback during development. I understand his prudence,
and if the code has to wait, no problemo.

The part in intl consists in adding extra mapping tables to enable
the conversion of further math fonts.

These fonts can be enabled by default in gfx by removing the
#ifdef MOZ_MATHML that is in gFontsHaveConverters, if you deem
appropriate. Screenshots of the testcases when these were enabled
and disabled were attached.

The fact that some math chars "overflow" in the testcase is due to the 
FindFont('a') hack --it returns the "ascii" metrics that are used as the
default metrics. This is a recorded bugzilla bug 20394 which erik plans to 
address with a GetDimensions() method that will supersede the current 
GetHeight/GetWidth.

rbs

Comment 37

•

25 years ago

Of note, the patch for gfx can be tried in *isolation*.
So, feel free to apply it on gfx, and report what you see (on win32)...

Erik van der Poel

Assignee

Updated

•

25 years ago

Severity: normal → major

Priority: P3 → P1

Target Milestone: M16 → M15

rbs

Comment 38

•

25 years ago

The bug is scheduled for M15, meaning there is time for improvements on the
present solution, if worthwhile. Currently, since "viewsource" goes through the
same rendering process, it means that chars without glyphs are still renderered
with the REPLACEMENT CHAR, even within "viewsource". Is it, instead, desirable 
to display the unicode points there?

Implementing this is not really demanding, because the font engine knows 
everything, and can either choose to replace the chars, or render their
unicode points. However, "viewsource" is currently known to be notoriously slow
and the question is whether it is worthwhile doing these things that will make
it even slower. Also, the font engine will then have to know that the consumer 
is "viewsource"

BTW, is there a common unicode font that actually has a glyph for the Unicode's
replacement char U+FFFD?

Erik van der Poel

Assignee

Comment 39

•

25 years ago

nsFont.decorations is a PRUint8 bit-mask that could be used for your View Source
idea. If we had 2 bits, we could have 4 modes:

(1) show glyphs as usual (like browser)
(2) unavailable glyphs are shown as Unicodes
(3) all non-ASCII chars (> 0x7F) are shown as Unicodes
(4) all chars are shown as Unicodes

Number (4) is not that useful though...

rbs

Comment 40

•

25 years ago

Sounds good. Some comments/clarifications:

(1) show glyphs as usual (like browser). All missing chars show up as the 
replacement char (this is the current behavior --not really helpful,
but can be left as an option, as it is the fatest mode, and is a sort of
WYSIWYG too!).

(2) unavailable glyphs are shown as Unicodes -- this mode allow
copy-pasting of the screen content of viewsource.

(3) all non-ASCII chars (> 0x7F) are shown as Unicodes -- Not very
sure how helpful is this. So the reformulation:

(3) all non-ASCII chars (> 0x7F) are shown as both Unicodes and 
Glyphs, in the format "0xUnicode{Glyph}". Hence a char withouht glyph
will show up as "0xUnicode{REPLACEMENT_CHAR}".

rbs

Comment 41

•

25 years ago

I now have all this coded up in my tree, and there are in sync with the changes
Troy made recently --bug 27056, to optimize the use of SelectObject().

For (1), that was the original code. I didn't have to do anything --save
bringing the code in sync.

For (3), this is coded in nsRenderingContextWin. I added a convenience function
SubstituteNonAsciiChars() that checks to see if some non-ascii chars have to be
substituted. And... it is the output of this transformation that is used in the
body of GetWidth and DrawString. (This way, the font-swithing machinery will
pick the glyphs... For example, "Å" will appear as "ÅU+000C5". "Å" could be any
strange glyph that could be picked in any font.) The SubstituteNonAsciiChars() 
function is carefully optimized and does nothing if nothing has to be done 
(e.g., when the current text run consists only of ascii chars).

Concretely, the code of GetWidth(aString) and DrawString(aString) now loos like:
  PRUnichar* pstr = aString;
  if (shouldSubstituteNonAsciiChars) {
    pstr = SubstituteNonAsciiChars(aString);
  }
  ... use the *same* code as before to do the measurement/rendering using pstr

For (2), this is coded by nsFontWinSubstitute in nsFontMetricsWin.
(Doing so fulfills the intended purpose because nsFontWinSubstitute only
applies to chars without glyphs.) The code of GetWidth(aString) and
DrawString(aString) there now loos like:
  PRUnichar* pstr = aString;
  if (shouldDisplayUnicode) {
    ... sprintf Unicode points
  }
  else {
    ... default to the REPLACEMENT CHAR
  }
  ... use the *same* code as before to do the measurement/rendering using pstr

====
It is interesting to see the HEX (huh, Unicode) modes in action, and as you can
see from the description above, the code is solidly standing upon the existing
foundation and should therefore be as robust as the foundation :-)
That's why I am pretty confident that the code is OK and should go in M15 as
early as possible to enable further testing and enjoyement by others.

What is missing in the code is how to activate these 'shouldDisplayUnicode'
or 'shouldSubstituteNonAsciiChars'.

Jerry.Kirk

Comment 42

•

25 years ago

I just ran across this font problem while porting to the Photon UI,
which may be the same as this problem, on articles at 
www.abcnews.com I see HTML that looks like:

<font face= geneva,arial,helvetica size=5><b>
<a href="/sections/us/DailyNews/bettyloubeets000218.html" >
Weighing &#145;Black Widow&#146;s&#146; Fate
</a>
</b></font><br>

The "&#145;" character looks like an arrow pointing upward
under Photon. It is correct under Windows, Mac and Linux
Netscape 4.7 but it is not correct under Photon or GTK Mozilla.

I found out that numeric entities between 130-159 are special
and need to be translated. Check out this link for more info:

http://www.hut.fi/u/jkorpela/www/windows-chars.html

I guess I can put some sort of translation in my own GFX code
to do the translation but I was not sure if the purpose and fix
for this bug would also fix this problem.

Does this bug encompass this problem??

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 43

•

25 years ago

Those numeric entities should not be supported in strict mode, since they are
undefined.  (However, as characters in the document, they could be supported if
it's in a windows encoding, rather than iso-8859-1.)

rbs

Comment 44

•

25 years ago

I am not sure if these codes are treated as special cases somewhere, and
re-mapped to their proper unicode values --At the level of GFX they are not
treated as special cases.

Erik van der Poel

Assignee

Comment 45

•

25 years ago

NCRs (numeric character references) like &#145; are indeed illegal, but very
common on the Net. Our parser converts them to the correct Unicodes, and GFX
then tries to find fonts with glyphs for those Unicodes. On Unix we have for a
long time had the problem that Windows extensions in the 0x80-0x9F range could
not be displayed. This bug report does indeed refer to that kind of problem.
There are several bug reports assigned to me in this area, some Unix-specific.

rbs

Comment 46

•

25 years ago

BTW, we still need additional API to really nail this bug 6585 down.
The substitution mode is not yet defined and I saw in bug 5100
that peterl didn't like seeing nsFont used for extra things.
So we will need some other ways to pass rendering options in GFX.
There could be a SetFlags() (or whatever) to specify rendering
policies. 

RFE: going back to the initial questions raised in this bug report, an 
interesting option could be to pool/sort all the missing chars, in view of
possibly sending a one-off feedback to the user (bug 6211, with a listing of
names?), or in view of determining the fonts that can be auto-downloaded (or
suggested for download). Note that it would be hard to implement a 
tooltip --compared to improving the viewsource trick to show their unicode 
in an ERROR-type color, but even achieving this is not straightforward.

Erik van der Poel

Assignee

Comment 47

•

25 years ago

This bug is now for Windows only, and assigned to Roger for M16 (i.e. after
Beta 1 branch). I created bug 31252 for Unix. Mac already has substitutes.

Assignee: erik → rbs

Status: ASSIGNED → NEW

OS: All → Windows NT

Hardware: All → PC

Summary: missing glyphs should use substitute (U+FFFD) → missing glyphs should use substitute

Target Milestone: M15 → M16

rbs

Comment 48

•

24 years ago

Code checked-in. Chars with missing glyphs are now substituted.

Currently only one replacement mode is activated.
Additional API is needed to switch to different modes on the fly.

Erik van der Poel

Assignee

Comment 49

•

24 years ago

Roger, thanks for taking on this work, and for checking it in. This bug report
is out of control, and we ought to create separate bug reports for the
remaining issues. Re-assign the bug back to me if you'd like me to do that.

rbs

Comment 50

•

24 years ago

Re-assigning to erik to better organise where to head to:

* Use substitute as per bug title (done).
* API to flag different rendering policies (e.g., could be with 
a general struct, or for a start, a bitarray with some bits for
the replacement mode)
* Visual feedback of missing chars (Error-color for ViewSource? popup?)
* Auto-download of fonts ?
* Error log ? (collect all missing chars -- bug 6211) ?

Assignee: rbs → erik

Erik van der Poel

Assignee

Comment 51

•

24 years ago

At some point, I will look into this bug, and create one or more new bugs to
cover the remaining issues. For now, setting milestone to M17.

Status: NEW → ASSIGNED

Target Milestone: M16 → M17

bobj

Comment 52

•

24 years ago

I'd prefer we think about the design sooner, so we can define and prioritize
tasks and possibly delegate them as well...

Frank Tang

Comment 53

•

24 years ago

I did similar thing on Mac already. However, we should do special process for
"space" characters in the range of U+2000-U+206F. (some of them, not all of
them). There are no reason we cannot "render"/"measure" a space characters, even
non of the font have these glyph (a glyph for a space character is simply a
matrix without any countor....)

Erik van der Poel

Assignee

Comment 54

•

24 years ago

This bug is *way* out of control. Please do not add any more comments to it.
I have created a new bug for the Unicode space character issue:

  http://bugzilla.mozilla.org/show_bug.cgi?id=33032

Erik van der Poel

Assignee

Comment 55

•

24 years ago

Please do not add any more comments to this bug. Marking FIXED (since we now at
least get question marks for unavailable glyphs). Other issues have been
separated out. Thanks, Shrirang!

From: shrir@netscape.com (Shrirang Khanzode)
  
I have seperated the other issues from bug 6585. 
Bugs logged: 
Bug 41251 : API to flag rendering policies 
Bug 41249: Visual feedback for missing characters 
Bug 41250 : Auto download of fonts 
(Filed already) Bug 6211 - Error log to collect missing characters

Status: ASSIGNED → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Target Milestone: M17 → M16

Chris Petersen

Comment 56

•

24 years ago

Marking fixed in the July 6th build.

Status: RESOLVED → VERIFIED

screenshot of the replacements (stylable as well...) 25 years ago rbs 76.90 KB, image/jpeg		Details
screenshot of the testcase with extra math fonts enabled 25 years ago rbs 123.19 KB, image/jpeg		Details
screenshot of the testcase with extra math fonts disabled 25 years ago rbs 121.73 KB, image/jpeg		Details
diff in gfx 25 years ago rbs 48.07 KB, text/plain		Details
diff in intl 25 years ago rbs 10.72 KB, text/plain		Details