Closed Bug 37330 Opened 24 years ago Closed 16 years ago

Do not use '?' as fallback glyph

Categories

(Core :: Internationalization, defect, P3)

x86
Linux
defect

RESOLVED FIXED

People

(Reporter: BenB, Assigned: smontagu)

Details

(Keywords: helpwanted)

Attachments

(2 files)

BUG
If a char cannot be displayed on Unix because no corresponding glyph is
found in the font(s), a "?" is displayed.

RELEVANCE
This is *very* irritating. You don't want to know what "explanations" I came up
with while trying to figure out why there are always those question marks
instead of quote chars in 4.x. Even now that I know the reason, this
still irritates me.

SUGGESTION
Use a glyph that doesn't usually appear in text. The convention on Windows is a
non-filled rectangle.
Attached file Testcase
Thanks for the report. Feel free to submit the fix. I have other priorities for
the foreseeable future.
Status: NEW → ASSIGNED
Target Milestone: --- → M30
Perhaps more useful than a filled rectangle would be to print [XYZ] in some 
special style, where XYZ is the ASCII (or whatever) number for the character? The 
string would, of course, need to be treated as one character (for selection, 
:first-letter etc purposes).

The solution for this bug should be the same as what you do for an unknown entity 
(&foobar;).
Keywords: helpwanted
There are some other possibilities here.

U+FFFD is the "replacement character" glyph and a Linux Mozilla will use it
if it can be found. If it can't be found then Mozilla tries a
transliteration string. Try adding "entity.65533=FOO" to the end of
entityTables/transliterate.properties and view a page with some invalid
entity like "". Mozilla uses '?' only if nothing else is available.

The transliteration strings could indeed be rendered in some distinct way or,
perhaps, Mozilla could have some built-in U+FFFD character.
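
The fallback order described above (U+FFFD glyph first, then a transliteration string, then '?') could be sketched roughly as follows. This is only an illustration in Python, not Mozilla's code. The names `pick_fallback`, `font_glyphs` (the set of characters the available fonts can draw) and `transliterations` (standing in for the entries of transliterate.properties, keyed by code point) are all hypothetical.

```python
REPLACEMENT_CHAR = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def pick_fallback(ch, font_glyphs, transliterations):
    """Return what to draw for ch, following the order described above.

    font_glyphs: set of characters the available fonts can draw (assumed).
    transliterations: dict from code point to a replacement string,
        standing in for transliterate.properties (assumed).
    """
    if ch in font_glyphs:
        return ch  # the glyph exists, no fallback needed
    if REPLACEMENT_CHAR in font_glyphs:
        return REPLACEMENT_CHAR  # first choice: the U+FFFD glyph
    translit = transliterations.get(ord(ch))
    if translit is not None:
        return translit  # second choice: a transliteration string
    return "?"  # last resort, used only if nothing else is available
```

With no U+FFFD glyph and a table of `{65533: "FOO"}`, `pick_fallback("\uFFFD", {"a"}, {65533: "FOO"})` gives `"FOO"`, which mirrors the "entity.65533=FOO" experiment suggested above.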
There's another, *better* solution. See bug #12662.

(And bug #454 (yes, that's a very old bug).)
And bug #9574 ...
The other bugs do not help in all cases. And fixing this bug should be
incredibly easy, if you know where this char is set. Erik, where is that? I
could fix it myself, then.
mozilla/gfx/src/windows/nsFontMetricsWin.cpp, near top of file, macro called
NS_REPLACEMENT_CHAR.

mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp, look for nsFontGTKSubstitute::Convert.

For the Mac, please ask ftang@netscape.com (Cc'ed).

Please let me review any changes before you check them in.
Erik, thanks for the hint. Taking bug.

> For the Mac, please ask ftang@netscape.com (Cc'ed).

*ask*
BenB, before you spend a lot of time coding, would you please tell us what you
intend to do? Let's agree on that before putting a lot of effort into this.
Thanks.
Erik,
I intend to just replace '?' with some other char (one that usually doesn't
appear in text, see initial description, comments/suggestions welcome).

Replacing '?' with some other char in the places you mentioned didn't work - I
still see '?' in the attached testcase.
Summary: Do not use "?" as fallback glyph → Do not use '?' as fallback glyph
I'm using Linux and just changed the *GTK* file.
Take another look at nsFontGTKSubstitute::Convert. Look for "QuestionMark".
Keywords: mozilla1.0
take out TM and reassign to bstell
Assignee: erik → bstell
Status: ASSIGNED → NEW
Target Milestone: M30 → ---
Status: NEW → ASSIGNED
I had something working, but
- I don't know which char to use. It must not be used in normal text and should
be available in all relevant charsets. Which chars are safe? Only ASCII, or can I
use any ISO-8859-1 char? (The latter would make the choice *substantially* easier.)
- The source change is not as trivial as it seems.
  - Apparently, any caller can select over an IDL interface which replacement
strategy to use ("?", char code number and some other options). Unfortunately,
the identifier for "?" is actually called 'UseQuestionMark' or similar. :-( I.e.
either we have a misnamed API, or we change it.
  - In addition to that, there is at least one fallback in each of the
platform-specific implementations. The problem is to catch them all, or we might
have the new replacement char in most cases and the old one, the question mark, in
some exceptional cases.
We could display the Unicode value, e.g. "\x5E3A" or (5E3A).
At least this way the user could tell us the code points that are
failing instead of "I saw a bunch of <fallback> characters".
Brian, we have that option already - that's one of the other options in the
interface. This would mean to change the caller. But I would prefer not to do
that. Assuming you have the following original text:

Die Bäuerin ißt die größten Törtchen.

(The text makes no sense.) What is easier to read:

Die B#uerin i#t die gr##ten T#rtchen.

or

Die B\x5463uerin i\x4683t die gr\x5734\x4683ten T\x5734rtchen.

? The former.
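
To try the comparison on other text, a small sketch like the one below may help. It is an assumption-laden toy, not the code behind the IDL interface: `substitute`, the strategy names "marker" and "codepoint", and the choice of '#' as marker are all made up here. It prints the real code points (U+00E4, U+00DF, U+00F6), so its second line will not match the illustrative numbers used above.

```python
def substitute(text, drawable, strategy):
    """Replace every character not in drawable using the given strategy.

    strategy "marker": one fixed ASCII marker ('#') per missing character.
    strategy "codepoint": the character's code point in hex, at least
        four digits, prefixed with a backslash and 'x'.
    """
    out = []
    for ch in text:
        if ch in drawable:
            out.append(ch)
        elif strategy == "marker":
            out.append("#")
        elif strategy == "codepoint":
            out.append("\\x%04X" % ord(ch))
        else:
            raise ValueError("unknown strategy: %r" % strategy)
    return "".join(out)


# Assume the font can only draw printable ASCII.
ASCII = {chr(c) for c in range(0x20, 0x7F)}
text = "Die Bäuerin ißt die größten Törtchen."
print(substitute(text, ASCII, "marker"))
print(substitute(text, ASCII, "codepoint"))
```

The "marker" line keeps the sentence the same length as the original, while the "codepoint" line grows by five characters for every missing glyph.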
UNICODE has a fallback glyph, if any font on the system has this fallback glyph,
checking in the order in which fonts are given in the font-family list, then we
should use that before any other character.
> if any font on the system has this fallback glyph, [...]
> then we should use that before any other character.

Doesn't it look odd, if you have a Times glyph in a Courier font block?
> UNICODE has a fallback glyph

What's its name/number?
Ben: That's why you try all the specified fonts first. This is the same 
algorithm as is used to find each glyph in the first place.

Karl: U+FFFD REPLACEMENT CHARACTER
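
As a rough sketch of "try all the specified fonts first, then do the same search for U+FFFD": the Python below is illustrative only. `find_font`, `resolve`, and the representation of a font as a (name, set of characters) pair are hypothetical and are not taken from the Mozilla source.

```python
REPLACEMENT_CHAR = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def find_font(ch, font_list):
    """Return the name of the first font in font_list that has ch, else None.

    font_list: list of (name, set_of_supported_characters) pairs in
    font-family order (a hypothetical stand-in for real font objects).
    """
    for name, glyphs in font_list:
        if ch in glyphs:
            return name
    return None


def resolve(ch, font_list):
    """Return (font_name, character_to_draw) for ch."""
    font = find_font(ch, font_list)
    if font is not None:
        return font, ch
    # Same search again, this time for the replacement character.
    font = find_font(REPLACEMENT_CHAR, font_list)
    if font is not None:
        return font, REPLACEMENT_CHAR
    return None, "?"  # nothing found; the caller needs an ASCII fallback
```

Because both searches walk the list in the same order, a replacement glyph from Times only shows up in a Courier block when Courier itself has neither the character nor U+FFFD.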
> This is the same algorithm as is used to find each glyph in the first place.

Oh, OK. So, all we need to do is replace the |'?'| in the source with
|REPLACEMENT_CHAR| or whatever it is called? (Assuming we leave the misnamed API
alone for now.)
The API is not really "misnamed", since it *does* give you question marks, as
advertized. You may argue that the API should also (or instead) have given you
the option to choose something other than the question mark, but that is a
different argument.

Anyway, it is not sufficient to replace '?' with the Unicode replacement char,
since (at least on Windows and Unix) the "substitute" font machinery assumes
that the substitute font only guarantees ASCII availability (0x20-0x7E), and so
you can only use ASCII fallbacks (e.g. "?", "EUR", "...", etc).

If you wanted to have a more general fallback mechanism, you would have to
implement another level of font switching (in addition to the one that is
already in nsRenderingContext{Win,GTK}). I.e. you would need to switch from font
to font inside the nsFont{Win,GTK}Substitute.{GetWidth,DrawString,etc} methods
too.
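
The ASCII-only constraint can be stated as a one-line check. The sketch below is in Python purely for illustration; `is_ascii_fallback` is a hypothetical helper, and the only thing taken from the comment above is the 0x20-0x7E range.

```python
def is_ascii_fallback(s):
    """True if s is non-empty and every character lies in the printable
    ASCII range 0x20-0x7E, the only range the substitute font is said
    to guarantee."""
    return len(s) > 0 and all(0x20 <= ord(c) <= 0x7E for c in s)
```

Under this check "?", "EUR" and "..." pass, while "\uFFFD" and any ISO-8859-1 character above 0x7E fail, so such characters would need the extra level of font switching described above.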
> The API is not really "misnamed", since it *does* give you question marks, as
> advertized.

This bug is about changing that -> The API (name) was not general enough -> It
*will be* "misnamed" once we fix this bug (and not the API).

> it is not sufficient to replace '?' with the Unicode replacement char, [...]

I guessed so :-(.

Proposal: 2 steps for this bug.

Step 1 just changes the current replacement char ('?') to some other, more
uncommon, but ASCII, char, e.g. '|'.

Step 2 adds an additional layer on top of that. It tries to get the Unicode
replacement char via the normal font mechanisms (Erik, would that work in the
substitution code?) and, failing that, uses the ASCII replacement char.
> Karl: U+FFFD REPLACEMENT CHARACTER

This is actually very nice. It looks (could look) like a white question mark on 
a rotated black square. The description reads:

"used to replace an incoming character
whose value is unknown or unrepresentable
in Unicode"
If you submit a patch that simply changes the '?' to something else but does not
change the name used in the API (attr_FallbackQuestionMark), then I will
disapprove. Also, the old API should probably stay, so you need to add
something.

I actually don't feel so strongly about the '?', but if many people want to
change it, then I'd rather use '#' than '|', since the latter looks too much
like an 'l' or a '1' or an 'I'.

Yes, the additional font switching would be implemented inside
nsFontGTKSubstitute::{GetWidth,DrawString,GetBoundingMetrics}.
I'll attach what I have. Erik already said, he will disapprove it, but I don't
care to add a new interface (I would have changed the existing one).
Oh, cool, no, the fix doesn't work too well - Schumacher Clean on Linux doesn't
have the replacement char I chose (°) :-(. I should have used #. Anyway, you can
see from the patch what to change.
Target Milestone: --- → Future
--> ftang
Assignee: bstell → ftang
Status: ASSIGNED → NEW
bulk move NEW FUTURE bug to ASSIGN
Status: NEW → ASSIGNED
BenB, in response to comment #19, the replacement character has an appearance
unique enough that I don't think it really would look different in different
fonts. It's a black diamond with a white question mark inside. Looks like this
if your browser can render it: &#65533;

Most (all?) versions of Netscape 6 used this character for many missing glyphs,
and it's the only method of calling out missing characters I've ever seen that
didn't make the text confusing. Empty squares, question marks, and # signs can
all look like legitimate text in some situations. I'd argue that no US-ASCII
character will ever avoid these problems, and that some distinct character must
be used.

Either use Replacement Character (and include a font with just that char in all
moz distributions if you have to), or create a GIF or PNG image that looks like
it, and insert that in the event that Replacement Character is not available.

Readers need something that makes it clear they're going to have to guess (from
context, likely) what the character was supposed to be, and authors also need
something bold and clear--so they know they're using a character that's either
incorrect or not present in all fonts. Using any other character is going to
hide that fact.

&#65533; is easily seen when scanning text, and though it can have a jarring
appearance, this is a necessary effect.
-> to default owner (rather than ftang's WONTFIX)
Assignee: ftang → smontagu
Status: ASSIGNED → NEW
QA Contact: teruko → amyy
Target Milestone: Future → ---
Fixed by bug 372629.
Status: NEW → RESOLVED
Closed: 16 years ago
Depends on: 372629
Resolution: --- → FIXED