Do not use '?' as fallback glyph

RESOLVED FIXED

Status

Core
Internationalization
P3
normal
RESOLVED FIXED
18 years ago
10 years ago

People

(Reporter: BenB, Assigned: smontagu)

Tracking

({helpwanted})

Trunk
x86
Linux
helpwanted
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

18 years ago
BUG
If a char cannot be displayed on Unix because no corresponding glyph is
found in the font(s), a "?" is displayed.

RELEVANCE
This is *very* irritating. You don't want to know what "explanations" I came up
with while trying to figure out why there are always question marks instead of
quote chars in 4.x. Even now that I know the reason, it still irritates me.

SUGGESTION
Use a glyph that doesn't usually appear in text. The convention on Windows is a
non-filled rectangle.
(Reporter)

Comment 1

18 years ago
Created attachment 8026 [details]
Testcase

Comment 2

18 years ago
Thanks for the report. Feel free to submit the fix. I have other priorities for
the foreseeable future.
Status: NEW → ASSIGNED
Target Milestone: --- → M30

Comment 3

18 years ago
Perhaps more useful than a filled rectangle would be to print [XYZ] in some 
special style, where XYZ is the ASCII (or whatever) number for the character? The 
string would, of course, need to be treated as one character (for selection, 
:first-letter etc purposes).

The solution for this bug should be the same as what you do for an unknown entity 
(&foobar;).
Keywords: helpwanted

Comment 4

18 years ago
There are some other possibilities here.

U+FFFD is the "replacement character" glyph and a Linux Mozilla will use it
if it can be found. If it can't be found then Mozilla tries a
transliteration string. Try adding "entity.65533=FOO" to the end of
entityTables/transliterate.properties and view a page with some invalid
entity like "". Mozilla uses '?' only if nothing else is available.

The transliteration strings could indeed be rendered in some distinct way or,
perhaps, Mozilla could have some built-in U+FFFD character.
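
A minimal sketch of the fallback order described in comment 4, not the actual
Mozilla code. RenderOrFallback, GlyphLookup and TransliterationTable are
hypothetical names; the table stands in for entries such as "entity.65533=FOO"
in transliterate.properties, and transliteration strings are assumed to be
ASCII.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-ins; none of these names come from the Mozilla tree.
using GlyphLookup = std::function<bool(char32_t)>;  // "can any font draw this?"
using TransliterationTable = std::map<char32_t, std::string>;  // ASCII strings

constexpr char32_t kReplacementChar = 0xFFFD;  // U+FFFD REPLACEMENT CHARACTER

// Returns the text to draw for character c, following the order:
// real glyph, then transliteration string, then '?' as the last resort.
std::u32string RenderOrFallback(char32_t c,
                                const GlyphLookup& hasGlyph,
                                const TransliterationTable& table) {
  if (hasGlyph(c)) {
    return std::u32string(1, c);
  }
  auto it = table.find(c);
  if (it != table.end()) {
    // Widen the ASCII transliteration string to UTF-32.
    return std::u32string(it->second.begin(), it->second.end());
  }
  return U"?";
}
```

With a table entry for 65533 and no font providing U+FFFD, the sketch yields
"FOO"; with neither a glyph nor a table entry, it yields the '?' this bug
complains about.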

Comment 5

18 years ago
There's another, *better* solution. See bug #12662.

(And bug #454 (yes, that's a very old bug).)

Comment 6

18 years ago
And bug #9574 ...
(Reporter)

Comment 7

18 years ago
The other bugs do not help in all cases. And fixing this bug should be
incredibly easy if you know where this char is set. Erik, where is that? I
could fix it myself, then.

Comment 8

18 years ago
mozilla/gfx/src/windows/nsFontMetricsWin.cpp, near top of file, macro called
NS_REPLACEMENT_CHAR.

mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp, look for nsFontGTKSubstitute::Convert.

For the Mac, please ask ftang@netscape.com (Cc'ed).

Please let me review any changes before you check them in.
(Reporter)

Comment 9

18 years ago
Erik, thanks for the hint. Taking bug.

> For the Mac, please ask ftang@netscape.com (Cc'ed).

*ask*

Comment 10

18 years ago
BenB, before you spend a lot of time coding, would you please tell us what you
intend to do? Let's agree on that before putting a lot of effort into this.
Thanks.
(Reporter)

Comment 11

18 years ago
Erik,
I intend to just replace '?' with some other char (one that usually doesn't
appear in text, see initial description, comments/suggestions welcome).

Replacing '?' with some other char in the places you mentioned didn't work - I
still see '?' in the attached testcase.
Summary: Do not use "?" as fallback glyph → Do not use '?' as fallback glyph
(Reporter)

Comment 12

18 years ago
I'm using Linux and just changed the *GTK* file.

Comment 13

18 years ago
Take another look at nsFontGTKSubstitute::Convert. Look for "QuestionMark".
(Reporter)

Updated

18 years ago
Keywords: mozilla1.0

Comment 14

18 years ago
take out TM and reassign to bstell
Assignee: erik → bstell
Status: ASSIGNED → NEW
Target Milestone: M30 → ---

Updated

18 years ago
Status: NEW → ASSIGNED
(Reporter)

Comment 15

18 years ago
I had something working, but
- I don't know which char to use. It must not be used in normal text and should
be available in all relevant charsets. Which chars are safe? Only ASCII or can I
use any ISO-8859-1 char? (The latter would make the choice *substantially* easier.)
- The source change is not as trivial as it seems.
  - Apparently, any caller can select over an IDL interface which replacement
strategy to use ("?", char code number and some other options). Unfortunately,
the identifier for "?" is actually called 'UseQuestionMark' or similar. :-( I.e.
either we have a misnamed API, or we change it.
  - In addition to that, there is at least one fallback in each of the
platform-specific implementations. The problem is to catch them all, or we might
have the new replacement char in most cases and the old one, the question mark,
in some exceptional cases.

Comment 16

18 years ago
we could display the unicode value; e.g. "\x5E3A" or (5E3A)
at least this way the user could tell us the code points that are
failing instead of "I saw a bunch of <fallback> characters".
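
A small sketch of the "(5E3A)" style suggested above, assuming the missing
character is available as a Unicode code point. FormatCodePoint is a
hypothetical helper, not an existing Mozilla function.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: formats a code point as "(XXXX)" in uppercase hex,
// zero-padded to at least four digits.
std::string FormatCodePoint(char32_t c) {
  char buf[16];  // large enough for "(FFFFFFFF)" plus the terminator
  std::snprintf(buf, sizeof buf, "(%04X)", static_cast<unsigned>(c));
  return std::string(buf);
}
```

A user who sees such a string can report the exact code point that failed,
at the cost of much longer replacement text.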
(Reporter)

Comment 17

18 years ago
Brian, we have that option already - that's one of the other options in the
interface. This would mean changing the caller. But I would prefer not to do
that. Assume you have the following original text:

Die Bäuerin ißt die größten Törtchen.

(The text makes no sense.) What is easier to read:

Die B#uerin i#t die gr##ten T#rtchen.

or

Die B\x5463uerin i\x4683t die gr\x5734\x4683ten T\x5734rtchen.

? The former.

Comment 18

18 years ago
UNICODE has a fallback glyph. If any font on the system has this fallback glyph,
checking in the order in which fonts are given in the font-family list, then we
should use that before any other character.
(Reporter)

Comment 19

18 years ago
> if any font on the system has this fallback glyph, [...]
> then we should use that before any other character.

Doesn't it look odd, if you have a Times glyph in a Courier font block?

Comment 20

18 years ago
> UNICODE has a fallback glyph

What's its name/number?

Comment 21

18 years ago
Ben: That's why you try all the specified fonts first. This is the same
algorithm as is used to find each glyph in the first place.

Karl: U+FFFD REPLACEMENT CHARACTER
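
A rough sketch of the lookup order being discussed: walk the font-family list
in order and take the first font that has the glyph. The Font struct and
FindFontFor are hypothetical and only model glyph coverage as a set of code
points; the real font switching code is more involved.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a font: a name plus the code points it can draw.
struct Font {
  std::string name;
  std::set<char32_t> glyphs;
};

// Returns the first font in the font-family list that covers c,
// or nullptr if no listed font has the glyph.
const Font* FindFontFor(char32_t c, const std::vector<Font>& familyList) {
  for (const Font& font : familyList) {
    if (font.glyphs.count(c) != 0) {
      return &font;
    }
  }
  return nullptr;
}
```

Because the same search is used for every character, U+FFFD is taken from the
author's preferred fonts whenever one of them provides it, and from a later
font in the list only otherwise.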
(Reporter)

Comment 22

18 years ago
> This is the same algorithm as is used to find each glyph in the first place.

Oh, OK. So, all we need to do is replace the |'?'| in the source with
|REPLACEMENT_CHAR| or whatever it is called? (Assuming we leave the misnamed API
alone for now.)

Comment 23

18 years ago
The API is not really "misnamed", since it *does* give you question marks, as
advertized. You may argue that the API should also (or instead) have given you
the option to choose something other than the question mark, but that is a
different argument.

Anyway, it is not sufficient to replace '?' with the Unicode replacement char,
since (at least on Windows and Unix) the "substitute" font machinery assumes
that the substitute font only guarantees ASCII availability (0x20-0x7E), and so
you can only use ASCII fallbacks (e.g. "?", "EUR", "...", etc).

If you wanted to have a more general fallback mechanism, you would have to
implement another level of font switching (in addition to the one that is
already in nsRenderingContext{Win,GTK}). I.e. you would need to switch from font
to font inside the nsFont{Win,GTK}Substitute.{GetWidth,DrawString,etc} methods
too.
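
To illustrate the constraint: a fallback string is only usable by the current
substitute font machinery if every character lies in 0x20-0x7E. The following
is a sketch with hypothetical helper names, not code from
nsFont{Win,GTK}Substitute.

```cpp
#include <cassert>
#include <string>

// The substitute font is only assumed to cover printable ASCII.
bool IsAsciiPrintable(char32_t c) {
  return c >= 0x20 && c <= 0x7E;
}

// Hypothetical check: true if every character of the fallback string can be
// drawn by a font that only guarantees 0x20-0x7E.
bool IsUsableFallback(const std::u32string& fallback) {
  if (fallback.empty()) {
    return false;
  }
  for (char32_t c : fallback) {
    if (!IsAsciiPrintable(c)) {
      return false;
    }
  }
  return true;
}
```

Under this check "?", "EUR" and "..." pass, while U+FFFD itself does not,
which is why the extra level of font switching would be needed.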
(Reporter)

Comment 24

18 years ago
> The API is not really "misnamed", since it *does* give you question marks, as
> advertized.

This bug is about changing that -> The API (name) was not general enough -> It
*will be* "misnamed" once we fix this bug (and not the API).

> it is not sufficient to replace '?' with the Unicode replacement char, [...]

I guessed so :-(.

Proposal: 2 steps for this bug.

Step 1 just changes the current replacement char ('?') to some other, more
uncommon, but ASCII, char, e.g. '|'.

Step 2 adds an additional layer on top of that. It tries to get the Unicode
replacement char via the normal font mechanisms (Erik, would that work in the
substitution code?), and, failing that, uses the ASCII replacement char.

Comment 25

18 years ago
> Karl: U+FFFD REPLACEMENT CHARACTER

This is actually very nice. It looks (could look) like a white question mark on 
a rotated black square. The description reads:

"used to replace an incoming character
whose value is unknown or unrepresentable
in Unicode"

Comment 26

18 years ago
If you submit a patch that simply changes the '?' to something else but does not
change the name used in the API (attr_FallbackQuestionMark), then I will
disapprove. Also, the old API should probably stay, so you need to add
something.

I actually don't feel so strongly about the '?', but if many people want to
change it, then I'd rather use '#' than '|', since the latter looks too much
like an 'l' or a '1' or an 'I'.

Yes, the additional font switching would be implemented inside
nsFontGTKSubstitute::{GetWidth,DrawString,GetBoundingMetrics}.
(Reporter)

Comment 27

18 years ago
I'll attach what I have. Erik already said he will disapprove of it, but I don't
care to add a new interface (I would have changed the existing one).
(Reporter)

Comment 28

18 years ago
Created attachment 22599 [details] [diff] [review]
Fix that WFM, version 1
(Reporter)

Comment 29

18 years ago
Oh, cool, no, the fix doesn't work too well - Schumacher Clean on Linux doesn't
have the replacement char I chose (°) :-(. I should have used #. Anyway, you can
see from the patch what to change.

Updated

18 years ago
Target Milestone: --- → Future

Comment 30

17 years ago
--> ftang
Assignee: bstell → ftang
Status: ASSIGNED → NEW

Comment 31

17 years ago
bulk move NEW FUTURE bug to ASSIGN
Status: NEW → ASSIGNED

Comment 32

16 years ago
BenB, in response to comment #19, the replacement character has an appearance
unique enough that I don't think it really would look different in different
fonts. It's a black diamond with a white question mark inside. Looks like this
if your browser can render it: &#65533;

Most (all?) versions of Netscape 6 used this character for many missing glyphs,
and it's the only method of calling out missing characters I've ever seen that
didn't make the text confusing. Empty squares, question marks, and # signs can
all look like legitimate text in some situations. I'd argue that no US-ASCII
character will ever avoid these problems, and that some distinct character must
be used.

Either use Replacement Character (and include a font with just that char in all
moz distributions if you have to), or create a GIF or PNG image that looks like
it, and insert that in the event that Replacement Character is not available.

Readers need something that makes it clear they're going to have to guess (from
context, likely) what the character was supposed to be, and authors also need
something bold and clear--so they know they're using a character that's either
incorrect or not present in all fonts. Using any other character is going to
hide that fact.

&#65533; is easily seen when scanning text, and though it can have a jarring
appearance, this is a necessary effect.

Comment 33

13 years ago
-> to default owner (rather than ftang's WONTFIX)
Assignee: ftang → smontagu
Status: ASSIGNED → NEW
QA Contact: teruko → amyy
Target Milestone: Future → ---
(Assignee)

Comment 34

10 years ago
Fixed by bug 372629.
Status: NEW → RESOLVED
Last Resolved: 10 years ago
Depends on: 372629
Resolution: --- → FIXED