37330 - Do not use '?' as fallback glyph

Reporter

Description

24 years ago

BUG
If a char cannot be displayed on Unix because no corresponding glyph is
found in the font(s), a "?" is displayed.

RELEVANCE
This is *very* irritating. You don't want to know what "explanations" I came up
with while trying to figure out why there are always those question marks
instead of quote chars in 4.x. Even now that I know the reason, this
still irritates me.

SUGGESTION
Use a glyph that doesn't usually appear in text. The convention on Windows is a
non-filled rectangle.

Ben Bucksch (:BenB)

Reporter

Comment 1

24 years ago

Attached file: Testcase

Erik van der Poel

Comment 2

24 years ago

Thanks for the report. Feel free to submit the fix. I have other priorities for
the foreseeable future.

Status: NEW → ASSIGNED

Target Milestone: --- → M30

Matthew T (active 1999-2002)

Comment 3

24 years ago

Perhaps more useful than a filled rectangle would be to print [XYZ] in some 
special style, where XYZ is the ASCII (or whatever) number for the character? The 
string would, of course, need to be treated as one character (for selection, 
:first-letter etc purposes).

The solution for this bug should be the same as what you do for an unknown entity 
(&foobar;).

Keywords: helpwanted

tenthumbs

Comment 4

24 years ago

There are some other possibilities here.

U+FFFD is the "replacement character" glyph and a Linux Mozilla will use it
if it can be found. If it can't be found then Mozilla tries a
transliteration string. Try adding "entity.65533=FOO" to the end of
entityTables/transliterate.properties and view a page with some invalid
entity like "&#x81;". Mozilla uses '?' only if nothing else is available.

The transliteration strings could indeed be rendered in some distinct way or,
perhaps, Mozilla could have some built-in U+FFFD character.
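
The fallback order described in this comment can be sketched roughly as follows. This is only an illustration, not Mozilla's code: the function name, the set standing in for font coverage, and the dict standing in for entries such as "entity.65533=FOO" in transliterate.properties are all assumptions made for the example.

```python
REPLACEMENT_CHAR = "\uFFFD"


def fallback_text(font_glyphs, transliterations):
    """Pick what to draw for a character that has no glyph.

    font_glyphs: set of characters a (hypothetical) font can draw.
    transliterations: dict mapping code points to ASCII strings, standing in
    for lines such as "entity.65533=FOO" in transliterate.properties.
    """
    # 1. Prefer the real U+FFFD glyph when it can be found.
    if REPLACEMENT_CHAR in font_glyphs:
        return REPLACEMENT_CHAR
    # 2. Otherwise try a transliteration string for U+FFFD (65533).
    if 0xFFFD in transliterations:
        return transliterations[0xFFFD]
    # 3. Last resort: the plain question mark.
    return "?"
```

With a font that lacks U+FFFD and a table containing `{0xFFFD: "FOO"}`, the sketch returns "FOO"; with an empty table it falls through to "?".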

Karl Ove Hufthammer

Comment 5

24 years ago

There's another, *better* solution. See bug #12662.

(And bug #454 (yes, that's a very old bug).)

Karl Ove Hufthammer

Comment 6

24 years ago

And bug #9574 ...

Ben Bucksch (:BenB)

Reporter

Comment 7

24 years ago

The other bugs do not help in all cases. And fixing this bug should be
incredibly easy if you know where this char is set. Erik, where is that? I
could fix it myself, then.

Erik van der Poel

Comment 8

24 years ago

mozilla/gfx/src/windows/nsFontMetricsWin.cpp, near top of file, macro called
NS_REPLACEMENT_CHAR.

mozilla/gfx/src/gtk/nsFontMetricsGTK.cpp, look for nsFontGTKSubstitute::Convert.

For the Mac, please ask ftang@netscape.com (Cc'ed).

Please let me review any changes before you check them in.

Ben Bucksch (:BenB)

Reporter

Comment 9

24 years ago

Erik, thanks for the hint. Taking bug.

> For the Mac, please ask ftang@netscape.com (Cc'ed).

*ask*

Erik van der Poel

Comment 10

24 years ago

BenB, before you spend a lot of time coding, would you please tell us what you
intend to do? Let's agree on that before putting a lot of effort into this.
Thanks.

Ben Bucksch (:BenB)

Reporter

Comment 11

24 years ago

Erik,
I intend to just replace '?' with some other char (one that usually doesn't
appear in text, see initial description, comments/suggestions welcome).

Replacing '?' with some other char in the places you mentioned didn't work - I
still see '?' in the attached testcase.

Summary: Do not use "?" as fallback glyph → Do not use '?' as fallback glyph

Ben Bucksch (:BenB)

Reporter

Comment 12

24 years ago

I'm using Linux and just changed the *GTK* file.

Erik van der Poel

Comment 13

24 years ago

Take another look at nsFontGTKSubstitute::Convert. Look for "QuestionMark".

Ben Bucksch (:BenB)

Reporter

Updated

24 years ago

Keywords: mozilla1.0

Frank Tang

Comment 14

24 years ago

take out TM and reassign to bstell

Assignee: erik → bstell

Status: ASSIGNED → NEW

Target Milestone: M30 → ---

kill this account

Updated

24 years ago

Status: NEW → ASSIGNED

Ben Bucksch (:BenB)

Reporter

Comment 15

24 years ago

I had something working, but
- I don't know which char to use. It must not be used in normal text and should
be available in all relevant charsets. Which chars are safe? Only ASCII or can I
use any ISO-8859-1 char? (The latter would make the choice *substantially* easier.)
- The source change is not as trivial as it seems.
  - Apparently, any caller can select over an IDL interface which replacement
strategy to use ("?", char code number and some other options). Unfortunately,
the identifier for "?" is actually called 'UseQuestionMark' or similar. :-( I.e.
either we have a misnamed API, or we change it.
  - In addition to that, there is at least one fallback in each of the
platform-specific implementations. The problem is to catch them all, or we might
have the new replacement char in most cases and the old one, the question mark, in
some exceptional cases.

kill this account

Comment 16

24 years ago

we could display the unicode value; eg: "\x5E3A" or (5E3A)
at least this way the user could tell us the code points that are
failing instead of "I see a bunch of <fallback> characters".

Ben Bucksch (:BenB)

Reporter

Comment 17

24 years ago

Brian, we have that option already - that's one of the other options in the
interface. Using it would mean changing the caller, but I would prefer not to do
that. Assume you have the following original text:

Die Bäuerin ißt die größten Törtchen.

(The text makes no sense.) What is easier to read:

Die B#uerin i#t die gr##ten T#rtchen.

or

Die B\x5463uerin i\x4683t die gr\x5734\x4683ten T\x5734rtchen.

? The former.
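
The comparison above can be reproduced with a small sketch. It is an illustration only: the function names are made up for the example, the "font" is assumed to cover nothing beyond 7-bit ASCII, and the escape style prints the real Unicode code points of the characters rather than the placeholder numbers used in the comment.

```python
def render(text, can_draw, substitute):
    """Replace every character the font cannot draw with substitute(ch)."""
    return "".join(ch if can_draw(ch) else substitute(ch) for ch in text)


def ascii_only(ch):
    # Hypothetical font that covers nothing beyond 7-bit ASCII.
    return ord(ch) < 0x80


def single_char(ch):
    # Strategy 1: one fixed replacement character per missing glyph.
    return "#"


def code_escape(ch):
    # Strategy 2: spell out the code point, e.g. \x00E4.
    return "\\x{:04X}".format(ord(ch))


# The German sample sentence from the comment, written with escapes
# so the source file itself stays ASCII.
TEXT = "Die B\u00e4uerin i\u00dft die gr\u00f6\u00dften T\u00f6rtchen."

print(render(TEXT, ascii_only, single_char))
print(render(TEXT, ascii_only, code_escape))
```

Passing the substitution function as a parameter mirrors the point made earlier in the thread that the caller selects the replacement strategy, so switching strategies means changing the caller.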

Hixie (not reading bugmail)

Comment 18

24 years ago

UNICODE has a fallback glyph. If any font on the system has this fallback glyph,
checking in the order in which fonts are given in the font-family list, then we
should use that before any other character.

Ben Bucksch (:BenB)

Reporter

Comment 19

24 years ago

> if any font on the system has this fallback glyph, [...]
> then we should use that before any other character.

Doesn't it look odd if you have a Times glyph in a Courier font block?

Karl Ove Hufthammer

Comment 20

24 years ago

> UNICODE has a fallback glyph

What's its name/number?

Hixie (not reading bugmail)

Comment 21

24 years ago

Ben: That's why you try all the specified fonts first. This is the same 
algorithm as is used to find each glyph in the first place.

Karl: U+FFFD REPLACEMENT CHARACTER
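
A minimal sketch of that lookup order, assuming font coverage is available as a simple mapping from font name to the set of characters it can draw. The function name and the data layout are hypothetical and are not taken from the Mozilla source.

```python
REPLACEMENT_CHAR = "\uFFFD"


def find_font_for(ch, font_family, coverage):
    """Return the first font that can draw ch, or None.

    font_family: font names in the order given by the style sheet.
    coverage: dict mapping each installed font name to the set of
    characters it has glyphs for (an assumption for this sketch).
    """
    # Try the fonts named in font-family first, in their stated order.
    for name in font_family:
        if ch in coverage.get(name, set()):
            return name
    # Then try any other installed font.
    for name, glyphs in coverage.items():
        if name not in font_family and ch in glyphs:
            return name
    return None
```

If the first listed font already has U+FFFD, it is chosen, so a glyph from a different typeface only shows up when none of the specified fonts provides one.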

Ben Bucksch (:BenB)

Reporter

Comment 22

24 years ago

> This is the same algorithm as is used to find each glyph in the first place.

Oh, OK. So, all we need to do is replace the |'?'| in the source with
|REPLACEMENT_CHAR| or however it is called? (Assuming we leave the misnamed API
alone for now.)

Erik van der Poel

Comment 23

24 years ago

The API is not really "misnamed", since it *does* give you question marks, as
advertized. You may argue that the API should also (or instead) have given you
the option to choose something other than the question mark, but that is a
different argument.

Anyway, it is not sufficient to replace '?' with the Unicode replacement char,
since (at least on Windows and Unix) the "substitute" font machinery assumes
that the substitute font only guarantees ASCII availability (0x20-0x7E), and so
you can only use ASCII fallbacks (e.g. "?", "EUR", "...", etc).

If you wanted to have a more general fallback mechanism, you would have to
implement another level of font switching (in addition to the one that is
already in nsRenderingContext{Win,GTK}). I.e. you would need to switch from font
to font inside the nsFont{Win,GTK}Substitute.{GetWidth,DrawString,etc} methods
too.
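
The ASCII constraint on substitute strings can be expressed as a simple check. This is a sketch under the assumption stated in the comment (only 0x20-0x7E is guaranteed); the helper name is made up for the example.

```python
def is_safe_fallback(s):
    """True if every character of s lies in the guaranteed range 0x20-0x7E."""
    return len(s) > 0 and all(0x20 <= ord(c) <= 0x7E for c in s)


for candidate in ["?", "EUR", "...", "#", "\u00b0", "\uFFFD"]:
    # The last two (degree sign, U+FFFD) fall outside the guaranteed range.
    print(repr(candidate), is_safe_fallback(candidate))
```

Under this rule "?", "EUR", "..." and "#" pass, while the degree sign and U+FFFD itself do not, which is why a non-ASCII replacement needs the extra level of font switching described above.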

Ben Bucksch (:BenB)

Reporter

Comment 24

24 years ago

> The API is not really "misnamed", since it *does* give you question marks, as
> advertized.

This bug is about changing that -> The API (name) was not general enough -> It
*will be* "misnamed" once we fix this bug (and not the API).

> it is not sufficient to replace '?' with the Unicode replacement char, [...]

I guessed so :-(.

Proposal: 2 steps for this bug.

Step 1 just changes the current replacement char ('?') to some other, more
uncommon, but ASCII, char, e.g. '|'.

Step 2 adds an additional layer on top of that. It tries to get the Unicode
replacement char via the normal font mechanisms (Erik, would that work in the
substitution code?), and, failing that, uses the ASCII replacement char.

Karl Ove Hufthammer

Comment 25

24 years ago

> Karl: U+FFFD REPLACEMENT CHARACTER

This is actually very nice. It looks (could look) like a white question mark on 
a rotated black square. The description reads:

"used to replace an incoming character
whose value is unknown or unrepresentable
in Unicode"

Erik van der Poel

Comment 26

24 years ago

If you submit a patch that simply changes the '?' to something else but does not
change the name used in the API (attr_FallbackQuestionMark), then I will
disapprove. Also, the old API should probably stay, so you need to add
something.

I actually don't feel so strongly about the '?', but if many people want to
change it, then I'd rather use '#' than '|', since the latter looks too much
like an 'l' or a '1' or an 'I'.

Yes, the additional font switching would be implemented inside
nsFontGTKSubstitute::{GetWidth,DrawString,GetBoundingMetrics}.

Ben Bucksch (:BenB)

Reporter

Comment 27

24 years ago

I'll attach what I have. Erik already said he will disapprove of it, but I don't
care to add a new interface (I would have changed the existing one).

Ben Bucksch (:BenB)

Reporter

Comment 28

24 years ago

Attached patch: Fix that WFM, version 1

Ben Bucksch (:BenB)

Reporter

Comment 29

24 years ago

Oh, cool, no, the fix doesn't work too well - Schumacher Clean on Linux doesn't
have the replacement char I chose (°) :-(. I should have used #. Anyway, you can
see from the patch what to change.

kill this account

Updated

24 years ago

Target Milestone: --- → Future

kill this account

Comment 30

23 years ago

--> ftang

Assignee: bstell → ftang

Status: ASSIGNED → NEW

Frank Tang

Comment 31

23 years ago

bulk move NEW FUTURE bug to ASSIGN

Status: NEW → ASSIGNED

Peter K. Sheerin

Comment 32

22 years ago

BenB, in response to comment #19, the replacement character has an appearance
unique enough that I don't think it really would look different in different
fonts. It's a black diamond with a white question mark inside. Looks like this
if your browser can render it: &#65533;

Most (all?) versions of Netscape 6 used this character for many missing glyphs,
and it's the only method of calling out missing characters I've ever seen that
didn't make the text confusing. Empty squares, question marks, and # signs can
all look like legitimate text in some situations. I'd argue that no US-ASCII
character will ever avoid these problems, and that some distinct character must
be used.

Either use Replacement Character (and include a font with just that char in all
moz distributions if you have to), or create a GIF or PNG image that looks like
it, and insert that in the event that Replacement Character is not available.

Readers need something that makes it clear they're going to have to guess (from
context, likely) what the character was supposed to be, and authors also need
something bold and clear--so they know they're using a character that's either
incorrect or not present in all fonts. Using any other character is going to
hide that fact.

&#65533; is easily seen when scanning text, and though it can have a jarring
appearance, this is a necessary effect.

Adam Hauner

Comment 33

19 years ago

-> to default owner (rather than ftang's WONTFIX)

Assignee: ftang → smontagu

Status: ASSIGNED → NEW

QA Contact: teruko → amyy

Target Milestone: Future → ---

Simon Montagu :smontagu

Assignee

Comment 34

16 years ago

Fixed by bug 372629.

Status: NEW → RESOLVED

Closed: 16 years ago

Depends on: 372629

Resolution: --- → FIXED

Testcase; 24 years ago; Ben Bucksch (:BenB); 143 bytes, text/html
Fix that WFM, version 1; 24 years ago; Ben Bucksch (:BenB); 1.77 KB, patch