Closed Bug 79928 Opened 23 years ago Closed 23 years ago

Zero-width space (unicode x200b) not interpreted

Categories

(Core :: Layout, defect)

x86
All
defect
Not set
normal

RESOLVED DUPLICATE of bug 33498

People

(Reporter: samuel, Assigned: waterson)

Details

(Keywords: fonts, platform-parity, Whiteboard: INVALID/WONTFIX?)

Attachments

(8 files)

The unicode character #x200b, which is supposed to be a zero-width space, is not
interpreted by mozilla, and whatever character happens to be or not be at that
position in the current font gets displayed.

What should happen is that mozilla interprets the character and lays out the
text accordingly and doesn't use the font.
Blocks: 71549
Keywords: fonts
Attached file testcase
As demonstrated by the testcase, it does cause a line break, so it just needs to
not display the font character.
Attached patch patch
why? If the font claims to have a glyph for U+200B, why shouldn't we use it?
I think this is INVALID.
This is marked "OS: All" but it works for me on Windows 2000. It could be a 
Linux-only bug, but more likely it's just an error in the font.
Me again. If it is a bug in the font and there are fonts on Linux which don't
have the bug, then one possible solution is to simply put the non-broken fonts
at the front of the preferred font list in the CSS.

I tried this on Exceed and apparently none of my fonts have this character, so
it displayed as a question mark. The same should happen if none of my fonts had
a normal space (U+0020) or any of the other spacing characters around U+200x.
Keywords: pp
Whiteboard: INVALID/WONTFIX?
On Windows 98 it gives a question mark.  Isn't the point of this character code
to have no visible representation?  I think having the font indicate what
character it is, is good for debugging applications that aren't supporting it
properly.
The point is there isn't anything to "support". It's up to the font to have a
glyph for U+200B, just like it is for U+0020 and U+2009. U+200B is not
special in any way.

Note that when we ship with MathML, we will have to ship with Math fonts, so
that would be a good time to ensure that we also ship with a font that has a 
U+200B glyph.
But shipping fonts is a download headache at best, and has licensing issues at 
worst. Ought we simply hack around it for now? Seems reasonable to me, anyway.
Assignee: karnaze → waterson
Keywords: patch
Sounds reasonable to me too for now. I have been wandering in the area 
recently. The best spot for the fix-up patch is probably IS_DISCARDED(_ch)
in nsTextTransformer.cpp.
Attached patch better patch?
Comment on attachment 49397
better patch?

sr=waterson
Attachment #49397 - Flags: superreview+
r=rbs, didn't test - assuming it has been tested

(I tried using the checkbox from the patch manager for the first time, but got a
red alert that "one of the statuses wasn't valid" -- is there any link to a help
for these new things?)
Hang on, the second patch doesn't seem to work, I'll look at it some more.
(The first patch does work, btw)
The second patch doesn't make sense.  We don't want the character discarded or
it won't cause the line break.  Also, the macro doesn't seem to handle big
characters or something.  So, back to the first patch unless someone has a
better idea.
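The distinction drawn above, between discarding the character and merely not painting it, can be sketched outside Mozilla. This is a minimal illustration, not the code from either patch: `WrapAtZwsp` and `kZWSP` are hypothetical names, and every visible character is assumed to be one column wide. U+200B is consumed as a break opportunity during line breaking but never reaches the output lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not Mozilla code. Assumes each visible character
// occupies exactly one column and U+200B occupies none.
constexpr char32_t kZWSP = 0x200B;

std::vector<std::u32string> WrapAtZwsp(const std::u32string& text,
                                       std::size_t maxColumns) {
  assert(maxColumns > 0);
  std::vector<std::u32string> lines;
  std::u32string line;     // text already committed to the current line
  std::u32string segment;  // text seen since the last break opportunity

  auto flushSegment = [&]() {
    // Start a new line when the pending segment does not fit on this one.
    if (!line.empty() && line.size() + segment.size() > maxColumns) {
      lines.push_back(line);
      line.clear();
    }
    line += segment;
    segment.clear();
  };

  for (char32_t ch : text) {
    if (ch == kZWSP) {
      flushSegment();  // break opportunity; the character itself is not kept
    } else {
      segment.push_back(ch);
    }
  }
  flushSegment();
  if (!line.empty()) lines.push_back(line);
  return lines;
}
```

If the ZWSP were stripped before this step, the text would arrive as one unbreakable segment and overflow the line, which is the behaviour the second patch appears to have produced.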
Whiteboard: INVALID/WONTFIX?
Just having a font is the cleanest way to handle the problem, since nsTextFrame 
is not the only place where strings are drawn within Mozilla. We have been 
bitten many times by layout patches that "work", so that criterion is not enough. 
Specifically, it is not clear what happens with several consecutive ZWSP 
characters. Also, there is not enough analysis of what happens to the 
offsets that nsTextFrame maintains, and whether it is necessary to sync the 
transformation flag that nsTextFrame uses when it ends up with text that isn't 
a replica of the original (the flag helps to optimize subsequent reflows, such as 
those coming from dynamic changes). Suggesting caution about this one; Futuring 
is also a sensible option as originally proposed.
Please do not remove comments which I added to the status whiteboard. I still think
this bug should be marked WONTFIX or INVALID. Fonts should have an empty glyph at 
the ZWSP codepoint. If they don't, then they probably intended to show the glyph,
or are buggy. Either way, that isn't our problem, except if we want to ship with a
font that does this right. See also my comments at 2001-05-13 13:47.
Whiteboard: INVALID/WONTFIX?
All right: Hixie wins. (Arguing with him is like wrestling with a bag full of cats.)
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WONTFIX
Verified WONTFIX. Very rare problem. Hopefully, it might become a non-issue if 
the auto-download of fonts becomes more comprehensive -- can't remember the bug#.
(BTW, whitespace testcases also need to include an example with white-space: pre 
for completeness.)
Status: RESOLVED → VERIFIED
Waterson, you pansie ;)

This isn't rare.  It's something we could use in *every* line of the javascript
debugger console, and chatzilla.  In these console interfaces, we need to make
sure that a long word doesn't come along and cause a horizontal scrollbar (try
using a console where every line overflows to the right.)  Currently, we do this
by inserting an html:img with a width of 0px.  If we could reliably use the
&zwsp; character instead, we'd have far fewer DOM nodes, and could construct the
messages faster, while conserving DOM objects.  Unfortunately, in the Real World
(you remember the Real World, right, where our users live?) lots of fonts are
broken.  This includes default installs of contemporary Windows and Linux
systems.  Please bear this in mind as this bug sits "WONTFIX".

This isn't ssieb picking nits, he's trying to do something cool with the code
(see bug 71549), let's not stonewall it.
What I meant was that ZWSP is rarely encountered in web pages.

FYI, the font called "Lucida Sans Unicode" has a glyph for ZWSP as well as some
other space-like characters.

But actually, speaking from a broader perspective, if this bug is blocking some
other work, it might be much easier/safer to special-case the ZWSP in the font
subsystem than in the cryptic code of nsTextFrame -- although getting r/sr on
the various versions of GFX is an annoyance. There is another advantage if done
in the font subsystem: not all characters have to be tested, only those for
which fonts are missing. It is known when fonts are missing for particular
characters; that's where they get substituted with the question mark '?'.

If a font with a ZWSP glyph is found, then all goes well and there is nothing to
worry about. Otherwise, the so-called fallback font-substitute code gets activated,
and a one-liner can be added therein to not substitute any ZWSP (since its width is
zero and it can be treated as if it wasn't there...). I am in the midst of
changes and can easily try this if it is blocking immediate work.
&zwsp; is not the way to fix the wrapping problem. The way to fix that problem is
by using a proprietary value for 'white-space' just like we did with the 'wrap'
attribute on the <pre> element (using the '-moz-pre-wrap' keyword). Call it 
'-moz-hard-wrap' or whatever. It is very likely that CSS3 will have such a feature
anyway, so we are going to have to write the code at some point whatever we do.
Interesting, people have already started asking for a hard-wrap in bug 99457.

I personally think that hixie (and dbaron) do a good job by being very careful
with the standards and keeping an eye on the platform for the future. It is too
tempting to hack all sorts of things as a matter of expediency, and since mozilla
is so big, you wouldn't know the ugly things that are going on behind your back and
their potential side-effects in the long run. I also know that they are careful
to keep their feet on the ground and accept hacks when they are convinced of
their value. So...

Who will conduct the analysis for the nsTextFrame hack?! But I looked at the
font code and we might perhaps help w.r.t. ZWSP if it blocks some other nifty
work, alright hixie? If you think it is still premature, I will move on... or
maybe you might want to first see the patch to confirm that it is simpler/safer
over there... let me attach something soon.
The reason I didn't mark the bug WONTFIX, nor verify it when it was so marked, is
because although I don't like such a hack, I would be prepared to tolerate it to
some extent. :-) So long as a font with a glyph at U+200B would still show the
glyph in response to a &zwsp; entity, then we should be ok.

However, I *do* think that using &zwsp; for the purposes that rginda wants, 
namely a "cheap" hard wrap, is wrong, and would be better handled by the 
'white-space' property, so if that is the sole reason to implement this hack, 
then I'm against.
I think implementing this at the same level as the code that changes U+2122 into
the two characters "T" and "M" would be reasonable, and not a hack.
Re-opening since some activity is going on in the bug.

I attached a patch for gfxwin. Other platforms use transliterate.properties
(this is where U+2122 is associated with "TM", and &copy; with "(c)", when 
there is no font with glyphs). Maybe the fix on other platforms is simply to add 
empty entries in intl/unicharutil/tools/gentransliterate.pl, which is the 
generator of transliterate.properties. Need testers on Mac & Linux to confirm 
that.
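As a rough illustration of the "empty entry" idea, here is a minimal sketch of a transliteration lookup. It is not the gfxwin patch and does not read transliterate.properties; `SubstituteMissingGlyph` and the hard-coded table are hypothetical stand-ins. The table is consulted only once no font has a glyph, and U+200B maps to the empty string rather than to '?'.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the transliteration lookup; the entries mirror
// the examples mentioned in this bug and are illustrative only.
std::string SubstituteMissingGlyph(char32_t ch) {
  static const std::map<char32_t, std::string> kTransliterations = {
      {0x00A9, "(c)"},  // COPYRIGHT SIGN
      {0x2122, "TM"},   // TRADE MARK SIGN
      {0x200B, ""},     // ZERO WIDTH SPACE: draw nothing at all
  };
  auto it = kTransliterations.find(ch);
  // Characters without an entry keep the usual question-mark fallback.
  return it != kTransliterations.end() ? it->second : std::string("?");
}
```

Because the substitution happens at drawing time, the character would still be present in the text that line breaking sees, so its break opportunity should be unaffected.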
Status: VERIFIED → REOPENED
Resolution: WONTFIX → ---
Let's try to fix chatzilla as Hixie suggested (|white-space: -moz-hard-wrap|),
then we can deal with this bug on its own merits (which seem marginal). It looks
like bug 99457 sufficiently covers that RFE, yes?
Agree. But notice that the patch is worthwhile in the substitute-font fallback
context. Once that code gets activated, it means there is no font at all with
the needed glyphs, and recovery (or "faking" the rendering if you like) is all
that matters. It benefits everybody -- not just nsTextFrame, and callers don't
have to care that something else was used in the place of their input.
Yes, I buy that. sr=waterson on attachment 49543 if dbaron and other font-savvy
people are happy with it. (As you note, we'll need to duplicate this code for
Mac and Linux as well.)
It might actually be much easier on other platforms because they already use a
parameterized transliterate handling. Just adding an entry for ZWSP in the
transliterate.properties file might do the trick (need testing to be sure -- of
course):

http://lxr.mozilla.org/seamonkey/find?string=transliterate
These patches don't solve the original issue which was that the font does have a
glyph for that character.  It appears that the font designer expected the
application to interpret those character codes, instead of using the font.
The glyph looks like:
ZW
SP
with a box around it and the same for other similar characters, e.g. zwnj, thsp.
I would be very happy for a |white-space: -moz-hard-wrap| option, that would
solve a lot of problems.
Just curious, care to tell which is the font that is claiming to have a glyph?
(indeed the problem wouldn't be solved by the patches since they come into play 
too late in the chain during the font-substitute fallback).
This "original problem" is the exact issue that I do not think we should "fix".
If the font is buggy then tough. Uninstall your font. Would you blame Mozilla for
misrendering the "A" character if your font had a "B" glyph at that code point?
To respond to waterson's comment at 20:18 above -- sure, it makes sense to me at
that level.
rbs's patch 49543 looks very reasonable to me. When a system cannot provide a
glyph for a character, it is solely the application's responsibility to provide a
way to represent it. Replacing it with '?', transliterating it, or in this case
just ignoring it all look reasonable as long as the character's function (as
a line break) remains.
I might possibly accept that the font is broken, but I'm not convinced yet. 
But, if mine is broken, then there are probably a lot of others which are broken
as well, and we can't expect the average user to know it's their font, so the
proposed solution here won't help chatzilla.
Samuel, won't rbs' patch solve the problem you mentioned? From the user's
perspective, it should have the same effect as your original patch.
no, because as you can see in the picture, there actually *is* a glyph at that
character location, so even with the patch, mozilla will use that glyph.
I had to disable the Lucida Sans Unicode and Palatino fonts myself so as to hit
a break point I added in SubstituteChars() to test my patch. Had I not disabled
these fonts, I wouldn't reach that function. But unlike your font, these fonts
have the expected glyphs.

So what is the font with those glyphs if I may re-ask?
Attached image another screenshot
ok, I attached a screenshot of a bunch of fonts on my system.  So which one
should I use? ;-)
Are you going to tell me that everybody did their fonts wrong?
Those glyphs are probably all coming from a single font since they don't exist
in most or all of the fonts you list.
How does that work?  Which font would they be coming from?  Is it mozilla or the
font server picking them?
It is just that I am interested to know the font for my own records, so that I
know about that font for future reference.

To see the glyphs in a font, you could do something like:
xfd -fn -adobe-symbol-medium-r-normal--14-140-75-75-p-85-adobe-fontspecific

To see the font actually used by Mozilla, you could set a testpage with
the single character, and set:

setenv NS_FONT_DEBUG 2

(which is the value that I see defined in nsFontMetricsGTK for that purpose
#define NS_FONT_DEBUG_CALL_TRACE  0x02)

===
But FYI, here is how it works:

Suppose you have:
<span style="font-family: font1, font2, cursive"> your chatzilla text... </span>

If 'your chatzilla text...' is entirely ASCII, then little happens; otherwise
the hunt for a font for _each_ character goes on like this:

nsFontWin*
FindFont(HDC aDC, PRUnichar aChar)
{
  // See if the user prefers something else...
  nsFontWin* font = FindUserDefinedFont(aDC, aChar);
  if (!font) {
    // See if there is a glyph in the local list of fonts: "font1, font2"
    font = FindLocalFont(aDC, aChar);
    if (!font) {
      // See if there is a glyph in a font that is of generic "cursive" type
      font = FindGenericFont(aDC, aChar);
      if (!font) {
        // See if there is a glyph in the global list of fonts on your system
        font = FindGlobalFont(aDC, aChar);
        if (!font) {
          // Fallback: substitute/transliterate to render something if possible...
          font = FindSubstituteFont(aDC, aChar);
        }
      }
    }
  }

  return font;
}

So if there is a font on your system that claims to have the glyph, the fallback
is never reached.
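The short-circuit described above can be reproduced with a small self-contained model. This is only a sketch under simplifying assumptions: `Font`, `FirstFontWith` and `FindFontFor` are hypothetical names, the cascade is reduced to two lists (the author's list, then all installed fonts), and a null result stands for "fall through to substitution".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of a font: a name plus the code points it claims.
struct Font {
  std::string name;
  std::set<char32_t> coverage;
};

// Returns the first font in `fonts` that claims `ch`, or nullptr.
const Font* FirstFontWith(const std::vector<Font>& fonts, char32_t ch) {
  for (const Font& font : fonts) {
    if (font.coverage.count(ch) != 0) return &font;
  }
  return nullptr;
}

// Simplified cascade: the author's list first, then every installed font.
// A nullptr result means the substitute/transliterate fallback would run.
const Font* FindFontFor(const std::vector<Font>& authorList,
                        const std::vector<Font>& installed, char32_t ch) {
  if (const Font* font = FirstFontWith(authorList, ch)) return font;
  return FirstFontWith(installed, ch);
}
```

With an installed font whose coverage includes U+200B, `FindFontFor` returns that font and the fallback is skipped, whatever the glyph at that position looks like. Only when no installed font claims the character does the lookup return null, which is the one case a fix in the substitution code can influence.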
No longer blocks: 71549
> so the proposed solution here won't help chatzilla.

Just for the record again... this bug *should not* help chatzilla. The bug that
will help chatzilla is bug 99457.
Now I'm using zwsp for the purpose of wrapping lines,
instead of the old <wbr>, which is no longer supported.

In Thai, we can have a very long line of adjoining words.

html 4.01:
".. When formatting text, user agents should identify these
words (sequences of white space separate words) and lay them out 
according to the conventions of the particular written 
language (script) and target medium .. This layout may 
involve putting space between words (called inter-word 
space), but conventions for inter-word space vary from 
script to script .. while in Thai it is a zero-width word 
separator (&#x200B;) .."


xhtml 1.0:
".. in languages whose script is related to Nagari (e.g., 
Sanskrit, Thai, etc.), grammatical boundaries may be encoded 
using the ZW 'space' character, but will not typically be 
represented by typographic whitespace in rendered output .." 

So, I think using zwsp for my purpose makes sense.

zwsp is not just another "A". I think Mozilla should do 
something when the font has no glyph.
Pawee: I agree. That's what the patch does -- when the character is missing from 
all the fonts, we will handle it specially (by rendering it as "" instead of "?"). 
The issue raised above is with regard to fonts that _do_ have a glyph in that 
position, either on purpose or because the font designer was misguided.
With "setenv NS_FONT_DEBUG 2", ssieb reported that Mozilla gave the mystery font
being used on his system for ZWSP (U+200B) as:

FindFont(200B):
returns -mutt-clearlyu-medium-r-normal--17-120-100-100-p-128-iso10646-1
Depends on: 32536
Depends on: 33498
No longer depends on: 32536
Resolving as duplicate of bug 33498. I am hooking the transliterator to GfxWin, 
and this bug is fixed as a direct consequence of that.

*** This bug has been marked as a duplicate of 33498 ***
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → DUPLICATE