Closed Bug 1027569 Opened 10 years ago Closed 2 years ago

tspan elements: some content becomes merged

Categories

(Core :: SVG, defect)

x86
Windows Vista
defect
Not set
major

Tracking

()

RESOLVED DUPLICATE of bug 1696792

People

(Reporter: j.vonpreussen, Unassigned)

Details

(Keywords: testcase)

Attachments

(3 files, 2 obsolete files)

User Agent: Mozilla/5.0 (Windows NT 6.0; rv:29.0) Gecko/20100101 Firefox/29.0 SeaMonkey/2.26 (Beta/Release)
Build ID: 20140428215944

Steps to reproduce:

coded two (2) 'tspan' lines (same 'x', different 'y')


Actual results:

The last 'tspan' line has its initial character joined to the second-to-the-last line. This does not happen if the source code has any whitespace between the ending '/tspan' of the second-to-the-last line and the following 'tspan'.

file url: http://2secure.us/testing/tspan_test.svg


Expected results:

two (2) separate lines
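
For reference, a minimal sketch of the two markup variants described in this report. This is not the reporter's actual file: the text, coordinates, and font size are assumptions for illustration, and the font is the one identified in the comments that follow.

```html
<!DOCTYPE html>
<!-- Hypothetical reduction of the report; assumes Palatino Linotype is installed. -->
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="260">
  <style>text { font: 48px 'Palatino Linotype'; }</style>

  <!-- Case 1: no whitespace between </tspan> and <tspan>,
       so "t" and "z" are adjacent characters in the same <text>. -->
  <text><tspan x="10" y="50">look at</tspan><tspan x="10" y="110">zooms</tspan></text>

  <!-- Case 2: whitespace between the two tspans. -->
  <text><tspan x="10" y="170">look at</tspan>
    <tspan x="10" y="230">zooms</tspan></text>
</svg>
```

As reported, the first case draws the leading "z" on the upper line, while the second case draws two separate lines.
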
Severity: normal → major
Keywords: testcase
Whiteboard: svg text tspan text shows in prior element
Component: General → SVG
Product: SeaMonkey → Core
Version: SeaMonkey 2.26 Branch → Trunk
Status: UNCONFIRMED → NEW
Ever confirmed: true
This is something to do with the particular font: Palatino Linotype and the letters t and z (I tried several other letter combinations and those worked OK).

Perhaps these characters are incorrectly marked as a ligature in this character set?

I also had trouble displaying the original testcase locally, as it seems to have an invalid Unicode character — in it (a long dash). I'm not sure why that issue doesn't occur with the linked testcase.
ccing some font experts
mr longson:

since i am not much of a coder, i have to constantly refresh my svg "skills" via the web and you have certainly made the life of myself and many others much easier by your many contributions and i would like to thank you for that.

now, first, the em dash you reference. i usually write in np++ and the characters i inserted in my example testcase should at least come through as the std ascii 0x97. as you noted, all browsers parse this correctly in downloaded files. further, i have no problems when the file is rendered locally by SM/FF. so, i am not sure why your SM is having a problem parsing a disk file with this quite-normal character. but (there is always a "but"), there is a bit of a disconnect when using unicode fonts. Palatino Linotype (PL) points-to/encodes this 0x97 character as their glyph 559 (see my att'd font info) and the character comes through as U+2014 (your '—') in the outline profile which is in the unicode punctuation sub-set. U+2014 IS a valid unicode character (http://www.unicode.org/charts/PDF/U2000.pdf). from my experience, most (if not all) "unicode-aware" fonts use the punctuation sub-set. since the xml specifies UTF-8, no browser (well, there are chrome problems) SHOULD have a problem with these code-points no matter where the file resides.

as far as ligatures are concerned PL does the std [filt] groupings plus 'st'. from my memory i seem to remember a 'tz' ligature of sorts in one of the slavic languages, but PL does do that one.

the general font issue. i usually use PL because of its long history and wide-spread availability. its rendering is also aesthetically pleasing to (possibly, only) me. moreover, its non-english character rendering/availability and traditional approach makes for another great "plus".

that is what i like, now what re the 't' and 'z' issue. as i noted, supra, "...t</tspan><tspan ... >z..." does not work while "...t</tspan>[whitespace]<tspan ... >z..." does work. the non-working case DOES have the various attributes properly applied from its enclosing tspan (to the 'z', in this case), it just does not apply the x/y positioning. now, remember, i am not a hard-core coder; but it seems to me that the issue is in the element-parsing stage and not the font-rendering stage. however, to be certain; i pulled up the PL characters in a font editor and checked attributes against other fonts [book antigua and serif (times new roman)]. if you look at what i found you will see that nothing differs radically amongst the typefaces.

it is with no joy that i have to confirm your suspicions that PL might solely somehow be at fault. using your testcase and pushing back PL with 'Book Antigua' the problem goes away. the same is true using 'Palatino' which i installed just to check (it is not as "pretty" as PL!).

after all this, i do not know if i have added anything of worth to the discussion. please, again, accept my thanks for your assistance.

johann
i was looking for some pages to close and refreshed this one. i do not know why it popped out at me, but there was a glitch in my last comment in the third (3rd) paragraph:

'but PL does do that one.' should read 'but PL does *NOT* do that one.'. sorry for this.

johann
Apparently that Palatino has a "tz" ligature. You can reproduce the same issue with more fonts using combinations such as "fi" or "fl" that are more commonly ligated. (E.g. try "fi" with Fira Sans.)

So it appears we don't handle drawing separately-positioned <tspan> elements where there is a ligature glyph that spans the boundary between the elements; the entire ligature glyph gets drawn for the first <tspan>, and doesn't appear at all in the second.

A workaround would be to put the two text fragments into separate <text> elements, as shaping (ligatures, etc.) will only occur within a single <text>.
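
The workaround can be sketched as follows. This is a hypothetical example, not one of the attached test cases; the text and coordinates are assumptions.

```html
<!DOCTYPE html>
<!-- Hypothetical sketch: one <text> element per line instead of one <tspan> per line. -->
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="140">
  <style>text { font: 48px 'Palatino Linotype'; }</style>

  <!-- Each line is its own <text>, so "t" and "z" are never shaped together. -->
  <text x="10" y="50">look at</text>
  <text x="10" y="110">zooms</text>
</svg>
```

The cost is that the two lines are no longer a single text element for selection or copy and paste.
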
perhaps you misunderstood my last comment. Palatino Linotype DOES NOT have a ligature for 'tz'. in fact, very few fonts do. moreover, ligatures, by definition, are a single character ... not two (2).

therefore, we are back to the fact that gecko is not handling the tspans correctly.

johann
No, you're misunderstanding what ligatures are. Palatino Linotype as found on Windows *does* have a "tz" ligature. Not a character that you'll find in a "character finder", but a glyph that is accessed via an OpenType feature.

If you try the testcase

  data:text/html,<div style="font:60px Palatino Linotype">tztztztztz

you get the ligature, where the "t" and "z" pairs are merged so that they actually touch. There are a number of other similar ligatures, such as "tt"; try

  data:text/html,<div style="font:60px Palatino Linotype">tttttttttt

and see how pairs of "t" glyphs are ligated, so you only see a (slight) gap at alternate character boundaries.

If you disable the OpenType ligature feature, you'll see the "tz" pairs remain separate:

  data:text/html,<div style="font:60px Palatino Linotype;
                  -moz-font-feature-settings: 'liga' off">tztztztztz

(And if you do the same in the style of the SVG example, the problem will be fixed there too.)
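
Applied to an SVG case, that might look like the sketch below. The markup is a hypothetical reduction rather than the attached testcase; the unprefixed font-feature-settings property is added alongside the prefixed form on the assumption that newer builds expect the standard name.

```html
<!DOCTYPE html>
<!-- Hypothetical sketch: disable the font's common ligatures for all SVG text. -->
<svg xmlns="http://www.w3.org/2000/svg" width="400" height="140">
  <style>
    text {
      font: 48px 'Palatino Linotype';
      -moz-font-feature-settings: 'liga' off; /* prefixed form, as in the example above */
      font-feature-settings: 'liga' off;      /* standard CSS property */
    }
  </style>

  <!-- Adjacent tspans with no whitespace between them, as in the report. -->
  <text><tspan x="10" y="50">look at</tspan><tspan x="10" y="110">zooms</tspan></text>
</svg>
```

This turns off every 'liga' substitution in the affected text, including "fi" and "fl", not only the "tz" pair.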

It would be good if Gecko handled this better; it already paints the ligature in two parts (using clipping) to achieve the two-color effect, but it's failing to reposition the second half appropriately. However, in general this will still be a less-than-perfect result, as painting the ligature glyph in two halves may not visually divide it very well.
ligatures commonly are located at fb00 and palatino linotype has seven (7) such glyphs and none of them are 'tz' and there is no 'tz' glyph in the entire typeface. what you are telling me is that gecko is actually "creating" ligatures by compacting character spacing when the typeface is not supporting such. also, it is messing up the attributes on the letters that the font says are separate when they bridge tspans.

frankly, tspans are svg elements that are supposed to be handled separately and insulated from any font fiddling that gecko is doing. you are saying that this "compacting" of certain character combinations is an OTF spec/"feature" (not followed by other browsers) and i will have to research that further since it has never arisen (for me) before. moreover, i am a bit suspicious re that being an OTF spec since 'Book Antigua' is also OTF and does not present with this problem. thus, it seems to be an OTF "feature" selectively applied only by gecko.

i am still convinced that gecko's approach is wrong since it does not attempt to combine characters into a pseudo ligature when there is any whitespace between the </tspan> and the next <tspan>. since all such intervening whitespace is supposed to be ignored IAW xml/svg spec, gecko is plainly not being consistent when what it was doing without the whitespace was wrong and with the whitespace is correct. and, of course, gecko is not ignoring the whitespace as it should.

once again, i see this as an element (tspan) parsing problem and not a font rendering problem. however, pushing 't' and 'z' more closely together (when in a legitimate run of characters) certainly does not seem like it should be done in english (where it is uncommon) and not even beneficial in most other languages where they also are considered separate characters. ligatures are supposed to enhance legibility and i see little benefit from the way gecko handles this pseudo ligature issue. in fact, it makes their combination less legible.

i suggest you look at the scaled-up 'tttz' (in separate tspans) testcase i submitted and you will see that the rendering of the second (2nd) "t" has the color attribute of its tspan, but the character is malformed (is not the glyph from the font) when combined with the first (1st) 't'. the third (3rd) 't' is not combined with the first two (2), but is combined with the 'z'. and there the color attribute from that 't' is partially spread to the 'z'.

i think that this gecko-specific problem would be solved if it made itself consistent with the other browsers and did not try to be so cute in rendering some character combinations oddly and did its tspan parsing correctly. font manufacturers already produce their outlines with appropriate spacing allowances and gecko trying to "go one better" is just not working and by no means an enhancement of legibility.

johann
Attached image: malformed characters and painting errors (obsolete)
I think the behaviour here might be correct.  Because the document does manual line breaking using <tspan> elements, and there is no white space between the <tspan>s, the "atzooms" is really considered to be one word, hence the "tz" ligature formation.

So my first suggestion is that you really should have white space between the <tspan> elements.

Reading the rules in the SVG spec about assigning character positions to glyphs (the bulleted list in http://www.w3.org/TR/SVG11/text.html#TSpanElement), it's clear that ligatures should not be broken up by positioning values.  I don't think those rules really cover the case here, where you have an absolute position assigned to the second character that makes up a glyph but no position assigned to the first character; that's something we'll cover in the SVG 2 spec.  I think it's probably most consistent with the existing text in SVG 1.1 that the position we assign to the second character of the ligature does not affect the position of that ligature, but instead gets assigned to the next glyph.
following your link:
"Within a ‘text’ element, text and font properties and the current text position can be adjusted with absolute or relative coordinate values by including a ‘tspan’ element."

the operative wording is "the current text position can be adjusted". the problem is that gecko is not using the tspan-assigned x/y values and, instead, is merging the text (the characters it thinks should be ligated). i have put up an amended testcase showing that whitespace between tspans does not affect what text is rendered (correctly) and that internal whitespace is not required for the tspan to properly work when gecko's pseudo ligature proc is not called (also correct).

gecko does not call the ligature proc if there is inter-element whitespace (incorrect since such whitespace must be ignored).  this is a non-spec inconsistency by gecko. once again, if gecko parsed the tspan element correctly none of this would happen. it would also be nice if gecko gave up this pseudo ligature rendering in its entirety and matched the performance of other browsers.

johann
Attachment #8449820 - Attachment is obsolete: true
(In reply to johann from comment #13)
> following your link:
> the operative wording is "the current text position can be adjusted". the
> problem is that gecko is not using the tspan-assigned x/y values and,
> instead, is merging the text (the characters it thinks should be ligated). i
> have put up an amended testcase showing that whitespace between tspans does
> not affect what text is rendered (correctly) and that internal whitespace is
> not required for the tspan to properly work when gecko's pseudo ligature
> proc is not called (also correct).

I think the merging of the text -- considering it a single word -- is correct.

> gecko does not call the ligature proc if there is inter-element whitespace
> (incorrect since such whitespace must be ignored).  this is a non-spec
> inconsistency by gecko. once again, if gecko parsed the tspan element
> correctly none of this would happen. it would also be nice if gecko gave up
> this pseudo ligature rendering in its entirety and matched the performance
> of other browsers.

Your example should include white space between the <tspan> elements that have the "t" and the "z", as that's where the ligature is being formed.

I did overlook something from the spec, though:

  * As mentioned above, ligature formation should not be enabled for the glyphs corresponding to
    characters within different text chunks.
  * Ligature formation should not be enabled for the glyphs corresponding to characters within
    different DOM text nodes; thus, characters separated by markup should not use ligatures.
    -- http://www.w3.org/TR/SVG11/text.html#TextLayoutIntroduction

For the first point, the behaviour of different browsers around text layout features that cross text chunk boundaries has traditionally been inconsistent.  Particularly regarding bidirectional text, but also whether characters that make up ligatures fall in one text chunk or another.  The model for text layout we have been wanting to move to in SVG 2 is effectively to lay out the text as CSS/HTML would, with <span>s instead of <tspan>s, and then to apply the positioning attributes on top of the result of that.

For the second point, that is not how Gecko works with HTML/CSS text layout, at least with two plain <span>s next to each other.

We can make some similar test cases in HTML to compare:

  <!DOCTYPE html>
  <style>
  body { font: 48px Palatino Linotype; }
  #a { color: blue; }
  #b { color: red; position: relative; top: 50px; left: 50px; }
  </style>
  <span id=a>at</span><span id=b>zooms</span>

Here the ligature is still formed, but the rendering is split up by the relative positioning of the second <span>.  If we change it to "position: absolute", then the ligature is not formed.  Maybe that means we should try to apply the "ligatures don't form in different chunks" rule here to parallel the behaviour of the "position: absolute" case in HTML.

In either case, relative or absolute, you can see that the browser considers it to be a single word by double clicking on either part and noticing that both parts are selected.  With white space between the <span> elements, they would be separate words.
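
For comparison, a sketch of the "position: absolute" variant described above. It is the same test case with only the position value on #b changed; the rendering noted in the comment is as described in this thread and may differ in other builds.

```html
<!DOCTYPE html>
<style>
body { font: 48px Palatino Linotype; }
#a { color: blue; }
/* Only change from the relative case: absolute positioning.
   Per this thread, the "tz" ligature is then not formed. */
#b { color: red; position: absolute; top: 50px; left: 50px; }
</style>
<span id=a>at</span><span id=b>zooms</span>
```

Note that with absolute positioning, top and left are measured from the containing block rather than from the span's normal position, so the second span lands in a different place than in the relative case.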

Jonathan, where is the code that determines whether to form ligatures across separate text frames?

I wonder if there should be a difference between discretionary and mandatory ligatures, too.
the reason i put up the amended testcase was to demonstrate that inter-element spacing is not important unless you wish to defeat gecko's pseudo ligature business when it comes into play.

it also demonstrates that gecko is not producing the correct glyphs and is mis-painting its created "ligatures". throughout this discussion i really should have been referring to gecko's action as a "kerning" proc, since that is really what it is doing. once it has done this, the font-rendering gets messed up and the attributes are not applied appropriately.

i guess that the rendering stage puts out a couple of font-point ranges and the attributes get applied in such a way that the boundaries of the original glyphs are ignored. if you look at the painting of the 'tz' you will see that the 'z' is partially over-written by the 't' attributes far in excess of the advance of the 't'. just guessing, but the RSB of the 't' and the LSB of the 'z' (52 points, in toto) might be being added to the attribute range of the 't' (which should be 688 minus whatever "robbing" gecko does) and causing the over-shoot. the most that gecko can get from the 't' is 16 from the RSB and this would make the advance of the 't' only 672. obviously, gecko thinks the attributes should be applied to a much wider range of font-points.

the above situation also can result if the kerning causes characters to overlap as in the 'tz' case. if kerning is applied (gecko seems to kick in at font-size > 19), then the attributes should be applied strictly character-by-character instead of per font-point range. that is apparently not the case here.

i really think gecko would hit the "5 9's" in acceptability by just getting the tspan parsing done correctly.
(In reply to johann from comment #16)
> the reason i put up the amended testcase was to demonstrate that
> inter-element spacing is not important unless you wish to defeat gecko's
> pseudo ligature business when it comes into play.

But it's not only that.  Try copying and pasting the text which doesn't have the space between the <tspan>s.  You'll get the words at the end of each line connected to the words at the start of the next line without a space between them.  I think semantically you should have the space in there, regardless of the ligature issue.

> it also demonstrates that gecko is not producing the correct glyphs and is
> mis-painting its created "ligatures". throughout this discussion i really
> should have been referring to gecko's action as a "kerning" proc, since that
> is really what it is doing. once it has done this, the font-rendering gets
> messed up and the attributes are not applied appropriately.

I don't think it's related to kerning.  If by the "mis-painting" you mean that we allow painting half of a ligature with one style and half with another, then it's a "feature" that Gecko supports not only for SVG text but for HTML text too.  If you mean the creation of the ligature (and therefore its positioning as a whole glyph on the preceding line) despite having absolute positioning values on the <tspan>, then I'm willing to say that's a problem and we can find a solution to that.

> i guess that the rendering stage puts out a couple of font-point ranges and
> the attributes get applied in such a way that the boundaries of the original
> glyphs are ignored. if you look at the painting of the 'tz' you will see
> that the 'z' is partially over-written by the 't' attributes far in excess
> of the advance of the 't'. just guessing, but the RSB of the 't' and the LSB
> of the 'z' (52 points, in toto) might be being added to the attribute range
> of the 't' (which should be 688 minus whatever "robbing" gecko does) and
> causing the over-shoot. the most that gecko can get from the 't' is 16 from
> the RSB and this would make the advance of the 't' only 672. obviously,
> gecko thinks the attributes should be applied to a much wider range of
> font-points.

I think that might just be the "paint a ligature glyph in two halves, regardless of whether it makes sense to split it 50%" behaviour, rather than anything to do with the positioning of separate t and z glyphs.

> i really think gecko would hit the "5 9's" in acceptability by just getting
> the tspan parsing done correctly.

Depends what you mean by parsing of <tspan>s, but the issue isn't with how the document is parsed but how we take the parsed SVG elements and pass it off to the text layout functionality underneath (which is responsible for positioning glyphs, creating ligatures, etc.).
(In reply to Cameron McCormack (:heycam) from comment #15)
> Jonathan, where is the code that determines whether to form ligatures across
> separate text frames?

Basically, we'll form ligatures (or do other "shaping" things) if a single textrun is used for the content across the frame boundary; if we create separate textruns, then no such effects will happen.

I'm not sure offhand exactly what triggers the difference between position:relative and position:absolute with a <span> here. (Or even whether it's correct that we have such a difference.) I guess I'd start by looking at nsIFrame::CanContinueTextRun(), and perhaps BuildTextRunsScanner::ContinueTextRunAcrossFrames(), to see what they're checking.

> I wonder if there should be a difference between discretionary and mandatory
> ligatures, too.

I don't think so. Ligatures are only one of many potential glyph-layout effects that could be happening, and the same questions about whether or not it makes sense to apply them across element boundaries will apply.
(In reply to johann from comment #10)

> ligatures commonly are located at fb00 and palatino linotype has seven (7)
> such glyphs and none of them are 'tz' and there is no 'tz' glyph in the
> entire typeface. what you are telling me is that gecko is actually
> "creating" ligatures by compacting character spacing when the typeface is
> not supporting such. also, it is messing up the attributes on the letters
> that the font says are separate when they bridge tspans.

(In reply to johann from comment #16)

> it also demonstrates that gecko is not producing the correct glyphs and is
> mis-painting its created "ligatures". throughout this discussion i really
> should have been referring to gecko's action as a "kerning" proc, since that
> is really what it is doing. once it has done this, the font-rendering gets
> messed up and the attributes are not applied appropriately.

I think we need to clear up this misunderstanding. What you are seeing with "tz" (and other pairs such as "tt") IS a ligature glyph in Palatino Linotype; it is NOT some kind of kerning or "created ligature" that Gecko is doing.

The Unicode *characters* at U+FB00 and following are "presentation forms" that are encoded for compatibility with legacy character sets that provided them as individual codepoints. But Palatino Linotype also contains a substantial number of true *ligature glyphs* that do not have individual Unicode *character codes* but are used by OpenType layout features that map multiple *characters* in the text to a single ligature *glyph*.

Many of these are part of the "discretionary ligatures" ('dlig') or "historical ligatures" ('hlig') feature, and so will not be used unless those features are explicitly applied. Some, however, including "fb", "ffb", "fh", "ffh", "fk", "ffk", "fj", "tt", and "tz" are part of the "common ligatures" ('liga') feature, along with "fi", "fl", etc., and these are applied by default.

You can see ALL the glyphs in the font - not only those present as encoded characters in the Unicode repertoire - if you open it with an editor such as FontForge. Then you can verify that the "tz" in your example is in fact a single ligature glyph.

(Personally, I think it's a mistake for Palatino Linotype to include the "tight" tz ligature in the 'liga' feature; it would have been more appropriate for this to be a discretionary ligature, like the similar "Th", "ch", and "ck" forms. Nevertheless, this IS what's in the font, and Gecko is merely following the font's instructions.)
1.) inter-element spacing: http://www.w3.org/TR/2004/REC-xml11-20040204/#sec-white-space
default xml parsing essentially discards this as "insignificant" whitespace unless it is something that equates to line-ending code(s). this can be directly specified with 'xml:space="default"' or overridden by 'xml:space="preserve"'. thus, gecko should not be interpreting this spacing to have anything to do with the actual element content since, semantically, xml considers this whitespace as "insignificant" and not relative to element content.
2.) i modified the testcase somewhat with the addition of adjoining tspans ending, first, with 't' and, second, starting with 't'. i cannot imagine that anyone believes that the line marked '4.)' is the way xml was meant to be laid out.
3.) gecko is not creating ligatures in that the character combinations do not become a single glyph. this is why i said that 'kerning' is what is actually happening in gecko's pseudo ligature proc.
4.) as i have noted and comments 15/17/18 seem to concur, gecko's character manipulation should only occur within a single run of text (if at all). obviously, the separating of text with tspans is a creation of separate text runs and gecko's moving of a character from one tspan to another is not according to spec.
5.) the only way text attributes can change is to create another tspan. if item 4, supra, were adhered to there would be no problems of mis-applying the attributes across a couple (or more) characters as exhibited in the 'tz' combo from different tspans.
6.) i am not certain that i see any real value in gecko overriding a typeface's designed character spacing and can definitely see that this extra processing is not really offering even a perceptible difference or much in the way of enhanced legibility.

johann
seems we were submitting simultaneously.

well, i must not be comprehending why -- if 'tz' is a true ligature -- the individual characters are selectable and not the ligature as a single glyph as is normally true for unicode/historical ligatures.

what ever that case may be, am i incorrect in thinking that all this 'liga' processing problem can be corrected by merely treating the tspans as separate text runs as they were intended in the spec and rendered by other browsers?

johann
Attachment #8449850 - Attachment is obsolete: true
(In reply to johann from comment #21)
> seems we were submitting simultaneously.
> 
> well, i must not be comprehending why -- if 'tz' is a true ligature -- the
> individual characters are selectable and not the ligature as a single glyph
> as is normally true for unicode/historical ligatures.

U+FB00, etc., are not what this is about. Graphically, those are ligatures, but from an implementation point of view they are single, indivisible characters.

"True" ligatures as implemented in OpenType fonts, like PL's "tz" etc., are not encoded as Unicode characters, or present explicitly in the source text; they're formed by glyph replacement (as specified in the font's GSUB table) during rendering. As such, the document still contains the two individual characters "t" and "z", which can be individually selected, and Gecko will conceptually divide the area of the single ligature glyph in two halves for highlighting purposes.

If you look closely at a word like "office" in Palatino Linotype (use a large font size), you can see that it uses an "ffi" ligature glyph for the three characters "ffi":

  data:text/html,<div style="font:60px Palatino Linotype">office

(Compare the ligature-free version:

  data:text/html,<div style="font:60px Palatino Linotype;
                 -moz-font-feature-settings:'liga' 0">office

to see more clearly that there *is* a ligature in the first example.)

Yet you can still select this in three separate parts. If you were to enter the word as "oﬃce" in the document, using the Unicode presentation-form codepoint (U+FB03), then the "ffi" will be selectable only as a unit:

  data:text/html,<div style="font:60px Palatino Linotype">o&%23xfb03;ce

(General rule: don't use those codepoints.)

> 
> what ever that case may be, am i incorrect in thinking that all this 'liga'
> processing problem can be corrected by merely treating the tspans as
> separate text runs as they were intended in the spec and rendered by other
> browsers?

Yes, that would fix this problem. But it would break other examples, where applying glyph-shaping effects (such as ligatures and kerning) defined in the font across tspan boundaries is the desired result.

Consider a case like:

  data:text/xml,<svg xmlns="http://www.w3.org/2000/svg">
    <style>text{font: 60px 'Trebuchet MS'}</style>
    <text x="10" y="50"><tspan fill="%23f00">A</tspan><tspan>WAY</tspan></text></svg>

where separate tspans are used to color parts of the text. If we always break text runs at each tspan boundary, we'll break the font's desired kerning at these places.
yes, i understand the GSUB table ref; but i did not understand that those ligatures were not rendered as one. thank you for your patience.

as to the tspan issue, the only thing i see is that tspans with different x/y values (and, obviously, dx/dy values) are not a single text run. those tspans that are governed by the tspan accumulation of positioning are not the problem here, only those with differing positions. it seems that the spec is clear on this.

johann

Reproduced on Firefox 29, Win 10 64-bit.
WFM on 94.0.2 and latest 97.0a1 using the attached test cases, so closing.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME

This was done in bug 1696792.

Resolution: WORKSFORME → DUPLICATE