Bug 534962 - (mathml-linebreaking) [MathML3] Improve MathML linebreaking
Status: NEW
Product: Core
Classification: Components
Component: MathML
Version: Trunk
Platform: All All
Importance: P4 normal with 11 votes
Target Milestone: ---
Assigned To: Nobody; OK to take it and work on it
: Anthony Jones (:kentuckyfriedtakahe, :k17e)
Duplicates: mspace-linebreak 508120
Depends on:
Blocks: line-breaking stretch-mtd-math mathml-3 mathml-in-mathjax 958947
Reported: 2009-12-15 13:05 PST by Frédéric Wang (:fredw)
Modified: 2017-01-14 06:57 PST
CC List: 12 users
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Description Frédéric Wang (:fredw) 2009-12-15 13:05:30 PST
Linebreaking issues have been reported both inside formulae (bug 380266 or bug 508120) and between formulae and the surrounding text (bug 209693 and bug 422619). (Note: there are certainly duplicate reports among these bugs.) MathML 3 introduces mechanisms for controlling linebreaking and also clarifies the case of inline formulae:
"In particular, this applies to spacing and linebreaking: for instance, there should not be spaces or line breaks inserted between inline math and any immediately following punctuation."
Comment 1 Frédéric Wang (:fredw) 2012-03-05 14:01:14 PST
I just read §3.1.7 of the MathML REC. My first impressions are:

1) I think we can ignore linebreaking at <mspace> elements (i.e. resolve bug 380266 as WONTFIX) and only consider the <mo> case. Linebreaking at <mspace> is only there for backward compatibility.

2) I suspect that we should implement specific box models for the <math> element in display and inline mode. One possibility would be to lay out the formula as usual (i.e. on one line), then determine where we may insert linebreaks, and finally redraw the formula on several lines. Actually, it seems that we can do the first step when we initially determine the widths of elements (I don't remember our algorithm exactly).

3) IIUC, to line-wrap the children of an mtable cell, we consider the available width of each column. Given that our width computation is a bit incorrect in mtable cells, we may want to deal with this case later and concentrate on linebreaking in the mrow-like case to begin with.

4) The REC seems to suggest ignoring indentation attributes in inline mode. As a first stage, we can also ignore indentation attributes in display mode.
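
Point 2 describes a three-step approach. A minimal sketch of the last step, assuming the single-line positions from the first step are already known and the break positions have already been chosen (the `Box` type and `RelayoutOnLines` function are hypothetical, not Gecko code): each new line is rebased to start at x = 0 and shifted down by a fixed line height, while children keep their relative spacing.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One child as measured by the usual single-line layout (hypothetical type).
struct Box {
  double x = 0;      // horizontal offset on the single line
  double y = 0;      // vertical offset (0 on the single line)
  double width = 0;
};

// Redraw a single-line layout on several lines. `breaks` lists, in strictly
// increasing order, the indices of the children that start a new line.
// Children keep their relative spacing within a line; each line is rebased
// to start at x = 0 and moved down by `lineHeight`.
std::vector<Box> RelayoutOnLines(const std::vector<Box>& oneLine,
                                 const std::vector<std::size_t>& breaks,
                                 double lineHeight) {
  for (std::size_t k = 1; k < breaks.size(); ++k) {
    assert(breaks[k - 1] < breaks[k]);
  }
  std::vector<Box> result = oneLine;
  std::size_t line = 0;
  std::size_t nextBreak = 0;
  double lineStartX = oneLine.empty() ? 0.0 : oneLine[0].x;
  for (std::size_t i = 0; i < oneLine.size(); ++i) {
    if (nextBreak < breaks.size() && breaks[nextBreak] == i) {
      ++line;  // child i starts a new line
      ++nextBreak;
      lineStartX = oneLine[i].x;
    }
    result[i].x = oneLine[i].x - lineStartX;
    result[i].y = oneLine[i].y + static_cast<double>(line) * lineHeight;
  }
  return result;
}
```

A real implementation would need per-line ascent and descent rather than a fixed `lineHeight`; the fixed value just keeps the sketch short.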
Comment 2 Frédéric Wang (:fredw) 2012-03-08 05:41:30 PST
*** Bug 508120 has been marked as a duplicate of this bug. ***
Comment 3 Frédéric Wang (:fredw) 2012-11-11 13:07:20 PST
*** Bug 380266 has been marked as a duplicate of this bug. ***
Comment 4 Frédéric Wang (:fredw) 2012-11-30 08:33:25 PST
Mass change: setting priority to 2 for bugs that would be nice to fix if Gecko's MathML support is enabled by default in MathJax, but that are not, in my opinion, strictly required, or for which a workaround could be written in the MathJax code.
Comment 5 Frédéric Wang (:fredw) 2014-02-06 00:43:25 PST
I'm removing the keywords. This bug is likely to be hard for volunteers and will probably require some preliminary code refactoring, so I don't want this to distract newcomers.
Comment 6 James Kitchener (:jkitch) 2014-10-07 06:07:14 PDT
This blocks bug 236963, as a complete solution to that bug involves modifications to the existing line-breaking code.

Doing this bug first will avoid the need to fix the stretch/line-breaking code a second time.

(In reply to Frédéric Wang (:fredw) from comment #5)
> I'm removing the keywords. This bug is likely to be hard for volunteers and
> will probably require some preliminary code refactoring, so I don't want
> this to distract newcomers.

What refactoring did you think was needed?
Comment 7 Frédéric Wang (:fredw) 2014-10-07 09:41:50 PDT
(In reply to James Kitchener from comment #6)
> What refactoring did you think was needed?

I think I had in mind the bugs with embellished operators and stretching, as well as the bugs with intrinsic widths, but these are fixed now. I'm not sure whether bug 236963 should be fixed before, after or at the same time as this one. I also thought that the fact that the <math> and <mtd> frames behave differently could be an issue for stretching and intrinsic widths, so some refactoring would be needed. That said, I didn't analyze this in detail, so feel free to ignore my comment and see what we can do.
Comment 8 James Kitchener (:jkitch) 2015-01-29 04:41:12 PST
The current behaviour of nsLineLayout is to fit as much as possible on a line (possibly with a few exceptions).  I propose to add an additional check which detects MathML frames and triggers a new line earlier if the MathML frame has a linebreak presentation flag set.

At the start of each new line, the continuation frame (nsMathMLmathInlineFrame, nsMathMLmathBlockFrame or nsMathMLmrowFrame) will calculate the appropriate break point based upon the intrinsic (pref?) widths of the remaining child elements.  In the event the break point determining algorithm malfunctions and specifies too much to fit on a line, the line layout object will fall back to its existing canPlaceFrame() behaviour and place the newline when it can't fit anything more.  The linebreak choosing algorithm would start again from that newline (remembering to clear any remaining linebreak flags).
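
A rough sketch of this two-level scheme, using a hypothetical `Child` record in place of frames (none of these names are Gecko APIs). `ChooseBreak` uses only intrinsic (pref) widths to pick the last break opportunity that fits; the placement loop honours that choice but, if the actual widths turn out larger, breaks before the first child that no longer fits (in the spirit of the canPlaceFrame() fallback) and restarts the chooser from that point. For simplicity, a break opportunity is modelled as "a break is allowed after this child".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one child of the <math>/<mrow> being broken.
struct Child {
  double prefWidth;    // intrinsic (pref) width, used to choose breaks
  double actualWidth;  // width actually obtained when the child is placed
  bool breakAfter;     // a linebreak is allowed after this child
};

// Pick the end of the line that starts at `start` from intrinsic widths
// only: one past the last break opportunity that still fits. If the rest
// fits, or no opportunity fits, return children.size() and let the
// placement fallback deal with it.
std::size_t ChooseBreak(const std::vector<Child>& children, std::size_t start,
                        double available) {
  assert(start <= children.size());
  double used = 0;
  std::size_t best = children.size();
  for (std::size_t i = start; i < children.size(); ++i) {
    used += children[i].prefWidth;
    if (used > available) return best;
    if (children[i].breakAfter) best = i + 1;
  }
  return children.size();  // everything remaining fits on this line
}

// Returns the index of the first child on each line.
std::vector<std::size_t> LayoutLines(const std::vector<Child>& children,
                                     double available) {
  std::vector<std::size_t> lineStarts;
  std::size_t start = 0;
  while (start < children.size()) {
    lineStarts.push_back(start);
    // The chosen end plays the role of the "linebreak flag".
    std::size_t end = ChooseBreak(children, start, available);
    double used = 0;
    std::size_t i = start;
    for (; i < end; ++i) {
      // Fallback: stop before a child that does not actually fit, but
      // always place at least one child so every line makes progress.
      if (i > start && used + children[i].actualWidth > available) break;
      used += children[i].actualWidth;
    }
    start = i;  // restart the chooser from wherever this line ended
  }
  return lineStarts;
}
```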

Inline MathML:  At least for the first implementation, the regular nsInlineFrame will have a virtual method which nsMathMLmathInlineFrame will override to perform the linebreak calculation at the start of each inline frame (including continuations).  Eventually the inline MathML reflow process will be rewritten to support stretching (bug 236963), at which point this hook can be removed.  nsLineLayout will be modified to check if the frame being placed is a MathML frame and if so check the linebreaking flag.
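
The hook could look roughly like this. `InlineFrameSketch`, `MathInlineFrameSketch` and `BeforeLineReflow` are hypothetical stand-ins for nsInlineFrame, nsMathMLmathInlineFrame and the proposed virtual method, not existing Gecko code: the base implementation is a no-op, and the MathML override runs at the start of the frame and of every continuation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for nsInlineFrame (not the real class).
class InlineFrameSketch {
 public:
  virtual ~InlineFrameSketch() = default;

  // Reflow of the frame itself or of one of its continuations, starting
  // at child `firstChild`.
  void ReflowLine(std::size_t firstChild) {
    BeforeLineReflow(firstChild);  // hook runs before any child is placed
    // ... the existing inline reflow would continue here ...
  }

 protected:
  // The hook: does nothing for ordinary inline frames.
  virtual void BeforeLineReflow(std::size_t /*firstChild*/) {}
};

// Hypothetical stand-in for nsMathMLmathInlineFrame.
class MathInlineFrameSketch : public InlineFrameSketch {
 public:
  // Records where a linebreak calculation was triggered, for illustration.
  std::vector<std::size_t> linebreakCalculations;

 protected:
  // Override: this is where break points for the remaining children
  // would be computed and the linebreak flags set.
  void BeforeLineReflow(std::size_t firstChild) override {
    linebreakCalculations.push_back(firstChild);
  }
};
```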

<mrow> frames:  Like <span> elements these will now be continuable, but the logic to support this will be duplicated to avoid multiple inheritance issues (either that or they stop being nsMathMLContainerFrames).  The initial reflow of all child frames will be performed before the first linebreak, to allow stretch metrics to be determined.  The stretching, placement and finalising of reflow of a given line's elements will be performed by that line's <mrow> frame.  nsMathMLmrowFrames (and friends) that are descendants of <mroot>, <menclose>, <msup> etc. elements will not be considered for intra-frame line breaking at this time.

<mpadded>/<mphantom> need access to the <mrow>'s linebreaking algorithm.  Making them inherit from nsMathMLmrowFrame and putting the special reflow logic in there looks promising.

This ends the first pass, at which point try builds will be distributed (or possibly it gets landed preffed off) and people encouraged to try it.

Display/block MathML:
This will be implemented in the second phase. To support all of the indentation attributes, a new class overriding nsLineLayout is needed, along with appropriate changes to nsMathMLmathBlockFrame. The indentation attributes will be incorporated into the CSS engine as internal-only properties. A different linebreak determining algorithm will probably be needed.
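
As an illustration of what the indentation attributes involve, here is a simplified per-line resolver, assuming left-to-right layout and ignoring `auto`, `id` and `indenttarget`. The attribute names are the MathML 3 ones (indentalign, indentshift, indentalignfirst, indentshiftfirst, indentalignlast, indentshiftlast), where the first/last variants default to the general value; the types, and the rule that the shift is simply added to the aligned position, are assumptions of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

enum class Align { Left, Center, Right };

// Resolved indentation for one line.
struct Indent {
  Align align;
  double shift;
};

// Simplified model of the attributes. An empty optional models the default
// ("same as the general line").
struct IndentAttrs {
  Indent general{Align::Left, 0};              // indentalign / indentshift
  std::optional<Align> alignFirst, alignLast;  // indentalignfirst / ...last
  std::optional<double> shiftFirst, shiftLast; // indentshiftfirst / ...last
};

// First line uses the "first" values, the last line (if there is more than
// one) uses the "last" values, all other lines use the general values.
Indent IndentForLine(const IndentAttrs& a, std::size_t line,
                     std::size_t lineCount) {
  assert(line < lineCount);
  if (line == 0)
    return {a.alignFirst.value_or(a.general.align),
            a.shiftFirst.value_or(a.general.shift)};
  if (line + 1 == lineCount)
    return {a.alignLast.value_or(a.general.align),
            a.shiftLast.value_or(a.general.shift)};
  return a.general;
}

// Horizontal offset of a line of width `lineWidth` inside `boxWidth`
// (assumption: the shift is added after alignment in every case).
double LineOffset(const Indent& in, double lineWidth, double boxWidth) {
  double base = 0;
  switch (in.align) {
    case Align::Left: base = 0; break;
    case Align::Center: base = (boxWidth - lineWidth) / 2; break;
    case Align::Right: base = boxWidth - lineWidth; break;
  }
  return base + in.shift;
}
```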

I'm currently inclined to put the linebreak determining algorithm in nsMathMLmathInlineFrame/nsMathMLmathBlockFrame. <mrow> elements that are eligible for intra-frame breaking would be passed a pointer to the top-level mathInline/mathBlock frame during the InheritAutomaticData process.

Special case: 
* linebreakstyle="duplicate". The duplicate operator will probably be an <mfenced>-style fake one: non-selectable and non-scriptable. (If we could easily clone/invent operators and insert them into the frame tree in a way that is transparent to the user, why haven't we done this for <mfenced>?) Care needs to be taken to ensure that style changes to the original also affect the duplicate, and that the duplicate disappears whenever the linebreak position changes.
* linebreakmultchar - I'm still looking into this.  It is a similar problem to the FLAC font feature (bug 963079) in that font metrics suddenly change in the reflow process, but it may be easier if we can perform the substitution and invalidate the text metrics before the <mo> frame's reflow occurs.  The linebreak determining algorithm and intrinsic min/pref widths would need to account for the additional size of the replacement character.
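
A sketch of the intended output for these two special cases, assuming a row is a flat list of token strings (`SplitAtOperator` and `Tokens` are hypothetical): `before` moves the operator to the start of the next line, `after` keeps it at the end of the current line, and `duplicate` does both. When the operator is U+2062 INVISIBLE TIMES, the `linebreakmultchar` value is shown in its place. `infixlinebreakstyle` is not modelled, and a real implementation would also have to account for the width of the replacement character, as noted above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum class LineBreakStyle { Before, After, Duplicate };

using Tokens = std::vector<std::string>;

// U+2062 INVISIBLE TIMES, encoded as UTF-8.
const std::string kInvisibleTimes = "\xE2\x81\xA2";

// Split `row` at the operator at index `op` according to linebreakstyle.
// An invisible-times operator is rendered as `multChar` at the break
// (a simplified reading of linebreakmultchar).
std::pair<Tokens, Tokens> SplitAtOperator(const Tokens& row, std::size_t op,
                                          LineBreakStyle style,
                                          const std::string& multChar) {
  assert(op < row.size());
  const std::string shown = row[op] == kInvisibleTimes ? multChar : row[op];
  auto mid = row.begin() + static_cast<std::ptrdiff_t>(op);
  Tokens first(row.begin(), mid);
  Tokens second(mid + 1, row.end());
  if (style == LineBreakStyle::After || style == LineBreakStyle::Duplicate)
    first.push_back(shown);  // operator ends the current line
  if (style == LineBreakStyle::Before || style == LineBreakStyle::Duplicate)
    second.insert(second.begin(), shown);  // operator starts the next line
  return {first, second};
}
```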

Things not planned for the initial implementation:
* Intra-<mfenced> breaks - This involves duplicating the reflow code and adding complexity to the linebreak determining algorithms to account for frameless fences (however, part of the latter issue is needed for linebreakstyle="duplicate"). Is it worth the implementation and future maintenance costs?
* indentalign="id" etc. - I haven't looked into this, but it seems unpleasant.
* <mtd> linebreaking - They use a different block frame class, and the algorithm to determine the appropriate location for breaks will be different. I'd prefer to get feedback on the non-<mtd> algorithms first. There may also be complications related to font inflation.
Automated Tests:
* Varying the width of an iframe and testing that the linebreak always occurs on an operator or
* Testing that we never leave an empty line in the middle
* Testing that the linebreak attribute values (newline, nobreak, etc.) work
* Testing that linebreakstyle (before, after, duplicate) work.
* Testing that changing the size of things on a multi-line <mrow> affects all lines.
* Testing that changing CSS styles of a multi-line <mrow> affects all relevant lines.
* Tests that assume a particular linebreaking algorithm or its weightings will be avoided as unavoidably fragile.
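
Tests of this kind can check invariants that hold for any reasonable algorithm. A minimal sketch of such a check, assuming the test can read the rendered lines back as tokens (the `Token` type and `LinebreaksLookValid` function are hypothetical): no line is empty, and every break sits next to an operator, which either ends the previous line or starts the next one. A test would run it after each change of the iframe width.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One rendered token, as a test might read it back from the layout.
struct Token {
  std::string text;
  bool isOperator;  // rendered from an <mo>
};
using Line = std::vector<Token>;

// Algorithm-independent checks: no empty lines, and every break is
// adjacent to an operator.
bool LinebreaksLookValid(const std::vector<Line>& lines) {
  for (const Line& line : lines)
    if (line.empty()) return false;
  for (std::size_t i = 0; i + 1 < lines.size(); ++i) {
    bool operatorEndsLine = lines[i].back().isOperator;
    bool operatorStartsNext = lines[i + 1].front().isOperator;
    if (!operatorEndsLine && !operatorStartsNext) return false;
  }
  return true;
}
```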

This feature will pass through the intent to implement/intent to ship process and will be developed behind an about:config preference. The weightings/coefficients of the linebreak determining algorithms will also be preferences, to encourage experimentation to find improved values (these may be platform-specific). It will be a while before this gets fully implemented and the preference gets enabled by default. This is a major change, and from past experience it may take over a month before we find out we have broken things.
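
One way to expose such weightings is a cost-based break chooser. This sketch swaps in a Knuth-Plass-style dynamic program (minimum total cost over all break choices), which is not necessarily the algorithm intended here; `BreakWeights`, its fields and its default values are made up for illustration, standing in for the proposed preferences. A line costs a weighted square of its unused width plus a weighted penalty for breaking at a nested operator; the last line is free.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical tunable weights (stand-ins for about:config preferences).
struct BreakWeights {
  double slack = 1.0;   // penalty per squared unit of unused line width
  double depth = 50.0;  // penalty per nesting level of the break operator
};

struct Item {
  double width;
  bool breakAfter;  // a break is allowed after this item
  int depth;        // nesting depth of the operator (0 = top-level mrow)
};

// best[j] = cheapest way to set items [0, j) with a line ending after j-1.
// Returns the index of the first item on each line. If no feasible set of
// breaks exists, this degrades to a single line (a fallback must cope).
std::vector<std::size_t> ChooseBreaks(const std::vector<Item>& items,
                                      double available, const BreakWeights& w) {
  const double inf = std::numeric_limits<double>::infinity();
  const std::size_t n = items.size();
  std::vector<double> best(n + 1, inf);
  std::vector<std::size_t> from(n + 1, 0);
  best[0] = 0;
  for (std::size_t j = 1; j <= n; ++j) {
    // A line may only end at a break opportunity or at the very end.
    if (j < n && !items[j - 1].breakAfter) continue;
    double width = 0;
    for (std::size_t i = j; i-- > 0;) {  // candidate line covers [i, j)
      width += items[i].width;
      if (width > available && i + 1 < j) break;  // overfull (1 item is ok)
      if (best[i] == inf) continue;
      double unused = available - width;
      double cost = j == n ? 0 : w.slack * unused * unused +
                                     w.depth * items[j - 1].depth;
      if (best[i] + cost < best[j]) {
        best[j] = best[i] + cost;
        from[j] = i;
      }
    }
  }
  std::vector<std::size_t> starts;
  for (std::size_t j = n; j > 0; j = from[j])
    starts.insert(starts.begin(), from[j]);
  if (starts.empty()) starts.push_back(0);
  return starts;
}
```

For a row `8 = 4 + 4` (widths 8, 2, 4, 2, 4, with the `+` nested one level deep) and an available width of 16, the default weights break after the top-level `=`, even though breaking after the nested `+` would fill the first line exactly; with `depth = 0` the fuller line wins.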

Currently the intention is for this to be an opt-in feature, by setting overflow="linebreak" in the <math> element.  Deciding to make this the default, instead of the present "dumb" linebreaking behaviour will be left to a future bug.
Comment 9 James Kitchener (:jkitch) 2015-02-06 17:20:47 PST
To have linebreaks occur within mrow frames, the reflow logic needs to be similar to that of nsInlineFrame. Unfortunately, nsInlineFrame and nsMathMLContainerFrame (which nsMathMLmrowFrame inherits from) both inherit from nsContainerFrame, so I can't make nsMathMLmrowFrame inherit from nsInlineFrame as well.

I have thought of three possible solutions.

A.  Duplicate the code.  Simplest solution, but the two instances will invariably diverge as I expect changes to nsInlineFrame won't be propagated to the methods living in the MathML folder.

B. Stop inheriting from nsMathMLContainerFrame for nsMathMLmrowFrame - I expect this is going to cause major problems involving duplicating any required functionality and having to check for both frames with do_QueryFrame. 

C. Create a new nsInlineFrameReflow class which nsInlineFrame and nsMathMLmrowFrame inherit from and copy most of the reflow logic into that.  Unfortunately the affected reflow methods call protected methods, so nsInlineFrameReflow will need to be a friend class for nsInlineFrame and nsMathMLmrowFrame.  

Provided the use of friend can be stomached, and the logic gets moved to the new class in a history-preserving way, C looks to me like the best option.

Which approach do you want, or is there something else that I have not considered?
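
To make option C concrete, a toy version of the hierarchy it implies might look like this. All class names are simplified, hypothetical stand-ins, and the "reflow" is just a placement loop: the shared logic lives in a mixin, `InlineFrameReflow`, that both frame classes inherit from next to their existing base, and each frame declares the mixin a friend so the shared code can call its protected `PlaceChild()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ContainerFrame {  // stand-in for nsContainerFrame
 public:
  virtual ~ContainerFrame() = default;
};

class MathMLContainerFrame : public ContainerFrame {};

// Option C: shared reflow logic in a mixin inherited by both frame classes.
class InlineFrameReflow {
 protected:
  // Places children until `available` is used up (always at least one) and
  // returns how many fit. Calls the frame's protected PlaceChild(), which
  // is why the frames below declare this class a friend.
  template <typename Frame>
  static std::size_t ReflowChildren(Frame& frame, double available) {
    std::size_t placed = 0;
    double used = 0;
    for (double w : frame.mChildWidths) {
      if (placed > 0 && used + w > available) break;
      frame.PlaceChild(used);
      used += w;
      ++placed;
    }
    return placed;
  }
};

class InlineFrame : public ContainerFrame, public InlineFrameReflow {
  friend class InlineFrameReflow;

 public:
  std::size_t Reflow(double available) { return ReflowChildren(*this, available); }
  std::vector<double> mChildWidths;
  std::vector<double> mPositions;

 protected:
  void PlaceChild(double x) { mPositions.push_back(x); }
};

class MathMLmrowFrame : public MathMLContainerFrame, public InlineFrameReflow {
  friend class InlineFrameReflow;

 public:
  std::size_t Reflow(double available) { return ReflowChildren(*this, available); }
  std::vector<double> mChildWidths;
  std::vector<double> mPositions;

 protected:
  void PlaceChild(double x) { mPositions.push_back(x); }
};
```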
Comment 10 Robert O'Callahan (:roc) (email my personal email if necessary) 2015-02-16 19:11:22 PST
nsInlineFrame changes rarely. I actually think that duplicating the code might be the best approach. Or rather, factor out the logic of nsInlineFrame into a helper class, or helper functions, and call that logic from both classes. I think if possible we should avoid adding an abstract parent class or otherwise changing the inheritance hierarchy.

Actually, I'm not sure what you're proposing. Could you show exactly what the hierarchy would look like in option C?
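
For comparison, a sketch of the helper-based alternative suggested here, again with simplified, hypothetical names: the shared logic is a helper the frames call rather than a base they inherit from, so the inheritance hierarchy is unchanged. Instead of friend declarations, each frame passes a callback for the step that touches its own protected state.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical helper holding the shared inline reflow logic. Frames call
// it; they do not inherit from it.
struct InlineReflowHelper {
  using PlaceFn = std::function<void(std::size_t index, double x)>;

  // Places children left to right until `available` is used up (always at
  // least one) and returns how many were placed.
  static std::size_t ReflowChildren(const std::vector<double>& childWidths,
                                    double available, const PlaceFn& place) {
    std::size_t placed = 0;
    double used = 0;
    for (double w : childWidths) {
      if (placed > 0 && used + w > available) break;
      place(placed, used);
      used += w;
      ++placed;
    }
    return placed;
  }
};

// Simplified stand-in for the common base; the hierarchy stays as it is.
class ContainerFrame {
 public:
  virtual ~ContainerFrame() = default;
  std::vector<double> mChildWidths;
  std::vector<double> mPositions;

 protected:
  void PlaceChild(double x) { mPositions.push_back(x); }
};

class InlineFrame : public ContainerFrame {
 public:
  std::size_t Reflow(double available) {
    return InlineReflowHelper::ReflowChildren(
        mChildWidths, available, [this](std::size_t, double x) { PlaceChild(x); });
  }
};

// In Gecko this would still derive from nsMathMLContainerFrame.
class MathMLmrowFrame : public ContainerFrame {
 public:
  std::size_t Reflow(double available) {
    return InlineReflowHelper::ReflowChildren(
        mChildWidths, available, [this](std::size_t, double x) { PlaceChild(x); });
  }
};
```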
Comment 11 Bruce Miller 2015-09-22 14:09:03 PDT
While I'd love full, automatic, line breaking in MathML, I wonder whether a good first step might be simply to implement linebreak="newline" (along with a few indentXXX attributes) --- at least as long as it doesn't inhibit eventual future work.

Having the ability to pre-generate linebreaks offline without resorting to mtable would open up great possibilities.
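
A sketch of how small the forced-break-only case could be, assuming a flat row in which each piece carries its own `linebreak` value (the `Piece` type and `SplitAtForcedBreaks` are hypothetical, and nesting, spacing and indentation are ignored): every operator with `linebreak="newline"` starts a new line, and nothing else breaks. Marking each top-level `=` this way would give, for instance, one line per step of an equation chain.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified piece of a row: an operand or an <mo> with its linebreak value.
// Only linebreak="newline" is honoured; there is no automatic breaking.
struct Piece {
  std::string text;
  bool forcedNewline;  // linebreak="newline"
};

// Split the row at every forced newline. The operator starts the new line
// (as with linebreakstyle="before"); an empty line is never created.
std::vector<std::string> SplitAtForcedBreaks(const std::vector<Piece>& row) {
  std::vector<std::string> lines(1);
  for (const Piece& piece : row) {
    if (piece.forcedNewline && !lines.back().empty()) lines.emplace_back();
    lines.back() += piece.text;
  }
  return lines;
}
```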
Comment 12 Lukasz 2017-01-14 06:57:06 PST
Hi, I'm trying to prepare accessible lecture notes in mathematics for visually impaired students, and the lack of line breaking seems to me to be a major obstacle. Consider a typical LaTeX equation like:
    x^2+y^2+4x-2y+3 = x^2+4x+y^2-2y+3 = (x+2)^2-4+(y-1)^2-1+3 = (x+2)^2+(y-1)^2-2 =
Now the problem is: for a user of the screen reader the above should be represented as a single equation. So preferably the above should be within a single math object:
</mo><mrow><mrow><mrow><msup><mi>x</mi><mn>2</mn> ...
But here's the problem: the notes might be read by someone with very low vision who wants to see a zoomed-in equation in addition to the sound/Braille output. This is the actual use case of the lecture notes I prepare: the zoom level is roughly equivalent to font size 160, so typically fewer than 8 symbols are visible on a laptop screen. Here it would be useful to have line breaks at the equality signs.

Right now, the only way I know to get linebreaks is to add various additional HTML elements and split the equation using them. This confuses the screen reader, and the maths that is read out is much less clear.
