Open Bug 534962 (mathml-linebreaking) Opened 11 years ago Updated 10 months ago
[MathML3] Improve MathML linebreaking
http://www.w3.org/TR/MathML3/chapter3.html#presm.linebreaking
Linebreaking issues have been reported either inside formulae (bug 380266 or bug 508120) or with the surrounding text (bug 209693 and bug 422619). (Note: there are certainly duplicate reports among these bugs.) MathML 3 introduces mechanisms for controlling linebreaking and also clarifies the case of inline formulae.
http://www.w3.org/TR/MathML3/chapter2.html#interf.toplevel.atts
"In particular, this applies to spacing and linebreaking: for instance, there should not be spaces or line breaks inserted between inline math and any immediately following punctuation."
I just read §3.1.7 of the MathML REC. My first impressions are:
1) I think we can ignore linebreaking at <mspace> elements, which is only there for backward compatibility (i.e. resolve bug 380266 as WONTFIX), and only consider the <mo> case.
2) I suspect that we should implement specific box models for the <math> element in display and inline mode. A possibility could be to determine the rendering of a formula as usual (i.e. on one line), then to determine where we may insert linebreaks, and finally to redraw the formula on several lines. Actually, it seems that we can do the first step when we initially determine the widths of elements (I don't remember our algorithm exactly).
3) IIUC, to line wrap the children of an mtable cell, we consider the available width for each column. Given that our width computation is a bit incorrect in mtable cells, we may want to deal with this case later and concentrate on linebreaking in the mrow-like case to begin with.
4) The REC seems to suggest ignoring indentation attributes in inline mode. As a first stage, we can also ignore indentation attributes in display mode.
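Points 1 and 2 above can be illustrated with a minimal sketch. It assumes the formula has already been laid out on one line and flattened into a list of measured items; the `Item` struct, the `Kind` enum and `BreakOpportunities` are hypothetical names for illustration only and are not Gecko types.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical flattened view of an mrow-like child list (not a Gecko type).
enum class Kind { Mo, Mspace, Other };

struct Item {
  Kind kind;
  std::string text;
  int width;  // width measured during the initial one-line pass
};

// Returns the indices of the items before which a linebreak may be inserted.
// Only <mo> items are considered; <mspace> is deliberately ignored (point 1).
std::vector<std::size_t> BreakOpportunities(const std::vector<Item>& items) {
  std::vector<std::size_t> result;
  // Start at 1: breaking before the very first item would leave an empty line.
  for (std::size_t i = 1; i < items.size(); ++i) {
    if (items[i].kind == Kind::Mo) {
      result.push_back(i);
    }
  }
  return result;
}
```

The third step (redrawing on several lines) would then pick a subset of these indices according to the available width.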
No longer depends on: 508120
Duplicate of this bug: 508120
Mass change: setting priority to 2 for bugs that would be nice to fix if Gecko's MathML support is enabled by default in MathJax, but that are not, in my opinion, strictly required, or for which a workaround could be written in the MathJax code.
Priority: -- → P2
I'm removing the keywords. This bug is likely to be hard for volunteers and will probably require some preliminary code refactoring, so I don't want this to distract newcomers.
Blocking bug 236963, as a complete solution to that involves modifications to the existing line-breaking code. Doing this bug first will avoid the need to fix the stretch/line-breaking code a second time.
(In reply to Frédéric Wang (:fredw) from comment #5)
> I'm removing the keywords. This bug is likely to be hard for volunteers and
> will probably require some preliminary code refactoring, so I don't want
> this to distract newcomers.
What refactoring did you think was needed?
(In reply to James Kitchener from comment #6)
> What refactoring did you think was needed?
I think I had in mind the bugs with embellished operators & stretching, as well as the bugs with intrinsic widths, but these are fixed now. I'm not clear whether bug 236963 should be fixed before, after or at the same time as this one, but I also thought that the fact that the <math> and <mtd> frames behave differently could be an issue for stretching & intrinsic widths, and so some refactoring would be needed. I didn't analyze that in detail, though, so feel free to ignore my comment and see what we can do.
The current behaviour of nsLineLayout is to fit as much as possible on a line (possibly with a few exceptions). I propose to add an additional check which detects MathML frames and triggers a new line earlier if the MathML frame has a linebreak presentation flag set.
At the start of each new line, the continuation frame (nsMathMLmathInlineFrame, nsMathMLmathBlockFrame or nsMathMLmrowFrame) will calculate the appropriate break point based upon the intrinsic (pref?) widths of the remaining child elements. In the event the break point determining algorithm malfunctions and specifies too much to fit on a line, the line layout object will fall back to its existing canPlaceFrame() behaviour and place the newline when it can't fit anything more. The linebreak choosing algorithm would start again from that newline (remembering to clear any remaining linebreak flags).
Inline MathML:
At least for the first implementation, the regular nsInlineFrame will have a virtual method which nsMathMLmathInlineFrame will override to perform the linebreak calculation at the start of each inline frame (including continuations). Eventually the inline MathML reflow process will be rewritten to support stretching (bug 236963), at which point this hook can be removed. nsLineLayout will be modified to check if the frame being placed is a MathML frame and, if so, check the linebreaking flag.
<mrow> frames:
Like <span> elements, these will now be continuable, but the logic to support this will be duplicated to avoid multiple inheritance issues (either that or they stop being nsMathMLContainerFrames). The initial reflow of all child frames will be performed before the first linebreak, to allow stretch metrics to be determined. The stretching, placement and finalising of reflow of a given line's elements will be performed by that line's <mrow> frame. nsMathMLmrowFrames (and friends) that are descendants of <mroot>, <menclose>, <msup> etc. elements will not be considered for intra-frame line breaking at this time. <mpadded>/<mphantom> need access to the <mrow>'s linebreaking algorithm. Making them inherit from nsMathMLmrowFrame and putting the special reflow logic in there looks promising.
This ends the first pass, at which point try builds will be distributed (or possibly it gets landed preffed off) and people encouraged to try it.
Display/block MathML:
This will be implemented in the second phase. To support all of the indentation attributes, a new class overriding nsLineLayout is needed, with appropriate changes made to nsMathMLmathBlockFrame. The indentation attributes will be incorporated into the CSS engine as internal-only properties. A different linebreak determining algorithm will probably be needed.
I'm currently inclined to put the linebreak determining algorithm in nsMathMLmathInlineFrame/nsMathMLmathBlockFrame. <mrow> elements that are eligible for intra-frame breaking would be passed a pointer to the top level mathInline/mathBlock frame during the InheritAutomaticData process.
Special cases:
* linebreakstyle="duplicate" - The duplicate operator will probably be a <mfence> style fake one: non-selectable and non-scriptable. (If we could easily clone/invent operators and insert them into the frame tree in a way that is transparent to the user, why haven't we done this for <mfence>?) Care needs to be taken to ensure that style changes to the original will affect the duplicate and that it disappears whenever the linebreak position changes.
* linebreakmultchar - I'm still looking into this. It is a similar problem to the FLAC font feature (bug 963079) in that font metrics suddenly change in the reflow process, but it may be easier if we can perform the substitution and invalidate the text metrics before the <mo> frame's reflow occurs. The linebreak determining algorithm and intrinsic min/pref widths would need to account for the additional size of the replacement character.
Things not planned for the initial implementation:
* Intra-<mfence> breaks - This involves duplicating the reflow code and additional complexity in the linebreak determination algorithms to account for frameless fences (however, part of the latter issue is needed for linebreakstyle="duplicate"). Worth the implementation and future maintenance costs?
* indentalign="id" etc. - I haven't looked into this but it seems unpleasant.
* <mtd> linebreaking - They use a different block frame class and the algorithm to determine the appropriate location for breaks will be different. I'd prefer to get feedback for the non-<mtd> algorithms first. There may also be font inflation related complications.
Automated Tests:
* Varying the width of an iframe and testing that the linebreak always occurs on an operator or
* Testing that we never leave an empty line in the middle.
* Testing that the linebreak values (newline, nobreak, etc.) work.
* Testing that the linebreakstyle values (before, after, duplicate) work.
* Testing that changing the size of things on a multi-line <mrow> affects all lines.
* Testing that changing CSS styles of a multi-line <mrow> affects all relevant lines.
* Tests that assume a particular linebreaking algorithm or its weightings will be avoided as unavoidably fragile.
This feature will pass through the intent to implement/intent to ship process and will be developed behind an about:config preference. The weightings/coefficients to the linebreak determining algorithms will also be preferences, to encourage experimentation to find improved values (these may be platform specific). It will be a while before this gets fully implemented and the preference gets enabled by default. (This is a major change, and from past experience it may take over a month before we find out we have broken things.)
Currently the intention is for this to be an opt-in feature, enabled by setting overflow="linebreak" in the <math> element. Deciding whether to make this the default, instead of the present "dumb" linebreaking behaviour, will be left to a future bug.
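The break selection described in this comment (fill the line, break early at a flagged operator, fall back to "break where nothing more fits" when no operator is available, then restart from the new line) could be sketched roughly as follows. This is a standalone illustration, not Gecko code: `Child`, `Linebreak` and `LineStarts` are hypothetical names, breaks are always placed before the operator, and widths are plain integers standing in for intrinsic pref widths.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Subset of the MathML linebreak attribute values used by this sketch.
enum class Linebreak { Auto, Newline, NoBreak };

// Hypothetical stand-in for a child frame of an mrow-like container.
struct Child {
  int prefWidth;
  bool isOperator;
  Linebreak linebreak;
};

// Returns the index of the first child on each line.
std::vector<std::size_t> LineStarts(const std::vector<Child>& children,
                                    int available) {
  std::vector<std::size_t> starts;
  std::size_t lineStart = 0;
  while (lineStart < children.size()) {
    starts.push_back(lineStart);
    int used = 0;
    std::size_t lastOpportunity = lineStart;  // == lineStart means "none yet"
    std::size_t next = children.size();       // start of the following line
    for (std::size_t i = lineStart; i < children.size(); ++i) {
      const Child& c = children[i];
      // The first child of a line is always placed, so it is never a break.
      bool opportunity = i > lineStart && c.isOperator &&
                         c.linebreak != Linebreak::NoBreak;
      if (opportunity && c.linebreak == Linebreak::Newline) {
        next = i;  // forced break, regardless of remaining space
        break;
      }
      if (opportunity) {
        lastOpportunity = i;
      }
      if (i > lineStart && used + c.prefWidth > available) {
        // Prefer the last operator seen; otherwise fall back to breaking
        // at the first child that no longer fits.
        next = lastOpportunity > lineStart ? lastOpportunity : i;
        break;
      }
      used += c.prefWidth;
    }
    lineStart = next;  // restart the algorithm from the new line
  }
  return starts;
}
```

Because the first child of every line is placed unconditionally, each line consumes at least one child, so the loop terminates even when a single child is wider than the available width.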
To have linebreaks occur within mrow frames, the reflow logic needs to be similar to that of nsInlineFrame. Unfortunately, nsInlineFrame and a class that nsMathMLmrowFrame inherits from both inherit from nsContainerFrame, so I can't make nsMathMLmrowFrame inherit from nsInlineFrame. I have thought of three possible solutions.
A. Duplicate the code. Simplest solution, but the two instances will invariably diverge, as I expect changes to nsInlineFrame won't be propagated to the methods living in the MathML folder.
B. Stop inheriting from nsMathMLContainerFrame for nsMathMLmrowFrame. I expect this is going to cause major problems involving duplicating any required functionality and having to check for both frames with do_QueryFrame.
C. Create a new nsInlineFrameReflow class which nsInlineFrame and nsMathMLmrowFrame inherit from and copy most of the reflow logic into that. Unfortunately the affected reflow methods call protected methods, so nsInlineFrameReflow will need to be a friend class for nsInlineFrame and nsMathMLmrowFrame.
Provided that the use of friend can be stomached and that the logic gets moved to the new class in a history-preserving way, C looks to me to be the best option.
Which approach do you want, or is there something else that I have not considered?
nsInlineFrame changes rarely. I actually think that duplicating the code might be the best approach. Or rather, factor out the logic of nsInlineFrame into a helper class, or helper functions, and call that logic from both classes. I think if possible we should avoid adding an abstract parent class or otherwise changing the inheritance hierarchy. Actually, I'm not sure what you're proposing. Could you show exactly what the hierarchy would look like in option C?
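For illustration, the "helper functions called from both classes" idea might look something like the sketch below. All class and function names here are simplified stand-ins invented for this example (they are not the real Gecko classes or signatures), and the "reflow" is reduced to splitting a list of children into lines so that the shape of the hierarchy stays visible.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Line = std::vector<std::string>;

// Hypothetical shared helper: the inline reflow logic lives in a free
// function that takes everything it needs as parameters, so no friend
// declarations or changes to the inheritance hierarchy are required.
namespace InlineReflowHelper {
std::vector<Line> SplitIntoLines(const std::vector<std::string>& children,
                                 std::size_t maxPerLine) {
  std::vector<Line> lines;
  for (const std::string& child : children) {
    if (lines.empty() || lines.back().size() >= maxPerLine) {
      lines.emplace_back();  // start a new line
    }
    lines.back().push_back(child);
  }
  return lines;
}
}  // namespace InlineReflowHelper

// Stand-in for the common base class of both hierarchies.
class ContainerFrame {
 public:
  virtual ~ContainerFrame() = default;
  virtual std::vector<Line> Reflow(const std::vector<std::string>& children,
                                   std::size_t maxPerLine) = 0;
};

// Stand-in for the inline frame: delegates to the helper.
class InlineFrame : public ContainerFrame {
 public:
  std::vector<Line> Reflow(const std::vector<std::string>& children,
                           std::size_t maxPerLine) override {
    return InlineReflowHelper::SplitIntoLines(children, maxPerLine);
  }
};

// Stand-in for the MathML container base; it is left unchanged.
class MathMLContainerFrame : public ContainerFrame {};

// Stand-in for the mrow frame: same helper, plus room for MathML-only work.
class MathMLmrowFrame : public MathMLContainerFrame {
 public:
  std::vector<Line> Reflow(const std::vector<std::string>& children,
                           std::size_t maxPerLine) override {
    std::vector<Line> lines =
        InlineReflowHelper::SplitIntoLines(children, maxPerLine);
    // MathML-specific per-line steps (stretching, placement) would go here.
    return lines;
  }
};
```

Both frame classes keep their existing parents; only the shared logic moves, which is the property the comment above asks for.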
While I'd love full, automatic, line breaking in MathML, I wonder whether a good first step might be simply to implement linebreak="newline" (along with a few indentXXX attributes) --- at least as long as it doesn't inhibit eventual future work. Having the ability to pre-generate linebreaks offline without resorting to mtable would open up great possibilities.
Hi, I'm trying to prepare accessible lecture notes in mathematics for visually impaired students. The lack of line breaking seems to me to be a major obstacle. Consider a typical LaTeX equation like:
$$ x^2+y^2+4x-2y+3 = x^2+4x+y^2-2y+3 = (x+2)^2-4+(y-1)^2-1+3 = (x+2)^2+(y-1)^2-2 = $$
Now the problem is: for a user of a screen reader, the above should be represented as a single equation. So preferably the above should be within a single math object:
<math> </mo><mrow><mrow><mrow><msup><mi>x</mi><mn>2</mn> ... </math>
But here's the problem: the notes might be read by someone with very low vision who actually wants to see a zoomed-in equation in addition to sound/Braille (this is the actual use case of the lecture notes which I prepare; the zoom level is roughly equivalent to font size 160, so typically fewer than 8 symbols are visible on the laptop screen), and here it would be useful to have line breaks at equality signs. Right now, to get linebreaks I know no other option than to add various additional HTML elements and split the equation using them. This then confuses the screen reader, and the maths which is read out is much less clear.