Open Bug 1466044 Opened 6 years ago Updated 2 years ago

Formula is not correctly displayed under reader view mode

Categories

(Toolkit :: Reader Mode, defect, P3)

60 Branch
defect

Tracking

()

People

(Reporter: cbakgly, Unassigned)

References

()

Details

(Whiteboard: [reader-mode-readability-algorithm])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:60.0) Gecko/20100101 Firefox/60.0
Build ID: 20180516032328

Steps to reproduce:

Firefox 60.0.1 for Mac, on macOS 10.13.4 (17E202).

1. Open http://songshuhui.net/archives/76501. The first two formulas are displayed correctly.
2. Switch to reader view mode (I'm not sure what this feature is called in English, but I assume you understand).
3. Check the formulas. Screenshot attached.


Actual results:

Please see the attached screenshot.


Expected results:

The formulas should be displayed correctly.
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0
20180601100102
Status: UNCONFIRMED → NEW
Component: Untriaged → Reader Mode
Ever confirmed: true
OS: Unspecified → All
Product: Firefox → Toolkit
Hardware: Unspecified → All
See Also: → 1430493
Whiteboard: [reader-mode-readability-algorithm]
Thanks, so it looks like this is created using MathML but then when presented on the page each element of the equation has its own inline styles to position it as the author desired.

We could try to include inline styles that are inside of a MathJax generated fragment (seems like anything with a "MathJax-" ID prefix).
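A minimal sketch of that idea, written in Python for illustration rather than as Readability's actual JavaScript: style attributes are scrubbed everywhere except inside subtrees whose root has an ID starting with "MathJax-". The function name `scrub_styles`, the sample markup, and the ID `MathJax-Element-1-Frame` are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix taken from the observation above; not verified for every MathJax output.
MATHJAX_ID_PREFIX = "MathJax-"


def scrub_styles(root):
    """Remove style attributes, except inside MathJax-generated subtrees."""

    def walk(node, inside_mathjax):
        inside = inside_mathjax or node.get("id", "").startswith(MATHJAX_ID_PREFIX)
        if not inside:
            node.attrib.pop("style", None)
        for child in node:
            walk(child, inside)

    walk(root, False)
    return root


# Hypothetical, simplified markup; real MathJax output is far more deeply nested.
doc = ET.fromstring(
    '<div style="color: red">'
    '<p style="margin: 0">Text</p>'
    '<span id="MathJax-Element-1-Frame" style="position: relative">'
    '<span style="left: 0.5em">x</span>'
    '</span>'
    '</div>'
)
scrub_styles(doc)
```

After the call, the outer `<div>` and the `<p>` have lost their styles, while both spans inside the MathJax frame keep theirs.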

Peter, you're listed on the MathJax website as part of the team. Could you help out here? Would the above solution cover all MathJax cases? This is the repo for Firefox' Readability if you'd like to look around or help fix this bug :)
Flags: needinfo?(peter)
Priority: -- → P3
Hi Jared,

Happy to help.

> it looks like this is created using MathML but then when presented on the page each element of the equation has its own inline styles to position it as the author desired.

That's roughly correct. MathJax uses (something like) MathML as an internal format and can output HTML+CSS (3 different outputs actually) as well as SVG.

> We could try to include inline styles that are inside of a MathJax generated fragment (seems like anything with a "MathJax-" ID prefix).

Preserving inline styles would be necessary but unfortunately not sufficient (in general).

MathJax also injects <style>s into <head> defining both class names and (depending on the output) font-face rules. In addition, if the site (or user) configuration sets MathJax to use the SVG output, then there will also be a "global" SVG with path information (for re-use with <use>); this SVG also gets lost in reader mode, breaking rendering.

Finally, there's a third, reverse problem: MathJax can (as on the linked page) add visually-hidden but accessible MathML (as a legacy solution for accessibility).

To summarize, I see three major areas:

a) keeping styles for MathJax's visual output intact
  * inline styles
  * styles in head
b) keeping MathJax's "global" SVG intact (id="MathJax_SVG_Hidden")
c) avoiding visually hidden content for accessibility from becoming visible

If there are modifications on the MathJax side that would help indicate "this should be kept for reader mode", I think that would be extremely helpful.

> This is the repo for Firefox' Readability if you'd like to look around or help fix this bug :)

I didn't see a link. Is it https://github.com/mozilla/readability ?
Flags: needinfo?(peter)
(In reply to Peter Krautzberger from comment #3)

Thanks!

> > This is the repo for Firefox' Readability if you'd like to look around or help fix this bug :)
> 
> I didn't see a link. Is it https://github.com/mozilla/readability ?

Yeah, sorry that's the link. I must have forgotten to paste it in.

Gijs, what do you think about the above?
Flags: needinfo?(gijskruitbosch+bugs)
> Yeah, sorry that's the link. I must have forgotten to paste it in.

Thanks for confirming.

A few questions: 

* where do you prefer individual reports (here or on the github repository)?
* is there documentation somewhere on what is whitelisted? 
  For example, I found that .sr-only content is passed through. That could help resolve c) on the MathJax end. But if that's not an official feature, it's hard to argue for such a change.
* do you consider the loss of <use> elements targets a bug (or a missing feature)?
* is there something MathJax could do to simplify preserving its styles?
(In reply to Peter Krautzberger from comment #5)
> > Yeah, sorry that's the link. I must have forgotten to paste it in.
> 
> Thanks for confirming.
> 
> A few questions: 
> 
> * where do you prefer individual reports (here or on the github repository)?

Either is fine, we can continue here for now, to keep the discussion in 1 place.

> * is there documentation somewhere on what is whitelisted? 
>   For example, I found that .sr-only content is passed through. That could
> help resolve c) on the MathJax end. But if that's not an official feature,
> it's hard to argue for such a change.

In principle almost nothing should be "whitelisted" in that sense. We aim to get very clean markup across (but don't always succeed, for various reasons).

In fact, the sr-only content looks to me like it isn't in fact passed through right now, but perhaps I'm missing something.

> * do you consider the loss of <use> elements targets a bug (or a missing
> feature)?

I'm not sure what this question is asking. If you're asking, given:

<svg><rect ... /><use ... /></svg>

readability produces:

<svg><rect ... /></svg>

and silently drops the <use> tag, that's a bug.

If you're asking, why given:

<svg><rect id="foo"/></svg>
...
<svg><use href="#foo" /></svg>

readability brings in the second bit but not the first, I guess that's a bug too, but it might be harder to fix... In either case it might be worth a separate bug - at least, I don't see any SVG in the testcase from comment #0.
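For the second case, one conceivable approach is to look up the IDs that `<use>` elements in the extracted fragment point at and copy the matching elements over from the full page. The sketch below is in Python for illustration only and is not how Readability works today; `pull_in_targets` and the zero-sized holder `<svg>` are hypothetical choices, and the sample omits the SVG namespace, as inline SVG in HTML would.

```python
import copy
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def referenced_ids(fragment):
    """Collect the IDs that <use> elements in the fragment refer to."""
    ids = set()
    for node in fragment.iter():
        if local_name(node.tag) != "use":
            continue
        href = node.get("href") or node.get(XLINK_HREF) or ""
        if href.startswith("#"):
            ids.add(href[1:])
    return ids


def pull_in_targets(full_doc, fragment):
    """Copy missing <use> targets from full_doc into a <defs> in the fragment."""
    present = {n.get("id") for n in fragment.iter() if n.get("id")}
    missing = referenced_ids(fragment) - present
    targets = [n for n in full_doc.iter() if n.get("id") in missing]
    if targets:
        # Hypothetical container; how best to hide it is left open here.
        holder = ET.SubElement(fragment, "svg", {"width": "0", "height": "0"})
        defs = ET.SubElement(holder, "defs")
        for target in targets:
            clone = copy.deepcopy(target)
            clone.tail = None
            defs.append(clone)
    return sorted(t.get("id") for t in targets)


page = ET.fromstring(
    '<body>'
    '<svg><rect id="foo"/></svg>'
    '<article><svg><use href="#foo"/></svg></article>'
    '</body>'
)
article = copy.deepcopy(page.find("article"))
added = pull_in_targets(page, article)  # ['foo']
```

The copy keeps the original page untouched, which matters if the extraction step is expected to be side-effect free.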

> * is there something MathJax could do to simplify preserving its styles?

Not easily... in principle, reader mode is aimed at making very ugly/busy markup (e.g. from mainline news websites) as simple/semantic as possible. We remove a lot of stuff (and should probably remove more than we do). We've only started excepting a few things recently, and should probably think harder about that.

I'm tempted to suggest just extracting the mathml and inserting that instead. I think it might break on Firefox on iOS which uses mobile webkit, but otherwise, I guess it should be OK? Otherwise, things get trickier.
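A rough sketch of what "extracting the MathML" could look like, assuming the page does contain a `<math>` element inside each MathJax wrapper. It is written in Python for illustration; the class names `MathJax` and `MJX_Assistive_MathML` in the sample markup are assumptions about MathJax's output, and `replace_with_mathml` is a made-up helper.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]


def replace_with_mathml(root, wrapper_class="MathJax"):
    """Replace each wrapper's content with its first <math> descendant, if any."""
    # Collect wrappers first so the tree is not modified while iterating over it.
    wrappers = [
        node for node in root.iter()
        if wrapper_class in node.get("class", "").split()
    ]
    replaced = 0
    for wrapper in wrappers:
        math = next(
            (n for n in wrapper.iter() if local_name(n.tag) == "math"), None
        )
        if math is None:
            continue  # no MathML available; leave the wrapper alone
        math.tail = None
        wrapper.text = None
        wrapper[:] = [math]
        wrapper.attrib.pop("style", None)
        replaced += 1
    return replaced


# Hypothetical, heavily simplified MathJax output with assistive MathML.
doc = ET.fromstring(
    '<p>Area: <span class="MathJax" style="position: relative">'
    '<nobr><span style="left: 0.1em">x</span></nobr>'
    '<span class="MJX_Assistive_MathML">'
    '<math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math>'
    '</span></span></p>'
)
count = replace_with_mathml(doc)
```

Wrappers without any `<math>` descendant are left as they are, so this only helps on pages where the MathML is actually present.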

(In reply to Peter Krautzberger from comment #3)
Are (a), (b) and (c) ALL required, or would any of the three fix the core issue here?

> a) keeping styles for MathJax's visual output intact
>   * inline styles
>   * styles in head

Readability outputs a `<div>` that consumers essentially stick in the <body> of a doc, so <head>/<link> styling would need to jump through hoops (<style>@import("...")</style>, I guess). But at a higher level, we normally scrub all styling the webpage provides, so that we can provide a consistent reading experience. Adding exceptions goes against that policy a bit, and we'd need to make sure that we don't inadvertently include styling that messes with the overall look/feel of readability.

How does the <head> style get inserted? It looks off-hand like it's just an added <style> that's largely not identifiable. :-(

How important are the head style bits? I noticed some inclusion of fonts, which could be completely unimportant, or could be really really important if maths symbols don't display without those fonts...

This is all made more complicated because the content isn't scoped/separated from the sidebar with reader mode controls, and so including remote styles risks messing with those controls. bug 1204818 would fix this but so far nobody has had cycles to deal with the complexity of implementing that.

> b) keeping MathJax's "global" SVG intact (id="MathJax_SVG_Hidden")

This seems doable in principle.

> c) avoiding visually hidden content for accessibility from becoming visible

The version on github should already be dropping display: none'd bits, as well as nodes with HTML5's hidden attribute. It's possible the changes that scrubbed all classes except those on a list of 'allowed' ones broke integration with the `.sr-only` class, in which case we may need to add that back. Either way I need to sync up mozilla-central (from which we build Firefox) with the github version. That's on my list but I've been away for the last week and a bit. Hopefully I can get to it this week or next.
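To make the class-scrubbing point concrete, here is a small Python sketch of an allow-list pass. It is an illustration under assumptions, not Readability's code: `scrub_classes` is a made-up name, and the allow-list containing only `sr-only` is hypothetical.

```python
import xml.etree.ElementTree as ET


def scrub_classes(root, allowed):
    """Drop every class name that is not on the allow-list."""
    for node in root.iter():
        classes = node.get("class")
        if classes is None:
            continue
        kept = [name for name in classes.split() if name in allowed]
        if kept:
            node.set("class", " ".join(kept))
        else:
            del node.attrib["class"]
    return root


SAMPLE = (
    '<div class="post wide">'
    '<span class="sr-only extra">hidden label</span>'
    '</div>'
)

# With "sr-only" allowed, the class survives and a stylesheet can still hide it.
kept_doc = scrub_classes(ET.fromstring(SAMPLE), {"sr-only"})

# With an empty allow-list, the span loses its class and becomes ordinary text.
bare_doc = scrub_classes(ET.fromstring(SAMPLE), set())
```

Keeping the class only helps if the consumer's stylesheet also defines a rule for it.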
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(peter)
> In principle almost nothing should be "whitelisted" in that sense. 

Copy that. Though I would find that problem interesting (but I'm on the W3C Publishing Working Group so I'm biased).

> In fact, the sr-only content looks to me like it isn't in fact passed through right now, but perhaps I'm missing something.

From a quick test, it is passed through.

>> * do you consider the loss of <use> elements targets a bug (or a missing
>> feature)?
> 
> I'm not sure what this question is asking.

Reader mode strips href attributes and removes the targets of said attributes, e.g.,

* `<use xlink:href="#MJMAIN-31" x="1049" y="676"></use>` turns into `<use x="1049" y="676"></use>`
* document.getElementById('MJMAIN-31') fails in reader mode.
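Both symptoms can be checked mechanically. The Python sketch below reports `<use>` elements that either have no href at all or point at an ID missing from the document. `broken_uses` is a hypothetical helper, and the path data `M0 0` is a placeholder, not real MathJax glyph data.

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"


def broken_uses(root):
    """Return (reason, element) pairs for <use> elements that cannot resolve."""
    ids = {n.get("id") for n in root.iter() if n.get("id")}
    problems = []
    for node in root.iter():
        if node.tag.rsplit("}", 1)[-1] != "use":
            continue
        href = node.get("href") or node.get(XLINK_HREF)
        if not href:
            problems.append(("missing href", node))
        elif href.startswith("#") and href[1:] not in ids:
            problems.append(("missing target", node))
    return problems


# Simplified stand-ins for the page before and after reader mode.
before = ET.fromstring(
    '<div xmlns:xlink="http://www.w3.org/1999/xlink">'
    '<svg><defs><path id="MJMAIN-31" d="M0 0"/></defs></svg>'
    '<svg><use xlink:href="#MJMAIN-31" x="1049" y="676"/></svg>'
    '</div>'
)
after = ET.fromstring('<div><svg><use x="1049" y="676"/></svg></div>')

before_reasons = [reason for reason, _ in broken_uses(before)]  # []
after_reasons = [reason for reason, _ in broken_uses(after)]  # ['missing href']
```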

> In either case it might be worth a separate bug 

Will do.

> I don't see any SVG in the testcase from comment #0.

You can get this by right-clicking (or Cmd-clicking) an equation and changing Math Settings => Math Renderer => SVG.

> I'm tempted to suggest just extracting the mathml and inserting that instead.

Generally speaking, MathML will not be in the page. In particular, it's not when MathJax's Accessibility Extensions are used (which replace the legacy approach using visually-hidden MathML).

> How does the <head> style get inserted? 

That depends. If MathJax is used server-side, the styles most likely end up in some stylesheet. If it's used client-side, the styles get dynamically injected into the head.

> How important are the head style bits?

Critical for CSS-based layout. For SVG-based output the main loss would be inline vs block.

>> c) avoiding visually hidden content for accessibility from becoming visible
> 
> The version on github should already be dropping display: none'd bits, as well as nodes with HTML5's hidden attribute.

sr-only content would not be stripped that way since it's not display:none'd or hidden=true -- those would prevent the content from being in the accessibility tree.
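The difference can be shown with a small Python sketch of a hidden-node filter. This illustrates the logic only and is not Readability's implementation; `is_hidden`, `drop_hidden`, and the optional `hidden_classes` parameter are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

DISPLAY_NONE = re.compile(r"display\s*:\s*none", re.IGNORECASE)


def is_hidden(node, hidden_classes=frozenset()):
    """True for the hidden attribute, inline display:none, or a listed class."""
    if node.get("hidden") is not None:
        return True
    if DISPLAY_NONE.search(node.get("style", "")):
        return True
    return any(c in hidden_classes for c in node.get("class", "").split())


def remove_keep_tail(parent, child):
    """Remove child but keep the text that follows it in the parent."""
    index = list(parent).index(child)
    if child.tail:
        if index == 0:
            parent.text = (parent.text or "") + child.tail
        else:
            previous = parent[index - 1]
            previous.tail = (previous.tail or "") + child.tail
    parent.remove(child)


def drop_hidden(node, hidden_classes=frozenset()):
    """Recursively remove hidden descendants; return how many were removed."""
    removed = 0
    for child in list(node):
        if is_hidden(child, hidden_classes):
            remove_keep_tail(node, child)
            removed += 1
        else:
            removed += drop_hidden(child, hidden_classes)
    return removed


SAMPLE = (
    '<div>'
    '<span style="display: none">a</span>'
    '<span hidden="">b</span>'
    '<span class="sr-only">c</span>'
    '<span>d</span>'
    '</div>'
)

default_doc = ET.fromstring(SAMPLE)
default_removed = drop_hidden(default_doc)  # sr-only span survives

strict_doc = ET.fromstring(SAMPLE)
strict_removed = drop_hidden(strict_doc, frozenset({"sr-only"}))
```

With the default settings the `sr-only` span survives, because nothing in its markup says it is hidden; only a filter that knows the class name removes it.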


My personal recap is:

a) CSS-based equation layout probably cannot be made to work in Reader Mode.
b) SVG-based equation output seems to run into bugs, and there might be interest in fixing them.
c) the issues with MathJax's AssistiveMML extension can be solved on the MathJax end by adding sr-only as a class name.
Flags: needinfo?(peter)
Severity: normal → S3