Open Bug 1371876 Opened 5 years ago Updated 5 years ago

Preserve strike-through text-decoration in reader mode (and potentially other semantics-related style properties)

Categories

(Toolkit :: Reader Mode, defect, P4)

53 Branch
defect

Tracking

()

People

(Reporter: rulatir, Unassigned)

References

Details

(Whiteboard: [reader-mode-readability-algorithm])

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0
Build ID: 20170520072017

Steps to reproduce:

Used Firefox's reader mode to read http://www.scp-wiki.net/scp-520 , a story on the creative fiction website SCP Foundation. Some text in this story has text-decoration: line-through which is very much part of the fiction and is intended to convey the style of an official document that has undergone redaction and modification.


Actual results:

The text that should have been line-through was not. The style attribute was stripped from the surrounding span element.


Expected results:

The text-decoration style attribute should have been preserved, or there should be an option to preserve it.
Component: Untriaged → Reader Mode
Product: Firefox → Toolkit
FWIW, if you have control over the markup, this wouldn't be an issue if the site simply used <del>tags</del> to represent the semantics (which should therefore be markup rather than style) that these bits of text were quasi-deleted.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P4
Summary: Add option to preserve text-decoration in reader mode → Preserve strike-through text-decoration in reader mode (and potentially other semantics-related style properties)
See Also: → 1391313
IMO "hyphens:" and "content:" should survive in Reader Mode
... and the "lang=" attribute because of that
(In reply to j.j. from comment #3)
> ... and the "lang=" attribute because of that

I think the lang attribute already does, but I haven't checked this.
Whiteboard: [reader-mode-readability-algorithm]
(In reply to j.j. from comment #2)
> "content:" should survive in Reader Mode

Presumably only if it contains text? If also for images, then that would imply background images should survive too, which seems odd.
(In reply to :Gijs from comment #4)
> (In reply to j.j. from comment #3)
> > ... and the "lang=" attribute because of that
> 
> I think the lang attribute already does, but I haven't checked this.

does not here: http://elektronotdienst-nuernberg.de/rechnungsaenderung.html
(In reply to :Gijs from comment #5)
> (In reply to j.j. from comment #2)
> > "content:" should survive in Reader Mode
> 
> Presumably only if it contains text?

Yes, my personal usecase is text. And it's somewhat common these days, to hide/protect email addresses using CSS content.
(In reply to j.j. from comment #6)
> (In reply to :Gijs from comment #4)
> > (In reply to j.j. from comment #3)
> > > ... and the "lang=" attribute because of that
> > 
> > I think the lang attribute already does, but I haven't checked this.
> 
> does not here: http://elektronotdienst-nuernberg.de/rechnungsaenderung.html

Hm, you're right. This is somewhat surprising to me, I thought we had code to deal with this that got added when the narrate feature was implemented. Eitan?

But to be completely clear, the lang attribute in that page is on the <html> node, and that node does not get included in reader mode, which explains why it's not there (though I did think we moved it in that case...). What I meant was that, unlike style attributes which reader mode reflexively removes altogether, I don't think we ever actively remove lang attributes from markup, so if you were looking at, say, a correctly marked up translation of Tolstoy's War & Peace, and the letters that the nobles write were rendered in French like, say, <blockquote lang=fr>...</blockquote>, we would keep the lang attribute(s).

I do appreciate that that's no use to you on that page, and that there's probably markup guidelines that suggest having the lang attribute on body/html, and that we should try to deal with that situation better (we can't well put it on reader mode's html/body tags because it might not match the UI language of Firefox, and our UI is included in that document - another reason to pursue bug 1204818...).
Flags: needinfo?(eitan)
I don't think we ever preserved the lang tag in the sanitized content.

We do some autodetection for the narrate mode. ... and I just found out we do that twice now because it was added in ReaderMode.jsm to predict article read times.
Flags: needinfo?(eitan)
You need to log in before you can comment on or make changes to this bug.