Reader View ignores <base href="...">

RESOLVED FIXED in Firefox 52

Status

()

Toolkit
Reader Mode
P2
normal
RESOLVED FIXED
2 years ago
26 days ago

People

(Reporter: Hagen von Eitzen, Assigned: evanxd)

Tracking

(Blocks: 1 bug)

38 Branch
mozilla52
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox52 fixed)

Details

(Whiteboard: [reader-mode-readability-algorithm])

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(1 attachment)

(Reporter)

Description

2 years ago
str
User Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Firefox/38.0
Build ID: 20150525141253

Steps to reproduce:

1. Visit a page with a <base> tag in its <head> and a relative link in its body (example: https://www.redeker.de/main-V2.php/de/news/pm20150611.html )
2. switch to "Reader View"
3. Follow the link


Actual results:

The link leads to the wrong or a nonexisting page because it is interperted relative to the page url instead of relative to the <base> url. Apparently, the attempt to remove everything "superfluous" from the DOM tree removed the <base href="..."/> from within the <head>, but as this influences all relative links of the document it is essential rather than superfluous


Expected results:

The correct page should be displayed

Updated

2 years ago
Component: Untriaged → Reader Mode
Product: Firefox → Toolkit
I can confirm that this happens with another example like wikipedia

https://en.wikipedia.org/wiki/Cley_Marshes#cite_note-NAE9-1
Click on a footnote url and the page will not scroll. < a href = "#cite_note-NAE9-1">

Support thread reference: https://support.mozilla.org/en-US/questions/1075085
Status: UNCONFIRMED → NEW
Ever confirmed: true

Updated

10 months ago
Duplicate of this bug: 1182433

Updated

10 months ago
Priority: -- → P2

Updated

10 months ago
Whiteboard: [reader-mode-readability-algorithm]

Updated

10 months ago
Blocks: 1286221
(Assignee)

Updated

7 months ago
Assignee: nobody → evan
(Assignee)

Comment 3

7 months ago
Sent a pull request[1] for reviewing.

The patch: https://github.com/mozilla/readability/pull/315/commits/bcd198c44e2aa09525ffc9454adaf5572568a644

[1]: https://github.com/mozilla/readability/pull/315

Comment 4

7 months ago
I think this needs to be fixed in the consumer (ie ReaderMode.jsm which sets the URL to pass to Readability) by using document.baseURI instead of document.documentURI here: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/ReaderMode.jsm#199 .

We should be quite careful to check if document.baseURI is somehow security checked so that we don't open ourselves up to pages abusing <base> to get treated like chrome:// or google.com (when they're not) or whatever.
Comment hidden (mozreview-request)
(Assignee)

Comment 6

7 months ago
(In reply to :Gijs Kruitbosch from comment #4)
> I think this needs to be fixed in the consumer (ie ReaderMode.jsm which sets
> the URL to pass to Readability) by using document.baseURI instead of
> document.documentURI here:
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/
> ReaderMode.jsm#199 .
Good idea. Let's do it.

> We should be quite careful to check if document.baseURI is somehow security
> checked so that we don't open ourselves up to pages abusing <base> to get
> treated like chrome:// or google.com (when they're not) or whatever.
We already do some check in `_shouldCheckUri` method[1]. But do we really need to forbid that someone use different domain in `<base>` tag in reader mode?(e.g. a webpage[2] use `https://www.google.com.tw/images/` as its base URI, I think reader mode should support it)

I think people could use any valid URI as their base URI if they want. What do you think?

[1]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/ReaderMode.jsm#370-394
[2]: http://evanxd.io/readability/test/test-pages/blogger/source.html

Comment 7

7 months ago
mozreview-review
Comment on attachment 8808095 [details]
Bug 1173823 - Always use document.baseURI as the uri parameter of _readerParse method,

https://reviewboard.mozilla.org/r/91030/#review90780

::: toolkit/components/reader/ReaderMode.jsm:201
(Diff revision 1)
>     * @resolves JS object representing the article, or null if no article is found.
>     */
>    parseDocument: Task.async(function* (doc) {
> -    let uri = Services.io.newURI(doc.documentURI, null, null);
> -    if (!this._shouldCheckUri(uri)) {
> +    let documentURI = Services.io.newURI(doc.documentURI, null, null);
> +    let baseURI = Services.io.newURI(doc.baseURI, null, null);
> +    if (!this._shouldCheckUri(documentURI) || !this._shouldCheckBaseUri(baseURI)) {

I think we can just reuse `_shouldCheckUri` instead of duplicating it?
Attachment #8808095 - Flags: review?(gijskruitbosch+bugs) → review-
Comment hidden (mozreview-request)

Comment 9

7 months ago
mozreview-review
Comment on attachment 8808095 [details]
Bug 1173823 - Always use document.baseURI as the uri parameter of _readerParse method,

https://reviewboard.mozilla.org/r/91030/#review91208
Attachment #8808095 - Flags: review?(gijskruitbosch+bugs) → review+

Comment 10

7 months ago
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e32d64d47c52
Always use document.baseURI as the uri parameter of _readerParse method, r=Gijs

Comment 11

7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e32d64d47c52
Status: NEW → RESOLVED
Last Resolved: 7 months ago
status-firefox52: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
QA Whiteboard: [good first verify]
Depends on: 1358248
You need to log in before you can comment on or make changes to this bug.