Closed Bug 1173823 Opened 5 years ago Closed 3 years ago

Reader View ignores <base href="...">

Categories

(Toolkit :: Reader Mode, defect, P2)

38 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: eitzen, Assigned: evanxd)

References

(Blocks 1 open bug)

Details

(Whiteboard: [reader-mode-readability-algorithm])

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 5.1; rv:38.0) Gecko/20100101 Firefox/38.0
Build ID: 20150525141253

Steps to reproduce:

1. Visit a page with a <base> tag in its <head> and a relative link in its body (example: https://www.redeker.de/main-V2.php/de/news/pm20150611.html )
2. switch to "Reader View"
3. Follow the link


Actual results:

The link leads to the wrong or a nonexisting page because it is interperted relative to the page url instead of relative to the <base> url. Apparently, the attempt to remove everything "superfluous" from the DOM tree removed the <base href="..."/> from within the <head>, but as this influences all relative links of the document it is essential rather than superfluous


Expected results:

The correct page should be displayed
Component: Untriaged → Reader Mode
Product: Firefox → Toolkit
I can confirm that this happens with another example like wikipedia

https://en.wikipedia.org/wiki/Cley_Marshes#cite_note-NAE9-1
Click on a footnote url and the page will not scroll. < a href = "#cite_note-NAE9-1">

Support thread reference: https://support.mozilla.org/en-US/questions/1075085
Status: UNCONFIRMED → NEW
Ever confirmed: true
Duplicate of this bug: 1182433
Priority: -- → P2
Whiteboard: [reader-mode-readability-algorithm]
Blocks: 1286221
Assignee: nobody → evan
I think this needs to be fixed in the consumer (ie ReaderMode.jsm which sets the URL to pass to Readability) by using document.baseURI instead of document.documentURI here: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/ReaderMode.jsm#199 .

We should be quite careful to check if document.baseURI is somehow security checked so that we don't open ourselves up to pages abusing <base> to get treated like chrome:// or google.com (when they're not) or whatever.
(In reply to :Gijs Kruitbosch from comment #4)
> I think this needs to be fixed in the consumer (ie ReaderMode.jsm which sets
> the URL to pass to Readability) by using document.baseURI instead of
> document.documentURI here:
> https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/
> ReaderMode.jsm#199 .
Good idea. Let's do it.

> We should be quite careful to check if document.baseURI is somehow security
> checked so that we don't open ourselves up to pages abusing <base> to get
> treated like chrome:// or google.com (when they're not) or whatever.
We already do some check in `_shouldCheckUri` method[1]. But do we really need to forbid that someone use different domain in `<base>` tag in reader mode?(e.g. a webpage[2] use `https://www.google.com.tw/images/` as its base URI, I think reader mode should support it)

I think people could use any valid URI as their base URI if they want. What do you think?

[1]: https://dxr.mozilla.org/mozilla-central/source/toolkit/components/reader/ReaderMode.jsm#370-394
[2]: http://evanxd.io/readability/test/test-pages/blogger/source.html
Comment on attachment 8808095 [details]
Bug 1173823 - Always use document.baseURI as the uri parameter of _readerParse method,

https://reviewboard.mozilla.org/r/91030/#review90780

::: toolkit/components/reader/ReaderMode.jsm:201
(Diff revision 1)
>     * @resolves JS object representing the article, or null if no article is found.
>     */
>    parseDocument: Task.async(function* (doc) {
> -    let uri = Services.io.newURI(doc.documentURI, null, null);
> -    if (!this._shouldCheckUri(uri)) {
> +    let documentURI = Services.io.newURI(doc.documentURI, null, null);
> +    let baseURI = Services.io.newURI(doc.baseURI, null, null);
> +    if (!this._shouldCheckUri(documentURI) || !this._shouldCheckBaseUri(baseURI)) {

I think we can just reuse `_shouldCheckUri` instead of duplicating it?
Attachment #8808095 - Flags: review?(gijskruitbosch+bugs) → review-
Comment on attachment 8808095 [details]
Bug 1173823 - Always use document.baseURI as the uri parameter of _readerParse method,

https://reviewboard.mozilla.org/r/91030/#review91208
Attachment #8808095 - Flags: review?(gijskruitbosch+bugs) → review+
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e32d64d47c52
Always use document.baseURI as the uri parameter of _readerParse method, r=Gijs
https://hg.mozilla.org/mozilla-central/rev/e32d64d47c52
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
QA Whiteboard: [good first verify]
Depends on: CVE-2017-7762
You need to log in before you can comment on or make changes to this bug.