Closed Bug 1280888 Opened 4 years ago Closed 3 years ago

Relative links in about:reader documents (to #foo) should scroll to the anchor in the document instead of linking to the original page in a new tab

Categories

(Toolkit :: Reader Mode, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox50 - wontfix
firefox51 + fixed

People

(Reporter: Gijs, Assigned: Gijs)

References

(Blocks 1 open bug)

Details

(Whiteboard: [reader-mode-firefox-integration])

Attachments

(1 file)

STR:

1. open http://www.gutenberg.org/files/3300/3300-h/3300-h.htm
2. enter reader mode
3. click a link in the table of contents

ER:
scroll the target of the link into view

AR:
the original page opens in a new tab with the target of the link scrolled into view
Not entirely sure if this is caused by bug 1280584 but I wouldn't be surprised if it was.
See Also: → 1280584
In firefox 47, entering reader mode and clicking an anchor leads to loading the folder and using the anchor as filename, ex:

1/ go to http://www.gutenberg.org/files/3300/3300-h/3300-h.htm
2/ Enter reader mode
3/ click on 'Introduction'
expected result:
display http://www.gutenberg.org/files/3300/3300-h/3300-h.htm#link2H_4_0002 in reader mode
actualresult:
url loaded is http://www.gutenberg.org/files/3300/3300-h/#link2H_INTR

In Firefox 50, I can see the behaviour you are seeing, it seems broken to me in both 47 and 50 but in different ways.
(In reply to Pascal Chevrel:pascalc from comment #2)
> In firefox 47, entering reader mode and clicking an anchor leads to loading
> the folder and using the anchor as filename, ex:
> 
> 1/ go to http://www.gutenberg.org/files/3300/3300-h/3300-h.htm
> 2/ Enter reader mode
> 3/ click on 'Introduction'
> expected result:
> display http://www.gutenberg.org/files/3300/3300-h/3300-h.htm#link2H_4_0002
> in reader mode
> actualresult:
> url loaded is http://www.gutenberg.org/files/3300/3300-h/#link2H_INTR
> 
> In Firefox 50, I can see the behaviour you are seeing, it seems broken to me
> in both 47 and 50 but in different ways.

I'm aware of all of that - I wrote the fix that changed the behaviour in this way. :-\

The fix is on 48. Why do you think this is severe enough to track (for 50?) if it didn't work in 47 and before either?
Flags: needinfo?(pascalc)
(In reply to :Gijs Kruitbosch from comment #3)
> (In reply to Pascal Chevrel:pascalc from comment #2)
> > In firefox 47, entering reader mode and clicking an anchor leads to loading
> > the folder and using the anchor as filename, ex:
> > 
> > 1/ go to http://www.gutenberg.org/files/3300/3300-h/3300-h.htm
> > 2/ Enter reader mode
> > 3/ click on 'Introduction'
> > expected result:
> > display http://www.gutenberg.org/files/3300/3300-h/3300-h.htm#link2H_4_0002
> > in reader mode
> > actualresult:
> > url loaded is http://www.gutenberg.org/files/3300/3300-h/#link2H_INTR
> > 
> > In Firefox 50, I can see the behaviour you are seeing, it seems broken to me
> > in both 47 and 50 but in different ways.
> 
> I'm aware of all of that - I wrote the fix that changed the behaviour in
> this way. :-\
> 
> The fix is on 48. Why do you think this is severe enough to track (for 50?)
> if it didn't work in 47 and before either?

Because that's a behaviour change currently on Nightly that we may not want to see ship in this current state in 50 and if it does, it might require a release note mention as a known issue.
Flags: needinfo?(pascalc)
(In reply to Pascal Chevrel:pascalc from comment #4)
> (In reply to :Gijs Kruitbosch from comment #3)
> > (In reply to Pascal Chevrel:pascalc from comment #2)
> > > In firefox 47, entering reader mode and clicking an anchor leads to loading
> > > the folder and using the anchor as filename, ex:
> > > 
> > > 1/ go to http://www.gutenberg.org/files/3300/3300-h/3300-h.htm
> > > 2/ Enter reader mode
> > > 3/ click on 'Introduction'
> > > expected result:
> > > display http://www.gutenberg.org/files/3300/3300-h/3300-h.htm#link2H_4_0002
> > > in reader mode
> > > actualresult:
> > > url loaded is http://www.gutenberg.org/files/3300/3300-h/#link2H_INTR
> > > 
> > > In Firefox 50, I can see the behaviour you are seeing, it seems broken to me
> > > in both 47 and 50 but in different ways.
> > 
> > I'm aware of all of that - I wrote the fix that changed the behaviour in
> > this way. :-\
> > 
> > The fix is on 48. Why do you think this is severe enough to track (for 50?)
> > if it didn't work in 47 and before either?
> 
> Because that's a behaviour change currently on Nightly that we may not want
> to see ship in this current state in 50 and if it does, it might require a
> release note mention as a known issue.

I don't think this is true. There's a behaviour change in 48, compared to 47 (not specifically on Nightly, see the dep), and it's gone from awful to slightly less awful. We didn't have a release note for the awful version, there's no need to have one for the slightly less awful one.
Flags: needinfo?(gijskruitbosch+bugs)
Priority: -- → P2
Blocks: 1286221
Duplicate of this bug: 1180127
Duplicate of this bug: 1218226
Whiteboard: [reader-mode-firefox-integration]
Duplicate of this bug: 1286837
There are two reasons it doesn't work.

1) we (or readability?) put an xml:base tag on the readability page.
2) readability strips anchor ids, AFAICT, so the internal links are broken even if you input them manually.
Boris, it seems the xml:base tag on the main reader fragment is set by nsIParserUtils::ParseFragment. I can obviously easily remove it from the callsite after it does this, but I'm wondering what the security implications are. AIUI the main reason the param exists is to make sure that links inside the fragment are resolved relative to the original URI, but reader mode rewrites all the <a> links itself anyway, and AIUI both reader mode and nsTreeSanitizer should be stripping out data/javascript links. So I'm not sure it's actually having any security impact (and it wouldn't stop data/javascript: URIs anyway).

The TL;DR here is that the base URI breaks the hash links which should be resolved in the about:reader?url=<escaped uri document instead of the base URI (the original URI) against which they're being resolved instead. The trivial resolution would be to remove the base URL, but I'd like to understand the security implications of that better and/or get alternative suggestions...
Flags: needinfo?(gijskruitbosch+bugs) → needinfo?(bzbarsky)
There are no security implications I'm aware of.  This isn't a security feature; it's a "make the relative links work" feature.

Note that you can just pass a null base URI to parseFragment if you don't want it to add the xml:base things.
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #11)
> There are no security implications I'm aware of.  This isn't a security
> feature; it's a "make the relative links work" feature.
> 
> Note that you can just pass a null base URI to parseFragment if you don't
> want it to add the xml:base things.

AFAICT it gets inserted into the fragment before it gets to nsTreeSanitizer who uses element->GetBaseURI() to make security decisions (notably can X link to Y), so I think not passing it isn't quite the same. :-(
Ah, looks like nsTreeSanitizer converts all URLs to absolute.  So yeah, you do have to pass the URI.  This API needs a lot more documentation.  :(
Oh, I guess it doesn't actually convert URLs to absolute.  It just removes the ones it couldn't have done that with, but leaves the others as-is.

I guess that's where your question about security comes from too.  Well, your new URL will be non-hierarchical, so relative URIs will generally fail anyway.  One thing to worry about is <a href="?url=stuff"> or something like that, right?
(In reply to Boris Zbarsky [:bz] from comment #14)
> Oh, I guess it doesn't actually convert URLs to absolute.  It just removes
> the ones it couldn't have done that with, but leaves the others as-is.
> 
> I guess that's where your question about security comes from too.  Well,
> your new URL will be non-hierarchical, so relative URIs will generally fail
> anyway.  One thing to worry about is <a href="?url=stuff"> or something like
> that, right?

Right. AIUI the reader mode code already takes care of this by making all URLs absolute, but I don't trust it as much as I trust nsIParserUtils to be thorough and actually "get" everything. :-\

I could of course try to fix all the hash links after the whole thing has gone through (wrap them all in other <base> tags with a different URL or something?), but I don't know if that actually works...
You could go through and absolutize all the hash links against the document URL, I guess....
(In reply to Boris Zbarsky [:bz] from comment #14)
> Oh, I guess it doesn't actually convert URLs to absolute.  It just removes
> the ones it couldn't have done that with, but leaves the others as-is.
> 
> I guess that's where your question about security comes from too.  Well,
> your new URL will be non-hierarchical, so relative URIs will generally fail
> anyway.  One thing to worry about is <a href="?url=stuff"> or something like
> that, right?

... although from a quick test, none of that actually works. Neither "foo" or "?foo" work because newURI with a base URI for about: URIs always fails (back to bug 1280584, which addresses that only for the "#foo" case).

Not sure how fragile relying on that failing would be.
The general solution here is bug 1204818, of course - we should just have a document that doesn't have the about: documentURI in the first place. But that's a non-trivial change, though all this hemming and hawing sure makes it sound more attractive.
Tracking 50- for now - we can track this for 50. :Gijs both this bug and the bug in Comment 18 aren't assigned to anybody - should they be?
Flags: needinfo?(gijskruitbosch+bugs)
NB: this patch explicitly does not fix keeping <a name/id=foo> markup where that's currently not kept, leading to dead links, because that's an issue with Readability and needs fixing in the relevant github repo, rather than in m-c. This means that the fix will help some testcases more than others.
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Flags: needinfo?(gijskruitbosch+bugs)
Comment on attachment 8784398 [details]
Bug 1280888 - make hash links in about:reader absolute to avoid kicking people out of reader mode unnecessarily,

https://reviewboard.mozilla.org/r/73864/#review72424
Attachment #8784398 - Flags: review?(jaws) → review+
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/baa5b9c2dc71
make hash links in about:reader absolute to avoid kicking people out of reader mode unnecessarily, r=jaws
https://hg.mozilla.org/mozilla-central/rev/baa5b9c2dc71
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Is it something we want to uplift to 50? Too late for 49
(In reply to Sylvestre Ledru [:sylvestre] from comment #25)
> Is it something we want to uplift to 50? Too late for 49

I'd rather this just rode the trains - it hasn't worked "well" since basically forever. If/when we also make fixes to how we keep hash links in the document we can make sure they ride/uplift to the same train so we don't just swap one bad behaviour for another. 50 goes to beta "soon" which gives us much less time to do that.
Depends on: 1340799
You need to log in before you can comment on or make changes to this bug.