Closed Bug 1147337 Opened 9 years ago Closed 9 years ago

Reader View bounces back or loading white page when escaped characters exists in URL

Categories

(Firefox Graveyard :: Reading List, defect)

38 Branch
x86_64
Windows 8.1
defect
Not set
normal

Tracking

(firefox38 verified, firefox39 fixed, firefox40 fixed)

VERIFIED FIXED
Firefox 40
Tracking Status
firefox38 --- verified
firefox39 --- fixed
firefox40 --- fixed

People

(Reporter: petercpg, Assigned: Gijs)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 2 obsolete files)

STR:
1. Set reader.enabled and reader.parse-on-load.enabled to true in about:config
2. Open followings pages and click reading view button in location bar:
  a. http://moztw.org/space/
  b. http://moztw.org/space/?abc
  c. http://moztw.org/space/?;$%^^
  d. http://moztw.org/space/?測試參數

For a and b, reading view should be loading correctly.
For c, reading view loads an empty page.
For d, reading view bounces back to the webpage itself

Expected Result:
Reading view for all links should be loading.
Flags: qe-verify+
(In reply to Peter Pin-Guang Chen [:petercpg] (MozTW.org) from comment #0)
> STR:
> 1. Set reader.enabled and reader.parse-on-load.enabled to true in
> about:config
> 2. Open followings pages and click reading view button in location bar:
>   a. http://moztw.org/space/
>   b. http://moztw.org/space/?abc
>   c. http://moztw.org/space/?;$%^^
>   d. http://moztw.org/space/?測試參數
> 
> For a and b, reading view should be loading correctly.
> For c, reading view loads an empty page.
> For d, reading view bounces back to the webpage itself
> 
> Expected Result:
> Reading view for all links should be loading.

Once bug 1147122 is loaded, you'll see an error message instead of the bouncing back to the original page, but that's still not good.

There are really two different bugs here:

1) For c, I'm seeing an error in ReaderParent.jsm:
malformed URI sequence ReaderParent.jsm:198:0

2) For d, I don't see an error, so that must mean Readability.js isn't finding an article here. I suspect this is also caused by some issue parsing the URI.
(In reply to :Margaret Leibovic from comment #1)
> Once bug 1147122 is loaded, you'll see an error message instead of the
> bouncing back to the original page, but that's still not good.
> 
> There are really two different bugs here:
> 
> 1) For c, I'm seeing an error in ReaderParent.jsm:
> malformed URI sequence ReaderParent.jsm:198:0
> 
> 2) For d, I don't see an error, so that must mean Readability.js isn't
> finding an article here. I suspect this is also caused by some issue parsing
> the URI.

After re-thinking these cases, I would say case c is quite minor as they are only unsafe characters, the content system should encode well. However, c and d can be happened in a mixed form, as SEO url path. Since only few characters are legal in URL, it's common to find this bug happen in content sites (news, blogs) from non-ASCII using regions.
Might be related to Bug 1151205.

For what it's worth, d is fixed by my silly patch from comment 1[0]

[0]: https://bugzilla.mozilla.org/show_bug.cgi?id=1151205#c1
The solution I proposed in Bug 1151205 was erroneous, right fix is:

diff --git a/toolkit/components/reader/AboutReader.jsm b/toolkit/components/reader/AboutReader.jsm
index 271ab6c..02a8f36 100644
--- a/toolkit/components/reader/AboutReader.jsm
+++ b/toolkit/components/reader/AboutReader.jsm
@@ -604,7 +604,7 @@ AboutReader.prototype = {
       return;
     }
 
-    if (article && article.url == url) {
+    if (article && decodeURIComponent(article.url) == url) {
       this._showContent(article);
     } else if (this._articlePromise) {
       // If we were promised an article, show an error message if there's a failure.

I'll attach a patch.
Comment on attachment 8588405 [details] [diff] [review]
Fixed Reader View fails loading when URL contains escaped characters.

Review of attachment 8588405 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/reader/AboutReader.jsm
@@ +603,5 @@
>      if (this._windowUnloaded) {
>        return;
>      }
>  
> +    if (article && decodeURIComponent(article.url) == url) {

I'm a bit wary of this change, since we use article.url in a bunch of other places in the code, including when we add items to the reading list (and we use this as the key when caching article content for Android).

We set article.url here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/components/reader/ReaderMode.jsm#284

I feel like we should probably be doing this decodeURIComponent call here instead. Although I'm not sure if it's expected to have an encoded URL at that point... which would make me wonder how we have an encoded URL by the time we end up here.

Also, I just tried calling `decodeURIComponent("http://moztw.org/space/?;$%^^")` in the console, and I got a malformed URI error, so I don't think this will actually fix that error case.

I saw in bug 1151205 that you suggested just removing this check, and I feel like that may actually be a reasonable approach. This check is very old, and I don't see how it's actually necessary with the current logic (we pass around the URL to try to get the article, so I don't think there could be some sort of state change that would lead to use getting back the wrong article).

It looks like we'd still run into a similar issue on mobile:
http://mxr.mozilla.org/mozilla-central/source/mobile/android/chrome/content/Reader.js#278

But I'm planning to get rid of this in bug 1150174, so we don't need to worry about that case.

So... at the end of my rambling, I think we should just get rid of this URL check. What do you think?
This addresses the malformed URI exception that's happening when we try calling decodeURIComponnt. This appears to work properly, although I'm trying to think if this might paper over any real problems... one concern is that this would allow users to store these malformed URLs in the reading list, but that's already possible if they can add any URL from the urlbar to the reading list.

So, catching that error solves case (c) and removing the article.url check solves case (d).
Attachment #8588812 - Flags: review?(nperriault)
Also, we should really find a shared place for this _getOriginalURL logic. Maybe that would be worth adding to ReaderMode.jsm.
Attachment #8588405 - Flags: review?(margaret.leibovic) → review-
(In reply to :Margaret Leibovic from comment #8)
> So... at the end of my rambling, I think we should just get rid of this URL
> check. What do you think?


Won't this break the case where we're trying to load a different article in reader mode while we're already in reader mode?
Comment on attachment 8588812 [details] [diff] [review]
Fall back to raw URL parameter if decodeURIComponent fails to get original URL

Review of attachment 8588812 [details] [diff] [review]:
-----------------------------------------------------------------

> Also, we should really find a shared place for this _getOriginalURL logic. Maybe that would be worth adding to ReaderMode.jsm.

Yeah, I'd really like we factorize _getOriginalURL before landing this patch. Also, if we could add some form of unit tests for it, that would be a big win (though I have no idea how we could setup some here).
Attachment #8588812 - Flags: review?(nperriault) → review-
Depends on: 1152121
Comment on attachment 8588812 [details] [diff] [review]
Fall back to raw URL parameter if decodeURIComponent fails to get original URL

I decided to split this off into bug 1152121.

We can keep this bug focused on dealing with the escaped character issue.
Attachment #8588812 - Attachment is obsolete: true
(In reply to :Gijs Kruitbosch from comment #11)
> (In reply to :Margaret Leibovic from comment #8)
> > So... at the end of my rambling, I think we should just get rid of this URL
> > check. What do you think?
> 
> 
> Won't this break the case where we're trying to load a different article in
> reader mode while we're already in reader mode?

How do you mean? I believe we create a new AboutReader instance every time we load an about:reader URL, so I don't think we ever change the URL within a single AboutReader instance.
(In reply to :Margaret Leibovic from comment #14)
> (In reply to :Gijs Kruitbosch from comment #11)
> > (In reply to :Margaret Leibovic from comment #8)
> > > So... at the end of my rambling, I think we should just get rid of this URL
> > > check. What do you think?
> > 
> > 
> > Won't this break the case where we're trying to load a different article in
> > reader mode while we're already in reader mode?
> 
> How do you mean? I believe we create a new AboutReader instance every time
> we load an about:reader URL, so I don't think we ever change the URL within
> a single AboutReader instance.

In that case I don't know why we are currently checking the URL...
(In reply to :Gijs Kruitbosch from comment #15)
> (In reply to :Margaret Leibovic from comment #14)
> > (In reply to :Gijs Kruitbosch from comment #11)
> > > (In reply to :Margaret Leibovic from comment #8)
> > > > So... at the end of my rambling, I think we should just get rid of this URL
> > > > check. What do you think?
> > > 
> > > 
> > > Won't this break the case where we're trying to load a different article in
> > > reader mode while we're already in reader mode?
> > 
> > How do you mean? I believe we create a new AboutReader instance every time
> > we load an about:reader URL, so I don't think we ever change the URL within
> > a single AboutReader instance.
> 
> In that case I don't know why we are currently checking the URL...

Yeah, I think this is some vestigal old mobile logic, leftover from when we had different async logic in place for loading the article, where this may have actually been an issue.
Blocks: 1150229
Attached file MozReview Request: bz://1147337/Gijs (obsolete) —
/r/6745 - Bug 1147337 - stop checking article URL as AboutReader.jsm gets created separately every time anyway, r?margaret

Pull down this commit:

hg pull -r d56acc2bb7d8446259de0ba40cc56af25306bd38 https://reviewboard-hg.mozilla.org/gecko/
Attachment #8589955 - Flags: review?(margaret.leibovic)
Comment on attachment 8589955 [details]
MozReview Request: bz://1147337/Gijs

https://reviewboard.mozilla.org/r/6743/#review5641

Ship It!
Attachment #8589955 - Flags: review?(margaret.leibovic) → review+
Keywords: checkin-needed
remote:   https://hg.mozilla.org/integration/fx-team/rev/6be66458eff4
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Iteration: --- → 40.1 - 13 Apr
Points: --- → 1
Flags: in-testsuite-
Flags: firefox-backlog+
Keywords: checkin-needed
Blocks: 1132074
Comment on attachment 8589955 [details]
MozReview Request: bz://1147337/Gijs

Approval Request Comment
[Feature/regressing bug #]: Reader Mode
[User impact if declined]: can't use reader mode on pages with particular characters in the URL or that are used as single-page apps
[Describe test coverage new/current, TreeHerder]: nope :-(
[Risks and why]: pretty low, one-line change which we're confident was overzealous in checking URLs
[String/UUID change made/needed]: nope
Attachment #8589955 - Flags: approval-mozilla-beta?
Attachment #8589955 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/6be66458eff4
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 40
QA Contact: andrei.vaida
Comment on attachment 8589955 [details]
MozReview Request: bz://1147337/Gijs

Should be in 38 beta 4
Attachment #8589955 - Flags: approval-mozilla-beta?
Attachment #8589955 - Flags: approval-mozilla-beta+
Attachment #8589955 - Flags: approval-mozilla-aurora?
Attachment #8589955 - Flags: approval-mozilla-aurora+
Mistakenly filed against Firefox 38 and should be instead 38 Branch. Sorry for the spam. dkl
Version: Firefox 38 → 38 Branch
I'm still seeing the issue related to: 
>   c. http://moztw.org/space/?;$%^^
on Beta 38.0b4-build1 (20150413143743), Aurora 39.0a2 (2015-04-15) and Nightly 40.0a1 (2015-04-15), using Ubuntu 14.04 (x64), Windows 7 (x64) and Mac OS X 10.9.5. Here's a screenshot - http://i.imgur.com/TE8WntQ.png.

Gijs, am I missing something here?
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Andrei Vaida, QA [:avaida] from comment #27)
> I'm still seeing the issue related to: 
> >   c. http://moztw.org/space/?;$%^^
> on Beta 38.0b4-build1 (20150413143743), Aurora 39.0a2 (2015-04-15) and
> Nightly 40.0a1 (2015-04-15), using Ubuntu 14.04 (x64), Windows 7 (x64) and
> Mac OS X 10.9.5. Here's a screenshot - http://i.imgur.com/TE8WntQ.png.
> 
> Gijs, am I missing something here?

This also needs the fix from bug 1152121 which hasn't been uplifted yet.
Flags: needinfo?(gijskruitbosch+bugs)
FWIW, you should be able to test for this bug being fixed specifically against pages that do have %-encoded things in the URL, but where that %-encoding is actually valid.
Verified fixed on Beta 38.0b6-build1 (20150420134330), using Windows 7 (x64), Ubuntu 14.04 (x64) and Mac OS X 10.9.5.
Status: RESOLVED → VERIFIED
Looks like there's no more manual qa verification needed here.
Flags: qe-verify+
Attachment #8589955 - Attachment is obsolete: true
Attachment #8619867 - Flags: review+
See Also: → 1225448
Product: Firefox → Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: