Closed Bug 1640490 Opened 4 years ago Closed 4 years ago

Reader mode shows wrong content on some pages because it always re-fetches the article from the original URL

Categories

(Toolkit :: Reader Mode, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox-esr68 --- unaffected
firefox76 --- unaffected
firefox77 --- unaffected
firefox78 + fixed
firefox79 --- fixed

People

(Reporter: evilpie, Assigned: enndeakin)

References

(Regressed 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Bug 1591469 broke something for login protected sites. Since this change the reader mode will just contain a "readable" version of the login form, instead of the blog post after login. I can reproduce this on [removed], but I obviously can't share that password.

12:03.66 INFO: Last good revision: c060bcf8ea7a089333bfd0d99f096ef797ceb390
12:03.66 INFO: First bad revision: 217394da44ac9b55df2f91bf639944ea47949b3d
12:03.66 INFO: Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c060bcf8ea7a089333bfd0d99f096ef797ceb390&tochange=217394da44ac9b55df2f91bf639944ea47949b3d

Flags: needinfo?(enndeakin)
Summary: Read mode doesn't work on login-protected pages → Reader mode doesn't work on login-protected pages

[Tracking Requested - why for this release]:

Ah. I can also reproduce this on: https://www.zeit.de/mobilitaet/2020-05/hyundai-nexo-wasserstoff-brennstoffzelle-elektroauto-automarkt-deutschland. Click on "Einverstanden" under "zeit.de mit Werbung".

It looks like the article promise that's supposed to be coming from the content that we parse when clicking the button is not available, so we fall back to loading the content from the URL, which is why things break. I don't know why that promise is now null in these cases (or maybe always, but we just don't notice on other content?).

Summary: Reader mode doesn't work on login-protected pages → Reader mode shows wrong content on some pages

(In reply to :Gijs (he/him) from comment #2)

I don't know why that promise is now null in these cases (or maybe always, but we just don't notice on other content?).

Seems to be always - you can reproduce with e.g. a BBC article, by modifying the article text to contain some bogus text ("haha this should show up in reader mode") and then pressing the reader mode button. In working builds, the text shows up. In broken builds, it does not.

Keywords: regression
Summary: Reader mode shows wrong content on some pages → Reader mode shows wrong content on some pages because it always re-fetches the article from the original URL

OK, so this is broken because actor A (say, for zeit.de ) sets this._articlePromise, but then actor B (for about:reader?url=...zeit.de...) gets used for the reader mode document, because it's a different document and origin, and when it checks it won't have the article promise. So the passing of that promise got broken.

I'm not really sure what the best solution here is. For non-fission, we won't process switch the child, so we could store the pending promises in a weakmap keyed on browsingcontexts that were in-process. But that won't work for fission. To be fission-compatible, we could potentially pass the information through the parent, but (a) we'd be passing content from one (potentially compromised) child process to another, which is a bit unfortunate, and (b) we'd be passing quite large strings around, which feels unfortunate. Hopefully Neil has some ideas.

So if I understand this, the existing loaded document is processed and a new document tree for reader mode is generated from that. Then that is inserted into the about reader document, and presumably doesn't contain any scripts or anything, since it is not the same domain as the source page? So in fission mode, the reader mode version would not be in the same process as the original.

So this worked before because the legacy actors were reused for the reader mode version? But would never have worked in fission?

Flags: needinfo?(enndeakin)

(In reply to Neil Deakin from comment #6)

So if I understand this, the existing loaded document is processed and a new document tree for reader mode is generated from that. Then that is inserted into the about reader document,

Yes.

and presumably doesn't contain any scripts or anything, since it is not the same domain as the source page?

Readability strips some of it itself, but we also run everything through the sanitizer. So no scripts (nor much else, e.g. unfortunately youtube embeds also don't work, and nobody has bothered teaching our sanitizer implementation about srcset, either, etc...).

So in fission mode, the reader mode version would not be in the same process as the original.

Yes.

So this worked before because the legacy actors were reused for the reader mode version? But would never have worked in fission?

I guess so, yes. It wasn't always an actor, it used to just be a frame script...

Anyway, what should we do about it, going forward? :-)

(In reply to Julien Cristau [:jcristau] from comment #5)

Should we back out bug 1591469 for 78?

This is definitely the simplest, but isn't a permanent solution.

Flags: needinfo?(enndeakin)

78 is now on beta. A straight backout conflicts with further changes to AboutReader.jsm though.

Assignee: nobody → gijskruitbosch+bugs
Severity: -- → S3
Status: NEW → ASSIGNED
Priority: -- → P1

I agree this should be P1 but I would really prefer not to own this. I've just come up with a backout patch that applies to beta...

Assignee: gijskruitbosch+bugs → nobody
Status: ASSIGNED → NEW
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED

Would it be feasible to have some mechanism that always reuses the same process for the reader mode version as the original page?

Flags: needinfo?(enndeakin) → needinfo?(gijskruitbosch+bugs)

(In reply to Neil Deakin from comment #11)

Would it be feasible to have some mechanism that always reuses the same process for the reader mode version as the original page?

It's the same process right now, it just won't be under Fission, I think - different origin, so different process. And we definitely do not want the about:reader privileged code running on the same origin as the web content. We could conceivably put the about:reader content in some kind of iframe inside the privileged frame, and then make the iframe same-origin with the original page, but that's a lot of work, has complications (like how to make the iframe appear + behave seamless) and would upset the current actor set-up anyway - plus we ideally, long-term, want to move the content into a sandboxed iframe (cf. bug 1204818) which due to the sandboxing would likely make it not same-origin again.

I see phabricator assigned me anyway, despite comment #10, so that's nice...

Flags: needinfo?(gijskruitbosch+bugs)

This issue seems to be breaking reader mode on getpocket.com

Last good revision: c060bcf8ea7a089333bfd0d99f096ef797ceb390
First bad revision: 217394da44ac9b55df2f91bf639944ea47949b3d
Pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=c060bcf8ea7a089333bfd0d99f096ef797ceb390&tochange=217394da44ac9b55df2f91bf639944ea47949b3d

  1. Go to https://getpocket.com/explore/item/a-jury-of-peers?utm_source=pocket-newtab
  2. Press F9 to toggle reader view

Results:

Failed to load article from page

There may have been a change to the getpocket.com site that is compounding this issue. I noticed when testing with mozregression, the first navigation to an article on getpocket.com on each profile would give me an 500 error page. A refesh would then show the article.

Also in my reproduction steps, if you exit reader mode, you also get a 500 error page.

Seems similar to bug 1633431.

So it sounds like long term we can't really avoid passing the data from the child process to the parent and then to the new reader mode process.

Comment on attachment 9153146 [details]
Bug 1640490 - Backed out changeset 217394da44ac (bug 1591469) for breaking about:reader in a lot of cases, r?neildeakin

Beta/Release Uplift Approval Request

  • User impact if declined: Reader mode doesn't work with paywalls or when article content is fetched dynamically
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: n/a
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): This is a backout of a frontend-only patch
  • String changes made/needed: none
Attachment #9153146 - Flags: approval-mozilla-beta?

I also have a mostly done patch that sends the parsed article data up to the parent and on to the reader mode process.

Comment on attachment 9153146 [details]
Bug 1640490 - Backed out changeset 217394da44ac (bug 1591469) for breaking about:reader in a lot of cases, r?neildeakin

backout to fix a regression with reader mode, approved for 78.0b3

Attachment #9153146 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Neil Deakin from comment #17)

I also have a mostly done patch that sends the parsed article data up to the parent and on to the reader mode process.

Alright, going to pass this on to you then. :-)

Assignee: gijskruitbosch+bugs → enndeakin

This allows the correct article to be displayed even when the reader mode ends up using a different process than the original page,

Pushed by neil@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6323c2dfd00f
cache the article data in the parent process then retrieve it in the reader mode process when entering reader mode, r=Gijs
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla79
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: