Closed Bug 1333147 Opened 3 years ago Closed 3 years ago

Clicking "Back" from a link in about:sync-log causes hang

Categories

(Core :: DOM: Security, defect, P1)

53 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla55
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 + verified
firefox54 + verified
firefox55 --- verified

People

(Reporter: Fanolian+BMO, Assigned: ckerschb)

References

()

Details

(Keywords: regression, reproducible, Whiteboard: [domsecurity-active])

Attachments

(2 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1332589+++

Version 	54.0a1
Build ID 	20170123080351

STR:
0. Check if there is any files in [profile]\weave\logs. If not, put a text file in there for the next steps.
1. Restart browser. (so workaround is not triggered unexpectedly before testing)
2. Open about:sync-log. Click a file. (DO NOT click "Up to higher level directory" as it should have fixed by bug 1329032)
3. Once the file is loaded, click "Back".

Actual result:
The tab hangs. Browser Console shows:
> Security Error: Content at file:///[profile path]/weave/logs/ may not load or link to about:sync-log.

Alternative STR:
1. Restart browser.
2. Open about:sync-log.
3. Reload the tab. ("Security Error" appears in Browser Console)
4. Click a file. (Another instance of "Security Error", and the tab hangs)

=================
Workaround:
If I click "Up to higher level directory" first and navigate back to the logs, the tabs does not hang.
I think the workaround works like bug 1332595 comment 4: do something that works first, and it makes other errors go away.
Blocks: 1182569
Has STR: --- → yes
See Also: → 1332589, 1329032
[Tracking Requested - why for this release]:
Regression caused by Bug 1182569
I didn't intend to CC :mconley, how do I remove him from the CC list?
Sorry for the spam.

(I can only clone my own bug due to bug 1330316. I was not aware of the CC list. Sorry again.)
Is this specific to about:sync-log? What happens if you e.g. add a file:/// link to about:support somewhere (use the devtools inspector if there isn't one already) ?
Flags: needinfo?(Fanolian+bugzilla)
(In reply to :Gijs from comment #3)
> Is this specific to about:sync-log? What happens if you e.g. add a file:///
> link to about:support somewhere (use the devtools inspector if there isn't
> one already) ?

I hope I tested it right. I only tested some about: pages that I may use.

How I tested:
1. Change a link in some about: pages to file:/// linking to a log in [profile]/weave/logs.
2. (Single, left) click on the file:/// link. See what happens.
3. If the file loads, click Back and see what happens.

What I tested:
about:support
↳ WORKS (the file loads. Clicking Back does not hang the tab.)

about:memory
↳ WORKS

about:crashes
↳ WORKS

about:about
↳ WORKS

about:home (Change a link in the snippet)
↳ File does not attempt to load. No hangs. 1 "Security Error" message on the first click. Subsequent clicks creates 2 "Security Error" messages on each click.

about:newtab (Created a tile linking to file:///. This may be more relevant to bug 1332595.)
↳ 1 "Security Error" message on click. File attempts to load but fails. Tab hangs.
(In reply to Fanolian from comment #0)
> Workaround:
> If I click "Up to higher level directory" first and navigate back to the
> logs, the tabs does not hang.
> I think the workaround works like bug 1332595 comment 4: do something that
> works first, and it makes other errors go away.

"Navigate back" here means clicking "logs" in the folder tree.
If I click "Up to higher level directory" and then click Back (from Nightly's UI), the tab hangs just as what this bug is about.

In summary, in about:sync-log's landing page:
A) Click a file, then click "Back" from UI
↳ HANGS

B) Click "Up to higher level directory" (it's just another file:///), then click "Back" from UI
↳ HANGS

C [the workaround]) Click "Up to higher level directory", click "logs" in the folder tree
↳ No more hangs nor "Security Error" messages even if I repeat A) or B) again.
Flags: needinfo?(Fanolian+bugzilla)
Yeah, about:home isn't privileged so it can't link to file:/// . As you noted, there's a separate bug for about:newtab.

Given that it works in other cases, I think this is caused by about:sync-log being 'special' - it's really just a 'forward' to a file:/// document (folder listing). I wonder if we should change how it's implemented, or if something (what?) is getting confused about its principal here.

One more question: can you still reproduce the hanging with about:sync-log if you set browser.tabs.remote.separateFileUriProcess to false?
Flags: needinfo?(Fanolian+bugzilla)
(In reply to :Gijs from comment #6)
> One more question: can you still reproduce the hanging with about:sync-log
> if you set browser.tabs.remote.separateFileUriProcess to false?

I can still reproduce the hanging with browser.tabs.remote.separateFileUriProcess set to false.
Flags: needinfo?(Fanolian+bugzilla)
We can repdrocude. Potentially this gets fixed by Bug 1332595. I will verify as soon as I can.
Assignee: nobody → ckerschb
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: -- → P1
Whiteboard: [domsecurity-active]
(In reply to Christoph Kerschbaumer [:ckerschb] from comment #8)
> We can repdrocude. Potentially this gets fixed by Bug 1332595. I will verify
> as soon as I can.

So, this bug definitely does not get fixed by Bug 1332595. Interestingly the problem does not occur in regular mode (not e10s mode) which makes me think that we have a similar problem as in Bug 1331686 where we facing problems serializing the principal between child and parent. This needs more investigation obviously.
Finally I am getting around to testing/fixing the problem here. The important arguments within the contentSecurityManager are:

doContentSecurityCheck {
  channelURI: about:sync-log
  loadingPrincipal: nullptr
  triggeringPrincipal: file:///home/ckerschb/.mozilla/firefox/39vrr6v8.weave/weave/logs/
  principalToInherit: NullPrincipal
  contentPolicyType: 6
  securityMode: SEC_ALLOW_CROSS_ORIGIN_DATA_IS_NULL, 
  initalSecurityChecksDone: no
  enforceSecurity: no
}
Again, in non e10s mode this works just fine, but it's interesting that the contentSecurityManager is not consulted when clicking 'back' in non-e10s mode. Probably we don't create a new channel but load that page somewhere from the cache.
It's possible that in non-e10s mode we bfcache things, yes.

But... Clicking the back button should not have the currently-loaded thing as the _triggering_ principal.  The triggering principal for a history load should come from the SHEntry, right?  What exactly happens there?
Assuming 54 is also affecting, tracking this. Christoph are you still working on this?
Flags: needinfo?(ckerschb)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #13)
> Assuming 54 is also affecting, tracking this. Christoph are you still
> working on this?

Yeah - I am on it.
Flags: needinfo?(ckerschb)
Attached file stacktrace.txt
(In reply to Boris Zbarsky [:bz] (still a bit busy) from comment #12)
> It's possible that in non-e10s mode we bfcache things, yes.
> 
> But... Clicking the back button should not have the currently-loaded thing
> as the _triggering_ principal.  The triggering principal for a history load
> should come from the SHEntry, right?  What exactly happens there?

So, it's hard to find the root cause here. So far, I got a stacktrace from where we actually call nsSHEntry::SetURI("about:sync-log"). Since we don't call nsSHEntry::Create() I figured that must be a good starting point to figure out where the SetURI() call is coming from. Please see attached stacktrace. Unfrotunately |print DumpJSStack();| does not provide useful information.

However, in the stacktrace we see |SessionStore:restoreHistory|, so I traced down the calls around restoreHistory. Even though I haven't found the exact problem I realized that we collect tabState which we then JSON.stringify [2] and pass along to session restore. When working on Bug 1307736 we added triggeringprincipal_base64 to all of the tests [1]. So I assume we have to add a serialized triggeringPrincipal for those entries as well, probably somewhere around [2].

This needs further investigation (but has to wait till tomorrow)!

[0] https://dxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/SessionStore.jsm#2249
[1] https://dxr.mozilla.org/mozilla-central/search?q=triggeringprincipal_base64&redirect=false
[2] https://dxr.mozilla.org/mozilla-central/source/browser/components/sessionstore/TabState.jsm#129
Attached patch bug_1333147_weave_bug.patch (obsolete) — Splinter Review
Boris, I finally figured the problem, but I don't know how to fix it properly. The good news - it seems all the serialization/deserialization I was worried about in comment 15 works correctly.

The problem is within Weave.js, where we actually set an owner on the channel (see patch), which then causes the history entry to be created using a triggeringPrincipal of a file:// instead of a systemPrincipal [1].

When I set the owner to null within Weave.js, I create the history entry using a triggeringPrincipal of system and a principalToInherit of NullPrincipal for about:sync-log, which is what we want I suppose.

In turn, that change causes Security checks to fail (I marked the two checks that fail within the patch) and which is also why the owner on the channel was set in the first place within Bug 781323 [2].

How do you think we should proceed here?

[1] https://dxr.mozilla.org/mozilla-central/source/docshell/base/nsDocShell.cpp#12360
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=781323#c3
Flags: needinfo?(bzbarsky)
Hm...  So that docshell code looks conceptually wrong to me.  The channel owner is something that gets set to affect the principal of the thing being loaded from the channel.  It's not the principal of the thing that triggered the load.

This code initially got added for the "make various things that inherit their channel owner end up with the same channel owner" thing, and back when all we stored on the SHEntry was that one owner this kinda sorta made sense.  But now that we have explicit triggeringPrincipal and principalToInherit on the channel, I don't think we should be looking at the channel owner here anymore.  That is, I think the changes we made in bug 1286472 weren't quite right.
Flags: needinfo?(bzbarsky)
Boris, I saw you are not accepting review requests, but since you already pointed out the problem I was hoping you could review the patch anyway.

(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #17)
> Hm...  So that docshell code looks conceptually wrong to me.  The channel
> owner is something that gets set to affect the principal of the thing being
> loaded from the channel.  It's not the principal of the thing that triggered
> the load.

I agree and here is the patch for it. I manually verified that it fixes the problem described in comment 0 (click .txt file within about:sync-log and then click back so that about:sync-log loads again). I also ran tests locally - seems to work, let's make sure we are not missing something obvious:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=64a7751ebe93233d8dc40cd995cd0a36e19d2368
Attachment #8844886 - Attachment is obsolete: true
Flags: needinfo?(bzbarsky)
Comment on attachment 8845952 [details] [diff] [review]
bug_1333147_do_not_check_owner_for_history_loads.patch

r=me
Flags: needinfo?(bzbarsky)
Attachment #8845952 - Flags: review+
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/712bc9e923c7
Do not use owner as triggeringPrincipal when creating session history entry. r=bz
Keywords: checkin-needed
Comment on attachment 8845952 [details] [diff] [review]
bug_1333147_do_not_check_owner_for_history_loads.patch

Approval Request Comment
The problem we are facing was originally introduced by Bug 1286472 but did have not effect until we have converted docshell to perform additional security checks within asyncOpen2() which landed within Bug 1182569. We should uplift that changeset within this bug since it might impact other history loads as well. Please note it's still a corner case, regular history loads work as expected.

[Feature/Bug causing the regression]:
Bug 1182569 - Use asyncOpen2() for docshell loads (nsJSProtocolHandler and nsURILoader)

[User impact if declined]:
We are only aware of this particular problem but users might have problems loading from history in certain edge cases. In particular when clicking 'back' like in this bug reported.

[Is this code covered by automated tests?]:
No, manual verification only. It's quite complicated to write an automated test for this scenario. Please note that we have decent test coverage for regular history loads.

[Has the fix been verified in Nightly?]:
Not yet.

[Needs manual test from QE? If yes, steps to reproduce]:
Yes please. One should use the STRs provided in Comment 0.

[List of other uplifts needed for the feature/fix]:
---

[Is the change risky?]:
No, since we have a loadinfo on all channels, we should rely on the triggeringPrincipal rather then the owner of a channel. Even the code removed had a comment about that.

[Why is the change risky/not risky?]:
See previous comment.

[String changes made/needed]:
No.
Attachment #8845952 - Flags: approval-mozilla-beta?
Attachment #8845952 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/712bc9e923c7
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Hi Brindusa, could you help find someone to verify if this issue was fixed as expected on a latest Nightly build? Thanks!
Flags: needinfo?(brindusa.tot)
Verified this issue and was not reproducible on latest Nightly build 55.0a1, build ID 20170315030215 on Windows 10 x64 and Mac OS X 10.12.

Note that from the STR described on Comment 0, on older Nightly build 54.0a1, build ID 20170210030206, I was able to reproduce only the second scenario described only on Windows and not on Mac OS X.

Fanolian, could you please check if this is fixed on your end?
Flags: needinfo?(brindusa.tot) → needinfo?(Fanolian+bugzilla)
(In reply to Brindusa Tot[:brindusat] from comment #24)
> Fanolian, could you please check if this is fixed on your end?

The bug is fixed for both STRs. Thank you.
Status: RESOLVED → VERIFIED
Flags: needinfo?(Fanolian+bugzilla)
Based on comments 24 and 25, update the status-firefox55 to verified.
Comment on attachment 8845952 [details] [diff] [review]
bug_1333147_do_not_check_owner_for_history_loads.patch

Fix a hang issue when clicking "Back" from a link in about:sync-log and was verified. Aurora54+ and Beta53+.
Attachment #8845952 - Flags: approval-mozilla-beta?
Attachment #8845952 - Flags: approval-mozilla-beta+
Attachment #8845952 - Flags: approval-mozilla-aurora?
Attachment #8845952 - Flags: approval-mozilla-aurora+
Let's make sure this works as intended on 53 as well.
Flags: qe-verify+
Reproduced the initial issue using old Nightly (2016-01-23), verified that the issue is fixed using latest Developer Edition 54.0a2 and Firefox 53 beta 5 on Windows 7 64bit, macOS 10.12.2 and Ubuntu 16.04 32bit.
You need to log in before you can comment on or make changes to this bug.