Closed Bug 1419007 Opened 3 years ago Closed 3 years ago

browser.documentURI goes out of sync when doing error page loads

Categories

(Core :: DOM: Security, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla59
Tracking Status
firefox-esr52 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- verified
firefox60 --- verified

People

(Reporter: jkt, Assigned: jkt)

References

(Blocks 1 open bug)

Details

(Keywords: regression, regressionwindow-wanted, sec-low, Whiteboard: [domsecurity-backlog1][adv-main59-])

Attachments

(1 file, 7 obsolete files)

Sometimes for error loads we see that the documentURI isn't updated for refreshIdentityPopup code.

STR:
1. Set "security.insecure_connection_icon.enabled" true from Bug 1310447
2. Visit badssl.com
2. Click "dh480"

Expected:

No padlock is present

Actual:

Broken padlock is shown


On reload of the page the padlock isn't present.

---

As Johann pointed out on IRC there is a special case in DocShell that is calling FireOnLocationChange when an error occurs to change the location: https://searchfox.org/mozilla-central/source/docshell/base/nsDocShell.cpp#9321

This in turn calls refreshIdentityPopup where the documentURI hasn't updated yet. This means the code can't compare in all cases if this page load is an "unknownIdentity". So when Bug 1310447 prefs are enabled sometimes you will notice that the padlock is shown for error pages when it should be unknown.

This is also similar to Bug 1170488 where the documentURI gets out of sync when loading pages.
This is a WIP patch :ckerschb whilst Bug 1310447 is still outstanding, I can add a test once that lands.

It looks like errorOnLocationChangeNeeded and onLocationChangeNeeded aren't needed at the same time, would it make sense to store the channel, uri and flags in a variable and use the same function call?

So something like:

    if (failedURI) {
      bool errorOnLocationChangeNeeded = OnNewURI(
        failedURI, failedChannel, triggeringPrincipal,
        nullptr, mLoadType, false, false, false);
      if (errorOnLocationChangeNeeded) {
        onLocationChangeNeeded = true;
        locationChangeFlag = LOCATION_CHANGE_ERROR_PAGE;
        locationChangeRequest = failedChannel;
        locationChangeURI = failedURI;

      }
    }
...

  if (onLocationChangeNeeded) {
    FireOnLocationChange(this, locationChangeRequest, locationChangeURI, locationChangeFlag);
  }
Comment on attachment 8930436 [details]
Bug 1419007 - ensure errors FireOnLocationChange after documentURI has changed in docshell.

https://reviewboard.mozilla.org/r/201598/#review206906

I think that makes sense to me, so I'am f+ing this patch. Ultimately we should wait for the changes of Bug1310447 and land this together with some automated test. Please note that I am not a docshell peer, hence someone else ultimately needs  to sign off on that change.
Attachment #8930436 - Flags: review?(ckerschb)
(In reply to Jonathan Kingston [:jkt] from comment #3)
> This is a WIP patch :ckerschb whilst Bug 1310447 is still outstanding, I can
> add a test once that lands.
> 
> It looks like errorOnLocationChangeNeeded and onLocationChangeNeeded aren't
> needed at the same time, would it make sense to store the channel, uri and
> flags in a variable and use the same function call?

Potentially yes, but then again I am not a docshell peer :-(
Flags: needinfo?(ckerschb)
Priority: -- → P3
Whiteboard: [domsecurity-backlog1]
So, uh, this is really weird, but mozregression confidently gives me bug 1414745 as regressor (I can't reproduce it in beta), though that landed only a couple of hours ago... I'm really unsure what to make of this. It seems like this is a regression in any case. Can someone else try to get a regression window? If I enter the date when the bug was created as "bad" date I can't reproduce in mozregression...

Really strange.
It seems like contentPrincipal is also affected by this bug, which makes the neterror pages inherit the contentPrincipal from the previous page, which in turn means it's possible for them to get chrome privileges when coming from about:privatebrowsing (see bug 1421647). I can reproduce that all the way to release. I think that's a point where I'll make this bug non-public, to avoid 0-daying ourselves (I can't see anything actively exploitable here but who knows what else we might uncover).
Group: core-security
(In reply to Johann Hofmann [:johannh] from comment #6)
> So, uh, this is really weird, but mozregression confidently gives me bug
> 1414745 as regressor (I can't reproduce it in beta), though that landed only
> a couple of hours ago... I'm really unsure what to make of this. It seems
> like this is a regression in any case. Can someone else try to get a
> regression window? If I enter the date when the bug was created as "bad"
> date I can't reproduce in mozregression...
> 
> Really strange.

If it was intermittent, bug 1414745 probably has made it 100% reproducible.

onStateChange is also a place where documentURI gets updated:
https://searchfox.org/mozilla-central/rev/9f3bd430c2b132c86c46126a0548661de876799a/toolkit/modules/RemoteWebProgress.jsm#238

The buggy counters in nsBrowserStatusFilter::OnStateChange sometimes filtered out non-network requests, sometimes not. I took an aggressive approach to always filtered out non-network requests in bug 1414745 and removed the counters.

Fix onLocationChange sounds the correct way to me.
My regression window is the same so no idea how I managed to get that 8 days ago!

Perhaps however it is intermittent as a race condition however I'm not able to replicate in stable at all.

I'm just in the process of double checking the patch for the issues Johann described. I can neaten it up as I mentioned too.

Do we think we should be uplifting this too?
gBrowser.contentPrincipal is fixed in my patch as :johannh mentioned this is trackable all the way to stable! However I can't replicate the gBrowser.selectedBrowser.documentURI.spec test I was running I think is the regression caused by bug 1414745 to at least be consistent.

I haven't been able to replicate the URI issue in any build before that.
Group: core-security → dom-core-security
The contentViewer (and the document) for the error page was created here:
https://searchfox.org/mozilla-central/rev/9f3bd430c2b132c86c46126a0548661de876799a/docshell/base/nsDocShell.cpp#9314-9316

But it wasn't embedded to docshell until
https://searchfox.org/mozilla-central/rev/9f3bd430c2b132c86c46126a0548661de876799a/docshell/base/nsDocShell.cpp#9470

So when the LOCATION_CHANGE_ERROR_PAGE fires, nsDocShell::mContentViewer was still the previous page, so as nsGlobalWindowOuter::mInnerWindow. Anyone trying to access the inner window or document during that location change would get wrong window / document.
(In reply to Samael Wang [:freesamael] from comment #11)
> The contentViewer (and the document) for the error page was created here:
> https://searchfox.org/mozilla-central/rev/
> 9f3bd430c2b132c86c46126a0548661de876799a/docshell/base/nsDocShell.cpp#9314-
> 9316
> 
> But it wasn't embedded to docshell until
> https://searchfox.org/mozilla-central/rev/
> 9f3bd430c2b132c86c46126a0548661de876799a/docshell/base/nsDocShell.cpp#9470
> 
> So when the LOCATION_CHANGE_ERROR_PAGE fires, nsDocShell::mContentViewer was
> still the previous page, so as nsGlobalWindowOuter::mInnerWindow. Anyone
> trying to access the inner window or document during that location change
> would get wrong window / document.

The obvious question I have after reading this comment is: is there a straightforward way to get the correct document at this point, or do we need to change docshell?
Flags: needinfo?(sawang)
> The obvious question I have after reading this comment is: is there a straightforward way to get the correct document at this point, or do we need to change docshell?

My solution is to place:
        FireOnLocationChange(this, failedChannel, failedURI,
                             LOCATION_CHANGE_ERROR_PAGE);

After the code that is mentioned there, in my latest patch I also only fire one location change. From my testing the other isn't needed and might race however I will wait for :freesamael's review.

I'm just writing up a test or two but having some difficulty getting a clean test.
Attached patch bug-1419007.patch (obsolete) — Splinter Review
Attachment #8930436 - Attachment is obsolete: true
Attachment #8933276 - Flags: review?(sawang)
(In reply to :Gijs from comment #12)
> The obvious question I have after reading this comment is: is there a
> straightforward way to get the correct document at this point, or do we need
> to change docshell?

The reason I walked tho this was mainly to verify :jkt's patch, so yup changing locationchange in dochsell looks the right way to me.
Flags: needinfo?(sawang)
Comment on attachment 8933276 [details] [diff] [review]
bug-1419007.patch

Review of attachment 8933276 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the work! Please check browser_check_identity_state.js does work in e10s before landing, I think you need ContentTask.spawn.

::: browser/base/content/test/siteIdentity/browser_check_identity_state.js
@@ +238,5 @@
> +  await SpecialPowers.pushPrefEnv({set: [[INSECURE_ICON_PREF, secureCheck]]});
> +  let newTab = await loadNewTab("http://example.com/" + DUMMY);
> +
> +  let promise = BrowserTestUtils.waitForErrorPage(gBrowser.selectedBrowser);
> +  content.document.getElementById("no-cert").click();

Does this test run in e10s? I'm expecting that you would need `ContentTask.spawn` to access `content.document`.

@@ +242,5 @@
> +  content.document.getElementById("no-cert").click();
> +  await promise;
> +
> +  is(getIdentityMode(), "unknownIdentity", "Identity should be unknown");
> +  is(getConnectionState(), "not-secure", "Connection should be file");

Should we also verify that newTab.linkedBrowser.documentURI is nocert.example.com?

::: docshell/base/nsDocShell.cpp
@@ +9365,5 @@
>    // OnLoadingSite(), but don't fire OnLocationChange()
>    // notifications before we've called Embed(). See bug 284993.
>    mURIResultedInDocument = true;
> +  bool errorOnLocationChangeNeeded = false;
> +  // Make sure we have a URI to set currentURI.

Probably better to keep this comment above the NS_GetFinalChannelURI call (that's where it was). It's a bit confusing reading this comment right here.
Attachment #8933276 - Flags: review?(sawang) → review+
Attached patch bug-1419007.patch (obsolete) — Splinter Review
Still building this however I think this satisfies all the review comments. Thanks!
Attachment #8933276 - Attachment is obsolete: true
Attached patch bug-1419007.patch (obsolete) — Splinter Review
This seems to work, should we get a QA to verify it too?
Attachment #8933391 - Attachment is obsolete: true
Attachment #8933407 - Flags: review?(sawang)
Is this something that you think should be verified first? We can't run this on try so I'm always super sceptical.

Thanks!
Flags: needinfo?(mwobensmith)
Having the security indicators out of sync can be the basis for sec-high spoofing bugs.
Keywords: sec-high
(In reply to Jonathan Kingston [:jkt] from comment #19)
> Is this something that you think should be verified first? We can't run this
> on try so I'm always super sceptical.
> 
> Thanks!

We usually verify fixed bugs after they've landed. I assume you verified it was fixed, by using your instructions in comment 0?

That said, this is an easy bug to verify, so I'm setting the flags.
Flags: needinfo?(mwobensmith) → qe-verify+
Yeah I verified it, I was just concerned about failing tests on some crazy platform. Not that QE fixes that I guess.
Comment on attachment 8933407 [details] [diff] [review]
bug-1419007.patch

Review of attachment 8933407 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/base/content/test/siteIdentity/browser_check_identity_state.js
@@ +247,5 @@
> +    is(content.window.location.href, "https://nocert.example.com/", "Should be the cert error URL");
> +  });
> +
> +
> +  is(newTab.linkedBrowser.documentURI.spec.startsWith("about:certerror?"), true, "Should be an about:certerror");

Oh, right, documentURI would be the error page. Thanks.
Attachment #8933407 - Flags: review?(sawang) → review+
Comment on attachment 8933407 [details] [diff] [review]
bug-1419007.patch

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

The risk here is that the wrong content principal is used. Another exploit would be needed to make this usable.
The user could be tricked into the new indicator however that isn't a security issue currently.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No, there is a test for the contentPrincipal is correct however no explanation as to why.

Which older supported branches are affected by this flaw?

All branches including stable.

If not all supported branches, which bug introduced the flaw?

I wasn't able to find a regression, it tracks back further than 57.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Difficulty of the backports should be low, it impacts only one function in DocShell that doesn't appear to change very often.

How likely is this patch to cause regressions; how much testing does it need?

There is some risk of regressions regarding:
- History state with cert errors (back and forward or the url bar might not show the usual states)
- Pages may trigger endless error loads, whilst building this patch I was able to cause the window to crash and spawn new windows multiple times until the browser crashed.

I would feel a lot more comfortable if others could run our entire docshell test suite too.
Attachment #8933407 - Flags: sec-approval?
>  If not all supported branches, which bug introduced the flaw? 
> I wasn't able to find a regression, it tracks back further than 57.

What about ESR? That's most of the agenda of that question. 

> Another exploit would be needed to make this usable.

I guess we can call this moderate, then
Keywords: sec-highsec-moderate
> What about ESR? That's most of the agenda of that question. 

Sorry yes it is impacted also.

STR:
1. badssl.com
2. Click dh480
3. in toolbox: gBrowser.contentPrincipal.origin

Expected:
about:neterror?e=nssFailure2&u=https%3A//dh480.badssl.com/&c=UTF-8&f=regular&d=An%20error%20occurred%20during%20a%20connection%20to%20dh480.badssl.com.%0A%0ASSL%20received%20a%20weak%20ephemeral%20Diffie-Hellman%20key%20in%20Server%20Key%20Exchange%20handshake%20message.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY%22%3ESSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY%3C/a%3E%0A

Actual:
https://badssl.com
Comment on attachment 8933407 [details] [diff] [review]
bug-1419007.patch

With this being a sec-moderate now, we don't need sec-approval for it. This can just go into trunk.
Attachment #8933407 - Flags: sec-approval?
Attached patch bug-1419007.patch (obsolete) — Splinter Review
So it looks like "NS_ENSURE_SUCCESS(Embed(viewer, "", nullptr), NS_ERROR_FAILURE);" breaks for content loads that involve safebrowsing. Instead I moved my error load further up and pushed to try as we aren't a high security patch now, it seems like everything else is working still.


https://treeherder.mozilla.org/#/jobs?repo=try&revision=de75f712c8e807e9d4ffda1d06946a22b14577cc

(not sure why windows is bust though, I assume it's unrelated).
Flags: needinfo?(jkt)
Attachment #8934157 - Flags: review?(sawang)
Ok I take that back it looks like a regression Christmas tree.
Looking back into other solutions.
The failing test in the previous backout in Comment 29 looks like the location changes I made in docshell break this safe browsing test: toolkit/components/url-classifier/tests/mochitest/test_threathit_report.html

https://treeherder.mozilla.org/logviewer.html#?job_id=149394899&repo=mozilla-inbound&lineNumber=13523

The response from "utils.makeThreatHitReport" in the test is different from what it was before.

My hunch is that it might only be the test is wrong and the util is expecting a broken viewer / different URL? See Comment 11 as this same line is the same line that caused this test to fail:
> NS_ENSURE_SUCCESS(Embed(viewer, "", nullptr), NS_ERROR_FAILURE);
Francois do you have any insight into Comment 32?
Flags: needinfo?(francois)
The test is using the makeThreatHitReport() function:

https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/toolkit/components/url-classifier/tests/mochitest/test_threathit_report.html#204-218

and comparing its output with the call (to the same underlying function) that will be made in platform code: 

https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/toolkit/components/url-classifier/nsUrlClassifierDBService.cpp#2089

My guess is that the channel used in the test (aRequest) is not the same as the channel used inside the platform code.
Flags: needinfo?(francois)
(In reply to Jonathan Kingston [:jkt] from comment #32)
> The failing test in the previous backout in Comment 29 looks like the
> location changes I made in docshell break this safe browsing test:
> toolkit/components/url-classifier/tests/mochitest/test_threathit_report.html
> 
> https://treeherder.mozilla.org/logviewer.html#?job_id=149394899&repo=mozilla-
> inbound&lineNumber=13523
> 
> The response from "utils.makeThreatHitReport" in the test is different from
> what it was before.
> 
> My hunch is that it might only be the test is wrong and the util is
> expecting a broken viewer / different URL? See Comment 11 as this same line
> is the same line that caused this test to fail:
> > NS_ENSURE_SUCCESS(Embed(viewer, "", nullptr), NS_ERROR_FAILURE);

I am not sure what's going on, fwiw, in the test, we collect expected and actual data on child before canceling channel on child (assign mFailedChannel then show error page).
SendThreatHitReport
https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/netwerk/base/nsChannelClassifier.cpp#1163
Notify SecurityChange
https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/netwerk/base/nsChannelClassifier.cpp#1157
https://searchfox.org/mozilla-central/rev/ba2b0cf4d16711d37d4bf4d267b187c9a27f6638/netwerk/base/nsChannelClassifier.cpp#871
I changed the AddThreatSourceFromChannel to use GetOriginalURI instead of GetURI and the test passed. My hunch is that these methods are getting the about page instead of the intended page now. I don't think this is the right solution though as this will be the URL of the start of a redirect too. I'm not sure if the channel has access to the window location as this would be the correct path to check.

We might want to make a follow up to make an assert in these functions blocking about and chrome protocols on these methods.
(In reply to Jonathan Kingston [:jkt] from comment #36)
> I changed the AddThreatSourceFromChannel to use GetOriginalURI instead of
> GetURI and the test passed. My hunch is that these methods are getting the
> about page instead of the intended page now. I don't think this is the right
> solution though as this will be the URL of the start of a redirect too. I'm
> not sure if the channel has access to the window location as this would be
> the correct path to check.
> 
> We might want to make a follow up to make an assert in these functions
> blocking about and chrome protocols on these methods.

You could get the docshell off the channel via the window corresponding to the load, and from the docshell get the actual failed channel, whose URI should be correct?

cf. https://dxr.mozilla.org/mozilla-central/source/dom/base/ThirdPartyUtil.cpp#255 on how to get to the window, though note that we don't necessarily want the top window here, unlike (apparently?) https://dxr.mozilla.org/mozilla-central/source/toolkit/components/url-classifier/nsUrlClassifierUtils.cpp#604-611
The AddThreatSourceFromChannel method is called 4 times within the test, on the 4th time through for the "top level tab_url" I was getting: "file:///{repo-location}/browser/base/content/blockedSite.xhtml" instead of "" for GetSpecWithoutSensitiveData.

Using NS_GetFinalChannelURI(aChannel, getter_AddRefs(uri)) seems to work I will make the change to the patch but it will need more reviewing.

Thanks all.
Attached patch bug-1419007-a.patch (obsolete) — Splinter Review
So this was actually fixed by Bug 1421803 which I spent time stepping through and it appears the test was behaving differently with my patch however doesn't have this race condition with francois change.
Attachment #8934157 - Attachment is obsolete: true
Attachment #8934157 - Flags: review?(sawang)
Ok I spoke too soon, the patch mentioned did help in making the code for threat hit work however it was actually the stray GetFinalChannelURI that I left in the code that made it pass again.

I'm going to put back in the code I had that behaved how Gijs mentioned in Comment 37, I'm not sure if that will work still however I think that is worth trying.
Attachment #8935137 - Attachment is obsolete: true
Attachment #8936470 - Flags: review?(sawang)
Attachment #8936470 - Flags: review?(francois)
Comment on attachment 8936470 [details] [diff] [review]
bug-1419007-c.patch

Review of attachment 8936470 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like the docshell part isn't changed. r+ to me.
Attachment #8936470 - Flags: review?(sawang) → review+
Comment on attachment 8936470 [details] [diff] [review]
bug-1419007-c.patch

Review of attachment 8936470 [details] [diff] [review]:
-----------------------------------------------------------------

The URL Classifier changes look fine to me, assuming that NS_GetFinalChannelURI() maps to the page URL instead of about:blocked.
Attachment #8936470 - Flags: review?(francois) → review+
This was already landed and merged. Fun.

https://hg.mozilla.org/mozilla-central/rev/0fc5c1fe3fea
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Keywords: checkin-needed
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
:RyanVM it was a backout in: https://bugzilla.mozilla.org/show_bug.cgi?id=1419007#c29 So the latest changes to check-in are to fix that.

Is that not the case?
Flags: needinfo?(ryanvm)
No, it was pushed by Aryx today and later merged to m-c. Note the timestamp of the push.
Flags: needinfo?(ryanvm)
Ah I see, thank you! (Back to sleep I go :P)
Do we want to uplift this to 58?
Flags: needinfo?(jkt)
Comment on attachment 8936470 [details] [diff] [review]
bug-1419007-c.patch

Approval Request Comment
[Feature/Bug causing the regression]: Unknown, it's present in 52 even.
[User impact if declined]: Risk of security escalation bugs from the wrong principal.
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: Somewhat
[Why is the change risky/not risky?]: Given the change in the race condition, there might be other code depending on this ordering like we have seen from the threat hit report.
[String changes made/needed]: N/A
Flags: needinfo?(jkt)
Attachment #8936470 - Flags: approval-mozilla-beta?
Comment on attachment 8936470 [details] [diff] [review]
bug-1419007-c.patch

let's leave this to 59.
Attachment #8936470 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Group: dom-core-security → core-security-release
Sounds like this is wontfix for ESR52 too, but feel free to set it back to affected and nominate (presumably for 52.7 in order to ship alongside Fx59) if you feel strongly otherwise.
I tried to reproduce the bug using the steps from comment 0 on an older version of Nightly (2017-11-20) on WIndows 10 x64, macOS 10.13 and Ubuntu 16.04 x64, but I couldn't. I always got the same result: No padlock is present.

Can you please give me some additional information in order for me to reproduce the bug?
Flags: needinfo?(jkt)
That version likely didn't have the patch to support the broken padlock.

Try Comment 26 instead.
Flags: needinfo?(jkt)
I managed to reproduce the bug using the steps from comment 26 on Windows 10 x64 on an older version of Nightly (2017-11-20). 
I retested everything using the latest Nightly 60.0a1 and beta 59.0b6 on Windows 10 x64, Ubuntu 16.04 x64 and macOS 10.13. The bug is not reproducing anymore. Everytime the error: 

"about:neterror?e=nssFailure2&u=https%3A//dh480.badssl.com/&c=UTF-8&f=regular&d=An%20error%20occurred%20during%20a%20connection%20to%20dh480.badssl.com.%0A%0ASSL%20received%20a%20weak%20ephemeral%20Diffie-Hellman%20key%20in%20Server%20Key%20Exchange%20handshake%20message.%0A%0AError%20code%3A%20%3Ca%20id%3D%22errorCode%22%20title%3D%22SSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY%22%3ESSL_ERROR_WEAK_SERVER_EPHEMERAL_DH_KEY%3C/a%3E%0A"

was displayed in the Browser Toolbox.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Whiteboard: [domsecurity-backlog1] → [domsecurity-backlog1][adv-main59+]
Whiteboard: [domsecurity-backlog1][adv-main59+] → [domsecurity-backlog1][adv-main59-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.