Closed Bug 1053456 Opened 5 years ago Closed 5 years ago

[e10s] "Untrusted Connection" page's "Add Exception" button does nothing or crashes

Categories

(Firefox :: Untriaged, defect, P2, critical)

x86_64
All
defect
Points:
8

Tracking

()

VERIFIED FIXED
Firefox 35
Iteration:
35.3
Tracking Status
e10s m4+ ---
firefox37 --- verified

People

(Reporter: mail, Assigned: Felipe)

References

Details

(Keywords: crash, crashreportid)

Crash Data

Attachments

(2 files)

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:31.0) Gecko/20100101 Firefox/31.0 (Beta/Release)
Build ID: 20140715214327

Steps to reproduce:

1. Open new e10s window
2. Load https://svn.resiprocate.org/rep/resiprocate/
3. See the "Untrusted Connection" page
4. Click "I Understand the Risks" and "Add Exception" button


Actual results:

nothing (button is clickable, but no response)


Expected results:

should open a new dialog to import the site certificate

This behaves exactly the same as 989875.
Attached file crash.log
Thanks for filing this, Yogu. What I'm seeing is the dialog will open, but the confirm exception button results in a crash (see attached). Here's the js stack at the time:

0 addException() ["chrome://pippki/content/exceptionDialog.js":342]
    this = [object ChromeWindow]
1 anonymous(event = [object Event]) ["chrome://global/content/bindings/dialog.xml line 370 > Function":1]
    this = [object ChromeWindow]
2 _fireButtonEvent(aDlgType = "extra1") ["chrome://global/content/bindings/dialog.xml":371]
    this = [object XULElement]
3 _doButtonCommand(aDlgType = "extra1") ["chrome://global/content/bindings/dialog.xml":339]
    this = [object XULElement]
4 _handleButtonCommand(aEvent = [object XULCommandEvent]) ["chrome://global/content/bindings/dialog.xml":327]
    this = [object XULElement]
I can confirm that some error occurs, because the number in the Developer Toolbar increments. But how do you get the stacktrace? The developer tools do not work for me with remote tabs, should they?
tracking-e10s: --- → ?
I'm seeing this on windows as well.

https://crash-stats.mozilla.com/report/index/3d03f3c9-2e52-4085-ae1f-76e7c2140814
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Interesting, sometimes the 'add security exception' does nothing (internal failure of some sort?) and sometimes the browser crashes.

The 'Add Exception' button works for me though every time.

Maybe we have two bugs here, I'll try to get a linux build going to see.
Assignee: nobody → jmathies
Blocks: old-e10s-m2
Priority: -- → P2
Flags: firefox-backlog+
Flags: firefox-backlog+
I have this same result happening on Ubuntu 14.04 x64 (with Ubuntu's packages) with both Firefox 31 and 32. Creating a new, fresh profile helps (the issue disappears) for a few days. After approximately a week it comes back for the websites where I have selected that Firefox should remember the exception. For the sites I did not add the exception this does not seem to reproduce when it starts happening again.

When this happens, I can work around it by adding the address mentioned in the certificate to the hosts file in the system (if you can place the CommonName contents there in a valid manner, that is).
Flags: firefox-backlog+
Moving old M2 P2 bugs to M4.
Move old M2's low-priority bugs to M6 milestone.
oops: old M2 P2 bugs should have been moved to new M4 milestone, not M6.
Points: --- → 8
Flags: qe-verify+
2014-09-18-03-02-02-mozilla-central-firefox-35.0a1.ru.linux-x86_64 crashes consistently with the same signature as in comment 3:

bp-16c9ef2a-1e09-4ea5-9256-f9e882140918	09/18/2014	10:30 PM
bp-be46a168-7226-4029-9453-997fb2140918	09/18/2014	10:29 PM
bp-14e82577-839a-46bc-a6e5-becab2140918	09/18/2014	10:24 PM
Severity: normal → critical
Crash Signature: [@ CERT_GetCommonName | mozilla::psm::DefaultServerNicknameForCert(CERTCertificateStr*) ]
Keywords: crash, crashreportid
Summary: [e10s] "Untrusted Connection" page's "Add Exception" button does nothing → [e10s] "Untrusted Connection" page's "Add Exception" button does nothing or crashes
The crash was probably fixed by bug 1069518. However, adding exceptions still won't work in e10s, since the code just causes a javascript assertion. I believe this bug is a consequence of sending connection security information to the child and then expecting it back when there's a security exception. As I've said before, this setup is undesirable for multiple reasons:
1. Bugs like this.
2. Networking and certificate verification happens in the parent process, so there's no need to notify the child to notify the parent that a certificate error was encountered.
3. The parent shouldn't trust security information from the child anyway.
2014-09-24-03-02-04-mozilla-central-firefox-35.0a1.ru.linux-x86_64 does not crash, does nothing.
I've been investigating this bug
Assignee: jmathies → felipc
Status: NEW → ASSIGNED
The main problem about this not working (not considering the broken design mentioned in comment 10), is that we trying to pass a CPOW for the sslstatus/cert to a C++ implemented function.

The code flow in this bug is:

- On a click in the about:certerror button, we pass the failedChannel as a CPOW here:
 http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/browser/base/content/content.js#l429

- Which is received in browser.js here:
  http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/browser/base/content/browser.js#l2374
  And passed to the modal exceptionDialog.xul dialog down in line 2391

- The addException function from exceptionDialog.js is able to check the flags from gSSLStatus properly through the CPOW, however things fail when we pass gCert to the rememberValidityOverride function here:
  http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/security/manager/pki/resources/content/exceptionDialog.js#l342


David, to make this work without a major redesign, do you know if I can somehow retrieve the proper serverCert object to set it to gCert? This code is in the parent process so the SSLStatus{Provider}/ServerCert must be there somehow.. But perhaps not accessible from this code.

Alternatively I could call the checkCert() function which will XHR the URL and re-retrieve the certs, but that sounds undesirable.
Flags: needinfo?(dkeeler)
Iteration: --- → 35.2
(In reply to :Felipe Gomes from comment #13)
> David, to make this work without a major redesign, do you know if I can
> somehow retrieve the proper serverCert object to set it to gCert? This code
> is in the parent process so the SSLStatus{Provider}/ServerCert must be there
> somehow.. But perhaps not accessible from this code.

Somehow getting a handle on the original channel that failed would be best (it's somewhere in the parent process and has the appropriate sslStatus/serverCert/etc.). Unfortunately I have no idea how to get at it. Maybe it's somehow/somewhere on the docshell of the page that opened the certificate override dialog?

If the CPOW'd serverCert is actually valid at all in that code, here's the terrible hacky thing I would do:
I think when the cert gets sent to the child it's instantiated as a nsNSSCertificateFakeTransport, which doesn't actually do much. It can, however, be serialized again (see nsNSSCertListFakeTransport::Write, etc.). I think if you were to serialize it and deserialize it in the parent, it would be instantiated as a real nsNSSCertificate at that point, whereupon the parent could use it like a normal certificate.

> Alternatively I could call the checkCert() function which will XHR the URL
> and re-retrieve the certs, but that sounds undesirable.

I would rather avoid that - you're right that that would be undesirable.
Flags: needinfo?(dkeeler)
Duplicate of this bug: 1074064
Iteration: 35.2 → 35.3
Blocks: 1073957
(In reply to David Keeler (:keeler) [use needinfo?] from comment #14)
> I think when the cert gets sent to the child it's instantiated as a
> nsNSSCertificateFakeTransport, which doesn't actually do much. It can,
> however, be serialized again (see nsNSSCertListFakeTransport::Write, etc.).
> I think if you were to serialize it and deserialize it in the parent, it
> would be instantiated as a real nsNSSCertificate at that point, whereupon
> the parent could use it like a normal certificate.

and.. this works perfectly!
Attachment #8497913 - Flags: review?(dkeeler)
Blocks: 1066181
Comment on attachment 8497913 [details] [diff] [review]
Serialize SSL status

Review of attachment 8497913 [details] [diff] [review]:
-----------------------------------------------------------------

Well, I'm glad this works, but it's unfortunate that we have to do it at all. In the end, it's not worse than what we had before and at least it works, so that's good. Eventually we'll have to re-work how this all happens, but that will be a large endeavor.

::: browser/base/content/browser.js
@@ +2406,4 @@
>        break;
>        case "Browser:SiteBlockedError":
> +        this.onAboutBlocked(msg.data.elementId, msg.data.isMalware,
> +                            msg.data.isTopFrame, msg.data.location);

I'm not familiar enough with this code to know the difference between using msg.json and msg.data here, so I'll just assume you know what you're doing.

@@ +2424,5 @@
>            secHistogram.add(Ci.nsISecurityUITelemetry.WARNING_BAD_CERT_TOP_CLICK_ADD_EXCEPTION);
>          }
> +
> +        let serhelper = Cc["@mozilla.org/network/serialization-helper;1"]
> +                           .getService(Ci.nsISerializationHelper);

nit: I would move this line one space left for alignment

::: browser/base/content/content.js
@@ +433,5 @@
> +                                  .QueryInterface(Ci.nsISSLStatusProvider)
> +                                  .SSLStatus
> +                                  .QueryInterface(Ci.nsISerializable);
> +      serializedSSLStatus = serhelper.serializeToString(serializable);
> +    } catch (e) { }

I'm not sure I understand the rationale behind silently catching and eating this error. Wouldn't we rather know about it?
Attachment #8497913 - Flags: review?(dkeeler) → review+
QA Contact: jbecerra
(In reply to David Keeler (:keeler) [use needinfo?] from comment #18)
> 
> I'm not familiar enough with this code to know the difference between using
> msg.json and msg.data here, so I'll just assume you know what you're doing.

yeah, msg.json and msg.data are the same thing, but .json has been deprecated in favor of .data, so we usually update them when touching code nearby

> ::: browser/base/content/content.js
> @@ +433,5 @@
> > +                                  .QueryInterface(Ci.nsISSLStatusProvider)
> > +                                  .SSLStatus
> > +                                  .QueryInterface(Ci.nsISerializable);
> > +      serializedSSLStatus = serhelper.serializeToString(serializable);
> > +    } catch (e) { }
> 
> I'm not sure I understand the rationale behind silently catching and eating
> this error. Wouldn't we rather know about it?

The error will show up anyway in the parent side when it tries to reserialize, and it's easier to catch and debug it there (at least at the moment), so I preferred to let it fail there
https://hg.mozilla.org/mozilla-central/rev/d263222c8c44
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 35
Still encountering the bug in the current version of mozilla-central (freshly clobbered and rebuilt).
Flags: needinfo?(felipc)
PS: This is on Linux, amd64.
What kind of useful info could I provide ?
(In reply to Nicolas Braud-Santoni from comment #23)
> PS: This is on Linux, amd64.
> What kind of useful info could I provide ?

Would you mind filing a fresh bug? Please add the tracking-e10s: ? flag and good system info / steps to reproduce. Thanks!
I've also tried this on Win 7 x64, Ubuntu 13.04 x64, Mac OS X 10.9.5, using latest Firefox 35 Nightly (BuildID=20141009030201), and I can confirm that this does not work.

The scenario is basically the same as in comment 0:

Steps to reproduce:
1. Open new e10s window (or when starting with fresh profile just choose to enable e10s and restart).
2. Load https://svn.resiprocate.org/rep/resiprocate/.
3. See the "Untrusted Connection" page.
4. Click "I Understand the Risks" and "Add Exception" button.


Actual results:
Nothing happens (button is clickable, but no response). Console shows: "NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsISerializationHelper.deserializeObject]" each time the button is pressed.


Expected results:
Should open a new dialog to import the site certificate.

This is 100% reproducible for me on all 3 machines.

Given it's the same problem as originally reported I think this should be reopened rather then filing a new bug... let me know what you think.
For some reason, I can't open the test URL anymore. I load it on an e10s window and it keeps forever refreshing and never shows the error page. Does anyone else see this?
Flags: needinfo?(felipc)
I can still load it and get the bad ssl page. When I click on the add exception button I get:

NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsISerializationHelper.deserializeObject] @

let sslStatus = serhelper.deserializeObject(sslStatusAsString);

http://mxr.mozilla.org/mozilla-central/source/browser/base/content/browser.js#2433
sslStatusAsString is empty.
The error on the other side is:

[JavaScript Error: "docShell.failedChannel is null" {file: "chrome://browser/content/content.js" line: 442}]
I have verified that the bug is fixed with the patch posted here. However, there were two other regressions recently that makes thing break in a different way. I'm looking for regression range to file both bugs but I haven't been able to find it yet because bisecting it with the two new problems has been interesting.

The two problems are:
 - when the page hits the ssl error, the error page being displayed is loading in the parent process, instead of the content process. It's also being loaded as a normal page and not a loaderror, as you can see the about:certerror?etc in the URL (it's not supposed to be visible). If you see about:certerror?etc in the URL, then you are hitting this regression

 - I am able to go back in the tree enough to not hit this problem, however then the test URL end up being redirected/reloaded indefinitely, and the error page never shows, it just remains blank.


By going back enough (I used 104254bd1fc8 from Jul 31st), and applying the patch posted here, it works. If anyone else wants to take a stab at finding regression ranges for either of these mentioned problems, it's most welcome!
Better ranges - 

2014-08-16 - everything works
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=053a96e40244&tochange=94ba78a42305
2014-08-17 - blank white page
2014-09-13 - blank white page
2014-09-16 - blank white page
2014-09-17 - blank white page
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=d2c01d77b9d0&tochange=426497473505
2014-09-18 - everything works
2014-09-20 - everything works
2014-09-23 - everything works
2014-09-25 - everything works
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1735ff2bb23e&tochange=818f353b7aa6
2014-09-26 - loads but dialog is busted
2014-09-27 - loads but dialog is busted
(In reply to :Felipe Gomes from comment #30)
>  - when the page hits the ssl error, the error page being displayed is
> loading in the parent process, instead of the content process. It's also
> being loaded as a normal page and not a loaderror, as you can see the
> about:certerror?etc in the URL (it's not supposed to be visible). If you see
> about:certerror?etc in the URL, then you are hitting this regression

Actually, I posit that about:certerror *should* be loaded in the parent process. That way, we won't have to do the round-trip serialization/deserialization of the security information that we're doing. From a security perspective, the parent shouldn't be trusting security information coming from the child process anyway.
(In reply to David Keeler (:keeler) [use needinfo?] from comment #33)
> (In reply to :Felipe Gomes from comment #30)
> >  - when the page hits the ssl error, the error page being displayed is
> > loading in the parent process, instead of the content process. It's also
> > being loaded as a normal page and not a loaderror, as you can see the
> > about:certerror?etc in the URL (it's not supposed to be visible). If you see
> > about:certerror?etc in the URL, then you are hitting this regression
> 
> Actually, I posit that about:certerror *should* be loaded in the parent
> process. That way, we won't have to do the round-trip
> serialization/deserialization of the security information that we're doing.
> From a security perspective, the parent shouldn't be trusting security
> information coming from the child process anyway.

We've been moving toward having all about pages out of process. If there's a really compelling case to keep certain pages in the parent, would you mind filing a fresh bug on that and explaining? We would probably mark it for later work, but we could get around to it at some point.
(In reply to David Keeler (:keeler) [use needinfo?] from comment #33)
> (In reply to :Felipe Gomes from comment #30)
> >  - when the page hits the ssl error, the error page being displayed is
> > loading in the parent process, instead of the content process. It's also
> > being loaded as a normal page and not a loaderror, as you can see the
> > about:certerror?etc in the URL (it's not supposed to be visible). If you see
> > about:certerror?etc in the URL, then you are hitting this regression
> 
> Actually, I posit that about:certerror *should* be loaded in the parent
> process. That way, we won't have to do the round-trip
> serialization/deserialization of the security information that we're doing.
> From a security perspective, the parent shouldn't be trusting security
> information coming from the child process anyway.

But this is a separate issue. It's loading in the parent just as if someone had typed "about:certerror" in the URL. At that point there's no information about the failedChannel, the certs, or anything linking to the original page load that failed. So while it might sound like a change going in the right direction, it's not.. just a bug that's making it load in parent
Reopening since extra work is ongoing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
But not on this bug. The Untrusted Connection page is not working due to other regressions unrelated to this bug, which should be tracked separately.
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Depends on: 1082791
I've re-verified this after the fix for bug 1082791, on Windows 7 x64, Mac OS X 10.10, Ubuntu 14.04 x64, and the "Add Exception" button now works as expected on the latest Firefox 37 Nightly (BuildID=20141202030201).
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.