Closed Bug 682329 Opened 13 years ago Closed 12 years ago

Certificate error dialog box pops up when closing tab

Categories

(Core :: Security: PSM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla10

People

(Reporter: briansmith, Assigned: briansmith)

References

(Depends on 2 open bugs, Blocks 2 open bugs)

Details

(Keywords: APIchange, dev-doc-needed, ux-interruption, Whiteboard: [Checked in in comments 14, 15, 16])

Attachments

(5 files, 7 obsolete files)

2.85 KB, patch
mayhemer
: review+
briansmith
: checkin+
Details | Diff | Splinter Review
16.27 KB, patch
KaiE
: review-
bzbarsky
: superreview+
Details | Diff | Splinter Review
22.15 KB, patch
briansmith
: review+
KaiE
: review-
briansmith
: checkin+
Details | Diff | Splinter Review
7.38 KB, patch
briansmith
: review+
briansmith
: checkin+
Details | Diff | Splinter Review
45.87 KB, patch
mayhemer
: review-
Details | Diff | Splinter Review
Actual results:
If you close a tab while certificate validation for the page is happening, when the validation completes with an error, the certificate error dialog box will pop up.

Expected results:
No error message is shown. We do not care about the certificate validation error because the user is no longer interested in the page that had the error.

PSM shows this dialog box whenever it cannot get an nsISecureBrowserUI from the callbacks associated with the socket. I believe this is due to a race condition: We expect that the callbacks for the socket will not get reset (nsNSSSocketInfo::SetNotificationCallbacks(nsnsull)) during the course of certificate validation, but this is not true (as in this case).

My proposed solution is to make the default behavior to be to not show a certificate error dialog box at all. If a part of the application wants the certificate error dialog box, then it should register a callback that implements the nsIBadCertListener2 interface, and which dispatches an event to show a dialog box in its implementation of NotifyCertProblem. This is the solution that I implemented in my patch for bug 660559. (My solution for bug 660559 makes this race condition is made more evident.)
Depends on: 679036
No longer depends on: 660559
IMO, after the validation is done, ssl socket looks for an "external reporter" via its callbacks.  At that moment doc shell is probably detached from the channel and ssl socket decides to report the error via a dialog (default behavior).

This default behavior is there for any consumers of PSM that doesn't implement the external reporting (which is the cert error page in Firefox).  So we show at least something.

I would like to change this code or at least the code path soon anyway.
Blocks: 679140
No longer depends on: 679140
This patch stops the desktop browser from showing the model dialog boxes for cert errors and SSL errors on sub-resources, just like the mobile browser already does.
Attachment #562076 - Flags: review?(honzab.moz)
This patch changes the test so that it tests that we *don't* show the dialog box by default, instead of that we *do* show it by default.
Attachment #562080 - Flags: review?(honzab.moz)
This patch removes the API for showing the certificate error dialog box.
Attachment #562081 - Flags: superreview?(bzbarsky)
Attachment #562081 - Flags: review?(kaie)
Once the other patches are applied, we can remove this code from mobile/, because the default behavior will be the behavior that mobile is implementing.
Attachment #562082 - Flags: review?(mark.finkle)
Comment on attachment 562081 [details] [diff] [review]
Remove the API for showing the cert error dialog box from PSM

sr=me, but could we still log the errors to the error console instead of completely dropping them?
Attachment #562081 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 562076 [details] [diff] [review]
Remove default certificate error and SSL error prompts from PSM

Review of attachment 562076 [details] [diff] [review]:
-----------------------------------------------------------------

This patch will cause that TB and probably also SM will not show the dialog in some situations.  However, the dialog is a context-less [modal] pop-up that sometimes comes from nowhere and doesn't give much information anyway.

We should communicate this to TB and SM people explicitly, it could be recognized as an issue (and not a feature).

Please re-request review on the new patch.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ -1215,5 @@
> -    const char *stringID;
> -    if (wantsHtml)
> -      stringID = "certErrorMismatchSingle2";
> -    else
> -      stringID = "certErrorMismatchSinglePlain";

This (certErrorMismatchSinglePlain) should then be removed from locales.  Unless my general comment is addressed.

@@ +1418,5 @@
> +    // Set the error message before notifying the nsISSLErrorListener so that
> +    // it can inspect the default error message, if it wants.
> +    nsString formattedString;
> +    rv = getErrorMessage(err, hostNameU, port, nssComponent, formattedString);
> +    rv = socketInfo->SetErrorMessage(formattedString.get());

Also please see comments bellow.

You must *not* do this only when there are callbacks on the socket.  We collect cbs here only to collect nsISSLErrorListener and notify if present.

If it is not present (even the callbacks them self) you always have to set the error message.

Is this some new contract you are introducing on the nsISSLErroListener interface?  If consumers were OK w/o the additional info till now then it is probably worth leaving it unchanged.

If you want to propagate the error, we may find out some other way to do it, in a different bug.

Again, see comments bellow.

@@ +1435,4 @@
>          nsCString hostWithPortString = hostName;
>          hostWithPortString.AppendLiteral(":");
>          hostWithPortString.AppendInt(port);
> +        rv = proxy_sel->NotifySSLError(csi, err, hostWithPortString, &ignored);

Please see comments bellow..

@@ +3465,3 @@
>          nsIInterfaceRequestor *csi = static_cast<nsIInterfaceRequestor*>(infoObject);
> +        nsrv = proxy_bcl->NotifyCertProblem(csi, status, hostWithPortString, 
> +                                            &ignored);

This is the "comment bellow" :)

The old code prevents even the error page (the externally reported one) from showing up on suppressMessage=true by bypassing SetErrorMessage on the socket.

Your patch breaks it if we, of course, don't perceive not showing the error page as a bug (I don't, I believe that is the correct behavior).

Please preserve the old logic around NotifyCertProblem (call SetErrorMessage after we check the result of notifyCertProblem call).

If you have a different opinion on fixing this, let's discuss it.
Attachment #562076 - Flags: review?(honzab.moz)
Attached patch Remove default SSL error UI [v2] (obsolete) — Splinter Review
Review comments have been addressed.

Your patch queue should look like this to review this patch:
psm-PRBool-to-bool.patch
psm-remove-default-ssl-error-UI.patch
psm-remove-certerror-test.patch

If you don't apply psm-remove-certerror-test.patch, then you will get a test failure.
Attachment #562076 - Attachment is obsolete: true
Attachment #564044 - Flags: review?(honzab.moz)
Comment on attachment 564044 [details] [diff] [review]
Remove default SSL error UI [v2]

Review of attachment 564044 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +1447,5 @@
>    }
>  
> +  if (socketInfo->GetExternalErrorReporting()) {
> +    nsString formattedString;
> +    (void) getErrorMessage(err, hostNameU, port, nssComponent, formattedString);

Looks like hostNameU is not used sooner that in this code.  You may move NS_ConvertASCIItoUTF16 hostNameU(hostName) here.
Attachment #564044 - Flags: review?(honzab.moz) → review+
Comment on attachment 562080 [details] [diff] [review]
Fix test that previously checked that the dialog box was shown by default

Review of attachment 562080 [details] [diff] [review]:
-----------------------------------------------------------------

r=honzab
Attachment #562080 - Flags: review?(honzab.moz) → review+
Attachment #562082 - Flags: review?(mark.finkle) → review+
Comment on attachment 565141 [details] [diff] [review]
Remove default SSL error UI [v3]

Review comments addressed and rebased on top of current mozilla-central.
Attachment #565141 - Flags: review+
Rebased on current mozilla-central, carrying over r+
Attachment #562082 - Attachment is obsolete: true
Attachment #565144 - Flags: review+
Comment on attachment 562080 [details] [diff] [review]
Fix test that previously checked that the dialog box was shown by default

https://hg.mozilla.org/mozilla-central/rev/1e937319fe2b
Attachment #562080 - Flags: checkin+
Comment on attachment 565144 [details] [diff] [review]
Remove workaround that mobile is using to suppress the cert error dialog box

https://hg.mozilla.org/mozilla-central/rev/b3dd9c7c3e67
Attachment #565144 - Flags: checkin+
(In reply to Brian Smith (:bsmith) from comment #14)
> Comment on attachment 562080 [details] [diff] [review] [diff] [details] [review]
> Fix test that previously checked that the dialog box was shown by default
> 
> https://hg.mozilla.org/mozilla-central/rev/1e937319fe2b

That should be https://hg.mozilla.org/mozilla-central/rev/b49f2f92f12a.
In my opinion you must ensure that SSL related errors are still being reported in the non-browser scenarios.

PSM is platform code, not browser specific code.

I think it's not acceptable to require that each application implements it's own SSL error reporting.

If you want Firefox to suppress the errors in some scenarios, then you should find a way to do just that, not disable all reporting by default.
Comment on attachment 562081 [details] [diff] [review]
Remove the API for showing the cert error dialog box from PSM

Please correct me if my understanding is wrong.

It looks like you're completely removing the default reporting of bad certificate errors for all applications based on the Mozilla platform.

I disagree with this plan, see comment 18.
Attachment #562081 - Flags: review?(kaie) → review-
I see your intention is to remove prompts.

Well, IMHO removing prompts is not sufficient unless you replace the prompts with an equivalent mechanism, allowing the user to learn about the incident.

Maybe I've missed it, but in your patches I don't see other reporting mechanisms. You probably assume that the browser case is the most important one and having a solution for that is sufficient. I disagree.

If you want to remove the prompt, you must replace it with something new, at the time of removal.

An idea for removing the bad error prompt is to introduce a new SSL status element in chrome, that user can notice, and can click when necessary.

In other words, if the SSL connection fails, show a status icon somewhere on screen, and allow the user to click/touch it. When doing so, the error information should be shown.

If this is what you want to do, then it could be implemented as a default icon, implemented in PSM, that applications can easily included, by simply adding the new default status control to their user interface.

I described this idea/concept in more detail here:
http://kuix.de/mozilla/sslauth/
Comment on attachment 565141 [details] [diff] [review]
Remove default SSL error UI [v3]

In my opinion this code should be kept until you have a better solution.
Attachment #565141 - Flags: review-
(In reply to Kai Engert (:kaie) from comment #20)
> Well, IMHO removing prompts is not sufficient unless you replace the prompts
> with an equivalent mechanism, allowing the user to learn about the incident.

We don't do so for any other networking errors. And, the SSL error dialog boxes were already removed for Mobile Firefox (see the mobile-specific patch in this bug.) This change was already agreed on by three core developers. I already filed the follow-up to do the error reporting in the error console so that developers can better understand the issue. I also just filed a follow-up (bug 695066) for improving the UX when we fail to load subresources do to such errors.

> Maybe I've missed it, but in your patches I don't see other reporting
> mechanisms. You probably assume that the browser case is the most important
> one and having a solution for that is sufficient. I disagree.

The application can set the callbacks to something that implements nsIBadCertListener2; then it can do whatever error reporting it thinks is important in its notifyCertProblem implementation. If they want to use the same UI that we were providing, they can copy/paste the source code for our old UI into their project.

> An idea for removing the bad error prompt is to introduce a new SSL status
> element in chrome, that user can notice, and can click when necessary.

Let's deal with that in bug 695066.

More generally, the mobile team has asked that we allow the application to control the UI for more things in PSM. And, the necko team has decided to move all the UI prompts from Necko to the application. If there were no way for the application to report the error the same way as before, then I would have added a mechanism. However, because we already had nsIBadCertHandler2, I didn't have to.
Keywords: ux-interruption
We still track for "external reporting".  We could have a default handler that could be turned on or off by just single call of some new API and be invoked in case there is no external reporting.  That handler can implement an asynchronous/coalesced show of a dialog.

However, for example thunderbird when connecting to a server with bad cert show a dialog to add an exception directly, so there is a UI.  In other cases, like adding an image embedded to an email html body shows a dialog that comes from no where (user doesn't know what's happening).

As I have spotted, Chrome shows an error in the content in place of the resource - a nice frame with the chrome icon and text: resource failed to load because blah blah.  That is perfect and pretty logical way to show an error for both developers AND users.  We should have something like that too.
Thunderbird already uses nsIBadCertListener2 to override the PSM dialog and put up the exception dialog - //pippki/content/exceptionDialog.xul - is there anything more that we need to do?
(In reply to David :Bienvenu from comment #24)
> Thunderbird already uses nsIBadCertListener2 to override the PSM dialog and
> put up the exception dialog - //pippki/content/exceptionDialog.xul - is
> there anything more that we need to do?

From my point of view, no.

I think the only other case is inline content in HTML emails, both during composition and reading.  If a resource cannot not be loaded because of an ssl error then user will not know about that error, he will just see a broken image as in case of a common network error.  Probably bug 695066 is covering that.

Brian: to have at least something, could we by default asynchronously log to the error console?
(In reply to Honza Bambas (:mayhemer) from comment #25)
> From my point of view, no.

+1.

> Brian: to have at least something, could we by default asynchronously log to
> the error console?

That is bug 688810.
Thx, Brian and Honza - I'm not particularly concerned about the inline content in HTML mails, though I wonder a little bit about our content tabs, which essentially show web pages. We should just behave like Firefox there...
Comment on attachment 562081 [details] [diff] [review]
Remove the API for showing the cert error dialog box from PSM

This has been dead code for a while. No bugs have been filed (AFAICT) asking for this dialog box to be brought back.
Attachment #562081 - Flags: review- → review?(kaie)
Comment on attachment 562081 [details] [diff] [review]
Remove the API for showing the cert error dialog box from PSM

Sorry, I don't agree that 

  "avoid a warning in one scenario where the warning doesn't make sense"

justifies the complete removal of such warnings,
including in the scenarios where the warning does make sense.

You might be unaware that this warning isn't just Firefox code, but rather core code.

This warning is the default warning shown by the platform. It is used by any application encountering a bad certificate.

If you want to do something different in Firefox, find a way to override this default behaviour without removing this code.

Double r- r-
Attachment #562081 - Flags: review?(kaie) → review-
I'm not happy the removal was done despite my r- of this plan from comment 21.

We should have reporting of SSL errors by default, unless the application says it wants to suppress it.
(In reply to Kai Engert (:kaie) from comment #30)
> I'm not happy the removal was done despite my r- of this plan from comment
> 21.
> 
> We should have reporting of SSL errors by default, unless the application
> says it wants to suppress it.

Kai, all the changes to remove this dialog box have already landed way before you commented in comment 21. Look at this bug's history. All the patch that you "double r- r-"'d does is remove code that is now dead code after that removal was done.

If you think we need some UI for this then we will need to file a new bug to add back a UI, and we will have to work with the UI team on it.
Depends on: 739563
(In reply to Brian Smith (:bsmith) from comment #31)
> (In reply to Kai Engert (:kaie) from comment #30)
> > I'm not happy the removal was done despite my r- of this plan from comment
> > 21.
> 
> Kai, all the changes to remove this dialog box have already landed way
> before you commented in comment 21. 


I wouldn't call "6 days" as "way before". I complained about your approach 6 days after your landing in comment 19.

You should have reconsidered and have backed out your patch after I made that comment.


By removing this prompt by default, you made it nearly impossible for users to understand the reason of connection failures, see bug 739563 for an important example.

I hate it that "removing code" is seen by you as an acceptable quick fix for a complaint about a prompt. IMHO you didn't fix this bug, rather your removal caused functional corruption, at least a regression.
bsmith, I have to concur with kaie here. Hiding errors from users and just silently failing is outright user-hostile and gives the perception of a broken app. We must have a proper error message.

Somebody from SuSE, me and Mark Banner had to debug a case today which was just such a case. This has already cost needless time, and will do so 1000-fold once this ships to end-users.
Attached patch patch v4: Restore the prompt (obsolete) — Splinter Review
Unfortunately it's not possible to simply back out the old patch, because of lots of changes on top.

It took me the whole day to create this new patch, which reenables the prompts.

I tried to make it smarter. Can you please test if you get undesired prompts with this patch?

Currently, the patch uses DetectDocShellAndSetErrorPrompts
which will simulate the old behahiour.

If this doesn't work as desired, and if we still get incorrect prompts, we could use a short term workaround, and enable code path (a), which:
- disables all error prompts for any https sockets
- enables error prompts for any other ssl protocol sockets

We MUST do either (a) or (b) for Firefox 14, prior to April 24,
because having bug 739563 is not acceptable for me.

(After we do so, you could work on a better strategy, have each creator of a SSL socket make a decision whether or not the socket should have prompts shown or suppressed, and pass that along.)
Kai, I am happy to discuss how to best fix bug 695066 and bug 739563 before Firefox 14 is released. I think we probably have time to fix them properly instead of doing what you propose.

We should first discuss exactly why the removal of the dialog box was done when it was done: Besides the bad behavior described in the summary, the dialog box also blocked the socket transport thread, which meant that no network I/O could be completed--in any tab--while the dialog box was created. That is why it was fixed in the way it was fixed before the SSL thread removal landed. Also, Mobile Firefox and the networking team both want to avoid these low level components from showing UI. UI belongs in a higher layer.

Also, bug 739563 was a regression caused by the disabling of MD5 for certificate signature verification, not a regression caused by this bug, which landed many months before the MD5 disabling. I think that Thunderbird is best off reverting the MD5 disabling until a better approach to it is available, than reverting the fix for this bug.

By the way, I back when you gave the r- for the patches, I only noticed the r- for the dead removal patch. I didn't realize that you r-'d the patch that had already been r+'d by Honza and which had already been checked in.  That was just an oversight, and I just noticed that r- now. Also, I CC'd Mark and Thunderbird people discussed this change with us (see comments above) even before it landed. And, I marked it APIchange and dev-doc-needed. And, it was r+'d. So, I think process-wise, I did everything correctly.

Also, it is my understanding, from discussing related things with management, that while we should avoid breaking Thunderbird and XULRunner when we can easily avoid doing so, we are not supposed to be spending an inordinate amount of effort to do so. The requirement is that we don't break Tier 1 platforms for Firefox, that we give Mark and other Thunderbird developers a heads-up and help them with changes if they make us aware that they need help. The fact is that Platform people have to break XULRunner stuff all the time to get stuff done for Firefox with a reasonable amount of effort.

In the case of the MD5 error message, if the Thunderbird team wants to have a UI for this error message, then let's discuss that in the bug that's already on file for doing that--bug 739563. I suspect that it is much simpler to fix Thunderbird's specific issue than to add this cruft back into PSM where it doesn't belong. And, IMO, Thunderbird may be better off with a UI that allows the user to override the error and continue on. This is something we should decide with Mark and David and others on the Thunderbird team.
Blocks: 674147
(In reply to Brian Smith (:bsmith) from comment #35)
> By the way, I back when you gave the r- for the patches, I only noticed the
> r- for the dead removal patch.

In English: "By the way, back when you gave the r- for the patches, I only noticed the r- for the dead code removal patch."
> avoid these low level components from showing UI. UI belongs in a higher layer.

A classical solution to this is to let the component (be it a server or library) return both a specific error code and an error message. This allows the caller (application) both to show a custom error UI/handling based on the error code, or to just show the error message from the lib in its own UI.

If the API in question here is XPCOM and the component PSM (not raw NSS), there's a way to set an XPCException plus returning a normal nsresult error code from C++ XPCOM functions that JS will receive as a normal exception object with properties on the exception object. On the XPCException, you can set the error message and maybe even a custom value. I can provide sample code, if you are interested.
Attachment #613780 - Flags: superreview?(rrelyea)
Attachment #613780 - Flags: review?(honzab.moz)
Attachment #613780 - Flags: superreview?(rrelyea)
Attachment #613780 - Flags: superreview?(bzbarsky)
Attachment #613780 - Flags: review?(rrelyea)
(In reply to Brian Smith (:bsmith) from comment #35)
> Kai, I am happy to discuss how to best fix bug 695066 and bug 739563 before
> Firefox 14 is released. I think we probably have time to fix them properly
> instead of doing what you propose.

I disagree. The removal was a regression. You have had a long time since my complaint to come up with something better.

We must fix the regression first, by restoring the old functionality. Afterwards you are free to propose a new solution.
(In reply to Brian Smith (:bsmith) from comment #35)
> We should first discuss exactly why the removal of the dialog box was done
> when it was done:

I should have backed you out immediately when I saw it back in October.
I deeply regret that I didn't. 


> Besides the bad behavior described in the summary,

Why didn't you work to simply fix the bad behaviour?
There was no need to rip out everything, you should have worked to get the logic improved.
See also my comments in bug 739563.


> Also, bug 739563 was a regression caused by the disabling of MD5 for
> certificate signature verification, not a regression caused by this bug,
> which landed many months before the MD5 disabling. 

I disagree.

We don't know how many people left Firefox since you landed this change, in environments where Mozilla software stopped working, and users didn't have any idea why.


> I think that Thunderbird
> is best off reverting the MD5 disabling until a better approach to it is
> available, than reverting the fix for this bug.

I disagree.
A single version of the Mozilla platform should have equal crypto preferences across all applications, or you create a communication nightmare and lots of confusion.
Comment on attachment 613780 [details] [diff] [review]
patch v4: Restore the prompt

Review of attachment 613780 [details] [diff] [review]:
-----------------------------------------------------------------

To confirm: the goal here is to open a single, non-overlapping dialog informing about that some resource somewhere failed to load because of an SSL error, right?

That is madness to do in Firefox.

That is a wrong indication of a failure on any other software, IMO.

In short term we won't find a better solution for non-Firefox applications, hence this effort.


What I see is and was wrong with this code is that we are deciding to show or not to show a dialog by checking presence of security UI on docshell while calling on the callbacks from the socket thread.  Absolutely unnecessary.  On places we handle with suppressMessage after call on the bad cert listeners, just post an async event to the main thread with ref to callbacks and necessary info to show the dialog.  The event on the main thread will check for sec ui on the docshell by GI/QI the callbacks and if not present, based also on some other conditions we may want, show the dialog.  We don't need any result of the dialog and the dialog doesn't need to block any net loads or so, so it can be just an async operation.

Makes sense?

In my opinion, the will to show an ssl error dialog should be directed to a single service, that will by default show the error, but apps can override it (with contract id override, e.g.) to have a different behavior (that would be Firefox).

r-

::: netwerk/base/src/nsSocketTransport2.cpp
@@ +1059,5 @@
> +                    // (a) Use the behaviour that our creator asked for
> +                    // secCtrl->SetSSLErrorPromptsEnabled(mSSLErrorPromptsEnabled);
> +                    
> +                    // (b) detect whether we run in a docshell
> +                    secCtrl->DetectDocShellAndSetErrorPrompts();

This must be called on the main thread but here it is called on the socket thread.

::: netwerk/protocol/http/nsHttpConnectionMgr.cpp
@@ +2163,5 @@
>                                mEnt->mConnInfo->ProxyInfo(),
>                                getter_AddRefs(socketTransport));
>      NS_ENSURE_SUCCESS(rv, rv);
>      
> +    socketTransport->SetSSLErrorPromptsEnabled(false);

This will prevent the dialog for HTML email composer and HTML email view.  Is it what you want?

I understand your concern but check for sec ui is enough here, isn't it?

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +255,5 @@
> +        nsString str(s);
> +        nsString empty;
> +        rv = dialogs->ShowCertError(nsnull, status, ix509, 
> +                                    str, 
> +                                    empty, host, port);

status is unused by the implementation.

@@ +449,5 @@
> +  // We must call SetCanceled prior to showing the prompt,
> +  // because it sets the error code which will be used to create the error string.
> +  mInfoObject->SetCanceled(result->mSuppressPrompt,
> +                           result->mErrorCode,
> +                           result->mErrorMessageType);

You actually don't need to do this.  nsHandleInvalidCertError reads just the message and status (unused) from the info object.  You could pass it directly.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +366,5 @@
> +    if (docshell) {
> +      nsCOMPtr<nsISecureBrowserUI> secureUI;
> +      docshell->GetSecurityUI(getter_AddRefs(secureUI));
> +      if (secureUI) {
> +        mSSLErrorPromptsEnabled = false;

I was so glad we removed this code...
Attachment #613780 - Flags: review?(honzab.moz) → review-
Which thread does nsNSSSocketInfo::DetectDocShellAndSetErrorPrompts run on?
(In reply to Boris Zbarsky (:bz) from comment #41)
> Which thread does nsNSSSocketInfo::DetectDocShellAndSetErrorPrompts run on?

Socket thread.
I'm not convinced that refcounting docshells on the socket thread is a good idea.

(I'm also not convinced that we have a clear statement of what the UI should be like here, with some actual UX input....)
We probably want to add docshells to cycle collection at some point, and that basically means
that refcounting in non-main-thread is prohibited
(In reply to Boris Zbarsky (:bz) from comment #43)
> (I'm also not convinced that we have a clear statement of what the UI should
> be like here, with some actual UX input....)

We are trying to put back a formerly existing UI here.  It will be replaced by a better one, probably bug 695066 will handle it.

To make my proposal less lost in the text:

On places we want to show a dialog, just post an async event to the main thread with ref to callbacks and necessary info to show the dialog.  The decision and any QI/GI can be made async on the main thread.
That proposal would make me much happier, yes.  Esp if mCallbacks is not itself a reference to the docshell.
I disagree, fundamentally, that we have to show a dialog by default. Defaulting to a terrible UI does not make sense for Thunderbird, Firefox. If a XULRunner app really wants this terrible UI, it is easy to replicate themselves with a little copy/paste and by implementing nsISSLErrorListener and/or nsIBadCertHandler2. 

It is not a requirement in Gecko that we must avoid requiring XULRunner applications to make changes. We should avoid gratuitous changes, but the compatibility burden for XULRunner applications is on the XULRunner application itself. Everybody here knows this, because we experienced something much, much more tedious to work around for XULRunner applications when we removed XPCOM proxies.

Let's focus our effort on fixing bug 739563 and bug 695066. Note that our UX team is already aware of bug 695066 and I've pinged them about moving forward on that. Further, I've already reached out to the Thunderbird people to figure out what UI they want, and I am planning to help them with it. The Thunderbird solution should set a reasonable example for other XULRunner applications.

Keep in mind that this change has *already* shipped in *multiple* Firefox and Thunderbird releases. Showing a dialog box by default now is likely to result in regressions for addons to both pieces of software that are (probably unknowingly) relying on the suppression of the dialog box. If you look at implementations of nsIBadCertHandler2 and nsISSLErrorListener, in addons and in Thunderbird and Firefox and Gecko, almost all of them exist solely to suppress these dialog boxes. So, restoring the dialog box is not a "free fix."

Further, adding UI dependencies back into the PSM SSL layer is completely wrong with respect to dependency management between layers. The UI parts of PSM's SSL layer lead to a lot of problems like bug 177175, bug 729536, and other bugs.
(In reply to Ben Bucksch (:BenB) from comment #37)
> > avoid these low level components from showing UI. UI belongs in a higher layer.
> 
> A classical solution to this is to let the component (be it a server or
> library) return both a specific error code and an error message. This allows
> the caller (application) both to show a custom error UI/handling based on
> the error code, or to just show the error message from the lib in its own UI.

The caller of PR_Write/PR_Send/PR_Read/PR_Recv can get the error code by checking PR_GetError(). The XPCOM wrappers around the NSPR functions should be returning values comparable using nsINSSErrorsService::getXPCOMFromNSSError(). And, it can get an already-formatted error message by using other methods in nsINSSErrorsService, OR by using nsITransportSecurityInfo::getErrorMessage(). The latter method is how Firefox determines what error message to show on the cert error pages, for example.
If you think the automatic docshell detection code has threading related risks, we could use the second-best alternative proposal:

- remove DetectDocShellAndSetErrorPrompts from the patch

- never show an error prompt for HTTP, by activating (a) in the patch

- use the prompt for non-HTTP
  (unless the application overrides and suppresses, as usual)


(In reply to Honza Bambas (:mayhemer) from comment #45)
> 
> On places we want to show a dialog, just post an async event to the main
> thread with ref to callbacks and necessary info to show the dialog.  The
> decision and any QI/GI can be made async on the main thread.

(In reply to Boris Zbarsky (:bz) from comment #46)
> That proposal would make me much happier, yes.  Esp if mCallbacks is not
> itself a reference to the docshell.


The prompts will be triggered from the main thread, if that's what you meant.
Also, given that the prompts are already being executed from within the main thread, we could attempt to move the call to DetectDocShellAndSetErrorPrompts to that thread, too, just prior to potentially executing the prompts.
I will shortly attach 2 patches, for the proposals described in comment 49 and comment 50.
Solution as proposed in comment 49: never show prompts for any http.

But I like this less. Let's get back to it if necessary. I'll attach another patch.
Attachment #614187 - Attachment description: Patch v5 → Patch v5 (strategy: never prompts for http)
Attached patch Patch v6 (obsolete) — Splinter Review
Solution as proposed in comment 50:

Postpone the docshell detection until we execute on the main thread.
(Seems to work, I didn't get an error on the console)
(In reply to Honza Bambas (:mayhemer) from comment #40)
> 
> To confirm: the goal here is to open a single, non-overlapping dialog
> informing about that some resource somewhere failed to load because of an
> SSL error, right?

Not exactly.

The goal is to implement a fallback error dialog,
shown whenever we run in a context that isn't attached to a docshell, 
and the context hasn't explicitly asked to suppress the dialog.


> That is madness to do in Firefox.

If you still don't like my patch v6 (or v5), 
we could do the following, 
in addition to the code in patch v5:

- implement a minimal service, which carries a bool useSSLErrorPrompts

- the bool defaults to "show", to help any application by default

- in Firefox specific code (not xulrunner, not shared code),
  at some point during startup, we set that bool to "don't show"
> What I see is and was wrong with this code is that we are deciding to show
> or not to show a dialog by checking presence of security UI on docshell
> while calling on the callbacks from the socket thread. 

I believe patch v6 fixes that -  call moved to main thread.


> In my opinion, the will to show an ssl error dialog should be directed to a
> single service, that will by default show the error, but apps can override
> it (with contract id override, e.g.) to have a different behavior (that
> would be Firefox).

Yep, that sounds fine to me.

If you say that patch v6 isn't enough, I'm fine to implement that.
> @@ +255,5 @@
> > +        nsString str(s);
> > +        nsString empty;
> > +        rv = dialogs->ShowCertError(nsnull, status, ix509, 
> > +                                    str, 
> > +                                    empty, host, port);
> 
> status is unused by the implementation.
> 
> @@ +449,5 @@
> > +  // We must call SetCanceled prior to showing the prompt,
> > +  // because it sets the error code which will be used to create the error string.
> > +  mInfoObject->SetCanceled(result->mSuppressPrompt,
> > +                           result->mErrorCode,
> > +                           result->mErrorMessageType);
> 
> You actually don't need to do this.  nsHandleInvalidCertError reads just the
> message and status (unused) from the info object.  You could pass it
> directly.


Ok thanks, I'll address these comments if I get some encouraging comment that either approach 5 or 6 seems reasonable in general, combined with a global behaviour service that can be used to set a different policies for Firefox and other apps by default.
Attachment #613780 - Attachment is obsolete: true
Attachment #613780 - Flags: superreview?(bzbarsky)
Attachment #613780 - Flags: review?(rrelyea)
> > @@ +449,5 @@
> > > +  // We must call SetCanceled prior to showing the prompt,
> > > +  // because it sets the error code which will be used to create the error string.
> > > +  mInfoObject->SetCanceled(result->mSuppressPrompt,
> > > +                           result->mErrorCode,
> > > +                           result->mErrorMessageType);
> > 
> > You actually don't need to do this.  nsHandleInvalidCertError reads just the
> > message and status (unused) from the info object.  You could pass it
> > directly.

I remember why I did this.
The problem is, I don't have the string yet. All I have is the type and code.

By calling SetCanceled, the string will be produced and cached inside the socket-info-object.
Later, we retrieve the string from that object and use it in the prompt.

If you ask me to avoid calling SetCanceled, then I must invent another way to retrieve the error string.
Assignee: bsmith → nobody
Component: Security: PSM → Security
QA Contact: psm → toolkit
(In reply to Kai Engert (:kaie) from comment #56)
> > @@ +255,5 @@
> > > +        nsString str(s);
> > > +        nsString empty;
> > > +        rv = dialogs->ShowCertError(nsnull, status, ix509, 
> > > +                                    str, 
> > > +                                    empty, host, port);
> > 
> > status is unused by the implementation.

It's unused by our implementation.
But that's the way the interface is defined, and there might be embedding apps (not using security/manager/pki such as Camino) that might already use it.
Kai,

First, it is really confusing that you have morphed after the fix has been committed for six months. Please open a new bug. I am going to open a new bug about the dead code removal, which was the only reason this bug stayed open so long after the initial checkins in the first place. I am resolving this bug to reduce confusion.

I discussed this issue with Johnathan. He confirmed that my understanding of the Gecko development policies and guidelines are correct:

1. UI shouldn't be in Core, and definitely shouldn't be in low-level networking code like this. UI code belongs in the Firefox, Thunderbird, and B2G products. If you disagree with this, then please post on dev-platform about changing this. But, it we are unlikely to get a different resolution before the Aurora merge.

2. We are not responsible for maintaining XULRunner any more, beyond accepting non-troublesome bug fixes. We do not have to maintain feature parity or backward compatibility with XULRunner apps. Apparently, this was announced last year, though I could not find the announcement. Again, if you disagree with this, please change this policy first. PSM shouldn't have its own policy regarding this.

3. Our UX team has given us guidance that modal dialog boxes are horrible, especially ones that pop up in content windows like this. To add a modal dialog box would be going directly against the advice of of our UX people. The UX people are in charge of designing the UX; we can give feedback, but we shouldn't do something that is in complete opposition to their goals and strategies.

4. We already have open bugs for fixing the issue in Firefox and Thunderbird. I believe we can get those bugs fixed in the next 10 days, if you are willing to help. Like I said, I've already contacted the UX people about it, and I am willing to help with the implementation.

5. The longer we go back and forth about this, the more we delay the solution for Firefox and Thunderbird, and we put it ourselves risk of missing the next merge window.

So, let's work on the proper fixes for Firefox and Thunderbird so that we can solve the problem that you are trying to solve in a way that meets the project-wide requirements as well as our local requirements in the PSM module.
Status: NEW → RESOLVED
Closed: 12 years ago
Component: Security → Security: PSM
QA Contact: toolkit → psm
Resolution: --- → FIXED
Whiteboard: [Checked in in comments 14, 15, 16]
Target Milestone: --- → mozilla10
Reopening. I disagree with comment 59.

I have a fix in hand that will make everyone happy.
I'm performing final testing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Patch v7 (obsolete) — Splinter Review
This patch implements a fallback prompt, which will be used ONLY if the default platform behaviour is used.

I have attached a patch to bug 744945 that will completely disable any fallback prompts in Firefox.
Attachment #614187 - Attachment is obsolete: true
Attachment #614190 - Attachment is obsolete: true
Attachment #614556 - Flags: superreview?(bzbarsky)
Attachment #614556 - Flags: review?(honzab.moz)
Brian apparently you completely misunderstood the intention.

The intention is to provide a fallback prompt, if no other code handles an error. You are free to handle the errors, and you'll never see the prompt. You're also free to land the patch in bug 744945, which will completely disable all such fallback prompts in the Firefox product, without having to individually handle each scenario.
(In reply to Brian Smith (:bsmith) from comment #59)
> Kai,
> 
> First, it is really confusing that you have morphed after the fix has been
> committed for six months.

But you never marked the fix as fixed, probably because I had complained about the patch early, and you didn't know how to solve that dilemma.


> Please open a new bug.

No. Your patch to this bug was unacceptable, and I'm cleaning up behind you.


> 1. UI shouldn't be in Core, and definitely shouldn't be in low-level
> networking code like this.

You can clean this up later, after we have fixed this mess.

In the meantime this prompt isn't causing any harm for Firefox, because you can disable it completely in Firefox with the patch in bug 744945.


> UI code belongs in the Firefox, Thunderbird, and
> B2G products. If you disagree with this, then please post on dev-platform
> about changing this.

I don't disagree in general, but a reasonable short term, which doesn't cause any harm for the primary product, but rather provides a helpful immediately necessary workaround for all secondary products, should be more important than insisting on some general purpose rule.


> 2. We are not responsible for maintaining XULRunner any more,

It's fine that you feel not responsible, but I do. I fully understand that you didn't care, we can see it in the way you attempted to fix this bug, by ripping out helpful user diagnostic support.


> beyond
> accepting non-troublesome bug fixes. We do not have to maintain feature
> parity or backward compatibility with XULRunner apps.

We do not have to, but I absolutely want to, and I've worked hard to produce this patch which is a compromise that I find very hard to complain about.


> 3. Our UX team has given us guidance that modal dialog boxes are horrible,

I'm not proposing to show modal dialogs. But certainly users aware of the reason why we reject the connection to a broken server is much better than users deciding to drop Thunderbird because it doesn't work for no apparent reason?


> especially ones that pop up in content windows like this. 

Again, this patch, together with bug 744945 won't show ANY prompt in a content window in Firefox.

It was your initial assignment to find a smart solution to this bug while keeping the fallback error notification, instead of removing all such fallbacks.


> 4. We already have open bugs for fixing the issue in Firefox and
> Thunderbird. I believe we can get those bugs fixed in the next 10 days, if
> you are willing to help.

I believe I have already produced the correct solution.


> 5. The longer we go back and forth about this,

I'm not going back and forth at all. You broke it, I waited for you to fix it, you didn't, now I'm fixing it.
Assignee: nobody → kaie
Comment on attachment 614556 [details] [diff] [review]
Patch v7

Please nix the confusing "BHV_" prefix on the constants, and sr=me
Attachment #614556 - Flags: superreview?(bzbarsky) → superreview+
Comment on attachment 614556 [details] [diff] [review]
Patch v7

For all the reasons I stated multiple times above.
Attachment #614556 - Flags: review-
Brian, if Honza or Bob gives r+, I'll ignore your r-
I believe Dan Veditz agrees with my proposal.
(In reply to Kai Engert (:kaie) from comment #63)
> > 1. UI shouldn't be in Core, and definitely shouldn't be in low-level
> > networking code like this.
> 
> You can clean this up later, after we have fixed this mess.
> 
> In the meantime this prompt isn't causing any harm for Firefox, because you
> can disable it completely in Firefox with the patch in bug 744945.

It does, because it wastes our time. It isn't reasonable to insist on an poor solution to a problem and then insist that somebody else create a reasonable one so that you don't have to. Please stop wasting our time.
 
> > 2. We are not responsible for maintaining XULRunner any more,
> 
> It's fine that you feel not responsible, but I do. I fully understand that
> you didn't care, we can see it in the way you attempted to fix this bug, by
> ripping out helpful user diagnostic support.

It doesn't matter. The project decided that XULRunner is not worth investing in any longer. It isn't reasonable to insist on other Gecko developers, and especially other PSM developers, to follow your preferences here. 

> We do not have to, but I absolutely want to, and I've worked hard to produce
> this patch which is a compromise that I find very hard to complain about.

> Again, this patch, together with bug 744945 won't show ANY prompt in a
> content window in Firefox.

Doesn't matter. The presence of the code in the Core product, and the time that would be wasted adding bad code to Core, are both harmful to Firefox. It doesn't make sense to add dead code to Firefox to make some XULRunner app developers' lives slightly easier.
The next merge is April 24th. I propose that, at least, we avoid making a final decision as to what to do in this bug until the 19th which would give us 5 days before the merge window.

That would give us a week to fix bug 739563. I believe the fix for bug 739563 is very simple--much simpler than this code and without all the problems that it has. I will try to find a solution for bug 695066 in that timeframe too.
Comment on attachment 614556 [details] [diff] [review]
Patch v7

I just realized that this patch makes it so that one has to explicitly opt out of the dialog.  I don't think that's reasonable; the UI for this should be opt-on, especially because different applications would have very different UI requirements for this sort of thing.
Attachment #614556 - Flags: superreview+ → superreview-
And to be clear, my initial sr just covered the IDL, basically; I have not done a thorough review of the general approach in the patch and I expect that to be sorted out through the normal module peer review process.  If at some point I _am_ needed to do a more thorough review of the overall approach, please tell me and I'll put in the time to do that, of course.
I would have preferred opt-out, but I can live with the one-line change to change this patch to opt-in.
Attached patch Patch v8Splinter Review
- changed the default to PROMPT_SUPPRESS.
- removes the BHV_ prefix from constants.
Attachment #614556 - Attachment is obsolete: true
Attachment #614556 - Flags: review?(honzab.moz)
Attachment #614708 - Flags: review?(honzab.moz)
Blocks: 744945
Blocks: 745132
Blocks: 745133
Comment on attachment 614708 [details] [diff] [review]
Patch v8

Review of attachment 614708 [details] [diff] [review]:
-----------------------------------------------------------------

OK.  The patch is still too complicated.  Since it has to be replaced, it has to be built to be replaceable easily, i.e. modify as few as possible of the existing code.  

My idea was to let the service show the dialog and not the core code.

I don't like modifying so many methods signatures and changing places where called on so many places just to carry a single flag.

[To save a thread loop,] let's do the check for behavior in CertErrorRunnable::CheckCertOverrides() and nsHandleSSLError, as your patch does.  The code that check the docshell has to be in the service's new method CheckBehavior(infoObject).

Set the state to show or not to show the prompt on info object with a new method nsNSSSocketInfo::SetDefaultPromptShow(true/false) at [1] and [2].  That method also checks the PSM UI tracker.

SSLServerCertVerificationResult::Run(), after call to SetCertVerificationResult that fills the info object with all info, reads this flag from the info object and if found |true| calls asynchronously (via nsRunnable probably) the new service's method ShowPrompt[Whatever](infoObject).  Info object gives to the service the message in a demanded format with a new method unichar* nsNSSSocketInfo::GetErrorMessageCustom(type).


Would this work?


[1] http://hg.mozilla.org/mozilla-central/annotate/d373afa244ee/security/manager/ssl/src/SSLServerCertVerification.cpp#l388
[2] http://hg.mozilla.org/mozilla-central/annotate/869edbbfad81/security/manager/ssl/src/nsNSSIOLayer.cpp#l1595

::: security/manager/ssl/public/nsISSLErrorPromptFallback.idl
@@ +104,5 @@
> +    *   If the socket isn't associated to a docshell,
> +    *   then showing the fallback prompt is allowed
> +    *
> +    */
> +  const unsigned short DETECT_DOCSHELL = 3;

DETECT_REPORTING_BY_ERROR_PAGE ?

::: security/manager/ssl/src/SSLServerCertVerification.cpp
@@ +215,5 @@
>    }
>  }
>  
> +static nsresult
> +nsHandleInvalidCertError(nsNSSSocketInfo *socketInfo, 

This should be method of the new service.

::: security/manager/ssl/src/nsNSSIOLayer.cpp
@@ +1679,3 @@
>  static void
> +nsHandleSSLError(nsNSSSocketInfo *socketInfo, 
> +                 ::mozilla::psm::SSLErrorMessageType errtype, 

errtype is unused, this just unnecessarily complicates the patch.
You're asking me to change a patch to perfection, that you plan to replace anyway?
I give up.

I tried my best to help the Mozilla mail products, and other code using the Mozilla technology.

If you don't want my help, and if you rather live with frustrated users, so be it.

I would have appreciated some support from the owners of the mail product, but if you don't want this fix, I should stop fighting for it.
Assignee: kaie → nobody
Plan:
- Brian will talk with TB people about a potentially one or two line fix for the original issue
- If it fails, I'll update this patch according my and Brian's suggestions and let Brian review it
- Goal is to finish or abandon this bug until the next Aurora merge on April 24th
Assignee: nobody → honzab.moz
Status: REOPENED → ASSIGNED
Comment on attachment 614708 [details] [diff] [review]
Patch v8

r- in comment 74.
Attachment #614708 - Flags: review?(honzab.moz) → review-
I looked into this issue more today. Thunderbird implements nsIBadCertListener and/or nsISSLErrorListener in various situations in a way where they've suppressed certificate error messages from being shown for a very long time. See http://mxr.mozilla.org/comm-central/source/mail/base/content/mailWindow.js#552 for example.

Like I mentioned in mail, Thunderbird uses a technique very similar to the "MiTM Me" addon to get the exception dialog box to pop up when there is a cert error. Interestingly, Thunderbird put up the exception dialog box when there was a domain mismatch error, but did not do so for the MD5 cert error. Looking at Thunderbird's code, it does not try to figure out whether the cert error is overridable or not, so I am surprised that this happened. I suspect that this is either a bug in Thunderbird's code, or in the code for the cert override dialog box; in either case, that should be handled in a separate bug: bug 739563.

IMO, Thunderbird needs to make multiple changes to its bad cert handling code. First, I strongly believe the "MitM Me" UI is the wrong one, so the basis of the code is wrong. But, even if the Thunderbird team disagrees with me, Thunderbird should *at least* try to determine whether the cert error is overridable and show different UI for overridable vs. non-overridable errors. This should be handled in a separate bug, such as bug 739563.

Most importantly, since the default UX proposed in this bug is wrong for Thunderbird *and* wrong for Desktop Firefox *and* wrong for Mobile Firefox, and because of all the other reasons that I noted above, we should just take the fix for this bug that has already been committed for six months.

Also, because mozilla-central is closed for approval-only checkins, and I think that no changes to Gecko will (or should) get approved, there's not much we could do in this bug for Firefox 14 or Thunderbird 14 anyway.

Honza, if you agree then please mark this bug RESOLVED: FIXED and Assigned: bsmith@mozilla.com so that we can properly track of the already-commited changes, and I will file the follow-up bug for the review of the dead code removal patch in the attachment 565141 [details] [diff] [review].

Kai, if you disagree with this, please email Brendan Eich directly. I have already discussed it with him.
We have no attachment to the MITM Me UI, but that's the UI we could get from the core code, so we went with it. If there's better UI we could put up, we're all for it.
(In reply to Brian Smith (:bsmith) from comment #79)
> Looking at Thunderbird's code, it does not try to figure out whether the
> cert error is overridable or not, so I am surprised that this happened. I
> suspect that this is either a bug in Thunderbird's code, or in the code for
> the cert override dialog box; in either case, that should be handled in a
> separate bug: bug 739563.

I believe that I made an error when I did the SSL thread removal and/or XPCOM proxy removal: 

  if (!collected_errors)
  {
    // This will happen when CERT_*Verify* only returned error(s) that are
    // not on our whitelist of overridable certificate errors.
    PR_LOG(gPIPNSSLog, PR_LOG_DEBUG, ("[%p] !collected_errors: %d\n",
           fdForLogging, static_cast<int>(defaultErrorCodeToReport)));
    PR_SetError(defaultErrorCodeToReport, 0);
    return nsnull;
  }

I will post the fix in bug 739563. I believe this will re-introduce the bug that Thunderbird shows the cert override dialog box even for non-overridable cert errors. If so, I will file a new bug in the Thunderbird component for that and I will provide the fix, which should be simple.
Assignee: honzab.moz → bsmith
Status: ASSIGNED → RESOLVED
Closed: 12 years ago12 years ago
Resolution: --- → FIXED
If we won't have any dialog as a last resort, maybe bug 688810 should get some priority.
What API change happened here? Although there was a patch that removed API, as far as I can tell, it didn't land. I think the net is no doc change needed after all.
Before:
When you create a HTTPS (or other SSL-based) channel, and a cert error (e.g. expired certificate) or other SSL error (e.g. secure renegotiation failure) occurred, a dialog box would pop up, unless the callbacks you associate with the channel implement nsIBadCertListener2 or nsISSLErrorListener to suppress the dialogs.

After:
It is no longer necessary to implement nsIBadCertListener2 or nsISSLErrorListener to suppress the UI, because there is no default UI. If you want dialog boxes then your application must implement those callback interfaces and to supply a UI.
> After: ... no ... UI.

And the behaviour is block?
(In reply to Ben Bucksch (:BenB) from comment #85)
> > After: ... no ... UI.
> 
> And the behaviour is block?

Right, the connection will fail. The only intended functional change was the removal of the default dialog boxes.
Blocks: 783974
Depends on: 916728
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: