Closed Bug 1313595 Opened 3 years ago Closed 3 years ago

Lower timeout on HSTS Priming channels

Categories

(Core :: DOM: Security, defect, P2, major)

51 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla53
Tracking Status
firefox51 + wontfix
firefox52 - wontfix
firefox53 - verified

People

(Reporter: kmckinley, Assigned: kmckinley)

References

(Depends on 1 open bug)

Details

(Keywords: perf, regression, topperf, Whiteboard: [domsecurity-backlog1] [hsts-priming])

Attachments

(1 file, 4 obsolete files)

In certain cases where the HSTS Priming request times out, there is a 30 second delay before the load proceeds for loads which are not blocked. If there is a redirect to another HTTP resource which also times out, an additional 30 seconds are required to resolve the priming request.

Suggested timeout is 10 seconds. (Source: https://goo.gl/S5ARGX 95% of HTTP sub-loads happen in ~ 8.82s)
Priority: -- → P3
Whiteboard: [domsecurity-backlog1]
It looks like there is a way to set the timeout on a socket level (nsISocketTransport). Very open to different suggestions, or if there might be a better place to put it.
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

r=me on the loadInfo bits. I suppose Honza can tell you whether Necko bits are ok.
Attachment #8806247 - Flags: review?(ckerschb) → review+
Whiteboard: [domsecurity-backlog1] → [domsecurity-backlog1] [hsts-priming]
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

https://reviewboard.mozilla.org/r/89744/#review89696

I may rethink my assertion in https://bugzilla.mozilla.org/show_bug.cgi?id=407190#c45

As I understand, the priming request actually blocks the page load, right?  first of all, the priority of the channel should be set to "highest" (see nsISupportsPriority) and I believe the timeout should rather be measured on the top most level, ignoring all http transaction queues, connection and socket timeouts.  user more cares about the page to load than to be super-secure.  I know, it sucks, but that's the reality.  

so I'd implement it rather simple as: if you don't make it from AsyncOpen in 10 seconds (I would agree even with just 3 seconds here) you fail the priming request with the result "I don't know" and unblock the page load as if there were no priming.

make it a preference.  you can use higher values for tests to make them less volatile.

::: netwerk/base/nsILoadInfo.idl:665
(Diff revision 1)
>     */
>    [noscript, infallible] readonly attribute boolean mixedContentWouldBlock;
> +  /**
> +   * Whether this is a priming channel, so lower timeouts can be enforced.
> +   */
> +  [noscript, infallible] attribute boolean isHSTSPrimingChannel;

if you need to carry this to the channel, I think you should add it on nsIHttpChannelInternal rather than on nsILoadInfo

::: netwerk/protocol/http/nsHttpChannel.cpp:6996
(Diff revision 1)
> +            if (mLoadInfo) {
> +                bool isPrimingChannel = false;
> +                rv = mLoadInfo->GetIsHSTSPrimingChannel(&isPrimingChannel);
> +                NS_ENSURE_SUCCESS(rv, rv);
> +                if (isPrimingChannel) {
> +                    socketTransport->SetTimeout(nsISocketTransport::TIMEOUT_CONNECT, 10);

I believe this is set too late.
Attachment #8806247 - Flags: review?(honzab.moz) → review-
Track 51+ to reduce the timeout for bug 1311807.
Kate, do we have a follow up on comment 5, given that this is meant to uplift to 51?
Flags: needinfo?(kmckinley)
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

https://reviewboard.mozilla.org/r/89744/#review91608

::: netwerk/protocol/http/HSTSPrimerListener.cpp:43
(Diff revisions 1 - 3)
> +HSTSPrimingListener::HSTSPrimingListener(nsIHstsPrimingCallback* aCallback)
> +  : mCallback(aCallback)
> +{
> +  if (sHSTSPrimingTimeout == 0) {
> +    Preferences::AddUintVarCache(&sHSTSPrimingTimeout,
> +        "security.mixed_content.hsts_priming_request_timeout");

what if the preference is set to 0?

maybe do:

static nsresult rv = Preferences::AddUintVarCache(...);
Unused << rv;

(include "mozilla/Unused.h")

::: netwerk/protocol/http/HSTSPrimerListener.cpp:287
(Diff revisions 1 - 3)
> -  // Set up listener which will start the original channel
> -  nsCOMPtr<nsIStreamListener> primingListener(new HSTSPrimingListener(aCallback));
> +  // Image channels are loaded by default with reduced priority.
> +  nsCOMPtr<nsISupportsPriority> p = do_QueryInterface(primingChannel);
> +  if (p) {
> +    uint32_t priority = nsISupportsPriority::PRIORITY_HIGHEST;
>  
> +    p->AdjustPriority(priority);

use SetPriority

::: netwerk/protocol/http/HSTSPrimerListener.cpp:294
(Diff revisions 1 - 3)
> +
> +  // Set up listener which will start the original channel
> +  HSTSPrimingListener* listener = new HSTSPrimingListener(aCallback);
> +  nsCOMPtr<nsIStreamListener> primingListener(listener);
>    // Start priming
>    rv = primingChannel->AsyncOpen2(primingListener);

nit: you can just pass listener directly?

::: netwerk/protocol/http/HSTSPrimerListener.cpp:303
(Diff revisions 1 - 3)
> +  NS_ENSURE_STATE(timer);
> +
> +  rv = timer->InitWithFuncCallback(OnHSTSPrimingTimeout,
> +                                   primingChannel,
> +                                   sHSTSPrimingTimeout,
> +                                   nsITimer::TYPE_ONE_SHOT);

since you are providing a referencable closure (the channel) you must use either nsITimer.init(nsIObserver) or nsITimer.initWithCallback(nsITimerCallback) to handle referencing well.

problem with a static function is that it may be called after the channel is already gone.

I think HSTSPrimingListener should implement the observer/timer-callback interface instead.

also, when the channel makes it sooner than the timer fires, you must cancel the timer, which I don't see in the patch

::: netwerk/protocol/http/HSTSPrimerListener.cpp:306
(Diff revisions 1 - 3)
> +                                   primingChannel,
> +                                   sHSTSPrimingTimeout,
> +                                   nsITimer::TYPE_ONE_SHOT);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  listener->mHSTSPrimingTimeout = timer;

s/mHSTSPrimingTimeout/mHSTSPrimingTimer/

::: netwerk/protocol/http/nsHttpChannel.cpp:6996
(Diff revisions 1 - 3)
>          nsCOMPtr<nsISocketTransport> socketTransport =
>              do_QueryInterface(trans);
>          if (socketTransport) {
>              if (mLoadInfo) {
>                  bool isPrimingChannel = false;
> -                rv = mLoadInfo->GetIsHSTSPrimingChannel(&isPrimingChannel);
> +                rv = GetIsHSTSPrimingChannel(&isPrimingChannel);

I think you can just use the mIsPrimingChannel member?

::: netwerk/protocol/http/nsHttpChannel.cpp:6999
(Diff revisions 1 - 3)
>              if (mLoadInfo) {
>                  bool isPrimingChannel = false;
> -                rv = mLoadInfo->GetIsHSTSPrimingChannel(&isPrimingChannel);
> +                rv = GetIsHSTSPrimingChannel(&isPrimingChannel);
>                  NS_ENSURE_SUCCESS(rv, rv);
>                  if (isPrimingChannel) {
>                      socketTransport->SetTimeout(nsISocketTransport::TIMEOUT_CONNECT, 10);

OTOH, I don't think you need this code when you measure timeout on the upper layer.

::: netwerk/protocol/http/nsHttpChannel.cpp:8039
(Diff revisions 1 - 3)
> +{
> +    bool wouldBlock = mLoadInfo->GetMixedContentWouldBlock();
> +    LOG(("HSTS Priming timed out [this=%p], %s the load", this,
> +                (wouldBlock) ? "blocking" : "allowing"));
> +    // A priming request was sent, and no HSTS header was found that allows
> +    // the upgrade.

is this comment right?

::: netwerk/protocol/http/nsHttpChannel.cpp:8044
(Diff revisions 1 - 3)
> +    // the upgrade.
> +    Telemetry::Accumulate(Telemetry::MIXED_CONTENT_HSTS_PRIMING_RESULT,
> +            (wouldBlock) ?  HSTSPrimingResult::eHSTS_PRIMING_TIMEOUT_BLOCK :
> +                            HSTSPrimingResult::eHSTS_PRIMING_TIMEOUT_ACCEPT);
> +
> +    return CacheHSTSPrimingAndContinue(NS_ERROR_CONTENT_BLOCKED, wouldBlock);

is the priming channel canceled?  because when it makes it later it will call other priming callbacks on the request channel and it will totally break..

other reason to move the timer notification to the listener class where you also throw the callback reference away (to ensure that any of the hsts priming callbacks on the request channel can only be called once)

::: netwerk/protocol/http/HSTSPrimerListener.h:112
(Diff revision 3)
> +  /**
> +   * Keep a handle to the timer around
> +   */
> +  nsCOMPtr<nsITimer> mHSTSPrimingTimeout;
> +
> +  static uint32_t sHSTSPrimingTimeout;

some comment please
Attachment #8806247 - Flags: review?(honzab.moz) → review-
(In reply to Milan Sreckovic [:milan] from comment #9)
> Kate, do we have a follow up on comment 5, given that this is meant to
> uplift to 51?

It's in progress.
Flags: needinfo?(kmckinley)
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

https://reviewboard.mozilla.org/r/89744/#review91928

OK, stiff few nits to fix, then r+ with everything fixed.

::: modules/libpref/init/all.js:5521
(Diff revision 6)
>  pref("security.mixed_content.use_hsts", true);
>  #endif
>  // Approximately 1 week default cache for HSTS priming failures
>  pref ("security.mixed_content.hsts_priming_cache_timeout", 10080);
> +// Force the channel to timeout in 3 seconds if we have not received
> +pref ("security.mixed_content.hsts_priming_request_timeout", 3000);

please add to the coment a note that the preference expects being set to number of milliseconds, just to make sure the units are clear.

::: netwerk/protocol/http/HSTSPrimerListener.h:113
(Diff revision 6)
>     * the nsIHttpChannel to notify with the result of HSTS priming.
>     */
>    nsCOMPtr<nsIHstsPrimingCallback> mCallback;
> +
> +  /**
> +   * Keep a handle to the timer around

nice comment, but also please say why :)

::: netwerk/protocol/http/HSTSPrimerListener.cpp:30
(Diff revision 6)
>  using namespace mozilla;
>  
>  NS_IMPL_ISUPPORTS(HSTSPrimingListener, nsIStreamListener,
>                    nsIRequestObserver, nsIInterfaceRequestor)
>  
> +uint32_t HSTSPrimingListener::sHSTSPrimingTimeout = 0;

nit: sync this with the default you've added in all.js, i.e. this should be 3000.

::: netwerk/protocol/http/HSTSPrimerListener.cpp:72
(Diff revision 6)
> +NS_IMETHODIMP
> +HSTSPrimingListener::OnStartRequest(nsIRequest *aRequest,
> +                                    nsISupports *aContext)
> +{
> +  nsCOMPtr<nsIHstsPrimingCallback> callback(mCallback);
> +  mCallback = nullptr;

a bit more optimal is:
nsCOMPtr<nsIHstsPrimingCallback> callback;
callback.swap(mCallback);

this moves the pointer from the member to the local comptr and leaves the member empty (what we want), saves one addref/release

::: netwerk/protocol/http/HSTSPrimerListener.cpp:82
(Diff revision 6)
> +  nsresult primingResult = CheckHSTSPrimingRequestStatus(aRequest);
> +  ReportTiming(primingResult);
> +
> +  if (mHSTSPrimingTimer) {
> +    nsresult rv = mHSTSPrimingTimer->Cancel();
> +    Unused << rv;

first, please don't declare nsresult rv inside a block, always as the first variable in a function.  then, just do (if necessary) Unused << mHSTSPrimingTimer->Cancel();

and, maybe cancel the timer before you check the callback?

::: netwerk/protocol/http/HSTSPrimerListener.cpp:177
(Diff revision 6)
> +{
> +  nsCOMPtr<nsIHstsPrimingCallback> callback(mCallback);
> +  mCallback = nullptr;
> +  NS_ENSURE_STATE(callback);
> +  ReportTiming(NS_ERROR_HSTS_PRIMING_TIMEOUT);
> +  return callback->OnHSTSPrimingFailed(NS_ERROR_HSTS_PRIMING_TIMEOUT, false);

the result actually goes nowhere

and, who cancels the priming channel?

::: netwerk/protocol/http/HSTSPrimerListener.cpp:322
(Diff revision 6)
> +  NS_ENSURE_STATE(timer);
> +
> +  rv = timer->InitWithCallback(listener,
> +                               sHSTSPrimingTimeout,
> +                               nsITimer::TYPE_ONE_SHOT);
> +  NS_ENSURE_SUCCESS(rv, rv);

turn this to be non-fatal please.  because you have already opened the priming channel anyway

::: netwerk/protocol/http/HSTSPrimerListener.cpp:324
(Diff revision 6)
> +  rv = timer->InitWithCallback(listener,
> +                               sHSTSPrimingTimeout,
> +                               nsITimer::TYPE_ONE_SHOT);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  listener->mHSTSPrimingTimer = timer;

.swap() will be better here too

::: xpcom/base/ErrorList.h:773
(Diff revision 6)
>    /* Error code for XBL */
>    ERROR(NS_ERROR_XBL_BLOCKED,                   FAILURE(15)),
>    /* Error code for when the content process crashed */
>    ERROR(NS_ERROR_CONTENT_CRASHED,               FAILURE(16)),
> +  /* Error code for when HSTS priming request times out */
> +  ERROR(NS_ERROR_HSTS_PRIMING_TIMEOUT,          FAILURE(17)),

place this to the network module please
Attachment #8806247 - Flags: review?(honzab.moz) → review+
There are still a lot of issues with HSTS priming even with the timeout reduction, and I'm getting tons of complaints from visitors running the dev build regarding performance being ridiculously slow. For sites that rely on loading content from external hosts where none of the workarounds can be applied, this makes the site literally unusable.

1) Why was the decision made to hijack an existing established header for this mechanism? If a site wanted this functionality, it should be able to signal it somehow, or at the very least signal an opt-out.

2) The mechanism seems to be taking place even if a site is no longer sending the Strict-Transport-Security header, so with the typical long cache periods there is not even a way to back out of it when it inevitably breaks site functionality. If an opt-out signal is not added, the mechanism should absolutely only take place if the origin request is *currently* sending the header.

3) The mechanism is even attempted for a HTTP request to a non-standard port. I would think that the cases where HTTP content retrieved from a port except for 80, and it will be able to successfully request the same content from port 443, are few and far between.

This absolutely needs a lot more work to prevent breaking existing sites in ways that cannot be fixed, and cannot be deployed live in any fashion in the current state.
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

https://reviewboard.mozilla.org/r/89744/#review91928

> the result actually goes nowhere
> 
> and, who cancels the priming channel?

Can you take a look at the current diff? Updated to cancel the priming channel, then fail priming, but open to swapping order. Since the result goes nowhere, returning NS_OK.
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

https://reviewboard.mozilla.org/r/89744/#review92278

::: netwerk/protocol/http/HSTSPrimerListener.cpp:75
(Diff revisions 6 - 9)
>                                      nsISupports *aContext)
>  {
> -  nsCOMPtr<nsIHstsPrimingCallback> callback(mCallback);
> -  mCallback = nullptr;
> +  nsCOMPtr<nsIHstsPrimingCallback> callback;
> +  callback.swap(mCallback);
> +  if (mHSTSPrimingTimer) {
> +    Unused << mHSTSPrimingTimer->Cancel();

and drop reference to the timer as well please

::: netwerk/protocol/http/HSTSPrimerListener.cpp:327
(Diff revisions 6 - 9)
>      p->SetPriority(priority);
>    }
>  
>    // Set up listener which will start the original channel
>    HSTSPrimingListener* listener = new HSTSPrimingListener(aCallback);
>    //nsCOMPtr<nsIStreamListener> primingListener(listener);

remove this line?
Comment 16 ^
Flags: needinfo?(kmckinley)
Hi Heidi, thank you for your feedback. I would like to note that this patch is not yet landed, so the lower timeouts are not yet in effect.

(In reply to heidi from comment #16)
> There are still a lot of issues with HSTS priming even with the timeout
> reduction, and I'm getting tons of complaints from visitors running the dev
> build regarding performance being ridiculously slow. For sites that rely on
> loading content from external hosts where none of the workarounds can be
> applied, this makes the site literally unusable.

Can you provide an example?

> 1) Why was the decision made to hijack an existing established header for
> this mechanism? If a site wanted this functionality, it should be able to
> signal it somehow, or at the very least signal an opt-out.

Sites can opt-out of HSTS by sending the Strict-Transport-Security header with a max-age of 0 over a secure channel. HSTS is one of several signal sites can provide to indicate they want to be upgraded to HTTPS. At the moment, HSTS upgrades do not happen prior to blocking for mixed-content, and HSTS Priming is an experiment to be able to change that. The issue with only swapping the enforcement of mixed-content with HSTS is that there is no reliable way for resources specified over HTTP to signal they want that upgrade, and HSTS Priming allows sites to opt-in.

> 2) The mechanism seems to be taking place even if a site is no longer
> sending the Strict-Transport-Security header, so with the typical long cache
> periods there is not even a way to back out of it when it inevitably breaks
> site functionality. If an opt-out signal is not added, the mechanism should
> absolutely only take place if the origin request is *currently* sending the
> header.

Best practices for sites who want to turn HSTS on is to start with a shorter timeout rather than going to a long 18 week timeout. A site which needs to back out of HSTS can send the header with max-age of 0, which will clear the browser's cache.

> 3) The mechanism is even attempted for a HTTP request to a non-standard
> port. I would think that the cases where HTTP content retrieved from a port
> except for 80, and it will be able to successfully request the same content
> from port 443, are few and far between.

Thank you for this, we will take a look at it.

> This absolutely needs a lot more work to prevent breaking existing sites in
> ways that cannot be fixed, and cannot be deployed live in any fashion in the
> current state.

It is not currently slated to be turned on in Release.
Flags: needinfo?(kmckinley)
Priority: P3 → P2
(In reply to Kate McKinley [:kmckinley] from comment #24)
> Can you provide an example?

The problems arise when a site loads resources from diverse external hosts, for example if you do CDN on the application level rather than the DNS level, as the efficiency of the priming cache is hampered. A three second timeout is obviously much better than a 30-60 second timeout, but it would still slow down the load for many of these resources by three seconds, especially if the TTL of the priming cache is as low as the suggested one day.

> Sites can opt-out of HSTS by sending the Strict-Transport-Security header
> with a max-age of 0 over a secure channel. HSTS is one of several signal
> sites can provide to indicate they want to be upgraded to HTTPS. At the
> moment, HSTS upgrades do not happen prior to blocking for mixed-content, and
> HSTS Priming is an experiment to be able to change that. The issue with only
> swapping the enforcement of mixed-content with HSTS is that there is no
> reliable way for resources specified over HTTP to signal they want that
> upgrade, and HSTS Priming allows sites to opt-in.

Thanks for the tip about backing out of HSTS.

My suggestion is not to trust any signaling sent over insecure HTTP connections. There should however be a way for the site itself to signal over the secure HTTPS connection, possibly via additional arguments to the Strict-Transport-Security header, whether the insecure HTTP resources can be upgraded or not. So for example, if the site knows that the HTTP resources cannot be upgraded, it could send something like:

Strict-Transport-Security: max-age=604800; preload; no-priming

It could still default to attempt HSTS priming, but this would preempt any issues from slowdowns caused by timeouts of resources that are known to not be HTTPS-retrievable.

> Best practices for sites who want to turn HSTS on is to start with a shorter
> timeout rather than going to a long 18 week timeout. A site which needs to
> back out of HSTS can send the header with max-age of 0, which will clear the
> browser's cache.

This makes sense, but the issue here arises if browsers start to interpret the header differently at some point in the future. It is however not a significant problem if you can indeed clear the cache.
Pushed by kmckinley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a8be4ebc85cf
Reduce timeout for HSTS priming channels r=mayhemer
https://hg.mozilla.org/mozilla-central/rev/a8be4ebc85cf
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Depends on: 1317115
Backout by philringnalda@gmail.com:
https://hg.mozilla.org/mozilla-central/rev/5e7676832766
Backed out changeset a8be4ebc85cf for permaorange unexpected assertion in test_referrerdirective.html, a=backout
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla52 → ---
I'm intentionally leaving this on mozilla-aurora, since bug 1317115 has a simple test-only patch that should fix it there.
Pushed by philringnalda@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3472d9d9dd47
Reduce timeout for HSTS priming channels r=mayhemer
Depends on: 1317298
Depends on: 1317624
https://hg.mozilla.org/mozilla-central/rev/3472d9d9dd47
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Assignee: kmckinley → sankha93
(In reply to Sankha Narayan Guria [:sankha] from comment #33)
> Created attachment 8811139 [details] [diff] [review]
> Backed out changeset a8be4ebc85cf () for permaorange unexpected assertion in
> test_referrerdirective.html,
> 
> MozReview-Commit-ID: GxBqDrHHg7z

Why was this backed out and reassigned?
> Why was this backed out and reassigned?

I am sorry, I was struggling to set up the bzexport extension today morning on my machine and probably this got pushed from my account because of it. Nothing has been backed out, just that changeset was attached here as an attachment.

Sorry for the noise.
Assignee: sankha93 → kmckinley
Attachment #8811139 - Attachment is obsolete: true
Blocks: 1308612
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

Approval Request Comment
[Feature/regressing bug #]:
HSTS Priming causes long delays in multiple redirect situations such as audio players. (bug 1311807)

[User impact if declined]:
Users wishing to use HSTS Priming will experience long delays on certain sites.

[Describe test coverage new/current, TreeHerder]:
Added new test (in patch) to ensure the timeout functions appropriately.

[Risks and why]:
Risk should be limited to users using HSTS priming. In Release, users will need to flip a pref to use priming.

[String/UUID change made/needed]:
None
Attachment #8806247 - Flags: approval-mozilla-beta?
Hi Kate, 
Since this feature won't be enabled in 51, I would like this to ride the train. Can we turn both prefs off in 51?
Flags: qe-verify+
Flags: needinfo?(kmckinley)
That's fine, we can tell people to use 52 or higher to test.
Flags: needinfo?(kmckinley)
Comment on attachment 8806247 [details]
Bug 1313595 - Lower HSTS priming timeout

Let's let this ride the train on 52. Mark 51 won't fix.
Attachment #8806247 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Depends on: 1317517
Depends on: 1319606
I personally vote for backing this out now from all branches.

Kate, please write automated tests for this functionality, for both timeout and non-timeout case.  Also, add traceable channel on channels you are about to test with so that we find out why this is crashing.  If not super-complicated, preferably, the test should be a mochitest, so that all the docshell/error page at al machinery is being ran.

Note that I will not r+ the next version of the patch if tests are not added.  It was a mistake from me the first time.
Flags: needinfo?(kmckinley)
(In reply to Honza Bambas (:mayhemer) from comment #40)
> I personally vote for backing this out now from all branches.
> 
> Kate, please write automated tests for this functionality, for both timeout
> and non-timeout case.  Also, add traceable channel on channels you are about
> to test with so that we find out why this is crashing.  If not
> super-complicated, preferably, the test should be a mochitest, so that all
> the docshell/error page at al machinery is being ran.
> 
> Note that I will not r+ the next version of the patch if tests are not
> added.  It was a mistake from me the first time.

I don't understand what you mean by add traceable channel here. I see the interface, but it isn't clear what is supposed to be done with it. HttpBaseChannel implements nsITraceableChannel. Do you have any examples you can point to?
Flags: needinfo?(kmckinley)
(In reply to Kate McKinley [:kmckinley] from comment #44)
> I don't understand what you mean by add traceable channel here. I see the
> interface, but it isn't clear what is supposed to be done with it.
> HttpBaseChannel implements nsITraceableChannel. Do you have any examples you
> can point to?

Ah, OK.  nsITraceableChannel is used to catch (and potentially interfere with) OnStart/OnData/OnStop notifications coming directly from a channel.  It's mainly used by diagnostic tools such as our DevTools (specifically the Network pane) or Firebug and some other extensions.

It works the way that an intercepting listener is added between the channel and the current listener (regular consumer of the channel such as script loader, image parser, html parser etc).  This intercepting listener can then examine response headers and the response body and is responsible to forward the notifications to the original target listener of the channel.

One of the crashes that this patch seems to cause is having nsStreamListenerWrapper on the top frame.  That class is used when the intercepting listener is added on the channel via the nsITraceableChannel interface (SetNewListener) wrapping the current channel's listener.  See [1].

Tho, it's also used for fetch() at [2].  I unfortunately can't say from the stacks in bug 1319606 which use of nsStreamListenerWrapper leads to the crash for sure.  But my bet is more on DevTools than fetch().  Hence, the suggestion to exercise the traceable channel.


The reason I insist on adding a test for this is that the priming feature is already causing some intermittent test failures and seems to me overall immature and unstable so far.  So I want to make sure that more changes (even attempts to fix those intermittents) don't cause more regressions.

It's probably OK if you only modify existing tests too to add prove we don't crash or so with this patch.

Anyway, I'd like you to also focus on the already existing tests that are intermittently failing and overall stability improvements here, to find the true causes and fix either the tests or the code.  Maybe you already did that, and I might only miss some analyzes and conclusions added as bugzilla comments.


And one last question, is this fix actually fixing any of the test failures in bugs it blocks?  Does it improve anything?

Thanks.


[1] https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpBaseChannel.cpp#2698
[2] https://dxr.mozilla.org/mozilla-central/rev/bad312aefb42982f492ad2cf36f4c6c3d698f4f7/dom/xhr/XMLHttpRequestMainThread.cpp#2755
Flags: needinfo?(kmckinley)
Comment on attachment 8814315 [details]
Bug 1313595 - Missing interface in HSTSPrimerListener

https://reviewboard.mozilla.org/r/95584/#review95686

I know I did a drive-by formatting nit, but otherwise I'm not qualified to review this :)
Attachment #8814315 - Flags: review?(n.nethercote)
(In reply to Honza Bambas (:mayhemer) from comment #45)

This is a tough question to answer. Without nsITimerCallback in NS_IMPL_ISUPPORTS, I reliably get *a* crash when the timer fires, but I'm having a hard time to reproduce the specific crashes in 1319606 and 1317517, so I can't say for sure. With the patch applied, there are a number of flaky things probably caused by priming that are simply not occurring in this test run. For example, I could reliably reproduce the assertion in bug 1317298 before, and with the patch, I can't.

There is a treeherder run here with the most current version, fixing some test flakiness. These new timeout tests need a bit more tuning.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5575a14d72dffb75b287807434ec7976fb74e22b

I'm still not sure what you mean by adding a traceable channel. Do you mean to wrap the HSTSPrimerListener in an nsStreamListenerWrapper? Sorry, it just isn't clear form the examples or the class documentation what needs to be done here.

If all it is is the nsStreamListenerWrapper, there is a patch on try that adds that and hopefully fixes the test fails. https://treeherder.mozilla.org/#/jobs?repo=try&revision=15564dfe8583bafb7fb613c10b7e5d9ac9410cab

> (In reply to Kate McKinley [:kmckinley] from comment #44)
> > I don't understand what you mean by add traceable channel here. I see the
> > interface, but it isn't clear what is supposed to be done with it.
> > HttpBaseChannel implements nsITraceableChannel. Do you have any examples you
> > can point to?
> 
> Ah, OK.  nsITraceableChannel is used to catch (and potentially interfere
> with) OnStart/OnData/OnStop notifications coming directly from a channel. 
> It's mainly used by diagnostic tools such as our DevTools (specifically the
> Network pane) or Firebug and some other extensions.
> 
> It works the way that an intercepting listener is added between the channel
> and the current listener (regular consumer of the channel such as script
> loader, image parser, html parser etc).  This intercepting listener can then
> examine response headers and the response body and is responsible to forward
> the notifications to the original target listener of the channel.
> 
> One of the crashes that this patch seems to cause is having
> nsStreamListenerWrapper on the top frame.  That class is used when the
> intercepting listener is added on the channel via the nsITraceableChannel
> interface (SetNewListener) wrapping the current channel's listener.  See [1].
> 
> Tho, it's also used for fetch() at [2].  I unfortunately can't say from the
> stacks in bug 1319606 which use of nsStreamListenerWrapper leads to the
> crash for sure.  But my bet is more on DevTools than fetch().  Hence, the
> suggestion to exercise the traceable channel.
> 
> 
> The reason I insist on adding a test for this is that the priming feature is
> already causing some intermittent test failures and seems to me overall
> immature and unstable so far.  So I want to make sure that more changes
> (even attempts to fix those intermittents) don't cause more regressions.
> 
> It's probably OK if you only modify existing tests too to add prove we don't
> crash or so with this patch.
> 
> Anyway, I'd like you to also focus on the already existing tests that are
> intermittently failing and overall stability improvements here, to find the
> true causes and fix either the tests or the code.  Maybe you already did
> that, and I might only miss some analyzes and conclusions added as bugzilla
> comments.
> 
> 
> And one last question, is this fix actually fixing any of the test failures
> in bugs it blocks?  Does it improve anything?
> 
> Thanks.
> 
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/
> HttpBaseChannel.cpp#2698
> [2]
> https://dxr.mozilla.org/mozilla-central/rev/
> bad312aefb42982f492ad2cf36f4c6c3d698f4f7/dom/xhr/XMLHttpRequestMainThread.
> cpp#2755
Flags: needinfo?(kmckinley)
(In reply to Kate McKinley [:kmckinley] from comment #47)
> (In reply to Honza Bambas (:mayhemer) from comment #45)
> 
> This is a tough question to answer. Without nsITimerCallback in
> NS_IMPL_ISUPPORTS, I reliably get *a* crash when the timer fires, 

It's just an assertion failing when QI (QueryInterface) doesn't return what is expected (having an interface in hands already, but the object's QI doesn't return a result for it).

> but I'm
> having a hard time to reproduce the specific crashes in 1319606 and 1317517,

OK, then we don't know if you've found a fix or not for them.  But it sometimes may be too hard to try to reproduce such crashes.

> so I can't say for sure. With the patch applied, there are a number of flaky
> things probably caused by priming that are simply not occurring in this test
> run. For example, I could reliably reproduce the assertion in bug 1317298
> before, and with the patch, I can't.

That's the assertion I describe above.  It's fixed with this patch (adding the iface to the impl list).

> 
> There is a treeherder run here with the most current version, fixing some
> test flakiness. 

Which exactly?

> These new timeout tests need a bit more tuning.

OK.

> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=5575a14d72dffb75b287807434ec7976fb74e22b
> 
> I'm still not sure what you mean by adding a traceable channel. Do you mean
> to wrap the HSTSPrimerListener in an nsStreamListenerWrapper? Sorry, it just
> isn't clear form the examples or the class documentation what needs to be
> done here.

No.  HTTP channel are implementing nsITraceableChannel which allows you to insert an interception listener (SetNewListener - see its IDL docs).  Maybe try using that in your tests.  Best is to listen for on-modify-request and attach a listener via that interface (just a dummy that forwards to the original one).

> 
> If all it is is the nsStreamListenerWrapper, there is a patch on try that
> adds that and hopefully fixes the test fails.
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=15564dfe8583bafb7fb613c10b7e5d9ac9410cab

Will look at it.
(In reply to Kate McKinley [:kmckinley] from comment #47)
> If all it is is the nsStreamListenerWrapper, there is a patch on try that
> adds that and hopefully fixes the test fails.
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=15564dfe8583bafb7fb613c10b7e5d9ac9410cab

The net::nsStreamListenerWrapper is used when the channel is implemented in JS.  I don't think you need to do the following:

-  rv = primingChannel->AsyncOpen2(listener);
+  nsCOMPtr<nsIStreamListener> wrappedListener =
+    new net::nsStreamListenerWrapper(listener);
+  rv = primingChannel->AsyncOpen2(wrappedListener);


Also:

   nsCOMPtr<nsIHstsPrimingCallback> callback;
   callback.swap(mCallback);
+  mCallback = nullptr;
+

mCallback already is null.  swap actually swaps - as the name suggests - the two comptrs.  hence, if in |callback| was nothing, that "nothing" ends up in mCallback
Kate:

- if the patches that originally resolved:fixed this bug have been backed out please reopen this bug 
- if not, then please open a new bug (to track the progress correctly - this is a mess!) and put the new patches there
- when you submit a patch and Review Board doesn't provide an interdiff, please be so kind and write down what has changed from the last patch ; actually, please do it every time, directly in bugzilla since now
- I don't understand why https://reviewboard.mozilla.org/r/95582/diff/#index_header again needs my review
Flags: needinfo?(kmckinley)
Back out HSTS priming lower timeouts on Aurora due to ongoing crashes.
Attachment #8806247 - Attachment is obsolete: true
Flags: needinfo?(kmckinley)
Depends on: 1320877
Status: RESOLVED → REOPENED
Flags: needinfo?(kmckinley)
Resolution: FIXED → ---
No longer depends on: 1319606
[Tracking Requested - why for this release]: Regression
Severity: normal → major
Keywords: perf, regression, topperf
I don't think relman needs to track this for now. Once we have a viable fix, if it seems like a good idea to uplift, you can request uplift.
Comment on attachment 8814315 [details]
Bug 1313595 - Missing interface in HSTSPrimerListener

https://reviewboard.mozilla.org/r/95584/#review95994

I didn't check the tests deeply.  conditioned by a green try run (preferable two or three to catch intermittents this time)

::: dom/security/test/hsts/file_timeout_server.sjs:23
(Diff revision 1)
> +      }
> +      // avoid confusing cache behaviors
> +      response.setHeader("Cache-Control", "no-cache", false);
> +      response.setHeader("Content-Type", "text/javascript", false);
> +      // never sends Strict-Transport-Security
> +      response.write('console.log("timeout: "+'+timeout+');');

spaces around signs

::: dom/security/test/hsts/file_timeout_server.sjs:25
(Diff revision 1)
> +      response.setHeader("Cache-Control", "no-cache", false);
> +      response.setHeader("Content-Type", "text/javascript", false);
> +      // never sends Strict-Transport-Security
> +      response.write('console.log("timeout: "+'+timeout+');');
> +      response.finish();
> +    }, timeout, Components.interfaces.nsITimer.TYPE_ONE_SHOT); 

trailing ws

::: netwerk/protocol/http/HSTSPrimerListener.cpp:75
(Diff revision 1)
>  HSTSPrimingListener::OnStartRequest(nsIRequest *aRequest,
>                                      nsISupports *aContext)
>  {
>    nsCOMPtr<nsIHstsPrimingCallback> callback;
>    callback.swap(mCallback);
> +  mCallback = nullptr;

assigning null not needed

::: netwerk/protocol/http/HSTSPrimerListener.cpp:182
(Diff revision 1)
>  HSTSPrimingListener::Notify(nsITimer* timer)
>  {
>    nsresult rv;
>    nsCOMPtr<nsIHstsPrimingCallback> callback;
>    callback.swap(mCallback);
> -  NS_ENSURE_STATE(callback);
> +  mCallback = nullptr;

not needed
Attachment #8814315 - Flags: review?(honzab.moz) → review+
Attachment #8814314 - Attachment is obsolete: true
Attachment #8814314 - Flags: review?(honzab.moz)
Attachment #8814315 - Attachment is obsolete: true
I moved the tests from plain mochitests to browser mochitests, so all tests use the same framework.

Treeherder with --repeat=20 (filtered)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=3bc5b20c8791b5fdd8e38734f036ea458f1fe67a&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-classifiedState=unclassified
Flags: needinfo?(kmckinley)
Flags: needinfo?(honzab.moz)
Attachment #8815186 - Attachment is obsolete: true
Honza, I think this try run shows that this patch fixes the problems with the prior version. I moved the unreliable plain mochitests to browser mochitests.
Pushed by kmckinley@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/677c659aed2c
Lower HSTS priming timeout r=mayhemer
https://hg.mozilla.org/mozilla-central/rev/677c659aed2c
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
(answered privately, sorry for delay)
Flags: needinfo?(honzab.moz)
Requesting an uplift request up to version 51,
as Firefox 51 and 52 are also affected,
because HSTS Priming shouldn't land on Firefox 51 per bug #1311807 comment #32,
but oddly and unfortunately it landed.
Flags: needinfo?(gchang)
Hi :kmckinley,
Does security.mixed_content.send_hsts_priming affect the performance? This pref is enabled in 51 release.
Flags: needinfo?(gchang) → needinfo?(kmckinley)
(In reply to Gerry Chang [:gchang] from comment #66)
> Hi :kmckinley,
> Does security.mixed_content.send_hsts_priming affect the performance? This
> pref is enabled in 51 release.

It definitely does - from its nature.  How is that possible the preference is turned on in release?  This is still not a stable feature (recently its tests started to fail, again) and its performance impact has not been fully evaluated.
It's absolutely enabled in Firefox 51, and it's causing huge performance problems for visitors of affected sites. Between this and the HTTP/2 issues, someone needs to take a really close look at how such a completely broken build could have been greenlit for stable.
(In reply to Honza Bambas (:mayhemer) from comment #67)
> How is that possible the preference is turned on in release?

security.mixed_content.send_hsts_priming is enabled on release, but security.mixed_content.use_hsts is disabled:

https://dxr.mozilla.org/mozilla-release/rev/e318be6646ccabe111e295f421d200fb6129e00c/modules/libpref/init/all.js#5579-5591
https://dxr.mozilla.org/mozilla-release/rev/e318be6646ccabe111e295f421d200fb6129e00c/netwerk/base/security-prefs.js#102-112

(I don't know which of these takes precedence.) Does the latter preference override the former? This code was added in bug 1246540.
security.mixed_content.send_hsts_priming controls whether the request is sent, and security.mixed_content.use_hsts changes the order of mixed-content blocking and HSTS upgrades. Bug 1335134 has been created to disable it in the codebase, and bug 1335224 has been created to create a go-faster addon to disable the pref.
Flags: needinfo?(kmckinley)
Depends on: 1335083
Depends on: 1336817
Hello, Kate! Can you please give me a hint about how should I reproduce and verify this issue? Should I consider using the links provided in https://bugzilla.mozilla.org/show_bug.cgi?id=1311807#c0 and https://bugzilla.mozilla.org/show_bug.cgi?id=1311807#c17? Is there anything else I should focus on, beside the loading time?
Flags: needinfo?(kmckinley)
Hi Iulia,

Yes, either of those links should work. The loading time should be < 10 seconds since there are two timeouts. It should be cached for a week, so subsequent loads should be quicker. That is all, I think.
Flags: needinfo?(kmckinley)
Thank you, Kate, for your instructions! 
I managed to reproduce the issue on 52.0a1 (2016-10-22). The bug is verified fixed on 53.0b6 (20170323090023), using Windows 10 x64, Ubuntu 16.04 x64 and Mac OS X 10.11.6.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.