1484149 - Cache racing breaks NTLM authentication - Load / NTLM Auth / cache issue in Firefox and Sharepoint on premises

Reporter

Description

•

7 years ago

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20180621064021

Steps to reproduce:

I am experiencing a very weird issue when trying to access SharePoint on premises from the new Firefox version (Quantum, >v60.0), with the browser running on a Windows computer.

SCENARIO:
- Firefox is correctly configured to use NTLM for auth (FF about:config).
- SharePoint 2013 on premises. Firefox ESR 60.
- SharePoint loads just fine in IE and Chrome; however, Firefox fails to fully load the site, apparently at random. After a page refresh, FF manages to load the site just fine for some time (I guess this time is based on the NTLM auth cookie expiration).
- Other Firefox users experience the same issue.
- The previous Firefox ESR version, v52.0, doesn't show this issue or have any problem loading SP sites.
- No errors (JavaScript, XHR, etc.) are shown in the Firefox console.

Actual results:

FF (>v60.0) has trouble successfully loading the page, or sometimes even starting to load it. However, if the Firefox cache is disabled, this issue doesn't happen at all. The "Network" tab of the FF developer tools shows that the load process stops at some point (you can see in the source code that some files/code have been downloaded, but not all of them), and it only completes if you refresh the page or disable the cache.

Expected results:

The site should load just fine, whether the cache is enabled or disabled. FF v52 loads the SP site just fine.

spawnschbob

Comment 1

•

7 years ago

Hi, I have the same issue, in SharePoint 2013 and now in a newly installed SharePoint 2016.

Current unhappy solutions we used:
1. Reload the page
2. Open SharePoint in private mode (no cache)

Alberto Suarez

Reporter

Comment 2

•

7 years ago

Correct, that is exactly the same behaviour I have noticed. If you disable the cache, then Sharepoint pages load without any issues. I hope some FF/SP expert can help us. Thanks

denis.brechemier

Comment 3

•

7 years ago

Hi, We have the same issue with SharePoint 2010. Please Help. Kind Regards

Marco Siefert

Comment 4

•

7 years ago

Hi, we have the same issue with SharePoint 2016 On-Premise and Firefox 60.0.2 :-(. All users are affected. A page reload is needed very often because the page does not load completely.

Sebastian Segerer

Comment 5

•

7 years ago

Hi,
see also this Mozilla Support Forum entry: https://support.mozilla.org/de/questions/1213246
Since the creator of the post used Firefox 59, it shouldn't be related to Quantum.

We experience the same behavior as described in Firefox 61 and 62 across several users in our company.
We do use SharePoint 2007 (NTLM) and 2013 (Negotiate / Kerberos), but the problem is not limited to it. It also happens on our on-premise Team Foundation Server with NTLM and on custom ASP.NET applications hosted by our team on IIS.

Arnaud Meurou

Comment 6

•

7 years ago

Hi, we're not limited to SharePoint either; other sites using NTLM are affected (delivered by IIS and Apache).

Alberto Suarez

Reporter

Comment 7

•

7 years ago

(In reply to Sebastian Segerer from comment #5)
> Hi,
> see also this Mozilla Support Forum entry:
> https://support.mozilla.org/de/questions/1213246
> Since the creator of the post used Firefox 59, it shouldn't be related to
> Quantum.
>
> We experience the same behavior as described in Firefox 61 and 62 across
> several users in our company.
> We do use SharePoint 2007 (NTLM) and 2013 (Negotiate / Kerberos), but the
> problem is not limited to it. It also happens on our on-premise Team
> Foundation Server with NTLM and on custom ASP.NET applications hosted by our
> team on IIS.

Sebastian, please read the initial description of the issue. I clearly stated that I noticed this issue when working with Firefox 60 (Quantum version). When I mention Firefox v52, what I mean is that that version doesn't show the SharePoint NTLM load issue.

Sebastian Segerer

Comment 8

•

7 years ago

@Alberto Suarez I was referring to the creator of the linked support.mozilla.org post, in which Firefox 59 was used.

Alberto Suarez

Reporter

Comment 9

•

7 years ago

(In reply to Sebastian Segerer from comment #8)
> @Alberto Suarez
> I was referring to the creator of the linked support.mozilla.org post, in
> which Firefox 59 was used.

My apologies, Sebastian. I have taken a look at the post you shared. It seems that many users are experiencing the same issue. Thanks.

Kanchan Kumari QA

Comment 10

•

6 years ago

Based on many users experiencing this issue I am placing this under Core:Networking, so someone from the team can look into this issue. Thanks!

Component: Untriaged → Networking

Product: Firefox → Core

Honza Bambas (:mayhemer)

Comment 11

•

6 years ago

Thanks for putting this to the right component and thanks for the report.

Alberto, Sebastian, I will kindly ask you to produce HTTP logs according to [1]. Please set the list of logging modules (MOZ_LOG) as:

timestamp,rotate:400,nsHttp:5,cache2:5,negotiateauth:5,NTLM:5

As the logs may contain sensitive information, it will be best to send them to my bugzilla email directly.

Thank you!

[1] https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging
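For anyone collecting these logs, one way to start Firefox with the requested modules is to set the MOZ_LOG and MOZ_LOG_FILE environment variables before launching it. The following is only a minimal Python sketch: the helper names, the Firefox binary path and the log file location are assumptions for illustration, and the linked HTTP logging page remains the authoritative procedure.

```python
import os
import subprocess

# Module list copied from the request above.
MOZ_LOG_MODULES = "timestamp,rotate:400,nsHttp:5,cache2:5,negotiateauth:5,NTLM:5"


def build_logging_env(log_file, base_env=None):
    """Return a copy of the environment with Firefox HTTP logging enabled.

    log_file is a hypothetical path; pick any writable location.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = MOZ_LOG_MODULES
    env["MOZ_LOG_FILE"] = log_file
    return env


def launch_firefox_with_logging(firefox_binary, log_file):
    """Start Firefox with logging enabled.

    firefox_binary is an assumed install path supplied by the caller.
    """
    return subprocess.Popen([firefox_binary], env=build_logging_env(log_file))
```

Calling `launch_firefox_with_logging` with the local Firefox path and a log path of your choice should produce the log files to send; remember that they may contain sensitive data.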

Assignee: nobody → honzab.moz

Status: UNCONFIRMED → NEW

Component: Networking → Networking: HTTP

Ever confirmed: true

Flags: needinfo?(sebbl)

Flags: needinfo?(alberto.suarez.caballero)

Priority: -- → P2

Whiteboard: [necko-triaged][ntlm]

Honza Bambas (:mayhemer)

Comment 12

•

6 years ago

So, the problem is that when we race cache with network responses, NTLM authentication can be broken.

What I can see in a privately provided log is that a _first request on a new connection_ is being raced with cache. We send a GET request (no authentication) and also open an entry that can be used w/o revalidation. The cache wins even before we get the first 401: NTLM response. The channel is finished, the 401 response is thrown away, and the resource is satisfied from the cache.

The expected processing chain is to authenticate the connection with three 401,401,200 loops. The authentication state is kept in the requesting channel, so when that channel is finished, the next request on the connection has to start over, again doing a plain GET w/o any auth headers and restarting the NTLM auth process from scratch. If it happens that a previous raced request has already sent an NTLM message type 1, this likely confuses the server, as it expects NTLM message type 3 in the next request. That is likely the cause of unsatisfied loads from the server, but I have to carefully look into the log again later.

Possible fixes for the problem found so far:
- in case of Basic or Digest auth we keep information on a cache entry that it was served with authentication, which makes us revalidate it; I believe this also disallows cache racing (Michal?), but I can see we are sending conditional headers in requests... OTOH, this may be an incomplete solution when NTLM is established later for the resource (we have an entry w/o "auth" marking)
- let the channel finish the authentication despite a cache win; this is probably the "easiest" and cleanest way to fix this bug
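The suspected sequence can be illustrated with a toy model of the server's per-connection handshake state. This is a sketch under stated assumptions, not real NTLM and not Necko code: the class and function names are hypothetical, and the "unexpected" marker only stands in for whatever a real server does when the handshake is interrupted, which this bug does not establish.

```python
class ToyNtlmServerConnection:
    """Toy model of NTLM handshake state bound to one connection."""

    def __init__(self):
        self.awaiting_type3 = False
        self.authenticated = False

    def handle(self, auth_message=None):
        """Return a status for a request carrying the given NTLM message."""
        if self.authenticated:
            return 200
        if self.awaiting_type3:
            self.awaiting_type3 = False
            if auth_message == "type3":
                self.authenticated = True
                return 200
            # The server expected type 3 but got something else.
            return "unexpected"
        if auth_message == "type1":
            self.awaiting_type3 = True
            return 401  # would carry the type 2 challenge
        return 401  # plain challenge asking the client to start NTLM


def full_handshake(conn):
    """One channel finishes the whole 401, 401, 200 chain."""
    return [conn.handle(), conn.handle("type1"), conn.handle("type3")]


def interrupted_handshake(conn):
    """The cache wins after type 1 was sent; the next channel starts over."""
    first_channel = [conn.handle(), conn.handle("type1")]
    next_channel = [conn.handle()]  # plain GET without auth headers
    return first_channel + next_channel
```

In this model the uninterrupted chain ends in 200, while the interrupted one leaves the connection in a state where a plain GET arrives when a type 3 message was expected.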

Flags: needinfo?(sebbl)

Flags: needinfo?(michal.novotny)

Flags: needinfo?(alberto.suarez.caballero)

Summary: Load / NTLM Auth / cache issue in Firefox and Sharepoint on premises → Cache racing breaks NTLM authentication - Load / NTLM Auth / cache issue in Firefox and Sharepoint on premises

Whiteboard: [necko-triaged][ntlm] → [necko-triaged][ntlm][http-conn]

Honza Bambas (:mayhemer)

Comment 13

•

6 years ago

Hmm.. I inspected the provided log in more detail and I see a different problem (actually a second one). Michal, I will send you the log with a reference to the affected channel to inspect. It's also definitely related to racing.

Honza Bambas (:mayhemer)

Comment 14

•

6 years ago

The second problem, which actually manifests as reported (responses are hanging), is the following:
- we do a request
- find an entry that needs to be re-validated with the server
- we do a request
- we do it on a new connection
- we get a 401:NTLM response
  => the cache racing algorithm takes it as if the first response came from the network, but that is a totally wrong assumption
- we do the full NTLM authentication round: another GET, 401, GET 304
- now the server has confirmed that the cached entry can be used
- we call ReadFromCache, but it's skipped because cache racing believes we have already provided the response from network

This leads to total omission of calling OnStartRequest/OnDataAvailable/OnStopRequest of the final listener (HttpChannelParent) and thus the child request is hanging forever. If this happens for a top level page, a user just stares at a blank page and a spinning throbber.

P1 as this is a serious corporate bug. I'll file bugs to disable cache racing on ESR branches.
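The hang described above can be reproduced in a toy model. This is a deliberately simplified sketch: the class, flag and method names are hypothetical and do not mirror the actual nsHttpChannel implementation. It only shows how counting a 401 challenge as "the network answered first" causes the later cache read to be skipped.

```python
class ToyChannel:
    """Toy model of a raced channel; all names are hypothetical."""

    def __init__(self, treat_401_as_network_win):
        self.treat_401_as_network_win = treat_401_as_network_win
        self.network_delivered_first = False
        self.listener_calls = []

    def on_network_response(self, status):
        if status == 401:
            if self.treat_401_as_network_win:
                # Wrong assumption: an auth challenge is not the final response.
                self.network_delivered_first = True
        elif status == 304:
            # The server confirmed that the cached entry can be used.
            self.read_from_cache()
        elif status == 200:
            self.network_delivered_first = True
            self._notify_listener()

    def read_from_cache(self):
        if self.network_delivered_first:
            return  # skipped: racing believes the network already delivered
        self._notify_listener()

    def _notify_listener(self):
        self.listener_calls.extend(
            ["OnStartRequest", "OnDataAvailable", "OnStopRequest"]
        )


def run_revalidation(channel):
    """Replay the 401, 401, 304 sequence from the list above."""
    for status in (401, 401, 304):
        channel.on_network_response(status)
    return channel.listener_calls
```

With `treat_401_as_network_win=True` the listener is never called, which corresponds to the hanging request; with `False` the three notifications are delivered from the cache.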

Priority: P2 → P1

Honza Bambas (:mayhemer)

Updated

•

6 years ago

Blocks: RCWN

Honza Bambas (:mayhemer)

Comment 15

•

6 years ago

I'd love to verify that Basic/Digest are not affected, in all scenarios; see [1].

[1] https://searchfox.org/mozilla-central/rev/ce57be88b8aa2ad03ace1b9684cd6c361be5109f/netwerk/protocol/http/nsHttpChannel.cpp#4345-4348

Honza Bambas (:mayhemer)

Updated

•

6 years ago

Depends on: 1494405

Honza Bambas (:mayhemer)

Updated

•

6 years ago

status-firefox62: --- → affected

status-firefox63: --- → affected

status-firefox64: --- → affected

status-firefox-esr60: --- → affected

status-geckoview62: --- → affected

status-thunderbird_esr60: --- → affected

Honza Bambas (:mayhemer)

Comment 16

•

6 years ago

For all affected users, the actual fix/good workaround is to switch 'network.http.rcwn.enabled' to |false| in about:config.
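For administrators who would rather set this preference from a file than through about:config, Firefox also reads `user_pref(...)` lines from a `user.js` file in the profile directory. The following is a minimal Python sketch that appends the line; the function name is hypothetical and the profile directory has to be supplied by the caller.

```python
from pathlib import Path

# Same preference and value as in the workaround above.
RCWN_PREF_LINE = 'user_pref("network.http.rcwn.enabled", false);'


def disable_rcwn(profile_dir):
    """Append the RCWN pref to user.js in profile_dir.

    Returns True if the file was changed, False if the line was already there.
    """
    user_js = Path(profile_dir) / "user.js"
    existing = user_js.read_text(encoding="utf-8") if user_js.exists() else ""
    if RCWN_PREF_LINE in existing:
        return False
    # Keep existing content intact and make sure the new line starts cleanly.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    user_js.write_text(existing + separator + RCWN_PREF_LINE + "\n", encoding="utf-8")
    return True
```

The preference takes effect the next time Firefox starts with that profile.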

Arnaud Meurou

Comment 17

•

6 years ago

Thanks Honza, I'm testing the workaround on a few machines and so far so good.

Chris Peterson [:cpeterson]

Comment 18

•

6 years ago

status-geckoview62=wontfix because NTLM is not a critical use case for Focus+GeckoView. We don't need to uplift a fix for GeckoView 62 in Focus 7.0.

status-geckoview62: affected → wontfix

Michal Novotny [:michal]

Assignee

Comment 19

•

6 years ago

(In reply to Honza Bambas (:mayhemer) from comment #12)
> - in case of Basic or Digest auth we keep an information on a cache entry
> that it was served with authentication what makes us revalidate it ; I
> believe this also disallows cache racing (Michal?)

We don't use this information when we're deciding whether to race or not because we don't know it. It's not stored in the index and we don't have the entry.

Flags: needinfo?(michal.novotny)

Michal Novotny [:michal]

Assignee

Comment 20

•

6 years ago

(In reply to Honza Bambas (:mayhemer) from comment #14)
> The second problem, that actually manifests as reported - responses are
> hanging - is the following:
> - we do a request
> - find an entry that needs to be re-validated with the server
> - we do a request
> - we do it on a new connection
> - we get a 401:NTLM response
> => the cache racing algorithm takes it as if the first response came
> from the network, but that is a totally wrong assumption
> - we do the full NTLM authentication round: another GET, 401, GET 304
> - now the server has confirmed that the cached entry can be used

To avoid ending up with a 304 response while we don't have the entry, we remove conditional headers before we send the request. This landed in bug 1382831. I need to understand more about how NTLM works to understand why the problem persists for NTLM.

Michal Novotny [:michal]

Assignee

Comment 21

•

6 years ago

So the problem is that we remove the conditional headers only in the first request but not in subsequent requests. It seems we need to change the condition a bit at https://searchfox.org/mozilla-central/rev/ce57be88b8aa2ad03ace1b9684cd6c361be5109f/netwerk/protocol/http/nsHttpChannel.cpp#1186.
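The difference between stripping conditional headers only on the first request and on every request of a raced channel can be sketched as follows. This is an illustration only, not the actual patch: the function names and the `attempt` / `strip_on_every_attempt` parameters are hypothetical, and the header list is the standard set of HTTP conditional request headers.

```python
# Standard HTTP conditional request headers.
CONDITIONAL_HEADERS = (
    "If-Modified-Since",
    "If-None-Match",
    "If-Unmodified-Since",
    "If-Match",
    "If-Range",
)


def strip_conditional_headers(headers):
    """Return a copy of headers without any conditional request headers."""
    lowered = {name.lower() for name in CONDITIONAL_HEADERS}
    return {k: v for k, v in headers.items() if k.lower() not in lowered}


def headers_for_attempt(base_headers, attempt, racing, strip_on_every_attempt):
    """Headers to send for one attempt of a channel.

    attempt is 0 for the first request and 1 or more for auth retries.
    """
    if racing and (attempt == 0 or strip_on_every_attempt):
        return strip_conditional_headers(base_headers)
    return dict(base_headers)
```

With `strip_on_every_attempt=False`, an NTLM retry (attempt 1 or later) still carries `If-None-Match` and can therefore receive a 304; with `True`, no attempt of the raced channel can.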

Ryan VanderMeulen [:RyanVM]

Comment 22

•

6 years ago

RCWN has been disabled for this week's forthcoming ESR 60.2.2 release, which should resolve this issue for those users.

status-firefox62: affected → wontfix

status-firefox-esr60: affected → fixed

status-thunderbird_esr60: affected → ---

tracking-firefox-esr60: --- → 62+

Keywords: regression

Michal Novotny [:michal]

Assignee

Comment 23

•

6 years ago

After studying the code, it seems that this is a dupe of bug 1477684 which was fixed in version 62 and was uplifted to ESR 60.2. Alberto, what version of ESR did you use when you were able to reproduce the bug? ESR 60 or ESR 60.2?

Michal Novotny [:michal]

Assignee

Updated

•

6 years ago

Flags: needinfo?(alberto.suarez.caballero)

Sebastian Segerer

Comment 24

•

6 years ago

From an end-user perspective, I can confirm that I did not encounter this problem for some days / maybe weeks; probably since the FF 62 update mid September.
I'm sorry I did not report this earlier, but I wasn't sure if I was just "lucky" to not have this issue for some days.
I also just checked with our team and no one had this behaviour anymore.

Arnaud Meurou

Comment 25

•

6 years ago

In our case, I provided a log from 60.0 to Honza. I've updated Firefox to 60.2; I guess I should switch "network.http.rcwn.enabled" back to true and try.

Michal Novotny [:michal]

Assignee

Comment 26

•

6 years ago

(In reply to Arnaud Meurou from comment #25)
> I've updated firefox to 60.2, i guess i should switch back to true
> "network.http.rcwn.enabled" and try.

Yes, please enable rcwn again and let me know whether ESR 60.2 works correctly. Thanks.

Honza Bambas (:mayhemer)

Comment 27

•

6 years ago

Great news! Thanks. When confirmed, we can back bug 1494405 out from ESR.

Honza Bambas (:mayhemer)

Updated

•

6 years ago

Assignee: honzab.moz → michal.novotny

Alberto Suarez

Reporter

Comment 28

•

6 years ago

(In reply to Sebastian Segerer from comment #24)
> From an end-user perspective, I can confirm that I did not encounter this
> problem for some days / maybe weeks; probably since the FF 62 update mid
> September.
> I'm sorry I did not report this earlier, but I wasn't sure if I was just
> "lucky" to not have this issue for some days.
> I also just checked with our team and no one had this behaviour anymore.

Yes, I think, as Sebastian has observed, that I have not run into this issue since the last FF update. Currently using FF ESR 60.2.

Flags: needinfo?(alberto.suarez.caballero)

Alberto Suarez

Reporter

Comment 29

•

6 years ago

(In reply to Michal Novotny (:michal) from comment #23)
> After studying the code, it seems that this is a dupe of bug 1477684 which
> was fixed in version 62 and was uplifted to ESR 60.2.
>
> Alberto, what version of ESR did you use when you were able to reproduce the
> bug? ESR 60 or ESR 60.2?

ESR 60

Michal Novotny [:michal]

Assignee

Updated

•

6 years ago

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → DUPLICATE

Arnaud Meurou

Comment 31

•

6 years ago

OK, works for me too with 60.2! Thanks guys

Liz Henry (:lizzard) (relman/hg->git project)

Comment 32

•

6 years ago

Fixed in 62 and ESR in bug 1477684.

status-firefox63: affected → fixed

status-firefox64: affected → fixed