Closed Bug 1477684 Opened 2 years ago Closed 2 years ago

rcwn blocking some xmlHttpRequests

Categories

(Core :: Networking, defect, P3, major)

59 Branch
Unspecified
All
defect

Tracking

()

VERIFIED FIXED
mozilla63
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- fixed
firefox61 --- wontfix
firefox62 --- fixed
firefox63 --- fixed

People

(Reporter: serj.alekseev, Assigned: michal)

References

Details

(Keywords: regression, Whiteboard: [necko-triaged])

Attachments

(4 files, 2 obsolete files)

734.39 KB, application/x-zip-compressed
Details
939.99 KB, application/x-zip-compressed
Details
3.20 MB, application/x-zip-compressed
Details
fix
1.88 KB, patch
valentin
: review+
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36

Steps to reproduce:

Starting from version 59 Firefox sometimes can not load our app (SDL Tridion (https://www.sdl.com/software-and-services/tridion-dx/tridion-sites.html)).
Approximately every fifth reload of the page got stuck.

From our side we have found that initialization hangs because some of (cached) xhr requests never got response. It just waiting for it forever.

After long investigation we have found that after disabling rcwn (network.http.rcwn.enabled) issue never reproduces again (but it is not acceptable workaround for our customers)
Also we have found that decreasing network.http.rcwn.min_wait_before_racing_ms and network.http.rcwn.max_wait_before_racing_ms make defect more often reproducible (setting both values to 0 makes it reproducible almost every page refresh)

Not sure if it is related (most likely no, but still make sense to mention), looks like a coincidence, but all "broken" requests are requests to xml or xslt files.

Another interesting thing is that every time when one of the requests hangs in Web Tools (Network tab) we can see that there are 2 duplicates of the broken request (see screenshot). All three have no any information in UI except first one (it has a callstack). But in HAR (attached) we have found that first of those request always have flag "blocked": 1 in timings and in all repro cases in HAR file there are 6 in total requests with "blocked": 1 flag.
We tried to change network.http.max-persistent-connections-per-server property to different values and in all repro cases amount of request with "blocked" flag is equal to the value of the property.

One more thing: according to about:networking#rcwn page every time when issue reproduces the counter of "Net won count" increases by 2 or 3. In all cases when page loads fine - "Net won count" never increases.

Our assumption is that requests send via rcwn mechanism never got released if they got into a blocked state because of "network.http.max-persistent-connections-per-server" rule.

Also we had no success on building small demo page to reproduce this issue.

Would be great if you can give us any advice on how to proceed with investigation of this issue.

Thanks,
Sergii


Actual results:

Some xmlHttpRequests get blocked and never executes


Expected results:

All xmlHttpRequests should get response from network or from cache
Attached image Network-tab.png (obsolete) —
Screenshot of network tab after issue was reproduced
Attached file network.har (obsolete) —
har after issue was reproduced
+ Issue is reproducible on both Mac and PC
Severity: normal → blocker
OS: Unspecified → All
Component: Untriaged → Networking
Product: Firefox → Core
Flags: needinfo?(valentin.gosu)
Valentin, do you have time to take a look? If not, please hand off to someone else who knows rcwn well. Thanks!
Maybe Michal has some cycles to look into this sooner. Otherwise I'll take a look next week.
Flags: needinfo?(valentin.gosu) → needinfo?(michal.novotny)
Segrii, please provide HTTP log showing the problem (https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging). Ideally, it would be great if there was a test page that we could use to reproduce the problem.
Flags: needinfo?(michal.novotny) → needinfo?(serj.alekseev)
Attached file fiddler-log.zip
Fiddler log + HAR + Network tab screenshot
According to fiddler "broken" requests are actually send and got response.
Attachment #8994160 - Attachment is obsolete: true
Attachment #8994161 - Attachment is obsolete: true
Attached file networking-log.zip
about:networking log + HAR + Network tab screenshot
probably interesting part starting on line 30913
Flags: needinfo?(serj.alekseev)
Attached file full-log.zip
fiddler + about:networking + HAR + network tab screenshot
later stage of initialization (more logs), as issue very rarely reproduces with this setup
Michal, unfortunately we had no any success on building reproduction page (spend a week on that). We can not catch a key to provide a clear steps to reproduce. One thing I can say is that our application, during initialization, send about 120 requests to load configuration files (xml), javascript, images, etc.
Issue reproduces only during initialization, always on loading configuration (xhr) and there are always 3 copy of "broken"\"blocked" request.
Flags: needinfo?(michal.novotny)
Do the logs the reporter attached help diagnose?
Assignee: nobody → michal.novotny
Severity: blocker → major
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(michal.novotny)
Priority: -- → P3
Whiteboard: [necko-triaged]
What happens here is that network wins the race, but OnCacheEntryCheck() is called before OnStartRequest() and conditional headers are added. Server responds with 401 and we call DoAuthRetry(), so we send another request this time with conditional headers. Server responds with 304 but we don't have the cache entry.
Attached patch fixSplinter Review
Attachment #8995196 - Flags: review?(valentin.gosu)
Comment on attachment 8995196 [details] [diff] [review]
fix

Review of attachment 8995196 [details] [diff] [review]:
-----------------------------------------------------------------

Good catch. We'll want to uplift this.
I wonder if we could add a test for this.
Attachment #8995196 - Flags: review?(valentin.gosu) → review+
(In reply to Valentin Gosu [:valentin] from comment #14)
> Good catch. We'll want to uplift this.

I'll ask for beta uplift once it's verified.

> I wonder if we could add a test for this.

Probably not easily. Current internal API nsIRaceCacheWithNetwork doesn't allow this kind of timing and adding it would mean to do changes in the cache code, which seems to be too complex to me...
Keywords: checkin-needed
Pushed by dluca@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a1ddebe7410
rcwn blocking some xmlHttpRequests, r=valentin
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/2a1ddebe7410
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Segrii, can you please confirm that this fixed the problem for you?
Flags: needinfo?(serj.alekseev)
We can not reproduce this issue on latest nightly build (63.0a1 (2018-07-30) (64-bit))
Great work guys!

Will it be uplifted to version 62?
We need to give an update to our customers.

Thanks,
Sergii
Flags: needinfo?(serj.alekseev)
Status: RESOLVED → VERIFIED
Comment on attachment 8995196 [details] [diff] [review]
fix

Approval Request Comment
[Feature/Bug causing the regression]: bug 1392841
[User impact if declined]: download of resource using authentication can fail
[Is this code covered by automated tests?]: the code will be triggered by some of the RCWN tests
[Has the fix been verified in Nightly?]: yes
[Needs manual test from QE? If yes, steps to reproduce]: no, it's not easy to reproduce
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: simple change
[String changes made/needed]: none
Attachment #8995196 - Flags: approval-mozilla-beta?
Comment on attachment 8995196 [details] [diff] [review]
fix

Recent regression, fix verified in Nightly, Beta62+
Attachment #8995196 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8995196 [details] [diff] [review]
fix

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: the impact described below is IMO serious enough to uplift it to ESR even if it's not security bug
User impact if declined: download of resource using authentication can fail
Fix Landed on Version: 62
Risk to taking this patch (and alternatives if risky): low, the patch is simple and the fix has been verified
String or UUID changes made by this patch: none

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8995196 - Flags: approval-mozilla-esr60?
Comment on attachment 8995196 [details] [diff] [review]
fix

Sounds like an important issue to fix across affected releases. Approved for ESR 60.2.
Attachment #8995196 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Duplicate of this bug: 1484149
You need to log in before you can comment on or make changes to this bug.