push cache not found with e10s on https://www.chromestatus.com

RESOLVED FIXED in Firefox 48

Status

()

Core
Networking: HTTP
P5
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: nwgh, Assigned: nwgh)

Tracking

42 Branch
mozilla48
Points:
---

Firefox Tracking Flags

(e10s+, firefox48 fixed)

Details

(Whiteboard: [spdy][necko-active])

Attachments

(2 attachments)

+++ This bug was initially created as a clone of Bug #1236170 +++

Bug 1236170 was filed because https://www.chromestatus.com never loaded in e10s-enabled firefox. This is because we failed to find a cache for pushes that we received, and when we tried to decompress and throw away the pushed headers, our validation was rejecting them, closing down the session. The validation issue has been fixed, but there's still the mystery of why we don't find a push cache for this page (not a show-stopping bug, just sub-optimal behavior). This bug is to investigate the missing push cache situation.
Whiteboard: [spdy]
Nicholas, is this meant as a non-e10s bug for Release/Beta channel?

On Mac, loading of https://www.chromestatus.com: 

Fails on Release - 43.0.3  (non-e10s)

Passes on latest Nightly - 46.0a1 (e10s enabled and in New non-e10s Window)
Flags: needinfo?(hurley)
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #1)
> Nicholas, is this meant as a non-e10s bug for Release/Beta channel?
> 
> On Mac, loading of https://www.chromestatus.com: 
> 
> Fails on Release - 43.0.3  (non-e10s)
> 
> Passes on latest Nightly - 46.0a1 (e10s enabled and in New non-e10s Window)

This bug does not need to be tracked for any particular release; it's sub-optimal behavior that will be nice to fix, but will not affect users. The other bug (#1236170) is what we need to think about uplifting, both for e10s and non-e10s.
Flags: needinfo?(hurley)
tracking-e10s: --- → +
Whiteboard: [spdy] → [spdy][necko-active]

Updated

2 years ago
Priority: -- → P5
Created attachment 8740143 [details]
nspr.log.gz

OK, finally picked this up a bit more. Here's what I've found so far - this may ALL be red herrings at this point, but I need to write it down before my pile of handwritten notes about this gets lost in my piles of other handwritten notes...

I've attached an NSPR log with the details from my notes below in case I need to dig further in this one run, though it could prove useless if I've not logged everything I need (this is where rr on mac would come in handy)

I *think* the whole issue boils down to the main page load coming in the form of a redirect. When we load https://www.chromestatus.com, the initial transaction is given a scheduling context (11e827200 in the log). That transaction (h2 stream 0xD/11d2e8cd0) ends in a 302 to https://www.chromestatus.com/features. Once the 302 happens, ReleaseBlockingTransaction is called on the initial transaction (timestamp 21:03:20.947367), which nulls out its scheduling context handle, losing any way the h2 session had for getting to that scheduling context. (Interesting side note - ReleaseBlockingTransaction is called multiple times on the transaction, of course only the first time changes the value of the scheduling context handle, as on all subsequent times it's already null - this may also be a bug).

Later on, at timestamp 21:03:20.972231, a new transaction is created for the redirection (1211be000). This transaction is never given a scheduling context, so when we get to the server pushing streams associated with the new transaction (h2 stream 0xF/11d2e90c0) around timestamp 21:03:21.086856, there is no scheduling context available to get a push cache from. Thus, we refuse the pushes.

Next step is to dig into the redirect code to see if there's a reasonable way to pass the scheduling context from the source transaction into the destination transaction.
More update - I've confirmed the redirect is the problem. If I go directly to https://www.chromestatus.com/features I have no problem receiving the pushes. Passing from the old transaction to the new one isn't going to work directly (the old transaction removes its own reference to the scheduling context before the channel has any knowledge of a redirect, let alone makes a new channel). However, the channel has the ID of the scheduling context, which should be sufficient to make this work - so long as we ensure the new channel gets a copy of that ID, then it can (when it creates its own transaction) tell the new transaction about the scheduling context, and we should be good to go.

Still don't totally know the redirect code path, so I've got to figure out the right place(s) to make this go, but we have a path forward.
OK, finding that was easier than thought. Patch incoming once I get all my debug printfs cleaned up :)
Created attachment 8740527 [details] [diff] [review]
patch

Here we go - I *think* it makes sense to unconditionally share the scheduling context ID, but I could be totally wrong on that one. If I am, we'll need to add a flag similar to preserveMethod, but that should be straightforward enough (famous last words).
Attachment #8740527 - Flags: review?(mcmanus)
Comment on attachment 8740527 [details] [diff] [review]
patch

Review of attachment 8740527 [details] [diff] [review]:
-----------------------------------------------------------------

one more thing to copy :)
Attachment #8740527 - Flags: review?(mcmanus) → review+

Comment 10

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/0e68f0792284
Status: NEW → RESOLVED
Last Resolved: 2 years ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
You need to log in before you can comment on or make changes to this bug.