Closed Bug 1154426 Opened 5 years ago Closed 5 years ago

thelancet.com is horked on 39 and 40

Categories

(Core :: Networking: HTTP, defect)

x86_64
Windows 7
defect
Not set

Tracking

()

VERIFIED FIXED
mozilla40
Tracking Status
firefox38 --- unaffected
firefox39 + verified
firefox40 + verified

People

(Reporter: caspy77, Assigned: bagder)

References

()

Details

(Keywords: regression, Whiteboard: [Testday-20150619])

Attachments

(2 files)

Attached image The Lancet Screenshot
The following pages are screwed up in Firefox 39 & 40.
http://www.thelancet.com/
http://www.thelancet.com/journals/landia/article/PIIS2213-8587%2815%2900033-9/fulltext

I've attached a screenshot.
Guessing the CSS is not loading.
I'm on Windows 7. It has also been reproduced on Firefox 40 on Linux.
Beta 38 was unaffected. Chrome works fine.
[Tracking Requested - why for this release]:  Web compat regression

m-c regression range (on nightlies): http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=3094601af679&tochange=2f5c5ec1a24b

Looks like bug 237623 landed in that range, though it also looks like it was then backed out and then relanded...

In any case, if in a nightly I toggle the "network.http.enforce-framing.soft" preference to false, the problem goes away.
Blocks: 237623
Component: General → Networking: HTTP
Flags: needinfo?(daniel)
Thanks for the screenshot. I can repeat it with a recent Nightly on Linux too. I can however not immediately detect a prematurely cut off CSS that would "easily" explain this, so I need to dig a little to figure out exactly how this happens.
Flags: needinfo?(daniel)
Assignee: nobody → daniel
Confirmed in Aurora 39 (20150413004006).

Most (if not all) of the missing styles seem to come from http://www.thelancet.com/pb/css/head_1_1623_1624_6030.css?1429064815950 which has these headers:

Host: www.thelancet.com
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:39.0) Gecko/20100101 Firefox/39.0
Accept: text/css,*/*;q=0.1
Accept-Language: en-GB,en;q=0.5
Accept-Encoding: gzip, deflate
Referer: http://www.thelancet.com/journals/landia/article/PIIS2213-8587(15)00033-9/fulltext
Cookie: PRUM_EPISODES=s=1429090004838&r=http%3A//www.thelancet.com/journals/landia/article/PIIS2213-8587%2815%2900033-9/fulltext; SERVER=WZ6myaEXBLEIcey8uceZQQ==; JSESSIONID=aaahAZgoVzfD9_waIa3Yu; MAID=iXmsPgImYacfCK6lOUjbrw==
Connection: keep-alive

Cache-Control: public
Connection: close
Content-Encoding: gzip
Content-Type: text/css; charset=UTF-8
Date: Wed, 15 Apr 2015 09:26:48 GMT
Expires: Thu, 14 Apr 2016 02:28:08 GMT
Last-Modified: Wed, 15 Apr 2015 02:28:08 GMT
Server: AtyponWS/7.1
Transfer-Encoding: chunked
Vary: User-Agent,Accept-Encoding

If I use the developer tools to try and load the Style Editor on a broken instance, I get this in the Browser Console:

10:38:06.895 Error: Request failed with status code = 2152398924 in onStopRequest handler for url = http://www.thelancet.com/pb/css/head_1_1623_1624_6030.css?1429064815950
Stack trace:
fetch/streamListener.onStopRequest@resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/actors/stylesheets.js:1185:29
1 protocol.js:900

(As a side note, it seems that there is a bug there in that the tools don't handle that error).

Also, if in a working instance I load the Style Editor and clear the whole of that CSS file, the result looks pretty much like the broken version.

The other CSS files do not seem to be chunked, so that could be the place to start looking.

Indeed if I load up the file on in Chrome or cURL, the last style is:

.commissionsStory .seriesStory-teaser .timeHeader {
  color: #666;
  font-size: 15px;
  font-weight: normal;
  line-height: normal;
  margin: 0 0 2px;
}

but in Firefox, it is:

.quickSearch {
  padding-bottom: 15px;
  background: #fff;
}

(Last 8 styles missing).

Finally, when dragging the link to this file to the Download Manager, I get this in the browser console:

Date: Wed Apr 15 2015 11:04:03 GMT+0100 (BST)
Full Message: DownloadError: [Exception... "A transfer was only partially done when it completed"  nsresult: "0x804b004c (NS_ERROR_NET_PARTIAL_TRANSFER)"  location: "JS frame :: resource://gre/modules/DownloadCore.jsm :: this.DownloadError :: line 1498"  data: no]
Full Stack: this.DownloadError@resource://gre/modules/DownloadCore.jsm:1530:16
DCS_execute/task_DCS_execute/backgroundFileSaver.observer.onSaveComplete@resource://gre/modules/DownloadCore.jsm:1900:42
1 <unknown>

Hope that helps.
Strangely, this starts working again if I disable network.http.enforce-framing.soft OR if I enable network.http.enforce-framing.http1.

That shouldn't happen should it? I thought the second one was /more/ strict...
(In reply to Daniel Cater from comment #4)
> Strangely, this starts working again if I disable
> network.http.enforce-framing.soft OR if I enable
> network.http.enforce-framing.http1.
> 
> That shouldn't happen should it? I thought the second one was /more/
> strict...

Yeah, that sounds completely bonkers and wrong...
The problem seems to be that the CSS in that case is gzipped but without the gzip trailer (meaning mStreamEnded never goes true as it ends with inflate() returning Z_BUF_ERRROR), so by definition the gzip stream does not terminate nicely - although all the relevant content seems to have been shipped and the chunked decoding has a fine terminating zero and everything.

I'll need a moment to consider how to handle this.
yeah, if http delims are valid we shouldn't be throwing that particular error no matter the state of the gzip..
I also found other sites served by "AtyponWS/7.1" that show the same problem. Like http://www.cell.com/

Patrick, do you have any thoughts on how we'd best combine the knowledge that we ended the chunk fine into the nsHTTPCompressConv::OnStopRequest(), or vice versa?
I think the only object that the transaction and the compressconv share is the channel
Tracking for 39 and 40, since we would like to avoid being horked, borked, corked, or torqued whenever possible!
QA Whiteboard: [good first verify]
(In reply to Patrick McManus [:mcmanus] from comment #9)
> I think the only object that the transaction and the compressconv share is
> the channel

They do? Any hint on how would I go about and find the channel from within nsHTTPCompressConv?
Flags: needinfo?(mcmanus)
I'm reconsidering this. I now think we should stop checking for gzip errors when doing "soft enforcing" of HTTP framing. I think we've tried enough kludges for this and I don't think it is worth the effort to keep going down this rabbit hole.

This basically means we would ignore content-size vs content-length differences even for gzip by default. I believe that is also what Chrome does.

Setting network.http.enforce-framing.http1 to 'true' will still do the stricter checking.

The original issue (bug 237623) was about the download manager not noticing broken downloads, and luckily such downloads are less likely be content-encoded.
If we agree on taking the approach I just mentioned, here's a patch that makes us do just that.
Could you reproduce comment 4? If so, does this fix it?
(In reply to Daniel Cater from comment #14)
> Could you reproduce comment 4? If so, does this fix it?

Yes I could reproduce it and I found the cause as I mentioned above.

With this patch applied both www.cell.com and www.thelancet.com/ render fine for me.
Status: NEW → UNCONFIRMED
Ever confirmed: false
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Comment on attachment 8594774 [details] [diff] [review]
0001-bug-1154426-ignore-gzip-problems-if-only-soft-enforc.patch

Review of attachment 8594774 [details] [diff] [review]:
-----------------------------------------------------------------

ok. let's separate out the issue for the moment of malformed gzip and just assume anything with gzip is a valid delimiter.
Attachment #8594774 - Flags: review+
Try run seems fine: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6068b4f53ab1
Flags: needinfo?(mcmanus)
Keywords: checkin-needed
Sorry, I'm fighting _very_ hard to see how this patch could break this, seeing that it brings us back to a state we had previously more or less and that the failing test seems to have nothing to do with gzipped content.

I retriggered the gij(8) job 5 more times in my try-build and they're all green.

Is there a chance the problem was caused by another commit?
Flags: needinfo?(daniel)
I also backed out bug bug 1155634 as another possible cause for the failure. They started happening on a push with a bunch of various checkin-needed patches in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b9a594d013bd

If things are still failing on the backout, I'll try backing out some of the other patches from that push and see where that gets us.

Setting ni on myself to remind me to reland this eventually if it's not the cause.
Flags: needinfo?(wkocher)
Turns out it was bug 1145680. Relanded.
Flags: needinfo?(wkocher)
https://hg.mozilla.org/mozilla-central/rev/7344990e0c5d
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment on attachment 8594774 [details] [diff] [review]
0001-bug-1154426-ignore-gzip-problems-if-only-soft-enforc.patch

Approval Request Comment
[Feature/regressing bug #]: 1154426
[User impact if declined]: Occasional badly rendered web pages. 
[Describe test coverage new/current, TreeHerder]: xpcshell tests for the underlying gzip handling,
[Risks and why]: 
[String/UUID change made/needed]:
Attachment #8594774 - Flags: approval-mozilla-aurora?
Daniel, what kind of risk do you think this might have? What else might this break?

Also noting that you list the regressing bug as this bug number. Do you think the regression is from bug 237623? If so I can mark 38 as unaffected. That can sometimes help us later if we need to come back to this bug in future.
Flags: needinfo?(daniel)
I think this fix is fairly low-risk since it reverts new logic back to a situation quite similar to what we had before.

This patch brings back the inability to properly detect a prematurely cut off HTTP transfers when the transfers are using gzipped content encoding. Bug 237623 was all about improving that detection, And yes, the regression should only have been present in 39+ so 38 remains unaffected.
Flags: needinfo?(daniel)
Noting also that there are no UUID or string changes in the patch.
Comment on attachment 8594774 [details] [diff] [review]
0001-bug-1154426-ignore-gzip-problems-if-only-soft-enforc.patch

Approving for uplift to aurora, since this has had some time on nightly with no issues, and it's a recent regression we want to avoid shipping in 39.
Attachment #8594774 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Reproduced this bug by following STR on Nightly 40.0a1 (2015-04-14) and on Windows 7 64 Bit!

The bug's fix is verified on Latest Beta 39.0b7 (2015-06-18) and Latest Developer Edition aka Aurora  	40.0a2 (2015-06-19)!

Build Id - 20150618135210
User Agent - Mozilla/5.0 (Windows NT 6.1; WOW64; rv:39.0) Gecko/20100101 Firefox/39.0

Build Id - 20150619004003
User Agent - Mozilla/5.0 (Windows NT 6.1; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [good first verify] → [good first verify][Testday-20150619]
Whiteboard: [Testday-20150619]
You need to log in before you can comment on or make changes to this bug.