Closed Bug 176222 Opened 17 years ago Closed 9 years ago

chami.com - undecoded gzip content displayed

Categories

(Tech Evangelism Graveyard :: English US, defect, P2, major)

x86
All

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: neil.williams, Unassigned)

References

()

Details

(Whiteboard: [byte-range requests])

Attachments

(2 files)

The first couple of pages of the site display correctly, but once you get past
the second page all you get is garbage. Whatever display method they are using
(I'm assuming ActiveX) doesn't work with Mozilla.

I use this site a lot as I like their HTML editor.  This is not a problem with
Mozilla; this is a Tech Evangelism item, but I'm not sure where to submit that.
With the universal charset detector, the document was given a US-ASCII charset.
Changing that to iso-8859-1 makes the page appear correctly.
WFM 2002102208/NT4. Clicked through a good number of pages and never noticed a
problem.


Reporter: Novice users are strongly discouraged from using the Advanced Bug Entry
Form. Please use the Bugzilla Helper to report future bugs:

http://bugzilla.mozilla.org/enter_bug.cgi?format=guided
Summary: Garbage displayed as page appears to be IE only TECH EVANGELISM → Garbage displayed as page appears to be IE only {TECH EVANGELISM?}
->Tech Evang
Assignee: adamlock → susiew
Component: Embedding: ActiveX Wrapper → US General
Product: Browser → Tech Evangelism
QA Contact: mdunn → zach
Version: other → unspecified
WFM in build 2002102208 PC/Win98.
Reporter, please update to a current Mozilla build and reopen bug if problem
persists.
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → WORKSFORME
Mozilla 1.2b (Build ID: 2002101612)
Nvidia GeForce2 GTS (Tried Detonators 30.82,40.72,41.03)
Windows 2000 SP3
I get the garbled rendering.

Different system running Windows Me.
Renders correctly.

I believe it is specific to Windows 2000 only. Not sure though.
I forgot to include that if you have visited the page before, you may need to
clear your memory and disk cache to experience the problem.

Reloading the page always fixes the problem in my experience.

Also, the garbled rendering is not consistent.
The "Different system running Windows Me" (comment #6) was just upgraded to
Windows 2000. The bug now appears on that computer with Mozilla 1.2b.
I also tested with Build ID: 2002110508. Bug still present.

A friend also observed this same bug on Linux with Mozilla 1.1 Final.

Request Status change to REOPENED
Request Product change to Browser
Request Resolution change to Blank
Could someone please change the bug status to reopened?

Still getting the garbled rendering with Mozilla 1.2.1
as requested, reopening
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
Summary: Garbage displayed as page appears to be IE only {TECH EVANGELISM?} → chami.com - Garbage displayed as page appears to be IE only {TECH EVANGELISM?}
Confirming the problem with the latest trunk build from
2002-12-08.
The comment about changing the encoding seems to be due to the
fact that that action reloads the page, and reloading usually
fixes layout breakage. This page tends to hang the browser, though.
We should investigate:

1. Why layout breakage occurs until the page is reloaded.
2. Why it leads to a hang sometimes when loading this page.

Will investigate the ASCII vs ISO-8859-1 issue also, but I am not
sure if that is directly relevant to this case. There is a bug
I filed (since resolved) that may be of interest to this
issue:

http://bugzilla.mozilla.org/show_bug.cgi?id=149417
Status: UNCONFIRMED → NEW
Ever confirmed: true
This page validates as HTML 4.01 Transitional. 
The fact that reloading usually fixes the problem 
may point to a parser problem. harish?
I don't think it is a parser issue - I got the garbled page and did view source
and got stuff like this:

01+{t({5/hy';W;0aH7ns{s,uu-lpdicowiWl1+{Pcefh'elbbccc;ao(uS ;/aw0dWR(cor
nfr'a*c{s,uPh'' c 1;ulS.{ir+{huc nhyr nwi  nc rdx;dWEwi s.1dWEc 1n'.a( ut dWRh'
p' ;/aw0dWR(cor nfr'a*ca( C><kh((5/hpprcc;a'c;a1+{sfr'a(H7'earlRdxSaa( CeTuc
nhslkh((5/hpprcc;a'c;a1+{sfr'a(H7'ear0o{1 'S ;0aerhxgr;o'a(fhc{1 2(T*c{s,uPh'' c
1;ulS.{i;%t.*e   ibu(bbcc-domluS 

the parser isn't going to make any sense out of that.  how mozilla gets stuff
like that, I don't know - the code is valid HTML when viewed in IE or the
websniffer, and also (sometimes) with mozilla. I'd guess at some kind of
networking issue.

I don't see the basis on which this was made an evang issue... the page is not
IE-only, it's valid code which displays fine in mozilla. the problem is that
mozilla seems to be getting complete garbage sometimes, and that should be
investigated. reassigning.
Severity: minor → normal
Component: US General → Networking: HTTP
Product: Tech Evangelism → Browser
Summary: chami.com - Garbage displayed as page appears to be IE only {TECH EVANGELISM?} → chami.com - Garbage displayed
Version: unspecified → Trunk
oops... really reassign this time...
Assignee: susiew → darin
QA Contact: zach → httpqa
I'm seeing this with 2003020804 on windows 2000, by the way.

one other thing... after doing view source on the garbage, closing view source,
and doing view source again, I get just:
<html><body></body></html>

reloading the page again after that (I guess from the cache), I get a blank
page. Shift+reload brings up the garbage again, and another reload displayed the
page correctly. very strange. I note that the server for these pages doesn't
identify itself.
Hello
Being a user and tester of HTML-KIT, I've seen this blank page sometimes also.
However, since installing Mozilla 1.3b
(Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3b) Gecko/20030130) on
Windows XP Pro SP1, German language,

I haven't seen it yet.

As to the missing header, Chami told me about this bug report
'they were talking about not being able to see the server header. That was done
to discourage worms that look for certain types of servers.'

So I told him that I was registered here and would add myself to the CC list,
and while here I thought a comment or two was in order.

He also uses a lot of ASP pages and of late is getting into PHP. 
well communication at both ends is useful, so thanks.

the type of server may or may not be an issue - could we know what kind of server it is?

just a random thought - I know mozilla uses the Server: header to avoid some
things (pipelining) when talking to some kinds of server. is it possible that
because Mozilla doesn't see what the server is, it's defaulting to doing
something the server can't handle?
@ Michael Lefevre

No problem, glad to help. 
I posted on his news group with your response and here is a paste of his reply.



'The server is running IIS 5. Hmm... I wonder if it's a
gzip content related issue, because the HTTP Compression option is enabled
on the server. If necessary I'll be glad to turn off compression for a short
period of time for testing purposes.

Regards,
Chami.'

As I stated earlier, I'm not seeing this garbled page, however it is interesting
about the gzip.

Attached file http log of connection
this log of a connection to this server looks pretty weird - funny things
happening with partial requests and messages about content lengths not
matching.
in case there was doubt, I've just confirmed that it is the gzip compression
that's causing the problem...

I tried adding user_pref("network.http.accept-encoding" ,""); to prefs.js, and
the site works fine with that.
Summary: chami.com - Garbage displayed → chami.com - undecoded gzip content displayed
OS: Windows 2000 → All
Hi All

I relayed the last two posts from below over to Chami and here is his response.

"Thank you very much for working on this Steve. Your involvement greatly
improved the speed of the process.

Now that we know it's related to HTTP compression, I'm not sure what the
best solution is. It seems to be working okay with IE and I'd hate to turn
off page compression. I'll see if there's a way to enable it based on user
agent.

Regards,
Chami."

I advised him to put things on hold and see if the added pref-line idea from
Michael Lefevre solves the problem, and if so, whether it will be checked into the code.

As you see from his response, he wants to work together on this.

PS, Chami's HTML-KIT has added the ability to use the Gecko engine for internal
viewing, and even dual side-by-side with IE; very neat.
Steve - 

that pref was intended to be a diagnostic thing, it's not something that should
be checked in (the pref stops mozilla from using compression with all sites,
which takes away the benefit to chami and everyone else - not good!), although
you could use it as a temporary workaround.

there probably is a way (at least there is in Apache, I don't know about IIS)
that he can disable compression on his site for Mozilla, but while that would
fix this case, it would be much better to find the bug (either in IIS or
Mozilla) and make a proper fix for it.  What we need is for a Mozilla developer
to look at this bug.  Please tell Chami to leave things "on hold" for the moment
until someone gets around to looking at the bug on Mozilla's end (which may take
a while).  Thanks.
Hi

"Please tell Chami to leave things "on hold" for the moment
until someone gets around to looking at the bug on Mozilla's end (which may take
a while).  Thanks."


Consider it done. Posting to him now.
i think there may be a bunch of duplicates of this bug.  going to make this the
main bug since it has an HTTP log.  thanks michaell!
Severity: normal → major
Status: NEW → ASSIGNED
Priority: -- → P2
Summary: chami.com - undecoded gzip content displayed → HTTP headers getting mis-parsed [was: chami.com - undecoded gzip content displayed]
Target Milestone: --- → mozilla1.4alpha
*** Bug 196526 has been marked as a duplicate of this bug. ***
ok, so here's what's going on.  the toplevel document contains a meta tag with a
new charset.  the charset change causes a reload of the toplevel document.  this
causes the http channel to be canceled and recreated.

if the meta charset tag appears in the first block of data sent to the HTML
parser then the call to cancel the HTTP channel will doom the corresponding
cache entry.  however, if the meta charset tag appears in a subsequent block of
data then the call to cancel the HTTP channel will not doom the cache entry. 
this is expected behavior because the HTTP channel sees that 1) the server
supports byte-range requests, and 2) the cache already has some data stored in it.

for folks using a fast connection, the meta charset almost always appears in the
first block of data.  this bug is probably easiest to reproduce using a modem.

now the question is: why are things screwed up when we take part of the document
from our cache and part of it from the server?  many possible explanations at
this point, but in order to know for sure, i'm going to have to dig deeper.
Whiteboard: [byte-range requests]
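The doom-or-keep decision described above can be sketched as a tiny predicate. This is a simplification of the described cache behavior, not actual Necko code; the function name and signature are hypothetical:

```python
def doom_cache_entry_on_cancel(server_accepts_ranges: bool, cached_bytes: int) -> bool:
    """Decide whether canceling the HTTP channel should doom (discard)
    the partially filled cache entry.

    Per the analysis above: keep the partial entry only when it can be
    completed later with a byte-range request, i.e. when the server
    advertised Accept-Ranges AND some data is already in the cache.
    """
    return not (server_accepts_ranges and cached_bytes > 0)


# The chami.com case: ranges supported, 4380 bytes cached -> entry kept,
# so the reload resumes with a Range: request instead of refetching.
keep_and_resume = not doom_cache_entry_on_cancel(True, 4380)
```

On a fast connection the cancel happens before any data is cached (`cached_bytes == 0`), the entry is doomed, and the bug never triggers, which matches the modem observation above.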
ok, more info.  here's the set of HTTP requests and responses that matter:

 http request [
   GET /html-kit/ HTTP/1.1
   Host: www.chami.com
   User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3b)
Gecko/20030219
   Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*
   Accept-Language: en-gb,en-us;q=0.7,en;q=0.3
   Accept-Encoding: gzip,deflate,compress;q=0.9
   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
   Keep-Alive: 300
   Connection: keep-alive
 ] 

 http response [
   HTTP/1.1 200 OK
   Content-Location: http://www.chami.com/html-kit/default.html
   Date: Wed, 19 Feb 2003 22:50:08 GMT
   Content-Type: text/html
   Accept-Ranges: bytes
   Last-Modified: Wed, 19 Feb 2003 22:20:38 GMT
   Etag: "bc32662165d8c21:944"
   Content-Length: 6561
   Content-Encoding: gzip
 ]

 http request [
   GET /html-kit/ HTTP/1.1
   Host: www.chami.com
   User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3b)
Gecko/20030219
   Accept:
text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,video/x-mng,image/png,image/jpeg,image/gif;q=0.2,*/*
   Accept-Language: en-gb,en-us;q=0.7,en;q=0.3
   Accept-Encoding: gzip,deflate,compress;q=0.9
   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
   Keep-Alive: 300
   Connection: keep-alive
   Range: bytes=4380-
   If-Range: "bc32662165d8c21:944"
 ]

 http response [ 
   HTTP/1.1 206 Partial content
   Content-Location: http://www.chami.com/html-kit/default.html
   Date: Wed, 19 Feb 2003 22:50:10 GMT
   Content-Type: text/html
   Etag: "bc32662165d8c21:944"
   Content-Length: 25824
   Content-Range: bytes 4380-30203/30204
 ]

notice that the byte-range response does not include a Content-Encoding: gzip
header!!  in fact, the 206 response contains non-compressed data, whereas the
original 200 response was compressed.  the HTTP code incorrectly assumes that
the byte-range response will also be compressed, and it just naively
concatenates the data stream and sends everything to the gzip decoder :-(

since the gzip decoder has state, i'm not exactly sure what the right solution
is here.  the server might be at fault, or else i'm not sure how compression and
range requests can co-exist.  after all, the requested byte range was relative to
the compressed data stream, not the uncompressed data stream.  the ETag for the
compressed stream matches the ETag for the uncompressed stream, so i think the
server is broken, but i'll need to double check RFC 2616.
Hmm. If this was transfer-encoding, then I suspect that this would be correct,
since TE doesn't affect the etag. Given that this isn't, though...

(BTW, since when did we send byte range requests voluntarily? Or is this the
plugin doing it?)

10.2.7 says:

   If the 206 response is the result of an If-Range request that used a
   strong cache validator (see section 13.3.3), the response SHOULD NOT
   include other entity-headers. If the response is the result of an
   If-Range request that used a weak validator, the response MUST NOT
   include other entity-headers; this prevents inconsistencies between
   cached entity-bodies and updated headers. Otherwise, the response
   MUST include all of the entity-headers that would have been returned
   with a 200 (OK) response to the same request.

Is content-encoding an entity-header?

Interestingly, "a non-transparent proxy MAY modify the content-coding if the new
coding is known to be acceptable to the recipient, unless the "no-transform"
cache-control directive is present in the message." Which means that you could
request half an entity from a proxy, and then the other half from a proxy which
does not add content-encoding, and get garbled results. Or change your browser
settings such that there isn't a common encoding, or something.

/me thinks about that a bit.

14.35.1 says that "Byte range specifications in HTTP apply to the sequence of
bytes in the entity-body (not necessarily the same as the message-body)." The
content-encoding stuff talks about applying the content encoding to the entity.

So this (and esp the other point about the proxy/setting changes) makes me think
that mozilla is wrong here.

What does apache do?

Do we store stuff compressed or uncompressed in the cache? We could disallow
range requests on content-encodings (but allow them on transfer-encodings),
except that most browsers don't support TE, so we'd basically be disabling a
method used to save bandwidth on sites which compress content to save bandwidth.

Does the range in the request come from the proxy, or from us, and does it
calculate based on the content encoding, or the uncompressed data? I'm guessing
it's the uncompressed data, which I (now) think is what the spec says.

Did any of that make sense? :)
bbaetz: so, how about this:

i can issue byte ranges to apache for a compressed html.gz document.  it will
return me a segment of the compressed document.  if i send apache a range for
the uncompressed document, it will not know what i'm talking about.  you see
what i mean?  i'm pretty convinced chami.com is broken.

also, the spec says that content-encoding is a property of the entity.  that
seems to suggest that what mozilla and apache are doing is correct.

-> tech evang
Assignee: darin → susiew
Status: ASSIGNED → NEW
Component: Networking: HTTP → US General
Product: Browser → Tech Evangelism
QA Contact: httpqa → zach
Target Milestone: mozilla1.4alpha → ---
Version: Trunk → unspecified
should we also open another bug report (Tech Evangelism) involving the Apache
developers (if this can be reproduced with the current release 2.0.44), or file
directly at http://nagoya.apache.org/bugzilla/ (I didn't find any such bug report)?
How do you send it a range for the uncompressed document?

The spec also says that content-encoding is applied to the entity.

I'm not sure what is correct, but given the prevalence of T-E sent as C-E....
even if chami.com (that is, IIS with compression enabled) is technically broken,
is there something easy that mozilla could do to work around this problem?

sucks to have to tell people with IIS servers they need to tweak them to work
with mozilla, even if it's actually IIS's fault.
bbaetz: the spec is really lacking in this case! :-/

in case someone is curious, here's a sample from Apache/2.0.40 for a range
request on a compressed document:

 GET /tests/foo HTTP/1.1
 Host: unagi
 Accept-Encoding: gzip
 Range: bytes=47962-
 If-Range: "3ebe3-16c25-d72940c0;502e8180"
 <plus other standard mozilla headers>

 HTTP/1.1 206 Partial Content
 Date: Fri, 14 Mar 2003 10:40:03 GMT
 Server: Apache/2.0.40 (Red Hat Linux)
 Content-Location: foo.gz.txt
 Vary: negotiate
 TCN: choice
 Last-Modified: Mon, 10 Mar 2003 18:59:55 GMT
 ETag: "3ebe3-16c25-d72940c0;502e8180"
 Accept-Ranges: bytes
 Content-Length: 45259
 Content-Range: bytes 47962-93220/93221
 Connection: close
 Content-Type: text/plain; charset=ISO-8859-1
 Content-Encoding: gzip

notice that apache's response is consistent with mozilla's request.  if we tried
to issue a byte range relative to the uncompressed content, we'd fetch the wrong
section of the document.

the site should be fixed if possible.  they should at least be told that they
are broken.  i'm sure mozilla is not the only useragent that issues byte range
requests.

as for fixing this on our end, it would require detecting the malformed response
(namely checking that the resulting Content-Length makes sense).  then if we
discover such a malformed response, we'd have to blow away our cached copy and
repeat the request.  that would require a non-trivial amount of code, but it
could be done.
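The detection step suggested here might look like the following hypothetical helper, using the sizes from the log earlier in this bug (6561 was the compressed Content-Length of the 200; 30204 the uncompressed total claimed by the 206):

```python
def range_reply_is_suspect(original_content_length: int,
                           content_range_total: int) -> bool:
    """Heuristic sketch of the proposed workaround: if the server earlier
    reported one entity size (here, the compressed length of the 200) and
    the 206's Content-Range now claims a different total for the same
    ETag, the cached prefix and the partial body cannot be concatenated.
    The caller should blow away the cache entry and re-request the whole
    document."""
    return original_content_length != content_range_total


# chami.com: 200 said Content-Length: 6561 (gzipped), but the 206 said
# Content-Range: bytes 4380-30203/30204 (uncompressed total) -> suspect.
broken = range_reply_is_suspect(6561, 30204)
```

A well-behaved server (the Apache example below) reports the same total the ranges were computed against, so the check passes there.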

the root problem however is the fact that mozilla double fetches the document.
it does that because of our lame way of handling a charset change occurring in a
meta tag.  we really shouldn't be re-requesting the document ever, but
technically there's nothing wrong in doing so.

maybe it is simply the configuration of IIS that is broken and not IIS itself? 
if on the other hand IIS is broken like this, then we could avoid sending
byte-range requests to it for compressed documents.  unfortunately, however, it
does not advertise itself in this case :(
also, maybe the server admin can simply disable sending "Accept-Ranges: bytes"
along with compressed responses.
 Range: bytes=47962-
...
 Content-Range: bytes 47962-93220/93221

What happened to the last byte?

This may be a mod_negotiate bug - can you try with a real file?
bbaetz: are you sure that's a bug?  seems like the difference of 1 is due to the
first numbers being indices and the second number being a length.

 47962 + 45259 = 93221

which is what i would expect.  93220 is the index of the last byte.
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.16 clarifies...
Content-Range sends 0-based indices, followed by a 1-based "total content
length" value.
Confirming this is still seen in Mozilla 1.4b (Build ID: 2003050714) using
Windows 2000.

You might need to click through a few links to observe the problem.
I can confirm this on: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4b)
Gecko/20030422

I did have to spend over five minutes clicking through different links and
was just about to stop and write here that I could 'NOT' confirm, but then it
did happen, i.e. garbled text as in the screenshot.
-->doron
Assignee: susiew → doron
tech evang june 2003 reorg
Assignee: doron → english-us
QA Contact: zach → english-us
*** Bug 209868 has been marked as a duplicate of this bug. ***
Looking at MS's support site, IIS has *lots* of bugs related to compression,
and compression is not even supported in IIS 5.1.

Steve, can you contact Chami again and ask him to disable compression for
Mozilla if possible? Note that he is not providing that much benefit to his
users by compressing his pages using IIS. IIS does not send 304 Not Modified for
compressed content but sends the full compressed page each time. 

He might consider using a non-broken web server like Apache instead of Windows
as well.

If he is concerned about performance, please have him contact me. I can perhaps
help.
Summary: HTTP headers getting mis-parsed [was: chami.com - undecoded gzip content displayed] → chami.com - undecoded gzip content displayed
@ Bob

I passed your last entry on the thread to Chami and here is his response.

"Thank you very much Steve. I was thinking about it the other day. Since
majority of the visitors to the site use IE, and also since only a very
small percentage of Mozilla users seem to run into this problem (things seem
to work okay after the first page load), turning page compression off is not
very beneficial when you consider the bandwidth savings. But I'll disable it
for now as suggested and see what other options/workaround are there."

Please understand that Chami is a one-man show (HTML-KIT) and thus very busy
earning his bread.
this problem appears to have come up again.  see bug 241085.  i have proposed a
workaround patch in that bug that would also help here.
INCOMPLETE due to lack of activity since the end of 2009.

If someone is willing to investigate the issues raised in this bug to determine whether they still exist, *and* work with the site in question to fix any existing issues, please feel free to re-open and assign to yourself.

Sorry for the bugspam; filter on "NO MORE PRE-2010 TE BUGS" to remove.
Status: NEW → RESOLVED
Closed: 17 years ago9 years ago
Resolution: --- → INCOMPLETE
FWIW, at some point since 2004 the site has changed completely, and no longer seems to be using the compression that caused this issue.
Resolution: INCOMPLETE → FIXED
Product: Tech Evangelism → Tech Evangelism Graveyard