Closed Bug 236858 Opened 20 years ago Closed 11 years ago

Repeating GET requests when charset <meta> appears late

Categories: Core :: DOM: HTML Parser (defect)
Platform: x86 / All
Type: defect; Priority: Not set; Severity: critical
Status: RESOLVED DUPLICATE of bug 61363
People: Reporter: pdsimic; Assignee: Unassigned
Attachments: 3 files

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007

After finishing development of a dynamic web site using PHP and server-side
sessions (using cookies), and uploading it to the server, I noticed that some of
its functionality is broken; specifically, the pages that use sessions to track
the user's current progress.

I traced this problem (by examining the web server's log files) to repeated GET
requests made by my Mozilla browser while I was testing the site from my
machine over a 56k modem connection.
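The duplicates were found by reading the server logs. A minimal sketch of automating that check, assuming Apache's Common Log Format and treating two identical GETs from the same host within a few seconds as a repeat (the window size and the sample lines are illustrative, not taken from the reporter's server):

```python
import re
from datetime import datetime

# Start of an Apache Common Log Format line (assumed format):
# host ident user [timestamp] "METHOD path PROTOCOL" status size
LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+)[^"]*"')
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"


def find_repeated_gets(lines, window_seconds=5):
    """Return (host, path, first, second) for identical GETs seen within the window."""
    last_seen = {}
    repeats = []
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # not a Common Log Format line
        host, stamp, method, path = m.groups()
        if method != "GET":
            continue
        when = datetime.strptime(stamp, TIME_FMT)
        key = (host, path)
        prev = last_seen.get(key)
        if prev is not None and (when - prev).total_seconds() <= window_seconds:
            repeats.append((host, path, prev, when))
        last_seen[key] = when
    return repeats


# Illustrative lines only, not taken from the reporter's server.
SAMPLE_LOG = [
    '10.0.0.1 - - [08/Mar/2004:21:03:52 +0000] "GET /admin/index.php HTTP/1.1" 200 1234',
    '10.0.0.1 - - [08/Mar/2004:21:03:54 +0000] "GET /admin/index.php HTTP/1.1" 200 1234',
    '10.0.0.1 - - [08/Mar/2004:21:04:30 +0000] "GET /admin/add.php HTTP/1.1" 200 987',
]

for host, path, first, second in find_repeated_gets(SAMPLE_LOG):
    print(f"{host} fetched {path} twice: {first:%H:%M:%S} and {second:%H:%M:%S}")
```

A short window keeps deliberate reloads by the user from being counted, at the cost of missing slower duplicates over a very slow link.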

The question is: why is the Mozilla browser repeating GET requests to the web server?
Is there a way to increase HTTP (connection?) timeouts? I tried fiddling with the
configuration in about:config, but even after setting the ...timeout... parameters
to enormous values, nothing improved.

The described repetition of GET requests happens in about 90% of cases, tested with
both Mozilla 1.5 and 1.6.

Any help? Thanks in advance.

BTW, here are the HTTP headers from a sample hand-made GET request to one of the
"problematic" pages (the headers are identical on my development server and my
production server, so there are no differences between them that could cause
these problems):

HTTP/1.1 200 OK
Date: Mon, 08 Mar 2004 21:03:52 GMT
Server: Apache
Set-Cookie: admsid=23fe9156addcf1af54d82827cc124a43; path=/admin; domain=xxx.xxxxxxxx.xxx
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Content-Type: text/html


Reproducible: Always
Steps to Reproduce:
1. Fetch the problematic page from my production server

Actual Results:  
Repeated GET queries to my production server.

Expected Results:  
Just one GET query. My 56k modem connection is quite stable. ;)
reporter: can you please provide an HTTP log per the instructions on this site:

http://www.mozilla.org/projects/netlib/http/http-debugging.html

feel free to email the log file directly to me if you would like its contents
kept private.  attaching the log file to this bug is otherwise fine :)
> Content-Type: text/html

You don't set a charset here.  Does the page set it?  Or are you relying on the
browser's charset autodetect?  Do the repeated GETs go away if you disable
charset autodetect?
> You don't set a charset here.  Does the page set it?  Or are you relying on the
> browser's charset autodetect?  Do the repeated GETs go away if you disable
> charset autodetect?

I have the following line in my page's header, so it's setting the charset:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">

In my Mozilla's settings, Navigator->Languages->Character Coding->Default
Character Coding is set to "Western (ISO-8859-1)".

Excuse me for a stupid question, but how do I disable charset autodetect?
Browser log, recorded while reproducing the problem
> reporter: can you please provide a HTTP log per the instructions on this site:

I've sent the requested browser HTTP log, reproducing the described problem.
Note that it's a bzip2-compressed file.

In this logfile, http://omega.homelab.net/ is just my start page, while pages
making problems are under http://mp3.rskoming.net/admin/, and you can see
repeated requests for /admin/index.php, /admin/add.php and finally for
/admin/logout.php.

As it could have something to do with local caching, I've tried to reproduce the
problem with all four caching settings ("Compare the page in local cache with the
page on network" - or however it's worded ;), and it persists with all four settings.

BTW, please don't think I'm violating many laws by distributing MP3s around;
this is just a local archive. ;)
> Excuse me for a stupid question, but how do I disable charset autodetect?

View menu > Character Encoding > Autodetect > (Off)
> > Excuse me for a stupid question, but how do I disable charset autodetect?
>
> View menu > Character Encoding > Autodetect > (Off)

Just for info, it was (and still is) turned Off...
Attachment #143372 - Attachment mime type: text/plain → application/x-bzip2
> I've sent the requested browser's HTTP log, reproducing the described problem.
> As a notice, it's a bzip2'ed file.

Any clues out of it?
I was also suffering from this problem with my cart. I spent about 40 minutes trying
to fix it on the server side, and then I decided to check the requests via
LiveHTTPHeaders. That's when I noticed that the file was being re-requested.
After a quick search on Bugzilla I found this bug, noticed the comments
regarding the charset (I set one in the source, but not via the HTTP headers), and
sent the charset via the headers. It now works fine. The PHP code to send the
charset via headers is header("Content-type: text/html; charset=ISO-8859-1"); if
anyone is interested.
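The PHP one-liner above moves the charset from the <meta> tag into the HTTP header. The same idea as a minimal Python WSGI sketch; the app and its page body are hypothetical, and the no-cache header is kept only to show that it can stay once the charset is declared:

```python
def app(environ, start_response):
    """Hypothetical WSGI page that declares its charset in the HTTP header."""
    page = ("<!DOCTYPE html><html><head><title>Cart</title></head>"
            "<body>Ko\u0161arica</body></html>")
    body = page.encode("iso-8859-2")  # the bytes really are ISO-8859-2
    start_response("200 OK", [
        # With the charset here, the parser knows the encoding before it reads
        # any markup, so a later <meta> should not trigger the re-fetch
        # described in this bug.
        ("Content-Type", "text/html; charset=ISO-8859-2"),
        # The reporter's no-cache headers can stay as they are.
        ("Cache-Control", "no-store, no-cache, must-revalidate"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server
    make_server("localhost", 8000, app).serve_forever()
```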
Confirming.  One of my colleagues has managed to reproduce this reliably on
Firefox 1.0, WinXP.  (OS->all)

Setting the content-type header does indeed resolve the problem.

I'll attach HTTP traces in a minute.
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Log demonstrating the problem. The request in question is:
GET /template/startquote.launch?PolicyType=PC&CompanyName=template&brandName=default HTTP/1.1
Log for a similar request, again, for:
GET /template/startquote.launch?PolicyType=PC&CompanyName=template&brandName=default HTTP/1.1

The only server-side change between these two requests was to explicitly set a
"Content-Type: text/html;charset=UTF-8" response header, rather than relying on
the default.

I have reason to believe (though no detailed logs to back up my hunch) that
this problem is restricted to GET.  But then, if we were duplicating POSTs,
people would be yelling rather loudly because of all the site bustage.
this sounds like it's caused by a <meta> tag specifying a charset, where's the bug?
Further observations:  

The double "GET" only occurs if the page has a character-encoding meta tag which
differs from the encoding selected in the Firefox View->Encoding menu.  If the
browser encoding matches the encoding in the page, there's only a single
request, and everything is hunky dory.

[The reporter's web page is ISO-8859-2 (Central European); ours are UTF-8;
both our browsers are set to ISO-8859-1 (Western)]

The reporter is sending "Cache-Control: no-store".  so are we.

<wild speculation, based on minimal knowledge of moz's networking/parsing code,
apologies if I'm off-target by a radian or two>

Moz receives the page, and starts parsing.

Once the parser has gotten as far as the html meta http-equiv content-type tag,
[something] realises it's using the wrong encoding, drops everything on the
floor, and re-requests the incoming data from [something upstream] using the
*correct* encoding.

What "should" happen:
[something upstream] munges the incoming data into the correct encoding, and
sends it to the parser, which starts parsing again.

What's actually happening:
[something upstream] issues a second HTTP GET to the originating web server. 
(Possibly because of the draconian cache-control header?)

</wild speculation>
Christian: Apologies, last comment posted before I grokked your comment.  It's
past midnight here.

Were you thinking of bug 61363 ?  Based on bz's comment 2 above, that's
certainly the one he had in mind.  

I'm pretty sure I had charset autodetect turned off when I hit the problem.

That said, it does sound awfully similar.  Two different ways to trigger the
same problem?
comment 14 describes what happens exactly. I can't tell whether bug 61363
applies only to autodetection or also to the more general case of current
encoding != meta encoding.

personally I don't consider this a bug, but this is not my code... not a necko
issue, since necko can't cache no-store pages; moving this to intl
Assignee: darin → smontagu
Component: Networking: HTTP → Internationalization
QA Contact: core.networking.http → amyy
Depends on: latemeta
Bug 61363 does also include the case of charset specified by meta, but it
doesn't (or didn't) happen when the <meta> is in the first 2048 bytes of the
document. Is that the case here?
Simon: I can confirm that, in our case, the meta tag most definitely IS within
the first 2048 bytes of the document content (it actually runs from ~380-450
characters, which, unless my understanding of UTF-8 is way off, means it's
in the 380-450 byte range, given that there aren't any characters likely to
require multiple bytes that early in the document).

I'll try to roll a "simple" test case in the near future, but it's probably
going to be a day or two until I get time, and it's probably going to be
JSP-based when it does happen.
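The character-versus-byte reasoning above can be checked directly by encoding everything before the <meta> tag and measuring it. A small sketch; the page snippet is made up, and in UTF-8 only non-ASCII characters (here "é") push the byte offset past the character offset:

```python
def byte_offset(text: str, char_index: int, encoding: str = "utf-8") -> int:
    """Byte position at which text[char_index] starts once text is encoded."""
    return len(text[:char_index].encode(encoding))


# Made-up page: one non-ASCII character ("é") before the <meta> tag.
page = ('<!DOCTYPE html>\n<html><head><title>Caf\u00e9</title>\n'
        '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">')

meta_chars = page.index("<meta")
meta_bytes = byte_offset(page, meta_chars)
print(meta_chars, meta_bytes)  # "é" takes two bytes in UTF-8, so these differ by one
print(meta_bytes < 2048)       # inside the 2048-byte window discussed above
```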
I guess what's relevant is not so much the byte count, but whether it is in the first
packet (or the first 2048 bytes of the first packet or something).
I recently posted in the bug forum about a similar problem where pages are for
some reason loading twice.  I was referred to this bug.

http://forums.mozillazine.org/viewtopic.php?t=209461

A SUMMARY:

We've discovered a problem with our CLRStore.com website when using Firefox. The
only thing I can deduce at this point is a Firefox browser bug. I've done a ton
of testing on this, and I simply can't explain why Firefox mysteriously loads
the following page twice but Internet Explorer only loads it once, as it should.

https://www.clrstore.com/cgi-bin/store.cgi

Here's what I mean:

When adding a new product to the shopping cart (a product you haven't already
added to the cart earlier in the same session), Firefox incorrectly loads
the script twice, causing two products to be added to the cart instead of one. I
know this because, right after the "add to cart" link is clicked, a message
appears stating that the product already exists in the shopping cart, left over
from the first time the page was loaded. When a product already exists, the
quantity is added to the existing order. If I add a product that I've already
added to the cart in the past (and then removed again), the page only loads once
like it should, and the product is only added once as well.

I tested this same thing in Internet Explorer and to my surprise it worked just
fine.

Why would the same website run differently on separate browsers? And why would
Firefox cause the same page to reload once it has already parsed to right around
the middle of the store.cgi script?  What seems to reload the page is either the
Perl "index" function or a delayed reaction by Firefox.  I've narrowed it down
to the exact spot in the script using trial and error.

This looks like a Firefox bug to me and I'm certain it isn't my script. I've
checked and there is no way the product could be added twice without reverse
processing the script page or reloading the entire script page. Anyone have any
ideas?

I can post portions of the code if needed, but I don't know if it would help. 
This problem breaks my script and I'm amazed it is still around.
I guess your problem would probably be fixed by sending a charset in the HTTP
header. At a guess, you currently have it in a <meta> tag, do not send it in an
HTTP header, and send headers telling the browser not to cache the page. That
makes Mozilla reload the document once it sees that <meta>.

let me note that that was mentioned some comments above too...
I just tested adding "Content-Type: text/html; charset=UTF-8" to my Perl script
instead of the "Content-Type: text/html" I had before and things now work great!
 Just wanted to follow up on my previous comment I left a few days ago.
I've been searching for the answer to this problem for a while now, and having finally found this page, I am scratching my head that some people seem to think that it isn't really a bug. Why should the browser send two page requests just because the HTTP header and the meta tag are absent or contradictory? Assuming this really is the cause, it is strange for the browser to behave this way. And as others have noted, it causes havoc on sites where page requests are logged in a database or a cookie for some reason.

I can't think of any reason why a browser should make two page requests just because of the charset encoding.  It is still happening in the latest release of Firefox, and it makes no sense.
it makes two requests because it needs to reinterpret the data in the other character set and doesn't have the data for that locally, so it regets it from the server (this is why it does do that, it doesn't necessarily mean this is good behaviour)
(In reply to comment #24)
> it makes two requests because it needs to reinterpret the data in the other
> character set and doesn't have the data for that locally, so it regets it from
> the server (this is why it does do that, it doesn't necessarily mean this is
> good behaviour)


Well...yes, I figured that out.  I meant my question in a more philosophical sense.  As in, is this really a desirable feature?  It seems to me that a better way to handle this would be to have Firefox have some form of priority list which declares whether to use the HTTP header or the meta tag in the event that they are absent or contradictory.  But Firefox having to make a whole new request to the server?  I just can't see this as anything other than a bug.
It certainly has that priority list. If there's an HTTP header it's used. The real question is, if you have to guess what the charset is, and you can only do that some thousand characters after the page started, and the page asked not to be cached, what do you do?
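A simplified model of the decision described in this comment and the ones above, written as a hypothetical Python function rather than Gecko's actual code: an HTTP charset wins; a <meta> found before parsing starts is used directly; a late <meta> that disagrees with the fallback forces a restart, and if the response was not cacheable (for example `Cache-Control: no-store`) the only remaining source of the bytes is a second request.

```python
from typing import Optional, Tuple


def choose_charset(http_charset: Optional[str],
                   meta_charset: Optional[str],
                   meta_in_prescan: bool,
                   fallback: str,
                   cacheable: bool) -> Tuple[str, bool]:
    """Return (charset to use, whether the document must be requested again).

    Simplified, hypothetical model of the behaviour discussed in this bug;
    not Gecko's actual implementation.
    """
    if http_charset:
        # An HTTP charset is authoritative: no guessing, no switching later.
        return http_charset.lower(), False
    if meta_charset is None or meta_charset.lower() == fallback.lower():
        # Nothing to switch to: keep parsing with the fallback.
        return fallback.lower(), False
    if meta_in_prescan:
        # The <meta> was found before parsing started, so no bytes were
        # decoded with the wrong charset.
        return meta_charset.lower(), False
    # Late <meta>: the parser must start over with the new charset. Without
    # a cached copy of the bytes, the only source is the server.
    return meta_charset.lower(), not cacheable
```

With the reporter's setup (no HTTP charset, a late ISO-8859-2 <meta>, a Western fallback and `no-store`), the model returns `("iso-8859-2", True)`, i.e. a second GET.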

I think this is related to my bug on:

https://bugzilla.mozilla.org/show_bug.cgi?id=359690

Which is really a pain. I've got at least 10 of these administrations set up. What combination of headers worked?

I've tried the following in different combinations. No luck.

//header("Cache-control: private");  // IE 6 Fix.
//header("Content-type: text/html; charset=ISO-8859-1"); // FF 2.0 Fix
//header("Content-Type: text/html; charset=UTF-8"); // FF 2.0 Fix
(In reply to comment #25)
> (In reply to comment #24)
> > it makes two requests because it needs to reinterpret the data in the other
> > character set and doesn't have the data for that locally, so it regets it from
> > the server (this is why it does do that, it doesn't necessarily mean this is
> > good behaviour)
> Well...yes, I figured that out.  I meant my question in a more philosophical
> sense.  As in, is this really a desirable feature?  It seems to me that a
> better way to handle this would be to have Firefox have some form of priority
> list which declares whether to use the HTTP header or the meta tag in the event
> that they are absent or contradictory.  But Firefox having to make a whole new
> request to the server?  I just can't see this as anything other than a bug.

As far as I know, the standards, or at least best practices, say that the charset value in the HTTP Content-Type header should take precedence over a <meta> element (after all, the attribute is http-*EQUIV*, i.e. it is meant to be the equivalent of an HTTP header anyway).

In the general sense, this behaviour is expected if there's no charset specified in Content-Type or it differs from the <meta> declaration.
As others have said in fewer words than myself, the problem is that ideally you need to know the charset to be able to parse the page (otherwise how do you interpret the character data?).

If the browser is unsure of the intended charset it's stuck unless it guesses the charset.
So if it then finds a declaration in the page (in a <meta>) after it's started parsing the page, what should it do? Continue parsing the page using its guessed charset, or do things "properly" (to avoid charset mismatch issues) by reloading and reparsing the page using the charset it found in the <meta> the first time round?

I'm not saying this is a good thing, but you can't expect to not give an HTTP user agent the charset info, then expect it to magically know the charset before it loads and parses the page lol

As others have said, doing things properly by specifying the correct charset used in the HTTP headers removes this problem completely; the browser knows the charset before it starts parsing the page.

However, I can appreciate how this affects some sites which either can't or won't specify the charset in the HTTP Content-Type header.
One possible solution might be to have the browser keep the page in memory (requesting it from the server only once) and, if it finds a conflicting charset or one different from the guessed default, reparse the page from memory using the newly learned charset, if possible.
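The in-memory alternative suggested above can be sketched as follows, assuming the whole body has already been received and that the <meta> charset can be found in the raw bytes with a loose regular expression. Real HTML parsers use a more careful prescan and would also have to undo work already done with the wrong encoding (such as scripts that have run), which this sketch ignores:

```python
import re

# Very loose pattern for a charset declared in a <meta> tag (simplification).
META_CHARSET = re.compile(rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def decode_without_refetch(raw: bytes, guessed: str = "iso-8859-1") -> str:
    """Decode a fully received body, re-decoding the same bytes if a <meta> disagrees."""
    text = raw.decode(guessed, errors="replace")
    m = META_CHARSET.search(raw)
    if m:
        declared = m.group(1).decode("ascii")
        if declared.lower() != guessed.lower():
            try:
                # Second pass over bytes already in memory: no new HTTP request.
                text = raw.decode(declared, errors="replace")
            except LookupError:
                pass  # unknown charset name; keep the first decoding
    return text
```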

I'm not a programmer so I can't comment on how this could be implemented or how difficult it would be.
Further clarification to my previous comment (#29):

If the HTTP headers say not to cache the page in any way this might still be possible if it's all considered part of a single request from the user.
By that I mean it should all be treated as a single user request for the page (regardless of HTTP, which should be a single request in a perfect world, but if you're not going to specify the charset in the HTTP header, what do you expect lol), unless the user requests a page refresh or some other operation that would normally trigger HTTP activity.

However, I've got the nasty feeling that reparsing in memory would probably break at least something.
QA Contact: amyy → i18n
It doesn't look like this was ever resolved, although I see a few recent posts by others about multiple GET requests for images. I'm also experiencing this on my server. Here is a LiveHTTPHeaders request log:

http://mra.advanceday.com/link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T

GET /link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T HTTP/1.1
Host: mra.advanceday.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 13:48:16 GMT
Server: Apache
Expires: Tue, 25 Jan 2011 14:48:16 GMT
Content-Length: 15093
Connection: close
Content-Type: image/jpeg; charset=binary
----------------------------------------------------------
http://mra.advanceday.com/link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T

GET /link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T HTTP/1.1
Host: mra.advanceday.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 13:48:17 GMT
Server: Apache
Expires: Tue, 25 Jan 2011 14:48:17 GMT
Content-Length: 15093
Connection: close
Content-Type: image/jpeg; charset=binary
Comment #31 said:
> Content-Type: image/jpeg; charset=binary

I believe this is somewhat confused. If it’s binary data (i.e. *not* text), there’s no charset to specify.
Read the HTTP/1.1 spec (RFC 2616, text-search: “binary” (*with* quotes)); AFAICT “binary” is one possible option for Transfer-Encoding or Content-Encoding, but not for the charset parameter of Content-Type, where in my understanding it would be nonsensical.

Try configuring the server to respond without specifying a ‘charset’ for binary data, thus: Content-Type: image/jpeg
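A sketch of that server-side fix, assuming the Content-Type string comes from a MIME detector that appends `charset=binary` (such as `file --mime`). The helper name is hypothetical; it only drops a `charset=binary` parameter and leaves real text charsets untouched:

```python
def clean_content_type(content_type: str) -> str:
    """Drop a meaningless ``charset=binary`` parameter from a Content-Type value.

    Hypothetical helper for values produced by MIME detectors such as
    ``file --mime``; any other parameters are kept unchanged.
    """
    media_type, *params = [part.strip() for part in content_type.split(";")]
    kept = [p for p in params
            if p and p.replace(" ", "").lower() != "charset=binary"]
    return "; ".join([media_type] + kept)


print(clean_content_type("image/jpeg; charset=binary"))  # image/jpeg
print(clean_content_type("text/html; charset=UTF-8"))    # text/html; charset=UTF-8
```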

> Expires: [an hour in the future, from time of request]
I’d also question why (for images) you have Expires: set to only an hour in the future. Unless the images are actually displaying dynamic data (generated from elsewhere), which really does change *every* hour, then images (especially) should be set to something like a year in the future (relative to time of request, naturally), to enable caching (RFC 2616 & http://www.mnot.net/cache_docs/). If, then, the image displayed on (a|some) particular page(s) really needs to be different, then use a different *source* URI in the <img> element in the page mark-up. Best of both.
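The two suggestions above, a far-future Expires header plus a new image URI whenever the image really changes, might look like this. The query-string versioning scheme is an assumption (any change to the src URI would do), and `max-age` is added as the HTTP/1.1 counterpart of Expires:

```python
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 60 * 60  # seconds


def long_lived_image_headers(now=None):
    """Caching headers for an image whose URI changes whenever its content does."""
    now = time.time() if now is None else now
    return {
        # RFC 1123 date, e.g. "Fri, 01 Jan 1971 00:00:00 GMT"
        "Expires": formatdate(now + ONE_YEAR, usegmt=True),
        # HTTP/1.1 equivalent of the Expires header above
        "Cache-Control": f"max-age={ONE_YEAR}",
    }


def versioned_src(path: str, version: str) -> str:
    """Hypothetical scheme: put a version tag in the <img src> query string."""
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}v={version}"


print(long_lived_image_headers(0)["Expires"])  # Fri, 01 Jan 1971 00:00:00 GMT
print(versioned_src("/graphs/usage.png", "2011012513"))
```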
The charset=binary is being generated by a MIME-type identifier (such as file on Linux) called through PHP; I'm not setting it explicitly. The hour in the future is exactly what you said: the graphic is generated and has a lifetime of one hour.

The question really though, is why is FF doing a double-request in the first place?
Is Simon Montagu still working on this, or should we assign this bug to someone else?
Don't use GET to change state on your server; clients, intermediaries, spiders, etc. can and will make automated requests, pre-fetch, retry failed requests, etc.

Use POST.
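A minimal WSGI sketch of that advice, with a hypothetical in-memory cart standing in for session storage: GET and HEAD only read, POST is the only method that changes state, and the POST answers with a 303 redirect so that a reload or a duplicate fetch repeats only the harmless GET:

```python
import io

CART = []  # hypothetical stand-in for real per-session storage


def cart_app(environ, start_response):
    """GET/HEAD only read the cart; POST is the only method that changes it."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        size = int(environ.get("CONTENT_LENGTH") or 0)
        CART.append(environ["wsgi.input"].read(size).decode("utf-8"))
        # Post/Redirect/Get: a reload or duplicate fetch now repeats the GET only.
        start_response("303 See Other", [("Location", "/cart")])
        return [b""]
    if method in ("GET", "HEAD"):
        body = ("items: " + ", ".join(CART)).encode("utf-8")
        start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8"),
                                  ("Content-Length", str(len(body)))])
        return [body] if method == "GET" else [b""]
    start_response("405 Method Not Allowed", [("Allow", "GET, HEAD, POST")])
    return [b""]


def post_environ(product: str) -> dict:
    """Minimal environ for exercising the POST branch without a server."""
    data = product.encode("utf-8")
    return {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(data)),
            "wsgi.input": io.BytesIO(data)}
```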
Same issue with FF4 and content like images/CSS served by Apache. What is especially annoying (at least for me) is that the second request does NOT send the session cookie. Example:

GET /img_bg.gif HTTP/1.1
Host: 192.168.1.9
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
DNT: 1
Connection: keep-alive
Cookie: sessionid=*********/************

HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 04:01:29 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny10 with Suhosin-Patch
Last-Modified: Fri, 10 Jun 2011 03:40:58 GMT
ETag: "a289-9b-4a5535671a680"
Accept-Ranges: bytes
Content-Length: 155
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/gif

GIF89a...........;

GET /img_bg.gif HTTP/1.1
Host: 192.168.1.9
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
DNT: 1
Connection: keep-alive

HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 04:01:29 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny10 with Suhosin-Patch
Last-Modified: Fri, 10 Jun 2011 03:40:58 GMT
ETag: "a289-9b-4a5535671a680"
Accept-Ranges: bytes
Content-Length: 155
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: image/gif

GIF89a...........;
What's the status of fixing this bug? Any plans? It's hurting our servers and bandwidth.
What is the status of this bug? Same issue with FF v16.0.2 when requesting a dynamically generated image: Firefox sends the request twice.
This is a long, long standing bug and really should be fixed.  I come across it occurring fairly regularly.  Imagine the bandwidth being wasted due to this bug, double-downloading images.
There are a lot of people receiving these updates, perhaps if we all vote for this bug it will make a difference?
What's the status in fixing this bug?
Flags: needinfo?(smontagu)
As far as I know nobody is working on this, nor on bug 61363 which it depends on.
Assignee: smontagu → nobody
Component: Internationalization → HTML: Parser
Flags: needinfo?(smontagu)
First of all, as far as I can tell this problem, as originally reported, is simply a duplicate of bug 61363. Hence, I am marking this as a duplicate.

The problem described in comment 31, comment 37 and comment 39 is most likely a different problem arising from prefetching images.

The original problem can be 100% avoided by the Web page author by using HTML correctly as required by the HTML specification. There are three different solutions any one of which can be used:

 1) Configure your server to declare the character encoding in the Content-Type HTTP header. For example, if your HTML document is encoded as UTF-8 (which it should be), make your servers send the HTTP header
Content-Type: text/html; charset=utf-8
instead of
Content-Type: text/html

This solution works with any character encoding supported by Firefox.

OR

 2) Make sure that you declare the character encoding of your HTML document using a "meta" element within the first 1024 bytes of your document. That is, if you are using UTF-8 (which you should), start your document with
<!DOCTYPE html>
<html>
  <head>
    <meta charset=utf-8>
    <title>Whatever</title>
etc. and don't put massive comments, scripts or other stuff before <meta charset=utf-8>.

This solution works with any character encoding supported by Firefox except UTF-16 encodings, which you shouldn't be using anyway.

OR

 3) Start your document with a BOM (byte order mark). If you're using UTF-8, make the first three bytes of your file be 0xEF, 0xBB, 0xBF. You probably should not use this method unless you're sure that the software you are using won't accidentally delete these three bytes.

This solution works only with UTF-8 and UTF-16, but you should not be using UTF-16 anyway, which is why I did not give the magic bytes for UTF-16.
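A rough self-check for the three options above, as a hypothetical helper: given the Content-Type header and the raw body, it reports whether a BOM, the HTTP header, or a <meta> charset within the first 1024 bytes declares the encoding. It checks them roughly in the order the HTML specification's encoding sniffing uses (BOM, then transport layer, then <meta> prescan), and the regular expressions are simplifications of the real prescan:

```python
import re

BOMS = [(b"\xef\xbb\xbf", "utf-8"), (b"\xff\xfe", "utf-16le"), (b"\xfe\xff", "utf-16be")]
HEADER_CHARSET = re.compile(r"charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def early_charset(content_type, body, prescan_bytes=1024):
    """Return (source, charset) for the first early declaration found, else (None, None)."""
    for bom, name in BOMS:                           # option 3: byte order mark
        if body.startswith(bom):
            return "bom", name
    m = HEADER_CHARSET.search(content_type or "")    # option 1: HTTP header
    if m:
        return "http", m.group(1).lower()
    m = META_CHARSET.search(body[:prescan_bytes])    # option 2: early <meta>
    if m:
        return "meta", m.group(1).decode("ascii").lower()
    return None, None
```

A result of `(None, None)` means the page is in the situation this bug describes: the browser has to guess and may only discover the real encoding later.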

As for the other problem related to prefetching images, please see https://developer.mozilla.org/en-US/docs/HTML/Optimizing_Your_Pages_for_Speculative_Parsing

Finally, Firefox 4 had a bug which made it load images between <noscript> and </noscript> even when scripting was enabled. That bug has been fixed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Summary: Repeating GET requests → Repeating GET requests when charset <meta> appears late
No longer depends on: latemeta