236858 - Repeating GET requests when charset <meta> appears late

Reporter

Description

•

20 years ago

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5) Gecko/20031007

After finishing the development of a dynamic web site using PHP and server-side
sessions (using cookies), and uploading it to the server, I noticed that some of
its functionality is broken: specifically, the pages that use sessions to
determine the current state of the user's progress.

I traced this problem (by examining Web server's log files) to be caused by
repeating GET requests made by my Mozilla browser, while testing my web site
from my machine, over a 56k modem connection.

The question is: why is the Mozilla browser repeating GET queries to the Web
server? Is there a way to increase HTTP (connection?) timeouts? I tried to fiddle
with the configuration in about:config, but even after setting the ...timeout...
parameters to enormous values, nothing worked better.

The described repetition of GET queries happens in about 90% of cases, tested
both with Mozilla 1.5 and 1.6.

Any help? Thanks in advance.

BTW, here are the HTTP response headers from a sample hand-made GET request to
one of the "problematic" pages (the headers are identical on my development
server and my production server, so there are no differences between them to
cause these problems):

HTTP/1.1 200 OK
Date: Mon, 08 Mar 2004 21:03:52 GMT
Server: Apache
Set-Cookie: admsid=23fe9156addcf1af54d82827cc124a43; path=/admin; domain=xxx.xxxxxxxx.xxx
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Connection: close
Content-Type: text/html


Reproducible: Always
Steps to Reproduce:
1. Fetch the problematic page from my production server

Actual Results:  
Repeated GET queries to my production server.

Expected Results:  
Just one GET query. My 56k modem connection is quite stable. ;)

Darin Fisher

Comment 1

•

20 years ago

reporter: can you please provide a HTTP log per the instructions on this site:

http://www.mozilla.org/projects/netlib/http/http-debugging.html

feel free to email the log file directly to me if you would like its contents
kept private.  attaching the log file to this bug is otherwise fine :)

Boris Zbarsky [:bzbarsky]

Comment 2

•

20 years ago

> Content-Type: text/html

You don't set a charset here.  Does the page set it?  Or are you relying on the
browser's charset autodetect?  Do the repeated GETs go away if you disable
charset autodetect?

Dragan Simic

Reporter

Comment 3

•

20 years ago

> You don't set a charset here.  Does the page set it?  Or are you relying on the
> browser's charset autodetect?  Do the repeated GETs go away if you disable
> charset autodetect?

I have the following line in my page's header, so it's setting the charset:
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-2">

In my Mozilla's settings, Navigator->Languages->Character Coding->Default
Character Coding is set to "Western (ISO-8859-1)".

Excuse me for a stupid question, but how do I disable charset autodetect?

Dragan Simic

Reporter

Comment 4

•

20 years ago

Attached file Browser log, during problem reproducing — Details

Browser log, during problem reproducing

Dragan Simic

Reporter

Comment 5

•

20 years ago

> reporter: can you please provide a HTTP log per the instructions on this site:

I've sent the requested browser's HTTP log, reproducing the described problem.
As a notice, it's a bzip2'ed file.

In this logfile, http://omega.homelab.net/ is just my start page, while pages
making problems are under http://mp3.rskoming.net/admin/, and you can see
repeated requests for /admin/index.php, /admin/add.php and finally for
/admin/logout.php.

Since it could have something to do with local caching, I've tried to reproduce
the problem with all four caching settings ("Compare the page in local cache with
the page on network", or however it's worded ;), and it persists with all four.

BTW, please don't think I'm violating many laws by distributing MP3's around,
this is just a local archive. ;)

Boris Zbarsky [:bzbarsky]

Comment 6

•

20 years ago

> Excuse me for a stupid question, but how do I disable charset autodetect?

View menu > Character Encoding > Autodetect > (Off)

Dragan Simic

Reporter

Comment 7

•

20 years ago

> > Excuse me for a stupid question, but how do I disable charset autodetect?
>
> View menu > Character Encoding > Autodetect > (Off)

Just for info, it was (and still is) turned Off...

Andrew Schultz

Updated

•

20 years ago

Attachment #143372 - Attachment mime type: text/plain → application/x-bzip2

Dragan Simic

Reporter

Comment 8

•

20 years ago

> I've sent the requested browser's HTTP log, reproducing the described problem.
> As a notice, it's a bzip2'ed file.

Any clues out of it?

Thomas O'Connor

Comment 9

•

20 years ago

I was also suffering from this problem with my cart.  I spent about 40 minutes
trying to fix it on the server side, and then I decided to check the request via
LiveHTTPHeaders.  It was then I noticed that the file was being re-requested.
After a quick search on Bugzilla I found this bug, noticed the comments
regarding the charset (I set one in the source, but not via HTTP headers), and
sent the charset via the headers. It now works fine.  The PHP code to send the
charset via headers is header("Content-type: text/html; charset=ISO-8859-1"); if
anyone is interested.
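
For readers not using PHP, the same workaround can be sketched in Python. This is only an illustration: the WSGI app name `app` and the `PAGE` content are made up for the example, and the one detail taken from this comment is the `charset` parameter on the Content-Type header.

```python
# Hypothetical page body; any HTML would do.
PAGE = "<!DOCTYPE html><html><head><title>Cart</title></head><body>ok</body></html>"


def app(environ, start_response):
    """Minimal WSGI app that declares the charset in the HTTP header."""
    body = PAGE.encode("iso-8859-1")
    headers = [
        # The charset is stated up front, so the browser does not have to
        # discover it from a <meta> tag while parsing.
        ("Content-Type", "text/html; charset=ISO-8859-1"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

The app can be served for a quick local check with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`.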

gavin long

Comment 10

•

20 years ago

Confirming.  One of my colleagues has managed to reproduce this reliably on
Firefox 1.0, WinXP.  (OS->all)

Setting the content-type header does indeed resolve the problem.

I'll attach HTTP traces in a minute.

Status: UNCONFIRMED → NEW

Ever confirmed: true

OS: Linux → All

gavin long

Comment 11

•

20 years ago

Attached file HTTP log. No content type -> duplicated GET — Details

Log demonstrating the problem. The request in question is:
GET /template/startquote.launch?PolicyType=PC&CompanyName=template&brandName=default HTTP/1.1

gavin long

Comment 12

•

20 years ago

Attached file HTTP log. Content type set -> single GET — Details

Log for a similar request, again, for:
GET
/template/startquote.launch?PolicyType=PC&CompanyName=template&brandName=default

HTTP/1.1
(apologies for unnecessary wrapping)

The only server-side change between these two requests was to explicitly set a
"Content-Type: text/html;charset=UTF-8" response header, rather than relying on
the default.

I have reason to believe (though no detailed logs to back up my hunch) that
this problem is restricted to GET.  But then, if we were duplicating POSTs,
people would be yelling rather loudly because of all the site bustage.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 13

•

20 years ago

this sounds like it's caused by a <meta> tag specifying a charset, where's the bug?

gavin long

Comment 14

•

20 years ago

Further observations:  

The double "GET" only occurs if the page has a character-encoding meta tag which
differs from the encoding selected in the Firefox View->Encoding menu.  If the
browser encoding matches the encoding in the page, there's only a single
request, and everything is hunky dory.

[The reporter's web page is ISO-8859-2 (Central European);  ours are UTF-8; 
both our browsers are set to ISO-8859-1 (Western)]

The reporter is sending "Cache-Control: no-store".  so are we.

<wild speculation, based on minimal knowledge of moz's networking/parsing code,
apologies if I'm off-target by a radian or two>

Moz receives the page, and starts parsing.

Once the parser has gotten as far as the html meta http-equiv content-type tag,
[something] realises it's using the wrong encoding, drops everything on the
floor, and re-requests the incoming data from [something upstream] using the
*correct* encoding.

What "should" happen:
[something upstream] mungles the incoming data into the correct encoding, and
sends it to the parser, which starts parsing again.

What's actually happening:
[something upstream] issues a second HTTP GET to the originating web server. 
(Possibly because of the draconian cache-control header?)

</wild speculation>
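
The "what should happen" path above can be sketched in a few lines of Python. This is a toy illustration, not Mozilla's code: the function name, the regular expression and the ISO-8859-1 default are all assumptions made for the example. The point is only that the bytes already received are re-decoded from memory instead of being requested again.

```python
import re

# Rough pattern for a charset declared inside a <meta> tag (illustrative only;
# a real HTML parser is considerably more careful than this).
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def decode_with_late_meta(raw, guessed="iso-8859-1"):
    """Decode raw bytes, re-decoding from the buffer if a <meta> disagrees.

    Returns a (text, charset_used) pair.
    """
    match = META_CHARSET.search(raw)
    if match:
        declared = match.group(1).decode("ascii").lower()
        if declared != guessed.lower():
            try:
                # Re-interpret the bytes already held in memory: no second GET.
                return raw.decode(declared, errors="replace"), declared
            except LookupError:
                pass  # unknown charset label: fall back to the guess
    return raw.decode(guessed, errors="replace"), guessed
```

Whether a browser can actually do this depends on it still holding the raw bytes, which is the sticking point with "Cache-Control: no-store" raised in this comment.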

gavin long

Comment 15

•

20 years ago

Christian: Apologies, last comment posted before I grokked your comment.  It's
past midnight here.

Were you thinking of bug 61363 ?  Based on bz's comment 2 above, that's
certainly the one he had in mind.  

I'm pretty sure I had charset autodetect turned off when I hit the problem.

That said, it does sound awfully similar.  Two different ways to trigger the
same problem?

Christian :Biesinger (don't email me, ping me on IRC)

Comment 16

•

20 years ago

comment 14 describes what happens exactly. I can't tell whether bug 61363
applies only to autodetection or also to the more general case of current
encoding != meta encoding.

personally I don't consider this a bug, but this is not my code... not a necko
issue, since necko can't cache no-store pages; moving this to intl

Assignee: darin → smontagu

Component: Networking: HTTP → Internationalization

QA Contact: core.networking.http → amyy

Boris Zbarsky [:bzbarsky]

Updated

•

20 years ago

Depends on: latemeta

Simon Montagu :smontagu

Comment 17

•

20 years ago

Bug 61363 does also include the case of charset specified by meta, but it
doesn't (or didn't) happen when the <meta> is in the first 2048 bytes of the
document. Is that the case here?

gavin long

Comment 18

•

20 years ago

Simon: I can confirm that, in our case, the meta tag most definitely IS within
the first 2048 bytes of the document content (it actually runs from ~380-450
characters, which unless my understanding of UTF-8 is way off, means it's
actually in the 380-450 byte range, given that there aren't any heavy-duty
characters likely to require multiple bytes that early in the document)

I'll try to roll a "simple" test case in the near future, but it's probably
going to be a day or two until I get time, and it's probably going to be
JSP-based when it does happen

Christian :Biesinger (don't email me, ping me on IRC)

Comment 19

•

20 years ago

I guess relevant is not so much byte count, but whether it is in the first
packet (or first 2048 bytes of the first packet or something)

Matt B

Comment 20

•

20 years ago

I recently posted in the bug forum about a similar problem where pages are for
some reason loading twice.  I was referred to this bug.

http://forums.mozillazine.org/viewtopic.php?t=209461

A SUMMARY:

We've discovered a problem with our CLRStore.com website when using Firefox. The
only thing I can deduce at this point is a Firefox browser bug. I've done a ton
of testing on this, and I simply can't explain why Firefox mysteriously loads
the following page twice but Internet Explorer only loads it once as it should.

https://www.clrstore.com/cgi-bin/store.cgi

Here's what I mean:

When adding a new product to the shopping cart (a product you haven't already
added to the cart in the past under the same session), Firefox incorrectly loads
the script twice causing two products to be added to the cart instead of one. I
know this because you will notice a message stating the product already exists
in the shopping cart from the first time the page was loaded, yet after the "add
to cart" link was clicked. When a product already exists, the quantity is added
to the existing order. If I add a product that I've already added to the cart in
the past (and then removed again), the page only loads once like it should and
the product is subsequently only added once as well.

I tested this same thing in Internet Explorer and to my surprise it worked just
fine.

Why would the same website run differently on separate browsers? And why would
Firefox cause the same page to reload once it has already parsed to right around
the middle of the store.cgi script?  What seems to reload the page is either the
Perl "index" function or a delayed reaction by Firefox.  I've narrowed it down
to the exact spot in the script using trial and error.

This looks like a Firefox bug to me and I'm certain it isn't my script. I've
checked and there is no way the product could be added twice without reverse
processing the script page or reloading the entire script page. Anyone have any
ideas?

I can post portions of the code if needed, but I don't know if it would help. 
This problem breaks my script and I'm amazed it is still around.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 21

•

20 years ago

I guess your problem would probably be fixed by sending a charset in the HTTP
header. at a guess, you are currently having it in a <meta> tag, and do not send
a http header, and send headers not to cache the page. that makes mozilla reload
the document once it sees that <meta>.

let me note that that was mentioned some comments above too...

Matt B

Comment 22

•

20 years ago

I just tested adding "Content-Type: text/html; charset=UTF-8" to my Perl script
instead of the "Content-Type: text/html" I had before and things now work great!
Just wanted to follow up on the comment I left a few days ago.

fwustner

Comment 23

•

18 years ago

I've been searching for the answer to this problem for a while now, and having finally found this page, I am scratching my head that some people seem to think that it isn't really a bug.  Why should the browser send two page requests just because the HTTP header and the meta tag are absent or contradictory?  Assuming this really is the cause, it is strange for the browser to behave this way.  And as others have noted, it causes havoc on sites where page requests are logged in a database or a cookie for some reason.

I can't think of any reason why a browser should make two page requests just because of the charset encoding.  It is still happening in the latest release of Firefox, and it makes no sense.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 24

•

18 years ago

it makes two requests because it needs to reinterpret the data in the other character set and doesn't have the data for that locally, so it regets it from the server (this is why it does do that, it doesn't necessarily mean this is good behaviour)

fwustner

Comment 25

•

18 years ago

(In reply to comment #24)
> it makes two requests because it needs to reinterpret the data in the other
> character set and doesn't have the data for that locally, so it regets it from
> the server (this is why it does do that, it doesn't necessarily mean this is
> good behaviour)


Well...yes, I figured that out.  I meant my question in a more philosophical sense.  As in, is this really a desirable feature?  It seems to me that a better way to handle this would be to have Firefox have some form of priority list which declares whether to use the HTTP header or the meta tag in the event that they are absent or contradictory.  But Firefox having to make a whole new request to the server?  I just can't see this as anything other than a bug.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 26

•

18 years ago

It certainly has that priority list. If there's an HTTP header it's used. The real question is, if you have to guess what the charset is, and you can only do that some thousand characters after the page started, and the page asked not to be cached, what do you do?
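
The order described here (HTTP header first, then the in-page declaration, then a guess) can be sketched as below. This is a simplified illustration in Python, not Firefox's logic; the function names and the ISO-8859-1 fallback are assumptions, and the header is parsed with the standard library's `email.message.Message`.

```python
from email.message import Message


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None when absent


def pick_charset(content_type, meta_charset, fallback="iso-8859-1"):
    """HTTP header wins; otherwise the <meta> value; otherwise a guess."""
    from_header = charset_from_header(content_type)
    if from_header:
        return from_header
    if meta_charset:
        return meta_charset.lower()
    return fallback
```

The hard case in this bug is the middle branch: the `<meta>` value is only known after parsing has already started with the fallback.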

Clint

Comment 27

•

18 years ago

I think this is related to my bug on:

https://bugzilla.mozilla.org/show_bug.cgi?id=359690

Which is really a pain. I've got at least 10 of these administrations set up. What combination of headers worked?

I've tried the following in different combinations. No luck.

//header("Cache-control: private");  // IE 6 Fix.
//header("Content-type: text/html; charset=ISO-8859-1"); // FF 2.0 Fix
//header("Content-Type: text/html; charset=UTF-8"); // FF 2.0 Fix

Lee Carré

Comment 29

•

17 years ago

(In reply to comment #25)
> (In reply to comment #24)
> > it makes two requests because it needs to reinterpret the data in the other
> > character set and doesn't have the data for that locally, so it regets it from
> > the server (this is why it does do that, it doesn't necessarily mean this is
> > good behaviour)
> Well...yes, I figured that out.  I meant my question in a more philosophical
> sense.  As in, is this really a desirable feature?  It seems to me that a
> better way to handle this would be to have Firefox have some form of priority
> list which declares whether to use the HTTP header or the meta tag in the event
> that they are absent or contradictory.  But Firefox having to make a whole new
> request to the server?  I just can't see this as anything other than a bug.

As far as I know, standards, or at least best practices, say that the charset value in the HTTP Content-Type header should take precedence over a <meta> element (considering the name is http-*EQUIV*, i.e. it should be in the HTTP header anyway).

In the general sense, this behaviour is expected if there's no charset specified in Content-Type or it differs from the <meta> declaration.
As others have said in fewer words than myself, the problem is that ideally you need to know the charset to be able to parse the page (otherwise how do you interpret the character data?).

If the browser is unsure of the intended charset it's stuck unless it guesses the charset.
So if it then finds a declaration in the page (in a <meta>) after it's started parsing the page, what should it do? Continue parsing the page using its guessed charset, or do things "properly" (to avoid charset mis-match issues) by reloading and reparsing the page using the charset it found in the <meta> the first time round?

I'm not saying this is a good thing, but you can't expect to not give an HTTP user agent the charset info, then expect it to magically know the charset before it loads and parses the page lol

As others have said, doing things properly by specifying the correct charset used in the HTTP headers removes this problem completely; the browser knows the charset before it starts parsing the page.

However, I can appreciate how this affects some sites which either can't or won't specify the charset in the HTTP Content-Type header.
One possible solution might be to have the browser keep the page in memory (only request it from the server once), and if it finds conflicting or a charset different to the guessed default then, if possible, it should reparse the page using the newly learned charset while still in memory.

I'm not a programmer so I can't comment on how this could be implemented or how difficult it would be.

Lee Carré

Comment 30

•

17 years ago

Further clarification to my previous comment (#29):

If the HTTP headers say not to cache the page in any way this might still be possible if it's all considered part of a single request from the user.
By that I mean it should all be treated as a single user-request of the page (regardless of HTTP, which should be a single request in a perfect world, but if you're not going to specify the charset in the HTTP what do you expect lol), unless the user requests a page refresh or some other operation which would normally invoke HTTP activity.

However, I've got the nasty feeling that reparsing in memory would probably break at least something.

Phil Ringnalda (:philor)

Updated

•

15 years ago

QA Contact: amyy → i18n

Clint Priest

Comment 31

•

13 years ago

It doesn't look like this ever got resolved although I see a few recent posts by others relating to image request multiple GET requests.  I'm also experiencing this on my server.  Here is a LiveHTTP request log:

http://mra.advanceday.com/link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T

GET /link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T HTTP/1.1
Host: mra.advanceday.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 13:48:16 GMT
Server: Apache
Expires: Tue, 25 Jan 2011 14:48:16 GMT
Content-Length: 15093
Connection: close
Content-Type: image/jpeg; charset=binary
----------------------------------------------------------
http://mra.advanceday.com/link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T

GET /link/9fCqc01E01C01ExlHfixi8qM9cRRS1F1C1BU1632T HTTP/1.1
Host: mra.advanceday.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729; .NET4.0C)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive

HTTP/1.1 200 OK
Date: Tue, 25 Jan 2011 13:48:17 GMT
Server: Apache
Expires: Tue, 25 Jan 2011 14:48:17 GMT
Content-Length: 15093
Connection: close
Content-Type: image/jpeg; charset=binary

Lee Carré

Comment 32

•

13 years ago

Comment #31 said:
> Content-Type: image/jpeg; charset=binary

I believe this is somewhat confused. If it’s binary data (i.e.; *not* text), there’s no charset to specify.
Read the HTTP/1·1 spec (RFC 2616, text-search: “binary” (*with* quotes)); AFAICT “binary” is one possible option for Transfer-Encoding or Content-Encoding, but not Content-Type (charset, where it would be nonsensical in my understanding).

Try configuring the server to respond without specifying a ‘charset’ for binary data, thus: Content-Type: image/jpeg

> Expires: [an hour in the future, from time of request]
I’d also question why (for images) you have Expires: set to only an hour in the future. Unless the images are actually displaying dynamic data (generated from elsewhere), which really does change *every* hour, then images (especially) should be set to something like a year in the future (relative to time of request, naturally), to enable caching (RFC 2616 & http://www.mnot.net/cache_docs/). If, then, the image displayed on (a|some) particular page(s) really needs to be different, then use a different *source* URI in the <img> element in the page mark-up. Best of both.

Clint Priest

Comment 33

•

13 years ago

The charset=binary parameter is being generated by a MIME type identifier (such as file on Linux), invoked through PHP; I'm not setting it specifically.  The hour in the future is just what you said: the graphic is being generated and has a lifetime of one hour.

The question really though, is why is FF doing a double-request in the first place?

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Comment 34

•

13 years ago

Is Simon Montagu still working on this, or should we assign this bug to another person?

Mark Nottingham

Comment 35

•

13 years ago

Don't use GET to change state on your server; clients, intermediaries, spiders, etc. can and will make automated requests, pre-fetch, retry failed requests, etc.

Use POST.
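
A minimal sketch of that advice, in Python for illustration: the in-memory `cart` list, the `/cart` path and the app name are hypothetical. State changes only on POST, so a repeated GET (from a reload, a prefetch or this bug) is harmless.

```python
cart = []  # hypothetical in-memory cart, for illustration only


def cart_app(environ, start_response):
    """WSGI app where GET only reads state and POST is the only writer."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        cart.append("item")
        # Redirect after the POST so a reload repeats a GET, not the POST.
        start_response("303 See Other", [("Location", "/cart")])
        return [b""]
    body = ("Items in cart: %d" % len(cart)).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```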

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Comment 36

•

13 years ago

But that's no fix for this bug...

Karel

Comment 37

•

13 years ago

Same issue with FF4 and content like images/css and Apache. What is especially annoying (at least for me) is that the second request does NOT provide session cookie. Example:

GET /img_bg.gif HTTP/1.1
Host: 192.168.1.9
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
DNT: 1
Connection: keep-alive
Cookie: sessionid=*********/************

HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 04:01:29 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny10 with Suhosin-Patch
Last-Modified: Fri, 10 Jun 2011 03:40:58 GMT
ETag: "a289-9b-4a5535671a680"
Accept-Ranges: bytes
Content-Length: 155
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: image/gif

GIF89a...........;

GET /img_bg.gif HTTP/1.1
Host: 192.168.1.9
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0.1) Gecko/20100101 Firefox/4.0.1
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: fr,en;q=0.5
Accept-Encoding: gzip, deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
DNT: 1
Connection: keep-alive

HTTP/1.1 200 OK
Date: Fri, 10 Jun 2011 04:01:29 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny10 with Suhosin-Patch
Last-Modified: Fri, 10 Jun 2011 03:40:58 GMT
ETag: "a289-9b-4a5535671a680"
Accept-Ranges: bytes
Content-Length: 155
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: image/gif

GIF89a...........;

Rob Tonsan

Comment 38

•

13 years ago

What's the status of fixing this bug ? Any plans ? It's hurting our servers and bandwidth.

nbritvikhina

Comment 39

•

12 years ago

What is the status of this bug ? Same issue with FF v16.0.2 when requesting dynamically generated image. Firefox sends request twice.

Clint Priest

Comment 40

•

12 years ago

This is a long, long standing bug and really should be fixed.  I come across it occurring fairly regularly.  Imagine the bandwidth being wasted due to this bug, double-downloading images.

Clint Priest

Comment 41

•

12 years ago

There are a lot of people receiving these updates, perhaps if we all vote for this bug it will make a difference?

Virtual_ManPL [:Virtual] 🇵🇱 - (please needinfo? me - so I will see your comment/reply/question/etc.)

Comment 42

•

11 years ago

What's the status in fixing this bug?

Flags: needinfo?(smontagu)

Simon Montagu :smontagu

Comment 43

•

11 years ago

As far as I know nobody is working on this, nor on bug 61363 which it depends on.

Assignee: smontagu → nobody

Component: Internationalization → HTML: Parser

Flags: needinfo?(smontagu)

Henri Sivonen (:hsivonen)

Comment 44

•

11 years ago

First of all, as far as I can tell this problem, as originally reported, is simply a duplicate of bug 61363. Hence, I am marking this as a duplicate.

The problem described in comment 31, comment 37 and comment 39 is most likely a different problem arising from prefetching images.

The original problem can be 100% avoided by the Web page author by using HTML correctly as required by the HTML specification. There are three different solutions any one of which can be used:

1) Configure your server to declare the character encoding in the Content-Type HTTP header. For example, if your HTML document is encoded as UTF-8 (which it should be), make your servers send the HTTP header
Content-Type: text/html; charset=utf-8
instead of
Content-Type: text/html

This solution works with any character encoding supported by Firefox.

OR

2) Make sure that you declare the character encoding of your HTML document using a "meta" element within the first 1024 bytes of your document. That is, if you are using UTF-8 (which you should), start your document with
<!DOCTYPE html>
<html>
<head>
<meta charset=utf-8>
<title>Whatever</title>
etc. and don't put massive comments, scripts or other stuff before <meta charset=utf-8>.

This solution works with any character encoding supported by Firefox except UTF-16 encodings, which you shouldn't be using anyway.
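
A rough way to check a page against solution 2 is sketched below in Python. It is a heuristic, not the browser's actual scanning algorithm: the regular expression and function names are assumptions, and only the 1024-byte window comes from the text above.

```python
import re

PRESCAN_LIMIT = 1024  # byte window named in solution 2 above

# Rough pattern for a charset declared inside a <meta> tag (illustrative only).
META_CHARSET = re.compile(
    rb"<meta[^>]+charset\s*=\s*[\"']?([A-Za-z0-9_\-]+)", re.IGNORECASE
)


def meta_charset_end(raw):
    """Byte offset just past the declared charset value, or None if absent."""
    match = META_CHARSET.search(raw)
    return None if match is None else match.end()


def declared_early(raw):
    """True if a <meta> charset declaration ends within the first 1024 bytes."""
    end = meta_charset_end(raw)
    return end is not None and end <= PRESCAN_LIMIT
```

Running `declared_early(open("page.html", "rb").read())` on the bytes the server actually sends (the file name is a placeholder) gives a quick indication of whether large comments or scripts have pushed the declaration too far down.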

OR

3) Start your document with a BOM (byte order mark). If you're using UTF-8, make the first three bytes of your file be 0xEF, 0xBB, 0xBF. You probably should not use this method unless you're sure that the software you are using won't accidentally delete these three bytes.

This solution works only with UTF-8 and UTF-16, but you should not be using UTF-16 anyway, which is why I did not give the magic bytes for UTF-16.
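
To check whether a file actually starts with those three bytes, something like the following Python sketch can be used. The function name is made up for the example; the constants come from the standard `codecs` module. Note that it does not distinguish UTF-32, whose little-endian BOM begins with the same two bytes as UTF-16LE.

```python
import codecs


def sniff_bom(raw):
    """Return the encoding implied by a leading BOM, or None if there is none."""
    if raw.startswith(codecs.BOM_UTF8):  # 0xEF 0xBB 0xBF
        return "utf-8"
    if raw.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return None
```

In Python, encoding text with the "utf-8-sig" codec writes the UTF-8 BOM automatically, which is one way to avoid losing the three bytes by hand-editing.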

As for the other problem related to prefetching images, please see https://developer.mozilla.org/en-US/docs/HTML/Optimizing_Your_Pages_for_Speculative_Parsing

Finally, Firefox 4 had a bug which made it load images between <noscript> and </noscript> even when scripting was enabled. That bug has been fixed.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

Summary: Repeating GET requests → Repeating GET requests when charset <meta> appears late

Mathew Hodson

Updated

•

7 months ago

No longer depends on: latemeta

Browser log, during problem reproducing 20 years ago Dragan Simic 10.11 KB, application/x-bzip2		Details
HTTP log. No content type -> duplicated GET 20 years ago gavin long 131.68 KB, text/plain		Details
HTTP log. Content type set -> single GET 20 years ago gavin long 192.42 KB, text/plain		Details