Open Bug 597820 Opened 14 years ago Updated 2 years ago

Charset autodetect can cause all loads of a page to be LOAD_FROM_CACHE, inhibiting correct validation of subresources

Categories

(Core :: DOM: Navigation, defect)

People

(Reporter: wenbinleo, Unassigned)

Attachments

(5 files)

User-Agent:       Mozilla/5.0 (X11; Linux x86_64; rv:2.0b7pre) Gecko/20100916 Firefox-4.0/4.0b7pre Ubuntu/10.04
Build Identifier: Mozilla/5.0 (Windows NT 5.1; rv:2.0b6) Gecko/20100101 Firefox/4.0b6

The page http://www.cwb.gov.tw/V6/observe/rainfall/hk.htm? is a frame of the web page of Taiwan's Central Weather Bureau, http://www.cwb.gov.tw/. Its HTML source includes a JavaScript file, "/V6/js/rain_val.js". The HTML page rarely changes, but the JavaScript file is updated every 30 minutes with a new list of images showing how much rain has fallen in areas of Taiwan. When browsing this page with Firefox 4 beta, it shows old results from many days ago, and manually reloading the page does not help. Firefox 3.6, IE, Google Chrome and Opera do not suffer from this problem.

Reproducible: Always

Steps to Reproduce:
1. Ensure the cache size setting in Firefox is not too small. The default value is fine for reproducing the bug as long as you do not browse too much.
2. Go to http://www.cwb.gov.tw/V6/observe/rainfall/hk.htm? . The date and time in the image should be within 30 minutes of now. Note the date and time shown.
3. Close the web page and wait for 30 minutes or more. In the meantime, do not browse the web too much, so that the page is not evicted from the cache.
4. Reopen the page. The date and time are not updated.
Actual Results:  
The date and time are not updated. Manually reloading the page only reloads the HTML page, which gets an "HTTP/1.0 304 Not Modified" response showing that the HTML page has not changed. But the included JavaScript and image files are not reloaded even though they have expired.

Expected Results:  
The image, with the date and time inside it, is updated.

I first found this problem when I tried Firefox 4 beta 2 on one of my computers. I haven't tried older Firefox alphas to see if it happens there. But I have tried it on Windows 2000, Windows XP and Ubuntu x64, and it can be reproduced on every platform I tried.

The disk cache information in about:cache on one machine showed that rain_val.js had not been updated for nearly 3 weeks, even though it expired long ago.
Key                                                     Data size       Fetch count     Last modified  	        Expires
http://www.cwb.gov.tw/V6/observe/rainfall/hk.htm?       5431 bytes      10              2010-09-19 15:39:37     2010-09-30 17:43:13
http://www.cwb.gov.tw/V6/js/rain_val.js                 9225 bytes       5              2010-08-28 15:12:49     2010-08-28 15:12:49

As for manual reload not working: when a newer Firefox 4 beta with the Web Console was released, I tried it and found that manually reloading the page only makes it reload the HTML page, which gets an "HTTP/1.0 304 Not Modified". The browser does not attempt to reload the JavaScript file or the image file (nothing is logged in the Web Console).
Version: unspecified → Trunk
I don't know whether the question mark in the address http://www.cwb.gov.tw/V6/observe/rainfall/hk.htm? is required to reproduce the bug, but it is in the original page.
Wenbin Leo, could you please attach an HTTP log of you following the steps to reproduce?  See https://developer.mozilla.org/en/HTTP_Logging for directions.  That will give a good bit more information than the web console shows about what's going on under the hood.
I have reproduced the bug twice on the same Win XP machine and attached the log files.
I tried to reproduce this bug on Ubuntu 10.04 x64. This log.txt was not produced with an official Mozilla build but with a build from https://launchpad.net/~ubuntu-mozilla-daily/+archive/ppa, which updates daily.
OK, here's what seems to be the relevant part of the log:

0[82b140]: HttpBaseChannel::Init [this=46dd4a0]
0[82b140]: host=www.cwb.gov.tw port=-1
0[82b140]: uri=http://www.cwb.gov.tw/V6/js/rain_val.js
...
0[82b140]: nsHttpChannel::OpenCacheEntry [this=46dd4a0 grantedAccess=3]
...
0[82b140]: NOT validating based on LOAD_FROM_CACHE load flag
0[82b140]: nsHTTPChannel::CheckCache exit [this=46dd4a0 doValidation=0]
0[82b140]: nsHttpChannel::ReadFromCache [this=46dd4a0] Using cached copy of: http://www.cwb.gov.tw/V6/js/rain_val.js

Why is your LOAD_FROM_CACHE flag set?  The only times that's set, as far as I can see, are:

1) Charset change reloads
2) Checking whether a URI is locally available via the mozIsLocallyAvailable API
3) Save as
4) View source
5) Some weirdness involving <xul:image> that's not relevant here.

Plus anything extensions do, of course.  Just to check, are there any extensions involved?  Does the problem appear in safe mode?
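
As a rough aid for reading logs like the excerpt above, the following Python sketch scans an HTTP log for channels whose validation was skipped because of LOAD_FROM_CACHE. It assumes only the line shapes visible in the excerpt (a `uri=` line and the "NOT validating based on LOAD_FROM_CACHE load flag" line). Real logs interleave many channels, so tying a message to the most recent `uri=` line on the same thread is a simplifying assumption, and the function name `find_unvalidated_uris` is hypothetical.

```python
import re

# Line shape taken from the log excerpt: "<thread>[<hex>]: <message>".
PREFIX = re.compile(r"^(?P<thread>\S+\[[0-9a-f]+\]): (?P<msg>.*)$")

SKIP_MESSAGE = "NOT validating based on LOAD_FROM_CACHE load flag"


def find_unvalidated_uris(log_text):
    """Return URIs whose cache entries were used without validation
    because the LOAD_FROM_CACHE load flag was set."""
    last_uri = {}  # thread id -> most recent uri= seen on that thread (assumption)
    hits = []
    for line in log_text.splitlines():
        m = PREFIX.match(line.strip())
        if not m:
            continue  # skip "..." elisions and anything unrecognized
        thread, msg = m.group("thread"), m.group("msg")
        if msg.startswith("uri="):
            last_uri[thread] = msg[len("uri="):]
        elif msg == SKIP_MESSAGE:
            hits.append(last_uri.get(thread, "<unknown>"))
    return hits
```

Run against the excerpt above, this would report http://www.cwb.gov.tw/V6/js/rain_val.js as a load that skipped validation.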
This problem appears in safe mode, too.
In addition, I can reproduce it in another environment where Firefox has only one extension installed, Mozilla's Grafx Bot.
About the possible situations listed above:
1) Because the charset setting of this page (Big5) is correct, I don't manually change the charset when reading the page in question.
2) I don't know.
3) I haven't done any Save as on the site.
4) I did View source after the problem happened. Does this affect the next time I read the page?
> Does this affect the next time I read the page?

No, unless the log you attached is the log of you opening view source... ;)

When you say "reopen the page" in comment 0, you just retype the URI in the url bar and hit enter or something?

I really have no idea how your HTTP channels can possibly be ending up with the LOAD_FROM_CACHE flag set....
> No, unless the log you attached is the log of you opening view source... ;)

Thanks for your clarification. This log was generated by opening the page in the main browser window, not in view source.

> When you say "reopen the page" in comment 0, you just retype the URI in the 
> url bar and hit enter or something?

Yes, I just copy and paste the URI into the awesomebar and hit enter. I think this makes it easier for others to reproduce the bug.
In normal usage, this page is a frame of the site. Users follow a link in another frame to switch to this page or to others such as temperature. The temperature page suffers from the same problem.
But as far as I have tested, opening the page in a frame or in a tab does not affect the result.
Another question.  Does the problem appear in a clean profile?

To be clear, I can't reproduce this problem.  When I follow your steps to reproduce, things reload correctly, and my HTTP log says:

1894456352[102613840]: no mandatory validation requirement
1894456352[102613840]: Validating based on expiration time
1894456352[102613840]: nsHTTPChannel::CheckCache exit [this=11e47fc50 doValidation=1]

and then the browser does a conditional GET as expected.
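
The two log excerpts in this bug differ only in which branch of the cache check is taken. Here is a minimal Python model of that decision, covering just the two outcomes the logs show. It is a sketch of the observed behavior, not Gecko's actual CheckCache code, and the names `LoadFlags`, `CacheEntry` and `needs_validation` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Flag


class LoadFlags(Flag):
    NONE = 0
    LOAD_FROM_CACHE = 1  # flag name taken from the log message


@dataclass
class CacheEntry:
    expires: datetime


def needs_validation(entry, flags, now):
    """Return True if a conditional GET should be sent for this entry."""
    if flags & LoadFlags.LOAD_FROM_CACHE:
        # Log: "NOT validating based on LOAD_FROM_CACHE load flag"
        return False
    # Log: "Validating based on expiration time"
    return now >= entry.expires
```

With the expiry time reported for rain_val.js in about:cache, an expired entry is revalidated only when the flag is absent, which matches the difference between the two logs.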
After using the profile manager to create new profiles and testing with them on two computers, I can still reproduce this problem on both. On one of the two, I captured the log file and can confirm that LOAD_FROM_CACHE still appears in it.
Before reporting this to Bugzilla, I posted it on the local community (Mozilla Taiwan) forum, and some replies said they could not reproduce it. One person is using Windows 7 and the other is using Mac OS X 10.6. I don't know if this has something to do with the bug.
So it also happens with clean profiles?  That's pretty odd...  

I can try to put together a debug build that will log all sorts of information that might be useful in this case...  Are you willing to download and run that?
Yes, I can download and run a debug build to help find the cause. Thanks for your help.
Great!  On Linux, could you run http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/bzbarsky@mozilla.com-c5dac7ba9de1/tryserver-linux-debug/firefox-4.0b7pre.en-US.linux-i686.tar.bz2 please?  Start it from a terminal, redirecting stdout to a file.  Reproduce the bug.  Then attach the log file here?

Let me know if you'd prefer a Windows build, but capturing stdout on Windows is more of a pain....
The log file is attached.
When I use a new profile created by this debug build, I cannot reproduce the bug. But when I use my old profile with the build, I can reproduce it.
Some warnings were also printed to the terminal window while I redirected the log to a text file. Do I need to upload them?
Hrm.  I wonder why that's not showing me useful symbols for the stacks...  

Would you mind going into the directory in that debug Firefox install that has the file libxul.so in it and running the perl script at http://mxr.mozilla.org/mozilla-central/source/tools/rb/fix-linux-stack.pl?raw=1 on this log, then attaching the result?

The profile dependence is interesting, given that you said you could reproduce this in clean profiles before.  Does your profile that can reproduce the bug perhaps have charset autodetect turned on?
I used the command below to process the log file. Is it correct?

wenbin@Wenbin:~/firefox$ ./fix-linux-stack.pl debuglogstdout.txt > debuglogstdoutprocessed.txt

It seems the output log has some information stripped but nothing newly added.

> The profile dependence is interesting, given that you said you could reproduce
> this in clean profiles before.  Does your profile that can reproduce the bug
> perhaps have charset autodetect turned on?

Yes, I checked 3 environments that can reproduce the problem, and all have charset autodetect on. Mine is set to East Asian, and the other two are set to Universal. I will test whether the problem disappears if Charset Encoding Auto-Detect is disabled.
But my sister's computer has this option set to East Asian without suffering from this problem.
I have tried on 2 computers and can confirm that the problem disappears if Charset Encoding Auto-Detect is disabled.
OK.  So with charset auto-detect you will in fact get the charset change reloads I mention in comment 6.  That explains what's going on, at least!

What was the deal in comment 11?  There you said you could reproduce the problem in clean profiles, but autodetect defaults to off...

In any case, it sounds like we shouldn't inherit the charset autodetect LOAD_FROM_CACHE to the loadgroup... but should keep it for all the other cases.  Henri, Simon, is it possible to use charset autodetect without reloading?  That would make it easier to handle this.
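
One way to picture the proposal: subresource loads pick up flags from the document's load group, so a LOAD_FROM_CACHE set for a charset reload ends up on every script and image. The Python sketch below models that inheritance, along with a variant that withholds the flag for charset reloads only. It is an illustrative model under those assumptions, not docshell code, and `LoadGroup`, `ReloadReason` and `subresource_flags` are hypothetical names.

```python
from enum import Enum, Flag


class LoadFlags(Flag):
    NONE = 0
    LOAD_FROM_CACHE = 1  # flag name taken from the HTTP log


class ReloadReason(Enum):
    NORMAL = "normal"
    CHARSET_CHANGE = "charset change"
    VIEW_SOURCE = "view source"


class LoadGroup:
    """Hypothetical stand-in for the group of loads belonging to one page."""

    def __init__(self, document_flags, reason):
        self.document_flags = document_flags
        self.reason = reason

    def subresource_flags(self, inherit_charset_flag=True):
        """Flags a subresource load would inherit from the document load."""
        flags = self.document_flags
        if self.reason is ReloadReason.CHARSET_CHANGE and not inherit_charset_flag:
            # Proposed behavior: the document reload may still come from cache,
            # but subresources are left free to validate normally.
            flags = flags & ~LoadFlags.LOAD_FROM_CACHE
        return flags
```

In this model, passing `inherit_charset_flag=False` clears the flag for charset reloads while leaving the other cases, such as view source, unchanged.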
Status: UNCONFIRMED → NEW
Component: Networking: Cache → Document Navigation
Ever confirmed: true
QA Contact: networking.cache → docshell
Summary: Expired JavaScript file is not re-downloaded when including html page is not changed → Charset autodetect can cause all loads of a page to be LOAD_FROM_CACHE, inhibiting correct validation of subresources
Was the test with a clean profile using a localized build? In the zh-TW localization autodetect defaults to on.

When you say "is it possible to use charset autodetect without reloading", what alternative are you thinking of? If we converted what we already loaded using charset X, in theory we could convert it back and then reconvert using charset Y, but there would be dataloss some of the time, and would that really be easier?
> OK.  So with charset auto-detect you will in fact get the charset change
> reloads I mention in comment 6.  That explains what's going on, at least!

Thanks, Boris, for your efforts. I didn't know the auto-detect function affects the cache.

> What was the deal in comment 11?  There you said you could reproduce the
> problem in clean profiles, but autodetect defaults to off...

As Simon said, in the zh-TW localized builds, character encoding auto-detect defaults to "Universal".
> what alternative are you thinking of?

The alternative of not starting character encoding conversion in autodetect-on mode until the point after we have applied the autodetect heuristics.

Thanks for the localized build clarification; that totally explains the behavior.
What about a META charset deep in the page, which is another scenario where we can get a charset reload on first load?
We limit the <meta> scan to the first 1024 bytes, iirc.  I'm not sure whether we still do charset reloads from that code, though.  Henri?
(In reply to comment #24)
> We limit the <meta> scan to the first 1024 bytes, iirc.  I'm not sure whether
> we still do charset reloads from that code, though.  Henri?

The HTML5 parser limits the <meta> prescan and, unlike the old parser, the chardet run (when enabled) to the first 1024 bytes. That is, the <meta> prescan is not supposed to cause a reload and the chardet run is no longer supposed to be able to cause a reload.

If a charset <meta> is seen later, the HTML5 parser issues a reload request to the docshell.
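
A rough Python illustration of that rule, assuming a deliberately naive regular expression in place of the real <meta> prescan algorithm. The function `charset_action` and its return strings are hypothetical; only the 1024-byte limit comes from the comment above.

```python
import re

PRESCAN_LIMIT = 1024  # bytes, as stated in the comment above

# Naive stand-in for the real prescan: it only looks for "charset=" inside
# a <meta ...> tag, which covers both <meta charset=...> and the
# http-equiv/content form.
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=\s*[\"']?([\w-]+)", re.I)


def charset_action(html_bytes):
    """Classify where the first charset <meta> is found in the byte stream."""
    m = META_CHARSET.search(html_bytes)
    if m is None:
        return "no meta charset"
    if m.end() <= PRESCAN_LIMIT:
        return "found by prescan, no reload"
    return "found late, reload requested"
```

A page whose charset <meta> sits after a long run of leading markup or whitespace would fall into the last branch, which is the reload case described above.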
Oh, I see.  Ok, then we can get the issue that way too.  And the problem there is that we don't necessarily want to hit the network again for the loads we started before finding the <meta>.  But that case is rare, I assume.  I'm ok with changing its behavior.  Is charset autodetect reloading after subresource loads have already started also rare?
Severity: normal → S3