uncached non-utf8 document loads forever (hangs) with auto-detect -> universal

RESOLVED FIXED

Status

()

Core
DOM
P2
normal
RESOLVED FIXED
9 years ago
8 years ago

People

(Reporter: Marien Zwart, Assigned: bz)

Tracking

({regression})

Trunk
x86
All
regression
Points:
---
Bug Flags:
blocking1.9.2 +
blocking1.9.1 -
wanted1.9.1 +
blocking1.9.1.1 -
in-testsuite +

Firefox Tracking Flags

(status1.9.2 beta1-fixed, status1.9.1 .4-fixed)

Details

(URL)

Attachments

(3 attachments, 1 obsolete attachment)

(Reporter)

Description

9 years ago
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/2008111700 Gentoo Firefox/3.0.4
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.4) Gecko/2008111700 Gentoo Firefox/3.0.4

This is a reduced version of a problem occuring on http://furry-paws.com/shows.php. If you go there, log in as guest, make sure view -> character encoding -> universal is selected, and search for something you will go to a blank page with the load indicator in the tab spinning forever (or at least for the few minutes I gave it). If you look in the dropdown for forward/back you see a wyciwyg:// url for the search results page, and the previous page is the actual search results page. Hitting back will give the correct page. From then on that particular search will work, until you clear your cache and try again.

I tried to reduce this a bit and ended up with http://129.125.102.164/m/encoding-hang.html which has the following pieces:

- a <link rel="stylesheet" type="text/css" href="empty.css">. empty.css is a completely empty file.
- inline script doing a document.write.
- a non-utf-8 non-ascii character (currently a copyright ((C)) symbol).


Reproducible: Always

Steps to Reproduce:
1. enable view -> character encoding -> autodetect -> universal
2. load http://129.125.102.164/m/encoding-hang.html

Actual Results:  
Page loads forever, getting as far as "Hello, world!". After hitting "back" or clicking in the address bar and hitting enter the page loads completely. The page now keeps working until the cache is cleared and the page is revisited (clearing the cache and then hitting f5 keeps it loading, you have to actually revisit it).

Expected Results:  
Page loads immediately and completely (including the "(C)" character).
This seems like a serious regression that should block. Regression window is between 2008080402 and 2008080502, but I don't see anything in hg log that looks connected.
Flags: blocking1.9.1?
Keywords: regression
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 2

9 years ago
Universal isn't the default setting (except in sv-SE locale, which seems like a mistake).
Flags: wanted1.9.1+
Flags: blocking1.9.1?
Flags: blocking1.9.1-
Created attachment 359460 [details]
NSPR log with all:5

http://129.125.102.164/m/empty.css returns following header.

> [2268] 0[72c140]: http request [
> [2268] 0[72c140]:   GET /m/empty.css HTTP/1.1
> [2268] 0[72c140]:   Host: 129.125.102.164
> [2268] 0[72c140]:   User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090104 Minefield/3.2a1pre
> [2268] 0[72c140]:   Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> [2268] 0[72c140]:   Accept-Language: ja,en-us;q=0.7,en;q=0.3
> [2268] 0[72c140]:   Accept-Encoding: gzip,deflate
> [2268] 0[72c140]:   Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> [2268] 0[72c140]:   Keep-Alive: 300
> [2268] 0[72c140]:   Connection: keep-alive
> [2268] 0[72c140]: ]
>
> [2268] 1252[72c500]: http response [
> [2268] 1252[72c500]:   HTTP/1.1 200 OK
> [2268] 1252[72c500]:   Content-Length: 0
> [2268] 1252[72c500]:   Accept-Ranges: bytes
> [2268] 1252[72c500]:   Server: TwistedWeb/8.2.0
> [2268] 1252[72c500]:   Last-Modified: Wed, 19 Nov 2008 05:14:40 GMT
> [2268] 1252[72c500]:   Date: Thu, 29 Jan 2009 04:37:29 GMT
> [2268] 1252[72c500]:   Content-Type: text/css
> [2268] 1252[72c500]: ]

Attached is NSPR log, with all:5, with Fx trunk.
> Steps to Reproduce:
> 0. Clear all private data, including cache
> 1. enable view -> character encoding -> autodetect -> universal
> 2. load http://129.125.102.164/m/encoding-hang.html

Log for HTTP GET to http://129.125.102.164//m/empty.css is seen in NSPR log.
But log for HTTP response to it is not seen in NSPR log.
Error in handling of "Content-Length: 0" when css data of length=0 is not cached yet?
Bug 463210 or Bug 475444 is relevant? 
> Bug 463210 "Loading" indicator in browser doesn't stop spinning, although all the requests to web pages/css/scripts/images on a page are finished
> Bug 475444 HTTP standard violation, Content-length header appears to be ignored under some circumstances
(Reporter)

Comment 5

9 years ago
Please note that I put this up on this particular server because I could not reproduce the problem using local files but am not intentionally sending anything unusual in the http headers. However this server setup is a bit nonstandard, so you may see oddities there that are unrelated to the bug I was trying to report. Please check if the actual problematic server (furry-paws.com) sends roughly the same headers before deciding those are involved.
(In reply to comment #5)
To Marien Zwart(bug opener):
"No problem with local file" indicates HTTP header is relevant to problem. And, as seen in my HTTP header log, there is no unusual header except Content-Length:0 in your test case. 
Can you create test case with "empty css file" but Content-Lenght > ZERO, to make situation clearer?
   /m/empty-2.css          : Null line only file (LF or CRLF only file) 
   /m/encoding-hang-2.html : Points empty-2.css
   (I guess no problem with empty-2.css)
(Reporter)

Comment 7

9 years ago
I have now done what you asked: http://129.125.102.164/m/encoding-hang-2.html is identical to encoding-hang.html except for loading empty-2.css instead of empty.css, and empty-2.css has a multi-line comment. I see the same problem using these files.

But please note that the original problem did *not* involve an empty css file. An empty css file was just how far I was able to reduce the original problem without making it go away completely (omitting the css file entirely makes the problem go away, an empty file or a file with any contents seems to be sufficient to trigger the problem). If there is an issue here specific to zero-length files that is *not* the original problem I tried to reduce and seperate bug(s) should be filed.
As you say, Content-Length:0 doesn't seem to be relevant.

Tested with next combination. Phenomenon seems to have slightly changed.
> http://129.125.102.164/m/encoding-hang-2.html
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20090203 Minefield/3.2a1pre
While loading is incomplete, infinite increase of fetch count of Key=encoding-hang-2.html was observed. Internal load loop?
What is happening around step (2-a)/(2-b)/(2-c)?
1. HTTP GET to encoding-hang-2.html. Parse starts with iso-88591.
2-a. HTTP GET to empty-2.css by <link rel="stylesheet">
2-b. document.write('Hello, World') in <head> forces </head><body>
2-c. Because no charset in HTTP header
     and because no <meta http-equiv=Content-Type content="...;charset=...">,
     auto-detect tries to change windows-1252 when © in HTML source is found.
     => Internal load occurs (this is current implementation)
        - If HTTP GET for html is incomplete, HTTP GET is probably aborted,
          and second HTTP GET is issued.
          (In your case, new HTTP GET is not seen in log)
        - If HTTP GET for css is incomplete, process for HTTP GET is perhaps
          aborted. (reason of no log for GET response, no cache entry for .css)
3. Re-parse html with windows-1252, and render with windows-1252. 

By the way, why many web developers write non-ascii HTML without specifiying charset?
Why many web developers put document.write(HTML_element_for_BODY_only) in <script> of <head> section? 

(Test result)

(1) View/Character Encoding/Auto detect=Universal, Clear cahce data,
    and load the URI
=> "Hello, World!" only is displayed. © is not displayed.
   ViewCharacter Encoding : Western(iso-8859-1) 
   Load doesn't complete.

(2) about:cache (load still incomplete) 
=> http://129.125.102.164/m/encoding-hang-2.html only in Disk Cache
   No entry for empty-2.css is seen.
   Number of entries:2 is possibly a result "cancel of HTTP GET to empty-2.css". 
> Disk cache device
> Number of entries: 	2
>(snip)
> Key: http://129.125.102.164/m/encoding-hang-2.html
>(snip)
> Fetch count: 3
> (snip)

(3) "Cache entry information" for Key=encoding-hang-2.html
> key:	http://129.125.102.164/m/encoding-hang-2.html
> fetch count: NNN
Note : While loading incomplete, this fetch count increased infinitely.
       This indicates loop of a process or group of processes.
> last fetched: 2009-02-04 13:50:54
> last modified: 2009-02-04 13:50:25
> expires: 2009-02-04 14:33:18
> Data size: 229
>(snip)	
> charset: ISO-8859-1
>(snip)	
>(top part of data) 
> 00000000:  3c  68  74  6d  6c  3e  0a  20  20  3c  68  65  61  64  3e  0a  <html>.  <head>.
> 00000010:  20  20  20  20  3c  6c  69  6e  6b  20  72  65  6c  3d  22  73      <link rel="s
> 00000020:  74  79  6c  65  73  68  65  65  74  22  20  74  79  70  65  3d  tylesheet" type=
> 00000030:  22  74  65  78  74  2f  63  73  73  22  20  68  72  65  66  3d  "text/css" href=
> 00000040:  22  65  6d  70  74  79  2d  32  2e  63  73  73  22  3e  0a  20  "empty-2.css">. 

(4) Cancel loading by X button.
View source:
> Hello, World!
Ctrl+A/View selection source:
> <html><head></head><body>Hello, world!</body></html>
"Infinite increase of fetch count of Key=encoding-hang-2.html" stopped.

(5) Clear cache, and load empty-2.css (hold in cache), and load the URI
    => Result is same. Load is incomplete. © is not displayed.
To Marien Zwart(bug opener):
I think following at top of your HTML source is simpest workaround of your case.
Is it right?
> <meta http-equiv="Content-Type" content="text/html; charset='windows-1252'">
(Reporter)

Comment 10

9 years ago
Apologies for the delay in replying. I somehow missed the bugmail for the last two comments.

I'm sure there are ways to make this testcase load properly by changing the testcase. But that is not what I filed this bug for. I filed this because there is at least one real-world page out there that Firefox currently does not display (getting stuck "loading" forever). This makes it seem like Firefox is broken. (Boring background info: I filed this after someone I helped upgrade from Firefox 2 to 3 on #firefox complained that it was broken, because it could no longer display the furry-paws.com site mentioned in comment #0). Even if the page is invalid Firefox should either say so quickly or display as much of it as it can.

If you think making further changes to the testcase will actually help figure out what causes Firefox to loop like this please give instructions. But I am really not looking for the "simplest workaround" on the source end, because I do not intend to find all servers out there producing non-utf8 content without proper headers (I'm pretty sure there are more than the one reported here).
(In reply to comment #10)
> I'm sure there are ways to make this testcase load properly by changing th
e testcase. But that is not what I filed this bug for.
> I filed this because there is at least one real-world page out there that
> Firefox currently does not display (getting stuck "loading" forever).

Please don't misunderstand my question.
And sorry for misleading term of "workaround". I used "workaround" for "an action which will make one of conditions false" in previous comment.

You already found conditions when problem occurs.
>(1) a <link rel="stylesheet" type="text/css" href="empty.css">.
>(2) inline script doing a document.write.
>(3) a non-utf-8 non-ascii character (currently a copyright ((C)) symbol).
Because ISO-8859-1 is set in cache entry, I think following is much detailed condition of (3). 
     (3-A) no charset in Content-Type:
  && (3-B) auto-detect=Universal(and some others)
  && (3-C) auto-detected charset NE ISO-8859-1
           - possibly windows-1252 case only
           - seemes to be detected as windows-1252 when copyright symbol only
And I think main cause of the problem is fault in Fx's process around (3), because problem looks "internal reload loop upon charset change by auto-detect".
In other words, Fx's code around (3) can't tolerate with "(1)==true && (2)==true". 

I guess "no problem" if correct charset is specified, because it'll disable auto-detect, but I'm not sure. So I'd like to know real result(including charset saved in cache entry) when (1) && (2) && "auto-detected charset is set in Content-Type:" && "non-ascii character in HTML(condition 3 by you)".
You don't need check with "auto-detected charset is set in Content-Type:" any more. I could see both "charset in Content-Type:" and "auto-deteced charset" in cahce entry.
(a) https://bugzilla.mozilla.org/show_bug.cgi?id=nnnnnn
    (Sent with Content-Type: text/html; charset=UTF-8 header, not by meta tag)
    => Cache entry in Memory Cache Device: charset=UTF-8
(b) http://129.125.102.164/m/encoding-hang-2.html
    (No charset in HTTP header nor meta tag)
    Clear cache and Load page => HTTP GET for ...html is issued
    => ....html is saved in Disk Cache (charset=ISO-88591)
    => problem occurs => Cancel infinite load by click of X
    => Go to the page again (not Reload, not Shift+Reload)
       (HTTP GET for ...html is probably not issued in this test) 
       (I didn't check 304 for If-modified-Since: case)
    => Page is loaded(normal load), detected as windows-1252
    => Cache entry in Memory Cache Device: charset=windows-1252

Problem is next?
  Setting of auto-detected charset in cache entry is interfered by (1) && (2),
  if ...html is not already held in cache(HTTP GET is issued and 200 response).
Changing Component: to Networking:Cache, because failure/interfere of "charset change of cache entry to auto-detected carset" looks to be involved in problem. If wrong, change to appropriate one, please.
Assignee: smontagu → nobody
Component: Internationalization → Networking: Cache
OS: Linux → All
QA Contact: i18n → networking.cache
Version: unspecified → Trunk
CC-ing to Boris. 
What's wrong do you think?
Do we need to provide other debug data than NSPR log with all:5?
Uh... I have no idea.  There's so much noise in this bug, I can't tell what's relevant and what's not.

Someone just needs to sit down with a debugger and look into this.  Michal, want to give it a shot?
I'll take a look at it.
(In reply to comment #15)
> There's so much noise in this bug, (snip)

Sorry for much noise, but I simply tried to clarify external phenomenon and related conditions, tried to rule out irrelevant conditions, before asking to developers, because no developer had joined, even though "simple test case which produces problem always", even after we provided nspr log.
Key comments are:
 - Comment #0 + Comment #11 (for conditions)
 - Comment #8 + Comment #12 (external observation of phenomenon)
The problem is that the document is already closed when nsHTMLDocument::WriteCommon is called. I've tracked it down to nsScriptLoader but I don't know how to fix it. The following is happening:

1) HTML page starts loading
2) LINK tag is found and scriptloader is blocked from executing at http://hg.mozilla.org/mozilla-central/annotate/f13ca2edd374/content/html/document/src/nsHTMLContentSink.cpp#l2937
3) document.write() is found in HTML and nsScriptLoader::ProcessScriptElement() is called. Request is queued because loader is blocked (but it isn't deferred)
4) charset detection stops loading of the page at http://hg.mozilla.org/mozilla-central/annotate/f13ca2edd374/intl/chardet/src/nsDetectionAdaptor.cpp#l66
5) nsScriptLoader::EndDeferringScripts is called, but the request stays in queue since the scriptloader is still blocked.
6) document is now closed
7) CSS stops loading and the scriptloader is unblocked at http://hg.mozilla.org/mozilla-central/annotate/f13ca2edd374/content/base/src/nsContentSink.cpp#l339
8) queued request is dispatched and nsHTMLDocument::WriteCommon() is called
> 5) nsScriptLoader::EndDeferringScripts is called, but the request stays in
> queue since the scriptloader is still blocked.

This sounds like the bug to me; if we're stopping the load, we should drop queued scripts.  Jonas?
Component: Networking: Cache → DOM
QA Contact: networking.cache → general
That is, I think that if the sink is terminated we should cancel all pending script execution, not just deferred scripts.  Seem reasonable?
Flags: blocking1.9.2?
Flags: blocking1.9.1.1?
Created attachment 386793 [details] [diff] [review]
Perhaps like so?
Attachment #386793 - Flags: superreview?(jonas)
Attachment #386793 - Flags: review?(jonas)
Attachment #386793 - Flags: superreview?(jonas)
Attachment #386793 - Flags: superreview+
Attachment #386793 - Flags: review?(jonas)
Attachment #386793 - Flags: review+
Comment on attachment 386793 [details] [diff] [review]
Perhaps like so?

Testcase?
I have no idea how to write a reliable one.  I can't even reproduce very reliably...  Michal, can you create one, by chance?
I'll try to create it.
Pushed http://hg.mozilla.org/mozilla-central/rev/7093a03dfdca
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Flags: in-testsuite-
Resolution: --- → FIXED
Assignee: nobody → bzbarsky
Created attachment 388315 [details] [diff] [review]
testcase
Attachment #388315 - Flags: review?(bzbarsky)
Comment on attachment 388315 [details] [diff] [review]
testcase

Doesn't this need to unset the pref?  And that 2-second timer doesn't look like a good idea; it'll lead to random orange, won't it?
Attachment #388315 - Flags: review?(bzbarsky) → review-
Flags: wanted1.9.1.x+
Flags: blocking1.9.1.1?
Flags: blocking1.9.1.1-
Attachment #386793 - Flags: approval1.9.1.2?
Created attachment 388476 [details] [diff] [review]
testcase

> And that 2-second timer doesn't look like
> a good idea; it'll lead to random orange, won't it?

IMO it shouldn't but it was there only to speed up the failure so I've removed it. I'm now setting back original detector on success. Can I define some function that would be called in case of timeout so I could set original detector on failure too?
Attachment #388315 - Attachment is obsolete: true
Attachment #388476 - Flags: review?(bzbarsky)
Attachment #388476 - Flags: review?(bzbarsky) → review+
Comment on attachment 388476 [details] [diff] [review]
testcase

This looks fine.  On timeout of the test the test harness will stop running, so no point resetting the detector pref then.

Can you push this, or should I?
I can't, please push it.
Pushed test as http://hg.mozilla.org/mozilla-central/rev/0888a715cb3e
Flags: in-testsuite- → in-testsuite+
Attachment #386793 - Flags: approval1.9.1.2? → approval1.9.1.3?
status1.9.1: --- → wanted
Flags: wanted1.9.1.x+
Comment on attachment 386793 [details] [diff] [review]
Perhaps like so?

Approved for 1.9.1.4, a=dveditz for release-drivers
Attachment #386793 - Flags: approval1.9.1.3? → approval1.9.1.4+
Pushed http://hg.mozilla.org/releases/mozilla-1.9.1/rev/71163cab3efb
status1.9.1: wanted → .4-fixed
Flags: blocking1.9.2? → blocking1.9.2+
Priority: -- → P2
status1.9.2: --- → beta1-fixed
You need to log in before you can comment on or make changes to this bug.