Closed Bug 272815 Opened 20 years ago Closed 8 years ago

On page reloads, MS quotes alternately rendered, butchered

Categories

(Core :: Networking: Cache, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: klbehren, Unassigned)

References

()

Details

Attachments

(1 file)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.7.5) Gecko/20041107 Firefox/1.0

http://evergreenleaders.org/elcc/tiki-newsletters.php?nlId=7&editionId=2

uses a UTF-8 character set and includes Microsoft smart quotes.  These include
fancy left and right slanted quotation marks and stylized apostrophes.  Such
characters are often present in content which originated in MS Word.  Their
character codes are as follows:

apostrophe: 92h, 146 decimal
open quote: 93h, 147 decimal
close quote: 94h, 148 decimal

Upon the initial page load, Firefox fails to render the smart quotes.  Instead
it interprets the UTF-8 code as three separate characters:

apostrophe displays as ’
open quote displays as “
close quote displays as â€

However, a FireFox-induced page reload causes the smart quotes to render legibly.

In subsequent reloads, the page alternates predictably between rendered and
butchered smart quotes.

In all cases, FireFox page source displays all quotes legibly.  A file compare
indicates that reloading does not change any relevant page source content. 

Reproducible: Always
Steps to Reproduce:
1. load URL
2. observe butchered smart quotes
3. click FireFox "Reload" button
4. observe pretty smart quotes
5. click FireFox "Reload" button
6. observe butchered smart quotes
7. click FireFox "Reload" button
8. observe pretty smart quotes
9. repeat as desired

Actual Results:  
Microsoft smart quotes were butchered, then rendered, then butchered, then rendered.

Expected Results:  
Render the Microsoft smart quotes legibly on the first page load and all
subsequent reloads.
Summary: On repeated reloads, MS smart quotes alternate being rendered or butchered → On page reloads, MS quotes alternately rendered, butchered
Assignee: firefox → nobody
Component: General → Layout
Product: Firefox → Core
QA Contact: firefox.general → core.layout
Version: unspecified → Other Branch
The page is seriously broken - it contains two <html> blocks, one of them
containing a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"
/>  tag, and the other a <meta http-equiv=Content-Type content="text/html;
charset=windows-1252">.

Still, Firefox's behaviour seems weird.
(In reply to comment #1)

The outer <html> tag was generated by my CMS.  The inappropriate inner tag is 
from user-stored content (the entire contents of a MS Word html file).

Note also that the URL I included in the description is incomplete.  It should 
read:
http://evergreenleaders.org/elcc/tiki-newsletters.php?nlId=7&editionId=204
Not a layout issue... over to networking for a start, but this could also be parser.

When we get into the parser's first OnDataAvailable, loading from cache, the
amount of data available is 2 or 3 bytes (the amount varies).  So the parser
naturally doesn't find that first meta tag in that 2-3 byte chunk, which ends up
relying on the meta charset observer to set the charset; it's not quite clear to
me why that's screwing up and not seeing the first meta tag.

Oh, and I can't seem to reproduce this in a debug build, of course.... :(

Over to necko for now, I guess -- it'd be nice to get more than 2 bytes out of
the cache.  Blake, do you have time to look at this from the charset observer end?
Assignee: nobody → darin
Status: UNCONFIRMED → NEW
Component: Layout → Networking: Cache
Ever confirmed: true
QA Contact: layout → networking.cache
OS: Windows 2000 → All
Hardware: PC → All
Version: Other Branch → Trunk
Taking. I will have a wallpaper over this (very strange cache behavior) from the
parser's end sometime soon.
Assignee: darin → mrbkap
Target Milestone: --- → mozilla1.8beta2
Status: NEW → ASSIGNED
This is a slightly dirty way to fix this bug and it would be nice to know why
the cache was returning two bytes in the first chunk. Basically, on the first
load, we would see the first <meta> tag and process the charset. This meant
that on the first load, we'd always get the right charset. However, on
subsequent loads from the cache, we'd ignore the first <meta> tag, since the
charset was already the same as the one we'd obtained from the cache (and there
was no reason to reload). 
However, this meant that we'd never notice the first charset, so when we came
across the second charset, we'd use that, thinking that our only charset hint
was from the cache. With this patch, we notify the parser that we've seen a
meta tag giving us charset information, so we pass over the second <meta> tag.
Attachment #174645 - Flags: review?(bzbarsky)
Comment on attachment 174645 [details] [diff] [review]
patch v1 [checked in]

r=bzbarsky
Attachment #174645 - Flags: review?(bzbarsky) → review+
Comment on attachment 174645 [details] [diff] [review]
patch v1 [checked in]

Darin, can you sr?
Attachment #174645 - Flags: superreview?(darin)
Comment on attachment 174645 [details] [diff] [review]
patch v1 [checked in]

punting sr request to jst.  i don't know this code.
Attachment #174645 - Flags: superreview?(darin) → superreview?(jst)
If the cache tells you the charset, then it means that you analyzed the content
previously and determined that that is the best charset.  Why do you need to
analyze the stream again when you have a hint from the cache?  It should be the
correct charset if it was the correct charset last time.  If you trust the
charset from the cache, then you don't have to worry about the 2 bytes of data
from the cache.  That 2 bytes could come for any number of reasons.  Maybe we
got a context switch at an inopportune moment while reading from the cache file
(on the background thread).  NOTE: the act of writing to the nsPipe causes an
event to be dispatched to the UI thread, which could trigger a context switch
immediately.  so, this scenario is possible.  we could say that we will delay
first ODA until we have at least 4k of data (or the entire document), but that
shouldn't be necessary for things to function properly.
it is the parser that sets this value on the cache btw.  it owns this entire
process :)
ok, upon further investigation it appears that the HTML for the site is served
with "cache-control: no-store" so i don't believe that our cached copy is ever
used.  it would be a major bug if it were used.
Darin, for the reason that we re-detect the charset (over the cache), see bug
81253 comment 75 and beyond. The short of it is a worry that if a user turns off
charset autodetection and we guess wrong, turning autodetection back on should
detect the charset, even if we have a cached charset. Simon, is this still an issue?

I also set a breakpoint in nsHTMLDocument::TryCacheCharset(), and after the
first load, I invariably see the function returning PR_TRUE (and a real
charset), so it looks like the cache is, in fact, storing the charset despite
the headers. I'm not sure where the rest of the data comes from.
Comment on attachment 174645 [details] [diff] [review]
patch v1 [checked in]

sr=jst
Attachment #174645 - Flags: superreview?(jst) → superreview+
The attachment 174645 [details] [diff] [review] has been checked in. Giving the bug back to Darin, so he
can look at the charset issue.

Darin, Boris was wondering on IRC what it means for the cache entry to exist and
have a charset (and whether cache entries are reused).
Assignee: mrbkap → darin
Status: ASSIGNED → NEW
This bug might give a clue about the cause of bug 278571. I failed to reproduce
it the other day with a debug build running within a gdb session, but I'll take
another look.
Priority: -- → P4
Target Milestone: mozilla1.8beta2 → mozilla1.9alpha
-> default owner
Assignee: darin → nobody
Target Milestone: mozilla1.9alpha → ---
Attachment #174645 - Attachment description: patch v1 → patch v1 [checked in]
this patched is checked in (c14) and other than that seems to have bitrotted..
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: