charset request causes page load twice

VERIFIED FIXED in mozilla0.9.3

Status


P1
normal
VERIFIED FIXED
18 years ago
18 years ago

People

(Reporter: cathleennscp, Assigned: vidur)

Tracking

Trunk
mozilla0.9.3
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: PDT+, patch in progress)

Attachments

(9 attachments)

(Reporter)

Description

18 years ago
We talked about this issue at Monday's performance meeting.  Just want to make 
sure a bug is filed... and that Frank can help investigate.

See also related bug 61363, where the charset's 2nd request is fetched from the 
server rather than from the cache.


Past email thread
=====================

Yung-Fong Tang wrote:

> Someone needs to give me more context so I can help.
>
> Cathleen Wang wrote:
>
>   Frank, can you help investigate?
>
>   ~ Cathleen
>
>   Gagan Saksena wrote:
>
> >   If we are still doing a double load in spite of this, then something in 
> >   layout/intl/parser is broken. Cc'ing some relevant people -- who should
> >   get this bug?
> >
> >   -Gagan
> >
> >   Tom Everingham wrote:
> >
> > >   There don't appear to be any significant speed differences when the 
> > >   default charset is changed to match ibench.  Here are the numbers 
> > >   I'm getting:
> > >
> > >   Char encoding           ISO-8859-1          Windows-1252
> > >                          -------------       ---------------
> > >   All iterations:            342.66                343.57
> > >   1st iteration:              52.95                 49.9
> > >   Subsequent iteration        41.39                 41.95
> > >
> > >  (Avg of 3 runs of html load pages tests, 4 M mem cache, 40 M disk cache,
> > >   450 MHz P3, Win NT)
> > >
> > >  It appears as though ibench is sending out a meta tag setting the charset 
> > >  to windows-1252. ( That was the only charset item I could find.)   Also,
> > >  to be safe I did check to make sure that the preference settings took 
> > >  and that the ACCEPT header actually changed.
> > >
> > >  Tom
> > >
(Reporter)

Updated

18 years ago
Blocks: 71668

Comment 1

18 years ago
See also http://bugscape.mcom.com/show_bug.cgi?id=4540, originally filed in
Bugzilla as bug 64612: "Auto-Detect (All)" causes a duplicate HTTP GET (or POST)
to be sent.

Comment 2

18 years ago
nhotta and shanjian- please help to look at this one.
Assignee: ftang → shanjian

Comment 3

18 years ago
hofmann said we should call Tom Everingham about this.

Comment 4

18 years ago
I set breakpoints at nsHTMLDocument::StartDocumentLoad and nsMetaCharsetObserver::Notify. 
In the ISO-8859-1 case, the browser only stops at StartDocumentLoad once, and the observer 
does not report anything, since the charset is the same as our default. For windows-1252, 
it stops in StartDocumentLoad twice and stops in Notify once. That's what we expect, since 
the default (8859-1) differs from the meta charset (win1252), so the page gets reloaded. 
Whether or not the reload is served from the cache is another story; we have bug 61363 to 
handle that. So from my point of view, everything seems perfectly fine. 
I talked with Tom, and he told me he will post new numbers for the current build. After 
that, let's decide how to handle this bug. 

Updated

18 years ago
QA Contact: andreasb → ylong

Comment 5

18 years ago
I had a little problem with today's build, so I'm using 051604.  I'm not exactly sure 
where this bug shows itself in these numbers, but here they are.

Char encoding            ISO-8859-1          Windows-1252
                        -------------       ---------------
All iterations:            342.14                341.59
1st iteration:              49.76                 49.4
Subsequent iteration        41.77                 41.74

Comment 6

18 years ago
I don't see any problem in the charset-related operations. For ISO-8859-1, no charset 
notification is made. I could not see much difference between 8859-1 and win1252. 
That is to say, the charset request itself did not cause the page to load twice. 

Cathleen, what was your expectation for Tom's data? Were you expecting to see
much less time in subsequent page loads because the cache should be used 
there? If so, please reassign the bug to gordon@netscape.com for consideration.
Assignee: shanjian → cathleen
Summary: charset request causes page load twice → cache does not make much difference in subsequent load???
(Reporter)

Comment 7

18 years ago
There are 2 problems we see when running ibench tests:

1) charset causing the page to be fetched twice, which is what we're trying to address
   in this bug (changing the summary back)
2) the charset's 2nd fetch being loaded from the server rather than from the cache,
   which is a known bug with the cache, bug 61363 (no need to address it here)

Now, back to issue #1: tever has modified the ibench tests so that the default 
charset matches the ibench pages, and page load times are still the same.  So, the 
purpose of this bug is to investigate and find out either that a) charset is really not 
causing pages to be fetched twice, or b) charset is causing pages to be fetched 
twice, with the ibench test.

tever's test results didn't show any difference in performance numbers between 
the two charset configurations.  Under our assumption that b) is what's going on 
with the ibench test, we were hoping that if a) were true, overall page load time 
with ibench would improve some.

Shanjian or ftang, did you guys try running ibench tests to verify?
Summary: cache does not make much difference in subsequent load??? → charset request causes page load twice

Comment 8

18 years ago
Maybe we should alias windows-1252 to ISO-8859-1. Currently, we treat them as 
two separate charsets, since windows-1252 has defined characters in the 0x80-0x9F 
range but ISO-8859-1 doesn't. This could be causing the issue. 

Comment 9

18 years ago
I did not try ibench; I set breakpoints inside the program to see what is really happening. 
I found:
a) If the default charset is the same as the page charset, meta charset / auto-detection 
   does not cause the document to be fetched twice. (I revisited bug 4540; the log on the 
   HTTP server does not show the page being fetched twice.)
b) If the default charset does not match the page charset, no matter whether the page charset 
   is provided through a meta tag or detected by auto-detection, the page will be reloaded. 
   Whether this reload is served from the cache is the concern of bug 61363.

Since this bug should address case a, I would suggest marking it as WORKSFORME. There are 
many possible interpretations of the ibench data: 
1) In my case b, in most cases the meta charset can be found in the first block of the HTTP 
   response. The charset notification is sent immediately, and the first load is stopped then. 
   So this "twice" load might take only a little extra time over a single load. 
2) The cache works on the reload, and the already-transferred data is not loaded again.
3) Layout and rendering take much longer to finish than the network loading. 




(Reporter)

Comment 10

18 years ago
Assigning to gagan.

gagan is going to see if they can narrow this down to a small test case/page, with 
the necko buffer set to a smaller size, back to how it was a few weeks ago.
Assignee: cathleen → gagan
(Reporter)

Updated

18 years ago
Blocks: 82244
(Reporter)

Comment 11

18 years ago
ftang and shanjian,

we're finding the charset reload problem exposed again; see bug 82244.  Can you
guys see what you can do?
Assignee: gagan → ftang

Comment 12

18 years ago
shanjian- please take a look at this again. 
Assignee: ftang → shanjian

Comment 13

18 years ago
I am still not convinced that the page is UNNECESSARILY loaded twice because of meta-charset, 
but I can't say that is not the case either. It might be a little difficult for me to handle 
this problem without a specific page. Could you let me know which specific page has this
problem, and I will track down what causes it. 

FYI, in case you want to confirm this yourself, you can set a breakpoint at 
http://lxr.mozilla.org/seamonkey/source/intl/chardet/src/nsMetaCharsetObserver.cpp
around line 132, which is the observer's notify function. This notification will 
cause the page to be reloaded. If we send this notification with no good reason, we have a problem.

Comment 14

18 years ago
Some thoughts regarding darin's comment in bug 82244:
when the meta charset observer or the charset detector requires a page reload with a new 
charset, the reload should be served from the cache instead of being a real reload. We 
can certainly add a flag in nsWebShell.cpp::ReloadDocument if the new cache requires a 
certain flag to be set. But the reload must happen, one way or another, whenever a new 
charset is detected. 

Comment 15

18 years ago
Now we all agree the page needs to be reloaded. I am closing this bug as WFM. The performance 
problem should be addressed in bug 82244.
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → WORKSFORME

Comment 16

18 years ago
Verified.
Status: RESOLVED → VERIFIED

Comment 17

18 years ago
On WinNT, I tried setting my default charset to windows-1252 and saw a huge
improvement on initial page loads running ibench.  Here are the numbers:

default optimized build:         44.06
w/ charset set to windows-1252:  31.88

That's a huge difference.  So, IMO we need to prevent the double load or make
it more efficient.  I think we can prevent it in 99% of the cases.  If the
parser receives the <meta> tag in the first OnDataAvailable, then it should be
able to make the charset switch without requiring a full reload.  Performing
a double load is always going to suck BIG TIME... even if we are getting it from
the cache.  I think we should try to optimize for the case where the <meta> tag
shows up in the first OnDataAvailable, and default to the current behavior
otherwise.

Does this sound feasible?
Status: VERIFIED → REOPENED
Resolution: WORKSFORME → ---

Comment 18

18 years ago
I think that is doable, but it's a lot of work. 
Since most web pages are still Latin-1 based, we probably should treat 
Latin-1 and cp1252 specially. In the meta charset detector, we do not discriminate 
between Latin-1 and cp1252. We have two options to implement this:

Option 1: treat Latin-1 as an alias of cp1252, since Latin-1 is a subset of cp1252. 
The system default needs to be changed to cp1252.

Option 2: we keep the system default as Latin-1, but handle Latin-1 as cp1252.

I would prefer option 2. Frank, what do you think? 
Status: REOPENED → ASSIGNED

Comment 19

18 years ago
OK, cutting and pasting some email discussion here:

ftang:
I personally do not have a strong objection to making such a change. It is a 
simple change in 
xpfe/browser/resources/locale/en-US/navigator.properties 
from 
intl.charset.default=ISO-8859-1 
to 
intl.charset.default=windows-1252 

However, this setting is cross-platform. I am not sure how people will react if 
we ship with this setting. I am afraid it will make some people mad. 

We can also choose to apply this only to Netscape if we want to - see the file 
/ns/xpfe/browser/resources/locale/en-US/navigator.properties 

bobj/momoi/i18ngrp - 
need your help. 

Shanjian Li wrote: 

  IE is using Latin-1 (i.e., ISO-8859-1). But IE does not discriminate between 
  Latin-1 and cp1252 as we do. We might want to do something similar to what I 
  proposed in bug 81253 to achieve the same result. If we want this to happen 
  quickly, we need to get Frank involved. 

  shanjian 

  Gagan Saksena wrote: 

    Frank/Shanjian: 

    Is there a reason why we don't set our default charset to the same as IE's? 
    IE seems to be using Unicode (Western European), whatever that maps to (is 
    that windows-1252?), and we are using Western (ISO-8859-1). 

    With this change, as Darin demonstrated (see bug 81253), we could make a 
    significant improvement on ibench page loading. Note that I am not proposing 
    this as a way to fix bug 81253, but only a way to minimize its occurrence. 
    We still have to fix it to get the best possible results. 

    -Gagan

Comment 20

18 years ago
Do we really do 2 full loads?  I thought the original design was for the
observer to notify as soon as the parser parsed a META charset, then the current
load would be aborted, and then the page reloaded.  The META tag should 
hopefully be in the first or second block from the network, so the abort
would happen before much of the load had occurred.  Is this not how it works?

Anyway, changing the default to cp1252 (for English releases) should be OK. 
It's a bit counter to promoting standards (ISO vs. MS), but in most cases
users should see no difference.  The loading-performance difference is probably
not that perceptible when loading individual pages.  Ibench and tests like
it are a bit artificial in that they load pages in rapid succession.

But just switching the default may mean we perform poorly when a different
set of data (mostly ISO-8859-1 pages) is used.

Can we add a special optimization in the MetaCharsetObserver to check whether the 
default and meta charsets are cp1252 and iso-8859-1, and if so NOT reload?
We probably still need to update the charset for the docshell for other 
reasons, but without a reload.  Can we switch the Unicode converter in
mid-load?  A potential problem when the default is iso-8859-1 is that
characters in the cp1252 superset (e.g., smart quotes) may have already
been misconverted into Unicode.
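
For illustration, here is a minimal standalone C++ sketch of the check described above: skip the reload when the default and META charsets only differ by the ISO-8859-1 / windows-1252 pairing. The names are assumptions for illustration, not the actual nsMetaCharsetObserver code.

  #include <algorithm>
  #include <cctype>
  #include <string>

  // Case-insensitive comparison of charset names.
  static bool EqualsIgnoreCase(std::string a, std::string b) {
    auto lower = [](std::string s) {
      std::transform(s.begin(), s.end(), s.begin(),
                     [](unsigned char c) { return std::tolower(c); });
      return s;
    };
    return lower(std::move(a)) == lower(std::move(b));
  }

  // True when switching from the default charset to the META charset does not
  // require a reload, because windows-1252 is a superset of ISO-8859-1 (the two
  // only differ in the 0x80-0x9F range).
  bool CanSkipCharsetReload(const std::string& defaultCharset,
                            const std::string& metaCharset) {
    auto isLatin1Family = [](const std::string& cs) {
      return EqualsIgnoreCase(cs, "ISO-8859-1") ||
             EqualsIgnoreCase(cs, "windows-1252");
    };
    return EqualsIgnoreCase(defaultCharset, metaCharset) ||
           (isLatin1Family(defaultCharset) && isLatin1Family(metaCharset));
  }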

Comment 21

18 years ago
Yes, we really do two full loads. The pages are short.

harishd, how hard would it be to add charset detection along with the DOCTYPE 
sniffing code?

Comment 22

18 years ago
How about treating windows-1252 as ISO-8859-1? That seems to be what we did in 4.x.
Here is the patch. Don't land it until we fully agree on it, OK?

Comment 23

18 years ago
Created attachment 37925 [details] [diff] [review]
treat windows-1252 as ISO-8859-1

Comment 24

18 years ago
Again, try the previous patch but don't check it in. We need more analysis of 
the impact of that change. 

Comment 25

18 years ago
put into moz0.9.3 
Target Milestone: --- → mozilla0.9.3

Comment 26

18 years ago
With this patch I see a 28% initial page load improvement on dogspit/ibench.

Comment 27

18 years ago
bobj: there are a number of problems with the reload (even if the first load
hasn't fully completed):

1) we drop a potential keep-alive connection
2) we waste cycles writing data to the disk cache only to delete it almost
   immediately.
3) we have the added expense (I would think) of tearing down and recreating
   docshell-related stuff in order to process the reload.

Comment 28

18 years ago
Here is the plan: check in the last patch, so we treat windows-1252 and ISO-8859-1 as the same.
We may want to consider the following:
1. Change the ISO-8859-1 Unicode encoder so we can convert to the 0x80-0x9F part 
of windows-1252, but if we do so, we also need to
2. make sure the Unicode encoder used for X window fonts does not include that part; 
we may need to zap out anything > 0xFF or create a new Unicode encoder for it.
3. How about mail/news?
Assignee: shanjian → ftang
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.3 → mozilla0.9.2

Updated

18 years ago
Status: NEW → ASSIGNED

Comment 29

18 years ago
We should check in the current patch as-is for now to get QA feedback.

Comment 30

18 years ago
ftang: why don't we see what harishd has to say about meta-charset sniffing? I 
talked with vidur about this today, and he says it should be a pretty 
straightforward thing to do.

Comment 31

18 years ago
PDT+ based on the 6/11 PDT meeting.

Updated

18 years ago
Whiteboard: [PDT+]

Comment 32

18 years ago
waterson:
> why don't we see what harishd has to say about meta-charset sniffing? 
We could, but do you want to wait?

Comment 33

18 years ago
The other way we can go is to treat ISO-8859-1 as windows-1252; then 
we don't need to worry about form posting. 

Comment 34

18 years ago
Created attachment 38001 [details] [diff] [review]
treat ISO-8859-1 as windows-1252

Comment 35

18 years ago
You should only apply ONE of the above patches; never apply both to your tree.

Darin, can you undo the first one and try the second one? I think it should give 
you the same performance result. 

Comment 36

18 years ago
Marking it as P1.
Priority: -- → P1

Comment 37

18 years ago
ftang, what do you expect the side-effects of this change to be? Also cc'ing 
momoi, who seemed to think this might not be a good idea...

Comment 38

18 years ago
Frank and I chatted about this in his cube.
Both his patches generally do the same thing which is to alias cp1252 and
iso-8859-1 as the same.  The difference is which is the main and which is
the alias charset.

The big win for Frank's approach is that it is merely a mapping change and
the risk of introducing new bugs is extremely low.  So this does not
preclude doing a better fix later.

There are undesirable side effects caused by converting FROM Unicode INTO the
"main" charset for codepoints outside (superset) of the iso-8859-1 set.  Areas
affected would be forms submission, composer, and maybe save-as.

I've got to do something else right now; Frank or I will document these side
effects later today.

Comment 39

18 years ago
> ftang, what do you expect the side-effects of this change to be? 
> Also cc'ing momoi, who seemed to think this might not be a good idea...

I don't have an objection to the approach taken in the proposed 
patches by ftang. In mail, I did voice an objection against re-setting
the default encoding to Windows-1252 just for the Windows platform. That
idea was floated in some of the comments above. I don't think we
should do that; we should stick with ISO-8859-1 as the default UI
setting. What we do internally might be somewhat different.

(Assignee)

Comment 40

18 years ago
I've started working on the meta tag sniffing solution in the parser. ETA:
Thursday, 6/14.

Comment 41

18 years ago
> ftang, what do you expect the side-effects of this change to be? Also cc'ing 
> momoi, who seemed to think this might not be a good idea...
momoi's objection in mail is against a different proposal - changing the default 
setting.
Neither of my patches here changes the default setting; instead they treat both 
charset names as having the same meaning, which is not what momoi objects to.
I personally prefer the latter patch, since then we don't need to worry about 
form submission. I would like to ask nhotta to test mail with the latter patch and 
make sure we haven't broken anything in mail before we make a decision.

Comment 42

18 years ago
I applied the patch on 06/11/01 18:27 and found a couple of problems.

* The charset menu shows "windows-1252" for pages whose actual charset is
"ISO-8859-1". This is also true for mail (both view and compose).
* There is no menu check mark in the above cases, possibly because the actual charset
and the menu item do not match.

(Assignee)

Comment 43

18 years ago
Created attachment 38319 [details] [diff] [review]
charset sniffing in parser patch v1.0
(Assignee)

Comment 44

18 years ago
Created attachment 38337 [details] [diff] [review]
patch v1.1 based on jrgm and harishd suggestions

Comment 45

18 years ago
Additional info for my last comment (2001-06-13 14:35).
* The charset check mark problem is in the browser and mail compose; mail view
does not have the problem.
* Mail is sent out as ISO-8859-1, but if you select windows-1252 from the menu,
then it is sent out as windows-1252.
* Some code does special handling for ISO-8859-1; that may break if we change to
windows-1252, e.g. aCharset.EqualsIgnoreCase("ISO-8859-1"):
http://lxr.mozilla.org/seamonkey/search?string=ISO-8859-1
(Assignee)

Comment 46

18 years ago
The parser sniffing code posted above looks in up to the first 2k (an arbitrary
number that can be changed) of the first buffer delivered by Necko for a META
tag, prior to the initial Unicode conversion done by the parser. If it finds a
META tag with charset information, the charset and its origin are recorded and
used by the parser. The nsMetaCharsetObserver code will still be triggered, but
will not do a reload. If a META tag with charset information exists but is not
in the first buffer delivered or in the first 2k of the file, we fall back
to the existing reloading scheme.
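
As a rough, standalone illustration of that approach (not the actual nsParser code; a real implementation also has to verify that the charset attribute sits inside a <meta> tag and handle the http-equiv form), a 2 KB-capped scan of the first buffer might look like this:

  #include <algorithm>
  #include <cctype>
  #include <cstddef>
  #include <cstring>
  #include <string>

  // Scan at most the first `limit` bytes of the first buffer delivered by the
  // network for a "charset=XXX" declaration; return the charset name, or an
  // empty string if none is found within the window.
  std::string SniffMetaCharset(const char* buf, std::size_t len,
                               std::size_t limit = 2048) {
    std::string data(buf, std::min(len, limit));

    // Search a lower-cased copy; extract the value from the original bytes.
    std::string lower(data);
    for (char& c : lower) c = std::tolower(static_cast<unsigned char>(c));

    std::size_t pos = lower.find("charset=");
    if (pos == std::string::npos) return "";
    pos += std::strlen("charset=");
    if (pos < data.size() && (data[pos] == '"' || data[pos] == '\'')) ++pos;

    std::size_t end = pos;
    while (end < data.size() &&
           (std::isalnum(static_cast<unsigned char>(data[end])) ||
            data[end] == '-' || data[end] == '_' || data[end] == '.')) {
      ++end;
    }
    return data.substr(pos, end - pos);
  }

If the scan comes back empty (META tag not in the first buffer, or beyond the cap), the caller would fall back to the observer-driven reload path described above.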

Comment 47

18 years ago
Vidur, I tried your patch on an optimized build against iBench. It
_significantly_ cuts down on first-page load time, but it appears that it also
negatively impacts cached page load time. I'm not sure why. Here are the numbers:

                                   Overall   First  Cached
  2001-06-14 (post darin's fixes)   147.65   29.93   16.82
  2001-06-14 with patch             140.29   18.98   17.33

As you can see, first page dropped almost by a factor of two, but cached load
increased by a bit (~3%). I was able to reproduce these results pretty
consistently. (I did a second set of runs, and the overall load time fluctuated
by about a second, run-to-run).

I guess it'd be interesting to see if this had the same sort of effect on jrgm's
tests (which don't have <META> tags). Vidur, do you think there'd be any way to
consolidate this stuff with the DOCTYPE sniffing? (As an aside, just spoke with
dbaron; since we don't use the strict DTD anymore, perhaps we don't need to look
at the DOCTYPE at _all_ up front, and can collect the information as part of the
normal parsing process?)

Comment 48

18 years ago
I should also add, r=waterson on your patch, if you need it. (And we check
jrgm's stuff to see if it has any effects there...)
(Assignee)

Comment 49

18 years ago
jrgm is getting the same results on his run of the ibench test, i.e. a slight 
increase in cached load times. This is extremely curious, since the 
sniffing code isn't (well... shouldn't be) invoked on a cached load. jrgm says 
that he'd seen similar results when he manually changed our version of the 
ibench pages to not have a META charset. So it sounds like the increase is 
unrelated to the specifics of my patch.

I believe dbaron's right about not needing to do upfront DOCTYPE sniffing 
anymore. We still need DOCTYPE information to determine whether to do strict or 
quirks layout, but that can be determined later in the parsing/content creation 
process. Harish should probably look at this.

Comment 50

18 years ago
chris: did you clear out your cache for these numbers? The cached version and 
the first one are fairly close to each other, and I wonder if that was because we 
were still loading the objects from the cache. 

Comment 51

18 years ago
Yes, I cleared my cache. (I was so surprised that I ran it twice, just to be 
sure -- clearing the cache before each run.)

Comment 52

18 years ago
vidur: I think we still make decisions based on DOCTYPE at the tokenizer
level, e.g. when an XHTML document is served as text/html. Though I have to double-check. 

Comment 53

18 years ago
Reassigning this to vidur, because it now seems we want to fix this problem using 
the parser charset sniffing code vidur/harishd put together. 
Please make sure not to check in 
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=37925
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=38001

because those two patches may cause other undesirable side effects.
Assignee: ftang → vidur
Status: ASSIGNED → NEW

Comment 54

18 years ago
vidur: why the 2k limit?  why not scan everything in the first ODA?
(Assignee)

Comment 55

18 years ago
Darin: the 2k limit was arbitrary and seemed like a reasonable boundary within
which one would expect a META tag. Is there an upper bound on the size of the
first buffer? Since the sniffing is not particularly processor intensive, maybe
the question itself is moot.

Comment 56

18 years ago
I ran an opt. mozilla build, with and without vidur's patch, on 3
machines (source pulled 06/18 evening, except for the 266 machine,
which was done with 06/13 source).

I tested against my page load test, which (since the original content
didn't have any) doesn't have any 'meta charset' tags that aren't
iso-8859-1. [Also note that in my test, the top level document is
never cached]. The results for those tests showed no difference in
page load times, which is the ~expected result.

I also ran against the ibench content, configured on an internal
server, and got these times.


win98 800MHz/128MB
                            #1        #2
Without patch. 
  All iterations          132.26    131.66     
  First iteration          26.14     25.98      
  Subsequent iteration     15.16     15.10   

With patch:                                   % Change
  All iterations          129.41    128.91      -2.1%   
  First iteration          18.29     18.13     -30.1%   
  Subsequent iteration     15.87     15.83       4.8%


win2k 500MHz/128MB
                            #1        #2
Without patch. 
  All iterations          271.03     270.72       
  First iteration          34.59      34.27        
  Subsequent iteration     33.78      33.78     
			                    
With patch:                                    % Change
  All iterations          247.91     248.42       9.2%
  First iteration          48.16      48.47     -28.7%
  Subsequent iteration     28.54      28.57      18.3%


win98 266MHz/128MB
                            #1              
Without patch.                              
  All iterations          568.12                           
  First iteration          94.50                            
  Subsequent iteration     61.43                         
                                                        
With patch:                                    % Change
  All iterations          490.81                -13.6%   
  First iteration          67.66                -28.4%   
  Subsequent iteration     61.34                  0.0%


So, by sniffing and avoiding the need to reload, with the negative
consequences that reloading has, particularly on image loads, the 'First
Iteration' times drop dramatically on all machines/speeds. But, 
at least on the 500 and 800 systems (and in waterson's earlier note),
this raises the times when pulling out of the cache. 

I don't really understand why this is happening, but it is, at least
as measured. (I wasn't able to determine if this was some odd artifact
of the way ibench does the test, but this is the real amount of time
that it is taking to complete the sequence of steps [i.e., there is no
time miscount for the given task]. I still think there is something else
odd, since, if I immediately rerun the test without shutting down, my 
cached times drop from the initial results.)

One thing I confirmed (with good ol' printf) was that we are calling
nsParser::DetectMetaTag even when pulling the document from the
cache. So it looks like we need to cut that out (since it is costing us).
(Assignee)

Comment 57

18 years ago
John's right - we *are* calling nsParser::DetectMetaTag (and the original
intl-based meta tag observer) during subsequent loads. For some reason, the
charset retrieved from the cache is considered to be of a lower weighting than
one determined from a meta tag (see the charset source enum in nsIParser.h).
Switching the weightings may make sense, but further investigation shows that
the charset retrieved from the cache may not be correct in the first place.

I'll post again when I have more details.
(Assignee)

Comment 58

18 years ago
After talking with ftang, I've made the following changes:

1) The original patch that I posted just recorded the sniffed charset in the
parser. This charset wasn't percolated up as it should have been. We now notify
the document and its charset observers of the detected charset.
2) The original code that set the charset in the cache relied on a reload
occurring. We now set the cache meta data directly in the parser.
3) The weighting of a charset recorded in the cache meta data was lower than that
of the sniffed charset. This is no longer the case - we use the cached value rather
than resniffing.

I'll post the new patch momentarily. Unfortunately, none of this explains the
performance results... unless rerunning the sniffing code on a subsequent load
somehow explains the performance degradation.
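
For readers following along, here is a small standalone sketch of the source-weighting idea behind change 3. The enum below is an illustrative assumption: the real charset source constants live in nsIParser.h, and their names and ordering differ; the whole point of the change (and of the later debate) is where kCharsetFromCache sits relative to the other sources.

  #include <string>

  // Illustrative ordering only; higher values win. In this (post-patch)
  // ordering, the cached charset outranks META sniffing and autodetection,
  // but a user-forced charset still outranks the cache.
  enum CharsetSource {
    kCharsetUninitialized = 0,
    kCharsetFromDocumentDefault,
    kCharsetFromAutoDetection,
    kCharsetFromMetaTag,
    kCharsetFromCache,
    kCharsetFromUserForced
  };

  // Adopt a newly determined charset only if its source outranks the one we
  // already have. With kCharsetFromCache ranked above kCharsetFromMetaTag,
  // a cached load keeps the stored charset instead of resniffing the META tag.
  bool MaybeSetCharset(std::string& charset, CharsetSource& source,
                       const std::string& newCharset, CharsetSource newSource) {
    if (newSource <= source) return false;  // equal or lower priority: ignore
    charset = newCharset;
    source = newSource;
    return true;
  }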
(Assignee)

Comment 59

18 years ago
Created attachment 39384 [details] [diff] [review]
patch v1.2 after a conversation with ftang
(Assignee)

Comment 60

18 years ago
I've found a problem that *could* account for some part of the performance
degradation we're seeing in the cached case. I wasn't normalizing the name of
the charset sniffed from the META tag. As a result, there's a chance that we
might have been going down a different intl charset conversion path than we did
before. A patch to fix this problem can be found below.
(Assignee)

Comment 61

18 years ago
Created attachment 39441 [details] [diff] [review]
patch v1.3 with charset name normalization

Updated

18 years ago
Whiteboard: [PDT+] → [PDT+] patch available

Comment 62

18 years ago
(vidur: I did a little clobbering in htmlparser and intl, and my earlier 
problem with loading went away. Who knows.)

At any rate, I've tried patch 'v1.3' and I am still getting elevated times
for the "subsequent" page loads.

Comment 63

18 years ago
*** Bug 86289 has been marked as a duplicate of this bug. ***

Updated

18 years ago
Whiteboard: [PDT+] patch available → [PDT+] patch in progress

Comment 64

18 years ago
Okay, I tried out a couple of builds from vidur, one being trunk, and the 
other with patch v1.3 above. I ran them on four machines: a lower-end Win98,
a midrange win2k, a midrange win98 and a higher-end win98.

                                          Delta
win98/266MHz/128MB                      
   All iterations          509.2  484.0   -5.0%
   First iteration          90.1   65.9  -26.9%
   Subsequent iteration     59.8   59.7   -0.3%
                        	       
win2k/500MHz/128MB		       
   All iterations          244.3  266.8    9.2%
   First iteration          46.8   33.5  -28.4%
   Subsequent iteration     28.2   33.3   18.1%
                        	       
win98/500MHz/128MB                     
   All iterations          232.0  235.8    1.7%
   First iteration          46.1   32.1  -30.3%
   Subsequent iteration     26.5   29.1    9.6%
                        	       
win98/800MHz/128MB                     
   All iterations          133.2  129.1   -3.1%
   First iteration          27.8   18.2  -34.5%
   Subsequent iteration     15.0   15.8    5.2%

...

Comment 65

18 years ago
(Just for the record, I mentioned this to vidur yesterday...) Are the cache 
entries (on disk) identical both with and without the change? Could we be 
failing to properly initialize the cache entry?
(Assignee)

Comment 66

18 years ago
Yeah, I started looking at this yesterday. The cache meta data seems to be the
same - at least for the first several sites in the test. I'll do some more
comparisons and post if I see anything different.

Comment 67

18 years ago
Converting from PDT+ to nsBranch.  Let's see this on the trunk for a little
while before we bring it to the branch.  I appreciate the effort that's gone into
this; it's obviously a very hard problem.
Keywords: nsBranch
Whiteboard: [PDT+] patch in progress → patch in progress
(Assignee)

Comment 68

18 years ago
Moving to 0.9.3 since we haven't yet resolved the cache loading issue.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
(Assignee)

Comment 69

18 years ago
I've spent some time discussing this with Darin and Frank and we're all stumped.
Other findings include:
1) The cache entries are essentially identical for runs with and without the
patch. There's a slight difference in ordering of cache meta data, but the meta
data itself and the set of files in the cache are identical.
2) I primed the cache with an ibench run on a build without the patch and then
ran the tests again with both builds. I still see the 4-5% difference for cached
loads on my 700MHz Windows 2k system. This, along with the previous finding,
absolves the cache entries (and the process of writing them out) from any direct
cause.
3) I ran a TCP sniffer to confirm that we aren't going back to the network for
specific files with a patched build. For both builds (with and without the
patch) we get no cache misses.
4) I stepped through new and existing code with both Frank and Darin. Darin and
I convinced ourselves that the code path for cached reads isn't substantially
different with the patch. In fact, given the numbers that John has come up with,
it doesn't sound like the problem can be correlated with additional processing (this
is substantiated by Quantify runs I did that showed no substantial differences),
but possibly with additional I/O. We have yet to understand what might cause us
to spend additional time doing I/O.

At this point, my inclination is to check the patch into the trunk and, pending
a bit of testing on the trunk, get it onto the Netscape post-0.9.2 branch. The
initial load time improvement for files with a META tag is significant and the
cached load problems don't seem to manifest themselves in jrgm's regular load
time tests (or so I've been told).

Can I get r's and sr's? 

Comment 70

18 years ago
r=harishd

Comment 71

18 years ago
sr=waterson. I spent some time looking at before and after jprof profiles, but
nothing really sticks out. I'll attach them here for posterity's sake.

Comment 72

18 years ago
Created attachment 40350 [details]
profile of iBench run before applying patch (.html.gz)

Comment 73

18 years ago
Created attachment 40352 [details]
profile after applying the patch
(Reporter)

Comment 74

18 years ago
vidur, are you landing this to trunk soon?

Comment 75

18 years ago
In patch v1.3, the position of kCharsetFromCache has been changed. I am not 
sure this is really necessary; I read through the bug, and it does not seem 
to be. This change will cause an undesired side effect. Suppose a web page 
has no charset meta tag, the user has not chosen any charset detector, the 
default charset is incorrect, and the page is displayed incorrectly. 
Now the user switches on a charset detector, but no change will be made on the 
subsequent load, since the charset from the cache takes precedence over 
autodetection. This is really confusing to users. I would suggest that we take 
out that change before checking this in to CVS. 
(Assignee)

Comment 76

18 years ago
The patch went into the trunk on 6/29.

The movement of kCharsetFromCache was required to prevent resniffing of the META
tag on cached loads. It's debatable whether the charset sniffing itself is
significant enough in the scheme of things to make a difference...but it seemed
like a good thing to eliminate, especially in light of the performance we're
getting for cached loads.

While the case that Shanjian mentions will result in unexpected behavior,
raising the priority of the charset in the cache helps other cases. For example,
it preserves any charset that's forced by the user through the charset menu.
Prior to the change, the forced charset was trumped by (possibly incorrect)
autodetection.
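
To make the trade-off concrete, here is a tiny self-contained sketch (hypothetical weights, with the same caveat as the earlier sketch about the real nsIParser.h ordering) of the scenario under debate: once the cached source outranks autodetection, a newly enabled detector loses on the cached reload, while a user-forced menu choice still wins.

  #include <cstdio>

  int main() {
    // Hypothetical weights matching the patched ordering discussed above.
    const int kFromAutoDetection = 2, kFromCache = 3, kFromUserForced = 4;

    int docSource = kFromCache;  // charset restored from the cache meta data

    // User turns on a charset detector and reloads: the detector's result loses.
    bool detectorWins = kFromAutoDetection > docSource;
    // User forces a charset from the menu instead: that still wins.
    bool menuWins = kFromUserForced > docSource;

    std::printf("detector overrides cached charset: %s\n", detectorWins ? "yes" : "no");
    std::printf("menu choice overrides cached charset: %s\n", menuWins ? "yes" : "no");
    return 0;
  }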

Comment 77

18 years ago
I disagree with you about the cache change. Yes, it improves some other 
cases; because of that, I once tried to make the same change myself. But soon 
afterwards I found the gain is not worth the loss, and I had to back out my change.
Please reconsider the scenario I raised; it is a very common one. Users only 
try to change the charset/detector after they see a page displayed incorrectly. 
If that does not work, the whole charset menu is not very useful. In my opinion, 
that is unacceptable. 

Comment 78

18 years ago
Adding the vtrunk keyword to indicate the bug has been fixed on the trunk.  The bug is left
open to track the checkin to the branch (nsbranch) when appropriate.

Once the bug has been fixed on the branch as well, please remove the vtrunk keyword.
Keywords: vtrunk

Comment 79

18 years ago
The change in this area really worries me now. This is not
just an engineering issue. It is a usability issue. I would like
to know what menu action changes will be affected by the proposed fix. 

> While the case that Shanjian mentions will result in unexpected behavior,
> raising the priority of the charset in the cache helps other cases.

What does this mean? Does it mean that the cached charset is not 
overridable? We need more explanation of how the proposed fix will 
change the known behavior. Without additional explanation and consideration
of typical scenarios, it would be very difficult to evaluate the real 
worth of the fix proposed here.
I propose a meeting to discuss the issues involved.
(Assignee)

Comment 80

18 years ago
The cached charset is overridable by a charset forced by the user. To be frank,
I'm not sure what the correlation is between the different charset-related menu
items and the charset source enumeration. 

I'm willing to attend a meeting set up by the i18n group.

Comment 81

18 years ago
Yes, even after your change, a user-forced charset will override the cached charset 
value. But a newly selected autodetection result will not be able to override the 
cached charset; that's the problem. 

Comment 82

18 years ago
> Yes, even after your change, a user-forced charset will override 
> the cached charset value. But a newly selected autodetection result 
> will not be able to override the cached charset; that's the problem. 

I experimented with the trunk build with the current
patch in it, and I find that auto-detect cannot
override the cached value.
We have had a spec on charset menu actions out there since
before NS 6.0/Mozilla M18. The spec calls for both a
user-initiated charset menu change and an auto-detect menu change to
override the HTTP, document-based META charset, or
cached charset value. 

The fact that auto-detection cannot override the cached value as
it stands in the trunk build is not acceptable. We have a consistent
meaning for user-initiated menu actions, i.e. give them the 
priority they deserve. Not only is this the correct thing to
do, it is also what international users deserve. Instead of forcing
the user to try out 3 or more menu items to display an incorrectly
loaded page, auto-detect override lets you do this in one move.
IE of course provides this, and so have we until
this patch.
I request that we bring back the ability for auto-detect to
override the cached value. What we have currently is simply
a regression against the spec we agreed on; you should not
change the user spec.
(Assignee)

Comment 83

18 years ago
Also, refer to the patch in bug 88746 (a spin-off from this bug) that fixes
a problem in the DetectMetaCharset() method.

Comment 84

18 years ago
We found several problems with the current landing on the trunk. Please do not 
land it on the branch until we address all of them.
I think we should at least do the following on the trunk:
1. Back out the nsIParser.h change. Don't change the charset source priority; it 
is too risky, and while we have both meta charset and auto-detect, it may cause 
some problems.
2. Add "ignore UTF-16, etc." code to the parser so we just ignore those charsets, 
like what we did in nsMetaCharsetObserver. 

Updated

18 years ago
Blocks: 88746
(Assignee)

Comment 85

18 years ago
Fixes in bugs 89155, 89177 and 89169 are related to getting this patch onto the
branch. The last involves reverting the weighting of the charset out of the
cache back to its original value. A fallout of this will be resniffing of the
META tag even for cached loads. Based on testing we did when we were looking at
the cached loading performance issues a while ago, I believe there won't be
a measurable performance impact from the resniffing.

Comment 86

18 years ago
Please also refer to the patch in bug 89732.
Blocks: 89732

Comment 87

18 years ago
Putting PDT+ back; it looks like we have something in hand now.
Whiteboard: patch in progress → PDT+, patch in progress

Updated

18 years ago
Depends on: 90479

Comment 88

18 years ago
Any update on this bug?  Is it ready to go in yet?

Comment 89

18 years ago
It's almost ready to go. I'm incorporating the patches into my branch build.
Once it's done ( and compiled ) I'll be ready to land. However, I will not be
able to land the changes today.

Comment 90

18 years ago
Created attachment 42182 [details] [diff] [review]
patch v1.4 [ patches integrated ]

Comment 91

18 years ago
Oops, the previous patch was from me not pollmann!!!

Comment 92

18 years ago
fix landed on the branch.

Comment 93

18 years ago
Now that the patch is on the branch and the trunk, marking bug fixed.
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Updated

18 years ago
Keywords: vtrunk

Comment 94

18 years ago
Verified the fix together with Frank on the 07-17 Win32 branch build.
Status: RESOLVED → VERIFIED