Closed Bug 81253
Opened 24 years ago, closed 23 years ago
charset request cause page load twice
Categories: Core :: Internationalization, defect, P1
Status: VERIFIED
Resolution: FIXED
Target Milestone: mozilla0.9.3
People
(Reporter: cathleennscp, Assigned: vidur)
Details
Whiteboard: PDT+, patch in progress
Attachments
(9 files)
762 bytes, patch
1.22 KB, patch
10.49 KB, patch
10.52 KB, patch
21.89 KB, patch
22.17 KB, patch
287.50 KB, application/octet-stream
292.41 KB, application/octet-stream
21.69 KB, patch
We talked about this issue at Monday's performance meeting. Just want to make
sure a bug is filed... and that Frank can help investigate.
See also the related bug 61363, where the charset's 2nd request is being fetched
from the server, not from the cache.
Past email thread
=====================
Yung-Fong Tang wrote:
> someone needs to give me more context so I can help
>
> Cathleen Wang wrote:
>
> Frank, can you help investigate?
>
> ~ Cathleen
>
> Gagan Saksena wrote:
>
> > If we are still doing a double load in spite of this then something in
> > layout/intl/parser is broken. Cc'ing some relevant people -- Who should
> > get this bug?
> >
> > -Gagan
> >
> > Tom Everingham wrote:
> >
> > > there doesn't appear to be any significant speed difference when the
> > > default charset is changed to match ibench. Here are the numbers
> > > I'm getting ->
> > >
> > > Char encoding ISO-8859-1 Windows-1252
> > > ------------- ---------------
> > > All iterations: 342.66 343.57
> > > 1st iteration: 52.95 49.9
> > > Subsequent iteration 41.39 41.95
> > >
> > > (Avg of 3 runs of html load pages tests, 4 M mem cache, 40 M disk cache,
> > > 450 Mhz P3, Win NT)
> > >
> > > It appears as though ibench is sending out a meta tag setting the charset
> > > to windows-1252. ( That was the only charset item I could find.) Also,
> > > to be safe I did check to make sure that the preference settings took
> > > and that the ACCEPT header actually changed.
> > >
> > > Tom
> > >
Comment 1•24 years ago
see also http://bugscape.mcom.com/show_bug.cgi?id=4540, originally filed as
bug 64612 in bugzilla: "Auto-Detect (All) causes a duplicate HTTP GET (or POST)
to be sent".
Comment 2•24 years ago
nhotta and shanjian- please help to look at this one.
Assignee: ftang → shanjian
Comment 3•24 years ago
hofmann said we should call Tom Everingham about this.
Comment 4•24 years ago
I set break points at nsHTMLDocument::StartDocumentLoad and MetaCharsetObserver::Notify.
In the iso-8859-1 case, the browser only stops at StartDocumentLoad once, and
MetaCharsetObserver does not report anything, since the charset is the same as our
default. For win-1252, it stops in StartDocumentLoad twice and stops in Notify once.
That's what we expect, since the default (8859-1) is different from the meta charset
(win1252), so the page gets reloaded. Whether the reload goes through the cache or not
is another story; we have bug 61363 to handle that. So from my point of view, everything
seems perfectly fine.
I talked with Tom, and he told me he will post some new numbers for the current build.
After that, let's decide how to handle this bug.
Updated•24 years ago
QA Contact: andreasb → ylong
Comment 5•24 years ago
I had a little problem with today's build, so I'm using 051604. I'm not exactly sure
where this bug is showing itself in these numbers, but here they are.
Char encoding ISO-8859-1 Windows-1252
------------- ---------------
All iterations: 342.14 341.59
1st iteration: 49.76 49.4
Subsequent iteration 41.77 41.74
Comment 6•24 years ago
I don't see any problem in charset-related operations. For iso-8859-1, no charset
notification is made. I could not see much difference between 8859-1 and win1252.
That is to say, the charset request itself did not cause the page to load twice.
Cathleen, what was your expectation for Tom's data? Were you expecting to see
much less time used in subsequent page loads because the cache should be used
there? If so, please reassign the bug to gordon@netscape.com for consideration.
Assignee: shanjian → cathleen
Summary: charset request cause page load twice → cache does not make much difference in subsequent load???
There are 2 problems we see when running ibench tests:
1) charset causing the page to be fetched twice, which is what we're trying to address
in this bug (changing the summary back)
2) the charset's 2nd fetch being loaded from the server, not from the cache, which is a
known cache bug, bug 61363 (no need to address it here)
Now, back to issue #1: tever has modified the ibench tests so the default
charset matches the ibench pages, and page load times are still the same. So the
purpose of this bug is to investigate and determine either a) charset is really not
causing pages to be fetched twice, or b) charset is causing pages to be fetched
twice with the ibench test.
tever's test results didn't show any difference in performance numbers between
the two charset configurations. Under the assumption that b) is what's going on
with the ibench test, we were hoping that if we made a) true, overall page load
time with ibench would improve some.
Shanjian or ftang, did you guys try running the ibench tests to verify?
Summary: cache does not make much difference in subsequent load??? → charset request cause page load twice
Comment 8•24 years ago
maybe we should alias windows-1252 to ISO-8859-1. Currently we treat them as
two separate charsets, since windows-1252 has defined characters in the 0x80-0x9f
range but ISO-8859-1 does not. This could cause this issue.
Comment 9•24 years ago
I did not try ibench; I set break points inside the program to see what is really happening.
I found:
a) If the default charset is the same as the page charset, meta charset/auto detection
does not cause the document to be fetched twice. (I revisited 4540. The log on the http
server does not show the page being fetched twice.)
b) If the default charset does not match the page charset, no matter whether the page
charset is provided through a meta tag or detected by auto detection, the page will be
reloaded. Whether this reload happens inside the cache or not is the concern of bug 61363.
Since this bug should address case a, I would suggest marking it as worksforme. There are
many possible interpretations of the ibench data:
1) In my case b, in most cases the meta charset can be found in the first block of the
http packet. The charset notification is sent immediately, and the first load is stopped
then. So this "twice" load might take just a little extra time over one load.
2) The cache works on reload, and the transported packets are not loaded again.
3) Layout and rendering take much more time to finish than network loading.
Reporter
Comment 10•24 years ago
assign to gagan.
gagan was gonna see if they can narrow it down to a small test case/page, with the
necko buffer set to a smaller size, back to how it was a few weeks ago.
Assignee: cathleen → gagan
Reporter
Comment 11•23 years ago
ftang and shanjian,
we're finding the charset reload problem is exposed again; see bug 82244. Can you
guys see what you can do?
Assignee: gagan → ftang
Comment 13•23 years ago
I am still not convinced that the page is UNNECESSARILY loaded twice because of the meta
charset, but I can't say that is not the case either. It might be a little difficult for me
to handle this problem without a specific page. Could you let me know which specific page
has this problem, and I will track down what causes it.
FYI, in case you want to confirm this yourself, you can set a break point at
http://lxr.mozilla.org/seamonkey/source/intl/chardet/src/nsMetaCharsetObserver.cpp
around line 132, which is the observer's Notify function. This notification will
cause the page to be reloaded. If we send this notification with no good reason, we have a problem.
Comment 14•23 years ago
Some thoughts regarding darin's comment in bug 82244:
When the meta charset observer or a charset detector requires a page reload using a new
charset, this reload should happen from the cache instead of being a real reload. We
certainly can add a flag in nsWebShell.cpp::ReloadDocument if the new cache requires a
certain flag to be set. But the reload must happen, however it happens, if a new charset
is detected.
Comment 15•23 years ago
Now we all agree the page needs to be reloaded. I'll close this bug as WFM. The performance
problem should be addressed in bug 82244.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
Comment 17•23 years ago
on winnt, i tried setting my default charset to windows-1252, and saw a huge
improvement on initial page loads running ibench. here are the numbers:
default optimized build: 44.06
w/ charset set to windows-1252: 31.88
that's a huge difference. so, IMO we need to prevent the double load or make
it more efficient. i think we can prevent it in 99% of the cases. if the
parser receives the <meta> tag in the first OnDataAvailable, then it should be
able to make the charset switch without requiring a full reload. performing
a double load is always going to suck BIG TIME... even if we are getting it from
the cache. i think we should try to optimize for the case when the <meta> tag
shows up in the first OnDataAvailable, and default to the current behavior
otherwise.
does this sound feasible?
Status: VERIFIED → REOPENED
Resolution: WORKSFORME → ---
Comment 18•23 years ago
I think that is doable, but it's a lot of work.
Since most web pages are still latin1-based, we probably should treat
latin1 and cp1252 specially. In the meta charset detector, we do not discriminate
between latin1 and cp1252. We have two options to implement this:
option 1: treat latin1 as an alias of cp1252, since latin1 is a subset of cp1252.
The system default needs to be changed to cp1252.
option 2: keep the system default as latin1, but handle latin1 as cp1252.
I would prefer option 2. Frank, what do you think?
Status: REOPENED → ASSIGNED
Comment 19•23 years ago
ok, cut and paste some email discussion here:
ftang:
I personally do not have a strong objection to making such a change. It is a simple
change in the file
xpfe/browser/resources/locale/en-US/navigator.properties
change from
intl.charset.default=ISO-8859-1
to
intl.charset.default=windows-1252
However, this setting is cross-platform. I am not sure what the reaction will be
if we ship with this setting. I am very afraid this
will make some people mad.
We can also choose to apply this only for Netscape if we want to do so- see file
/ns/xpfe/browser/resources/locale/en-US/navigator.properties
bobj/momoi/i18ngrp-
need your help.
Shanjian Li wrote:
IE is using latin1 (i.e., iso-8859-1). But IE does not discriminate between
latin1 and cp1252 as we do. We might want to do
something similar to what I proposed in bug 81253 to achieve the same result.
If we want this to happen quickly, we need to
get Frank involved.
shanjian
Gagan Saksena wrote:
Frank/Shanjian:
Is there a reason why we don't set our default charset to the same as IE's?
IE seems to be using Unicode (Western
European), whatever that maps to (is that Windows-1252?), and we are using
Western (ISO-8859-1).
With this change, as Darin demonstrated (see bug 81253), we could make a
significant improvement on ibench page
loading. Note that I am not proposing this as a way to fix bug 81253, but
only as a way to minimize its occurrence. We still
have to fix it to get the best possible results.
-Gagan
Comment 20•23 years ago
Do we really do 2 full loads? I thought the original design was for the
observer to notify as soon as the parser parsed a META charset; then the current
load would be aborted, and then the page reloaded. The META tag should
hopefully be in the first or second block from the network, so the abort
would happen before much of the load had occurred. Is this not how it works?
Anyway, changing the default to cp1252 (for English releases) should be OK.
It's a bit counter to promoting standards (ISO vs MS), but for most cases,
users should see no difference. The loading performance is probably
not that perceptible when loading individual pages. Ibench and tests like
it are a bit artificial in that they are loading in rapid succession.
But just switching the default may mean we perform poorly when a different
set of data (mostly ISO-8859-1 pages) is used.
Can we add a special optimization in the MetaCharsetObserver to check if the
default and meta charsets are cp1252 and iso-8859-1, and if so do NOT reload?
We probably still need to update the charset for the docshell for other
reasons, but without a reload. Can we switch the Unicode converter in
mid-load? A potential problem when the default is iso-8859-1 is that
characters that are in the cp1252 superset (e.g., smart quotes) may have
been misconverted into Unicode.
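As a rough illustration of the optimization bobj describes (this is not the actual
nsMetaCharsetObserver code; the function and helper names below are hypothetical),
the observer could skip the reload when the only difference between the current and
the meta-declared charset is the ISO-8859-1/windows-1252 pairing:

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: normalize a charset label for comparison,
// folding case and treating ISO-8859-1 as its windows-1252 superset.
static std::string NormalizeCharset(std::string name) {
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) { return std::tolower(c); });
  if (name == "iso-8859-1" || name == "latin1")
    name = "windows-1252";  // latin1 is a subset of cp1252
  return name;
}

// Hypothetical decision point inside a meta-charset observer: only ask the
// docshell to reload when the new charset would actually change how the
// document's bytes are decoded.
bool NeedsReload(const std::string& currentCharset,
                 const std::string& metaCharset) {
  return NormalizeCharset(currentCharset) != NormalizeCharset(metaCharset);
}
```

With a check along these lines, the common iso-8859-1 default / windows-1252 meta tag
combination on ibench-style pages would no longer trigger a second load, while a
genuinely different charset would still reload as before.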
Comment 21•23 years ago
Yes, we really do two full loads. The pages are short.
harishd, how hard would it be to add charset detection along with the DOCTYPE
sniffing code?
Comment 22•23 years ago
how about treating windows-1252 as ISO-8859-1? That seems to be what we did in 4.x.
Here is the patch. Don't land it until we fully agree on it, ok?
Comment 23•23 years ago
Comment 24•23 years ago
again, try the previous patch but don't check it in. We need more analysis of
the impact of that change.
Comment 26•23 years ago
with this patch i see a 28% initial page load improvement on dogspit/ibench.
Comment 27•23 years ago
bobj: there are a number of problems with the reload (even if the first load
hasn't been fully completed):
1) we drop a potential keep-alive connection
2) we waste cycles writing data to the disk cache only to delete it almost
immediately.
3) we have the added expense (i would think) of tearing down and recreating
docshell related stuff in order to process the reload.
Comment 28•23 years ago
Here is the plan:
check in the last patch so we treat windows-1252 and ISO-8859-1 as the same.
We may want to consider the following:
1. change the ISO-8859-1 unicode encoder so we can convert to the 0x80-0x9F part
of windows-1252, but if we do so, we also need to
2. make sure the unicode encoder used by X window fonts does not include that part;
we may need to zap out anything > 0xFF or create a new unicode encoder for it.
3. how about mail news?
Assignee: shanjian → ftang
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.3 → mozilla0.9.2
Updated•23 years ago
Status: NEW → ASSIGNED
Comment 29•23 years ago
we should check in the current patch as is for now to get QA feedback.
Comment 30•23 years ago
ftang: why don't we see what harishd has to say about meta-charset sniffing? I
talked with vidur about this today, and he says it should be a pretty
straightforward thing to do.
Comment 31•23 years ago
pdt+ based on the 6/11 pdt meeting.
Updated•23 years ago
Whiteboard: [PDT+]
Comment 32•23 years ago
waterson:
>why don't we see what harishd has to say about meta-charset sniffing?
we could. but do you want to wait?
Comment 33•23 years ago
the other thing we can do is treat ISO-8859-1 as windows-1252; then
we don't need to worry about form posting.
Comment 34•23 years ago
Comment 35•23 years ago
you should only apply ONE of the above patches; never apply both to your tree.
Darin, can you undo the first one and try the 2nd one? I think it should give
you the same performance result.
Comment 37•23 years ago
ftang, what do you expect the side-effects of this change to be? Also cc'ing
momoi, who seemed to think this might not be a good idea...
Comment 38•23 years ago
Frank and I chatted about this in his cube.
Both his patches generally do the same thing which is to alias cp1252 and
iso-8859-1 as the same. The difference is which is the main and which is
the alias charset.
The big win for Frank's approach is that it is merely a mapping change and
the risk of introducing new bugs is extremely low. So this does not
preclude doing a better fix later.
There are undesirable side effects caused by converting FROM Unicode INTO the
"main" charset for codepoints outside (superset) of the iso-8859-1 set. Areas
affected would be forms submission, composer, and maybe save-as.
I gotta do something else right now, Frank or I will document these side
effects very soon today.
Comment 39•23 years ago
> ftang, what do you expect the side-effects of this change to be?
> Also cc'ing momoi, who seemed to think this might not be a good idea...
I don't have an objection to the approach taken in the proposed
patches by ftang. In mail, I did voice an objection against re-setting
the default encoding to Windows-1252 just for the Windows platform. This
idea was floated in some of the comments above. I don't think we
should do that and stick to ISO-8859-1 instead as the default UI
setting. What we do internally might be somewhat different.
Assignee
Comment 40•23 years ago
I've started working on the meta tag sniffing solution in the parser. ETA:
Thursday, 6/14.
Comment 41•23 years ago
>ftang, what do you expect the side-effects of this change to be? Also cc'ing
>momoi, who seemed to think this might not be a good idea...
momoi's objection in mail was against a different proposal - changing the default
setting.
Neither of my patches here changes the default setting; instead they treat both
charset names as having the same meaning, which is not what momoi objects to.
I personally prefer the latter patch, since then we don't need to worry about
form submission. I would like to ask nhotta to test mail with the latter patch and
make sure we didn't break anything in mail before we make a decision.
Comment 42•23 years ago
I applied the patch on 06/11/01 18:27, and found a couple of problems.
* The charset menu shows "windows-1252" for pages whose actual charset is
"ISO-8859-1". This is also true for mail (both view and compose).
* No menu check mark for the above cases. Possibly because the actual charset
and the menu item do not match.
Assignee
Comment 43•23 years ago
Assignee
Comment 44•23 years ago
Comment 45•23 years ago
Additional info to my last comment (2001-06-13 14:35).
* The charset check mark problem was for browser and mail compose; mail view
does not have the problem.
* Mail is sent out as ISO-8859-1, but if you select windows-1252 from the menu
then it is sent out as windows-1252.
* Some code does special handling for ISO-8859-1; that may break if we change to
windows-1252.
e.g. aCharset.EqualsIgnoreCase("ISO-8859-1")
http://lxr.mozilla.org/seamonkey/search?string=ISO-8859-1
Assignee
Comment 46•23 years ago
The parser sniffing code posted above looks in up to the first 2k (an arbitrary
number that can be changed) of the first buffer delivered by Necko for a META
tag, prior to the initial Unicode conversion done by the parser. If it finds a
META tag with charset information, the charset and its origin are recorded and
used by the parser. The nsMetaCharsetObserver code will still be triggered, but
will not do a reload. If a META tag with charset information exists, but is not
in the first buffer delivered or in the first 2k of the file, we fall back
to the existing reloading scheme.
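A minimal sketch of this kind of pre-scan follows. It is a standalone illustration,
not the actual nsParser::DetectMetaTag: only the 2k cap and the fall-back behavior
come from the description above; the function name and the parsing details are
assumptions.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>

// Illustrative pre-scan: look for <meta ... charset=...> in at most the first
// 2k of the first network buffer, before any Unicode conversion. This only
// works for ASCII-compatible encodings, which is exactly the case where a
// reload can be avoided.
std::optional<std::string> SniffMetaCharset(const char* buf, size_t len) {
  const size_t kMaxSniff = 2048;  // arbitrary cap, per the comment above
  std::string raw(buf, std::min(len, kMaxSniff));
  std::string lower = raw;
  for (char& c : lower)
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));

  size_t meta = lower.find("<meta");
  while (meta != std::string::npos) {
    size_t end = lower.find('>', meta);
    if (end == std::string::npos) break;  // tag not fully in this window: give up
    size_t cs = lower.find("charset=", meta);
    if (cs != std::string::npos && cs < end) {
      cs += 8;  // skip "charset="
      while (cs < end && (raw[cs] == '"' || raw[cs] == '\'' || raw[cs] == ' ')) ++cs;
      size_t stop = cs;
      while (stop < end &&
             (std::isalnum(static_cast<unsigned char>(raw[stop])) ||
              raw[stop] == '-' || raw[stop] == '_' || raw[stop] == '.')) {
        ++stop;
      }
      if (stop > cs) return raw.substr(cs, stop - cs);  // e.g. "windows-1252"
    }
    meta = lower.find("<meta", end);
  }
  return std::nullopt;  // nothing found: caller falls back to the reload path
}
```

The important property is the fallback at the end: if the tag is not wholly
contained in the sniffed window, the caller keeps the existing notify-and-reload
behavior, matching the "fall back to the existing reloading scheme" described above.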
Comment 47•23 years ago
Vidur, I tried your patch on an optimized build against iBench. It
_significantly_ cuts down on first-page load time, but it appears that it also
negatively impacts cached page load time. I'm not sure why. Here are the numbers:
Overall First Cached
2001-06-14 (post darin's fixes) 147.65 29.93 16.82
2001-06-14 with patch 140.29 18.98 17.33
As you can see, first page dropped almost by a factor of two, but cached load
increased by a bit (~3%). I was able to reproduce these results pretty
consistently. (I did a second set of runs, and the overall load time fluctuated
by about a second, run-to-run).
I guess it'd be interesting to see if this had the same sort of effect on jrgm's
tests (which don't have <META> tags). Vidur, do you think there'd be any way to
consolidate this stuff with the DOCTYPE sniffing? (As an aside, just spoke with
dbaron; since we don't use the strict DTD anymore, perhaps we don't need to look
at the DOCTYPE at _all_ up front, and can collect the information as part of the
normal parsing process?)
Comment 48•23 years ago
I should also add, r=waterson on your patch, if you need it. (And we check
jrgm's stuff to see if it has any effects there...)
Assignee
Comment 49•23 years ago
jrgm is getting the same results on his run of the ibench test, i.e., a slight
increase in cached load times. This is extremely curious, since the
sniffing code isn't (well...shouldn't be) invoked on a cached load. jrgm says
that he'd seen similar results when he manually changed our version of the
ibench pages to not have a META charset. So it sounds like the increase is
unrelated to the specifics of my patch.
I believe dbaron's right about not needing to do upfront DOCTYPE sniffing
anymore. We still need DOCTYPE information to determine whether to do strict or
quirks layout, but that can be determined later in the parsing/content creation
process. Harish should probably look at this.
Comment 50•23 years ago
chris: did you clear out your cache for these numbers? The cached version and
the first one are fairly close to each other and I wonder if that was because we
were still running the objects from the cache.
Comment 51•23 years ago
Yes, I cleared my cache. (I was so surprised that I ran it twice, just to be
sure -- clearing the cache before each run.)
Comment 52•23 years ago
vidur: I think we still make decisions, based on DOCTYPE, at the tokenizer
level, e.g., when an XHTML document is served as text/html. Though I have to double-check.
Comment 53•23 years ago
Reassigning this to vidur because it now seems we want to fix this problem by using
the parser charset sniffing code vidur/harishd put together.
Please make sure not to check in
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=37925
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=38001
because those two patches may cause other undesirable side effects.
Assignee: ftang → vidur
Status: ASSIGNED → NEW
Comment 54•23 years ago
vidur: why the 2k limit? why not scan everything in the first ODA?
Assignee
Comment 55•23 years ago
Darin: the 2k limit was arbitrary and seemed like a reasonable boundary within
which one would expect a META tag. Is there an upper bound on the size of the
first buffer? Since the sniffing is not particularly processor intensive, maybe
the question itself is moot.
Comment 56•23 years ago
I ran an opt. mozilla build, with and without vidur's patch, on 3
machines (source pulled 06/18 evening, except for the 266 machine,
which was done with 06/13 source).
I tested against my page load test, which (since the original content
didn't have any) doesn't have any 'meta charset' tags that aren't
iso-8859-1. [Also note that in my test, the top level document is
never cached]. The results for those tests showed no difference in
page load times, which is the ~expected result.
I also ran against the ibench content, configured on an internal
server, and got these times.
win98 800MHz/128MB
#1 #2
Without patch.
All iterations 132.26 131.66
First iteration 26.14 25.98
Subsequent iteration 15.16 15.10
With patch: % Change
All iterations 129.41 128.91 -2.1%
First iteration 18.29 18.13 -30.1%
Subsequent iteration 15.87 15.83 4.8%
win2k 500MHz/128MB
#1 #2
Without patch.
All iterations 271.03 270.72
First iteration 34.59 34.27
Subsequent iteration 33.78 33.78
With patch: % Change
All iterations 247.91 248.42 9.2%
First iteration 48.16 48.47 -28.7%
Subsequent iteration 28.54 28.57 18.3%
win98 266MHz/128MB
#1
Without patch.
All iterations 568.12
First iteration 94.50
Subsequent iteration 61.43
With patch: % Change
All iterations 490.81 -13.6%
First iteration 67.66 -28.4%
Subsequent iteration 61.34 0.0%
So, by sniffing and avoiding the need to reload, with the negative
consequences that has, particularly on image loads, the 'First
Iteration' times drop dramatically on all machines/speeds. But,
at least on the 500 and 800 systems (and in waterson's earlier note),
this raises the times when pulling out of the cache.
I don't really understand why this is happening, but it is, at least
as measured. (I wasn't able to determine if this was some odd artifact
of the way ibench does the test, but this is the real amount of time
that it is taking to complete the sequence of steps [i.e., there is no
time miscount for the given task]. I still think there is something else
odd, since, if I immediately rerun the test without shutting down, my
cached times drop from the initial results.)
One thing I confirmed (with good ol' printf) was that we are calling
nsParser::DetectMetaTag even when pulling the document from the
cache. So it looks like we need to cut that out (since it has a cost).
Assignee
Comment 57•23 years ago
John's right - we *are* calling nsParser::DetectMetaTag (and the original
intl-based meta tag observer) during subsequent loads. For some reason, the
charset retrieved from the cache is considered to be of a lower weighting than
one determined from a meta tag (see the charset source enum in nsIParser.h).
Switching the weightings may make sense, but further investigation shows that
the charset retrieved from the cache may not be correct in the first place.
I'll post again when I have more details.
Assignee
Comment 58•23 years ago
After talking with ftang, I've made the following changes:
1) The original patch that I posted just recorded the sniffed charset in the
parser. This charset wasn't percolated up as it should have been. We now notify
the document and its charset observers of the detected charset.
2) The original code that set the charset in the cache relied on a reload
occurring. We now set the cache meta data directly in the parser.
3) The weighting of a charset recorded in cache meta data was lower than the
sniffed charset. This is no longer the case - we use the cached value rather
than resniffing.
I'll post the new patch momentarily. Unfortunately, none of this explains the
performance results... unless rerunning the sniffing code on a subsequent reload
somehow explains the performance degradation.
Assignee
Comment 59•23 years ago
Assignee
Comment 60•23 years ago
I've found a problem that *could* account for some part of the performance
degradation we're seeing in the cached case. I wasn't normalizing the name of
the charset sniffed from the META tag. As a result, there's a chance that we
might have been going down a different intl charset conversion path than we did
before. A patch to fix this problem can be found below.
Assignee
Comment 61•23 years ago
Updated•23 years ago
Whiteboard: [PDT+] → [PDT+] patch available
Comment 62•23 years ago
(vidur: I did a little clobbering in htmlparser and intl, and my earlier
problem with loading went away. Who knows.)
At any rate, I've tried patch 'v1.3' and I am still getting elevated times
for the "subsequent" page loads.
Comment 63•23 years ago
*** Bug 86289 has been marked as a duplicate of this bug. ***
Updated•23 years ago
Whiteboard: [PDT+] patch available → [PDT+] patch in progress
Comment 64•23 years ago
Okay, I tried out a couple of builds from vidur, one being trunk, and the
other with patch v1.3 above. I ran them on four machines: a lower-end Win98,
a midrange win2k, a midrange win98 and a higher-end win98.
Delta
win98/266MHz/128MB
All iterations 509.2 484.0 -5.0%
First iteration 90.1 65.9 -26.9%
Subsequent iteration 59.8 59.7 -0.3%
win2k/500MHz/128MB
All iterations 244.3 266.8 9.2%
First iteration 46.8 33.5 -28.4%
Subsequent iteration 28.2 33.3 18.1%
win98/500MHz/128MB
All iterations 232.0 235.8 1.7%
First iteration 46.1 32.1 -30.3%
Subsequent iteration 26.5 29.1 9.6%
win98/800MHz/128MB
All iterations 133.2 129.1 -3.1%
First iteration 27.8 18.2 -34.5%
Subsequent iteration 15.0 15.8 5.2%
...
Comment 65•23 years ago
(Just for the record, I mentioned this to vidur yesterday...) Are the cache
entries (on disk) identical both with and without the change? Could we be
failing to properly initialize the cache entry?
Assignee
Comment 66•23 years ago
Yeah, I started to look at this yesterday. The cache meta data seems to be the
same - at least for the first several sites in the test. I'll do some more
comparisons and post if I see anything different.
Comment 67•23 years ago
Converting from PDT+ to nsBranch. Let's see this on the trunk for a little while
before we bring it to the branch. I appreciate the effort that's gone into
this; it's obviously a very hard problem.
Keywords: nsBranch
Whiteboard: [PDT+] patch in progress → patch in progress
Assignee
Comment 68•23 years ago
Moving to 0.9.3 since we haven't yet resolved the cache loading issue.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
Assignee
Comment 69•23 years ago
I've spent some time discussing this with Darin and Frank and we're all stumped.
Other findings include:
1) The cache entries are essentially identical for runs with and without the
patch. There's a slight difference in ordering of cache meta data, but the meta
data itself and the set of files in the cache are identical.
2) I primed the cache with an ibench run on a build without the patch and then
ran the tests again with both builds. I still see the 4-5% difference for cached
loads on my 700MHz Windows 2k system. This, along with the previous finding,
absolves the cache entries (and the process of writing them out) from any direct
cause.
3) I ran a TCP sniffer to confirm that we aren't going back to the network for
specific files with a patched build. For both builds (with and without the
patch) we get no cache misses.
4) I stepped through new and existing code with both Frank and Darin. Darin and
I convinced ourselves that the code path for cached reads wasn't substantially
different with the patch. In fact, given the numbers that John has come up with,
it doesn't sound like problem can be correlated with additional processing (this
is substantiated by Quantify runs I did that showed no substantial differences),
but possibly with additional I/O. We have yet to understand what might cause us
to spend additional time doing I/O.
At this point, my inclination is to check the patch into the trunk and, pending
a bit of testing on the trunk, get it onto the Netscape post-0.9.2 branch. The
initial load time improvement for files with a META tag is significant and the
cached load problems don't seem to manifest themselves in jrgm's regular load
time tests (or so I've been told).
Can I get r's and sr's?
Comment 70•23 years ago
r=harishd
Comment 71•23 years ago
sr=waterson. I spent some time looking at before and after jprof profiles, but
nothing really sticks out. I'll attach them here for posterity's sake.
Comment 72•23 years ago
Comment 73•23 years ago
Reporter
Comment 74•23 years ago
vidur, are you landing this to trunk soon?
Comment 75•23 years ago
In patch v1.3, the position of kCharsetFromCache has been changed. I am not
sure if this is really necessary. I read through the bug, and it does not
seem to be the case. This change will cause an undesired side effect. Suppose
a webpage has no charset meta tag, the user has not chosen any charset
detector, the default is incorrect, and the webpage is displayed incorrectly.
Now the user switches on a charset detector, but no change will be made on
subsequent loads, since the charset from the cache takes precedence over
autodetection. This is really confusing to users. I would suggest that we take
out that change before checking it in to CVS.
Assignee
Comment 76•23 years ago
The patch went into the trunk on 6/29.
The movement of kCharsetFromCache was required to prevent resniffing of the META
tag on cached loads. It's debatable whether the charset sniffing itself is
significant enough in the scheme of things to make a difference...but it seemed
like a good thing to eliminate, especially in light of the performance we're
getting for cached loads.
While the case that Shanjian mentions will result in unexpected behavior,
raising the priority of the charset in the cache helps other cases. For example,
it preserves any charset that's forced by the user through the charset menu.
Prior to the change, the forced charset was trumped by (possibly incorrect)
autodetection.
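To make the kCharsetFromCache debate concrete: the parser keeps the charset from the
highest-priority "source" seen so far, and a new value only wins if its source is at
least as strong. The sketch below is only an illustration of that idea; apart from
kCharsetFromCache, which this bug names, the enum values and their ordering are
assumptions, not the actual contents of nsIParser.h.

```cpp
#include <string>

// Illustrative priority ladder, lowest to highest. The dispute in this bug is
// where kCharsetFromCache sits: raising it above kCharsetFromAutoDetection
// avoids resniffing the META tag on cached loads, but then a detector the user
// enables later can never override a (possibly wrong) cached value.
enum CharsetSource {
  kCharsetUnknown = 0,
  kCharsetFromDefault,        // intl.charset.default pref
  kCharsetFromCache,          // remembered in cache meta data
  kCharsetFromAutoDetection,  // charset detector result
  kCharsetFromMetaTag,        // <meta ... charset=...>
  kCharsetFromHttpHeader,     // Content-Type response header
  kCharsetFromUserForced      // explicit charset menu selection
};

struct DocumentCharset {
  std::string name;
  CharsetSource source = kCharsetUnknown;

  // Accept the new charset only if its source is at least as strong as the
  // one that established the current charset.
  bool MaybeUpdate(const std::string& newName, CharsetSource newSource) {
    if (newSource < source) return false;
    name = newName;
    source = newSource;
    return true;
  }
};
```

Shanjian's and momoi's objection maps directly onto this ladder: once the cache source
outranks autodetection, turning on a detector after a page has displayed incorrectly can
never beat the cached value, which is why comment 84 asks for the nsIParser.h priority
change to be backed out.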
Comment 77•23 years ago
I disagree with you about the cache change. Yes, it improves some other
cases. Because of that, I had been trying to make such a change myself. But soon after,
I found out the gain is not worth the loss, and I had to back out my change.
Please reconsider the scenario I raised; it is a very common one. Users only
try to change the charset/detector after they see a page displayed incorrectly.
If this does not work, the whole charset menu is not very useful. In my opinion,
that is unacceptable.
Comment 78•23 years ago
adding vtrunk keyword to indicate the bug has been fixed on the trunk. Bug left
open for tracking checkin to branch (nsbranch) when appropriate.
Once the bug has been fixed on the branch as well, please remove the vtrunk keyword.
Keywords: vtrunk
Comment 79•23 years ago
The change in this area really worries me now. This is not
just an engineering issue. It is a usability issue. I would like
to know what menu action changes will be affected by the proposed fix.
> While the case that Shanjian mentions will result in unexpected behavior,
> raising the priority of the charset in the cache helps other cases.
What does this mean? Does it mean that the cached charset is not
overridable? We need more explanation of how the proposed fix will
change the known behavior. Without additional explanation and consideration
of typical scenarios, it would be very difficult to evaluate the real
worth of the fix proposed here.
I propose a meeting to discuss the issues involved.
Assignee
Comment 80•23 years ago
The cached charset is overridable by a charset forced by the user. To be frank,
I'm not sure what the correlation is between the different charset-related menu
items and the charset source enumeration.
I'm willing to attend a meeting set up by the i18n group.
Comment 81•23 years ago
Yes, even after your change, a user-forced charset will override the cached charset
value. But a newly selected autodetection result will not be able to override the
cached charset; that's the problem.
Comment 82•23 years ago
> Yes, even after you change, user forced charset will override
> cached charset value. But newly selected autodetection result
> will not be able to override cached charset, that's the problem.
I experimented with the trunk build with the current
patch in it, and I find that auto-detect cannot
override the cached value.
We have had a spec on charset menu actions out there since
before NS 6.0/Mozilla M18. The spec calls for both the
user-initiated charset menu and the auto-detect menu change to
override the HTTP, document-based META charset, or the
cached charset value.
The fact that auto-detection cannot override the cached value as
it stands in the trunk build is not acceptable. We have a consistent
meaning for user-initiated menu actions, i.e. we give them the
priority they deserve. Not only is this the correct thing to
do, but it is also what international users deserve. Instead of forcing
the user to try out 3 or more menu items to display an incorrectly
loaded page, auto-detect override lets you do this in 1 move.
IE of course provides this, and so have we so far, until
this patch.
I request that we bring back the ability to have auto-detect
override the cached value. What we have currently is simply
a regression against the spec we had agreed on. You should not
change the user spec.
Assignee
Comment 83•23 years ago
Also, refer to the patch in bug 88746 (a spin-off from this bug) that fixes
a problem in the DetectMetaCharset() method.
Comment 84•23 years ago
we found several problems with the current landing on the trunk. Please do not
land it on the branch until we address all of them.
I think we should at least do the following on the trunk:
1. back out the nsIParser.h change. Don't change the charset source priority. It
is too risky, and while we have both meta charset and auto detect, it may cause
some problems.
2. add "ignore UTF-16, etc." code to the parser so we just ignore those charsets,
like what we did in nsMetaCharsetObserver.
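On point 2, the rationale is that if the META tag was readable as ordinary
ASCII-compatible bytes, the document cannot really be UTF-16/UCS-2/UTF-32, so such a
sniffed label should be discarded rather than trigger a charset switch. A minimal sketch
of that guard follows; the helper name and the exact list are illustrative assumptions
(the real list is whatever nsMetaCharsetObserver ignores).

```cpp
#include <algorithm>
#include <array>
#include <cctype>
#include <string>

// If a META tag was readable one byte per character, the document is not in a
// multi-byte-unit encoding, so a label claiming otherwise is bogus and must not
// trigger a charset switch or reload.
bool IsImplausibleSniffedCharset(std::string name) {
  std::transform(name.begin(), name.end(), name.begin(),
                 [](unsigned char c) { return std::tolower(c); });
  static const std::array<const char*, 6> kIgnored = {
      "utf-16", "utf-16le", "utf-16be", "ucs-2", "ucs-4", "utf-32"};
  return std::find(kIgnored.begin(), kIgnored.end(), name) != kIgnored.end();
}
```

A sniffed value that fails this check would simply be treated as "no charset found" and
fall through to the existing default/auto-detect path.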
Assignee
Comment 85•23 years ago
Fixes in bugs 89155, 89177 and 89169 are related to getting this patch on the
branch. The last involves reverting the weighting of the charset out of the
cache back to its original value. A fallout of this will be resniffing of the
META tag even for cached loads. Based on testing we did when we were looking at
the cached loading performance issues a while ago, I believe that there won't be
a measurable performance impact from the resniffing.
Comment 87•23 years ago
Putting back PDT+, looks like we have something in hand now.
Whiteboard: patch in progress → PDT+, patch in progress
Comment 88•23 years ago
Any update on this bug? Is it ready to go in yet?
Comment 89•23 years ago
It's almost ready to go. I'm incorporating the patches into my branch build.
Once it's done (and compiled) I'll be ready to land. However, I will not be
able to land the changes today.
Comment 90•23 years ago
Comment 91•23 years ago
Oops, the previous patch was from me not pollmann!!!
Comment 92•23 years ago
fix landed on the branch.
Comment 93•23 years ago
Now that the patch is on the branch and the trunk, marking bug fixed.
Status: NEW → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 94•23 years ago
Verified the fix together with Frank on 07-17 Win32 branch build.
Status: RESOLVED → VERIFIED