charset request causes page load twice

VERIFIED FIXED in mozilla0.9.3

Status


P1
normal
VERIFIED FIXED
18 years ago
18 years ago

People

(Reporter: cathleennscp, Assigned: vidur)

Tracking

Trunk
mozilla0.9.3
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: PDT+, patch in progress)

Attachments

(9 attachments)

(Reporter)

Description

18 years ago
We talked about this issue at Monday's performance meeting.  Just want to make 
sure a bug is filed... and that Frank can help investigate.

See also related bug 61363, where the charset's 2nd request is fetched from the 
server rather than from the cache.


Past email thread
=====================

Yung-Fong Tang wrote:

> Someone needs to give me more context so I can help.
>
> Cathleen Wang wrote:
>
>   Frank, can you help investigate?
>
>   ~ Cathleen
>
>   Gagan Saksena wrote:
>
> >   If we are still doing a double load in spite of this, then something in 
> >   layout/intl/parser is broken. Cc'ing some relevant people -- who should
> >   get this bug?
> >
> >   -Gagan
> >
> >   Tom Everingham wrote:
> >
> > >   There don't appear to be any significant speed differences when the 
> > >   default charset is changed to match ibench.  Here are the numbers 
> > >   I'm getting:
> > >
> > >   Char encoding           ISO-8859-1          Windows-1252
> > >                          -------------       ---------------
> > >   All iterations:            342.66                343.57
> > >   1st iteration:              52.95                 49.9
> > >   Subsequent iteration        41.39                 41.95
> > >
> > >  (Avg of 3 runs of html load pages tests, 4 M mem cache, 40 M disk cache,
> > >   450 MHz P3, Win NT)
> > >
> > >  It appears as though ibench is sending out a meta tag setting the charset 
> > >  to windows-1252. ( That was the only charset item I could find.)   Also,
> > >  to be safe I did check to make sure that the preference settings took 
> > >  and that the ACCEPT header actually changed.
> > >
> > >  Tom
> > >
(Reporter)

Updated

18 years ago
Blocks: 71668

Comment 1

18 years ago
See also http://bugscape.mcom.com/show_bug.cgi?id=4540, originally filed in
Bugzilla as bug 64612: "Auto-Detect (All)" causes a duplicate HTTP GET (or POST)
to be sent.

Comment 2

18 years ago
nhotta and shanjian- please help to look at this one.
Assignee: ftang → shanjian

Comment 3

18 years ago
hofmann said we should call Tom Everingham about this.

Comment 4

18 years ago
I set breakpoints at nsHTMLDocument::StartDocumentLoad and nsMetaCharsetObserver::Notify. 
In the ISO-8859-1 case, the browser only stops at StartDocumentLoad once, and the observer 
does not report anything, since the charset is the same as our default. For windows-1252, 
it stops in StartDocumentLoad twice and stops in Notify once. That's what we expect, since 
the default (8859-1) differs from the meta charset (win1252), so the page gets reloaded. 
Whether or not the reload is served from the cache is another story; we have bug 61363 to 
handle that. So from my point of view, everything seems perfectly fine. 
I talked with Tom, and he told me he will post new numbers for the current build. After 
that, let's decide how to handle this bug. 

Updated

18 years ago
QA Contact: andreasb → ylong

Comment 5

18 years ago
I had a little problem with today's build, so I'm using 051604.  I'm not exactly sure 
where this bug shows itself in these numbers, but here they are.

Char encoding            ISO-8859-1          Windows-1252
                        -------------       ---------------
All iterations:            342.14                341.59
1st iteration:              49.76                 49.4
Subsequent iteration        41.77                 41.74

Comment 6

18 years ago
I don't see any problem in the charset-related operations. For ISO-8859-1, no charset 
notification is made. I could not see much difference between 8859-1 and win1252. 
That is to say, the charset request itself did not cause the page to load twice. 

Cathleen, what was your expectation for Tom's data? Were you expecting to see
much less time in subsequent page loads because the cache should be used 
there? If so, please reassign the bug to gordon@netscape.com for consideration.
Assignee: shanjian → cathleen
Summary: charset request causes page load twice → cache does not make much difference in subsequent load???
(Reporter)

Comment 7

18 years ago
There are 2 problems we see when running ibench tests:

1) charset causing the page to be fetched twice, which is what we're trying to address
   in this bug (changing the summary back)
2) the charset's 2nd fetch being loaded from the server rather than from the cache,
   which is a known bug with the cache, bug 61363 (no need to address it here)

Now, back to issue #1: tever has modified the ibench tests so that the default 
charset matches the ibench pages, and page load times are still the same.  So, the 
purpose of this bug is to investigate and find out either that a) charset is really not 
causing pages to be fetched twice, or b) charset is causing pages to be fetched 
twice, with the ibench test.

tever's test results didn't show any difference in performance numbers between 
the two charset configurations.  Under our assumption that b) is what's going on 
with the ibench test, we were hoping that if a) were true, overall page load time 
with ibench would improve some.

Shanjian or ftang, did you guys try running ibench tests to verify?
Summary: cache does not make much difference in subsequent load??? → charset request causes page load twice

Comment 8

18 years ago
Maybe we should alias windows-1252 to ISO-8859-1. Currently, we treat them as 
two separate charsets, since windows-1252 has defined characters in the 0x80-0x9F 
range but ISO-8859-1 doesn't. This could be causing the issue. 

Comment 9

18 years ago
I did not try ibench; I set breakpoints inside the program to see what is really happening. 
I found:
a) If the default charset is the same as the page charset, meta charset / auto-detection 
   does not cause the document to be fetched twice. (I revisited bug 4540; the log on the 
   HTTP server does not show the page being fetched twice.)
b) If the default charset does not match the page charset, no matter whether the page charset 
   is provided through a meta tag or detected by auto-detection, the page will be reloaded. 
   Whether this reload is served from the cache is the concern of bug 61363.

Since this bug should address case a, I would suggest marking it as WORKSFORME. There are 
many possible interpretations of the ibench data: 
1) In my case b, in most cases the meta charset can be found in the first block of the HTTP 
   response. The charset notification is sent immediately, and the first load is stopped then. 
   So this "twice" load might take only a little extra time over a single load. 
2) The cache works on the reload, and the already-transferred data is not loaded again.
3) Layout and rendering take much longer to finish than the network loading. 




(Reporter)

Comment 10

18 years ago
Assigning to gagan.

gagan is going to see if they can narrow this down to a small test case/page, with 
the necko buffer set to a smaller size, back to how it was a few weeks ago.
Assignee: cathleen → gagan
(Reporter)

Updated

18 years ago
Blocks: 82244
(Reporter)

Comment 11

18 years ago
ftang and shanjian,

we're finding the charset reload problem exposed again; see bug 82244.  Can you
guys see what you can do?
Assignee: gagan → ftang

Comment 12

18 years ago
shanjian- please take a look at this again. 
Assignee: ftang → shanjian

Comment 13

18 years ago
I am still not convinced that the page is UNNECESSARILY loaded twice because of meta-charset, 
but I can't say that is not the case either. It might be a little difficult for me to handle 
this problem without a specific page. Could you let me know which specific page has this
problem, and I will track down what causes it. 

FYI, in case you want to confirm this yourself, you can set a breakpoint at 
http://lxr.mozilla.org/seamonkey/source/intl/chardet/src/nsMetaCharsetObserver.cpp
around line 132, which is the observer's notify function. This notification will 
cause the page to be reloaded. If we send this notification with no good reason, we have a problem.

Comment 14

18 years ago
Some thoughts regarding darin's comment in bug 82244:
when the meta charset observer or the charset detector requires a page reload with a new 
charset, the reload should be served from the cache instead of being a real reload. We 
can certainly add a flag in nsWebShell.cpp::ReloadDocument if the new cache requires a 
certain flag to be set. But the reload must happen, one way or another, whenever a new 
charset is detected. 

Comment 15

18 years ago
Now we all agree the page needs to be reloaded. I am closing this bug as WFM. The performance 
problem should be addressed in bug 82244.
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → WORKSFORME

Comment 16

18 years ago
Verified.
Status: RESOLVED → VERIFIED

Comment 17

18 years ago
On WinNT, I tried setting my default charset to windows-1252 and saw a huge
improvement on initial page loads running ibench.  Here are the numbers:

default optimized build:         44.06
w/ charset set to windows-1252:  31.88

That's a huge difference.  So, IMO we need to prevent the double load or make
it more efficient.  I think we can prevent it in 99% of the cases.  If the
parser receives the <meta> tag in the first OnDataAvailable, then it should be
able to make the charset switch without requiring a full reload.  Performing
a double load is always going to suck BIG TIME... even if we are getting it from
the cache.  I think we should try to optimize for the case where the <meta> tag
shows up in the first OnDataAvailable, and default to the current behavior
otherwise.

Does this sound feasible?
Status: VERIFIED → REOPENED
Resolution: WORKSFORME → ---

Comment 18

18 years ago
I think that is doable, but it's a lot of work. 
Since most web pages are still Latin-1 based, we probably should treat 
Latin-1 and cp1252 specially. In the meta charset detector, we do not discriminate 
between Latin-1 and cp1252. We have two options to implement this:

Option 1: treat Latin-1 as an alias of cp1252, since Latin-1 is a subset of cp1252. 
The system default needs to be changed to cp1252.

Option 2: we keep the system default as Latin-1, but handle Latin-1 as cp1252.

I would prefer option 2. Frank, what do you think? 
Status: REOPENED → ASSIGNED

Comment 19

18 years ago
OK, cutting and pasting some email discussion here:

ftang:
I personally do not have a strong objection to making such a change. It is a 
simple change in 
xpfe/browser/resources/locale/en-US/navigator.properties 
from 
intl.charset.default=ISO-8859-1 
to 
intl.charset.default=windows-1252 

However, this setting is cross-platform. I am not sure how people will react if 
we ship with this setting. I am afraid it will make some people mad. 

We can also choose to apply this only to Netscape if we want to - see the file 
/ns/xpfe/browser/resources/locale/en-US/navigator.properties 

bobj/momoi/i18ngrp - 
need your help. 

Shanjian Li wrote: 

  IE is using Latin-1 (i.e., ISO-8859-1). But IE does not discriminate between 
  Latin-1 and cp1252 as we do. We might want to do something similar to what I 
  proposed in bug 81253 to achieve the same result. If we want this to happen 
  quickly, we need to get Frank involved. 

  shanjian 

  Gagan Saksena wrote: 

    Frank/Shanjian: 

    Is there a reason why we don't set our default charset to the same as IE's? 
    IE seems to be using Unicode (Western European), whatever that maps to (is 
    that windows-1252?), and we are using Western (ISO-8859-1). 

    With this change, as Darin demonstrated (see bug 81253), we could make a 
    significant improvement on ibench page loading. Note that I am not proposing 
    this as a way to fix bug 81253, but only a way to minimize its occurrence. 
    We still have to fix it to get the best possible results. 

    -Gagan

Comment 20

18 years ago
Do we really do 2 full loads?  I thought the original design was for the
observer to notify as soon as the parser parsed a META charset, then the current
load would be aborted, and then the page reloaded.  The META tag should 
hopefully be in the first or second block from the network, so the abort
would happen before much of the load had occurred.  Is this not how it works?

Anyway, changing the default to cp1252 (for English releases) should be OK. 
It's a bit counter to promoting standards (ISO vs. MS), but in most cases
users should see no difference.  The loading-performance difference is probably
not that perceptible when loading individual pages.  Ibench and tests like
it are a bit artificial in that they load pages in rapid succession.

But just switching the default may mean we perform poorly when a different
set of data (mostly ISO-8859-1 pages) is used.

Can we add a special optimization in the MetaCharsetObserver to check whether the 
default and meta charsets are cp1252 and iso-8859-1, and if so NOT reload?
We probably still need to update the charset for the docshell for other 
reasons, but without a reload.  Can we switch the Unicode converter in
mid-load?  A potential problem when the default is iso-8859-1 is that
characters in the cp1252 superset (e.g., smart quotes) may have already
been misconverted into Unicode.
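
For illustration, here is a minimal standalone C++ sketch of the check described above: skip the reload when the default and META charsets only differ by the ISO-8859-1 / windows-1252 pairing. The names are assumptions for illustration, not the actual nsMetaCharsetObserver code.

  #include <algorithm>
  #include <cctype>
  #include <string>

  // Case-insensitive comparison of charset names.
  static bool EqualsIgnoreCase(std::string a, std::string b) {
    auto lower = [](std::string s) {
      std::transform(s.begin(), s.end(), s.begin(),
                     [](unsigned char c) { return std::tolower(c); });
      return s;
    };
    return lower(std::move(a)) == lower(std::move(b));
  }

  // True when switching from the default charset to the META charset does not
  // require a reload, because windows-1252 is a superset of ISO-8859-1 (the two
  // only differ in the 0x80-0x9F range).
  bool CanSkipCharsetReload(const std::string& defaultCharset,
                            const std::string& metaCharset) {
    auto isLatin1Family = [](const std::string& cs) {
      return EqualsIgnoreCase(cs, "ISO-8859-1") ||
             EqualsIgnoreCase(cs, "windows-1252");
    };
    return EqualsIgnoreCase(defaultCharset, metaCharset) ||
           (isLatin1Family(defaultCharset) && isLatin1Family(metaCharset));
  }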

Comment 21

18 years ago
Yes, we really do two full loads. The pages are short.

harishd, how hard would it be to add charset detection along with the DOCTYPE 
sniffing code?

Comment 22

18 years ago
How about treating windows-1252 as ISO-8859-1? That seems to be what we did in 4.x.
Here is the patch. Don't land it until we fully agree on it, OK?

Comment 23

18 years ago
Created attachment 37925 [details] [diff] [review]
treat windows-1252 as ISO-8859-1

Comment 24

18 years ago
Again, try the previous patch but don't check it in. We need more analysis of 
the impact of that change. 

Comment 25

18 years ago
put into moz0.9.3 
Target Milestone: --- → mozilla0.9.3

Comment 26

18 years ago
With this patch I see a 28% initial page load improvement on dogspit/ibench.

Comment 27

18 years ago
bobj: there are a number of problems with the reload (even if the first load
hasn't fully completed):

1) we drop a potential keep-alive connection
2) we waste cycles writing data to the disk cache only to delete it almost
   immediately.
3) we have the added expense (I would think) of tearing down and recreating
   docshell-related stuff in order to process the reload.

Comment 28

18 years ago
Here is the plan: check in the last patch, so we treat windows-1252 and ISO-8859-1 as the same.
We may want to consider the following:
1. Change the ISO-8859-1 Unicode encoder so we can convert to the 0x80-0x9F part 
of windows-1252, but if we do so, we also need to
2. make sure the Unicode encoder used for X window fonts does not include that part; 
we may need to zap out anything > 0xFF or create a new Unicode encoder for it.
3. How about mail/news?
Assignee: shanjian → ftang
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.3 → mozilla0.9.2

Updated

18 years ago
Status: NEW → ASSIGNED

Comment 29

18 years ago
We should check in the current patch as-is for now to get QA feedback.

Comment 30

18 years ago
ftang: why don't we see what harishd has to say about meta-charset sniffing? I 
talked with vidur about this today, and he says it should be a pretty 
straightforward thing to do.

Comment 31

18 years ago
PDT+ based on the 6/11 PDT meeting.

Updated

18 years ago
Whiteboard: [PDT+]

Comment 32

18 years ago
waterson:
> why don't we see what harishd has to say about meta-charset sniffing? 
We could, but do you want to wait?

Comment 33

18 years ago
The other way we can go is to treat ISO-8859-1 as windows-1252; then 
we don't need to worry about form posting. 

Comment 34

18 years ago
Created attachment 38001 [details] [diff] [review]
treat ISO-8859-1 as windows-1252

Comment 35

18 years ago
You should only apply ONE of the above patches; never apply both to your tree.

Darin, can you undo the first one and try the second one? I think it should give 
you the same performance result. 

Comment 36

18 years ago
Marking it as P1.
Priority: -- → P1

Comment 37

18 years ago
ftang, what do you expect the side-effects of this change to be? Also cc'ing 
momoi, who seemed to think this might not be a good idea...

Comment 38

18 years ago
Frank and I chatted about this in his cube.
Both his patches generally do the same thing which is to alias cp1252 and
iso-8859-1 as the same.  The difference is which is the main and which is
the alias charset.

The big win for Frank's approach is that it is merely a mapping change and
the risk of introducing new bugs is extremely low.  So this does not
preclude doing a better fix later.

There are undesirable side effects caused by converting FROM Unicode INTO the
"main" charset for codepoints outside (superset) of the iso-8859-1 set.  Areas
affected would be forms submission, composer, and maybe save-as.

I've got to do something else right now; Frank or I will document these side
effects later today.

Comment 39

18 years ago
> ftang, what do you expect the side-effects of this change to be? 
> Also cc'ing momoi, who seemed to think this might not be a good idea...

I don't have an objection to the approach taken in the proposed 
patches by ftang. In mail, I did voice an objection against re-setting
the default encoding to Windows-1252 just for the Windows platform. That
idea was floated in some of the comments above. I don't think we
should do that; we should stick with ISO-8859-1 as the default UI
setting. What we do internally might be somewhat different.

(Assignee)

Comment 40

18 years ago
I've started working on the meta tag sniffing solution in the parser. ETA:
Thursday, 6/14.

Comment 41

18 years ago
> ftang, what do you expect the side-effects of this change to be? Also cc'ing 
> momoi, who seemed to think this might not be a good idea...
momoi's objection in mail is against a different proposal - changing the default 
setting.
Neither of my patches here changes the default setting; instead they treat both 
charset names as having the same meaning, which is not what momoi objects to.
I personally prefer the latter patch, since then we don't need to worry about 
form submission. I would like to ask nhotta to test mail with the latter patch and 
make sure we haven't broken anything in mail before we make a decision.

Comment 42

18 years ago
I applied the patch on 06/11/01 18:27 and found a couple of problems.

* The charset menu shows "windows-1252" for pages whose actual charset is
"ISO-8859-1". This is also true for mail (both view and compose).
* There is no menu check mark in the above cases, possibly because the actual charset
and the menu item do not match.

(Assignee)

Comment 43

18 years ago
Created attachment 38319 [details] [diff] [review]
charset sniffing in parser patch v1.0
(Assignee)

Comment 44

18 years ago
Created attachment 38337 [details] [diff] [review]
patch v1.1 based on jrgm and harishd suggestions

Comment 45

18 years ago
Additional info for my last comment (2001-06-13 14:35).
* The charset check mark problem is in the browser and mail compose; mail view
does not have the problem.
* Mail is sent out as ISO-8859-1, but if you select windows-1252 from the menu,
then it is sent out as windows-1252.
* Some code does special handling for ISO-8859-1; that may break if we change to
windows-1252, e.g. aCharset.EqualsIgnoreCase("ISO-8859-1"):
http://lxr.mozilla.org/seamonkey/search?string=ISO-8859-1
(Assignee)

Comment 46

18 years ago
The parser sniffing code posted above looks in up to the first 2k (an arbitrary
number that can be changed) of the first buffer delivered by Necko for a META
tag, prior to the initial Unicode conversion done by the parser. If it finds a
META tag with charset information, the charset and its origin are recorded and
used by the parser. The nsMetaCharsetObserver code will still be triggered, but
will not do a reload. If a META tag with charset information exists but is not
in the first buffer delivered or in the first 2k of the file, we fall back
to the existing reloading scheme.
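
As a rough, standalone illustration of that approach (not the actual nsParser code; a real implementation also has to verify that the charset attribute sits inside a <meta> tag and handle the http-equiv form), a 2 KB-capped scan of the first buffer might look like this:

  #include <algorithm>
  #include <cctype>
  #include <cstddef>
  #include <cstring>
  #include <string>

  // Scan at most the first `limit` bytes of the first buffer delivered by the
  // network for a "charset=XXX" declaration; return the charset name, or an
  // empty string if none is found within the window.
  std::string SniffMetaCharset(const char* buf, std::size_t len,
                               std::size_t limit = 2048) {
    std::string data(buf, std::min(len, limit));

    // Search a lower-cased copy; extract the value from the original bytes.
    std::string lower(data);
    for (char& c : lower) c = std::tolower(static_cast<unsigned char>(c));

    std::size_t pos = lower.find("charset=");
    if (pos == std::string::npos) return "";
    pos += std::strlen("charset=");
    if (pos < data.size() && (data[pos] == '"' || data[pos] == '\'')) ++pos;

    std::size_t end = pos;
    while (end < data.size() &&
           (std::isalnum(static_cast<unsigned char>(data[end])) ||
            data[end] == '-' || data[end] == '_' || data[end] == '.')) {
      ++end;
    }
    return data.substr(pos, end - pos);
  }

If the scan comes back empty (META tag not in the first buffer, or beyond the cap), the caller would fall back to the observer-driven reload path described above.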

Comment 47

18 years ago
Vidur, I tried your patch on an optimized build against iBench. It
_significantly_ cuts down on first-page load time, but it appears that it also
negatively impacts cached page load time. I'm not sure why. Here are the numbers:

                                   Overall   First  Cached
  2001-06-14 (post darin's fixes)   147.65   29.93   16.82
  2001-06-14 with patch             140.29   18.98   17.33

As you can see, first page dropped almost by a factor of two, but cached load
increased by a bit (~3%). I was able to reproduce these results pretty
consistently. (I did a second set of runs, and the overall load time fluctuated
by about a second, run-to-run).

I guess it'd be interesting to see if this had the same sort of effect on jrgm's
tests (which don't have <META> tags). Vidur, do you think there'd be any way to
consolidate this stuff with the DOCTYPE sniffing? (As an aside, just spoke with
dbaron; since we don't use the strict DTD anymore, perhaps we don't need to look
at the DOCTYPE at _all_ up front, and can collect the information as part of the
normal parsing process?)

Comment 48

18 years ago
I should also add, r=waterson on your patch, if you need it. (And we check
jrgm's stuff to see if it has any effects there...)
(Assignee)

Comment 49

18 years ago
jrgm is getting the same results on his run of the ibench test, i.e. a slight 
increase in cached load times. This is extremely curious, since the 
sniffing code isn't (well... shouldn't be) invoked on a cached load. jrgm says 
that he'd seen similar results when he manually changed our version of the 
ibench pages to not have a META charset. So it sounds like the increase is 
unrelated to the specifics of my patch.

I believe dbaron's right about not needing to do upfront DOCTYPE sniffing 
anymore. We still need DOCTYPE information to determine whether to do strict or 
quirks layout, but that can be determined later in the parsing/content creation 
process. Harish should probably look at this.

Comment 50

18 years ago
chris: did you clear out your cache for these numbers? The cached version and 
the first one are fairly close to each other, and I wonder if that was because we 
were still loading the objects from the cache. 

Comment 51

18 years ago
Yes, I cleared my cache. (I was so surprised that I ran it twice, just to be 
sure -- clearing the cache before each run.)

Comment 52

18 years ago
vidur: I think we still make decisions based on DOCTYPE at the tokenizer
level, e.g. when an XHTML document is served as text/html. Though I have to double-check. 

Comment 53

18 years ago
Reassigning this to vidur, because it now seems we want to fix this problem using 
the parser charset sniffing code vidur/harishd put together. 
Please make sure not to check in 
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=37925
http://bugzilla.mozilla.org/showattachment.cgi?attach_id=38001

because those two patches may cause other undesirable side effects.
Assignee: ftang → vidur
Status: ASSIGNED → NEW

Comment 54

18 years ago
vidur: why the 2k limit?  why not scan everything in the first ODA?
(Assignee)

Comment 55

18 years ago
Darin: the 2k limit was arbitrary and seemed like a reasonable boundary within
which one would expect a META tag. Is there an upper bound on the size of the
first buffer? Since the sniffing is not particularly processor intensive, maybe
the question itself is moot.

Comment 56

18 years ago
I ran an opt. mozilla build, with and without vidur's patch, on 3
machines (source pulled 06/18 evening, except for the 266 machine,
which was done with 06/13 source).

I tested against my page load test, which (since the original content
didn't have any) doesn't have any 'meta charset' tags that aren't
iso-8859-1. [Also note that in my test, the top level document is
never cached]. The results for those tests showed no difference in
page load times, which is the ~expected result.

I also ran against the ibench content, configured on an internal
server, and got these times.


win98 800MHz/128MB
                            #1        #2
Without patch. 
  All iterations          132.26    131.66     
  First iteration          26.14     25.98      
  Subsequent iteration     15.16     15.10   

With patch:                                   % Change
  All iterations          129.41    128.91      -2.1%   
  First iteration          18.29     18.13     -30.1%   
  Subsequent iteration     15.87     15.83       4.8%


win2k 500MHz/128MB
                            #1        #2
Without patch. 
  All iterations          271.03     270.72       
  First iteration          34.59      34.27        
  Subsequent iteration     33.78      33.78     
			                    
With patch:                                    % Change
  All iterations          247.91     248.42       9.2%
  First iteration          48.16      48.47     -28.7%
  Subsequent iteration     28.54      28.57      18.3%


win98 266MHz/128MB
                            #1              
Without patch.                              
  All iterations          568.12                           
  First iteration          94.50                            
  Subsequent iteration     61.43                         
                                                        
With patch:                                    % Change
  All iterations          490.81                -13.6%   
  First iteration          67.66                -28.4%   
  Subsequent iteration     61.34                  0.0%


So, by sniffing and avoiding the need to reload, with the negative
consequences that reloading has, particularly on image loads, the 'First
Iteration' times drop dramatically on all machines/speeds. But, 
at least on the 500 and 800 systems (and in waterson's earlier note),
this raises the times when pulling out of the cache. 

I don't really understand why this is happening, but it is, at least
as measured. (I wasn't able to determine if this was some odd artifact
of the way ibench does the test, but this is the real amount of time
that it is taking to complete the sequence of steps [i.e., there is no
time miscount for the given task]. I still think there is something else
odd, since, if I immediately rerun the test without shutting down, my 
cached times drop from the initial results.)

One thing I confirmed (with good ol' printf) was that we are calling
nsParser::DetectMetaTag even when pulling the document from the
cache. So it looks like we need to cut that out (since it is costing us).
(Assignee)

Comment 57

18 years ago
John's right - we *are* calling nsParser::DetectMetaTag (and the original
intl-based meta tag observer) during subsequent loads. For some reason, the
charset retrieved from the cache is considered to be of a lower weighting than
one determined from a meta tag (see the charset source enum in nsIParser.h).
Switching the weightings may make sense, but further investigation shows that
the charset retrieved from the cache may not be correct in the first place.

I'll post again when I have more details.
(Assignee)

Comment 58

18 years ago
After talking with ftang, I've made the following changes:

1) The original patch that I posted just recorded the sniffed charset in the
parser. This charset wasn't percolated up as it should have been. We now notify
the document and its charset observers of the detected charset.
2) The original code that set the charset in the cache relied on a reload
occurring. We now set the cache meta data directly in the parser.
3) The weighting of a charset recorded in the cache meta data was lower than that
of the sniffed charset. This is no longer the case - we use the cached value rather
than resniffing.

I'll post the new patch momentarily. Unfortunately, none of this explains the
performance results... unless rerunning the sniffing code on a subsequent load
somehow explains the performance degradation.
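
For readers following along, here is a small standalone sketch of the source-weighting idea behind change 3. The enum below is an illustrative assumption: the real charset source constants live in nsIParser.h, and their names and ordering differ; the whole point of the change (and of the later debate) is where kCharsetFromCache sits relative to the other sources.

  #include <string>

  // Illustrative ordering only; higher values win. In this (post-patch)
  // ordering, the cached charset outranks META sniffing and autodetection,
  // but a user-forced charset still outranks the cache.
  enum CharsetSource {
    kCharsetUninitialized = 0,
    kCharsetFromDocumentDefault,
    kCharsetFromAutoDetection,
    kCharsetFromMetaTag,
    kCharsetFromCache,
    kCharsetFromUserForced
  };

  // Adopt a newly determined charset only if its source outranks the one we
  // already have. With kCharsetFromCache ranked above kCharsetFromMetaTag,
  // a cached load keeps the stored charset instead of resniffing the META tag.
  bool MaybeSetCharset(std::string& charset, CharsetSource& source,
                       const std::string& newCharset, CharsetSource newSource) {
    if (newSource <= source) return false;  // equal or lower priority: ignore
    charset = newCharset;
    source = newSource;
    return true;
  }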
(Assignee)

Comment 59

18 years ago
Created attachment 39384 [details] [diff] [review]
patch v1.2 after a conversation with ftang
(Assignee)

Comment 60

18 years ago
I've found a problem that *could* account for some part of the performance
degradation we're seeing in the cached case. I wasn't normalizing the name of
the charset sniffed from the META tag. As a result, there's a chance that we
might have been going down a different intl charset conversion path than we did
before. A patch to fix this problem can be found below.
(Assignee)

Comment 61

18 years ago
Created attachment 39441 [details] [diff] [review]
patch v1.3 with charset name normalization

Updated

18 years ago
Whiteboard: [PDT+] → [PDT+] patch available

Comment 62

18 years ago
(vidur: I did a little clobbering in htmlparser and intl, and my earlier 
problem with loading went away. Who knows.)

At any rate, I've tried patch 'v1.3' and I am still getting elevated times
for the "subsequent" page loads.

Comment 63

18 years ago
*** Bug 86289 has been marked as a duplicate of this bug. ***

Updated

18 years ago
Whiteboard: [PDT+] patch available → [PDT+] patch in progress

Comment 64

18 years ago
Okay, I tried out a couple of builds from vidur, one being trunk, and the 
other with patch v1.3 above. I ran them on four machines: a lower-end Win98,
a midrange win2k, a midrange win98 and a higher-end win98.

                                          Delta
win98/266MHz/128MB                      
   All iterations          509.2  484.0   -5.0%
   First iteration          90.1   65.9  -26.9%
   Subsequent iteration     59.8   59.7   -0.3%
                        	       
win2k/500MHz/128MB		       
   All iterations          244.3  266.8    9.2%
   First iteration          46.8   33.5  -28.4%
   Subsequent iteration     28.2   33.3   18.1%
                        	       
win98/500MHz/128MB                     
   All iterations          232.0  235.8    1.7%
   First iteration          46.1   32.1  -30.3%
   Subsequent iteration     26.5   29.1    9.6%
                        	       
win98/800MHz/128MB                     
   All iterations          133.2  129.1   -3.1%
   First iteration          27.8   18.2  -34.5%
   Subsequent iteration     15.0   15.8    5.2%

...

Comment 65

18 years ago
(Just for the record, I mentioned this to vidur yesterday...) Are the cache 
entries (on disk) identical both with and without the change? Could we be 
failing to properly initialize the cache entry?
(Assignee)

Comment 66

18 years ago
Yeah, I started looking at this yesterday. The cache meta data seems to be the
same - at least for the first several sites in the test. I'll do some more
comparisons and post if I see anything different.

Comment 67

18 years ago
Converting from PDT+ to nsBranch.  Let's see this on the trunk for a little
while before we bring it to the branch.  I appreciate the effort that's gone into
this; it's obviously a very hard problem.
Keywords: nsBranch
Whiteboard: [PDT+] patch in progress → patch in progress
(Assignee)

Comment 68

18 years ago
Moving to 0.9.3 since we haven't yet resolved the cache loading issue.
Target Milestone: mozilla0.9.2 → mozilla0.9.3
(Assignee)

Comment 69

18 years ago
I've spent some time discussing this with Darin and Frank and we're all stumped.
Other findings include:
1) The cache entries are essentially identical for runs with and without the
patch. There's a slight difference in ordering of cache meta data, but the meta
data itself and the set of files in the cache are identical.
2) I primed the cache with an ibench run on a build without the patch and then
ran the tests again with both builds. I still see the 4-5% difference for cached
loads on my 700MHz Windows 2k system. This, along with the previous finding,
absolves the cache entries (and the process of writing them out) from any direct
cause.
3) I ran a TCP sniffer to confirm that we aren't going back to the network for
specific files with a patched build. For both builds (with and without the
patch) we get no cache misses.
4) I stepped through new and existing code with both Frank and Darin. Darin and
I convinced ourselves that the code path for cached reads isn't substantially
different with the patch. In fact, given the numbers that John has come up with,
it doesn't sound like the problem can be correlated with additional processing (this
is substantiated by Quantify runs I did that showed no substantial differences),
but possibly with additional I/O. We have yet to understand what might cause us
to spend additional time doing I/O.

At this point, my inclination is to check the patch into the trunk and, pending
a bit of testing on the trunk, get it onto the Netscape post-0.9.2 branch. The
initial load time improvement for files with a META tag is significant and the
cached load problems don't seem to manifest themselves in jrgm's regular load
time tests (or so I've been told).

Can I get r's and sr's? 

Comment 70

18 years ago
r=harishd

Comment 71

18 years ago
sr=waterson. I spent some time looking at before and after jprof profiles, but
nothing really sticks out. I'll attach them here for posterity's sake.

Comment 72

18 years ago
Created attachment 40350 [details]
profile of iBench run before applying patch (.html.gz)

Comment 73

18 years ago
Created attachment 40352 [details]
profile after applying the patch
(Reporter)

Comment 74

18 years ago
vidur, are you landing this to trunk soon?

Comment 75

18 years ago
In patch v1.3, the position of kCharsetFromCache has been changed. I am not 
sure this is really necessary; I read through the bug, and it does not seem 
to be. This change will cause an undesired side effect. Suppose a web page 
has no charset meta tag, the user has not chosen any charset detector, the 
default charset is incorrect, and the page is displayed incorrectly. 
Now the user switches on a charset detector, but no change will be made on the 
subsequent load, since the charset from the cache takes precedence over 
autodetection. This is really confusing to users. I would suggest that we take 
out that change before checking this in to CVS. 
(Assignee)

Comment 76

18 years ago
The patch went into the trunk on 6/29.

The movement of kCharsetFromCache was required to prevent resniffing of the META
tag on cached loads. It's debatable whether the charset sniffing itself is
significant enough in the scheme of things to make a difference...but it seemed
like a good thing to eliminate, especially in light of the performance we're
getting for cached loads.

While the case that Shanjian mentions will result in unexpected behavior,
raising the priority of the charset in the cache helps other cases. For example,
it preserves any charset that's forced by the user through the charset menu.
Prior to the change, the forced charset was trumped by (possibly incorrect)
autodetection.
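
To make the trade-off concrete, here is a tiny self-contained sketch (hypothetical weights, with the same caveat as the earlier sketch about the real nsIParser.h ordering) of the scenario under debate: once the cached source outranks autodetection, a newly enabled detector loses on the cached reload, while a user-forced menu choice still wins.

  #include <cstdio>

  int main() {
    // Hypothetical weights matching the patched ordering discussed above.
    const int kFromAutoDetection = 2, kFromCache = 3, kFromUserForced = 4;

    int docSource = kFromCache;  // charset restored from the cache meta data

    // User turns on a charset detector and reloads: the detector's result loses.
    bool detectorWins = kFromAutoDetection > docSource;
    // User forces a charset from the menu instead: that still wins.
    bool menuWins = kFromUserForced > docSource;

    std::printf("detector overrides cached charset: %s\n", detectorWins ? "yes" : "no");
    std::printf("menu choice overrides cached charset: %s\n", menuWins ? "yes" : "no");
    return 0;
  }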

Comment 77

18 years ago
I disagree with you about the cache change. Yes, it improves some other 
cases; because of that, I once tried to make the same change myself. But soon 
afterwards I found the gain is not worth the loss, and I had to back out my change.
Please reconsider the scenario I raised; it is a very common one. Users only 
try to change the charset/detector after they see a page displayed incorrectly. 
If that does not work, the whole charset menu is not very useful. In my opinion, 
that is unacceptable. 

Comment 78

18 years ago
Adding the vtrunk keyword to indicate the bug has been fixed on the trunk.  The bug is left
open to track the checkin to the branch (nsbranch) when appropriate.

Once the bug has been fixed on the branch as well, please remove the vtrunk keyword.
Keywords: vtrunk

Comment 79

18 years ago
The change in this area really worries me now. This is not
just an engineering issue. It is a usability issue. I would like
to know what menu action changes will be affected by the proposed fix. 

> While the case that Shanjian mentions will result in unexpected behavior,
> raising the priority of the charset in the cache helps other cases.

What does this mean? Does it mean that the cached charset is not 
overridable? We need more explanation of how the proposed fix will 
change the known behavior. Without additional explanation and consideration
of typical scenarios, it would be very difficult to evaluate the real 
worth of the fix proposed here.
I propose a meeting to discuss the issues involved.
(Assignee)

Comment 80

18 years ago
The cached charset is overridable by a charset forced by the user. To be frank,
I'm not sure what the correlation is between the different charset-related menu
items and the charset source enumeration. 

I'm willing to attend a meeting set up by the i18n group.

Comment 81

18 years ago
Yes, even after your change, a user-forced charset will override the cached charset 
value. But a newly selected autodetection result will not be able to override the 
cached charset; that's the problem. 

Comment 82

18 years ago
> Yes, even after your change, a user-forced charset will override 
> the cached charset value. But a newly selected autodetection result 
> will not be able to override the cached charset; that's the problem. 

I experimented with the trunk build with the current
patch in it, and I find that auto-detect cannot
override the cached value.
We have had a spec on charset menu actions out there since
before NS 6.0/Mozilla M18. The spec calls for both a
user-initiated charset menu change and an auto-detect menu change to
override the HTTP, document-based META charset, or
cached charset value. 

The fact that auto-detection cannot override the cached value as
it stands in the trunk build is not acceptable. We have a consistent
meaning for user-initiated menu actions, i.e. give them the 
priority they deserve. Not only is this the correct thing to
do, it is also what international users deserve. Instead of forcing
the user to try out 3 or more menu items to display an incorrectly
loaded page, auto-detect override lets you do this in one move.
IE of course provides this, and so have we until
this patch.
I request that we bring back the ability for auto-detect to
override the cached value. What we have currently is simply
a regression against the spec we agreed on; you should not
change the user spec.
(Assignee)

Comment 83

18 years ago
Also, refer to the patch in bug 88746 (a spin-off from this bug) that fixes
a problem in the DetectMetaCharset() method.

Comment 84

18 years ago
We found several problems with the current landing on the trunk. Please do not 
land it on the branch until we address all of them.
I think we should at least do the following on the trunk:
1. Back out the nsIParser.h change. Don't change the charset source priority; it 
is too risky, and while we have both meta charset and auto-detect, it may cause 
some problems.
2. Add "ignore UTF-16, etc." code to the parser so we just ignore those charsets, 
like what we did in nsMetaCharsetObserver. 

Updated

18 years ago
Blocks: 88746
(Assignee)

Comment 85

18 years ago
Fixes in bugs 89155, 89177 and 89169 are related to getting this patch onto the
branch. The last involves reverting the weighting of the charset out of the
cache back to its original value. A fallout of this will be resniffing of the
META tag even for cached loads. Based on testing we did when we were looking at
the cached loading performance issues a while ago, I believe there won't be
a measurable performance impact from the resniffing.

Comment 86

18 years ago
Please also refer to the patch in bug 89732.
Blocks: 89732

Comment 87

18 years ago
Putting PDT+ back; it looks like we have something in hand now.
Whiteboard: patch in progress → PDT+, patch in progress

Updated

18 years ago
Depends on: 90479

Comment 88

18 years ago
Any update on this bug?  Is it ready to go in yet?

Comment 89

18 years ago
It's almost ready to go. I'm incorporating the patches into my branch build.
Once it's done ( and compiled ) I'll be ready to land. However, I will not be
able to land the changes today.

Comment 90

18 years ago
Created attachment 42182 [details] [diff] [review]
patch v1.4 [ patches integrated ]

Comment 91

18 years ago
Oops, the previous patch was from me not pollmann!!!

Comment 92

18 years ago
fix landed on the branch.

Comment 93

18 years ago
Now that the patch is on the branch and the trunk, marking bug fixed.
Status: NEW → RESOLVED
Last Resolved: 18 years ago
Resolution: --- → FIXED

Updated

18 years ago
Keywords: vtrunk

Comment 94

18 years ago
Verified the fix together with Frank on the 07-17 Win32 branch build.
Status: RESOLVED → VERIFIED