Closed Bug 82244 Opened 21 years ago Closed 21 years ago

PDT+ ibench regression between 5/18-5/21

Categories

(Core :: Networking: Cache, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.3

People

(Reporter: cathleennscp, Assigned: tetsuroy)

References

()

Details

(Keywords: perf, Whiteboard: checked in.)

Attachments

(5 files)

paw did an additional test on ibench and we got some interesting results...


machine used: Win98, 266MHz, 128MB RAM 

                         5/18                  5/18
                                              (with 5/22 nkcache.dll)
                   -----------------          ------------------------
all iteration            607.80                  937.25
first (no cache)          85.63                  118.31
subsequent                74.60                  116.99

paw re-ran the 5/18 build just to verify that it gets the same results we got 
before.  Then he removed nkcache.dll and replaced it with the new nkcache.dll 
from 5/22, deleted and regenerated component.reg, removed the cache (created a 
new profile), ran ibench, and generated the results.
getting this on the 0.9.1 target milestone radar...
Summary: ibench regression → ibench regression between 5/18-5/21
Target Milestone: --- → mozilla0.9.1
Don't think it can make 0.9.1, since we don't yet know the details of what's
causing it, let alone the fix. 
Target Milestone: mozilla0.9.1 → mozilla0.9.2
I'll be looking into this this evening.
I tried the above "removed [5/18] nkcache.dll and replaced with new [5/21]
nkcache.dll, deleted and regenerated component.reg, removed cache", and ran
with the page-loader.

But ... after deleting the component.reg and running, I ran with '-console',
and it said:
nsNativeComponentLoader:SelfRegisterDll(C:\JRGM\051813\components\nkcache.dll)
Load FAILED with error: error 31

So (for whatever reason), this means I done got no cache. We already know that
with no cache on a fast network we are faster, particularly compared to the
pre-L2 cache, where we were doing even more overhead work.

When I ran in this state, compared to running a 5/18 'as-installed', I got the
following times with the page-loader
                                               #1        #2       #3
  5/18 'as-installed'                    :    2271      1620     1631
  5/18 + 5/21 nkcache.dll (unregistered) :    1650      1669     1644 

Okay, so ...
1) I'm faster in my test, after "breaking" the cache.
2) My cached and uncached times are ~equal
3) point (1) is inconsistent with paw's numbers above
4) point (2) is consistent with paw's numbers, in that there is no effective 
   difference between his cached and uncached times (i.e., he had no cache)

One thing to consider: I reviewed the server logs, and noticed that in the
'has no cache' state, every image is fetched *including* images mentioned more
than once on a page (e.g., 'pixel.gif' is fetched 59 times on every visit to
netscape.com). Perhaps a 266 machine is more affected by this than a 500MHz 
machine, leading to the overall slowdown. [I don't really believe this].

I guess I'm going to start poking at the tests some more. (There's always a 
reason, there's always a reason, there's no place like home ...).
> 2) My cached and uncached times are ~equal

2) My supposedly "cached" and "uncached" times are ~equal
So, I ran the ibench test with the 500MHz win98 PC, and I got:
                       
  05/21                      05/18                     
    Total          301.1       Total          291.98
    First iter.    56.63       First iter.     46.52
    Subsq. iter.   34.92       Subsq. iter.    35.07

These are similar to paw's results on the other machine (modulo machine power).


I grabbed the server logs from the test server, and after a little slicing
and dicing, it turns out that (with post 5/18 builds) we are not caching images 
properly when they appear in more than one page. e.g., if '/path/to/some.gif' 
is used in two different pages, it is fetched once for each page, and not from 
the cache. (I have more detailed data that shows this happening).

Now, the zdnet pages serve up a lot of pages that are identical in structure,
and which reuse the same graphics page to page. So, effectively, very few of
the zdnet pages are in a purely uncached state (some content that they need
has been previously pulled in for earlier pages). But, in my test, each page 
has a unique set of images, so the uncached round is a purely uncached round, 
not some mixed state. 

That is why my results show a lower number (we *are* better at the initial
download of content), but the ibench times go "up" (we aren't caching
'cross-page' effectively).
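The cross-page analysis above can be sketched with a toy model (illustrative only; the names and structure here are invented for this sketch, not actual Mozilla code): when cross-page caching is broken, images shared between pages, like pixel.gif, are refetched on every page, while a test whose pages each use a unique set of images sees no difference between the two modes.

```cpp
#include <set>
#include <string>
#include <vector>

// Counts network fetches across a sequence of page visits.  `crossPageCache`
// models whether an image cached on one page is reused on later pages (the
// behavior that regressed after 5/18, per the server-log analysis above).
// Within-page duplicates are always deduplicated in this simplified model.
int FetchCount(const std::vector<std::vector<std::string>>& pages,
               bool crossPageCache) {
  int fetches = 0;
  std::set<std::string> cached;          // survives across pages
  for (const auto& page : pages) {
    std::set<std::string> seenThisPage;  // per-page only
    for (const auto& img : page) {
      bool hit = crossPageCache ? cached.count(img) > 0
                                : seenThisPage.count(img) > 0;
      if (!hit) ++fetches;               // simulate a network fetch
      seenThisPage.insert(img);
      cached.insert(img);
    }
  }
  return fetches;
}
```

With pages that share pixel.gif, the broken mode costs one extra fetch per reuse; with unique images per page, both modes fetch the same count, matching why the page-loader numbers above did not regress while the zdnet ibench numbers did.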
Keywords: perf
btw, daily ibench results are posted internally here:
http://slip/projects/mojo/perf/i-bench.html

we found out dropping the new cache dll into an old build didn't work, since 
some base string API got changed over the weekend, so the new cache dll 
couldn't work with Friday or older builds.  The numbers paw got earlier 
reflect no-cache runs.

cathleen's ibench results:

5/24 commercial bld        my optimized commercial bld 
                                         (with pav's changes backed out) 
             ---------------------      ----------------------- 
all              136.01                          126.25 
first             26.25                           19.13 
subsequent        15.68                           15.30 



and also, 

gordon's iBench results:
optimized Mac build on 500MHz G4 Titanium PowerBook:
              
L1 Disk Cache (120 k) 
                 Run #1   Run #2   Average 
Total            403.02   405.85   404.435 
First iter.       76.98    75.86    76.42 
Subsq. iter.      46.58    47.14    46.86 
  
L2 Disk Cache (112 k) 
                 Run #1   Run #2   Average   % speedup 
Total            384.32   380.25   382.285     5.79% 
First iter.       71.49    70.15    70.82      7.91% 
Subsq. iter.      44.69    44.30    44.495     5.32% 
gagan will help investigate what's causing pav's changes to regress ibench 
results.

attached the diff to back out pavlov's 2-line change for this bug, in case 
anyone else is interested in helping out.
We talked about this yesterday, but I just wanted to make sure I got it on the
record. I _think_ Pavlov's changes are doing the correct thing. It sounds like
the real bug is that somebody is telling imagelib to force-invalidate. Even
though it makes us slower, I think that pav's changes should stay.
Seems like the only places where we set VALIDATE_ALWAYS are either through
Reload or the browser.cache.check_doc_frequency preference set to
validate-always. Otherwise, someone is not initializing these flags correctly
somewhere... and we are getting garbage.

I agree Pav's changes are correct and should stay.

Can someone verify that in the lab we are not setting validate always in the
cache prefs?
Perhaps this is an interaction of Gordon and Pav's checkins?  Maybe the L2 cache
doesn't do the right thing here, but Pav's bug masked the problem.  Maybe when
Pav fixed his problem, he exposed a bug in the new cache?

Just randomly guessing here.  Maybe it's some other checkin that caused the
force validate.

 
pav is doing the right thing.  the problem here is that we reload each top level
document because we detect a charset change in a meta html tag.  the docshell
is instructed to do a reload just as it would do if the user pressed reload,
which results in the VALIDATE_ALWAYS flag being set.  In the old world, pav
didn't honor this flag and would just happily reuse his cached images.  Now
that he's fixed that bug, we're once again seeing the ramifications of the 
meta charset issue.
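The interaction described above can be sketched with a minimal model (hypothetical names and structure; the real logic lives in imagelib and necko): once the image cache honors VALIDATE_ALWAYS, any load carrying that flag bypasses cached entries and goes back to the network, which is exactly what happens on a charset-triggered reload.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Illustrative stand-in for nsIRequest::VALIDATE_ALWAYS; the real constant
// and cache live in necko/imagelib.
constexpr uint32_t VALIDATE_ALWAYS = 1u << 0;

struct ImageCache {
  std::map<std::string, std::string> entries;  // url -> decoded image data
  int networkFetches = 0;

  // With pav's fix, imagelib honors VALIDATE_ALWAYS, so a charset-triggered
  // reload (which sets that flag) bypasses every cached image and refetches.
  std::string Get(const std::string& url, uint32_t loadFlags) {
    auto it = entries.find(url);
    if (it != entries.end() && !(loadFlags & VALIDATE_ALWAYS))
      return it->second;                 // reuse the cached copy
    ++networkFetches;                    // simulate hitting the network
    return entries[url] = "bits:" + url;
  }
};
```

In this model, two plain loads of the same image cost one fetch, but a load with VALIDATE_ALWAYS set always costs another fetch even though the entry is cached, mirroring the regression seen in the ibench numbers.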
This will not be fixed by adding overlapped I/O to the disk cache.  I don't see 
any reason why the images shouldn't be reused just because the text is being 
interpreted as a different charset.  Isn't there something else docShell can do 
besides a full reload?
cc'ing international folks.  We need help fixing the charset meta reload
problem.  ftang, can you help?  Looks like it has surfaced again with pav's
fix, which is the correct thing to do. (bug 81253)

pavlov/darin, is there any way we can fix the docshell so that we will use
cached images?  Also, is there anything we can do to use images from the cache
between different pages that request the exact same images?

Depends on: 81253
assigning to pavlov.  
Assignee: gordon → pavlov
Refer to my comment in 81253: why is it the right thing to reload images for a
charset reload? That's certainly not true. If pav's fix covers a larger scope 
than charset reload, should we use a flag to indicate it is a charset reload 
and skip image reloading? 
yes exactly.. the solution is to figure out where to put/define that flag.
Look at bug 83721. It is even worse when you do a shift-reload.
I posted a patch that adds a flag for charset reload and passes it to docshell. 
Inside docshell, this charset reload is treated as a normal reload except that 
nsIRequest::VALIDATE_ALWAYS is not set. This should provide a base for a final 
patch that might eventually fix this problem. Let me know if I can be of any 
further help.

Since we all agreed that the page did need to be reloaded for charset change, I 
will close 81253.
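The flag mapping the patch describes can be sketched as follows (a simplified stand-in, not the actual docshell code; LOAD_RELOAD_CHARSET_CHANGE and the flag values are modeled on the names discussed in this bug):

```cpp
#include <cstdint>

// Hypothetical, simplified version of the docshell load-type switch.
// Real flags: nsIRequest::VALIDATE_ALWAYS, nsIRequest::LOAD_FROM_CACHE;
// LOAD_RELOAD_CHARSET_CHANGE is the new load type shanjian's patch adds.
enum LoadType { LOAD_NORMAL, LOAD_HISTORY, LOAD_RELOAD_NORMAL,
                LOAD_RELOAD_CHARSET_CHANGE };

constexpr uint32_t VALIDATE_ALWAYS = 1u << 0;
constexpr uint32_t LOAD_FROM_CACHE = 1u << 1;

uint32_t NeckoFlagsFor(LoadType t) {
  switch (t) {
    case LOAD_RELOAD_NORMAL:
      return VALIDATE_ALWAYS;   // user reload: revalidate everything
    case LOAD_RELOAD_CHARSET_CHANGE:
      return 0;                 // charset reload: no forced validation
    case LOAD_HISTORY:
      return LOAD_FROM_CACHE;   // history navigation: prefer cached content
    default:
      return 0;                 // normal load: default validation policy
  }
}
```

The key distinction is that a charset reload behaves like a normal reload in every respect except that it no longer forces revalidation, so cached images can be reused.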
r=darin on the docshell patch
r=gagan as well. darin could you do the sr? 
Keywords: patch
Priority: -- → P3
i defer to rpotts for the sr= on this one.  i'm not so familiar with the
docshell code.
Radha,

Can you look at the attached patch - especially the checks around the calls to 
SetCacheKey(...).

Does this look right?  If so, then sr=rpotts :-)
I think the calls around SetCacheKey() are harmless. By the time Reload() is
called upon charset detection, the right SH entries are set and the cache key
is obtained from the right entry.
looks like the patch for this bug has been reviewed.  who should be the real 
owner now to check this in?
After thinking about this patch a bit more... I have one question:

Why is the new 'LOAD_RELOAD_CHARSET_CHANGE' load flag different from the 
existing 'LOAD_HISTORY' flag?

It seems like in the charset reload case, you want to pull as much of the 
content from the cache as possible - just like the history case...

If this is true, then you don't need to add all the code to deal with the new 
load flag... Instead, in nsWebShell::ReloadDocument(...) you can simply do:
    return Reload(LOAD_HISTORY);

What am I missing?
-- rick
rick: i think you're right on about that!  shanjian?
reassign to shanjian, since he owns the patch for fixing this bug...  :-)
Assignee: pavlov → shanjian
shanjian is a new father of his 2nd baby now. He won't be available till 5/26
cathleen- sorry to reassign this bug back to you. Shanjian just had a new baby. 
He will be on a two-week vacation to take care of the baby for his wife, so he 
will be offline for two weeks. Is it possible for you to find someone to work 
on the remaining issue? I think you can call him and ask questions (see the 
phonebook for his telephone number) if you need to. But I don't think he can 
work on the patch with a handful of diapers. 

I am not familiar with the code in that patch and am also overloaded right now. 
Otherwise, I would jump in...
Assignee: shanjian → cathleen
Based on previous bug comments, we can either 1) take shanjian's 
already-reviewed patch and get approval for check-in, or 2) make a new patch 
replacing the LOAD_RELOAD_CHARSET_CHANGE flag with the LOAD_HISTORY flag and 
undo the other now-unnecessary stuff in docshell.

ftang, since you're the expert in charset code, you should own this bug, 
review, and decide what we need to do.
Assignee: cathleen → ftang
if changing nsWebShell::ReloadDocument(...) to use LOAD_HISTORY instead of 
LOAD_NORMAL will fix the problem, then I would definitely go with it rather 
than the current patch...

Changing the load flag is an isolated '1-line change' which only affects the 
codepath when a meta charset reload is occurring...

The current docshell patch is much bigger and affects the code paths of *all* 
URL loads.

"simple is better"
vidur is also working on a patch for the parser to sniff the charset tag; see 
bug 81253.  That is another way of fixing the reload-twice problem (should we 
go with that one instead??)

pavlov needs a fix in imglib to not throw away the images from the image 
cache, bug 85978
Depends on: 85978
vidur's fix should help the common case of preventing the Reload (assuming the 
meta charset tag is in the first buffer of data...)  However, I think that we 
still need to fix the Reload case too.

I never got a clear answer to my question of whether changing LOAD_NORMAL to 
LOAD_HISTORY actually fixes the problem...  If so, I think that we should make 
this change too..
> I never got a clear answer to my question of whether changing LOAD_NORMAL to 
> LOAD_HISTORY actually fixes the problem...  

ftang, can you help??
rpotts-
I am not quite sure what exactly you mean by "changing 
nsWebShell::ReloadDocument(...) to use LOAD_HISTORY instead of 
LOAD_NORMAL".
In shanjian's patch, it seems he changed LOAD_FLAGS_NONE to 
LOAD_FLAGS_CHARSET_CHANGE (which he added earlier in the patch).
It seems LOAD_FLAGS_NONE is defined in 
http://lxr.mozilla.org/seamonkey/source/docshell/base/nsIWebNavigation.idl#83

I cannot find a LOAD_HISTORY in that file. 
Status: NEW → ASSIGNED
rpotts- I'm reassigning to you.
Reassign back to me after you put the patch in. Thanks
Assignee: ftang → rpotts
Status: ASSIGNED → NEW
rpotts- any progress? need my testing?
Rick, the second form looks preferable to me as long as equating history to
cache is safe, i.e., it works even when global/session history isn't there
(e.g., in embedding) or is set to 0 entries.
Frank, i'm reassigning this back to you for testing and checkin...

After talking with Adam and Radha, we decided that both patches have the same
issues if session history is disabled.  In either patch, if a session history
entry is not available (ie. history is disabled), the document will be reloaded
from the network - even if it has been cached.  This is because the cache key is
obtained from the session history entry :-(

If this behavior is unacceptable, we can come up with a further patch to fix
this issue...
Assignee: rpotts → ftang
It seems to load the charset correctly with my local build. I cannot verify 
whether the patch loads from the cache or the network. I will ask i18ngrp to do 
more testing on different platforms this afternoon.
You may want to set the Session History length to 0 entries in Prefs and load a
character-encoded  page again to see if you get the desired result.
I applied the last patch on my Macintosh and tried two URLs.
http://home.netscape.com/ja and http://www.lycos.co.jp both have META charset.
The pages loaded fine. I tried again with history length to zero, that also
loaded fine.
adamlock and radha, please r=
vidur, please sr=
Whiteboard: rpott's patch need r= and sr=
sr=vidur.
Whiteboard: rpott's patch need r= and sr=
I tested it on Linux and it works fine. 
Comparing differences between the last patch (from rpotts) and the patch from
6/4/01 14:06 (that has LOAD_RELOAD_CHARSET_CHANGE), I notice that this segment
of code is missing in the last patch 
@@ -4629,6 +4645,9 @@
 
     case LOAD_RELOAD_NORMAL:
         loadFlags |= nsIRequest::VALIDATE_ALWAYS;
+        break;
+
+    case LOAD_RELOAD_CHARSET_CHANGE:
         break;

i.e., in the latest patch, since the LOAD_HISTORY flag is used for charset
changes, it will be loaded with the nsIRequest::LOAD_FROM_CACHE loadFlags. Is
that alright? In the previous patch, no flags were passed to necko for a
RELOAD_CHARSET_CHANGE. If this is alright, r=radha
I think we want to load it from the cache in this case. However, I'd prefer 
rpotts to answer that explicitly since it's his patch. Once that is answered, 
we can ask for a= and wait for check-in.
Testing on Mac/Linux/Win shows it displays CJK correctly with the meta tag.
cathleen- from the previous comment, I assume we don't want to check in the 
first two patches, which change imglib, right?
Whiteboard: r=radha, sr=vidur, wait for rpotts to answer one Q before ask for a=
r=adamlock as well
I think loading from the cache is appropriate, and the latest patch from rpotts
does that. I do not understand why the previous patch did not pass any load
flags to necko. 
assigned
Whiteboard: r=radha, sr=vidur, wait for rpotts to answer one Q before ask for a= → r=radha, sr=vidur, ask a= since 13:25 6/21
a=chofmann in email
Status: NEW → ASSIGNED
Whiteboard: r=radha, sr=vidur, ask a= since 13:25 6/21 → r=radha, sr=vidur, a=chofmann. Wait for tree green since 5:51 6/21
fix check in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: r=radha, sr=vidur, a=chofmann. Wait for tree green since 5:51 6/21 → r=radha, sr=vidur, a=chofmann.
Whiteboard: r=radha, sr=vidur, a=chofmann. → r=radha, sr=vidur, a=chofmann. critical for 0.9.2
The check-in caused a regression, bug 87700 (HTML anchors do not work).
Reopening this; please consider a different change for this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 87700
Whiteboard: r=radha, sr=vidur, a=chofmann. critical for 0.9.2 → new patch need review. old patch cause regression on 87700
The new patch seems to work fine. 
rpotts- can you code-review it?
Status: REOPENED → ASSIGNED
Whiteboard: new patch need review. old patch cause regression on 87700 → need r= from rpotts
r= radha for the latest patch
sr=rpotts
Severity: normal → critical
Summary: ibench regression between 5/18-5/21 → PDT+ ibench regression between 5/18-5/21
Whiteboard: need r= from rpotts → r=radha sr=rpotts, wait for a= since Wed, 27 Jun 2001 13:30:21 -0700
asked yokoyama to land on the trunk.
yokoyama, please do not close this bug after you land on the trunk. I need this 
open to land on the moz0.9.2 branch.
pushing out. 0.9.2 is done. (querying for this string will get you the list of
the 0.9.2 bugs I moved to 0.9.3)
Target Milestone: mozilla0.9.2 → mozilla0.9.3
landed the new change on the moz0.9.2 branch
reassigning to yokoyama for trunk landing.
yokoyama- once you land on the trunk, mark this and 87700 fixed. thanks.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Whiteboard: r=radha sr=rpotts, wait for a= since Wed, 27 Jun 2001 13:30:21 -0700 → waiting for the trunk to open.
checked in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: waiting for the trunk to open. → checked in.
*** Bug 17889 has been marked as a duplicate of this bug. ***