Closed Bug 82244 Opened 21 years ago Closed 21 years ago

PDT+ ibench regression between 5/18-5/21

Categories

(Core :: Networking: Cache, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla0.9.3

People

(Reporter: cathleennscp, Assigned: tetsuroy)

References

()

Details

(Keywords: perf, Whiteboard: checked in.)

Attachments

(5 files)

paw did an additional test on ibench and we got some interesting results...


machine used: Win98, 266MHz, 128MB RAM 

                         5/18                  5/18
                                              (with 5/22 nkcache.dll)
                   -----------------          ------------------------
all iteration            607.80                  937.25
first (no cache)          85.63                  118.31
subsequent                74.60                  116.99

paw re-ran the 5/18 build just to verify that it gets the same results we got 
before.  Then he removed nkcache.dll and replaced it with the new nkcache.dll 
from 5/22, deleted and regenerated component.reg, removed the cache (created a 
new profile), ran ibench, and generated the results.
getting this on the 0.9.1 target milestone radar...
Summary: ibench regression → ibench regression between 5/18-5/21
Target Milestone: --- → mozilla0.9.1
Don't think it can make 0.9.1, since we don't yet know the details of what's
causing it, let alone the fix. 
Target Milestone: mozilla0.9.1 → mozilla0.9.2
I'll be looking into this this evening.
I tried the above "removed [5/18] nkcache.dll and replaced with new [5/21]
nkcache.dll, deleted and regenerated component.reg, removed cache", and ran
with the page-loader.

But ... after deleting the component.reg and running, I ran with '-console',
and it said:
nsNativeComponentLoader:SelfRegisterDll(C:\JRGM\051813\components\nkcache.dll)
Load FAILED with error: error 31

So (for whatever reason), this means I done got no cache. We already know that
with no cache on a fast network we are faster, particularly compared to the
pre-L2 cache, where we were doing even more overhead work.

When I ran in this state, compared to running a 5/18 'as-installed', I got the
following times with the page-loader
                                               #1        #2       #3
  5/18 'as-installed'                    :    2271      1620     1631
  5/18 + 5/21 nkcache.dll (unregistered) :    1650      1669     1644 

Okay, so ...
1) I'm faster in my test, after "breaking" the cache.
2) My cached and uncached times are ~equal
3) point (1) is inconsistent with paw's numbers above
4) point (2) is consistent with paw's numbers, in that there is no effective 
   difference between his cached and uncached times (i.e., he had no cache)

One thing to consider: I reviewed the server logs, and noticed that in the
'has no cache' state, every image is fetched *including* images mentioned more
than once on a page (e.g., 'pixel.gif' is fetched 59 times on every visit to
netscape.com). Perhaps a 266 machine is more affected by this than a 500MHz 
machine, leading to the overall slowdown. [I don't really believe this].

I guess I'm going to start poking at the tests some more. (There's always a 
reason, there's always a reason, there's no place like home ...).
> 2) My cached and uncached times are ~equal

2) My supposedly "cached" and "uncached" times are ~equal
So, I ran the ibench test with the 500MHz win98 PC, and I got:
                       
  05/21                      05/18                     
    Total          301.1       Total          291.98
    First iter.    56.63       First iter.     46.52
    Subsq. iter.   34.92       Subsq. iter.    35.07

These are similar to paw's results on the other machine (modulo machine power).


I grabbed the server logs from the test server, and after a little slicing
and dicing, it turns out that (with post 5/18 builds) we are not caching images 
properly when they appear in more than one page. e.g., if '/path/to/some.gif' 
is used in two different pages, it is fetched once for each page, and not from 
the cache. (I have more detailed data that shows this happening).

Now, the zdnet pages serve up a lot of pages that are identical in structure,
and which reuse the same graphics page to page. So, effectively, very few of
the zdnet pages are in a purely uncached state (some content that they need
has been previously pulled in for earlier pages). But, in my test, each page 
has a unique set of images, so the uncached round is a purely uncached round, 
not some mixed state. 

That is why my results show a lower number (we *are* better at the initial
download of content), but the ibench times go "up" (we aren't caching
'cross-page' effectively).
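The cross-page analysis above can be sketched with a toy model (illustrative only; the names and structure here are invented for this sketch, not actual Mozilla code): when cross-page caching is broken, images shared between pages, like pixel.gif, are refetched on every page, while a test whose pages each use a unique set of images sees no difference between the two modes.

```cpp
#include <set>
#include <string>
#include <vector>

// Counts network fetches across a sequence of page visits.  `crossPageCache`
// models whether an image cached on one page is reused on later pages (the
// behavior that regressed after 5/18, per the server-log analysis above).
// Within-page duplicates are always deduplicated in this simplified model.
int FetchCount(const std::vector<std::vector<std::string>>& pages,
               bool crossPageCache) {
  int fetches = 0;
  std::set<std::string> cached;          // survives across pages
  for (const auto& page : pages) {
    std::set<std::string> seenThisPage;  // per-page only
    for (const auto& img : page) {
      bool hit = crossPageCache ? cached.count(img) > 0
                                : seenThisPage.count(img) > 0;
      if (!hit) ++fetches;               // simulate a network fetch
      seenThisPage.insert(img);
      cached.insert(img);
    }
  }
  return fetches;
}
```

With pages that share pixel.gif, the broken mode costs one extra fetch per reuse; with unique images per page, both modes fetch the same count, matching why the page-loader numbers above did not regress while the zdnet ibench numbers did.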
Keywords: perf
btw, daily ibench results are posted internally here:
http://slip/projects/mojo/perf/i-bench.html

we found out dropping the new cache dll into an old build didn't work, since 
some base string API got changed over the weekend, so the new cache dll 
couldn't work with Friday or older builds.  The numbers paw got earlier 
reflect no-cache runs.

cathleen's ibench results:

5/24 commercial bld        my optimized commercial bld 
                                         (with pav's changes backed out) 
             ---------------------      ----------------------- 
all              136.01                          126.25 
first             26.25                           19.13 
subsequent        15.68                           15.30 



and also, 

gordon's iBench results:
optimized Mac build on 500MHz G4 Titanium PowerBook:
              
L1 Disk Cache (120 k) 
                 Run #1   Run #2   Average 
Total            403.02   405.85   404.435 
First iter.       76.98    75.86    76.42 
Subsq. iter.      46.58    47.14    46.86 
  
L2 Disk Cache (112 k) 
                 Run #1   Run #2   Average   % speedup 
Total            384.32   380.25   382.285     5.79% 
First iter.       71.49    70.15    70.82      7.91% 
Subsq. iter.      44.69    44.30    44.495     5.32% 
gagan will help investigate what's causing pav's changes to regress ibench 
results.

attached the diff to back out pavlov's 2-line change for this bug, in case 
anyone else is interested in helping out.
We talked about this yesterday, but I just wanted to make sure I got it on the
record. I _think_ Pavlov's changes are doing the correct thing. It sounds like
the real bug is that somebody is telling imagelib to force-invalidate. Even
though it makes us slower, I think that pav's changes should stay.
Seems like the only places where we set VALIDATE_ALWAYS are either through
Reload or the browser.cache.check_doc_frequency preference set to
validate-always. Otherwise, someone is not initializing these flags correctly
somewhere... and we are getting garbage.

I agree Pav's changes are correct and should stay.

Can someone verify that in the lab we are not setting validate always in the
cache prefs?
Perhaps this is an interaction of Gordon and Pav's checkins?  Maybe the L2 cache
doesn't do the right thing here, but Pav's bug masked the problem.  Maybe when
Pav fixed his problem, he exposed a bug in the new cache?

Just randomly guessing here.  Maybe it's some other checkin that caused the
force validate.

 
pav is doing the right thing.  the problem here is that we reload each top level
document because we detect a charset change in a meta html tag.  the docshell
is instructed to do a reload just as it would do if the user pressed reload,
which results in the VALIDATE_ALWAYS flag being set.  In the old world, pav
didn't honor this flag and would just happily reuse his cached images.  Now
that he's fixed that bug, we're once again seeing the ramifications of the 
meta charset issue.
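The interaction described above can be sketched with a minimal model (hypothetical names and structure; the real logic lives in imagelib and necko): once the image cache honors VALIDATE_ALWAYS, any load carrying that flag bypasses cached entries and goes back to the network, which is exactly what happens on a charset-triggered reload.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Illustrative stand-in for nsIRequest::VALIDATE_ALWAYS; the real constant
// and cache live in necko/imagelib.
constexpr uint32_t VALIDATE_ALWAYS = 1u << 0;

struct ImageCache {
  std::map<std::string, std::string> entries;  // url -> decoded image data
  int networkFetches = 0;

  // With pav's fix, imagelib honors VALIDATE_ALWAYS, so a charset-triggered
  // reload (which sets that flag) bypasses every cached image and refetches.
  std::string Get(const std::string& url, uint32_t loadFlags) {
    auto it = entries.find(url);
    if (it != entries.end() && !(loadFlags & VALIDATE_ALWAYS))
      return it->second;                 // reuse the cached copy
    ++networkFetches;                    // simulate hitting the network
    return entries[url] = "bits:" + url;
  }
};
```

In this model, two plain loads of the same image cost one fetch, but a load with VALIDATE_ALWAYS set always costs another fetch even though the entry is cached, mirroring the regression seen in the ibench numbers.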
This will not be fixed by adding overlapped I/O to the disk cache.  I don't see 
any reason why the images shouldn't be reused just because the text is being 
interpreted as a different charset.  Isn't there something else docShell can do 
besides a full reload?
cc'ing international folks.  We need help fixing the charset meta reload
problem.  ftang, can you help?  Looks like it has surfaced again with pav's
fix, which is the correct thing to do. (bug 81253)

pavlov/darin, is there any way we can fix the docshell so that we will use
cached images?  Also, is there anything we can do to use images from the cache
between different pages that request the exact same images?

Depends on: 81253
assigning to pavlov.  
Assignee: gordon → pavlov
Refer to my comment in 81253: why is it the right thing to reload images for a
charset reload? That's certainly not true. If pav's fix covers a larger scope 
than charset reload, should we use a flag to indicate it is a charset reload 
and skip image reloading? 
yes exactly.. the solution is to figure out where to put/define that flag.
Look at bug 83721. It is even worse when you do a shift-reload.
I posted a patch that adds a flag for charset reload and passes it to docshell. 
Inside docshell, this charset reload is treated as a normal reload except that 
nsIRequest::VALIDATE_ALWAYS is not set. This should provide a base for a final 
patch that might eventually fix this problem. Let me know if I can be of any 
further help.

Since we all agreed that the page did need to be reloaded for charset change, I 
will close 81253.
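The flag mapping the patch describes can be sketched as follows (a simplified stand-in, not the actual docshell code; LOAD_RELOAD_CHARSET_CHANGE and the flag values are modeled on the names discussed in this bug):

```cpp
#include <cstdint>

// Hypothetical, simplified version of the docshell load-type switch.
// Real flags: nsIRequest::VALIDATE_ALWAYS, nsIRequest::LOAD_FROM_CACHE;
// LOAD_RELOAD_CHARSET_CHANGE is the new load type shanjian's patch adds.
enum LoadType { LOAD_NORMAL, LOAD_HISTORY, LOAD_RELOAD_NORMAL,
                LOAD_RELOAD_CHARSET_CHANGE };

constexpr uint32_t VALIDATE_ALWAYS = 1u << 0;
constexpr uint32_t LOAD_FROM_CACHE = 1u << 1;

uint32_t NeckoFlagsFor(LoadType t) {
  switch (t) {
    case LOAD_RELOAD_NORMAL:
      return VALIDATE_ALWAYS;   // user reload: revalidate everything
    case LOAD_RELOAD_CHARSET_CHANGE:
      return 0;                 // charset reload: no forced validation
    case LOAD_HISTORY:
      return LOAD_FROM_CACHE;   // history navigation: prefer cached content
    default:
      return 0;                 // normal load: default validation policy
  }
}
```

The key distinction is that a charset reload behaves like a normal reload in every respect except that it no longer forces revalidation, so cached images can be reused.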
r=darin on the docshell patch
r=gagan as well. darin could you do the sr? 
Keywords: patch
Priority: -- → P3
i defer to rpotts for the sr= on this one.  i'm not so familiar with the
docshell code.
Radha,

Can you look at the attached patch - especially the checks around the calls to 
SetCacheKey(...).

Does this look right?  If so, then sr=rpotts :-)
I think the calls around SetCacheKey() are harmless. By the time Reload() is
called upon charset detection, the right SH entries are set and the cache key
is obtained from the right entry.
looks like the patch for this bug has been reviewed.  who should be the real 
owner now to check this in?
After thinking about this patch a bit more... I have one question:

Why is the new 'LOAD_RELOAD_CHARSET_CHANGE' load flag different from the 
existing 'LOAD_HISTORY' flag?

It seems like in the charset reload case, you want to pull as much of the 
content from the cache as possible - just like the history case...

If this is true, then you don't need to add all the code to deal with the new 
load flag... Instead, in nsWebShell::ReloadDocument(...) you can simply do:
    return Reload(LOAD_HISTORY);

What am I missing?
-- rick
rick: i think you're right on about that!  shanjian?
reassign to shanjian, since he owns the patch for fixing this bug...  :-)
Assignee: pavlov → shanjian
shanjian is a new father of his 2nd baby now. He won't be available till 5/26
cathleen- sorry to reassign this bug back to you. Shanjian just had a new baby. 
He will be on a two-week vacation to take care of the baby for his wife, so he 
will be offline for two weeks. Is it possible for you to find someone to work 
on the remaining issue? I think you can call him and ask questions (see the 
phonebook for his telephone number) if you need to. But I don't think he can 
work on the patch with a handful of diapers. 

I am not familiar with the code in that patch and am also overloaded right now. 
Otherwise, I would jump in...
Assignee: shanjian → cathleen
Based on previous bug comments, we can either 1) take shanjian's 
already-reviewed patch and get approval for check-in, or 2) make a new patch 
replacing the LOAD_RELOAD_CHARSET_CHANGE flag with the LOAD_HISTORY flag and 
undo the other now-unnecessary stuff in docshell.

ftang, since you're the expert in charset code, you should own this bug, 
review, and decide what we need to do.
Assignee: cathleen → ftang
if changing nsWebShell::ReloadDocument(...) to use LOAD_HISTORY instead of 
LOAD_NORMAL will fix the problem, then I would definitely go with it rather 
than the current patch...

Changing the load flag is an isolated '1-line change' which only affects the 
codepath when a meta charset reload is occurring...

The current docshell patch is much bigger and affects the code paths of *all* 
URL loads.

"simple is better"
vidur is also working on a patch for the parser to sniff the charset tag; see 
bug 81253.  That is another way of fixing the reload-twice problem (should we 
go with that one instead??)

pavlov needs a fix in imglib to not throw away the images from the image 
cache, bug 85978
Depends on: 85978
vidur's fix should help the common case of preventing the Reload (assuming the 
meta charset tag is in the first buffer of data...)  However, I think that we 
still need to fix the Reload case too.

I never got a clear answer to my question of whether changing LOAD_NORMAL to 
LOAD_HISTORY actually fixes the problem...  If so, I think that we should make 
this change too..
> I never got a clear answer to my question of whether changing LOAD_NORMAL to 
> LOAD_HISTORY actually fixes the problem...  

ftang, can you help??
rpotts-
I am not quite sure what exactly you mean by "changing 
nsWebShell::ReloadDocument(...) to use LOAD_HISTORY instead of 
LOAD_NORMAL".
In shanjian's patch, it seems he changed LOAD_FLAGS_NONE to 
LOAD_FLAGS_CHARSET_CHANGE (which he added earlier in the patch).
It seems LOAD_FLAGS_NONE is defined in 
http://lxr.mozilla.org/seamonkey/source/docshell/base/nsIWebNavigation.idl#83

I cannot find a LOAD_HISTORY in that file. 
Status: NEW → ASSIGNED
rpotts- I'm reassigning to you.
Reassign back to me after you put the patch in. Thanks
Assignee: ftang → rpotts
Status: ASSIGNED → NEW
rpotts- any progress? need my testing?
Rick, the second form looks preferable to me as long as equating history to
cache is safe, i.e., it works even when global/session history isn't there
(e.g., in embedding) or is set to 0 entries.
Frank, i'm reassigning this back to you for testing and checkin...

After talking with Adam and Radha, we decided that both patches have the same
issues if session history is disabled.  In either patch, if a session history
entry is not available (ie. history is disabled), the document will be reloaded
from the network - even if it has been cached.  This is because the cache key is
obtained from the session history entry :-(

If this behavior is unacceptable, we can come up with a further patch to fix
this issue...
Assignee: rpotts → ftang
It seems to load the charset correctly with my local build. I cannot verify 
whether the patch loads from the cache or the network. I will ask i18ngrp to do 
more testing on different platforms this afternoon.
You may want to set the Session History length to 0 entries in Prefs and load a
character-encoded  page again to see if you get the desired result.
I applied the last patch on my Macintosh and tried two URLs.
http://home.netscape.com/ja and http://www.lycos.co.jp both have META charset.
The pages loaded fine. I tried again with history length to zero, that also
loaded fine.
adamlock and radha, please r=
vidur, please sr=
Whiteboard: rpott's patch need r= and sr=
sr=vidur.
Whiteboard: rpott's patch need r= and sr=
I tested it on Linux and it works fine. 
Comparing differences between the last patch (from rpotts) and the patch from
6/4/01 14:06 (that has LOAD_RELOAD_CHARSET_CHANGE), I notice that this segment
of code is missing in the last patch 
@@ -4629,6 +4645,9 @@
 
     case LOAD_RELOAD_NORMAL:
         loadFlags |= nsIRequest::VALIDATE_ALWAYS;
+        break;
+
+    case LOAD_RELOAD_CHARSET_CHANGE:
         break;

i.e., in the latest patch, since the LOAD_HISTORY flag is used for charset
changes, it will be loaded with the nsIRequest::LOAD_FROM_CACHE loadFlags. Is
that alright? In the previous patch, no flags were passed to necko for a
RELOAD_CHARSET_CHANGE. If this is alright, r=radha
I think we want to load it from the cache in this case. However, I'd prefer 
rpotts to answer that explicitly since it's his patch. Once that is answered, 
we can ask for a= and wait for check-in.
Testing on Mac/Linux/Win shows it displays CJK correctly with the meta tag.
cathleen- from the previous comment, I assume we don't want to check in the 
first two patches, which change imglib, right?
Whiteboard: r=radha, sr=vidur, wait for rpotts to answer one Q before ask for a=
r=adamlock as well
I think loading from the cache is appropriate, and the latest patch from rpotts
does that. I do not understand why the previous patch did not pass any load
flags to necko. 
assigned
Whiteboard: r=radha, sr=vidur, wait for rpotts to answer one Q before ask for a= → r=radha, sr=vidur, ask a= since 13:25 6/21
a=chofmann in email
Status: NEW → ASSIGNED
Whiteboard: r=radha, sr=vidur, ask a= since 13:25 6/21 → r=radha, sr=vidur, a=chofmann. Wait for tree green since 5:51 6/21
fix check in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: r=radha, sr=vidur, a=chofmann. Wait for tree green since 5:51 6/21 → r=radha, sr=vidur, a=chofmann.
Whiteboard: r=radha, sr=vidur, a=chofmann. → r=radha, sr=vidur, a=chofmann. critical for 0.9.2
The check-in caused a regression, bug 87700 (HTML anchors do not work).
Reopening this; please consider a different change for this.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Blocks: 87700
Whiteboard: r=radha, sr=vidur, a=chofmann. critical for 0.9.2 → new patch need review. old patch cause regression on 87700
The new patch seems to work fine. 
rpotts- can you code-review it?
Status: REOPENED → ASSIGNED
Whiteboard: new patch need review. old patch cause regression on 87700 → need r= from rpotts
r= radha for the latest patch
sr=rpotts
Severity: normal → critical
Summary: ibench regression between 5/18-5/21 → PDT+ ibench regression between 5/18-5/21
Whiteboard: need r= from rpotts → r=radha sr=rpotts, wait for a= since Wed, 27 Jun 2001 13:30:21 -0700
asked yokoyama to land on the trunk.
yokoyama, please do not close this bug after you land on the trunk. I need this 
open to land on the moz0.9.2 branch.
pushing out. 0.9.2 is done. (querying for this string will get you the list of
the 0.9.2 bugs I moved to 0.9.3)
Target Milestone: mozilla0.9.2 → mozilla0.9.3
landed the new change on the moz0.9.2 branch
reassigning to yokoyama for trunk landing.
yokoyama- once you land on the trunk, mark this and 87700 fixed. thanks.
Assignee: ftang → yokoyama
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Whiteboard: r=radha sr=rpotts, wait for a= since Wed, 27 Jun 2001 13:30:21 -0700 → waiting for the trunk to open.
checked in.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Whiteboard: waiting for the trunk to open. → checked in.
*** Bug 17889 has been marked as a duplicate of this bug. ***