Closed Bug 77269 Opened 23 years ago Closed 23 years ago

ibench test performance regression

Categories

(Core Graveyard :: Tracking, defect)

defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9

People

(Reporter: cathleennscp, Assigned: darin.moz)

References

Details

(Keywords: perf, Whiteboard: investigating)

Attachments

(2 files)

Tom narrowed it down to an 8-hour window where the regression happened:
between the 041620 build (in the 04-16-22 win32 ftp dir) and the 041704 build
(in the 04-17-06 win32 ftp dir).

I'm going to pull builds by timestamp, and Tom will run tests to find a more
precise time frame.
Blocks: 71668
Severity: normal → major
Keywords: perf
run this query to see a list of fishy checkins (provided by waterson):

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
Can this also be tested with debug builds? If yes, how can I do it? I suspect
my checkin to BasicTableLayoutStrategy.cpp.
gordon is on the hook for that day, and he believes in the
grand unification theory.
Assignee: cathleen → gordon
Whiteboard: third on gordon's list to get to.
Target Milestone: --- → mozilla0.9
Chris Hofmann wrote:
 
Here is a  big hairy grand unification theory.....

IIRC the double read on the pages that jrgm sees comes out
of the need to look at the header to figure out the char set conversion.
The old cache had some clever voodoo of looking at just the first X
bytes to figure this out, then going back to load the entire document.
In a conversation with gordon just before the old cache was disabled,
this was still on the issues list of things to handle.

Gordon,  is it possible that the mechanism to peek into the doc
to figure out char set is not working correctly and is slowing us down?
I also see that our big leak bug
http://bugzilla.mozilla.org/show_bug.cgi?id=75641
has now been associated with char set conversions..

------- Additional Comments From alexeyc@bigfoot.com  2001-04-24 00:51 -------
Oh, and if it's of any help, I have Auto-Detect set on Russian.
It's a wild guess, but probably with Auto-Detect on, the pages are left in
memory after processing and accumulate as you browse/reload.

Are we hanging on to the first top level document under some conditions?

Am I letting my mind run wild with this big harmonic convergence theory,
or will fixing up the peeking into the doc to determine the charset
conversion fix all our problems?...  if it will, is there a bug?

chris h.
 

Gordon Sheridan wrote:
The cache checkins were to remove the old cache from the build because it was no
longer used.  The cache checkins for nsDiskCacheBlockFile are not part of the
current build.

-Gordon
 
 

Chris Waterson wrote:
  So, if I understand this correctly, something went sour between the
build that we pushed at 8pm on 4/16 and the build we pushed at 4am on
4/17 (judging by the number of checkins, this must be in GMT?) Anyway,
here's the bonsai link for those checkins. Somebody double-check to make
sure I've actually queried the right set of changes.

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
AFAICT, there are plenty of big cache and layout checkins here, with a
bit of event queue hackery just to spice things up.

chris
 


jrgm wrote:

So, the zdnet pages work like this.

1) All the pages hardwire an onload handler into the <body> element:
     <body onload='document.location="nextpageinsequence.html"'>
2) The last page in a cycle pings a JS enabled page, which reads a
   timestamp cookie and updates it with that cycle's time. At the
   end of all the cycles, it does a form submission to a server.

I grabbed a copy of all the zdnet content, and put it on a local server
so I could see the server logs. I snipped out step (2) and just made
the document.onloads into a continuous loop.

I grabbed 2001041612 and 2001041706 for win32(win2k), created new, *default
profiles* for each and started the cycle running. After three cycles, I'd
hit the throbber to break out of the loop.

What I see in the server logs:

Apr 16 build
 - cycle 1   - pages and images are fetched for each URL (200 OK)
             - however, each top level, html document is fetched TWICE!!!
               (That's a bug, but it turns out that's not this bug).

 - cycle 2/3 - silence. No 200's, no 304's.

Apr 17 build
 - cycle 1   - pages and images are fetched for each URL (200 OK)
             - each top level, html document is still fetched TWICE!!!

 - cycle 2/3 - a 304 for every image on every page
             - a '200 OK' for every top level html document, repeated TWICE!!!

Okay, so which build runs faster :-]

(Note: the reason that the in-house page loader test didn't pick this up is
that each top level document has a unique URL (due to the fact I use it to
pass information back to the server in the query string. A bug/feature!).
So, every top level document should be 200 OK in my test harness, and it
didn't see this bug happen).

Do we have a bug on file for this slowdown? (I will file a bug for the double fetch,
which was already happening in Apr 16 builds (at least for the way that zdnet
triggers page loads)).

John
 

Chris Waterson wrote:
>
>   So, if I understand this correctly, something went sour between the
> build that we pushed at 8pm on 4/16 and the build we pushed at 4am on
> 4/17 (judging by the number of checkins, this must be in GMT?) Anyway,
> here's the bonsai link for those checkins. Somebody double-check to make
> sure I've actually queried the right set of changes.
>
>
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
>
> AFAICT, there are plenty of big cache and layout checkins here, with a
> bit of event queue hackery just to spice things up.
>
> chris
>
> Tom Everingham wrote:
>
> > It appears as though the largest regression for ibench takes place
> > between the 4-16 and 4-17 builds.  Cache seems to be working to improve
> > performance by 22.9 and 25.4 seconds for the two runs. - Tom
> >
> > 2001041620
> > All iterations - 382.11
> > First iteration - 67.80
> > Subsequent iteration - 44.90
> >
> > 2001041704  (4-17-06)
> > All iterations - 639.96
> > First iteration - 102.22
> > Subsequent iteration - 76.82
> >
 
taking.
Assignee: gordon → darin
ok, i've been playing around with the slashdot.org test page from
jrgm.mcom.com/page-loader/loader.pl, and i noticed that some <noscript>
images are being downloaded every time!!  these images should not even
be downloaded.  otherwise, i don't see any problems on this particular
page with multiple downloads or excessive validations.
jrgm told me that the double-load problem was happening with builds before 4-16,
and i've somewhat tracked it down to nsParser::OnDataAvailable returning 
NS_ERROR_HTMLPARSER_STOPPARSING on the first load.  this causes HTTP to cancel
the page load and doom the cache entry.  the second load successfully fills
out the cache entry, and subsequent loads of the page are taken from the cache
as expected.

does anyone know why nsParser could be returning this STOPPARSING nsresult??
btw: i'm testing a debug pull from today 4/24.
ok, the image requests resulting in conditional HTTP requests with 304 "Not
Modified" responses seem to be resulting directly from the VALIDATE_ALWAYS
loadFlag being set on the HTTP channel for those images.  I don't understand
how or why this flag is being set, but from what I can tell, HTTP is simply
doing what it has been told to do.
Ok, it seems like VALIDATE_ALWAYS is always being set on the toplevel document
load by nsDocShell::DoChannelLoad.  I'm not sure why this is the case since
we're just doing normal loads.  At any rate, this causes VALIDATE_ALWAYS to be
set as the default load flags for the load group, which then causes all image
loads (which cannot be satisfied from the imagelib cache) to be validated.

i'm not sure how changes on the 16th might have caused this to happen.
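
to make that propagation concrete, here's a minimal sketch (not the actual
docshell code, and assuming the nsILoadGroup defaultLoadFlags attribute and
the nsIRequest::VALIDATE_ALWAYS constant) of how a flag set as the load
group's default leaks into every request later joined to that group:

  #include "nsCOMPtr.h"
  #include "nsILoadGroup.h"
  #include "nsIRequest.h"

  // Sketch only: once VALIDATE_ALWAYS becomes part of the group's default
  // load flags, each image request added to the group inherits it and gets
  // revalidated (the 304s jrgm sees in the server logs).
  static nsresult MarkGroupValidateAlways(nsILoadGroup *aGroup)
  {
    nsLoadFlags flags = 0;
    nsresult rv = aGroup->GetDefaultLoadFlags(&flags);
    if (NS_FAILED(rv)) return rv;
    return aGroup->SetDefaultLoadFlags(flags | nsIRequest::VALIDATE_ALWAYS);
  }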
Whiteboard: third on gordon's list to get to. → investigating
Perhaps we need a separate pair of bugs, one to deal with the duplicate GET's
(which was happening prior to apr 17), and one to deal with the unwanted 
'if-modified-since' GET's that began happening on apr 17. Since one existed 
independent of the other, it would seem they have different root causes. 

But, in fear of breaking up chofmann's harmonic convergence groove, I will 
leave the strong force and the weak force as part of the GUT for now ....

A comment on the duplicate GET's: I have only seen this happen with the i-bench
pages. One thing to note about these pages is that each document load is 
initiated immediately in the onload handler of the previous page. My test, 
which (by default) delays the load of a subsequent page by 1 second, does not 
show the double GET. (Is it possible that the first attempt to load the next 
page fails because the window.onload event fires before some other per-page
cleanup code has run?)
the double reload seems to result from a page with a meta tag like:

<meta content="text/html; charset=windows-1252" http-equiv=Content-Type>

the charset "change" seems to force a reload, which causes VALIDATE_ALWAYS
to be set.  with VALIDATE_ALWAYS set on the toplevel document, all sub-documents
(in particular images) get the VALIDATE_ALWAYS load flag as well.

this would seem to explain both seemingly unrelated problems.  i'm not sure what
exactly changed to cause this regression.  i suspect some code in
nsHTMLDocument.cpp that was storing charset values in the cache, but i'm not
sure exactly how doing so would prevent the double load.
cc'ing ftang for help!  i don't understand what our "original/current" solution
is to this problem... how is the browser supposed to handle this problem??
See related http://bugscape.mcom.com/show_bug.cgi?id=4540 (originally 
filed as bugzilla bug 64612) "Auto-Detect (All)" causes a duplicate HTTP GET 
(or POST) to be sent". [I'll note that I filed that over four months ago, 
and the only action has been to ask me to file it in bugscape instead].
here's a simple testcase which demonstrates the problem

http://status.mcom.com/~darin/test-meta-charset.html
It's not clear how easily we can change content-type, or really charset, in the
middle of parsing the document's head.  The old codebase, IIRC, let the cache
finish filling and (possibly without blocking) also re-read the cache file from
the beginning, redecoding characters using an old netlib stream converter, and
reparsing via the old ebina HTML layout engine.

Why don't we try to do the same sort of thing in the new codebase?  Cc'ing
harishd.

/be
In response to Darin's 2001-04-24 17:17 comment:

Does anyone know why nsParser could be returning this STOPPARSING nsresult?

STOPPARSING is meant to shut down the parsing engine for the current document
(usually when we encounter a META tag with a charset). But I wonder how this
message would be useful to netlib. In other words, can we not propagate this
message beyond the parser?  Darin, how does netlib react to cancelling a document
load?
turns out i was mistaken about the <noscript> images loading... in the <script>
case there are some image URLs that contain "the current time" as a string.
somehow i matched these URLs up to the <noscript> image URLs... so false alarm!
i've attached a patch that makes the "store the charset in the cache" logic work
with the new cache.  this doesn't fix the initial double-fetch problem, but it
does make it so that later fetches are taken from the cache, which in turn
prevents the proliferation of the VALIDATE_ALWAYS flag.
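
for reference, the shape of that logic is roughly the following (a sketch
only, with a hypothetical helper name, and assuming the
nsICachingChannel::GetCacheToken and nsICacheEntryDescriptor metadata APIs):
store the detected charset as metadata on the cache entry backing the
channel, so a later load served from the cache can reuse it instead of
forcing a charset-driven reload.

  #include "nsCOMPtr.h"
  #include "nsIChannel.h"
  #include "nsICachingChannel.h"
  #include "nsICacheEntryDescriptor.h"
  #include "nsString.h"

  // Hypothetical helper, not the patch itself: stash the charset on the
  // cache entry that backs this channel.
  static nsresult SaveCharsetInCache(nsIChannel *aChannel,
                                     const nsCString &aCharset)
  {
    nsCOMPtr<nsICachingChannel> cachingChan = do_QueryInterface(aChannel);
    if (!cachingChan)
      return NS_OK;  // not a cache-backed load, nothing to do

    nsCOMPtr<nsISupports> cacheToken;
    cachingChan->GetCacheToken(getter_AddRefs(cacheToken));

    nsCOMPtr<nsICacheEntryDescriptor> cacheEntry = do_QueryInterface(cacheToken);
    if (!cacheEntry)
      return NS_OK;

    // "charset" is an illustrative key name; the real patch chooses its own.
    return cacheEntry->SetMetaDataElement("charset", aCharset.get());
  }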
Status: NEW → ASSIGNED
Keywords: patch
I don't understand the charsetSource comparisons (in the new or original), but 
the cache code looks good.  Darin, what was the difference between this patch and 
the original we tried to land last week?

r=gordon.
i believe it is the rv=NS_OK line that was left out of the previous attempt.
Nits only:

No point in declaring or initializing cachingChan outside the if (httpChannel)
then-block:

+    nsCOMPtr<nsICachingChannel> cachingChan = do_QueryInterface(httpChannel);
+    if (httpChannel) {
+      nsCOMPtr<nsISupports> cacheToken;
+      cachingChan->GetCacheToken(getter_AddRefs(cacheToken));

Someone violated When-in-Rome with 'if(' style, three-space indentation, and
rv=NS_OK (no spaces on either side of =), but I see no need to preserve those
style violations in your reworking of the charset-in-cache code.

Fix if you can, sr=brendan@mozilla.org with or without for 0.9.

/be
turns out the "if (httpChannel)" should have been "if (cachingChan)" ... this
didn't manifest itself as a bug since nsHTTPChannel implements both nsIHTTPChannel
and nsICachingChannel, but it could have caused trouble once other protocols
started implementing nsICachingChannel.
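
in other words, the corrected shape of the snippet brendan quoted is simply
to test the pointer the QueryInterface produced rather than the one it
started from (sketch, following the variable names in the quoted patch):

  nsCOMPtr<nsICachingChannel> cachingChan = do_QueryInterface(httpChannel);
  if (cachingChan) {            // was: if (httpChannel)
    nsCOMPtr<nsISupports> cacheToken;
    cachingChan->GetCacheToken(getter_AddRefs(cacheToken));
    // ... use cacheToken ...
  }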
fix checked in on the trunk and the 0.9 branch.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
tever, can you generate some new ibench numbers for us, now that darin's fix 
was checked in last night?  :-)
We can generate numbers with the current build. However, independent of darin's 
change, these numbers should|will be lower, due to hyatt's change. 

But tever can confirm the fix by going to http://jrgm/zdnet2 for a testcase,
and reviewing the server logs. Tom, I can go through this with you if you want.
Cathleen, I posted the ibench numbers earlier and they show an improvement, 
probably because of a combination of darin's and hyatt's changes.  I believe jrgm 
is still looking into Darin's fix ... from what I saw it looked good.
Yeah, tever and I looked at this, and darin's fix knocks out the problem of 
the "proliferation of the VALIDATE_ALWAYS flag" which shows up in the server
logs in the second and third cycle through (as HTTP 304 for all the images on 
the pages and HTTP 200 (twice) for the top level HTML document.) So that
problem is fixed. Yay, darin!

However, given that hyatt's change landed with darin's, we can't use the test 
results to say with certainty whether the charset issue was the only reason
we regressed between 4/16 and 4/17. (I think there may have been something 
else involved, since we regressed on both initial and subsequent page load 
times; darin's fix would only address performance for subsequent visits, no?).
Comments?

Two other lingering issues: 

1) I do see a small number of 304's and 200's for images on the second and 
third cycle through the pages. In general there should be none (modulo any 
special (e.g., no-cache) HTTP headers, or loading enough content that we would 
be forced to evict, but neither of those cases apply to the test at 
jrgm/zdnet2). I'm looking some more at this and will file a bug.

2) as darin notes, the problem of "double-fetching" the document when a 
charset is encountered in the document still occurs. There is a bug
http://bugscape.mcom.com/show_bug.cgi?id=4540 (originally filed as bugzilla 
bug 64612) for this happening in another scenario (using auto-detect).
But I think I will file a bug to cover the case (seen in mozilla and commercial 
builds) where a charset meta tag in a document forces this behaviour. 

So, two bugs to file, and an open question whether there was something else
behind the slowdown 4/17. Marking this bug verified for the 'VALIDATE_ALWAYS'
issue.
Status: RESOLVED → VERIFIED
Filed:
[Bug 78018] New: Document with charset != iso-8859-1 is GET twice from server
[Bug 78019] New: Some images loaded from server, but they should be in cache
[Bug 78015] New: IMG with no SRC attribute is still fetched from the server
Product: Core → Core Graveyard