Bug 77269 Opened 24 years ago Closed 24 years ago

ibench test performance regression

Categories

(Core Graveyard :: Tracking, defect)

defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9

People

(Reporter: cathleennscp, Assigned: darin.moz)

References

Details

(Keywords: perf, Whiteboard: investigating)

Attachments

(2 files)

Tom narrowed it down to an 8-hour window where the regression happened: between the 041620 build (in the 04-16-22 win32 ftp dir) and the 041704 build (in the 04-17-06 win32 ftp dir). I'm going to pull builds by time stamps, and Tom will run tests to find a more precise time frame.
Blocks: 71668
Severity: normal → major
Keywords: perf
run this query to see a list of fishy checkins (provided by waterson): http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
Can this also be tested with debug builds? If yes, how can I do it? I suspect my checkin to BasicTableLayoutStrategy.cpp.
gordon is on the hook for that day, and he believes in the grand unification theory.
Assignee: cathleen → gordon
Whiteboard: third on gordon's list to get to.
Target Milestone: --- → mozilla0.9
Chris Hofmann wrote: Here is a big hairy grand unification theory..... IIRC the double read on the pages that jrgm sees comes out of the need to look at the header to figure out the charset conversion. The old cache had some clever voodoo of looking at just the first X bytes to figure this out, then going back to load the entire document. In conversation with gordon just before the disabling of the old cache, this was still on the issues list of how to handle it. Gordon, is it possible that the mechanism to peek into the doc to figure out the charset is not working correctly and is slowing us down? I also see that our big leak bug http://bugzilla.mozilla.org/show_bug.cgi?id=75641 has now been associated with charset conversions.

------- Additional Comments From alexeyc@bigfoot.com 2001-04-24 00:51 -------
Oh, and if it's of any help, I have Auto-Detect set on Russian. It's my wild guess, but probably with Auto-Detect on, after processing, the pages are left in memory and accumulate as you browse/reload.

Are we hanging on to the first top-level document under some conditions? Am I letting my mind run wild with this big harmonic convergence theory, or will fixing up the peeking into the doc to determine the charset conversion fix up all our problems?... if it will, is there a bug? chris h.

Gordon Sheridan wrote: The cache checkins were to remove the old cache from the build because it was no longer used. The cache checkins for nsDiskCacheBlockFile are not part of the current build. -Gordon

Chris Waterson wrote: So, if I understand this correctly, something went sour between the build that we pushed at 8pm on 4/16 and the build we pushed at 4am on 4/17 (judging by the number of checkins, this must be in GMT?). Anyway, here's the bonsai link for those checkins. Somebody double-check to make sure I've actually queried the right set of changes.
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%

AFAICT, there are plenty of big cache and layout checkins here, with a bit of event queue hackery just to spice things up. chris

jrgm wrote: So, the zdnet pages work like this.

1) All the pages hardwire an onload handler into the <body> element: <body onload='document.location="nextpageinsequence.html"'>

2) The last page in a cycle pings a JS-enabled page, which reads a timestamp cookie and updates it with that cycle's times. At the end of all the cycles, it does a form submission to a server.

I grabbed a copy of all the zdnet content and put it on a local server so I could see the server logs. I snipped out step (2) and just made the document.onloads into a continuous loop. I grabbed 2001041612 and 2001041706 for win32 (win2k), created new, *default profiles* for each, and started the cycle running. After three cycles, I'd hit the throbber to break out of the loop. What I see in the server logs:

Apr 16 build
- cycle 1
  - pages and images are fetched for each URL (200 OK)
  - however, each top-level HTML document is fetched TWICE!!! (That's a bug, but it turns out that's not this bug).
- cycle 2/3
  - silence. No 200's, no 304's.

Apr 17 build
- cycle 1
  - pages and images are fetched for each URL (200 OK)
  - each top-level HTML document is still fetched TWICE!!!
- cycle 2/3
  - a 304 for every image on every page
  - a '200 OK' for every top-level HTML document, repeated TWICE!!!

Okay, so which build runs faster :-]

(Note: the reason the in-house page loader test didn't pick this up is that each top-level document has a unique URL, due to the fact that I use it to pass information back to the server in the query string. A bug/feature! So, every top-level document should be 200 OK in my test harness, and it didn't see this bug happen).

Do we have a bug on file for this slowdown?
(I will file a bug for the double fetch, that was already happening in Apr 16 builds (at least for the way that zdnet triggers page loads)). John

Chris Waterson wrote:
> So, if I understand this correctly, something went sour between the
> build that we pushed at 8pm on 4/16 and the build we pushed at 4am on
> 4/17 (judging by the number of checkins, this must be in GMT?). Anyway,
> here's the bonsai link for those checkins. Somebody double-check to make
> sure I've actually queried the right set of changes.
>
> http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=04%2F16%2F2001+20%3A00%3A00&maxdate=04%2F17%2F2001+04%3A00%3A00&cvsroot=%2Fcvsroot
>
> AFAICT, there are plenty of big cache and layout checkins here, with a
> bit of event queue hackery just to spice things up.
>
> chris
>
> Tom Everingham wrote:
> > It appears as though the largest regression for ibench takes place
> > between the 4-16 and 4-17 builds. Cache seems to be working to improve
> > performance by 22.9 and 25.4 seconds for the two runs. - Tom
> >
> > 2001041620
> > All iterations - 382.11
> > First iteration - 67.80
> > Subsequent iteration - 44.90
> >
> > 2001041704 (4-17-06)
> > All iterations - 639.96
> > First iteration - 102.22
> > Subsequent iteration - 76.82
taking.
Assignee: gordon → darin
ok, i've been playing around with the slashdot.org test page from jrgm.mcom.com/page-loader/loader.pl, and i noticed that some <noscript> images are being downloaded every time!! these images should not even be downloaded. otherwise, i don't see any problems on this particular page with multiple downloads or excessive validations.
jrgm told me that the double-load problem was happening with builds before 4-16, and i've somewhat tracked it down to nsParser::OnDataAvailable returning NS_ERROR_HTMLPARSER_STOPPARSING on the first load. this causes HTTP to cancel the page load and doom the cache entry. the second load successfully fills out the cache entry, and subsequent loads of the page are taken from the cache as expected. does anyone know why nsParser could be returning this STOPPARSING nsresult??
btw: i'm testing a debug pull from today 4/24.
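The cancellation path darin describes (parser returns STOPPARSING from OnDataAvailable, HTTP cancels the load and dooms the half-written cache entry) can be sketched roughly as follows. This is a hedged, simplified stand-in for the real nsIStreamListener/nsHttpChannel machinery: the type and constant names here are illustrative, not the actual Mozilla interfaces.

```cpp
#include <cassert>
#include <string>

// Illustrative stand-ins; the real nsresult codes and listener interface
// are far more involved.
using nsresult = int;
const nsresult NS_OK = 0;
const nsresult NS_ERROR_HTMLPARSER_STOPPARSING = -1;

struct Listener {
    virtual nsresult OnDataAvailable(const std::string& chunk) = 0;
    virtual ~Listener() = default;
};

// A parser that bails out on its first chunk, as nsParser reportedly did
// when it hit a META charset it needed to restart for.
struct StoppingParser : Listener {
    nsresult OnDataAvailable(const std::string&) override {
        return NS_ERROR_HTMLPARSER_STOPPARSING;
    }
};

struct Channel {
    explicit Channel(Listener* l) : listener(l) {}
    Listener* listener;
    bool cancelled = false;
    bool cacheEntryDoomed = false;

    void DeliverData(const std::string& chunk) {
        nsresult rv = listener->OnDataAvailable(chunk);
        if (rv != NS_OK) {
            // A listener failure cancels the load; the partially written
            // cache entry is doomed, so the next load of this URL must
            // go back to the network -- hence the second fetch.
            cancelled = true;
            cacheEntryDoomed = true;
        }
    }
};
```

On this model, the first load never populates the cache, and only the second (restarted) load fills the entry that subsequent visits can use.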
ok, the image requests resulting in conditional HTTP requests with 304 "Not Modified" responses seem to be resulting directly from the VALIDATE_ALWAYS loadFlag being set on the HTTP channel for those images. I don't understand how or why this flag is being set, but from what I can tell, HTTP is simply doing what it has been told to do.
Ok, it seems like VALIDATE_ALWAYS is always being set on the toplevel document load by nsDocShell::DoChannelLoad. I'm not sure why this is the case since we're just doing normal loads. At any rate, this causes VALIDATE_ALWAYS to be set as the default load flags for the load group, which then causes all image loads (which cannot be satisfied from the imagelib cache) to be validated. i'm not sure how changes on the 16th might have caused this to happen.
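The flag propagation described above can be sketched in miniature. This is an assumption-laden simplification of the real nsILoadGroup/nsIChannel contract (the names LoadGroup, ChannelSketch, and JoinGroup are invented for illustration): a flag set as the load group's default is inherited by every channel that joins the group, which is why one VALIDATE_ALWAYS on the toplevel document turns into a conditional GET for every image.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative flag value; the real VALIDATE_ALWAYS lives on nsIRequest.
const std::uint32_t VALIDATE_ALWAYS = 1u << 0;

struct LoadGroup {
    // Default flags applied to every member of the group.
    std::uint32_t defaultLoadFlags = 0;
};

struct ChannelSketch {
    std::uint32_t loadFlags = 0;

    // New members inherit the group's default load flags.
    void JoinGroup(const LoadGroup& group) {
        loadFlags |= group.defaultLoadFlags;
    }

    bool MustRevalidate() const {
        return (loadFlags & VALIDATE_ALWAYS) != 0;
    }
};
```

Under this model, the fix is to stop the toplevel document load from setting VALIDATE_ALWAYS as the group default in the first place; every image channel then loads with normal validation semantics.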
Whiteboard: third on gordon's list toi get to. → investigating
Perhaps we need a separate pair of bugs: one to deal with the duplicate GETs (which were happening prior to Apr 17), and one to deal with the unwanted 'If-Modified-Since' GETs that began happening on Apr 17. Since one existed independent of the other, it would seem they have different root causes. But, in fear of breaking up chofmann's harmonic convergence groove, I will leave the strong force and the weak force as part of the GUT for now....

A comment on the duplicate GETs: I have only seen this happen with the i-bench pages. One thing to note about these pages is that each document load is initiated immediately in the onload handler of the previous page. My test, which (by default) delays the load of a subsequent page by 1 second, does not show the double GET. (Is it possible that the first attempt to load the next page fails because the window.onload event fires before some other per-page cleanup code has run?).
the double reload seems to result from a page with a meta tag like: <meta content="text/html; charset=windows-1252" http-equiv=Content-Type> the charset "change" seems to force a reload, which causes VALIDATE_ALWAYS to be set. with VALIDATE_ALWAYS set on the toplevel document, all sub-documents (in particular images) get the VALIDATE_ALWAYS load flag as well. this would seem to explain both seemingly unrelated problems. i'm not sure what exactly changed to cause this regression. i suspect some code in nsHTMLDocument.cpp that was storing charset values in the cache, but i'm not sure exactly how doing so would prevent the double load.
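The decision darin describes can be sketched as a small predicate: when the charset parsed out of the META tag disagrees with the charset the parse was started with, the document is reloaded, and that reload carries VALIDATE_ALWAYS. This is a hypothetical distillation, not the actual nsHTMLDocument code; the function and struct names are invented for illustration.

```cpp
#include <cassert>
#include <string>

struct ReloadDecision {
    bool reload;
    bool validateAlways;
};

// Called when the parser encounters e.g.
//   <meta content="text/html; charset=windows-1252" http-equiv=Content-Type>
ReloadDecision OnMetaCharset(const std::string& currentCharset,
                             const std::string& metaCharset) {
    if (metaCharset.empty() || metaCharset == currentCharset) {
        // No "change": keep parsing, no reload needed.
        return {false, false};
    }
    // Charset "changed": restart the parse. The reload is issued with
    // VALIDATE_ALWAYS, which then propagates to the load group defaults
    // and on to every image load -- tying the two symptoms together.
    return {true, true};
}
```

The point of the charset-in-cache patch is to make the first branch taken on repeat visits: if the cache already records the document's charset, the parse starts with the right one, no "change" is detected, and VALIDATE_ALWAYS never proliferates.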
cc'ing ftang for help! i don't understand what our "original/current" solution is to this problem... how is the browser supposed to handle this problem??
See related http://bugscape.mcom.com/show_bug.cgi?id=4540 (originally filed as bugzilla bug 64612) "Auto-Detect (All)" causes a duplicate HTTP GET (or POST) to be sent". [I'll note that I filed that over four months ago, and the only action has been to ask me to file it in bugscape instead].
here's a simple testcase which demonstrates the problem http://status.mcom.com/~darin/test-meta-charset.html
It's not clear how easily we can change content-type, or really charset, in the middle of parsing the document's head. The old codebase, IIRC, let the cache finish filling and (possibly without blocking) also re-read the cache file from the beginning, redecoding characters using an old netlib stream converter and reparsing via the old ebina HTML layout engine. Why don't we try to do the same sort of thing in the new codebase? Cc'ing harishd. /be
In response to Darin's 2001-04-24 17:17 comment: Does anyone know why nsParser could be returning this STOPPARSING nsresult? STOPPARSING is meant to shut down the parsing engine for the current document (usually when we encounter a META tag with a charset). But, I wonder, how would this message be useful to netlib? In other words, can we not propagate this message beyond the parser? Darin, how does netlib react to canelling a document load?
Typo! canelling => cancelling
turns out i was mistaken about the <noscript> images loading... in the <script> case there are some image URLs that contain "the current time" as a string. somehow i matched these URLs up to the <noscript> image URLs... so false alarm!
i've attached a patch that makes the "store the charset in the cache" logic work with the new cache. this doesn't fix the initial double-fetch problem, but it does make it so that later fetches are taken from the cache, which in turn prevents the proliferation of the VALIDATE_ALWAYS flag.
Status: NEW → ASSIGNED
Keywords: patch
I don't understand the charsetSource comparisons (in the new or original), but the cache code looks good. Darin, what was the difference between this patch and the original we tried to land last week? r=gordon.
i believe it is the rv=NS_OK line that was left out of the previous attempt.
Nits only: No point in declaring or initializing cachingChan outside the "if (httpChannel)" then-block:

+  nsCOMPtr<nsICachingChannel> cachingChan = do_QueryInterface(httpChannel);
+  if (httpChannel) {
+    nsCOMPtr<nsISupports> cacheToken;
+    cachingChan->GetCacheToken(getter_AddRefs(cacheToken));

Someone violated When-in-Rome with 'if(' style, three-space indentation, and rv=NS_OK (no spaces on either side of =), but I see no need to preserve those style violations in your reworking of the charset-in-cache code. Fix if you can, sr=brendan@mozilla.org with or without for 0.9. /be
turns out the "if (httpChannel)" should have been "if (cachingChan)" ... this didn't manifest itself as a bug since nsHTTPChannel implements both nsIHTTPChannel and nsICachingChannel, but it could have become trouble once other protocols start implementing nsICachingChannel.
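The latent bug darin points out can be illustrated in isolation: after a QueryInterface-style cast from one interface to another, it is the *result* pointer that must be null-checked, not the pointer you started from. The sketch below uses dynamic_cast as a stand-in for do_QueryInterface, with invented class names; nsHTTPChannel happening to implement both interfaces is what hid the mistake.

```cpp
#include <cassert>

struct Channel { virtual ~Channel() = default; };
struct HttpChannel : virtual Channel {};
struct CachingChannel : virtual Channel {
    const char* GetCacheToken() { return "token"; }
};

// nsHTTPChannel implemented both interfaces, so checking the wrong
// pointer happened to work anyway -- until some other protocol
// implements only one of them.
struct HttpAndCachingChannel : HttpChannel, CachingChannel {};

bool HasCacheToken(Channel* channel) {
    CachingChannel* caching = dynamic_cast<CachingChannel*>(channel);
    if (!caching)      // check the pointer the query produced...
        return false;  // ...so a non-caching channel can't crash us
    return caching->GetCacheToken() != nullptr;
}
```

With the original "if (httpChannel)" test, a channel that was HTTP but not caching would pass the check and then dereference a null cachingChan.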
fix checked in on the trunk and the 0.9 branch.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
tever, can you generate some new ibench numbers for us, now that darin's fix was checked in last night? :-)
We can generate numbers with the current build. However, independent of darin's change, these numbers should|will be lower, due to hyatt's change. But tever can confirm the fix by going to http://jrgm/zdnet2 for a testcase, and reviewing the server logs. Tom, I can go through this with you if you want.
Cathleen, I posted the ibench numbers earlier and they show an improvement, probably because of a combination of darin's and hyatt's changes. I believe jrgm is still looking into Darin's fix ... from what I saw it looked good.
Yeah, tever and I looked at this, and darin's fix knocks out the problem of the "proliferation of the VALIDATE_ALWAYS flag", which shows up in the server logs in the second and third cycle through (as HTTP 304 for all the images on the pages, and HTTP 200 (twice) for the top-level HTML document). So that problem is fixed. Yay, darin!

However, given that hyatt's change landed with darin's, we can't use the test results to say with certainty whether the charset issue was the only reason we regressed between 4/16 and 4/17. (I think there may have been something else involved, since we regressed on both initial and subsequent page load times; darin's fix would only address performance for subsequent visits, no?). Comments?

Two other lingering issues:

1) I do see a small number of 304's and 200's for images on the second and third cycle through the pages. In general there should be none (modulo any special (e.g., no-cache) HTTP headers, or loading enough content that we would be forced to evict, but neither of those cases applies to the test at jrgm/zdnet2). I'm looking some more at this and will file a bug.

2) As darin notes, the problem of "double-fetching" the document when a charset is encountered in the document still occurs. There is a bug http://bugscape.mcom.com/show_bug.cgi?id=4540 (originally filed as bugzilla bug 64612) for this happening in another scenario (using auto-detect). But I think I will file a bug to cover the case (seen in mozilla and commercial builds) where a charset meta tag in a document forces this behaviour.

So, two bugs to file, and an open question whether there was something else behind the slowdown on 4/17. Marking this bug verified for the 'VALIDATE_ALWAYS' issue.
Status: RESOLVED → VERIFIED
Filed:
[Bug 78018] New: Document with charset != iso-8859-1 is GET twice from server
[Bug 78019] New: Some images loaded from server, but they should be in cache
[Bug 78015] New: IMG with no SRC attribute is still fetched from the server
Product: Core → Core Graveyard
