Open Bug 268200 Opened 21 years ago Updated 3 years ago

speed up nsStandardURL creation

Categories

(Core :: Networking, defect, P3)

People

(Reporter: bryner, Unassigned)

References

(Depends on 2 open bugs)

Details

(Keywords: perf, Whiteboard: [necko-backlog])

Attachments

(5 files)

In order to make sure URLs are fully normalized for global history, we always create an nsIURI when resolving style for anchors. This turns out to be a significant part of frame construction time on link-heavy pages. There is some low-hanging fruit here (patch coming up) as well as some more involved tasks (like making net_CoalesceDirs only need to walk the string once instead of three times).
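For illustration, a hypothetical single-pass take on directory coalescing could look like the sketch below. This is not the actual net_CoalesceDirs; a real path normalizer would also preserve empty segments and stop at the query/ref boundary, both of which this sketch ignores. The point is the technique: remembering where each emitted segment begins lets "../" pop the previous segment without rescanning the string.

```cpp
#include <string>
#include <vector>

// Hypothetical single-pass sketch (not Mozilla's net_CoalesceDirs):
// drop "." segments and resolve ".." against the previous segment in
// one left-to-right walk, instead of multiple scans over the string.
std::string CoalesceDirsOnePass(const std::string& path) {
    bool absolute = !path.empty() && path[0] == '/';
    std::string out = absolute ? "/" : "";
    std::vector<size_t> segStarts;  // offsets in `out` where each segment begins
    size_t i = absolute ? 1 : 0;
    while (i <= path.size()) {
        size_t j = path.find('/', i);
        if (j == std::string::npos) j = path.size();
        std::string seg = path.substr(i, j - i);
        if (seg == ".") {
            // drop "." segments
        } else if (seg == "..") {
            // pop the previous segment; extra ".." at the root is dropped
            if (!segStarts.empty()) {
                out.resize(segStarts.back());
                segStarts.pop_back();
            }
        } else if (!seg.empty()) {  // empty segments ("//") collapsed for simplicity
            segStarts.push_back(out.size());
            out += seg;
            if (j < path.size()) out += '/';
        }
        i = j + 1;
    }
    return out;
}
```

Because the offset stack records where each segment landed in the output, a ".." costs one resize rather than a backward scan for the previous '/'.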
Attached patch "patch" — Splinter Review
These sort of jumped out from looking at Quantify data. The first part of the patch is just passing the string length through to avoid an extra strlen. The second part is just hand-optimizing; this seems to be faster than a call to strchr() for each character.
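As a sketch of the two micro-optimizations described (hypothetical helpers, not the actual patch): passing a known length avoids re-running strlen, and a 256-entry membership table replaces a strchr scan per input character.

```cpp
#include <cstddef>
#include <cstring>

// Slow variant: strchr() rescans the delimiter set for every input character.
size_t CountDelimsStrchr(const char* s, size_t len, const char* delims) {
    size_t n = 0;
    for (size_t i = 0; i < len; ++i)
        if (strchr(delims, s[i])) ++n;
    return n;
}

// Faster variant: build the membership table once, then do a single
// array load per character.  `len` is passed in by the caller, so no
// extra strlen over the input is needed.
size_t CountDelimsTable(const char* s, size_t len, const char* delims) {
    bool isDelim[256] = {false};
    for (const char* d = delims; *d; ++d)
        isDelim[(unsigned char)*d] = true;
    size_t n = 0;
    for (size_t i = 0; i < len; ++i)
        if (isDelim[(unsigned char)s[i]]) ++n;
    return n;
}
```

The table setup is O(|delims|) once, versus O(|delims|) per character for the strchr version, which is roughly the shape of the win the comment describes.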
Attachment #164983 - Flags: superreview?(darin)
Attachment #164983 - Flags: review?(darin)
Comment on attachment 164983 [details] [diff] [review] patch r+sr=darin
Attachment #164983 - Flags: superreview?(darin)
Attachment #164983 - Flags: superreview+
Attachment #164983 - Flags: review?(darin)
Attachment #164983 - Flags: review+
Assignee: bryner → nobody
QA Contact: benc → networking
(attachment 164983 [details] [diff] [review] checked in 2004-11-07) so next step is CoalesceDirs?
Taking up this bug to look at net_CoalesceDirs (and then anything else that appears to be a pageload factor). bz - you said you knew of traces that showed URL creation showing up enough to worry about perf - can you point to test scenarios and/or data that can be used to guide this? Thanks.
Assignee: nobody → rjesup
Sure. Here's a test scenario that shows the problem pretty clearly:
1) Pull m-c. Update to revision cea568a2621b.
2) Apply this patch.
3) Build.
4) Run http://dromaeo.com/?dom-modify and note the appendChild/insertBefore numbers.
5) Edit nsGenericElement::UpdateEditableState and in the oldEditable != newEditable case just always call |UpdateState(aNotify)| instead of the current branching on the value of aNotify.
6) Repeat step 3.
7) Repeat step 4.
8) Observe a 2-3x slowdown in append/insert performance.
That slowdown is entirely due to the nsIURIs for all the <a> elements in the DOM subtree being inserted being created eagerly with the change from step 5.
(The patch has to be applied with -R, BTW.) Well, I see a slowdown, but mostly in CreateElement: Before: http://dromaeo.com/?id=141218 After: http://dromaeo.com/?id=141228. append/insert slowed down by ~10%, create slowed down by ~20%. Athlon X2 5600+ (2.8GHz), 4GB RAM, Ubuntu 8.04 LTS, kernel 2.6.24, 5626.01 BogoMIPS per core. Only option was "ac_add_options --enable-optimize". I'll set up jprof and see what the difference is, but I definitely don't see a 2x change.
Attachment #535342 - Attachment is patch: true
Attachment #535342 - Attachment mime type: text/x-patch → text/plain
Er, replace step 5 in the steps to reproduce from comment 5 with calling UpdateState(aNotify) unconditionally, even if oldEditable == newEditable.
Thanks; I believe I see it now.
I see a jump in hits from NewURI from 0.3% total to 6% total, which is a big jump, though I'm not sure it accounts for all the change in those two measurements (we can look deeper at that). Considering NewURI: net_CoalesceDirs is only 0.4%, so while it may be of some interest, it's not a hog. Most time is spent in ::SetSpec() (3.4%) and ::Resolve (0.7%). ParseURL() is 0.7%, EncodeSegmentCount() in SetSpec() is 0.6%, and of that half (0.3%) is NS_EscapeURL(). So I don't see many hotspots here for improvement; incremental improvements are possible in a few places perhaps. Looks just like it's being called a lot. I'll attach before and after jprofs (2ms).
Yes, it being called a bunch is the problem.
I'm thinking about ways to make major improvements in NewURI performance - I can nibble at the edges, and that may help, but it's not going to produce 2x improvements. One idea (still not vetted for practicality) is to make NewURI creation largely "lazy" - defer expensive operations (parsing, allocation) until needed, and just copy the spec as-is to start. NewURI spends a lot of time parsing URIs out into pieces and then rebuilding "normalized" versions, which are almost always the same as the input. Also, all the parsing and splitting supports changing pieces of a URI - something that I believe is rarely done. First step is to look at how often various nsStandardURL functions are called. See the attachment for stats for a modified dromaeo run and a standard profile start with many bugzilla and a few jprof tabs. I'll note SetXxxx() are rarely called, and URIs are rarely changed (normalize_changed_uri is 2 for dromaeo, and ~2200 of ~43000 for the other profile, but all but a handful of those 2200 are from one tab (graphs.mozilla.org)). What I don't have yet is how often we need to determine if the URL needs normalization or not, or how often we might have to do the deferred parsing/etc. Still, it looks promising at first cut.
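A minimal sketch of the lazy idea, using a hypothetical class (not nsStandardURL; as later comments note, the real GetSpec() would also have to hand out a normalized spec, which this sketch glosses over): store the spec as-is at creation time, and pay the parse cost only when a component getter first needs it.

```cpp
#include <string>
#include <utility>

// Hypothetical lazily-parsed URL: creation is just a string copy; the
// component split is deferred until a getter actually requires it.
class LazyURL {
public:
    explicit LazyURL(std::string spec) : mSpec(std::move(spec)) {}

    // Cheap: returns the stored spec without parsing.
    const std::string& GetSpec() const { return mSpec; }

    // Pays the parse cost only on first use.
    const std::string& GetHost() {
        EnsureParsed();
        return mHost;
    }

    bool Parsed() const { return mParsed; }

private:
    void EnsureParsed() {
        if (mParsed) return;
        // Crude split for illustration only; a real parser handles
        // userinfo, ports, IPv6 literals, escaping, etc.
        size_t schemeEnd = mSpec.find("://");
        if (schemeEnd != std::string::npos) {
            size_t hostStart = schemeEnd + 3;
            size_t hostEnd = mSpec.find_first_of("/?#", hostStart);
            mHost = mSpec.substr(hostStart,
                                 hostEnd == std::string::npos
                                     ? std::string::npos
                                     : hostEnd - hostStart);
        }
        mParsed = true;
    }

    std::string mSpec;
    std::string mHost;
    bool mParsed = false;
};
```

URIs that are created, compared, and dropped without any component access never pay for parsing at all, which is where the counters above suggest the win would come from.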
> One idea (still not vetted for practicality) is to make NewURI creation largely > "lazy" That sounds excellent! The only parts that really need to be eager are the ones that would throw from init (e.g. checking for spaces in the hostname). Various setters on URIs should be pretty rare. We need to normalize when someone calls GetSpec or various other url part getters, but I suspect we have lots of URIs where that doesn't really happen either.
Turns out more URLs are fetched than you'd think. The Dromaeo test fetches 337085 of 343925, though only 797 were modified. My test profile fetches 44038 of 46608, and modifies ~5500:
normalize_changed_uri = 2993
url_was_set = 5589
url_was_get = 44038
url_wasnt_used = 2530
nsstandardurl = 46608
So, we need to look at why they're being gotten. It appears from the debugger (in Dromaeo) that they're the result of nsHTMLAnchorElement::IntrinsicState(), which calls LinkState() which adds the URL to the link-coloring history. Unfortunately, if this history is needed then the URL must be properly normalized. We can minimize the cost of normalization in many cases I think, but it will complicate things. Continuing to investigate.
Hrm. The IntrinsicState() call is in fact what causes URI creation, yes. We could change things around a bit so it just creates the URI but does not start the history lookup in the dromaeo test case. That's an obvious prerequisite to having the idea work, I think. Can you file a bug on me to do that?
Done. I'm not sure the link state lookup can be deferred enough to make this useful in the real world. In the meantime, I'm looking at a separate improvement to do fast normalization in the common cases of no changes needed.
Also note that deferring normalization means GetSpec() (and other GetXxxx() and SetXxxx() methods) can fail where they would previously have failed in SetSpec(); so this will require looking at what that might imply.
I'm fine with explicit SetSpec normalizing as long as the standard newURI codepath doesn't.
bz: could you expand on that last comment? While waiting on the bug 661768 work, I'm working on minimizing the overhead when we do need to handle GetSpec() and return a normalized URL. I'm deciding if it's already normalized as-is without parsing everything out - which would handle a subset of URLs, but probably 90-99% of real URLs you see - I've added code to Normalize to flag when it actually changes something, which is very rare except for a few sites (like graphs.mozilla.org). The trick is writing the IsAlreadyNormalized() function such that it doesn't actually add more overhead than we save (and it never gives a false positive). We'll see how this works and if it's practical.
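A conservative sketch of what such a check might look like (hypothetical code; the real test would have to mirror nsStandardURL's normalizer exactly). The key property is the one stated above: it may answer "no" for some URLs that are in fact already normalized, sending them down the slow path, but it must never answer "yes" for a URL that normalization would change.

```cpp
#include <cctype>
#include <string>

// Hypothetical IsAlreadyNormalized(): one cheap scan that conservatively
// decides whether the spec is byte-for-byte what normalization would
// produce.  False negatives are fine (slow path); false positives are not.
bool IsAlreadyNormalized(const std::string& spec) {
    size_t colon = spec.find(':');
    if (colon == std::string::npos || colon == 0) return false;
    // Scheme must already be lowercase ASCII letters.
    for (size_t i = 0; i < colon; ++i)
        if (!islower((unsigned char)spec[i])) return false;
    // Bail out on anything that might need rewriting: escapes,
    // whitespace/controls, non-ASCII, backslashes, uppercase.
    for (size_t i = colon; i < spec.size(); ++i) {
        unsigned char c = (unsigned char)spec[i];
        if (c == '%' || c == '\\' || c <= ' ' || c >= 0x7f || isupper(c))
            return false;
    }
    // Dot segments would be coalesced away by normalization.
    if (spec.find("/./") != std::string::npos ||
        spec.find("/../") != std::string::npos)
        return false;
    return true;
}
```

Since this is one linear scan with no allocation, it should stay well under the cost of a full parse-and-rebuild, which is the "doesn't add more overhead than we save" requirement.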
More issues: Equals() is called on almost all anchors when they're removed in order to do ResetLinkState() (via UnregisterVisitedCallback()). EqualsInternal() is complex enough that it may be tricky to avoid doing the full parsing for, though again I may be able to handle a subset without a full parse.
So we've been pondering replacing our URL parsing wholesale with a wrapper we'd write around http://code.google.com/p/google-url/. I'm wondering how much that work would stomp on the work being done here.
stomp on? Almost totally. :-) Also on bug 125608 that I'm waiting on a final review from bz on. What's the relative performance vs nsStandardURL/etc? Both on initial creation and normalization/canonicalization? How much of a disconnect is there in the APIs? There are a lot of "modify this piece of the URL" functions in nsStandardURL, though not all of them are used. What's the timeframe/likelihood?
In any case, my investigative patch is finally showing some promise. After tweaking Equals to both short-circuit the same pointer being passed (frequent), and to string-compare if the URLs are already normalized at creation time (and we're including the ref in the Equals), my Dromaeo test only parses 3000 of 300000 URLs; my semi-real-world test (bunch of bugzilla, mozilla docs, a few jprofs, etc) parses 16000 of 35000 URLs (9000 of those are lazy parses - I may be able to significantly reduce them).
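The two Equals() fast paths described here can be sketched roughly as follows (hypothetical types, not the nsStandardURL implementation): identical pointers are trivially equal, and if both sides were flagged as already-normalized at creation time, a byte compare of the specs (ref included) answers without parsing either side.

```cpp
#include <string>

// Hypothetical URL record: the `normalized` flag would be set at creation
// by a check like the IsAlreadyNormalized() idea discussed earlier.
struct SimpleURL {
    std::string spec;
    bool normalized;
};

// Returns true if the fast path could decide, writing the answer to
// *result; returns false when the caller must fall back to a full parse.
bool FastEquals(const SimpleURL* a, const SimpleURL* b, bool* result) {
    if (a == b) {                        // same object: trivially equal
        *result = true;
        return true;
    }
    if (a->normalized && b->normalized) {
        *result = (a->spec == b->spec);  // canonical specs: compare bytes
        return true;
    }
    return false;
}
```

With the stated parse counts (3000 of 300000 URLs parsed in the Dromaeo run), nearly every Equals() call would resolve through one of these two branches.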
Ok; some partial results: Comparing two versions of my patch - one with suppressed parsing, one without - suppressing/deferring the full parsing on the Dromaeo test (as modified) increases performance on this machine from ~160-190 to ~215-220. A jprof shows that NewURI went from ~600 total hits to ~340; and Equals got better too ~100 to ~50. AppendChild went from ~170-260 (wide range) to ~300-320; InsertBefore went from ~135-141 to ~151-158.
Note: this is not full lazy evaluation; this is basically looking at the URL and deciding if it's already Normalized to start with, and if so deferring full parsing until needed (other than Scheme, which is done on every URL).
> bz: could you expand on that last comment? I posit that SetSpec calls that are not part of the newURI codepath are rare. Please tell me if this is not the case! Given that, I don't care about the performance of SetSpec per se as long as it does not impact newURI. > More issues: Equals() is called on almost all anchors when they're removed in > order to do ResetLinkState() (via UnregisterVisitedCallback()). This is bug 661768, no? If we're not registered with history, we don't need to unregister and whatnot. > EqualsInternal() is complex enough that it may be tricky to avoid doing the > full parsing for, though again I may be able to handle a subset without a > full parse. I wouldn't worry about it too much if bug 661768 solves the problem. How much of the URI-creation overhead is newURI outside of parsing, and can we speed that up? For comment 25, what are your numbers without the generic element changes (so effectively on trunk)?
(In reply to comment #27) > > bz: could you expand on that last comment? > > I posit that SetSpec calls that are not part of the newURI codepath are > rare. Please tell me if this is not the case! I can infer this: SetSpec in the run I'm looking at was called 56424 times, and Init was called 56409 times, and Init (unless it fails) always calls SetSpec. So at most in this run SetSpec was called via other paths 15 times. > > Given that, I don't care about the performance of SetSpec per se as long as > it does not impact newURI. Well, what I'm doing right now is to short-circuit SetSpec to avoid full parsing. If we can deal with the other issues, we may be able to make it fully lazy (lazy even for non-normalized specs). Right now it's only lazy for parsing, and then only if the URL appears to already be normalized (and most are!) The code is written to support full lazy creation. Note that it appears that almost every URL created has GetScheme() called on it, so it behooves us to cheat and at least parse out the scheme when copying the input spec. > > > More issues: Equals() is called on almost all anchors when they're removed in > > order to do ResetLinkState() (via UnregisterVisitedCallback()). > > This is bug 661768, no? If we're not registered with history, we don't need > to unregister and whatnot. Yes, probably. > > EqualsInternal() is complex enough that it may be tricky to avoid doing the > > full parsing for, though again I may be able to handle a subset without a > > full parse. > > I wouldn't worry about it too much if bug 661768 solves the problem. Well, I managed to make it avoid full parsing in many or most cases. > How much of the URI-creation overhead is newURI outside of parsing, and can > we speed that up? Well, in the modified Dromaeo test, NewURI went from 13.9% to 8.7% (600 to 373 hits). Most (7%) is Init(); of Init only 2.2% is SetSpec now. Resolve is 2.3%, a few string functions show up well below 1%, and a smattering elsewhere. 
1% of Resolve's 2.2% is net_CoalesceDirs; that may be improvable. I can get a little out of SetSpec even in the lazy case by swapping the order of IsAlreadyNormalized and net_FilterURIString. > For comment 25, what are your numbers without the generic element changes > (so effectively on trunk)? Checking; I'll get back to you.
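The observation above that almost every URL gets GetScheme() called suggests eagerly extracting just the scheme while copying the spec, even when everything else is deferred. A hypothetical helper following the RFC 3986 scheme grammar (ALPHA followed by ALPHA / DIGIT / "+" / "-" / "."):

```cpp
#include <cctype>
#include <string>

// Hypothetical scheme-only parse: one short scan up to the first ':',
// validating the scheme charset as it goes.  Cheap enough to run eagerly
// at creation even when the rest of the parse is deferred.
bool ExtractScheme(const std::string& spec, std::string* scheme) {
    if (spec.empty() || !isalpha((unsigned char)spec[0])) return false;
    for (size_t i = 1; i < spec.size(); ++i) {
        unsigned char c = (unsigned char)spec[i];
        if (c == ':') {
            scheme->assign(spec, 0, i);
            return true;
        }
        if (!isalnum(c) && c != '+' && c != '-' && c != '.') return false;
    }
    return false;  // no ':' found: not an absolute URI
}
```

The scan stops at the first ':' so its cost is bounded by the scheme length, not the spec length.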
Note: all of these numbers are with my monitoring harness in there (--enable-debug, and in nsStandardURL I have call-counters on almost every routine that dump when we exit).
(In reply to comment #27) > For comment 25, what are your numbers without the generic element changes > (so effectively on trunk)? Well, over 3 runs it seems to be a little better overall - but this system has too much random load for consistent results across a small number of tests. Instead of suppressing ~1000000 ParseURLs it suppresses ~50000, because there are a lot fewer NewURI calls. Hits in NewURI and descendants went from 39 (1%) to 19 (1/2%). I've cut NewURI by 40-50% in the benchmark; maybe averaging 20-40% in real-world cases. However, I'm still not sure it happens enough in non-benchmarks to be important - and even in this benchmark it's the nsGenericElement changes that turn it from 1% to 13% in the first place (by calling it a lot more). Still, saving 50% of cycles for something done even moderately often is a good thing, but we need to weigh it against the increase in complexity.
> I can infer this: Good, good! > so it behooves us to cheat and at least parse out the scheme when copying the > input spec Of course we've already looked for the scheme before that point... Might be nice to just pass it in, if it weren't for all the indirection. :( It sounds like you're not seeing the hits in nsIOService::NewURI and stuff under it but outside nsStandardURL (getting the protocol handler, etc) that I was seeing? > Well, over 3 runs it seems to be a little better overall I think I wasn't clear. Comment 25 has actual dromaeo test result numbers (presumably from an optimized build). I'm interested in what those look like for trunk, trunk + this patch, trunk + modifications to generic element + this patch. > I'm still not sure it happens enough in non-benchmarks to be important Past experience says yes. > it's the nsGenericElement changes that turn it from 1% to 13% Well, that change is just removing code that was put in place to work around this performance issue! There's other such code we could remove.
(In reply to comment #31)
> It sounds like you're not seeing the hits in nsIOService::NewURI and stuff
> under it but outside nsStandardURL (getting the protocol handler, etc) that
> I was seeing?

NewURI is called mostly from nsIOService::NewURI: 11%, 8.7% of which is NewURI(). Nothing else in nsIOService::NewURI is above 0.8% - the top three amount to 1.6% (most of the rest); two are getting the scheme, the other is GetProtocolHandler.

59273 8 (0.1%) 472 (11.0%) nsIOService::NewURI(nsACString_internal const&, char const*, nsIURI*, nsIURI**)
  370 (8.7%) NewURI(nsACString_internal const&, char const*, nsIURI*, int, nsIURI**)
   36 (0.8%) nsIOService::GetProtocolHandler(char const*, nsIProtocolHandler**)
   20 (0.5%) nsIOService::ExtractScheme(nsACString_internal const&, nsACString_internal&)
   14 (0.3%) nsStandardURL::GetScheme(nsACString_internal&)

> > Well, over 3 runs it seems to be a little better overall
>
> I think I wasn't clear. Comment 25 has actual dromaeo test result numbers
> (presumably from an optimized build). I'm interested in what those look
> like for trunk, trunk + this patch, trunk + modifications to generic element
> + this patch.

Same test:
trunk (3 runs): 402 (538/300)
trunk + this patch (3 runs): 412 (537/317)
Mods to GenericElement (5 runs): 177 (min 159, max 192)
Mods to GenericElement + patch (7 runs): 210 (min 176, max 224)
(176 was the only run under 209 - excluding that one it's 215)

Note: the test is too short, I think, for consistent results, at least on this machine. Those are averages of 3 runs; for example, the trunk+patch was 399/425/414 for the three runs.

> > it's the nsGenericElement changes that turn it from 1% to 13%
>
> Well, that change is just removing code that was put in place to work around
> this performance issue! There's other such code we could remove.

Well, you'll never optimize an operation more than not doing it... :-) Unless that code causes problems or negative side-effects, or makes maintenance tough, I'd leave them.
With my patch, we cut NewURI() by 40%, but it's still 7 times as expensive with the nsGenericElement changes as it is in trunk (in the Dromaeo test).
> Unless that code causes problems or negative side-effects, or makes maintenance > tough Bingo on that third one. It makes all sorts of DOM stuff very very fragile to constantly have to keep in mind that creating URIs is expensive....
Whiteboard: [necko-would-take]
Whiteboard: [necko-would-take] → [necko-backlog]
Blocks: 1281848, 1285020
Depends on: 1301344
Depends on: 1304594
Depends on: 1304605
Depends on: 1304629
FWIW, a simple URL creation-and-init takes 1.8 + 0.1n microseconds on my machine, where n is the length of the URL. An http://IPV4/ URL takes 3.98 + 0.04n microseconds.
I have a simple table for the correlation between length and time (unit: microseconds). If we want to decrease the slope, the parser might not be a good target.
+--------+------------+---------------------------+----------+-------------------------+
| length | preprocess | special protocol handling | parse    | build normalized string |
+--------+------------+---------------------------+----------+-------------------------+
|     20 |   17%/0.36 |                  10%/0.23 | 15%/0.33 |                 55%/1.2 |
|    100 |   21%/1.2  |                  13%/0.8  | 17%/1.0  |                 46%/2.6 |
|   1000 |   22%/2.7  |                  11%/1.4  | 11%/1.4  |                 54%/6.5 |
+--------+------------+---------------------------+----------+-------------------------+
Priority: -- → P1
Priority: P1 → P3
Assignee: rjesup → nobody
Severity: normal → S3