Open
Bug 268200
Opened 21 years ago
Updated 3 years ago
speed up nsStandardURL creation
Categories
(Core :: Networking, defect, P3)
Core
Networking
Tracking
()
NEW
People
(Reporter: bryner, Unassigned)
References
(Depends on 2 open bugs)
Details
(Keywords: perf, Whiteboard: [necko-backlog])
Attachments
(5 files)
3.06 KB,
patch
|
darin.moz
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
228.25 KB,
patch
|
Details | Diff | Splinter Review | |
599.93 KB,
text/html
|
Details | |
760.15 KB,
text/html
|
Details | |
535.44 KB,
text/plain
|
Details |
In order to make sure URLs are fully normalized for global history, we always
create a nsIURI when resolving style for anchors. This turns out to be a
significant part of frame construction time on link-heavy pages. There is some
low-hanging fruit here (patch coming up) as well as some more involved tasks
(like making net_CoalesceDirs only need to walk the string once instead of 3 times).
Reporter | ||
Comment 1•21 years ago
|
||
These sort of jumped out from looking at Quantify data. The first part of the
patch is just passing the string length through to avoid an extra strlen. The
second part is just hand-optimizing; this seems to be faster than a call to
strchr() for each character.
Reporter | ||
Updated•21 years ago
|
Attachment #164983 -
Flags: superreview?(darin)
Attachment #164983 -
Flags: review?(darin)
Comment 2•21 years ago
|
||
Comment on attachment 164983 [details] [diff] [review]
patch
r+sr=darin
Attachment #164983 -
Flags: superreview?(darin)
Attachment #164983 -
Flags: superreview+
Attachment #164983 -
Flags: review?(darin)
Attachment #164983 -
Flags: review+
Reporter | ||
Updated•19 years ago
|
Assignee: bryner → nobody
QA Contact: benc → networking
Comment 3•19 years ago
|
||
(attachment 164983 [details] [diff] [review] checked in 2004-11-07)
so next step is CoalesceDirs?
Comment 4•14 years ago
|
||
Taking up this bug to look at net_CoalesceDirs (and then anything else that appears to be a pageload factor).
bz - you said you knew of traces that showed Url creation showing up enough to worry about perf - can you point to test scenarios and/or data that can be used to guide this? Thanks.
Assignee: nobody → rjesup
![]() |
||
Comment 5•14 years ago
|
||
Sure. Here's a test scenario that shows the problem pretty clearly:
1) Pull m-c. Update to revision cea568a2621b.
2) Apply this patch.
3) Build.
4) Run http://dromaeo.com/?dom-modify and note the appendChild/insertBefore
numbers
5) Edit nsGenericElement::UpdateEditableState and in the
oldEditable != newEditable case just always call |UpdateState(aNotify)|
instead of the current branching on the value of aNotify.
6) Repeat step 3.
7) Repeat step 4.
8) Observe a 2-3x slowdown in append/insert performance.
That slowdown is entirely due to the nsIURIs for all the <a> elements in the DOM subtree being inserted being created eagerly with the change from step 5.
Comment 6•14 years ago
|
||
(The patch has to be applied with -R BTW)
Well, I see a slowdown, but mostly in CreateElement:
Before: http://dromaeo.com/?id=141218
After: http://dromaeo.com/?id=141228
append/insert slowed down by ~10%, create slowed down by ~20%.
Athlon X2 5600+ (2.8GHz), 4GB ram. Ubuntu 8.04LTS, 2.6.24, 5626.01 BogoMIPS per core. Only option was "ac_add_options --enable-optimize"
I'll set up jprof and see what the difference is, but I definitely don't see 2x change.
![]() |
||
Updated•14 years ago
|
Attachment #535342 -
Attachment is patch: true
Attachment #535342 -
Attachment mime type: text/x-patch → text/plain
![]() |
||
Comment 7•14 years ago
|
||
Er, replace step 5 in the steps to reproduce from comment 5 with calling UpdateState(aNotify) unconditionally, even if oldEditable == newEditable.
Comment 8•14 years ago
|
||
Thanks; I believe I see it now.
Comment 9•14 years ago
|
||
I see a jump in hits from NewURI from 0.3% total to 6% total, which is a big jump, though I'm not sure it accounts for all the change in those two measurements (we can look deeper at that).
Considering NewURI:
net_CoalesceDirs is only 0.4%, so while it may be of some interest, it's not a hog. Most time is spent in ::SetSpec() (3.4%) and ::Resolve (0.7%). ParseURL() is 0.7%, EncodeSegmentCount() in SetSpec() is 0.6%, and of that 1/2 (0.3%) is NS_EscapeURL().
So I don't see many hotspots here for improvement; incremental improvements are possible in a few places perhaps. Looks just like it's being called a lot.
I'll attach before and after jprof's (2ms)
Comment 10•14 years ago
|
||
Comment 11•14 years ago
|
||
![]() |
||
Comment 12•14 years ago
|
||
Yes, it being called a bunch is the problem.
Comment 13•14 years ago
|
||
I'm thinking about ways to make major improvements in NewURI performance - I can nibble at the edges, and that may help, but it's not going to produce 2x improvements.
One idea (still not vetted for practicality) is to make NewURI creation largely "lazy" - defer expensive operations (parsing, allocation) until needed, and just copy the spec as-is to start. NewURI spends a lot of time parsing URIs out into pieces and then rebuilding "normalized" versions, which are almost always the same as the input. Also all the parsing and splitting supports changing pieces of a URI - something that's I believe rarely done.
First step is to look at how often various nsStandardURL functions are called. See the attachment for stats for a modified dromaeo run and a standard profile start with many bugzilla and a few jprof tabs.
I'll note SetXxxx() are rarely called, and uri's are rarely changed (normalize_changed_uri is 2 for dromaeo, and ~2200 of ~43000 for the other profile, but all but a handful of those 2200 are from one tab (graphs.mozilla.org).
What I don't have yet is how often we need to determine if the URL needs normalization or not, or how often we might have to do the deferred parsing/etc.
Still, it looks promising at first cut.
![]() |
||
Comment 14•14 years ago
|
||
> One idea (still not vetted for practicality) is to make NewURI creation largely
> "lazy"
That sounds excellent! The only parts that really need to be eager are the ones that would throw from init (e.g. checking for spaces in the hostname).
Various setters on URIs should be pretty rare.
We need to normalize when someone calls GetSpec or various other url part getters, but I suspect we have lots of URIs where that doesn't really happen either.
Comment 15•14 years ago
|
||
Turns out more URLs are fetched than you'd think. The Dromaeo test fetches 337085 of 343925, though only 797 were modified. My test profile fetches 44038 of 46608, and modifies ~5500:
normalize_changed_uri = 2993
url_was_set = 5589
url_was_get = 44038
url_wasnt_used = 2530
nsstandardurl = 46608
So, we need to look at why they're being gotten. It appears from the debugger (in Dromaeo) that they're the result of nsHTMLAnchorElement::IntrinsicState(), which calls LinkState() which adds the URL to the link-coloring history. Unfortunately, if this history is needed then the URL must be properly normalized.
We can minimize the cost of normalization in many cases I think, but it will complicate things.
Continuing to investigate.
![]() |
||
Comment 16•14 years ago
|
||
Hrm. The IntrinsicState() call is in fact what causes URI creation, yes. We could change things around a bit so it just creates the URI but does not start the history lookup in the dromaeo test case. That's an obvious prerequisite to having the idea work, I think. Can you file a bug on me to do that?
Comment 17•14 years ago
|
||
Done. I'm not sure the link state lookup can be deferred enough to make this useful in the real world.
In the meantime, I'm looking at a separate improvement to do fast normalization in the common cases of no changes needed.
Comment 18•14 years ago
|
||
Also note the deferring normalization means GetSpec() (and other GetXxxx() and d SetXxxx() methods) can fail where they would previously have failed in SetSpec(); so this will require looking at what that might imply.
![]() |
||
Comment 19•14 years ago
|
||
I'm fine with explicit SetSpec normalizing as long as the standard newURI codepath doesn't.
Comment 20•14 years ago
|
||
bz: could you expand on that last comment?
While waiting on the bug 661768 work, I'm working on minimizing the overhead when we do need to handle GetSpec() and return a normalized URL. I'm deciding if it's already normalized as-is without parsing everything out - which would handle a subset of URLs, but probably 90-99% of real URLs you see - I've added code to Normalize to flag when it actually changes something, which is very rare except for a few sites (like graphs.mozilla.org). The trick is writing the IsAlreadyNormalized() function such that it doesn't actually add more overhead than we save (and it never gives a false positive).
We'll see how this works and if it's practical.
Comment 21•14 years ago
|
||
More issues: Equals() is called on almost all anchors when they're removed in order to do ResetLinkState() (via UnregisterVisitedCallback()). EqualsInternal() is complex enough that it may be tricky to avoid doing the full parsing for, though again I may be able to handle a subset without a full parse.
Comment 22•14 years ago
|
||
So we've been pondering attempting to replace our URL parsing wholesale with a wrapper we'd write around
http://code.google.com/p/google-url/
I'm wondering how much that work would stomp on the work being done here.
Comment 23•14 years ago
|
||
stomp on? Almost totally. :-) Also on bug 125608 that I'm waiting on a final review from bz on.
What's the relative performance vs nsStandardURL/etc? Both on initial creation and normalization/canonicalization? How much of a disconnect is there in the APIs? There are a lot of "modify this piece of the URL" functions in nsStandardURL, though not all of them are used.
What's the timeframe/likelihood?
Comment 24•14 years ago
|
||
In any case, my investigative patch is finally showing some promise. After tweaking Equals to both short-circuit the same pointer being passed (frequent), and to string-compare if the URLs are already normalized at creation time (and we're including the ref in the Equals), my Dromaeo test only parses 3000 of 300000 URLs; my semi-real-world test (bunch of bugzilla, mozilla docs, a few jprofs, etc) parses 16000 of 35000 URLs (9000 of those are lazy parses - I may be able to significantly reduce them).
Comment 25•14 years ago
|
||
Ok; some partial results:
Comparing two versions of my patch - one with suppressed parsing, one without - suppressing/deferring the full parsing on the Dromaeo test (as modified) increases performance on this machine from ~160-190 to ~215-220. A jprof shows that NewURI went from ~600 total hits to ~340; and Equals got better too ~100 to ~50.
AppendChild went from ~170-260 (wide range) to ~300-320; InsertBefore went from ~135-141 to ~151-158.
Comment 26•14 years ago
|
||
Note: this is not full lazy evaluation; this is basically looking at the URL and deciding if it's already Normalized to start with, and if so deferring full parsing until needed (other than Scheme, which is done on every URL).
![]() |
||
Comment 27•14 years ago
|
||
> bz: could you expand on that last comment?
I posit that SetSpec calls that are not part of the newURI codepath are rare. Please tell me if this is not the case!
Given that, I don't care about the performance of SetSpec per se as long as it does not impact newURI.
> More issues: Equals() is called on almost all anchors when they're removed in
> order to do ResetLinkState() (via UnregisterVisitedCallback()).
This is bug 661768, no? If we're not registered with history, we don't need to unregister and whatnot.
> EqualsInternal() is complex enough that it may be tricky to avoid doing the
> full parsing for, though again I may be able to handle a subset without a
> full parse.
I wouldn't worry about it too much if bug 661768 solves the problem.
How much of the URI-creation overhead is newURI outside of parsing, and can we speed that up?
For comment 25, what are your numbers without the generic element changes (so effectively on trunk)?
Comment 28•14 years ago
|
||
(In reply to comment #27)
> > bz: could you expand on that last comment?
>
> I posit that SetSpec calls that are not part of the newURI codepath are
> rare. Please tell me if this is not the case!
I can infer this: SetSpec in the run I'm looking at was called 56424 times, and Init was called 56409 times, and Init (unless it fails) always calls SetSpec. So at most in this run SetSpec was called via other paths 15 times.
>
> Given that, I don't care about the performance of SetSpec per se as long as
> it does not impact newURI.
Well, what I'm doing right now is to short-circuit SetSpec to avoid full parsing. If we can deal with the other issues, we may be able to make it fully lazy (lazy even for non-normalized specs). Right now it's only lazy for parsing, and then only if the URL appears to already be normalized (and most are!) The code is written to support full lazy creation. Note that it appears that almost every URL created has GetScheme() called on it, so it behooves us to cheat and at least parse out the scheme when copying the input spec.
>
> > More issues: Equals() is called on almost all anchors when they're removed in
> > order to do ResetLinkState() (via UnregisterVisitedCallback()).
>
> This is bug 661768, no? If we're not registered with history, we don't need
> to unregister and whatnot.
Yes, probably.
> > EqualsInternal() is complex enough that it may be tricky to avoid doing the
> > full parsing for, though again I may be able to handle a subset without a
> > full parse.
>
> I wouldn't worry about it too much if bug 661768 solves the problem.
Well, I managed to make it avoid full parsing in many or most cases.
> How much of the URI-creation overhead is newURI outside of parsing, and can
> we speed that up?
Well, in the modified Dromaeo test, NewURI went from 13.9% to 8.7% (600 to 373 hits). Most (7%) is Init(); of Init only 2.2% is SetSpec now. Resolve is 2.3%, a few string functions show up well below 1%, and a smattering elsewhere.
1% of Resolve's 2.2% is net_CoalesceDirs; that may be improveable. I can get a little out of SetSpec even in the lazy case by swapping the order of IsAlreadyNormalized and net_FilterURIString.
> For comment 25, what are your numbers without the generic element changes
> (so effectively on trunk)?
Checking; I'll get back to you.
Comment 29•14 years ago
|
||
Note: all of these numbers are with my monitoring harness in there (--enable-debug, and in nsStandardURL I have call-counters on almost every routine that dump when we exit).
Comment 30•14 years ago
|
||
(In reply to comment #27)
> For comment 25, what are your numbers without the generic element changes
> (so effectively on trunk)?
Well, over 3 runs it seems to be a little better overall - but this system has too much random load for consistent results across a small number of tests. Instead of suppressing ~1000000 ParseURLs it suppresses ~50000, because there are a lot less NewURI calls. Hits in NewURI and descendents went from 39 (1%) to 19 (1/2%).
I've cut NewURI by 40-50% (in the benchmark; maybe averaging 20-40% in real-world cases. However, I'm still not sure it happens enough in non-benchmarks to be important - and even in this benchmark it's the nsGenericElement changes that turn it from 1% to 13% in the first place (by calling it a lot more). Still, saving 50% of cycles for something done even moderating often is a good thing, but we need to weigh it against the increase of complexity.
![]() |
||
Comment 31•14 years ago
|
||
> I can infer this:
Good, good!
> so it behooves us to cheat and at least parse out the scheme when copying the
> input spec
Of course we've already looked for the scheme before that point... Might be nice to just pass it in, if it weren't for all the indirection. :(
It sounds like you're not seeing the hits in nsIOService::NewURI and stuff under it but outside nsStandardURL (getting the protocol handler, etc) that I was seeing?
> Well, over 3 runs it seems to be a little better overall
I think I wasn't clear. Comment 25 has actual dromaeo test result numbers (presumably from an optimized build). I'm interested in what those look like for trunk, trunk + this patch, trunk + modifications to generic element + this patch.
> I'm still not sure it happens enough in non-benchmarks to be important
Past experience says yes.
> it's the nsGenericElement changes that turn it from 1% to 13%
Well, that change is just removing code that was put in place to work around this performance issue! There's other such code we could remove.
Comment 32•14 years ago
|
||
(In reply to comment #31)
> It sounds like you're not seeing the hits in nsIOService::NewURI and stuff
> under it but outside nsStandardURL (getting the protocol handler, etc) that
> I was seeing?
NewURI is called mostly from nsIOService::NewURI. 11%, 8.7% of which is NewURI(). Nothing else in nsIOService::NewURI is above 0.8% - top three amount to 1.6% (most of the rest); two are getting the Scheme, the other is GetProtocolHandler.
59273 8 (0.1%) 472 (11.0%) nsIOService::NewURI(nsACString_internal const&, char const*, nsIURI*, nsIURI**)
370 (8.7%) NewURI(nsACString_internal const&, char const*, nsIURI*, int, nsIURI**)
36 (0.8%) nsIOService::GetProtocolHandler(char const*, nsIProtocolHandler**)
20 (0.5%) nsIOService::ExtractScheme(nsACString_internal const&, nsACString_internal&)
14 (0.3%) nsStandardURL::GetScheme(nsACString_internal&)
> > Well, over 3 runs it seems to be a little better overall
>
> I think I wasn't clear. Comment 25 has actual dromaeo test result numbers
> (presumably from an optimized build). I'm interested in what those look
> like for trunk, trunk + this patch, trunk + modifications to generic element
> + this patch.
Same test, for trunk (3 runs):
402 (538/300)
trunk + this patch (3 runs):
412 (537/317)
Mods to GenericElement (5 runs):
177 (min 159, max 192)
Mods to GenericElement + patch (7 runs)
210 (min 176, max 224) (176 was the only run under 209 - excluding that one it's 215)
Note: the test is too short I think for consistent results; at least on this machine. Those are averages of 3 runs; for example the trunk+patch was 399/425/414 for the three runs.
> > it's the nsGenericElement changes that turn it from 1% to 13%
>
> Well, that change is just removing code that was put in place to work around
> this performance issue! There's other such code we could remove.
Well, you'll never optimize an operation more than not doing it... :-) Unless that code causes problems or negative side-effects, or makes maintenance tough, I'd leave them. With my patch, we cut NewURI() by 40%, but it's still 7 times as expensive with the nsGenericElement changes as it is in trunk (in the Dromaeo test).
![]() |
||
Comment 33•14 years ago
|
||
> Unless that code causes problems or negative side-effects, or makes maintenance
> tough
Bingo on that third one. It makes all sorts of DOM stuff very very fragile to constantly have to keep in mind that creating URIs is expensive....
Updated•9 years ago
|
Whiteboard: [necko-would-take]
Updated•9 years ago
|
Whiteboard: [necko-would-take] → [necko-backlog]
Updated•9 years ago
|
Comment 35•9 years ago
|
||
FWIW,
a simple URL creation_and_init takes 1.8 + 0.1n micro secs in my machine,
where n is the length of the URL.
a http://IPV4/ takes 3.98 + 0.04n micro secs.
Comment 36•9 years ago
|
||
I have a simple table for the correlation between length and time
The unit is micro secs.
If we want to decrease the slop, parser might not be a good idea.
+--------+------------+---------------------------+----------+-------------------------+
| length | preprocess | special protocol handling | parse | build normalized string |
+--------+------------+---------------------------+----------+-------------------------+
| 20 | 17%/0.36 | 10%/0.23 | 15%/0.33 | 55%/1.2 |
+--------+------------+---------------------------+----------+-------------------------+
| 100 | 21%/1.2 | 13%/0.8 | 17%/1.0 | 46%/2.6 |
+--------+------------+---------------------------+----------+-------------------------+
| 1000 | 22%/2.7 | 11%/1.4 | 11%/1.4 | 54%/6.5 |
+--------+------------+---------------------------+----------+-------------------------+
Comment 37•8 years ago
|
||
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: -- → P1
Comment 38•8 years ago
|
||
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P1 → P3
Updated•4 years ago
|
Assignee: rjesup → nobody
Updated•3 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•