Closed Bug 103916 Opened 23 years ago Closed 23 years ago

nsStdURL::GetSpec() is 2.6% of main1()

Categories

(Core :: Networking, defect, P1)

x86
Windows 2000
defect

Tracking


RESOLVED FIXED
mozilla0.9.7

People

(Reporter: dp, Assigned: darin.moz)

References

Details

(Keywords: perf)

Attachments

(2 files, 19 obsolete files)

1.86 KB, text/plain
141.66 KB, patch (darin.moz: review+, darin.moz: superreview+)
Func                 #calls    F ms    F+D ms     Module
---------------------------------------------------------
main1                     1       0    3,279ms    mozilla.exe
nsStdURL::GetSpec      2832       0       86ms    necko.dll

That is about 2.6% of main1's time

In general, nsStdURL:: allocates way too many times (small quantities but too
many allocations).
Blocks: 7251
Keywords: perf
-> 0.9.7
Priority: -- → P2
Target Milestone: --- → mozilla0.9.7
-> ok.. 0.9.6
Target Milestone: mozilla0.9.7 → mozilla0.9.6
Status: NEW → ASSIGNED
Priority: P2 → P1
from browser startup to displaying http://mozilla.org/start/, this is what i found:

  count_GetSpec = 3801,
  count_SetSpec = 875,
  count_GetPrePath = 1,
  count_SetPrePath = 0,
  count_GetScheme = 680,
  count_SetScheme = 0,
  count_GetPreHost = 1,
  count_SetPreHost = 0,
  count_GetUsername = 0,
  count_SetUsername = 0,
  count_GetPassword = 0,
  count_SetPassword = 0,
  count_GetHost = 246,
  count_SetHost = 0,
  count_GetPort = 8,
  count_SetPort = 0,
  count_GetPath = 4044,
  count_SetPath = 0,
  count_Equals = 197,
  count_SchemeIs = 595,
  count_Clone = 43,
  count_Resolve = 996,
  count_GetFilePath = 0,
  count_SetFilePath = 0,
  count_GetParam = 0,
  count_SetParam = 0,
  count_GetQuery = 1,
  count_SetQuery = 0,
  count_GetRef = 4,
  count_SetRef = 0,
  count_GetDirectory = 4,
  count_SetDirectory = 0,
  count_GetFileName = 0,
  count_SetFileName = 0,
  count_GetFileBaseName = 0,
  count_SetFileBaseName = 0,
  count_GetFileExtension = 232,
  count_SetFileExtension = 0,
  count_GetEscapedQuery = 0,
  count_GetFile = 54,
  count_SetFile = 0

this suggests to me that storing the URL in decomposed form is really not worth
it.  we should really be storing the URL as a single string with pointers to the
particular section.  that way, we'd be optimizing for the Getter methods, and in
particular GetSpec.
Totally. I came to the same conclusion by looking at quantify data. Go go go...
This storing method might be difficult. You have to distinguish between a
non-existent component and an empty component; for example, there is a difference
between http://host/path?query and http://host/path? . Both have a query, but
the last one is empty. That is different from http://host/path, which has no query.
The more I think about this, the more I'm pretty sure it will not work, or at
least will be very, very difficult. We currently store the components as we get
them and return isolated components unescaped (safe), but composed components
accessible through GetSpec or GetPath, for example, escaped. That is necessary to
retain url syntax. This again will be very difficult to achieve with the proposed
solution, I think.

Instead I would propose to move the old structure to the string world and see if
that alone helps. If not it might be possible to have the whole spec cached
inside in addition to the components to speed up GetSpec.
andreas: i'm not convinced that it's not possible to store the whole string with
simple offsets to each component.  i was thinking of storing a starting position
integer and a string length integer.  if the component is present, but
zero-length, then its length value will be 0.  if its delimiter is not present,
then it will have a length of -1.  the important thing is to store a normalized
version of the spec, so that Equals can be a simple strcmp.
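
(To illustrate the idea -- a minimal, hypothetical sketch, not the actual patch:
one normalized spec string plus (position, length) pairs per component, where a
length of 0 means present-but-empty and -1 means absent, which also covers the
http://host/path? vs. http://host/path distinction raised above.)

struct URLSegment {
    PRUint32 mPos;  // start offset into mSpec
    PRInt32  mLen;  // 0 => present but empty, -1 => component absent
};

class URLSketch {
    nsCString  mSpec;   // normalized spec, e.g. "http://host/path?query"
    URLSegment mQuery;  // "...path?" => mLen == 0; "...path" => mLen == -1
public:
    nsresult GetSpec(char **result) {
        // no recomposition, no GetPath call -- just one copy out
        *result = nsCRT::strdup(mSpec.get());
        return *result ? NS_OK : NS_ERROR_OUT_OF_MEMORY;
    }
    PRBool Equals(const URLSketch &other) const {
        // normalized storage makes Equals a simple strcmp
        return nsCRT::strcmp(mSpec.get(), other.mSpec.get()) == 0;
    }
};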
about GetSpec(), we should determine what the breakdown is within GetSpec().

I think that quite a bit of it (and its callers) is in doing malloc and free.

I'll go quantify and get some numbers.

the caller will do a malloc and free, and within GetSpec() we'll do a malloc and
free on GetPath()

see bug http://bugzilla.mozilla.org/show_bug.cgi?id=104929

I want to fix GetPath() and GetSpec() to take a nsCAutoString reference, so that
the caller can use a stack based string and we can pass that through.
seth: nsIURI/nsIURL are nearly frozen interfaces... we may run into some serious
resistance if we make changes to those interfaces.  perhaps we need to invent
some new internal interfaces for working with URI/L's within mozilla?
Seth: Don't we have an nsACString& idl type somewhere? Discussions on irc imply
that the answer is no. We definitely don't have an nsCAutoString& idl type.

You definitely don't want to hard code a specific string type.

darin: internal interfaces would suck. If it's not frozen, then let's change it
before it is - why not give external consumers the perf increase?

Of course, there are a whole lot of callers, and all of them would have to be
individually modified...
I met with darin, and we looked at the quantify data.

we can get a win if we add a *non scriptable* method:

[noscript] void GetSpecConst([shared] out wstring aConstValue);

This is similar in spirit to GetValueConst() that we have on nsIRDFLiteral.

That will prevent all the allocation, copying, and freeing that goes on within
GetSpec(), GetPath() and the callers to GetSpec().

following darin's plans, GetSpecConst() and GetSpec() would no longer call
GetPath(), as we'd have the address to the entire spec to return.
  
adding GetPathConst() might also be a win, we should look into the callers of
GetPath()
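
(For illustration, a rough sketch of what the shared getter could look like --
hypothetical code, assuming the implementation keeps a null-terminated copy of
the full spec: the callee hands out a borrowed pointer into its own storage, so
neither side allocates, copies, or frees.)

nsresult URLSketch::GetSpecConst(const char **aConstValue)
{
    // borrowed pointer; only valid while the URL object lives
    *aConstValue = mSpec.get();
    return NS_OK;
}

// a caller (e.g. building a cache key) then avoids the malloc/free pair:
const char *spec;
uri->GetSpecConst(&spec);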
bbaetz: i believe that there may be external implementations of nsIURI/nsIURL.
GetValueConst will vanish once we get COW strings, I believe (although I'm not
certain). And you're still going to have to change all the callers...

If it's a wstring, then you're going to have to convert it anyway, so you'll lose
any perf gain. I presume that's just a typo, though.
If you don't want to change the callers, what about adding a:

bool specIs(in OurNsACStringType foo);

method? Again, this assumes that we have such a type
darin: It's not a frozen interface. And updating the implementation isn't as much
work as updating all the callers.

It is a reasonably large perf win, so it needs to be considered, I think.
Copy-on-write strings will only benefit this case if the interface is converted
to use |AString| or |ACString| (if/when we add this type to xpidl).

|getSpecConst| (or how about |getSpecShared|?) is a solution I can live with.
bbaetz:

yeah, seth meant 'string' not 'wstring'

i'm not saying that we shouldn't consider adding GetSpecConst and GetPathConst
to nsIURI... we just need to keep in mind that changing the interface could mean
surprises for some people.  moreover, we must not require that all URI
implementations support GetSpecConst/GetPathConst, since these methods require
the implementation to hold onto a null terminated spec/path... it just so
happens that StdURL could do this if we changed its internal representation as i
have suggested.

also, isSpec and isPath would not satisfy the problem here.  consider imgCache,
which calls GetSpec in order to pass the spec as a cache key.  if we replaced
that call site with GetSpecConst, we'd see a noticeable perf improvement.
|GetSpecShared| sounds better to me too.
darin: Hmm.

You mean that you want an nsIURIConstGetter interface? You'd lose almost as much
from the QI as you'd gain from saving a copy, wouldn't you?

If we're going to go for a [noscript] class, then the method could take an
nsSomethingCString& and do string-fu to either hold an owning or a non-owning
reference. xpidlcstrings used to be able to do this, but it looks like
getter_shares vanished some time ago.

jag: Is there a replacement?
bbaetz: Nope, no replacement for getter_Shares.
note: nsIURI is not frozen. It is under review
(http://lxr.mozilla.org/seamonkey/source/netwerk/base/public/nsIURI.idl#70 ). It
is definitely an interface that people are using, so mods to it will cause
bustage, but non-frozen interfaces are used at the user's own risk until frozen.

GetSpec() may be 2.6% of main1(), but what does that amount to in "time"? Are
we talking microseconds here? If so, we need to consider whether or not the
bustage is worth the optimization.
2.6% of 20sec = .52 secs

Either way is acceptable to me. Keeping the old way working means code bloat and
maintenance issues. So changing the interface seems the cheaper option.
no matter what, there is a lot of work that can be done inside of nsStdURL and
ns*URLParser to improve performance.  so, i'm not convinced that we absolutely
need to change the interfaces to see a significant perf win... it is however an
option that would improve things further... how much further remains to be seen.

so, my current plan is to work with the existing api.. get a better nsStdURL
impl, and then rerun the perf tests.  then we'll know for sure if we want to
change the interfaces.
just for reference: GetSpec is called at least once for each and every chrome
element loaded, since the spec is used by imagelib as the cache key.  opening
a new window, opening the preferences panel... opening anything having to do
with chrome (and of course loading web pages), will show GetSpec taking a
significant slice of time.  with a nsStdURL impl that stores the spec as a
single string, and with a GetSpecShared method, we could nearly eliminate this cost.
Blocks: 105040
with this patch, we pass all the tests in urlparse.dat as well as the builtin
Resolve tests, and i can start the browser and surf the web.

things that still need to be done:

1- implement nsIURI, nsIURL setters (SetSpec already done)
2- implement nsISerializable methods
3- cleanup xpcom/io/nsEscape
4- rename interfaces, classes, etc -- replacing nsStdURL and the URL parsers.
5- test, test, test ;-)
some preliminary testing shows 5% improvement with a debug linux build.
however, my optimized win32 build isn't showing much of an improvement :-(

GetSpec[Const/Shared] would definitely eliminate the 2.6% reported by dp.

oh yeah, and one more thing that still needs to be done... put back the code for
caching mFile, and make sure it actually works! (unlike the nsStdURL impl)
This mFile stuff ... can't say I like it at all. Why not use this opportunity to
get rid of it? These GetFile/SetFile methods are highly platform dependent; for
every new platform we need to change nsStdURL. I would very much like to have
Get/SetURL on the different nsIFile implementations. That makes much more sense
to me.
The patch seems to be missing the implementations of nsIURLParser2. Also I don't
like the nsStandardURL::Resolve method. It seems wrong to me to use the
noAuthURLParser to find out if there is a scheme or not. ExtractURLScheme should
be used instead; a full parse seems too much for that.
just this morning, i thought of exactly the same thing...

so roughly we'd have:

nsStandardURL::SetFile(nsIFile *file) {
  nsXPIDLCString url;
  file->GetURL(getter_Copies(url));
  if (url)
    SetSpec(url);
}

and

nsStandardURL::GetFile(nsIFile **result) {
  nsCOMPtr<nsIFile> file = do_CreateInstance(...);
  file->SetURL(mSpec);
  NS_ADDREF(*result = file);
}

and then all of the platforms specific code would be moved into nsLocalFileXXX
whoops.. you're right.. i forgot to add nsURLParser2.{h,cpp} ... i'll add them
along with a revised patch tonight.
Attachment #54320 - Attachment is obsolete: true
Attached patch v0.8 - nearing completion (obsolete) — Splinter Review
Attachment #54602 - Attachment is obsolete: true
Attachment #54786 - Attachment is obsolete: true
With this patch i'm seeing anywhere from a 0-4% improvement (averaging about 2%)
at startup on a PIII 850MHz w/ 256MB RAM running Win2k.

Drilling down to the low level details, here's some of the numbers for the
actual nsIURI methods of interest:

    method          current impl              new impl
    ------          ------------              --------
    SetSpec         1821                      1595
    GetSpec         2151                       150
    Resolve         1537                       334
    SetPath          938                       181
    GetPath         1342                       164

These numbers are time in milliseconds and correspond to the amount of time
required to run the specified method 100,000 times on an average URL with
average arguments to each method.  So, the numbers aren't exact by any measure,
but they do give some indication of improvement across the various key methods.

I have two implementations of SetSpec: one using the string code and another
simply using malloc and memcpy.  The latter is roughly 20% faster.  I've asked
some of the string folks to take a look to see if this isn't just due to some
mistake in my usage of the string facilities.

At any rate, this implementation seems to give the level of improvement I was
hoping for, and it still passes all of the URL tests.

So, now it's just a matter of testing it out to make sure that it isn't missing
something major -- andreas?
*** Bug 105040 has been marked as a duplicate of this bug. ***
Attachment #55176 - Attachment is obsolete: true
Comment on attachment 55329 [details] [diff] [review]
v1.0 final draft, ready for reviews

@@ -70,6 +67,7 @@
 		nsStreamProviderProxy.cpp \
 		nsURIChecker.cpp \
 		nsURLHelper.cpp \
+        nsURLParsers.cpp \
 		$(NULL)


Should use tabs to indent to align (copied from elsewhere, I know, but...).

 if (urlPart)
-            *urlPart = nsCRT::strdup(scheme.get());
+           *urlPart = nsCRT::strdup(scheme);
         return NS_OK;

Looks like a 3-space indent crept in on the + line, somehow.

+        case url_Port:
+            if (port != -1) {
+                nsCAutoString buf;
+                buf.AppendInt(port);
+                ExtractUrlPart_Helper(buf, 0, buf.Length(), startPos, endPos, urlPart);
+            }
+            else
+                ExtractUrlPart_Helper(nsnull, 0, -1, startPos, endPos, urlPart);
+            break;

These calls seem silly -- why not set *urlPart = ToNewCString(buf) or whatever the
latest incantation is?  PL_strndup is not *necessarily* compatible with nsMemory, btw.

+void
+nsStandardURL::CoalesePath(char *path)

"Coalesce" is misspelled.

+            nsCRT::memcpy(buf + i, escapedStr->get(), seg.mLen);

What's wrong with memcpy?  gcc among others will optimize it, many libcs optimize
it too.
Sorry for the nits, more later -- our plain text editor bugs are killing me in
this tiny textarea (grrrrrrr!).

/be
Attachment #55329 - Flags: needs-work+
I still don't like this part of nsStandardUrl::Resolve:

+    // NOTE: there is no need for this function to produce normalized
+    // output.  normalization will occur when the result is used to 
+    // initialize a nsStandardURL object.
+
+    if (mScheme.mLen < 0) {
+        NS_ERROR("unable to Resolve URL: this URL not initialized");
+        return NS_ERROR_NOT_INITIALIZED;
+    }
+
+    nsresult rv;
+    URLSegment scheme;
+    URLSegment path;
+
+    // relative urls should never contain a path, so we always want to use
+    // the noauth url parser.

Why path? Do you mean host? Path or host, I don't think this assumption is valid.

+    rv = gNoAuthParser->ParseURL(relpath, -1,
+                                 &scheme.mPos, &scheme.mLen,
+                                 nsnull, nsnull,
+                                 &path.mPos, &path.mLen);
+    if (NS_FAILED(rv)) return rv;
+

Why use the whole parser when you just want to know if there is a scheme? We
have ExtractURLScheme for that which does a much better job. And if you must use
the full parser please don't use the noAuthParser, you can't make any
assumptions about the url given to Resolve.
brendan: thanks for the review comments.

andreas: yeah, i meant "host" not "path."  thanks for catching that.
i'm calling the parser instead of ExtractURLScheme because i need to know both
the scheme as well as the path.  if there is something missing from the scheme
parsing logic in nsBaseURLParser then that should be fixed.  i don't see the
point of having ExtractURLScheme.
andreas: let me answer your question a bit more directly: i want to optimize for
the case when the relativeURL is actually relative, and normally we wouldn't
expect there to be a scheme, but there would always be a path.  so, i'm calling
the parser to extract these portions.  and it isn't parsing the whole URL.. it
is only looking for the scheme, auth, and path sections.  i use the no-auth
parser, since i know that relativeURL's should not have an authority section. 
isn't this the case?  nsStandardURL::Resolve passes all of the relative url
tests... is there some case that i'm not considering?
darin: I haven't found the usage of the path inside Resolve on first scanning,
are you sure? ExtractURLScheme looks a bit at the characters inside the scheme
and returns false if there is an invalid character. Maybe that can be moved into
the urlparser too.
My guess is that 90% of all calls on startup to Resolve through MakeAbsolute are
done by already absolute URLs; we should optimize for that. And since most of
the time we have an absolute URL there (with every thinkable scheme!), it is
dangerous in my opinion to scan with the noAuthParser.
andreas: you're right... |path| isn't even being used!  but it was meant to be,
and that is an error.  it is needed in case there is some extra junk at the end
of relpath that the parser wants to trim off (such as a trailing \n).  so
references to relpath should really be references to something a kin to
Substring(relpath, path.mPos, path.mLen).

regarding the appropriateness of the no-auth url parser... i don't see why it
would matter which url parser is invoked.  all of the basic url parsers make the
same assumptions when parsing the scheme... in fact they are implemented by the
same piece of code (see nsBaseURLParser::ParseURL).  the distinction between a
no-auth url parser and an auth url parser lies in what follows the scheme,
right?  so there is no reason to really care which parser is invoked.  except
in this case, we know that we also want the path section of relpath, so as i
said, i intended to use the |path| result to extract the url path from relpath.
actually, the old nsStdURL::Resolve didn't do any cleansing of trailing
whitespace or control characters, so as a first pass i'm just going to eliminate
the |path| local variable, and i'll make ParseURL return early if both |authLen|
and |pathLen| are null (ie. if the caller isn't interested in the authority
section or the path section, then don't bother parsing them).  i'll also move the
extra scheme validation checks from ExtractURLScheme into ParseURL.  i'll make
ParseURL return NS_ERROR_MALFORMED_URI if it detects invalid chars in the scheme.
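
(A sketch of that plan -- abbreviated, with a hypothetical body; the signature
follows the patch's ParseURL as quoted above: validate scheme characters while
scanning, and return early when the caller passes null for both the authority
and path out-params.)

nsresult ParseURL(const char *spec, PRInt32 specLen,
                  PRUint32 *schemePos, PRInt32 *schemeLen,
                  PRUint32 *authPos, PRInt32 *authLen,
                  PRUint32 *pathPos, PRInt32 *pathLen)
{
    // if a scheme is present (a ':' before any '/'), validate it: it must
    // start with a letter and contain only [a-zA-Z0-9+-.]
    const char *colon = nsnull;
    for (const char *p = spec; *p && *p != '/'; ++p) {
        if (*p == ':') {
            colon = p;
            break;
        }
    }
    if (colon) {
        if (!nsCRT::IsAsciiAlpha(*spec))
            return NS_ERROR_MALFORMED_URI;
        for (const char *p = spec + 1; p < colon; ++p) {
            if (!nsCRT::IsAsciiAlpha(*p) && !nsCRT::IsAsciiDigit(*p) &&
                *p != '+' && *p != '-' && *p != '.')
                return NS_ERROR_MALFORMED_URI;
        }
        // ... record *schemePos / *schemeLen here ...
    }

    // if the caller isn't interested in the authority or path sections,
    // don't bother parsing them.
    if (!authLen && !pathLen)
        return NS_OK;

    // ... parse the authority and path sections ...
    return NS_OK;
}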
Yes, nsStdURL::Resolve's goal was to create a good-looking url string with
minimal effort (without parsing too much). Everything else was left to the
SetSpec call that follows most of the time.
Attachment #55329 - Attachment is obsolete: true
Comment on attachment 55674 [details] [diff] [review]
v1.1 revised per comments from brendan and andreas

accidentally left some debugging crap in this patch... ugh!
Attachment #55674 - Attachment is obsolete: true
Attachment #55674 - Flags: needs-work+
so, i tested this patch on Cathleen's P200, and got the following startup times
(in seconds):

    [new code]                  [old code]
     18.62                       18.95
     19.11                       19.01
     18.67                       19.01
     18.57                       18.95

removing the worst time from each, and averaging this comes out to:

     18.62                       18.97

so the new code amounts to approximately 1.9% savings at startup.

the next thing to test is how this patch affects page load times.
Okay, I applied the patch on a linux build, it seems to work. A few comments:

1)

There is an indentation problem with mozilla/netwerk/base/src/Makefile.in on the
entry for nsStandardURL.cpp. Please use 2 tabs instead of 8 spaces. Emacs marks
this line as suspicious with good reason.

2)

I see a new assertion:

###!!! ASSERTION: NS_ENSURE_TRUE(unknownOther) failed: 'unknownOther', file
nsStandardURL.cpp, line 1039
###!!! Break: at file nsStandardURL.cpp, line 1039
Someone tries to call Equals with a null pointer. Not a problem of the new code,
but it points to a problem somewhere else.

3)

I checked with urltest and see some changes:

Doing urltest -file res/urlparse.dat

I see two more failures than usual:

Testing:  http://username:password@hostname.com:80/pathname/./more/stuff/../path
Result:
http,username,password,hostname.com,80,/pathname/more/,path,,,,,http://username:password@hostname.com:80/pathname/more/path
Expected:
http,username,password,hostname.com,80,/pathname/more/,path,,,,,http://username:password@hostname.com/pathname/more/path
        FAILED

Testing:  http://username:password@hostname:80/pathname
Result:  
http,username,password,hostname,80,/,pathname,,,,,http://username:password@hostname:80/pathname
Expected:
http,username,password,hostname,80,/,pathname,,,,,http://username:password@hostname/pathname
        FAILED

We no longer suppress the port when it is given but is the default port.



Doing urltest -file res/urlparse_unx.dat

I see 2 failures;
Testing:  file://home
Result:   file,,,,-1,/,home,,,,,file:///home
Expected: file,,,home,-1,/,,,,,,file://home/
        FAILED

Testing:  file:home
Result:   file,,,,-1,/,@(,k+@,,,,file:///@(%C9%05%08%03%00%00%00D%EB%FF%B...
...(very, very long line)...00%00%00file%3Ahome
Expected: file,,,,-1,/,home,,,,,file:///home
        FAILED
The first problem points to a loss of functionality in the noAuthParser.
Although there are usually no hosts in file urls, the syntax allows it and it is
sometimes used. So there are some special cases where there are "authorities" in
file urls. I started out the same way as this patch does, but had a bunch of
bugs filed against it. All those bugs will have to be reopened ...

The second problem is either with urltest.cpp or with the parser, haven't
figured that out yet. Something points to nowhere ...
It's a parser problem. When searching for the rightmost / you don't take into
account that there might be none. I changed ParseFilePath to get it working:

    // search backwards for filename
                            ========    <= better directory delimiter
    for (p = end - 1; *p != '/' && (p - filepath) > 0 ; --p)
                                =====================

The default port stuff is also an easy fix, just use the lines from
nsStdURL::GetSpec.
The new assertion is caused when this url
chrome://navigator/content/navigator.xul is compared with a null pointer.
 
Attachment #55677 - Flags: needs-work+
I just found that you removed nsStdUnescape from nsEscape.cpp and replaced it
with the old unescape stuff. That's bad, because the old stuff contains at least
one nasty error when dealing with a % followed by chars that are not from the
hex range. You should resurrect it ...
 
andreas: thanks for all the feedback!

the port problem looks like a regression... i must have regressed that when i
was trying to optimize SetSpec.

thanks for pointing out the other issues... i hate the fact that file://home
shouldn't be interpreted with home as a hostname.  that just seems wrong.  do
you have any idea which bugs would regress if we didn't "fix" this? 
I have a "patched" patch in my local tree that now performs well with the
exception of the file://home stuff. Instead of resurrecting the contents of
StdUnescape I fixed the old stuff.

file://home    In the old nsNoAuthURLParser there was some special stuff to deal
with this case. That is now gone, take a look. On first search I could find 
bug 32997.
 
I just found a big regression in mail/news. I can't read any mail stored in
folders, that includes the inbox. I get a message that the file could not be
found. Darin could you please check if this happens in your tree too.
andreas: i have patches in my tree to handle everything but file://home and the
unescape stuff... could you please submit your revised patch for
nsEscape.{h,cpp}?  thanks!
i'm also experiencing problems with imap... nntp works fine though. 
investigating...
i just played around with standard url using xpcshell, and it looks like
nsStandardURL::GetPreHost is busted.  it just returns empty on a URL like:

imap://darin@nsmail-2.mcom.com/

however url.username comes out correctly.
With pop3 mail it seems we are trying to read a file including the query part.
The mailbox url is simply turned into a file url.
nevermind... turns out that i was typing "url.prehost" instead of "url.preHost"
nevermind ... I see the same file/mailbox urls in an unpatched build. Must be
something else ... Seth, any ideas where to look?
as far as file://home is concerned we currently are kind of broken anyways. 
nsLocalFile::SetURL looks for a host and simply prepends it to the path.  it
does not handle the host as a UNC hostname.  so in some sense bug 32997 still
exists today.  looks like nsStdURL::GetFile simply ignores the hostname section,
which makes

file:/// == file://localhost/ == file://xyz/

how confusing is that?!?  to do the same we could simply make the file protocol
use the auth parser, but it hardly seems right.  perhaps, file://localhost/
should map to file:////localhost/, but then we'd need to add logic elsewhere to
ignore the UNC-hostname part of the path.
ok, i discovered the problem with Imap... it was actually caused by a
miscalculation of the path/filePath length in BuildNormalizedSpec.  the fix is
trivial and now Imap is working again.
Attached patch v1.3 (obsolete) — Splinter Review
Attachment #55677 - Attachment is obsolete: true
this latest patch handles everything andreas raised except for the file://home
issue.  i'm still thinking about other ways of dealing with this problem.
(p - filepath) > 0

=> p > filepath ?

benc said elsewhere that file://foo/etc/bar should only work from computers who
are AKA foo. eg file://www.netscape.com/etc/passwd should only work from
computers who think/know they are www.netscape.com, and not my computer since
mine knows it isn't.

a reminder that microsoft has a preferred interpretation for
+// eg. file:foo/bar.txt
which says it's a relative path.
so if you encounter it in c:\windows\desktop\something.url microsoft will go to
c:\windows\desktop\foo\bar.txt

speaking of nonstandard behaviors, try this in lynx:
http://www.mozilla.org:http/ -- having just read the spec today i can tell you that
this isn't valid per at least one iteration of the spec.

of course, implementing that could make urls really interesting:
http://user:password@host:protoportname/path@idontcare
   p - filepath > 0
=> p - filepath + filepath > 0 + filepath
=> p > filepath

ok, so file: URLs aren't exactly no-auth URLs.  there is an authority section.
but we can't just use the auth parser for file: URLs either because it makes
different assumptions about file:/foo/bar/ and file:foo/bar

what we need is an URL parser that makes the following normalization:

file:foo/bar    => file:///foo/bar
file:/foo/bar   => file:///foo/bar
file://foo/bar  => file://foo/bar
file:///foo/bar => file:///foo/bar

so, for all but the 3rd, the authority section is empty.  this to me sounds like
another specialization of nsBaseURLParser::ParseAfterScheme.  perhaps we need a
nsIStandardURL::InitWithParser method, so we can implement a custom URL parser
for file: URLs in netwerk/protocol/file/src.  or, we could just add another
standard URL parser for file: URLs.
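
(A minimal sketch of that normalization, under the assumption of a
ParseAfterScheme-style hook -- the helper name is made up for illustration:)

void NormalizeFileSpec(const char *afterScheme, nsCString &result)
{
    // exactly two slashes => keep the authority: file://foo/bar
    if (afterScheme[0] == '/' && afterScheme[1] == '/' && afterScheme[2] != '/') {
        result.Assign("file:");
        result.Append(afterScheme);
        return;
    }
    // file:foo/bar, file:/foo/bar, file:///foo/bar => empty authority
    while (*afterScheme == '/')
        ++afterScheme;
    result.Assign("file:///");
    result.Append(afterScheme);
}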
NoAuthUrlparser was a bad name choice, but at that point in time I really
thought there was no authority in file urls. I should have called this thing
LocalUrlParser, which would have fitted much better. I don't think we need a new
specialisation, all noAuth urls are most likely really local urls.

If you want to make that distinction, just add the new parser, put it into the
global objects part and change the New_URI part in the file protocol handler to
use that parser by default. No need for any new methods.
but file: urls are not only able to reference local resources.  isn't that the
point of the authority section: that it can reference any host?  of course, we
only support local file urls, but that doesn't restrict the meaning of the host
section for a file url.

i think that no-auth still makes sense.  auth always tries to err on the side of
ensuring that there is an authority section, noauth always treats what may look
like an authority section as part of the path.  std is in between.  and file
needs something else that is also in between auth and noauth, but not equal to
std.  so i really think that file needs a separate url parser.  i'm just not
sure what it should be called... perhaps file??  local just seems incorrect.
Hmmm ... we could call it LocAuth for LocalAuth. If we do this and use it for
file NoAuth becomes unused at the moment, as far as I know.
when i started working on this bug i promised myself that the first revision
wouldn't involve any api changes... now it seems we're talking about api
changes, so i think that i should forget renaming/replacing no-auth for now, and
just concentrate on getting the performance improvements in.  there can be
another bug later on about cleaning up the api.
But then let the current noAuth have the original functionality of the old
NoAuth, put in the file:// handling and the drive-detection code for XP_PC.
right.. v1.3 has the drive detection code (though it's not quite in the right
place).  i'll upload new nsURLParsers.{h,cpp} as well as a new complete patch.
Attached patch v1.4 (obsolete) — Splinter Review
v1.3 including url.tgz = v1.4 (?) looks good; it passes all the urlparser tests
it should. But I still see the problem in mailnews. I can't access any mail
stored in the mail folders of pop3 mail accounts or the Local Folders.
andreas: v1.4's nsStandardURL.cpp fixes the mailnews problem.  please give it a
try.  thx!
actually, no.. v1.3 nsStandardURL.cpp and v1.4 nsStandardURL.cpp are identical.

andreas: can you enable NSPR_LOG_MODULES=nsStandardURL:5, try loading your pop3
folders, and submit the log?  this might reveal the problem.  thx!
hmm.. there's nothing odd about those URLs.. i also can't detect any difference
in the parse results.  i'm going to try setting up a pop3 account myself to see
if i can reproduce the problem.
i setup a pop3 test account and am likewise noticing a problem... but the
problem seems a bit different than what you describe.  in my case i get an alert
dialog that says it can't find a file of the form:

/C|/Documents%20and%20Settings/darinf/Application%20Data/Mozilla/Profiles/test/ ...

the rest of the text exceeds the width of the dialog.  anyways, it looks like
someone is getting the raw escaped path when they shouldn't.  i suspect it may
be a problem related to my changes to nsStandardURL::[GS]etFile().
I think it's the same thing as what I see; maybe you should try putting in the
old Get/SetFile stuff for now. Get/SetFile is a likely culprit, and it makes
sense since the url itself looks fine. I'm signing off for today, it's close to
2AM over here and I'm too tired to get anything useful done.
Okay, the mailnews problem is that we get the query part of the url as part of
the FileBaseName inside SetURL. Looking more into it.
andreas: thanks for figuring out the mailbox bug... i had mistakenly thought
that i could get away with not calling ParsePath.
What are the RFCs after 1738 that I should be reading? I need to catch up on
some of the terms used here.

re: file://string, that is probably illegal, or if you are being nice, it would
be mapped to file://string/, which would presumably be the top level of the
filesystem if the browser is running on a host named "string". Our current file
URL implementation always ignores the hostname, so if I'm reading your comments
correctly, things might be working fine.
darin, I'll sr= as soon as you're happy with testing results and have an r=. 
Cc'ing myself so I can see the bugmail.

/be
darin: Are your problems with the pop3 account on windows gone with the change
to nsLocalFileCommon.cpp?
andreas: yes, your change to nsLocalFileCommon.cpp fixed the problem i was
seeing with POP3.  thanks again!  uploading new "final" patch shortly.
Attached patch v1.5 (obsolete) — Splinter Review
Attachment #56206 - Attachment is obsolete: true
Attachment #56295 - Attachment is obsolete: true
darin: which files changed between v1.4 and 1.5? Only nsEscape.cpp and
nsLocalFileCommon.cpp? What about the unimplemented setters in nsStandardUrl.cpp?
Will you do them?
andreas: yes, i changed those two files according to your latest diffs and then
added a bit of documentation to nsIURL.  otherwise, it is equivalent to the v1.4
patch.  the setters that are unimplemented in nsStandardURL were also
unimplemented in nsStdURL, so there should be no loss of functionality.
Comment on attachment 56600 [details] [diff] [review]
v1.5

r=andreas.otte@debitel.net (tested on linux, please get some more r=)
Attachment #56600 - Flags: review+
Comment on attachment 56615 [details] [diff] [review]
patch to nsXULDocument.cpp to rev the fastload version number

got a rs=brendan on this change
Attachment #56615 - Flags: superreview+
Comment on attachment 56600 [details] [diff] [review]
v1.5

Underindented by 1, or a (somehow hidden from me) tab?

 if (urlPart)
-            *urlPart = nsCRT::strdup(scheme.get());
+           *urlPart = nsCRT::strdup(scheme);

Why not combine cases below, one Extract... call under the 3 case labels?

+        else {
+            switch (flag) {
+            case url_Param:
+                ExtractUrlPart_Helper(urlString, paramPos, paramLen, startPos, endPos, urlPart);
+                break;
+            case url_Query:
+                ExtractUrlPart_Helper(urlString, paramPos, paramLen, startPos, endPos, urlPart);
+                break;
+            case url_Ref:
+                ExtractUrlPart_Helper(urlString, paramPos, paramLen, startPos, endPos, urlPart);
+                break;

This char-at-a-time (after the first \r, \n, or \t) Append loop can hurt:

+    PRBool writing = PR_FALSE;
+    for (const char *p = str; len && *p; ++p, --len) {
+        if (PL_strchr("\r\n\t", *p)) {
+            writing = PR_TRUE;
+            result.Assign(str, p - str);
+        }
+        else if (writing)
+            result.Append(*p);
+    }
+    return writing;

How about this instead?

+    PRBool writing = PR_FALSE;
+    for (const char *p = str, *q = p + len; p < q; str = ++p) {
+        while (*p && *p != '\r' && *p != '\n' && *p != '\t')
+            ++p;
+        if (!writing) {
+            if (*p == '\0')
+                break;
+            writing = PR_TRUE;
+        }
+        result.Assign(str, p - str);
+    }
+    return writing;

Why not nsCAutoStrings here and below?

+    // buffers for holding escaped url segments (these will remain empty unless
+    // escaping is required).
+    nsCString escapedScheme;
+    nsCString escapedUsername;
+    nsCString escapedPassword;

Maybe jag knows of a way here:

+    mSpec = buf; // XXX too bad there isn't some way of avoiding this strdup!
+    return NS_OK;

Sorry, tired and I don't trust the stinking plaintext (textarea) editor.
More tomorrow.

/be
Attachment #56600 - Flags: superreview+
I guess those could be nsCAutoString, stack allocation is pretty cheap and it'll
save us an alloc in most cases when escaping is needed, since I suspect most
parts to be less than 64 characters.

I've looked at the |mSpec = buf;| before. Basically what this needs is |Adopt|
on nsSharableString. dbaron has a patch for this in bug 104663. That doesn't
quite help nsCString right now, so maybe we should hack this method on nsCString
for the time being.
i went with nsCString thinking that i was optimizing for the case when escaping
wasn't necessary, but i guess at 64 bytes apiece nsCAutoStrings wouldn't be too
costly.
thanks for all the good review comments... i'll put out a new patch right away.
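
(For reference, a sketch of what |Adopt| would buy here once it exists -- see
bug 104663; hypothetical until then:)

    // today: the assignment copies the buffer, so we must also free it
    mSpec = buf;
    free(buf);

    // with Adopt: the string takes ownership of the malloc'd buffer,
    // avoiding the extra copy and allocation
    mSpec.Adopt(buf);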
Attached patch v1.5 (obsolete) — Splinter Review
the latest patch fixes the problems pointed out by brendan.  i went with the
following algorithm for FilterString:

PRBool
nsStandardURL::FilterString(const char *str, nsCString &result)
{
    PRBool writing = PR_FALSE;
    result.Truncate();
    for (const char *p = str; *p; ++p) {
        if (*p == '\t' || *p == '\r' || *p == '\n') {
            writing = PR_TRUE;
            // append chars up to but not including *p
            result.Append(str, p - str);
            str = p + 1;
        }
    }
    if (writing)
        result.Append(str, p - str);
    return writing;
}
tested this out using urltest.exe and it seems to work correctly.
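
(Expected behavior, as a hypothetical trace of the function above:)

// FilterString("http://foo/\r\nbar", buf) => buf == "http://foo/bar", PR_TRUE
// FilterString("http://foo/bar", buf)     => buf left empty, PR_FALSE
//                                            (caller keeps the original string)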
I looked over the v1.5 patch and spoke with Darin about any questions I had.  I
didn't look at the code in context, but what I did see looks okay. r=gordon
bug 108734 blocks this patch... i think i can work around that bug for now.
nsCString::Replace can also cause a hang... see bug 108738.  so, i definitely
should not be using it.
i talked with jag... he's going to try to fix these bugs, but we're not going to
wait on those fixes before landing this patch.  so, i'm going to replace all
instances of string.Replace, with a inline static helper function that instead
calls Cut followed by Insert.  none of the Replace calls are on the critical
path so i don't think we need to worry too much about the perf hit of not using
Replace.
later when the string bugs are fixed, we can easily revert to the current patch.
i discovered a crash (unrelated to the string.Replace problem) while running the
page loader test.  the stack trace on my MOZ_PROFILE win32 build is all screwed
up, but it looks like someone is calling nsIURI::Resolve with a junk |this| pointer.

time to break out the debug build... this patch unfortunately might not make it
for 0.9.6 :(
For the record, I've pointed out to darin that the |for| loop he's using won't
build everywhere since the |p| is declared within the scope of the |for| loop
and won't be accessible below it (though some compilers get the scoping wrong,
which is why it compiled for him).
Attached patch v1.6 (obsolete) — Splinter Review
Attachment #56600 - Attachment is obsolete: true
Attachment #56743 - Attachment is obsolete: true
Attached file diff -u v1.5 v1.6 # slightly annotated (obsolete) —
patch v1.6 accidentally includes changes to netwerk/test/makefile.win to build
foo.exe -- this'll be removed before committing the patch.

andreas, brendan, gordon: can i get updated r='s and sr='s from you guys on the
latest patch?  thx!

-> 0.9.7 (since there's no reason to risk destabilizing 0.9.6 for this patch)
Target Milestone: mozilla0.9.6 → mozilla0.9.7
Blocks: 109179
darin: could you include a complete final patch, v1.6 is only some stuff about
nsAString.
Attached patch v1.6 (for real this time) (obsolete) — Splinter Review
Attachment #56133 - Attachment is obsolete: true
Attachment #56286 - Attachment is obsolete: true
Attachment #56411 - Attachment is obsolete: true
Attachment #56413 - Attachment is obsolete: true
Attachment #56615 - Attachment is obsolete: true
Attachment #56864 - Attachment is obsolete: true
Blocks: 109255
Comment on attachment 57254 [details] [diff] [review]
v1.6 (for real this time)

r=andreas.otte@debitel.net
Attachment #57254 - Flags: review+
what I want is an URL parser that makes the following mangling:

file:foo/bar    => foo/bar
(./foo/bar if you prefer, it doesn't matter to me, as long as we can provide a 
compatible relative path form for ms .url files, and "bare/word" alone doesn't 
work in them, so i think we pretty much have to support 
"foo:notreallybare/word" => "notreallybare/word")

I know that no one likes it, but ms uses it, how much harm would come about 
from us doing that?
A canonical URL? Lets make an new bug for this...
I agree, that should be a new bug, but I wouldn't give it a good chance, because
that kind of url violates rfc2396 when interpreted as a relative url. For
these .url files, what about having a base url like file:///wherever and a
relative url like bare/word?
Blocks: 100212
So I'm willing to sr= this because I want to see it land...

* NS_EscapeURL() should use nsACString not nsCString as its result
* shaver sez that true out parameters are not "optional" - meaning you can't
pass in nsnull and expect it to work.. what you can do instead is just pass in
one dummy PRInt32 to each of these.
* its a litte late now, but your URLSegment is essentially the same as an
nsDependentSingleFragmentSubstring, which means you could be passing them around
as nsAString-level classes instead of constantly pulling out mPos and mLen
* doesn't look like you're freeing "buf" in BuildNormalizedSpec
* there may be some cool way of avoiding that extra malloc() in
BuildNormalizedSpec() with CBufDescriptor.. there may be some way to pass off
the malloced buffer to the nsCString...but I'm not an expert with that.
  the code itself might be cleaner if you make buf into an nsCString (or maybe
nsCAutoString) and do a SetCapacity(approxLen+32) - that will also make sure you
never overflow that buffer, because nsCString ought to do it for you.
* a comment just to say that the URLSegment's in nsStandardURL are relative to
mSpec might make it a little clearer what they are for :)

Comment on attachment 57254 [details] [diff] [review]
v1.6 (for real this time)

walked through the v1.6 changes with brendan today and got a verbal sr=brendan
Attachment #57254 - Flags: superreview+
alec: thanks for the comments... didn't see them before sitting down with brendan.

1- i'll make the change to NS_EscapeURL.

2- i disagree, what's the point?

3- nsDependentSingleFragmentCSubstring wouldn't work because i need to be able
to change the underlying buffer out from under the offsets/lengths.  in other
words, the URLSegments cannot be replaced with |char *start|, |char *end|.

4- yikes!  ...you're right.  bad me!

5- originally i used SetCapacity and Append to build up the normalized spec, and
i found it to be really really slow... 8-20% slower than the current method.  i
spent some time with jag trying to understand where the cost is, but we weren't
able to get a good understanding on it... seems like Append is much much more 
expensive then it should be.  and since the cost of the additional malloc is
only 1.5% of SetSpec, i figured it would be sufficient until we get support for
nsCString::Adopt.

6- yeah, i'll add that.
I disagree with shaver as well, thought I'd bring it up, see what other folks say :)

Anyway, all sounds good (esp. good to hear about the perf analysis)
1- made NS_EscapeURL take a nsACString
2- added free(buf) after mSpec = buf;
3- switched to use malloc instead of PR_Malloc per discussion with brendan
4- added comment above URLSegment member variables "relative to mSpec"

assuming r=andreas, sr=brendan,alecf
Attachment #56866 - Attachment is obsolete: true
Attachment #57254 - Attachment is obsolete: true
Attachment #57990 - Flags: superreview+
Attachment #57990 - Flags: review+
landed-on-trunk
marking FIXED.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED