Closed Bug 73845 Opened 23 years ago Closed 23 years ago

stdURL parsing is broken.

Categories

(Core :: Networking, defect, P1)

x86 Linux

Tracking


VERIFIED FIXED
mozilla0.9.2

People

(Reporter: jud, Assigned: dougt)


Details

(Keywords: crash, Whiteboard: PDT+, critical for 0.9.2; ready for checkin)

Attachments

(3 files)

There are various crashers cropping up because parsing a URL sometimes produces
a null mScheme. The URL referenced by this bug points to the place where we can
wind up with a null mScheme. The crashers manifest themselves in ::IsScheme()
when we go to dereference mScheme.
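For illustration, here is a rough sketch of the kind of dereference involved (not
the actual nsStdURL.cpp source; the method shape is assumed from the description
above):

    // Illustration only: if the parser never filled in mScheme, the
    // PL_strcasecmp below dereferences a null pointer and we crash.
    NS_IMETHODIMP
    nsStdURL::IsScheme(const char *i_Scheme, PRBool *o_Result)
    {
        *o_Result = (PL_strcasecmp(mScheme, i_Scheme) == 0); // boom if mScheme is null
        return NS_OK;
    }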
Blocks: 64833
Keywords: mozilla0.9
Perhaps we should do the following for now?

Index: nsStdURL.cpp
===================================================================
RCS file: /cvsroot/mozilla/netwerk/base/src/nsStdURL.cpp,v
retrieving revision 1.69
diff -u -r1.69 nsStdURL.cpp
--- nsStdURL.cpp	2001/03/13 02:00:54	1.69
+++ nsStdURL.cpp	2001/03/28 22:24:11
@@ -347,6 +347,11 @@
     nsresult rv = mURLParser->ParseAtScheme(i_Spec, &mScheme, &mUsername, 
                                             &mPassword, &mHost, &mPort, 
                                             &ePath);
+    NS_ASSERTION(mScheme, "we should have a scheme");
+    if (!mScheme) {
+        CRTFREEIF(ePath);
+        return NS_ERROR_FAILURE;
+    }
     if (NS_SUCCEEDED(rv)) {
         // Now parse the path
         rv = mURLParser->ParseAtDirectory(ePath, &mDirectory, &mFileBaseName, 
I'm going to investigate this tonight. We could avoid the crash as jud suggests,
but I am keen to find out how we got here. If anyone has a reproducible case
that leads us to this state, I'd appreciate it.
rick: jud tells me that you may have a reproducible case for this?
I've forwarded gagan a mail msg that causes this crash.
Keywords: crash
Here's what Rick found during investigation.

There's code that creates *raw* URI/URL objects (nsSimpleURI or nsStdURL) using
the component manager. That code then turns around and does a
uri->SetSpec("some uri string"); then it asks for the scheme, host, username,
password, whatever (a sketch of the pattern follows the list below). The point
is that people are creating URLs just to get string parsing functionality out of
them. This is bad for several reasons:

1. If the string they pass in isn't recognized by our parser, we fail.
2. Building up a URL object just for parsing is overkill.
3. It allows a URL to be created without a scheme.
4. The model circumvents our entirely flexible necko protocol handler model
because it doesn't go through necko for URI creation.
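A rough sketch of that pattern (the contract ID, spec string, and error handling
here are only illustrative, not taken from any particular call site):

    // Anti-pattern: building a raw nsStdURL just to get string parsing,
    // bypassing necko's protocol handlers entirely.
    nsCOMPtr<nsIURI> uri = do_CreateInstance("@mozilla.org/network/standard-url;1");
    if (uri) {
        uri->SetSpec("customscheme://user@host/path"); // a scheme our parser may not recognize
        nsXPIDLCString scheme;
        uri->GetScheme(getter_Copies(scheme));         // can come back null
        // ...caller then uses |scheme| without checking it, and crashes
    }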

Suggested fixes:

1. Hang general URI string parsing off of nsIIOService so all these parsing
usages can use that. nsIIOService's impl would *just* do *simple*,
specification-based heuristic parsing on the string and hand out strings to the
caller (much like the old NET_ParseURL() did).
2. For those really needing a URL object, they would need to be fixed up to go
through necko as originally intended.
Target Milestone: --- → mozilla1.0
Due to high interest from the very important embedding customer, I'm setting
this to 0.9.1 (for renewed consideration).
Target Milestone: mozilla1.0 → mozilla0.9.1
Blocks: 75664
Whiteboard: needed by 05/08/01
The overall cleanup is going to be bigger than the scope of this bug, but I
would go ahead and add the check proposed by jud to prevent crashing (for now).
Someone give me an sr on jud's patch?
we should sooo avoid my patch, it will sweep the problem under the carpet and
we'll forget about the real solution and battle incarnations of this for years
to come.
No, you misunderstood my comments. Your patch is a temporary fix to get over the
problem while rpotts and I work on the good, clean fix, which may take longer
than the 0.9.1 deadline. I am totally in favor of fixing this the right way;
it's just that we may not be able to get that "right" way in time.
gagan, could we do the following?
- keep this bug open
- create a new bug just for the patch in the current bug
- check-in the patch in both 0.9 branch and 0.9.1 trunk by 11/May/01 (after the
proper procedures are observed)
- close the newly created bug for the patch and keep the current bug open
dougt taking over URL parsing stuff... 
Assignee: gagan → dougt
Bug 73323 is another bug where the caller had to be fixed. When this lands, we
need to make sure to remove all these temporary fixes.
Blocks: 73323
REPEAT OF QUESTIONS TO GAGAN: 
dougt, could we do the following?
- keep this bug open
- create a new bug just for the patch in the current bug
- check-in the patch in both 0.9 branch and 0.9.1 trunk by 11/May/01 (after the
proper procedures are observed)
- close the newly created bug for the patch and keep the current bug open

No longer blocks: 64833, 75664
well:

- keep this bug open

until it is fixed, I will :-)

- create a new bug just for the patch in the current bug

I don't see the point.

- check-in the patch in both 0.9 branch and 0.9.1 trunk by 11/May/01 (after the
proper procedures are observed)

we are past this date.

- close the newly created bug for the patch and keep the current bug open

I don't see the point.


I am with Jud here. The patch is not something that we should do. People will
be implementing against 0.9 and they may do the wrong thing here. If your
customer wants to patch their tree, have them do that, but let's not apply this
patch to mozilla.
First, "hang[ing] general uri string parsing off of nsIIOService", is not really 
that easy.  Parsing URI's is nontrival and I don't want to see it in two places.

The way to do URL parsing is to create a URL Parser via the component manager.  
A problem is that there are 3 parsers to choose from: std, auth, and noauth.  
This is what creating a standard uri buys you - not having to deal with the 
complexities of parsing.  
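For reference, getting at one of those parsers directly looks roughly like this
(mirroring the GetService call quoted later in this thread; treat it as
illustrative only):

    // Grab the preloaded "std" parser straight from the service manager.
    // Note: this skips any scheme-to-parser mapping entirely.
    nsIURLParser *parser = nsnull;
    nsresult rv = nsServiceManager::GetService(kStdURLParserCID,
                                               NS_GET_IID(nsIURLParser),
                                               (nsISupports **)&parser);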

If users are going to call SetSpec on a standard URL that they created via the
component manager, they must expect nulls from the standard getters. Garbage in,
garbage out. Any simplified parser (as suggested) would not be expected to do
more.

I don't think that "building up a url object for parsing is overkill.".   It is 
the only way to avoid "circumvent[ing] our entirely flexible necko protocol 
handler model".

My suggestion is that clients call NS_NewURI() to ensure that they will support
pluggable protocols. To state the contrapositive: if you don't want to support
pluggable protocols, don't call NS_NewURI().
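A minimal sketch of the recommended route (the spec string is made up and the
error handling is only illustrative):

    // Going through necko: NS_NewURI asks the registered protocol handler to
    // create the URI, so pluggable protocols keep working and failure is explicit.
    nsCOMPtr<nsIURI> uri;
    nsresult rv = NS_NewURI(getter_AddRefs(uri), "myscheme://example.org/doc");
    if (NS_FAILED(rv))
        return rv;  // unknown scheme or unparsable spec
    nsXPIDLCString scheme;
    uri->GetScheme(getter_Copies(scheme)); // safe to use only after checking rv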

If clients have a hell-bent need to create a standard URI outside of necko, they
should remember to pass in a qualified URL string and program defensively. I am
not sure that this should be officially supported in any embedding context.

Just some thoughts.... comments?
Status: NEW → ASSIGNED
The problem is that we have several clients in the current tree who don't
understand the pluggable protocol architecture, and consequently they don't
expect nulls back. As a result, when you put our code in a pluggable-protocol
environment (someone adding their own protocol handler) we fail in several
places, because people are trying to parse URLs using the object from the
component manager.

The idea behind hanging URL parsing off of nsIIOService was to just allow basic
string parsing, nothing fancy, like the old NET_ParseURL() function. Creating a
full-blown URI object is indeed overkill if all you're doing, over and over and
over again, is looking for the scheme.

The point is that we have many places in the system where all people want is
string parsing, and it's been suggested that we provide a centralized place for
that, nsIIOService, so we get *some* degree of consistency.

Another option would be to go around to all those places and have them do the
string parsing themselves (lots of room for error there). And another option,
which you point out, is that all callers could be "fixed" to use NS_NewURI() and
deal with failures gracefully.
I think that I am arguing that all callers should either (a) be "fixed" to use 
NS_NewURI(), or (b) if their parsing is simple enough (like getting the scheme) 
they should do it themselves.  


I think that exposing some "simple url parser" on the IOService will just be
plagued with bugs: "It doesn't handle a, b, c", "It breaks this load", etc. I'd
much rather have correct parsing even if it costs a bit more.
Blocks: 75664
Call sites (as of May 21) that must change:

content\xbl\src\nsBindingManager.cpp(877)
content\xbl\src\nsXBLService.cpp(1151)
editor\base\nsHTMLDataTransfer.cpp(662)
embedding\browser\powerplant\source\CBrowserApp.cp(416)
embedding\components\ui\helperAppDlg\nsHelperAppDlg.js(422)
extensions\inspector\resources\content\prefs\pref-sidebar.js(17)
extensions\pipella\ui\content\protoCommon.js(435)
extensions\python\xpcom\file.py(78)
extensions\python\xpcom\file.py(155)
extensions\wallet\signonviewer\SignonViewer.js(166)
extensions\wallet\src\singsign.cpp(2044)
extensions\wallet\src\singsign.cpp(2168)
extensions\xml-rpc\src\nsXmlRpcClient.js(81)
js\src\xpconnect\loader\mozJSComponentLoader.cpp(88)
mailnews\base\src\nsMsgServiceProvider.cpp(107)
mailnews\base\src\nsStatusBarBiffManager.cpp(115)
mailnews\compose\resources\content\MsgComposeCommands.js(1937)
mailnews\compose\src\nsMsgCompFields.cpp(130)
modules\libimg\src\if.cpp(1749)
netwerk\build\nsNetCID.h(71)
netwerk\build\nsNetModule.cpp(635)
rdf\chrome\src\nsChromeRegistry.cpp(213)
rdf\chrome\src\nsChromeRegistry.cpp(1321)
rdf\chrome\src\nsChromeRegistry.cpp(2514)
rdf\chrome\src\nsChromeRegistry.cpp(2541)
rdf\chrome\src\nsChromeRegistry.cpp(2837)
rdf\chrome\src\nsChromeRegistry.cpp(3134)
uriloader\exthandler\nsExternalHelperAppService.cpp(172)
webshell\tests\viewer\samples\soundtest.html(8)
widget\src\beos\nsFilePicker.cpp(236)
widget\src\mac\nsFilePicker.cpp(627)
widget\src\os2\nsFilePicker.cpp(229)
widget\src\windows\nsClipboard.cpp(659)
widget\src\windows\nsFilePicker.cpp(269)
xpfe\browser\resources\content\metadata.js(414)
xpfe\browser\resources\content\sessionHistoryUI.js(132)
xpfe\browser\resources\content\sessionHistoryUI.js(135)
xpfe\communicator\resources\content\contentAreaDD.js(275)
xpfe\communicator\resources\content\nsContextMenu.js(676)
xpfe\components\filepicker\res\content\filepicker.js(32)
xpfe\components\filepicker\src\nsFilePicker.js(90)
xpfe\components\prefwindow\resources\content\pref-applications-edit.xul(156)
xpfe\components\prefwindow\resources\content\pref-applications-new.js(75)
xpfe\components\prefwindow\resources\content\pref-applications.js(72)
xpfe\components\search\src\nsInternetSearchService.cpp(1219)
xpfe\components\sidebar\resources\customize.js(259)
xpfe\components\sidebar\resources\sidebarOverlay.js(630)
xpfe\components\sidebar\src\nsSidebar.js(44)

I am updating all call sites that use SetSpec. I posted a question on the
newsgroup regarding callers of SetFile. If you are interested, please comment.
Open Networking bugs, qa=tever -> qa to me.
QA Contact: tever → benc
Has dougt ever given up on wide-sweeping changes that could hork the tree? ;-)
Let's take jud's short-term crash-protection scheme for 0.9.1/beta.
LoL.  

The current patch is only 87 K this time and touches a mere 46 files.  I am 
still working out the kinks, but it should be ready in a day or two.  
The patch to convert callers is more risky than valeski's patch, per PDT.

Jud, can you check in your "hack"? Once it is in, I can move this bug to 0.9.2.
Once we cut a branch for 0.9.1, I'll put my patch on the branch; my patch is the
wrong thing and I don't want it on the trunk.
Whiteboard: needed by 05/08/01 → workaround ready and will check into branch
workaround bug number = http://bugzilla.mozilla.org/show_bug.cgi?id=82385

Moving this bug out a milestone.
Keywords: mozilla0.9
Whiteboard: workaround ready and will check into branch
Target Milestone: mozilla0.9.1 → mozilla0.9.2
I don't think you need a full-blown URI/URL object to get URL parsing; you only
need the URL parser object itself. And, if I remember correctly, the three
available parsers are kind of static (global instances) within necko for
performance reasons. To get URL parsing you can just use one of these
"preloaded" parsers.
Andreas, that is what nsStdURL does for you. It determines which preloaded
parser you should get. The problem with going directly to one of these parsers
outside of NS_NewURI is that add-on protocol schemes will be ignored. There
needs to be one bottlenecko for all URI creation.
Doug,

Currently, nsStdURL does a lot more than simple URL parsing :-(  It is MUCH more
expensive to create a URI object than it is to simply parse the string.

Maybe the "right" thing to do is to make nsStdURL *much less expensive*...  But
unless we do, I don't think that we can wash our hands of the problem and say
"use NS_NewURI(...) or write your own parser..."

I think that the WORST situation we can end up in is if consumers ACTUALLY START
DOING THEIR OWN URL PARSING!!

This would be a HUGE disaster and would ensure that pluggable protocols never
worked :-(

I think that we need to evaluate whether we can make URI object creation cheap
enough that we can create millions of them for the sole purpose of string
parsing... If we cannot, then I think that we must expose a consistent means of
parsing URIs at the Necko level that is efficient...
I don't think that you can make URL parsing less expensive while preserving
pluggability.
Doug, why not use nsStdURLParser as the default parser? I think it is currently
done this way. If you don't wish for a special parser (bound to the scheme) you
get the StdURLParser. It is designed to work for all protocols that are not
simple. There are certain corner cases which make the other parsers necessary,
but StdURLParser usually does the job if the URL syntax fits the usual scheme. I
think we can offer an API to access the StdURLParser without creating a URI
object.
Well, under the hood this is basically what we are going to do. There is some
additional work to make this work with pluggable protocols/URI parsers (a
mapping of scheme to URL parser).

The gist is to add a function on the IO service that would take a scheme (or a
string containing a scheme) and return an nsIURLParser. This *could* be QI'ed
to a protocol-specific URI parser if needed.

Another function would be added which would take a URI string and a part key,
and would return the requested string part of the URI, similar to what netlib
did.
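A hedged sketch of how a caller might use the two proposed helpers; the method
and constant names come from this thread and the attached IDL, but the exact C++
signatures here are assumptions:

    // Assumed usage of the proposed nsIIOService additions (illustrative only;
    // |ioService| and |spec| are presumed locals).
    nsCOMPtr<nsIURLParser> parser;
    rv = ioService->GetParserForScheme("http", getter_AddRefs(parser));
    // parser could then be QI'ed for a protocol-specific parser if one exists.

    PRUint32 startPos, endPos; // most callers in the patch ignore these
    nsXPIDLCString host;
    rv = ioService->ExtractUrlPart(spec, nsIIOService::url_Host,
                                   &startPos, &endPos, getter_Copies(host));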

I will attach some IDLs early next week for comment.

Priority: -- → P3
Sorry it took so long to get back to this bug, but now that 0.9.1 is done, I can
focus on this for a while.

The last attachment adds the two new interfaces that rick and I are proposing.
Just out of curiosity: why isn't NS_IURLPARSER_KEY called NS_URLPARSER_CONTRACTID?
And shouldn't it be defined as "@mozilla.org/network/urlparser;1" instead of
"@mozilla.org/urlparser;1"?
Attached patch: patch v.1
The last patch changes all callers to use either the URI parsing API on the IO
service or to create a new URI via the NS_NewURI method.

I know that I am leaking a URI in nsGlobalHistory::RemovePagesFromHost.  Alec,
can you help me out here?  Is this what you meant by caching a URI?  Can we just
make this an nsCOMPtr and forget about it?

Alec, darin, rick, please review my patch...

Rick, that bug we were chasing existed on the trunk!  D'oh.
Comments on the patch and API:

for the sake of those of us who write javascript, can we please change this:
+    void extractUrlPart(in string urlString,
+                        in short mask,
+                        out unsigned long startPos,
+                        out unsigned long endPos,
+                        out string urlPart);


to
+ string extractUrlPart(in string urlString,
+                       in short mask,
+                       out unsigned long startPos,
+                       out unsigned long endPos);


----
GetParserForScheme hits the registry EVERY time - seems like we should be
caching this in memory. 
Also, the entire function is enclosed in an 
"if (scheme) {..."

why not just start the function with

if (!scheme)
    return nsServiceManager::GetService(kStdURLParserCID,
NS_GET_IID(nsIURLParser), (nsISupports **)_retval);

I also notice that literally every single call to extractUrlPart() passes in
null (and in fact, in SignonViewer, you're passing in 0, not null, which is
wrong) - let's not create extra API parameters if nobody needs them.. (yes, I
agree it would save allocations, etc, but I don't see any of the callers you
list begging for it)


Finally, I notice that in the API you refer to the first parameter as a "mask",
which would imply that you could pass in a composition of values (e.g. url_Host
| url_Username), when in fact the API allows you to specify exactly one of these
values. You need a new name, and you need to specify that in nsIIOService.idl...

D'Oh! Wrong patch!

I think that I did mention to you yesterday that I was caching the stuff in
GetParserForScheme. I will post a new patch over the weekend/Monday.

Also, I will change the API and the JS callers to use the return value as the
out string. I will also change the parameter name from "mask" to "flag" to avoid
confusion.

However, I will not change the out vars for the start and end position. These
all pass null now since, under the hood, we do a terrible job of "guessing" what
these values should be. Once the URI parser is rewritten to use scc's string
classes, these parameters will be more useful. I guess what you may be arguing
for, or what I think I hear, is that you want an API that avoids extra params.
If that is the case, we could split this into two methods if you think that
would be cleaner.


Well, the way I look at it is that if we're really bent on saving the
allocations, we should be using ACString rather than "string" - that allows us
to pass in an nsAWritableCString, which will avoid the allocations (not the
copying, but I argue that the allocation is the bigger issue). I'd like to hear
scc's advice on that...
I don't think nsACString is currently exposed to IDL, even though nsAString is.
 I'd like to see it exposed, because we would use it in the LDAP XPCOM SDK as
well, if we could.  scc?
It is all kind of a moot point since the URI parsers are not updated to scc's
strings yet...
Alec, this next patch will address most of your concerns. However, the one that
I do not address is the return value being the string. I did not change this for
two reasons: the first is that |extractScheme| already uses this syntax; the
second is that we need to land this in the next two days or a ton of stuff will
fall off my plate. I know, not the best reasons/excuses, but I will file another
bug to switch both extractScheme and extractUrlPart to the more user-friendly
forms sometime after 0.9.2.

This next patch is only for mozilla/netwerk... the rest of the stuff stays the same.

Alec, Darin can I please have a review?
looks good (r=darin)

comments:

- make sure you don't transfer your branch client.mk:
  MOZ_CO_TAG = DOUGT_URI_PARSER_06122001_BRANCH

- from JS, you use 1<<1 instead of the actual urlpart flag... why not use
  ioService.url_Username?  extensions/wallet/signonviewer/SignonViewer.js

- the indentation seems off in:
  mailnews/base/src/nsMessengerMigrator.cpp
  netwerk/base/public/nsIIOService.idl
  
- it seems unfortunate that in order to implement ExtractUrlPart, we have to
  parse the entire urlString.
thanks for reviewing this patch.  It is long and boring....

- make sure you don't transfer your branch client.mk:
  MOZ_CO_TAG = DOUGT_URI_PARSER_06122001_BRANCH

Doesn't everyone want to be on my branch?! :-)


- from JS, you use 1<<1 instead of the actual urlpart flag... why not use
  ioService.url_Username?  extensions/wallet/signonviewer/SignonViewer.js

fixed.

- the indentation seems off in:
  mailnews/base/src/nsMessengerMigrator.cpp
  netwerk/base/public/nsIIOService.idl

^M's will be pounded out...
  
- it seems unfortunate that in order to implement ExtractUrlPart, we have to
  parse the entire urlString.

Yeah, we need to optimize this later...
Priority: P3 → P1
I'll do a fuller review of this in a bit (just got back from vacation), but I
really have to object to the IDL syntax of this change! When you IDLized
nsIAtom, I asked for a similar change (re: ToString() and friends), and you said
"I'll get to it later" - and here we are years later and no change.

I write a lot of javascript for this product, and I /know/ I'll be using this
down the road. It is a massive pain to have to use "out" values from JavaScript
- and all told, there are only 3 changes you have to make to support this
change, all JavaScript, because the C++ signature will not change in the
slightest. Just search your patch for ".extractUrlPart" and you'll see the 3
places I'm talking about.
okay.  I will change it.  How is the rest of the patch?
You can't pass NULL for an XPCOM out param.  End of story.  Please pass pointers
to dummy PRInt32s instead (they could both be the same one if you want to save
the four bytes of stack).

Similarly, you can't avoid creating the objects in JS.

  ioService.extractScheme(something, {}, {}, fourth);

The {} is a clear way of showing that you're going to discard it, though you
could make it a hair cheaper by doing:

  var unused = { };
  ioService.extractScheme(something, unused, unused, fourth);

The JS bit is a (large) style nit, but passing NULL as an out parameter is
strictly verboten.  Please do not check in this patch until you fix that.
Is it verboten per XPConnect or per JS? It seems silly to make that verboten -
there are plenty of places where we have optional out parameters...

the rest of the patch looks great though
The last patch should address the JS out parameters. shaver, alec, can you
review this please?
Ack, unfortunately, your conversion from the out param to the return value isn't
quite right... example:

 try {
-        url = ioService.extractUrlPart(host, ioService.url_Username, 0, 0,
username);  // TODO verify!
+        username = ioService.extractUrlPart(host, ioService.url_Username,
unused, unused);  // TODO verify!
       } catch(e) {}
       if (username.value) {
         user = username.value;

here, "username" is the actual value you're using, and "username.value" will
have no value...the other two cases suffer the same ills..
Yeah, I see. Something like this would work, right:

+      try {
+        username = ioService.extractUrlPart(host, ioService.url_Username, 
unused, unused);  // TODO verify!
+      } catch(e) {
+        username = "";
+      }
+
+      if (username != "") {
+        user = username;
       } else {
         user = "<>";
       }


Also, I don't see the problem with the calls in sessionHistoryUI.js.
Ah, I was reading that part wrong...
OK, with all of that, sr=alecf
Great! Alec, thank you for the help.
Marking PDT+
Whiteboard: PDT+
a=blizzard on behalf of drivers for 0.9.2
Whiteboard: PDT+ → PDT+, critical for 0.9.2
I have a crasher, bug 86800: a crash because nsStdURL::SetSpec() does not do a
null check on the parameter. I noticed a lot of stuff in that class does not
have null checks. The patch here does not seem to add those null checks (but it
might still fix my crasher; I haven't checked yet). Anyway, do we really want to
crash every time somebody calls these methods wrong? Wouldn't it be better to
put some NS_ENSURE_ARG_POINTER magic in the file, so we would assert in debug
builds and not crash and burn in optimized builds?
Heikki, the crashes were caused by misuse of the nsStdURL implementation. I have
fixed all the call sites. Read my post in the netlib newsgroup for more
information.

Whiteboard: PDT+, critical for 0.9.2 → PDT+, critical for 0.9.2; ready for checkin
*** Bug 87131 has been marked as a duplicate of this bug. ***
Fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
+verifyme
Is there a place where I can verify these fixes, or are the QA people for the
components that were sending null schemes going to need to check this?
Keywords: verifyme
The PDT list is shrinking, and I'd like to close this. What are the steps to
verify this?
Verify that the patches were checked in?
VERIFYING,
per bbaetz.
Status: RESOLVED → VERIFIED
Keywords: verifyme