Closed Bug 104876 Opened 24 years ago Closed 24 years ago

c:foo wrongly considered valid URL

Categories

(Core :: Networking, defect)

x86
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla0.9.6

People

(Reporter: BenB, Assigned: andreas.otte)

Details

Attachments

(1 file, 2 obsolete files)

Reproduction: 1. Read plaintext email with "c:foo" in content Actual result: It's linked to <c:foo> (which means, given the recognizer code, that Necko considers "c:foo" to be a valid URL, which is of course wrong). Expected result:Only valid and usable URI schemes are considered valid.
Nice example ... :-) I guess this depends on the recognizer code. The urlparser will make this: urltest -std c:foo c:foo c,,,foo,-1,/,,,,,,c://foo/ which is somewhat legal, since the parser does not know which schemes are vaild. A creation of a real url would give these results: urltest c:foo c:foo Can not create URL So it seems that the code that does this recognition inside mailnews has to do something more than just call the parser, at least, if not creating the full url, it has to check for a valid protocolhandler. This can be a performance hit. Where is the recognizer code located? I will take a look.
> A creation of a real url That's what I do: rv = mIOService->NewURI(specStr, nsnull, getter_AddRefs(uri)); <http://lxr.mozilla.org/seamonkey/source/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp#380> > would give these results: [...] Can not create URL It doesn't, which is the bug.
This is really strange. I used the exact same codelayout inside urltest.cpp: result = pService->NewURI(i_pURL, nsnull, getter_AddRefs(pURL)); which returns the results as seen above. I will dig into this.
See also bug 104693 (and the C:\\ example)
this is a dup of bug 104693. Punctuation liks dots and colons that don't have space before and after are treated like http:// URL's
I will be gone for two days, can't work on it now, but there seems a lot of coverage on this stuff. This seems to be a recent change, but going through changes in netwerk over the last week nothing comes to mind except the str?cmp changes by alecf. Or maybe some caching of protocol handlers gone bad ...
> going through changes in netwerk over the last week nothing comes to mind > except the str?cmp changes by alecf. Same with the converter.
This isn't a dupe of the recent regression; it's been this way since 2001-09-03 at least. Gerv
It seems we call in these cases a default protocol handler which calls the os to figure out if it can handle the url. Somehow, I don't know why yet, "c" is considered a protocol that can be executed by (at least) linux. As a consequence the url can be created ...
No, it's GetCachedProtocolHandler in nsIOService.cpp. It mistakes c for the chrome protocol and returns the chrome handler.
This funny thing was caused by rjesup's checkin to nsIOService.cpp (1.104) http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsIOService.cpp&root=/cvsroot&subdir=mozilla/netwerk/base/src&command=DIFF_FRAMESET&rev1=1.103&rev2=1.104 protocol "c" is found to be "chrome" because of the strncasecmp with length 1.
Attached patch this is better ... (obsolete) — Splinter Review
Attachment #54241 - Attachment is obsolete: true
And now is clear why urltest worked ... there was no chrome url to cache ...
Comment on attachment 54502 [details] [diff] [review] alternate solution w/o the additional strlen gScheme[i][len] might peek into some memory that is somewhere else if you enter a very long scheme to test with. I can't think of a case where this check fails but some boundary checking code might get upset.
actually, i believe it is ok. consider these 3 cases: 1) len > strlen(gScheme[i]) and gScheme[i] is a substring of scheme: strncasecmp would return non-zero, since gScheme[i][k] would equal '\0' for some k < len. 2) len < strlen(gScheme[i]) and scheme is a substring of gScheme[i]: strncasecmp would return zero, but gScheme[i][len] would be non-zero. 3) len == strlen(gScheme[i]) and scheme is equal to gScheme[i]: strncasecmp would return zero, and gScheme[i][len] would be zero.
Comment on attachment 54502 [details] [diff] [review] alternate solution w/o the additional strlen r=rjesup@wgate.com I agree with darin's analysis of the latest patch
Attachment #54502 - Flags: review+
I also believe the patch works fine, it's just that for example with a test scheme of "ccccccccccccccccccccccccccccccccccccc" and a list of cached hadlers like "http","chrome","file" we would peek with gScheme[i][len] into some memory that is not allocated by the gScheme array. If you both think that is okay, I will accept that.
cccccccccccccccccccccccccccccccc won't cause a past-end reference. Looking at it again, chromeccccccccccccccccccccccccc will, however. While this is 99% likely to be "safe" (not cause a page fault), it's not 100% - if the linker puts the string near the end of a page it could fault. Also, even worse, gScheme[i][len] could be a \0 in ANOTHER string (or other data-section constant). So this is inherently wrong. I retract my r= for the no-strlen patch. If you want to avoid the strlen you should encode an array of length of the values in gScheme (gSchemeLen[]). Otherwise do the strlen - the data will be in L1 cache, and the longest scheme is about 6 characters, so the hit is minimal (possible cache miss for strlen's code if the compiler doesn't inline it).
Comment on attachment 54502 [details] [diff] [review] alternate solution w/o the additional strlen My apologies - it's too early in the morning. I looked at it one more time, and realized that the strncasecmp uses len. Since we don't look at gSchemes[i][len] unless strncasecmp returns 0, we know that the scheme and gScheme[i] are the same for len characters. Thus we know that strlen(gScheme[i]) >= len, so we can look at gScheme[i][len]. Mea Culpa. I need caffiene
Attachment #54502 - Flags: review+
Okay, I didn't took the execution order into account. I will go hunting for sr=, but it will be weekend before I check this in, I'm away for three days.
yes, the code working correctly depends on execution order, and rjesup explained it perfectly.
Attachment #54246 - Attachment is obsolete: true
Comment on attachment 54502 [details] [diff] [review] alternate solution w/o the additional strlen okay by me
Attachment #54502 - Flags: superreview+
So we have r=/sr=, let's check in
Keywords: patch
Target Milestone: --- → mozilla0.9.6
fix checked in ...
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Using build 20020322 on winxp, mac 9.1 and linux and the original scenario in plain text compose and html compose, this is fixed. Verified
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: