User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:126.96.36.199) Gecko/20080129 Iceweasel/188.8.131.52 (Debian-184.108.40.206-2)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:220.127.116.11) Gecko/20080129 Iceweasel/18.104.22.168 (Debian-22.214.171.124-2)

This tag:

<a href=http://yahoo.com/space space here>linky</a>

generates this link text:

htt:///yahoo.co/m/space%20space%20her

There's a ^A at the beginning of that, if you can't see it here.

What is wrong with this? Let me count the ways:

htt: instead of http:
three slashes instead of two before "yahoo"
slash in the middle of "com"
missing "e" at the very end

I'm a C programmer and this smells like the kind of result that could spring from a buffer overflow bug. However, I have no evidence that an overflow is actually happening. I tag this bug "Security" out of caution.

BTW this link works in Outlook - not that I would suggest for a second that you imitate Outlook! But that's why I was testing it.

Reproducible: Always

Steps to Reproduce:
1. Paste the above link tag into HTML
2. View it
3. "Copy link location" from link

Actual Results:
htt:///yahoo.co/m/space%20space%20her (note leading ^A)

Expected Results:
http://yahoo.com/space%20space%20here (possibly with leading ^A)
This looks like it might be related to bug 415034. Chip: could you attach an HTML test file, rather than inline HTML code? I'm not able to get your example code to act like a normal link...
It's not a "normal" link at all - the first char of the URL is a ^A. But it is a URL that can be copied with "Copy link location".
(adjusting summary. quotes don't matter)
Ugh. We first decide this is a relative URL because we're unable to extract a valid scheme (nsIOService::NewURI calls ExtractScheme). ExtractScheme fails because of the invalid scheme character. If you replace the \x01 with '_' you see something like https://bugzilla.mozilla.org/_http://yahoo.com/space%20space%20here which is fine (probably not what the author intended, but not an unreasonable guess).

So given the base, nsHttpsHandler::NewURI is called, which creates an nsStandardURL and for some reason calls resolve on the purely "relative" part. nsBaseURLParser::ParseURL looks for a scheme, but skips initial space and all control characters, not just the limited whitespace set used earlier. We now think the scheme is a valid "http" instead of bailing out at this point.

We now remember that the scheme starts at 0 and has a length of 4. This is true relative to our modified spec string, which we incremented when skipping initial chars, but the caller never gets told we changed the start, so all the offsets from that point on are off by one. This bit should have used net_FilterURIString() like the original scheme extraction.

At this point, because the scheme doesn't match what it thought the original base scheme was, it assumes it must be an absolute URI, regardless of the fact that it was convinced it was relative earlier.

When it puts the URI back together from its component pieces in BuildNormalizedSpec, all the offsets are off by one (or more, if there were more leading control characters), which explains the odd result. The scheme length is 4, so the 'p' gets dropped. The host is "/yahoo.co" instead of "yahoo.com", and of course it puts "://" between the scheme and host, thus the three slashes, and so on.

Because BuildNormalizedSpec uses our string classes to handle potential expansion of percent encoding and IDN conversion to punycode, there's no danger of overflowing any buffers.
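To make the off-by-one concrete, here is a rough Python model of the mismatch described above. It is only a sketch, not the actual necko code: parse_url_skipping_controls and build_spec are hypothetical stand-ins for ParseURL and BuildNormalizedSpec, and the rebuild step is assumed to force a leading "/" on the path and percent-encode spaces, which is consistent with the reported output but not confirmed here.

```python
from urllib.parse import quote


def parse_url_skipping_controls(spec):
    """Toy parser (hypothetical). Skips leading spaces and control characters,
    then returns (pos, len) pairs relative to the ADVANCED string."""
    skipped = 0
    while skipped < len(spec) and spec[skipped] <= " ":
        skipped += 1
    s = spec[skipped:]
    colon = s.index(":")
    scheme = (0, colon)
    auth_start = colon + 3  # step past "://"
    slash = s.find("/", auth_start)
    if slash == -1:
        slash = len(s)
    host = (auth_start, slash - auth_start)
    path = (slash, len(s) - slash)
    return skipped, scheme, host, path


def build_spec(spec, scheme, host, path, base=0):
    """Toy rebuild (hypothetical). Slices the components out of `spec`,
    starting each offset at `base`, and glues them back together."""
    def seg(pos_len):
        pos, length = pos_len
        return spec[base + pos:base + pos + length]

    p = seg(path)
    if not p.startswith("/"):
        p = "/" + p  # assumption: the path is normalized to start with "/"
    return seg(scheme) + "://" + seg(host) + quote(p)


spec = "\x01http://yahoo.com/space space here"
skipped, scheme, host, path = parse_url_skipping_controls(spec)

# Bug: the caller applies the offsets to the original, un-advanced string.
buggy = build_spec(spec, scheme, host, path)
# Consistent: the caller accounts for the characters the parser skipped.
fixed = build_spec(spec, scheme, host, path, base=skipped)

print(repr(buggy))
print(repr(fixed))
```

With one leading ^A, the buggy call yields the ^A followed by htt:///yahoo.co/m/space%20space%20her, matching the report, while passing base=skipped yields http://yahoo.com/space%20space%20here.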
The code is clearly broken, but I don't think there's a security problem since the URL in question was broken to begin with (invalid characters). On the other hand, any time we get confused when parsing URIs I get nervous.
Yeah, seems to me like ParseURL() should do exactly what ExtractScheme does... Darin, is there a reason this code skips control chars?
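For illustration, a strict scheme check of the sort both code paths could share might look like the Python sketch below. It is not the real ExtractScheme: extract_scheme is a hypothetical helper, the whitespace set it strips (space, tab, CR, LF) is an assumption, and the scheme grammar follows RFC 3986 (a letter followed by letters, digits, "+", "-" or ".").

```python
import string

SCHEME_FIRST = set(string.ascii_letters)
SCHEME_REST = SCHEME_FIRST | set(string.digits) | set("+-.")


def extract_scheme(spec):
    """Return the lowercased scheme, or None if the spec has no valid one."""
    # Assumption: only this limited whitespace set is stripped. Control
    # characters such as \x01 are left in place and invalidate the scheme.
    s = spec.lstrip(" \t\r\n")
    colon = s.find(":")
    if colon <= 0:
        return None
    candidate = s[:colon]
    if candidate[0] not in SCHEME_FIRST:
        return None
    if any(c not in SCHEME_REST for c in candidate[1:]):
        return None
    return candidate.lower()


print(extract_scheme("\x01http://yahoo.com/"))  # no valid scheme
print(extract_scheme("  http://yahoo.com/"))    # leading spaces are tolerated
```

If every stage used the same check, a spec with a leading ^A would be treated as relative throughout, instead of relative in one place and absolute in another.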
This is fixed in bug 451613, checked into the 1.8 and 1.9.0 branches.
Verified for 126.96.36.199 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:188.8.131.52pre) Gecko/2008112505 GranParadiso/3.0.5pre. Verified for 184.108.40.206 with Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:220.127.116.11pre) Gecko/2008112503 BonEcho/18.104.22.168pre.
Since this bug is "fixed" (per Daniel Veditz, comment #7), should we close it?
Jason: Bug 451613 hasn't landed on the trunk yet, so this bug can't be FIXED. (Additionally, it hasn't landed on 1.9.1, but that'd be tracked with the fixed1.9.1 keyword.)
jst just landed this for me