Closed Bug 123006 Opened 23 years ago Closed 19 years ago

Keyword substitution with %s doesn't escape characters

Categories

(SeaMonkey :: Bookmarks & History, enhancement)

x86
All
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: brad, Assigned: p_ch)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

You can't use strings like "c++" or "s&p 500" with keyword substitution in most circumstances because Mozilla doesn't convert characters to their hex counterparts (e.g., '+' should become %2B, '&' should become %26, etc.). Build ID: 2002020103 Steps to recreate: . Create a bookmark with a keyword for Google (name "Google Search", location "http://www.google.com/search?q=%s", keyword "g") . Use it to search for "c++": "g c++". Note that it left the '+'s as entered and they are ignored.
This would break multi-term queries entered by hand requiring these characters (esp. &) literally. When bug 98749 is fixed, this will change, as they could be passed as multiple parameters. Until then, in the interest of usability, this should wait. Marking dependency.
Depends on: 98749
Confirmed. ->Enhancement
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Target Milestone: --- → Future
this bug and it's dependant (and some others) can be fixed by implementing RFE bug 124237
*** Bug 114371 has been marked as a duplicate of this bug. ***
is this bug the reason that non-ASCII characters (such as german Umlauts) don't work with a google custom keyword, or should I file a new bug?
Yes, Joachim, this is almost certainly why non-ASCII characters (such as german umlauts) don't work with keyword substitution. I really don't think this is an "enhancement". It is critical for making use of keywords. For example, a Google keyword set on "gg" that looks like: http://www.google.com/search?q=%s will break terribly if you type "gg b&w", giving a 404. The resultant URI ("http://www.google.com/search?q=b&w") is something completely different to what was expected. This can be get worse if the incorrect URI means something different, such as "gg blah&hl=fr". It is quite possible, using keywords as they are, to create invalid URIs. "gg %" (given the previous keyword example) would create the following URI: http://www.google.com/search?q=% Note that the "%" is unescaped, making the URI invalid. This is Really Bad(TM), and certainly not just an "enhancement".
*** Bug 135471 has been marked as a duplicate of this bug. ***
Not OS specific. To have multiple, meaningful substitution (i.e. substitutions that aren't all the same), we need bug 124237 to be fixed, not bug 98749. See comment 2 in bug 124237.
Depends on: 124237
No longer depends on: 98749
OS: Windows 2000 → All
No longer depends on: 124237
Ack! I removed the dependency while adding myself to the Cc list of bug 124237. I had the old page open in a tab and didn't reload. :-( Re-adding the dependency. Sorry for the multiple spam.
Depends on: 124237
This breaks basic searches.
Summary: Keyword substitution with %s doesn't convert characters → Keyword substitution with %s doesn't escape characters
*** Bug 170953 has been marked as a duplicate of this bug. ***
This breaks having a custom keyword which turns: bug #43 into http://bugzilla.mozilla.org/show_bug.cgi?id=#43 correctly, because the browser interprets the # as an anchor and so only passes the "http://bugzilla.mozilla.org/show_bug.cgi?id=" to Bugzilla, which obviously gets confused. (See related bug 172415). I can't see why bug 124237 needs to be fixed to fix this bug. Can someone enlighten me? Gerv
I have gu => http://bonsai.mozilla.org/cvsguess.cgi?file=%s i use things like gu nsCRT.h#30 please don't break my usage pattern.
timeless: I understand that there are uses for the current behaviour, but I think it's a reasonable assertion that the escaping behaviour will Do The Right Thing for more people more of the time. I would have thought you could knock up a Bookmarklet which gave you a text box to type "nsCRT.h#30" into, and then which did the right thing with that. Gerv
Blocks: 172415
Why not add another substition (e.g. %S instead of %s) which quotes the entities correctly. Right now using %s is really annoying when I try to find german documents containing umlauts.
That's a great idea - although I'd say that the original %s should quote correctly, and the new value should be a different letter, with %S and %s doing the same thing, to avoid an obvious class of user error. Gerv
*** Bug 199545 has been marked as a duplicate of this bug. ***
*** Bug 161033 has been marked as a duplicate of this bug. ***
Escaping wouldn't fix the problem for keyword bookmarklets (such as the ones at http://www.squarefree.com/bookmarklets/keywords.html) because javascript: URLs are unescaped before being executed. But it would fix the problem for search keyword bookmarks, for example when the query contains '+'.
*** Bug 212668 has been marked as a duplicate of this bug. ***
This is a REALLY ANNOYING problem for me. Recommend severity -> normal. I understand the utility of the unescaped substitution but that's really an advanced feature for a select few. The default behavior should be to escape for most people most of the time. The proposed solution of having different %s-style wildcards for escaped and unescaped substitution would work great for me, so long as the default behavior is to escape. Better would be a check box in the Bookmark's properties - whatever the name/default-value is for this check box, the default behavior should be escaped substition (at least for https?:// urls.) I frequently do google searches with + signs - for example google +"browser comparisons" +mozilla and the +'s are converted into spaces so Google never sees them. I know I could look up the hex value for "+" and type %dd"browser comparisons" %ddmozilla but that's just silly (and four extra keystrokes.) People who want to have both escaped and unescaped for the same site could make two bookmarks with the same url but different keywords.
I've written an example bookmarklet that gets us a little closer without having to change the code base -- it's not perfect: javascript:(function(){var s='http://www.google.com/search?q=%%';s=s.replace(/%%/,encodeURIComponent('%s'));location.href=s})(); It searches Google. I use %% inside to allow the user to put anything they want either before or after the substitution. If you bookmark this and give it a keyword, you can do things like g2 "comeau c++" and it will work properly. It will *not* work for cases like g2 "rosemary's baby" because of the single quote. Note that I use single quotes because for me, double quotes in Google searches are extremely common.
Wouldn't g2 "rosemary\'s baby" work if you wanted to search on something with a single-quote?
Hmm. It should. <tries it> Yep, that works. It's still a bit inconvenient, though, especially for the creation of new keyword bookmarks (though I tried to make it simple). The bookmarklet should fill the void until we can get some sort of true solution.
WFM - withdraw severity->normal recommendation
FWIW, here's my tweaked version of the bookmarklet: javascript: var s = "http://www.google.com/search?q=%%"; var q = escape('%s').replace(/+/, "%252B"); s = s.replace(/%%/, q); location.href = s; void(1); Line-broken for readability: javascript: var s = "http://www.google.com/search?q=%%"; var q = escape('%s').replace(/+/, "%252B"); s = s.replace(/%%/, q); location.href = s; void(1);
Er, /+/ should be /+/g in my tweaked version above. as to the dependency... I think the multiple substitution case would seem to be most simply handled by using multiple %s's in the bookmark http://www.example.com/page/?foo=%s&bar=%s call as example 17 fish for http://www.example.com/page/?foo=17&bar=fish and so on
Basically there are two conflicting user scenarios: a) multi-term queries which require that '&' is not encoded, and b) queries containing '+' and special characters (umlauts etc.) which need to be encoded. Why not encode all characters, except '&' and '#'? Then, the keyword substitution would work in almost all cases, while remaining very flexible. (Of course, searching for "s&p" with a google keyword would still not work.)
Patch for encoding all characters except '&' and '#' in keyword '%s' substitution. Since the encoding is in utf-8, this should be specified, e.g., when searching google. Using http://www.google.com/search?ie=utf-8&q=%s works fine with umlauts.
Wouldn't you need to add "=" to the list of escape-exempt characters? (bringing it up to [&=#])
What's wrong with the idea in comment 15? Gerv
Matthew: Yes, '=' should not be escaped. This should be fine with most queries, since "search?q=a=b" is generally interpreted as name="q" value="a=b". Gerv: IMO the idea from comment 15 and comment 16 is very appealing. The only downside of such a solution is that existing "multi-term queries" break (until "%s" is replaced by "%S"). Both solutions (comment 15 and comment 28) seem reasonable, I just would like to see one of them implemented...
I think that bug 98749 (and bug 123237, which it is duped against) want multiple different parameters, each of which gets allocated to a %s, rather than a single parameter which gets placed into all occurrences of %s. You need to split the text after the first space on whitespace, and then allocate each member of the list to a %s until you run out of list members or %ses. :-) Still, this seems like the right fix for _this_ bug :-) Gerv
Indeed, bug 124237 asks for multiple replacement parameters. However bug 98749 (and its duplicate bug 124173) ask for replacing multiple '%s'es with the same value. The example given in the description suggests typing "mozbug XXXXXXXXXX" in the location field, i.e., just one parameter (comment #5 in bug 98749 stresses the difference to the multi-parameter case). This would be fixed by attachment 139974 [details] [diff] [review]. Anyway, I agree with Gerv that these other issues are not so relevant for this bug. I just wanted to point out the nice side effect of the patch :)
*** Bug 153027 has been marked as a duplicate of this bug. ***
Attachment #139920 - Attachment is obsolete: true
Attachment #139974 - Flags: review?(bryner)
Comment on attachment 139974 [details] [diff] [review] Encoding all characters for %s replacement (complete urlencoding); no encoding for %S -> ben, he knows this code much better than I do
Attachment #139974 - Flags: review?(bryner) → review?(bugs)
I also think that the default behaviour should be to urlencode everything in the url, i.e. also that spaces should encode as '+', and very importantly that meta-characters like '&', '#' and '=' get escaped correctly. It's very confusing to a non-technical web-user why certain keyword searches don't work as expected, and it certainly breaks my usage pattern (try searching for c# something-or other... oops). I understand this will break some things, but frankly, that's a different problem perhaps better solved by an extension or some other means. My vote for complete urlencoding :-).
Attachment #139974 - Attachment description: Encoding all characters for %s replacement; no encoding for %S → Encoding all characters for %s replacement (complete urlencoding); no encoding for %S
Blocks: 252209
Flags: blocking-aviary1.0?
*** Bug 257911 has been marked as a duplicate of this bug. ***
Attachment #139974 - Flags: superreview?(neil.parkwaycc.co.uk)
Comment on attachment 139974 [details] [diff] [review] Encoding all characters for %s replacement (complete urlencoding); no encoding for %S >+ shortcutURL = substitutedURL==shortcutURL ? null : substitutedURL; I'd prefer a pre-substitute test. Or at least spaces around the ==
Attachment #139974 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview+
Flags: blocking-aviary1.0? → blocking-aviary1.0+
We don't need %S support, especially now keyword addition is automated.
Attachment #139974 - Attachment is obsolete: true
Fixed in firefox, br&trunk.
Assignee: bugs → p_ch
Status: ASSIGNED → NEW
Flags: blocking-aviary1.0+
QA Contact: claudius → seamonkey.bookmarks
Blocks: 181243
Blocks: 264406
What is restraining the patch updated by Ben from getting into 1.8 ? Can the existing s/sr be transferred ?
(In reply to comment #43) > What is restraining the patch updated by Ben from getting into 1.8 ? the fact that it's a firefox patch?
Attached patch Updated patch for Mozilla (obsolete) — Splinter Review
OK, so I didn't notice ben had only modified the FF part of the patch. Here is the update for the mozilla code, based on what bg did for FF. I keeped the %S, as I don't really see the connexion with automated keyword addition (what is exactly automated keyword addition ?) With regard to bug 264406, this patch happens to encode i18n charcters as UTF-8 when using %s, and encode as the local encoding when using %S. A bit of a quirk, but the only clean solution to the problem is see is using LAST_CHARSET to always do the right thing, which is significantly more complex.
Attachment #162844 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: review?(neil.parkwaycc.co.uk)
Comment on attachment 162844 [details] [diff] [review] Updated patch for Mozilla >+ shortcutURL = shortcutURL.match(/%[sS]/) ? shortcutURL.replace(/%s/g, encodeURIComponent(text)) Nit: you should only have one space before the = Nit: you should use regexp.test(string), not match Nit: you wrapped a line but it was still over 80 characters. Either wrap it harder or don't wrap it at all. >+ .replace(/%S/g, text) >+ : null; r+sr=me only once all three nits are fixed.
Attachment #162844 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: superreview+
Attachment #162844 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: review+
Attached patch nit corrected mozilla patch (obsolete) — Splinter Review
It's not fair, ben could enter his patch without the nitpick about using match ;-)
Attachment #162844 - Attachment is obsolete: true
Attachment #162990 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: review?(neil.parkwaycc.co.uk)
Comment on attachment 162990 [details] [diff] [review] nit corrected mozilla patch Sorry, didn't you know that /%[sS]/ is already a RegExp? Compare [] is a new Array, {} is a new Object and function anonymous(){} is a new Function.
Attachment #162990 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: superreview-
Attachment #162990 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: review-
Should rs= as per neil's email comment The tree is currently open, so if someone could check it in ...
Attachment #162990 - Attachment is obsolete: true
Attachment #163330 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: superreview+
Attachment #163330 - Flags: review?(timeless)
Attachment #163330 - Flags: review?(neil.parkwaycc.co.uk)
patch checked into the trunk. Shouldn't this go in to branches as well?
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment on attachment 163330 [details] [diff] [review] corrects neil's comments >Index: navigator.js >=================================================================== >+ // Bug 123006 : %s replace and URI escape, %S replace with raw value We shouldn't be putting comments like this into the tree.
Why not? Comments are good.
(In reply to comment #52) > Why not? Comments are good. Because if we did, we'd have thousands of "Bug <n>" comments polluting the source code. That's what bonsai.mozilla.org is for.
I added the comment : - because there was a similar comment just above, but this was back in 2000, so the politic about that may have changed - more to document the difference in behaviour between %s and %S. But it's true this doesn't belong to the sourcecode, and should be added to mozilla doc somewhere. If someone with the right rights reading this wants to remove the line, that's OK for me, but I don't feel it's worth bugging someone to do it. I'd like to get a look at bug 144875 now : "custom keywords with keyword same as bookmark don't work". If I can reproduce, and if patching it touches the same code, I'll think about removing the comment. BTW I accidently noticed (by adding an alert when debugging the change) that code gets called twice when processing an URL which is quite strange in my opinion.
jshin: excuse me, but where do you see r+sr = neil? i clearly see neil:r?timeless
see comment #46. comment #49 appears to mean that it's settled somehow. sorry if it's not.
jshin: i would think that neil's actions see <https://bugzilla.mozilla.org/show_activity.cgi?id=123006> neil.parkwaycc.co.uk@myrealbox.com 2004-10-25 16:42 PDT Attachment #163330 [details] [diff]Flag review?(neil.parkwaycc.co.uk@myrealbox.com), superreview?(neil.parkwaycc.co.uk@myrealbox.com) review?(timeless@bemail.org), superreview+ would trump his earlier comments (from 2004-10-21 12:53 PDT).
Sorry I didn't go that much length to tell which came first and which came later when I checked in the patch. In restrospect, comment #48 obivously superceded comment #44, which I didn't pay attention to. I also didn't realize that comment #49 had come before neil's review request. Anyway, do you have anything you don't like about in the patch? If you don't, why don't you add 'r' retroactively?
This patch appears to have completely broken my keyword bookmark, which uses %s, and needs to expand stuff like "foo/bar" to "http://baz/foo/bar". Firefox 1.0 does not implement either workaround (%S, or multiple keywords). What is the workaround for this breakage? Is there one? I don't understand why Ben G thinks %S is not needed (comment 41).
Using build 20041107 - %s is being replaced, and correctly encodes things (breaking things like bar/baz in previous comment) But %S is not being replaced at all. Perhaps there's another area of the code where the %s-detection needs to be changed to %[sS]-detection?
Ben got rid of '%S' when checking it to firefox 1.0branch while I kept it when committing Jean-Marc's patch to the trunk. See also bug 258223 for a more extensive patch.
I'd like to cast a vote for %S, though I don't see using it myself. But here's my $0.02 for possible future improvements, priority => minimal escape appropriately to the URL Different protocols should be escaped differently (like Cold Fusion does) javascript:alert('%s' + "%s"); should: arg.replace("'", "\\'") on the first %s arg.replace("\"", "\\\"") on the second %s http://%s;%s:%s@www.%s.com/%s?q=%s#%s should .URIescape the first, second, and third, fifth, and sixth %s's only Do validity checking on the fourth and seventh mailto:%s?cc=%s&subject=%s&body=%s Do validity checking on %first and second, .URIescape third and fourth (and please-oh-please escape spaces as %20 and not as +) file:///some/path/filename-%s/ Do validity checking... no ".." paths or overlong Unicode escapes for such
Product: Browser → Seamonkey
Comment on attachment 139974 [details] [diff] [review] Encoding all characters for %s replacement (complete urlencoding); no encoding for %S removing old request
Attachment #139974 - Flags: review?(bugs)
Attachment #163330 - Flags: review?(timeless)
Reopening. Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Closing this one in favor of another defect marked as a regression -- bug 319169.
Status: REOPENED → RESOLVED
Closed: 20 years ago19 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: