Closed Bug 123006 Opened 20 years ago Closed 16 years ago

Keyword substitution with %s doesn't escape characters

Categories

(SeaMonkey :: Bookmarks & History, enhancement)

x86
All
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Future

People

(Reporter: brad, Assigned: p_ch)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

You can't use strings like "c++" or "s&p 500" with keyword substitution in most
circumstances because Mozilla doesn't convert characters to their hex
counterparts (e.g., '+' should become %2B, '&' should become %26, etc.).

Build ID: 2002020103

Steps to recreate:
. Create a bookmark with a keyword for Google (name "Google Search", location
"http://www.google.com/search?q=%s", keyword "g")
. Use it to search for "c++": "g c++".  Note that it left the '+'s as entered
and they are ignored.
This would break multi-term queries entered by hand requiring these characters
(esp. &) literally.  When bug 98749 is fixed, this will change, as they could be
passed as multiple parameters.  Until then, in the interest of usability, this
should wait.

Marking dependency.
Depends on: 98749
Confirmed. ->Enhancement
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Target Milestone: --- → Future
this bug and it's dependant (and some others) can be fixed by implementing RFE
bug 124237 

*** Bug 114371 has been marked as a duplicate of this bug. ***
is this bug the reason that non-ASCII characters (such as german Umlauts) don't
work with a google custom keyword, or should I file a new bug?
Yes, Joachim, this is almost certainly why non-ASCII characters (such as german
umlauts) don't work with keyword substitution.

I really don't think this is an "enhancement".  It is critical for making use of
keywords.  For example, a Google keyword set on "gg" that looks like:

http://www.google.com/search?q=%s

will break terribly if you type "gg b&w", giving a 404.  The resultant URI
("http://www.google.com/search?q=b&w") is something completely different to what
was expected.  This can be get worse if the incorrect URI means something
different, such as "gg blah&hl=fr".

It is quite possible, using keywords as they are, to create invalid URIs.  "gg
%" (given the previous keyword example) would create the following URI:

http://www.google.com/search?q=%

Note that the "%" is unescaped, making the URI invalid.  This is Really Bad(TM),
and certainly not just an "enhancement".
*** Bug 135471 has been marked as a duplicate of this bug. ***
Not OS specific.

To have multiple, meaningful substitution (i.e. substitutions that aren't all
the same), we need bug 124237 to be fixed, not bug 98749.  See comment 2 in bug
124237.
Depends on: 124237
No longer depends on: 98749
OS: Windows 2000 → All
No longer depends on: 124237
Ack!  I removed the dependency while adding myself to the Cc list of bug 124237.
I had the old page open in a tab and didn't reload. :-(  Re-adding the
dependency.  Sorry for the multiple spam.
Depends on: 124237
This breaks basic searches.
Summary: Keyword substitution with %s doesn't convert characters → Keyword substitution with %s doesn't escape characters
*** Bug 170953 has been marked as a duplicate of this bug. ***
This breaks having a custom keyword which turns:

bug #43
into
http://bugzilla.mozilla.org/show_bug.cgi?id=#43
correctly, because the browser interprets the # as an anchor and so only passes
the "http://bugzilla.mozilla.org/show_bug.cgi?id=" to Bugzilla, which obviously
gets confused. 
(See related bug 172415). 

I can't see why bug 124237 needs to be fixed to fix this bug. Can someone
enlighten me?

Gerv
I have gu => http://bonsai.mozilla.org/cvsguess.cgi?file=%s

i use things like gu nsCRT.h#30

please don't break my usage pattern.
timeless: I understand that there are uses for the current behaviour, but I
think it's a reasonable assertion that the escaping behaviour will Do The Right
Thing for more people more of the time.

I would have thought you could knock up a Bookmarklet which gave you a text box
to type "nsCRT.h#30" into, and then which did the right thing with that.

Gerv
Why not add another substition (e.g. %S instead of %s) which quotes the entities
correctly. Right now using %s is really annoying when I try to find german
documents containing umlauts.
That's a great idea - although I'd say that the original %s should quote
correctly, and the new value should be a different letter, with %S and %s doing
the same thing, to avoid an obvious class of user error.

Gerv
*** Bug 199545 has been marked as a duplicate of this bug. ***
*** Bug 161033 has been marked as a duplicate of this bug. ***
Escaping wouldn't fix the problem for keyword bookmarklets (such as the ones at
http://www.squarefree.com/bookmarklets/keywords.html) because javascript: URLs
are unescaped before being executed.  But it would fix the problem for search
keyword bookmarks, for example when the query contains '+'.
*** Bug 212668 has been marked as a duplicate of this bug. ***
This is a REALLY ANNOYING problem for me.  Recommend severity -> normal.  I
understand the utility of the unescaped substitution but that's really an
advanced feature for a select few.  The default behavior should be to escape for
most people most of the time.

The proposed solution of having different %s-style wildcards for escaped and
unescaped substitution would work great for me, so long as the default behavior
is to escape.  Better would be a check box in the Bookmark's properties -
whatever the name/default-value is for this check box, the default behavior
should be escaped substition (at least for https?:// urls.)

I frequently do google searches with + signs - for example
google +"browser comparisons" +mozilla
and the +'s are converted into spaces so Google never sees them.

I know I could look up the hex value for "+" and type %dd"browser comparisons"
%ddmozilla but that's just silly (and four extra keystrokes.)

People who want to have both escaped and unescaped for the same site could make
two bookmarks with the same url but different keywords.
I've written an example bookmarklet that gets us a little closer without having
to change the code base -- it's not perfect:

javascript:(function(){var
s='http://www.google.com/search?q=%%';s=s.replace(/%%/,encodeURIComponent('%s'));location.href=s})();

It searches Google.  I use %% inside to allow the user to put anything they want
either before or after the substitution.

If you bookmark this and give it a keyword, you can do things like

g2 "comeau c++"

and it will work properly.  It will *not* work for cases like

g2 "rosemary's baby"

because of the single quote.  Note that I use single quotes because for me,
double quotes in Google searches are extremely common.
Wouldn't
g2 "rosemary\'s baby"
work if you wanted to search on something with a single-quote?
Hmm.  It should.  <tries it>  Yep, that works.  It's still a bit inconvenient,
though, especially for the creation of new keyword bookmarks (though I tried to
make it simple).  The bookmarklet should fill the void until we can get some
sort of true solution.
WFM - withdraw severity->normal recommendation
FWIW, here's my tweaked version of the bookmarklet:
javascript: var s = "http://www.google.com/search?q=%%"; var q =
escape('%s').replace(/+/, "%252B"); s = s.replace(/%%/, q); location.href = s;
void(1);

Line-broken for readability:

javascript:
var s = "http://www.google.com/search?q=%%";
var q = escape('%s').replace(/+/, "%252B");
s = s.replace(/%%/, q);
location.href = s;
void(1);
Er, /+/ should be /+/g in my tweaked version above.

as to the dependency...
I think the multiple substitution case would seem to be most simply handled by
using multiple %s's in the bookmark

http://www.example.com/page/?foo=%s&bar=%s
call as
example 17 fish
for http://www.example.com/page/?foo=17&bar=fish

and so on
Basically there are two conflicting user scenarios:
a) multi-term queries which require that '&' is not encoded, and
b) queries containing '+' and special characters (umlauts etc.) which need to be
encoded.

Why not encode all characters, except '&' and '#'? Then, the keyword
substitution would work in almost all cases, while remaining very flexible. (Of
course, searching for "s&p" with a google keyword would still not work.) 
Patch for encoding all characters except '&' and '#' in keyword '%s'
substitution.

Since the encoding is in utf-8, this should be specified, e.g., when searching
google. Using http://www.google.com/search?ie=utf-8&q=%s works fine with
umlauts.
Wouldn't you need to add "=" to the list of escape-exempt characters?  (bringing
it up to [&=#])
What's wrong with the idea in comment 15?

Gerv
Matthew: Yes, '=' should not be escaped. This should be fine with most queries,
since "search?q=a=b" is generally interpreted as name="q" value="a=b".

Gerv: IMO the idea from comment 15 and comment 16 is very appealing. The only
downside of such a solution is that existing "multi-term queries" break (until
"%s" is replaced by "%S").

Both solutions (comment 15 and comment 28) seem reasonable, I just would like to
see one of them implemented... 
I think that bug 98749 (and bug 123237, which it is duped against) want multiple
different parameters, each of which gets allocated to a %s, rather than a single
parameter which gets placed into all occurrences of %s.

You need to split the text after the first space on whitespace, and then
allocate each member of the list to a %s until you run out of list members or
%ses. :-)

Still, this seems like the right fix for _this_ bug :-)

Gerv
Indeed, bug 124237 asks for multiple replacement parameters. However bug 98749
(and its duplicate bug 124173) ask for replacing multiple '%s'es with the same
value. The example given in the description suggests typing "mozbug XXXXXXXXXX"
in the location field, i.e., just one parameter (comment #5 in bug 98749
stresses the difference to the multi-parameter case). This would be fixed by
attachment 139974 [details] [diff] [review].

Anyway, I agree with Gerv that these other issues are not so relevant for this
bug. I just wanted to point out the nice side effect of the patch :)
*** Bug 153027 has been marked as a duplicate of this bug. ***
Attachment #139920 - Attachment is obsolete: true
Attachment #139974 - Flags: review?(bryner)
Comment on attachment 139974 [details] [diff] [review]
Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

-> ben, he knows this code much better than I do
Attachment #139974 - Flags: review?(bryner) → review?(bugs)
I also think that the default behaviour should be to urlencode everything in the
url, i.e. also that spaces should encode as '+', and very importantly that
meta-characters like '&', '#' and '=' get escaped correctly.  It's very
confusing to a non-technical web-user why certain keyword searches don't work as
expected, and it certainly breaks my usage pattern (try searching for c#
something-or other... oops).  I understand this will break some things, but
frankly, that's a different problem perhaps better solved by an extension or
some other means.

My vote for complete urlencoding :-).
Attachment #139974 - Attachment description: Encoding all characters for %s replacement; no encoding for %S → Encoding all characters for %s replacement (complete urlencoding); no encoding for %S
Blocks: 252209
Flags: blocking-aviary1.0?
*** Bug 257911 has been marked as a duplicate of this bug. ***
Attachment #139974 - Flags: superreview?(neil.parkwaycc.co.uk)
Comment on attachment 139974 [details] [diff] [review]
Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

>+          shortcutURL = substitutedURL==shortcutURL ? null : substitutedURL;
I'd prefer a pre-substitute test. Or at least spaces around the ==
Attachment #139974 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview+
Flags: blocking-aviary1.0? → blocking-aviary1.0+
We don't need %S support, especially now keyword addition is automated.
Attachment #139974 - Attachment is obsolete: true
Fixed in firefox, br&trunk. 
Assignee: bugs → p_ch
Status: ASSIGNED → NEW
Flags: blocking-aviary1.0+
QA Contact: claudius → seamonkey.bookmarks
Blocks: 181243
Blocks: 264406
What is restraining the patch updated by Ben from getting into 1.8 ?
Can the existing s/sr be transferred ?
(In reply to comment #43)
> What is restraining the patch updated by Ben from getting into 1.8 ?

the fact that it's a firefox patch?
Attached patch Updated patch for Mozilla (obsolete) — Splinter Review
OK, so I didn't notice ben had only modified the FF part of the patch.

Here is the update for the mozilla code, based on what bg did for FF.
I keeped the %S, as I don't really see the connexion with automated keyword
addition (what is exactly automated keyword addition ?)

With regard to bug 264406, this patch happens to encode i18n charcters as UTF-8
when using %s, and encode as the local encoding when using %S. A bit of a
quirk, but the only clean solution to the problem is see is using LAST_CHARSET
to always do the right thing, which is significantly more complex.
Attachment #162844 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: review?(neil.parkwaycc.co.uk)
Comment on attachment 162844 [details] [diff] [review]
Updated patch for Mozilla

>+          shortcutURL  = shortcutURL.match(/%[sS]/) ? shortcutURL.replace(/%s/g, encodeURIComponent(text))
Nit: you should only have one space before the =
Nit: you should use regexp.test(string), not match
Nit: you wrapped a line but it was still over 80 characters.
Either wrap it harder or don't wrap it at all.
>+                                                                 .replace(/%S/g, text)
>+                                                    : null;
r+sr=me only once all three nits are fixed.
Attachment #162844 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: superreview+
Attachment #162844 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: review+
Attached patch nit corrected mozilla patch (obsolete) — Splinter Review
It's not fair, ben could enter his patch without the nitpick about using match
;-)
Attachment #162844 - Attachment is obsolete: true
Attachment #162990 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: review?(neil.parkwaycc.co.uk)
Comment on attachment 162990 [details] [diff] [review]
nit corrected mozilla patch

Sorry, didn't you know that /%[sS]/ is already a RegExp? Compare [] is a new
Array, {} is a new Object and function anonymous(){} is a new Function.
Attachment #162990 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: superreview-
Attachment #162990 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: review-
Should rs= as per neil's email comment

The tree is currently open, so if someone could check it in ...
Attachment #162990 - Attachment is obsolete: true
Attachment #163330 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: superreview+
Attachment #163330 - Flags: review?(timeless)
Attachment #163330 - Flags: review?(neil.parkwaycc.co.uk)
patch checked into the trunk. Shouldn't this go in to branches as well? 
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Comment on attachment 163330 [details] [diff] [review]
corrects neil's comments

>Index: navigator.js
>===================================================================
>+        // Bug 123006 : %s replace and URI escape, %S replace with raw value

We shouldn't be putting comments like this into the tree.
Why not?  Comments are good.
(In reply to comment #52)
> Why not?  Comments are good.

Because if we did, we'd have thousands of "Bug <n>" comments polluting the
source code.  That's what bonsai.mozilla.org is for.
I added the comment :
- because there was a similar comment just above, but this was back in 2000, so
the politic about that may have changed
- more to document the difference in behaviour between %s and %S. But it's true
this doesn't belong to the sourcecode, and should be added to mozilla doc somewhere.

If someone with the right rights reading this wants to remove the line, that's
OK for me, but I don't feel it's worth bugging someone to do it.

I'd like to get a look at bug 144875 now : "custom keywords with keyword same as
bookmark don't work". 
If I can reproduce, and if patching it touches the same code, I'll think about
removing the comment. BTW I accidently noticed (by adding an alert when
debugging the change) that code gets called twice when processing an URL which
is quite strange in my opinion.
jshin: excuse me, but where do you see r+sr = neil? i clearly see neil:r?timeless
see comment #46. comment #49 appears to mean that it's settled somehow. sorry if
it's not.
jshin: i would think that neil's actions see
<https://bugzilla.mozilla.org/show_activity.cgi?id=123006>

neil.parkwaycc.co.uk@myrealbox.com  	2004-10-25 16:42 PDT  	  Attachment
#163330 [details] [diff]Flag  	review?(neil.parkwaycc.co.uk@myrealbox.com),
superreview?(neil.parkwaycc.co.uk@myrealbox.com)  	review?(timeless@bemail.org),
superreview+

would trump his earlier comments (from 2004-10-21 12:53 PDT).
Sorry I didn't go that much length to tell which came first and which came later
when I checked in the patch. In restrospect, comment #48 obivously superceded
comment #44, which I didn't pay attention to. I also didn't realize that comment
#49 had come before neil's review request. Anyway,  do you have anything you
don't like about in the patch? If you don't,  why don't you add 'r' retroactively? 
This patch appears to have completely broken my keyword bookmark, which uses
%s, and needs to expand stuff like "foo/bar" to "http://baz/foo/bar". Firefox
1.0 does not implement either workaround (%S, or multiple keywords). What is the
workaround for this breakage? Is there one?

I don't understand why Ben G thinks %S is not needed (comment 41).
Using build 20041107 - %s is being replaced, and correctly encodes things
(breaking things like bar/baz in previous comment)
But %S is not being replaced at all.  Perhaps there's another area of the code
where the %s-detection needs to be changed to %[sS]-detection?
Ben got rid of '%S' when checking it to firefox 1.0branch while I kept it when
committing Jean-Marc's patch to the trunk. See also bug 258223 for a more
extensive patch.
I'd like to cast a vote for %S, though I don't see using it myself.

But here's my $0.02 for possible future improvements, priority => minimal

escape appropriately to the URL
Different protocols should be escaped differently (like Cold Fusion does)
javascript:alert('%s' + "%s"); should:
arg.replace("'", "\\'") on the first %s
arg.replace("\"", "\\\"") on the second %s

http://%s;%s:%s@www.%s.com/%s?q=%s#%s should
.URIescape the first, second, and third, fifth, and sixth %s's only
Do validity checking on the fourth and seventh

mailto:%s?cc=%s&subject=%s&body=%s
Do validity checking on %first and second, .URIescape third and fourth (and
please-oh-please escape spaces as %20 and not as +)

file:///some/path/filename-%s/
Do validity checking... no ".." paths or overlong Unicode escapes for such
Product: Browser → Seamonkey
Comment on attachment 139974 [details] [diff] [review]
Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

removing old request
Attachment #139974 - Flags: review?(bugs)
Attachment #163330 - Flags: review?(timeless)
Reopening.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Closing this one in favor of another defect marked as a regression -- bug 319169.
Status: REOPENED → RESOLVED
Closed: 17 years ago16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.