Keyword substitution with %s doesn't escape characters

RESOLVED FIXED in Future

Status

--
enhancement
RESOLVED FIXED
17 years ago
13 years ago

People

(Reporter: brad, Assigned: p_ch)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

Trunk
Future
x86
All
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments, 4 obsolete attachments)

(Reporter)

Description

17 years ago
You can't use strings like "c++" or "s&p 500" with keyword substitution in most
circumstances because Mozilla doesn't convert characters to their hex
counterparts (e.g., '+' should become %2B, '&' should become %26, etc.).

Build ID: 2002020103

Steps to recreate:
. Create a bookmark with a keyword for Google (name "Google Search", location
"http://www.google.com/search?q=%s", keyword "g")
. Use it to search for "c++": "g c++".  Note that it left the '+'s as entered
and they are ignored.

Comment 1

17 years ago
This would break multi-term queries entered by hand requiring these characters
(esp. &) literally.  When bug 98749 is fixed, this will change, as they could be
passed as multiple parameters.  Until then, in the interest of usability, this
should wait.

Marking dependency.
Depends on: 98749

Comment 2

17 years ago
Confirmed. ->Enhancement
Severity: normal → enhancement
Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED
Target Milestone: --- → Future

Comment 3

17 years ago
this bug and it's dependant (and some others) can be fixed by implementing RFE
bug 124237 

Comment 4

17 years ago
*** Bug 114371 has been marked as a duplicate of this bug. ***

Comment 5

17 years ago
is this bug the reason that non-ASCII characters (such as german Umlauts) don't
work with a google custom keyword, or should I file a new bug?

Comment 6

17 years ago
Yes, Joachim, this is almost certainly why non-ASCII characters (such as german
umlauts) don't work with keyword substitution.

I really don't think this is an "enhancement".  It is critical for making use of
keywords.  For example, a Google keyword set on "gg" that looks like:

http://www.google.com/search?q=%s

will break terribly if you type "gg b&w", giving a 404.  The resultant URI
("http://www.google.com/search?q=b&w") is something completely different to what
was expected.  This can be get worse if the incorrect URI means something
different, such as "gg blah&hl=fr".

It is quite possible, using keywords as they are, to create invalid URIs.  "gg
%" (given the previous keyword example) would create the following URI:

http://www.google.com/search?q=%

Note that the "%" is unescaped, making the URI invalid.  This is Really Bad(TM),
and certainly not just an "enhancement".

Comment 7

17 years ago
*** Bug 135471 has been marked as a duplicate of this bug. ***

Comment 8

17 years ago
Not OS specific.

To have multiple, meaningful substitution (i.e. substitutions that aren't all
the same), we need bug 124237 to be fixed, not bug 98749.  See comment 2 in bug
124237.
Depends on: 124237
No longer depends on: 98749
OS: Windows 2000 → All

Updated

17 years ago
No longer depends on: 124237

Comment 9

17 years ago
Ack!  I removed the dependency while adding myself to the Cc list of bug 124237.
I had the old page open in a tab and didn't reload. :-(  Re-adding the
dependency.  Sorry for the multiple spam.
Depends on: 124237

Comment 10

16 years ago
This breaks basic searches.
Summary: Keyword substitution with %s doesn't convert characters → Keyword substitution with %s doesn't escape characters

Comment 11

16 years ago
*** Bug 170953 has been marked as a duplicate of this bug. ***
This breaks having a custom keyword which turns:

bug #43
into
http://bugzilla.mozilla.org/show_bug.cgi?id=#43
correctly, because the browser interprets the # as an anchor and so only passes
the "http://bugzilla.mozilla.org/show_bug.cgi?id=" to Bugzilla, which obviously
gets confused. 
(See related bug 172415). 

I can't see why bug 124237 needs to be fixed to fix this bug. Can someone
enlighten me?

Gerv

Comment 13

16 years ago
I have gu => http://bonsai.mozilla.org/cvsguess.cgi?file=%s

i use things like gu nsCRT.h#30

please don't break my usage pattern.
timeless: I understand that there are uses for the current behaviour, but I
think it's a reasonable assertion that the escaping behaviour will Do The Right
Thing for more people more of the time.

I would have thought you could knock up a Bookmarklet which gave you a text box
to type "nsCRT.h#30" into, and then which did the right thing with that.

Gerv

Comment 15

16 years ago
Why not add another substition (e.g. %S instead of %s) which quotes the entities
correctly. Right now using %s is really annoying when I try to find german
documents containing umlauts.
That's a great idea - although I'd say that the original %s should quote
correctly, and the new value should be a different letter, with %S and %s doing
the same thing, to avoid an obvious class of user error.

Gerv

Comment 17

16 years ago
*** Bug 199545 has been marked as a duplicate of this bug. ***

Comment 18

16 years ago
*** Bug 161033 has been marked as a duplicate of this bug. ***

Comment 19

16 years ago
Escaping wouldn't fix the problem for keyword bookmarklets (such as the ones at
http://www.squarefree.com/bookmarklets/keywords.html) because javascript: URLs
are unescaped before being executed.  But it would fix the problem for search
keyword bookmarks, for example when the query contains '+'.

Comment 20

15 years ago
*** Bug 212668 has been marked as a duplicate of this bug. ***

Comment 21

15 years ago
This is a REALLY ANNOYING problem for me.  Recommend severity -> normal.  I
understand the utility of the unescaped substitution but that's really an
advanced feature for a select few.  The default behavior should be to escape for
most people most of the time.

The proposed solution of having different %s-style wildcards for escaped and
unescaped substitution would work great for me, so long as the default behavior
is to escape.  Better would be a check box in the Bookmark's properties -
whatever the name/default-value is for this check box, the default behavior
should be escaped substition (at least for https?:// urls.)

I frequently do google searches with + signs - for example
google +"browser comparisons" +mozilla
and the +'s are converted into spaces so Google never sees them.

I know I could look up the hex value for "+" and type %dd"browser comparisons"
%ddmozilla but that's just silly (and four extra keystrokes.)

People who want to have both escaped and unescaped for the same site could make
two bookmarks with the same url but different keywords.
(Reporter)

Comment 22

15 years ago
I've written an example bookmarklet that gets us a little closer without having
to change the code base -- it's not perfect:

javascript:(function(){var
s='http://www.google.com/search?q=%%';s=s.replace(/%%/,encodeURIComponent('%s'));location.href=s})();

It searches Google.  I use %% inside to allow the user to put anything they want
either before or after the substitution.

If you bookmark this and give it a keyword, you can do things like

g2 "comeau c++"

and it will work properly.  It will *not* work for cases like

g2 "rosemary's baby"

because of the single quote.  Note that I use single quotes because for me,
double quotes in Google searches are extremely common.

Comment 23

15 years ago
Wouldn't
g2 "rosemary\'s baby"
work if you wanted to search on something with a single-quote?
(Reporter)

Comment 24

15 years ago
Hmm.  It should.  <tries it>  Yep, that works.  It's still a bit inconvenient,
though, especially for the creation of new keyword bookmarks (though I tried to
make it simple).  The bookmarklet should fill the void until we can get some
sort of true solution.

Comment 25

15 years ago
WFM - withdraw severity->normal recommendation

Comment 26

15 years ago
FWIW, here's my tweaked version of the bookmarklet:
javascript: var s = "http://www.google.com/search?q=%%"; var q =
escape('%s').replace(/+/, "%252B"); s = s.replace(/%%/, q); location.href = s;
void(1);

Line-broken for readability:

javascript:
var s = "http://www.google.com/search?q=%%";
var q = escape('%s').replace(/+/, "%252B");
s = s.replace(/%%/, q);
location.href = s;
void(1);

Comment 27

15 years ago
Er, /+/ should be /+/g in my tweaked version above.

as to the dependency...
I think the multiple substitution case would seem to be most simply handled by
using multiple %s's in the bookmark

http://www.example.com/page/?foo=%s&bar=%s
call as
example 17 fish
for http://www.example.com/page/?foo=17&bar=fish

and so on

Comment 28

15 years ago
Basically there are two conflicting user scenarios:
a) multi-term queries which require that '&' is not encoded, and
b) queries containing '+' and special characters (umlauts etc.) which need to be
encoded.

Why not encode all characters, except '&' and '#'? Then, the keyword
substitution would work in almost all cases, while remaining very flexible. (Of
course, searching for "s&p" with a google keyword would still not work.) 

Comment 29

15 years ago
Created attachment 139920 [details] [diff] [review]
Encoding all characters except '&' and '#' in keyword '%s' substitution

Patch for encoding all characters except '&' and '#' in keyword '%s'
substitution.

Since the encoding is in utf-8, this should be specified, e.g., when searching
google. Using http://www.google.com/search?ie=utf-8&q=%s works fine with
umlauts.

Comment 30

15 years ago
Wouldn't you need to add "=" to the list of escape-exempt characters?  (bringing
it up to [&=#])
What's wrong with the idea in comment 15?

Gerv

Comment 32

15 years ago
Matthew: Yes, '=' should not be escaped. This should be fine with most queries,
since "search?q=a=b" is generally interpreted as name="q" value="a=b".

Gerv: IMO the idea from comment 15 and comment 16 is very appealing. The only
downside of such a solution is that existing "multi-term queries" break (until
"%s" is replaced by "%S").

Both solutions (comment 15 and comment 28) seem reasonable, I just would like to
see one of them implemented... 

Comment 33

15 years ago
Created attachment 139974 [details] [diff] [review]
Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

Solution according to comment 15 and comment 16.

This also fixes bug 98749.
I think that bug 98749 (and bug 123237, which it is duped against) want multiple
different parameters, each of which gets allocated to a %s, rather than a single
parameter which gets placed into all occurrences of %s.

You need to split the text after the first space on whitespace, and then
allocate each member of the list to a %s until you run out of list members or
%ses. :-)

Still, this seems like the right fix for _this_ bug :-)

Gerv

Comment 35

15 years ago
Indeed, bug 124237 asks for multiple replacement parameters. However bug 98749
(and its duplicate bug 124173) ask for replacing multiple '%s'es with the same
value. The example given in the description suggests typing "mozbug XXXXXXXXXX"
in the location field, i.e., just one parameter (comment #5 in bug 98749
stresses the difference to the multi-parameter case). This would be fixed by
attachment 139974 [details] [diff] [review].

Anyway, I agree with Gerv that these other issues are not so relevant for this
bug. I just wanted to point out the nice side effect of the patch :)

Comment 36

15 years ago
*** Bug 153027 has been marked as a duplicate of this bug. ***

Updated

14 years ago
Attachment #139920 - Attachment is obsolete: true

Updated

14 years ago
Attachment #139974 - Flags: review?(bryner)
Comment on attachment 139974 [details] [diff] [review]
Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

-> ben, he knows this code much better than I do
Attachment #139974 - Flags: review?(bryner) → review?(bugs)

Comment 38

14 years ago
I also think that the default behaviour should be to urlencode everything in the
url, i.e. also that spaces should encode as '+', and very importantly that
meta-characters like '&', '#' and '=' get escaped correctly.  It's very
confusing to a non-technical web-user why certain keyword searches don't work as
expected, and it certainly breaks my usage pattern (try searching for c#
something-or other... oops).  I understand this will break some things, but
frankly, that's a different problem perhaps better solved by an extension or
some other means.

My vote for complete urlencoding :-).

Updated

14 years ago
Attachment #139974 - Attachment description: Encoding all characters for %s replacement; no encoding for %S → Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

Updated

14 years ago
Blocks: 252209

Updated

14 years ago
Flags: blocking-aviary1.0?

Comment 39

14 years ago
*** Bug 257911 has been marked as a duplicate of this bug. ***

Updated

14 years ago
Attachment #139974 - Flags: superreview?(neil.parkwaycc.co.uk)

Comment 40

14 years ago
Comment on attachment 139974 [details] [diff] [review]
Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

>+          shortcutURL = substitutedURL==shortcutURL ? null : substitutedURL;
I'd prefer a pre-substitute test. Or at least spaces around the ==
Attachment #139974 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview+
Flags: blocking-aviary1.0? → blocking-aviary1.0+
Created attachment 160422 [details] [diff] [review]
patch, un-bit-rotted. 

We don't need %S support, especially now keyword addition is automated.
Attachment #139974 - Attachment is obsolete: true
Fixed in firefox, br&trunk. 
Assignee: bugs → p_ch
Status: ASSIGNED → NEW
Flags: blocking-aviary1.0+
QA Contact: claudius → seamonkey.bookmarks

Updated

14 years ago
Blocks: 181243

Updated

14 years ago
Blocks: 264406

Comment 43

14 years ago
What is restraining the patch updated by Ben from getting into 1.8 ?
Can the existing s/sr be transferred ?
(In reply to comment #43)
> What is restraining the patch updated by Ben from getting into 1.8 ?

the fact that it's a firefox patch?

Comment 45

14 years ago
Created attachment 162844 [details] [diff] [review]
Updated patch for Mozilla

OK, so I didn't notice ben had only modified the FF part of the patch.

Here is the update for the mozilla code, based on what bg did for FF.
I keeped the %S, as I don't really see the connexion with automated keyword
addition (what is exactly automated keyword addition ?)

With regard to bug 264406, this patch happens to encode i18n charcters as UTF-8
when using %s, and encode as the local encoding when using %S. A bit of a
quirk, but the only clean solution to the problem is see is using LAST_CHARSET
to always do the right thing, which is significantly more complex.

Updated

14 years ago
Attachment #162844 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: review?(neil.parkwaycc.co.uk)

Comment 46

14 years ago
Comment on attachment 162844 [details] [diff] [review]
Updated patch for Mozilla

>+          shortcutURL  = shortcutURL.match(/%[sS]/) ? shortcutURL.replace(/%s/g, encodeURIComponent(text))
Nit: you should only have one space before the =
Nit: you should use regexp.test(string), not match
Nit: you wrapped a line but it was still over 80 characters.
Either wrap it harder or don't wrap it at all.
>+                                                                 .replace(/%S/g, text)
>+                                                    : null;
r+sr=me only once all three nits are fixed.
Attachment #162844 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: superreview+
Attachment #162844 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #162844 - Flags: review+

Comment 47

14 years ago
Created attachment 162990 [details] [diff] [review]
nit corrected mozilla patch

It's not fair, ben could enter his patch without the nitpick about using match
;-)
Attachment #162844 - Attachment is obsolete: true

Updated

14 years ago
Attachment #162990 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: review?(neil.parkwaycc.co.uk)

Comment 48

14 years ago
Comment on attachment 162990 [details] [diff] [review]
nit corrected mozilla patch

Sorry, didn't you know that /%[sS]/ is already a RegExp? Compare [] is a new
Array, {} is a new Object and function anonymous(){} is a new Function.
Attachment #162990 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: superreview-
Attachment #162990 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #162990 - Flags: review-

Comment 49

14 years ago
Created attachment 163330 [details] [diff] [review]
corrects neil's comments

Should rs= as per neil's email comment

The tree is currently open, so if someone could check it in ...

Updated

14 years ago
Attachment #162990 - Attachment is obsolete: true

Updated

14 years ago
Attachment #163330 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: review?(neil.parkwaycc.co.uk)

Updated

14 years ago
Attachment #163330 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163330 - Flags: superreview+
Attachment #163330 - Flags: review?(timeless)
Attachment #163330 - Flags: review?(neil.parkwaycc.co.uk)

Comment 50

14 years ago
patch checked into the trunk. Shouldn't this go in to branches as well? 
Status: NEW → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED

Comment 51

14 years ago
Comment on attachment 163330 [details] [diff] [review]
corrects neil's comments

>Index: navigator.js
>===================================================================
>+        // Bug 123006 : %s replace and URI escape, %S replace with raw value

We shouldn't be putting comments like this into the tree.

Comment 52

14 years ago
Why not?  Comments are good.

Comment 53

14 years ago
(In reply to comment #52)
> Why not?  Comments are good.

Because if we did, we'd have thousands of "Bug <n>" comments polluting the
source code.  That's what bonsai.mozilla.org is for.

Comment 54

14 years ago
I added the comment :
- because there was a similar comment just above, but this was back in 2000, so
the politic about that may have changed
- more to document the difference in behaviour between %s and %S. But it's true
this doesn't belong to the sourcecode, and should be added to mozilla doc somewhere.

If someone with the right rights reading this wants to remove the line, that's
OK for me, but I don't feel it's worth bugging someone to do it.

I'd like to get a look at bug 144875 now : "custom keywords with keyword same as
bookmark don't work". 
If I can reproduce, and if patching it touches the same code, I'll think about
removing the comment. BTW I accidently noticed (by adding an alert when
debugging the change) that code gets called twice when processing an URL which
is quite strange in my opinion.

Comment 55

14 years ago
jshin: excuse me, but where do you see r+sr = neil? i clearly see neil:r?timeless

Comment 56

14 years ago
see comment #46. comment #49 appears to mean that it's settled somehow. sorry if
it's not.

Comment 57

14 years ago
jshin: i would think that neil's actions see
<https://bugzilla.mozilla.org/show_activity.cgi?id=123006>

neil.parkwaycc.co.uk@myrealbox.com  	2004-10-25 16:42 PDT  	  Attachment
#163330 [details] [diff]Flag  	review?(neil.parkwaycc.co.uk@myrealbox.com),
superreview?(neil.parkwaycc.co.uk@myrealbox.com)  	review?(timeless@bemail.org),
superreview+

would trump his earlier comments (from 2004-10-21 12:53 PDT).

Comment 58

14 years ago
Sorry I didn't go that much length to tell which came first and which came later
when I checked in the patch. In restrospect, comment #48 obivously superceded
comment #44, which I didn't pay attention to. I also didn't realize that comment
#49 had come before neil's review request. Anyway,  do you have anything you
don't like about in the patch? If you don't,  why don't you add 'r' retroactively? 

Comment 59

14 years ago
This patch appears to have completely broken my keyword bookmark, which uses
%s, and needs to expand stuff like "foo/bar" to "http://baz/foo/bar". Firefox
1.0 does not implement either workaround (%S, or multiple keywords). What is the
workaround for this breakage? Is there one?

I don't understand why Ben G thinks %S is not needed (comment 41).

Comment 60

14 years ago
Using build 20041107 - %s is being replaced, and correctly encodes things
(breaking things like bar/baz in previous comment)
But %S is not being replaced at all.  Perhaps there's another area of the code
where the %s-detection needs to be changed to %[sS]-detection?

Comment 61

14 years ago
Ben got rid of '%S' when checking it to firefox 1.0branch while I kept it when
committing Jean-Marc's patch to the trunk. See also bug 258223 for a more
extensive patch.

Comment 62

14 years ago
I'd like to cast a vote for %S, though I don't see using it myself.

But here's my $0.02 for possible future improvements, priority => minimal

escape appropriately to the URL
Different protocols should be escaped differently (like Cold Fusion does)
javascript:alert('%s' + "%s"); should:
arg.replace("'", "\\'") on the first %s
arg.replace("\"", "\\\"") on the second %s

http://%s;%s:%s@www.%s.com/%s?q=%s#%s should
.URIescape the first, second, and third, fifth, and sixth %s's only
Do validity checking on the fourth and seventh

mailto:%s?cc=%s&subject=%s&body=%s
Do validity checking on %first and second, .URIescape third and fourth (and
please-oh-please escape spaces as %20 and not as +)

file:///some/path/filename-%s/
Do validity checking... no ".." paths or overlong Unicode escapes for such
Product: Browser → Seamonkey
Comment on attachment 139974 [details] [diff] [review]
Encoding all characters for %s replacement (complete urlencoding); no encoding for %S

removing old request
Attachment #139974 - Flags: review?(bugs)

Updated

14 years ago
Attachment #163330 - Flags: review?(timeless)
(Reporter)

Comment 65

13 years ago
Reopening.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(Reporter)

Comment 66

13 years ago
Closing this one in favor of another defect marked as a regression -- bug 319169.
Status: REOPENED → RESOLVED
Last Resolved: 14 years ago13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.