124237 - Custom keywords should support multiple substitutions (w/ code)

Reporter

Description

23 years ago

caveat: the code is in perl, as a starting point for whomever might do it in C.


custom keywords are awesome for most simple searches. Fixing bug 124173 would
make custom keywords great for a set of more complex searches. However, I think
that the geeks and powerusers will always be itching for something more: the
ability to use custom keywords to search from the URL bar when your query needs
to set two different fields. (Actually, implementing this RFE would make bug
124173 obsolete.)

For example, suppose you're a stock market geek. You're constantly looking at
stock graphs all day, and need not only to look at different stocks, but also at
different graph time frames. You might want any of the following:

    http://quote.fool.com/Chart/chart.asp?time=1dy&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=2dy&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=1mo&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=3mo&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=ytd&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=2yr&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=5yr&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=all&symbols=aapl
    http://quote.fool.com/Chart/chart.asp?time=1dy&symbols=msft
    http://quote.fool.com/Chart/chart.asp?time=2dy&symbols=msft
    http://quote.fool.com/Chart/chart.asp?time=1mo&symbols=msft
    http://quote.fool.com/Chart/chart.asp?time=3mo&symbols=msft
    http://quote.fool.com/Chart/chart.asp?time=ytd&symbols=msft
    http://quote.fool.com/Chart/chart.asp?time=2yr&symbols=msft
    http://quote.fool.com/Chart/chart.asp?time=5yr&symbols=msft
    http://quote.fool.com/Chart/chart.asp?time=all&symbols=msft

(for any of many many ticker symbols and about 10 durations)

The user might want a custom keyword "graph" and be able to type

    graph 2yr aapl
    graph 1dy msft

This general situation can exist in many other contexts as well; the above is
just one example. Common bugzilla queries might be another use, as might driving
directions from one zip code to another. Translation services are yet another example:
    trans en de http://example.com/something_english_I_want_in_german
    trans fr en http://example.com/something_french_I_want_in_english

Attempted workarounds might include:

    1) rearrange the URL so you can do something like
       graph aapl&time=1dy
       but this workaround fails because '&' and '=' get encoded

    2) create ten custom keywords so you can do
       graph1dy aapl
       graph2dy aapl
       graphytd aapl etc
       this workaround works for some cases
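Here is a quick way to see why workaround 1 fails. The sketch below assumes the typed text is escaped roughly the way JavaScript's `encodeURIComponent` escapes it. Mozilla's actual escaping routine may differ in details, such as turning spaces into `+`. The `&` and `=` typed by the user come out as `%26` and `%3D`, so the server sees a single `symbols` value rather than a second `time` parameter.

```javascript
// Assumption: the typed query is escaped roughly like encodeURIComponent;
// Mozilla's own routine may differ in details.
const template = "http://quote.fool.com/Chart/chart.asp?symbols=%s";

function expandSingle(tpl, query) {
  // Today's behavior: one %s, replaced by the escaped query text.
  return tpl.replace("%s", () => encodeURIComponent(query));
}

const url = expandSingle(template, "aapl&time=1dy");
console.log(url);
// http://quote.fool.com/Chart/chart.asp?symbols=aapl%26time%3D1dy
```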

In some cases, the workaround is not practical. For example, if you're trying to
create a bugzilla query which looks for bugs with a certain summary substring
assigned to a certain person, you might use this to find all tab-related bugs
assigned to user@example.com:

    ownedby user@example.com tab

It would be most impractical to create a custom keyword for every bug owner or
for every summary substring.


The interface for this might be something like this:
    user uses %q (query term) instead of %s

To get multiple query terms, we change the interface:
    user uses %q1, %q2, %q3, %q4, etc instead of %s
    initial implementation might only do %q1 to %q9

However, this could break in ambiguous situations:
    http://example.com/foo1.html   (the digit one, not a lowercase L)
    might be the first page of info about foo
    there might be pages of info on other things
    the user might have
        http://example.com/%s1.html
    which would not translate well in the %q world
        because %q12 might mean term 12
        or might mean term 1 and then '2'
    so, we need to terminate the %q terms
    %q1; %q2; %q3; ... %q9; %q10; %q11; ...
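To make the termination rule concrete, here is a hypothetical JavaScript sketch of a tokenizer that recognizes only terminated tokens. `TOKEN_RE` and `findTokens` are illustrative names, not existing Mozilla code. The index-less `%q;` form it also accepts is the "everything else" token proposed further down.

```javascript
// Hypothetical token pattern: %q or %Q, optional digits, mandatory ';'.
const TOKEN_RE = /%([qQ])(\d*);/g;

function findTokens(template) {
  const tokens = [];
  for (const m of template.matchAll(TOKEN_RE)) {
    tokens.push({
      text: m[0],                                // e.g. "%q12;"
      encoded: m[1] === "q",                     // lowercase q: URL-encode the term
      index: m[2] === "" ? null : Number(m[2]),  // null: the "everything else" token
    });
  }
  return tokens;
}

// With the terminator, "%q12;" is term 12, while "%q1;2" is term 1 plus a literal "2".
console.log(findTokens("http://example.com/%q12;.html").map(t => t.index)); // [ 12 ]
console.log(findTokens("http://example.com/%q1;2.html").map(t => t.index)); // [ 1 ]
```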

Note: because we're doing this termination, it becomes more vital that we use a
different %x letter than the existing %s. Existing custom keyword queries might
break if we tried to use the same %s rather than a new %q. My hope is that %q
would become the documented way to do things, and %s would be deprecated.

We also need a way to say "use all of the terms" or "all of the not-yet used
terms" so that in the above example, you could do something like
    ownedby user@example.com MULTI-WORD SUBSTRING HERE
to search for that person's bugs with that multi-word substring in the summary.

I suggest that %q; be the "everything else" token. If the code wanted to be
trickier, %q followed by a non(digit-or-semicolon) could also be read as %q;
This would make the %q queries work just like the %s queries for *most* cases
(but not the pathological cases with numbers in the URL right after the token...
in which cases the user would need the semicolon).
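Here is a minimal sketch of the "everything else" token. It assumes leftover terms are joined with `+`, as the Perl code below does, and escaped with `encodeURIComponent` as a stand-in for Mozilla's routine. It also applies the looser rule: a bare `%q` followed by a character that is neither a digit nor a semicolon, or by the end of the URL, means `%q;`. Numbered tokens are ignored here, so every term counts as left over. The function names are illustrative.

```javascript
// Looser rule: %q followed by a non-(digit or ';') character, or by the end
// of the string, is read as %q;   e.g. "%q&" -> "%q;&"
function relaxGreedyTokens(template) {
  return template.replace(/%([qQ])(?=[^\d;]|$)/g, "%$1;");
}

// Assumption: terms joined with '+', escaped with encodeURIComponent for %q;.
function fillEverythingElse(template, terms) {
  const plain = terms.join("+");
  const encoded = terms.map(encodeURIComponent).join("+");
  // One pass over both forms: %Q; gets the raw terms, %q; the escaped ones.
  return relaxGreedyTokens(template)
    .replace(/%([qQ]);/g, (_, kind) => (kind === "Q" ? plain : encoded));
}

console.log(fillEverythingElse(
  "http://a.com/search?owner=user%40example.com&q=%q",
  ["multi-word", "substring", "here"]));
// http://a.com/search?owner=user%40example.com&q=multi-word+substring+here
```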

Another potential issue: what if we've got a case where a user wants to put two
different multi-word substrings in their query. I propose that the user be able
to specify query terms that will not be encoded by using a capital Q instead of
a lower case q. So, they could do
    keyword term+one+here term+two+here
        to go in http://a.com/a.cgi?a=%Q1;&b=%Q2;
or they could do
    keyword term+one+here term two here
        to go in http://a.com/a.cgi?a=%Q1;&b=%q;
        (%q; takes "everything else" and url-encodes it)
either one of those approaches gives the final url of:
    http://a.com/a.cgi?a=term+one+here&b=term+two+here
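Here is a sketch of the `%Q`/`%q` distinction, again using `encodeURIComponent` as an assumed escaping routine. For simplicity it assumes the numbered tokens already run from 1 to k without gaps. Both approaches in the example produce the same final URL.

```javascript
// Sketch: %QN; inserts term N as typed, %qN; inserts it URL-encoded;
// %Q; / %q; take the terms left over after the numbered ones.
// Assumption: numbered tokens are already 1..k with no gaps.
function fill(template, terms) {
  const numbers = [...template.matchAll(/%[qQ](\d+);/g)].map(m => Number(m[1]));
  const used = numbers.length ? Math.max(...numbers) : 0;
  const rest = terms.slice(used);
  // Single pass, so substituted text is never re-scanned as a token.
  return template.replace(/%([qQ])(\d*);/g, (_, kind, n) => {
    const values = n === "" ? rest : [terms[Number(n) - 1] ?? ""];
    const out = kind === "Q" ? values : values.map(encodeURIComponent);
    return out.join("+");
  });
}

console.log(fill("http://a.com/a.cgi?a=%Q1;&b=%Q2;", ["term+one+here", "term+two+here"]));
console.log(fill("http://a.com/a.cgi?a=%Q1;&b=%q;", ["term+one+here", "term", "two", "here"]));
// both print http://a.com/a.cgi?a=term+one+here&b=term+two+here
```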


Finally, some messy cases are worth considering. What if the user puts numbered
%q##; tokens in their custom keyword URL, but doesn't use sequential numbers
starting from 1?
    http://a.com/a.cgi?a=%q5;&b=%q8;&c=%q2;
The above should work exactly like
    http://a.com/a.cgi?a=%q2;&b=%q3;&c=%q1;
In other words, they get filled in in numeric order.
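The renumbering rule can be expressed as a small rewrite. Collect the distinct token numbers, sort them, and replace each number with its rank. This is an illustrative sketch, and the function name `resequence` is made up. It leaves the `%q`/`%Q` case of each token alone.

```javascript
// Sketch: the k-th smallest distinct token number is renumbered to k.
function resequence(template) {
  const numbers = [...template.matchAll(/%[qQ](\d+);/g)].map(m => Number(m[1]));
  const distinct = [...new Set(numbers)].sort((a, b) => a - b);
  const rank = new Map(distinct.map((n, i) => [n, i + 1]));
  return template.replace(/%([qQ])(\d+);/g,
    (_, kind, n) => `%${kind}${rank.get(Number(n))};`);
}

console.log(resequence("http://a.com/a.cgi?a=%q5;&b=%q8;&c=%q2;"));
// http://a.com/a.cgi?a=%q2;&b=%q3;&c=%q1;
```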


What if the user doesn't provide enough terms to fill in all of their %q tokens?
Remaining tokens are filled in with empty strings.

What if the user wants to leave their first token filled with an empty string,
but provide terms for the other tokens? Sorry, that is not allowed by this system.

What about leading zeros? Treat %q000001; as identical to %q1;

What about a %q (without semicolon) at the end of the URL? Treat it as though it
had a semicolon.

What about %q1 (or other digits without semicolon) at the end of the URL? Again,
treat it as though it had a semicolon.

What about %q0; or %q0000; tokens? Treat them as %q; tokens.

What about %qxyz; tokens (where 'xyz' are non digits)? Treat them as %q; tokens
followed by 'xyz;' (the semicolon after xyz has no special meaning).

What if the user has %Q1; and %q1; in the same URL? Substitute the encoded first
term for %q1; and the non-encoded first term for %Q1;
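The rules above amount to a few regex rewrites applied before substitution. The sketch below is a JavaScript rendering of the `canonicalize()` routine in the Perl code that follows. It differs slightly from the Perl in one place: the second rule uses a lookahead instead of consuming the following character. The `%Q1;`/`%q1;` rule concerns substitution rather than canonicalization, so it isn't covered here.

```javascript
// Sketch: canonicalization rules as regex rewrites (port of the Perl
// canonicalize(), with a lookahead in the second rule).
function canonicalize(template) {
  return template
    // %q or %q123 at the very end of the URL gets its missing ';'
    .replace(/%([qQ])(\d*)$/, "%$1$2;")
    // %q followed by neither a digit nor ';' is the "everything else" %q;
    .replace(/%([qQ])(?=[^\d;])/g, "%$1;")
    // strip leading zeros; all zeros (%q0; or %q0000;) becomes %q;
    .replace(/%([qQ])0+(\d*);/g, "%$1$2;");
}

console.log(canonicalize("http://a.com/?a=%q000001;&b=%qxyz;&c=%q0;&d=%q12"));
// http://a.com/?a=%q1;&b=%q;xyz;&c=%q;&d=%q12;
```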


I've got the following rough code to do the job in perl. I used perl because I
know perl, and hope that someone interested in this feature would be able to
turn it into functioning C code, using whatever regex capabilities are in
Mozilla's code. I realize this will be much harder in C.

I haven't done more than type the following code, so don't be surprised if
there's a missing semicolon or comma. The idea was to provide a starting point
for whomever might be able to tackle it in C, not to provide polished
functionality in perl.



# User entered "customkeyword terma termb termc" into URL bar
# (or at least "customkeyword" by itself into the URL bar).
# Given that we've identified a keyword with associated $bookmark
# ($bookmark in form of 'http://a.com/a.cgi?a=%q1;&b=%q2;&c=%q3;')
# and remainder of URL bar is in $query ie: "terma termb termc"
# need to return a $url to which user will be directed:

sub build_url ($$) {
    my( $bookmark, $query ) = @_;
    $bookmark = &canonicalize( $bookmark );
    return $bookmark unless &has_tokens( $bookmark );
    for ( $query ) {
        # strip initial/trailing whitespace
        s/^\s+//g; s/\s+$//g;
    }
    # we're done if query was only whitespace:
    return &empty_tokens( $bookmark ) unless length( $query );
    # otherwise, split query into terms on whitespace
    my @terms =  split( /\s+/, $query);
    my @numeric_tokens; # need to see which ones were used:
    while ( $bookmark =~ m/%[qQ](\d+);/g ) {
        push( @numeric_tokens, $1 ); # save numbered tokens
    }
    if ( @numeric_tokens ) {
        # suppose @numeric_tokens = 22, 22, 1, 2, 7, 6, 5, 2, 2, 1
        @numeric_tokens = sort { $a <=> $b } (
            keys %{{ map {$_ => 1} @numeric_tokens }}
        );
        # now @numeric_tokens has unique, numerically sorted entries
        # like this: 1, 2, 5, 6, 7, 22
        # Now we start filling in @terms for @numeric_tokens
        while ( @terms and @numeric_tokens ) {
            my $term = shift @terms;
            my $token = shift @numeric_tokens;
            # substitute %Q (un-encoded) tokens:
            $bookmark =~ s/%Q$token;/$term/g;
            # now get the encoded form of the term:
            my $encoded = &url_encode( $term );
            # and substitute %q (normal encoded) tokens:
            $bookmark =~ s/%q$token;/$encoded/g;
        }
    }
    # we may or may not have used up all of our @terms
    return &empty_tokens( $bookmark ) unless @terms;
    # now we know that we still have terms left
    # therefore, we know we're out of @numeric_tokens
    # $bookmark now has non-numbered tokens or no tokens
    return $bookmark unless &has_tokens( $bookmark );
    # $bookmark has non-numbered token(s) and terms remain
    # Get the un-encoded version of $everything_else
    my $everything_else = join( '+', @terms );
    # put $everything_else in for non-numbered %Q; token(s)
    $bookmark =~ s/%Q;/$everything_else/g;
    # get the encoded version of $everything_else
    $everything_else = join( '+', @terms );
    # and put that in for non-numbered %q tokens:
    $bookmark =~ s/%q;/$everything_else/g;
    # that's it... give it to the user:
    return $bookmark;
}
sub canonicalize ($) {
    my $bookmark = shift;
    # if nothing resembles a token, we're already done:
    return $bookmark unless $bookmark =~ m/%[qQ]/;
    # the comments below talk about %q, but really
    # mean %q or %Q (case-preserving)
    for ($bookmark) {
        # %q on the end should be %q; (add semicolon)
        # likewise, %q123 at end should be %q123;
        s/%([qQ])(\d*)$/%$1$2;/g;
        # %q followed by non(digit or semicolon)
        # should be treated as %q; (followed by
        # whatever that non(digit or semicolon) was
        s/%([qQ])([^\d;])/%$1;$2/g;
        # ignore initial zeros in numbered tokens,
        # and all zeros means just %q; (non-numbered)
        s/%([qQ])0+(\d*);/%$1$2;/g;
    }
    return $bookmark;
}
sub has_tokens ($) {
    my $bookmark = shift;
    for ($bookmark) {
        # return 1 (true) if we find tokens
        return 1 if m/%[qQ]\d+;/; # numbered token
        return 1 if m/%[qQ];/;    # everything else token
    }
    return 0; # otherwise return 0 (false) if none found
}
sub empty_tokens ($) {
    my $bookmark = shift;
    for ($bookmark) {
        # put empty string in for all tokens:
        s/%[qQ]\d+;//g; # numbered tokens
        s/%[qQ];//g;    # everything else tokens
    }
    return $bookmark;
}
sub url_encode ($) {
    my $term = shift;
    # yadda yadda yadda
    # %-encode all the naughty bits
    # code not provided, as I'm sure
    # Mozilla has an existing routine
    # which would be used for that.
    return $term;
}
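For comparison, here is a compact JavaScript port of `build_url()` and `canonicalize()`, which may be easier to map onto C/C++ than the Perl. It is a sketch, not Mozilla code, and `encodeURIComponent` stands in for whatever escaping routine Mozilla would use. The escaping is applied to each term, including the leftover terms that fill `%q;`. That is the encoding Comment 3 notes is missing from the Perl's `join`. A single regex pass replaces the separate `has_tokens()`/`empty_tokens()` steps, so text inserted for one token is never re-read as another token.

```javascript
// Sketch only: JavaScript port of build_url()/canonicalize().
// Assumption: encodeURIComponent stands in for Mozilla's escaping routine.
function canonicalize(bookmark) {
  if (!/%[qQ]/.test(bookmark)) return bookmark;
  return bookmark
    .replace(/%([qQ])(\d*)$/, "%$1$2;")      // %q / %q12 at the very end gets its ';'
    .replace(/%([qQ])(?=[^\d;])/g, "%$1;")   // %q + non-(digit or ';') means %q;
    .replace(/%([qQ])0+(\d*);/g, "%$1$2;");  // drop leading zeros; all zeros => %q;
}

function buildUrl(bookmark, query) {
  bookmark = canonicalize(bookmark);
  const trimmed = query.trim();
  const terms = trimmed === "" ? [] : trimmed.split(/\s+/);

  // Distinct token numbers in ascending order; the k-th one gets terms[k].
  const numbers = [...new Set(
    [...bookmark.matchAll(/%[qQ](\d+);/g)].map(m => Number(m[1]))
  )].sort((a, b) => a - b);
  const termFor = new Map(numbers.map((n, i) => [n, terms[i] ?? ""]));
  const rest = terms.slice(numbers.length);   // leftovers for %q; and %Q;

  // One pass; tokens with no matching term (or no leftovers) become "".
  return bookmark.replace(/%([qQ])(\d*);/g, (_, kind, n) => {
    const values = n === "" ? rest : [termFor.get(Number(n))];
    return (kind === "Q" ? values : values.map(encodeURIComponent)).join("+");
  });
}

console.log(buildUrl(
  "http://quote.fool.com/Chart/chart.asp?time=%q1;&symbols=%q2;", "2yr aapl"));
// http://quote.fool.com/Chart/chart.asp?time=2yr&symbols=aapl
```

The same function also handles the resequencing rule. `buildUrl("http://a.com/a.cgi?a=%q5;&b=%q8;&c=%q2;", "x y z")` fills `%q2;` with `x`, `%q5;` with `y` and `%q8;` with `z`.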

m_mozilla

Reporter

Updated

23 years ago

OS: Mac System 9.x → All

xyzzy

Comment 1

23 years ago

Thank you for the submission, but please use an attachment for long reports,
especially for code.

*** This bug has been marked as a duplicate of 98749 ***

Status: NEW → RESOLVED

Closed: 23 years ago

Resolution: --- → DUPLICATE

m_mozilla

Reporter

Comment 2

23 years ago

not a dup. Bug 98749 asks for way less than this [RFE] asks.

In particular, bug 98749 only wants s/%s/query/g instead of s/%s/query/ (global
substitution of the %s token), while this bug wants multiple tokens to stuff
ordered arguments in different locations (and the ability to specify that some
args not be %-encoded).
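Here is the difference between the two substitutions, sketched in JavaScript with a made-up template. A plain string pattern replaces only the first `%s`, while a global regex replaces every one. Neither lets different arguments land in different places, which is what this RFE is after.

```javascript
// Illustrative template and query (made up for this example).
const tpl = "http://a.com/?product=%s&component=%s";

// Like s/%s/query/ : only the first %s is replaced.
const firstOnly = tpl.replace("%s", "tabs");
// Like s/%s/query/g : every %s is replaced (what bug 98749 asks for).
const everywhere = tpl.replace(/%s/g, "tabs");

console.log(firstOnly);   // http://a.com/?product=tabs&component=%s
console.log(everywhere);  // http://a.com/?product=tabs&component=tabs
```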

If this bug is addressed, then bug 98749 may become irrelevant, but fixing
bug 98749 won't address this RFE at all. Actually, I submitted a dup of 98749
immediately before submitting this bug, precisely because they are two separate
things. I'll go mark *that* bug (bug 124173, referenced in the description) as a
dup of bug 98749.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

m_mozilla

Reporter

Comment 3

23 years ago

noticed bug in perl code

    # get the encoded version of $everything_else
    $everything_else = join( '+', @terms );

no encoding actually happens. Should be

    # get the encoded version of $everything_else
    $everything_else = join( '+', map {&url_encode($_)} @terms );

correcting, and putting perl code in as an attachment...

-matt

m_mozilla

Reporter

Comment 4

23 years ago

Attached file (corrected) perl code to use as starting point for C code to implement RFE (obsolete)

m_mozilla

Reporter

Comment 5

23 years ago

Attached file (correct and tested) perl code to use as starting point (obsolete)

noticed another bug. Decided to actually test (*gasp*) the code rather than
just re-read it. Found/fixed a couple more bugs. Added a couple of
optimizations (extra attempted substitutions and term escaping are no longer
performed). This code actually works. Interested parties may want to use this
in a CGI script called for their "Internet Keywords" search engine until this
RFE is addressed.

I hope this won't be too messy to translate into C/C++...

-matt

Attachment #68535 - Attachment is obsolete: true

m_mozilla

Reporter

Comment 6

23 years ago

Attached file improved perl code to use as a starting point (obsolete)

I noticed one problem with the other code. I didn't have enough test cases and
my previous discussion had not covered this odd case, so my code didn't address
it either.

    http://a.com/?%q123abc

How should that be addressed? I decided that it should be treated as

    http://a.com/?%q;123abc

The previous code would have failed to recognize a %q token at all, and left
the %q token unused and unremoved in the resulting URL.
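One way to express this rule is a single rewrite that runs before token matching. This sketch assumes that digits at the very end of the URL still form a numbered token, following the earlier "%q1 at the end" rule. For that reason the lookahead requires an actual following character. `splitGreedyDigits` is an illustrative name.

```javascript
// Sketch: %q + digits + a character that is neither a digit nor ';' is the
// greedy %q; token followed by literal digits. Digits at the very end of
// the URL are left alone (assumed to be a numbered token, per the earlier rule).
function splitGreedyDigits(template) {
  return template.replace(/%([qQ])(\d+)(?=[^\d;])/g, "%$1;$2");
}

console.log(splitGreedyDigits("http://a.com/?%q123abc")); // http://a.com/?%q;123abc
```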

Attachment #68540 - Attachment is obsolete: true

Andreas Franke (gone)

Comment 7

23 years ago

I like this idea in general. But if you are implementing this, you should be
strict in refusing ambiguous or non-wellformed urls, like these cases:

- missing ; at the end of the url
- more than one digit, as long as you don't support more than 9 args
- %qxyz; (where 'xyz' are non digits)

Just reject the url when the user tries to enter the keyword - url pair.
You can always relax the restriction later on, but you can't add additional
restrictions without the danger of breaking backwards compatibility.
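Here is a sketch of the strict check suggested above, run when the user saves a keyword/URL pair. It reads "no more than 9 args" as follows: every `%q`/`%Q` must be followed either by `;` or by a single digit from 1 to 9 and then `;`. That reading and the function names are assumptions.

```javascript
// Assumption: valid forms are %q; and %qN; with a single digit N in 1..9.
// Returns the offsets of every %q/%Q that is not in one of those forms.
function findMalformedTokens(template) {
  return [...template.matchAll(/%[qQ](?![1-9]?;)/g)].map(m => m.index);
}

function isAcceptableTemplate(template) {
  return findMalformedTokens(template).length === 0;
}

console.log(isAcceptableTemplate("http://a.com/a.cgi?a=%q1;&b=%q2;")); // true
console.log(findMalformedTokens("http://a.com/?a=%q12;&b=%qxyz;"));    // [ 16, 24 ]
```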

Also, you may want to have a look at how shells do this on *nix (linux etc.),
like bash, or csh/tcsh. In shells, args are usually referenced as $1 ... $9,
but there are some subtleties, e.g. "$*" in tcsh, or "$@" in bash for all args,
and $0 for the program name (=script name) itself (which would correspond to the
keyword name in your setting). If you make keywords behave like shell commands,
people with linux experience will thank you :-)

timeless

Comment 8

23 years ago

if we're going to look to shells, %1..%9 is what cmd uses, and %* means everything. ^ is used to escape things. http://www.ss64.demon.co.uk/ntsyntax/parameters.html

Personally I don't like the idea of resequencing based on usage.

For now, i'd suggest that we use the intl routines instead of any of the ideas suggested above. A short description of them can be found in bug 67372.

keyword: graph
url: http://quote.fool.com/Chart/chart.asp?time=%s&symbols=%s
keyword: bugzilla
url: http://bugzilla.mozilla.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&field0-0-0=product&type0-0-0=substring&value0-0-0=%1$s&field0-0-1=component&type0-0-1=substring&value0-0-1=%1$s&field0-0-2=short_desc&type0-0-2=substring&value0-0-2=%1$s&field0-0-3=status_whiteboard&type0-0-3=substring&value0-0-3=%1$s

keyword: translate
url: http://fets3.freetranslation.com:5081/?Language=%3$s%2F%5$s&Url=http%3A%2F%2F%1$s&Sequence=core
notes:
 from: http://www.faqts.com/knowledge_base/view.phtml/aid/11896/fid/53
 usage: translate http://url from English to Swedish
 comment: this keyword is designed to be a natural language parser instead of a techy parser.  It would be impossible to implement this if you practiced parameter sliding because %2$s and %4$s are not useful to the backend although they are useful to the end user.

--

note that at this point in time, i'd imagine you could use a javascript: url to implement whatever you want. [It'd be painful but hey]

The string code we have also supports %d; however, afaik it does not support syntaxes for
javascript:"take the third and following items".split(/ /).slice(2)

http://developer.netscape.com/docs/manuals/communicator/jsref/core1.htm

bugzilla won't linkify the above url, you'd probably need to use the one below
javascript:eval(String.fromCharCode(102,117,110,99,116,105,111,110,32,117,40,97,41,123,114,101,116,117,114,110,32,97,46,114,101,112,108,97,99,101,40,47,43,47,103,44,39,32,39,41,125));if(dump)dump(u('this+is+a+test+of+plus+escaped+urls\n'));eval(u("'take+the+third+and+following+items'.split(/+/).slice(2)"))

ftang: is there a way to get all parameters (akin to %*)

Status: REOPENED → NEW

m_mozilla

Reporter

Comment 9

23 years ago

Comment #8 wrote:
> if we're going to look to shells, %1..%9

I personally think there should be a fair amount of emphasis on *if*.

    1) shell scripts aren't URI templates
       they have whitespace and other tokens to provide context
       we have much less

    2) I think it would be good to leverage existing expectations.
       people already use non-shell tokens for URI templates
       (custom keywords, map-making links in address book)

> Personally I don't like the idea
> of resequencing based on usage.

I didn't understand that statement, so I don't know whether
I feel the same or not.

> For now, i'd suggest that we use the intl routines instead
> of any of the ideas suggested above. A short description
> of them can be found in bug 67372.

If I'm reading those correctly, they may work well in the
context of sprintf-type construction of messages to the user,
but they are ill-suited to a URI-Template context.

    keyworda arg_one arg_two
    http://a.com/?foo=%1$s%2F%2$s
    http://a.com/?foo=arg_one%2Farg_two

ok, that works... but what if you have more than 9 args?

    %10$s  ==  URI-escape "%10" and then "$s"
 or %10$s  ==  URI-Template token for 10th arg
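To illustrate, here is a naive JavaScript sketch of `%N$s` substitution in a URI template. It is not the intl code from bug 67372, just a regex stand-in, and `fillPositional` is a made-up name. It handles the two-argument example. The last lines show the ambiguity: `%10` is itself a valid percent-escape, so a parser cannot tell the token `%10$s` from `%10` followed by a literal `$s`.

```javascript
// Assumption: a simple regex stands in for the intl positional-argument code.
function fillPositional(template, args) {
  return template.replace(/%(\d+)\$s/g,
    (_, n) => encodeURIComponent(args[Number(n) - 1] ?? ""));
}

console.log(fillPositional("http://a.com/?foo=%1$s%2F%2$s", ["arg_one", "arg_two"]));
// http://a.com/?foo=arg_one%2Farg_two

// "%10" decodes to a real character (U+0010), so "%10$s" is ambiguous.
console.log(decodeURIComponent("%10") === "\u0010"); // true
```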

The syntax also seems very unfriendly to non-programmers
(and even programmers will have to type *very* slowly).

Certainly there's something to be said for leveraging existing
code, but I think that it is a bad idea to push the syntax in
a direction with a limited future. I would rather see *no*
support for these cool features in the near-term and then see
it done right farther down the road.... instead of putting in
support in the near-term which will then make it even harder
to add features down the road. ...IMHO

I'd say a good first step is to require token termination in
URI templates. So, instead of %s you now have to do %s;. I've
already put in bug 124240 in the hopes that this might happen
in the near term before it gets harder to ask users to do.

When I get around to it, I'll probably just "make this work"
for myself by creating a CGI script to handle all my keyword
needs. This also gives me the advantage of a globally-available
bookmarks file. However, I think the features are cool enough
that users everywhere might benefit if Mozilla provided them.

I've got some shorter perl code that does things with a bit
less flexibility with regard to malformed tokens. I'll attach
it in a sec.

-matt

m_mozilla

Reporter

Comment 10

23 years ago

Attached file new perl: more efficient, less forgiving WRT malformed tokens, fewer comments (obsolete)

I'm not sure if having something leaner in perl will
make it easier or harder to put into C... but FWIW,
here's some new perl code...

-matt

Attachment #68748 - Attachment is obsolete: true

m_mozilla

Reporter

Comment 11

23 years ago

Attached file bug fix to above (obsolete)

Sigh...

It's important to save a file before uploading it, eh?

Some time after uploading the previous version, I noticed the file was unsaved
in my editor... Turns out I uploaded a version missing " unless @terms" in one
line... which totally stripped out some functionality.

fixed. ...sigh...

Attachment #69476 - Attachment is obsolete: true

m_mozilla

Reporter

Comment 12

23 years ago

Attached file now entirely unforgiving of malformed tokens (obsolete)

must... stop... attaching...!

basically a regex tweak to make code entirely unforgiving of malformed tokens.
Code will not attempt to "fix" anything that isn't a perfectly-formed token.
Malformed tokens are simply ignored and left in place in the resulting URI.

Note that initial zeros are assumed to be an aesthetic preference rather than
a malformed token. So, %q09; might be easier to read for some folks than %q9;

...and for predictability's sake %q0; is treated as %q; (greedy token).
If folks think that zero-indexed numbered tokens should be an option,
then you simply need to change one '*' in the regex to a '+'...

    s/%([qQ])0*(\d*);/%$1$2;/g;
    s/%([qQ])0*(\d+);/%$1$2;/g;
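For illustration, here are the two variants above ported to JavaScript regexes. Under the first, `%q0;` collapses to the greedy `%q;`. Under the second, `%q0;` survives as a zero-indexed token. Leading zeros are stripped either way. The helper names are made up.

```javascript
// First regex above: 0* then \d*  (%q0; becomes the greedy %q;)
const greedyZero = t => t.replace(/%([qQ])0*(\d*);/g, "%$1$2;");
// Second regex above: 0* then \d+ (%q0; stays a zero-indexed token)
const zeroIndexed = t => t.replace(/%([qQ])0*(\d+);/g, "%$1$2;");

console.log(greedyZero("a=%q0;&b=%q007;"));   // a=%q;&b=%q7;
console.log(zeroIndexed("a=%q0;&b=%q007;"));  // a=%q0;&b=%q7;
```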

-matt
(who has no clue how hairy this would look in C, but is curious)

(...and who promises not to add any more attachments.... today)

Attachment #69480 - Attachment is obsolete: true

timeless

Comment 13

23 years ago

not that i'm encouraging it, but &#37; would probably be a reasonable approach 
for % escaping.  I didn't pick the %1000$s syntax, I only asked and was told.

As I mentioned earlier, ^ is cmd's escape for magic chars, so ^% would be used 
to mean %.

%s is just a shortcut.  $s/$d certainly can be considered termination tokens.

the translate http://url from English to Swedish example was about 
resequencing.

If you resequence, you have a really hard time discarding tokens (from, to)

it was in response to:
What if the user puts numbered %q##; tokens in their custom keyword URL, but 
doesn't use sequential numbers
starting from 1?
    http://a.com/a.cgi?a=%q5;&b=%q8;&c=%q2;
The above should work exactly like
    http://a.com/a.cgi?a=%q2;&b=%q3;&c=%q1;
In other words, they get filled in in numeric order.

m_mozilla

Reporter

Comment 14

23 years ago

The problem I see here is that %00 to %FF already have specific meaning in URIs.
That's how you encode "funky" characters in a URI. So, if we have a syntax that
has tokens beginning with any of those hexadecimal escapes, then we run into
potential confusion. Is it a hexadecimal escape followed by stuff that happens
to make it look like a token by coincidence, or is it a real token?

The possibility of escaping legitimate % characters in the URI is one way to fix
this ambiguity. However, now we're asking users to do a pretty serious amount of
work in order to generate a URI token. They have to take each *legitimate* %
character and replace it with either ^% or &#37; (and now we have to be concerned
about legitimate appearances of *those* sequences in a URI which only happen to
look like escaped % characters in our template syntax). This just gets very very
messy and very easy for users to screw up.

All of these problems vanish when numbers (and A-F) are not allowed after the %
character in our token syntax. %s and %q have no defined meaning in a URI: the %
character should only appear as the first character of a hexadecimal character
escape, and if it's followed by a character outside 0-F, then it's not a
hexadecimal escape and therefore not part of a "real" URI.

This is why I'm pushing for something like %s; %s1; %s2; etc.

The biggest problem with *any* mechanism of escaping legitimate % characters is
that most users won't do it. If joe sixpack has a few custom keywords (which are
*not* intended to be used as custom keyword queries), then he's not going to
scan through for % characters and escape them. He's just going to treat it like
a regular bookmark with a "nickname". If because of this, he leaves something in
place that looks like a token, then his reasonable expectation of a functioning
bookmark will fail. Example:
    http://a.com/vote_for_a_var.cgi?myvote=$stop%21$stop%21$stop%21
    user is expressing a vote of "$stop!$stop!$stop!"
    (%21 is an encoded exclamation point)
Kinda looks like %21$s, doesn't it? Code that tries to tell whether someone is
escaping something to form a query token, or simply failing to escape things
because they've never heard of query tokens, can never be guaranteed to be
accurate. Sure, we can say "you should have escaped those % characters", but
who among the joe sixpack crowd is going to buy that? They'll just decide that
custom keywords are broken sometimes.
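To make the collision concrete, the sketch below runs two hypothetical token patterns over the vote bookmark. One is a naive `%N$s` matcher and the other matches terminated `%q`-style tokens. Both regexes are illustrative assumptions about how a parser might look.

```javascript
// Assumption: both patterns are illustrative, not real Mozilla parsers.
const bookmark = "http://a.com/vote_for_a_var.cgi?myvote=$stop%21$stop%21$stop%21";

// Naive positional syntax: "%" + digits + "$s".
const positionalHits = [...bookmark.matchAll(/%(\d+)\$s/g)].map(m => m[0]);
// Terminated syntax: "%q" or "%Q", optional digits, then ";".
const terminatedHits = [...bookmark.matchAll(/%[qQ]\d*;/g)].map(m => m[0]);

console.log(positionalHits); // [ '%21$s', '%21$s' ]  (the plain bookmark gets mangled)
console.log(terminatedHits); // []                     (the bookmark is left alone)
```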

Suppose you want a bookmark to validate any page on mozilla.org. To validate
    http://mozilla.org/foo.html
you want to do just
    val foo.html

You start with

    http://validator.w3.org/check?uri=http%3A%2F%2Fmozilla.org%2Ffoo.html

In current syntax, you just replace foo.html with %s.
That works, but it doesn't accommodate numbered tokens. If we follow the proposal
that legitimate % characters need to be escaped in order to accommodate numbered
tokens, then we have a fair amount of work to do.

    http://validator.w3.org/check?uri=http^%3A^%2F^%2Fmozilla.org^%2F%s
    (assuming we're going with the easier of the two escape methods)
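For contrast, here is a sketch of the same `val` keyword under the terminated-token proposal. The user swaps `foo.html` for `%q;` and leaves the existing `%3A`/`%2F` escapes alone, because a matcher for `%q;` never mistakes them for tokens. `fillGreedy` is a hypothetical helper that only handles the `%q;` token, and `encodeURIComponent` is an assumed stand-in for Mozilla's escaping.

```javascript
// Terminated-token template: the original escapes stay exactly as copied.
const template = "http://validator.w3.org/check?uri=http%3A%2F%2Fmozilla.org%2F%q;";

// Hypothetical helper: fills only %q; (terms escaped, joined with '+').
function fillGreedy(tpl, query) {
  const terms = query.trim().split(/\s+/).filter(Boolean);
  return tpl.replace(/%q;/g, () => terms.map(encodeURIComponent).join("+"));
}

console.log(fillGreedy(template, "foo.html"));
// http://validator.w3.org/check?uri=http%3A%2F%2Fmozilla.org%2Ffoo.html
```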

Now, the ^ character can be seen in the wild in escaped and unescaped forms in
URIs. It isn't as common as, say the tilde, (as in a.com/~username/), but it
does happen. How are we going to distinguish legitimate ^ characters from ones
used to escape % characters? Suppose we use the &#37; escape instead.... Things
are even worse. The & character is *very* common in URIs. In HTML, that
character is escaped, but in a raw URI (in the location bar), that character
appears all the time. Not only are users likely to screw up, but the code
required on the back end is going to be complicated and more likely to have bugs
or be difficult to extend in the future.

Suppose I've got a site hosted on Mac OS 9 (slashes are routinely allowed in
paths), and all of my pages are in the path (from document root)
    ~dir   ->   misc/etc   ->  crap&stuff  ->  100% mine ^%^
and I want a similar bookmark to validate any of my pages. Start with

    http://validator.w3.org/check?uri=
    http://amac.example.com/~dir/misc%2Fetc/crap%26stuff/100%25%20mine^%25^/

hmmm.... If I'm joe sixpack and I want to have a keyword that does just that
validation, will it work if I forget to escape my % characters? If I'm geek guy
and want to have a custom keyword with a token at the end of that (and maybe
another token past that to specify which version of HTML/XHTML I want to
validate against), then I'll need to escape things. How likely am I to get all
the escaping done right?

basically, this need to escape things is just about guaranteed to blow up in the
future. Some folks won't escape, some will. Some will try, and do it wrong.
Mozilla will be expected to recognize when someone is trying to escape things,
and when they just have a URI (without tokens) that just happens to *look* like
some sort of escape. Shoehorning new features into URI templates with all of
these problems in the future is going to be a nightmare.

The problem is that the current syntax was *not* derived with this usage in
mind, and it is horribly limited in this context. It can't be extended much past
the simple %s that we have now without breaking things.

We need a syntax where tokens are terminated, and we need that syntax (if it
uses % characters) to NEVER have any of 0123456789ABCDEFabcdef after the %
character.

This is why I've suggested the syntax I have.

I've attached perl code that does what I have in mind. This shows that it can be
done. I admit that it won't be as concise in C, but it can still be done with
the same precision. If you go with some other syntax which requires escaping of
% characters, I challenge you to write (in *any* language, including english)
code that will reliably handle cases where the user isn't doing any escaping
(joe sixpack who doesn't do custom keyword queries) and geek guy who uses them
all the time. Then consider how much messier your code is, how much more work it
required on the part of the end user (all that escaping), and decide whether you
still want to stick with the current %1$s syntax instead of moving toward %s;
and %s1;

thanks :)

-matt

m_mozilla

Reporter

Comment 15

23 years ago

Attached file bugfixed code with tests included

decided I liked %q0; to be a zero-indexed numbered token rather than a synonym
for %q; (greedy token). found/fixed another bug. Put in a double fistful of
test cases so that I can stop introducing bugs when I tweak it.

test cases make interesting reading. I've got working examples for
    driving directions from one address to another
    searching for a substring in bugs owned by specific person
    translating web pages to/from several languages and english
    graphing stock exchange symbols over different timeframes

Attachment #69485 - Attachment is obsolete: true

m_mozilla

Reporter

Comment 16

23 years ago

I think the specific case of driving directions between arbitrary addresses
highlights the flexibility of this URI template syntax. It's a bit ugly right now,
but has the potential to be fixed up because the syntax has room to expand.

The perl code supports a URLbar text of

drivefrom_a_to_b 111+StartHere+Blvd 12345 999+EndHere+Ave 56789

(extra whitespace not required, just added to help readability).
Assumes keyword drivefrom_a_to_b with URI Template of
http://maps.yahoo.com/py/ddResults.py?newaddr=%Q1;&newcsz=%q2;&taraddr=%Q3;&tarcsz=%q4;

Admittedly, you would probably just use the keyword "drive" instead. If you've
got the proper custom keyword set up, the above thing is possible with the perl
code... and I don't think it comes close to being possible with the current
syntax, or even with the %1$s syntax. This syntax allows it, but it's a wee bit
ugly.

However, because the syntax has room to expand, some day down the road, an
enterprising developer could invent some syntax addition.

%q4t,;

might mean "the fourth numbered token gobbles up subsequent terms until it
encounters a term which ends in the character ','" (and the comma is
configurable)... and

%q3x;

might mean the third token, which is always rendered as an empty string in the
resulting URI (which allows you to put "filler" in your queries). These two
additions would allow

drive_from 123 Here Avenue, 12345 to 789 There Blvd, 56789

With a URI Template of
http://maps.yahoo.com/py/ddResults.py?newaddr=%q1t,;&newcsz=%q2;&taraddr=%q4t,;&tarcsz=%q5;%q3x;
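Purely as an illustration of how such extensions could be parsed, here is a hypothetical sketch. None of this syntax exists. It only shows that the proposal is mechanically workable, and it rests on these assumptions:

- `%qNtC;` gobbles terms up to and including the first one ending in `C`, and drops that `C`.
- `%qNx;` consumes one term and renders it as an empty string.
- Tokens consume terms in ascending `N`.
- Terms are escaped with `encodeURIComponent` and joined with `+`.

```javascript
// Hypothetical syntax (assumptions, not an existing feature):
//   %qN;    one term            %qNx;   one term, rendered as ""
//   %qNtC;  terms up to and including one ending in C (C itself dropped)
function fillExtended(template, query) {
  const terms = query.trim().split(/\s+/).filter(Boolean);
  const re = /%q(\d+)(t(.)|x)?;/g;
  const specs = [...template.matchAll(re)]
    .map(m => ({ n: Number(m[1]), stopAt: m[3], filler: m[2] === "x" }))
    .sort((a, b) => a.n - b.n);

  const values = new Map();
  let i = 0; // index of the next unconsumed term
  for (const spec of specs) {
    if (values.has(spec.n)) continue; // same token number used twice
    const taken = [];
    if (spec.stopAt !== undefined) {
      // Gobble terms until one ends with the stop character.
      while (i < terms.length) {
        const t = terms[i++];
        if (t.endsWith(spec.stopAt)) { taken.push(t.slice(0, -1)); break; }
        taken.push(t);
      }
    } else if (i < terms.length) {
      taken.push(terms[i++]);
    }
    values.set(spec.n, spec.filler ? "" : taken.map(encodeURIComponent).join("+"));
  }
  return template.replace(re, (_, n) => values.get(Number(n)));
}

console.log(fillExtended(
  "http://maps.yahoo.com/py/ddResults.py?newaddr=%q1t,;&newcsz=%q2;&taraddr=%q4t,;&tarcsz=%q5;%q3x;",
  "123 Here Avenue, 12345 to 789 There Blvd, 56789"));
// http://maps.yahoo.com/py/ddResults.py?newaddr=123+Here+Avenue&newcsz=12345&taraddr=789+There+Blvd&tarcsz=56789
```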

Mind you, I'm not saying that the precise syntax extensions described are a good
idea, but they are at least possible, and someone could code up support for
better ideas. I envision that once support for such things was built in, dynamic
content providers could provide users with "smart bookmarks" which, when given
the right keyword allow users to use nearly natural language queries in their
URL bar.

I know... pretty far fetched, but the low-hanging fruit is sweet enough that I
don't think you need to buy into the pie-in-the-sky possible future if this
route is taken. It's handy enough to be able to have multiple arguments for
custom keyword queries, and I hope easy enough to do it in a future-friendly
fashion with potentially extensible syntax that doesn't require joe sixpack to
do any extra work when making "normal" bookmarks.

-matt

m_mozilla

Reporter

Comment 17

•

23 years ago

fixing this bug takes care of the following (and more):

    bug 124240
    bug 98749
    bug 123006

-matt

Ben Goodger (use ben at mozilla dot org for email)

Updated

•

23 years ago

Status: NEW → ASSIGNED

Target Milestone: --- → Future

m_mozilla

Reporter

Updated

•

22 years ago

Depends on: 124240

Alex Stewart

Updated

•

22 years ago

Blocks: 123006

Alex Stewart

Updated

•

22 years ago

No longer blocks: 123006

Alex Stewart

Updated

•

22 years ago

Blocks: 123006

Daniel Wang

Comment 18

•

22 years ago

*** Bug 98749 has been marked as a duplicate of this bug. ***

Josh

Comment 19

•

21 years ago

Attached file Bookmarklet to generate other bookmarklets which recognize multiple keyword arguments

I've written a bookmarklet ("mmab" -> Make Multiple Argument Bookmarklet) that
does the job for me.  Given a reference URL containing %s placeholders and a
label, it constructs a new bookmarklet which will accept multiple arguments and
substitute them where each %s occurs in the reference URL.  In the main browser
window it outputs a link to the new bookmarklet with the given label, suitable
for bookmarking and assigning a keyword.

If mmab is invoked with no arguments (i.e. by clicking or by keyword alone), it
uses prompt() to ask for the URL and label. Alternatively, it can be assigned a keyword
and invoked from the location bar like this:
     mmab http://finance.yahoo.com/q?s=%s+%s&d=t Yahoo! Quotes

There is no support for explicitly numbered parameters ($1, $2, etc.). The
substitution order is simply "argument n -> placeholder n".

(Note: remove the newlines in the attachment, or set the preference
editor.singleLine.pasteNewlines = 3 in about:config.)
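The "argument n -> placeholder n" rule is easy to state in code. This Python sketch is not the attached bookmarklet (which is JavaScript); it assumes arguments are whitespace-separated, extra arguments are ignored, and a placeholder with no matching argument becomes an empty string. `fill_placeholders` is a hypothetical name.

```python
def fill_placeholders(url, query):
    """Replace the n-th %s in url with the n-th whitespace-separated argument."""
    args = iter(query.split())
    pieces = url.split("%s")
    out = [pieces[0]]
    for piece in pieces[1:]:
        # Assumption: a placeholder with no matching argument becomes "".
        out.append(next(args, ""))
        out.append(piece)
    return "".join(out)
```

For example, `fill_placeholders("http://finance.yahoo.com/q?s=%s+%s&d=t", "aapl msft")` gives http://finance.yahoo.com/q?s=aapl+msft&d=t.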

Jesse Ruderman

Comment 20

•

20 years ago

See also bug 236097, the same bug for Firefox.

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Browser → Seamonkey

Mark Smith

Comment 21

•

18 years ago

I think one part that's missing here is how to specify defaults for the parameters, which I think is the next most common use case. For example, when looking at a stock, you may want the period to default to 2 days, overriding it only if it has been supplied. This could be done with syntax such as:
http://www.stock.com/compare.hml?stock1=%q1;&stock2=%q2;&period=%q3=2d;
This relies on tokens being terminated with ';' and on '=' not being a digit. Alternatively, we could reuse % as a delimiter between these sections (parameters) of a token, so it would be:

%q<index>%<default>%<modifiers>;

such as:
%q3%d2%s;
which would mean the 3rd argument, defaulting to "d2" if not given, with any whitespace stripped (picking an arbitrary meaning for the 's' modifier).
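A minimal Python sketch of the first form (`%q<n>=<default>;`), assuming whitespace-separated arguments and that a token with neither an argument nor a default expands to an empty string. The `%`-delimited variant and its modifiers are left out, and `expand_with_defaults` is a hypothetical name.

```python
import re

# %q<n>; or %q<n>=<default>; where the default runs up to the closing ';'.
TOKEN = re.compile(r"%q(\d+)(?:=([^;]*))?;")


def expand_with_defaults(template, query):
    args = query.split()

    def replace(match):
        index = int(match.group(1))
        default = match.group(2) or ""  # no default given -> empty string
        return args[index - 1] if 1 <= index <= len(args) else default

    return TOKEN.sub(replace, template)
```

With the stock.com template above, "aapl msft" yields period=2d, while "aapl msft 5y" overrides it with period=5y.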

Lastly, I'm not sure what the right answer is for treating something with spaces in it as a single parameter. It's been suggested to enclose it in double quotes, but what if you want the quotes included, such as when doing a Google search for a phrase? Something to think about, I guess.
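To make the trade-off concrete, here is a small Python illustration that uses the standard `shlex` module purely as a stand-in tokenizer (nothing in this bug proposes shlex). In POSIX mode a quoted phrase becomes one argument with the quotes stripped; in non-POSIX mode the quotes stay in the argument, which is what a phrase search would want. `tokenize_query` is a hypothetical name.

```python
import shlex


def tokenize_query(query, keep_quotes=False):
    """Split a keyword query so that a double-quoted phrase is one argument.

    keep_quotes=False uses shlex's POSIX rules (quotes removed);
    keep_quotes=True uses non-POSIX rules (quotes kept in the argument).
    An unmatched quote raises ValueError in either mode.
    """
    return shlex.split(query, posix=not keep_quotes)
```

Either way a single rule can only do one of the two, so some extra syntax (or a per-token modifier) would still be needed to get both behaviours.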

Serge Gautherie (:sgautherie)

Updated

•

16 years ago

Assignee: bugs → nobody

Status: ASSIGNED → NEW

QA Contact: claudius → bookmarks

Target Milestone: Future → ---

Serge Gautherie (:sgautherie)

Updated

•

14 years ago

Depends on: 236097

Phoenix

Comment 22

•

12 years ago

Still a valid RFE.

Summary: [RFE] (w/ code) custom keywords should support multiple substitutions → Custom keywords should support multiple substitutions (w/ code)

Whiteboard: [2012 Fall Equinox]

(corrected) perl code to use as starting point for C code to implement RFE 23 years ago m_mozilla 4.09 KB, text/plain
(correct and tested) perl code to use as starting point 23 years ago m_mozilla 3.35 KB, text/plain
improved perl code to use as a starting point 23 years ago m_mozilla 2.90 KB, text/plain
new perl: more efficient, less forgiving WRT malformed tokens, fewer comments 23 years ago m_mozilla 1.42 KB, text/plain
bug fix to above 23 years ago m_mozilla 1.43 KB, text/plain
now entirely unforgiving of malformed tokens 23 years ago m_mozilla 1.44 KB, text/plain
bugfixed code with tests included 23 years ago m_mozilla 18.87 KB, text/plain
Bookmarklet to generate other bookmarklets which recognize multiple keyword arguments 21 years ago Josh 1.19 KB, text/plain