Closed Bug 764856 Opened 12 years ago Closed 12 years ago

GCLI should allow fuzzy matching of commands

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Future

People

(Reporter: jwalker, Unassigned)

Details

Attachments

(3 files, 1 obsolete file)

faster spell-checking patch 12 years ago Quentin Pradet [:quentinp] 8.22 KB, patch		Details \| Diff \| Splinter Review
v2 12 years ago Joe Walker [:jwalker] 11.22 KB, patch		Details \| Diff \| Splinter Review
Profile for the new Damerau-Levenshtein algorithm. 12 years ago Quentin Pradet [:quentinp] 151.27 KB, text/html		Details
Profile for the old version 12 years ago Quentin Pradet [:quentinp] 151.12 KB, text/html		Details

Joe Walker [:jwalker]

Reporter

Description

•

12 years ago

GCLI has a spell module, but it's disabled for being slow.
Example:

>> hlp -> help


See also bug 720689
See also http://stackoverflow.com/questions/5859561/getting-the-closest-string-match#answer-5859823

Joe Walker [:jwalker]

Reporter

Comment 1

•

12 years ago

Also GCLI has a 'Speller' module that was used, but is now disabled because it was too slow.

(Triage)

Whiteboard: [good first bug][mentor=jwalker]

Quentin Pradet [:quentinp]

Comment 2

•

12 years ago

Attached patch faster spell-checking patch (obsolete) — Details — Splinter Review

It turns out that the Speller module uses Peter Norvig's spellchecking algorithm where the vocabulary size is expected to be huge (~100K words). It computes all plausible edits and tries to match them with a big dictionary. This is not practical for GCLI, since the dictionary is very small. Instead, the Speller module should compute the Levenshtein distance for every possible command.

I implemented a working proof-of-concept that passes existing unit tests (I tested manually since I don't know how to run the tests). I'm not only using Levenshtein distance but Damerau-Levenshtein distance which includes swaps (eg. distance("help", "hlep") is not two but one).

I overwrote spell.js but this may be a bad idea since the patch is hard to read. I'd be happy to change anything in this patch. This is my first patch for a Mozilla product: if I did anything wrong I'll be happy to fix it.

Attachment #657691 - Flags: review?

Thaddee Tyl [:espadrine]

Comment 3

•

12 years ago

Quentin, this patch looks great.

The fuzzy matching algorithms that I've seen in the wild look something like this: <https://raw.github.com/espadrine/mozcmd/master/find.mozcmd>
This algorithm returns a list of all accepted matches, and it is a subset of all commands. On the other hand, using Levenshtein yields all commands, each with a score (that may be very bad for words that have nothing in common).

This begets the question, what is the corresponding user experience? If it provides a list of acceptable matches, a list that gets more precise as the user types, then using Levenshtein is awkward. If it auto-corrects when we type space, then we really only need the best match, and Levenshtein is the tool for the job.

On a totally different note, I'd be very interested to know a performance analysis between the old approach (spellchecking) and the new one (yours).

Also, as a nit: the < operator doesn't order lexicographically, but along Unicode code points. (For instance, Abel < ant < Bob, but "Abel" < "Bob" < "ant".) 

Finally, a very rare edge-case occurs when you're computing with "__proto__" as a candidate. Unlikely to happen, but we never know. Can you use a map instead of an object for |distance|?
<https://developer.mozilla.org/en-US/docs/JavaScript/Reference/Global_Objects/Map>

Joe Walker [:jwalker]

Reporter

Comment 4

•

12 years ago

Thanks for looking at this Thaddee.

(In reply to Thaddee Tyl [:espadrine] from comment #3)
> Quentin, this patch looks great.
> 
> The fuzzy matching algorithms that I've seen in the wild look something like
> this: <https://raw.github.com/espadrine/mozcmd/master/find.mozcmd>
> This algorithm returns a list of all accepted matches, and it is a subset of
> all commands. On the other hand, using Levenshtein yields all commands, each
> with a score (that may be very bad for words that have nothing in common).
> 
> This begets the question, what is the corresponding user experience? If it
> provides a list of acceptable matches, a list that gets more precise as the
> user types, then using Levenshtein is awkward. If it auto-corrects when we
> type space, then we really only need the best match, and Levenshtein is the
> tool for the job.

Right now, it's providing a single match, but there is nothing to prevent us updating the UI to show more than one.

> On a totally different note, I'd be very interested to know a performance
> analysis between the old approach (spellchecking) and the new one (yours).
> 
> Also, as a nit: the < operator doesn't order lexicographically, but along
> Unicode code points. (For instance, Abel < ant < Bob, but "Abel" < "Bob" <
> "ant".) 

Does that cause a bug, or is it just a case of localeCompare being better in general?

> Finally, a very rare edge-case occurs when you're computing with "__proto__"
> as a candidate. Unlikely to happen, but we never know. Can you use a map
> instead of an object for |distance|?
> <https://developer.mozilla.org/en-US/docs/JavaScript/Reference/
> Global_Objects/Map>

Agreed, except that GCLI works in web browsers where Map isn't available yet.

I'd like to accept this patch because it works (quite nicely) now, but I'm not against us looking at presenting the user with multiple matches in a followup.

Joe Walker [:jwalker]

Reporter

Comment 5

•

12 years ago

Attached patch v2 — Details — Splinter Review

Several of the tests that checked for predictions were broken buy the original patch, this fixes them.
Also hidden commands were considered for completion. They are now correctly hidden.
Finally the spell module was tidied up to remove the need for a constructor, to fix some indents, etc.

Attachment #657691 - Attachment is obsolete: true

Attachment #657691 - Flags: review?

Thaddee Tyl [:espadrine]

Comment 6

•

12 years ago

(In reply to Joe Walker [:jwalker] from comment #4)
> > Also, as a nit: the < operator doesn't order lexicographically, but along
> > Unicode code points. (For instance, Abel < ant < Bob, but "Abel" < "Bob" <
> > "ant".) 
> 
> Does that cause a bug, or is it just a case of localeCompare being better in
> general?

It's not a bug, especially in this case where it's only used for strings with the same score. That's just me being pedant about alphabetical order ☺
`localeCompare` still gives us ASCIIbetical order either, though.

(More at http://en.wikipedia.org/wiki/ASCIIbetical_order#Order)

> I'd like to accept this patch because it works (quite nicely) now, but I'm
> not against us looking at presenting the user with multiple matches in a
> followup.

That sounds great.

Quentin Pradet [:quentinp]

Comment 7

•

12 years ago

Attached file Profile for the new Damerau-Levenshtein algorithm. — Details

(In reply to Thaddee Tyl [:espadrine] from comment #3)
> Quentin, this patch looks great.
> 
> This begets the question, what is the corresponding user experience? If it
> provides a list of acceptable matches, a list that gets more precise as the
> user types, then using Levenshtein is awkward. If it auto-corrects when we
> type space, then we really only need the best match, and Levenshtein is the
> tool for the job.

I assumed that Levenshtein was only to be used for spellchecking, eg. when nothing else worked. It gives a nice edit distance and the way it works is well understood and is said to match the kind of typos humans do.

If you want to show multiple commands at the same time, the current algorithm can stay the same: when matching, show possible matches, otherwise, show the possible matches. Lowering the maximum edit distance would avoid showing some of the weirdest matches. However, multiple matches are not going to be needed until GCLI has way more commands. But it can make sense to optimize for a lot of commands right now.

> On a totally different note, I'd be very interested to know a performance
> analysis between the old approach (spellchecking) and the new one (yours).

I used Firebug (the only tool I know for this), and "benchmarked" by typing a few wrong words and asking for autocomplete. I tested the old version (norvig-spellcheck.html), and Joe Walker's version of the patch (levenshtein.html).

You can look at the fully detailed profile logs if you like, but what's important is that the old version spent more than 97% of CPU time in spell.js and the new version about 7.80% (3.57% in the actual edit distance code). It's no longer the main bottleneck.

> Also, as a nit: the < operator doesn't order lexicographically, but along
> Unicode code points. (For instance, Abel < ant < Bob, but "Abel" < "Bob" <
> "ant".) 

Maybe the patch could say "in the ASCIIbetical order"? What's important here is to have an uniform behavior.

Quentin Pradet [:quentinp]

Comment 8

•

12 years ago

Attached file Profile for the old version — Details

Quentin Pradet [:quentinp]

Updated

•

12 years ago

Attachment #658616 - Attachment mime type: text/plain → text/html

Quentin Pradet [:quentinp]

Updated

•

12 years ago

Attachment #658614 - Attachment mime type: text/plain → text/html

Joe Walker [:jwalker]

Reporter

Comment 9

•

12 years ago

https://tbpl.mozilla.org/?tree=Try&rev=b02fbf6363d7
https://tbpl.mozilla.org/?tree=Try&rev=2042d268a26b

Joe Walker [:jwalker]

Reporter

Comment 10

•

12 years ago

https://tbpl.mozilla.org/?tree=Fx-Team&rev=aaa9837d68b3

Whiteboard: [good first bug][mentor=jwalker] → [fixed-in-fx-team]

Tim Taubert [:ttaubert] (inactive)

Comment 11

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/dea1d1224107

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Whiteboard: [fixed-in-fx-team]

BMO Automation

Updated

•

6 years ago

Product: Firefox → DevTools

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

GCLI should allow fuzzy matching of commands

Categories

(DevTools :: Console, defect, P2)

Tracking

(Not tracked)

People

(Reporter: jwalker, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(3 files, 1 obsolete file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Updated

Comment 9

Comment 10

Comment 11

Updated

Attachment

General

Description

File Name

Content Type