Open Bug 1334630 Opened 7 years ago Updated 4 years ago

Measure the edit distance when selecting a search suggestion

Tracking

()

Status:

NEW

Tracking Flags:

Tracking

Status

firefox54

---

affected

People

(Reporter: past, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxsearch])

Attachments

(1 file)

Bug 1334630 - Measure the edit distance when selecting a search suggestion. 7 years ago Drew Willcoxon :adw 59 bytes, text/x-review-board-request	dzeber : feedback+	Details

Panos Astithas (he/him) [:past] (please ni?)

Reporter

Description

•

7 years ago

When the user selects a search suggestion, we want to measure the edit distance (Levenshtein distance) between the selected search suggestion and the typed string. We want the same measurement for the search bar as well.

Georg Fritzsche [:gfritzsche]

Comment 1

•

7 years ago

This could be implemented using an exponential histogram.

Drew Willcoxon :adw

Updated

•

7 years ago

Assignee: nobody → adw

Status: NEW → ASSIGNED

Panos Astithas (he/him) [:past] (please ni?)

Reporter

Updated

•

7 years ago

Priority: -- → P1

Dave Zeber [:dzeber]

Comment 2

•

7 years ago

This is an interesting metric, and I want to make sure we choose an optimal binning that maximizes the interpretability of the collected data.

I've been thinking about what the edit distance would look like for search suggestions. Levenshtein distance is at least the difference in string lengths, and at most the length of the longer string. Some example use cases for search suggestions:

- suggestion is a spelling-corrected version of the typed query (distance is <= 5? in absolute terms)
- suggestion is the completion of a longer phrase from a shorter typed prefix (distance is larger (10-100?), depends on length of suggestion string)

This could be captured in an exponential histogram, which has more granular buckets for smaller values. We'd want to make sure 0 is its own bucket, since this is an interesting special case.

An alternative scheme would be to normalize the edit distance by the length of the longer string, and report the value as a percentage. This lets us use a linear histogram with potentially more granularity. However, then we lose the absolute value of the metric.

Ideally, we'd record the pair of measurements (query length, edit distance from suggestion). For example, we could key the edit distance histogram by binned query length (short/medium/long?). However, this might be overkill.

Any thoughts?

Panos Astithas (he/him) [:past] (please ni?)

Reporter

Comment 3

•

7 years ago

If we'd record the pair of measurements, would we need both this and bug 1334611?

Dave Zeber [:dzeber]

Comment 4

•

7 years ago

(In reply to Panos Astithas [:past] from comment #3)
> If we'd record the pair of measurements, would we need both this and bug
> 1334611?

Yes, since the measurements here would only apply to cases where a search suggestion was selected.

That's why I think that if we go with this keyed approach, at least for now, we don't need to worry about getting too granular with the query lengths. I was thinking of this mainly as a way to add more context to the edit distance metric.

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/121602/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/121602/

Drew Willcoxon :adw

Comment 6

•

7 years ago

Comment on attachment 8848707 [details]
Bug 1334630 - Measure the edit distance when selecting a search suggestion.

Dave, could you please give me feedback on this patch?  In particular I need help choosing bucket sizes, etc.  I'll describe what it does:

* Adds a new FX_SEARCH_SUGGESTION_EDIT_DISTANCE histogram:

  "FX_SEARCH_SUGGESTION_EDIT_DISTANCE": {
    "expires_in_version": "58",
    "kind": "exponential",
    "keyed": true,
    "high": 200,
    "n_buckets": 20,
    ...
  }

Are the high and n_buckets values OK?  200 seems way high but also a not-too-unreasonable upper bound.

According to the telemetry docs and my experience testing this patch, we'll get a default low value of 1 with this, which means that we will get a separate zero bucket (which would also capture negative values if we had any, but we shouldn't of course), as you suggested.

* I chose some keys based on the last paragraph in your comment 2.  The keys are:

    "6" => queries of length  (0,  6]
   "12" => queries of length  (6, 12]
   "24" => queries of length (12, 24]
   "48" => queries of length (24, 48]
  ">48" => queries of length > 48

Is this too many keys, the right mix, etc.?

Attachment #8848707 - Flags: feedback?(dzeber)

Panos Astithas (he/him) [:past] (please ni?)

Reporter

Updated

•

7 years ago

Priority: P1 → P3

Dave Zeber [:dzeber]

Comment 7

•

7 years ago

Comment on attachment 8848707 [details]
Bug 1334630 - Measure the edit distance when selecting a search suggestion.

For the keys, how about we do:

"short" => queries of length  (0,  3]
"medium" => queries of length  (3, 10]
"long" => queries of length > 10

I think three buckets for query length is sufficient for now, and we can revisit this if we decide this metric should become more central to evaluating search suggestions. I'm especially interested in separating out the case where a search suggestion completes a short prefix of a few characters from cases involving longer typed queries (eg. a spelling correction). I think 3 is a good cutoff for the prefix completion case, and the second threshold of 10 is somewhat more arbitrary.

For the edit distance buckets, a "high" of 200 is fine. Edit distances larger than this are likely an edge case, as this means that the suggestion modifies/adds/removes 200 characters relative to the typed query.

What are the bucket ranges when setting "n_buckets": 20? I suggest choosing an "n_buckets" large enough that values of 1 to 5 each get assigned to separate buckets, and the remaining bucket ranges will be granular enough for our purposes.

Attachment #8848707 - Flags: feedback?(dzeber) → feedback+

Drew Willcoxon :adw

Comment 8

•

7 years ago

Unassigning myself from non-P1 search bugs for now so I have more time for Photon bugs.

Assignee: adw → nobody

Status: ASSIGNED → NEW

Mark Banner (:standard8)

Updated

•

4 years ago

Severity: normal → N/A

Type: defect → task

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Measure the edit distance when selecting a search suggestion

Categories

(Firefox :: Address Bar, task, P3)

Tracking

()

People

(Reporter: past, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxsearch])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Updated

Updated

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Updated

Comment 7

Comment 8

Updated

Attachment

General

Description

File Name

Content Type