Closed Bug 1779267 Opened 3 years ago Closed 3 years ago

[Search terms data collection] Duplicate sequence numbers

Categories

(Firefox :: Address Bar, defect, P1)

defect

Tracking

()

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: rburwei, Assigned: adw)

References

Details

Attachments

(1 file)

I am seeing the same Sequence Number assigned to different search terms with different timestamps in the same Session ID.

Running this SQL will show you some examples:

WITH
  query_logs AS (
  SELECT
    timestamp as merino_timestamp,
    jsonPayload.fields.*
  FROM
    `suggest-searches-prod-a30f.logs.stdout`
  WHERE
    jsonPayload.type = "web.suggest.request"
    AND jsonPayload.fields.session_id IS NOT NULL ),
  ordered_queries AS (
  SELECT
    merino_timestamp,
    query,
    session_id,
    sequence_no,
    sequence_no - LAG(sequence_no) OVER(PARTITION BY session_id ORDER BY sequence_no) AS sequence_no_diff1,
    sequence_no - LEAD(sequence_no) OVER(PARTITION BY session_id ORDER BY sequence_no) AS sequence_no_diff2
  FROM
    query_logs
  ORDER BY
    session_id,
    sequence_no )
SELECT
  session_id,
  sequence_no,
  merino_timestamp,
  query,
FROM
  ordered_queries
WHERE
  (sequence_no_diff1 NOT IN (1,
      -1)
    AND sequence_no_diff1 IS NOT NULL)
  OR (sequence_no_diff2 NOT IN (1,
      -1)
    AND sequence_no_diff2 IS NOT NULL)
ORDER BY
  1,
  2,
  3,
  4

The severity field is not set for this bug.
:adw, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(adw)

This is probably due to my decision to increment the sequence number when Firefox receives a response from Merino instead of when it sends the request. If a request happens to reach Merino after Firefox cancels it or after 200ms, Firefox will reuse the number for the next request. The drawback of that decision is possible duplicate numbers, but the benefit is there shouldn't be any gaps in the numbers. I think we have to live with one or the other -- occasional reused numbers or gaps. Would gaps be better?

Severity: -- → S3
Flags: needinfo?(adw) → needinfo?(rburwei)
Priority: -- → P3

Drew, could you clarify why there would be gaps if we made the change? I am leaning towards gaps, but want to understand the consequences. Would the change need to ride the trains?

Adding Dave in case he has input.

Flags: needinfo?(rburwei) → needinfo?(adw)

The reason we'll have to live with one or the other has to do with the window of time after Firefox sends a request to Merino and before it receives a response. If the user types another character during that window, Firefox will send another request. It will first attempt to cancel the previous request, but by that point the request may already be on its way to Merino or have even already reached it. Firefox needs to choose a number for the new request but it can't know if Merino will receive the previous one.

So for the new request's sequence number, there's a matrix of possible outcomes when considering (a) whether the previous request reached Merino and (b) the time at which sequence numbers are incremented:

Previous request reached Merino Previous request did not reach Merino
Increment on request sent No gap or duplicate Gap
Increment on response received Duplicate No gap or duplicate

So for example looking at the upper-right quadrant, if Firefox increments sequence numbers at the time it sends requests and the previous request did not reach Merino, then Merino will see a gap when it receives the new request. We might expect that to happen often for users who type quickly. My hunch is that gaps would happen more often than duplicates, but I don't know. Sorry for not making this clearer earlier.

Would the change need to ride the trains?

Yes, 105 is the earliest version we could make a change.

Flags: needinfo?(adw)

Thanks Drew for the information!

We see duplicates in ~15% of search terms and ~35% of search sessions. @Dave, because of the high percentages here, I'm inclined to switch to gaps (increment on request sent) even though this will push out the date we start receiving this data from the Release channel. What do you think?

Flags: needinfo?(dzeber)

I would favour gaps as well. The main use case for the sequence number is event ordering, so gaps would not inherently be an issue, and having unique numbers within a session would be preferable to having duplicates. As a side effect, IIUC seeing a gap would also provide an indication that we are missing a request out of that session eg. due to fast typing (although not sure that this is useful at the moment).

Flags: needinfo?(dzeber)

Sounds good, sorry again for doing the wrong thing here.

Assignee: nobody → adw
Severity: S3 → S4
Status: NEW → ASSIGNED
Depends on: 1776513
Priority: P3 → P1

Something I failed to consider with the initial implementation is that with duplicate sequence numbers we would not be able to use them in conjunction with the session_id as a replacement for the request _id. For that we would need to guarantee that session_id + sequence_no is unique across requests.

@Drew -- no worries! Thanks for your help. I appreciate your leadership. Seeing as we rarely go back on decisions, I think you're making the right call much of the time.

Pushed by dwillcoxon@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/31f619af801f Increment the Merino sequence number on each request instead of each response. r=daisuke
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: