Closed Bug 1612945 Opened 6 years ago Closed 6 years ago

Main ping contains missing rows between decoded and live tables on 2020-01-22

Categories

(Data Platform and Tools :: General, defect, P1)

defect
Points:
1

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: amiyaguchi, Assigned: klukas)

Details

(Whiteboard: [dataquality])

According to the "pings decoded and in live tables (main/event)" dashboard, there is a 0.08% difference between the row counts of the decoded and live tables for the main ping. Other dates, such as 2020-01-30, also show a significant number of missing rows.

There should be little to no difference in row counts between the decoded and live tables.

When we deduplicate by document_id, there is no difference between the counts. The following query returns an empty result set:

with pbd as (
  select document_id, count(*) n_pbd from
  `moz-fx-data-shared-prod.payload_bytes_decoded.telemetry_telemetry__main_v4`
  where date(submission_timestamp) = '2020-01-22'
  group by 1
),
live as (
  select document_id, count(*) n_live from
  `moz-fx-data-shared-prod.telemetry_live.main_v4`
  where date(submission_timestamp) = '2020-01-22'
  group by 1
)

-- a full outer join is required: with an inner join, n_pbd and n_live
-- are never null, so the filter below could never match any row
select * from pbd full outer join live using (document_id)
where n_pbd is null or n_live is null
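The same check can be sketched outside BigQuery. Below is a minimal Python sketch, assuming each table has been reduced to a list of document_id values (one entry per row); the function name `reconcile` and the sample data are hypothetical. It reports document_ids that appear in only one table, which is what the `n_pbd is null or n_live is null` filter looks for, and separately counts the surplus rows that can explain a raw-count difference even when both tables contain the same set of document_ids.

```python
from collections import Counter


def reconcile(decoded_ids, live_ids):
    """Compare per-document_id row counts between two tables.

    decoded_ids / live_ids: hypothetical iterables of document_id values,
    one per row, standing in for the decoded and live tables.
    """
    n_pbd = Counter(decoded_ids)   # rows per document_id in the decoded table
    n_live = Counter(live_ids)     # rows per document_id in the live table
    # document_ids present on only one side: the rows the
    # "n_pbd is null or n_live is null" filter is looking for
    only_decoded = sorted(n_pbd.keys() - n_live.keys())
    only_live = sorted(n_live.keys() - n_pbd.keys())
    # rows beyond the first copy of each document_id, i.e. duplicates
    decoded_dupes = sum(n_pbd.values()) - len(n_pbd)
    live_dupes = sum(n_live.values()) - len(n_live)
    return {
        "only_decoded": only_decoded,
        "only_live": only_live,
        "decoded_duplicate_rows": decoded_dupes,
        "live_duplicate_rows": live_dupes,
    }


# Hypothetical sample: document "b" was written twice to the decoded table.
decoded = ["a", "b", "b", "c"]
live = ["a", "b", "c"]
result = reconcile(decoded, live)
print(result)
```

With this sample, neither `only_*` list has entries, yet the decoded side has one duplicate row, so the raw counts differ while the deduplicated counts agree.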

I think the only explanation here is that the payload_bytes_decoded sink read some messages from Pub/Sub twice. It would be interesting to check whether there was a deploy on this day that could have contributed.

Assignee: nobody → jklukas
Points: --- → 1
Priority: -- → P1

> there is a 0.08% difference between the counts of rows in decoded vs live tables for the main ping

> When we deduplicate by document_id, there is no difference between the counts.

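This is expected behavior.

The live sink uses the old ingestion-beam sink, which attempts to achieve exactly-once delivery but does not guarantee at-least-once delivery.

The decoded sink uses the new ingestion-sink, which guarantees at-least-once delivery and therefore produces a higher volume of duplicates: a message that is delivered more than once is written more than once, so the extra rows share a document_id and disappear when the table is deduplicated by document_id.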
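The difference between the two delivery guarantees can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions, not the actual behavior of ingestion-beam or ingestion-sink: `deliver` is a hypothetical stand-in for a Pub/Sub subscription that occasionally redelivers a message, `at_least_once_sink` writes every delivery, and `dedup_sink` idealizes exactly-once handling by dropping message ids it has already written (a real pipeline can typically only do this within a bounded window).

```python
import random


def deliver(messages, redelivery_rate, rng):
    """Yield (message_id, row) pairs; with probability redelivery_rate a
    message is delivered a second time (hypothetical Pub/Sub stand-in)."""
    for message_id, row in messages:
        yield message_id, row
        if rng.random() < redelivery_rate:
            yield message_id, row  # redelivery, e.g. after a lost or late ack


def at_least_once_sink(deliveries):
    """Write every delivery: nothing is dropped, duplicates are kept."""
    return [row for _, row in deliveries]


def dedup_sink(deliveries):
    """Skip deliveries whose message_id was already written (idealized)."""
    seen = set()
    rows = []
    for message_id, row in deliveries:
        if message_id not in seen:
            seen.add(message_id)
            rows.append(row)
    return rows


messages = [(i, {"document_id": f"doc-{i}"}) for i in range(10_000)]
decoded = at_least_once_sink(deliver(messages, 0.001, random.Random(1)))
live = dedup_sink(deliver(messages, 0.001, random.Random(1)))

distinct_decoded = {row["document_id"] for row in decoded}
print(len(decoded), len(live), len(distinct_decoded))
```

The raw row count of the at-least-once table exceeds the deduplicated one by roughly the redelivery rate, while the distinct document_id counts agree, which is the same pattern reported in this bug.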

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Component: Datasets: General → General
Whiteboard: [data-quality] → [dataquality]