Re-establish the dataflow_watermark_age dashboard
Categories
(Data Platform and Tools :: General, defect)
Tracking
(Not tracked)
People
(Reporter: klukas, Assigned: kik)
References
Details
(Whiteboard: [dataquality])
The pipeline latency sum query (https://sql.telemetry.mozilla.org/queries/69304/source#174837) gives: Access Denied: Table benwu-test-1:monitoring.dataflow_watermark_age: User does not have permission to query table benwu-test-1:monitoring.dataflow_watermark_age
I don't know if any of us have enough context to rebuild what benwu had in that project.
Comment 1•3 years ago (Assignee)
Started having a look. Interestingly, the permissions error I see in Redash does not seem to be caused by my GCP account lacking access: I can open the tables used in the query via the BigQuery UI, and I can execute the very same query without errors.
table: benwu-test-1:monitoring.pubsub_subscript_oldest_unacked
last_updated: Aug 9, 2021, 11:00:10 PM UTC+2
example row from the table:
project_id subscription_id timestamp value
moz-fx-data-beam-prod-11f7 telemetry-aet.decoder 2020-08-04 23:20:00 UTC 0.0
SELECT distinct project_id FROM `benwu-test-1.monitoring.pubsub_subscript_oldest_unacked` LIMIT 1000
results in the following two rows:
moz-fx-data-beam-prod-11f7
moz-fx-data-ingesti-prod-d59c
table: benwu-test-1:monitoring.dataflow_watermark_age
last_updated: Aug 9, 2021, 11:00:13 PM UTC+2
example row from the table:
job_name project_id region timestamp value
telemetry-raw_republisher_060dd0c_2 moz-fx-data-beam-prod-11f7 us-west1 2020-11-08 04:30:00 UTC 15.5
SELECT distinct project_id FROM `benwu-test-1.monitoring.dataflow_watermark_age` LIMIT 1000
returns only one row:
moz-fx-data-beam-prod-11f7
Looking at the last_updated values for both tables, it is clear that the processes updating them have stopped. My assumption is that they stopped when Benwu's account got suspended.
The next step is to look at the GCP project identified so far to see if there are any existing artifacts left that could help identify how those tables were populated in the past.
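The staleness observation can be made explicit with a small check. This is a minimal sketch, not part of the original investigation: the helper name `stale_tables` and the one-hour `max_age` threshold are hypothetical, and the timestamps are the last_updated values quoted above (shown by the BigQuery UI in UTC+2).

```python
from datetime import datetime, timedelta, timezone

# last_updated values quoted in this comment; the BigQuery UI showed them in UTC+2.
UI_TZ = timezone(timedelta(hours=2))
LAST_UPDATED = {
    "benwu-test-1.monitoring.pubsub_subscript_oldest_unacked":
        datetime(2021, 8, 9, 23, 0, 10, tzinfo=UI_TZ),
    "benwu-test-1.monitoring.dataflow_watermark_age":
        datetime(2021, 8, 9, 23, 0, 13, tzinfo=UI_TZ),
}


def stale_tables(last_updated, now, max_age=timedelta(hours=1)):
    """Return {table: age} for tables last updated more than max_age before now.

    max_age is a hypothetical threshold, not a value taken from the bug.
    """
    return {
        table: now - updated
        for table, updated in last_updated.items()
        if now - updated > max_age
    }


if __name__ == "__main__":
    for table, age in stale_tables(LAST_UPDATED, datetime.now(timezone.utc)).items():
        print(f"{table} has not been updated for {age.days} days")
```

Both timestamps fall within a few seconds of each other, which is consistent with a single scheduled process having written both tables.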
Comment 2•3 years ago (Assignee)
Two random rows returned by the query in Redash (I only changed the time interval to the last 2 years, since the tables are no longer updating):
timestamp value component
2021-08-09 21:00:00 UTC 149.0 bq_sink_payload_sub_unacked
2021-08-09 21:00:00 UTC 655.4 bq_sink_loader_sub_unacked
Comment 3•3 years ago (Assignee)
Created a JIRA ticket as requested by George: https://mozilla-hub.atlassian.net/browse/DENG-78
Comment 4•3 years ago
Kik, can you outline what the expected outcome would be of fixing this? That would inform some conversation on how important it is, which we can discuss during a future working group meeting.
Comment 5•3 years ago (Assignee)
Honestly, I never really got much background on this. Revisiting it, the original Redash query appears to have displayed some sort of latency information for Pub/Sub topics in project moz-fx-data-beam-prod-11f7:
https://console.cloud.google.com/cloudpubsub/subscription/list?project=moz-fx-data-beam-prod-11f7
Of the subscription_ids that the query specifically filters for, only the following still seems to exist:
- telemetry-raw.decoder
Based on the comments contained in the query, the dashboard was meant to display the following two pieces of information:
- Dataflow watermark age (I'm actually not sure what this refers to)
- Oldest unacknowledged message for the subscription_ids filtered for.
In terms of importance, I'd say rather low. This appears to have been broken since before I joined Mozilla (at least 10 months ago), and I don't recall anyone ever mentioning it. It is not clear to me whether this is actually useful, nor do I know who I could ask about it.
Maybe it would be worth asking about it during the next DPIWG meeting to see if it should just be dropped.
The biggest problem with just "reproducing" this is that I can't find the original logic used to create benwu-test-1.monitoring.pubsub_subscript_oldest_unacked and benwu-test-1.monitoring.dataflow_watermark_age.
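If this is ever rebuilt from scratch, one possible starting point is Cloud Monitoring. This is a sketch under an assumption, not the recovered logic: the columns of the two tables (project_id, subscription_id; job_name, project_id, region) resemble the resource labels of the Cloud Monitoring metrics named below, which suggests, but does not prove, that the tables were exports of those metrics. The helper `monitoring_filter` is hypothetical and only builds the filter strings that a time series query would take.

```python
# Metric types published by Cloud Monitoring. Using them here is an assumption,
# not the original logic behind the benwu-test-1.monitoring tables.
OLDEST_UNACKED = "pubsub.googleapis.com/subscription/oldest_unacked_message_age"
WATERMARK_AGE = "dataflow.googleapis.com/job/data_watermark_age"


def monitoring_filter(metric_type, **resource_labels):
    """Build a Cloud Monitoring filter for one metric and optional resource labels."""
    clauses = [f'metric.type = "{metric_type}"']
    # Sort the labels so the resulting filter string is deterministic.
    for label, value in sorted(resource_labels.items()):
        clauses.append(f'resource.labels.{label} = "{value}"')
    return " AND ".join(clauses)


# The one subscription from the query that still seems to exist.
unacked_filter = monitoring_filter(OLDEST_UNACKED, subscription_id="telemetry-raw.decoder")
# Region taken from the example row of dataflow_watermark_age.
watermark_filter = monitoring_filter(WATERMARK_AGE, region="us-west1")

if __name__ == "__main__":
    print(unacked_filter)
    print(watermark_filter)
```

Filters like these would be run against project moz-fx-data-beam-prod-11f7, with the results written to a new table that the Redash query can read.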
Comment 6•3 years ago
Agreed. Let's close this out. IIRC this was more useful than the Grafana dashboards, but like you said we haven't had it in a looong time. If we decide we want something similar we can start from scratch as a new task.
Comment 7•3 years ago (Assignee)
Thanks!