Closed Bug 1657852 Opened 4 years ago Closed 4 years ago

Validate deleted_pings_after_quota_hit and pending_pings_directory_size metrics

Categories

(Data Platform and Tools :: Glean: SDK, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: brizental, Assigned: brizental)

Attachments

(1 file)

Validate that both these metrics that were added in Bug 1601550 are being reported as expected.

Type: defect → task
Depends on: 1601550
Priority: -- → P3
Whiteboard: [telemetry:glean-rs:m?]
Assignee: nobody → brizental
Priority: P3 → P2
Whiteboard: [telemetry:glean-rs:m?]

Alright, so I made two queries to try and validate this:

  1. Monitor pending ping directory size

This query does validate that pending_pings_directory_size is working correctly; please correct me if you disagree. That said, the very sharp spike around 10MB makes me wonder if we should follow what desktop does and trim overflowing pending pings directories down to 90% of the quota, so that we have room to store more pings and don't keep hitting the quota over and over.

  2. Number of deleted pings vs. Max recorded pending pings directory size

This query is my attempt at validating deleted_pings_after_quota_hit by relating it to the directory size: the larger the directory, the more deleted pings we should see. That is the trend we observe in this plot, so I am confident that the metric works as expected. We also see pings being deleted before the size quota is hit, which is probably due to the latest feature added by [:janerik]: a quota not only on size but also on the number of pings.
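To illustrate why deletions can show up before the size quota is reached, here is a minimal sketch of a combined size and count quota. It is not the Glean implementation: the `PendingPing` type, the `enforce_quota` function, the newest-first ordering and the limits passed in are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class PendingPing:
    name: str
    size_bytes: int
    modified: int  # hypothetical timestamp; a larger value means a newer ping


def enforce_quota(pings, max_bytes, max_count):
    """Apply both limits, keeping the newest pings first.

    Returns (kept, deleted). A ping is deleted as soon as either the
    accumulated size exceeds max_bytes or max_count pings are already kept.
    """
    kept, deleted = [], []
    total = 0
    for ping in sorted(pings, key=lambda p: p.modified, reverse=True):
        total += ping.size_bytes
        if total > max_bytes or len(kept) >= max_count:
            deleted.append(ping)
        else:
            kept.append(ping)
    return kept, deleted
```

With 300 pings of 1 KiB each, a 10 MiB size limit and an illustrative limit of 250 files, this sketch deletes the 50 oldest pings even though the directory only holds about 300 KiB, which is the kind of "deleted before the quota is hit" data point seen in the plot.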

There are A LOT of pings getting deleted by this feature though, and I am not sure whether that is an issue. It could be related to Bug 1665041.

Regarding this comment and Bug 1665653: we are indeed reaching the 10MB limit. Even if you uncomment line 17 of query number 2 and only get clients on version 32.4.0, we still see clients that have reached the 10MB quota. We should probably monitor this query and see if that number drops to zero at some point, but overall I believe we should keep that limit for the time being.

My conclusion is that the metrics work and we should keep an eye on them by adding these plots to the error monitoring dashboard.

Do you agree with my conclusions [:janerik], [:Dexter]?

Flags: needinfo?(jrediger)
Flags: needinfo?(alessio.placitelli)

(In reply to Beatriz Rizental from comment #2)

Alright, so I made two queries to try and validate this:

  1. Monitor pending ping directory size

This query does validate that pending_pings_directory_size is working correctly; please correct me if you disagree. That said, the very sharp spike around 10MB makes me wonder if we should follow what desktop does and trim overflowing pending pings directories down to 90% of the quota, so that we have room to store more pings and don't keep hitting the quota over and over.

I believe this makes sense, but let's see what Jan-Erik thinks.

My conclusion is that the metrics work and we should keep an eye on them by adding these plots to the error monitoring dashboard.

Do you agree with my conclusions [:janerik], [:Dexter]?

Yup :)

Flags: needinfo?(alessio.placitelli)

(In reply to Beatriz Rizental from comment #2)

Alright, so I made two queries to try and validate this:

  1. Monitor pending ping directory size

This query does validate that pending_pings_directory_size is working correctly; please correct me if you disagree. That said, the very sharp spike around 10MB makes me wonder if we should follow what desktop does and trim overflowing pending pings directories down to 90% of the quota, so that we have room to store more pings and don't keep hitting the quota over and over.

We currently just delete everything that puts us over this quota, right?
Then yes, ensuring some more space between what's left and the quota seems sensible.
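As a rough illustration of why the extra headroom helps, the sketch below counts how often the quota is hit when the directory is trimmed to a given percentage of the quota. It is a simplified model, not Glean code: the `count_quota_hits` function, the oldest-first eviction and the toy sizes are assumptions for the example.

```python
from collections import deque


def count_quota_hits(ping_sizes, quota_bytes, trim_percent):
    """Count how many times the quota is exceeded while storing pings in order.

    Whenever the directory grows past quota_bytes, the oldest pings are
    dropped until the total is at or below trim_percent of the quota.
    """
    target = quota_bytes * trim_percent // 100
    stored = deque()
    total = 0
    hits = 0
    for size in ping_sizes:
        stored.append(size)
        total += size
        if total > quota_bytes:
            hits += 1
            # Evict oldest pings until we are back under the trim target.
            while stored and total > target:
                total -= stored.popleft()
    return hits
```

In this toy model, storing 30 pings of 10 bytes each against a 100-byte quota hits the quota 20 times when trimming only back to the quota (`trim_percent=100`), and 10 times when trimming to 90% (`trim_percent=90`).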

  2. Number of deleted pings vs. Max recorded pending pings directory size

This query is my attempt at validating deleted_pings_after_quota_hit by relating it to the directory size: the larger the directory, the more deleted pings we should see. That is the trend we observe in this plot, so I am confident that the metric works as expected. We also see pings being deleted before the size quota is hit, which is probably due to the latest feature added by [:janerik]: a quota not only on size but also on the number of pings.

There are A LOT of pings getting deleted by this feature though, and I am not sure whether that is an issue. It could be related to Bug 1665041.

Regarding this comment and Bug 1665653: we are indeed reaching the 10MB limit. Even if you uncomment line 17 of query number 2 and only get clients on version 32.4.0, we still see clients that have reached the 10MB quota. We should probably monitor this query and see if that number drops to zero at some point, but overall I believe we should keep that limit for the time being.

Yeah, we should keep an eye on that. Once we get clients to the new version with the "number of files" limit we should see those numbers drop.

My conclusion is that the metrics work and we should keep an eye on them by adding these plots to the error monitoring dashboard.

Do you agree with my conclusions [:janerik], [:Dexter]?

Yup, agree. Please add it to the dashboard.

Flags: needinfo?(jrediger)
See Also: → 1668306

Ok, I opened Bug 1668306 for lowering the directory size to 90% of the quota and I added the queries to the error dashboard.

You'll notice that there is no data for Fenix Stable on that yet, but I put the queries there anyways because soon there will be.

Finally, I opened Bug 1668312 for updating these queries when they break due to the memory unit fix, which I made here. Since these queries are on the monitoring dashboard, we will see as soon as the corrected data starts flowing in, and then I can fix the queries right away. Actually, while writing this I realized I can just fix them before they break, so that is what I'll do. Either way, we can close this bug for now :)

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED