Add a memory distribution metric to record the Glean database size
Categories
(Data Platform and Tools :: Glean: SDK, enhancement, P1)
Tracking
(Not tracked)
People
(Reporter: mdroettboom, Assigned: janerik)
References
(Blocks 1 open bug)
Attachments
(2 files)
2.40 KB, text/plain (chutten: data-review+)
42 bytes, text/x-github-pull-request
It's possible a bug could be introduced that would explode the Glean database size in an unbounded way. Perhaps we should collect metadata on that (or only when size grows above some service-level guarantee?)
Comment 1•4 years ago (Assignee)
It's easy to get the file size. Additionally, Rkv can tell us the number of data entries if we want that.
However, I'm struggling to find the right place to record this data.
Should this be a memory distribution?
If so:
- What exactly do we want to answer? The average size of the database we see across all users? Looking at outliers with large databases?
- What lifetime for the metric?
- When do we collect the data? Only on init, on each backgrounding? Before sending any ping? After sending a ping?
:Dexter, :mdroettboom, any input here?
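As a rough illustration of the "file size is easy" point, here is a minimal sketch in Rust using only the standard library. The function name `database_file_size` and the path passed to it are hypothetical, not taken from the Glean code base.

```rust
use std::fs;
use std::path::Path;

/// Returns the on-disk size of the file at `path` in bytes,
/// or `None` if the file is missing or its metadata can't be read.
/// (Hypothetical helper; the real database location is not shown in this bug.)
fn database_file_size(path: &Path) -> Option<u64> {
    fs::metadata(path).ok().map(|meta| meta.len())
}
```

This only asks the file system for metadata, so it does not read the database contents, but it is still I/O.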
Comment 2•4 years ago
(In reply to Jan-Erik Rediger [:janerik] from comment #1)
> Should this be a memory distribution?
Probably, if we can get a memory value out of the data entries you linked.
> If so:
> - What exactly do we want to answer? The average size of the database we see across all users? Looking at outliers with large databases?
"Does a significant part of our users have a bloated database?" / "Do we need to find a solution to empty the database SOON?"
> - What lifetime for the metric?
I believe "ping" lifetime on the "metrics" ping is fine.
> - When do we collect the data? Only on init, on each backgrounding? Before sending any ping? After sending a ping?
When is the data available? Is there any cost involved in querying the data entries?
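To make the "memory distribution" idea concrete: such a metric typically sorts byte samples into exponentially growing buckets, so both small and very large databases show up in one histogram. The sketch below is a toy model under assumed parameters (base 2, 16 buckets per doubling). The constants and function names are illustrative and not taken from this bug.

```rust
use std::collections::BTreeMap;

// Illustrative parameters (assumptions, not taken from this bug).
const LOG_BASE: f64 = 2.0;
const BUCKETS_PER_MAGNITUDE: f64 = 16.0;

/// Lower bound (in bytes) of the exponential bucket containing `sample`.
fn bucket_minimum(sample: u64) -> u64 {
    if sample == 0 {
        return 0;
    }
    // Each bucket's lower bound is `growth` times the previous one.
    let growth = LOG_BASE.powf(1.0 / BUCKETS_PER_MAGNITUDE);
    let index = ((sample as f64).ln() / growth.ln()).floor();
    growth.powf(index).floor() as u64
}

/// Accumulates byte samples into (bucket minimum -> count) pairs.
fn accumulate(samples: &[u64]) -> BTreeMap<u64, u64> {
    let mut histogram = BTreeMap::new();
    for &sample in samples {
        *histogram.entry(bucket_minimum(sample)).or_insert(0) += 1;
    }
    histogram
}
```

With buckets like these, the "bloated database" question becomes a question about how much of the population lands in the upper buckets.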
Comment 3•4 years ago (Assignee)
(In reply to Alessio Placitelli [:Dexter] from comment #2)
> Probably, if we can get a memory value out of the data entries you linked.
We can also get the file size of the database on disk.
The pure number of data entries won't help us.
> When is the data available? Is there any cost involved in querying the data entries?
When we ask for it. :)
File size is easy, we just need to look at the file system (that's I/O though).
I think the number of data entries (if we even want that) is tracked by the DB and thus cheap to get, but I'll verify that. Consider it to be cheap for now (though again, I don't think the number of entries alone is a helpful measure for us).
Comment 4•4 years ago
(In reply to Jan-Erik Rediger [:janerik] from comment #3)
> When we ask for it. :)
> File size is easy, we just need to look at the file system (that's I/O though).
Then I'd lean more towards collecting it only at init.
Comment 5•4 years ago (Reporter)
Yeah -- I think reading the file size on init (in a background thread) feels like the right cadence.
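A minimal sketch of that cadence, assuming a hypothetical `initialize` function that runs on a background thread and reads the file size before the store is opened. The function name and the placeholder step are illustrative and not the SDK's actual init code.

```rust
use std::fs;
use std::path::PathBuf;
use std::thread::{self, JoinHandle};

/// Hypothetical init routine: measures the database file off the calling
/// thread and returns the size (if readable) through the join handle.
fn initialize(database_file: PathBuf) -> JoinHandle<Option<u64>> {
    thread::spawn(move || {
        // Step 1: read the size before anything else touches the file.
        let size = fs::metadata(&database_file).ok().map(|meta| meta.len());
        // Step 2 (placeholder): open the database and record `size`
        // into the metric here.
        size
    })
}
```

Measuring first keeps the reading independent of any reads or writes done later in the same session.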
Comment 6•4 years ago (Assignee)
(In reply to Michael Droettboom [:mdroettboom] from comment #5)
> Yeah -- I think reading the file size on init (in a background thread) feels like the right cadence.
init runs off the main thread. Plus, we want to do that before rkv touches the database again (because by then we have potentially already read from or written to it).
> I believe "ping" lifetime on the "metrics" ping is fine.
Actually, that means it will only be sent on the next ping, then cleared. Is this a case for an "application" lifetime metric instead?
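The difference between the two lifetimes can be shown with a toy model. This is a sketch of the semantics described above, not the Glean API: `ToyStore`, `Lifetime`, and `submit_ping` are hypothetical names.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq)]
enum Lifetime {
    /// Cleared once the value has been sent in a ping.
    Ping,
    /// Kept for the whole application run.
    Application,
}

#[derive(Default)]
struct ToyStore {
    values: HashMap<&'static str, (Lifetime, u64)>,
}

impl ToyStore {
    fn record(&mut self, name: &'static str, lifetime: Lifetime, value: u64) {
        self.values.insert(name, (lifetime, value));
    }

    /// Builds a ping payload from all stored values, then drops the
    /// ping-lifetime ones so they are not sent again.
    fn submit_ping(&mut self) -> HashMap<&'static str, u64> {
        let payload = self.values.iter().map(|(k, (_, v))| (*k, *v)).collect();
        self.values.retain(|_, (lifetime, _)| *lifetime != Lifetime::Ping);
        payload
    }
}
```

In this model, a size recorded once at init with `Lifetime::Ping` appears only in the first ping of the session, while the same value with `Lifetime::Application` is included in every ping until the application restarts.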
Comment 7•4 years ago (Assignee)
Comment 8•4 years ago
Comment 9•4 years ago
Comment on attachment 9168710 (data-review-request.txt)
DATA COLLECTION REVIEW RESPONSE:
Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?
Yes.
Is there a control mechanism that allows the user to turn the data collection on and off?
Yes. This collection is Telemetry, so it can be controlled through Firefox's Preferences.
If the request is for permanent data collection, is there someone who will monitor the data over time?
Yes, Jan-Erik Rediger is responsible.
Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?
Category 1, Technical.
Is the data collection request for default-on or default-off?
Default on for all channels.
Does the instrumentation include the addition of any new identifiers?
No.
Is the data collection covered by the existing Firefox privacy notice?
Yes.
Does there need to be a check-in in the future to determine whether to renew the data?
No. This collection is permanent.
Result: datareview+