Test numeric limits of the updated size_t memory_distribution type in GIFFT
Categories
(Toolkit :: Telemetry, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox105 | --- | fixed |
People
(Reporter: perry.mcmanis, Assigned: perry.mcmanis)
Details
Attachments
(1 file)
What happens when we call Telemetry::Accumulate(id, size_t) and fog_memory_distribution_accumulate(id, size_t) with values that exceed the FOG version of memory distributions?
fog_memory_distribution_accumulate takes a uint64_t which might not always be big enough to fit all values of size_t which (for exotic platforms)
Write a test for the numeric limits of the memory_distribution type in GIFFT. Include some weird/huge values
Assignee | ||
Comment 1•2 years ago
•
|
||
So after a bit of experimentation I could use a bit of assistance on how to go about this.
When I try to accumulate a value too big for an unsigned 64 bit int (2^65), the buckets that show up are not what (I would expect) we should see if this were defined behavior. Instead the tail of the distro reads: ["8760003",0],["9147841",0],["9552851",0],["9975792",1],["10417458",0]]
If I understand, the final bucket should have the logged value or the key should be over 2^65, correct? In fact, it will do this for much smaller values (e.g. 2^61). Ok well it is out of range, but I'd have expected an exception.
As it happens, 2^35 throws.
I did notice that values above 1tb/2^12 should cause an invaliderror to be logged. But, there is no Desktop testGetNumRecordedErrors.
I'm not sure how to proceed here. If you like, I can file a bug for the test interface and start implementing that so we can count, I could write a test that generically confirms this doesn't cause the value to go into the final bucket, we could say that the value is clamped and just let it go and I could close, or I'm open to suggestions if you don't like any of these.
Comment 2•2 years ago
|
||
2^65
should become invalid_value
because it's over 1TB
: https://mozilla.github.io/glean/book/reference/metrics/memory_distribution.html#accumulate
That it doesn't suggests that there's a conversion (implicit or explicit) before it makes it down to Rust. Put some logging (fprintf_stderr
usually does the trick) in the JS implementation (perhaps here) to see what it's getting. And note that the JS impl is taking a uint64_t
there, so it'll be implicitly converted to size_t
at that point (if there's a difference between the two types, which on my machine there isn't)). Might wish to make that size_t
at that point.
I guess I should digress a moment to decorate the flow here.
- When you call accumulate in JS, XPCOM will try to convert the JS
Number
you are using into the C++ type in the method signature (uint64_t
as mentioned previously). There are some checks and balances to make sure you're not sending strings, but aside from that it has been, in my experience, a very loose implicit transformation. No doubt it is rigourously defined somewhere, but I don't know where, but regardless you should have auint64_t
in the implementation of the JSaccumulate
- It will be implicitly cast to
size_t
as it goes from the JS impl to the C++ impl. Again, only if the types are at all different. - It will be implicitly cast to the implementation type of
u64
as it goes from C++ impl across cbindgen to Rust. Pretty sure this isuint64_t
again on all the usual platforms.
So my guess is that 2^65 is being converted using either XPCOM or C++ implicit conversion rules to a value that is smaller than 1TB. We should investigate where that happens and what happens in that case. (Maybe filing a follow-up bug to reimplement that API for JS to not implicitly convert Really Big Numbers to weird small ones. At that point maybe we also extend support to include JS BigInt
)/
A note on testGetNumRecordedErrors
-- As per docs, FOG uses testGetValue
to report all states: Error, Success, Presence, or Absence of data. If there are errors on the metric, regardless of the presence or absence of data we return an Error in C++ which in JS we turn into NS_ERROR_LOSS_OF_SIGNIFICANT_DATA
(after printing to the Browser Console (and possibly your shell) the text of the actual problem. Tools > Browser Tools > Browser Console to view.). Thus, throwing on testGetValue
should be expected when you exceed the numeric limits here.
Does that help?
Assignee | ||
Comment 3•2 years ago
|
||
Unchanged type:
0:09.56 pid:55904 Perry is accumulating value 999 <- provided 999
0:09.56 pid:55904 Perry is accumulating value 10022 <- provided 10022
0:09.56 pid:55904 Perry is accumulating value 0 <- provided math.pow(2, 63)
0:09.57 pid:55904 Perry is accumulating value 0 <- provided math.pow(2, 65)
Changed GleanMemoryDistribution to take size_t:
0:03.15 pid:60059 Perry is accumulating value 999
0:03.15 pid:60059 Perry is accumulating value 10022
0:03.15 pid:60059 Perry is accumulating value 0
0:03.15 pid:60059 Perry is accumulating value 0
Assignee | ||
Comment 4•2 years ago
|
||
Assignee | ||
Comment 5•2 years ago
|
||
Also adds a small guardrail consistent with the implementation in Timing Distribution
Pushed by pmcmanis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5fd037a32fc2 test size limits of updated Memory Distribution Accumulate r=Dexter
Comment 7•2 years ago
|
||
Backed out for causing xpcshell failures on test_GIFFT.js
Pushed by pmcmanis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/92f831c8044e test size limits of updated Memory Distribution Accumulate r=Dexter
Comment 9•2 years ago
|
||
bugherder |
Assignee | ||
Updated•2 years ago
|
Description
•