Test numeric limits of the updated size_t memory_distribution type in GIFFT
Categories
(Toolkit :: Telemetry, task)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox105 | --- | fixed |
People
(Reporter: perry.mcmanis, Assigned: perry.mcmanis)
Details
Attachments
(1 file)
What happens when we call Telemetry::Accumulate(id, size_t) and fog_memory_distribution_accumulate(id, size_t) with values that exceed the FOG version of memory distributions?
fog_memory_distribution_accumulate takes a uint64_t which might not always be big enough to fit all values of size_t which (for exotic platforms)
Write a test for the numeric limits of the memory_distribution type in GIFFT. Include some weird/huge values
| Assignee | ||
Comment 1•3 years ago
•
|
||
So after a bit of experimentation I could use a bit of assistance on how to go about this.
When I try to accumulate a value too big for an unsigned 64 bit int (2^65), the buckets that show up are not what (I would expect) we should see if this were defined behavior. Instead the tail of the distro reads: ["8760003",0],["9147841",0],["9552851",0],["9975792",1],["10417458",0]]
If I understand, the final bucket should have the logged value or the key should be over 2^65, correct? In fact, it will do this for much smaller values (e.g. 2^61). Ok well it is out of range, but I'd have expected an exception.
As it happens, 2^35 throws.
I did notice that values above 1tb/2^12 should cause an invaliderror to be logged. But, there is no Desktop testGetNumRecordedErrors.
I'm not sure how to proceed here. If you like, I can file a bug for the test interface and start implementing that so we can count, I could write a test that generically confirms this doesn't cause the value to go into the final bucket, we could say that the value is clamped and just let it go and I could close, or I'm open to suggestions if you don't like any of these.
Comment 2•3 years ago
|
||
2^65 should become invalid_value because it's over 1TB: https://mozilla.github.io/glean/book/reference/metrics/memory_distribution.html#accumulate
That it doesn't suggests that there's a conversion (implicit or explicit) before it makes it down to Rust. Put some logging (fprintf_stderr usually does the trick) in the JS implementation (perhaps here) to see what it's getting. And note that the JS impl is taking a uint64_t there, so it'll be implicitly converted to size_t at that point (if there's a difference between the two types, which on my machine there isn't)). Might wish to make that size_t at that point.
I guess I should digress a moment to decorate the flow here.
- When you call accumulate in JS, XPCOM will try to convert the JS
Numberyou are using into the C++ type in the method signature (uint64_tas mentioned previously). There are some checks and balances to make sure you're not sending strings, but aside from that it has been, in my experience, a very loose implicit transformation. No doubt it is rigourously defined somewhere, but I don't know where, but regardless you should have auint64_tin the implementation of the JSaccumulate - It will be implicitly cast to
size_tas it goes from the JS impl to the C++ impl. Again, only if the types are at all different. - It will be implicitly cast to the implementation type of
u64as it goes from C++ impl across cbindgen to Rust. Pretty sure this isuint64_tagain on all the usual platforms.
So my guess is that 2^65 is being converted using either XPCOM or C++ implicit conversion rules to a value that is smaller than 1TB. We should investigate where that happens and what happens in that case. (Maybe filing a follow-up bug to reimplement that API for JS to not implicitly convert Really Big Numbers to weird small ones. At that point maybe we also extend support to include JS BigInt)/
A note on testGetNumRecordedErrors -- As per docs, FOG uses testGetValue to report all states: Error, Success, Presence, or Absence of data. If there are errors on the metric, regardless of the presence or absence of data we return an Error in C++ which in JS we turn into NS_ERROR_LOSS_OF_SIGNIFICANT_DATA (after printing to the Browser Console (and possibly your shell) the text of the actual problem. Tools > Browser Tools > Browser Console to view.). Thus, throwing on testGetValue should be expected when you exceed the numeric limits here.
Does that help?
| Assignee | ||
Comment 3•3 years ago
|
||
Unchanged type:
0:09.56 pid:55904 Perry is accumulating value 999 <- provided 999
0:09.56 pid:55904 Perry is accumulating value 10022 <- provided 10022
0:09.56 pid:55904 Perry is accumulating value 0 <- provided math.pow(2, 63)
0:09.57 pid:55904 Perry is accumulating value 0 <- provided math.pow(2, 65)
Changed GleanMemoryDistribution to take size_t:
0:03.15 pid:60059 Perry is accumulating value 999
0:03.15 pid:60059 Perry is accumulating value 10022
0:03.15 pid:60059 Perry is accumulating value 0
0:03.15 pid:60059 Perry is accumulating value 0
| Assignee | ||
Comment 4•3 years ago
|
||
| Assignee | ||
Comment 5•3 years ago
|
||
Also adds a small guardrail consistent with the implementation in Timing Distribution
Comment 7•3 years ago
|
||
Backed out for causing xpcshell failures on test_GIFFT.js
Comment 9•3 years ago
|
||
| bugherder | ||
| Assignee | ||
Updated•3 years ago
|
Description
•