Closed Bug 1780621 Opened 2 years ago Closed 2 years ago

Test numeric limits of the updated size_t memory_distribution type in GIFFT

Categories

(Toolkit :: Telemetry, task)

Tracking

RESOLVED FIXED
105 Branch
Tracking Status
firefox105 --- fixed

People

(Reporter: perry.mcmanis, Assigned: perry.mcmanis)

Details

Attachments

(1 file)

What happens when we call Telemetry::Accumulate(id, size_t) and fog_memory_distribution_accumulate(id, size_t) with values that exceed what the FOG version of memory distributions can represent?

fog_memory_distribution_accumulate takes a uint64_t, which (on exotic platforms) might not be big enough to fit all values of size_t.

Write a test for the numeric limits of the memory_distribution type in GIFFT. Include some weird/huge values.
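For illustration, a minimal standalone sketch (not the actual GIFFT test) of the boundary values worth pushing through a size_t path that ends at a uint64_t API; the accumulate() stand-in below is hypothetical and only marks where Telemetry::Accumulate(id, size_t) and the Glean mirror would sit:

  #include <cassert>
  #include <cstddef>
  #include <cstdint>
  #include <cstdio>
  #include <limits>

  // Stand-in for the C++ -> Rust boundary, which takes a uint64_t.
  uint64_t accumulate(uint64_t aBytes) { return aBytes; }

  int main() {
    // Assumes a 64-bit size_t; the shifts below would overflow a 32-bit one.
    const size_t kCandidates[] = {
        0,
        999,
        10022,
        size_t(1) << 30,                     // 1 GiB, comfortably under the 1 TB cap
        (size_t(1) << 40) + 1,               // just past 1 TB: should record invalid_value
        std::numeric_limits<size_t>::max(),  // largest value the C++ API accepts
    };

    for (size_t value : kCandidates) {
      // With a 64-bit size_t this conversion is lossless; the assert documents that.
      uint64_t converted = accumulate(static_cast<uint64_t>(value));
      assert(static_cast<size_t>(converted) == value);
      printf("size_t %zu survives the uint64_t boundary\n", value);
    }
    return 0;
  }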

After a bit of experimentation, I could use some assistance on how to go about this.

When I try to accumulate a value too big for an unsigned 64-bit int (2^65), the buckets that show up are not what I would expect if this were defined behavior. Instead the tail of the distribution reads: ["8760003",0],["9147841",0],["9552851",0],["9975792",1],["10417458",0]]

If I understand correctly, the final bucket should hold the logged value, or its key should be over 2^65, correct? In fact, it does this for much smaller values too (e.g. 2^61). Granted, the value is out of range, but I'd have expected an exception.

As it happens, 2^35 throws.

I did notice that values above 1 TB (2^40 bytes) should cause an invalid_value error to be logged. But there is no Desktop testGetNumRecordedErrors.

I'm not sure how to proceed here. Some options:

  1. I file a bug for the test interface (a Desktop testGetNumRecordedErrors) and start implementing it so we can count recorded errors.
  2. I write a test that generically confirms this doesn't cause the value to land in the final bucket.
  3. We say the value is clamped, let it go, and I close this bug.

Or I'm open to other suggestions if you don't like any of these.

Flags: needinfo?(chutten)

2^65 should become invalid_value because it's over 1TB: https://mozilla.github.io/glean/book/reference/metrics/memory_distribution.html#accumulate

That it doesn't suggests there's a conversion (implicit or explicit) before it makes it down to Rust. Put some logging (fprintf_stderr usually does the trick) in the JS implementation (perhaps here) to see what it's getting. Also note that the JS impl takes a uint64_t there, so it'll be implicitly converted to size_t at that point (if there's any difference between the two types, which on my machine there isn't). You might wish to make that parameter size_t.
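For illustration, a hedged standalone sketch of where that logging would sit; every name below is a stand-in (the real method lives in the GleanMemoryDistribution binding, and fprintf_stderr in Gecko does the same job as plain fprintf to stderr here):

  #include <cinttypes>
  #include <cstdint>
  #include <cstdio>

  // Stand-in for fog_memory_distribution_accumulate(id, sample).
  static void FogAccumulateStub(uint32_t aId, uint64_t aSample) {
    (void)aId;
    (void)aSample;
  }

  // Stand-in for the binding's Accumulate(uint64_t) method.
  static void Accumulate(uint32_t aId, uint64_t aAmount) {
    // Log what XPCOM actually handed us, before any further conversion,
    // to see whether the JS Number survived the trip down.
    fprintf(stderr, "accumulating value %" PRIu64 "\n", aAmount);
    FogAccumulateStub(aId, aAmount);
  }

  int main() {
    Accumulate(1, 999);
    Accumulate(1, UINT64_C(1) << 40);  // the 1 TB boundary
    return 0;
  }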

I guess I should digress a moment to describe the flow here.

  1. When you call accumulate in JS, XPCOM will try to convert the JS Number you're using into the C++ type in the method signature (uint64_t, as mentioned previously). There are some checks and balances to make sure you're not sending strings, but aside from that it has been, in my experience, a very loose implicit transformation. No doubt it's rigorously defined somewhere, though I don't know where; regardless, you should have a uint64_t in the implementation of the JS accumulate.
  2. It will be implicitly cast to size_t as it goes from the JS impl to the C++ impl (again, only if the two types differ at all).
  3. It will be implicitly cast to the implementation type of u64 as it goes from the C++ impl across cbindgen to Rust. Pretty sure this is uint64_t again on all the usual platforms.

So my guess is that 2^65 is being converted, by either XPCOM or C++ implicit conversion rules, to a value smaller than 1 TB. We should investigate where that happens and what happens in that case. (Maybe file a follow-up bug to reimplement that API for JS so it doesn't implicitly convert Really Big Numbers into weird small ones; at that point maybe we also extend support to include JS BigInt.)
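To illustrate that guess, a small standalone sketch (not Gecko code) of what a JS Number like 2^65 looks like once it reaches C++ as a double, plus the shape of a guard that avoids the out-of-range cast; whether XPCOM clamps or truncates before this point is exactly the open question:

  #include <cmath>
  #include <cstdint>
  #include <cstdio>

  int main() {
    // JS Numbers are IEEE-754 doubles, so this is roughly what crosses XPCOM.
    double fromJs = std::ldexp(1.0, 65);  // 2^65

    // Casting a double that is out of range of uint64_t is undefined behaviour
    // in C++, and depending on the conversion path a huge JS Number can end up
    // as an unexpectedly small value (see the "accumulating value 0" logs below).
    // A guarded conversion never performs that cast.
    if (fromJs < 0.0 || fromJs >= std::ldexp(1.0, 64) /* first double past uint64_t */) {
      printf("2^65 does not fit in uint64_t; reject or clamp before accumulating\n");
    } else {
      uint64_t sample = static_cast<uint64_t>(fromJs);
      printf("sample fits: %llu\n", static_cast<unsigned long long>(sample));
    }
    return 0;
  }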

A note on testGetNumRecordedErrors -- as per the docs, FOG uses testGetValue to report all states: Error, Success, Presence, or Absence of data. If there are errors on the metric, regardless of the presence or absence of data, we return an Error in C++, which in JS we turn into NS_ERROR_LOSS_OF_SIGNIFICANT_DATA (after printing the text of the actual problem to the Browser Console, and possibly your shell; Tools > Browser Tools > Browser Console to view it). Thus, throwing from testGetValue should be expected when you exceed the numeric limits here.
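To make that concrete, a hedged sketch of a FOG C++ gtest checking that state; the metric test_only::what_a_distribution is a hypothetical name, the Accumulate parameter type depends on the updated signature, and the Result-returning TestGetValue convention is assumed from FOG's testing API:

  #include <cstdint>

  #include "gtest/gtest.h"
  #include "mozilla/glean/GleanMetrics.h"

  TEST(FOGGIFFT, MemoryDistributionOverLimit) {
    // Hypothetical byte-unit test_only metric; 2^41 bytes is 2 TB, over the
    // 1 TB cap, so Glean should record an invalid_value error on the metric.
    mozilla::glean::test_only::what_a_distribution.Accumulate(uint64_t(1) << 41);

    // FOG reports the Error state through testGetValue rather than a separate
    // error-count API, so the Result should come back as an error here.
    auto result = mozilla::glean::test_only::what_a_distribution.TestGetValue();
    ASSERT_TRUE(result.isErr());
  }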

Does that help?

Flags: needinfo?(chutten)

Unchanged type:
0:09.56 pid:55904 Perry is accumulating value 999 <- provided 999
0:09.56 pid:55904 Perry is accumulating value 10022 <- provided 10022
0:09.56 pid:55904 Perry is accumulating value 0 <- provided Math.pow(2, 63)
0:09.57 pid:55904 Perry is accumulating value 0 <- provided Math.pow(2, 65)

Changed GleanMemoryDistribution to take size_t:
0:03.15 pid:60059 Perry is accumulating value 999
0:03.15 pid:60059 Perry is accumulating value 10022
0:03.15 pid:60059 Perry is accumulating value 0
0:03.15 pid:60059 Perry is accumulating value 0

Also adds a small guardrail, consistent with the implementation in Timing Distribution.
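If the guardrail is about making the size_t -> uint64_t hand-off explicit, a standalone sketch of one possible shape of that check (an assumption, not the landed patch) could be:

  #include <cstddef>
  #include <cstdint>
  #include <cstdio>
  #include <limits>

  // Saturate rather than silently truncate if size_t is ever wider than
  // uint64_t (the "exotic platforms" concern from the bug description).
  static uint64_t ClampToU64(size_t aBytes) {
    if constexpr (sizeof(size_t) > sizeof(uint64_t)) {
      if (aBytes > static_cast<size_t>(std::numeric_limits<uint64_t>::max())) {
        return std::numeric_limits<uint64_t>::max();
      }
    }
    return static_cast<uint64_t>(aBytes);
  }

  int main() {
    printf("%llu\n", static_cast<unsigned long long>(
                         ClampToU64(std::numeric_limits<size_t>::max())));
    return 0;
  }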

Pushed by pmcmanis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5fd037a32fc2
test size limits of updated Memory Distribution Accumulate  r=Dexter

Backed out for causing xpcshell failures on test_GIFFT.js

Backout link

Push with failures

Failure log

Flags: needinfo?(pmcmanis)
Pushed by pmcmanis@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/92f831c8044e
test size limits of updated Memory Distribution Accumulate  r=Dexter
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 105 Branch
Flags: needinfo?(pmcmanis)