Functional histogram bucketing is different on x86 and x86_64
Categories
(Data Platform and Tools :: Glean: SDK, defect, P1)
Tracking
(Not tracked)
People
(Reporter: mdroettboom, Assigned: mdroettboom)
Details
In the process of building i686 Windows wheels for the Python bindings, I discovered a difference in the output of bucket_index_to_bucket_minimum. The following values are for the timing distribution, which sets log_base = 2 and buckets_per_magnitude to 8.
For x86_64 (let's consider this baseline and "correct" for now):
input: 7 output: 1
input: 8 output: 2
input: 9 output: 2
input: 10 output: 2
input: 11 output: 2
input: 12 output: 2
input: 13 output: 3
input: 14 output: 3
input: 15 output: 3
input: 16 output: 4
For i686:
input: 7 output: 1
input: 8 output: 1
input: 9 output: 2
input: 10 output: 2
input: 11 output: 2
input: 12 output: 2
input: 13 output: 3
input: 14 output: 3
input: 15 output: 3
input: 16 output: 3
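The x86_64 column above is consistent with the usual exponential-bucketing formula: each bucket minimum is floor(log_base ^ (index / buckets_per_magnitude)). A minimal Python sketch (the exact Rust implementation in glean-core may differ in detail):

```python
import math

def bucket_index_to_bucket_minimum(index, log_base=2.0, buckets_per_magnitude=8.0):
    # Bucket minimum = floor(log_base ** (index / buckets_per_magnitude)).
    # In 64-bit IEEE double arithmetic this reproduces the x86_64 column above.
    exponent = index / buckets_per_magnitude
    return int(math.pow(log_base, exponent))

print([bucket_index_to_bucket_minimum(i) for i in range(7, 17)])
# → [1, 2, 2, 2, 2, 2, 3, 3, 3, 4]
```

Note that the disagreements in the i686 column (inputs 8 and 16) are exactly the indices where the exponent is a whole number, i.e. where the result of the power operation lands precisely on an integer boundary and the slightest rounding difference flips the floor.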
(This discrepancy causes the failure in this Python test, but it's probably easier to investigate at the Rust level, since that seems to be the source of the issue.)
Questions:
- Do we see the same discrepancy between 32-bit and 64-bit ARM? If so, that seems like a P1-level thing to fix.
- Is this difference specific to Windows, or do we also see it on i686 on other platforms?
Updated•6 years ago
I'm looking into it.
I extracted the important parts into their own crate here: https://github.com/badboy/histogram-test/blob/master/src/lib.rs
I ran the tests on several platforms (cargo test --target $TARGET or copying the generated executable and running it directly), all passing:
- i686-unknown-linux-gnu - 32 bit Linux target, host: 64 bit Linux
- i686-unknown-linux-musl - 32 bit Linux & musl target, host: 64 bit Linux
- x86_64-apple-darwin - my default 64 bit macOS
- armv7-linux-androideabi - Android 32 bit arm target
  - Using armv7a-linux-androideabi21-clang as provided by the Android NDK
  - Running in the emulator on an Arm machine:
    Linux localhost 3.10.0+ #255 SMP PREEMPT Fri May 19 11:50:06 PDT 2017 armv7l
- i686-pc-windows-msvc (done by :dexter), 32 bit Windows target, host: 64 bit Windows
:mdboom reported that the test failed on i686-pc-windows-gnu.
As I'm away the rest of the week I'm going to drop assignment for now.
:mdboom, can you run the above crate on i686-pc-windows-gnu and x86_64-pc-windows-gnu?
Comment 3•6 years ago
I can't reproduce with i686-pc-windows-gnu. There must be something fishy about running within the Python interpreter? Poking at more things...
Comment 4•6 years ago
The difference is only visible when running inside of the Python interpreter.
Thanks to the helpful information in the blog post "Floating Point Determinism", I tracked this down to what is possibly the result of FPU flags, which are per-thread global state. The flags and the Windows-specific APIs used to change them are documented here.
Using this API, I was able to see that when running jan-erik's standalone (and correct) example above, the flags were 0x8001f. When running inside Python, the flags were 0x9001f, so the differing bit is 0x00010000. This is the _PC_53 flag, which selects 53-bit floating-point precision (rather than the x87 default of 64-bit precision).
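For reference, the relationship between the two observed control words can be checked directly. The constant names below mirror the MSVC <float.h> definitions:

```python
# x87 control-word values observed above, plus the relevant MSVC constants.
STANDALONE_FLAGS = 0x8001F  # flags in the standalone (correct) run
PYTHON_FLAGS = 0x9001F      # flags when running inside the Python interpreter

MCW_PC = 0x00030000  # _MCW_PC: precision-control mask
PC_53 = 0x00010000   # _PC_53: round results to a 53-bit mantissa
PC_64 = 0x00000000   # _PC_64: full 64-bit x87 precision (the default)

diff = STANDALONE_FLAGS ^ PYTHON_FLAGS
print(hex(diff))  # → 0x10000, i.e. exactly the _PC_53 bit
```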
The Python interpreter does, in fact, contain code to tweak this bit, for x86 on Windows specifically, and it is used in a handful of places, mostly around parsing strings into floats, such as here. Unfortunately, Google and git blame don't provide easy answers as to why this is necessary.
All of the sites where this bit is turned on do in fact turn it back off, so I'm not sure why the setting is sticking and then impacting our Rust code. It's clear that CPython is at least trying to be a good citizen with global state here. Perhaps the reading of the current value is failing... That's the one part of this investigation that feels like a hole to me.
However, the solution of unsetting this bit around our floating-point operations does seem to work and help our unit tests pass.
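The workaround can be sketched as follows. This is a hypothetical Python illustration (the actual fix lives in the Rust code) using the real `_controlfp` function from msvcrt via ctypes; the helper name and structure are my own, and it deliberately no-ops on non-Windows platforms:

```python
import sys
from contextlib import contextmanager

_MCW_PC = 0x00030000  # precision-control mask (MSVC <float.h>)
_PC_64 = 0x00000000   # full 64-bit x87 precision

@contextmanager
def force_64bit_precision():
    """Hypothetical helper: clear the 53-bit precision flag around
    floating-point work on 32-bit Windows, restoring it afterwards."""
    if sys.platform == "win32":
        import ctypes
        controlfp = ctypes.cdll.msvcrt._controlfp
        old = controlfp(0, 0)       # read the current control word
        controlfp(_PC_64, _MCW_PC)  # request full 64-bit precision
        try:
            yield
        finally:
            controlfp(old & _MCW_PC, _MCW_PC)  # restore the previous precision bits
    else:
        yield  # x87 precision control doesn't apply on other platforms

# Usage:
# with force_64bit_precision():
#     ... bucket computations ...
```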
One important side outcome of this investigation: it is clearly useful to have unit tests at the Python level for functionality that lives primarily at the Rust level; otherwise this might not have been noticed before we started collecting data. I personally felt I was wasting time when I wrote this unit test, and I was clearly wrong.
Comment 5•6 years ago
Some additional context (for curiosity's sake) about why CPython does this: https://bugs.python.org/issue13889