Bug 1303730 (Open): opened 8 years ago, updated 8 years ago

Consider changing the default precision for cardinality queries

Categories: Socorro :: General
Type: task
Priority: Not set
Severity: normal

Tracking: Not tracked

People: Reporter: marco; Assignee: Unassigned


What is the current value for the `precision_threshold`?

Could we change it for just one specific aggregation (_cardinality.install_time)?
Flags: needinfo?(adrian)
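
For reference, a minimal sketch of the kind of super search request this bug is about, using the _aggs.signature=_cardinality.install_time parameter discussed here. The endpoint URL and the other parameters are assumptions about the public SuperSearch API, not taken from this bug:

    # Sketch only: the endpoint and surrounding parameters are assumptions.
    import requests

    resp = requests.get(
        "https://crash-stats.mozilla.org/api/SuperSearch/",
        params={
            "_results_number": 0,
            # Bucket crashes by signature and, within each bucket, count
            # distinct install_time values (an approximate count in ES).
            "_aggs.signature": "_cardinality.install_time",
        },
    )
    print(resp.json())
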
We currently use the default value for Elasticsearch, which "depends on the number of parent aggregations" -- see https://www.elastic.co/guide/en/elasticsearch/reference/1.4/search-aggregations-metrics-cardinality-aggregation.html

We could set a value for just install_time; what should it be?
Flags: needinfo?(adrian)
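
(For what it's worth, setting a value on a single aggregation is supported by the Elasticsearch query DSL. A minimal sketch follows; this is not Socorro's actual query-building code, and 40000, the documented maximum for the option, is used only as a placeholder:)

    import json

    # Count distinct install_time values. precision_threshold trades
    # memory for accuracy: counts below the threshold are expected to
    # be close to exact.
    query = {
        "size": 0,
        "aggs": {
            "install_time_cardinality": {
                "cardinality": {
                    "field": "install_time",
                    "precision_threshold": 40000,  # placeholder value
                }
            }
        },
    }
    print(json.dumps(query, indent=2))
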
In an ideal world with infinite memory, we'd have precise results for the top signature (OOM | small), which would mean precise results for all the other signatures as well (at least with a level-1 aggregation, e.g. _aggs.signature=_cardinality.install_time, sketched below).
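
(As a concrete sketch of that level-1 aggregation in Elasticsearch DSL: a terms aggregation on signature with a cardinality sub-aggregation on install_time in each bucket. The field names are assumptions about Socorro's index mapping:)

    # Sketch only: field names are assumptions about the index mapping.
    query = {
        "size": 0,
        "aggs": {
            "signature": {
                "terms": {"field": "signature"},
                "aggs": {
                    "install_time": {
                        "cardinality": {
                            "field": "install_time",
                            # If the c * 8 bytes cost applies per bucket,
                            # it multiplies by the number of signatures.
                            "precision_threshold": 40000,
                        }
                    }
                },
            }
        },
    }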

Would it be possible to get an estimate of the error we're currently incurring, for a few signatures?

How much memory does it cost to have more precise results? The documentation says "For a precision threshold of c, the implementation that we are using requires about c * 8 bytes.", but is this an overall number or a per-signature number?
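
(A back-of-the-envelope comparison of the two readings, with illustrative numbers rather than measurements:)

    # Hypothetical figures, only to bound the two interpretations.
    precision_threshold = 40000   # documented maximum for the option
    bytes_per_count = 8           # "about c * 8 bytes" per the ES docs
    signature_buckets = 1000      # assumed number of signature buckets

    overall = precision_threshold * bytes_per_count
    per_signature_total = overall * signature_buckets

    print(f"if overall:       {overall / 1024:.1f} KiB")                 # 312.5 KiB
    print(f"if per signature: {per_signature_total / 1024**2:.1f} MiB")  # ~305.2 MiB
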
Marcia hit this while investigating a signature (js::IdIsIndex). The estimated number of installations was 0, but judging from the data for those crashes, the actual number was probably 1.