Closed
Bug 1181650
Opened 9 years ago
Closed 9 years ago
assess viability of ES as a replacement data store for reports
Categories
(Socorro :: General, task)
Socorro
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: lonnen, Assigned: adrian)
Details
I'd like to better understand what is easy and hard to do with ES in the webapp. Specifically, I'd like to assess switching the following reports to source their data from ES instead of Postgres, perhaps with a long-lived cache in the middle:
- each tab within signature summary
- crashes per user
- top changers
- top crashers
- crash trends
- gc crashes
- exploitable crashes
Comment 1 • 9 years ago (Reporter)
Skip top changers and crash trends; we'll purge those instead.
https://bugzilla.mozilla.org/show_bug.cgi?id=1185055
https://bugzilla.mozilla.org/show_bug.cgi?id=1186474
Comment 2 • 9 years ago (Assignee)
Signature Summary
=================
Operating System
----------------
We have something close on the signature report page, but it lacks
precision on the OS names: signature summary shows "Windows 7" or
"Windows Vista", data that is not in Elasticsearch.
Requires: transforming the processed JSON's ``os_name`` field (exposed
as ``platform``) into a more precise OS name.
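A minimal sketch of that transformation, assuming the processed crash also carries an ``os_version`` string such as "6.1.7601 Service Pack 1" alongside ``os_name == "Windows NT"``. The table follows the standard Windows NT numbering (6.0 is Vista, 6.1 is Windows 7, and so on). The function name ``precise_os_name`` is hypothetical.

```python
import re

# Standard Windows NT (major, minor) kernel versions mapped to marketing names.
WINDOWS_NAMES = {
    (5, 1): "Windows XP",
    (6, 0): "Windows Vista",
    (6, 1): "Windows 7",
    (6, 2): "Windows 8",
    (6, 3): "Windows 8.1",
    (10, 0): "Windows 10",
}


def precise_os_name(os_name, os_version):
    """Return a more precise OS name, falling back to ``os_name``.

    Assumes Windows crashes report os_name "Windows NT" and an os_version
    starting with "major.minor" (e.g. "6.1.7601 Service Pack 1").
    """
    if os_name != "Windows NT" or not os_version:
        return os_name
    match = re.match(r"(\d+)\.(\d+)", os_version)
    if match is None:
        return os_name
    key = (int(match.group(1)), int(match.group(2)))
    # Unknown versions keep the generic name rather than guessing.
    return WINDOWS_NAMES.get(key, os_name)
```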
Uptime Range
------------
Doable with a histogram on the ``uptime`` field. Not a feature yet, but
very similar to bug 1186355, so it should be easy to add afterwards.
Requires: adding histogram capability on the ``uptime`` field to Super
Search.
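For reference, here is a sketch of the Elasticsearch request body such a feature might send. The field name ``uptime`` and the helper name are assumptions, and the real mapping may nest the field differently.

```python
def uptime_histogram_query(interval_seconds=60, field="uptime"):
    """Build a request body that buckets crashes by uptime (hypothetical helper)."""
    return {
        "size": 0,  # only the aggregation is needed, not the matching documents
        "aggs": {
            "uptime_ranges": {
                "histogram": {
                    "field": field,
                    "interval": interval_seconds,
                    "min_doc_count": 1,  # omit empty buckets
                }
            }
        },
    }
```

If the signature summary uses uneven buckets rather than fixed-width ones, Elasticsearch's ``range`` aggregation would fit better than ``histogram``.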
Product
-------
This is actually "Product / Version", which is an aggregation nested in
another aggregation. We already have something similar for ``signature``
(see bug 1177605), and extending it to other fields is not complex.
Requires: adding sub-aggregations on the ``product`` field to Super
Search.
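Below is a sketch of the nested ``terms`` aggregations behind a "Product / Version" breakdown. The helper is hypothetical, and ``product`` and ``version`` are assumed to match the index mapping. The same builder works for any chain of fields.

```python
def nested_terms_query(fields, size=50):
    """Nest one ``terms`` aggregation per field, outermost field first.

    nested_terms_query(["product", "version"]) groups crashes by product,
    then by version within each product.
    """
    if not fields:
        raise ValueError("at least one field is required")
    aggs = None
    # Build from the innermost field outwards.
    for field in reversed(fields):
        agg = {"terms": {"field": field, "size": size}}
        if aggs is not None:
            agg["aggs"] = aggs
        aggs = {field: agg}
    return {"size": 0, "aggs": aggs}
```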
Architecture
------------
Already possible on the signature report page. In the Aggregations tab,
aggregate on ``cpu name``.
Requires: nothing.
Process Type
------------
Already possible on the signature report page. In the Aggregations tab,
aggregate on ``process type``.
Requires: nothing.
Flash Version
-------------
Already possible on the signature report page. In the Aggregations tab,
aggregate on ``flash version``.
Requires: nothing.
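The Architecture, Process Type and Flash Version tabs only need plain ``terms`` aggregations, and Elasticsearch can compute several of them side by side in a single request. A sketch, assuming hypothetical index field names ``cpu_name``, ``process_type`` and ``flash_version``:

```python
def sibling_terms_query(fields, size=20):
    """One request body with an independent ``terms`` aggregation per field."""
    return {
        "size": 0,
        "aggs": {field: {"terms": {"field": field, "size": size}} for field in fields},
    }


# Hypothetical Elasticsearch field names for the three tabs.
query = sibling_terms_query(["cpu_name", "process_type", "flash_version"])
```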
Crashes per Install
-------------------
This one requires counting the distinct values of ``install age`` in a
dataset. That's doable with Elasticsearch's "cardinality" aggregation.
It is not accessible from Super Search as of now, but it doesn't sound
too hard to add.
Requires: adding distinct values counting on the ``install age`` field
to Super Search.
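Here is a sketch of the request and of the final ratio, assuming the field is indexed as ``install_age`` (the helper names are hypothetical). The cardinality aggregation returns an approximate count, and ``precision_threshold`` trades memory for accuracy. On recent Elasticsearch versions ``hits.total`` may also be capped unless ``track_total_hits`` is enabled.

```python
def install_count_query(field="install_age", precision_threshold=3000):
    return {
        "size": 0,
        "aggs": {
            "distinct_installs": {
                "cardinality": {
                    "field": field,
                    # Counts below this threshold are expected to be close to exact.
                    "precision_threshold": precision_threshold,
                }
            }
        },
    }


def crashes_per_install(response):
    """Divide the total hit count by the (approximate) distinct-value count."""
    total = response["hits"]["total"]
    if isinstance(total, dict):  # newer Elasticsearch versions wrap the count
        total = total["value"]
    installs = response["aggregations"]["distinct_installs"]["value"]
    return total / installs if installs else 0.0
```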
Mobile Devices
--------------
I can find all the data except the "API Level" in Elasticsearch. We will
need sub-aggregations on the ``android manufacturer`` field.
Requires: adding sub-aggregations on the ``android manufacturer`` field
to Super Search.
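Once sub-aggregations exist, the nested buckets in the response still need flattening into table rows. The sketch below assumes each sub-aggregation is nested under its field name (``android_manufacturer``, then ``android_model``, both hypothetical field names). The sample data in the test is made up.

```python
def flatten_buckets(agg, fields, prefix=()):
    """Turn nested ``terms`` buckets into (key1, key2, ..., doc_count) rows.

    ``agg`` is the aggregation result for fields[0]; each bucket is expected
    to hold the sub-aggregation for the next field under that field's name.
    """
    rows = []
    for bucket in agg["buckets"]:
        keys = prefix + (bucket["key"],)
        if len(fields) > 1:
            rows.extend(flatten_buckets(bucket[fields[1]], fields[1:], keys))
        else:
            rows.append(keys + (bucket["doc_count"],))
    return rows
```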
Graphics Adapter Report
-----------------------
It seems the data shown there is not accessible from Super Search as of
now. Since I don't know where it is stored, or whether it makes it to
Elasticsearch at all, it's hard to answer this one. It might simply
require exposing a few fields, or it might require changing the
processors to put the data in Elasticsearch.
Requires: unsure.
Exploitability
--------------
The data is there; it needs date histograms to be presented by date,
which is bug 1186355.
Requires: bug 1186355 (add histograms to Super Search) in order to show
the data by day.
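Below is a sketch of the request body, assuming ``date_processed`` as the date field: a daily ``date_histogram`` with an ``exploitability`` ``terms`` sub-aggregation. The helper is hypothetical. The ``"interval": "day"`` syntax matches older Elasticsearch releases; version 7 and later replace it with ``calendar_interval``.

```python
def exploitability_by_day_query(date_field="date_processed",
                                rating_field="exploitability"):
    return {
        "size": 0,
        "aggs": {
            "per_day": {
                # One bucket per calendar day.
                "date_histogram": {"field": date_field, "interval": "day"},
                # Within each day, count crashes per exploitability rating.
                "aggs": {"ratings": {"terms": {"field": rating_field}}},
            }
        },
    }
```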
Crashes per user
================
A simple graph of the number of crashes per day in a dataset. That's a
date histogram in Elasticsearch plus some filtering. We already have bug
1186355 to add the date histogram. The real problem is that this graph
is scaled using ADI, and that data is only in PostgreSQL at the moment,
with no API endpoint to access it.
Requires: bug 1186355 (add histograms to Super Search) in order to build
the graph and an API endpoint to access ADI data (or to rebuild it
without the ADI ratio).
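Once daily counts come out of the date histogram and ADI is reachable, the scaling itself is simple arithmetic. The sketch below assumes the graph plots crashes per 100 ADI and that both inputs are dictionaries keyed by day. Those input shapes are hypothetical.

```python
def crashes_per_100_adi(daily_crashes, daily_adi):
    """Scale daily crash counts by active daily installs (ADI).

    Days with missing or zero ADI are skipped rather than guessed.
    """
    ratios = {}
    for day, crashes in sorted(daily_crashes.items()):
        adi = daily_adi.get(day)
        if adi:
            ratios[day] = 100.0 * crashes / adi
    return ratios
```

Skipping days without ADI, rather than treating them as zero, avoids division by zero and misleading spikes.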
Top changers
============
Not exactly sure what the data is, but it seems to use only signatures,
so it should be possible to do with ES.
Top crashers
============
By date
-------
Can be done today (ongoing). "First appearance" is going to be difficult
to do, though, since that data is in Postgres. If we really need it,
maybe we can expose it through an API endpoint?
Requires: bug 1184173 (near real-time TCBS), exposing "signature first
appearance" API endpoint.
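Elasticsearch's ``terms`` aggregation on ``signature`` already returns the most frequent signatures first. What remains is ranking them and computing each one's share, sketched below with a hypothetical helper. ``total_crashes`` should be the overall hit count, not the sum of the returned buckets, or the percentages will be inflated.

```python
def rank_signatures(buckets, total_crashes):
    """Return (rank, signature, count, percent of all crashes) tuples."""
    if total_crashes <= 0:
        raise ValueError("total_crashes must be positive")
    # Re-sort defensively; Elasticsearch normally returns buckets by doc_count.
    ordered = sorted(buckets, key=lambda b: b["doc_count"], reverse=True)
    return [
        (rank, b["key"], b["doc_count"], 100.0 * b["doc_count"] / total_crashes)
        for rank, b in enumerate(ordered, start=1)
    ]
```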
By build
--------
This one is tough for me to answer, because I do not understand how it
is built, nor what data it uses. Also, I don't know if users actually
need it; analytics seem to show that it is used very little.
Crash trends
============
That graph looks completely broken to me, so I don't know.
GC Crashes
==========
For this one we just need histograms on the ``build id`` field, which is
very similar to bug 1186355. The data is also scaled using ADI, so
exposing ADI data would be necessary to match the existing report
exactly.
Requires: adding histogram capability on the ``build id`` field to Super
Search and an API endpoint to access ADI data (or to rebuild it without
the ADI ratio).
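Below is a sketch of the per-build breakdown. It swaps in a ``terms`` aggregation for the histogram, on the assumption that a fixed-width numeric histogram over 14-digit build IDs would produce mostly empty buckets. Field and helper names are hypothetical. Build IDs follow the YYYYMMDDhhmmss pattern, so sorting their string form gives chronological order.

```python
def build_id_query(field="build_id", size=100):
    return {
        "size": 0,
        "aggs": {"per_build": {"terms": {"field": field, "size": size}}},
    }


def crashes_by_build(response):
    """Return (build_id, crash_count) pairs in chronological build order."""
    buckets = response["aggregations"]["per_build"]["buckets"]
    # Fixed-width YYYYMMDDhhmmss strings sort chronologically.
    return sorted((str(b["key"]), b["doc_count"]) for b in buckets)
```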
Exploitable crashes
===================
Already doable using Super Search: it's a sub-aggregation of
``exploitability`` under ``signature``, with a filter on
``exploitability``.
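Below is a sketch of an equivalent Elasticsearch request body. The default rating value is a placeholder, the field names are assumed to match the mapping, and the helper is hypothetical. The ``bool``/``filter`` form is the Elasticsearch 2.0+ syntax; 1.x used a ``filtered`` query instead.

```python
def exploitable_crashes_query(ratings=("high",), size=50):
    return {
        "size": 0,
        # Keep only crashes whose exploitability rating is in ``ratings``.
        "query": {"bool": {"filter": [{"terms": {"exploitability": list(ratings)}}]}},
        "aggs": {
            "signatures": {
                "terms": {"field": "signature", "size": size},
                # Break each signature down by its exploitability rating.
                "aggs": {"ratings": {"terms": {"field": "exploitability"}}},
            }
        },
    }
```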
Updated • 9 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED