Closed Bug 1181650 Opened 9 years ago Closed 9 years ago

assess viability of ES as a replacement data store for reports

Categories

(Socorro :: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: lonnen, Assigned: adrian)

Details

I'd like to better understand what is easy and hard to do with ES in the webapp. Specifically, I'd like to assess switching the following reports to source their data from ES instead of Postgres, perhaps with a long-lived cache in the middle:

* each tab within signature summary
* crashes per user
* top changers
* top crashers
* crash trends
* gc crashes
* exploitable crashes
Signature Summary
=================

Operating System
----------------

We have something close in the signature report page, but missing precision on the OS names (signature summary shows "Windows 7" or "Windows Vista", data that is not in Elasticsearch).

Requires: transforming the processed JSON's ``os_name`` field (exposed as ``platform``) to have a more precise OS name.

Uptime Range
------------

Doable with a histogram on the ``uptime`` field. Not a feature yet, but very similar to bug 1186355, so it should be easy to add afterwards.

Requires: adding histogram capability on the ``uptime`` field to Super Search.

Product
-------

That's actually "Product / Version". That's an aggregation in an aggregation, and we already have something similar for ``signature`` (see bug 1177605). Extending it to other fields is not complex.

Requires: adding sub-aggregations on the ``product`` field to Super Search.

Architecture
------------

Already possible in the signature report page. In the Aggregations tab, aggregate on ``cpu name``.

Requires: nothing.

Process Type
------------

Already possible in the signature report page. In the Aggregations tab, aggregate on ``process type``.

Requires: nothing.

Flash Version
-------------

Already possible in the signature report page. In the Aggregations tab, aggregate on ``flash version``.

Requires: nothing.

Crashes per Install
-------------------

This one requires counting the distinct ``install age`` of a dataset. That's doable with the "cardinality" aggregation of Elasticsearch. It is not accessible from Super Search as of now but doesn't sound too hard to add.

Requires: adding distinct values counting on the ``install age`` field to Super Search.

Mobile Devices
--------------

I can find all the data except the "API Level" in Elasticsearch. We will need to have sub-aggregations for ``android manufacturer`` field.

Requires: adding sub-aggregations on the ``android manufacturer`` field to Super Search.

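To make the three missing capabilities above a bit more concrete (histogram on ``uptime``, distinct counting on ``install age``, and "Product / Version" as an aggregation in an aggregation), here is a rough sketch of the Elasticsearch request bodies involved. This is not how Super Search builds its queries. The helper names, the aggregation names, the flat field names (``uptime``, ``install_age``, ``product``, ``version``) and the 60-unit histogram interval are all assumptions made for illustration. The ``bool``/``filter`` query form is the one used by recent Elasticsearch releases; older releases would need a ``filtered`` query instead.

```python
import json


def terms_with_sub_aggregation(field, sub_field, size=10):
    """Terms aggregation on `field`, with a nested terms aggregation on `sub_field`."""
    return {
        field: {
            "terms": {"field": field, "size": size},
            "aggs": {sub_field: {"terms": {"field": sub_field, "size": size}}},
        }
    }


def histogram(field, interval):
    """Numeric histogram: one bucket per `interval`-wide range of `field`."""
    return {
        "histogram_" + field: {"histogram": {"field": field, "interval": interval}}
    }


def cardinality(field):
    """Approximate count of the distinct values of `field`."""
    return {"cardinality_" + field: {"cardinality": {"field": field}}}


def build_request(filters, *aggregations):
    """Combine exact-match filters and aggregations into one search body."""
    aggs = {}
    for aggregation in aggregations:
        aggs.update(aggregation)
    return {
        "size": 0,  # only the aggregations are needed, not the matching documents
        "query": {
            "bool": {
                "filter": [{"term": {name: value}} for name, value in filters.items()]
            }
        },
        "aggs": aggs,
    }


if __name__ == "__main__":
    # Hypothetical signature and field names, for illustration only.
    body = build_request(
        {"signature": "example_signature"},
        histogram("uptime", 60),
        cardinality("install_age"),
        terms_with_sub_aggregation("product", "version"),
    )
    print(json.dumps(body, indent=2))
```

One request like this would cover the Uptime Range, Crashes per Install and Product tabs for a single signature. Note that the cardinality aggregation returns an approximate count, which may or may not be precise enough to match the existing numbers.
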
Graphics Adapter Report
-----------------------

It seems the data that is shown there is not accessible from Super Search as of now. Since I don't know where it is stored, or whether it makes it to Elasticsearch at all, it's hard to answer this one. It might either simply require that we expose a few fields, or that we change processors to put the data in Elasticsearch.

Requires: unsure.

Exploitability
--------------

The data is there, it needs date histograms to be presented by date, and that's bug 1186355.

Requires: bug 1186355 (add histograms to Super Search) in order to show the data by day.

Crashes per user
================

Simple graph of a count of crashes per day in a dataset. That's a date histogram in Elasticsearch and some filtering. We already have bug 1186355 to add the date histogram. The real problem is that this graph is scaled using ADI, and that data is only in PostgreSQL at the moment, not accessible from an API endpoint.

Requires: bug 1186355 (add histograms to Super Search) in order to build the graph and an API endpoint to access ADI data (or to rebuild it without the ADI ratio).

Top changers
============

Not exactly sure what the data is, but it seems to be using only signatures, so I don't think that's impossible to do with ES.

Top crashers
============

By date
-------

Can be done today (ongoing). "First appearance" is going to be difficult to do though, since that data is in Postgres. If we really need it, maybe we can expose it in an API endpoint?

Requires: bug 1184173 (near real-time TCBS), exposing "signature first appearance" API endpoint.

By build
--------

This one is tough for me to answer, because I do not understand how it is built, nor what data it uses. Also, I don't know if users actually need it; analytics seem to show that it is used very little.

Crash trends
============

That graph looks completely broken to me, so I don't know.

GC Crashes
==========

For this one we just need histograms for ``build id``, which is very similar to bug 1186355.
Data is scaled using ADI too, so exposing that data would be necessary to match the existing report exactly.

Requires: adding histogram capability on the ``build id`` field to Super Search and an API endpoint to access ADI data (or to rebuild it without the ADI ratio).

Exploitable crashes
===================

Already doable using Super Search. It's a sub-aggregation of ``exploitability`` under ``signature`` with a filter on ``exploitability``.
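
For reference, the "sub-aggregation of ``exploitability`` under ``signature`` with a filter on ``exploitability``" could look roughly like the sketch below when written directly against Elasticsearch. It is only an illustration, not the query Super Search generates. The function names, the flat field names, the default ``high``/``medium`` levels and the bucket size are assumptions, and the ``bool``/``filter`` form assumes a recent Elasticsearch release. The second helper shows how the nested buckets of the response can be flattened into one row per signature.

```python
def exploitable_crashes_request(levels=("high", "medium"), size=50):
    """Search body: signatures of crashes whose exploitability is in `levels`,
    each broken down by exploitability level."""
    return {
        "size": 0,  # aggregations only, no documents
        "query": {
            "bool": {"filter": [{"terms": {"exploitability": list(levels)}}]}
        },
        "aggs": {
            "signature": {
                "terms": {"field": "signature", "size": size},
                "aggs": {
                    "exploitability": {"terms": {"field": "exploitability"}}
                },
            }
        },
    }


def rows_from_response(response):
    """Flatten the nested buckets into (signature, total, {level: count}) rows."""
    rows = []
    for bucket in response["aggregations"]["signature"]["buckets"]:
        per_level = {
            sub["key"]: sub["doc_count"]
            for sub in bucket["exploitability"]["buckets"]
        }
        rows.append((bucket["key"], bucket["doc_count"], per_level))
    return rows
```

``rows_from_response`` relies only on the standard shape of a terms aggregation response (``buckets`` holding ``key`` and ``doc_count``, with sub-aggregations nested under their own name), so it should work the same way for any other "X under signature" report.
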
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED