Closed Bug 1475934 Opened 7 years ago Closed 7 years ago

Investigate the current state of the art for anomaly detection and explanation

Categories

(Toolkit :: Telemetry, enhancement, P1)

Tracking

RESOLVED FIXED
Tracking Status
firefox63 --- affected

People

(Reporter: Dexter, Assigned: aciepielewska)

Details

We should investigate the scientific literature on anomaly detection and explanation and write a document summarizing what we find.
Assignee: nobody → aciepielewska
Blocks: 1475933
Priority: -- → P1
Proposed solutions after investigation:

1. Take the data we have and calculate the distance between each day and the next (first day with the second, second with the third, etc.), taking the day of the week into account. This gives us a probability distribution of day-to-day distances, and each day we can test whether the latest change is significantly different from it.
   Pros of this solution:
   - it uses the fact that our data points are distributions
   - it's simple and does not require much data
   - it's easy to implement

2. Predict the next step using an LSTM (perhaps two LSTMs, one for categorical data and one for continuous data). The prediction error then indicates an anomaly; the threshold is chosen by a heuristic.
   Pros of this solution:
   - implementations are available in Python
   - it's interesting to investigate
   - LSTMs do well on many kinds of time-series problems, so there is a chance they will work in our case

3. Try to "learn" the series with a VAE, as in the DONUT architecture (see the notes), possibly replacing the sliding window with an LSTM, since an LSTM is simple to use on histograms. Again, we may need two VAEs, one for continuous data and one for categorical data.
   Pros of this solution:
   - implementations are available in Python
   - it's interesting to investigate
   - it seems to be widely used nowadays, so it may be worth a try

The rest of the algorithms were rejected for various reasons:
- the statistical methods did not use the fact that our data points are histograms, and would potentially not handle seasonality well
- the other machine learning methods were tested only on univariate data or had no Python implementation

Link to my notes: https://docs.google.com/document/d/1DcGOV4bwqdDX6i855ZW2shztbUPZgmzpvQouOn8pEio/edit?ts=5b44b5f1#
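Option 1 can be illustrated with a minimal sketch. This is not the method from the notes: the comment names neither a distance nor a test, so the sketch assumes total variation distance between normalized daily histograms, groups the day-to-day distances by the weekday of the later day (one reading of "taking the day of the week into account"), and uses an empirical rank-based tail probability as the significance test. All function names (`normalize`, `total_variation`, `distances_by_weekday`, `empirical_p_value`, `check_latest`, `make_demo_days`) and the synthetic data are hypothetical.

```python
import random
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, Sequence, Tuple

# One data point per day: (date, bucket counts of a histogram probe).
Day = Tuple[date, Sequence[float]]


def normalize(hist: Sequence[float]) -> List[float]:
    """Turn raw bucket counts into a probability distribution."""
    total = float(sum(hist))
    if total <= 0:
        raise ValueError("histogram has no mass")
    return [c / total for c in hist]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance (assumed metric) between distributions over the same buckets."""
    if len(p) != len(q):
        raise ValueError("histograms must share the same bucket layout")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def distances_by_weekday(days: Sequence[Day]) -> Dict[int, List[float]]:
    """Distance between each day and the next, grouped by the weekday of the later day."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for (_, h_prev), (d_next, h_next) in zip(days, days[1:]):
        grouped[d_next.weekday()].append(total_variation(normalize(h_prev), normalize(h_next)))
    return dict(grouped)


def empirical_p_value(history: Sequence[float], observed: float) -> float:
    """Share of past distances at least as large as the observed one, with +1 smoothing."""
    exceed = sum(1 for h in history if h >= observed)
    return (exceed + 1) / (len(history) + 1)


def check_latest(days: Sequence[Day], alpha: float = 0.05) -> Tuple[float, float, bool]:
    """Test whether the change into the most recent day is unusually large for its weekday."""
    if len(days) < 3:
        raise ValueError("need at least three days")
    history = distances_by_weekday(days[:-1])
    (_, h_prev), (d_last, h_last) = days[-2], days[-1]
    observed = total_variation(normalize(h_prev), normalize(h_last))
    past = history.get(d_last.weekday(), [])
    if not past:
        raise ValueError("no history yet for this weekday")
    p = empirical_p_value(past, observed)
    return observed, p, p < alpha


def make_demo_days(n_weeks: int = 30, seed: int = 0) -> List[Day]:
    """Hypothetical synthetic data: weekday/weekend shapes plus a shifted final Monday."""
    rng = random.Random(seed)
    start = date(2018, 1, 1)  # a Monday
    days: List[Day] = []
    for i in range(7 * n_weeks):
        d = start + timedelta(days=i)
        base = [50, 30, 20] if d.weekday() >= 5 else [70, 20, 10]
        days.append((d, [b + rng.randint(0, 3) for b in base]))
    # Final day: the probe's distribution moves sharply towards the last bucket.
    days.append((start + timedelta(days=7 * n_weeks), [10, 20, 70]))
    return days


if __name__ == "__main__":
    dist, p, flag = check_latest(make_demo_days())
    print(f"distance={dist:.3f} p={p:.3f} anomalous={flag}")
```

With the +1 smoothing in `empirical_p_value`, the smallest attainable p-value for a weekday with n past distances is 1/(n+1), so at alpha = 0.05 at least 20 past transitions into that weekday (about 20 weeks of data) are needed before anything can be flagged. Another distance, such as Hellinger or Jensen-Shannon, could replace `total_variation` without changing the rest of the sketch.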
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED