Bug 1859614 (Closed)
Opened 2 years ago; closed 9 months ago
Suggestions for metric incident investigation playbook
Categories: Data Platform and Tools :: Glean: SDK (enhancement)
Tracking: Not tracked
Status: RESOLVED FIXED
People: Reporter: travis_; Assignee: travis_
Details
This bug is aimed mainly at Frank and the work on this doc: https://docs.google.com/document/d/1tmI7PyHR1DGjbyTU8J1hAMq0lwngm6cV9c6E8SppCFw/edit
I have some additional (and some overlapping) suggestions that I have found useful, along with explanations of what each split might tell us if it turns up anything:
Countries (e.g. China, Iran?)
- Is the anomaly concentrated in particular countries? If so, is there a national holiday or a similar event going on?
- Is this an area known for bots or unusual activity (Malaysia, China, Ireland, etc.)?
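A minimal sketch of the country split, assuming ping records have already been exported as Python dicts with a hypothetical `country` field: compare each country's share of records in the anomaly window against its share in a baseline window, and rank countries by how over-represented they are.

```python
from collections import Counter


def country_share_ratios(baseline, anomaly):
    """Return (country, ratio) pairs, most over-represented first.

    `baseline` and `anomaly` are lists of dicts with a hypothetical
    "country" key (an assumption about the exported data shape).
    ratio = share in anomaly window / share in baseline window.
    """
    base_counts = Counter(r["country"] for r in baseline)
    anom_counts = Counter(r["country"] for r in anomaly)
    base_total = sum(base_counts.values())
    anom_total = sum(anom_counts.values())
    ratios = []
    for country, count in anom_counts.items():
        anom_share = count / anom_total
        base_share = base_counts.get(country, 0) / base_total if base_total else 0
        # Countries missing from the baseline get an infinite ratio so they sort first.
        ratio = anom_share / base_share if base_share else float("inf")
        ratios.append((country, ratio))
    ratios.sort(key=lambda pair: pair[1], reverse=True)
    return ratios
```

A ratio well above 1 means the country contributes far more to the anomaly than it normally does, which is the signal worth checking against holidays or known bot activity.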
ISP (e.g. BrowserStack?)
- This is typically more fine-grained than country, and if the anomaly is coming from a single ISP, that is stronger evidence of bots or automation.
- There are a lot of ISPs, so a HAVING clause may be needed to filter out the smaller ones.
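The HAVING idea can also be sketched in Python, assuming the same kind of exported records with a hypothetical `isp` field: group by ISP, drop ISPs below a minimum record count (the analogue of `HAVING COUNT(*) >= n`), and report what share of all records each remaining ISP accounts for.

```python
from collections import Counter


def isp_concentration(records, min_count=100):
    """Group records by ISP and keep only ISPs with at least `min_count`
    records, like `HAVING COUNT(*) >= min_count` in SQL.

    Returns (isp, count, share_of_all_records) tuples, largest first.
    The "isp" key and the default threshold are illustrative assumptions.
    """
    counts = Counter(r["isp"] for r in records)
    total = sum(counts.values())
    kept = [(isp, n) for isp, n in counts.items() if n >= min_count]
    kept.sort(key=lambda item: item[1], reverse=True)
    # Share is relative to all records, including the filtered-out small ISPs.
    return [(isp, n, n / total) for isp, n in kept]
```

If one ISP holds most of the anomalous records, that points toward automation rather than a product or SDK change.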
Product version/build-id
- Did this start in a specific product version? If so, what changed in that version (work with the product team to answer this question)?
- Is this build-id a known Mozilla build-id? If not, it could be a clone, fork, or sideloaded build.
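For the build-id question, a minimal sketch assuming you already have a set of build-ids known to come from official Mozilla builds (how that set is obtained is not covered here, and the `build_id` field name is an assumption): count the records whose build-id is not in the known set.

```python
from collections import Counter


def unknown_build_ids(records, known_build_ids):
    """Count records per build-id that is NOT in `known_build_ids`.

    A large volume from unknown build-ids may indicate a clone, fork,
    or sideloaded build. The "build_id" key is an assumption.
    Returns (build_id, count) pairs, most frequent first.
    """
    unknown = Counter(
        r["build_id"] for r in records if r["build_id"] not in known_build_ids
    )
    return unknown.most_common()
```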
Glean SDK version
- Did this start in a new Glean version? If so, what changed in that version (work with the Glean team to answer this question)?
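Both version questions (product and Glean SDK) come down to finding the first version where the anomaly appears. A minimal sketch, assuming you have already computed a per-version anomaly rate (for example, the fraction of pings with the suspicious value) and picked a threshold you consider anomalous; the input shape and threshold are hypothetical. Versions are compared numerically, so "10.0.0" sorts after "9.1.0".

```python
def parse_version(version):
    """Turn a dotted version string such as "52.0.1" into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def first_anomalous_version(rate_by_version, threshold):
    """Return the lowest version whose anomaly rate exceeds `threshold`,
    or None if no version does.

    `rate_by_version` maps version strings to an anomaly rate. Only purely
    numeric dotted versions are supported in this sketch.
    """
    for version in sorted(rate_by_version, key=parse_version):
        if rate_by_version[version] > threshold:
            return version
    return None
```

This only finds where the rate first crosses the threshold; it does not tell you whether later versions recover, so it is worth eyeballing the full per-version series as well.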
Other library version changes?
- Check Application Services updates, Gecko updates, etc. to see if the issue can be tied to a specific version change there. Glean relies on components such as viaduct and rkv, which can affect data collection if they regress.
OS SDK version (Android SDK, iOS targets, etc.)
- Something may have changed in the platform SDK that affects data collection.
- This typically shows up either as a change in platform lifecycle event behavior (see 0-duration pings, etc.) or in background task work (e.g. uploading pings).
Time difference between start_time/end_time and submission_timestamp
- Do the timestamps we record look reasonable, both for the ping's time window and for the delay between collection/submission and the ping's arrival at ingestion?
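A minimal sketch of both timestamp checks for a single ping, assuming `start_time`, `end_time`, and `submission_timestamp` have already been parsed into timezone-aware `datetime` objects. The two thresholds are illustrative assumptions, not established limits.

```python
from datetime import datetime, timedelta, timezone


def timestamp_flags(start_time, end_time, submission_timestamp,
                    max_window=timedelta(days=1),
                    max_delay=timedelta(days=3)):
    """Return human-readable flags for one ping's timestamps.

    window = end_time - start_time (the ping's time window)
    delay  = submission_timestamp - end_time (collection to ingestion)
    """
    flags = []
    window = end_time - start_time
    delay = submission_timestamp - end_time
    if window < timedelta(0):
        flags.append("end_time before start_time")
    elif window > max_window:
        flags.append("ping window longer than max_window")
    if delay < timedelta(0):
        flags.append("received before end_time (client clock skew?)")
    elif delay > max_delay:
        flags.append("submission delay longer than max_delay")
    return flags


if __name__ == "__main__":
    utc = timezone.utc
    start = datetime(2023, 10, 17, 10, 0, tzinfo=utc)
    end = datetime(2023, 10, 17, 10, 30, tzinfo=utc)
    received = datetime(2023, 10, 17, 9, 0, tzinfo=utc)  # earlier than end_time
    print(timestamp_flags(start, end, received))
```

A negative delay usually points at the client clock rather than ingestion, while a long positive delay is more suggestive of upload problems.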
What Glean errors are there?
- Are there any networking or other telemetry errors that might point to the issue? This could be an ingestion issue, etc.
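A minimal sketch for tallying error counts across many pings, assuming each ping's error metrics have already been flattened into a dict of `{error_metric_name: {metric_label: count}}`. Glean reports recording errors in labeled counters such as `glean.error.invalid_value`, labeled by the affected metric; extracting them from the ping payload is left out here, and the input shape is an assumption.

```python
from collections import Counter


def tally_errors(pings):
    """Sum error counts per (error metric, labeled metric) across pings.

    Each element of `pings` maps an error metric name to a dict of
    {metric label: count}; this shape is an assumption.
    Returns (error_metric, label, total) tuples, largest total first.
    """
    totals = Counter()
    for errors in pings:
        for error_metric, labels in errors.items():
            for label, count in labels.items():
                totals[(error_metric, label)] += count
    return [(metric, label, n) for (metric, label), n in totals.most_common()]
```

Comparing these totals between the anomaly window and a baseline window shows whether a particular metric started erroring at the same time the anomaly appeared.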
Hardware manufacturer/version/etc.
- Does this only happen on older or newer hardware?
Updated 9 months ago
Component: Glean Platform → Glean: SDK

Comment 1 (Assignee), 9 months ago
The document linked above is the result of this work.
Assignee: nobody → tlong
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED