Closed Bug 1693708 Opened 5 years ago Closed 5 years ago

Narrow down the zero_byte_load probe to tailor results for YSOD

Categories

(Core :: Networking: JAR, task, P2)

task

Tracking

()

RESOLVED FIXED
87 Branch
Tracking Status
firefox87 --- fixed

People

(Reporter: zbraniecki, Assigned: zbraniecki)

References

(Blocks 1 open bug)

Details

(Whiteboard: [necko-triaged])

Attachments

(1 file)

The probe that landed in bug 1693146 returned 1.5 million results on nightly in a day. Let's filter out everything we don't care about for now.

I want to carefully narrow down the number of events we're getting, to zero in on the ones that are most likely causing YSOD.

That is a bit of a guessing game until we have correlations, but based on the results I posted in bug 1693711, I believe I can reliably cut ~1.3M out of the 1.5M events we got today without losing the data we're hunting for.

I'm going to document what I'm filtering out, both in the code and here, to keep awareness that we are filtering data and may want to unfilter it later to analyze it for other errors or for correlation with YSOD:

  1. Remove "other" category
    Volume: 60% of events
    Why: "Other" is dominated by SVG and JSON loads unrelated to YSOD. It is worth noting that the most common status there is NS_BINDING_ABORTED, not NS_ERROR_FILE_NOT_FOUND. May be worth investigating separately.

  2. Remove "FTL" when matched with "NS_ERROR_FILE_NOT_FOUND"
    Volume: 17%
    Why: Fluent's L10nRegistry intentionally attempts to load files from the toolkit/browser omni.ja to learn whether a file is present. File not found is an expected outcome of such a test, and we cache it heavily so that we don't fire it multiple times.
    It is not causing YSODs, and even if some of those calls are errors, Fluent will recover, report to the console, and display as much as it can without breaking the UI (think CSS style). As a result, it's not worth investigating file not found for it. If other errors show up, I'm keeping them in the probe.

  3. Remove "JS" when not coming from "omni.ja!"
    Volume: 10%
    Why: JS coming from extensions may be worth investigating by extension authors, but it is not related to our main sources of YSODs.

  4. Remove "DTD" when starting with "omni.ja!/res/dtd"
    Volume: 7%
    Why: I believe "res/dtd" paths are not interesting for our use case, as they don't cause the most common DTD-related YSODs: neither NO_ELEMENTS nor MISSING_ENTITY (which is located in the first localization DTD callsite, not svg11.dtd style).
    The volume of such missing DTD files is suspicious, and some paths, like omni.ja!/chrome/toolkit/content/global/DTD/xhtml1-strict.dtd, indicate that we may be constructing wrong paths in some generated files. May be worth investigating separately.

Those 4 filters will remove ~95% of the events, leaving fewer than 70k events per day, which should be much more manageable. If we don't see a strong correlation in that group, we can carefully open up the filters to let more events through from the filtered group.
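
For illustration, the four rules above could be sketched as a single predicate. This is only a rough sketch in Python, not the code that landed in the patch: the `ZeroByteLoad` record, its field names, the `should_report` function, and the assumption that paths are normalized to start at "omni.ja!" are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZeroByteLoad:
    """Hypothetical record for one zero-byte load event (not the real probe schema)."""
    category: str  # e.g. "FTL", "JS", "DTD", "other"
    status: str    # e.g. "NS_ERROR_FILE_NOT_FOUND", "NS_BINDING_ABORTED"
    path: str      # assumed to be normalized so omni.ja paths start with "omni.ja!"


def should_report(event: ZeroByteLoad) -> bool:
    """Return True if the event survives the four filters listed above."""
    # 1. Drop the whole "other" category.
    if event.category == "other":
        return False
    # 2. Drop FTL only when the status is file-not-found; other FTL errors stay.
    if event.category == "FTL" and event.status == "NS_ERROR_FILE_NOT_FOUND":
        return False
    # 3. Drop JS that does not come from omni.ja (e.g. extensions).
    if event.category == "JS" and "omni.ja!" not in event.path:
        return False
    # 4. Drop DTD loads under omni.ja!/res/dtd.
    if event.category == "DTD" and event.path.startswith("omni.ja!/res/dtd"):
        return False
    return True
```

Keeping each rule as its own early return would make it easy to lift a single filter later if we decide to open up one of the filtered groups.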

Assignee: nobody → zbraniecki
Status: NEW → ASSIGNED
Pushed by zbraniecki@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/adf8225b25af Narrow down the scope of the YSOD probe to limit the volume of events. r=mossop
Severity: -- → S3
Priority: -- → P2
Whiteboard: [necko-triaged]
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch