Closed Bug 1704680 Opened 4 years ago Closed 4 years ago

Annotate crash reports when child-process profiler shuts down

Categories

(Core :: Gecko Profiler, task, P1)

RESOLVED FIXED
89 Branch
Tracking Status
firefox89 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

Attachments

(3 files)

This is to help get a better idea of when shutdown timeouts from bug 1613798 occur when the profiler is running.

In particular, this should help distinguish between timeouts due to a big profile buffer that can take a significant amount of time to process, and suspicious reports that seem to indicate some issue (cond-var UAF? deadlock?) while trying to shut down the thread itself after the final buffer processing task has already ended.

This should help investigate bug 1613798.
In particular, this should show whether the profiler is taking too long to serialize its profile buffer, or whether something else is happening afterwards.

Comment on attachment 9215359 [details]
Bug 1704680 - Annotate crash reports when child-process profiler shuts down - r?canaltinova!,gsvelto!

Request for data collection review form (source)

  1. What questions will you answer with this data?

There are timeout crash reports that involve a child process main thread waiting for the profiler thread to end. We want to know whether the profiler was actually running, and how far it got into its shutdown sequence.

  1. Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?

We want to reduce shutdown hangs, to improve user experience.

  1. What alternative methods did you consider to answer these questions? Why were they not sufficient?

These issues are difficult to reproduce, so we need data from intermittent crashes to better focus on the root cause.

  1. Can current instrumentation answer these questions?

No.

  1. List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories found on the Mozilla wiki.

Two related measurements captured in a single crash report annotation "ProfilerChildShutdownPhase":

Measurement Description: "Profiling/Not profiling", indicating if the user was running the profiler when the crash happened.
Data Collection Category: Category 2 "Interaction data"
Tracking Bug #: 1613798

Measurement Description: Phase name, indicating the last step reached by the profiler code shutdown sequence, such as "ShutdownProfilerChild complete, waiting for thread shutdown", "SendShutdownProfile (sent)", etc.
Data Collection Category: Category 1 "Technical data"
Tracking Bug #: 1613798
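
As a rough illustration of how a single annotation can carry both measurements, the sketch below splits one value into the profiling state and the phase name. The "<state> - <phase>" layout, the " - " separator, and the names `ShutdownPhase` and `split_shutdown_phase` are assumptions made for this example only; the actual string format is whatever the patch in D111786 writes.

```python
from typing import NamedTuple


class ShutdownPhase(NamedTuple):
    """Both measurements recovered from one annotation value (hypothetical type)."""
    profiling: bool
    phase: str


def split_shutdown_phase(value: str) -> ShutdownPhase:
    """Split an annotation value, ASSUMING the layout '<state> - <phase>'.

    The separator and the two state strings are assumptions for illustration,
    not the confirmed format of ProfilerChildShutdownPhase.
    """
    # partition() splits on the first separator only, so a phase name that
    # itself contains " - " is kept whole.
    state, sep, phase = value.partition(" - ")
    if not sep:
        raise ValueError(f"no ' - ' separator in {value!r}")
    if state not in ("Profiling", "Not profiling"):
        raise ValueError(f"unexpected profiling state {state!r}")
    return ShutdownPhase(profiling=(state == "Profiling"), phase=phase)
```

Keeping both measurements in one string means a crash report needs only one annotation key, at the cost of a small parsing step during analysis.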

  1. Please provide a link to the documentation for this data collection which describes the ultimate data set in a public, complete, and accurate way.

Described in CrashAnnotations.yaml, see https://phabricator.services.mozilla.com/D111786 for what it will look like.

  1. How long will this data be collected? Choose one of the following:

I want this data to be collected for 6 months initially (potentially renewable).
The main goal is to fix bug 1613798, but this annotation will stay useful after that, to catch other potential issues, including if the profiler takes too long to process its data during shutdown.

  1. What populations will you measure?

All.

  1. If this data collection is default on, what is the opt-out mechanism for users?

The general opt-out for telemetry / crash annotations.

  1. Please provide a general description of how you will analyze this data.

We will gather data from crash reports, and from that hopefully focus efforts on a smaller area of code in order to fix bug 1613798.
And in future crash reports, this will help measure the impact of the profile buffer processing.
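
One possible shape for that analysis is a simple tally of the last phase reached, grouped by profiling state. This is only a sketch: the input is assumed to be a list of annotation strings already extracted from crash reports, the "<state> - <phase>" layout is an assumption, and `tally_last_phases` is a hypothetical helper, not an existing tool.

```python
from collections import Counter
from typing import Iterable, Tuple


def tally_last_phases(annotations: Iterable[str]) -> "Counter[Tuple[str, str]]":
    """Count crash reports per (state, phase) pair.

    ASSUMPTION: each value looks like '<state> - <phase>'. Values that do not
    match are counted under the state 'unknown' rather than being dropped.
    """
    counts: "Counter[Tuple[str, str]]" = Counter()
    for value in annotations:
        state, sep, phase = value.partition(" - ")
        if sep:
            counts[(state, phase)] += 1
        else:
            counts[("unknown", value)] += 1
    return counts


# Example with made-up values, for illustration only.
sample = [
    "Profiling - SendShutdownProfile (sent)",
    "Profiling - SendShutdownProfile (sent)",
    "Not profiling - ShutdownProfilerChild complete, waiting for thread shutdown",
]
for (state, phase), n in tally_last_phases(sample).most_common():
    print(f"{n:3d}  {state}: {phase}")
```

A phase that dominates the tally would point at the part of the shutdown sequence worth investigating first.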

  1. Where do you intend to share the results of your analysis?

On the tracking bug 1613798, and follow-up bugs, including under meta bug 1577656 (Profiler output performance issues).

  1. Is there a third-party tool (i.e. not Telemetry) that you are proposing to use for this data collection? If so:

No.

Attachment #9215359 - Flags: data-review?(chutten)
Attached file data collection review
Attachment #9216188 - Flags: data-review?(chutten)

Comment on attachment 9216188 [details]
data collection review

DATA COLLECTION REVIEW RESPONSE:

Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes.

Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is Telemetry, so it can be controlled through Firefox's Preferences.

If the request is for permanent data collection, is there someone who will monitor the data over time?

No. This collection will expire in six months.

Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

Is the data collection request for default-on or default-off?

Default on for all channels.

Does the instrumentation include the addition of any new identifiers?

No.

Is the data collection covered by the existing Firefox privacy notice?

Yes.

Does there need to be a check-in in the future to determine whether to renew the data?

Yes. :gsquelart is responsible for renewing or removing the collection before it expires in six months.


Result: datareview+

Attachment #9216188 - Flags: data-review?(chutten) → data-review+
Attachment #9215359 - Flags: data-review?(chutten)
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e9130bfb04d1 Annotate crash reports when child-process profiler shuts down - r=canaltinova,gsvelto
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 89 Branch

mozilla-pipeline-schemas needs to be updated for the new column to be made available in BigQuery
