Closed Bug 1589124 Opened 3 years ago Closed 2 years ago

Add metric(s) to report networking errors when sending pings

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdroettboom, Assigned: brizental)

References

Details

(Whiteboard: [telemetry:glean-rs:m16])

Attachments

(2 files)

It might be useful to have counter metrics that report:

  • Number of 5xx errors
  • Number of networking timeouts

These would not appear on the ping being sent, but on subsequent pings.
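
A minimal sketch of the proposal above, assuming hypothetical names (`UploadFailureCounters`, `recordStatus`, `recordTimeout`); none of these are Glean SDK APIs. It only illustrates tallying 5xx responses and timeouts as they happen, so that the totals could be attached to a later ping rather than the one that failed.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical in-memory counters for illustration; not part of the Glean SDK.
final class UploadFailureCounters {
    private final AtomicInteger serverErrors = new AtomicInteger();
    private final AtomicInteger timeouts = new AtomicInteger();

    // Count only 5xx responses; every other status code is ignored here.
    void recordStatus(int httpStatus) {
        if (httpStatus >= 500 && httpStatus <= 599) {
            serverErrors.incrementAndGet();
        }
    }

    // Count one networking timeout.
    void recordTimeout() {
        timeouts.incrementAndGet();
    }

    int serverErrors() {
        return serverErrors.get();
    }

    int timeouts() {
        return timeouts.get();
    }
}
```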

Open question: Should these be on the metrics ping or the baseline ping?

Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m11]
Attached file Data review request
Attachment #9116133 - Flags: data-review?(chutten)

Note: I'm proceeding with the data review on this now, but it might be most efficient to handle the implementation of this after the refactoring of the uploaders to move more things into Rust.

Assignee: nobody → mdroettboom
Comment on attachment 9116133 [details]
Data review request

DATA COLLECTION REVIEW RESPONSE:

    Is there or will there be documentation that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes. This collection is documented [in the Book of Glean](https://mozilla.github.io/glean/book/user/collected-metrics/metrics.html).

    Is there a control mechanism that allows the user to turn the data collection on and off?

Yes. This collection is controlled through the Glean SDK's `set_upload_enabled` API which is exposed to the user through application UI in each embedding application. For instance, Fenix has it in Settings > Data Collection.

    If the request is for permanent data collection, is there someone who will monitor the data over time?

Yes, Michael Droettboom is responsible.

    Using the category system of data types on the Mozilla wiki, what collection type of data do the requested measurements fall under?

Category 1, Technical.

    Is the data collection request for default-on or default-off?

Default on for all channels.

    Does the instrumentation include the addition of any new identifiers?

No.

    Is the data collection covered by the existing Firefox privacy notice?

Yes.

    Does there need to be a check-in in the future to determine whether to renew the data?

No. This collection is permanent.

---
Result: datareview+
Attachment #9116133 - Flags: data-review?(chutten) → data-review+
Depends on: 1605077
Whiteboard: [telemetry:glean-rs:m11] → [telemetry:glean-rs:m16]
Assignee: mdroettboom → nobody

I see this includes a "timeout" counter. We don't currently catch timeout errors specifically (at least not in Kotlin). What did you have in mind for this, Mike? 408 / 504 status codes?

Assignee: nobody → brizental
Flags: needinfo?(mdroettboom)

It looks like timeouts throw a java.net.SocketTimeoutException (see here), which is a subclass of IOException, so they end up in the catch block we already have for IOException. We should probably catch timeouts specifically and report them through this mechanism. But maybe there are other kinds of IOException that we should also report through this mechanism...
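
To illustrate catching timeouts specifically, here is a hedged sketch. The `Upload` interface, the `Outcome` enum, and `UploadErrorClassifier` are hypothetical names invented for this example, not the existing uploader code. The one point it shows is ordering: the `SocketTimeoutException` clause has to come before the `IOException` clause, because the compiler rejects a subclass catch placed after its superclass.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;

// Hypothetical helper for illustration; not the actual Glean uploader.
final class UploadErrorClassifier {
    enum Outcome { SUCCESS, TIMEOUT, OTHER_IO_ERROR }

    // Stand-in for whatever performs the HTTP request.
    interface Upload {
        void run() throws IOException;
    }

    static Outcome attempt(Upload upload) {
        try {
            upload.run();
            return Outcome.SUCCESS;
        } catch (SocketTimeoutException e) {
            // More specific type first: this is the timeout case.
            return Outcome.TIMEOUT;
        } catch (IOException e) {
            // Every other I/O failure still lands here.
            return Outcome.OTHER_IO_ERROR;
        }
    }
}
```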

Flags: needinfo?(mdroettboom)

That's for java.net only. Fenix replaces the uploader with a Gecko/Necko-powered one.
We would need to extend our reporting API so that uploaders can pass this information along, and then also update our consumers.
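
One way such a reporting API could look, as a sketch only: `UploadReport` and its factory methods are assumptions made for illustration and do not describe the API that was eventually shipped. The idea is that a replacement uploader (for example a Gecko/Necko-backed one) translates its own error types into a transport-neutral value, so the SDK never needs to know about `java.net` exceptions.

```java
// Hypothetical transport-neutral result that any uploader could hand back.
final class UploadReport {
    enum Kind { HTTP_STATUS, TIMEOUT, OTHER_FAILURE }

    final Kind kind;
    final int httpStatus; // meaningful only when kind == HTTP_STATUS

    private UploadReport(Kind kind, int httpStatus) {
        this.kind = kind;
        this.httpStatus = httpStatus;
    }

    // The server answered; carry the status code as-is.
    static UploadReport httpStatus(int code) {
        return new UploadReport(Kind.HTTP_STATUS, code);
    }

    // The request timed out before any response arrived.
    static UploadReport timeout() {
        return new UploadReport(Kind.TIMEOUT, -1);
    }

    // Any other failure the uploader could not classify.
    static UploadReport otherFailure() {
        return new UploadReport(Kind.OTHER_FAILURE, -1);
    }

    boolean isServerError() {
        return kind == Kind.HTTP_STATUS && httpStatus >= 500 && httpStatus <= 599;
    }
}
```

With a shape like this, each consumer that supplies its own uploader would only need to map its native timeout signal to `UploadReport.timeout()`.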

Attached file GitHub Pull Request
Priority: P3 → P1
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED