Closed Bug 1777233 Opened 3 years ago Closed 2 years ago

Manage ping submission during shutdown a bit better (FOG)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

116 Branch

Tracking Flags:

Tracking

Status

firefox116

---

fixed

People

(Reporter: Dexter, Assigned: perry.mcmanis)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1777233 - Improve behavior around Ping Submission during shutdown r?chutten 2 years ago Perry McManis [:perry.mcmanis] 48 bytes, text/x-phabricator-request		Details \| Review

Alessio Placitelli [:Dexter]

Reporter

Description

•

3 years ago

In bug 1776201 we found that if we submit FOG pings during shutdown, we'll file to send them most likely because the network layer was already shut down.

We currently shut down FOG on XPCOMShutdown, but we have some other phases we could potentially fit into.

Legacy telemetry, for example, will reject any ping submission/recording happening after AppShutdownTelemetry.

Things we should figure out:

when should we stop doing networking?
should we only allow ping submission (with upload disable) after networking stops working and until XPCOMShutdown?

Alessio Placitelli [:Dexter]

Reporter

Updated

•

3 years ago

Comment 1

•

3 years ago

Anything after (or during?) shutdown phase AppShutdownNetTeardown will fail. See logic around tooLateToSend in TelemetrySend where we respond to the JS topic version of the shutdown phase by declaring that everything attempting to be sent from then on is to pend and be stored.

Similar logic could be worked into viaduct_uploader.rs. We have other topic observers in FOG Rust and RunOnShutdown shutdown phase listeners in C++ which could communicate to the uploader that we're too late to send.

A question for Jan-Erik when he returns, and Alessio for today is: Would the Glean SDK like an API for integrators to communicate network availability to? Or is responding to upload requests with recoverable errors the right way to do this?

Flags: needinfo?(alessio.placitelli)

Alessio Placitelli [:Dexter]

Reporter

Comment 2

•

3 years ago

(In reply to Chris H-C :chutten|PTO (back June 30) from comment #1)

Similar logic could be worked into viaduct_uploader.rs. We have other topic observers in FOG Rust and RunOnShutdown shutdown phase listeners in C++ which could communicate to the uploader that we're too late to send.

This sounds like a sensible thing to do. If we're too late to send, there's no reason why we should go through the troubles of attempting to use network :)

We should probably record some data about this, but not sure if this would strictly fit a "network error" definition.

A question for Jan-Erik when he returns, and Alessio for today is: Would the Glean SDK like an API for integrators to communicate network availability to?

Not sure. This feels like an integration-specific issue that should be solved within the FOG layer, given the complexity of the Firefox shutdown.

Or is responding to upload requests with recoverable errors the right way to do this?

Maybe, but it indeed feels "odd".

Flags: needinfo?(alessio.placitelli)

Chris H-C :chutten

Comment 3

•

3 years ago

With Jan-Erik back I can ni? him : )

Because I said I could communicate to the uploader we're too late to send... but the only way to do that is to say each attempted upload is a (drumroll) Recoverable Network Failure.

So FOG does need some way to communicate to the SDK that this particular upload request wasn't tried, please try later, but also don't record this as a network failure.

Either that, or we keep it as-is and treat recoverable network failures as "fine" (ie, don't try to fix them).

Flags: needinfo?(jrediger)

Jan-Erik Rediger [:janerik]

Comment 4

•

3 years ago

Right now there's no way for the uploader to communicate that it knows it can't upload anymore.
So even if it communicates once that it encountered a recoverable network failure, the uploader manager will poll again and give it a new task.

I think it would be worthwhile adding a return value that allows it to stop processing in these cases.

Flags: needinfo?(jrediger)

Chris H-C :chutten

Updated

•

3 years ago

Depends on: 1780806

Wesley Dawson [:whd]

Comment 5

•

3 years ago

•

Edited

With the rollout of Fx 103 at 100% we're seeing a marked increase in http "0"s that's being tracked in https://mozilla-hub.atlassian.net/browse/DSRE-928. These represent submissions where a client initiated an HTTP request but from GCP's perspective the client disconnected before the server could fully respond and e.g. a 200 response could be returned to the client. These appear to be, just as in bug #1776201, mostly newtab pings.

I suspect this is related to the work in this bug and other followup bugs from the incident investigation. My assumption is that we'd ideally like to minimize 0s via client behavior changes and that work on this bug might decrease the number of 0s we're seeing, but that any pings that are interrupted in this way will eventually be resent and so it's not particularly concerning that we're seeing so many disconnects. If this doesn't sound like an accurate characterization of the current state please let me know and I will prioritize server side investigation of the issue.

EDIT: :chutten has confirmed this is likely not a server side issue and the corresponding Jira ticket has been closed

Chris H-C :chutten

Comment 6

•

2 years ago

Next steps: Wait for bug 1780806 to get into a release, vendor that release, figure out the best way to communicate we're in/after netteardown (note that PastShutdownPhase must be called on the main thread which we are most definitely not on when inside ViaductUploader, so we're looking at either an observer like this or registering during init a C++ closure to run at shutdown like this (notice the guard for ensuring we're not already beyond the phase we're registering a shutdown handler for) plus a little state.), then if we're in/after net teardown respond with Done all the time.

Perry McManis [:perry.mcmanis]

Assignee

Updated

•

2 years ago

Assignee: nobody → pmcmanis

Chris H-C :chutten

Updated

•

2 years ago

Comment 7

•

2 years ago

Attached file Bug 1777233 - Improve behavior around Ping Submission during shutdown r?chutten — Details

Pulsebot

Comment 8

•

2 years ago

Pushed by pmcmanis@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a0fa68309a2e Improve behavior around Ping Submission during shutdown r=chutten

Norisz Fay [:noriszfay]

Comment 9

•

2 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a0fa68309a2e

Status: NEW → RESOLVED

Closed: 2 years ago

status-firefox116: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 116 Branch

Chris H-C :chutten

Updated

•

2 years ago

Blocks: 1845124

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Manage ping submission during shutdown a bit better (FOG)

Categories

(Toolkit :: Telemetry, enhancement)

Tracking

()

People

(Reporter: Dexter, Assigned: perry.mcmanis)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Updated

Updated

Comment 7

Comment 8

Comment 9

Updated

Attachment

General

Description

File Name

Content Type