Closed Bug 1553540 Opened 5 years ago Closed 5 years ago

Collect SSD/HDD telemetry environment data lazily, off the main thread, after initial startup (instead of eagerly on startup on the main thread)

Categories

(Toolkit :: Telemetry, task, P1)

task

Tracking

()

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: Gijs, Assigned: Gijs)

References

(Blocks 1 open bug)

Details

(Keywords: main-thread-io, perf, Whiteboard: [fxperf:p2] [fxperfsize:S])

Attachments

(1 file)

As per summary. This data is non-critical and gathering it (which is comparatively expensive) shouldn't hold up startup.

Is there as yet an abstraction for "stuff we want to add to the environment off an idle task after startup is complete", per bug 1372845?

If not, where is the best place to introduce such an abstraction? Should/can we delay submitting pings until we've ensured this information is present in the env, and if so, how? (keeping in mind we might shut down before the code ever runs. We can potentially add code to OOP ping-sender if that's the best way of dealing with this, assuming we've completed that switch at this point - I don't know for sure.)

Flags: needinfo?(alessio.placitelli)
See Also: → 1553546

(In reply to :Gijs (he/him) from comment #1)

Is there as yet an abstraction for "stuff we want to add to the environment off an idle task after startup is complete", per bug 1372845?

I'm afraid we didn't get there, no :(

If not, where is the best place to introduce such an abstraction?

This is a great question. We do have some lazy loaded stuff (that you implemented ;) ) in TelemetryEnvironment.jsm. I guess a similar approach could be used there.

Should/can we delay submitting pings until we've ensured this information is present in the env, and if so, how? (keeping in mind we might shut down before the code ever runs. We can potentially add code to OOP ping-sender if that's the best way of dealing with this, assuming we've completed that switch at this point - I don't know for sure.)

That depends on who needs the information. We usually don't send pings right at startup, not until 60s have passed since the process started (unless addons are toggled on/off or there was some other change there). However, as soon as we receive 'profile-after-change', we annotate the environment in the crash reporter, and that might need some of the bits that are lazily loaded.

Right now, before we shut down, if the "telemetry delayed init" is still ongoing, we block shutdown on that. So data will/should always be available for the shutdown ping.

I'm not sure I understand how the 'ping-sender' ties into this: it is supposed to be a dumb sender that just forwards pings we assemble.

@Chris, Georg - thoughts?

Flags: needinfo?(gfritzsche)
Flags: needinfo?(chutten)
Flags: needinfo?(alessio.placitelli)

The system is such that we already do the "delay submitting pings until we've ensured this information is present in the env" part, so pings that need an Environment will just happily wait forever for it to be ready before we attempt to send them.

Alessio is quite right, pingsender takes a fully-formed ping and tries (once) to throw it through the network. If we're awaiting the Environment, there's no fully-formed ping.

I foresee no major problems in delaying Environment construction... once we deal with how we annotate the crash reporter (which is deadlocked on either a) finding someone who examines Crash Reports who will tell us definitively what we'll be costing them in effort by taking the Environment away from early crashes b) Making the decision that, even not knowing the precise cost, it is worth doing anyway).

Flags: needinfo?(chutten)

Chris, could you send a proposal to fx-data-dev for any planned changes here?
That is our channel to announce relevant changes to consumers of our data.

Flags: needinfo?(gfritzsche) → needinfo?(chutten)
Priority: -- → P1

(In reply to Alessio Placitelli [:Dexter] from comment #2)

However, as soon as we receive 'profile-after-change', we annotate the environment in the crash reporter, and that might need some of the bits that are lazily loaded.

I'm confident that this data (for ssd/hdd models) is not needed for crash reports - it was only added recently (in bug 1533861), and not for crashreporting purposes

(In reply to Georg Fritzsche [:gfritzsche] from comment #4)

Chris, could you send a proposal to fx-data-dev for any planned changes here?
That is our channel to announce relevant changes to consumers of our data.

Sorry, how would this affect consumers? They can already not assume that this data is there, right? Also, "proposal" and "announce changes" sound quite different to me. Does a formal proposal need writing about this specific change? Or are you just saying we need to notify the group that this change is coming? Or something else?

Flags: needinfo?(gfritzsche)

(In reply to :Gijs (he/him) from comment #5)

(In reply to Georg Fritzsche [:gfritzsche] from comment #4)

Chris, could you send a proposal to fx-data-dev for any planned changes here?
That is our channel to announce relevant changes to consumers of our data.

Sorry, how would this affect consumers? They can already not assume that this data is there, right? Also, "proposal" and "announce changes" sound quite different to me. Does a formal proposal need writing about this specific change? Or are you just saying we need to notify the group that this change is coming? Or something else?

In some cases data may be missing, which has impact on the collected data. This is why we filed bug 1552858 etc. and try to determine if anyone would be impacted.
In the absence of clear stakeholders, the next best thing is to announce an "intent to change" to that group, in case we missed something.
In the meantime, to not block you much, we could probably find a path to move ahead under the assumption this is ok (if this looks overall reasonable to Chris).

Flags: needinfo?(gfritzsche)

Email sent to fx-data-dev and stability lists. Let's see if anyone has a problem!

Flags: needinfo?(chutten)

(In reply to Chris H-C :chutten from comment #7)

Email sent to fx-data-dev and stability lists. Let's see if anyone has a problem!

I don't really understand how this happened, but to be clear, I'm not suggesting we completely stop collecting the telemetry environment early on startup. I'm only suggesting we take the items that are expensive and not needed out of the startup path and put them in the environment later, making the initial initialization cheaper. I replied to the ML post but I think it got caught in moderation.

Ack, sorry, I skipped a couple of steps without communicating them.

Removing the Environment annotation from startup is something that would satisfy these new requests in a way that there seems to have been (quiet, vague) support for over the past year. We (the Firefox Telemetry Team) have been trying to get clarity on this issue for quite some time so this choice of a solution that is technically simple, architecturally clean, and also slightly incendiary was seen as a good way to pop us out of the uncertainty marsh we've found ourselves stuck in.

I am sorry for not working through this with you or communicating our intent clearly before sending out that email. At the time I thought of this as just another bug asking for clarity we didn't have to give and didn't properly consider that it could appear as though you were the one proposing this change.

To be clear: I am proposing that we move the Environment annotation later in startup to satisfy this and other Telemetry startup performance bugs.

(In reply to Chris H-C :chutten from comment #9)

Ack, sorry, I skipped a couple of steps without communicating them.

Removing the Environment annotation from startup is something that would satisfy these new requests in a way that there seems to have been (quiet, vague) support for over the past year. We (the Firefox Telemetry Team) have been trying to get clarity on this issue for quite some time so this choice of a solution that is technically simple, architecturally clean, and also slightly incendiary was seen as a good way to pop us out of the uncertainty marsh we've found ourselves stuck in.

I am sorry for not working through this with you or communicating our intent clearly before sending out that email. At the time I thought of this as just another bug asking for clarity we didn't have to give and didn't properly consider that it could appear as though you were the one proposing this change.

Ah, right. Likewise, sorry for hijacking your plans on the ML thread then...

To be clear: I am proposing that we move the Environment annotation later in startup to satisfy this and other Telemetry startup performance bugs.

OK, that makes sense - the problem is that in bug 1363586 there was significant concern at least relating to the antivirus/firewall data which is currently in the telemetry env, so we'll want to make sure to cater for that usecase. So perhaps I was too pessimistic... On the other hand, if we can gradually move things over I think there shouldn't be too much of a perf reason to move all the work to "late" in startup, as it were, so it might not be needed?

It sounds from bug 1363586 there's a plan to move the AV annotations to top-level (ie searchable in socorro) annotations whose values are fetched in the analyzer. I don't foresee any complications with the "move the env annotation to delayedInit" plan except perhaps the timing (there might be a span of builds where crashes from "early but not too early" in startup have no telemetry env).

Is that your read of the situation as well?

See Also: → 1558470
Attachment #9071088 - Attachment description: Bug 1553540 - switch disk information collection in system info off the main thread, r?aklotz!,chutten!,jya → Bug 1553540 - switch disk information collection in system info off the main thread, r?aklotz!,chutten!,jya!
Summary: Collect SSD/HDD telemetry environment data lazily, off the main thread, after idle after startup (instead of eagerly on startup on the main thread) → Collect SSD/HDD telemetry environment data lazily, off the main thread, after initial startup (instead of eagerly on startup on the main thread)
See Also: → 1558518
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/4be6e0cf9a6c
switch disk information collection in system info off the main thread, r=aklotz,chutten,jya,arai
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Blocks: 1543096
See Also: → 1601678
Regressions: 1606556
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: