Open Bug 1633921 Opened 2 years ago Updated 3 months ago

600ms CreateFile on update.test on reference hardware during startup

Categories

(Toolkit :: Application Update, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: dthayer, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [fxperf:p2])

Okay the title for this bug is click bait, as it probably doesn't normally take this long. I just happen to have recorded a session on a spinny disk where, three seconds into startup, we spend 600ms in CreateFile on update.test. I looked at a few other procmon logs on the same machine and I see between 40ms and 100ms in this same call. Molly, do you know why we need to do this during startup? Could it be deferred? It always seems to be happening simultaneously (but on the same thread) with a 4096-byte read from C:\$Secure:$SDH:$INDEX_ALLOCATION, which I wish I understood, but do not.

Flags: needinfo?(mhowell)

Well, I don't think we're really doing anything to make that go that slow; what we're doing is a CreateFile to see if we can get write permission in the relevant directory, and then immediately deleting the file. The relevant directory being the one we call the "update directory", the place where update state information and patch files are stored.

I also don't think we're doing this during every startup, at least I don't see that happening and I can't find where it would. There are a few places where we do that check, and as far as I can tell they're all hit in one of two conditions: either we are about to check for an update, or we have started up when there is an update at some point in the process already (likely meaning we just finished one and restarted), and we need to figure out what state that update is in and act on it. Both of those situations involve writing to the update directory, so we want to know early on if we can't (see bug 1567077, which added one such write check here, maybe the one you're seeing).

Now, it is possible for an update check to happen that early during startup if the build that's running is old enough (as defined by app.update.checkInstallTime.days, which is different for each channel). That's called from here. If that's what's happening, I don't think there would be any harm in deferring that check.

It always seems to be happening simultaneously (but on the same thread) with a 4096-byte read from C:\$Secure:$SDH:$INDEX_ALLOCATION, which I wish I understood, but do not.

$Secure is a special file where NTFS keeps its ACL's, so I think you're just seeing it look up some permissions. My guess is that that part of the operation is the source of the delay.

Flags: needinfo?(mhowell)

(In reply to Molly Howell (she/her) [:mhowell] from comment #1)

Now, it is possible for an update check to happen that early during startup if the build that's running is old enough (as defined by app.update.checkInstallTime.days, which is different for each channel). That's called from here. If that's what's happening, I don't think there would be any harm in deferring that check.

Ah - right. I have updates set to check but not download, so that must be why this is happening every run. Anyway, it looks like we could just move the this._checkForOldBuildUpdates(); line inside BrowserGlue._onFirstWindowLoaded to a task inside BrowserGlue._scheduleArbitrarilyLateIdleTasks()?

The doc comment on _scheduleArbitrarilyLateIdleTasks does scare me a bit, because I don't want app update checks getting deferred out of existence. I'd be happier if there was some middle ground available. Or failing that, do you have a sense for how many sessions fail to run one or more of these tasks in practice? It's possible that the thing I'm worried about isn't actually a problem.

Flags: needinfo?(dothayer)
Depends on: 1637024
Whiteboard: [fxperf] → [fxperf:p2]

(In reply to Molly Howell (she/her) [:mhowell] from comment #3)

The doc comment on _scheduleArbitrarilyLateIdleTasks does scare me a bit, because I don't want app update checks getting deferred out of existence. I'd be happier if there was some middle ground available. Or failing that, do you have a sense for how many sessions fail to run one or more of these tasks in practice? It's possible that the thing I'm worried about isn't actually a problem.

dthayer: gently bumping this. Looks like the dependent ticket Bug 1637024 has active work and this will follow after that. Is that a reasonable schedule for your/Perf Team's needs?

(In reply to Nick Alexander :nalexander [he/him] from comment #4)

dthayer: gently bumping this. Looks like the dependent ticket Bug 1637024 has active work and this will follow after that. Is that a reasonable schedule for your/Perf Team's needs?

Yeah - apologies for the silence on this. Yes, we're planning to have this follow after bug 1637024, and that should be a reasonable schedule for us.

Flags: needinfo?(dothayer)
Priority: -- → P3
You need to log in before you can comment on or make changes to this bug.