Closed Bug 701872 Opened 13 years ago Closed 6 years ago

Searching for cause of multi-minute startups

Categories

(Firefox :: General, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: taras.mozilla, Assigned: taras.mozilla)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [Snappy:p1])

Attachments

(2 files)

We've had a funny pattern in our startup numbers since telemetry got deployed. It seems like a significant percentage of startups is somewhere between 30seconds and 34hours. http://people.mozilla.org/~xstevens/telemetry/telemetry-firstpaint.png http://people.mozilla.org/~xstevens/telemetry/telemetry-firstpaint-over30.png

In above graphs x-startup time, y-frequency. Red is for interrupted, blue is for (as we as we know uninterrupted). Addons can be ruled out since a lot of these profiles do not have any addons installed.
 
I long assumed that these are caused by some dialog interrupting startup, but we've ruled that out with bug 670194. 

I filed bug 701563 because we determined that fsync()ing webapps sqlite blocked the browser UI on Linux(due to an ext4 bug the browser spent 8minutes fsync()ing stuff instead of starting up). However the graphs above are for windows. We can't explain wider startup problems with that.
In bug 701583, I made telemetry take snapshot of sqlite histograms that happen before startup. In the meantime here are all of the pings where
firstpaint > 60000ms, uptime < 5min
Seems like a lot of these have MOZ_STORAGE_ASYNC_REQUESTS_MS that are rather large and similar in magnitude to firstPaint, which would suggest that some queries are being blocked on a mutex.
Vladan is working on reporting slow sql statements via telemetry in bug 699051.
Depends on: 699051
Depends on: 701880
Disk cache causes atleast one of the 1min+ startups in attachment above. Filed bug 701909
Depends on: 701909
Bug 691774 is still unfixed, are there other interruptions that aren't being reported? Safe-mode dialogues, UAC, updater, import wizards, various extension dialogues, blocklist dialogues etc.? The new 3rd party extension dialogue? Are they all covered? Could 3rd party software dialogues, like firewall permissions get in the way too?
Depends on: 691774
(In reply to Daniel Cater from comment #3)
> Bug 691774 is still unfixed, are there other interruptions that aren't being
> reported? Safe-mode dialogues, UAC, updater, import wizards, various
> extension dialogues, blocklist dialogues etc.? The new 3rd party extension
> dialogue? Are they all covered? Could 3rd party software dialogues, like
> firewall permissions get in the way too?

691774 doesn't matter on windows. I'm looking at windows-alone here since it has the highest quality startup speed data(bugs on mac/linux). UAC might be a problem if the user clicks no(otherwise we get a new process).
I'm not worried about extensions because a lot of these are profiles without extensions. Also given how infrequently we are interrupted with extension manager, I'll assume that other prompts are even more rare. So yes, I agree there is room for improvement in measurement, but I also think that we have sufficient evidence that this is a real problem.
Whiteboard: [Snappy]
Whiteboard: [Snappy] → [Snappy:p1]
Assignee: nobody → tglek
MOZ_STORAGE_ASYNC_REQUESTS_MS and/or HTTP_SUBITEM_OPEN_LATENCY_TIME are taking very long in almost all of these reports. I've tried reproducing these issues by interrupting startup in various ways, with no luck. I'm now waiting on new reports with slowSQL telemetry data.
Depends on: 715402
Blocks: 447581
Managed in other bugs.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.