Closed Bug 1035290 Opened 11 years ago Closed 11 years ago

AsyncShutdownTimeout "FHR: Flushing storage shutdown"

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set
normal

Tracking

(firefox34 wontfix, firefox35+ wontfix, firefox36+ wontfix, firefox37+ wontfix)

RESOLVED WONTFIX
Tracking Status
firefox34 --- wontfix
firefox35 + wontfix
firefox36 + wontfix
firefox37 + wontfix

People

(Reporter: Yoric, Unassigned)

Details

(Keywords: topcrash)

According to Socorro, we have 25 of these on Nightly (1/8 of AsyncShutdownTimeout crashes) and 97 on Aurora (1/3 of AsyncShutdownTimeout crashes). Apparently, bug 1017706 helped but did not suffice.

From the 25 Nightly crashes, we have
* 15 with {shutdownInitiated:true, initialized:false, shutdownRequested:true, initializeHadError:false, providerManagerInProgress:false, storageInProgress:false, hasProviderManager:false, hasStorage:true, shutdownComplete:false}
* 8 with {shutdownInitiated:false, initialized:false, shutdownRequested:true, initializeHadError:false, providerManagerInProgress:true, storageInProgress:false, hasProviderManager:true, hasStorage:true, shutdownComplete:false}

From the 93 Aurora crashes, we have
* 87 with {shutdownInitiated:false, initialized:true, shutdownRequested:false, initializeHadError:false, providerManagerInProgress:false, storageInProgress:false, hasProviderManager:true, hasStorage:true}

We really should make sure that we fix the problem before it gets to Beta.
Pinging gps for ideas.
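For readers comparing the two Nightly states listed in the report, a minimal sketch of a flag diff may help. The helper name `diffStates` is hypothetical and not part of FHR; the two objects are copied from the report, and the sketch only shows which flags differ, not why.

```javascript
// Hypothetical helper (not part of FHR): list the flags whose values
// differ between two shutdown-state snapshots.
function diffStates(a, b) {
  const keys = new Set([...Object.keys(a), ...Object.keys(b)]);
  return [...keys].filter((k) => a[k] !== b[k]).sort();
}

// State seen in 15 of the 25 Nightly crashes (copied from the report).
const nightlyA = {
  shutdownInitiated: true, initialized: false, shutdownRequested: true,
  initializeHadError: false, providerManagerInProgress: false,
  storageInProgress: false, hasProviderManager: false, hasStorage: true,
  shutdownComplete: false,
};

// State seen in 8 of the 25 Nightly crashes (copied from the report).
const nightlyB = {
  shutdownInitiated: false, initialized: false, shutdownRequested: true,
  initializeHadError: false, providerManagerInProgress: true,
  storageInProgress: false, hasProviderManager: true, hasStorage: true,
  shutdownComplete: false,
};

console.log(diffStates(nightlyA, nightlyB));
// → [ 'hasProviderManager', 'providerManagerInProgress', 'shutdownInitiated' ]
```

The two states differ only in whether shutdown was initiated and in the provider manager flags; every other flag is identical.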
Flags: needinfo?(gps)
Forgot to mention: this is essentially the same as bug 944873, except that bug has been hijacked by other issues.
Unless I'm missing something, the running theory is bug 1030266 is the sole cause and fixing will make this go away. All the other solutions (timeouts on tasks, etc) are nice, but they can probably wait. They hopefully amount to a benign itch.
Flags: needinfo?(gps)
Well, we can certainly try and fix it.
This is a topcrash right now on Firefox 34. This crash signature has come back into the top 10 for 34, with 683/47826 crashes in the last 3 days and 1255/91832 crashes for 34.0.5. Some comments mention Flash, plugin container issues, and an inability to shut down. There isn't a crash signature field for this component, so people may file duplicate bugs by accident (this shows up on the lists of top crash signatures that don't have bugs associated yet). Yoric, is it worth keeping the one I filed open to avoid that confusion?
Flags: needinfo?(dteller)
Keywords: topcrash
I thought we had at least one open already with that crash signature? If that's not the case, yeah, you can keep it open. Just make it a meta bug, because all it means is "some component fails to shut down properly", and you have to look at the metadata to find out which component it is. Georg, I think that's in your lap, isn't it?
Flags: needinfo?(dteller) → needinfo?(georg.fritzsche)
(In reply to David Rajchenbach-Teller [:Yoric] (hard to reach until December 10th - use "needinfo") from comment #8)
> Georg, I think that's in your lap, isn't it?

I'm not sure what exactly you are asking me about?
I'm asking whether you are investigating the issue.
Depends on: 1110681
(In reply to David Rajchenbach-Teller [:Yoric] (hard to reach until December 10th - use "needinfo") from comment #10)
> I'm asking whether you are investigating the issue.

Ah, yes, I'm checking into this one right now. I don't think that I can take on all the data collection issues personally though.
I did a quick manual check of the "0 days ago" reports for Fx 34.* here:
http://yoric.github.io/are-we-shutting-down-yet-/?signature=~FHR%3A+Flushing+storage+shutdown&version=Firefox+34.0.5&version=Firefox+34.0#

For 18 of 19 of these reports, the state matches bug 1110681.
The single outlier is bp-6f645945-0102-4135-8bfa-f08392141212, which has this data:

    {
      "phase":"Metrics Storage Backend",
      "conditions":[
        {
          "name":"FHR: Flushing storage shutdown",
          "state":{
            "shutdownInitiated":true,
            "initialized":false,
            "shutdownRequested":true,
            "initializeHadError":false,
            "providerManagerInProgress":false,
            "storageInProgress":false,
            "hasProviderManager":false,
            "hasStorage":true,
            "shutdownComplete":false
          },
          "filename":"resource://gre/modules/HealthReport.jsm",
          "lineNumber":4335,
          "stack":[
            "resource://gre/modules/HealthReport.jsm:AbstractHealthReporter.prototype<.init/<:4335",
            ""
          ]
        }
      ]
    }
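A minimal sketch of reading such an annotation by script, for anyone repeating the manual check: it parses the JSON and lists, per blocking condition, the phase, the condition name, and the state flags that are set. The function name `summarizeAnnotation` is hypothetical; the sample text is the outlier report quoted above with the `filename`, `lineNumber`, and `stack` fields left out for brevity.

```javascript
// Hypothetical helper: summarize an AsyncShutdownTimeout annotation string.
function summarizeAnnotation(raw) {
  const parsed = JSON.parse(raw);
  return parsed.conditions.map((cond) => ({
    phase: parsed.phase,
    name: cond.name,
    // Keep only the flags that are true, in the order they appear.
    trueFlags: Object.keys(cond.state).filter((k) => cond.state[k] === true),
  }));
}

// Annotation from the outlier report (filename, lineNumber, stack omitted).
const outlierAnnotation = JSON.stringify({
  phase: "Metrics Storage Backend",
  conditions: [
    {
      name: "FHR: Flushing storage shutdown",
      state: {
        shutdownInitiated: true,
        initialized: false,
        shutdownRequested: true,
        initializeHadError: false,
        providerManagerInProgress: false,
        storageInProgress: false,
        hasProviderManager: false,
        hasStorage: true,
        shutdownComplete: false,
      },
    },
  ],
});

console.log(summarizeAnnotation(outlierAnnotation));
// The single condition has trueFlags:
// [ 'shutdownInitiated', 'shutdownRequested', 'hasStorage' ]
```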
(In reply to Georg Fritzsche [:gfritzsche] from comment #12)
> The single outlier is bp-6f645945-0102-4135-8bfa-f08392141212, which has
> this data:

And going by that state, this is waiting on the storage closing around here:
http://hg.mozilla.org/mozilla-central/annotate/0cf461e62ce5/services/healthreport/healthreporter.jsm#l625
Depends on: 1110691
Depends on: 1106036
So, I ran a proper analysis now based on the full release data we have for this.
Of 6212 AsyncShutdownTimeouts for "FHR: Flushing storage shutdown" [0], 3810 are on Fx 34.
Of those, 3579 have the same state as bug 1110681, so we need to push that bug.

The detailed breakdown:
{
  '{"shutdownRequested": true, "shutdownInitiated": false, "providerManagerInProgress": true, "hasProviderManager": true, "hasStorage": true, "initializeHadError": false, "initialized": false, "shutdownComplete": false, "storageInProgress": false}': 3579,
  '{"shutdownRequested": true, "shutdownInitiated": true, "providerManagerInProgress": false, "hasProviderManager": false, "hasStorage": true, "initializeHadError": false, "initialized": false, "shutdownComplete": false, "storageInProgress": false}': 219,
  '{"shutdownRequested": true, "shutdownInitiated": false, "providerManagerInProgress": false, "hasProviderManager": false, "hasStorage": false, "initializeHadError": false, "initialized": false, "shutdownComplete": false, "storageInProgress": true}': 7,
  '{"shutdownRequested": true, "shutdownInitiated": true, "providerManagerInProgress": false, "hasProviderManager": true, "hasStorage": true, "initializeHadError": false, "initialized": false, "shutdownComplete": false, "storageInProgress": false}': 5,
}

[0] http://bsmedberg.github.io/crash-stats-api-magic/analyze-crash.html?url=https%3A%2F%2Fdl.dropboxusercontent.com%2Fu%2F15124579%2Fasync_shutdown_timeout_crashes.json&rulecount=3&rule0_action=filter&rule0_fn=function%28d%29%20{%0A%20%20return%20d.version.indexOf%28%2234%22%29%20%3D%3D%200%20%26%26%0A%20%20%20%20%20%20%20%20%20d.async_shutdown_timeout.indexOf%28%22FHR%3A%20Flushing%20storage%20shutdown%22%29%20%3E%20-1%3B%0A}&rule1_action=map&rule1_fn=function%28d%29%20{%0A%20%20return%20JSON.parse%28d.async_shutdown_timeout%29.conditions%5B0%5D.state%3B%0A}&rule2_action=counter&rule2_fn=
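The filter, map, and counter rules encoded in the URL at [0] can be sketched as a standalone script. This is an approximation under stated assumptions, not the tool's implementation: the record fields `version` and `async_shutdown_timeout` come from the rule functions in that URL, while `canonicalKey`, `countStates`, and the sample records are hypothetical. Sorting the keys before counting is an addition here, so that the same state serialized in a different key order lands in the same bucket.

```javascript
// Hypothetical helper: serialize a state with sorted keys so that key
// order does not split identical states into separate buckets.
function canonicalKey(state) {
  const sorted = {};
  for (const k of Object.keys(state).sort()) sorted[k] = state[k];
  return JSON.stringify(sorted);
}

// Filter to Fx 34 FHR shutdown timeouts, map to the first condition's
// state, and count occurrences of each distinct state.
function countStates(records) {
  const counts = new Map();
  for (const d of records) {
    if (d.version.indexOf("34") !== 0) continue;
    if (d.async_shutdown_timeout.indexOf("FHR: Flushing storage shutdown") === -1) continue;
    const state = JSON.parse(d.async_shutdown_timeout).conditions[0].state;
    const key = canonicalKey(state);
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Made-up sample records, for illustration only (states are abbreviated).
const makeRecord = (version, state) => ({
  version,
  async_shutdown_timeout: JSON.stringify({
    phase: "Metrics Storage Backend",
    conditions: [{ name: "FHR: Flushing storage shutdown", state }],
  }),
});
const sampleRecords = [
  makeRecord("34.0.5", { shutdownRequested: true, hasStorage: true }),
  makeRecord("34.0", { hasStorage: true, shutdownRequested: true }),
  makeRecord("35.0", { shutdownRequested: true, hasStorage: true }),
];

const stateCounts = countStates(sampleRecords);
console.log([...stateCounts.entries()]);
// → [ [ '{"hasStorage":true,"shutdownRequested":true}', 2 ] ]
```

The 35.0 record is dropped by the version filter, and the two Fx 34 records collapse into one bucket despite their different key order.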
If this regressed in Firefox 34, what changed in Firefox 34 that is causing this code path to get exercised more often?
(In reply to Gregory Szorc [:gps] from comment #16)
> If this regressed in Firefox 34, what changed in Firefox 34 that is causing
> this code path to get exercised more often?

We don't know at this point - there are more diagnostics that went into Firefox 35+ that should hopefully help us pin down the offending provider. See also bug 1110681 for more context.
kairo, we don't have great short-term options right now because we probably won't get enough diagnostic data back in time for the 35 release. Can you tell whether the specific issue filed here (AsyncShutdownTimeout with state "FHR: Flushing storage shutdown") has a big impact? We might consider silencing this specific issue for release if we have to, but would lose diagnostic data in the process.
Flags: needinfo?(kairo)
(Note that we only pulled 3810 crashes for this issue per comment 15, but I'm not sure about the effect of throttling or other factors here.)
http://yoric.github.io/are-we-shutting-down-yet-/# usually gives good info on how many of the shutdown crashes belong to which issue, but its main page doesn't seem to load right now. That said, http://yoric.github.io/are-we-shutting-down-yet-/?version=Firefox+34.0.5# shows that this is the major part of those signatures, other than bug 1114567, which just came up in the last few days. Unfortunately, http://yoric.github.io/are-we-shutting-down-yet-/?version=Firefox+35.0# also doesn't seem to load. Looks like Yoric needs to fix it.
Flags: needinfo?(kairo)
KaiRo confirmed that the AsyncShutdownTimeout is ~1% of release crashes, with most of them presumably being this issue. Given that volume and that it is a shutdown crash, we're not jumping to emergency measures for 35 here. Instead we can wait on the diagnostic data, which should hopefully get us to a fix in 36 beta.
Wontfix based on comment 21 and the timing of 35.
Georg, can we have an assignee on this for 36? Thanks
Flags: needinfo?(gfritzsche)
The next actionable step is adding additional forensics on the SearchProvider in bug 1110681. I hope that Yoric or I can get to that soon.
Flags: needinfo?(gfritzsche)
Guys, have you been able to work on this? Thanks
Flags: needinfo?(gfritzsche)
Flags: needinfo?(dteller)
We haven't been able to get bug 1110681 done yet due to other prioritized work.
Flags: needinfo?(gfritzsche)
Flags: needinfo?(dteller)
OK. So, wontfix for 36 too.
The additional forensics from bug 1110681 look to have landed about 1.5 months ago on 35+. Do you have the data that you need to proceed with the investigation for this bug?
Flags: needinfo?(gfritzsche)
Flags: needinfo?(dteller)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #29)
> The additional forensics from bug 1110681 look to have landed about 1.5
> months ago on 35+. Do you have the data that you need to proceed with the
> investigation for this bug?

The first part landed, pointed to the SearchService issue and required further investigation. Bug 1110681 is still open for the further forensics on what is broken in the search service, hence comment 27.

We are currently pushing for FHR & Telemetry unification, which would obsolete these issues on desktop, which is why this hasn't made the top of the list.
Flags: needinfo?(gfritzsche)
Flags: needinfo?(dteller)
We're not moving very quickly on this bug, which is still marked as a top crash. Kairo - Can you confirm that this is still a top crash?
Flags: needinfo?(kairo)
(In reply to Lawrence Mandel [:lmandel] (use needinfo) from comment #31)
> Kairo - Can you confirm that this is still a top crash?

http://yoric.github.io/are-we-shutting-down-yet-/?version=Firefox+35.0# says it's still 22% of all 35 AsyncShutdownTimeouts. According to http://yoric.github.io/are-we-shutting-down-yet-/?version=Firefox+36.0# it's only 4% of them in 36.

The overall signature that we usually see with AsyncShutdownTimeouts is right now #9 with 1.1% of 36.0b10 crashes.
Flags: needinfo?(kairo)
Georg - This bug is still very relevant as it has a rather substantial portion of the shutdown crashes. (See comment 32.) FHR/Telemetry unification is happening in 38. If you intend to investigate further in 38 after the unification is complete (in the next couple of weeks), perhaps we can hold off. If you're suggesting that we'll look at this again to ship a fix in 39, that seems pretty late. Do we have any other options to get better data and get back to making progress on this bug?
Flags: needinfo?(gfritzsche)
With the unification done, we will have a decision about whether we can turn off FHR in 38, which would make the problem go away anyway. We may want to still solve things here if this is critical enough to fix on 37 or if we want to have this ready in case disabling FHR is a no-go.
Flags: needinfo?(gfritzsche)
This bug is pretty old at this point but it is still flagged as a topcrash and still something that we want to fix if we have a way to make progress. Assuming that we don't want to wait for FHR/Telemetry unification or that that doesn't happen in 38, what are the next steps toward resolving this bug?
Flags: needinfo?(gfritzsche)
I believe that the next step is extracting data collected from bug 1110681.
(once it has landed, that is)
Indeed, note that David picked that one up again (thanks!).
Flags: needinfo?(gfritzsche)
I can confidently say that we aren't going to have time for this bug. The only sane way to fix this is by removing the code in question, which will either be 38 or 39 depending on the status of unification.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX
I think comment 34 and comment 39 make sense. This issue will be eliminated in 38 or 39. I'm going to mark as wontfix for 37+
Product: Firefox Health Report → Firefox Health Report Graveyard