Closed Bug 1175521 Opened 8 years ago Closed 8 years ago

crash in RtlEnterCriticalSection | MessageLoop::PostTask_Helper(tracked_objects::Location const&, Task*, int, bool)

Categories

(Core :: Graphics: Layers, defect)

x86
Windows NT
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
e10s ? ---
firefox38 --- wontfix
firefox39 + wontfix
firefox38.0.5 --- wontfix
firefox40 + fixed
firefox41 + fixed
firefox-esr38 40+ fixed

People

(Reporter: tracy, Assigned: nical)

References

Details

(Keywords: crash, topcrash-win)

Crash Data

Attachments

(1 file)

This bug was filed from the Socorro interface and is 
report bp-0ee12a3a-100a-41d9-a463-526152150616.
=============================================================

[Tracking Requested - why for this release]:

This has been creeping up the topcrash list on Nightly.  Currently #7 at 67 crashed in the past seven days.  It is also on 38, 39 and 40. But highest in volume on 40 and 41.

The only reporter comment is: "Heavy duty imgur.com browsing with animated images/html5 videos. started to use 30% processor. tried to shut down browser. This caused crash."
Shutdown crash: compositor appears to be trying to post something to a dead message loop.
Component: IPC → Graphics: Layers
Flags: needinfo?(milan)
I'll track this for 39+ but it seems unlikely we can land a fix in time for 39 release.
Can we tell if it crept up because we took care of crashes above it, or are the overall numbers trending up as well?
Flags: needinfo?(milan)
This is a wallpaper patch, but we can see if it helps nightly.  Should still figure out if it's expected for the compositor loop to be null.
Attachment #8623775 - Flags: review?(nical.bugzilla)
Can we tell if this was there before 33?
(In reply to Milan Sreckovic [:milan] from comment #3)
> Can we tell if it crept up because we took care of crashes above it, or are
> the overall numbers trending up as well?

I think it's more of the former.  But the numbers have been up a little more consistently (for Nightly and DevEd) beginning 20150604. 

 (In reply to Milan Sreckovic [:milan] from comment #5)
> Can we tell if this was there before 33?

There is a single crash reported on 9.0.1 https://crash-stats.mozilla.com/report/index/2dd5c293-c18d-4adf-9ae3-3e31c2150612.  All of the rest are 38.0 or later.
Comment on attachment 8623775 [details] [diff] [review]
Check if the loop is there before proceeding. r=nical

Review of attachment 8623775 [details] [diff] [review]:
-----------------------------------------------------------------

It's fine to paper over this, but there are some things that deserve further fixing in here.
Looks like we are getting in an invalid state in the middle of shutting down, and the error code is trying to use resources that are probably already gone (like the event loop). I think that the right thing to do, since various things can trigger posting to messages loops or other resources that are destroyed during shutdown, is to figure out if all of these things are happening on the same thread and if so maybe keep a state somewhere to have the "dangerous" things check that we are not shutting down and cancel/assert whatever they are about to do if it is the case (except things that are supposed to be hapenning that during shutdown, obviously).

This also means that the crash we are looking at is the symptom of the real issue: whatever caused us do get into MessageChannel::NotifyMaybeChannelError() and the cascade of OnChannelError that follows.
Attachment #8623775 - Flags: review?(nical.bugzilla) → review+
Sounds like a plan - let's land the wallpaper patch (no try run - let me know if you'd like one), and leave open and assigned to :nical for an investigation and a real fix.  This will still assert in debug and drop a message in any crash report if it happens.
Assignee: nobody → nical.bugzilla
[Tracking Requested - why for this release]: Topcrash
(In reply to Milan Sreckovic [:milan] from comment #8)
> Sounds like a plan - let's land the wallpaper patch (no try run - let me
> know if you'd like one), and leave open and assigned to :nical for an
> investigation and a real fix.  This will still assert in debug and drop a
> message in any crash report if it happens.

Just for book keeping, can we close this out and do the investigation in a follow up?
Flags: needinfo?(milan)
Assuming the crashes go away, sure.  I'll close this one, we can reopen if the crashes aren't gone, and clone for a followup.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(milan)
Resolution: --- → FIXED
Milan, can we have an uplift request to aurora? Thanks
Flags: needinfo?(milan)
Keywords: leave-open
Comment on attachment 8623775 [details] [diff] [review]
Check if the loop is there before proceeding. r=nical

Approval Request Comment
[Feature/regressing bug #]: Seems to be 38 and later
[User impact if declined]: Topcrash
[Describe test coverage new/current, TreeHerder]:
[Risks and why]: No crashes on nightly since this landed.
[String/UUID change made/needed]:
Flags: needinfo?(milan)
Attachment #8623775 - Flags: approval-mozilla-aurora?
Comment on attachment 8623775 [details] [diff] [review]
Check if the loop is there before proceeding. r=nical

Approving for uplift to Aurora, low risk, it has been in Nightly for approximately a week.
Attachment #8623775 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8623775 [details] [diff] [review]
Check if the loop is there before proceeding. r=nical

[Triage Comment]
Sorry, we meant beta.
Attachment #8623775 - Flags: approval-mozilla-beta+
Attachment #8623775 - Flags: approval-mozilla-aurora-
Attachment #8623775 - Flags: approval-mozilla-aurora+
Target Milestone: --- → mozilla41
Since this was a topcrash, let's track for upcoming ESR38 release.

Milan, do you think we should be fixing this (backporting?) for ESR38 release?
Flags: needinfo?(milan)
Not a security problem, I don't think it would get approved for an ESR release, though the patch is simple enough.
Flags: needinfo?(milan)
Looking at the past week, the only crashes that would have been this one instead, are the shutdown hangs like bug 1127270.
Milan, if the backport is simple and helps rather than hurts our end-users I think we should consider taking it. You might be the best judge as to how safe the backport fix would be. Thanks!
Comment on attachment 8623775 [details] [diff] [review]
Check if the loop is there before proceeding. r=nical

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: top crash
User impact if declined: more crashes?
Fix Landed on Version: 41, uplifted to 40
Risk to taking this patch (and alternatives if risky): low risk, we are stopping a nullptr dereference.
String or UUID changes made by this patch: 

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Attachment #8623775 - Flags: approval-mozilla-esr38?
Comment on attachment 8623775 [details] [diff] [review]
Check if the loop is there before proceeding. r=nical

I agree with Ritu that we should take this topcrash fix in ESR38 as stability has noticeably improved in 39 and 40. ESR38+

Ritu - Can you please add this to the relnotes for ESR 38.2.0.
Flags: needinfo?(rkothari)
Attachment #8623775 - Flags: approval-mozilla-esr38? → approval-mozilla-esr38+
Forgot to add that this bug will be mentioned in ESR 38.2.0 release notes.
You need to log in before you can comment on or make changes to this bug.