Closed
Bug 1064107
Opened 10 years ago
Closed 10 years ago
crash in RtlEnterCriticalSection | LockImpl::Lock()
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
VERIFIED
FIXED
mozilla35
People
(Reporter: lizzard, Assigned: nical)
References
Details
(Keywords: crash, topcrash)
Crash Data
Attachments
(1 file)
991 bytes,
patch
|
bas.schouten
:
review+
Sylvestre
:
approval-mozilla-aurora-
Sylvestre
:
approval-mozilla-beta+
Sylvestre
:
approval-mozilla-release+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-3b3598a4-6210-4841-b9cb-74b742140905. ============================================================= #1 topcrash, 373/1476 crashes in the last 3 days for Firefox 34.0a2 (Aurora) It looks like a startup crash on Windows platforms. Also crashing Nightly and Beta. While the signature has shown up at a low level since 2012, it spiked dramatically with the 2014083103 build. Crashing thread: 0 ntdll.dll RtlEnterCriticalSection 1 xul.dll LockImpl::Lock() ipc/chromium/src/base/lock_impl_win.cc 2 xul.dll MessageLoop::PostTask_Helper(tracked_objects::Location const&, Task*, int, bool) ipc/chromium/src/base/message_loop.cc 3 xul.dll MessageLoop::PostTask(tracked_objects::Location const&, Task*) ipc/chromium/src/base/message_loop.cc 4 xul.dll mozilla::layers::CompositorParent::CompositorParent(nsIWidget*, bool, int, int) gfx/layers/ipc/CompositorParent.cpp 5 xul.dll nsBaseWidget::NewCompositorParent(int, int) widget/xpwidgets/nsBaseWidget.cpp 6 xul.dll nsBaseWidget::CreateCompositor(int, int) widget/xpwidgets/nsBaseWidget.cpp 7 xul.dll nsBaseWidget::CreateCompositor() widget/xpwidgets/nsBaseWidget.cpp 8 xul.dll nsWindow::GetLayerManager(mozilla::layers::PLayerTransactionChild*, mozilla::layers::LayersBackend, nsIWidget::LayerManagerPersistence, bool*) widget/windows/nsWindow.cpp 9 mozjs.dll js::EmptyShape::getInitialShape(js::ExclusiveContext*, js::Class const*, js::TaggedProto, JSObject*, JSObject*, unsigned int, unsigned int) js/src/vm/Shape.cpp 10 xul.dll mozilla::dom::CanvasRenderingContext2D::DrawImage(mozilla::dom::HTMLImageElementOrHTMLCanvasElementOrHTMLVideoElement const&, double, double, double, double, double, double, double, double, unsigned char, mozilla::ErrorResult&) dom/canvas/CanvasRenderingContext2D.cpp
Updated•10 years ago
|
Component: General → Graphics: Layers
Usually RtlEnterCriticalSection means an uninitialized CS, but the code seems to initialize correctly. At first glance I can't tell what's causing this. Liz, can you post a regression range for the spike on the 31st?
Flags: needinfo?(lhenry)
Comment 2•10 years ago
|
||
This probably started when 34 was still on nightly, we see the signature on 35 nightly as well. That said, the data on https://crash-analysis.mozilla.com/rkaiser/2014-09-07/2014-09-07.firefox.nightly.explosiveness.html makes it looks very much like 2014-08-31 is the start of this. And looking at the nightly channel with the start date set to a number of days before 8/32 on https://crash-stats.mozilla.com/report/list?signature=RtlEnterCriticalSection%20|%20LockImpl%3A%3ALock%28%29#tab-graph also points to the nightly build of the 31st to be the first to have the issue.
Reporter | ||
Comment 3•10 years ago
|
||
Here is a not-very-narrow window for now... http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=eed9fe35a00d&tochange=1db35d2c9a2f
Flags: needinfo?(lhenry)
CompositorLoop() is null here: http://hg.mozilla.org/mozilla-central/file/152ef25e89ae/gfx/layers/ipc/CompositorParent.cpp#l234 Bas is your FIXME still true? I don't see what in the pushlog would cause this, though...
Flags: needinfo?(bas)
Comment 5•10 years ago
|
||
(In reply to David Major [:dmajor] from comment #4) > CompositorLoop() is null here: > http://hg.mozilla.org/mozilla-central/file/152ef25e89ae/gfx/layers/ipc/ > CompositorParent.cpp#l234 > > Bas is your FIXME still true? > > I don't see what in the pushlog would cause this, though... I believe it was Nical that wrote this particular code.. fwiw, I believe it is, but Nical, can you tell me?
Flags: needinfo?(bas) → needinfo?(nical.bugzilla)
Assignee | ||
Comment 6•10 years ago
|
||
This comment means "we are sending a pointer (this) to another task and the reason we know it will still be alive is because whatever can delete the instance will be created after in the thread that receives the task". It is still true AFAICT. It's not related to the event loop being null, though.
Flags: needinfo?(nical.bugzilla)
Reporter | ||
Comment 7•10 years ago
|
||
This is the #1 topcrash on Firefox 34.0a2 with 1511/10042 crashes in the last 7 days. It looks like it is most often a startup crash on Windows 7. More reports: https://crash-stats.mozilla.com/report/list?signature=RtlEnterCriticalSection+|+LockImpl%3A%3ALock%28%29&range_value=7&range_unit=days&date=2014-09-15#tab-reports
> It's not related to the event loop being null, though.
Who would be the best person to investigate a null CompositorLoop()?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 9•10 years ago
|
||
(In reply to David Major [:dmajor] from comment #8) > > It's not related to the event loop being null, though. > > Who would be the best person to investigate a null CompositorLoop()? bjacob, maybe? it looks like some code is trying to create a layer manager in a process where all of the OMTC stuff hasn't been initialized (yet?), that is CompositorParent::StartUp() hasn't been called (the startup happens in gfxPlatform::Startup so if we are too early, then we are really really early).
Flags: needinfo?(nical.bugzilla)
Updated•10 years ago
|
status-firefox32:
--- → unaffected
tracking-firefox33:
--- → +
tracking-firefox34:
--- → +
tracking-firefox35:
--- → +
Updated•10 years ago
|
Flags: needinfo?(bjacob)
Comment 10•10 years ago
|
||
I'm not current on what changed with OMTC-on-desktop (this being on 34.0a2 is a strong indication that it's OMTC related, and I didn't work on that), but the interesting thing here is how there is Canvas2D stuff in the call stack. It seems as if this startup crash occurred when a user started Firefox directly on a page using Canvas2D, and that tried to talk to the compositor before it got started up. Bas/Nical, I think that we might have to update the Canvas2D code to wait for the compositor to start up before doing things that interact directly with it.
Flags: needinfo?(bjacob)
Comment 11•10 years ago
|
||
Setting a ni? for comment #10 as we really need some progress here.
Flags: needinfo?(bas)
Comment 12•10 years ago
|
||
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #11) > Setting a ni? for comment #10 as we really need some progress here. I've been unable to reproduce this so far, it doesn't make a lot of sense that we could create a canvas -before- we initialize the platform. But possible some add-on code or something could do this? It probably won't be hard to construct a work-around here that will create minimal artifacts. One question to assist in diagnosing this, am I correct in understanding from earlier comments this is -not- happening in Beta?
Flags: needinfo?(bas)
Assignee | ||
Comment 13•10 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #10) > but the interesting thing here is how there is Canvas2D stuff in the call > stack. It seems as if this startup crash occurred when a user started > Firefox directly on a page using Canvas2D, and that tried to talk to the > compositor before it got started up. Even before gfxPlatform::Startup() ? This is when (IIUC) we instantiate the compositor thread. (cf: gfxPlatform::Startup -> gfxPlatform::InitLayersIPC -> CompositorParent::StartUp -> new CompositorThreadHolder).
Comment 14•10 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #12) > One question to assist in diagnosing this, am I correct in understanding > from earlier comments this is -not- happening in Beta? Yes. This only happens on 34 Aurora and on 35 Nightly.
Comment 16•10 years ago
|
||
Updating the tracking flags wrt comment #14
Reporter | ||
Comment 17•10 years ago
|
||
This is the #2 topcrasher for Firefox 34.0a2 with 1133/8332 crashes in the last 7 days. It looks like a pretty common problem and one that we probably need to fix if OMTC is on for 34 as it moves toward release.
Comment 18•10 years ago
|
||
Bas, I understand we can't reproduce this locally yet, but can we at least add some logging to help us?
Assignee: nobody → bas
Reporter | ||
Comment 19•10 years ago
|
||
RtlFreeHeap | LockImpl::Lock() reports have similar stacks. I am not sure if this is the same issue but it's showing up as the #8 top crash in Aurora right now, and maybe it's related.
Crash Signature: [@ RtlEnterCriticalSection | LockImpl::Lock()] → [@ RtlEnterCriticalSection | LockImpl::Lock()]
@ RtlFreeHeap | LockImpl::Lock() ]
Reporter | ||
Comment 20•10 years ago
|
||
Here's a sample report: https://crash-stats.mozilla.com/report/index/34ce4d8e-601a-4fa1-b410-66ca62140923
Updated•10 years ago
|
Crash Signature: [@ RtlEnterCriticalSection | LockImpl::Lock()]
@ RtlFreeHeap | LockImpl::Lock() ] → [@ RtlEnterCriticalSection | LockImpl::Lock()]
[@ RtlFreeHeap | LockImpl::Lock() ]
Comment 21•10 years ago
|
||
Actually, 33 is affected, but at much lover volume, and it's so late in the cycle, so let's mark it wontfix.
Reporter | ||
Comment 22•10 years ago
|
||
These sigs are now the #1 and #7 topcrashers in Aurora. My worry is that it will affect many users as 34 goes into Beta. This is just an update on the crash volume; I know that doesn't particularly help diagnose the problem.
Comment 23•10 years ago
|
||
:nical, can we wallpaper this and see if it helps?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 24•10 years ago
|
||
I find it slightly worrying that we are executing js and canvas2d stuff before gfxPlatform::Init, but it looks like the latter gets initialized lazily so maybe we got really unlucky and never called gfxPlatform::GetPlatform() (which triggers Init) by the time we tried to create the compositor. This patch should fix the issue if this is the case.
Assignee: bas → nical.bugzilla
Attachment #8502317 -
Flags: review?(bas)
Flags: needinfo?(nical.bugzilla)
Comment 25•10 years ago
|
||
Comment on attachment 8502317 [details] [diff] [review] all GetPlatform at the beginning of CreateCompositor to make sure that gfxPlatfrom gets initialized Review of attachment 8502317 [details] [diff] [review]: ----------------------------------------------------------------- It can't hurt I suppose...
Attachment #8502317 -
Flags: review?(bas) → review+
Assignee | ||
Comment 26•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c6980f9caff
https://hg.mozilla.org/mozilla-central/rev/5c6980f9caff
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Reporter | ||
Comment 28•10 years ago
|
||
Oh, awesome! I'll keep an eye on it on 35 and maybe it'll be good to uplift.
Reporter | ||
Updated•10 years ago
|
QA Contact: lhenry
Reporter | ||
Comment 29•10 years ago
|
||
This didn't appear to affect anything, at least not obviously, in Nightly. This crash signature wasn't showing up as a topcrasher in 35 though, at least not for the past month. There were only a few crashes reported for 35. None in 35 past 10-6. Let's try uplifting it to 34. Lawrence, what do you think?
Flags: needinfo?(lmandel)
Comment 30•10 years ago
|
||
(In reply to Liz Henry :lizzard from comment #29) > This didn't appear to affect anything, at least not obviously, in Nightly. Well, we didn't have any crashes in builds after this landed on Nightly, the last crash happened in the 10/10 build, which was just before this was merged to m-c. That said, we only had a total of 8 of those crashes in all October on Nightly 35, whereas it's a top crash on Aurora 34. We really need to uplift to know if the fix worked, and I'd hope we can do this before we actually go to build with b1 today, ideally.
Comment 31•10 years ago
|
||
And for documentation, the query I used as a basis for the comment above is here: https://crash-stats.mozilla.com/search/?signature=~LockImpl%3A%3ALock%28%29&release_channel=nightly&date=%3E2014-10-01&_facets=build_id&_columns=date&_columns=signature&_columns=product&_columns=build_id&_columns=platform (click on the build id facet and then on its build id colum to sort by build IDs)
Reporter | ||
Comment 32•10 years ago
|
||
Nicolas can you nominate this for uplift to 34? Thanks! I guess I could but I we usually leave it to the developer or reviewer.
Assignee | ||
Comment 33•10 years ago
|
||
Comment on attachment 8502317 [details] [diff] [review] all GetPlatform at the beginning of CreateCompositor to make sure that gfxPlatfrom gets initialized Approval Request Comment [Feature/regressing bug #]: [User impact if declined]: Crashes [Describe test coverage new/current, TBPL]: [Risks and why]: None. The patch is trivial. In most cases it's a no-op, in the other cases we are already crashing anyway, so there is no risk of making things worse. [String/UUID change made/needed]:
Attachment #8502317 -
Flags: approval-mozilla-beta?
Attachment #8502317 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Attachment #8502317 -
Flags: approval-mozilla-beta?
Attachment #8502317 -
Flags: approval-mozilla-beta+
Attachment #8502317 -
Flags: approval-mozilla-aurora?
Attachment #8502317 -
Flags: approval-mozilla-aurora+
Comment 34•10 years ago
|
||
Comment on attachment 8502317 [details] [diff] [review] all GetPlatform at the beginning of CreateCompositor to make sure that gfxPlatfrom gets initialized Aurora has already landed
Attachment #8502317 -
Flags: approval-mozilla-aurora+ → approval-mozilla-aurora-
Comment 35•10 years ago
|
||
remote: https://hg.mozilla.org/releases/mozilla-beta/rev/00cf6dd17a10
Updated•10 years ago
|
Flags: needinfo?(lmandel)
Updated•10 years ago
|
Flags: qe-verify+
Comment 36•10 years ago
|
||
Looking in Socorro ([1] & [2]) for crashes with the two signatures over the past 14 days (since the fix landed in 35 on October 10) I see the following data: RtlFreeHeap | LockImpl::Lock() - Firefox 34 Beta 1 and newer – 3 crashes in 34 Beta 1 - Firefox 35 Aurora (builds after October 11) – NO crashes RtlEnterCriticalSection | LockImpl::Lock() - Firefox 34 Beta 1 and newer – 399 crashes in 34 Beta 1, 26 crashes in 34 Beta 2 - Firefox 35 Aurora (builds after October 11) – 6 crashes - Firefox 36 Nightly (builds after October 11) – 4 crashes [1] https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=14&signature=RtlFreeHeap+%7C+LockImpl%3A%3ALock%28%29#tab-reports [2] https://crash-stats.mozilla.com/report/list?product=Firefox&query_type=contains&range_unit=days&process_type=any&hang_type=any&signature=RtlEnterCriticalSection+%7C+LockImpl%3A%3ALock%28%29&date=2014-10-24+12%3A00%3A00&range_value=14#tab-sigsummary
Comment 37•10 years ago
|
||
Comment on attachment 8502317 [details] [diff] [review] all GetPlatform at the beginning of CreateCompositor to make sure that gfxPlatfrom gets initialized Let's take it for 33.0.3... Nical, Bas, you confirm that you didn't see any regression from this?
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bas)
Attachment #8502317 -
Flags: approval-mozilla-release+
Updated•10 years ago
|
Comment 38•10 years ago
|
||
We need in the GECKO331_2014103013_RELBRANCH relbranch too :) thanks again
Comment 39•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-release/rev/c26a00be9a55 https://hg.mozilla.org/releases/mozilla-release/rev/204173cef30b
Assignee | ||
Comment 40•10 years ago
|
||
No regression.
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(bas)
Comment 41•10 years ago
|
||
Also landed on GECKO330_2014101104_RELBRANCH for 33.0.3: https://hg.mozilla.org/releases/mozilla-release/rev/691739025fac
Comment 42•10 years ago
|
||
Looking in Socorro ([1] & [2]) for crashes with the two signatures over the past 28 days I see the following data: RtlFreeHeap | LockImpl::Lock() - Firefox 33.1 & 33.0.3 or later - 1 crash in 33.0.3 - Firefox 34 Beta 1 and newer – 7 crashes - Firefox 35 Aurora – NO crashes - Firefox 36 Nightly – NO crashes RtlEnterCriticalSection | LockImpl::Lock() - Firefox 33.1 & 33.0.3 or later - NO crashes - Firefox 34 Beta 1 and newer – 397 crashes in 34 Beta 1, 22 crashes in 34 Beta 2, NO crashes in Beta 3 or later - Firefox 35 Aurora – 5 crashes - Firefox 36 Nightly – 3 crashes Comparing to results from 3 weeks ago (comment 36), and considering the low number of occurences, I think it's safe to call this fixed. [1] - https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=RtlFreeHeap+%7C+LockImpl%3A%3ALock%28%29 [2] - https://crash-stats.mozilla.com/report/list?product=Firefox&query_type=contains&range_unit=days&process_type=any&hang_type=any&signature=RtlEnterCriticalSection+%7C+LockImpl%3A%3ALock%28%29&date=2014-10-24+12%3A00%3A00&range_value=28
Status: RESOLVED → VERIFIED
You need to log in
before you can comment on or make changes to this bug.
Description
•