All users were logged out of Bugzilla on October 13th, 2018

Crash in gfxPlatform::Init

RESOLVED FIXED in Firefox 56

Status

()

P1
critical
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: philipp, Assigned: kechen)

Tracking

(Blocks: 1 bug, {crash, regression})

53 Branch
mozilla57
crash, regression
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 wontfix, firefox54 wontfix, firefox55 wontfix, firefox56 fixed, firefox57 fixed)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

2 years ago
This bug was filed from the Socorro interface and is 
report bp-f804adaa-a451-4a62-9e62-db6270170505.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	gfxPlatform::Init() 	gfx/thebes/gfxPlatform.cpp:740
1 	xul.dll 	gfxPlatform::GetPlatform() 	gfx/thebes/gfxPlatform.cpp:536
2 	xul.dll 	mozilla::widget::GfxInfo::GetDWriteEnabled(bool*) 	widget/windows/GfxInfo.cpp:69
3 	xul.dll 	NS_InvokeByIndex 	xpcom/reflect/xptcall/md/win32/xptcinvoke_asm_x86_msvc.asm:54
4 	xul.dll 	XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) 	js/xpconnect/src/XPCWrappedNative.cpp:1296
5 	xul.dll 	XPC_WN_GetterSetter(JSContext*, unsigned int, JS::Value*) 	js/xpconnect/src/XPCWrappedNativeJSOps.cpp:1019
6 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:460
7 	xul.dll 	CallGetter 	js/src/vm/NativeObject.cpp:1812
8 	xul.dll 	js::GetProperty(JSContext*, JS::Handle<JSObject*>, JS::Handle<JSObject*>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/jsobj.h:857
9 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp:2827
10 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp:406
11 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:478
12 	xul.dll 	InternalCall 	js/src/vm/Interpreter.cpp:505
13 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp:2989
14 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp:406
15 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:478
16 	xul.dll 	InternalCall 	js/src/vm/Interpreter.cpp:505
17 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp:2989
18 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp:406
19 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:478
20 	xul.dll 	CallGetter 	js/src/vm/NativeObject.cpp:1812
21 	xul.dll 	js::Wrapper::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/Wrapper.cpp:143
22 	xul.dll 	js::CrossCompartmentWrapper::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/CrossCompartmentWrapper.cpp:209
23 	xul.dll 	js::Proxy::get(JSContext*, JS::Handle<JSObject*>, JS::Handle<JS::Value>, JS::Handle<jsid>, JS::MutableHandle<JS::Value>) 	js/src/proxy/Proxy.cpp:311
24 	xul.dll 	GetPropertyOperation 	js/src/vm/Interpreter.cpp:192
25 	xul.dll 	Interpret 	js/src/vm/Interpreter.cpp:2706
26 	xul.dll 	js::RunScript(JSContext*, js::RunState&) 	js/src/vm/Interpreter.cpp:406
27 	xul.dll 	js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) 	js/src/vm/Interpreter.cpp:478
28 	xul.dll 	InternalCall 	js/src/vm/Interpreter.cpp:505
29 	xul.dll 	nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS*, unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) 	js/xpconnect/src/XPCWrappedJSClass.cpp:1213
30 	xul.dll 	nsXPCWrappedJS::CallMethod(unsigned short, XPTMethodDescriptor const*, nsXPTCMiniVariant*) 	js/xpconnect/src/XPCWrappedJS.cpp:613
31 	xul.dll 	PrepareAndDispatch 	xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:85
32 	xul.dll 	SharedStub 	xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:112
33 	xul.dll 	NS_CreateServicesFromCategory(char const*, nsISupports*, char const*, char16_t const*) 	xpcom/components/nsCategoryManager.cpp:821

the crash signature is rising in firefox 53 and subsequent builds.

Correlations for Firefox Release
(50.37% in signature vs 02.99% overall) startup_crash = 1 [98.55% vs 02.47% if process_type = null]
(95.56% in signature vs 00.80% overall) GFX_ERROR "Failed to create DrawTarget, Type: " = true
(100.0% in signature vs 30.51% overall) plugin_version = null
(90.37% in signature vs 32.68% overall) platform_pretty_version = Windows 10
(93.33% in signature vs 14.94% overall) Module "igd10iumd32.dll" = true [93.94% vs 32.95% if platform_version = 10.0.10586]
(86.67% in signature vs 12.68% overall) Module "igdusc32.dll" = true [97.06% vs 40.60% if platform_version = 10.0.10240]
(93.33% in signature vs 17.90% overall) Module "ncrypt.dll" = true [93.94% vs 35.50% if platform_version = 10.0.10586]
(92.59% in signature vs 12.21% overall) Module "ntasn1.dll" = true [93.94% vs 35.50% if platform_version = 10.0.10586]
(96.30% in signature vs 00.40% overall) GFX_ERROR "[D2D1.1] 3CreateBitmap failure " = true [57.97% vs 00.25% if process_type = null]
(48.15% in signature vs 00.03% overall) moz_crash_reason = MOZ_CRASH(Could not initialize mScreenReferenceDrawTarget)
(44.44% in signature vs 01.70% overall) adapter_device_id = 0x22b1
(42.22% in signature vs 01.22% overall) CPU Info = GenuineIntel family 6 model 76 stepping 3

Comment 2

a year ago
The error code 0x8899000C means D2DERR_RECREATE_TARGET. Per discussion with Jerry, it looks like the device is broken and we should fallback to software backend. Kevin, could you check this issue?
Flags: needinfo?(kechen)
Priority: -- → P1
Whiteboard: [gfx-noted]
(Assignee)

Comment 3

a year ago
(In reply to Ethan Lin[:ethlin] from comment #2)
> The error code 0x8899000C means D2DERR_RECREATE_TARGET. Per discussion with
> Jerry, it looks like the device is broken and we should fallback to software
> backend. Kevin, could you check this issue?

Yes, I think it's advisable to fallback to the software backend; but I noticed that some crashes happened in content process.
In this case, not sure if we have to change the backend of UI process too.

Also I am curious about the rising crash number in Firefox 53.

I will look into the code and see if I can find the root cause first.
Flags: needinfo?(kechen)
Comment hidden (mozreview-request)
(Assignee)

Updated

a year ago
Assignee: nobody → kechen
Too late for 53, and we may want testing before trying to uplift this. Maybe better to leave in 55.
status-firefox53: affected → wontfix
status-firefox54: affected → fix-optional

Comment 6

a year ago
mozreview-review
Comment on attachment 8870369 [details]
Bug 1362321 - Do not crash in gfxPlatform:Init if there is a TDR happening;

https://reviewboard.mozilla.org/r/141822/#review147880

D2DERR_RECREATE_TARGET does not mean 'the device is broken and we should fallback' it means we should reset the device. I'm not entirely certain why our current codepaths aren't dealing with that here. I'd say the callers to this function should probably deal with it potentially returning null.
Attachment #8870369 - Flags: review?(bas) → review-
(Assignee)

Comment 7

a year ago
I think it's impossible to reset the device within a synchrized ipc via current mechanism.

Most workable solution to this might be triggering the FORCE_RESET flag and ask the compositor process to do the device reset recovery process and update the draw target.

However, base on the current codebase[1], ScreenReferenceDrawTarget() is defined as an infalliable function call; I might need to remove this assumption and modify all the 20 callers[2] of this function before doing the change mentioned above or we will run into other crashes during the recovery process.


[1] https://dxr.mozilla.org/mozilla-central/rev/7b8937970f9ca85db88cb2496f2112175fd847c8/gfx/thebes/gfxPlatform.h#565
[2] https://dxr.mozilla.org/mozilla-central/search?q=%2Bcallers%3AgfxPlatform%3A%3AScreenReferenceDrawTarget%28%29
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
(Assignee)

Comment 10

a year ago
Hello Bas,

I think there are two possible reasons which can cause the failure of the draw target creation in gfxPlatform::Init() : 
(a) There are something wrong with graphics device(removed, driver updated) and we are in the middle of device reset process.
(b) There are some internal errors within the device of this process but do not trigger device reset.

In situation (a), we should just leave an invalid draw target, and it will be updated later by device reset process; however, according to comment 20, ScreenReferenceDrawTarget() is defined as an infallible function and change it to be fallible and deal with its 20 callers is a big work. So I assign a draw target with software backend to it since it is going to be replaced later anyway. (Change ScreenReferenceDrawTarget() to fallible might be the ideal solution maybe we can open a follow-up bug for it.)

In situation (b), no device reset reason is detected, but we still failed to create the draw target, I am not sure if this situation can happen, so I leave a MOZ_CRASH for tracking. If it does happen, we can reinitialize graphic devices of the process and see if the draw target can be created by new devices.
(Assignee)

Updated

a year ago
Flags: needinfo?(bas)
(In reply to Kevin Chen[:kechen] (UTC + 8) from comment #10)
> Hello Bas,
> 
> I think there are two possible reasons which can cause the failure of the
> draw target creation in gfxPlatform::Init() : 
> (a) There are something wrong with graphics device(removed, driver updated)
> and we are in the middle of device reset process.
> (b) There are some internal errors within the device of this process but do
> not trigger device reset.
> 
> In situation (a), we should just leave an invalid draw target, and it will
> be updated later by device reset process; however, according to comment 20,
> ScreenReferenceDrawTarget() is defined as an infallible function and change
> it to be fallible and deal with its 20 callers is a big work. So I assign a
> draw target with software backend to it since it is going to be replaced
> later anyway. (Change ScreenReferenceDrawTarget() to fallible might be the
> ideal solution maybe we can open a follow-up bug for it.)
> 
> In situation (b), no device reset reason is detected, but we still failed to
> create the draw target, I am not sure if this situation can happen, so I
> leave a MOZ_CRASH for tracking. If it does happen, we can reinitialize
> graphic devices of the process and see if the draw target can be created by
> new devices.

Alright, so if we want to add a bandaid and keep ScreenReferenceDrawTarget infallible, it's okay to return a Software drawtarget from it, but let's not assign that to gPlatform->mScreenReferenceDrawTarget. Having a magical codepath which can change mScreenReferenceDrawTarget could cause mistakes in the future where mScreenReferenceDrawTarget has the wrong type of DT, which could be disastrous for performance.

Also, add a gfxCriticalError in this codepath when a device reset is -not- pending.
Flags: needinfo?(bas)
(Assignee)

Comment 12

a year ago
If we don't assign the software drawtarget to gPlatform->mScreenReferenceDrawTarget and leave it null, the program will still crash in other places since ScreenReferenceDrawTarget() is defined as infallible.
Therefore, the best solution might still be make ScreenReferenceDrawTarget() fallible, I am working on this patch.
(Assignee)

Comment 13

a year ago
Created attachment 8890654 [details] [diff] [review]
Recreate draw target after failing to create it in gfxPlatform::Init;

Hello Bas,
Per offline discussion with Jerry, making ScreenReferenceDrawTarget() infallible is too risky to do in this bug which will make the codebase unstable since there are lots of callers separated in different module[1]. It's advisable to open another bug for this.

Therefore, in this patch, I force gecko to reset the device and try to create the screen target if the first trial fails and fallback to software draw target if the second trial fails too.

Do you have any thought for this patch?
Thanks.

[1] https://dxr.mozilla.org/mozilla-central/search?q=%2Bcallers%3AgfxPlatform%3A%3AScreenReferenceDrawTarget%28%29
Attachment #8870369 - Attachment is obsolete: true
Attachment #8870369 - Flags: review?(bas)
Attachment #8890654 - Flags: feedback?(bas)
(Assignee)

Updated

a year ago
Attachment #8890654 - Flags: feedback?(bas)
Comment hidden (mozreview-request)

Comment 15

a year ago
mozreview-review
Comment on attachment 8870369 [details]
Bug 1362321 - Do not crash in gfxPlatform:Init if there is a TDR happening;

https://reviewboard.mozilla.org/r/141822/#review171236

::: gfx/thebes/gfxPlatform.h:577
(Diff revision 4)
>      /**
>       * Returns a 1x1 DrawTarget that can be used for measuring text etc. as
>       * it would measure if rendered on-screen.  Guaranteed to return a
>       * non-null and valid DrawTarget.
>       */
> -    mozilla::gfx::DrawTarget* ScreenReferenceDrawTarget() { return mScreenReferenceDrawTarget; }
> +    already_AddRefed<mozilla::gfx::DrawTarget> ScreenReferenceDrawTarget();

Hrm, if we make the return type RefPtr<mozilla::gfx::DrawTarget> will that allow us to use it a little more conveniently? I believe this should be okay now that we have move constructors.
Attachment #8870369 - Flags: review?(bas) → review-
Comment hidden (mozreview-request)

Comment 17

a year ago
mozreview-review
Comment on attachment 8870369 [details]
Bug 1362321 - Do not crash in gfxPlatform:Init if there is a TDR happening;

https://reviewboard.mozilla.org/r/141822/#review173506

::: dom/canvas/CanvasRenderingContext2D.cpp:6573
(Diff revision 5)
>  CanvasPath::AddPath(CanvasPath& aCanvasPath, const Optional<NonNull<SVGMatrix>>& aMatrix)
>  {
> -  RefPtr<gfx::Path> tempPath = aCanvasPath.GetPath(CanvasWindingRule::Nonzero,
> -                                                   gfxPlatform::GetPlatform()->ScreenReferenceDrawTarget());
> +  RefPtr<DrawTarget> dt =
> +    gfxPlatform::GetPlatform()->ScreenReferenceDrawTarget();
> +  RefPtr<gfx::Path> tempPath =
> +    aCanvasPath.GetPath(CanvasWindingRule::Nonzero, dt);

I don't think we need to go this if the return time is RefPtr<> right? Can't we just do aCanvasPath.GetPath(CanvasWindingRule::Nonzero, gfxPlatform::GetPlatform()->ScreenReferenceDrawTarget()); then? Or at least ScreenReferenceDrawTarget.get()?

That was the whole point of changing it from already_AddRefed<> to RefPtr<> :)
Attachment #8870369 - Flags: review?(bas) → review-
Comment hidden (mozreview-request)
(Assignee)

Updated

a year ago
Blocks: 1388995

Comment 19

a year ago
mozreview-review
Comment on attachment 8870369 [details]
Bug 1362321 - Do not crash in gfxPlatform:Init if there is a TDR happening;

https://reviewboard.mozilla.org/r/141822/#review174886

It makes me sad we have to do these .get()'s, but I suppose that's okay.
Attachment #8870369 - Flags: review?(bas) → review+
Comment hidden (mozreview-request)
(Assignee)

Updated

a year ago
Keywords: checkin-needed
Autoland can't push this until all pending issues in MozReview are marked as resolved.
Flags: needinfo?(kechen)
Keywords: checkin-needed
Attachment #8890654 - Attachment is obsolete: true
(Assignee)

Comment 22

a year ago
Thank you for the reminding, already resolved the issues in MozReview.
Flags: needinfo?(kechen)
Keywords: checkin-needed

Comment 23

a year ago
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/e5e5c2fe0f99
Do not crash in gfxPlatform:Init if there is a TDR happening; r=bas
Keywords: checkin-needed

Comment 24

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e5e5c2fe0f99
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
status-firefox54: fix-optional → wontfix
status-firefox55: affected → wontfix
status-firefox56: --- → fix-optional
Please request Beta approval on this when you get a chance.
Flags: needinfo?(kechen)
(Assignee)

Comment 26

a year ago
Created attachment 8901706 [details] [diff] [review]
[Uplift] Do not crash in gfxPlatform:Init if there is a TDR happening;

Approval Request Comment
[Feature/Bug causing the regression]:
  Not a regression.
[User impact if declined]:
  Users will run into the startup crash.
[Is this code covered by automated tests?]:
  No.
[Has the fix been verified in Nightly?]:
  No, but the patch has been landed into central for 4 days, and not related crashes are discovered currently.
[Needs manual test from QE? If yes, steps to reproduce]: 
  No.
[List of other uplifts needed for the feature/fix]:
  None.
[Is the change risky?]:
  Not risky.
[Why is the change risky/not risky?]:
  This patch relieves the crash condition; without this patch, browsers just crashes without any recovery mechanism.
[String changes made/needed]:
  No.
Flags: needinfo?(kechen)
Attachment #8901706 - Flags: approval-mozilla-beta?
Doesn't apply to beta:     

grafting 418734:e5e5c2fe0f99 "Bug 1362321 - Do not crash in gfxPlatform:Init if there is a TDR happening; r=bas"
merging dom/canvas/CanvasRenderingContext2D.cpp
merging gfx/thebes/gfxPlatform.cpp
merging gfx/thebes/gfxPlatform.h
merging layout/base/PresShell.cpp
merging layout/base/nsLayoutUtils.cpp
merging layout/generic/nsBulletFrame.cpp
merging layout/generic/nsColumnSetFrame.cpp
 warning: conflicts while merging gfx/thebes/gfxPlatform.h! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
Flags: needinfo?(kechen)
We could still take this for the beta 8 build if we can get a clean patch for beta by Thursday morning.
(Assignee)

Comment 29

a year ago
Hello Sylvestre,

I downloaded the patch and tried to apply it on latest beta(with revision[1]) and no conflicts were found.
Could you help me to figure out which revision should I try?

[1] https://hg.mozilla.org/releases/mozilla-beta/rev/1fd4f4337aae
Flags: needinfo?(kechen) → needinfo?(sledru)
I think the issue is that the bot was trying to graft the commit that landed on m-c instead of the attached uplift patch. I can confirm that the uplift patch applies cleanly to Beta.
Flags: needinfo?(sledru)
Comment on attachment 8901706 [details] [diff] [review]
[Uplift] Do not crash in gfxPlatform:Init if there is a TDR happening;

Fix a crash. Beta56+ and see how it goes.
Attachment #8901706 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 32

a year ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/2baaba7b398c
status-firefox56: fix-optional → fixed
(In reply to Kevin Chen[:kechen] (UTC + 8) from comment #26)

> [Needs manual test from QE? If yes, steps to reproduce]: 
>   No.

Setting the qe-verify- flag based on Kevin's assessment.
Flags: qe-verify-
You need to log in before you can comment on or make changes to this bug.