crash in SkPixelRef::lockPixels()

RESOLVED FIXED in Firefox 42

Status

()

Core
WebRTC
P1
critical
Rank:
10
RESOLVED FIXED
3 years ago
2 years ago

People

(Reporter: potch, Assigned: pehrsons)

Tracking

({crash})

unspecified
mozilla42
Unspecified
Windows
crash
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox41 wontfix, firefox42 fixed)

Details

(crash signature)

MozReview Requests

()

Submitter Diff Changes Open Issues Last Updated
Loading...
Error loading review requests:

Attachments

(3 attachments, 2 obsolete attachments)

(Reporter)

Description

3 years ago
This bug was filed from the Socorro interface and is 
report bp-18a57c8e-1d15-4ac1-9a36-eb10c2150617.
=============================================================

This usually happens a few minutes after calling canvas.captureStream().

Here's the code I used: http://codepen.io/anon/pen/Ejvqbw
Is this a GFX bug?  (Or should someone else answer the question?)
backlog: --- → webRTC+
Rank: 10
Flags: needinfo?(jgilbert)
Priority: -- → P1
(Assignee)

Comment 2

3 years ago
George is the skia person I believe.
Flags: needinfo?(jgilbert) → needinfo?(gwright)
This is a thread safety issue with the way captureStream is implemented.

Moz2D isn't really threadsafe, and taking a snapshot (via CanvasRenderingContext2D::GetSurfaceSnapshot) returns a copy-on-write lightweight snapshot that refers back to the original DrawTarget (each DrawTarget implements this separately).

This mechanism isn't threadsafe, so have the snapshot accessed on the ImageBridge thread (the crashing thread) is racing with access to the DrawTarget for canvas drawing on the main thread.
Aha.  Thanks Matt.  Is there some way to add a thread-safety assertion to the snapshots?  And is there a workaround to resolve the problem without causing a threading issue?  

Related question: how does the impact of such workarounds affect performance?
Flags: needinfo?(matt.woodrow)
I'll also note I tried it on my Mac and got no crashes in 10 minutes.  Andreas, did we make any chances in how we grab frames since this was reported?
Flags: needinfo?(pehrsons)
By the looks of things we actually use runnables to avoid releasing off the main thread and hitting those assertions. Threading in Moz2D is something we need to look into further.

The best fix is to change the implementation of TimerDriver::TakeSnapshot() to allocate a new SourceSurfaceRawData and copy the pixels from the canvas snapshot into that. That way we guarantee that we have ownership of the data we're using.

This will likely perform worse as its making an extra copy, but there isn't much choice right now.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(gwright)
(Assignee)

Comment 7

3 years ago
Ah, I wish I had known this earlier. As Randell mentioned, some thread safety assertions would have been very nice.

I'll have a patch up real soon. It can hopefully fix some of the other issues we've been seeing with captureStream as well.

(In reply to Randell Jesup [:jesup] from comment #5)
> I'll also note I tried it on my Mac and got no crashes in 10 minutes. 
> Andreas, did we make any chances in how we grab frames since this was
> reported?

No changes, no. It's probably just rare.
Flags: needinfo?(pehrsons)
(Assignee)

Comment 8

3 years ago
Created attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow

Bug 1176363 - Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Attachment #8629783 - Flags: review?(matt.woodrow)
(Assignee)

Updated

3 years ago
See Also: → bug 1165928
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow

https://reviewboard.mozilla.org/r/12645/#review11133

Ship It!
Attachment #8629783 - Flags: review?(matt.woodrow) → review+
(Assignee)

Updated

3 years ago
Assignee: nobody → pehrsons
Status: NEW → ASSIGNED
Keywords: checkin-needed
status-firefox41: --- → affected
status-firefox42: --- → affected
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/80da8a74fad2 for frequent Win8 mochitest-gl and less frequent mochitest-1 assertion failures like https://treeherder.mozilla.org/logviewer.html#?job_id=11416725&repo=mozilla-inbound or the ones I retriggered up for you on your try push.
(Assignee)

Comment 13

3 years ago
OK, I figured out what is racing now.

The stack trace in most failures (like comment 12) is:
> 04:44:35     INFO -  Thread 0 (crashed) <- main thread
> 04:44:35     INFO -   0  xul.dll!mozilla::gfx::DataSourceSurface::Unmap() [2D.h:f90e28d039c1 : 475 + 0x53]
> 04:44:35     INFO -   1  xul.dll!mozilla::gfx::DrawTargetD2D1::OptimizeSourceSurface(mozilla::gfx::SourceSurface *) [DrawTargetD2D1.cpp:f90e28d039c1 : 1584 + 0x1d]
> 04:44:35     INFO -   2  xul.dll!nsLayoutUtils::SurfaceFromElement(mozilla::dom::HTMLVideoElement *,unsigned int,mozilla::gfx::DrawTarget *) [nsLayoutUtils.cpp:f90e28d039c1 : 6918 + 0x10]
> 04:44:35     INFO -   3  xul.dll!nsLayoutUtils::SurfaceFromElement(mozilla::dom::Element *,unsigned int,mozilla::gfx::DrawTarget *) [nsLayoutUtils.cpp:f90e28d039c1 : 6947 + 0x7]
> 04:44:35     INFO -   4  xul.dll!mozilla::dom::CanvasRenderingContext2D::DrawImage(mozilla::dom::HTMLImageElementOrHTMLCanvasElementOrHTMLVideoElement const &,double,double,double,double,double,double,double,double,unsigned char,mozilla::ErrorResult &) [CanvasRenderingContext2D.cpp:f90e28d039c1 : 4379 + 0x16]
> 04:44:35     INFO -   5  xul.dll!mozilla::dom::CanvasRenderingContext2DBinding::drawImage [CanvasRenderingContext2DBinding.cpp:f90e28d039c1 : 4126 + 0x9]
> ...

I was debugging on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b4f063845c0b

One of the M-gl tests failed, but due to timing changes gave me the other stack trace, http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/pehrsons@gmail.com-b4f063845c0b/try-win64-debug/try_win8_64-debug_test-mochitest-gl-bm119-tests1-windows-build1085.txt.gz:
> 05:59:39     INFO -  Thread 34 (crashed)
> 05:59:39     INFO -   0  xul.dll!mozilla::gfx::DataSourceSurface::Unmap() [2D.h:b4f063845c0b : 477 + 0x53]
> 05:59:39     INFO -   1  xul.dll!mozilla::gfx::CreatePartialBitmapForSurface [HelpersD2D.h:b4f063845c0b : 620 + 0x13]
> 05:59:39     INFO -   2  xul.dll!mozilla::gfx::DrawTargetD2D1::GetImageForSurface(mozilla::gfx::SourceSurface *,mozilla::gfx::Matrix &,mozilla::gfx::ExtendMode,mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const *) [DrawTargetD2D1.cpp:b4f063845c0b : 1552 + 0x3d]
> 05:59:39     INFO -   3  xul.dll!mozilla::gfx::DrawTargetD2D1::CopySurface(mozilla::gfx::SourceSurface *,mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> const &,mozilla::gfx::IntPointTyped<mozilla::gfx::UnknownUnits> const &) [DrawTargetD2D1.cpp:b4f063845c0b : 329 + 0x2d]
> 05:59:39     INFO -   4  xul.dll!mozilla::layers::CairoImage::GetTextureClient(mozilla::layers::CompositableClient *) [ImageContainer.cpp:b4f063845c0b : 654 + 0x3b]
> 05:59:39     INFO -   5  xul.dll!mozilla::layers::ImageClientSingle::UpdateImage(mozilla::layers::ImageContainer *,unsigned int) [ImageClient.cpp:b4f063845c0b : 154 + 0x8]
> 05:59:39     INFO -   6  xul.dll!mozilla::layers::UpdateImageClientNow [ImageBridgeChild.cpp:b4f063845c0b : 408 + 0x11]
> 05:59:39     INFO -   7  xul.dll!RunnableFunction<void (*)(mozilla::layers::ImageClient *,mozilla::layers::ImageContainer *),Tuple2<mozilla::layers::ImageClient *,nsRefPtr<mozilla::layers::ImageContainer> > >::Run() [task.h:b4f063845c0b : 420 + 0x9]
> ...


So it's a race between
* drawing the video element (with our now self-owned raw datasourcesurface) to a 2d canvas on the main thread,
* and compositing the video element with the same raw surface on the image client bridge thread (?)

So it basically boils down to an existing issue on Windows that I just happen to exercise a tad bit more than existing tests.

Matt, any ideas for fixes?
Flags: needinfo?(matt.woodrow)
(Assignee)

Updated

3 years ago
See Also: → bug 1166432
Created attachment 8633709 [details] [diff] [review]
Allow mapping of SourceSurfaceRawData from multiple threads
Flags: needinfo?(matt.woodrow)
Attachment #8633709 - Flags: review?(bas)
Created attachment 8633769 [details] [diff] [review]
Stop using DrawTargets off the main thread
Attachment #8633769 - Flags: review?(matt.woodrow)
Attachment #8633709 - Flags: review?(bas) → review+
Comment on attachment 8633769 [details] [diff] [review]
Stop using DrawTargets off the main thread

Review of attachment 8633769 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/d3d11/TextureD3D11.cpp
@@ +491,5 @@
>    }
>  
>    gfxWindowsPlatform* windowsPlatform = gfxWindowsPlatform::GetPlatform();
> +  ID3D11Device* d3d11device = windowsPlatform->GetD3D11DeviceForCurrentThread();
> +  bool haveD3d11Backend = windowsPlatform->GetContentBackend() == BackendType::DIRECT2D1_1 || NS_IsMainThread();

Comment about why using d3d11 is ok (even with d2d 1.1) off the main thread.

@@ +502,5 @@
>                                    D3D11_BIND_RENDER_TARGET | D3D11_BIND_SHADER_RESOURCE);
>  
>      newDesc.MiscFlags = D3D11_RESOURCE_MISC_SHARED;
>  
> +    if (!NS_IsMainThread()) {

Comment here about sync object handling this for us on the main thread.

::: gfx/layers/d3d11/TextureD3D11.h
@@ +63,5 @@
>    virtual bool CanExposeDrawTarget() const override { return true; }
>  
>    virtual gfx::DrawTarget* BorrowDrawTarget() override;
>  
> +  virtual void UpdateFromSurface(gfx::DataSourceSurface* aSurface) override;

Comment about how we use this for off-main-thread updates, and that BorrowDrawTarget() is not safe for this.
Attachment #8633769 - Flags: review?(matt.woodrow) → review+
This is increasingly a Windows bug.
OS: Mac OS X → Windows
Andreas, feel free to land my patch if you want to land your stuff. We can just leave the bug open until Bas lands with the review comments fixed.
(Assignee)

Comment 20

3 years ago
Thanks for the work here guys. I'm running our patches on try now Matt: https://treeherder.mozilla.org/#/jobs?repo=try&revision=83fd58f54a8f
(Assignee)

Comment 21

3 years ago
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow

Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Attachment #8629783 - Attachment description: MozReview Request: Bug 1176363 - Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow → MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow
Attachment #8629783 - Flags: review+
(Assignee)

Comment 22

3 years ago
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow

Rebased and added part number. Carrying r=mattwoodrow.
Attachment #8629783 - Flags: review+
(Assignee)

Comment 23

3 years ago
Created attachment 8634648 [details] [diff] [review]
Part 2. Allow mapping of SourceSurfaceRawData from multiple threads

Rebased Matt's patch and fixed a typo (Atomic.h -> Atomics.h). Carrying forward r=bas.
Attachment #8633709 - Attachment is obsolete: true
Attachment #8634648 - Flags: review+
(Assignee)

Comment 24

3 years ago
Martin, see linux M-e10s-2 on [1], `application crashed [@ mozilla::TransportLayerShutdown]` is going crazy now. Maybe some timing changed with these patches but they seem unrelated to the crash. Is this on your radar?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=83fd58f54a8f
Flags: needinfo?(martin.thomson)
I think that I want your patches if they are that effective.I've landed a fix for the crash, but not the underlying problem, which I've been trying to reproduce unsuccessfully for most of this week.
Flags: needinfo?(martin.thomson)
(Assignee)

Comment 26

3 years ago
Please land part 1 from reviewboard and part 2 from bugzilla.

Try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=83fd58f54a8f

The failing test was since disabled, so we should be good here.

Leaving open for bas to land his patch when it's ready.

Thanks!
Keywords: checkin-needed, leave-open
Created attachment 8639136 [details] [diff] [review]
Part 1: Stop using DrawTargets off the main thread v2

Update with a bunch of fixes and made to pass try.
Attachment #8639136 - Flags: review?(matt.woodrow)
Comment on attachment 8639136 [details] [diff] [review]
Part 1: Stop using DrawTargets off the main thread v2

Review of attachment 8639136 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ImageContainer.cpp
@@ +649,5 @@
> +  RefPtr<DataSourceSurface> dataSurf = surface->GetDataSurface();
> +  textureClient->UpdateFromSurface(dataSurf);
> +
> +  if (NS_IsMainThread()) {
> +    textureClient->SyncWithObject(forwarder->GetSyncObject());

This is a bit weird because the TextureClient implicitly decides whether to use the sync object (or keyedmutex), but we're required to explicitly tell it to sync (sometimes).

It'd be a bit clearer if these two things matched, or if we had a textureClient->NeedsSync() method to check (rather than NS_IsMainThread which isn't very obvious).

Anyway, a comment about why we only do this for the main thread would be a great start.
Attachment #8639136 - Flags: review?(matt.woodrow) → review+
Keywords: leave-open
https://hg.mozilla.org/mozilla-central/rev/efb3e9a0b0b8
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox42: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
(Assignee)

Comment 35

3 years ago
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow

Approval Request Comment
[Feature/regressing bug #]: Canvas CaptureStream
[User impact if declined]: A live 30FPS canvas captureStream causes Firefox to crash after a couple of minutes. Depends a bit on the platform (timing sensitive) but has shown up frequently in automation, and been reported by users. Canvas captureStreams are protected by a pref and are not available by default.
[Describe test coverage new/current, TreeHerder]: Good coverage in automation that shows the issue as largely fixed. (occasional crashes have occurred, perhaps once a month, but not several a day like before this fix)
[Risks and why]: low. This patch only affects code behind a pref. Accompanying patches (part 2 on this bug and bug 1185011) are in very central code and thus well tested in automation.
[String/UUID change made/needed]: none
Attachment #8629783 - Flags: approval-mozilla-aurora?
(Assignee)

Comment 36

3 years ago
Comment on attachment 8634648 [details] [diff] [review]
Part 2. Allow mapping of SourceSurfaceRawData from multiple threads

Approval Request Comment
See comment 35.
Attachment #8634648 - Flags: approval-mozilla-aurora?
Matt, I was unable to find this crash in FF41 in the past 28 days. Do you see it? I am just wondering whether this change can just ride the train instead of getting uplifted to Aurora. Thoughts?
Flags: needinfo?(matt.woodrow)
I was never able to reproduce this, so unsure.
Flags: needinfo?(matt.woodrow)
The test failures in the dependent bugs definitely affect Aurora41.
Comment on attachment 8629783 [details]
MozReview Request: Bug 1176363 - Part 1. Make a raw copy of each Canvas::CaptureStream frame. r=mattwoodrow

Seems safe to uplift to Aurora as 1) this has been in m-c for a week and 2) by default CanvasStream is pref'd off.
Attachment #8629783 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

3 years ago
Attachment #8634648 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I was able to uplift Andreas' patches, but Bas' patch needs rebasing.

https://hg.mozilla.org/releases/mozilla-aurora/rev/ef164895b708
https://hg.mozilla.org/releases/mozilla-aurora/rev/ff1c96567f25
Flags: needinfo?(bas)
(Assignee)

Comment 42

3 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #41)
> I was able to uplift Andreas' patches, but Bas' patch needs rebasing.
> 
> https://hg.mozilla.org/releases/mozilla-aurora/rev/ef164895b708
> https://hg.mozilla.org/releases/mozilla-aurora/rev/ff1c96567f25

I only requested aurora approval for those because I don't think Bas' patch is necessary. As I take it it just asserts that no-one is using DrawTargets in the wrong way and thus is mainly needed for new offenders - existing issues would probably pop up as intermittent oranges anyway.

I'll let Bas chip in as well.
The crashes didn't go away for good until Bas' patch landed.
(Assignee)

Comment 44

3 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #43)
> The crashes didn't go away for good until Bas' patch landed.

Ah, got it. That's why there were some stray oranges for a few days..
See Also: → bug 1168762
See Also: → bug 1179551
(In reply to Andreas Pehrson [:pehrsons] (Telenor) from comment #42)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #41)
> > I was able to uplift Andreas' patches, but Bas' patch needs rebasing.
> > 
> > https://hg.mozilla.org/releases/mozilla-aurora/rev/ef164895b708
> > https://hg.mozilla.org/releases/mozilla-aurora/rev/ff1c96567f25
> 
> I only requested aurora approval for those because I don't think Bas' patch
> is necessary. As I take it it just asserts that no-one is using DrawTargets
> in the wrong way and thus is mainly needed for new offenders - existing
> issues would probably pop up as intermittent oranges anyway.
> 
> I'll let Bas chip in as well.

No, my patch fixes the actual problem :-). Matt's patches just make the crash significantly less likely to occur. Having said that, my patches are significantly more risky. They've had some testing but they at least need bug 1190950 uplifted as well. I'll leave it up to release management to determine if we want to do that uplift.
Flags: needinfo?(rkothari)
Flags: needinfo?(pehrsons)
Flags: needinfo?(bas)
Attachment #8633769 - Attachment is obsolete: true
(Assignee)

Comment 46

3 years ago
(In reply to Bas Schouten (:bas.schouten) from comment #45)
> No, my patch fixes the actual problem :-).

Right, not sure why I thought they were just asserting...
Flags: needinfo?(pehrsons)
Bas, can you please nominate the patch(s) from bug 1190950 for uplift to Beta? Without that tracking flag "approval-mozilla-aurora/beta" set to "?" I cannot do much.
Flags: needinfo?(rkothari) → needinfo?(bas)
Done.
Flags: needinfo?(bas)
Depends on: 1190950
Bas, were you still going to look at this rebase for Beta41?
Flags: needinfo?(bas)
Hi Bas and Matt,

I have a question about the main thread assertion in TextureClient::BorrowDrawTarget()[1].

If we have off-main-thread painting, we still need to call the BorrowDrawTarget() at painting thread and hit the assertion.

With [2], if we borrow the drawTarget and get the snapshot at main thread, we still have the data race problem when we access the snapshot at another thread.

Any suggestion for calling borrowTarget() at painting thread?

[1]
https://hg.mozilla.org/mozilla-central/annotate/a6786bf8d71d4cf40c3d40e06d8e3c9866863475/gfx/layers/client/TextureClient.cpp#l836

[2]
https://bugzilla.mozilla.org/attachment.cgi?id=8639136
Flags: needinfo?(matt.woodrow)
Jerry, we should move this to a new bug, since fixing it might take a bit of work.

BorrowDrawTarget asserts that it's only called on the main thread because Moz2D isn't really threadsafe and some backends have issues with using it across threads (especially d2d).

We're only going to be using a DrawTarget (and snapshots of it) from a single thread, so this should be solvable fairly easy. D2D in particular will need to be setup with a separate device for each thread that it is used on.
Flags: needinfo?(matt.woodrow)
status-firefox41: affected → wontfix
Flags: needinfo?(bas)

Updated

2 years ago
Depends on: 1246410

Updated

2 years ago
No longer depends on: 1246410
You need to log in before you can comment on or make changes to this bug.