Closed Bug 916128 Opened 6 years ago Closed 6 years ago

Canvas2D crash [@nsXPCWrappedJS::Release]

Categories

(Core :: Canvas: 2D, defect, critical)

x86_64
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla27
Tracking Status
firefox26 --- unaffected
firefox27 --- verified
firefox-esr24 --- unaffected
b2g18 --- unaffected

People

(Reporter: posidron, Assigned: spohl)

References

(Blocks 1 open bug)

Details

(Keywords: crash, sec-high, testcase)

Attachments

(4 files, 5 obsolete files)

Attached file testcase
I believe this depends on bug 817700 as well.

Tested with https://hg.mozilla.org/integration/mozilla-inbound/rev/3c7913e853b8
Attached file callstack
See Also: → 914966
Assignee: nobody → spohl.mozilla.bugs
Regression window(m-i)
Good:
http://hg.mozilla.org/integration/mozilla-inbound/rev/5b89a959713b
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130912125238
Bad:
http://hg.mozilla.org/integration/mozilla-inbound/rev/e375ecffbc98
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:26.0) Gecko/20100101 Firefox/26.0 ID:20130912140538
Pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=5b89a959713b&tochange=e375ecffbc98
Looks like this is caused by FileCallback not being thread-safe:
Hit MOZ_CRASH(CallbackObject not thread-safe) at /Users/Stephen/Documents/mozilla-central/dom/bindings/CallbackObject.cpp:31

Thread 45 Crashed:
0   XUL                           	0x00000001046ad840 mozilla::dom::CallbackObject::Release() + 224 (CallbackObject.cpp:31)
1   XUL                           	0x0000000102405534 nsRefPtr<mozilla::dom::FileCallback>::~nsRefPtr() + 52 (nsAutoPtr.h:887)
2   XUL                           	0x00000001024054f5 nsRefPtr<mozilla::dom::FileCallback>::~nsRefPtr() + 21 (nsAutoPtr.h:888)
3   XUL                           	0x0000000102405495 mozilla::dom::EncodingCompleteEvent::~EncodingCompleteEvent() + 53 (ImageEncoder.cpp:28)
4   XUL                           	0x0000000102404aa5 mozilla::dom::EncodingCompleteEvent::~EncodingCompleteEvent() + 21 (ImageEncoder.cpp:28)
5   XUL                           	0x0000000102404ac9 mozilla::dom::EncodingCompleteEvent::~EncodingCompleteEvent() + 25 (ImageEncoder.cpp:28)
6   XUL                           	0x0000000102403415 mozilla::dom::EncodingCompleteEvent::Release() + 485 (ImageEncoder.cpp:68)
7   XUL                           	0x00000001024052c1 nsRefPtr<mozilla::dom::EncodingCompleteEvent>::~nsRefPtr() + 49 (nsAutoPtr.h:887)
8   XUL                           	0x0000000102404745 nsRefPtr<mozilla::dom::EncodingCompleteEvent>::~nsRefPtr() + 21 (nsAutoPtr.h:888)
9   XUL                           	0x0000000102405355 mozilla::dom::EncodingRunnable::~EncodingRunnable() + 53 (ImageEncoder.cpp:92)
10  XUL                           	0x00000001024050f5 mozilla::dom::EncodingRunnable::~EncodingRunnable() + 21 (ImageEncoder.cpp:92)
11  XUL                           	0x0000000102405119 mozilla::dom::EncodingRunnable::~EncodingRunnable() + 25 (ImageEncoder.cpp:92)
12  XUL                           	0x00000001024037d5 mozilla::dom::EncodingRunnable::Release() + 485 (ImageEncoder.cpp:147)
13  XUL                           	0x0000000101563f1b nsCOMPtr<nsIRunnable>::~nsCOMPtr() + 91 (nsCOMPtr.h:514)
14  XUL                           	0x00000001015623b5 nsCOMPtr<nsIRunnable>::~nsCOMPtr() + 21 (nsCOMPtr.h:515)
15  XUL                           	0x0000000104779333 nsThread::ProcessNextEvent(bool, bool*) + 1875 (nsThread.cpp:630)
16  XUL                           	0x00000001046d1519 NS_ProcessNextEvent(nsIThread*, bool) + 169 (nsThreadUtils.cpp:238)
17  XUL                           	0x0000000104777c17 nsThread::ThreadFunc(void*) + 295 (nsThread.cpp:250)
18  libnss3.dylib                 	0x00000001012383f5 _pt_root + 357
19  libsystem_c.dylib             	0x00007fff968687a2 _pthread_start + 327
20  libsystem_c.dylib             	0x00007fff968551e1 thread_start + 13
On MacOS with a non-debug/opt build this produces now a stack-buffer-underflow.

Two days ago it only returned a null-deref, yesterday it was not reproducible at all. 

Will mark this as s-s.
Group: core-security
It seems right to back out bug 817700 until this issue (and bug 914966) is resolved. If I don't hear otherwise, I'll go ahead with the backout today.
Yeah, EncodingCompleteEvent can't assume that it will be destroyed on the main thread since the main thread could run it before NS_DispatchToMainThread returns on the background thread.

The solution is to clear mCallback (and mScriptContext) once we're done with them at http://hg.mozilla.org/mozilla-central/annotate/87c4a21ed158/content/canvas/src/ImageEncoder.cpp#l45.

Happy to review here.
Thanks Kyle! I get the impression that we shouldn't be calling Release() on mCallback from the encoding thread at all. Unfortunately, we do this indirectly via the EncodingRunnable destructor (see stack in comment 3). Clearing mCallback and mScriptContext like you suggested wasn't sufficient in my local run.

Should I try to make CallbackObject thread-safe (not sure how difficult that would be), store a raw FileCallback* instead of an nsRefPtr in EncodingCompleteEvent, or should I try something different? Before bug 911575 we used to store an nsCOMPtr<nsIFileCallback> which did not exhibit this issue. Bug 911575 replaced the use of nsIFileCallback with FileCallback.
Flags: needinfo?(khuey)
(In reply to Stephen Pohl [:spohl] from comment #7)
> Thanks Kyle! I get the impression that we shouldn't be calling Release() on
> mCallback from the encoding thread at all.

That's correct.

> Unfortunately, we do this
> indirectly via the EncodingRunnable destructor (see stack in comment 3).
> Clearing mCallback and mScriptContext like you suggested wasn't sufficient
> in my local run.

Er, why not?  What went wrong after that?

> Should I try to make CallbackObject thread-safe (not sure how difficult that
> would be)

No.  That's not possible

> , store a raw FileCallback* instead of an nsRefPtr in EncodingCompleteEvent,

That'll be a different sg:crit.

> or should I try something different? Before bug
> 911575 we used to store an nsCOMPtr<nsIFileCallback> which did not exhibit
> this issue. Bug 911575 replaced the use of nsIFileCallback with FileCallback.

It just didn't exhibit the problem yet.  We're going to make XPCWrappedJS main-thread-only at some point.
Flags: needinfo?(khuey)
> Clearing mCallback and mScriptContext like you suggested wasn't sufficient 

Hmm.  Why not?

> Should I try to make CallbackObject thread-safe

You can't: it's cycle-collected.

> store a raw FileCallback* instead of an nsRefPtr in EncodingCompleteEvent

And proxy-release on mainthread?  Better to use nsMainThreadPtrHolder if we have to.

> Before bug 911575 we used to store an nsCOMPtr<nsIFileCallback> which did not exhibit
> this issue

Maybe.  See bug 771074: it could trigger data races which would corrupt memory.  The new code just does a MOZ_CRASH instead of silently scribbling on non-threadsafe data structures.
Note that nsMainThreadPtrHolder might be the simple-and-principled way to do this...
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #8)
> (In reply to Stephen Pohl [:spohl] from comment #7) 
> > Unfortunately, we do this
> > indirectly via the EncodingRunnable destructor (see stack in comment 3).
> > Clearing mCallback and mScriptContext like you suggested wasn't sufficient
> > in my local run.
> 
> Er, why not?  What went wrong after that?

We continue to call Release() on mCallback via the EncodingRunnable destructor (same stack as comment 3). I'd say this causes problems when the EncodingRunnable object is done and gets destroyed before the EncodingCompleteEvent's Run() method could get as far as clearing mCallback. The console prints:
Hit MOZ_CRASH(CallbackObject not thread-safe) at /Users/Stephen/Documents/mozilla-central/dom/bindings/CallbackObject.cpp:31

> > , store a raw FileCallback* instead of an nsRefPtr in EncodingCompleteEvent,
> 
> That'll be a different sg:crit.

I figured you'd say that...

(In reply to Boris Zbarsky [:bz] from comment #9)
> > store a raw FileCallback* instead of an nsRefPtr in EncodingCompleteEvent
> 
> And proxy-release on mainthread?  Better to use nsMainThreadPtrHolder if we
> have to.

I shall try this route.

> > Before bug 911575 we used to store an nsCOMPtr<nsIFileCallback> which did not exhibit
> > this issue
> 
> Maybe.  See bug 771074: it could trigger data races which would corrupt
> memory.  The new code just does a MOZ_CRASH instead of silently scribbling
> on non-threadsafe data structures.

Ah, I see. Thanks!
> EncodingRunnable object is done and gets destroyed before the EncodingCompleteEvent's
> Run() method could ge

The EncodingRunnable should transfer ownership to the EncodingCompleteEvent, I'd think.
It looks like I've been misunderstanding the meaning of NS_ENSURE_SUCCESS. I thought this was a debug-only check, similar to an assertion and didn't realize it would return early when encountering a failure. In this particular case, we return early after a (seemingly expected) failed call to ImageEncoder::ExtractDataInternal here:
http://hg.mozilla.org/mozilla-central/annotate/87c4a21ed158/content/canvas/src/ImageEncoder.cpp#l118

We end up never dispatching the complete event to the main thread and therefore release mCallback from the encoder thread.

This was one of the tougher ways for me to learn the meaning of NS_ENSURE_SUCCESS... :-(
Attached patch Patch (obsolete) — Splinter Review
This fixes the issue for me.
Attachment #804686 - Flags: review?(khuey)
Comment on attachment 804686 [details] [diff] [review]
Patch

Ah, snap. Forgot to shut down the encoder thread in the failure case.
Attachment #804686 - Flags: review?(khuey)
Attached patch Patch (obsolete) — Splinter Review
Made sure to shut down the worker thread on failure as well.
Attachment #804686 - Attachment is obsolete: true
Attachment #804695 - Flags: review?(khuey)
Comment on attachment 804695 [details] [diff] [review]
Patch

Review of attachment 804695 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/ImageEncoder.cpp
@@ +130,5 @@
> +    if (NS_FAILED(rv)) {
> +      mEncodingCompleteEvent->SetFailed();
> +      NS_DispatchToMainThread(mEncodingCompleteEvent, NS_DISPATCH_NORMAL);
> +      return rv;
> +    }

You should wrap this entire function with something like.

nsresult InnerDoWork(args);

nsresult DoWork(args) {
  rv = InnerDoWorker(args);
  if (NS_FAILED(rv)) {
    mEncodingCompleteEvent->SetFailed();
  }
  NS_DispatchToMainThread(mEncodingCompleteEvent);
  return rv;
}

@@ +158,5 @@
>      rv = NS_DispatchToMainThread(mEncodingCompleteEvent, NS_DISPATCH_NORMAL);
> +    if (NS_FAILED(rv)) {
> +      mEncodingCompleteEvent->SetFailed();
> +      NS_DispatchToMainThread(mEncodingCompleteEvent, NS_DISPATCH_NORMAL);
> +      return rv;

You definitely don't want to try to dispatch again ...
Attachment #804695 - Flags: review?(khuey) → review-
Attached patch Patch (obsolete) — Splinter Review
Thanks Kyle! Addressed feedback.
Attachment #804695 - Attachment is obsolete: true
Attachment #804785 - Flags: review?(khuey)
Comment on attachment 804785 [details] [diff] [review]
Patch

Review of attachment 804785 [details] [diff] [review]:
-----------------------------------------------------------------

::: content/canvas/src/ImageEncoder.cpp
@@ +49,4 @@
>      }
>  
> +    mScriptContext = nullptr;
> +    mCallback = nullptr;

Add a comment about why you are explicitly setting these to null.

@@ +153,2 @@
>      rv = NS_DispatchToMainThread(mEncodingCompleteEvent, NS_DISPATCH_NORMAL);
>      NS_ENSURE_SUCCESS(rv, rv);

Not that NS_DispatchToMainThread really fails, but this should be

if (NS_FAILED(rv)) {
  // Better to leak than to crash.
  mEncodingCompleteEvent.forget();
  return rv;
}
Attachment #804785 - Flags: review?(khuey) → review+
Attached patch PatchSplinter Review
Thanks, Kyle! Addressed feedback. Carrying over r+.

I will land this at the same time as bug 817700 once we're ready to land again.
Attachment #804785 - Attachment is obsolete: true
Attachment #805307 - Flags: review+
Keywords: sec-high
Attached patch Test (obsolete) — Splinter Review
Adds crashtest.
Attachment #810047 - Flags: review?(khuey)
Comment on attachment 810047 [details] [diff] [review]
Test

Review of attachment 810047 [details] [diff] [review]:
-----------------------------------------------------------------

Please name the test 916128-1.html.  Also please credit Christoph Diehl as the author of this commit.
Attachment #810047 - Flags: review?(khuey) → review+
Attached patch TestSplinter Review
Addressed feedback, carrying over r+.

Sorry, didn't mean to take credit for the test. Thanks for catching it.

I chose the descriptive file name because of the discussion in firefox-dev[1] and dev-planning[2], but actually prefer a name with just a bug number myself. I'm fine either way.

[1] https://groups.google.com/forum/#!topic/firefox-dev/RIGHJHpxcIY
[2] https://groups.google.com/forum/#!msg/mozilla.dev.planning/HlOyIYwWeb8/NUH1cUa2cgYJ
Attachment #810047 - Attachment is obsolete: true
Attachment #810535 - Flags: review+
The test (attachment 804478 [details]) is now failing at http://mxr.mozilla.org/mozilla-central/source/content/html/content/src/HTMLCanvasElement.cpp#600, an assert introduced with bug 916322. The context's and the canvas' sizes are mismatched again.

At the time of failure we have:
elementSize = { width = 0, height = 150 }
mCurrentContext->GetWidth() = 1
mCurrentContext->GetHeight() = 1

Milan, are you able to tell what's going on here?
Flags: needinfo?(milan)
Even when we pass width (or height) zero to the context, we insist on having a minimum 1x1 size, as 0 size "causes problems" according to the code comments.  This would explain the 0 vs. 1.  I'm checking on the 150 vs. 1.
Not sure this is a right thing to do, but let me know if this actually fixes the problem.
Attachment #813597 - Attachment description: WIP: Since the contexts don't allow <=0 size, don't allow that in the canvas element either. → Since the contexts don't allow <=0 size, don't allow that in the canvas element either.
Milan, this fixes the issue in my testing! mCurrentContext->GetHeight() now also correctly returns 150. Thanks!
Flags: needinfo?(milan)
Great.  Let's wait for the review to see if this is the right approach.  May take a few days because of the summit.
Comment on attachment 813597 [details] [diff] [review]
Since the contexts don't allow <=0 size, don't allow that in the canvas element either.

Review of attachment 813597 [details] [diff] [review]:
-----------------------------------------------------------------

If this is in line with what the specs wants us to do (which I don't know), this looks fine to me.
Attachment #813597 - Flags: review?(bas) → review+
(In reply to Bas Schouten (:bas.schouten) from comment #31)
> Comment on attachment 813597 [details] [diff] [review]
> Since the contexts don't allow <=0 size, don't allow that in the canvas
> element either.
> 
> Review of attachment 813597 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> If this is in line with what the specs wants us to do (which I don't know),
> this looks fine to me.

The spec calls for "valid non-negative integer", so we should be covered.
Stephen, looks like you should be able to try again - just make sure you grab my patch along the way.
Flags: needinfo?(spohl.mozilla.bugs)
Looks like this will get back out.  Let me spin off the patch I put in into a separate bug and make sure that lands, as it did cause some issues in the tests that were expecting it to fail.
(In reply to Milan Sreckovic [:milan] from comment #32)
> (In reply to Bas Schouten (:bas.schouten) from comment #31)
> > Comment on attachment 813597 [details] [diff] [review]
> > Since the contexts don't allow <=0 size, don't allow that in the canvas
> > element either.
> > 
> > Review of attachment 813597 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > If this is in line with what the specs wants us to do (which I don't know),
> > this looks fine to me.
> 
> The spec calls for "valid non-negative integer", so we should be covered.

I have no idea how "valid non-negative integer" suggested to me that it's OK to exclude zero.
Green try with the patch from bug 924573 and the test above.
Comment on attachment 813597 [details] [diff] [review]
Since the contexts don't allow <=0 size, don't allow that in the canvas element either.

Different patch in bug 924573.
Attachment #813597 - Flags: review+ → review-
Try run with all patches here, plus patches in bug 817700 and bug 924573 applied. Bug 924573 has already landed on inbound. This bug plus bug 817700 will land simultaneously, assuming try will be green:
https://tbpl.mozilla.org/?tree=Try&rev=8246ac236c59
If bug 924573 sticks, it'll be worth doing a try run (do not take my r- patch above.)
(In reply to Milan Sreckovic [:milan] from comment #41)
> If bug 924573 sticks, it'll be worth doing a try run (do not take my r-
> patch above.)

Yup. I've manually applied the patch from bug 924573 instead of the one here. Try run is referenced in comment 40. I'll push bug 817700 and this bug here once bug 924573 is green on inbound and the try run in comment 40 succeeded.
Comment on attachment 813597 [details] [diff] [review]
Since the contexts don't allow <=0 size, don't allow that in the canvas element either.

Marking this patch as obsolete since a new patch landed with bug 924573 and to avoid confusion.
Attachment #813597 - Attachment is obsolete: true
fixed on mozilla-central
https://hg.mozilla.org/mozilla-central/rev/18d72b851345
https://hg.mozilla.org/mozilla-central/rev/3b3eb94009e6
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Keywords: verifyme
Cleaning up list of security bugs for b2g18. This bug doesn't need to be backported either due to it affecting a later version of Fx or another reason.
verified with Nightly build 20131024030204
Status: RESOLVED → VERIFIED
Keywords: verifyme
Group: core-security
Blocks: 1289929
You need to log in before you can comment on or make changes to this bug.