Closed Bug 876859 Opened 11 years ago Closed 11 years ago

GetThebesSurfaceForDrawTarget leaks the DrawTarget

Categories

(Core :: Graphics, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bjacob, Unassigned)

References

Details

Talking about an actual leak reported by Valgrind while running Canvas 2D tests with Skia/GL, see bug 875218 comment 1.

Here is the code for GetThebesSurfaceForDrawTarget in gfxPlatform.cpp:

already_AddRefed<gfxASurface>
gfxPlatform::GetThebesSurfaceForDrawTarget(DrawTarget *aTarget)
{
  if (aTarget->GetType() == BACKEND_CAIRO) {
    cairo_surface_t* csurf =
      static_cast<cairo_surface_t*>(aTarget->GetNativeSurface(NATIVE_SURFACE_CAIRO_SURFACE));
    return gfxASurface::Wrap(csurf);
  }

  // The semantics of this part of the function are sort of weird. If we
  // don't have direct support for the backend, we snapshot the first time
  // and then return the snapshotted surface for the lifetime of the draw
  // target. Sometimes it seems like this works out, but it seems like it
  // might result in no updates ever.
  RefPtr<SourceSurface> source = aTarget->Snapshot();
  RefPtr<DataSourceSurface> data = source->GetDataSurface();

  if (!data) {
    return NULL;
  }

  IntSize size = data->GetSize();
  gfxASurface::gfxImageFormat format = OptimalFormatForContent(ContentForFormat(data->GetFormat()));


  nsRefPtr<gfxASurface> surf =
    new gfxImageSurface(data->GetData(), gfxIntSize(size.width, size.height),
                        data->Stride(), format);

  surf->SetData(&kDrawSourceSurface, data.forget().drop(), DataSourceSurfaceDestroy);
  // keep the draw target alive as long as we need its data
  aTarget->AddRef();
  surf->SetData(&kDrawTargetForSurface, aTarget, DataDrawTargetDestroy);

  return surf.forget();
}

This relies on the AddRef() here to be matched by a Release() call made by DataDrawTargetDestroy which is supposed to be called back by Cairo when the Cairo surface goes away.

However, let's look at gfxASurface::SetData() in gfxASurface.cpp:

void
gfxASurface::SetData(const cairo_user_data_key_t *key,
                     void *user_data,
                     thebes_destroy_func_t destroy)
{
    if (!mSurfaceValid)
        return;
    cairo_surface_set_user_data(mSurface, key, user_data, destroy);
}

Setting a breakpoint there showed that everytime this is called while running Canvas 2D tests, |mSurfaceValid| is false, and thus this function returns without calling cairo_surface_set_user_data. So our DataDrawTargetDestroy callback is never called.

Short-term fix suggestion: figure why mSurfaceValid is false. We just constructed that surface above in GetThebesSurfaceForDrawTarget, so this has got to have a simple explanation.

Better yet: stop being crazy with these manual AddRef and Release. Was the author of this code smoking crack? Relying on Cairo to call a callback to call Release() to match an AddRef() that we did ourselves --- seriously? In a sane world we wouldn't be doing any AddRef() and Release() here and would instead have a strong pointer to let the Surface keep the DrawTarget alive.

Apparently the craziness started with:

# HG changeset patch
# User Jeff Muizelaar <jmuizelaar@mozilla.com>
# Date 1326466109 18000
# Node ID 044c107a719f16795f2eeb0fb2605530b6839963
# Parent  d3b761d31434ee736a92969360df0d4a5df9fe5e
Bug 717921. Only have one thebes surface. r=bas
Alright, so Jeff convinced me that this pattern is forced by Cairo, as we are passing it a memory buffer that it may access arbitrarily late, and the only way that it informs us of when it is done with that buffer is a callback that it calls on us.

So in my book, it's the Cairo authors who are smoking crack. But according to Jeff, this pattern exists also in Direct2D.

Anyway, we need to understand why our surface here is invalid; we should also fix gfxASurface::SetData to not silently do nothing when passed an invalid surface. It should either return bool, or assert.
It seems that we will have to do with the callback for now:
 - once we use only Azure for everything including content, we'll be able to rely on that to sort out the ownership within Azure;
 - alternatively, taking a step back, the basic problem here is that we lack a system-wide bitmap surface class. If instead of passing plain bitmap pointers we could pass something like android::GraphicBuffer pointers, all sides would be able to get reference counting right without manual addref'ing and release'ing.

Back to the short-term fix:

The problem is that SourceSurfaceSkia::GetData() wants to use mBitmap but here the interesting stuff is in mCanvas, not in mBitmap. So SourceSurfaceSkia::GetData() returns null.
The SourceSurfaceSkia was initialized by InitFromCanvas. aOwner was non-null, so it only initialized mCanvas and not mBitmap.

So what's the fix --- the comment on InitFromCanvas sounds like it should be to call DrawTargetWillChange, but it doesn't seem that this will pull any data from mCanvas?
(In reply to Benoit Jacob [:bjacob] from comment #2)
> Back to the short-term fix:
> 
> The problem is that SourceSurfaceSkia::GetData() wants to use mBitmap but
> here the interesting stuff is in mCanvas, not in mBitmap. So
> SourceSurfaceSkia::GetData() returns null.

I don't understand. With software skia, mBitmap has the bits for the canvas. With SkiaGL, it does a readback when you lockPixels, as GetData does.
(In reply to Benoit Jacob [:bjacob] from comment #1)
> Alright, so Jeff convinced me that this pattern is forced by Cairo, as we
> are passing it a memory buffer that it may access arbitrarily late, and the
> only way that it informs us of when it is done with that buffer is a
> callback that it calls on us.
> 
> So in my book, it's the Cairo authors who are smoking crack. But according
> to Jeff, this pattern exists also in Direct2D.
> 
> Anyway, we need to understand why our surface here is invalid; we should
> also fix gfxASurface::SetData to not silently do nothing when passed an
> invalid surface. It should either return bool, or assert.

I still don't really understand how this justifies the SetData hack. Why can't we have a gfxAzureSurface or something which wraps a SourceSurface and holds a strong reference itself?
Alright. With the patch for bug 848482 applied, we never have null data now, so this leak does not occur. I think we should fortify this path a little, though, so we don't leak when we SetData on an invalid surface.
(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #5)
> I still don't really understand how this justifies the SetData hack. Why
> can't we have a gfxAzureSurface or something which wraps a SourceSurface and
> holds a strong reference itself?

Please have this conversation directly with Jeff M.

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #4)
> I don't understand. With software skia, mBitmap has the bits for the canvas.
> With SkiaGL, it does a readback when you lockPixels, as GetData does.

I was referring to the state of the code before your recent patches. There, the absence of a mBitmap really caused GetData to fail.
This has been fixed for a little while now. Not sure whether what fixed it was Snorp's SourceSurfaceSkia fixes, or the ownership-refactoring of DrawTargetSkia. But it's fixed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.