Open Bug 1092837 Opened 10 years ago Updated 2 years ago

Setting an Image element's src to a Blob or data URI should allow synchronous retrieval of decoded image data

Categories

(Core :: Graphics: ImageLib, defect)

defect

Tracking

()

People

(Reporter: till, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(2 files, 2 obsolete files)

Currently, when an Image gets a blob or data URI as its source, this sequence of events is triggered:
1. in the event turn where `src` was set: nothing
2. in a later event turn: the Image gets an internal `ready` flag set to `true`, and `onload` is triggered. Critically, the image is not decoded at this point
3. if at a later point, but perhaps as soon as from under the onload handler, content script requests decoded image data (e.g. by drawing it into a canvas), the image is decoded synchronously

(I'm not 100% sure of the details in step 2. The important part is that decoding doesn't *necessarily* happen asynchronously and before `onload` is triggered.)

Given that no network requests are involved and that we already block on image decoding at the point where image data is requested and don't do much besides setting a flag before we trigger `onload`, I propose to keep steps 1 and 2 as-is, but enable step 3 immediately. I.e., make accessing the decoded data possible synchronously right after `src` has been set.

Seth, does this capture what we talked about a few days ago?


The immediate reason for filing this bug is that, without having something to the same effect, it's impossible to fully implement Flash's SWF parsing semantics without shipping a full image decoding library in JS with Shumway: Flash guarantees that whenever a byte of data has been loaded, its contents are available for full consumption by content, including access to contained images' pixel data.
Flags: needinfo?
/me grumbles about people misunderstanding the [:nick] thing in bugzilla
Flags: needinfo? → needinfo?(seth)
So baku has been kind enough to agree to look at this bug. I understand he's a blob expert! =)

I'm going to monitor the bug and try to help out with ImageLib issues. Please needinfo me (or ping me on IRC) when questions arise; that'll make me notice faster.

We have a jsbin here that demonstrates the problem:

http://jsbin.com/tehofewoxa/2/edit?html,js,output

It also contains code for the data URI case; I'd be happy to put together a more minimal demonstration if you'd like.
Flags: needinfo?(seth)
Investigating this issue I found these issues:

1. a blob could not have all the data available. For instance a blob can point to a real file, or it can be a remote blob. In this case the data is received from the main-thread using IPC.

2. the reasonwhy dataURLs are immediately shown is because the data is in cache. Changing the script to produce different dataURLs all the time, the canvas output will be broken as it is with blobs.
With 'onload' this problem goes away.

Makes sense?
Flags: needinfo?(seth)
(In reply to Andrea Marchesini (:baku) from comment #3)
> 1. a blob could not have all the data available. For instance a blob can
> point to a real file, or it can be a remote blob. In this case the data is
> received from the main-thread using IPC.

I'm not sure I understand. Is a 'remote blob' a blob that needs to receive its data via IPC, or a blob that needs to receive its data from the network?

We're running the canvas drawing code on the main thread as well. In theory it should be no problem to block for receiving data as long as it's coming via IPC and not over the network. (Drawing an image to canvas is a synchronous process that can block the main thread for other reasons anyway.) I can imagine in practice it could be problematic if we need to pump the event loop to receive the data.

> 2. the reasonwhy dataURLs are immediately shown is because the data is in
> cache. Changing the script to produce different dataURLs all the time, the
> canvas output will be broken as it is with blobs.
> With 'onload' this problem goes away.

Ah yes, I remember that we had discovered this before, but then I forgot about it! We should absolutely fix that case as well. There are clearly no inherent difficulties in the case of data URIs.
Flags: needinfo?(seth)
Attached patch image.patch (obsolete) — Splinter Review
This patch works, but I don't know if this is a good approach for this issue.
The basic idea is that, if the image is a data or a blob URI, we do a sync loading, calling the ProxyListener callbacks 1 by 1 in the correct order.
Attachment #8532860 - Flags: review?(seth)
Comment on attachment 8532860 [details] [diff] [review]
image.patch

Review of attachment 8532860 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good! Thanks again for tackling this, Andrea!

There are some minor issues which need fixing (see the comments below) but I mainly want to hold off on r+'ing until we get some feedback from the Necko team. Let's make sure there are no hidden gotchas we're missing.

Jason, could you please take a look at this or forward the f? to the appropriate person?

::: dom/canvas/test/test_sync_loading.html
@@ +8,5 @@
> +SimpleTest.waitForExplicitFinish();
> +
> +function testDataURL() {
> +  var canvas = document.createElement('canvas');
> +  canvas.style.border = '1px solid red';

I'm not sure whether we should use red here, because usually in our tests red indicates some sort of failure, and green indicates success. But here the border colors just tell you which canvas is which, right? Maybe using colors other than red and green would be better.

Also, it might be nice to add some comments to this function to explain in words what it does.

@@ +39,5 @@
> +    ctx3.drawImage(img2, 0, 0);
> +    var dataURL3 = canvas3.toDataURL();
> +
> +    is(dataURL1, dataURL2, "Data1 and data2 should match");
> +    is(dataURL1, dataURL3, "Data1 and data3 should match");

Should change the message here so that it's obvious whether the data URL test or the blob URL test failed.

This and my previous comment apply to testBlobURL, too.

::: image/src/imgLoader.cpp
@@ +2013,5 @@
>    // Keep the channel in this scope, so we can adjust its notificationCallbacks
>    // later when we create the proxy.
>    nsCOMPtr<nsIChannel> newChannel;
> +
> +  // Usually the loading of an image is done asynchrounsly, except for blob and

Nit: "asynchronously"

@@ +2100,5 @@
> +    if (NS_WARN_IF(NS_FAILED(rv))) {
> +      return rv;
> +    }
> +
> +    if (scheme.Equals("blob") || scheme.Equals("data")) {

nsIURI has a SchemeIs method that you use instead of GetScheme/Equals here.

@@ +2104,5 @@
> +    if (scheme.Equals("blob") || scheme.Equals("data")) {
> +      rv = newChannel->Open(getter_AddRefs(syncInputStream));
> +      if (NS_WARN_IF(NS_FAILED(rv))) {
> +        return rv;
> +      }

Probably we should handle failure in the same way (call CancelAndAbort, etc.) whether we call Open or AsyncOpen, right? Or is there a reason not to?

@@ +2173,5 @@
> +
> +  if (syncInputStream) {
> +    // This RAII class helps to abort the request in case one of the following
> +    // operations fails.
> +    class MOZ_STACK_CLASS RAII {

Nice! It'd be better to define this outside of this function, though, and give it a clearer name. Seems like we could use the same class with the Open/AsyncOpen code above.

@@ +2211,5 @@
> +
> +    rv = listener->OnStopRequest(newChannel, nullptr, NS_OK);
> +    if (NS_WARN_IF(NS_FAILED(rv))) {
> +      return rv;
> +    }

This set of calls looks like it should be its own function to me. Maybe "PumpSynchronousRequest"?
Attachment #8532860 - Flags: feedback?(jduell.mcbugs)
Just pulling this out so it's easier to see:

(In reply to Seth Fowler [:seth] from comment #6)
> Jason, could you please take a look at this or forward the f? to the
> appropriate person?
Attached patch image.patch (obsolete) — Splinter Review
Attachment #8532860 - Attachment is obsolete: true
Attachment #8532860 - Flags: review?(seth)
Attachment #8532860 - Flags: feedback?(jduell.mcbugs)
Attachment #8533436 - Flags: review?(seth)
Attachment #8533436 - Flags: feedback?(jduell.mcbugs)
So after thinking about this patch more I have further questions. (Some of which we may have already covered in face-to-face discussions, but the last week has been a whirlwind and I don't remember the answers to everything.)

Andrea, you mentioned that some blobs are remote blobs (which have to be transferred from the main thread using IPC). Is that the only case where we might need to block for a long time for the data to be available? (And, indeed, *can* we safely block in that case, or do we need to pump the event loop to receive the data?) Can we tell when these slow cases are going to happen, and perhaps have a different policy for those cases?

Jason, is it safe to synchronously deliver Necko notifications (like imgLoader::PumpSynchronousRequest does in Andrea's patch) from off-main-thread? We don't need that right now, but eventually it'd be nice to deliver these notifications off-main-thread and simply block if we need to draw before all the notifications are delivered. ImageLib is not ready for that architecturally, but we may get there, and it'd help me in evaluating this patch and planning for the future if I knew whether Necko can deal with that.
Flags: needinfo?(amarchesini)
Flags: needinfo?(jduell.mcbugs)
(To be clear, Andrea's patch does *not* currently deliver those notifications from off-main-thread.)
I'm confused about why we'd read from a Blob synchronously if it's a file (presumably that could block on disk I/O?).  But I'm no Blob expert.

> is it safe to synchronously deliver Necko notifications (like imgLoader::PumpSynchronousRequest does in Andrea's patch) from off-main-thread?

You might get away with it if you can be sure that the listener class supports it, but there are some subtleties. Any addon can currently intersperse itself as a listener:

   http://mxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsITraceableChannel.idl#15

The way we "solved" this for off-main delivery of OnDataAvailable (ODA is the only function we support delivering off-main right now: OnStart/Stop always go to main thread) is to check the listener chain to ensure all listeners implement nsIThreadRetargetableStreamListener. Addons don't, so in that case we revert to main-thread delivery.  That logic is already part of ImgRequest:

   http://mxr.mozilla.org/mozilla-central/source/image/src/imgRequest.cpp#720

I would love to ban nsITraceableChannel usage, but until we do, we can't do off-main thread without checks like these.

I forget the exact reasons why it was hard to get off-main OnStartRequest working.  If you know your use case supports it, you could give it a try--the issue here is not whether "necko is OK with it", it's whether the listener is OK with it.

> eventually it'd be nice to deliver these notifications off-main-thread and simply 
> block if we need to draw before all the notifications are delivered. 

I don't understand this use case--you'd block the main thread to wait for the off-main thread calls to happen? (my knowledge of imglib is thin :)
Flags: needinfo?(jduell.mcbugs)
Comment on attachment 8533436 [details] [diff] [review]
image.patch

Review of attachment 8533436 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/src/imgLoader.cpp
@@ +2131,5 @@
> +      return rv;
> +    }
> +
> +    if (isBlob || isData) {
> +      rv = newChannel->Open(getter_AddRefs(syncInputStream));

As per my previous comment, my only question is whether it's ok to do sync Open() on a Blob, i.e. could it block on I/O (or IPDL), and is that ok?

@@ +2462,5 @@
> +    return rv;
> +  }
> +
> +  rv = aListener->OnStartRequest(aChannel, nullptr);
> +  if (NS_WARN_IF(NS_FAILED(rv))) {

You can't simply return here: a listener returning something other than NS_OK causes a channel to be logically canceled, but the listener contract is that if AsyncOpen succeeds, we will call both OnStartRequest and OnStopRequest.  So instead of returning, you need to skip OnDataAvailable and then call OnStopRequest with the error rv code as the status.

@@ +2468,5 @@
> +  }
> +
> +  rv = aListener->OnDataAvailable(aChannel, nullptr, aInputStream, 0, size);
> +  if (NS_WARN_IF(NS_FAILED(rv))) {
> +    return rv;

Same here--don't return:  instead call OnStop with rv

@@ +2471,5 @@
> +  if (NS_WARN_IF(NS_FAILED(rv))) {
> +    return rv;
> +  }
> +
> +  rv = aListener->OnStopRequest(aChannel, nullptr, NS_OK);

pass 'rv' instead of NS_OK.

@@ +2472,5 @@
> +    return rv;
> +  }
> +
> +  rv = aListener->OnStopRequest(aChannel, nullptr, NS_OK);
> +  if (NS_WARN_IF(NS_FAILED(rv))) {

Not sure you need to WARN, really.  You can probably just remove and have all code go to 'return rv'
Attachment #8533436 - Flags: feedback?(jduell.mcbugs) → feedback+
(In reply to Jason Duell [:jduell] (needinfo? me for lower latency) from comment #11)
> I don't understand this use case--you'd block the main thread to wait for
> the off-main thread calls to happen? (my knowledge of imglib is thin :)

Yes, that's exactly right! The idea would be that typically, we would not need to block the main thread, because nobody would try to draw the image fast enough to care that the notifications are being delivered asynchronously. But if someone did try to draw the image in a synchronous fashion (for example, by drawing it to a canvas, where we're required to block the main thread if necessary to draw successfully), then the drawing code could wait for the notifications to be delivered before attempting to draw.

That's not what we're trying to implement in this bug, though. Here we're just eagerly delivering the notifications on the main thread. But it's useful to know what would be involved in implementing the other approach in the long term.
(In reply to Jason Duell [:jduell] (needinfo? me for lower latency) from comment #11)
> I'm confused about why we'd read from a Blob synchronously if it's a file
> (presumably that could block on disk I/O?).  But I'm no Blob expert.

A bit more context: This feature is needed for shumway where we read a swf file and we create blobs for each embedded image. For what I understood, flash needs sync decoding of images. This is the reason why we need this feature. Seth, is this correct?

There are several way to improve this patch. For instance we can enable the sync loading just for memory blobs. But we should speak with shumway people to know the kind of blob they have or they want to use.
Flags: needinfo?(amarchesini)
(In reply to Andrea Marchesini (:baku) from comment #14)
> A bit more context: This feature is needed for shumway where we read a swf
> file and we create blobs for each embedded image. For what I understood,
> flash needs sync decoding of images. This is the reason why we need this
> feature. Seth, is this correct?

That's correct, yes: SWF files contain embedded images, which are available for synchronous decoding. What that means is that, as soon as the part of the file that contains the image has been loaded, content can request the image and access the contained pixels.

Currently, we don't report bytes as loaded until all contained images have been loaded. This works, but has two issues: one is that it causes significant slowdowns during startup - not only because the browser has to load the blob, but more importantly because we have to parse the part of the SWF file containing it into said blob. The other is that we pay the price for these images in terms of memory usage when many of them will only be used far later in the SWFs runtime; perhaps never, if they're used in, say, an error screen that's not usually shown.
> 
> There are several way to improve this patch. For instance we can enable the
> sync loading just for memory blobs. But we should speak with shumway people
> to know the kind of blob they have or they want to use.

Since I don't know the implementation details of all this, let me describe what exactly we do:
1. create a typed array of the size of the encoded image
2. parse the part of the SWF file containing the image and store the results in that typed array
3. create a blob from that typed array
4. create an Image element and load the blob

Naively, I'd assume that that guarantees the blob to be in-memory.
Andrea, it looks to me like Till answered your question. It seems like Shumway's needs are met by supporting synchronous loading for in-memory blobs only.

In the short term, I think we should only perform synchronous loading if it doesn't mean blocking the main thread for a long time on IO. So probably file-based or remote blob URIs are out, as long as we can meet Shumway's needs without having them. (I don't know what other kind of blob URIs there are.)

In the long term, we can think about expanding our support for this to other kinds of blob URIs - that's why I was asking Jason about delivering these notifications off-main-thread. But we don't need it for this bug.

I need to take a good look at the spec to figure out exactly what's needed, but I suspect this would be standardizable even if only a subset of blob URIs are supported. We just need to make sure that the img element's |complete| property reflects whether synchronous loading occurred, and that should be enough for content authors to use. We basically just need to make sure that the spec doesn't forbid this behavior, and we're golden.

It would be a *very* good idea to make sure that when synchronous loading happens, the |complete| property is also updated synchronously. That should be true with the patch you've written Andrea. Would you mind updating the tests to check that as well? I think it's really important.
Attached patch image.patchSplinter Review
in this new version we check the kind of blob and if the blob is in memory, we load it synchronously.
Attachment #8533436 - Attachment is obsolete: true
Attachment #8533436 - Flags: review?(seth)
Attachment #8534319 - Flags: review?(seth)
Comment on attachment 8534319 [details] [diff] [review]
image.patch

Review of attachment 8534319 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great, Andrea! Thanks so much for your hard work on this!
Attachment #8534319 - Flags: review?(seth) → review+
...And now the fun begins. =)

Looks like there's a leak and at least one test failure. Let me know if you run into any trouble debugging this and I'll be happy to help.
(In reply to Seth Fowler [:seth] from comment #20)
> ...And now the fun begins. =)
> 
> Looks like there's a leak and at least one test failure. Let me know if you
> run into any trouble debugging this and I'll be happy to help.

I fixed the leak but I'm still having some issue with <img src="data:text/html," />
Working on it.
Seth, I'm currently blocked by other bugs.
I can work on it in the next 2 weeks, is it ok for you?
Flags: needinfo?(seth)
While it'd still be really nice to have this as a performance and memory-usage optimization, it doesn't block Shumway progress, really.
Blocks: shumway-m5
No longer blocks: shumway-m3
Flags: needinfo?(seth)
We may want to put an upper bound on the size of the request that we will send synchronously so as not to block the main thread too long for ridiculously sized images.
We need this functionality in noVNC for similar performance reasons as Shumway, so it would be nice if development could be finished. :)

Currently we are using data URIs though. Would it be better to switch to Blob URIs?
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: