Closed Bug 1369317 Opened 3 years ago Closed 3 years ago

We suck at data URIs

Categories

(Core :: Networking, defect)

53 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: ossman, Assigned: valentin)

References

(Blocks 1 open bug)

Details

(Whiteboard: [qf:p2][necko-active])

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:53.0) Gecko/20100101 Firefox/53.0
Build ID: 20170420084331

Steps to reproduce:

Run noVNC's performance test (vnc_playback.html) with the attached recording.


Actual results:

Running the test took ~15000 ms per iteration.


Expected results:

Each iteration should have taken ~5000 ms as this is what Chrome/Chromium achieves on the same system.
The profiling seems to reveal two major differences, one of which is the image handling. The recording includes a lot of JPEG data that the browser is expected to decode. The profiling shows these differences between Chromium and Firefox in execution time:

 - imageRect() (does Image.src = <data uri>)
   Firefox: 3000 ms + 1400 ms "Tools"
   Chromium: 750 ms

 - drawImage() (does CanvasRenderingContext2D.drawImage())
   Firefox: 500 ms + 2300 ms "Graphics"
   Chromium: 1600 ms

This is a performance bottleneck for noVNC as JPEG decoding is heavily utilised.

On the plus side Firefox seems to be much faster than Chromium at blitting between two canvases (400 ms vs 1200 ms). :)
Argh. The recording is too large to attach. For now you can get it here:

http://www.cendio.com/~ossman/cube.js.xz
Component: Untriaged → ImageLib
Product: Firefox → Core
FYI, I tracked down the other issue so the total run time is now ~10000 ms vs Chrome's ~5000ms.
Whiteboard: [qf]
Hi Pierre, do you have a link to a page that allows us to run this test?  Thanks!
Flags: needinfo?(ossman)
Alternatively, any chance you can try to install the Gecko Profiler extension from https://perf-html.io and try to capture a profile of the slowness and paste a link to it here by clicking Share after submitting the profile?  That will allow us to see what code was causing the slowness and get a better sense of where the problem lies inside Firefox.  Thanks!
Sure. I've set up a copy of noVNC here:

https://www.cendio.com/~ossman/noVNC/tests/vnc_playback.html?data=cube.js
Looks like a lot of time in imgLoader::LoadImage():

https://perfht.ml/2sscX8p

Are these data URI images?
Yes they are. The data comes in over a WebSocket, so we convert the data to a base64 data URI in order to ask the browser to decode them for us.
Flags: needinfo?(ossman)
Here is a run from my machine:

https://perfht.ml/2ru5Fnz
Pierre, have you tried putting these into Blob objects and use createObjectURL()?  That would avoid the need to b64 encode the images.  Just make sure to call revokeObjectURL() when you are done with them.

We also seem to have some poor performance issues on data URLs we should fix.
Flags: needinfo?(ossman)
This bug is all about our extremely poor handling of data URIs.

Looking at https://perfht.ml/2qELtMj, around 66% of the time is spent in imgLoader::LoadImage(), but not in anything about images.  Around *half* of the total time in nsContentUtils::LoadImage() is spent in nsDataChannel::OpenContentStream()!  Here are the two top places:

https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/protocol/data/nsDataChannel.cpp#29
https://searchfox.org/mozilla-central/source/netwerk/protocol/data/nsDataChannel.cpp#75

(This is also a bit expensive: https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/protocol/data/nsDataChannel.cpp#39.  Same with https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/protocol/data/nsDataChannel.cpp#45.)

This is really sad, this work should happen off the main thread.

Parsing the URIs is also very slow, 6.6% of the total time is spent parsing URIs *only* calling NS_EscapeURL() for example.  This is coming from <https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/base/nsSimpleURI.cpp#263>.  IINM we escape the entire URI just to find the first colon in it?!  That seems suboptimal.

Last but not least, using the entire URI as the image hash key for imagelib also shows up as 3.4% of the total time (under ImageCacheKey::ComputeHash()) but I'd say for now Necko side fixes are more prudent.
Component: ImageLib → Networking: HTTP
Component: Networking: HTTP → Networking
Summary: jpeg decoding very slow → We suck at data URIs
Whiteboard: [qf] → [qf:p1]
I think fixing this bug should include a Talos test exercising data urls.
RE: base64 encoded data URIs, bug 1333899 and it's patches might be of interest.
(In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #10)
> Pierre, have you tried putting these into Blob objects and use
> createObjectURL()?  That would avoid the need to b64 encode the images. 
> Just make sure to call revokeObjectURL() when you are done with them.
> 
> We also seem to have some poor performance issues on data URLs we should fix.

Okay, so I tried that, and the overall performance is roughly the same. I've shared a profile run here:

https://perfht.ml/2rxLSDB

The page is updated so you can test it yourselves as well.

However using blobs is apparently inefficient in Chrome as it dragged the performance there down to Firefox' levels.
Flags: needinfo?(ossman)
(In reply to Pierre Ossman from comment #14)
> (In reply to Ben Kelly [reviewing, but slowly][:bkelly] from comment #10)
> > Pierre, have you tried putting these into Blob objects and use
> > createObjectURL()?  That would avoid the need to b64 encode the images. 
> > Just make sure to call revokeObjectURL() when you are done with them.
> > 
> > We also seem to have some poor performance issues on data URLs we should fix.
> 
> Okay, so I tried that, and the overall performance is roughly the same. I've
> shared a profile run here:
> 
> https://perfht.ml/2rxLSDB
> 
> The page is updated so you can test it yourselves as well.

Ugh.  :-(  Looks like even after that change, nsDataHandler::NewURI() is spending around 775ms stripping whitespaces here <https://searchfox.org/mozilla-central/rev/a798ee4fc323f9387b7576dbed177859d29d09b7/netwerk/protocol/data/nsDataHandler.cpp#98>  And the overhead of GetAsciiSpec mentioned in comment 11 is still here (607ms).  Also some overhead of setting HTMLImageElement is present (around 415ms) just flattening strings...

In short, it seems like our code here is somehow ending up with images with a base64-encoded data URI again.  Are you sure that the code on your end is definitely now using only blob URIs and no data URIs at all?  Is this the correct test page <https://www.cendio.com/~ossman/noVNC/tests/vnc_playback.html?data=cube.js>?  Is it possible to make it console.log() the image URLs that it draws to the canvas?

BTW besides this, it seems like your test is falling into a sync decode path for images which is slow.  I opened bug 1371441 for that.

> However using blobs is apparently inefficient in Chrome as it dragged the
> performance there down to Firefox' levels.

It may be a good idea to file a bug on chromium for that.  The chromium folks would probably appreciate that!  :-)
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #15)
> 
> In short, it seems like our code here is somehow ending up with images with
> a base64-encoded data URI again.  Are you sure that the code on your end is
> definitely now using only blob URIs and no data URIs at all?

Sorry, I just realised there's a big source of noise in the data; the recordings are base64 encoded so there is a lot of decoding of that data going on. However it doesn't use URIs, so it doesn't explain why NewURI() is such a big offender.

I've grepped the code. The only place we assign anything to "src" is this line:

core/display.js:        img.src = URL.createObjectURL(blob);

So I think you'll have to debug the Gecko code to figure out why it is spending so much time there despite it being a Blob URL.

> Is this the correct test page
> <https://www.cendio.com/~ossman/noVNC/tests/vnc_playback.html?data=cube.js>?

Yes.

> Is it possible to make it console.log() the image URLs that it draws to the
> canvas?
> 

Sure. I've updated the page with that now.

> > However using blobs is apparently inefficient in Chrome as it dragged the
> > performance there down to Firefox' levels.
> 
> It may be a good idea to file a bug on chromium for that.  The chromium
> folks would probably appreciate that!  :-)

Alright. Reported as https://bugs.chromium.org/p/chromium/issues/detail?id=731586 :)
Thanks a lot, Pierre.  I've so far only looked at the profiles, just trying to make as much info available to the engineer who will look at this more in depth.  It's entirely possible that there is some quirk somewhere that causes blob URLs also fall into the slow code path here somehow...
Whiteboard: [qf:p1] → [qf:p1][necko-next]
Whiteboard: [qf:p1][necko-next] → [qf:p2][necko-next]
Assignee: nobody → valentin.gosu
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Whiteboard: [qf:p2][necko-next] → [qf:p2][necko-active]
This fixes about half of the issues mentioned in comment 11. 

This would be what is left:
Base64Decode - https://searchfox.org/mozilla-central/source/netwerk/protocol/data/nsDataChannel.cpp#75

NS_UnescapeURL - https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/protocol/data/nsDataChannel.cpp#39.
This is tricky, but maybe we can find a way to cache it, or not do it at all.

StripWhitespace - https://searchfox.org/mozilla-central/rev/1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/protocol/data/nsDataChannel.cpp#45
As we do this just before SetSpec, we can probably make SetSpec deal with whitespace as well, in just one pass.
Comment on attachment 8896689 [details]
Bug 1369317 - Do not escape the returned string in nsSimpleURI::GetAsciiSpec

https://reviewboard.mozilla.org/r/167988/#review173434
Attachment #8896689 - Flags: review?(mcmanus) → review+
Comment on attachment 8896690 [details]
Bug 1369317 - Make sure nsSimpleURI::SetPathQueryRef escapes its non-ASCII argument

https://reviewboard.mozilla.org/r/167990/#review173436
Attachment #8896690 - Flags: review?(mcmanus) → review+
Comment on attachment 8896691 [details]
Bug 1369317 - Use net_ExtractURLScheme to parse the scheme in nsSimpleURI::SetSpec

https://reviewboard.mozilla.org/r/167992/#review173438
Attachment #8896691 - Flags: review?(mcmanus) → review+
Comment on attachment 8896692 [details]
Bug 1369317 - Filter and escape URI string in only one pass

https://reviewboard.mozilla.org/r/167994/#review173442
Attachment #8896692 - Flags: review?(mcmanus) → review+
(In reply to Valentin Gosu [:valentin] from comment #23)
> This fixes about half of the issues mentioned in comment 11. 
> 
> This would be what is left:
> Base64Decode -
> https://searchfox.org/mozilla-central/source/netwerk/protocol/data/
> nsDataChannel.cpp#75
> 
> NS_UnescapeURL -
> https://searchfox.org/mozilla-central/rev/
> 1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/protocol/data/nsDataChannel.
> cpp#39.
> This is tricky, but maybe we can find a way to cache it, or not do it at all.
>
> StripWhitespace -
> https://searchfox.org/mozilla-central/rev/
> 1a0d9545b9805f50a70de703a3c04fc0d22e3839/netwerk/protocol/data/nsDataChannel.
> cpp#45
> As we do this just before SetSpec, we can probably make SetSpec deal with
> whitespace as well, in just one pass.

Just a reminder most of this is addressed in bug 1333899. We could probably pull those over here and just leave the string stream change in the other bug.
Pushed by valentin.gosu@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/74f97f3b48f8
Do not escape the returned string in nsSimpleURI::GetAsciiSpec r=mcmanus
https://hg.mozilla.org/integration/autoland/rev/0fb8312cb920
Make sure nsSimpleURI::SetPathQueryRef escapes its non-ASCII argument r=mcmanus
https://hg.mozilla.org/integration/autoland/rev/b3a881b7896e
Use net_ExtractURLScheme to parse the scheme in nsSimpleURI::SetSpec r=mcmanus
https://hg.mozilla.org/integration/autoland/rev/454cae437211
Filter and escape URI string in only one pass r=mcmanus
I'm not familiar with your routines, but does this mean that Firefox nightly contains these changes and I can do a test of that? :)
(In reply to Pierre Ossman from comment #30)
> I'm not familiar with your routines, but does this mean that Firefox nightly
> contains these changes and I can do a test of that? :)

This change will most likely be in the nightly update on august 17.
Blocks: 1392231
FWIW nightly shows a 20% improvement compared to beta when refreshing page with the "data-url-fetch" service worker installed here:

https://jakearchibald.github.io/service-worker-benchmark/
Pierre,

Could you try running the noVNC test on nightly and see if we've narrowed the 15 vs 5 sec performance gap here with Chrome?
Flags: needinfo?(ossman)
Alright, some new tests:

 Chrome 60: 7.0 seconds
 Firefox 55: 7.5 - 8.0 seconds
 Firefox Nightly (57): 5.0 - 7.0 seconds

So Chrome seems to have regressed a bit, or I've gotten a general performance hit on my system. Chrome is very consistent with its numbers though.

Firefox has apparently already improved massively between 53 when I reported this, and 55 which is the current stable version I have installed. However nightly further improves those numbers.

Both stable and nightly have large variation in run times though. I see a jerk now and then in the animation, so it might be some garbage collection problem or similar. The best number I got for nightly was 4789 ms, and the worst was 7367 ms.



So in summary, the problem does indeed seem to be solved. :)
Flags: needinfo?(ossman)
(In reply to Pierre Ossman from comment #35)
> Firefox has apparently already improved massively between 53 when I reported
> this, and 55 which is the current stable version I have installed. However
> nightly further improves those numbers.
> 
> Both stable and nightly have large variation in run times though. I see a
> jerk now and then in the animation, so it might be some garbage collection
> problem or similar. The best number I got for nightly was 4789 ms, and the
> worst was 7367 ms.

Thanks for re-testing!  We can look at the jank as well if you can capture in a profile like in comment 9.
Test run with nightly:

https://perfht.ml/2vPtnZY

This test did five iterations of the performance test. Iteration 2 and 5 definitely had the jank. Iteration 1 was the slowest, but that might be because of some startup costs.
Hmm.   This profile doesn't have symbols for some reason.  Were you using a local build or did you copy the URL quickly after loading it?
Odd. It was the build I downloaded from you guys on the 28th. And I could see the symbols in the initial, local window.

Should I just try again, or is there some step I should be extra careful about?
Maybe try a new nightly build?  Markus are there issues with symbolification at the moment?
Flags: needinfo?(mstange)
I'm not aware of any. I'll keep an eye out.
Flags: needinfo?(mstange)
New attempt with the current nightly build:

https://perfht.ml/2wycLJw

Seems to have worked better.

In this trace it was the third iteration that had the largest jank, with iteration 4 and 5 being unaffected.
(In reply to Pierre Ossman from comment #42)
> New attempt with the current nightly build:
> 
> https://perfht.ml/2wycLJw
> 
> Seems to have worked better.

Thanks!  I moved this over to a new bug in bug 1396870.
Depends on: 1418810
See Also: → 1426996
You need to log in before you can comment on or make changes to this bug.